Hadoop and NGS data processing hackathon III

The hackathon is a technical meeting of the WG3 parallel computing task force.

Local organisers: Luca Pireddu, Katia Brigaglia, Cinzia Sardu, Emanuela Falqui.

Contacts: you can reach the local organisers via e-mail at


The hackathon is now finished. Thanks to all who participated.


The large volumes of data generated by modern sequencing technologies are difficult to manipulate and analyse. In addition, continuous technological advances are lowering sequencing costs, allowing larger data sets to be collected and new types of applications to be envisioned. The bioinformatics community is adopting novel computing technologies to cope with the challenges posed by complex data integration, management, storage and processing tasks.

This hackathon will give experts in the field the chance to work together, exchange ideas and implement or test prototype solutions to problems of mutual interest. The outcome of this event will be a report describing the results of any research, experimental, or design work carried out at the event, with a reference to any source code that was produced, which must be released to the public.

The hackathon is centered on problems in the high-throughput sequencing domain. Within this realm, the topics of interest include, but are not limited to:

  • cloud computing
  • data/sample tracking; recording operations performed on data
  • dataset and storage management
  • distributed bioinformatics applications
  • distributed cloud databases
  • distributed data processing frameworks, such as Hadoop
  • high-level distributed programming (e.g., Pig, Hive)
  • process automation
  • workflow management; high-level programming

The event is aimed at bioinformaticians and software developers who are using or planning to use Hadoop or similar cloud computing frameworks.

What is a hackathon?

A hackathon is an event in which computer programmers and others involved in software development […] collaborate intensively on software projects (Wikipedia).

In short, the way SeqAhead hackathons are organized is as follows. Prior to the event, participants contribute ideas for “Tasks” to be tackled at the hackathon. Tasks are items of work defined such that:

  • something useful can be delivered in the time span available (usually two days);
  • they are of interest to more people than just the author.

At the event, everyone first expresses their interest in the tasks they wish to work on. Then we enter the following loop:

  1) Pick a task that interests you;
  2) Approach others who are also interested in the same task;
  3) Create a wiki page for the task (use it to document your work results);
  4) Work together on the task, produce something;
UNTIL true == false;

What to produce?

Whatever the task you decide to work on, produce something concrete.

  • Code: produce a patch for a project or commit new code to our own projects.
  • A report: if you're doing performance testing, for instance, or dealing with a problem that is too big to solve in a couple of days, produce a report that details your results and/or your longer-term plan or a design for a solution.
  • Documentation, howtos, recipes: some of the tasks consist of producing instructions for performing a certain task.

Hackathon tasks

We've started brainstorming. Please have a look and contribute to the Tasks page.

Date and Location

The hackathon will be held on June 4th and 5th, 2013, at CRS4, Pula (CA), Italy.


The event is open to everyone. Please contact the local organizers if you're interested.

Getting here

See the travel page.



Time   June 4th          June 5th
9:15   Initialization    Plug-in and hack
10:30  Coffee break      Coffee break
11:00  Hack              Hack
12:30  Lunch             Lunch
13:30  Hack              Hack
16:00  Coffee break      Coffee break
16:30  Hack              Wrap-up (write reports, share code, etc.)
18:00  Unplug            Unplug
18:15  shuttle to hotel  shuttle to hotel
20:30  Event dinner

COST reimbursement

If you qualify, have registered for the event through the local organizers, and attend all days of the hackathon, COST will reimburse some of your expenses. The invitation you receive should contain a link to a memo that explains everything in detail. In short, you should be covered for:

  • 3 nights x €120 for accommodations and breakfast
  • 3 meals x €20 (plus you will be invited to the event dinner)
  • flights
  • local transport, with some limitations

If you're attending both SeqAhead Sardinia events (the hackathon and the workshop), you can have a maximum of 4 nights reimbursed (not 3 + 2), and you can claim your travel expenses only once.

These figures are provided for informational purposes only. Please verify them by checking the COST system or contacting the chair of the COST Action BM1006, which funds this event.


We have a dedicated OpenStack cluster at our disposal. See the cluster page.

Please note that the cluster will be wiped clean after the event.


This hackathon is funded by the SeqAhead COST Action BM1006: Next Generation Sequencing Data Analysis Network, with local support from CRS4 and Sardegna Ricerche.

Cloud computing resources were provided by CSC - IT Center for Science and the Finnish ELIXIR node.



Last modified: 2013/06/13 18:06 by Luca Pireddu