Hadoop and NGS data processing hackathon II

The hackathon is a technical meeting of the WG3 parallel computing task force.

Organisers: Ola Spjuth, Aleksi Kallio, Eija Korpelainen.


The bioinformatics community is adopting novel cloud computing technologies to cope with the challenges posed by complex data integration tasks and masses of NGS data. The hackathon will focus on the next challenges that cloud adoption poses: massively distributed data processing frameworks such as Hadoop, distributed cloud databases and distributed bioinformatics applications.

The event will continue from topics raised during the previous hackathon in March. In the March meeting we familiarised ourselves with the relevant questions of Hadoop-based data processing and proceeded to benchmark different deployment scenarios, some bioinformatics tools and generic cloud computing tools. The next steps are to move on to bioinformatics-specific computational challenges (mapping, variant calling, etc.) and to discuss the mechanisms that allow easy deployment of new tools into a cloud platform (CloudBioLinux, VM images, Chef, etc.).

The event is aimed at bioinformaticians and software developers who are using or planning to use Hadoop or similar cloud computing frameworks. It will be organised around practical problem solving and group work.

Date & location

The hackathon will be held from Wednesday to Thursday 30.-31.5.2012, at UPPMAX, Uppsala, Sweden.

The hackathon is arranged at Ångströmlaboratoriet, in the room Mellanrummet. The room is located on the ground floor; enter via the main entrance and proceed a few blocks ahead, and it will be on the right-hand side.

Coffee breaks are sponsored by UPPMAX and dinner is sponsored by the Swedish strategic research programme eSSENCE.


The hackathon is open to everyone interested, but please inform the organisers about your participation before Monday 30.4. Only one person per COST partner can be reimbursed for travel and hotel costs.


The main content of the hackathon is working together on various tasks. See planned tasks page.


Wednesday 30.5.
9.30 Welcome and practical issues room Mellanrummet
9.50 Round of introductions room Mellanrummet
10.00 Working with tasks room Mellanrummet
10.00 Coffee room Mellanrummet
12.00 Lunch Cafeteria
13.00 Working with tasks room Mellanrummet
15.00 Coffee room Mellanrummet
18.00 Dinner Restaurant Plock
Thursday 31.5.
9.00 Wrap-up of Wednesday room Mellanrummet
9.30 Working with tasks room Mellanrummet
10.00 Coffee room Mellanrummet
12.00 Lunch Cafeteria
13.00 Working with tasks room Mellanrummet
15.00 Coffee room Mellanrummet
~16.30 Wrap-up of Thursday room Mellanrummet
~17.00 Hackathon ends


The organisers will arrange the accommodation, but you have to pay for it yourself. Those eligible for COST reimbursement can claim a maximum of 2 nights.

The hotel is Hotel Uppsala.


The closest airport is Stockholm Arlanda. Participants need to make their own transportation arrangements.

For more information on local transportation, please see

IT facilities

Each participant needs to bring his or her own laptop (there will be enough power outlets for all). WLAN will be available.

During the hackathon we will have a dedicated Hadoop cluster consisting of approx. 40 HPC nodes, provided by the BioMedInfra people at CSC. There will also be an NFS server that can be used for file sharing. The cluster is running OpenNebula, where we have virtual machine instances that run Cloudera-based Hadoop.
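
For participants who have not yet written a Hadoop job, the map and reduce pattern that such a cluster executes can be tried out locally first. The following is a minimal Python sketch and not part of the hackathon material: it counts mapped reads per reference sequence from SAM-formatted lines. The function names and sample records are hypothetical, and treating an RNAME of "*" as "unmapped" is a simplification (the SAM flag field is ignored).

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple


def map_read(sam_line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (reference_name, 1) for each read placed on a reference."""
    if sam_line.startswith("@"):  # SAM header lines carry no reads
        return
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 3:  # malformed record, skip it
        return
    rname = fields[2]  # third SAM column is the reference sequence name
    if rname != "*":  # simplification: "*" is treated as an unmapped read
        yield rname, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Reduce step: sum the emitted values for each key."""
    totals: Dict[str, int] = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def count_reads_per_reference(lines: Iterable[str]) -> Dict[str, int]:
    """Run the map step over all lines, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in map_read(line))
    return reduce_counts(pairs)


if __name__ == "__main__":
    # Hypothetical, truncated SAM records for illustration only.
    sample = [
        "@HD\tVN:1.4",
        "read1\t0\tchr1\t100",
        "read2\t16\tchr1\t250",
        "read3\t4\t*\t0",
        "read4\t0\tchr2\t42",
    ]
    for ref, n in sorted(count_reads_per_reference(sample).items()):
        print(f"{ref}\t{n}")
```

On a real cluster the map and reduce steps would run as separate distributed tasks, with the framework grouping the emitted keys between them; here both steps run in a single process so the logic can be checked on a laptop.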

Please send your public SSH keys in advance to get access to the cluster, and remember to set a password on your SSH keystore! For instructions, see:

Instructions for using the cluster


For instant messaging during and around the hackathon, we will use IRC. Channel is seqahead_hackathon and network is Freenode.

/join #seqahead_hackathon

For documentation we will use this Wiki.

Last modified: 2012/05/30 16:51 by Aleksi Kallio