The objective of this group is to provide timely alerts on new technology developments in next-generation sequencing. The members read papers about the latest technologies, applications, and upcoming developments, and inform the rest of the Action's members about the latest trends.
A description of tasks, deliverables, etc. can be found here: wg1_task_description.ppt
Purpose of the initial technology report will be
to come soon…
The purpose of the WG1 HTS Library is to provide summaries of publications, technical papers, conferences, etc. in the field of high-throughput sequencing (HTS).
Authors: Benjamin A Flusberg, Dale R Webster, Jessica H Lee, Kevin J Travers, Eric C Olivares, Tyson A Clark, Jonas Korlach & Stephen W Turner
Title: Direct detection of DNA methylation during single-molecule, real-time sequencing
Journal: Nature Methods
Year/Issue: 2010 Vol 7 No 6
Summary: This paper describes the first application of DNA methylation detection using Pacific Biosciences SMRT sequencing. The authors show that they can directly detect both mA and mC while sequencing. A breakthrough technology.
Authors: Clark TA, Murray IA, Morgan RD, Kislyuk AO, Spittle KE, Boitano M, Fomenkov A, Roberts RJ, Korlach J.
Title: Characterization of DNA methyltransferase specificities using single-molecule, real-time DNA sequencing.
Journal: Nucleic Acids Res.
Year/Issue: 2012 Feb;40(4):e29. Epub 2011 Dec 7.
Summary: This paper describes the first detection of N4-methylcytosine during sequencing, in addition to N6-methyladenine and 5-methylcytosine, using Pacific Biosciences SMRT sequencing.
Authors: Chavez L, Jozefczuk J, Grimm C, Dietrich J, Timmermann B, Lehrach H, Herwig R*, Adjaye J*. *equal contribution
Title: Computational analysis of genome-wide DNA methylation during the differentiation of human embryonic stem cells along the endodermal lineage.
Journal: Genome Research
Year/Issue: 2010 Oct;20(10):1441-50. Epub 2010 Aug 27.
Summary: This paper describes a software pipeline, called MEDIPS, for genome-wide methylation studies with the MeDIP-seq (methylated DNA immunoprecipitation followed by sequencing) approach. The core of the pipeline is a newly developed normalization method for MeDIP-seq data. The rationale behind the method is based on the concept of coupling factors introduced by the BATMAN method of Down et al., 2008. Using a specific distance function to calculate coupling factors, the authors estimate, in genomic windows, the dependency between total CpG density and MeDIP-seq signals for the low range of coupling factors. MEDIPS then weights the MeDIP-seq signals with the estimated coupling-factor-dependent normalization parameters obtained from a linear model. The authors report a correlation of 0.83 between MEDIPS-normalized MeDIP-seq data and benchmark data generated with bisulfite sequencing. Furthermore, they apply the computational approach to the analysis of genome-wide differential methylation in human embryonic stem cells (hESCs) compared with stem cells differentiated to definitive endoderm.
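The normalization idea summarized above can be illustrated with a minimal sketch. It is not the MEDIPS implementation: it assumes the simplest possible coupling factor (the number of CpG dinucleotides in a window), fits an ordinary least-squares line on windows with low coupling factors only, and divides each signal by the value the line predicts. The function names and the `low_range_max` parameter are hypothetical and chosen for illustration.

```python
def count_cpgs(window_seq):
    """Simplified coupling factor (assumption): number of CpG dinucleotides in a window."""
    return window_seq.upper().count("CG")


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def normalize_signals(coupling, signals, low_range_max):
    """Fit the signal/coupling dependency on the low range of coupling
    factors, then weight every window's signal by the fitted expectation."""
    low = [(c, s) for c, s in zip(coupling, signals) if c <= low_range_max]
    slope, intercept = fit_line([c for c, _ in low], [s for _, s in low])
    normalized = []
    for c, s in zip(coupling, signals):
        expected = slope * c + intercept
        # Windows with a non-positive expectation carry no usable information here.
        normalized.append(s / expected if expected > 0 else 0.0)
    return slope, intercept, normalized


if __name__ == "__main__":
    # Toy data: four low-density windows and one CpG-dense window.
    coupling = [1, 2, 3, 4, 10]
    signals = [2.0, 4.0, 6.0, 8.0, 10.0]
    slope, intercept, normalized = normalize_signals(coupling, signals, low_range_max=4)
    print(slope, intercept, normalized)
```

In this toy input, the CpG-dense window has a raw signal as high as any other window, yet after normalization it scores lower than the low-density windows, because the fitted line predicts a much higher signal for that CpG density.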
Authors: Salzberg SL, Phillippy AM, Zimin A, Puiu D, Magoc T, Koren S, Treangen TJ, Schatz MC, Delcher AL, Roberts M, Marçais G, Pop M, Yorke JA.
Title: GAGE: a critical evaluation of genome assemblies and assembly algorithms
Journal: Genome Research
Year/Issue: 2012 Jan 12
Summary: This paper compares the performance of various de novo assemblers on real short-read sequencing datasets with the goal of defining a Genome Assembly Gold standard Evaluation (GAGE). Four different genomes were compared, each sequenced with 2 or 3 libraries of various insert sizes (small 155-400bp, medium 2280-4000bp, and large 8-35kbp), using 8 different assemblers (ABySS, Allpaths-LG, Bambus2, CABOG, MSR-CA, SGA, SOAPdenovo, Velvet). The results support 3 main conclusions: 1) the quality of the data is more important than the assembler, so correcting the reads is of crucial importance, something that Allpaths-LG does very well; 2) the degree of contiguity of an assembly varies enormously, not only among assemblers but also among the target genomes; 3) the correctness of the assemblies varies widely and is not correlated with the contiguity statistics. As a criticism, one can argue that the authors did not optimize the parameters for some assemblers, such as the k-mer value for Velvet, SOAPdenovo, ABySS and MSR-CA. For example, using only a single k-mer of 31 when the reads are much longer (101bp or more) is a real disadvantage for those assemblers. In contrast, the datasets were chosen to fit the requirements of Allpaths-LG so that it could be compared with the others.
Additional links: Suppl. material http://genome.cshlp.org/content/early/2012/01/12/gr.131383.111/suppl/DC1; Data used by the authors http://gage.cbcb.umd.edu/data
Authors: Qiong-Yi Zhao, Yi Wang, Yi-Meng Kong, Da Luo, Xuan Li, Pei Hao
Title: Optimizing de novo transcriptome assembly from short-read RNA-Seq data: a comparative study
Journal: BMC Bioinformatics
Year/Issue: 2011 Dec 14, 12(Suppl 14):S2
Summary: This paper compares the performance of various de novo RNA-seq assemblers on real public short-read sequencing data sets. Three different transcriptomes were compared: Drosophila melanogaster PE76bp Illumina, Schizosaccharomyces pombe 68PE strand-specific Illumina, and Camellia sinensis 75PE Illumina. The software packages compared were four single k-mer assemblers (SK: SOAPdenovo, ABySS, Oases and Trinity) and three multiple k-mer methods (MK: SOAPdenovo-MK, trans-ABySS and Oases-MK). Well written and detailed, the article shows that Trinity has an edge over the other assemblers, but is much slower. Oases-MK is a good compromise when time is limited, but Oases can require more RAM.
Additional links: Suppl. material http://www.biomedcentral.com/1471-2105/12/S14/S2/additional
Authors: Liliana Florea, Alexander Souvorov, Theodore S. Kalbfleisch, Steven L. Salzberg
Title: Genome Assembly Has a Major Impact on Gene Content: A Comparison of Annotation in Two Bos Taurus Assemblies
Journal: PLoS One
Year/Issue: 2011, 6(6),e21400
Summary: This paper shows a fact that seems obvious when you think about it, but had never been demonstrated: the quality of a genome assembly affects the quality of its gene annotation. The authors compare 2 assemblies of the same Bos taurus genome obtained from identical starting data sets, but with an improved assembly program (Celera WGS). The second assembly clearly looks better in terms of classical statistics (N50, contig length, scaffold length, number of gaps, etc.), but interestingly the annotation (done with the same pipeline) highlighted 16% of structural variations. Those were sometimes difficult to connect between the two annotations, and their quality was sometimes better with the first assembly than with the second. With the increasing number of draft genomes being published, it is perhaps worth investing effort in improving the finishing of those genomes or in developing tools to assess and measure the accuracy of a genome assembly.
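For readers unfamiliar with the N50 statistic mentioned above, the following minimal sketch shows how it is commonly computed from a list of contig (or scaffold) lengths: the N50 is the length of the contig at which the cumulative sum, taken from the longest contig downwards, first reaches half of the total assembly size. The function name and the example lengths are illustrative assumptions, not data from the paper.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("at least one length is required")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


if __name__ == "__main__":
    # Two hypothetical assemblies with the same total size (100)
    # but very different contiguity.
    print(n50([50, 30, 20]))
    print(n50([10] * 10))
```

Because N50 only measures contiguity, two assemblies can rank differently on N50 and on correctness, which is consistent with the point made in the summary above.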
Authors: Heng Li
Title: Exploring single-sample SNP and INDEL calling with whole-genome de novo assembly
Year/Issue: 2012 May 7 (ahead of pub).
Summary: The author presents a new way to store both the forward and the reverse-complement DNA sequence in an FM-index. This enables the development of a new de novo assembler called "fermi", which achieves a quality similar to that of other assemblers. It is possible to call SNPs and short INDELs from this assembly, with INDEL calling outperforming current methods. The other point of interest is that the assembled unitigs represent a lossless reduced representation of the reads, preserving small variants and copy numbers, which reveals a possible new way to compress reads that would be non-redundant and smaller in size. However, the computational cost is prohibitive.
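To give a rough idea of what indexing both strands means, here is a minimal sketch of a plain FM-index built over a sequence concatenated with its reverse complement. It is not the data structure used in fermi: the suffix array is built by naive sorting, the occurrence table is stored in full, and the class name, the separator `#` and the terminator `$` are assumptions made for illustration.

```python
def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence over A, C, G, T."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


class TwoStrandFMIndex:
    """Naive FM-index over 'forward # reverse_complement $' (illustrative only)."""

    def __init__(self, seq):
        text = seq + "#" + reverse_complement(seq) + "$"
        # Naive suffix array: fine for short examples, far too slow for real genomes.
        suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
        # Burrows-Wheeler transform: the character preceding each sorted suffix.
        self.bwt = "".join(text[i - 1] for i in suffix_array)
        alphabet = sorted(set(text))
        # first[c]: number of characters in the text that sort before c.
        self.first = {}
        total = 0
        for c in alphabet:
            self.first[c] = total
            total += text.count(c)
        # occ[c][i]: number of occurrences of c in bwt[:i].
        self.occ = {c: [0] for c in alphabet}
        for ch in self.bwt:
            for c in alphabet:
                self.occ[c].append(self.occ[c][-1] + (ch == c))

    def count(self, pattern):
        """Count occurrences of pattern on either strand using backward search."""
        lo, hi = 0, len(self.bwt)
        for c in reversed(pattern):
            if c not in self.first:
                return 0
            lo = self.first[c] + self.occ[c][lo]
            hi = self.first[c] + self.occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo


if __name__ == "__main__":
    index = TwoStrandFMIndex("GATTACA")
    # A pattern and its reverse complement are found equally often.
    print(index.count("TTA"), index.count(reverse_complement("TTA")))
```

Since both strands are in the same index, a query and its reverse complement always return the same count, which is the property that makes such an index convenient for working with reads of unknown orientation.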
Authors: Louis du Plessis, Nives Skunca and Christophe Dessimoz
Title: The what, where, how and why of gene ontology – a primer for bioinformaticians
Journal: Briefings in Bioinformatics
Year/Issue: 2011, 12, 723-735
Summary: A nice review of the Gene Ontology, describing the advantages and pitfalls of using this classification tool.
The main topic of the 16th Human Genome Meeting (HGM) is "Genetics and Genomics in Personalised Medicine". The focus of this meeting is consistent with one of the aims of the Human Genome Organization, which is to foster the integration of genomic sciences in biology and medicine towards improving human health. The power of our current sequencing and genotyping technologies and their attendant analytical tools is providing remarkable precision and completeness in our understanding of the genetic causes of disease. The goal of this meeting is to explore the impact of next-generation genomic approaches on medicine and health.
Participants' reports: to come soon
The European Conference on Computational Biology is the key European computational biology event in 2012, uniting scientists working in a broad range of disciplines, including bioinformatics, computational biology, biology, medicine, and systems biology. One of its featured research areas will be sequencing technology and personalized medicine.
Participants' reports: to come soon
The purpose of this section is to provide additional materials, for example from SEQAHEAD meetings, presentations, reports, etc.
This is a link to Zotero