MaSuRCA Genome Assembler

The University of Maryland Assembly Group aims to create the best possible software for whole-genome assembly. We develop the MaSuRCA genome assembler, which can be used on assembly projects of all sizes, from bacterial genomes to mammalian genomes to large plant genomes. MaSuRCA has been used to assemble a variety of genomes de novo, sometimes improving on published genomes using added data, and sometimes creating the first publicly available draft genome for a species.

Super-Reads

The super-reads technique aims to improve genome assembly by replacing many short reads with fewer, longer sequences, without losing any information.
A super-read is an extension of a sequencing read. Replacing reads with super-reads can improve many kinds of assemblies. Although our assembler MaSuRCA uses super-reads, many other applications of super-reads are possible. Our software "masurca-superreads", part of the MaSuRCA distribution, converts Illumina paired-end reads into super-reads. Super-reads satisfy the following properties:

  • Each of the original reads is contained in a super-read.
  • Many of the original reads yield the same super-read, so using super-reads leads to a vastly reduced dataset.
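
The two properties above can be illustrated with a toy sketch in Python. It is not the masurca-superreads implementation. It assumes a simplified rule in which a read is extended one base at a time, in both directions, for as long as exactly one k-mer seen in the data continues it. The function names, the value of k, and the example sequence are all hypothetical, and reverse complements, sequencing errors, and paired-end information are ignored.

```python
def kmer_set(reads, k):
    """Collect every k-mer that occurs in any read."""
    return {r[i:i + k] for r in reads for i in range(len(r) - k + 1)}


def extend(seq, kmers, k, rightward):
    """Extend seq base by base while exactly one known k-mer continues it."""
    seen = set()  # k-mers already used, so a cycle cannot loop forever
    while True:
        edge = seq[-(k - 1):] if rightward else seq[:k - 1]
        candidates = [b for b in "ACGT"
                      if (edge + b if rightward else b + edge) in kmers]
        if len(candidates) != 1:   # dead end or ambiguous branch: stop
            return seq
        base = candidates[0]
        kmer = edge + base if rightward else base + edge
        if kmer in seen:
            return seq
        seen.add(kmer)
        seq = seq + base if rightward else base + seq


def super_read(read, kmers, k):
    """Toy super-read: the read extended uniquely to the right, then the left."""
    return extend(extend(read, kmers, k, True), kmers, k, False)


# Hypothetical example: overlapping 6-base reads from a short made-up sequence.
genome = "ATGGCGTACCTGA"
reads = [genome[i:i + 6] for i in range(len(genome) - 6 + 1)]
kmers = kmer_set(reads, 4)
super_reads = {super_read(r, kmers, 4) for r in reads}
print(len(reads), "reads ->", len(super_reads), "super-read(s)")
```

In this toy input every read is contained in a super-read, and all eight reads collapse to the same one, which is the kind of dataset reduction the second property describes.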

Super-reads can be used for large and small projects: