The newest version, Jellyfish 2.2.3, was released on July 1st, 2015.

Jellyfish mer counter

What is it?

Jellyfish is a tool for fast, memory-efficient counting of k-mers in DNA. A k-mer is a substring of length k, and counting the occurrences of all such substrings is a central step in many analyses of DNA sequence. Jellyfish can count k-mers quickly by using an efficient encoding of a hash table and by exploiting the "compare-and-swap" CPU instruction to increase parallelism.
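To make the definition concrete, here is a minimal Python sketch of what counting k-mers means. It is not how Jellyfish is implemented (it uses neither a compact hash table nor compare-and-swap); the function name count_kmers and the choice to skip windows containing characters other than A, C, G and T are assumptions made for this illustration.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every length-k substring (k-mer) of a DNA sequence.

    Illustrative only: a plain in-memory counter, not Jellyfish's method.
    """
    sequence = sequence.upper()
    counts = Counter()
    # Slide a window of width k over the sequence, one base at a time.
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        # Assumption of this sketch: skip windows with non-ACGT
        # characters (for example N).
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts


if __name__ == "__main__":
    for kmer, n in sorted(count_kmers("ACGTACGTAC", 3).items()):
        print(kmer, n)
```

A sequence of length n contains at most n - k + 1 k-mers, so the 10-base example above yields 8 windows of length 3.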

Jellyfish is a command-line program that reads FASTA and multi-FASTA files containing DNA sequences. It outputs its k-mer counts in a binary format, which can be translated into a human-readable text format using the "jellyfish dump" command. See the documentation below for more details.
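Once the counts are in text form, they are easy to load into a script. The sketch below assumes a two-column layout with one "k-mer count" pair per line, separated by whitespace. That layout is an assumption here; check the output of "jellyfish dump" on your installation before relying on it. The helper name parse_column_dump is hypothetical and not part of Jellyfish.

```python
def parse_column_dump(lines):
    """Parse text lines of the form '<k-mer> <count>' into a dict.

    Assumes a whitespace-separated two-column layout; blank lines
    are ignored.
    """
    counts = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        kmer, count = line.split()
        counts[kmer] = int(count)
    return counts


if __name__ == "__main__":
    # Hand-written sample lines, not real Jellyfish output.
    sample = ["ACG 2", "CGT 2", "", "GTA 1"]
    print(parse_column_dump(sample))
```

The function accepts any iterable of lines, so an open file object can be passed directly.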

Jellyfish is distributed as source code under the GPL license. Jellyfish is developed on Linux 64-bit (x86_64). It requires gcc version 4.4 or newer to compile. It is reported to compile on Linux with the clang compiler version 3.0, on MacOS X (Intel 64 bit) with gcc version 4.7, and on Microsoft Windows 7 with Cygwin and gcc.

The current version is version 2.0. The older version 1.1 is still available from the CBCB group at the University of Maryland. Unlike version 1.1, which was limited to k <= 31, the current version places no limit on the size of k-mers. Support for Quake has been dropped in the new version; use version 1.1 with Quake. The User guide gives some information on how to use Jellyfish and on the differences between the new and old versions.

Contact

For any questions or comments, contact Guillaume Marçais or Carl Kingsford.

Source

Compilation

The following sequence should be enough to compile:

    ./configure
    make
    sudo make install

On RedHat 5 and 6 (or CentOS), the default compiler is too old. One needs to install version 4.4 with:

    yum install gcc44-c++

and then run configure like this:

    ./configure CXX=g++44

To install in a different directory than the default /usr/local, pass the --prefix switch to configure. For example, to install into one's home directory, do:

    ./configure --prefix=$HOME

On MacOS, a recent version of gcc can be installed with MacPorts. Install with:

    sudo port install gcc49
    sudo port select --set gcc mp-gcc49

The first command can take a while to run. The second command is optional: CXX=g++49 can be passed to ./configure instead, as above.

Bindings to Ruby, Python and Perl

Passing one of the switches '--enable-ruby-binding', '--enable-python-binding' or '--enable-perl-binding' to configure triggers the compilation of bindings to the corresponding scripting language. The bindings make it possible to query the output of Jellyfish directly from these languages.

More documentation and examples are available on the GitHub page. Note that, if one uses the distribution on this page, SWIG and the '--enable-swig' switch are NOT necessary to build the bindings (they are necessary when building from the GitHub tree).

Change log

Version 2.2.3

  • Bug fix: ignore the quality filter for FASTA files

Version 2.2.0

  • SWIG bindings for Ruby, Python and Perl available.

Version 2.1.4

  • Added a SWIG directory with bindings to Ruby, Python and Perl. Still experimental
  • Added an example directory on how to use the library
  • Removed many unused files from previous version (1.x)
  • Various bug fixes

Version 2.1.3 (Bug fix)

  • Fixed compilation problem on CentOS.

Version 2.1.2

  • Added an interactive mode to the query subcommand. It enables spawning a 'jellyfish query -i' from a script and querying the database
  • Speed improvement to the query subcommand. The binary search is guided by the hash ordering

Version 2.1.1 (Bug fix)

  • Fixed compilation issues with gcc 4.4.
  • Fixed testing issues with gcc 4.8

Version 2.1.0

  • Added the stats subcommand, similar to the existing subcommand in version 1.x
  • Added filtering of input bases by their quality value. A similar feature exists in version 1.x, but the command line switches are not compatible