Tuesday, November 4, 2014

Preview and preprint: CNVkit, copy number detection for targeted sequencing

I've posted a preprint of the CNVkit manuscript on bioRxiv. If you think this software or method might suit your needs, please take a look and let me know what you think of it!

What is CNVkit?

CNVkit is a software toolkit for detecting and visualizing germline copy number variants and somatic copy number alterations in targeted or whole-exome DNA sequencing data. (Source code | Documentation)

The method implemented in CNVkit takes advantage of the sparse, nonspecifically captured off-target reads present in hybrid capture sequencing output to supplement on-target read depths. The program also applies a series of normalizations and bias corrections, so it can call CNVs accurately with or without a normal-sample copy number reference. The overall resolution and copy ratio values are very close to those obtained with 180K array CGH.
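To make the arithmetic concrete, here is a minimal Python sketch, assuming per-bin read depths have already been counted. It computes per-bin log2 ratios against a reference, falls back to a "flat" reference when no normal sample is available, median-centers the result, and then merges on- and off-target bins by position. On- and off-target bins are handled separately here because off-target bins have far lower depths. This is an illustration, not CNVkit's actual implementation: the function names, the pseudocount, and the flat-reference rule are hypothetical, and the bias corrections mentioned above are omitted.

```python
import math
from statistics import median


def log2_copy_ratios(sample_depths, reference_depths=None, pseudocount=0.5):
    """Per-bin log2(sample / reference), median-centered so a typical bin is 0.

    Hypothetical flat-reference rule: with no normal-sample reference,
    expect every bin to have the sample's own median depth.
    """
    if reference_depths is None:
        reference_depths = [median(sample_depths)] * len(sample_depths)
    # The pseudocount avoids log2(0) for bins with no reads.
    raw = [math.log2((s + pseudocount) / (r + pseudocount))
           for s, r in zip(sample_depths, reference_depths)]
    center = median(raw)
    return [x - center for x in raw]


def combine_bins(on_target, off_target):
    """Merge on- and off-target bins, each a list of (chrom, start, log2)
    tuples, into one list ordered by chromosome name, then start.

    Note: chromosome names sort as strings here ("chr10" before "chr2").
    """
    return sorted(on_target + off_target, key=lambda b: (b[0], b[1]))
```

Computing ratios for each bin type before merging keeps the low-depth off-target bins from being judged against the much higher on-target depths.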

We have used CNVkit at UCSF to assess clinical samples for several research projects over the past year.

Putting it in your pipeline

See the Quick Start page for basic usage. The software package is modular, so in addition to the simple "batch" command, the underlying commands can be run individually to fit your workflow.
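As a rough sketch of running the steps individually from a Python-based pipeline, the helper below builds one command line per step for a single sample: on- and off-target coverage, normalization against a reference ("fix"), and segmentation. The subcommand names and argument order follow CNVkit's current documentation and may differ from the release described in this post, so check `cnvkit.py <command> --help`. The helper names and all file names are hypothetical placeholders.

```python
import subprocess


def cnvkit_commands(sample, target_bed, antitarget_bed, reference_cnn):
    """Build argv lists for CNVkit's per-sample steps (hypothetical helper).

    Output file names derived from `sample` are placeholders.
    """
    bam = f"{sample}.bam"
    target_cov = f"{sample}.targetcoverage.cnn"
    antitarget_cov = f"{sample}.antitargetcoverage.cnn"
    return [
        # Read depths in on-target and off-target bins
        ["cnvkit.py", "coverage", bam, target_bed, "-o", target_cov],
        ["cnvkit.py", "coverage", bam, antitarget_bed, "-o", antitarget_cov],
        # Normalize against the reference to get bin-level copy ratios
        ["cnvkit.py", "fix", target_cov, antitarget_cov, reference_cnn,
         "-o", f"{sample}.cnr"],
        # Segment the copy ratios
        ["cnvkit.py", "segment", f"{sample}.cnr", "-o", f"{sample}.cns"],
    ]


def run_all(commands):
    """Run each step in order, raising CalledProcessError on the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

Building the commands as data rather than running them immediately lets a workflow manager schedule the two coverage steps in parallel, or across many samples, before the fix and segment steps.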

I've tried to make CNVkit compatible with other software and easy to integrate into sequencing analysis pipelines. The following integrations are currently supported or in development:
  • bcbio-nextgen -- in progress
  • Galaxy -- a basic wrapper is in the development Tool Shed
  • THetA2 -- CNVkit segmentation output can be used directly as input to THetA
  • Integrative Genomics Viewer -- export segments as SEG, then load in IGV to view tracks as a heatmap
  • BioDiscovery Nexus Copy Number -- export files to the Nexus "basic" format
  • Java TreeView -- export tabular CDT or .jtv files, then load them in JTV for a microarray-like viewing experience
If you would like CNVkit to work with another existing program or support another standard output format, or if you just want some help getting set up, please let me know on SeqAnswers.
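For readers curious what the SEG export for IGV contains, here is a small Python sketch that writes segments in the tab-separated SEG layout. It writes a header line followed by one row per segment: sample ID, chromosome, start, end, number of bins, and mean log2 ratio. IGV reads the last column as the value to display. The column names follow the common DNAcopy-style header. The segment tuple layout and function names are hypothetical, coordinate conventions (0- vs 1-based) are not handled, and in practice CNVkit's own export command produces this file for you.

```python
import csv
import io

SEG_HEADER = ["ID", "chrom", "loc.start", "loc.end", "num.mark", "seg.mean"]


def write_seg(sample_id, segments, out):
    """Write segments as tab-separated SEG rows to a text stream.

    segments: iterable of (chrom, start, end, num_bins, log2_ratio) tuples
    (a hypothetical layout chosen for this sketch).
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(SEG_HEADER)
    for chrom, start, end, n_bins, log2 in segments:
        writer.writerow([sample_id, chrom, start, end, n_bins, f"{log2:.4f}"])


def seg_text(sample_id, segments):
    """Return the SEG contents as a string (handy for small exports or tests)."""
    buf = io.StringIO()
    write_seg(sample_id, segments, buf)
    return buf.getvalue()
```

Because SEG rows carry the sample ID in the first column, rows from several samples can be concatenated under one header, which is what lets IGV stack them as heatmap tracks.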