What is CNVkit?
CNVkit is a software toolkit for detecting and visualizing germline copy number variants and somatic copy number alterations in targeted or whole-exome DNA sequencing data.

The method implemented in CNVkit takes advantage of the sparse, nonspecifically captured off-target reads present in hybrid capture sequencing output to supplement on-target read depths. The program also applies a series of normalizations and bias corrections, so it can call CNVs accurately with or without a copy number reference built from normal samples. The overall resolution and copy ratio values are very close to those obtained with 180K array CGH.
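To make the idea of "normalizing against a reference" concrete, here is a deliberately simplified sketch. It is not CNVkit's implementation. It assumes each bin already has a positive read depth in both the sample and the reference, scales each profile by its own total so overall sequencing depth cancels out, takes log2 ratios, and median-centers them. On-target and off-target bins are centered separately because their depths differ by orders of magnitude, then interleaved by position. The function names `centered_log2` and `combine` are hypothetical.

```python
import math
from statistics import median


def centered_log2(sample, reference):
    """Median-centered log2(sample/reference) for one set of bins.

    Simplified illustration only: assumes equal-length lists of
    positive read depths for the same bins.
    """
    if len(sample) != len(reference) or not sample:
        raise ValueError("need equal-length, non-empty depth lists")
    s_total, r_total = sum(sample), sum(reference)
    # Scale each profile by its own total so overall sequencing depth cancels.
    raw = [math.log2((s / s_total) / (r / r_total))
           for s, r in zip(sample, reference)]
    mid = median(raw)
    # Shift so the median bin is treated as copy-neutral (log2 ratio 0).
    return [x - mid for x in raw]


def combine(on_target, off_target):
    """Interleave on- and off-target bins by genomic position.

    Each bin is a (chrom, start, log2_ratio) tuple; chromosome names are
    compared as plain strings, which is enough for this illustration.
    """
    return sorted(on_target + off_target, key=lambda b: (b[0], b[1]))


if __name__ == "__main__":
    # On-target bins are deep, off-target bins are shallow; center each set separately.
    on = centered_log2([100, 200, 100, 100], [100, 100, 100, 100])
    off = centered_log2([5, 10, 5], [5, 5, 5])
    print([round(x, 3) for x in on])   # [0.0, 1.0, 0.0, 0.0]
    print([round(x, 3) for x in off])  # [0.0, 1.0, 0.0]
```

In both sets, the bin with twice the relative depth ends up at a log2 ratio of 1 (one extra copy relative to a diploid baseline would be about 0.58, so this toy bin represents a doubling).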
We have used CNVkit at UCSF to assess clinical samples for several research projects over the past year.
Putting it in your pipeline
See the Quick Start page for basic usage. The software package is modular, so in addition to the simple "batch" calling style, the underlying commands can be run directly to support your workflow.

I've attempted to make CNVkit compatible with other software and easy to integrate into sequencing analysis pipelines. The following are currently supported or in development:
- bcbio-nextgen -- in progress
- Galaxy -- a basic wrapper is in the development Tool Shed
- THetA2 -- CNVkit segmentation output can be used directly as input to THetA
- Integrative Genomics Viewer -- export segments as SEG, then load in IGV to view tracks as a heatmap
- BioDiscovery Nexus Copy Number -- export files to the Nexus "basic" format
- Java TreeView -- export CDT or .jtv tabular files, then load in JTV for a microarray-like viewing experience
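For readers wiring segments into IGV by hand, the sketch below shows the general layout of a SEG file: tab-delimited, one header row, one row per segment, with the sample ID first and the segment value last. It is only an illustration of the format, not CNVkit's exporter; the column names follow the common `ID / chrom / loc.start / loc.end / num.mark / seg.mean` convention, and the `to_seg` function and its input tuple layout are hypothetical. Coordinate conventions (0- vs 1-based starts) may differ from what CNVkit writes.

```python
import csv
import io

# Conventional SEG header; IGV reads the first column as the sample ID
# and the last column as the value to display.
SEG_HEADER = ["ID", "chrom", "loc.start", "loc.end", "num.mark", "seg.mean"]


def to_seg(sample_id, segments):
    """Render segments as SEG text.

    segments: iterable of (chrom, start, end, num_bins, log2_ratio) tuples.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(SEG_HEADER)
    for chrom, start, end, n_bins, log2 in segments:
        # Round the copy ratio to keep the file compact and readable.
        writer.writerow([sample_id, chrom, start, end, n_bins, f"{log2:.4f}"])
    return buf.getvalue()


if __name__ == "__main__":
    print(to_seg("tumor1", [("chr1", 1000, 50000, 42, -0.51234),
                            ("chr1", 50001, 90000, 30, 0.2)]), end="")
```

Several samples can be concatenated into one SEG file (one header, rows from each sample); IGV then shows each sample as its own heatmap row.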