Credits
vcfkit exists because of decades of work by others.
htslib and bcftools
htslib and bcftools are the reference implementations for VCF/BCF processing. They were created and are maintained at the Wellcome Sanger Institute, with primary authorship by Heng Li (original author) and Petr Danecek (bcftools lead), alongside hundreds of contributors over 15+ years.
vcfkit’s normalization behavior was developed by reading bcftools source. Differential tests validate against bcftools output — if vcfkit and bcftools diverge, vcfkit is wrong by default.
License: MIT/BSD
noodles
noodles by Michael Macias provides the pure-Rust VCF, BCF, FASTA, and chain file I/O primitives that vcfkit builds on. Without noodles this project would not exist in its current form.
License: MIT
Tan, Abecasis, Kang 2015
Tan, Abecasis, Kang 2015: “Unified representation of genetic variants,” Bioinformatics 31(13):2202–2204. This paper defines the normalization algorithm implemented in vcfkit normalize.
UCSC Genome Browser
The UCSC Genome Browser provides the chain files and reference FASTAs used by vcfkit liftover.
1000 Genomes Project
The 1000 Genomes Project phase 3 chr22 dataset is used for differential testing and benchmarking. Data available at the EBI FTP.
AI assistance
Portions of this codebase were written with assistance from Claude (Anthropic). The AI generated code; a human verified correctness, reviewed diffs, and owns the result.
Specifically: Claude wrote the majority of the Rust implementation under direction, including the normalize algorithm, the filter expression parser, and the WASM wrappers. Every non-trivial change was validated by differential tests against bcftools or by explicit reasoning about correctness. All algorithm implementations (Tan 2015 left-alignment, multi-allelic splitting, chain file parsing) were derived from reading bcftools source and the referenced papers — not from AI hallucination.
AI assistance is disclosed here because transparency about how code was produced is the right thing to do, especially for tooling used in research.
vcfkit’s own contribution
- Modern CLI UX (error messages with what/where/how-to-fix, progress bars)
- Single-binary distribution via cargo install or pre-built release
- Fast paths via raw-line parsing (~4× faster than bcftools on SNP-heavy VCFs)
- Browser-native WASM port (your VCF stays in your tab)
- Natural-language filter queries (Phase 3, not yet shipped)
vcfkit does not replace bcftools. It is a fast companion for three operations.