[vcfkit]
Fast VCF operations. Validated against bcftools. Single binary.
Three operations every VCF pipeline needs (normalize, liftover, filter), measurably faster than bcftools on hot paths, in one binary with no htslib dependency.
cargo install vcfkit-cli
What it is
vcfkit implements three operations every VCF pipeline needs (normalize, liftover, and filter), rewritten in Rust for speed and correctness. It ships as one static binary with no htslib dependency, is measurably faster than bcftools on hot paths, and is validated against bcftools on real 1000 Genomes data.
It doesn't replace bcftools. It does three operations well, plus an optional natural-language filter that translates plain English into a deterministic expression. The same core runs in your browser via WebAssembly, so nothing in the browser demo is uploaded anywhere.
Operations
- normalize: Left-align indels and split multi-allelic sites. Validates against a reference genome. A fast path handles biallelic SNPs/MNPs without full parsing (~80% of records in a typical VCF).
- liftover: Convert between genome builds (hg19, hg38, T2T-CHM13) using UCSC chain files. Handles strand flips and b37/UCSC contig name mismatches (e.g. 1 vs chr1).
- filter: Select variants with expressions over INFO, FORMAT, CHROM, POS, QUAL, and FILTER fields. Or describe the filter in plain English with --ask and review the generated expression before it runs.
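The normalize description can be illustrated with a minimal Rust sketch. It is not vcfkit's source: Variant, ref_matches, left_align, and is_fast_path are hypothetical names, the reference is an in-memory byte slice rather than an indexed FASTA, and the fast-path criteria (one ALT, REF and ALT of equal length, only A/C/G/T) are an assumption about what "biallelic SNP/MNP" means here. Left-alignment uses the common trim-and-extend loop: drop a shared trailing base, and whenever an allele becomes empty, prepend the reference base to its left.

```rust
/// A VCF-style variant: 1-based POS plus alleles (REF first, then ALTs).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Variant {
    pos: usize,
    alleles: Vec<Vec<u8>>,
}

/// Check that REF matches the reference sequence at POS (reference is indexed from 0).
fn ref_matches(reference: &[u8], v: &Variant) -> bool {
    let start = v.pos - 1;
    reference.get(start..start + v.alleles[0].len()) == Some(&v.alleles[0][..])
}

/// Left-align and trim a variant with the trim-and-extend loop.
/// Assumes REF differs from every ALT and the variant does not need to move past position 1.
fn left_align(reference: &[u8], mut v: Variant) -> Variant {
    loop {
        let mut changed = false;
        // If every allele ends in the same base, drop that base.
        let last = v.alleles[0].last().copied();
        if v.alleles.iter().all(|a| !a.is_empty() && a.last().copied() == last) {
            for a in v.alleles.iter_mut() {
                a.pop();
            }
            changed = true;
        }
        // If an allele became empty, prepend the reference base immediately to the left.
        if v.alleles.iter().any(|a| a.is_empty()) {
            if v.pos <= 1 {
                break; // cannot extend past the contig start in this sketch
            }
            v.pos -= 1;
            let base = reference[v.pos - 1];
            for a in v.alleles.iter_mut() {
                a.insert(0, base);
            }
            changed = true;
        }
        if !changed {
            break;
        }
    }
    // Drop shared leading bases while every allele keeps at least one base.
    while v.alleles.iter().all(|a| a.len() >= 2 && a[0] == v.alleles[0][0]) {
        for a in v.alleles.iter_mut() {
            a.remove(0);
        }
        v.pos += 1;
    }
    v
}

/// Hypothetical fast-path test on a raw VCF line: looks only at the REF and ALT columns,
/// never at INFO or FORMAT.
fn is_fast_path(line: &str) -> bool {
    let bases = |s: &str| !s.is_empty() && s.bytes().all(|b| matches!(b, b'A' | b'C' | b'G' | b'T'));
    let mut cols = line.split('\t').skip(3);
    match (cols.next(), cols.next()) {
        (Some(r), Some(a)) => !a.contains(',') && r.len() == a.len() && bases(r) && bases(a),
        _ => false,
    }
}

fn main() {
    // Reference positions 1..=6 are G A A A A T; a 1-bp deletion of an A written at POS 4.
    let reference = b"GAAAAT";
    let del = Variant { pos: 4, alleles: vec![b"AA".to_vec(), b"A".to_vec()] };
    assert!(ref_matches(reference, &del));
    let norm = left_align(reference, del);
    println!(
        "POS={} REF={} ALT={}",
        norm.pos,
        String::from_utf8_lossy(&norm.alleles[0]),
        String::from_utf8_lossy(&norm.alleles[1])
    );
    println!("fast path: {}", is_fast_path("22\t100\t.\tA\tG\t50\tPASS\t."));
}
```

Run as written, the deletion in the poly-A run moves from POS 4 to POS 1 and prints as POS=1 REF=GA ALT=G. Splitting multi-allelic sites is not shown; one approach is to emit one record per REF/ALT pair and run the same left-alignment on each.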
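For liftover, the core step is mapping a position through the aligned blocks of a UCSC chain. The sketch below assumes a single chain held in memory and a linear scan over its blocks (a real tool would index many chains); Chain, parse_chain, map_pos, revcomp, and to_ucsc are hypothetical names. Chain coordinates are 0-based, and when the target strand is "-" they count along the reverse-complemented target, so the mapped position is converted back and the allele is reverse-complemented.

```rust
/// One UCSC chain (source = "t" fields, target = "q" fields), reduced to its aligned blocks.
#[derive(Debug)]
struct Chain {
    t_name: String,
    q_name: String,
    q_size: u64,
    q_minus: bool,
    blocks: Vec<(u64, u64, u64)>, // (source start, target start, length), 0-based
}

/// Parse one chain: a header line, then "size dt dq" lines, ending with a lone "size" line.
fn parse_chain(text: &str) -> Option<Chain> {
    let mut lines = text.lines();
    let h: Vec<&str> = lines.next()?.split_whitespace().collect();
    if h.len() < 12 || h[0] != "chain" {
        return None;
    }
    let mut t: u64 = h[5].parse().ok()?; // tStart
    let mut q: u64 = h[10].parse().ok()?; // qStart
    let mut blocks = Vec::new();
    for line in lines {
        let f: Vec<u64> = line
            .split_whitespace()
            .map(|x| x.parse::<u64>().ok())
            .collect::<Option<_>>()?;
        if f.is_empty() {
            break; // a blank line ends the chain
        }
        blocks.push((t, q, f[0]));
        if f.len() < 3 {
            break; // the final block has no trailing gap
        }
        t += f[0] + f[1];
        q += f[0] + f[2];
    }
    Some(Chain {
        t_name: h[2].to_string(),
        q_name: h[7].to_string(),
        q_size: h[8].parse().ok()?,
        q_minus: h[9] == "-",
        blocks,
    })
}

/// Map a 0-based source position to (target contig, 0-based target position, strand flipped).
/// Returns None when the position falls in a gap or outside the chain.
fn map_pos(chain: &Chain, pos: u64) -> Option<(&str, u64, bool)> {
    let &(t, q, _) = chain
        .blocks
        .iter()
        .find(|&&(start, _, len)| pos >= start && pos < start + len)?;
    let q_pos = q + (pos - t);
    // On a "-" chain, q coordinates count along the reverse-complemented target.
    let fwd = if chain.q_minus { chain.q_size - 1 - q_pos } else { q_pos };
    Some((chain.q_name.as_str(), fwd, chain.q_minus))
}

fn revcomp(seq: &str) -> String {
    seq.bytes()
        .rev()
        .map(|b| match b {
            b'A' => 'T',
            b'C' => 'G',
            b'G' => 'C',
            b'T' => 'A',
            _ => 'N',
        })
        .collect()
}

/// b37 names contigs "1".."22", "X", "Y", "MT"; UCSC uses "chr1", ..., "chrM".
fn to_ucsc(contig: &str) -> String {
    if contig.starts_with("chr") {
        contig.to_string()
    } else if contig == "MT" {
        "chrM".to_string()
    } else {
        format!("chr{contig}")
    }
}

fn main() {
    // Toy chain: 100 aligned bases, a 10-base source-only gap, then 890 more aligned bases,
    // on the minus strand of a 990-base target.
    let text = "chain 1000 chr1 1000 + 0 1000 chr1 990 - 0 990 2\n100 10 0\n890\n";
    let chain = parse_chain(text).expect("valid chain");
    let vcf_pos: u64 = 201; // VCF POS is 1-based
    match map_pos(&chain, vcf_pos - 1) {
        Some((contig, p, flipped)) => {
            let alt = if flipped { revcomp("G") } else { "G".to_string() };
            println!("{}:{} -> {}:{} ALT={}", chain.t_name, vcf_pos, contig, p + 1, alt);
        }
        None => println!("{}:{} unmapped", chain.t_name, vcf_pos),
    }
    println!("b37 contig 1 -> {}", to_ucsc("1"));
}
```

Only single-base positions are handled; an indel that lands on a minus-strand chain also needs its anchor base and position re-derived, which this sketch skips.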
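A filter expression such as INFO/AF < 0.01 && FILTER == "PASS" can be evaluated as a small syntax tree over each record. This is an illustrative sketch, not vcfkit's expression engine: the Expr and Record types are hypothetical, the parser is omitted (the tree is built by hand), only <, > and && are modeled, multi-valued INFO fields use their first value, and a missing value is treated as not matching, which may differ from vcfkit's or bcftools' semantics.

```rust
use std::collections::HashMap;

/// A tiny, hypothetical AST for filter expressions.
enum Expr {
    InfoLt(String, f64), // INFO/<key> < value
    InfoGt(String, f64), // INFO/<key> > value
    FilterEq(String),    // FILTER == "<value>"
    And(Box<Expr>, Box<Expr>),
}

/// The parts of a VCF data line this sketch needs.
struct Record<'a> {
    filter: &'a str,
    info: HashMap<&'a str, &'a str>,
}

fn parse_record(line: &str) -> Option<Record<'_>> {
    let cols: Vec<&str> = line.split('\t').collect();
    if cols.len() < 8 {
        return None;
    }
    // INFO is key=value;key=value;FLAG, and flags get an empty value.
    let info: HashMap<&str, &str> = cols[7]
        .split(';')
        .filter(|kv| *kv != ".")
        .map(|kv| kv.split_once('=').unwrap_or((kv, "")))
        .collect();
    Some(Record { filter: cols[6], info })
}

/// First numeric value of an INFO key; None if missing or not a number (e.g. ".").
fn info_f64(r: &Record, key: &str) -> Option<f64> {
    r.info.get(key)?.split(',').next()?.parse().ok()
}

fn eval(e: &Expr, r: &Record) -> bool {
    match e {
        Expr::InfoLt(k, x) => info_f64(r, k).map_or(false, |v| v < *x),
        Expr::InfoGt(k, x) => info_f64(r, k).map_or(false, |v| v > *x),
        Expr::FilterEq(s) => r.filter == s.as_str(),
        Expr::And(a, b) => eval(a, r) && eval(b, r),
    }
}

fn main() {
    // Hand-built tree for: INFO/AF < 0.01 && FILTER == "PASS"
    let expr = Expr::And(
        Box::new(Expr::InfoLt("AF".into(), 0.01)),
        Box::new(Expr::FilterEq("PASS".into())),
    );
    let lines = [
        "22\t100\t.\tA\tG\t50\tPASS\tAF=0.002;DP=30",
        "22\t200\t.\tC\tT\t50\tPASS\tAF=0.3",
        "22\t300\t.\tG\tA\t5\tLowQual\tAF=0.001",
    ];
    for line in lines {
        if let Some(rec) = parse_record(line) {
            if eval(&expr, &rec) {
                println!("{line}");
            }
        }
    }
    // The --ask example expression: INFO/EUR_AF > 0.05 && INFO/AFR_AF < 0.01
    let ask = Expr::And(
        Box::new(Expr::InfoGt("EUR_AF".into(), 0.05)),
        Box::new(Expr::InfoLt("AFR_AF".into(), 0.01)),
    );
    let rec = parse_record("22\t400\t.\tT\tC\t50\tPASS\tEUR_AF=0.12;AFR_AF=0.004").expect("8 columns");
    println!("EUR common, AFR rare: {}", eval(&ask, &rec));
}
```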
vcfkit normalize -r ref.fa input.vcf
vcfkit liftover --chain hg19ToHg38.chain.gz input.vcf
vcfkit filter 'INFO/AF < 0.01 && FILTER == "PASS"' input.vcf
Benchmarks
Benchmark chart vs bcftools: normalize (biallelic SNP/MNP fast path) · filter (INFO/AF expression on 1.1M variants)
Measured on 1000 Genomes chr22 · 1,103,547 variants · macOS aarch64 · bcftools 1.23.1. Standard normalize (full parse path) is 0.43× bcftools; only the fast path reaches 4×.
Ask in plain English
Describe the filter you want. vcfkit uses Claude to translate it into a deterministic expression, shows you the expression, and asks for confirmation before running it. Your VCF data never leaves your machine; only the header schema is sent to the API.
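The privacy claim rests on reading only the header. A minimal sketch, assuming the "schema" means the ##INFO, ##FORMAT and ##FILTER definition lines and that sample names on the #CHROM line are withheld (both are assumptions; the exact fields vcfkit sends are not specified here). The hypothetical header_schema function stops at the first line that does not start with "##", so no variant record is ever collected. It expects uncompressed text; a .vcf.gz input would need a gzip/BGZF decoder in front.

```rust
use std::io::{self, BufRead};

/// Hypothetical: collect the field-definition lines of a VCF header.
fn header_schema<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    let mut schema = Vec::new();
    for line in reader.lines() {
        let line = line?;
        // The #CHROM line (or any data line) ends the header; no data line is collected.
        if !line.starts_with("##") {
            break;
        }
        if ["##INFO=", "##FORMAT=", "##FILTER="].iter().any(|p| line.starts_with(*p)) {
            schema.push(line);
        }
    }
    Ok(schema)
}

fn main() {
    let vcf = [
        "##fileformat=VCFv4.2",
        "##INFO=<ID=AF,Number=A,Type=Float,Description=\"Allele frequency\">",
        "##FILTER=<ID=LowQual,Description=\"Low quality\">",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "22\t100\t.\tA\tG\t50\tPASS\tAF=0.002",
    ]
    .join("\n");
    for line in header_schema(vcf.as_bytes()).expect("in-memory read") {
        println!("{line}");
    }
}
```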
$ vcfkit filter --ask \
"common variants in Europeans but rare in Africans" \
input.vcf
Expression: INFO/EUR_AF > 0.05 && INFO/AFR_AF < 0.01
Confidence: 95%
Run this filter? [Y/n/edit]
Requires ANTHROPIC_API_KEY.
Translations below 50% confidence require explicit review.
Correctness
Every operation is validated against bcftools in differential tests on real 1000 Genomes chr22 data. Tests run nightly in CI. Known differences are documented.
- 100+ unit and integration tests
- Differential tests against bcftools on 1.1M real variants
- WASM parity tests — browser and CLI produce identical output
- Nightly CI against bcftools on 1000 Genomes data
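A differential test boils down to running both tools on the same input and comparing the records they emit. The sketch below shows only the comparison step and is not vcfkit's test harness: the hypothetical first_difference function skips header lines (which legitimately differ between tools, e.g. provenance lines) and reports the first data line that disagrees, counting records from 0. It compares lines verbatim; a real harness would also have to account for the documented known differences.

```rust
/// Return (record index, left line, right line) for the first data line that differs.
/// None means both inputs have identical data lines.
fn first_difference<'a>(
    left: &'a str,
    right: &'a str,
) -> Option<(usize, Option<&'a str>, Option<&'a str>)> {
    // Header lines start with '#'; only data lines are compared.
    let data = |s: &'a str| s.lines().filter(|l| !l.starts_with('#'));
    let (mut a, mut b) = (data(left), data(right));
    let mut i = 0;
    loop {
        match (a.next(), b.next()) {
            (None, None) => return None,
            (x, y) if x != y => return Some((i, x, y)),
            _ => i += 1,
        }
    }
}

fn main() {
    let ours = "##source=toolA\n#CHROM\tPOS\n22\t100\n22\t200\n";
    let theirs = "##source=toolB\n#CHROM\tPOS\n22\t100\n22\t201\n";
    match first_difference(ours, theirs) {
        None => println!("outputs agree"),
        Some((i, a, b)) => println!("record {i}: ours={a:?} theirs={b:?}"),
    }
}
```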
Not validated for clinical use. vcfkit is research and pipeline tooling only.
Install
# Cargo
cargo install vcfkit-cli
# Prebuilt binary (macOS, Linux, Windows)
curl -fsSL https://vcfkit.dev/install.sh | sh
Get in touch
Questions, bug reports, and feedback welcome.
- GitHub Issues — bug reports and feature requests
- GitHub Discussions — questions and ideas
- [email protected]