vcfkit

Fast VCF operations. Validated against bcftools. Single binary.

Normalize, liftover, and filter: three operations every VCF pipeline needs, in one binary with no htslib dependency, faster than bcftools on hot paths.

cargo install vcfkit-cli

or see all install options →

Try it

What it is

vcfkit implements three operations every VCF pipeline needs (normalize, liftover, and filter) in Rust, for speed and correctness. It ships as one static binary with no htslib dependency, runs faster than bcftools on hot paths, and is validated against bcftools on real 1000 Genomes data.

It doesn't replace bcftools. It does three operations well, plus an optional natural-language filter that translates plain English into a deterministic expression. The same core runs in your browser via WebAssembly, so nothing you load into the demo above is uploaded anywhere.

Operations

normalize
Left-align indels and split multi-allelic sites, validating records against a reference genome. A fast path handles biallelic SNPs and MNPs (roughly 80% of records in a typical VCF) without full parsing.
vcfkit normalize -r ref.fa input.vcf
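The two steps above can be sketched in Rust. This is a minimal, hypothetical illustration of the common trim-and-extend approach to left-alignment, not vcfkit's code. `Variant`, `split_multiallelic`, `left_align`, and `normalize` are names invented here. The sketch works on alleles only and does not re-index INFO/FORMAT values when splitting.

```rust
// Hypothetical sketch, not vcfkit's implementation.
// Positions are 1-based, as in VCF; `chrom` is the reference contig as ASCII bytes.

#[derive(Debug, Clone, PartialEq)]
struct Variant {
    pos: usize,
    reference: String,
    alts: Vec<String>,
}

/// Split a multi-allelic site into one biallelic record per ALT allele.
/// (A real splitter must also re-index per-allele INFO/FORMAT values such as AF or AD.)
fn split_multiallelic(v: &Variant) -> Vec<Variant> {
    v.alts
        .iter()
        .map(|alt| Variant { pos: v.pos, reference: v.reference.clone(), alts: vec![alt.clone()] })
        .collect()
}

/// Left-align and trim one biallelic variant against the reference.
fn left_align(v: &Variant, chrom: &[u8]) -> Variant {
    let mut pos = v.pos;
    let mut r = v.reference.as_bytes().to_vec();
    let mut a = v.alts[0].as_bytes().to_vec();
    loop {
        let mut changed = false;
        // Drop a shared trailing base.
        if !r.is_empty() && !a.is_empty() && r.last() == a.last() {
            r.pop();
            a.pop();
            changed = true;
        }
        // If an allele is now empty, pad both alleles on the left with the preceding reference base.
        if (r.is_empty() || a.is_empty()) && pos > 1 {
            pos -= 1;
            let base = chrom[pos - 1]; // 1-based `pos` -> 0-based index
            r.insert(0, base);
            a.insert(0, base);
            changed = true;
        }
        if !changed {
            break;
        }
    }
    // Drop shared leading bases while each allele keeps at least one base.
    while r.len() >= 2 && a.len() >= 2 && r[0] == a[0] {
        r.remove(0);
        a.remove(0);
        pos += 1;
    }
    Variant {
        pos,
        reference: String::from_utf8(r).expect("ASCII alleles"),
        alts: vec![String::from_utf8(a).expect("ASCII alleles")],
    }
}

/// Normalize = split, then left-align each biallelic record.
fn normalize(v: &Variant, chrom: &[u8]) -> Vec<Variant> {
    split_multiallelic(v).iter().map(|b| left_align(b, chrom)).collect()
}

fn main() {
    // Reference contig with a CA repeat at positions 2-7: T C A C A C A G
    let chrom = b"TCACACAG";
    // A right-shifted deletion of one "CA" unit.
    let del = Variant { pos: 5, reference: "ACA".into(), alts: vec!["A".into()] };
    // prints [Variant { pos: 1, reference: "TCA", alts: ["T"] }]
    println!("{:?}", normalize(&del, chrom));
}
```

Splitting first means the left-alignment loop only ever handles one REF/ALT pair, which keeps it simple.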
liftover
Convert between genome builds (hg19, hg38, T2T-CHM13) using UCSC chain files. Handles strand flips and contig naming mismatches between b37 and UCSC conventions (1 vs chr1).
vcfkit liftover --chain hg19ToHg38.chain.gz input.vcf
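Chain-file liftover of a single base can be sketched as follows. The sketch assumes one chain has already been selected for the variant's contig and scans its aligned blocks linearly; a production tool would index the blocks. `parse_chain`, `lift`, `reverse_complement`, and `to_ucsc` are hypothetical helpers, not vcfkit's API. Coordinates follow the UCSC chain format (0-based, half-open).

```rust
// Hypothetical sketch: single-chain, linear-scan liftover of one base.

/// One ungapped block: source [t_start, t_start + size) maps to target [q_start, q_start + size).
struct Block {
    t_start: u64,
    q_start: u64,
    size: u64,
}

struct Chain {
    q_name: String,
    q_size: u64,
    q_minus: bool,
    blocks: Vec<Block>,
}

/// Parse one chain: a header line, then "size dt dq" lines, then a final "size" line.
fn parse_chain(text: &str) -> Option<Chain> {
    let mut lines = text.lines().filter(|l| !l.trim().is_empty());
    // chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
    let h: Vec<&str> = lines.next()?.split_whitespace().collect();
    if h.len() < 12 || h[0] != "chain" {
        return None;
    }
    let (mut t, mut q) = (h[5].parse::<u64>().ok()?, h[10].parse::<u64>().ok()?);
    let mut blocks = Vec::new();
    for line in lines {
        let n = line
            .split_whitespace()
            .map(|s| s.parse::<u64>().ok())
            .collect::<Option<Vec<u64>>>()?;
        let size = *n.first()?;
        blocks.push(Block { t_start: t, q_start: q, size });
        if n.len() == 3 {
            t += size + n[1]; // dt: gap on the source build
            q += size + n[2]; // dq: gap on the target build
        }
    }
    Some(Chain { q_name: h[7].to_string(), q_size: h[8].parse().ok()?, q_minus: h[9] == "-", blocks })
}

/// Lift a 0-based source position; None if it falls in a gap or outside the chain.
fn lift(chain: &Chain, pos0: u64) -> Option<(&str, u64, bool)> {
    let b = chain.blocks.iter().find(|b| pos0 >= b.t_start && pos0 < b.t_start + b.size)?;
    let q = b.q_start + (pos0 - b.t_start);
    // On a "-" chain, q counts along the reverse strand; convert to forward coordinates.
    let fwd = if chain.q_minus { chain.q_size - 1 - q } else { q };
    Some((chain.q_name.as_str(), fwd, chain.q_minus))
}

/// Reverse-complement an allele when the target lies on the opposite strand.
fn reverse_complement(allele: &str) -> String {
    allele
        .chars()
        .rev()
        .map(|c| match c {
            'A' => 'T',
            'C' => 'G',
            'G' => 'C',
            'T' => 'A',
            other => other,
        })
        .collect()
}

/// Hypothetical helper: b37-style names ("1", "X", "MT") to UCSC style ("chr1", "chrX", "chrM").
fn to_ucsc(name: &str) -> String {
    match name {
        n if n.starts_with("chr") => n.to_string(),
        "MT" => "chrM".to_string(),
        n => format!("chr{n}"),
    }
}

fn main() {
    // Toy chain: source [0, 50) -> target [10, 60), a 5-base gap on both sides, then [55, 100) -> [65, 110).
    let chain = parse_chain("chain 1000 chr1 1000 + 0 100 chr1 1000 + 10 110 1\n50 5 5\n45\n").unwrap();
    println!("{:?}", lift(&chain, 20)); // prints Some(("chr1", 30, false))
    println!("{:?}", lift(&chain, 52)); // prints None (inside the gap)
    println!("{}", to_ucsc("1")); // prints chr1
}
```

For multi-base alleles on a `-` chain, the lifted record would also need re-anchoring (the source allele's last base becomes the target's first) and re-normalization. The sketch covers single bases only.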
filter
Select variants with expressions over INFO, FORMAT, CHROM, POS, QUAL, and FILTER fields. Or describe the filter in plain English with --ask and review the expression before it runs.
vcfkit filter 'INFO/AF < 0.01 && FILTER == "PASS"' input.vcf
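One way to evaluate an expression like the one above is to build a small syntax tree and walk it once per record. The sketch below is hypothetical. It hand-builds the tree for `INFO/AF < 0.01 && FILTER == "PASS"` instead of parsing it. Its rule that a missing or multi-valued INFO value fails the comparison is an assumption, not documented vcfkit or bcftools behavior.

```rust
use std::collections::HashMap;

/// Parse an INFO column ("AC=3;AF=0.003;DB") into key -> raw value; flags map to "".
fn parse_info(col: &str) -> HashMap<String, String> {
    col.split(';')
        .filter(|f| !f.is_empty() && *f != ".")
        .map(|f| match f.split_once('=') {
            Some((k, v)) => (k.to_string(), v.to_string()),
            None => (f.to_string(), String::new()),
        })
        .collect()
}

/// Hypothetical AST covering just the example expression.
enum Expr {
    InfoLt(String, f64),       // INFO/<key> < value
    FilterEq(String),          // FILTER == "<value>"
    And(Box<Expr>, Box<Expr>), // a && b
}

/// A record reduced to the fields this sketch needs.
struct Record {
    filter: String,
    info: HashMap<String, String>,
}

fn eval(e: &Expr, r: &Record) -> bool {
    match e {
        // Assumption: a missing or non-numeric value (e.g. multi-allelic "0.1,0.2") fails the test.
        Expr::InfoLt(key, v) => r
            .info
            .get(key)
            .and_then(|s| s.parse::<f64>().ok())
            .map_or(false, |x| x < *v),
        Expr::FilterEq(v) => r.filter == *v,
        Expr::And(a, b) => eval(a, r) && eval(b, r),
    }
}

fn main() {
    // INFO/AF < 0.01 && FILTER == "PASS"
    let expr = Expr::And(
        Box::new(Expr::InfoLt("AF".into(), 0.01)),
        Box::new(Expr::FilterEq("PASS".into())),
    );
    let rare = Record { filter: "PASS".into(), info: parse_info("AC=3;AF=0.003;DB") };
    let common = Record { filter: "PASS".into(), info: parse_info("AC=900;AF=0.18") };
    println!("{} {}", eval(&expr, &rare), eval(&expr, &common)); // prints "true false"
}
```

Keeping the expression as a tree, rather than re-reading the text per record, is what makes the filter deterministic and cheap to apply to every line.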

Benchmarks

4× faster than bcftools normalize (biallelic SNP/MNP fast path)
Faster than bcftools filter (INFO/AF expression on 1.1M variants)

Measured on 1000 Genomes chr22 (1,103,547 variants) on macOS aarch64 against bcftools 1.23.1. The 4× figure applies only to the fast path: on the standard normalize path (full parse), vcfkit runs at 0.43× the speed of bcftools, i.e. slower. Full methodology →

Ask in plain English

Describe the filter you want. vcfkit uses Claude to translate it into a deterministic expression, shows you what it will run, and asks before running. Your VCF records never leave your machine; only the header schema is sent to the API.

$ vcfkit filter --ask \
    "common variants in Europeans but rare in Africans" \
    input.vcf

Expression: INFO/EUR_AF > 0.05 && INFO/AFR_AF < 0.01
Confidence: 95%

Run this filter? [Y/n/edit]

Requires the ANTHROPIC_API_KEY environment variable. Translations below 50% confidence require explicit review before they run. Setup →
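The privacy boundary and review threshold described above could look roughly like this. Both functions are hypothetical. Two details are assumptions made for illustration: which header lines count as the "schema", and how a bare Enter is treated below 50% confidence.

```rust
// Hypothetical sketch of the --ask guard rails; not vcfkit's code.

/// Keep only field-definition meta lines (##INFO, ##FORMAT, ##FILTER).
/// Data records and the #CHROM line (which lists sample names) are never returned.
fn header_schema(vcf: &str) -> Vec<&str> {
    vcf.lines()
        .take_while(|l| l.starts_with("##"))
        .filter(|l| ["##INFO=", "##FORMAT=", "##FILTER="].iter().any(|p| l.starts_with(*p)))
        .collect()
}

#[derive(Debug, PartialEq)]
enum Answer {
    Run,
    Abort,
    Edit,
    AskAgain,
}

/// Interpret the reply to "Run this filter? [Y/n/edit]".
/// Assumption: at or above 50% confidence a bare Enter accepts the default (Y);
/// below 50% the user must type an explicit answer.
fn interpret(reply: &str, confidence: f64) -> Answer {
    match reply.trim().to_ascii_lowercase().as_str() {
        "y" | "yes" => Answer::Run,
        "n" | "no" => Answer::Abort,
        "edit" => Answer::Edit,
        "" if confidence >= 0.5 => Answer::Run,
        _ => Answer::AskAgain,
    }
}

fn main() {
    // Toy VCF text, made up for this example.
    let vcf = "##fileformat=VCFv4.2\n\
               ##INFO=<ID=EUR_AF,Number=A,Type=Float,Description=\"EUR allele frequency\">\n\
               ##INFO=<ID=AFR_AF,Number=A,Type=Float,Description=\"AFR allele frequency\">\n\
               #CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1\n\
               22\t100\t.\tA\tG\t.\tPASS\tEUR_AF=0.06;AFR_AF=0.002\tGT\t0|1\n";
    // Only the two ##INFO lines would be sent to the API.
    for line in header_schema(vcf) {
        println!("{line}");
    }
    println!("{:?}", interpret("", 0.95)); // prints Run
}
```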

Correctness

Every operation is validated against bcftools in differential tests on real 1000 Genomes chr22 data. Tests run nightly in CI. Known differences are documented.

  • 100+ unit and integration tests
  • Differential tests against bcftools on 1.1M real variants
  • WASM parity tests — browser and CLI produce identical output
  • Nightly CI against bcftools on 1000 Genomes data
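A differential test comes down to running both tools on the same input and comparing the records they emit. This minimal sketch compares only (CHROM, POS, REF, ALT) keys. The function names and toy VCF text are hypothetical. A real harness would also compare INFO/FORMAT values and exclude the documented known differences.

```rust
use std::collections::BTreeSet;

/// (CHROM, POS, REF, ALT) of one data line.
type Site = (String, u64, String, String);

/// Collect the sites in a VCF text, skipping header and blank lines.
fn sites(vcf: &str) -> BTreeSet<Site> {
    vcf.lines()
        .filter(|l| !l.starts_with('#') && !l.trim().is_empty())
        .filter_map(|l| -> Option<Site> {
            let c: Vec<&str> = l.split('\t').collect();
            if c.len() < 5 {
                return None;
            }
            Some((c[0].to_string(), c[1].parse::<u64>().ok()?, c[3].to_string(), c[4].to_string()))
        })
        .collect()
}

/// Sites present only in the left output, and only in the right output (sorted).
fn diff(left: &str, right: &str) -> (Vec<Site>, Vec<Site>) {
    let (a, b) = (sites(left), sites(right));
    (a.difference(&b).cloned().collect(), b.difference(&a).cloned().collect())
}

fn main() {
    // Hypothetical outputs; in practice both would come from running each tool on the same input.
    let vcfkit_out = "#CHROM\tPOS\tID\tREF\tALT\n22\t100\t.\tTCA\tT\n22\t200\t.\tG\tA\n";
    let bcftools_out = "#CHROM\tPOS\tID\tREF\tALT\n22\t100\t.\tTCA\tT\n22\t201\t.\tG\tA\n";
    let (only_vcfkit, only_bcftools) = diff(vcfkit_out, bcftools_out);
    println!("only vcfkit: {only_vcfkit:?}");
    println!("only bcftools: {only_bcftools:?}");
}
```

Using sorted sets makes the report deterministic, so two nightly runs on the same data produce byte-identical diffs.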

Nightly differential tests · Known differences

Not validated for clinical use. vcfkit is research and pipeline tooling only.

Install

# Cargo
cargo install vcfkit-cli

# Prebuilt binary (macOS, Linux, Windows)
curl -fsSL https://vcfkit.dev/install.sh | sh

Full install instructions and shell completions →

Get in touch

Questions, bug reports, and feedback welcome.