Benchmarks
Results
| Operation | vcfkit | bcftools | Speedup |
|---|---|---|---|
| filter -e 'INFO/AF < 0.01' | 422 ms | 1,695 ms | 4.0× |
| normalize --fast --no-split | 682 ms | 2,820 ms | 4.1× |
| normalize (standard, with noodles) | 6,481 ms | 2,820 ms | 0.43× |
| liftover | 6,713 ms | n/a | ~164K rec/s |
1000 Genomes chr22, 1,103,547 variants · macOS aarch64 · bcftools 1.23.1
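The Speedup column can be rederived from the two timing columns, and the liftover throughput from the record count and elapsed time. The following is a minimal sketch of that arithmetic; the helper names `speedup` and `throughput_k` are illustrative and not part of vcfkit or bcftools.

```shell
# speedup BASELINE_MS CANDIDATE_MS
# Prints baseline / candidate with one decimal place.
speedup() {
  LC_ALL=C awk -v base="$1" -v cand="$2" 'BEGIN { printf "%.1f\n", base / cand }'
}

# throughput_k RECORDS ELAPSED_MS
# Prints thousands of records per second, rounded to an integer.
throughput_k() {
  LC_ALL=C awk -v n="$1" -v ms="$2" 'BEGIN { printf "%.0f\n", n / (ms / 1000) / 1000 }'
}

speedup 1695 422            # filter: bcftools ms / vcfkit ms
speedup 2820 682            # normalize --fast: bcftools ms / vcfkit ms
throughput_k 1103547 6713   # liftover: records, elapsed ms
```

With the values from the table this prints 4.0, 4.1, and 164, matching the filter, normalize --fast, and liftover rows.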
- Dataset: 1000 Genomes Project chr22 sites, extracted from the Phase 3 genotype VCF
- URL: https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz
- Records: 1,103,547 variants
- Format: plain VCF (uncompressed; no BGZF compression or index)
- Variants: predominantly biallelic SNPs + ~875 multi-allelic indel sites
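Before timing anything, it can help to confirm that a locally extracted sites file has the record count listed above. A small sketch, assuming a plain-text VCF in which every header line starts with `#`; the function name `count_records` is hypothetical.

```shell
# count_records FILE
# Prints the number of non-header lines (variant records) in a plain-text VCF.
count_records() {
  grep -vc '^#' "$1"
}

# Example (path taken from the Reproduce section):
#   count_records tests/real_world/chr22_sites.vcf
```

For the dataset described here, the count should be 1103547 if the extraction matches the one that was benchmarked.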
Environment
- Hardware: Apple M2 Pro (macOS aarch64)
- OS: macOS 14
- bcftools: 1.23.1
- Rust toolchain: stable 1.75
- Measured: 2026-04-19
All timings were collected with hyperfine, using 3 warm-up runs and 10 timed runs per command (5 timed runs for liftover).
Commands measured

    # filter
    hyperfine \
      'vcfkit filter -e "INFO/AF < 0.01" chr22_sites.vcf' \
      'bcftools view -i "INFO/AF < 0.01" chr22_sites.vcf' \
      --warmup 3 --runs 10

    # normalize --fast
    hyperfine \
      'vcfkit normalize --fast --no-split -f hg19.fa chr22_sites.vcf' \
      'bcftools norm --no-version -c w -f hg19.fa chr22_sites.vcf' \
      --warmup 3 --runs 10

    # normalize (standard)
    hyperfine \
      'vcfkit normalize -f hg19.fa chr22_sites.vcf' \
      --warmup 3 --runs 10

    # liftover (no bcftools comparison: plugin unavailable)
    hyperfine \
      'vcfkit liftover -s hg19.fa -t hg38.fa -c hg19ToHg38.over.chain.gz chr22_sites.vcf' \
      --warmup 3 --runs 5

Notes on standard normalize
Standard normalize (without --fast) is about 2.3× slower than bcftools (6,481 ms vs. 2,820 ms). This is
expected: the fast path is a raw-line loop, whereas the standard path parses every record
through noodles, a pure-Rust parser without htslib’s C optimizations.
The fast path handles biallelic SNPs and MNPs. Multi-allelic records and indels
fall back to the standard noodles path regardless of --fast.
On typical variant-call VCFs (mostly biallelic SNPs), --fast applies to the
vast majority of records, so the roughly 4× speedup is representative.
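To estimate how much of a given file would take the fast path, the records can be classified by the rule described above. This is a rough sketch only: the exact eligibility test vcfkit applies is not shown here, so it is approximated as "a single ALT allele, REF and ALT of equal length, both made up of uppercase A, C, G, T, or N". The function name `classify_fast_path` is hypothetical.

```shell
# classify_fast_path FILE
# Counts records that look like biallelic SNPs/MNPs ("fast") versus
# everything else ("fallback": multi-allelic, indels, symbolic alleles).
# Assumption: this approximates, but is not taken from, vcfkit's own check.
classify_fast_path() {
  LC_ALL=C awk -F'\t' '
    /^#/ { next }                       # skip header lines
    {
      ref = $4; alt = $5
      if (alt !~ /,/ &&                 # exactly one ALT allele
          length(ref) == length(alt) && # SNP or MNP, not an indel
          ref ~ /^[ACGTN]+$/ && alt ~ /^[ACGTN]+$/)
        fast++
      else
        fallback++
    }
    END { printf "fast=%d fallback=%d\n", fast, fallback }
  ' "$1"
}

# Example (path taken from the Reproduce section):
#   classify_fast_path tests/real_world/chr22_sites.vcf
```

A high fast-to-fallback ratio suggests the --fast timing is the relevant one for that file; a file dominated by indels or multi-allelic sites would behave closer to the standard path.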
Reproduce

    # Download input data
    mkdir -p tests/real_world
    cd tests/real_world

    # Download chr22 genotypes + extract sites-only
    URL="https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/ALL.chr22.phase3_shapeit2_mvncall_integrated_v5b.20130502.genotypes.vcf.gz"
    wget -q "$URL" -O chr22_genotypes.vcf.gz
    bcftools view -G chr22_genotypes.vcf.gz -O v -o chr22_sites.vcf

    # Download reference FASTA
    wget -q "https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/human_g1k_v37.fasta.gz" \
      -O human_g1k_v37.fasta.gz
    bgzip -d human_g1k_v37.fasta.gz
    samtools faidx human_g1k_v37.fasta

    cd ../..

    # Run benchmarks
    cargo build --release
    hyperfine \
      './target/release/vcfkit filter -e "INFO/AF < 0.01" tests/real_world/chr22_sites.vcf' \
      'bcftools view -i "INFO/AF < 0.01" tests/real_world/chr22_sites.vcf' \
      --warmup 3 --runs 10