filter

Keep variants matching an expression. The fast path reads raw VCF lines and only parses the fields referenced by the expression — matching records are written as raw bytes without re-serialization.

vcfkit filter [OPTIONS] --expression <EXPR> [INPUT]

Flag                      Description
-e, --expression <EXPR>   Filter expression (required)
-o, --output <FILE>       Output file (default: stdout)
-v, --invert              Invert: keep records that do NOT match
-q, --quiet               Suppress progress bar and stats

# Rare variants
vcfkit filter -e "INFO/AF < 0.01" input.vcf
# High quality PASS variants
vcfkit filter -e "QUAL > 30 && FILTER == 'PASS'" input.vcf
# Substring match (contains)
vcfkit filter -e "INFO/CSQ ~ 'missense'" input.vcf
# Non-PASS variants (inverted filter)
vcfkit filter -e "FILTER == 'PASS'" --invert input.vcf
# Chromosome + position range
vcfkit filter -e "CHROM == 'chr17' && POS >= 43044295 && POS <= 43125483" input.vcf
# Compound expression
vcfkit filter -e "INFO/AF < 0.05 && QUAL >= 50 && FILTER == 'PASS'" input.vcf > output.vcf
# From stdin
bcftools view input.bcf | vcfkit filter -e "INFO/DP > 10"

Field          Type              Notes
INFO/<key>     Per-header type   e.g., INFO/AF, INFO/DP, INFO/CSQ
FORMAT/<key>   Per-header type   First sample only
CHROM          String            e.g., 'chr1'
POS            Integer           1-based
QUAL           Float             Missing (.) evaluates to false
FILTER         String            e.g., 'PASS'

Operator               Meaning
<, <=, >, >=, ==, !=   Comparison
&&, ||, !              Logical
~                      Substring match (contains)
!~                     Substring non-match

42 # integer
3.14 # float
'chr1' # string (single quotes)

Fields declared as Type=Float in the VCF header are parsed as f64 for numeric comparisons, Type=Integer fields as i64, and Type=String fields (including FILTER) as strings. A missing value (.) evaluates to false in all comparisons.
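
The coercion rule can be illustrated with a short Python sketch. This is not vcfkit's implementation: the `compare` function and the `PARSERS` and `OPS` tables are hypothetical names chosen for illustration, and Python's `float` and `int` stand in for f64 and i64.

```python
import operator

# Comparison operators from the operator table, mapped to Python functions.
OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}

# Hypothetical mapping from the header's Type= declaration to a parser.
PARSERS = {"Float": float, "Integer": int, "String": str}


def compare(raw, header_type, op, literal):
    """Compare one raw field value against a literal.

    A missing value (.) makes every comparison false, including !=.
    """
    if raw == ".":
        return False
    value = PARSERS[header_type](raw)
    return OPS[op](value, literal)
```

Under these assumptions, `compare(".", "Float", "!=", 0.01)` is false even though a missing value is not equal to 0.01, which is the behaviour described above for missing values.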

INFO fields with Number=A (one value per ALT allele) use any-element semantics: INFO/AF < 0.01 matches if any ALT allele has AF < 0.01.

INFO/AF=0.05,0.003 → INFO/AF < 0.01 matches (0.003 < 0.01)
INFO/AF=0.05,0.12 → INFO/AF < 0.01 does not match
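
The any-element rule above can be sketched in Python as follows. The function name `any_allele_below` is hypothetical, and skipping missing (.) elements inside a multi-valued field is an assumption of this sketch, not documented vcfkit behaviour.

```python
def any_allele_below(raw, threshold):
    """Return True if any comma-separated value in raw is below threshold.

    raw is the text after "INFO/AF=", for example "0.05,0.003".
    Assumption: individual missing (.) elements are skipped.
    """
    values = [v for v in raw.split(",") if v != "."]
    return any(float(v) < threshold for v in values)
```

With the two records shown above, `any_allele_below("0.05,0.003", 0.01)` is true and `any_allele_below("0.05,0.12", 0.01)` is false.
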
# vcfkit
vcfkit filter -e "INFO/AF < 0.01 && FILTER == 'PASS'" input.vcf
# bcftools
bcftools view -i 'INFO/AF < 0.01 && FILTER == "PASS"' input.vcf

The expression syntax is similar. The key difference is quoting: vcfkit uses single quotes for string literals, while bcftools uses double quotes.

On 1000 Genomes chr22 (1.1M records), vcfkit filter takes 422 ms versus 1,695 ms for bcftools (4.0× faster).

The fast path reads raw lines. For each line, it only parses the INFO fields referenced in the expression — skipping all other fields. Matching records are written as raw bytes. Non-matching records are discarded. The VCF header is parsed once with noodles to get INFO type metadata.
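
A rough Python sketch of this idea, for a single referenced INFO key, is shown below. It is illustrative only and does not reflect vcfkit's actual code: `info_value` and `filter_lines` are hypothetical names, the header lookup for type metadata is omitted, and passing header lines through unchanged is an assumption.

```python
import io


def info_value(line, key):
    """Return the raw bytes of one INFO key, or None if the key is absent.

    Only the INFO column is scanned; no other field is parsed.
    """
    # INFO is the 8th tab-separated column, so split no further than needed.
    columns = line.split(b"\t", 8)
    if len(columns) < 8:
        return None
    for entry in columns[7].rstrip(b"\r\n").split(b";"):
        name, sep, value = entry.partition(b"=")
        if name == key and sep:
            return value
    return None


def filter_lines(src, dst, key, predicate):
    """Write header lines and matching records to dst as unmodified bytes."""
    for line in src:
        if line.startswith(b"#"):
            dst.write(line)  # assumption: the header is passed through as-is
            continue
        value = info_value(line, key)
        # Absent or missing (.) values never match.
        if value is not None and value != b"." and predicate(value):
            dst.write(line)  # raw bytes, no re-serialization


# Usage: the equivalent of "INFO/DP > 10" on an in-memory VCF.
src = io.BytesIO(
    b"##fileformat=VCFv4.2\n"
    b"#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    b"chr1\t100\t.\tA\tG\t50\tPASS\tDP=12;AF=0.5\n"
    b"chr1\t200\t.\tC\tT\t50\tPASS\tDP=5\n"
)
dst = io.BytesIO()
filter_lines(src, dst, b"DP", lambda v: int(v) > 10)
```

Because matching lines are written back exactly as they were read, the output preserves the original byte layout of each kept record.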