1. Thinking about our experiment
bcftools isec -C SRR445716.vcf.gz SRR445715.vcf.gz \>\
present_in_IMW004_absent_in_CEN.PK113-7D.txt
chrI 244 C CT 10
chrI 675 A G 10
chrI 1152 T G 10
chrI 1397 A G 10
chrI 1428 T C 10
chrI 1757 G T 10
chrI 2002 G T 10
chrI 2029 T C 10
chrI 2406 A C 10
chrI 12227 C T 10
About: Create intersections, unions and complements of VCF files.
Usage: bcftools isec [options] <A.vcf.gz> <B.vcf.gz> [...]
Options:
-c, --collapse STRING Treat as identical records with <snps|indels|both|all|some|none>, see man page for details [none]
-C, --complement Output positions present only in the first file but missing in the others
-e, --exclude EXPR Exclude sites for which the expression is true
-f, --apply-filters LIST Require at least one of the listed FILTER strings (e.g. "PASS,.")
-i, --include EXPR Include only sites for which the expression is true
--no-version Do not append version and command line to the header
-n, --nfiles [+-=~]INT Output positions present in this many (=), this many or more (+), this many or fewer (-), the exact (~) files
-o, --output FILE Write output to a file [standard output]
-O, --output-type u|b|v|z[0-9] u/b: un/compressed BCF, v/z: un/compressed VCF, 0-9: compression level [v]
-p, --prefix DIR If given, subset each of the input files accordingly, see also -w
-r, --regions REGION Restrict to comma-separated list of regions
-R, --regions-file FILE Restrict to regions listed in a file
--regions-overlap 0|1|2 Include if POS in the region (0), record overlaps (1), variant overlaps (2) [1]
-t, --targets REGION Similar to -r but streams rather than index-jumps
-T, --targets-file FILE Similar to -R but streams rather than index-jumps
--targets-overlap 0|1|2 Include if POS in the region (0), record overlaps (1), variant overlaps (2) [0]
--threads INT Use multithreading with <int> worker threads [0]
-w, --write LIST List of files to write with -p given as 1-based indexes. By default, all files are written
Examples:
# Create intersection and complements of two sets saving the output in dir/*
bcftools isec A.vcf.gz B.vcf.gz -p dir
# Filter sites in A and B (but not in C) and create intersection
bcftools isec -e'MAF<0.01' -i'dbSNP=1' -e - A.vcf.gz B.vcf.gz C.vcf.gz -p dir
# Extract and write records from A shared by both A and B using exact allele match
bcftools isec A.vcf.gz B.vcf.gz -p dir -n =2 -w 1
# Extract and write records from C found in A and C but not in B
bcftools isec A.vcf.gz B.vcf.gz C.vcf.gz -p dir -n~101 -w 3
# Extract records private to A or B comparing by position only
bcftools isec A.vcf.gz B.vcf.gz -p dir -n -1 -c all
Question 17: Can you think of a way to obtain a list of candidates that may underlie the ability of these strains to grow on lactate? Hint: You can assume that variants shared by both IMW004 and IMW005 are likely to have arisen before the start of the experiment (i.e., from the unsequenced initial jen1 delta strain), and therefore are not biologically interesting. How many variants (unfiltered) are in IMW004 that are not shared by any other strain?
My guess
bcftools isec -C SRR445716.vcf.gz SRR445715.vcf.gz SRR445717.vcf.gz --output-type v -o IMW004_unique.vcf -w 1
bcftools isec -C SRR445717.vcf.gz SRR445715.vcf.gz SRR445716.vcf.gz --output-type v -o IMW005_unique.vcf -w 1
bgzip IMW004_unique.vcf
bgzip IMW005_unique.vcf
bcftools index IMW004_unique.vcf.gz
bcftools index IMW005_unique.vcf.gz
bcftools merge IMW004_unique.vcf.gz IMW005_unique.vcf.gz -o Lac_Uniques
Question 18: How many variants remain in IMW004 after filtering?
bcftools filter -i'QUAL>=30 && AD[*:1]>=50 && type="snp"' IMW004_unique.vcf.gz -o IMW004.flt.vcf
bcftools view -H IMW004.flt.vcf | wc -l
25
