Hello Friend

Hi! 👋 I’m a Genomic Science student at ENES Juriquilla, currently in my fourth semester. My academic interests focus on cellular senescence, cell differentiation, and gene regulation—fields that captivate me as I explore how cells age, evolve, and specialize. I’m also actively involved in bioinformatics, assisting in the development of the VieRnes de Bioinformática en LIIGH webpage 🌐.

As a pseudoprogrammer, I enjoy working with R, Python, and C++, using these languages to dive into genomic data analysis and tackle computational biology challenges. My journey in genomics has allowed me to explore RNA sequencing, and I’m eager to expand my skills in single-cell analysis, ATAC-seq, and Hi-C technologies.

While I have a special fondness for rabbits 🐇, in the lab, my favorite model organism is the mouse (Mus musculus) 🐁, which plays a crucial role in advancing our understanding of mammalian biology (though I haven’t worked with it yet).

Missions

  • ✅ RNA-seq
  • 🔴 Single-cell
  • 🔴 ATAC-seq
  • 🔴 Hi-C

Let everything happen to you: beauty and terror. Just keep going. No feeling is final.
Rainer Maria Rilke

Day 2: VEP and more bcftools analyses

1. Thinking about our experiment



bcftools isec -C SRR445716.vcf.gz SRR445715.vcf.gz \
    > present_in_IMW004_absent_in_CEN.PK113-7D.txt

chrI    244 C   CT  10
chrI    675 A   G   10
chrI    1152    T   G   10
chrI    1397    A   G   10
chrI    1428    T   C   10
chrI    1757    G   T   10
chrI    2002    G   T   10
chrI    2029    T   C   10
chrI    2406    A   C   10
chrI    12227   C   T   10
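The last column of this sites list is a presence bitmap with one character per input file: 10 means the site is in the first file (SRR445716) and not in the second (SRR445715). As a rough sketch (select_by_flag and tally_flags are my own hypothetical helpers, not part of bcftools), listing the union with "bcftools isec -n +1" and filtering that column for 10 should give the same sites as -C, assuming the five-column layout shown above:

```shell
#!/usr/bin/env bash
# Keep only sites whose presence bitmap (5th column of a bcftools isec
# sites list) equals the requested pattern, e.g. "10" = first file only.
# Comparison is forced to be a string comparison so "01" never equals "1".
select_by_flag() {
    local flag="$1"
    awk -v flag="$flag" '($5 "") == (flag "")'
}

# Count how many sites carry each bitmap pattern (e.g. 10, 01, 11).
tally_flags() {
    awk '{ n[$5]++ } END { for (f in n) print f "\t" n[f] }' | sort
}
```

For example, bcftools isec -n +1 SRR445716.vcf.gz SRR445715.vcf.gz | tally_flags would summarize private and shared sites in one pass.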
About:   Create intersections, unions and complements of VCF files.
Usage:   bcftools isec [options] <A.vcf.gz> <B.vcf.gz> [...]

Options:
    -c, --collapse STRING          Treat as identical records with <snps|indels|both|all|some|none>, see man page for details [none]
    -C, --complement               Output positions present only in the first file but missing in the others
    -e, --exclude EXPR             Exclude sites for which the expression is true
    -f, --apply-filters LIST       Require at least one of the listed FILTER strings (e.g. "PASS,.")
    -i, --include EXPR             Include only sites for which the expression is true
        --no-version               Do not append version and command line to the header
    -n, --nfiles [+-=~]INT         Output positions present in this many (=), this many or more (+), this many or fewer (-), the exact (~) files
    -o, --output FILE              Write output to a file [standard output]
    -O, --output-type u|b|v|z[0-9] u/b: un/compressed BCF, v/z: un/compressed VCF, 0-9: compression level [v]
    -p, --prefix DIR               If given, subset each of the input files accordingly, see also -w
    -r, --regions REGION           Restrict to comma-separated list of regions
    -R, --regions-file FILE        Restrict to regions listed in a file
        --regions-overlap 0|1|2    Include if POS in the region (0), record overlaps (1), variant overlaps (2) [1]
    -t, --targets REGION           Similar to -r but streams rather than index-jumps
    -T, --targets-file FILE        Similar to -R but streams rather than index-jumps
        --targets-overlap 0|1|2    Include if POS in the region (0), record overlaps (1), variant overlaps (2) [0]
        --threads INT              Use multithreading with <int> worker threads [0]
    -w, --write LIST               List of files to write with -p given as 1-based indexes. By default, all files are written

Examples:
   # Create intersection and complements of two sets saving the output in dir/*
   bcftools isec A.vcf.gz B.vcf.gz -p dir

   # Filter sites in A and B (but not in C) and create intersection
   bcftools isec -e'MAF<0.01' -i'dbSNP=1' -e - A.vcf.gz B.vcf.gz C.vcf.gz -p dir

   # Extract and write records from A shared by both A and B using exact allele match
   bcftools isec A.vcf.gz B.vcf.gz -p dir -n =2 -w 1

   # Extract and write records from C found in A and C but not in B
   bcftools isec A.vcf.gz B.vcf.gz C.vcf.gz -p dir -n~101 -w 3

   # Extract records private to A or B comparing by position only
   bcftools isec A.vcf.gz B.vcf.gz -p dir -n -1 -c all
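The -n expressions in the help text count how many files contain a site (=, +, -), while ~ compares the whole bitmap. The matches_n function below is a hypothetical helper written only to illustrate those four rules; it is not bcftools' own code:

```shell
#!/usr/bin/env bash
# Return success if a presence bitmap (e.g. "101") satisfies an
# isec-style -n expression: =N, +N, -N (number of files) or ~BITS.
matches_n() {
    local bitmap="$1" expr="$2"
    local ones="${bitmap//0/}"   # drop zeros; what is left = files with the site
    local count="${#ones}"
    local n="${expr:1}"          # expression without its leading operator
    case "$expr" in
        =*)   [ "$count" -eq "$n" ] ;;   # exactly N files
        +*)   [ "$count" -ge "$n" ] ;;   # N or more files
        -*)   [ "$count" -le "$n" ] ;;   # N or fewer files
        '~'*) [ "$bitmap" = "$n" ] ;;    # exact bitmap match
        *)    return 2 ;;                # not handled in this sketch
    esac
}
```

For instance, matches_n 101 '~101' succeeds, mirroring the -n~101 example above (present in A and C but not in B).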

Question 17: Can you think of a way to obtain a list of candidates that may underlie the ability of these strains to grow on lactate? Hint: You can assume that variants shared by both IMW004 and IMW005 are likely to have arisen before the start of the experiment (i.e., from the unsequenced initial jen1 delta strain), and therefore are not biologically interesting. How many variants (unfiltered) are in IMW004 that are not shared by any other strain?

My guess


bcftools isec -C SRR445716.vcf.gz SRR445715.vcf.gz SRR445717.vcf.gz --output-type v -o IMW004_unique.vcf -w 1

bcftools isec -C SRR445717.vcf.gz SRR445715.vcf.gz SRR445716.vcf.gz --output-type v -o IMW005_unique.vcf -w 1

bgzip IMW004_unique.vcf

bgzip IMW005_unique.vcf

bcftools index IMW004_unique.vcf.gz

bcftools index IMW005_unique.vcf.gz

bcftools merge IMW004_unique.vcf.gz IMW005_unique.vcf.gz -O v -o Lac_Uniques.vcf
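Following the hint, the candidates are IMW004 records whose CHROM, POS, REF and ALT do not appear in CEN.PK113-7D or IMW005. A plain-awk sketch of that set difference (private_to_first is a hypothetical name), assuming uncompressed, tab-separated VCFs and exact allele matching; unlike bcftools, multiallelic ALT strings are compared as whole strings:

```shell
#!/usr/bin/env bash
# Print records of the first VCF whose CHROM:POS:REF:ALT key is absent
# from every other VCF given on the command line. Header lines are skipped.
private_to_first() {
    awk -F'\t' '
        /^#/ { next }                                   # skip header lines
        { key = $1 ":" $2 ":" $4 ":" $5 }
        FILENAME == ARGV[1] { n++; rec[n] = $0; k[n] = key; next }
        { other[key] = 1 }                              # keys seen in other files
        END { for (i = 1; i <= n; i++) if (!(k[i] in other)) print rec[i] }
    ' "$@"
}
```

With compressed input, process substitution should work, e.g. private_to_first <(zcat SRR445716.vcf.gz) <(zcat SRR445715.vcf.gz) <(zcat SRR445717.vcf.gz) | wc -l, which can be compared against bcftools view -H IMW004_unique.vcf.gz | wc -l.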

Question 18: How many variants remain in IMW004 after filtering?


bcftools filter -i'QUAL>=30 && AD[*:1]>=50 && type="snp"' IMW004_unique.vcf.gz -o IMW004.flt.vcf

bcftools view -H IMW004.flt.vcf | wc -l
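To see what the Question 18 filter keeps, here is an awk approximation for an uncompressed single-sample VCF. It assumes AD is present in FORMAT and only treats biallelic A/C/G/T substitutions as SNPs, which is stricter than bcftools' type="snp"; the count_passing name is my own:

```shell
#!/usr/bin/env bash
# Count records with QUAL >= 30, a biallelic SNP (single-base REF and ALT),
# and ALT read depth (second AD value of the first sample) >= 50.
count_passing() {
    awk -F'\t' '
        /^#/ { next }                                      # header lines
        $6 == "." || $6 + 0 < 30 { next }                  # QUAL missing or < 30
        $4 !~ /^[ACGT]$/ || $5 !~ /^[ACGT]$/ { next }      # not a biallelic SNP
        {
            nf = split($9, fmt, ":"); ad = 0
            for (i = 1; i <= nf; i++) if (fmt[i] == "AD") ad = i
            if (!ad) next                                  # no AD field
            split($10, smp, ":"); split(smp[ad], depth, ",")
            if (depth[2] + 0 >= 50) n++                    # ALT depth >= 50
        }
        END { print n + 0 }
    ' "$@"
}
```

For example, bgzip -dc IMW004_unique.vcf.gz | count_passing; any difference from the bcftools count would be expected to come from the stricter SNP definition.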

25


Jorge Alfredo Suazo-Victoria’s CV

Aside

Contact



Programming Languages

Expertise: R and RStudio, Bash, AWK
Familiarity: Git/GitHub, Python


Languages

Spanish - Native
English - B2 (TOEFL iBT)

Disclaimer

Made with the R package pagedown.

Based on EveliaCoss/CV and powered by nstrayer/cv.


Day 1: Variant calling and Ensembl VEP exercises - LCGEJ


cd /home/suaria/Documents/variant_calling/data

for sample in SRR445715 SRR445716 SRR445717; do
    # Alignment summary statistics against the S288C reference
    samtools stats -r S288C_ref.fa "${sample}.aligned.sorted.bam" > "${sample}.stats"
    # Insert size, GC content, per-base content and quality-per-cycle plots
    plot-bamstats -r other_files/S288C_ref.fa.gc -p "${sample}.graphs/" "${sample}.stats"
done

Question 1: What is the percentage of mapped reads in all three files? Check the insert size, GC content, per-base sequence content and quality per cycle graphs. Do they all look reasonable?

The percentage of mapped reads in all three files is:

SRR445717

  • Total reads: 13,730,526
  • Mapped reads: 13,230,229 (96.4%)
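The mapped-read percentage can be read straight from the SN summary section of each .stats file. A sketch, assuming the totals above come from the "raw total sequences" and "reads mapped" fields (mapped_pct is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Print the percentage of mapped reads from a samtools stats report,
# using its SN lines "raw total sequences:" and "reads mapped:".
mapped_pct() {
    awk -F'\t' '
        $1 == "SN" && $2 == "raw total sequences:" { total = $3 }
        $1 == "SN" && $2 == "reads mapped:"        { mapped = $3 }
        END { if (total > 0) printf "%.1f\n", 100 * mapped / total }
    ' "$@"
}
```

Usage: for s in SRR445715 SRR445716 SRR445717; do printf '%s\t%s%%\n' "$s" "$(mapped_pct "$s.stats")"; done. For SRR445717, 13,230,229 / 13,730,526 × 100 ≈ 96.36, which rounds to the 96.4% above.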
