CLUSTERING-BASED METHOD
(IonTorrent data)
BMP pipeline
Reference: Pylro et al. Data analysis for 16S microbial profiling from different benchtop sequencing platforms. Journal of Microbiological Methods, v. 107, p. 30-37, 2014. DOI: 10.1016/j.mimet.2014.08.018.
This example assumes reads in FASTQ format.
Along the BMP pipeline you will need the forward primer sequence (it is GGACTACNNGGGTNTCTAAT in the example below; change degenerate bases to “N”), and you will also need to prepare a FASTA file called “barcodes.fa” containing the barcodes that identify your samples. The FASTA label for each barcode should be a short name identifying the sample.
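The barcodes.fa file simply pairs each sample name (as the FASTA label) with its barcode sequence. A hypothetical example (the sample names and 10-bp barcode sequences below are placeholders; substitute the barcodes from your own run):

```
>SoilA
CTAAGGTAAC
>SoilB
TAAGGAGAAC
```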
This page gives a complete pipeline to analyze 16S rRNA gene data. Of course, you should edit as needed for your reads and file locations (represented here as $PWD/).
From non-demultiplexed Ion Torrent .fastq files (remember to keep the barcode sequence):
To obtain the non-demultiplexed (raw data) .fastq files from the Ion Torrent server:
- Click on "Reanalyze"
- Then click on "Analysis settings"
- On "Barcode Set" select "None or RNA_Barcode_None"
1 - Strip barcodes ("Ex" is a prefix for the read labels; it can be anything you like) <<<UPARSE Scripts>>>
fastq_strip_barcode_relabel2.py $PWD/reads.fastq GGACTACNNGGGTNTCTAAT $PWD/barcodes.fa Ex > reads2.fastq
2 - Quality filtering, length truncation, and conversion to FASTA <<<USING VSEARCH>>>
vsearch --fastx_filter $PWD/reads2.fastq --fastq_maxee 1.0 --fastq_trunclen 200 --fastaout reads.fa
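The --fastq_maxee 1.0 option discards any read whose total expected error, summed from the per-base Phred quality scores, exceeds 1.0. A minimal Python sketch of the underlying calculation (the quality string below is invented for illustration):

```python
def expected_errors(quals, offset=33):
    """Sum of per-base error probabilities from an ASCII quality string.

    Each Phred score Q corresponds to an error probability of 10^(-Q/10).
    """
    return sum(10 ** (-(ord(c) - offset) / 10) for c in quals)

# Phred 20 ('5' in ASCII+33 encoding) means a 1% error chance per base,
# so 50 bases at Q20 give about 0.5 expected errors: kept by --fastq_maxee 1.0.
q20_read = "5" * 50
print(expected_errors(q20_read))
```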
3 - Dereplication <<<USING VSEARCH>>>
vsearch --derep_fulllength $PWD/reads.fa --output derep.fa --sizeout
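--derep_fulllength collapses identical full-length sequences into a single record, and --sizeout appends the copy count to each label as ";size=N". A rough Python equivalent of that counting step (illustration only; real FASTA parsing and vsearch's exact label handling are omitted, and the "UniqN" labels are invented):

```python
from collections import Counter

def dereplicate(seqs):
    """Collapse identical sequences, recording abundance as ';size=N' labels."""
    counts = Counter(seqs)
    # Most abundant unique sequence first, mirroring dereplicated output order.
    return [f">Uniq{i};size={n}\n{seq}"
            for i, (seq, n) in enumerate(counts.most_common(), start=1)]

reads = ["ACGT", "ACGT", "GGCC", "ACGT"]
for record in dereplicate(reads):
    print(record)
```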
4 - Abundance sort and discard singletons <<<USING VSEARCH>>>
vsearch --sortbysize $PWD/derep.fa --output sorted.fa --minsize 2
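--sortbysize orders records by their ";size=N" annotation, and --minsize 2 drops singletons (sequences observed only once, which are often sequencing errors). A sketch of the same logic, assuming labels in the ";size=N" format produced by the dereplication step:

```python
import re

def sort_discard_singletons(labels, minsize=2):
    """Sort ';size=N' labels by abundance, dropping those below minsize."""
    def size(label):
        return int(re.search(r"size=(\d+)", label).group(1))
    return sorted((l for l in labels if size(l) >= minsize),
                  key=size, reverse=True)

print(sort_discard_singletons(["A;size=1", "B;size=7", "C;size=3"]))
# ['B;size=7', 'C;size=3'] -- the singleton A is discarded
```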
5 - OTU clustering <<<USING VSEARCH>>>
vsearch --cluster_size $PWD/sorted.fa --consout otus.fa --id 0.97
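--cluster_size performs greedy centroid clustering: sequences are processed in decreasing abundance order, and each one either joins the first existing centroid it matches at >= 97% identity or founds a new cluster. A toy sketch of that idea (the identity function here is a crude positional stand-in for vsearch's alignment-based identity):

```python
def identity(a, b):
    """Fraction of matching positions; assumes equal-length, ungapped sequences."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_cluster(seqs, threshold=0.97):
    """Greedy centroid clustering over abundance-sorted sequences."""
    centroids = []
    for s in seqs:
        for c in centroids:
            if identity(s, c) >= threshold:
                break              # joins the cluster of centroid c
        else:
            centroids.append(s)    # no match: founds a new cluster
    return centroids

# The second sequence is 99% identical to the first, so it joins that cluster.
seqs = ["A" * 100, "A" * 99 + "T", "G" * 100]
print(len(greedy_cluster(seqs)))  # 2 centroids
```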
6 - Map reads back to OTU database <<<USING VSEARCH>>>
vsearch --usearch_global $PWD/reads.fa --db otus.fa --strand plus --id 0.97 --uc map.uc
7 - Assign taxonomy to OTUs using the uclust method on QIIME (use the file “otus.fa” as input file)
assign_taxonomy.py -i $PWD/otus.fa -o output
8 - Align sequences on QIIME, using greengenes reference sequences (use the file “otus.fa” as input file)
align_seqs.py -i $PWD/otus.fa -o rep_set_align
9 - Filter alignments on QIIME
filter_alignment.py -i $PWD/rep_set_align/otus_aligned.fasta -o filtered_alignment
10 - Make the reference tree on QIIME
make_phylogeny.py -i $PWD/filtered_alignment/otus_aligned_pfiltered.fasta -o rep_set.tre
11 - Convert map.uc to otu_table.txt <<<UPARSE PYTHON SCRIPT>>>
uc2otutab.py $PWD/map.uc > otu_table.txt
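uc2otutab.py turns the read-to-OTU mapping into a table of counts per OTU per sample, taking the sample name from the "barcodelabel=" tag that fastq_strip_barcode_relabel2.py wrote into each read label. A simplified re-implementation of the counting step (field positions follow the .uc format: "H" hit records, query label in column 9, target label in column 10; the example records are invented):

```python
from collections import defaultdict

def uc_to_counts(uc_lines):
    """Count reads per (OTU, sample) from the 'H' records of a .uc mapping."""
    table = defaultdict(lambda: defaultdict(int))
    for line in uc_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "H":       # skip 'N' (no hit) and other record types
            continue
        query, otu = fields[8], fields[9]
        sample = query.split("barcodelabel=")[1].split(";")[0]
        table[otu][sample] += 1
    return table

uc = ["H\t0\t200\t99.0\t+\t0\t0\t200M\tEx_1;barcodelabel=SoilA;\tOTU_1",
      "H\t0\t200\t98.5\t+\t0\t0\t200M\tEx_2;barcodelabel=SoilA;\tOTU_1",
      "N\t0\t200\t*\t*\t*\t*\t*\tEx_3;barcodelabel=SoilB;\t*"]
print(uc_to_counts(uc)["OTU_1"]["SoilA"])  # 2
```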
12 - Convert otu_table.txt to otu_table.biom, used by QIIME <<<BIOM SCRIPT>>>
biom convert -i $PWD/otu_table.txt -o otu_table.biom --table-type="OTU table" --to-json
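For reference, the classic (tab-separated) otu_table.txt that biom convert expects is a plain text table with OTUs as rows and samples as columns, roughly like this (sample names and counts invented):

```
OTUId	SoilA	SoilB
OTU_1	120	45
OTU_2	8	0
```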
13 - Add metadata (taxonomy) to OTU table
biom add-metadata -i $PWD/otu_table.biom -o otu_table_tax.biom --observation-metadata-fp $PWD/output/otus_tax_assignments.txt --observation-header OTUID,taxonomy,confidence --sc-separated taxonomy --float-fields confidence
14 - Check OTU Table on QIIME.
biom summarize-table -i $PWD/otu_table_tax.biom -o results_biom_table
15 - Run diversity analyses on QIIME (or any other analysis of your choice). The parameter “-e” is the sequencing depth to use for even sub-sampling and maximum rarefaction depth. You should review the output of the ‘biom summarize-table’ (step 14) command to decide on this value.
core_diversity_analyses.py -i $PWD/otu_table_tax.biom -m $PWD/mapping_file.txt -t $PWD/rep_set.tre -e xxxx -o $PWD/core_output
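Samples with fewer reads than the depth given to “-e” are excluded from the rarefied analyses, so the choice is a trade-off between depth and sample retention. One common heuristic, sketched below, is to ignore samples under some minimum acceptable count and then rarefy to the shallowest remaining sample (the function, threshold, and counts here are illustrative, not part of QIIME):

```python
def even_sampling_depth(sample_counts, min_acceptable=1000):
    """Largest depth keeping every sample that has at least min_acceptable reads.

    Samples below min_acceptable are assumed too shallow and sacrificed;
    the depth is then the smallest count among the remaining samples.
    """
    kept = [n for n in sample_counts.values() if n >= min_acceptable]
    return min(kept) if kept else None

# Per-sample totals as reported by 'biom summarize-table' (step 14).
counts = {"SoilA": 25130, "SoilB": 18902, "SoilC": 731}
print(even_sampling_depth(counts))  # 18902: SoilC (731 reads) is dropped
```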