CLUSTERING-BASED METHOD
(IonTorrent data)
BMP pipeline
Reference: Pylro et al. Data analysis for 16S microbial profiling from different benchtop sequencing platforms. Journal of Microbiological Methods, v. 107, p. 30-37, 2014. DOI: 10.1016/j.mimet.2014.08.018.
This example assumes reads in FASTQ format.
Along the BMP pipeline you will need the forward primer sequence (it is GGACTACNNGGGTNTCTAAT in the example below; change degenerate bases to “N”), and you will also need to prepare a FASTA file called “barcodes.fa” containing the barcodes that identify your samples. The FASTA label for each barcode should be a short name identifying the sample.
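The barcodes.fa file simply pairs each sample name (as the FASTA label) with its barcode sequence. A hypothetical example (the sample names and 10-bp barcode sequences below are placeholders; substitute the barcodes from your own run):

```
>SoilA
CTAAGGTAAC
>SoilB
TAAGGAGAAC
```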
This page gives a complete pipeline to analyze 16S rRNA gene data. Of course, you should edit as needed for your reads and file locations (represented here as $PWD/).
From non-demultiplexed Ion Torrent .fastq files (remember to keep the barcode sequence):
To obtain the non-demultiplexed (raw data) .fastq files from the Ion Torrent server:
- Click on "Reanalyze"
- Then click on "Analysis settings"
- On "Barcode Set" select "None or RNA_Barcode_None"
1 - Strip barcodes ("Ex" is a prefix for the read labels; it can be anything you like) <<<UPARSE Scripts>>>
fastq_strip_barcode_relabel2.py $PWD/reads.fastq GGACTACNNGGGTNTCTAAT $PWD/barcodes.fa Ex > reads2.fastq
2 - Quality filtering, length truncation, and conversion to FASTA <<<USING VSEARCH>>>
vsearch --fastx_filter $PWD/reads2.fastq --fastq_maxee 1.0 --fastq_trunclen 200 --fastaout reads.fa
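The --fastq_maxee 1.0 option discards any read whose total expected error, summed from the per-base Phred quality scores, exceeds 1.0. A minimal Python sketch of the underlying calculation (the quality string below is invented for illustration):

```python
def expected_errors(quals, offset=33):
    """Sum of per-base error probabilities from an ASCII quality string.

    Each Phred score Q corresponds to an error probability of 10^(-Q/10).
    """
    return sum(10 ** (-(ord(c) - offset) / 10) for c in quals)

# Phred 20 ('5' in ASCII+33 encoding) means a 1% error chance per base,
# so 50 bases at Q20 give about 0.5 expected errors: kept by --fastq_maxee 1.0.
q20_read = "5" * 50
print(expected_errors(q20_read))
```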
3 - Dereplication <<<USING VSEARCH>>>
vsearch --derep_fulllength $PWD/reads.fa --output derep.fa --sizeout
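--derep_fulllength collapses identical full-length sequences into a single record, and --sizeout appends the copy count to each label as ";size=N". A rough Python equivalent of that counting step (illustration only; real FASTA parsing and vsearch's exact label handling are omitted, and the "UniqN" labels are invented):

```python
from collections import Counter

def dereplicate(seqs):
    """Collapse identical sequences, recording abundance as ';size=N' labels."""
    counts = Counter(seqs)
    # Most abundant unique sequence first, mirroring dereplicated output order.
    return [f">Uniq{i};size={n}\n{seq}"
            for i, (seq, n) in enumerate(counts.most_common(), start=1)]

reads = ["ACGT", "ACGT", "GGCC", "ACGT"]
for record in dereplicate(reads):
    print(record)
```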
4 - Abundance sort and discard singletons <<<USING VSEARCH>>>
vsearch --sortbysize $PWD/derep.fa --output sorted.fa --minsize 2
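--sortbysize orders records by their ";size=N" annotation, and --minsize 2 drops singletons (sequences observed only once, which are often sequencing errors). A sketch of the same logic, assuming labels in the ";size=N" format produced by the dereplication step:

```python
import re

def sort_discard_singletons(labels, minsize=2):
    """Sort ';size=N' labels by abundance, dropping those below minsize."""
    def size(label):
        return int(re.search(r"size=(\d+)", label).group(1))
    return sorted((l for l in labels if size(l) >= minsize),
                  key=size, reverse=True)

print(sort_discard_singletons(["A;size=1", "B;size=7", "C;size=3"]))
# ['B;size=7', 'C;size=3'] -- the singleton A is discarded
```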
5 - OTU clustering <<<USING VSEARCH>>>
vsearch --cluster_size $PWD/sorted.fa --consout otus.fa --id 0.97
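--cluster_size performs greedy centroid clustering: sequences are processed in decreasing abundance order, and each one either joins the first existing centroid it matches at >= 97% identity or founds a new cluster. A toy sketch of that idea (the identity function here is a crude positional stand-in for vsearch's alignment-based identity):

```python
def identity(a, b):
    """Fraction of matching positions; assumes equal-length, ungapped sequences."""
    if len(a) != len(b):
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_cluster(seqs, threshold=0.97):
    """Greedy centroid clustering over abundance-sorted sequences."""
    centroids = []
    for s in seqs:
        for c in centroids:
            if identity(s, c) >= threshold:
                break              # joins the cluster of centroid c
        else:
            centroids.append(s)    # no match: founds a new cluster
    return centroids

# The second sequence is 99% identical to the first, so it joins that cluster.
seqs = ["A" * 100, "A" * 99 + "T", "G" * 100]
print(len(greedy_cluster(seqs)))  # 2 centroids
```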
6 - Map reads back to OTU database <<<USING VSEARCH>>>
vsearch --usearch_global $PWD/reads.fa --db otus.fa --strand plus --id 0.97 --uc map.uc
7 - Assign taxonomy to OTUs using the uclust method on QIIME (use the file “otus.fa” as input file)
assign_taxonomy.py -i $PWD/otus.fa -o output
8 - Align sequences on QIIME, using greengenes reference sequences (use the file “otus.fa” as input file)
align_seqs.py -i $PWD/otus.fa -o rep_set_align
9 - Filter alignments on QIIME
filter_alignment.py -i $PWD/rep_set_align/otus_aligned.fasta -o filtered_alignment
10 - Make the reference tree on QIIME
make_phylogeny.py -i $PWD/filtered_alignment/otus_aligned_pfiltered.fasta -o rep_set.tre
11 - Convert map.uc to otu_table.txt <<<UPARSE PYTHON SCRIPT>>>
uc2otutab.py $PWD/map.uc > otu_table.txt
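uc2otutab.py turns the read-to-OTU mapping into a table of counts per OTU per sample, taking the sample name from the "barcodelabel=" tag that fastq_strip_barcode_relabel2.py wrote into each read label. A simplified re-implementation of the counting step (field positions follow the .uc format: "H" hit records, query label in column 9, target label in column 10; the example records are invented):

```python
from collections import defaultdict

def uc_to_counts(uc_lines):
    """Count reads per (OTU, sample) from the 'H' records of a .uc mapping."""
    table = defaultdict(lambda: defaultdict(int))
    for line in uc_lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] != "H":       # skip 'N' (no hit) and other record types
            continue
        query, otu = fields[8], fields[9]
        sample = query.split("barcodelabel=")[1].split(";")[0]
        table[otu][sample] += 1
    return table

uc = ["H\t0\t200\t99.0\t+\t0\t0\t200M\tEx_1;barcodelabel=SoilA;\tOTU_1",
      "H\t0\t200\t98.5\t+\t0\t0\t200M\tEx_2;barcodelabel=SoilA;\tOTU_1",
      "N\t0\t200\t*\t*\t*\t*\t*\tEx_3;barcodelabel=SoilB;\t*"]
print(uc_to_counts(uc)["OTU_1"]["SoilA"])  # 2
```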
12 - Convert otu_table.txt to otu_table.biom, used by QIIME <<<BIOM SCRIPT>>>
biom convert -i $PWD/otu_table.txt -o otu_table.biom --table-type="OTU table" --to-json
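For reference, the classic (tab-separated) otu_table.txt that biom convert expects is a plain text table with OTUs as rows and samples as columns, roughly like this (sample names and counts invented):

```
OTUId	SoilA	SoilB
OTU_1	120	45
OTU_2	8	0
```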
13 - Add metadata (taxonomy) to OTU table
biom add-metadata -i $PWD/otu_table.biom -o otu_table_tax.biom --observation-metadata-fp $PWD/output/otus_tax_assignments.txt --observation-header OTUID,taxonomy,confidence --sc-separated taxonomy --float-fields confidence
14 - Check OTU Table on QIIME.
biom summarize-table -i $PWD/otu_table_tax.biom -o results_biom_table
15 - Run diversity analyses on QIIME (or any other analysis of your choice). The parameter “-e” is the sequencing depth to use for even sub-sampling and maximum rarefaction depth. You should review the output of the ‘biom summarize-table’ (step 14) command to decide on this value.
core_diversity_analyses.py -i $PWD/otu_table_tax.biom -m $PWD/mapping_file.txt -t $PWD/rep_set.tre -e xxxx -o $PWD/core_output
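Samples with fewer reads than the depth given to “-e” are excluded from the rarefied analyses, so the choice is a trade-off between depth and sample retention. One common heuristic, sketched below, is to ignore samples under some minimum acceptable count and then rarefy to the shallowest remaining sample (the function, threshold, and counts here are illustrative, not part of QIIME):

```python
def even_sampling_depth(sample_counts, min_acceptable=1000):
    """Largest depth keeping every sample that has at least min_acceptable reads.

    Samples below min_acceptable are assumed too shallow and sacrificed;
    the depth is then the smallest count among the remaining samples.
    """
    kept = [n for n in sample_counts.values() if n >= min_acceptable]
    return min(kept) if kept else None

# Per-sample totals as reported by 'biom summarize-table' (step 14).
counts = {"SoilA": 25130, "SoilB": 18902, "SoilC": 731}
print(even_sampling_depth(counts))  # 18902: SoilC (731 reads) is dropped
```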