
CLUSTERING-BASED METHOD
(Ion Torrent data)

BMP pipeline

Reference: Pylro et al. Data analysis for 16S microbial profiling from different benchtop sequencing platforms. Journal of Microbiological Methods, v. 107, p. 30-37, 2014. DOI: 10.1016/j.mimet.2014.08.018.

This example assumes reads in FASTQ format.

 

Throughout the BMP pipeline you will need the forward primer sequence (it is GGACTACNNGGGTNTCTAAT in the example below – change degenerate bases to “N”), and you will also need to prepare a FASTA file called “barcodes.fa” containing the barcodes that identify your samples (see the example below). The FASTA label for each barcode should be a short name identifying the sample.
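For reference, “barcodes.fa” is a plain FASTA file with one record per sample; the sample names and barcode sequences below are placeholders only, so substitute the barcodes used in your own run:

>SampleA
CTAAGGTAAC
>SampleB
TAAGGAGAAC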

 

This page gives a complete pipeline to analyze 16S rRNA gene data. Of course, you should edit the commands as needed for your own reads and file locations (represented here as $PWD/).

 

The pipeline starts from non-demultiplexed Ion Torrent .fastq files (remember to keep the barcode sequences).

 

To obtain the non-demultiplexed (raw data) .fastq files from the Ion Torrent server:

- Click on "Reanalyze"

- Then click on "Analysis settings" 

- On "Barcode Set" select  "None or RNA_Barcode_None"

 

 

1 - Strip barcodes ("Ex" is a prefix for the read labels; it can be anything you like) <<<UPARSE SCRIPT>>>

 

fastq_strip_barcode_relabel2.py $PWD/reads.fastq GGACTACNNGGGTNTCTAAT $PWD/barcodes.fa Ex > reads2.fastq
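As an optional sanity check (not part of the original BMP steps), you can count how many reads survived barcode stripping – each FASTQ record spans four lines:

echo $(( $(wc -l < reads2.fastq) / 4 ))

If the reads were relabeled with a “barcodelabel=” tag, as the UPARSE relabel scripts normally do, you can also count reads per sample:

grep -o 'barcodelabel=[^;]*' reads2.fastq | sort | uniq -c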

 

2 - Quality filtering, length truncation, and conversion to FASTA <<<USING VSEARCH>>>

 

vsearch --fastx_filter $PWD/reads2.fastq --fastq_maxee 1.0 --fastq_trunclen 200 --fastaout reads.fa
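If you are unsure which truncation length or maxee value suits your run, vsearch can report per-position expected-error statistics before filtering (an optional step; the output file name here is arbitrary – check the manual of your vsearch version for the exact option names):

vsearch --fastq_eestats $PWD/reads2.fastq --output reads2_eestats.txt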

 

3 - Dereplication <<<USING VSEARCH>>>

 

vsearch --derep_fulllength $PWD/reads.fa --output derep.fa --sizeout
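To see how many unique sequences were kept (each FASTA header now carries a size= annotation because of --sizeout):

grep -c '^>' derep.fa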

 

4 - Abundance sort and discard singletons <<<USING VSEARCH>>>

vsearch --sortbysize $PWD/derep.fa --output sorted.fa --minsize 2

5 - OTU clustering <<<USING VSEARCH>>>

 

vsearch --cluster_size $PWD/sorted.fa --consout otus.fa --id 0.97

6 - Map reads back to the OTU database <<<USING VSEARCH>>>

 

vsearch --usearch_global $PWD/reads.fa --db otus.fa --strand plus --id 0.97 --uc map.uc
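To check what fraction of the reads mapped back to an OTU, tally the record types in the first column of the UC file (H = hit, N = no hit):

cut -f 1 map.uc | sort | uniq -c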

 

7 - Assign taxonomy to OTUs using the uclust method on QIIME (use the file “otus.fa” as the input file)

 

assign_taxonomy.py -i $PWD/otus.fa -o output

8 - Align sequences on QIIME, using the Greengenes reference sequences (use the file “otus.fa” as the input file)

 

align_seqs.py -i $PWD/otus.fa -o rep_set_align

 

9 - Filter alignments on QIIME

 

filter_alignment.py -i $PWD/otus_aligned.fasta -o filtered_alignment

 

10 - Make the reference tree on QIIME

 

make_phylogeny.py -i $PWD/otus_aligned_pfiltered.fasta -o rep_set.tre

 

11 - Convert the UC file to otu_table.txt <<<UPARSE PYTHON SCRIPT>>>

 

uc2otutab.py $PWD/map.uc > otu_table.txt
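It is worth a quick look at the table before converting it to BIOM – the first column should hold the OTU identifiers and the remaining columns the counts per sample:

head -n 5 otu_table.txt | cut -f 1-5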

 

12 - Convert otu_table.txt to otu_table.biom, used by QIIME <<<BIOM SCRIPT>>>

biom convert -i $PWD/otu_table.txt -o otu_table.biom --table-type="OTU table" --to-json
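If you want to confirm that the conversion worked, the BIOM file can be converted back to a tab-separated table and inspected (optional; the output name is arbitrary):

biom convert -i $PWD/otu_table.biom -o otu_table_check.txt --to-tsv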


13 - Add metadata (the taxonomy assignments produced in step 7) to the OTU table

biom add-metadata -i $PWD/otu_table.biom -o otu_table_tax.biom --observation-metadata-fp $PWD/otus_tax_assignments.txt --observation-header OTUID,taxonomy,confidence --sc-separated taxonomy --float-fields confidence


14 - Check the OTU table on QIIME.

biom summarize-table -i $PWD/otu_table_tax.biom -o results_biom_table
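The summary written to “results_biom_table” is plain text; the per-sample counts reported near the top are what you need when choosing the rarefaction depth in the next step:

head -n 20 results_biom_table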


15 - Run diversity analyses on QIIME (or any other analysis of your choice). The parameter “-e” is the sequencing depth to use for even sub-sampling and maximum rarefaction depth. You should review the output of the ‘biom summarize-table’ (step 14) command to decide on this value.

 

core_diversity_analyses.py -i $PWD/otu_table_tax.biom -m $PWD/mapping_file.txt -t $PWD/rep_set.tre -e xxxx -o $PWD/core_output
