Clustering methods | Brazilian Microbiome Project

CLUSTERING-BASED METHOD
(16S Illumina: BTW/WSL)

BMP pipeline

BTW tutorial: https://github.com/vpylro/BTW/blob/master/README.md

References:

Pylro et al. (2014), Data analysis for 16S microbial profiling from different benchtop sequencing platforms. Journal of Microbiological Methods 107:30-37. DOI: 10.1016/j.mimet.2014.08.018.

Morais et al. (2018), BTW—Bioinformatics Through Windows: an easy-to-install package to analyze marker gene data. PeerJ 6:e5299. DOI: 10.7717/peerj.5299.

This example assumes reads in FASTQ format.

This page gives a complete pipeline for analyzing 16S rRNA gene data. Adapt the commands to your own reads and file locations (represented here as $PWD/).

Create a working folder that holds your raw reads in raw/ plus three empty folders: fastq/, fasta/ and demul/.

1 - Join the forward and reverse Illumina reads (R1.fastq and R2.fastq files) of each sample with the fastq-join method <<<USING QIIME 1.9>>>

multiple_join_paired_ends.py -i raw/ -o merged/
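Step 1 writes the joined reads to merged/, while step 2 reads them from fastq/. The sketch below bridges the two, assuming that multiple_join_paired_ends.py creates one subfolder per sample containing fastqjoin.join.fastq. The helper name collect_joined is hypothetical, and the subfolder names are used as sample names, so check them and rename any that carry read-direction suffixes or special characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BMP or QIIME): copy each sample's joined
# reads to <out_dir>/<sample>.fastq, the layout step 2 expects.
# Assumes the layout <merged_dir>/<sample>/fastqjoin.join.fastq.
collect_joined() {
  local merged_dir=$1 out_dir=$2 joined sample
  mkdir -p "$out_dir"
  for joined in "$merged_dir"/*/fastqjoin.join.fastq; do
    [ -e "$joined" ] || continue                 # glob matched nothing
    sample=$(basename "$(dirname "$joined")")   # subfolder name = sample name
    cp "$joined" "$out_dir/$sample.fastq"
  done
}
```

Usage: collect_joined merged fastq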

2 - Quality-filter each joined sample, truncate the reads to 350 bases (shorter reads are discarded) and convert them to FASTA <<<USING VSEARCH>>>

for f in fastq/*.fastq; do s=$(basename "$f" .fastq); vsearch --fastx_filter "$f" --fastq_maxee 1.0 --fastq_trunclen 350 --fastaout "fasta/$s.fa"; done
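The --fastq_maxee 1.0 option discards reads whose expected number of errors exceeds 1. The expected errors of a read are the sum of the error probabilities implied by its Phred scores, E = Σ 10^(−Q/10). The awk sketch below (hypothetical helper expected_errors, assuming Phred+33 quality encoding, which is vsearch's default) prints that value for every read, which helps when choosing a threshold. It works on the full read; vsearch trims before filtering, so with --fastq_trunclen 350 its values refer to the first 350 bases only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<read id><TAB><expected errors>" for each record
# of a 4-lines-per-record FASTQ file, assuming Phred+33 quality encoding.
expected_errors() {
  awk '
    BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }  # char -> ASCII code
    NR % 4 == 1 { id = substr($0, 2) }                              # header without "@"
    NR % 4 == 0 {                                                   # quality line
      ee = 0
      for (j = 1; j <= length($0); j++) {
        q = ord[substr($0, j, 1)] - 33                              # Phred score
        ee += 10 ^ (-q / 10)                                        # error probability
      }
      printf "%s\t%.4f\n", id, ee
    }
  ' "$1"
}
```

For example, expected_errors fastq/sample1.fastq | awk -F'\t' '$2 <= 1.0' | wc -l counts the untruncated reads that would pass a 1.0 threshold.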

3 - Relabel the sequence headers so that the files are compatible with the following steps <<<USING BMP PERL SCRIPT>>>. For each sample, this script writes a converted FASTA file whose headers carry the sample name passed with -b. Sample names should not contain special characters, symbols or spaces; we strongly recommend keeping them as simple as possible.

for f in fasta/*.fa; do s=$(basename "$f" .fa); bmp_demultiplexed.pl -i "$f" -o "demul/$s.fa" -b "$s"; done
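Problematic sample names can be caught before running the loop above by testing each file name in fasta/ against a conservative pattern. The helper below is hypothetical; it assumes that "simple" means letters, digits and periods only (the characters QIIME 1.9 accepts in sample IDs) and reports every name that fails.

```shell
#!/usr/bin/env bash
# Hypothetical check: report sample names (file names minus the extension)
# that contain anything other than letters, digits or periods.
check_sample_names() {
  local dir=$1 f name status=0
  for f in "$dir"/*; do
    [ -e "$f" ] || continue
    name=$(basename "$f")
    name=${name%.*}                               # drop the extension (.fa, .fastq, ...)
    if [[ ! $name =~ ^[A-Za-z0-9.]+$ ]]; then
      echo "unsafe sample name: '$name'" >&2
      status=1
    fi
  done
  return "$status"
}
```

Usage: check_sample_names fasta || echo "rename the files listed above first"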

4 - Make a single file containing all your samples

cat demul/*.fa > reads.fa
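After concatenation, it is worth confirming that reads.fa holds every sequence from demul/, for example to catch an empty or truncated per-sample file. A minimal sketch, using a hypothetical helper check_concat and assuming one header line per sequence:

```shell
#!/usr/bin/env bash
# Hypothetical sanity check: the number of FASTA headers in the combined file
# should equal the sum over the per-sample files in <dir>.
check_concat() {
  local dir=$1 combined=$2 f expected=0 observed
  for f in "$dir"/*.fa; do
    [ -e "$f" ] || continue
    expected=$(( expected + $(grep -c '^>' "$f") ))
  done
  observed=$(grep -c '^>' "$combined")
  echo "per-sample total: $expected, combined file: $observed"
  [ "$expected" -eq "$observed" ]                 # exit status 0 only if they agree
}
```

Usage: check_concat demul reads.fa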

5 - Dereplication <<<USING VSEARCH>>>

vsearch --derep_fulllength reads.fa --output derep.fa --sizeout
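Dereplication collapses identical sequences into a single representative and, with --sizeout, records the number of copies in its header as ;size=N. The awk sketch below illustrates the idea on a FASTA file with one sequence line per record (an assumption). It is a toy, not vsearch's implementation: it keeps sequences in order of first appearance and compares them case-insensitively.

```shell
#!/usr/bin/env bash
# Toy dereplication of a single-line FASTA file: identical sequences
# (case-insensitive) are collapsed and the copy number is appended to the
# label of the first read that carried them, as ";size=N".
derep_sketch() {
  awk '
    /^>/ { label = substr($0, 2); next }
    {
      seq = toupper($0)
      if (!(seq in count)) { first[seq] = label; order[++n] = seq }
      count[seq]++
    }
    END {
      for (i = 1; i <= n; i++) {
        s = order[i]
        printf ">%s;size=%d\n%s\n", first[s], count[s], s
      }
    }
  ' "$1"
}
```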

6 - Abundance sort and discard singletons <<<USING VSEARCH>>>

vsearch --sortbysize derep.fa --output sorted.fa --minsize 2
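Sorting by abundance and discarding singletons relies only on the ;size=N annotations written during dereplication. As a rough illustration with standard Unix tools (hypothetical helper sort_min_size, assuming one sequence line per record), the filter keeps records with size of at least 2 and orders them by decreasing size.

```shell
#!/usr/bin/env bash
# Illustration of --sortbysize --minsize: keep records whose ";size=N" is at
# least MINSIZE (default 2) and sort them by decreasing N.
# Assumes a FASTA file with one sequence line per record.
sort_min_size() {
  local fasta=$1 minsize=${2:-2}
  paste - - < "$fasta" |                                 # header<TAB>sequence per line
    awk -F'\t' -v min="$minsize" '
      match($1, /;size=[0-9]+/) {
        size = substr($1, RSTART + 6, RLENGTH - 6) + 0   # digits after ";size="
        if (size >= min) printf "%d\t%s\t%s\n", size, $1, $2
      }' |
    sort -s -t $'\t' -k1,1nr |                           # stable, numeric, descending
    awk -F'\t' '{ printf "%s\n%s\n", $2, $3 }'          # back to two-line FASTA
}
```

Usage: sort_min_size derep.fa 2 > sorted_sketch.fa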

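7 - OTU clustering at 97% identity <<<USING VSEARCH>>>. vsearch --cluster_size sorts the sequences by decreasing abundance and clusters them greedily, an approach similar to the UPARSE method; unlike USEARCH's UPARSE-OTU algorithm, it does not filter chimeras during clustering.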

vsearch --cluster_size sorted.fa --consout otus1.fa --id 0.97
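With --id 0.97, a sequence joins an existing cluster only if its identity with the centroid is at least 97%. vsearch computes identity from a pairwise alignment and, by default (--iddef 2), divides the number of matching columns by the alignment length excluding terminal gaps. For two ungapped 350-base reads, this allows at most 10 differences, since 340/350 ≈ 0.971 while 339/350 ≈ 0.969. The awk sketch below (hypothetical helper pairwise_identity) applies that definition to two sequences that are already aligned; it does not perform the alignment itself.

```shell
#!/usr/bin/env bash
# Identity of two pre-aligned sequences ("-" = gap): matching columns divided
# by the alignment length without terminal gaps (the --iddef 2 definition).
pairwise_identity() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = length(a)
    if (n != length(b)) { print "sequences must be aligned (same length)" > "/dev/stderr"; exit 1 }
    first = 0; last = 0
    for (i = 1; i <= n; i++)                       # columns where neither sequence has a gap
      if (substr(a, i, 1) != "-" && substr(b, i, 1) != "-") { if (!first) first = i; last = i }
    if (!first) { print "0.0000"; exit 0 }
    matches = 0
    for (i = first; i <= last; i++)                # internal gaps stay in the denominator
      if (substr(a, i, 1) != "-" && substr(a, i, 1) == substr(b, i, 1)) matches++
    printf "%.4f\n", matches / (last - first + 1)
  }'
}
```

Usage: pairwise_identity AC-GTACGT ACAGTACGA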

8 - Unwrap the OTU sequences so that each occupies a single line <<<FASTX TOOLKIT SCRIPT>>>

fasta_formatter -i otus1.fa -o formated_otus1.fa
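Without -w, fasta_formatter writes each sequence on a single line (its default width of 0 disables wrapping), whereas vsearch wraps FASTA output at 80 columns unless told otherwise; vsearch can also skip the wrapping with --fasta_width 0. If FASTX-Toolkit is not installed, the following awk sketch (hypothetical helper unwrap_fasta) performs the same unwrapping.

```shell
#!/usr/bin/env bash
# Join wrapped sequence lines so every FASTA record is a header plus one sequence line.
unwrap_fasta() {
  awk '
    /^>/ { if (seq != "") print seq; print; seq = ""; next }   # flush previous record
    { seq = seq $0 }                                             # accumulate sequence lines
    END { if (seq != "") print seq }
  ' "$1"
}
```

Usage: unwrap_fasta otus1.fa > formated_otus1.fa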

9 - Rename the OTUs with short identifiers <<<BMP SCRIPT>>>

bmp-otuName.pl -i formated_otus1.fa -o otus.fa
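Renaming replaces the labels written by vsearch with short identifiers, which then become the OTU IDs in the final table. As a stand-in for bmp-otuName.pl, the awk sketch below numbers the records sequentially as OTU_1, OTU_2, and so on; this label format is an assumption, and the BMP script's exact output may differ.

```shell
#!/usr/bin/env bash
# Hypothetical renamer: replace every FASTA header with OTU_<n>, n = 1, 2, ...
rename_otus() {
  awk '/^>/ { printf ">OTU_%d\n", ++n; next } { print }' "$1"
}
```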

10 - Map all reads, including singletons, back to the OTU sequences <<<USING VSEARCH>>>

vsearch --usearch_global $PWD/reads.fa --db otus.fa --strand plus --id 0.97 --uc map.txt
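In the UC file written by --uc, each query read produces a tab-separated line whose first field is H (hit, with the matched OTU in field 10) or N (no hit at the chosen --id). The hypothetical helper below relies on that layout to report how many reads were mapped, a quick check on how much of the data the OTU table will represent.

```shell
#!/usr/bin/env bash
# Summarize a UC file from --usearch_global: mapped (H) vs unmapped (N) reads.
uc_summary() {
  awk -F'\t' '
    $1 == "H" { hit++ }
    $1 == "N" { miss++ }
    END {
      total = hit + miss
      printf "mapped\t%d\nunmapped\t%d\n", hit, miss
      if (total > 0) printf "mapped_pct\t%.1f\n", 100 * hit / total
    }
  ' "$1"
}
```

Usage: uc_summary map.txt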

11 - Assign taxonomy to the OTUs with the RDP Classifier in QIIME, using otus.fa as the input file. The assignments are written to taxonomy/otus_tax_assignments.txt.

assign_taxonomy.py -i otus.fa -m rdp -o taxonomy
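The assignments file lists one OTU per line, with the OTU ID, the semicolon-separated taxonomy string and the classifier confidence separated by tabs. Assuming Greengenes-style rank prefixes (p__ for phylum), the hypothetical helper below counts OTUs per phylum as a first look at the classification.

```shell
#!/usr/bin/env bash
# Count OTUs per phylum in a taxonomy assignment file
# (OTU ID <TAB> taxonomy <TAB> confidence), assuming "p__" phylum prefixes.
otus_per_phylum() {
  awk -F'\t' '
    {
      phylum = "Unassigned"
      n = split($2, ranks, /; */)
      for (i = 1; i <= n; i++)
        if (ranks[i] ~ /^p__./) { phylum = ranks[i]; break }   # skip empty "p__"
      count[phylum]++
    }
    END { for (p in count) printf "%s\t%d\n", p, count[p] }
  ' "$1" | LC_ALL=C sort -t $'\t' -k2,2nr -k1,1
}
```

Usage: otus_per_phylum taxonomy/otus_tax_assignments.txt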

12 - Convert the UC file (map.txt) into a QIIME OTU map, saved here as otu_table.txt <<<BMP SCRIPT>>>

bmp-map2qiime.py map.txt > otu_table.txt
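make_otu_table.py expects a QIIME OTU map: one line per OTU, with the OTU ID followed by the IDs of its member sequences, all tab-separated. The awk sketch below builds such a map from the H (hit) lines of a UC file, taking field 10 as the OTU and field 9 as the read. bmp-map2qiime.py may additionally rewrite the read labels, so treat this hypothetical helper as an illustration rather than a replacement.

```shell
#!/usr/bin/env bash
# Build a QIIME-style OTU map (OTU<TAB>read1<TAB>read2...) from a UC file,
# listing OTUs in order of first appearance.
uc_to_otu_map() {
  awk -F'\t' '
    $1 == "H" {
      if (!($10 in members)) order[++n] = $10     # remember first appearance
      members[$10] = members[$10] "\t" $9         # append read ID
    }
    END { for (i = 1; i <= n; i++) print order[i] members[order[i]] }
  ' "$1"
}
```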

13 - Build the BIOM OTU table (otu_table.biom) from the OTU map and the taxonomy assignments <<<QIIME SCRIPT>>>

make_otu_table.py -i otu_table.txt -t taxonomy/otus_tax_assignments.txt -o otu_table.biom
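make_otu_table.py derives per-sample counts by reading the sample name from each sequence ID. The hypothetical helper below reproduces that counting step as plain text (OTU, sample, count), which can help when a sample seems to be missing from the BIOM table. It assumes IDs of the form SampleID_N, with the sample name being everything before the last underscore.

```shell
#!/usr/bin/env bash
# Count reads per OTU and sample from an OTU map, assuming IDs "SampleID_N".
# Output: OTU<TAB>sample<TAB>count, sorted.
otu_map_to_counts() {
  awk -F'\t' '
    {
      for (i = 2; i <= NF; i++) {
        id = $i
        sub(/_[^_]*$/, "", id)        # sample name = text before the last underscore
        count[$1 "\t" id]++
      }
    }
    END { for (k in count) print k "\t" count[k] }
  ' "$1" | LC_ALL=C sort
}
```

Usage: otu_map_to_counts otu_table.txt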

14 - Check the OTU table: summarize the number of sequences per sample with the biom command-line tool installed with QIIME.

biom summarize-table -i $PWD/otu_table.biom -o results_biom_table.txt

The generated .biom OTU table is also fully compatible with the MicrobiomeAnalyst, a user-friendly web-based platform for microbiome data analyses and visualizations, including taxonomy plots and estimates of α- and β-diversity (http://www.microbiomeanalyst.ca).
