You may not need MOSCA's entire workflow. Below are some examples of tasks that are better performed by running parts of MOSCA separately. The commands assume that MOSCA was installed as instructed.
Preprocess NGS reads
MOSCA's preprocessing script can be used standalone, as it automatically downloads all resources required.
python ~/anaconda3/envs/mosca/share/MOSCA/scripts/preprocess.py -i {your input reads (e.g. mg_R1.fq,mg_R2.fq)} -t {number of threads} -o {output directory} -adaptdir {resources directory}/adapters -rrnadbs {resources directory}/rRNA_databases -d {data_type (either "dna" or "mrna")} -rd {resources directory} -n --minlen {minimum length of reads to keep} --avgqual {minimum average quality of reads to keep}
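When the same preprocessing call has to be repeated for several datasets, it can help to assemble the command programmatically. The following is a minimal sketch, not part of MOSCA: `build_preprocess_command` is a hypothetical helper, the flags are copied from the command above, and the example values (8 threads, minimum length 100, minimum average quality 20, directory names) are placeholders rather than recommendations.

```python
import os
import shlex

# Script location as given in the command above; adjust it if your conda prefix differs.
PREPROCESS = os.path.expanduser(
    "~/anaconda3/envs/mosca/share/MOSCA/scripts/preprocess.py")


def build_preprocess_command(reads, threads, output, resources, data_type,
                             minlen, avgqual):
    """Return the preprocess.py call as an argument list (hypothetical helper)."""
    if data_type not in ("dna", "mrna"):
        raise ValueError('data_type must be either "dna" or "mrna"')
    return [
        "python", PREPROCESS,
        "-i", ",".join(reads),          # comma-separated input reads
        "-t", str(threads),
        "-o", output,
        "-adaptdir", os.path.join(resources, "adapters"),
        "-rrnadbs", os.path.join(resources, "rRNA_databases"),
        "-d", data_type,
        "-rd", resources,
        "-n",                           # kept exactly as in the command above
        "--minlen", str(minlen),
        "--avgqual", str(avgqual),
    ]


# Placeholder values, for illustration only.
cmd = build_preprocess_command(
    ["mg_R1.fq", "mg_R2.fq"], 8, "output", "resources", "dna", 100, 20)
print(shlex.join(cmd))
# To actually run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell quoting problems when paths contain spaces.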
Run MOSCA without replicates
MOSCA's differential expression analysis module requires replicates. The rest of MOSCA's analysis can still be run without replicates by skipping that module and running the remaining steps individually:
- First, preprocess your datasets as explained above
- Join your reads by sample by running the following commands for each pair of "forward" and "reverse" files:
cat {forward_file} >> {output}/Preprocess/{sample}_forward.fastq
cat {reverse_file} >> {output}/Preprocess/{sample}_reverse.fastq
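The joining step can also be scripted. The sketch below mimics `cat file >> destination` in Python, writing forward reads to `{sample}_forward.fastq` and reverse reads to `{sample}_reverse.fastq`, the names the assembly step expects. `join_reads` and `join_sample` are hypothetical helpers, not part of MOSCA.

```python
import shutil
from pathlib import Path


def join_reads(files, destination):
    """Append every file in `files` to `destination`, like `cat file >> destination`."""
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    with open(destination, "ab") as out:
        for name in files:
            with open(name, "rb") as src:
                shutil.copyfileobj(src, out)


def join_sample(output, sample, forward_files, reverse_files):
    """Build the joined forward and reverse files of one sample."""
    preprocess = Path(output) / "Preprocess"
    join_reads(forward_files, preprocess / f"{sample}_forward.fastq")
    join_reads(reverse_files, preprocess / f"{sample}_reverse.fastq")
```

Because the files are opened in append mode, as with `>>`, running the join twice duplicates the reads. Delete the joined files before rerunning.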
- Perform assembly by running the following for each sample:
python ~/anaconda3/envs/mosca/share/MOSCA/scripts/assembly.py -r {output}/Preprocess/{sample}_forward.fastq,{output}/Preprocess/{sample}_reverse.fastq -t {threads} -o {output}/Assembly/{sample} -a {assembler (either "metaspades" or "megahit")} -m {max_memory}
- Optionally, perform binning by running the following for each sample:
python ~/anaconda3/envs/mosca/share/MOSCA/scripts/binning.py -c {output}/Assembly/{sample}/contigs.fasta -t {threads} -o {output}/Binning/{sample} -r {output}/Preprocess/{sample}_forward.fastq,{output}/Preprocess/{sample}_reverse.fastq -mset {markerset (either "107" or "40")}
- Perform gene calling and annotation over the contigs by running the following for each sample:
python ~/anaconda3/envs/mosca/share/MOSCA/scripts/annotation.py -i {output}/Assembly/{sample}/contigs.fasta -t {threads} -o {output}/Annotation/{sample} -em {error_model} -db {path/to/diamond_database.(fasta/dmnd)} -mts {diamond_max_target_seqs} --assembled
- Run UPIMAPI for each sample
upimapi.py -i {output}/Annotation/{sample}/aligned.blast -o {output}/Annotation/uniprotinfo --blast --full-id
- Run reCOGnizer for each sample
recognizer.py -f {output}/Annotation/{sample}/fgs.faa -t {threads} -o {output}/Annotation/{sample} -rd {path/to/resources_directory} --remove-spaces
- Run quantification, all at once
python ~/anaconda3/envs/mosca/share/MOSCA/scripts/quantification_analyser.py -e {path/to/experiments_file} -t {threads} -o {output} -if {input_format_of_experiments_file ("excel" or "tsv")}
- Join all information
python ~/anaconda3/envs/mosca/share/MOSCA/scripts/join_information.py -e {path/to/experiments_file} -t {threads} -o {output} -if {input_format_of_experiments_file ("excel" or "tsv")} -nm {normalization_method ("TMM" or "RLE")}
- Run KEGGCharter
kegg_charter.py -f {output}/MOSCA_Entry_Report.xlsx -o {output}/KEGG_maps -mm {metabolic_maps comma-separated (e.g. 00030,00680,...)} -gcol {mg_names comma-separated} -tcol {mt_names comma-separated} -tc 'Taxonomic lineage ({taxa_level})' -not {number_of_taxa} -keggc 'Cross-reference (KEGG)'
- Run final reporting
python ~/anaconda3/envs/mosca/share/MOSCA/scripts/report.py -e {path/to/experiments_file} -o {output} -ldir ~/anaconda3/envs/mosca/share/MOSCA/resources -if {input_format_of_experiments_file ("excel" or "tsv")}
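Several of the steps above are repeated "for each sample". As a rough sketch of how that loop could be scripted, the code below generates the assembly and binning commands per sample. It is not part of MOSCA: `per_sample_commands` is a hypothetical helper, the flags and paths are copied from the commands above, and the sample names, thread count, and memory value are placeholders. The other per-sample steps (annotation, UPIMAPI, reCOGnizer) could be added following the same pattern.

```python
import os
import shlex

# Scripts directory as used in the commands above; adjust it if your conda prefix differs.
SCRIPTS = os.path.expanduser("~/anaconda3/envs/mosca/share/MOSCA/scripts")


def per_sample_commands(output, sample, threads, assembler, max_memory, markerset):
    """Return the assembly and binning calls for one sample (hypothetical helper)."""
    # Joined read files of the sample, in the comma-separated form both scripts take.
    reads = ",".join(
        f"{output}/Preprocess/{sample}_{direction}.fastq"
        for direction in ("forward", "reverse"))
    assembly = [
        "python", f"{SCRIPTS}/assembly.py",
        "-r", reads,
        "-t", str(threads),
        "-o", f"{output}/Assembly/{sample}",
        "-a", assembler,                 # "metaspades" or "megahit"
        "-m", str(max_memory),
    ]
    binning = [
        "python", f"{SCRIPTS}/binning.py",
        "-c", f"{output}/Assembly/{sample}/contigs.fasta",
        "-t", str(threads),
        "-o", f"{output}/Binning/{sample}",
        "-r", reads,
        "-mset", str(markerset),         # "107" or "40"
    ]
    return [assembly, binning]


# Placeholder sample names and settings, for illustration only.
for sample in ["sample1", "sample2"]:
    for command in per_sample_commands("output", sample, 8, "megahit", 64, "40"):
        print(shlex.join(command))
        # To actually run it: import subprocess; subprocess.run(command, check=True)
```

Printing the commands first makes it easy to inspect them before committing to long-running assemblies.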