## Base arguments for running MOSCA

MOSCA accepts input from a config file, in either JSON or YAML format.
This repo provides an example [config file](https://github.com/iquasere/MOSCA/blob/development/config/config.json),
which can be used with MOSCA as follows:

```
python mosca.py --configfile config.json
```

The config file lets you customize MOSCA's workflow, but for the convenience of users, many typical decisions in
metagenomics (MG) and metatranscriptomics (MT) workflows are already automated. Customization is therefore limited to
steps that are not yet well established in the field of MG (e.g. assembling data into contigs is still a controversial
step that may lose information).

The following options are available in the config file, along with their accepted values:

| Parameter | Options | Required | Description |
|:---:|:---:|:---:|---|
| output | String | Yes | Name of folder where MOSCA's results will be stored (if it doesn't exist, it will be created) |
| threads | Int | Yes | Maximum number of threads for MOSCA to use |
| experiments | String | Yes | Name of TSV file with information on samples/files/conditions |
| trimmomatic_adapters_directory | String | Yes | Name of folder containing adapters for Trimmomatic's adapter removal preprocessing tool |
| rrna_databases_directory | String | Yes | Name of folder containing rRNA databases to use as reference for rRNA removal with SortMeRNA |
| assembler | metaspades, megahit | Yes | Name of assembler to use for iterative co-assembly of MG data |
| markerset | 40, 107 | Yes | Name of markerset to use for completeness/contamination estimation with CheckM over the contigs obtained with MaxBin2 |
| error_model | sanger_5, sanger_10, 454_10, 454_30, illumina_5, illumina_10 | No | Name of file to use as the error model for gene calling with FragGeneScan. Use a sanger, 454 or illumina model if Sanger, pyrosequencing or Illumina reads, respectively, are the input to gene calling. Leave empty if assembly was performed. |
| diamond_database | String | Yes | Name of FASTA or DMND (DIAMOND formatted database) file to use as input for annotation with DIAMOND |
| download_uniprot | TRUE, FALSE | Yes | Whether UniProtKB (SwissProt + TrEMBL) should be downloaded. If TRUE, it will be downloaded to the folder indicated in diamond_database |
| diamond_max_target_seqs | Int | Yes | Number of matches to report for each protein from annotation with DIAMOND |
| recognizer_databases_directory | String | Yes | Name of folder containing the resources for reCOGnizer annotation. If those are not present in the folder, they will be downloaded |
| normalization_method | TMM, RLE | Yes | Method to use for normalization |
| keggcharter_maps | Comma-separated list of KEGG maps' IDs | No | If empty, KEGGCharter will use the default prokaryotic maps. These metabolic maps will have MG information represented in them, and gene expression if MT data is available |
| keggcharter_taxa_level | SPECIES, GENUS, FAMILY, ORDER, CLASS, PHYLUM, SUPERKINGDOM | Yes | The taxonomic level to represent with KEGGCharter. If above SPECIES, KEGGCharter will group the information and represent it as such for each taxonomic group |
| keggcharter_number_of_taxa | Int, ideally under 11 | Yes | How many of the most abundant taxa should be represented with KEGGCharter |
| reporter_lists_directory | String | Yes | Name of folder containing lists for reporter module of MOSCA |
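
As an illustration of the options above, the following is a minimal Python sketch that builds a config, checks it against the table, and serializes it to JSON. It is not part of MOSCA: `check_config`, `REQUIRED` and `ALLOWED` are hypothetical helpers written for this example, all paths and values in `example_config` are placeholders, and the way values are encoded (for instance `download_uniprot` as the string `"FALSE"`) is an assumption that should be compared with the repo's own `config.json`.

```python
import json

# Parameters marked "Yes" in the Required column of the table above.
REQUIRED = [
    "output", "threads", "experiments", "trimmomatic_adapters_directory",
    "rrna_databases_directory", "assembler", "markerset", "diamond_database",
    "download_uniprot", "diamond_max_target_seqs",
    "recognizer_databases_directory", "normalization_method",
    "keggcharter_taxa_level", "keggcharter_number_of_taxa",
    "reporter_lists_directory",
]

# Parameters with a fixed set of options, copied from the Options column.
# Values are compared as strings (an assumption of this sketch).
ALLOWED = {
    "assembler": {"metaspades", "megahit"},
    "markerset": {"40", "107"},
    "error_model": {"sanger_5", "sanger_10", "454_10", "454_30",
                    "illumina_5", "illumina_10"},
    "download_uniprot": {"TRUE", "FALSE"},
    "normalization_method": {"TMM", "RLE"},
    "keggcharter_taxa_level": {"SPECIES", "GENUS", "FAMILY", "ORDER",
                               "CLASS", "PHYLUM", "SUPERKINGDOM"},
}


def check_config(config):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    for name in REQUIRED:
        if config.get(name) in (None, ""):
            problems.append(f"missing required parameter: {name}")
    for name, allowed in ALLOWED.items():
        value = config.get(name)
        if value in (None, ""):
            continue  # empty optional values are allowed; required ones are reported above
        if str(value) not in allowed:
            problems.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    return problems


# Placeholder values for illustration only.
example_config = {
    "output": "mosca_output",
    "threads": 8,
    "experiments": "experiments.tsv",
    "trimmomatic_adapters_directory": "resources/adapters",
    "rrna_databases_directory": "resources/rRNA_databases",
    "assembler": "metaspades",
    "markerset": 40,
    "error_model": "",            # left empty because assembly is performed
    "diamond_database": "resources/uniprot.dmnd",
    "download_uniprot": "FALSE",
    "diamond_max_target_seqs": 1,
    "recognizer_databases_directory": "resources/recognizer",
    "normalization_method": "TMM",
    "keggcharter_maps": "",       # empty: default prokaryotic maps
    "keggcharter_taxa_level": "GENUS",
    "keggcharter_number_of_taxa": 10,
    "reporter_lists_directory": "resources/reporter_lists",
}

problems = check_config(example_config)
config_text = json.dumps(example_config, indent=2)
```

If `problems` is empty, `config_text` can be written to a file such as `config.json` and passed to `mosca.py` with `--configfile`.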