scflow: Parameters

Define where the pipeline should find input data and save output data.

The .tsv file specifying sample matrix filepaths.

type: string

default: ./refs/Manifest.txt

The .tsv file specifying sample metadata.

type: string

default: ./refs/SampleSheet.tsv

Optional tsv file containing mappings between ensembl_gene_ids and gene_names.

type: string

default: https://raw.githubusercontent.com/nf-core/test-datasets/scflow/assets/ensembl_mappings.tsv

Cell-type annotations reference file path

type: string

default: https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/28033407/ctd_v1.zip

This is a zip file containing cell-type annotation reference files for the EWCE package.

Optional tsv file specifying manual revisions of cell-type annotations.

type: string

default: ./conf/celltype_mappings.tsv

Optional list of genes of interest in YML format for plotting of gene expression.

type: string

default: ./conf/reddim_genes.yml

Input sample species.

type: string

default: human

Currently, "human" and "mouse" are supported.

Outputs directory.

type: string

default: ./results

Parameters for quality-control and thresholding.

The sample sheet column name with unique sample identifiers.

type: string

default: manifest

The sample sheet variables to treat as factors.

type: string

default: seqdate

Specify here, separated by commas, all sample sheet columns that contain numbers but should be treated as factors. Examples include columns with dates, numeric sample identifiers, etc.

Minimum library size (counts) per cell.

type: integer

default: 250

Maximum library size (counts) per cell.

type: string

default: adaptive

Minimum features (expressive genes) per cell.

type: integer

default: 100

Maximum features (expressive genes) per cell.

type: string

default: adaptive

Minimum proportion of counts mapping to ribosomal genes.

type: number

Maximum proportion of counts mapping to ribosomal genes.

type: number

default: 1

Maximum proportion of counts mapping to mitochondrial genes.

type: string

default: adaptive

Minimum counts for gene expressivity.

type: integer

default: 2

Expressive genes must have >=min_counts in >=min_cells

Minimum cells for gene expressivity.

type: integer

default: 2

Expressive genes must have >=min_counts in >=min_cells
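
The expressivity rule above can be sketched in a few lines of Python. This is only an illustration of the ">=min_counts in >=min_cells" criterion, not the pipeline's own code; the function name `expressive_genes` and the toy matrix are hypothetical.

```python
def expressive_genes(counts, min_counts=2, min_cells=2):
    """Return row indices of genes with >= min_counts in >= min_cells cells.

    `counts` is a genes x cells matrix given as a list of rows
    (a hypothetical toy representation, not the pipeline's data structure).
    """
    keep = []
    for gene_index, row in enumerate(counts):
        # Count the cells in which this gene reaches the minimum count.
        cells_passing = sum(1 for value in row if value >= min_counts)
        if cells_passing >= min_cells:
            keep.append(gene_index)
    return keep


counts = [
    [0, 3, 5, 0],  # two cells with >= 2 counts: kept
    [1, 1, 1, 1],  # never reaches 2 counts: dropped
    [0, 0, 9, 0],  # only one cell with >= 2 counts: dropped
]
print(expressive_genes(counts))
```

With the defaults shown (2 counts in 2 cells), only the first gene of the toy matrix is retained.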

Option to drop unmapped genes.

type: string

default: True

Option to drop mitochondrial genes.

type: string

default: True

Option to drop ribosomal genes.

type: string

default: false

The number of MADs for outlier detection.

type: number

default: 4

The number of median absolute deviations (MADs) used to define outliers for adaptive thresholding.
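
As a rough sketch of how a MAD-based adaptive threshold behaves, the Python below computes an upper cut-off as median plus a number of MADs. It is an assumption-laden illustration: the function name is hypothetical, and whether the pipeline log-transforms values or rescales the MAD by a constant is not stated here, so neither is done.

```python
from statistics import median


def adaptive_upper_threshold(values, nmads=4):
    """Upper cut-off = median + nmads * MAD (unscaled MAD, an assumption)."""
    center = median(values)
    # Median absolute deviation: median distance of each value from the median.
    mad = median([abs(value - center) for value in values])
    return center + nmads * mad


library_sizes = [100, 110, 120, 130, 1000]
threshold = adaptive_upper_threshold(library_sizes)
outliers = [size for size in library_sizes if size > threshold]
print(threshold, outliers)
```

In this toy example the median is 120 and the MAD is 10, so the threshold with 4 MADs is 160 and only the cell with 1000 counts is flagged.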

Options for profiling ambient RNA/empty droplets.

Enable ambient RNA / empty droplet profiling.

type: string

default: true

Upper UMI counts threshold for true cell annotation.

type: string

default: auto

pattern: ^(\d+|auto)$

A numeric scalar specifying the threshold for the total UMI count above which all barcodes are assumed to contain cells, or "auto" for automated estimation based on the data.
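
The pattern listed for this parameter accepts either a run of digits or the literal "auto". A minimal Python check is sketched below; the helper name `parse_retain` is hypothetical, and returning `None` for "auto" is just a convention chosen for this example.

```python
import re

# Pattern copied from the parameter definition above.
RETAIN_PATTERN = re.compile(r"^(\d+|auto)$")


def parse_retain(value):
    """Return an int threshold, or None when automated estimation is requested."""
    text = str(value)
    if RETAIN_PATTERN.fullmatch(text) is None:
        raise ValueError(f"expected digits or 'auto', got {text!r}")
    return None if text == "auto" else int(text)


print(parse_retain("auto"))
print(parse_retain("12000"))
```

Values such as "12k" or "Auto" do not match the pattern and raise a `ValueError` in this sketch.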

Lower UMI counts threshold for empty droplet annotation.

type: integer

default: 100

A numeric scalar specifying the lower bound on the total UMI count, at or below which all barcodes are assumed to correspond to empty droplets.

The maximum FDR for the emptyDrops algorithm.

type: number

default: 0.001

Number of Monte Carlo p-value iterations.

type: integer

default: 10000

An integer scalar specifying the number of iterations to use for the Monte Carlo p-value calculations for the emptyDrops algorithm.

Expected number of cells per sample.

type: integer

default: 3000

If the "retain" parameter is set to "auto" (recommended), then this parameter is used to identify the optimal value for "retain" for the emptyDrops algorithm.

Parameters for identifying singlets/doublets/multiplets.

Enable doublet/multiplet identification.

type: string

default: true

Algorithm to use for doublet/multiplet identification.

type: string

default: doubletfinder

Variables to regress out for dimensionality reduction.

type: string

default: nCount_RNA,pc_mito

Number of PCA dimensions to use.

type: integer

default: 10

The top n most variable features to use.

type: integer

default: 2000

A fixed doublet rate.

type: number

Use a fixed default rate (e.g. 0.075 to specify that 7.5% of all cells should be marked as doublets), or set to 0 to use the "dpk" method (recommended).

Doublets per thousand cells increment.

type: integer

default: 8

The doublets per thousand cells (dpk) increment specifies the expected doublet rate based on the number of cells. For example, with a dpk of 8 (recommended by 10X), a dataset with 1000 cells is expected to contain 8 doublets per thousand cells, a dataset with 2000 cells is expected to contain 16 doublets per thousand cells, and a dataset with 10000 cells is expected to contain 80 doublets per thousand cells (or 800 doublets in total). If the "doublet_rate" parameter is manually specified, this recommended incremental behaviour is overridden.
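
The dpk arithmetic can be checked with a short Python sketch. The function name `expected_doublets` is hypothetical; the calculation simply scales the per-thousand increment by the number of thousands of cells, and falls back to a fixed rate when one is given.

```python
def expected_doublets(n_cells, dpk=8, doublet_rate=0.0):
    """Return (expected doublet rate, expected number of doublets).

    A non-zero doublet_rate overrides the dpk-based rate, mirroring the
    behaviour described for the "doublet_rate" parameter.
    """
    if doublet_rate > 0:
        rate = doublet_rate
    else:
        # dpk doublets per thousand cells, for every thousand cells present.
        rate = dpk * (n_cells / 1000) / 1000
    return rate, round(rate * n_cells)


for n_cells in (1000, 2000, 10000):
    rate, doublets = expected_doublets(n_cells)
    print(n_cells, rate, doublets)
```

With a dpk of 8, the expected rate grows from 0.8% at 1000 cells to 8% at 10000 cells, which corresponds to 800 doublets in total.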

Specify a pK value instead of parameter sweep.

type: number

default: 0.02

The optimal pK value used by the doubletFinder algorithm is determined following a compute-intensive parameter sweep. The parameter sweep can be overridden by manually specifying a pK value.

Parameters used in the merged quality-control report.

Numeric variables for inter-sample metrics.

type: string

default: total_features_by_counts,total_counts,pc_mito,pc_ribo

A comma-separated list of numeric variables which differ between individual cells of each sample. The merged sample report will include plots facilitating between-sample comparisons for each of these numeric variables.

Categorical variables for further sub-setting of plots

type: string

default: NULL

A comma-separated list of categorical variables. The merged sample report will include additional plots of sample metrics subset by each of these variables (e.g. sex, diagnosis).

Numeric variables for outlier identification.

type: string

default: total_features_by_counts,total_counts

The merged report will include tables highlighting samples that are putative outliers for each of these numeric variables.

Parameters for integrating datasets and batch correction.

Choice of integration method.

type: string

default: Liger

Unique sample identifier variable.

type: string

default: manifest

Fill out matrices with union of genes.

type: string

default: false

See rliger::createLiger(). Whether to fill out raw.data matrices with union of genes across all datasets (filling in 0 for missing data) (requires make.sparse = TRUE) (default FALSE).

Remove non-expressing cells/genes.

type: string

default: true

See rliger::createLiger(). Whether to remove cells not expressing any measured genes, and genes not expressed in any cells (if take.gene.union = TRUE, removes only genes not expressed in any dataset) (default TRUE).

Number of genes to find for each dataset.

type: integer

default: 3000

See rliger::selectGenes(). Number of genes to find for each dataset. Optimises the value of var.thresh for each dataset to get this number of genes.

How to combine variable genes across experiments.

type: string

default: union

See rliger::selectGenes(). Either "union" or "intersection".

Keep unique genes.

type: string

default: false

See rliger::selectGenes().

Capitalize gene names to match homologous genes.

type: string

default: false

See rliger::selectGenes().

Treat each column as a cell.

type: string

default: true

See rliger::removeMissingObs().

Inner dimension of factorization (n factors).

type: integer

default: 30

See rliger::optimizeALS(). Inner dimension of factorization (number of factors). Run suggestK to determine appropriate value; a general rule of thumb is that a higher k will be needed for datasets with more sub-structure.

Regularization parameter.

type: number

default: 5

See rliger::optimizeALS(). Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). Run suggestLambda to determine the most appropriate value for balancing dataset alignment and agreement (default 5.0).

Convergence threshold.

type: number

default: 0.0001

See rliger::optimizeALS().

Maximum number of block coordinate descent iterations.

type: integer

default: 100

See rliger::optimizeALS().

Number of restarts to perform.

type: integer

default: 1

See rliger::optimizeALS().

Random seed for reproducible results.

type: integer

default: 1

Number of nearest neighbours for within-dataset knn graph.

type: integer

default: 20

See rliger::quantile_norm().

Horizon parameter for shared nearest factor graph.

type: integer

default: 500

See rliger::quantileAlignSNF(). Distances to all but the k2 nearest neighbors are set to 0 (cuts down on memory usage for very large graphs).

Minimum allowed edge weight.

type: number

default: 0.2

See rliger::quantileAlignSNF().

Name of dataset to use as a reference.

type: string

default: NULL

See rliger::quantile_norm(). Name of dataset to use as a "reference" for normalization. By default, the dataset with the largest number of cells is used.

Minimum number of cells to consider a cluster shared across datasets.

type: integer

default: 2

See rliger::quantile_norm().

Number of quantiles to use for normalization.

type: integer

default: 50

See rliger::quantile_norm().

Number of times to perform Louvain community detection.

type: integer

default: 10

See rliger::quantileAlignSNF(). Number of times to perform Louvain community detection with different random starts (default 10).

Controls the number of communities detected.

type: integer

default: 1

See rliger::quantileAlignSNF().

Indices of factors to use for shared nearest factor determination.

type: string

default: NULL

See rliger::quantile_norm().

Distance metric to use in calculating nearest neighbour.

type: string

default: CR

See rliger::quantileAlignSNF(). Default "CR".

Center the data when scaling factors.

type: string

default: false

See rliger::quantile_norm().

Small cluster extraction cells threshold.

type: integer

See rliger::quantileAlignSNF(). Extracts small clusters loading highly on single factor with fewer cells than this before regular alignment (default 0 – no small cluster extraction).

Categorical variables for integration report metrics.

type: string

default: individual,diagnosis,region,sex

The integration report will provide plots and integration metrics for these categorical variables.

Reduced dimension embedding for the integration report.

type: string

default: UMAP

The integration report will provide plots with and without integration using this embedding.

Settings for dimensionality reduction algorithms.

Input matrix for dimension reduction.

type: string

default: PCA,Liger

Dimension reduction outputs to generate.

type: string

default: tSNE,UMAP,UMAP3D

Typically 'UMAP,UMAP3D' or 'tSNE'.

Variables to regress out before dimension reduction.

type: string

default: nCount_RNA,pc_mito

Number of PCA dimensions.

type: integer

default: 30

See uwot::umap().

Number of nearest neighbours to use.

type: integer

default: 35

See uwot::umap().

The dimension of the space to embed into.

type: integer

default: 2

See uwot::umap(). The dimension of the space to embed into. This defaults to 2 to provide easy visualization, but can reasonably be set to any integer value in the range 2 to 100.

Type of initialization for the coordinates.

type: string

See uwot::umap().

Distance metric for finding nearest neighbours.

type: string

See uwot::umap().

Number of epochs to use during optimization of embedded coordinates.

type: integer

default: 200

See uwot::umap().

Initial learning rate used in optimization of coordinates.

type: integer

default: 1

See uwot::umap().

Effective minimum distance between embedded points.

type: number

default: 0.4

See uwot::umap(). Smaller values will result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values will result in a more even dispersal of points. The value should be set relative to the spread value, which determines the scale at which embedded points will be spread out.

Effective scale of embedded points.

type: number

default: 0.85

See uwot::umap(). In combination with min_dist, this determines how clustered/clumped the embedded points are.

Interpolation to combine local fuzzy sets.

type: number

default: 1

See uwot::umap(). The value of this parameter should be between 0.0 and 1.0; a value of 1.0 will use a pure fuzzy union, while 0.0 will use a pure fuzzy intersection.
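
To illustrate the interpolation, the Python sketch below mixes the fuzzy union and fuzzy intersection of two membership strengths. It uses the standard probabilistic definitions (union a + b - ab, intersection ab); the function name is hypothetical and the code is not taken from uwot.

```python
def mix_memberships(a, b, set_op_mix_ratio=1.0):
    """Interpolate between fuzzy intersection (0.0) and fuzzy union (1.0).

    `a` and `b` are the two directed membership strengths of one edge.
    """
    fuzzy_union = a + b - a * b
    fuzzy_intersection = a * b
    return (set_op_mix_ratio * fuzzy_union
            + (1.0 - set_op_mix_ratio) * fuzzy_intersection)


for ratio in (0.0, 0.5, 1.0):
    print(ratio, mix_memberships(0.5, 0.5, ratio))
```

For two strengths of 0.5, the combined weight ranges from 0.25 (pure intersection) to 0.75 (pure union).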

Local connectivity required.

type: integer

default: 1

See uwot::umap(). The local connectivity required – i.e. the number of nearest neighbors that should be assumed to be connected at a local level. The higher this value the more connected the manifold becomes locally.

Weighting applied to negative samples in embedding optimization.

type: integer

default: 1

See uwot::umap(). Weighting applied to negative samples in low dimensional embedding optimization. Values higher than one will result in greater weight being given to negative samples.

Number of negative edge samples to use per positive edge sample.

type: integer

default: 5

See uwot::umap(). The number of negative edge/1-simplex samples to use per positive edge/1-simplex sample in optimizing the low dimensional embedding.

Use fast SGD.

type: string

default: false

See uwot::umap(). Setting this to TRUE will speed up the stochastic optimization phase, but gives a potentially less accurate embedding that will not be exactly reproducible even with a fixed seed. For visualization, fast_sgd = TRUE will give perfectly good results. For more generic dimensionality reduction, it's safer to leave fast_sgd = FALSE.

Output dimensionality.

type: integer

default: 2

See Rtsne::Rtsne().

Number of dimensions retained in the initial PCA step.

type: integer

default: 50

See Rtsne::Rtsne().

Perplexity parameter.

type: integer

default: 150

See Rtsne::Rtsne().

Speed/accuracy trade-off.

type: number

default: 0.5

See Rtsne::Rtsne(). Speed/accuracy trade-off (increase for less accuracy), set to 0.0 for exact TSNE (default: 0.5).

Iteration after which perplexities are no longer exaggerated.

type: integer

default: 250

See Rtsne::Rtsne(). Iteration after which the perplexities are no longer exaggerated (default: 250, except when Y_init is used, then 0).

Iteration after which the final momentum is used.

type: integer

default: 250

See Rtsne::Rtsne(). Iteration after which the final momentum is used (default: 250, except when Y_init is used, then 0).

Number of iterations.

type: integer

default: 1000

See Rtsne::Rtsne().

Center data before PCA.

type: string

default: true

See Rtsne::Rtsne(). Should data be centered before pca is applied? (default: TRUE)

Scale data before PCA.

type: string

default: false

See Rtsne::Rtsne(). Should data be scaled before pca is applied? (default: FALSE).

Normalize data before distance calculations.

type: string

default: true

See Rtsne::Rtsne(). Should data be normalized internally prior to distance calculations with normalize_input? (default: TRUE)

Momentum used in the first part of optimization.

type: number

default: 0.5

See Rtsne::Rtsne().

Momentum used in the final part of optimization.

type: number

default: 0.8

See Rtsne::Rtsne().

Learning rate.

type: integer

default: 1000

See Rtsne::Rtsne().

Exaggeration factor used in the first part of the optimization.

type: integer

default: 12

See Rtsne::Rtsne(). Exaggeration factor used to multiply the P matrix in the first part of the optimization (default: 12.0).

Parameters used to tune Louvain/Leiden clustering.

Clustering method.

type: string

default: leiden

Specify "leiden" or "louvain".

Reduced dimension input(s) for clustering.

type: string

default: UMAP_Liger

One or more of "UMAP", "tSNE", "PCA", "LSI".

The resolution of clustering.

type: number

default: 0.001

Integer number of nearest neighbours for clustering.

type: integer

default: 50

Integer number of nearest neighbors to use when creating the k nearest neighbor graph for Louvain/Leiden clustering. k is related to the resolution of the clustering result, a bigger k will result in lower resolution and vice versa.

The number of iterations for clustering.

type: integer

default: 1

Parameters used for cell-type annotation and the associated report.

SingleCellExperiment clusters colData variable name.

type: string

default: clusters

Max cells to sample.

type: integer

default: 10000

A sample metadata unique sample ID.

type: string

default: individual

SingleCellExperiment cell-type colData variable name.

type: string

default: cluster_celltype

Cell-type metrics for categorical variables.

type: string

default: manifest,diagnosis,sex,capdate,prepdate,seqdate

Cell-type metrics for numeric variables.

type: string

default: pc_mito,pc_ribo,total_counts,total_features_by_counts

Number of top marker genes for plot/table generation.

type: integer

default: 5

Parameters for differential gene expression.

Differential gene expression method.

type: string

default: MASTZLM

MAST method.

type: string

See MAST::zlm(). Either 'glm', 'glmer' or 'bayesglm'.

Expressive gene minimum counts.

type: integer

default: 1

Only genes with at least min_counts in min_cells_pc will be tested for differential gene expression.

Expressive gene minimum cells fraction.

type: number

default: 0.1

Only genes with at least min_counts in min_cells_pc will be tested for differential gene expression. Default 0.1 (i.e. 10% of cells).

Re-scale numeric covariates.

type: string

default: true

Re-scaling and centring numeric covariates in a model can improve model performance.

Pseudobulked differential gene expression.

type: string

default: false

Perform differential gene expression on a smaller matrix where counts are first summed across all cells within a sample (defined by dge_sample_var level).
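
Pseudobulking, as described above, amounts to summing each gene's counts over all cells that share a sample label. The Python below is a minimal sketch on a toy matrix; the function name and the list-of-rows representation are hypothetical.

```python
def pseudobulk(counts, cell_samples):
    """Sum a genes x cells matrix into a genes x samples matrix.

    `cell_samples` gives the sample label of each cell (column).
    Returns the sorted sample labels and the summed matrix.
    """
    samples = sorted(set(cell_samples))
    column = {sample: index for index, sample in enumerate(samples)}
    summed = [[0] * len(samples) for _ in counts]
    for gene_index, row in enumerate(counts):
        for value, sample in zip(row, cell_samples):
            summed[gene_index][column[sample]] += value
    return samples, summed


counts = [
    [1, 2, 3, 4],
    [0, 1, 0, 1],
]
cell_samples = ["A", "A", "B", "B"]
print(pseudobulk(counts, cell_samples))
```

The four cells collapse into two sample columns, so the model is fit on a much smaller matrix.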

Cell-type annotation variable name.

type: string

default: cluster_celltype

Differential gene expression is performed separately for each cell-type of this colData variable.

Unique sample identifier variable.

type: string

default: manifest

Dependent variable of DGE model.

type: string

default: group

The dependent variable may be a categorical (e.g. diagnosis) or a numeric (e.g. histopathology score) variable.

Reference class of categorical dependent variable.

type: string

default: Control

If a categorical dependent variable is specified, then the reference class of the dependent variable is specified here (e.g. 'Control').

Confounding variables.

type: string

default: cngeneson,seqdate,pc_mito

A comma-separated list of confounding variables to account for in the DGE model.

Random effect confounding variable.

type: string

default: NULL

If specified, the term + (1 | x) is added to the model, where x is the specified random effects variable.
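
The Python sketch below shows how the dependent variable, the confounding variables and an optional random effect might combine into a model formula string. The term order and the helper name `build_formula` are assumptions made for illustration; the pipeline's actual formula construction is not shown in this document.

```python
def build_formula(dependent_var, confounding_vars, random_effects_var="NULL"):
    """Assemble a right-hand-side model formula as a string (illustrative only)."""
    terms = [dependent_var]
    # Confounders arrive as a comma-separated list, e.g. "cngeneson,seqdate".
    terms += [name.strip() for name in confounding_vars.split(",") if name.strip()]
    # "NULL" (the default above) means no random effect term is added.
    if random_effects_var not in (None, "", "NULL"):
        terms.append(f"(1 | {random_effects_var})")
    return "~ " + " + ".join(terms)


print(build_formula("group", "cngeneson,seqdate,pc_mito"))
print(build_formula("group", "cngeneson,pc_mito", "individual"))
```

With the default of NULL no random effect term appears; naming a variable such as individual appends the (1 | individual) term.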

Fold-change threshold for plotting.

type: number

default: 1.1

This absolute fold-change cut-off value is used in plots (e.g. volcano) and the DGE report.

Adjusted p-value cutoff.

type: number

default: 0.05

The adjusted p-value cutoff value is used in plots (e.g. volcano) and the DGE report.
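
One way to read the two cut-offs together is sketched below in Python. It assumes that "absolute" means the threshold applies symmetrically to up- and down-regulation (a fold-change of at least 1.1 or at most 1/1.1), which is an interpretation rather than something this document states; the function name is hypothetical.

```python
import math


def is_highlighted(fold_change, padj, fc_threshold=1.1, pval_cutoff=0.05):
    """True when a gene passes both the fold-change and adjusted p-value cut-offs.

    Assumes a symmetric threshold on the log2 scale (an interpretation).
    """
    passes_fc = abs(math.log2(fold_change)) >= math.log2(fc_threshold)
    return passes_fc and padj <= pval_cutoff


print(is_highlighted(1.2, 0.01))   # up-regulated and significant
print(is_highlighted(0.8, 0.01))   # down-regulated and significant
print(is_highlighted(1.05, 0.01))  # change too small
print(is_highlighted(2.0, 0.2))    # not significant
```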

Force model fit for non-full rank.

type: string

default: false

A non-full rank model specification will return an error; to override this to return a warning only, set to TRUE.

Maximum CPU cores.

type: string

default: 'null'

The default value of 'null' utilizes all available CPU cores. As each additional CPU core increases the number of genes simultaneously fit, the RAM/memory demand increases concomitantly. Manually overriding this parameter can reduce the memory demands of parallelization across multiple cores.

Parameters for impacted pathway analysis of differentially expressed genes.

Pathway enrichment tool(s) to use.

type: string

Enrichment method.

type: string

default: ORA

Database(s) to use for enrichment.

type: string

default: GO_Biological_Process

See scFlow::list_databases(). Name of the database(s) for enrichment. Examples include "GO_Biological_Process", "GO_Cellular_Component", "GO_Molecular_Function", "KEGG", "Reactome", "Wikipathway".

Parameters for Dirichlet modeling of relative cell-type proportions.

Unique sample identifier.

type: string

default: individual

Cell-type annotation variable name.

type: string

default: cluster_celltype

Dependent variable of Dirichlet model.

type: string

default: group

Reference class of categorical dependent variable.

type: string

default: Control

Dependent variable classes order.

type: string

default: Control,Low,High

For plotting and reports, the order of classes for the dependent variable can be manually specified (e.g. 'Control,Low,High').

General parameters for plotting.

Preferred embedding for plots.

type: string

default: UMAP_Liger

Point size for reduced dimension plots.

type: number

default: 0.1

To improve visualization the point size should be adjusted according to the total number of cells plotted.

Alpha (transparency) value for reduced dimension plots.

type: number

default: 0.2

To improve visualization the alpha (transparency) value should be adjusted according to the total number of cells plotted.

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden

type: string

default: master

Base directory for Institutional configs.

hidden

type: string

default: https://raw.githubusercontent.com/nf-core/configs/master

If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.

Institutional configs hostname.

hidden

type: string

Institutional config name.

hidden

type: string

Institutional config description.

hidden

type: string

Institutional config contact information.

hidden

type: string

Institutional config URL link.

hidden

type: string

Set the top limit for requested resources for any single job.

Maximum number of CPUs that can be requested for any single job.

hidden

type: integer

default: 16

Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. --max_cpus 1

Maximum amount of memory that can be requested for any single job.

hidden

type: string

default: 256.GB

pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$

Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. --max_memory '8.GB'

Maximum amount of time that can be requested for any single job.

hidden

type: string

default: 240.h

pattern: ^(\d+\.?\s*(s|m|h|day)\s*)+$

Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. --max_time '2.h'
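
The patterns listed for the memory and time limits can be tried out before launching a run. The Python sketch below copies both regular expressions from the definitions above; the helper names are hypothetical, and the pipeline's own schema validation may differ in detail.

```python
import re

# Patterns copied from the max memory and max time definitions above.
MEMORY_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")
TIME_PATTERN = re.compile(r"^(\d+\.?\s*(s|m|h|day)\s*)+$")


def is_valid_memory(value):
    """True when the string matches the memory pattern, e.g. '8.GB'."""
    return MEMORY_PATTERN.fullmatch(value) is not None


def is_valid_time(value):
    """True when the string matches the time pattern, e.g. '2.h'."""
    return TIME_PATTERN.fullmatch(value) is not None


for candidate in ["256.GB", "8 GB", "8G"]:
    print(candidate, is_valid_memory(candidate))
for candidate in ["240.h", "1day 2h", "2 hours"]:
    print(candidate, is_valid_time(candidate))
```

Note that the memory pattern requires a trailing "B", so "8G" is rejected, and the time pattern accepts only the units s, m, h and day, so "2 hours" is rejected.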

Less common options for the pipeline, typically set in a config file.

Display help text.

hidden

type: boolean

Method used to save pipeline results to output directory.

hidden

type: string

The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.

Email address for completion summary, only when pipeline fails.

hidden

type: string

pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.

Do not use coloured log outputs.

hidden

type: boolean

Directory to keep pipeline Nextflow logs and reports.

hidden

type: string

default: ${params.outdir}/pipeline_info

Whether to validate parameters against the schema at runtime.

hidden

type: boolean

default: true

Show all params when using --help

hidden

type: boolean

By default, parameters set as hidden in the schema are not shown on the command line when a user runs with --help. Specifying this option will tell the pipeline to show all parameters.

Run this workflow with Conda. You can also use '-profile conda' instead of providing this parameter.

hidden

type: boolean

Force the workflow to pull and convert Docker containers rather than directly downloading Singularity images.

hidden

type: boolean

This may be useful, for example, if you are unable to directly pull Singularity containers to run the pipeline due to http/https proxy issues.

E-mail address for optional workflow completion notification.

hidden

type: string

Send plain-text email instead of HTML.

hidden

type: boolean

NA

hidden

type: string

nf-core/scflow