Define where the pipeline should find input data and save output data.

The .tsv file specifying sample matrix filepaths.

type: string
default: ./refs/Manifest.txt

The .tsv file specifying sample metadata.

type: string
default: ./refs/SampleSheet.tsv

Optional TSV file mapping Ensembl gene IDs to gene names.

type: string
default: https://raw.githubusercontent.com/nf-core/test-datasets/scflow/assets/ensembl_mappings.tsv

Cell-type annotations reference file path

type: string
default: https://s3-eu-west-1.amazonaws.com/pfigshare-u-files/28033407/ctd_v1.zip

This is a zip file containing cell-type annotation reference files for the EWCE package.

Optional tsv file specifying manual revisions of cell-type annotations.

type: string
default: ./conf/celltype_mappings.tsv

Optional list of genes of interest in YML format for plotting of gene expression.

type: string
default: ./conf/reddim_genes.yml

Input sample species.

type: string
default: human

Currently, "human" and "mouse" are supported.

Outputs directory.

type: string
default: ./results

Parameters for quality-control and thresholding.

The sample sheet column name with unique sample identifiers.

type: string
default: manifest

The sample sheet variables to treat as factors.

type: string
default: seqdate

Specify here, separated by commas, all numeric sample sheet columns that should be treated as factors. Examples include columns with dates, numeric sample identifiers, etc.

Minimum library size (counts) per cell.

type: integer
default: 250

Maximum library size (counts) per cell.

type: string
default: adaptive

Minimum features (expressive genes) per cell.

type: integer
default: 100

Maximum features (expressive genes) per cell.

type: string
default: adaptive

Minimum proportion of counts mapping to ribosomal genes.

type: number

Maximum proportion of counts mapping to ribosomal genes.

type: number
default: 1

Maximum proportion of counts mapping to mitochondrial genes.

type: string
default: adaptive

Minimum counts for gene expressivity.

type: integer
default: 2

Expressive genes must have >=min_counts in >=min_cells

Minimum cells for gene expressivity.

type: integer
default: 2

Expressive genes must have >=min_counts in >=min_cells
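The expressivity rule above can be sketched as follows. This is a minimal illustration only: the function name expressive_genes and the toy count table are hypothetical, and the pipeline's own implementation may differ.

```python
from typing import Dict, List


def expressive_genes(
    counts: Dict[str, List[int]], min_counts: int = 2, min_cells: int = 2
) -> List[str]:
    """Return genes with >= min_counts counts in >= min_cells cells."""
    keep = []
    for gene, per_cell in counts.items():
        # Count the cells in which this gene reaches the count threshold.
        n_cells = sum(1 for c in per_cell if c >= min_counts)
        if n_cells >= min_cells:
            keep.append(gene)
    return keep


# Hypothetical toy data: one list of per-cell counts for each gene.
toy_counts = {
    "GENE_A": [0, 2, 5, 0],  # two cells with >= 2 counts: kept
    "GENE_B": [1, 1, 1, 1],  # never reaches 2 counts: dropped
    "GENE_C": [9, 0, 0, 0],  # only one cell with >= 2 counts: dropped
}
print(expressive_genes(toy_counts))  # ['GENE_A']
```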

Option to drop unmapped genes.

type: string
default: True

Option to drop mitochondrial genes.

type: string
default: True

Option to drop ribosomal genes.

type: string
default: false

The number of MADs for outlier detection.

type: number
default: 4

The number of median absolute deviations (MADs) used to define outliers for adaptive thresholding.
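A minimal sketch of MAD-based adaptive thresholding, assuming the upper threshold is the median plus nmads times the unscaled MAD. The function name and the example library sizes are hypothetical. The pipeline's own calculation may differ, for example by scaling the MAD (R's mad() multiplies by 1.4826 by default).

```python
from statistics import median


def adaptive_upper_threshold(values, nmads=4.0):
    """Upper outlier threshold: median + nmads * (unscaled) MAD."""
    med = median(values)
    # Median absolute deviation from the median.
    mad = median([abs(v - med) for v in values])
    return med + nmads * mad


# Hypothetical per-cell library sizes with one obvious outlier.
library_sizes = [1000, 1200, 1100, 900, 1300, 5000]
threshold = adaptive_upper_threshold(library_sizes, nmads=4)
print(threshold)  # 1750.0
print([v for v in library_sizes if v > threshold])  # [5000]
```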

Options for profiling ambient RNA/empty droplets.

Enable ambient RNA / empty droplet profiling.

type: string
default: true

Upper UMI counts threshold for true cell annotation.

type: string
default: auto
pattern: ^(\d+|auto)$

A numeric scalar specifying the threshold for the total UMI count above which all barcodes are assumed to contain cells, or "auto" for automated estimation based on the data.
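The pattern above accepts either a whole number or the literal "auto". A minimal validation sketch follows. The helper name parse_upper_threshold is hypothetical, and returning None for "auto" is just a convention chosen for this example.

```python
import re

# Pattern copied from the parameter definition above.
UPPER_PATTERN = re.compile(r"^(\d+|auto)$")


def parse_upper_threshold(value: str):
    """Return an int threshold, or None when automated estimation is requested."""
    if UPPER_PATTERN.fullmatch(value) is None:
        raise ValueError(f"expected a whole number or 'auto', got {value!r}")
    return None if value == "auto" else int(value)


print(parse_upper_threshold("auto"))  # None
print(parse_upper_threshold("500"))   # 500
```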

Lower UMI counts threshold for empty droplet annotation.

type: integer
default: 100

A numeric scalar specifying the lower bound on the total UMI count, at or below which all barcodes are assumed to correspond to empty droplets.

The maximum FDR for the emptyDrops algorithm.

type: number
default: 0.001

Number of Monte Carlo p-value iterations.

type: integer
default: 10000

An integer scalar specifying the number of iterations to use for the Monte Carlo p-value calculations for the emptyDrops algorithm.

Expected number of cells per sample.

type: integer
default: 3000

If the "retain" parameter is set to "auto" (recommended), then this parameter is used to identify the optimal value for "retain" for the emptyDrops algorithm.

Parameters for identifying singlets/doublets/multiplets.

Enable doublet/multiplet identification.

type: string
default: true

Algorithm to use for doublet/multiplet identification.

type: string
default: doubletfinder

Variables to regress out for dimensionality reduction.

type: string
default: nCount_RNA,pc_mito

Number of PCA dimensions to use.

type: integer
default: 10

The top n most variable features to use.

type: integer
default: 2000

A fixed doublet rate.

type: number

Use a fixed doublet rate (e.g. 0.075 to specify that 7.5% of all cells should be marked as doublets), or set to 0 to use the "dpk" method (recommended).

Doublets per thousand cells increment.

type: integer
default: 8

The doublets per thousand cells (dpk) increment specifies the expected doublet rate based on the number of cells. With a dpk of 8 (recommended by 10X), a dataset with 1000 cells is expected to contain 8 doublets per thousand cells, a dataset with 2000 cells is expected to contain 16 doublets per thousand cells, and a dataset with 10000 cells is expected to contain 80 doublets per thousand cells (or 800 doublets in total). If the "doublet_rate" parameter is manually specified, this recommended incremental behaviour is overridden.
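The dpk arithmetic can be sketched as follows, assuming the rate grows linearly with the number of cells as in the description above. The function name expected_doublets is hypothetical, and the rounding of the total is an illustrative choice.

```python
def expected_doublets(n_cells, dpk=8, doublet_rate=0.0):
    """Return (expected doublet rate, expected number of doublets)."""
    if doublet_rate > 0:
        # A manually specified fixed rate overrides the dpk behaviour.
        rate = doublet_rate
    else:
        # dpk doublets per thousand cells, for each thousand cells present.
        rate = (dpk / 1000) * (n_cells / 1000)
    return rate, round(rate * n_cells)


for n in (1000, 2000, 10000):
    print(n, expected_doublets(n)[1])
# 1000 8
# 2000 32
# 10000 800
```

With a fixed rate, e.g. expected_doublets(1000, doublet_rate=0.075), the expected count is 75 regardless of dpk.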

Specify a pK value instead of parameter sweep.

type: number
default: 0.02

The optimal pK value used by the doubletFinder algorithm is determined following a compute-intensive parameter sweep. The parameter sweep can be overridden by manually specifying a pK value.

Parameters used in the merged quality-control report.

Numeric variables for inter-sample metrics.

type: string
default: total_features_by_counts,total_counts,pc_mito,pc_ribo

A comma-separated list of numeric variables which differ between individual cells of each sample. The merged sample report will include plots facilitating between-sample comparisons for each of these numeric variables.

Categorical variables for further sub-setting of plots

type: string
default: NULL

A comma-separated list of categorical variables. The merged sample report will include additional plots of sample metrics subset by each of these variables (e.g. sex, diagnosis).

Numeric variables for outlier identification.

type: string
default: total_features_by_counts,total_counts

The merged report will include tables highlighting samples that are putative outliers for each of these numeric variables.

Parameters for integrating datasets and batch correction.

Choice of integration method.

type: string
default: Liger

Unique sample identifier variable.

type: string
default: manifest

Fill out matrices with union of genes.

type: string
default: false

See rliger::createLiger(). Whether to fill out raw.data matrices with union of genes across all datasets (filling in 0 for missing data) (requires make.sparse = TRUE) (default FALSE).

Remove non-expressing cells/genes.

type: string
default: true

See rliger::createLiger(). Whether to remove cells not expressing any measured genes, and genes not expressed in any cells (if take.gene.union = TRUE, removes only genes not expressed in any dataset) (default TRUE).

Number of genes to find for each dataset.

type: integer
default: 3000

See rliger::selectGenes(). Number of genes to find for each dataset. Optimises the value of var.thresh for each dataset to get this number of genes.

How to combine variable genes across experiments.

type: string
default: union

See rliger::selectGenes(). Either "union" or "intersection".

Keep unique genes.

type: string
default: false

See rliger::selectGenes().

Capitalize gene names to match homologous genes.

type: string
default: false

See rliger::selectGenes().

Treat each column as a cell.

type: string
default: true

See rliger::removeMissingObs().

Inner dimension of factorization (n factors).

type: integer
default: 30

See rliger::optimizeALS(). Inner dimension of factorization (number of factors). Run suggestK to determine appropriate value; a general rule of thumb is that a higher k will be needed for datasets with more sub-structure.

Regularization parameter.

type: number
default: 5

See rliger::optimizeALS(). Regularization parameter. Larger values penalize dataset-specific effects more strongly (i.e. alignment should increase as lambda increases). Run suggestLambda to determine the most appropriate value for balancing dataset alignment and agreement (default 5.0).

Convergence threshold.

type: number
default: 0.0001

See rliger::optimizeALS().

Maximum number of block coordinate descent iterations.

type: integer
default: 100

See rliger::optimizeALS().

Number of restarts to perform.

type: integer
default: 1

See rliger::optimizeALS().

Random seed for reproducible results.

type: integer
default: 1

Number of nearest neighbours for within-dataset knn graph.

type: integer
default: 20

See rliger::quantile_norm().

Horizon parameter for shared nearest factor graph.

type: integer
default: 500

See rliger::quantileAlignSNF(). Distances to all but the k2 nearest neighbors are set to 0 (cuts down on memory usage for very large graphs).

Minimum allowed edge weight.

type: number
default: 0.2

See rliger::quantileAlignSNF().

Name of dataset to use as a reference.

type: string
default: NULL

See rliger::quantile_norm(). Name of dataset to use as a "reference" for normalization. By default, the dataset with the largest number of cells is used.

Minimum number of cells to consider a cluster shared across datasets.

type: integer
default: 2

See rliger::quantile_norm().

Number of quantiles to use for normalization.

type: integer
default: 50

See rliger::quantile_norm().

Number of times to perform Louvain community detection.

type: integer
default: 10

See rliger::quantileAlignSNF(). Number of times to perform Louvain community detection with different random starts (default 10).

Controls the number of communities detected.

type: integer
default: 1

See rliger::quantileAlignSNF().

Indices of factors to use for shared nearest factor determination.

type: string
default: NULL

See rliger::quantile_norm().

Distance metric to use in calculating nearest neighbour.

type: string
default: CR

See rliger::quantileAlignSNF(). Default "CR".

Center the data when scaling factors.

type: string
default: false

See rliger::quantile_norm().

Small cluster extraction cells threshold.

type: integer

See rliger::quantileAlignSNF(). Extracts small clusters loading highly on single factor with fewer cells than this before regular alignment (default 0 – no small cluster extraction).

Categorical variables for integration report metrics.

type: string
default: individual,diagnosis,region,sex

The integration report will provide plots and integration metrics for these categorical variables.

Reduced dimension embedding for the integration report.

type: string
default: UMAP

The integration report will provide plots with and without integration using this embedding.

Settings for dimensionality reduction algorithms.

Input matrix for dimension reduction.

type: string
default: PCA,Liger

Dimension reduction outputs to generate.

type: string
default: tSNE,UMAP,UMAP3D

Typically 'UMAP,UMAP3D' or 'tSNE'.

Variables to regress out before dimension reduction.

type: string
default: nCount_RNA,pc_mito

Number of PCA dimensions.

type: integer
default: 30

See uwot::umap().

Number of nearest neighbours to use.

type: integer
default: 35

See uwot::umap().

The dimension of the space to embed into.

type: integer
default: 2

See uwot::umap(). The dimension of the space to embed into. This defaults to 2 to provide easy visualization, but can reasonably be set to any integer value in the range 2 to 100.

Type of initialization for the coordinates.

type: string

See uwot::umap().

Distance metric for finding nearest neighbours.

type: string

See uwot::umap().

Number of epochs to use during optimization of embedded coordinates.

type: integer
default: 200

See uwot::umap().

Initial learning rate used in optimization of coordinates.

type: integer
default: 1

See uwot::umap().

Effective minimum distance between embedded points.

type: number
default: 0.4

See uwot::umap(). Smaller values will result in a more clustered/clumped embedding where nearby points on the manifold are drawn closer together, while larger values will result in a more even dispersal of points. The value should be set relative to the spread value, which determines the scale at which embedded points will be spread out.

Effective scale of embedded points.

type: number
default: 0.85

See uwot::umap(). In combination with min_dist, this determines how clustered/clumped the embedded points are.

Interpolation to combine local fuzzy sets.

type: number
default: 1

See uwot::umap(). The value of this parameter should be between 0.0 and 1.0; a value of 1.0 will use a pure fuzzy union, while 0.0 will use a pure fuzzy intersection.

Local connectivity required.

type: integer
default: 1

See uwot::umap(). The local connectivity required – i.e. the number of nearest neighbors that should be assumed to be connected at a local level. The higher this value the more connected the manifold becomes locally.

Weighting applied to negative samples in embedding optimization.

type: integer
default: 1

See uwot::umap(). Weighting applied to negative samples in low dimensional embedding optimization. Values higher than one will result in greater weight being given to negative samples.

Number of negative edge samples to use per positive edge sample.

type: integer
default: 5

See uwot::umap(). The number of negative edge/1-simplex samples to use per positive edge/1-simplex sample in optimizing the low dimensional embedding.

Use fast SGD.

type: string
default: false

See uwot::umap(). Setting this to TRUE will speed up the stochastic optimization phase, but gives a potentially less accurate embedding that will not be exactly reproducible even with a fixed seed. For visualization, fast_sgd = TRUE will give perfectly good results. For more generic dimensionality reduction, it's safer to leave fast_sgd = FALSE.

Output dimensionality.

type: integer
default: 2

See Rtsne::Rtsne().

Number of dimensions retained in the initial PCA step.

type: integer
default: 50

See Rtsne::Rtsne().

Perplexity parameter.

type: integer
default: 150

See Rtsne::Rtsne().

Speed/accuracy trade-off.

type: number
default: 0.5

See Rtsne::Rtsne(). Speed/accuracy trade-off (increase for less accuracy), set to 0.0 for exact TSNE (default: 0.5).

Iteration after which perplexities are no longer exaggerated.

type: integer
default: 250

See Rtsne::Rtsne(). Iteration after which the perplexities are no longer exaggerated (default: 250, except when Y_init is used, then 0).

Iteration after which the final momentum is used.

type: integer
default: 250

See Rtsne::Rtsne(). Iteration after which the final momentum is used (default: 250, except when Y_init is used, then 0).

Number of iterations.

type: integer
default: 1000

See Rtsne::Rtsne().

Center data before PCA.

type: string
default: true

See Rtsne::Rtsne(). Should data be centered before pca is applied? (default: TRUE)

Scale data before PCA.

type: string
default: false

See Rtsne::Rtsne(). Should data be scaled before pca is applied? (default: FALSE).

Normalize data before distance calculations.

type: string
default: true

See Rtsne::Rtsne(). Should data be normalized internally prior to distance calculations with normalize_input? (default: TRUE)

Momentum used in the first part of optimization.

type: number
default: 0.5

See Rtsne::Rtsne().

Momentum used in the final part of optimization.

type: number
default: 0.8

See Rtsne::Rtsne().

Learning rate.

type: integer
default: 1000

See Rtsne::Rtsne().

Exaggeration factor used in the first part of the optimization.

type: integer
default: 12

See Rtsne::Rtsne(). Exaggeration factor used to multiply the P matrix in the first part of the optimization (default: 12.0).

Parameters used to tune Louvain/Leiden clustering.

Clustering method.

type: string
default: leiden

Specify "leiden" or "louvain".

Reduced dimension input(s) for clustering.

type: string
default: UMAP_Liger

One or more of "UMAP", "tSNE", "PCA", "LSI".

The resolution of clustering.

type: number
default: 0.001

Integer number of nearest neighbours for clustering.

type: integer
default: 50

Integer number of nearest neighbors to use when creating the k nearest neighbor graph for Louvain/Leiden clustering. k is related to the resolution of the clustering result, a bigger k will result in lower resolution and vice versa.

The number of iterations for clustering.

type: integer
default: 1

Parameters used for cell-type annotation and the associated report.

SingleCellExperiment clusters colData variable name.

type: string
default: clusters

Max cells to sample.

type: integer
default: 10000

A sample metadata unique sample ID.

type: string
default: individual

SingleCellExperiment cell-type colData variable name.

type: string
default: cluster_celltype

Cell-type metrics for categorical variables.

type: string
default: manifest,diagnosis,sex,capdate,prepdate,seqdate

Cell-type metrics for numeric variables.

type: string
default: pc_mito,pc_ribo,total_counts,total_features_by_counts

Number of top marker genes for plot/table generation.

type: integer
default: 5

Parameters for differential gene expression.

Differential gene expression method.

type: string
default: MASTZLM

MAST method.

type: string

See MAST::zlm(). Either 'glm', 'glmer' or 'bayesglm'.

Expressive gene minimum counts.

type: integer
default: 1

Only genes with at least min_counts in min_cells_pc will be tested for differential gene expression.

Expressive gene minimum cells fraction.

type: number
default: 0.1

Only genes with at least min_counts in min_cells_pc will be tested for differential gene expression. Default 0.1 (i.e. 10% of cells).

Re-scale numeric covariates.

type: string
default: true

Re-scaling and centring numeric covariates in a model can improve model performance.

Pseudobulked differential gene expression.

type: string
default: false

Perform differential gene expression on a smaller matrix where counts are first summed across all cells within a sample (defined by dge_sample_var level).
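A minimal sketch of pseudobulking as described above: per-cell counts are summed within each sample to give one count vector per sample. The function name pseudobulk, the data layout (dicts keyed by cell barcode), and the toy values are all hypothetical.

```python
from collections import defaultdict


def pseudobulk(cell_counts, cell_to_sample):
    """Sum per-cell gene counts across all cells within each sample."""
    totals = defaultdict(lambda: defaultdict(int))
    for cell, gene_counts in cell_counts.items():
        sample = cell_to_sample[cell]
        for gene, n in gene_counts.items():
            totals[sample][gene] += n
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {sample: dict(genes) for sample, genes in totals.items()}


# Hypothetical toy data: three cells from two samples.
cell_counts = {
    "cell1": {"GENE_A": 3, "GENE_B": 0},
    "cell2": {"GENE_A": 1, "GENE_B": 4},
    "cell3": {"GENE_A": 2, "GENE_B": 2},
}
cell_to_sample = {"cell1": "S1", "cell2": "S1", "cell3": "S2"}
print(pseudobulk(cell_counts, cell_to_sample))
# {'S1': {'GENE_A': 4, 'GENE_B': 4}, 'S2': {'GENE_A': 2, 'GENE_B': 2}}
```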

Cell-type annotation variable name.

type: string
default: cluster_celltype

Differential gene expression is performed separately for each cell-type of this colData variable.

Unique sample identifier variable.

type: string
default: manifest

Dependent variable of DGE model.

type: string
default: group

The dependent variable may be a categorical (e.g. diagnosis) or a numeric (e.g. histopathology score) variable.

Reference class of categorical dependent variable.

type: string
default: Control

If a categorical dependent variable is specified, then the reference class of the dependent variable is specified here (e.g. 'Control').

Confounding variables.

type: string
default: cngeneson,seqdate,pc_mito

A comma-separated list of confounding variables to account for in the DGE model.

Random effect confounding variable.

type: string
default: NULL

If specified, the term + (1 | x) + is added to the model, where x is the specified random effects variable.
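As an illustration of how a random-effect term of the form (1 | x) sits alongside the other model terms, the sketch below assembles a formula string. The function name build_model_formula, the ordering of terms, and the handling of "NULL" are assumptions made for this example; the exact formula the pipeline constructs is not stated here.

```python
def build_model_formula(dependent_var, confounders, random_effect=None):
    """Assemble an R-style model formula string (illustrative only)."""
    terms = [dependent_var] + list(confounders)
    # Treat None or the literal string "NULL" as "no random effect".
    if random_effect and random_effect != "NULL":
        terms.append(f"(1 | {random_effect})")
    return "~ " + " + ".join(terms)


print(build_model_formula("group", ["cngeneson", "seqdate", "pc_mito"], "individual"))
# ~ group + cngeneson + seqdate + pc_mito + (1 | individual)
```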

Fold-change threshold for plotting.

type: number
default: 1.1

This absolute fold-change cut-off value is used in plots (e.g. volcano) and the DGE report.

Adjusted p-value cutoff.

type: number
default: 0.05

The adjusted p-value cutoff value is used in plots (e.g. volcano) and the DGE report.
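A minimal sketch of applying both cut-offs to a single gene. It assumes that fold-changes are reported on the log2 scale, that the fold-change threshold is given on the linear scale (so 1.1 corresponds to log2(1.1)), and that both comparisons are inclusive. These are assumptions for illustration; the function name passes_thresholds is hypothetical.

```python
import math


def passes_thresholds(log2_fc, padj, fc_threshold=1.1, pval_cutoff=0.05):
    """True if a gene meets both the fold-change and adjusted p-value cut-offs."""
    # Compare on the log2 scale so up- and down-regulation are treated symmetrically.
    return abs(log2_fc) >= math.log2(fc_threshold) and padj <= pval_cutoff


print(passes_thresholds(0.5, 0.01))    # True
print(passes_thresholds(-0.5, 0.01))   # True
print(passes_thresholds(0.05, 0.01))   # False: fold-change too small
print(passes_thresholds(0.5, 0.2))     # False: adjusted p-value too large
```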

Force model fit for non-full rank.

type: string
default: false

A non-full rank model specification will return an error; to override this to return a warning only, set to TRUE.

Maximum CPU cores.

type: string
default: 'null'

The default value of 'null' utilizes all available CPU cores. As each additional CPU core increases the number of genes simultaneously fit, the RAM/memory demand increases concomitantly. Manually overriding this parameter can reduce the memory demands of parallelization across multiple cores.

Parameters for impacted pathway analysis of differentially expressed genes.

Pathway enrichment tool(s) to use.

type: string

Enrichment method.

type: string
default: ORA

Database(s) to use for enrichment.

type: string
default: GO_Biological_Process

See scFlow::list_databases(). Name of the database(s) for enrichment. Examples include "GO_Biological_Process", "GO_Cellular_Component", "GO_Molecular_Function", "KEGG", "Reactome", "Wikipathway".

Parameters for Dirichlet modeling of relative cell-type proportions.

Unique sample identifier.

type: string
default: individual

Cell-type annotation variable name.

type: string
default: cluster_celltype

Dependent variable of Dirichlet model.

type: string
default: group

Reference class of categorical dependent variable.

type: string
default: Control

Dependent variable classes order.

type: string
default: Control,Low,High

For plotting and reports, the order of classes for the dependent variable can be manually specified (e.g. 'Control,Low,High').

General parameters for plotting.

Preferred embedding for plots.

type: string
default: UMAP_Liger

Point size for reduced dimension plots.

type: number
default: 0.1

To improve visualization the point size should be adjusted according to the total number of cells plotted.

Alpha (transparency) value for reduced dimension plots.

type: number
default: 0.2

To improve visualization the alpha (transparency) value should be adjusted according to the total number of cells plotted.

Parameters used to describe centralised config profiles. These should not be edited.

Git commit id for Institutional configs.

hidden
type: string
default: master

Base directory for Institutional configs.

hidden
type: string
default: https://raw.githubusercontent.com/nf-core/configs/master

If you're running offline, Nextflow will not be able to fetch the institutional config files from the internet. If you don't need them, then this is not a problem. If you do need them, you should download the files from the repo and tell Nextflow where to find them with this parameter.

Institutional configs hostname.

hidden
type: string

Institutional config name.

hidden
type: string

Institutional config description.

hidden
type: string

Institutional config contact information.

hidden
type: string

Institutional config URL link.

hidden
type: string

Set the top limit for requested resources for any single job.

Maximum number of CPUs that can be requested for any single job.

hidden
type: integer
default: 16

Use to set an upper-limit for the CPU requirement for each process. Should be an integer e.g. --max_cpus 1

Maximum amount of memory that can be requested for any single job.

hidden
type: string
default: 256.GB
pattern: ^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$

Use to set an upper-limit for the memory requirement for each process. Should be a string in the format integer-unit e.g. --max_memory '8.GB'

Maximum amount of time that can be requested for any single job.

hidden
type: string
default: 240.h
pattern: ^(\d+\.?\s*(s|m|h|day)\s*)+$

Use to set an upper-limit for the time requirement for each process. Should be a string in the format integer-unit e.g. --max_time '2.h'
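The max_memory and max_time patterns listed above can be checked before launching a run. The sketch below copies both patterns verbatim; the helper names is_valid_memory and is_valid_time are hypothetical.

```python
import re

# Patterns copied from the max_memory and max_time definitions above.
MEMORY_PATTERN = re.compile(r"^\d+(\.\d+)?\.?\s*(K|M|G|T)?B$")
TIME_PATTERN = re.compile(r"^(\d+\.?\s*(s|m|h|day)\s*)+$")


def is_valid_memory(value: str) -> bool:
    """True if value matches the max_memory pattern, e.g. '8.GB'."""
    return MEMORY_PATTERN.fullmatch(value) is not None


def is_valid_time(value: str) -> bool:
    """True if value matches the max_time pattern, e.g. '2.h'."""
    return TIME_PATTERN.fullmatch(value) is not None


print(is_valid_memory("256.GB"), is_valid_memory("8 gigabytes"))  # True False
print(is_valid_time("240.h"), is_valid_time("2 hours"))           # True False
```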

Less common options for the pipeline, typically set in a config file.

Display help text.

hidden
type: boolean

Method used to save pipeline results to output directory.

hidden
type: string

The Nextflow publishDir option specifies which intermediate files should be saved to the output directory. This option tells the pipeline what method should be used to move these files. See Nextflow docs for details.

Email address for completion summary, only when pipeline fails.

hidden
type: string
pattern: ^([a-zA-Z0-9_\-\.]+)@([a-zA-Z0-9_\-\.]+)\.([a-zA-Z]{2,5})$

An email address to send a summary email to when the pipeline is completed - ONLY sent if the pipeline does not exit successfully.

Do not use coloured log outputs.

hidden
type: boolean

Directory to keep pipeline Nextflow logs and reports.

hidden
type: string
default: ${params.outdir}/pipeline_info

Whether to validate parameters against the schema at runtime.

hidden
type: boolean
default: true

Show all params when using --help

hidden
type: boolean

By default, parameters set as hidden in the schema are not shown on the command line when a user runs with --help. Specifying this option will tell the pipeline to show all parameters.

Run this workflow with Conda. You can also use '-profile conda' instead of providing this parameter.

hidden
type: boolean

Instead of directly downloading Singularity images for use with Singularity, force the workflow to pull and convert Docker containers instead.

hidden
type: boolean

This may be useful for example if you are unable to directly pull Singularity containers to run the pipeline due to http/https proxy issues.

E-mail address for optional workflow completion notification.

hidden
type: string

Send plain-text email instead of HTML.

hidden
type: boolean

NA

hidden
type: string