Error in h(simpleError(msg, call)) in SCTransform() when not filtering genes
The failure is caused by min_cells=1. The maximum-likelihood estimate (MLE) of theta is not defined when there is only a single observation:
> theta.ml(y = 1, mu = 1 )
Error in while ((it <- it + 1) < limit && abs(del) > eps) { :
  missing value where TRUE/FALSE needed
The default min_cells in SCTransform is 5. You can either switch to method="glmGamPoi", which will force the overdispersion estimate here to zero, or set min_cells>=3.
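To make the failure mode concrete, here is a minimal sketch in plain R that calls MASS::theta.ml directly, outside of SCTransform. The helper name safe_theta is hypothetical (it is not part of MASS, sctransform, or Seurat) and the simulated counts are made up for illustration. It only shows that the single-observation call errors while a call on many observations returns an estimate.

```r
library(MASS)  # provides theta.ml()

# Hypothetical helper (not part of any package): return the ML estimate of
# theta, or NA when theta.ml() fails, as it does for a single observation.
safe_theta <- function(y, mu) {
  tryCatch(
    as.numeric(suppressWarnings(theta.ml(y = y, mu = mu))),
    error = function(e) NA_real_
  )
}

# The degenerate case from the error message above: one observation.
single <- safe_theta(y = 1, mu = 1)
print(single)

# Simulated counts (illustrative only): with many observations an
# estimate is returned.
set.seed(1)
y <- rnbinom(200, size = 2, mu = 5)
many <- safe_theta(y = y, mu = rep(mean(y), length(y)))
print(is.finite(many))
```

Inside SCTransform, the two options mentioned above (method="glmGamPoi" or a higher min_cells) avoid reaching this degenerate fit in the first place.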
I'm working with a dataset of 6002 cells and I'm encountering the same issue as #3740. I've tried to solve it by installing the develop branch with the command you gave, but I still get the same error:
> raw <- read.table("/data/heart_counts_6000.txt", row.names = 1, header = TRUE)
> s <- CreateSeuratObject(raw, min.cells = 1)
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
> s <- SCTransform(s, min_cells=1, return.only.var.genes = FALSE)
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 21245 by 6002
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 5000 cells
|==================================================================================================== | 75%
warning: solve(): system seems singular; attempting approx solution
Error in h(simpleError(msg, call)) :
  error in evaluating the argument 'x' in selecting a method for function 't': missing value where TRUE/FALSE needed
In addition: There were 50 or more warnings (use warnings() to see the first 50)
However, if I run SCTransform() with the default values, it doesn't raise the error:
> s <- SCTransform(s)
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 17576 by 6002
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 5000 cells
|======================================================================================================================================| 100%
There are 27 estimated thetas smaller than 1e-07 - will be set to 1e-07
Found 60 outliers - those will be ignored in fitting/regularization step
Second step: Get residuals using fitted parameters for 17576 genes
|======================================================================================================================================| 100%
Computing corrected count matrix for 17576 genes
|======================================================================================================================================| 100%
Calculating gene attributes
Wall clock passed: Time difference of 1.153748 mins
Determine variable features
Place corrected count matrix in counts slot
Centering data matrix
|======================================================================================================================================| 100%
Set default assay to SCT
There were 50 or more warnings (use warnings() to see the first 50)
I've also tried running SCTransform() without filtering any genes on a smaller dataset (646 cells), and it works fine:
> raw <- read.table("/data/transfcounts_performance_test/count_table_ILCs.txt", row.names = 1, header = TRUE)
> s <- CreateSeuratObject(raw, min.cells = 1)
Warning: Feature names cannot have underscores ('_'), replacing with dashes ('-')
> s <- SCTransform(s, min_cells=1, return.only.var.genes = FALSE)
Calculating cell attributes from input UMI matrix: log_umi
Variance stabilizing transformation of count matrix of size 37982 by 646
Model formula is y ~ log_umi
Get Negative Binomial regression parameters per gene
Using 2000 genes, 646 cells
|======================================================================================================================================| 100%
There are 1 estimated thetas smaller than 1e-07 - will be set to 1e-07
Found 45 outliers - those will be ignored in fitting/regularization step
Second step: Get residuals using fitted parameters for 37982 genes
|======================================================================================================================================| 100%
Computing corrected count matrix for 37982 genes
|======================================================================================================================================| 100%
Calculating gene attributes
Wall clock passed: Time difference of 19.15656 secs
Determine variable features
Place corrected count matrix in counts slot
Centering data matrix
|======================================================================================================================================| 100%
Set default assay to SCT
There were 50 or more warnings (use warnings() to see the first 50)
Does it have something to do with the size of the dataset?
I would like to attach my input count table, but it's too big. Let me know if you want me to send it through another platform.