一文学会ComplexHeatmap包绘制单细胞中常见的聚类
大家好,我是风。这里是风风的CNS系列专栏。酸菜老师和老谈老师在不久前刚在直播中聊到了单细胞空间转录组的相关内容,为了追随酸谈二侠的脚步,今天我们一起学习一下利用ComplexHeatmap包来绘制单细胞的中的常见的聚类,来看看ComplexHeatmap做出来的图片是不是比一般Seurat的图片更好看呢?
传送门
这里我们使用nature biotechnology上面的一篇文章“Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells”的单细胞测序数据进行可视化。
期刊信息
数据处理
# 加载R包library(ComplexHeatmap)library(circlize)# 加载单细胞测序数据expr <- read.table("mouse_scRNA.txt", sep = "\t", header = TRUE,row.names = 1)class(expr)## [1] "data.frame"expr[1:3,1:3]## Gnai3 Cdc45 Narf## Cell 1 3.2322 3.1981 0.29411## Cell 2 1.9832 1.1730 0.49389## Cell 3 2.2482 3.1705 1.62790
观察数据发现行名为细胞类型,列名为分子名称,接下来我们对数据进行预处理:
# 过滤掉在一半以上的细胞中没有表达的基因expr <- as.matrix(expr)expr <- expr[apply(expr, 1, function(x)sum(x > 0)/length(x) > 0.5), , drop = FALSE]# 构建一个function寻找特征基因,特征基因在细胞之间表达多样,并与其他基因相关性高get_correlated_variable_genes <- function(mat, n = nrow(mat), cor_cutoff = 0, n_cutoff = 0){ ind = order(apply(mat, 1, function(x){ q = quantile(x, c(0.1, 0.9)) x = x[x < q[1] & x > q[2]]var(x)/mean(x) }), decreasing = TRUE)[1:n] mat2 = mat[ind, , drop = FALSE] dt = cor(t(mat2), method = "spearman") # 使用Spearman相关性分析 diag(dt) = 0# 将相关系数矩阵的对角线数值设置为0 dt[abs(dt) < cor_cutoff] = 0# 设置相关系数的绝对值小于cutoff值即设置为0,也就是只保留相关系数大于cutoff的结果 dt[dt < 0] = -1 dt[dt > 0] = 1 i = colSums(abs(dt)) > n_cutoff  mat3 = mat2[i, ,drop = FALSE]return(mat3)}# 利用自定义函数进行相关性分析,这里定义每个基因与至少50个基因相关,并且绝对相关性大于0.5mat <- get_correlated_variable_genes(expr, cor_cutoff = 0.5, n_cutoff = 20) mat[1:3,1:3]## Gnai3 Cdc45 Narf## Cell 1 3.2322 3.1981 0.29411## Cell 2 1.9832 1.1730 0.49389## Cell 3 2.2482 3.1705 1.62790# 由于单细胞RNASeq数据高度可变且多数离群,因此在第10和第90个分位数内的基因表达量内进行数据标准化,得到的mat2包含每个基因标准化的表达值,表示每个基因在细胞中的相对表达mat2 <- t(apply(mat, 1, function(x){ q10 = quantile(x, 0.1) q90 = quantile(x, 0.9) x[x < q10] = q10 x[x > q90] = q90 scale(x)}))colnames(mat2) <- colnames(mat)mat2 <- t(mat2)mat2[1:3,1:3]## Cell 1 Cell 2 Cell 3## Gnai3 1.5224311 0.3856825 0.6499109## Cdc45 1.5020892 -0.2635181 1.3440641## Narf -0.8570421 -0.8076784 0.1471516# 对mat数据进行转置,方便后续匹配数据mat3 <- as.data.frame(t(mat))mat3[1:3,1:3]## Cell 1 Cell 2 Cell 3## Gnai3 3.23220 1.98320 2.2482## Cdc45 3.19810 1.17300 3.1705## Narf 0.29411 0.49389 1.6279
提取细胞周期基因和核糖核酸基因
cc<- readRDS("mouse_cell_cycle_gene.rds")ccl<- rownames(mat3) %in% cccc_gene <- rownames(mat3)[ccl]rp<- readRDS("mouse_ribonucleoprotein.rds")rpl <- rownames(mat3) %in% rp# 计算基因在所有细胞中的平均表达值base_mean<- rowMeans(mat3)
绘制热图可视化
前面我们已经得到了缩放的表达矩阵mat2、基数平均值base_mean、矩阵中的细胞周期基因ccl、矩阵中的核糖核酸基因rpl,接下来我们将使用上面处理的结果进行绘图:
library(GetoptLong)library(ComplexHeatmap)library(circlize)# 根据结果绘制热图Heatmap(mat2, col = colorRamp2(c(-1.5, 0, 1.5), c("blue", "white", "red")), name = "scaled_expr", column_title = qq("relative expression for @{nrow(mat3)} genes"), show_column_names = TRUE, width = unit(8, "cm"), show_row_names = FALSE, column_names_gp = gpar(fontsize = 5), heatmap_legend_param = list(title = "Scaled expr"))
# 对每一行添加base mean值Heatmap(mat2, col = colorRamp2(c(-1.5, 0, 1.5), c("blue", "white", "red")), name = "scaled_expr", column_title = qq("relative expression for @{nrow(mat3)} genes"), show_column_names = TRUE, width = unit(8, "cm"), show_row_names = FALSE, column_names_gp = gpar(fontsize = 5), heatmap_legend_param = list(title = "Scaled expr")) + Heatmap(base_mean, name = "base_expr", width = unit(5, "mm"), show_row_names = FALSE, heatmap_legend_param = list(title = "Base expr"))
# 添加表型相关基因热图Heatmap(mat2, col = colorRamp2(c(-1.5, 0, 1.5), c("blue", "white", "red")), name = "scaled_expr", column_title = qq("relative expression for @{nrow(mat3)} genes"), show_column_names = TRUE, width = unit(8, "cm"), show_row_names = FALSE, column_names_gp = gpar(fontsize = 5), heatmap_legend_param = list(title = "Scaled expr")) + Heatmap(base_mean, name = "base_expr", width = unit(5, "mm"), heatmap_legend_param = list(title = "Base expr")) + Heatmap(rpl + 0, name = "ribonucleoprotein", col = c("0" = "white", "1" = "purple"), show_heatmap_legend = FALSE, width = unit(5, "mm")) + Heatmap(ccl + 0, name = "cell_cycle", col = c("0" = "white", "1" = "red"), show_heatmap_legend = FALSE, width = unit(5, "mm"))
行名注释
最后我们把前面提取的细胞周期的行名注释上就完成这个图片啦:
# 选取行名进行注释Heatmap(mat2, col = colorRamp2(c(-1.5, 0, 1.5), c("blue", "white", "red")), name = "scaled_expr", column_title = qq("relative expression for @{nrow(mat3)} genes"), width = unit(8, "cm"), show_column_names = TRUE,show_row_names = FALSE, column_names_gp = gpar(fontsize = 5), heatmap_legend_param = list(title = "Scaled expr")) + Heatmap(base_mean, name = "base_expr", width = unit(5, "mm"), heatmap_legend_param = list(title = "Base expr")) + Heatmap(rpl + 0, name = "ribonucleoprotein", col = c("0" = "white", "1" = "purple"), show_heatmap_legend = FALSE, width = unit(5, "mm")) + Heatmap(ccl + 0, name = "cell_cycle", col = c("0" = "white", "1" = "red"), show_heatmap_legend = FALSE, width = unit(5, "mm")) + rowAnnotation(link = anno_mark(at = which(ccl & base_mean > quantile(base_mean, 0.8)), labels = rownames(mat3)[ccl & base_mean > quantile(base_mean, 0.8)], labels_gp = gpar(cex = 0.5,fontsize = 3), padding = unit(1, "mm")))
好了,今天的内容又结束啦。今天算是一个特别篇,给大家展示了一下单细胞数据的简单处理,后台回复“风25”就可以获得示例数据啦!关于更多单细胞的内容,我们可以在ComplexHeatmap系列结束后继续探讨,但是不得不说单细胞数据的处理真的太耗内存了,我的电脑好烫好烫好烫。下一讲,我们要继续ComplexHeatmap之旅,进行图形注释的内容!我们下期不见不散!o( ̄▽ ̄)ブ
欢迎大家关注解螺旋生信频道-挑圈联靠公号~
继续阅读
阅读原文