WGCNA秘籍教程高级版
大家好,我是风风。今天我们还是要继续WGCNA系列。在之前的推文中,我们已经一起做了对单个数据集进行了WGCNA分析并完整地走完了单个WGCNA的流程和针对多数据集联合分析的共识性分析。然而,在前面的两次推文分析构建网络的过程中,我们使用的都是“自动构建拓扑异构网络”法。但是,在实际分析过程中,有时候我们想要根据自己的数据集的实际情况,去个性化调节每一个步骤,从而构建更优也更符合心意的WGCNA网络,这样就需要用到了我们今天所要一起学习的内容——“逐步法构建拓扑异构网络”。
我们前面也说过了,构建网络是WGCNA的第一个关键点。在公共数据挖掘的文章中,相同领域的研究者所使用的数据集大同小异,比如研究结肠癌的研究人员总会考虑使用TCGA数据库的TCGA-COAD数据集,如果大家都使用前面说的“自动构建拓扑异构网络”法构建网络,那大家所构建的网络应该都差不多才对是吧(都使用全部基因的表达矩阵进行WGCNA分析的前提下)?
然而实际上,即使使用同一份数据集,总有人的结果比较好,而有的人结果比较差,原因之一就是可能有的研究者在使用了“逐步法构建拓扑异构网络”,并且个性化参数设置得很好。如何找到适合自己数据集的个性化参数,这是需要自己不断试错不断探索的过程。即使你去找专业的分析公司,公司一般也不会花时间帮你试错,只会是用最简单的方法得到结果后告诉你数据确实不适合使用这种分析方法。毫不夸张地说,自己学会了WGCNA的个性化网络步骤,可以让自己省下不少钱,还能更有效利用自己手上的数据。怎么样?听起来很诱人了吧?
此外,WGCNA还有一个难题,就是面对大数据集分析的适合,直接构建网络会导致WGCNA速度非常慢,甚至出现R studio崩溃的情况,今天我们也一起来看看大数据集如何切分block有效进行WGCNA的方法
好了,话不多说,我们进入今天主题,前面的数据处理的方法依旧和前两次的系列推文一致:
数据清洗
在进行WGCNA之前,我们首先要进行数据清洗:
# BiocManager::install("WGCNA") 安装R包library(WGCNA)## 载入需要的程辑包:dynamicTreeCut## 载入需要的程辑包:fastcluster## ## 载入程辑包:'fastcluster'## The following object is masked from 'package:stats':## ## hclust#### ## 载入程辑包:'WGCNA'## The following object is masked from 'package:stats':## ## coroptions(stringsAsFactors = FALSE)# 加载LiverFemale3600数据集femData <- read.csv("data//LiverFemale3600.csv");# 查看数据dim(femData)## [1] 3600 143names(femData)## [1] "substanceBXH""gene_symbol""LocusLinkID""ProteomeID"## [5] "cytogeneticLoc""CHROMOSOME""StartPosition""EndPosition"## [9] "F2_2""F2_3""F2_14""F2_15"## [13] "F2_19""F2_20""F2_23""F2_24"## [17] "F2_26""F2_37""F2_42""F2_43"## [21] "F2_45""F2_46""F2_47""F2_48"## [25] "F2_51""F2_52""F2_54""F2_63"## [29] "F2_65""F2_66""F2_68""F2_69"## [33] "F2_70""F2_71""F2_72""F2_78"## [37] "F2_79""F2_80""F2_81""F2_83"## [41] "F2_86""F2_87""F2_88""F2_89"## [45] "F2_107""F2_108""F2_109""F2_110"## [49] "F2_111""F2_112""F2_117""F2_119"## [53] "F2_125""F2_126""F2_127""F2_141"## [57] "F2_142""F2_143""F2_144""F2_145"## [61] "F2_154""F2_155""F2_156""F2_157"## [65] "F2_162""F2_163""F2_164""F2_165"## [69] "F2_166""F2_167""F2_169""F2_180"## [73] "F2_181""F2_182""F2_187""F2_188"## [77] "F2_189""F2_190""F2_191""F2_192"## [81] "F2_194""F2_195""F2_200""F2_201"## [85] "F2_212""F2_213""F2_214""F2_215"## [89] "F2_221""F2_222""F2_223""F2_224"## [93] "F2_225""F2_226""F2_227""F2_228"## [97] "F2_241""F2_242""F2_243""F2_244"## [101] "F2_245""F2_247""F2_248""F2_261"## [105] "F2_263""F2_264""F2_270""F2_271"## [109] "F2_272""F2_278""F2_287""F2_288"## [113] "F2_289""F2_290""F2_291""F2_296"## [117] "F2_298""F2_299""F2_300""F2_302"## [121] "F2_303""F2_304""F2_305""F2_306"## [125] "F2_307""F2_308""F2_309""F2_310"## [129] "F2_311""F2_312""F2_320""F2_321"## [133] "F2_323""F2_324""F2_325""F2_326"## [137] "F2_327""F2_328""F2_329""F2_330"## [141] "F2_332""F2_355""F2_357"#### 数据清洗# 提取表达矩阵并行列转置,将行名变为样本名datExpr0 <- as.data.frame(t(femData[, -c(1:8)]))names(datExpr0) <- femData$substanceBXHrownames(datExpr0) <- names(femData)[-c(1:8)]# 判断矩阵的样本是否都合格?gsg <- goodSamplesGenes(datExpr0, verbose = 3);## Flagging genes and samples with too many missing values...## ..step 1gsg$allOK # 显示TURE则为都合格,不需要进行额外处理,如果是FALSE,则运行下面这部分代码## [1] TRUEif (!gsg$allOK){# Optionally, print the gene and sample names that were removed:if (sum(!gsg$goodGenes)>0) printFlush(paste("Removing genes:", paste(names(datExpr0)[!gsg$goodGenes], collapse = ", ")));if (sum(!gsg$goodSamples)>0) printFlush(paste("Removing samples:", paste(rownames(datExpr0)[!gsg$goodSamples], collapse = ", ")));# Remove the offending genes and samples from the data: datExpr0 = datExpr0[gsg$goodSamples, gsg$goodGenes]}# 对样本进行聚类,dist为计算距离,hclust为聚类sampleTree <- hclust(dist(datExpr0), method = "average")# 聚类结果可视化par(cex = 0.6);par(mar = c(0,4,2,0))plot(sampleTree, main = "Sample clustering to detect outliers", sub="", xlab="", cex.lab = 1.5,     cex.axis = 1.5, cex.main = 2)# 根据聚类结果,发现最左边的F2_221游离于其他样本之外,因此对其进行剪切去除# 绘制切割水平线abline(h = 15, col = "red")
# 确定集群clust <- cutreeStatic(sampleTree, cutHeight = 15, minSize = 10)table(clust)## clust## 0 1 ## 1 134# 我们需要保留clust 1的样本,去除clust 0的样本keepSamples <- (clust==1)datExpr <- datExpr0[keepSamples, ]nGenes <- ncol(datExpr)nSamples <- nrow(datExpr)# 读取表型数据traitData <- read.csv("data//ClinicalTraits.csv")dim(traitData)## [1] 361 38names(traitData)## [1] "X""Mice""Number"## [4] "Mouse_ID""Strain""sex"## [7] "DOB""parents""Western_Diet"## [10] "Sac_Date""weight_g""length_cm"## [13] "ab_fat""other_fat""total_fat"## [16] "comments""X100xfat_weight""Trigly"## [19] "Total_Chol""HDL_Chol""UC"## [22] "FFA""Glucose""LDL_plus_VLDL"## [25] "MCP_1_phys""Insulin_ug_l""Glucose_Insulin"## [28] "Leptin_pg_ml""Adiponectin""Aortic.lesions"## [31] "Note""Aneurysm""Aortic_cal_M"## [34] "Aortic_cal_L""CoronaryArtery_Cal""Myocardial_cal"## [37] "BMD_all_limbs""BMD_femurs_only"# 去掉多余的信息(不要的信息)allTraits <- traitData[, -c(31, 16)];allTraits <- allTraits[, c(2, 11:36) ];dim(allTraits)## [1] 361 27names(allTraits)## [1] "Mice""weight_g""length_cm"## [4] "ab_fat""other_fat""total_fat"## [7] "X100xfat_weight""Trigly""Total_Chol"## [10] "HDL_Chol""UC""FFA"## [13] "Glucose""LDL_plus_VLDL""MCP_1_phys"## [16] "Insulin_ug_l""Glucose_Insulin""Leptin_pg_ml"## [19] "Adiponectin""Aortic.lesions""Aneurysm"## [22] "Aortic_cal_M""Aortic_cal_L""CoronaryArtery_Cal"## [25] "Myocardial_cal""BMD_all_limbs""BMD_femurs_only"# 合并表达矩阵和表型数据femaleSamples <- rownames(datExpr);traitRows <- match(femaleSamples, allTraits$Mice);datTraits <- allTraits[traitRows, -1];rownames(datTraits) = allTraits[traitRows, 1];collectGarbage();# 对样本进行重聚类sampleTree2 <- hclust(dist(datExpr), method = "average")# 设置颜色traitColors <- numbers2colors(datTraits, signed = FALSE);# 聚类结果可视化plotDendroAndColors(sampleTree2, traitColors, groupLabels = names(datTraits), main = "Sample dendrogram and trait heatmap")
# 保存清洗后的数据save(datExpr, datTraits, file = "FemaleLiver-01-dataInput.RData")
逐步法构建网络
模块识别分为两种,一种是自动识别,另外一种是逐步法构建网络并识别,自动识别并一步构建网络的方法我们前面已经做过了,今天我们使用逐步法构建WGCNA网络:
# 设置多线程进行WGCNA分析enableWGCNAThreads()## Allowing parallel execution with up to 11 working processes.# 加载前面清洗完成的数据lnames <- load(file = "FemaleLiver-01-dataInput.RData")lnames## [1] "datExpr" "datTraits"# 设置软阈值区间powers <- c(c(1:10), seq(from = 12, to=20, by=2))# 调用网络拓扑分析功能sft <- pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)## pickSoftThreshold: will use block size 3600.## pickSoftThreshold: calculating connectivity for given powers...## ..working on genes 1 through 3600 of 3600## Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.## 1 1 0.0278 0.345 0.456 747.00 762.0000 1210.0## 2 2 0.1260 -0.597 0.843 254.00 251.0000 574.0## 3 3 0.3400 -1.030 0.972 111.00 102.0000 324.0## 4 4 0.5060 -1.420 0.973 56.50 47.2000 202.0## 5 5 0.6810 -1.720 0.940 32.20 25.1000 134.0## 6 6 0.9020 -1.500 0.962 19.90 14.5000 94.8## 7 7 0.9210 -1.670 0.917 13.20 8.6800 84.1## 8 8 0.9040 -1.720 0.876 9.25 5.3900 76.3## 9 9 0.8590 -1.700 0.836 6.80 3.5600 70.5## 10 10 0.8330 -1.660 0.831 5.19 2.3800 65.8## 11 12 0.8530 -1.480 0.911 3.33 1.1500 58.1## 12 14 0.8760 -1.380 0.949 2.35 0.5740 51.9## 13 16 0.9070 -1.300 0.970 1.77 0.3090 46.8## 14 18 0.9120 -1.240 0.973 1.39 0.1670 42.5## 15 20 0.9310 -1.210 0.977 1.14 0.0951 38.7powerEstimates <- sft$powerEstimate # 查看自动识别的最佳软阈值# 如果上述没有自动识别最佳软阈值,则可以将结果进行可视化后尽心选取# 结果可视化par(mfrow = c(1,2))cex1 = 0.9;# 无标度拓扑拟合指数plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2], xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n", main = paste("Scale independence"))text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2], labels=powers,cex=cex1,col="red")# 设置R2的截点abline(h=0.90,col="red")# 平均连接度plot(sft$fitIndices[,1], sft$fitIndices[,5], xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n", main = paste("Mean connectivity"))text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")
# 从上图可以看到,当power选取为6的时候,左边的R2值达到了0.9,右边的Mean Connectivity也趋向于平缓,因此我们选择6为最佳软阈值softPower = powerEstimates # 设置最佳软阈值adjacency<- adjacency(datExpr,                      power = softPower)# 将最邻近值转化为拓扑网络TOM<- TOMsimilarity(adjacency)## ..connectivity..## ..matrix multiplication (system BLAS)..## ..normalization..## ..done.dissTOM <- 1-TOM# 调用层次聚类函数geneTree<- hclust(as.dist(dissTOM),method = "average")# 绘制生成的聚类树plot(geneTree,xlab="",sub="",main = "Gene clustering on TOM-based dissimilarity",labels = FALSE, hang = 0.04)# 如果我们喜欢大的模块,那我们就可以把最小模块的数值设置得比较高一些:minModuleSize = 100 # 模块识别使用动态切割:dynamicMods<- cutreeDynamic(dendro = geneTree, distM = dissTOM,deepSplit = 2, pamRespectsDendro = FALSE,minClusterSize = minModuleSize)## ..cutHeight not given, setting it to 0.996 ===> 99% of the (truncated) height range in dendro.## ..done.table(dynamicMods)## dynamicMods## 0 1 2 3 4 5 6 7 8 9 ## 174 886 630 489 431 407 208 153 122 100# 可以看到,每个模块的基因数目都大于50# 转换数字标签成颜色标签dynamicColors<- labels2colors(dynamicMods)table(dynamicColors)## dynamicColors## black blue brown green grey magenta pink red ## 153 630 489 407 174 100 122 208 ## turquoise yellow ## 886 431# 在下面画出树状图和颜色plotDendroAndColors(geneTree,dynamicColors,"Dynamic Tree Cut",dendroLabels = FALSE,hang = 0.03,addGuide = TRUE,guideHang = 0.05,main = "Gene dendrogram and module colors")
# 接下来,我们就可以计算Eigengenes了MEList <- moduleEigengenes(datExpr, colors = dynamicColors)str(MEList)## List of 12## $ eigengenes :'data.frame': 134 obs. of 10 variables:## ..$ MEblack : num [1:134] -0.08299 -0.00357 -0.11687 -0.0292 -0.0314 ...## ..$ MEblue : num [1:134] -0.0848 0.0259 -0.0453 0.0836 0.025 ...## ..$ MEbrown : num [1:134] -0.1362 -0.0398 0.0266 -0.0242 0.0906 ...## ..$ MEgreen : num [1:134] 0.0289 0.0107 -0.1005 0.0928 -0.0971 ...## ..$ MEgrey : num [1:134] 0.01447 0.0065 0.00191 -0.02718 0.02109 ...## ..$ MEmagenta : num [1:134] 0.0103 -0.0137 0.0286 -0.0226 0.0031 ...## ..$ MEpink : num [1:134] -0.02191 -0.00401 0.59452 -0.03198 0.00104 ...## ..$ MEred : num [1:134] 0.013 0.0682 0.0661 -0.0633 0.0646 ...## ..$ MEturquoise: num [1:134] -0.00177 -0.04241 -0.20039 0.0151 -0.06133 ...## ..$ MEyellow : num [1:134] 0.00765 0.07166 0.05412 0.05204 0.01282 ...## $ averageExpr :'data.frame': 134 obs. of 10 variables:## ..$ AEblack : num [1:134] -0.418 0.016 -0.616 -0.135 -0.142 ...## ..$ AEblue : num [1:134] -0.3749 0.0951 -0.1778 0.337 0.1012 ...## ..$ AEbrown : num [1:134] -0.793 -0.313 0.126 -0.203 0.562 ...## ..$ AEgreen : num [1:134] 0.112 0.146 -0.795 0.539 -0.729 ...## ..$ AEgrey : num [1:134] -0.0187 -0.1047 -0.0813 -0.1834 -0.1032 ...## ..$ AEmagenta : num [1:134] 0.0952 -0.1483 0.2859 -0.2087 0.0143 ...## ..$ AEpink : num [1:134] -0.219 -0.0147 5.4851 -0.3072 0.0279 ...## ..$ AEred : num [1:134] 0.0956 0.5748 0.5113 -0.5394 0.5385 ...## ..$ AEturquoise: num [1:134] 0.0935 -0.0642 -0.4668 -0.0236 0.0163 ...## ..$ AEyellow : num [1:134] 0.0734 0.4352 0.2139 0.3169 0.071 ...## $ varExplained:'data.frame': 1 obs. of 10 variables:## ..$ X1 : num 0.541## ..$ X2 : num 0.392## ..$ X3 : num 0.439## ..$ X4 : num 0.469## ..$ X5 : num 0.16## ..$ X6 : num 0.663## ..$ X7 : num 0.639## ..$ X8 : num 0.578## ..$ X9 : num 0.399## ..$ X10: num 0.486## $ nPC : num 1## $ validMEs : logi [1:10] TRUE TRUE TRUE TRUE TRUE TRUE ...## $ validColors : chr [1:3600] "grey" "turquoise" "turquoise" "grey" ...## $ allOK : logi TRUE## $ allPC : logi TRUE## $ isPC : logi [1:10] TRUE TRUE TRUE TRUE TRUE TRUE ...## $ isHub : logi [1:10] FALSE FALSE FALSE FALSE FALSE FALSE ...## $ validAEs : logi [1:10] TRUE TRUE TRUE TRUE TRUE TRUE ...## $ allAEOK : logi TRUEMEs <- MEList$eigengenes# 计算模块特征间的不相似性MEDiss = 1 - cor(MEs)METree <- hclust(as.dist(MEDiss), method = "average")# 结果可视化plot(METree, main = "Clustering of module eigengenes", xlab = "", sub = "")MEDissThres = 0.5 # 将低于0.5的模块合并# 在树状图中画出切线abline(h = MEDissThres, col = "red")# 合并相似模块merge <- mergeCloseModules(datExpr, dynamicColors, cutHeight = MEDissThres, verbose = 3)## mergeCloseModules: Merging modules whose distance is less than 0.5## multiSetMEs: Calculating module MEs.## Working on set 1 ...## moduleEigengenes: Calculating 10 module eigengenes in given set.## multiSetMEs: Calculating module MEs.## Working on set 1 ...## moduleEigengenes: Calculating 7 module eigengenes in given set.## Calculating new MEs...## multiSetMEs: Calculating module MEs.## Working on set 1 ...## moduleEigengenes: Calculating 7 module eigengenes in given set.mergedColors = merge$colors# 新合并模块的特征基因:mergedMEs = merge$newMEs# 进行可视化plotDendroAndColors(geneTree, cbind(dynamicColors, mergedColors), c("Dynamic Tree Cut", "Merged dynamic"), dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05)
# 重命名模块颜色名称moduleColors = mergedColorscolorOrder = c("grey", standardColors(50))moduleLabels = match(moduleColors, colorOrder) - 1MEs = mergedMEs# 保存模块颜色和标签,以供后续部分使用save(MEs, moduleLabels, moduleColors, geneTree, file = "FemaleLiver-02-networkConstruction-stepByStep.RData")# 后面的分析就跟我们前面的内容一致啦,可以无缝衔接关联表型信息和模块啦!
大数据集进行WGCNA分析
对于大数据集进行WGCNA分析,可以使用数据集切割的方法,将大数据集自动切割为小数据集块,然后再进行WGCNA分析的逐块网络构建和模块检测,说起来好像很复杂,其实WGCNA也提供了一种函数直接分析,我们来看一下:
## 加载数据lnames = load(file = "FemaleLiver-01-dataInput.RData")lnames## [1] "datExpr" "datTraits"# 设置软阈值区间powers = c(c(1:10), seq(from = 12, to=20, by=2))# 计算软阈值sft <- pickSoftThreshold(datExpr, powerVector = powers, verbose = 5)## pickSoftThreshold: will use block size 3600.## pickSoftThreshold: calculating connectivity for given powers...## ..working on genes 1 through 3600 of 3600## Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k.## 1 1 0.0278 0.345 0.456 747.00 762.0000 1210.0## 2 2 0.1260 -0.597 0.843 254.00 251.0000 574.0## 3 3 0.3400 -1.030 0.972 111.00 102.0000 324.0## 4 4 0.5060 -1.420 0.973 56.50 47.2000 202.0## 5 5 0.6810 -1.720 0.940 32.20 25.1000 134.0## 6 6 0.9020 -1.500 0.962 19.90 14.5000 94.8## 7 7 0.9210 -1.670 0.917 13.20 8.6800 84.1## 8 8 0.9040 -1.720 0.876 9.25 5.3900 76.3## 9 9 0.8590 -1.700 0.836 6.80 3.5600 70.5## 10 10 0.8330 -1.660 0.831 5.19 2.3800 65.8## 11 12 0.8530 -1.480 0.911 3.33 1.1500 58.1## 12 14 0.8760 -1.380 0.949 2.35 0.5740 51.9## 13 16 0.9070 -1.300 0.970 1.77 0.3090 46.8## 14 18 0.9120 -1.240 0.973 1.39 0.1670 42.5## 15 20 0.9310 -1.210 0.977 1.14 0.0951 38.7# 图片可视化par(mfrow = c(1,2))cex1 = 0.9plot(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2], xlab="Soft Threshold (power)",ylab="Scale Free Topology Model Fit,signed R^2",type="n", main = paste("Scale independence"));text(sft$fitIndices[,1], -sign(sft$fitIndices[,3])*sft$fitIndices[,2], labels=powers,cex=cex1,col="red");abline(h = 0.90,col="red")plot(sft$fitIndices[,1], sft$fitIndices[,5], xlab="Soft Threshold (power)",ylab="Mean Connectivity", type="n", main = paste("Mean connectivity"))text(sft$fitIndices[,1], sft$fitIndices[,5], labels=powers, cex=cex1,col="red")
# 从所有基因块计算模块特征基因(自动网络建设和模块检测)bwnet <- blockwiseModules(datExpr, maxBlockSize = 2000, # 很重要!!!表示每次只对数据中的2000个基因进行网络建设和模块识别,也就是以2000为一个block进行处理,适合对大数据集进行分析 power = 6, TOMType = "unsigned", minModuleSize = 30, reassignThreshold = 0, mergeCutHeight = 0.25, numericLabels = TRUE, saveTOMs = TRUE, saveTOMFileBase = "femaleMouseTOM-blockwise", verbose = 3)## Calculating module eigengenes block-wise from all genes## Flagging genes and samples with too many missing values...## ..step 1## Cluster size 3600 broken into 2108 1492 ## Cluster size 2108 broken into 1126 982 ## Done cluster 1126 ## Done cluster 982 ## Done cluster 2108 ## Done cluster 1492 ## ....pre-clustering genes to determine blocks..## Projective K-means:## projectiveKMeans: imputing missing data in 'datExpr'.## To reproduce older results, use 'imputeMissing = FALSE'. ## Cluster size 3600 broken into 2108 1492 ## Cluster size 2108 broken into 1126 982 ## Done cluster 1126 ## Done cluster 982 ## Done cluster 2108 ## Done cluster 1492 ## ..k-means clustering..## ..merging smaller clusters...## Block sizes:## gBlocks## 1 2 ## 1869 1731 ## ..Working on block 1 .## TOM calculation: adjacency..## ..will not use multithreading.## Fraction of slow calculations: 0.434989## ..connectivity..## ..matrix multiplication (system BLAS)..## ..normalization..## ..done.## ..saving TOM for block 1 into file femaleMouseTOM-blockwise-block.1.RData## ....clustering..## ....detecting modules..## ....calculating module eigengenes..## ....checking kME in modules..## ..removing 3 genes from module 1 because their KME is too low.## ..Working on block 2 .## TOM calculation: adjacency..## ..will not use multithreading.## Fraction of slow calculations: 0.353419## ..connectivity..## ..matrix multiplication (system BLAS)..## ..normalization..## ..done.## ..saving TOM for block 2 into file femaleMouseTOM-blockwise-block.2.RData## ....clustering..## ....detecting modules..## ....calculating module eigengenes..## ....checking kME in modules..## ..removing 1 genes from module 2 because their KME is too low.## ..merging modules that are too close..## mergeCloseModules: Merging modules whose distance is less than 0.25## Calculating new MEs...# 加载单块分析的结果load(file = "FemaleLiver-02-networkConstruction-auto.RData")# 重新贴标签于块模块bwLabels <- matchLabels(bwnet$colors, moduleLabels)bwModuleColors <- labels2colors(bwLabels)table(bwModuleColors)## bwModuleColors## black blue brown cyan green greenyellow ## 210 474 538 77 270 98 ## grey grey60 lightcyan lightgreen magenta midnightblue ## 148 42 65 34 120 78 ## pink purple red royalblue salmon tan ## 142 110 130 88 81 142 ## turquoise yellow ## 448 305# 查看总共分为了几个blockbwnet[["TOMFiles"]]## [1] "femaleMouseTOM-blockwise-block.1.RData"## [2] "femaleMouseTOM-blockwise-block.2.RData"# 3600个基因,每2000个为一个block,正好分为2个block# 为block 1绘制树状图和模块颜色plotDendroAndColors(bwnet$dendrograms[[1]], bwModuleColors[bwnet$blockGenes[[1]]], "Module colors", main = "Gene dendrogram and module colors in block 1", dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05)
# 为block 2绘制树状图和模块颜色plotDendroAndColors(bwnet$dendrograms[[2]], bwModuleColors[bwnet$blockGenes[[2]]],"Module colors", main = "Gene dendrogram and module colors in block 2", dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05)
plotDendroAndColors(geneTree, cbind(moduleColors, bwModuleColors), c("Single block", "2 blocks"), main = "Single block gene dendrogram and module colors", dendroLabels = FALSE, hang = 0.03, addGuide = TRUE, guideHang = 0.05)
singleBlockMEs <- moduleEigengenes(datExpr, moduleColors)$eigengenesblockwiseMEs <- moduleEigengenes(datExpr, bwModuleColors)$eigengenessingle2blockwise <- match(names(singleBlockMEs), names(blockwiseMEs))signif(diag(cor(blockwiseMEs[, single2blockwise], singleBlockMEs)), 3)## MEblack MEblue MEbrown MEcyan MEgreen ## 0.999 1.000 0.987 0.999 0.998 ## MEgreenyellow MEgrey MEgrey60 MElightcyan MElightgreen ## 1.000 0.963 0.999 0.993 1.000 ## MEmagenta MEmidnightblue MEpink MEpurple MEred ## 1.000 0.999 0.971 0.998 0.973 ## MEsalmon MEtan MEturquoise MEyellow ## 0.998 0.991 0.993 0.995# 这样我们大数据集的WGCNA网络构建也完成了
好了,今天的内容到此结束!今天我们主要对WGCNA的网络构建的步骤进行了个性化设置,进行了逐步法构建网络,逐步法的优势就是自己可以不断进行调整,使得识别的网络更适合我们自己的数据集,此外我们也展示了对于大数据集来说,如何按照不同block来识别网络,进行得到不同模块。
学完这一期,我想,对于那些你之前进行了WGCNA分析之后发现结果不好的数据集,可以考虑一下重新拿起来进行分析了。
后台回复“feng 62”获取代码和数据,我们下期见!
单细胞分析专栏传送门
碎碎念专栏传送门(完结)
风之美图系列传送门(完结)
END

撰文丨风   风
排版丨四金兄
主编丨小雪球
欢迎大家关注解螺旋生信频道-挑圈联靠公号~

继续阅读
阅读原文