群体遗传进化
简介
群体结构又称为群体分层,指所研究的群体中存在基因频率不同的亚群。其基本原理是将群体分成k个服从哈迪温伯格平衡的亚群,将每个材料归到各个亚群并计算每个材料基因组变异源于第k个亚群的可能性,主要是利用Q矩阵进行衡量,一般来说Q值越大,表明每个材料来自于某个亚群的可能性越大。群体结构分析的软件有很多,比如:STRUCTURE
、ADMIXTURE
、FASTSTRUCTURE
、TESS
、BAPS
等。这里不介绍如何使用这些软件进行群体结构分析,下次有时间再进行介绍。本文主要介绍利用R包pophelper
对这些软件生成的数据进行展示。具体情况可以参考pophelper
的
vignette
安装
安装过程不多说
install.packages(c("Cairo","devtools","ggplot2","gridExtra","gtable","tidyr"),dependencies=T)
devtools::install_github('royfrancis/pophelper')
流程
数据读取
使用pophelper
提供的数据,这里只使用structure软件生成的格式文件
library(pophelper)
sfiles <- list.files(path=system.file("files/structure",package = "pophelper"), full.names = T)
sfiles
## [1] "D:/R-3.5.0/library/pophelper/files/structure/structure_01"
## [2] "D:/R-3.5.0/library/pophelper/files/structure/structure_02"
## [3] "D:/R-3.5.0/library/pophelper/files/structure/structure_03"
## [4] "D:/R-3.5.0/library/pophelper/files/structure/structure_04"
## [5] "D:/R-3.5.0/library/pophelper/files/structure/structure_05"
## [6] "D:/R-3.5.0/library/pophelper/files/structure/structure_06"
## [7] "D:/R-3.5.0/library/pophelper/files/structure/structure_07"
## [8] "D:/R-3.5.0/library/pophelper/files/structure/structure_08"
## [9] "D:/R-3.5.0/library/pophelper/files/structure/structure_09"
## [10] "D:/R-3.5.0/library/pophelper/files/structure/structure_10"
## [11] "D:/R-3.5.0/library/pophelper/files/structure/structure_11"
## [12] "D:/R-3.5.0/library/pophelper/files/structure/structure_12"
## [13] "D:/R-3.5.0/library/pophelper/files/structure/structure_13"
## [14] "D:/R-3.5.0/library/pophelper/files/structure/structure_14"
## [15] "D:/R-3.5.0/library/pophelper/files/structure/structure_15"
## [16] "D:/R-3.5.0/library/pophelper/files/structure/structure_16"
## [17] "D:/R-3.5.0/library/pophelper/files/structure/structure_17"
可以看到总共有17个structure文件
slist <- readQ(files = sfiles,indlabfromfile = T, filetype = "structure")
tr1 <- tabulateQ(qlist = slist)
summariseQ(tr1, writetable = TRUE)
## loci ind k runs elpdmean elpdsd elpdmin elpdmax
## 1 25 149 2 3 -7509.367 0.8082904 -7510.1 -7508.5
## 2 25 149 3 3 -7476.000 0.4358899 -7476.5 -7475.7
## 3 25 149 4 3 -7650.800 46.1418465 -7687.5 -7599.0
## 4 25 149 5 3 -7743.567 74.0671542 -7828.5 -7692.4
## 5 25 149 6 3 -7985.833 32.6353081 -8023.3 -7963.6
## 6 25 149 7 2 -8614.000 60.5283405 -8656.8 -8571.2
summariseQ
会给出一些信息,writetable = TRUE
将文件写入到工作文件夹中
估计最佳分层数
函数evannoMethodStructure
用来估计最佳分层数k,使用的方法是Evanno method,这一函数只适应于STRUCTURE软件运行的结果
sr1 <- summariseQ(tr1)
p <- evannoMethodStructure(data = sr1, exportplot = F, returnplot = T, returndata = F, basesize = 12, linesize = 0.7)
gridExtra::grid.arrange(p)
一般我们是通过看ΔK来确定最佳分群,这里可以看到是k=3
pophelper提供了一个封装函数,可以一次性出来所有结果
analyseQ(sfiles)
可以挑出k=3时的structure图
美化
pophelper还提供了一个绘图函数plotQ
,里面内置了大量图形参数设置用来美化图片,这里就不一一展示了,有兴趣的可以去实践一下
SessionInfo
sessionInfo()
## R version 3.5.0 (2018-04-23)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 16299)
##
## Matrix products: default
##
## locale:
## [1] LC_COLLATE=Chinese (Simplified)_China.936
## [2] LC_CTYPE=Chinese (Simplified)_China.936
## [3] LC_MONETARY=Chinese (Simplified)_China.936
## [4] LC_NUMERIC=C
## [5] LC_TIME=Chinese (Simplified)_China.936
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] pophelper_2.2.5.1 ggplot2_2.2.1 Cairo_1.5-9
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.16 knitr_1.20 magrittr_1.5 tidyselect_0.2.4
## [5] munsell_0.4.3 colorspace_1.3-2 rlang_0.2.0 stringr_1.3.0
## [9] plyr_1.8.4 tools_3.5.0 grid_3.5.0 gtable_0.2.0
## [13] htmltools_0.3.6 yaml_2.1.19 lazyeval_0.2.1 rprojroot_1.3-2
## [17] digest_0.6.15 tibble_1.4.2 gridExtra_2.3 purrr_0.2.4
## [21] tidyr_0.8.0 glue_1.2.0 evaluate_0.10.1 rmarkdown_1.9
## [25] labeling_0.3 stringi_1.1.7 compiler_3.5.0 pillar_1.2.2
## [29] scales_0.5.0 backports_1.1.2