群体遗传进化

Apr 24, 2018 2 min read population genetics

mark

简介

群体结构又称为群体分层，指所研究的群体中存在基因频率不同的亚群。其基本原理是将群体分成k个服从哈迪温伯格平衡的亚群，将每个材料归到各个亚群并计算每个材料基因组变异源于第k个亚群的可能性，主要是利用Q矩阵进行衡量，一般来说Q值越大，表明每个材料来自于某个亚群的可能性越大。群体结构分析的软件有很多，比如：STRUCTURE、ADMIXTURE、FASTSTRUCTURE、TESS、BAPS等。这里不介绍如何使用这些软件进行群体结构分析，下次有时间再进行介绍。本文主要介绍利用R包pophelper对这些软件生成的数据进行展示。具体情况可以参考pophelper的 vignette

安装

安装过程不多说

install.packages(c("Cairo","devtools","ggplot2","gridExtra","gtable","tidyr"),dependencies=T)
devtools::install_github('royfrancis/pophelper')

流程

workflow

数据读取

使用pophelper提供的数据,这里只使用structure软件生成的格式文件

library(pophelper)
sfiles <- list.files(path=system.file("files/structure",package = "pophelper"), full.names = T)
sfiles

##  [1] "D:/R-3.5.0/library/pophelper/files/structure/structure_01"
##  [2] "D:/R-3.5.0/library/pophelper/files/structure/structure_02"
##  [3] "D:/R-3.5.0/library/pophelper/files/structure/structure_03"
##  [4] "D:/R-3.5.0/library/pophelper/files/structure/structure_04"
##  [5] "D:/R-3.5.0/library/pophelper/files/structure/structure_05"
##  [6] "D:/R-3.5.0/library/pophelper/files/structure/structure_06"
##  [7] "D:/R-3.5.0/library/pophelper/files/structure/structure_07"
##  [8] "D:/R-3.5.0/library/pophelper/files/structure/structure_08"
##  [9] "D:/R-3.5.0/library/pophelper/files/structure/structure_09"
## [10] "D:/R-3.5.0/library/pophelper/files/structure/structure_10"
## [11] "D:/R-3.5.0/library/pophelper/files/structure/structure_11"
## [12] "D:/R-3.5.0/library/pophelper/files/structure/structure_12"
## [13] "D:/R-3.5.0/library/pophelper/files/structure/structure_13"
## [14] "D:/R-3.5.0/library/pophelper/files/structure/structure_14"
## [15] "D:/R-3.5.0/library/pophelper/files/structure/structure_15"
## [16] "D:/R-3.5.0/library/pophelper/files/structure/structure_16"
## [17] "D:/R-3.5.0/library/pophelper/files/structure/structure_17"

可以看到总共有17个structure文件

slist <- readQ(files = sfiles,indlabfromfile = T, filetype = "structure")
tr1 <- tabulateQ(qlist = slist)
summariseQ(tr1, writetable = TRUE)

##   loci ind k runs  elpdmean     elpdsd elpdmin elpdmax
## 1   25 149 2    3 -7509.367  0.8082904 -7510.1 -7508.5
## 2   25 149 3    3 -7476.000  0.4358899 -7476.5 -7475.7
## 3   25 149 4    3 -7650.800 46.1418465 -7687.5 -7599.0
## 4   25 149 5    3 -7743.567 74.0671542 -7828.5 -7692.4
## 5   25 149 6    3 -7985.833 32.6353081 -8023.3 -7963.6
## 6   25 149 7    2 -8614.000 60.5283405 -8656.8 -8571.2

summariseQ会给出一些信息，writetable = TRUE将文件写入到工作文件夹中

估计最佳分层数

函数evannoMethodStructure用来估计最佳分层数k,使用的方法是Evanno method，这一函数只适应于STRUCTURE软件运行的结果

sr1 <- summariseQ(tr1)
p <- evannoMethodStructure(data = sr1, exportplot = F, returnplot = T, returndata = F, basesize = 12, linesize = 0.7)
gridExtra::grid.arrange(p)

mark

一般我们是通过看ΔK来确定最佳分群，这里可以看到是k=3

pophelper提供了一个封装函数，可以一次性出来所有结果

analyseQ(sfiles)

可以挑出k=3时的structure图

mark

美化

pophelper还提供了一个绘图函数plotQ,里面内置了大量图形参数设置用来美化图片，这里就不一一展示了，有兴趣的可以去实践一下

mark

SessionInfo

sessionInfo()

## R version 3.5.0 (2018-04-23)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 16299)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=Chinese (Simplified)_China.936 
## [2] LC_CTYPE=Chinese (Simplified)_China.936   
## [3] LC_MONETARY=Chinese (Simplified)_China.936
## [4] LC_NUMERIC=C                              
## [5] LC_TIME=Chinese (Simplified)_China.936    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] pophelper_2.2.5.1 ggplot2_2.2.1     Cairo_1.5-9      
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.16     knitr_1.20       magrittr_1.5     tidyselect_0.2.4
##  [5] munsell_0.4.3    colorspace_1.3-2 rlang_0.2.0      stringr_1.3.0   
##  [9] plyr_1.8.4       tools_3.5.0      grid_3.5.0       gtable_0.2.0    
## [13] htmltools_0.3.6  yaml_2.1.19      lazyeval_0.2.1   rprojroot_1.3-2 
## [17] digest_0.6.15    tibble_1.4.2     gridExtra_2.3    purrr_0.2.4     
## [21] tidyr_0.8.0      glue_1.2.0       evaluate_0.10.1  rmarkdown_1.9   
## [25] labeling_0.3     stringi_1.1.7    compiler_3.5.0   pillar_1.2.2    
## [29] scales_0.5.0     backports_1.1.2

进化选择性清除分析

Researcher

I am a PhD student of Crop Genetics and Breeding at the Zhejiang University Crop Science Lab. My research interests covers a range of issues:Population Genetics Evolution and Ecotype Divergence Analysis of Oilseed Rape, Genome-wide Association Study (GWAS) of Agronomic Traits.