Saturday, July 13, 2013

A Hidden Gem in Central Asia: Previously Unknown Y-DNA R1b Haplotype [Original Work]

1. Introduction

Central Asian Y-DNA diversity has been an area of constant intrigue in the genetics community. Wells et al.'s The Eurasian Heartland: A continental perspective on Y-chromosome diversity paved the way, with several others following in its wake. Members of the same team (including Dr. Wells) produced another paper on the same topic the following year, A Genetic Landscape Reshaped by Recent Events: Y-Chromosomal Insights into Central Asia, this time led by Dr. Tatiana Zerjal. I noted a greater emphasis on East-Central Asian populations, as well as mention of Y-STR analysis in the study itself. However, none of the Y-STR data was published; only the Y-SNP information was included (reproduced at points in this entry). The age of this paper is apparent from the nomenclature used (see Method section).

Several months ago, I asked one of the co-authors, Dr. Tyler-Smith, for the Y-STR data from this study, and he kindly replied with the results for all sampled populations (Data Sink > Zerjal et al. Raw Data).

In this blog entry, the Y-STR data is showcased with special emphasis on the Y-DNA R1b-M269 found among the samples.

2. Method
Y-SNP Phylogeny in original paper (Zerjal et al.) [1]

The maximum number of compatible Y-STRs (14 of a possible 16; DYS434 and DYS435 were excluded) was processed with Urasin's YPredictor for easier haplogroup identification. All data was run through YPredictor, and only samples predicted with ≥70% probability were included in the final results (Data Sink > Processed Data). As discussed below, relevant findings are compared with the basic Y-SNP haplogroups shown in the original study (figure on the right).
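The ≥70% cut-off amounts to a simple filter over the predictions. Below is a minimal Python sketch, assuming the YPredictor output has been transcribed by hand into (sample, haplogroup, probability) rows. It does not call YPredictor, and the `Prediction` type, the `keep_confident` function and all sample values are hypothetical.

```python
from typing import NamedTuple


class Prediction(NamedTuple):
    """One YPredictor result, transcribed by hand (hypothetical layout)."""
    sample: str
    haplogroup: str
    probability: float  # expressed as a fraction, 0.0 to 1.0


MIN_PROBABILITY = 0.70  # the >=70% cut-off used in this entry


def keep_confident(predictions: list[Prediction],
                   threshold: float = MIN_PROBABILITY) -> list[Prediction]:
    """Keep only predictions at or above the probability threshold."""
    return [p for p in predictions if p.probability >= threshold]


# Illustrative rows only; these are not samples from Zerjal et al.
example = [
    Prediction("sample_a", "R1b-M269", 0.93),
    Prediction("sample_b", "J2a-M410", 0.70),
    Prediction("sample_c", "G2a-P15", 0.55),
]
confident = keep_confident(example)
print([p.sample for p in confident])  # ['sample_a', 'sample_b']
```

A sample sitting exactly on 70% is kept, matching the "≥" in the text.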

One point needs to be addressed immediately: the high frequency of "_DE-M1" and "P-M45" predictions. It appears that the STR selection has produced phantom results, rendering many of the samples useless. For instance, the original study shows that the Kazakhs belong overwhelmingly to C3c-M48, [1] yet most of the probable results here are "_DE-M1". The exclusion of DYS434 and DYS435 from my processing likely contributed to this: if each marker is assumed to carry equal weight in a prediction, removing two STRs from a 16-marker panel discards 12.5% of the information available to YPredictor. Additionally, DYS437 appears to have been affected by a conversion error (values below 12 are unusual for this marker). Therefore, all "_DE-M1" and "P-M45" results were dismissed: they conflict with the SNP-based haplogroups reported in the study, most likely because of a compatibility issue between the study's STR panel and YPredictor.
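The dismissal step, the DYS437 sanity check and the 12.5% figure can be sketched the same way. This assumes the same kind of hypothetical (sample, haplogroup, probability) rows plus a hypothetical per-sample dictionary of marker values; the function names are illustrative, and the 12.5% holds only under the equal-weight assumption stated above.

```python
# Labels dismissed in this entry as artefacts of the reduced STR panel.
DISMISSED_LABELS = {"_DE-M1", "P-M45"}


def dismiss_phantoms(rows):
    """Drop (sample, haplogroup, probability) rows carrying a dismissed label."""
    return [row for row in rows if row[1] not in DISMISSED_LABELS]


def flag_low_dys437(haplotypes, minimum=12):
    """Return sample IDs whose DYS437 value is below `minimum`.

    Samples with no DYS437 value are not flagged.
    """
    return [sample for sample, markers in haplotypes.items()
            if "DYS437" in markers and markers["DYS437"] < minimum]


# Share of the 16-marker panel lost by excluding DYS434 and DYS435,
# assuming every marker carries equal weight.
panel_size, excluded = 16, 2
print(f"{excluded / panel_size:.1%}")  # 12.5%

# Hypothetical rows and haplotypes for illustration only.
rows = [("s1", "C3c-M48", 0.81), ("s2", "_DE-M1", 0.88), ("s3", "P-M45", 0.75)]
print(dismiss_phantoms(rows))  # [('s1', 'C3c-M48', 0.81)]
print(flag_low_dys437({"s1": {"DYS437": 11}, "s2": {"DYS437": 15}}))  # ['s1']
```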

3. Results

Because most samples were removed owing to the caveat described above, this entry takes a qualitative rather than quantitative approach to the general picture. Most of the remaining results agree with findings in other papers. Populations around the Caucasus show plenty of R1b-M269, J2a-M410 and G2a-P15. Tajiks and Kyrgyz were predominantly R1a1a-M17. Mongolians and other East-Central Asian ethnic groups yielded the most O3-M122 and "NO-M14" (likely Y-DNA N or O that could not be resolved further because of the STR restrictions described in the Method section).

Y-SNP distribution in Central Asia (Zerjal et al.) [1]

3i. The R1b Signal 

R1b-M269 was found not only in the Caucasus (Armenians, Azeris, Georgians, Ossetians) but also in Central Asia. There it was mostly detected among the Turkmen (trk1, trk2, trk4, trk6, trk7, trk22, T29, T32), with a single sample among the Uzbeks (uz-s110). [1]

Analysis of the haplotypes (including DYS434 and DYS435) revealed that the nine Central Asian R1b samples belong to a single, tightly defined haplotype (Data Sink > R1b Results). trk6 diverges the most, yet differs by only two 1-step mutations, on DYS393 and DYS434. The rest match this haplotype exactly or differ by a single 1-step mutation. [1] When this Central Asian R1b haplotype is compared with the Caucasian samples, a mixed picture emerges: the poorest match is an Armenian (arm47) at 8/16 markers, whereas the best are another Armenian (arm12) and an Azeri (az48), both at 15/16. [1]
One interesting point is that the Kurds sampled in this study (some of whom also belong to R1b-M269) are actually a displaced population settled on the Iranian-Turkmenistani border. The R1b Kurds all match the Central Asian R1b haplotype at a similar level (12-13/16). This makes the Kurds an unlikely source for the haplotype, particularly as better matches can be found further to the west. It should be noted that the Kurds form their own R1b haplotype (defined here by DYS389II=27, DYS391=10). [1]
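The x/16 scores and the 1-step mutation counts above come from a marker-by-marker comparison against the modal haplotype. A minimal Python sketch, assuming each haplotype is stored as a dictionary mapping marker name to repeat count; the function names and all repeat values are placeholders, not data from Zerjal et al.

```python
def match_score(a: dict[str, int], b: dict[str, int]) -> tuple[int, int]:
    """Return (identical markers, markers compared) over the markers both share."""
    shared = a.keys() & b.keys()
    identical = sum(1 for m in shared if a[m] == b[m])
    return identical, len(shared)


def differences(a: dict[str, int], b: dict[str, int]) -> dict[str, int]:
    """Map each shared marker that differs to its repeat difference (b minus a)."""
    return {m: b[m] - a[m] for m in sorted(a.keys() & b.keys()) if a[m] != b[m]}


# Placeholder repeat counts, not the study's data.
modal = {"DYS390": 24, "DYS391": 11, "DYS393": 13, "DYS434": 11, "DYS435": 11}
sample = {"DYS390": 24, "DYS391": 11, "DYS393": 14, "DYS434": 12, "DYS435": 11}

identical, compared = match_score(modal, sample)
print(f"{identical}/{compared}")   # 3/5
print(differences(modal, sample))  # {'DYS393': 1, 'DYS434': 1}
```

Here the placeholder sample differs by two 1-step mutations, the same pattern described for trk6, only on a shortened five-marker panel.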

In summary, the data reveal that R1b-M269 is particularly abundant among the Turkmen, and every Turkmen R1b sample belongs to the same haplotype as the single Uzbek R1b sample. This haplotype matches some Caucasian samples very well but others less so. The Kurds living in Turkmenistan belong to their own haplotype.

3ii. Is This Actually R1b-M269?

Attention must first return to the original paper: any R1b-M269 there would be reported as P(xR1a)-92R7 (shown in the paper as "Haplogroup 1"). [1] This lineage makes up approximately half of the Turkmen lines and a quarter of the Uzbek ones. Other haplogroups (such as other forms of R1b, R2a-M124 and various Q subclades) presumably make up the rest of the "Haplogroup 1" shown.

The next step is to verify whether this Central Asian R1b haplotype matches other R1b haplotypes online. As Y-DNA R1b-M269 is fortunately well represented in the world of genetic genealogy, searching for the haplotype's matches on ySearch is a reasonable approach. DYS437 had to be excluded here because of the conversion issue, leaving the haplotype at 15 STRs, and a genetic distance (GD) of up to 3 was allowed across these 15 markers. Results are shown on the right.
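The ySearch criterion can be approximated with a simple stepwise genetic distance: the sum of absolute repeat differences over the shared markers, with DYS437 left out. This is a sketch under that assumption only; ySearch's own counting rules may differ (for example, for multi-copy markers), and the function names, entries and values below are hypothetical.

```python
def stepwise_gd(a: dict[str, int], b: dict[str, int],
                exclude: frozenset[str] = frozenset()) -> int:
    """Sum of absolute repeat differences over shared, non-excluded markers."""
    shared = (a.keys() & b.keys()) - exclude
    return sum(abs(a[m] - b[m]) for m in shared)


def matches_within(query: dict[str, int],
                   candidates: dict[str, dict[str, int]],
                   max_gd: int = 3,
                   exclude: frozenset[str] = frozenset({"DYS437"})) -> dict[str, int]:
    """Return {name: GD} for candidates within `max_gd` of the query."""
    hits = {}
    for name, haplotype in candidates.items():
        gd = stepwise_gd(query, haplotype, exclude)
        if gd <= max_gd:
            hits[name] = gd
    return hits


# Hypothetical query and database entries, for illustration only.
query = {"DYS390": 24, "DYS391": 11, "DYS393": 13, "DYS437": 11}
candidates = {
    "entry_1": {"DYS390": 24, "DYS391": 11, "DYS393": 13, "DYS437": 15},
    "entry_2": {"DYS390": 23, "DYS391": 10, "DYS393": 14, "DYS437": 15},
    "entry_3": {"DYS390": 25, "DYS391": 11, "DYS393": 13, "DYS437": 15},
    "entry_4": {"DYS390": 22, "DYS391": 10, "DYS393": 14, "DYS437": 15},
}
print(matches_within(query, candidates))  # {'entry_1': 0, 'entry_2': 3, 'entry_3': 1}
print(stepwise_gd(query, candidates["entry_1"]))  # 4, with DYS437 included
```

Leaving DYS437 in would push `entry_1` from 0 to 4, past the threshold of 3, which shows how a single mis-converted marker can hide an otherwise exact match.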

ySearch results for Central Asian R1b haplotype
The search demonstrates with some confidence that the Central Asian R1b haplotype does indeed belong to R1b-M269, as all seven matches shown (one of them Armenian) belong to it.

Expanding the line of inquiry one step further, the haplotype was compared with readily available Iranian haplotypes. [2] Because the STR panels differ (only 11 markers overlap), this proved inconclusive, aside from the observation that DYS389i+ii was completely different between the Central Asian modal (10-26) and the Iranian values. I suspect that, much like DYS437, DYS389 also suffers from a conversion issue.

Finally, a comparison was made with the R1b found in Afghanistan last year [3]. Interestingly, if DYS389i+ii and DYS437 are excluded, the two Uzbeks (samples 35 and 181) match the Central Asian R1b haplotype almost exactly on the remaining 11 STRs. The single Tajik (sample 32) is less likely to be related, as he differs by 1-step mutations on two different STRs.
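Both cross-study comparisons reduce to the same two steps: keep only the markers typed in both panels, drop those suspected of conversion problems, then count the markers at which the haplotypes differ. A minimal Python sketch, with hypothetical function names, panels and repeat values standing in for the published data:

```python
def comparable_markers(panel_a: set[str], panel_b: set[str],
                       suspect: set[str]) -> list[str]:
    """Markers typed in both panels, minus any suspected of conversion problems."""
    return sorted((panel_a & panel_b) - suspect)


def differing_markers(a: dict[str, int], b: dict[str, int],
                      markers: list[str]) -> list[str]:
    """Markers (from `markers`) at which the two haplotypes differ."""
    return [m for m in markers if a[m] != b[m]]


# Hypothetical panels and values, for illustration only.
modal = {"DYS19": 14, "DYS389I": 10, "DYS389II": 26, "DYS390": 24,
         "DYS391": 11, "DYS393": 13, "DYS437": 11}
other = {"DYS19": 15, "DYS389I": 13, "DYS389II": 29, "DYS390": 24,
         "DYS391": 10, "DYS393": 13, "DYS437": 15, "DYS439": 12}

markers = comparable_markers(set(modal), set(other),
                             {"DYS389I", "DYS389II", "DYS437"})
print(markers)                                   # ['DYS19', 'DYS390', 'DYS391', 'DYS393']
print(differing_markers(modal, other, markers))  # ['DYS19', 'DYS391']
```

Excluding the suspect markers first keeps a conversion artefact from being counted as a real mutation; the cost is a shorter panel and therefore a weaker comparison.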

4. Conclusion

The inferences made from the data hang by a metaphorical thread because of the persistent STR issue: different labs have used different panels over the past decade, making it excruciatingly difficult to reuse data from older papers. Fortunately, the presence of a specific strain of R1b-M269 in Central Asia (among Turkmen and Uzbeks) has been demonstrated using only selective marker exclusions, with no modifications to the data.

However, some larger questions remain. If STR limitations were not an issue, how would the Iranians from Haber et al. [2] have compared? Would the Tajik from the other Haber et al. paper [3] have belonged to the same haplotype after all?

The origin of this Central Asian R1b haplotype will, I anticipate, also be discussed heavily among interested parties. At this point, I must stress that the evidence so far does not favour any particular origin over the alternatives; it leaves the door to interpretation wide open.

With that caution given, the main thrust of this entry bears emphasising: R1b-M269 in Central Asia is a confirmed reality and here to stay. I will defer any further analysis to the Y-DNA R1b experts who grace several genetic genealogy boards, for their take on the flavour of this haplotype.

5. Acknowledgement

I publicly extend my gratitude to Dr. Tyler-Smith for kindly sending me the raw STR data from this important paper for my research, and for co-authoring the other two excellent studies I have cited here and in the past.

6. References

1. Zerjal T, Wells RS, Yuldasheva N, Ruzibakiev R, Tyler-Smith C. A genetic landscape reshaped by recent events: Y-chromosomal insights into central Asia. Am J Hum Genet. 2002 Sep;71(3):466-82. Epub 2002 Jul 17.

2. Haber M, Platt DE, Badro DA, Xue Y, El-Sibai M, Bonab MA, et al. Influences of history, geography, and religion on genetic structure: the Maronites in Lebanon. Eur J Hum Genet. 2011 Mar;19(3):334-40. doi: 10.1038/ejhg.2010.177. Epub 2010 Dec 1.

3. Haber M, Platt DE, Ashrafian Bonab M, Youhanna SC, Soria-Hernanz DF, Martínez-Cruz B, et al. Afghanistan's ethnic groups share a Y-chromosomal heritage structured by historical events. PLoS One. 2012;7(3):e34288. doi: 10.1371/journal.pone.0034288. Epub 2012 Mar 28.