Afghanistan's Ethnic Groups Share a Y-Chromosomal Heritage Structured by Historical Events
Haber M, Platt DE, Ashrafian Bonab M, Youhanna SC, Soria-Hernanz DF, et al. (2012) Afghanistan's Ethnic Groups Share a Y-Chromosomal Heritage Structured by Historical Events. PLoS ONE 7(3): e34288. doi:10.1371/journal.pone.0034288
"Afghanistan has held a strategic position throughout history. It has been inhabited since the Paleolithic and later became a crossroad for expanding civilizations and empires. Afghanistan's location, history, and diverse ethnic groups present a unique opportunity to explore how nations and ethnic groups emerged, and how major cultural evolutions and technological developments in human history have influenced modern population structures. In this study we have analyzed, for the first time, the four major ethnic groups in present-day Afghanistan: Hazara, Pashtun, Tajik, and Uzbek, using 52 binary markers and 19 short tandem repeats on the non-recombinant segment of the Y-chromosome. A total of 204 Afghan samples were investigated along with more than 8,500 samples from surrounding populations important to Afghanistan's history through migrations and conquests, including Iranians, Greeks, Indians, Middle Easterners, East Europeans, and East Asians. Our results suggest that all current Afghans largely share a heritage derived from a common unstructured ancestral population that could have emerged during the Neolithic revolution and the formation of the first farming communities. Our results also indicate that inter-Afghan differentiation started during the Bronze Age, probably driven by the formation of the first civilizations in the region. Later migrations and invasions into the region have been assimilated differentially among the ethnic groups, increasing inter-population genetic differences, and giving the Afghans a unique genetic diversity in Central Asia."
Tabulated Y-DNA haplogroup frequencies for the 204 individuals sampled, broken down by ethno-linguistic affiliation (using ISOGG 2011 nomenclature), can be found in the Data Sink.
Results (sample sizes of only ~50 per population)
- Haplogroup B-M60, a lineage normally expected among African populations, makes a surprising appearance among the Afghan Hazara. A superficial STR comparison (a 17/19 match between all haplotypes) suggests a recent common paternal ancestor, although the timeframe and ultimate origin of this ancestor remain open questions.
- Haplogroup C3-M217 has consistently been associated with the expansion of Altaic/Mongolic steppe populations since medieval times. Its greater frequency among the Hazara (33.9%) relative to the Tajiks and Pashtuns appears to support this, as well as the commonly held belief that the Hazara partially descend from Mongolian tribes.
- The Hazara E1b1b1c1-M34 men also stem from a common ancestor (all three share the exact same 19-STR haplotype).
- The single man belonging to Haplogroup G1-M285 is of Tajik descent. It is possible that his paternal line arrived with eastward-migrating Persians following the Sassanid collapse in 651 A.D.
- As shown in previous studies, the Pashtun Haplogroup G men are again G2c-M377 (entirely so this time, in contrast with Lacau et al.).
- Paragroups H*-M69, J2a*-M410, Q*-M242 and R*-M207 all suggest that Afghanistan played an important role in the demic development of their downstream subclades, or was at the very least a geographic nexus for them. It is worth noting that the Hazara Q* men carry a different haplotype from their Pashtun and Tajik compatriots, again indicating that genetic drift has taken place since the formation of the Hazara ethnic group (or, alternatively, paternal continuity through the presumed Mongolic layer that eventually formed the modern Hazara).
- In previous studies (Sengupta et al., Lacau et al.), several haplotypes were assigned to Haplogroup I without backbone SNP testing; Haplogroup I is frequently considered a lineage specific to Europe. For the first time we have SNP-based evidence of an I clade (I2b1-M223) in South-Central Asia, specifically among the Hazara and Tajik. The following is from a recent exchange with Professor Ken Nordtvedt regarding the I2b1-M223 samples:
"The two Hazara seem related. Both haplotypes look like M223+, with the Tajik one like Continental2 characteristic of central Europe.
The Hazara haplotype looks more like M223+ Roots. But both have some problems with being considered close matches to European haplotypes.
I don’t think such tmrcas would be worth much. I still don’t have a firm subclade of M223 to work with for either haplotype."
Due to the limited number of STRs, it is not possible to place these I2b1 haplotypes cleanly into any of the existing clusters/subclades. However, Haplogroup I2b itself appears to be thousands of years old (Nordtvedt's I tree, final page), so a branch could have diverged long before any recent contact with Europe. This opens up the possibility of an endogenous form of Haplogroup I in South-Central Asia.
- It was postulated that a single Tajik belonging to J1c3-P58 could be of Arabian origin. As the (minuscule) sample of Afghan Arabs did not yield any J1c3, other possibilities should be considered, such as contact with the Iranian plateau over the past few millennia.
- The Tajiks were the only population to show all major subclades within Haplogroup L (L1a-M27, L1b-M317, L1c-M357). Consistent with the greater frequency of L1c-M357 among Pashtuns than among Tajiks and Hazaras, several Pashtun L1c-M357 samples share closely matching haplotypes (identical or differing by up to two mutational steps), suggesting another example of genetic drift.
- The Laghman Pashtuns share a similar L1c-M357 haplotype (16-17/19 matches), and so does the sole Tajik L1c from the same location, providing genetic evidence of recent shared origins between Pashtuns and Tajiks in certain parts of Afghanistan.
- The Tajik population is more paternally diverse than all others sampled. Possible explanations include a less endogamous cultural character, or the relatively recent imposition of the "Tajik" identity, which arrived with the medieval Turks.
- R1b1a*-P297 (xM269) and R1b1a2*-M269 (xU106) both appear in Uzbek and Tajik populations. The two R1b1a*-P297 haplotypes are identical and belong to a Tajik and an Uzbek, again showing some recent paternal overlap between Central Asian ethnic groups. I found that this haplotype does not generally correspond to any of the established clusters in the R1b1a1-M73 Project, although there is a 13/15 match with a Tajik from Cluster B1. Although the limited number of STRs is a drawback, I am of the opinion that the match is substantial and that the R1b1a*-P297 reported in this study is in fact R1b1a1-M73 belonging to Cluster B1, whose membership also includes other Tajiks, Uzbeks and an Anatolian Turk.
- It is very interesting to note that all the locations showing R1b1a*-P297 (xM269) and R1b1a2*-M269 (xU106) (Badakhshan, Herat, Takhar and Mazar-e-Sharif) lie along an east-west band across the north of Afghanistan, particularly as the Bactria-Margiana Archaeological Complex (BMAC) was situated here.
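The STR comparisons quoted above (17/19, 16-17/19, 13/15, "exact-to-2-step") rest on two simple measures: the number of loci at which two haplotypes carry the same repeat count, and the total number of single-repeat steps separating them under a simple stepwise mutation model. The sketch below computes both, along with Nei's unbiased gene diversity, one standard way to put a number on a claim such as "the Tajiks are more paternally diverse". It is a minimal illustration, not the method used by Haber et al.; the locus names are standard Y-STRs, but the allele values and haplogroup labels are hypothetical placeholders, and multi-copy loci such as DYS385a/b are ignored.

```python
from collections import Counter

# A haplotype maps a Y-STR locus name to its repeat count.
Haplotype = dict[str, int]


def compare_haplotypes(a: Haplotype, b: Haplotype) -> tuple[int, int, int]:
    """Return (matching loci, loci compared, total stepwise distance).

    Only loci typed in both haplotypes are compared. The stepwise distance
    sums |repeat difference| over those loci, assuming each repeat gained
    or lost counts as one mutational step.
    """
    shared = sorted(set(a) & set(b))
    matches = sum(1 for locus in shared if a[locus] == b[locus])
    steps = sum(abs(a[locus] - b[locus]) for locus in shared)
    return matches, len(shared), steps


def nei_diversity(labels: list[str]) -> float:
    """Nei's unbiased gene diversity: n/(n-1) * (1 - sum of p_i^2)."""
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two samples")
    counts = Counter(labels)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


if __name__ == "__main__":
    # Hypothetical five-locus haplotypes; NOT data from Haber et al. (2012).
    hap_x = {"DYS19": 14, "DYS390": 23, "DYS391": 10, "DYS392": 11, "DYS393": 13}
    hap_y = {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS392": 11, "DYS393": 14}
    m, k, s = compare_haplotypes(hap_x, hap_y)
    print(f"{m}/{k} match, {s}-step difference")  # 3/5 match, 2-step difference

    # Hypothetical haplogroup labels for two samples of ten men each.
    low_diversity = ["L1c"] * 6 + ["R1a1a"] * 4
    high_diversity = ["L1a", "L1b", "L1c", "R1a1a", "J2a",
                      "G2c", "Q", "C3", "R2a", "E1b1b"]
    print(round(nei_diversity(low_diversity), 3),
          round(nei_diversity(high_diversity), 3))
```

A matching count such as 17/19 says nothing about how far apart the mismatching loci are, which is why the stepwise distance is worth reporting alongside it. The n/(n-1) correction in the diversity estimate also matters most for small samples like the ones discussed here.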
Criticisms of Paper
- Haplogroup R2a-M124 has been erroneously associated with aboriginal populations of the Subcontinent, when results from the R2 WTY Project indicate that places like India are a "sink" rather than a "source" (most Indian R2a is R2a1-L295, which has a patchy distribution across the rest of Eurasia).
- Haplogroup L is, much like R2a, an understudied lineage, presumably due to its paucity in Europe. The once-common assumption in the population genetics and genealogical world that the frequency of a given lineage in a region or population signifies its antiquity there has been repeatedly disproven through STR and SNP analysis. Haplogroup L may reach greater frequencies in India according to the sources at the authors' disposal, but the presence of different L subclades in Central and West Asia should at least have prompted them to investigate the lineage's deeper structure rather than rely on a population genetics tagline dating back to at least 2006 (Sengupta et al.).
- Despite the recent boom in research on the structure of Haplogroup R1a1a-M17 by independent genetic genealogists and projects (such as the R1a1a and Subclades Y-DNA Project), Haber et al. failed to include any of the pivotal SNPs discovered since Underhill et al. (2009), preventing observers from drawing any meaningful conclusions from the current findings, particularly in the context of the Indo-European migrations (generally accepted as originating on the Eurasian steppes).
- Divided along ethno-linguistic lines, the study's sample comprises 3 Arabs, 13 Balochis, 59 Hazaras, 5 Nurestanis, 49 Pashtuns, 56 Tajiks, 1 Turkmen and 17 Uzbeks. The most immediate criticism is the inadequate sampling of the Arabs, Nurestanis, Balochis, Turkmens and Uzbeks in particular.
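As a quick arithmetic check on the breakdown above, the per-group counts can be tallied and expressed as shares of the total. The counts are those listed in this post, not re-derived from the paper's supplementary data. They sum to 203, one fewer than the 204 samples stated in the abstract, so one individual may be missing from the breakdown (an assumption; the supplementary tables would settle it). The last line converts the 33.9% C3-M217 frequency reported for the Hazara back into an approximate head count.

```python
# Per-group sample counts as listed in this post (assumed, not re-checked
# against the paper's supplementary data).
SAMPLES = {
    "Arab": 3, "Balochi": 13, "Hazara": 59, "Nurestani": 5,
    "Pashtun": 49, "Tajik": 56, "Turkmen": 1, "Uzbek": 17,
}


def tally(counts: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Return the total sample size and each group's percentage share."""
    total = sum(counts.values())
    shares = {group: 100 * n / total for group, n in counts.items()}
    return total, shares


if __name__ == "__main__":
    total, shares = tally(SAMPLES)
    print(total)  # 203
    # Largest groups first.
    for group, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
        print(f"{group:<10} {SAMPLES[group]:>3}  {pct:5.1f}%")
    # 33.9% C3-M217 among 59 Hazaras corresponds to about 20 men.
    print(round(0.339 * 59))  # 20
```

Five of the eight groups fall below 20 individuals, which is the quantitative core of the sampling criticism above.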
Despite several glaring flaws in methodology, Haber et al. have provided us with a much-needed insight into the deeper structure of Afghanistan's Y-chromosome diversity. There is clear evidence of genetic drift (particularly among the Pashtun Q*-M242/L1c-M357 or the Hazara C3-M217), as well as evidence of recent lineage sharing between populations (the case of L1c-M357 in Laghman).
Haber et al. have also turned up some very interesting surprises (T1-M70 among Tajiks only) and validated results from earlier studies that had been questioned (I2b1-M223 and R1b1a2-M269 in particular). How did these lineages arrive in Central Asia? Is recent colonial admixture a possibility? For the time being, these questions remain open.
Addendum I [30/03/2012]: Further analysis of whether R1b1a*-P297 is in fact R1b1a1-M73.
Addendum II [30/03/2012]: Insertion of Nordtvedt correspondence.