Thursday, July 19, 2012

Interpreting New Iranian Y-Chromosomal Data (Grugni et al.) [Review]


Introduction


A new study on Iranian Y-chromosomes, released just yesterday, has to my satisfaction adequately sampled every major ethno-linguistic group and determined the inter-provincial variation between them. Grugni et al. sampled 938 unrelated Iranian men from 15 ethnic groups (including Assyrians, Zoroastrians and Turkmen) in 14 provinces across the country.


Abstract

"Knowledge of high resolution Y-chromosome haplogroup diversification within Iran provides important geographic context regarding the spread and compartmentalization of male lineages in the Middle East and southwestern Asia. At present, the Iranian population is characterized by an extraordinary mix of different ethnic groups speaking a variety of Indo-Iranian, Semitic and Turkic languages. Despite these features, only few studies have investigated the multiethnic components of the Iranian gene pool. In this survey 938 Iranian male DNAs belonging to 15 ethnic groups from 14 Iranian provinces were analyzed for 84 Y-chromosome biallelic markers and 10 STRs. The results show an autochthonous but non-homogeneous ancient background mainly composed by J2a sub-clades with different external contributions. The phylogeography of the main haplogroups allowed identifying post-glacial and Neolithic expansions toward western Eurasia but also recent movements towards the Iranian region from western Eurasia (R1b-L23), Central Asia (Q-M25), Asia Minor (J2a-M92) and southern Mesopotamia (J1-Page08). In spite of the presence of important geographic barriers (Zagros and Alborz mountain ranges, and the Dasht-e Kavir and Dash-e Lut deserts) which may have limited gene flow, AMOVA analysis revealed that language, in addition to geography, has played an important role in shaping the nowadays Iranian gene pool. Overall, this study provides a portrait of the Y-chromosomal variation in Iran, useful for depicting a more comprehensive history of the peoples of this area as well as for reconstructing ancient migration routes. In addition, our results evidence the important role of the Iranian plateau as source and recipient of gene flow between culturally and genetically distinct populations."



Interpretation of Results

Iranian Y-SNP Frequencies

Data from the original study can be found opposite. In addition, several contour maps showing the frequency of select Y-DNA Haplogroups across the country are shown along the right. Armenians, Zoroastrians and Assyrians from Tehran, as well as Afro-Iranians from Hormozgan province, are excluded. Note that updated ISOGG nomenclature was applied wherever deemed appropriate (refer to the SNPs for clarification of status). Frequency ranges shown on the maps run from 0 to 100%. Please note that the maps are only intended to depict general trends rather than specific figures; for exact values, refer to the figures from the study (above).


- Consistent with anthropological data and historical records from South Iran, the Y-DNA Haplogroups with frequencies greater in Africa than Eurasia (B-M60 and E2-M75) peak in Hormozgan province. 

- Nine para-Haplogroups (C*-M216, F*-M89, H*-M69, IJ*-M429, J2*-M172, L*-M61, NO*-LLY22g, Q1*-P36.2 and R*-M207) were found scattered across Iran. Although the presence of para-Haplogroups within a region is often taken as an indicator of a lineage's antiquity there, both their consistency and their correspondence with younger downstream clades must be considered before such a conclusion is made. As such, I do not consider the presence of H*-M69, NO*-LLY22g or C*-M216 in this cohort to indicate anything other than Iran's position as a geographic crossroads. The remaining ones (particularly J2*-M172, L*-M61 and R*-M207) require further investigation to determine whether Iran can claim to be the origin of each.

- Further to the above, it is likely that the R*-M207 reported in this paper is in fact R2*-M479 based on the dated SNP array used.

- C5-M356 makes a sporadic appearance across Iran. It is a mysterious clade with a spotty distribution across much of Eurasia; in this region, it is more commonly associated with the Indian Subcontinent.

[Map: Iranian J1c3-PAGE08]

- Haplogroup G makes a strong appearance with, in my opinion, enough subclade diversity to argue for an origin in Iran or a nearby region. This is partially supported by its presence in every ethnic group, albeit through different subclades.

- Although IJ*-M429 has finally been found, Grugni et al.'s decision not to publish STR data leaves us no means to determine whether the two Mazandarani and Persian men are in fact related within a genealogical timeframe. The significance of this find in Iran will have to remain an open question for now.

- The poor SNP resolution of the Y-DNA Haplogroup I found in Iran (split between I1-M253 and I2-M438 in Gilaki, Bandari, Kurdish and Armenian populations) discourages strong conclusions about the development of I-M170 relative to the discovery of IJ*-M429. The lack of STRs also prevents us from ascertaining whether these are recent contributions from Europe, or whether there is any European connection to begin with.

- Both the frequency and subclade diversity of Haplogroup J2-M172 (as well as the presence of J2*-M172 and J2a*-M410 across the country) make Iran a strong candidate for the origin of this lineage.

- The strong presence of J1c3-PAGE08 is one of the surprising findings of this study. It is absent only among Assyrians from Azarbaijan province and peaks in Khuzestani Arabs (31.6%). I speculate that it is an early Near-Eastern pastoralist nomad marker, and that it appears accentuated in Khuzestani Arabs only because the L147.1 marker (J1c3d), which the literature commonly associates with the expansion of Semitic languages (particularly Arabic), was not tested here. Otherwise, it would be difficult to explain why the apparent medieval Arab admixture among Iran's Zoroastrians is comparable to (and often greater than) that among Azeris, for instance, given that Azerbaijan hosted Arab garrisons following the Sassanid collapse.

- Haplogroup Q presents a very distorted picture. The finding that 42.6% of Turkmens belong to Q1a2-M25 disagrees with Wells et al.'s The Eurasian Heartland: A continental perspective on Y-chromosome diversity, where Haplogroups J, N, R1a and R1b predominated. This suggests either that an extensive founder effect has taken place (i.e. regionalisation of certain branches from a common Oghuz Turk pool) or that the Golestani Turkmen values reflect a more generic form of genetic drift.

On the matter of Turkic affinities, Azeris from Azarbaijan province show greater Haplogroup Q subclade variation than any other ethnic group. However, their total frequency is comparable to (or lower than) that of Persians nationwide. As it stands, if one were to presume Haplogroup Q in Iran was of Turkic origin, the Turkic contribution to the Persian and Azeri gene pools would appear comparable despite their linguistic differences. Although more data would certainly flesh this matter out, this diversity, combined with the presence of N-M216 among Iran's Azeri population, certainly gives a genetic basis for their linguistic heritage.

- Haplogroup R1a1a-M17 is regularly found at frequencies greater than 15% across Iran, contrary to the assertion Dr. Wells made a decade ago on the basis of the limited samples he obtained, again in The Eurasian Heartland: A continental perspective on Y-chromosome diversity:

"Intriguingly, the population of present-day Iran, speaking a major Indo-European language (Farsi), appears to have had little genetic influence from the M17-carrying Indo-Iranians."

[Map: Iranian G2a-P15]

It is somewhat ironic, however, to note that the Persians from Fars province presented one of the lowest R1a1a-M17 frequencies observed in this study. Whether sampling chance is an issue here, or the sparsity of M17 is indeed a reality, is an open question.

- The presence of both R1a1-SRY1532.2 (shown as R1a* due to old nomenclature) and R1b*-M343 again confirms that these para-Haplogroups occur in the region, indicating that West Asia was where Haplogroup R1-M173 began differentiating into the two primary subclades we see today in Eurasia.

- Haplogroup R1b1a2a-L23 is more frequent in the north and west of the country. Together with its presence (~3%) at the furthest southern and eastern extremes, this suggests it moved in an overall south-easterly direction via diffusion, probably during the Neolithic.

- The distribution of Haplogroup R2a-M124 is, much like that of C5-M356, irregular. Contrary to what is shown in Haber et al.'s research, R2a is not more common in the east of the country; instead, it reaches 9.1% among Esfahani Persians. That Iran's R2a frequency peaks in the centre of the country is reminiscent of Sahoo et al.'s A prehistory of Indian Y chromosomes: Evaluating demic diffusion scenarios.


The sensationalist question of the hour: what accounts for the spike in R2a-M124 that has been picked up in Central Iran over the past half decade?

- Finally, Haplogroup T-M70 reaches a frequency of 10.1% among Assyrians from Azarbaijan province, while also being relatively common among Persians across the country and among Iranians from its western periphery (Azeris and Kurds). This would suggest a deep association with ancient Near-Eastern cultures, passive at the very least.
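Several of the frequencies above rest on very small samples. The 9.1% R2a-M124 figure for Esfahani Persians, for instance, corresponds to a single carrier among the 11 Esfahani men sampled. A binomial confidence interval makes that uncertainty concrete. The sketch below uses the Wilson score interval, a standard choice for small samples; it is my own illustration, not a method used by Grugni et al., and the function name is hypothetical.

```python
from math import sqrt


def wilson_interval(carriers: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a haplogroup frequency.

    carriers: number of sampled men carrying the haplogroup
    n:        total number of men sampled in the group
    z:        normal quantile (1.96 gives roughly 95% coverage)
    """
    if n <= 0 or not 0 <= carriers <= n:
        raise ValueError("need 0 <= carriers <= n and n > 0")
    p = carriers / n                      # observed frequency
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    # One R2a-M124 carrier among 11 Esfahani Persians (9.1%).
    low, high = wilson_interval(1, 11)
    print(f"{low:.3f} to {high:.3f}")
```

For one carrier in 11 men this gives roughly 0.016 to 0.377: the true frequency could plausibly lie anywhere from under 2% to over a third, which is why single-city peaks like this one deserve caution.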

Criticisms of Paper

[Map: Iranian J1-M267]

Despite the rich sampling pool, I have several immediate criticisms:
  • There are some issues with the sampling strategy employed by this paper. For instance, the Assyrians (a Christian, non-Arab, Semitic-speaking minority) are represented by 39 men, whereas Persians from Esfahan (a major Iranian city) are represented by only 11.
  • Inadequate haplotype data has been released; the only offering is 8-STR haplotypes from select lineages (e.g. J1*-M267), which were used for variance analysis.
  • Furthermore, a maximum of 10 Y-STRs were analysed, rendering some of their variance calculations questionable at such low resolution. This also rules out meaningful MRCA and intra-subclade age estimates.
  • Grugni et al. have approached Haplogroup R1a1a-M17 in a similar vein to past studies (e.g. Haber et al., see Showcasing of Y-DNA Variation Among Afghan Ethnic Groups) by not referring to current data on the structure of R1a1a. As with Haber et al., R1a1a-M458 is taken as the "European" strain, despite research by the R1a1a and Subclades Y-DNA Project showing that the apparent split between the upstream SNPs Z283 and Z93 is far more informative in this regard.
  • Haplogroup R1b1a2*-L23 is treated as a "West Eurasian" paternal contribution to the Iranian plateau, without considering the possibility that it originated within or near the country's western zone.
  • As shown in Interpretation of Results, Grugni et al.'s use of dated nomenclature poses problems for those who may not be intimately familiar with recent Y-SNP Tree changes by ISOGG.
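The variance criticism can be illustrated with a short sketch. One common, simple way to summarise STR diversity within a haplogroup is to take the variance of repeat counts at each locus and average it across loci. This post does not describe exactly which estimator Grugni et al. used, so treat this as an assumption; the function name and haplotypes below are hypothetical, and using population (rather than sample) variance is an arbitrary choice.

```python
from statistics import pvariance


def mean_str_variance(haplotypes: list[tuple[int, ...]]) -> float:
    """Average per-locus variance of Y-STR repeat counts within one haplogroup.

    Each haplotype is a tuple of repeat counts, with the same loci in the
    same order for every man.
    """
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes")
    loci = list(zip(*haplotypes))  # transpose: one tuple of repeat counts per locus
    return sum(pvariance(locus) for locus in loci) / len(loci)


if __name__ == "__main__":
    # Three hypothetical men typed at three loci.
    sample = [(14, 12, 23), (14, 13, 23), (15, 12, 24)]
    print(mean_str_variance(sample))
```

With only 8 to 10 loci, each locus carries a tenth or more of the weight in the average, so a single unusually fast-mutating locus can pull the figure substantially. That is the practical reason such estimates are shaky at this resolution.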

Acknowledgements

Map of Iran courtesy of D-Maps.com.

7 comments:

  1. Great summary of the paper and I agree with your criticisms.

    I have one comment about haplogroup Q. You wrote:

    "Haplogroup Q presents with a very distorted picture. 42.6% of Turkmens belonging to Q1a2-M25 is not in agreement with Wells et al."

    I don't know about the subbranch Q1a2-M25 but the presence of haplogroup Q is in agreement with the findings for the Afsar clan of the Oguz tribe in Central Anatolia (Gokcumen et al). The STR data of the Afsar clan is similar to some Iranians (Alshamali et al., 2009; see individuals Iran104, Iran11, Iran67).

  2. Thank you Palisto, I imagine you must've also been pleased to see Kordestani Kurds get sampled! I did not see any information about their tribal affiliations. That would have been the cherry on the cake (as well as additional Y-STR's).

    Al Shamali et al., if I remember correctly, sampled southern Iran only. If there is STR similarity between those Iranians and the Afshars from Central Anatolia, it would only complicate the status of Haplogroup Q in Iran further.

    Haber et al.'s investigation on Lebanese Maronites (Influences of history, geography, and religion on genetic structure: the Maronites in Lebanon) also presented 322 Iranian samples, of which 200 were "West Iranian" and 22 were Q-M242 (11% of West Iran). If time permits, I will investigate all this conflicting information in more detail tomorrow.

  3. "Thank you Palisto, I imagine you must've also been pleased to see Kordestani Kurds get sampled! " Yes, I am. I was trying to get as much information out of Grugni et al., 2012 as possible.
    http://kurdishdna.blogspot.com/2012/07/y-chromosome-analysis-of-iran.html
    http://kurdishdna.blogspot.com/2012/07/y-chromosome-of-kurds-in-iran.html

    "Al Shamali et al., if I remember correctly, sampled southern Iran only. If there is STR similarity between those Iranians and the Afshars from Central Anatolia, it would only complicate the status of Haplogroup Q in Iran further. "

    Yes, the data are from South Iran. I agree that haplogroup Q is not easy to understand in the Middle East. We should also not forget the Jews in this equation: quite a few Ashkenazi Levite and Cohanim lineages are Q1b1-M378.

  4. I think the low R1a1a in Fars could be due to many reasons, such as chance in the tested group, or it could be that people of Fars do have low R1a1a. I have also noticed that Lors seem to have relatively low R1a1a as well. The reason North West/West Iran has more R1a1a could be multiple settlements of Iranian groups such as the Scythians, Parthians and Medes; even the early Persians themselves were around Urmia.

    It also seems that Lors are more similar to Fars than to any other region, which is surprising, as many believed Lors would share more with the Kurds of Iran, since they were known as Kurds in the past.

  5. "The strong presence of J1c3-PAGE08 is one of the surprising finds of this study. With an absence only amongst Assyrians from Azarbaijan province and a peak in Khuzestani Arabs (31.6%), I speculate this is an early Near-Eastern pastoralist nomad marker that is only accentuated in Khuzestani Arabs because the L147.1 marker (J1c3d), which is commonly associated with the expansion of Semitic languages (particularly Arabic in literature) was not tested here. Otherwise, it would be difficult to reconcile medieval Arabic admixture among Iran's Zoroastrians being comparable (and often greater) than Azeris, for instance, as Azerbaijan hosted Arab garrisons following the Sassanid collapse."

    Your comment raises a highly interesting question, especially the 8.8% in Zoroastrians from Yazd. Most J1c3d Armenians (and presumably many J1c3d Kurds as well) are YSC76+, a SNP downstream from L147.1 and L858! I agree with you: either the absence of L147.1 is the reason and we are dealing here with very old lineages, or we have to rewrite the (until now exclusively Semitic) history of J-L147.1 and downstream SNPs. I'm especially interested in the question of whether the J1c3 is due to admixture with originally Semitic-speaking lineages or not...

  6. I must say, fabulous analysis of this data. I have been preaching something nearly identical on every point here since Wells' premature assumption about the origin of R1a1 (which, by the looks of this high-resolution study, seems to be Iran). J2, R1 and likely I are Iranian in origin.

  7. Thanks very much for the comment, blogmaster.

    Based on modern frequency distributions, that's the intuitive conclusion regarding Y-DNA R1a. Ancient DNA's the crucial piece of evidence we're lacking right now. The presence of Y-DNA R1a in a mesolithic Karelian hunter-gatherer alongside other forms of Y-DNA R1 in Haak et al. (2015), however, does open the field quite a bit. We desperately need some aDNA from West Asia to determine the developmental pattern of R1 in Eurasia.

    I have no current convictions regarding this topic as I follow the data until appropriate junctures have been met. Some pre-neolithic Y-DNA R1 from Iran or the South Caucasus would definitely mix things up nicely.
