
Wednesday, August 6, 2014

Anchored in Armenia: An Exercise in Genetic Relativity [Original Work]


Introduction

Location of the Armenian Highlands in West Asia
As is the case with many groups in the region, the Armenians are, anthropologically speaking, a highly distinctive modern ethnicity. Situated in the Armenian Highlands (an expansive area straddling the Zagros and Caucasus ranges) with a settlement history dating back to the Neolithic, the modern Armenian people have maintained a distinct culture both shaped and shielded by the mountainous territory they inhabit. [1] One unique aspect of the Armenian people is their language; Modern Armenian is an Indo-European language belonging to its own branch. There has long been scholarly debate regarding its linguistic exodus from the Proto-Indo-European homeland (commonly accepted by modern linguists as the Pontic-Caspian steppe) [2] through to its historical seat in the South Caucasus. As is evident from the attested Urartian and Hurrian loanwords in later forms of the language, Armenian must have been spoken in the region by the forebears of its current speakers since before 500 B.C. [3] Various genetics enthusiasts (including myself) have on differing occasions cited this as an indication of an aboriginal West Asian genetic layer accompanying the Urartian-Hurrian vocabulary substratum.

Presumably due to the ongoing political instability in West Asia, there has been an unfortunate lack of ancient DNA (aDNA) recovery in the areas adjacent to the Armenian Highlands. Alongside the Armenians, West Asia proper is also home to Anatolian Turks, numerous Kurdish groups, the Assyrians, several Jewish minorities and various ethnic groups within Iran. The inter-relation of all these groups to differing extents has been demonstrated in both published studies [4] and open-source projects. [5,6]

Mount Ararat - A symbolic item in Armenian culture
Although they have most likely experienced their own demic events in prehistoric times, the insular nature of the Armenians relative to their neighbours allows them to be used as a stand-in for the aDNA we currently lack in this part of the world. In this blog entry, the Armenians will therefore be considered as a surrogate for autochthonous West Asian ancestry. They will be treated as a primary donor population (PDP) for several other West Asian groups, in an attempt to flesh out the degree of mutual shared ancestry, as well as the directions of added affinities beyond the region. This is by no means an authoritative attempt to purport a particular image of the West Asian genetic landscape. It is instead an attempt to provoke discussion and to explore the underlying structure of the region in a manner that should hopefully yield fruitful results despite the glaring absence of aDNA.


Working Hypotheses

1. Given the demonstrated similarity in autosomal DNA profiles (here and here), modern Armenians will serve as a reasonable PDP for all tested populations.

2. Furthermore, the genetic difference (GD) will likely be dictated by geographical proximity to the Armenians, or a (lack of) history of admixture with them.

3. Finally, the other donor populations will be anticipated either by virtue of geography or language.


Method

The Dodecad K12b Oracle was used to undertake this small project (please visit link for technical information). When executed through R, the program was set to Mixed Mode and fixed to 500 results for every iteration per population. The command entered therefore remained the same each time:

DodecadOracle("WestAsianPopulation",mixedmode=T,k=500)
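For readers unfamiliar with Mixed Mode, the sketch below illustrates the general idea in Python: each pair of reference populations is blended in the proportion that brings it closest to the target, and the pairs are then ranked by the remaining distance. It assumes the genetic difference is a Euclidean distance between admixture proportions, which is my reading of the Oracle rather than its actual source. The function names and the three-component toy profiles are hypothetical and are not K12b data.

```python
import math
from itertools import combinations


def best_blend_weight(target, a, b):
    """Weight w in [0, 1] minimising the distance from target to w*a + (1-w)*b."""
    diff = [x - y for x, y in zip(a, b)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:  # identical references: every weight gives the same blend
        return 1.0
    w = sum((t - y) * d for t, y, d in zip(target, b, diff)) / denom
    return min(1.0, max(0.0, w))  # clip to a valid mixing proportion


def genetic_distance(p, q):
    """Euclidean distance between two proportion vectors (assumed GD metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


def mixed_mode(target, references, k=500):
    """Return up to k rows (gd, pop_a, pct_a, pop_b, pct_b), best fit first."""
    rows = []
    for (name_a, a), (name_b, b) in combinations(references.items(), 2):
        w = best_blend_weight(target, a, b)
        blend = [w * x + (1.0 - w) * y for x, y in zip(a, b)]
        rows.append((genetic_distance(target, blend),
                     name_a, 100.0 * w, name_b, 100.0 * (1.0 - w)))
    return sorted(rows)[:k]


# Hypothetical three-component profiles (percentages), not real K12b values.
references = {
    "PopA": [80.0, 20.0, 0.0],
    "PopB": [0.0, 20.0, 80.0],
    "PopC": [10.0, 80.0, 10.0],
}
target = [60.0, 20.0, 20.0]
for row in mixed_mode(target, references, k=3):
    print(row)
```

The closed-form weight is simply the projection of the target onto the line joining the two references, clipped to the 0-100% range. A real run would use all twelve K12b components and every reference population.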

Samples consist of nine location-specific populations (Iranians, Kurds_Y, Azerbaijan_Jews, Iraq_Jews, Iran_Jews, Turks, Turks_Aydin*, Turks_Kayseri*, Turks_Istanbul*) and four Dodecad participant averages (Iranian_D, Kurd_D, Assyrian_D, Turkish_D). A total of thirteen populations were therefore included.

From the output, only those combinations expressing an Armenian population as a PDP were selected. In this context, the Armenians are considered a PDP if their "ancestral" percentage exceeds 50%. A maximum of ten combinations were collected per population. Where the number of qualifying combinations exceeded this, the combination list is terminated with an ellipsis.
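The selection rule can be expressed as a short filter. This is only a sketch in Python: the row layout (GD, population A, % A, population B, % B) and the assumption that Armenian reference labels begin with "Armenian" are mine, so both would need adjusting to the actual Oracle output.

```python
def select_armenian_pdp(rows, threshold=50.0, limit=10, prefix="Armenian"):
    """Keep rows in which an Armenian population contributes more than threshold %.

    rows: iterable of (gd, pop_a, pct_a, pop_b, pct_b) tuples (assumed layout).
    Returns (kept_rows, truncated); truncated is True when more than `limit`
    rows qualified, i.e. when the printed list would end with an ellipsis.
    """
    kept = []
    for row in sorted(rows):  # smallest GD first
        gd, pop_a, pct_a, pop_b, pct_b = row
        donors = [(pop_a, pct_a), (pop_b, pct_b)]
        if any(name.startswith(prefix) and pct > threshold
               for name, pct in donors):
            kept.append(row)
    return kept[:limit], len(kept) > limit


rows = [
    (5.44, "Armenians_15_Y", 56.4, "Tajiks_Y", 43.6),  # individual result quoted in this entry
    (3.10, "Tajiks_Y", 60.0, "Armenians_15_Y", 40.0),  # hypothetical row, rejected (<50%)
]
kept, truncated = select_armenian_pdp(rows)
print(kept, truncated)
```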

* Although not included in the original Dodecad K12b Oracle dataset, Dienekes has conveniently shared the population averages for these samples here. These were manually inserted into the command.


Results

Iranian and Kurdish Oracle results
Unsurprisingly, the Iranians and Kurds all display similar results, specifically the adoption of either Makrani or Balochi as the secondary donors when Armenians are fixed as a PDP. The proportions are also comparable between all. The Iranians appear to fit the Armenian + Balochi/Makrani combination slightly better than the Kurds (GD=4.04-5.16 vs. 5.03-6.65 to 2 d.p. respectively). It is also worth observing that neither the Iranians nor the Kurds, irrespective of sampling strategy (location-specific or Dodecad average), have more than ten qualifying Mixed Mode results.

Assyrian and select Near-Eastern Jewish Oracle results
The Assyrians are one of the groups of interest, given the demonstrated autosomal similarity between them and Armenians (here). As anticipated, their Mixed Mode results well exceed ten and the best fits (GD=1.66-1.82 to 2 d.p.) are all, coincidentally, with the Near-Eastern Jewish groups studied here. Subsequent matches include additional populations (e.g. Saudi, Bedouin, Syrian) where the GD remains relatively small compared to the Iranian and Kurdish values (>3.15 to 2 d.p.).

The Near-Eastern Jewish groups largely mirror the Assyrian results, although some key differences should be outlined:

  • The Azerbaijani Jews have a GD similar to the Assyrians in range, setting them apart from the Iraqi and Iranian Jews. This seems to fit geography. However, if the association were strictly geographical, one would expect the Assyrians to lie in between the Azerbaijani Jews on one side and the Iraqi and Iranian Jews on the other. This may be genetic evidence of additional and direct ancestry between Armenians and Assyrians at some (or various) point(s) after the Near-Eastern Jewish groups had formalised their identities.
  • Saudis appear as a secondary donor population in all groups. Interestingly, they appear to have an inverse relationship with geographic proximity to the Armenian Highlands; Iraqi, Iranian and Azerbaijani Jews are 20.4%, 16.1% and 7.8% "Saudi" respectively. The Assyrians too fall on this cline despite the point raised above.

Anatolian Turkish Oracle results
Finally, the Anatolian Turks provide us with another set of interesting values and pairs:

  • Mixed Mode results from Western Turkey (Aydin, Istanbul) largely exhibit a combination of Armenian with various European ethnic groups or nationalities, which can be predominantly ascribed to geography. Please note the comparatively large GD among the Aydin average (>9.93 to 2 d.p.), which contrasts with Istanbul. I suspect the cosmopolitan nature of Istanbul has resulted in an artefactual lowering of the GD, given Anatolian Turks from across the country have moved there for employment purposes. [7]
  • In contrast, the samples listed as "Turks" in Dodecad K12b (from the Behar et al. dataset, located in Central-South Turkey) model well as a combination of Armenian with either the Chuvash, Nogay, Uzbek or Uyghur. European secondary donors do make an appearance once more. Please also note their GD is the smallest out of the Turkish averages investigated (4.20 to 2 d.p.).
  • The Kayseri average (Central Turkey) yielded no results matching the criteria outlined in "Method". However, the Assyrians instead made a frequent appearance as primary donors from GD=6.17 onwards. Given the genetic affinity between Assyrians and Armenians (refer above), and the consistency displayed by the Armenians as a PDP for other Turkish averages, this result can be considered anomalous. A close inspection of the Dodecad K12b proportions reveals the Kayseri Turks were on average approximately 1.5% more Southwest Asian than all other Turkish populations, explaining why Assyrians took preferential placing over Armenians as the PDP. The cause of this slight increase is unknown at present.
  • The Turkish_D average best resembled that of Istanbul, albeit with a slightly higher Armenian and lower European proportion. This would suggest that, overall, the Dodecad Turkish participants map somewhere just east of Istanbul despite their presumably diverse backgrounds.
  • Finally, all averages produced Mixed Mode results which exceeded ten in number.

IBD Segment Indications

To corroborate the findings of this investigation with additional genetic data, I refer to the Dodecad Project's fastIBD analysis of Italy/Balkans/Anatolia and fastIBD analysis of several Jewish and non-Jewish groups. As the analyses do not completely encompass those groups studied here, the results cannot be accepted wholesale. However, there does appear to be a broad agreement with some of the results in this investigation. For example, the Armenians and Assyrians have a demonstrated level of "warmth" to one another beyond background sharing.


Further Work

This investigation would have benefited from Azeri Turkish samples via the Republic of Azerbaijan. Additionally, a better breakdown of Kurdish, Iranian and Assyrian samples, akin to the site-specific sampling seen here in the Anatolian Turks, would have been ideal. Finally, as stated above, this investigation would have benefited from the inclusion of IBD segment analysis specific to the studied groups. Should time permit and the desired samples be made available in the future, this would be a natural line of inquiry to further what has been explored here.


Conclusion

Addressing the three hypotheses stated at the beginning in order:

1. Armenians certainly have behaved as a reasonable proxy for an autochthonous West Asian PDP in most of the populations tested (sole exception being the Kayseri Turks although this appears to be an anomalous response to slightly more Southwest Asian scores). The scores vary depending on the presence of the secondary donors, but Assyrians and Jewish populations from Azerbaijan, Iran and Iraq appear to have the largest proportion of this (occasionally surpassing 90%). All Iranians and Kurds, on the other hand, scored the least overall (approximately 65-75%). The Turkish range lies in-between these two.

2. Unfortunately, this isn't clear. The lack of regional results for Kurds and Iranians, together with a lack of samples specifically from Eastern Turkey, prevents any firm conclusion being reached on this point. The Near-Eastern Jewish populations studied here certainly do form a cline of Armenian "admixture" that is fully in line with geography. Furthermore, the large GD observed in Aydin Turks does support this idea, leading me to cautiously propose that geography does indeed play a role. The second part of the hypothesis (a history of admixture) also receives partial support, as the Assyrians show more Armenian affinity than their geographical placement would predict, based on GD as well as fastIBD evidence from elsewhere.

3. With the exception of the Assyrians and Near-Eastern Jewish groups, the secondary donors overwhelmingly matched my expectations regarding their placement with whichever group that was studied (e.g. Iranians and Kurds towards South-Central Asia, Turks towards either Europe or Central Asia proper).

Over the coming years, with the availability of more data, we should hopefully move away from the population averages that have been used by various open-source projects. It has been empirically demonstrated here that regional results will differ significantly from nationwide averages (e.g. Aydin Turks vs. Turkish_D).

This also holds true on an individual basis; the best Oracle match for one Iranian via the described methodology was 56.4% Armenians_15_Y + 43.6% Tajiks_Y (GD=5.44 to 2 d.p.), differing significantly from both the Iranian and Kurdish averages.

I suspect the gentlemen running the numerous open-source projects are aware of this caveat and are, justifiably so in my opinion, making do with currently available data.

In closing, this investigation has also determined that, on the basis of the presumption of an Armenian-like autochthonous West Asian substrate, the studied populations as a whole have an apparent degree of inter-relatedness by virtue of this common South Caucasian autosomal heritage, albeit with the presence of highly significant affinities to elsewhere in Eurasia, be it population-wide, regional or even individual.


Speculations

The first topic concerns the Iranians and Kurds; why were their average secondary donors always the Balochi and Makrani, rather than more northern groups, such as the Tajiks? I suspect, when applied to population averages, the Oracle program effectively minimises intra-population variation to the point where only the broadest of affinities are indicated. In the case of Iranians, the secondary donor would therefore be one with genetic features that tend to emphasise the difference between Armenians and Iranians (e.g. additional South Asian and Gedrosian admixture). A similar conclusion can be reached with respect to the Turks.

Another interesting point is the demonstrated close relationship between the Assyrians and various Near-Eastern Jewish groups. This has been speculated upon in various discussion forums in the past. More precise tools will be required to elucidate whether these populations share legitimate ancestry with one another, or whether the affinity is happenstance, instead reflecting the mixture of similar Near-Eastern groups with (again) similar Caucasus-derived groups at some point in history.

[Addendum I, 07/08/2014]: For a continuation on this with a fellow genome blogger, please read the Comments below.


Acknowledgements

Full credit for both the generation of raw population data and the Oracle program go to Dienekes Pontikos (Dodecad Ancestry Project).

Map of Armenian Highlands from Wikipedia.org. Photo of Mount Ararat courtesy of NoahsArkSearch.com.

Finally, I must refer all visitors interested in understanding the genetic constituency of the Armenian people to the FTDNA Armenian DNA Project. For a more interactive learning experience, two of the administrators (Messrs. Simonian and Hrechdakian) recently delivered a lecture on this topic, garnishing it with a deeper description of anthropological and geographical aspects as described here.


References

1. Samuelian TJ. Armenian Origins: An Overview of Ancient and Modern Sources and Theories. [Last Accessed 3/08/2014]: http://www.arak29.am/PDF_PPT/origins_2004.pdf

2. Clackson J. Indo-European Linguistics: An Introduction. Cambridge Textbooks in Linguistics [Last Accessed 4/08/2014]: http://caio.ueberalles.net/Indo-European-Linguistics-Introduction/Indo-European%20Linguistics%20-%20James%20Clackson.pdf

3. Greppin JAC. The Urartian Substratum in Armenian. [Last Accessed 4/08/2014]: http://science.org.ge/2-2/Grepin.pdf

4. Grugni V, Battaglia V, Hooshiar Kashani B, Parolo S, Al-Zahery N et al. Ancient migratory events in the Middle East: new clues from the Y-chromosome variation of modern Iranians. PLoS One. 2012;7(7):e41252.

5. Dodecad Ancestry Project: ChromoPainter/fineSTRUCTURE Analysis of Balkans/West Asia [Last Accessed 4/08/2014]: http://dodecad.blogspot.com/2012/02/chromopainterfinestructure-analysis-of.html

6. Eurogenes Genetic Ancestry Project: Updated Eurogenes K13 and K15 population averages [Last Accessed 4/08/2014]: http://bga101.blogspot.com/2014/03/updated-eurogenes-k13-and-k15.html

7. Filiztekin A, Gokhan A. The Determinants of Internal Migration In Turkey. [Last Accessed 05/08/2014]: http://research.sabanciuniv.edu/11336/1/749.pdf

Saturday, August 4, 2012

West Asian Y-DNA Haplogroup Q - Turkish or Autochthonous Origins? [Original Work]

Genographic Project Y-DNA Q Migration Route

Introduction
Y-DNA Haplogroup Q is defined by the M242 marker and is downstream of Haplogroup P-M45, making it the sister Haplogroup of R-M207, which populates much of West Eurasia. According to the Genographic Project, Haplogroup Q-M242 is between 15,000 and 20,000 years old, with its place of origin invariably placed in North Eurasia.

The frequency of Haplogroup Q largely matches the migration path outlined in the maps shown opposite. However, the presence of haplogroup Q in more southwestern portions of Asia has sparked the curiosity of genealogists and observers alike. In current literature, the presence of Haplogroup Q1a2-M25 specifically in Iran is cited as "Central Asian" influence. [1]

In an attempt to conclusively uncover the origins of Haplogroup Q-M242 in West Asia, the Y-STR haplotype variation of West, Central and South Asian Q1a-MEH2 and Q1b-M378 are visualised and analysed with genealogical tools.


Method
The data for this investigation are gathered from various Family Tree DNA (FTDNA) projects and studies, [1,2,6-11] with the concise list shown in the References section below.

Only results presenting at least 16 Y-STRs were considered. Modifications were made as necessary on certain STR markers (particularly Y-GATA H4) to correct nomenclature differences. Urasin's YPredictor was used when the Y-SNP information from a study was inadequate (e.g. no SNPs downstream of Q-M242 tested).

Samples follow a consistent naming convention, with _n and _yQP_n suffixes indicating they were obtained from studies and FTDNA Projects respectively. The following populations were included:

FTDNA Y-DNA Q Migration Route
Irn = Iranian (Unspecified ethnicity), Azr_Tal = Talysh from the Republic of Azerbaijan, Trk/Tur = Anatolian Turkish, Ptn = Pashtun from Afghanistan, Ind = Indian (Unspecified ethnicity/caste), Irq = Iraqi (Unspecified ethnicity), Kzk = Kazakh, Pak = Pakistani (Unspecified ethnicity), Uzb = Uzbek, Tjk = Tajik, Haz = Hazara, Npl = Nepali, Arm = Armenian, Geo = Georgian, UAE = Emirati Arab, Irn_Arab = Iranian Arab (Khuzestan), Irn_Mzn = Iranian Mazandarani (Mazandaran), Irn_Bkt = Iranian Bakhtiari

Once collation was complete, haplotrees were created and clusters were inferred from them, with modal haplotypes of the inferred clusters found where necessary. The time to the Most Recent Common Ancestor (tMRCA) of selected clusters was calculated by comparing two modals from the first pair of intra-cluster branches. Due to the STR panels tested in the concerned papers (Y-Filer order 1), McGee's Y-Utility was the only immediately viable choice (infinite allele mutation model, 75% probability, 25 years/generation).
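To make the tMRCA step less opaque, here is a minimal sketch in Python of a point estimate under the infinite allele model, in which any difference at a marker counts as a single mismatch regardless of its size. It is not McGee's Y-Utility and will not reproduce its 75% probability figures. The formula is the standard maximum-likelihood estimate for two lineages, g = ln(n / (n - k)) / (2μ), and the mutation rate of 0.002 per marker per generation is a placeholder assumption. Only the 25-year generation length comes from the settings above, and the two modal haplotypes are hypothetical.

```python
import math


def count_mismatches(hap_a, hap_b):
    """Compare two haplotypes (dicts of marker -> repeat count) on shared markers.

    Returns (markers_compared, markers_differing).
    """
    shared = sorted(set(hap_a) & set(hap_b))
    mismatches = sum(1 for marker in shared if hap_a[marker] != hap_b[marker])
    return len(shared), mismatches


def tmrca_years(n_markers, n_mismatches, mu=0.002, years_per_generation=25):
    """Maximum-likelihood tMRCA in years under the infinite allele model.

    mu is an assumed per-marker, per-generation mutation rate (placeholder).
    """
    if not 0 <= n_mismatches < n_markers:
        raise ValueError("need 0 <= mismatches < markers compared")
    # P(a marker still matches after g generations on both lines) = exp(-2*mu*g)
    generations = math.log(n_markers / (n_markers - n_mismatches)) / (2.0 * mu)
    return generations * years_per_generation


# Hypothetical modal haplotypes, for illustration only.
modal_1 = {"DYS19": 15, "DYS389i": 12, "DYS385a": 14, "DYS439": 11}
modal_2 = {"DYS19": 15, "DYS389i": 12, "DYS385a": 14, "DYS439": 12}
compared, differing = count_mismatches(modal_1, modal_2)
print(compared, differing, round(tmrca_years(compared, differing)))
```

The estimate scales inversely with the assumed mutation rate, which is one reason the ages quoted in this entry should be read as rough approximations only.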


Working Hypothesis
An indeterminable mix of recent (<1500 ybp) and prehistoric Y-DNA Q1a-MEH2 and Q1b-M378 lines exists in the region, with some instances of close haplotype sharing between West, South and Central Asia.


Limitations Of This Investigation
  • Although the number of STRs tested has increased gradually over the past decade, 16 is not considered a "confident sell" in the genealogy world.
  • Additionally, the difference in STR panels used meant some informative populations, such as the Makrani, Baloch, Burusho and Parsis of Pakistan, were not included due to an overlap of only 12 STRs.
  • Y-STRs from several crucial populations, such as the Qashqai, Iraqi Turkoman and Azeris from the Republic of Azerbaijan, could not be found.
  • There is, of course, the great debate concerning STR mutation rates. At the time of writing I have not observed any clear consensus in the genealogy community regarding this topic. The applicability of Nordtvedt's Generations series to this entry is minimal due to an STR overlap issue, hence the decision to use McGee's tool instead.
  • As discussed later, the number of Y-SNPs tested across the cited studies is insufficient to draw firm conclusions.
  • Finally, sample size is an issue. The dataset is dominated by Iranian or Afghan samples because these papers were released at a time (i.e. 2008-present) when the 17-STR Y-Filer panel had become mainstream.

Y-DNA Q1a Phylogenetic Tree
Haplogroup Q1a STR Results
Four informative clusters were inferred:

  • Cluster A (DYS19=15, DYS389i=12) is largely restricted to Afghan Pashtuns, with Ptn_1-4 all sharing an MRCA with their modal (and therefore likely founding haplotype) between 900 and 450 ybp. This result is consistent with the dominance of Turkic-speaking dynasties in this time period.
  • Cluster B (DYS385a=14) has a large geographical spread from Turkey through to Iran, the United Arab Emirates, Afghanistan, Nepal and Kazakhstan. The most immediate observation is the close haplotype sharing (3-step mutation, 14/17) between Kzk_1 and Irn_4, with an estimated MRCA at 900 ybp. This result, together with the general area covered, again indicates this cluster should at the very least be broadly associated with Central Asian Turks.
  • Cluster C (DYS392=16, DYS389ii=28, DYS448=22) is interesting because its members are exclusively Iranian and all come from the dataset of Haber et al.'s Influences of history, geography, and religion on genetic structure: the Maronites in Lebanon. [2] Most of the Iranians bearing Haplogroup Q-M242 in their sample were from West Iran, where Iran's Azeri population happens to dominate the northern region. The regional exclusivity of this cluster combined with the very recent MRCA (900 ybp) leads me to suspect Haber and his associates sampled a locale in West Iran that underwent genetic drift, explaining the +10% Q-M242 that is otherwise not seen in other studies. [1] However, the MRCA also suggests these Iranian men's paternal ancestor was associated with Medieval Turks, even though the result in its entirety does not represent West Iran sufficiently.
  • Cluster D (DYS439=11, DYS437=15) mirrors Cluster B's distribution across the region but the divisions are more consistent with geography than other variables (i.e. Anatolian Turk and Armenian, Hazara together). 


Haplogroup Q1b STR Results
Five informative clusters were inferred:

Y-DNA Q1b Phylogenetic Tree
  • Cluster A (DYS385a=12, DYS439=11, DYS437=15) is, relative to the others, an early offshoot that is highly localised in South-Central Asia. 
  • Cluster B (DYS385a=14) is also localised, found specifically in Iraq and Iran.
  • Cluster C (DYS385a=14, DYS448=20) is twinned with B but appears to have a younger MRCA (925 ybp). Of interest is the wide geographic distribution across Turkey, Iran, India and Kazakhstan. Central Asian Turks once more provide a convenient historical narrative for both the predicted MRCA and spread.
  • Cluster D (DYS385a=15) is again geographically localised, this time in the greater Near-East (Turkey, Iran and Syria). 
  • Cluster E (DYS385a=12, DYS437=15) once more displays geographic localisation in South-Central Asia, specifically among Afghan Pashtuns and an FTDNA Project Pakistani.


SNPs - What Do They Tell Us?
Tabulated Y-DNA Q SNPs for select populations from several studies [1, 3-5] can be viewed in the Vaêdhya Data Sink.

There is, unfortunately, a two-pronged incompatibility issue between the Y-STR analysis and the Y-SNPs provided here. Not only is there poor overlap between the populations covered in both sets, but the SNP selections in the four studies do not provide us with a clear picture regarding the presence of Q*-M242(xQ1a-MEH2,xQ1b-M378), Q1a*-MEH2(xQ1a2-M25), Q1a2-M25 and Q1b-M378.

However, the distribution of Q1a3-M346 and Q1b-M378 across the Iranian plateau in contrast with the specificity of Q1a2-M25 in Azeri Iranians and Turkmen (1.6% and 42.6% respectively, although the latter is likely due to genetic drift as discussed here) suggests a strain of the first two lineages is linguistically neutral and preceded the millennia of Turkish dynastic dominance in Iran.

Fortunately, such an inference is indeed supported by the Q1a and Q1b phylogenetic trees shown in this entry. One will note (particularly with Q1b-M378) the distribution is largely geographical rather than covering large swathes of Asian land through a "recent" paternal ancestor.


A comment on Assyrian Q-M242
Although the number of STR markers tested does not allow their inclusion in this research piece, I took the liberty of comparing the sole Assyrian Y-DNA Haplogroup Q-M242 individual from the FTDNA Assyrian Heritage DNA Project to elaborate on their paternal ancestor's ultimate origins.

The Assyrian people are a Neo-Aramaic-speaking ethnic minority native to the land where Turkey, Iran and Iraq intersect, as well as the Mesopotamian basin. Modern Assyrians have (due to their Christian faith and recent historical events) practised endogamy, making them a genetically distinct group minimally affected by demic movements in the surrounding populations.

The Assyrian Y-DNA Q belongs to the Q1b1a-L245 subclade. As we have observed already, haplogroup Q1b-M378 tends to have a distribution governed more by geography with deeper cluster branches, implying greater diversification time in a given region.

At present, based on the 10 available overlapping STRs, the Assyrian Q1b1a-L245 individual matches Tur_yQP_3 best with a one-step mutation (9/10), placing them deep within Cluster C, the only one without a region-specific distribution. This preliminary evaluation indicates this Assyrian man's paternal ancestor shares Medieval genetic links with Anatolian Turkish, Iranian, Indian and Kazakh men, making a Central Asian Turkish connection likely once more.


Conclusion
Due to the limitations described above, the identification of clusters is more relevant based on their geographic spread. The MRCA calculations shown are simply an extremely rough estimate at the age of a cluster.

However (and fortunately once more), it is very clear that some clusters are determined by geography rather than the sort of "genealogical boon" observed in a few (e.g. Q1a Cluster C's extensive branching despite being young relative to the others).

If one takes the MRCA calculations as a very rough approximation, whilst considering a cluster's ability to transcend regional boundaries, one can estimate that 75.5% (40/53) of the Y-DNA Haplogroup Q1a-MEH2 and 31.4% (11/35) of Y-DNA Haplogroup Q1b-M378 in West, Central and South Asia can be attributed to the Turkish migrations.

In summary, Y-DNA Haplogroup Q1a-MEH2 (likely Q1a2-M25 based on anecdotal SNP evidence) is a convincing Medieval Central Asian Turkish genetic marker based specifically on its ability to form multi-ethnic clusters in regions with a historical Turkish connection. Q1b-M378, on the other hand, generally displays enough regionalisation and cluster depth to make such an association doubtful at best, with the sole exception being those who belong to the genetic group highlighted in this entry (Cluster C) with DYS385a=14 and DYS448=20.

South Central Asian Q1b-M378 appears to be autochthonous whereas any form of Q1a-MEH2 in the region has a strong association with regions intimately connected with the Medieval Turks. The Anatolian highlands and the Iranian plateau, however, appear to be a complicated mix between the two based on the lack of clear distinctions.

The slim presence of Haplogroup Q in India on the other hand, as far as the current data indicates, is almost entirely of Medieval Turkic input, although the Subcontinent's position as a geographic nexus (much like Iran and Turkey) certainly opens the possibility for exotic para-haplogroups to also exist there.


Acknowledgement
  • Gratitude is extended to the FTDNA Projects for making their data publicly available. Independent research ventures such as my own would not be possible without their generosity.
  • I would also like to thank Mr. Paul Givargidze, administrator of the Assyrian Heritage, Aramaic and Y-DNA J1* DNA Projects at FTDNA for providing his esteemed support on this research entry.
  • The Y-DNA Haplogroup Q migration route maps are courtesy of the Genographic Project and FTDNA.

Addendum I [5/08/2012]: It has been brought to my attention that Tur_yQP_3, the Assyrian Q1b1a's best match, is in fact an Armenian individual. Although this does not compromise the conclusions reached above, it does serve as a reminder that not everyone in the Republic of Turkey is an ethnic Turk!
Addendum II [6/08/2012]: A recent exchange on a forum highlighted the likelihood of several Tur_yQP samples being Armenian rather than Anatolian Turkish. As above, the findings shouldn't impinge too greatly on what has been discussed in this entry.


References
1. Grugni V, Battaglia V, Hooshiar Kashani B, Parolo S, Al-Zahery N, et al. (2012) Ancient Migratory Events in the Middle East: New Clues from the Y-Chromosome Variation of Modern Iranians. PLoS ONE 7(7): e41252. doi:10.1371/journal.pone.

2. Haber M, Platt DE, Badro DA, Xue Y, El-Sibai M, Bonab MA, Youhanna SC, Saade S, Soria-Hernanz DF, Royyuru A, Wells RS, Tyler-Smith C, Zalloua PA; Genographic Consortium. Influences of history, geography, and religion on genetic structure: the Maronites in Lebanon. Eur J Hum Genet. 2011 Mar;19(3):334-40. Epub 2010 Dec 1.

3. Al-Zahery N, Semino O, Benuzzi G, Magri C, Passarino G, Torroni A, Santachiara-Benerecetti AS. Y-chromosome and mtDNA polymorphisms in Iraq, a crossroad of the early human dispersal and of post-Neolithic migrations. Mol Phylogenet Evol. 2003 Sep;28(3):458-72.

4. Abu-Amero KK, Hellani A, González AM, Larruga JM, Cabrera VM, Underhill PA. Saudi Arabian Y-Chromosome diversity and its relationship with nearby regions. BMC Genet. 2009 Sep 22;10:59.

5. Cinnioğlu C, King R, Kivisild T, Kalfoğlu E, Atasoy S, Cavalleri GL, Lillie AS, Roseman CC, Lin AA, Prince K, Oefner PJ, Shen P, Semino O, Cavalli-Sforza LL, Underhill PA. Excavating Y-chromosome haplotype strata in Anatolia. Hum Genet. 2004 Jan;114(2):127-48. Epub 2003 Oct 29.

6. Gokcumen Ö, Gultekin T, Alakoc YD, Tug A, Gulec E, Schurr TG. Biological ancestries, kinship connections, and projected identities in four central Anatolian settlements: insights from culturally contextualized genetic anthropology. Am Anthropol. 2011;113(1):116-31.

7. Roewer L, Willuweit S, Stoneking M, Nasidze I. A Y-STR database of Iranian and Azerbaijanian minority populations. Forensic Sci Int Genet. 2009 Dec;4(1):e53-5. Epub 2009 Jun 5.

8. Dulik MC, Osipova LP, Schurr TG. Y-chromosome variation in Altaian Kazakhs reveals a common paternal gene pool for Kazakhs and the influence of Mongolian expansions. PLoS One. 2011 Mar 11;6(3):e17548.

9. Haber M, Platt DE, Ashrafian Bonab M, Youhanna SC, Soria-Hernanz DF, et al. (2012) Afghanistan's Ethnic Groups Share a Y-Chromosomal Heritage Structured by Historical Events. PLoS ONE 7(3): e34288. doi:10.1371/journal.pone.0034288

10. Gayden T, Cadenas AM, Regueiro M, Singh NB, Zhivotovsky LA, Underhill PA, Cavalli-Sforza LL, Herrera RJ. The Himalayas as a directional barrier to gene flow. Am J Hum Genet. 2007 May;80(5):884-94.

11. Lacau H, Bukhari A, Gayden T, La Salvia J, Regueiro M, Stojkovic O, Herrera RJ. Y-STR profiling in two Afghanistan populations. Leg Med (Tokyo). 2011 Mar;13(2):103-8. Epub 2011 Jan 14.

Thursday, July 19, 2012

Interpreting New Iranian Y-Chromosomal Data (Grugni et al.) [Review]


Introduction


A new study on Iranian Y-Chromosomes released just yesterday has, to my satisfaction, adequately sampled every major ethno-linguistic group and determined the inter-provincial variation between them. Grugni et al. sampled 938 unrelated Iranian men from 15 ethnic groups (including Assyrians, Zoroastrians and Turkmen) in 14 provinces across the country.


Abstract

"Knowledge of high resolution Y-chromosome haplogroup diversification within Iran provides important geographic context regarding the spread and compartmentalization of male lineages in the Middle East and southwestern Asia. At present, the Iranian population is characterized by an extraordinary mix of different ethnic groups speaking a variety of Indo-Iranian, Semitic and Turkic languages. Despite these features, only few studies have investigated the multiethnic components of the Iranian gene pool. In this survey 938 Iranian male DNAs belonging to 15 ethnic groups from 14 Iranian provinces were analyzed for 84 Y-chromosome biallelic markers and 10 STRs. The results show an autochthonous but non-homogeneous ancient background mainly composed by J2a sub-clades with different external contributions. The phylogeography of the main haplogroups allowed identifying post-glacial and Neolithic expansions toward western Eurasia but also recent movements towards the Iranian region from western Eurasia (R1b-L23), Central Asia (Q-M25), Asia Minor (J2a-M92) and southern Mesopotamia (J1-Page08). In spite of the presence of important geographic barriers (Zagros and Alborz mountain ranges, and the Dasht-e Kavir and Dash-e Lut deserts) which may have limited gene flow, AMOVA analysis revealed that language, in addition to geography, has played an important role in shaping the nowadays Iranian gene pool. Overall, this study provides a portrait of the Y-chromosomal variation in Iran, useful for depicting a more comprehensive history of the peoples of this area as well as for reconstructing ancient migration routes. In addition, our results evidence the important role of the Iranian plateau as source and recipient of gene flow between culturally and genetically distinct populations."

[PDF]


Interpretation of Results

Iranian Y-SNP Frequencies

Data from the original study can be found opposite. In addition, several contour maps showing the frequency of select Y-DNA Haplogroups across the country are shown along the right. Armenians, Zoroastrians and Assyrians from Tehran, as well as Afro-Iranians from Hormozgan province, are excluded. Note that updated ISOGG nomenclature was applied wherever deemed appropriate (refer to the SNPs for clarification of status). Frequency ranges shown on the maps run from 0-100%. Please note the maps are only intended to depict general trends; for specific figures, refer to the data from the study (above).


- Consistent with anthropological data and historical records from South Iran, the Y-DNA Haplogroups that are more frequent in Africa than in Eurasia (B-M60 and E2-M75) peak in Hormozgan province.

- Nine para-Haplogroups (C*-M216, F*-M89, H*-M69, IJ*-M429, J2*-M172, L*-M61, NO*-LLY22g, Q1*-P36.2 and R*-M207) were found scattered across Iran. Although the presence of para-Haplogroups within a region is often taken as an indicator of a lineage's antiquity there, both their consistency and their correspondence with younger downstream clades must be considered before such a conclusion is made. As such, I do not consider the presence of H*-M69, NO*-LLY22g or C*-M216 in this cohort to indicate anything other than Iran's position as a geographic crossroads. The remaining ones (particularly J2*-M172, L*-M61 and R*-M207) require further investigation to elucidate whether Iran can stake a claim to the origins of each.

- Further to the above, it is likely that the R*-M207 reported in this paper is in fact R2*-M479 based on the dated SNP array used.

- C5-M356 makes a sporadic appearance across Iran. It is a mysterious clade with a spotty distribution across much of Eurasia; in the region, it is more commonly associated with the Indian Subcontinent.
Iranian J1c3-PAGE08

- Haplogroup G makes a strong appearance with, in my opinion, enough clade diversity to validate an origin in Iran or a nearby region. This is partially supported by its presence in every ethnic group, albeit through different subclades.

- Although IJ*-M429 has finally been found, Grugni et al.'s decision not to publish STR data does not give us the means to determine if the two Mazandarani and Persian men are in fact related within a genealogical timeframe. The significance of this find in Iran will have to remain pending.

The lacklustre SNP resolution of the Haplogroup I found in Iran (in the Gilaki, Bandari, Kurdish and Armenian populations, split between I1-M253 and I2-M438) dissuades strong conclusions regarding the development of I-M170 relative to IJ*-M429's discovery. The lack of STRs prevents us from ascertaining whether these are recent contributions from Europe, or whether there is any European connection to begin with.

- Both the frequency and subclade diversity of Haplogroup J2-M172 (as well as the presence of J2*-M172 and J2a*-M410 across the country) make Iran a strong candidate for the origin of this lineage.

The strong presence of J1c3-PAGE08 is one of the surprising finds of this study. It is absent only amongst Assyrians from Azarbaijan province and peaks in Khuzestani Arabs (31.6%). I speculate that this is an early Near-Eastern pastoralist nomad marker that is only accentuated in Khuzestani Arabs because the L147.1 marker (J1c3d), which is commonly associated with the expansion of Semitic languages (particularly Arabic in the literature), was not tested here. Otherwise, it would be difficult to reconcile medieval Arabic admixture among Iran's Zoroastrians being comparable to (and often greater than) that among Azeris, for instance, as Azerbaijan hosted Arab garrisons following the Sassanid collapse.

- Haplogroup Q presents a very distorted picture. The 42.6% of Turkmens belonging to Q1a2-M25 is not in agreement with Wells et al.'s The Eurasian Heartland: A continental perspective on Y-chromosome diversity, where Haplogroups J, N, R1a and R1b predominated. This suggests that either an extensive founder effect has taken place (i.e. regionalisation of certain branches from a common Oghuz Turk pool) or the Golestani Turkmen values reflect a more generic form of genetic drift.
On the matter of Turkic affinities, Azeris from Azarbaijan province have greater subclade variation than all other ethnic groups. However, the total frequency is comparable to (or less than) that of Persians nationwide. As it stands, if one were to presume Haplogroup Q in Iran was of Turkic origin, it would appear its contribution to the Persian and Azeri genepools is comparable despite linguistic differences. Although more data would certainly flesh this matter out, this diversity, combined with the presence of N-M231 among Iran's Azeri population, certainly gives a genetic basis for their linguistic heritage.

Haplogroup R1a1a-M17 is regularly found at frequencies greater than 15% across Iran, contrary to the assertion Dr. Wells made a decade ago on the basis of the limited samples he obtained, again from The Eurasian Heartland: A continental perspective on Y-chromosome diversity:

Iranian G2a-P15
"Intriguingly, the population of present-day Iran, speaking a major Indo-European language (Farsi), appears to have had little genetic influence from the M17-carrying Indo-Iranians."

It is somewhat ironic, however, to note that the Persians from Fars province presented one of the lowest R1a1a-M17 frequencies observed in this study. Whether sampling chance is an issue here, or the sparsity of M17 is indeed a reality, is an open question.

- The presence of both R1a1-SRY1532.2 (shown as R1a* due to old nomenclature) and R1b*-M343 reaffirms the presence of these para-Haplogroups in the region, suggesting West Asia was where Haplogroup R1-M173 began differentiating into the two primary subclades we see today in Eurasia.

Haplogroup R1b1a2a-L23 is more frequent in the north and west of the country, which (together with its presence in the furthest southern and eastern poles at ~3%) suggests it likely moved in an overall south-easterly direction via diffusion, probably during the Neolithic.

- The distribution of Haplogroup R2a-M124 is, much like that of C5-M356, irregular. Contrary to what is shown in Haber et al.'s research, R2a is not more common in the east of the country. Instead, it can be found amongst Esfahani Persians at a frequency of 9.1%. That Iran's R2a frequency achieves its peak in the centre of the country is reminiscent of Sahoo et al.'s A prehistory of Indian Y chromosomes: Evaluating demic diffusion scenarios.


The sensationalist question of the hour: what accounts for the spike in R2a-M124 that has been picked up in Central Iran for the past half decade?

- Finally, Haplogroup T-M70 enjoys a frequency of 10.1% amongst Assyrians from Azarbaijan province, whilst also being more common among Persians across the country and Iranians from the western periphery of the country (Azeris and Kurds). This would suggest, therefore, at least a passive but deep association with ancient Near-Eastern cultures.

Criticisms of Paper

Despite the rich sampling pool, I have several immediate criticisms:

Iranian J1-M267
  • There are some issues with the sampling strategy employed by this paper. For instance, the Assyrians (a Christian, non-Arab, Semitic-speaking minority) are represented by 39 men, whereas Persians from Esfahan (a major Iranian city) are represented by only 11.
  • Inadequate haplotype data has been released; the only offering is 8 STRs from select lineages (e.g. J1*-M267) which were used for variance analysis.
  • Furthermore, a maximum of 10 Y-STRs were analysed, rendering some of their variance calculations questionable at such a low resolution. This also does away with the possibility of MRCA and intra-subclade age calculations.
  • Grugni et al. have approached Haplogroup R1a1a-M17 in a similar vein to past studies (e.g. Haber et al., see Showcasing of Y-DNA Variation Among Afghan Ethnic Groups) by not referring to current data concerning the structure of R1a1a. As with Haber et al., R1a1a-M458 is taken as the "European" strain, despite research undertaken by the R1a1a and Subclades Y-DNA Project revealing that the apparent schism between the upstream Z283 and Z93 SNPs is far more informative in this regard.
  • Haplogroup R1b1a2*-L23 is considered a "West Eurasian" paternal contribution to the Iranian plateau, without entertaining the possibility that it may have originated within, or in proximity to, the country's western zone.
  • As shown in Interpretation of Results, Grugni et al.'s use of dated nomenclature poses problems for those who may not be intimately familiar with recent Y-SNP Tree changes by ISOGG.

Acknowledgements

Map of Iran courtesy of D-Maps.com.