Saturday, December 22, 2012

Yaghnobi Tajiks: Preliminary Results May Reveal Iranian Plateau Affinity [Original Work]

Slipping under the radar of the genetic genealogy world is this paper by Elisabetta Cilli and her colleagues, which investigated the mitochondrial data of 62 individuals from Tajikistan's Yaghnobi population. [1]

The Yaghnobis are of interest given their geographical isolation and the East Iranic nature of their language. Spoken just northeast of Dushanbe, the predominantly Persian-speaking (Tajik) capital, Yaghnobi is a continuation of a fully agglutinative Soghdian dialect and the sole survivor of that language following the Persianization of Central Asia in medieval times [2]. Despite its East Iranic vocabulary, Yaghnobi displays several linguistic features (e.g. loss of grammatical gender and the preservation of a past imperfective formed from the present stem of the verb) that separate it from the modern East Iranic languages immediately surrounding it. Adding to the uniqueness of Yaghnobi in this context, these same features align it with languages spoken mostly further west on the Iranian plateau (e.g. Persian, Gilaki, Kurdish dialects). [2]

Although the results are preliminary and no raw data accompany them, Cilli et al. have discovered some interesting connections between the Yaghnobi and relevant populations. In summary, they found the following:

MDS Plot of Results
  • The 42 individuals used for the preliminary work belonged to only 19 distinct mtDNA haplotypes, 11 of which were found only among the Yaghnobi.
  • The Yaghnobi have lower mtDNA genetic diversity than other Central Asian populations (0.930). This is attributed to their geographical isolation and to their displacement by the U.S.S.R. in the 1970s for agricultural purposes, after which a small group (around 300) returned and repopulated their original homelands.
  • Intriguingly, the Yaghnobi shared all of their non-unique haplotypes (8/19) with populations from Iran (e.g. Gilakis, Mazandaranis and Iranians from Tehran and Esfahan) rather than with other Central Asian groups, including their Tajik compatriots.
  • The Yaghnobi shared the greatest number of these haplotypes with Gilakis, Kurmanji Kurds and Avars from the Caucasus (4 each).
  • However, owing to their predominantly distinct mtDNA character, the Yaghnobi are clear outliers from the general zone occupied by the reference groups. 
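A diversity value such as the 0.930 quoted above is usually a haplotype (gene) diversity statistic. The post does not say which estimator Cilli et al. used, so the sketch below assumes Nei's standard unbiased estimator, H = n/(n - 1) × (1 - Σpᵢ²), where n is the number of sequences and pᵢ the frequency of the i-th haplotype. The function name and the toy haplotype labels are hypothetical. Note that the estimator needs the count of every haplotype, so the figure cannot be re-derived from the summary alone (19 haplotypes among 42 individuals).

```python
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: H = n/(n-1) * (1 - sum(p_i**2)).

    `haplotypes` holds one haplotype label per sampled individual.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two individuals are required")
    counts = Counter(haplotypes)
    # Sum of squared haplotype frequencies (chance two draws with replacement match)
    sum_sq = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1 - sum_sq)


# Hypothetical toy sample: four individuals, three distinct haplotypes
print(round(haplotype_diversity(["H1", "H1", "H2", "H3"]), 3))  # 0.833
```

A sample in which everyone shares one haplotype scores 0, and one in which every individual is distinct scores 1, so a drifted, isolated population is expected to sit lower on this scale.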

My critique and interpretation of these results are as follows:

  • At least two instances of genetic drift (a founder effect through geographic isolation and a bottleneck due to the Soviet relocation) are likely responsible for the decreased mtDNA diversity. The reduced diversity is therefore most likely a reflection of these circumstances.
  • Given the Soviet relocation, it would be useful to determine whether the displaced parent population yields results matching those stated here. This should be feasible, as the relocations occurred only about 40 years ago (just over one generation).
  • It is difficult to criticise the decision to test 62 individuals and use 42 of them in the preliminary analysis, given that the Yaghnobi population in their homeland numbered only approximately 500 between 2007 and 2009. Approximately 8% of the entire Yaghnobi population was therefore analysed here, which is a generous sampling fraction given how little attention the region has received.
  • The MDS plot would have benefited from the inclusion of populations in Europe, Southwest Asia and South Asia to comprehensively flesh out the position of Yaghnobis in Eurasia.
  • Accepting that this is a preliminary investigation, it would still have been pleasing to see some raw data published. Aside from confirming that at least one Yaghnobi matched the Cambridge Reference Sequence (CRS, i.e. Haplogroup H2a2a, which happened to be found in all the populations tested), there is no indication of what the other mutations looked like, or, for that matter, of which mtDNA haplogroups were even present!

Correlation with Y-Chromosomal Data?

The Yaghnobi have been studied at least one other time through their inclusion in Dr. Spencer Wells et al.'s seminal piece The Eurasian heartland: a continental perspective on Y-chromosome diversity. The breakdown of their Y-Chromosomal SNP data (n=31) is as follows: [3]

3% C-M130(xC3a3-M48)
32% J2-M172
3% K-M9(xO-M175, O3-M122, O1a-M119, O2a1-M95, N1c1-M46) (possibly a para-haplogroup such as K*-M9)
10% L-M20
3% P-M45(xQ1a1-M120, Q1a3a1-M3, R2a-M124)
32% R1-M173 (likely R1b1a1-M73 or R1b1a2-M269)
16% R1a1a-M17(xR1a-M87, private marker)

Y-SNP clustering reveals Yaghnobis sit near SE Europe and the Near-East

Although the two episodes of genetic drift have undoubtedly affected these frequencies, it is worth pointing out that the Yaghnobi present a broadly similar Y-DNA spectrum to that of Iran, where J2-M172, L-M20, R1-M173 and R1a1a-M17 (including subclades) comprise approximately 53% of the national average (refer to the analysis of Grugni et al.).

This comparison should be taken with a grain of salt: the Iranian national average also includes non-Iranic-speaking ethnic groups, the Wells Yaghnobi data lack thorough downstream Y-SNP resolution, the sample size is contentious and at least two sources of founder effect exist. Nevertheless, that the Yaghnobi appear rich in J2, L and R is certainly reminiscent of Iranic-speaking populations in the region.
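As a rough check on this comparison, the Wells et al. percentages listed above can be converted back into counts. Assuming each figure is a rounded share of the n = 31 sample (an inference, not something stated in the paper), the counts below reproduce every listed percentage and total 31, and the four haplogroups named above then account for 28 of the 31 men, roughly 90%, against the ~53% quoted for Iran. The haplogroup labels omit the exclusion lists for brevity, and the function names are illustrative only.

```python
# Counts inferred from the rounded percentages reported by Wells et al. (n = 31);
# labels omit the exclusion lists (e.g. "xC3a3-M48") for brevity.
YAGHNOBI_COUNTS = {
    "C-M130": 1,
    "J2-M172": 10,
    "K-M9": 1,
    "L-M20": 3,
    "P-M45": 1,
    "R1-M173": 10,
    "R1a1a-M17": 5,
}


def percentages(counts):
    """Percentage of the sample carrying each haplogroup, rounded to whole numbers."""
    total = sum(counts.values())
    return {hg: round(100 * c / total) for hg, c in counts.items()}


def combined_share(counts, haplogroups):
    """Combined percentage of the sample belonging to the given haplogroups."""
    total = sum(counts.values())
    return 100 * sum(counts[hg] for hg in haplogroups) / total


print(percentages(YAGHNOBI_COUNTS))
focal = ["J2-M172", "L-M20", "R1-M173", "R1a1a-M17"]
print(round(combined_share(YAGHNOBI_COUNTS, focal), 1))  # 90.3
```

The rounding also explains why the listed percentages add up to 99% rather than 100%.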


The Yaghnobi are an exceedingly interesting population whose uniparental markers seem to support a connection with populations further west than one would anticipate.

Despite the shortcomings of all the data concerning them to date, the mtDNA similarity does corroborate the specific linguistic features that Yaghnobi shares with languages of the Iranian plateau, such as Kurdish or Persian.

If the data hold up in future investigations, this connection certainly calls into question whether the proposed model of linguistic inheritance exclusively down the paternal line (as represented by Y-DNA data) is entirely correct.

How the Yaghnobi came to carry these markers whilst speaking an East Iranic dialect with traits akin to those found in West Iranic languages is an intriguing question. One possible scenario is that the Yaghnobi are partly descended from ancient Iranians who came from the Iranian plateau during the Achaemenid era. This would also account for the linguistic commonalities noted in the current literature.

Time (with the assistance of more mtDNA, Y-DNA and auDNA) will help us understand what happened in Central Asia during the formative period that was the Indo-Iranian migrations.


1. Cilli E, Delaini P, Costazza B, Giacomello L, Panaino A, Gruppioni G. Ethno-anthropological and genetic study of the Yaghnobis: an isolated community in Central Asia. A preliminary study. J Anthropol Sci. 2011;89:189-94.

2. Windfuhr G, editor. The Iranian Languages. 1st ed. Routledge Language Family Series; 2009.

3. Wells RS, Yuldasheva N, Ruzibakiev R, Underhill PA, Evseeva I, Blue-Smith J, et al. The Eurasian heartland: a continental perspective on Y-chromosome diversity. Proc Natl Acad Sci U S A. 2001 Aug 28;98(18):10244-9.

Sunday, August 19, 2012

Introducing the ACD Tool [Original Work]

It is with satisfaction that I announce the release of my first ever population genetics spreadsheet for fellow researchers. The Ancestral Component Dissection (ACD) Tool is a piece of freeware I have developed to give those with a similar knack for fiddling with ADMIXTURE, Y-SNP and mtDNA frequency data a better means of fleshing out inter-population differences.

ACDTool (v1.0)
How Does The ACD Tool Work?

The ACD Tool relies on the frequencies of "ancestral components", a general catch-all term for uniparental markers (Y-SNP's, mtDNA) and autosomal DNA (auDNA). These form the mainstay of much of the work that has been done in population genetics over the past few decades. The advent of "genome blogger" projects has brought the immediacy of these techniques to those who have tested with personal genetics companies, such as Family Tree DNA (FTDNA) and 23andMe. The ACD Tool should therefore be considered a supplementary aid for those interested in these results, as well as in data procured from the current literature.

The level of commonality that occurs between many populations and ethnic groups poses a problem for those interested in investigating what differences arise between them.

To address this, the ACD Tool removes the component frequencies shared by all sample averages within a region. The idea is to lessen the amount of regional similarity and intentionally exaggerate the differences that exist between neighbours.

This is achieved by finding, for each component, its lowest value across all populations and subtracting that value from every population (the lowest value acting as the benchmark for what is shared), leaving only the differences behind.
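As a minimal sketch of the subtraction described above (not the spreadsheet's actual macro code, which this post does not reproduce), the Python below takes each component's lowest value across the populations as the shared benchmark and subtracts it from every population. The function name, the component labels K1 to K3 and the figures are hypothetical; the sketch also assumes that a component missing from a population counts as 0 and that no renormalisation follows.

```python
def acd_dissect(populations):
    """Remove the share of each component common to all populations.

    `populations` maps a population name to {component: frequency}. For each
    component, the lowest frequency across all populations is taken as the
    shared benchmark and subtracted from every population.
    """
    components = sorted({c for freqs in populations.values() for c in freqs})
    # Benchmark: the minimum value of each component across all populations
    # (a component absent from a population is treated as 0).
    baseline = {
        c: min(freqs.get(c, 0.0) for freqs in populations.values())
        for c in components
    }
    return {
        name: {c: freqs.get(c, 0.0) - baseline[c] for c in components}
        for name, freqs in populations.items()
    }


# Hypothetical component averages (percent) for two neighbouring populations
example = {
    "PopA": {"K1": 50.0, "K2": 30.0, "K3": 20.0},
    "PopB": {"K1": 40.0, "K2": 45.0, "K3": 15.0},
}
print(acd_dissect(example))
# {'PopA': {'K1': 10.0, 'K2': 0.0, 'K3': 5.0}, 'PopB': {'K1': 0.0, 'K2': 15.0, 'K3': 0.0}}
```

In this toy case each population retains 15 of its original 100 percentage points; everything else was common to both.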

What Experiments Are Ideal?

As the ACD Tool is intended for finer inter-population analysis, it is best applied in a regional context. It serves the purpose of better revealing genetic differences which may account for linguistic or micro-regional trends.

Example #1: Northeast Europeans (Dodecad)

Once the Polish, Russian and Finnish Dodecad cohort averages were run through the ACD Tool, I simply used Excel to create the charts. The "Before-After" feature highlights how well the tool achieves its intended goal of amplifying the genetic differences between them:

NE European auDNA (Dodecad) through the ACD Tool

Example #2: West Asians (Harappa)
Using Harappa Ancestry Project data this time, I ran the averages for Armenians, Assyrians, Kurds and Iranians (mostly from the Harappa cohort) through the ACD Tool once more and presented the differences as above:

W Asian auDNA (Harappa) through the ACD Tool

Example #3: South-Central Asians (Eurogenes)
A final example pits Pathans, Jatts, the Burusho, Balochis and Brahuis against one another:

SC Asian auDNA (Eurogenes) through the ACD Tool

Are There Any Drawbacks?
The efficacy of the ACD Tool depends on the number of populations, cohort size and cohort specificity. As the examples above show, the level of inter-population component sharing may decrease greatly when groups from more genetically diverse regions are compared.

In addition, using the ACD Tool on populations that are too different (e.g. Han Chinese and Yoruba) will not work, because their overlap in ADMIXTURE components, Y-SNP's or mtDNA is negligible. Comparing such populations would, of course, defeat the point of the tool in the first place.

Lastly, the tool requires Macros to be enabled for the instructions to work.
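One hypothetical way to anticipate the first two drawbacks above before running the tool is to total the per-component minimums, i.e. the amount a min-subtraction would remove from every population. This diagnostic is an illustrative addition rather than a feature of the ACD Tool, and the population and component names are hypothetical.

```python
def shared_total(populations):
    """Sum over components of the lowest frequency across all populations.

    This is the amount a min-subtraction would remove from every population;
    a component absent from a population counts as 0.
    """
    components = {c for freqs in populations.values() for c in freqs}
    return sum(
        min(freqs.get(c, 0.0) for freqs in populations.values())
        for c in components
    )


# Hypothetical profiles (percent)
neighbours = {"PopA": {"K1": 60, "K2": 40}, "PopB": {"K1": 50, "K2": 50}}
print(shared_total(neighbours))  # 90: most of each profile is shared

# Adding a more divergent population lowers the shared baseline
print(shared_total({**neighbours, "PopC": {"K1": 30, "K2": 70}}))  # 70

# Populations with no components in common share nothing
print(shared_total({"PopD": {"K1": 100}, "PopE": {"K3": 100}}))  # 0.0
```

A shared total near zero means the subtraction leaves the populations unchanged, which is exactly the Han Chinese and Yoruba situation described above.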


The ACD Tool is an open-source, free-to-use spreadsheet. Those wishing to modify the spreadsheet for their personal use are welcome to do so. However, anyone who modifies the ACD Tool with the intent of redistributing it is kindly asked to contact the creator (myself) beforehand, out of common courtesy.

Please also note that the ACD Tool is a first attempt at giving back to the genealogy world I have been a part of for several years. Though functional (as shown above), it is not without bugs. In light of this, I am not responsible for any loss of data that may occur from its use.

Finally, I hope the genealogy world finds some use for this nifty piece of kit.


To the Dodecad Ancestry Project, Harappa Ancestry Project and Eurogenes Genetic Ancestry Project (auDNA used in the examples).

Addendum I [20/08/2012]: ACDTool v1.1 replaces v1.0, with smoothed macros and refined instructions. A Eurogenes South-Central Asian example has also been added.

Saturday, August 4, 2012

West Asian Y-DNA Haplogroup Q - Turkish or Autochthonous Origins? [Original Work]

Genographic Project Y-DNA Q Migration Route

Y-DNA Haplogroup Q is defined by the M242 marker and is downstream of Haplogroup P-M45, making it the sister haplogroup of R-M207, which populates much of West Eurasia. According to the Genographic Project, Haplogroup Q-M242 is between 15,000 and 20,000 years old, with its origin consistently placed in North Eurasia.

The frequency of Haplogroup Q largely matches the migration path outlined in the maps shown opposite. However, the presence of haplogroup Q in more southwestern portions of Asia has sparked the curiosity of genealogists and observers alike. In current literature, the presence of Haplogroup Q1a2-M25 specifically in Iran is cited as "Central Asian" influence. [1]

In an attempt to uncover the origins of Haplogroup Q-M242 in West Asia, the Y-STR haplotype variation of West, Central and South Asian Q1a-MEH2 and Q1b-M378 is visualised and analysed here with genealogical tools.

The data for this investigation are gathered from various Family Tree DNA (FTDNA) projects and studies, [1,2,6-11] with the concise list shown in the References section below.

Only results presenting at least 16 Y-STR's were considered. Modifications were made as necessary on certain STR markers (particularly Y-GATA H4) to correct nomenclature differences. Urasin's YPredictor was used when the Y-SNP information from studies was inadequate (e.g. no SNP's downstream of Q-M242 tested).

FTDNA Y-DNA Q Migration Route

Samples follow a consistent naming convention, with _n and _yQP_n suffixes indicating that they were obtained from studies and FTDNA Projects respectively. The following populations were included:

Irn = Iranian (Unspecified ethnicity), Azr_Tal = Talysh from the Republic of Azerbaijan, Trk/Tur = Anatolian Turkish, Ptn = Pashtun from Afghanistan, Ind = Indian (Unspecified ethnicity/caste), Irq = Iraqi (Unspecified ethnicity), Kzk = Kazakh, Pak = Pakistani (Unspecified ethnicity), Uzb = Uzbek, Tjk = Tajik, Haz = Hazara, Npl = Nepali, Arm = Armenian, Geo = Georgian, UAE = Emirati Arab, Irn_Arab = Iranian Arab (Khuzestan), Irn_Mzn = Iranian Mazandarani (Mazandaran), Irn_Bkt = Iranian Bakhtiari

Once collation was complete, haplotrees were constructed and clusters were inferred from them, with the modal haplotype of each inferred cluster determined where necessary. The time to the Most Recent Common Ancestor (TMRCA) of selected clusters was calculated by comparing the modal haplotypes of the first pair of intra-cluster branches. Because of the STR panels tested in the relevant papers (Y-Filer order 1), McGee's Y-Utility was the only immediately viable choice (infinite alleles mutation model, 75% probability, 25 years/generation).
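Two basic operations behind this procedure, finding a cluster's modal haplotype and counting differences between two haplotypes under an infinite alleles model (where any difference at a marker counts as a single step, however many repeats it spans), can be sketched as follows. This covers only those two steps; it does not reproduce the probability-based TMRCA calculation in McGee's Y-Utility. The function names and allele values are hypothetical, and the sketch assumes one value per marker name (multi-copy markers such as DYS385a/b entered as separate keys).

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Most frequent allele at each marker across a cluster's haplotypes.

    Assumes every haplotype carries the same markers. Ties go to the allele
    encountered first (Counter.most_common ordering).
    """
    markers = haplotypes[0].keys()
    return {
        m: Counter(h[m] for h in haplotypes).most_common(1)[0][0]
        for m in markers
    }


def infinite_alleles_distance(a, b):
    """Count markers whose alleles differ, one step per differing marker.

    Returns (differences, markers_compared); only markers present in both
    haplotypes are compared.
    """
    shared = [m for m in a if m in b]
    differences = sum(1 for m in shared if a[m] != b[m])
    return differences, len(shared)


# Hypothetical four-marker haplotypes for a three-member cluster
cluster = [
    {"DYS19": 15, "DYS389i": 12, "DYS392": 13, "DYS439": 11},
    {"DYS19": 15, "DYS389i": 12, "DYS392": 14, "DYS439": 11},
    {"DYS19": 14, "DYS389i": 12, "DYS392": 13, "DYS439": 13},
]
modal = modal_haplotype(cluster)
print(modal)  # {'DYS19': 15, 'DYS389i': 12, 'DYS392': 13, 'DYS439': 11}
steps, compared = infinite_alleles_distance(modal, cluster[2])
print(f"{steps}-step difference, matching at {compared - steps}/{compared} markers")
# 2-step difference, matching at 2/4 markers
```

Note that DYS439 differs by two repeats yet contributes only one step, which is the defining feature of the infinite alleles model; a stepwise model would count it as two.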

Working Hypothesis
An indeterminable mix of recent (<1500 ybp) and prehistoric Y-DNA Q1a-MEH2 and Q1b-M378 lines exists in the region, with some instances of close haplotype sharing between West, South and Central Asia.

Limitations Of This Investigation
  • Although the number of STR panels tested has increased gradually over the past decade, 16 is not considered a "confident sell" in the genealogy world. 
  • Additionally, the difference in STR panels used meant that some informative populations, such as the Makrani, Baloch, Burusho and Parsis of Pakistan, were not included because only 12 STR's overlapped.
  • Y-STR's from several crucial populations, such as the Qashqai, Iraqi Turkoman and Azeris from the Republic of Azerbaijan, could not be found.
  • There is, of course, the great debate concerning STR mutation rates. At the time of writing I have not observed any clear consensus in the genealogy world regarding this topic. The applicability of Nordtvedt's Generations series to this entry is minimal due to an STR overlap issue, hence the decision to use McGee's tool instead.
  • As discussed later, the number of Y-SNP's tested across the cited studies is insufficient to draw firm conclusions.
  • Finally, sample size is an issue. The dataset is dominated by Iranian and Afghan samples because the papers covering them were released at a time (2008 to present) when the 17-STR Y-Filer panel had become mainstream.

Y-DNA Q1a Phylogenetic Tree
Haplogroup Q1a STR Results
Four informative clusters were inferred:

  • Cluster A (DYS19=15, DYS389i=12) is largely restricted to Afghan Pashtuns, with Ptn_1-4 all sharing an MRCA with their modal haplotype (and therefore likely founding haplotype) between 900 and 450 ybp. This result is consistent with the dominance of Turkic-speaking dynasties in this period.
  • Cluster B (DYS385a=14) has a large geographical spread from Turkey through to Iran, the United Arab Emirates, Afghanistan, Nepal and Kazakhstan. The most immediate observation is the close haplotype sharing between Kzk_1 and Irn_4 (a 3-step difference, matching at 14/17 markers), with an estimated MRCA at 900 ybp. This result, together with the general area covered, again indicates this cluster should at the very least be broadly associated with Central Asian Turks.
  • Cluster C (DYS392=16, DYS389ii=28, DYS448=22) is interesting because its members are exclusively Iranian and come from Haber et al.'s Influences of history, geography, and religion on genetic structure: the Maronites in Lebanon. [2] Most of the Iranians bearing Haplogroup Q-M242 in that sample were from West Iran, where Iran's Azeri population happens to dominate the northern region. The regional exclusivity of this cluster, combined with its very recent MRCA (900 ybp), leads me to suspect that Haber and his associates sampled a locale in West Iran that underwent genetic drift, which would explain a Q-M242 frequency of over 10% not seen in other studies. [1] Nonetheless, the MRCA also suggests these Iranian men's paternal ancestor was associated with the Medieval Turks, even if the result as a whole does not represent West Iran sufficiently.
  • Cluster D (DYS439=11, DYS437=15) mirrors Cluster B's distribution across the region, but its divisions are more consistent with geography than with other variables (e.g. the Anatolian Turkish and Armenian samples grouping together, as do the Hazara samples).

Y-DNA Q1b Phylogenetic Tree
Haplogroup Q1b STR Results
Five informative clusters were inferred:

  • Cluster A (DYS385a=12, DYS439=11, DYS437=15) is, relative to the others, an early offshoot that is highly localised in South-Central Asia. 
  • Cluster B (DYS385a=14) is also localised, found specifically in Iraq and Iran.
  • Cluster C (DYS385a=14, DYS448=20) is twinned with B but appears to have a younger MRCA (925 ybp). Of interest is the wide geographic distribution across Turkey, Iran, India and Kazakhstan. Central Asian Turks once more provide a convenient historical narrative for both the predicted MRCA and spread.
  • Cluster D (DYS385a=15) is again geographically localised, this time in the greater Near-East (Turkey, Iran and Syria). 
  • Cluster E (DYS385a=12, DYS437=15) once more displays geographic localisation in South-Central Asia, specifically among Afghan Pashtuns and a Pakistani from an FTDNA Project.

SNP's - What Do They Tell Us?
Tabulated Y-DNA Q SNP's for select populations from several studies [1, 3-5] can be viewed in the Vaêdhya Data Sink.

There is, unfortunately, a two-pronged incompatibility issue between the Y-STR analysis and the Y-SNP's provided here. Not only is there poor overlap between the populations covered in both sets, but the SNP selections in the four studies do not provide us with a clear picture regarding the presence of Q*-M242(xQ1a-MEH2, xQ1b-M378), Q1a*-MEH2(xQ1a2-M25), Q1a2-M25 and Q1b-M378.

However, the distribution of Q1a3-M346 and Q1b-M378 across the Iranian plateau, in contrast with the specificity of Q1a2-M25 to Azeri Iranians and Turkmen (1.6% and 42.6% respectively, although the latter is likely due to genetic drift as discussed here), suggests that a strain of the first two lineages is linguistically neutral and preceded the millennium of Turkish dynastic dominance in Iran.

Fortunately, such an inference is indeed supported by the Q1a and Q1b phylogenetic trees shown in this entry. One will note (particularly with Q1b-M378) the distribution is largely geographical rather than covering large swathes of Asian land through a "recent" paternal ancestor.

A comment on Assyrian Q-M242
Although the number of STR markers tested does not allow its inclusion in this research piece, I took the liberty of comparing the sole Assyrian Y-DNA Haplogroup Q-M242 individual from the FTDNA Assyrian Heritage DNA Project with the samples above to shed light on his paternal ancestor's ultimate origins.

The Assyrian people are a Neo-Aramaic-speaking ethnic minority native to the region where Turkey, Iran and Iraq meet, as well as the Mesopotamian basin. Modern Assyrians have (due to their Christian faith and recent historical events) practised endogamy, making them a genetically distinct group minimally affected by demic movements in the surrounding populations.

The Assyrian Y-DNA Q belongs to the Q1b1a-L245 subclade. As we have observed already, haplogroup Q1b-M378 tends to have a distribution governed more by geography with deeper cluster branches, implying greater diversification time in a given region.

At present, based on the 10 overlapping STR's available, the Assyrian Q1b1a-L245 individual best matches Tur_yQP_3, with a one-step difference (9/10 markers matching), placing him deep within Q1b Cluster C, the only Q1b cluster without a region-specific distribution. This preliminary evaluation indicates this Assyrian man's paternal ancestor shares Medieval genetic links with Anatolian Turkish, Iranian, Indian and Kazakh men, making a Central Asian Turkish connection likely once more.

Given the limitations described above, the clusters identified here are most informative in terms of their geographic spread. The MRCA calculations shown are simply an extremely rough estimate of the age of a cluster.

However (and fortunately once more), it is very clear that some clusters are determined by geography rather than the sort of "genealogical boon" observed in a few (e.g. Q1a Cluster C's extensive branching despite being young relative to the others).

If one takes the MRCA calculations as a very rough approximation, whilst considering a cluster's ability to transcend regional boundaries, one can estimate that 75.5% (40/53) of Y-DNA Haplogroup Q1a-MEH2 and 31.4% (11/35) of Y-DNA Haplogroup Q1b-M378 in West, Central and South Asia can be attributed to the Turkish migrations.
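The two proportions are simple ratios of samples in regionally unbounded clusters to all samples of each haplogroup. Rounded to one decimal place, 40/53 gives 75.5% (truncation gives 75.4%) and 11/35 gives 31.4%; the helper name below is illustrative only.

```python
def percent(part, whole, ndigits=1):
    """Share of `part` in `whole` as a percentage, rounded to `ndigits` places."""
    return round(100 * part / whole, ndigits)


print(percent(40, 53))  # 75.5  (Q1a-MEH2 samples attributed to the Turkish migrations)
print(percent(11, 35))  # 31.4  (Q1b-M378 samples attributed to the Turkish migrations)
```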

In summary, Y-DNA Haplogroup Q1a-MEH2 (likely Q1a2-M25 based on anecdotal SNP evidence) is a convincing Medieval Central Asian Turkish genetic marker, based specifically on its ability to form multi-ethnic clusters in regions with a historical Turkish connection. Q1b-M378, on the other hand, generally displays enough regionalisation and cluster depth to make such an association doubtful at best, the sole exception being members of the genetic group highlighted in this entry (Cluster C) with DYS385a=14 and DYS448=20.

South Central Asian Q1b-M378 appears to be autochthonous, whereas any form of Q1a-MEH2 in the region has a strong association with regions intimately connected with the Medieval Turks. The Anatolian highlands and the Iranian plateau, however, appear to be a complicated mix of the two, given the lack of clear distinctions.

The slim presence of Haplogroup Q in India, on the other hand, is (as far as the current data indicate) almost entirely of Medieval Turkic input, although the Subcontinent's position as a geographic nexus (much like Iran and Turkey) certainly opens the possibility for exotic para-haplogroups to also exist there.

  • Gratitude is extended to the FTDNA Projects for making their data publicly available. Independent research ventures such as my own would not be possible without their generosity.
  • I would also like to thank Mr. Paul Givargidze, administrator of the Assyrian Heritage, Aramaic and Y-DNA J1* DNA Projects at FTDNA for providing his esteemed support on this research entry.
  • The Y-DNA Haplogroup Q migration route maps are courtesy of the Genographic Project and FTDNA.

Addendum I [5/08/2012]: It has been brought to my attention that Tur_yQP_3, the Assyrian Q1b1a's best match, is in fact an Armenian individual. Although this does not compromise the conclusions reached above, it does serve as a reminder that not everyone in the Republic of Turkey is an ethnic Turk!
Addendum II [6/08/2012]: A recent exchange on a forum highlighted the likelihood of several Turk_yQP samples being Armenian rather than Anatolian Turkish. As above, this should not greatly affect the findings discussed in this entry.

1. Grugni V, Battaglia V, Hooshiar Kashani B, Parolo S, Al-Zahery N, et al. (2012) Ancient Migratory Events in the Middle East: New Clues from the Y-Chromosome Variation of Modern Iranians. PLoS ONE 7(7): e41252. doi:10.1371/journal.pone.

2. Haber M, Platt DE, Badro DA, Xue Y, El-Sibai M, Bonab MA, Youhanna SC, Saade S, Soria-Hernanz DF, Royyuru A, Wells RS, Tyler-Smith C, Zalloua PA; Genographic Consortium. Influences of history, geography, and religion on genetic structure: the Maronites in Lebanon. Eur J Hum Genet. 2011 Mar;19(3):334-40. Epub 2010 Dec 1.

3. Al-Zahery N, Semino O, Benuzzi G, Magri C, Passarino G, Torroni A, Santachiara-Benerecetti AS. Y-chromosome and mtDNA polymorphisms in Iraq, a crossroad of the early human dispersal and of post-Neolithic migrations. Mol Phylogenet Evol. 2003 Sep;28(3):458-72.

4. Abu-Amero KK, Hellani A, González AM, Larruga JM, Cabrera VM, Underhill PA. Saudi Arabian Y-Chromosome diversity and its relationship with nearby regions. BMC Genet. 2009 Sep 22;10:59.

5. Cinnioğlu C, King R, Kivisild T, Kalfoğlu E, Atasoy S, Cavalleri GL, Lillie AS, Roseman CC, Lin AA, Prince K, Oefner PJ, Shen P, Semino O, Cavalli-Sforza LL, Underhill PA. Excavating Y-chromosome haplotype strata in Anatolia. Hum Genet. 2004 Jan;114(2):127-48. Epub 2003 Oct 29.

6. Gokcumen Ö, Gultekin T, Alakoc YD, Tug A, Gulec E, Schurr TG. Biological ancestries, kinship connections, and projected identities in four central Anatolian settlements: insights from culturally contextualized genetic anthropology. Am Anthropol. 2011;113(1):116-31.

7. Roewer L, Willuweit S, Stoneking M, Nasidze I. A Y-STR database of Iranian and Azerbaijanian minority populations. Forensic Sci Int Genet. 2009 Dec;4(1):e53-5. Epub 2009 Jun 5.

8. Dulik MC, Osipova LP, Schurr TG. Y-chromosome variation in Altaian Kazakhs reveals a common paternal gene pool for Kazakhs and the influence of Mongolian expansions. PLoS One. 2011 Mar 11;6(3):e17548.

9. Haber M, Platt DE, Ashrafian Bonab M, Youhanna SC, Soria-Hernanz DF, et al. (2012) Afghanistan's Ethnic Groups Share a Y-Chromosomal Heritage Structured by Historical Events. PLoS ONE 7(3): e34288. doi:10.1371/journal.pone.0034288

10. Gayden T, Cadenas AM, Regueiro M, Singh NB, Zhivotovsky LA, Underhill PA, Cavalli-Sforza LL, Herrera RJ. The Himalayas as a directional barrier to gene flow. Am J Hum Genet. 2007 May;80(5):884-94.

11. Lacau H, Bukhari A, Gayden T, La Salvia J, Regueiro M, Stojkovic O, Herrera RJ. Y-STR profiling in two Afghanistan populations. Leg Med (Tokyo). 2011 Mar;13(2):103-8. Epub 2011 Jan 14.

Thursday, July 19, 2012

Interpreting New Iranian Y-Chromosomal Data (Grugni et al.) [Review]


A new study on Iranian Y-chromosomes released just yesterday has, to my satisfaction, adequately sampled every major ethno-linguistic group and determined the inter-provincial variation between them. Grugni et al. sampled 938 unrelated Iranian men from 15 ethnic groups (including Assyrians, Zoroastrians and Turkmen) in 14 provinces across the country.


"Knowledge of high resolution Y-chromosome haplogroup diversification within Iran provides important geographic context regarding the spread and compartmentalization of male lineages in the Middle East and southwestern Asia. At present, the Iranian population is characterized by an extraordinary mix of different ethnic groups speaking a variety of Indo-Iranian, Semitic and Turkic languages. Despite these features, only few studies have investigated the multiethnic components of the Iranian gene pool. In this survey 938 Iranian male DNAs belonging to 15 ethnic groups from 14 Iranian provinces were analyzed for 84 Y-chromosome biallelic markers and 10 STRs. The results show an autochthonous but non-homogeneous ancient background mainly composed by J2a sub-clades with different external contributions. The phylogeography of the main haplogroups allowed identifying post-glacial and Neolithic expansions toward western Eurasia but also recent movements towards the Iranian region from western Eurasia (R1b-L23), Central Asia (Q-M25), Asia Minor (J2a-M92) and southern Mesopotamia (J1-Page08). In spite of the presence of important geographic barriers (Zagros and Alborz mountain ranges, and the Dasht-e Kavir and Dash-e Lut deserts) which may have limited gene flow, AMOVA analysis revealed that language, in addition to geography, has played an important role in shaping the nowadays Iranian gene pool. Overall, this study provides a portrait of the Y-chromosomal variation in Iran, useful for depicting a more comprehensive history of the peoples of this area as well as for reconstructing ancient migration routes. In addition, our results evidence the important role of the Iranian plateau as source and recipient of gene flow between culturally and genetically distinct populations."


Interpretation of Results

Iranian Y-SNP Frequencies

Data from the original study can be found opposite. In addition, several contour maps showing the frequency of select Y-DNA Haplogroups found across the country are shown along the right. Armenians, Zoroastrians and Assyrians from Tehran, as well as Afro-Iranians from Hormozgan province, are excluded. Note that updated ISOGG nomenclature was applied wherever deemed appropriate (refer to SNP's for clarification of status). Frequency ranges shown on maps are from 0-100%. Please note the maps are only intended to depict general trends rather than specific figures. Refer to the figures from the study (above) for these.

- Consistent with anthropological data and historical records from South Iran, the Y-DNA Haplogroups with frequencies greater in Africa than Eurasia (B-M60 and E2-M75) peak in Hormozgan province. 

- Nine para-Haplogroups (C*-M216, F*-M89, H*-M69, IJ*-M429, J2*-M172, L*-M61, NO*-LLY22g, Q1*-P36.2 and R*-M207) were found scattered across Iran. Although the presence of para-Haplogroups within a region is often taken as an indicator of a lineage's antiquity there, both their consistency and their correspondence with younger downstream clades must be considered before such a conclusion is made. As such, I do not consider the presence of H*-M69, NO*-LLY22g or C*-M216 in this cohort to indicate anything other than Iran's position as a geographic crossroads. The remaining ones (particularly J2*-M172, L*-M61 and R*-M207) require further investigation to establish whether Iran can lay claim to the origin of each.

- Further to the above, it is likely that the R*-M207 reported in this paper is in fact R2*-M479 based on the dated SNP array used.

- C5-M356 makes a sporadic appearance across Iran. This mysterious clade has a spotty distribution across much of Eurasia and, in the region, is more commonly associated with the Indian Subcontinent.
Iranian J1c3-PAGE08

- Haplogroup G makes a strong appearance with, in my opinion, enough clade diversity to validate an origin in Iran or a nearby region. This is partially supported by its presence in every ethnic group, albeit through different subclades.

- Although IJ*-M429 has finally been found, Grugni et al.'s decision not to publish STR data does not give us the means to determine if the two Mazandarani and Persian men are in fact related within a genealogical timeframe. The significance of this find in Iran will have to remain pending.

The limited SNP resolution of the Haplogroup I found in Iran (among Gilaki, Bandari, Kurdish and Armenian men, split between I1-M253 and I2-M438) discourages strong conclusions regarding the development of I-M170 relative to the discovery of IJ*-M429. The lack of STR's also prevents us from ascertaining whether these are recent contributions from Europe, or whether there is any European connection to begin with.

- Both the frequency and subclade diversity of Haplogroup J2-M172 (as well as the presence of J2*-M172 and J2a*-M410 across the country) makes Iran a strong candidate for the origin of this lineage.

The strong presence of J1c3-PAGE08 is one of the surprising findings of this study. It is absent only among Assyrians from Azarbaijan province and peaks in Khuzestani Arabs (31.6%). I speculate that this is an early Near-Eastern pastoralist nomad marker, accentuated in Khuzestani Arabs only because the L147.1 marker (J1c3d), which is commonly associated with the expansion of Semitic languages (particularly Arabic in the literature), was not tested here. Otherwise, it would be difficult to explain why medieval Arab admixture among Iran's Zoroastrians would be comparable to (and often greater than) that among Azeris, for instance, given that Azerbaijan hosted Arab garrisons following the Sassanid collapse.

- Haplogroup Q presents a distorted picture. The finding that 42.6% of Turkmens belong to Q1a2-M25 disagrees with Wells et al.'s The Eurasian Heartland: A continental perspective on Y-chromosome diversity, where Haplogroups J, N, R1a and R1b predominated. This suggests either an extensive founder effect (i.e. regionalisation of certain branches from a common Oghuz Turk pool) or a more generic form of genetic drift in the Golestani Turkmen sample.
On the matter of Turkic affinities, Azeris from Azarbaijan province show greater subclade variation than all other ethnic groups. However, their total frequency is comparable to (or lower than) that of Persians nationwide. As it stands, if one were to presume Haplogroup Q in Iran was of Turkic origin, its contribution to the Persian and Azeri gene pools would appear comparable despite their linguistic differences. Although more data would certainly flesh this matter out, this diversity, combined with the presence of N-M216 among Iran's Azeri population, certainly gives a genetic basis for their linguistic heritage.

Haplogroup R1a1a-M17 is regularly found at frequencies greater than 15% across Iran, contrary to the assertion Dr. Wells made a decade ago on the basis of the limited samples he obtained, again in The Eurasian Heartland: A continental perspective on Y-chromosome diversity:

Iranian G2a-P15
"Intriguingly, the population of present-day Iran, speaking a major Indo-European language (Farsi), appears to have had little genetic influence from the M17-carrying Indo-Iranians."

It is somewhat ironic, however, to note that the Persians from Fars province presented one of the lowest R1a1a-M17 frequencies observed in this study. Whether sampling chance is an issue here, or the sparsity of M17 is indeed a reality, is an open question.

- The presence of both R1a1-SRY1532.2 (shown as R1a* due to old nomenclature) and R1b*-M343 reaffirms the presence of these para-Haplogroups in the region, indicating that West Asia was where Haplogroup R1-M173 began differentiating into the two primary subclades we see today in Eurasia.

Haplogroup R1b1a2a-L23 is more frequent in the north and west of the country, which (together with its presence at ~3% in the furthest southern and eastern poles) suggests it likely moved in an overall south-easterly direction via diffusion, probably during the Neolithic.

- The distribution of Haplogroup R2a-M124 is, much like C5-M356, irregular. Contrary to what is shown in Haber et al.'s research, R2a is not more common in the east of the country. Instead, it can be found amongst Esfahani Persians at a frequency of 9.1%. That Iran's R2a frequency reaches its peak in the centre of the country is reminiscent of Sahoo et al.'s A prehistory of Indian Y chromosomes: Evaluating demic diffusion scenarios.

The sensationalist question of the hour: what accounts for the spike in R2a-M124 that has been picked up in Central Iran over the past half-decade?

- Finally, Haplogroup T-M70 reaches a frequency of 10.1% amongst Assyrians from Azarbaijan province, whilst also being more common among Persians across the country and among Iranians from the western periphery of the country (Azeris and Kurds). This suggests an association with ancient Near-Eastern cultures that is, at the very least, passive but deep.

Criticisms of Paper

Despite the rich sampling pool, I have several immediate criticisms:

Iranian J1-M267
  • There are some issues with the sampling strategy employed by this paper. For instance, the Assyrians (a Christian, non-Arab, Semitic-speaking minority) are represented by 39 men, whereas Persians from Esfahan (a major Iranian city) are represented by only 11.
  • Inadequate haplotype data has been released; the only offering is 8-STR haplotypes from select lineages (e.g. J1*-M267), which were used for variance analysis.
  • Furthermore, a maximum of 10 Y-STRs was analysed, rendering some of their variance calculations questionable at such low resolution. This also rules out meaningful MRCA and intra-subclade age calculations.
  • Grugni et al. have approached Haplogroup R1a1a-M17 in a similar vein to past studies (e.g. Haber et al., see Showcasing of Y-DNA Variation Among Afghan Ethnic Groups) by not referring to current data on the structure of R1a1a. As with Haber et al., R1a1a-M458 is taken as the "European" strain, despite research by the R1a1a and Subclades Y-DNA Project revealing that the split between the upstream Z283 and Z93 SNPs is far more informative in this regard.
  • Haplogroup R1b1a2*-L23 is treated as a "West Eurasian" paternal contribution to the Iranian plateau, without allowing for the possibility that it originated within or close to the country's western zone.
  • As shown in Interpretation of Results, Grugni et al.'s use of dated nomenclature poses problems for those who may not be intimately familiar with recent Y-SNP Tree changes by ISOGG.
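To show why so few loci weaken variance-based arguments, here is a minimal Python sketch of one common summary statistic: the mean of the per-locus sample variances of repeat counts. The haplotypes are hypothetical placeholders rather than Grugni et al.'s data, and the exact estimator the authors used is an assumption not stated here.

```python
from statistics import mean, variance


def average_str_variance(haplotypes: list[list[int]]) -> float:
    """Mean of the per-locus sample variances across a set of Y-STR haplotypes.

    Each haplotype is a list of repeat counts, one per locus, in a fixed order.
    """
    n_loci = len(haplotypes[0])
    if any(len(h) != n_loci for h in haplotypes):
        raise ValueError("all haplotypes must cover the same loci")
    # Transpose so each tuple holds every man's repeat count at one locus.
    per_locus = zip(*haplotypes)
    return mean(variance(repeats) for repeats in per_locus)


# Hypothetical 4-locus haplotypes for three men (illustrative only).
sample = [
    [14, 12, 23, 10],
    [14, 13, 23, 11],
    [15, 12, 22, 10],
]
print(round(average_str_variance(sample), 3))
```

With only eight to ten loci, a single fast-mutating marker can dominate this average, which is one reason such estimates are fragile at low resolution.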


Map of Iran courtesy of

Tuesday, July 17, 2012

The Secrets of Central Asia: Chapter II - The Nomads of West Siberia [Review]

Molodin et al. have conveniently released an exciting paper just days ago, revealing the convergence and possible origins of maternal lines in several West Siberian sites across different points of time.

The authors made the following conclusions based on the data they had gathered:

"We therefore consider the appearance of the Haplogroup T-lineage as the most likely genetic marker of the Andronovo migration wave to the region....
Apparently, the Andronovo group... assimilated the aboriginal... population, from which it obtained these East-Eurasian mtDNA haplogroups. Obviously, there was reciprocal genetic contact between the migrant and indigenous groups in the region.
...These [autochthonous] components were represented by the Eastern Eurasian haplogroups A, C and Z, and the Western Eurasian haplogroup U5a. On the other hand, the results also reveal some changes in the mtDNA pool structure throughout the Bronze Age. Some of these changes, which point to migration waves to the West Siberian forest steppe zone, are in agreement with the archaeological and anthropological evidence. The most relevant migration waves occurred during the Middle Bronze Age (represented by the migration of the Andronovo culture, probably marked by Haplogroup-T lineages) and the transition from the Bronze to the Iron Age (represented by the migration from the south, marked by the U1a, U3 and H haplogroup lineages)."


In this blog entry, these conclusions are scrutinised alongside the deeper ancestral associations of these haplogroup lineages with modern (and other ancient) populations.

The Original Paper's Findings
A total of 92 ancient DNA (aDNA) haplotypes in the form of mitochondrial DNA (mtDNA) were retrieved from five sites stratified across seven distinct archaeological periods in a fixed portion of West Siberia known as the Baraba forest-steppe, which lies between the Irtysh and Ob rivers. These haplotypes were obtained from Hypervariable Region 1 (HVR1) of mtDNA and are listed in the original study (Table 3).

Sampling Sites in Baraba Forest-Steppe

No burial remains dating to the Pleistocene (11th-12th millennium BC), the earliest period in which anatomically modern humans reached this region, have been found in or around the Baraba forest-steppe, so the ultimate origins of the Early Bronze Age lineages are left open to interpretation. Nonetheless, below is a summary of each archaeological culture showcased in the paper, as well as relevant extracts from the literature. [1]

Ust-Tartas (4000-3000 B.C.)

Based on anthropological data, the inhabitants of the earliest grave-containing Baraba prehistoric culture appear to have been Caucasoid-Mongoloid hybrids of a type whose distribution spanned the forest belt from Karelia and the Baltic through to the Ural region. Numerous Russian sources (e.g. V.V. Bunak) have previously described this type as the Northern Eurasian Anthropological Formation. Additionally, a comparison with the nearby Comb-pit Ware culture revealed enough anthropological similarities to suggest that the individuals of Ust-Tartas were autochthonous rather than recent migrants.

Extent of the N. Eurasian Anthropological Formation
Of the 18 mtDNA haplotypes retrieved, East Eurasian lineages (A, C, D, Z) comprised a slight majority (11/18). The authors noted "widely distributed root haplotypes" for Haplogroups C and D, which presumably indicates the greater antiquity of both in the region. The two individuals belonging to haplogroup A "[represent] a subcluster that is apparently characteristic of West Siberia and the Volga-Ural Region". The authors were surprised by the presence of Haplogroup Z, given its absence among modern West Siberians, a topic explored later in this entry.
The seven West Eurasian mtDNA haplotypes all belonged to Haplogroup U, comprising U2e, U4* and U5a1. The authors recalled the findings of several other recent ancient DNA studies, stating that this component likely derived from "Eastern, Central and Northern European hunter-gatherer groups". [1]

Besides affirming previous literature concerning the migration corridor between Eastern Europe and East Asia, the haplotypes also complement the anthropological data concerning their status as Mongoloid-Caucasoid hybrids.

Odinovo (3000 B.C.) and Krotovo (Early, 2000 B.C.)

Both of these cultures, regardless of stage, represent a fairly linear continuity from the populations and traditions of the Ust-Tartas culture before them.

The Odinovo culture succeeds Ust-Tartas, although archaeologically it is viewed as a synthesis of Ust-Tartas and the Comb-Pit Ware. Anthropological kinship between it and contemporary Baraba findings also confirms the autochthonous nature of Odinovo. However, it differs from its antecedents in grave objects, funeral rites and the presence of bronze artefacts belonging to the Seima-Turbino cultural phenomenon, a short-lived (2200-1700 B.C.) but "striking" package of metallurgical goods originating around the Sayan-Altai region in South Siberia and oriented westwards towards Europe. [2]

In turn, the Krotovo culture is partially derived from Odinovo, although it is not without its own influences from adjoining regions. As well as "strikingly different" funeral rites, [1] new archaeological features, including items fashioned from chalcedony, jaspilite and enstatite, point towards some degree of interaction with the Petrovo culture further south in Kazakhstan, where the nearest deposits of these materials lie. It is worth noting that the physical type of the Krotovo people showed no significant changes, remaining in line with the previous autochthonous type.

A total of 16 mtDNA haplotypes were recovered from both Odinovo and the Early Krotovo stage. The spectrum of mtDNA Haplogroups remains unaltered from the Ust-Tartas samples, supporting the archaeological record of continuity.

The paper goes on to address the discrepancy between the mtDNA results and the archaeological features of Krotovo, stating "our data did not allow us to detect any Central Asian genetic influence". [1] Several possible explanations may be considered:
  1. New material items from Petrovo accompanied a male-mediated migration towards Krotovo, resulting in some level of cultural assimilation
  2. In support of the above, the Petrovo culture natives may have themselves been a southward extension of the "Northern Eurasian Anthropological Formation" and belong to the same basic physical type as Ust-Tartas, Odinovo and Krotovo individuals further north, making any inter-culture interactions difficult to infer
  3. Some mode of transmission between Krotovo and Petrovo took place (trade, "package diffusion")

Further information is needed to ascertain which is more probable, including (but not restricted to) Y-Chromosomal data from all concerned cultures for evidence of (dis)continuity between Odinovo and Krotovo through southern influence, as well as anthropological data from Petrovo to determine if they were indeed of the same basic physical type.

Taken together, however, the evidence indicates that material items from further south were brought northwards into the Baraba forest-steppe after 2000 B.C., but these cultural changes are not reflected in the native maternal lineages, implying that less overt processes (or male-mediated migration) were responsible.

Krotovo (Late, 1750 B.C.) and Andronovo (1500 B.C.)

The next significant period of Baraban history comes with the arrival of semi-nomadic pastoralists whose origins lay further to the west. We are, of course, referring to the founders of the Andronovo archaeological complex, whose Indo-European language, culture and even ideology had eventually infiltrated deep into the Iranian plateau and Indian subcontinent through their utilisation of both horse and chariot. [3]

Schematic Tree of mtDNA Haplogroups Found
Within Baraba, despite the Krotovo population coexisting with these newcomers for some time (presumably because they occupied different pastoralist niches), we see evidence of a shift from Seima-Turbino to Andronovo material traditions. Andronovan dominance is also reflected in the eventual northward displacement of some Krotovo natives, based on archaeological data. [1] However, cranial analysis presents a more complicated picture: the presence of an "autochthonous Mongoloid" variant not typically seen in the Baraba forest-steppe, differing from the hybrid type seen for hundreds of years prior, may suggest the two groups were not in direct contact, and that Andronovan influence was exerted by proxy through other native groups who were displaced northwards and eastwards following their assimilation. This is anecdotally supported by Keyser et al.'s discovery of one Andronovo male (specimen S07) from near Krasnoyarsk in South Siberia carrying Y-DNA Haplogroup C*. [4] It is worth stating that the physical type of the Andronovo population is commonly described as "Variants of three proto-Europoid types" with a minor Mongoloid element. [1]

A total of 40 mtDNA haplotypes were taken from Late Krotovo (1750 B.C.) and Andronovo (1500 B.C.) sites. As expected, the same spectrum of mixed West-East Eurasian lineages made an appearance, with the exception of the strong introduction of one new Haplogroup.

Haplogroup T reaches a stable frequency of 15% in both Late Krotovo and Andronovo, despite being completely absent from the 34 earlier haplotypes. The authors cite this as direct genetic evidence of Andronovan influence on Late Krotovo and postulate that this lineage was, as a result, a major contingent in the Andronovo culture's spread.

All of these events precede the Irmen culture (1400-900 B.C.), the eventual successor to Andronovo. The Irmen individuals from the Baraba region were predominantly Caucasoid and practised a mixed economy of agriculture and animal husbandry. Only data from the Late stage (900-800 B.C.) were considered in the study.

Baraba (Late, 1000 B.C.)

The Late Baraba culture is a consequence of a Krotovo-modified Andronovo successor (known as Suzgan) interacting with the Irmen culture (described above). This was a particularly tumultuous period in West Siberian prehistory, with tribes continuously coalescing with one another and forming new identities in the process.

Anthropological data from the Late Baraba culture paint a far more diverse picture than those of the previous three millennia. Although sex generally has little bearing on physical type, the authors noted that the men were more similar to a "Southern Eurasian Anthropological Formation", whereas the women were closer to the Andronovan derivatives in North Kazakhstan.

Only five mtDNA haplotypes were recovered from this period. Haplogroups A and C were once again represented, as were U5b and T, indicating that the previous assimilation events had been maintained up until this point.

Irmen (Late, 900-800 B.C.)

From 1000 B.C. onwards, a complex set of migrations took place in West Siberia between the cultures formed by this point. Archaeologists attribute this to ecological changes involving climatic cooling across the region. 

The last sampled site is the Late Irmen culture, which is a continuation of the Irmen culture proper described earlier in this entry. The intricate interactions between cultures of this period are evident through multi-plural settlements in the archaeological record here.

The final 14 mtDNA haplotypes were, unexpectedly, a complete departure from the partial continuity seen from Ust-Tartas up until Late Baraba. Almost all belonged to West Eurasian lineages, such as Haplogroups J, K and W. The study suggests these lineages ultimately came from further south, in the vicinity of West Kazakhstan and West Central Asia (Turkmenistan and Uzbekistan likely implied). This suggestion will be assessed in detail later in this entry.

Confirmation of the 'Migration Corridor'?
It is remarkable to finally see the migration corridor, an archaeological concept mentioned several times on Vaêdhya, imprinted on the genetic record in such a definitive way (visit North European Component Variation within the Eurasian Heartland for additional information).

As it stands, we can now safely conclude that prehistoric hybridisation took place between hunter-gatherer Paleo-European populations and populations from the east across the Eurasian steppe. The crossover of both at opposing ends of this corridor is supported by aDNA and anthropological evidence, and the finding of a near-equal hybrid population midway between the two poles all but confirms what the raw results have already revealed. Therefore, the connection between Northeast Europe and East Asia through the Eurasian steppe (even before Proto-Indo-European's formation) can no longer be considered a hypothesis, but a verified reality of demic prehistory. If supported by autosomal DNA (auDNA) from similar gravesites, it will drastically alter our perception of the migrations that happened afterwards, as well as doing away with over-simplified models of how certain languages and cultures permeated across Eurasia.

Afanasievo: Without a trail?
It is interesting to note that, despite covering over 3,000 years of prehistory, there is as yet no trace across this territory of the Afanasievo culture, the earliest known eastern offshoot of Yamnaya. Under the Eurasian steppe theory, the Afanasievo culture is connected with pastoral nomads who spoke an early (proto) form of Tocharian, an extinct Centum branch of Indo-European whose easterly position subverts the Centum:Satem isogloss in Eurasia. [5] The only attested connection between Afanasievo and the Baraba forest-steppe is through interactions between its successor culture, the Karasuk, and the easternmost of the early Irmen. [1]

The question that persists is thus: where is the Afanasievo trail from Yamnaya through to the Urals and their final archaeological seat in South Siberia? Why have none of the Baraba forest-steppe cultures shown any indication of influence, be it cultural or anthropological, of Caucasoid pastoral nomads before those of Andronovo?

To arrive at one likely answer, Frachetti's Pastoralist Landscapes and Social Interaction in Bronze Age Eurasia clarifies the material culture and mode of living in Central Asia during the Bronze Age:

"The calibrated C14 dates of Afanas'evo material are generally slightly earlier than those taken from Yamnaya contexts in the western steppe, which complicates a diffusionist explanation of the emergence of pastoralists in the eastern steppe. Although their origins may be obscure, communities associated with Afanas'evo materials still represent the earliest mobile pastoralists east of the Ural Mountains... [their] incipient strategy of cattle and sheep/goat herding, supplemented by hunting and fishing.
The Afanas'evo subsistence economy might best be characterized as a mixed or transitional form between hunting/fishing and localized pastoralism, arising from local antecedents or combining native strategies with diffused domestic innovations among local populations.
...Perhaps the strongest evidence that divides the Yamnaya and Afanas'evo pastoralists in the mid-fourth millenium BCE is the discontinuity of pastoral economic strategies among societies living between these territories."

If the Afanasievo culture was itself a combination of local hunting strategies and herding practices originating further west in the Yamnaya, despite differing from contemporary societies above the Black and Caspian seas, one can postulate that the Afanasievo people likely intermingled with native cultures in South Siberia whilst retaining their core pastoral attributes, and that such an event would have occurred some time earlier.
Nor need the Afanasievo bearers have travelled through the Baraba forest-steppe; the maps shown in Chernykh's The “Steppe Belt” of stockbreeding cultures in Eurasia during the Early Metal Age, for instance, draw a straight trajectory from the Urals to the Sayan-Altai region for the sake of clarity rather than on a factual basis. Little is currently known about the journey taken by these nomads, but the findings of this paper do help confirm that the founders of Afanasievo did not stray along the northern rim of the forest-steppe towards South Siberia.

1. Molodin VI, Pilipenko AS, Romaschenko AG, Zhuravlev AA, Trapezov RO. Human migrations in the southern region of the West Siberian Plain during the Bronze Age: Archaeological, palaeogenetic and anthropological data. 2012. Retrieved from here: /9783110266306/9783110266306.93/9783110266306.93.xml [Last Accessed 17th July 2012]

2. Chernykh E. The “Steppe Belt” of stockbreeding cultures in Eurasia during the Early Metal Age. Trabajos De Prehistoria. 2008;65:73-93.

3. Kuz'mina EE. The Origin of the Indo-Iranians. Koninklijke Brill NV, Leiden, The Netherlands. 2007.

4. Keyser C, Bouakaze C, Crubézy E, Nikolaev VG, Montagnon D. Ancient DNA provides new insights into the history of south Siberian Kurgan people. Hum Genet. 2009;126:395–410.

5. Anthony DW. The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World. Princeton University Press. 2007.

6. Frachetti MD. Pastoralist Landscapes and Social Interaction in Bronze Age Eurasia. University of California Press, Ltd. 2008.

Tuesday, June 26, 2012

Worldwide Distribution of Dodecad K10a Components [Review]

Numerous ADMIXTURE runs have been completed by the Dodecad Ancestry Project since its inception approximately two years ago. The status of certain components remained tenuous despite subsequent runs, whilst others provided fairly stable values for the bulk of the project's participants.

With the completion of the latest K10a run, I have composed a series of geographically accurate frequency maps with the intention of effectively presenting the trends that can be seen through the raw data.


Data: values from over 130 groups obtained through the Dodecad K10a Spreadsheet. Only groups with at least 5 participants were considered. Composites of populations were taken where appropriate and denoted with _cmp. Labels are otherwise identical to the source. The O_Italian_D group was excluded because no information on its origins was found online.

Mapping: Dodecad participant populations were allocated to national capitals. Exact locations of reference populations were obtained where possible (see Citations); however, some allowances were made for those accompanied by scant information. Refer to the Data Sink for the population list, coordinates and commentary made during the mapping process. Aside from the values shown for certain populations, no numerical data are displayed, to minimise clutter and remain faithful to the intention of this entry.

Population depiction: I deemed it necessary to consider the genetic structure of Jewish, Indian and expatriate/New World populations separately, excluding them from the rest of Europe, Asia or Africa. Including Jewish minorities with their gentile compatriots would render the maps uninformative. The complexity of India's demographics, particularly because of the caste system, makes frequency maps an improper choice for revealing inter-group genetic differences.
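The inclusion rules in the Data note can be expressed as a short Python sketch: drop groups below the five-participant threshold, then merge related groups into a _cmp composite. The record layout, the group and component names, the example values and the participant-weighted averaging are all assumptions for illustration; the entry does not state how composites were calculated.

```python
from dataclasses import dataclass

MIN_PARTICIPANTS = 5  # threshold stated in the Data note


@dataclass
class Group:
    label: str
    n: int                        # number of participants in the group
    components: dict[str, float]  # component name -> mean proportion (%)


def keep_groups(groups: list[Group]) -> list[Group]:
    """Drop groups with fewer than MIN_PARTICIPANTS participants."""
    return [g for g in groups if g.n >= MIN_PARTICIPANTS]


def composite(label: str, members: list[Group]) -> Group:
    """Merge several groups into one group labelled '<label>_cmp'.

    ASSUMPTION: proportions are averaged weighted by participant count,
    and every member reports the same set of components.
    """
    total = sum(g.n for g in members)
    merged = {
        name: sum(g.components[name] * g.n for g in members) / total
        for name in members[0].components
    }
    return Group(f"{label}_cmp", total, merged)


# Hypothetical groups and component names, for illustration only.
groups = [
    Group("Pop_A", 6, {"component_1": 40.0, "component_2": 60.0}),
    Group("Pop_B", 4, {"component_1": 70.0, "component_2": 30.0}),
    Group("Pop_C", 10, {"component_1": 50.0, "component_2": 50.0}),
]
kept = keep_groups(groups)  # Pop_B (4 participants) is dropped
merged = composite("Pop", kept)
print(merged.label, merged.n, merged.components)
```

Weighting by participant count stops a small group from pulling a composite as strongly as a large one; an unweighted mean of the member groups would be the obvious alternative.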



The raw values used in this investigation are attributed to Dienekes Pontikos, author of the Dodecad Ancestry Project.

Addendum I [04/07/2012]: Inclusion of All Components Colourised map, shown below:


Sunday, June 17, 2012

Secrets of Central Asia: Chapter I - The Pokrovsk Man [Review]

The first of a series focused entirely on prehistoric Central Asian ancient DNA (aDNA), this entry furthers an investigation into frozen remains found in a remote part of Siberian Russia.

Pokrovsk, Sakha Republic, Russia
In 2006, Amory et al. tested bone fragments from a grave found near Pokrovsk, a locality in the Russian federal republic of Sakha (Yakutia), with the intention of discerning the origins of the remains. [1] Amory et al. briefly outline the purported archaeological history of Siberia, in which an autochthonous hunter-gatherer population was either subjugated or partially displaced by expanding Tungus-Manchurian nomadic tribes, before Yakut herdsmen moved northwards into their present demographic range as a result of Mongolian domination in the region between the sixth and thirteenth centuries. The paper's abstract is reproduced below:

"The Yakuts, Middle Age Turkic speakers (15th–16th centuries), are widely accepted as the first settlers of the Altai-Baikal area in eastern Siberia. They are supposed to have introduced horses and developed metallurgy in this geographic area during the 15th or 16th century a.d. The analysis of the Siberian grave of Pokrovsk, recently discovered near the Lena River (61°29′ N) and dated by accelerator mass spectrometry from 2,400 to 2,200 years b.p., may provide new elements to test this hypothesis. The exceptional combination of various artifacts and the mitochondrial DNA data extracted from the bone remains of the Pokrovsk man might prove the existence of previous contacts between autochthonous hunters of Oriental Siberia and the nomadic horse breeders from the Altai-Baikal area (Mongolia and Buryatia). Indeed, the stone arrowhead and the harpoons relate this Pokrovsk man to the traditional hunters of the Taiga. Some artifacts made of horse bone and the pieces of armor, however, are related to the tribes of Mongolia and Buryatia of the Xiongnu period (3rd century b.c.). This affinity has been confirmed by the match of the mitochondrial haplotype of this subject with a woman of the Egyin Gol necropolis (Mongolia, 2nd/3rd century a.d.) as well as with two modern Buryats. This result allows us to postulate that contacts between southern steppe populations and Siberian tribes occurred before the 15th century."

Grave Features
The Pokrovsk grave is located at the top of a glacial terrace near the Lena-Pokrovsk river junction. Radiocarbon dating places the site at approximately 2390-2190 YBP. Skeletally, the Pokrovsk Man was gracile, with a brachycephalic skull, and his physical type was Mongoloid, although the authors note it was "less accentuated" than that of Middle-Age Yakuts. It was also noted that the torus mandibularis, a normal bony protuberance on the inner surface of the mandible, was absent, despite occurring commonly in East Asian and Native American populations. [2] Several material items were found in the grave, including bone tools, harpoon heads, reindeer bone armour and flint arrowheads connected to archaic Siberian culture. However, other goods, including an iron arrowhead, are reputedly of South Siberian make. [1]

DNA was extracted from bone using the technique outlined in Keyser-Tracqui & Ludes’ Methods for the study of ancient DNA. [3] Autosomal DNA (auDNA) was typed with the Profiler+ Multiplex kit (nine short tandem repeats, or STRs). Y-Chromosomal DNA (Y-DNA) was tested using the PowerPlex Y System (eleven STRs) as well as a single-nucleotide polymorphism (SNP) at the TAT locus. Finally, a 421 base pair (bp) segment of the sample’s mitochondrial DNA (mtDNA) in the first hypervariable segment (HVS1), spanning positions 15989-16410, was tested and compared with the Cambridge Reference Sequence (CRS).
Consensus data were obtained directly from the paper. auDNA analysis was carried out with popSTR, an online research engine that displays auDNA STR allele frequencies within different populations. [8]
Ideally, mtDNA and Y-DNA analysis would have been conducted through ySearch, mitosearch, the SMGF, supplementary data from relevant scientific literature and online DNA projects.

Allelic Frequencies

auDNA Analysis

Nine auDNA STRs were retrieved from the Pokrovsk Man's remains. Unfortunately, the use of STRs is questionable: they carry a large margin of error and lack population specificity, because multiple alleles occur within any single population and allele frequencies overlap heavily between populations. This investigative tool has largely been made redundant by SNP testing, which employs thousands of markers rather than a few. Nonetheless, processing of these results will still be attempted.

The allelic frequencies per worldwide regional group for the retrieved STRs are shown opposite. All markers from the Profiler+ Multiplex were utilised in the subsequent popSTR search. The sample populations are largely derived from the HGDP-CEPH Human Genome Diversity Cell Line Panel. [4]

African frequencies of the Pokrovsk alleles are generally lower relative to Eurasian, Oceanian and American regional groups. This warrants the exclusion of such values from the analysis hereon due to their uninformative nature, apart from confirming the Pokrovsk Man had no recent African ancestry, which is in accordance with anthropological, historical and linguistic data from Siberia. Allele frequencies of remaining regions are shown in the Data Sink.

To elucidate the regional affinities of the Pokrovsk Man, averages for the alleles across the given regions were taken and ranked in order of descending magnitude (found again in the Data Sink).
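The averaging-and-ranking step described above amounts to very little code. In this minimal Python sketch the region names and frequencies are placeholders, not the Data Sink values; only the procedure (an unweighted mean per region, sorted in descending order) follows the text.

```python
from statistics import mean

# Placeholder frequencies of the Pokrovsk Man's alleles per regional group
# (illustrative only, NOT the popSTR values): one entry per retrieved allele.
allele_freqs = {
    "Region_A": [0.30, 0.25, 0.40],
    "Region_B": [0.20, 0.22, 0.30],
    "Region_C": [0.15, 0.18, 0.20],
}


def rank_regions(freqs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Average each region's allele frequencies and sort from highest to lowest."""
    averages = {region: mean(values) for region, values in freqs.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


for region, avg in rank_regions(allele_freqs):
    print(f"{region}: {avg:.3f}")
```

Because every allele counts equally in the mean, a region can score highly on the back of one or two common alleles, which is part of why STR averages like these remain a coarse guide.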

The results indicate his affinity was greatest to the Americas, followed jointly by East Asia and Europe (discussed later), and finally the Middle East and South-Central Asia. The discrepancy between the American and East Asian scores is explained by the East Asian regional group consisting largely of ethnic groups from East Asia proper and Southeast Asia, such as the She, Naxi and Japanese. The Yakuts, the only sample population located in Siberia, are part of this group, reducing its specificity further. However, the higher scores for Native American and East Asian populations than for others are still consistent with both geographic position and the known demic expansions into both regions.

The lower average allelic frequencies among South-Central Asians and Middle-Easterners further support the above. However, the Middle-Eastern group did not include populations from West Asia or the Caucasus, such as Anatolian Turks, Iranians or Georgians. Additionally, the lack of North-Central Asian ethnic groups such as Kazakhs, Tatars or Altaians may affect the results further.

It would have been preferable to obtain auDNA SNPs and compare them with specific sample populations, and better still to undertake IBD segment analysis as well. SNP analysis would have been feasible in 2006, since the HGDP-CEPH samples had been available for at least four years by then, [4] and it would have opened the door to analysis far deeper than that undertaken by Amory et al. or by this investigation.

The authors greatly limited the scope of their own investigation, noting only that the Pokrovsk Man showed identical matches in their private haplotype database with Buryats, West Siberians, Altaian Mansis, ancient and modern Yakuts, one Evenk and a female from the Egyin Gol necropolis. [5]

mtDNA Analysis
Of the ten loci tested, only three yielded consistent nucleotide variations (16223T-16362C-16368C). Only Mitosearch one-step matches with a known maternal ancestor location were considered (Data Sink). These results not only confirm Amory et al.'s conclusion that the Pokrovsk Man belonged to mtDNA Haplogroup D; the concentration of the matches within Asia is also what modern samples would lead us to expect. [6]
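To make the "one-step" criterion concrete, here is a minimal sketch of how HVS1 haplotypes might be compared. It assumes each haplotype is listed as substitutions relative to the reference sequence (for example `16223T`), counts one step for every position where two haplotypes differ, and treats "one-step matches" as haplotypes within one step (exact matches included). Mitosearch's own matching rules may differ; the function names and the `records` entries are hypothetical, and indels such as `16193.1C` are not handled.

```python
from typing import Dict, Iterable, List, Optional, Sequence, Tuple

def parse_hvs1(motif: Iterable[str]) -> Dict[int, str]:
    """Turn substitutions like '16223T' into {16223: 'T'} (no indel support)."""
    return {int(variant[:-1]): variant[-1].upper() for variant in motif}

def hvs1_steps(a: Iterable[str], b: Iterable[str]) -> int:
    """Count positions at which two haplotypes carry different states."""
    pa, pb = parse_hvs1(a), parse_hvs1(b)
    # A position listed in only one haplotype counts as one step, as does
    # a position listed in both but with different bases.
    return sum(1 for pos in pa.keys() | pb.keys() if pa.get(pos) != pb.get(pos))

def one_step_matches(
    query: Sequence[str],
    records: Sequence[Tuple[Sequence[str], Optional[str]]],
    max_steps: int = 1,
) -> List[Tuple[Sequence[str], Optional[str]]]:
    """Keep records within max_steps of the query that have a known location."""
    return [
        (hap, location)
        for hap, location in records
        if location is not None and hvs1_steps(query, hap) <= max_steps
    ]

POKROVSK = ["16223T", "16362C", "16368C"]

# Hypothetical database entries: (haplotype, maternal ancestor location or None).
records = [
    (["16223T", "16362C"], "Hypothetical location A"),
    (["16223T", "16362C", "16368C", "16189C"], None),
    (["16129A", "16223T"], "Hypothetical location B"),
]

for hap, location in one_step_matches(POKROVSK, records):
    print(location, hap)
```

Counting differences per position, not as a symmetric difference of variant strings, keeps a change such as 16368C versus 16368T at one step instead of two.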

Unfortunately, once again the limited scope of the original investigation hinders further analysis: without testing beyond HVS1, the extent of mitochondrial sharing beyond the data shown here cannot be established.

Y-DNA Analysis
None of the eleven Y-DNA STRs returned a successful result. The only SNP tested was TAT, where a T→C mutation is considered equivalent to the M46 marker, which defines Haplogroup N1c under the current International Society of Genetic Genealogy (ISOGG) nomenclature. [7]

As the Pokrovsk Man yielded a T allele at this locus, his Y-DNA Haplogroup could not have been N1c-M46. However, this does not rule out him belonging to a lineage upstream of N1c-M46.
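The inference above follows the usual ancestral/derived logic for a single SNP, sketched below. The marker table is a hypothetical minimal stand-in that contains only TAT/M46 with the states given in the text (T ancestral, C derived, defining N1c). A real analysis would use a full phylogenetic tree, and the sketch ignores back-mutations and failed calls.

```python
from typing import Dict, Tuple

# Hypothetical minimal marker table: SNP -> (ancestral allele, derived allele, clade).
# Only M46/TAT is taken from the text; everything else about the table is assumed.
MARKERS: Dict[str, Tuple[str, str, str]] = {"M46": ("T", "C", "N1c")}

def interpret_snps(calls: Dict[str, str]) -> Dict[str, str]:
    """Read each SNP call as ancestral or derived and state what it implies."""
    results = {}
    for snp, allele in calls.items():
        ancestral, derived, clade = MARKERS[snp]
        allele = allele.upper()
        if allele == derived:
            results[snp] = f"derived: {clade}-{snp} or one of its subclades"
        elif allele == ancestral:
            # An ancestral call excludes the clade but says nothing about
            # which lineage upstream of, or outside, that clade he carried.
            results[snp] = f"ancestral: not {clade}-{snp}; upstream or unrelated lineages remain possible"
        else:
            results[snp] = "unexpected allele: check the call"
    return results

print(interpret_snps({"M46": "T"})["M46"])
```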

European Affinities & Conclusion
Despite these great limitations, the data presented here allow several invaluable inferences that extend Amory et al.'s Early influence of the steppe tribes in the peopling of Siberia, [1] and these inferences cannot reasonably be dismissed as anomalous without also discarding conclusions drawn from other sources.

The auDNA results, though derived from STR data, agree with the SNP-based analysis of the Eurogenes Project by David W. in a previous run (described in an earlier Vaêdhya entry), in which modern Siberian populations, while predominantly Siberian and East Asian, show at least trace levels of various European or Caucasian ADMIXTURE components and lack Southwest or South Asian specific components.

The European affinity found in this investigation, level with East Asia and behind only the Americas, may offer a convenient explanation for why the Pokrovsk Man's features were anthropometrically less Mongoloid than those of medieval Yakuts. It may suggest that a West Eurasian physical element existed before the tribal and political upheavals that resulted in the Yakut settlement deeper into this portion of Siberia. Although the origins of this element were not elaborated upon, there may also be a connection with the postulated "migration corridor" covered previously and described in Malyarchuk et al.'s On the Origin of Mongoloid Component in the Mitochondrial Gene Pool of Slavs. [10]

This result supplements the picture of a West Eurasian genetic component of ambiguous origins being brought towards Siberia, challenging one interpretation in which West Eurasian physical influence in the region stops abruptly at Lake Baikal. [9] Instead, the totality of the evidence presented raises the possibility that this influence extended beyond the lake and manifested itself simply as a "reduction" of Mongoloid cranial characteristics, such as that shown by the Pokrovsk Man, whose anthropometric configuration may well be an artefact of this influence.

Unfortunately, the mtDNA and Y-DNA results were far too non-specific to merit further analysis. Their generality does, however, raise several questions: what subtype of mtDNA Haplogroup D did the Pokrovsk Man belong to? If he was not Y-DNA Haplogroup N1c-M46, what was he?

The material goods found in the Pokrovsk Man's grave may indicate the direction from which his apparent European affinities came. As South Siberia was the source of his iron and horse-derived goods, could he also have inherited West Eurasian genes from there? Were these contributors ancient, or prehistoric?

Pokrovsk map from WolframAlpha.

1. Amory S, Crubézy E, Keyser C, Alekseev AN, Ludes B. Early influence of the steppe tribes in the peopling of Siberia. Hum Biol. 2006;78:531-49.

2. Apinhasmit W, Jainkittivong A, Swasdison S. Torus Palatinus and Torus Mandibularis in a Thai population. ScienceAsia. 2002;28:105-111.

3. Keyser-Tracqui C, Ludes B. Methods for the study of ancient DNA. Methods Mol Biol. 2005;297:253–264.

4. Rosenberg NA. Standardized subsets of the HGDP-CEPH Human Genome Diversity Cell Line Panel, accounting for atypical and duplicated samples and pairs of close relatives. Ann Hum Genet. 2006;70:841-7.

5. Keyser-Tracqui C, Crubézy E, Ludes B. Nuclear and Mitochondrial DNA Analysis of a 2,000-Year-Old Necropolis in the Egyin Gol Valley of Mongolia. Am J Hum Genet. 2003;73:247–260.

6. Mishmar D, Ruiz-Pesini E, Golik P, Macaulay V, Clark AG. Natural selection shaped regional mtDNA variation in humans. Proc Natl Acad Sci. 2003;100:171-6.

7. Zerjal T, Dashnyam B, Pandya A, Kayser M, Roewer L. Genetic relationships of Asians and Northern Europeans, revealed by Y-chromosomal DNA analysis. Am J Hum Genet. 1997;60:1174–1183.

8. Amigo J, Phillips C, Salas T, Fernández Formoso L, Carracedo A. pop.STR - An online population frequency browser for established and new forensic STRs. Forensic Sci Int Genet Suppl Ser. 2009.

9. Mooder KP, Schurr TG, Bamforth FJ, Bazaliiski VI, Savel'ev NA. Population affinities of Neolithic Siberians: A snapshot from prehistoric Lake Baikal. Am J Phys Anthropol. 2006;129:349-61.

10. Maliarchuk BA, Perkova MA, Derenko MV. Origin of the Mongoloid component in the mitochondrial gene pool of Slavs. Genetika. 2008;44:401-6.