Saturday, March 31, 2012

North European Component Variation within the Eurasian Heartland [Original Work]

As studies of DNA variation across Asia have progressed over the years (Wells et al., Xing et al., teaser mtDNA results from Burger et al.'s upcoming analysis of prehistoric Eurasian steppe remains), the theme of ancestral markers with origins in Europe has remained a recurring one, particularly with regard to the expansion of Bronze Age semi-pastoral nomads from the Pontic-Caspian steppe bearing the Indo-European languages.

David W. of the Eurogenes Genetic Ancestry Project has recently posted data online from a new Intra-European run using ADMIXTURE (K=12) with the intention of breaking up the North European component that often arises through the program. Spreadsheet results here.

This brief investigation seeks to identify the North European-derived component patterns within Asia by first mapping out the frequencies and then correlating with Eurogenes' release notes on each.

As many samples as possible from immediately identifiable populations were obtained from the spreadsheet results (link above). No sample restrictions were implemented. Averages of each population were calculated, except where n=1. No modifications were made to population labels except for Eurogenes population averages, denoted by the addition of a _Eg suffix. Populations were then allocated into arbitrary regional groups, allowing results to be displayed more coherently.
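The averaging step described above can be sketched roughly as follows. This is a minimal illustration, not the spreadsheet procedure actually used: the function names, the population labels `PopA`/`PopB` and all percentages are hypothetical placeholders, and only two components are shown.

```python
from collections import defaultdict


def population_averages(samples):
    """Average component percentages per population label.

    samples: list of (population, {component: percent}) tuples.
    Populations represented by a single sample (n=1) are passed
    through unchanged, since there is nothing to average.
    """
    grouped = defaultdict(list)
    for population, components in samples:
        grouped[population].append(components)

    averages = {}
    for population, rows in grouped.items():
        if len(rows) == 1:
            averages[population] = dict(rows[0])
            continue
        averages[population] = {
            component: sum(row[component] for row in rows) / len(rows)
            for component in rows[0]
        }
    return averages


def tag_eurogenes_average(label):
    """Mark a pre-computed Eurogenes population average with the _Eg suffix."""
    return label + "_Eg"


# Hypothetical input values, for illustration only.
samples = [
    ("PopA", {"North Sea": 4.0, "South Baltic": 2.0}),
    ("PopA", {"North Sea": 6.0, "South Baltic": 4.0}),
    ("PopB", {"North Sea": 1.5, "South Baltic": 0.5}),  # n=1, kept as-is
]
result = population_averages(samples)
```

Here `result["PopA"]` holds the mean of the two PopA samples, while `result["PopB"]` simply repeats its single sample.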

Tabulated results can be found in the Data Sink. Autosomal variation per Regional Group can be found below:

The North European-derived components, despite their exceptionally close Fst distances relative to the other components, do seem to reveal a few interesting trends:

  • Northeast European appears to (at least partially) be the result of allele sharing with populations further east, as evidenced by its predominance in East-Central Asian groups and its extension even further eastwards into the Siberian Selkup (n=1). This component has a circumstantial correlation with the craniometric and ancient mtDNA evidence suggestive of a "migration corridor" between Eastern Europe and Siberia (Malyarchuk et al.'s On the Origin of Mongoloid Component in the Mitochondrial Gene Pool of Slavs, Newton's Ancient Mitochondrial DNA From Pre-historic Southeastern Europe: The Presence of East Eurasian Haplogroups Provides Evidence of Interactions with South Siberians Across the Central Asian Steppe Belt). While it also explains this component's abundance in North Caucasian populations (which lie en route between Ukraine and Siberia), the same cannot be said with absolute certainty of South-Central Asia. With that being said, the 0.021 Fst distance with West European, despite the markedly different distributions, suggests both are the result of prehistoric (possibly Paleolithic?) hunter-gatherer migration paths across large swathes of Eurasia.
  • West European has a sporadic appearance across Asia, with a peak in the North Caucasus. This implies that, true to its assigned label, it is a generic West Eurasian component that has reached a maximum in Western Europe, with the North Caucasus representing the closest point of reference to there. Indeed, this inference is made independently by Eurogenes, albeit using different parameters:
"I used samples of Scottish, Irish and Western English ancestry to create this cluster. Not surprisingly, it peaks in individuals of Western Irish descent. However, it also peaks in Basques and many Iberians, which is fascinating, because that makes it the autosomal equivalent of Y-chromosome haplogroup R1b in Europe."
  • North Sea and South Baltic accompany one another at similar frequencies across much of Asia, especially in populations with an Indo-Iranian-speaking heritage (observe the ~0.8-1:1 ratio among Kurds, Iranians, the Turkmen, Uzbeks, Tajiks, Brahmins, Kshatriyas and Kyrgyz as examples of this). It is interesting to note that, of the two, only the North Sea component is readily present in East-Central Asians. The only other likely migration path along this trajectory is that of the proto-Tocharians, who (under the Eurasian steppe theory) split off from the Proto-Indo-European homeland several millennia prior to the Proto-Indo-Iranians that eventually formed the Andronovo archaeological horizon from Sintashta/Pit Grave (E Kuz'mina, The Origin of the Indo-Iranians, pg.451). Perhaps this near-solitary North Sea component within the Altaians, Mongolians and Uyghurs is attributable to early speakers of Tocharian? Perhaps the elevated presence of the North Sea component in South-Central Asia (Jatts, Pathans, Kyrgyz) is a relic of the Kushans, nomads supposedly a part of the Yuezhi confederacy, who may have been Tocharian speakers themselves?
  • One curious phenomenon is the similar West European-North Sea-Northeast European component proportions across the Turkmen, Uzbeks, Kyrgyz, Pathans, Uttar Pradesh Brahmins, Altaians and the Uyghur. It is unclear whether this can be substantiated in any way or whether it is simply an anomalous association produced by non-uniform and varying sample sizes, which prevents a firm conclusion from being made.
  • North European-derived frequencies among Southwest Asian Semitic-speaking groups shown here seldom exceed 1% apiece and are either the result of recent, inconsistent small-scale admixture events or are simply background noise generated by ADMIXTURE.
The Northeast European and West European components appear to have a distribution independent of any significant migration events since the Neolithic, instead being associated with either the "migration corridor" across Eurasia or simply being the result of mutual West Eurasian heritage. North Sea and South Baltic, on the other hand, do seem to correlate with one another and support (rather than contradict) the eastward movement of Bronze Age semi-pastoral nomads speaking early dialects of Proto-Indo-European.

Edit I [31/03/2012]: Correction of erroneous Brahmin results due to Google Spreadsheet lag.


  1. Hi DMXX. You seem to have miscalculated and omitted some of the South-Asian averages. The Metspalu et al. Tamil-Brahmins do not exhibit any of the Northeast Euro component, but instead 3.13% North-Sea admixture, 1.04% South-Baltic admixture and 0.84% Western-European admixture.
    Additionally, you're missing the Uttar-Pradesh Brahmin average, who are certainly important in the context of North-European specific components' presence in South-Asia given that they exhibit ~13.5% North_European admixture in Dodecad Ancestry Project's K12a and K12b ADMIXTURE runs; 13.72% of the Baltic hunter-gatherer component in Eurogenes' hunter gatherer vs. farmer ADMIXTURE test and finally, 12.85% in total of the Northern Europe-centered components in this latest Eurogenes run, i.e. South Baltic (Lithuanian), North-Sea (Scandinavia?) and Northeast Euro (Chuvash). These samples were collected in Azamgarh district, eastern Uttar Pradesh and are Bhojpuri speaking. You are most welcome to refer to this post for the Eurogenes NEU.12 ADMIXTURE results for individual South-Asian participants and the reference population averages for the subcontinent.

    One trend that I've come to notice is that, within the same population group, for instance, in South-Asia, some individuals tend to exhibit an excess of a particular exogenous West-Eurasian element in comparison to others of a similar/same background, while some other West-Eurasian component is lower relative to others. For instance, IN17 (of Tamil-Brahmin ancestry) seems to consistently exhibit an extra-North East European affinity in the latest Eurogenes ADMIXTURE runs including this one, while having comparatively less West Asian/Caucasus admixture. Similarly, IN9 and IN18 (again of Tamil Brahmin ancestry) tend to exhibit an excess of the Caucasus component relative to the other participants of the same ethnicity in this run (and also in runs such as the Eurogenes West-Eurasia K12=b). On the other hand, the other components are rather uniform in the same ethnic group across different participants. This trend can be seen for many other ethnic groups as well, both within and outside of South-Asia. What I, and many others, think we're dealing with here are components which overlap to an appreciable extent. Thus, it seems ADMIXTURE works on a fairly arbitrary basis as to which element will be chosen to illustrate a basic affinity to a certain geographic area.

    1. There was an error whilst transcribing the data; Google Spreadsheets periodically lagged whilst highlighting cells leading to that discrepancy, erroneously merging the Tamil Nadu and Uttar Pradesh Brahmin results. I have corrected this now.

      Regarding your statement concerning the trend you have witnessed, that is simply a reflection of an individual's chance allelic inheritance over the generations and thus a representation of that individual's genetic make-up only.

      For instance, if one parent is 55% South Asian and another is 50% South Asian, one would expect inheritance somewhere between 52-53% in the offspring. However, it is entirely possible the offspring will inherit more than 55%, or potentially less than 50%.
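As a rough illustration of that point, the spread of outcomes can be simulated with a toy model. This is only a sketch under loud assumptions: each parent's genome is treated as 100 equal, independently inherited segments (a hypothetical figure, far simpler than real recombination), each segment is either assigned to the component or not, and the child receives a random half of the segments from each parent.

```python
import random
from statistics import mean


def simulate_offspring(p_a, p_b, segments=100, trials=10_000, seed=0):
    """Simulate the offspring's share (in %) of one ancestry component.

    p_a, p_b: the component's share in each parent (0..1).
    segments: hypothetical number of independent genome segments per parent.
    """
    rng = random.Random(seed)
    n_a = round(p_a * segments)
    n_b = round(p_b * segments)
    # 1 = segment assigned to the component, 0 = any other component.
    parent_a = [1] * n_a + [0] * (segments - n_a)
    parent_b = [1] * n_b + [0] * (segments - n_b)
    half = segments // 2

    outcomes = []
    for _ in range(trials):
        # The child inherits a random half of the segments from each parent.
        inherited = rng.sample(parent_a, half) + rng.sample(parent_b, half)
        outcomes.append(100 * sum(inherited) / len(inherited))
    return outcomes


outcomes = simulate_offspring(0.55, 0.50)
print(f"mean offspring share: {mean(outcomes):.1f}%")
print(f"trials above 55%: {sum(o > 55 for o in outcomes) / len(outcomes):.0%}")
print(f"trials below 50%: {sum(o < 50 for o in outcomes) / len(outcomes):.0%}")
```

Under these assumptions the average sits close to the midpoint of the two parents, yet a noticeable share of simulated offspring fall outside the 50-55% range, which is the individual-level variation described above.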

  2. -Wouldn't Northeast European also be associated with Indo-Iranian speakers in South-Central Asia? I don't get how this existed before them.

    -Not sure about West European's presence in South-Central Asia? Is it being associated with ydna R1b carriers? Considering the lack of R1b lines this doesn't make sense.

    -So North Sea could also be Tocharian? Maybe there is Tocharian ancestry in the area after all (what clade of R1a would they have had? maybe this can be checked for). And South Baltic seems to be Indo-Iranian then? Were the Indo-Iranians more mixed than Tocharians then?

    What really puzzles me is that there are FOUR components but only ONE ydna line can be associated with them in the area. That is R1a-Z93+. I just don't understand how it can happen.

  3. "-Wouldn't Northeast European also be associated with Indo-Iranian speakers in South-Central Asia? I don't get how this existed before them."

    Not necessarily. If we consider the evidence presented by Malyarchuk et al. and Newton, those hunter gatherers who lay on the "migration corridor" during the LGM may have, theoretically, diverted southwards into South-Central Asia, which was a temperate desert at that time (

    "-Not sure about West European's presence in South-Central Asia? Is it being associated with ydna R1b carriers? Considering the lack of R1b lines this doesn't make sense."

    As my previous entry shows (Showcasing of Y-DNA Variation Among Afghan Ethnic Groups), there is indeed Haplogroup R1b1a2-M269 in Central Asia, although its origins were not validated sufficiently by the use of U106.

    Nevertheless, this little investigation showed that, on average, the presence of the West European component in Asia follows an extremely broad cline (peak in West, diminishes East) that is seldom kept to by many of the ethnicities sampled. This is why I suggested it is a generic West Eurasian signal.

    "-So North Sea could also be Tocharian? Maybe there is Tocharian ancestry in the area after all (what clade of R1a would they have had? maybe this can be checked for). And South Baltic seems to be Indo-Iranian then? Were the Indo-Iranians more mixed than Tocharians then?"

    It is impossible to make these sorts of conclusions (deducing ancient genetic compositions) exclusively from modern results. I am suggesting the North Sea component seen here may (operative word) be A signal from early pastoral agriculturalists speaking what would eventually become the Tocharian dialects, not necessarily THE Tocharian signal.

    1. That makes sense. -I guess the M269 in Central Asia would explain West European in Central Asia but not in South-Central Asia. But M269 having arrived from West Asia recently is also a possibility, since they didn't look at it closely enough, as you said. But other than that, only R1a-Z93+ fits these components, which is weird. Were even Tocharians Z93+? Even early hunter gatherers carrying Northeast European? Seems unlikely. It is mind boggling. Can we see anything about Altaic nomads introducing or elevating these components (or some of them) in South-Central Asia (or in Central Asia, i.e. were they there before the Turkic expansions or did Turks from near Siberia/Altai bring and elevate these components in Central Asia)? It is interesting that Tocharian C, which doesn't get talked about much, was the one with the most contact with Indo-Iranian, and the West Eurasian mummies were found in a Tocharian C speaking area. mtDNA lines would be interesting to look at.

  4. Northeast European peaks in the Volga region. Was that within the realm of Andronovo? Or maybe this is some sort of proto uralic influence in proto Indo-iranians showing up?

    Also is there any way to edit comments?

    1. "Was that within the realm of Andronovo?"

      No, but the Volga is where the Yamnaya/Pit Grave culture is situated, which is where the Proto-Indo-Iranians moved eastwards from to form the Sintashta culture, before finally forming the Andronovo archaeological horizon.

      "Or maybe this is some sort of proto uralic influence in proto Indo-iranians showing up?"

      Possible, linguistic contacts between Proto-Uralic and Indo-Iranian have long been confirmed.

      "Also is there any way to edit comments?"

      Unfortunately no. Make those precious seconds before you hit the "Publish" button count! Alternatively you can "Preview" your comment before finalizing.

    2. Contacts were there (with proto greek too) but I question the presence of admixture.

      And once again I don't get how all 4 North European components in Asia are tied to nothing but R1a-Z93+. The theories of Proto-Indo-Iranians being Z280 seem bogus. Z280 is Proto-Slavic and I doubt Proto-Slavs participated in the Andronovo people's ethnogenesis.

      Also, which mtDNAs do you think are Indo-Iranian (and not from Uralic speakers or marriage with the Tripolye people, which might have happened)?

      However, the actual settlement of Tripolye farmers in South-Central Asia seems bogus. Which mtDNA and Y-DNA correspond to that?

  5. I have a hard time believing Northeast Euro components were already present, especially since there is no Y-DNA or mtDNA evidence for it.

    Also, did these components show a connection with Altaic nomads inflating them in Central Asia or South-Central Asia (Altaic nomads from Kazakhstan and Uzbekistan, I suppose, since that's where most of the invasions were from and this Northern component there is Indo-Iranian to begin with)?

    What is your personal theory on West European and M269?

  6. This comment has been removed by a blog administrator.

  7. This comment has been removed by a blog administrator.

  8. But is there a chance some components are elevated due to assimilation? Either of Proto-Slavs or Finno-Ugrics?

    Maybe Pashtuns assimilating Russians/North Caucasians (inflating Northern Euro and the Caucasus component; Pashtuns have 17% of it)?

    1. I have also heard one of the Kazakh samples has European ancestry. Is that throwing off their score? For example, higher South Baltic and lower West Asian than you would expect?

      Also, where did West Asian admixture in them come from?
      Was all of Kazakhstan originally Indo-Iranian natively (even the eastern areas near the Altai; I believe that was always IE, though Tocharians and Afanasevo might have been there first)?

      Did the Persian empire settle Central Asia? And if they did, were the settlers Indo-Iranian speakers or all sorts of ethnicities from the empire that have nothing to do with Central Asia (Armenians, Assyrians, Arabs, Anatolians, Greeks, Caucasians, Syrians, Egyptians, etc.)?

      Did some West Asian admixture come from trade routes with the Near East, like from Assyria and such via the Caucasus?

  9. Tajiks having more Northern could be explained by more East Asian and Siberian (i.e. Turkish) input, lowering the value of the others slightly, if we presume the original Turks were predominantly East Asian and Siberian (unfounded speculation).

    I have no opinion on what mtDNA haplogroups the Indo-Iranians bore at present and which populations inherited some maternal lineages directly from them. This will be covered in future chapters of my "Secrets of Central Asia" series.

    The Persian empire(s), like most empires, did not produce a wholesale change to the genetic landscape of the territories they captured. It is more feasible to examine mtDNA/Y-DNA data on an individual basis and determine their broad regional matching for that. I would, however, speculate that a good number of Tajik lines are of Sassanian Persian stock and will trace right back to the Iranian plateau.

    In my opinion, the Proto-Indo-Europeans were a West Asian-North European hybrid population according to ADMIXTURE nomenclature, based on the simple fact that West Asian admixture in Europe and European admixture in select West/Central/South Asian groups are converse to one another. I am an ardent supporter of the Pontic-Caspian steppe theory and marry the autosomal evidence in modern populations with the linguistic evidence by assuming the Proto-Indo-Europeans were donors of (some) West Asian admixture in Europe and (much of) the European admixture in Asia. Both Dienekes and Polako have independently stumbled across parallel genetic signals deep into Europe and Asia from the other side. This cannot be a coincidence. To provide a deeper narrative, I also speculate the Proto-Indo-Europeans were themselves a hybrid population of local hunter-gatherer-foragers above the Pontic-Caspian watershed with early Neolithic migrants from the Near-East. Although I approach every other possibility with the same esteem, as we have no conclusive evidence at present, this is my view as it stands.

    I'd appreciate it if you did not post political or biased discourse on this blog (e.g. "Tashkorgan should be given to Tajikistan, shame about the Turks breaking through Central Asia"). This blog was not designed to facilitate such views. There are dozens of other forums that are. I have deleted the posts in question and will do so again if similar ones are posted in future.

  10. Maybe. But they don't seem that much more Northern than Pashtuns and Pakistanis. When we say Turks, do we mean Turks bringing down Northern European components from regions such as Kazakhstan (Indo-Iranian in ancient times) or bringing Northern European admixture from Siberia/the area north of Kazakhstan?

    My only issue with that is we don't know how much of the Northern components Tajiks would have if they had no Turkish admixture. One possibility is that the Turks they mixed with had more than Pashtuns but less than the unmixed Tajiks. In that way Turkic mixture could lower the Northern components the same way it probably did for Kazakhs. The Northern component of Tajiks is in line with their geography, i.e. just slightly more than Pashtuns, while Uzbeks are slightly more than Tajiks and Kazakhs slightly more than Uzbeks. This is the pattern you would expect to see if it is just an assimilated Indo-Iranian component.

    Also, groups like the Selkup, no matter how Northern European they are, are at least equally East Eurasian if not more. In order for them to inflate the Northern components in groups like Pashtuns, they would end up inflating East Eurasian components by at least 1 if not 1.5 to 2%. For that reason it seems unlikely that much of this Northern admixture is Turkic.

    I will say some of it might be related to the proto-Tocharians, who were said to have settlements around Tajikistan and the Aral Sea. Settlements around Tajikistan could also explain some of the East Eurasian admixture in Tajiks, as we know they took East Asian wives as opposed to West Asian wives early on.

    Neolithic migrants? Via Central Asia? Or from Tripolye via Anatolia? We don't see any Neolithic Y-DNA lineages in Central Asia that can't be explained from their own Neolithic. I would expect things like I2 and M269 to be the Neolithic lineages hunter gatherers mixed with.

    Apologies on the comments.

  11. Meant to write: inflating Northern European by 1% would entail inflating East Eurasian components by at least 1%, if not 1.5 to 4%.

  12. Wouldn't Mediterranean admixture be more likely than West Asian, btw? Andronovo had an mtDNA match with Tripolye and it was a super rare clade of mtDNA T.
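
The mixing arithmetic raised in comments 10 and 11 (a source population that raises the Northern European components would also raise East Eurasian) can be sketched with a simple linear mixture, where the admixed value is (1 - f) x recipient + f x source. This is only an illustration: the function name and every percentage below are hypothetical placeholders, not values taken from the Eurogenes spreadsheet.

```python
def admixture_shift(recipient, source, component, target_increase):
    """Find the source fraction f that raises `component` by
    `target_increase` percentage points under linear mixing, and
    report the resulting shift in every component."""
    gap = source[component] - recipient[component]
    if gap <= 0:
        raise ValueError("source must carry more of the component than the recipient")
    f = target_increase / gap
    shifts = {name: f * (source[name] - recipient[name]) for name in recipient}
    return f, shifts


# Hypothetical percentages, for illustration only.
recipient = {"North European": 10.0, "East Eurasian": 2.0}
source = {"North European": 30.0, "East Eurasian": 40.0}

f, shifts = admixture_shift(recipient, source, "North European", 1.0)
print(f"source fraction needed: {f:.2%}")
for name, shift in shifts.items():
    print(f"{name}: {shift:+.2f} points")
```

In this model the East Eurasian shift is at least as large as the Northern European shift whenever the source carries at least as much East Eurasian as Northern European and the recipient carries no more East Eurasian than Northern European, which is the situation the comments describe.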