Autosomal variation from Anatolia to the Tarim periphery

ADMIXTURE infers ancestral components, but it cannot readily tell us why several populations share a given Autosomal component. For instance, a component may originate in one population and be donated to others (e.g. purported African admixture in the Arabian Peninsula), stem from a common source population (e.g. West Eurasian-specific components shared by the Druze and the French Basque in low K=n runs) or be the result of genetic drift (e.g. potentially, the peaking of East Asian-specific components in Korea and Japan).

Nevertheless, using results from the latest Dodecad Ancestry Project K12b run (link), I have investigated the component variation along a west-to-east axis from Anatolia to the Tarim periphery in western China, with the intention of establishing the nature of the observed components across this area of interest. Raw values can be viewed on the newly-published Vaêdhya Data Sink. Populations are listed in geographical order along this cline.

One of the most immediate observations is the similarity between Kurdish and Iranian populations, with both expressing similar admixture percentages (the deviation per component is usually no more than 1%). This suggests that Kurds and Iranians have common origins, with the former largely maintaining those ancestral signals despite moving further westwards relative to their linguistic cousins in Iran.
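The comparison described above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the percentages are invented placeholders, not the Dodecad K12b values for any population.

```python
# Placeholder percentages only: these are NOT real Dodecad K12b values.
pop_a = {"Gedrosia": 28.0, "Caucasus": 40.0, "Southwest_Asian": 20.0, "North_European": 12.0}
pop_b = {"Gedrosia": 27.5, "Caucasus": 40.5, "Southwest_Asian": 19.0, "North_European": 13.0}


def component_deviations(a, b):
    """Absolute difference per component, in percentage points."""
    return {name: abs(a[name] - b[name]) for name in a}


def within_tolerance(a, b, tol=1.0):
    """True if no component differs by more than `tol` percentage points."""
    return all(dev <= tol for dev in component_deviations(a, b).values())


if __name__ == "__main__":
    for name, dev in component_deviations(pop_a, pop_b).items():
        print(f"{name}: {dev:.1f}")
    print("within 1 point on every component:", within_tolerance(pop_a, pop_b))
```

With real values, two populations that pass this check at a tolerance of 1 point would match the "usually not more than 1%" observation; the threshold itself is an assumption and can be adjusted.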

The near-congruency between the Assyrians and Armenians is also striking, bar variation in the North European, Caucasus and Southwest Asian components. It is again tempting to postulate that the two descend for the most part from a similar root population, with the aforementioned component differences accounting for the linguistic differences.

If one groups the Kurds with the Iranians, several of the Autosomal components shown here have a distribution that appears to be determined by geography alone:

  • South Asian peaks in Tajiks, who are situated roughly north-northwest of the Indian Subcontinent.
  • Caucasus reaches a maximum in Armenians and adjacent populations.
  • Atlantic Med steadily decreases as one moves further away from Europe.
  • Southeast Asian shows the inverse pattern to the above, reaching significant levels only in the Uyghurs.
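A geography-driven distribution of this kind can be checked mechanically: find where a component peaks along the cline and test whether it falls steadily in one direction. The sketch below assumes generic population labels (P1 to P5, ordered west to east) and invented placeholder percentages; none of the values come from the K12b run, and the function names are hypothetical.

```python
# West-to-east order of hypothetical populations; values are placeholders,
# NOT real Dodecad K12b percentages.
CLINE = ["P1", "P2", "P3", "P4", "P5"]

component_x = {"P1": 10.0, "P2": 8.0, "P3": 6.5, "P4": 3.0, "P5": 1.0}
component_y = {"P1": 2.0, "P2": 5.0, "P3": 9.0, "P4": 4.0, "P5": 3.0}


def peak_population(values):
    """Population with the highest percentage for a component."""
    return max(values, key=values.get)


def decreases_along_cline(values, order=CLINE):
    """True if the component never increases when moving along `order`."""
    series = [values[pop] for pop in order]
    return all(x >= y for x, y in zip(series, series[1:]))


if __name__ == "__main__":
    for name, comp in (("component_x", component_x), ("component_y", component_y)):
        print(name, "peaks in", peak_population(comp),
              "| steady decrease:", decreases_along_cline(comp))
```

A component that fails the monotonic test, like the placeholder component_y here, would belong with the more complicated distributions rather than the purely geographic ones.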

Other components appear to have more complicated distributions:

  • Interestingly, East Asian and Siberian are not too dissimilar in the populations containing them. The elevation of both in populations which speak Turkic/Altaic languages relative to neighbours speaking other languages confirms genetic input from the Turkic steppe nomads who expanded from the eastern side of Central Asia, eventually reaching the Iranian plateau and Anatolia. However, some of the Siberian and East Asian values may simply be the result of prehistoric demic diffusion across Eurasia (suggested by a potential gradient from Kurds/Iranians to Tajiks), although that gradient may itself derive from medieval steppe ancestry.
  • Southwest Asian peaks in Assyrians, the only Semitic-speaking population shown in this analysis. This component falls rapidly beyond the Iranian plateau but is found at a background frequency east of Turkmenistan. Whether this is again an artefact of prehistoric demic movements or of more recent migrations (e.g. Silk Road, various Persian empires) is debatable. As with the Siberian and East Asian components, there is an elevation which defies a geographical pattern and confirms historical accounts: the Tajiks, who descend in part from Persian speakers who fled Iran after the Sassanid collapse, show an elevation relative to the Uzbeks and Uyghurs. The greater frequency in Christian Armenians relative to the predominantly Muslim Kurdish territories and Iran discredits outright the notion that it was introduced by the Islamic expansion out of the Arabian Peninsula.
  • The Gedrosia component has a bifurcated peak between Iranians and Tajiks, implying an ultimate peak in the region of Pakistan (corroborated by other Dodecad population results, such as the Balochis of Pakistan). However, the Gedrosia frequency drops from a stable 28% across West Iranic-speaking populations to 13-18% in Anatolian Turks, Armenians and Assyrians. It is again impossible to infer whether this is of prehistoric origin (i.e. a mutual Neolithic phenomenon between the Iranian plateau and South-Central Asia) or more recent (inflated Gedrosia values being a function of Median, Persian and Parthian ancestry).
  • The North European component has what appears to be a dual geographic and linguistically-oriented distribution, which may be confounded further by recent interactions between Europe and some of the populations shown here (Anatolian Turks are potentially the clearest example of this). It is interesting to note that the Assyrians and Armenians show inverse values in the North European and Southwest Asian components despite otherwise appearing near-identical. The elevated frequency of this component in Central Asia will hopefully be covered in a future entry.

Despite the usefulness of ADMIXTURE in determining the approximate ancestral origins of populations and individuals, it cannot by itself establish the nature of a component X shared between populations A and B; such Autosomal results should ideally be complemented by historical, linguistic, archaeological and even deep paternal and maternal evidence (Y-DNA, mtDNA).

Some of the observations made in this entry were already apparent in earlier renditions of the population data; through the use of more deeply penetrating Autosomal techniques (such as IBD), the exact nature of the component variations should hopefully be resolved in the future.


The raw values used in this investigation are attributed to Dienekes Pontikos, author of the Dodecad Ancestry Project.


  1. Hi, Nice blog.
    Just wanted to say that the K12b run did not include the San and Hadza, whose inclusion would eliminate some of these clusters, in addition to shifting around the distribution of many of the remaining clusters on a global level. Please take a look at the K9 global run for a relative comparison.

  2. Thank you for your comment; I agree that the exclusion of reference populations from ADMIXTURE (such as the ones you mentioned) will likely lead to a re-shuffling of certain components. However, as African-relevant components are present in trace quantities at best in this particular analysis, the exclusion of the San or Hadza will probably not lead to any significant differences.

  3. It is who you remove from the data set that makes a difference. In other words, how genetically divergent a given population is from the rest of the data set has a direct impact on which populations a cluster will appear or disappear in, and on the distribution of the remaining clusters. For instance, removing the Chinese and keeping the Japanese in a global data set will have much less of an impact on the distribution of a fixed number of clusters than removing the San and keeping the Yoruba in the same data set, since the San are as genetically different from a Yoruba as they are from a European.

    The World9 global run, which included the San and the Hadza, produced a cluster nicknamed “Southern” that was very far-flung, from Northwest Africa to well into West Asia and from Sub-Saharan East Africa to Southern Europe. Below is what the populations outlined in the topic of your post scored for the “Southern” cluster:

    As you can see, however, this cluster has vanished in the K12 run. Had the same 'World9' data set been run in ADMIXTURE at K=12, this cluster would not simply have vanished.

  4. I respectfully disagree with your first point in this particular context. Removing African reference groups will not significantly affect the scores of populations that are almost entirely Eurasian.

    The absence of San or Hadza in the latest run only coincides with the dissolution of the "Southern" component, which has likely consolidated itself into at least the Southwest Asian and Caucasus components.

    You are describing two separate changes from K9b->K12b. That does not necessarily mean one change led to the other. There may have been some intra-component consolidation once the San and Hadza were removed, but the difference is largely accounted for by the increased K=n. Components become unstable beyond a certain K value (as previous Dodecad iterations and the Harappa Project have shown in older posts).

  5. What are the origins of the natives of Khuzestan? I have heard many Arabs say it is an extension of Mesopotamia. Isn't this where the Elamites were settled? The Zagros was supposed to be a barrier to gene flow, but it doesn't seem like Iranians living west of the Zagros are any different from their eastern counterparts, does it?