Wednesday, August 6, 2014

Anchored in Armenia: An Exercise in Genetic Relativity [Original Work]


Location of the Armenian Highlands in West Asia
As is the case with many groups in the region, the Armenians are, anthropologically speaking, a distinctive modern ethnicity. Situated in the Armenian Highlands (an expansive area lying between the Zagros and Caucasus ranges) with a settlement history dating back to the Neolithic, the modern Armenian people have maintained a distinct culture both shaped and shielded by the mountainous territory they inhabit. [1] One unique aspect of the Armenian people is their language: Modern Armenian is an Indo-European language forming its own branch. There has long been scholarly debate regarding its migration from the Proto-Indo-European homeland (commonly accepted by modern linguists as the Pontic-Caspian steppe) [2] to its historical seat in the South Caucasus. As is evident from the attested Urartian and Hurrian loanwords in later forms of the language, Armenian must have been spoken by the forebears of its current speakers since before 500 B.C. [3] Various genetics enthusiasts (including myself) have on different occasions cited this as an indication of an aboriginal West Asian genetic layer accompanying the Urartian-Hurrian vocabulary substratum.

Presumably due to the ongoing political instability in West Asia, there has been an unfortunate lack of ancient DNA (aDNA) recovered from areas adjacent to the Armenian Highlands. Alongside the Armenians, West Asia proper is also home to Anatolian Turks, numerous Kurdish groups, the Assyrians, several Jewish minorities and various ethnic groups within Iran. The interrelatedness of these groups, to differing extents, has been demonstrated both in published studies [4] and in open-source projects. [5,6]

Mount Ararat - A symbolic item in Armenian culture
Although they have most likely experienced their own demic events in prehistoric times, the insularity of the Armenians relative to their neighbours allows them to be used as a stand-in for the aDNA currently lacking in this part of the world. In this blog entry, the Armenians will therefore be treated as a surrogate for autochthonous West Asian ancestry, acting as a primary donor population (PDP) for several other West Asian groups. The aim is to flesh out the degree of shared ancestry, as well as the directions of additional affinities beyond the region. This is by no means an authoritative attempt to present a definitive picture of the West Asian genetic landscape; instead, it aims to provoke discussion and explore the underlying structure of the region in a manner that will hopefully yield fruitful results despite the glaring absence of aDNA.

Working Hypotheses

1. Given the demonstrated similarity in autosomal DNA profiles (here and here), modern Armenians will serve as a reasonable PDP for all tested populations.

2. Furthermore, the genetic difference (GD) will likely be dictated by geographical proximity to the Armenians, or a (lack of) history of admixture with them.

3. Finally, the secondary donor populations will be predictable by virtue of either geography or language.


The Dodecad K12b Oracle was used to undertake this small project (please see the link for technical information). When run through R, the program was set to Mixed Mode and fixed at 500 results for each population's run. The command entered therefore remained the same each time:


Samples consist of nine location-specific populations (Iranians, Kurds_Y, Azerbaijan_Jews, Iraq_Jews, Iran_Jews, Turks, Turks_Aydin*, Turks_Kayseri*, Turks_Istanbul*) and four Dodecad participant averages (Iranian_D, Kurd_D, Assyrian_D, Turkish_D). A total of thirteen populations were therefore included.

From the output, only those combinations expressing an Armenian population as a PDP were selected. Here, an Armenian population was considered a PDP if its "ancestral" percentage exceeded 50%. A maximum of ten combinations was collected per population; where more qualified, the list is terminated with an ellipsis.

* Although not included in the original Dodecad K12b Oracle dataset, Dienekes has conveniently shared the population averages for these samples here. These were manually inserted into the command.
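For readers unfamiliar with Mixed Mode, the Python sketch below illustrates the general idea of a two-population fit and the selection rule described above. It is not Dienekes' Oracle code. It assumes that GD is the Euclidean distance between K12b percentage vectors and that the best mixing fraction is found by projection; the function names and the three-component "_toy" profiles are hypothetical and purely illustrative.

```python
import itertools
import math

def best_two_way_mix(target, a, b):
    """Return (fraction of a, distance) for the best mix x*a + (1-x)*b.

    Assumption: GD is the Euclidean distance between admixture-percentage
    vectors. The optimal x is the projection of (target - b) onto (a - b),
    clipped to [0, 1].
    """
    a_minus_b = [ai - bi for ai, bi in zip(a, b)]
    t_minus_b = [ti - bi for ti, bi in zip(target, b)]
    denom = sum(d * d for d in a_minus_b)
    if denom == 0:
        x = 1.0  # identical reference profiles: any fraction gives the same fit
    else:
        x = sum(p * q for p, q in zip(t_minus_b, a_minus_b)) / denom
        x = min(1.0, max(0.0, x))
    mix = [x * ai + (1 - x) * bi for ai, bi in zip(a, b)]
    gd = math.sqrt(sum((ti - mi) ** 2 for ti, mi in zip(target, mix)))
    return x, gd

def mixed_mode(target, references, n_results=500):
    """Fit every pair of references; return (gd, major, major%, minor, minor%) sorted by GD."""
    results = []
    for (name_a, a), (name_b, b) in itertools.combinations(references.items(), 2):
        x, gd = best_two_way_mix(target, a, b)
        if x >= 0.5:
            results.append((gd, name_a, 100 * x, name_b, 100 * (1 - x)))
        else:
            results.append((gd, name_b, 100 * (1 - x), name_a, 100 * x))
    results.sort()
    return results[:n_results]

def armenian_pdp_fits(results, max_keep=10, threshold=50.0):
    """Keep fits whose major donor is an Armenian sample above the threshold.

    Returns the first `max_keep` fits and a flag saying whether more existed
    (the case marked with an ellipsis in this entry).
    """
    kept = [r for r in results if r[1].startswith("Armenian") and r[2] > threshold]
    return kept[:max_keep], len(kept) > max_keep

# Hypothetical three-component profiles (percentages), for illustration only.
TOY_REFS = {
    "Armenians_toy": [60.0, 30.0, 10.0],
    "Balochi_toy": [20.0, 20.0, 60.0],
    "Saudi_toy": [10.0, 80.0, 10.0],
}
TOY_TARGET = [48.0, 27.0, 25.0]  # built as 70% Armenians_toy + 30% Balochi_toy

if __name__ == "__main__":
    fits, truncated = armenian_pdp_fits(mixed_mode(TOY_TARGET, TOY_REFS))
    for gd, major, p_major, minor, p_minor in fits:
        print(f"{p_major:.1f}% {major} + {p_minor:.1f}% {minor} @ GD={gd:.2f}")
    if truncated:
        print("...")
```

Because the toy target was constructed from two of the references, the best fit recovers that 70/30 split with a GD close to zero; real K12b averages will of course never fit this cleanly.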


Iranian and Kurdish Oracle results
Unsurprisingly, the Iranians and Kurds all display similar results: each adopts either Makrani or Balochi as the secondary donor when Armenians are fixed as the PDP, and in comparable proportions. The Iranians fit the Armenian + Balochi/Makrani combination slightly better than the Kurds (GD=4.04-5.16 vs. 5.03-6.65 respectively, to 2 d.p.). It is also worth observing that neither Iranians nor Kurds, irrespective of sampling strategy (location-specific or Dodecad average), produced more than ten qualifying Mixed Mode results.

Assyrian and select Near-Eastern Jewish Oracle results
The Assyrians are of particular interest, given the demonstrated autosomal similarity between them and Armenians (here). As anticipated, their qualifying Mixed Mode results well exceed ten, and the best fits (GD=1.66-1.82 to 2 d.p.) are all, coincidentally, with the Near-Eastern Jewish groups studied here. Subsequent matches include additional populations (e.g. Saudi, Bedouin, Syrian), where the GD remains relatively small compared to the Iranian and Kurdish values (>3.15 to 2 d.p.).

The Near-Eastern Jewish groups largely mirror the Assyrian results, although some key differences should be outlined:

  • The Azerbaijani Jews have a GD similar in range to the Assyrians, setting them apart from the Iraqi and Iranian Jews. This seems to fit geography. However, if the association were strictly geographical, one would expect the Assyrians to lie between the Azerbaijani Jews on the one hand and the Iraqi and Iranian Jews on the other. This may be genetic evidence of additional, direct shared ancestry between Armenians and Assyrians at some (or various) point(s) after the Near-Eastern Jewish groups had formalised their identities.
  • Saudis appear as a secondary donor population in all groups. Interestingly, they appear to have an inverse relationship with geographic proximity to the Armenian Highlands; Iraqi, Iranian and Azerbaijani Jews are 20.4%, 16.1% and 7.8% "Saudi" respectively. The Assyrians too fall on this cline despite the point raised above.

Anatolian Turkish Oracle results
Finally, the Anatolian Turks provide us with another set of interesting values and pairs:

  • Mixed Mode results from Western Turkey (Aydin, Istanbul) mostly exhibit a combination of Armenian with various European ethnic groups or nationalities, which can predominantly be ascribed to geography. Note the comparatively large GD of the Aydin average (>9.93 to 2 d.p.), which contrasts with Istanbul. I suspect the cosmopolitan nature of Istanbul has artificially lowered the GD, given that Anatolian Turks from across the country have moved there for employment. [7]
  • In contrast, the samples listed as "Turks" in Dodecad K12b (from the Behar et al. dataset, located in Central-South Turkey) model well as a combination of Armenian with either the Chuvash, Nogay, Uzbek or Uyghur, although European secondary donors also appear. Note also that their GD is the smallest of the Turkish averages investigated (4.20 to 2 d.p.).
  • The Kayseri average (Central Turkey) yielded no results matching the criteria outlined in "Method". However, the Assyrians instead made a frequent appearance as primary donors from GD=6.17 onwards. Given the genetic affinity between Assyrians and Armenians (refer above), and the consistency displayed by the Armenians as a PDP for other Turkish averages, this result can be considered anomalous. A close inspection of the Dodecad K12b proportions reveals the Kayseri Turks were on average approximately 1.5% more Southwest Asian than all other Turkish populations, explaining why Assyrians took preferential placing over Armenians as the PDP. The cause of this slight increase is unknown at present.
  • The Turkish_D average best resembled that of Istanbul, albeit with a slightly higher Armenian and lower European proportion. This suggests that, overall, the Dodecad Turkish participants map somewhere just east of Istanbul despite their presumably diverse backgrounds.
  • Finally, all Turkish averages produced more than ten qualifying Mixed Mode results.

IBD Segment Indications

To corroborate the findings of this investigation with additional genetic data, I refer to the Dodecad Project's fastIBD analysis of Italy/Balkans/Anatolia and fastIBD analysis of several Jewish and non-Jewish groups. As the analyses do not completely encompass those groups studied here, the results cannot be accepted wholesale. However, there does appear to be a broad agreement with some of the results in this investigation. For example, the Armenians and Assyrians have a demonstrated level of "warmth" to one another beyond background sharing.

Further Work

This investigation would have benefited from Azeri Turkish samples from the Republic of Azerbaijan. A finer breakdown of Kurdish, Iranian and Assyrian samples, akin to the site-specific sampling of the Anatolian Turks here, would also have been ideal. Finally, as stated above, the investigation would have benefited from IBD segment analysis specific to the studied groups. Should time permit and the desired samples become available, this would be a natural line of inquiry to extend what has been explored here.


Addressing the three hypotheses stated at the beginning in order:

1. Armenians have certainly behaved as a reasonable proxy for an autochthonous West Asian PDP in most of the populations tested (the sole exception being the Kayseri Turks, although this appears to be an anomalous response to their slightly higher Southwest Asian scores). The Armenian proportion varies depending on the secondary donor, but the Assyrians and the Jewish populations from Azerbaijan, Iran and Iraq appear to have the largest share (occasionally surpassing 90%). All Iranians and Kurds, on the other hand, scored lowest overall (approximately 65-75%). The Turkish range lies between these two.

2. Unfortunately, this isn't clear. The lack of regional results for Kurds and Iranians, together with the lack of samples specifically from Eastern Turkey, prevents any firm conclusion on this point. The Near-Eastern Jewish populations studied here certainly do form a cline of Armenian "admixture" fully in line with geography, and the large GD observed in Aydin Turks also supports this idea, leading me to cautiously propose that geography does play a role. The Assyrian results provide a further partial answer, as the Assyrians show more Armenian affinity than their geographical placement would predict, based both on GD and on fastIBD evidence from elsewhere.

3. With the exception of the Assyrians and Near-Eastern Jewish groups, the secondary donors overwhelmingly matched my expectations for each group studied (e.g. Iranians and Kurds towards South-Central Asia, Turks towards either Europe or Central Asia proper).

Over the coming years, with the availability of more data, we should hopefully move away from the population averages that have been used by various open-source projects. It has been empirically demonstrated here that regional results will differ significantly from nationwide averages (e.g. Aydin Turks vs. Turkish_D).

This also holds true on an individual basis; the best Oracle match for one Iranian via the described methodology was 56.4% Armenians_15_Y + 43.6% Tajiks_Y (GD=5.44 to 2 d.p.), differing significantly from both the Iranian and Kurdish averages.

I suspect the gentlemen running the numerous open-source projects are aware of this caveat and are, justifiably so in my opinion, making do with currently available data.

In closing, presuming an Armenian-like autochthonous West Asian substrate, this investigation has also found that the studied populations as a whole share an apparent degree of inter-relatedness through this common South Caucasian autosomal heritage, albeit alongside highly significant affinities to other parts of Eurasia, whether population-wide, regional or even individual.


The first topic concerns the Iranians and Kurds: why were their average secondary donors always the Balochi and Makrani, rather than more northern groups such as the Tajiks? I suspect that, when applied to population averages, the Oracle program effectively smooths out intra-population variation to the point where only the broadest affinities are indicated. In the case of Iranians, the secondary donor would therefore be one whose genetic features emphasise the difference between Armenians and Iranians (e.g. additional South Asian and Gedrosian admixture). A similar conclusion can be reached with respect to the Turks.

Another interesting point is the demonstrated close relationship between the Assyrians and various Near-Eastern Jewish groups, which has been speculated upon in various discussion forums in the past. More precise tools will be required to determine whether these populations share genuine ancestry with one another, or whether the affinity is happenstance, instead reflecting the mixture of similar Near-Eastern groups with (again) similar Caucasus-derived groups at some point in history.

[Addendum I, 07/08/2014]: For a continuation on this with a fellow genome blogger, please read the Comments below.


Full credit for both the generation of raw population data and the Oracle program go to Dienekes Pontikos (Dodecad Ancestry Project).

Map of Armenian Highlands from Photo of Mount Ararat courtesy of

Finally, I refer all visitors interested in the genetic makeup of the Armenian people to the FTDNA Armenian DNA Project. For a more interactive learning experience, two of the administrators (Messrs. Simonian and Hrechdakian) recently delivered a lecture on this topic, supplementing it with a deeper treatment of the anthropological and geographical aspects described here.


1. Samuelian TJ. Armenian Origins: An Overview of Ancient and Modern Sources and Theories. [Last Accessed 03/08/2014]

2. Clackson J. Indo-European Linguistics: An Introduction. Cambridge Textbooks in Linguistics. [Last Accessed 04/08/2014]

3. Greppin JAC. The Urartian Substratum in Armenian. [Last Accessed 04/08/2014]

4. Grugni V, Battaglia V, Hooshiar Kashani B, Parolo S, Al-Zahery N et al. Ancient migratory events in the Middle East: new clues from the Y-chromosome variation of modern Iranians. PLoS One. 2012;7(7):e41252.

5. Dodecad Ancestry Project: ChromoPainter/fineSTRUCTURE Analysis of Balkans/West Asia. [Last Accessed 04/08/2014]

6. Eurogenes Genetic Ancestry Project: Updated Eurogenes K13 and K15 population averages. [Last Accessed 04/08/2014]

7. Filiztekin A, Gokhan A. The Determinants of Internal Migration in Turkey. [Last Accessed 05/08/2014]


  1. Interesting exercise. Although I personally feel that Armenians have too high a frequency of Western (European) lineages to be the ideal ancestral proxy that the whole experiment implies, I still feel it is legitimate to do this kind of analysis and that it can potentially provide informative results.

    "The first topic is regarding the Iranians and Kurds; why were their average secondary donors always the Balochi's and Makrani, rather than more northern groups, such as the Tajiks?"

    I would say that this is because Baloch and Makrani represent better as a whole their non "Highlander West Asian" origins, which is not just Steppe Indoeuropean but also X relation with the Neolithic ANI component of South Asia. Baloch almost certainly have both and hence work as a good proxy for the secondary component, which is in fact two (or rather three, as I will argue now).

    But Makrani work even better than Baloch, and that is IMO because of the third component affecting Iranians (and to a lesser extent Kurds): a "Lowlander West Asian" (Arab-like) component which has some African or para-African affinities. As is well known, the Makrani do have some African ancestry, and that is probably what tips the balance in making them the second proxy "source".

    My read is that the Makrani role here is caused by their embodying three different components affecting Iranians and Kurds: (1) ANI, (2) Central Asian Indoeuropean (Indo-Iranian) and (3) Arab-like "Lowlander West Asian" elements with a minor but decisive African-like affinity. This could be demonstrated if a 4-population model with the parameters suggested here scores significantly lower than the 2-population Armenian-Makrani one. For example, Iranians could be best expressed as 65% Armenian + 20% Brahui + 10% Saudi + 5% Central Asian (just a guess).

    In these tests Armenians would account well for the "Highlander West Asian" component but also for whatever other European admixture, which does not show up in any of the tests except in Western Turks, who are clearly much more European than Armenians and Central-Eastern Turks.

    So I would suggest that there are a number of actual ancestral components here:
    1. Highland West Asia (the main component expressed as "Armenian admixture")
    2. Lowland West Asia (expressed as "Arab" but also as part of the "Makrani", because it has small but influential African-like affinities)
    3. Neolithic South Asian (ANI) (main Baloch/Makrani factor)
    4. Central Asian Indo-Iranian (smaller factor within the Baloch/Makrani "admixture", that's why they and not the Brahui act as better second source population).
    5. Tatar Mongol-Siberian element (expressed as part of the "Chuvash admixture" in some Turks)
    6. European element (mostly absorbed by the decision of using Armenians as one of the proxies but, among Turks, expressed as either "Chuvash" or "Norwegian").

    I think that would be it. Notice that the 2-population admixture simplification may actually hide a much larger array of actual ancestral populations, which may or may not be detected in other tests, such as free-running ADMIXTURE, largely depending on the sampling strategy.

  2. Very informative reply Maju. Thanks very much for expressing your thoughts on this. I overwhelmingly agree with your breakdown and have two points for further discussion:

    1) Although I agree there is likely a Lowland West Asian component ("Arab-like", or genuine Arabic admixture in some individuals) in Iran, I'm not convinced this is being captured by the Baloch and Makrani. In practically every ADMIXTURE run I have seen (Eurogenes, Dodecad etc.), the unadmixed Armenians tend to have as much Southwest Asian as the Iranians, if not more. For an example highlighting this, in the Dodecad K12b spreadsheet (linked in main entry), Iranian_D is 12.4% SW Asian, whereas both Armenian_D and Armenians_15_Y hover around 14%. Even the demonstrated Arabic-admixed Iranians of the Behar dataset have a value which, for all intents and purposes, matches the Armenian value (14.2%). Thus, any Lowland West Asian ancestry would be swallowed up by the Armenians in Oracle. I suppose the Georgians or Abkhazians would make a better surrogate for a more isolated Highland West Asian ancestral component?

    After sleeping on this point, I have an alternative proposition for why the Makrani and Baloch took preferential placement as secondary donors rather than Tajiks, Burusho or Pashtuns; they, like the Iranian average, have less of the North European component. Although regional variation exists, the Iranian average is consistently in the 4-7% range. The Baloch and Makrani have even less (1-3%). In Dodecad K12b, the Iranian average (4.2-6%) is closer to the Makrani (1%) and Baloch (2.3%) than it is to the Pathan (13.2%). This also explains why the individual Iranian discussed in "Conclusion" does model as approximately half Tajik rather than half Baloch/Makrani. I included this as another example of significant deviation from a population average.

    2) Your general point regarding multiple ancestral signals is an intuitive one which I again agree with. To the more discerning genetics enthusiast, the Oracle results as they stand give a lot of information regarding both the general and specific picture of variation. I would imagine that, if the Oracle program were extended to include multiple-population admixture results, this picture would gain added definition. In addition to the example you shared, the Turkish_D result may resemble something like 50% Armenian + 25% Greek + 25% Uzbek. But we do our best with what is currently available.

    1. (1) "After sleeping on this point, I have an alternative proposition for why the Makrani and Baloch took preferential placement as secondary donors rather than Tajiks, Burusho or Pashtuns; they, like the Iranian average, have less of the North European component".

      Sounds good on first look but actually the Iranian average (your figures) is above the Baloch one, so it should not be that.

      Wondering: is it possible that Iranians actually retain (as I believe it happens with Peninsular Arabs) a residue of the original OoA peoples who remained behind in the Persian Gulf "Oasis"? This OoA element would weight towards Africa because of contrast with the main India/Indochina branch (ancestral to all Eurasians-plus). It would work similarly to what I suggested before but it would not be strictly related to the Lowland West Asian component.

      I believe this could be tested by free-running Iranians and a few controls in Admixture to sufficient K-depths and watching for Fst distances of the resulting components. Never did that with Iranians but I suspect I found such OoA era components among NW Africans (strongest in Southern Morocco) and among some Egyptians and Saudi-Arabs. I really wonder if something of the like is to be found among Iranians (it would make good sense if the Persian Gulf acted as "oasis" between the OoA and the Early Upper Paleolithic backmigration from the East).

    2. "Sounds good on first look but actually the Iranian average (your figures) is above the Baloch one, so it should not be that. "

      For the sake of discussion, let's suppose the Balochi-Makrani N Euro average score is 2%. The Pathan are 13.2%. The Iranian average score is 4.2-6%. This range lies below the midpoint between the Balochi-Makrani and Pathan values (7.6%). Thus, the secondary donors would have to be the Balochi or Makrani, as the Iranian values rest closer to them.
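      The midpoint comparison above can be checked with a short Python snippet. It only reproduces this one-dimensional arithmetic on the North European component (the 2% Balochi-Makrani figure is the assumed round value from this reply, and the names are hypothetical); the Oracle itself presumably weighs all twelve K12b components together, so this is an illustration rather than a model of the program.

```python
# One-component illustration of the reply's reasoning (North European %).
# BALOCHI_MAKRANI is the reply's assumed round figure, not a measured value.
BALOCHI_MAKRANI = 2.0
PATHAN = 13.2
IRANIAN_RANGE = (4.2, 6.0)

# Any value below this midpoint is nearer to Balochi/Makrani than to Pathan.
midpoint = (BALOCHI_MAKRANI + PATHAN) / 2

def nearer_donor(value):
    """Name the reference whose North European score lies closer to value."""
    if abs(value - BALOCHI_MAKRANI) < abs(value - PATHAN):
        return "Balochi/Makrani"
    return "Pathan"

if __name__ == "__main__":
    print(f"midpoint = {midpoint:.1f}%")
    for v in IRANIAN_RANGE:
        print(f"Iranian {v}% -> nearer to {nearer_donor(v)}")
```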

      "Wondering: is it possible that Iranians actually retain (as I believe it happens with Peninsular Arabs) a residue of the original OoA peoples who remained behind in the Persian Gulf "Oasis"? "

      I've been thinking similarly for years, although my reasoning was based on genetic evidence. When compared with other West Asians on 23andMe's old Global Similarity feature, the Iranians consistently show up with an extra affinity towards East Africa, despite the majority having no "African" admixture. Your suggested investigation would definitely be interesting to see eventually.

  3. Can I also add this research to your interesting post?

  4. And while I was posting here a brand new research got posted :)

  5. Hello bsw-am,

    Yes, I had seen that blog entry by Razib when it was first published. My criticism at the time was that the genetic layers present in Anatolia are likely more complicated than the assertion they are basically "Turkified Armenians". I suspect this is only true for the eastern portions of Turkey. This very entry demonstrates that Western Turks model better as predominantly being some sort of West Asian-derived population with significant European. I suppose the Greeks or certain Balkan populations would fit the bill better with them.

    Anecdotally, the results of Turks and Azeris from the eastern part of the country that I have seen on the forum I co-administer (Anthrogenica) do support the assertion that they are predominantly Armenian(-like in some cases) with varying levels of Central Asian input. Some exceed 20%, with one user elsewhere apparently scoring less than 1%. I'm a proponent of maintaining distinctions between national averages, regional trends and individual scores for this (and many other) reasons.