Vaêdhya

4Mix Ancients for PuntK12 Calculator

2016-07-02T07:25:00.004-07:00

Overview

4Mix is a nifty supplementary tool executed alongside GEDMatch calculator or ADMIXTURE outputs to establish the genetic distance and ancestral proportions of a given number of population combinations. Originally conceived by "DESEUK1" (Eurogenes Ancestry Project participant), it has been implemented numerous times across the wider genetic genealogy community.
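For readers who want to see the general idea in code, the following is a minimal sketch, assuming the tool works by a brute-force search over four-way percentage blends and scores each blend by its Euclidean distance to the target. It is not DESEUK1's R script. The function name `four_mix`, the population labels and every component value are hypothetical and chosen only for illustration.

```python
import math

def four_mix(target, sources, step=1):
    """Return (distance, weights) for the four-way blend closest to `target`.

    `target` is a list of calculator component scores (percent).
    `sources` maps four population names to lists of the same length.
    `step` is the percentage increment of the search and should divide 100.
    """
    names = list(sources)
    if len(names) != 4:
        raise ValueError("exactly four source populations are expected")
    best = None
    for a in range(0, 101, step):
        for b in range(0, 101 - a, step):
            for c in range(0, 101 - a - b, step):
                d = 100 - a - b - c  # the fourth share is whatever remains
                weights = (a, b, c, d)
                # Component-wise weighted average of the four source profiles.
                blend = [
                    sum(w * sources[n][k] for w, n in zip(weights, names)) / 100
                    for k in range(len(target))
                ]
                dist = math.dist(blend, target)
                if best is None or dist < best[0]:
                    best = (dist, dict(zip(names, weights)))
    return best

# Hypothetical four-component profiles, invented for illustration only.
sources = {
    "Source_A": [70, 10, 10, 10],
    "Source_B": [10, 70, 10, 10],
    "Source_C": [10, 10, 70, 10],
    "Source_D": [10, 10, 10, 70],
}
target = [40, 28, 22, 10]  # built as 50% A + 30% B + 20% C

distance, proportions = four_mix(target, sources, step=10)
print(proportions, round(distance, 4))
```

A coarser `step` keeps the search fast; a 1% step evaluates roughly 177,000 blends, which is still manageable in pure Python for a single target.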

With Lazaridis et al. 2016's recent "The genetic structure of the world's first farmers" [Link], crucial aDNA from the Near East has been published and put to use by citizen scientists.

This brief entry provides users with an immediate means of assessing their ancestral proportions with the new releases through the PuntDNAL K12 calculator.

The R script, an example target file, the population source data and the ReadMes (DESEUK1's original and my own contribution outlining the "sink" version's procedure) can be found at the link below:

>> Download Lazaridis Updated 4Mix PuntK12 Ancients Package (tinyupload.com)<<

Purpose of the Package

This modification was simply designed to give the wider genetic genealogy community an easy and informal means of manipulating this recent data to explore ethnogenesis or personal ancestries at their own discretion. This is not a formal assessment of the above.

Limitations

Those intending to use this 4Mix package must be aware of the following:

1) The Iran_N, Iran_ChL and Levant_N samples here are GEDMatch contributions by genome bloggers "Kurd" and "Srkz". These currently number one, two and one respectively.

2) The utilisation of these samples as references is a short-term convenience and should not be considered equivalent to ADMIXTURE runs that include these samples. The methodology described above opens the potential for Davidski's "Calculator Effect" to manifest.

3) Due to the continued absence of Ancestral South Indian (ASI) aDNA, the Paniya were considered a "last resort" surrogate to account for the ancestral proportions South/South-Central Asian samples would generate. Furthermore, additional modern reference populations (i.e. Yoruba, Nganasan) were used to fill other worldwide aDNA gaps. These populations were chosen based on their peak modal status in the K's determined by the PuntDNAL K12 calculator.

Contributions

A very special thank you to the users "jesus" and "khanabadoshi" from Anthrogenica for their guidance and assistance in modifying the package for your usage. Another extended thank you to the user "surbakhunWessste" (also from Anthrogenica) for outlining the "sink" procedure here.

Identifying Bias in Cohorts: IBD and Life Stage Effect [Review]

2016-03-19T16:55:00.000-07:00

A very interesting paper, published barely one week ago, investigates potential sources of bias in population genetics cohorts:

Reducing bias in population and landscape genetic inferences: the effects of sampling related individuals and multiple life stages.
Peterman W, Brocato ER, Semlitsch RD, Eggert LS.
PeerJ. 2016 Mar 14;4:e1813. doi: 10.7717/peerj.1813. eCollection 2016.

"In population or landscape genetics studies, an unbiased sampling scheme is essential for generating accurate results, but logistics may lead to deviations from the sample design. Such deviations may come in the form of sampling multiple life stages. Presently, it is largely unknown what effect sampling different life stages can have on population or landscape genetic inference, or how mixing life stages can affect the parameters being measured. Additionally, the removal of siblings from a data set is considered best-practice, but direct comparisons of inferences made with and without siblings are limited. In this study, we sampled embryos, larvae, and adult Ambystoma maculatum from five ponds in Missouri, and analyzed them at 15 microsatellite loci. We calculated allelic richness, heterozygosity and effective population sizes for each life stage at each pond and tested for genetic differentiation (F_ST and D_C) and isolation-by-distance (IBD) among ponds. We tested for differences in each of these measures between life stages, and in a pooled population of all life stages. All calculations were done with and without sibling pairs to assess the effect of sibling removal. We also assessed the effect of reducing the number of microsatellites used to make inference. No statistically significant differences were found among ponds or life stages for any of the population genetic measures, but patterns of IBD differed among life stages. There was significant IBD when using adult samples, but tests using embryos, larvae, or a combination of the three life stages were not significant. We found that increasing the ratio of larval or embryo samples in the analysis of genetic distance weakened the IBD relationship, and when using D_C, the IBD was no longer significant when larvae and embryos exceeded 60% of the population sample. Further, power to detect an IBD relationship was reduced when fewer microsatellites were used in the analysis."

[Abstract]

How relevant is the above to human population genetics? Quite, for two reasons:

1) The study offers a unique angle on sampling methods under the widely accepted IBD model. IBD was significant with adult samples but not with embryos, larvae or a mix of life stages, and the relationship weakened as the share of larvae and embryos in the sample grew. This suggests social mobility plays a role in obscuring intra-species IBD measurements. This is clearly mitigated in human settlements with extreme geographical isolation.
2) More microsatellite markers are usually better. Genetic genealogists or researchers familiar with Y-chromosomal analyses are already aware of this mantra, so it is no surprise that the authors found their statistical power increased when the maximum number of markers was employed.

The abstract, rather unhelpfully, does not reveal the outcome of the comparison made with and without sibling pairs.

A full read of the paper at some point should hopefully address the above, as well as the raw data produced through the statistical calculations.

Pain: OPRM1 & The Ancestral Contribution [Review]

2015-09-23T09:43:00.000-07:00

A paper by Soto & Catanesi assessing genetic variation in the μ (mu) opioid receptor gene (OPRM1) was published in May 2015. OPRM1 contributes to the structure of the μ opioid receptor (MOR), one of three major opioid receptor types which broadly contribute to pain sensation, addiction, ion influx into cells and a host of other functions [1].

WHO Pain Ladder (courtesy of paineurope.com)

Opioid receptors are of interest to medical researchers because conventional treatments vary in the specificity of their receptor agonism (activation); tramadol, for example, has a higher affinity for MORs than for the other receptor types [2]. Additionally, opioid receptor agonists are widely used in pain management (as directed by the World Health Organisation's classic "pain management ladder", figure opposite [3]). As such, understanding how opioid receptor characteristics vary between individuals could theoretically help match the ideal pharmaceutical agents to specific situations, such as palliative or acute care, as well as narcotic or surgical rehabilitation (in effect stratified or personalised medicine).

Although the paper notes contradictory findings regarding the most studied SNP to date (rs1799971, the A118G polymorphism), other SNPs residing within or near OPRM1 are postulated to have an effect on MOR function.

From a population genetics perspective, OPRM1 is of interest because numerous older studies (listed within the paper) show differing A118G polymorphism frequencies across various world populations. The authors extended these findings by combining the HapMap world database with their own Argentinian samples to determine whether OPRM1 SNP variants correlated with ancestral background. Establishing the polymorphic frequency among Argentinians appears to be a secondary aim here.

The authors of this paper concluded that Sub-Saharan African, West and East Eurasian ancestral status coincides with OPRM1 polymorphism status in the several SNPs examined (through the use of Fst and AMOVA). However, they noted that no such clustering was observed between West Eurasians (Europeans) and American populations (mixed Argentinian and Mexican samples). This was taken to indicate extensive gene flow from European colonists had made a massive contribution to the polymorphic status at this gene.

Another possibility not highlighted in the paper is that the native Amerindian population of pre-colonial America had an OPRM1 polymorphic status similar to that of modern West Eurasians. In light of the recent findings supporting sizeable mutual prehistoric ancestry between these two populations through a conceptual "Ancestral North Eurasian" (ANE) component (Raghavan et al., see below) [4], the OPRM1 congruity between Amerind-European mixed modern Americans and Europeans could partially be attributed to the proposed ANE-containing migratory events.

Estimated Shared Drift Heat Map with MA1 (Raghavan et al.)

Overall, Soto & Catanesi provide us with a good summary of the population structure that can be directly observed through OPRM1 gene polymorphism variation and support earlier work indicating a correlation with ancestry. Their discussion would have benefited from some exploration of recent developments in archaeogenetics. Their assertion of complete OPRM1 SNP status replacement among Amerind-European admixed American individuals is of course possible, but no evidence is provided that categorically dismisses commonality between Europeans and Amerindians in this genomic region as at least partially responsible for the observation.

Human population genetic structure detected by pain-related mu opioid receptor gene polymorphisms.
López Soto EJ, Catanesi CI. Genet Mol Biol. 2015 May;38(2):152-5. doi: 10.1590/S1415-4757382220140299. Epub 2015 May 1.

"Several single nucleotide polymorphisms (SNPs) in the Mu Opioid Receptor gene (OPRM1) have been identified and associated with a wide variety of clinical phenotypes related both to pain sensitivity and analgesic requirements. The A118G and other potentially functional OPRM1 SNPs show significant differences in their allele distributions among populations. However, they have not been properly addressed in a population genetic analysis. Population stratification could lead to erroneous conclusions when they are not taken into account in association studies. The aim of our study was to analyze OPRM1 SNP variability by comparing population samples of the International Hap Map database and to analyze a new population sample from the city of Corrientes, Argentina. The results confirm that OPRM1 SNP variability differs among human populations and displays a clear ancestry genetic structure, with three population clusters: Africa, Asia, and Europe-America."

[Abstract] [Full Article]

References

1. Feng Y, He X, Yang Y, Chao D, Lazarus LH, Xia Y. Current research on opioid receptor function. Curr Drug Targets. 2012 Feb;13(2):230-46.

2. Dayer P, Desmeules J, Collart L. [Pharmacology of tramadol]. Drugs. 1997;53 Suppl 2:18-24. [Article in French]

3. WHO | WHO's cancer pain ladder for adults. [Last Retrieved 22/09/2015]: http://www.who.int/cancer/palliative/painladder/en/

4. Raghavan M, Skoglund P, Graf KE, Metspalu M, Albrechtsen A, et al. Upper Palaeolithic Siberian genome reveals dual ancestry of Native Americans. Nature. 2014 Jan 2;505(7481):87-91. doi: 10.1038/nature12736. Epub 2013 Nov 20.

Steppe Ancestry Estimations in West, Central & South Asia (Ancestral Proportions Method) [Original Work]

2015-08-06T18:59:00.003-07:00

Disclaimer
This is largely a re-post, albeit with additional explanations, from a recent ADMIXTURE autosomal run (Eurasia K20) performed at Anthrogenica by the user Kurd. Full technical information and the original files may be found in his original thread. Full acknowledgement is provided to him for the great work. Unless stated otherwise, assume the contents refer to the Eurasia K20 run. This entry may be updated at any time to include further investigations based on future runs. Finally, this entry assumes the mainstream Pontic-Caspian theory for the genesis of the Indo-Europeans to be fully accurate.

Preamble
This entry/repost contains a "quick and dirty" method for a preliminary attempt at deriving Sintashta admixture levels in West, Central and South Asians based on the Eurasia K20 scores. Given the different admixture histories elsewhere in Eurasia, this probably won't be very informative for users with ancestral backgrounds outside the lands between Kurdistan and the Indo-Gangetic plains. This is especially the case with modern Europeans, who share the same core components with Sintashta, while also deriving their own Indo-European ancestries from different archaeological cultures and time periods.

Establishing the Context
According to this Eurasia K20 run, Sintashta are approximately 62% Yamnaya, 22% EEF, 10% European and 3% SHG_WHG. Sintashta, at present, appear to be the best proxy for the Indo-Iranians that arrived in West and South Asia. The above four components define the majority (94%) of Sintashta's autosomal profile here.

As discussed elsewhere in Anthrogenica (kudos to user Sein for pointing this out), Sintashta should be considered better surrogates for the Andronovo-related waves which reached West, Central and South Asia than the actual Andronovo samples derived from Allentoft et al. 2015. This is due to the Andronovo samples being derived from the extreme northeast of the archaeological horizon (above the Altai, near Afanasievo). Their position opens up the possibility for extraneous admixture from other steppe groups (including early speakers of Tocharian through Afanasievo?).

The user Kurd has previously demonstrated that recent steppe-related admixture may be segregated from other components. While undertaking this exercise, it also looks like Kurd has done an excellent job addressing the "teal" component that defined up to half of Samara Yamnaya and a big chunk of Sintashta. Kurd's K20 is, in my view, the most effective attempt thus far at separating the complicated autosomal overlapping in West and Central Eurasia.

Introducing the Ancestral Proportions Method (APM)
At present, the genetic landscape in West, Central and South Asia presents as a triple conundrum:

1) To date (and with the exception of the poor quality Barcin Neolithic Turkish sample), there is absolutely no interpretable ancient DNA (aDNA) from any of these regions, from any point in this broad area's history. This is perhaps the greatest obstacle at present.
2) Autosomal and uniparental marker data from across the region are either inconsistent in sampling strategy or outdated, preventing a knowledge-based approach towards interpreting results.
3) Archaeological evidence is inconsistent across the region; some cultures are well-studied, whereas others have fallen to mirthful speculation or cannot be readily assigned to any particular prehistoric group.

The APM is, in principle, unconcerned with these issues. Instead, it relies on objective data from a single ancient population to discern the numerical degree of overlap with modern populations.

Whether or not Iranians, Punjabis or Nepalis derive the bulk of their ancestry from unrelated group X or Y is beside the point. The sole purpose of the APM is, therefore, to establish whether or not ancient population Z has left any genetic imprint on modern populations A-K, and if so, to what extent. As such, the methodology described here is completely different in that it is asymmetrical: it models one-way gene flow across space and time from one ancestral (extinct) population to numerous extant populations. APM or derivative approaches should be considered supplementary to, rather than in direct competition with, symmetrical modelling techniques such as f3 statistics.

The APM was specifically designed to answer one question: to what degree did Sintashta-related populations contribute to the modern groups of West, Central and South Asia? This simple inquiry has a tendency to attract considerable debate and wildly differing estimates in online discussion boards. Today, using the APM and recently generated data from the Eurasia K20 run, I hope to provide one set of estimations completely independent of extraneous modelling factors.

This approach is entirely reliant on high component specificity (e.g. minimal overlap or bleed-over from one component to another). This particular parameter is not within my control in this instance. As such, the outputs from APM here should be considered cautionary preliminary estimates at best, given the potential for ADMIXTURE-related shortcomings in the absence of relevant aDNA. I anticipate this approach will be much more effective at gleaning admixture extents once aDNA from West and Central Asia dating <2200 B.C. are retrieved.

The APM Approach
To contrast against the ADMIXTURE Sintashta scores, two different approaches are utilised together:

1) Direct Overlap (DO): In summary, for each component, the maximum overlap between a given population average and Sintashta's score is calculated. This is done individually across all four components (Yamnaya, EEF, European, SHG_WHG) with the outputs added. See image below for schematic (conceptual breakdown of how the DO principle works between hypothetical samples 1 and 2, with Components a-d representing the distinct components).

Schematic diagram showing the principle behind Direct Overlap calculation

2) Component Proportions (CP): A single dominating component (frequency > 50%) is considered modal for the ancestral population of choice, with the other values considered as a fraction of this in modern populations. Given the Yamnaya component makes up almost two-thirds of Sintashta, the ratio between a population's and Sintashta's Yamnaya score is calculated and re-applied to the rest.
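The two calculations can be sketched as follows. This is one possible reading of the descriptions above rather than the exact formulas used for the published figures: DO is taken as the sum of the smaller of the two scores per component, and CP as Sintashta's four-component total scaled by the Yamnaya ratio. The function names and the example population profile are hypothetical; only the Sintashta scores come from this entry.

```python
# Sintashta's approximate Eurasia K20 scores as quoted in this entry (percent).
SINTASHTA = {"Yamnaya": 62.0, "EEF": 22.0, "European": 10.0, "SHG_WHG": 3.0}

def direct_overlap(population, reference=SINTASHTA):
    """DO: per component, count only the share both profiles have in common."""
    return sum(min(population.get(c, 0.0), score) for c, score in reference.items())

def component_proportions(population, reference=SINTASHTA, modal="Yamnaya"):
    """CP: scale the reference's component total by the modal-component ratio."""
    ratio = population.get(modal, 0.0) / reference[modal]
    return ratio * sum(reference.values())

# Hypothetical population average, invented purely for illustration.
example = {"Yamnaya": 12.4, "EEF": 4.0, "European": 12.0, "SHG_WHG": 1.0}

do_score = direct_overlap(example)
cp_score = component_proportions(example)
print(f"DO = {do_score:.1f}%, CP = {cp_score:.1f}%")
```

For this invented profile DO comes to 27.4% and CP to about 19.4%. The European score of 12% is capped at Sintashta's 10% in the DO sum, yet all 10% still counts towards the total.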

There are, however, problems with either approach:

1) DO is overinflated the more West Eurasian a population is. For example, several of the Iranian or Kurdish users at Anthrogenica had component scores greater than what is found in Sintashta (e.g. European being 12% in one sample, when it's 10% in Sintashta). This biases the results for Iranians and Kurds greatly, even when absolute value adjustments are set in place (it is highly improbable that an Iranian with 10% European derived all of it from Sintashta).

2) CP is more accurate given the Yamnaya component appears highly steppe-oriented in Eurasia K20 and can therefore serve as a direct admixture marker. However, some of the South Asians score very little, or almost none, of the other key components found in Sintashta (e.g. EEF). Due to this, CP doesn't fully account for the "missing variation" in South Asians, biasing the results slightly in their favour.

One convenient workaround is undertaking an average of both scores. However, given CP is intuitively more accurate due to the reasonable specificity of the Yamnaya component, a weighted average biased in favour of CP by a ratio of 3:1 was undertaken. The ratio choice in this variant of the APM is arbitrary here. Other variants (2:1, 4:1) would not result in radically different outcomes.
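As a minimal sketch of the weighting step, the function below takes the CP:DO ratio as a parameter so the variants can be compared side by side. The function name is hypothetical, and the inputs are the Andronovo DO and CP figures quoted under Internal Validation.

```python
def weighted_apm(do_score, cp_score, cp_weight=3.0, do_weight=1.0):
    """Weighted average of the DO and CP scores, biased towards CP."""
    return (cp_weight * cp_score + do_weight * do_score) / (cp_weight + do_weight)

# Andronovo DO and CP values quoted in the Internal Validation section.
andronovo_do, andronovo_cp = 83.9, 97.3

for cp_weight in (1, 2, 3, 4):
    estimate = weighted_apm(andronovo_do, andronovo_cp, cp_weight=cp_weight)
    print(f"{cp_weight}:1 -> {estimate:.2f}%")
```

With these inputs the estimates span roughly 90.6% (1:1) to 94.6% (4:1), so the choice of ratio moves the result by a few percentage points at most. A 2:1 weighting gives about 92.8% and a 3:1 weighting about 94.0%, which makes the function a convenient way to check which ratio reproduces a given published figure.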

Results
Full results from up to 24 populations are shown in the Data Sink (interactive chart below). Summarised, Pamiri Tajiks are the most Sintashta-derived at 31.9%. North Caucasian (Ossetian) and Central Asian ethnic groups (Pashtuns, Uzbek, other Tajiks, Turkmen) follow at 19-22%. Various other ethnic groups across West, Central and South Asia follow. The lowest scoring population sampled here is the Makrani at 9.2%.

Internal Validation
The output (Data Sink) readily demonstrates strong correlation between DO and CP scores per population (e.g. Tajik Pamiris at 34.2% & 30.8%, Nepalis at 15% & 14.5% and Makrani at 10.2% & 8.7% respectively account for the top, middle and bottom pairs). The only marked deviation between the DO and CP scores was noted in West Asian populations (Armenians, Kurds, Iranians), as mentioned previously. Thus, empirical confirmation of correlation (e.g. Spearman's rank order) is unwarranted here.

Another means of confirming the validity of APM is to confirm Andronovo is a descendant of Sintashta. As the Andronovo archaeological horizon originates from Sintashta directly, one would expect very high (>90%) Sintashta-derived ancestry among them.

This appears to be the case. Compared against Sintashta, Andronovo exhibits DO = 83.9%, CP = 97.3%, an average of 90.6% and a weighted average of 92.8%.

Summarised, these two results (dataset-wide correlation, ancestral-immediate successor high overlap) validate the outcomes of the APM.

Closing Thoughts
The results featured in this entry are in line with broad uniparental marker data and previously published IBD results (unfortunately removed from sources), and are largely (though not fully) compatible with the degree of archaeological input from Andronovo-derived cultures in Asia. As stated previously, due to the shortcomings outlined earlier, they should not be considered definitive.

Given the CP here is not exclusively associated with Sintashta, I anticipate this technique will be more accurate if future "steppe"/"Yamnaya"/"Yamnaya_related" components are shown to define more of the Sintashta samples. I look forward to extending this method in the near future.

Acknowledgements
Special thanks to the user Kurd from Anthrogenica for making this data available and obliging member inquiries with productive responses, as well as the user Sapporo for generating several of the population averages.

Comparison of Online Y-STR Predictors (Petrejcíková et al.) [Review]

2015-07-28T05:56:00.000-07:00

Introduction
An interesting study was published in 2014 based on Slovak samples tested for 12 Y-STR microsatellite markers. The main scope of this paper appears to be the investigation of the efficacy of three publicly available Y-STR haplogroup predictors (Athey, Cullen and YPredictor in alphabetical order) based on these 12 Y-STRs. Study contents shown below.

Y-SNP analysis versus Y-haplogroup predictor in the Slovak population.
Petrejcíková E, Carnogurská J, Hronská D, Bernasovská J, Boronová I, Gabriková D, Bôziková A, Maceková S. Anthropol Anz. 2014;71(3):275-85.

Human Y-chromosome haplogroups are important markers used mainly in population genetic studies. The haplogroups are defined by several SNPs according to the phylogeny and international nomenclature. The alternative method to estimate the Y-chromosome haplogroups is to predict Y-chromosome haplotypes from a set of Y-STR markers using software for Y-haplogroup prediction. The purpose of this study was to compare the accuracy of three types of Y-haplogroup prediction software and to determine the structure of Slovak population revealed by the Y-chromosome haplogroups. We used a sample of 166 Slovak males in which 12 Y-STR markers were genotyped in our previous study. These results were analyzed by three different software products that predict Y-haplogroups. To estimate the accuracy of these prediction software, Y-haplogroups were determined in the same sample by genotyping Y-chromosome SNPs. Haplogroups were correctly predicted in 98.80% (Whit Athey's Haplogroup Predictor), 97.59% (Jim Cullen's Haplogroup Predictor) and 98.19% (YPredictor by Vadim Urasin 1.5.0) of individuals. The occurrence of errors in Y-chromosome haplogroup prediction suggests that the validation using SNP analysis is appropriate when high accuracy is required. The results of SNP based haplotype determination indicate that 39.15% of the Slovak population belongs to R1a-M198 lineage, which is one of the main European lineages.

[Abstract] [Direct Link]

Are They Really Comparable?
Although all three predictors returned similar efficacy rates (~97-99%), it should be noted the authors' chief divisions of interest appear to be the conventional subclade designations currently used in both literature and the genetic genealogy community (e.g. R1a1a-M198). The authors correctly state Y-SNP testing is paramount in definitively gauging subclade classifications, especially for lines substantially downstream of a given haplogroup's phylogeny.
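To put the reported percentages in perspective, they can be converted back into head counts for the 166-man sample. The sketch below does only that arithmetic; the function name is hypothetical and the counts are implied by the abstract's figures rather than quoted from the paper.

```python
SAMPLE_SIZE = 166  # Slovak males genotyped, per the abstract

# Accuracy figures reported in the abstract (percent).
reported_accuracy = {"Athey": 98.80, "Cullen": 97.59, "YPredictor": 98.19}

def implied_counts(accuracy_percent, n=SAMPLE_SIZE):
    """Return (correct, misassigned) head counts implied by a percentage."""
    correct = round(accuracy_percent / 100 * n)
    return correct, n - correct

for name, pct in reported_accuracy.items():
    correct, wrong = implied_counts(pct)
    print(f"{name}: {correct} correct, {wrong} misassigned")
```

The percentages correspond to 2, 4 and 3 misassigned individuals for Athey, Cullen and YPredictor respectively, so the three predictors are separated by no more than two men in this sample.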

The rest of this entry determines whether these calculators display any other features which may give aspiring researchers reasons to choose one over another.

Subclade Coverage
A substantial difference is observed between the three. Athey's output is oriented around 21 categories spread across most of the major clades/subclades, although haplogroups not commonly found in West Eurasia (e.g. A-D) are unrepresented. Cullen improves on this significantly with 86 subclades, with Y-DNA I receiving the most attention (R1b to a lesser extent), along with further improvements such as the inclusion of "A&B". YPredictor has the highest count, hosting over 100 subclades, with the majority found in Y-DNA haplogroups E, G, J, N and R. With the exception of Y-DNA M and S, all are accounted for here.

STR count
Athey is capable of handling 111 Y-STRs (21 and 27-STR versions also available) with the format being listed in either numerical or Family Tree DNA (FTDNA) order. Cullen accepts a maximum of 67 STRs. YPredictor houses approximately 82 STRs. As such, all three are capable of handling a considerable number.

Interface
All three predictors permit the use of batched data and provide different means of categorising the data as seen fit by the user. Instructions are adequately provided for all three as well. As a research utility, however, YPredictor stands out through its custom YFiler iterations (a widely-used format in population genetics publications concerning Y-STRs) and debug feedback before predictions are made by the calculator.

Computational Time
This varies based on the user's CPU speed, as well as whether they are manually entering STR values or inserting batched data. As such, this probably shouldn't be a pertinent factor in deciding which calculator to use.

Output Information
All three produce similar information (subclade prediction with probability expressed as a percentage).

Conclusion
Before summarising these findings, it is worth noting that Athey's predictor predates Cullen's and YPredictor. As such, any perceived deficiencies in subclade breakdown or functionality are likely a result of age. Athey's predictor was widely used in the past, irrespective of the current application rate.

All three predictors are of use to genetic genealogists. This entry concludes the following "idealised" purposes for each:

Athey - For users keen to utilise upwards of 111 FTDNA Y-STRs as cross-validation against the other two
Cullen - Best for those seeking refined Y-DNA I or R1b subclade predictions
YPredictor - Most versatile and research-friendly, best worldwide coverage of Y-DNA subclades

As such, the three calculators certainly are comparable for making basic Y-STR predictions for West Eurasians, but obvious differences exist with respect to non-West Eurasian subclade coverage.

If compelled to make a single choice, I would recommend Cullen first to genetic genealogists of Northwest European paternal heritage (given the high frequencies of Y-DNA I and R1b). YPredictor would be the best choice for those belonging to subclades more common outside Europe. This also explains why it has been extensively used in this blog to date. Athey's function has otherwise been usurped by the other two.

Presenting Bakhtiari Uniparental Marker Data [Original Work]

2015-07-09T05:47:00.003-07:00

Introduction

Bakhtiari people (Google Search)

The Bakhtiari people are one of Iran's ethnic minorities. Inhabiting the Iranian plateau's southwestern portion, the Bakhtiari traditionally maintained a hierarchical social structure with a genealogical basis (with organisations or positions including rish safids, kalantars, khans and ilkhani) [1]. Historically, the Bakhtiari have played a role in several pivotal events leading up to the formation of the modern Iranian state [2].

In recent years, the Bakhtiaris have received additional attention in the literature with respect to ancestry. This has been achieved predominantly via uniparental markers (Y-DNA and mtDNA) and coincides with work addressing the genetic origins of other ethnic minorities in Iran. For instance, in 2012, Grugni et al. expanded our understanding of Iranian Y-DNA across the country through sampling almost 1,000 unrelated men across 15 distinct ethnic groups (previous entry).

In spite of such developments, however, the Bakhtiari remain understudied compared with other groups, both in the genetic genealogy community and in the literature. This entry attempts to explore the available data and arrive at a stable set of results for this group.

Method

Khuzestan province, Iran (Wikipedia)

Search engines were limited to PubMed and Google Translate. Search terms included "Bakhtiari", "Y-DNA", "Y-Chromosome", "mtDNA", "mitochondrial", "STR", "SNP", "HVR" and "Iran". No limit was placed on publication date. All mtDNA and Y-DNA data was compiled. Where Y-STRs are presented, these were run through Vadim Urasin's YPredictor (v1.0.3 offline version). A 70% prediction strength threshold was implemented. If the resulting data is sparse, novel ways of consolidating the information will have to be devised and explained during the course of this entry.

Search Outcomes
Three studies were found to contain Bakhtiari uniparental data, with one partially covering Bakhtiari mtDNA (Derenko et al. 2013 [3]) and two for Y-DNA (Nasidze et al. 2008 [4], Roewer et al. 2009 [5]). The Bakhtiari populations featured mostly reside in Izeh, Khuzestan province, Iran [3-5] with a single sample coming from Lurestan province, Iran [4].

mtDNA Results
Derenko et al. featured only two Bakhtiari samples. One belonged to mtDNA H*, which was also observed in several Persian (Kerman province) and Qashqai samples, alongside a single Armenian [3]. The only other sample was mtDNA U2d2, also found in a single Persian (Kerman province). The authors noted that the combined frequency of mtDNA U2c and U2d in Iran was highest among the Persians nationwide (approaching 10%) [3]. However, given the absence of additional samples, no reasonable conclusions can be drawn from these results.

Nasidze et al. provide both frequency and HVR1-derived variance data on the Bakhtiari and Ahwazi Arab populations [4]. The Bakhtiari appear to chiefly belong to mtDNA haplogroups N, U, H, T and J (below).

mtDNA Frequency Data from Khuzestan province, Iran (Nasidze et al. 2008)

Unfortunately, further information on subclade breakdown is not provided. However, as concluded by the authors and as is evident from the frequency data, the mtDNA profile of the Bakhtiari is almost identical to the Ahwazi Arab sample. Additionally, Nasidze et al. note "considerable sharing of HV[R]1 sequences" between these two groups [4]. In tandem with the inferences described above through Derenko et al., it appears that significant matrilineal marker overlap does exist across the Iranian plateau.

Y-DNA Results
Nasidze et al. first published data on 53 unrelated Bakhtiari men [4]. Due to substandard Y-SNP genotyping, the only conclusion that may broadly be drawn is that the Bakhtiari chiefly belong to Y-DNA haplogroups J2-M172 (25%) and G-M201 (15%) (Data Sink). In this respect, these results cannot give observers a reliable indication of the Bakhtiari Y-DNA profile. Roewer et al.'s data indicates that some Bakhtiari share the same core 17-STR haplotypes with one another (e.g. J2a4, T*) but not with any other samples across the country [5].

One "quick and dirty" way of addressing this problem is by using the YFiler (17 STR) Bakhtiari haplotypes (Data Sink) from Roewer et al. to "recharacterise" the Nasidze data. This is deemed the most suitable option for two reasons:
1) Nasidze et al. has an adequate sample size (n=53) but inadequate Y-SNP genotype selection
2) Roewer et al. has an inadequate sample size (n=18) and no confirmed Y-SNP testing, but the YPredictor data should provide reasonable subclade determination with a 70% probability threshold in place

"Recharacterisation" is achieved by redistributing each Nasidze et al. haplogroup frequency across the subclades predicted from the Roewer et al. haplotypes, in proportion to their predicted frequencies. For example, Nasidze et al. found "DE-YAP" at 8%, with the Roewer et al. predicted results showing 5.6% each for "DE*" and "E1b1b1". As both these subclades are contained within the DE-YAP node, the original value is recharacterised as DE 4% and E1b1b1 4%. The outcome is presented numerically (Data Sink) and demonstrated below (values rounded down to fit to 100%):

Y-DNA J2a4 constitutes the largest subclade (22.1%), with H (10.8%), R1a1a (8.9%) and T* (8.5%) following. The results imply considerable Y-SNP diversity within the Izeh Bakhtiari.
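The proportional step behind the "recharacterisation" can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above, not the spreadsheet actually used for the Data Sink; the function name and its arguments are my own labels.

```python
def recharacterise(observed_pct, predicted_pct):
    """Split one observed parent-node frequency across predicted subclades.

    observed_pct: frequency (%) of the parent node in the larger study
        (here, Nasidze et al.).
    predicted_pct: dict of subclade -> predicted frequency (%) in the
        smaller study (here, Roewer et al.), restricted to subclades that
        sit under that parent node.
    """
    total = sum(predicted_pct.values())
    if total <= 0:
        raise ValueError("no predicted subclades fall under this node")
    # Each subclade receives a share proportional to its predicted frequency.
    return {name: observed_pct * pct / total
            for name, pct in predicted_pct.items()}


# Worked example from the text: DE-YAP at 8%, with DE* and E1b1b1
# each predicted at 5.6%.
de_yap = recharacterise(8.0, {"DE*": 5.6, "E1b1b1": 5.6})
print(de_yap)
```

With equal predicted frequencies the 8% is simply halved, giving approximately 4% to each subclade, as in the example above.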

These results are somewhat at odds with those suggested by the Roewer et al. figures, particularly the frequency of Y-DNA J2-M172 (50% in Roewer et al. vs. 25% in Nasidze et al.). The most likely basis for this is sampling bias, given the former only tested 18 individuals. It should be noted that Y-DNA J-12f2 has been documented to have a major (>60%) presence in Southwestern Iran (Quintana-Murci et al. 2001) with the majority of this likely being represented by downstream J2-M172 subclades (as per Grugni et al. 2012). It is therefore plausible for some Bakhtiari groups to yield exceptionally high frequencies of Y-DNA J2-M172 (likely J2a4 subclade) with future testing. The breakdown shown above is also broadly in line with past data from Southwestern Iran (Grugni et al. 2012).

It must be cautioned that literal interpretation of these results (both subclade breakdown and numbers) is not advised due to the inaccuracies brought by the "recharacterisation" and the lack of Y-SNP confirmation in Roewer et al.

It should also be emphasised that, as a tribal group, the Bakhtiari have most likely undergone genetic drift in their uniparental markers over time. As such, the finding of ~10% Y-DNA H is not completely surprising. Whether these values will be substantiated in future work is an open question.

Conclusion
The current evidence does suggest that the Bakhtiari closely resemble and share heritage with their immediate neighbours matrilineally, resting upon a backdrop of some common mtDNA diversity across the Iranian plateau. Inferences beyond this point will fall towards the realm of speculation.

The situation appears somewhat inverted on the Y-DNA side, where no Y-STR haplotype sharing is observed with other groups in the Iranian plateau. The "recharacterised" data gives us an approximate idea of what the Bakhtiari Y-DNA profile might have looked like had Nasidze et al. used a better Y-SNP genotype panel.

Other ethnic minorities in Iran have received consistent attention in this respect, such as the neighbouring Qashqai and Lurs (Farjadian et al. 2011). The paucity of Bakhtiari uniparental marker data indicates this is very much an area that needs immediate attention. A first direction for researchers is to sample at least 50 unrelated individuals from Izeh using a more conventional Y-SNP genotype panel. Additional clarity will be gained by testing further areas, as well as reconciling the Bakhtiari tribal structure with these outcomes.

Acknowledgements
A very special thanks to the user "J Man" from Anthrogenica for bringing this interesting topic to my attention.

[Edit 10/07/2015]: I have also learned while researching this topic that Dr. Ivan Nasidze unfortunately passed away in 2012. His work served as an important early foundation towards understanding the genetic constitution of Caucasian and Iranian populations. May he rest in peace.

References
1. Bakhtiari. Last Accessed 25/06/2015: http://www.everyculture.com/Africa-Middle-East/Bakhtiari.html

2. Study of the Qajar government policy at the case of Household Bakhtiari. Last Accessed 6/07/2015: http://waliaj.com/wp-content/2014/Issue%201,%202014/26%202014-30-1-pp.124-127.pdf

3. Derenko M, Malyarchuk B, Bahmanimehr A, Denisova G, Perkova M, Farjadian S. Complete mitochondrial DNA diversity in Iranians. PLoS One. 2013 Nov 14;8(11):e80673. doi: 10.1371/journal.pone.0080673. eCollection 2013.

4. Nasidze I, Quinque D, Rahmani M, Alemohamad SA, Stoneking M. Close genetic relationship between Semitic-speaking and Indo-European-speaking groups in Iran. Ann Hum Genet. 2008 Mar;72(Pt 2):241-52. doi: 10.1111/j.1469-1809.2007.00413.x. Epub 2008 Jan 20.

5. Roewer L, Willuweit S, Stoneking M, Nasidze I. A Y-STR database of Iranian and Azerbaijanian minority populations. Forensic Sci Int Genet. 2009 Dec;4(1):e53-5. doi: 10.1016/j.fsigen.2009.05.002. Epub 2009 Jun 5.

Worldwide Population Y-DNA Collated (Xu et al.) [Review]

2014-09-05T06:44:00.005-07:00

Approximately one week has passed since a new paper by Xu et al. was indexed by PubMed and made available online ahead of printing:

"The Y chromosome is one of the best genetic materials to explore the evolutionary history of human populations. Global analyses of Y chromosomal short tandem repeats (STRs) data can reveal very interesting world population structures and histories. However, previous Y-STR works tended to focus on small geographical ranges or only included limited sample sizes. In this study, we have investigated population structure and demographic history using 17 Y chromosomal STRs data of 979 males from 44 worldwide populations. The largest genetic distances have been observed between pairs of African and non-African populations. American populations with the lowest genetic diversities also showed large genetic distances and coancestry coefficients with other populations, whereas Eurasian populations displayed close genetic affinities. African populations tend to have the oldest time to the most recent common ancestors (TMRCAs), the largest effective population sizes and the earliest expansion times, whereas the American, Siberian, Melanesian, and isolated Atayal populations have the most recent TMRCAs and expansion times, and the smallest effective population sizes. This clear geographic pattern is well consistent with serial founder model for the origin of populations outside Africa. The Y-STR dataset presented here provides the most detailed view of worldwide population structure and human male demographic history, and additionally will be of great benefit to future forensic applications and population genetic studies."

This paper showcases a staggering 979 distinct Y-DNA 17 STR haplotypes across 44 distinct populations from across the world. These haplotypes are soon to be uploaded to the Y-STR Haplotype Resource Database (YHRD). The authors have made all the haplotypes, together with a slew of additional information, publicly available independent of the official article (raw haplotypes, Y-DNA haplogroup predictions).

In this entry, the collated results of all populations are reviewed, together with cursory inferences intended to aid their interpretation.

Method
All 979 haplotypes were retrieved through the above link. Each population dataset was run through Vadim Urasin's YPredictor (v1.5.0). A 70% prediction strength threshold was implemented. All nomenclature was reduced to the haplogroup level to avoid confusion for future readers should it change in time. These haplotypes formed the collated population results.

Results
877 haplotype predictions met the 70% threshold established. Without having access to the original study, it is apparent that the authors also used Urasin's YPredictor, given the identical predictions.
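For readers wishing to repeat the collation on their own prediction lists, the thresholding and frequency step can be sketched as below. This is a minimal Python illustration under my own assumptions: the input is taken to be a list of (haplogroup label, prediction strength in %) pairs already copied out of the predictor, the threshold is treated as inclusive, and the sample values are invented purely for demonstration.

```python
from collections import Counter


def collate(predictions, threshold=70.0):
    """Keep predictions at or above the threshold and tally frequencies.

    predictions: list of (haplogroup_label, strength_pct) tuples.
    Returns (number_kept, {haplogroup_label: frequency_pct}).
    """
    kept = [label for label, strength in predictions if strength >= threshold]
    counts = Counter(kept)
    n = len(kept)
    freqs = {label: 100.0 * count / n for label, count in counts.items()} if n else {}
    return n, freqs


# Hypothetical predictions for one population (not real data).
example = [("R-M269", 99.1), ("R-M269", 85.0), ("H-M82", 72.3), ("I-M223", 55.0)]
n_kept, freqs = collate(example)
print(n_kept, freqs)
```

Haplotypes falling under the threshold are simply dropped rather than pooled into an "unassigned" class, so the frequencies are relative to the retained haplotypes only.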

The collated population results have been organised by the location of sampling by continent or region and can be found in the Data Sink. Direct links to each section accompanied by the list of populations sampled are listed below for the reader's convenience with a brief runthrough of some interesting findings under each.

1. Europe Adygei (Russia), Chuvash (Russia), Danes (Denmark), Finns (Finland), Hungarians (Hungary), Irish (Ireland), Khanty (Russia), Komi (Russia), Russians (Archangelsk), Russians (Vologda), Yakut (Russia)

The Adygei present as expected; they are predominantly G-P15 and J-L26 with various subclades of haplogroup R. Various subclades of haplogroups N and R define the Chuvash, with an additional appearance by J-L26 and Q-MEH2. Ethnic Russian populations appear to have their own regionalised diversity on the backdrop of being predominantly R-M198 and downstream subclades (particularly R-M458). The Irish are predominantly (~81%) R-M269, although the presence of a single man with H-M82 is surprising. Finally, the Yakut too belong overwhelmingly to haplogroup N (~78%) with a single man being predicted as I-P37.2.

2. Middle-East Druze (Israel), Samaritans (Israel), Yemenite Jews (Yemen)

The Druze are one of the better-sampled populations in this study, where they are mostly represented by various subclades of haplogroups E and G, together with R-M269 and T-L162. The Samaritans are defined (in order of decreasing frequency) exclusively by J-L26, J-P58 and E-V22. Finally, the Yemenite Jews present with a similar (though more restricted) spectrum as the Druze with some differences in frequency.

3. East Asia Ami (Taiwan), Atayal (Taiwan), Cambodians (Cambodia), Chinese (USA), Chinese (Taiwan), Hakka (Taiwan), Japanese (USA), Koreans (S. Korea), Laotians (Laos)

The Ami are unsurprisingly defined mostly by downstream subclades of haplogroup O, although there does appear to be an I-M223 and L-M317 among them. The Atayal, also of Taiwan, are exclusively O-MSY2.2. The Cambodians appear to have even more lineages which are typically expected further west. The Japanese boast the highest frequency of D-M55 out of all the populations sampled (21.1%). The Korean results contrast with this through the presence of men with N*-LLY22g(xM128,P43,Tat) and Q-MEH2. The Laotians appear to have one man with DE*-M1, although this will require SNP testing to definitively confirm.

4. Africa Ashkenazi Jews (S. Africa), Biaka Pygmies (CAR), Chagga (Tanzania), Ethiopian Jews (Ethiopia), Hausa (Nigeria), Ibo (Nigeria), Masai (Tanzania-Kenya), Mbuti Pygmies (Congo R.), Sandawe (Tanzania), Yoruba (Nigeria)

The Ashkenazi Jews of South Africa appear to have a Y-DNA spectrum that is completely typical of Southwest Asians (please compare with the Druze). The Bagandu are largely defined by subclades of haplogroups B and E. Tanzanians here are completely haplogroup E and T. The presence of G-M15, J-L26 and R-M269 among the Hausa is surprising and may be attributed to a colonial European presence or some other forms of interaction. The Sandawe have some rather unusual results given their geographical position (I-P37.2 and Q-MEH2), raising the possibility these haplotypes were predicted incorrectly.

5. Australasia Micronesians (Micronesia), Nasioi Melanesians (Solomon Islands)

Both the Micronesians and Melanesians have an unusually diverse spectrum. It is difficult to ascertain whether the parahaplogroups shown are genuine or, as described above, a result of incorrect predictions. A recent paper revealing the presence of newly discovered offshoots from haplogroup K in Southeast Asia [1] raises the possibility some of these may be genuine.

6. Americas Karitiana (Brazil), African Americans (USA), European Americans (USA), Maya (Mexico), Pima (USA), Rondonian Surui (Brazil), Ticuna (Brazil)

The Karitiana are predominantly Q-MEH2 but appear to have some non-American admixture through E-U175. African Americans are represented as an approximately 4:6 mix of R-M269 against various haplogroup E subclades. The Maya population, like the Karitiana, are Q-MEH2 with additional markers from outside the Americas, as are the Pima. The trend continues with the Quechua people, although C-M217 and T-L162 make their first appearance here. Finally, the Rondonian Surui and Ticuna are completely Q-MEH2.

Criticisms
There are at least three areas of the authors' methodology which are deemed to be drawbacks and prevent this study from being exceptionally informative.

Firstly, the authors evidently used the YFiler sampling array to complete this investigation. In an era where commercial testees can enjoy upwards of 111 Y-STRs, the long-term usefulness of this paper's extensive worldwide sampling is cut short. Another recent paper presenting Y-STRs worldwide has done so using 23 rather than just 17. [2]

Secondly, my comments are more critical of the authors' sampling strategy. More data is never strictly a burden in the world of population genetics, but the informativeness of groups such as "European Americans", "Irish" and Chinese born in the USA is questionable. For instance, these groups are already richly represented, be it in the current literature or FTDNA Project groups. The apparent issue with these samples would have been rectified if they were simply obtained from a single area, providing regional specificity which may prove useful in better establishing genetic variation within Ireland, for example.

Finally, the haplotypes could have also received a "backbone" SNP test each to definitively place them within the current phylogeny. The drawbacks of STR-alone testing became readily apparent with some of the African samples. I can only speculate it is the highly divergent nature of certain uniquely African haplotypes from Eurasian ones which produced these spurious results.

On Mutation Rates (Quick Discussion)
In this study, both BATWING and the average squared distance (ASD) method were used. Within each, four different mutation rates were implemented. On initial inspection the resulting ages appear to vary wildly. However, on closer examination, it appears all the BATWING most recent common ancestor (MRCA) calculated ages are approximately twice as old as those generated by the ASD method. Even within each technique there is substantial variation; the ages produced by the evolutionary rate appear approximately three times greater than those produced by the others. Furthermore, these "other" mutation rates do tend to congregate around a common value (e.g. through BATWING, the calculated global age of their Y-DNA R-M198 haplotypes was 5.5k, 6.1k and 6.2kya), which would intuitively suggest the "actual" value lies somewhere within these, either through BATWING or ASD. The discrepancy here cannot be overstated and raises the question of why some researchers are still utilising a "blanket" mutation rate across several loci which are shown to have significantly different tendencies to mutate (colloquially described as "slow", "medium" and "fast" mutators). I am uncertain whether the authors are in fact doing this, but the implications are apparent: such discrepancies prevent any rational "fitting" of these numbers into candidate prehistoric narratives. This entire topic will likely be explored in a future entry.
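To make the dependence on the mutation rate concrete, here is a minimal Python sketch of an ASD-style age estimate. It assumes the textbook single-step mutation model, under which the average squared difference between sampled and ancestral repeat counts grows as (mutation rate x generations). The haplotypes, the choice of ancestral haplotype, the two rates and the 25-year generation time are all illustrative assumptions of mine and are not taken from Xu et al.

```python
def asd_tmrca_years(haplotypes, ancestral, mutation_rate, generation_years=25.0):
    """Estimate a TMRCA (in years) from the average squared distance.

    haplotypes: list of equal-length lists of repeat counts.
    ancestral: assumed ancestral haplotype (e.g. the modal haplotype).
    mutation_rate: assumed rate per locus per generation.
    """
    n_loci = len(ancestral)
    total = 0.0
    for hap in haplotypes:
        total += sum((x - a) ** 2 for x, a in zip(hap, ancestral))
    asd = total / (len(haplotypes) * n_loci)
    generations = asd / mutation_rate
    return generations * generation_years


# Hypothetical three-locus haplotypes (not real data).
haps = [[14, 13, 30], [15, 13, 29], [14, 12, 30]]
modal = [14, 13, 30]

age_fast = asd_tmrca_years(haps, modal, mutation_rate=0.0021)
age_slow = asd_tmrca_years(haps, modal, mutation_rate=0.0007)
print(age_fast, age_slow, age_slow / age_fast)
```

Because the estimate is inversely proportional to the rate, a rate three times lower yields an age three times older from exactly the same haplotypes, which is the kind of spread discussed above.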

Conclusion
Although at least three drawbacks (four including the MRCA calculations) are identified here, this study provides researchers worldwide with a plethora of data from populations that are either poorly represented in the current literature or have been entirely absent until present. The majority of the results outline the wide Y-chromosomal diversity across the world, whilst also revealing specific trends that have been established in both the current literature and in online discussion boards. An mtDNA counterpart of this paper would be a wonderful addition to see sometime in the near future.

There is a bountiful amount of data to be interpreted with pre-existing ideas/models and compared with prior studies which place a premium on each population's area. I welcome any form of dialogue regarding the results. There is, for many of us, plenty to elucidate. The conclusion does not end here; I encourage as much further investigation and thought by the readers as the data permits.

[Addendum @ 05/09/2014]: Error regarding Karitiana data. Modified and updated.

Citations
1. Karafet TM, Mendez FL, Sudoyo H, Lansing JS, Hammer MF. Improved phylogenetic resolution and rapid diversification of Y-chromosome haplogroup K-M526 in Southeast Asia. [Last Retrieved 03/09/2014]: http://www.nature.com/ejhg/journal/vaop/ncurrent/full/ejhg2014106a.html

2. Purps J, Siegert S, Willuweit S, Nagy M, Alves C, Salazar R et al. A global analysis of Y-chromosomal haplotype diversity for 23 STR loci. [Last Retrieved 05/09/2014]: http://www.fsigenetics.com/article/S1872-4973%2814%2900084-2/abstract

Anchored in Armenia: An Exercise in Genetic Relativity [Original Work]

2014-08-06T16:51:00.002-07:00

Introduction

Location of the Armenian Highlands in West Asia

As is the case with many groups in the region, the Armenians are, anthropologically speaking, a unique modern ethnicity. Situated in the Armenian Highlands (an expansive area straddling the Zagros and Caucasus ranges) with a settlement history dating to the Neolithic, the modern Armenian people have maintained a distinct culture both shaped and shielded by the mountainous territory they inhabit. [1] One unique aspect of the Armenian people is their language; Modern Armenian is an Indo-European language belonging to its own branch. There has long been scholarly debate regarding its linguistic exodus from the Proto-Indo-European homeland (commonly accepted by modern linguists as the Pontic-Caspian steppe) [2] through to its historical seat in the South Caucasus. As is evident from the attested Urartian and Hurrian loanwords in later forms of the language, Armenian must have been spoken by the forebears of its current speakers since at least before 500 B.C. [3] Various genetics enthusiasts (including myself) on differing occasions have cited this as an indication of an aboriginal West Asian genetic layer accompanying the Urartian-Hurrian vocabulary substratum.

Presumably due to the ongoing political instability in West Asia, there has been an unfortunate lack of ancient DNA (aDNA) recovery in the areas adjacent to the Armenian Highlands. Alongside the Armenians, West Asia proper is also home to Anatolian Turks, numerous Kurdish groups, the Assyrians, several Jewish minorities and various ethnic groups within Iran. Inter-relation of all these groups to differing extents has been demonstrated in both published studies [4] and the open-source projects. [5,6]

Mount Ararat - A symbolic item in Armenian culture

Although they have most likely experienced their own demic events in prehistoric times, the insular nature of the Armenians relative to their neighbours allows them to be used as a stand-in for the aDNA we currently lack in this part of the world. In this blog entry, the Armenians will therefore be considered as a surrogate for autochthonous West Asian ancestry. They will be treated as a primary donor population (PDP) for several other West Asian groups, in an attempt to flesh out the degree of mutual shared ancestry, as well as the directions of added affinities beyond the region. This is by no means an authoritative attempt to purport a particular image of the West Asian genetic landscape, but an attempt instead to provoke discussion and explore the underlying structure of the region through a manner that should hopefully yield fruitful results in the glaring absence of aDNA in the region.

Working Hypotheses

1. Given the demonstrated similarity in autosomal DNA profiles (here and here), modern Armenians will serve as a reasonable PDP for all tested populations.

2. Furthermore, the genetic difference (GD) will likely be dictated by geographical proximity to the Armenians, or a (lack of) history of admixture with them.

3. Finally, the other donor populations will be anticipated either by virtue of geography or language.

Method

The Dodecad K12b Oracle was used to undertake this small project (please visit link for technical information). When executed through R, the program was set to Mixed Mode and fixed to 500 results for every iteration per population. The command entered therefore remained the same each time:

DodecadOracle("WestAsianPopulation",mixedmode=T,k=500)
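For readers unfamiliar with what a two-population "mixed mode" search involves, the idea can be sketched in Python as below. This is not Dienekes' R implementation; I am assuming a Euclidean distance between admixture vectors and substituting a closed-form least-squares mixing proportion for whatever search the Oracle actually performs. All function names, population labels and component values are hypothetical.

```python
import math
from itertools import combinations


def best_mix(target, a, b):
    """Return (share_of_a, distance) for the closest two-way mix to target."""
    diff = [x - y for x, y in zip(a, b)]
    denom = sum(d * d for d in diff)
    if denom == 0:
        alpha = 0.5  # identical sources: any split gives the same mix
    else:
        alpha = sum((t - y) * d for t, y, d in zip(target, b, diff)) / denom
        alpha = min(1.0, max(0.0, alpha))  # shares must lie between 0 and 1
    mix = [alpha * x + (1 - alpha) * y for x, y in zip(a, b)]
    dist = math.sqrt(sum((t - m) ** 2 for t, m in zip(target, mix)))
    return alpha, dist


def mixed_mode(target, references, k=500):
    """Rank every pair of reference populations by distance; keep the top k."""
    results = []
    for (name_a, a), (name_b, b) in combinations(references.items(), 2):
        alpha, dist = best_mix(target, a, b)
        results.append((dist, name_a, 100 * alpha, name_b, 100 * (1 - alpha)))
    results.sort(key=lambda row: row[0])
    return results[:k]


# Hypothetical three-component averages (percentages), not Dodecad data.
refs = {"PopA": [80, 20, 0], "PopB": [20, 60, 20], "PopC": [0, 10, 90]}
target = [62, 32, 6]  # constructed as 70% PopA + 30% PopB
top = mixed_mode(target, refs)[0]
print(top)
```

The top-ranked row recovers the 70/30 split the target was built from, with a distance of effectively zero; real population averages will of course never fit this cleanly.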

Samples consist of nine location-specific populations (Iranians, Kurds_Y, Azerbaijan_Jews, Iraq_Jews, Iran_Jews, Turks, Turks_Aydin*, Turks_Kayseri*, Turks_Istanbul*) and four Dodecad participant averages (Iranian_D, Kurd_D, Assyrian_D, Turkish_D). A total of thirteen populations were therefore included.

From the output, only those combinations expressing an Armenian population as a PDP were selected. In this context, the Armenians will be considered a PDP if their "ancestral" percentage exceeds 50%. A maximum of ten were collected per population. In the event the number of combinations exceeded this, the subsequent combination lists are terminated with an ellipsis.

* Although not included in the original Dodecad K12b Oracle dataset, Dienekes has conveniently shared the population averages for these samples here. These were manually inserted into the command.
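The selection rule described above (Armenian share above 50%, at most ten combinations per population) can be sketched as a simple filter. This Python snippet is illustrative only: it assumes the Oracle output has already been parsed into (population, %, population, %, GD) rows, identifies Armenian averages by the substring "Armenian" in the label, and uses invented rows rather than actual Oracle output.

```python
def select_armenian_pdp(rows, max_rows=10, min_share=50.0):
    """Keep combinations where an Armenian population exceeds min_share.

    rows: list of (pop_a, pct_a, pop_b, pct_b, gd) tuples.
    Returns (selected_rows, truncated), where truncated flags that more
    than max_rows combinations qualified (the ellipsis case).
    """
    selected = []
    for pop_a, pct_a, pop_b, pct_b, gd in rows:
        armenian_primary = (
            ("Armenian" in pop_a and pct_a > min_share)
            or ("Armenian" in pop_b and pct_b > min_share)
        )
        if armenian_primary:
            selected.append((pop_a, pct_a, pop_b, pct_b, gd))
    selected.sort(key=lambda row: row[4])  # best fit (smallest GD) first
    truncated = len(selected) > max_rows
    return selected[:max_rows], truncated


# Hypothetical rows for demonstration (not real Oracle output).
rows = [
    ("Armenians_15_Y", 71.0, "Balochi", 29.0, 4.1),
    ("Makrani", 55.0, "Armenians_15_Y", 45.0, 6.0),
    ("Assyrian_D", 80.0, "Balochi", 20.0, 3.5),
    ("Makrani", 30.0, "Armenian_D", 70.0, 4.6),
]
kept, truncated = select_armenian_pdp(rows)
print(kept, truncated)
```

Rows where the Armenians are only a minority donor, or absent altogether, are discarded even when their GD is smaller.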

Results

Iranian and Kurdish Oracle results

Unsurprisingly, the Iranians and Kurds all display similar results. Specifically, either the Makrani or the Balochi are adopted as the secondary donors when Armenians are fixed as a PDP. The proportions are also comparable between all. The Iranians appear to fit the Armenian + Balochi/Makrani combination slightly better than the Kurds (GD=4.04-5.16 vs. 5.03-6.65 to 2 d.p. respectively). It is also worth observing that both Iranians and Kurds, irrespective of sampling strategy (location-specific or Dodecad average), do not have Mixed Mode results which exceed ten.

Assyrian and select Near-Eastern Jewish Oracle results

The Assyrians are one of the groups of interest, given the demonstrated autosomal similarity between them and Armenians (here). As anticipated, their Mixed Mode results well exceed ten and the best fits (GD=1.66-1.82 to 2 d.p.) are all, coincidentally, with the Near-Eastern Jewish groups studied here. Subsequent matches include additional populations (e.g. Saudi, Bedouin, Syrian) where the GD remains relatively small compared to the Iranian and Kurdish values (>3.15 to 2 d.p.).

The Near-Eastern Jewish groups largely mirror the Assyrian results, although some key differences should be outlined:

The Azerbaijani Jews have a GD similar to the Assyrians in range, setting them apart from the Iraqi and Iranian Jews. This seems to fit geography. However, if the association was strictly geographical, one would expect the Assyrians to lie between the Azerbaijani Jews and the Iraqi and Iranian Jews. This may be genetic evidence of additional and direct ancestry between Armenians and Assyrians at some (or various) point(s) after the Near-Eastern Jewish groups had formalised their identities.
Saudis appear as a secondary donor population in all groups. Interestingly, they appear to have an inverse relationship with geographic proximity to the Armenian Highlands; Iraqi, Iranian and Azerbaijani Jews are 20.4%, 16.1% and 7.8% "Saudi" respectively. The Assyrians too fall on this cline despite the point raised above.

Anatolian Turkish Oracle results

Finally, the Anatolian Turks provide us with another set of interesting values and pairs:

Mixed Mode results from Western Turkey (Aydin, Istanbul) largely exhibit a combination of Armenian with various European ethnic groups or nationalities, which can be predominantly ascribed to geography. Please note the comparatively large GD among the Aydin average (>9.93 to 2 d.p.), which contrasts with Istanbul. I suspect the cosmopolitan nature of Istanbul has resulted in an artefactual lowering of the GD, given Anatolian Turks from across the country have moved there for employment purposes. [7]
In contrast, the samples listed as "Turks" in Dodecad K12b (from the Behar et al. dataset, located in Central-South Turkey) model well as a combination of Armenian with either the Chuvash, Nogay, Uzbek or Uyghur. European secondary donors do make an appearance once more. Please also note their GD is the smallest out of the Turkish averages investigated (4.20 to 2 d.p.).
The Kayseri average (Central Turkey) yielded no results matching the criteria outlined in "Method". However, the Assyrians instead made a frequent appearance as primary donors from GD=6.17 onwards. Given the genetic affinity between Assyrians and Armenians (refer above), and the consistency displayed by the Armenians as a PDP for other Turkish averages, this result can be considered anomalous. A close inspection of the Dodecad K12b proportions reveals the Kayseri Turks were on average approximately 1.5% more Southwest Asian than all other Turkish populations, explaining why Assyrians took preferential placing over Armenians as the PDP. The cause of this slight increase is unknown at present.
The Turkish_D average best resembled that of Istanbul, albeit with slightly more Armenian and less European proportions. This would suggest that, overall, the Dodecad Turkish participants map somewhere just east of Istanbul despite the presumably diverse backgrounds.
Finally, all averages produced Mixed Mode results which exceeded ten in number.

IBD Segment Indications

To corroborate the findings of this investigation with additional genetic data, I refer to the Dodecad Project's fastIBD analysis of Italy/Balkans/Anatolia and fastIBD analysis of several Jewish and non-Jewish groups. As the analyses do not completely encompass those groups studied here, the results cannot be accepted wholesale. However, there does appear to be a broad agreement with some of the results in this investigation. For example, the Armenians and Assyrians have a demonstrated level of "warmth" to one another beyond background sharing.

Further Work

This investigation would have benefited from Azeri Turkish samples via the Republic of Azerbaijan. Additionally, a better breakdown of Kurdish, Iranian and Assyrian samples, akin to the site-specific sampling seen here in the Anatolian Turks, would have been ideal. Finally, as stated above, this investigation would have benefited from the inclusion of IBD segment analysis specific to the studied groups. Should time permit and the desired samples be made available in the future, this would be a natural line of inquiry to further what has been explored here.

Conclusion

Addressing the three hypotheses stated at the beginning in order:

1. Armenians certainly have behaved as a reasonable proxy for an autochthonous West Asian PDP in most of the populations tested (sole exception being the Kayseri Turks although this appears to be an anomalous response to slightly more Southwest Asian scores). The scores vary depending on the presence of the secondary donors, but Assyrians and Jewish populations from Azerbaijan, Iran and Iraq appear to have the largest proportion of this (occasionally surpassing 90%). All Iranians and Kurds, on the other hand, scored the least overall (approximately 65-75%). The Turkish range lies in-between these two.

2. Unfortunately, this isn't clear. The lack of regional results for Kurds and Iranians, together with a lack of samples specifically from Eastern Turkey, prevents any conclusion being reached on this point. The Near-Eastern Jewish populations studied here certainly do form a cline of Armenian "admixture" that is fully in line with geography. Furthermore, the large GD observed in Aydin Turks does support this idea, leading me to cautiously propose geography does indeed play a role. The second point also provides us with a partial answer, as the Assyrians demonstrate more of this than one would expect given their geographical placement based on GD, as well as fastIBD evidence from elsewhere.

3. With the exception of the Assyrians and Near-Eastern Jewish groups, the secondary donors overwhelmingly matched my expectations regarding their placement with whichever group that was studied (e.g. Iranians and Kurds towards South-Central Asia, Turks towards either Europe or Central Asia proper).

Over the coming years, with the availability of more data, we should hopefully move away from the population averages that have been used by various open-source projects. It has been empirically demonstrated here that regional results will differ significantly from nationwide averages (e.g. Aydin Turks vs. Turkish_D).

This also holds true on an individual basis; the best Oracle match for one Iranian via the described methodology was 56.4% Armenians_15_Y + 43.6% Tajiks_Y (GD=5.44 to 2 d.p.), differing significantly from both the Iranian and Kurdish averages.

I suspect the gentlemen running the numerous open-source projects are aware of this caveat and are, justifiably so in my opinion, making do with currently available data.

In closing, this investigation has also determined that, on the basis of the presumption of an Armenian-like autochthonous West Asian substrate, the studied populations as a whole have an apparent degree of inter-relatedness by virtue of this common South Caucasian autosomal heritage, albeit with the presence of highly significant affinities to elsewhere in Eurasia, be it population-wide, regional or even individual.

Speculations

The first topic concerns the Iranians and Kurds; why were their average secondary donors always the Balochi and Makrani, rather than more northern groups, such as the Tajiks? I suspect, when applied to population averages, the Oracle program effectively minimises intra-population variation to the point where only the broadest of affinities are indicated. In the case of Iranians, the secondary donor would therefore be one with genetic features that tend to emphasise the difference between Armenians and Iranians (e.g. additional South Asian and Gedrosian admixture). A similar conclusion can be reached with respect to the Turks.

Another interesting point is the demonstrated close relationship between the Assyrians and various Near-Eastern Jewish groups. This has been speculated upon in various discussion forums in the past. More precise tools will be required to elucidate whether these populations share legitimate ancestry with one another, or the affinity is happenstance, instead reflecting the mixture of similar Near-Eastern groups with (again) similar Caucasus-derived groups at some point in history.

[Addendum I, 07/08/2014]: For a continuation on this with a fellow genome blogger, please read the Comments below.

Acknowledgements

Full credit for both the generation of raw population data and the Oracle program goes to Dienekes Pontikos (Dodecad Ancestry Project).

Map of Armenian Highlands from Wikipedia.org. Photo of Mount Ararat courtesy of NoahsArkSearch.com.

Finally, I must refer all visitors interested in understanding the genetic constituency of the Armenian people to the FTDNA Armenian DNA Project. For a more interactive learning experience, two of the administrators (Messrs. Simonian and Hrechdakian) recently delivered a lecture on this topic, garnishing it with a deeper description of anthropological and geographical aspects as described here.

References

1. Samuelian TJ. Armenian Origins: An Overview of Ancient and Modern Sources and Theories. [Last Accessed 3/08/2014]: http://www.arak29.am/PDF_PPT/origins_2004.pdf

2. Clackson J. Indo-European Linguistics: An Introduction. Cambridge Textbooks in Linguistics [Last Accessed 4/08/2014]: http://caio.ueberalles.net/Indo-European-Linguistics-Introduction/Indo-European%20Linguistics%20-%20James%20Clackson.pdf

3. Greppin JAC. The Urartian Substratum in Armenian. [Last Accessed 4/08/2014]: http://science.org.ge/2-2/Grepin.pdf

4. Grugni V, Battaglia V, Hooshiar Kashani B, Parolo S, Al-Zahery N et al. Ancient migratory events in the Middle East: new clues from the Y-chromosome variation of modern Iranians. PLoS One. 2012;7(7):e41252.

5. Dodecad Ancestry Project: ChromoPainter/fineSTRUCTURE Analysis of Balkans/West Asia [Last Accessed 4/08/2014]: http://dodecad.blogspot.com/2012/02/chromopainterfinestructure-analysis-of.html

6. Eurogenes Genetic Ancestry Project: Updated Eurogenes K13 and K15 population averages [Last Accessed 4/08/2014]: http://bga101.blogspot.com/2014/03/updated-eurogenes-k13-and-k15.html

7. Filiztekin A, Gokhan A. The Determinants of Internal Migration In Turkey. [Last Accessed 05/08/2014]: http://research.sabanciuniv.edu/11336/1/749.pdf

A Hidden Gem in Central Asia: Previously Unknown Y-DNA R1b Haplotype [Original Work]

2013-07-13T20:28:00.000-07:00

1. Introduction

Central Asian Y-DNA diversity has been an area of constant intrigue in the genetics community. Wells et al.'s The Eurasian Heartland: A continental perspective on Y-chromosome diversity paved the way, with several others following in its wake. Members of the same team (including Dr. Wells) produced another paper, A Genetic Landscape Reshaped by Recent Events: Y-Chromosomal Insights into Central Asia, on the same topic in the following year, this time headed by Dr. Tatiana Zerjal. I noted a greater emphasis on East-Central Asian populations as well as a mention of Y-STR analysis in the study itself. However, none of the Y-STR data was supplied, with only Y-SNP information included (shown sporadically in this entry). The age of this paper is apparent through the nomenclature used (see Method section).

Several months ago, I requested the Y-STR data from this study from one of the co-authors, Dr. Tyler-Smith, who kindly replied with the results of all sampled populations (Data Sink > Zerjal et al. Raw Data).

In this blog entry, the Y-STR data is showcased with a special emphasis on the Y-DNA R1b-M269 which was discovered.

2. Method

Y-SNP Phylogeny in original paper (Zerjal et al.) [1]

The maximum number of compatible Y-STR's was utilised for processing in Urasin's YPredictor for easier haplogroup identification (14 of a possible 16; DYS434 and DYS435 were excluded). All data was run through YPredictor, and only samples with ≥70% probability were included in the final results (Data Sink > Processed Data). As discussed below, relevant findings are compared with the basic Y-SNP haplogroups shown in the original study (on right).

One point which needs to be addressed immediately is the high frequency of "_DE-M1" and "P-M45". It appears that the STR selection has led to a phantom result, rendering many of the samples useless. For instance, the original study shows the Kazakhs belong overwhelmingly to C3c-M48, [1] although the probable results shown here are mostly "_DE-M1". The exclusion of DYS434 and DYS435 from my level of processing likely contributed to this; if one assigns equal weight to each STR in the statistical strength of a prediction, removing two STR's from a panel of 16 reduces accuracy by 12.5% (2/16). Additionally, some conversion error seems to have applied with DYS437 (i.e. a value <12 is unusual). Therefore, "_DE-M1" and "P-M45" results were dismissed on account of the mismatch between predicted and likely confirmed haplogroups, probably due to a compatibility issue between the study's STR panel and YPredictor.

3. Results

As the majority of samples were removed owing to the caveat described above, this entry will take a qualitative rather than quantitative approach to analysing the general picture formed. Most of the remaining results are congruent with findings in other papers. Populations around the Caucasus are characterised by plenty of R1b-M269, J2a-M410 and G2a-P15. Tajiks and the Kyrgyz were predominantly R1a1a-M17. Mongolians and other East-Central Asian ethnic groups yielded the most O3-M122 and "NO-M14" (likely to be Y-DNA N or O suffering from the STR restrictions described in the Method section).

Y-SNP distribution in Central Asia (Zerjal et al.) [1]

3i. The R1b Signal

R1b-M269 was found across Central Asia and not only in the Caucasus (Armenians, Azeris, Georgians, Ossetians). It was mostly detected among the Turkmen (trk1, trk2, trk4, trk6, trk7, trk22, T29, T32) with a single sample among the Uzbek (uz-s110). [1]

Analysis of the haplotypes (including DYS434 and DYS435) revealed the nine Central Asian R1b samples belonged to a secure haplotype (Data Sink > R1b Results). trk6 diverged the most, albeit with only two 1-step mutations (on DYS393 and DYS434). The rest match this haplotype exactly or have single 1-step mutations. [1] When this Central Asian R1b haplotype is compared with the other Caucasian samples, a mixed picture emerges, with the poorest match being an Armenian (arm47) at 8/16, whereas the best are another Armenian (arm12) and an Azeri (az48), both at 15/16. [1]

One interesting point is that the Kurds sampled in this study (some of whom also belong to R1b-M269) are actually the displaced population positioned on the Iranian-Turkmenistani border, all of whom match the Central Asian R1b haplotype with a similar value (12-13/16). This definitively rules out the Kurds as a source for the haplotype, particularly as better matches can be found further to the west. It should be noted the Kurds themselves formed their own R1b haplotype (defined here by DYS389II=27, DYS391=10). [1]
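
The match scores quoted above (e.g. 8/16 or 15/16) are simply counts of markers with identical repeat values. A minimal sketch of that comparison follows, assuming haplotypes are stored as marker-to-repeat dictionaries; the function name and the repeat values are hypothetical and are not taken from the Zerjal et al. data.

```python
def shared_markers(hap_a, hap_b):
    """Return (matches, compared) for the markers present in both haplotypes."""
    common = sorted(set(hap_a) & set(hap_b))
    matches = sum(1 for marker in common if hap_a[marker] == hap_b[marker])
    return matches, len(common)


# Hypothetical repeat values, for illustration only.
modal = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS393": 12}
sample = {"DYS19": 14, "DYS390": 24, "DYS391": 10, "DYS393": 12}

matches, compared = shared_markers(modal, sample)
print(f"{matches}/{compared}")
```

With a full 16-marker panel the same function would return scores in the x/16 form used in this entry.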

In summary, the data reveals that the Turkmen are particularly abundant in R1b-M269 and all belong to the same haplotype as one of the Uzbek samples. This haplotype matched some Caucasians very well, but others not so well. The Kurds living in Turkmenistan belonged to their own haplotype.

3ii. Is This Actually R1b-M269?

Attention must first return to the original paper; any potential R1b-M269 here will be present as P(xR1a)-92R7 (shown in the paper as "Haplogroup 1"). [1] Evidently, this makes up approximately half of the Turkmen lines and a quarter of Uzbek ones. Other haplogroups (such as other forms of R1b, R2a-M124, various Q subclades) presumably make up the rest of "Haplogroup 1" shown.

The next step is to verify whether or not this Central Asian R1b haplotype matches other R1b haplotypes online. As Y-DNA R1b-M269 is fortunately well-represented in the world of genetic genealogy, searching for the haplotype's matches on ySearch is a reasonable enterprise. DYS437 had to be excluded here due to a conversion issue, leaving the haplotype at 15 STR's. A genetic distance (GD) of 3 was allowed on these 15 markers. Results are shown on the right.
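
For readers wishing to repeat this kind of search offline, a rough sketch of a genetic distance filter is given below. It assumes a simple stepwise count (the sum of absolute repeat differences across shared markers), which may not be exactly how ySearch scores matches; the function names and all repeat values are hypothetical.

```python
def genetic_distance(hap_a, hap_b, exclude=()):
    """Sum of absolute repeat differences over shared, non-excluded markers."""
    markers = (set(hap_a) & set(hap_b)) - set(exclude)
    return sum(abs(hap_a[m] - hap_b[m]) for m in markers)


def within_distance(query, candidates, max_gd=3, exclude=()):
    """Keep only the candidates whose distance to the query is <= max_gd."""
    kept = {}
    for name, hap in candidates.items():
        gd = genetic_distance(query, hap, exclude)
        if gd <= max_gd:
            kept[name] = gd
    return kept


# Hypothetical values; DYS437 is excluded, mirroring the conversion issue above.
query = {"DYS19": 14, "DYS390": 24, "DYS391": 11, "DYS437": 8}
candidates = {
    "close": {"DYS19": 14, "DYS390": 25, "DYS391": 11, "DYS437": 14},
    "distant": {"DYS19": 16, "DYS390": 22, "DYS391": 11, "DYS437": 14},
}

print(within_distance(query, candidates, max_gd=3, exclude=("DYS437",)))
```

Here only the "close" candidate survives the filter, at a distance of 1.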

ySearch results for Central Asian R1b haplotype

With some confidence, the search has demonstrated that the Central Asian R1b haplotype does indeed belong to R1b-M269, as all seven matches shown (one of whom is Armenian) belong to it.

The line of inquiry was expanded one step further by comparing this haplotype with readily available Iranian haplotypes [2]. Due to differences in STR panels (an overlap of only 11 markers) this proved to be inconclusive, aside from the observation that DYS389i+ii was completely different between the Central Asian modal (10-26) and the Iranian values. At this point I suspect that, much like DYS437, there is a conversion issue with DYS389 also.

Finally, a comparison was made with the R1b found in Afghanistan last year [3]. Interestingly, if DYS389i+ii and DYS437 are excluded, the two Uzbeks (samples 35 and 181) match the Central Asian R1b haplotype almost exactly based on the remaining 11 STR's. The one Tajik (sample 32) is less likely to be related due to two 1-step mutations on different STR's.

4. Conclusion

The inferences made from the data hang by a metaphorical thread due to the persistent STR issue; different labs have used different panels in the past decade, making it excruciatingly difficult to use materials from older papers. Fortunately, the presence of a specific strain of R1b-M269 in Central Asia (in Turkmen and Uzbeks) has successfully been demonstrated after select exclusions and no modifications to the data.

However, some larger questions remain. If STR limitations were not an issue, how would the Iranians from Haber et al. have compared? Would the Tajik from the other Haber et al. paper have belonged to the same haplotype in the end?

The origin of this Central Asian R1b haplotype will, I anticipate, also be a point discussed heavily among interested parties. At this point in time, I must stress that none of the evidence thus far points to anything in particular without ruling other theories out, although it leaves the door for interpretation wide open.

Having given this cautionary statement, the main thrust of this entry should be emphasised: R1b-M269 in Central Asia is a confirmed reality and here to stay. I will defer any subsequent analyses to the experts on Y-DNA R1b who grace several genetic genealogy boards for their take on the flavour of this haplotype.

5. Acknowledgement

I publicly extend my gratitude to Dr. Tyler-Smith for being so kind in sending me the raw STR's from this important paper for my research, as well as co-authoring the other two excellent studies I have cited here and in the past.

6. References

1. Zerjal T, Wells RS, Yuldasheva N, Ruzibakiev R, Tyler-Smith C. A genetic landscape reshaped by recent events: Y-chromosomal insights into central Asia. Am J Hum Genet. 2002 Sep;71(3):466-82. Epub 2002 Jul 17.

2. Haber M, Platt DE, Badro DA, Xue Y, El-Sibai M, Bonab MA. Influences of history, geography, and religion on genetic structure: the Maronites in Lebanon. Eur J Hum Genet. 2011 Mar;19(3):334-40. doi: 10.1038/ejhg.2010.177. Epub 2010 Dec 1.

3. Haber M, Platt DE, Ashrafian Bonab M, Youhanna SC, Soria-Hernanz DF, Martínez-Cruz B. Afghanistan's ethnic groups share a Y-chromosomal heritage structured by historical events. PLoS One. 2012;7(3):e34288. doi: 10.1371/journal.pone.0034288. Epub 2012 Mar 28.

Y-DNA Haplogroup N in India: Wayward Uralics or Lab Error? [Original Work]

2013-03-26T13:46:00.001-07:00

Introduction

Y-DNA Haplogroup N Eurasian Distribution

Per ISOGG's 2013 SNP tree and as has been the case for years, Y-DNA Haplogroup N is defined by the M231 mutation (G->A at rs9341278) on the Y-Chromosome. With a predominantly North Eurasian distribution, it peaks in Europe among the Finnish people and various ethnic groups residing in Russia's far north through the N1c-Tat subclade. N1c-Tat specifically is frequently associated with Uralic-speaking populations in the literature.

Haplogroup N also appears to have an association with Central Asia as shown in the N Y-DNA Haplogroup Project (FTDNA) results, with several samples coming in from Kazakhstan, Uzbekistan and Mongolia. It has also been observed in Turkey (KurdishDNA blog entry) as well as appearing in 1.6% of Iran's Azeri population (Grugni et al. entry).

The finding of Haplogroup N in India through Sharma et al.'s The Indian origin of paternal haplogroup R1a1* substantiates the autochthonous origin of Brahmins and the caste system [1] is a curious one. Unfortunately, the paper did not include any Y-STR material to help understand the basis of N's presence in India.

Significance of Potential Haplogroup N in India

Linguistics provides us with a plausible scenario regarding how Haplogroup N may have arrived in the Indian Subcontinent. Contacts between early Finno-Ugric and Indo-Iranian groups took place around the Ural mountains, specifically between the forest and steppe zones. Evidence of the transmission of horsekeeping techniques, economy, deities and common words from the Andronovo archaeological horizon on the steppes into the "Andronovoid" societies living in the nearby forests is firmly established. [2]

Haplogroup N in India, if present in relevant populations and displaying MRCA values or STR clusters consistent with a Neolithic origin further north, would support the interpretation of Haplogroup N as an accompanying genetic signal from the steppe zone roughly four thousand years ago, as well as serving as a genetic remnant of the interactions that undoubtedly took place between Indo-Iranian and Finno-Ugric tribes.

Current Findings

In 2009, Sharma et al. published a paper highlighting the Y-Chromosome haplogroup differences between various upper caste (Brahmin) and tribal populations across India. The paper went on to deduce that Haplogroup R1a1a in India was autochthonous in origin based on their findings [1] (now disputable and improbable based on Underhill et al.'s landmark study on Y-DNA R1a1a and recent findings by the R1a Subclades FTDNA Project, although this topic is beyond the scope of this entry).

It was this very paper by Sharma et al. which revealed the presence of Y-DNA N in India. Haplogroup N1-LLY22g was found in Brahmins from Gujarat, Madhya Pradesh and Maharashtra (3.13%, 2.38% and 3.33% respectively), as well as tribal populations from Uttar Pradesh (1.56%). Their results were extended to include greater caste differentiation (Brahmins vs. Scheduled Castes vs. Tribals); here, Brahmins were found to have five times the frequency of N1-LLY22g seen in tribal groups (0.5% vs. 0.1% respectively). [1]

Although the frequencies were arguably insignificant, the inference stood - Y-DNA Haplogroup N showed an association with the upper caste practitioners of Hinduism in India, paving the way for the scenario described in the above chapter to be considered.

However, the strength of this conclusion is weakened greatly by cross-sectional data from numerous studies concerning the Indian Subcontinent produced in the past decade:

Sengupta et al. (2006) revealed that, out of 1090 samples, with the majority coming from the Indian Subcontinent, the only populations revealing any Haplogroup N (N-M231) and associated downstream subclades were either East Asian (Chinese ethnicities, Cambodian) or Siberian (Yakut). [3] No groups from India belonged to Haplogroup N-M231.
Furthermore, Sahoo et al. (2006) also sampled individuals from across the Indian Subcontinent (n=1074) and failed to find a single instance of N-M231. [4]
In a recent study on various populations in Tamil Nadu (South India), Haplogroup N was completely absent in the 1680 samples tested. [5]
Y-DNA N1c-Tat was absent in the 607 tribal samples tested from East and Northeast India. [6]
Returning to the north of the country, 560 men from various upper castes and Muslim groups were tested by Zhao et al. and N1c-Tat was absent from all. [7]
Focusing specifically on Saraswat Brahmins from Jammu and Kashmir, Yadav et al. found none of the approximately 109 haplotypes to belong to any derivative of Haplogroup N. [8]

Finally, the N Y-DNA Haplogroup Project at FTDNA currently does not show any samples whatsoever from the Indian Subcontinent.

Possible Explanation

Despite over 4,000 samples over five studies representing various groups from across India, not a single trace of Haplogroup N has been detected. What explains this glaring discrepancy with Sharma et al.'s findings? Differences in sampling strategy between the other studies with Sharma et al. cannot account for this; there is enough regional overlap to rule this out.
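
The sample tally behind this statement can be reproduced with a short sketch. It assumes that the "five studies" are those cited above other than Sengupta et al. (whose 1090 samples include non-Indian populations); that selection is my assumption, and the sample sizes are those quoted in this entry.

```python
# Sample sizes as quoted in this entry; which five studies make up the
# "over 4,000" figure is an assumption made for this illustration.
studies = {
    "Sahoo et al. (2006)": 1074,
    "Arunkumar et al. (2012)": 1680,
    "Borkar et al. (2011)": 607,
    "Zhao et al. (2009)": 560,
    "Yadav et al. (2010)": 109,
}

total = sum(studies.values())
print(f"{len(studies)} studies, {total} samples")
```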

As was the case with Sengupta et al. where several Hazara haplogroup classifications were allegedly due to a laboratory error, it is probable the Haplogroup N seen here follows suit. By reasonable deduction, if one study reveals a trend that several others covering thousands of samples cannot verify, there must be something intrinsically erroneous in the former.

Conclusion

Until I can physically view the purported Haplogroup N haplotypes reported in Sharma et al., it is the conclusion of this entry that they are most likely the result of a laboratory error given the complete absence of any flavour of N-M231 in India through other recent studies. If any Haplogroup N is found, it must be contrasted against Sharma et al. and should be investigated on a separate line of inquiry. As ever, details of any future cases of Haplogroup N in India should be taken into consideration. If of a Mughal background, the paternal origins are readily explained by Medieval Central Asian ancestry. If from the furthest northeast of the Indian Subcontinent, the possibility of Nepali ancestry should be sought. [9] Although prehistoric indirect influence from Finno-Ugric interactions in the second millennium BC onwards shouldn't be dismissed outright, other more recent explanations exist.

References

1. Sharma S, Rai E, Sharma P, Jena M, Singh S, Darvishi K. The Indian origin of paternal haplogroup R1a1* substantiates the autochthonous origin of Brahmins and the caste system. J Hum Genet. 2009 Jan;54(1):47-55. doi: 10.1038/jhg.2008.2. Epub 2009 Jan 9.

2. Kuz'mina EE. The Origin of the Indo-Iranians. Koninklijke Brill NV, Leiden, The Netherlands. 2007.

3. Sengupta S, Zhivotovsky LA, King R, Mehdi SQ, Edmonds CA, Chow CE. Polarity and temporality of high-resolution y-chromosome distributions in India identify both indigenous and exogenous expansions and reveal minor genetic influence of Central Asian pastoralists. Am J Hum Genet. 2006 Feb;78(2):202-21. Epub 2005 Dec 16.

4. Sahoo S, Singh A, Himabindu G, Banerjee J, Sitalaximi T, Gaikwad S. A prehistory of Indian Y chromosomes: evaluating demic diffusion scenarios. Proc Natl Acad Sci U S A. 2006 Jan 24;103(4):843-8. Epub 2006 Jan 13.

5. Arunkumar G, Soria-Hernanz DF, Kavitha VJ, Arun VS, Syama A, Ashokan KS. Population differentiation of southern Indian male lineages correlates with agricultural expansions predating the caste system. PLoS One. 2012;7(11):e50269. doi: 10.1371/journal.pone.0050269. Epub 2012 Nov 28.

6. Borkar M, Ahmad F, Khan F, Agrawal S. Paleolithic spread of Y-chromosomal lineage of tribes in eastern and northeastern India. Ann Hum Biol. 2011 Nov;38(6):736-46. doi: 10.3109/03014460.2011.617389. Epub 2011 Oct 6.

7. Zhao Z, Khan F, Borkar M, Herrera R, Agrawal S. Presence of three different paternal lineages among North Indians: a study of 560 Y chromosomes. Ann Hum Biol. 2009 Jan-Feb;36(1):46-59. doi: 10.1080/03014460802558522.

8. Yadav B, Raina A, Dogra TD. Genetic polymorphisms for 17 Y-chromosomal STR haplotypes in Jammu and Kashmir Saraswat Brahmin population. Leg Med (Tokyo). 2010 Sep;12(5):249-55. doi: 10.1016/j.legalmed.2010.05.003.

9. Gayden T, Chennakrishnaiah S, La Salvia J, Jimenez S, Regueiro M, Maloney T. Y-STR diversity in the Himalayas. Int J Legal Med. 2011 May;125(3):367-75. doi: 10.1007/s00414-010-0485-x. Epub 2010 Jul 21.

Yaghnobi Tajiks: Preliminary Results May Reveal Iranian Plateau Affinity [Original Work]

2012-12-22T17:36:00.001-08:00

Slipping under the radar of the genetic genealogy world is this paper by Elisabetta Cilli and her colleagues, which investigated the mitochondrial data of 62 individuals from Tajikistan's Yaghnobi population. [1]

The Yaghnobis are of interest given their geographical isolation and the East Iranic nature of their language. Spoken just northeast of the predominantly Persian (Dari) speaking capital, Dushanbe, Yaghnobi is a continuation of a fully agglutinative Soghdian dialect, representing the sole survivor of this language following the Persianization of Central Asia in Medieval times [2]. Despite its East Iranic vocabulary, Yaghnobi demonstrates several linguistic features (i.e. gender loss, past imperfective preservation from present stem of a verb) which separate it from those modern East Iranic languages immediately surrounding it. Furthering the uniqueness of the Yaghnobi language in this context is the unity it forms through these features with languages mostly spoken further west in the Iranian plateau (e.g. Persian, Gilaki, Kurdish dialects). [2]

Although the results are preliminary and lack any empirical data, Cilli et al. have discovered some interesting connections between the Yaghnobi and relevant populations. In summary, they found the following:

MDS Plot of Results

42 individuals used for the preliminary work belonged to only 19 distinct mtDNA haplotypes. Of these, 11 were distinct among the Yaghnobi.
The Yaghnobi have less mtDNA genetic diversity than other Central Asian populations (0.930) and this is attributed to their geographical isolation and recent history of displacement by the U.S.S.R. in the 1970's for agricultural purposes, where a small group (300) returned and repopulated their original homelands.
Intriguingly, the Yaghnobi shared all of the mutual haplotypes (8/19) with populations from Iran (e.g. Gilakis, Mazandaranis and Iranians from Tehran and Esfahan) instead of other Central Asian groups, including their Tajik compatriots.
The Yaghnobi shared most of these mutual haplotypes with Gilakis, Kurmanji Kurds and Avars from the Caucasus (4 each).
However, owing to their predominantly distinct mtDNA character, the Yaghnobi are clear outliers from the general zone occupied by the reference groups.

My critique and interpretation of these results are as follows:

At least two instances of genetic drift (founder effect via geographic isolation, bottleneck due to Soviet relocation) are likely responsible for the decreased mtDNA diversity. Thus, it is simply a reflection of their environment.
As a result of the Soviet relocation, it may be useful to determine whether results from the displaced parent population match what has been stated here. This is quite possible given the relocations occurred just over one generation ago (~40 years).
It is difficult to criticise the decision to test 62 individuals and the utilisation of 42 haplotypes, given the Yaghnobi population in their homeland between 2007 and 2009 only numbered approximately 500. Approximately 8% of the entire Yaghnobi population (42 of roughly 500) was therefore analysed here, which is a generous proportion given the amount of attention the region has received.
The MDS plot would have benefited from the inclusion of populations in Europe, Southwest Asia and South Asia to comprehensively flesh out the position of Yaghnobis in Eurasia.
Accepting that this is a preliminary investigation, it would still have been pleasing to see some raw data published. Aside from confirming that some/one Yaghnobi matched the Cambridge Reference Sequence (CRS, thus Haplogroup H2a2a which happened to be found in all the populations tested), there is no indication as to what the other mutations looked like. Or, for that matter, what mtDNA haplogroups were even present!

Correlation with Y-Chromosomal Data?

The Yaghnobi have been studied at least one other time through their inclusion in Dr. Spencer Wells et al.'s seminal piece The Eurasian heartland: a continental perspective on Y-chromosome diversity. The breakdown of their Y-Chromosomal SNP data (n=31) is as follows: [3]

3% C-M130(xC3a3-M48)
32% J2-M172
3% K-M9(xO-M175, O3-M122, O1a-M119, O2a1-M95, N1c1-M46) (possibly parahaplogroup such as K*-M9)
10% L-M20
3% P-M45(xQ1a1-M120, Q1a3a1-M3, R2a-M124)
32% R1-M173 (likely R1b1a1-M73 or R1b1a2-M269)
16% R1a1a-M17(xR1a-M87, private marker)

Y-SNP clustering reveals Yaghnobis sit near SE Europe and the Near-East

Despite the double genetic drift undoubtedly affecting the frequencies, it is worth pointing out that the Yaghnobi presented with a broadly similar Y-DNA spectrum as Iran, where J2-M172, L-M20, R1-M173 and R1a1a-M17 (including subclades) comprise approximately 53% of the national average (refer to Grugni et al. analysis).

This comparison should be taken with a grain of salt given the Iranian national average also comprises non-Iranic-speaking ethnic groups, the Wells Yaghnobi data does not present with thorough downstream Y-SNP evidence, the sample size is contentious and at least two contributors of a founder effect exist. However, that the Yaghnobi appear rich in J2, L and R is certainly reminiscent of Iranic-speaking populations in the region.

Conclusions

The Yaghnobi are an exceedingly interesting population whose overall uniparental markers seem to support a connection with populations further west than one would anticipate.

Despite the misgivings of all the data concerning them to date, the mtDNA similarity does corroborate the specific linguistic features shared between the Yaghnobi language and those in the Iranian plateau, such as Kurdish or Persian.

If the data holds up in future investigations, it certainly calls into question whether the proposed model of linguistic inheritance exclusively down the paternal line (as represented by Y-DNA data) is entirely correct given this connection.

How the Yaghnobi came to display the markers within them whilst speaking an East Iranic dialect with traits akin to those found in West Iranic languages is an intriguing question. One possible scenario is that the Yaghnobi are partly descended from ancient Iranians from the Iranian plateau during the Achaemenid era. This would also account for the linguistic commonalities noted in current literature.

Time (with the assistance of more mtDNA, Y-DNA and auDNA) will help us understand what happened in Central Asia during the formative period that was the Indo-Iranian migrations.

References

1. Cilli E, Delaini P, Costazza B, Giacomello L, Panaino A, Gruppioni G. Ethno-anthropological and genetic study of the Yaghnobis; an isolated community in Central Asia. A preliminary study. J Anthropol Sci. 2011;89:189-94.

2. Windfuhr, G. The Iranian Languages. 1st ed. Routledge Language Family Series. 2009.

3. Wells RS, Yuldasheva N, Ruzibakiev R, Underhill PA, Evseeva I, Blue-Smith J. The Eurasian heartland: a continental perspective on Y-chromosome diversity. Proc Natl Acad Sci U S A. 2001 Aug 28;98(18):10244-9.

Introducing the ACD Tool [Original Work]

2012-08-19T04:55:00.000-07:00

It is with satisfaction I announce the release of my first ever population genetics spreadsheet for fellow researchers. The Ancestral Component Dissection (ACD) Tool is a piece of freeware I have developed to give those with a similar knack for fiddling with ADMIXTURE, Y-SNP and mtDNA frequency data a better means to flesh out inter-population differences.

ACDTool (v1.0)

How Does The ACD Tool Work?

The ACD Tool relies on the frequencies of "ancestral components", a general catch-all term for uniparental markers (Y-SNP's, mtDNA) and Autosomal DNA (auDNA). These form the mainstay of much of the work that has been done in population genetics for the past few decades. The advent of "genome blogger" projects has brought the immediacy of these techniques to those who have tested with personal genetics companies, such as Family Tree DNA (FTDNA) and 23andMe. The ACD Tool should therefore be considered a supplementary item by those interested in these results, as well as data procured from current literature.

The level of commonality that occurs between many populations and ethnic groups poses a problem for those interested in investigating what differences arise between them.

To solve this, the ACD Tool works by removing mutually shared component frequencies between sample averages within a region. The idea is to lessen the amount of regional similarity and intentionally exaggerate those differences that exist between neighbours.

This is achieved by removing congruent component values across all populations (using the lowest value as a benchmark), leaving only the differences behind.
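
For those who prefer to see the principle outside of a spreadsheet, the subtraction described above can be sketched in a few lines of Python. This is an illustration of the idea only, not the spreadsheet's actual Macro code; the function name, population labels and percentages are hypothetical, and the sketch does not rescale the remainders afterwards.

```python
def dissect(averages):
    """Subtract, per component, the lowest value found across all populations.

    averages: {population: {component: percentage}}
    A component missing from a population is treated as 0.
    """
    components = set().union(*(freqs.keys() for freqs in averages.values()))
    baseline = {
        c: min(freqs.get(c, 0.0) for freqs in averages.values())
        for c in components
    }
    return {
        pop: {c: round(freqs.get(c, 0.0) - baseline[c], 2) for c in sorted(components)}
        for pop, freqs in averages.items()
    }


# Hypothetical population averages (percentages), for illustration only.
averages = {
    "PopA": {"North": 50.0, "South": 30.0, "East": 20.0},
    "PopB": {"North": 40.0, "South": 35.0, "East": 25.0},
    "PopC": {"North": 45.0, "South": 45.0, "East": 10.0},
}

for pop, remainder in dissect(averages).items():
    print(pop, remainder)
```

In this toy case the shared baseline is 40% North, 30% South and 10% East, so PopA is left with 10% North and 10% East, PopB with 5% South and 15% East, and PopC with 5% North and 15% South.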

What Experiments Are Ideal?

As the ACD Tool is intended for finer inter-population analysis, it is best applied in a regional context. It serves the purpose of better revealing genetic differences which may account for linguistic or micro-regional trends.

Example #1: Northeast Europeans (Dodecad)

Once the Polish, Russian and Finnish Dodecad cohort averages were run through the ACD Tool, I simply used Excel to create the charts. The "Before-After" feature is used to highlight that the tool has achieved its desired goal of amplifying the genetic differences between them:

NE European auDNA (Dodecad) through the ACD Tool

Example #2: West Asians (Harappa)
Using the Harappa Ancestry Project this time, I ran the data of Armenians, Assyrians, Kurds and Iranians (mostly from the Harappa cohort) into the ACD Tool once more and presented the differences as above:

W Asian auDNA (Harappa) through the ACD Tool

Example #3: South-Central Asians (Eurogenes)
A final example pits Pathans, Jatts, the Burusho, Balochis and Brahuis against one another:

SC Asian auDNA (Eurogenes) through the ACD Tool

Are There Any Drawbacks?
The efficacy of the ACD Tool depends on the number of populations, cohort size and cohort specificity. As the examples above show, the level of inter-population component sharing may decrease greatly if groups that are from more genetically diverse regions are compared.

In addition, using the ACD Tool on populations that are too different (i.e. Han Chinese and Yoruba) will not work given the genetic overlap through either ADMIXTURE, Y-SNP's or mtDNA is negligible. Of course, this defeats the point of the tool in the first place.

Lastly, the tool requires Macros to be enabled for the instructions to work.

Disclaimer

The ACD Tool is an open-source free-to-use spreadsheet. Those wishing to modify the spreadsheet for their personal use are welcome to do so. However, anyone modifying the ACD Tool with the intent of subsequent redistribution is kindly asked to contact the creator (myself) before doing so out of common courtesy.

Please also note the ACD Tool is a first attempt at giving back to the genealogy world I have been a part of for several years. Though functional (as shown above), it is not without bugs. In light of this, I am not responsible for any loss of data that may occur from its use.

Finally, I hope the genealogy world finds some use for this nifty piece of kit.

Download ACDTool v1.1 (Sendspace)

Acknowledgements

To the Dodecad Ancestry Project, Harappa Ancestry Project and Eurogenes Genetic Ancestry Project (auDNA used in Examples).

Addendum I [20/08/2012]: ACDTool v1.1 replaces v1.0, Macros smoothed and instructions refined. Eurogenes South-Central Asian example also added.

West Asian Y-DNA Haplogroup Q - Turkish or Autochthonous Origins? [Original Work]

2012-08-04T13:15:00.002-07:00

Genographic Project Y-DNA Q Migration Route

Introduction
Y-DNA Haplogroup Q is defined by the M242 marker and is downstream of Haplogroup P-M45, making it the sister haplogroup of R-M207, which populates much of West Eurasia. According to the Genographic Project, Haplogroup Q-M242 is between 15,000 and 20,000 years old, with its place of origin invariably being placed around North Eurasia.

The frequency of Haplogroup Q largely matches the migration path outlined in the maps shown opposite. However, the presence of haplogroup Q in more southwestern portions of Asia has sparked the curiosity of genealogists and observers alike. In current literature, the presence of Haplogroup Q1a2-M25 specifically in Iran is cited as "Central Asian" influence. [1]

In an attempt to conclusively uncover the origins of Haplogroup Q-M242 in West Asia, the Y-STR haplotype variation of West, Central and South Asian Q1a-MEH2 and Q1b-M378 are visualised and analysed with genealogical tools.

Method
The data for this investigation are gathered from various Family Tree DNA (FTDNA) projects and studies, [1,2,6-11] with the concise list shown in the References section below.

Only results presenting at least 16 Y-STR's were considered. Modifications were made as necessary on certain STR markers (particularly Y-GATA H4) to correct nomenclature differences. Urasin's YPredictor was used when Y-SNP information from studies was inadequate (e.g. no SNP's upstream of Q-M242 tested).

Samples follow a constant naming convention, with _n and _yQP_n suffixes indicating they were obtained from studies and FTDNA Projects respectively. The following populations were included:

FTDNA Y-DNA Q Migration Route

Irn = Iranian (Unspecified ethnicity), Azr_Tal = Talysh from the Republic of Azerbaijan, Trk/Tur = Anatolian Turkish, Ptn = Pashtun from Afghanistan, Ind = Indian (Unspecified ethnicity/caste), Irq = Iraqi (Unspecified ethnicity), Kzk = Kazakh, Pak = Pakistani (Unspecified ethnicity), Uzb = Uzbek, Tjk = Tajik, Haz = Hazara, Npl = Nepali, Arm = Armenian, Geo = Georgian, UAE = Emirati Arab, Irn_Arab = Iranian Arab (Khuzestan), Irn_Mzn = Iranian Mazandarani (Mazandaran), Irn_Bkt = Iranian Bakhtiari

Once collation was complete, haplotrees were created and clusters were inferred from them, with modal haplotypes of the inferred clusters found where necessary. The time to the Most Recent Common Ancestor (tMRCA) of select clusters was calculated by comparing two modals from the first pair of intra-cluster branches. Due to the STR panels tested in the concerned papers (Y-Filer order 1), McGee's Y-Utility was the only immediately viable choice (infinite allele mutation model, 75% probability, 25 years/generation).
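
As a rough illustration of the modal step described above, the sketch below takes the most common allele at each marker across a cluster. It is a minimal example and not the procedure used by McGee's Y-Utility; the function name `modal_haplotype` and the allele values in the toy cluster are hypothetical and invented purely for demonstration.

```python
from collections import Counter


def modal_haplotype(haplotypes):
    """Return the most common allele at each marker across a cluster.

    `haplotypes` is a list of dicts mapping marker name -> repeat count.
    All haplotypes are assumed to share the same set of markers.
    """
    modal = {}
    for marker in haplotypes[0]:
        counts = Counter(h[marker] for h in haplotypes)
        # most_common(1) gives the allele with the highest count; on a tie,
        # the allele seen first in the input is returned.
        modal[marker] = counts.most_common(1)[0][0]
    return modal


# Hypothetical three-member cluster (values invented for illustration only).
cluster = [
    {"DYS19": 15, "DYS389i": 12, "DYS385a": 13},
    {"DYS19": 15, "DYS389i": 12, "DYS385a": 14},
    {"DYS19": 15, "DYS389i": 13, "DYS385a": 14},
]
modal = modal_haplotype(cluster)
print(modal)
```

In practice a tie at any marker would need a manual decision, which is one reason the modals in this entry were only derived where necessary.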

Working Hypothesis
An indeterminable mix of recent (<1500 ybp) and prehistoric Y-DNA Q1a-MEH2 and Q1b-M378 lines exists in the region, with some instances of close haplotype sharing between West, South and Central Asia.

Limitations Of This Investigation

Although the number of STR panels tested has increased gradually over the past decade, 16 is not considered a "confident sell" in the genealogy world.
Additionally, the difference in STR panels used meant some informative populations, such as the Makrani, Baloch, Burusho and Parsis of Pakistan, were not included due to an overlap of only 12 STRs.
Y-STRs from several crucial populations, such as the Qashqai, Iraqi Turkoman and Azeris from the Republic of Azerbaijan, could not be found.
There is, of course, the great debate concerning STR mutation rates. At the time of writing I have not observed any clear consensus in the genealogy community regarding this topic. The applicability of Nordtvedt's Generations series to this entry is minimal due to an STR overlap issue, hence the decision to use McGee's tool instead.
As discussed later, the number of Y-SNPs tested across the cited studies is insufficient to draw firm conclusions.
Finally, sample size is an issue. The dataset is dominated by Iranian or Afghan samples because these papers were released at a time (i.e. 2008-present) when the 17-STR Y-Filer panel became mainstream.

Y-DNA Q1a Phylogenetic Tree

Haplogroup Q1a STR Results
Four informative clusters were inferred;

Cluster A (DYS19=15, DYS389i=12) is largely restricted to Afghan Pashtuns, with Ptn_1-4 all sharing an MRCA with their modal (and therefore likely founding haplotype) between 900 and 450 ybp. This result is consistent with the dominance of Turkic-speaking dynasties in this time period.
Cluster B (DYS385a=14) has a large geographical spread from Turkey through to Iran, the United Arab Emirates, Afghanistan, Nepal and Kazakhstan. The most immediate observation is the close haplotype sharing (3-step mutation, 14/17) between Kzk_1 and Irn_4, with an estimated MRCA at 900 ybp. This result, together with the general area covered, again indicates this cluster should at the very least be broadly associated with Central Asian Turks.
Cluster C (DYS392=16, DYS389ii=28, DYS448=22) is interesting because its members are exclusively Iranian and come from Haber et al.'s Influences of history, geography, and religion on genetic structure: the Maronites in Lebanon. [2] Most of the Iranians bearing Haplogroup Q-M242 in their sample were from West Iran, where Iran's Azeri population happens to dominate the northern region. The regional exclusivity of this cluster combined with the very recent MRCA (900 ybp) leads me to suspect Haber and his associates sampled a locale in West Iran that underwent genetic drift, explaining the +10% Q-M242 that is otherwise not seen in other studies. [1] However, the MRCA also suggests these Iranian men's paternal ancestor was associated with Medieval Turks, even though the result in its entirety does not represent West Iran sufficiently.
Cluster D (DYS439=11, DYS437=15) mirrors Cluster B's distribution across the region but the divisions are more consistent with geography than other variables (i.e. Anatolian Turk and Armenian, Hazara together).
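
For readers unfamiliar with how figures such as "3-step mutation, 14/17" arise, the sketch below compares two haplotypes over their shared markers. It is only an illustration under stated assumptions: the function name `compare_haplotypes` is hypothetical, the allele values are invented, and the mismatch count follows the infinite allele model mentioned in the Method (each differing marker counts once, regardless of the size of the difference), with the stepwise total shown alongside for contrast.

```python
def compare_haplotypes(a, b):
    """Compare two Y-STR haplotypes over the markers they have in common.

    `a` and `b` map marker name -> repeat count. Markers present in only
    one haplotype are ignored, mirroring the panel-overlap restriction.
    """
    shared = sorted(set(a) & set(b))
    # Infinite allele model: any difference at a marker is one mutation.
    mismatches = sum(1 for m in shared if a[m] != b[m])
    # Stepwise model: sum of absolute repeat differences.
    steps = sum(abs(a[m] - b[m]) for m in shared)
    return {
        "markers": len(shared),
        "matches": len(shared) - mismatches,
        "mismatches": mismatches,
        "steps": steps,
    }


# Invented example haplotypes (not taken from the dataset).
hap_a = {"DYS19": 15, "DYS385a": 14, "DYS389i": 12, "DYS392": 14, "DYS439": 11}
hap_b = {"DYS19": 15, "DYS385a": 14, "DYS389i": 13, "DYS392": 16,
         "DYS439": 11, "DYS448": 20}
result = compare_haplotypes(hap_a, hap_b)
print(f"{result['matches']}/{result['markers']} markers match, "
      f"{result['mismatches']} mismatches, {result['steps']} steps")
```

Here DYS448 is dropped because only one haplotype reports it, which is the same reason populations tested on smaller panels could not be compared in this entry.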

Haplogroup Q1b STR Results
Five informative clusters were inferred;

Y-DNA Q1b Phylogenetic Tree

Cluster A (DYS385a=12, DYS439=11, DYS437=15) is, relative to the others, an early offshoot that is highly localised in South-Central Asia.
Cluster B (DYS385a=14) is also localised, found specifically in Iraq and Iran.
Cluster C (DYS385a=14, DYS448=20) is twinned with B but appears to have a younger MRCA (925 ybp). Of interest is the wide geographic distribution across Turkey, Iran, India and Kazakhstan. Central Asian Turks once more provide a convenient historical narrative for both the predicted MRCA and spread.
Cluster D (DYS385a=15) is again geographically localised, this time in the greater Near-East (Turkey, Iran and Syria).
Cluster E (DYS385a=12, DYS437=15) once more displays geographic localisation in South-Central Asia, specifically among Afghan Pashtuns and an FTDNA Project Pakistani.

SNP's - What Do They Tell Us?
Tabulated Y-DNA Q SNP's for select populations from several studies [1, 3-5] can be viewed in the Vaêdhya Data Sink.

There is, unfortunately, a two-pronged incompatibility issue between the Y-STR analysis and Y-SNPs provided here. Not only is there poor overlap between the populations covered in both sets, but the SNP selections in the four studies do not provide us with a clear picture regarding the presence of Q*-M242(xQ1a-MEH2,xQ1b-M378), Q1a*-MEH2(xQ1a2-M25), Q1a2-M25 and Q1b-M378.

However, the distribution of Q1a3-M346 and Q1b-M378 across the Iranian plateau, in contrast with the specificity of Q1a2-M25 in Azeri Iranians and Turkmen (1.6% and 42.6% respectively, although the latter is likely due to genetic drift as discussed here), suggests a strain of the first two lineages is linguistically neutral and preceded the millennium of Turkish dynastic dominance in Iran.

Fortunately, such an inference is indeed supported by the Q1a and Q1b phylogenetic trees shown in this entry. One will note (particularly with Q1b-M378) the distribution is largely geographical rather than covering large swathes of Asian land through a "recent" paternal ancestor.

A comment on Assyrian Q-M242
Although the number of STR markers tested does not allow their inclusion in this research piece, I took the liberty of comparing the sole Assyrian Y-DNA Haplogroup Q-M242 individual from the FTDNA Assyrian Heritage DNA Project against the dataset to elaborate on their paternal ancestor's ultimate origins.

The Assyrian people are a Neo-Aramaic-speaking ethnic minority native to the land intersecting between Turkey, Iran and Iraq as well as the Mesopotamian basin. Modern Assyrians have (due to their Christian faith and recent historical events) practiced endogamous relationships, making them a genetically distinct group minimally affected by demic movements in the surrounding populations.

The Assyrian Y-DNA Q belongs to the Q1b1a-L245 subclade. As we have observed already, haplogroup Q1b-M378 tends to have a distribution governed more by geography with deeper cluster branches, implying greater diversification time in a given region.

At present, based on the available 10 overlapping STR's, the Assyrian Q1b1a-L245 individual matches Tur_yQP_3 best with a one-step mutation (9/10), placing them deep within Cluster C, the only one without a region-specific distribution. This preliminary evaluation indicates this Assyrian man's paternal ancestor shares Medieval genetic links with Anatolian Turkish, Iranian, Indian and Kazakh men, making a Central Asian Turkish connection likely once more.

Conclusion
Due to the limitations described above, the clusters identified are more informative for their geographic spread than for their age. The MRCA calculations shown are simply an extremely rough estimate of the age of a cluster.

However (and fortunately once more), it is very clear that some clusters are determined by geography rather than the sort of "genealogical boon" observed in a few (e.g. Q1a Cluster C's extensive branching despite being young relative to the others).

If one takes the MRCA calculations as a very rough approximation, whilst considering a cluster's ability to supersede regional boundaries, one can estimate that 75.5% (40/53) of the Y-DNA Haplogroup Q1a-MEH2 and 31.4% (11/35) of Y-DNA Haplogroup Q1b-M378 in West, Central and South Asia can be attributed to the Turkish migrations.
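
The proportions quoted above follow directly from the counts in brackets. The short sketch below reproduces the arithmetic; the helper name `share` is hypothetical, and rounding to one decimal place is an assumption about how the figures were presented.

```python
def share(count, total):
    """Percentage of `count` out of `total`, rounded to one decimal place."""
    return round(100 * count / total, 1)


# Counts taken from the text: samples falling in clusters attributed to
# Turkic migrations, out of all samples in each haplogroup.
q1a_share = share(40, 53)
q1b_share = share(11, 35)
print(f"Q1a-MEH2: {q1a_share}%  Q1b-M378: {q1b_share}%")
```

With these counts, 40/53 rounds to 75.5% and 11/35 to 31.4%.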

In summary, Y-DNA Haplogroup Q1a-MEH2 (likely Q1a2-M25 based on anecdotal SNP evidence) is a convincing Medieval Central Asian Turkish genetic marker based specifically on its ability to form multi-ethnic clusters in regions with a historical Turkish connection. Q1b-M378, on the other hand, generally displays enough regionalisation and cluster depth to make such an association doubtful at best, with the sole exception being those who belong to the genetic group highlighted in this entry (Cluster C) with DYS385a=14 and DYS448=20.

South Central Asian Q1b-M378 appears to be autochthonous whereas any form of Q1a-MEH2 in the region has a strong association with regions intimately connected with the Medieval Turks. The Anatolian highlands and the Iranian plateau, however, appear to be a complicated mix between the two based on the lack of clear distinctions.

The slim presence of Haplogroup Q in India, on the other hand, is almost entirely of Medieval Turkic input as far as the current data indicates, although the Subcontinent's position as a geographic nexus (much like Iran and Turkey) certainly opens the possibility for exotic para-haplogroups to also exist there.

Acknowledgement

Gratitude is extended to the FTDNA Projects for making their data publicly available. Independent research ventures such as my own would not be possible without their generosity.
I would also like to thank Mr. Paul Givargidze, administrator of the Assyrian Heritage, Aramaic and Y-DNA J1* DNA Projects at FTDNA for providing his esteemed support on this research entry.
The Y-DNA Haplogroup Q migration route maps are courtesy of the Genographic Project and FTDNA.

Addendum I [5/08/2012]: It has been brought to my attention that Tur_yQP_3, the Assyrian Q1b1a's best match, is in fact an Armenian individual. Although this does not compromise the conclusions reached above, it does serve as a reminder that not everyone in the Republic of Turkey is an ethnic Turk!
Addendum II [6/08/2012]: A recent exchange on a forum highlighted the likelihood of several Turk_yQP samples being Armenian rather than Anatolian Turkish. As above, this shouldn't impinge too greatly on what has been discussed in this entry.

References
1. Grugni V, Battaglia V, Hooshiar Kashani B, Parolo S, Al-Zahery N, et al. (2012) Ancient Migratory Events in the Middle East: New Clues from the Y-Chromosome Variation of Modern Iranians. PLoS ONE 7(7): e41252. doi:10.1371/journal.pone.

2. Haber M, Platt DE, Badro DA, Xue Y, El-Sibai M, Bonab MA, Youhanna SC, Saade S, Soria-Hernanz DF, Royyuru A, Wells RS, Tyler-Smith C, Zalloua PA; Genographic Consortium. Influences of history, geography, and religion on genetic structure: the Maronites in Lebanon. Eur J Hum Genet. 2011 Mar;19(3):334-40. Epub 2010 Dec 1.

3. Al-Zahery N, Semino O, Benuzzi G, Magri C, Passarino G, Torroni A, Santachiara-Benerecetti AS. Y-chromosome and mtDNA polymorphisms in Iraq, a crossroad of the early human dispersal and of post-Neolithic migrations. Mol Phylogenet Evol. 2003 Sep;28(3):458-72.

4. Abu-Amero KK, Hellani A, González AM, Larruga JM, Cabrera VM, Underhill PA. Saudi Arabian Y-Chromosome diversity and its relationship with nearby regions. BMC Genet. 2009 Sep 22;10:59.

5. Cinnioğlu C, King R, Kivisild T, Kalfoğlu E, Atasoy S, Cavalleri GL, Lillie AS, Roseman CC, Lin AA, Prince K, Oefner PJ, Shen P, Semino O, Cavalli-Sforza LL, Underhill PA. Excavating Y-chromosome haplotype strata in Anatolia. Hum Genet. 2004 Jan;114(2):127-48. Epub 2003 Oct 29.

6. Gokcumen Ö, Gultekin T, Alakoc YD, Tug A, Gulec E, Schurr TG. Biological ancestries, kinship connections, and projected identities in four central Anatolian settlements: insights from culturally contextualized genetic anthropology. Am Anthropol. 2011;113(1):116-31.

7. Roewer L, Willuweit S, Stoneking M, Nasidze I. A Y-STR database of Iranian and Azerbaijanian minority populations. Forensic Sci Int Genet. 2009 Dec;4(1):e53-5. Epub 2009 Jun 5.

8. Dulik MC, Osipova LP, Schurr TG. Y-chromosome variation in Altaian Kazakhs reveals a common paternal gene pool for Kazakhs and the influence of Mongolian expansions. PLoS One. 2011 Mar 11;6(3):e17548.

9. Haber M, Platt DE, Ashrafian Bonab M, Youhanna SC, Soria-Hernanz DF, et al. (2012) Afghanistan's Ethnic Groups Share a Y-Chromosomal Heritage Structured by Historical Events. PLoS ONE 7(3): e34288. doi:10.1371/journal.pone.0034288

10. Gayden T, Cadenas AM, Regueiro M, Singh NB, Zhivotovsky LA, Underhill PA, Cavalli-Sforza LL, Herrera RJ. The Himalayas as a directional barrier to gene flow. Am J Hum Genet. 2007 May;80(5):884-94.

11. Lacau H, Bukhari A, Gayden T, La Salvia J, Regueiro M, Stojkovic O, Herrera RJ. Y-STR profiling in two Afghanistan populations. Leg Med (Tokyo). 2011 Mar;13(2):103-8. Epub 2011 Jan 14.

Interpreting New Iranian Y-Chromosomal Data (Grugni et al.) [Review]

2012-07-19T22:12:00.000-07:00

Introduction

A new study on Iranian Y-Chromosomes released just yesterday has, to my satisfaction, adequately sampled every major ethno-linguistic group and determined the inter-provincial variation between them. Grugni et al. sampled 938 unrelated Iranian men from 15 ethnic groups (including Assyrians, Zoroastrians and Turkmen) in 14 provinces across the country.

Abstract

"Knowledge of high resolution Y-chromosome haplogroup diversification within Iran provides important geographic context regarding the spread and compartmentalization of male lineages in the Middle East and southwestern Asia. At present, the Iranian population is characterized by an extraordinary mix of different ethnic groups speaking a variety of Indo-Iranian, Semitic and Turkic languages. Despite these features, only few studies have investigated the multiethnic components of the Iranian gene pool. In this survey 938 Iranian male DNAs belonging to 15 ethnic groups from 14 Iranian provinces were analyzed for 84 Y-chromosome biallelic markers and 10 STRs. The results show an autochthonous but non-homogeneous ancient background mainly composed by J2a sub-clades with different external contributions. The phylogeography of the main haplogroups allowed identifying post-glacial and Neolithic expansions toward western Eurasia but also recent movements towards the Iranian region from western Eurasia (R1b-L23), Central Asia (Q-M25), Asia Minor (J2a-M92) and southern Mesopotamia (J1-Page08). In spite of the presence of important geographic barriers (Zagros and Alborz mountain ranges, and the Dasht-e Kavir and Dash-e Lut deserts) which may have limited gene flow, AMOVA analysis revealed that language, in addition to geography, has played an important role in shaping the nowadays Iranian gene pool. Overall, this study provides a portrait of the Y-chromosomal variation in Iran, useful for depicting a more comprehensive history of the peoples of this area as well as for reconstructing ancient migration routes. In addition, our results evidence the important role of the Iranian plateau as source and recipient of gene flow between culturally and genetically distinct populations."

[PDF]

Interpretation of Results

Iranian Y-SNP Frequencies

Data from the original study can be found opposite. In addition, several contour maps showing the frequency of select Y-DNA Haplogroups found across the country are shown along the right. Armenians, Zoroastrians and Assyrians from Tehran, as well as Afro-Iranians from Hormozgan province, are excluded. Note that updated ISOGG nomenclature was applied wherever deemed appropriate (refer to SNP's for clarification of status). Frequency ranges shown on maps are from 0-100%. Please note the maps are only intended to depict general trends rather than specific figures. Refer to the figures from the study (above) for these.

- Consistent with anthropological data and historical records from South Iran, the Y-DNA Haplogroups with frequencies greater in Africa than Eurasia (B-M60 and E2-M75) peak in Hormozgan province.

- Over half a dozen para-Haplogroups (C*-M216, F*-M89, H*-M69, IJ*-M429, J2*-M172, L*-M61, NO*-LLY22g, Q1*-P36.2 and R*-M207) were found scattered across Iran. Although the presence of para-Haplogroups within a region is often taken as an indicator of a lineage's antiquity there, both their consistency and correspondence with downstream younger clades must be considered before such a conclusion is made. As such, I do not consider the presence of H*-M69, NO*-LLY22g or C*-M216 in this cohort to indicate anything other than Iran's position as a geographic crossroads. The remaining ones (particularly J2*-M172, L*-M61 and R*-M207) require further investigation to elucidate whether Iran can stake a claim to the origin of each.

- Further to the above, it is likely that the R*-M207 reported in this paper is in fact R2*-M479 based on the dated SNP array used.

- C5-M356, a mysterious clade with a spotty distribution across much of Eurasia, makes a sporadic appearance across Iran. In the region, it is more commonly associated with the Indian Subcontinent.

Iranian J1c3-PAGE08

- Haplogroup G makes a strong appearance with, in my opinion, enough clade diversity to validate an origin in Iran or a close-by region. This is partially supported by its presence in every ethnic group, albeit through different subclades.

- Although IJ*-M429 has finally been found, Grugni et al.'s decision not to publish STR data does not give us the means to determine if the two Mazandarani and Persian men are in fact related within a genealogical timeframe. The significance of this find in Iran will have to remain pending.

- The lacklustre SNP resolution of the Y-DNA Haplogroup I found in Iran (in Gilaki, Bandari, Kurdish and Armenian populations, split between I1-M253 and I2-M438) dissuades strong conclusions regarding the development of I-M170 relative to IJ*-M429's discovery. The lack of STRs prevents us from ascertaining whether these are recent contributions from Europe or not, or whether there is any European connection to begin with.

- Both the frequency and subclade diversity of Haplogroup J2-M172 (as well as the presence of J2*-M172 and J2a*-M410 across the country) make Iran a strong candidate for the origin of this lineage.

- The strong presence of J1c3-PAGE08 is one of the surprising finds of this study. With an absence only amongst Assyrians from Azarbaijan province and a peak in Khuzestani Arabs (31.6%), I speculate this is an early Near-Eastern pastoralist nomad marker that is only accentuated in Khuzestani Arabs because the L147.1 marker (J1c3d), which is commonly associated in the literature with the expansion of Semitic languages (particularly Arabic), was not tested here. Otherwise, it would be difficult to reconcile medieval Arabic admixture among Iran's Zoroastrians being comparable to (and often greater than) that among Azeris, for instance, as Azerbaijan hosted Arab garrisons following the Sassanid collapse.

- Haplogroup Q presents a very distorted picture. That 42.6% of Turkmens belong to Q1a2-M25 is not in agreement with Wells et al.'s The Eurasian Heartland: A continental perspective on Y-chromosome diversity, where Haplogroups J, N, R1a and R1b predominated, suggesting either an extensive Founder effect has taken place (i.e. regionalisation of certain branches from a common Oghuz Turk pool) or the Golestani Turkmen values have experienced a more generic form of genetic drift.
On the matter of Turkic affinities, Azeris from Azarbaijan province have greater subclade variation than all other ethnic groups. However, the total frequency is comparable to (or less than) that of Persians nationwide. As it stands, if one were to presume Haplogroup Q in Iran was of Turkic origin, it would appear the Turkic contribution to the Persian and Azeri genepools is comparable despite linguistic differences. Although more data would certainly flesh this matter out, this diversity combined with the presence of N-M231 among Iran's Azeri population certainly gives a genetic basis for their linguistic heritage.

- Haplogroup R1a1a-M17 is regularly found at frequencies greater than 15% across Iran, contrary to the assertion Dr. Wells made one decade ago on the basis of the limited samples he obtained, again from The Eurasian Heartland: A continental perspective on Y-chromosome diversity:

Iranian G2a-P15

"Intriguingly, the population of present-day Iran, speaking a major Indo-European language (Farsi), appears to have had little genetic influence from the M17-carrying Indo-Iranians."

It is somewhat ironic, however, to note that the Persians from Fars province presented one of the lowest R1a1a-M17 frequencies observed in this study. Whether sampling chance is an issue here, or the sparsity of M17 is indeed a reality, is an open question.

- The presence of both R1a1-SRY1532.2 (shown as R1a* due to old nomenclature) and R1b*-M343 reaffirms the presence of these para-Haplogroups in the region, indicating West Asia was where Haplogroup R1-M173 began differentiating into the two primary subclades we see today in Eurasia.

- Haplogroup R1b1a2a-L23 is more frequent in the north and west of the country, which (together with its presence in the furthest southern and eastern poles at ~3%) suggests it likely moved in an overall south-easterly direction via diffusion, probably during the Neolithic.

- The distribution of Haplogroup R2a-M124 is, much like C5-M356, irregular. Contrary to what is shown in Haber et al.'s research, R2a is not more common in the east of the country. Instead, it can be found amongst Esfahani Persians at a frequency of 9.1%. That Iran's R2a frequency achieves its peak in the centre of the country is reminiscent of Sahoo et al.'s A prehistory of Indian Y chromosomes: Evaluating demic diffusion scenarios.

The sensationalist question of the hour: what accounts for the spike in R2a-M124 that has been picked up in Central Iran for the past half decade?

- Finally, Haplogroup T-M70 enjoys a frequency of 10.1% amongst Assyrians from Azarbaijan province, whilst also being more common among Persians across the country and Iranians from the western periphery of the country (Azeris and Kurds). This would suggest, therefore, an at least passive but deep association with ancient Near-Eastern cultures.

Criticisms of Paper

Despite the rich sampling pool, I have several immediate criticisms:

Iranian J1-M267

There are some issues with the sampling strategy employed by this paper. For instance, the Assyrians (a Christian, non-Arab, Semitic-speaking minority) are represented by 39 men, whereas Persians from Esfahan (a major Iranian city) are represented by only 11.
Inadequate haplotype data has been released; the only offering is 8 STRs from select lineages (e.g. J1*-M267) which were used for variance analysis.
Furthermore, a maximum of 10 Y-STR's were analysed, rendering some of their variance calculations questionable at such a low resolution. This also does away with the possibility of MRCA and intra-subclade age calculations.
Grugni et al. have approached Haplogroup R1a1a-M17 in a similar vein to past studies (e.g. Haber et al., see Showcasing of Y-DNA Variation Among Afghan Ethnic Groups) by not referring to current data concerning the structure of R1a1a. As with Haber et al., R1a1a-M458 is taken as the "European" strain, despite research undertaken by the R1a1a and Subclades Y-DNA Project revealing the apparent schism between the upstream Z283 and Z93 SNP's being far more informative in this regard.
Haplogroup R1b1a2*-L23 is considered a "West Eurasian" paternal contribution to the Iranian plateau, without considering the possibility that it may have originated within or in proximity to the country's western zone.
As shown in Interpretation of Results, Grugni et al.'s use of dated nomenclature poses problems for those who may not be intimately familiar with recent Y-SNP Tree changes by ISOGG.
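
To illustrate why the small Esfahani sample matters, the sketch below computes a 95% Wilson score interval for a frequency observed in a small cohort. This is my own illustration and not an analysis from Grugni et al.; it assumes the 9.1% R2a-M124 figure among the 11 Esfahani Persians corresponds to a single carrier (1/11), which is an inference from the percentages rather than a count quoted here. The function name `wilson_interval` is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width


# Assumed count: 1 carrier among 11 men (9.1%), inferred from the text.
low, high = wilson_interval(1, 11)
print(f"Observed 9.1%, plausible range {low:.1%} to {high:.1%}")
```

Under these assumptions the plausible range runs from roughly 2% to 38%, so a single additional or missing carrier would change the picture considerably.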

Acknowledgements

Map of Iran courtesy of D-Maps.com.

The Secrets of Central Asia: Chapter II - The Nomads of West Siberia [Review]

2012-07-17T11:16:00.001-07:00

Introduction
Molodin et al. have conveniently released an exciting paper just days ago, revealing the convergence and possible origins of maternal lines in several West Siberian sites across different points of time.

The authors made the following conclusions based on the data they had gathered;

"We therefore consider the appearance of the Haplogroup T-lineage as the most likely genetic marker of the Andronovo migration wave to the region....
Apparently, the Andronovo group... assimilated the aboriginal... population, from which it obtained these East-Eurasian mtDNA haplogroups. Obviously, there was reciprocal genetic contact between the migrant and indigenous groups in the region.
...These [autochthonous] components were represented by the Eastern Eurasian haplogroups A, C and Z, and the Western Eurasian haplogroup U5a. On the other hand, the results also reveal some changes in the mtDNA pool structure throughout the Bronze Age. Some of these changes, which point to migration waves to the West Siberian forest steppe zone, are in agreement with the archaeological and anthropological evidence. The most relevant migration waves occurred during the Middle Bronze Age (represented by the migration of the Andronovo culture, probably marked by Haplogroup-T lineages) and the transition from the Bronze to the Iron Age (represented by the migration from the south, marked by the U1a, U3 and H haplogroup lineages)."

[PDF]

In this blog entry, these conclusions are scrutinised together with the deeper ancestral associations of these haplogroup lineages with modern (and other ancient) populations.

The Original Paper's Findings
A total of 92 ancient DNA (aDNA) haplotypes in the form of mitochondrial DNA (mtDNA) were retrieved from five sites stratified across seven distinct archaeological periods in a fixed portion of West Siberia known as the Baraba forest-steppe, lying between the Irtysh and Ob rivers. These haplotypes were obtained from Hypervariable Region 1 (HVR1) of mtDNA and are included in the original study (shown as Table 3).

Sampling Sites in Baraba Forest-Steppe

As no burial remains dating to the Pleistocene (11th-12th millennium B.C.), the earliest period in which anatomically modern humans reached this region, have been found in or around the Baraba forest-steppe, the ultimate origins of the Early Bronze Age lineages are left open to interpretation. Nonetheless, below is a summary of each archaeological culture showcased in the paper, as well as relevant extracts from the literature. [1]

Ust-Tartas (4000-3000 B.C.)

The inhabitants of the earliest grave-containing Baraba prehistoric culture appeared, based on anthropological data, to be Caucasoid-Mongoloid hybrids of a type whose distribution spanned the swathe of forest from Karelia and the Baltic through to the Ural region. Numerous Russian sources have previously described this concept as the Northern Eurasian Anthropological Formation (e.g. Bunak V.V.). Additionally, a comparison with the nearby Comb-pit Ware culture revealed enough anthropological similarities to suggest the individuals of Ust-Tartas were likely to be autochthonous and not recent migrants.

Extent of the N. Eurasian Anthropological Formation

Of the 18 mtDNA haplotypes retrieved, East Eurasian lineages (A, C, D, Z) comprised a slight majority (11/18). The authors noted "widely distributed root haplotypes" for Haplogroups C and D, which presumably indicates greater antiquity of both in the region. The two individuals belonging to haplogroup A "[represent] a subcluster that is apparently characteristic of West Siberia and the Volga-Ural Region". There was surprise at the presence of Haplogroup Z based on its absence in modern inhabitants of West Siberia, a topic explored later in this entry.

The seven West Eurasian mtDNA haplotypes belonged entirely to Haplogroup U, comprising U2e, U4* and U5a1. The authors recalled the findings of several other recent studies on ancient DNA, stating these lineages likely belonged to "Eastern, Central and Northern European hunter-gatherer groups". [1]

Besides affirming previous literature concerning the migration corridor between East Europe to East Asia, the haplotypes also complement the anthropological data concerning their status as Mongoloid-Caucasoid hybrids.

Odinovo (3000 B.C.) and Krotovo (Early, 2000 B.C.)

Both of these cultures, regardless of stage, represent a fairly linear continuity from the populations and traditions of the Ust-Tartas culture before them.

The Odinovo culture succeeds Ust-Tartas, although it is viewed archaeologically as a synthesis between it and the Comb-Pit Ware. Anthropological kinship between it and contemporary Baraba findings also confirms the autochthonous nature of Odinovo. However, it differs from its antecedents in grave objects, funeral rites and the presence of bronze artefacts belonging to the Seima-Turbino cultural phenomenon, a short-lived (2200-1700 B.C.) but "striking" package of metallurgical goods originating around the Sayan-Altai region in South Siberia that was oriented westwards towards Europe. [2]

In turn, the Krotovo culture is partially derived from Odinovo, although it isn't without its own influences from adjoining regions. As well as "strikingly different" funeral rites, [1] new archaeological features, including items fashioned out of chalcedony, jaspilite and enstatite, point toward interactions of some degree with the Petrovo culture found further south in Kazakhstan, where the nearest deposits of these materials lie. It is worth noting the physical type of the Krotovo people revealed no significant changes, remaining in line with the previous autochthonous type.

A total of 16 mtDNA haplotypes were recovered from both Odinovo and the Early Krotovo stage. The spectrum of mtDNA Haplogroups remains unaltered from the Ust-Tartas samples, supporting the archaeological record of continuity.

The paper goes on to elaborate on the discrepancy between the mtDNA results and the archaeological features of Krotovo by stating "our data did not allow us to detect any Central Asian genetic influence". [1] Several possible explanations may be considered:

New material items from Petrovo accompanied a male-mediated migration towards Krotovo, resulting in some level of cultural assimilation
In support of the above, the Petrovo culture natives may have themselves been a southward extension of the "Northern Eurasian Anthropological Formation" and belong to the same basic physical type as Ust-Tartas, Odinovo and Krotovo individuals further north, making any inter-culture interactions difficult to infer
Some mode of transmission between Krotovo and Petrovo took place (trade, "package diffusion")

Further information is needed to ascertain which is more probable, including (but not restricted to) Y-Chromosomal data from all concerned cultures for evidence of (dis)continuity between Odinovo and Krotovo through southern influence, as well as anthropological data from Petrovo to determine if they were indeed of the same basic physical type.

The summation of the evidence provided, however, indicates material items from further south were brought northwards into the Baraba forest-steppe after 2000 B.C., but these cultural changes do not reflect in the native maternal lineages, implying less overt processes (or male-mediated migration) were causative.

Krotovo (Late, 1750 B.C.) and Andronovo (1500 B.C.)

The next significant period of Baraban history comes with the arrival of semi-nomadic pastoralists whose origins lay further to the west. We are, of course, referring to the founders of the Andronovo archaeological complex, whose Indo-European language, culture and even ideology eventually infiltrated deep into the Iranian plateau and Indian subcontinent through their utilisation of both horse and chariot. [3]

Schematic Tree of mtDNA Haplogroups Found

Within Baraba, despite the Krotovo population coexisting with these newcomers for a length of time (presumably due to their occupancy of different pastoralist niches), we see evidence of a shift from Seima-Turbino to Andronovo with regard to their material traditions. Andronovan dominance is also reflected in the eventual northward displacement of some Krotovo natives based on archaeological data. [1] However, cranioanalysis presents a more complicated picture; the presence of an "autochthonous Mongoloid" variant not typically seen in the Baraba forest-steppe, differing from the hybrid type seen for hundreds of years prior, may suggest the two were not in direct contact and Andronovan influence was exerted by proxy of other native groups who were displaced northwards and eastwards following their assimilation. This is anecdotally supported by Keyser et al.'s discovery of one Andronovo male (specimen S07) from near Krasnoyarsk in South Siberia carrying Y-DNA Haplogroup C*. [4] It is worth stating that the physical type of those from Andronovo is commonly described as "Variants of three proto-Europoid types" with a minor Mongoloid component. [1]

40 mtDNA haplotypes from Late Krotovo (1750 B.C.) and Andronovo (1500 B.C.) sites and time periods were taken. As expected, the same spectrum of mixed West-East Eurasian lineages made an appearance, except for the strong introduction of one new Haplogroup.

In both Late Krotovo and Andronovo, Haplogroup T reaches a stable frequency of 15% despite being completely absent in 34 earlier haplotypes. The authors cite this as direct genetic evidence of Andronovan influence on Late Krotovo and postulate that this lineage was, as a result, a major contingent in the Andronovo culture's spread.

All of these events precede the Irmen culture (1400-900 B.C.), the eventual successor to Andronovo. Those Irmen individuals found in the Baraba region were found to be predominantly Caucasoid and practiced a mixed economy of agriculture and animal husbandry. Only data from the Late stage (900-800 B.C.) was considered in the study.

Baraba (Late, 1000 B.C.)

The Late Baraba culture is a consequence of a Krotovo-modified Andronovo successor (known as Suzgan) interacting with the Irmen culture (described above). This was a particularly tumultuous period in West Siberian prehistory, with tribes continuously coalescing with one another and forming new identities in the process.

Anthropological data from the Late Baraba culture painted a far more diverse picture than over the previous three millennia. The authors noted that, contrary to the general insignificance of gender on physical type, the men were found to be more similar to a "Southern Eurasian Anthropological Formation", whereas females were closer to the Andronovan derivatives in North Kazakhstan.

Only five mtDNA haplotypes were recovered from this period. Haplogroups A and C were once again represented, as were U5b and T, indicating the previous assimilation events had been maintained up until this point.

Irmen (Late, 900-800 B.C.)

From 1000 B.C. onwards, a complex set of migrations took place in West Siberia between the cultures formed by this point. Archaeologists attribute this to ecological changes involving climatic cooling across the region.

The last sampled site is the Late Irmen culture, which is a continuation of the Irmen culture proper described earlier in this entry. The intricate interactions between cultures of this period are evident through multi-plural settlements in the archaeological record here.

The final 14 mtDNA haplotypes were, unexpectedly, a complete departure from the partial continuity that we have seen from Ust-Tartas up until Late Baraba. Almost all belonged to West Eurasian lineages, such as Haplogroups J, K and W. The study suggested the ultimate origins of these lineages lay further south, in the vicinity of West Kazakhstan and West Central Asia (Turkmenistan and Uzbekistan likely implied). This suggestion will be assessed in detail later in this entry.

Confirmation of the 'Migration Corridor'?
It is remarkable to finally find genetic evidence of the migration corridor, an archaeological concept mentioned several times in Vaêdhya, presented in such a definitive way (visit North European Component Variation within the Eurasian Heartland for additional information).

As it stands, we can now safely conclude that prehistoric hybridisation took place between hunter-gatherer Paleo-European populations and those from the East across the Eurasian steppe. The crossover of both along opposing ends of this corridor has been supplemented with aDNA and anthropological evidence, with the finding of a near-equal hybrid population midway between the two poles all but confirming what the raw results have already revealed. Therefore, the connection between Northeast Europe and East Asia through the Eurasian steppe (even before Proto-Indo-European's formation) can no longer be considered a hypothesis, but a verified reality of demic prehistory. If supported with autosomal DNA (auDNA) from similar gravesites, it will drastically alter our perception of the migrations that happened afterwards, as well as doing away with over-simplified models of how certain languages and cultures permeated across Eurasia.

Afanasievo: Without a trail?

It is interesting to note that, despite covering over 3,000 years of prehistory, there is yet to be a trace of the Afanasievo culture, the earliest known offshoot of Yamnaya in the east, across this territory. Under the Eurasian steppe theory, the Afanasievo culture is connected with pastoral nomads who spoke an early (proto) form of the Tocharian branch, an extinct Centum Indo-European language which subverts the Centum:Satem isogloss in Eurasia. [5] The only attested connection between Afanasievo and the Baraba forest-steppe is through interactions between its successor culture, the Karasuk, and the easternmost of the early Irmen. [1]

The question that persists is thus: where is the Afanasievo trail from Yamnaya through to the Urals and their final archaeological seat in South Siberia? Why have none of the Baraba forest-steppe cultures shown any indication of influence, be it cultural or anthropological, of Caucasoid pastoral nomads before those of Andronovo?

To arrive at one likely answer, Frachetti's Pastoralist Landscapes and Social Interaction in Bronze Age Eurasia clarifies the material culture and mode of living in Central Asia during the Bronze Age;

"The calibrated C14 dates of Afanas'evo material are generally slightly earlier than those taken from Yamnaya contexts in the western steppe, which complicates a diffusionist explanation of the emergence of pastoralists in the eastern steppe. Although their origins may be obscure, communities associated with Afanas'evo materials still represent the earliest mobile pastoralists east of the Ural Mountains... [their] incipient strategy of cattle and sheep/goat herding, supplemented by hunting and fishing.
The Afanas'evo subsistence economy might best be characterized as a mixed or transitional form between hunting/fishing and localized pastoralism, arising from local antecedents or combining native strategies with diffused domestic innovations among local populations.
...Perhaps the strongest evidence that divides the Yamnaya and Afanas'evo pastoralists in the mid-fourth millenium BCE is the discontinuity of pastoral economic strategies among societies living between these territories." [6]

If the Afanasievo culture was itself a combination of local hunting strategies and farming practices with their origins further west in the Yamnaya despite differing with contemporary societies above the Black and Caspian seas, one can postulate the Afanasievo people would have likely intermingled with native cultures in South Siberia whilst retaining their core pastoral attributes, and such an event would have occurred some time earlier.

The Afanasievo bearers need not have travelled through the Baraba forest-steppe either; the maps shown in Chernykh's The “Steppe Belt” of stockbreeding cultures in Eurasia during the Early Metal Age, [2] for instance, show a straight trajectory from the Urals to the Sayan-Altai region for the sake of clarity rather than on a factual basis. Little is currently known about the journey taken by these nomads, but the findings of this paper do help in confirming the founders of Afanasievo did not stray along the northern rim of the forest-steppe towards South Siberia.

References
1. Molodin VI, Pilipenko AS, Romaschenko AG, Zhuravlev AA, Trapezov RO. Human migrations in the southern region of the West Siberian Plain during the Bronze Age: Archaeological, palaeogenetic and anthropological data. 2012. Retrieved from here: http://www.degruyter.com/dg/viewbookchapter.fullcontentlink:pdfeventlink/contentUri?t:ac=books$002f9783110266306$002f9783110266306.93$002f9783110266306.93.xml [Last Accessed 17th July 2012]

2. Chernykh E. The “Steppe Belt” of stockbreeding cultures in Eurasia during the Early Metal Age. Trabajos De Prehistoria. 2008;65:73-93.

3. Kuz'mina EE. The Origin of the Indo-Iranians. Koninklijke Brill NV, Leiden, The Netherlands. 2007.

4. Keyser C, Bouakaze C, Crubézy E, Nikolaev VG, Montagnon D. Ancient DNA provides new insights into the history of south Siberian Kurgan people. Hum Genet. 2009;126:395–410.

5. Anthony DW. The Horse, the Wheel, and Language: How Bronze-Age Riders from the Eurasian Steppes Shaped the Modern World. Princeton University Press. 2007.

6. Frachetti MD. Pastoralist Landscapes and Social Interaction in Bronze Age Eurasia. University of California Press, Ltd. 2008.

Worldwide Distribution of Dodecad K10a Components [Review]

2012-06-26T18:13:00.002-07:00

Numerous ADMIXTURE runs have been completed by the Dodecad Ancestry Project since its inception approximately two years ago. The status of certain components remained tenuous despite subsequent runs, whilst others provided fairly stable values for the bulk of the project's participants.

With the completion of the latest K10a run, I have composed a series of geographically accurate frequency maps with the intention of effectively presenting the trends that can be seen through the raw data.

Method

Data; values from over 130 groups obtained through the Dodecad K10a Spreadsheet. Only groups with at least 5 participants considered. Composites of populations were taken where appropriate and denoted with _cmp. Labels shown otherwise identical to source. The O_Italian_D group was excluded because no information on their origins was found online.

Mapping; Dodecad participant populations allocated to national capitals. Exact location of reference populations obtained where possible (see Citations); however, some allowances were made regarding those accompanied by scant information. Refer to the Data Sink for the population list, coordinates and commentary made during the mapping process. No numerical data, aside from those given for certain populations, were shown, to minimise clutter and to remain faithful to the intention of this entry.

Population depiction; I deemed it necessary to separately consider the genetic structure of Jewish, Indian and expatriate/New World populations and exclude them from the rest of Europe, Asia or Africa. Including Jewish minorities with their gentile compatriots would render the maps uninformative. The complexity of India's demographics, particularly because of the caste system, makes frequency maps an improper choice for revealing inter-group genetic differences.
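The data-selection step described in the Method can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the group labels, component names and values are hypothetical placeholders (not Dodecad figures), and the sample-size-weighted mean used for the composite is my assumption, since the entry does not record how composites were calculated.

```python
from typing import Dict, List, Tuple

# (label, number of participants, component -> percentage)
Group = Tuple[str, int, Dict[str, float]]

MIN_PARTICIPANTS = 5  # threshold stated in the Method


def filter_groups(groups: List[Group]) -> List[Group]:
    """Keep only groups with at least MIN_PARTICIPANTS participants."""
    return [g for g in groups if g[1] >= MIN_PARTICIPANTS]


def composite(label: str, members: List[Group]) -> Group:
    """Merge several groups into one composite, tagged with the _cmp suffix.

    Assumption: a sample-size-weighted mean of each component.
    """
    total = sum(n for _, n, _ in members)
    components = members[0][2].keys()
    merged = {
        c: sum(n * values[c] for _, n, values in members) / total
        for c in components
    }
    return (label + "_cmp", total, merged)


# Hypothetical placeholder data, for illustration only.
groups = [
    ("Pop_A", 3, {"K1": 10.0, "K2": 90.0}),
    ("Pop_B", 5, {"K1": 20.0, "K2": 80.0}),
    ("Pop_C", 15, {"K1": 40.0, "K2": 60.0}),
]

kept = filter_groups(groups)        # Pop_A is dropped (fewer than 5)
merged = composite("Pop_BC", kept)  # labelled "Pop_BC_cmp"
```

Whether the threshold was applied before or after compositing is not stated in the entry, so the order shown here is just one possibility.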

Results

Acknowledgement

The raw values used in this investigation are attributed to Dienekes Pontikos, author of the Dodecad Ancestry Project.

Addendum I [04/07/2012]: Inclusion of All Components Colourised map, shown below:

Citations

http://www.uvm.edu/~rsingle/stat295/F05/papers/Cavalli-Sforza-NRG-2005_Ceph-HGDP-CDP.pdf
http://www.1000genomes.org/about

http://www.sanger.ac.uk/resources/downloads/human/hapmap3.html
http://alfred.med.yale.edu/alfred/recordinfo.asp?condition=populations.pop_uid='PO000019K
http://genome.cshlp.org/content/19/11/2154.full
http://upload.wikimedia.org/wikipedia/commons/b/b0/Caucasus-ethnic_en.svg

Secrets of Central Asia: Chapter I - The Pokrovsk Man [Review]

2012-06-17T22:22:00.001-07:00

The first of a series focused entirely on ancient DNA (aDNA) from ancient and prehistoric Central Asia, this entry covers the furthering of an investigation into frozen remains found in a remote part of Siberian Russia.

Pokrovsk, Sakha Republic, Russia

Introduction
In 2006, Amory et al. tested bone fragments from a grave found near Pokrovsk, a locale in the Russian federal republic of Sakha (Yakutia), with the intention of discerning the remains' origins. [1] Amory et al. briefly elaborate on the purported archaeological history of Siberia, where an autochthonous hunter-gatherer population was either subjugated or partially displaced by expanding Tungus-Manchurian nomadic tribes, before the movement of Yakut herdsmen northwards into their present demographic range as a result of Mongolian domination in the region between the sixth and thirteenth centuries. The abstract of the paper is reproduced below:

"The Yakuts, Middle Age Turkic speakers (15th–16th centuries), are widely accepted as the first settlers of the Altai-Baikal area in eastern Siberia. They are supposed to have introduced horses and developed metallurgy in this geographic area during the 15th or 16th century a.d. The analysis of the Siberian grave of Pokrovsk, recently discovered near the Lena River (61_29_ N) and dated by accelerator mass spectrometry from 2,400 to 2,200 years b.p., may provide new elements to test this hypothesis. The exceptional combination of various artifacts and the mitochondrial DNA data extracted from the bone remains of the Pokrovsk man might prove the existence of previous contacts between autochthonous hunters of Oriental Siberia and the nomadic horse breeders from the Altai-Baikal area (Mongolia and Buryatia). Indeed, the stone arrowhead and the harpoons relate this Pokrovsk man to the traditional hunters of the Taiga. Some artifacts made of horse bone and the pieces of armor, however, are related to the tribes of Mongolia and Buryatia of the Xiongnu period (3rd century b.c.). This affinity has been confirmed by the match of the mitochondrial haplotype of this subject with a woman of the Egyin Gol necropolis (Mongolia, 2nd/3rd century a.d.) as well as with two modern Buryats. This result allows us to postulate that contacts between southern steppe populations and Siberian tribes occurred before the 15th century."

[Link]

Grave Features
The Pokrovsk grave is located at the top of a glacial terrace near the Lena-Pokrovsk river junction. Radiocarbon dating places the site at approximately 2390-2190 YBP. The Pokrovsk Man was found to be skeletally gracile with a brachycephalic skull. His physical type was found to be Mongoloid, although the authors note it was "less accentuated" than that of Middle-Age Yakuts. It was also noted that the torus mandibularis, a normal variational bony protuberance located on the interior aspect of the mandible, was absent, despite it occurring commonly in East Asian and Native American populations. [2] Several material items were observed in the grave, including bone tools, harpoon heads, reindeer bone armour and flint arrowheads connected to archaic Siberian culture. However, other goods, including an iron arrowhead, are reputedly of South Siberian manufacture. [1]

Methods
DNA was extracted from bone by the technique outlined in Keyser-Tracqui & Ludes’ Methods for the study of ancient DNA. [3] Autosomal DNA (auDNA) was retrieved using the Profiler+ Multiplex kit (nine Short Tandem Repeats, or STRs). Y-Chromosomal DNA (Y-DNA) was tested using the Powerplex Y System (eleven STRs) as well as a Single-Nucleotide Polymorphism (SNP) on the TAT locus. Finally, a 421 base pair (bp) segment of the sample’s mitochondrial DNA (mtDNA) at the first hypervariable segment (HVS1) was tested (position 15989→16410) and compared with the Cambridge Reference Sequence (CRS).

Consensus data obtained directly from paper; auDNA analysis was achieved through popSTR, an online research processing engine which displays auDNA STR allele frequencies within different populations. [8]
mtDNA and Y-DNA analysis would have ideally been conducted through ySearch, mitosearch, the SMGF, supplementary data from relevant scientific literature as well as online DNA projects.

Allelic Frequencies

auDNA Analysis

Nine auDNA STRs were retrieved from the Pokrovsk Man's remains. Unfortunately, the utility of STRs is questionable given they have a large margin of error and lack population specificity due to the presence of multiple alleles within a single population, as well as heavy inter-population overlap. This investigative tool has largely been made redundant by SNP testing, which employs thousands of markers rather than a few. Nonetheless, processing of these results will still be attempted.

The allelic frequencies per worldwide regional group for the retrieved STRs are shown opposite. All markers from the Profiler+ Multiplex were utilised in the subsequent popSTR search. The sample populations are largely derived from the HGDP-CEPH Human Genome Diversity Cell Line Panel. [4]

African frequencies of the Pokrovsk alleles are generally lower relative to Eurasian, Oceanian and American regional groups. This warrants the exclusion of such values from the analysis hereon due to their uninformative nature, apart from confirming the Pokrovsk Man had no recent African ancestry, which is in accordance with anthropological, historical and linguistic data from Siberia. Allele frequencies of remaining regions are shown in the Data Sink.

To elucidate the regional affinities of the Pokrovsk Man, averages for the alleles across the given regions were taken and ranked in order of descending magnitude (found again in the Data Sink).
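The averaging-and-ranking step above amounts to the following minimal Python sketch. The region labels and frequencies are hypothetical placeholders chosen purely for illustration; the actual per-allele values are those in the Data Sink and are not reproduced here.

```python
from statistics import mean
from typing import Dict, List, Tuple


def rank_regions(freqs: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    """Average the allele frequencies per region, then rank in descending order."""
    averages = {region: mean(values) for region, values in freqs.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)


# Hypothetical placeholder frequencies (one value per allele), NOT the popSTR data.
example = {
    "Region_A": [0.25, 0.75],
    "Region_B": [0.125, 0.125],
    "Region_C": [0.5, 0.25],
}

ranking = rank_regions(example)
for region, average in ranking:
    print(region, average)
```

A plain arithmetic mean treats every allele equally regardless of how informative the marker is, which is one reason the ranking should be read as a rough guide only.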

The results indicate his affinity was greatest to the Americas, followed by East Asia and Europe (discussed later) in joint position, ending with the Middle-East and South-Central Asia. The discrepancy between the American and East Asian scores is explained by the East Asia regional group being constituted largely of ethnic groups from East Asia proper and Southeast Asia, such as the She, Naxi and Japanese. The Yakuts, who are the only sample population located in Siberia, are a part of this group, reducing the specificity further. However, the greater score to Native American and East Asian populations than others is still consistent with both geographic position and the known demic expansions into both regions.

The decreased allelic frequency average of South-Central Asians and Middle-Easterners with the Pokrovsk Man supports the above further. However, the Middle-Eastern group did not include populations from West Asia or the Caucasus, such as Anatolian Turks, Iranians or Georgians. Additionally, the lack of North-Central Asian ethnic groups such as the Kazakh, Tatars or Altaians may affect the results further.

It would have been preferable if auDNA SNPs had been obtained instead and compared with specific sample populations; better yet if IBD segment analysis had also been undertaken. SNP analysis could have been possible in 2006, given the HGDP-CEPH samples were made available at least four years prior, [4] which would have opened the door to analysis far deeper than that undertaken by Amory et al. or even this investigation.

The authors greatly limited the extent of their own investigation, noting the Pokrovsk Man showed identical matches with Buryats, West Siberians, Altaian Mansis, ancient and modern Yakuts, one Evenk and an Egyin Gol necropolis female [5] in their private haplotype database.

mtDNA Analysis
Of the ten loci tested, only three yielded consistent nucleotide variations (16223T-16362C-16368C). Only the mitosearch 1-step matches with a known maternal ancestor location were considered (Data Sink). These results not only confirm Amory et al.'s conclusion that the Pokrovsk Man belonged to mtDNA Haplogroup D, but also show a distribution concentrated within Asia, as expected based on modern samples. [6]
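For readers wishing to repeat the matching step offline, here is a minimal Python sketch. It assumes a "1-step match" means the two HVS1 motifs differ at no more than one position relative to the CRS; that definition is my assumption rather than documented mitosearch behaviour. The database entries are hypothetical; only the Pokrovsk motif is taken from the text above.

```python
from typing import Dict, List


def parse_motif(motif: str) -> Dict[int, str]:
    """Turn '16223T-16362C' into {16223: 'T', 16362: 'C'}."""
    return {int(m[:-1]): m[-1] for m in motif.split("-") if m}


def step_distance(a: str, b: str) -> int:
    """Count positions where the two motifs disagree (absent = same as CRS)."""
    pa, pb = parse_motif(a), parse_motif(b)
    return sum(1 for pos in pa.keys() | pb.keys() if pa.get(pos) != pb.get(pos))


def one_step_matches(target: str, database: Dict[str, str]) -> List[str]:
    """Return the IDs of database entries within one step of the target."""
    return [
        entry_id
        for entry_id, motif in database.items()
        if step_distance(target, motif) <= 1
    ]


POKROVSK = "16223T-16362C-16368C"  # motif reported in the text

# Hypothetical database entries, for illustration only.
database = {
    "entry_1": "16223T-16362C-16368C",  # exact match
    "entry_2": "16223T-16362C",         # one step away
    "entry_3": "16189C-16223T-16362C",  # two steps away
}

matches = one_step_matches(POKROVSK, database)
```

Comparing by position rather than by raw string means two different bases at the same site count as a single step, not two.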

Unfortunately, once more, the scope of the initial investigation has hindered any further analysis, as the lack of testing regions beyond HVS1 cannot elucidate the extent of mitochondrial sharing outside of the data showcased here.

Y-DNA Analysis
None of the eleven Y-DNA STRs provided a successful return. The only SNP tested for was TAT, where a T→C mutation is considered equivalent to the M46 marker, which defines Haplogroup N1c under the current International Society of Genetic Genealogy (ISOGG) nomenclature. [7]

As the Pokrovsk Man yielded a T allele at this locus, his Y-DNA Haplogroup could not have been N1c-M46. However, this does not rule out him belonging to a lineage upstream of N1c-M46.

European Affinities & Conclusion
Despite the great limitations, several invaluable inferences can be made from the data presented in the furthering of Amory et al.'s Early influence of the steppe tribes in the peopling of Siberia which cannot be reasonably excluded as anomalous without also discarding conclusions made from other sources.

The auDNA results, though derived from STR data, fully agree with the SNP-based analysis of the Eurogenes Project by David W. in a previous run (described in an earlier Vaêdhya entry), as modern Siberian populations show trace values of various European or Caucasian ADMIXTURE components at the least with an absence of Southwest or South Asian specific components, whilst being predominantly Siberian and East Asian.

The European affinity in this investigation coming third may form a convenient explanation for why the Pokrovsk Man's features were less Mongoloid anthropometrically than Middle-Age Yakuts. It may suggest a West Eurasian physical element existed prior to the tribal and political upheavals that resulted in the Yakut settlement deeper into this portion of Siberia. Although the origins of this element were not elaborated upon, there may also be a connection with the postulated "migration corridor" covered previously and described in Malyarchuk et al.'s On the Origin of Mongoloid Component in the Mitochondrial Gene Pool of Slavs. [10]

This result supplements the picture of a West Eurasian genetic component of ambiguous origins being brought towards Siberia, challenging one interpretation of West Eurasian physical influence in the region stopping abruptly at Lake Baikal. [9] Instead, the totality of the evidence presented raises the possibility of this influence extending itself beyond the lake and manifesting itself simply as a "reduction" of Mongoloid cranial characteristics, which the Pokrovsk Man demonstrated, whose anthropometric configuration may well have been an artefact of this.

Unfortunately, the mtDNA and Y-DNA results were far too non-specific to merit further analyses. Their generality, however, does pose several questions: what subtype of mtDNA Haplogroup D did the Pokrovsk Man belong to? If he was not Y-DNA Haplogroup N1c-M46, what was he?

The material goods found in the Pokrovsk Man's gravesite may point us in the direction from which his apparent European affinities came. As South Siberia was the source of his iron and horse-derived goods, could he also have inherited West Eurasian genes from there? Were the contributors ancient, or prehistoric?

Acknowledgements
Pokrovsk map from WolframAlpha.

References
1. Amory S, Crubézy E, Keyser C, Alekseev AN, Ludes B. Early influence of the steppe tribes in the peopling of Siberia. Hum Biol. 2006;78:531-49.

2. Apinhasmit W, Jainkittivong A, Swasdison S. Torus Palatinus and Torus Mandibularis in a Thai population. ScienceAsia. 2002;28:105-111.

3. Keyser-Tracqui C, Ludes B. Methods for the study of ancient DNA. Meth. Mol. Biol. 2005;297:253–264.

4. Rosenberg NA. Standardized subsets of the HGDP-CEPH Human Genome Diversity Cell Line Panel, accounting for atypical and duplicated samples and pairs of close relatives. Ann Hum Genet. 2006;70:841-7.

5. Keyser-Tracqui C, Crubézy E, Ludes B. Nuclear and Mitochondrial DNA Analysis of a 2,000-Year-Old Necropolis in the Egyin Gol Valley of Mongolia. Am J Hum Genet. 2003;73:247–260.

6. Mishmar D, Ruiz-Pesini E, Golik P, Macaulay V, Clark AG. Natural selection shaped regional mtDNA variation in humans. Proc Natl Acad Sci. 2003;100:171-6.

7. Zerjal T, Dashnyam B, Pandya A, Kayser M, Roewer L. Genetic relationships of Asians and Northern Europeans, revealed by Y-chromosomal DNA analysis. Am J Hum Genet. 1997;60:1174–1183.

8. Amigo J, Phillips C, Salas T, Fernández Formoso L, Carracedo A. pop.STR - An online population frequency browser for established and new forensic STRs. Forensic Sci. Int. Gene. Suppl. 2009.

9. Mooder KP, Schurr TG, Bamforth FJ, Bazaliiski VI, Savel'ev NA. Population affinities of Neolithic Siberians: A snapshot from prehistoric Lake Baikal. Am J Phys Anthropol. 2006;129:349-61.

10. Maliarchuk BA, Perkova MA, Derenko MV. Origin of the Mongoloid component in the mitochondrial gene pool of Slavs. Genetika. 2008;44:401-6.

North European Component Variation within the Eurasian Heartland [Original Work]

2012-03-31T05:35:00.005-07:00

As studies of DNA variation across Asia have progressed over the years (Wells et al., Xing et al., teaser mtDNA results from Burger et al.'s upcoming analysis of prehistoric Eurasian steppe remains), the theme of ancestral markers with origins in Europe has remained a frequent one, particularly with regard to the expansion of Bronze Age semi-pastoral nomads from the Pontic-Caspian steppe bearing the Indo-European languages.

David W. of the Eurogenes Genetic Ancestry Project has recently posted data online from a new Intra-European run using ADMIXTURE (K=12) with the intention of breaking up the North European component that often arises through the program. Spreadsheet results here.

This brief investigation seeks to identify the North European-derived component patterns within Asia by first mapping out the frequencies and then correlating with Eurogenes' release notes on each.

Method
As many samples as possible from immediately-identifiable populations were obtained from the spreadsheet results (link above). No sample restrictions were implemented. Averages of each population were calculated, except where n=1. No modifications were made to population labels except for Eurogenes population averages, denoted by the addition of a _Eg suffix. Populations were then allocated into arbitrary regional groups, allowing results to be displayed more coherently.
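The averaging described in the Method can be sketched in Python as below. The population labels, component name and values are hypothetical placeholders rather than Eurogenes figures, and the function names are my own; the only details taken from the entry are the per-population mean, the n=1 exception and the _Eg suffix.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Sample = Tuple[str, Dict[str, float]]  # (population label, component -> value)


def population_averages(samples: Iterable[Sample]) -> Dict[str, Tuple[int, Dict[str, float]]]:
    """Group individual samples by population and average each component.

    Where n=1 there is nothing to average, so the single sample's
    values pass through unchanged.
    """
    grouped = defaultdict(list)
    for population, values in samples:
        grouped[population].append(values)

    result = {}
    for population, rows in grouped.items():
        n = len(rows)
        components = rows[0].keys()
        result[population] = (
            n,
            {c: sum(row[c] for row in rows) / n for c in components},
        )
    return result


def label_eurogenes_average(label: str) -> str:
    """Mark a ready-made Eurogenes population average with the _Eg suffix."""
    return label + "_Eg"


# Hypothetical placeholder samples, for illustration only.
samples = [
    ("Pop_X", {"K1": 1.0}),
    ("Pop_X", {"K1": 3.0}),
    ("Pop_Y", {"K1": 5.0}),  # n=1: reported as-is
]

averages = population_averages(samples)
```

Keeping n alongside each average makes it easy to flag single-sample populations, such as the Selkup (n=1), when reading the results.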

Results
Tabulated results can be found in the Data Sink. Autosomal variation per Regional Group can be found below:

The North European-derived components, despite their exceptionally close Fst distances relative to the other components, do seem to reveal a few interesting trends:

Northeast European appears to (at least partially) be the result of allele sharing with populations further east, as evidenced by its predominance in East-Central Asian groups, as well as extending even further eastwards into the Siberian Selkup (n=1). This component has a circumstantial correlation with the craniometric and ancient mtDNA evidence suggestive of a "migration corridor" between Eastern Europe and Siberia (Malyarchuk et al.'s On the Origin of Mongoloid Component in the Mitochondrial Gene Pool of Slavs, Newton's Ancient Mitochondrial DNA From Pre-historic Southeastern Europe: The Presence of East Eurasian Haplogroups Provides Evidence of Interactions with South Siberians Across the Central Asian Steppe Belt). While it also explains this component's abundance in North Caucasian populations (which lie en route between Ukraine and Siberia), the same cannot be said with absolute certainty of South-Central Asia. With that being said, the 0.021 Fst distance with West European despite the markedly different distributions suggests both are the result of prehistoric (possibly paleolithic?) hunter-gatherer migration paths across large swathes of Eurasia.
West European has a sporadic appearance across Asia, with a peak in the North Caucasus. This implies that, staying true to its assigned label, it is a generic West Eurasian component that has reached a maximum in Western Europe, with the North Caucasus representing the closest point of reference to there. Indeed, this inference is made independently by Eurogenes, albeit using different parameters:

"I used samples of Scottish, Irish and Western English ancestry to create this cluster. Not surprisingly, it peaks in individuals of Western Irish descent. However, it also peaks in Basques and many Iberians, which is fascinating, because that makes it the autosomal equivalent of Y-chromosome haplgroup R1b in Europe."

North Sea and South Baltic accompany one another at similar frequencies across much of Asia, especially in populations with an Indo-Iranian-speaking heritage (observe the ~0.8-1:1 ratio among Kurds, Iranians, the Turkmen, Uzbeks, Tajiks, Brahmins, Kshatriyas and Kyrgyz as examples of this). It is interesting to note that, of the two, only the North Sea component is readily present in East-Central Asians. The only other likely migration path along this trajectory is that of the proto-Tocharians, who (under the Eurasian steppe theory) split off from the Proto-Indo-European homeland several millennia prior to the Proto-Indo-Iranians that eventually formed the Andronovo archaeological horizon from Sintashta/Pit Grave (E Kuz'mina, The Origin of the Indo-Iranians, pg.451). Perhaps this near-solitary North Sea component within the Altaians, Mongolians and Uyghurs is attributable to early speakers of Tocharian? Perhaps the elevated presence of the North Sea component in South-Central Asia (Jatts, Pathans, Kyrgyz) is a relic of the Kushans, nomads supposedly a part of the Yuezhi confederacy, who may have been Tocharian speakers themselves?
One curious phenomenon is the similar West European-North Sea-Northeast European component proportions across the Turkmen, Uzbeks, Kyrgyz, Pathans, Uttar Pradesh Brahmins, Altaians and the Uyghur. It remains unclear whether this can be substantiated in any way or is simply an anomalous association produced by non-uniform and varying sample sizes, so no firm conclusion can be made.
North European-derived frequencies among Southwest Asian Semitic-speaking groups shown here seldom exceed 1% apiece and are either the result of recent, inconsistent small-scale admixture events or are simply background noise generated by ADMIXTURE.

Summary

The Northeast European and West European components appear to have a distribution independent of any significant migration events since the Neolithic, instead being associated with either the "migration corridor" across Eurasia or simply being the result of mutual West Eurasian heritage. North Sea and South Baltic, on the other hand, do seem to correlate with one another and support (rather than contradict) the eastward movement of Bronze Age semi-pastoral nomads speaking early dialects of Proto-Indo-European.

Edit I [31/03/2012]: Correction of erroneous Brahmin results due to Google Spreadsheet lag.

Showcasing of Y-DNA Variation Among Afghan Ethnic Groups [Review]

2012-03-29T09:27:00.011-07:00

This very recent paper on Afghan Y-Chromosomes was released by M Haber et al. and provides us with an insight into the paternally-determined genetic structure of several Afghan populations.

Afghanistan's Ethnic Groups Share a Y-Chromosomal Heritage Structured by Historical Events
Haber M, Platt DE, Ashrafian Bonab M, Youhanna SC, Soria-Hernanz DF, et al. (2012) Afghanistan's Ethnic Groups Share a Y-Chromosomal Heritage Structured by Historical Events. PLoS ONE 7(3): e34288. doi:10.1371/journal.pone.0034288

"Afghanistan has held a strategic position throughout history. It has been inhabited since the Paleolithic and later became a crossroad for expanding civilizations and empires. Afghanistan's location, history, and diverse ethnic groups present a unique opportunity to explore how nations and ethnic groups emerged, and how major cultural evolutions and technological developments in human history have influenced modern population structures. In this study we have analyzed, for the first time, the four major ethnic groups in present-day Afghanistan: Hazara, Pashtun, Tajik, and Uzbek, using 52 binary markers and 19 short tandem repeats on the non-recombinant segment of the Y-chromosome. A total of 204 Afghan samples were investigated along with more than 8,500 samples from surrounding populations important to Afghanistan's history through migrations and conquests, including Iranians, Greeks, Indians, Middle Easterners, East Europeans, and East Asians. Our results suggest that all current Afghans largely share a heritage derived from a common unstructured ancestral population that could have emerged during the Neolithic revolution and the formation of the first farming communities. Our results also indicate that inter-Afghan differentiation started during the Bronze Age, probably driven by the formation of the first civilizations in the region. Later migrations and invasions into the region have been assimilated differentially among the ethnic groups, increasing inter-population genetic differences, and giving the Afghans a unique genetic diversity in Central Asia."

[PDF] [Supplementary Data]

Tabulated Y-DNA Haplogroup frequencies of the 204 individuals sampled distinguished by ethno-linguistic affiliation (ISOGG 2011 Nomenclature utilised) can be found in the Data Sink.

Results (populations with a sample count of ~50 only)

- Haplogroup B-M60, a marker that would normally be expected among African populations, makes a surprising appearance in the Afghan Hazara. Superficial STR analysis (17/19 haplotype match between all) suggests a recent common paternal ancestor, although the timeframe and ultimate origin of this common ancestor are another question.

- Haplogroup C3-M217 has invariably been associated with the expansion of Altaic-/Mongolic steppe populations since medieval times. The greater frequency (33.9%) in the Hazara relative to the Tajiks and Pashtuns appears to support this, as well as the commonly-held belief they partially descend from Mongolian tribes.

- The Hazara E1b1b1c1-M34 also stems from a common ancestor (all three share the exact 19 STR haplotype).

- The single man belonging to Haplogroup G1-M285 is of Tajik descent. It is possible this man's paternal line arrived with eastward migrating Persians following the Sassanid collapse in 651 A.D.

- As shown in previous studies, the Pashtun Haplogroup G men are again G2c-M377 (entirely this time, in contrast with Lacau et al.)

- Paragroups H*-M69, J2a*-M410, Q*-M242 and R*-M207 all indicate that Afghanistan played an important role in the demic development of their downstream subclades, or was at the very least a geographic nexus. It is worth noting that the Hazara Q* men belong to a different haplotype to their Pashtun and Tajik compatriots, again indicating genetic drift has taken place since the formation of the Hazara ethnic group (or, instead, paternal consistency through the presumed Mongolic layer that eventually formed modern Hazaras).

- In previous studies (Sengupta et al., Lacau et al.), several haplotypes without backbone SNP testing were found to belong to Haplogroup I, which is frequently considered a lineage specific to Europe. For the first time we have evidence of an I clade (I2b1-M223) in South-Central Asia, specifically among the Hazara and Tajik. The following is a recent exchange with Professor Ken Nordtvedt regarding the I2b1-M223 samples;

"The two Hazara seem related. Both haplotypes look like M223+, with the Tajik one like Continental2 characteristic of central Europe.
The Hazara haplotype looks more like M223+ Roots. But both have some problems with being considered close matches to European haplotypes.
...
I don’t think such tmrcas would be worth much. I still don’t have a firm subclade of M223 to work with for either haplotype."

Due to the limited STR's it is not possible to cleanly place these I2b1 haplotypes into any of the existing clusters/subclades. However, Haplogroup I2b itself appears to be thousands of years old (Nordtvedt's I tree, final page). This opens up the possibility for an endogenous form of Haplogroup I existing in South-Central Asia.

- A single Tajik belonging to J1c3-P58 was postulated to potentially be of Arabian origin. As the (minuscule) Afghan Arab sample did not yield any J1c3, other possibilities should be considered, such as contacts with the Iranian plateau over the past few millennia.

- The Tajiks were the only population to boast the presence of all major subclades within Haplogroup L (L1a-M27, L1b-M317, L1c-M357). In line with their greater frequency relative to the Tajiks and Hazaras, several Pashtun L1c-M357 samples share similar (exact-to-2-step mutation) matches, suggesting another example of genetic drift.

- Although the Laghman Pashtuns share a similar L1c-M357 haplotype (16-17/19 match), so does the sole Tajik L1c from the same location, providing us with genetic evidence of recent mutual origins between Pashtuns and Tajiks in certain parts of Afghanistan.

- The Tajik population is more paternally diverse than all others sampled. Explanations include a less endogamous cultural character or the more recent imposition of the "Tajik" identity, which arrived with the medieval Turks.

- R1b1a*-P297 (xM269) and R1b1a2*-M269 (xU106) both appear in Uzbek and Tajik populations. Both the R1b1a*-P297 haplotypes are identical and belong to a Tajik and Uzbek, again showing there is some recent paternal overlap between Central Asian ethnic groups. I discovered the haplotype does not generally correspond with any of the established clusters in the R1b1a1-M73 Project, although there is a 13/15 match with a Tajik from Cluster B1. Although the limited STR's are unfavourable, I am of the opinion the match is substantial and the R1b1a*-P297 reported in this study is in fact R1b1a1-M73 and belongs to Cluster B1, whose membership also consists of other Tajiks, Uzbeks and an Anatolian Turk.

- It is very interesting to note that all the locations showing R1b1a*-P297 (xM269) and R1b1a2*-M269 (xU106) (Badakhshan, Herat, Takhar and Mazar-e-Sharif) lie on a horizontal band that runs across the north of Afghanistan, particularly as the Bactria-Margiana Archaeological Complex (BMAC) was situated here.

Criticisms of Paper

- Haplogroup R2a-M124 has been erroneously correlated with aboriginal Subcontinental populations when results from the R2 WTY Project indicate places like India are a "sink" rather than a "source" (most Indian R2a is R2a1-L295, which has a spotty distribution across the rest of Eurasia).

- Haplogroup L is, much like R2a, an understudied lineage, presumably due to its paucity in Europe. The once-common assumption in the population genetics and genealogical world that the frequency of a given lineage in a region/population signifies its antiquity there has been proven to be inherently false through STR and SNP analysis. Haplogroup L may enjoy greater frequencies in India according to the sources at their disposal, but the presence of different L subclades in Central and West Asia should have at least given the authors the initiative to investigate the lineage's deeper structure rather than relying on a population genetics tagline from at least 2006 (Sengupta et al.).

- Despite the recent boom in research on Haplogroup R1a1a-M17's structure by independent genetic genealogists and projects (such as the R1a1a and Subclades Y-DNA Project), Haber et al. failed to include any of the pivotal SNPs that have been discovered since Underhill et al. from 2009, thus preventing observers from making any meaningful conclusions from the current findings, particularly in the context of the Indo-European migrations (generally accepted from the Eurasian steppes).

- When divided into ethno-linguistic lines, this study showcases 3 Arabs, 13 Balochis, 59 Hazaras, 5 Nurestanis, 49 Pashtuns, 56 Tajiks, 1 Turkmen and 17 Uzbeks. The most immediate criticism is inadequate testing of the Arabs, Nurestanis, Balochis, Turkmens and Uzbeks in particular.
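The sample-size criticism can be made concrete with a confidence interval. The following is a minimal sketch, assuming a Wilson score interval is an acceptable way to express the uncertainty of a haplogroup frequency; the function name and the "one carrier per group" scenario are illustrative and are not taken from Haber et al.

```python
import math

def wilson_interval(carriers, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    p = carriers / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Sample sizes as listed in the criticism above.
sample_sizes = {"Arab": 3, "Baloch": 13, "Hazara": 59, "Nurestani": 5,
                "Pashtun": 49, "Tajik": 56, "Turkmen": 1, "Uzbek": 17}

# Hypothetical scenario: a single carrier of some haplogroup in each group.
for group, n in sample_sizes.items():
    low, high = wilson_interval(1, n)
    print(f"{group:10s} n={n:2d}  1 carrier -> {low:.1%} to {high:.1%}")
```

The smaller the group, the wider the interval, which is why a frequency reported for 3 or 5 men says very little about the population they were drawn from.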

Evaluation

Despite several glaring flaws in methodology, Haber et al. have provided us with a much-needed insight into the deeper genetic structure of Afghanistan's Y-Chromosome diversity. There is clear evidence of genetic drift (particularly among the Pashtun Q*-M242/L1c-M357 or Hazara C3-M217), as well as evidence of recent line sharing between populations (the situation of L1c-M357 in Laghman).

In addition, Haber et al. have thrown up some very interesting surprises (T1-M70 among Tajiks only) as well as validated results from previous studies that had been questioned (I2b1-M223 and R1b1a2-M269 particularly). How did these lineages arrive in Central Asia? Is recent colonial admixture a possibility? For the time being, we will have to contend with these questions steadfastly.

Addendum I [30/03/2012]: Determination of R1b1a*-P297 furthered with regard to it potentially being R1b1a1-M73.
Addendum II [30/03/2012]: Insertion of Nordtvedt correspondence.

Autosomal variation from Anatolia to the Tarim periphery [Original Work]

2012-02-09T15:48:00.003-08:00

The nature of ADMIXTURE as a tool for inferring ancestral components makes it difficult to discern the nature of a shared Autosomal component between several populations. For instance, a given component may originate in one population and be donated to others (e.g. purported African admixture in the Arabian Peninsula), stem from a mutual population (e.g. West Eurasian-specific components in low K=n runs between the Druze and the French Basque) or be the result of genetic drift (e.g. potentially, the peaking of East Asian-specific components in Korea and Japan).

Nevertheless, using results from the latest Dodecad Ancestry Project K12b run (link), I have investigated the component variation across a horizontal axis from Anatolia to the Tarim periphery in West China, with the intention of establishing the nature of the observed components across this area of interest. Raw values can be viewed on the newly-published Vaêdhya Data Sink. Populations are listed in a geographical cline.

One of the most immediate observations is the similarity between Kurdish and Iranian populations, with both expressing similar admixture percentages (deviation per component usually not >1%). This suggests that Kurds and Iranians have common origins, with the former largely maintaining those ancestral signals despite moving further westwards relative to their linguistic cousins in Iran.
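The "deviation per component" comparison is straightforward to reproduce from the raw values. The following is a minimal sketch; the percentages are placeholders chosen for illustration only and are NOT the real Dodecad K12b values, which should be taken from the Data Sink. The function and variable names are likewise hypothetical.

```python
def component_deviation(pop_a, pop_b):
    """Absolute difference per component between two admixture profiles (in %)."""
    components = sorted(set(pop_a) | set(pop_b))
    return {c: abs(pop_a.get(c, 0.0) - pop_b.get(c, 0.0)) for c in components}

# Placeholder percentages for illustration, not the published K12b averages.
kurd_example = {"Gedrosia": 28.0, "Caucasus": 35.5, "Southwest_Asian": 12.0}
iranian_example = {"Gedrosia": 28.4, "Caucasus": 34.9, "Southwest_Asian": 13.5}

deviation = component_deviation(kurd_example, iranian_example)
over_threshold = {c: d for c, d in deviation.items() if d > 1.0}

print(deviation)
print("components differing by more than 1%:", over_threshold)
```

Running the same function over the real population averages would list exactly which components break the 1% threshold between any two populations.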

Near-congruency between the Assyrians and Armenians is also striking, bar the variations on the North European, Caucasus and Southwest Asian components. It is again tempting to postulate the two descend for the most part from a similar root population with the aforementioned component differences accounting for the linguistic differences.

If one allocates the Kurds alongside the Iranians, several of the Autosomal components shown here have a distribution that appears to be determined by geography alone;

South Asian peaks in Tajiks, who are situated approximately NNW of the Indian Subcontinent.
Caucasus reaches a maximum in Armenians and adjacent populations.
Atlantic Med steadily decreases as one moves further away from Europe.
Southeast Asian has an inverse relationship to the above, peaking significantly only in the Uyghurs.

Other components appear to have more complicated distributions;

Interestingly, East Asian and Siberian are not too dissimilar in the populations containing them. The elevation of both in populations which speak Turkic/Altaic languages relative to neighbours speaking other languages confirms genetic input from the Turkic steppe nomads who expanded from the eastern side of Central Asia, eventually reaching the Iranian plateau and Anatolia. However, it is possible some of the Siberian and East Asian values may simply be the result of prehistoric demic diffusion across Eurasia (demonstrated by a potential gradient between Kurds/Iranians <-> Tajiks), although this may in itself be of medieval steppe ancestry.
Southwest Asian peaks in Assyrians, the only Semitic-speaking population shown in this analysis. This component falls rapidly beyond the Iranian plateau but is found at a background frequency east of Turkmenistan. Whether this is again an artefact of prehistoric demic movements or more recent migrations (e.g. Silk Road, various Persian empires) is debatable. As with the Siberian and East Asian components, there is an elevation which defies a geographical pattern and confirms historical accounts; the Tajiks, who descend in part from Persian speakers escaping Iran after the Sassanid collapse, show an elevation relative to the Uzbeks and Uyghurs. The greater frequency in Christian Armenians relative to the predominantly Muslim Kurdish territories and Iran argues strongly against the notion that it was introduced by the Islamic expansion out of the Arabian Peninsula.
The Gedrosia component has a bifurcated peak between Iranians and Tajiks, implying an ultimate peak in the region of Pakistan (corroborated by other Dodecad population results, such as the Balochis of Pakistan). However, the Gedrosian frequency drops from a stable 28% across West Iranic-speaking populations to 13-18% in Anatolian Turks, Armenians and Assyrians. It is again impossible to infer whether this is of prehistoric origins (i.e. mutual Neolithic phenomena between the Iranian plateau and South-Central Asia) or more recent (inflated Gedrosian values a function of Median, Persian and Parthian ancestry).
The North European component has what appears to be a dual geographic and linguistically-oriented distribution, which may be confounded further by recent interactions between Europe and some of the populations shown here (Anatolian Turks may potentially be the greatest example of this). It is interesting to note the Assyrians and Armenians show an inverse in the North European and Southwest Asian components despite otherwise appearing identical. The elevated frequency of this component in Central Asia will hopefully be covered in a future entry.

Despite the usefulness of ADMIXTURE in determining approximate ancestral origins of populations and individuals, it is impossible to ascertain the nature of component X between populations A and B; such Autosomal results should ideally be complementary to historical, linguistic, archaeological and even deep paternal and maternal evidence (Y-DNA, mtDNA).

Some of the observations made in this entry have been gleaned with earlier renditions of population data; through the use of deeper penetrating Autosomal techniques (such as IBD), the exact nature of the component variations should hopefully be resolved in the future.

Reference

The raw values used in this investigation are attributed to Dienekes Pontikos, author of the Dodecad Ancestry Project.

Of Buryats and Kalmyks: The R2a connection [Original Work]

2011-09-03T02:29:00.000-07:00

Introduction
The following is an investigation I conducted of the Y-DNA R2a-M124 Siberians found in Ancient links between Siberians and Native Americans revealed by subtyping the Y chromosome haplogroup Q1a;

"To investigate the structure of Y chromosome haplogroups R-M207 and Q-M242 in human populations of North Asia, we have performed high-resolution genotyping using both single nucleotide polymorphisms and short tandem repeat (STR)-based approaches of 121 M207- and M242-derived samples from 885 males of 16 ethnic groups of Siberia and East Asia. As a result, the following Y chromosome haplogroups were revealed: R1b1b1-M73 (2.0%), R1b1b2-M269 (0.7%), R2-M124 (1.1%)..."

Supplementary information can be found here.

The 10 R2a-M124 individuals were either Buryat (4/10) or Kalmyk (6/10), two Mongolic-speaking populations living in the Republic of Buryatia (South Siberia) and Republic of Kalmykia (northwest Caspian coast) respectively. It is worth noting, however, the Kalmyk sample was probably not from Kalmykia given the authors specified their participants were "...from 885 males of 16 ethnic groups of Siberia and East Asia". Thus, the results of the investigation (and the R2a-specific analysis here) may not be applicable to the Kalmyk majority of Kalmykia.

R2a haplotypes

All the R2a Siberians belonged to the same 12 STR haplotype apart from two Buryats, whose haplotype differed from the others only by a 1-step mutation at DYS389II (16 -> 17). Adjacent is a spreadsheet comparing the Buryat and Kalmyk haplotypes with raw data from the R2 FTDNA Project, which contains 62 participants who have tested to 67 markers. A legend can be found at the bottom of the attached comparison. Please also note that DYS389II in this instance is the sum of the DYS389I and DYS389II values found on the spreadsheet, as per the standard used by SMGF and FTDNA (DYS389II should've really been called DYS389B!).
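The comparison described above boils down to counting matching markers and summing step differences. The following is a minimal sketch, assuming each haplotype is held as a simple marker-to-value mapping; the function names are hypothetical, the comparison haplotype is invented for illustration, and the DYS389 helper assumes one source reports the second DYS389 value separately while the other reports it summed with DYS389I (which convention applies must be checked per dataset).

```python
def parse_allele(value):
    """Turn '14' into (14,) and a multi-copy value like '12-19' into (12, 19)."""
    return tuple(int(part) for part in str(value).split("-"))

def compare_haplotypes(hap_a, hap_b):
    """Return (matching markers, markers compared, total step difference)."""
    shared = [m for m in hap_a if m in hap_b]
    matches, steps = 0, 0
    for marker in shared:
        a, b = parse_allele(hap_a[marker]), parse_allele(hap_b[marker])
        if len(a) != len(b):
            raise ValueError(f"copy-number mismatch at {marker}")
        diff = sum(abs(x - y) for x, y in zip(a, b))
        steps += diff
        matches += diff == 0
    return matches, len(shared), steps

def dys389ii_as_sum(dys389i, dys389ii_separate):
    """Convert a separately reported DYS389II value into the summed (I+II) form."""
    return dys389i + dys389ii_separate

# Marker values as listed for the shared haplotype later in this entry.
# DYS385 is treated here as one marker with two copies.
shared_haplotype = {"DYS393": "14", "DYS390": "23", "DYS19": "14",
                    "DYS391": "10", "DYS385": "12-19", "DYS439": "10",
                    "DYS389i": "12", "DYS392": "10", "DYS437": "16",
                    "DYS438": "11"}

# Hypothetical comparison haplotype: one step away at DYS390.
other = dict(shared_haplotype, DYS390="24")

print(compare_haplotypes(shared_haplotype, other))  # (9, 10, 1)
```

Normalising DYS389 to a single convention before calling `compare_haplotypes` avoids a spurious multi-step "mutation" that is really just a reporting difference.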

Results

The paucity of 12/12 matches (or even 1/2-step) alone indicates these Siberian R2a's are divergent from the R2 project participants beyond the genealogical time frame. Therefore, we can already surmise none of the Jewish, Iranian, Indian, European or Near-Eastern R2a's in the project are recently related to them.
Deeper analysis of the Buryat and Kalmyk haplotypes through McGee's Y-Utility reveals the MRCA (Most Recent Common Ancestor) was roughly 900 years ago based on the single one-step mutation on DYS389II (Infinite allele mutation model, 30 years/gen, constant mutation rate of 0.0024). This date (~1100 A.D.) coincides with the rise of Mongolian steppe dominance and falls just short of Genghis Khan's reign. Based on the above, it is likely the haplotype differentiation happened in historical times and the common ancestor was a native of the region.
Comparing the Buryat and Kalmyk haplotypes with the R2 project participants with the Y-Utility again demonstrates their great divergence. The earliest match to both is a Syrian paternal ancestor (1530-2190 y.b.p.). All other matches are invariably between 2970-7530 y.b.p. with little geographical coherency. This is likely an artefact of the limited number of STR's.
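For readers who want a feel for how such dates arise, the following is a minimal sketch of a textbook infinite-alleles point estimate, using the parameters quoted above (12 markers, one mismatch, mutation rate 0.0024, 30 years per generation). It assumes that each marker still matches with probability exp(-2 * mu * t) after t generations; it is not McGee's Y-Utility, whose own calculation may differ, so it is not expected to reproduce the ~900-year figure. The function name is hypothetical.

```python
import math

def tmrca_infinite_alleles(n_markers, n_mismatches, mu, years_per_gen):
    """Point estimate of time to the most recent common ancestor.

    Assumes a marker still matches after t generations with probability
    exp(-2 * mu * t), then solves for t using the observed matching fraction.
    """
    if not 0 <= n_mismatches < n_markers:
        raise ValueError("mismatches must be >= 0 and fewer than the markers")
    matching_fraction = 1 - n_mismatches / n_markers
    generations = -math.log(matching_fraction) / (2 * mu)
    return generations, generations * years_per_gen

generations, years = tmrca_infinite_alleles(
    n_markers=12, n_mismatches=1, mu=0.0024, years_per_gen=30)
print(f"{generations:.1f} generations, roughly {years:.0f} years")
```

With so few markers the estimate is dominated by a single mutation event, so any figure of this kind carries a very wide margin of error.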

Summary
Although the number of STR's has limited the scope of this investigation, the Buryat and Kalmyk R2a haplotypes display a striking degree of exclusivity from other Eurasian R2a's and match each other well enough to conclude a recent mutual ancestor pre-dates the two and was likely a native of the region, probably around Genghis Khan's era. The twelve STR's alone have safely shown that R2a in Siberia is not of recent South Asian origins, indicating a greater antiquity in Siberia as well as Central Asia, which is presumably the source location.

Additional

Phylogenetic tree

Phylogenetic tree showing degree of relatedness between Buryat and Buryat-Kalmyk haplotypes relative to the R2a FTDNA Project participants on the 12 STR's used in this investigation (shown opposite). FITCH and FigTree used to generate. Special thanks to vineviz for elaborating on their application. Inferences should be made with plenty of caution given the low number of STR's, but it is interesting to see the Syrian match appear again.

Through this investigation I have inadvertently coined a "Mongolian" R2a Haplotype (i.e. mutual Buryat and Kalmyk) defined by the following STR's;

DYS393  DYS390  DYS19  DYS391  DYS385  DYS439  DYS389i  DYS392  DYS437  DYS438
14      23      14     10      12-19   10      12       10      16      11

In a study by Nasidze et al. on 99 Kalmyk men, the exact same haplotype shown above was observed. Although the earlier warning against extrapolating this investigation's results onto the Kalmyks living in Kalmykia was justified, it proves unnecessary here; the Republic of Kalmykia R2a haplotype is an exact match with the Mongolian one identified here. Therefore, at least some of the R2a found in Kalmykia is a direct import from Siberia rather than nearby sources, such as the Caucasus.

Through this independent investigation, I have demonstrated that the antiquity of R2a outside the Indian Subcontinent is very understated and haplogroups existing at background frequencies may have their own interesting stories to tell.