The Y-chromosome landscape of the Philippines: extensive heterogeneity and varying genetic affinities of Negrito and non-Negrito groups
Frederick Delfin1,2, Jazelyn M. Salvador1, Gayvelline C. Calacal1, Henry B. Perdigon1, Kristina A. Tabbada1, Lilian P. Villamor1, Saturnina C. Halos1, Ellen Gunnarsdóttir2, Sean Myles1,6, David A. Hughes2, Shuhua Xu3, Li Jin3, Oscar Lao4, Manfred Kayser4, Matthew E. Hurles5, Mark Stoneking2 and Maria Corazon A. De Ungria1*
1DNA Analysis Laboratory, Natural Sciences Research Institute, University of the Philippines, Diliman, 1101, Quezon City, Philippines;
2Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Deutscher Platz, D04103, Leipzig, Germany;
3Chinese Academy of Sciences and Max Planck Society Partner Institute for Computational Biology, Chinese Academy of Sciences, 320 Yue Yang Road, Shanghai, 200031 China;
4Department of Forensic Molecular Biology, Erasmus University Medical Center Rotterdam, Dr. Molewaterplein 50, 3000 CA Rotterdam, The Netherlands;
5The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, United Kingdom;
6Current affiliation: Institute for Genomic Diversity, Cornell University, Ithaca, NY 14853-2703 USA.
*Corresponding author: Maria Corazon A. De Ungria, PhD
DNA Analysis Laboratory, Natural Sciences Research Institute, University of the Philippines, Diliman, 1101, Quezon City, Philippines. Telefax: (63-2) 925-2965, E-mail: firstname.lastname@example.org
SUPPLEMENTARY TEXT: MATERIALS AND METHODS
POPULATION SAMPLE HISTORY
The 390 samples included in this study comprise two sets of population sample collections. One set, initially composed of 1032 samples (607 males, 425 females) representing 15 Filipino language groups (excluding Surigaonon) (Figure 1) was the result of population sampling efforts by the University of the Philippines, Natural Sciences Research Institute DNA Analysis Laboratory (UP-NSRI-DAL) from 1997–2004. Biological samples in this collection consisted of whole blood blotted on FTA™ cards (FTA™ Gene Guard system, Whatman Inc., Springfield Mill, Maidstone, Kent, UK), buccal cells collected by swab and transferred on FTA™ cards and plain buccal swabs. The FTA™ Gene Guard system (Whatman Inc., Springfield Mill, Maidstone, Kent, UK) was used to extract DNA from whole blood blotted on FTA™ paper and from saliva-buccal cells swabbed on FTA™ paper.1 A phenol-chloroform method2 and/or the QIAamp® DNA Blood Mini kit (QIAGEN Inc., Valencia, CA, USA) were used to extract DNA from buccal swab samples. DNA samples were stored in the UP-NSRI-DAL sample repository. Given the length of time in storage, DNA samples from some groups (Ivatan, Bugkalot, Kalangoya, CAR and Maranao) required enrichment and were therefore subjected to whole genome amplification (WGA) using the GenomiPhi™ DNA amplification kit (GE Healthcare Bio-Sciences AB, Uppsala, Sweden) kit following manufacturer’s instructions. Male DNA samples in this collection were typed for Y-chromosome binary markers (Y-SNPs) at the Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, U.K. following a protocol adopted from the work of Paracchini, Hurles and colleagues3,4 with modifications (Supplementary information: Figure 1, Table 1, Text – DNA typing) using the ABI Prism® SNaPshot™ multiplex kit (Applied Biosystems, Foster, CA, USA) with amplicons detected using capillary electrophoresis (CE) on an ABI Prism® 3100 Genetic Analyzer (Applied Biosystems, Foster, CA, USA) following manufacturer's instructions.
Another set of 142 samples (86 males, 56 females) consisted of saliva samples from the Mamanwa, Manobo and Surigaonon groups (Figure 1) collected by the Max Planck Institute for Evolutionary Anthropology (MPI-EVA), Leipzig, Germany in 2005. Saliva samples were processed at MPI-EVA, using a high-salt, DNA extraction method.5 Male DNA samples in this collection were typed for Y-SNPs at the MPI-EVA, using an in-house validated, single-base extension (SBE) assay (Supplementary information: Figure 1, Table 1, Text – DNA typing) with amplicons detected using Matrix-Assisted Laser Desorption Ionization Time-of-Flight (MALDI-TOF) Mass Spectrometry (MS) as described elsewhere.6
Male DNA samples of both the UP-NSRI-DAL and the MPI-EVA were typed for Y-chromosome microsatellite (Y-STR) markers at the UP-NSRI-DAL, Philippines using the PowerPlex® Y system (Promega Corporation, Madison, WI, USA) with amplicons detected on an ABI Prism® 310 Genetic Analyzer (Applied Biosystems, Foster, CA, USA), all following the manufacturers’ instructions.
Given that the Mamanwa and Manobo groups were independently sampled by the UP-NSRI-DAL and the MPI-EVA from the same Philippine region (Figure 1), sample overlap and/or relatedness were checked using genealogical information. All samples were also typed for autosomal microsatellite (A-STR) markers using the PowerPlex® 16 system (Promega Corporation, Madison, WI, USA) with amplicons detected on an ABI Prism® 310 Genetic Analyzer (Applied Biosystems, Foster, CA, USA), all following manufacturers’ instructions; and as such, Mamanwa and Manobo A-STR genotypes were also used to check for sample overlap and/or relatedness.
After excluding related and/or overlapping Mamanwa and Manobo samples; as well as DNA samples that did not produce reliable results after Y-SNP and Y-STR typing, the UP-NSRI-DAL male sample set (n=607) and the MPI-EVA male sample set (n=86) were reduced to 317 and 73, respectively; resulting in 390 samples that were included in this study (Figure 1).
POPULATION SAMPLING AND SAMPLE DETAILS
Population field sampling was conducted within enclaves (sitios) and/or districts (barangays) found within municipalities or cities across the Philippines. Before the collection of samples, interviews were conducted to compile the family history of each potential participant. Samples were generally collected from unrelated volunteers. However in some populations with small population sizes, most individuals were related; hence volunteers not related within the second degree of consanguinity were considered. A total of 1174 (1032: UP-NSRI-DAL; 142: MPI-EVA) population samples were collected. The following details of population sample collection enumerate the number of collected samples, sample type, collection sites and various personalities, institutions and organizations who assisted in sample collection from 1997 to 2005.
The CAR (Cordillera Administrative Region) group in this study is composed of three language groups namely: the Ibaloi (buccal swabs from 19 males and 32 females) from Benguet and Nueva Vizcaya provinces; the Ifugao (buccal swabs from 28 males and 26 females) from Banaue municipality in Ifugao province, Nagtipunan municipality in Quirino province and Kayapa municipality of the Nueva Vizcaya province; and Kankanaey (buccal swabs from 13 males and 50 females) from Mankayan and La Trinidad, Benguet province. Samples were collected during several trips to the CAR region in 1997 and 2002. After DNA typing, only two Ibaloi, one Ifugao and six Kankanaey samples gave reliable results. Due to close geographical proximity and cultural relatedness of the Ibaloi, Ifugao and Kankanaey language groups, nine samples were pooled as one sample set and labeled as the CAR group. Dr. Michael Paul Tan and Maricar Posa were involved in the collection of samples in 1997. Armando Bogite and members of the New Tribes Mission facilitated the collection of samples in 2002.
Aeta of Zambales samples (19 male and 19 female whole blood samples) were collected in January 1998 from volunteers who reside in the municipality of Castillejos, located north of the Subic Bay Metropolitan Authority (SBMA), in the province of Zambales. The collection was facilitated by Dr. Edith Tria of the Department of Health and Dr. Leo Uy.
Bugkalot samples (31 male and 21 female saliva samples) were collected from volunteers residing in different municipalities of Nueva Vizcaya and Quirino provinces in two separate trips in January and February, 2002, respectively. The collection of samples was facilitated by Rey and Lea de la Rosa, Pastor Angel de la Cruz and Jessie Magallanes, all from the New Tribes Mission and Armando Argayosa. Ramon Pagsiguian accompanied the sampling team to the community while Liza Faustino and Laarni Carpina assisted in the collection of samples.
Ivatan samples (162 male and 104 female buccal swabs) were collected from volunteers from the main islands of Batanes province namely Batan, Itbayat and Sabtang in March 2002 through the assistance of Leticia Bayaras (Provincial Health Office), Hilda Dacles (Municipal Health Office) and Patricia Gallo of the National Commission on Indigenous Peoples (NCIP) with endorsement from Governor Vicente Gato and facilitated by Dr. Victor Paz of the UP Archaeological Studies Program. Laarni Carpina assisted in the collection and handling of samples.
Kalangoya samples (45 male and 15 female buccal swabs) were collected from volunteers from Bambang and Kayapa municipalities of Nueva Vizcaya province in March 2002 with the assistance of Satur Banih and with support from Mayor Tony Dupiano, Jimmy Tindaan and Steve Baccar.
Maranao samples (28 male and 21 female buccal swabs) were collected in April 2002 from volunteers in two campuses of Mindanao State University (MSU), namely MSU-Iligan Institute of Technology (MSU-IIT) in Lanao del Norte province and MSU-Marawi, Marawi City, Lanao del Sur province. Permission to collect on campus was obtained from Dr. Olga Nuneza of MSU-IIT. Irene Estrada Macarambon- Benito of MSU-IIT facilitated volunteer participation in the study.
Aeta of Bataan samples (27 male and 14 female samples of saliva blotted on FTA™ paper) were collected in April 2003 from volunteers residing in the municipalities of Morong, Hermosa and Orion in the province of Bataan and in the area governed by the SBMA in the Zambales province through the assistance of Edmond de Jesus and with the endorsement of Amethya Concepcion of the SBMA, Dr. Roberto Pagulayan of the UP Institute of Biology, Tribal Chief Bonifacio Florentino and Tribal Chief Josefina Alejo. Joseph Salonga accompanied the sampling team to the communities. The Aeta of Bataan is differentiated from the Aeta of Zambales because these groups use different languages.7
Filipino groups residing on the island of Mindoro are collectively called Mangyan. Samples of saliva blotted on FTA™ paper were collected from four Mangyan groups namely the Hanunuo (15 male and 3 female samples), Iraya (16 male and 11 female samples), Tadyawan (14 male and 1 female samples) and Tawbuid (14 male and 1 female samples) in April 2003 during a gathering organized by the Mangyan Church Tribal Association (MCTA) in Baco municipality and Calapan city of Oriental Mindoro province. Pastor Andino Layda, Efren Aceveda, Peter Mayot and Pastor Diokno Onday of the MCTA and Tribal leader Fundador Fuentes assisted in obtaining the free and prior informed consent (FPIC) from volunteers. John Richards, Lore Jean de Guito and Dothy Smith of the Overseas Mission Fellowship assisted in introducing the sampling team to the community.
Ati samples (36 male and 26 female samples of saliva blotted on FTA™ paper) were collected from volunteers from different districts of the provinces of Antique, Aklan, Capiz and Iloilo, all on the island of Panay as well as from the adjacent islands of Boracay and Guimaras in January 2004. Sampling was coordinated with Pastor Emeterio Allianza, Pastor Jessie Elosendo, Pastor Enoc Valencia, Chief Salvador Escuña, Chief Gregorio Elosendo, Chief Elias Valencia and Chief Enrique Martinez. Alejandro Condez accompanied the sampling team around Panay and Guimaras islands whereas Joana Guarin of the Aklan State University helped in the collection of samples from residents on the island of Boracay.
Agta samples (44 male and 23 female samples of saliva blotted on FTA™ paper) were collected from volunteers residing in Iriga City and Buhi municipality, both within the province of Camarines Sur in March 2004. Contact with the communities was made with the assistance of Julio Versoza of the Mt Isarog Protected Area Office, Dennis Barroga and Belen Jacob of the National Commission on Indigenous People (NCIP). Our request to go to the communities was approved by Director Lee Arroyo and Atty. Corazon Crescini of the NCIP.
Mamanwa samples (26 male and 21 female samples of saliva blotted on FTA™ paper) were collected from volunteers from different municipalities of the province of Surigao del Norte in April 2004, through the assistance of Leonita Gorgolon, Audie Reliquette and Angelita Bullo, from the Provincial Health Office and Pastor Bernard Yap of the Christ Faith Fellowship Church.
Manobo samples (70 male and 37 female samples of saliva blotted on FTA™ paper) were collected from volunteers from different municipalities of the province of Agusan del Sur with the assistance of Mrs. Lily Labadan and Maria Marley Daday in April 2004.
In August 2005, samples of saliva in lysis buffer were collected from Mamanwa (38 male and 20 female samples); Manobo (11 male and 36 female samples) and Surigaonon (37 male samples) groups. Mamanwa and Surigaonon volunteers were from different municipalities of Surigao del Norte province and Manobo volunteers were from different municipalities of Agusan del Norte province. Population sampling was facilitated by Mr. Fernando A. Almeda Junior, Dr. Irinetta C. Montinola, Dr. Wilfredo Sinco (all from the Surigaonon Heritage Center); Ms. Girlie Patagan (NCIP); Mrs. Elizabeth S. Larase and Ms.Juliet P. Erazo (Office of Non-Formal Education) and the Rotary Club of Surigao.
Multiplexes I.1 to III.19 (Supplementary Information: Figure 1 and Table 1)
Specific-PCR Multiplex reactions were performed in 10 microliter (µl) volumes each containing 1X AmpliTaq Gold Buffer II (Applied Biosystems, Foster, CA, USA), 4 millimolar (mM) Magnesium Chloride (MgCl2), 0.4 mM deoxyribonucleotide triphosphate mix (dNTP mix), 0.08–0.24 µM primer mix (Supplementary Table 1), 0.5 Unit AmpliTaq Gold® enzyme (Applied Biosystems, Foster, CA, USA) and 2 µl or 1 FTA™ punch for DNA template; with the following thermocycling parameters: 94oC for 9 minutes (min); 15 cycles of 94oC for 30 seconds (sec), 59oC for 30 sec, 72oC for 60 sec and a final 72oC for 3 min.
With the incorporation of universal tags or “ZIP code” sequences (Supplementary Table 1)3in the specific-PCR products, even concentration of specific-PCR products in the multiplex was obtained through a second amplification using high concentrations of “ZIP code” primers. These reactions were performed in 20 µl volumes each containing 1X AmpliTaq Gold Buffer II (Applied Biosystems, Foster, CA, USA), 4 mM MgCl2, 1 µM each of ZIP code primer A and primer B (Supplementary Table 1), 0.5 Unit AmpliTaq Gold® enzyme (Applied Biosystems, Foster, CA, USA) and 10 µl specific PCR product; with the following thermocycling parameters: 94oC for 9 min; 30–34 cycles of 94oC for 30 sec, 55oC for 30 sec, 72oC for 60 sec and a final 72oC for 3 min.
PCR products (specific PCR-ZIP reaction product) were cleaned in 12 µl-volume reactions each containing 2 Units of shrimp alkaline phosphatase (SAP) enzyme, 1.5 Units of Exonuclease I (Exo I) enzyme and 10 µl PCR product with the following incubation parameters: 37oC for 1 hour (hr) and 80oC for 20 min.
Single-base extension (SBE) reactions were performed in 5 µl volumes using the ABI Prism® SNaPshot™ multiplex kit (Applied Biosystems, Foster, CA, USA) following manufacturer's instructions. Each SBE reaction contained 1X SNaPshot™ reaction mix, 0.2–0.6 µM extension primer mix (Supplementary Table 1) and 0.5 µl of cleaned PCR product with the following thermocycling parameters: 25 cycles of 96oC for 10 sec, 50oC for 5 sec and 60oC for 30 sec. SBE products were detected by CE on an ABI Prism® 3100 Genetic Analyzer (Applied Biosystems, Foster, CA, USA) following manufacturer's instructions.
Cluster I and II Multiplexes (Supplementary Information: Figure 1 and Table 1)
Cluster I specific-PCR multiplex reactions were performed in 25 µl volumes each containing 1X PCR Buffer, 4 mM MgCl2, 0.5 mM dNTP mix, 0.015–0.23 µM each of forward primer mix and reverse primer mix (Table S1), 0.5 Unit AmpliTaq Gold® enzyme (Applied Biosystems, Foster, CA, USA) and 4 µl DNA template; with the following thermocycling parameters: 95oC for 15 min; 36 cycles of 95oC for 20 sec, 59oC for 30 sec, 72oC for 60 sec and a final 72oC for 3 min.
Specific-PCR conditions for Cluster II were similar to Cluster I except that 0.12 µM of forward primer mix and reverse primer mix were used with the following thermocycling parameters: 95oC for 15 min; 38 cycles of 95oC for 20 sec, 64oC for 30 sec, 72oC for 60 sec and a final 72oC for 3 min.
PCR products were cleaned in 10 µl-volume reactions each containing 0.32 Unit of SAP enzyme, 0.2 Unit of Exo I enzyme and 8 µl PCR product with the following incubation parameters: 37oC for 1 hr and 80oC for 20 min. PCR products were subjected to a SBE assay followed by SBE product detection by MALDI-TOF MS as described previously.6
The Reference data set
The reference data set was assembled from previously published works8-12, composed of 1,756 males from 60 groups representing five Asia-Pacific population groups (Figure 1). These population groups, the populations comprising each group and their respective population codes are the following: East Asia composed of Korea (KOR, n = 21), China (CHI, n = 36), Han Chinese from Taiwan (TCH, n = 19), Ami (AMI, n = 9), Ata (ATA, n = 10), Bunum (BUN, n = 10), Paiwan (PAI, n = 12) and Vietnam (VTN, n = 6); Southeast Asia composed of Hiri (MO1, n = 20), Ternate (MO2, n = 13), South Borneo (SBO, n = 40), Sumatra (SUM, n = 55), Malaysia (MAL, n = 17) Java (JAV, n = 53), Flores (FLO, n = 73), Adonara (ADR, n = 96), Alor (ALR, n = 34), East Timor (ETR, n = 48), Roti (ROT, n = 11), Lembata (LMB, n = 31), Pantar (PNT, n = 38) and Solor (SLR, n = 43); Melanesia composed of Ketengban (KET, n = 19), Una (UNA, n = 46), Yali (YAL, n = 5), Dani (DAN, n = 12), Lani (LAN, n = 12), Citak (CIT, n = 28), Kombai (KOM, n = 2), Awyu (AWY, n = 10), Asmat (ASM, n = 20), Mappi (MAP, n = 10), Korowai (KRW, n = 11), Muyu (MYU, n = 8), Papua NewGuinea (PNG) highlands (PHL, n = 31), PNG north coast (NCo, n = 16), PNG south coast (SCo, n = 17), Trobriand (TRO, n = 52), Bereina (BRN, n = 35), Kapuna (KAP, n = 46), Tolai New Britain (TOL, n = 19), Seimat–Wuvulu (S/W, n = 11), Andra–Hus (A/H, n = 20), Kurti (Kur, n = 18), Lele (Lel, n = 24), Mokerang (Mok, n = 5), Nyindrou (Nyi, n = 17), Ere–Kele (E/K, n = 14), Nali (Nal, n = 18) and Titan (Tit, n = 21); Fiji (FIJ, n = 101); Polynesia composed of Cook Islands (COK, n = 66), Futuna (FUT, n = 50), Niue (NIE, n = 8), Tokelau (TOK, n = 6), Tonga (TON, n = 28), Tuvalu (TUV, n = 100), West Samoa (WES, n = 60); and Australia composed of Arnhem Land (AS1, n = 60) and Great Sandy Desert (AS2, n = 35). Further grouping of Melanesian populations into the Admiralty Island (ADM: S/W, A/H, Kur, Lel, Mok, Nyi, E/K and Tit); East New Guinea (ENG: PHL, NCo, SCo, TRO, BRN, and TOL) and West New Guinea (WNG: KET, DAN, CIT, KOM, ASM, MAP and KRW) was done to have suitable sample sizes (samples that share haplogroups with FE groups) for Correspondence analysis (CA, Figure 4).
Reconciling the Filipino data with the reference data set
The reference data set used the haplogroup name C-RPS4Y, while the Filipino data used the haplogroup name C-M130 (Supplementary Information: Figure 1, Tables 1 and 2); however, both these names refer to the same NRY binary marker.13,14 The reference data set was typed for haplogroup O-M11011,12 and the Filipino data set for O-M50 (Supplementary Information: Figure 1, Tables 1 and 2), which define the same haplogroup lineage.14 Hence to avoid confusion in haplogroup names and to facilitate comparison of haplogroups between Filipino and reference data set, names consistent with the reference data (C-RPS4Y and O-M110) were used. The reference data set was typed for O-M324, a subhaplogroup of O-M122, while the Filipino data set was not. For compatibility in the analyses, O-M324 data was pooled with O-M122 in the reference data set.
For compatibility in the analysis of Y-STR haplotypes between Filipino data and the reference data set, seven overlapping Y-STR loci (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393) were used in all analyses. The DYS389 locus produces two fragments that overlap due to a duplicated priming site for the forward primer.15 The DYS389 fragments are distinguishable by size; DYS389I being shorter and DYS389II being the longer fragment which includes the smaller DYS389I fragment. In practice, allele calls for DYS389II have been done in two ways; one in which the allele size of DYS389I is included (DYS389II+I) and one in which the allele size of DYS389I is subtracted (DYS389II-I). This practice differs across laboratory typing systems (in-house developed or commercially purchased kits). The Filipino DYS389 data was generated using the PowerPlex® Y system (Promega Corporation, Madison, WI, USA) which yields DYS389II+I data, while the reference data set considers DYS389II-I. Hence for compatibility, the Filipino DYS389I allele size was subtracted from DYS389II to give the allele size for DYS389II, after which three repeats were subtracted from DYS389I (this is particular to the in-house typing system used for the reference data set) to give the compatible DYS389I allele size. For example, original Filipino data: DYS389I-13, DYS389II-27; reference data set-comparable data: DYS389I-10, DYS389II-14.
Network analyses16 were performed using Network version 4.510 and Network Publisher version 184.108.40.206 (http://fluxus-engineering.com). A network weighting scheme17 based on Y-STR locus-specific mutation rates was used. In the following order of Y-STR loci, DYS19:DYS389I:DYS389II-I:DYS390:DYS391:DYS392:DYS393, the locus specific weights used were 4:4:3:3:2:12:10.17 As applied to the reference data set10, initial analyses was performed using the Median-Joining (MJ) algorithm.16 However this generated complex networks that were difficult to visualize and interpret, hence network reduction schemes were applied. To reduce network complexity, Y-STR data was put thru the Reduced-Median (RM) algorithm.18 The RM output was then put thru the MJ algorithm. The RM-MJ output was then subjected to post processing using the Maximum Parsimony (MP) algorithm19, selecting the option “single, arbitrary near-optimal tree within network”. The RM and MJ algorithms generate various non-parsimonious links. The MP algorithm eliminates these links, simplifying the network; however, it should be noted that MP calculations generate a single network that is just one of many equally parsimonious networks. As performed in this study, the RM-MJ-MP network was compared to the intial MJ network to ensure that the simplified network is representative of the complex network.
Estimation of haplogroup coalescent times (Time since the Most Recent Common Ancestor-TMRCA)
The BATWING program20 (http://www.mas.ncl.ac.uk/~nijw/) was used to estimate TMRCA for each Y-SNP haplogroup using both Y-SNP and Y-STR data. Y-SNP data were used as Unique Event Polymorphisms (UEP sites) to constrain the gene genealogy model and Y-STR data were used under a Step-wise Mutation Model (SMM). Y-STR mutation rates were modeled using a gamma distribution with parameters alpha and beta [gamma (α, β)]. These parameters for DYS19 (5,2763), DYS389I (5,2192), DYS389II-I (6,2192), DYS390 (12,2233), DYS391 (10,2182), DYS392 (1,2182) and DYS393 (1,2182) were previously used for the reference data set.10 Population structure was modeled to be a substructured population (14 Filipino groups, with sample sizes > 10). Population size was modeled to be of an initial constant size followed by exponential growth. Three independent Markov Chain Monte Carlo (MCMC) runs were performed. Each MCMC run had a different random number seed, a total of 108 MCMC chains and a 10% burn-in period. Built-in BATWING functions were used to evaluate the MCMC run. The 95% posterior density was computed and TMRCA point estimates (number of generations) were converted to time in years using a generation time of 30 years per generation. A generation time of 30 years per generation was previously used for the reference data set10, but apart from ensuring comparable TMRCA estimates, a 30-year generation time for males (father-son intervals) was found to be appropriate for recent agricultural societies across the world.21
Estimation of divergence time and migration rates
As there seem to be signals of genetic links between several FEN groups (Aeta-Bataan, Aeta-Zambales and Agta) and indigenous Australians (Arnhem Land and Great Sandy Desert) (Figure 3: C-RPS4Y and K-M9; Figure 4); divergence times and migration rates between these groups were estimated using haplotype data (7 Y-STR loci) through pairwise, simulation-based analyses using the IM program22 (http://genfaculty.rutgers.edu/hey/software).
All IM program information was available in the program documentation (Introduction to the IM and IMa computer programs - March 5, 2007 and IM Documentation - March 5, 2007) available at the program website. IM program author Jody Hey and the IM Google group (http://groups.google.com/group/Isolation-with-Migration) provided valuable support in that most of the questions and concerns that an IM program user would have, are or may have already been addressed and posted at the group site.
One pair of populations (Australia, FEN group) was used for initial program testing and to search for the appropriate IM run parameters. To test whether the IM program ran properly, initial parameter settings were adopted from the IM Documentation (http://genfaculty.rutgers.edu/hey/software#IMa2): Step-wise Mutation Model (SMM) for STR data; population mutation parameter (q1 and q2) = 10; migration rates (m1 and m2) = 10; divergence time (t) = 10. Retaining the initial parameter settings, the number of MCMC iterations required to produce reasonable results was evaluated by using varying number of MCMC iterations (105 – 107) with 10% burn-in periods. Reasonable results refer to unimodal posterior distributions for the parameters being estimated, trend plots (no trend), autocorrelation values (low values) and effective sample size (ESS >50 for the t parameter23) values that indicate MCMC convergence. Given the t parameter shows the slowest rate of mixing, a minimum ESS value >50 for this parameter is acceptable.23 Using STR data requires the incorporation of multiple MCMC coupled chains (Metropolis coupling) which subsequently require chain heating schemes (IM Documentation). The following IM run parameters, previously addressed and posted at the IM Google group site were therefore incorporated into the analyses: 30 coupled chains (Metropolis coupling) with geometric heating scheme with 0.95 and 0.9 for the first (-g1) and second (-g2) heating parameters; respectively. Using the parameters already enumerated, credible intervals for the t parameter were very wide (wider than the reported values in Table 4). To further refine analyses, the t parameter was reduced to t = 5 and the population split parameter (s) which accounts for changes in population size, was incorporated. A final test for the appropriate length of MCMC run was performed using 3x107, 6x107 and 9x107 iterations, each with 10% burn-in periods. No change in results was observed beyond 6x107 MCMC iterations.
Identifying an appropriate set of IM run parameters, population pairwise comparisons were then performed with the following parameters: mutation model = SMM ; q1 and q2 = 10; m1 and m2 = 10; t = 5; s = 0.2; 30 coupled chains with geometric heating scheme (g1 = 0.95; g2 = 0.9); generation time = 30 years; burn-in = 6x106; total run = 6x107 iterations. The use of a 30-year generation period has been discussed in Network analyses. For each pairwise population comparison, three independent IM runs with the same parameter settings, but different random number seeds, were performed. Convergence on the stationary distribution was considered to be reached when each run had a minimum ESS of >50 for the t parameter and when the independent runs generated similar distributions, as recommended previously.23 The peak of the distribution (mode) of the estimated parameters (i.e. t, m1 and m2) has the highest probability, similar to a maximum likelihood estimate, and was therefore considered the actual estimate of the parameter.
This work was supported by: the Natural Sciences Research Institute, University of the Philippines (NSR-97-2-04, NSR-00-1-03 and NSR-03-1-01); the Philippine Council for Advanced Science and Technology Research and Development, Department of Science and Technology; the Academy of Science in the Developing World, Trieste, Italy (RGA: 02-117 RG/BIO/AS); the Wellcome Trust (077014/Z/05/Z), Cambridge, UK; the Chinese Academy of Sciences and Max Planck Society Partner Institute for Computational Biology, Shanghai China; the National Outstanding Youth Science Foundation of China (30625016); National Science Foundation of China (30890034 and 30971577), and 863 Program (2007AA02Z312); the Max Planck Society, Germany; and the European Commission (B7-7070/T-2000/005). Statements made herein do not reflect the views of the European Commission. M.C.A. De Ungria was a recipient of a Royal Society of London visiting fellowship. L. Jin was also supported by the Shanghai Leading Academic Discipline Project (B111) and the Center for Evolutionary Biology. S. Xu was also supported by the Science and Technology Commission of Shanghai Municipality (09ZR1436400), the Knowledge Innovation Program of Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences (2008KIP311), SA-SIBS Scholarship Program and the K.C.Wong Education Foundation, Hong Kong. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
1. Salvador JM, De Ungria MCA: Isolation of DNA from saliva of betel quid chewers using treated cards. J Forensic Sci 2003; 48: 794-797.
3. Paracchini S, Arredi B, Chalk R, Tyler-Smith C: Hierarchical high-throughput SNP genotyping of the human Y-chromosome using MALDI-TOF mass spectrometry. Nucleic Acids Res 2002; 30: e27 21-26.
4. Hurles ME, Sykes BC, Jobling MA, Forster P: The dual origin of the Malagasy in island Southeast Asia and East Africa: Evidence from maternal and paternal lineages. Am J Hum Genet 2005; 76: 894-901.
5. Quinque D, Kittler R, Kayser M, Stoneking M, Nasidze I: Evaluation of saliva as a source of human DNA for population and association studies. Anal Biochem 2006; 353: 272-277.
6. Hughes DA, Tang K, Strotmann R et al: Parallel Selection on TRPV6 in Human Populations. . PLoS ONE 2008; 3: e1686 1681-1613.
7. Gordon RG, Jr. (ed): Ethnologue: Languages of the World, Fifteenth edition. Dallas, Tex.: SIL International. Online version: http://www.ethnologue.com/. 2005.
8. Kayser M, Brauer S, Weiss G, Schiefenhövel W, Underhill PA, Stoneking M: Independent Histories of Human Y Chromosomes from Melanesia and Australia. Am J Hum Genet 2001; 68: 173-190.
9. Kayser M, Brauer S, Weiss G et al: Reduced Y-chromosome, but not mitochondrial DNA, diversity in human populations from West New Guinea. Am J Hum Genet 2003; 72: 281–302.
10. Kayser M, Brauer S, Cordaux R et al: Melanesian and Asian Origins of Polynesians: mtDNA and Y Chromosome Gradients Across the Pacific. Mol Biol Evol 2006; 23: 2234-2244.
11. Kayser M, Choi Y, van Oven et al: The Impact of the Austronesian Expansion: Evidence from mtDNA and Y Chromosome Diversity in the Admiralty Islands of Melanesia. Mol Biol Evol 2008; 25: 1362-1374.
12. Mona S, Grunz KE, Brauer S et al: Genetic Admixture History of Eastern Indonesia as Revealed by Y-Chromosome and Mitochondrial DNA Analysis. Mol Biol Evol 2009; 26: 1865-1877.
13. Underhill PA, Shen P, Lin AA et al: Y chromosome sequence variation and the history of human populations. Nat Genet 2000; 26: 358-361.
14. Karafet TM, Mendez FL, Meilerman MB, Underhill PA, Zegura SL, Hammer MF: New binary polymorphisms reshape and increase resolution of the human Y chromosomal haplogroup tree. Genome Res 2008; 18: 830-838.
15. Kayser M, Caglià A, Corach D et al: Evaluation of Y-chromosomal STRs: a multicenter study. Int J Legal Med 1997; 110: 125–133.
17. Mona S, Tommaseo-Ponzetta M, Brauer S, Sudoyo H, Marzuki S, Kayser M: Patterns of Y-Chromosome Diversity Intersect with the Trans-New Guinea Hypothesis. Mol Biol Evol 2007; 24: 2546-2555.
18. Bandelt H-J, Forster P, Sykes BC, Richards: MB: Mitochondrial portraits of human populations using Median Networks. Genetics 1995; 141: 743-753.
19. Polzina T, Daneshmand SV: On Steiner trees and minimum spanning trees in hypergraphs. Oper Res Lett 2003; 31: 12 – 20.
20. Wilson IJ, Weale ME, Balding DJ: Inferences from DNA data: population histories, evolutionary processes and forensic match probabilities. J R Stat Soc Ser A 2003; 166: 155-201.
21. Matsumura S, Forster P: Generation time and effective population size in Polar Eskimos. Proc R Soc B 2008; 275: 1501–1508.
22. Hey J, Nielsen R: Multilocus methods for estimating population sizes, migration rates and divergence time, with applications to the divergence of Drosophila pseudoobscura and D. perimilis. Genetics 2004; 167: 747-760.
23. Hey J: On the Number of New World Founders: A Population Genetic Portrait of the Peopling of the Americas. PLoS Biol 2005; 3: e193 0965-0975.