Abstract
Anopheles culicifacies is the major vector of malaria in Sri Lanka and the Indian subcontinent which is characterized as a species complex with five sibling species provisionally designated as A, B, C, D and E. The current study was carried out to understand the phylogenetic and phylogeographic relationships between the sibling species of the species complex while observing their genetic diversity and genetic differentiation. Thirty-five ITS2 and seventy-seven COI sequences of An. culicifacies species complex reported from different geographical locations of Asia and China at the NCBI public database were used for the analysis. Bayesian likelihood trees were generated for the phylogenetic analysis. The divergence of the species complex was obtained from the Bayesian phylogeographic model in BEAST. There were two clades of the sibling species of An. culicifacies species complex as A, D and B, C and E in both phylogenetic and phylogeographic analysis using ITS2 sequences. Based on the highly divergent COI sequences and the high mutation rate of the mitochondrial genome, there were four and three clades in both phylogenetic and phylogeographic analysis using COI sequences. The diversification of An. culicifacies species complex was obtained as ranging from 20.25 to 24.12 Mya and 22.37 to 26.22 Mya based on ITS2 and COI phylogeographic analysis respectively. There was a recent diversification of the sibling species A and D than the sibling species B, C and E. Low haplotype diversity was observed in the sequences reported from Sri Lanka in both ITS2 and COI analysis that can be due to bottlenecks resulting from the intense malaria control efforts. A high genetic differentiation was achieved for some populations due to the large geographical distance. The high genetic diversity based on the five sibling species implies the possibility of maintaining a relatively high effective population size despite the vector control efforts.
Citation: Rathnayake RAS, Wedage WMM, Muthukumarana LS, De Silva BGDNK (2023) Genetic diversity, phylogenetic and phylogeographic analysis of Anopheles culicifacies species complex using ITS2 and COI sequences. PLoS ONE 18(8):
e0290178.
https://doi.org/10.1371/journal.pone.0290178
Editor: Maria Stefania Latrofa, University of Bari, ITALY
Received: March 22, 2023; Accepted: August 2, 2023; Published: August 16, 2023
Copyright: © 2023 Rathnayake et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Anopheles culicifacies Giles 1901 is the major vector of malaria in Sri Lanka [1, 2]. It belongs to the order Diptera, sub-order Nematocera, family Culicidae, genus Anopheles, subgenus Cellia and series Myzomyia [3]. Anopheles culicifacies has a wide distribution that extends from Ethiopia, Iran, Yemen westward to Cambodia, Vietnam, Laos, China eastward whereas the Indian subcontinent towards the northern part, Sri Lanka, and southern part of Thailand towards the southern part [4, 5]. In 1977, An. culicifacies sensu lato (s.l.) was colonized initially by the National Institute of Malaria Research (previously Malaria Research Centre) in the laboratory [3]. It has been characterized as a species complex with five sibling species provisionally designated as A, B, C, D and E [2, 3] by using cytogenetic methods. The sibling species of the complex are morphologically similar and reproductively isolated. The only sibling species B and E are reported from Sri Lanka while country where all the members of the species complex are found is in India [2, 6].
According to the polytene chromosome diagnostic banding pattern similarities between the sibling species, A and D are identified as closely related to each other than the sibling species B, C and E. However, the sibling species B, C and E are known to be closely related to each other [7]. The sequence differences of 28S-D3 rDNA domains and ITS2 rDNA region also supported the classification of the species complex into two groups A, D and B, C, E [4, 8, 9]. Vectorial capacity, differences in biting and resting behavior, resistance to insecticides, and susceptibility to malaria parasites of each sibling species vary [10]. Species A, B, C and D are mainly zoophilic and show low anthropophilic index (1–4%) than species E (80%) [11]. However, species A and E are considered as main vectors in the Indian subcontinent than species C and D whereas species B is considered as poor or non-vector [12]. It has been recorded so far that only sibling species B and E are present in Sri Lanka where species E plays the role as major vector [6].
Based on the limited sequence data available at that time it was proposed that the sibling species A and B evolved from a common ancestor. The sibling species D is considered as diverged from sibling species A and the sibling species C from sibling species B in a later period. The origin and distribution of sibling species E persist unclear [9].
The evolutionary potential and history of species are reflected from the genetic diversity and phylogenetic diversity. A higher genetic diversity implies a higher evolutionary potential and a vast ability to respond to environmental changes [13]. In depth understanding of how species evolve through genetic changes can be obtained through phylogenetic analysis and it provides the pathway that connects extant species with their ancestral origin, as well as allows us to predict the genetic divergence that may occur in the future. The phylogeography can be considered as the conceptual bridge connecting processes acting on populations with evolutionary patterns detectable among species or higher taxa [14]. These studies allow us to understand the geographic ordination of genotypes and help to understand both biogeography and landscape genetics across a variety of spatial and temporal scales [15].
The mitochondrial COI and COII, D3 (28S rDNA), ITS1 and ITS2 of ribosomal DNA (rDNA) play a significant role in species identification and phylogenetic analyses. ITS2 region is popular in studying phylogenetics of closely related Anopheles species, as well as biodiversity and geographic races of a particular species of mosquitoes as it is a non-coding DNA sequence containing high degree of mutations [16]. ITS2 region is used in distinguishing members of several Anopheles species complexes including Anopheles maculatus group [17]. Anopheles hyrcanus group [18], Anopheles dirus complex [17]. The sequences of both ITS2 and COI were used in previous studies of Anopheles species also. The barcode region of the five prime (5’) end of the COI, three prime (3’) end of the COI and ITS2 region of rDNA were used to study about the hypothesis of new molecular lineages within Anopheles punctimacula s.l. [19]. The ITS2 and COI sequences were utilized in detection of Anopheles species such as Anopheles arabiensis and Anopheles pretoriensis from understudied regions of eastern Ethiopia [20].
The COI gene followed by ITS2 is the most commonly used molecular markers or barcode regions for mosquito barcoding studies. Comparison of rDNA ITS2 and mtDNA COI efficacies for the correct classification of the species has revealed that ITS2 is preferred on the COI gene [21].
Hence aim of this study was to maximum utilization of the data repository in the public domain, National Center for Biotechnological Information (NCBI) to infer the phylogenetic and phylogeographic relationships of Anopheles culicifacies species complex using ITS2 and COI sequences.
Materials and methods
Data collection
The sequences of ITS2 and COI of An. culicifacies species complex available at NCBI public database (www.ncbi.nlm.nih.gov) were used (S1 and S2 Tables). Four sequences of Aedes aegypti and two sequences of Anopheles subpictus were used as the outgroup of the phylogenetic and phylogeographic analyses respectively (S3 Table). A total number of thirty- five sequences of ITS2 and seventy- seven COI sequences were used for the analysis.
The number of ITS2 and COI sequences from each sibling species is shown in Table 1 and the number of ITS2 and COI sequences from each country is shown in Table 2.
Sequence alignment
All the downloaded sequences of ITS2 and COI were aligned using ClustalW in MEGA5.2.2 software separately [22]. Both ITS2 and COI alignment files were trimmed separately to obtain a consistent region as the length of the sequences are different [23]. The final length of ITS2 aligned sequences was 439bp and COI aligned sequences were 450bp.
Phylogenetic analysis
The FASTA format trimmed files of ITS2 and COI after aligning were converted to Nexus alignment (.nex) and Phylip alignment (.phy) file formats by Geneious Prime 2022.1.1 software. The best schemes of models of evolution for ITS2 and COI sequences were obtained using PartitionFinderV1.1.1_Windows software as shown in Table 3. Phylogenetic trees of ITS2 and COI were generated separately using obtained best models of evolution in MrBayes- V 3.2.5_ Bayesian inference of Phylogeny software.
Haplotype network construction using ITS2 and COI sequences
DNA Sequences Polymorphism software (DnaSP) (Version 5.10.01) and Network 10.2 software were used to determine the spatial distribution of haplotypes of ITS2 and COI sequences of An. culicifacies using the median-joining method. Arlequin version 3.5.2.2 software was used to obtain the haplotype frequencies of each country [24].
Determination of genetic diversity
Genetic diversity of ITS2 and COI sequences An. culicifacies species complex was obtained for the total number of sites (excluding sites with gaps / missing data) [25]. The number of variable sites, GC content, the total number of mutations, number of haplotypes, haplotype diversity, nucleotide diversity and neutrality tests of Tajima’s D, Fu and Li’s D* test statistic, Fu and Li’s F* test statistic, Fu’s Fs statistic were obtained for both ITS2 sequences and COI sequences using DnaSP v5 software.
Determination of genetic differentiation
Genetic differentiation of ITS2 and COI sequences An. culicifacies species complex among populations of the study was estimated based on fixation index (FST) values [26] using Arlequin version 3.5.2.2 software.
Phylogeographic analysis
Phylogeographic analysis using ITS2 and COI sequences following the Bayesian phylogeographic model was carried out in BEAST v1.8.2 software [27, 28]. The diversification date of the most recent common ancestor (MRCA) of the Anopheles genus at 83.23 Mya with a 95% credibility interval ranging from 54.33 to 115.88 Mya [29] was used to calibrate the node ages which was dated using the Drosophila-Anopheles divergence at 260 Mya suggested by Gaunt and Miles [30].
Results
Phylogenetic analysis of An. culicifacies species complex using ITS2 sequences
The generated ITS2 Bayesian likelihood tree was dispersed into two main distinct clades and each clade evolved separately having a different genetic structure for each (Fig 1). The sibling species B, C and E were included in clade 1 and the sibling species A and D were included in clade 2. The sibling species of clade 1 were reported from geographical areas of India, Sri Lanka and Cambodia and China. The sibling species of clade 2 were reported from India and Iran.
Fig 1. Bayesian likelihood tree generated by MrBayes-3.2.5_WIN32_x86 software using ITS2 sequences of An. culicifacies.
35 sequences, 439 characters, 460 000 generations, 2 mcmc runs. Percentages on the node labels are Bayesian posterior probabilities (PP). Ae. aegypti was used as the out-group. GenBank accession numbers of ITS2 sequences included in clade 1, 2 and outgroup were listed in S4 Table.
https://doi.org/10.1371/journal.pone.0290178.g001
The branch length of clade 2 was greater than the branch length of clade 1 implying that sibling species of A and D of clade 2 might have more divergence than the sibling species of B, C and E in clade 1 from their most recent common ancestor (MRCA). All sibling species of A, B, C, D and E were reported in India. Only sibling species B and E were reported in Sri Lanka and sibling species A was reported from Iran and the B was reported from Cambodia. According to the Fig 1, the posterior probability value in the node of clade 1 was 72.21% (node A). The posterior probability value in the node of clade 2 was 97.68% (node B). As the clade 2 posterior probability value is close to 100 than the clade 1 value, the branching pattern of clade 2 can be considered as statistically more reliable than clade 1.
Phylogenetic analysis of An. culicifacies species complex using COI sequences
A different topology is shown in the tree produced from the phylogenetic analyses of COI sequences (Fig 2) than in the ITS2 Bayesian likelihood tree. The Bayesian likelihood tree of COI sequences was broadly dispersed into two in the first branching point with a posterior probability value of 100% in the branching point (node A). Four clades can be distinguished and each clade was separately evolved having a different genetic structure for each. The clade 1 to 3 were more closely related to each other than clade 4 with a MRCA representing node B with a posterior probability value of 74.73%.
Fig 2. Bayesian likelihood tree generated by MrBayes-3.2.5_WIN32_x86 software using COI sequences of An. culicifacies.
77 sequences, 450 characters, 1 640 000 generations, 2 mcmc runs. Percentages on the node labels are Bayesian posterior probabilities (PP). Ae. aegypti was used as the out-group. GenBank accession numbers of COI sequences included in Clade 1 to 4 and outgroup were listed in S5 Table.
https://doi.org/10.1371/journal.pone.0290178.g002
When considering clades 1, 2 and 3, there was a more close relationship between clades 2 and 3 having MRCA representing node C with a posterior probability value of 100% implying that the branching pattern was also more statistically reliable. The clade 4 was identified as a derived clade. There was a sister relationship formed by the clade 4 with other clades with its position supported by a high posterior probability value of 86.05%.
The sibling species B, C and E were included in clade 2 from India (sibling species B, C, E), Sri Lanka (sibling species B) and North-eastern Cambodia. The sibling species A and D were included in clade 3 where A from Iran, India, Pakistan and Oman and D from India. The sibling species from India, Pakistan and the United Arab Emirates were included in clade 4.
According to the Fig 3, the branch length of clade 3 was greater than the branch length of clade 2 implying the sibling species of A and D of clade 3 might have more genetic change (or divergence) than the sibling species of B, C and E in clade 2 from their MRCA. That result was the same with ITS2 phylogenetic analysis where the sibling species A and D might have more genetic divergence than sibling species B, C and E from their MRCA.
Fig 3. ITS2 haplotype network of An. culicifacies species complex (20 haplotypes) generated using DnaSP v5 software and Network 10.2 software.
The size of a circle indicates the relative frequency of sequences belonging to a certain sequence type. Each color indicates a different geographic area. The distance between two haplotypes includes no. of mutations. Red colored dots indicate median vectors.
https://doi.org/10.1371/journal.pone.0290178.g003
ITS2 haplotype network of An. culicifacies species complex
The ITS2 haplotype network is shown in Fig 3. There was a total of 20 haplotypes recorded from Iran, Cambodia, India, China and Sri Lanka based on the 35 ITS2 sequences used in the analysis. The haplotype frequencies and haplotype diversities (Hd) in each population are listed in S6 Table. There were 3 shared haplotypes named as H_2, H_11 and H_14 and the other remaining haplotypes were private haplotypes.
The highest number of private haplotypes (13) were recorded from India (Hd = 0.729) as the number of sequences is also high in India among other countries. There were no private haplotypes from Cambodia and Sri Lanka each having one shared haplotype with India. Hd values were zero in Cambodia, China and Sri Lanka. Any of the haplotypes from China was not shared between other localities.
There were two main haplogroups. The haplogroup 1 was composed of haplotypes from India, Cambodia, Sri Lanka and China. There were predominantly private haplotypes from India and two shared haplotypes (H_2 and H_11). The regional ancestor of the haplogroup 1 was H_2 which is a shared haplotype between India and Cambodia that was understood as the genetic backup of the haplogroup 1.
The haplogroup 2 was made up of private haplotypes from Iran and India with one shared haplotype H_14 that is shared between those two populations. The regional ancestor of the haplogroup 2 was H_14 which is a shared haplotype between Iran and India.
The highest distance was shown between the H_18 and H_13 haplotypes of India having the highest no. of mutations between them. The Table 4 below shows the sibling species belonging to each haplogroup and reported countries.
COI haplotype network of An. culicifacies species complex
The COI haplotype network is shown in Fig 4. There was a total of 55 haplotypes recorded from Iran, Sri Lanka, Pakistan, Oman, India, Cambodia and the United Arab Emirates based on the 77 COI sequences used in the analysis.
Fig 4. COI haplotype network of An. culicifacies species complex (55 haplotypes) generated using DnaSP v5 software and Network 10.2 software.
The size of a circle indicates the relative frequency of sequences belonging to a certain sequence type. Each color indicates a different geographic area. The distance between two haplotypes includes no. of mutations. Red colored dots indicate median vectors.
https://doi.org/10.1371/journal.pone.0290178.g004
The haplotype frequencies and haplotype diversities (Hd) in each population were listed in S7 Table. There were 4 shared haplotypes named as Hap_4, Hap_13, Hap_16 and Hap_32 and the remaining ones were private haplotypes.
The highest number of private haplotypes (26) were recorded from India (Hd = 0.980) while the least number of private haplotypes (one from each) were recorded from Oman, United Arab Emirates and Cambodia. The lowest Hd value was obtained from Sri Lanka (Hd = 0.957). The haplotypes of United Arab Emirates, Iran, Cambodia and Oman were not shared between other localities.
There were four main haplogroups. The haplogroup 1 was made up of haplotypes from India, Sri Lanka and Pakistan. There were predominantly private haplotypes from Sri Lanka and two shared haplotypes (Hap_16, Hap_32) that were shared between India and Sri Lanka. The regional ancestor of the haplogroup 1 was identified as Hap_40 from India.
The haplogroup 2 was made up of haplotypes from India, Sri Lanka and Cambodia. There were predominantly private haplotypes from India and no shared haplotypes. The regional ancestor of the haplogroup 2 was identified as Hap_9 of India.
The haplogroup 3 was made up of haplotypes from India, Pakistan and Iran. There was one shared haplotype Hap_4 that was shared between India and Pakistan. The regional ancestor of the haplogroup 3 was identified as Hap_4 which is a shared haplotype between Pakistan and India.
The haplogroup 4 was composed of haplotypes from India, Pakistan and the United Arab Emirates. There were predominantly private haplotypes from Pakistan and one shared haplotype Hap_13 that was shared between India and Pakistan. The regional ancestor of the haplogroup 4 was identified as Hap_13 which is a shared haplotype between Pakistan and India. The Table 5 below shows the sibling species belongs to each haplogroup and reported countries.
Determination of genetic diversity
The genetic diversity of An. culicifacies species complex based on ITS2 and COI sequences is shown in Table 6 obtained using DnaSP Version 5.10.01 software.
Table 6. Number of variable sites (S), haplotypes (h), nucleotide diversity (π), haplotype diversity (Hd), the average number of nucleotide differences (k), Tajima’s D, Fu and Li’s D, Fu and Li’s F, Fu’s Fs statistic based on ITS2 and COI sequences of An. culicifacies.
https://doi.org/10.1371/journal.pone.0290178.t006
When considering all sequences of ITS2 and COI, the nucleotide diversity (π), haplotype diversity (Hd), and an average number of nucleotide differences (k) were higher in COI sequences than in ITS2 sequences.
A relatively high haplotype diversity (Hd) values of 0.661 to 0.981 was observed in ITS2 and COI sequences respectively. The nucleotide diversity (π) and the average number of nucleotide differences (k) were higher in the analysis of COI sequences than in ITS2 sequences.
The G+C content was 0.591 at coding positions of 341.00 sites in ITS2 sequences and the total G+C content was 0.305 at coding positions of 378.00 sites in COI sequences. The total number of mutations was higher in COI sequences than in ITS2 sequences having 413 of a total number of mutations in COI and 28 total number of mutations in ITS2.
In both ITS2 and COI analysis statistically non-significant (P>0.10) positive values of Tajima’s D values were obtained. There were negative values of Fu and Li’s F and D and positive values of Fu’s Fs statistic for both ITS2 and COI sequence analyses.
Determination of genetic differentiation
Genetic differentiation based on ITS2 sequences of An. culicifacies species complex.
Pairwise fixation index (FST) values among populations of An. culicifacies species complex based on ITS2 sequences are shown in Table 7.
The commonly used values for FST are as follows: 0 to 0.05, small; 0.05 to 0.15, moderate; 0.15 to 0.25, great; values above 0.25 indicate huge genetic differentiation [26].
According to Table 7, there were FST values ranging from -0.57778 to 1.0000. The highest FST value (1.0000) was obtained between Cambodia-Sri Lanka indicating huge genetic differentiation between populations. Negative FST values which are effectively seen as zero were obtained for Cambodia-India, India-China indicating the absence of genetic subdivision between the considered populations.
Genetic differentiation based on COI sequences of An. culicifacies species complex.
Pairwise fixation index (FST) values among populations of An. culicifacies species complex based on COI sequences are shown in Table 8.
The commonly used values for FST are as follows: 0 to 0.05, small; 0.05 to 0.15, moderate; 0.15 to 0.25, great; values above 0.25 indicate huge genetic differentiation [26].
According to Table 4, there were FST values ranging from -0.70893 to 1.0000. The highest FST value (1.0000) was obtained for Oman-Cambodia, Oman-Arab and Cambodia-Arab indicating huge genetic differentiation between populations. Negative FST values which are effectively seen as zero were obtained for Iran-Oman, Iran-Cambodia, Sri Lanka-Arab, Pakistan-Arab and India-Cambodia indicating the absence of genetic subdivision between the considered populations when the allele frequencies of two subpopulations are very similar.
Phylogeographic analysis of An. culicifacies species complex using ITS2 sequences
The phylogeographic tree of An. culicifacies species complex using ITS2 sequences is shown in Fig 5. According to the Fig 5, the diversification of the Anopheles genus was dated at 83.06 Mya (node A) with a 95% HPD ranging from 81.15 to 85.09 Mya in the Cretaceous. The diversification of An. culicifacies species complex was dated at 22.19 Mya (node B) with a 95% HPD ranging from 20.25 to 24.12 Mya in the late Miocene.
Fig 5. Evolutionary timescale for An. culicifacies species complex generated by BEAST v1.8.2 software using ITS2 sequences.
35 sequences, 435 characters. An. subpictus was used as the out-group. Numbers near the nodes designate the average divergence time estimated (Million Years, Mya). Geological time scale includes Cretaceous, Pala(eocene), Eoce(ne), Olig(ocene) and Mioc(ene) respectively. GenBank accession numbers of ITS2 sequences included in clades 1, 2 and outgroup were listed in S8 Table. Posterior probability values, Mean values of diversification times and 95% highest posterior density (HPD) of each nodes of the ITS2 phylogeographic tree is listed in S9 Table.
https://doi.org/10.1371/journal.pone.0290178.g005
The species complex was split into two clades as in the phylogenetic analysis of ITS2 sequences. The diversification of clade 1 including sibling species A and D was dated at 16.25 Mya (node C) in Miocene. The sibling species A from Iran and both sibling species A and D from India were included in clade 1. There are two subclades inside clade 1 at node E and F that were dated at 12.11 Mya and 10.85 Mya separately. The diversification of clade 2 including sibling species B, C and E was 19.17 Mya (node D) in Miocene. The sibling species B from Sri Lanka, Cambodia and India, the sibling species C from India and the sibling species E from Sri Lanka were included in clade 2. There are three subclades inside clade 2 at nodes G, H and I that were dated at 8.73 Mya, 16.47 Mya and 16.18 Mya separately.
The diversification of the clade 1 including sibling species A and D was more recent than the clade 2 with sibling species B, C and E based on ITS2 phylogeographic analysis. The divergence of ITS2 sequences into clade 1 and clade 2 occurred prior to splitting into sibling species.
The diversification of the clade 1 including sibling species A and D was more recent than the clade 2 with sibling species B, C and E based on ITS2 phylogeographic analysis. The divergence of ITS2 sequences into clade 1 and clade 2 occurred prior to splitting into sibling species.
The diversification times of the earliest members of the sibling species A, B, C, D and E based on the ITS2 phylogeographic tree are listed in Table 9. According to that, the sibling species B can be considered as the earliest evolved sibling species and the sibling species C was the recently evolved sibling species based on ITS2 sequences.
Phylogeographic tree of An. culicifacies species complex using COI sequences
The phylogeographic tree of An. culicifacies species complex using COI sequences was shown in Fig 6.
Fig 6. Evolutionary timescale of An. culicifacies species complex generated by BEAST v1.8.2 software using COI sequences.
Sequences were retrieved from NCBI GenBank, 77 sequences, 450 characters. An. subpictus was used as the out-group. Numbers near the nodes designate the average divergence time estimated (Million Years, Mya). Geological time scale includes Cretaceous, Pala(eocene), Eoce(ne), Olig(ocene) and Mioc(ene) respectively. GenBank accession numbers of COI sequences included in clades 1 to 3 and outgroup were listed in S10 Table. Posterior probability values, Mean values of diversification times and 95% highest posterior density (HPD) of each nodes of the COI phylogeographic tree is listed in S11 Table.
https://doi.org/10.1371/journal.pone.0290178.g006
The split time of the most recent common ancestor of the Anopheles genus was obtained as 82.75 Mya in the Cretaceous (node A) with a 95% HPD (Height posterior density) ranging from 80.93 to 85.07 Mya. The split date of An. culicifacies species complex was 24.33 Mya shown in node B with a 95% HPD ranging from 22.37 to 26.22 Mya.
There were three clades diversified in the Miocene period. The diversification of clade 1 including sibling species from Iran, India, Pakistan, Oman and Sri Lanka was dated as 17.30 Mya (node C). The diversification of clade 2 and clade 3 was dated as 18.07 Mya (node D) and 17.92 Mya (node E) respectively. The sibling species from Sri Lanka, India and Pakistan were included in clade 2. The sibling species from Iran, India, Sri Lanka, Pakistan and the United Arab Emirates were included in clade 3.
There were three subclades identified within clade 1 arising at nodes F, G and H. The first subclade arising at node F was dated 10.11 Mya including sibling species A and D from India, and sibling species A from Pakistan, Iran and Oman. The second sub clade arising at node G was dated as 9.59 Mya including sibling species B and C from India, and sibling species B from Sri Lanka. The third sub clade arising at node H was dated 11.77 Mya including species B, C and E from India and one specimen from Northeastern Cambodia. There were two sub clades identified within clade 2 arising at node I and J dated at 12.18 Mya and 16.35 Mya separately.
The diversification of sibling species of the species complex mainly occurred in Miocene. The diversification times of the earliest members of the sibling species A, B, C, D and E based on the COI phylogeographic tree were listed in Table 10. According to that, the sibling species A can be identified as the earliest evolved sibling species and the sibling species D was the recently evolved sibling species based on COI sequences.
Discussion
Molecular taxonomy allows us to distinguish the mosquito vectors to species or sibling species level. Molecular systematics provides information on genetic diversity and supports predicting their evolution and phylogenetic relationships [31]. The analysis of phylogenetic and phylogeographic relationships, identification of haplotypes in different geographical areas, understanding of genetic diversity and genetic differentiation based on ITS2 and COI sequences of An. culicifacies species complex was carried out in this study. The Bayesian likelihood trees generated using ITS2 and COI sequences further supported the sibling species categorization and displayed the phylogenetic relationships between sibling species of An. culicifacies species complex with well-supported clades with high posterior probability values. In the phylogenetic analysis using ITS2 sequences, the generated Bayesian likelihood tree was dispersed into two main distinct clades including the sibling species B, C, and E in clade 1 and the sibling species A and D in clade 2. A considerable sequence divergence between these two clades and a strict sequence conservation within the members of a given clade was implied.
That result was the same as the results based on the primary and secondary structure of the ITS2 sequences. There are two clades of sibling species as A, D and B, C, and E of the species complex based on the primary and secondary structures of the ITS2 sequences [2].
There was more genetic change (or divergence) observed in the sibling species of A, and D than in the sibling species of B, C, and E after splitting from a common ancestor based on the branch length of the two clades of the ITS2 Bayesian likelihood tree. Another phylogenetic analysis of An. culicifacies species complex based on the ITS2 sequences performed using multiple aligned sequences with a neighbour-joining distance method, the species of A and D are under a less selective pressure and evolving faster compared to the sibling species in B, C and E. The members in a certain clade having a near sequence identity without any possibility of cross-breeding suggests that ITS2 regions are undergoing parallel evolution [8].
In the phylogenetic analysis using COI sequences, the generated Bayesian likelihood tree was dispersed into four clades producing a tree with a different topology than the ITS2 Bayesian likelihood tree. Only in the clades 2 and 3 of the COI Bayesian likelihood tree which includes the sibling species B, C, E in the clade 2 and sibling species A, D in the clade 3 were shown near relatedness to the clades of the ITS2 Bayesian likelihood tree. When considering the branch lengths of the clades 2 and 3, the obtained results were the same as the ITS2 phylogenetic analysis which was that sibling species A and D might have more genetic divergence than the sibling species B, C and E from their MRCA.
The generated COI Bayesian likelihood tree was divided into four clades as highly divergent COI sequences and the mitochondrial genome having a higher mutation rate than the nuclear genome. According to the generated two haplotype network, the total number of haplotypes was higher in the COI haplotype network than in the ITS2 haplotype network. Generally, the haploid mitochondrial genome has a higher mutation rate than the nuclear genome and the occurring variations can produce different haplotypes. The considered number of COI sequences was also higher than the number of ITS2 sequences in this study.
The highest number of private haplotypes was recorded from India with high Hd values in both ITS2 and COI haplotype networks. A large effective population size in this area is represented by this private haplotype richness and the high haplotype diversity. The private haplotypes in a particular geographical area are indicators of a restricted gene flow from that area and a population that is adapted to local conditions [32].
There were lower Hd values obtained from Sri Lankan sequences among the sequences in both ITS2 and COI analysis as there are only two sibling species reported from Sri Lanka. In addition to that, the low haplotype diversity that occurred from bottlenecks might have resulted from the intense malaria control efforts, habitat fragmentation and disappearance of vertebrate blood meal sources [32]. It can become high in the future, through the imported cases resulting in a risk of disease re-emergence in Sri Lanka.
When considering the genetic diversity of An. culicifacies species complex based on the ITS2 and COI sequences recorded from all geographical regions, high Hd values were observed in both analyses, suggesting an extensive genetic diversity of the mosquitoes could be due to several facts such as including five sibling species in the complex, a large effective population size of mosquitoes, high gene flow among populations and favorable environmental conditions for mosquito breeding, regardless of control campaigns [33]. This high level of genetic diversity observed in this study implies that the possibility of An. culicifacies species complex maintaining a relatively high effective population size despite enormous population fluctuations.
A long-established mosquito population that has not faced recent bottlenecks and the possibility of the presence of genetically distant haplotypes in sympatry was indicated by the higher nucleotide diversity (π) and an average number of nucleotide differences (k) [33] in the analysis of COI sequences than the ITS2 sequences.
The null hypothesis of neutral evolution indicating that the mosquito populations in the study at mutation-drift equilibrium maintained a stable large population was accepted from non-significant Tajima’s D values [33]. That was indicated by the non-significant Tajima’s D values of both analyses of the ITS2 and COI sequences. There were positive values of Tajima’s D in both ITS2 and COI analyses indicating a decreasing population size or balancing selection and a low level of both low and high-frequency polymorphisms [34].
There were negative values of Fu and Li’s F and D in both analyses of the ITS2 and COI sequences from all countries in this study. An excess of singletons and a high number of unshared haplotypes were indicated by their negative values of it [35]. The deficiency of alleles from a recent population bottleneck can be due to a genetic drift causing reducing population size as indicated by positive values of Fu’s Fs statistic [36].
Considering previous studies, the genetic diversity was well studied in Ae. aegypti using these neutrality test values. Ae. aegypti mosquitoes collected from different regions of Sri Lanka was genetically characterized in relation to Ae. aegypti mosquitoes found from various parts of the world based mitochondrial COI gene. A stable large population at mutation-drift equilibrium was suggested through the obtained statistically non-significant Tajima’s D value as same as this study. Evolutionary fitness of the mosquito and stability of their populations is increased by the high genetic diversity and extensive genetic mixing [33].
According to the pairwise FST values, a huge genetic differentiation between populations was observed from the highest FST value of Cambodia-Sri Lanka in the ITS2 analysis and Oman-Cambodia, Oman-Arab and Cambodia-Arab in the COI analysis. A high genetic differentiation might be achieved due to the large geographical distance between the two locations [32]. When the distance between the populations increases, the genetic similarity between the populations was decreased. That suggests the isolation by distance that is derived from spatially limited gene flow [37]. Negative FST values which are effectively seen as zero were obtained for Cambodia-India, India-China in ITS2 analysis and Iran-Oman, Iran-Cambodia, Sri Lanka-Arab, Pakistan-Arab and India-Cambodia in COI analysis. FST value is close to zero when the allele frequencies of two subpopulations are very similar and not differentiated [38]. High rates of gene flow and the absence of genetic subdivision between the considered populations were indicated by low FST values.
Anopheles fossil record is very uncommon as there were only two Anopheles fossils recognized currently named Anopheles (Nyssorhynchus) dominicanus from the Late Eocene (33.9–41.3 Mya) being the oldest one and Anopheles rottensis from the Late Oligocene (13.8–33.9 Mya) being the most recent one [27]. The usage of those records seemed to be problematic based on the applied dating technique [39]. Therefore, the diversification date of the most recent common ancestor (MRCA) of the Anopheles genus was used to calibrate the phylogeographic trees in the study. The diversification of the Anopheles genus was dated in this study ranging from 81.15 to 85.09 Mya and 80.93 to 85.07 Mya based on ITS2 and COI phylogeographic analysis respectively in the Cretaceous.
The link between the diversification of anophelines and malaria parasites is still controversial [40]. The origin of P. falciparum was dated at 2.5 Mya and the initial radiation of mammalian Plasmodium at 12.8 Mya [41]. The divergence time of these species was younger than which inferred for their Anopheles vectors in our analysis.
The diversification of An. culicifacies species complex was obtained as ranging from 20.25 to 24.12 Mya and 22.37 to 26.22 Mya based on ITS2 and COI phylogeographic analysis respectively. That was nearly equal to the divergence of An. culicifacies based on mitochondrial genome PCG123 datasets were obtained as 21.76 Mya [42]. The diversification of the clade 2 including the sibling species B, C and E was earlier than the clade 1 with sibling species A and D as shown in the ITS2 phylogeographic analysis. The divergence of the clade 1 and 2 occurred prior to splitting into the sibling species. The diversification of sibling species of the species complex mainly occurred in the Miocene.
The split time of An. subpictus was more recent than the split time of An. culicifacies in this study that were obtained as 12.2 Mya and 9.25 Mya in ITS2 and COI phylogeographic analysis respectively. When considering other species of the genus Anopheles, the diversification of Anopheles gambiae / South East Asia-Oceania anophelines, Anopheles darlingi / Anopheles albitarsis complex and Anopheles aquasalis / Anopheles albitarsis complex were dated at 65.5 Mya, 38.98 Mya and 28.56 Mya respectively [29]. Therefore, the diversification of An. culicifacies species complex obtained in this study was more recent than these species of the Anopheles genus.
According to the obtained dates of the phylogeographic analysis, the sibling species B was considered as the earliest evolved sibling species based on ITS2 sequences and when considering COI sequences it was the sibling species A. The recently evolved sibling species were C and D respectively in ITS2 and COI analysis. Based on previous studies, it was mentioned that initially the sibling species A and B evolved from a common ancestor and the sibling species D diverged from A and the sibling species C from the sibling species B later [9]. The diversification times of the sibling species were not mentioned in previous studies.
The diversification dates of the earliest members of the sibling species are varied between ITS2 and COI analysis which can be due to the changes in the diversification of the nuclear genome and the mitochondrial genome. It is assumed that the mtDNA evolves at a faster rate than nuclear DNA (nuDNA) in animals and the ratio of mtDNA over nuDNA mutation rate varies between 2 and 6 in invertebrates including insects and arachnids [43].
Most of the intraspecific diversifications are seen in the Pliocene in the study. In ITS2 phylogeographic analysis, intraspecies diversification was observed mostly in the Pliocene period in India, Iran, Cambodia, Sri Lanka and China as well as in COI analysis it was observed in countries of Pakistan, India, Sri Lanka and Cambodia. The climate is warm and humid during this period and tropical forest is likely to have been widespread across Southeast Asia and larval habitats abundant as well as the forest fragmentation is also increased. Therefore, it can be increased the niche availability and facilitate the spread [44]. All Pleistocene-dated divergence events can be observed in the COI phylogeographic analysis in the countries of Iran, India, Sri Lanka, Oman and the United Arab Emirates. It indicated that the Pleistocene glaciations have shaped genetic diversity within species and possibly to have retained sacks of forest habitat at intermediate elevations throughout the Pleistocene [44].
The possibility of An. culicifacies species complex maintaining a relatively high effective population size despite of the enormous population fluctuations was suggested. There are several limitations of this study as this was carried out using retrieved sequences of the NCBI database to fill the gaps of phylogeny, diversification and genetic diversity of An. culicifacies species complex. A concatenated phylogenetic tree could not be generated as both ITS2 and COI sequences were not reported from the same specimen. It is necessary to revisit the mosquito population to achieve success from the vector control strategies while incorporating the information on the genetic markers including both mitochondrial and nuclear markers to enhance the robustness of the reported observations.
Conclusions
The importance of understanding the genetic diversity, phylogenetic and phylogeographic relationships of mosquito vectors using ITS2 and COI sequences of An. culicifacies species complex was focused. The sibling species of An. culicifacies species complex was broadly divided into two clades as A, D and B, C, E in both phylogenetic and phylogeographic analysis using ITS2 sequences. There were 3 to 4 clades in phylogenetic and phylogeographic analysis using COI sequences. There was more genetic divergence with less selective pressure and a recent diversification shown in the sibling species A and D than the sibling species B, C and E from their MRCA. The high genetic diversity of An. culicifacies species complex suggests the likelihood of maintaining a relatively high effective population size. The genetic diversities of vectors of mosquito-borne diseases should be considered in implementing effective vector control strategies as uniform control measures may not be equally effective for all the populations of a single species.
Acknowledgments
All the members of the Centre for Biotechnology and Genetics and Molecular Biology Unit at the University of Sri Jayewardenepura, Sri Lanka were greatly acknowledged for the assistance rendered during this study.
References
- 1.
Surendran SN, Abhayawardana TA, De Silva BGDNK, Ramasamy R, Ramasamy MS. Anopheles culicifacies Y-chromosome dimorphism indicates sibling species (b and e) with different malaria vector potential in Sri Lanka. Medical and Veterinary Entomology. 2000; 14(4): 437–440. pmid:11129709. - 2.
Shanika KHS, Harischandra IN, De Silva BGDNK. Nucleotide sequence and secondary structure variations in ITS2-rDNA region of the members of Anopheles culicifacies (Diptera: Culicidae) species complex Nucleotide Sequence and Secondary Structure Variations in ITS2-rDNA Region of the Members of Anopheles culicifacies (Diptera: Culicidae) Species Complex. Vidyodaya Journal of Science. 2016; 20: 26–40. - 3.
Sharma VP, Dev V. Biology & control of Anopheles culicifacies Giles 1901. Indian Journal of Medical Research. 2015; 142: 525–536. pmid:26139769. - 4.
Goswami G, Raghavendra K, Nanda N, Gakhar SK, Subbarao SK. PCR-RFLP of mitochondrial cytochrome oxidase subunit II and ITS2 of ribosomal DNA: Markers for the identification of members of the Anopheles culicifacies complex (Diptera: Culicidae). Acta Tropica. 2005; 95(2): 92–99. pmid:15967406. - 5.
Van Bortel W, Sochanta T, Harbach RE, Socheat D, Roelants P, Backeljau T, et al. Presence of Anopheles culicifacies B in Cambodia established by the PCR-RFLP assay developed for the identification of Anopheles minimus species A and C and four related species. Medical and Veterinary Entomology. 2002; 16(3): 329–334. pmid:12243235. - 6.
Surendran SN, Hawkes NJ, Steven A, Hemingway J, Ramasamy R. Molecular studies of Anopheles culicifacies (Diptera: Culicidae) in Sri Lanka: Sibling species B and E show sequence identity at multiple loci. European Journal of Entomology. 2006; 103(1): 233–237. - 7.
Kar I, Subbarao SK, Eapen A, Ravindran J, Satyanarayana TS, Raghavendra K, et al. Evidence for a new malaria vector species, species E, within the Anopheles culicifacies complex (Diptera: Culicidae). Journal of medical entomology. 1999; 36(5): 595–600. pmid:10534953. - 8.
Dassanayake RS, Gunawardene YINS, De Silva BGDNK. ITS-2 secondary structures and phylogeny of Anopheles culicifacies species. Bioinformation. 2008; 2(10): 456–460. pmid:18841242. - 9.
Surendran SN, Ramasamy R. The Anopheles culicifacies and An. subpictus species complexes in Sri Lanka and their implications for Malaria control in the country. Tropical Medicine and Health. 2010; 38(1): 1–11. - 10.
Tyagi V, Sharma AK, Dhiman S, Srivastava AR, Yadav R, Sukumaran D, et al. Malaria vector Anopheles culicifacies sibling species differentiation using egg morphometry and morphology. Parasites & Vectors. 2016; 9(1): 1–13. pmid:27075571. - 11.
Prasad A, Mathur P, Srivastava M, Sharma E, Parveen A, Kumar D. Ecology and Behaviour of Anopheles culicifacies sensu lato (s.l.). A General review. International Journal of Current and Academic Review. 2015; 3(5): 227–241. - 12.
Subbarao SK, Vasantha K, Raghavendra K, Sharma VP, Sharma GK. Anopheles culicifacies: sibling species composition and its relationship to malaria incidence. Journal of American Mosquito Control Association. 1988; 4(1): 29–33. pmid:3057115. - 13.
Hu Y, Fan H, Chen Y, Chang J, Zhan X, Wu H, et al. Spatial patterns and conservation of genetic and phylogenetic diversity of wildlife in China. Science Advances. 2021; 7(4): p.eabd5725. pmid:33523945. - 14.
Avise JC. Phylogeography: the history and formation of species. Harvard university press. 2000. - 15.
Rius M, Turon X. Phylogeography and the description of geographic patterns in invasion genomics. Frontiers in Ecology and Evolution. 2020; 8: p.595711. - 16.
Sum JS, Lee WC, Amir A, Braima KA, Jeffery J, Abdul-Aziz NM, et al. Phylogenetic study of six species of Anopheles mosquitoes in Peninsular Malaysia based on inter-transcribed spacer region 2 (ITS2) of ribosomal DNA. Parasites & vectors. 2014; 7(1): 1–8. pmid:24993022. - 17.
Walton C, Hardley JM, Kuvangkadilok C, Collins FH, Harbach RE, Baimai V, et al. Identification of five species of the Anopheles dirus complex from Thailand, using allele-specific polymerase chain reaction. Medical and Veterinary Entomology. 1999; 13:24–32. pmid:10194746. - 18.
Li C, Lee JS, Groebner JL, Kim HC, Klein TA, O’Guinn ML, et al. A newly recognized species in the Anopheles hyrcanus Group and molecular identification of related species from the Republic of South Korea (Diptera: Culicidae). Zootaxa. 2005; 939:1–8. - 19.
Loaiza JR, Scott ME, Bermingham E, Sanjur OI, Rovira JR, Dutari LC, et al. Novel genetic diversity within Anopheles punctimacula s.l.: Phylogenetic discrepancy between the Barcode cytochrome c oxidase I (COI) gene and the rDNA second internal transcribed spacer (ITS2). Acta Tropica. 2013; 128(1): 61–69. pmid:23806568. - 20.
Carter TE, Yared S, Hansel S, Lopez K, Janies D. Sequence-based identification of Anopheles species in eastern Ethiopia. Malaria Journal. 2019; 18(1): 1–7. pmid:30992003. - 21.
Karimian F, Oshaghi MA, Sedaghat MM, Waterhouse RM, Vatandoost H, Hanafi-Bojd AA, et al. Phylogenetic analysis of the oriental-palearctic-afrotropical members of Anopheles (Culicidae: Diptera) based on nuclear rDNA and mitochondrial DNA characteristics. Japanese Journal of Infectious Diseases. 2014; 67(5): 361–367. pmid:25241686. - 22.
Lorenz C, Alves JM, Foster PG, Suesdek L, Sallum MAM. Phylogeny and temporal diversification of mosquitoes (Diptera: Culicidae) with an emphasis on the Neotropical fauna. Systematic Entomology. 2021; 46(4): 798–811. - 23.
Jizhou LV, Wu S, Zhang Y, Chen Y, Feng C, Yuan X, et al. Assessment of four DNA fragments (COI, 16S rDNA, ITS2, 12S rDNA) for species identification of the Ixodida (Acari: Ixodida). Parasites & vectors. 2014; (7): 1–11. pmid:24589289. - 24.
Depaquit J, Lienard E, Verzeaux-Griffon A, Ferte H, Bounamous A, Gantier J, et al. Molecular homogeneity in diverse geographical populations of Phlebotomus papatasi (Diptera, Psychodidae) inferred from ND4 mtDNA and ITS2 rDNA. Epidemiological consequences. Infection, Genetics and Evolution. 2008; Evol. 8(2): 159–170, 2008, pmid:18243814. - 25.
Jayatunga DPW, Harischandra IN, Chandrasekharan NV, de Silva BGDNK. Compensatory base changes reveal sexual incompatibility among members of the Anopheles subpictus sensu lato (Diptera: Culicidae) species complex in Sri Lanka. Life. 2021; 11(3). pmid:33800295. - 26.
Rangel-Gamboa L, Martínez-Hernandez F, Maravilla P, Arenas-Guzmán R, Flisser A. Update of phylogenetic and genetic diversity of Sporothrix schenckii sensu lato. Medical Mycology. 2016; 54(3): 248–255. pmid:26591010. - 27.
Freitas LA, Russo CAM, Voloch CM, Mutaquiha OCF, Marques LP, Schrago CG. Diversification of the genus Anopheles and a neotropical clade from the Late Cretaceous. Plos One. 2015; 10(8): 1–12. pmid:26244561. - 28.
Lafontaine DLJ, Tollervey D. The function and synthesis of ribosomes. Nature Reviews Molecular Cell Biology. 2001; 2(7): 514–520. pmid:11433365. - 29.
Martinez-Villegas L, Assis-Geraldo J, Koerich LB, Collier TC, Lee Y, Main BJ, et al. Characterization of the complete mitogenome of Anopheles aquasalis, and phylogenetic divergences among Anopheles from diverse geographic zones. Plos One. 2019; 14(9): 1–22. pmid:31479460. - 30.
Gaunt MW, Miles MA. An insect molecular clock dates the origin of the insects and accords with palaeontological and biogeographic landmarks. Molecular Biology and Evolution. 2002; 19(5): 748–761. pmid:11961108. - 31.
Weeraratne TC, Surendran SN, Reimer LJ, Wondji CS, Perera MDB, Walton C, et al. Molecular characterization of Anopheline (Diptera: Culicidae) mosquitoes from eight geographical locations of Sri Lanka. Malaria Journal. 2017; 16(1): 1–14. pmid:28578667. - 32.
Mayoke A, Muya SM, Bateta R, Mireji PO, Okoth SO, Onyoyo SG, et al. Genetic diversity and phylogenetic relationships of tsetse flies of the palpalis group in Congo Brazzaville based on mitochondrial cox1 gene sequences. Parasites & Vectors. 2020; 13(1): 1–16. pmid:32410644. - 33.
Dharmarathne HAKM, Weerasena OVDSJ, Perera KLNS, Galhena G. Genetic characterization of Aedes aegypti (Diptera: Culicidae) in Sri Lanka based on COI gene. Journal of Vector Borne Diseases. 2020; 57(2): 153–160. pmid:34290160. - 34.
Panhuis TM, Swanson WJ. Molecular evolution and population genetic analysis of candidate female reproductive genes in Drosophila. Genetics. 2006; 173(4): 2039–2047. pmid:16783023. - 35.
Fernando HSD, Hapugoda M, Perera R, Black WC, de Silva BGDNK. Mitochondrial metabolic genes provide phylogeographic relationships of global collections of Aedes aegypti (Diptera: Culicidae). Plos One. 2020; 15(7): 1–18. pmid:32722672. - 36.
Sharma M, Fomda BA, Mazta S, Sehgal R, Singh BB, Malla N. Genetic diversity and population genetic structure analysis of Echinococcus granulosus sensu stricto complex based on mitochondrial DNA signature. Plos One. 2013; 8(12): 1–8. pmid:24349394. - 37.
Jensen JL, Bohonak AJ, Kelley ST. Isolation by distance, web service. BMC Genomic Data. 2005; 6: 1–6. pmid:15760479. - 38.
Jacquot M. Genomic diversity of pathogenic bacteria in the Borrelia burgdorferi species complex: evolution and molecular epidemiology. PhD thesis. Blaise Pascal University. 2014. - 39.
Logue K, Chan ER, Phipps T, Small ST, Reimer L, Henry-Halldin C, et al. Mitochondrial genome sequences reveal deep divergences among Anopheles punctulatus sibling species in Papua New Guinea. Malaria Journal. 2013; 12(1): 1–11. pmid:23405960. - 40.
Moreno M, Marinotti O, Krzywinski J, Tadei WP, James AA, Achee NL, et al. Complete mtDNA genomes of Anopheles darlingi and an approach to anopheline divergence time. Malaria Journal. 2010; 9(1): 1–13. pmid:20470395. - 41.
Ricklefs RE, Outlaw DC. A molecular clock for malaria parasites. Science. 2010; 329(5988): 226–229. pmid:20616281. - 42.
Hao YJ, Zou YL, Ding YR, Xu WY, Yan ZT, Li XD, et al. Complete mitochondrial genomes of Anopheles stephensi and An. dirus and comparative evolutionary mitochondriomics of 50 mosquitoes. Scientific Reports. 2017; 7(1): 1–13. pmid:28794438. - 43.
Allio R, Donega S, Galtier N, Nabholz B. Large variation in the ratio of mitochondrial to nuclear mutation rate across animals: Implications for genetic diversity and the use of mitochondrial DNA as a molecular marker. Molecular Biology and Evolution. 2017; 34(11): 2762–2772. pmid:28981721. - 44.
Morgan K, O’Loughlin SM, Mun-Yik F, Linton YM, Somboon P, Min S, et al. Molecular phylogenetics and biogeography of the Neocellia Series of Anopheles mosquitoes in the Oriental Region. Molecular Phylogenetics and Evolution. 2009; 52(3): 588–601. pmid:19603555.