Exploring internal features of 16S rRNA gene for identification of clinically relevant species of the genus Streptococcus

Background Streptococcus is an economically important genus, as a number of its species are human and animal pathogens. The genus has been divided into different groups based on 16S rRNA gene sequence similarity. Variability among the members of these groups is low, which makes them difficult to distinguish. The present study explored the 16S rRNA gene sequence to develop methods that can be used for preliminary identification and can supplement the existing methods for identification of clinically relevant isolates of the genus Streptococcus. Methods 16S rRNA gene sequences from isolates of S. dysgalactiae, S. equi, S. pyogenes, S. agalactiae, S. bovis, S. gallolyticus, S. mutans, S. sobrinus, S. mitis, S. pneumoniae, S. thermophilus and S. anginosus were analyzed to define the genetic variability within each species, to generate a phylogenetic framework, to identify species-specific signatures and to perform in-silico restriction enzyme analysis. Results The framework-based analysis was used to segregate Streptococcus spp. previously identified only up to genus level. This segregation was validated using species-specific signatures and in-silico restriction enzyme analysis. Using this approach, 43 uncharacterized Streptococcus spp. could be identified. Conclusions The markers generated by exploring 16S rRNA gene sequences provide a useful tool that can be further used for identification of different species of the genus Streptococcus.


Background
The genus Streptococcus consists of spherical Gram-positive bacteria belonging to the class Bacilli and the order Lactobacillales [1]. The genus is large and comprises numerous clinically significant species that are responsible for a wide variety of infections in humans and animals. Streptococci of different groups are known to cause human diseases, some species being highly virulent and responsible for major diseases. Species such as S. pyogenes, S. agalactiae and S. pneumoniae are important because they cause serious acute infections in humans, but several other species are also involved in diseases such as infective endocarditis, abscesses and other pathological conditions [2]. Various species of Streptococcus are known to be associated with infections of cattle, pigs, horses, sheep, birds, aquatic mammals and fish [3]. The genus has undergone considerable taxonomic revision and has been divided into different groups (pyogenic, anginosus, mitis, mutans, salivarius, bovis) based on 16S rRNA gene sequence similarity [4].
Since many species belonging to the genus Streptococcus are associated with various pathological conditions, different protocols have been used for their identification. Still, precise identification of these species is laborious. Clinical laboratories use Lancefield serological grouping, haemolytic reactions and phenotypic tests for identification of Streptococcus isolates. However, the Lancefield groups are not species-specific [5,6], and haemolytic activity differs within species and depends on incubation procedures. Strains within a given species may differ in a common trait [7,8], and even the same strain may exhibit biochemical variability [9,10].
16S rRNA gene sequencing has proved to be one of the most powerful tools for the classification of microorganisms [38] and has been used for identification of clinically relevant microbes [39,40]. Molecular tools based on the 16S rRNA gene can therefore be developed and used for identification. However, correct identification of a bacterial species cannot always be based on the nucleotide sequence of a single gene, and Multilocus Sequence Analysis (MLSA) of several housekeeping genes may have to be performed. From a practical standpoint, there is a need for a simplified approach for preliminary identification of a species, particularly when the amount of isolated DNA is not sufficient for MLSA or when the DNA does not react with a complete set of typing primers. The current work considers the possibility of using 16S rRNA gene sequences for this purpose. Thus the present study aims to explore internal features of the 16S rRNA gene for preliminary identification of a species, which can supplement the existing methods for identification. The methods explored include construction of a phylogenetic framework, identification of species-specific signatures and restriction enzyme analysis.

Sequence data
16S rRNA gene sequences belonging to the genus Streptococcus from the RDP database http://rdp.cme.msu.edu/ [41] were analysed in the present study. These included sequences from species with a relatively high number of identified organisms (86 sequences belonging to isolates of S. dysgalactiae, 61 to S. equi, 61 to S. pyogenes, 29 to S. agalactiae, 31 to S. bovis-equinus (S. bovis and S. equinus are considered to be a single species [42]), 76 to S. gallolyticus, 102 to S. mutans, 23 to S. sobrinus, 28 to S. mitis, 41 to S. pneumoniae, 73 to S. thermophilus and 32 to S. anginosus) and 63 sequences of uncharacterized species identified only up to genus level. The sequences belonging to these twelve Streptococcus species, which occur with higher frequency, were used as the master species set for generating a phylogenetic framework, species-specific signatures and restriction enzyme analysis.

Phylogenetic Analyses
For phylogenetic analyses, the sequences were aligned using the multiple alignment program CLUSTAL_X [43]. Evolutionary distances between all the sequences were calculated with DNADIST of the PHYLIP 3.6 package [44]. The program NEIGHBOR was used to draw a neighbor-joining [45] tree with the Jukes and Cantor correction [46]. Statistical testing of the trees was done using SEQBOOT by resampling the dataset 1000 times. The trees were viewed with TreeView Version 1.6.6 [47]. For each of the 12 Streptococcus species data sets, sequences that formed a single cluster were aligned and a consensus was obtained using the JALVIEW sequence editor [48]. The sequence closest to the consensus of each cluster was chosen as the representative for that cluster. Based on this, a reference set of 63 sequences was selected to define the range of genetic variability present in each of the Streptococcus species.
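The Jukes and Cantor correction mentioned above converts the observed proportion of differing sites p between two aligned sequences into an estimated distance d = -(3/4) ln(1 - (4/3) p). The following is a minimal Python sketch of that formula only, not a reimplementation of DNADIST; the function name and the choice to skip gaps and ambiguity codes are assumptions made for illustration, and PHYLIP may treat such positions differently.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned sequences of equal length.

    Assumption of this sketch: positions where either sequence has a gap
    or a non-ACGT symbol are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = mismatches / compared  # observed proportion of differing sites
    if p >= 0.75:
        return float("inf")  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - (4.0 / 3.0) * p)
```

For two closely related 16S rRNA gene sequences p is small, so the corrected distance is only slightly larger than p itself; the correction matters more as sequences diverge.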

Specific Signatures
Signatures were identified in each of the species data sets using the online MEME program [49]. Sequences of the 12 Streptococcus species data sets were submitted groupwise to MEME Version 4.6.1 http://meme.sdsc.edu/meme4_6_1/cgi-bin/meme.cgi. To obtain the maximum number of motifs, the default setting was changed from 3 motifs to 20 motifs. The default motif width was also changed and set to between 25 and 50. Each of the 20 signatures was checked for its frequency of occurrence within a particular Streptococcus sp. Signatures that did not appear in other Streptococcus spp. were considered unique. A BLAST search against the NCBI database http://www.ncbi.nlm.nih.gov/ was carried out for these signatures to check their uniqueness.
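The uniqueness check described above (a signature frequent in one species and absent from the others) can be sketched in a few lines of Python. This is an illustrative sketch only: it does not run MEME or BLAST, the function names and the `min_frequency` threshold of 0.9 are hypothetical, and degenerate positions are written with the IUPAC codes Y (pyrimidine), R (purine) and N (any nucleotide).

```python
import re

# IUPAC codes supported by this sketch (Y = pyrimidine, R = purine, N = any).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def motif_to_regex(motif):
    """Compile a signature containing IUPAC codes into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in motif.upper()))


def signature_frequencies(motif, species_sequences):
    """Fraction of sequences in each species that contain the signature."""
    pattern = motif_to_regex(motif)
    freqs = {}
    for species, seqs in species_sequences.items():
        hits = sum(1 for s in seqs if pattern.search(s.upper()))
        freqs[species] = hits / len(seqs) if seqs else 0.0
    return freqs


def is_unique_signature(motif, target, species_sequences, min_frequency=0.9):
    """True if the motif is frequent in `target` and absent everywhere else.

    min_frequency is an assumed cut-off, not a value taken from the study.
    """
    freqs = signature_frequencies(motif, species_sequences)
    absent_elsewhere = all(f == 0.0 for sp, f in freqs.items() if sp != target)
    return freqs.get(target, 0.0) >= min_frequency and absent_elsewhere
```

With made-up toy data such as `{"sp_a": ["TTGACCGTAA", "CCGACTGTAA"], "sp_b": ["TTGAAAGTAA"]}`, the motif `GACYGT` would count as unique to `sp_a`, whereas `GTAA` would not because it also occurs in `sp_b`.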

Restriction enzyme analysis
Eleven Type II Restriction enzymes (Table 1) were considered for these analyses. Restriction Mapper Version 3 http://restrictionmapper.org/ was used to obtain the restriction pattern of the 12 Streptococcus species data sets employed for construction of phylogenetic framework. These restriction patterns were analyzed and a consensus pattern was determined for each species.
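An in-silico digest of this kind amounts to locating each enzyme's recognition site in the sequence and splitting the sequence at the cut position. The sketch below is a minimal Python illustration and is not the Restriction Mapper implementation; the recognition sites and cut offsets are the standard ones for the eleven enzymes named in the Results, the function names are hypothetical, and the sequence is assumed to be linear and free of ambiguity codes.

```python
# Recognition site and cut offset (bases before the cut on the top strand).
# All sites are palindromic, so searching one strand is sufficient.
ENZYMES = {
    "AluI": ("AGCT", 2),
    "BfaI": ("CTAG", 1),
    "HaeIII": ("GGCC", 2),
    "MspI": ("CCGG", 1),
    "RsaI": ("GTAC", 2),
    "Sau3AI": ("GATC", 0),
    "EcoRI": ("GAATTC", 1),
    "SmaI": ("CCCGGG", 3),
    "HhaI": ("GCGC", 3),
    "BamHI": ("GGATCC", 1),
    "HindIII": ("AAGCTT", 1),
}


def cut_positions(seq, enzyme):
    """0-based positions at which the enzyme cuts a linear sequence."""
    site, offset = ENZYMES[enzyme]
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)  # allow overlapping sites
    return positions


def fragment_lengths(seq, enzyme):
    """Fragment lengths, in 5'->3' order, after a complete digest."""
    bounds = [0] + cut_positions(seq, enzyme) + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
```

For example, `fragment_lengths("AAAGCTTTTGGCCAA", "AluI")` yields two fragments of 4 and 11 bases, while an enzyme with no site in the sequence returns the full length as a single fragment.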

Cluster analysis for restriction profile
For cluster analyses MVSP (Multi Variate Statistical Package, Kovach Computing Services) version 3.13p was used. Dendrograms were constructed using the restriction patterns generated by different restriction enzymes for 12 framework species. The dendrograms show the utility of these enzymes in distinguishing different strains.
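Cluster analysis of restriction profiles starts from a pairwise distance between patterns. The text does not state which similarity coefficient or clustering method was selected in MVSP, so the following Python sketch assumes a Jaccard distance on sets of (enzyme, fragment length) pairs purely for illustration; the function names and the toy values in the usage note are hypothetical.

```python
def jaccard_distance(pattern_a, pattern_b):
    """1 - |A & B| / |A | B| for two restriction patterns given as iterables."""
    a, b = set(pattern_a), set(pattern_b)
    if not a and not b:
        return 0.0  # two empty patterns are treated as identical
    return 1.0 - len(a & b) / len(a | b)


def distance_matrix(patterns):
    """Pairwise Jaccard distances for a dict of name -> pattern."""
    names = sorted(patterns)
    return {(x, y): jaccard_distance(patterns[x], patterns[y])
            for x in names for y in names}


def nearest_profile(query, patterns):
    """Name of the reference pattern closest to the query pattern."""
    return min(sorted(patterns),
               key=lambda name: jaccard_distance(query, patterns[name]))
```

Given toy references `{"ref_a": {("AluI", 210), ("AluI", 640)}, "ref_b": {("AluI", 210), ("BamHI", 455)}}`, a query sharing both AluI fragments with `ref_a` is assigned to `ref_a`. A distance matrix of this form is the usual input to a hierarchical clustering routine that produces a dendrogram.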

Results
In the present study, 16S rRNA gene sequences belonging to 12 different species of the genus Streptococcus were analyzed with the aim of constructing a phylogenetic framework, identifying species-specific signatures and performing restriction enzyme analysis.

Phylogenetic framework
The phylogenetic tree (Additional file 1: Fig. S1) based on 61 sequences of S. pyogenes revealed 8 clusters. Eight sequences, one representing each of these clusters, were chosen to represent the genetic heterogeneity present within this species. Similarly, sequences from the other Streptococcus spp. were analysed for the genetic heterogeneity present within them (Additional files 2-12: Fig. S2-S12). Representative sequences were chosen from each species to provide information on the range of genetic variability present within that species (Table 2). In total, 63 such representative sequences were selected for framework construction (Figure 1). Strains of all the species were clearly segregated except for S. pneumoniae and S. mitis, suggesting a high level of similarity between strains of these two species that makes their identification difficult solely on the basis of the 16S rRNA gene. The framework was then used to check whether uncharacterized Streptococcus spp. could be classified among the framework species (Figure 2). Out of 63 sequences previously identified only up to genus level, 43 were found to segregate with 7 Streptococcus framework species, supported by high bootstrap values. Among these 43 sequences, 3 segregated with S. anginosus, 21 with S. mitis, 6 each with S. pneumoniae and S. gallolyticus, 5 with S. bovis-equinus, and 1 each with S. dysgalactiae and S. thermophilus (Figure 2). No sequences segregated with S. mutans, S. pyogenes, S. equi, S. agalactiae or S. sobrinus. The framework-based segregation was further validated by checking for the presence of species-specific signatures and by restriction analysis.

Signature sequences
Out of 20 signatures identified for each of the 12 Streptococcus framework species, only 1-5 unique signatures were found (Table 3) in framework species. The unique signatures were found in: S. mutans: 5; S. dysgalactiae: 3; S. equi, S. sobrinus and S. thermophilus: 2 each; S. gallolyticus, S. agalactiae, S. pyogenes, S. bovis-equinus, S. anginosus and S. pneumoniae: 1 each. These signatures were found to occur with high frequency. Moreover, these signatures were also found to be highly conserved across a particular species showing 98-100% sequence identity but were found to be fragmented in other species. No unique signature could be identified for S. mitis. But S. mitis can still be distinguished from S. pneumoniae using the signature found unique to S. pneumoniae. In S. mitis the signature was found to be substituted at specific positions and thus can distinguish these two species (Table 3). The signature found for S. bovis-equinus was effective in distinguishing it from very closely related species like S. lutetiensis and S. gallolyticus. These signatures were further used to validate the segregation of 43 sequences among 7 different framework species. All 43 sequences were found to contain the signature unique to the particular species thus validating the affiliation of these sequences to a particular species.

Restriction enzyme analysis
In-silico restriction enzyme analysis using eleven type II enzymes revealed different patterns. Restriction sites for AluI, BfaI, HaeIII, MspI, RsaI and Sau3AI occurred with a frequency of 3-10, resulting in 4-11 fragments. The sites for EcoRI, SmaI and HhaI were found in the majority of sequences studied, but these enzymes were less frequent cutters, producing single, single and double cuts respectively. They are therefore less informative for distinguishing species. In spite of their low cutting frequency, BamHI and HindIII can still be used for distinguishing different Streptococcus spp. BamHI produced a single but unique cut in S. thermophilus and can be used to distinguish S. thermophilus isolates. HindIII produced a single but unique cut in S. sobrinus and can be used to distinguish S. sobrinus isolates. As can be seen from the dendrograms (Additional files 13-18: Fig. S13-S18), different restriction enzymes can be used for identifying and distinguishing different isolates. While AluI was found to distinguish the majority of Streptococcus [...] HhaI. Similarly, S. gallolyticus and S. bovis can be distinguished using BfaI and HaeIII. Therefore a combination of AluI, BfaI, MspI or HaeIII can be used for distinguishing closely related organisms. The sequences that segregated with framework species were further validated using in-silico restriction enzyme analysis. The identified sequences showed restriction enzyme patterns close to those of the neighbouring framework species (Table 4), again validating the framework-based segregation. Thus, by combining the information from the framework, signature sequences and restriction enzyme analysis, it was possible to identify up to species level 43 sequences (out of 63) that were previously designated as Streptococcus sp. (Table 4).

Discussion
Streptococcus is a clinically important genus as a number of species belonging to this genus are human and animal pathogens. This genus has undergone considerable taxonomic revisions and has been divided into different groups based on 16S rRNA gene sequence similarity.
The present study aims to explore internal features of 16S rRNA gene sequences of different Streptococcus spp. to develop methods for their identification. A phylogenetic framework was constructed using different representative sequences, followed by identification of signature sequences and restriction enzyme analysis. The framework-based analysis suggests a high level of genetic heterogeneity within different Streptococcus spp. Signature sequences specific for each Streptococcus framework sp. were identified. These signature motifs would be simple to use as a supplement to the automated identification process. Restriction analysis has proved to be an important tool to identify newly isolated strains [50][51][52] and can be exploited for describing new species [53]. Use of multiple restriction enzymes is recommended for better resolution. Although it has been documented that closely related species cannot be distinguished solely on the basis of the 16S rRNA gene, exploring the internal features of this gene can be of definite use. Researchers are therefore now looking to explore unique features of the 16S rRNA gene that have not been explored yet [54,55]. As already described, the genus Streptococcus has been divided into different groups based on 16S rRNA gene similarity [4]. The framework species used for these analyses belong to these different groups.
Two framework species, S. pneumoniae and S. mitis, belong to the mitis group. Identification of members of the mitis group, particularly S. pneumoniae, is problematic. Identification of S. pneumoniae isolates is usually done using serological [56,57] and molecular techniques [58][59][60]. S. pneumoniae isolates can be easily identified using the signature sequence given in Table 3, and this signature also allowed us to distinguish S. pneumoniae isolates from S. mitis. Other members of this group (S. mitis and S. oralis), which are almost indistinguishable on the basis of the complete 16S rRNA gene sequence, can be differentiated using different restriction enzymes. S. mitis can be distinguished from the other two species of this group using the enzyme Sau3AI. S. pneumoniae, S. mitis and S. oralis can be distinguished from each other using AluI and MspI (data not shown for S. oralis).
Framework species S. anginosus belongs to the anginosus group. This group consists of only 3 species (S. anginosus, S. intermedius and S. constellatus). Members of the anginosus group are also difficult to identify and distinguish. An identification scheme for differentiation of these 3 species was proposed by Whiley et al. [61] and Whiley and Beighton [62]. Commercial identification systems [63,64] and molecular methods have been used for identifying and distinguishing these three species [65,14]. S. anginosus can be easily identified using the signature sequence (Table 3) and restriction enzymes. The restriction enzymes AluI, BfaI, RsaI and HaeIII can be used to distinguish members of the anginosus group efficiently (data not shown for S. intermedius and S. constellatus).
Framework species, S. thermophilus belongs to salivarius group which is closely related to bovis group [66] and consists of only 3 species (S. salivarius, S. vestibularis, S. thermophilus). S. thermophilus can be identified by using unique signature sequence (Table 3).
Two framework species, S. bovis and S. gallolyticus, belong to the bovis group and are difficult to identify. Isolates of these two species can be distinguished using the signatures specific for S. gallolyticus and S. bovis. Moreover, the restriction enzymes AluI, BfaI and HaeIII can be instrumental in distinguishing them. The signature found for S. bovis was also efficient in distinguishing S. bovis from the closely related species S. lutetiensis. These two species are difficult to distinguish solely on the basis of the 16S rRNA gene.
Framework species S. mutans and S. sobrinus belong to the mutans group. These two species are also difficult to distinguish. Beighton et al. (1991) [8] provided a scheme for identification of S. mutans and S. sobrinus strains. In the present investigation these two species can be easily distinguished using species-specific signatures and the restriction enzymes AluI, BfaI, HaeIII, MspI, RsaI and Sau3AI.
Table 3 footnote: The signatures were found to be 98-100% identical and conserved within a particular species. Py denotes pyrimidine, Pu denotes purine and N denotes any nucleotide. Overlapping signatures are shown in bold. The numbers in brackets indicate the corresponding position of the signature in the 16S rRNA gene.

Conclusions
Species that are difficult to distinguish solely on the basis of the complete 16S rRNA gene sequence can be identified using internal features of the gene, namely the signatures. These signatures can be exploited for quick identification. The aim of phylogenetic framework construction is to define the range of genetic variability within a species and then to exploit this variability for identification of different isolates. Similarly, the use of restriction enzymes helps in generating markers that can distinguish closely related species. The present study shows that the framework, the specific signatures in the 16S rRNA gene and the patterns generated by different restriction enzymes can be exploited for identification of isolates belonging to the genus Streptococcus. The markers generated in the present study are based on the 16S rRNA gene sequence, which is conserved, is not subject to changes due to culture conditions and does not exhibit biochemical variability. Thus the proposed scheme can be applied to any isolate. The approach is a cost-effective and rapid way to identify isolates and can therefore be used to differentiate isolates that are difficult to distinguish because of very similar traits and biochemical features. Additionally, the approach is simple to use for preliminary identification of a species and can supplement existing automated identification processes. It should be kept in mind, however, that this is a simplified procedure, and it is important to know the limitations of such a simplified approach.