DNA fingerprinting of Shiga-toxin producing Escherichia coli O157 based on Multiple-Locus Variable-Number Tandem-Repeats Analysis (MLVA)

Background The ability to react early to possible outbreaks of Escherichia coli O157:H7 and to trace possible sources relies on the availability of highly discriminatory and reliable techniques. Methods that are fast and have the potential for complete automation are needed for this important pathogen. Methods A total of 73 isolates of Shiga toxin-producing E. coli O157 (STEC) were used in this study. The two available fully sequenced STEC genomes were scanned for tandemly repeated stretches of DNA, which were evaluated as polymorphic markers for isolate identification. Results The 73 E. coli isolates displayed 47 distinct patterns, and the MLVA assay discriminated well between the E. coli O157 strains. The assay was fast and all the steps can be automated. Conclusion The findings demonstrate a novel, highly discriminatory molecular typing method for the important pathogen E. coli O157 that is fast and robust and offers many advantages compared to current methods.


Background
Escherichia coli O157:H7 is a common cause of a variety of illnesses, including bloody diarrhea and the hemolytic uremic syndrome (HUS). The O157:H7 serotype was first described in the literature in 1983 following two outbreaks of hemorrhagic colitis in a fast-food restaurant chain in Oregon and Michigan in 1982 [1,2]. Illness was characterized by watery diarrhea, which progressed to bloody diarrhea and abdominal cramps [1,2]. E. coli O157:H7 belongs to a group of diarrheagenic E. coli named Shiga toxin-producing E. coli (STEC) [3]. STEC produce several distinct Shiga toxins (Stx) [4] and it is believed that HUS results from the systemic action of Stx on vascular endothelial cells [5]. The most frequent mode of transmission for E. coli O157:H7 infection is through consumption of contaminated food and water, and several outbreaks have been caused by ground beef. Approximately 1% of healthy cattle may have the organism in their intestines.
The ability to react early to possible outbreaks of E. coli O157:H7 and to trace possible sources relies on the availability of highly discriminatory and reliable techniques. Of particular interest are the DNA-based molecular typing methods. Previous work in our laboratory has shown that pulsed-field gel electrophoresis (PFGE) of XbaI-digested whole genomes gave good discrimination and was highly reproducible [6]. We previously compared PFGE with the newer method of Amplified-Fragment Length Polymorphism (AFLP) and found that while AFLP was faster, PFGE offered the best resolution for typing E. coli O157:H7 [6]. PFGE is currently the preferred method for typing E. coli O157:H7 in our laboratory. However, the rapid sequencing of complete bacterial genomes, including those of two E. coli O157:H7 strains [7,8], gave us the opportunity to search the genomes of these two strains for Variable Number of Tandem Repeats (VNTRs) that could serve as a source of genetic polymorphisms for a typing assay. Polymorphic VNTR regions which can be used for typing purposes have been identified and tested in Yersinia pestis [9][10][11], Francisella tularensis [12], Mycobacterium tuberculosis [13][14][15][16], Xylella fastidiosa [17], Haemophilus influenzae [18][19][20] and Bacillus anthracis [21][22][23][24], and we have recently developed a VNTR-based typing assay for the important enteropathogen Salmonella Typhimurium [25].

Bacterial strains
A total of 73 isolates of Shiga toxin-producing Escherichia coli O157 were used in this study. The isolates included domestic strains (32 isolates) and strains from Sweden (11 isolates), Finland (6 isolates), the UK (5 isolates) and the USA (19 isolates). Most of the isolates (69) have previously been typed by both XbaI PFGE and AFLP [6], and the collection included strains from well characterized outbreaks as well as strains designated as sporadic isolates.

DNA extraction
Suspensions of bacterial cells were boiled for 15 min, centrifuged briefly at 3500 rpm for 3 min and used directly in the PCR reactions. Alternatively, genomic DNA was extracted using a commercial kit ("Easy-DNA"; Invitrogen BV, Leek, The Netherlands).

MLVA typing
The genomic sequences of both fully sequenced E. coli O157 strains (accession numbers AE005174 and NC_002695) were searched for tandemly repeated DNA motifs with the Tandem Repeat Finder [26], GeneQuest (DNASTAR, Madison WI) and Kodon (Applied-Maths, Kortrijk, Belgium) software. Several regions containing tandemly repeated motifs were found, of which seven were chosen to constitute a set for high-level discrimination of E. coli O157 isolates. Six of the VNTRs were located on the main chromosome, and one was located on the E. coli O157 large virulence plasmid (accession no. AF074613). Table 2 [see Additional file 2] gives an overview of the repeat characteristics. The VNTRs were situated in five genes and two intergenic regions and had repeated motifs ranging from 6 to 30 basepairs. Some of the VNTRs displayed size variations between the EDL933 and Sakai strains when their GenBank sequences were examined, and the predicted amplicons are shown in table 2 [see Additional file 2]. The repeats themselves, with 500 bp of leading and trailing sequence, were used to design specific PCR primers (Table 3) [see Additional file 3]. Care was taken to match both the annealing temperatures and the sizes of the PCR amplicons to make a convenient set for PCR and capillary electrophoresis separation. The PrimerSelect (DNASTAR) software automatically calculated the primer annealing temperatures and analyzed how the different primers could interact in multiplex PCR. The forward primers for all seven VNTR regions were labelled with 6-FAM (6-carboxyfluorescein). The different primer-sets were tested against each other in several combinations. The combination adopted comprised four PCR reactions: Vhec1 alone annealed at 50°C, Vhec4 and Vhec5 annealed at 53°C, Vhec2 and Vhec3 annealed at 50°C, and Vhec6 and Vhec7 annealed at 50°C.
The PCR was performed on a GeneAmp PCR system 9700 (Applied-Biosystems, Foster City, CA) with the following temperature profile: denaturation at 94°C for 5 min, followed by 30 cycles of 94°C for 30 s, the annealing temperature given above for 60 s and 72°C for 50 s, and finally a 10 min extension step at 72°C. Five microliters of each of the PCR products were pooled into a single tube for each sample and thoroughly mixed. Five microliters of the pooled solution were mixed with 8 µl formamide and 1 µl of size standard, denatured for 2 min at 94°C and separated by capillary electrophoresis on an ABI-310 Genetic Analyzer (Applied-Biosystems) with POP4-polymer and Genescan TAMRA-500 or TAMRA-2500 as internal standard in each sample (Applied-Biosystems). The samples were injected into the capillary by applying a voltage of 15 kV for 5 s. The capillary electrophoresis was run for 50 min at 60°C with a run voltage of 15 kV. The size of each VNTR locus was determined by the GeneScan software (Applied-Biosystems).
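As a reading aid only, the four PCR reactions and the temperature profile described above can be written down as data. The sketch below is not part of the original protocol; the names `REACTIONS`, `Step`, `cycling_program` and `total_hold_minutes` are hypothetical, and the time calculation sums programmed hold times only, ignoring ramping between temperatures, so the real run time on the instrument would be longer.

```python
from dataclasses import dataclass

# Primer sets per PCR reaction and their annealing temperatures (°C),
# as listed in the text.
REACTIONS = {
    ("Vhec1",): 50,
    ("Vhec4", "Vhec5"): 53,
    ("Vhec2", "Vhec3"): 50,
    ("Vhec6", "Vhec7"): 50,
}


@dataclass(frozen=True)
class Step:
    """One programmed hold of the thermal cycler (hypothetical helper type)."""
    label: str
    temp_c: float
    seconds: int


def cycling_program(annealing_temp_c, cycles=30):
    """Expand the described profile into a flat list of holds."""
    cycle = [
        Step("denaturation", 94, 30),
        Step("annealing", annealing_temp_c, 60),
        Step("extension", 72, 50),
    ]
    return (
        [Step("initial denaturation", 94, 5 * 60)]
        + cycle * cycles
        + [Step("final extension", 72, 10 * 60)]
    )


def total_hold_minutes(program):
    """Sum of programmed hold times in minutes (ramp times are not included)."""
    return sum(step.seconds for step in program) / 60


# One program per reaction; only the annealing temperature differs.
programs = {primers: cycling_program(temp) for primers, temp in REACTIONS.items()}
```

Under these assumptions each reaction has 85 min of programmed hold time, and the four reactions differ only in their annealing step.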

Phylogeny calculation
The electropherograms (ABI trace files) were imported into the BioNumerics software package (Applied-Maths), and a phylogenetic tree was constructed using Dice coefficients and cluster analysis with the unweighted pair group method with arithmetic averages (UPGMA). The phylogeny was based on the compound band pattern from all amplified loci for each isolate; the separate VNTR loci were thus not analysed individually.
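The clustering itself was done in BioNumerics, whose internals are not described here. The following is a minimal sketch of the two named ingredients, a Dice coefficient on band patterns and UPGMA (average-linkage) clustering, under stated assumptions: bands are given as fragment sizes in bp, two bands count as matching when they lie within a `tolerance` that is chosen arbitrarily here, and the isolate names and sizes are invented for illustration.

```python
from itertools import combinations


def dice_similarity(bands_a, bands_b, tolerance=1.0):
    """Dice coefficient 2*m / (n_a + n_b); m = bands matched within `tolerance` bp."""
    unmatched = sorted(bands_b)
    matches = 0
    for size in sorted(bands_a):
        for candidate in unmatched:
            if abs(candidate - size) <= tolerance:
                unmatched.remove(candidate)  # each band may be matched only once
                matches += 1
                break
    total = len(bands_a) + len(bands_b)
    return 2.0 * matches / total if total else 1.0


def upgma(profiles, tolerance=1.0):
    """Return a nested-tuple tree from average-linkage clustering on 1 - Dice."""
    names = list(profiles)
    # Distances between the original isolates; cluster distances are averages of these.
    base = {
        frozenset((a, b)): 1.0 - dice_similarity(profiles[a], profiles[b], tolerance)
        for a, b in combinations(names, 2)
    }
    # Each cluster is stored as (subtree, tuple_of_member_names).
    clusters = [(name, (name,)) for name in names]

    def average_distance(c1, c2):
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(base[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Merge the two clusters with the smallest average pairwise distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Invented band patterns (bp), for illustration only.
example = {
    "isolate_A": [150, 210, 305, 360, 420, 480, 540],
    "isolate_B": [150, 210, 305, 360, 420, 480, 540],
    "isolate_C": [150, 210, 305, 372, 420, 480],
}
tree = upgma(example)
```

In this toy input the two identical patterns join first and the third isolate, which shares five of its six bands with them, joins last. A missing band lowers the Dice coefficient in the same way as a shifted band, which is a property of analysing the compound pattern rather than individual loci.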

Sequencing
The same primers used to amplify the VNTR regions were used for sequencing at 3.2 pmol concentration. The PCR products were cleaned with the Qiaquick PCR purification kit (QIAGEN, Hilden, Germany), and both forward and reverse strands were sequenced twice each using the ABI Cycle-sequencing kit V3.0 (Applied-Biosystems). The sequencing products were cleaned with the DyeEx kit (QIAGEN) to remove unincorporated dyes before being applied to an ABI-3100 automated sequencer.

Results
The Tandem Repeat Finder software [26] found 36 repeats when it was set to search for motifs of 6 or more nucleotides repeated in tandem in 3 or more copies, with more than 60% internal homogeneity between repeat units. Both the GeneQuest (DNASTAR) and Kodon (Applied-Maths) software found these repeats and added the possibility of viewing graphics of the location of each VNTR locus with its surrounding area, together with information about melting temperatures and GC content in these genomic areas. The Tandem Repeat Finder offered by far the fastest search routine; however, both GeneQuest and Kodon also searched for other classes of repeated DNA, e.g. inverted repeats and dyad repeats.
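To make the search criteria concrete, the sketch below scans a sequence for perfect tandem repeats with a motif of at least 6 nucleotides present in at least 3 copies. It is a deliberately simplified stand-in and not the algorithm of the Tandem Repeat Finder, which also tolerates mismatches between repeat units (hence the 60% homogeneity setting above). The function name `find_perfect_tandem_repeats`, the upper motif length of 30 and the test sequence are assumptions made for illustration.

```python
import re


def find_perfect_tandem_repeats(sequence, min_unit=6, max_unit=30, min_copies=3):
    """Return (start, unit, copies) for perfect, non-overlapping tandem repeats.

    Only exact copies are detected; degenerate repeats are missed.
    """
    sequence = sequence.upper()
    # Lazy quantifier on the unit so the shortest repeating motif is reported;
    # the back-reference then consumes as many further exact copies as possible.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_copies - 1)
    )
    hits = []
    for match in pattern.finditer(sequence):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Invented test sequence: five copies of AAATAG between short flanks.
toy_sequence = "GGCC" + "AAATAG" * 5 + "TTGA"
toy_hits = find_perfect_tandem_repeats(toy_sequence)
```

A regular expression is adequate for a toy sequence, but backtracking makes it a poor fit for whole genomes, which is one reason dedicated tools are used for that task.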
The MLVA assay displayed 47 distinct profiles among the 73 strains analysed. The electropherogram of the pooled amplicons showed a clear and easily interpretable banding pattern (Figure 1). Each peak in the pooled run was assigned to the correct VNTR region by running each primer-pair separately in every isolate. This was done primarily for the development of the method; the resulting phylogeny was based on the banding pattern without identifying each locus separately. Selected isolates were run repeatedly by different individuals and always displayed identical banding patterns. Some minor variations in signal strength between runs were noted, which could possibly be explained by variations in PCR efficiency or in the amount of DNA available as template, since boiling of isolates is likely to yield different amounts of DNA. Not all the VNTR loci gave amplification products in all isolates, and some peaks were masked when two VNTRs produced PCR products of the same size. The majority of samples nevertheless displayed all amplification products. Eight isolates (11%) displayed a loss of one band, and only one isolate had fewer than 6 bands (Table 4) [see Additional file 4]. All primer-pairs showed loss of amplification products in some isolates except Vhec2, Vhec6 and Vhec7, whose products were present in all our isolates. All the VNTR loci displayed heterogeneity in our material. Table 5 [see Additional file 5] shows the allele size ranges in basepairs, the number of alleles at each locus and the polymorphism index of each marker. Table 4 [see Additional file 4] shows the allele distribution among all the distinct MLVA patterns.
All unique alleles differing by gain or loss of repeat units were given allele numbers; PCR amplicons of a VNTR locus with the same size in different isolates were thus given the same number in table 4 [see Additional file 4]. Different allele sizes at the VNTR loci resulted from loss or gain of repeat units, which was verified by direct sequencing of the PCR amplicons using the VNTR primers. Figure 2 shows an alignment of the Vhec4 primer products from isolates G5300 and IHE5344, which differ by 10 repeat units of the AAATAG motif.
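The text does not state how the polymorphism index of each marker was calculated. One index commonly used for VNTR markers is Nei's diversity index, 1 - Σ p_i², where p_i is the frequency of allele i at the locus. The sketch below assumes that definition; the function name `nei_diversity` and the allele calls are hypothetical and are not taken from the additional files.

```python
from collections import Counter


def nei_diversity(alleles):
    """Return 1 - sum(p_i**2) over the allele frequencies p_i at one locus.

    Assumed definition (Nei's diversity index); the source does not specify
    which polymorphism index was used.
    """
    counts = Counter(alleles)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("at least one allele call is required")
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


# Invented allele numbers for one locus in four isolates.
toy_index = nei_diversity([1, 1, 2, 3])
```

Under this definition a locus with a single allele scores 0, and the value approaches 1 as more alleles occur at similar frequencies.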
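The relation between amplicon size and repeat number used above is simple arithmetic: with a 6 bp motif such as AAATAG, a difference of 10 repeat units corresponds to a 60 bp difference in fragment size. The sketch below illustrates this and one possible way of numbering alleles. The function names are hypothetical, sizes are assumed to be already rounded to whole basepairs, and numbering alleles in ascending size order is an assumption; the source only states that equal sizes received equal numbers.

```python
def repeat_unit_difference(size_a_bp, size_b_bp, motif_length_bp):
    """Signed number of repeat units between two amplicons of the same locus.

    Returns None when the size difference is not a whole multiple of the
    motif length, i.e. when repeat gain or loss alone cannot explain it.
    """
    delta = size_a_bp - size_b_bp
    if delta % motif_length_bp != 0:
        return None
    return delta // motif_length_bp


def assign_allele_numbers(sizes_bp):
    """Map each distinct amplicon size at one locus to an allele number.

    Numbering by ascending size is an illustrative assumption.
    """
    return {size: number for number, size in enumerate(sorted(set(sizes_bp)), start=1)}


# Invented fragment sizes (bp) for a locus with a 6 bp motif.
toy_difference = repeat_unit_difference(420, 360, len("AAATAG"))
toy_alleles = assign_allele_numbers([360, 420, 360, 372])
```

In practice, capillary sizing yields fractional sizes that have to be binned before such a comparison, and a size difference that is not a multiple of the motif length would point to an insertion or deletion outside the repeat array, which is why sequencing was used for verification.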

Discussion
The MLVA assay was easy to perform, consisting of four separate PCR reactions with subsequent pooling of the products and capillary electrophoresis. The assay was tested both with extracted DNA and with bacterial cells simply boiled for 15 min. Both methods gave the same PCR products. With boiled cells as the starting material, however, primer Vhec1 and, in some cases, Vhec3 and Vhec4 gave reduced amounts of amplification product in the multiplexed PCR reactions compared with extracted DNA, but given the sensitivity of the capillary system with fluorescently labelled DNA we had no difficulty in finding all products. Selected isolates were run repeatedly by different individuals and always displayed identical banding patterns. The assay thus proved to be robust and highly reproducible. The strain collection used was very well described both in the literature and by previous typing [6]. This constituted a good basis for assessing the resolution of this novel method. We have previously typed 69 isolates in this collection by both XbaI PFGE and EcoRI(0)+MseI(+C) AFLP [6], and the MLVA assay displayed 45 distinct profiles in this subset. This was comparable to PFGE, which displayed 44 distinct profiles, and considerably higher than the 23 profiles revealed by AFLP (Table 1) [see Additional file 1]. Some of the PFGE profiles were, however, quite similar (one band difference). The phylogeny generated from the MLVA assay showed good accordance with the available epidemiological data and with PFGE typing (Table 1) [see Additional file 1]. The majority of strains that clustered together by PFGE also co-clustered by MLVA, and outbreaks were differentiated and clustered into distinct groups. Some exceptions were, however, noted. One isolate (G4921) from outbreak #7 (USA) displayed a different MLVA profile (V28) than the rest of the outbreak #7 strains (V30).
This isolate was also different by PFGE typing, indicating that this strain might not be a part of outbreak #7, or that several strains were involved in this outbreak. Strain G4919 (V30) of outbreak #7 displayed a different PFGE profile (P17) from the main PFGE pattern of this outbreak (P16), but by MLVA this strain was identical to the main outbreak pattern V30. The P16 and P17 PFGE profiles were, however, quite similar and differed by one band [6], making the MLVA data credible. Outbreak #6 from the USA was also divided into different profiles by MLVA typing. Strains H0619 and H0620 were identical (V31), while H0617 (V32), H0618 (V34) and H0616 (V35) displayed different profiles. Here only the H0618 isolate was also different by PFGE. The H0617 and H0616 isolates were thus identical to H0619 and H0620 by PFGE while they had different patterns by MLVA (Table 1) [see Additional file 1]. The outbreak #6 isolates did, however, cluster together, but strain 174/99 (V33), a Norwegian isolate, was included in this cluster by MLVA. The cattle isolates H90/95, H82/95, H83/95 and H89/95 were isolated from the same herd and believed to be identical. This was confirmed by MLVA typing, which gave identical patterns for these isolates (V36). The PFGE typing, however, displayed minor differences between these isolates, giving H90/95 and H83/95 identical patterns (P37) and different patterns for H82/95 (P38) and H89/95 (P39). The PFGE profiles for patterns P37, P38 and P39 were, however, highly similar and related [6]. No size variations in the repeated motifs in any of the seven VNTR loci were detected between isolates H82/95 and H90/95.
For outbreak #14 (V42 profile), the MLVA assay included two isolates: one human isolate previously considered unrelated, 3108/97 (V42), and one isolate from cattle, 127/97 (V42). The suspected outbreak isolate 127/98 (V43) (outbreak #14) was given a different profile from the other outbreak strains by MLVA (Fig. 1). The inclusion of strain 3108/97 into outbreak #14 was confirmed by PFGE, while the exclusion of 127/98 was indicated neither by PFGE nor by AFLP. Figure 1 shows the MLVA patterns for three isolates that were identical by PFGE and AFLP. Two of these isolates (131/97 and 126/97) were also identical by MLVA (V1), while strain 127/97 displayed a clearly unrelated profile (V42). Some of the MLVA types in table 1 [see Additional file 1] displayed fewer than seven VNTR bands. The reason for this was that some primer-pairs did not give an amplification product in all isolates. This can also be seen in figure 1, where the 127/97 isolate in the lower panel has no amplification product from the Vhec5 primer-pair.

Figure 1 Escherichia coli O157 MLVA fragment patterns generated from seven VNTR loci showing two identical and one different profile. Comparison of the MLVA profiles of three O157:H7 STEC cattle isolates. These isolates were all identical by previous XbaI PFGE and AFLP typing. The MLVA assay confirmed that isolates 126/97 (upper panel) and 131/97 (middle panel) were identical, but displayed a clearly distinct profile for isolate 127/97 (lower panel).

Conclusions
The main finding was a remarkably high co-clustering of the MLVA and PFGE profiles. This was somewhat unexpected, given that the two methods rely on quite different mechanisms to generate the profile data: the presence of XbaI recognition sites for PFGE and the variability of tandemly repeated motifs for MLVA. The good correlation between PFGE and MLVA is important, since most E. coli O157 isolates have been typed by PFGE and the clusters generated with MLVA will therefore be familiar. The speed of the MLVA assay combined with its resolution makes it an attractive alternative for typing E. coli O157 isolates, which are considered to be the most important pathogens so far among the Shiga toxin-producing E. coli serotypes. We thus present here a novel VNTR-based typing method for STEC O157 strains, which is considerably faster and less labour intensive than PFGE, yet offers a slightly higher discriminatory power. An additional benefit compared with PFGE is the potential for fully automating the analysis. The main tasks were liquid handling, which can be performed by laboratory robots, and by using multi-dye electrophoresis the analysis of the data can also be automated.

Figure 2 Sequence alignment of the Vhec4 VNTR loci from isolates G5300 and IHE5344. The results from the sequencing show that the electrophoretic mobility shift between isolates IHE5344 (upper strand) and G5300 (lower strand) is caused by different numbers of AAATAG repeat units alone.