Journal of Bacteriology, August 2006, p. 5655-5667, Vol. 188, No. 16
0021-9193/06/$08.00+0 doi:10.1128/JB.01596-05
Copyright © 2006, American Society for Microbiology. All Rights Reserved.
MINI-REVIEW
Division of Biological Sciences, University of California at San Diego, La Jolla, California 92093-0116
Gram-negative bacteria possess a two-membrane envelope with an outer lipopolysaccharide-containing membrane that provides an effective barrier, protecting these organisms from detergents, organic solvents, drugs, and other toxic substances (24). However, the occurrence of an outer membrane poses major problems for the secretion of macromolecules (28). Consequently, gram-negative bacteria have evolved a tremendous diversity of outer membrane systems designed for the export of proteins, complex carbohydrates, nucleic acids, and lipids (4, 37).
Among the well-characterized outer membrane protein secretion systems are (i) the so-called two-partner secretion systems (transport classification [TC] 1.B.20) and (ii) the autotransporter systems (AT or AT-1; TC 1.B.12) (20, 30, 51). Following export from the cytoplasm to the periplasm via the general secretory (Sec) system, both AT and two-partner secretion system translocation domains insert into the outer membrane as ß-barrel structures. They mediate export of virulence proteins or protein domains from the periplasm across the outer membrane to the extracellular medium where the exported protein or domain may either remain attached to the outer membrane or can be released in a free state (51). The exported proteins may serve as adhesins, hemolysins, proteases, cytotoxins, or mediators of intracytoplasmic actin-promoted bacterial motility (51).
Proteins of the autotransporter family possess C-terminal domains of 250 to 300 amino acyl residues that fold and insert into the outer membrane to give a ß-barrel with 12 to 14 transmembrane ß-strands (15, 16, 27, 29). This structure forms a pore through which the N-terminal virulence factor is presumed to be exported (13, 32). There is still some controversy as to the mechanism of protein transport (5, 6, 32, 44, 49). For example, the possible involvement of energy in the translocation process has not yet been extensively studied, and the relationship of these outer membrane translocators to mechanisms of antibiotic efflux and TonB-dependent influx, if any, has not been pursued.
A second family of autotransporters called "trimeric autotransporters," "oligomeric coiled-coil adhesins," or "autotransporters-2" (AT-2; TC 1.B.40) has recently been discovered (9, 17, 19, 43, 52). Among the best-characterized members of this family are the multifaceted Yersinia adhesin, YadA (2, 9, 19, 31, 36), the major adhesin of Haemophilus influenzae that allows colonization of the nasopharynx, Hia (25), and the Haemophilus "adhesin and penetration" protein, Hap (10, 11, 26, 48). These proteins define a novel family of autotransporter virulence factors. They may be able to translocate their passenger domains across the outer membrane without the assistance of accessory proteins, but this postulate is still contested.
A conserved C-terminal domain of about 70 amino acyl residues is believed to form the trimeric ß-barrel that presumably allows the transport of the N-terminal "passenger" domain to the bacterial cell surface. These proteins form trimeric lollipop-like structures anchored to the outer membrane by their C-terminal autotransporter anchor domains (5, 6, 44). A superficially similar trimeric structure has been established for the outer membrane TolC protein of Escherichia coli, which has an analogous ß-barrel. In the case of TolC, however, α-helical regions extend into the periplasm, a feature lacking in AT-2 domains (18, 22, 23). According to some investigators, the C-terminal 67- to 76-residue domains are both necessary and sufficient for translocation of the N-terminal adhesin domains (44). The AT-2 domain of each subunit is believed to consist of just four transmembrane antiparallel ß-strands (reviewed in reference 5). Deletion of this C-terminal domain abolishes outer membrane insertion of YadA (45), while deletion of the linker region results in degradation of the whole protein (36). These experimental results suggest, but do not establish, that the C-terminal linker or outer membrane insertion regions are directly responsible for export of the passenger domain.
The few characterized protein members of the AT-2 family serve as virulence factors in animal pathogens (36). They have been termed invasins, immunoglobulin-binding proteins, serum resistance proteins, and hemagglutinins, but all appear to have adhesive properties. Because each of the few functionally characterized "passenger" domains of this class of autotransporters can function in adhesion, it is possible but not demonstrated that they are all structurally related. The characteristic feature that we will use for identification of family members, however, is the presence of the small C-terminal domain that is believed to form the outer membrane trimeric ß-barrel pore.
In this minireview we present a bioinformatic analysis of the AT-2 family. We identify recognizable sequenced members of the AT-2 family and align the sequences of their autotransporter domains. The resultant multiple alignment is used to identify conserved motifs, generate a phylogenetic tree for the family, identify cluster-specific sequence characteristics, and generate average hydropathy, amphipathicity, and similarity plots that allow structural predictions. Essentially all of the AT-2 proteins analyzed here derive from α-, ß-, and γ-proteobacteria and their phage, although other more distantly related members of the family are found in other gram-negative bacterial kingdoms (7). Our analyses reveal that phylogeny of the AT-2 domains does not correlate with the size of the N-terminal passenger domain. However, the passenger domains consist of homologous repeat units that are common to all members of the family. Phylogeny of the passenger domains generally follows that of the AT-2 domains. To a considerable degree, protein phylogeny follows the phylogeny of the source organisms. Our results suggest that the genes encoding these proteins have been subject to lateral transfer but that transfer occurred primarily within closely related organisms. This conclusion is substantiated by their occurrence in phage genomes (see below). We suggest that all members of the AT-2 family serve a single unifying function in cell adhesion/macromolecular recognition. This review provides the first detailed bioinformatic analysis of the AT-2 family.
ESTABLISHED PROTEIN MEMBERS OF THE AT-2 FAMILY
Using the PSI-BLAST search tool (1) with YadA of Yersinia enterocolitica as the query sequence and three iterations, about 140 above-threshold hits were retrieved from the NCBI database. AT-2 family members were identified on the basis of their C-terminal AT-2 domains. No homologues were identified that appeared to have the AT-2 domain anywhere other than at their extreme C termini. Redundancies, very closely related homologues, and hits whose sequence similarity to established members of the family fell below the threshold used to establish homology (≥9 standard deviations using the GAP program [8]) were eliminated. This left 69 proteins upon which the analyses reported below were based. These proteins are presented in Table 1, their aligned AT-2 domain sequences are shown in Fig. 1, and the phylogenetic tree based on this alignment is presented in Fig. 2A. The phylogeny of the passenger domains is presented in Fig. 2B (see below). The proteins in Table 1 are listed by cluster, following the tree in Fig. 2A.
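A comparison score of this kind expresses how far the score of the real alignment lies above the scores obtained after randomly shuffling one of the two sequences, in units of the standard deviation of the shuffled scores. The sketch below illustrates that idea only. It assumes a plain Needleman-Wunsch global alignment with identity scoring (match +2, mismatch -1, linear gap -2), not the substitution matrix and gap penalties that GAP actually uses, and the function names and demo sequence are hypothetical.

```python
import random
import statistics


def nw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Global (Needleman-Wunsch) alignment score with a linear gap penalty."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: prefixes of b against gaps
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def comparison_score(a: str, b: str, n_shuffles: int = 100, seed: int = 0) -> float:
    """(real score - mean shuffled score) / SD of the shuffled scores."""
    rng = random.Random(seed)
    real = nw_score(a, b)
    shuffled_scores = []
    for _ in range(n_shuffles):
        residues = list(b)
        rng.shuffle(residues)  # keeps composition, destroys sequence order
        shuffled_scores.append(nw_score(a, "".join(residues)))
    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        return float("inf")  # degenerate case, e.g. a homopolymer
    return (real - mean) / sd


if __name__ == "__main__":
    seq = "AGIASALALASAVAIGVDNRFQQVKLENWYTHPMCDEKRS"
    print(f"self-comparison: {comparison_score(seq, seq):.1f} SD")
```

Under this scheme a hit would be retained only if its score against an established family member reached the chosen cutoff (9 SD in the text); the absolute values depend strongly on the scoring parameters and the number of shuffles.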
SEVEN-RESIDUE REPEAT SEQUENCES IN THE LINKER REGIONS OF AT-2 PROTEINS AND OTHER PROTEINS
Several of the AT-2 proteins listed in Table 1 exhibit a demonstrable 7-amino-acyl repeat element between the passenger domains and the putative transmembrane regions of the AT-2 domains (i.e., in the linker regions). For many of these homologues, two, three, or more repeat elements could be identified at the N-terminal end of the AT-2 domain, often extending into the part of the protein referred to as the passenger domain (Fig. 1). In AT-2-like proteins retrieved in BLAST searches, this 7-amino-acyl repeat occurred as many as 18 times. Twelve repeats are sufficient to create a domain the length of the linker plus the AT-2 domain. An example is the 195-residue Apl2 protein of Actinobacillus pleuropneumoniae. Its repeat elements, which encompass all but the last 12 residues of the protein, are presented in Table 3, where 12 tandem repeat elements are shown. The consensus for this repeat element is (D/E)(Q/N)(R/K)(F/I)(Q/D)(Q/K)(V/L), where the two most prevalent residues at each position are indicated in parentheses. The repeat can easily be seen, for example, in Yps1 and Yen1, both of which show extensive similarity to the consensus sequence (Fig. 1). Because these repeats occur in the linker regions connecting the passenger domains to the AT-2 domains of several AT-2 proteins, it is possible that the AT-2 domains evolved from a primordial gene like that encoding Apl2, derived from an internally repeated 21-bp genetic element, as illustrated in Table 3. Alternatively, AT-2 domains could have evolved independently of this repeat sequence and become associated with it as a result of gene fusion events.
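Tandem copies of this heptad can be located by scoring each 7-residue window against the two-residue consensus and following windows in register. The sketch below assumes that a window counts as a repeat when at least 5 of its 7 positions match one of the two prevalent residues; that threshold, the helper names, and the demo sequence are hypothetical choices for illustration, not the procedure used to build Table 3.

```python
# Two most prevalent residues at each of the seven repeat positions,
# taken from the consensus (D/E)(Q/N)(R/K)(F/I)(Q/D)(Q/K)(V/L).
HEPTAD_CONSENSUS = ["DE", "QN", "RK", "FI", "QD", "QK", "VL"]


def heptad_match_score(window: str) -> int:
    """Count positions (0-7) at which a 7-residue window fits the consensus."""
    return sum(res in allowed for res, allowed in zip(window, HEPTAD_CONSENSUS))


def best_tandem_run(seq: str, min_matches: int = 5) -> tuple[int, int]:
    """Return (1-based start, number of heptads) for the longest run of
    consecutive in-register windows scoring at least min_matches."""
    best = (0, 0)
    for phase in range(7):  # try every reading register
        run_start, run_len = 0, 0
        for i in range(phase, len(seq) - 6, 7):
            if heptad_match_score(seq[i:i + 7]) >= min_matches:
                if run_len == 0:
                    run_start = i
                run_len += 1
                if run_len > best[1]:
                    best = (run_start + 1, run_len)
            else:
                run_len = 0  # a poor window breaks the tandem run
    return best


if __name__ == "__main__":
    # Hypothetical sequence: four tandem heptads flanked by unrelated residues.
    demo = "MSTPA" + "DQRFQQV" * 3 + "ENKIDKL" + "GGGG"
    print(best_tandem_run(demo))
```

Scanning all seven registers matters because a run of repeats need not start at a multiple of 7 in the protein.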
All of the proteins in Table 1 exhibit sequence similarity in their AT-2 domains. The phylogenetic tree for these domains, shown in Fig. 2A, reveals clustering according to organismal type (Table 1). Thus, cluster 1a contains only ß-proteobacterial proteins; cluster 2a contains only
-proteobacterial proteins; and clusters 2b, 2d, and 3a contain only
-proteobacterial proteins. Moreover, clusters 1b, 2c, and 3b contain only ß- and
-proteobacterial proteins with the exception of the two E. coli phage proteins and the two putative desulfitobacterial proteins, Dha1 and Dha2. Finally, cluster 1c contains only
- and
-proteobacterial proteins. Thus, to some extent, clustering reflects the organismal type from which these proteins derive. This observation suggests that horizontal transfer of genetic material encoding AT-2 proteins has been restricted largely to organisms within any one of the proteobacterial subdivisions (see Conclusions and Perspectives).
AT-2 DOMAIN STRUCTURAL PREDICTIONS
The average hydropathy, amphipathicity, and similarity plots, based on the Fig. 1 multiple alignment and obtained using the AveHas program (53), are shown in Fig. 3. There are five peaks of hydrophobicity (H1 to H5), and with the angle set at 180°, as is appropriate for a ß-strand, there are five peaks of amphipathicity (A1 to A5). The average similarity plot (Fig. 3, dashed line) follows the average amphipathicity plot (dotted line) more closely than it follows the average hydrophobicity plot (solid line).
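The AveHas program is not reproduced here. The following is a minimal sketch of how average hydropathy and amphipathicity profiles can be derived from a multiple alignment, assuming the Kyte-Doolittle hydropathy scale, a 7-residue sliding window, and an Eisenberg-style hydrophobic moment computed on column-averaged values with 180° between successive residues. The scale, window size, averaging order, and function names are all assumptions, not AveHas's documented parameters.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale; AveHas may differ).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_hydropathy(alignment: list[str]) -> list[float]:
    """Mean hydropathy of each alignment column, ignoring gaps ('-')."""
    values = []
    for col in range(len(alignment[0])):
        residues = [seq[col] for seq in alignment if seq[col] != "-"]
        values.append(sum(KD[r] for r in residues) / len(residues) if residues else 0.0)
    return values


def average_profiles(alignment: list[str], window: int = 7, angle_deg: float = 180.0):
    """Sliding-window mean hydropathy and hydrophobic moment (amphipathicity).

    180 degrees between successive residues suits a beta-strand, whose side
    chains alternate between the two faces of the sheet."""
    h = column_hydropathy(alignment)
    delta = math.radians(angle_deg)
    hydropathy, amphipathicity = [], []
    for start in range(len(h) - window + 1):
        seg = h[start:start + window]
        hydropathy.append(sum(seg) / window)
        sin_sum = sum(v * math.sin(k * delta) for k, v in enumerate(seg))
        cos_sum = sum(v * math.cos(k * delta) for k, v in enumerate(seg))
        amphipathicity.append(math.hypot(sin_sum, cos_sum) / window)
    return hydropathy, amphipathicity


if __name__ == "__main__":
    # Alternating hydrophobic/hydrophilic columns give a large moment at 180 degrees.
    print(average_profiles(["IDIDIDI", "VKVKVKV"]))
    print(average_profiles(["IIIIIII"]))
```

A segment whose residues alternate between hydrophobic and hydrophilic, as expected for a membrane-facing ß-strand, scores high in amphipathicity without necessarily being strongly hydrophobic on average.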
CONSERVED MOTIFS
As shown in Fig. 3, the most conserved regions of the alignment coincide with hydrophobic peak H1 and amphipathic peak A3. These regions include the two most conserved motifs among AT-2 domains: AGIASALALA (motif 1; alignment positions 18 to 27) and SAVAIGV (motif 2; alignment positions 51 to 57). Although the majority of the proteins exhibit these conserved residues, no residue position is fully conserved, and the variation at any one position is usually considerable. The best-conserved residue is G56, which is present in all but one of the proteins (Hin1, which has a V at this position) (Fig. 1 and Table 4). The data in Table 4 reveal that at almost all conserved positions in motif 1, the exceptional nonconserved residues can be hydrophilic, hydrophobic, or semipolar; only at alignment position 21 is the nonconserved residue always semipolar. This suggests that there is no absolute requirement for residue type at most positions in hydrophobic peak H1 (Fig. 3).
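Table 4 tallies, for each motif position, the consensus residue and the residue types of the exceptions. A minimal sketch of such a per-column summary is shown below. It assumes a simple three-way residue classification (the grouping is an illustrative assumption, not necessarily the one used for Table 4), and the toy alignment of motif 2 is hypothetical, with the last sequence mimicking a G-to-V substitution like that in Hin1.

```python
from collections import Counter

# Assumed residue classes for illustration only.
HYDROPHOBIC = set("ACFILMVW")
SEMIPOLAR = set("GHPSTY")
HYDROPHILIC = set("DEKNQR")


def residue_class(aa: str) -> str:
    if aa in HYDROPHOBIC:
        return "hydrophobic"
    if aa in SEMIPOLAR:
        return "semipolar"
    if aa in HYDROPHILIC:
        return "hydrophilic"
    return "other"


def column_summary(alignment: list[str], col: int) -> tuple[str, float, set[str]]:
    """Consensus residue, its frequency, and the classes of nonconsensus residues."""
    residues = [seq[col] for seq in alignment if seq[col] != "-"]
    consensus, count = Counter(residues).most_common(1)[0]
    exceptions = {residue_class(r) for r in residues if r != consensus}
    return consensus, count / len(residues), exceptions


if __name__ == "__main__":
    toy = ["SAVAIGV", "SAVAIGV", "SGLAIGV", "SAVAIVV"]
    for col in range(7):
        print(col + 1, column_summary(toy, col))
```

A position where the exceptions fall into several classes, as at most positions of motif 1, carries no strict residue-type constraint.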
PHYLOGENY OF THE PASSENGER DOMAINS OF AT-2 PROTEINS
The phylogenetic tree of the passenger domains (Fig. 2B) was significantly different from that of the AT-2 domains (Fig. 2A). Proteins in clusters 1a, 1b, and 1d of Fig. 2A are found in clusters 4 and 5 of Fig. 2B, while cluster 1c proteins are found in clusters 4 and 9 of Fig. 2B (see Table S1 in the supplemental material [http://biology.ucsd.edu/~msaier/supmat/AT2]). Thus, cluster 1 proteins of Fig. 2A are found almost exclusively in clusters 4 and 5 of Fig. 2B. Cluster 2 proteins of Fig. 2A are distributed among 10 clusters of Fig. 2B, with no member in clusters 4, 5, or 9. Cluster 3 proteins of Fig. 2A are distributed among 16 clusters of Fig. 2B, but only 1 of these 16 clusters overlaps with the cluster 1 proteins of Fig. 2A, and only 2 overlap with the cluster 2 proteins of Fig. 2A. Thus, although the passenger domain tree reflects a greater degree of sequence divergence than the AT-2 domain tree, the passenger domains segregate roughly according to the phylogenetic groupings of their AT-2 domains. Moreover, whenever two proteins are phylogenetically closely related, the phylogenetic positions of their passenger domains correlate well with those of their AT-2 domains. Because the passenger domains show (i) greater variation in size, (ii) multiple repeat units, and (iii) greater sequence divergence than the AT-2 domains, the tree shown in Fig. 2A is expected to be more reliable than the tree in Fig. 2B. We therefore suggest that while shuffling of the passenger domains relative to the AT-2 domains may have occurred throughout the evolution of these proteins, such shuffling was a relatively rare event.
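The comparison of the two trees amounts to cross-tabulating cluster assignments: for each AT-2 domain cluster, which passenger-domain clusters do its members occupy, and how many of those are shared with other AT-2 clusters? A minimal sketch follows; the protein names and cluster labels are hypothetical placeholders, not the assignments in Table S1.

```python
from collections import defaultdict


def cross_tabulate(at2_clusters: dict[str, str],
                   passenger_clusters: dict[str, str]) -> dict[str, set[str]]:
    """Map each AT-2 domain cluster to the passenger-domain clusters its members occupy."""
    table: dict[str, set[str]] = defaultdict(set)
    for protein, at2 in at2_clusters.items():
        if protein in passenger_clusters:
            table[at2].add(passenger_clusters[protein])
    return dict(table)


def overlap(table: dict[str, set[str]], a: str, b: str) -> set[str]:
    """Passenger-domain clusters shared by AT-2 clusters a and b."""
    return table.get(a, set()) & table.get(b, set())


if __name__ == "__main__":
    # Hypothetical assignments for seven proteins.
    at2 = {"P1": "1", "P2": "1", "P3": "1", "P4": "2", "P5": "2", "P6": "3", "P7": "3"}
    passenger = {"P1": "4", "P2": "5", "P3": "4", "P4": "6", "P5": "7", "P6": "4", "P7": "8"}
    table = cross_tabulate(at2, passenger)
    print(table, overlap(table, "1", "2"), overlap(table, "1", "3"))
```

Few shared passenger clusters between different AT-2 clusters is what the text interprets as rough co-segregation of the two domains.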
LARGE INTERNAL REPEAT SEQUENCES IN THE PASSENGER DOMAINS OF AT-2 PROTEINS
Examination of the passenger domains revealed that these consist primarily of large repeat units of about 70 residues (60 to 80 residues for individual large repeat units). The larger proteins contain greater numbers of repeat units than the smaller proteins, and for each protein examined in detail, most of the passenger domains consist of these types of repeat units. For example, 53 repeat units were identified in the 3,068-residue protein Bfu1 of Burkholderia fungorum. These were multiply aligned as shown in Fig. 4. The alignment revealed that the best-conserved region is in the centers of these repeat units where the residue consensus motif for a 10-residue sequence is (A/T/S)(N/A/S)(T/S/A)(D/V/L)A(V/I)(N/G)(G/L/V)(A/S/G)(Q/A) (Fig. 4, bolded residues under the alignment).
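Because the conserved core of each large repeat is a 10-residue consensus with one to three alternatives per position, it can be written directly as a regular expression to locate candidate repeat centers in a passenger domain. A minimal sketch follows; the helper name and demo sequence are hypothetical, and an exact per-position match will miss repeat units that deviate from the consensus at any position.

```python
import re

# (A/T/S)(N/A/S)(T/S/A)(D/V/L)A(V/I)(N/G)(G/L/V)(A/S/G)(Q/A), one class per position.
CORE_CONSENSUS = ["ATS", "NAS", "TSA", "DVL", "A", "VI", "NG", "GLV", "ASG", "QA"]
CORE_PATTERN = re.compile("".join(f"[{alts}]" for alts in CORE_CONSENSUS))


def find_repeat_cores(seq: str) -> list[tuple[int, str]]:
    """Return (1-based start, matched peptide) for each non-overlapping exact match."""
    return [(m.start() + 1, m.group()) for m in CORE_PATTERN.finditer(seq)]


if __name__ == "__main__":
    demo = "MKK" + "ANTDAVNGAQ" + "G" * 60 + "SSSVAINLGA"
    print(find_repeat_cores(demo))
```

In a real passenger domain, the spacing between successive hits would be expected to approximate the 60- to 80-residue length of the repeat units.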
These two examples represent the only cases where the closest homologues in the protein are adjacent to each other on the tree. In all other cases, phylogenetically close homologues are distant from each other in the protein. For example, repeats 20 to 24 in the alignment shown in Fig. 4 are phylogenetically close (Fig. 5), but they represent repeats 32, 36, 25, 46, and 9, respectively, in the protein. Assuming that these sequence-similar repeats arose recently, we must conclude that they arose either by tandem duplications followed by shuffling or by a copy process, possibly involving polymerase hopping from one repeat unit in the DNA to another nontandem repeat. Such an event could have resulted from DNA looping during replication or from an event involving RNA polymerase and reverse transcriptase. Although the analysis shown in Fig. 5 suggests a mechanism of the latter type, we know of no experimental evidence supporting such a postulate. The proposed pathway for generation of all repeats in Bfu1 (assuming uniform rates of sequence divergence) is shown in Fig. S1 in the supplemental material (http://biology.ucsd.edu/~msaier/supmat/AT2). Repeats 20 to 34 occur on one primary branch of the phylogenetic tree (Fig. 5). The original precursor repeat unit (p) first duplicated and then diverged to give the precursors of repeats 33 and 34 (p33-34) and of repeats 20 to 32 (p20-32). The former primordial unit then duplicated a second time to give repeats 33 and 34. The precursor of repeats 20 to 32 (p20-32) underwent up to eight successive duplication events as follows:
REPEAT UNITS IDENTIFIED IN THE Yen1 PROTEIN
To exemplify the occurrence of repeat units of differing lengths in the AT-2 linker and passenger domains, we analyzed the 454-residue Yen1 protein in detail. The C-terminal 75 residues in Yen1 comprise the AT-2 domain. The linker region of 21 residues consists of three 7-residue repeat units (R71 to R73) (Table 5). The first 7-residue repeat unit (R71, beginning at position 365) is less similar in sequence to the other two repeat units (R72 and R73 at positions 372 and 379, respectively) than these latter two sequences are to each other (Table 5).
Upstream of the 14-residue repeats are the apparent ~60-residue repeats (Table 5). Repeats R602 and R603 show the greatest identity (16 of 60 positions, or 27% identity). Next, R601 and R603 exhibit 8 identities out of 60 (13% identity), while R602 and R604 exhibit 7 identities out of 40 (18% identity). All of the AT-2 protein passenger domains proved to be homologous in the regions exhibiting the ~60-residue repeat units, although they differed in their degrees of sequence similarity and numbers of repeat units. These results explain why all of these proteins are homologous and why proteins of very different sizes cluster together on the phylogenetic tree (Fig. 2B).
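Pairwise percent identity is the number of identical positions divided by the number of aligned positions compared. A minimal sketch follows, assuming that columns with a gap in either repeat are excluded from the denominator (the text does not state how gaps were treated); the helper name is hypothetical.

```python
def percent_identity(a: str, b: str) -> tuple[int, int, float]:
    """Identities, positions compared, and percent identity for two aligned,
    equal-length sequences; columns with a gap ('-') in either are skipped."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0, 0, 0.0
    identities = sum(x == y for x, y in pairs)
    return identities, len(pairs), 100.0 * identities / len(pairs)


if __name__ == "__main__":
    # The three comparisons quoted for the Yen1 repeats.
    for ident, length in [(16, 60), (8, 60), (7, 40)]:
        print(f"{ident}/{length} = {100 * ident / length:.1f}%")  # 26.7%, 13.3%, 17.5%
```

Rounded to whole numbers, the three comparisons correspond to 27%, 13%, and 18% identity.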
CONCLUSIONS AND PERSPECTIVES
In this minireview, we summarize the available experimental evidence and report bioinformatic analyses of the newly discovered AT-2 proteins, believed to form trimeric structures in the outer membranes of gram-negative bacteria. These trimers are thought to form 12-ß-strand transmembrane pores that allow export of the N-terminal passenger domains from the periplasm to the external milieu (see introduction). Our analyses have led to several important evolutionary conclusions or suggestions. (i) AT-2 domains are found in proteobacteria of the α-, ß-, and γ-subdivisions and their phage, although sequence-divergent members of the family are found in other gram-negative bacterial kingdoms (7). (ii) Two homologues found outside of these bacterial subkingdoms were from a low-GC-content gram-positive bacterium with an incompletely sequenced genome. We suggest that these two sequences resulted from DNA contamination. (iii) Several paralogues can be present in a single organism; for example, Haemophilus somnus 2336 has five paralogues of similar AT-2 domain sequence, while Burkholderia cepacia R18194 has four AT-2 domain paralogues, three of which are similar in sequence. (iv) AT-2 sequence similarity does not imply similarly sized passenger domains, as phylogeny of the AT-2 domains does not correlate well with protein size. (v) Although there is a poor correlation between position in the AT-2 domain tree and protein size, there is a reasonably good correlation between AT-2 domain phylogeny and the source organismal type (with a few potential exceptions). (vi) Linker domains appear to consist of 7-residue repeats. (vii) Adjacent to these are 14-residue repeats that may have arisen by sequence divergence of duplicated 7-residue repeats (8). Finally, most of the passenger domains consist of variable numbers of ~60-residue repeats.
Points iii to v above imply that the shuffling of AT-2 domains relative to their passenger domains and/or the modification of passenger domain size during recent evolution has occurred repeatedly, even though horizontal transfer of these proteins across bacterial phylogenetic groupings has been relatively rare. It also appears that recent AT-2 domain-encoding gene duplication events have given rise to most of the paralogues in organisms such as H. somnus and B. cepacia. A recent increase or decrease in the numbers of ~60-residue repeat units in the passenger domains is largely responsible for the size variations observed for close homologues.
Sequence analyses led to a very tentative but plausible suggestion that AT-2 domains may have evolved from domains that arose by repeated duplication of a genetic element of 21 nucleotides, encoding a 7-amino-acyl residue peptide. The probable consensus sequence of this peptide was (D/E)(Q/N)(R/K)(F/I)(Q/D)(Q/K)(V/L), a strongly hydrophilic heptapeptide with only two hydrophobic residue positions. This repeat unit could be identified in the N-terminal "linker" regions of several AT-2 domains; this hydrophilic linker connects the AT-2 domain with the passenger domain. Surprisingly, the repeat could also be found throughout most of the C-terminal regions of other proteins that exhibit certain characteristics of AT-2 proteins and that were retrieved with PSI-BLAST iterations (Table 3). If this repeated heptapeptide provided the basis for formation of the AT-2 domain, extensive sequence divergence must have occurred to form the more hydrophobic, strongly amphipathic, ß-structured AT-2 domains that are thought to mediate pore formation.
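The statement that the heptad is strongly hydrophilic can be checked with a short calculation. The sketch below assumes the Kyte-Doolittle hydropathy scale (not necessarily the scale the authors used); it enumerates all 2^7 = 128 peptides permitted by the two-residue consensus and reports the range of their mean hydropathy, plus the positions at which both alternatives are hydrophobic.

```python
from itertools import product

# Kyte-Doolittle values (assumed scale) for the residues occurring in the consensus.
KD = {"D": -3.5, "E": -3.5, "Q": -3.5, "N": -3.5, "R": -4.5, "K": -3.9,
      "F": 2.8, "I": 4.5, "V": 4.2, "L": 3.8}
HEPTAD_CONSENSUS = ["DE", "QN", "RK", "FI", "QD", "QK", "VL"]


def mean_hydropathy(peptide: str) -> float:
    return sum(KD[aa] for aa in peptide) / len(peptide)


# Every peptide allowed by the consensus: 2**7 = 128 variants.
variants = ["".join(p) for p in product(*HEPTAD_CONSENSUS)]
scores = [mean_hydropathy(v) for v in variants]
hydrophobic_positions = [i + 1 for i, alts in enumerate(HEPTAD_CONSENSUS)
                         if all(KD[aa] > 0 for aa in alts)]

print(f"{len(variants)} variants, mean hydropathy {min(scores):.2f} to {max(scores):.2f}")
print("positions with two hydrophobic alternatives:", hydrophobic_positions)
```

With this scale every variant has a negative mean hydropathy (about -1.76 to -1.31), and only positions 4 and 7 offer two hydrophobic alternatives, consistent with the description of the heptad.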
We identified two particularly well-conserved sequence motifs in the AT-2 domain that are likely to be of structural and functional significance. One is in the N-terminal region of the AT-2 domain in a strongly hydrophobic region (Fig. 3, peak H1), while the other is in a strongly amphipathic region in putative transmembrane ß-strand 2 (Fig. 3, peak A3). The former proved to be more hydrophobic than the latter. Most interestingly, motif 1 exhibited AT-2 domain-specific residue-type differences that were lacking in motif 2, whose pattern of conservation in the different clusters was typical of the entire AT-2 family. Since only motif 1 suggested residue (and hence functional) specialization, and since no single position was fully conserved, we suggest that the pores formed from AT-2 domains are fairly flexible and nonspecific, accommodating a range of passenger proteins. It is possible, however, that motif 1 contributes to substrate protein selectivity.
The proposed mechanism of membrane transport by proteins like YadA, Hia, and Hap is by no means established. The notion that 12-stranded ß-barrels form export portals is in doubt. For example, in the crystal structure of the 12-stranded ß-barrel from the E. coli outer membrane phospholipase A2, the ribbon diagram shows a pore formed by the barrel, but the space-filling model indicates that this channel is too small to permit export of a polypeptide in either α-helical or ß form (21, 33, 41, 42). These limitations of the biochemical evidence should be kept in mind, and discussed, when evaluating the overall validity of the proposed translocation model. A crucial point in this respect is the proposed multimeric structure of AT-2 C domains. The conclusion that AT-2 proteins are homotrimers should be evaluated carefully in view of the potential inability of a 12-stranded ß-barrel to transport polypeptide strands. In this regard, however, it is also important to note that transmembrane channels can be flexible, opening and closing in response to conformational changes that alter the angle of the polypeptide relative to the plane of the membrane (35).
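The size argument can be made semi-quantitative with standard ß-barrel geometry (Murzin, Lesk, and Chothia): for n strands with shear number S, the strand tilt α satisfies tan α = S·a/(n·b), and the radius to the Cα atoms is R = b/(2 sin(π/n) cos α), where a ≈ 3.3 Å is the rise per residue along a strand and b ≈ 4.4 Å is the interstrand spacing. This is a minimal sketch; the shear numbers below are illustrative assumptions, not values from the text. Because side chains pointing into the barrel occupy much of the interior, the free lumen is considerably narrower than R.

```python
import math

A_RISE = 3.3     # Å, rise per residue along a strand (typical value, assumed)
B_SPACING = 4.4  # Å, distance between adjacent strands (typical value, assumed)


def barrel_radius(n_strands: int, shear: int) -> float:
    """Radius (Å) from the barrel axis to the strand Cα atoms."""
    tilt = math.atan(shear * A_RISE / (n_strands * B_SPACING))  # strand tilt angle
    return B_SPACING / (2 * math.sin(math.pi / n_strands) * math.cos(tilt))


if __name__ == "__main__":
    # Illustrative (n, S) pairs only.
    for n, s in [(8, 10), (12, 12), (12, 16), (14, 16)]:
        print(f"n={n:2d} S={s:2d}: R = {barrel_radius(n, s):.1f} Å")
```

For n = 12 and S = 12 this gives R ≈ 10.6 Å to the Cα atoms; once inward-facing side chains are accounted for, the remaining space is small, which helps explain why such a channel can appear too narrow for a polypeptide in a space-filling model.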
Outer membrane porins with 8 transmembrane ß-strands (TßSs) (OmpA of E. coli, TC 1.B.6 [12, 34]), 10 probable TßSs (TP0453 of Treponema pallidum, TC 1.B.45 [14]), 12 TßSs (Tsx of E. coli, TC 1.B.10 [50]; NalP of Neisseria meningitidis, TC 1.B.12 [32]; TolC of E. coli, TC 1.B.11 [23]), and 14 TßSs (FadL of E. coli, TC 1.B.9 [47]) have been identified and have been shown to have porin activities in spite of their small pore sizes. Quite conceivably, pore activity is transient, being induced by specific conditions such as substrate binding or response to osmotic conditions (3, 35).
The analyses reported in this minireview make several predictions concerning the structures, functions, and evolutionary origins of a novel family of autotransporter proteins. A four-stranded transmembrane ß-sheet possibly serves as the pore-forming element, and oligomerization is likely to be required for function, as is the case for all well-characterized channel-forming peptides (38-40). The functional significance of conserved motifs 1 and 2 has not been investigated. The fact that all passenger domains are homologous, consisting of variable numbers of large repeats, suggests a unified general function in adhesion/macromolecular recognition. Further studies will be required to understand the structure-function relationships of these interesting virulence-related proteins.
ADDENDUM IN PROOF
After the completion of this work, the complete genome sequence of D. hafniense Y51 became available (H. Nonaka et al., J. Bacteriol. 188:2262-2274, 2006). The two sequences, Dha1 and Dha2, that we suspected to be contaminants are not present in the completed sequence.
ACKNOWLEDGMENTS
This work was supported by NIH grants GM64368 and GM077402 from the National Institute of General Medical Sciences.
We thank Mary Beth Hiller for her assistance in the preparation of the manuscript.
REFERENCES