GAG-POL Polyprotein
HIV POL encodes the viral enzymes protease, reverse transcriptase, and integrase. The enzymes are produced as a GAG-POL precursor polyprotein, which is processed by the viral protease.
Protease - p15
Reverse Transcriptase - p51
Reverse Transcriptase and RNase H - p66
RNase H - p15
Integrase - p31
Isoforms:
GAG-POL Polyprotein (1003 amino acids)
p160 - GAG-POL Polyprotein
Cleavage site:
Localization:
Function:
Precursor for the viral enzymes.
During viral maturation, the viral protease cleaves the Pol polyprotein away from Gag and further digests it into 4 proteins: protease, reverse transcriptase, RNase H, and integrase.
Additional Information:
All of the pol gene products can be found in the capsid of the virion.
The Gag-Pol precursor is generated by a ribosomal frameshift at the C-terminus of GAG (Ref. #4 & #5).
The ribosomal frameshift is triggered by a specific cis-acting RNA motif (Ref. #4 & #5).
The cis-acting RNA motif consists of a heptanucleotide sequence followed by a short stem-loop in the distal region of the GAG RNA.
Without interrupting translation, the ribosome shifts to the pol reading frame ~5% of the time that the cis-acting RNA motif is encountered.
The frequency of ribosomal frameshifting is consistent with the 20:1 ratio of Gag to Gag-Pol precursors.
Protease cleavage does not occur efficiently, and 50% of the Reverse Transcriptase protein remains covalently associated with RNase H.
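The link between the ~5% frameshift rate and the 20:1 precursor ratio can be checked with a line of arithmetic. The sketch below assumes that every ribosome reaching the motif produces either Gag (no shift) or Gag-Pol (shift) and that nothing else affects the ratio. The function name is illustrative and not part of any resource cited here.

```python
def gag_to_gagpol_ratio(frameshift_rate: float) -> float:
    """Expected Gag : Gag-Pol ratio for a given frameshift probability.

    Assumes each translation event yields exactly one of the two precursors.
    """
    if not 0.0 < frameshift_rate < 1.0:
        raise ValueError("frameshift_rate must be strictly between 0 and 1")
    return (1.0 - frameshift_rate) / frameshift_rate


ratio = gag_to_gagpol_ratio(0.05)
print(f"Gag : Gag-Pol = {ratio:.0f} : 1")  # prints: Gag : Gag-Pol = 19 : 1
```

Under this simple model a 5% rate gives 19:1, which rounds to the roughly 20:1 ratio quoted above.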
Genomic Location:
Reference Sequences:
HIV-1 (HXB2):
FFREDLAFLQ GKAREFSSEQ TRANSPTRRE LQVWGRDNNS PSEAGADRQG TVSFNFPQVT LWQRPLVTIK 70
IGGQLKEALL DTGADDTVLE EMSLPGRWKP KMIGGIGGFI KVRQYDQILI EICGHKAIGT VLVGPTPVNI 140
IGRNLLTQIG CTLNFPISPI ETVPVKLKPG MDGPKVKQWP LTEEKIKALV EICTEMEKEG KISKIGPENP 210
YNTPVFAIKK KDSTKWRKLV DFRELNKRTQ DFWEVQLGIP HPAGLKKKKS VTVLDVGDAY FSVPLDEDFR 280
KYTAFTIPSI NNETPGIRYQ YNVLPQGWKG SPAIFQSSMT KILEPFRKQN PDIVIYQYMD DLYVGSDLEI 350
GQHRTKIEEL RQHLLRWGLT TPDKKHQKEP PFLWMGYELH PDKWTVQPIV LPEKDSWTVN DIQKLVGKLN 420
WASQIYPGIK VRQLCKLLRG TKALTEVIPL TEEAELELAE NREILKEPVH GVYYDPSKDL IAEIQKQGQG 490
QWTYQIYQEP FKNLKTGKYA RMRGAHTNDV KQLTEAVQKI TTESIVIWGK TPKFKLPIQK ETWETWWTEY 560
WQATWIPEWE FVNTPPLVKL WYQLEKEPIV GAETFYVDGA ANRETKLGKA GYVTNRGRQK VVTLTDTTNQ 630
KTELQAIYLA LQDSGLEVNI VTDSQYALGI IQAQPDQSES ELVNQIIEQL IKKEKVYLAW VPAHKGIGGN 700
EQVDKLVSAG IRKVLFLDGI DKAQDEHEKY HSNWRAMASD FNLPPVVAKE IVASCDKCQL KGEAMHGQVD 770
CSPGIWQLDC THLEGKVILV AVHVASGYIE AEVIPAETGQ ETAYFLLKLA GRWPVKTIHT DNGSNFTGAT 840
VRAACWWAGI KQEFGIPYNP QSQGVVESMN KELKKIIGQV RDQAEHLKTA VQMAVFIHNF KRKGGIGGYS 910
AGERIVDIIA TDIQTKELQK QITKIQNFRV YYRDSRNPLW KGPAKLLWKG EGAVVIQDNS DIKVVPRRKA 980
KIIRDYGKQM AGDDCVASRQ DED 1003
Length: 1003 amino acids
Molecular Weight: 113779 Da
Protein Domains/Folds/Motifs:
p15 - Protease (99 amino acids)
p51 - Reverse Transcriptase (440 amino acids)
p66 - RT and RNase H (560 amino acids)
p15 - RNase H (120 amino acids)
p31 - Integrase (288 amino acids)
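The product lengths listed here can be cross-checked against each other and against the 1003-residue total. The sketch below rests on two assumptions that are not stated in this entry: the mature products are contiguous in the order protease, reverse transcriptase, RNase H, integrase, and any residues not accounted for sit at the N-terminus of the sequence. The names `PRODUCT_LENGTHS` and `layout` are illustrative only.

```python
POL_LENGTH = 1003  # total length reported for the HXB2 sequence

# Non-overlapping products in assumed N-to-C order, with the listed lengths.
# p66 is not a separate entry: it is the p51 portion plus RNase H.
PRODUCT_LENGTHS = [
    ("Protease", 99),
    ("RT (p51 portion)", 440),
    ("RNase H", 120),
    ("Integrase", 288),
]


def layout(total, products):
    """Return (unassigned N-terminal residues, {name: (start, end)}), 1-based."""
    leader = total - sum(length for _, length in products)
    coords = {}
    start = leader + 1
    for name, length in products:
        coords[name] = (start, start + length - 1)
        start += length
    return leader, coords


leader, coords = layout(POL_LENGTH, PRODUCT_LENGTHS)
p66_length = coords["RNase H"][1] - coords["RT (p51 portion)"][0] + 1

print("unassigned N-terminal residues:", leader)
for name, (start, end) in coords.items():
    print(f"{name}: {start} - {end}")
print("p66 length:", p66_length)
```

Under these assumptions the p51 and RNase H lengths add up to the 560 residues given for p66, and 56 residues precede the protease.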
Secondary Structure prediction:
Low Complexity Region - seg:
N-glycosylation:
2 potential sites
NGSN (832 - 835)
NFTG (835 - 838)
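The two sites above overlap at residue 835, so a plain left-to-right pattern search would report only the first. The sketch below assumes the sites were found with the PROSITE-style N-glycosylation pattern N-{P}-[ST]-{P} (the entry does not say which tool was used) and scans residues 821 - 840 of the sequence shown earlier with a lookahead so that overlapping hits are kept.

```python
import re

# Residues 821 - 840 of the HXB2 sequence listed in this entry.
FRAGMENT = "GRWPVKTIHTDNGSNFTGAT"
FRAGMENT_START = 821

# N-{P}-[ST]-{P} written as a regex; the lookahead makes matches zero-width,
# so a second site starting inside the first one is still reported.
NGLYC = re.compile(r"(?=(N[^P][ST][^P]))")


def find_nglyc_sites(seq, offset):
    """Return (motif, start, end) tuples in 1-based sequence coordinates."""
    return [
        (m.group(1), m.start() + offset, m.start() + offset + 3)
        for m in NGLYC.finditer(seq)
    ]


sites = find_nglyc_sites(FRAGMENT, FRAGMENT_START)
for motif, start, end in sites:
    print(f"{motif} ({start} - {end})")
```

On this fragment the scan reports NGSN (832 - 835) and NFTG (835 - 838), matching the two entries listed.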
N-myristoylation:
15 potential sites
GAddTV (83 - 88)
GGigGF (104 - 109)
GCtlNF (150 - 155)
GQhrTK (351 - 356)
GAhtND (514 - 519)
GLevNI (645 - 650)
GIiqAQ (659 - 664)
GIggNE (696 - 701)
GIdkAQ (719 - 724)
GQvdCS (767 - 772)
GQetAY (809 - 814)
GSnfTG (833 - 838)
GVveSM (864 - 869)
GGigGY (904 - 909)
GGysAG (907 - 912)
Amidation:
none
Protein kinase C:
7 potential sites
TrR (27 - 29)
TiK (68 - 70)
StK (223 - 225)
TgK (506 - 508)
TpK (541 - 543)
TnR (614 - 616)
TvR (840 - 842)
Casein kinase II:
18 potential sites
TrrE (27 - 30)
SpsE (40 - 43)
TgaD (82 - 85)
TvlE (87 - 90)
SpiE (158 - 161)
TemE (194 - 197)
TvlD (262 - 265)
SdlE (346 - 349)
TkiE (355 - 358)
TtpD (370 - 373)
TvnD (408 - 411)
TltD (623 - 626)
SglE (644 - 647)
SesE (668 - 671)
ThlE (781 - 784)
TgqE (808 - 811)
SagE (910 - 913)
SrqD (998 - 1001)
Tyrosine kinase:
3 potential sites
KigpEnp.Y (204 - 211)
KqnpDiviY (328 - 336)
KaqdEhekY (722 - 730)
cAMP / cGMP kinase:
3 potential sites
KKdS (220 - 223)
KKkS (257 - 260)
RKyT (280 - 283)
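The kinase site lists above have the shape of short consensus-pattern hits. As a sketch, the code below assumes the PROSITE-style patterns [ST]-x-[RK] (protein kinase C), [ST]-x(2)-[DE] (casein kinase II) and [RK](2)-x-[ST] (cAMP / cGMP kinase), which this entry does not name explicitly, and applies them to residues 211 - 283 of the sequence shown earlier. The tyrosine kinase pattern is left out to keep the example short.

```python
import re

# Residues 211 - 283 of the HXB2 sequence listed in this entry.
FRAGMENT = (
    "YNTPVFAIKKKDSTKWRKLVDFRELNKRTQDFWEVQLGIP"
    "HPAGLKKKKSVTVLDVGDAYFSVPLDEDFRKYT"
)
FRAGMENT_START = 211

# Assumed consensus patterns, written as regular expressions.
PATTERNS = {
    "Protein kinase C": r"[ST].[RK]",
    "Casein kinase II": r"[ST]..[DE]",
    "cAMP / cGMP kinase": r"[RK]{2}.[ST]",
}


def scan(seq, pattern, offset):
    """Return (motif, start, end) for every hit, overlapping ones included."""
    regex = re.compile(f"(?=({pattern}))")
    return [
        (m.group(1), m.start() + offset, m.start() + offset + len(m.group(1)) - 1)
        for m in regex.finditer(seq)
    ]


hits = {name: scan(FRAGMENT, pat, FRAGMENT_START) for name, pat in PATTERNS.items()}
for name, found in hits.items():
    for motif, start, end in found:
        print(f"{name}: {motif} ({start} - {end})")
```

Within this window the scan recovers the same sites that the lists give for residues 211 - 283: one protein kinase C site (223 - 225), one casein kinase II site (262 - 265) and all three cAMP / cGMP kinase sites.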
Cell attachment motif:
none
Asp Protease motif:
1 potential site
(22 - 33)
Asp Prot Retro motif:
1 potential site
(20 - 89)
Cysteine-rich Region:
none
Tryptophan-rich Region:
1 potential site
(553 - 569)
Zinc-finger CCHC motif:
none
Leucine Zipper motif:
none
Protein-Protein Interactions:
Primary and Secondary Database Entries:
Identifiers:
ViralZone: HIV-1
PDB/MMDB: Search for HIV-1 & POL
SwissProt: P04585 (HIV-1 HXB2 Pol)
EMBL: K03455; AAB50259.1 [EMBL/GenBank/DDBJ]
PIR: UNKNOWN
HIV: K03455; POL$HXB2
MEROPS: A02.001
InterPro:
IPR000477 - RNA-directed DNA polymerase (RT) family
IPR001037 - Integrase C-terminal family
IPR001584 - Integrase catalytic domain
IPR001969 - Eukaryotic/viral aspartic protease active site
IPR001995 - Retroviral Aspartic Protease family
IPR002156 - RNase H domain
IPR003308 - Integrase N-terminal zinc-binding domain
IPR009007 - Acid Protease domain
Pfam:
PF00078 - RVT
PF00665 - RVE
PF00077 - RVP
PF00075 - RNase H
PF00552 - Integrase
PF02022 - Integrase Zinc-binding
Prints: none
ProDom:
PD186096 (residues 13 - 72)
PD000261 (residues 156 - 217)
PD580497 (residues 184 - 227)
PD492067 (residues 218 - 260)
PD404869 (residues 218 - 285)
PD000379 (residues 261 - 303)
PD513590 (residues 276 - 316)
PD474846 (residues 294 - 389)
PD000698 (residues 390 - 451)
PD495523 (residues 462 - 593)
PD390352 (residues 589 - 705)
PD000727 (residues 594 - 661)
PD416714 (residues 664 - 704)
PD582846 (residues 675 - 712)
PD685225 (residues 676 - 705)
PD502558 (residues 699 - 758)
PD000915 (residues 716 - 770)
PD000348 (residues 771 - 926)
PD000723 (residues 934 - 985)
PD371748 (residues 940 - 981)
SCOP:
SSF56672 - DNA/RNA polymerase
SSF50630 - Acid protease
SSF53098 - RNase H-like protein
SSF46919 - Integrase N-terminal Zn-binding domain
SSF50122 - Integrase C-terminal DNA-binding domain
BLOCKS: P04585
Prosite: P04585
ProtoNet: P04585
ProtoMap: P04585
PRESAGE: P04585
Database of Interacting Proteins: P04585
ModBase: P04585
Swiss-2DPAGE: 2D gel
BioAfrica Tools:
- Pol Protein Data Mining Tool provides real-time analysis of HIV-1 Pol isolates
- HIV Structure BLAST searches for similar HIV sequences that have known structures
- HIV Proteomics Resource contains protein sequence and structure analysis tools
Reviews and References:
1 - HIV Sequence Compendium 2000. Kuiken CL, Foley B, Hahn B, Korber B, Marx PA, McCutchan F, Mellors JW, Mullins JI, Sodroski J, Wolinsky S. Theoretical Biol. & Biophys. Group, Los Alamos Nat Lab, LA-UR 01-3860 [Read it online: Compendium]
2 - Retroviruses. Coffin JM, Hughes SH, Varmus HE. CD-ROM ed. (2002) Cold Spring Harbor Laboratory Press [Read it online: NCBI Bookshelf]
3 - Molecular Characteristics of HIV-1 Subtype C Viruses from KwaZulu-Natal, South Africa: Implications for Vaccine and Antiretroviral Control Strategies. Gordon M, De Oliveira T, Bishop K, Coovadia HM, Madurai L, Engelbrecht S, Janse van Rensburg E, Mosam A, Smith A, Cassol S. Journal of Virology 77(4): 2587-2599 (2003) [pubmed: 12551997]
4 - Characterization of ribosomal frameshifting in HIV-1 Gag-Pol expression. Jacks T, Power MD, Masiarz FR. Nature 331: 280-283 (1988) [pubmed: 2447506]
5 - Human immunodeficiency virus type 1 gag-pol frameshifting is dependent on mRNA secondary structure: Demonstration by expression in vivo. Parkin NT, Chamorro M, Varmus HE. J Virol 66: 5147-5151 (1992) [pubmed: 1321294]