In silico analysis of highly conserved cytotoxic T-cell epitopes in the structural proteins of African swine fever virus

Abstract Background and Aim: African swine fever (ASF) is a viral disease of pigs caused by ASF virus (ASFV). High mortality and the lack of available treatments have severely impacted the swine industry, resulting in huge global economic losses. In response to the dire need for vaccines, this study aims to identify highly conserved cytotoxic T-cell epitopes in the ASFV structural proteins pp220, pp62, p72, p30, and CD2v through an immunoinformatics approach. Materials and Methods: The amino acid sequences of the structural proteins were retrieved from the National Center for Biotechnology Information protein database. The sequences were clustered in CD-HIT Suite, and the resulting representative sequences were aligned in Clustal Omega. Highly conserved sequences were identified in the Protein Variability Server and used as reference sequences for the cytotoxic T-cell epitope mapping. Epitopes were predicted using the tools in the Immune Epitope Database. Peptides which bind to the swine major histocompatibility complex with IC50 binding scores >500 nM were filtered out. Epitopes classified as potentially toxic or cross-reactive with the swine proteome sequences were excluded from the study. The epitopes were docked with the swine leukocyte antigen-1*0401 (SLA-1*0401), and the binding affinity, the binding energy, and the root-mean-square deviation (RMSD) per residue of the epitope-SLA complexes formed were determined and compared with those of the influenza epitope used as positive control. Results: A total of 112 highly conserved fragments with Shannon variability index ≤0.1 were identified. These include 66, 12, 26, 6, and 2 highly conserved fragments from ASFV proteins pp220, pp62, p72, p30, and CD2v, respectively. From these reference sequences, 35 nonameric peptides were selected for the list of candidate cytotoxic T-cell epitopes. These include 26 epitopes for pp220, 7 for pp62, 6 for p72, and one each for p30 and CD2v. 
Bioinformatics analysis classified the peptides as non-toxic. Further evaluations of epitopes showed that these are less likely to cross-react with the domestic swine proteome sequences. This study identified candidate epitopes from pp220 (IADAINQEF, FLNKSTQAY, QIYKTLLEY, and SLYPTQFDY), and pp62 (GTDLYQSAM, FINSTDFLY, and STDFLYTAI) which can bind to at least two widely distributed SLAs in pig populations. The immunogenicity of candidate peptides RSNPGSFYW, DFDPLVTFY, AIPSVSIPF, and VVFHAGSLY was validated by the acceptable binding affinities, binding energies, and RMSD of the peptide-SLA complexes formed. Results were also comparable with the crystal structure of an SLA-epitope complex in the database. Conclusion: This is the first study to identify highly conserved cytotoxic T-cell epitopes in the structural proteins of ASFV. Overall, the results of in silico evaluations showed that the identified highly conserved cytotoxic T-cell epitopes may be used as part of future vaccine formulations against ASFV infection in domesticated pigs. Nonetheless, these findings require in vitro and in vivo validation before application.


Introduction
African swine fever (ASF) is a viral disease of pigs with mortality approaching 100% [1]. It was first identified in Kenya in the 1920s and spread to Europe in the middle of the past century. In the 1990s, ASF was eradicated from the most affected regions (except Sardinia and sub-Saharan countries) through the implementation of biosafety regulations. In 2007, however, ASF again spread rapidly out of Africa. At present, ASF has been identified in Africa, Europe, America, and Asia [2][3][4]. ASF is caused by the ASF virus (ASFV), which can spread between soft ticks (Ornithodoros erraticus) and wild pigs through feeding or, less frequently, through direct transmission between wild pigs. In domesticated pigs, ASFV is readily transmitted, resulting in lethal hemorrhagic fever. The wide spread of ASF has been a great challenge for swine breeding. Although it does not pose a risk to human health, ASF has resulted in huge economic losses around the world [5]. ASFV, the only species of the genus Asfivirus within the family Asfarviridae, is an icosahedral enveloped virus with double-stranded DNA as its genetic material. The viral particle consists of an inner core coated by a thick protein core shell enclosing its genetic material. Available at www.veterinaryworld.org/Vol.14/October-2021/7.pdf The subsequent layers are the lipid envelope surrounding the protein core shell and the viral capsid, which forms the outermost cover of an intracellular virion. Virions budding out of the host cell carry a portion of the host plasma membrane, forming an extra envelope layer [6]. The length of its genome differs from one isolate to another (170-194 kbp) with 151-167 open reading frames [7]. ASFV encodes around 50 structural proteins with significant roles in genome replication and viral packaging. Vital structural proteins include pp220, pp62, p72, p30, and CD2v. These major components of viral particles are important in viral attachment, entry, replication, and processing [8]. 
Both pp220 and pp62 are polyprotein precursors that are proteolytically cleaved into mature virion proteins. Structural polyprotein pp220 is encoded by gene CP2475L, whose product is cleaved and post-translationally processed into p150, p37, p14, and p34. Structural protein pp62, a relatively shorter polypeptide, is encoded by gene CP530R. Similar to pp220, this polyprotein is proteolytically cleaved into the mature virion proteins p35 and p15. The post-translational products of pp220 and pp62 play roles in viral particle assembly and form the major component of the viral core shell [9,10]. Findings suggest that the expression of the major capsid protein p72 is a requirement for the processing of pp220 and pp62 in ASFV [11]. Antigenic structural protein p72 is encoded by gene B646L (VP72); it is a major protein component of viral capsids and functions in their formation [12]. Another crucial structural protein in ASFV is p30. It is encoded by the CP204L gene and is most abundantly expressed during the early phase of infection. It has crucial functions in viral entry and is known to be one of the most antigenic ASFV proteins [13,14]. Studies showed that recombinant p30 can be used to detect ASFV antibodies efficiently in both oral fluid and serum samples [15]. ASFV gene EP402R encodes the structural glycoprotein CD2v, which consists of a transmembrane region, an extracellular domain (N-terminal), and a cytosolic domain (C-terminal). CD2v is associated with immune response modulation and lymphocyte function impairment, which enhance the virulence of ASFV in domestic swine [16]. In addition, it has been shown to be directly involved in viral hemadsorption, resulting in increased spread of the virus [17].
Due to the significant roles of pp220, pp62, p72, p30, and CD2v in ASFV replication and assembly, and their key properties including immunogenicity, abundant expression, and involvement in virulence mechanisms, these structural proteins can be suitable immunotherapeutic targets in designing vaccines and treatments against ASFV. At present, there are no anti-viral agents or vaccines available against ASFV [5]. The significant economic losses brought about by ASFV infection in swine populations around the world warrant accelerated vaccine development.
In response to this pressing demand, the researchers used an immunoinformatics approach to identify potential immunotherapeutic agents against ASFV. Immunoinformatics tools and databases can hasten the identification of immunogenic proteins and peptides for vaccine design and reduce its cost [18,19]. Identification of epitopes through a computational approach is an advantageous and prototypical step in vaccine development before any further in vitro and in vivo evaluations are conducted.
Thus, this study aims to identify cytotoxic T-cell epitopes from the conserved amino acid sequences of pp220, pp62, p72, p30, and CD2v in ASFV. The use of highly conserved sequences is one of the most important factors to consider in vaccine development, because it helps counter immune epitope evasion by rapidly mutating viral antigens.

Ethical approval
Ethical approval is not necessary for this type of study.

Study period and location
The study was conducted from April to June 2021 at the Polytechnic University of the Philippines.

Retrieval and identification of highly conserved sequences
ASFV protein sequences of pp220, pp62, p72, p30, and CD2v were retrieved from the National Center for Biotechnology Information protein database on April 17, 2021. Search results were filtered using sequence lengths of 2475-2476, 530-534, 645-646, 203-204, and 360-404 for pp220, pp62, p72, p30, and CD2v, respectively. The lists of representative sequences for each antigen were obtained using CD-HIT Suite (http://weizhongli-lab.org/cdhit_suite/cgi-bin/index.cgi?cmd=cd-hit) with 1.00 as the sequence-identity cutoff. Unique representative sequences for each protein were aligned in Clustal Omega (https://www.ebi.ac.uk/Tools/msa/clustalo/) to generate a multiple sequence alignment. To identify highly conserved sequences for each protein, a Shannon variability threshold of H ≤0.1 was used in the protein variability server (PVS) tool (http://imed.med.ucm.es/PVS/). The first sequence in the alignment was used as the reference, and the varying residues were masked. Fragments with ≥9 residues were kept for cytotoxic T-cell epitope identification.
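The conservation filter described above can be illustrated with a minimal sketch. It assumes that the variability of an alignment column is the Shannon entropy H = -Σ p_i log2(p_i) of the symbols observed in that column, and that gap characters are counted like any other symbol. The function names, the gap handling, and the toy alignment are illustrative assumptions and do not reproduce the PVS implementation.

```python
import math
from collections import Counter


def shannon_entropy(column):
    """Shannon entropy (log base 2) of the symbols in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conserved_fragments(alignment, threshold=0.1, min_len=9):
    """Return (1-based alignment column, fragment) pairs taken from the first
    (reference) sequence, keeping runs whose columns all have H <= threshold
    and whose length is at least min_len residues."""
    reference = alignment[0]
    fragments = []
    start = None
    # Iterate one position past the end so a trailing run is also closed.
    for i in range(len(reference) + 1):
        conserved = (
            i < len(reference)
            and reference[i] != "-"  # assumption: a gap in the reference breaks a run
            and shannon_entropy([seq[i] for seq in alignment]) <= threshold
        )
        if conserved and start is None:
            start = i
        elif not conserved and start is not None:
            if i - start >= min_len:
                fragments.append((start + 1, reference[start:i]))
            start = None
    return fragments


# Toy alignment (hypothetical sequences): only column 11 varies.
toy_alignment = [
    "MKTAYIAKQRQISFVK",
    "MKTAYIAKQRQISFVK",
    "MKTAYIAKQRAISFVK",
]
print(conserved_fragments(toy_alignment))  # [(1, 'MKTAYIAKQR')]
```

In the toy alignment, the variable column splits the reference into a 10-residue run, which is kept, and a 5-residue run, which falls below the 9-residue minimum and is discarded.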

Cytotoxic T-cell epitope mapping
Cytotoxic T-cell epitopes which can bind to the swine leukocyte antigens (SLA) widely distributed in swine populations, SLA-1*0101, SLA-1*0401, and SLA-1*0801 [20], were mapped in the Immune Epitope Database. NetMHCcons, which integrates three well-known methods (NetMHC, NetMHCpan, and PickPocket), was used to produce more reliable results [21]. All binders with TAP and proteasome scores below zero were excluded from the study. Epitopes with major histocompatibility complex (MHC) IC50 <500 nmol/dm3 are classified as good binders [22]; thus, these were further evaluated for cross-reactivity and toxicity.
Protein-protein BLAST (BLASTp) was utilized to identify and exclude epitopes with a significant match to swine protein sequences. Sequence matches with e-values <1.0e-30 can be cross-reactive in some allergic individuals [23]. Although this result was found in humans, it was adopted in this work to provide a more precise cutoff reference for identifying potentially cross-reactive peptides. Epitopes were queried against the toxin peptide database in the ToxinPred tool (https://webs.iiitd.edu.in/raghava/toxinpred/multi_submit.php), using the default SVM method, to identify potentially toxic peptides that can cause damage to cells [24]. Peptides which overlap with cleavage and glycosylation sites in each protein sequence were excluded from the study. All remaining peptides were included as candidate cytotoxic T-cell epitopes.
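The selection criteria in this section can be summarized as a simple filter cascade. The sketch below is only an illustration under stated assumptions: the record layout, field names, peptide strings, and all scores are hypothetical placeholders. A peptide is assumed to be dropped when either its TAP or its proteasome score is negative, and a score of exactly zero is assumed to pass. The exclusion of peptides overlapping cleavage and glycosylation sites is not modeled.

```python
from dataclasses import dataclass

IC50_CUTOFF_NM = 500.0          # good binders have IC50 below this value [22]
EVALUE_CROSS_REACTIVE = 1e-30   # matches below this e-value are treated as cross-reactive [23]


@dataclass
class Prediction:
    """One predicted peptide with the scores used for filtering (hypothetical layout)."""
    peptide: str
    ic50_nm: float
    tap_score: float
    proteasome_score: float
    best_swine_evalue: float    # lowest BLASTp e-value against swine proteins
    predicted_toxic: bool       # toxicity label from a toxin-prediction tool


def is_candidate(p: Prediction) -> bool:
    """Apply the binding, processing, cross-reactivity, and toxicity filters."""
    good_binder = p.ic50_nm < IC50_CUTOFF_NM
    processed = p.tap_score >= 0 and p.proteasome_score >= 0
    cross_reactive = p.best_swine_evalue < EVALUE_CROSS_REACTIVE
    return good_binder and processed and not cross_reactive and not p.predicted_toxic


# Hypothetical placeholder records, not results from the study.
predictions = [
    Prediction("PEPTIDEAA", 120.0, 0.8, 1.1, 2.5, False),    # passes every filter
    Prediction("PEPTIDEBB", 850.0, 0.5, 0.9, 10.0, False),   # weak binder
    Prediction("PEPTIDECC", 200.0, -0.3, 1.0, 4.0, False),   # negative TAP score
    Prediction("PEPTIDEDD", 90.0, 0.4, 0.2, 1e-40, False),   # strong swine match
    Prediction("PEPTIDEEE", 60.0, 0.6, 0.7, 3.0, True),      # predicted toxic
]

candidates = [p.peptide for p in predictions if is_candidate(p)]
print(candidates)  # ['PEPTIDEAA']
```

Keeping each criterion as a named boolean makes it easy to report which filter removed a given peptide.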

SLA-epitope docking and molecular dynamics
The crystal structure of SLA-1*0401 bound to an influenza epitope (PDB ID: 3QQ3) was retrieved from the Research Collaboratory for Structural Bioinformatics Protein Data Bank (RCSB PDB). The PDB file was cleaned to obtain the isolated structure of SLA-1*0401 for subsequent docking procedures. For each structural antigen, the epitope with the highest IC50 for SLA-1*0401 (that is, the weakest predicted binding affinity) was docked with the cleaned PDB structure of SLA-1*0401 in the GalaxyPepDock server (http://galaxy.seoklab.org/cgi-bin/submit.cgi?type=PEPDOCK). The resulting top-scoring structures of the SLA-epitope complexes were further refined in the GalaxyRefineComplex server (http://galaxy.seoklab.org/cgi-bin/submit.cgi?type=COMPLEX). Refined SLA-epitope structures were viewed using the iCn3D tool (https://www.ncbi.nlm.nih.gov/Structure/icn3d/full.html). The dissociation constant (Kd) and the binding energy (ΔGbind) of each SLA-epitope complex were calculated in PRODIGY (https://bianca.science.uu.nl/prodigy/) at 300 K. This webserver tool uses non-interface surface properties and intermolecular contacts in its predictive model [25]. Estimated Kd and ΔGbind values for SLA-1*0401-bound epitopes were compared with those of the influenza epitope bound to SLA-1*0401. To determine the stability of SLA-epitope complex formation, molecular dynamics simulation was performed in the MDWeb server (http://mmb.irbbarcelona.org/MDWeb/) to obtain the root-mean-square deviation (RMSD) plot per residue. Molecular dynamics parameters were C-alpha Brownian dynamics for 100 ps with a 0.01 ps time step, a 3.8 Å distance between alpha carbon atoms, an output frequency of 10 steps, and a force constant of 167.36 kJ/mol·Å2. This process employs the GROMACS MD setup with solvation using the Amber-99sb force field [26].
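The dissociation constant and the binding energy are linked by the standard thermodynamic relation ΔG = RT ln(Kd). The sketch below shows this conversion at the 300 K used here. It assumes that ΔG is expressed in kcal/mol and Kd in mol/dm3; the function names and the input value are illustrative and are not taken from the results of this study or from the PRODIGY source code.

```python
import math

R_KCAL = 0.0019872  # gas constant in kcal/(mol*K)


def kd_from_dg(dg_kcal_per_mol, temp_k=300.0):
    """Dissociation constant (mol/dm3) from binding free energy: Kd = exp(dG / RT)."""
    return math.exp(dg_kcal_per_mol / (R_KCAL * temp_k))


def dg_from_kd(kd_molar, temp_k=300.0):
    """Binding free energy (kcal/mol) from dissociation constant: dG = RT ln(Kd)."""
    return R_KCAL * temp_k * math.log(kd_molar)


# Hypothetical binding energy, chosen only to illustrate the conversion.
example_dg = -10.0
example_kd = kd_from_dg(example_dg)
print(f"dG = {example_dg} kcal/mol -> Kd = {example_kd:.2e} mol/dm3")
```

Under this relation, a negative ΔG always corresponds to a Kd below 1 mol/dm3, and a more negative ΔG corresponds to a smaller Kd, that is, tighter binding.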

Highly conserved sequences in the ASFV structural proteins
Herein, the reference sequences identified for the ASFV structural proteins pp220, pp62, p72, p30, and CD2v have Shannon variability index ≤0.1. Sequences of pp220, pp62, p72, p30, and CD2v yielded 66, 12, 26, 6, and 2 highly conserved fragments, respectively. Table-1 shows the positions of highly conserved fragments identified for each antigen. Conserved sequences in pp220, pp62, and p72 are widely distributed as indicated by the presence of longer fragments adjacent to each other within the full stretch of their amino acid sequences. Antigens pp220, pp62, and p72 contain highly conserved fragments with lengths ranging from 9 to 102, 11 to 139, and 9 to 51, respectively. All these highly conserved fragments were used as reference sequences for cytotoxic T-cell mapping.

Highly conserved cytotoxic T-cell epitopes in the structural proteins of ASFV
All resulting cytotoxic T-cell epitopes mapped using the reference sequences were filtered by excluding potentially toxic and cross-reactive peptides. Table-2 shows the list of candidate cytotoxic T-cell epitopes in pp220, pp62, p72, p30, and CD2v indicating their corresponding SLA binders, sequence location, proteasome score, TAP score, and MHC IC50. A total of 35 highly conserved epitopes from the five structural proteins of ASFV were identified. These include 26 peptides from pp220, 7 from pp62, 6 from p72, and one each from p30 and CD2v. Analysis revealed that seven of these epitopes (four in pp220 and three in pp62) can bind to at least two widely distributed SLAs. In this study, all candidate cytotoxic T-cell epitopes have MHC IC50 ≤500 nmol/dm3 with positive TAP and proteasome scores.

Validation of identified cytotoxic T-cell epitopes
The PDB structure of SLA-1*0401 was utilized for the validation procedures in this study because the crystal structures of the other two SLAs (0101 and 0801) are not available in the RCSB protein data bank. Thus, for every antigen, the epitope with the lowest binding affinity (with the highest IC50) to SLA-1*0401 was employed for the docking procedures. The candidate epitope YQYNTPIYY in CD2v binds to SLA-1*0801; thus, it was not included in the docking procedures. Peptides docked to SLA-1*0401 include pp220 RSNPGSFYW (P1) with MHC IC50 392.5, pp62 DFDPLVTFY (P2) with MHC IC50 324.8, p72 AIPSVSIPF (P3) with MHC IC50 384.1, and p30 VVFHAGSLY (P4) with MHC IC50 132.3. The influenza epitope NSDTVGWSW bound to SLA-1*0401 was used as a control to serve as a supporting reference for positive binding. Figure-1 shows the 3D structures of the complexes formed by the P1, P2, P3, P4, and influenza epitopes (yellow) with the swine MHC I (magenta and blue) binding groove.
To evaluate whether the binding of epitopes to SLA-1*0401 is favorable, the dissociation constant (Kd) and the binding energy (ΔGbind) of each SLA-epitope complex formed were calculated. Table-3 lists these values. Figure-2 shows the plot of sequence position against the RMSD per residue for each complex formed by SLA-1*0401 with the P1, P2, P3, P4, and influenza epitopes. All RMSD values in each complex are lower than 0.9 Å.
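For readers unfamiliar with the quantity plotted in Figure-2, a per-residue RMSD can be computed from a trajectory as sketched below. This is a minimal illustration, not the MDWeb implementation: it assumes that every frame has already been superimposed on the reference structure, that one C-alpha coordinate (in Å) is stored per residue, and that the function name and the toy coordinates are hypothetical.

```python
import math


def per_residue_rmsd(frames, reference):
    """RMSD of each residue over all frames, relative to its reference position.

    frames: list of frames, each a list of (x, y, z) C-alpha coordinates.
    reference: list of (x, y, z) coordinates, one per residue.
    """
    n_frames = len(frames)
    rmsd = []
    for r, ref_xyz in enumerate(reference):
        # Sum of squared displacements of residue r across the trajectory.
        squared = sum(math.dist(frame[r], ref_xyz) ** 2 for frame in frames)
        rmsd.append(math.sqrt(squared / n_frames))
    return rmsd


# Toy two-residue trajectory (hypothetical coordinates, in Å).
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
frames = [
    [(0.0, 0.0, 0.0), (3.8, 0.3, 0.0)],
    [(0.0, 0.0, 0.0), (3.8, -0.3, 0.0)],
]

rmsd = per_residue_rmsd(frames, reference)
all_below_threshold = all(value < 0.9 for value in rmsd)  # 0.9 Å bound reported above
print(rmsd, all_below_threshold)
```

In the toy trajectory, the first residue never moves and the second oscillates by 0.3 Å, so the second residue has the larger RMSD while both stay below the 0.9 Å bound.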

Discussion
The recent emergence of ASFV in new countries and continents has significantly impacted worldwide pork production due to its high mortality [27]. At present, there are no commercially available vaccines [28] to protect domestic swine from this pathogen. Herein, this work aims to identify peptides that can be incorporated as part of a potential vaccine, targeting immunogenic proteins with crucial roles in the virulence of ASFV. Because of the high mutation rates of viral genomes, one of the most important considerations in vaccine design is avoiding immune epitope escape by using highly conserved sequences for the identification of epitopes. This strategy may enhance efficacy and provide longer-term immunity against viral antigens. Therefore, this study utilized highly conserved reference sequences for cytotoxic T-cell epitope mapping. The Shannon variability index of ≤0.1 for all the identified fragments indicates that the reference sequences are highly conserved [29,30]. Cytotoxic T-cell response is critical for effective resolution of viral infections. It has been suggested that cytotoxic T lymphocytes have important roles in ASFV protective immunity [1,31]. This study identified cytotoxic T-cell epitopes binding to the widely distributed SLAs in pig populations, which include SLA-1*0101, SLA-1*0401, and SLA-1*0801 [20,32]. All candidate epitopes identified herein have MHC IC50 values ≤500 nmol/dm3 and are classified as good binders [22]. Another significant property that must be considered in any drug and vaccine development procedure is safety. Retrieved epitopes were filtered by conducting in silico toxin-peptide mapping to determine potentially toxic epitopes, and Protein BLAST to identify epitopes with a significant match to the sequences of the Sus scrofa proteome in databases. Short peptides with e-values <1.0e-30 can be cross-reactive [23]. Candidate epitopes identified in this work can be classified as nontoxic. 
BLASTp assessments showed that the epitopes have e-values ranging from 0.19 to 75, making them less likely to cause cell damage and autoimmune reactions in swine. This study identified seven peptides from a total of 26 candidate cytotoxic T-cell epitopes that can promiscuously bind to at least two of the most widely distributed SLAs in swine. Promiscuous epitopes include IADAINQEF, FLNKSTQAY, QIYKTLLEY, and SLYPTQFDY in pp220; and GTDLYQSAM, FINSTDFLY, and STDFLYTAI in pp62. These cytotoxic T-cell peptides can potentially cover a wider range of swine populations around the world, thereby increasing peptide vaccine immunogenicity. Moreover, the resulting values of binding affinity (Kd), which are less than 1.0, and the negative binding energies (ΔGbind) of the candidate epitopes P1, P2, P3, and P4 with SLA-1*0401 (Table-3) indicate favorable complex formation and support the cytotoxic T-cell epitope mapping conducted in this study. To further demonstrate the immunogenicity of the candidate epitopes, the stability of each peptide-SLA complex was evaluated by plotting the RMSD per residue, which often represents the mobility of a residue in molecular dynamics simulation. More stable interactions show lower mobility and thus lower RMSD values. The RMSD values of all the residues in each peptide-SLA complex formed range from 0.0 to 0.9 Å, indicating stable interactions [33]. This is despite the use of the weakest (with the highest MHC IC50) cytotoxic T-cell epitopes (P1, P2, P3, and P4) from pp220, pp62, p72, and p30 in the docking procedures and molecular dynamics evaluations with SLA-1*0401. In addition, the Kd, ΔGbind, and RMSD plot of the influenza A H1N1 virus cytotoxic T-cell epitope bound to the crystal structure of SLA-1*0401 were also evaluated as a control, to further highlight the positive binding of the identified candidate cytotoxic T-cell epitopes to SLA. 
Table-3 and Figure-2 show that the binding affinity, binding energy, and RMSD plot of influenza A H1N1 epitope are comparable to the values estimated for candidate epitopes predicted in this work.

Limitations of the study
This study has provided preliminary data for the potential immunotherapeutic targets against ASFV. A computational approach was conducted in the analysis of highly conserved sequences and epitopes mapped from the structural proteins of ASFV. In addition, only cytotoxic T-cell epitopes of ASFV pp220, pp62, p72, p30, and CD2v were covered in this study. B-cell and helper T-cell epitopes can also be analyzed from the identified highly conserved sequences in succeeding studies. As with the step-by-step processes in any vaccine development, it is anticipated that in vitro and in vivo assays will be conducted in the future before application of the candidate epitopes inferred in this study.

Conclusion
This is the first work to identify highly conserved cytotoxic T-cell epitopes in the structural proteins pp220, pp62, p72, p30, and CD2v, which have important roles in the virulence of ASFV. In silico analysis showed that the candidate epitopes can be safely used in vaccine formulations, as they are classified as non-toxic and are less likely to cross-react with the domestic swine proteome. The peptides can potentially cover wider domestic swine populations and can bind stably to their corresponding SLA alleles, which indicates that the epitopes are potentially immunogenic. Finally, the highly conserved cytotoxic T-cell epitopes identified in this study may avoid immune evasion and offer longer and more effective protection against ASFV infection. The use of these candidate epitopes as part of future peptide or recombinant vaccines is anticipated to be validated both in vitro and in vivo in subsequent studies.