Analysis of UK and South African Strains of SARS-CoV-2 Using Resonant Recognition Model

With the newly discovered UK variant of the SARS-CoV-2 virus, which has been shown to be about 70% more infectious and possibly 30% more deadly, there is a need to understand why the mutations within this variant are so critical. Here, we have applied the Resonant Recognition Model (RRM) to computationally analyse the six most critical mutations within the UK variant, and we have found that these mutations significantly increase the RRM characteristics related to its viral activity. To test the approach, we have also applied the RRM to the three most critical mutations within the South African variant of SARS-CoV-2 and found that these mutations also increase the RRM characteristics related to viral activity, but not as much as those of the UK variant. This is in complete agreement with the known viral activities of these SARS-CoV-2 variants. Using the same approach, we have applied the RRM to predict possible, even more critical mutations, which probably have not yet occurred but may lead to even more virulent mutants of SARS-CoV-2. Both the UK variant mutations and the RRM-predicted mutations have been presented within the 3D structure of the spike protein during its interaction with the ACE2 receptor. It has been shown that all of these mutations are in close proximity to the interaction site between the spike protein and the ACE2 receptor.


Introduction
It has been more than a year since the COVID-19 pandemic started. The pandemic has been caused by the spread of the SARS-CoV-2 virus, a single-stranded RNA virus. As part of the normal evolution of any virus, SARS-CoV-2 regularly mutates. Some of these mutations do not significantly influence viral activity. However, in December 2020 a variant of SARS-CoV-2 containing 17 mutations was reported, first in Britain, and was found to be about 70% more infectious and possibly 30% more deadly [1]. This variant was named Variant of Concern 202012/01 by Public Health England and is part of the B.1.1.7 lineage of coronaviruses.
The newly identified and more virulent UK variant of SARS-CoV-2 (lineage B.1.1.7) carries an unusually high number of genetic mutations, resulting in multiple amino acid changes and deletions. Eight of these mutations occur in the spike protein, which is on the surface of the virus and is the first to interact with host cells. Six of the spike protein mutations are proposed to be particularly critical for viral infectivity. Three of them are of particular concern: N501Y (which reportedly leads to an increased binding affinity to human and murine ACE2 receptors), 69-70del (which arose from the outbreak linked to farmed minks) and P681H (which is close to the furin cleavage site that separates the S1 fragment, the part of the spike protein relevant for interaction with the ACE2 receptor, from the rest of the spike protein). It is also important to consider previously identified high-frequency mutations in B.1.1.7, namely D614G (which may cause moderately increased transmissibility), N439K (which may be an "escape mutant" for some neutralising antibodies) and Y453F (which may also be an "escape mutant" for some neutralising antibodies).
In addition, there is a newly identified South African variant of SARS-CoV-2 (lineage B.1.351), which is characterised by the following mutations: N501Y, K417N and E484K [2]. This variant was found to be less virulent than the UK variant, but more virulent than the original SARS-CoV-2 virus. The main concern with the South African variant is that the E484K mutation lies within the receptor binding domain and has been reported to be associated with escape from neutralising antibodies, and may thus adversely affect the efficiency of COVID-19 vaccines [2].
With more and more mutated variants of SARS-CoV-2 appearing, which differ in activity, virulence and mortality, there is a need for a theoretical, computational approach that can predict the strength of mutated viruses from known mutations alone, when these mutations appear in nature and even before they do.
Here, we have applied the RRM model to analyse critical mutations within the UK variant of SARS-CoV-2, as well as within the South African variant. We have correlated our results with the virulence of these variants. We have also applied the RRM model to predict the most critical mutations, which probably have not yet occurred but may lead to even more virulent mutants of SARS-CoV-2. Both the UK variant mutations and the RRM-predicted mutations have been mapped onto the 3D crystal structure of the SARS-CoV-2 spike receptor binding domain bound to the ACE2 receptor (PDBe 6m0j [15]), and the positions of these mutations relative to the binding site have been examined.

Resonant Recognition Model
The Resonant Recognition Model (RRM) is a theoretical, biophysical model that can analyse interaction and recognition between proteins and their targets, which could be other proteins, DNA, RNA or small molecules; it has been described previously in detail [3][4][5][6][7][8][9][10][11][12][13][14]. The RRM model is based on the finding that certain periodicities (frequencies) within the distribution of the energy of delocalised electrons along a protein are critical for the protein's biological function and/or interaction with its targets. This energy distribution is obtained by assigning to each amino acid a physical parameter representing the energy of its delocalised electrons. The next step is to calculate the spectral characteristics of this energy distribution (signal) using the Fourier transform, which transforms the linear numerical signal into the frequency domain, where it is characterised by the different frequencies that make up the original signal. When such spectra are compared using a cross-spectral function for proteins sharing the same biological function/interaction, these proteins have been shown to share the same frequency within the spectrum of energy distribution along the protein [3][4][5][6][7][8][9][10][11][12][13]. Peak frequencies in such a multiple cross-spectral function represent frequency components common to all analysed sequences. The comprehensive analysis done so far confirms that all protein sequences with a common biological function and/or interaction have a common frequency component, which is a specific feature of the observed function/interaction [3][4][5][12][13]. Thus, each specific biological function/interaction of a protein is characterised by its specific frequency. The strength of the signal, in either a spectrum or a cross-spectrum, can be measured by the signal-to-noise ratio (S/N), calculated as the ratio between the signal intensity at the specific frequency and the mean value over the whole spectrum.
A higher S/N value means a stronger signal at the specific frequency and, possibly, a stronger biological function/interaction related to this frequency.
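To make the spectral step concrete, here is a minimal Python/NumPy sketch of how a sequence could be turned into an RRM spectrum and an S/N value. It assumes the EIIP (electron-ion interaction potential) values commonly tabulated in the RRM literature as the per-amino-acid energy parameter; these should be checked against [3][4][5] before use. Subtracting the mean before the transform, using the amplitude |X(m)| as the "signal intensity", and snapping a requested frequency to the nearest DFT bin (frequencies m/N in cycles per residue, from 0 to 0.5) are our assumptions rather than details stated in this paper. The function names are hypothetical.

```python
import numpy as np

# Commonly tabulated EIIP values (Ry) used in RRM work; an assumption, verify against [3]-[5].
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def eiip_signal(seq):
    """Map a protein sequence to its numerical EIIP signal (KeyError on non-standard codes)."""
    return np.array([EIIP[aa] for aa in seq.upper()], dtype=float)


def rrm_spectrum(seq, n_points=None):
    """Return (frequencies, amplitudes) for RRM frequencies 0 < f <= 0.5."""
    x = eiip_signal(seq)
    x = x - x.mean()  # assumption: remove the DC term so it does not dominate
    n = n_points or len(x)  # n_points > len(seq) zero-pads the signal
    amplitudes = np.abs(np.fft.rfft(x, n=n))
    freqs = np.fft.rfftfreq(n)  # cycles per residue, 0 .. 0.5
    return freqs[1:], amplitudes[1:]  # drop the zero-frequency bin


def signal_to_noise(freqs, amplitudes, f):
    """S/N: amplitude at the bin nearest f divided by the mean amplitude of the spectrum."""
    k = int(np.argmin(np.abs(freqs - f)))
    return amplitudes[k] / amplitudes.mean()
```

For a toy sequence with period 4, such as "DDLL" repeated 25 times, the whole spectrum is concentrated in the single bin at f = 0.25, so that bin's S/N equals the number of bins (50). Real spike sequences give much lower values, such as the 0.90 to 2.84 range reported in this paper.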
Knowing the characteristic frequency of a particular protein function/interaction makes it possible to predict which amino acids predominantly contribute to this frequency and are consequently critical for the observed function/interaction. This can be achieved by small alterations of the amplitude in a single protein spectrum at the characteristic frequency, followed by observing which amino acids are most sensitive to this alteration [3][4][5][6][7][8][9][10][11]. These sensitive amino acids, so-called "hot-spot" amino acids, are related to the characteristic frequency and consequently to the corresponding biological function/interaction. "Hot-spot" amino acid predictions using the RRM have already been applied to a number of protein and DNA examples, including interleukin-2, the SV40 enhancer, epidermal growth factor (EGF), the Ha-ras p21 oncogene product, glucagons, haemoglobins, myoglobins and lysozymes [3][4][5][6][7][8][9][10][11]. In all these examples, the predicted "hot-spot" amino acids, although not sequentially linked, were found to be clustered in the protein 3D structure and positioned in and around the protein active or interaction site [3][4][5][6][7][8][9][10][11]. In a number of examples, the predicted "hot-spot" amino acids have been found to correspond to the positions of critical functional mutations [3][4][5][6][7]. In addition, it has been experimentally documented, in the example of influenza virus, that such predicted "hot-spot" amino acids denote amino acids crucial for protein function [16]. It is extremely important to understand that the RRM model is particularly efficient when applied to viruses. In general, viruses mutate very quickly, which makes it extremely hard to develop vaccines with current approaches based on homology. However, even when viruses mutate so often and so quickly, they still retain their specific functionality.
The RRM does not consider the homology of mutated viruses at all; instead, it looks for the characteristic parameter(s) identifying a viral protein's biological function/interaction [3][4][5]. Thus, the RRM analysis can identify the common characteristic frequency of viral activity, which does not depend on viral mutations as long as the mutants retain their functionality. Based on this common characteristic frequency, the RRM can assess the strength of viral activity by calculating the S/N at this characteristic frequency for different mutants. This RRM approach has been experimentally tested in the example of HIV, which is highly mutated, but all isolates retain the same RRM characteristic frequency [17][18][19].
Bearing all of the above in mind, we propose that the RRM model is an excellent tool for analysing functional protein mutations, particularly within viruses, and for predicting functionally critical "hot-spot" amino acids within viral proteins. Here, we have applied the RRM model to analyse mutations within the highly infectious UK variant of SARS-CoV-2, as well as the South African variant. We have also predicted "hot-spot" amino acids which, if mutated, could possibly produce an even more virulent variant of SARS-CoV-2.

Previous RRM Results with SARS-CoV-2
The spike proteins on the surface of coronaviruses are the first to approach and recognise host cells. We have therefore analysed spike proteins from different coronaviruses to find out whether any common component characterises the spike's recognition of and interaction with host cells [14]. When we analysed spike proteins from different coronaviruses, taken from the UniProt database, using the RRM, the most prominent common RRM frequency was found at f1=0.2827±0.0009, as presented in Figure 1 [14].
Figure 1. RRM cross-spectrum of spike proteins with common characteristic frequency at f1=0.2827±0.0009 [14].
Notably, there is a single common characteristic frequency for all analysed spike proteins from many different coronaviruses. This would mean that all coronaviruses share one and the same RRM characteristic frequency f1, characterising viral recognition of and interaction with host cells.
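The multiple cross-spectral function behind Figure 1 can be sketched as the element-wise product of the amplitude spectra of all analysed sequences, so that only frequencies present in every spectrum survive as peaks. This is a hedged illustration, not the authors' code: zero-padding every signal to the length of the longest sequence (so that all spectra share one frequency grid), subtracting the mean, the EIIP table (the commonly tabulated values, to be verified against [3][4][5]) and the function names are all our assumptions.

```python
import numpy as np

# Commonly tabulated EIIP values (Ry) used in RRM work; an assumption, verify against [3]-[5].
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}


def amplitude_spectrum(seq, n):
    """Amplitude spectrum of the mean-removed EIIP signal, zero-padded to n points."""
    x = np.array([EIIP[aa] for aa in seq], dtype=float)
    x -= x.mean()
    return np.abs(np.fft.rfft(x, n=n))[1:]  # drop the zero-frequency bin


def cross_spectrum(seqs):
    """Multiple cross-spectral function: product of all amplitude spectra on a common grid."""
    n = max(len(s) for s in seqs)  # assumption: pad everything to the longest sequence
    freqs = np.fft.rfftfreq(n)[1:]
    product = np.ones_like(freqs)
    for s in seqs:
        product *= amplitude_spectrum(s, n)
    return freqs, product


def peak(freqs, spectrum):
    """Frequency of the highest peak and its S/N (peak over mean of the cross-spectrum)."""
    k = int(np.argmax(spectrum))
    return freqs[k], spectrum[k] / spectrum.mean()
```

With real data, `seqs` would hold the spike sequences retrieved from UniProt, and one would check whether the returned peak lies near the reported f1=0.2827.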
SARS-CoV-2 viral infection is enabled through interaction between spike S1 fragments and receptors on the surface of host cells. For some coronaviruses, such as HCoV-NL63, SARS-CoV (the virus that causes SARS) [20] and SARS-CoV-2 (the virus that causes COVID-19) [21], this host cell receptor is angiotensin-converting enzyme 2 (ACE2). The ACE2 receptor has been found on the surface of various cell types, including cells in the lungs, heart, arteries, kidneys and intestines. Binding of the S1 spike protein fragments of SARS-CoV and SARS-CoV-2 to the ACE2 receptor on the cell surface leads to endocytosis and translocation of the virus into endosomes within the cells [22].
When we analysed spike S1 fragments from coronaviruses affecting humans, taken from the UniProt database, using the RRM, the most prominent common RRM frequency was found at f2=0.3145±0.0019, as presented in Figure 2 [14,23].
Figure 2. RRM cross-spectrum of spike S1 fragments from coronaviruses affecting humans with common characteristic frequency at f2=0.3145±0.0019 [23].
As S1 spike fragments interact with the ACE2 receptor, we have analysed both groups of proteins using the RRM model to identify the RRM frequency characterising this interaction [23]. To achieve that, we used the RRM cross-spectral function to compare ACE2 receptors with S1 spike proteins from coronaviruses that interact with the ACE2 receptor. Interestingly, the common characteristic frequency of this comparison, f2=0.3145±0.0019, appears to be the same as the RRM frequency for S1 spike fragments, as presented in Figure 2 [23]. Thus, as proposed by the RRM model, the frequency f2 characterises the interaction between S1 spike fragments and ACE2 receptors.

RRM Analysis of UK and South African SARS-CoV-2 Variants
Once we have identified the characteristic frequency f1 for spike proteins, as well as the characteristic frequency f2 for the S1 fragments of spike proteins, which are relevant for interaction with ACE2 receptors, we can assess how much particular mutations affect the strength of the signal at these specific frequencies within the mutated proteins. First, we calculated the strength of the signal, as the signal-to-noise ratio (S/N), at the characteristic frequencies f1 and f2 within the original strain of SARS-CoV-2 (P0DTC2 from the UniProt database). For this strain, the S/N at frequency f1 within the whole spike protein was 0.90, and the S/N at frequency f2 within the S1 fragment of the spike protein was 1.71, as presented in Table 1 and Figure 3. Table 1 and Figure 3 also show that the more virulent variants have a higher S/N ratio at both RRM characteristic frequencies f1 and f2.
To analyse the UK variant, we modified the original strain with the six most critical mutations identified within the UK variant of SARS-CoV-2, namely N501Y, 69-70del, P681H, D614G, N439K and Y453F, as described in the Introduction. The S/N at frequency f1 within the whole spike protein of the UK variant was 1.36, while the S/N at frequency f2 within the S1 fragment of the UK variant was 2.65, as presented in Table 1 and Figure 3.
To analyse the South African variant, we modified the original strain with the three most critical mutations identified within the South African variant of SARS-CoV-2, namely N501Y, K417N and E484K, as described in the Introduction. The S/N at frequency f1 within the whole spike protein of the South African variant was 0.94, while the S/N at frequency f2 within the S1 fragment of the South African variant was 1.80, as presented in Table 1 and Figure 3.
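Building the mutated sequences is mechanical, but it is easy to get wrong once deletions are involved, because a deletion shifts every later position. The minimal sketch below (a hypothetical helper, Python 3.8+) keeps all positions in the numbering of the unmutated reference sequence, such as UniProt P0DTC2, checks that the reference residue matches before substituting, and removes deleted positions only at the end, so the order of the mutation list does not matter.

```python
import re

SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z])$")  # e.g. N501Y
RANGE_DELETION = re.compile(r"^(\d+)-(\d+)del$")     # e.g. 69-70del


def apply_mutations(seq, mutations):
    """Apply substitutions and range deletions; positions are 1-based and refer to
    the unmutated reference sequence. Forms such as Y144del are not handled here."""
    residues = list(seq)
    deleted = set()
    for m in mutations:
        if (s := SUBSTITUTION.match(m)):
            wild_type, pos, mutant = s.group(1), int(s.group(2)), s.group(3)
            if residues[pos - 1] != wild_type:
                raise ValueError(f"{m}: reference has {residues[pos - 1]} at {pos}")
            residues[pos - 1] = mutant
        elif (d := RANGE_DELETION.match(m)):
            start, end = int(d.group(1)), int(d.group(2))
            deleted.update(range(start, end + 1))  # removed after all substitutions
        else:
            raise ValueError(f"unrecognised mutation: {m}")
    return "".join(r for i, r in enumerate(residues, start=1) if i not in deleted)


# Toy example: substitute position 2, then delete positions 4-5 of the reference.
print(apply_mutations("ACDEFGHIK", ["C2Y", "4-5del"]))  # AYDGHIK
```

For the UK variant one would call `apply_mutations(reference, ["N501Y", "69-70del", "P681H", "D614G", "N439K", "Y453F"])` with the P0DTC2 sequence, which is not reproduced here; the South African variant uses `["N501Y", "K417N", "E484K"]`.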
Table 1 and Figure 3 show that both the UK and South African variants have an increased S/N ratio compared with the original virus at both characteristic frequencies f1 and f2, and that the increase is much larger for the UK variant than for the South African variant at both frequencies. According to the RRM model, an increased S/N at frequencies critical for SARS-CoV-2 spike activity and for its interaction with the ACE2 receptor indicates that the mutated variant is more active and consequently more virulent. This result is in complete agreement with the RRM proposition that viral activity is proportional to the S/N ratio, and it may explain why the UK variant has a much higher propensity for interaction with the ACE2 receptor and is thus more virulent.
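The difference between the two variants is easier to judge as a relative change. The short snippet below uses only the S/N values reported above (Table 1) and computes each variant's percentage increase over the original strain; the dictionary layout and function name are illustrative.

```python
# S/N values reported in this paper (f1: whole spike protein, f2: S1 fragment).
SN = {
    "original":      {"f1": 0.90, "f2": 1.71},
    "UK":            {"f1": 1.36, "f2": 2.65},
    "South African": {"f1": 0.94, "f2": 1.80},
}


def relative_increase(variant, freq):
    """Percentage change in S/N relative to the original strain."""
    base = SN["original"][freq]
    return 100.0 * (SN[variant][freq] - base) / base


for variant in ("UK", "South African"):
    for freq in ("f1", "f2"):
        print(f"{variant:14s} {freq}: {relative_increase(variant, freq):+.1f}%")
# UK: f1 +51.1%, f2 +55.0%; South African: f1 +4.4%, f2 +5.3%
```

On these numbers, the UK variant's increase is roughly ten times that of the South African variant at both frequencies.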

Prediction of "Hot-Spot" Amino Acids in SARS-CoV-2 Spike Protein
The next step was to find out whether other possible mutations within the SARS-CoV-2 spike protein could produce a more virulent and possibly more deadly virus. The RRM proposes that the S/N at the characteristic functional frequency is important for the strength of a protein's biological function/interaction. This has been shown above for the UK and South African variants of SARS-CoV-2, where the virulence of the variants was found to be proportional to the S/N ratio at both RRM frequencies, f1 and f2, characterising spike proteins. To find possible mutations within the SARS-CoV-2 spike protein that could be more virulent and deadly, we used the RRM's ability to identify "hot-spot" amino acids, which are most strongly related to the characteristic frequency(ies) and thus most strongly influence the protein's biological function/interaction. This can be achieved by altering the amplitude in a single protein spectrum at the characteristic frequency and then observing which amino acids are most sensitive to this alteration [3][4][5][6][7][8][9][10][11]. For the whole spike protein, we used its RRM characteristic frequency f1=0.2827 and, by increasing the amplitude at this frequency by 8.2%, identified the five most sensitive amino acids at positions F192, F1062, G1085, G1124 and G1131. If these amino acids are mutated, the S/N ratio at frequency f1 increases to 1.56. This is higher than the S/N of the UK variant (1.36) and much higher than those of the South African variant (0.94) and the original strain (0.90), as presented in Table 1 and Figure 3. This indicates that mutations at the RRM-predicted "hot-spot" amino acid positions could increase the strength of spike protein activity, possibly producing an even more virulent variant of SARS-CoV-2.
For the S1 fragment of the spike protein, which is relevant for interaction with the ACE2 receptor, we used its RRM characteristic frequency f2=0.3145 and, by increasing the amplitude at this frequency by 5.5%, identified the five most sensitive amino acids at positions F318, F400, F429, F464 and F486. If these amino acids are mutated, the S/N ratio at frequency f2 increases to 2.84. This is higher than the S/N of the UK variant (2.65) and much higher than those of the South African variant (1.80) and the original strain (1.71), as presented in Table 1 and Figure 3. This indicates that mutations at the RRM-predicted "hot-spot" amino acid positions could also increase both the likelihood and the strength of the interaction between the S1 spike fragment and the ACE2 receptor, possibly producing an even more virulent variant of SARS-CoV-2.
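The amplitude-alteration step can be sketched as follows. This is one possible reading of the procedure, not necessarily the authors' implementation, and it is not claimed to reproduce the positions listed above. The function name `hot_spots` is hypothetical, and the EIIP values are the commonly tabulated ones (an assumption to verify against [3][4][5]). The sketch scales the DFT coefficient nearest the characteristic frequency by (1 + delta), inverts the transform, maps each value of the modified signal back to the amino acid with the nearest EIIP value, and reports the 1-based positions whose residue would change.

```python
import numpy as np

# Commonly tabulated EIIP values (Ry) used in RRM work; an assumption, verify against [3]-[5].
EIIP = {
    "L": 0.0000, "I": 0.0000, "N": 0.0036, "G": 0.0050, "V": 0.0057,
    "E": 0.0058, "P": 0.0198, "H": 0.0242, "K": 0.0371, "A": 0.0373,
    "Y": 0.0516, "W": 0.0548, "Q": 0.0761, "M": 0.0823, "S": 0.0829,
    "C": 0.0829, "T": 0.0941, "F": 0.0946, "R": 0.0959, "D": 0.1263,
}
LEVELS = np.array(sorted(set(EIIP.values())))  # distinct EIIP values


def hot_spots(seq, f, delta):
    """Hypothetical sketch: 1-based positions whose residue would change when the
    spectral component nearest frequency f is scaled by (1 + delta)."""
    x = np.array([EIIP[aa] for aa in seq], dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x)
    k = int(round(f * n))  # DFT bin nearest to f, since bin k has frequency k / n
    spectrum[k] *= 1.0 + delta
    modified = np.fft.irfft(spectrum, n=n)  # altered EIIP signal along the sequence
    positions = []
    for i, (aa, value) in enumerate(zip(seq, modified), start=1):
        nearest = LEVELS[np.argmin(np.abs(LEVELS - value))]
        # "Sensitive" residue: another amino acid's EIIP value is now strictly closer.
        if abs(value - nearest) < abs(value - EIIP[aa]) - 1e-12:
            positions.append(i)
    return positions
```

By analogy with the f1 analysis, one would call `hot_spots(spike_sequence, 0.2827, 0.082)` on the full P0DTC2 sequence (not reproduced here); for f2 the input would be the S1 fragment, and the returned positions would then need offsetting to full-length numbering. In this sketch, residues whose EIIP value has a close neighbour, for example F (0.0946) next to T (0.0941) and R (0.0959), or G (0.0050) next to N (0.0036) and V (0.0057), are the easiest to flip.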

3D Presentation of the Most Critical Mutations
When the most critical mutations within the UK variant of SARS-CoV-2 (N439K, Y453F, N501Y) are positioned within the 3D crystal structure of the SARS-CoV-2 spike receptor binding domain bound to ACE2 (PDBe 6m0j [15]), it can be observed that, as expected, they lie in close proximity to the interaction site. These mutations are shown in Figure 4 in yellow CPK representation. When the RRM-predicted "hot-spot" amino acids within the S1 fragment of the spike protein (F400, F429, F464, F486) are positioned within the same 3D crystal structure, they are also found in close proximity to the interaction site. This is in complete accordance with previous RRM findings, where predicted "hot-spot" amino acids were found to be positioned in and around the protein active or interaction site within the 3D structure [3][4][5][6][7][8][9][10][11]. These RRM-predicted "hot-spot" amino acids are shown in Figure 4 in red CPK representation.
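"Close proximity" is stated qualitatively here. One hedged way to quantify it is the minimum interatomic distance between each residue of interest and the ACE2 chain in 6m0j. The sketch below takes atom coordinates as plain arrays (they could be read from the structure file with a PDB parser such as Biopython's `Bio.PDB.PDBParser`); the function names and the 8 Å cutoff are hypothetical choices, not values used in this paper.

```python
import numpy as np


def min_distance(residue_xyz, partner_xyz):
    """Smallest interatomic distance (Å) between one residue's atoms and a partner chain."""
    a = np.asarray(residue_xyz, dtype=float)[:, None, :]  # (atoms_in_residue, 1, 3)
    b = np.asarray(partner_xyz, dtype=float)[None, :, :]  # (1, atoms_in_partner, 3)
    return float(np.sqrt(((a - b) ** 2).sum(axis=-1)).min())


def near_interface(residues, partner_xyz, cutoff=8.0):
    """Return {residue_label: distance} for residues within `cutoff` Å of the partner.
    The 8 Å default is an illustrative assumption."""
    close = {}
    for label, xyz in residues.items():
        d = min_distance(xyz, partner_xyz)
        if d <= cutoff:
            close[label] = d
    return close


# Toy coordinates only (not taken from 6m0j): X1 is 5 Å from the partner atom, X2 is far away.
print(near_interface({"X1": [[0, 0, 0]], "X2": [[20, 0, 0]]}, [[3, 4, 0]]))  # {'X1': 5.0}
```

With real data, the dictionary keys would be labels such as N501 or F486, and the distances would make the comparison between UK variant mutations and predicted "hot-spot" residues explicit.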

Discussion and Conclusion
With new mutants of SARS-CoV-2 appearing recently, some of them more virulent and deadly, there is a need for a theoretical approach that can predict the strength of mutated viruses from known mutations alone, sometimes even before these mutations appear in nature. Here, we have employed the RRM model, a biophysical theoretical model capable of analysing protein interactions/functions, as well as the influence of mutations on the strength of these interactions/functions. The examples used here are the UK and South African variants of SARS-CoV-2, both of which were found to be more virulent, but to different extents. By identifying the characteristic parameters (frequencies) for spike activity and its interaction with the ACE2 receptor, we found that the S/N of the amplitude at these frequencies is significantly higher within the UK variant and slightly higher within the South African variant compared with the original strain of SARS-CoV-2, as presented in Table 1 and Figure 3. These results can explain why the UK variant has a much higher propensity for interaction with the ACE2 receptor and is thus more virulent than other variants of SARS-CoV-2 known so far. These results establish the RRM model as a theoretical model capable of analysing and predicting the strength of mutated viruses from known mutations alone.
To take this further, we have applied the RRM model to identify critical amino acids, so-called "hot-spot" amino acids, which, if mutated, could produce an even more virulent and possibly more deadly variant of SARS-CoV-2. As with the UK variant mutations, these predicted critical "hot-spot" amino acids, when positioned within the 3D structure of the SARS-CoV-2 spike binding domain bound to ACE2, are also found in close proximity to the interaction site between the spike protein and the ACE2 receptor, as presented in Figure 4.

Competing Interests
The authors declare that they have no competing interests.

Human/Animal Involvement
The authors declare that there were no human participants or animal involvement in this study.

Funding
This research received no external funding.