The Smooth Evolution of the Universal Genetic Code. Main Episodes

A possible scenario for the origin and evolution of the genetic code is proposed. It follows primarily from the working hypothesis that the chronological order in which amino acids were evolutionarily implemented correlates monotonically with their increasing mass. The scenario fulfills the minimalistic requirement of the smallest possible changes to an evolving system of increasing complexity, hereinafter called "the smooth evolution". The working hypothesis was postulated on the basis of statistical analysis indicating a strong correlation between amino acid mass and selected parameters of the contemporary genetic code, which are expected to have changed in a particular direction during the evolution of the initial genetic system. It was supplemented by the most common hypotheses adopted from the literature, namely the stereochemical, 'frozen accident', and coevolution hypotheses. The developed scenario allows a detailed description of the twenty-two consecutive episodes in the history of code definition and an estimation of its dynamics. It reveals the main eras of evolution, conditioned by environmental and structural constraints. It also allows estimation of the evolutionary frequencies of codon sense expansion and redefinition. Dominating trends and amino acids are indicated. The underlying assumptions, limits, exceptions, and the future of code evolution are discussed.


Introduction
The genetic code, discovered in the 1960s (Nirenberg & Matthaei, 1961; Söll et al., 1965), is a nearly universal set of relations between the sequence of bases in a nucleic acid and the sequence of amino acids in a protein (Berg et al., 2002). It is responsible for the accurate transfer of information from the nucleotide triplets in DNA, through RNA, to the synthesized protein during gene expression, which guarantees the preservation of the primary and higher-level protein structure. A genetic code differing in semantics from the contemporary version probably originated in the initial phase of biological evolution, during the transition from the RNA world to the DNA world. Its maturation could have taken place on the evolutionary pathway linking the hypothetical minimal organism, MO, and the last common ancestor, LUCA, i.e., a minimal molecular assembly compatible with life (Forster & Church, 2006) and an entire minimum genome (Mat et al., 2008). energy, transfer accuracy, and minimization of the harmful effect of genetic errors (Sonneborn, 1965; Woese, 1965; Itzkovitz and Alon, 2007; Guilloux and Jestin, 2007; Seligmann, 2011; Kumar and Saini, 2016), stereochemical affinity of codons and amino acids (Pelc & Welton, 1966), the randomness of abiotic allocation (Crick, 1968), coevolution of the code and metabolism (Wong, 1975), and the ancient proto-code (Koonin & Novozhilov, 2017). A hybrid combination of the above (Freeland, 2000; Chechetkin, 2006) was also proposed. In all cases, supporting evidence was sought using experimental tests, analysis of characteristic correlations, or Monte Carlo simulations.
Many different single-factor criteria and multi-factor hypotheses about the chronological order of appearance of amino acids in early evolution were reviewed and summarized in a consensus ranking (Trifonov, 2000). Nine amino acids (Gly, Ala, Val, Asp, Glu, Pro, Ser, Leu, and Thr) of Miller's primordial environment are all ranked topmost. Six of them are the lightest amino acids in the proteinogenic pool. In the following paper, we propose to apply, to some extent, the former ideas together with the postulated working hypothesis of 'smooth evolution', which assumes the smallest, simplest, and most effective possible changes in the gradually arising genetic code and its evolving semantics. Analysis of over one thousand evolutionary scenarios made it possible to formulate a set of essential rules of evolutionary priority and, in turn, to anticipate the precedence and temporal course of the discussed processes, divided into twenty-two chronological episodes. Each episode covers an interval of time that includes certain parallel or competing evolutionary processes, dependent on the previous history and determining the future of the system. This scenario is, to the best of the author's knowledge, currently the most elaborate concept. Despite this, it cannot be treated as the only one, but rather as the most probable attractor towards which all the evolving early RNA-amino acid systems were tending.
The obtained course of the evolution shows the main evolutionary phases, i.e., the pre-code, proto-code pre-ribosomal (warm), proto-code pre-ribosomal (moderating), proto-code proto-ribosomal, and proto-code ribosomal eras, conditioned by thermal and structural constraints. Estimation of the evolutionary frequencies of codon sense expansion and amino acid redefinition made it possible to indicate the dominating trends and the most important amino acids. Finally, the assumptions of the model, its essential limits, possible exceptions from the standard, and the future of code evolution are discussed.

Empirical premises
When the present universal genetic code (UGC) is analyzed with regard to its possible initial evolution, some general observations can be made. They are presented below.
Obs.1. The molecular weight of an amino acid correlates with the strength of the corresponding codon-anticodon hydrogen bonding.
Let's consider the eight RNA triplets containing only the bases G and C, i.e., GGG, CGG, GCG ... CCC, which are the most strongly bound triplets in mRNA-tRNA interactions (up to 9 H-bonds) and code for four amino acids (Tab. 1). Let's also consider the corresponding triplets AAA, UAA, AUA ... UUU, obtained by the full set of point mutations that conserve the overall nucleotide structure (i.e., the number of carbon-nitrogen rings) but weaken the bonding, namely the transitions G-A and C-U. These are the most weakly bound triplets (up to 6 H-bonds), coding for six amino acids and one release factor. In each case the amino acid coded by the strongly bonding triplet, i.e., Gly, Arg, Ala, Gly, Pro, Arg, Ala, and Pro, has a smaller molecular mass than the amino acid (or release factor) coded by the corresponding weak triplet, i.e., Lys, Och, Ile, Asn, Leu, Tyr, Ile, and Phe. This trend (c.c. = -0.996 for averages), in which a more weakly bound codon codes for a heavier entity, is also clearly visible in the summary plot for all possible amino acid codons (Fig. 1). The linear regression shown may be a trace of a hypothetical evolutionary course leading from strongly bonding codons and small coded amino acids towards weak codons and heavy amino acids, with an increasing number of coded entities. The reverse course, with a decreasing number of coded amino acids, was not considered further. The association between codon-anticodon duplex stability and amino acid weight was discussed earlier by Seligmann (2010).
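The pairwise comparison in Obs.1 can be reproduced with a short script. The following is a minimal sketch, not the author's procedure: it assumes the standard codon table, textbook average molecular masses of the free amino acids (in Da), and the usual count of three hydrogen bonds per G-C pair and two per A-U pair. All function and variable names are illustrative.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AA[i] for i, c in enumerate(product(BASES, repeat=3))}

# Average molecular masses of free amino acids in Da (assumed reference values).
MASS = {"G": 75.07, "A": 89.09, "S": 105.09, "P": 115.13, "V": 117.15,
        "T": 119.12, "C": 121.16, "L": 131.17, "I": 131.17, "N": 132.12,
        "D": 133.10, "Q": 146.15, "K": 146.19, "E": 147.13, "M": 149.21,
        "H": 155.16, "F": 165.19, "R": 174.20, "Y": 181.19, "W": 204.23}


def h_bonds(codon):
    """Maximum number of codon-anticodon hydrogen bonds (G/C: 3, A/U: 2)."""
    return sum(3 if base in "GC" else 2 for base in codon)


def weak_counterpart(codon):
    """Apply the transitions G->A and C->U to every position of the codon."""
    return codon.translate(str.maketrans("GC", "AU"))


def compare_strong_vs_weak():
    """Pair each G/C-only triplet with its A/U-only counterpart."""
    rows = []
    for combo in product("GC", repeat=3):
        strong = "".join(combo)
        weak = weak_counterpart(strong)
        rows.append((strong, CODE[strong], weak, CODE[weak]))
    return rows


for strong, aa_s, weak, aa_w in compare_strong_vs_weak():
    # A stop codon is decoded by a release factor, so no amino acid mass applies.
    weak_mass = "release factor" if aa_w == "*" else MASS[aa_w]
    print(strong, h_bonds(strong), aa_s, MASS[aa_s], "|",
          weak, h_bonds(weak), aa_w, weak_mass)
```

The stop codon UAA is reported separately because it is read by a release factor rather than by an aminoacylated tRNA, so the mass comparison is only meaningful for the remaining seven pairs.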
Obs.2. The molecular weight of amino acid correlates with its distance from the center of the metabolic system.
Let's consider the amino acid metabolism pathways (Fig. 2) and the distance of a given amino acid from an arbitrarily chosen central point of the glycolysis route (glycerate-3-phosphate), probably one of the earliest metabolic pathways. This distance, describing amino acid peripherality, was defined as the minimal number of catalytic processes leading from the assumed center to the final product.
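A distance defined as the minimal number of catalytic steps is a shortest-path length in a directed graph, which a breadth-first search computes directly. The sketch below is only an illustration under stated assumptions: the small graph is a fragment of the commonly described serine-family biosynthesis route, not a transcription of Fig. 2, so the resulting step counts need not match those used by the author. The names `PATHWAY` and `metabolic_distance` are hypothetical.

```python
from collections import deque

# Illustrative fragment only (assumption): serine-family biosynthesis
# starting from glycerate-3-phosphate. Each edge is one catalytic step.
PATHWAY = {
    "glycerate-3-phosphate": ["3-phosphohydroxypyruvate"],
    "3-phosphohydroxypyruvate": ["3-phosphoserine"],
    "3-phosphoserine": ["serine"],
    "serine": ["glycine", "O-acetylserine"],
    "O-acetylserine": ["cysteine"],
}


def metabolic_distance(graph, center):
    """Breadth-first search: minimal number of steps from center to each node."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:  # first visit is the shortest route
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


distances = metabolic_distance(PATHWAY, "glycerate-3-phosphate")
for product_name in ("serine", "glycine", "cysteine"):
    print(product_name, distances[product_name])
```

Because breadth-first search visits nodes in order of increasing step count, the first time a metabolite is reached already gives its minimal distance, even when several routes lead to it.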
Summary analysis of mass vs. distance (Fig.3) shows an overall positive trend (c.c.= 0.936), especially seen when amino acids are grouped in main similarity families.
Despite wide dispersion, the visible regularity may be a relic of the evolution of amino acid metabolism and its incorporation into the genetic machinery. Coevolution of the genetic code and metabolic pathways was postulated by Wong (1975, 2005).
Obs.3. The molecular weight of an amino acid correlates with the packing of the genetic code.
Let's consider the packing of the genetic code, meaning how many different entities (amino acids or release factors) can be coded by substituting the nucleotide at the third position of the coding triplet (Tab. 2). The columns and rows are assigned to the first and second positions of the codon. The color of a field indicates the number of different entities coded through substitution at the third position: blue: 1, green: 2, red: 3.
The average mass of packed amino acids was analyzed (Fig. 4). A slightly manifested trend (c.c. = 0.996) may be observed: the tighter the packing, the heavier the amino acids, probably as a result of combined wobble and crowding effects during the initial spread of the evolution.
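The packing measure of Obs.3 can be counted directly from the codon table. The sketch below assumes the standard genetic code and treats all stop codons as a single entity (release factor); the function name `packing` is illustrative and not taken from the source.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ... ('*' = stop).
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): AA[i] for i, c in enumerate(product(BASES, repeat=3))}


def packing():
    """Number of distinct entities coded by each doublet N1N2 when the
    third codon position runs over all four bases."""
    return {
        n1 + n2: len({CODE[n1 + n2 + n3] for n3 in BASES})
        for n1 in BASES
        for n2 in BASES
    }


for doublet, n_entities in sorted(packing().items(), key=lambda kv: kv[1]):
    entities = sorted({CODE[doublet + n3] for n3 in BASES})
    print(doublet, n_entities, "".join(entities))
```

Under these assumptions eight doublets code a single entity, seven code two, and only UG (Cys, Trp, stop) codes three, which corresponds to the three color classes described for Tab. 2.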

Obs.4. The molecular weight of amino acid correlates with the accessibility of the codons.
Let's consider a codon during mRNA-tRNA triplet stacking in the translation process. The order and size of the bases in the codon may facilitate or reduce its accessibility (Fig. 5). According to the wobble hypothesis, the first and second positions of the codon play a more important role than the third position. A full approach always requires base pairing at the first and second positions, with the anticodon approaching from the 3' end of the mRNA. The codon base at the first position can then sometimes be shaded by the base at the second position. This effect was assumed to be proportional to the difference in molecular mass between the nucleotides at the 2nd and 1st codon positions (Tab. 3). The accessibility measure was therefore defined with the opposite sign, as the difference m.w.1 - m.w.2. Field colors indicate the level of codon accessibility during the translation process: white: neutral, gold: facilitated, violet: reduced.
The average molecular mass of coded amino acids vs. the measure of codon accessibility was analyzed (Fig. 6). It shows a trend (c.c. = -0.857) in which more accessible codons code for lighter amino acids, which may be the result of a hypothetical evolutionary strategy that first attributed easily accessible codons to the light amino acids. Properties contrasting the 1st and 2nd codon positions were reported to correlate with genetic code history by Demongeot and Seligmann (2019).
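The accessibility measure m.w.1 - m.w.2 of Obs.4 is simple arithmetic on the first two codon positions. The sketch below assumes molecular masses of the free bases (in Da); since the ribose-phosphate part is identical for all four nucleotides, the difference is the same whether bases or nucleotides are used. The thresholds separating neutral, facilitated, and reduced fields in Tab. 3 are not given in the text, so only the raw value is computed. The names are illustrative.

```python
# Molecular masses of the free bases in Da (assumed reference values).
BASE_MASS = {"A": 135.13, "G": 151.13, "C": 111.10, "U": 112.09}


def accessibility(bicodon):
    """Accessibility measure of a doublet: mass at position 1 minus mass at
    position 2. Negative values mean a heavier second base, i.e. a first
    position that is more likely to be shaded."""
    first, second = bicodon
    return BASE_MASS[first] - BASE_MASS[second]


for bicodon in ("GG", "GC", "CG", "AU", "UA"):
    print(bicodon, round(accessibility(bicodon), 2))
```

With this sign convention a doublet such as GC (large base first) obtains a positive value, while its mirror image CG obtains the same value with a negative sign.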

General remarks
During the early evolution of the genetic code, a decrease in the hydrogen bonding of codons, an increase in the metabolic peripherality of amino acids, an increase in the packing of the code, and a decrease in the accessibility of codons may be expected. As shown above, these varying properties move, overall, in tandem with an increasing molecular mass of amino acids. This observation suggests a sequential increase in the mass of newly coded amino acids during the evolution of the universal code. Additionally, the rational requirement of minimal structural changes allows this suggestion to be made precise: the mass of new amino acids is postulated to increase sequentially, starting from the lightest one. This assumption was adopted in further considerations as the working hypothesis.
A minimalistic model of the emergence and development of the UGC is proposed. It presents the shortest route, obeying the simplest possible modifications of the evolving coding system and gradually increasing its complexity, i.e., from the first codon and first coded amino acid towards the current universal code with 64 codons coding 20 proteinogenic building blocks. It postulates that this route approximates a certain universal bio-physicochemical attractor, common to all proto-organisms in the initial phase of the evolution of life. Its history can, in general, be divided into 22 chronologically consecutive episodes, essential for the development of the code in the form we know. The logically arranged episodes comprise simultaneous processes that start at approximately the same time and run over the same period at a comparable rate, interrelated or not, and sometimes competing. They mainly include the first definition and further redefinition of the amino acid for a given codon, or code expansion for a given amino acid. These processes led to the emergence of both new coded amino acids and newly used codons, and specified the dynamic constraints on the evolutionary pathways in the following episodes. Generally, one type of amino acid, or amino acids of the same mass, is introduced in a given episode.
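The working hypothesis can be stated operationally: ordering the amino acids by mass gives the hypothesized order of their introduction. The sketch below assumes textbook average molecular masses of the free amino acids (in Da) and uses only those episode assignments that are named explicitly in Appendix 1; episodes not listed there are deliberately left out rather than guessed. The function names are illustrative.

```python
# Average molecular masses of free amino acids in Da (assumed reference values).
MASS = {"Gly": 75.07, "Ala": 89.09, "Ser": 105.09, "Pro": 115.13,
        "Val": 117.15, "Thr": 119.12, "Cys": 121.16, "Leu": 131.17,
        "Ile": 131.17, "Asn": 132.12, "Asp": 133.10, "Gln": 146.15,
        "Lys": 146.19, "Glu": 147.13, "Met": 149.21, "His": 155.16,
        "Phe": 165.19, "Arg": 174.20, "Tyr": 181.19, "Trp": 204.23}

# Episodes named after a particular amino acid in Appendix 1 (partial list).
EPISODE = {"Gly": 3, "Ile": 10, "Leu": 11, "Asn": 12, "Met": 17,
           "His": 18, "Phe": 19, "Arg": 20, "Tyr": 21, "Trp": 22}


def smooth_order(masses):
    """Amino acids sorted by increasing mass (hypothesized order of entry)."""
    return sorted(masses, key=masses.get)


def is_monotonic(episode, masses):
    """True if mass never decreases when amino acids are read in episode order."""
    ordered = sorted(episode, key=episode.get)
    return all(masses[a] <= masses[b] for a, b in zip(ordered, ordered[1:]))


print(smooth_order(MASS))
print(is_monotonic(EPISODE, MASS))
```

Leucine and isoleucine have the same mass, so the mass criterion alone does not fix their relative order; the model resolves such ties with its additional precedence rules.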

The precedence rules
According to the model, the order of emergence of new amino acids and newly defined codons in the subsequent episodes, as well as the type of processes taking place, were regulated by certain dominating evolutionary precedence rules (Tab. 4).
1. At least one codon representing a given amino acid should remain not redefined.
2. No more than four redefinition incidents per episode.
3. An amino acid with a five-membered nitrogen ring (Pro, His) can redefine only one codon.
4. A sulfur-containing amino acid (Cys, Met) can redefine only one codon.
5. The redefining amino acids in a given episode have the same mass.
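The supplementary rules on redefinition count, ring and sulfur amino acids, and equal mass can be checked mechanically for a single episode. The sketch below is an illustration, not the author's algorithm: it assumes textbook amino acid masses, interprets "only one codon" as a per-episode limit, and omits the rule on the remaining codon because that would require tracking the whole codon table. The redefinition pairs for E.10 and E.11 are taken from Appendix 1; the function name `violated_rules` is hypothetical.

```python
from collections import Counter

# Average molecular masses of free amino acids in Da (assumed reference values).
MASS = {"Gly": 75.07, "Ala": 89.09, "Ser": 105.09, "Pro": 115.13,
        "Val": 117.15, "Thr": 119.12, "Cys": 121.16, "Leu": 131.17,
        "Ile": 131.17, "Asn": 132.12, "Asp": 133.10, "Gln": 146.15,
        "Lys": 146.19, "Glu": 147.13, "Met": 149.21, "His": 155.16,
        "Phe": 165.19, "Arg": 174.20, "Tyr": 181.19, "Trp": 204.23}

RING_N5 = {"Pro", "His"}   # five-membered nitrogen ring
SULFUR = {"Cys", "Met"}    # sulfur-containing


def violated_rules(redefinitions):
    """Return the numbers of the supplementary rules (2 to 5) broken by one
    episode, given as a list of (new_amino_acid, old_amino_acid) pairs."""
    violated = []
    counts = Counter(new for new, _old in redefinitions)
    if len(redefinitions) > 4:
        violated.append(2)
    # Assumption: the one-codon limit is applied within a single episode.
    if any(counts[aa] > 1 for aa in RING_N5):
        violated.append(3)
    if any(counts[aa] > 1 for aa in SULFUR):
        violated.append(4)
    if len({MASS[new] for new, _old in redefinitions}) > 1:
        violated.append(5)
    return violated


# Redefinitions listed in Appendix 1 for episodes E.10 and E.11.
E10 = [("Ile", "Thr"), ("Ile", "Thr"), ("Ile", "Thr"), ("Ile", "Pro")]
E11 = [("Ile", "Thr"), ("Leu", "Ile"), ("Leu", "Ile"), ("Leu", "Ile")]

print(violated_rules(E10))
print(violated_rules(E11))
```

Episode E.11 mixes isoleucine and leucine as redefining amino acids, which is compatible with the equal-mass rule because the two are isomers.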

Results
The evolution of universal genetic code
A brief scenario of the evolution of the UGC according to the proposed model is presented below (for a detailed description, see the supplementary material in Appendix 1 and Appendix 2). The main episodes are visualized as movements and conversions on the N1\N2 board of nucleotide combinations (Fig. 8). Episodes 1 and 2 (E.1, E.2) introduce hypothetical pre-code starting conditions. The following episodes (E.3 - E.22) show evolutionary progress due to the simplest possible implementation of the precedence rules (Tab. 4).
The presented model places the initial phase of the evolution of the genetic code in the hot era of the RNA world, dominated by self-replicating RNA molecules possessing some catalytic properties. They were built predominantly of the weakly bonding, light nucleotides A and U, which induced collective ephemeral interactions between approaching RNA strands or within a single RNA molecule. This could lead to the formation of dynamic RNA sol-gel inter-phase areas, natural reactors where the first biochemical reactions could be catalyzed. The dominant primary structure of the ancient RNA was rich in poly-A and poly-U regions but also contained some G and C bases. These could form the first self-replicating proto-tRNA (E.1) and the aminoacyl-tRNA synthetase proto-ribozyme (E.2).
The aminoacyl loading of the proto-tRNA required strong and well-defined bonding. Due to some cooling at the end of the hot pre-code era, doublets of strongly bonding nucleotides in sequence (up to 6 H-bonds) became able to withstand thermal tearing and to provide local, stable, complementary interaction between RNA strands. This enabled more precise organization, function, and proper replication of some early RNA structures, which could drive complex physicochemical pre-life processes. The first ribozyme-based aminoacyl-tRNA synthetase (Nick et al., 2000), utilizing the most frequent strongly bonding doublet joints, GG-CC, recognizes the CC-targeted proto-tRNA and transfers the simplest and stereochemically matching amino acid, glycine, to the 3' end of the future transporter (Fig. 7). The possible opposite configuration (e.g., CC codon, GG anticodon) implies less curvature of the proto-tRNA anticodon loop, which hinders exact approach, recognition, and pairing (Duan, 2018). It is likely in this way, with the synthesis of the first Gly-proto-tRNA, that the evolution of the genetic code proper began (E.3). The final assignment to the weaker bonding codon doublets GA and AG ends the initial hot era.
Further cooling initiates the proto-code pre-ribosomal warm era, in which, besides the strongly bonding nucleotides, the weaker ones are freely incorporated (E.4 - E.9). The subsequent evolution of the first tRNA loading system with the two-positional code, consisting primarily of codon mutation and amino acid redefinition due to close substitution, leads to the rise of functional precursors of the modern tRNA. In the beginning, they play a structural and catalytic, non-proteinobiotic role.
In the following proto-code pre-ribosomal moderating era (E.10 - E.11), the weakest doublets, of A and U, can also be successfully incorporated into the code. This enables further assignment of the aa-proto-tRNA complexes and the introduction of the blind termination codon UA in the synthesis of small peptides, due to the catalytic activity of supramolecular ribozyme structures. The proteinobiotic role of the proto-tRNA begins to be manifested and the first translational system is initiated.
Later, in the proto-code proto-ribosomal era (E.12 - E.16), the third position in the codons also becomes significant, due to the activity of the first proto-ribosomes. This allows the addition of heavier amino acids to the existing coded collection and the synthesis of short polypeptides, some of them related to the maturing ribosomes. Initially, the role of the stop codon is naturally played by sequences complementary to the anticodons of tRNAs not charged with amino acids, which were the precursors of the protein release factors. The two protein-based release factors are activated in the following proto-code ribosomal maturation era (E.17 - E.22), taking the three unused triplets. This closes the period of the rise and formation of the UGC and opens the phase of its global implementation in the synthesis of the first complex proteins.

Discussion
Life operates within a narrow range of physicochemical parameters (Henderson, 1913). To meet them, biological evolution at the beginning could not have had an abrupt and discontinuous character. It seems that the minimal change strategy proved to be very effective.
The present UGC is the set of rules determining the most persistent organization of protein structure: the order of amino acids. Certainly, this was not the case at the beginning of its formation, when aa-proto-tRNA complexes did not play the transfer role but mainly a structural or catalytic one, and self-reference may have been an important factor (Guimarães, 2013).
The presented model shows a smooth passage from the first single coded amino acid, glycine, which at that time did not build any protein, to today's entire system of codons specifying the basic components of life during the translation process. According to the proposed model, the proteinobiotic role of the codons arose in the middle of the evolutionary course. It was preceded by the creation of certain pro-catalytic RNA structures, i.e., proto-ribozymes and proto-tRNA, which synthesized the first aa-proto-tRNA complexes in earlier eras. The revealed pseudo-dynamics of the UGC formation is presented in Fig. 9.
It reflects the true dynamics only on the assumption that all episodes took the same period of time. Under this constraint, high dynamics can be expected at the beginning of the code evolution (E.3 - E.6) and after the initiation of full ribosomal activity (E.17 - E.22). The redefinition of serine by proline (E.6) initiates a phase of relatively low dynamics (E.7 - E.16), devoted to the creation of aa-proto-tRNA, small peptides, ribosomes, and short polypeptides.
In summary, 2 specific definitions and 54 redefinitions of an already assigned codon to a new amino acid, supplemented by 10 transitions and 5 transversions during expansions of an assigned codon to a free one, were considered to create the UGC. The most expansive amino acid turned out to be alanine, performing 4 expansions (E.4 and E.5). Isoleucine was the most active in codon sense redefinition, conquering 5 codons (E.10 and E.11).
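The distinction between transitions and transversions used in this count follows the standard definition: a transition exchanges a purine for a purine or a pyrimidine for a pyrimidine, and a transversion exchanges a purine for a pyrimidine or vice versa. A minimal sketch of this classification for doublet expansions is given below; the function names are illustrative, and the GG to GA and GG to AG examples correspond to the first codon expansion described for E.3 in Appendix 1.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}


def substitution_type(old, new):
    """Classify a single-base substitution as transition or transversion."""
    if old == new:
        raise ValueError("bases are identical")
    pair = {old, new}
    same_class = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def doublet_expansion_type(source, target):
    """Classify an expansion between two doublets differing at one position."""
    diffs = [(a, b) for a, b in zip(source, target) if a != b]
    if len(diffs) != 1:
        raise ValueError("expected exactly one substitution")
    return substitution_type(*diffs[0])


for target in ("GA", "AG", "GC", "UG"):
    print("GG ->", target, doublet_expansion_type("GG", target))
```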
The most important limitations of the model are related to the environmental, strictly thermal, conditions influencing thermostability (Gotoh, 1983; Xia et al., 1998). The model describes hyperthermophilic origins of the genetic code based on the strongly bonding duplexes of G and C. It may not fully describe the evolution of the code in some exotic systems that had earlier adapted to extreme thermal conditions or entered a locally cold era, which allowed them to use the weaker bonding duplexes containing A and U and to follow a nonstandard evolutionary pathway. Although the proposed model assumes, in general, a cooling trend during the evolution of code meanings, it allows for the hyperthermophilic maturation of genome compositions (Di Giulio, 2003).
Also, some specific, environmentally driven differences in the maturation of ribosomes may be the source of the exceptions from the universal standard observed today in some coding triplets containing purines in the third position.
Some questions may arise regarding the assumed precedence rules (Tab. 4). The priority of codon sense redefinition over expansion underlines the greater importance of quality over quantity at the beginning of the evolution of life. Rules 1.1 and 2.1-3 meet the empirical premises discussed in the earlier chapter. It is necessary to stress that less accessible codons are favored in redefinition as the target for new amino acids, because their infrequent use minimizes the possible changes in the evolving system. The exception concerns the first Gly codons (GGN), conserved for the stability of the newly emerged system.
The accessibility of the bi-codons was estimated according to Table 3. When codons with different bi-codons and a partially distinguishable third position (i.e., Purines||Pyrimidines) were compared, the negative accessibility of a bi-codon was multiplied by a factor of two, and positive accessibility was divided by a factor of two. In the analogous case, but for codons with the same bi-codon, it was assumed that in the proto-ribosomal eras the third position of a codon favors triplet codons with a total of three carbon rings at the first and third positions. This means that medium crowding and rigidity in the vicinity of the central nucleotide is preferred. This advantage could also be important in the case of the codons UGA and UAA, but the accessibility of these codons was inhibited by proto-ribosome activity. The assumption that transitions dominate over transversions in code expansion, and the first supplementary rule, on the remaining codon (Tab. 4), evoke fewer doubts. The limit of four redefinitions per episode was assumed in order to approximate the logistic and energetic capabilities of evolving living systems. This assumption makes each of the four nucleotides in the first and second positions of the code capable of participating in the redefinition of the amino acid. The limitation of the evolutionary activity of the "exotic" amino acids containing sulfur or a 5-carbon ring (Cys, Met, His, Trp) also has practical reasons, related to the expected structural properties of the synthesized polypeptides, i.e., high flexibility and small rigidity. Finally, the constant mass of the redefining amino acids in a given episode expresses the working assumption that the mass of an amino acid is the main factor determining the chronology of the evolution of the code. With this assumption, a kind of pseudo-rhythm of the evolutionary episodes can be captured.
The indicated association between amino acid weight and codon-anticodon duplex stability (Seligmann, 2010), as well as the negative correlation between the "size/complexity" rating and percentage occurrence in proteins (Dufton, 1997), further strengthens the main idea.
Based on the discussed model, one may consequently predict that in the future the UGC should gain a non-standard amino acid heavier than tryptophan, e.g., pyrrolysine (Pyl). The lighter selenocysteine is incorporated into some proteins by co-translational modification (Gonzalez-Flores et al., 2013) in the place assigned, until the proto-code proto-ribosomal era (E.15), to cysteine. A fundamental change including pyrrolysine should be possible through redefinition of the meaning of the codons starting with the CG doublet (Fig. 10), or through modification of recognition in the area of the stop codon triplets. The second solution is already tested in some bacteria and methanogenic archaea, where pyrrolysine is defined by the UAG codon, universally a stop codon assigned to the release factors. Decreasing the number of stop codons does not seem to be a generally good solution for organisms with big proteomes (Seligmann, 2019). Referring the created model to existing concepts, we may place chemoton-like cyclic metabolic systems (Ganti, 2003) in the proto-code pre-ribosomal warm era, MO-like systems in the proto-code pre-ribosomal moderating era, and LUCA-like systems in the proto-code proto-ribosomal era.
It was recently shown that the standard genetic code can evolve from a two-letter GC code without information loss or costly reassignments (Frank and Froese, 2018). Expanded (e.g., quadruplet) codons can be considered in the context of expanding the existing genetic code (Chen & Schindlinger, 2010).
In contrast to some other models, which assume an initial setup of a few essential codons and simple amino acids coding the first small proteins (Trifonov and Bettecken, 1997), the proposed "smooth" model considers evolution starting from a single codon, despite its non-proteinobiotic role. From E.17 on, the difference in methodology between these approaches is not so important, because the initiation of ribosome maturation offers full three-base codon recognition and thus the translation of simple proteins. The division of the evolution into 22 episodes accepted in the model carries some uncertainty regarding the exact instantaneous chronology. The aim was to approximate an evolving population of real individuals that chose evolutionary routes in each episode, each in its own individual order. The proposed "smooth" order of integration of amino acids, compared to the orders consistent with the consensus ranking and the rules of abiotic origin, thermostability, complementarity, and processivity (Trifonov, 2000), is presented in Tab. 5. Thirty percent identity is observed, manifested especially at the beginning of the evolution. Some indirect support for the correctness of the presented concept comes from the known phenomenon of the decrease in G+C content with decreasing genome size. Also, mitochondrial tRNA sequences have a relatively smaller G+C content than their cytoplasmic counterparts. This is in agreement with the assumption that early RNA was dominated by A and U bases. Thus the best candidate for the first perceptible codon was the correspondingly rare and strongly bonding doublet GG. It prevents an informational shock and a thermal catastrophe at the beginning of code evolution.

Conclusions
The proposed model is the first such global approximation of the discussed phenomenon of genetic code creation, bringing out the essence of biological evolution (continuity, smoothness, and order) and revealing its full chronology. The author ventures to believe that this model, supported by strong criticism and further investigation of its details, will bring a better understanding of the most important eon in the evolution of life, the natural creation of the UGC.

Appendix 1 A symbolic description of the evolution of the standard genetic code
The main pathways and processes of evolution were considered (Tab. 6). Meaning of the applied symbols: => expansion of the code; =>> || code divergence due to the ribosomal ability to recognize the third position of the codon; _ code definition; / code redefinition; the index indicates the version of the given coding system, proto-tRNA, or proto-ribozyme; bold font: preferred version.
They likely originated from the short poly-U strands, then partially mutated to the proto-tRNA strands (t1), strongly bonding GG doublets in GGN triplets of catalytic structures. Other pathways supporting input to t1 were also possible. Pathways for the emergence of proto-tRNAs specific to other codons were not thoroughly analyzed. They likely evolved from t1.
E.2. Aminoacyl-tRNA synthetase proto-ribozymes. The emergence of self-replicating aminoacyl-tRNA synthetase proto-ribozymes (Rz1), strongly bonded to the codon GGN of proto-tRNA, in the sol-gel reactor.
E.3. Glycine. The first specific recognition of the proto-tRNA (t1) and glycine (Gly1) by some proto-ribozyme (Rz1), followed by aminoacylation. Next, the self-replicating ribozyme and proto-tRNA system, using GGN codes for recognition, multiplies and partially evolves to systems recognizing the weaker binding triplets GAN and AGN (first codon expansion).

II. Proto-Code

III. Proto-Code Pre-Ribosomal Moderating Era.
E.10. Isoleucine. Weakly binding codons began to be useful. The redefinition of four codons (Ile1/Thr1, Ile2/Thr4, Ile3/Thr2, Ile4/Pro4) and the expansion of one code (Thr3=>Thr5) occurred.
E.11. Leucine. The redefinition of four codons (Ile5/Thr5, Leu1/Ile1, Leu2/Ile2, Leu3/Ile4) and the expansion of two codes (Ile3=>Ile6, Leu3=>Leu4) occurred. Some supramolecular structures composed of the amino acid-loaded proto-tRNA and the ribozymes became organized and began to play some translational role in the simple processes of the synthesis of the first peptides along the RNA matrix.

IV. Proto-Code Proto-Ribosomal Era.
E.12. Asparagine. The redefinition of four codons (Asn1/Ile6, Asn2/Ile3, Asn3/Leu1, Asn4/Leu2) took place and the activity of the proto-ribosomes was initiated. Proto-tRNAs (t2), with an anticodon region capable of strong binding to the complementary codons UAN, marked the position of the first stop codons and started to play the role of proto-release factor. The third position of the codon began to be essential for proto-ribosomes, enabling further differentiation of the aminoacyl-tRNA synthetase ribozymes and ribosomal distinction between different ribozyme coding systems, initially recognizing the last purines and the last pyrimidines (UUR||UUY). Systems with five carbon rings in the codon were preferred and overexpressed (Leu5||Leu5). The weakly binding codons, and next the less accessible ones, evolved first.

V. Proto-Code Ribosomal Maturation Era.
E.17. Methionine. The redefinition of one codon (M1/Ile9) and one transitional expansion of proto-tRNA (t5=>t6) occurred. The first protein-based release factor (r) took the two codons (UAA, UGA).
E.18. Histidine. The redefinition of one codon (His1/Glu1) and the reactivation of the post-citosinic ribozyme Rz2, capable of recognizing tRNAs (t6) and slightly bent to the codon UGA, occurred.
E.19. Phenylalanine. The specific recognition of the amino acid phenylalanine (F1) by some ribozymes (Rz2) and then the aminoacylation of the targeted tRNAs (t6) took place. Next, the redefinition of three codons (Phe2/Glu2, Phe3/Glu3, Phe4/Lys1) occurred.
E.20. Arginine. The redefinition of four codons (R1/F1, Arg2/Phe4, Arg3/Phe2, Arg4/Ser5) occurred. Ribosomal recognition of the proto-tRNA release factor (t5) was inhibited due to the optimization of the Arg-tRNA reception.
E.21. Tyrosine. The redefinition of two codons (Y1/R1, Tyr2/Arg2) occurred. The second protein-based release factor (s), a derivative of r, took the two codons (UAG, UAA).
E.22. Tryptophan. The redefinition of one codon (W1/Y1) occurred.

Figure legends
Fig.1. The average mass of coded amino acids for the groups of codons bound by a given number of hydrogen bonds (from 6 to 9 HB). The hypothetical course of the evolution is indicated by the arrow.
Fig.2. Metabolism of amino acids. An empty arrow indicates the center of the metabolic pathways system, arbitrarily chosen within the glycolysis route.
Fig.3. The average mass and metabolic distance of amino acids from the assumed center of the metabolic system for the distinguished similarity families: Ser: serine, Pyr: pyruvate, Glu: glutamate, Asp: aspartate, Aro: aromatic, and His: histidine family. The hypothetical course of the evolution is indicated by the arrow. The colors are as in Fig.2.
Fig.4. The average mass of coded amino acids for the groups of codons of different density of packing. The hypothetical course of the evolution is indicated by the arrow. The colors are as in Tab. 2.
Fig.5. Accessibility of the codon to the anticodon during mRNA-tRNA triplet stacking in the translation process. Broken lines mark the space available for the full approach of the triplets (a). The order and size of the bases may influence codon accessibility positively (b) or negatively (c).
Fig.6. The average molecular mass of coded amino acids vs. the measure of the accessibility of the codon. Colors as in Tab. 3. The hypothetical course of the evolution is indicated by the arrow.
Fig.7. Hypothetical picture of the synthesis of the first tRNA precursor. In the gel-like reactor, RNA strands and amino acid molecules could meet closely together (a). This could result in both a complementary and a stereochemical fitting (b), initiating proto-tRNA loading (c, d).
Fig.8. The evolution of UGC. Main episodes (E.1 - E.22). Rows\columns define nucleotides in the first\second position of a codon. The third position of a codon, if essential, is defined in the divided field, respectively in the order: GA for purines and CU for pyrimidines. Font color: red: code definition (underlined) or redefinition (encircled). Field color: gray: codon hardly or unstably binding in the hot, warm, or moderating era; green: new ribosomal property related to the recognition of the third position by the ribosome. The arrows indicate the main routes of codon expansion. Font style: bold: preferred systems when the third position becomes essential; italic: new expanded code for the amino acid, tRNA anticodon, or the recognition codon of the ribozyme. An empty field denotes a nonsense codon or a stop codon.
Fig.9. Pseudo-dynamics of UGC formation. The molecular weight of the evolutionarily new coded amino acid versus the episode number. At the bottom the main eras are indicated: pre-code hot era (red), proto-code pre-ribosomal warm era (orange), proto-code pre-ribosomal moderating era (gray), proto-code proto-ribosomal era (blue), proto-code ribosomal maturation era (green). Symbols represent the main events: the first coded amino acid (gold circle), the beginning and the clarification of the ribosomal function of recognizing the third codon position (small and big diamond), the incorporation of the release factor (square).
Fig.10. The future of genetic code evolution. The nonstandard amino acid pyrrolysine (Pyl) assigned due to the redefinition of the bi-codon CG.