Studying the Features of CRISPR Loci in Stenotrophomonas

Objective To study the genetic structure of the Stenotrophomonas CRISPR-Cas system using bioinformatics methods. Methods The sequence information of all Stenotrophomonas strains published in the CRISPRdb database was collected, and the CRISPR loci were analyzed using the CRISPRFinder software. All spacers were searched with BLAST against the NCBI database to find homologous sequences, and the relationship between the number of spacer sequences and the number of phages was then analyzed statistically. Results In total, 15 confirmed CRISPR structures and 132 questionable CRISPRs were found in 26 strains of Stenotrophomonas, and the repeat sequences of the CRISPR structures were relatively conserved across strains. Only 1.3% of spacers were homologous to sequences of known bacteriophages or plasmids in the NCBI database. Conclusion The genes targeted by the spacer sequences are mainly from bacterial genomes, suggesting that the evolution of the Stenotrophomonas CRISPR is related to other bacterial genes. In addition, the negative correlation between the number of spacer sequences and the number of phages suggests that CRISPR can limit phage invasion. Analysis of the structure of CRISPR loci in the genome of Stenotrophomonas lays the foundation for further study of drug resistance and genomic stability.


INTRODUCTION
Stenotrophomonas spp. are Gram-negative, non-fermenting bacteria, of which S. maltophilia is the species most commonly involved in hospital infections (Alavi et al., 2014). The genus colonizes humans and animals and is widely distributed in both natural and hospital environments. These organisms produce a variety of biologically active substances and have strong metabolic activity and adaptability (Ribitsch et al., 2012). In addition, S. maltophilia is a common opportunistic pathogen causing nosocomial infections. In recent years, the clinical isolation rate of S. maltophilia has been increasing owing to the wide use of broad-spectrum antibacterial drugs and invasive diagnostic techniques. The bacterium is resistant to most antibacterial drugs and has increasingly attracted the attention of researchers worldwide (Brooke, 2012).
Clustered regularly interspaced short palindromic repeats (CRISPR) form an adaptive immune system against foreign genes that is widely found in archaeal and bacterial genomes. The CRISPR system consists of CRISPR-associated genes (cas genes), a leader sequence (LS) and a CRISPR locus. The CRISPR locus is mainly composed of varying numbers of direct repeats (DRs) and spacer sequences (Karimi et al., 2018). According to the composition of the Cas proteins and the structure of the CRISPR locus, the CRISPR system can be divided into 5 types and 16 subtypes (Makarova et al., 2011). The CRISPR system acts as a bacterial acquired immune system, which can block foreign DNA such as phages and plasmids from invading the cell and so avoid interference from foreign DNA (Mali et al., 2013). The CRISPR system has attracted widespread attention because of its unique structure and function (Wright et al., 2016).
In laboratory genetic engineering of S. maltophilia, it has proved difficult to transfer foreign DNA from different sources into the strain: the success rate is relatively low, and recombinant expression plasmids, once constructed, are easily lost. It is therefore speculated that this resistance to foreign plasmids may be related to the CRISPR system. At present, research on CRISPR mainly focuses on E. coli (Rahmatabadi et al., 2016), S. aureus (Zhao et al., 2018) and Salmonella (Shariat et al., 2015). Research on the genus Stenotrophomonas has not been reported. Therefore, bioinformatics methods were used to analyze the gene composition of the CRISPR system in the genus Stenotrophomonas.

Strain Selection
The CRISPR information of the 13 Stenotrophomonas strains published in the CRISPRdb database (http://crispr.u-psud.fr/) as of December 25, 2018 was collected (Grissa et al., 2007). The genome sequences of 13 strains of Stenotrophomonas were downloaded from the National Center for Biotechnology Information (NCBI) database (http://www.ncbi.nlm.nih.GOV/). The GenBank numbers are shown in Table 1.

Analysis of CRISPR locus
For genome sequences already published in the CRISPR database, the CRISPR loci were obtained directly from the database. For genome sequences downloaded from GenBank at the NCBI, CRISPR loci were detected using CRISPRFinder (http://crispr.u-psud.fr/).
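To make the repeat-spacer layout of a CRISPR array concrete, the following is a minimal sketch and not the CRISPRFinder algorithm. It assumes a single known direct repeat that occurs as exact copies; the function name split_array and the toy sequences are hypothetical and are not taken from the study.

```python
def split_array(array_seq, repeat):
    """Return the spacers of a CRISPR array, assuming exact copies of one repeat.

    This is an illustration only: real detection tools tolerate mismatches
    in the repeats and locate the repeats themselves.
    """
    array_seq = array_seq.upper()
    repeat = repeat.upper()
    parts = array_seq.split(repeat)
    # parts[0] is the flank before the first repeat and parts[-1] the flank
    # after the last repeat; everything between two repeat copies is a spacer.
    return parts[1:-1]


if __name__ == "__main__":
    # Hypothetical toy array: flank, repeat, spacer, repeat, spacer, repeat, flank
    toy_repeat = "GTTCAC"
    toy_array = "AAA" + toy_repeat + "TTTT" + toy_repeat + "CCCCC" + toy_repeat + "GG"
    for spacer in split_array(toy_array, toy_repeat):
        print(len(spacer), spacer)
```

An array with n exact repeat copies yields n - 1 spacers under this assumption; degenerate terminal repeats, which are common in real arrays, would be missed.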

Analysis of repeat sequences and spacer sequences
The repeat sequences in the obtained CRISPR loci were aligned with ClustalW, and the aligned repeats were then imported into MEGA 7.0 for comparative analysis and construction of a phylogenetic tree. RNA secondary structure and minimum free energy (MFE) were predicted with RNAfold using default parameters, and the conservation of the repeats was represented with WebLogo. BLAST analysis was performed on the spacer sequences in each locus to find foreign genetic elements homologous to them.
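The conservation shown in a sequence logo is based on the information content of each alignment column. The sketch below computes that quantity for a set of aligned DNA repeats as 2 minus the Shannon entropy, in bits. It is an illustration under stated assumptions: the input is gap-free and of equal length, no small-sample correction is applied (logo tools may apply one), and the function name and example sequences are hypothetical.

```python
import math
from collections import Counter


def column_information(aligned_repeats):
    """Per-column information content in bits for aligned DNA (maximum 2.0)."""
    length = len(aligned_repeats[0])
    if any(len(seq) != length for seq in aligned_repeats):
        raise ValueError("sequences must be aligned to the same length")
    bits = []
    for i in range(length):
        counts = Counter(seq[i].upper() for seq in aligned_repeats)
        total = sum(counts.values())
        # Shannon entropy of the base frequencies in this column
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        bits.append(2.0 - entropy)
    return bits


if __name__ == "__main__":
    # Hypothetical aligned repeats, not sequences from the study
    example = ["GTTCACTGCC", "GTTCACTGCC", "GTTCGCTGCC"]
    print([round(b, 2) for b in column_information(example)])
```

A column in which every repeat carries the same base scores 2 bits, while a column split evenly between two bases scores 1 bit.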

Statistical analysis of spacer sequences and phage sequences
The phage sequences in the Stenotrophomonas genomes were predicted with Prophinder (http://aclame.ulb.ac.be/), and the correlation between the number of spacers and the number of phage sequences was assessed using the Spearman rank correlation test.
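The Spearman coefficient is the Pearson correlation of the ranks of the two variables. The following is a minimal pure-Python sketch of that calculation with average ranks for ties. It does not compute a P value, the function names are hypothetical, and the counts in the usage example are invented for illustration and are not the data of this study.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with the one at position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical per-genome counts, for illustration only
    spacers = [40, 25, 12, 8, 3]
    prophages = [0, 1, 1, 3, 5]
    print(round(spearman_rho(spacers, prophages), 3))
```

In practice a statistics package would normally be used, since it also reports the significance of the coefficient.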

Distribution of CRISPR arrays
In the 26 strains that were analyzed, a total of 15 confirmed CRISPR arrays and 132 questionable structures were identified. Among the 15 confirmed CRISPR arrays, the S. maltophilia EPM1 (GCF_000344215) genome contained the largest number. S. maltophilia (GCF_000742995) ranked second with 3 confirmed CRISPR arrays, and the remaining strain genomes contained only one confirmed locus each. Most of these CRISPR-Cas loci were located on the chromosome. For the 4 strains with more complete CRISPR array information, the structures of the CRISPR arrays are shown in Figure 1. The CRISPR systems of S. acidaminiphila JCM 13310, S. nitritireducens DSM12575 and Stenotrophomonas sp. Leaf70 have the same structure and belong to type I-F, whereas that of S. ginsengisoli DSM24757 belongs to type I-C.

General overview of Repeat Sequence
A total of 147 CRISPR arrays were found in the 26 strains, of which 15 were confirmed CRISPR arrays and the rest were questionable structures. Each CRISPR array contained more than 2 repeats. The repeat sequences are 25-32 bp in length, with 25 bp being the most common (Table 1). Sequence alignments show that the repeat sequences are highly conserved within the same CRISPR array. The same repeats are also found in different strains, where they are conserved. However, there are also differences between repeat sequences within the same strain, which are reflected in partial base mutations and differences in CRISPR structures (Table 1).

RNA secondary structures and MFE of repeats
We assessed the stability and conservation of the DRs using RNA secondary structure and MFE values. The secondary structures and MFE of 8 repeats were predicted with RNAfold, and all of the repeats could form a conserved RNA secondary structure that resembles a dumbbell shape (Fig. 2). The stem of the RNA secondary structure is generally 4, 5, 6, 7 or 8 bp long. In addition to the stem length, the length of the repeat sequence and the GC content also affect the stability of the secondary structure; among repeats of the same length, those with higher GC content are more stable. In summary, DR sequences with lower MFE values are more stable than DR sequences with higher MFE values. The formation of RNA secondary structure by the repeat sequence suggests that it may mediate the interaction of invasive DNA or RNA with the Cas-encoded proteins.
Conservation analysis of the repeat sequences was performed with WebLogo, and the results are shown in Fig. 3. The results show that the repeats of different Stenotrophomonas strains are intertwined and cannot be separated from each other, so this sequence cannot be used for classification and identification.
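Two of the quantities discussed here, GC content and stem length, can be approximated directly from a repeat sequence. The sketch below is a rough illustration and not a substitute for an MFE prediction: it counts only perfect Watson-Crick pairs (no G-U wobble, bulges or energy model), assumes a minimum hairpin loop of 3 nt, and uses hypothetical function names and toy sequences.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def gc_content(seq):
    """Fraction of G and C bases in a DNA or RNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def longest_stem(seq, min_loop=3):
    """Length of the longest uninterrupted Watson-Crick stem in a hairpin.

    Brute force over all outermost pairs (i, j); fine for repeats of ~30 nt.
    """
    rna = seq.upper().replace("T", "U")
    best = 0
    n = len(rna)
    for i in range(n):
        for j in range(i + 1, n):
            k = 0
            # walk inward from (i, j) while bases pair and the loop stays
            # at least min_loop unpaired bases long
            while (j - k) - (i + k) > min_loop and (rna[i + k], rna[j - k]) in PAIRS:
                k += 1
            best = max(best, k)
    return best


if __name__ == "__main__":
    # Hypothetical toy hairpin, not a repeat from the study
    toy = "GGGGAAAACCCC"
    print(gc_content(toy), longest_stem(toy))
```

For the toy hairpin, the four G bases pair with the four C bases around a 4-nt loop, giving a stem of 4 bp. Real repeats would still need a thermodynamic prediction to obtain MFE values.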

Spacer sequences
There are 376 spacer sequences in 15 confirmed CRISPR arrays, ranging in length from 18 to 60 bp (

Correlation analysis between the spacer sequence and the number of phages
Phage sequences were predicted with Prophinder for the 13 strains of Stenotrophomonas downloaded from NCBI and for 3 S. maltophilia strains with complete genome sequences (S. maltophilia JV3, S. maltophilia K279a and S. maltophilia R5513). Not every Stenotrophomonas genome contained a phage, whereas up to five were found in S. maltophilia AU1209, S. maltophilia Ab55555, S. maltophilia K279a and S. maltophilia 13637. Using the Spearman rank correlation coefficient, a negative correlation was found between the number of spacer sequences and the number of phages (rho=-0.538, P=0.039). This negative correlation indicates that as the number of spacer sequences increases, the number of phages in the genome tends to decrease.

DISCUSSION
The CRISPR-Cas system is an acquired immunity system that provides defense against invading genetic elements, such as bacteriophages and plasmids (Barrangou et al., 2007; Marraffini et al., 2008). The CRISPR-Cas immunity system functions in three steps: adaptation, expression, and interference (Jiang et al., 2013). Most prokaryotic CRISPR loci are located on the chromosome. The main reason for this phenomenon is that the CRISPR system can clear exogenous phages and plasmids through an RNA-guided targeted interference mechanism. This also makes prokaryotes highly adaptable to their environment. In recent years, gene editing technology based on CRISPR has been widely used in genetic engineering and in the study of genetic diseases because of its simple operation and high efficiency, and it has become a new research hotspot (Reed, 2017). Therefore, the analysis of the CRISPR locus structure of bacteria is of great significance for the study of its function and for the development and application of gene editing tools.
CRISPR was first discovered in E. coli, and many studies have shown that this system exists in 90% of archaea and 40% of bacteria (Hille et al., 2018). Stenotrophomonas is not only a common environmental bacterium but also an important opportunistic pathogen in nosocomial infections. Recently, it has attracted attention because of its increasing drug resistance and morbidity (Hauben et al., 1999). When genetic engineering methods were used to study the bacterium, plasmid transformation, homologous recombination and genetic modification were found to have lower success rates than in most other bacteria. This suggests that Stenotrophomonas can resist the invasion of foreign DNA such as phages and plasmids, which may be related to the immune function of the CRISPR system. Therefore, this study used bioinformatics methods to analyze the genetic structure of the CRISPR system in Stenotrophomonas.
In this study, 15 confirmed and 132 questionable CRISPRs were identified by analyzing the CRISPR loci of 26 strains of Stenotrophomonas published in the CRISPRdb database and NCBI. The proportion of CRISPR-containing genomes is slightly lower than the reported percentage in P. aeruginosa, which is 61.05% (van Belkum et al., 2015). The system exists in all 26 strains, and some of the same CRISPRs are distributed among different strains and are conserved, suggesting that the strains evolved in the same direction in response to invasion by the same foreign DNA. In addition, the variability of the CRISPR structure may indicate that different strains differ in their resistance to foreign DNA interference, and it also points to the diversity of invading foreign DNA.
Repeat sequences are also conserved in different strains, suggesting that different strains may form the same defense system during genetic evolution. Evolutionary analysis of the Stenotrophomonas repeats revealed that the same repeats were distributed among different strains, suggesting that CRISPR is mainly transmitted between strains through a horizontal transfer mechanism and then evolves independently. One study has suggested that the stem-loop structures of some repeats may facilitate recognition and mediate contact between the spacer-targeted foreign RNA or DNA and Cas-encoded proteins (Kunin et al., 2007). The MFE of Stenotrophomonas repeats is lower than that of other bacteria (Yang et al., 2015), and the repeats have longer stems; therefore, they may form more stable secondary structures. Studies have shown that the stability of RNA secondary structure may affect the function of the CRISPR locus. From Figure 3 we can see that the repeats of Stenotrophomonas are not conserved.
Spacers in the confirmed Stenotrophomonas CRISPR-Cas loci are most commonly about 30 bp long, similar to most other bacteria (Barrangou et al., 2014). The correlation analysis showed a negative correlation between the number of spacer sequences and the number of phages, which is consistent with the conclusion of Nozawa et al. on Streptococcus pyogenes (Nozawa et al., 2011): the greater the number of spacer sequences in the CRISPR locus, the smaller the number of phages in the genome, suggesting that CRISPR limits the invasion of phages into Stenotrophomonas.
In this study, the bioinformatics analysis of the CRISPR gene structure in the genome of Stenotrophomonas contributes to the study of drug resistance and pathogenicity and to CRISPR-mediated gene knockout technology, and lays a foundation for future research and development. Because the number of Stenotrophomonas genome sequences published in the GenBank database and available for analysis is relatively small, and some of the gene annotation is incomplete, it was not possible to comprehensively analyze the cas genes in the CRISPR system. Therefore, these results remain to be refined further.