Metagenomic analysis reveals the presence of prunus virus I in diseased Clematis vitalba: first record of this virus in Italy

Summary. Prunus virus I (PrVI) was detected for the first time in Clematis vitalba in Italy using high-throughput sequencing, and the complete genome of this isolate, named Clv-1, was assembled and characterized. The results of the bioinformatic analyses were further validated with RT-PCR assays using PrVI-specific primers and Sanger dideoxy sequencing. The Clv-1 genome included three RNA segments of 3468 (RNA1), 2892 (RNA2) and 2225 (RNA3) nucleotides, with five predicted open reading frames. Phylogenetic analyses showed close relationships with other PrVI isolates from different geographical origins, including European and non-European countries. This new pathogen record extends the information on the geographical distribution of PrVI, and possibly reflects the international movement of infected clematis germplasm due to global trade. Further surveys on the presence and distribution of PrVI in weeds and crops, such as the two PrVI hosts sweet cherry and peach, are required in the countries where PrVI has been detected.

In the winter of 2023, C. vitalba plants showing previously undescribed symptoms (Figure 1) were observed in a public garden in Assisi (PG), Umbria region, Central Italy (N 43.068927; E 12.61576). PrVI was detected in virome analyses of these plants using high-throughput sequencing (HTS) and Sanger sequencing. The whole genome sequence of PrVI was obtained and compared with those of PrVI isolates previously reported from other countries.

Plant material, HTS and bioinformatic analysis
Total RNAs were isolated from leaf tissues of the sample Clv-1 (Figure 1) using the Viral Gene-spin™ Viral DNA/RNA Extraction Kit (iNtRON Biotechnology, Inc., Seongnam, South Korea), following the manufacturer's instructions. The RNA purity, concentration and integrity were determined with a NanoDrop™ 2000 (NanoDrop). Ribosomal RNAs were depleted using a TruSeq RNA Sample Prep Kit, and the remaining RNAs were used for construction of RNA-seq libraries, which were sequenced on an Illumina NovaSeq 6000 platform with 150 bp paired-end reads. Quality control of the sequencing data was performed with FastQC (v.0.11.5; https://www.bioinformatics.babraham.ac.uk/projects/fastqc), then low quality bases and adapters were removed with BBDuk from the BBTools package (v.36; http://jgi.doe.gov/data-and-tools/software-tools/bbtools), setting a minimum read quality of 25 and a minimum read length of 35 bp. Taxonomic profiling of the reads was performed with GAIA (v.2.02; Paytuví et al., 2019), which compares the reads against the databases SILVA 132 (to identify ribosomal sequences) and NCBI nr (to identify viral sequences). The resulting filtered reads were used to assemble the viral genome with the metaSPAdes and rnaviralSPAdes algorithms implemented in SPAdes (v. 3.15.3; Bankevich et al., 2012). BLASTn/BLASTx analyses of the contigs were carried out against local and online databases.
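The effect of the two read-filtering thresholds (quality 25, length 35 bp) can be illustrated with a minimal sketch. This is a simplified 3'-end trimmer written for illustration only; it is not BBDuk's algorithm, and the function names are hypothetical. It assumes that the quality threshold is applied per base from the 3' end and that quality strings use the standard Phred+33 encoding.

```python
def phred33_to_scores(quality_string):
    """Decode a FASTQ quality string (Phred+33 encoding) into integer scores."""
    return [ord(ch) - 33 for ch in quality_string]


def quality_trim(seq, quals, min_quality=25, min_length=35):
    """Trim low-quality bases from the 3' end, then apply a length filter.

    Simplified illustration only; this is not BBDuk's trimming algorithm.
    Returns (seq, quals) for a retained read, or None if the read is discarded.
    """
    end = len(seq)
    # Walk back from the 3' end while the base quality is below the threshold.
    while end > 0 and quals[end - 1] < min_quality:
        end -= 1
    # Discard reads that are too short after trimming.
    if end < min_length:
        return None
    return seq[:end], quals[:end]
```

Under these assumptions, a 40-base read whose last four bases have quality 10 is kept as a 36-base read, whereas a read with only 30 good bases is discarded.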

Validation of HTS data
Total RNAs extracted from the sample Clv-1 were used in RT-PCR with virus-specific primers, followed by Sanger sequencing of the amplicons, to confirm the presence of the virus identified by HTS. Virus-specific primers were designed according to the contig sequences of the target virus obtained in this study (Supplementary Table 1).
The complete viral genomic sequence was obtained by assembling the sequences of the amplicons generated by RT-PCR and by 5'/3' RACE reactions (2nd Generation 5'/3' RACE Kit; Roche). RT-PCR was conducted according to the manufacturer's protocol (ImProm-II™ Reverse Transcription System, Promega). The 25 μL RT-PCR reaction contained 2 μL of total RNAs, 0.5 μL of each primer (50 pmol μL⁻¹), 12.5 μL of 2× Master Mix, 0.5 μL of Enzyme Mix and 9 μL of distilled water. The thermal cycling conditions were: one cycle of reverse transcription at 42°C for 90 min, one cycle of denaturation at 94°C for 2 min, followed by 35 cycles of amplification at 94°C for 45 s, 52°C for 1 min and 72°C for 2 min, and a final cycle of 72°C for 10 min.
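As a quick consistency check on the reaction set-up, the listed volumes can be summed and the final primer concentration and nominal programme duration derived from them. The sketch below uses only the values given in the text; the variable and function names are illustrative, and the duration ignores ramping and hold times, which depend on the thermal cycler.

```python
# Reaction components in microlitres, as listed in the text.
components_ul = {
    "total RNA": 2.0,
    "forward primer": 0.5,
    "reverse primer": 0.5,
    "2x Master Mix": 12.5,
    "Enzyme Mix": 0.5,
    "distilled water": 9.0,
}
total_volume_ul = sum(components_ul.values())

# Each primer stock is 50 pmol per microlitre.
primer_stock_pmol_per_ul = 50.0
primer_pmol = components_ul["forward primer"] * primer_stock_pmol_per_ul
# 1 pmol per microlitre is numerically equal to 1 micromolar.
primer_final_um = primer_pmol / total_volume_ul


def program_minutes(cycles=35):
    """Nominal duration of the thermal programme in minutes (no ramp times)."""
    reverse_transcription = 90.0
    initial_denaturation = 2.0
    per_cycle = 45.0 / 60.0 + 1.0 + 2.0  # 94 C, 52 C and 72 C steps
    final_extension = 10.0
    return (reverse_transcription + initial_denaturation
            + cycles * per_cycle + final_extension)
```

The volumes add up to the stated 25 μL, each primer is present at 25 pmol per reaction (1 μM final), the 2× Master Mix is diluted to 1×, and the programme takes roughly 233 min of nominal hold time.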
Based on the nucleotide sequences obtained, specific primers (Supplementary Table 1) were designed to amplify the 5'-upstream and 3'-downstream regions of each genome segment by 5' and 3' RACE (rapid amplification of cDNA ends), as described by Parrella and Troiano (2022). Amplicons of the expected sizes were directly sequenced in both directions at Microsynth Seqlab GmbH (Göttingen, Germany).
In addition, total RNAs extracted from five other symptomatic C. vitalba plants and one asymptomatic C. vitalba plant, from the vicinity of Clv-1, were assessed in RT-PCR assays for the presence of the virus identified in Clv-1, using PrV-CP1/PrV-CP2 primers (Supplementary Table 1) following the protocol described above.

Phylogenetic analyses
Multiple sequence alignments were conducted using MUSCLE (Edgar, 2004) implemented in MEGA11. Phylogenetic trees were constructed with the maximum likelihood (ML) method in MEGA11 (Tamura et al., 2021), using the best fit model for each alignment and 500 bootstrap replicates. The trees were drawn to scale, with branch lengths measured in the number of substitutions per site.
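The principle behind the bootstrap replicates can be sketched as follows: each replicate is a pseudo-alignment built by sampling the columns of the original alignment with replacement, and a tree is then inferred from each replicate. The code below illustrates only the resampling step; it is not MEGA11's implementation, and the function names and the default seed are assumptions made for the example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one bootstrap pseudo-replicate of an alignment.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    Columns are sampled with replacement until the original length is reached.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    # The same column indices are used for every sequence, so columns stay intact.
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in cols)
            for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates=500, seed=0):
    """Generate n_replicates pseudo-alignments (500, as in the text)."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]
```

The bootstrap value of a node is then the percentage of replicate trees in which the corresponding clade is recovered.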

RESULTS AND DISCUSSION
HTS of the Clv-1 sample yielded 25,409,477 raw reads. Using the results of the taxonomic classification, the trimmed reads were parsed to keep only those classified as "viral" or which were completely unclassified (i.e., 9,246,969 reads). Of the total trimmed reads, 0.068% were mapped on ilarvirus RNA1, 0.055% on ilarvirus RNA2 and 0.095% on ilarvirus RNA3. These reads were used to perform viral genome assembly with the two algorithms implemented in SPAdes. Two sets of filtered contigs were obtained: 4,661 contigs with metaSPAdes and 4,920 contigs with rnaviralSPAdes.
BLASTx searches identified contigs with the greatest amino acid (aa) sequence identities (81-99%) to PrVI (Ilarvirus, Bromoviridae). Three contigs showing the greatest nucleotide identities with the PrVI genome were identified by BLASTn. These contigs represented almost full-length sequences of the corresponding genomic RNAs. No contigs belonging to other viruses were generated from the HTS library. Based on the PrVI-mapped contig sequences, primers were designed for each of the three RNAs (Supplementary Table 1). RT-PCR using these primers, targeting the three putative viral RNAs, amplified products of the expected sizes (Figure 2). Sequences obtained from these amplicons were identical to the corresponding genome regions sequenced by HTS, confirming the presence of PrVI in the C. vitalba plant. After verification of the 5' and 3' ends, the three genomic ssRNA segments of the Clv-1 isolate consisted of 3468 nucleotides (nt) for RNA1, 2892 nt for RNA2, and 2225 nt for RNA3. These sequences were deposited in GenBank with the accession numbers OR502867 (RNA1), OR602868 (RNA2) and OR602869 (RNA3).
RNA1 contains a single ORF (1a) with the ATG codon at position 29 and ending with a TAG codon at position 3301. It encodes the 121 kDa viral replicase protein (p1), consisting of 1090 amino acids (aa). The protein shares 98.9-99.6% aa identity and 99.0% similarity with other PrVI isolates in GenBank.
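The reported protein length can be cross-checked against the ORF coordinates. The sketch below assumes that position 29 is the first nucleotide of the ATG codon and that position 3301 is the last nucleotide of the TAG stop codon (1-based coordinates); the function name is illustrative.

```python
def orf_protein_length(start, end):
    """Number of amino acids encoded by an ORF.

    start: 1-based position of the first nucleotide of the start codon.
    end:   1-based position of the last nucleotide of the stop codon.
    """
    n_nt = end - start + 1
    if n_nt % 3 != 0:
        raise ValueError("ORF length is not a multiple of three")
    # The stop codon is counted in the ORF but is not translated.
    return n_nt // 3 - 1


rna1_aa = orf_protein_length(29, 3301)
```

Under this reading, the ORF spans 3273 nt (1091 codons including the stop codon), which gives the 1090 aa reported for p1.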
RNA2 is bicistronic, with the two ORFs overlapping by 272 nt. The first ORF at the 5' end of RNA2 encodes the viral polymerase protein of 807 aa (p2a) with a predicted molecular mass of 92 kDa. The second ORF encodes a 205 aa protein (p2b) involved in cell-to-cell movement and posttranscriptional gene silencing, with a predicted molecular mass of 22 kDa. The p2a protein shares 98.2-98.6% aa identity and 98-99% similarity, while the p2b protein shares 95.6-99.5% identity and 98-99% similarity with other PrVI isolates in GenBank.
RNA3 contains two ORFs. ORF 3a encodes the movement protein (MP) of 300 aa (p3a), with an estimated molecular mass of 32 kDa. The p3a protein has 96.0-99.0% identity and 97-100% similarity with the movement proteins of other PrVI isolates. ORF 3b encodes the coat protein (CP) of 217 aa (p3b), with a predicted molecular mass of 24 kDa. The p3b protein shares 98.6-99.5% identity and 99-100% similarity with the coat proteins of other PrVI isolates.
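The predicted molecular masses of the five proteins can be compared with a common rule of thumb of about 110 Da per amino acid residue. This is only a rough approximation (the true mass depends on amino acid composition), and it is not necessarily the method used to obtain the values reported here; the names in the sketch are illustrative.

```python
AVERAGE_RESIDUE_MASS_DA = 110.0  # rule-of-thumb average, an assumption

# Protein: (length in aa, reported mass in kDa), taken from the text.
reported = {
    "p1": (1090, 121),
    "p2a": (807, 92),
    "p2b": (205, 22),
    "p3a": (300, 32),
    "p3b": (217, 24),
}


def estimated_mass_kda(n_residues):
    """Approximate protein mass in kDa from the number of residues."""
    return n_residues * AVERAGE_RESIDUE_MASS_DA / 1000.0


# Relative difference between the rough estimate and the reported mass.
relative_difference = {
    name: abs(estimated_mass_kda(length) - mass) / mass
    for name, (length, mass) in reported.items()
}
```

With this approximation, the estimates for all five proteins fall within 5% of the reported masses, so the stated lengths and masses are mutually consistent.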
The expected amplicons were obtained by RT-PCR with the primers PrV-CP1/PrV-CP2, using the RNAs extracted from the five other C. vitalba plants showing symptoms similar to those of Clv-1, while no reaction product was obtained with RNA extracted from the asymptomatic plant. Sanger sequencing of these amplicons revealed 100% identity among these sequences and with the Clv-1 sequence obtained by HTS. These results may indicate a common origin of the infection among the different clematis plants tested, and a possible role of vegetatively propagated infected host material in the spread of the virus.
Maximum likelihood-based phylogenetic analyses of the three RNAs, performed with MEGA11 under the best fit substitution models, always placed Clv-1 in the same well-supported clade for each RNA, comprising the recognized PrVI isolates within subgroup I in Ilarvirus. These results were supported by high bootstrap values in all instances (Figure 3). In particular, in the phylogenetic reconstruction based on RNA2, two well-supported sub-clades were generated, with the Clv-1 isolate grouping with the two PrVI isolates from C. vitalba: isolate Clem, identified in Russia, and isolate Pl622, identified in Slovakia. The isolate c18 from Picris echioides (identified in Slovenia) was the most divergent, and was placed outside the clades of the PrVI isolates in all three phylogenetic reconstructions, supported by 100% bootstrap values on each corresponding node (Figure 3).
Figure 2. Agarose gel electrophoresis of the RT-PCR products obtained with the primers designed on the PrVI RNA1 (PrV-R1F/PrV-R1R), RNA2 (PrV-R2F/PrV-F2R), and RNA3 (PrV-R3F/PrV-R3R) (Supplementary Table 1). M = 100 bp ladder, - = negative control (healthy plant), + = RNA extracted from the infected Clematis vitalba plant.
Advances in metagenomic sequencing have led to the discovery of new viruses, supporting the detection and prevention of emerging viral diseases. Using HTS analysis, PrVI was first described on asymptomatic sweet cherry (Prunus avium) in Greece (Orfanidou et al., 2021), and was subsequently identified in the weed plant P. echioides (Rivarez et al., 2022) and repeatedly in Clematis spp. plants, mainly in Eastern Europe, including Hungary, Slovenia, Slovakia, Croatia and Russia (Chirkov et al., 2022; Rivarez et al., 2022; Salamon et al., 2023).
The discovery of PrVI in Italy described in the present study, although associated with symptoms in C. vitalba plants that differ from those described on the same host by previous authors (Chirkov et al., 2022; Salamon et al., 2023), confirms that this plant is an important natural host for this virus. Since PrVI was only recently described, its distribution in nature and its impacts on cultivated plants remain unclear. In addition, accidental dispersal of infected propagative material of Clematis cannot be excluded. This could explain the recent findings of PrVI on Clematis in different European countries within a short period. For these reasons, an effective, sensitive and specific PrVI diagnostic method is required, which can be used for extensive monitoring in the areas where PrVI was recently described. Wild and cultivated plants, particularly stone fruit crops, should be assessed for PrVI infections.

Figure 1. Symptoms in natural Clematis vitalba (A), and vein banding and pinpoint necrotic spot symptoms on a C. vitalba leaflet (B).