The sequencing of the human genome surprised many when it revealed only ~20,000 protein-coding genes, representing less than 2% of the total genomic sequence. Since less complex eukaryotes such as the nematode C. elegans have a very similar number of genes, it quickly became clear that the developmental and physiological complexity of humans probably cannot be explained by proteins alone. We now know that most of the human genome is transcribed, yielding a complex repertoire of RNAs that includes tens of thousands of individual noncoding RNAs with little or no protein-coding capacity. Among these are well-studied small RNAs, such as microRNAs, as well as many other classes of small and long transcripts whose functions and mechanisms of biogenesis are less clear but likely no less important. Indeed, many of these poorly characterized RNAs exhibit cell type-specific expression or are associated with human diseases, including cancer and neurological disorders. Our goal is to characterize the mechanisms by which these non-canonical RNAs are generated, regulated, and function, thereby revealing novel fundamental insights into RNA biology and developing new methods to treat diseases.
Much of our recent work has focused on circular RNAs, which are generated from thousands of protein-coding genes. At some genes, the abundance of the circular RNA exceeds that of the associated linear mRNA by a factor of 10, raising the interesting possibility that the function of some protein-coding genes may be to produce circular noncoding RNAs, not proteins. These circular RNAs are generated when the pre-mRNA splicing machinery “backsplices” and joins a splice donor to an upstream splice acceptor. We showed that repetitive elements, e.g. SINE elements, in the flanking introns are critical determinants of whether the intervening exon(s) circularize. When repeat sequences from the flanking introns base pair with one another, the splice sites are brought into close proximity and backsplicing occurs. This knowledge allowed us to generate plasmids that efficiently produce any circular RNA in species ranging from humans to flies. We have further shown that the ratio of linear to circular RNA produced from a given gene is modulated by a number of factors, including hnRNPs, SR proteins, core spliceosome components, and transcription termination proteins. Surprisingly, when spliceosome components were depleted or inhibited pharmacologically, the steady-state levels of circular RNAs increased while expression of their associated linear mRNAs concomitantly decreased. Inhibition or slowing of canonical pre-mRNA processing events thus shifts the steady-state output of protein-coding genes towards circular RNAs, which likely helps explain why and how circular RNAs show tissue-specific expression profiles. We also showed that, once generated, most circular RNAs are exported to the cytoplasm using a length-dependent and evolutionarily conserved pathway. It remains largely unclear what most circular RNAs do, although two are known to efficiently modulate the activity of microRNAs.
Ongoing efforts aim to further elucidate the mechanisms by which circular RNAs are produced, regulated, and function to control cell physiology and impact human diseases.
We are additionally using high-throughput screening approaches to reveal new insights into how gene outputs are controlled. We recently found that the multi-subunit Integrator (Int) complex catalyzes premature transcription termination at hundreds of protein-coding gene loci. It was previously known that Integrator endonucleolytically cleaves the 3’ ends of nascent small nuclear RNAs (snRNAs), a key step in the biogenesis of these noncoding RNAs. Our work significantly expanded this observation by showing that the RNA endonuclease component of Integrator (the IntS11 subunit) also cleaves many nascent Drosophila mRNAs soon after transcription initiation. Unlike what is observed at snRNA gene loci, the short nascent mRNAs generated by Integrator cleavage are degraded from their 3’ ends by the RNA exosome. This is coupled to premature transcription termination, and these Integrator-catalyzed events repress the expression of some full-length mRNAs by more than 100-fold. Integrator can thus unexpectedly function as a tuner or even an on/off switch to control transcriptional outputs.
We have also provided important insights into how the 3’ ends of linear RNAs are generated and regulated. We showed that the MALAT1 locus, which is overexpressed in many human cancers, produces a long nuclear-retained noncoding RNA as well as a tRNA-like cytoplasmic small RNA (known as mascRNA). Although MALAT1 is an RNA polymerase II transcript, its 3’ end is produced not by canonical cleavage/polyadenylation but instead by recognition and cleavage of the tRNA-like structure by RNase P. Mature MALAT1 thus lacks a poly(A) tail, yet is expressed at a level higher than many protein-coding genes because a highly conserved triple helical structure protects its 3’ end. We continue to identify and characterize additional RNAs whose 3’ ends are generated via unexpected mechanisms, thereby revealing novel paradigms for how RNAs are processed and, most importantly, new classes of RNAs with important biological functions.