Description
The Waksman Student Scholars Program, together with the Introduction to Molecular Biology and Biochemical Research class, was responsible for the publication of 628 Artemia sequences. Surprisingly, 361 of these sequences (58%) did not contain an open reading frame larger than 80 residues. This was originally presumed to reflect a high level of genomic DNA contamination. While it is possible that some of our Artemia sequences are genomic contamination, I believe that a large majority of our non-coding sequences are long non-coding RNAs (ncRNAs), newly recognized players in transcriptional regulation. Such a high percentage of non-coding sequences is reasonable, as other genomic studies indicate that about 50% of an organism’s RNA is non-coding. Our average non-coding sequence length was 600 nt, significantly longer than our average Artemia 3′ UTR length of 175 nt; this difference is easily explained if these sequences are long non-coding RNAs rather than 3′ UTR fragments. Many of our non-coding RNAs also contain poly(A) tails as well as polyadenylation signals. Because many ncRNAs are polyadenylated, these data support my hypothesis. Fifty-two percent of our non-coding sequences match other Artemia sequences in NCBI, and 33% of these matches are in the reverse direction. Transcription in the reverse (antisense) direction is one mechanism by which ncRNAs inhibit gene transcription.
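The open reading frame criterion used above (no ORF larger than 80 residues) can be illustrated with a minimal sketch. This is not the tool used in the study. It assumes that an ORF runs from an ATG start codon to an in-frame stop codon, that all six reading frames are searched, and that ORFs running off the end of the sequence without a stop codon are not counted. The function names `longest_orf_residues` and `is_noncoding` are hypothetical, chosen for illustration only.

```python
# Illustrative six-frame ORF screen (assumptions: ATG start, standard stop
# codons, only ORFs closed by a stop codon are counted).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf_residues(seq):
    """Length, in residues, of the longest ATG-to-stop ORF in six frames."""
    seq = seq.upper()
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            run = 0  # residues in the currently open ORF (0 means none open)
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if codon in STOP_CODONS:
                    # A stop codon closes the open ORF, if there is one.
                    best = max(best, run)
                    run = 0
                elif run > 0 or codon == "ATG":
                    # Either extend the open ORF or open a new one at ATG.
                    run += 1
    return best


def is_noncoding(seq, max_residues=80):
    """True if no ORF in the sequence is larger than max_residues."""
    return longest_orf_residues(seq) <= max_residues
```

Under these assumptions, a sequence would be classed as non-coding when `is_noncoding(seq)` returns `True`. Treating open-ended ORFs as countable, which can matter for partial cDNA reads, would change the classification of some sequences.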
In addition to analyzing these 628 Artemia sequences, I used DNASTAR software to analyze all 5,947 Artemia sequences generated from 2005 through 2008. The software validated sequence quality and assembled similar sequences into 2,848 contiguous sequences (contigs). These contigs were then processed with Blast2GO, a gene ontology annotation tool; only 268 of them were of high enough quality to be considered annotated genes. These genes were further characterized according to their Gene Ontology terms.