input.xml - SaxonC - Saxonica Developer Community

Bug #4302 » input.xml

Alf Eaton, 2019-08-29 09:17

    
    <?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.1d3 20150301//EN"  "JATS-archivearticle1.dtd"><article article-type="research-article" dtd-version="1.1d3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><front><journal-meta><journal-id journal-id-type="nlm-ta">elife</journal-id><journal-id journal-id-type="hwp">eLife</journal-id><journal-id journal-id-type="publisher-id">eLife</journal-id><journal-title-group><journal-title>eLife</journal-title></journal-title-group><issn publication-format="electronic">2050-084X</issn><publisher><publisher-name>eLife Sciences Publications, Ltd</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">03523</article-id><article-id pub-id-type="doi">10.7554/eLife.03523</article-id><article-categories><subj-group subj-group-type="display-channel"><subject>Research Article</subject></subj-group><subj-group subj-group-type="heading"><subject>Evolutionary Biology</subject></subj-group></article-categories><title-group><article-title>Long non-coding RNAs as a source of new peptides</article-title></title-group><contrib-group><contrib contrib-type="author" id="author-14738"><name><surname>Ruiz-Orera</surname><given-names>Jorge</given-names></name><xref ref-type="aff" rid="aff1"/><xref ref-type="fn" rid="con1"/><xref ref-type="fn" rid="conf1"/></contrib><contrib contrib-type="author" id="author-14739"><name><surname>Messeguer</surname><given-names>Xavier</given-names></name><xref ref-type="aff" rid="aff2"/><xref ref-type="other" rid="par-2"/><xref ref-type="fn" rid="con2"/><xref ref-type="fn" rid="conf1"/></contrib><contrib contrib-type="author" id="author-14740"><name><surname>Subirana</surname><given-names>Juan Antonio</given-names></name><xref ref-type="aff" rid="aff1"/><xref ref-type="aff" rid="aff3"/><xref ref-type="fn" rid="con3"/><xref ref-type="fn" 
rid="conf1"/></contrib><contrib contrib-type="author" corresp="yes" id="author-14450"><name><surname>Alba</surname><given-names>M Mar</given-names></name><xref ref-type="aff" rid="aff1"/><xref ref-type="aff" rid="aff4"/><xref ref-type="corresp" rid="cor1">*</xref><xref ref-type="other" rid="par-1"/><xref ref-type="fn" rid="con4"/><xref ref-type="fn" rid="conf1"/></contrib><aff id="aff1"><institution content-type="dept">Evolutionary Genomics Group, Research Programme on Biomedical Informatics</institution>, <institution>Hospital del Mar Research Institute, Universitat Pompeu Fabra</institution>, <addr-line><named-content content-type="city">Barcelona</named-content></addr-line>, <country>Spain</country></aff><aff id="aff2"><institution content-type="dept">Llenguatges i Sistemes Informàtics</institution>, <institution>Universitat Politècnica de Catalunya</institution>, <addr-line><named-content content-type="city">Barcelona</named-content></addr-line>, <country>Spain</country></aff><aff id="aff3"><institution>Real Academia de Ciències i Arts de Barcelona</institution>, <addr-line><named-content content-type="city">Barcelona</named-content></addr-line>, <country>Spain</country></aff><aff id="aff4"><institution>Catalan Institution for Research and Advanced Studies</institution>, <addr-line><named-content content-type="city">Barcelona</named-content></addr-line>, <country>Spain</country></aff></contrib-group><contrib-group content-type="section"><contrib contrib-type="editor"><name><surname>Tautz</surname><given-names>Diethard</given-names></name><role>Reviewing editor</role><aff><institution>Max Planck Institute for Evolutionary Biology</institution>, <country>Germany</country></aff></contrib></contrib-group><author-notes><corresp id="cor1"><label>*</label>For correspondence: <email>malba@imim.es</email></corresp></author-notes><pub-date date-type="pub" publication-format="electronic"><day>16</day><month>09</month><year>2014</year></pub-date><pub-date 
pub-type="collection"><year>2014</year></pub-date><volume>3</volume><elocation-id>e03523</elocation-id><history><date date-type="received"><day>30</day><month>05</month><year>2014</year></date><date date-type="accepted"><day>11</day><month>08</month><year>2014</year></date></history><permissions><copyright-statement>© 2014, Ruiz-Orera et al</copyright-statement><copyright-year>2014</copyright-year><copyright-holder>Ruiz-Orera et al</copyright-holder><license xlink:href="http://creativecommons.org/licenses/by/4.0/"><license-p>This article is distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>, which permits unrestricted use and redistribution provided that the original author and source are credited.</license-p></license></permissions><self-uri content-type="pdf" xlink:href="elife-03523-v1.pdf"/><abstract><object-id pub-id-type="doi">10.7554/eLife.03523.001</object-id><p>Deep transcriptome sequencing has revealed the existence of many transcripts that lack long or conserved open reading frames (ORFs) and which have been termed long non-coding RNAs (lncRNAs). The vast majority of lncRNAs are lineage-specific and do not yet have a known function. In this study, we test the hypothesis that they may act as a repository for the synthesis of new peptides. We find that a large fraction of the lncRNAs expressed in cells from six different species is associated with ribosomes. The patterns of ribosome protection are consistent with the translation of short peptides. 
lncRNAs show coding potential and sequence constraints similar to those of evolutionarily young protein-coding sequences, indicating that they play an important role in de novo protein evolution.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.001">http://dx.doi.org/10.7554/eLife.03523.001</ext-link></p></abstract><abstract abstract-type="executive-summary"><object-id pub-id-type="doi">10.7554/eLife.03523.002</object-id><title>eLife digest</title><p>Despite the terms being largely interchangeable in modern language, ‘DNA’ and ‘gene’ do not mean the same thing. A gene is made of DNA and contains the instructions to make a protein, and it is the protein that performs the function of the gene. However, cells in the body also contain DNA that does not form genes. Far from being ‘junk’ DNA with no biological purpose, this DNA has a variety of roles, including affecting how other genes are used.</p><p>To produce a protein, the DNA sequence of a gene is transcribed into an intermediate molecule called RNA, which is then translated to produce a protein. So-called long non-coding RNA (lncRNA) molecules are also transcribed from DNA, but whether these are translated to make proteins has been a subject of much debate. Indeed, the function of the vast majority of lncRNA molecules is unknown.</p><p>Ruiz-Orera et al. analyzed RNA sequences collected from earlier experiments on six different species (humans, mice, fish, flies, yeast, and a plant) and found nearly 2500 as yet unstudied lncRNAs in addition to those previously identified. Many of the lncRNAs that Ruiz-Orera et al. investigated could be found lodged inside the cellular machinery used to translate RNA into proteins. Furthermore, these lncRNA molecules are oriented in the machinery as if they are primed and ready for translation, suggesting that many lncRNAs do produce proteins. 
However, it is unclear how many of these proteins have a useful function.</p><p>Very few lncRNAs were found in more than one species, suggesting that they have evolved recently. The properties of lncRNA molecules also show many similarities with the properties of ‘young’—recently evolved—genes that are known to produce proteins. The combined findings of Ruiz-Orera et al. therefore suggest that lncRNAs are important for developing new proteins. The emergence of proteins with new functions has been an important driving force in evolution, and this work provides important clues into the first steps of this process.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.002">http://dx.doi.org/10.7554/eLife.03523.002</ext-link></p></abstract><kwd-group kwd-group-type="author-keywords"><title>Author keywords</title><kwd>lncRNA</kwd><kwd>ribosome profiling</kwd><kwd>eukaryote</kwd><kwd>de novo gene evolution</kwd></kwd-group><kwd-group kwd-group-type="research-organism"><title>Research organism</title><kwd>Human</kwd></kwd-group><funding-group><award-group id="par-1"><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100003329</institution-id><institution>Ministerio de Economía y Competitividad</institution></institution-wrap></funding-source><award-id>BFU2012-36820</award-id><principal-award-recipient><name><surname>Alba</surname><given-names>M Mar</given-names></name></principal-award-recipient></award-group><award-group id="par-2"><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100003329</institution-id><institution>Ministerio de Economía y Competitividad</institution></institution-wrap></funding-source><award-id>TIN2013-45732-C4-3-P</award-id><principal-award-recipient><name><surname>Messeguer</surname><given-names>Xavier</given-names></name></principal-award-recipient></award-group><funding-statement>The funders had 
no role in study design, data collection and interpretation, or the decision to submit the work for publication.</funding-statement></funding-group><custom-meta-group><custom-meta><meta-name>elife-xml-version</meta-name><meta-value>2</meta-value></custom-meta><custom-meta specific-use="meta-only"><meta-name>Author impact statement</meta-name><meta-value>Ribosome profiling data from several eukaryotic species provides strong evidence that many long non-coding RNA molecules encode novel short proteins.</meta-value></custom-meta></custom-meta-group></article-meta></front><body><sec id="s1" sec-type="intro"><title>Introduction</title><p>Studies performed over the past decade have unveiled a richer and more complex transcriptome than was previously appreciated (<xref ref-type="bibr" rid="bib64">Okazaki et al., 2002</xref>; <xref ref-type="bibr" rid="bib11">Carninci et al., 2005</xref>; <xref ref-type="bibr" rid="bib40">Kapranov et al., 2007</xref>; <xref ref-type="bibr" rid="bib69">Ponjavic et al., 2007</xref>). Thousands of long RNA molecules (&gt;200 nucleotides) that do not display the typical properties of well-characterized protein-coding RNAs, and which have been named intergenic or long non-coding RNAs (lncRNAs), have been discovered in several eukaryotic genomes (<xref ref-type="bibr" rid="bib64">Okazaki et al., 2002</xref>; <xref ref-type="bibr" rid="bib70">Ponting et al., 2009</xref>; <xref ref-type="bibr" rid="bib8">Cabili et al., 2011</xref>; <xref ref-type="bibr" rid="bib52">Liu et al., 2012</xref>; <xref ref-type="bibr" rid="bib68">Pauli et al., 2012</xref>; <xref ref-type="bibr" rid="bib87">Ulitsky and Bartel, 2013</xref>). There are several lncRNAs that have regulatory functions (<xref ref-type="bibr" rid="bib29">Guttman and Rinn, 2012</xref>; <xref ref-type="bibr" rid="bib87">Ulitsky and Bartel, 2013</xref>). 
For example the X-inactive-specific transcript <italic>Xist</italic> regulates X chromosome inactivation in eutherian mammals (<xref ref-type="bibr" rid="bib7">Brockdorff et al., 1992</xref>). However, the vast majority of lncRNAs do not have a known function.</p><p>Intriguingly, several recent studies have noted that a large fraction of lncRNAs associate with ribosomes (<xref ref-type="bibr" rid="bib37">Ingolia et al., 2011</xref>; <xref ref-type="bibr" rid="bib5">Bazzini et al., 2014</xref>; <xref ref-type="bibr" rid="bib39">Juntawong et al., 2014</xref>; <xref ref-type="bibr" rid="bib89">van Heesch et al., 2014</xref>). Deep sequencing of ribosome-protected fragments, or ribosome profiling, provides detailed information on the regions that are translated in a transcript (<xref ref-type="bibr" rid="bib35">Ingolia, 2014</xref>). According to some studies, the patterns of ribosome protection indicate that lncRNAs are capable of translating short peptides (<xref ref-type="bibr" rid="bib37">Ingolia et al., 2011</xref>; <xref ref-type="bibr" rid="bib5">Bazzini et al., 2014</xref>; <xref ref-type="bibr" rid="bib39">Juntawong et al., 2014</xref>) although others have reached different conclusions (<xref ref-type="bibr" rid="bib30">Guttman et al., 2013</xref>). Many lncRNAs have the same structure as classical mRNAs: they are transcribed by polymerase II, capped and polyadenylated, and accumulate in the cytoplasm (<xref ref-type="bibr" rid="bib89">van Heesch et al., 2014</xref>). 
However, in contrast to typical protein-coding genes, they tend to contain few introns, are expressed at low levels, exhibit weak sequence constraints, and show limited phylogenetic conservation (<xref ref-type="bibr" rid="bib8">Cabili et al., 2011</xref>; <xref ref-type="bibr" rid="bib16">Derrien et al., 2012</xref>; <xref ref-type="bibr" rid="bib46">Kutter et al., 2012</xref>; <xref ref-type="bibr" rid="bib60">Necsulea et al., 2014</xref>).</p><p>The association of lncRNAs with ribosomes, and the fact that many of them appear to have arisen relatively recently in evolution, indicate that they could be an important source of new peptides. Levine et al., who described the first examples of de novo originated genes in <italic>Drosophila melanogaster</italic>, already noted that non-coding RNAs expressed at low levels could contribute to the birth of novel protein coding genes (<xref ref-type="bibr" rid="bib50">Levine et al., 2006</xref>). Cai et al. found a new protein coding gene in <italic>Saccharomyces cerevisiae</italic> likely to have been formed from a previously transcribed non-coding sequence (<xref ref-type="bibr" rid="bib9">Cai et al., 2008</xref>). Wilson and Masel observed that ribosome profiling reads from a yeast experiment often mapped to intergenic transcripts (<xref ref-type="bibr" rid="bib93">Wilson and Masel, 2011</xref>), and they proposed that this could help provide the raw material for the birth of new protein-coding genes. Another study in yeast found evidence of translation of short species-specific ORFs located in non-genic regions (<xref ref-type="bibr" rid="bib12">Carvunis et al., 2012</xref>). 
More generally, it is important to consider that de novo protein-coding gene evolution, which was once thought to be a very rare event, is now believed to be relatively common (<xref ref-type="bibr" rid="bib42">Khalturin et al., 2009</xref>; <xref ref-type="bibr" rid="bib85">Toll-Riera et al., 2009</xref>; <xref ref-type="bibr" rid="bib84">Tautz and Domazet-Lošo, 2011</xref>; <xref ref-type="bibr" rid="bib54">Long et al., 2013</xref>; <xref ref-type="bibr" rid="bib74">Reinhardt et al., 2013</xref>). Recently emerged proteins tend to be very short and evolve under weak evolutionary constraints (<xref ref-type="bibr" rid="bib1">Albà and Castresana, 2005</xref>; <xref ref-type="bibr" rid="bib50">Levine et al., 2006</xref>; <xref ref-type="bibr" rid="bib10">Cai et al., 2009</xref>; <xref ref-type="bibr" rid="bib51">Liu et al., 2010</xref>; <xref ref-type="bibr" rid="bib95">Xie et al., 2012</xref>; <xref ref-type="bibr" rid="bib66">Palmieri et al., 2014</xref>), properties that we also expect to find in the putative ORFs of lncRNAs.</p><p>The idea that lncRNAs serve as a repository for the evolution of new peptides is appealing but the evidence is still fragmented. In this study, we have analyzed ribosome profiling experiments performed in six different species and measured the sequence coding potential and selective constraints of the putatively translated ORFs in lncRNAs and codRNAs. We have discovered that lncRNAs show very similar characteristics to evolutionary young protein coding genes (lineage-specific proteins). 
The results strongly support a role for lncRNAs in the production of new peptides.</p></sec><sec id="s2" sec-type="results"><title>Results</title><sec id="s2-1"><title>Characterization of coding and long non-coding transcripts</title><p>We obtained polyA+ RNA and ribosome profiling sequencing data from six different published experiments performed in diverse eukaryotic species, mouse (<italic>Mus musculus</italic>), human (<italic>Homo sapiens</italic>, HeLa cells), zebrafish (<italic>Danio rerio</italic>), fruit fly (<italic>D. melanogaster</italic>), <italic>Arabidopsis</italic> (<italic>A. thaliana),</italic> and yeast (<italic>S. cerevisiae</italic>) (<xref ref-type="table" rid="tbl1">Table 1</xref>). After read mapping and transcript assembly, we classified the expressed transcripts longer than 200 nucleotides into coding and long non-coding classes (codRNAs and lncRNAs, respectively, <xref ref-type="table" rid="tbl2">Table 2</xref>).<table-wrap id="tbl1" position="float"><object-id pub-id-type="doi">10.7554/eLife.03523.003</object-id><label>Table 1.</label><caption><p>Data sets used in the study</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.003">http://dx.doi.org/10.7554/eLife.03523.003</ext-link></p></caption><table frame="hsides" rules="groups"><thead><tr><th colspan="2">Species</th><th>GEO Accession</th><th>Mapped reads (millions)</th><th>Max read length (bp)</th><th>Description</th><th>Reference</th></tr></thead><tbody><tr><td rowspan="2">Mouse <italic>M. musculus</italic></td><td>RNA-seq</td><td>GSE30839</td><td>226.0</td><td>43</td><td rowspan="2">ES cells, E14</td><td rowspan="2"><xref ref-type="bibr" rid="bib37">Ingolia et al., 2011</xref></td></tr><tr><td>Ribosome profiling</td><td>GSE30839</td><td>39.2</td><td>47</td></tr><tr><td rowspan="2">Human <italic>H. 
sapiens</italic></td><td>RNA-seq</td><td>GSE22004</td><td>29.8</td><td>36</td><td rowspan="2">HeLa cells</td><td rowspan="2"><xref ref-type="bibr" rid="bib28">Guo et al., 2010</xref></td></tr><tr><td>Ribosome profiling</td><td>GSE22004</td><td>78.3</td><td>36</td></tr><tr><td rowspan="2">Zebrafish <italic>D. rerio</italic></td><td>RNA-seq</td><td>GSE32900</td><td>1382.2</td><td>2 × 75</td><td rowspan="2">Series of developmental stages</td><td rowspan="2"><xref ref-type="bibr" rid="bib14">Chew et al., 2013</xref></td></tr><tr><td>Ribosome profiling</td><td>GSE46512</td><td>1040.0</td><td>44</td></tr><tr><td rowspan="2">Fruit fly <italic>D. melanogaster</italic></td><td>RNA-seq</td><td>GSE49197</td><td>1317.9</td><td>50</td><td rowspan="2">0–2hr embryos, wild type</td><td rowspan="2"><xref ref-type="bibr" rid="bib20">Dunn et al., 2013</xref></td></tr><tr><td>Ribosome profiling</td><td>GSE49197</td><td>105.7</td><td>50</td></tr><tr><td rowspan="2">Arabidopsis <italic>A. thaliana</italic></td><td>RNA-seq</td><td>GSE50597</td><td>79.8</td><td>51</td><td rowspan="2">No stress conditions, TRAP purification</td><td rowspan="2"><xref ref-type="bibr" rid="bib39">Juntawong et al., 2014</xref></td></tr><tr><td>Ribosome profiling</td><td>GSE50597</td><td>140.3</td><td>51</td></tr><tr><td rowspan="2">Yeast <italic>S. 
cerevisiae</italic></td><td>RNA-seq</td><td>GSE52119</td><td>20.54</td><td>50</td><td rowspan="2">GSY83, diploid</td><td rowspan="2"><xref ref-type="bibr" rid="bib57">McManus et al., 2014</xref></td></tr><tr><td>Ribosome profiling</td><td>GSE52119</td><td>6.83</td><td>50</td></tr></tbody></table></table-wrap><table-wrap id="tbl2" position="float"><object-id pub-id-type="doi">10.7554/eLife.03523.004</object-id><label>Table 2.</label><caption><p>Fraction of transcripts associated with ribosomes</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.004">http://dx.doi.org/10.7554/eLife.03523.004</ext-link></p></caption><table frame="hsides" rules="groups"><thead><tr><th/><th colspan="3">codRNA</th><th colspan="3">lncRNA</th></tr><tr><th/><th>Expressed</th><th colspan="2">Associated with ribosomes (RP)</th><th>Expressed</th><th colspan="2">Associated with ribosomes (RP)</th></tr><tr><th/><th/><th>Total</th><th>Stringent</th><th/><th>Total</th><th>Stringent</th></tr></thead><tbody><tr><td>Mouse</td><td>14,245</td><td align="char" char="(">14,196 (99.7%)</td><td align="char" char="(">13,918 (97.7%)</td><td>476</td><td align="char" char="(">390 (81.9%)</td><td align="char" char="(">367 (77.1%)</td></tr><tr><td>Human</td><td>17,011</td><td align="char" char="(">16,630 (97.8%)</td><td align="char" char="(">16,617 (97.7%)</td><td>934</td><td align="char" char="(">403 (43.1%)</td><td align="char" char="(">343 (36.7%)</td></tr><tr><td>Zebrafish</td><td>12,595</td><td align="char" char="(">11,643 (92.4%)</td><td align="char" char="(">11,637 (92.4%)</td><td>2392</td><td align="char" char="(">726 (30.4%)</td><td align="char" char="(">684 (28.6%)</td></tr><tr><td>Fruit fly</td><td>8041</td><td align="char" char="(">8031 (99.9%)</td><td align="char" char="(">7623 (94.8%)</td><td>28</td><td align="char" char="(">22 (78.6%)</td><td align="char" char="(">10 (35.7%)</td></tr><tr><td>Arabidopsis</td><td>19,162</td><td align="char" char="(">18,879 
(98.5%)</td><td align="char" char="(">10,329 (53.9%)</td><td>139</td><td align="char" char="(">93 (66.9%)</td><td align="char" char="(">68 (48.9%)</td></tr><tr><td>Yeast</td><td>4740</td><td align="char" char="(">4547 (95.9%)</td><td align="char" char="(">4335 (91.5%)</td><td>21</td><td align="char" char="(">6 (28.6%)</td><td align="char" char="(">6 (28.6%)</td></tr></tbody></table><table-wrap-foot><fn><p>Stringent: number of transcripts significant at p &lt; 0.05 using 3′UTRs as a null model (see ‘Materials and methods’ for more details).</p></fn></table-wrap-foot></table-wrap></p><p>We detected hundreds of annotated lncRNAs in the vertebrate species (mouse, human and zebrafish), the number being lower (&lt;150) in the other species (fruit fly, <italic>Arabidopsis</italic> and yeast). In addition, we identified a large number of novel lncRNAs not annotated in the databases, 2488 taking all species together (<xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1A</xref>). The inclusion of such lncRNAs resulted in a sixfold increase in the number of lncRNAs amenable for study in zebrafish and a twofold increase in mouse. In yeast, we only found two annotated lncRNAs, but there were 19 novel ones. In the majority of the analyses, we merged the annotated and the novel lncRNAs.</p><p>As expected, lncRNAs tended to be much shorter than codRNAs in all the species studied (<xref ref-type="fig" rid="fig1">Figure 1A</xref>). We found that most lncRNAs contained at least one short ORF (≥24 amino acids) and often several ORFs. The average ORF size in lncRNAs was between 43 and 68 amino acids depending on the species (<xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1B</xref>). 
Consistent with previous studies, lncRNAs were expressed at significantly lower levels than codRNAs (<xref ref-type="fig" rid="fig1">Figure 1B</xref>, Wilcoxon test, p &lt; 10<sup>−5</sup>).<fig id="fig1" position="float"><object-id pub-id-type="doi">10.7554/eLife.03523.005</object-id><label>Figure 1.</label><caption><title>General characteristics of codRNA and lncRNA transcripts.</title><p>(<bold>A</bold>) Density plots of transcript length. (<bold>B</bold>) Box-plots of transcript expression level in log2(FPKM) units. lncRNA_ribo: lncRNAs associated with ribosomes; lncRNA_noribo: lncRNAs for which association with ribosomes was not detected. codRNA: coding transcripts encoding experimentally validated proteins except for zebrafish in which all transcripts annotated as coding were considered. The area within the box-plot comprises 50% of the data and the line represents the median value. In all studied species, codRNAs were expressed at higher levels than lncRNAs (Wilcoxon test, p &lt; 10<sup>−5</sup>), and lncRNA_ribo at higher levels than lncRNA_noribo (Wilcoxon test, p &lt; 0.005).</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.005">http://dx.doi.org/10.7554/eLife.03523.005</ext-link></p></caption><graphic xlink:href="elife-03523-fig1-v1.tif"/></fig></p></sec><sec id="s2-2"><title>Efficient detection of translation events by ribosome profiling</title><p>The analysis of ribosome profiling sequencing data showed that the percentage of expressed coding transcripts associated with ribosomes was &gt;90% in all species, with the highest values (&gt;99%) in mouse and fruit fly (<xref ref-type="table" rid="tbl2">Table 2</xref>). 
Pseudogenes had a lower rate of association with ribosomes than coding RNAs, but surprisingly, in species with many annotated pseudogenes, such as human, mouse, and <italic>Arabidopsis</italic>, the majority of them showed association with ribosomes (<xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1A</xref>). This appeared to be a true signal; while pseudogenes will typically show sequence similarity to other functional copies in the genome, we only considered uniquely mapped reads with no mismatches.</p><p>Ribosome profiling is based on deep sequencing, and thus provides an unmatched level of resolution of the translated peptides when compared with current proteomics techniques. This is especially important for short proteins, which are difficult to detect by standard mass spectrometry methods (<xref ref-type="bibr" rid="bib79">Slavoff et al., 2013</xref>). We used the ribosome-associated protein-coding RNA data to investigate the relationship between peptide detection by proteomics and protein length. 
We found that human and mouse translated proteins between 24 and 80 amino acids long were more difficult to identify in proteomics databases than longer proteins (<xref ref-type="table" rid="tbl3">Table 3</xref>).<table-wrap id="tbl3" position="float"><object-id pub-id-type="doi">10.7554/eLife.03523.006</object-id><label>Table 3.</label><caption><p>Fraction of translated proteins of different size detected in proteomics databases</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.006">http://dx.doi.org/10.7554/eLife.03523.006</ext-link></p></caption><table frame="hsides" rules="groups"><thead><tr><th/><th colspan="4">Protein size (amino acids)</th></tr></thead><tbody><tr><td>Species</td><td>24–80</td><td>81–130</td><td>131–180</td><td>&gt;180</td></tr><tr><td>Mouse</td><td>27/58 (46.6%)</td><td>222/286 (77.6%)</td><td>256/330 (77.6%)</td><td>3716/4786 (77.7%)</td></tr><tr><td>Human</td><td>116/272 (42.6%)</td><td>536/748 (71.7%)</td><td>669/875 (76.5%)</td><td>6757/8964 (75.4%)</td></tr><tr><td>Yeast</td><td>27/30 (90.0%)</td><td>168/207 (81.1%)</td><td>234/265 (88.3%)</td><td>2934/3224 (91.0%)</td></tr></tbody></table><table-wrap-foot><fn><p>Only transcripts encoding experimentally validated proteins (codRNAe) were considered.</p></fn></table-wrap-foot></table-wrap></p></sec><sec id="s2-3"><title>Long non-coding RNA transcripts frequently associate with ribosomes</title><p>The percentage of lncRNAs scanned by ribosomes (lncRNA_ribo) was surprisingly high in all the species studied (<xref ref-type="table" rid="tbl2">Table 2</xref>). The values ranged from 28.6% in yeast to 81.9% in mouse. This affected the main lncRNA classes described in Ensembl v. 70, including long intervening non-coding RNAs (lincRNAs) or antisense transcripts (<xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1C</xref>). 
Short transcript size may hinder ribosome association detection (<xref ref-type="bibr" rid="bib4a">Aspden et al., 2014</xref>). We also found that the ribosome profiling signal was more difficult to detect in poorly expressed transcripts than in highly expressed ones, both for lncRNAs and codRNAs (<xref ref-type="fig" rid="fig2">Figure 2</xref>). As lncRNAs tend to be expressed at low levels and are short when compared to codRNAs (<xref ref-type="fig" rid="fig1">Figure 1</xref>), we might be underestimating their association with ribosomes. <fig id="fig2" position="float"><object-id pub-id-type="doi">10.7554/eLife.03523.007</object-id><label>Figure 2.</label><caption><title>Effect of transcript expression level on the detection of ribosome association.</title><p>The percentage of transcripts associated with ribosomes is shown for several transcript expression intervals. codRNA: annotated coding transcripts encoding experimentally verified proteins (except in zebrafish for which all coding transcripts were considered). lncRNA: annotated and novel long non-coding RNAs. Only species with at least 20 transcripts in each expression bin were plotted. In the remaining species, the data were consistent with the trends shown.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.007">http://dx.doi.org/10.7554/eLife.03523.007</ext-link></p></caption><graphic xlink:href="elife-03523-fig2-v1.tif"/></fig></p><p>In order to determine if the ribosome profiling signal in lncRNAs was different from noise, we compared ribosome density in the transcripts to that in 3′ untranslated regions (3′UTRs). More specifically, the null model consisted of a size-matched set of sequences containing randomly selected 3′UTRs from annotated coding transcripts. Ribosome density was calculated as the number of ribosome profiling reads divided by the number of RNA-seq reads, a ratio defined as translational efficiency (TE) (<xref ref-type="bibr" rid="bib37">Ingolia et al., 2011</xref>). 
Both codRNAs and lncRNAs displayed much higher TE values than 3′UTRs in all species studied (Wilcoxon test, p &lt; 10<sup>−5</sup>, <xref ref-type="fig" rid="fig3">Figure 3</xref>). We could reject the null model for 90.12% of the lncRNAs and 87.19% of the codRNAs associated with ribosomes (p &lt; 0.05) (see details by species in <xref ref-type="table" rid="tbl2">Table 2</xref>, Stringent set). Therefore, we concluded that the density of ribosomes in lncRNAs is much higher than expected from spurious ribosome binding.<fig id="fig3" position="float"><object-id pub-id-type="doi">10.7554/eLife.03523.008</object-id><label>Figure 3.</label><caption><title>TE distribution in human transcripts and 3′UTRs (null-model).</title><p>Cumulative distribution of TE values in human codRNAs, lncRNAs, and 3′UTR sequences. We randomly selected 3′UTRs with a minimum length of 30 nucleotides to build a set of 3′UTR sequences with the same size distribution as the complete transcripts.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.008">http://dx.doi.org/10.7554/eLife.03523.008</ext-link></p></caption><graphic xlink:href="elife-03523-fig3-v1.tif"/></fig></p><p>Next, we compared ribosome density in lncRNAs and codRNAs in each of the species, focusing on regions covered by ribosome profiling reads to account for any differences in the length of the putatively translated regions. In human, fruit fly, and yeast, TE was higher in codRNAs than in lncRNAs (Wilcoxon test, p &lt; 0.005), but in mouse and zebrafish the opposite trend was observed (Wilcoxon test, p &lt; 0.05) (<xref ref-type="fig" rid="fig4">Figure 4</xref>). Despite the differences between the species, which may be due to technical issues, it is clear that lncRNAs can show TE values that are similar to or even higher than those of codRNAs. 
The results were similar when we restricted the analysis to genes encoding a single transcript to avoid any possible biases due to multiple read mapping or when we employed the maximum TE in 90 nucleotide windows (<xref ref-type="fig" rid="fig4s1">Figure 4—figure supplement 1</xref>).<fig-group><fig id="fig4" position="float"><object-id pub-id-type="doi">10.7554/eLife.03523.009</object-id><label>Figure 4.</label><caption><title>Ribosome association profiles for codRNAs and lncRNAs.</title><p>Box-plots of transcript translational efficiency (TE) in log2(TE) units. The area within the box-plot comprises 50% of the data, and the line represents the median value. lncRNA: lncRNAs for which association with ribosomes was detected. codRNA: coding RNAs transcripts encoding experimentally validated proteins except for zebrafish in which all transcripts annotated as coding were considered.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.009">http://dx.doi.org/10.7554/eLife.03523.009</ext-link></p></caption><graphic xlink:href="elife-03523-fig4-v1.tif"/></fig><fig id="fig4s1" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.010</object-id><label>Figure 4—figure supplement 1.</label><caption><title>Additional translational efficiency (TE) measures.</title><p>Single isoforms correspond to data for genes with a single transcript. The number of such genes was 2961 codRNA and 246 lncRNA_ribo for mouse, 2853 codRNA and 150 lncRNA_ribo for human, 9352 codRNA and 412 lncRNA_ribo for zebrafish, 836 codRNA and 18 lncRNA_ribo for fruit fly, and 3024 codRNA and 92 lncRNA_ribo for Arabidopsis. In the case of yeast, all genes were taken. 
TE max is the maximum TE value taking 90 nucleotide windows.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.010">http://dx.doi.org/10.7554/eLife.03523.010</ext-link></p></caption><graphic xlink:href="elife-03523-fig4-figsupp1-v1.tif"/></fig></fig-group></p><p>For comparison, we collected a set of 29 human genes with non-coding functions described in several recent reviews (<xref ref-type="supplementary-material" rid="SD2-data">Supplementary file 2A</xref>; <xref ref-type="bibr" rid="bib70">Ponting et al., 2009</xref>; <xref ref-type="bibr" rid="bib87">Ulitsky and Bartel, 2013</xref>; <xref ref-type="bibr" rid="bib24">Fatica and Bozzoni, 2014</xref>). Many of these genes play roles in the regulation of gene expression in the nucleus and are thus unlikely to be translated. We only detected expression for five of these genes: <italic>Malat1</italic>, <italic>Pvt1</italic>, <italic>Neat1</italic>, <italic>Meg8</italic>, and <italic>Cyrano</italic>. Transcripts encoded by the first three genes showed ribosome association. In the case of <italic>Malat1,</italic> this was also consistently observed in mouse and zebrafish (in the latter species <italic>Malat1</italic> was identified as a novel transcript) and in the case of <italic>Pvt1</italic> in mouse. Given the small number of expressed transcripts, we could not draw any general conclusions for this set.</p></sec><sec id="s2-4"><title>lncRNAs show similar ribosome protection profiles to codRNAs</title><p>The exact positions of ribosome profiling reads on the RNA can be used to delineate the regions that are being actively translated or to discover new functional ORFs (<xref ref-type="bibr" rid="bib14">Chew et al., 2013</xref>; <xref ref-type="bibr" rid="bib30">Guttman et al., 2013</xref>; <xref ref-type="bibr" rid="bib35">Ingolia, 2014</xref>). 
Because the ribosome is released after encountering a stop codon, this technique can also be employed to identify novel C-terminal protein extensions (<xref ref-type="bibr" rid="bib20">Dunn et al., 2013</xref>) or to evaluate if a predicted ORF is likely to correspond to a translated peptide (<xref ref-type="bibr" rid="bib30">Guttman et al., 2013</xref>). We next aimed to compare the TE values in different transcript regions, including open reading frames (ORFs), putative 5′ and 3′ untranslated regions (UTRs), and the regions between ORFs.</p><p>In order to obtain an unbiased picture, it was important to define the different regions in the same way in lncRNAs and codRNAs. In typical codRNAs there is a main translated ORF that covers a large fraction of the transcript, sometimes accompanied by short upstream ORFs in the 5′UTR (<xref ref-type="bibr" rid="bib14">Chew et al., 2013</xref>). However, lncRNAs may potentially encode several short peptides (<xref ref-type="bibr" rid="bib37">Ingolia et al., 2011</xref>). The minimum size of ORFs was set at 24 amino acids (75 nucleotides counting the STOP codon), as peptides of this size have been identified in genetic screen studies in humans (<xref ref-type="bibr" rid="bib33">Hashimoto et al., 2001</xref>). To simplify the comparisons, we employed the same ORF size cut-off in all species. We considered both a primary ORF, defined as the ORF with the largest number of ribosome profiling reads, and any additional non-overlapping ORFs that mapped to ribosome profiling reads (rest of ORFs).</p><p>In codRNAs, the primary ORF showed a nearly perfect degree of agreement with the annotated protein, indicating that it was an appropriate metric for the main translated product. Primary ORFs in lncRNAs typically occupied a shorter fraction of the transcript than in codRNAs (<xref ref-type="fig" rid="fig5">Figure 5A</xref>). 
The relative length of the ORF with respect to transcript length did not seem to be a strong predictor of ribosome association, as it did not help distinguish lncRNAs associated with ribosomes (lncRNA_ribo) from those not associated with ribosomes (lncRNA_noribo). In lncRNAs, most of the primary ORFs corresponded to proteins less than 100 amino acids long (<xref ref-type="fig" rid="fig5s1">Figure 5—figure supplement 1</xref>).<fig-group><fig id="fig5" position="float"><object-id pub-id-type="doi">10.7554/eLife.03523.011</object-id><label>Figure 5.</label><caption><title>Ribosome association in different transcript regions.</title><p>(<bold>A</bold>) Density plot of the relative length of the primary ORF in lncRNA_ribo and codRNA with respect to transcript length. For comparison, data for the longest ORF in lncRNA_noribo are also shown (except for fruit fly due to insufficient data). (<bold>B</bold>) Box-plots of TE distribution in primary ORF, 5′UTR, and 3′UTR regions. The area within the box-plot comprises 50% of the data, and the line represents the median value. The analysis considered all transcripts with 5′UTR and 3′UTR longer than 30 nucleotides and &gt;0.2 FPKM in all three regions. The number of transcripts was 1956 codRNA and 159 lncRNA_ribo in mouse, 3558 codRNA and 139 lncRNA_ribo in human, 5216 codRNA and 252 lncRNA_ribo in zebrafish, and 2019 codRNA and 33 lncRNA_ribo in Arabidopsis. (<bold>C</bold>) Box-plots of TE distribution in primary ORFs, rest of ORFs with ribosome profiling reads and non-ORF regions (interORF). The analysis considered all transcripts with at least two ORFs and more than 30 nucleotides interORF. The number of transcripts was 3264 codRNA and 204 lncRNA_ribo in mouse, 3104 codRNA and 168 lncRNA_ribo in human, 1646 codRNA and 212 lncRNA_ribo in zebrafish, and 1098 codRNA and 25 lncRNA_ribo in Arabidopsis. 
Fruit fly and yeast were not included in the last two analyses due to insufficient data (&lt;8 lncRNA_ribo meeting the conditions).</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.011">http://dx.doi.org/10.7554/eLife.03523.011</ext-link></p></caption><graphic xlink:href="elife-03523-fig5-v1.tif"/></fig><fig id="fig5s1" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.012</object-id><label>Figure 5—figure supplement 1.</label><caption><title>Absolute nucleotide length of ORFs in different kinds of transcripts.</title><p>In codRNAs and lncRNA_ribo, we selected the primary ORF (the ORF with the largest number of ribosome profiling reads), whereas in lncRNA_noribo we selected the longest ORF.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.012">http://dx.doi.org/10.7554/eLife.03523.012</ext-link></p></caption><graphic xlink:href="elife-03523-fig5-figsupp1-v1.tif"/></fig><fig id="fig5s2" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.013</object-id><label>Figure 5—figure supplement 2.</label><caption><title>Translational efficiency in single-isoform genes.</title><p>(<bold>A</bold>) Box-plots of TE distribution in primary ORF, 5′UTR, and 3′UTR regions. The analysis considered only genes with one isoform, with UTR and ORF regions expressed at &gt;0.2 FPKM and with 5′UTR and 3′UTR longer than 30 nucleotides. The number of transcripts was 980 codRNA and 97 lncRNA_ribo in mouse, 758 codRNA and 36 lncRNA_ribo in human, 3763 codRNA and 117 lncRNA_ribo in zebrafish, and 1495 codRNA and 32 lncRNA_ribo in Arabidopsis. (<bold>B</bold>) Box-plots of TE distribution in primary ORFs, other ORFs with ribosome profiling reads and non-ORF regions (interORFs). The analysis only considered genes with one isoform in which these regions were longer than 30 nucleotides and with expression &gt;0.2 FPKM. 
The number of transcripts was 1691 codRNA and 113 lncRNA_ribo in mouse, 763 codRNA and 54 lncRNA_ribo in human, 1170 codRNA and 108 lncRNA_ribo in zebrafish, and 817 codRNA and 25 lncRNA_ribo in Arabidopsis.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.013">http://dx.doi.org/10.7554/eLife.03523.013</ext-link></p></caption><graphic xlink:href="elife-03523-fig5-figsupp2-v1.tif"/></fig><fig id="fig5s3" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.014</object-id><label>Figure 5—figure supplement 3.</label><caption><title>Translational efficiency in annotated transcripts.</title><p>(<bold>A</bold>) Box-plots of TE distribution in primary ORF, 5′UTR, and 3′UTR regions. The analysis considered only annotated transcripts, with UTR and ORF regions expressed at &gt;0.2 FPKM and with 5′UTR and 3′UTR longer than 30 nucleotides. The number of transcripts was 1956 codRNA and 92 lncRNA_ribo in mouse, 3558 codRNA and 138 lncRNA_ribo in human, 5216 codRNA and 54 lncRNA_ribo in zebrafish, and 2019 codRNA and 22 lncRNA_ribo in Arabidopsis. (<bold>B</bold>) Box-plots of TE distribution in primary ORFs, other ORFs with ribosome profiling reads (rest ORFs) and non-ORF regions (interORF). The analysis only considered annotated transcripts in which these regions were longer than 30 nucleotides and with expression &gt;0.2 FPKM. 
The number of transcripts was 3264 codRNA and 128 lncRNA_ribo in mouse, 3104 codRNA and 167 lncRNA_ribo in human, 1646 codRNA and 58 lncRNA_ribo in zebrafish, and 1098 codRNA and 18 lncRNA_ribo in Arabidopsis.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.014">http://dx.doi.org/10.7554/eLife.03523.014</ext-link></p></caption><graphic xlink:href="elife-03523-fig5-figsupp3-v1.tif"/></fig><fig id="fig5s4" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.015</object-id><label>Figure 5—figure supplement 4.</label><caption><title>Translational efficiency in transcripts expressed at different levels.</title><p>We restricted this analysis to transcripts with ORF and UTR regions expressed at &gt;0.2 FPKM and with 5′UTR and 3′UTR longer than 30 nucleotides. (<bold>A</bold>) Expressed at low levels: transcripts expressed at 0.5–2 FPKM, (<bold>B</bold>) expressed at high levels: transcripts expressed at 2–10 FPKM. codRNAs were sampled in such a way as to have the same gene expression distribution as the corresponding lncRNA set. Results for species in which all sets contained at least 20 transcripts are shown.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.015">http://dx.doi.org/10.7554/eLife.03523.015</ext-link></p></caption><graphic xlink:href="elife-03523-fig5-figsupp4-v1.tif"/></fig></fig-group></p><p>Next, we focused our attention on the differences between the primary ORF and the 5′UTR and 3′UTR regions in codRNAs and lncRNAs. We defined the 3′ untranslated region (3′UTR) as the sequence located immediately after the STOP codon of the primary ORF or the most downstream ORF associated with ribosomes. We used the same criteria to define the 5′UTR upstream from the initiation codon. 
In this analysis, we included all transcripts containing at least one ORF associated with ribosomes (the primary ORF) and UTR regions long enough to detect ribosome profiling reads (&gt;30 nucleotides); insufficient data for fruit fly and yeast precluded the analysis for these species. In both codRNAs and lncRNAs, the 5′UTR showed a ribosome density (translational efficiency, TE) comparable to that of the primary ORF (<xref ref-type="fig" rid="fig5">Figure 5B</xref>). In contrast, the 3′UTR showed very little ribosome association and often we could not find a single read mapping to this region (31–91% of cases in codRNAs and 46–68% in lncRNAs). Using genes with a single isoform or considering only annotated transcripts produced similar results (<xref ref-type="fig" rid="fig5s2 fig5s3">Figure 5—figure supplements 2 and 3</xref>). We also controlled for expression level by dividing the data set into transcripts with low (0.5–2 FPKM) and high expression (&gt;2 FPKM), and by sampling the codRNAs in such a way as to have an expression distribution similar to that of lncRNAs. The results were very similar to those obtained with the complete data set (<xref ref-type="fig" rid="fig5s4">Figure 5—figure supplement 4</xref>), indicating that the analysis is robust to transcript expression differences.</p><p>As transcripts may contain several ORFs, we performed a separate analysis in which we compared the translational efficiency of the primary ORF, any additional ORFs with mapped ribosome profiling reads, and the regions between ribosome-protected ORFs (interORF) (<xref ref-type="fig" rid="fig5">Figure 5C</xref>). InterORF regions showed little signal when compared to the primary ORF, both in codRNAs and lncRNAs (Wilcoxon test, p &lt; 10<sup>−9</sup> in human, mouse, and zebrafish, p &lt; 0.05 in <italic>Arabidopsis</italic>; insufficient data for fruit fly and yeast precluded the analysis for these species). 
The data also indicated that ribosome binding is not always restricted to the primary ORF, especially in lncRNAs, as ribosome protection could sometimes be observed for additional ORFs.</p><p>Taken together, these results indicate that lncRNAs have ribosome profiling signatures consistent with translation, with a strong decrease of ribosome density in the 3′UTR but not the 5′UTR region, and preferential binding of ribosomes to the primary ORF. There exists the possibility that the translated peptides are degraded soon after being produced. However, we estimate that the percentage of cases that may undergo nonsense-mediated decay (NMD, see ‘Materials and methods’ for more details) is low, between 4.47 and 14.11% depending on the species. For comparison, the percentage for protein-coding transcripts showing the same patterns (including transcripts annotated as NMD in Ensembl) is between 0.34 and 13.33%.</p></sec><sec id="s2-5"><title>lncRNAs are less conserved than codRNAs</title><p>Are the putatively translated ORFs in lncRNAs conserved? We performed sequence similarity searches using BLASTP (E-value &lt; 10<sup>−4</sup>) against all annotated coding transcripts in Ensembl, as well as against the primary ORFs in lncRNAs, for the six species studied here (<xref ref-type="supplementary-material" rid="SD1-data SD2-data">Supplementary files 1D and 2B</xref>). The number of lncRNA_ribo with homologues in other species was remarkably low (0–15.6%) except for zebrafish (49.4%). In contrast, the majority of codRNAs had homologues in other species (&gt;95% for vertebrates and fruit fly and 70–73% for <italic>Arabidopsis</italic> and yeast). 
After we discarded lncRNAs that showed cross-species conservation, association with ribosomes was still very prevalent (80.4% of mouse, 40.3% of human, and 22.1% of zebrafish lncRNAs were associated with ribosomes).</p><p>We also investigated whether the ribosome-associated ORFs in lncRNAs showed homology to annotated proteins in the same species. The values were very low for all the species (0–12.4%) except for zebrafish (47.5%). Therefore, in general lncRNAs are not truncated duplicated copies (pseudogenes). The case of zebrafish is an exception probably because of missing protein-coding annotations in this species.</p></sec><sec id="s2-6"><title>Coding properties of ribosome-protected ORFs in lncRNAs</title><p>Subsequently, we compared the sequence coding properties of the primary ORF in lncRNAs with those in <italic>bona fide</italic> coding and non-coding sequences using a hexamer-based coding score (see ‘Materials and methods’). In all species the coding scores of the primary ORF in lncRNAs, while lower than that of codRNAs, were significantly higher than the coding score of ORFs in introns (<xref ref-type="fig" rid="fig6">Figure 6</xref>, Wilcoxon test lncRNA_ribo vs intron, human, mouse, zebrafish, and <italic>Arabidopsis</italic> p &lt; 10<sup>−16</sup>; fruit fly and yeast p &lt; 10<sup>−5</sup>). This clearly shows that ORFs in lncRNAs are more coding-like than random ORFs. We repeated the same comparison using 100 different randomly sampled intronic sequence sets, and in &gt;95% of the cases, we obtained the same result. lncRNAs associated with ribosomes (lncRNA_ribo) showed higher coding scores than those not associated with ribosomes (lncRNA_noribo), even when we did not use the ribosome profiling information and compared the longest ORF in both types of transcripts (<xref ref-type="fig" rid="fig6s1">Figure 6—figure supplement 1</xref>). 
We reached similar conclusions when we restricted the analysis to annotated lncRNA transcripts (<xref ref-type="fig" rid="fig6s2">Figure 6—figure supplement 2</xref>), when we used ORFs from gene deserts as an alternative non-coding sequence set (differences with lncRNAs significant by Wilcoxon test, p &lt; 10<sup>−16</sup>, see ‘Materials and methods’ for more details), and when we restricted the analysis to lncRNAs for which we did not find protein coding homologues in the other species studied (<xref ref-type="fig" rid="fig6s3">Figure 6—figure supplement 3</xref>). Because a high proportion of lncRNAs contained small ORFs, we repeated the comparison only considering transcripts with ORFs shorter than 100 amino acids to avoid any length biases, again obtaining similar results (<xref ref-type="fig" rid="fig6s4">Figure 6—figure supplement 4</xref>). The use of other coding scores, for example based on codon frequencies instead of hexamer frequencies, or related metrics such as GC content, produced consistent results (<xref ref-type="fig" rid="fig6s5">Figure 6—figure supplement 5</xref>; <xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1E</xref>).<fig-group><fig id="fig6" position="float"><object-id pub-id-type="doi">10.7554/eLife.03523.016</object-id><label>Figure 6.</label><caption><title>Coding scores in ORFs from different types of transcripts.</title><p>Intron: randomly selected intronic regions; lncRNA_noribo: lncRNAs not associated with ribosomes; lncRNA_ribo: lncRNAs associated with ribosomes; pseudogene: pseudogenes associated with ribosomes; codRNAne: coding transcripts encoding non-validated proteins associated with ribosomes; codRNAe: coding transcripts encoding experimentally validated proteins. The coding score was calculated as the log ratio of hexamer frequencies in coding vs intronic sequences. In lncRNA_noribo and introns, we considered the longest ORF and in the rest of transcripts the primary ORF. 
The class ‘pseudogene’ was only included in species with more than 20 expressed pseudogenes with mapped ribosome profiling reads. The coding score of the primary ORF in lncRNAs (lncRNA_ribo) was significantly higher than the coding score in ORFs defined in introns (Wilcoxon test, human, mouse, zebrafish, and Arabidopsis p &lt; 10<sup>−16</sup>; fruit fly and yeast p &lt; 10<sup>−4</sup>) and in lncRNA_ribo it was significantly higher than in lncRNA_noribo in four species (Wilcoxon test, human, mouse and zebrafish p &lt; 10<sup>−5</sup>, and Arabidopsis p &lt; 0.05). Transcripts from genes of different evolutionary age were taken from the literature (see manuscript text). The number of transcripts was 68 for rodent, 127/123 for mammalian (mouse/human as reference species), 11,203/13,423/9812 for metazoan (mouse/human/zebrafish), 162 for fish, 208 for Cruciferae, 28 for <italic>S. cerevisiae</italic> and 84 for Saccharomyces. The youngest class of codRNAs displayed scores similar to those of lncRNA_ribo in mouse, zebrafish, and yeast (classes rodent, fish and <italic>S. cerevisiae</italic>, respectively), being only significantly higher in human and Arabidopsis (Wilcoxon test, p &lt; 0.005; classes primate and Cruciferae). We did not analyze young genes in fruit fly due to lack of a suitable young set of codRNAs in this species.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.016">http://dx.doi.org/10.7554/eLife.03523.016</ext-link></p></caption><graphic xlink:href="elife-03523-fig6-v1.tif"/></fig><fig id="fig6s1" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.017</object-id><label>Figure 6—figure supplement 1.</label><caption><title>Coding scores for the longest ORF.</title><p>Comparison between lncRNAs associated and not associated with ribosomes using the longest ORF in both cases (lncRNA_ribo and lncRNA_noribo, respectively). 
Differences between lncRNA_ribo and lncRNA_noribo are significant by a Wilcoxon test (p &lt; 10<sup>−10</sup> in human, mouse, and zebrafish; p &lt; 0.005 in Arabidopsis).</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.017">http://dx.doi.org/10.7554/eLife.03523.017</ext-link></p></caption><graphic xlink:href="elife-03523-fig6-figsupp1-v1.tif"/></fig><fig id="fig6s2" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.018</object-id><label>Figure 6—figure supplement 2.</label><caption><title>Coding scores in different classes of annotated sequences.</title><p>Comparison between different transcript classes using only annotated lncRNAs. The yeast transcriptome contains very few annotated lncRNAs, so this analysis could not be performed for this species.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.018">http://dx.doi.org/10.7554/eLife.03523.018</ext-link></p></caption><graphic xlink:href="elife-03523-fig6-figsupp2-v1.tif"/></fig><fig id="fig6s3" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.019</object-id><label>Figure 6—figure supplement 3.</label><caption><title>Coding scores in lncRNAs without homologues in other species.</title><p>Comparison between different transcript classes using only lncRNAs with no homologues (noH) in other species. 
Only species in which several lncRNA_ribo and lncRNA_noribo had homology matches were considered.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.019">http://dx.doi.org/10.7554/eLife.03523.019</ext-link></p></caption><graphic xlink:href="elife-03523-fig6-figsupp3-v1.tif"/></fig><fig id="fig6s4" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.020</object-id><label>Figure 6—figure supplement 4.</label><caption><title>Coding scores in small ORFs from different types of transcripts.</title><p>Here we only employed lncRNAs in which the primary ORF was shorter than 100 amino acids. codRNA refers to joined codRNAe and codRNAne sets, since experimentally verified proteins are usually longer than 100 amino acids. The number of transcripts is shown in red.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.020">http://dx.doi.org/10.7554/eLife.03523.020</ext-link></p></caption><graphic xlink:href="elife-03523-fig6-figsupp4-v1.tif"/></fig><fig id="fig6s5" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.021</object-id><label>Figure 6—figure supplement 5.</label><caption><title>Use of different coding statistics in human transcripts.</title><p>Equal dicodon was based on the observed hexamer frequencies in coding sequences vs hexamer equiprobability, intron dicodon was based on the differences between hexamer frequencies in coding vs non-coding sequences and intron_monocodon was based on the observed codon frequencies in coding sequences vs codon equiprobability.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.021">http://dx.doi.org/10.7554/eLife.03523.021</ext-link></p></caption><graphic xlink:href="elife-03523-fig6-figsupp5-v1.tif"/></fig><fig id="fig6s6" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.022</object-id><label>Figure 6—figure 
supplement 6.</label><caption><title>Ribosome protection patterns in transcripts containing short ORFs.</title><p>(<bold>A</bold>) Mouse CUFF.34338.1 (chr5:113183493–113188347) is a novel lncRNA; it contains an ORF encoding a 169 amino acid protein associated with ribosomes and with protein-coding homologues in human, zebrafish, and yeast. (<bold>B</bold>) ENSMUST00000107081 is an annotated codRNA in mouse which evolved recently since no homologues were found in any other species. It has a small ORF that translates a 55 amino acid protein. (<bold>C</bold>) AT1G34418.1 is an annotated lncRNA in Arabidopsis showing abundant association with ribosomes in the 5′UTR region, the primary ORF (34 amino acids) and the final region of the transcript, which contains two redundant ORFs (in red) coding the sequence: MGLGFVN(V/F)LLGM. RNAseq: profile of RNAseq reads. RPFs: profile of ribosome profiling reads. Exon-intron transcript structures are represented; the thickest boxes on the exons are the primary ORFs.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.022">http://dx.doi.org/10.7554/eLife.03523.022</ext-link></p></caption><graphic xlink:href="elife-03523-fig6-figsupp6-v1.tif"/></fig></fig-group></p><p>At the individual transcript level, a sizeable fraction of lncRNAs associated with ribosomes displayed significantly higher coding scores than expected for non-coding sequences (p &lt; 0.05 in all 100 intronic random sets; data in <xref ref-type="supplementary-material" rid="SD2-data">Supplementary file 2C</xref>; examples in <xref ref-type="fig" rid="fig6s6">Figure 6—figure supplement 6</xref>). 
These transcripts comprise 143 human lncRNAs (35.5% of the lncRNAs, score &gt; 0.0189), 137 mouse lncRNAs (35.1%, score &gt; 0.0377), 379 zebrafish lncRNAs (52.1%, score &gt; 0.0095), 7 fruit fly lncRNAs (31.8%, score &gt; −0.0483), 43 <italic>Arabidopsis</italic> lncRNAs (46.2%, score &gt; −0.0202), and 5 yeast lncRNAs (83.3%, score &gt; 0.03387). Annotated and novel lncRNAs were present in similar proportions in these sets, supporting the validity of our strategy of merging the two types of transcripts from the beginning. We also noted that the fraction of lncRNAs with coding homologues in other species increased in these sets. For example, whereas the proportion of total human lncRNA_ribo with homologues in other species was 15.6%, in the set with significant coding scores it was 29.3%. This number increased to 57.3% when we performed searches against the NCBI non-redundant peptide database ‘nr’, as some of the ORFs in lncRNAs are annotated as predicted peptides in this database.</p><p>If ORFs in lncRNAs are being translated, this is likely to be a relatively recent evolutionary event, as many lncRNAs are lineage-specific (<xref ref-type="bibr" rid="bib68">Pauli et al., 2012</xref>; <xref ref-type="bibr" rid="bib60">Necsulea et al., 2014</xref>; our data). It is well established that proteins of different evolutionary age display distinct sequence properties, including different codon usage (<xref ref-type="bibr" rid="bib85">Toll-Riera et al., 2009</xref>; <xref ref-type="bibr" rid="bib12">Carvunis et al., 2012</xref>; <xref ref-type="bibr" rid="bib66">Palmieri et al., 2014</xref>). 
We retrieved sets of annotated protein-coding transcripts of different evolutionary age from human, mouse, zebrafish, <italic>Arabidopsis</italic>, and yeast available from various studies (<xref ref-type="bibr" rid="bib22">Ekman and Elofsson, 2010</xref>; <xref ref-type="bibr" rid="bib19">Donoghue et al., 2011</xref>; <xref ref-type="bibr" rid="bib62">Neme and Tautz, 2013</xref>) and expressed in the systems studied here. We found that the coding score was always lower in the youngest group than in older groups (<xref ref-type="fig" rid="fig6">Figure 6</xref>, Wilcoxon test, p &lt; 0.05). Remarkably, the youngest codRNAs showed a very similar coding score distribution to lncRNAs (<xref ref-type="fig" rid="fig6">Figure 6</xref>). We obtained similar results when we discarded lncRNAs that had homologues in any of the other species (<xref ref-type="fig" rid="fig6s3">Figure 6—figure supplement 3</xref>).</p><p>We also collected information from young protein coding genes encoding experimentally verified proteins according to Swiss-Prot (<xref ref-type="supplementary-material" rid="SD2-data">Supplementary file 2D</xref>). We observed that these proteins were short and the ORF occupied a relatively small fraction of the transcript, features typically observed in lncRNAs. For example, the average size of proteins encoded by primate-specific transcripts was 148 amino acids and the average transcript coverage 47%. The coding score was remarkably low and again similar to that of lncRNAs (median 0.008 for primate-specific human transcripts, 0.046 for rodent-specific mouse transcripts, and 0.089 for yeast-specific coding transcripts).</p></sec><sec id="s2-7"><title>Selection pressure signatures in ORFs associated with ribosomes</title><p>An important measure of the strength of purifying selection acting on a coding sequence is the ratio between the number of non-synonymous and synonymous single nucleotide polymorphisms (PN/PS). 
Given the nature of the genetic code, there are more possible non-synonymous mutations than synonymous mutations. Under neutrality (no purifying selection), the PN/PS ratio is expected to be approximately 2.89 (<xref ref-type="bibr" rid="bib61">Nei and Gojobori, 1986</xref>).</p><p>Here, we applied the large amount of available polymorphism data for human, mouse, and zebrafish to compare the level of purifying selection in primary ORFs from codRNAs and lncRNAs (<xref ref-type="fig" rid="fig7">Figure 7</xref>; <xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1F</xref>). In general, human sequences showed higher PN/PS ratios than sequences from the other analyzed species, probably due to the presence of many slightly deleterious mutations segregating in the population (<xref ref-type="bibr" rid="bib23">Eyre-Walker, 2002</xref>). However, despite the intrinsic differences between organisms, we observed the same general trends. First, the PN/PS was significantly lower in codRNAs than in lncRNAs (proportion test, p &lt; 10<sup>−5</sup>), denoting stronger purifying selection in the former. Second, there was a very clear inverse relationship between the strength of purifying selection and the age of the gene (p &lt; 10<sup>−15</sup> between the youngest and rest of codRNAs in mouse and zebrafish), in agreement with previous studies (<xref ref-type="bibr" rid="bib53">Liu et al., 2008</xref>; <xref ref-type="bibr" rid="bib10">Cai et al., 2009</xref>). High PN/PS values were also observed in the subset of young genes encoding experimentally validated proteins in human (primate-specific transcripts median PN/PS of 3.10) and mouse (rodent-specific transcripts median PN/PS 1.42), confirming this tendency. Third, the distribution of PN/PS values in lncRNAs was very similar to that of young protein-coding genes. 
In human and mouse, there were no significant differences, and in the case of zebrafish the lncRNAs had even slightly lower PN/PS values than the fish-specific protein coding genes (p &lt; 0.01).<fig id="fig7" position="float"><object-id pub-id-type="doi">10.7554/eLife.03523.023</object-id><label>Figure 7.</label><caption><title>Selective pressure in ORFs from different types of transcripts.</title><p>PN/PS: ratio between the number of non-synonymous and synonymous single nucleotide polymorphisms (SNPs) in the complete set of primary ORFs for a given class of transcripts (in lncRNA_noribo the longest ORF was considered). In blue, data for different coding and non-coding transcript classes. In brown, data for different age codRNA classes. The bars represent the 95% confidence interval for the PN/PS value. For the species not shown, there were insufficient data to perform this analysis.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.023">http://dx.doi.org/10.7554/eLife.03523.023</ext-link></p></caption><graphic xlink:href="elife-03523-fig7-v1.tif"/></fig></p></sec></sec><sec id="s3" sec-type="discussion"><title>Discussion</title><p>Here, we analyzed the patterns of ribosome protection in polyA+ transcripts from cells belonging to six different eukaryotic species. Among the expressed transcripts, we identified many lncRNAs in the different species. The vast majority of transcripts annotated as coding showed association with ribosomes (&gt;92% in all species). Remarkably, a very large number of transcripts annotated as long non-coding RNAs (lncRNAs) also showed such association (30–82% depending on the data set). Considering that lncRNAs are typically much shorter and expressed at lower levels than codRNAs, which may hinder the identification of ribosome association, this is a very significant fraction. In addition, the patterns of ribosome protection along the transcript are similar to those of protein-coding genes. 
Therefore, many lncRNAs appear to be scanned by ribosomes and are likely to translate short peptides.</p><p>Long non-coding RNAs are classified as such in databases because, according to a number of criteria, they are unlikely to encode functional proteins. These criteria include the lack of a long ORF, the absence of amino acid sequence conservation, and the lack of known protein domains (<xref ref-type="bibr" rid="bib32">Harrow et al., 2012</xref>). Moreover, we expect lncRNAs not to have matches to proteomics databases, as this should classify them as coding. Annotated lncRNAs are typically longer than 200 nucleotides because this is the cutoff size normally implemented to differentiate them from other RNA classes such as microRNAs and small nuclear RNAs. In practice, it is difficult to classify a transcript as coding or non-coding on the basis of the ORF size (<xref ref-type="bibr" rid="bib17">Dinger et al., 2008</xref>). Some true coding sequences may be quite small, and by chance alone non-coding transcripts may have relatively long ORFs. The majority of lncRNAs contain ORFs longer than 24 amino acids, which can potentially correspond to real proteins. Short proteins are more difficult to detect than longer ones and consequently they are probably underestimated in databases. 
In recent years, the use of comparative genomics (<xref ref-type="bibr" rid="bib26">Frith et al., 2006</xref>; <xref ref-type="bibr" rid="bib47">Ladoukakis et al., 2011</xref>; <xref ref-type="bibr" rid="bib31">Hanada et al., 2013</xref>), proteomics (<xref ref-type="bibr" rid="bib79">Slavoff et al., 2013</xref>; <xref ref-type="bibr" rid="bib90">Vanderperre et al., 2013</xref>; <xref ref-type="bibr" rid="bib55">Ma et al., 2014</xref>), and a combination of evolutionary conservation and ribosome profiling data (<xref ref-type="bibr" rid="bib15">Crappé et al., 2013</xref>; <xref ref-type="bibr" rid="bib5">Bazzini et al., 2014</xref>) has shown that the number of short proteins is probably much higher than previously suspected (<xref ref-type="bibr" rid="bib3">Andrews and Rothnagel, 2014</xref>). In yeast, gene deletion experiments have provided evidence of functionality for short open reading frames (sORFs &lt; 100 amino acids) (<xref ref-type="bibr" rid="bib41">Kastenmayer et al., 2006</xref>); in zebrafish, several newly discovered sORFs appear to be involved in embryonic development (<xref ref-type="bibr" rid="bib67">Pauli et al., 2014</xref>), and other examples exist in insects (<xref ref-type="bibr" rid="bib56">Magny et al., 2013</xref>) and humans (<xref ref-type="bibr" rid="bib49">Lee et al., 2013</xref>; <xref ref-type="bibr" rid="bib78">Slavoff et al., 2014</xref>).
In many cases, the transcripts containing sORFs will be classified as non-coding, especially if the ORF is not well conserved across different species.</p><p>One approach to identify potential coding transcripts is ribosome profiling (<xref ref-type="bibr" rid="bib36">Ingolia et al., 2009</xref>), which has been used to study translation of proteins in a wide range of organisms (<xref ref-type="bibr" rid="bib28">Guo et al., 2010</xref>; <xref ref-type="bibr" rid="bib37">Ingolia et al., 2011</xref>; <xref ref-type="bibr" rid="bib6">Brar et al., 2012</xref>; <xref ref-type="bibr" rid="bib58">Michel et al., 2012</xref>; <xref ref-type="bibr" rid="bib14">Chew et al., 2013</xref>; <xref ref-type="bibr" rid="bib20">Dunn et al., 2013</xref>; <xref ref-type="bibr" rid="bib34">Huang et al., 2013</xref>; <xref ref-type="bibr" rid="bib4">Artieri and Fraser, 2014</xref>; <xref ref-type="bibr" rid="bib5">Bazzini et al., 2014</xref>; <xref ref-type="bibr" rid="bib39">Juntawong et al., 2014</xref>; <xref ref-type="bibr" rid="bib57">McManus et al., 2014</xref>; <xref ref-type="bibr" rid="bib91">Vasquez et al., 2014</xref>). In several of these studies it has been noted that lncRNAs can be protected by ribosomes (<xref ref-type="bibr" rid="bib37">Ingolia et al., 2011</xref>; <xref ref-type="bibr" rid="bib14">Chew et al., 2013</xref>; <xref ref-type="bibr" rid="bib5">Bazzini et al., 2014</xref>; <xref ref-type="bibr" rid="bib39">Juntawong et al., 2014</xref>). However, there is no consensus on whether the observed patterns are consistent with translation. 
For example, in the original analysis of mouse stem cells, which we reanalyzed here, it was reported that many lncRNAs were polycistronic transcripts encoding short proteins (<xref ref-type="bibr" rid="bib37">Ingolia et al., 2011</xref>), but in another paper where the same data were processed in a different way, the authors concluded that lncRNAs were unlikely to be protein-coding (<xref ref-type="bibr" rid="bib30">Guttman et al., 2013</xref>). A zebrafish ribosome profiling study reported resemblance between lncRNAs and 5′leaders of coding RNAs; the authors suggested that translation may play a role in lncRNA regulation (<xref ref-type="bibr" rid="bib14">Chew et al., 2013</xref>). Nevertheless, in the same study dozens of lncRNAs were proposed to be <italic>bona fide</italic> protein-coding transcripts. In <italic>Arabidopsis</italic>, the translational efficiency values of highly expressed lncRNAs (&gt;5 FPKM) were similar to those of coding RNAs and some lncRNAs had profiles consistent with initiation and termination of translation (<xref ref-type="bibr" rid="bib39">Juntawong et al., 2014</xref>). Finally, using yeast data, <xref ref-type="bibr" rid="bib93">Wilson and Masel (2011)</xref> found many cases of non-coding transcripts bound to ribosomes and suggested that this facilitates the evolution of novel protein-coding genes from non-coding sequences.</p><p>The disparity of results obtained in different systems motivated us to retrieve the original data and perform exactly the same analyses for six different species. As lncRNA catalogues are still very incomplete for most species, we also defined sets of novel lncRNAs using the RNA-seq sequencing reads for de novo transcript assembly. We discovered many novel, non-annotated lncRNAs, especially in zebrafish, mouse, and fruit fly (<xref ref-type="table" rid="tbl2">Table 2</xref>).
After the analysis of the ribosome profiling data, the same general picture emerged for the different biological systems, indicating that we are detecting very fundamental properties. In transcripts classified as lncRNAs, the ribosome profiling reads tend to cover a smaller fraction of the transcript than in typical codRNAs, in agreement with a shorter relative size of the ORF accumulating the largest number of ribosome profiling reads (primary ORF). We also find that the translational efficiency of regions corresponding to the primary ORF is much higher than that of 3′UTRs, both in codRNAs and lncRNAs, consistent with translation of the transcripts. Furthermore, the primary ORF of lncRNAs showed significantly higher coding score than the longest ORF extracted from randomly selected non-coding regions.</p><p>lncRNAs often contain several potentially translated ORFs (<xref ref-type="bibr" rid="bib37">Ingolia et al., 2011</xref>). Transcripts encoding multiple short proteins have been reported in insects (<xref ref-type="bibr" rid="bib75">Savard et al., 2006</xref>) and could be common in other species as well (<xref ref-type="bibr" rid="bib83">Tautz, 2009</xref>). One such candidate is AT1G34418.1 in <italic>Arabidopsis</italic>, an annotated lncRNA which contains a primary ORF followed by two instances of a 12 amino acid ORF also covered by ribosome profiling reads (<xref ref-type="fig" rid="fig6s6">Figure 6—figure supplement 6</xref>). This case is reminiscent of the gene <italic>pri</italic> in fruit fly, which regulates tarsal development (<xref ref-type="bibr" rid="bib27">Galindo et al., 2007</xref>) and translates several small redundant ORFs (<xref ref-type="bibr" rid="bib45">Kondo et al., 2007</xref>).</p><p>lncRNAs are poorly conserved across species and so, if translated, they will produce species- or lineage-specific proteins. 
Recently evolved proteins are markedly different from widely distributed ancient proteins; they are shorter, subject to weaker selective constraints and expressed at lower levels (<xref ref-type="bibr" rid="bib1">Albà and Castresana, 2005</xref>; <xref ref-type="bibr" rid="bib10">Cai et al., 2009</xref>; <xref ref-type="bibr" rid="bib51">Liu et al., 2010</xref>; <xref ref-type="bibr" rid="bib19">Donoghue et al., 2011</xref>; <xref ref-type="bibr" rid="bib12">Carvunis et al., 2012</xref>; <xref ref-type="bibr" rid="bib95">Xie et al., 2012</xref>; <xref ref-type="bibr" rid="bib94">Wissler et al., 2013</xref>; <xref ref-type="bibr" rid="bib63">Neme and Tautz, 2014</xref>). Here for the first time, we have compared the properties of the ORFs in lncRNAs associated with ribosomes with the properties of annotated, and in some cases experimentally validated, young protein-coding genes. lncRNAs and young protein-coding transcripts are virtually indistinguishable regarding their coding score and ORF selective constraints (<xref ref-type="fig" rid="fig6 fig7">Figures 6 and 7</xref>), which is consistent with the idea that many lncRNAs encode new peptides.</p><p>Although it is unclear how many of these peptides are functional, the data indicate that at least a fraction of them may be functional. Sequences that translate functional proteins are expected to display signs of selection related to preferential usage of certain amino acids and codons. This can be used to differentiate between coding and non-coding entities, especially in the absence of cross-species conservation, as is the case of many lncRNAs. About 35–40% of primary ORFs in human and mouse lncRNAs displayed coding scores that were significantly higher than those expected for non-coding sequences, making them excellent candidates for translating functional proteins. 
In fact, five human lncRNAs associated with ribosomes that exhibited high coding scores in our study were re-annotated as protein-coding transcripts in a subsequent Ensembl gene annotation release (version 75, <xref ref-type="supplementary-material" rid="SD2-data">Supplementary file 2C</xref>). Gene knock-out experiments in fly have discovered that young proteins, even if rapidly evolving, are often essential for the organism and can cause important defects when deleted (<xref ref-type="bibr" rid="bib13">Chen et al., 2010</xref>; <xref ref-type="bibr" rid="bib74">Reinhardt et al., 2013</xref>). Similarly, some peptides translated from lncRNAs may have important cellular functions yet to be discovered.</p><p>lncRNAs tend to be expressed at much lower levels than typical codRNAs, so, everything else being equal, the amount of translated peptide is also expected to be smaller. It may be that some of these peptides are not functional, but their translation does not produce a large enough deleterious effect for them to be eliminated via selection. Pseudogenes also showed extensive association with ribosomes in our study, indicating that the translation machinery is probably not very selective or that some pseudogenes produce functional proteins. This question may be worth revisiting, as a recent proteomics study has also found that dozens of human pseudogenes produce peptides (<xref ref-type="bibr" rid="bib44">Kim et al., 2014</xref>).</p><p>The data also indicate that a fraction of lncRNAs have not acquired the capacity to be translated. Depending on the experiment analyzed, a number of lncRNAs did not show any significant association with ribosomes. 
As previously discussed, this is probably affected by a lack of sensitivity; it is also true that the lncRNAs not associated with ribosomes tended to show lower coding scores than lncRNAs associated with ribosomes, even when we did not use the ribosome profiling data and simply compared the longest ORF in both kinds of transcripts.</p><p>Recently, it has been reported that human-specific protein-coding genes are often related to non-coding transcripts in macaque, pointing to a non-coding origin for many newly evolved proteins (<xref ref-type="bibr" rid="bib95">Xie et al., 2012</xref>). More generally, one may view de novo protein-coding gene evolution as a continuum from non-functional genomic sequences to fully-fledged protein-coding genes (<xref ref-type="bibr" rid="bib1">Albà and Castresana, 2005</xref>; <xref ref-type="bibr" rid="bib85">Toll-Riera et al., 2009</xref>; <xref ref-type="bibr" rid="bib12">Carvunis et al., 2012</xref>). Therefore, many lncRNAs could be in intermediate states in this process, their pervasive translation serving as the building material for the evolution of new proteins. It may be difficult to obtain functional proteins from completely random ORFs (<xref ref-type="bibr" rid="bib38">Jacob, 1977</xref>), but the effect of natural selection preventing the production of toxic peptides (<xref ref-type="bibr" rid="bib93">Wilson and Masel, 2011</xref>), and the high number of transcripts expressed in the genome, may facilitate this process.</p></sec><sec id="s4" sec-type="materials|methods"><title>Materials and methods</title><sec id="s4-1"><title>Sequencing and mapping of reads</title><p>We downloaded the original data from Gene Expression Omnibus (GEO) for six different ribosome profiling experiments that had both ribosome footprinting and polyA+ RNA-seq sequencing reads: mouse (<italic>M. musculus</italic>) (<xref ref-type="bibr" rid="bib37">Ingolia et al., 2011</xref>), human (<italic>H. 
sapiens</italic>, HeLa cells) (<xref ref-type="bibr" rid="bib28">Guo et al., 2010</xref>), zebrafish (<italic>D. rerio</italic>) (<xref ref-type="bibr" rid="bib14">Chew et al., 2013</xref>), fruit fly (<italic>D. melanogaster</italic>) (<xref ref-type="bibr" rid="bib20">Dunn et al., 2013</xref>), <italic>Arabidopsis</italic> (<italic>A. thaliana</italic>) (<xref ref-type="bibr" rid="bib39">Juntawong et al., 2014</xref>), and yeast (<italic>S. cerevisiae</italic>) (<xref ref-type="bibr" rid="bib57">McManus et al., 2014</xref>). We retrieved genome sequences and gene annotations from Ensembl v.70 and Ensembl Plants v.21 (<xref ref-type="bibr" rid="bib25">Flicek et al., 2012</xref>).</p><p>Raw ribosome profiling and RNA-seq sequencing reads underwent quality filtering using ConDeTri (v.2.2) (<xref ref-type="bibr" rid="bib80">Smeds and Künstner, 2011</xref>) with the following settings (-hq=30 -lq=10). Adaptors described in the original publications were trimmed from filtered reads if at least five nucleotides of the adaptor sequence matched the end of each read. In zebrafish, reads from different developmental stages were pooled to improve read coverage. In all experiments, reads shorter than 25 nucleotides were not considered. Clean ribosome profiling short reads were filtered by mapping them to the corresponding species ribosomal RNA (rRNA) reference using the Bowtie2 short-read alignment program (v. 2.1.0) (<xref ref-type="bibr" rid="bib48">Langmead et al., 2009</xref>). Unaligned reads were then aligned to the reference genome with Bowtie2, allowing one mismatch in the first 'seed' region (the length of this region was selected according to the descriptions provided in each individual experiment). RNA-seq short reads were mapped with TopHat (v. 2.0.8) (<xref ref-type="bibr" rid="bib43">Kim et al., 2013</xref>) to the corresponding reference genome.
We allowed two mismatches in the alignment with the exception of zebrafish, for which we allowed three mismatches since the reads were significantly longer. Multiple mapping was allowed unless specifically stated.</p></sec><sec id="s4-2"><title>Defining a set of expressed transcripts</title><p>Expressed transcripts were assembled using Cufflinks (v 2.2.0) (<xref ref-type="bibr" rid="bib86">Trapnell et al., 2010</xref>). We initially considered a transcript as expressed if it was covered by at least four reads and its abundance was higher than 1% of the most abundant isoform of the gene. We also discarded assembled transcripts in which &gt;20% of reads were mapped to several locations in the genome. Gene annotation files from Ensembl (gtf format, v.70) were provided to Cufflinks to guide the reconstruction of already annotated transcripts. Annotated transcripts were divided into coding RNAs and long non-coding RNAs (lncRNAs); we only considered lncRNAs that were not part of genes with coding transcripts. Novel isoforms corresponding to annotated loci were not analyzed. Transcripts that did not match or overlap annotated genes were labeled ‘novel’ lncRNAs. We used a length threshold of 200 nucleotides to select novel long non-coding RNAs, as in ENCODE annotations (<xref ref-type="bibr" rid="bib18">Djebali et al., 2012</xref>).</p><p>Strand directionality of multiexonic transcripts was inferred using the splice site consensus sequence. We only considered monoexonic transcripts in the case of <italic>Arabidopsis</italic> and yeast, provided the transcripts were intergenic.</p><p>The inclusion of novel lncRNAs made it possible to perform analyses of species for which there are very few annotated lncRNAs. Annotations of UTR regions in yeast genes were missing from Ensembl because of the variability observed in transcription start sites (TSS).
However, we downloaded a set of available 5′ and 3′UTRs obtained by deep transcriptomics (<xref ref-type="bibr" rid="bib59">Nagalakshmi et al., 2008</xref>) and added them to the existing yeast Ensembl annotations before assembling the transcriptome.</p><p>Coding transcripts were classified into different subclasses depending on the existing annotations: (a) Annotated protein-coding transcripts (codRNA), (b) Annotated transcripts with surveillance mechanisms (nonsense mediated decay, nonstop mediated decay, and no-go decay), (c) Annotated pseudogenes. We removed protein-coding transcripts in which annotated coding sequences (CDS) are still incomplete.</p><p>Subsequently, we defined an additional subset of annotated protein-coding transcripts with well-established coding properties based on the existence of an experimentally verified protein in Swiss-Prot for the gene (‘evidence at protein level’, downloaded 29 October 2013, <xref ref-type="bibr" rid="bib88">UniProt Consortium, 2014</xref>). These transcripts were labeled codRNAe. The rest of the annotated protein-coding transcripts were abbreviated codRNAne. In zebrafish, most proteins are not yet experimentally validated; therefore, we generated a single group.</p><p>We built a data set of human lncRNAs with described non-coding functions using data obtained from several recent reviews (<xref ref-type="bibr" rid="bib70">Ponting et al., 2009</xref>; <xref ref-type="bibr" rid="bib87">Ulitsky and Bartel, 2013</xref>; <xref ref-type="bibr" rid="bib24">Fatica and Bozzoni, 2014</xref>). This data set included 29 different genes (<xref ref-type="supplementary-material" rid="SD2-data">Supplementary file 2A</xref>).</p><p>We used Cufflinks to estimate the expression level of a transcript in FPKM units (Fragments Per Kilobase per total Million mapped reads).
We used a threshold of &gt;0.5 FPKM except in yeast, in which the average read coverage per transcript was much higher than in the other species and the threshold was set at &gt;5 FPKM. These thresholds guaranteed detection of ribosome association for the majority of expressed coding transcripts (&gt;92%), while yielding proportions of transcripts comparable to those reported in the original papers.</p></sec><sec id="s4-3"><title>Definition of potential open reading frames (ORFs) and other transcript regions</title><p>We predicted all possible open reading frames (ORFs) in the expressed transcripts. We defined an ORF as any sequence starting with an AUG codon and finishing with a stop codon (UAA, UAG, or UGA), and at least 75 nucleotides long. This would correspond to a 24 amino acid protein, which is the size of the smallest complete human polypeptide found in genetic screen studies (<xref ref-type="bibr" rid="bib33">Hashimoto et al., 2001</xref>). This ORF definition will not detect non-canonical ORFs with different start or stop codons, although these ORFs often correspond to regulatory ORFs (uORFs) in the 5′UTR region. In monoexonic transcripts (<italic>Arabidopsis</italic> and yeast), we considered all six possible different frames.</p><p>We also defined each transcript 5′UTR as the region between the transcription start site and the AUG codon from the left-most predicted ORF, and the 3′UTR as the region from the stop codon in the right-most predicted ORF to the transcript end. UTRs with lengths below 30 nucleotides were not analyzed since ribosome reads could not be properly aligned to these regions due to their small size. Regions between two consecutive putatively translated ORFs (with ribosome profiling reads) were termed interORF. We only analyzed this region when the length of the interORF sequence in a transcript was 30 nucleotides or longer.</p><p>We defined a set of <italic>bona fide</italic> non-coding sequences sampled from intronic fragments.
We used the introns of the genes expressed in each experiment, provided they did not overlap any exons from other genes. We randomly selected fragments in such a way as to simulate the same size distribution as in the complete set of expressed transcripts. We performed 100 simulations of intron sampling to ensure the results were robust to the randomization process. We selected the longest ORF in each intronic fragment for the calculation of coding scores and GC content.</p></sec><sec id="s4-4"><title>Association with ribosomes and translational efficiency (TE)</title><p>We computed the number of reads overlapping each feature of interest (transcript, UTR, ORF, and interORF) using the BEDTools package (v. 2.16.2) (<xref ref-type="bibr" rid="bib72">Quinlan and Hall, 2010</xref>). We only considered ribosome reads in which more than half of their length spanned the considered region. This was considered appropriate because the ribosome P-site is usually detected at the central region of the read, with only slight variations depending on the experimental setting. We set a minimum ribosome profiling coverage of 75 nucleotides per transcript to define the transcript or transcript region (e.g., ORF) as associated with ribosomes. This is significantly longer than the length of the ribosome profiling sequencing reads (36–51 nucleotides) and is consistent with the minimum ORF length threshold.</p><p>The translational efficiency (TE) of a sequence has been previously defined as the density of ribosome profiling (RPF) reads normalized by transcript abundance (<xref ref-type="bibr" rid="bib36">Ingolia et al., 2009</xref>). We calculated it by dividing the FPKM of the ribosome profiling experiment by the FPKM of the RNA-seq experiment.
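</p><p>As an illustration only (this sketch is not the code used in the study), the TE ratio described above could be computed as follows in Python. The function names, the FPKM helper, and the read counts are hypothetical and chosen just to make the arithmetic easy to follow.</p><preformat>
```python
def fpkm(read_count, region_length_nt, total_mapped_reads):
    """Fragments per kilobase per million mapped reads for one region."""
    return read_count * 1e9 / (region_length_nt * total_mapped_reads)


def translational_efficiency(rpf_fpkm, rna_fpkm):
    """TE = ribosome profiling FPKM divided by RNA-seq FPKM."""
    if rna_fpkm == 0:
        raise ValueError("RNA-seq FPKM must be positive")
    return rpf_fpkm / rna_fpkm


# Hypothetical counts for one 300-nucleotide region.
rpf = fpkm(read_count=60, region_length_nt=300, total_mapped_reads=20_000_000)
rna = fpkm(read_count=120, region_length_nt=300, total_mapped_reads=10_000_000)
te = translational_efficiency(rpf, rna)
```
</preformat><p>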
In transcripts, we also obtained the maximum TE by dividing the sequence into 90-nucleotide windows and selecting the window with the highest TE value.</p><p>In order to have a null model of ribosome binding against which to compare the ribosome profiling signal in codRNA and lncRNA transcripts, we extracted annotated 3′ untranslated regions (3′UTRs) from codRNAs in genes in which UTRs did not overlap with coding sequences from other transcripts. By randomly selecting 3′UTRs with a minimum length of 30 nucleotides, we built a set of 3′UTR sequences with the same size distribution as the complete transcripts. For each species, we calculated the TE values for codRNA, lncRNA, and 3′UTR sequences. We used the empirical distribution of TE values in the 3′UTRs to calculate the number of codRNAs and lncRNAs that showed a significantly higher TE value than expected under the null model at p &lt; 0.05. These corresponded to TE values higher than 0.1043 in mouse, 0.2556 in human, 0.0004 in zebrafish, 0.7164 in fruit fly, 0.1800 in <italic>Arabidopsis</italic>, and 0.0527 in yeast.</p><p>We defined the primary ORF in a transcript as the ORF with the largest number of RPF reads with respect to the total RPF reads covering the transcript. The remaining ORFs ≥24 amino acids associated with ribosomes were considered as well; when two or more ORFs overlapped, we selected the longest one. In ORFs, interORFs, and UTRs, we computed the TE along the whole region. For comparing the TE in different regions, we only considered transcripts in which all regions had &gt;0.2 FPKM.</p></sec><sec id="s4-5"><title>Peptide evidence in existing proteomics databases</title><p>We downloaded all peptide sequences from the PeptideAtlas database: 338,013 human peptides (August 2013), 101,695 mouse peptides (June 2013), and 86,836 yeast peptides (March 2013).
We investigated if the number of ribosome-associated protein-coding transcripts that matched the peptides in these databases varied with protein length. We omitted this analysis in zebrafish and <italic>Arabidopsis</italic> due to the lack of sufficiently large peptide databases. The matches were identified using BLASTP searches (v. 2.2.28+) (<xref ref-type="bibr" rid="bib2">Altschul et al., 1997</xref>). We selected perfect matches only.</p></sec><sec id="s4-6"><title>Evidence of nonsense mediated decay in ORFs</title><p>We investigated how many primary ORFs may be candidates for being regulated via non-sense mediated decay (NMD) surveillance pathways, whose main function is to eliminate transcripts containing premature stop codons. We defined NMD candidates as all cases in which the stop-codon from a predicted ORF was located ≥55 nucleotides upstream of a splice junction site, provided the stop-codon was not in the terminal exon (<xref ref-type="bibr" rid="bib76">Scofield et al., 2007</xref>). This mechanism is well characterized in protein-coding genes and it has been proposed as a way to degrade non-functional peptides translated in lncRNAs (<xref ref-type="bibr" rid="bib82">Tani et al., 2013</xref>). Other surveillance mechanisms, such as non-stop-mediated decay or no-go decay, were not considered since all predicted ORFs finished at a stop codon, and we did not analyze RNA secondary structures.</p></sec><sec id="s4-7"><title>Defining ages of protein-coding transcripts</title><p>We utilized existing gene age classifications in human, mouse, and zebrafish (<xref ref-type="bibr" rid="bib62">Neme and Tautz, 2013</xref>) to identify young gene classes: human primate-specific (∼55.8 My), mouse rodent-specific (∼61.7 My), human and mouse mammalian-specific (∼225 My), zebrafish actinopterygii-specific (∼420 My) (abbreviated fish) and metazoan (∼800 My). In yeast, we used predefined genes specific to <italic>S. cerevisiae</italic> (1–3 My)(abbreviated <italic>S. 
cerevisiae</italic>) and the <italic>Saccharomyces</italic> group (∼100 My) (<xref ref-type="bibr" rid="bib21">Ekman et al., 2007</xref>). In <italic>Arabidopsis</italic>, we retrieved <italic>Cruciferae</italic> (<italic>Brassicaceae</italic>)-specific genes (20–40 My) (<xref ref-type="bibr" rid="bib19">Donoghue et al., 2011</xref>). These genes are believed to have arisen primarily by de novo mechanisms, as no homologues in other species have been detected despite the fact that many closely related genomes have now been sequenced.</p></sec><sec id="s4-8"><title>Defining gene desert sequences</title><p>In humans, we obtained a set of gene desert sequences as defined in <xref ref-type="bibr" rid="bib65">Ovcharenko et al. (2005)</xref>. We selected two stable and two flexible gene deserts (the definition depends on the degree of conservation in other species). They belonged to chromosome 4, which has a high number of gene deserts (flexible located in coordinates 136,000,001–138,000,000; stable located in coordinates 180,000,001–182,000,010), and chromosome 17, which has a high gene density (flexible located in coordinates 51,100,001–51,900,000; stable located in coordinates 69,300,001–70,000,000). We ensured that no protein-coding genes were annotated in subsequent Ensembl versions in these regions. We predicted all possible ORFs in these regions and evaluated their coding score and GC content.</p></sec><sec id="s4-9"><title>ORF coding score</title><p>The examination of nucleotide hexamer frequencies has been shown to be a powerful way to distinguish between coding and non-coding sequences (<xref ref-type="bibr" rid="bib81">Sun et al., 2013</xref>; <xref ref-type="bibr" rid="bib92">Wang et al., 2013</xref>).
We computed one coding score (CS) per hexamer:<disp-formula id="equ1"><mml:math id="m1"><mml:mrow><mml:msub><mml:mrow><mml:mi>C</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>log</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac bevelled="true"><mml:mrow><mml:msub><mml:mrow><mml:mi>f</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>q</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>f</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>q</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>−</mml:mo><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p><p>The coding hexamer frequencies were obtained from the open reading frame of all transcripts in a species encoding experimentally validated proteins (except for zebrafish in which all protein-coding transcripts were 
considered). The non-coding hexamer frequencies were calculated using the longest ORF in intronic regions, which were selected randomly from expressed protein-coding genes. Next, we used the following statistic to measure the coding score of an ORF:<disp-formula id="equ2"><mml:math id="m2"><mml:mrow><mml:msub><mml:mrow><mml:mi>C</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>O</mml:mi><mml:mi>R</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mstyle displaystyle="true"><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:msub><mml:mrow><mml:mi>C</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:mstyle></mml:mrow><mml:mi>n</mml:mi></mml:mfrac><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>where <italic>i</italic> is each sequence hexamer in the ORF, and <italic>n</italic> the number of hexamers considered.</p><p>The hexamers were calculated in steps of three nucleotides in frame (dicodons). We did not consider the initial hexamers containing a Methionine or the last hexamers containing a STOP codon, since they are not informative. Given that all ORFs were at least 75 nucleotides long the minimum value for <italic>n</italic> was 22.</p><p>We calculated other related statistics in a similar way. This included using an equiprobable hexamer distribution instead of the distribution obtained from non-coding sequences, or using codon frequencies instead of hexamer frequencies. These statistics showed somewhat lower power to distinguish between coding and non-coding sequences. 
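</p><p>As an illustration only (this sketch is not the code used in the study), the per-hexamer score and the ORF coding score defined above could be computed as follows in Python. The function names and the toy frequencies are hypothetical; the sketch also assumes a natural logarithm (the base is not stated in the text), non-zero frequencies for every hexamer, and sequences written in the DNA alphabet.</p><preformat>
```python
import math


def hexamer_scores(coding_freq, noncoding_freq):
    """Per-hexamer coding score: log(freq_coding / freq_non-coding)."""
    return {
        hexamer: math.log(coding_freq[hexamer] / noncoding_freq[hexamer])
        for hexamer in coding_freq
        if hexamer in noncoding_freq
    }


def orf_coding_score(orf, scores):
    """Mean hexamer score over the in-frame dicodons of an ORF.

    The first hexamer (contains the start codon) and the last one
    (contains the stop codon) are skipped, as described in the text.
    """
    hexamers = [orf[i:i + 6] for i in range(0, len(orf) - 5, 3)]
    informative = hexamers[1:-1]
    return sum(scores[h] for h in informative) / len(informative)


# Toy frequencies (hypothetical): GCCGCC is twice as frequent in coding ORFs.
scores = hexamer_scores({"GCCGCC": 0.002}, {"GCCGCC": 0.001})
orf = "ATG" + "GCC" * 23 + "TAA"  # 75 nucleotides, so n = 22
cs = orf_coding_score(orf, scores)
```
</preformat><p>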
As a complementary measure, we quantified the GC content in different coding and non-coding transcripts and ORFs.</p></sec><sec id="s4-10"><title>Sequence similarity searches</title><p>We employed BLASTP with an E-value cutoff of 10<sup>−4</sup> to compare the amino acid sequences encoded by ORFs in different kinds of transcripts. We enabled SEG to mask low complexity regions in protein sequences before doing the homology searches. We also searched for homologues in the NCBI non-redundant (nr) protein database (<xref ref-type="bibr" rid="bib71">Pruitt et al., 2014</xref>). BLAST sequence similarity search programs are based on gapped local alignments (<xref ref-type="bibr" rid="bib2">Altschul et al., 1997</xref>).</p></sec><sec id="s4-11"><title>Analysis of single nucleotide polymorphisms</title><p>We downloaded all available single-nucleotide polymorphisms (SNPs) from dbSNP (<xref ref-type="bibr" rid="bib77">Sherry et al., 2001</xref>) for human (∼50 million), mouse (∼64.2 million), and zebrafish (∼1.3 million). We did not consider other species due to insufficient data for the analysis. We classified SNPs in ORFs as non-synonymous (PN, amino acid altering) and synonymous (PS, not amino acid altering). We computed the PN/PS ratio in each sequence data set by using the sum of PN and PS in all sequences. The estimation of PN/PS ratios of individual sequences was in general not reliable due to lack of sufficient SNP data. 
We obtained confidence intervals using the proportion test in R (see below).</p></sec><sec id="s4-12"><title>Statistical data analyses</title><p>The analysis of the data, including generation of plots and statistical tests, was done with R (<xref ref-type="bibr" rid="bib73">R Development Core Team, 2010</xref>).</p></sec><sec id="s4-13"><title>Additional files</title><p><xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1</xref> contains additional tables and <xref ref-type="supplementary-material" rid="SD2-data">Supplementary file 2</xref> contains data subsets. The genomic coordinates of all transcripts used in this study (GTF files) and the amino acid sequences corresponding to primary ORFs in lncRNAs with coding scores significant at p &lt; 0.05 (FASTA files) are available at figshare (<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.6084/m9.figshare.1114969">http://dx.doi.org/10.6084/m9.figshare.1114969</ext-link>).</p></sec></sec></body><back><ack id="ack"><title>Acknowledgements</title><p>We acknowledge José Luis Villanueva-Cañas and Will Blevins for critical revision of the manuscript. We are grateful to Ivan Ovcharenko for advice on gene deserts. 
This work was funded by Ministerio de Economía y Competitividad (BFU2012-36820 and TIN2013-45732-C4-3-P) and Fundació ICREA (MMA).</p></ack><sec sec-type="additional-information"><title>Additional information</title><fn-group content-type="competing-interest"><title>Competing interests</title><fn fn-type="conflict" id="conf1"><p>The authors declare that no competing interests exist.</p></fn></fn-group><fn-group content-type="author-contribution"><title>Author contributions</title><fn fn-type="con" id="con1"><p>JR-O, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article</p></fn><fn fn-type="con" id="con2"><p>XM, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article</p></fn><fn fn-type="con" id="con3"><p>JAS, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article</p></fn><fn fn-type="con" id="con4"><p>MMA, Conception and design, Analysis and interpretation of data, Drafting or revising the article</p></fn></fn-group></sec><sec sec-type="supplementary-material"><title>Additional files</title><supplementary-material id="SD1-data"><object-id pub-id-type="doi">10.7554/eLife.03523.024</object-id><label>Supplementary file 1.</label><caption><title>Long non-coding RNAs as a source of new peptides. (<bold>A</bold>) Details on the number of coding transcripts associated with ribosomes. (<bold>B</bold>) ORF density and length in different types of transcripts. (<bold>C</bold>) Details on the number of non-coding transcripts associated with ribosomes. (<bold>D</bold>) Homology hits for ORFs. (<bold>E</bold>) GC content (%) in ORFs and complete sequences. 
(<bold>F</bold>) PN and PS values for different sequence subsets.</title><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.024">http://dx.doi.org/10.7554/eLife.03523.024</ext-link></p></caption><media mime-subtype="docx" mimetype="application" xlink:href="elife-03523-supp1-v1.docx"/></supplementary-material><supplementary-material id="SD2-data"><object-id pub-id-type="doi">10.7554/eLife.03523.025</object-id><label>Supplementary file 2.</label><caption><title>(<bold>A</bold>) Human ncRNA literature. (<bold>B</bold>) lncRNA homologies. (<bold>C</bold>) lncRNA top coding score. (<bold>D</bold>) Young codRNAe.</title><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.025">http://dx.doi.org/10.7554/eLife.03523.025</ext-link></p></caption><media mime-subtype="xls" mimetype="application" xlink:href="elife-03523-supp2-v1.xls"/></supplementary-material><sec sec-type="datasets"><title>Major datasets</title><p>The following previously published datasets were used:</p><p><related-object content-type="existing-dataset" id="dataro1" source-id="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30839" source-id-type="uri"><collab collab-type="author">Ingolia NT</collab>, <collab collab-type="author">Lareau LF</collab>, <collab collab-type="author">Weissman JS</collab>, <year>2011</year><x>, </x><source>Ribosome Profiling of Mouse Embryonic Stem Cells Reveals the Complexity of Mammalian Proteomes</source><x>, </x><ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30839">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30839</ext-link><x>, </x><comment>Publicly available at NCBI Gene Expression Omnibus.</comment></related-object></p><p><related-object content-type="existing-dataset" id="dataro2" source-id="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22004" source-id-type="uri"><collab collab-type="author">Guo H</collab>, <collab collab-type="author">Ingolia 
NT</collab>, <collab collab-type="author">Weissman JS</collab>, <collab collab-type="author">Bartel DP</collab>, <year>2010</year><x>, </x><source>Mammalian microRNAs predominantly act to decrease target mRNA levels</source><x>, </x><ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22004">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22004</ext-link><x>, </x><comment>Publicly available at NCBI Gene Expression Omnibus.</comment></related-object></p><p><related-object content-type="existing-dataset" id="dataro3" source-id="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE32900" source-id-type="uri"><collab collab-type="author">Pauli A</collab>, <collab collab-type="author">Valen E</collab>, <collab collab-type="author">Lin MF</collab>, <collab collab-type="author">Garber M</collab>, <collab collab-type="author">Vastenhouw NL</collab>, <collab collab-type="author">Levin JZ</collab>, <collab collab-type="author">Sandelin A</collab>, <collab collab-type="author">Rinn JL</collab>, <collab collab-type="author">Regev A</collab>, <collab collab-type="author">Schier AF</collab>, <year>2011</year><x>, </x><source>Comprehensive identification of long non-coding RNAs expressed during zebrafish embryogenesis</source><x>, </x><ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE32900">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE32900</ext-link><x>, </x><comment>Publicly available at NCBI Gene Expression Omnibus.</comment></related-object></p><p><related-object content-type="existing-dataset" id="dataro4" source-id="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE46512" source-id-type="uri"><collab collab-type="author">Chew G</collab>, <collab collab-type="author">Pauli A</collab>, <collab collab-type="author">Valen E</collab>, <collab collab-type="author">Schier A</collab>, <year>2013</year><x>, </x><source>Ribosome Profiling over a Zebrafish Developmental 
Timecourse</source><x>, </x><ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE46512">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE46512</ext-link><x>, </x><comment>Publicly available at NCBI Gene Expression Omnibus.</comment></related-object></p><p><related-object content-type="existing-dataset" id="dataro5" source-id="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE49197" source-id-type="uri"><collab collab-type="author">Dunn JG</collab>, <collab collab-type="author">Weissman JS</collab>, <year>2013</year><x>, </x><source>Ribosome profiling reveals pervasive and regulated stop codon readthrough in Drosophila melanogaster</source><x>, </x><ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE49197">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE49197</ext-link><x>, </x><comment>Publicly available at NCBI Gene Expression Omnibus.</comment></related-object></p><p><related-object content-type="existing-dataset" id="dataro6" source-id="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE50597" source-id-type="uri"><collab collab-type="author">Juntawong P</collab>, <collab collab-type="author">Girke T</collab>, <collab collab-type="author">Bailey-Serres J</collab>, <year>2013</year><x>, </x><source>High-resolution mapping of ribosome footprints from Arabidopsis thaliana</source><x>, </x><ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE50597">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE50597</ext-link><x>, </x><comment>Publicly available at NCBI Gene Expression Omnibus.</comment></related-object></p><p><related-object content-type="existing-dataset" id="dataro7" source-id="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE52119" source-id-type="uri"><collab collab-type="author">McManus CJ</collab>, <collab collab-type="author">May GE</collab>, <collab collab-type="author">Spealman P</collab>, <collab 
collab-type="author">Shteyman A</collab>, <year>2014</year><x>, </x><source>Ribosome profiling reveals post-transcriptional buffering of divergent gene expression in yeast</source><x>, </x><ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE52119">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE52119</ext-link><x>, </x><comment>Publicly available at NCBI Gene Expression Omnibus.</comment></related-object></p></sec></sec><ref-list><title>References</title><ref id="bib1"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Albà</surname><given-names>MM</given-names></name><name><surname>Castresana</surname><given-names>J</given-names></name></person-group><year>2005</year><article-title>Inverse relationship between evolutionary rate and age of mammalian genes</article-title><source>Molecular Biology and Evolution</source><volume>22</volume><fpage>598</fpage><lpage>606</lpage><pub-id pub-id-type="doi">10.1093/molbev/msi045</pub-id></element-citation></ref><ref id="bib2"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Altschul</surname><given-names>SF</given-names></name><name><surname>Madden</surname><given-names>TL</given-names></name><name><surname>Schäffer</surname><given-names>AA</given-names></name><name><surname>Zhang</surname><given-names>J</given-names></name><name><surname>Zhang</surname><given-names>Z</given-names></name><name><surname>Miller</surname><given-names>W</given-names></name><name><surname>Lipman</surname><given-names>DJ</given-names></name></person-group><year>1997</year><article-title>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</article-title><source>Nucleic Acids Research</source><volume>25</volume><fpage>3389</fpage><lpage>3402</lpage><pub-id pub-id-type="doi">10.1093/nar/25.17.3389</pub-id></element-citation></ref><ref id="bib3"><element-citation 
publication-type="journal"><person-group person-group-type="author"><name><surname>Andrews</surname><given-names>SJ</given-names></name><name><surname>Rothnagel</surname><given-names>JA</given-names></name></person-group><year>2014</year><article-title>Emerging evidence for functional peptides encoded by short open reading frames</article-title><source>Nature Reviews Genetics</source><volume>15</volume><fpage>193</fpage><lpage>204</lpage><pub-id pub-id-type="doi">10.1038/nrg3520</pub-id></element-citation></ref><ref id="bib4"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Artieri</surname><given-names>CG</given-names></name><name><surname>Fraser</surname><given-names>HB</given-names></name></person-group><year>2014</year><article-title>Evolution at two levels of gene expression in yeast</article-title><source>Genome Research</source><volume>24</volume><fpage>411</fpage><lpage>421</lpage><pub-id pub-id-type="doi">10.1101/gr.165522.113</pub-id></element-citation></ref><ref id="bib4a"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Aspden</surname><given-names>JL</given-names></name><name><surname>Eyre-Walker</surname><given-names>YC</given-names></name><name><surname>Philips</surname><given-names>RJ</given-names></name><name><surname>Amin</surname><given-names>U</given-names></name><name><surname>Mumtaz</surname><given-names>MA</given-names></name><name><surname>Brocard</surname><given-names>M</given-names></name><name><surname>Couso</surname><given-names>JP</given-names></name></person-group><year>2014</year><article-title>Extensive translation of small ORFs revealed by Poly-Ribo-Seq</article-title><source>eLife</source><fpage>e03528</fpage><pub-id pub-id-type="doi">10.7554/eLife.03528</pub-id></element-citation></ref><ref id="bib5"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Bazzini</surname><given-names>AA</given-names></name><name><surname>Johnstone</surname><given-names>TG</given-names></name><name><surname>Christiano</surname><given-names>R</given-names></name><name><surname>Mackowiak</surname><given-names>SD</given-names></name><name><surname>Obermayer</surname><given-names>B</given-names></name><name><surname>Fleming</surname><given-names>ES</given-names></name><name><surname>Vejnar</surname><given-names>CE</given-names></name><name><surname>Lee</surname><given-names>MT</given-names></name><name><surname>Rajewsky</surname><given-names>N</given-names></name><name><surname>Walther</surname><given-names>TC</given-names></name><name><surname>Giraldez</surname><given-names>AJ</given-names></name></person-group><year>2014</year><article-title>Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation</article-title><source>The EMBO Journal</source><volume>33</volume><fpage>981</fpage><lpage>993</lpage><pub-id pub-id-type="doi">10.1002/embj.201488411</pub-id></element-citation></ref><ref id="bib6"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Brar</surname><given-names>GA</given-names></name><name><surname>Yassour</surname><given-names>M</given-names></name><name><surname>Friedman</surname><given-names>N</given-names></name><name><surname>Regev</surname><given-names>A</given-names></name><name><surname>Ingolia</surname><given-names>NT</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name></person-group><year>2012</year><article-title>High-resolution view of the yeast meiotic program revealed by ribosome profiling</article-title><source>Science</source><volume>335</volume><fpage>552</fpage><lpage>557</lpage><pub-id pub-id-type="doi">10.1126/science.1215110</pub-id></element-citation></ref><ref id="bib7"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Brockdorff</surname><given-names>N</given-names></name><name><surname>Ashworth</surname><given-names>A</given-names></name><name><surname>Kay</surname><given-names>GF</given-names></name><name><surname>McCabe</surname><given-names>VM</given-names></name><name><surname>Norris</surname><given-names>DP</given-names></name><name><surname>Cooper</surname><given-names>PJ</given-names></name><name><surname>Swift</surname><given-names>S</given-names></name><name><surname>Rastan</surname><given-names>S</given-names></name></person-group><year>1992</year><article-title>The product of the mouse Xist gene Is a 15 Kb inactive X-specific transcript containing no conserved ORF and located in the nucleus</article-title><source>Cell</source><volume>71</volume><fpage>515</fpage><lpage>526</lpage><pub-id pub-id-type="doi">10.1016/0092-8674(92)90519-I</pub-id></element-citation></ref><ref id="bib8"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cabili</surname><given-names>MN</given-names></name><name><surname>Trapnell</surname><given-names>C</given-names></name><name><surname>Goff</surname><given-names>L</given-names></name><name><surname>Koziol</surname><given-names>M</given-names></name><name><surname>Tazon-Vega</surname><given-names>B</given-names></name><name><surname>Regev</surname><given-names>A</given-names></name><name><surname>Rinn</surname><given-names>JL</given-names></name></person-group><year>2011</year><article-title>Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses</article-title><source>Genes &amp; Development</source><volume>25</volume><fpage>1915</fpage><lpage>1927</lpage><pub-id pub-id-type="doi">10.1101/gad.17446611</pub-id></element-citation></ref><ref id="bib9"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Cai</surname><given-names>J</given-names></name><name><surname>Zhao</surname><given-names>R</given-names></name><name><surname>Jiang</surname><given-names>H</given-names></name><name><surname>Wang</surname><given-names>W</given-names></name></person-group><year>2008</year><article-title>De novo origination of a new protein-coding gene in <italic>Saccharomyces cerevisiae</italic></article-title><source>Genetics</source><volume>179</volume><fpage>487</fpage><lpage>496</lpage><pub-id pub-id-type="doi">10.1534/genetics.107.084491</pub-id></element-citation></ref><ref id="bib10"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cai</surname><given-names>JJ</given-names></name><name><surname>Borenstein</surname><given-names>E</given-names></name><name><surname>Chen</surname><given-names>R</given-names></name><name><surname>Petrov</surname><given-names>DA</given-names></name></person-group><year>2009</year><article-title>Similarly strong purifying selection acts on human disease genes of all evolutionary ages</article-title><source>Genome Biology and Evolution</source><volume>1</volume><fpage>131</fpage><lpage>144</lpage><pub-id pub-id-type="doi">10.1093/gbe/evp013</pub-id></element-citation></ref><ref id="bib11"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Carninci</surname><given-names>P</given-names></name><name><surname>Kasukawa</surname><given-names>T</given-names></name><name><surname>Katayama</surname><given-names>S</given-names></name><name><surname>Gough</surname><given-names>J</given-names></name><name><surname>Frith</surname><given-names>MC</given-names></name><name><surname>Maeda</surname><given-names>N</given-names></name><name><surname>Oyama</surname><given-names>R</given-names></name><name><surname>Ravasi</surname><given-names>T</given-names></name><name><surname>Lenhard</surname><given-names>B</given-names></name><name><surname>Wells</surname><given-names>C</given-names></name><name><surname>Kodzius</surname><given-names>R</given-names></name><name><surname>Shimokawa</surname><given-names>K</given-names></name><name><surname>Bajic</surname><given-names>VB</given-names></name><name><surname>Brenner</surname><given-names>SE</given-names></name><name><surname>Batalov</surname><given-names>S</given-names></name><name><surname>Forrest</surname><given-names>AR</given-names></name><name><surname>Zavolan</surname><given-names>M</given-names></name><name><surname>Davis</surname><given-names>MJ</given-names></name><name><surname>Wilming</surname><given-names>LG</given-names></name><name><surname>Aidinis</surname><given-names>V</given-names></name><name><surname>Allen</surname><given-names>JE</given-names></name><name><surname>Ambesi-Impiombato</surname><given-names>A</given-names></name><name><surname>Apweiler</surname><given-names>R</given-names></name><name><surname>Aturaliya</surname><given-names>RN</given-names></name><name><surname>Bailey</surname><given-names>TL</given-names></name><name><surname>Bansal</surname><given-names>M</given-names></name><name><surname>Baxter</surname><given-names>L</given-names></name><name><surname>Beisel</surname><given-names>KW</given-names></name><name><surname>Bersano</surname><given-names>T</given-names></name><name><surname>Bono</su
rname><given-names>H</given-names></name><name><surname>Chalk</surname><given-names>AM</given-names></name><name><surname>Chiu</surname><given-names>KP</given-names></name><name><surname>Choudhary</surname><given-names>V</given-names></name><name><surname>Christoffels</surname><given-names>A</given-names></name><name><surname>Clutterbuck</surname><given-names>DR</given-names></name><name><surname>Crowe</surname><given-names>ML</given-names></name><name><surname>Dalla</surname><given-names>E</given-names></name><name><surname>Dalrymple</surname><given-names>BP</given-names></name><name><surname>de Bono</surname><given-names>B</given-names></name><name><surname>Della Gatta</surname><given-names>G</given-names></name><name><surname>di Bernardo</surname><given-names>D</given-names></name><name><surname>Down</surname><given-names>T</given-names></name><name><surname>Engstrom</surname><given-names>P</given-names></name><name><surname>Fagiolini</surname><given-names>M</given-names></name><name><surname>Faulkner</surname><given-names>G</given-names></name><name><surname>Fletcher</surname><given-names>CF</given-names></name><name><surname>Fukushima</surname><given-names>T</given-names></name><name><surname>Furuno</surname><given-names>M</given-names></name><name><surname>Futaki</surname><given-names>S</given-names></name><name><surname>Gariboldi</surname><given-names>M</given-names></name><name><surname>Georgii-Hemming</surname><given-names>P</given-names></name><name><surname>Gingeras</surname><given-names>TR</given-names></name><name><surname>Gojobori</surname><given-names>T</given-names></name><name><surname>Green</surname><given-names>RE</given-names></name><name><surname>Gustincich</surname><given-names>S</given-names></name><name><surname>Harbers</surname><given-names>M</given-names></name><name><surname>Hayashi</surname><given-names>Y</given-names></name><name><surname>Hensch</surname><given-names>TK</given-names></name><name><surname>Hirokawa</surname><given-names>N<
/given-names></name><name><surname>Hill</surname><given-names>D</given-names></name><name><surname>Huminiecki</surname><given-names>L</given-names></name><name><surname>Iacono</surname><given-names>M</given-names></name><name><surname>Ikeo</surname><given-names>K</given-names></name><name><surname>Iwama</surname><given-names>A</given-names></name><name><surname>Ishikawa</surname><given-names>T</given-names></name><name><surname>Jakt</surname><given-names>M</given-names></name><name><surname>Kanapin</surname><given-names>A</given-names></name><name><surname>Katoh</surname><given-names>M</given-names></name><name><surname>Kawasawa</surname><given-names>Y</given-names></name><name><surname>Kelso</surname><given-names>J</given-names></name><name><surname>Kitamura</surname><given-names>H</given-names></name><name><surname>Kitano</surname><given-names>H</given-names></name><name><surname>Kollias</surname><given-names>G</given-names></name><name><surname>Krishnan</surname><given-names>SP</given-names></name><name><surname>Kruger</surname><given-names>A</given-names></name><name><surname>Kummerfeld</surname><given-names>SK</given-names></name><name><surname>Kurochkin</surname><given-names>IV</given-names></name><name><surname>Lareau</surname><given-names>LF</given-names></name><name><surname>Lazarevic</surname><given-names>D</given-names></name><name><surname>Lipovich</surname><given-names>L</given-names></name><name><surname>Liu</surname><given-names>J</given-names></name><name><surname>Liuni</surname><given-names>S</given-names></name><name><surname>McWilliam</surname><given-names>S</given-names></name><name><surname>Madan 
Babu</surname><given-names>M</given-names></name><name><surname>Madera</surname><given-names>M</given-names></name><name><surname>Marchionni</surname><given-names>L</given-names></name><name><surname>Matsuda</surname><given-names>H</given-names></name><name><surname>Matsuzawa</surname><given-names>S</given-names></name><name><surname>Miki</surname><given-names>H</given-names></name><name><surname>Mignone</surname><given-names>F</given-names></name><name><surname>Miyake</surname><given-names>S</given-names></name><name><surname>Morris</surname><given-names>K</given-names></name><name><surname>Mottagui-Tabar</surname><given-names>S</given-names></name><name><surname>Mulder</surname><given-names>N</given-names></name><name><surname>Nakano</surname><given-names>N</given-names></name><name><surname>Nakauchi</surname><given-names>H</given-names></name><name><surname>Ng</surname><given-names>P</given-names></name><name><surname>Nilsson</surname><given-names>R</given-names></name><name><surname>Nishiguchi</surname><given-names>S</given-names></name><name><surname>Nishikawa</surname><given-names>S</given-names></name><name><surname>Nori</surname><given-names>F</given-names></name><name><surname>Ohara</surname><given-names>O</given-names></name><name><surname>Okazaki</surname><given-names>Y</given-names></name><name><surname>Orlando</surname><given-names>V</given-names></name><name><surname>Pang</surname><given-names>KC</given-names></name><name><surname>Pavan</surname><given-names>WJ</given-names></name><name><surname>Pavesi</surname><given-names>G</given-names></name><name><surname>Pesole</surname><given-names>G</given-names></name><name><surname>Petrovsky</surname><given-names>N</given-names></name><name><surname>Piazza</surname><given-names>S</given-names></name><name><surname>Reed</surname><given-names>J</given-names></name><name><surname>Reid</surname><given-names>JF</given-names></name><name><surname>Ring</surname><given-names>BZ</given-names></name><name><surname>Ring
wald</surname><given-names>M</given-names></name><name><surname>Rost</surname><given-names>B</given-names></name><name><surname>Ruan</surname><given-names>Y</given-names></name><name><surname>Salzberg</surname><given-names>SL</given-names></name><name><surname>Sandelin</surname><given-names>A</given-names></name><name><surname>Schneider</surname><given-names>C</given-names></name><name><surname>Schönbach</surname><given-names>C</given-names></name><name><surname>Sekiguchi</surname><given-names>K</given-names></name><name><surname>Semple</surname><given-names>CA</given-names></name><name><surname>Seno</surname><given-names>S</given-names></name><name><surname>Sessa</surname><given-names>L</given-names></name><name><surname>Sheng</surname><given-names>Y</given-names></name><name><surname>Shibata</surname><given-names>Y</given-names></name><name><surname>Shimada</surname><given-names>H</given-names></name><name><surname>Shimada</surname><given-names>K</given-names></name><name><surname>Silva</surname><given-names>D</given-names></name><name><surname>Sinclair</surname><given-names>B</given-names></name><name><surname>Sperling</surname><given-names>S</given-names></name><name><surname>Stupka</surname><given-names>E</given-names></name><name><surname>Sugiura</surname><given-names>K</given-names></name><name><surname>Sultana</surname><given-names>R</given-names></name><name><surname>Takenaka</surname><given-names>Y</given-names></name><name><surname>Taki</surname><given-names>K</given-names></name><name><surname>Tammoja</surname><given-names>K</given-names></name><name><surname>Tan</surname><given-names>SL</given-names></name><name><surname>Tang</surname><given-names>S</given-names></name><name><surname>Taylor</surname><given-names>MS</given-names></name><name><surname>Tegner</surname><given-names>J</given-names></name><name><surname>Teichmann</surname><given-names>SA</given-names></name><name><surname>Ueda</surname><given-names>HR</given-names></name><name><surname>van 
Nimwegen</surname><given-names>E</given-names></name><name><surname>Verardo</surname><given-names>R</given-names></name><name><surname>Wei</surname><given-names>CL</given-names></name><name><surname>Yagi</surname><given-names>K</given-names></name><name><surname>Yamanishi</surname><given-names>H</given-names></name><name><surname>Zabarovsky</surname><given-names>E</given-names></name><name><surname>Zhu</surname><given-names>S</given-names></name><name><surname>Zimmer</surname><given-names>A</given-names></name><name><surname>Hide</surname><given-names>W</given-names></name><name><surname>Bult</surname><given-names>C</given-names></name><name><surname>Grimmond</surname><given-names>SM</given-names></name><name><surname>Teasdale</surname><given-names>RD</given-names></name><name><surname>Liu</surname><given-names>ET</given-names></name><name><surname>Brusic</surname><given-names>V</given-names></name><name><surname>Quackenbush</surname><given-names>J</given-names></name><name><surname>Wahlestedt</surname><given-names>C</given-names></name><name><surname>Mattick</surname><given-names>JS</given-names></name><name><surname>Hume</surname><given-names>DA</given-names></name><name><surname>Kai</surname><given-names>C</given-names></name><name><surname>Sasaki</surname><given-names>D</given-names></name><name><surname>Tomaru</surname><given-names>Y</given-names></name><name><surname>Fukuda</surname><given-names>S</given-names></name><name><surname>Kanamori-Katayama</surname><given-names>M</given-names></name><name><surname>Suzuki</surname><given-names>M</given-names></name><name><surname>Aoki</surname><given-names>J</given-names></name><name><surname>Arakawa</surname><given-names>T</given-names></name><name><surname>Iida</surname><given-names>J</given-names></name><name><surname>Imamura</surname><given-names>K</given-names></name><name><surname>Itoh</surname><given-names>M</given-names></name><name><surname>Kato</surname><given-names>T</given-names></name><name><surname>Kawaj
i</surname><given-names>H</given-names></name><name><surname>Kawagashira</surname><given-names>N</given-names></name><name><surname>Kawashima</surname><given-names>T</given-names></name><name><surname>Kojima</surname><given-names>M</given-names></name><name><surname>Kondo</surname><given-names>S</given-names></name><name><surname>Konno</surname><given-names>H</given-names></name><name><surname>Nakano</surname><given-names>K</given-names></name><name><surname>Ninomiya</surname><given-names>N</given-names></name><name><surname>Nishio</surname><given-names>T</given-names></name><name><surname>Okada</surname><given-names>M</given-names></name><name><surname>Plessy</surname><given-names>C</given-names></name><name><surname>Shibata</surname><given-names>K</given-names></name><name><surname>Shiraki</surname><given-names>T</given-names></name><name><surname>Suzuki</surname><given-names>S</given-names></name><name><surname>Tagami</surname><given-names>M</given-names></name><name><surname>Waki</surname><given-names>K</given-names></name><name><surname>Watahiki</surname><given-names>A</given-names></name><name><surname>Okamura-Oho</surname><given-names>Y</given-names></name><name><surname>Suzuki</surname><given-names>H</given-names></name><name><surname>Kawai</surname><given-names>J</given-names></name><name><surname>Hayashizaki</surname><given-names>Y</given-names></name>, <collab>FANTOM Consortium</collab>, <collab>RIKEN Genome Exploration Research Group and Genome Science Group (Genome Network Project Core Group)</collab></person-group><year>2005</year><article-title>The transcriptional landscape of the mammalian genome</article-title><source>Science</source><volume>309</volume><fpage>1559</fpage><lpage>1563</lpage><pub-id pub-id-type="doi">10.1126/science.1112014</pub-id></element-citation></ref><ref id="bib12"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Carvunis</surname><given-names>AR</given-names></name><name><surname>Rolland</surname><given-names>T</given-names></name><name><surname>Wapinski</surname><given-names>I</given-names></name><name><surname>Calderwood</surname><given-names>MA</given-names></name><name><surname>Yildirim</surname><given-names>MA</given-names></name><name><surname>Simonis</surname><given-names>N</given-names></name><name><surname>Charloteaux</surname><given-names>B</given-names></name><name><surname>Hidalgo</surname><given-names>CA</given-names></name><name><surname>Barbette</surname><given-names>J</given-names></name><name><surname>Santhanam</surname><given-names>B</given-names></name><name><surname>Brar</surname><given-names>GA</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name><name><surname>Regev</surname><given-names>A</given-names></name><name><surname>Thierry-Mieg</surname><given-names>N</given-names></name><name><surname>Cusick</surname><given-names>ME</given-names></name><name><surname>Vidal</surname><given-names>M</given-names></name></person-group><year>2012</year><article-title>Proto-genes and de novo gene birth</article-title><source>Nature</source><volume>487</volume><fpage>370</fpage><lpage>374</lpage><pub-id pub-id-type="doi">10.1038/nature11184</pub-id></element-citation></ref><ref id="bib13"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname><given-names>S</given-names></name><name><surname>Zhang</surname><given-names>YE</given-names></name><name><surname>Long</surname><given-names>M</given-names></name></person-group><year>2010</year><article-title>New genes in <italic>Drosophila</italic> quickly become essential</article-title><source>Science</source><volume>330</volume><fpage>1682</fpage><lpage>1685</lpage><pub-id pub-id-type="doi">10.1126/science.1196380</pub-id></element-citation></ref><ref id="bib14"><element-citation 
publication-type="journal"><person-group person-group-type="author"><name><surname>Chew</surname><given-names>GL</given-names></name><name><surname>Pauli</surname><given-names>A</given-names></name><name><surname>Rinn</surname><given-names>JL</given-names></name><name><surname>Regev</surname><given-names>A</given-names></name><name><surname>Schier</surname><given-names>AF</given-names></name><name><surname>Valen</surname><given-names>E</given-names></name></person-group><year>2013</year><article-title>Ribosome profiling reveals resemblance between long non-coding RNAs and 5’ leaders of coding RNAs</article-title><source>Development</source><volume>140</volume><fpage>2828</fpage><lpage>2834</lpage><pub-id pub-id-type="doi">10.1242/dev.098343</pub-id></element-citation></ref><ref id="bib15"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Crappé</surname><given-names>J</given-names></name><name><surname>Van Criekinge</surname><given-names>W</given-names></name><name><surname>Trooskens</surname><given-names>G</given-names></name><name><surname>Hayakawa</surname><given-names>E</given-names></name><name><surname>Luyten</surname><given-names>W</given-names></name><name><surname>Baggerman</surname><given-names>G</given-names></name><name><surname>Menschaert</surname><given-names>G</given-names></name></person-group><year>2013</year><article-title>Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs</article-title><source>BMC Genomics</source><volume>14</volume><fpage>648</fpage><pub-id pub-id-type="doi">10.1186/1471-2164-14-648</pub-id></element-citation></ref><ref id="bib16"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Derrien</surname><given-names>T</given-names></name><name><surname>Johnson</surname><given-names>R</given-names></name><name><surname>Bussotti</surname><given-names>G</given-names></name><name><surname>Tanzer</surname><given-names>A</given-names></name><name><surname>Djebali</surname><given-names>S</given-names></name><name><surname>Tilgner</surname><given-names>H</given-names></name><name><surname>Guernec</surname><given-names>G</given-names></name><name><surname>Martin</surname><given-names>D</given-names></name><name><surname>Merkel</surname><given-names>A</given-names></name><name><surname>Knowles</surname><given-names>DG</given-names></name><name><surname>Lagarde</surname><given-names>J</given-names></name><name><surname>Veeravalli</surname><given-names>L</given-names></name><name><surname>Ruan</surname><given-names>X</given-names></name><name><surname>Ruan</surname><given-names>Y</given-names></name><name><surname>Lassmann</surname><given-names>T</given-names></name><name><surname>Carninci</surname><given-names>P</given-names></name><name><surname>Brown</surname><given-names>JB</given-names></name><name><surname>Lipovich</surname><given-names>L</given-names></name><name><surname>Gonzalez</surname><given-names>JM</given-names></name><name><surname>Thomas</surname><given-names>M</given-names></name><name><surname>Davis</surname><given-names>CA</given-names></name><name><surname>Shiekhattar</surname><given-names>R</given-names></name><name><surname>Gingeras</surname><given-names>TR</given-names></name><name><surname>Hubbard</surname><given-names>TJ</given-names></name><name><surname>Notredame</surname><given-names>C</given-names></name><name><surname>Harrow</surname><given-names>J</given-names></name><name><surname>Guigó</surname><given-names>R</given-names></name></person-group><year>2012</year><article-title>The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and 
expression</article-title><source>Genome Research</source><volume>22</volume><fpage>1775</fpage><lpage>1789</lpage><pub-id pub-id-type="doi">10.1101/gr.132159.111</pub-id></element-citation></ref><ref id="bib17"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Dinger</surname><given-names>ME</given-names></name><name><surname>Pang</surname><given-names>KC</given-names></name><name><surname>Mercer</surname><given-names>TR</given-names></name><name><surname>Mattick</surname><given-names>JS</given-names></name></person-group><year>2008</year><article-title>Differentiating protein-coding and noncoding RNA: challenges and ambiguities</article-title><source>PLOS Computational Biology</source><volume>4</volume><fpage>e1000176</fpage><pub-id pub-id-type="doi">10.1371/journal.pcbi.1000176</pub-id></element-citation></ref><ref id="bib18"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Djebali</surname><given-names>S</given-names></name><name><surname>Davis</surname><given-names>CA</given-names></name><name><surname>Merkel</surname><given-names>A</given-names></name><name><surname>Dobin</surname><given-names>A</given-names></name><name><surname>Lassmann</surname><given-names>T</given-names></name><name><surname>Mortazavi</surname><given-names>A</given-names></name><name><surname>Tanzer</surname><given-names>A</given-names></name><name><surname>Lagarde</surname><given-names>J</given-names></name><name><surname>Lin</surname><given-names>W</given-names></name><name><surname>Schlesinger</surname><given-names>F</given-names></name><name><surname>Xue</surname><given-names>C</given-names></name><name><surname>Marinov</surname><given-names>GK</given-names></name><name><surname>Khatun</surname><given-names>J</given-names></name><name><surname>Williams</surname><given-names>BA</given-names></name><name><surname>Zaleski</surname><given-names>C</given-names></name><name><surname>Rozows
ky</surname><given-names>J</given-names></name><name><surname>Röder</surname><given-names>M</given-names></name><name><surname>Kokocinski</surname><given-names>F</given-names></name><name><surname>Abdelhamid</surname><given-names>RF</given-names></name><name><surname>Alioto</surname><given-names>T</given-names></name><name><surname>Antoshechkin</surname><given-names>I</given-names></name><name><surname>Baer</surname><given-names>MT</given-names></name><name><surname>Bar</surname><given-names>NS</given-names></name><name><surname>Batut</surname><given-names>P</given-names></name><name><surname>Bell</surname><given-names>K</given-names></name><name><surname>Bell</surname><given-names>I</given-names></name><name><surname>Chakrabortty</surname><given-names>S</given-names></name><name><surname>Chen</surname><given-names>X</given-names></name><name><surname>Chrast</surname><given-names>J</given-names></name><name><surname>Curado</surname><given-names>J</given-names></name><name><surname>Derrien</surname><given-names>T</given-names></name><name><surname>Drenkow</surname><given-names>J</given-names></name><name><surname>Dumais</surname><given-names>E</given-names></name><name><surname>Dumais</surname><given-names>J</given-names></name><name><surname>Duttagupta</surname><given-names>R</given-names></name><name><surname>Falconnet</surname><given-names>E</given-names></name><name><surname>Fastuca</surname><given-names>M</given-names></name><name><surname>Fejes-Toth</surname><given-names>K</given-names></name><name><surname>Ferreira</surname><given-names>P</given-names></name><name><surname>Foissac</surname><given-names>S</given-names></name><name><surname>Fullwood</surname><given-names>MJ</given-names></name><name><surname>Gao</surname><given-names>H</given-names></name><name><surname>Gonzalez</surname><given-names>D</given-names></name><name><surname>Gordon</surname><given-names>A</given-names></name><name><surname>Gunawardena</surname><given-names>H</given-names></name><name
><surname>Howald</surname><given-names>C</given-names></name><name><surname>Jha</surname><given-names>S</given-names></name><name><surname>Johnson</surname><given-names>R</given-names></name><name><surname>Kapranov</surname><given-names>P</given-names></name><name><surname>King</surname><given-names>B</given-names></name><name><surname>Kingswood</surname><given-names>C</given-names></name><name><surname>Luo</surname><given-names>OJ</given-names></name><name><surname>Park</surname><given-names>E</given-names></name><name><surname>Persaud</surname><given-names>K</given-names></name><name><surname>Preall</surname><given-names>JB</given-names></name><name><surname>Ribeca</surname><given-names>P</given-names></name><name><surname>Risk</surname><given-names>B</given-names></name><name><surname>Robyr</surname><given-names>D</given-names></name><name><surname>Sammeth</surname><given-names>M</given-names></name><name><surname>Schaffer</surname><given-names>L</given-names></name><name><surname>See</surname><given-names>LH</given-names></name><name><surname>Shahab</surname><given-names>A</given-names></name><name><surname>Skancke</surname><given-names>J</given-names></name><name><surname>Suzuki</surname><given-names>AM</given-names></name><name><surname>Takahashi</surname><given-names>H</given-names></name><name><surname>Tilgner</surname><given-names>H</given-names></name><name><surname>Trout</surname><given-names>D</given-names></name><name><surname>Walters</surname><given-names>N</given-names></name><name><surname>Wang</surname><given-names>H</given-names></name><name><surname>Wrobel</surname><given-names>J</given-names></name><name><surname>Yu</surname><given-names>Y</given-names></name><name><surname>Ruan</surname><given-names>X</given-names></name><name><surname>Hayashizaki</surname><given-names>Y</given-names></name><name><surname>Harrow</surname><given-names>J</given-names></name><name><surname>Gerstein</surname><given-names>M</given-names></name><name><surname>Hubbard<
/surname><given-names>T</given-names></name><name><surname>Reymond</surname><given-names>A</given-names></name><name><surname>Antonarakis</surname><given-names>SE</given-names></name><name><surname>Hannon</surname><given-names>G</given-names></name><name><surname>Giddings</surname><given-names>MC</given-names></name><name><surname>Ruan</surname><given-names>Y</given-names></name><name><surname>Wold</surname><given-names>B</given-names></name><name><surname>Carninci</surname><given-names>P</given-names></name><name><surname>Guigó</surname><given-names>R</given-names></name><name><surname>Gingeras</surname><given-names>TR</given-names></name></person-group><year>2012</year><article-title>Landscape of transcription in human cells</article-title><source>Nature</source><volume>489</volume><fpage>101</fpage><lpage>108</lpage><pub-id pub-id-type="doi">10.1038/nature11233</pub-id></element-citation></ref><ref id="bib19"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Donoghue</surname><given-names>MT</given-names></name><name><surname>Keshavaiah</surname><given-names>C</given-names></name><name><surname>Swamidatta</surname><given-names>SH</given-names></name><name><surname>Spillane</surname><given-names>C</given-names></name></person-group><year>2011</year><article-title>Evolutionary origins of <italic>Brassicaceae</italic> specific genes in <italic>Arabidopsis thaliana</italic></article-title><source>BMC Evolutionary Biology</source><volume>11</volume><fpage>47</fpage><pub-id pub-id-type="doi">10.1186/1471-2148-11-47</pub-id></element-citation></ref><ref id="bib20"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Dunn</surname><given-names>JG</given-names></name><name><surname>Foo</surname><given-names>CK</given-names></name><name><surname>Belletier</surname><given-names>NG</given-names></name><name><surname>Gavis</surname><given-names>ER</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name></person-group><year>2013</year><article-title>Ribosome profiling reveals pervasive and regulated stop codon readthrough in <italic>Drosophila melanogaster</italic></article-title><source>eLife</source><volume>2</volume><fpage>e01179</fpage><pub-id pub-id-type="doi">10.7554/eLife.01179</pub-id></element-citation></ref><ref id="bib21"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ekman</surname><given-names>D</given-names></name><name><surname>Björklund</surname><given-names>AK</given-names></name><name><surname>Elofsson</surname><given-names>A</given-names></name></person-group><year>2007</year><article-title>Quantification of the elevated rate of domain rearrangements in metazoa</article-title><source>Journal of Molecular Biology</source><volume>372</volume><fpage>1337</fpage><lpage>1348</lpage><pub-id pub-id-type="doi">10.1016/j.jmb.2007.06.022</pub-id></element-citation></ref><ref id="bib22"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ekman</surname><given-names>D</given-names></name><name><surname>Elofsson</surname><given-names>A</given-names></name></person-group><year>2010</year><article-title>Identifying and quantifying orphan protein sequences in fungi</article-title><source>Journal of Molecular Biology</source><volume>396</volume><fpage>396</fpage><lpage>405</lpage><pub-id pub-id-type="doi">10.1016/j.jmb.2009.11.053</pub-id></element-citation></ref><ref id="bib23"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Eyre-Walker</surname><given-names>A</given-names></name></person-group><year>2002</year><article-title>Changing effective population size and the McDonald-Kreitman test</article-title><source>Genetics</source><volume>162</volume><fpage>2017</fpage><lpage>2024</lpage></element-citation></ref><ref id="bib24"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Fatica</surname><given-names>A</given-names></name><name><surname>Bozzoni</surname><given-names>I</given-names></name></person-group><year>2014</year><article-title>Long non-coding RNAs: new players in cell differentiation and development</article-title><source>Nature Reviews Genetics</source><volume>15</volume><fpage>7</fpage><lpage>21</lpage><pub-id pub-id-type="doi">10.1038/nrg3606</pub-id></element-citation></ref><ref id="bib25"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Flicek</surname><given-names>P</given-names></name><name><surname>Amode</surname><given-names>M</given-names></name><name><surname>Barrell</surname><given-names>D</given-names></name><name><surname>Beal</surname><given-names>K</given-names></name><name><surname>Brent</surname><given-names>S</given-names></name><name><surname>Carvalho-Silva</surname><given-names>D</given-names></name><name><surname>Clapham</surname><given-names>P</given-names></name><name><surname>Coates</surname><given-names>G</given-names></name><name><surname>Fairley</surname><given-names>S</given-names></name><name><surname>Fitzgerald</surname><given-names>S</given-names></name><name><surname>Gil</surname><given-names>L</given-names></name><name><surname>Gordon</surname><given-names>L</given-names></name><name><surname>Hendrix</surname><given-names>M</given-names></name><name><surname>Hourlier</surname><given-names>T</given-names></name><name><surname>Johnson</surname><given-names>N</given-names></name><name><surname>Kähäri</s
urname><given-names>AK</given-names></name><name><surname>Keefe</surname><given-names>D</given-names></name><name><surname>Keenan</surname><given-names>S</given-names></name><name><surname>Kinsella</surname><given-names>R</given-names></name><name><surname>Komorowska</surname><given-names>M</given-names></name><name><surname>Koscielny</surname><given-names>G</given-names></name><name><surname>Kulesha</surname><given-names>E</given-names></name><name><surname>Larsson</surname><given-names>P</given-names></name><name><surname>Longden</surname><given-names>I</given-names></name><name><surname>McLaren</surname><given-names>W</given-names></name><name><surname>Muffato</surname><given-names>M</given-names></name><name><surname>Overduin</surname><given-names>B</given-names></name><name><surname>Pignatelli</surname><given-names>M</given-names></name><name><surname>Pritchard</surname><given-names>B</given-names></name><name><surname>Riat</surname><given-names>HS</given-names></name><name><surname>Ritchie</surname><given-names>GR</given-names></name><name><surname>Ruffier</surname><given-names>M</given-names></name><name><surname>Schuster</surname><given-names>M</given-names></name><name><surname>Sobral</surname><given-names>D</given-names></name><name><surname>Tang</surname><given-names>YA</given-names></name><name><surname>Taylor</surname><given-names>K</given-names></name><name><surname>Trevanion</surname><given-names>S</given-names></name><name><surname>Vandrovcova</surname><given-names>J</given-names></name><name><surname>White</surname><given-names>S</given-names></name><name><surname>Wilson</surname><given-names>M</given-names></name><name><surname>Wilder</surname><given-names>SP</given-names></name><name><surname>Aken</surname><given-names>BL</given-names></name><name><surname>Birney</surname><given-names>E</given-names></name><name><surname>Cunningham</surname><given-names>F</given-names></name><name><surname>Dunham</surname><given-names>I</given-names></name><name><
surname>Durbin</surname><given-names>R</given-names></name><name><surname>Fernández-Suarez</surname><given-names>XM</given-names></name><name><surname>Harrow</surname><given-names>J</given-names></name><name><surname>Herrero</surname><given-names>J</given-names></name><name><surname>Hubbard</surname><given-names>TJ</given-names></name><name><surname>Parker</surname><given-names>A</given-names></name><name><surname>Proctor</surname><given-names>G</given-names></name><name><surname>Spudich</surname><given-names>G</given-names></name><name><surname>Vogel</surname><given-names>J</given-names></name><name><surname>Yates</surname><given-names>A</given-names></name><name><surname>Zadissa</surname><given-names>A</given-names></name><name><surname>Searle</surname><given-names>SM</given-names></name></person-group><year>2012</year><article-title>Ensembl 2012</article-title><source>Nucleic Acids Research</source><volume>40</volume><fpage>D84</fpage><lpage>D90</lpage><pub-id pub-id-type="doi">10.1093/nar/gkr991</pub-id></element-citation></ref><ref id="bib26"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Frith</surname><given-names>MC</given-names></name><name><surname>Forrest</surname><given-names>AR</given-names></name><name><surname>Nourbakhsh</surname><given-names>E</given-names></name><name><surname>Pang</surname><given-names>KC</given-names></name><name><surname>Kai</surname><given-names>C</given-names></name><name><surname>Kawai</surname><given-names>J</given-names></name><name><surname>Carninci</surname><given-names>P</given-names></name><name><surname>Hayashizaki</surname><given-names>Y</given-names></name><name><surname>Bailey</surname><given-names>TL</given-names></name><name><surname>Grimmond</surname><given-names>SM</given-names></name></person-group><year>2006</year><article-title>The abundance of short proteins in the mammalian proteome</article-title><source>PLOS 
Genetics</source><volume>2</volume><fpage>e52</fpage><pub-id pub-id-type="doi">10.1371/journal.pgen.0020052</pub-id></element-citation></ref><ref id="bib27"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Galindo</surname><given-names>MI</given-names></name><name><surname>Pueyo</surname><given-names>JI</given-names></name><name><surname>Fouix</surname><given-names>S</given-names></name><name><surname>Bishop</surname><given-names>SA</given-names></name><name><surname>Couso</surname><given-names>JP</given-names></name></person-group><year>2007</year><article-title>Peptides encoded by short ORFs control development and define a new eukaryotic gene family</article-title><source>PLOS Biology</source><volume>5</volume><fpage>e106</fpage><pub-id pub-id-type="doi">10.1371/journal.pbio.0050106</pub-id></element-citation></ref><ref id="bib28"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Guo</surname><given-names>H</given-names></name><name><surname>Ingolia</surname><given-names>NT</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name><name><surname>Bartel</surname><given-names>DP</given-names></name></person-group><year>2010</year><article-title>Mammalian microRNAs predominantly act to decrease target mRNA levels</article-title><source>Nature</source><volume>466</volume><fpage>835</fpage><lpage>840</lpage><pub-id pub-id-type="doi">10.1038/nature09267</pub-id></element-citation></ref><ref id="bib29"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Guttman</surname><given-names>M</given-names></name><name><surname>Rinn</surname><given-names>JL</given-names></name></person-group><year>2012</year><article-title>Modular regulatory principles of large non-coding RNAs</article-title><source>Nature</source><volume>482</volume><fpage>339</fpage><lpage>346</lpage><pub-id 
pub-id-type="doi">10.1038/nature10887</pub-id></element-citation></ref><ref id="bib30"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Guttman</surname><given-names>M</given-names></name><name><surname>Russell</surname><given-names>P</given-names></name><name><surname>Ingolia</surname><given-names>NT</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name><name><surname>Lander</surname><given-names>E</given-names></name></person-group><year>2013</year><article-title>Ribosome profiling provides evidence that large noncoding RNAs do not encode proteins</article-title><source>Cell</source><volume>154</volume><fpage>240</fpage><lpage>251</lpage><pub-id pub-id-type="doi">10.1016/j.cell.2013.06.009</pub-id></element-citation></ref><ref id="bib31"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hanada</surname><given-names>K</given-names></name><name><surname>Higuchi-Takeuchi</surname><given-names>M</given-names></name><name><surname>Okamoto</surname><given-names>M</given-names></name><name><surname>Yoshizumi</surname><given-names>T</given-names></name><name><surname>Shimizu</surname><given-names>M</given-names></name><name><surname>Nakaminami</surname><given-names>K</given-names></name><name><surname>Nishi</surname><given-names>R</given-names></name><name><surname>Ohashi</surname><given-names>C</given-names></name><name><surname>Iida</surname><given-names>K</given-names></name><name><surname>Tanaka</surname><given-names>M</given-names></name><name><surname>Horii</surname><given-names>Y</given-names></name><name><surname>Kawashima</surname><given-names>M</given-names></name><name><surname>Matsui</surname><given-names>K</given-names></name><name><surname>Toyoda</surname><given-names>T</given-names></name><name><surname>Shinozaki</surname><given-names>K</given-names></name><name><surname>Seki</surname><given-names>M</given-names></name><n
ame><surname>Matsui</surname><given-names>M</given-names></name></person-group><year>2013</year><article-title>Small open reading frames associated with morphogenesis are hidden in plant genomes</article-title><source>Proceedings of the National Academy of Sciences of USA</source><volume>110</volume><fpage>2395</fpage><lpage>2400</lpage><pub-id pub-id-type="doi">10.1073/pnas.1213958110</pub-id></element-citation></ref><ref id="bib32"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Harrow</surname><given-names>J</given-names></name><name><surname>Frankish</surname><given-names>A</given-names></name><name><surname>Gonzalez</surname><given-names>JM</given-names></name><name><surname>Tapanari</surname><given-names>E</given-names></name><name><surname>Diekhans</surname><given-names>M</given-names></name><name><surname>Kokocinski</surname><given-names>F</given-names></name><name><surname>Aken</surname><given-names>BL</given-names></name><name><surname>Barrell</surname><given-names>D</given-names></name><name><surname>Zadissa</surname><given-names>A</given-names></name><name><surname>Searle</surname><given-names>S</given-names></name><name><surname>Barnes</surname><given-names>I</given-names></name><name><surname>Bignell</surname><given-names>A</given-names></name><name><surname>Boychenko</surname><given-names>V</given-names></name><name><surname>Hunt</surname><given-names>T</given-names></name><name><surname>Kay</surname><given-names>M</given-names></name><name><surname>Mukherjee</surname><given-names>G</given-names></name><name><surname>Rajan</surname><given-names>J</given-names></name><name><surname>Despacio-Reyes</surname><given-names>G</given-names></name><name><surname>Saunders</surname><given-names>G</given-names></name><name><surname>Steward</surname><given-names>C</given-names></name><name><surname>Harte</surname><given-names>R</given-names></name><name><surname>Lin</surname><given-names>M</given-names></name><n
ame><surname>Howald</surname><given-names>C</given-names></name><name><surname>Tanzer</surname><given-names>A</given-names></name><name><surname>Derrien</surname><given-names>T</given-names></name><name><surname>Chrast</surname><given-names>J</given-names></name><name><surname>Walters</surname><given-names>N</given-names></name><name><surname>Balasubramanian</surname><given-names>S</given-names></name><name><surname>Pei</surname><given-names>B</given-names></name><name><surname>Tress</surname><given-names>M</given-names></name><name><surname>Rodriguez</surname><given-names>JM</given-names></name><name><surname>Ezkurdia</surname><given-names>I</given-names></name><name><surname>van Baren</surname><given-names>J</given-names></name><name><surname>Brent</surname><given-names>M</given-names></name><name><surname>Haussler</surname><given-names>D</given-names></name><name><surname>Kellis</surname><given-names>M</given-names></name><name><surname>Valencia</surname><given-names>A</given-names></name><name><surname>Reymond</surname><given-names>A</given-names></name><name><surname>Gerstein</surname><given-names>M</given-names></name><name><surname>Guigó</surname><given-names>R</given-names></name><name><surname>Hubbard</surname><given-names>TJ</given-names></name></person-group><year>2012</year><article-title>GENCODE: the reference human genome annotation for The ENCODE Project</article-title><source>Genome Research</source><volume>22</volume><fpage>1760</fpage><lpage>1774</lpage><pub-id pub-id-type="doi">10.1101/gr.135350.111</pub-id></element-citation></ref><ref id="bib33"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Hashimoto</surname><given-names>Y</given-names></name><name><surname>Niikura</surname><given-names>T</given-names></name><name><surname>Tajima</surname><given-names>H</given-names></name><name><surname>Yasukawa</surname><given-names>T</given-names></name><name><surname>Sudo</surname><given-names>H</given-names></name><name><surname>Ito</surname><given-names>Y</given-names></name><name><surname>Kita</surname><given-names>Y</given-names></name><name><surname>Kawasumi</surname><given-names>M</given-names></name><name><surname>Kouyama</surname><given-names>K</given-names></name><name><surname>Doyu</surname><given-names>M</given-names></name><name><surname>Sobue</surname><given-names>G</given-names></name><name><surname>Koide</surname><given-names>T</given-names></name><name><surname>Tsuji</surname><given-names>S</given-names></name><name><surname>Lang</surname><given-names>J</given-names></name><name><surname>Kurokawa</surname><given-names>K</given-names></name><name><surname>Nishimoto</surname><given-names>I</given-names></name></person-group><year>2001</year><article-title>A rescue factor abolishing neuronal cell death by a wide spectrum of familial Alzheimer’s disease genes and abeta</article-title><source>Proceedings of the National Academy of Sciences of USA</source><volume>98</volume><fpage>6336</fpage><lpage>6341</lpage><pub-id pub-id-type="doi">10.1073/pnas.101133498</pub-id></element-citation></ref><ref id="bib34"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname><given-names>Y</given-names></name><name><surname>Ainsley</surname><given-names>JA</given-names></name><name><surname>Reijmers</surname><given-names>LG</given-names></name><name><surname>Jackson</surname><given-names>FR</given-names></name></person-group><year>2013</year><article-title>Translational profiling of clock cells reveals circadianly synchronized protein 
synthesis</article-title><source>PLOS Biology</source><volume>11</volume><fpage>e1001703</fpage><pub-id pub-id-type="doi">10.1371/journal.pbio.1001703</pub-id></element-citation></ref><ref id="bib35"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ingolia</surname><given-names>NT</given-names></name></person-group><year>2014</year><article-title>Ribosome profiling: new views of translation, from single codons to genome scale</article-title><source>Nature Reviews Genetics</source><volume>15</volume><fpage>205</fpage><lpage>213</lpage><pub-id pub-id-type="doi">10.1038/nrg3645</pub-id></element-citation></ref><ref id="bib36"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ingolia</surname><given-names>NT</given-names></name><name><surname>Ghaemmaghami</surname><given-names>S</given-names></name><name><surname>Newman</surname><given-names>JR</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name></person-group><year>2009</year><article-title>Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling</article-title><source>Science</source><volume>324</volume><fpage>218</fpage><lpage>223</lpage><pub-id pub-id-type="doi">10.1126/science.1168978</pub-id></element-citation></ref><ref id="bib37"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ingolia</surname><given-names>NT</given-names></name><name><surname>Lareau</surname><given-names>LF</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name></person-group><year>2011</year><article-title>Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes</article-title><source>Cell</source><volume>147</volume><fpage>789</fpage><lpage>802</lpage><pub-id 
pub-id-type="doi">10.1016/j.cell.2011.10.002</pub-id></element-citation></ref><ref id="bib38"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jacob</surname><given-names>F</given-names></name></person-group><year>1977</year><article-title>Evolution and tinkering</article-title><source>Science</source><volume>196</volume><fpage>1161</fpage><lpage>1166</lpage><pub-id pub-id-type="doi">10.1126/science.860134</pub-id></element-citation></ref><ref id="bib39"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Juntawong</surname><given-names>P</given-names></name><name><surname>Girke</surname><given-names>T</given-names></name><name><surname>Bazin</surname><given-names>J</given-names></name><name><surname>Bailey-Serres</surname><given-names>J</given-names></name></person-group><year>2014</year><article-title>Translational dynamics revealed by genome-wide profiling of ribosome footprints in <italic>Arabidopsis</italic></article-title><source>Proceedings of the National Academy of Sciences of USA</source><volume>111</volume><fpage>E203</fpage><lpage>E212</lpage><pub-id pub-id-type="doi">10.1073/pnas.1317811111</pub-id></element-citation></ref><ref id="bib40"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Kapranov</surname><given-names>P</given-names></name><name><surname>Cheng</surname><given-names>J</given-names></name><name><surname>Dike</surname><given-names>S</given-names></name><name><surname>Nix</surname><given-names>DA</given-names></name><name><surname>Duttagupta</surname><given-names>R</given-names></name><name><surname>Willingham</surname><given-names>AT</given-names></name><name><surname>Stadler</surname><given-names>PF</given-names></name><name><surname>Hertel</surname><given-names>J</given-names></name><name><surname>Hackermüller</surname><given-names>J</given-names></name><name><surname>Hofacker</surname><given-names>IL</given-names></name><name><surname>Bell</surname><given-names>I</given-names></name><name><surname>Cheung</surname><given-names>E</given-names></name><name><surname>Drenkow</surname><given-names>J</given-names></name><name><surname>Dumais</surname><given-names>E</given-names></name><name><surname>Patel</surname><given-names>S</given-names></name><name><surname>Helt</surname><given-names>G</given-names></name><name><surname>Ganesh</surname><given-names>M</given-names></name><name><surname>Ghosh</surname><given-names>S</given-names></name><name><surname>Piccolboni</surname><given-names>A</given-names></name><name><surname>Sementchenko</surname><given-names>V</given-names></name><name><surname>Tammana</surname><given-names>H</given-names></name><name><surname>Gingeras</surname><given-names>TR</given-names></name></person-group><year>2007</year><article-title>RNA maps reveal new RNA classes and a possible function for pervasive transcription</article-title><source>Science</source><volume>316</volume><fpage>1484</fpage><lpage>1488</lpage><pub-id pub-id-type="doi">10.1126/science.1138341</pub-id></element-citation></ref><ref id="bib41"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Kastenmayer</surname><given-names>JP</given-names></name><name><surname>Ni</surname><given-names>L</given-names></name><name><surname>Chu</surname><given-names>A</given-names></name><name><surname>Kitchen</surname><given-names>LE</given-names></name><name><surname>Au</surname><given-names>WC</given-names></name><name><surname>Yang</surname><given-names>H</given-names></name><name><surname>Carter</surname><given-names>CD</given-names></name><name><surname>Wheeler</surname><given-names>D</given-names></name><name><surname>Davis</surname><given-names>RW</given-names></name><name><surname>Boeke</surname><given-names>JD</given-names></name><name><surname>Snyder</surname><given-names>MA</given-names></name><name><surname>Basrai</surname><given-names>MA</given-names></name></person-group><year>2006</year><article-title>Functional genomics of genes with small open reading frames (sORFs) in <italic>S. cerevisiae</italic></article-title><source>Genome Research</source><volume>16</volume><fpage>365</fpage><lpage>373</lpage><pub-id pub-id-type="doi">10.1101/gr.4355406</pub-id></element-citation></ref><ref id="bib42"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Khalturin</surname><given-names>K</given-names></name><name><surname>Hemmrich</surname><given-names>G</given-names></name><name><surname>Fraune</surname><given-names>S</given-names></name><name><surname>Augustin</surname><given-names>R</given-names></name><name><surname>Bosch</surname><given-names>TC</given-names></name></person-group><year>2009</year><article-title>More than just orphans: are taxonomically-restricted genes important in evolution?</article-title><source>Trends in Genetics</source><volume>25</volume><fpage>404</fpage><lpage>413</lpage><pub-id pub-id-type="doi">10.1016/j.tig.2009.07.006</pub-id></element-citation></ref><ref id="bib43"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Kim</surname><given-names>D</given-names></name><name><surname>Pertea</surname><given-names>G</given-names></name><name><surname>Trapnell</surname><given-names>C</given-names></name><name><surname>Pimentel</surname><given-names>H</given-names></name><name><surname>Kelley</surname><given-names>R</given-names></name><name><surname>Salzberg</surname><given-names>SL</given-names></name></person-group><year>2013</year><article-title>TopHat2: Accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions</article-title><source>Genome Biology</source><volume>14</volume><fpage>R36</fpage><pub-id pub-id-type="doi">10.1186/gb-2013-14-4-r36</pub-id></element-citation></ref><ref id="bib44"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname><given-names>MS</given-names></name><name><surname>Pinto</surname><given-names>SM</given-names></name><name><surname>Getnet</surname><given-names>D</given-names></name><name><surname>Nirujogi</surname><given-names>RS</given-names></name><name><surname>Manda</surname><given-names>SS</given-names></name><name><surname>Chaerkady</surname><given-names>R</given-names></name><name><surname>Madugundu</surname><given-names>AK</given-names></name><name><surname>Kelkar</surname><given-names>DS</given-names></name><name><surname>Isserlin</surname><given-names>R</given-names></name><name><surname>Jain</surname><given-names>S</given-names></name><name><surname>Thomas</surname><given-names>JK</given-names></name><name><surname>Muthusamy</surname><given-names>B</given-names></name><name><surname>Leal-Rojas</surname><given-names>P</given-names></name><name><surname>Kumar</surname><given-names>P</given-names></name><name><surname>Sahasrabuddhe</surname><given-names>NA</given-names></name><name><surname>Balakrishnan</surname><given-names>L</given-names></name><name><surname>Advani</surname><given-names>J</given-names></name><
name><surname>George</surname><given-names>B</given-names></name><name><surname>Renuse</surname><given-names>S</given-names></name><name><surname>Selvan</surname><given-names>LD</given-names></name><name><surname>Patil</surname><given-names>AH</given-names></name><name><surname>Nanjappa</surname><given-names>V</given-names></name><name><surname>Radhakrishnan</surname><given-names>A</given-names></name><name><surname>Prasad</surname><given-names>S</given-names></name><name><surname>Subbannayya</surname><given-names>T</given-names></name><name><surname>Raju</surname><given-names>R</given-names></name><name><surname>Kumar</surname><given-names>M</given-names></name><name><surname>Sreenivasamurthy</surname><given-names>SK</given-names></name><name><surname>Marimuthu</surname><given-names>A</given-names></name><name><surname>Sathe</surname><given-names>GJ</given-names></name><name><surname>Chavan</surname><given-names>S</given-names></name><name><surname>Datta</surname><given-names>KK</given-names></name><name><surname>Subbannayya</surname><given-names>Y</given-names></name><name><surname>Sahu</surname><given-names>A</given-names></name><name><surname>Yelamanchi</surname><given-names>SD</given-names></name><name><surname>Jayaram</surname><given-names>S</given-names></name><name><surname>Rajagopalan</surname><given-names>P</given-names></name><name><surname>Sharma</surname><given-names>J</given-names></name><name><surname>Murthy</surname><given-names>KR</given-names></name><name><surname>Syed</surname><given-names>N</given-names></name><name><surname>Goel</surname><given-names>R</given-names></name><name><surname>Khan</surname><given-names>AA</given-names></name><name><surname>Ahmad</surname><given-names>S</given-names></name><name><surname>Dey</surname><given-names>G</given-names></name><name><surname>Mudgal</surname><given-names>K</given-names></name><name><surname>Chatterjee</surname><given-names>A</given-names></name><name><surname>Huang</surname><given-names>TC</give
n-names></name><name><surname>Zhong</surname><given-names>J</given-names></name><name><surname>Wu</surname><given-names>X</given-names></name><name><surname>Shaw</surname><given-names>PG</given-names></name><name><surname>Freed</surname><given-names>D</given-names></name><name><surname>Zahari</surname><given-names>MS</given-names></name><name><surname>Mukherjee</surname><given-names>KK</given-names></name><name><surname>Shankar</surname><given-names>S</given-names></name><name><surname>Mahadevan</surname><given-names>A</given-names></name><name><surname>Lam</surname><given-names>H</given-names></name><name><surname>Mitchell</surname><given-names>CJ</given-names></name><name><surname>Shankar</surname><given-names>SK</given-names></name><name><surname>Satishchandra</surname><given-names>P</given-names></name><name><surname>Schroeder</surname><given-names>JT</given-names></name><name><surname>Sirdeshmukh</surname><given-names>R</given-names></name><name><surname>Maitra</surname><given-names>A</given-names></name><name><surname>Leach</surname><given-names>SD</given-names></name><name><surname>Drake</surname><given-names>CG</given-names></name><name><surname>Halushka</surname><given-names>MK</given-names></name><name><surname>Prasad</surname><given-names>TS</given-names></name><name><surname>Hruban</surname><given-names>RH</given-names></name><name><surname>Kerr</surname><given-names>CL</given-names></name><name><surname>Bader</surname><given-names>GD</given-names></name><name><surname>Iacobuzio-Donahue</surname><given-names>CA</given-names></name><name><surname>Gowda</surname><given-names>H</given-names></name><name><surname>Pandey</surname><given-names>A</given-names></name></person-group><year>2014</year><article-title>A draft map of the human proteome</article-title><source>Nature</source><volume>509</volume><fpage>575</fpage><lpage>581</lpage><pub-id pub-id-type="doi">10.1038/nature13302</pub-id></element-citation></ref><ref id="bib45"><element-citation 
publication-type="journal"><person-group person-group-type="author"><name><surname>Kondo</surname><given-names>T</given-names></name><name><surname>Hashimoto</surname><given-names>Y</given-names></name><name><surname>Kato</surname><given-names>K</given-names></name><name><surname>Inagaki</surname><given-names>S</given-names></name><name><surname>Hayashi</surname><given-names>S</given-names></name><name><surname>Kageyama</surname><given-names>Y</given-names></name></person-group><year>2007</year><article-title>Small peptide regulators of actin-based cell morphogenesis encoded by a polycistronic mRNA</article-title><source>Nature Cell Biology</source><volume>9</volume><fpage>660</fpage><lpage>665</lpage><pub-id pub-id-type="doi">10.1038/ncb1595</pub-id></element-citation></ref><ref id="bib46"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kutter</surname><given-names>C</given-names></name><name><surname>Watt</surname><given-names>S</given-names></name><name><surname>Stefflova</surname><given-names>K</given-names></name><name><surname>Wilson</surname><given-names>MD</given-names></name><name><surname>Goncalves</surname><given-names>A</given-names></name><name><surname>Ponting</surname><given-names>CP</given-names></name><name><surname>Odom</surname><given-names>DT</given-names></name><name><surname>Marques</surname><given-names>AC</given-names></name></person-group><year>2012</year><article-title>Rapid turnover of long noncoding RNAs and the evolution of gene expression</article-title><source>PLOS Genetics</source><volume>8</volume><fpage>e1002841</fpage><pub-id pub-id-type="doi">10.1371/journal.pgen.1002841</pub-id></element-citation></ref><ref id="bib47"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Ladoukakis</surname><given-names>E</given-names></name><name><surname>Pereira</surname><given-names>V</given-names></name><name><surname>Magny</surname><given-names>EG</given-names></name><name><surname>Eyre-Walker</surname><given-names>A</given-names></name><name><surname>Couso</surname><given-names>JP</given-names></name></person-group><year>2011</year><article-title>Hundreds of putatively functional small open reading frames in <italic>Drosophila</italic></article-title><source>Genome Biology</source><volume>12</volume><fpage>R118</fpage><pub-id pub-id-type="doi">10.1186/gb-2011-12-11-r118</pub-id></element-citation></ref><ref id="bib48"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Langmead</surname><given-names>B</given-names></name><name><surname>Trapnell</surname><given-names>C</given-names></name><name><surname>Pop</surname><given-names>M</given-names></name><name><surname>Salzberg</surname><given-names>SL</given-names></name></person-group><year>2009</year><article-title>Ultrafast and memory-efficient alignment of short DNA sequences to the human genome</article-title><source>Genome Biology</source><volume>10</volume><fpage>R25</fpage><pub-id pub-id-type="doi">10.1186/gb-2009-10-3-r25</pub-id></element-citation></ref><ref id="bib49"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname><given-names>C</given-names></name><name><surname>Yen</surname><given-names>K</given-names></name><name><surname>Cohen</surname><given-names>P</given-names></name></person-group><year>2013</year><article-title>Humanin: a harbinger of mitochondrial-derived peptides?</article-title><source>Trends in Endocrinology and Metabolism</source><volume>24</volume><fpage>222</fpage><lpage>228</lpage><pub-id pub-id-type="doi">10.1016/j.tem.2013.01.005</pub-id></element-citation></ref><ref id="bib50"><element-citation 
publication-type="journal"><person-group person-group-type="author"><name><surname>Levine</surname><given-names>MT</given-names></name><name><surname>Jones</surname><given-names>CD</given-names></name><name><surname>Kern</surname><given-names>AD</given-names></name><name><surname>Lindfors</surname><given-names>HA</given-names></name><name><surname>Begun</surname><given-names>DJ</given-names></name></person-group><year>2006</year><article-title>Novel genes derived from noncoding DNA in <italic>Drosophila melanogaster</italic> are frequently X-linked and exhibit testis-biased expression</article-title><source>Proceedings of the National Academy of Sciences of USA</source><volume>103</volume><fpage>9935</fpage><lpage>9939</lpage><pub-id pub-id-type="doi">10.1073/pnas.0509809103</pub-id></element-citation></ref><ref id="bib51"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname><given-names>J</given-names></name><name><surname>Hutchison</surname><given-names>K</given-names></name><name><surname>Perrone-Bizzozero</surname><given-names>N</given-names></name><name><surname>Morgan</surname><given-names>M</given-names></name><name><surname>Sui</surname><given-names>J</given-names></name><name><surname>Calhoun</surname><given-names>V</given-names></name></person-group><year>2010</year><article-title>Identification of genetic and epigenetic marks involved in population structure</article-title><source>PLOS ONE</source><volume>5</volume><fpage>e13209</fpage><pub-id pub-id-type="doi">10.1371/journal.pone.0013209</pub-id></element-citation></ref><ref id="bib52"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Liu</surname><given-names>J</given-names></name><name><surname>Jung</surname><given-names>C</given-names></name><name><surname>Xu</surname><given-names>J</given-names></name><name><surname>Wang</surname><given-names>H</given-names></name><name><surname>Deng</surname><given-names>S</given-names></name><name><surname>Bernad</surname><given-names>L</given-names></name><name><surname>Arenas-Huertero</surname><given-names>C</given-names></name><name><surname>Chua</surname><given-names>NH</given-names></name></person-group><year>2012</year><article-title>Genome-wide analysis uncovers regulation of long intergenic noncoding RNAs in <italic>Arabidopsis</italic></article-title><source>The Plant Cell</source><volume>24</volume><fpage>4333</fpage><lpage>4345</lpage><pub-id pub-id-type="doi">10.1105/tpc.112.102855</pub-id></element-citation></ref><ref id="bib53"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname><given-names>J</given-names></name><name><surname>Zhang</surname><given-names>Y</given-names></name><name><surname>Lei</surname><given-names>X</given-names></name><name><surname>Zhang</surname><given-names>Z</given-names></name></person-group><year>2008</year><article-title>Natural selection of protein structural and functional properties: a single nucleotide polymorphism perspective</article-title><source>Genome Biology</source><volume>9</volume><fpage>R69</fpage><pub-id pub-id-type="doi">10.1186/gb-2008-9-4-r69</pub-id></element-citation></ref><ref id="bib54"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Long</surname><given-names>M</given-names></name><name><surname>VanKuren</surname><given-names>NW</given-names></name><name><surname>Chen</surname><given-names>S</given-names></name><name><surname>Vibranovski</surname><given-names>MD</given-names></name></person-group><year>2013</year><article-title>New gene 
evolution: little did we know</article-title><source>Annual Review of Genetics</source><volume>47</volume><fpage>307</fpage><lpage>333</lpage><pub-id pub-id-type="doi">10.1146/annurev-genet-111212-133301</pub-id></element-citation></ref><ref id="bib55"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname><given-names>J</given-names></name><name><surname>Ward</surname><given-names>CC</given-names></name><name><surname>Jungreis</surname><given-names>I</given-names></name><name><surname>Slavoff</surname><given-names>SA</given-names></name><name><surname>Schwaid</surname><given-names>AG</given-names></name><name><surname>Neveu</surname><given-names>J</given-names></name><name><surname>Budnik</surname><given-names>BA</given-names></name><name><surname>Kellis</surname><given-names>M</given-names></name><name><surname>Saghatelian</surname><given-names>A</given-names></name></person-group><year>2014</year><article-title>Discovery of human sORF-encoded polypeptides (SEPs) in cell lines and tissue</article-title><source>Journal of Proteome Research</source><volume>13</volume><fpage>1757</fpage><lpage>1765</lpage><pub-id pub-id-type="doi">10.1021/pr401280w</pub-id></element-citation></ref><ref id="bib56"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Magny</surname><given-names>EG</given-names></name><name><surname>Pueyo</surname><given-names>JI</given-names></name><name><surname>Pearl</surname><given-names>FM</given-names></name><name><surname>Cespedes</surname><given-names>MA</given-names></name><name><surname>Niven</surname><given-names>JE</given-names></name><name><surname>Bishop</surname><given-names>SA</given-names></name><name><surname>Couso</surname><given-names>JP</given-names></name></person-group><year>2013</year><article-title>Conserved regulation of cardiac calcium uptake by peptides encoded in small open reading 
frames</article-title><source>Science</source><volume>341</volume><fpage>1116</fpage><lpage>1120</lpage><pub-id pub-id-type="doi">10.1126/science.1238802</pub-id></element-citation></ref><ref id="bib57"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>McManus</surname><given-names>CJ</given-names></name><name><surname>May</surname><given-names>GE</given-names></name><name><surname>Spealman</surname><given-names>P</given-names></name><name><surname>Shteyman</surname><given-names>A</given-names></name></person-group><year>2014</year><article-title>Ribosome profiling reveals post-transcriptional buffering of divergent gene expression in yeast</article-title><source>Genome Research</source><volume>24</volume><fpage>422</fpage><lpage>430</lpage><pub-id pub-id-type="doi">10.1101/gr.164996.113</pub-id></element-citation></ref><ref id="bib58"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Michel</surname><given-names>AM</given-names></name><name><surname>Choudhury</surname><given-names>KR</given-names></name><name><surname>Firth</surname><given-names>AE</given-names></name><name><surname>Ingolia</surname><given-names>NT</given-names></name><name><surname>Atkins</surname><given-names>JF</given-names></name><name><surname>Baranov</surname><given-names>PV</given-names></name></person-group><year>2012</year><article-title>Observation of dually decoded regions of the human genome using ribosome profiling data</article-title><source>Genome Research</source><volume>22</volume><fpage>2219</fpage><lpage>2229</lpage><pub-id pub-id-type="doi">10.1101/gr.133249.111</pub-id></element-citation></ref><ref id="bib59"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Nagalakshmi</surname><given-names>U</given-names></name><name><surname>Wang</surname><given-names>Z</given-names></name><name><surname>Waern</surname><given-names>K</given-names></name><name><surname>Shou</surname><given-names>C</given-names></name><name><surname>Raha</surname><given-names>D</given-names></name><name><surname>Gerstein</surname><given-names>M</given-names></name><name><surname>Snyder</surname><given-names>M</given-names></name></person-group><year>2008</year><article-title>The transcriptional landscape of the yeast genome defined by RNA sequencing</article-title><source>Science</source><volume>320</volume><fpage>1344</fpage><lpage>1349</lpage><pub-id pub-id-type="doi">10.1126/science.1158441</pub-id></element-citation></ref><ref id="bib60"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Necsulea</surname><given-names>A</given-names></name><name><surname>Soumillon</surname><given-names>M</given-names></name><name><surname>Warnefors</surname><given-names>M</given-names></name><name><surname>Liechti</surname><given-names>A</given-names></name><name><surname>Daish</surname><given-names>T</given-names></name><name><surname>Zeller</surname><given-names>U</given-names></name><name><surname>Baker</surname><given-names>JC</given-names></name><name><surname>Grützner</surname><given-names>F</given-names></name><name><surname>Kaessmann</surname><given-names>H</given-names></name></person-group><year>2014</year><article-title>The evolution of lncRNA repertoires and expression patterns in tetrapods</article-title><source>Nature</source><volume>505</volume><fpage>635</fpage><lpage>640</lpage><pub-id pub-id-type="doi">10.1038/nature12943</pub-id></element-citation></ref><ref id="bib61"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Nei</surname><given-names>M</given-names></name><name><surname>Gojobori</surname><given-names>T</given-names></name></person-group><year>1986</year><article-title>Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions</article-title><source>Molecular Biology and Evolution</source><volume>3</volume><fpage>418</fpage><lpage>426</lpage></element-citation></ref><ref id="bib62"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Neme</surname><given-names>R</given-names></name><name><surname>Tautz</surname><given-names>D</given-names></name></person-group><year>2013</year><article-title>Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution</article-title><source>BMC Genomics</source><volume>14</volume><fpage>117</fpage><pub-id pub-id-type="doi">10.1186/1471-2164-14-117</pub-id></element-citation></ref><ref id="bib63"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Neme</surname><given-names>R</given-names></name><name><surname>Tautz</surname><given-names>D</given-names></name></person-group><year>2014</year><article-title>Evolution: dynamics of de novo gene emergence</article-title><source>Current Biology</source><volume>24</volume><fpage>R238</fpage><lpage>R240</lpage><pub-id pub-id-type="doi">10.1016/j.cub.2014.02.016</pub-id></element-citation></ref><ref id="bib64"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Okazaki</surname><given-names>Y</given-names></name><name><surname>Furuno</surname><given-names>M</given-names></name><name><surname>Kasukawa</surname><given-names>T</given-names></name><name><surname>Adachi</surname><given-names>J</given-names></name><name><surname>Bono</surname><given-names>H</given-names></name><name><surname>Kondo</surname><given-names>S</given-names></name><name><surname>Nikaido</surname><given-names>I</given-names></name><name><surname>Osato</surname><given-names>N</given-names></name><name><surname>Saito</surname><given-names>R</given-names></name><name><surname>Suzuki</surname><given-names>H</given-names></name><name><surname>Yamanaka</surname><given-names>I</given-names></name><name><surname>Kiyosawa</surname><given-names>H</given-names></name><name><surname>Yagi</surname><given-names>K</given-names></name><name><surname>Tomaru</surname><given-names>Y</given-names></name><name><surname>Hasegawa</surname><given-names>Y</given-names></name><name><surname>Nogami</surname><given-names>A</given-names></name><name><surname>Schönbach</surname><given-names>C</given-names></name><name><surname>Gojobori</surname><given-names>T</given-names></name><name><surname>Baldarelli</surname><given-names>R</given-names></name><name><surname>Hill</surname><given-names>DP</given-names></name><name><surname>Bult</surname><given-names>C</given-names></name><name><surname>Hume</surname><given-names>DA</given-names></name><name><surname>Quackenbush</surname><given-names>J</given-names></name><name><surname>Schriml</surname><given-names>LM</given-names></name><name><surname>Kanapin</surname><given-names>A</given-names></name><name><surname>Matsuda</surname><given-names>H</given-names></name><name><surname>Batalov</surname><given-names>S</given-names></name><name><surname>Beisel</surname><given-names>KW</given-names></name><name><surname>Blake</surname><given-names>JA</given-names></name><name><surname>Bradt</surname><given-nam
es>D</given-names></name><name><surname>Brusic</surname><given-names>V</given-names></name><name><surname>Chothia</surname><given-names>C</given-names></name><name><surname>Corbani</surname><given-names>LE</given-names></name><name><surname>Cousins</surname><given-names>S</given-names></name><name><surname>Dalla</surname><given-names>E</given-names></name><name><surname>Dragani</surname><given-names>TA</given-names></name><name><surname>Fletcher</surname><given-names>CF</given-names></name><name><surname>Forrest</surname><given-names>A</given-names></name><name><surname>Frazer</surname><given-names>KS</given-names></name><name><surname>Gaasterland</surname><given-names>T</given-names></name><name><surname>Gariboldi</surname><given-names>M</given-names></name><name><surname>Gissi</surname><given-names>C</given-names></name><name><surname>Godzik</surname><given-names>A</given-names></name><name><surname>Gough</surname><given-names>J</given-names></name><name><surname>Grimmond</surname><given-names>S</given-names></name><name><surname>Gustincich</surname><given-names>S</given-names></name><name><surname>Hirokawa</surname><given-names>N</given-names></name><name><surname>Jackson</surname><given-names>IJ</given-names></name><name><surname>Jarvis</surname><given-names>ED</given-names></name><name><surname>Kanai</surname><given-names>A</given-names></name><name><surname>Kawaji</surname><given-names>H</given-names></name><name><surname>Kawasawa</surname><given-names>Y</given-names></name><name><surname>Kedzierski</surname><given-names>RM</given-names></name><name><surname>King</surname><given-names>BL</given-names></name><name><surname>Konagaya</surname><given-names>A</given-names></name><name><surname>Kurochkin</surname><given-names>IV</given-names></name><name><surname>Lee</surname><given-names>Y</given-names></name><name><surname>Lenhard</surname><given-names>B</given-names></name><name><surname>Lyons</surname><given-names>PA</given-names></name><name><surname>Maglott</s
urname><given-names>DR</given-names></name><name><surname>Maltais</surname><given-names>L</given-names></name><name><surname>Marchionni</surname><given-names>L</given-names></name><name><surname>McKenzie</surname><given-names>L</given-names></name><name><surname>Miki</surname><given-names>H</given-names></name><name><surname>Nagashima</surname><given-names>T</given-names></name><name><surname>Numata</surname><given-names>K</given-names></name><name><surname>Okido</surname><given-names>T</given-names></name><name><surname>Pavan</surname><given-names>WJ</given-names></name><name><surname>Pertea</surname><given-names>G</given-names></name><name><surname>Pesole</surname><given-names>G</given-names></name><name><surname>Petrovsky</surname><given-names>N</given-names></name><name><surname>Pillai</surname><given-names>R</given-names></name><name><surname>Pontius</surname><given-names>JU</given-names></name><name><surname>Qi</surname><given-names>D</given-names></name><name><surname>Ramachandran</surname><given-names>S</given-names></name><name><surname>Ravasi</surname><given-names>T</given-names></name><name><surname>Reed</surname><given-names>JC</given-names></name><name><surname>Reed</surname><given-names>DJ</given-names></name><name><surname>Reid</surname><given-names>J</given-names></name><name><surname>Ring</surname><given-names>BZ</given-names></name><name><surname>Ringwald</surname><given-names>M</given-names></name><name><surname>Sandelin</surname><given-names>A</given-names></name><name><surname>Schneider</surname><given-names>C</given-names></name><name><surname>Semple</surname><given-names>CA</given-names></name><name><surname>Setou</surname><given-names>M</given-names></name><name><surname>Shimada</surname><given-names>K</given-names></name><name><surname>Sultana</surname><given-names>R</given-names></name><name><surname>Takenaka</surname><given-names>Y</given-names></name><name><surname>Taylor</surname><given-names>MS</given-names></name><name><surname>Teasdal
e</surname><given-names>RD</given-names></name><name><surname>Tomita</surname><given-names>M</given-names></name><name><surname>Verardo</surname><given-names>R</given-names></name><name><surname>Wagner</surname><given-names>L</given-names></name><name><surname>Wahlestedt</surname><given-names>C</given-names></name><name><surname>Wang</surname><given-names>Y</given-names></name><name><surname>Watanabe</surname><given-names>Y</given-names></name><name><surname>Wells</surname><given-names>C</given-names></name><name><surname>Wilming</surname><given-names>LG</given-names></name><name><surname>Wynshaw-Boris</surname><given-names>A</given-names></name><name><surname>Yanagisawa</surname><given-names>M</given-names></name><name><surname>Yang</surname><given-names>I</given-names></name><name><surname>Yang</surname><given-names>L</given-names></name><name><surname>Yuan</surname><given-names>Z</given-names></name><name><surname>Zavolan</surname><given-names>M</given-names></name><name><surname>Zhu</surname><given-names>Y</given-names></name><name><surname>Zimmer</surname><given-names>A</given-names></name><name><surname>Carninci</surname><given-names>P</given-names></name><name><surname>Hayatsu</surname><given-names>N</given-names></name><name><surname>Hirozane-Kishikawa</surname><given-names>T</given-names></name><name><surname>Konno</surname><given-names>H</given-names></name><name><surname>Nakamura</surname><given-names>M</given-names></name><name><surname>Sakazume</surname><given-names>N</given-names></name><name><surname>Sato</surname><given-names>K</given-names></name><name><surname>Shiraki</surname><given-names>T</given-names></name><name><surname>Waki</surname><given-names>K</given-names></name><name><surname>Kawai</surname><given-names>J</given-names></name><name><surname>Aizawa</surname><given-names>K</given-names></name><name><surname>Arakawa</surname><given-names>T</given-names></name><name><surname>Fukuda</surname><given-names>S</given-names></name><name><surname>
Hara</surname><given-names>A</given-names></name><name><surname>Hashizume</surname><given-names>W</given-names></name><name><surname>Imotani</surname><given-names>K</given-names></name><name><surname>Ishii</surname><given-names>Y</given-names></name><name><surname>Itoh</surname><given-names>M</given-names></name><name><surname>Kagawa</surname><given-names>I</given-names></name><name><surname>Miyazaki</surname><given-names>A</given-names></name><name><surname>Sakai</surname><given-names>K</given-names></name><name><surname>Sasaki</surname><given-names>D</given-names></name><name><surname>Shibata</surname><given-names>K</given-names></name><name><surname>Shinagawa</surname><given-names>A</given-names></name><name><surname>Yasunishi</surname><given-names>A</given-names></name><name><surname>Yoshino</surname><given-names>M</given-names></name><name><surname>Waterston</surname><given-names>R</given-names></name><name><surname>Lander</surname><given-names>ES</given-names></name><name><surname>Rogers</surname><given-names>J</given-names></name><name><surname>Birney</surname><given-names>E</given-names></name><name><surname>Hayashizaki</surname><given-names>Y</given-names></name>, <collab>FANTOM Consortium</collab>, <collab>RIKEN Genome Exploration Research Group Phase I &amp; II Team</collab></person-group><year>2002</year><article-title>Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs</article-title><source>Nature</source><volume>420</volume><fpage>563</fpage><lpage>573</lpage><pub-id pub-id-type="doi">10.1038/nature01266</pub-id></element-citation></ref><ref id="bib65"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Ovcharenko</surname><given-names>I</given-names></name><name><surname>Loots</surname><given-names>GG</given-names></name><name><surname>Nobrega</surname><given-names>MA</given-names></name><name><surname>Hardison</surname><given-names>RC</given-names></name><name><surname>Miller</surname><given-names>W</given-names></name><name><surname>Stubbs</surname><given-names>L</given-names></name></person-group><year>2005</year><article-title>Evolution and functional classification of vertebrate gene deserts</article-title><source>Genome Research</source><volume>15</volume><fpage>137</fpage><lpage>145</lpage><pub-id pub-id-type="doi">10.1101/gr.3015505</pub-id></element-citation></ref><ref id="bib66"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Palmieri</surname><given-names>N</given-names></name><name><surname>Kosiol</surname><given-names>C</given-names></name><name><surname>Schlötterer</surname><given-names>C</given-names></name></person-group><year>2014</year><article-title>The life cycle of <italic>Drosophila</italic> orphan genes</article-title><source>eLife</source><volume>3</volume><fpage>e01311</fpage><pub-id pub-id-type="doi">10.7554/eLife.01311</pub-id></element-citation></ref><ref id="bib67"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Pauli</surname><given-names>A</given-names></name><name><surname>Norris</surname><given-names>ML</given-names></name><name><surname>Valen</surname><given-names>E</given-names></name><name><surname>Chew</surname><given-names>G</given-names></name><name><surname>Gagnon</surname><given-names>JA</given-names></name><name><surname>Zimmerman</surname><given-names>S</given-names></name><name><surname>Mitchell</surname><given-names>A</given-names></name><name><surname>Ma</surname><given-names>J</given-names></name><name><surname>Dubrulle</surname><given-names>J</given-names></name><name><surname>Reyon</surname><given-names>D</given-names></name><name><surname>Tsai</surname><given-names>SQ</given-names></name><name><surname>Joung</surname><given-names>JK</given-names></name><name><surname>Saghatelian</surname><given-names>A</given-names></name><name><surname>Schier</surname><given-names>AF</given-names></name></person-group><year>2014</year><article-title>Toddler: an embryonic signal that promotes cell movement via Apelin receptors</article-title><source>Science</source><volume>343</volume><fpage>1248636</fpage><pub-id pub-id-type="doi">10.1126/science.1248636</pub-id></element-citation></ref><ref id="bib68"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Pauli</surname><given-names>A</given-names></name><name><surname>Valen</surname><given-names>E</given-names></name><name><surname>Lin</surname><given-names>MF</given-names></name><name><surname>Garber</surname><given-names>M</given-names></name><name><surname>Vastenhouw</surname><given-names>NL</given-names></name><name><surname>Levin</surname><given-names>JZ</given-names></name><name><surname>Fan</surname><given-names>L</given-names></name><name><surname>Sandelin</surname><given-names>A</given-names></name><name><surname>Rinn</surname><given-names>JL</given-names></name><name><surname>Regev</surname><given-names>A</given-names></name><name><surname>Schier</surname><given-names>AF</given-names></name></person-group><year>2012</year><article-title>Systematic identification of long noncoding RNAs expressed during zebrafish embryogenesis</article-title><source>Genome Research</source><volume>22</volume><fpage>577</fpage><lpage>591</lpage><pub-id pub-id-type="doi">10.1101/gr.133009.111.2011</pub-id></element-citation></ref><ref id="bib69"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ponjavic</surname><given-names>J</given-names></name><name><surname>Ponting</surname><given-names>CP</given-names></name><name><surname>Lunter</surname><given-names>G</given-names></name></person-group><year>2007</year><article-title>Functionality or transcriptional noise? 
Evidence for selection within long noncoding RNAs</article-title><source>Genome Research</source><volume>17</volume><fpage>556</fpage><lpage>565</lpage><pub-id pub-id-type="doi">10.1101/gr.6036807</pub-id></element-citation></ref><ref id="bib70"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ponting</surname><given-names>CP</given-names></name><name><surname>Oliver</surname><given-names>PL</given-names></name><name><surname>Reik</surname><given-names>W</given-names></name></person-group><year>2009</year><article-title>Evolution and functions of long noncoding RNAs</article-title><source>Cell</source><volume>136</volume><fpage>629</fpage><lpage>641</lpage><pub-id pub-id-type="doi">10.1016/j.cell.2009.02.006</pub-id></element-citation></ref><ref id="bib71"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pruitt</surname><given-names>KD</given-names></name><name><surname>Brown</surname><given-names>GR</given-names></name><name><surname>Hiatt</surname><given-names>SM</given-names></name><name><surname>Thibaud-Nissen</surname><given-names>F</given-names></name><name><surname>Astashyn</surname><given-names>A</given-names></name><name><surname>Ermolaeva</surname><given-names>O</given-names></name><name><surname>Farrell</surname><given-names>CM</given-names></name><name><surname>Hart</surname><given-names>J</given-names></name><name><surname>Landrum</surname><given-names>MJ</given-names></name><name><surname>McGarvey</surname><given-names>KM</given-names></name><name><surname>Murphy</surname><given-names>MR</given-names></name><name><surname>O'Leary</surname><given-names>NA</given-names></name><name><surname>Pujar</surname><given-names>S</given-names></name><name><surname>Rajput</surname><given-names>B</given-names></name><name><surname>Rangwala</surname><given-names>SH</given-names></name><name><surname>Riddick</surname><given-names>LD</given-names></name><name><surnam
e>Shkeda</surname><given-names>A</given-names></name><name><surname>Sun</surname><given-names>H</given-names></name><name><surname>Tamez</surname><given-names>P</given-names></name><name><surname>Tully</surname><given-names>RE</given-names></name><name><surname>Wallin</surname><given-names>C</given-names></name><name><surname>Webb</surname><given-names>D</given-names></name><name><surname>Weber</surname><given-names>J</given-names></name><name><surname>Wu</surname><given-names>W</given-names></name><name><surname>DiCuccio</surname><given-names>M</given-names></name><name><surname>Kitts</surname><given-names>P</given-names></name><name><surname>Maglott</surname><given-names>DR</given-names></name><name><surname>Murphy</surname><given-names>TD</given-names></name><name><surname>Ostell</surname><given-names>JM</given-names></name></person-group><year>2014</year><article-title>RefSeq: an update on mammalian reference sequences</article-title><source>Nucleic Acids Research</source><volume>42</volume><fpage>D756</fpage><lpage>D763</lpage><pub-id pub-id-type="doi">10.1093/nar/gkt1114</pub-id></element-citation></ref><ref id="bib72"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Quinlan</surname><given-names>AR</given-names></name><name><surname>Hall</surname><given-names>IM</given-names></name></person-group><year>2010</year><article-title>BEDTools: a flexible suite of utilities for comparing genomic features</article-title><source>Bioinformatics</source><volume>26</volume><fpage>841</fpage><lpage>842</lpage><pub-id pub-id-type="doi">10.1093/bioinformatics/btq033</pub-id></element-citation></ref><ref id="bib73"><element-citation publication-type="book"><person-group person-group-type="author"><collab>R Development Core Team</collab></person-group><year>2010</year><article-title>R: a language and environment for statistical computing</article-title><source>R Foundation for statistical 
computing</source><publisher-loc>Vienna Austria</publisher-loc><publisher-name>R Foundation for statistical computing</publisher-name></element-citation></ref><ref id="bib74"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Reinhardt</surname><given-names>JA</given-names></name><name><surname>Wanjiru</surname><given-names>BM</given-names></name><name><surname>Brant</surname><given-names>AT</given-names></name><name><surname>Saelao</surname><given-names>P</given-names></name><name><surname>Begun</surname><given-names>DJ</given-names></name><name><surname>Jones</surname><given-names>CD</given-names></name></person-group><year>2013</year><article-title>De novo ORFs in <italic>Drosophila</italic> are important to organismal fitness and evolved rapidly from previously non-coding sequences</article-title><source>PLOS Genetics</source><volume>9</volume><fpage>e1003860</fpage><pub-id pub-id-type="doi">10.1371/journal.pgen.1003860</pub-id></element-citation></ref><ref id="bib75"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Savard</surname><given-names>J</given-names></name><name><surname>Marques-Souza</surname><given-names>H</given-names></name><name><surname>Aranda</surname><given-names>M</given-names></name><name><surname>Tautz</surname><given-names>D</given-names></name></person-group><year>2006</year><article-title>A segmentation gene in tribolium produces a polycistronic mRNA that codes for multiple conserved peptides</article-title><source>Cell</source><volume>126</volume><fpage>559</fpage><lpage>569</lpage><pub-id pub-id-type="doi">10.1016/j.cell.2006.05.053</pub-id></element-citation></ref><ref id="bib76"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Scofield</surname><given-names>DG</given-names></name><name><surname>Hong</surname><given-names>X</given-names></name><name><surname>Lynch</surname><given-names>M</given-names></name></person-group><year>2007</year><article-title>Position of the final intron in full-length transcripts: determined by NMD?</article-title><source>Molecular Biology and Evolution</source><volume>24</volume><fpage>896</fpage><lpage>899</lpage><pub-id pub-id-type="doi">10.1093/molbev/msm010</pub-id></element-citation></ref><ref id="bib77"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sherry</surname><given-names>ST</given-names></name><name><surname>Ward</surname><given-names>MH</given-names></name><name><surname>Kholodov</surname><given-names>M</given-names></name><name><surname>Baker</surname><given-names>J</given-names></name><name><surname>Phan</surname><given-names>L</given-names></name><name><surname>Smigielski</surname><given-names>EM</given-names></name><name><surname>Sirotkin</surname><given-names>K</given-names></name></person-group><year>2001</year><article-title>dbSNP: the NCBI database of genetic variation</article-title><source>Nucleic Acids Research</source><volume>29</volume><fpage>308</fpage><lpage>311</lpage><pub-id pub-id-type="doi">10.1093/nar/29.1.308</pub-id></element-citation></ref><ref id="bib78"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Slavoff</surname><given-names>SA</given-names></name><name><surname>Heo</surname><given-names>J</given-names></name><name><surname>Budnik</surname><given-names>BA</given-names></name><name><surname>Hanakahi</surname><given-names>LA</given-names></name><name><surname>Saghatelian</surname><given-names>A</given-names></name></person-group><year>2014</year><article-title>A human short open reading frame (sORF)-encoded polypeptide that stimulates DNA end 
joining</article-title><source>The Journal of Biological Chemistry</source><volume>289</volume><fpage>10950</fpage><lpage>10957</lpage><pub-id pub-id-type="doi">10.1074/jbc.C113.533968</pub-id></element-citation></ref><ref id="bib79"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Slavoff</surname><given-names>SA</given-names></name><name><surname>Mitchell</surname><given-names>AJ</given-names></name><name><surname>Schwaid</surname><given-names>AG</given-names></name><name><surname>Cabili</surname><given-names>MN</given-names></name><name><surname>Ma</surname><given-names>J</given-names></name><name><surname>Levin</surname><given-names>JZ</given-names></name><name><surname>Karger</surname><given-names>AD</given-names></name><name><surname>Budnik</surname><given-names>BA</given-names></name><name><surname>Rinn</surname><given-names>JL</given-names></name><name><surname>Saghatelian</surname><given-names>A</given-names></name></person-group><year>2013</year><article-title>Peptidomic discovery of short open reading frame-encoded peptides in human cells</article-title><source>Nature Chemical Biology</source><volume>9</volume><fpage>59</fpage><lpage>64</lpage><pub-id pub-id-type="doi">10.1038/nchembio.1120</pub-id></element-citation></ref><ref id="bib80"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Smeds</surname><given-names>L</given-names></name><name><surname>Künstner</surname><given-names>A</given-names></name></person-group><year>2011</year><article-title>ConDeTri - a content dependent read trimmer for Illumina data</article-title><source>PLOS ONE</source><volume>6</volume><fpage>e26314</fpage><pub-id pub-id-type="doi">10.1371/journal.pone.0026314</pub-id></element-citation></ref><ref id="bib81"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Sun</surname><given-names>L</given-names></name><name><surname>Luo</surname><given-names>H</given-names></name><name><surname>Bu</surname><given-names>D</given-names></name><name><surname>Zhao</surname><given-names>G</given-names></name><name><surname>Yu</surname><given-names>K</given-names></name><name><surname>Zhang</surname><given-names>C</given-names></name><name><surname>Liu</surname><given-names>Y</given-names></name><name><surname>Chen</surname><given-names>R</given-names></name><name><surname>Zhao</surname><given-names>Y</given-names></name></person-group><year>2013</year><article-title>Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts</article-title><source>Nucleic Acids Research</source><volume>41</volume><fpage>e166</fpage><pub-id pub-id-type="doi">10.1093/nar/gkt646</pub-id></element-citation></ref><ref id="bib82"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tani</surname><given-names>H</given-names></name><name><surname>Torimura</surname><given-names>M</given-names></name><name><surname>Akimitsu</surname><given-names>N</given-names></name></person-group><year>2013</year><article-title>The RNA degradation pathway regulates the function of GAS5 a non-coding RNA in mammalian cells</article-title><source>PLOS ONE</source><volume>8</volume><fpage>e55684</fpage><pub-id pub-id-type="doi">10.1371/journal.pone.0055684</pub-id></element-citation></ref><ref id="bib83"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tautz</surname><given-names>D</given-names></name></person-group><year>2009</year><article-title>Polycistronic peptide coding genes in eukaryotes–how widespread are they?</article-title><source>Briefings in Functional Genomics &amp; Proteomics</source><volume>8</volume><fpage>68</fpage><lpage>74</lpage><pub-id 
pub-id-type="doi">10.1093/bfgp/eln054</pub-id></element-citation></ref><ref id="bib84"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tautz</surname><given-names>D</given-names></name><name><surname>Domazet-Lošo</surname><given-names>T</given-names></name></person-group><year>2011</year><article-title>The evolutionary origin of orphan genes</article-title><source>Nature Reviews Genetics</source><volume>12</volume><fpage>692</fpage><lpage>702</lpage><pub-id pub-id-type="doi">10.1038/nrg3053</pub-id></element-citation></ref><ref id="bib85"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Toll-Riera</surname><given-names>M</given-names></name><name><surname>Bosch</surname><given-names>N</given-names></name><name><surname>Bellora</surname><given-names>N</given-names></name><name><surname>Castelo</surname><given-names>R</given-names></name><name><surname>Armengol</surname><given-names>L</given-names></name><name><surname>Estivill</surname><given-names>X</given-names></name><name><surname>Albà</surname><given-names>MM</given-names></name></person-group><year>2009</year><article-title>Origin of primate orphan genes: a comparative genomics approach</article-title><source>Molecular Biology and Evolution</source><volume>26</volume><fpage>603</fpage><lpage>612</lpage><pub-id pub-id-type="doi">10.1093/molbev/msn281</pub-id></element-citation></ref><ref id="bib86"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Trapnell</surname><given-names>C</given-names></name><name><surname>Williams</surname><given-names>BA</given-names></name><name><surname>Pertea</surname><given-names>G</given-names></name><name><surname>Mortazavi</surname><given-names>A</given-names></name><name><surname>Kwan</surname><given-names>G</given-names></name><name><surname>van 
Baren</surname><given-names>MJ</given-names></name><name><surname>Salzberg</surname><given-names>SL</given-names></name><name><surname>Wold</surname><given-names>BJ</given-names></name><name><surname>Pachter</surname><given-names>L</given-names></name></person-group><year>2010</year><article-title>Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation</article-title><source>Nature Biotechnology</source><volume>28</volume><fpage>511</fpage><lpage>515</lpage><pub-id pub-id-type="doi">10.1038/nbt.1621</pub-id></element-citation></ref><ref id="bib87"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ulitsky</surname><given-names>I</given-names></name><name><surname>Bartel</surname><given-names>DP</given-names></name></person-group><year>2013</year><article-title>lincRNAs: genomics, evolution, and mechanisms</article-title><source>Cell</source><volume>154</volume><fpage>26</fpage><lpage>46</lpage><pub-id pub-id-type="doi">10.1016/j.cell.2013.06.020</pub-id></element-citation></ref><ref id="bib88"><element-citation publication-type="journal"><person-group person-group-type="author"><collab>UniProt Consortium</collab></person-group><year>2014</year><article-title>Activities at the Universal Protein Resource (UniProt)</article-title><source>Nucleic Acids Research</source><volume>42</volume><fpage>D191</fpage><lpage>D198</lpage><pub-id pub-id-type="doi">10.1093/nar/gkt1140</pub-id></element-citation></ref><ref id="bib89"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>van Heesch</surname><given-names>S</given-names></name><name><surname>van 
Iterson</surname><given-names>M</given-names></name><name><surname>Jacobi</surname><given-names>J</given-names></name><name><surname>Boymans</surname><given-names>S</given-names></name><name><surname>Essers</surname><given-names>PB</given-names></name><name><surname>de Bruijn</surname><given-names>E</given-names></name><name><surname>Hao</surname><given-names>W</given-names></name><name><surname>Macinnes</surname><given-names>AW</given-names></name><name><surname>Cuppen</surname><given-names>E</given-names></name><name><surname>Simonis</surname><given-names>M</given-names></name></person-group><year>2014</year><article-title>Extensive localization of long noncoding RNAs to the cytosol and mono- and polyribosomal complexes</article-title><source>Genome Biology</source><volume>15</volume><fpage>R6</fpage><pub-id pub-id-type="doi">10.1186/gb-2014-15-1-r6</pub-id></element-citation></ref><ref id="bib90"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Vanderperre</surname><given-names>B</given-names></name><name><surname>Lucier</surname><given-names>JF</given-names></name><name><surname>Bissonnette</surname><given-names>C</given-names></name><name><surname>Motard</surname><given-names>J</given-names></name><name><surname>Tremblay</surname><given-names>G</given-names></name><name><surname>Vanderperre</surname><given-names>S</given-names></name><name><surname>Wisztorski</surname><given-names>M</given-names></name><name><surname>Salzet</surname><given-names>M</given-names></name><name><surname>Boisvert</surname><given-names>FM</given-names></name><name><surname>Roucou</surname><given-names>X</given-names></name></person-group><year>2013</year><article-title>Direct detection of alternative open reading frames translation products in human significantly expands the proteome</article-title><source>PLOS ONE</source><volume>8</volume><fpage>e70698</fpage><pub-id 
pub-id-type="doi">10.1371/journal.pone.0070698</pub-id></element-citation></ref><ref id="bib91"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Vasquez</surname><given-names>JJ</given-names></name><name><surname>Hon</surname><given-names>CC</given-names></name><name><surname>Vanselow</surname><given-names>JT</given-names></name><name><surname>Schlosser</surname><given-names>A</given-names></name><name><surname>Siegel</surname><given-names>TN</given-names></name></person-group><year>2014</year><article-title>Comparative ribosome profiling reveals extensive translational complexity in different <italic>Trypanosoma brucei</italic> life cycle stages</article-title><source>Nucleic Acids Research</source><volume>42</volume><fpage>3623</fpage><lpage>3637</lpage><pub-id pub-id-type="doi">10.1093/nar/gkt1386</pub-id></element-citation></ref><ref id="bib92"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname><given-names>L</given-names></name><name><surname>Park</surname><given-names>HJ</given-names></name><name><surname>Dasari</surname><given-names>S</given-names></name><name><surname>Wang</surname><given-names>S</given-names></name><name><surname>Kocher</surname><given-names>JP</given-names></name><name><surname>Li</surname><given-names>W</given-names></name></person-group><year>2013</year><article-title>CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model</article-title><source>Nucleic Acids Research</source><volume>41</volume><fpage>e74</fpage><pub-id pub-id-type="doi">10.1093/nar/gkt006</pub-id></element-citation></ref><ref id="bib93"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wilson</surname><given-names>BA</given-names></name><name><surname>Masel</surname><given-names>J</given-names></name></person-group><year>2011</year><article-title>Putatively noncoding 
transcripts show extensive association with ribosomes</article-title><source>Genome Biology and Evolution</source><volume>3</volume><fpage>1245</fpage><lpage>1252</lpage><pub-id pub-id-type="doi">10.1093/gbe/evr099</pub-id></element-citation></ref><ref id="bib94"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wissler</surname><given-names>L</given-names></name><name><surname>Gadau</surname><given-names>J</given-names></name><name><surname>Simola</surname><given-names>DF</given-names></name><name><surname>Helmkampf</surname><given-names>M</given-names></name><name><surname>Bornberg-Bauer</surname><given-names>E</given-names></name></person-group><year>2013</year><article-title>Mechanisms and dynamics of orphan gene emergence in insect genomes</article-title><source>Genome Biology and Evolution</source><volume>5</volume><fpage>439</fpage><lpage>455</lpage><pub-id pub-id-type="doi">10.1093/gbe/evt009</pub-id></element-citation></ref><ref id="bib95"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Xie</surname><given-names>C</given-names></name><name><surname>Zhang</surname><given-names>YE</given-names></name><name><surname>Chen</surname><given-names>JY</given-names></name><name><surname>Liu</surname><given-names>CJ</given-names></name><name><surname>Zhou</surname><given-names>WZ</given-names></name><name><surname>Li</surname><given-names>Y</given-names></name><name><surname>Zhang</surname><given-names>M</given-names></name><name><surname>Zhang</surname><given-names>R</given-names></name><name><surname>Wei</surname><given-names>L</given-names></name><name><surname>Li</surname><given-names>CY</given-names></name></person-group><year>2012</year><article-title>Hominoid-specific de novo protein-coding genes originating from long non-coding RNAs</article-title><source>PLOS Genetics</source><volume>8</volume><fpage>e1002942</fpage><pub-id 
pub-id-type="doi">10.1371/journal.pgen.1002942</pub-id></element-citation></ref></ref-list></back><sub-article article-type="article-commentary" id="SA1"><front-stub><article-id pub-id-type="doi">10.7554/eLife.03523.026</article-id><title-group><article-title>Decision letter</article-title></title-group><contrib-group content-type="section"><contrib contrib-type="editor"><name><surname>Tautz</surname><given-names>Diethard</given-names></name><role>Reviewing editor</role><aff><institution>Max Planck Institute for Evolutionary Biology</institution>, <country>Germany</country></aff></contrib></contrib-group></front-stub><body><boxed-text><p>eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see <ext-link ext-link-type="uri" xlink:href="http://elifesciences.org/review-process">review process</ext-link>). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.</p></boxed-text><p>Thank you for sending your work entitled “Long non-coding RNAs as a source of new peptides” for consideration at <italic>eLife</italic>. Your article has been favorably evaluated by Aviv Regev (Senior editor) and 3 reviewers, one of whom is a member of our Board of Reviewing Editors.</p><p>The Reviewing editor and the other reviewers discussed their comments extensively before we reached this decision, and the Reviewing editor has assembled the following comments to help you prepare a revised submission.</p><p>This paper adds to the current active discussion on the coding potential of lncRNAs, the role of short open reading frames and the emergence of new genes. 
The authors use published ribosome association datasets, but use several analysis pipelines that go beyond the analysis that has previously been done with these data. However, there are two comparable published papers that do similar analysis, namely <xref ref-type="bibr" rid="bib37">Ingolia et al. 2011</xref> and Guttman et al. 2013. While the former had suggested much translation of lncRNAs, the latter denies this, although there is some overlap of authors.</p><p>Major comments that need to be addressed by additional analyses and/or clarification:</p><p>The crucial point is in how far ribosome associations are partly artifacts. The fraction of lncRNAs that the authors find to be associated with ribosomes is very large. Is this because the vast majority of transcripts actually are scanned by ribosomes, or could this observation be an artifact of the way the ribosome profiling data was analyzed? Pseudo-genes, and bona-fide human lncRNAs with known non-coding functions, were investigated, but the authors found evidence of ribosome binding in these putative negative controls, i.e. possible evidence for artifacts. This issue needs to be resolved more clearly, since the current paper should go beyond the Guttman et al. 2013 line of arguments. It is necessary to provide a convincing demonstration that the analysis of ribosome profiling data is based on signal, not on noise. This could be done by different means, for instance by deriving null models describing what fraction of transcripts would be expected to be found associated with ribosomes if all of the ribosome profiling data was random, or by calculating otherwise a False Positive Rate or False Discovery Rate in the calling of “ribosome association” per transcript. You can also try something like the Bazzini 2014 or the Carvunis 2012 method. Another possibility is to choose a class of sequences with very low ribosomal association (maybe 3'UTRs are best) and use that as an upper bound on the false positive rate. 
The lower bound on the false positive rate is zero, and likely to remain there, but calculating an upper bound is something that should be added.</p><p>The claim is also made that these short and hard-to-annotate protein-coding genes look young according to protein-coding metrics and PN/PS. While plausible, it is also possible that they represent a mixture of genes of all ages combined with sequences that, while perhaps translated at some level, are not really genes in the functional sense of the word (at least not yet), and whose existence is therefore highly transient in evolutionary time. Contamination with these sequences could create the same statistical effect as having young genes. The presence of such contamination is also a critical piece of evidence in theories of how de novo protein birth occurs. This basically means that there are two interpretations of the data, both interesting, and not mutually exclusive. This needs to be better clarified. For instance the results of the BlastP search against codRNAs (supplementary file 8) and the results of the BlastX search against nr could be merged into one table or bar graph counting the number of BlastP and BlastX hits in lncRNA-noribo, lncRNA-ribo, and codRNA, separately, for each species.</p><p>It is unclear why the starved conditions (<xref ref-type="table" rid="tbl1">Table 1</xref>) were used in the yeast riboprofiling data. Starvation represses translation and therefore makes the data unreliable as a marker of translation. This should therefore be redone, perhaps with the rich media conditions of <xref ref-type="bibr" rid="bib36">Ingolia et al. 
2009</xref>, but if this needs to be redone anyway, ideally with the much higher coverage data of Artieri &amp; Fraser.</p></body></sub-article><sub-article article-type="reply" id="SA2"><front-stub><article-id pub-id-type="doi">10.7554/eLife.03523.027</article-id><title-group><article-title>Author response</article-title></title-group></front-stub><body><p>Following the editor’s recommendation we have constructed a null model for random ribosome binding based on the signal in annotated 3’UTRs. The null model can be rejected for about 90% of the lncRNAs, and a similar percentage of codRNAs, with p-value &lt; 0.05, confirming that the signal in lncRNAs is not random. We have also reanalysed the yeast transcriptome using data from a recently published study (McManus et al., 2014). Although the main findings are similar to those reported using the original dataset, the ribosome profiling sequencing read coverage is higher and the yeast growth conditions standard, making the results more representative. We have performed homology searches with coding RNAs and lncRNAs not associated with ribosomes (in addition to lncRNAs associated with ribosomes as done previously). The results clearly show that lncRNAs display limited phylogenetic conservation when compared to coding RNAs.</p><p>We have also deposited the genomic coordinates of all transcripts used in this study and the amino acid sequences corresponding to primary ORFs in lncRNA with significant coding scores in figshare (<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.6084/m9.figshare.1114969">http://dx.doi.org/10.6084/m9.figshare.1114969</ext-link>).</p><p><italic>The crucial point is in how far ribosome associations are partly artifacts. The fraction of lncRNAs that the authors find to be associated with ribosomes is very large. 
Is this because the vast majority of transcripts actually are scanned by ribosomes, or could this observation be an artifact of the way the ribosome profiling data was analyzed? Pseudo-genes, and bona-fide human lncRNAs with known non-coding functions, were investigated, but the authors found evidence of ribosome binding in these putative negative controls, i.e. possible evidence for artifacts. This issue needs to be resolved more clearly, since the current paper should go beyond the Guttman et al. 2013 line of arguments. It is necessary to provide a convincing demonstration that the analysis of ribosome profiling data is based on signal, not on noise. This could be done by different means, for instance by deriving null models describing what fraction of transcripts would be expected to be found associated with ribosomes if all of the ribosome profiling data was random, or by calculating otherwise a False Positive Rate or False Discovery Rate in the calling of “ribosome association” per transcript. You can also try something like the Bazzini 2014 or the Carvunis 2012 method. Another possibility is to choose a class of sequences with very low ribosomal association (maybe 3'UTRs are best) and use that as an upper bound on the false positive rate. The lower bound on the false positive rate is zero, and likely to remain there, but calculating an upper bound is something that should be added</italic>.</p><p>We have chosen as a null model annotated 3’UTRs from coding transcripts. The results provide strong evidence that the observed ribosome association in lncRNAs is not random and is similar to that in codRNAs. See below the paragraph added to the manuscript text:</p><p>“In order to determine if the ribosome profiling signal in lncRNAs was different from noise, we compared ribosome density in the transcripts to that in 3’ untranslated regions (3’UTRs). 
More specifically, the null model consisted of a size-matched set of sequences containing randomly selected 3’UTRs from annotated coding transcripts. Ribosome density was calculated as the number of ribosome profiling reads divided by RNA-seq reads, a ratio defined as Translational Efficiency (TE) (<xref ref-type="bibr" rid="bib37">Ingolia, Lareau, and Weissman 2011</xref>). Both codRNAs and lncRNAs displayed much higher TE values than 3’UTRs in all species studied (Wilcoxon test p-value &lt; 10<sup>-5</sup>, <xref ref-type="fig" rid="fig3">Figure 3</xref>). We could reject the null model for 90.12% of the lncRNAs and 87.19% of the codRNAs associated with ribosomes (p-value &lt; 0.05) (see details by species in <xref ref-type="table" rid="tbl2">Table 2</xref>, Stringent set). Therefore, we concluded that the density of ribosomes in lncRNAs is much higher than expected by spurious ribosome binding.”</p><p><italic>The claim is also made that these short and hard-to-annotate protein-coding genes look young according to protein-coding metrics and PN/PS. While plausible, it is also possible that they represent a mixture of genes of all ages combined with sequences that, while perhaps translated at some level, are not really genes in the functional sense of the word (at least not yet), and whose existence is therefore highly transient in evolutionary time. Contamination with these sequences could create the same statistical effect as having young genes. The presence of such contamination is also a critical piece of evidence in theories of how</italic> de novo <italic>protein birth occurs. This basically means that there are two interpretations of the data, both interesting, and not mutually exclusive. This needs to be better clarified. 
For instance the results of the BlastP search against codRNAs (supplementary file 8) and the results of the BlastX search against nr could be merged into one table or bar graph counting the number of BlastP and BlastX hits in lncRNA-noribo, lncRNA-ribo, and codRNA, separately, for each species</italic>.</p><p>Previous studies have found that lncRNAs tend to be poorly conserved across species (Guttman et al., Nature 2009; Marques and Ponting, Genome Biol. 2009; Cabili, Genes Dev. 2011). This question has been thoroughly examined in a recent paper that has dated the age of human lncRNAs using de novo assembled transcriptomes from 11 other vertebrate species (Necsulea et al., Nature 2014). The authors have reported that 81% of the human lncRNAs are not conserved beyond primates and can thus be considered “young”.</p><p>In order to further confirm this trend we have extended our initial sequence homology searches to all annotated coding transcripts in the six species studied and have compared the results obtained for putatively translated ORFs in lncRNAs to those in codRNAs. The results support the widely held idea that most lncRNAs are young. For example, whereas we can find protein homologues for only about 13-15% of the human and mouse lncRNAs associated with ribosomes, this value is &gt; 95% for codRNAs. Details of these searches are shown in Supplementary file 1D and Supplementary file 2B.</p><p>If we discard the lncRNAs with homologues in the other species, the percentage of lncRNAs associated with ribosomes continues to be very high (mouse 80.4% with respect to 81.9%, human 40.3% with respect to 43.1%) and the coding scores of the putatively translated ORFs remain significantly higher than those of random ORFs (new <xref ref-type="fig" rid="fig6s3">Figure 6–figure supplement 3</xref>). Therefore our observations are essentially unaltered after filtering out the oldest lncRNAs.</p><p>The idea that some of these lncRNAs are evolutionarily transient looks plausible to us. 
It has been shown that the rate of loss of young genes in the Drosophila obscura group is higher than that of older genes, explaining why the number of genes remains approximately constant despite a high rate of de novo gene emergence (Palmieri and Schlotterer, 2014 <italic>eLife</italic>). Similarly, we can speculate that lncRNAs have a high probability of being lost during evolution.</p><p><italic>It is unclear why the starved conditions (</italic><xref ref-type="table" rid="tbl1"><italic>Table 1</italic></xref><italic>) were used in the yeast riboprofiling data. Starvation represses translation and therefore makes the data unreliable as a marker of translation. This should therefore be redone, perhaps with the rich media conditions of</italic> <xref ref-type="bibr" rid="bib36"><italic>Ingolia et al. 2009</italic></xref><italic>, but if this needs to be redone anyway, ideally with the much higher coverage data of Artieri &amp; Fraser</italic>.</p><p>The available ribosome profiling data from <xref ref-type="bibr" rid="bib4">Artieri and Fraser (2014)</xref> was for <italic>Saccharomyces</italic> hybrids. In order to use the same species as in the original study we downloaded the <italic>Saccharomyces cerevisiae</italic> data from a related paper, McManus et al. (2014). Although we obtained a lower number of lncRNAs than when using the dataset from <xref ref-type="bibr" rid="bib36">Ingolia et al. (2009)</xref>, the reconstructed lncRNAs were longer and thus probably more reliable. The conclusions drawn are similar to those already reported using the previous dataset.</p></body></sub-article></article>

	<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.1d3 20150301//EN" "JATS-archivearticle1.dtd"><article article-type="research-article" dtd-version="1.1d3" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><front><journal-meta><journal-id journal-id-type="nlm-ta">elife</journal-id><journal-id journal-id-type="hwp">eLife</journal-id><journal-id journal-id-type="publisher-id">eLife</journal-id><journal-title-group><journal-title>eLife</journal-title></journal-title-group><issn publication-format="electronic">2050-084X</issn><publisher><publisher-name>eLife Sciences Publications, Ltd</publisher-name></publisher></journal-meta><article-meta><article-id pub-id-type="publisher-id">03523</article-id><article-id pub-id-type="doi">10.7554/eLife.03523</article-id><article-categories><subj-group subj-group-type="display-channel"><subject>Research Article</subject></subj-group><subj-group subj-group-type="heading"><subject>Evolutionary Biology</subject></subj-group></article-categories><title-group><article-title>Long non-coding RNAs as a source of new peptides</article-title></title-group><contrib-group><contrib contrib-type="author" id="author-14738"><name><surname>Ruiz-Orera</surname><given-names>Jorge</given-names></name><xref ref-type="aff" rid="aff1"/><xref ref-type="fn" rid="con1"/><xref ref-type="fn" rid="conf1"/></contrib><contrib contrib-type="author" id="author-14739"><name><surname>Messeguer</surname><given-names>Xavier</given-names></name><xref ref-type="aff" rid="aff2"/><xref ref-type="other" rid="par-2"/><xref ref-type="fn" rid="con2"/><xref ref-type="fn" rid="conf1"/></contrib><contrib contrib-type="author" id="author-14740"><name><surname>Subirana</surname><given-names>Juan Antonio</given-names></name><xref ref-type="aff" rid="aff1"/><xref ref-type="aff" rid="aff3"/><xref ref-type="fn" rid="con3"/><xref ref-type="fn" 
rid="conf1"/></contrib><contrib contrib-type="author" corresp="yes" id="author-14450"><name><surname>Alba</surname><given-names>M Mar</given-names></name><xref ref-type="aff" rid="aff1"/><xref ref-type="aff" rid="aff4"/><xref ref-type="corresp" rid="cor1"></xref><xref ref-type="other" rid="par-1"/><xref ref-type="fn" rid="con4"/><xref ref-type="fn" rid="conf1"/></contrib><aff id="aff1"><institution content-type="dept">Evolutionary Genomics Group, Research Programme on Biomedical Informatics</institution>, <institution>Hospital del Mar Research Institute, Universitat Pompeu Fabra</institution>, <addr-line><named-content content-type="city">Barcelona</named-content></addr-line>, <country>Spain</country></aff><aff id="aff2"><institution content-type="dept">Llenguatges i Sistemes Informàtics</institution>, <institution>Universitat Politècnica de Catalunya</institution>, <addr-line><named-content content-type="city">Barcelona</named-content></addr-line>, <country>Spain</country></aff><aff id="aff3"><institution>Real Academia de Ciències i Arts de Barcelona</institution>, <addr-line><named-content content-type="city">Barcelona</named-content></addr-line>, <country>Spain</country></aff><aff id="aff4"><institution>Catalan Institution for Research and Advanced Studies</institution>, <addr-line><named-content content-type="city">Barcelona</named-content></addr-line>, <country>Spain</country></aff></contrib-group><contrib-group content-type="section"><contrib contrib-type="editor"><name><surname>Tautz</surname><given-names>Diethard</given-names></name><role>Reviewing editor</role><aff><institution>Max Planck Institute for Evolutionary Biology</institution>, <country>Germany</country></aff></contrib></contrib-group><author-notes><corresp id="cor1"><label></label>For correspondence: <email>malba@imim.es</email></corresp></author-notes><pub-date date-type="pub" publication-format="electronic"><day>16</day><month>09</month><year>2014</year></pub-date><pub-date 
pub-type="collection"><year>2014</year></pub-date><volume>3</volume><elocation-id>e03523</elocation-id><history><date date-type="received"><day>30</day><month>05</month><year>2014</year></date><date date-type="accepted"><day>11</day><month>08</month><year>2014</year></date></history><permissions><copyright-statement>© 2014, Ruiz-Orera et al</copyright-statement><copyright-year>2014</copyright-year><copyright-holder>Ruiz-Orera et al</copyright-holder><license xlink:href="http://creativecommons.org/licenses/by/4.0/"><license-p>This article is distributed under the terms of the <ext-link ext-link-type="uri" xlink:href="http://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution License</ext-link>, which permits unrestricted use and redistribution provided that the original author and source are credited.</license-p></license></permissions><self-uri content-type="pdf" xlink:href="elife-03523-v1.pdf"/><abstract><object-id pub-id-type="doi">10.7554/eLife.03523.001</object-id><p>Deep transcriptome sequencing has revealed the existence of many transcripts that lack long or conserved open reading frames (ORFs) and which have been termed long non-coding RNAs (lncRNAs). The vast majority of lncRNAs are lineage-specific and do not yet have a known function. In this study, we test the hypothesis that they may act as a repository for the synthesis of new peptides. We find that a large fraction of the lncRNAs expressed in cells from six different species is associated with ribosomes. The patterns of ribosome protection are consistent with the translation of short peptides. 
lncRNAs show similar coding potential and sequence constraints than evolutionary young protein coding sequences, indicating that they play an important role in de novo protein evolution.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.001">http://dx.doi.org/10.7554/eLife.03523.001</ext-link></p></abstract><abstract abstract-type="executive-summary"><object-id pub-id-type="doi">10.7554/eLife.03523.002</object-id><title>eLife digest</title><p>Despite the terms being largely interchangeable in modern language, ‘DNA’ and ‘gene’ do not mean the same thing. A gene is made of DNA and contains the instructions to make a protein, and it is the protein that performs the function of the gene. However, cells in the body also contain DNA that does not form genes. Far from being ‘junk’ DNA with no biological purpose; this DNA has a variety of roles, including affecting how other genes are used.</p><p>To produce a protein, the DNA sequence of a gene is transcribed into an intermediate molecule called RNA, which is then translated to produce a protein. So-called long non-coding RNA (lncRNA) molecules are also transcribed from DNA, but whether these are translated to make proteins has been a subject of much debate. Indeed, the function of the vast majority of lncRNA molecules is unknown.</p><p>Ruiz-Orera et al. analyzed RNA sequences collected from earlier experiments on six different species—humans, mice, fish, flies, yeast, and a plant—and found nearly 2500 as yet unstudied lncRNAs in addition to those previously identified. Many of the lncRNAs that Ruiz-Orera et al. investigated could be found lodged inside the cellular machinery used to translate RNA into proteins. Furthermore, these lncRNA molecules are oriented in the machinery as if they are primed and ready for translation, suggesting that many lncRNAs do produce proteins. 
However, it is unclear how many of these proteins have a useful function.</p><p>Very few lncRNAs were found in more than one species, suggesting that they have evolved recently. The properties of lncRNA molecules also show many similarities with the properties of ‘young’—recently evolved—genes that are known to produce proteins. The combined findings of Ruiz-Orera et al. therefore suggest that lncRNAs are important for developing new proteins. The emergence of proteins with new functions has been an important driving force in evolution, and this work provides important clues into the first steps of this process.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.002">http://dx.doi.org/10.7554/eLife.03523.002</ext-link></p></abstract><kwd-group kwd-group-type="author-keywords"><title>Author keywords</title><kwd>lncRNA</kwd><kwd>ribosome profiling</kwd><kwd>eukaryote</kwd><kwd>de novo gene evolution</kwd></kwd-group><kwd-group kwd-group-type="research-organism"><title>Research organism</title><kwd>Human</kwd></kwd-group><funding-group><award-group id="par-1"><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100003329</institution-id><institution>Ministerio de Economía y Competitividad</institution></institution-wrap></funding-source><award-id>BFU2012-36820</award-id><principal-award-recipient><name><surname>Alba</surname><given-names>M Mar</given-names></name></principal-award-recipient></award-group><award-group id="par-2"><funding-source><institution-wrap><institution-id institution-id-type="FundRef">http://dx.doi.org/10.13039/501100003329</institution-id><institution>Ministerio de Economía y Competitividad</institution></institution-wrap></funding-source><award-id>TIN2013-45732-C4-3-P</award-id><principal-award-recipient><name><surname>Messeguer</surname><given-names>Xavier</given-names></name></principal-award-recipient></award-group><funding-statement>The funders had 
no role in study design, data collection and interpretation, or the decision to submit the work for publication.</funding-statement></funding-group><custom-meta-group><custom-meta><meta-name>elife-xml-version</meta-name><meta-value>2</meta-value></custom-meta><custom-meta specific-use="meta-only"><meta-name>Author impact statement</meta-name><meta-value>Ribosome profiling data from several eukaryotic species provides strong evidence that many long non-coding RNA molecules encode novel short proteins.</meta-value></custom-meta></custom-meta-group></article-meta></front><body><sec id="s1" sec-type="intro"><title>Introduction</title><p>Studies performed over the past decade have unveiled a richer and more complex transcriptome than was previously appreciated (<xref ref-type="bibr" rid="bib64">Okazaki et al., 2002</xref>; <xref ref-type="bibr" rid="bib11">Carninci et al., 2005</xref>; <xref ref-type="bibr" rid="bib40">Kapranov et al., 2007</xref>; <xref ref-type="bibr" rid="bib69">Ponjavic et al., 2007</xref>). Thousands of long RNA molecules (>200 nucleotides) that do not display the typical properties of well-characterized protein-coding RNAs, and which have been named intergenic or long non-coding RNAs (lncRNAs), have been discovered in several eukaryotic genomes (<xref ref-type="bibr" rid="bib64">Okazaki et al., 2002</xref>; <xref ref-type="bibr" rid="bib70">Ponting et al., 2009</xref>; <xref ref-type="bibr" rid="bib8">Cabili et al., 2011</xref>; <xref ref-type="bibr" rid="bib52">Liu et al., 2012</xref>; <xref ref-type="bibr" rid="bib68">Pauli et al., 2012</xref>; <xref ref-type="bibr" rid="bib87">Ulitsky and Bartel, 2013</xref>). There are several lncRNAs that have regulatory functions (<xref ref-type="bibr" rid="bib29">Guttman and Rinn, 2012</xref>; <xref ref-type="bibr" rid="bib87">Ulitsky and Bartel, 2013</xref>). 
For example the X-inactive-specific transcript <italic>Xist</italic> regulates X chromosome inactivation in eutherian mammals (<xref ref-type="bibr" rid="bib7">Brockdorff et al., 1992</xref>). However, the vast majority of lncRNAs do not have a known function.</p><p>Intriguingly, several recent studies have noted that a large fraction of lncRNAs associate with ribosomes (<xref ref-type="bibr" rid="bib37">Ingolia et al., 2011</xref>; <xref ref-type="bibr" rid="bib5">Bazzini et al., 2014</xref>; <xref ref-type="bibr" rid="bib39">Juntawong et al., 2014</xref>; <xref ref-type="bibr" rid="bib89">van Heesch et al., 2014</xref>). Deep sequencing of ribosome-protected fragments, or ribosome profiling, provides detailed information on the regions that are translated in a transcript (<xref ref-type="bibr" rid="bib35">Ingolia, 2014</xref>). According to some studies, the patterns of ribosome protection indicate that lncRNAs are capable of translating short peptides (<xref ref-type="bibr" rid="bib37">Ingolia et al., 2011</xref>; <xref ref-type="bibr" rid="bib5">Bazzini et al., 2014</xref>; <xref ref-type="bibr" rid="bib39">Juntawong et al., 2014</xref>) although others have reached different conclusions (<xref ref-type="bibr" rid="bib30">Guttman et al., 2013</xref>). Many lncRNAs have the same structure as classical mRNAs: they are transcribed by polymerase II, capped and polyadenylated, and accumulate in the cytoplasm (<xref ref-type="bibr" rid="bib89">van Heesch et al., 2014</xref>). 
However, in contrast to typical protein-coding genes, they tend to contain few introns, are expressed at low levels, exhibit weak sequence constraints, and show limited phylogenetic conservation (<xref ref-type="bibr" rid="bib8">Cabili et al., 2011</xref>; <xref ref-type="bibr" rid="bib16">Derrien et al., 2012</xref>; <xref ref-type="bibr" rid="bib46">Kutter et al., 2012</xref>; <xref ref-type="bibr" rid="bib60">Necsulea et al., 2014</xref>).</p><p>The association of lncRNAs with ribosomes, and the fact that many of them appear to have arisen relatively recently in evolution, indicate that they could be an important source of new peptides. Levine et al., who described the first examples of de novo originated genes in <italic>Drosophila melanogaster</italic>, already noted that non-coding RNAs expressed at low levels could contribute to the birth of novel protein coding genes (<xref ref-type="bibr" rid="bib50">Levine et al., 2006</xref>). Cai et al. found a new protein coding gene in <italic>Saccharomyces cerevisiae</italic> likely to have been formed from a previously transcribed non-coding sequence (<xref ref-type="bibr" rid="bib9">Cai et al., 2008</xref>). Wilson and Masel observed that ribosome profiling reads from a yeast experiment often mapped to intergenic transcripts (<xref ref-type="bibr" rid="bib93">Wilson and Masel, 2011</xref>), and they proposed that this could help provide the raw material for the birth of new protein-coding genes. Another study in yeast found evidence of translation of short species-specific ORFs located in non-genic regions (<xref ref-type="bibr" rid="bib12">Carvunis et al., 2012</xref>). 
More generally, it is important to consider that de novo protein-coding gene evolution, which was once thought to be a very rare event, is now believed to be relatively common (<xref ref-type="bibr" rid="bib42">Khalturin et al., 2009</xref>; <xref ref-type="bibr" rid="bib85">Toll-Riera et al., 2009</xref>; <xref ref-type="bibr" rid="bib84">Tautz and Domazet-Lošo, 2011</xref>; <xref ref-type="bibr" rid="bib54">Long et al., 2013</xref>; <xref ref-type="bibr" rid="bib74">Reinhardt et al., 2013</xref>). Recently emerged proteins tend to be very short and evolve under weak evolutionary constraints (<xref ref-type="bibr" rid="bib1">Albà and Castresana, 2005</xref>; <xref ref-type="bibr" rid="bib50">Levine et al., 2006</xref>; <xref ref-type="bibr" rid="bib10">Cai et al., 2009</xref>; <xref ref-type="bibr" rid="bib51">Liu et al., 2010</xref>; <xref ref-type="bibr" rid="bib95">Xie et al., 2012</xref>; <xref ref-type="bibr" rid="bib66">Palmieri et al., 2014</xref>), properties that we also expect to find in the putative ORFs of lncRNAs.</p><p>The idea that lncRNAs serve as a repository for the evolution of new peptides is appealing but the evidence is still fragmented. In this study, we have analyzed ribosome profiling experiments performed in six different species and measured the sequence coding potential and selective constraints of the putatively translated ORFs in lncRNAs and codRNAs. We have discovered that lncRNAs show very similar characteristics to evolutionary young protein coding genes (lineage-specific proteins). 
The results strongly support a role for lncRNAs in the production of new peptides.</p></sec><sec id="s2" sec-type="results"><title>Results</title><sec id="s2-1"><title>Characterization of coding and long non-coding transcripts</title><p>We obtained polyA+ RNA and ribosome profiling sequencing data from six different published experiments performed in diverse eukaryotic species, mouse (<italic>Mus musculus</italic>), human (<italic>Homo sapiens</italic>, HeLa cells), zebrafish (<italic>Danio rerio</italic>), fruit fly (<italic>D. melanogaster</italic>), <italic>Arabidopsis</italic> (<italic>A. thaliana),</italic> and yeast (<italic>S. cerevisiae</italic>) (<xref ref-type="table" rid="tbl1">Table 1</xref>). After read mapping and transcript assembly, we classified the expressed transcripts longer than 200 nucleotides into coding and long non-coding classes (codRNAs and lncRNAs, respectively, <xref ref-type="table" rid="tbl2">Table 2</xref>).<table-wrap id="tbl1" position="float"><object-id pub-id-type="doi">10.7554/eLife.03523.003</object-id><label>Table 1.</label><caption><p>Data sets used in the study</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.003">http://dx.doi.org/10.7554/eLife.03523.003</ext-link></p></caption><table frame="hsides" rules="groups"><thead><tr><th colspan="2">Species</th><th>GEO Accession</th><th>Mapped reads (millions)</th><th>Max read length (bp)</th><th>Description</th><th>Reference</th></tr></thead><tbody><tr><td rowspan="2">Mouse <italic>M. musculus</italic></td><td>RNA-seq</td><td>GSE30839</td><td>226.0</td><td>43</td><td rowspan="2">ES cells, E14</td><td rowspan="2"><xref ref-type="bibr" rid="bib37">Ingolia et al., 2011</xref></td></tr><tr><td>Ribosome profiling</td><td>GSE30839</td><td>39.2</td><td>47</td></tr><tr><td rowspan="2">Human <italic>H. 
sapiens</italic></td><td>RNA-seq</td><td>GSE22004</td><td>29.8</td><td>36</td><td rowspan="2">HeLa cells</td><td rowspan="2"><xref ref-type="bibr" rid="bib28">Guo et al., 2010</xref></td></tr><tr><td>Ribosome profiling</td><td>GSE22004</td><td>78.3</td><td>36</td></tr><tr><td rowspan="2">Zebrafish <italic>D. rerio</italic></td><td>RNA-seq</td><td>GSE32900</td><td>1382.2</td><td>2 × 75</td><td rowspan="2">Series of developmental stages</td><td rowspan="2"><xref ref-type="bibr" rid="bib14">Chew et al., 2013</xref></td></tr><tr><td>Ribosome profiling</td><td>GSE46512</td><td>1040.0</td><td>44</td></tr><tr><td rowspan="2">Fruit fly <italic>D. melanogaster</italic></td><td>RNA-seq</td><td>GSE49197</td><td>1317.9</td><td>50</td><td rowspan="2">0–2hr embryos, wild type</td><td rowspan="2"><xref ref-type="bibr" rid="bib20">Dunn et al., 2013</xref></td></tr><tr><td>Ribosome profiling</td><td>GSE49197</td><td>105.7</td><td>50</td></tr><tr><td rowspan="2">Arabidopsis <italic>A. thaliana</italic></td><td>RNA-seq</td><td>GSE50597</td><td>79.8</td><td>51</td><td rowspan="2">No stress conditions, TRAP purification</td><td rowspan="2"><xref ref-type="bibr" rid="bib39">Juntawong et al., 2014</xref></td></tr><tr><td>Ribosome profiling</td><td>GSE50597</td><td>140.3</td><td>51</td></tr><tr><td rowspan="2">Yeast <italic>S. 
cerevisiae</italic></td><td>RNA-seq</td><td>GSE52119</td><td>20.54</td><td>50</td><td rowspan="2">GSY83, diploid</td><td rowspan="2"><xref ref-type="bibr" rid="bib57">McManus et al., 2014</xref></td></tr><tr><td>Ribosome profiling</td><td>GSE52119</td><td>6.83</td><td>50</td></tr></tbody></table></table-wrap><table-wrap id="tbl2" position="float"><object-id pub-id-type="doi">10.7554/eLife.03523.004</object-id><label>Table 2.</label><caption><p>Fraction of transcripts associated with ribosomes</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.004">http://dx.doi.org/10.7554/eLife.03523.004</ext-link></p></caption><table frame="hsides" rules="groups"><thead><tr><th/><th colspan="3">codRNA</th><th colspan="3">lncRNA</th></tr><tr><th/><th>Expressed</th><th colspan="2">Associated with ribosomes (RP)</th><th>Expressed</th><th colspan="2">Associated with ribosomes (RP)</th></tr><tr><th/><th/><th>Total</th><th>Stringent</th><th/><th>Total</th><th>Stringent</th></tr></thead><tbody><tr><td>Mouse</td><td>14,245</td><td align="char" char="(">14,196 (99.7%)</td><td align="char" char="(">13,918 (97.7%)</td><td>476</td><td align="char" char="(">390 (81.9%)</td><td align="char" char="(">367 (77.1%)</td></tr><tr><td>Human</td><td>17,011</td><td align="char" char="(">16,630 (97.8%)</td><td align="char" char="(">16,617 (97.7%)</td><td>934</td><td align="char" char="(">403 (43.1%)</td><td align="char" char="(">343 (36.7%)</td></tr><tr><td>Zebrafish</td><td>12,595</td><td align="char" char="(">11,643 (92.4%)</td><td align="char" char="(">11,637 (92.4%)</td><td>2392</td><td align="char" char="(">726 (30.4%)</td><td align="char" char="(">684 (28.6%)</td></tr><tr><td>Fruit fly</td><td>8041</td><td align="char" char="(">8031 (99.9%)</td><td align="char" char="(">7623 (94.8%)</td><td>28</td><td align="char" char="(">22 (78.6%)</td><td align="char" char="(">10 (35.7%)</td></tr><tr><td>Arabidopsis</td><td>19,162</td><td align="char" char="(">18,879 
(98.5%)</td><td align="char" char="(">10,329 (53.9%)</td><td>139</td><td align="char" char="(">93 (66.9%)</td><td align="char" char="(">68 (48.9%)</td></tr><tr><td>Yeast</td><td>4740</td><td align="char" char="(">4547 (95.9%)</td><td align="char" char="(">4335 (91.5%)</td><td>21</td><td align="char" char="(">6 (28.6%)</td><td align="char" char="(">6 (28.6%)</td></tr></tbody></table><table-wrap-foot><fn><p>Stringent: number of transcripts significant at p &lt; 0.05 using 3′UTRs as a null model (see ‘Materials and methods’ for more details).</p></fn></table-wrap-foot></table-wrap></p><p>We detected hundreds of annotated lncRNAs in the vertebrate species (mouse, human and zebrafish), the number being lower (&lt;150) in the other species (fruit fly, <italic>Arabidopsis</italic> and yeast). In addition, we identified a large number of novel lncRNAs not annotated in the databases, 2488 taking all species together (<xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1A</xref>). The inclusion of such lncRNAs resulted in a sixfold increase in the number of lncRNAs amenable for study in zebrafish and a twofold increase in mouse. In yeast, we only found two annotated lncRNAs, but there were 19 novel ones. In the majority of the analyses, we merged the annotated and the novel lncRNAs.</p><p>As expected, lncRNAs tended to be much shorter than codRNAs in all the species studied (<xref ref-type="fig" rid="fig1">Figure 1A</xref>). We found that most lncRNAs contained at least one short ORF (≥24 amino acids) and often several ORFs. The average ORF size in lncRNAs was between 43 and 68 amino acids depending on the species (<xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1B</xref>). 
Consistent with previous studies, lncRNAs were expressed at significantly lower levels than codRNAs (<xref ref-type="fig" rid="fig1">Figure 1B</xref>, Wilcoxon test, p &lt; 10<sup>−5</sup>).<fig id="fig1" position="float"><object-id pub-id-type="doi">10.7554/eLife.03523.005</object-id><label>Figure 1.</label><caption><title>General characteristics of codRNA and lncRNA transcripts.</title><p>(<bold>A</bold>) Density plots of transcript length. (<bold>B</bold>) Box-plots of transcript expression level in log2(FPKM) units. lncRNA_ribo: lncRNAs associated with ribosomes; lncRNA_noribo: lncRNAs for which association with ribosomes was not detected. codRNA: coding transcripts encoding experimentally validated proteins except for zebrafish in which all transcripts annotated as coding were considered. The area within the box-plot comprises 50% of the data and the line represents the median value. In all studied species, codRNAs were expressed at higher levels than lncRNAs (Wilcoxon test, p &lt; 10<sup>−5</sup>), and lncRNA_ribo at higher levels than lncRNA_noribo (Wilcoxon test, p &lt; 0.005).</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.005">http://dx.doi.org/10.7554/eLife.03523.005</ext-link></p></caption><graphic xlink:href="elife-03523-fig1-v1.tif"/></fig></p></sec><sec id="s2-2"><title>Efficient detection of translation events by ribosome profiling</title><p>The analysis of ribosome profiling sequencing data showed that the percentage of expressed coding transcripts associated with ribosomes was >90% in all species, with the highest values (>99%) in mouse and fruit fly (<xref ref-type="table" rid="tbl2">Table 2</xref>). 
Pseudogenes had a lower rate of association with ribosomes than coding RNAs, but surprisingly, in species with many annotated pseudogenes, such as human, mouse, and <italic>Arabidopsis</italic>, the majority of them showed association with ribosomes (<xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1A</xref>). This appeared to be a true signal; while pseudogenes will typically show sequence similarity to other functional copies in the genome, we only considered uniquely mapped reads with no mismatches.</p><p>Ribosome profiling is based on deep sequencing, and thus provides an unmatched level of resolution of the translated peptides when compared with current proteomics techniques. This is especially important for short proteins, which are difficult to detect by standard mass spectrometry methods (<xref ref-type="bibr" rid="bib79">Slavoff et al., 2013</xref>). We used the ribosome-associated protein-coding RNA data to investigate the relationship between peptide detection by proteomics and protein length. 
We found that human and mouse translated proteins between 24 and 80 amino acids long were more difficult to identify in proteomics databases than longer proteins (<xref ref-type="table" rid="tbl3">Table 3</xref>).<table-wrap id="tbl3" position="float"><object-id pub-id-type="doi">10.7554/eLife.03523.006</object-id><label>Table 3.</label><caption><p>Fraction of translated proteins of different size detected in proteomics databases</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.006">http://dx.doi.org/10.7554/eLife.03523.006</ext-link></p></caption><table frame="hsides" rules="groups"><thead><tr><th/><th colspan="4">Protein size (amino acids)</th></tr></thead><tbody><tr><td>Species</td><td>24–80</td><td>81–130</td><td>131–180</td><td>>180</td></tr><tr><td>Mouse</td><td>27/58 (46.6%)</td><td>222/286 (77.6%)</td><td>256/330 (77.6%)</td><td>3716/4786 (77.7%)</td></tr><tr><td>Human</td><td>116/272 (42.6%)</td><td>536/748 (71.7%)</td><td>669/875 (76.5%)</td><td>6757/8964 (75.4%)</td></tr><tr><td>Yeast</td><td>27/30 (90.0%)</td><td>168/207 (81.1%)</td><td>234/265 (88.3%)</td><td>2934/3224 (91.0%)</td></tr></tbody></table><table-wrap-foot><fn><p>Only transcripts encoding experimentally validated proteins (codRNAe) were considered.</p></fn></table-wrap-foot></table-wrap></p></sec><sec id="s2-3"><title>Long non-coding RNA transcripts frequently associate with ribosomes</title><p>The percentage of lncRNAs scanned by ribosomes (lncRNA_ribo) was surprisingly high in all the species studied (<xref ref-type="table" rid="tbl2">Table 2</xref>). The values ranged from 28.6% in yeast to 81.9% in mouse. This affected the main lncRNA classes described in Ensembl v. 70, including long intervening non-coding RNAs (lincRNAs) or antisense transcripts (<xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1C</xref>). 
Short transcript size may hinder ribosome association detection (<xref ref-type="bibr" rid="bib4a">Aspden et al., 2014</xref>). We also found that the ribosome profiling signal was more difficult to detect in poorly expressed transcripts than in highly expressed ones, both for lncRNAs and codRNAs (<xref ref-type="fig" rid="fig2">Figure 2</xref>). As lncRNAs tend to be expressed at low levels and are short when compared to codRNAs (<xref ref-type="fig" rid="fig1">Figure 1</xref>), we might be underestimating their association with ribosomes. <fig id="fig2" position="float"><object-id pub-id-type="doi">10.7554/eLife.03523.007</object-id><label>Figure 2.</label><caption><title>Effect of transcript expression level on the detection of ribosome association.</title><p>The percentage of transcripts associated with ribosomes is shown for several transcript expression intervals. codRNA: annotated coding transcripts encoding experimentally verified proteins (except in zebrafish for which all coding transcripts were considered). lncRNA: annotated and novel long non-coding RNAs. Only species with at least 20 transcripts in each expression bin were plotted. In the remaining species, the data were consistent with the trends shown.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.007">http://dx.doi.org/10.7554/eLife.03523.007</ext-link></p></caption><graphic xlink:href="elife-03523-fig2-v1.tif"/></fig></p><p>In order to determine if the ribosome profiling signal in lncRNAs was different from noise, we compared ribosome density in the transcripts to that in 3′ untranslated regions (3′UTRs). More specifically, the null model consisted of a size-matched set of sequences containing randomly selected 3′UTRs from annotated coding transcripts. Ribosome density was calculated as the number of ribosome profiling reads divided by the number of RNA-seq reads, a ratio defined as translational efficiency (TE) (<xref ref-type="bibr" rid="bib37">Ingolia et al., 2011</xref>). 
Both codRNAs and lncRNAs displayed much higher TE values than 3′UTRs in all species studied (Wilcoxon test p < 10<sup>−5</sup>, <xref ref-type="fig" rid="fig3">Figure 3</xref>). We could reject the null model for 90.12% of the lncRNAs and 87.19% of the codRNAs associated with ribosomes (p < 0.05) (see details by species in <xref ref-type="table" rid="tbl2">Table 2</xref>, Stringent set). Therefore, we concluded that the density of ribosomes in lncRNAs is much higher than expected by spurious ribosome binding.<fig id="fig3" position="float"><object-id pub-id-type="doi">10.7554/eLife.03523.008</object-id><label>Figure 3.</label><caption><title>TE distribution in human transcripts and 3′UTRs (null-model).</title><p>Cumulative distribution of TE values in human codRNAs, lncRNAs, and 3′UTR sequences. We randomly selected 3′UTRs with a minimum length of 30 nucleotides to build a set of 3′UTR sequences with the same size distribution as the complete transcripts.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.008">http://dx.doi.org/10.7554/eLife.03523.008</ext-link></p></caption><graphic xlink:href="elife-03523-fig3-v1.tif"/></fig></p><p>Next, we compared ribosome density in lncRNAs and codRNAs in each of the species, focusing on regions covered by ribosome profiling reads to account for any differences in the length of the putatively translated regions. In human, fruit fly, and yeast, TE was higher in codRNAs than in lncRNAs (Wilcoxon test, p < 0.005), but in mouse and zebrafish the opposite trend was observed (Wilcoxon test, p < 0.05) (<xref ref-type="fig" rid="fig4">Figure 4</xref>). Despite the differences between the species, which may be due to technical issues, it is clear that lncRNAs can show TE values that are similar to, or even higher than, those of codRNAs. 
The results were similar when we restricted the analysis to genes encoding a single transcript to avoid any possible biases due to multiple read mapping or when we employed the maximum TE in 90 nucleotide windows (<xref ref-type="fig" rid="fig4s1">Figure 4—figure supplement 1</xref>).<fig-group><fig id="fig4" position="float"><object-id pub-id-type="doi">10.7554/eLife.03523.009</object-id><label>Figure 4.</label><caption><title>Ribosome association profiles for codRNAs and lncRNAs.</title><p>Box-plots of transcript translational efficiency (TE) in log2(TE) units. The area within the box-plot comprises 50% of the data, and the line represents the median value. lncRNA: lncRNAs for which association with ribosomes was detected. codRNA: coding RNA transcripts encoding experimentally validated proteins except for zebrafish in which all transcripts annotated as coding were considered.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.009">http://dx.doi.org/10.7554/eLife.03523.009</ext-link></p></caption><graphic xlink:href="elife-03523-fig4-v1.tif"/></fig><fig id="fig4s1" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.010</object-id><label>Figure 4—figure supplement 1.</label><caption><title>Additional translational efficiency (TE) measures.</title><p>Single isoforms correspond to data for genes with a single transcript. The number of such genes was 2961 codRNA and 246 lncRNA_ribo for mouse, 2853 codRNA and 150 lncRNA_ribo for human, 9352 codRNA and 412 lncRNA_ribo for zebrafish, 836 codRNA and 18 lncRNA_ribo for fruit fly, and 3024 codRNA and 92 lncRNA_ribo for Arabidopsis. In the case of yeast, all genes were taken. 
TE max is the maximum TE value taking 90 nucleotide windows.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.010">http://dx.doi.org/10.7554/eLife.03523.010</ext-link></p></caption><graphic xlink:href="elife-03523-fig4-figsupp1-v1.tif"/></fig></fig-group></p><p>For comparison, we collected a set of 29 human genes with non-coding functions described in several recent reviews (<xref ref-type="supplementary-material" rid="SD2-data">Supplementary file 2A</xref>; <xref ref-type="bibr" rid="bib70">Ponting et al., 2009</xref>; <xref ref-type="bibr" rid="bib87">Ulitsky and Bartel, 2013</xref>; <xref ref-type="bibr" rid="bib24">Fatica and Bozzoni, 2014</xref>). Many of these genes play roles in the regulation of gene expression in the nucleus and are thus unlikely to be translated. We only detected expression for five of these genes: <italic>Malat1</italic>, <italic>Pvt1</italic>, <italic>Neat1</italic>, <italic>Meg8</italic>, and <italic>Cyrano</italic>. Transcripts encoded by the first three genes showed ribosome association. In the case of <italic>Malat1,</italic> this was also consistently observed in mouse and zebrafish (in the latter species <italic>Malat1</italic> was identified as a novel transcript) and in the case of <italic>Pvt1</italic> in mouse. Given the small number of expressed transcripts, we could not draw any general conclusions for this set.</p></sec><sec id="s2-4"><title>lncRNAs show similar ribosome protection profiles to codRNAs</title><p>The exact positions of ribosome profiling reads on the RNA can be used to delineate the regions that are being actively translated or to discover new functional ORFs (<xref ref-type="bibr" rid="bib14">Chew et al., 2013</xref>; <xref ref-type="bibr" rid="bib30">Guttman et al., 2013</xref>; <xref ref-type="bibr" rid="bib35">Ingolia, 2014</xref>). 
Because the ribosome is released after encountering a stop codon, this technique can also be employed to identify novel C-terminal protein extensions (<xref ref-type="bibr" rid="bib20">Dunn et al., 2013</xref>) or to evaluate if a predicted ORF is likely to correspond to a translated peptide (<xref ref-type="bibr" rid="bib30">Guttman et al., 2013</xref>). We next aimed at comparing the TE values in different transcript regions, including open reading frames (ORFs), putative 5′ and 3′ untranslated regions (UTRs), and the regions between ORFs.</p><p>In order to obtain an unbiased picture, it was important to define the different regions in the same way in lncRNAs and codRNAs. In typical codRNAs there is a main translated ORF that covers a large fraction of the transcript, sometimes accompanied by short upstream ORFs in the 5′UTR (<xref ref-type="bibr" rid="bib14">Chew et al., 2013</xref>). However, lncRNAs may potentially encode several short peptides (<xref ref-type="bibr" rid="bib37">Ingolia et al., 2011</xref>). The minimum size of ORFs was set at 24 amino acids (75 nucleotides counting the STOP codon), as peptides of this size have been identified in genetic screen studies in humans (<xref ref-type="bibr" rid="bib33">Hashimoto et al., 2001</xref>). To simplify the comparisons, we employed the same ORF size cut-off in all species. We also considered both a primary ORF, defined as the ORF with the largest number of ribosome profiling reads, as well as any additional non-overlapping ORFs that mapped to ribosome profiling reads (rest of ORFs).</p><p>In codRNAs, the primary ORF showed a nearly perfect degree of agreement with the annotated protein, indicating that it was an appropriate metric for the main translated product. Primary ORFs in lncRNAs typically occupied a shorter fraction of the transcript than in codRNAs (<xref ref-type="fig" rid="fig5">Figure 5A</xref>). 
The relative length of the ORF with respect to transcript length did not seem to be a strong predictor of ribosome association, as it did not help distinguish lncRNAs associated with ribosomes (lncRNA_ribo) from those not associated with ribosomes (lncRNA_noribo). In lncRNAs, most of the primary ORFs corresponded to proteins less than 100 amino acids long (<xref ref-type="fig" rid="fig5s1">Figure 5—figure supplement 1</xref>).<fig-group><fig id="fig5" position="float"><object-id pub-id-type="doi">10.7554/eLife.03523.011</object-id><label>Figure 5.</label><caption><title>Ribosome association in different transcript regions.</title><p>(<bold>A</bold>) Density plot of the relative length of the primary ORF in lncRNA_ribo and codRNA with respect to transcript length. For comparison, data for the longest ORF in lncRNA_noribo are also shown (except for fruit fly due to insufficient data). (<bold>B</bold>) Box-plots of TE distribution in primary ORF, 5′UTR, and 3′UTR regions. The area within the box-plot comprises 50% of the data, and the line represents the median value. The analysis considered all transcripts with 5′UTR and 3′UTR longer than 30 nucleotides and >0.2 FPKM in all three regions. The number of transcripts was 1956 codRNA and 159 lncRNA_ribo in mouse, 3558 codRNA and 139 lncRNA_ribo in human, 5216 codRNA and 252 lncRNA_ribo in zebrafish, and 2019 codRNA and 33 lncRNA_ribo in Arabidopsis. (<bold>C</bold>) Box-plots of TE distribution in primary ORFs, rest of ORFs with ribosome profiling reads and non-ORF regions (interORF). The analysis considered all transcripts with at least two ORFs and more than 30 nucleotides interORF. The number of transcripts was 3264 codRNA and 204 lncRNA_ribo in mouse, 3104 codRNA and 168 lncRNA_ribo in human, 1646 codRNA and 212 lncRNA_ribo in zebrafish, and 1098 codRNA and 25 lncRNA_ribo in Arabidopsis. 
Fruit fly and yeast were not included in the last two analyses due to insufficient data (<8 lncRNA_ribo meeting the conditions).</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.011">http://dx.doi.org/10.7554/eLife.03523.011</ext-link></p></caption><graphic xlink:href="elife-03523-fig5-v1.tif"/></fig><fig id="fig5s1" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.012</object-id><label>Figure 5—figure supplement 1.</label><caption><title>Absolute nucleotide length of ORFs in different kinds of transcripts.</title><p>In codRNAs and lncRNA_ribo, we selected the primary ORF (the ORF with the largest number of ribosome profiling reads), whereas in lncRNA_noribo we selected the longest ORF.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.012">http://dx.doi.org/10.7554/eLife.03523.012</ext-link></p></caption><graphic xlink:href="elife-03523-fig5-figsupp1-v1.tif"/></fig><fig id="fig5s2" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.013</object-id><label>Figure 5—figure supplement 2.</label><caption><title>Translational efficiency in single-isoform genes.</title><p>(<bold>A</bold>) Box-plots of TE distribution in primary ORF, 5′UTR, and 3′UTR regions. The analysis considered only genes with one isoform, with UTR and ORF regions expressed at >0.2 FPKM and with 5′UTR and 3′UTR longer than 30 nucleotides. The number of transcripts was 980 codRNA and 97 lncRNA_ribo in mouse, 758 codRNA and 36 lncRNA_ribo in human, 3763 codRNA and 117 lncRNA_ribo in zebrafish, and 1495 codRNA and 32 lncRNA_ribo in Arabidopsis. (<bold>B</bold>) Box-plots of TE distribution in primary ORFs, other ORFs with ribosome profiling reads and non-ORF regions (interORFs). The analysis only considered genes with one isoform in which these regions were longer than 30 nucleotides and with expression >0.2 FPKM. 
The number of transcripts was 1691 codRNA and 113 lncRNA_ribo in mouse, 763 codRNA and 54 lncRNA_ribo in human, 1170 codRNA and 108 lncRNA_ribo in zebrafish, and 817 codRNA and 25 lncRNA_ribo in Arabidopsis.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.013">http://dx.doi.org/10.7554/eLife.03523.013</ext-link></p></caption><graphic xlink:href="elife-03523-fig5-figsupp2-v1.tif"/></fig><fig id="fig5s3" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.014</object-id><label>Figure 5—figure supplement 3.</label><caption><title>Translational efficiency in annotated transcripts.</title><p>(<bold>A</bold>) Box-plots of TE distribution in primary ORF, 5′UTR, and 3′UTR regions. The analysis considered only annotated transcripts, with UTR and ORF regions expressed at >0.2 FPKM and with 5′UTR and 3′UTR longer than 30 nucleotides. The number of transcripts was 1956 codRNA and 92 lncRNA_ribo in mouse, 3558 codRNA and 138 lncRNA_ribo in human, 5216 codRNA and 54 lncRNA_ribo in zebrafish, and 2019 codRNA and 22 lncRNA_ribo in Arabidopsis. (<bold>B</bold>) Box-plots of TE distribution in primary ORFs, other ORFs with ribosome profiling reads (rest ORFs) and non-ORF regions (interORF). The analysis only considered annotated transcripts in which these regions were longer than 30 nucleotides and with expression >0.2 FPKM. 
The number of transcripts was 3264 codRNA and 128 lncRNA_ribo in mouse, 3104 codRNA and 167 lncRNA_ribo in human, 1646 codRNA and 58 lncRNA_ribo in zebrafish, and 1098 codRNA and 18 lncRNA_ribo in Arabidopsis.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.014">http://dx.doi.org/10.7554/eLife.03523.014</ext-link></p></caption><graphic xlink:href="elife-03523-fig5-figsupp3-v1.tif"/></fig><fig id="fig5s4" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.015</object-id><label>Figure 5—figure supplement 4.</label><caption><title>Translational efficiency in transcripts expressed at different levels.</title><p>We restricted this analysis to transcripts with ORF and UTR regions expressed at >0.2 FPKM and with 5′UTR and 3′UTR longer than 30 nucleotides. (<bold>A</bold>) Expressed at low levels: transcripts expressed at 0.5–2 FPKM, (<bold>B</bold>) expressed at high levels: transcripts expressed at 2–10 FPKM. codRNAs were sampled in such a way as to have the same gene expression distribution as the corresponding lncRNA set. Results for species in which all sets contained at least 20 transcripts are shown.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.015">http://dx.doi.org/10.7554/eLife.03523.015</ext-link></p></caption><graphic xlink:href="elife-03523-fig5-figsupp4-v1.tif"/></fig></fig-group></p><p>Next, we focused our attention on the differences between the primary ORF and the 5′UTR and 3′UTR regions in codRNAs and lncRNAs. We defined the 3′ untranslated region (3′UTR) as the sequence located immediately after the STOP codon of the primary ORF or the most downstream ORF associated with ribosomes. We used the same criteria to define the 5′UTR upstream from the initiation codon. 
In this analysis, we included all transcripts containing at least one ORF associated with ribosomes (the primary ORF) and UTR regions sufficiently long to detect ribosome profiling reads (>30 nucleotides); insufficient data for fruit fly and yeast precluded the analysis for these species. In both codRNAs and lncRNAs, the 5′UTR showed a ribosome density (translational efficiency, TE) comparable to that of the primary ORF (<xref ref-type="fig" rid="fig5">Figure 5B</xref>). In contrast, the 3′UTR showed very little ribosome association and often we could not find a single read mapping to this region (31–91% of cases in codRNAs and 46–68% in lncRNAs). Using genes with a single isoform or considering only annotated transcripts produced similar results (<xref ref-type="fig" rid="fig5s2 fig5s3">Figure 5—figure supplements 2 and 3</xref>). We also controlled for expression level by dividing the data set into transcripts with low (0.5–2 FPKM) and high expression (>2 FPKM), and by sampling the codRNAs in such a way as to have an expression distribution similar to that of lncRNAs. The results were very similar to those obtained with the complete data set (<xref ref-type="fig" rid="fig5s4">Figure 5—figure supplement 4</xref>), indicating that the analysis is robust to transcript expression differences.</p><p>As transcripts may contain several ORFs, we performed a separate analysis in which we compared the translational efficiency of the primary ORF, any additional ORFs with mapped ribosome profiling reads, and the regions between ribosome-protected ORFs (interORF) (<xref ref-type="fig" rid="fig5">Figure 5C</xref>). InterORF regions showed little signal when compared to the primary ORF, both in codRNAs and lncRNAs (Wilcoxon test, p < 10<sup>−9</sup> in human, mouse, and zebrafish, p < 0.05 in <italic>Arabidopsis</italic>; insufficient data for fruit fly and yeast precluded the analysis for these species). 
The data also indicated that ribosome binding is not always restricted to the primary ORF, especially in lncRNAs, as ribosome protection could sometimes be observed for additional ORFs.</p><p>Taken together, these results indicate that lncRNAs have ribosome profiling signatures consistent with translation, with a strong decrease of ribosome density in the 3′UTR but not the 5′UTR region, and preferential binding of ribosomes to the primary ORF. It remains possible that the translated peptides are degraded soon after being produced. However, we estimate that the percentage of cases that may undergo nonsense-mediated decay (NMD, see ‘Materials and methods’ for more details) is low, between 4.47 and 14.11% depending on the species. For comparison, the percentage for protein-coding transcripts showing the same patterns (including transcripts annotated as NMD in Ensembl) is between 0.34 and 13.33%.</p></sec><sec id="s2-5"><title>lncRNAs are less conserved than codRNAs</title><p>Are the putatively translated ORFs in lncRNAs conserved? We performed sequence similarity searches using BLASTP (E-value < 10<sup>−4</sup>) against all annotated coding transcripts in Ensembl, as well as against the primary ORFs in lncRNAs, for the six species studied here (<xref ref-type="supplementary-material" rid="SD1-data SD2-data">Supplementary files 1D and 2B</xref>). The number of lncRNA_ribo with homologues in other species was remarkably low (0–15.6%) except for zebrafish (49.4%). In contrast, the majority of codRNAs had homologues in other species (>95% for vertebrates and fruit fly and 70–73% for <italic>Arabidopsis</italic> and yeast). 
After we discarded lncRNAs that showed cross-species conservation, association with ribosomes was still very prevalent (80.4% of mouse, 40.3% of human, and 22.1% of zebrafish lncRNAs were associated with ribosomes).</p><p>We also investigated whether the ribosome-associated ORFs in lncRNAs showed homology to annotated proteins in the same species. The values were very low for all the species (0–12.4%) except for zebrafish (47.5%). Therefore, in general lncRNAs are not truncated duplicated copies (pseudogenes). The case of zebrafish is an exception probably because of missing protein-coding annotations in this species.</p></sec><sec id="s2-6"><title>Coding properties of ribosome-protected ORFs in lncRNAs</title><p>Subsequently, we compared the sequence coding properties of the primary ORF in lncRNAs with those in <italic>bona fide</italic> coding and non-coding sequences using a hexamer-based coding score (see ‘Materials and methods’). In all species the coding scores of the primary ORF in lncRNAs, while lower than that of codRNAs, were significantly higher than the coding score of ORFs in introns (<xref ref-type="fig" rid="fig6">Figure 6</xref>, Wilcoxon test lncRNA_ribo vs intron, human, mouse, zebrafish, and <italic>Arabidopsis</italic> p < 10<sup>−16</sup>; fruit fly and yeast p < 10<sup>−5</sup>). This clearly shows that ORFs in lncRNAs are more coding-like than random ORFs. We repeated the same comparison using 100 different randomly sampled intronic sequence sets, and in >95% of the cases, we obtained the same result. lncRNAs associated with ribosomes (lncRNA_ribo) showed higher coding scores than those not associated with ribosomes (lncRNA_noribo), even when we did not use the ribosome profiling information and compared the longest ORF in both types of transcripts (<xref ref-type="fig" rid="fig6s1">Figure 6—figure supplement 1</xref>). 
We reached similar conclusions when we restricted the analysis to annotated lncRNA transcripts (<xref ref-type="fig" rid="fig6s2">Figure 6—figure supplement 2</xref>), when we used ORFs from gene deserts as an alternative non-coding sequence set (differences with lncRNAs significant by Wilcoxon test, p < 10<sup>−16</sup>, see ‘Materials and methods’ for more details), and when we restricted the analysis to lncRNAs for which we did not find protein coding homologues in the other species studied (<xref ref-type="fig" rid="fig6s3">Figure 6—figure supplement 3</xref>). Because a high proportion of lncRNAs contained small ORFs, we repeated the comparison only considering transcripts with ORFs shorter than 100 amino acids to avoid any length biases, again obtaining similar results (<xref ref-type="fig" rid="fig6s4">Figure 6—figure supplement 4</xref>). The use of other coding scores, for example based on codon frequencies instead of hexamer frequencies or related metrics such as GC content produced consistent results (<xref ref-type="fig" rid="fig6s5">Figure 6—figure supplement 5</xref>; <xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1E</xref>).<fig-group><fig id="fig6" position="float"><object-id pub-id-type="doi">10.7554/eLife.03523.016</object-id><label>Figure 6.</label><caption><title>Coding scores in ORFs from different types of transcripts.</title><p>Intron: randomly selected intronic regions; lncRNA_noribo: lncRNAs not associated with ribosomes; lncRNA_ribo: lncRNAs associated with ribosomes; pseudogene: pseudogenes associated with ribosomes; codRNAne: coding transcripts encoding non-validated proteins associated with ribosomes; codRNAe: coding transcripts encoding experimentally validated proteins. The coding score was calculated as the log ratio of hexamer frequencies in coding vs intronic sequences. In lncRNA_noribo and introns, we considered the longest ORF and in the rest of transcripts the primary ORF. 
The class ‘pseudogene’ was only included in species with more than 20 expressed pseudogenes with mapped ribosome profiling reads. The coding score of the primary ORF in lncRNAs (lncRNA_ribo) was significantly higher than the coding score in ORFs defined in introns (Wilcoxon test, human, mouse, zebrafish, and Arabidopsis p < 10<sup>−16</sup>; fruit fly and yeast p < 10<sup>−4</sup>) and in lncRNA_ribo it was significantly higher than in lncRNA_noribo in four species (Wilcoxon test, human, mouse and zebrafish p < 10<sup>−5</sup>, and Arabidopsis p < 0.05). Transcripts from genes of different evolutionary age were taken from the literature (see manuscript text). The number of transcripts was 68 for rodent, 127/123 for mammalian (mouse/human as reference species), 11,203/13,423/9812 for metazoan (mouse/human/zebrafish), 162 for fish, 208 for Cruciferae, 28 for <italic>S. cerevisiae</italic> and 84 for Saccharomyces. The youngest class of codRNAs displayed scores similar to those of lncRNA_ribo in mouse, zebrafish, and yeast (classes rodent, fish and <italic>S. cerevisiae</italic>, respectively), being only significantly higher in human and Arabidopsis (Wilcoxon test, p < 0.005; classes primate and Cruciferae). We did not analyze young genes in fruit fly due to lack of a suitable young set of codRNAs in this species.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.016">http://dx.doi.org/10.7554/eLife.03523.016</ext-link></p></caption><graphic xlink:href="elife-03523-fig6-v1.tif"/></fig><fig id="fig6s1" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.017</object-id><label>Figure 6—figure supplement 1.</label><caption><title>Coding scores for the longest ORF.</title><p>Comparison between lncRNAs associated and not associated with ribosomes using the longest ORF in both cases (lncRNA_ribo and lncRNA_noribo, respectively). 
Differences between lncRNA_ribo and lncRNA_noribo are significant by a Wilcoxon test (p < 10<sup>−10</sup> in human, mouse, and zebrafish; p < 0.005 in Arabidopsis).</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.017">http://dx.doi.org/10.7554/eLife.03523.017</ext-link></p></caption><graphic xlink:href="elife-03523-fig6-figsupp1-v1.tif"/></fig><fig id="fig6s2" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.018</object-id><label>Figure 6—figure supplement 2.</label><caption><title>Coding scores in different classes of annotated sequences.</title><p>Comparison between different transcript classes using only annotated lncRNAs. Yeast transcriptome is composed of very few annotated lncRNAs, and this analysis could not be performed.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.018">http://dx.doi.org/10.7554/eLife.03523.018</ext-link></p></caption><graphic xlink:href="elife-03523-fig6-figsupp2-v1.tif"/></fig><fig id="fig6s3" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.019</object-id><label>Figure 6—figure supplement 3.</label><caption><title>Coding scores in lncRNAs without homologues in other species.</title><p>Comparison between different transcript classes using only lncRNA with no homologues (noH) in other species. 
Only species in which several lncRNA_ribo and lncRNA_noribo had homology matches were considered.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.019">http://dx.doi.org/10.7554/eLife.03523.019</ext-link></p></caption><graphic xlink:href="elife-03523-fig6-figsupp3-v1.tif"/></fig><fig id="fig6s4" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.020</object-id><label>Figure 6—figure supplement 4.</label><caption><title>Coding scores in small ORFs from different types of transcripts.</title><p>Here we only employed lncRNAs in which the primary ORF was shorter than 100 amino acids. codRNA refers to joined codRNAe and codRNAne sets, since experimentally verified proteins are usually longer than 100 amino acids. The number of transcripts is shown in red.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.020">http://dx.doi.org/10.7554/eLife.03523.020</ext-link></p></caption><graphic xlink:href="elife-03523-fig6-figsupp4-v1.tif"/></fig><fig id="fig6s5" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.021</object-id><label>Figure 6—figure supplement 5.</label><caption><title>Use of different coding statistics in human transcripts.</title><p>Equal dicodon was based on the observed hexamer frequencies in coding sequences vs hexamer equiprobability, intron dicodon was based on the differences between hexamer frequencies in coding vs non-coding sequences and intron_monocodon was based on the observed codon frequencies in coding sequences vs codon equiprobability.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.021">http://dx.doi.org/10.7554/eLife.03523.021</ext-link></p></caption><graphic xlink:href="elife-03523-fig6-figsupp5-v1.tif"/></fig><fig id="fig6s6" position="float" specific-use="child-fig"><object-id pub-id-type="doi">10.7554/eLife.03523.022</object-id><label>Figure 6—figure 
supplement 6.</label><caption><title>Ribosome protection patterns in transcripts containing short ORFs.</title><p>(<bold>A</bold>) Mouse CUFF.34338.1 (chr5:113183493–113188347) is a novel lncRNA; it contains an ORF encoding a 169 amino acid protein associated with ribosomes and with protein-coding homologues in human, zebrafish, and yeast. (<bold>B</bold>) ENSMUST00000107081 is an annotated codRNA in mouse which evolved recently since no homologues were found in any other species. It has a small ORF that translates a 55 amino acid protein. (<bold>C</bold>) AT1G34418.1 is an annotated lncRNA in Arabidopsis showing abundant association with ribosomes in the 5′UTR region, the primary ORF (34 amino acids) and the final region of the transcript, which contains two redundant ORFs (in red) coding the sequence: MGLGFVN(V/F)LLGM. RNAseq: profile of RNAseq reads. RPFs: profile of ribosome profiling reads. Exon-intron transcript structures are represented; the thickest boxes on the exons are the primary ORFs.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.022">http://dx.doi.org/10.7554/eLife.03523.022</ext-link></p></caption><graphic xlink:href="elife-03523-fig6-figsupp6-v1.tif"/></fig></fig-group></p><p>At the individual transcript level, a sizeable fraction of lncRNAs associated with ribosomes displayed significantly higher coding scores than expected for non-coding sequences (p < 0.05 in all 100 intronic random sets; data in <xref ref-type="supplementary-material" rid="SD2-data">Supplementary file 2C</xref>; examples in <xref ref-type="fig" rid="fig6s6">Figure 6—figure supplement 6</xref>). These transcripts comprise 143 human lncRNAs (35.5% of the lncRNAs, score > 0.0189), 137 mouse lncRNAs (35.1%, score > 0.0377), 379 zebrafish lncRNAs (52.1%, score > 0.0095), 7 fruit fly lncRNAs (31.8%, score > −0.0483), 43 <italic>Arabidopsis</italic> lncRNAs (46.2%, score > −0.0202), and 5 yeast lncRNAs (83.3%, score > 0.03387). 
Annotated and novel lncRNAs were present in similar proportions in these sets, supporting the validity of our strategy of merging the two types of transcripts from the beginning. We also noted that the fraction of lncRNAs with coding homologues in other species increased in these sets. For example, whereas the proportion of total human lncRNA_ribo with homologues in other species was 15.6%, in the set with significant coding scores it was 29.3%. This number increased to 57.3% when we performed searches against the NCBI non-redundant peptide database ‘nr’, as some of the ORFs in lncRNAs are annotated as predicted peptides in this database.</p><p>If ORFs in lncRNAs are being translated this is likely to be a relatively recent evolutionary event, as many lncRNAs are lineage-specific (<xref ref-type="bibr" rid="bib68">Pauli et al., 2012</xref>; <xref ref-type="bibr" rid="bib60">Necsulea et al., 2014</xref>; our data). It is well established that proteins of different evolutionary age display distinct sequence properties, including different codon usage (<xref ref-type="bibr" rid="bib85">Toll-Riera et al., 2009</xref>; <xref ref-type="bibr" rid="bib12">Carvunis et al., 2012</xref>; <xref ref-type="bibr" rid="bib66">Palmieri et al., 2014</xref>). We retrieved sets of annotated protein-coding transcripts of different evolutionary age from human, mouse, zebrafish, <italic>Arabidopsis</italic>, and yeast available from various studies (<xref ref-type="bibr" rid="bib22">Ekman and Elofsson, 2010</xref>; <xref ref-type="bibr" rid="bib19">Donoghue et al., 2011</xref>; <xref ref-type="bibr" rid="bib62">Neme and Tautz, 2013</xref>) and expressed in the systems studied here. We found that the coding score was always lower in the youngest group than in older groups (<xref ref-type="fig" rid="fig6">Figure 6</xref>, Wilcoxon test, p < 0.05). Remarkably, the youngest codRNAs showed a very similar coding score distribution to lncRNAs (<xref ref-type="fig" rid="fig6">Figure 6</xref>). 
We obtained similar results when we discarded lncRNAs that had homologues in any of the other species (<xref ref-type="fig" rid="fig6s3">Figure 6—figure supplement 3</xref>).</p><p>We also collected information from young protein coding genes encoding experimentally verified proteins according to Swiss-Prot (<xref ref-type="supplementary-material" rid="SD2-data">Supplementary file 2D</xref>). We observed that these proteins were short and the ORF occupied a relatively small fraction of the transcript, features typically observed in lncRNAs. For example, the average size of proteins encoded by primate-specific transcripts was 148 amino acids and the average transcript coverage 47%. The coding score was remarkably low and again similar to that of lncRNAs (median 0.008 for primate-specific human transcripts, 0.046 for rodent-specific mouse transcripts, and 0.089 for yeast-specific coding transcripts).</p></sec><sec id="s2-7"><title>Selection pressure signatures in ORFs associated with ribosomes</title><p>An important measure of the strength of purifying selection acting on a coding sequence is the ratio between the number of non-synonymous and synonymous single nucleotide polymorphisms (PN/PS). Given the nature of the genetic code, there are more possible non-synonymous mutations than synonymous mutations. Under neutrality (no purifying selection), the PN/PS ratio is expected to be approximately 2.89 (<xref ref-type="bibr" rid="bib61">Nei and Gojobori, 1986</xref>).</p><p>Here, we applied the large amount of available polymorphism data for human, mouse, and zebrafish to compare the level of purifying selection in primary ORFs from codRNAs and lncRNAs (<xref ref-type="fig" rid="fig7">Figure 7</xref>; <xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1F</xref>). 
In general, human sequences showed higher PN/PS ratios than sequences from the other analyzed species, probably due to the presence of many slightly deleterious mutations segregating in the population (<xref ref-type="bibr" rid="bib23">Eyre-Walker, 2002</xref>). However, despite the intrinsic differences between organisms, we observed the same general trends. First, the PN/PS was significantly lower in codRNAs than in lncRNAs (proportion test, p < 10<sup>−5</sup>), denoting stronger purifying selection in the former. Second, there was a very clear inverse relationship between the strength of purifying selection and the age of the gene (p < 10<sup>−15</sup> between the youngest and rest of codRNAs in mouse and zebrafish), in agreement with previous studies (<xref ref-type="bibr" rid="bib53">Liu et al., 2008</xref>; <xref ref-type="bibr" rid="bib10">Cai et al., 2009</xref>). High PN/PS values were also observed in the subset of young genes encoding experimentally validated proteins in human (primate-specific transcripts median PN/PS of 3.10) and mouse (rodent-specific transcripts median PN/PS 1.42), confirming this tendency. Third, the distribution of PN/PS values in lncRNAs was very similar to that of young protein-coding genes. In human and mouse, there were no significant differences, and in the case of zebrafish the lncRNAs had even slightly lower PN/PS values than the fish-specific protein coding genes (p < 0.01).<fig id="fig7" position="float"><object-id pub-id-type="doi">10.7554/eLife.03523.023</object-id><label>Figure 7.</label><caption><title>Selective pressure in ORFs from different types of transcripts.</title><p>PN/PS: ratio between the number of non-synonymous and synonymous single nucleotide polymorphisms (SNPs) in the complete set of primary ORFs for a given class of transcripts (in lncRNA_noribo the longest ORF was considered). In blue, data for different coding and non-coding transcript classes. In brown, data for different age codRNA classes. 
The bars represent the 95% confidence interval for the PN/PS value. For the species not shown there was not sufficient data to perform this analysis.</p><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.023">http://dx.doi.org/10.7554/eLife.03523.023</ext-link></p></caption><graphic xlink:href="elife-03523-fig7-v1.tif"/></fig></p></sec></sec><sec id="s3" sec-type="discussion"><title>Discussion</title><p>Here, we analyzed the patterns of ribosome protection in polyA+ transcripts from cells belonging to six different eukaryotic species. Among the expressed transcripts, we identified many lncRNAs in the different species. The vast majority of transcripts annotated as coding showed association with ribosomes (>92% in all species). Remarkably, a very large number of transcripts annotated as long non-coding RNA (lncRNAs) also showed such association (30–82% depending on the data set). Considering that lncRNAs are typically much shorter and expressed at lower levels than codRNAs, which may hinder the identification of ribosome association, this is a very significant fraction. In addition, the patterns of ribosome protection along the transcript are similar to those of protein-coding genes. Therefore, many lncRNAs appear to be scanned by ribosomes and are likely to translate short peptides.</p><p>Long non-coding RNAs are classified as such in databases because, according to a number of criteria, they are unlikely to encode functional proteins. These criteria include the lack of a long ORF, the absence of amino acid sequence conservation, and the lack of known protein domains (<xref ref-type="bibr" rid="bib32">Harrow et al., 2012</xref>). Moreover, we expect lncRNAs not to have matches to proteomics databases, as this should classify them as coding. 
Annotated lncRNAs are typically longer than 200 nucleotides because this is the cutoff size normally implemented to differentiate them from other RNA classes such as microRNAs and small nuclear RNAs. In practice, it is difficult to classify a transcript as coding or non-coding on the basis of the ORF size (<xref ref-type="bibr" rid="bib17">Dinger et al., 2008</xref>). Some true coding sequences may be quite small, and by chance alone non-coding transcripts may have relatively long ORFs. The majority of lncRNAs contain ORFs longer than 24 amino acids, which can potentially correspond to real proteins. Short proteins are more difficult to detect than longer ones and consequently they are probably underestimated in databases. In recent years, the use of comparative genomics (<xref ref-type="bibr" rid="bib26">Frith et al., 2006</xref>; <xref ref-type="bibr" rid="bib47">Ladoukakis et al., 2011</xref>; <xref ref-type="bibr" rid="bib31">Hanada et al., 2013</xref>), proteomics (<xref ref-type="bibr" rid="bib79">Slavoff et al., 2013</xref>; <xref ref-type="bibr" rid="bib90">Vanderperre et al., 2013</xref>; <xref ref-type="bibr" rid="bib55">Ma et al., 2014</xref>), and a combination of evolutionary conservation and ribosome profiling data (<xref ref-type="bibr" rid="bib15">Crappé et al., 2013</xref>; <xref ref-type="bibr" rid="bib5">Bazzini et al., 2014</xref>) has shown that the number of short proteins is probably much higher than previously suspected (<xref ref-type="bibr" rid="bib3">Andrews and Rothnagel, 2014</xref>). 
In yeast, gene deletion experiments have provided evidence of functionality for short open reading frames (sORFs < 100 amino acids) (<xref ref-type="bibr" rid="bib41">Kastenmayer et al., 2006</xref>); in zebrafish, several newly discovered sORFs appear to be involved in embryonic development (<xref ref-type="bibr" rid="bib67">Pauli et al., 2014</xref>) and other examples exist in insects (<xref ref-type="bibr" rid="bib56">Magny et al., 2013</xref>) and humans (<xref ref-type="bibr" rid="bib49">Lee et al., 2013</xref>; <xref ref-type="bibr" rid="bib78">Slavoff et al., 2014</xref>). In many cases, the transcripts containing sORFs will be classified as non-coding, especially if the ORF is not well conserved across different species.</p><p>One approach to identify potential coding transcripts is ribosome profiling (<xref ref-type="bibr" rid="bib36">Ingolia et al., 2009</xref>), which has been used to study translation of proteins in a wide range of organisms (<xref ref-type="bibr" rid="bib28">Guo et al., 2010</xref>; <xref ref-type="bibr" rid="bib37">Ingolia et al., 2011</xref>; <xref ref-type="bibr" rid="bib6">Brar et al., 2012</xref>; <xref ref-type="bibr" rid="bib58">Michel et al., 2012</xref>; <xref ref-type="bibr" rid="bib14">Chew et al., 2013</xref>; <xref ref-type="bibr" rid="bib20">Dunn et al., 2013</xref>; <xref ref-type="bibr" rid="bib34">Huang et al., 2013</xref>; <xref ref-type="bibr" rid="bib4">Artieri and Fraser, 2014</xref>; <xref ref-type="bibr" rid="bib5">Bazzini et al., 2014</xref>; <xref ref-type="bibr" rid="bib39">Juntawong et al., 2014</xref>; <xref ref-type="bibr" rid="bib57">McManus et al., 2014</xref>; <xref ref-type="bibr" rid="bib91">Vasquez et al., 2014</xref>). 
In several of these studies it has been noted that lncRNAs can be protected by ribosomes (<xref ref-type="bibr" rid="bib37">Ingolia et al., 2011</xref>; <xref ref-type="bibr" rid="bib14">Chew et al., 2013</xref>; <xref ref-type="bibr" rid="bib5">Bazzini et al., 2014</xref>; <xref ref-type="bibr" rid="bib39">Juntawong et al., 2014</xref>). However, there is no consensus on whether the observed patterns are consistent with translation. For example, in the original analysis of mouse stem cells, which we reanalyzed here, it was reported that many lncRNAs were polycistronic transcripts encoding short proteins (<xref ref-type="bibr" rid="bib37">Ingolia et al., 2011</xref>), but in another paper where the same data were processed in a different way, the authors concluded that lncRNAs were unlikely to be protein-coding (<xref ref-type="bibr" rid="bib30">Guttman et al., 2013</xref>). A zebrafish ribosome profiling study reported resemblance between lncRNAs and 5′leaders of coding RNAs; the authors suggested that translation may play a role in lncRNA regulation (<xref ref-type="bibr" rid="bib14">Chew et al., 2013</xref>). Nevertheless, in the same study dozens of lncRNAs were proposed to be <italic>bona fide</italic> protein-coding transcripts. In <italic>Arabidopsis</italic>, the translational efficiency values of highly expressed lncRNAs (>5 FPKM) were similar to those of coding RNAs and some lncRNAs had profiles consistent with initiation and termination of translation (<xref ref-type="bibr" rid="bib39">Juntawong et al., 2014</xref>). Finally, using yeast data, <xref ref-type="bibr" rid="bib93">Wilson and Masel (2011)</xref> found many cases of non-coding transcripts bound to ribosomes and suggested that this facilitates the evolution of novel protein-coding genes from non-coding sequences.</p><p>The disparity of results obtained in different systems motivated us to retrieve the original data and perform exactly the same analyses for six different species. 
As lncRNA catalogues are still very incomplete for most species, we also defined sets of novel lncRNAs using the RNA-seq sequencing reads for de novo transcript assembly. We discovered many novel, non-annotated, lncRNAs, especially in zebrafish, mouse, and fruit fly (<xref ref-type="table" rid="tbl2">Table 2</xref>). After the analysis of the ribosome profiling data, the same general picture emerged for the different biological systems, indicating that we are detecting very fundamental properties. In transcripts classified as lncRNAs, the ribosome profiling reads tend to cover a smaller fraction of the transcript than in typical codRNAs, in agreement with a shorter relative size of the ORF accumulating the largest number of ribosome profiling reads (primary ORF). We also find that the translational efficiency of regions corresponding to the primary ORF is much higher than that of 3′UTRs, both in codRNAs and lncRNAs, consistent with translation of the transcripts. Furthermore, the primary ORF of lncRNAs showed significantly higher coding score than the longest ORF extracted from randomly selected non-coding regions.</p><p>lncRNAs often contain several potentially translated ORFs (<xref ref-type="bibr" rid="bib37">Ingolia et al., 2011</xref>). Transcripts encoding multiple short proteins have been reported in insects (<xref ref-type="bibr" rid="bib75">Savard et al., 2006</xref>) and could be common in other species as well (<xref ref-type="bibr" rid="bib83">Tautz, 2009</xref>). One such candidate is AT1G34418.1 in <italic>Arabidopsis</italic>, an annotated lncRNA which contains a primary ORF followed by two instances of a 12 amino acid ORF also covered by ribosome profiling reads (<xref ref-type="fig" rid="fig6s6">Figure 6—figure supplement 6</xref>). 
This case is reminiscent of the gene <italic>pri</italic> in fruit fly, which regulates tarsal development (<xref ref-type="bibr" rid="bib27">Galindo et al., 2007</xref>) and translates several small redundant ORFs (<xref ref-type="bibr" rid="bib45">Kondo et al., 2007</xref>).</p><p>lncRNAs are poorly conserved across species and so, if translated, they will produce species- or lineage-specific proteins. Recently evolved proteins are markedly different from widely distributed ancient proteins; they are shorter, subject to weaker selective constraints and expressed at lower levels (<xref ref-type="bibr" rid="bib1">Albà and Castresana, 2005</xref>; <xref ref-type="bibr" rid="bib10">Cai et al., 2009</xref>; <xref ref-type="bibr" rid="bib51">Liu et al., 2010</xref>; <xref ref-type="bibr" rid="bib19">Donoghue et al., 2011</xref>; <xref ref-type="bibr" rid="bib12">Carvunis et al., 2012</xref>; <xref ref-type="bibr" rid="bib95">Xie et al., 2012</xref>; <xref ref-type="bibr" rid="bib94">Wissler et al., 2013</xref>; <xref ref-type="bibr" rid="bib63">Neme and Tautz, 2014</xref>). Here for the first time, we have compared the properties of the ORFs in lncRNAs associated with ribosomes with the properties of annotated, and in some cases experimentally validated, young protein-coding genes. lncRNAs and young protein-coding transcripts are virtually indistinguishable regarding their coding score and ORF selective constraints (<xref ref-type="fig" rid="fig6 fig7">Figures 6 and 7</xref>), which is consistent with the idea that many lncRNAs encode new peptides.</p><p>Although it is unclear how many of these peptides are functional, the data indicate that at least a fraction of them may be functional. Sequences that translate functional proteins are expected to display signs of selection related to preferential usage of certain amino acids and codons. 
This can be used to differentiate between coding and non-coding entities, especially in the absence of cross-species conservation, as is the case of many lncRNAs. About 35–40% of primary ORFs in human and mouse lncRNAs displayed coding scores that were significantly higher than those expected for non-coding sequences, making them excellent candidates for translating functional proteins. In fact, five human lncRNAs associated with ribosomes that exhibited high coding scores in our study were re-annotated as protein-coding transcripts in a subsequent Ensembl gene annotation release (version 75, <xref ref-type="supplementary-material" rid="SD2-data">Supplementary file 2C</xref>). Gene knock-out experiments in fly have discovered that young proteins, even if rapidly evolving, are often essential for the organism and can cause important defects when deleted (<xref ref-type="bibr" rid="bib13">Chen et al., 2010</xref>; <xref ref-type="bibr" rid="bib74">Reinhardt et al., 2013</xref>). Similarly, some peptides translated from lncRNAs may have important cellular functions yet to be discovered.</p><p>lncRNAs tend to be expressed at much lower levels than typical codRNAs, so, everything else being equal, the amount of translated peptide is also expected to be smaller. It may be that some of these peptides are not functional, but their translation does not produce a large enough deleterious effect for them to be eliminated via selection. Pseudogenes also showed extensive association with ribosomes in our study, indicating that the translation machinery is probably not very selective or that some pseudogenes produce functional proteins. This question may be worth revisiting, as a recent proteomics study has also found that dozens of human pseudogenes produce peptides (<xref ref-type="bibr" rid="bib44">Kim et al., 2014</xref>).</p><p>The data also indicate that a fraction of lncRNAs have not acquired the capacity to be translated. 
Depending on the experiment analyzed, a number of lncRNAs did not show any significant association with ribosomes. As previously discussed, this is probably affected by a lack of sensitivity; it is also true that the lncRNAs not associated with ribosomes tended to show lower coding scores than lncRNAs associated with ribosomes, even when we did not use the ribosome profiling data and simply compared the longest ORF in both kinds of transcripts.</p><p>Recently, it has been reported that human-specific protein-coding genes are often related to non-coding transcripts in macaque, pointing to a non-coding origin for many newly evolved proteins (<xref ref-type="bibr" rid="bib95">Xie et al., 2012</xref>). More generally, one may view de novo protein-coding gene evolution as a continuum from non-functional genomic sequences to fully-fledged protein-coding genes (<xref ref-type="bibr" rid="bib1">Albà and Castresana, 2005</xref>; <xref ref-type="bibr" rid="bib85">Toll-Riera et al., 2009</xref>; <xref ref-type="bibr" rid="bib12">Carvunis et al., 2012</xref>). Therefore, many lncRNAs could be in intermediate states in this process, their pervasive translation serving as the building material for the evolution of new proteins. It may be difficult to obtain functional proteins from completely random ORFs (<xref ref-type="bibr" rid="bib38">Jacob, 1977</xref>), but the effect of natural selection preventing the production of toxic peptides (<xref ref-type="bibr" rid="bib93">Wilson and Masel, 2011</xref>), and the high number of transcripts expressed in the genome, may facilitate this process.</p></sec><sec id="s4" sec-type="materials\|methods"><title>Materials and methods</title><sec id="s4-1"><title>Sequencing and mapping of reads</title><p>We downloaded the original data from Gene Expression Omnibus (GEO) for six different ribosome profiling experiments that had both ribosome footprinting and polyA+ RNA-seq sequencing reads: mouse (<italic>M. 
musculus</italic>) (<xref ref-type="bibr" rid="bib37">Ingolia et al., 2011</xref>), human (<italic>H. sapiens</italic>, HeLa cells) (<xref ref-type="bibr" rid="bib28">Guo et al., 2010</xref>), zebrafish (<italic>D. rerio</italic>) (<xref ref-type="bibr" rid="bib14">Chew et al., 2013</xref>), fruit fly (<italic>D. melanogaster</italic>) (<xref ref-type="bibr" rid="bib20">Dunn et al., 2013</xref>), <italic>Arabidopsis</italic> (<italic>A. thaliana</italic>) (<xref ref-type="bibr" rid="bib39">Juntawong et al., 2014</xref>), and yeast (<italic>S. cerevisiae</italic>) (<xref ref-type="bibr" rid="bib57">McManus et al., 2014</xref>). We retrieved genome sequences and gene annotations from Ensembl v.70 and Ensembl Plants v.21 (<xref ref-type="bibr" rid="bib25">Flicek et al., 2012</xref>).</p><p>Raw ribosome and RNA-seq sequencing reads underwent quality filtering using ConDeTri (v.2.2) (<xref ref-type="bibr" rid="bib80">Smeds and Künstner, 2011</xref>) with the following settings (-hq=30 -lq=10). Adaptors described in the original publications were trimmed from filtered reads if at least five nucleotides of the adaptor sequence matched the end of each read. In zebrafish, reads from different developmental stages were pooled to improve read coverage. In all experiments, reads below 25 nucleotides were not considered. Clean ribosome short reads were filtered by mapping them to the ribosomal RNA (rRNA) reference of the corresponding species using the Bowtie2 short-read alignment program (v. 2.1.0) (<xref ref-type="bibr" rid="bib48">Langmead et al., 2009</xref>). Unaligned reads were aligned to the reference genome with Bowtie2 allowing one mismatch in the first 'seed' region (the length of this region was selected according to the descriptions provided in each individual experiment). RNA-seq short reads were mapped with TopHat (v. 2.0.8) (<xref ref-type="bibr" rid="bib43">Kim et al., 2013</xref>) to the corresponding reference genome. 
We allowed two mismatches in the alignment with the exception of zebrafish, for which we allowed three mismatches since the reads were significantly longer. Multiple mapping was allowed unless specifically stated.</p></sec><sec id="s4-2"><title>Defining a set of expressed transcripts</title><p>Expressed transcripts were assembled using Cufflinks (v 2.2.0) (<xref ref-type="bibr" rid="bib86">Trapnell et al., 2010</xref>). We initially considered a transcript as expressed if it was covered by at least four reads and its abundance was higher than 1% of the most abundant isoform of the gene. We also discarded assembled transcripts in which >20% of reads were mapped to several locations in the genome. Gene annotation files from Ensembl (gtf format, v.70) were provided to Cufflinks to guide the reconstruction of already annotated transcripts. Annotated transcripts were divided into coding RNAs and long non-coding RNAs (lncRNAs); we only considered lncRNAs that were not part of genes with coding transcripts. Novel isoforms corresponding to annotated loci were not analyzed. Transcripts that did not match or overlap annotated genes were labeled ‘novel’ lncRNAs. We used a length threshold of 200 nucleotides to select novel long non-coding RNAs, as in ENCODE annotations (<xref ref-type="bibr" rid="bib18">Djebali et al., 2012</xref>).</p><p>Strand directionality of multiexonic transcripts was inferred using the splice site consensus sequence. We only considered monoexonic transcripts in the case of <italic>Arabidopsis</italic> and yeast, provided the transcripts were intergenic.</p><p>The inclusion of novel lncRNAs made it possible to perform analyses of species for which there are very few annotated lncRNAs. Annotations of UTR regions in yeast genes were missing from Ensembl because of the variability observed in transcription start sites (TSS). 
However, we downloaded a set of available 5′ and 3′UTRs obtained by deep transcriptomics (<xref ref-type="bibr" rid="bib59">Nagalakshmi et al., 2008</xref>) and added them to the existing yeast Ensembl annotations before assembling the transcriptome.</p><p>Coding transcripts were classified into different subclasses depending on the existing annotations: (a) Annotated protein-coding transcripts (codRNA), (b) Annotated transcripts with surveillance mechanisms (nonsense mediated decay, nonstop mediated decay, and no-go decay), (c) Annotated pseudogenes. We removed protein-coding transcripts in which annotated coding sequences (CDS) are still incomplete.</p><p>Subsequently, we defined an additional subset of annotated protein-coding transcripts with well-established coding properties based on the existence of an experimentally verified protein in Swiss-Prot for the gene (‘evidence at protein level’, downloaded 29 October 2013, <xref ref-type="bibr" rid="bib88">UniProt Consortium, 2014</xref>). These transcripts were labeled codRNAe. The rest of annotated protein-coding transcripts were abbreviated codRNAne. In zebrafish, most proteins are not yet experimentally validated; therefore, we generated a single group.</p><p>We built a data set of human lncRNAs with described non-coding functions using data obtained from several recent reviews (<xref ref-type="bibr" rid="bib70">Ponting et al., 2009</xref>; <xref ref-type="bibr" rid="bib87">Ulitsky and Bartel, 2013</xref>; <xref ref-type="bibr" rid="bib24">Fatica and Bozzoni, 2014</xref>). This data set included 29 different genes (<xref ref-type="supplementary-material" rid="SD2-data">Supplementary file 2A</xref>).</p><p>We used Cufflinks to estimate the expression level of a transcript in FPKM units (Fragments Per Kilobase per total Million mapped reads). 
We used a threshold of >0.5 FPKM except in yeast, in which the average read coverage per transcript was much higher than in the other species and the threshold was set at >5 FPKM. These thresholds guaranteed detection of ribosome association for the majority of expressed coding transcripts (>92%), while yielding proportions of transcripts comparable to those reported in the original papers.</p></sec><sec id="s4-3"><title>Definition of potential open reading frames (ORFs) and other transcript regions</title><p>We predicted all possible open reading frames (ORFs) in the expressed transcripts. We defined an ORF as any sequence starting with an AUG codon and finishing with a stop codon (UAA, UAG, or UGA), and at least 75 nucleotides long. This would correspond to a 24 amino acid protein, which is the size of the smallest complete human polypeptide found in genetic screen studies (<xref ref-type="bibr" rid="bib33">Hashimoto et al., 2001</xref>). This ORF definition will not detect non-canonical ORFs with different start or stop codons, although these ORFs often correspond to regulatory ORFs (uORFs) in the 5′UTR region. In monoexonic transcripts (<italic>Arabidopsis</italic> and yeast), we considered all six possible different frames.</p><p>We also defined each transcript 5′UTR as the region between the transcription start site and the AUG codon from the left-most predicted ORF, and the 3′UTR the region from the stop codon in the right-most predicted ORF to the transcript end. UTRs with lengths below 30 nucleotides were not analyzed since ribosome reads could not be properly aligned to these regions due to their small size. Regions between two consecutive putatively translated ORFs (with ribosome profiling reads) were termed interORF. We only analyzed this region when the length of the interORF sequence in a transcript was 30 nucleotides or longer.</p><p>We defined a set of <italic>bona fide</italic> non-coding sequences sampled from intronic fragments. 
We used the introns of the genes expressed in each experiment, provided they did not overlap any exons from other genes. We randomly selected fragments in such a way as to simulate the same size distribution as in the complete set of expressed transcripts. We performed 100 simulations of intron sampling to ensure the results were robust to the randomization process. We selected the longest ORF in each intronic fragment for the calculation of coding scores and GC content.</p></sec><sec id="s4-4"><title>Association with ribosomes and translational efficiency (TE)</title><p>We computed the number of reads overlapping each feature of interest (transcript, UTR, ORF, and interORF) using the BEDTools package (v. 2.16.2) (<xref ref-type="bibr" rid="bib72">Quinlan and Hall, 2010</xref>). We only considered ribosome reads in which more than half of their length spanned the considered region. This was considered appropriate because the ribosome P-site is usually detected at the central region of the read, with only slight variations depending on the experimental setting. We set a minimum ribosome profiling coverage of 75 nucleotides per transcript to define the transcript or transcript region (e.g., ORF) as associated with ribosomes. This is significantly longer than the length of the ribosome profiling sequencing reads (36–51 nucleotides) and is consistent with the minimum ORF length threshold.</p><p>The translational efficiency (TE) of a sequence has been previously defined as the density of ribosome profiling (RPF) reads normalized by transcript abundance (<xref ref-type="bibr" rid="bib36">Ingolia et al., 2009</xref>). We calculated it by dividing the FPKM of the ribosome profiling experiment by the FPKM of the RNA-seq experiment. 
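In symbols (notation introduced here only for illustration, not taken from the original analysis code), for a region <italic>r</italic>: <disp-formula><tex-math>\mathrm{TE}(r) = \frac{\mathrm{FPKM}_{\mathrm{RPF}}(r)}{\mathrm{FPKM}_{\mathrm{RNA}}(r)}</tex-math></disp-formula> so that a hypothetical region with 2 FPKM in the ribosome profiling data and 8 FPKM in the RNA-seq data would have a TE of 0.25. 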
In transcripts, we also obtained the maximum TE by dividing the sequence into 90-nucleotide windows and selecting the window with the highest TE value.</p><p>In order to have a null model of ribosome binding against which to compare the ribosome profiling signal in codRNA and lncRNA transcripts, we extracted annotated 3′ untranslated regions (3′UTRs) from codRNAs in genes in which UTRs did not overlap with coding sequences from other transcripts, and by randomly selecting 3′UTRs with a minimum length of 30 nucleotides, we built a set of 3′UTR sequences with the same size distribution as the complete transcripts. For each species, we calculated the TE values for codRNAs, lncRNAs, and 3′UTR sequences. We used the empirical distribution of TE values in the 3′UTRs to calculate the number of codRNAs and lncRNAs that showed significantly higher TE value than expected under the null model at a p < 0.05. These corresponded to TE values higher than 0.1043 in mouse, 0.2556 in human, 0.0004 in zebrafish, 0.7164 in fruit fly, 0.1800 in <italic>Arabidopsis</italic>, and 0.0527 in yeast.</p><p>We defined the primary ORF in a transcript as the ORF with the largest number of RPF reads with respect to the total RPF reads covering the transcript. The rest of ORFs ≥24 amino acids associated with ribosomes were considered as well; when two or more ORFs overlapped, we selected the longest one. In ORFs, interORFs, and UTRs, we computed the TE along the whole region. For comparing the TE in different regions, we only considered transcripts in which all regions had >0.2 FPKM.</p></sec><sec id="s4-5"><title>Peptide evidence in existing proteomics databases</title><p>We downloaded all peptide sequences from the PeptideAtlas database: 338,013 human peptides (August 2013), 101,695 mouse peptides (June 2013), and 86,836 yeast peptides (March 2013). 
We investigated whether the number of ribosome-associated protein-coding transcripts that matched the peptides in these databases varied with protein length. We omitted this analysis for zebrafish and <italic>Arabidopsis</italic> due to the lack of sufficiently large peptide databases. The matches were identified using BLASTP searches (v. 2.2.28+) (<xref ref-type="bibr" rid="bib2">Altschul et al., 1997</xref>). We selected perfect matches only.</p></sec><sec id="s4-6"><title>Evidence of nonsense-mediated decay in ORFs</title><p>We investigated how many primary ORFs may be candidates for regulation via nonsense-mediated decay (NMD) surveillance pathways, whose main function is to eliminate transcripts containing premature stop codons. We defined NMD candidates as all cases in which the stop codon from a predicted ORF was located ≥55 nucleotides upstream of a splice junction site, provided the stop codon was not in the terminal exon (<xref ref-type="bibr" rid="bib76">Scofield et al., 2007</xref>). This mechanism is well characterized in protein-coding genes and it has been proposed as a way to degrade non-functional peptides translated in lncRNAs (<xref ref-type="bibr" rid="bib82">Tani et al., 2013</xref>). Other surveillance mechanisms, such as non-stop-mediated decay or no-go decay, were not considered since all predicted ORFs finished at a stop codon, and we did not analyze RNA secondary structures.</p></sec><sec id="s4-7"><title>Defining ages of protein-coding transcripts</title><p>We used existing gene age classifications in human, mouse, and zebrafish (<xref ref-type="bibr" rid="bib62">Neme and Tautz, 2013</xref>) to identify young gene classes: human primate-specific (∼55.8 My), mouse rodent-specific (∼61.7 My), human and mouse mammalian-specific (∼225 My), zebrafish actinopterygii-specific (∼420 My) (abbreviated fish) and metazoan (∼800 My). In yeast, we used predefined genes specific to <italic>S. cerevisiae</italic> (1–3 My) (abbreviated <italic>S. 
cerevisiae</italic>) and the <italic>Saccharomyces</italic> group (∼100 My) (<xref ref-type="bibr" rid="bib21">Ekman et al., 2007</xref>). In <italic>Arabidopsis</italic>, we retrieved <italic>Cruciferae</italic>(<italic>Brassicaceae</italic>)-specific genes (20–40 My) (<xref ref-type="bibr" rid="bib19">Donoghue et al., 2011</xref>). These genes are believed to have arisen primarily by de novo mechanisms, as no homologies in other species have been detected despite the fact that many closely related genomes have now been sequenced.</p></sec><sec id="s4-8"><title>Defining gene desert sequences</title><p>In humans, we obtained a set of gene desert sequences as defined in <xref ref-type="bibr" rid="bib65">Ovcharenko et al. (2005)</xref>. We selected two stable and two flexible gene deserts (the definition depends on the degree of conservation in other species). They belonged to chromosome 4 (flexible located in coordinates 136,000,001–138,000,000; stable located in coordinates 180,000,001–182,000,010) that has a high number of gene deserts; and chromosome 17 (flexible located in coordinates 51,100,001–51,900,000; stable located in coordinates 69,300,001–70,000,000) that has a high gene density. We ensured that no protein-coding genes were annotated in subsequent Ensembl versions in these regions. We predicted all possible ORFs in these regions and evaluated their coding score and GC content.</p></sec><sec id="s4-9"><title>ORF coding score</title><p>The examination of nucleotide hexamer frequencies has been shown to be a powerful way to distinguish between coding and non-coding sequences (<xref ref-type="bibr" rid="bib81">Sun et al., 2013</xref>; <xref ref-type="bibr" rid="bib92">Wang et al., 2013</xref>). 
We computed one coding score (CS) per hexamer:<disp-formula id="equ1"><mml:math id="m1"><mml:mrow><mml:msub><mml:mrow><mml:mi>C</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>log</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mfrac bevelled="true"><mml:mrow><mml:msub><mml:mrow><mml:mi>f</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>q</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>f</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>q</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>−</mml:mo><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:mrow></mml:math></disp-formula></p><p>The coding hexamer frequencies were obtained from the open reading frame of all transcripts in a species encoding experimentally validated proteins (except for zebrafish in which all protein-coding transcripts were 
considered). The non-coding hexamer frequencies were calculated using the longest ORF in intronic regions, which were selected randomly from expressed protein-coding genes. Next, we used the following statistic to measure the coding score of an ORF:<disp-formula id="equ2"><mml:math id="m2"><mml:mrow><mml:msub><mml:mrow><mml:mi>C</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>O</mml:mi><mml:mi>R</mml:mi><mml:mi>F</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mstyle displaystyle="true"><mml:msubsup><mml:mo>∑</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:msub><mml:mrow><mml:mi>C</mml:mi><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:mstyle></mml:mrow><mml:mi>n</mml:mi></mml:mfrac><mml:mo>,</mml:mo></mml:mrow></mml:math></disp-formula>where <italic>i</italic> indexes the hexamers in the ORF and <italic>n</italic> is the number of hexamers considered.</p><p>The hexamers were calculated in steps of three nucleotides in frame (dicodons). We did not consider the initial hexamers, which contain a methionine codon, or the last hexamers, which contain a stop codon, since they are not informative. Given that all ORFs were at least 75 nucleotides long, the minimum value for <italic>n</italic> was 22.</p><p>We calculated other related statistics in a similar way. These included using an equiprobable hexamer distribution instead of the distribution obtained from non-coding sequences, or using codon frequencies instead of hexamer frequencies. These statistics showed somewhat lower power to distinguish between coding and non-coding sequences. 
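</p><p>The two formulas above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code used in the study: all function names and the toy ORF are hypothetical, the natural logarithm is assumed because the text does not state the base, and hexamers missing from either frequency table are skipped because the text does not say how unseen hexamers were handled.</p><code language="python">
```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def in_frame_hexamers(orf: str) -> List[str]:
    """Hexamers (dicodons) taken in steps of three nucleotides, in frame.

    The first and last hexamers are dropped because they contain the start
    and stop codons. Assumes the ORF runs from start codon to stop codon.
    """
    hexamers = [orf[i:i + 6] for i in range(0, len(orf) - 5, 3)]
    return hexamers[1:-1]


def hexamer_frequencies(orfs: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each in-frame hexamer over a set of ORFs."""
    counts = Counter(h for orf in orfs for h in in_frame_hexamers(orf))
    total = sum(counts.values())
    return {h: c / total for h, c in counts.items()}


def orf_coding_score(
    orf: str,
    coding_freq: Dict[str, float],
    noncoding_freq: Dict[str, float],
) -> float:
    """Mean of log(freq_coding / freq_non-coding) over the ORF hexamers.

    Natural log and the skipping of unseen hexamers are assumptions of
    this sketch (see the lead-in paragraph).
    """
    scores = [
        math.log(coding_freq[h] / noncoding_freq[h])
        for h in in_frame_hexamers(orf)
        if h in coding_freq and h in noncoding_freq
    ]
    if not scores:
        raise ValueError("no scorable hexamers in ORF")
    return sum(scores) / len(scores)


# Hypothetical 75-nucleotide ORF: start codon, 23 GCC codons, stop codon.
toy_orf = "ATG" + "GCC" * 23 + "TAA"
print(len(in_frame_hexamers(toy_orf)))  # 22, the minimum n given in the text

# Toy frequency tables; the score is close to log(2) for this input.
print(orf_coding_score(toy_orf, {"GCCGCC": 0.02}, {"GCCGCC": 0.01}))
```
</code><p>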
As a complementary measure, we quantified the GC content in different coding and non-coding transcripts and ORFs.</p></sec><sec id="s4-10"><title>Sequence similarity searches</title><p>We employed BLASTP with an E-value cutoff of 10<sup>−4</sup> to compare the amino acid sequences encoded by ORFs in different kinds of transcripts. We enabled SEG to mask low complexity regions in protein sequences before doing the homology searches. We also searched for homologues in the NCBI non-redundant (nr) protein database (<xref ref-type="bibr" rid="bib71">Pruitt et al., 2014</xref>). BLAST sequence similarity search programs are based on gapped local alignments (<xref ref-type="bibr" rid="bib2">Altschul et al., 1997</xref>).</p></sec><sec id="s4-11"><title>Analysis of single nucleotide polymorphisms</title><p>We downloaded all available single-nucleotide polymorphisms (SNPs) from dbSNP (<xref ref-type="bibr" rid="bib77">Sherry et al., 2001</xref>) for human (∼50 million), mouse (∼64.2 million), and zebrafish (∼1.3 million). We did not consider other species due to insufficient data for the analysis. We classified SNPs in ORFs as non-synonymous (PN, amino acid altering) and synonymous (PS, not amino acid altering). We computed the PN/PS ratio in each sequence data set by using the sum of PN and PS in all sequences. The estimation of PN/PS ratios of individual sequences was in general not reliable due to lack of sufficient SNP data. 
We obtained confidence intervals using the proportion test in R (see below).</p></sec><sec id="s4-12"><title>Statistical data analyses</title><p>The analysis of the data, including generation of plots and statistical tests, was done with R (<xref ref-type="bibr" rid="bib73">R Development Core Team, 2010</xref>).</p></sec><sec id="s4-13"><title>Additional files</title><p><xref ref-type="supplementary-material" rid="SD1-data">Supplementary file 1</xref> contains additional tables and <xref ref-type="supplementary-material" rid="SD2-data">Supplementary file 2</xref> contains data subsets. The genomic coordinates of all transcripts used in this study (GTF files) and the amino acid sequences corresponding to primary ORFs in lncRNAs with coding scores significant at p < 0.05 (FASTA files) are available at figshare (<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.6084/m9.figshare.1114969">http://dx.doi.org/10.6084/m9.figshare.1114969</ext-link>).</p></sec></sec></body><back><ack id="ack"><title>Acknowledgements</title><p>We acknowledge José Luis Villanueva-Cañas and Will Blevins for critical revision of the manuscript. We are grateful to Ivan Ovcharenko for advice on gene deserts. 
This work was funded by Ministerio de Economía y Competitividad (BFU2012-36820 and TIN2013-45732-C4-3-P) and Fundació ICREA (MMA).</p></ack><sec sec-type="additional-information"><title>Additional information</title><fn-group content-type="competing-interest"><title>Competing interests</title><fn fn-type="conflict" id="conf1"><p>The authors declare that no competing interests exist.</p></fn></fn-group><fn-group content-type="author-contribution"><title>Author contributions</title><fn fn-type="con" id="con1"><p>JR-O, Conception and design, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article</p></fn><fn fn-type="con" id="con2"><p>XM, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article</p></fn><fn fn-type="con" id="con3"><p>JAS, Acquisition of data, Analysis and interpretation of data, Drafting or revising the article</p></fn><fn fn-type="con" id="con4"><p>MMA, Conception and design, Analysis and interpretation of data, Drafting or revising the article</p></fn></fn-group></sec><sec sec-type="supplementary-material"><title>Additional files</title><supplementary-material id="SD1-data"><object-id pub-id-type="doi">10.7554/eLife.03523.024</object-id><label>Supplementary file 1.</label><caption><title>Long non-coding RNAs as a source of new peptides. (<bold>A</bold>) Details on the number of coding transcripts associated with ribosomes. (<bold>B</bold>) ORF density and length in different types of transcripts. (<bold>C</bold>) Details on the number of non-coding transcripts associated with ribosomes. (<bold>D</bold>) Homology hits for ORFs. (<bold>E</bold>) GC content (%) in ORFs and complete sequences. 
(<bold>F</bold>) PN and PS values for different sequence subsets.</title><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.024">http://dx.doi.org/10.7554/eLife.03523.024</ext-link></p></caption><media mime-subtype="docx" mimetype="application" xlink:href="elife-03523-supp1-v1.docx"/></supplementary-material><supplementary-material id="SD2-data"><object-id pub-id-type="doi">10.7554/eLife.03523.025</object-id><label>Supplementary file 2.</label><caption><title>(<bold>A</bold>) Human ncRNA literature. (<bold>B</bold>) IncRNA homologies. (<bold>C</bold>) IncRNA top coding score. (<bold>D</bold>) Young codRNAe.</title><p><bold>DOI:</bold> <ext-link ext-link-type="doi" xlink:href="10.7554/eLife.03523.025">http://dx.doi.org/10.7554/eLife.03523.025</ext-link></p></caption><media mime-subtype="xls" mimetype="application" xlink:href="elife-03523-supp2-v1.xls"/></supplementary-material><sec sec-type="datasets"><title>Major datasets</title><p>The following previously published datasets were used:</p><p><related-object content-type="existing-dataset" id="dataro1" source-id="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30839" source-id-type="uri"><collab collab-type="author">Ingolia NT</collab>, <collab collab-type="author">Lareau LF</collab>, <collab collab-type="author">Weissman JS</collab>, <year>2011</year><x>, </x><source>Ribosome Profiling of Mouse Embryonic Stem Cells Reveals the Complexity of Mammalian Proteomes</source><x>, </x><ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30839">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE30839</ext-link><x>, </x><comment>Publicly available at NCBI Gene Expression Omnibus.</comment></related-object></p><p><related-object content-type="existing-dataset" id="dataro2" source-id="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22004" source-id-type="uri"><collab collab-type="author">Guo H</collab>, <collab collab-type="author">Ingolia 
NT</collab>, <collab collab-type="author">Weissman JS</collab>, <collab collab-type="author">Bartel DP</collab>, <year>2010</year><x>, </x><source>Mammalian microRNAs predominantly act to decrease target mRNA levels</source><x>, </x><ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22004">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE22004</ext-link><x>, </x><comment>Publicly available at NCBI Gene Expression Omnibus.</comment></related-object></p><p><related-object content-type="existing-dataset" id="dataro3" source-id="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE32900" source-id-type="uri"><collab collab-type="author">Pauli A</collab>, <collab collab-type="author">Valen E</collab>, <collab collab-type="author">Lin MF</collab>, <collab collab-type="author">Garber M</collab>, <collab collab-type="author">Vastenhouw NL</collab>, <collab collab-type="author">Levin JZ</collab>, <collab collab-type="author">Sandelin A</collab>, <collab collab-type="author">Rinn JL</collab>, <collab collab-type="author">Regev A</collab>, <collab collab-type="author">Schier AF</collab>, <year>2011</year><x>, </x><source>Comprehensive identification of long non-coding RNAs expressed during zebrafish embryogenesis</source><x>, </x><ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE32900">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE32900</ext-link><x>, </x><comment>Publicly available at NCBI Gene Expression Omnibus.</comment></related-object></p><p><related-object content-type="existing-dataset" id="dataro4" source-id="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE46512" source-id-type="uri"><collab collab-type="author">Chew G</collab>, <collab collab-type="author">Pauli A</collab>, <collab collab-type="author">Valen E</collab>, <collab collab-type="author">Schier A</collab>, <year>2013</year><x>, </x><source>Ribosome Profiling over a Zebrafish Developmental 
Timecourse</source><x>, </x><ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE46512">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE46512</ext-link><x>, </x><comment>Publicly available at NCBI Gene Expression Omnibus.</comment></related-object></p><p><related-object content-type="existing-dataset" id="dataro5" source-id="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE49197" source-id-type="uri"><collab collab-type="author">Dunn JG</collab>, <collab collab-type="author">Weissman JS</collab>, <year>2013</year><x>, </x><source>Ribosome profiling reveals pervasive and regulated stop codon readthrough in Drosophila melanogaster</source><x>, </x><ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE49197">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE49197</ext-link><x>, </x><comment>Publicly available at NCBI Gene Expression Omnibus.</comment></related-object></p><p><related-object content-type="existing-dataset" id="dataro6" source-id="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE50597" source-id-type="uri"><collab collab-type="author">Juntawong P</collab>, <collab collab-type="author">Girke T</collab>, <collab collab-type="author">Bailey-Serres J</collab>, <year>2013</year><x>, </x><source>High-resolution mapping of ribosome footprints from Arabidopsis thaliana</source><x>, </x><ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE50597">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE50597</ext-link><x>, </x><comment>Publicly available at NCBI Gene Expression Omnibus.</comment></related-object></p><p><related-object content-type="existing-dataset" id="dataro7" source-id="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE52119" source-id-type="uri"><collab collab-type="author">McManus CJ</collab>, <collab collab-type="author">May GE</collab>, <collab collab-type="author">Spealman P</collab>, <collab 
collab-type="author">Shteyman A</collab>, <year>2014</year><x>, </x><source>Ribosome profiling reveals post-transcriptional buffering of divergent gene expression in yeast</source><x>, </x><ext-link ext-link-type="uri" xlink:href="http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE52119">http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE52119</ext-link><x>, </x><comment>Publicly available at NCBI Gene Expression Omnibus.</comment></related-object></p></sec></sec><ref-list><title>References</title><ref id="bib1"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Albà</surname><given-names>MM</given-names></name><name><surname>Castresana</surname><given-names>J</given-names></name></person-group><year>2005</year><article-title>Inverse relationship between evolutionary rate and age of mammalian genes</article-title><source>Molecular Biology and Evolution</source><volume>22</volume><fpage>598</fpage><lpage>606</lpage><pub-id pub-id-type="doi">10.1093/molbev/msi045</pub-id></element-citation></ref><ref id="bib2"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Altschul</surname><given-names>SF</given-names></name><name><surname>Madden</surname><given-names>TL</given-names></name><name><surname>Schäffer</surname><given-names>AA</given-names></name><name><surname>Zhang</surname><given-names>J</given-names></name><name><surname>Zhang</surname><given-names>Z</given-names></name><name><surname>Miller</surname><given-names>W</given-names></name><name><surname>Lipman</surname><given-names>DJ</given-names></name></person-group><year>1997</year><article-title>Gapped BLAST and PSI-BLAST: a new generation of protein database search programs</article-title><source>Nucleic Acids Research</source><volume>25</volume><fpage>3389</fpage><lpage>3402</lpage><pub-id pub-id-type="doi">10.1093/nar/25.17.3389</pub-id></element-citation></ref><ref id="bib3"><element-citation 
publication-type="journal"><person-group person-group-type="author"><name><surname>Andrews</surname><given-names>SJ</given-names></name><name><surname>Rothnagel</surname><given-names>JA</given-names></name></person-group><year>2014</year><article-title>Emerging evidence for functional peptides encoded by short open reading frames</article-title><source>Nature Reviews Genetics</source><volume>15</volume><fpage>193</fpage><lpage>204</lpage><pub-id pub-id-type="doi">10.1038/nrg3520</pub-id></element-citation></ref><ref id="bib4"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Artieri</surname><given-names>CG</given-names></name><name><surname>Fraser</surname><given-names>HB</given-names></name></person-group><year>2014</year><article-title>Evolution at two levels of gene expression in yeast</article-title><source>Genome Research</source><volume>24</volume><fpage>411</fpage><lpage>421</lpage><pub-id pub-id-type="doi">10.1101/gr.165522.113</pub-id></element-citation></ref><ref id="bib4a"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Aspden</surname><given-names>JL</given-names></name><name><surname>Eyre-Walker</surname><given-names>YC</given-names></name><name><surname>Philips</surname><given-names>RJ</given-names></name><name><surname>Amin</surname><given-names>U</given-names></name><name><surname>Mumtaz</surname><given-names>MA</given-names></name><name><surname>Brocard</surname><given-names>M</given-names></name><name><surname>Couso</surname><given-names>JP</given-names></name></person-group><year>2014</year><article-title>Extensive translation of small ORFs revealed by Poly-Ribo-Seq</article-title><source>eLife</source><fpage>e03528</fpage><pub-id pub-id-type="doi">10.7554/eLife.03528</pub-id></element-citation></ref><ref id="bib5"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Bazzini</surname><given-names>AA</given-names></name><name><surname>Johnstone</surname><given-names>TG</given-names></name><name><surname>Christiano</surname><given-names>R</given-names></name><name><surname>Mackowiak</surname><given-names>SD</given-names></name><name><surname>Obermayer</surname><given-names>B</given-names></name><name><surname>Fleming</surname><given-names>ES</given-names></name><name><surname>Vejnar</surname><given-names>CE</given-names></name><name><surname>Lee</surname><given-names>MT</given-names></name><name><surname>Rajewsky</surname><given-names>N</given-names></name><name><surname>Walther</surname><given-names>TC</given-names></name><name><surname>Giraldez</surname><given-names>AJ</given-names></name></person-group><year>2014</year><article-title>Identification of small ORFs in vertebrates using ribosome footprinting and evolutionary conservation</article-title><source>The EMBO Journal</source><volume>33</volume><fpage>981</fpage><lpage>993</lpage><pub-id pub-id-type="doi">10.1002/embj.201488411</pub-id></element-citation></ref><ref id="bib6"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Brar</surname><given-names>GA</given-names></name><name><surname>Yassour</surname><given-names>M</given-names></name><name><surname>Friedman</surname><given-names>N</given-names></name><name><surname>Regev</surname><given-names>A</given-names></name><name><surname>Ingolia</surname><given-names>NT</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name></person-group><year>2012</year><article-title>High-resolution view of the yeast meiotic program revealed by ribosome profiling</article-title><source>Science</source><volume>335</volume><fpage>552</fpage><lpage>557</lpage><pub-id pub-id-type="doi">10.1126/science.1215110</pub-id></element-citation></ref><ref id="bib7"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Brockdorff</surname><given-names>N</given-names></name><name><surname>Ashworth</surname><given-names>A</given-names></name><name><surname>Kay</surname><given-names>GF</given-names></name><name><surname>McCabe</surname><given-names>VM</given-names></name><name><surname>Norris</surname><given-names>DP</given-names></name><name><surname>Cooper</surname><given-names>PJ</given-names></name><name><surname>Swift</surname><given-names>S</given-names></name><name><surname>Rastan</surname><given-names>S</given-names></name></person-group><year>1992</year><article-title>The product of the mouse Xist gene Is a 15 Kb inactive X-specific transcript containing no conserved ORF and located in the nucleus</article-title><source>Cell</source><volume>71</volume><fpage>515</fpage><lpage>526</lpage><pub-id pub-id-type="doi">10.1016/0092-8674(92)90519-I</pub-id></element-citation></ref><ref id="bib8"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cabili</surname><given-names>MN</given-names></name><name><surname>Trapnell</surname><given-names>C</given-names></name><name><surname>Goff</surname><given-names>L</given-names></name><name><surname>Koziol</surname><given-names>M</given-names></name><name><surname>Tazon-Vega</surname><given-names>B</given-names></name><name><surname>Regev</surname><given-names>A</given-names></name><name><surname>Rinn</surname><given-names>JL</given-names></name></person-group><year>2011</year><article-title>Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses</article-title><source>Genes & Development</source><volume>25</volume><fpage>1915</fpage><lpage>1927</lpage><pub-id pub-id-type="doi">10.1101/gad.17446611</pub-id></element-citation></ref><ref id="bib9"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Cai</surname><given-names>J</given-names></name><name><surname>Zhao</surname><given-names>R</given-names></name><name><surname>Jiang</surname><given-names>H</given-names></name><name><surname>Wang</surname><given-names>W</given-names></name></person-group><year>2008</year><article-title>De novo origination of a new protein-coding gene in <italic>Saccharomyces cerevisiae</italic></article-title><source>Genetics</source><volume>179</volume><fpage>487</fpage><lpage>496</lpage><pub-id pub-id-type="doi">10.1534/genetics.107.084491</pub-id></element-citation></ref><ref id="bib10"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Cai</surname><given-names>JJ</given-names></name><name><surname>Borenstein</surname><given-names>E</given-names></name><name><surname>Chen</surname><given-names>R</given-names></name><name><surname>Petrov</surname><given-names>DA</given-names></name></person-group><year>2009</year><article-title>Similarly strong purifying selection acts on human disease genes of all evolutionary ages</article-title><source>Genome Biology and Evolution</source><volume>1</volume><fpage>131</fpage><lpage>144</lpage><pub-id pub-id-type="doi">10.1093/gbe/evp013</pub-id></element-citation></ref><ref id="bib11"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Carninci</surname><given-names>P</given-names></name><name><surname>Kasukawa</surname><given-names>T</given-names></name><name><surname>Katayama</surname><given-names>S</given-names></name><name><surname>Gough</surname><given-names>J</given-names></name><name><surname>Frith</surname><given-names>MC</given-names></name><name><surname>Maeda</surname><given-names>N</given-names></name><name><surname>Oyama</surname><given-names>R</given-names></name><name><surname>Ravasi</surname><given-names>T</given-names></name><name><surname>Lenhard</surname><given-names>B</given-names></name><name><surname>Wells</surname><given-names>C</given-names></name><name><surname>Kodzius</surname><given-names>R</given-names></name><name><surname>Shimokawa</surname><given-names>K</given-names></name><name><surname>Bajic</surname><given-names>VB</given-names></name><name><surname>Brenner</surname><given-names>SE</given-names></name><name><surname>Batalov</surname><given-names>S</given-names></name><name><surname>Forrest</surname><given-names>AR</given-names></name><name><surname>Zavolan</surname><given-names>M</given-names></name><name><surname>Davis</surname><given-names>MJ</given-names></name><name><surname>Wilming</surname><given-names>LG</given-names></name><name><surname>Aidinis</surname><given-names>V</given-names></name><name><surname>Allen</surname><given-names>JE</given-names></name><name><surname>Ambesi-Impiombato</surname><given-names>A</given-names></name><name><surname>Apweiler</surname><given-names>R</given-names></name><name><surname>Aturaliya</surname><given-names>RN</given-names></name><name><surname>Bailey</surname><given-names>TL</given-names></name><name><surname>Bansal</surname><given-names>M</given-names></name><name><surname>Baxter</surname><given-names>L</given-names></name><name><surname>Beisel</surname><given-names>KW</given-names></name><name><surname>Bersano</surname><given-names>T</given-names></name><name><surname>Bono</su
rname><given-names>H</given-names></name><name><surname>Chalk</surname><given-names>AM</given-names></name><name><surname>Chiu</surname><given-names>KP</given-names></name><name><surname>Choudhary</surname><given-names>V</given-names></name><name><surname>Christoffels</surname><given-names>A</given-names></name><name><surname>Clutterbuck</surname><given-names>DR</given-names></name><name><surname>Crowe</surname><given-names>ML</given-names></name><name><surname>Dalla</surname><given-names>E</given-names></name><name><surname>Dalrymple</surname><given-names>BP</given-names></name><name><surname>de Bono</surname><given-names>B</given-names></name><name><surname>Della Gatta</surname><given-names>G</given-names></name><name><surname>di Bernardo</surname><given-names>D</given-names></name><name><surname>Down</surname><given-names>T</given-names></name><name><surname>Engstrom</surname><given-names>P</given-names></name><name><surname>Fagiolini</surname><given-names>M</given-names></name><name><surname>Faulkner</surname><given-names>G</given-names></name><name><surname>Fletcher</surname><given-names>CF</given-names></name><name><surname>Fukushima</surname><given-names>T</given-names></name><name><surname>Furuno</surname><given-names>M</given-names></name><name><surname>Futaki</surname><given-names>S</given-names></name><name><surname>Gariboldi</surname><given-names>M</given-names></name><name><surname>Georgii-Hemming</surname><given-names>P</given-names></name><name><surname>Gingeras</surname><given-names>TR</given-names></name><name><surname>Gojobori</surname><given-names>T</given-names></name><name><surname>Green</surname><given-names>RE</given-names></name><name><surname>Gustincich</surname><given-names>S</given-names></name><name><surname>Harbers</surname><given-names>M</given-names></name><name><surname>Hayashi</surname><given-names>Y</given-names></name><name><surname>Hensch</surname><given-names>TK</given-names></name><name><surname>Hirokawa</surname><given-names>N<
/given-names></name><name><surname>Hill</surname><given-names>D</given-names></name><name><surname>Huminiecki</surname><given-names>L</given-names></name><name><surname>Iacono</surname><given-names>M</given-names></name><name><surname>Ikeo</surname><given-names>K</given-names></name><name><surname>Iwama</surname><given-names>A</given-names></name><name><surname>Ishikawa</surname><given-names>T</given-names></name><name><surname>Jakt</surname><given-names>M</given-names></name><name><surname>Kanapin</surname><given-names>A</given-names></name><name><surname>Katoh</surname><given-names>M</given-names></name><name><surname>Kawasawa</surname><given-names>Y</given-names></name><name><surname>Kelso</surname><given-names>J</given-names></name><name><surname>Kitamura</surname><given-names>H</given-names></name><name><surname>Kitano</surname><given-names>H</given-names></name><name><surname>Kollias</surname><given-names>G</given-names></name><name><surname>Krishnan</surname><given-names>SP</given-names></name><name><surname>Kruger</surname><given-names>A</given-names></name><name><surname>Kummerfeld</surname><given-names>SK</given-names></name><name><surname>Kurochkin</surname><given-names>IV</given-names></name><name><surname>Lareau</surname><given-names>LF</given-names></name><name><surname>Lazarevic</surname><given-names>D</given-names></name><name><surname>Lipovich</surname><given-names>L</given-names></name><name><surname>Liu</surname><given-names>J</given-names></name><name><surname>Liuni</surname><given-names>S</given-names></name><name><surname>McWilliam</surname><given-names>S</given-names></name><name><surname>Madan 
Babu</surname><given-names>M</given-names></name><name><surname>Madera</surname><given-names>M</given-names></name><name><surname>Marchionni</surname><given-names>L</given-names></name><name><surname>Matsuda</surname><given-names>H</given-names></name><name><surname>Matsuzawa</surname><given-names>S</given-names></name><name><surname>Miki</surname><given-names>H</given-names></name><name><surname>Mignone</surname><given-names>F</given-names></name><name><surname>Miyake</surname><given-names>S</given-names></name><name><surname>Morris</surname><given-names>K</given-names></name><name><surname>Mottagui-Tabar</surname><given-names>S</given-names></name><name><surname>Mulder</surname><given-names>N</given-names></name><name><surname>Nakano</surname><given-names>N</given-names></name><name><surname>Nakauchi</surname><given-names>H</given-names></name><name><surname>Ng</surname><given-names>P</given-names></name><name><surname>Nilsson</surname><given-names>R</given-names></name><name><surname>Nishiguchi</surname><given-names>S</given-names></name><name><surname>Nishikawa</surname><given-names>S</given-names></name><name><surname>Nori</surname><given-names>F</given-names></name><name><surname>Ohara</surname><given-names>O</given-names></name><name><surname>Okazaki</surname><given-names>Y</given-names></name><name><surname>Orlando</surname><given-names>V</given-names></name><name><surname>Pang</surname><given-names>KC</given-names></name><name><surname>Pavan</surname><given-names>WJ</given-names></name><name><surname>Pavesi</surname><given-names>G</given-names></name><name><surname>Pesole</surname><given-names>G</given-names></name><name><surname>Petrovsky</surname><given-names>N</given-names></name><name><surname>Piazza</surname><given-names>S</given-names></name><name><surname>Reed</surname><given-names>J</given-names></name><name><surname>Reid</surname><given-names>JF</given-names></name><name><surname>Ring</surname><given-names>BZ</given-names></name><name><surname>Ring
wald</surname><given-names>M</given-names></name><name><surname>Rost</surname><given-names>B</given-names></name><name><surname>Ruan</surname><given-names>Y</given-names></name><name><surname>Salzberg</surname><given-names>SL</given-names></name><name><surname>Sandelin</surname><given-names>A</given-names></name><name><surname>Schneider</surname><given-names>C</given-names></name><name><surname>Schönbach</surname><given-names>C</given-names></name><name><surname>Sekiguchi</surname><given-names>K</given-names></name><name><surname>Semple</surname><given-names>CA</given-names></name><name><surname>Seno</surname><given-names>S</given-names></name><name><surname>Sessa</surname><given-names>L</given-names></name><name><surname>Sheng</surname><given-names>Y</given-names></name><name><surname>Shibata</surname><given-names>Y</given-names></name><name><surname>Shimada</surname><given-names>H</given-names></name><name><surname>Shimada</surname><given-names>K</given-names></name><name><surname>Silva</surname><given-names>D</given-names></name><name><surname>Sinclair</surname><given-names>B</given-names></name><name><surname>Sperling</surname><given-names>S</given-names></name><name><surname>Stupka</surname><given-names>E</given-names></name><name><surname>Sugiura</surname><given-names>K</given-names></name><name><surname>Sultana</surname><given-names>R</given-names></name><name><surname>Takenaka</surname><given-names>Y</given-names></name><name><surname>Taki</surname><given-names>K</given-names></name><name><surname>Tammoja</surname><given-names>K</given-names></name><name><surname>Tan</surname><given-names>SL</given-names></name><name><surname>Tang</surname><given-names>S</given-names></name><name><surname>Taylor</surname><given-names>MS</given-names></name><name><surname>Tegner</surname><given-names>J</given-names></name><name><surname>Teichmann</surname><given-names>SA</given-names></name><name><surname>Ueda</surname><given-names>HR</given-names></name><name><surname>van 
Nimwegen</surname><given-names>E</given-names></name><name><surname>Verardo</surname><given-names>R</given-names></name><name><surname>Wei</surname><given-names>CL</given-names></name><name><surname>Yagi</surname><given-names>K</given-names></name><name><surname>Yamanishi</surname><given-names>H</given-names></name><name><surname>Zabarovsky</surname><given-names>E</given-names></name><name><surname>Zhu</surname><given-names>S</given-names></name><name><surname>Zimmer</surname><given-names>A</given-names></name><name><surname>Hide</surname><given-names>W</given-names></name><name><surname>Bult</surname><given-names>C</given-names></name><name><surname>Grimmond</surname><given-names>SM</given-names></name><name><surname>Teasdale</surname><given-names>RD</given-names></name><name><surname>Liu</surname><given-names>ET</given-names></name><name><surname>Brusic</surname><given-names>V</given-names></name><name><surname>Quackenbush</surname><given-names>J</given-names></name><name><surname>Wahlestedt</surname><given-names>C</given-names></name><name><surname>Mattick</surname><given-names>JS</given-names></name><name><surname>Hume</surname><given-names>DA</given-names></name><name><surname>Kai</surname><given-names>C</given-names></name><name><surname>Sasaki</surname><given-names>D</given-names></name><name><surname>Tomaru</surname><given-names>Y</given-names></name><name><surname>Fukuda</surname><given-names>S</given-names></name><name><surname>Kanamori-Katayama</surname><given-names>M</given-names></name><name><surname>Suzuki</surname><given-names>M</given-names></name><name><surname>Aoki</surname><given-names>J</given-names></name><name><surname>Arakawa</surname><given-names>T</given-names></name><name><surname>Iida</surname><given-names>J</given-names></name><name><surname>Imamura</surname><given-names>K</given-names></name><name><surname>Itoh</surname><given-names>M</given-names></name><name><surname>Kato</surname><given-names>T</given-names></name><name><surname>Kawaj
i</surname><given-names>H</given-names></name><name><surname>Kawagashira</surname><given-names>N</given-names></name><name><surname>Kawashima</surname><given-names>T</given-names></name><name><surname>Kojima</surname><given-names>M</given-names></name><name><surname>Kondo</surname><given-names>S</given-names></name><name><surname>Konno</surname><given-names>H</given-names></name><name><surname>Nakano</surname><given-names>K</given-names></name><name><surname>Ninomiya</surname><given-names>N</given-names></name><name><surname>Nishio</surname><given-names>T</given-names></name><name><surname>Okada</surname><given-names>M</given-names></name><name><surname>Plessy</surname><given-names>C</given-names></name><name><surname>Shibata</surname><given-names>K</given-names></name><name><surname>Shiraki</surname><given-names>T</given-names></name><name><surname>Suzuki</surname><given-names>S</given-names></name><name><surname>Tagami</surname><given-names>M</given-names></name><name><surname>Waki</surname><given-names>K</given-names></name><name><surname>Watahiki</surname><given-names>A</given-names></name><name><surname>Okamura-Oho</surname><given-names>Y</given-names></name><name><surname>Suzuki</surname><given-names>H</given-names></name><name><surname>Kawai</surname><given-names>J</given-names></name><name><surname>Hayashizaki</surname><given-names>Y</given-names></name>, <collab>FANTOM Consortium</collab>, <collab>RIKEN Genome Exploration Research Group and Genome Science Group (Genome Network Project Core Group)</collab></person-group><year>2005</year><article-title>The transcriptional landscape of the mammalian genome</article-title><source>Science</source><volume>309</volume><fpage>1559</fpage><lpage>1563</lpage><pub-id pub-id-type="doi">10.1126/science.1112014</pub-id></element-citation></ref><ref id="bib12"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Carvunis</surname><given-names>AR</given-names></name><name><surname>Rolland</surname><given-names>T</given-names></name><name><surname>Wapinski</surname><given-names>I</given-names></name><name><surname>Calderwood</surname><given-names>MA</given-names></name><name><surname>Yildirim</surname><given-names>MA</given-names></name><name><surname>Simonis</surname><given-names>N</given-names></name><name><surname>Charloteaux</surname><given-names>B</given-names></name><name><surname>Hidalgo</surname><given-names>CA</given-names></name><name><surname>Barbette</surname><given-names>J</given-names></name><name><surname>Santhanam</surname><given-names>B</given-names></name><name><surname>Brar</surname><given-names>GA</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name><name><surname>Regev</surname><given-names>A</given-names></name><name><surname>Thierry-Mieg</surname><given-names>N</given-names></name><name><surname>Cusick</surname><given-names>ME</given-names></name><name><surname>Vidal</surname><given-names>M</given-names></name></person-group><year>2012</year><article-title>Proto-genes and de novo gene birth</article-title><source>Nature</source><volume>487</volume><fpage>370</fpage><lpage>374</lpage><pub-id pub-id-type="doi">10.1038/nature11184</pub-id></element-citation></ref><ref id="bib13"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Chen</surname><given-names>S</given-names></name><name><surname>Zhang</surname><given-names>YE</given-names></name><name><surname>Long</surname><given-names>M</given-names></name></person-group><year>2010</year><article-title>New genes in <italic>Drosophila</italic> quickly become essential</article-title><source>Science</source><volume>330</volume><fpage>1682</fpage><lpage>1685</lpage><pub-id pub-id-type="doi">10.1126/science.1196380</pub-id></element-citation></ref><ref id="bib14"><element-citation 
publication-type="journal"><person-group person-group-type="author"><name><surname>Chew</surname><given-names>GL</given-names></name><name><surname>Pauli</surname><given-names>A</given-names></name><name><surname>Rinn</surname><given-names>JL</given-names></name><name><surname>Regev</surname><given-names>A</given-names></name><name><surname>Schier</surname><given-names>AF</given-names></name><name><surname>Valen</surname><given-names>E</given-names></name></person-group><year>2013</year><article-title>Ribosome profiling reveals resemblance between long non-coding RNAs and 5’ leaders of coding RNAs</article-title><source>Development</source><volume>140</volume><fpage>2828</fpage><lpage>2834</lpage><pub-id pub-id-type="doi">10.1242/dev.098343</pub-id></element-citation></ref><ref id="bib15"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Crappé</surname><given-names>J</given-names></name><name><surname>Van Criekinge</surname><given-names>W</given-names></name><name><surname>Trooskens</surname><given-names>G</given-names></name><name><surname>Hayakawa</surname><given-names>E</given-names></name><name><surname>Luyten</surname><given-names>W</given-names></name><name><surname>Baggerman</surname><given-names>G</given-names></name><name><surname>Menschaert</surname><given-names>G</given-names></name></person-group><year>2013</year><article-title>Combining in silico prediction and ribosome profiling in a genome-wide search for novel putatively coding sORFs</article-title><source>BMC Genomics</source><volume>14</volume><fpage>648</fpage><pub-id pub-id-type="doi">10.1186/1471-2164-14-648</pub-id></element-citation></ref><ref id="bib16"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Derrien</surname><given-names>T</given-names></name><name><surname>Johnson</surname><given-names>R</given-names></name><name><surname>Bussotti</surname><given-names>G</given-names></name><name><surname>Tanzer</surname><given-names>A</given-names></name><name><surname>Djebali</surname><given-names>S</given-names></name><name><surname>Tilgner</surname><given-names>H</given-names></name><name><surname>Guernec</surname><given-names>G</given-names></name><name><surname>Martin</surname><given-names>D</given-names></name><name><surname>Merkel</surname><given-names>A</given-names></name><name><surname>Knowles</surname><given-names>DG</given-names></name><name><surname>Lagarde</surname><given-names>J</given-names></name><name><surname>Veeravalli</surname><given-names>L</given-names></name><name><surname>Ruan</surname><given-names>X</given-names></name><name><surname>Ruan</surname><given-names>Y</given-names></name><name><surname>Lassmann</surname><given-names>T</given-names></name><name><surname>Carninci</surname><given-names>P</given-names></name><name><surname>Brown</surname><given-names>JB</given-names></name><name><surname>Lipovich</surname><given-names>L</given-names></name><name><surname>Gonzalez</surname><given-names>JM</given-names></name><name><surname>Thomas</surname><given-names>M</given-names></name><name><surname>Davis</surname><given-names>CA</given-names></name><name><surname>Shiekhattar</surname><given-names>R</given-names></name><name><surname>Gingeras</surname><given-names>TR</given-names></name><name><surname>Hubbard</surname><given-names>TJ</given-names></name><name><surname>Notredame</surname><given-names>C</given-names></name><name><surname>Harrow</surname><given-names>J</given-names></name><name><surname>Guigó</surname><given-names>R</given-names></name></person-group><year>2012</year><article-title>The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and 
expression</article-title><source>Genome Research</source><volume>22</volume><fpage>1775</fpage><lpage>1789</lpage><pub-id pub-id-type="doi">10.1101/gr.132159.111</pub-id></element-citation></ref><ref id="bib17"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Dinger</surname><given-names>ME</given-names></name><name><surname>Pang</surname><given-names>KC</given-names></name><name><surname>Mercer</surname><given-names>TR</given-names></name><name><surname>Mattick</surname><given-names>JS</given-names></name></person-group><year>2008</year><article-title>Differentiating protein-coding and noncoding RNA: challenges and ambiguities</article-title><source>PLOS Computational Biology</source><volume>4</volume><fpage>e1000176</fpage><pub-id pub-id-type="doi">10.1371/journal.pcbi.1000176</pub-id></element-citation></ref><ref id="bib18"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Djebali</surname><given-names>S</given-names></name><name><surname>Davis</surname><given-names>CA</given-names></name><name><surname>Merkel</surname><given-names>A</given-names></name><name><surname>Dobin</surname><given-names>A</given-names></name><name><surname>Lassmann</surname><given-names>T</given-names></name><name><surname>Mortazavi</surname><given-names>A</given-names></name><name><surname>Tanzer</surname><given-names>A</given-names></name><name><surname>Lagarde</surname><given-names>J</given-names></name><name><surname>Lin</surname><given-names>W</given-names></name><name><surname>Schlesinger</surname><given-names>F</given-names></name><name><surname>Xue</surname><given-names>C</given-names></name><name><surname>Marinov</surname><given-names>GK</given-names></name><name><surname>Khatun</surname><given-names>J</given-names></name><name><surname>Williams</surname><given-names>BA</given-names></name><name><surname>Zaleski</surname><given-names>C</given-names></name><name><surname>Rozows
ky</surname><given-names>J</given-names></name><name><surname>Röder</surname><given-names>M</given-names></name><name><surname>Kokocinski</surname><given-names>F</given-names></name><name><surname>Abdelhamid</surname><given-names>RF</given-names></name><name><surname>Alioto</surname><given-names>T</given-names></name><name><surname>Antoshechkin</surname><given-names>I</given-names></name><name><surname>Baer</surname><given-names>MT</given-names></name><name><surname>Bar</surname><given-names>NS</given-names></name><name><surname>Batut</surname><given-names>P</given-names></name><name><surname>Bell</surname><given-names>K</given-names></name><name><surname>Bell</surname><given-names>I</given-names></name><name><surname>Chakrabortty</surname><given-names>S</given-names></name><name><surname>Chen</surname><given-names>X</given-names></name><name><surname>Chrast</surname><given-names>J</given-names></name><name><surname>Curado</surname><given-names>J</given-names></name><name><surname>Derrien</surname><given-names>T</given-names></name><name><surname>Drenkow</surname><given-names>J</given-names></name><name><surname>Dumais</surname><given-names>E</given-names></name><name><surname>Dumais</surname><given-names>J</given-names></name><name><surname>Duttagupta</surname><given-names>R</given-names></name><name><surname>Falconnet</surname><given-names>E</given-names></name><name><surname>Fastuca</surname><given-names>M</given-names></name><name><surname>Fejes-Toth</surname><given-names>K</given-names></name><name><surname>Ferreira</surname><given-names>P</given-names></name><name><surname>Foissac</surname><given-names>S</given-names></name><name><surname>Fullwood</surname><given-names>MJ</given-names></name><name><surname>Gao</surname><given-names>H</given-names></name><name><surname>Gonzalez</surname><given-names>D</given-names></name><name><surname>Gordon</surname><given-names>A</given-names></name><name><surname>Gunawardena</surname><given-names>H</given-names></name><name
><surname>Howald</surname><given-names>C</given-names></name><name><surname>Jha</surname><given-names>S</given-names></name><name><surname>Johnson</surname><given-names>R</given-names></name><name><surname>Kapranov</surname><given-names>P</given-names></name><name><surname>King</surname><given-names>B</given-names></name><name><surname>Kingswood</surname><given-names>C</given-names></name><name><surname>Luo</surname><given-names>OJ</given-names></name><name><surname>Park</surname><given-names>E</given-names></name><name><surname>Persaud</surname><given-names>K</given-names></name><name><surname>Preall</surname><given-names>JB</given-names></name><name><surname>Ribeca</surname><given-names>P</given-names></name><name><surname>Risk</surname><given-names>B</given-names></name><name><surname>Robyr</surname><given-names>D</given-names></name><name><surname>Sammeth</surname><given-names>M</given-names></name><name><surname>Schaffer</surname><given-names>L</given-names></name><name><surname>See</surname><given-names>LH</given-names></name><name><surname>Shahab</surname><given-names>A</given-names></name><name><surname>Skancke</surname><given-names>J</given-names></name><name><surname>Suzuki</surname><given-names>AM</given-names></name><name><surname>Takahashi</surname><given-names>H</given-names></name><name><surname>Tilgner</surname><given-names>H</given-names></name><name><surname>Trout</surname><given-names>D</given-names></name><name><surname>Walters</surname><given-names>N</given-names></name><name><surname>Wang</surname><given-names>H</given-names></name><name><surname>Wrobel</surname><given-names>J</given-names></name><name><surname>Yu</surname><given-names>Y</given-names></name><name><surname>Ruan</surname><given-names>X</given-names></name><name><surname>Hayashizaki</surname><given-names>Y</given-names></name><name><surname>Harrow</surname><given-names>J</given-names></name><name><surname>Gerstein</surname><given-names>M</given-names></name><name><surname>Hubbard<
/surname><given-names>T</given-names></name><name><surname>Reymond</surname><given-names>A</given-names></name><name><surname>Antonarakis</surname><given-names>SE</given-names></name><name><surname>Hannon</surname><given-names>G</given-names></name><name><surname>Giddings</surname><given-names>MC</given-names></name><name><surname>Ruan</surname><given-names>Y</given-names></name><name><surname>Wold</surname><given-names>B</given-names></name><name><surname>Carninci</surname><given-names>P</given-names></name><name><surname>Guigó</surname><given-names>R</given-names></name><name><surname>Gingeras</surname><given-names>TR</given-names></name></person-group><year>2012</year><article-title>Landscape of transcription in human cells</article-title><source>Nature</source><volume>489</volume><fpage>101</fpage><lpage>108</lpage><pub-id pub-id-type="doi">10.1038/nature11233</pub-id></element-citation></ref><ref id="bib19"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Donoghue</surname><given-names>MT</given-names></name><name><surname>Keshavaiah</surname><given-names>C</given-names></name><name><surname>Swamidatta</surname><given-names>SH</given-names></name><name><surname>Spillane</surname><given-names>C</given-names></name></person-group><year>2011</year><article-title>Evolutionary origins of <italic>Brassicaceae</italic> specific genes in <italic>Arabidopsis thaliana</italic></article-title><source>BMC Evolutionary Biology</source><volume>11</volume><fpage>47</fpage><pub-id pub-id-type="doi">10.1186/1471-2148-11-47</pub-id></element-citation></ref><ref id="bib20"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Dunn</surname><given-names>JG</given-names></name><name><surname>Foo</surname><given-names>CK</given-names></name><name><surname>Belletier</surname><given-names>NG</given-names></name><name><surname>Gavis</surname><given-names>ER</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name></person-group><year>2013</year><article-title>Ribosome profiling reveals pervasive and regulated stop codon readthrough in <italic>Drosophila melanogaster</italic></article-title><source>eLife</source><volume>2</volume><fpage>e01179</fpage><pub-id pub-id-type="doi">10.7554/eLife.01179</pub-id></element-citation></ref><ref id="bib21"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ekman</surname><given-names>D</given-names></name><name><surname>Björklund</surname><given-names>AK</given-names></name><name><surname>Elofsson</surname><given-names>A</given-names></name></person-group><year>2007</year><article-title>Quantification of the elevated rate of domain rearrangements in metazoa</article-title><source>Journal of Molecular Biology</source><volume>372</volume><fpage>1337</fpage><lpage>1348</lpage><pub-id pub-id-type="doi">10.1016/j.jmb.2007.06.022</pub-id></element-citation></ref><ref id="bib22"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ekman</surname><given-names>D</given-names></name><name><surname>Elofsson</surname><given-names>A</given-names></name></person-group><year>2010</year><article-title>Identifying and quantifying orphan protein sequences in fungi</article-title><source>Journal of Molecular Biology</source><volume>396</volume><fpage>396</fpage><lpage>405</lpage><pub-id pub-id-type="doi">10.1016/j.jmb.2009.11.053</pub-id></element-citation></ref><ref id="bib23"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Eyre-Walker</surname><given-names>A</given-names></name></person-group><year>2002</year><article-title>Changing effective population size and the McDonald-Kreitman test</article-title><source>Genetics</source><volume>162</volume><fpage>2017</fpage><lpage>2024</lpage></element-citation></ref><ref id="bib24"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Fatica</surname><given-names>A</given-names></name><name><surname>Bozzoni</surname><given-names>I</given-names></name></person-group><year>2014</year><article-title>Long non-coding RNAs: new players in cell differentiation and development</article-title><source>Nature Reviews Genetics</source><volume>15</volume><fpage>7</fpage><lpage>21</lpage><pub-id pub-id-type="doi">10.1038/nrg3606</pub-id></element-citation></ref><ref id="bib25"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Flicek</surname><given-names>P</given-names></name><name><surname>Amode</surname><given-names>M</given-names></name><name><surname>Barrell</surname><given-names>D</given-names></name><name><surname>Beal</surname><given-names>K</given-names></name><name><surname>Brent</surname><given-names>S</given-names></name><name><surname>Carvalho-Silva</surname><given-names>D</given-names></name><name><surname>Clapham</surname><given-names>P</given-names></name><name><surname>Coates</surname><given-names>G</given-names></name><name><surname>Fairley</surname><given-names>S</given-names></name><name><surname>Fitzgerald</surname><given-names>S</given-names></name><name><surname>Gil</surname><given-names>L</given-names></name><name><surname>Gordon</surname><given-names>L</given-names></name><name><surname>Hendrix</surname><given-names>M</given-names></name><name><surname>Hourlier</surname><given-names>T</given-names></name><name><surname>Johnson</surname><given-names>N</given-names></name><name><surname>Kähäri</s
urname><given-names>AK</given-names></name><name><surname>Keefe</surname><given-names>D</given-names></name><name><surname>Keenan</surname><given-names>S</given-names></name><name><surname>Kinsella</surname><given-names>R</given-names></name><name><surname>Komorowska</surname><given-names>M</given-names></name><name><surname>Koscielny</surname><given-names>G</given-names></name><name><surname>Kulesha</surname><given-names>E</given-names></name><name><surname>Larsson</surname><given-names>P</given-names></name><name><surname>Longden</surname><given-names>I</given-names></name><name><surname>McLaren</surname><given-names>W</given-names></name><name><surname>Muffato</surname><given-names>M</given-names></name><name><surname>Overduin</surname><given-names>B</given-names></name><name><surname>Pignatelli</surname><given-names>M</given-names></name><name><surname>Pritchard</surname><given-names>B</given-names></name><name><surname>Riat</surname><given-names>HS</given-names></name><name><surname>Ritchie</surname><given-names>GR</given-names></name><name><surname>Ruffier</surname><given-names>M</given-names></name><name><surname>Schuster</surname><given-names>M</given-names></name><name><surname>Sobral</surname><given-names>D</given-names></name><name><surname>Tang</surname><given-names>YA</given-names></name><name><surname>Taylor</surname><given-names>K</given-names></name><name><surname>Trevanion</surname><given-names>S</given-names></name><name><surname>Vandrovcova</surname><given-names>J</given-names></name><name><surname>White</surname><given-names>S</given-names></name><name><surname>Wilson</surname><given-names>M</given-names></name><name><surname>Wilder</surname><given-names>SP</given-names></name><name><surname>Aken</surname><given-names>BL</given-names></name><name><surname>Birney</surname><given-names>E</given-names></name><name><surname>Cunningham</surname><given-names>F</given-names></name><name><surname>Dunham</surname><given-names>I</given-names></name><name><
surname>Durbin</surname><given-names>R</given-names></name><name><surname>Fernández-Suarez</surname><given-names>XM</given-names></name><name><surname>Harrow</surname><given-names>J</given-names></name><name><surname>Herrero</surname><given-names>J</given-names></name><name><surname>Hubbard</surname><given-names>TJ</given-names></name><name><surname>Parker</surname><given-names>A</given-names></name><name><surname>Proctor</surname><given-names>G</given-names></name><name><surname>Spudich</surname><given-names>G</given-names></name><name><surname>Vogel</surname><given-names>J</given-names></name><name><surname>Yates</surname><given-names>A</given-names></name><name><surname>Zadissa</surname><given-names>A</given-names></name><name><surname>Searle</surname><given-names>SM</given-names></name></person-group><year>2012</year><article-title>Ensembl 2012</article-title><source>Nucleic Acids Research</source><volume>40</volume><fpage>D84</fpage><lpage>D90</lpage><pub-id pub-id-type="doi">10.1093/nar/gkr991</pub-id></element-citation></ref><ref id="bib26"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Frith</surname><given-names>MC</given-names></name><name><surname>Forrest</surname><given-names>AR</given-names></name><name><surname>Nourbakhsh</surname><given-names>E</given-names></name><name><surname>Pang</surname><given-names>KC</given-names></name><name><surname>Kai</surname><given-names>C</given-names></name><name><surname>Kawai</surname><given-names>J</given-names></name><name><surname>Carninci</surname><given-names>P</given-names></name><name><surname>Hayashizaki</surname><given-names>Y</given-names></name><name><surname>Bailey</surname><given-names>TL</given-names></name><name><surname>Grimmond</surname><given-names>SM</given-names></name></person-group><year>2006</year><article-title>The abundance of short proteins in the mammalian proteome</article-title><source>PLOS 
Genetics</source><volume>2</volume><fpage>e52</fpage><pub-id pub-id-type="doi">10.1371/journal.pgen.0020052</pub-id></element-citation></ref><ref id="bib27"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Galindo</surname><given-names>MI</given-names></name><name><surname>Pueyo</surname><given-names>JI</given-names></name><name><surname>Fouix</surname><given-names>S</given-names></name><name><surname>Bishop</surname><given-names>SA</given-names></name><name><surname>Couso</surname><given-names>JP</given-names></name></person-group><year>2007</year><article-title>Peptides encoded by short ORFs control development and define a new eukaryotic gene family</article-title><source>PLOS Biology</source><volume>5</volume><fpage>e106</fpage><pub-id pub-id-type="doi">10.1371/journal.pbio.0050106</pub-id></element-citation></ref><ref id="bib28"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Guo</surname><given-names>H</given-names></name><name><surname>Ingolia</surname><given-names>NT</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name><name><surname>Bartel</surname><given-names>DP</given-names></name></person-group><year>2010</year><article-title>Mammalian microRNAs predominantly act to decrease target mRNA levels</article-title><source>Nature</source><volume>466</volume><fpage>835</fpage><lpage>840</lpage><pub-id pub-id-type="doi">10.1038/nature09267</pub-id></element-citation></ref><ref id="bib29"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Guttman</surname><given-names>M</given-names></name><name><surname>Rinn</surname><given-names>JL</given-names></name></person-group><year>2012</year><article-title>Modular regulatory principles of large non-coding RNAs</article-title><source>Nature</source><volume>482</volume><fpage>339</fpage><lpage>346</lpage><pub-id 
pub-id-type="doi">10.1038/nature10887</pub-id></element-citation></ref><ref id="bib30"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Guttman</surname><given-names>M</given-names></name><name><surname>Russell</surname><given-names>P</given-names></name><name><surname>Ingolia</surname><given-names>NT</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name><name><surname>Lander</surname><given-names>E</given-names></name></person-group><year>2013</year><article-title>Ribosome profiling provides evidence that large noncoding RNAs do not encode proteins</article-title><source>Cell</source><volume>154</volume><fpage>240</fpage><lpage>251</lpage><pub-id pub-id-type="doi">10.1016/j.cell.2013.06.009</pub-id></element-citation></ref><ref id="bib31"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Hanada</surname><given-names>K</given-names></name><name><surname>Higuchi-Takeuchi</surname><given-names>M</given-names></name><name><surname>Okamoto</surname><given-names>M</given-names></name><name><surname>Yoshizumi</surname><given-names>T</given-names></name><name><surname>Shimizu</surname><given-names>M</given-names></name><name><surname>Nakaminami</surname><given-names>K</given-names></name><name><surname>Nishi</surname><given-names>R</given-names></name><name><surname>Ohashi</surname><given-names>C</given-names></name><name><surname>Iida</surname><given-names>K</given-names></name><name><surname>Tanaka</surname><given-names>M</given-names></name><name><surname>Horii</surname><given-names>Y</given-names></name><name><surname>Kawashima</surname><given-names>M</given-names></name><name><surname>Matsui</surname><given-names>K</given-names></name><name><surname>Toyoda</surname><given-names>T</given-names></name><name><surname>Shinozaki</surname><given-names>K</given-names></name><name><surname>Seki</surname><given-names>M</given-names></name><n
ame><surname>Matsui</surname><given-names>M</given-names></name></person-group><year>2013</year><article-title>Small open reading frames associated with morphogenesis are hidden in plant genomes</article-title><source>Proceedings of the National Academy of Sciences of USA</source><volume>110</volume><fpage>2395</fpage><lpage>2400</lpage><pub-id pub-id-type="doi">10.1073/pnas.1213958110</pub-id></element-citation></ref><ref id="bib32"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Harrow</surname><given-names>J</given-names></name><name><surname>Frankish</surname><given-names>A</given-names></name><name><surname>Gonzalez</surname><given-names>JM</given-names></name><name><surname>Tapanari</surname><given-names>E</given-names></name><name><surname>Diekhans</surname><given-names>M</given-names></name><name><surname>Kokocinski</surname><given-names>F</given-names></name><name><surname>Aken</surname><given-names>BL</given-names></name><name><surname>Barrell</surname><given-names>D</given-names></name><name><surname>Zadissa</surname><given-names>A</given-names></name><name><surname>Searle</surname><given-names>S</given-names></name><name><surname>Barnes</surname><given-names>I</given-names></name><name><surname>Bignell</surname><given-names>A</given-names></name><name><surname>Boychenko</surname><given-names>V</given-names></name><name><surname>Hunt</surname><given-names>T</given-names></name><name><surname>Kay</surname><given-names>M</given-names></name><name><surname>Mukherjee</surname><given-names>G</given-names></name><name><surname>Rajan</surname><given-names>J</given-names></name><name><surname>Despacio-Reyes</surname><given-names>G</given-names></name><name><surname>Saunders</surname><given-names>G</given-names></name><name><surname>Steward</surname><given-names>C</given-names></name><name><surname>Harte</surname><given-names>R</given-names></name><name><surname>Lin</surname><given-names>M</given-names></name><n
ame><surname>Howald</surname><given-names>C</given-names></name><name><surname>Tanzer</surname><given-names>A</given-names></name><name><surname>Derrien</surname><given-names>T</given-names></name><name><surname>Chrast</surname><given-names>J</given-names></name><name><surname>Walters</surname><given-names>N</given-names></name><name><surname>Balasubramanian</surname><given-names>S</given-names></name><name><surname>Pei</surname><given-names>B</given-names></name><name><surname>Tress</surname><given-names>M</given-names></name><name><surname>Rodriguez</surname><given-names>JM</given-names></name><name><surname>Ezkurdia</surname><given-names>I</given-names></name><name><surname>van Baren</surname><given-names>J</given-names></name><name><surname>Brent</surname><given-names>M</given-names></name><name><surname>Haussler</surname><given-names>D</given-names></name><name><surname>Kellis</surname><given-names>M</given-names></name><name><surname>Valencia</surname><given-names>A</given-names></name><name><surname>Reymond</surname><given-names>A</given-names></name><name><surname>Gerstein</surname><given-names>M</given-names></name><name><surname>Guigó</surname><given-names>R</given-names></name><name><surname>Hubbard</surname><given-names>TJ</given-names></name></person-group><year>2012</year><article-title>GENCODE: the reference human genome annotation for The ENCODE Project</article-title><source>Genome Research</source><volume>22</volume><fpage>1760</fpage><lpage>1774</lpage><pub-id pub-id-type="doi">10.1101/gr.135350.111</pub-id></element-citation></ref><ref id="bib33"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Hashimoto</surname><given-names>Y</given-names></name><name><surname>Niikura</surname><given-names>T</given-names></name><name><surname>Tajima</surname><given-names>H</given-names></name><name><surname>Yasukawa</surname><given-names>T</given-names></name><name><surname>Sudo</surname><given-names>H</given-names></name><name><surname>Ito</surname><given-names>Y</given-names></name><name><surname>Kita</surname><given-names>Y</given-names></name><name><surname>Kawasumi</surname><given-names>M</given-names></name><name><surname>Kouyama</surname><given-names>K</given-names></name><name><surname>Doyu</surname><given-names>M</given-names></name><name><surname>Sobue</surname><given-names>G</given-names></name><name><surname>Koide</surname><given-names>T</given-names></name><name><surname>Tsuji</surname><given-names>S</given-names></name><name><surname>Lang</surname><given-names>J</given-names></name><name><surname>Kurokawa</surname><given-names>K</given-names></name><name><surname>Nishimoto</surname><given-names>I</given-names></name></person-group><year>2001</year><article-title>A rescue factor abolishing neuronal cell death by a wide spectrum of familial Alzheimer’s disease genes and abeta</article-title><source>Proceedings of the National Academy of Sciences of USA</source><volume>98</volume><fpage>6336</fpage><lpage>6341</lpage><pub-id pub-id-type="doi">10.1073/pnas.101133498</pub-id></element-citation></ref><ref id="bib34"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Huang</surname><given-names>Y</given-names></name><name><surname>Ainsley</surname><given-names>JA</given-names></name><name><surname>Reijmers</surname><given-names>LG</given-names></name><name><surname>Jackson</surname><given-names>FR</given-names></name></person-group><year>2013</year><article-title>Translational profiling of clock cells reveals circadianly synchronized protein 
synthesis</article-title><source>PLOS Biology</source><volume>11</volume><fpage>e1001703</fpage><pub-id pub-id-type="doi">10.1371/journal.pbio.1001703</pub-id></element-citation></ref><ref id="bib35"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ingolia</surname><given-names>NT</given-names></name></person-group><year>2014</year><article-title>Ribosome profiling: new views of translation, from single codons to genome scale</article-title><source>Nature Reviews Genetics</source><volume>15</volume><fpage>205</fpage><lpage>213</lpage><pub-id pub-id-type="doi">10.1038/nrg3645</pub-id></element-citation></ref><ref id="bib36"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ingolia</surname><given-names>NT</given-names></name><name><surname>Ghaemmaghami</surname><given-names>S</given-names></name><name><surname>Newman</surname><given-names>JR</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name></person-group><year>2009</year><article-title>Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling</article-title><source>Science</source><volume>324</volume><fpage>218</fpage><lpage>223</lpage><pub-id pub-id-type="doi">10.1126/science.1168978</pub-id></element-citation></ref><ref id="bib37"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ingolia</surname><given-names>NT</given-names></name><name><surname>Lareau</surname><given-names>LF</given-names></name><name><surname>Weissman</surname><given-names>JS</given-names></name></person-group><year>2011</year><article-title>Ribosome profiling of mouse embryonic stem cells reveals the complexity and dynamics of mammalian proteomes</article-title><source>Cell</source><volume>147</volume><fpage>789</fpage><lpage>802</lpage><pub-id 
pub-id-type="doi">10.1016/j.cell.2011.10.002</pub-id></element-citation></ref><ref id="bib38"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Jacob</surname><given-names>F</given-names></name></person-group><year>1977</year><article-title>Evolution and tinkering</article-title><source>Science</source><volume>196</volume><fpage>1161</fpage><lpage>1166</lpage><pub-id pub-id-type="doi">10.1126/science.860134</pub-id></element-citation></ref><ref id="bib39"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Juntawong</surname><given-names>P</given-names></name><name><surname>Girke</surname><given-names>T</given-names></name><name><surname>Bazin</surname><given-names>J</given-names></name><name><surname>Bailey-Serres</surname><given-names>J</given-names></name></person-group><year>2014</year><article-title>Translational dynamics revealed by genome-wide profiling of ribosome footprints in <italic>Arabidopsis</italic></article-title><source>Proceedings of the National Academy of Sciences of USA</source><volume>111</volume><fpage>E203</fpage><lpage>E212</lpage><pub-id pub-id-type="doi">10.1073/pnas.1317811111</pub-id></element-citation></ref><ref id="bib40"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Kapranov</surname><given-names>P</given-names></name><name><surname>Cheng</surname><given-names>J</given-names></name><name><surname>Dike</surname><given-names>S</given-names></name><name><surname>Nix</surname><given-names>DA</given-names></name><name><surname>Duttagupta</surname><given-names>R</given-names></name><name><surname>Willingham</surname><given-names>AT</given-names></name><name><surname>Stadler</surname><given-names>PF</given-names></name><name><surname>Hertel</surname><given-names>J</given-names></name><name><surname>Hackermüller</surname><given-names>J</given-names></name><name><surname>Hofacker</surname><given-names>IL</given-names></name><name><surname>Bell</surname><given-names>I</given-names></name><name><surname>Cheung</surname><given-names>E</given-names></name><name><surname>Drenkow</surname><given-names>J</given-names></name><name><surname>Dumais</surname><given-names>E</given-names></name><name><surname>Patel</surname><given-names>S</given-names></name><name><surname>Helt</surname><given-names>G</given-names></name><name><surname>Ganesh</surname><given-names>M</given-names></name><name><surname>Ghosh</surname><given-names>S</given-names></name><name><surname>Piccolboni</surname><given-names>A</given-names></name><name><surname>Sementchenko</surname><given-names>V</given-names></name><name><surname>Tammana</surname><given-names>H</given-names></name><name><surname>Gingeras</surname><given-names>TR</given-names></name></person-group><year>2007</year><article-title>RNA maps reveal new RNA classes and a possible function for pervasive transcription</article-title><source>Science</source><volume>316</volume><fpage>1484</fpage><lpage>1488</lpage><pub-id pub-id-type="doi">10.1126/science.1138341</pub-id></element-citation></ref><ref id="bib41"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Kastenmayer</surname><given-names>JP</given-names></name><name><surname>Ni</surname><given-names>L</given-names></name><name><surname>Chu</surname><given-names>A</given-names></name><name><surname>Kitchen</surname><given-names>LE</given-names></name><name><surname>Au</surname><given-names>WC</given-names></name><name><surname>Yang</surname><given-names>H</given-names></name><name><surname>Carter</surname><given-names>CD</given-names></name><name><surname>Wheeler</surname><given-names>D</given-names></name><name><surname>Davis</surname><given-names>RW</given-names></name><name><surname>Boeke</surname><given-names>JD</given-names></name><name><surname>Snyder</surname><given-names>MA</given-names></name><name><surname>Basrai</surname><given-names>MA</given-names></name></person-group><year>2006</year><article-title>Functional genomics of genes with small open reading frames (sORFs) in <italic>S. cerevisiae</italic></article-title><source>Genome Research</source><volume>16</volume><fpage>365</fpage><lpage>373</lpage><pub-id pub-id-type="doi">10.1101/gr.4355406</pub-id></element-citation></ref><ref id="bib42"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Khalturin</surname><given-names>K</given-names></name><name><surname>Hemmrich</surname><given-names>G</given-names></name><name><surname>Fraune</surname><given-names>S</given-names></name><name><surname>Augustin</surname><given-names>R</given-names></name><name><surname>Bosch</surname><given-names>TC</given-names></name></person-group><year>2009</year><article-title>More than just orphans: are taxonomically-restricted genes important in evolution?</article-title><source>Trends in Genetics</source><volume>25</volume><fpage>404</fpage><lpage>413</lpage><pub-id pub-id-type="doi">10.1016/j.tig.2009.07.006</pub-id></element-citation></ref><ref id="bib43"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Kim</surname><given-names>D</given-names></name><name><surname>Pertea</surname><given-names>G</given-names></name><name><surname>Trapnell</surname><given-names>C</given-names></name><name><surname>Pimentel</surname><given-names>H</given-names></name><name><surname>Kelley</surname><given-names>R</given-names></name><name><surname>Salzberg</surname><given-names>SL</given-names></name></person-group><year>2013</year><article-title>TopHat2: Accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions</article-title><source>Genome Biology</source><volume>14</volume><fpage>R36</fpage><pub-id pub-id-type="doi">10.1186/gb-2013-14-4-r36</pub-id></element-citation></ref><ref id="bib44"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kim</surname><given-names>MS</given-names></name><name><surname>Pinto</surname><given-names>SM</given-names></name><name><surname>Getnet</surname><given-names>D</given-names></name><name><surname>Nirujogi</surname><given-names>RS</given-names></name><name><surname>Manda</surname><given-names>SS</given-names></name><name><surname>Chaerkady</surname><given-names>R</given-names></name><name><surname>Madugundu</surname><given-names>AK</given-names></name><name><surname>Kelkar</surname><given-names>DS</given-names></name><name><surname>Isserlin</surname><given-names>R</given-names></name><name><surname>Jain</surname><given-names>S</given-names></name><name><surname>Thomas</surname><given-names>JK</given-names></name><name><surname>Muthusamy</surname><given-names>B</given-names></name><name><surname>Leal-Rojas</surname><given-names>P</given-names></name><name><surname>Kumar</surname><given-names>P</given-names></name><name><surname>Sahasrabuddhe</surname><given-names>NA</given-names></name><name><surname>Balakrishnan</surname><given-names>L</given-names></name><name><surname>Advani</surname><given-names>J</given-names></name><
name><surname>George</surname><given-names>B</given-names></name><name><surname>Renuse</surname><given-names>S</given-names></name><name><surname>Selvan</surname><given-names>LD</given-names></name><name><surname>Patil</surname><given-names>AH</given-names></name><name><surname>Nanjappa</surname><given-names>V</given-names></name><name><surname>Radhakrishnan</surname><given-names>A</given-names></name><name><surname>Prasad</surname><given-names>S</given-names></name><name><surname>Subbannayya</surname><given-names>T</given-names></name><name><surname>Raju</surname><given-names>R</given-names></name><name><surname>Kumar</surname><given-names>M</given-names></name><name><surname>Sreenivasamurthy</surname><given-names>SK</given-names></name><name><surname>Marimuthu</surname><given-names>A</given-names></name><name><surname>Sathe</surname><given-names>GJ</given-names></name><name><surname>Chavan</surname><given-names>S</given-names></name><name><surname>Datta</surname><given-names>KK</given-names></name><name><surname>Subbannayya</surname><given-names>Y</given-names></name><name><surname>Sahu</surname><given-names>A</given-names></name><name><surname>Yelamanchi</surname><given-names>SD</given-names></name><name><surname>Jayaram</surname><given-names>S</given-names></name><name><surname>Rajagopalan</surname><given-names>P</given-names></name><name><surname>Sharma</surname><given-names>J</given-names></name><name><surname>Murthy</surname><given-names>KR</given-names></name><name><surname>Syed</surname><given-names>N</given-names></name><name><surname>Goel</surname><given-names>R</given-names></name><name><surname>Khan</surname><given-names>AA</given-names></name><name><surname>Ahmad</surname><given-names>S</given-names></name><name><surname>Dey</surname><given-names>G</given-names></name><name><surname>Mudgal</surname><given-names>K</given-names></name><name><surname>Chatterjee</surname><given-names>A</given-names></name><name><surname>Huang</surname><given-names>TC</give
n-names></name><name><surname>Zhong</surname><given-names>J</given-names></name><name><surname>Wu</surname><given-names>X</given-names></name><name><surname>Shaw</surname><given-names>PG</given-names></name><name><surname>Freed</surname><given-names>D</given-names></name><name><surname>Zahari</surname><given-names>MS</given-names></name><name><surname>Mukherjee</surname><given-names>KK</given-names></name><name><surname>Shankar</surname><given-names>S</given-names></name><name><surname>Mahadevan</surname><given-names>A</given-names></name><name><surname>Lam</surname><given-names>H</given-names></name><name><surname>Mitchell</surname><given-names>CJ</given-names></name><name><surname>Shankar</surname><given-names>SK</given-names></name><name><surname>Satishchandra</surname><given-names>P</given-names></name><name><surname>Schroeder</surname><given-names>JT</given-names></name><name><surname>Sirdeshmukh</surname><given-names>R</given-names></name><name><surname>Maitra</surname><given-names>A</given-names></name><name><surname>Leach</surname><given-names>SD</given-names></name><name><surname>Drake</surname><given-names>CG</given-names></name><name><surname>Halushka</surname><given-names>MK</given-names></name><name><surname>Prasad</surname><given-names>TS</given-names></name><name><surname>Hruban</surname><given-names>RH</given-names></name><name><surname>Kerr</surname><given-names>CL</given-names></name><name><surname>Bader</surname><given-names>GD</given-names></name><name><surname>Iacobuzio-Donahue</surname><given-names>CA</given-names></name><name><surname>Gowda</surname><given-names>H</given-names></name><name><surname>Pandey</surname><given-names>A</given-names></name></person-group><year>2014</year><article-title>A draft map of the human proteome</article-title><source>Nature</source><volume>509</volume><fpage>575</fpage><lpage>581</lpage><pub-id pub-id-type="doi">10.1038/nature13302</pub-id></element-citation></ref><ref id="bib45"><element-citation 
publication-type="journal"><person-group person-group-type="author"><name><surname>Kondo</surname><given-names>T</given-names></name><name><surname>Hashimoto</surname><given-names>Y</given-names></name><name><surname>Kato</surname><given-names>K</given-names></name><name><surname>Inagaki</surname><given-names>S</given-names></name><name><surname>Hayashi</surname><given-names>S</given-names></name><name><surname>Kageyama</surname><given-names>Y</given-names></name></person-group><year>2007</year><article-title>Small peptide regulators of actin-based cell morphogenesis encoded by a polycistronic mRNA</article-title><source>Nature Cell Biology</source><volume>9</volume><fpage>660</fpage><lpage>665</lpage><pub-id pub-id-type="doi">10.1038/ncb1595</pub-id></element-citation></ref><ref id="bib46"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Kutter</surname><given-names>C</given-names></name><name><surname>Watt</surname><given-names>S</given-names></name><name><surname>Stefflova</surname><given-names>K</given-names></name><name><surname>Wilson</surname><given-names>MD</given-names></name><name><surname>Goncalves</surname><given-names>A</given-names></name><name><surname>Ponting</surname><given-names>CP</given-names></name><name><surname>Odom</surname><given-names>DT</given-names></name><name><surname>Marques</surname><given-names>AC</given-names></name></person-group><year>2012</year><article-title>Rapid turnover of long noncoding RNAs and the evolution of gene expression</article-title><source>PLOS Genetics</source><volume>8</volume><fpage>e1002841</fpage><pub-id pub-id-type="doi">10.1371/journal.pgen.1002841</pub-id></element-citation></ref><ref id="bib47"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Ladoukakis</surname><given-names>E</given-names></name><name><surname>Pereira</surname><given-names>V</given-names></name><name><surname>Magny</surname><given-names>EG</given-names></name><name><surname>Eyre-Walker</surname><given-names>A</given-names></name><name><surname>Couso</surname><given-names>JP</given-names></name></person-group><year>2011</year><article-title>Hundreds of putatively functional small open reading frames in <italic>Drosophila</italic></article-title><source>Genome Biology</source><volume>12</volume><fpage>R118</fpage><pub-id pub-id-type="doi">10.1186/gb-2011-12-11-r118</pub-id></element-citation></ref><ref id="bib48"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Langmead</surname><given-names>B</given-names></name><name><surname>Trapnell</surname><given-names>C</given-names></name><name><surname>Pop</surname><given-names>M</given-names></name><name><surname>Salzberg</surname><given-names>SL</given-names></name></person-group><year>2009</year><article-title>Ultrafast and memory-efficient alignment of short DNA sequences to the human genome</article-title><source>Genome Biology</source><volume>10</volume><fpage>R25</fpage><pub-id pub-id-type="doi">10.1186/gb-2009-10-3-r25</pub-id></element-citation></ref><ref id="bib49"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Lee</surname><given-names>C</given-names></name><name><surname>Yen</surname><given-names>K</given-names></name><name><surname>Cohen</surname><given-names>P</given-names></name></person-group><year>2013</year><article-title>Humanin: a harbinger of mitochondrial-derived peptides?</article-title><source>Trends in Endocrinology and Metabolism</source><volume>24</volume><fpage>222</fpage><lpage>228</lpage><pub-id pub-id-type="doi">10.1016/j.tem.2013.01.005</pub-id></element-citation></ref><ref id="bib50"><element-citation 
publication-type="journal"><person-group person-group-type="author"><name><surname>Levine</surname><given-names>MT</given-names></name><name><surname>Jones</surname><given-names>CD</given-names></name><name><surname>Kern</surname><given-names>AD</given-names></name><name><surname>Lindfors</surname><given-names>HA</given-names></name><name><surname>Begun</surname><given-names>DJ</given-names></name></person-group><year>2006</year><article-title>Novel genes derived from noncoding DNA in <italic>Drosophila melanogaster</italic> are frequently X-linked and exhibit testis-biased expression</article-title><source>Proceedings of the National Academy of Sciences of USA</source><volume>103</volume><fpage>9935</fpage><lpage>9939</lpage><pub-id pub-id-type="doi">10.1073/pnas.0509809103</pub-id></element-citation></ref><ref id="bib51"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname><given-names>J</given-names></name><name><surname>Hutchison</surname><given-names>K</given-names></name><name><surname>Perrone-Bizzozero</surname><given-names>N</given-names></name><name><surname>Morgan</surname><given-names>M</given-names></name><name><surname>Sui</surname><given-names>J</given-names></name><name><surname>Calhoun</surname><given-names>V</given-names></name></person-group><year>2010</year><article-title>Identification of genetic and epigenetic marks involved in population structure</article-title><source>PLOS ONE</source><volume>5</volume><fpage>e13209</fpage><pub-id pub-id-type="doi">10.1371/journal.pone.0013209</pub-id></element-citation></ref><ref id="bib52"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Liu</surname><given-names>J</given-names></name><name><surname>Jung</surname><given-names>C</given-names></name><name><surname>Xu</surname><given-names>J</given-names></name><name><surname>Wang</surname><given-names>H</given-names></name><name><surname>Deng</surname><given-names>S</given-names></name><name><surname>Bernad</surname><given-names>L</given-names></name><name><surname>Arenas-Huertero</surname><given-names>C</given-names></name><name><surname>Chua</surname><given-names>NH</given-names></name></person-group><year>2012</year><article-title>Genome-wide analysis uncovers regulation of long intergenic noncoding RNAs in <italic>Arabidopsis</italic></article-title><source>The Plant Cell</source><volume>24</volume><fpage>4333</fpage><lpage>4345</lpage><pub-id pub-id-type="doi">10.1105/tpc.112.102855</pub-id></element-citation></ref><ref id="bib53"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Liu</surname><given-names>J</given-names></name><name><surname>Zhang</surname><given-names>Y</given-names></name><name><surname>Lei</surname><given-names>X</given-names></name><name><surname>Zhang</surname><given-names>Z</given-names></name></person-group><year>2008</year><article-title>Natural selection of protein structural and functional properties: a single nucleotide polymorphism perspective</article-title><source>Genome Biology</source><volume>9</volume><fpage>R69</fpage><pub-id pub-id-type="doi">10.1186/gb-2008-9-4-r69</pub-id></element-citation></ref><ref id="bib54"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Long</surname><given-names>M</given-names></name><name><surname>VanKuren</surname><given-names>NW</given-names></name><name><surname>Chen</surname><given-names>S</given-names></name><name><surname>Vibranovski</surname><given-names>MD</given-names></name></person-group><year>2013</year><article-title>New gene 
evolution: little did we know</article-title><source>Annual Review of Genetics</source><volume>47</volume><fpage>307</fpage><lpage>333</lpage><pub-id pub-id-type="doi">10.1146/annurev-genet-111212-133301</pub-id></element-citation></ref><ref id="bib55"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ma</surname><given-names>J</given-names></name><name><surname>Ward</surname><given-names>CC</given-names></name><name><surname>Jungreis</surname><given-names>I</given-names></name><name><surname>Slavoff</surname><given-names>SA</given-names></name><name><surname>Schwaid</surname><given-names>AG</given-names></name><name><surname>Neveu</surname><given-names>J</given-names></name><name><surname>Budnik</surname><given-names>BA</given-names></name><name><surname>Kellis</surname><given-names>M</given-names></name><name><surname>Saghatelian</surname><given-names>A</given-names></name></person-group><year>2014</year><article-title>Discovery of human sORF-encoded polypeptides (SEPs) in cell lines and tissue</article-title><source>Journal of Proteome Research</source><volume>13</volume><fpage>1757</fpage><lpage>1765</lpage><pub-id pub-id-type="doi">10.1021/pr401280w</pub-id></element-citation></ref><ref id="bib56"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Magny</surname><given-names>EG</given-names></name><name><surname>Pueyo</surname><given-names>JI</given-names></name><name><surname>Pearl</surname><given-names>FM</given-names></name><name><surname>Cespedes</surname><given-names>MA</given-names></name><name><surname>Niven</surname><given-names>JE</given-names></name><name><surname>Bishop</surname><given-names>SA</given-names></name><name><surname>Couso</surname><given-names>JP</given-names></name></person-group><year>2013</year><article-title>Conserved regulation of cardiac calcium uptake by peptides encoded in small open reading 
frames</article-title><source>Science</source><volume>341</volume><fpage>1116</fpage><lpage>1120</lpage><pub-id pub-id-type="doi">10.1126/science.1238802</pub-id></element-citation></ref><ref id="bib57"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>McManus</surname><given-names>CJ</given-names></name><name><surname>May</surname><given-names>GE</given-names></name><name><surname>Spealman</surname><given-names>P</given-names></name><name><surname>Shteyman</surname><given-names>A</given-names></name></person-group><year>2014</year><article-title>Ribosome profiling reveals post-transcriptional buffering of divergent gene expression in yeast</article-title><source>Genome Research</source><volume>24</volume><fpage>422</fpage><lpage>430</lpage><pub-id pub-id-type="doi">10.1101/gr.164996.113</pub-id></element-citation></ref><ref id="bib58"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Michel</surname><given-names>AM</given-names></name><name><surname>Choudhury</surname><given-names>KR</given-names></name><name><surname>Firth</surname><given-names>AE</given-names></name><name><surname>Ingolia</surname><given-names>NT</given-names></name><name><surname>Atkins</surname><given-names>JF</given-names></name><name><surname>Baranov</surname><given-names>PV</given-names></name></person-group><year>2012</year><article-title>Observation of dually decoded regions of the human genome using ribosome profiling data</article-title><source>Genome Research</source><volume>22</volume><fpage>2219</fpage><lpage>2229</lpage><pub-id pub-id-type="doi">10.1101/gr.133249.111</pub-id></element-citation></ref><ref id="bib59"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Nagalakshmi</surname><given-names>U</given-names></name><name><surname>Wang</surname><given-names>Z</given-names></name><name><surname>Waern</surname><given-names>K</given-names></name><name><surname>Shou</surname><given-names>C</given-names></name><name><surname>Raha</surname><given-names>D</given-names></name><name><surname>Gerstein</surname><given-names>M</given-names></name><name><surname>Snyder</surname><given-names>M</given-names></name></person-group><year>2008</year><article-title>The transcriptional landscape of the yeast genome defined by RNA sequencing</article-title><source>Science</source><volume>320</volume><fpage>1344</fpage><lpage>1349</lpage><pub-id pub-id-type="doi">10.1126/science.1158441</pub-id></element-citation></ref><ref id="bib60"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Necsulea</surname><given-names>A</given-names></name><name><surname>Soumillon</surname><given-names>M</given-names></name><name><surname>Warnefors</surname><given-names>M</given-names></name><name><surname>Liechti</surname><given-names>A</given-names></name><name><surname>Daish</surname><given-names>T</given-names></name><name><surname>Zeller</surname><given-names>U</given-names></name><name><surname>Baker</surname><given-names>JC</given-names></name><name><surname>Grützner</surname><given-names>F</given-names></name><name><surname>Kaessmann</surname><given-names>H</given-names></name></person-group><year>2014</year><article-title>The evolution of lncRNA repertoires and expression patterns in tetrapods</article-title><source>Nature</source><volume>505</volume><fpage>635</fpage><lpage>640</lpage><pub-id pub-id-type="doi">10.1038/nature12943</pub-id></element-citation></ref><ref id="bib61"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Nei</surname><given-names>M</given-names></name><name><surname>Gojobori</surname><given-names>T</given-names></name></person-group><year>1986</year><article-title>Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions</article-title><source>Molecular Biology and Evolution</source><volume>3</volume><fpage>418</fpage><lpage>426</lpage></element-citation></ref><ref id="bib62"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Neme</surname><given-names>R</given-names></name><name><surname>Tautz</surname><given-names>D</given-names></name></person-group><year>2013</year><article-title>Phylogenetic patterns of emergence of new genes support a model of frequent de novo evolution</article-title><source>BMC Genomics</source><volume>14</volume><fpage>117</fpage><pub-id pub-id-type="doi">10.1186/1471-2164-14-117</pub-id></element-citation></ref><ref id="bib63"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Neme</surname><given-names>R</given-names></name><name><surname>Tautz</surname><given-names>D</given-names></name></person-group><year>2014</year><article-title>Evolution: dynamics of de novo gene emergence</article-title><source>Current Biology</source><volume>24</volume><fpage>R238</fpage><lpage>R240</lpage><pub-id pub-id-type="doi">10.1016/j.cub.2014.02.016</pub-id></element-citation></ref><ref id="bib64"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Okazaki</surname><given-names>Y</given-names></name><name><surname>Furuno</surname><given-names>M</given-names></name><name><surname>Kasukawa</surname><given-names>T</given-names></name><name><surname>Adachi</surname><given-names>J</given-names></name><name><surname>Bono</surname><given-names>H</given-names></name><name><surname>Kondo</surname><given-names>S</given-names></name><name><surname>Nikaido</surname><given-names>I</given-names></name><name><surname>Osato</surname><given-names>N</given-names></name><name><surname>Saito</surname><given-names>R</given-names></name><name><surname>Suzuki</surname><given-names>H</given-names></name><name><surname>Yamanaka</surname><given-names>I</given-names></name><name><surname>Kiyosawa</surname><given-names>H</given-names></name><name><surname>Yagi</surname><given-names>K</given-names></name><name><surname>Tomaru</surname><given-names>Y</given-names></name><name><surname>Hasegawa</surname><given-names>Y</given-names></name><name><surname>Nogami</surname><given-names>A</given-names></name><name><surname>Schönbach</surname><given-names>C</given-names></name><name><surname>Gojobori</surname><given-names>T</given-names></name><name><surname>Baldarelli</surname><given-names>R</given-names></name><name><surname>Hill</surname><given-names>DP</given-names></name><name><surname>Bult</surname><given-names>C</given-names></name><name><surname>Hume</surname><given-names>DA</given-names></name><name><surname>Quackenbush</surname><given-names>J</given-names></name><name><surname>Schriml</surname><given-names>LM</given-names></name><name><surname>Kanapin</surname><given-names>A</given-names></name><name><surname>Matsuda</surname><given-names>H</given-names></name><name><surname>Batalov</surname><given-names>S</given-names></name><name><surname>Beisel</surname><given-names>KW</given-names></name><name><surname>Blake</surname><given-names>JA</given-names></name><name><surname>Bradt</surname><given-nam
es>D</given-names></name><name><surname>Brusic</surname><given-names>V</given-names></name><name><surname>Chothia</surname><given-names>C</given-names></name><name><surname>Corbani</surname><given-names>LE</given-names></name><name><surname>Cousins</surname><given-names>S</given-names></name><name><surname>Dalla</surname><given-names>E</given-names></name><name><surname>Dragani</surname><given-names>TA</given-names></name><name><surname>Fletcher</surname><given-names>CF</given-names></name><name><surname>Forrest</surname><given-names>A</given-names></name><name><surname>Frazer</surname><given-names>KS</given-names></name><name><surname>Gaasterland</surname><given-names>T</given-names></name><name><surname>Gariboldi</surname><given-names>M</given-names></name><name><surname>Gissi</surname><given-names>C</given-names></name><name><surname>Godzik</surname><given-names>A</given-names></name><name><surname>Gough</surname><given-names>J</given-names></name><name><surname>Grimmond</surname><given-names>S</given-names></name><name><surname>Gustincich</surname><given-names>S</given-names></name><name><surname>Hirokawa</surname><given-names>N</given-names></name><name><surname>Jackson</surname><given-names>IJ</given-names></name><name><surname>Jarvis</surname><given-names>ED</given-names></name><name><surname>Kanai</surname><given-names>A</given-names></name><name><surname>Kawaji</surname><given-names>H</given-names></name><name><surname>Kawasawa</surname><given-names>Y</given-names></name><name><surname>Kedzierski</surname><given-names>RM</given-names></name><name><surname>King</surname><given-names>BL</given-names></name><name><surname>Konagaya</surname><given-names>A</given-names></name><name><surname>Kurochkin</surname><given-names>IV</given-names></name><name><surname>Lee</surname><given-names>Y</given-names></name><name><surname>Lenhard</surname><given-names>B</given-names></name><name><surname>Lyons</surname><given-names>PA</given-names></name><name><surname>Maglott</s
urname><given-names>DR</given-names></name><name><surname>Maltais</surname><given-names>L</given-names></name><name><surname>Marchionni</surname><given-names>L</given-names></name><name><surname>McKenzie</surname><given-names>L</given-names></name><name><surname>Miki</surname><given-names>H</given-names></name><name><surname>Nagashima</surname><given-names>T</given-names></name><name><surname>Numata</surname><given-names>K</given-names></name><name><surname>Okido</surname><given-names>T</given-names></name><name><surname>Pavan</surname><given-names>WJ</given-names></name><name><surname>Pertea</surname><given-names>G</given-names></name><name><surname>Pesole</surname><given-names>G</given-names></name><name><surname>Petrovsky</surname><given-names>N</given-names></name><name><surname>Pillai</surname><given-names>R</given-names></name><name><surname>Pontius</surname><given-names>JU</given-names></name><name><surname>Qi</surname><given-names>D</given-names></name><name><surname>Ramachandran</surname><given-names>S</given-names></name><name><surname>Ravasi</surname><given-names>T</given-names></name><name><surname>Reed</surname><given-names>JC</given-names></name><name><surname>Reed</surname><given-names>DJ</given-names></name><name><surname>Reid</surname><given-names>J</given-names></name><name><surname>Ring</surname><given-names>BZ</given-names></name><name><surname>Ringwald</surname><given-names>M</given-names></name><name><surname>Sandelin</surname><given-names>A</given-names></name><name><surname>Schneider</surname><given-names>C</given-names></name><name><surname>Semple</surname><given-names>CA</given-names></name><name><surname>Setou</surname><given-names>M</given-names></name><name><surname>Shimada</surname><given-names>K</given-names></name><name><surname>Sultana</surname><given-names>R</given-names></name><name><surname>Takenaka</surname><given-names>Y</given-names></name><name><surname>Taylor</surname><given-names>MS</given-names></name><name><surname>Teasdal
e</surname><given-names>RD</given-names></name><name><surname>Tomita</surname><given-names>M</given-names></name><name><surname>Verardo</surname><given-names>R</given-names></name><name><surname>Wagner</surname><given-names>L</given-names></name><name><surname>Wahlestedt</surname><given-names>C</given-names></name><name><surname>Wang</surname><given-names>Y</given-names></name><name><surname>Watanabe</surname><given-names>Y</given-names></name><name><surname>Wells</surname><given-names>C</given-names></name><name><surname>Wilming</surname><given-names>LG</given-names></name><name><surname>Wynshaw-Boris</surname><given-names>A</given-names></name><name><surname>Yanagisawa</surname><given-names>M</given-names></name><name><surname>Yang</surname><given-names>I</given-names></name><name><surname>Yang</surname><given-names>L</given-names></name><name><surname>Yuan</surname><given-names>Z</given-names></name><name><surname>Zavolan</surname><given-names>M</given-names></name><name><surname>Zhu</surname><given-names>Y</given-names></name><name><surname>Zimmer</surname><given-names>A</given-names></name><name><surname>Carninci</surname><given-names>P</given-names></name><name><surname>Hayatsu</surname><given-names>N</given-names></name><name><surname>Hirozane-Kishikawa</surname><given-names>T</given-names></name><name><surname>Konno</surname><given-names>H</given-names></name><name><surname>Nakamura</surname><given-names>M</given-names></name><name><surname>Sakazume</surname><given-names>N</given-names></name><name><surname>Sato</surname><given-names>K</given-names></name><name><surname>Shiraki</surname><given-names>T</given-names></name><name><surname>Waki</surname><given-names>K</given-names></name><name><surname>Kawai</surname><given-names>J</given-names></name><name><surname>Aizawa</surname><given-names>K</given-names></name><name><surname>Arakawa</surname><given-names>T</given-names></name><name><surname>Fukuda</surname><given-names>S</given-names></name><name><surname>
Hara</surname><given-names>A</given-names></name><name><surname>Hashizume</surname><given-names>W</given-names></name><name><surname>Imotani</surname><given-names>K</given-names></name><name><surname>Ishii</surname><given-names>Y</given-names></name><name><surname>Itoh</surname><given-names>M</given-names></name><name><surname>Kagawa</surname><given-names>I</given-names></name><name><surname>Miyazaki</surname><given-names>A</given-names></name><name><surname>Sakai</surname><given-names>K</given-names></name><name><surname>Sasaki</surname><given-names>D</given-names></name><name><surname>Shibata</surname><given-names>K</given-names></name><name><surname>Shinagawa</surname><given-names>A</given-names></name><name><surname>Yasunishi</surname><given-names>A</given-names></name><name><surname>Yoshino</surname><given-names>M</given-names></name><name><surname>Waterston</surname><given-names>R</given-names></name><name><surname>Lander</surname><given-names>ES</given-names></name><name><surname>Rogers</surname><given-names>J</given-names></name><name><surname>Birney</surname><given-names>E</given-names></name><name><surname>Hayashizaki</surname><given-names>Y</given-names></name>, <collab>FANTOM Consortium</collab>, <collab>RIKEN Genome Exploration Research Group Phase I &amp; II Team</collab></person-group><year>2002</year><article-title>Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs</article-title><source>Nature</source><volume>420</volume><fpage>563</fpage><lpage>573</lpage><pub-id pub-id-type="doi">10.1038/nature01266</pub-id></element-citation></ref><ref id="bib65"><element-citation publication-type="journal"><person-group
person-group-type="author"><name><surname>Ovcharenko</surname><given-names>I</given-names></name><name><surname>Loots</surname><given-names>GG</given-names></name><name><surname>Nobrega</surname><given-names>MA</given-names></name><name><surname>Hardison</surname><given-names>RC</given-names></name><name><surname>Miller</surname><given-names>W</given-names></name><name><surname>Stubbs</surname><given-names>L</given-names></name></person-group><year>2005</year><article-title>Evolution and functional classification of vertebrate gene deserts</article-title><source>Genome Research</source><volume>15</volume><fpage>137</fpage><lpage>145</lpage><pub-id pub-id-type="doi">10.1101/gr.3015505</pub-id></element-citation></ref><ref id="bib66"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Palmieri</surname><given-names>N</given-names></name><name><surname>Kosiol</surname><given-names>C</given-names></name><name><surname>Schlötterer</surname><given-names>C</given-names></name></person-group><year>2014</year><article-title>The life cycle of <italic>Drosophila</italic> orphan genes</article-title><source>eLife</source><volume>3</volume><fpage>e01311</fpage><pub-id pub-id-type="doi">10.7554/eLife.01311</pub-id></element-citation></ref><ref id="bib67"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Pauli</surname><given-names>A</given-names></name><name><surname>Norris</surname><given-names>ML</given-names></name><name><surname>Valen</surname><given-names>E</given-names></name><name><surname>Chew</surname><given-names>G</given-names></name><name><surname>Gagnon</surname><given-names>JA</given-names></name><name><surname>Zimmerman</surname><given-names>S</given-names></name><name><surname>Mitchell</surname><given-names>A</given-names></name><name><surname>Ma</surname><given-names>J</given-names></name><name><surname>Dubrulle</surname><given-names>J</given-names></name><name><surname>Reyon</surname><given-names>D</given-names></name><name><surname>Tsai</surname><given-names>SQ</given-names></name><name><surname>Joung</surname><given-names>JK</given-names></name><name><surname>Saghatelian</surname><given-names>A</given-names></name><name><surname>Schier</surname><given-names>AF</given-names></name></person-group><year>2014</year><article-title>Toddler: an embryonic signal that promotes cell movement via Apelin receptors</article-title><source>Science</source><volume>343</volume><fpage>1248636</fpage><pub-id pub-id-type="doi">10.1126/science.1248636</pub-id></element-citation></ref><ref id="bib68"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Pauli</surname><given-names>A</given-names></name><name><surname>Valen</surname><given-names>E</given-names></name><name><surname>Lin</surname><given-names>MF</given-names></name><name><surname>Garber</surname><given-names>M</given-names></name><name><surname>Vastenhouw</surname><given-names>NL</given-names></name><name><surname>Levin</surname><given-names>JZ</given-names></name><name><surname>Fan</surname><given-names>L</given-names></name><name><surname>Sandelin</surname><given-names>A</given-names></name><name><surname>Rinn</surname><given-names>JL</given-names></name><name><surname>Regev</surname><given-names>A</given-names></name><name><surname>Schier</surname><given-names>AF</given-names></name></person-group><year>2012</year><article-title>Systematic identification of long noncoding RNAs expressed during zebrafish embryogenesis</article-title><source>Genome Research</source><volume>22</volume><fpage>577</fpage><lpage>591</lpage><pub-id pub-id-type="doi">10.1101/gr.133009.111.2011</pub-id></element-citation></ref><ref id="bib69"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ponjavic</surname><given-names>J</given-names></name><name><surname>Ponting</surname><given-names>CP</given-names></name><name><surname>Lunter</surname><given-names>G</given-names></name></person-group><year>2007</year><article-title>Functionality or transcriptional noise? 
Evidence for selection within long noncoding RNAs</article-title><source>Genome Research</source><volume>17</volume><fpage>556</fpage><lpage>565</lpage><pub-id pub-id-type="doi">10.1101/gr.6036807</pub-id></element-citation></ref><ref id="bib70"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ponting</surname><given-names>CP</given-names></name><name><surname>Oliver</surname><given-names>PL</given-names></name><name><surname>Reik</surname><given-names>W</given-names></name></person-group><year>2009</year><article-title>Evolution and functions of long noncoding RNAs</article-title><source>Cell</source><volume>136</volume><fpage>629</fpage><lpage>641</lpage><pub-id pub-id-type="doi">10.1016/j.cell.2009.02.006</pub-id></element-citation></ref><ref id="bib71"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Pruitt</surname><given-names>KD</given-names></name><name><surname>Brown</surname><given-names>GR</given-names></name><name><surname>Hiatt</surname><given-names>SM</given-names></name><name><surname>Thibaud-Nissen</surname><given-names>F</given-names></name><name><surname>Astashyn</surname><given-names>A</given-names></name><name><surname>Ermolaeva</surname><given-names>O</given-names></name><name><surname>Farrell</surname><given-names>CM</given-names></name><name><surname>Hart</surname><given-names>J</given-names></name><name><surname>Landrum</surname><given-names>MJ</given-names></name><name><surname>McGarvey</surname><given-names>KM</given-names></name><name><surname>Murphy</surname><given-names>MR</given-names></name><name><surname>O'Leary</surname><given-names>NA</given-names></name><name><surname>Pujar</surname><given-names>S</given-names></name><name><surname>Rajput</surname><given-names>B</given-names></name><name><surname>Rangwala</surname><given-names>SH</given-names></name><name><surname>Riddick</surname><given-names>LD</given-names></name><name><surnam
e>Shkeda</surname><given-names>A</given-names></name><name><surname>Sun</surname><given-names>H</given-names></name><name><surname>Tamez</surname><given-names>P</given-names></name><name><surname>Tully</surname><given-names>RE</given-names></name><name><surname>Wallin</surname><given-names>C</given-names></name><name><surname>Webb</surname><given-names>D</given-names></name><name><surname>Weber</surname><given-names>J</given-names></name><name><surname>Wu</surname><given-names>W</given-names></name><name><surname>DiCuccio</surname><given-names>M</given-names></name><name><surname>Kitts</surname><given-names>P</given-names></name><name><surname>Maglott</surname><given-names>DR</given-names></name><name><surname>Murphy</surname><given-names>TD</given-names></name><name><surname>Ostell</surname><given-names>JM</given-names></name></person-group><year>2014</year><article-title>RefSeq: an update on mammalian reference sequences</article-title><source>Nucleic Acids Research</source><volume>42</volume><fpage>D756</fpage><lpage>D763</lpage><pub-id pub-id-type="doi">10.1093/nar/gkt1114</pub-id></element-citation></ref><ref id="bib72"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Quinlan</surname><given-names>AR</given-names></name><name><surname>Hall</surname><given-names>IM</given-names></name></person-group><year>2010</year><article-title>BEDTools: a flexible suite of utilities for comparing genomic features</article-title><source>Bioinformatics</source><volume>26</volume><fpage>841</fpage><lpage>842</lpage><pub-id pub-id-type="doi">10.1093/bioinformatics/btq033</pub-id></element-citation></ref><ref id="bib73"><element-citation publication-type="book"><person-group person-group-type="author"><collab>R Development Core Team</collab></person-group><year>2010</year><article-title>R: a language and environment for statistical computing</article-title><source>R Foundation for statistical 
computing</source><publisher-loc>Vienna Austria</publisher-loc><publisher-name>R Foundation for statistical computing</publisher-name></element-citation></ref><ref id="bib74"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Reinhardt</surname><given-names>JA</given-names></name><name><surname>Wanjiru</surname><given-names>BM</given-names></name><name><surname>Brant</surname><given-names>AT</given-names></name><name><surname>Saelao</surname><given-names>P</given-names></name><name><surname>Begun</surname><given-names>DJ</given-names></name><name><surname>Jones</surname><given-names>CD</given-names></name></person-group><year>2013</year><article-title>De novo ORFs in <italic>Drosophila</italic> are important to organismal fitness and evolved rapidly from previously non-coding sequences</article-title><source>PLOS Genetics</source><volume>9</volume><fpage>e1003860</fpage><pub-id pub-id-type="doi">10.1371/journal.pgen.1003860</pub-id></element-citation></ref><ref id="bib75"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Savard</surname><given-names>J</given-names></name><name><surname>Marques-Souza</surname><given-names>H</given-names></name><name><surname>Aranda</surname><given-names>M</given-names></name><name><surname>Tautz</surname><given-names>D</given-names></name></person-group><year>2006</year><article-title>A segmentation gene in tribolium produces a polycistronic mRNA that codes for multiple conserved peptides</article-title><source>Cell</source><volume>126</volume><fpage>559</fpage><lpage>569</lpage><pub-id pub-id-type="doi">10.1016/j.cell.2006.05.053</pub-id></element-citation></ref><ref id="bib76"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Scofield</surname><given-names>DG</given-names></name><name><surname>Hong</surname><given-names>X</given-names></name><name><surname>Lynch</surname><given-names>M</given-names></name></person-group><year>2007</year><article-title>Position of the final intron in full-length transcripts: determined by NMD?</article-title><source>Molecular Biology and Evolution</source><volume>24</volume><fpage>896</fpage><lpage>899</lpage><pub-id pub-id-type="doi">10.1093/molbev/msm010</pub-id></element-citation></ref><ref id="bib77"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Sherry</surname><given-names>ST</given-names></name><name><surname>Ward</surname><given-names>MH</given-names></name><name><surname>Kholodov</surname><given-names>M</given-names></name><name><surname>Baker</surname><given-names>J</given-names></name><name><surname>Phan</surname><given-names>L</given-names></name><name><surname>Smigielski</surname><given-names>EM</given-names></name><name><surname>Sirotkin</surname><given-names>K</given-names></name></person-group><year>2001</year><article-title>dbSNP: the NCBI database of genetic variation</article-title><source>Nucleic Acids Research</source><volume>29</volume><fpage>308</fpage><lpage>311</lpage><pub-id pub-id-type="doi">10.1093/nar/29.1.308</pub-id></element-citation></ref><ref id="bib78"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Slavoff</surname><given-names>SA</given-names></name><name><surname>Heo</surname><given-names>J</given-names></name><name><surname>Budnik</surname><given-names>BA</given-names></name><name><surname>Hanakahi</surname><given-names>LA</given-names></name><name><surname>Saghatelian</surname><given-names>A</given-names></name></person-group><year>2014</year><article-title>A human short open reading frame (sORF)-encoded polypeptide that stimulates DNA end 
joining</article-title><source>The Journal of Biological Chemistry</source><volume>289</volume><fpage>10950</fpage><lpage>10957</lpage><pub-id pub-id-type="doi">10.1074/jbc.C113.533968</pub-id></element-citation></ref><ref id="bib79"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Slavoff</surname><given-names>SA</given-names></name><name><surname>Mitchell</surname><given-names>AJ</given-names></name><name><surname>Schwaid</surname><given-names>AG</given-names></name><name><surname>Cabili</surname><given-names>MN</given-names></name><name><surname>Ma</surname><given-names>J</given-names></name><name><surname>Levin</surname><given-names>JZ</given-names></name><name><surname>Karger</surname><given-names>AD</given-names></name><name><surname>Budnik</surname><given-names>BA</given-names></name><name><surname>Rinn</surname><given-names>JL</given-names></name><name><surname>Saghatelian</surname><given-names>A</given-names></name></person-group><year>2013</year><article-title>Peptidomic discovery of short open reading frame-encoded peptides in human cells</article-title><source>Nature Chemical Biology</source><volume>9</volume><fpage>59</fpage><lpage>64</lpage><pub-id pub-id-type="doi">10.1038/nchembio.1120</pub-id></element-citation></ref><ref id="bib80"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Smeds</surname><given-names>L</given-names></name><name><surname>Künstner</surname><given-names>A</given-names></name></person-group><year>2011</year><article-title>ConDe Tri - a content dependent read trimmer for illumina data</article-title><source>PLOS ONE</source><volume>6</volume><fpage>e26314</fpage><pub-id pub-id-type="doi">10.1371/journal.pone.0026314</pub-id></element-citation></ref><ref id="bib81"><element-citation publication-type="journal"><person-group 
person-group-type="author"><name><surname>Sun</surname><given-names>L</given-names></name><name><surname>Luo</surname><given-names>H</given-names></name><name><surname>Bu</surname><given-names>D</given-names></name><name><surname>Zhao</surname><given-names>G</given-names></name><name><surname>Yu</surname><given-names>K</given-names></name><name><surname>Zhang</surname><given-names>C</given-names></name><name><surname>Liu</surname><given-names>Y</given-names></name><name><surname>Chen</surname><given-names>R</given-names></name><name><surname>Zhao</surname><given-names>Y</given-names></name></person-group><year>2013</year><article-title>Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts</article-title><source>Nucleic Acids Research</source><volume>41</volume><fpage>e166</fpage><pub-id pub-id-type="doi">10.1093/nar/gkt646</pub-id></element-citation></ref><ref id="bib82"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tani</surname><given-names>H</given-names></name><name><surname>Torimura</surname><given-names>M</given-names></name><name><surname>Akimitsu</surname><given-names>N</given-names></name></person-group><year>2013</year><article-title>The RNA degradation pathway regulates the function of GAS5 a non-coding RNA in mammalian cells</article-title><source>PLOS ONE</source><volume>8</volume><fpage>e55684</fpage><pub-id pub-id-type="doi">10.1371/journal.pone.0055684</pub-id></element-citation></ref><ref id="bib83"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tautz</surname><given-names>D</given-names></name></person-group><year>2009</year><article-title>Polycistronic peptide coding genes in eukaryotes–how widespread are they?</article-title><source>Briefings in Functional Genomics &amp; Proteomics</source><volume>8</volume><fpage>68</fpage><lpage>74</lpage><pub-id
pub-id-type="doi">10.1093/bfgp/eln054</pub-id></element-citation></ref><ref id="bib84"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Tautz</surname><given-names>D</given-names></name><name><surname>Domazet-Lošo</surname><given-names>T</given-names></name></person-group><year>2011</year><article-title>The evolutionary origin of orphan genes</article-title><source>Nature Reviews Genetics</source><volume>12</volume><fpage>692</fpage><lpage>702</lpage><pub-id pub-id-type="doi">10.1038/nrg3053</pub-id></element-citation></ref><ref id="bib85"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Toll-Riera</surname><given-names>M</given-names></name><name><surname>Bosch</surname><given-names>N</given-names></name><name><surname>Bellora</surname><given-names>N</given-names></name><name><surname>Castelo</surname><given-names>R</given-names></name><name><surname>Armengol</surname><given-names>L</given-names></name><name><surname>Estivill</surname><given-names>X</given-names></name><name><surname>Albà</surname><given-names>MM</given-names></name></person-group><year>2009</year><article-title>Origin of primate orphan genes: a comparative genomics approach</article-title><source>Molecular Biology and Evolution</source><volume>26</volume><fpage>603</fpage><lpage>612</lpage><pub-id pub-id-type="doi">10.1093/molbev/msn281</pub-id></element-citation></ref><ref id="bib86"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Trapnell</surname><given-names>C</given-names></name><name><surname>Williams</surname><given-names>BA</given-names></name><name><surname>Pertea</surname><given-names>G</given-names></name><name><surname>Mortazavi</surname><given-names>A</given-names></name><name><surname>Kwan</surname><given-names>G</given-names></name><name><surname>van 
Baren</surname><given-names>MJ</given-names></name><name><surname>Salzberg</surname><given-names>SL</given-names></name><name><surname>Wold</surname><given-names>BJ</given-names></name><name><surname>Pachter</surname><given-names>L</given-names></name></person-group><year>2010</year><article-title>Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation</article-title><source>Nature Biotechnology</source><volume>28</volume><fpage>511</fpage><lpage>515</lpage><pub-id pub-id-type="doi">10.1038/nbt.1621</pub-id></element-citation></ref><ref id="bib87"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Ulitsky</surname><given-names>I</given-names></name><name><surname>Bartel</surname><given-names>DP</given-names></name></person-group><year>2013</year><article-title>lincRNAs: genomics, evolution, and mechanisms</article-title><source>Cell</source><volume>154</volume><fpage>26</fpage><lpage>46</lpage><pub-id pub-id-type="doi">10.1016/j.cell.2013.06.020</pub-id></element-citation></ref><ref id="bib88"><element-citation publication-type="journal"><person-group person-group-type="author"><collab>UniProt Consortium</collab></person-group><year>2014</year><article-title>Activities at the Universal Protein Resource (UniProt)</article-title><source>Nucleic Acids Research</source><volume>42</volume><fpage>D191</fpage><lpage>D198</lpage><pub-id pub-id-type="doi">10.1093/nar/gkt1140</pub-id></element-citation></ref><ref id="bib89"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>van Heesch</surname><given-names>S</given-names></name><name><surname>van 
Iterson</surname><given-names>M</given-names></name><name><surname>Jacobi</surname><given-names>J</given-names></name><name><surname>Boymans</surname><given-names>S</given-names></name><name><surname>Essers</surname><given-names>PB</given-names></name><name><surname>de Bruijn</surname><given-names>E</given-names></name><name><surname>Hao</surname><given-names>W</given-names></name><name><surname>Macinnes</surname><given-names>AW</given-names></name><name><surname>Cuppen</surname><given-names>E</given-names></name><name><surname>Simonis</surname><given-names>M</given-names></name></person-group><year>2014</year><article-title>Extensive localization of long noncoding RNAs to the cytosol and mono- and polyribosomal complexes</article-title><source>Genome Biology</source><volume>15</volume><fpage>R6</fpage><pub-id pub-id-type="doi">10.1186/gb-2014-15-1-r6</pub-id></element-citation></ref><ref id="bib90"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Vanderperre</surname><given-names>B</given-names></name><name><surname>Lucier</surname><given-names>JF</given-names></name><name><surname>Bissonnette</surname><given-names>C</given-names></name><name><surname>Motard</surname><given-names>J</given-names></name><name><surname>Tremblay</surname><given-names>G</given-names></name><name><surname>Vanderperre</surname><given-names>S</given-names></name><name><surname>Wisztorski</surname><given-names>M</given-names></name><name><surname>Salzet</surname><given-names>M</given-names></name><name><surname>Boisvert</surname><given-names>FM</given-names></name><name><surname>Roucou</surname><given-names>X</given-names></name></person-group><year>2013</year><article-title>Direct detection of alternative open reading frames translation products in human significantly expands the proteome</article-title><source>PLOS ONE</source><volume>8</volume><fpage>e70698</fpage><pub-id 
pub-id-type="doi">10.1371/journal.pone.0070698</pub-id></element-citation></ref><ref id="bib91"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Vasquez</surname><given-names>JJ</given-names></name><name><surname>Hon</surname><given-names>CC</given-names></name><name><surname>Vanselow</surname><given-names>JT</given-names></name><name><surname>Schlosser</surname><given-names>A</given-names></name><name><surname>Siegel</surname><given-names>TN</given-names></name></person-group><year>2014</year><article-title>Comparative ribosome profiling reveals extensive translational complexity in different <italic>Trypanosoma brucei</italic> life cycle stages</article-title><source>Nucleic Acids Research</source><volume>42</volume><fpage>3623</fpage><lpage>3637</lpage><pub-id pub-id-type="doi">10.1093/nar/gkt1386</pub-id></element-citation></ref><ref id="bib92"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wang</surname><given-names>L</given-names></name><name><surname>Park</surname><given-names>HJ</given-names></name><name><surname>Dasari</surname><given-names>S</given-names></name><name><surname>Wang</surname><given-names>S</given-names></name><name><surname>Kocher</surname><given-names>JP</given-names></name><name><surname>Li</surname><given-names>W</given-names></name></person-group><year>2013</year><article-title>CPAT: Coding-Potential Assessment Tool using an alignment-free logistic regression model</article-title><source>Nucleic Acids Research</source><volume>41</volume><fpage>e74</fpage><pub-id pub-id-type="doi">10.1093/nar/gkt006</pub-id></element-citation></ref><ref id="bib93"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wilson</surname><given-names>BA</given-names></name><name><surname>Masel</surname><given-names>J</given-names></name></person-group><year>2011</year><article-title>Putatively noncoding 
transcripts show extensive association with ribosomes</article-title><source>Genome Biology and Evolution</source><volume>3</volume><fpage>1245</fpage><lpage>1252</lpage><pub-id pub-id-type="doi">10.1093/gbe/evr099</pub-id></element-citation></ref><ref id="bib94"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Wissler</surname><given-names>L</given-names></name><name><surname>Gadau</surname><given-names>J</given-names></name><name><surname>Simola</surname><given-names>DF</given-names></name><name><surname>Helmkampf</surname><given-names>M</given-names></name><name><surname>Bornberg-Bauer</surname><given-names>E</given-names></name></person-group><year>2013</year><article-title>Mechanisms and dynamics of orphan gene emergence in insect genomes</article-title><source>Genome Biology and Evolution</source><volume>5</volume><fpage>439</fpage><lpage>455</lpage><pub-id pub-id-type="doi">10.1093/gbe/evt009</pub-id></element-citation></ref><ref id="bib95"><element-citation publication-type="journal"><person-group person-group-type="author"><name><surname>Xie</surname><given-names>C</given-names></name><name><surname>Zhang</surname><given-names>YE</given-names></name><name><surname>Chen</surname><given-names>JY</given-names></name><name><surname>Liu</surname><given-names>CJ</given-names></name><name><surname>Zhou</surname><given-names>WZ</given-names></name><name><surname>Li</surname><given-names>Y</given-names></name><name><surname>Zhang</surname><given-names>M</given-names></name><name><surname>Zhang</surname><given-names>R</given-names></name><name><surname>Wei</surname><given-names>L</given-names></name><name><surname>Li</surname><given-names>CY</given-names></name></person-group><year>2012</year><article-title>Hominoid-specific de novo protein-coding genes originating from long non-coding RNAs</article-title><source>PLOS Genetics</source><volume>8</volume><fpage>e1002942</fpage><pub-id 
pub-id-type="doi">10.1371/journal.pgen.1002942</pub-id></element-citation></ref></ref-list></back><sub-article article-type="article-commentary" id="SA1"><front-stub><article-id pub-id-type="doi">10.7554/eLife.03523.026</article-id><title-group><article-title>Decision letter</article-title></title-group><contrib-group content-type="section"><contrib contrib-type="editor"><name><surname>Tautz</surname><given-names>Diethard</given-names></name><role>Reviewing editor</role><aff><institution>Max Planck Institute for Evolutionary Biology</institution>, <country>Germany</country></aff></contrib></contrib-group></front-stub><body><boxed-text><p>eLife posts the editorial decision letter and author response on a selection of the published articles (subject to the approval of the authors). An edited version of the letter sent to the authors after peer review is shown, indicating the substantive concerns or comments; minor concerns are not usually shown. Reviewers have the opportunity to discuss the decision before the letter is sent (see <ext-link ext-link-type="uri" xlink:href="http://elifesciences.org/review-process">review process</ext-link>). Similarly, the author response typically shows only responses to the major concerns raised by the reviewers.</p></boxed-text><p>Thank you for sending your work entitled “Long non-coding RNAs as a source of new peptides” for consideration at <italic>eLife</italic>. Your article has been favorably evaluated by Aviv Regev (Senior editor) and 3 reviewers, one of whom is a member of our Board of Reviewing Editors.</p><p>The Reviewing editor and the other reviewers discussed their comments extensively before we reached this decision, and the Reviewing editor has assembled the following comments to help you prepare a revised submission.</p><p>This paper adds to the current active discussion on the coding potential of lncRNAs, the role of short open reading frames and the emergence of new genes. 
The authors use published ribosome association datasets, but use several analysis pipelines that go beyond the analysis that has previously been done with these data. However, there are two comparable published papers that do similar analysis, namely <xref ref-type="bibr" rid="bib37">Ingolia et al. 2011</xref> and Guttman et al. 2013. While the former had suggested much translation of lncRNAs, the latter denies this, although there is some overlap of authors.</p><p>Major comments that need to be addressed by additional analyses and/or clarification:</p><p>The crucial point is in how far ribosome associations are partly artifacts. The fraction of lncRNAs that the authors find to be associated with ribosomes is very large. Is this because the vast majority of transcripts actually are scanned by ribosomes, or could this observation be an artifact of the way the ribosome profiling data was analyzed? Pseudo-genes, and bona-fide human lncRNAs with known non-coding functions, were investigated, but the authors found evidence of ribosome binding in these putative negative controls, i.e. possible evidence for artifacts. This issue needs to be resolved more clearly, since the current paper should go beyond the Guttman et al. 2013 line of arguments. It is necessary to provide a convincing demonstration that the analysis of ribosome profiling data is based on signal, not on noise. This could be done by different means, for instance by deriving null models describing what fraction of transcripts would be expected to be found associated with ribosomes if all of the ribosome profiling data was random, or by calculating otherwise a False Positive Rate or False Discovery Rate in the calling of “ribosome association” per transcript. You can also try something like the Bazzini 2014 or the Carvunis 2012 method. Another possibility is to choose a class of sequences with very low ribosomal association (maybe 3'UTRs are best) and use that as an upper bound on the false positive rate. 
The lower bound on the false positive rate is zero, and likely to remain there, but calculating an upper bound is something that should be added.</p><p>The claim is also made that these short and hard-to-annotate protein-coding genes look young according to protein-coding metrics and PN/PS. While plausible, it is also possible that they represent a mixture of genes of all ages combined with sequences that, while perhaps translated at some level, are not really genes in the functional sense of the word (at least not yet), and whose existence is therefore highly transient in evolutionary time. Contamination with these sequences could create the same statistical effect as having young genes. The presence of such contamination is also a critical piece of evidence in theories of how de novo protein birth occurs. This basically means that there are two interpretations of the data, both interesting, and not mutually exclusive. This needs to be better clarified. For instance the results of the BlastP search against codRNAs (supplementary file 8) and the results of the BlastX search against nr could be merged into one table or bar graph counting the number of BlastP and BlastX hits in lncRNA-noribo, lncRNA-ribo, and codRNA, separately, for each species.</p><p>It is unclear why the starved conditions (<xref ref-type="table" rid="tbl1">Table 1</xref>) were used in the yeast riboprofiling data. Starvation represses translation and therefore makes the data unreliable as a marker of translation. This should therefore be redone, perhaps with the rich media conditions of <xref ref-type="bibr" rid="bib36">Ingolia et al. 
2009</xref>, but if this needs to be redone anyway, ideally with the much higher coverage data of Artieri &amp; Fraser.</p></body></sub-article><sub-article article-type="reply" id="SA2"><front-stub><article-id pub-id-type="doi">10.7554/eLife.03523.027</article-id><title-group><article-title>Author response</article-title></title-group></front-stub><body><p>Following the editor’s recommendation we have constructed a null model for random ribosome binding based on the signal in annotated 3’UTRs. The null model can be rejected for about 90% of the lncRNAs, and a similar percentage of codRNAs, with p-value &lt; 0.05, confirming that the signal in lncRNAs is not random. We have also reanalysed the yeast transcriptome using data from a recently published study (McManus et al., 2014). Although the main findings are similar to those reported using the original dataset, the ribosome profiling sequencing read coverage is higher and the yeast growth conditions standard, making the results more representative. We have performed homology searches with coding RNAs and lncRNAs not associated with ribosomes (in addition to lncRNAs associated with ribosomes as done previously). The results clearly show that lncRNAs display limited phylogenetic conservation when compared to coding RNAs.</p><p>We have also deposited the genomic coordinates of all transcripts used in this study and the amino acid sequences corresponding to primary ORFs in lncRNA with significant coding scores in figshare (<ext-link ext-link-type="uri" xlink:href="http://dx.doi.org/10.6084/m9.figshare.1114969">http://dx.doi.org/10.6084/m9.figshare.1114969</ext-link>).</p><p><italic>The crucial point is in how far ribosome associations are partly artifacts. The fraction of lncRNAs that the authors find to be associated with ribosomes is very large. Is this because the vast majority of transcripts actually are scanned by ribosomes, or could this observation be an artifact of the way the ribosome profiling data was analyzed? 
Pseudo-genes, and bona-fide human lncRNAs with known non-coding functions, were investigated, but the authors found evidence of ribosome binding in these putative negative controls, i.e. possible evidence for artifacts. This issue needs to be resolved more clearly, since the current paper should go beyond the Guttman et al. 2013 line of arguments. It is necessary to provide a convincing demonstration that the analysis of ribosome profiling data is based on signal, not on noise. This could be done by different means, for instance by deriving null models describing what fraction of transcripts would be expected to be found associated with ribosomes if all of the ribosome profiling data was random, or by calculating otherwise a False Positive Rate or False Discovery Rate in the calling of “ribosome association” per transcript. You can also try something like the Bazzini 2014 or the Carvunis 2012 method. Another possibility is to choose a class of sequences with very low ribosomal association (maybe 3'UTRs are best) and use that as an upper bound on the false positive rate. The lower bound on the false positive rate is zero, and likely to remain there, but calculating an upper bound is something that should be added</italic>.</p><p>We have chosen as a null model annotated 3’UTRs from coding transcripts. The results provide strong evidence that the observed ribosome association in lncRNAs is not random and is similar to that in codRNAs. See below the paragraph added in the manuscript text:</p><p>“In order to determine if the ribosome profiling signal in lncRNAs was different from noise, we compared ribosome density in the transcripts to that in 3’untranslated regions (3’UTRs). More specifically, the null model consisted of a size-matched set of sequences containing randomly chosen 3’UTRs from annotated coding transcripts. 
Ribosome density was calculated as the number of ribosome profiling reads divided by RNA-seq reads, a ratio defined as Translational Efficiency (TE) (<xref ref-type="bibr" rid="bib37">Ingolia, Lareau, and Weissman 2011</xref>). Both codRNAs and lncRNAs displayed much higher TE values than 3’UTRs in all species studied (Wilcoxon test p-value &lt; 10<sup>-5</sup>, <xref ref-type="fig" rid="fig3">Figure 3</xref>). We could reject the null model for 90.12% of the lncRNAs and 87.19% of the codRNAs associated with ribosomes (p-value &lt; 0.05) (see details by species in <xref ref-type="table" rid="tbl2">Table 2</xref>, Stringent set). Therefore, we concluded that the density of ribosomes in lncRNAs is much higher than expected by spurious ribosome binding.”</p><p><italic>The claim is also made that these short and hard-to-annotate protein-coding genes look young according to protein-coding metrics and PN/PS. While plausible, it is also possible that they represent a mixture of genes of all ages combined with sequences that, while perhaps translated at some level, are not really genes in the functional sense of the word (at least not yet), and whose existence is therefore highly transient in evolutionary time. Contamination with these sequences could create the same statistical effect as having young genes. The presence of such contamination is also a critical piece of evidence in theories of how</italic> de novo <italic>protein birth occurs. This basically means that there are two interpretations of the data, both interesting, and not mutually exclusive. This needs to be better clarified. 
For instance the results of the BlastP search against codRNAs (supplementary file 8) and the results of the BlastX search against nr could be merged into one table or bar graph counting the number of BlastP and BlastX hits in lncRNA-noribo, lncRNA-ribo, and codRNA, separately, for each species</italic>.</p><p>Previous studies have found that lncRNAs tend to be poorly conserved across species (Guttman et al., Nature 2009; Marques and Ponting, Genome Biol. 2009; Cabili, Genes Dev. 2011). This question has been thoroughly examined in a recent paper that has dated the age of human lncRNAs using de novo assembled transcriptomes from 11 other vertebrate species (Necsulea et al., Nature 2014). The authors have reported that 81% of the human lncRNAs are not conserved beyond primates and can thus be considered “young”.</p><p>In order to further confirm this trend we have extended our initial sequence homology searches to all annotated coding transcripts in the six species studied and have compared the results obtained for putatively translated ORFs in lncRNAs to those in codRNAs. The results support the widespread idea that most lncRNAs are young. For example, whereas we can find protein homologues for only about 13-15% of the human and mouse lncRNAs associated with ribosomes, this value is &gt; 95% for codRNAs. Details of these searches are shown in Supplementary file 1D and Supplementary file 2B.</p><p>If we discard the lncRNAs with homologues in the other species, the percentage of lncRNAs associated with ribosomes continues to be very high (mouse 80.4% with respect to 81.9%, human 40.3% with respect to 43.1%) and the coding scores of the putatively translated ORFs remain significantly higher than those of random ORFs (new <xref ref-type="fig" rid="fig6s3">Figure 6–figure supplement 3</xref>). Therefore, our observations are essentially unaltered after filtering out the oldest lncRNAs.</p><p>The idea that some of these lncRNAs are evolutionarily transient looks plausible to us. 
It has been shown that the rate of loss of young genes in the Drosophila obscura group is higher than that of older genes, explaining why the number of genes remains approximately constant despite a high rate of de novo gene emergence (Palmieri and Schlotterer, 2014 <italic>eLife</italic>). Similarly, we can speculate that lncRNAs have a high probability of being lost during evolution.</p><p><italic>It is unclear why the starved conditions (</italic><xref ref-type="table" rid="tbl1"><italic>Table 1</italic></xref><italic>) were used in the yeast riboprofiling data. Starvation represses translation and therefore makes the data unreliable as a marker of translation. This should therefore be redone, perhaps with the rich media conditions of</italic> <xref ref-type="bibr" rid="bib36"><italic>Ingolia et al. 2009</italic></xref><italic>, but if this needs to be redone anyway, ideally with the much higher coverage data of Artieri &amp; Fraser</italic>.</p><p>The available ribosome profiling data from <xref ref-type="bibr" rid="bib4">Artieri and Fraser (2014)</xref> was for <italic>Saccharomyces</italic> hybrids. In order to use the same species as in the original study we downloaded the <italic>Saccharomyces cerevisiae</italic> data from a related paper, McManus et al. (2014). Although we obtained a lower number of lncRNAs than when using the dataset from <xref ref-type="bibr" rid="bib36">Ingolia et al. (2009)</xref>, the reconstructed lncRNAs were longer and thus probably more reliable. The conclusions drawn are similar to those already reported using the previous dataset.</p></body></sub-article></article>
