Transcriptome profiling of mouse samples using nanopore sequencing of cDNA and RNA molecules

Our vision of DNA transcription and splicing has changed dramatically with the introduction of short-read sequencing. These high-throughput sequencing technologies promised to unravel the complexity of any transcriptome. Gene expression levels are generally well captured by these technologies, but caveats remain because of the limited read length and because RNA molecules have to be reverse transcribed before sequencing. Oxford Nanopore Technologies has recently launched a portable sequencer that offers the possibility of sequencing long reads and, most importantly, RNA molecules directly. Here we generated a full mouse transcriptome from brain and liver using the Oxford Nanopore device. As a comparison, we sequenced RNA (RNA-Seq) and cDNA (cDNA-Seq) molecules using both long- and short-read technologies. In addition, we tested the TeloPrime preparation kit, dedicated to the enrichment of full-length transcripts. Using spike-in data, we confirmed that expression levels are efficiently captured by cDNA-Seq using short reads. More importantly, Oxford Nanopore RNA-Seq tends to be more efficient, while cDNA-Seq appears to be more biased. We further show that the cDNA library preparation of the Nanopore protocol induces read truncation for transcripts containing stretches of A's. Furthermore, bioinformatics challenges remain for quantification at the transcript level, especially when reads are not full-length. Accurate quantification of processed pseudogenes also remains difficult, and we show that current mapping protocols, which map reads to the genome, largely over-estimate their expression at the expense of their parent gene.