Where does the big peak at maximum sequencing length (for me, 51 nt) come from? Or am I the only person who gets this?

It is in both my total RNA and RPF samples. When I plot the size distribution of my trimmed reads, they fall mostly between about 15-36 nt (centred around about 28 nt), but there is also a huge peak at 51 nt.

I could understand if I was getting a range all the way up to 51 nt, but I'm not - there's a big gap above about 36 and then the peak at 51.

It is still there after removing rRNA reads and after aligning to the genome.

When we plot where these 51 nt sequences map to on the genome, the coverage is fairly even across it, so it doesn't seem to be one repetitive region or anything like that.

We plan to discard those reads, but I am interested in why we get them in the first place... Can anyone explain, please?
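
For anyone who wants to reproduce the length distribution described above, here is a minimal sketch that tallies read lengths straight from a trimmed FASTQ file. It assumes an uncompressed FASTQ with four lines per record. The function names and the toy records are made up for illustration, and 51 is simply the maximum read length mentioned in this thread.

```python
from collections import Counter
from itertools import islice
import io


def read_length_counts(fastq_handle):
    """Count reads per length from an open FASTQ text handle.

    Assumes the standard 4-line record layout (header, sequence, '+', quality).
    """
    counts = Counter()
    # The sequence is the 2nd line of every 4-line record: indices 1, 5, 9, ...
    for seq_line in islice(fastq_handle, 1, None, 4):
        counts[len(seq_line.rstrip("\r\n"))] += 1
    return counts


def fraction_at_length(counts, length):
    """Fraction of all reads that have exactly the given length."""
    total = sum(counts.values())
    return counts[length] / total if total else 0.0


# Toy data (hypothetical): two 28 nt reads and one full-length 51 nt read.
toy_fastq = io.StringIO(
    "@r1\n" + "A" * 28 + "\n+\n" + "I" * 28 + "\n"
    "@r2\n" + "C" * 28 + "\n+\n" + "I" * 28 + "\n"
    "@r3\n" + "G" * 51 + "\n+\n" + "I" * 51 + "\n"
)
toy_counts = read_length_counts(toy_fastq)
toy_full_length_share = fraction_at_length(toy_counts, 51)

for length in sorted(toy_counts):
    print(length, toy_counts[length])
```

For a real file, pass `open("trimmed.fastq")` (a placeholder file name) instead of the toy handle. Reads that still have the full sequenced length after trimming are usually the ones in which the trimmer found no adapter, so their share is worth checking against the trimmer's own report.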


Sorry - I should have said, this is after adapter trimming.
Hi there,

I haven't observed a peak at 51nt before.

If the full length of your sequence reads is 51nt, then presumably these reads do not have any adapter sequence?

Do they map to coding regions in your genome?
Dear Audrey,

Thanks for your reply.

Yes, they map to coding regions. Maybe also to some other regions - the coverage of them across the whole genome is pretty even...

For the adaptor trimming, we removed all sequences that are recognised in the Illumina database, so there is no way I should have any adaptor sequences left.


Hi Beth,

Just so that we understand better, the reads of length 51nt have been trimmed of adapter sequence?

What are the lengths of your raw reads before removing adapter?
I presume you sequenced the libraries with a 50SE kit, in which case what you see at 51 nt likely corresponds to RNA fragments longer than 50 nt. If you size-selected for ~30 nt fragments, there should be no such fragments. If it is not a unique sequence but something that aligns to different locations, I'd still try to better understand its nature. If you analyze the distribution of these fragments across the different functional regions of mRNAs, i.e. leader/CDS/trailer, what does the distribution look like? Like the RNA-seq control or like ribo-seq?
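
The leader/CDS/trailer comparison suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: reads are aligned to transcripts, each alignment is reduced to a (transcript ID, 0-based 5' position) pair, and CDS boundaries are given as 0-based half-open intervals in transcript coordinates. All names and the toy numbers are hypothetical.

```python
from collections import Counter

REGIONS = ("leader", "CDS", "trailer")


def classify_position(pos, cds_start, cds_end):
    """Assign a transcript position to leader, CDS or trailer.

    cds_start and cds_end are assumed to be 0-based, half-open coordinates.
    """
    if pos < cds_start:
        return "leader"
    if pos < cds_end:
        return "CDS"
    return "trailer"


def region_fractions(alignments, cds_bounds):
    """Fraction of alignments falling in each region.

    alignments: iterable of (transcript_id, five_prime_pos) pairs.
    cds_bounds: dict mapping transcript_id to (cds_start, cds_end).
    Alignments to transcripts without annotated CDS bounds are skipped.
    """
    counts = Counter()
    for transcript_id, pos in alignments:
        if transcript_id not in cds_bounds:
            continue
        cds_start, cds_end = cds_bounds[transcript_id]
        counts[classify_position(pos, cds_start, cds_end)] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {region: counts[region] / total for region in REGIONS}


# Toy example (made-up numbers): one transcript with its CDS at [100, 400).
toy_bounds = {"tx1": (100, 400)}
toy_alignments = [("tx1", 20), ("tx1", 150), ("tx1", 250), ("tx1", 450)]
toy_fractions = region_fractions(toy_alignments, toy_bounds)
print(toy_fractions)
```

Running this separately on the 51 nt reads, on the ~28 nt RPF reads and on the total RNA reads gives three sets of fractions to compare. Ribosome footprints are typically concentrated in the CDS, while RNA-seq reads also cover the leader and trailer, so the 51 nt reads can be judged against both patterns.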
