I have Illumina paired-end reads. Why is it necessary to “scan paired reads for overlap”?

Scanning for overlaps between paired-end reads tells you what proportion of the sequenced fragments were so short that the two reads of a pair at least partially cover the same sequence. For example, with 100 nt paired-end reads and a minimum overlap length of 30 nt, overlaps will be detected for fragments of 170 bp or shorter (2 × 100 − 30 = 170). In the extreme case, when the fragment length is close to the read length, the reads from both ends represent essentially the same sequence and are therefore redundant. Such reads negatively affect the precision of repeat proportion calculations as well as the output of some tools (e.g., detection of cluster connections). It is therefore generally not recommended to use sequence data with a high proportion of overlapping reads for RepeatExplorer.
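The arithmetic behind the 170 bp threshold can be sketched in a few lines of Python. This is only an illustration, not part of RepeatExplorer: the function names are hypothetical, and it assumes that both reads have the same fixed length, that no trimming has been applied, and that overlaps are detected perfectly.

```python
def overlap_length(fragment_len: int, read_len: int) -> int:
    """Number of fragment bases covered by both reads of a pair.

    Assumption: each read covers at most min(read_len, fragment_len)
    bases, starting from opposite ends of the fragment.
    """
    covered = min(read_len, fragment_len)
    return max(0, 2 * covered - fragment_len)


def overlap_detected(fragment_len: int, read_len: int, min_overlap: int) -> bool:
    """True if the overlap reaches the minimum length required for detection."""
    return overlap_length(fragment_len, read_len) >= min_overlap


def max_detectable_fragment(read_len: int, min_overlap: int) -> int:
    """Longest fragment whose reads still overlap by at least min_overlap."""
    return 2 * read_len - min_overlap


if __name__ == "__main__":
    # Values taken from the example in the answer above.
    read_len, min_overlap = 100, 30
    print(max_detectable_fragment(read_len, min_overlap))
    for fragment_len in (100, 150, 170, 171, 300):
        print(fragment_len,
              overlap_length(fragment_len, read_len),
              overlap_detected(fragment_len, read_len, min_overlap))
```

With a fragment length equal to the read length, the computed overlap equals the full read length, which corresponds to the fully redundant case described above.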
