RepeatExplorer is a computational pipeline for the discovery and characterization of repetitive sequences in eukaryotic genomes. The pipeline takes high-throughput genome sequencing data as input and performs graph-based clustering of sequence read similarities to identify repetitive elements within the analyzed samples. The principles of the analysis were described in Novak et al. (2010), and examples of its application can be found in a number of published papers (see Appendix). Although the repeat identification algorithm generally works for any genome, some parts of the pipeline (e.g. protein domain-based classification of mobile elements) were developed primarily for plant genomics. A custom repeat database can, however, be supplied to improve sensitivity in the classification of non-plant repeats.
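The idea of graph-based clustering of read similarities can be illustrated with a minimal sketch. It is not the pipeline's own code: it assumes the pairwise similarity hits have already been computed, and it groups reads by simple connected components, which is a deliberate simplification of the method described in Novak et al. (2010). The function name `cluster_reads` and the read identifiers are hypothetical.

```python
from collections import defaultdict


def cluster_reads(read_ids, similarity_hits):
    """Group reads into connected components of the similarity graph.

    read_ids: list of read identifiers (graph nodes).
    similarity_hits: iterable of (read_a, read_b) pairs (graph edges),
        assumed to come from a prior all-to-all similarity search.
    """
    # Union-find structure: every read starts as its own cluster.
    parent = {r: r for r in read_ids}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Each similarity hit merges the clusters of the two reads.
    for a, b in similarity_hits:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect reads by the root of their cluster.
    clusters = defaultdict(list)
    for r in read_ids:
        clusters[find(r)].append(r)

    # Largest clusters first; large clusters suggest abundant repeats.
    return sorted(clusters.values(), key=len, reverse=True)


# Hypothetical toy input: six reads and three similarity hits.
reads = ["r1", "r2", "r3", "r4", "r5", "r6"]
hits = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]
clusters = cluster_reads(reads, hits)
print(clusters)
```

In this toy input, reads that share similarity hits end up in the same cluster, while a read with no hits stays on its own, mirroring how reads from a repetitive element group together and reads from single-copy regions do not.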
Portal or script version, and older version from 11.10.2013
GNU GPL version 3
module add repeatexplorerREportal
seqclust_cmd.py -h
Older script version:
module add repeatexplorer
seqclust_cmd.py -h