RepeatMasker is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. The output of the program is a detailed annotation of the repeats that are present in the query sequence, as well as a modified version of the query sequence in which all the annotated repeats have been masked (default: replaced by Ns). Currently over 56% of human genomic sequence is identified and masked by the program. Sequence comparisons in RepeatMasker are performed by one of several popular search engines, including nhmmer, cross_match, ABBlast/WUBlast, RMBlast and Decypher. RepeatMasker makes use of curated libraries of repeats and currently supports Dfam (a profile HMM library derived from Repbase sequences) and Repbase, a service of the Genetic Information Research Institute.
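To make the masking step concrete, the following sketch measures what percentage of the bases in a FASTA file are N. The function name `masked_fraction` is hypothetical and not part of RepeatMasker. It assumes a plain-text FASTA file, and it counts every N, including any that were already present in the unmasked input.

```shell
#!/bin/bash
# Report the percentage of bases in a FASTA file that are N (i.e. masked).
# Header lines (starting with ">") are skipped; both "N" and "n" are counted.
masked_fraction() {
    awk '
        /^>/ { next }                      # skip FASTA header lines
        {
            total  += length($0)           # every residue on the line
            masked += gsub(/[Nn]/, "")     # gsub returns the number of replacements
        }
        END {
            if (total == 0) { print "0.00"; exit }
            printf "%.2f\n", 100 * masked / total
        }
    ' "$1"
}
```

RepeatMasker normally writes the masked sequence next to the input with a `.masked` suffix, so a typical call would be `masked_fraction my_repeatmasker_sample.fasta.masked`.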
Versions 3.3.0, 4.0.0 and 4.0.6
RepeatMasker uses the following submodules:
- Tandem Repeats Finder version 3.2.1
- RMBlast search engine version 1.2
- Repbase Update Database of Repetitive DNA version 20120418
License Terms and User Agreements
To use RepeatMasker and its submodules you must agree with the following license terms and agreements:
- Tandem Repeats Finder is licensed under the following terms:
- The author of this software grants to any individual or organization the right to use and to make an unlimited number of copies of this software. You may not de-compile, disassemble, reverse engineer, or modify the software. This software cannot be sold, incorporated into commercial software or redistributed. The author of this software accepts no responsibility for damages resulting from the use of this software and makes no warranty or representation, either express or implied, including but not limited to, any implied warranty of merchantability or fitness for a particular purpose. This software is provided as is, and the user assumes all risks when using it.
- Please cite: G. Benson, "Tandem repeats finder: a program to analyze DNA sequences" Nucleic Acids Research (1999) Vol. 27, No. 2, pp. 573-580.
- Repbase Update is a Database of Repetitive DNA published by Genetic Information Research Institute. You may use the content of the Database free of charge under the following conditions:
- You agree NOT to make the Repbase Update (or any part thereof, including Repbase Reports, Repeat Maps and other derived materials, modified or not) available to anyone outside your research group. "Make available" includes leaving the data where it may be accessible to outside individuals without your direct knowledge (e.g. on a computer to which people outside your group have login privileges), as well as directly providing it to someone. Refer any requests for the Database to GIRI.
- You agree NOT to use the Repbase Update for commercially restricted sequencing and/or proprietary sequence analysis. Commercially restricted sequencing is defined as sequencing for which a company retains patenting or licensing rights regarding the sequence, or the right to restrict or delay dissemination of the sequence; with the sole exception that sequencing is not considered to be commercially restricted if it is federally funded and the investigators adopt the data release policies endorsed at the Wellcome Trust-sponsored Bermuda meeting (i.e. immediate release of data as it is generated).
- If you are doing commercially restricted sequencing or other proprietary activities involving any portion of Repbase Update, you must register as a commercial user at GIRI.
- You agree to properly cite the Database and its specific, original contributions if directly related to your work.
- You certify that you are authorized to accept this agreement on behalf of your institution.
- All members of your group with access to the Database agree to the same conditions.
See the online documentation or type:
$ module add repeatmasker
$ repeatmasker --help
module add repeatmasker-4-0-0
Load the software module repeatmasker, then run RepeatMasker (e.g. with the sample input file my_repeatmasker_sample.fasta):
$ module add repeatmasker
$ repeatmasker my_repeatmasker_sample.fasta
In the following text we explain how to run an interactive/batch RepeatMasker job.
An interactive job can be run as follows:
skirit$ qsub -I -q short -l nodes=1:ppn=1,mem=1000mb
(-I stands for interactive, -q short means the job is expected to last less than 2 hours, -l nodes=1:ppn=1,mem=1000mb means one node with one CPU and 1000 MB memory allocation is expected)
You are then logged in to a specific machine, where you can run RepeatMasker on the input file my_repeatmasker_sample.fasta as follows (and then exit from the machine):
$ module add repeatmasker
$ repeatmasker my_repeatmasker_sample.fasta
...
$ exit
Please note that interactive mode does not bring any significant speed-up compared to running RepeatMasker locally on your machine unless parallelism is used. Interactive mode may be used to test the execution of your job (strongly recommended); once the test succeeds, you are invited to switch to running it as a batch job (see below).
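As an illustration of the parallelism remark, the sketch below derives a worker count from the scheduler and builds (but does not run) a RepeatMasker command line with it. RepeatMasker itself has a `-pa` option for the number of parallel workers. Two things are assumptions here: that the local `repeatmasker` command passes `-pa` through unchanged, and that the scheduler exports `PBS_NUM_PPN` inside the job. The helper names `worker_count` and `repeatmasker_cmd` are hypothetical.

```shell
#!/bin/bash
# Choose a worker count: use $PBS_NUM_PPN when the scheduler sets it
# (an assumption about the local PBS setup), otherwise fall back to 1.
worker_count() {
    local n=${PBS_NUM_PPN:-1}
    case $n in
        ''|*[!0-9]*) n=1 ;;    # not a plain non-negative integer
    esac
    [ "$n" -ge 1 ] || n=1      # treat 0 as 1
    echo "$n"
}

# Print the command line instead of running it, so it can be inspected first.
# Assumes the repeatmasker wrapper accepts RepeatMasker's -pa option.
repeatmasker_cmd() {
    echo "repeatmasker -pa $(worker_count) $1"
}
```

The value requested with `ppn=` in the `-l` option should match the worker count, otherwise the job either wastes reserved CPUs or oversubscribes them.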
This is the preferred way of running jobs.
Put your input files (e.g. my_repeatmasker_sample.fasta) in a subdirectory of your home directory (which is shared across all machines).
Create the shell script my_repeatmasker_script.sh with the following contents:
#!/bin/bash
# add RepeatMasker module
module add repeatmasker
# go to the directory where the sample file my_repeatmasker_sample.fasta is located
cd /storage/home/`whoami`/subdirectory_with_the_sample_file
# run RepeatMasker
repeatmasker my_repeatmasker_sample.fasta
Submit this shell script with a command such as:
qsub -q short -l nodes=1:ppn=1,mem=1000mb my_repeatmasker_script.sh
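When several input files each need their own job, a batch script like the one described above can be generated rather than written by hand. This is a minimal sketch: the function name `write_job_script` is hypothetical, and the generated script adds one thing the original does not have, namely stopping if the `cd` fails.

```shell
#!/bin/bash
# Write a batch script that runs RepeatMasker on one input file.
# Arguments: <directory holding the input> <input file name> <script to create>
write_job_script() {
    local workdir=$1 input=$2 script=$3
    # Unquoted EOF: $workdir and $input are expanded now, at generation time.
    cat > "$script" <<EOF
#!/bin/bash
# add RepeatMasker module
module add repeatmasker
# go to the directory with the input file; stop if it does not exist
cd "$workdir" || exit 1
# run RepeatMasker
repeatmasker "$input"
EOF
    # Parse the generated file without executing it, to catch syntax errors.
    bash -n "$script"
}
```

For example, `write_job_script "$HOME/subdirectory_with_the_sample_file" my_repeatmasker_sample.fasta my_repeatmasker_script.sh` produces a script that can then be passed to qsub.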
If you expect the job to last more than two hours use -q normal (up to 24 hours) or -q long (up to 30 days). The -l parameters are to be set according to your expectation of the job resource requirements.
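The queue choice described above can be written as a small helper. This is only a sketch: `pick_queue` and `build_qsub` are hypothetical names, the run time is given in whole hours, and a job of exactly two hours is sent to `normal`, which is an assumption because the text leaves that boundary open. The command is printed, not executed.

```shell
#!/bin/bash
# Map an expected run time (whole hours) to one of the queue names.
pick_queue() {
    local hours=$1
    if   [ "$hours" -lt 2 ];   then echo short     # less than 2 hours
    elif [ "$hours" -le 24 ];  then echo normal    # up to 24 hours
    elif [ "$hours" -le 720 ]; then echo long      # up to 30 days (720 hours)
    else
        echo "no queue for ${hours} h" >&2
        return 1
    fi
}

# Print the qsub command line for a script and a single-node resource request.
# Arguments: <hours> <CPUs per node> <memory in MB> <script>
build_qsub() {
    local hours=$1 ppn=$2 mem_mb=$3 script=$4 queue
    queue=$(pick_queue "$hours") || return 1
    echo "qsub -q $queue -l nodes=1:ppn=${ppn},mem=${mem_mb}mb $script"
}
```

Calling `build_qsub 1 1 1000 my_repeatmasker_script.sh` prints the same command as the submission example, while a larger hour value switches the queue to `normal` or `long`.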