RepeatMasker

From MetaCentrum


Description

RepeatMasker is a program that screens DNA sequences for interspersed repeats and low-complexity DNA sequences. The output of the program is a detailed annotation of the repeats present in the query sequence, as well as a modified version of the query sequence in which all annotated repeats have been masked (by default, replaced by Ns). Currently, over 56% of human genomic sequence is identified and masked by the program. Sequence comparisons in RepeatMasker are performed by one of several popular search engines, including nhmmer, cross_match, ABBlast/WUBlast, RMBlast and Decypher. RepeatMasker makes use of curated libraries of repeats and currently supports Dfam (a profile HMM library derived from Repbase sequences) and Repbase, a service of the Genetic Information Research Institute.

Submodules

RepeatMasker (latest version 4.0.7) uses the following submodules:

License Terms and User Agreements

To use RepeatMasker and its submodules, you must first accept the following license terms and agreements:


  • Tandem Repeats Finder is licensed under the following terms:
    • The author of this software grants to any individual or organization the right to use and to make an unlimited number of copies of this software. You may not de-compile, disassemble, reverse engineer, or modify the software. This software cannot be sold, incorporated into commercial software or redistributed. The author of this software accepts no responsibility for damages resulting from the use of this software and makes no warranty or representation, either express or implied, including but not limited to, any implied warranty of merchantability or fitness for a particular purpose. This software is provided as is, and the user assumes all risks when using it.
    • Please cite: G. Benson, "Tandem repeats finder: a program to analyze DNA sequences" Nucleic Acids Research (1999) Vol. 27, No. 2, pp. 573-580.
  • Repbase Update is a Database of Repetitive DNA published by Genetic Information Research Institute. You may use the content of the Database free of charge under the following conditions:
    • You agree NOT to make the Repbase Update (or any part thereof, including Repbase Reports, Repeat Maps and other derived materials, modified or not) available to anyone outside your research group. "Make available" includes leaving the data where it may be accessible to outside individuals without your direct knowledge (e.g. on a computer to which people outside your group have login privileges), as well as directly providing it to someone. Refer any requests for the Database to GIRI.
    • You agree NOT to use the Repbase Update for commercially restricted sequencing and/or proprietary sequence analysis. Commercially restricted sequencing is defined as sequencing for which a company retains patenting or licensing rights regarding the sequence, or the right to restrict or delay dissemination of the sequence; with the sole exception that sequencing is not considered to be commercially restricted if it is federally funded and the investigators adopt the data release policies endorsed at the Wellcome Trust-sponsored Bermuda meeting (i.e. immediate release of data as it is generated).
    • If you are doing commercially restricted sequencing or other proprietary activities involving any portion of Repbase Update, you must register commercially at GIRI.
    • You agree to properly cite the Database and its specific, original contributions if directly related to your work (details).
    • You certify that you are authorized to accept this agreement on behalf of your institution.
    • All members of your group with access to the Database agree to the same conditions.
You cannot use this program without accepting the license agreement.

Documentation

See online documentation here or type:

$ module add repeatmasker-4.0.7
$ RepeatMasker --help

http://revbayes.github.io/tutorials.html

Usage

Upcoming module system change alert!

Due to the large number of applications and their versions, it is not practical to keep them explicitly listed on our wiki pages. An upgrade of the modulefiles is therefore underway. One feature of this upgrade is a default module for every application. The default module needs no version number and loads some (usually the latest) version.

You can test the new version now by adding the line

source /cvmfs/software.metacentrum.cz/modulefiles/5.1.0/loadmodules

to your script before loading a module. You can then list all versions of repeatmasker and load the default version with

module avail repeatmasker/ # list available modules
module load repeatmasker   # load (default) module


If you wish to keep using the current system, that is still possible. Simply list all modules with

module avail repeatmasker

and choose the explicit version you want to use.

RepeatMasker requires a scratch directory for its calculations.
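
A job script can guard against a missing scratch directory before doing any work. The following is a minimal sketch, assuming that the batch system exports SCRATCHDIR when scratch space is requested (as in the examples on this page); the function name check_scratch is hypothetical and not part of RepeatMasker or the module system.

```shell
# Return 0 if SCRATCHDIR points to a writable directory, 1 otherwise.
# check_scratch is an illustrative helper name, not a MetaCentrum tool.
check_scratch() {
    if [ -z "${SCRATCHDIR:-}" ]; then
        echo "SCRATCHDIR is not set; was scratch space requested for the job?" >&2
        return 1
    fi
    if [ ! -d "$SCRATCHDIR" ] || [ ! -w "$SCRATCHDIR" ]; then
        echo "SCRATCHDIR ($SCRATCHDIR) is not a writable directory" >&2
        return 1
    fi
    return 0
}

# Usage at the top of a job script:
#   check_scratch || exit 1
```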

Basic usage

Load the repeatmasker software module, then run RepeatMasker (e.g. with the sample input file my_repeatmasker_sample.fasta):

$ module add repeatmasker-4.0.7
$ RepeatMasker my_repeatmasker_sample.fasta

Note: in RepeatMasker versions 3.3.0 and 4.0.0 the executable is named repeatmasker, whereas in versions 4.0.6 and 4.0.7 it is named RepeatMasker.
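
A script that has to work with several module versions can look up whichever executable name is available. The sketch below assumes only that the loaded module puts the executable on PATH; the function name find_repeatmasker is hypothetical.

```shell
# Print the name of the RepeatMasker executable found on PATH.
# Tries the newer spelling (RepeatMasker) first, then the older one.
find_repeatmasker() {
    for name in RepeatMasker repeatmasker; do
        if command -v "$name" >/dev/null 2>&1; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    echo "no RepeatMasker executable found on PATH (is the module loaded?)" >&2
    return 1
}

# Usage (after loading a repeatmasker module):
#   rm_bin=$(find_repeatmasker) && "$rm_bin" my_repeatmasker_sample.fasta
```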


The following sections explain how to run RepeatMasker as an interactive or a batch job.

Interactive job

An interactive job can be started as follows:

skirit$ qsub -I -l select=1:ncpus=2:mem=4gb:scratch_local=10gb -l walltime=1:00:00

You are then logged in to a specific compute node, where you can run RepeatMasker on the input file my_repeatmasker_sample.fasta as follows (and then log out of the node):

$ module add repeatmasker-4.0.7
$ cp my_repeatmasker_sample.fasta $SCRATCHDIR
$ cd $SCRATCHDIR
$ RepeatMasker my_repeatmasker_sample.fasta
...
$ exit

Please note that an interactive job does not bring any significant speed-up compared with running RepeatMasker locally on your own machine unless parallelism is used. Interactive mode is useful for testing your job (strongly recommended); once the test succeeds, switch to running the job in batch mode (see below).
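
One way to use parallelism is RepeatMasker's -pa option, which sets the number of sequence batches processed in parallel. The sketch below is an illustration under assumptions: the function name run_repeatmasker_parallel and the RM_BIN override are hypothetical, and the job count is passed by hand (it would normally be chosen to fit the ncpus value requested in qsub). How many cores each parallel batch actually uses may depend on the search engine and the RepeatMasker version.

```shell
# Run RepeatMasker with the -pa (parallel) option.
#   $1 = input FASTA file
#   $2 = number of parallel batches (defaults to 1)
# RM_BIN is a hypothetical override for the executable name.
run_repeatmasker_parallel() {
    input=$1
    njobs=${2:-1}
    case $njobs in
        ''|*[!0-9]*|0)
            echo "number of parallel jobs must be a positive integer" >&2
            return 1
            ;;
    esac
    "${RM_BIN:-RepeatMasker}" -pa "$njobs" "$input"
}

# Usage inside a job that requested ncpus=2:
#   run_repeatmasker_parallel my_repeatmasker_sample.fasta 2
```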

Batch job

This is the preferred way of running jobs. Create the shell script my_repeatmasker_script.sh with the following content:

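#!/bin/bash

# add RepeatMasker module
module add repeatmasker-4.0.7

# directory the job was submitted from (set by PBS);
# the input file is assumed to be located there
DATADIR="$PBS_O_WORKDIR"

# copy your input data (e.g. my_repeatmasker_sample.fasta) to the scratch directory
cp "$DATADIR/my_repeatmasker_sample.fasta" "$SCRATCHDIR" || exit 1

# go to the scratch directory
cd "$SCRATCHDIR" || exit 2

# run RepeatMasker
RepeatMasker my_repeatmasker_sample.fasta

# copy the results (files named after the input, e.g. *.out, *.tbl, *.masked)
# back from scratch, otherwise they stay on the compute node
cp my_repeatmasker_sample.fasta.* "$DATADIR/" || exit 3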

Submit this shell script with a command such as:

qsub -l select=1:ncpus=2:mem=4gb:scratch_local=10gb my_repeatmasker_script.sh
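
After the job finishes, it can be worth checking that the annotation and the masked sequence described above were actually produced. The following sketch assumes the usual RepeatMasker naming, where results are written next to the input as <input>.out, <input>.tbl and <input>.masked; the function name missing_outputs is hypothetical. A missing .masked file is not necessarily an error, since RepeatMasker may not write one when no repeats are detected.

```shell
# List expected RepeatMasker output files that are missing or empty.
#   $1 = path to the input FASTA file the job was run on
# Prints one missing file name per line; returns 0 only if none are missing.
missing_outputs() {
    input=$1
    rc=0
    for ext in out tbl masked; do
        if [ ! -s "$input.$ext" ]; then
            printf '%s\n' "$input.$ext"
            rc=1
        fi
    done
    return $rc
}

# Usage in the directory holding the results:
#   missing_outputs my_repeatmasker_sample.fasta || echo "some outputs are missing"
```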

Useful links