Galaxy

From MetaCentrum

! Deprecated !

Please see the new Galaxy documentation instead. This wiki page has been deprecated.

Description

Galaxy is an open, web-based platform for accessible, reproducible, and transparent computational biomedical research.

  • Accessible: Users without programming experience can easily specify parameters and run tools and workflows.
  • Reproducible: Galaxy captures information so that any user can repeat and understand a complete computational analysis.
  • Transparent: Users share and publish analyses via the web and create Pages: interactive, web-based documents that describe a complete analysis.

Our Galaxy instances

Currently, MetaCentrum supports two Galaxy instances:

  • RepeatExplorer instance, a dedicated Galaxy focused on RepeatExplorer tools.
  • MetaCentrum instance, a general Galaxy where any tools can be installed and used. This instance replaced the old one on 15 April 2018 and was moved to a new server on 4 February 2020.


See the sections below for more information.

Tutorials

Galaxy is a widespread tool, so there are many tutorials to choose from when learning to work with it, e.g. our MetaCentrum tutorial, the Galaxy wiki (the main starting page), the video tutorial page and the starting guides.


RepeatExplorer Galaxy

Our RepeatExplorer Galaxy instance (https://galaxy-elixir.cerit-sc.cz) includes utilities for graph-based clustering and characterization of repetitive sequences in next-generation sequencing data, and tools for the detection of transposable element protein-coding domains.

License

RepeatExplorer is provided as free, public, Internet accessible service, in the hope that it will be useful, but WITHOUT ANY WARRANTY.

User data availability duration

Due to the increasing number of users and limited data storage, we have to delete all unused user data older than 60 days, including datasets and histories. Unused data from FTP uploads are kept for 180 days and then deleted permanently.

The deletion process is fully automated.

Usage

Before you can use the Galaxy portal, you have to register and your e-mail address has to be verified. This is a simple process that takes a few minutes. Please take into account that it takes approximately 30 minutes to set up all your account information in our infrastructure; during this time, e-mail services (for example, sending bug reports) for your Galaxy account will not work properly. All registered users can use advanced features such as data and workflow sharing and can use up to 50 GB of storage. Please read the RepeatExplorer documentation before analyzing data!

Step by step

FTP upload

File uploads via FTP are encrypted. Please set up your FTP client as follows:

Filezilla additional settings:

  • protocol: FTP-File Transfer Protocol
  • encryption: Require implicit FTP over TLS

Windows users: please use the WinSCP FTP client. Several versions of Filezilla do not work properly with our FTP server on Windows.

Linux users: Filezilla version 3.5.3 does not work under Linux either. In this case, use the command-line program curl.

Linux/command line:

$ curl -T your_file -k -v -u username ftps://repeatexplorer-elixir.cerit-sc.cz

To make the process non-interactive, create a file curl.config containing the following line (replace username and password with your real credentials):

-u username:password

then upload the file(s) from within a job as:

curl -T your_file -K curl.config ftps://repeatexplorer-elixir.cerit-sc.cz

The curl.config file contains sensitive information, so prevent other users from reading it with the command chmod 700 curl.config.
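When several files have to be uploaded, the two steps above (a private config file, then curl -K) can be wrapped in a small script. The following is a minimal POSIX sh sketch, not an official tool: make_curl_config and upload_files are hypothetical helper names. The password is read from standard input so that it never appears on a command line or in the shell history, the credentials are written in curl's quoted config-file form so that passwords containing spaces also work, and the file gets mode 600 (owner read/write only), which is enough for a config file.

```sh
#!/bin/sh
# Sketch only: make_curl_config and upload_files are hypothetical helpers.

# Write the credentials into FILE (mode 600) in curl's config-file syntax.
# Usage: make_curl_config FILE USERNAME   (the password is read from stdin)
make_curl_config() {
    local file="$1" user="$2" password cred
    # Hide the typed password when reading from a terminal.
    if [ -t 0 ]; then stty -echo; fi
    IFS= read -r password
    if [ -t 0 ]; then stty echo; echo >&2; fi
    # Escape backslashes and double quotes for curl's quoted parameters.
    cred=$(printf '%s' "$user:$password" | sed 's/[\\"]/\\&/g')
    # umask 077 makes the new file readable and writable by the owner only.
    ( umask 077 && printf '%s\n' "-u \"$cred\"" > "$file" )
    chmod 600 "$file"   # also tightens a file that already existed
}

# Upload every FILE to HOST over FTPS using the credentials in CONFIG.
# Usage: upload_files CONFIG HOST FILE...
upload_files() {
    local config="$1" host="$2" f rc=0
    shift 2
    for f in "$@"; do
        curl -T "$f" -K "$config" "ftps://$host" || {
            echo "upload failed: $f" >&2
            rc=1
        }
    done
    return "$rc"
}
```

For example, run make_curl_config curl.config username, type the password and press Enter, then run upload_files curl.config repeatexplorer-elixir.cerit-sc.cz reads_1.fq reads_2.fq (the file names are placeholders).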

Some FTP clients might not trust the server certificate; unfortunately, this is normal in some cases. If necessary, check the fingerprint manually and mark the certificate as trusted. The correct fingerprints are:

  • SHA1 = 15:BF:4A:06:2D:81:12:AA:9B:A4:69:6C:47:EA:B9:79:4C:82:44:45
  • SHA256 = 99:D5:19:E1:C9:9B:CE:24:3F:02:C1:70:D9:8E:6F:74:34:43:28:FC:2A:B8:5F:AD:E4:2A:DB:DD:E3:45:CD:6D
  • MD5 = 38:07:32:F7:E8:91:61:92:B8:37:FB:3D:53:B1:64:78
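The SHA-256 fingerprint above can also be checked from the command line with the openssl tool. This sketch assumes that openssl is installed and that the server speaks implicit FTPS on port 990 (the conventional implicit-TLS port; the actual port is not stated on this page). fetch_fingerprint and fingerprint_matches are hypothetical helper names. Depending on the OpenSSL version, the printed line may start with "SHA256 Fingerprint=" or "sha256 Fingerprint=", so the comparison ignores case.

```sh
#!/bin/sh
# Sketch: compare the server certificate with the published SHA-256 fingerprint.
EXPECTED_SHA256="99:D5:19:E1:C9:9B:CE:24:3F:02:C1:70:D9:8E:6F:74:34:43:28:FC:2A:B8:5F:AD:E4:2A:DB:DD:E3:45:CD:6D"

# Print the fingerprint line of the certificate served by HOST[:PORT].
# Assumption: implicit FTPS on port 990 unless another port is given.
fetch_fingerprint() {
    openssl s_client -connect "$1:${2:-990}" -servername "$1" </dev/null 2>/dev/null \
        | openssl x509 -noout -fingerprint -sha256
}

# Succeed if an openssl fingerprint line matches EXPECTED, ignoring case.
# Usage: fingerprint_matches EXPECTED LINE
fingerprint_matches() {
    local want got
    want=$(printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]')
    # Drop the "... Fingerprint=" prefix before comparing.
    got=$(printf '%s\n' "${2#*=}" | tr '[:lower:]' '[:upper:]')
    [ -n "$got" ] && [ "$got" = "$want" ]
}
```

For example: fingerprint_matches "$EXPECTED_SHA256" "$(fetch_fingerprint repeatexplorer-elixir.cerit-sc.cz)" && echo "fingerprint OK".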

Known problems

When using the clustering or tarean tool, I receive a "PBS: job killed" error

It is often difficult to estimate how many processors and how much memory your computation will need, especially with the tarean or seqclust tools. Unfortunately, these requirements have to be specified in our infrastructure before the computation starts. This ensures that other jobs running at the same time as yours are not affected by your computation. When PBS kills your job, it means that your computation tried to use more resources than it requested. For example, you may receive this error message:

PBS: job killed: job requested 4 cores on node zigur18.cerit-sc.cz, but the measured load was 4.422386

This means that your job requested 4 processors but used more (the measured load was about 4.42). To fix this error, select long & slow in the Select queue field of the tarean or seqclust form. If you still receive the error, increase the ppn number in the Modify parameters (optional) field right under the Select queue field. Note that the maximum ppn number is 16. If you are still struggling to set up your job correctly, please let us know at metasupport@rt.cesnet.cz.
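The error message is essentially arithmetic: 4 cores were requested, while a measured load of 4.42 means that, roughly, more than 4 cores were kept busy. The sketch below turns such a message into a suggested ppn value. Its rule (round the load up to a whole core, ask for at least one core more than before, never exceed the 16-core maximum) is an illustrative assumption rather than an official recommendation, and suggest_ppn is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: suggest a ppn value from a "PBS: job killed" load message.
# The rounding/capping rule is an illustrative assumption, not an official policy.
MAX_PPN=16   # maximum ppn stated above

suggest_ppn() {
    local requested load
    requested=$(printf '%s\n' "$1" | sed -n 's/.*requested \([0-9][0-9]*\) cores.*/\1/p')
    load=$(printf '%s\n' "$1" | sed -n 's/.*measured load was \([0-9][0-9]*\.\{0,1\}[0-9]*\).*/\1/p')
    if [ -z "$requested" ] || [ -z "$load" ]; then
        echo "not a PBS load-exceeded message" >&2
        return 2
    fi
    # awk handles the floating-point part: round the load up to whole cores,
    # ask for more cores than before, and respect the maximum.
    awk -v req="$requested" -v load="$load" -v max="$MAX_PPN" 'BEGIN {
        req += 0; load += 0; max += 0
        c = int(load)
        if (c < load) c++
        if (c <= req) c = req + 1
        if (c > max) c = max
        print c
    }'
}
```

With the message shown above, suggest_ppn prints 5.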

The first line of the job error log contains the word "Killed"

The word Killed in the job error log means that your job was killed by our system. This often happens when the job tries to use more resources than it requested (so that even cgroups are not able to satisfy the job's memory requirements). A typical message in the error log looks like this:

/storage/brno7-cerit/home/galaxyelixir/galaxy/tools/repex_tarean/seqclust_wrapper.sh: line 22: 26464 Killed "${@:2}" 2> $TMPFILE

When you receive this error, you have to increase the amount of memory the job can use. You can do this in the Select queue field of the tarean or seqclust form: select a queue with a larger -l mem= parameter. You can also edit this parameter yourself; just note that its maximum value is currently 116gb.
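Finding a sufficient -l mem= value is usually a matter of trial and error. As an illustration, the sketch below doubles the current value and caps it at the 116gb maximum mentioned above. The doubling strategy and the next_mem helper name are assumptions, and only values written in gb are accepted.

```sh
#!/bin/sh
# Sketch: propose the next "-l mem=" value after a "Killed" error.
# Doubling is an assumed strategy; the 116gb ceiling is the limit stated above.
MAX_MEM_GB=116

next_mem() {
    local gb next
    # Accept values such as 16gb (leading zeros are dropped).
    gb=$(printf '%s\n' "$1" | sed -n 's/^0*\([1-9][0-9]*\)gb$/\1/p')
    if [ -z "$gb" ]; then
        echo "expected a value such as 16gb, got '$1'" >&2
        return 2
    fi
    if [ "$gb" -ge "$MAX_MEM_GB" ]; then
        echo "already at the ${MAX_MEM_GB}gb maximum" >&2
        return 1
    fi
    next=$((gb * 2))
    if [ "$next" -gt "$MAX_MEM_GB" ]; then
        next=$MAX_MEM_GB
    fi
    echo "${next}gb"
}
```

For example, next_mem 16gb prints 32gb, next_mem 64gb prints 116gb, and next_mem 116gb fails because the maximum has already been reached.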

Registration

See this link for a PDF version.


The following text describes the steps of registering to the Galaxy portal running at https://repeatexplorer-elixir.cerit-sc.cz. Please take into consideration that after successful registration, it takes approximately 30 minutes to set up all your Galaxy account information; for example, e-mail services inside the Galaxy portal will not work properly during this time.

Registration can be done by following these steps:

  • visit the URL https://metavo.metacentrum.cz/osobniv3/wayf/elixir.jsp
  • select the account you will use for registration: an eduID.cz identity (Czech institutions only), an ElixirID (use only with caution), or one of several other identities such as Facebook, LinkedIn, Google, MojeID, ORCID and GitHub.
  • log in to your selected account and agree when asked to share information with Perun
  • if a similar user already exists in Perun, you will be asked to prove your identity by logging into this existing account. Do this only if you want to use this account and it is truly yours (you have to provide the correct username and password for this account). If you don't want to use your existing account, or the similar user that Perun found is not you, click the button with the text "It is not me". If no similar user was found in Perun, you will not be asked this question and can skip this step.
  • fill in the application form for Elixir CZ IT services; please fill in the organization and country fields. If you change the pre-filled e-mail address, you will have to verify your e-mail later.
  • fill in the application form for RepeatExplorer Galaxy; only a username and password are required. You will use these values to log in to the Galaxy portal after registration is finished. If you chose to use your existing account in the previous steps, you will not be asked for a username and password; these fields will already be filled in based on your existing account information.
  • congratulations, you are successfully registered! By clicking the Continue button, you can proceed to the Galaxy portal. If you changed your e-mail address during registration, you will have to verify it before you can continue; you will receive an e-mail message with a verification link.

Acknowledgement

Users of the Galaxy-based RepeatExplorer are obliged to include the following acknowledgement in all publications created with the support of RepeatExplorer:


Computational resources were provided by the ELIXIR-CZ project (LM2018131), part of the international ELIXIR infrastructure.


RepeatExplorer is a part of services provided by ELIXIR - European research infrastructure for biological information. For other services provided by ELIXIR's Czech Republic Node visit www.elixir-czech.cz/services.

How to Cite

If you use RepeatExplorer in your work please cite:

Novak, P., Neumann, P., Pech, J., Steinhaisl, J., Macas, J. (2013) - RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next generation sequence reads. Bioinformatics


References

The principles of RepeatExplorer approach are described in:

Novak, P., Neumann, P., Macas, J. (2010) - Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data. BMC Bioinformatics 11: 378.

Novak, P., Neumann, P., Pech, J., Steinhaisl, J., Macas, J. (2013) - RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next generation sequence reads. Bioinformatics

RepeatExplorer uses RepeatMasker and Repbase:

Jurka, J., Kapitonov, V.V., Pavlicek, A., Klonowski, P., Kohany, O., Walichiewicz, J. (2005) Repbase Update, a database of eukaryotic repetitive elements. Cytogenetic and Genome Research 110:462-467

Conserved Domain Database:

Geer, L.Y., Geer, R.C., Gonzales, N.R., Gwadz, M., Hurwitz, D.I., Jackson, J.D., Ke, Z., Lanczycki, C.J., Lu, F., Marchler, G.H., Mullokandov, M., Omelchenko, M.V., Robertson, C.L., Song, J.S., Thanki, N., Yamashita, R.A., Zhang, D., Zhang, N., Zheng, C., Bryant, S.H. (2011) CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res. 39(Database issue):D225-9.

Clustering is performed using the Louvain method:

Blondel, V.D., Guillaume, J., Lambiotte, R., Lefebvre, E. (2008) Fast unfolding of communities in large networks. J. Stat. Mech.: P10008

MetaCentrum Galaxy

MetaCentrum Galaxy is available to all MetaCentrum users. You can use your META credentials to login.

If you want a tool to be installed, please contact us at meta@cesnet.cz. Do the same if you encounter a problem that cannot be reported through bug reporting.

FTP upload

For FTP upload, you need an FTP client with the following settings:

Filezilla additional settings:

  • protocol: FTP-File Transfer Protocol
  • encryption: Require implicit FTP over TLS

Windows users: please use the WinSCP FTP client.

Linux users: Filezilla can be used.

Linux/command line:

$ curl -T your_file -k -v -u username:password ftps://galaxy.metacentrum.cz

Availability

Version 18.05.

Use

Use the https://galaxy.metacentrum.cz/ portal to log in.

Large or local files upload

You can upload files up to 10 GB in size. If you need to upload larger files, this is also possible; just contact us at meta@cesnet.cz.
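Before an upload, a quick size check shows which files exceed the limit, so that you know in advance which ones to arrange with us. In this sketch, 10 GB is interpreted as 10 GiB (10 * 1024^3 bytes), which is an assumption; fits_upload_limit and list_oversized are hypothetical helper names.

```sh
#!/bin/sh
# Sketch: check files against the upload size limit before uploading.
# Assumption: "10 GB" is taken as 10 GiB = 10 * 1024^3 bytes.
LIMIT_BYTES=$((10 * 1024 * 1024 * 1024))

# Succeed if FILE is no larger than LIMIT_BYTES.
fits_upload_limit() {
    local size
    size=$(wc -c < "$1" | tr -d ' ')
    [ -n "$size" ] && [ "$size" -le "$LIMIT_BYTES" ]
}

# Print each existing FILE that is too large for a regular upload.
list_oversized() {
    local f
    for f in "$@"; do
        if [ -f "$f" ] && ! fits_upload_limit "$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, list_oversized *.fastq.gz prints only the files for which you should contact meta@cesnet.cz.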

Deleting files

Please delete old files from your histories. First, delete a file by clicking the cross next to its name; then you can delete

  • files one by one: click History options (sprocket icon) and check Include deleted datasets; the deleted datasets now appear in the history and you can delete them permanently,
  • all deleted files permanently: click History options (sprocket icon) and then Purge deleted datasets.

Step by step

Quick start for next-generation sequencing data analysis in MetaCentrum Galaxy:

  • log in to https://galaxy.metacentrum.cz/ with your MetaCentrum account
  • click Get data - Upload data
  • download the data from http://nihlibrary.ors.nih.gov/bioinfo/ngs/sourcedata1.zip (a zipped file with 3 FASTQ read files and 1 FASTA reference genome), extract it and upload the files to Galaxy
  • FASTQ grooming: in the NGS: QC and manipulation menu, choose FASTQ Groomer, keep the defaults (select one of the FASTQ files from the history, Sanger type) and click Execute
  • quality control: in the same section, choose FastQC: Read QC, use the groomed output, keep the defaults and click Execute
  • check the quality by clicking the eye icon on the quality-control output in the history
  • read mapping: in the NGS: Mapping menu, choose Map with BWA for Illumina; select the reference genome from the history (the FASTA file) and the groomed FASTQ file, keep the defaults (single-end) and click Execute
  • check the output (SAM file) by clicking the eye icon on the last item in the history
  • SAM to BAM conversion: in the NGS: SAM Tools menu, choose SAM-to-BAM; select the reference genome from the history (the FASTA file) and the SAM file produced by the mapping, then click Execute
  • download the BAM result from the history

Supported applications

For a list of currently available applications, see the Tools panel on the left of the Galaxy interface.

Unless stated otherwise, all jobs are sent to the 1-day queue with 2 CPUs and 10 GB of memory.
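For comparison with your own batch jobs, the default resources above could be written as PBS job-script directives roughly as follows. This is a sketch that assumes the PBS Pro select syntax and equates the 1-day queue with a 24-hour walltime; Galaxy submits its jobs itself, so you do not need such a script to use the portal.

```sh
#!/bin/sh
# Sketch: Galaxy's default job resources as PBS Pro directives (assumed syntax).
# One chunk with 2 CPUs and 10 GB of memory:
#PBS -l select=1:ncpus=2:mem=10gb
# The 1-day limit, written as a 24-hour walltime:
#PBS -l walltime=24:00:00

# Replace this line with the commands the job should run.
echo "job started on $(hostname)"
```

The same request can be given on the command line as qsub -l select=1:ncpus=2:mem=10gb -l walltime=24:00:00 job.sh.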

If you need another application, check whether it is available in the Galaxy Tool Shed and send a request to meta@cesnet.cz.

Licence

Licensed under the Academic Free License version 3.0

Documentation

https://wiki.galaxyproject.org/

Homepage

https://wiki.galaxyproject.org/