Galaxy

Description

Galaxy is an open, web-based platform for accessible, reproducible, and transparent computational biomedical research.

  • Accessible: Users without programming experience can easily specify parameters and run tools and workflows.
  • Reproducible: Galaxy captures information so that any user can repeat and understand a complete computational analysis.
  • Transparent: Users share and publish analyses via the web and create Pages, interactive, web-based documents that describe a complete analysis.

Licence

Licensed under the Academic Free License version 3.0

Tutorials

Galaxy is a widely used tool, so there are many tutorials to choose from, e.g. our MetaCentrum tutorial. The main starting points are the Galaxy wiki, the Video tutorial page and the starting guides.

RepeatExplorer Galaxy

Our RepeatExplorer Galaxy instance (https://galaxy-elixir.cerit-sc.cz) includes utilities for graph-based clustering and characterization of repetitive sequences in next-generation sequencing data, and tools for the detection of transposable element protein-coding domains.

Licence

RepeatExplorer is provided as a free, public, Internet-accessible service, in the hope that it will be useful, but WITHOUT ANY WARRANTY.

Availability

Last updated September 13, 2016. The Galaxy version is 16.07.

Use

Before you can use the Galaxy portal, you have to register and your e-mail address has to be verified. This is a simple process that takes a few minutes. Please note that it takes approximately 30 minutes to set up all your account information in our infrastructure; during this time, e-mail services for your Galaxy account (for example, sending bug reports) will not work properly. All registered users can use advanced features such as data and workflow sharing and may use up to 50 GB. Please read the RepeatExplorer documentation before analyzing data!

Step by step

FTP upload

File upload via FTP is encrypted. Please set up your FTP client as follows:

FileZilla additional settings:

  • protocol: FTP - File Transfer Protocol
  • encryption: Require implicit FTP over TLS

Windows users: Please use the WinSCP FTP client. Several versions of FileZilla do not work properly with our FTP server setup on Windows.

Linux users: FileZilla version 3.5.3 also does not work under Linux. In this case, use the command-line program curl.

Linux/command line:

$ curl -T your_file -k -v -u username:password ftps://repeatexplorer-elixir.cerit-sc.cz
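If you want to script the upload without putting the password on the command line (where it ends up in the shell history), the following minimal Python sketch may help. It is not part of the official instructions: the helper names build_upload_command and upload are hypothetical, and the sketch relies on curl asking for the password interactively when -u is given only a username. The -T, -k and -v options and the server address are taken from the command above.

```python
import subprocess

# Server address taken from the curl command above.
FTPS_URL = "ftps://repeatexplorer-elixir.cerit-sc.cz"


def build_upload_command(local_file, username, url=FTPS_URL):
    """Build the curl argument list for the FTPS upload shown above.

    Only the username is passed to -u, so curl prompts for the password
    instead of reading it from the command line.
    """
    return ["curl", "-T", local_file, "-k", "-v", "-u", username, url]


def upload(local_file, username):
    """Run curl and raise CalledProcessError if the upload fails."""
    subprocess.run(build_upload_command(local_file, username), check=True)
```

Calling upload("your_file", "username") then runs the same transfer as the command above, with curl prompting for the password.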

Known problems

When using the clustering or tarean tool, I receive a PBS: job killed error

It is often difficult to estimate how many processors and how much memory your computation will require, especially when the tarean or seqclust tools are used. Unfortunately, these requirements have to be specified in our infrastructure before the computation starts. This ensures that other jobs running at the same time as yours are not affected by your computation. When PBS kills your job, it means that your computation tried to use more resources than it asked for. You may, for example, receive this error message:

PBS: job killed: job requested 4 cores on node zigur18.cerit-sc.cz, but the measured load was 4.422386

This means that your job asked for 4 processors but used more. To deal with this error, select long & slow in the Select queue field of the tarean or seqclust form. If you still receive the error, increase the ppn number in the Modify parameters (optional) field, which is right under the Select queue field. Please note that the maximal ppn number is 16. If you are still struggling to set up your job correctly, please let us know at metasupport@rt.cesnet.cz.
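To make the numbers in such a message easier to act on, the sketch below extracts the requested core count and the measured load and proposes a new ppn value. It is only an illustration: the regular expression is modelled on the single example message above, the function names are hypothetical, and the rule "round the load up and ask for at least one more core" is an assumption rather than an official recommendation.

```python
import math
import re

# Pattern modelled on the one example message quoted above; other PBS
# messages may be worded differently (assumption).
PBS_LOAD_RE = re.compile(
    r"job requested (?P<cores>\d+) cores? on node (?P<node>[^\s,]+), "
    r"but the measured load was (?P<load>\d+(?:\.\d+)?)"
)

MAX_PPN = 16  # maximal ppn value stated in the text


def parse_pbs_overload(message):
    """Return (requested_cores, measured_load, node), or None if no match."""
    match = PBS_LOAD_RE.search(message)
    if match is None:
        return None
    return int(match["cores"]), float(match["load"]), match["node"]


def suggest_ppn(message):
    """Suggest a ppn value for the next run, or None if no match."""
    parsed = parse_pbs_overload(message)
    if parsed is None:
        return None
    cores, load, _node = parsed
    # Heuristic (assumption): round the measured load up and request at
    # least one core more than before, without exceeding the maximum.
    return min(MAX_PPN, max(cores + 1, math.ceil(load)))
```

For the example message above, suggest_ppn returns 5, which could then be entered in the Modify parameters (optional) field.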

The first line of the job error log contains the word Killed

The word Killed in the job error log means that your job was killed by our system. This often happens when the job tries to use more resources than it asked for (even cgroups are not able to satisfy the job's memory requirements). A typical message in the error log looks like this:

/storage/brno7-cerit/home/galaxyelixir/galaxy/tools/repex_tarean/seqclust_wrapper.sh: line 22: 26464 Killed "${@:2}" 2> $TMPFILE

When you receive this error, you have to increase the amount of memory that the job can use. This can be done in the Select queue field of the tarean or seqclust form: select a queue that has a larger -l mem= parameter. You can also edit this parameter yourself; just take into account that the maximal value of this attribute is currently 116gb.
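The check and the memory adjustment described here can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function names are hypothetical, the mem=<N>gb form is inferred from the -l mem= parameter mentioned above, and doubling the value is an arbitrary starting point, not an official rule.

```python
import re

MAX_MEM_GB = 116  # maximal -l mem= value stated in the text


def first_line_reports_killed(error_log):
    """Return True if the first line of the log contains the word Killed."""
    lines = error_log.splitlines()
    return bool(lines) and "Killed" in lines[0].split()


def raise_mem(resource_spec, factor=2):
    """Multiply the mem=<N>gb value in a resource string, capped at the maximum.

    The default factor of 2 is an assumption, not a documented rule.
    """
    def bump(match):
        new_gb = min(MAX_MEM_GB, int(match.group(1)) * factor)
        return f"mem={new_gb}gb"

    return re.sub(r"mem=(\d+)gb", bump, resource_spec)
```

With a hypothetical starting value, raise_mem("-l mem=40gb") gives "-l mem=80gb", and any result above the limit is capped at 116gb.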

Registration

The following text describes the steps for registering to the Galaxy portal running at https://repeatexplorer-elixir.cerit-sc.cz. Please note that after successful registration, it takes approximately 30 minutes to set up all your Galaxy account information; for example, e-mail services inside the Galaxy portal will not work properly during this time.

Registration can be done by following these steps:

  • visit the URL https://perun.cesnet.cz/elixir2/registrar/%3Fvo=elixir-cz%26group=repeatExplorer%26targetnew=https://repeatexplorer-elixir.cerit-sc.cz
  • select the account that you will use for registration: an eduIDcz identity (Czech institutions only), ElixirID (use with caution), or one of several other identities, for example Facebook, LinkedIn, Google, MojeID, OrcID or GitHub.
  • log in to your selected account and agree when asked to share information with Perun
  • if a similar user already exists in Perun, you will be asked to prove your identity by logging into this existing account. Do this only if the account is truly yours and you want to use it (you have to provide the correct username and password for this account). If you do not want to use your existing account, or the similar user that Perun found is not you, click the button labelled "It is not me". If no similar user was found in Perun, you will not be asked this question and can skip this step.
  • fill in the application form for Elixir CZ IT services; please fill in the organization and country fields. If you change the pre-filled e-mail address, you will have to verify your e-mail later.
  • fill in the application form for RepeatExplorer Galaxy; only a username and password are required. You will use these values to log in to the Galaxy portal after registration is finished. If you chose to use your existing account in the previous steps, you will not be asked for your username and password; these fields will already be filled in based on your existing account information.
  • congratulations, you are successfully registered! By clicking the continue button, you can proceed to the Galaxy portal. If you changed your e-mail address during registration, you will have to verify it before you can continue; you will receive an e-mail message containing a verification link.

Acknowledgement

Users of the Galaxy-based RepeatExplorer are obliged to use the following acknowledgement formula in all publications created with the support of RepeatExplorer:

Computational resources were provided by the ELIXIR-CZ project (LM2015047), part of the international ELIXIR infrastructure.

RepeatExplorer is a part of services provided by ELIXIR - European research infrastructure for biological information. For other services provided by ELIXIR's Czech Republic Node visit www.elixir-czech.cz/services.

How to Cite

If you use RepeatExplorer in your work please cite:

Novak, P., Neumann, P., Pech, J., Steinhaisl, J., Macas, J. (2013) - RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next generation sequence reads. Bioinformatics

References

The principles of RepeatExplorer approach are described in:

Novak, P., Neumann, P., Macas, J. (2010) - Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data. BMC Bioinformatics 11: 378.

Novak, P., Neumann, P., Pech, J., Steinhaisl, J., Macas, J. (2013) - RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next generation sequence reads. Bioinformatics

RepeatExplorer uses RepeatMasker and Repbase:

Jurka, J., Kapitonov, V.V., Pavlicek, A., Klonowski, P., Kohany, O., Walichiewicz, J. (2005) Repbase Update, a database of eukaryotic repetitive elements. Cytogenetic and Genome Research 110:462-467

Conserved Domain Database:

Geer L.Y., Geer R.C., Gonzales N.R., Gwadz M., Hurwitz D.I., Jackson J.D., Ke Z., Lanczycki C.J., Lu F., Marchler G.H., Mullokandov M., Omelchenko M.V., Robertson C.L., Song J.S., Thanki N., Yamashita R.A., Zhang D., Zhang N., Zheng C., Bryant S.H. (2011) CDD: a Conserved Domain Database for the functional annotation of proteins. Nucleic Acids Res. 39(Database issue):D225-9.

Clustering is performed using the Louvain method:

Blondel V.D., Guillaume J., Lambiotte R., Lefebvre E. (2008) Fast unfolding of communities in large networks. J. Stat. Mech.: P10008

Galaxy-dev & OpenMS

A new instance of MetaCentrum Galaxy is currently available at https://galaxy-dev.metacentrum.cz/. You can use your standard MetaCentrum credentials to log in.

OpenMS tools can be found under the last item in the left menu panel.

This instance is not fully tested yet. If you encounter a problem, please use the bug reporting system (in the right panel, click View or report this error and then click Report).

MetaCentrum Galaxy

Our MetaCentrum Galaxy instance (https://galaxy.metacentrum.cz) runs analyses that are submitted as regular jobs under the real users' accounts (you can see your jobs on the Users page). Kerberos is used for authentication of MetaCentrum accounts through Apache. DRMAA connects Galaxy jobs with Torque. The Apache module Xsendfile is used for downloading large files. See the internal configuration page for details.

The current version is under construction; if you find any mistakes, send us the error log and a description of how you got it. Also, if you want us to integrate a tool (ideally one from the Galaxy tool shed) into our Galaxy, let us know.

Availability

Last updated April 17, 2014.

Use

ALWAYS log in through the web page https://galaxy.metacentrum.cz (the old addresses https://galaxy.metacentrum.cz/login and https://galaxy.meta.zcu.cz/login will work too, but the latter does not have a valid SSL certificate). Logging in creates Kerberos tickets (which allow you to submit new analyses/jobs) and redirects you to the .../galaxy URL. If you plan to run further analyses after some time, use the .../login URL again, because your tickets must be recreated.

Large or local files upload

Galaxy often uses FTP upload for large files. We use the standard user folders on the storages (/storage/...) for this purpose, where you can easily upload your files (see the Storage types tutorial and File systems information).

In short, if you need to upload large or local files into Galaxy:

  • (do just once) use e.g. WinSCP to connect to galaxy.metacentrum.cz with your MetaCentrum login/password (ensure that the SFTP file protocol is chosen; it is the default), create a folder (e.g. galaxy-files) there and add read and execute permissions for group and others (755)
  • (do just once) send an e-mail to meta@cesnet.cz with the folder name
  • using e.g. WinSCP, copy your files into the folder; you will then see them in Galaxy under "Upload file" - "Files uploaded via FTP"
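If you have shell access to a machine where your storage folder is mounted, the one-time folder setup from the first step can also be done with a short script instead of WinSCP. The following is a minimal sketch under that assumption; the function names are hypothetical, and the folder name galaxy-files is the example used above.

```python
import stat
from pathlib import Path


def make_shared_folder(parent, name="galaxy-files"):
    """Create the upload folder and give it mode 755 (rwxr-xr-x)."""
    folder = Path(parent) / name
    folder.mkdir(exist_ok=True)
    # The mode passed to mkdir() is filtered by the umask, so the
    # permissions are set explicitly afterwards.
    folder.chmod(0o755)
    return folder


def mode_string(path):
    """Return the permissions in ls -l style, e.g. drwxr-xr-x."""
    return stat.filemode(Path(path).stat().st_mode)
```

Calling make_shared_folder on your home directory creates the folder if it does not exist yet, and mode_string lets you confirm that group and others have read and execute access.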

Deleting files

We ask you to delete old files from your histories. First delete a file by clicking the cross next to the file name; then you can permanently delete

  • files one by one: click History options (sprocket icon) and check Include deleted datasets; the deleted datasets are now shown in the history and you can delete them permanently,
  • all deleted files at once: click History options (sprocket icon) and then Purge deleted datasets.

Step by step

Easy start for Next Generation Sequencing data analysis in MetaCentrum Galaxy:

  • log in to https://galaxy.metacentrum.cz/ with your MetaCentrum account
  • click Get data - Upload data
  • get the data from http://nihlibrary.ors.nih.gov/bioinfo/ngs/sourcedata1.zip (a zipped file with 3 fastq reads and 1 fa reference genome): download it, extract it and upload it to Galaxy
  • fastq grooming: click the menu NGS: QC and manipulation, choose FASTQ Groomer, leave the defaults (choose 1 of the fastq files from the history, Sanger type), click execute
  • quality control: in the same section choose FastQC:Read QC, use the groomed output, leave the defaults, click execute
  • check the quality by clicking the eye icon on the quality result in the history
  • read mapping: click the menu NGS: Mapping, choose Map with BWA for Illumina, select the reference genome from the history (fa file) and the groomed fastq file as the fastq file, leave the defaults (single-end) and click execute
  • check the output (sam file) by clicking the eye icon on the last item in the history
  • sam to bam conversion: in the menu NGS: SAM Tools, choose SAM-to-BAM, select the reference genome from the history (fa file) and the sam file produced by the mapping, click execute
  • download the bam result from the history
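For orientation, the mapping and conversion steps above correspond roughly to the following command-line calls. This is a sketch, not the exact commands that the Galaxy wrappers run: it assumes single-end mapping with bwa aln and bwa samse and conversion with samtools view, and the function name and file names are hypothetical. The grooming and FastQC steps are not covered. The function only builds the commands; it does not execute them.

```python
import shlex


def mapping_commands(reference_fa, reads_fastq, prefix="mapped"):
    """Return the command lines for single-end BWA mapping and SAM-to-BAM."""
    sai = f"{prefix}.sai"
    sam = f"{prefix}.sam"
    bam = f"{prefix}.bam"
    return [
        # Build the BWA index of the reference genome.
        ["bwa", "index", reference_fa],
        # Align the reads; -f writes the result to a file instead of stdout.
        ["bwa", "aln", "-f", sai, reference_fa, reads_fastq],
        # Convert the single-end alignments to SAM.
        ["bwa", "samse", "-f", sam, reference_fa, sai, reads_fastq],
        # Convert SAM (-S: input is SAM) to BAM (-b: output is BAM).
        ["samtools", "view", "-b", "-S", "-o", bam, sam],
    ]


def print_commands(reference_fa, reads_fastq):
    """Print the commands in a form that can be pasted into a shell."""
    for command in mapping_commands(reference_fa, reads_fastq):
        print(shlex.join(command))
```

Each returned list can be passed to subprocess.run on a machine where the bwa and samtools modules are loaded.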

Supported applications

We currently support

  • HelloMetaCentrum, a test job that only stores information about the machine to which it was assigned.
  • Bowtie2 available as module bowtie2-2.1.0, in menu NGS: Mapping - Bowtie2
  • Bfast available as module bfast-0.7.0, in menu NGS: Mapping - Map with BFAST
  • Blast available as module blast+-2.2.27, in menu NCBI Blast+
  • BWA available as module bwa-0.7.5a, in menu NGS: Mapping - Map with BWA for Illumina
  • Cuff tools available as module cufflinks-2.0.2, in menu NGS: RNA analysis - Cuff...
  • Cutadapt available in module python27-modules-gcc, in menu NGS: QC and manipulation - Cutadapt
  • Fastq, Fastx, FastQC tools, in menu NGS: QC and manipulation - grooming, quality check, filtering of Fast...
  • Mosaik available as module mosaik-1.1, in menu NGS: Mapping - Map with Mosaik
  • Muscle available as module muscle-3.8.31, in menu FASTA Manipulation - MUSCLE
  • RepeatExplorer available as module repeatexplorerDEV; there are 2-day and 1-week queues for clustering
  • RSEM available as module rsem-1.2.8, in menu NGS: Mapping - RSEM
  • SAMtools available as module samtools-0.1.19, in menu NGS: SAM tools - Filter SAM or BAM, SAM-to-BAM and BAM-to-SAM conversions
  • TopHat2 available as module tophat-2.0.8, in menu NGS: RNA analysis Mapping - Tophat2


Unless stated otherwise, all jobs are sent to the 1-day queue with 2 CPUs and 10 GB of memory.

If you need other applications to be included, check whether they are in the Galaxy tool shed and send a request to meta@cesnet.cz.

Problems

Report problems in detail, i.e. write which tool and input data you used and what you see behind the bug icon.

  • You do not see your large files in FTP upload.

Change the permissions (og+rx) of the folder (and the folder above it) containing the files that you want to upload into Galaxy.
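To find out which of the two folders is the problem, the permission bits can be inspected directly. The sketch below assumes shell access to a machine where the storage is mounted; the function names are hypothetical.

```python
import stat
from pathlib import Path

# Read and execute bits for group and others, i.e. what chmod og+rx sets.
OG_RX = stat.S_IRGRP | stat.S_IXGRP | stat.S_IROTH | stat.S_IXOTH


def folders_missing_og_rx(folder):
    """Return the parent and/or the folder itself if either lacks og+rx."""
    folder = Path(folder).resolve()
    return [
        path
        for path in (folder.parent, folder)
        if (path.stat().st_mode & OG_RX) != OG_RX
    ]


def add_og_rx(path):
    """Equivalent of chmod og+rx: add the bits and keep the existing ones."""
    path = Path(path)
    path.chmod(stat.S_IMODE(path.stat().st_mode) | OG_RX)
```

An empty list from folders_missing_og_rx means both folders already have the required permissions; otherwise add_og_rx can be applied to each path in the list.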

  • galaxy_179.o: Permission denied

If you get job output in which Permission denied appears many times, e.g.

Unable to copy file /var/spool/torque/spool/5764811.arien.ics.muni.cz.OU to /auto/plzen1/home/galaxy/galaxy-dist/database/job_working_directory/000/179/galaxy_179.o
** error from copy
/bin/cp: accessing `/auto/plzen1/home/galaxy/galaxy-dist/database/job_working_directory/000/179/galaxy_179.o': Permission denied

you should check that you logged in via the .../login address before running the analysis.

  • IOError: [Errno 13] Permission denied

If you see something like

IOError: [Errno 13] Permission denied: '/storage/brno2/home/galaxy/tmp/GALAXY_VERSION_STRING_66'

write an e-mail to meta@cesnet.cz; the error is hidden in the log files.

Documentation

https://wiki.galaxyproject.org/

Program manager

meta@cesnet.cz

Homepage

https://wiki.galaxyproject.org/