Structure

Description

The program structure is a free software package for using multi-locus genotype data to investigate population structure. Its uses include inferring the presence of distinct populations, assigning individuals to populations, studying hybrid zones, identifying migrants and admixed individuals, and estimating population allele frequencies in situations where many individuals are migrants or admixed. It can be applied to most of the commonly-used genetic markers, including SNPs, microsatellites, RFLPs and AFLPs.

Licence

Public

Usage

Upcoming module system change alert!

Because of the large number of applications and their versions, it is not practical to list them all explicitly on our wiki pages, so an upgrade of the modulefiles is underway. One feature of this upgrade is a default module for every application. The default module is loaded without a version number and resolves to one version, usually the latest.

You can test the new version now by adding the line

source /cvmfs/software.metacentrum.cz/modulefiles/5.1.0/loadmodules

to your script before loading a module. You can then list all versions of structure and load the default version with

module avail structure/ # list available modules
module load structure   # load (default) module


If you prefer to stay with the current system, that is still possible. List all modules with

module avail structure

and choose the explicit version you want to use.

  • Version 2.3.4:
module add jdk-8 structure-2.3.4
structure
  • Version 2.3.3:
module add structure-2.3.3
structure

Parallelisation

STRUCTURE itself processes a single input file at a time. A simple Java GUI is available for creating batch tasks and running them on a desktop or in a graphical interactive job on MetaCentrum, but this approach is far from optimal.

Submission of multiple jobs

The set of scripts at https://github.com/V-Z/structure-multi-pbspro takes an input file, a range of K values, and the number of times each K should be calculated, and submits multiple jobs (one for each STRUCTURE run). All results are delivered to the selected directory. It is probably the most efficient way to process larger data sets as well.
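The idea of one run per combination of K and replicate can be sketched in plain shell. This is only an illustration and not the interface of the scripts linked above: the function name plan_runs, the input name data.str and the output naming scheme are assumptions. The -K, -i and -o options are the STRUCTURE command-line overrides for the number of populations, the input file and the output file.

```shell
#!/bin/sh
# Sketch only: print one STRUCTURE command line per (K, replicate) pair.
# plan_runs, data.str and the out_K*_rep* names are hypothetical.

# plan_runs INPUT K_MIN K_MAX REPS
plan_runs() (
  input=$1; k=$2; kmax=$3; reps=$4
  while [ "$k" -le "$kmax" ]; do
    r=1
    while [ "$r" -le "$reps" ]; do
      # One line per run; each run gets its own output file
      printf 'structure -K %s -i %s -o out_K%s_rep%s\n' "$k" "$input" "$k" "$r"
      r=$((r + 1))
    done
    k=$((k + 1))
  done
)

# Example: K from 1 to 3, 2 replicates each, gives 6 command lines
plan_runs data.str 1 3 2
```

In a real setup each printed line would go into its own batch job, so that every STRUCTURE run is scheduled independently.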

R library ParallelStructure

Drawing on a very good source of inspiration, VZ described how to launch several Structure runs in parallel using an R script. See https://trapa.cz/en/structure-r-linux and the slides. You can easily launch, for example, 100 Structure runs on 10 CPUs.
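The principle of running many independent runs on a fixed number of CPUs can also be illustrated without R. The sketch below is not ParallelStructure and not the R script from the linked page; it swaps in xargs with the -P option (available in GNU and BSD xargs) as a stand-in, and the function name run_parallel is an assumption. It reads one command per line and keeps at most the given number of them running at once.

```shell
#!/bin/sh
# Sketch only: run commands from stdin, at most NCPUS at a time.
# run_parallel is a hypothetical helper, not part of STRUCTURE or ParallelStructure.

# run_parallel NCPUS   (one simple command per input line, no quoting inside)
run_parallel() {
  xargs -P "$1" -I {} sh -c '{}'
}

# Demo with harmless placeholder commands instead of real STRUCTURE runs
i=1
while [ "$i" -le 4 ]; do
  echo "echo run $i"
  i=$((i + 1))
done | run_parallel 2
```

With real input, each line would be a complete structure command, and the first argument would match the number of CPUs reserved for the job. The order in which the runs finish is not guaranteed.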


Documentation

Version 2.3.4: Documentation is available at [1] .

Version 2.3.3: Documentation is available at [2] .

Note: A number of program parameters are set by the user and stored in two files (mainparams and extraparams), which are read every time the program executes. Both files must be in the directory from which Structure is executed. See http://pritchardlab.stanford.edu/software/structure_v.2.3.1/documentation.pdf (chapter 7, page 20) for more details.
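Because a missing parameter file is an easy mistake to make in a batch job, it can help to check for both files before starting the program. A minimal sketch, assuming a hypothetical helper named check_params; only the file names mainparams and extraparams come from the note above.

```shell
#!/bin/sh
# Sketch only: verify that both parameter files exist before running STRUCTURE.
# check_params is a hypothetical helper, not part of STRUCTURE.

# check_params [DIR]   (DIR defaults to the current directory)
check_params() {
  dir=${1:-.}
  for f in mainparams extraparams; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing $f in $dir" >&2
      return 1
    fi
  done
  return 0
}

# Possible use in a job script, run from the working directory:
#   check_params . && structure
```

The check returns a non-zero status as soon as one file is missing, so the program is not started with an incomplete configuration.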

Homepage

URL: https://web.stanford.edu/group/pritchardlab/structure.html