AYB is a base caller for the Illumina Genome Analyzer, using an explicit statistical model of how errors occur during sequencing to produce more accurate reads from the raw intensity data.
In particular, AYB deals with three sources of error:
Cross-talk: There is overlap in the excitation spectra of the fluorophores used to label the nucleotides, leading to light emission being detected under several combinations of lasers and filters ("channels"). This effect is especally noticable for the fluorophores used to mark adenine and guanine, each of which is bright in two channels. Phasing: As the number of cycles increases, the signal starts to blur as the cluster loses synchronicity: random failure of nucleotides to incorporate, or failure of the blocking element to prevent incorporation of more than one nucleotide mean that individual strands lag or lead and the signal detected at each cycle is a mixture of several positions along the read. Contamination: Non-sequence contamination in the flow cell, microscopic particles of dust for example, get illuminated by the lasers and might be detected instead of sequence. Such contamination is generally abnormally bright compared to the surrounding sequence and so does not conform to what AYB expects, the quality scores for the called base being automatically down-weighted as a result. In contrast to other base-calling approaching, AYB uses a general model of phasing estimated directly from the data rather than assuming that it occurs at a constant rate for all cycles. Dealing with phasing in this manner means that the base calls made by AYB at the end of each read tend to be more accurate than other methods, making greater read lengths feasible and increasing the number of the highest quality reads: AYB returning 2.8 times as many perfect reads than other base callers for 100 cycle data (with smaller gains for shorter reads).
By default AYB performs per-tile analysis, estimating phasing and cross-talk separately for every tile. This level of analysis is more processor intensive than the Illumina analysis pipeline but can be efficiently split between machines: an entire 8 lane run of 45 cycle data (95 million clusters) can be analysed within an hour on a modern eight-core server, as could 2 million clusters of much longer 101 cycle data. In addition AYB offers two options to reduce the total computational burden: fixing the cross-talk matrix across tiles, either at a value previously estimated by AYB or the Illumina pipeline, allows phasing to be solved analytically in each iteration and so speeding up estimation considerably; alteratively a Bustard-like approach can be used, estimating the cross-talk and phasing from a few tiles and then holding them fixed while calling bases for the remaining tiles.
GNU General Public Licence version 3
module add ayb2 ayb -h
https://www.ebi.ac.uk/goldman-srv/AYB/man/ayb-recal.1.html or in directory