Iterated Function System for visualisation of genomes

Main application window

Our Java program is a tool for visualising large genomes using the Chaos Game Representation (CGR; Jeffrey 1990). CGR transforms a DNA sequence into a coloured image (figure on the right). Every pixel corresponds to a short sequence of n symbols, called an n-mer, and the colour of the pixel reflects the frequency of that n-mer in the analysed sequence. The result is a two-dimensional histogram in which sparse and dense occurrences of n-mers can be spotted quickly.
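The mapping from n-mers to pixels can be illustrated with a short Java sketch. It is not the tool's source code: the class name, the method name and the assignment of nucleotides to the corners of the square (A bottom-left, C top-left, G top-right, T bottom-right) are assumptions made for this example, and the tool's own layout may differ.

```java
import java.util.Arrays;

final class CgrCountSketch {

    /**
     * Counts every n-mer of the sequence in a 2^n x 2^n grid.
     * Assumed corners: A=(0,0), C=(0,1), G=(1,1), T=(1,0).
     */
    static int[][] countOligomers(String sequence, int n) {
        int side = 1 << n;                 // 2^n cells per axis
        int[][] grid = new int[side][side];
        int x = 0, y = 0;                  // cell coordinates of the current n-mer
        int valid = 0;                     // consecutive A/C/G/T symbols seen so far
        for (int i = 0; i < sequence.length(); i++) {
            int bx = 0, by = 0;
            switch (Character.toUpperCase(sequence.charAt(i))) {
                case 'A': break;
                case 'C': by = 1; break;
                case 'G': bx = 1; by = 1; break;
                case 'T': bx = 1; break;
                default:                   // ambiguous symbol (e.g. N): restart the window
                    valid = 0;
                    continue;
            }
            // Drop the oldest symbol and place the newest one in the
            // most significant bit, as the chaos game halves the distance
            // to the corner of the latest symbol.
            x = (x >> 1) | (bx << (n - 1));
            y = (y >> 1) | (by << (n - 1));
            if (++valid >= n) {
                grid[y][x]++;
            }
        }
        return grid;
    }

    public static void main(String[] args) {
        // Print the 4x4 grid of dinucleotide counts for a toy sequence.
        for (int[] row : countOligomers("ACGTACGTTTGA", 2)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Each cell of the returned grid holds the number of occurrences of one n-mer, which is the value a colour scale would then turn into a pixel colour.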

Description

Control window

After initialisation, the tool prompts the user to browse for an input file (in .fasta or .gb format) on the disk. The visualisation parameters are displayed immediately in a control window. The application works with genomic data in the commonly used FASTA and GenBank formats. A GenBank file may also contain descriptive information about the sequence, such as its description, source, authors, references and the specification of other features included in the genome. The program extracts these and uses them later in the visualisation process. The Main application window lists all opened sequences with their details. Several visualisation windows can then be opened and handled simultaneously. All available visualisation parameters are shown in the Visualisation control window, which is used to adjust the visualisation interactively.
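To make the input step concrete, here is a minimal Java sketch of reading the first record of a FASTA file. It only illustrates the format (a header line starting with ">" followed by sequence lines) and is not the tool's parser; the class and method names are hypothetical, and reading the annotated .gb format is not covered.

```java
import java.util.Arrays;
import java.util.List;

final class FastaSketch {

    /** Returns {header, sequence} for the first record found in the given lines. */
    static String[] firstRecord(List<String> lines) {
        String header = null;
        StringBuilder sequence = new StringBuilder();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;                       // skip blank lines
            }
            if (line.startsWith(">")) {
                if (header != null) {
                    break;                      // a second record starts here
                }
                header = line.substring(1).trim();
            } else if (header != null) {
                sequence.append(line.toUpperCase());
            }
        }
        if (header == null) {
            throw new IllegalArgumentException("no FASTA header line ('>') found");
        }
        return new String[] { header, sequence.toString() };
    }

    public static void main(String[] args) {
        String[] record = firstRecord(Arrays.asList(">demo sequence", "ACGTAC", "gttga"));
        System.out.println(record[0] + ": " + record[1].length() + " symbols");
    }
}
```

For a file on disk, the lines could be obtained with `java.nio.file.Files.readAllLines`, although a genome-sized input would more sensibly be streamed line by line.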

The main visualisation parameter, i.e. the region of the genome used in the visualisation, is controlled by the slider at the top of the window. The user can specify the region directly by entering a start position and length, or use the slider to move the region swiftly through the genome. The current position and length of the region are displayed in the slider as a red bar of corresponding size.

The GenBank format may include specifications of genomic features such as genes, mRNA, etc. These are extracted from the input file and listed in the Features part of the window. When the user selects a feature, its description is displayed in the right part. The Visualise button visualises the selected feature immediately, so important parts of the genome can be analysed with a single click.

The remaining visualisation parameters are listed in the Visualisation settings section. The colour range, i.e. the transformation of oligomer frequency to the colour heatmap, can be adjusted. The minimum value is shown in blue and the maximum in red. When automatic adjustment of the colour range is chosen, the maximum value is always recalculated so that the most frequent oligomer is displayed in red. Oligomers with low and high frequencies can be highlighted simultaneously by using a logarithmic scale for the colour heatmap, i.e. log(count) is applied to the counter value.
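The colour transformation described above can be sketched in Java as follows. This is an illustration under stated assumptions, not the tool's implementation: the names are hypothetical, the palette is a plain linear blend from blue to red (the tool's actual heatmap palette is not specified here), and the logarithmic scale uses log(1 + count) rather than log(count) so that a count of zero stays finite.

```java
final class ColourScaleSketch {

    /** Automatic range adjustment: the maximum is the largest count in the grid. */
    static int autoMax(int[][] counts) {
        int max = 0;
        for (int[] row : counts) {
            for (int c : row) {
                max = Math.max(max, c);
            }
        }
        return max;
    }

    /** Maps a count to [0, 1], where 0 is the range minimum and 1 the maximum. */
    static double normalise(int count, int min, int max, boolean logScale) {
        // Assumption: log1p instead of log so that count = 0 is well defined.
        double v = logScale ? Math.log1p(count) : count;
        double lo = logScale ? Math.log1p(min) : min;
        double hi = logScale ? Math.log1p(max) : max;
        if (hi <= lo) {
            return 0.0;                         // degenerate range
        }
        double t = (v - lo) / (hi - lo);
        return Math.max(0.0, Math.min(1.0, t)); // clamp values outside the range
    }

    /** Assumed palette: 0 maps to pure blue, 1 to pure red (0xRRGGBB). */
    static int toRgb(double t) {
        int red = (int) Math.round(255 * t);
        int blue = 255 - red;
        return (red << 16) | blue;
    }

    public static void main(String[] args) {
        int[][] counts = { { 0, 3 }, { 40, 1000 } };
        int max = autoMax(counts);
        for (int[] row : counts) {
            for (int c : row) {
                System.out.printf("%d -> linear %06X, log %06X%n", c,
                        toRgb(normalise(c, 0, max, false)),
                        toRgb(normalise(c, 0, max, true)));
            }
        }
    }
}
```

On the linear scale a single very frequent oligomer pushes almost all other cells towards blue, while the logarithmic scale spreads low and high counts more evenly across the palette.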

The coloured heatmap can be inverted, which is useful when looking for infrequent oligomers (right image). The number of cells in the visualisation can be changed from 4x4 to 1024x1024 by choosing different oligomer lengths. Each visualisation is opened in a separate window whose title displays the sequence name and the visualisation range. The heatmap on the left side displays both the transformation of frequency to colour and a histogram of the colours used in the image. The colour histogram makes it easier to set up optimal parameters for the colour transformation.
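Two of the points above lend themselves to a small Java sketch: the relation between oligomer length and grid size, and the histogram of colours shown next to the heatmap. In a two-dimensional CGR an oligomer of length n addresses a grid of 2^n x 2^n cells, so the range from 4x4 to 1024x1024 corresponds to lengths 2 to 10. The names, the linear scaling and the number of colour bins below are assumptions for illustration only.

```java
import java.util.Arrays;

final class HeatmapLegendSketch {

    /** Cells per axis for oligomers of the given length: 2^n. */
    static int gridSide(int oligomerLength) {
        return 1 << oligomerLength;
    }

    /**
     * Counts how many cells fall into each colour bin.
     * Bin 0 is the low end of the colour scale, the last bin the high end;
     * with inverted = true the scale is flipped.
     */
    static int[] colourHistogram(int[][] counts, int max, int bins, boolean inverted) {
        int[] histogram = new int[bins];
        for (int[] row : counts) {
            for (int c : row) {
                double t = max > 0 ? Math.min(1.0, (double) c / max) : 0.0;
                if (inverted) {
                    t = 1.0 - t;
                }
                int bin = Math.min(bins - 1, (int) (t * bins)); // t = 1 goes to the last bin
                histogram[bin]++;
            }
        }
        return histogram;
    }

    public static void main(String[] args) {
        System.out.println(gridSide(2) + "x" + gridSide(2) + " up to "
                + gridSide(10) + "x" + gridSide(10));
        int[][] counts = { { 0, 0 }, { 1, 4 } };
        System.out.println(Arrays.toString(colourHistogram(counts, 4, 4, false)));
        System.out.println(Arrays.toString(colourHistogram(counts, 4, 4, true)));
    }
}
```

A histogram with nearly all cells in the lowest bin suggests that the colour range is too wide for the data, which is the kind of situation the on-screen histogram helps to spot.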

Visualisation window

References

Tool redistribution and citation policy

This tool is freely available to the public. The author, Matej Makula, gives permission for this product to be used without a licence for any purpose, on the condition that the author and this web page are clearly acknowledged as the source of the product.

Paper

Makula M. and Benuskova L. (2009) Interactive visualisation of oligomer frequency in DNA. Computing and Informatics, Vol. 28, 695–710.

Contact

If you have any comments, questions or ideas for functionality improvement do not hesitate to contact me via .
