Software Tools

MADNet

MADNet is an analysis, mining and visualization tool aimed at broadening the biological context of high-throughput data. Taking as input a list of genes with a measured, biologically relevant feature (e.g. expression level or copy number), it performs enrichment analysis of significant data points across KEGG pathways and GO terms while integrating annotation data from various sources, thereby presenting the input in a broader context.

http://bioinfo.hr/madnet/

INCA
INCA 1.20a (previous version)
Overview of features:

computes and charts codon and amino acid frequencies
calculates common indices, such as the effective number of codons (Nc), CAI and “codon bias”
fully customizable scatter plots – spot trends in codon usage
export graphics, or text files for further analysis
built-in self organizing map (SOM) for data visualization and clustering
codon usage optimizer helps improve heterologous gene expression
random nucleotide sequence generator
comprehensive user manual and a 15-minute tutorial
available for the Win32 platform; free of charge for academic use

Download INCA 1.20a

INCA 2.1 with INCAblocks now available!
INCA 2.1 features:

ability to load/unload multiple files (ncbi, kegg, cutg, fasta files)
save and load ‘projects’, import numerical data and codon frequencies
create user-defined gene groups, descriptive stats & correlation for groups
3D scatterplots, coloring by any criterion, graphical select & filtering
improved SOM, based on the MILC statistic, with more visualization criteria
principal component analysis (PCA) in plots, tables and SOM
a more comprehensive nucleotide sequence generator
“INCAblocks 2.1” is the Pascal source code for INCA’s units that enable you to quickly write your own applications
numerous user interface improvements; for Windows and Linux

Download INCA 2.1 with INCAblocks

If you used INCA in your work, please cite:
Supek F, Vlahoviček K. INCA: synonymous codon usage analysis and clustering by means of self-organizing map. Bioinformatics. 2004 Sep 22;20(14):2329-2330.

PRO-MINE

http://bioinfo.hr/pro-mine

MILC and MELP
What are MILC and MELP?

A number of methods (also called measures) are currently in use to quantify codon usage in genes. These measures are often influenced by other sequence properties, such as length, which can introduce strong methodological bias into measurements. We therefore attempted to develop a method free from such dependencies.

What did we do?

We compared the performance of several commonly used measures and a novel method we introduce – Measure Independent of Length and Composition (MILC). Large, randomly generated sequence sets were used to test for dependence on:

sequence length
overall amount of codon bias and
codon bias discrepancy in the sequences.
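To illustrate why sequence length matters, the following is a minimal sketch and not the procedure used in the paper. It draws random coding sequences from a fixed codon distribution and reports how far the observed codon frequencies drift from that distribution. The names `random_coding_sequence` and `mean_abs_deviation` are hypothetical helpers introduced for this example, and the uniform default distribution is an assumption.

```python
import random
from collections import Counter
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}
# The 61 sense codons of the standard genetic code.
SENSE_CODONS = [
    "".join(p) for p in product("TCAG", repeat=3)
    if "".join(p) not in STOP_CODONS
]


def random_coding_sequence(n_codons, weights=None, rng=None):
    """Draw n_codons sense codons independently (uniformly if weights is None)."""
    rng = rng or random.Random()
    return "".join(rng.choices(SENSE_CODONS, weights=weights, k=n_codons))


def mean_abs_deviation(seq, weights=None):
    """Mean absolute difference between observed and target codon frequencies."""
    n_codons = len(seq) // 3
    counts = Counter(seq[i:i + 3] for i in range(0, 3 * n_codons, 3))
    if weights is None:
        target = [1.0 / len(SENSE_CODONS)] * len(SENSE_CODONS)
    else:
        total = float(sum(weights))
        target = [w / total for w in weights]
    deviations = (
        abs(counts[c] / n_codons - t) for c, t in zip(SENSE_CODONS, target)
    )
    return sum(deviations) / len(SENSE_CODONS)


rng = random.Random(0)  # fixed seed so the run is repeatable
for n in (50, 500, 5000):
    seq = random_coding_sequence(n, rng=rng)
    print(n, round(mean_abs_deviation(seq), 5))
```

Short sequences show much larger sampling noise than long ones even though all are drawn from the same distribution, which is the kind of length effect a codon usage measure has to be resistant to.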

A derivative of the method, named MELP (MILC-based Expression Level Predictor), can be used to quantitatively predict gene expression levels from genomic data. It was compared to other similar predictors by examining their correlation with actual, experimentally obtained mRNA or protein abundances.

Our conclusion…

We have established that MILC is a generally applicable measure, being resistant to changes in gene length and overall nucleotide composition, and introducing little noise into measurements. Other methods, however, may also be appropriate in certain applications.

Our efforts to quantitatively predict gene expression levels in several prokaryotes and unicellular eukaryotes met with varying levels of success, depending on the experimental dataset and predictor used. Out of all methods, MELP and Rainer Merkl’s GCB method had the most consistent behaviour. A ‘reference set’ containing known ribosomal protein genes appears to be a valid starting point for a codon usage-based expressivity prediction.

Read the whole paper

Fran Supek, Kristian Vlahoviček. Comparison of codon usage measures and their applicability in prediction of microbial gene expressivity. BMC Bioinformatics. 2005 Jul 19;6:182.
Free Full Text in BMC Bioinformatics

Use MILC in your work

MILC is the default codon usage measure in the INCA software package, which is freely available for academic use on Windows and Linux; see the INCA section above for downloads. Of course, you may also implement MILC and MELP in your own scripts and programs.
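As a starting point for such a script, here is a minimal Python sketch. It is not INCA's code, and the formulas should be checked against the paper cited above before use. It assumes:

- MILC = (sum over amino acids of M_a) / L - C
- M_a = 2 * sum over codons of O_c * ln(f_c / g_c), where f_c and g_c are within-amino-acid codon frequencies in the gene and in the expected distribution
- L is the gene length in codons
- C = (sum of (r_a - 1)) / L - 0.5
- MELP is MILC against the genome-wide codon usage divided by MILC against a reference set

Taking r_a as the number of synonymous codons of each amino acid present in the gene is an assumption of this sketch, as are all function names.

```python
import math
from collections import Counter

# Standard genetic code (NCBI translation table 1), bases in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Amino acid -> list of its synonymous codons (stop codons excluded).
SYNONYMS = {}
for codon, aa in GENETIC_CODE.items():
    if aa != "*":
        SYNONYMS.setdefault(aa, []).append(codon)


def codon_counts(seq):
    """Count sense codons in an in-frame sequence; skip stops and unknowns."""
    seq = seq.upper().replace("U", "T")
    counts = Counter()
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if GENETIC_CODE.get(codon, "*") != "*":
            counts[codon] += 1
    return counts


def milc(gene_counts, expected_counts):
    """MILC of a gene against an expected codon distribution (see assumptions)."""
    length = sum(gene_counts.values())  # L, gene length in codons
    if length == 0:
        raise ValueError("gene contains no sense codons")
    total = 0.0
    r_sum = 0
    for aa, codons in SYNONYMS.items():
        n_gene = sum(gene_counts[c] for c in codons)
        if n_gene == 0:
            continue  # amino acid absent from the gene
        n_exp = sum(expected_counts[c] for c in codons)
        for c in codons:
            observed = gene_counts[c]
            if observed == 0:
                continue  # a zero count contributes nothing to M_a
            if expected_counts[c] == 0:
                raise ValueError(f"codon {c} missing from expected distribution")
            f = observed / n_gene
            g = expected_counts[c] / n_exp
            total += 2 * observed * math.log(f / g)
        r_sum += len(codons) - 1  # assumption: r_a = degeneracy of amino acid a
    correction = r_sum / length - 0.5
    return total / length - correction


def melp(gene_counts, genome_counts, reference_counts):
    """Assumed MELP: MILC against the genome over MILC against the reference set."""
    return milc(gene_counts, genome_counts) / milc(gene_counts, reference_counts)
```

Here `genome_counts` would be the summed codon counts of all genes in a genome, and `reference_counts` those of a reference set such as the ribosomal protein genes mentioned above. Both must be `Counter` objects, for example as returned by `codon_counts`, so that missing codons count as zero.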
