Help

PSSM-convert - A program for creating PSSMs and for converting between PSSM formats.


Name and version:

PSSM-convert, version 0.1

Author and institution:

Matus Hajduk
Institute of Molecular Biology, Slovak Academy of Sciences

 

Usage:

PSSM-convert takes as input either FASTA-formatted sequences or a PSSM (position-specific scoring matrix) in FASTA, Transfac, Patser or PromScan format.

The program automatically recognizes whether the inserted data are sequences or a matrix, and in which format the PSSM was pasted. Inserted data must meet the criteria listed in the Requirements section.

The user has to specify the output format, which can be FASTA, Transfac, Patser, PromScan, Sequences, PWM or Weblogo. The program then shows the inserted data in the chosen format. If the Sequences format is selected, the output is a set of sequences with the same information content as the original sequences from which the PSSM was created (that is, a PSSM re-created from these sequences will be identical). If the Weblogo format is selected, a sequence logo is created. If the PWM format is selected, the displayed Position Weight Matrix is calculated with weights according to Hertz & Stormo (1999):

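W_(i,j) = ln( ((n_(i,j) + p_i) / (N + 1)) / p_i )

where n_(i,j) is the number of times letter i occurs at position j, p_i is the a priori probability of letter i, and N is the total number of sequences.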
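
The weight formula above can be sketched in Python as follows. This is only an illustration, not the program's own code: the function name pwm_weights, the uniform background probabilities (p_i = 0.25) and the use of the column total as N are assumptions made for the example.

```python
import math

def pwm_weights(counts, background=None):
    """Turn per-letter count rows into log-odds weights, column by column."""
    letters = "ACGT"
    if background is None:
        # Assumption: uniform a priori probabilities p_i = 0.25.
        background = {letter: 0.25 for letter in letters}
    length = len(counts["A"])
    weights = {letter: [] for letter in letters}
    for j in range(length):
        # Assumption: N is the column total, which equals the number of
        # sequences when the alignment contains no gaps.
        n_total = sum(counts[letter][j] for letter in letters)
        for letter in letters:
            p = background[letter]
            corrected = (counts[letter][j] + p) / (n_total + 1)
            weights[letter].append(math.log(corrected / p))
    return weights

# The example matrix used throughout this help page (3 sequences, 5 positions).
counts = {
    "A": [1, 0, 1, 0, 3],
    "C": [0, 1, 0, 2, 0],
    "G": [0, 1, 1, 1, 0],
    "T": [2, 1, 1, 0, 0],
}
weights = pwm_weights(counts)
print(round(weights["A"][4], 4))  # weight of A at the last position
```

Adding p_i to the count keeps the logarithm finite for letters that never occur at a position.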

Only one matrix can be inserted at a time. The number of pasted sequences is not limited.
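
One way to picture the Sequences output format is the sketch below, which expands each column of counts back into letters and reads the result row by row. How the program itself arranges the letters is not documented here, so the function name sequences_from_counts, the letter order within a column and the gap padding are all assumptions.

```python
def sequences_from_counts(counts, letters="ACGT"):
    """Build sequences whose per-position letter counts equal `counts`."""
    length = len(counts[letters[0]])
    columns = []
    for j in range(length):
        # Repeat each letter as many times as it was counted at position j.
        columns.append("".join(letter * counts[letter][j] for letter in letters))
    depth = max(len(column) for column in columns)
    # Assumption: columns with fewer letters are padded with gaps ("-").
    columns = [column.ljust(depth, "-") for column in columns]
    # Sequence k takes the k-th letter from every column.
    return ["".join(column[k] for column in columns) for k in range(depth)]

counts = {
    "A": [1, 0, 1, 0, 3],
    "C": [0, 1, 0, 2, 0],
    "G": [0, 1, 1, 1, 0],
    "T": [2, 1, 1, 0, 0],
}
sequences = sequences_from_counts(counts)
print(sequences)
```

Counting the letters of the returned sequences position by position gives back the original matrix, which is the property the Sequences format is described as having.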

 

Requirements:

A matrix in FASTA format must begin with a line starting with the ">" sign, followed by four rows of values. Example of a matrix in FASTA format:

>Matrix
1 0 1 0 3
0 1 0 2 0
0 1 1 1 0
2 1 1 0 0
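
A minimal sketch of reading such a matrix is shown below. The function name parse_fasta_matrix is hypothetical, and the row order A, C, G, T and integer counts are assumptions taken from the other examples on this page, not stated rules.

```python
def parse_fasta_matrix(text):
    """Read a header line starting with '>' followed by four rows of counts."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("Incorrect matrix format")
    rows = lines[1:]
    if len(rows) != 4:
        raise ValueError("Incorrect matrix format")
    name = lines[0][1:].strip()
    counts = {}
    # Assumption: the four rows are given in the order A, C, G, T.
    for letter, row in zip("ACGT", rows):
        counts[letter] = [int(value) for value in row.split()]
    # All rows must describe the same number of positions.
    if len({len(values) for values in counts.values()}) != 1:
        raise ValueError("Incorrect matrix format")
    return name, counts

example = """>Matrix
1 0 1 0 3
0 1 0 2 0
0 1 1 1 0
2 1 1 0 0
"""
name, counts = parse_fasta_matrix(example)
print(name, counts["A"])
```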

A matrix in PromScan format must begin with "A :" or "a :", and the next three rows must begin with "C :", "G :", "T :" or "c :", "g :", "t :". There can be zero or more whitespace characters between the letter and the ":". Example of a matrix in PromScan format:

A: 1 0 1 0 3
C: 0 1 0 2 0
G: 0 1 1 1 0
T: 2 1 1 0 0

A matrix in Patser format must begin with "A |" or "a |", and the next three rows must begin with "C |", "G |", "T |" or "c |", "g |", "t |". There can be zero or more whitespace characters between the letter and the "|". Example of a matrix in Patser format:

A| 1 0 1 0 3
C| 0 1 0 2 0
G| 0 1 1 1 0
T| 2 1 1 0 0
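
Because the PromScan and Patser formats differ only in the separator after the row letter, both can be read by one routine. The sketch below is an illustration under that observation. The name parse_labelled_matrix and the regular expression are assumptions, as is accepting mixed upper- and lower-case row letters within one matrix.

```python
import re

# Row letter, optional whitespace, then ":" (PromScan) or "|" (Patser).
ROW = re.compile(r"^([ACGTacgt])\s*([:|])\s*(.*)$")

def parse_labelled_matrix(text):
    """Return (format name, counts) for a PromScan or Patser matrix."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 4:
        raise ValueError("Incorrect matrix format")
    counts = {}
    separators = set()
    for expected, line in zip("ACGT", lines):
        match = ROW.match(line)
        if match is None or match.group(1).upper() != expected:
            raise ValueError("Incorrect matrix format")
        separators.add(match.group(2))
        counts[expected] = [int(value) for value in match.group(3).split()]
    if len(separators) != 1:
        # Mixing ":" and "|" within one matrix is treated as an error here.
        raise ValueError("Incorrect matrix format")
    name = "PromScan" if separators == {":"} else "Patser"
    return name, counts

promscan = "A: 1 0 1 0 3\nC: 0 1 0 2 0\nG: 0 1 1 1 0\nT: 2 1 1 0 0"
patser = "a | 1 0 1 0 3\nc | 0 1 0 2 0\ng | 0 1 1 1 0\nt | 2 1 1 0 0"
print(parse_labelled_matrix(promscan)[0])
print(parse_labelled_matrix(patser)[0])
```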

A matrix in Transfac format must begin with "ID" or "id" and may optionally contain "BF" or "bf" in the next row. Matrices with IUPAC code letters in an additional column are also accepted. Example of a matrix in Transfac format:

ID Matrix
BF
PO	A	C	G	T
01	1	0	0	2	W
02	0	1	1	1	B
03	1	0	1	1	D
04	0	2	1	0	S
05	3	0	0	0	A
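
Unlike the other formats, a Transfac matrix lists one position per row, so the values have to be transposed into per-letter rows. The sketch below illustrates this. The name parse_transfac is hypothetical, and treating the "PO" header as optional (with a default column order of A, C, G, T) is an assumption based only on the example above.

```python
def parse_transfac(text):
    """Return (name, counts, consensus) from a Transfac-style matrix."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    if not lines or lines[0][0].upper() != "ID":
        raise ValueError("Incorrect matrix format")
    name = " ".join(lines[0][1:])
    rows = lines[1:]
    if rows and rows[0][0].upper() == "BF":
        rows = rows[1:]  # the BF row is optional and carries no counts
    letters = ["A", "C", "G", "T"]
    if rows and rows[0][0].upper() == "PO":
        # Assumption: the PO row, when present, names the column order.
        letters = [field.upper() for field in rows[0][1:5]]
        rows = rows[1:]
    if sorted(letters) != ["A", "C", "G", "T"]:
        raise ValueError("Incorrect matrix format")
    counts = {letter: [] for letter in letters}
    consensus = []
    for fields in rows:
        if len(fields) < 5:
            raise ValueError("Incorrect matrix format")
        for letter, value in zip(letters, fields[1:5]):
            counts[letter].append(int(value))
        # An optional sixth column holds the IUPAC code letter.
        consensus.append(fields[5] if len(fields) > 5 else "")
    return name, counts, "".join(consensus)

example = "\n".join([
    "ID Matrix",
    "BF",
    "PO\tA\tC\tG\tT",
    "01\t1\t0\t0\t2\tW",
    "02\t0\t1\t1\t1\tB",
    "03\t1\t0\t1\t1\tD",
    "04\t0\t2\t1\t0\tS",
    "05\t3\t0\t0\t0\tA",
])
name, counts, consensus = parse_transfac(example)
print(name, consensus, counts["A"])
```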

Sequences must be in FASTA format (the first row of each sequence begins with the ">" sign). The first row cannot contain more than one ">". All inserted sequences must have an equal number of characters. Sequences can contain gaps, represented by a hyphen. Apart from gaps, only the characters A, a, C, c, G, g, T, t are accepted within a sequence. Example of FASTA sequences:

>Sequence 1
ACGTAGTCGTAGGTGCATCGTTGA
>Sequence 2
CTAGCTACGATCGCTACGACGCAT
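
The sequence rules above, and the step from sequences to a count matrix, can be sketched as follows. The function names are hypothetical. Joining sequences that span several lines and leaving gaps uncounted are assumptions, since this page does not say how the program handles either case.

```python
def read_fasta_sequences(text):
    """Validate FASTA sequences against the rules above and return them."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if line.count(">") > 1:
                raise ValueError("header contains more than one '>'")
            records.append([line[1:].strip(), ""])
        elif not records:
            raise ValueError("sequence data before the first header")
        else:
            # Assumption: a sequence may span several lines.
            records[-1][1] += line.upper()
    sequences = [sequence for _, sequence in records]
    if not sequences or len({len(s) for s in sequences}) != 1:
        raise ValueError("sequences must all have the same length")
    if any(set(s) - set("ACGT-") for s in sequences):
        raise ValueError("only A, C, G, T and '-' are accepted")
    return sequences

def count_matrix(sequences):
    """Count each letter at each position of the aligned sequences."""
    length = len(sequences[0])
    counts = {letter: [0] * length for letter in "ACGT"}
    for sequence in sequences:
        for j, character in enumerate(sequence):
            if character != "-":  # assumption: gaps are not counted
                counts[character][j] += 1
    return counts

example = """>Sequence 1
ACGTAGTCGTAGGTGCATCGTTGA
>Sequence 2
CTAGCTACGATCGCTACGACGCAT
"""
sequences = read_fasta_sequences(example)
counts = count_matrix(sequences)
print(len(sequences), len(counts["A"]))
```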

Diagnostics:

If the message "Incorrect matrix format" is printed, the inserted data do not meet the requirements listed above.
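
The format recognition and this diagnostic can be pictured with the sketch below, which looks only at how the input begins. It is an illustration, not the program's actual logic: the name detect_format is hypothetical, and telling a FASTA matrix from FASTA sequences by checking for four numeric rows is an assumption.

```python
import re

def detect_format(text):
    """Guess the input format from the first non-empty line."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("Incorrect matrix format")
    first = lines[0]
    if re.match(r"[Aa]\s*:", first):
        return "PromScan"
    if re.match(r"[Aa]\s*\|", first):
        return "Patser"
    if re.match(r"(ID|id)\b", first):
        return "Transfac"
    if first.startswith(">"):
        body = lines[1:]
        # Assumption: four purely numeric rows mean a matrix, not sequences.
        if len(body) == 4 and all(re.fullmatch(r"[0-9.\s]+", row) for row in body):
            return "FASTA"
        return "Sequences"
    raise ValueError("Incorrect matrix format")

print(detect_format("A| 1 0\nC| 0 1\nG| 0 0\nT| 0 0"))
print(detect_format(">Sequence 1\nACGT"))
```

Input that matches none of the expected beginnings raises an error carrying the same message as the diagnostic above.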


