Distances: my personal experience

These are just a few suggestions based on my own experience. For a more formal and detailed treatment, consult Wikipedia or a mathematics textbook.
For a very rich directory on string similarity go to this link

Quantitative Data Arrays

For quantitative data (e.g. transcription signals from microarray experiments) you can use

Binary Arrays

Hamming distance is, in my opinion, a poor choice when the strings being compared have a very unequal 1/0 content and the 1s and 0s encode set membership (as in the examples above).
E.g., consider these strings, and the choice made by Hamming and Jaccard:

Mapping the first problem to interactions, let's say:
A1 interacts with B, C
A2 interacts with B, D
A3 interacts with B, C, E, F, G
Is the neighborhood of A1 more similar to that of A2 (Hamming's choice) or to that of A3 (Jaccard's choice)?
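
To make the comparison concrete, here is a minimal Python sketch that encodes the three neighborhoods as binary strings over the partners B to G and computes both distances. The function names and the ordering of the partners are my own choices for illustration; Jaccard distance is taken here as 1 minus the ratio of shared 1s to positions where at least one string has a 1.

```python
def hamming(a, b):
    """Number of positions at which two equal-length binary strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def jaccard_distance(a, b):
    """1 - |intersection| / |union|, treating each '1' as set membership."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    both = sum(x == "1" and y == "1" for x, y in zip(a, b))
    either = sum(x == "1" or y == "1" for x, y in zip(a, b))
    if either == 0:
        return 0.0  # two empty sets: treated as identical (a convention)
    return 1.0 - both / either


# Assumed column order for the encoding (illustrative only).
partners = ["B", "C", "D", "E", "F", "G"]
neighborhoods = {
    "A1": {"B", "C"},
    "A2": {"B", "D"},
    "A3": {"B", "C", "E", "F", "G"},
}


def encode(neighbors):
    """Binary membership string over the fixed partner order."""
    return "".join("1" if p in neighbors else "0" for p in partners)


vectors = {name: encode(n) for name, n in neighborhoods.items()}
# vectors == {"A1": "110000", "A2": "101000", "A3": "110111"}

for other in ("A2", "A3"):
    print(
        "A1 vs", other,
        "Hamming =", hamming(vectors["A1"], vectors[other]),
        "Jaccard =", round(jaccard_distance(vectors["A1"], vectors[other]), 3),
    )
# A1 vs A2 Hamming = 2 Jaccard = 0.667
# A1 vs A3 Hamming = 3 Jaccard = 0.6
```

Hamming puts A1 closer to A2 (2 mismatches against 3), while Jaccard puts A1 closer to A3 (0.6 against 0.667), because Jaccard ignores the positions where both strings are 0.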

In addition, neither of these two distances is well suited when some array positions are noisy while others are information-rich, as in the following fictional example:
110 010 001
110 100 000
110 000 110
000 101 100
001 101 011
000 111 000
Clearly, the 1st and 2nd positions are always co-conserved, and the 4th and 6th nearly always (they differ only in the second string); the co-conservation of the other positions is irregular. We may therefore think that the co-conserved positions hold more information than the others.
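
One crude way to check which positions move together is to count, for every pair of positions, the fraction of strings in which the two bits agree. The Python sketch below does this for the fictional strings above. The function names and the 0.8 threshold are hypothetical choices of mine, and this is only an illustration, not a weighting scheme: for instance, two positions that are constant across all strings would also score 1.0 while carrying no information.

```python
from itertools import combinations

# The fictional strings from the example, with the spaces removed.
strings = [
    "110010001",
    "110100000",
    "110000110",
    "000101100",
    "001101011",
    "000111000",
]


def agreement(rows, i, j):
    """Fraction of rows in which positions i and j (0-based) hold the same bit."""
    return sum(row[i] == row[j] for row in rows) / len(rows)


def pairwise_agreement(rows):
    """Agreement fraction for every unordered pair of positions."""
    n = len(rows[0])
    return {(i, j): agreement(rows, i, j) for i, j in combinations(range(n), 2)}


scores = pairwise_agreement(strings)

# List the pairs that agree in at least 80% of the strings (arbitrary cutoff),
# reported with 1-based positions to match the text.
for (i, j), score in sorted(scores.items(), key=lambda kv: -kv[1]):
    if score >= 0.8:
        print(f"positions {i + 1} and {j + 1}: agreement {score:.2f}")
```

Pairs with high agreement could then be given more weight in a distance, while irregular positions are down-weighted; how exactly to do that is left open here.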

DanieleMerico/HowtoDirectory/Distances (last edited 2009-07-07 00:47:58 by localhost)
