Protein sequence comparison


Motivation
Inexact matching
Algorithms

Biological Sequence Comparison

The lecture introduces the problem of comparing biological sequences as strings of letters allowing insertions, deletions and substitutions between them. We will consider the reason of such assumptions, called gene mutation. Also we will consider several advanced algorithms for comparing biological sequences.

I(i,b[j]) or (i,b[j])
D(a[i],-) or (a[i],-)


Notice, that the problem is symmetric: a deletion in A can be seen as an insertion in B, and vice versa.

D(A,B) = minT(A,B){W(T(A,B))}



Editing distance D(A,B) is a metric. It has the following characteristics:
  1. D(A,B) = 0 => A = B
  2. D(A,B) = D(B,A)
  3. for each C, D(A,B) <= D(A,C) + D(C,B)