Edit distance:
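Edit distance, also known as the Levenshtein distance, is the minimum number of edit operations needed to transform one string into the other. The larger the distance, the more the two strings differ. The permitted edit operations are replacing one character with another, inserting a character, and deleting a character.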
For a step-by-step walkthrough of how the computation proceeds, see the referenced blog post.
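The matrix-filling loop in the code below can be summarized by the standard Levenshtein recurrence. The notation here is ours, not the original post's: D(i, j) is the edit distance between the first i characters of s and the first j characters of t, where s and t stand for the code's `sm` and `sn`, and s_i is the i-th character of s (`sm[i-1]` in the code). For example, "kitten" becomes "sitting" by substituting k→s, substituting e→i, and inserting g, so D = 3.

$$
D(i,0) = i, \qquad D(0,j) = j
$$

$$
D(i,j) = \min\begin{cases}
D(i-1,\,j) + 1 & \text{(delete } s_i\text{)}\\
D(i,\,j-1) + 1 & \text{(insert } t_j\text{)}\\
D(i-1,\,j-1) + [\,s_i \neq t_j\,] & \text{(substitute, or match at no cost)}
\end{cases}
$$

The bracket $[\,s_i \neq t_j\,]$ is 1 when the characters differ and 0 when they match, which corresponds to the `cost` variable in the code.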
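```python
def minEditDist(sm, sn):
    m, n = len(sm) + 1, len(sn) + 1

    # create an m x n matrix; matrix[i][j] holds the edit distance
    # between the first i characters of sm and the first j characters of sn
    matrix = [[0] * n for i in range(m)]

    # initialize the matrix: turning a prefix into the empty string
    # (or the reverse) takes one deletion/insertion per character
    matrix[0][0] = 0
    for i in range(1, m):
        matrix[i][0] = matrix[i - 1][0] + 1
    for j in range(1, n):
        matrix[0][j] = matrix[0][j - 1] + 1

    # print the initialized matrix
    for i in range(m):
        print(matrix[i])
    print("********************")

    cost = 0
    for i in range(1, m):
        for j in range(1, n):
            # substituting a character costs nothing if it already matches
            if sm[i - 1] == sn[j - 1]:
                cost = 0
            else:
                cost = 1
            # best of: deletion, insertion, substitution/match
            matrix[i][j] = min(matrix[i - 1][j] + 1,
                               matrix[i][j - 1] + 1,
                               matrix[i - 1][j - 1] + cost)

    # print the filled matrix
    for i in range(m):
        print(matrix[i])
    return matrix[m - 1][n - 1]


bit1 = '11010011010101011001011111011111001110111110111110000000001000100001100100001010011000111010000100101010011000110101110011111000'
bit2 = '00111100011111100110100111111100111101111110111111001000001001001110011000011001100100010100110011010111001111100010101011110011'

if __name__ == "__main__":
    mindist = minEditDist(bit1, bit2)
    print(mindist)
    len1 = len(bit1)
    len2 = len(bit2)
    # normalize by the longer length so the similarity lies in [0, 1]
    print('the similarity is ', 1 - mindist / max(len1, len2))
```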
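The listing stores the full (len(sm)+1) by (len(sn)+1) matrix, which is convenient for printing it but needs memory proportional to the product of the two lengths. Because each row depends only on the row directly above it, a common space-saving variant keeps just two rows. The sketch below is that swapped-in variant, not the original post's code, and the function name `min_edit_dist_two_rows` is hypothetical.

```python
def min_edit_dist_two_rows(sm: str, sn: str) -> int:
    """Levenshtein distance keeping only the previous and current DP rows."""
    # row 0: turning "" into sn[:j] takes j insertions
    prev = list(range(len(sn) + 1))
    for i in range(1, len(sm) + 1):
        # column 0: turning sm[:i] into "" takes i deletions
        curr = [i] + [0] * len(sn)
        for j in range(1, len(sn) + 1):
            cost = 0 if sm[i - 1] == sn[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # delete sm[i-1]
                          curr[j - 1] + 1,     # insert sn[j-1]
                          prev[j - 1] + cost)  # substitute or match
        prev = curr
    return prev[len(sn)]


if __name__ == "__main__":
    print(min_edit_dist_two_rows("kitten", "sitting"))  # prints 3
```

The result is the same as the full-matrix version; the trade-off is that the intermediate matrix can no longer be printed.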
Finally, the program prints the similarity of the two sequences, computed as 1 - mindist / max(len1, len2).
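The normalization works because the edit distance never exceeds the length of the longer string (substitute every position of the shorter string, then insert or delete the rest), so the score stays between 0 and 1. The small helper below isolates that step. Its name `similarity_from_distance` and the choice to return 1.0 for two empty strings are our own assumptions; the expression in the original main block would raise `ZeroDivisionError` if both strings were empty.

```python
def similarity_from_distance(dist: int, len1: int, len2: int) -> float:
    """Turn an edit distance into a similarity score in [0, 1]."""
    longest = max(len1, len2)
    if longest == 0:
        # assumption: two empty strings are treated as identical
        return 1.0
    return 1 - dist / longest


# "kitten" vs "sitting": distance 3, longer string has 7 characters
print(similarity_from_distance(3, 6, 7))  # 1 - 3/7, about 0.571
```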