欢迎关注,敬请点赞!
浓缩就是精华——python实现输入法自动提示
附件地址https://gitee.com/wenlong850606/spelling_check_python
参考https://blog.csdn.net/u013830811/article/details/46539919会开花的树-《21行Python写出拼写检查器》
封装函数
import re
import collections
def words(text):
'''
该函数将文本中的单词转化为小写,拆分,存到列表
don't 会被拆分为don 和 t
'''
return re.findall('[a-z]+', text.lower())
def train(features):
'''
该函数统计每个单词出现的次数,对于没有在训练集中出现的单词,采用拉普拉斯平滑,默认它们出现了一次。
'''
model = collections.defaultdict(lambda : 1)
for f in features:
model[f] += 1
return model
正确输入函数
返回顶部
字符输入正确,且在训练集里面的情形,概率最高,放在短路原则最前面
def known(words):
return set(w for w in words if w in NWORDS)
“一次编辑”函数
精华部分一:
- 只对单词中的一个字符进行(增、删、改、换)处理
- 切割,对第2部分进行增、删、改、换操作
alphabet = 'abcdefghijklmnopqrstuvwxyz'
def edits_once(word):
'''
一次编辑:增加1个字母、删除1个字母、修改1个字母、相邻字母调换1次
'''
splits = [(word[:i], word[i:]) for i in range(len(word) + 1)] # 切割,对第2部分进行增、删、改、换操作
inserts = [a + c + b for a, b in splits for c in alphabet] # 增<-->a和b中插入一个c
deletes = [a + b[1:] for a, b in splits if b] # 删<-->b[0]
replaces = [a + c + b[1:] for a, b in splits for c in alphabet if b] # 改<-->b[0]换成c
transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1] # 换<-->b[1]和b[0]互换
return set(inserts + deletes + replaces + transposes)
“一次编辑”举例:
返回顶部
对单词’word’进行一次处理
如果输入错误,只差一个(增、删、改、换)的概率次于输入正确的概率,短路原则第二
splits = [('word'[:i], 'word'[i:]) for i in range(len('word') + 1)]
splits
[('', 'word'), ('w', 'ord'), ('wo', 'rd'), ('wor', 'd'), ('word', '')]
edits_once('word')
结果
{'aord',
'aword',
'bord',
'bword',
'cord',
'cword',
'dord',
'dword',
'eord',
'eword',
'ford',
'fword',
'gord',
'gword',
'hord',
'hword',
'iord',
'iword',
'jord',
'jword',
'kord',
'kword',
'lord',
'lword',
'mord',
'mword',
'nord',
'nword',
'oord',
'ord',
'oword',
'owrd',
'pord',
'pword',
'qord',
'qword',
'rord',
'rword',
'sord',
'sword',
'tord',
'tword',
'uord',
'uword',
'vord',
'vword',
'waord',
'ward',
'wbord',
'wbrd',
'wcord',
'wcrd',
'wdord',
'wdrd',
'weord',
'werd',
'wford',
'wfrd',
'wgord',
'wgrd',
'whord',
'whrd',
'wiord',
'wird',
'wjord',
'wjrd',
'wkord',
'wkrd',
'wlord',
'wlrd',
'wmord',
'wmrd',
'wnord',
'wnrd',
'woad',
'woard',
'wobd',
'wobrd',
'wocd',
'wocrd',
'wod',
'wodd',
'wodr',
'wodrd',
'woed',
'woerd',
'wofd',
'wofrd',
'wogd',
'wogrd',
'wohd',
'wohrd',
'woid',
'woird',
'wojd',
'wojrd',
'wokd',
'wokrd',
'wold',
'wolrd',
'womd',
'womrd',
'wond',
'wonrd',
'wood',
'woord',
'wopd',
'woprd',
'woqd',
'woqrd',
'wor',
'wora',
'worad',
'worb',
'worbd',
'worc',
'worcd',
'word',
'worda',
'wordb',
'wordc',
'wordd',
'worde',
'wordf',
'wordg',
'wordh',
'wordi',
'wordj',
'wordk',
'wordl',
'wordm',
'wordn',
'wordo',
'wordp',
'wordq',
'wordr',
'words',
'wordt',
'wordu',
'wordv',
'wordw',
'wordx',
'wordy',
'wordz',
'wore',
'wored',
'worf',
'worfd',
'worg',
'worgd',
'worh',
'worhd',
'wori',
'worid',
'worj',
'worjd',
'work',
'workd',
'worl',
'world',
'worm',
'wormd',
'worn',
'wornd',
'woro',
'worod',
'worp',
'worpd',
'worq',
'worqd',
'worr',
'worrd',
'wors',
'worsd',
'wort',
'wortd',
'woru',
'worud',
'worv',
'worvd',
'worw',
'worwd',
'worx',
'worxd',
'wory',
'woryd',
'worz',
'worzd',
'wosd',
'wosrd',
'wotd',
'wotrd',
'woud',
'wourd',
'wovd',
'wovrd',
'wowd',
'wowrd',
'woxd',
'woxrd',
'woyd',
'woyrd',
'wozd',
'wozrd',
'wpord',
'wprd',
'wqord',
'wqrd',
'wrd',
'wrod',
'wrord',
'wrrd',
'wsord',
'wsrd',
'wtord',
'wtrd',
'wuord',
'wurd',
'wvord',
'wvrd',
'wword',
'wwrd',
'wxord',
'wxrd',
'wyord',
'wyrd',
'wzord',
'wzrd',
'xord',
'xword',
'yord',
'yword',
'zord',
'zword'}
验证
print('ord' in edits_once('word'))
True
“两次编辑”函数
返回顶部
能出现在训练集中的情形,概率不高,放在短路原则靠后
def known_edits_twice(word):
return set(e2 for e1 in edits_once(word) for e2 in edits_once(e1) if e2 in NWORDS)
“两次编辑”举例
known_edits_twice('word')
结果
{'award',
'bird',
'board',
'bold',
'bond',
'bore',
'bored',
'born',
'bory',
'card',
'chord',
'cod',
'cold',
'cond',
'cor',
'cord',
'cords',
'core',
'cork',
'corn',
'dodd',
'dora',
'dorr',
'fold',
'fond',
'food',
'for',
'ford',
'fords',
'fore',
'fork',
'form',
'fort',
'gird',
'god',
'gold',
'good',
'gory',
'hard',
'herd',
'hoard',
'hold',
'hood',
'horde',
'horn',
'hors',
'load',
'lord',
'lords',
'lore',
'loud',
'mold',
'mood',
'more',
'mori',
'mort',
'nod',
'nor',
'norm',
'odd',
'old',
'or',
'orb',
'ore',
'org',
'owed',
'pond',
'pork',
'port',
'rd',
'road',
'rod',
'sird',
'sold',
'sore',
'sort',
'sword',
'swords',
'swore',
'sworn',
'told',
'tore',
'torn',
'tory',
'trod',
'void',
'wad',
'wand',
'war',
'ward',
'wards',
'ware',
'warm',
'warn',
'warp',
'wars',
'wart',
'wary',
'wead',
'wed',
'weed',
'weird',
'weld',
'were',
'wert',
'wid',
'wild',
'wind',
'wire',
'wired',
'wirt',
'wiry',
'wo',
'woe',
'woes',
'woke',
'wolf',
'womb',
'won',
'wont',
'wood',
'woods',
'woof',
'wool',
'woot',
'word',
'worded',
'words',
'wordy',
'wore',
'work',
'worked',
'works',
'world',
'worlds',
'worm',
'worms',
'worn',
'worry',
'worse',
'worst',
'worth',
'would',
'wound',
'wove',
'wry',
'yard',
'yore',
'york'}
主流程
第一步,由《big.txt》生成训练集
统计各个单词出现的频次
NWORDS = train(words(open('big.txt').read()))
NWORDS
频次字典
defaultdict(<function __main__.train.<locals>.<lambda>()>,
{'the': 80031,
'project': 289,
'gutenberg': 264,
'ebook': 88,
'of': 40026,
'adventures': 18,
'sherlock': 102,
'holmes': 468,
'by': 6739,
'sir': 178,
'arthur': 35,
'conan': 5,
'doyle': 6,
'in': 22048,
'our': 1067,
'series': 129,
'copyright': 70,
'laws': 234,
'are': 3631,
'changing': 45,
'all': 4145,
'over': 1283,
'world': 363,
'be': 6156,
'sure': 124,
'to': 28767,
'check': 39,
'for': 6940,
'your': 1280,
'country': 424,
'before': 1364,
'downloading': 6,
'or': 5353,
'redistributing': 8,
'this': 4064,
'any': 1205,
'other': 1503,
'header': 8,
'should': 1298,
'first': 1178,
'thing': 304,
'seen': 445,
'when': 2924,
'viewing': 8,
'file': 22,
'please': 173,
'do': 1504,
'not': 6626,
'remove': 54,
'it': 10682,
'change': 151,
'edit': 5,
'without': 1016,
'written': 118,
'permission': 53,
'read': 219,
'legal': 53,
'small': 528,
'print': 48,
'and': 38313,
'information': 74,
'about': 1498,
'at': 6792,
'bottom': 43,
'included': 44,
'is': 9775,
'important': 286,
'specific': 38,
'rights': 169,
'restrictions': 24,
'how': 1316,
'may': 2552,
'used': 277,
'you': 5623,
'can': 1096,
'also': 779,
'find': 295,
'out': 1988,
'make': 505,
'a': 21156,
'donation': 11,
'get': 469,
'involved': 108,
'welcome': 19,
'free': 422,
'plain': 109,
'vanilla': 7,
'electronic': 59,
'texts': 8,
'ebooks': 55,
'readable': 14,
'both': 530,
'humans': 3,
'computers': 8,
'since': 261,
'these': 1232,
'were': 4290,
'prepared': 139,
'thousands': 94,
'volunteers': 23,
'title': 40,
'author': 30,
'release': 29,
'date': 49,
'march': 136,
'most': 909,
'recently': 31,
'updated': 5,
'november': 42,
'edition': 22,
'language': 62,
'english': 212,
'character': 175,
'set': 325,
'encoding': 6,
'ascii': 12,
'start': 68,
'additional': 31,
'editing': 7,
'jose': 2,
'menendez': 2,
'contents': 51,
'i': 7683,
'scandal': 20,
'bohemia': 16,
'ii': 78,
'red': 289,
'headed': 38,
'league': 54,
'iii': 92,
'case': 439,
'identity': 12,
'iv': 56,
'boscombe': 17,
'valley': 79,
'mystery': 40,
'v': 52,
'five': 280,
'orange': 24,
'pips': 13,
'vi': 38,
'man': 1653,
'with': 9741,
'twisted': 22,
'lip': 57,
'vii': 35,
'adventure': 35,
'blue': 144,
'carbuncle': 18,
'viii': 40,
'speckled': 6,
'band': 55,
'ix': 29,
'engineer': 13,
's': 5632,
'thumb': 52,
'x': 137,
'noble': 49,
'bachelor': 19,
'xi': 29,
'beryl': 5,
'coronet': 30,
'xii': 29,
'copper': 27,
'beeches': 13,
'she': 3947,
'always': 609,
'woman': 326,
'have': 3494,
'seldom': 77,
'heard': 637,
'him': 5231,
'mention': 47,
'her': 5285,
'under': 964,
'name': 263,
'his': 10035,
'eyes': 940,
'eclipses': 3,
'predominates': 4,
'whole': 745,
'sex': 12,
'was': 11411,
'that': 12513,
'he': 12402,
'felt': 698,
'emotion': 37,
'akin': 15,
'love': 485,
'irene': 19,
'adler': 17,
'emotions': 11,
'one': 3372,
'particularly': 175,
'abhorrent': 2,
'cold': 258,
'precise': 14,
'but': 5654,
'admirably': 8,
'balanced': 7,
'mind': 342,
'take': 617,
'perfect': 40,
'reasoning': 42,
'observing': 22,
'machine': 40,
'has': 1604,
'as': 8065,
'lover': 27,
'would': 1954,
'placed': 183,
'himself': 1159,
'false': 65,
'position': 433,
'never': 594,
'spoke': 219,
'softer': 11,
'passions': 30,
'save': 111,
'gibe': 3,
'sneer': 7,
'they': 3939,
'admirable': 15,
'things': 322,
'observer': 14,
'excellent': 63,
'drawing': 241,
'veil': 17,
'from': 5710,
'men': 1146,
'motives': 15,
'actions': 78,
'trained': 24,
'reasoner': 7,
'admit': 66,
'such': 1437,
'intrusions': 2,
'into': 2125,
'own': 786,
'delicate': 55,
'finely': 12,
'adjusted': 17,
'temperament': 6,
'introduce': 24,
'distracting': 2,
'factor': 42,
'which': 4843,
'might': 537,
'throw': 49,
'doubt': 153,
'upon': 1112,
'mental': 38,
'results': 230,
'grit': 2,
'sensitive': 36,
'instrument': 36,
'crack': 21,
'high': 291,
'power': 549,
'lenses': 2,
'more': 1998,
'disturbing': 10,
'than': 1207,
'strong': 169,
'nature': 171,
'yet': 489,
'there': 2973,
'late': 166,
'dubious': 2,
'questionable': 4,
'memory': 56,
'had': 7384,
'little': 1002,
'lately': 23,
'my': 2250,
'marriage': 97,
'drifted': 6,
'us': 685,
'away': 839,
'each': 412,
'complete': 146,
'happiness': 144,
'home': 296,
'centred': 3,
'interests': 119,
'rise': 241,
'up': 2285,
'around': 272,
'who': 3051,
'finds': 24,
'master': 142,
'establishment': 41,
'sufficient': 76,
'absorb': 5,
'attention': 192,
'while': 769,
'loathed': 2,
'every': 651,
'form': 508,
'society': 170,
'bohemian': 9,
'soul': 169,
'remained': 232,
'lodgings': 12,
'baker': 50,
'street': 181,
'buried': 22,
'among': 452,
'old': 1181,
'books': 60,
'alternating': 3,
'week': 96,
'between': 655,
'cocaine': 5,
'ambition': 14,
'drowsiness': 5,
'drug': 22,
'fierce': 13,
'energy': 46,
'keen': 33,
'still': 923,
'ever': 275,
'deeply': 78,
'attracted': 37,
'study': 145,
'crime': 62,
'occupied': 117,
'immense': 78,
'faculties': 9,
'extraordinary': 75,
'powers': 150,
'observation': 40,
'following': 209,
'those': 1202,
'clues': 4,
'clearing': 30,
'mysteries': 10,
'been': 2600,
'abandoned': 73,
'hopeless': 18,
'official': 92,
'police': 95,
'time': 1530,
'some': 1537,
'vague': 40,
'account': 178,
'doings': 12,
'summons': 12,
'odessa': 4,
'trepoff': 2,
'murder': 31,
'singular': 37,
'tragedy': 10,
'atkinson': 2,
'brothers': 51,
'trincomalee': 2,
'finally': 157,
'mission': 35,
'accomplished': 40,
'so': 3018,
'delicately': 4,
'successfully': 26,
'reigning': 4,
'family': 211,
'holland': 13,
'beyond': 226,
'signs': 99,
'activity': 132,
'however': 431,
'merely': 190,
'shared': 26,
'readers': 12,
'daily': 45,
'press': 82,
'knew': 497,
'former': 178,
'friend': 284,
'companion': 82,
'night': 386,
'on': 6644,
'twentieth': 20,
'returning': 69,
'journey': 70,
'patient': 384,
'now': 1698,
'returned': 195,
'civil': 178,
'practice': 96,
'way': 860,
'led': 197,
'me': 1921,
'through': 816,
'passed': 368,
'well': 1199,
'remembered': 121,
'door': 499,
'must': 956,
'associated': 197,
'wooing': 3,
'dark': 182,
'incidents': 15,
'scarlet': 23,
'seized': 115,
'desire': 97,
'see': 1102,
'again': 867,
'know': 1049,
'employing': 8,
'rooms': 87,
'brilliantly': 6,
'lit': 75,
'even': 947,
'looked': 761,
'saw': 600,
'tall': 75,
'spare': 28,
'figure': 104,
'pass': 155,
'twice': 85,
'silhouette': 2,
'against': 661,
'blind': 24,
'pacing': 27,
'room': 961,
'swiftly': 39,
'eagerly': 40,
'head': 726,
'sunk': 28,
'chest': 82,
'hands': 456,
'clasped': 12,
'behind': 402,
'mood': 52,
'habit': 56,
'attitude': 73,
'manner': 136,
'told': 491,
'their': 2956,
'story': 134,
'work': 383,
'risen': 31,
'created': 63,
'dreams': 17,
'hot': 120,
'scent': 18,
'new': 1212,
'problem': 77,
'rang': 30,
'bell': 66,
'shown': 114,
'chamber': 36,
'formerly': 78,
'part': 705,
'effusive': 3,
'glad': 151,
'think': 558,
'hardly': 174,
'word': 299,
'spoken': 93,
'kindly': 87,
'eye': 111,
'waved': 30,
'an': 3424,
'armchair': 50,
'threw': 97,
'across': 223,
'cigars': 8,
'indicated': 89,
'spirit': 168,
'gasogene': 2,
'corner': 129,
'then': 1559,
'stood': 384,
'fire': 275,
'introspective': 4,
'fashion': 50,
'wedlock': 2,
'suits': 9,
'remarked': 170,
'watson': 84,
'put': 436,
'seven': 133,
'half': 319,
'pounds': 27,
'answered': 227,
'indeed': 140,
'thought': 903,
'just': 768,
'trifle': 12,
'fancy': 51,
'observe': 38,
'did': 1876,
'tell': 493,
'intended': 59,
'go': 906,
'harness': 28,
'deduce': 15,
'getting': 93,
'yourself': 163,
'very': 1341,
'wet': 61,
'clumsy': 9,
'careless': 15,
'servant': 47,
'girl': 167,
'dear': 450,
'said': 3465,
'too': 549,
'much': 672,
'certainly': 120,
'burned': 78,
'lived': 114,
'few': 459,
'centuries': 13,
'ago': 109,
'true': 206,
'walk': 76,
'thursday': 8,
'came': 980,
'dreadful': 69,
'mess': 11,
'changed': 135,
'clothes': 63,
't': 1319,
'imagine': 97,
'mary': 706,
'jane': 3,
'incorrigible': 3,
'wife': 368,
'given': 365,
'notice': 99,
'fail': 41,
'chuckled': 8,
'rubbed': 33,
'long': 992,
'nervous': 55,
'together': 261,
'simplicity': 31,
'itself': 274,
'inside': 44,
'left': 835,
'shoe': 12,
'where': 978,
'firelight': 3,
'strikes': 20,
'leather': 36,
'scored': 5,
'six': 177,
'almost': 326,
'parallel': 18,
'cuts': 6,
'obviously': 39,
'caused': 103,
'someone': 161,
'carelessly': 15,
'scraped': 22,
'round': 557,
'edges': 71,
'sole': 71,
'order': 405,
'crusted': 3,
'mud': 37,
'hence': 33,
'double': 50,
'deduction': 13,
'vile': 17,
'weather': 43,
'malignant': 89,
'boot': 23,
'slitting': 3,
'specimen': 15,
'london': 77,
'slavey': 2,
'if': 2373,
'gentleman': 100,
'walks': 11,
'smelling': 6,
'iodoform': 44,
'black': 236,
'mark': 39,
'nitrate': 8,
'silver': 129,
'right': 711,
'forefinger': 8,
'bulge': 3,
'side': 512,
'top': 43,
'hat': 106,
'show': 214,
'secreted': 3,
'stethoscope': 3,
'dull': 75,
'pronounce': 10,
'active': 97,
'member': 51,
'medical': 23,
'profession': 23,
'could': 1701,
'help': 231,
'laughing': 116,
'ease': 45,
'explained': 61,
'process': 220,
'hear': 184,
'give': 524,
'reasons': 65,
'appears': 109,
'ridiculously': 2,
'simple': 140,
'easily': 115,
'myself': 228,
'though': 651,
'successive': 18,
'instance': 51,
'am': 747,
'baffled': 9,
'until': 326,
'explain': 124,
'believe': 184,
'good': 745,
'yours': 47,
'quite': 503,
'lighting': 17,
'cigarette': 7,
'throwing': 47,
'down': 1129,
'distinction': 20,
'clear': 234,
'example': 287,
'frequently': 219,
'steps': 189,
'lead': 138,
'hall': 84,
'often': 444,
'hundreds': 49,
'times': 237,
'many': 610,
'don': 582,
'observed': 132,
'point': 224,
'seventeen': 11,
'because': 631,
'interested': 66,
'problems': 79,
'enough': 176,
'chronicle': 8,
'two': 1139,
'trifling': 13,
'experiences': 12,
'sheet': 30,
'thick': 78,
'pink': 28,
'tinted': 10,
'notepaper': 3,
'lying': 119,
'open': 326,
'table': 297,
'last': 566,
'post': 118,
'aloud': 29,
'note': 116,
'undated': 2,
'either': 294,
'signature': 10,
'address': 77,
'will': 1578,
'call': 198,
'quarter': 47,
'eight': 129,
'o': 258,
'clock': 121,
'desires': 23,
'consult': 20,
'matter': 366,
'deepest': 16,
'moment': 488,
'recent': 55,
'services': 39,
'royal': 112,
'houses': 118,
'europe': 154,
'safely': 12,
'trusted': 17,
'matters': 137,
'importance': 118,
'exaggerated': 29,
'we': 1907,
'quarters': 73,
'received': 281,
'hour': 158,
'amiss': 7,
'visitor': 75,
'wear': 31,
'mask': 13,
'what': 3012,
'means': 254,
'no': 2349,
'data': 18,
'capital': 145,
'mistake': 40,
'theorise': 2,
'insensibly': 3,
'begins': 48,
'twist': 15,
'facts': 73,
'suit': 26,
'theories': 22,
'instead': 138,
'carefully': 73,
'examined': 50,
'writing': 70,
'paper': 178,
'wrote': 150,
'presumably': 9,
'endeavouring': 9,
'imitate': 8,
'processes': 36,
'bought': 56,
'crown': 62,
'packet': 12,
'peculiarly': 15,
'stiff': 21,
'peculiar': 85,
'hold': 115,
'light': 279,
'large': 484,
'e': 137,
'g': 56,
'p': 67,
'woven': 6,
'texture': 7,
'asked': 778,
'maker': 5,
'monogram': 5,
'rather': 220,
'stands': 20,
'gesellschaft': 2,
'german': 197,
'company': 193,
'customary': 20,
'contraction': 62,
'like': 1081,
'co': 31,
'course': 390,
'papier': 2,
'eg': 2,
'let': 507,
'glance': 92,
'continental': 47,
'gazetteer': 2,
'took': 574,
'heavy': 140,
'brown': 72,
'volume': 31,
'shelves': 4,
'eglow': 2,
'eglonitz': 2,
'here': 692,
'egria': 2,
'speaking': 186,
'far': 409,
'carlsbad': 2,
'remarkable': 78,
'being': 919,
'scene': 50,
'death': 331,
'wallenstein': 2,
'its': 1636,
'numerous': 51,
'glass': 117,
'factories': 30,
'mills': 40,
'ha': 76,
'boy': 170,
'sparkled': 6,
'sent': 320,
'great': 793,
'triumphant': 17,
'cloud': 31,
'made': 1008,
'precisely': 25,
'construction': 26,
'sentence': 27,
'frenchman': 103,
'russian': 462,
'uncourteous': 2,
'verbs': 2,
'only': 1874,
'remains': 74,
'therefore': 187,
'discover': 29,
'wanted': 214,
'writes': 21,
'prefers': 3,
'wearing': 88,
'showing': 105,
'face': 1126,
'comes': 92,
'mistaken': 60,
'resolve': 15,
'doubts': 40,
'sharp': 84,
'sound': 220,
'horses': 263,
'hoofs': 25,
'grating': 11,
'wheels': 48,
'curb': 5,
'followed': 330,
'pull': 24,
'whistled': 14,
'pair': 41,
'yes': 689,
'continued': 292,
'glancing': 99,
'window': 187,
'nice': 54,
'brougham': 5,
'beauties': 3,
'hundred': 230,
'fifty': 95,
'guineas': 4,
'apiece': 8,
'money': 327,
'nothing': 647,
'else': 202,
'better': 267,
'bit': 64,
'doctor': 184,
'stay': 75,
'lost': 225,
'boswell': 2,
'promises': 16,
'interesting': 72,
'pity': 76,
'miss': 113,
'client': 34,
'want': 324,
'sit': 90,
'best': 269,
'slow': 66,
'step': 140,
'stairs': 32,
'passage': 111,
'paused': 80,
'immediately': 183,
'outside': 111,
'loud': 65,
'authoritative': 3,
'tap': 11,
'come': 935,
'entered': 283,
'less': 368,
'feet': 180,
'inches': 17,
'height': 37,
'limbs': 68,
'hercules': 5,
'dress': 139,
'rich': 93,
'richness': 3,
'england': 312,
'bad': 156,
'taste': 24,
'bands': 28,
'astrakhan': 2,
'slashed': 4,
'sleeves': 31,
'fronts': 2,
'breasted': 2,
'coat': 173,
'deep': 216,
'cloak': 63,
'thrown': 93,
'shoulders': 126,
'lined': 33,
'flame': 16,
'coloured': 22,
'silk': 51,
'secured': 49,
'neck': 204,
'brooch': 2,
'consisted': 39,
'single': 174,
'flaming': 9,
'boots': 92,
'extended': 76,
'halfway': 20,
'calves': 4,
'trimmed': 9,
'tops': 4,
'fur': 39,
'completed': 26,
'impression': 68,
'barbaric': 3,
'opulence': 4,
'suggested': 70,
'appearance': 136,
'carried': 283,
'broad': 93,
'brimmed': 5,
'hand': 835,
'wore': 59,
'upper': 131,
'extending': 36,
'past': 224,
'cheekbones': 5,
'vizard': 2,
'apparently': 69,
'raised': 213,
'lower': 197,
'appeared': 198,
'hanging': 43,
'straight': 125,
'chin': 31,
'suggestive': 12,
'resolution': 58,
'pushed': 82,
'length': 64,
'obstinacy': 8,
'harsh': 23,
'voice': 463,
'strongly': 42,
'marked': 139,
'accent': 19,
'uncertain': 31,
'pray': 80,
'seat': 171,
'colleague': 8,
'dr': 49,
'occasionally': 90,
'cases': 454,
'whom': 490,
'honour': 17,
'count': 749,
'von': 12,
'kramm': 3,
'nobleman': 12,
'understand': 413,
'discretion': 14,
'trust': 69,
'extreme': 73,
'prefer': 22,
'communicate': 16,
'alone': 338,
'rose': 244,
'caught': 91,
'wrist': 69,
'back': 747,
'chair': 136,
'none': 111,
'say': 756,
'anything': 380,
'shrugged': 36,
'begin': 98,
'binding': 19,
'absolute': 57,
'secrecy': 19,
'years': 572,
'end': 466,
'present': 330,
'weight': 71,
'influence': 139,
'european': 100,
'history': 440,
'promise': 68,
'excuse': 54,
'strange': 221,
'august': 71,
'person': 186,
'employs': 3,
'wishes': 43,
'agent': 26,
'unknown': 88,
'confess': 37,
'once': 570,
'called': 451,
'exactly': 48,
'aware': 53,
'dryly': 6,
'circumstances': 108,
'delicacy': 12,
'precaution': 10,
'taken': 439,
'quench': 4,
'grow': 75,
'seriously': 64,
'compromise': 72,
'families': 46,
'speak': 256,
'plainly': 40,
'implicates': 6,
'house': 662,
'ormstein': 3,
'hereditary': 15,
'kings': 28,
'murmured': 19,
'settling': 17,
'closing': 36,
'glanced': 177,
...})
第二步:短路原则进行检查
精华部分二:概率思想(短路原则):
- 经过0次变换的(输入正确),概率最高
- 如果输入错误,只差一个(增、删、改、换)的概率次之
- 如果输入错误,差两个的概率更低
- 如果前面三种情况都没有出现,字符可能不在训练集中,“原样”输出
def correct(word):
candidates = known([word]) or known(edits_once(word)) or known_edits_twice(word) or [word]
return max(candidates, key = NWORDS.get)
测试
correct('WordS')
'words'
总结:
返回顶部
整个代码充分体现了python的灵活性和简洁的特点,到了“可读性差”的地步。
附录:
下面的代码可以将1到100的数中,能被3整除时打印fizz,能被5整除时打印buzz,既能被3整除又能被5整除时打印fizzbuzz,其它的数不打印,也充分体现了python“简单暴力”:
for i in range(1, 101):
print(f'{i}: ', 'fizz'[i % 3 * 4::] + 'buzz'[i % 5 * 4::])
结果
1:
2:
3: fizz
4:
5: buzz
6: fizz
7:
8:
9: fizz
10: buzz
11:
12: fizz
13:
14:
15: fizzbuzz
16:
17:
18: fizz
19:
20: buzz
21: fizz
22:
23:
24: fizz
25: buzz
26:
27: fizz
28:
29:
30: fizzbuzz
31:
32:
33: fizz
34:
35: buzz
36: fizz
37:
38:
39: fizz
40: buzz
41:
42: fizz
43:
44:
45: fizzbuzz
46:
47:
48: fizz
49:
50: buzz
51: fizz
52:
53:
54: fizz
55: buzz
56:
57: fizz
58:
59:
60: fizzbuzz
61:
62:
63: fizz
64:
65: buzz
66: fizz
67:
68:
69: fizz
70: buzz
71:
72: fizz
73:
74:
75: fizzbuzz
76:
77:
78: fizz
79:
80: buzz
81: fizz
82:
83:
84: fizz
85: buzz
86:
87: fizz
88:
89:
90: fizzbuzz
91:
92:
93: fizz
94:
95: buzz
96: fizz
97:
98:
99: fizz
100: buzz
欢迎关注,敬请点赞!
返回顶部