MySQL full-text indexes: stopwords

In this article, IT技术学习网 explains what stopwords are in MySQL full-text indexes.

stopword

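In a full-text index, a word that is considered too common or too trivial to be useful is ignored, both when the index is built and when a search query is processed. InnoDB and MyISAM each have their own set of settings that control their stopwords.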
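To make the effect concrete, here is a minimal sketch, assuming an InnoDB table with the default stopword settings. The table name `stopword_demo` and its rows are hypothetical and only serve as an illustration.

```sql
-- Hypothetical demo table with a FULLTEXT index on `title`.
CREATE TABLE stopword_demo (
  id    INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(200),
  FULLTEXT (title)
) ENGINE = InnoDB;

INSERT INTO stopword_demo (title) VALUES
  ('the database tutorial'),
  ('how to use the index');

-- 'the' is in the default InnoDB stopword list, so it is not indexed
-- and this search is expected to match no rows.
SELECT id, title FROM stopword_demo
WHERE MATCH(title) AGAINST('the' IN NATURAL LANGUAGE MODE);

-- 'database' is not a stopword, so this search is expected to match
-- the first row.
SELECT id, title FROM stopword_demo
WHERE MATCH(title) AGAINST('database' IN NATURAL LANGUAGE MODE);
```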

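During a full-text search, stopword lookups use the server character set and collation (the character_set_server and collation_server system variables). If the stopword list, or the columns being indexed or searched, use a different character set or collation, stopword lookups may produce false hits or misses.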

Whether stopword lookups are case-sensitive depends on the collation. For example, lookups are case-insensitive under latin1_swedish_ci, and case-sensitive under latin1_general_cs or latin1_bin.
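The difference between these collations can be checked with plain string comparisons. This is only a sketch of how the collations treat case; it does not run a full-text search.

```sql
-- Case-insensitive collation: 'The' and 'the' compare as equal (returns 1).
SELECT _latin1'The' COLLATE latin1_swedish_ci = _latin1'the' AS swedish_ci;

-- Case-sensitive collation: they compare as different (returns 0).
SELECT _latin1'The' COLLATE latin1_general_cs = _latin1'the' AS general_cs;

-- Binary collation compares byte values, so it also returns 0.
SELECT _latin1'The' COLLATE latin1_bin = _latin1'the' AS bin;
```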

Stopwords for InnoDB indexes

InnoDB's default stopword list is short. To see it, query the INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD table.

      mysql> SELECT * FROM INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD;
      +-------+
      | value |
      +-------+
      | a     |
      | about |
      | an    |
      | are   |
      | as    |
      | at    |
      | be    |
      | by    |
      | com   |
      | de    |
      | en    |
      | for   |
      | from  |
      | how   |
      | i     |
      | in    |
      | is    |
      | it    |
      | la    |
      | of    |
      | on    |
      | or    |
      | that  |
      | the   |
      | this  |
      | to    |
      | was   |
      | what  |
      | when  |
      | where |
      | who   |
      | will  |
      | with  |
      | und   |
      | the   |
      | www   |
      +-------+
    36 rows in set (0.00 sec)
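The default list can be replaced with your own. The sketch below assumes a database named `test`; the table names `my_stopwords` and `docs` are hypothetical. The stopword table must be an InnoDB table with a single VARCHAR column named `value`, and it must be set before the full-text index is created.

```sql
-- Hypothetical custom stopword table: one VARCHAR column named `value`.
CREATE TABLE test.my_stopwords (value VARCHAR(30)) ENGINE = InnoDB;
INSERT INTO test.my_stopwords (value) VALUES ('mysql'), ('database');

-- Point InnoDB at the custom list, in 'db_name/table_name' form.
SET GLOBAL innodb_ft_server_stopword_table = 'test/my_stopwords';

-- Full-text indexes created from now on use the custom list.
CREATE TABLE test.docs (
  id   INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  body TEXT,
  FULLTEXT (body)
) ENGINE = InnoDB;
```

Existing full-text indexes keep the stopword list they were built with, so they need to be dropped and recreated to pick up a new list. Stopword filtering for InnoDB can also be switched off entirely with the innodb_ft_enable_stopword system variable.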

Stopwords for MyISAM indexes

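MyISAM's stopword list differs from InnoDB's: the default MyISAM list is compiled into the MySQL source code. To override it, set the ft_stopword_file system variable to the path of a stopword file.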
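Because ft_stopword_file is not a dynamic variable, it has to be set at server startup (for example in the [mysqld] section of the option file) and cannot be changed with SET. A minimal sketch of checking the setting and rebuilding an index afterwards follows; the table name `my_myisam_table` is hypothetical.

```sql
-- Show which stopword file is in effect; the value is '(built-in)'
-- when the compiled-in list is used.
SHOW VARIABLES LIKE 'ft_stopword_file';

-- After changing ft_stopword_file and restarting the server, rebuild
-- the FULLTEXT indexes of each affected MyISAM table.
-- `my_myisam_table` is a hypothetical table name.
REPAIR TABLE my_myisam_table QUICK;
```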

The default MyISAM stopword list can be found in the storage/myisam/ft_static.c file in the MySQL source tree:

      a's able about above according
      accordingly across actually after afterwards
      again against ain't all allow
      allows almost alone along already
      also although always am among
      amongst an and another any
      anybody anyhow anyone anything anyway
      anyways anywhere apart appear appreciate
      appropriate are aren't around as
      aside ask asking associated at
      available away awfully be became
      because become becomes becoming been
      before beforehand behind being believe
      below beside besides best better
      between beyond both brief but
      by c'mon c's came can
      can't cannot cant cause causes
      certain certainly changes clearly co
      com come comes concerning consequently
      consider considering contain containing contains
      corresponding could couldn't course currently
      definitely described despite did didn't
      different do does doesn't doing
      don't done down downwards during
      each edu eg eight either
      else elsewhere enough entirely especially
      et etc even ever every
      everybody everyone everything everywhere ex
      exactly example except far few
      fifth first five followed following
      follows for former formerly forth
      four from further furthermore get
      gets getting given gives go
      goes going gone got gotten
      greetings had hadn't happens hardly
      has hasn't have haven't having
      he he's hello help hence
      her here here's hereafter hereby
      herein hereupon hers herself hi
      him himself his hither hopefully
      how howbeit however i'd i'll
      i'm i've ie if ignored
      immediate in inasmuch inc indeed
      indicate indicated indicates inner insofar
      instead into inward is isn't
      it it'd it'll it's its
      itself just keep keeps kept
      know known knows last lately
      later latter latterly least less
      lest let let's like liked
      likely little look looking looks
      ltd mainly many may maybe
      me mean meanwhile merely might
      more moreover most mostly much
      must my myself name namely
      nd near nearly necessary need
      needs neither never nevertheless new
      next nine no nobody non
      none noone nor normally not
      nothing novel now nowhere obviously
      of off often oh ok
      okay old on once one
      ones only onto or other
      others otherwise ought our ours
      ourselves out outside over overall
      own particular particularly per perhaps
      placed please plus possible presumably
      probably provides que quite qv
      rather rd re really reasonably
      regarding regardless regards relatively respectively
      right said same saw say
      saying says second secondly see
      seeing seem seemed seeming seems
      seen self selves sensible sent
      serious seriously seven several shall
      she should shouldn't since six
      so some somebody somehow someone
      something sometime sometimes somewhat somewhere
      soon sorry specified specify specifying
      still sub such sup sure
      t's take taken tell tends
      th than thank thanks thanx
      that that's thats the their
      theirs them themselves then thence
      there there's thereafter thereby therefore
      therein theres thereupon these they
      they'd they'll they're they've think
      third this thorough thoroughly those
      though three through throughout thru
      thus to together too took
      toward towards tried tries truly
      try trying twice two un
      under unfortunately unless unlikely until
      unto up upon us use
      used useful uses using usually
      value various very via viz
      vs want wants was wasn't
      way we we'd we'll we're
      we've welcome well went were
      weren't what what's whatever when
      whence whenever where where's whereafter
      whereas whereby wherein whereupon wherever
      whether which while whither who
      who's whoever whole whom whose
      why will willing wish with
      within without won't wonder would
      wouldn't yes yet you you'd
      you'll you're you've your yours
      yourself yourselves zero




Reposted from sczcx.iteye.com/blog/2145722