Lucene: a few small notes

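I used to shy away from full-text search because it looked hard to get into. I recently played with Lucene and found it quite easy to pick up. Without further ado, here are a few notes.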

Lucene's index is an inverted index, which works much like the index at the back of a book. Its structure is shown in the figure below:

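This structure is what makes Lucene searches efficient: looking up a term leads straight to the list of all documents that contain it, so a query does not have to scan every document and needs very little I/O.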
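To make the idea concrete, here is a minimal sketch of an inverted index in plain Java. It is only an illustration and not Lucene's real data structure or file format: the class name `InvertedIndexSketch` and its methods are made up for this example, terms are split on whitespace only, and documents are assumed to be added in increasing id order.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy inverted index: maps each term to the ascending list of ids of the documents containing it.
// Hypothetical illustration only; Lucene's actual index is far more elaborate.
class InvertedIndexSketch {
    private final Map<String, List<Integer>> postings = new HashMap<String, List<Integer>>();

    // Assumes documents are added in increasing docId order.
    void add(int docId, String text) {
        for (String term : text.toLowerCase().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // leading whitespace produces an empty first token
            }
            List<Integer> list = postings.get(term);
            if (list == null) {
                list = new ArrayList<Integer>();
                postings.put(term, list);
            }
            // Skip the id if this document already recorded the term.
            if (list.isEmpty() || list.get(list.size() - 1) != docId) {
                list.add(docId);
            }
        }
    }

    // One map lookup returns every document containing the term; no document is scanned.
    List<Integer> lookup(String term) {
        List<Integer> list = postings.get(term.toLowerCase());
        if (list == null) {
            return Collections.<Integer>emptyList();
        }
        return Collections.unmodifiableList(list);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(0, "lucene is a search library");
        index.add(1, "an inverted index maps terms to documents");
        index.add(2, "lucene uses an inverted index");
        System.out.println(index.lookup("lucene")); // prints [0, 2]
    }
}
```

Just like the index at the back of a book, the work is done once at indexing time, and a search is then a lookup by term.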

Building an index with Lucene is very simple. The example below creates an in-memory index.

First, create the in-memory index object:

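import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// Create the in-memory index object
Directory directory = new RAMDirectory();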

Next, configure the IndexWriter:

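import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.IndexWriterConfig.OpenMode;
import org.apache.lucene.util.Version;

// Configure the IndexWriterConfig
IndexWriterConfig iwConfig = new IndexWriterConfig(Version.LUCENE_34, analyzer);
iwConfig.setOpenMode(OpenMode.CREATE_OR_APPEND);
IndexWriter iwriter = new IndexWriter(directory, iwConfig);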

Note: the analyzer here is a Chinese word-segmentation tool. Implementing your own Chinese word segmentation is important.
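To show what a segmenter has to produce, here is a rough sketch of bigram splitting, a simple fallback for Chinese text that needs no dictionary. It is not a Lucene Analyzer and not a real word segmenter: the class `BigramTokenizerSketch` is invented for this example, and it assumes the input has no whitespace or punctuation and no characters outside the Basic Multilingual Plane. The sample string is written with Unicode escapes and spells 全文检索 ("full-text search").

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: splits text into overlapping two-character tokens.
// A real analyzer would also handle punctuation, mixed scripts, and dictionaries.
class BigramTokenizerSketch {

    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        String s = text.trim();
        if (s.length() == 1) {
            tokens.add(s); // a single character is kept as its own token
            return tokens;
        }
        // Slide a window of two characters over the text, one step at a time.
        for (int i = 0; i + 1 < s.length(); i++) {
            tokens.add(s.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Four Chinese characters meaning "full-text search"; yields three overlapping bigrams.
        String sample = "\u5168\u6587\u68c0\u7d22";
        System.out.println(tokenize(sample));
    }
}
```

The same splitting has to be applied to the query as to the indexed text, which is why the same analyzer is passed to both the IndexWriterConfig and the QueryParser.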

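What remains is writing to the index. To do that you first create a Document, and a Document in turn contains Fields. A Field is the unit that gets indexed and searched (this is my own understanding and may be wrong).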

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

Document doc = new Document();
doc.add(new Field(fieldName, text, Field.Store.YES, Field.Index.ANALYZED));

fieldName is the name that later searches will refer to, and text is the text to be indexed.

You can create multiple Documents, and you can also write several texts into a single Document.
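As a rough mental model of "several texts in one Document", the sketch below treats a document as a collection of (field name, text) pairs in which the same field name may appear more than once. `DocSketch` is a hypothetical stand-in written for this note, not Lucene's Document class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a document: field names mapped to the texts stored under them.
class DocSketch {
    private final Map<String, List<String>> fields = new LinkedHashMap<String, List<String>>();

    // Adding the same field name twice keeps both values.
    DocSketch add(String fieldName, String text) {
        List<String> values = fields.get(fieldName);
        if (values == null) {
            values = new ArrayList<String>();
            fields.put(fieldName, values);
        }
        values.add(text);
        return this;
    }

    // Returns every text stored under the field, or an empty list if there is none.
    List<String> values(String fieldName) {
        List<String> values = fields.get(fieldName);
        return values == null ? new ArrayList<String>() : values;
    }

    public static void main(String[] args) {
        DocSketch doc = new DocSketch()
                .add("content", "first paragraph")
                .add("content", "second paragraph")
                .add("title", "Lucene notes");
        System.out.println(doc.values("content")); // prints [first paragraph, second paragraph]
    }
}
```

Whether to split texts across documents or fields depends on what a single search hit should be: each Document is one result.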

// Add the document, then close the writer so the index is committed
iwriter.addDocument(doc);
iwriter.close();

The rest of the code does the search:

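import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.util.Version;

IndexReader ireader = IndexReader.open(directory);
IndexSearcher isearcher = new IndexSearcher(ireader);

String keyword = "the term to search for";
// Build the Query object with the QueryParser
QueryParser qp = new QueryParser(Version.LUCENE_34, fieldName, analyzer);
// Require every term in the query to match
qp.setDefaultOperator(QueryParser.AND_OPERATOR);
Query query = qp.parse(keyword);

// Fetch the 5 highest-scoring hits
TopDocs topDocs = isearcher.search(query, 5);
System.out.println("Hits: " + topDocs.totalHits);
// Print the results. Loop over scoreDocs.length rather than totalHits:
// totalHits counts every match and can exceed the 5 hits returned here.
ScoreDoc[] scoreDocs = topDocs.scoreDocs;
for (int i = 0; i < scoreDocs.length; i++) {
    Document targetDoc = isearcher.doc(scoreDocs[i].doc);
    System.out.println("Content: " + targetDoc.toString());
}
// Release resources when done
isearcher.close();
ireader.close();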
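The AND operator set on the QueryParser means a document must contain every term of the query. Conceptually this is an intersection of the per-term document lists. The sketch below shows that idea with a two-pointer merge over ascending lists of document ids; `AndQuerySketch` is a hypothetical name and this is not how Lucene implements it internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of AND semantics: keep only the doc ids present in both lists.
class AndQuerySketch {

    // Both inputs must be sorted in ascending order without duplicates.
    static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> out = new ArrayList<Integer>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i);
            int y = b.get(j);
            if (x == y) {
                out.add(x); // present in both lists
                i++;
                j++;
            } else if (x < y) {
                i++; // advance the list that is behind
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> docsWithFirstTerm = Arrays.asList(0, 2, 5, 9);
        List<Integer> docsWithSecondTerm = Arrays.asList(2, 3, 9);
        System.out.println(intersect(docsWithFirstTerm, docsWithSecondTerm)); // prints [2, 9]
    }
}
```

With an OR operator the result would be the union of the lists instead, which usually returns more, and less precise, hits.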

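And that is all it takes for a small program that builds an index and searches it. Lucene really does make full-text search much more approachable, and a newcomer can pick up the basics quickly. Going deeper is still a long road, though. To build a good full-text search project, you first need a good word-segmentation tool.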
