Lucene is an open-source library for full-text indexing and search. It offers a simple yet powerful API and works by indexing every term in a document.

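How Lucene search works: whenever we add data, the text is first analyzed (split into terms), and the resulting terms are stored in the index together with the id of the record they came from. At query time we do not search the database directly on a field value, because the data set may be huge and a fuzzy match (for example LIKE '%keyword%') typically cannot use the database index, which makes it expensive. Instead we first look up the matching ids in the index, and then fetch those records from the database by id.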
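
The flow described above can be sketched with a toy inverted index. This is a minimal sketch in plain Java, not Lucene code: the class InvertedIndexSketch and its methods add and lookup are hypothetical names, and splitting on whitespace stands in for a real analyzer such as IK.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index (hypothetical, not part of Lucene):
// maps each token to the ids of the records that contain it.
class InvertedIndexSketch {
    private final Map<String, TreeSet<String>> postings = new HashMap<>();

    // "Index time": tokenize the text and remember which id each token came from.
    void add(String id, String text) {
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(id);
            }
        }
    }

    // "Query time": look the token up and return the matching ids in sorted order.
    List<String> lookup(String token) {
        TreeSet<String> ids = postings.get(token.toLowerCase());
        return ids == null ? new ArrayList<>() : new ArrayList<>(ids);
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add("1", "Lucene is a search library");
        index.add("2", "a database stores rows");
        index.add("3", "search the index first");
        // The returned ids would then be used to fetch the rows from the database.
        System.out.println(index.lookup("search")); // prints [1, 3]
    }
}
```

The point of the design is that the expensive text matching happens against the token map, and the database is only asked for rows by id.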

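Below I'll walk through some commonly used Lucene APIs (this example uses a Lucene 4.x release).

1. Creating an index: this uses the IK analyzer, which also requires importing two configuration files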

  //Demonstrates indexing with the IK analyzer
    @Test
    public void buildIndex() throws IOException {
        //Create a Document (comparable to a row added to a database)
        Document document = new Document();
        document.add(new StringField("id","1", Field.Store.YES));
        document.add(new TextField("name","我叫MT，我是一个不够优秀但是勤奋的程序员",Field.Store.YES));
        //Set the index location: either a file path (FSDirectory) or memory (RAMDirectory, which is faster to query but does not persist the index)
        FSDirectory fsDirectory = FSDirectory.open(new File("e:/tmp"));
        //Analyze with the IK analyzer. Note: the IK analyzer build from Google Code only supports Lucene 3.x; this example uses Lucene 4.x, so it uses IKAnalyzer2012FF_u1.jar
        Analyzer ikAnalyzer = new IKAnalyzer();
        //Index writer configuration
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LATEST, ikAnalyzer);
        //Index writer
        IndexWriter indexWriter = new IndexWriter(fsDirectory, indexWriterConfig);
        //Index the document
        indexWriter.addDocument(document);
        //Note: after indexing, commit and close the indexWriter, otherwise the index files on disk will be incomplete
        indexWriter.commit();
        indexWriter.close();
    }

Here is a more complete example of creating an index:

 @Test
    public void completeBuildLuceneIndex() throws IOException {
        Document document = new Document();              //the document object (corresponds to one database record)
        //Add the data to be indexed to the document (each field corresponds to a column value of the record)
        FieldType fieldType = new FieldType();           //configure the field type
        fieldType.setIndexed(true);                     //whether the field is indexed
        fieldType.setTokenized(false);                  //whether the field is split into terms by the analyzer
        fieldType.setStored(true);                      //whether the original value is stored in the index
        document.add(new Field("id","1",fieldType));
        //Store.YES means the original value is stored, so it can be shown in search results
        document.add(new TextField("desc","我怀疑你在开车，但是我没有证据....",Field.Store.YES));
        //Next, set the index location; there are two kinds: on disk or in memory
        FSDirectory fsDirectory = FSDirectory.open(new File("e:/tmp"));
        //Then set up the IK analyzer (note: with Lucene 4.x the IK analyzer build from Google Code causes a version conflict; IKAnalyzer2012FF_u1.jar works instead)
        Analyzer analyzer = new IKAnalyzer();
        //Create the index writer configuration
        IndexWriterConfig indexWriterConfig = new IndexWriterConfig(Version.LATEST, analyzer);
        //OpenMode.CREATE clears any existing index before writing
        indexWriterConfig.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
        //Create the index writer
        IndexWriter indexWriter = new IndexWriter(fsDirectory, indexWriterConfig);
        //This step actually builds the index; everything before it was preparation
        indexWriter.addDocument(document);
        indexWriter.commit();
        indexWriter.close();
    }

2.a. Querying the index (the query string is itself analyzed into terms before searching):

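 //Simple query on a longer phrase (the phrase is analyzed into terms)
    @Test
    public void testQueryParse() throws ParseException, IOException {
        /*
        * Create the query parser
        * Arguments:
        * 1: the current Lucene version (in releases that offer a constructor without it, this argument can be omitted)
        * 2: the name of the field to search (much like the name attribute of a form input)
        * 3: the analyzer; here we simply create a new IK analyzer
        * */
        QueryParser queryParse = new QueryParser(Version.LATEST, "content", new IKAnalyzer());
        Query query = queryParse.parse("谷歌");//the argument is the search keyword
        //Create the index searcher and point it at the index location
        IndexSearcher searcher = new IndexSearcher(DirectoryReader.open(FSDirectory.open(new File("e:/tmp"))));
        //Run the search; everything before this was preparation. The second argument is the maximum number of results to return
        TopDocs docs = searcher.search(query, Integer.MAX_VALUE);//each hit is scored by relevance, and the hits come back ranked by score
        System.out.println("Number of documents hit: "+docs.totalHits);
        ScoreDoc[] scoreDocs = docs.scoreDocs;
        for (ScoreDoc doc :
                scoreDocs) {
            //Print the internal document id of the scored hit
            System.out.println("Internal id: "+doc.doc);
            //Load the actual document by its internal id
            Document document = searcher.doc(doc.doc);
            System.out.println(document.get("id")+"\t"+document.get("content"));

        }
    }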

A slightly enhanced version of the query (enhanced how? It searches multiple fields, whereas the code above searches only the single field content)

 @Test
    public void queryIndex2() throws ParseException, IOException {
        //Create the multi-field query parser: argument 1 is the version, argument 2 is the array of fields to search, argument 3 is again the IK analyzer
        MultiFieldQueryParser multiFieldQueryParser = new MultiFieldQueryParser(Version.LATEST, new String[]{"id", "content"}, new IKAnalyzer());
        Query query = multiFieldQueryParser.parse("谷歌");//parse the value to search for
        //Create the index searcher and point it at the index location
        IndexSearcher indexSearcher = new IndexSearcher(DirectoryReader.open(FSDirectory.open(new File("e:/tmp"))));
        //Run the search
        TopDocs topDocs = indexSearcher.search(query, Integer.MAX_VALUE);
        System.out.println("Hits: "+topDocs.totalHits);
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        for (ScoreDoc scoreDoc :
                scoreDocs) {
            System.out.println("Internal id of ranked document: "+scoreDoc.doc);
            //Load the ranked document by its internal id
            Document doc = indexSearcher.doc(scoreDoc.doc);
            System.out.println(doc.get("id")+","+doc.get("content"));
        }
    }

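There are several other query types: term query, wildcard search, fuzzy (similarity) search, and combined query. Each is demonstrated below.

2.b. Term query

//Test a term query (TermQuery); no query parser is needed, and the term is not analyzed, so it must match an indexed term exactly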
    @Test
    public void testTermQuery() throws IOException {
        TermQuery query = new TermQuery(new Term("content", "谷歌"));
        IndexSearcher indexSearcher = new IndexSearcher(DirectoryReader.open(FSDirectory.open(new File("e:/tmp"))));
        TopDocs topDocs = indexSearcher.search(query, Integer.MAX_VALUE);
        System.out.println(topDocs.totalHits);
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        for (ScoreDoc scoreDoc :
                scoreDocs) {
            System.out.println(scoreDoc.doc);
            Document doc = indexSearcher.doc(scoreDoc.doc);
            System.out.println(doc.get("id")+","+doc.get("content"));
        }
    }

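2.c. Wildcard search

  //Test wildcard search (pattern matching on short terms, using WildcardQuery)
    @Test
    public void testWildcardQuery() throws IOException {
        //Note: the wildcard is an asterisk, not the percent sign used by SQL LIKE; the supported wildcards are * and ?
        WildcardQuery wildcardQuery = new WildcardQuery(new Term("content", "*" + "谷歌" + "*"));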
        IndexSearcher indexSearcher = new IndexSearcher(DirectoryReader.open(FSDirectory.open(new File("e:/tmp"))));
        TopDocs topDocs = indexSearcher.search(wildcardQuery, Integer.MAX_VALUE);
        System.out.println(topDocs.totalHits);
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        for (ScoreDoc scoreDoc :
                scoreDocs) {
            System.out.println(scoreDoc.doc);
            Document doc = indexSearcher.doc(scoreDoc.doc);
            System.out.println(doc.get("content"));
        }
    }
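
To make the two wildcards concrete, here is a minimal sketch of how * and ? patterns can be matched against a single term. It is plain Java written for illustration, not Lucene's implementation; WildcardSketch and matches are hypothetical names. Keep in mind that WildcardQuery compares the pattern with individual indexed terms, not with the whole field text.

```java
// Hypothetical helper, not Lucene code: matches a pattern containing
// '*' (any run of characters, including none) and '?' (exactly one character).
class WildcardSketch {
    static boolean matches(String pattern, String term) {
        int p = 0, t = 0;   // current positions in pattern and term
        int star = -1;      // position of the most recent '*' in the pattern
        int mark = 0;       // term position where that '*' started matching
        while (t < term.length()) {
            if (p < pattern.length() && pattern.charAt(p) == '*') {
                star = p++;     // remember the star, first try matching it to nothing
                mark = t;
            } else if (p < pattern.length()
                    && (pattern.charAt(p) == '?' || pattern.charAt(p) == term.charAt(t))) {
                p++;            // single character matched
                t++;
            } else if (star != -1) {
                p = star + 1;   // backtrack: let the last star absorb one more character
                t = ++mark;
            } else {
                return false;
            }
        }
        // Only trailing stars may remain in the pattern.
        while (p < pattern.length() && pattern.charAt(p) == '*') {
            p++;
        }
        return p == pattern.length();
    }

    public static void main(String[] args) {
        System.out.println(matches("*goog*", "google")); // prints true
        System.out.println(matches("go?gle", "google")); // prints true
        System.out.println(matches("go?gle", "gogle"));  // prints false
    }
}
```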

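2.d. Fuzzy (similarity) search (scenario: we want to search for '百科全书' but mistype it as '摆科全输', and can sometimes still find the results we need)

 @Test
    public void testFuzzQuery() throws IOException {
        //The user meant facebook but typed faccbook. The 2 is maxEdits, the maximum edit distance allowed between the query term and a matching term (Lucene supports at most 2)
        FuzzyQuery fuzzyQuery = new FuzzyQuery(new Term("content", "faccbook"), 2);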
        IndexSearcher indexSearcher = new IndexSearcher(DirectoryReader.open(FSDirectory.open(new File("e:/tmp"))));
        TopDocs topDocs = indexSearcher.search(fuzzyQuery, Integer.MAX_VALUE);
        System.out.println(topDocs.totalHits);
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        for (ScoreDoc scoreDoc :
                scoreDocs) {
            System.out.println(scoreDoc.doc);
            Document doc = indexSearcher.doc(scoreDoc.doc);
            System.out.println(doc.get("content"));
        }
    }
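
The "allowed number of typos" in a fuzzy search is an edit distance. Below is a minimal sketch of the classic Levenshtein distance in plain Java, with the hypothetical names EditDistanceSketch and distance. It is not Lucene's implementation, and as far as I know FuzzyQuery by default also counts swapping two adjacent characters as a single edit, which this sketch does not.

```java
// Hypothetical helper, not Lucene code: classic Levenshtein distance,
// i.e. the minimum number of single-character insertions, deletions
// and substitutions needed to turn a into b.
class EditDistanceSketch {
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1]; // distances for the previous row
        int[] curr = new int[b.length() + 1]; // distances for the current row
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1] + 1, prev[j] + 1);
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        // One substitution (c -> e), so it falls within a maxEdits of 2.
        System.out.println(distance("faccbook", "facebook")); // prints 1
    }
}
```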

2.e. Querying everything in the index

 @Test
    public void queryAllfromIndex() throws IOException {
        MatchAllDocsQuery query = new MatchAllDocsQuery();
        IndexSearcher indexSearcher = new IndexSearcher(DirectoryReader.open(FSDirectory.open(new File("e:/tmp"))));
        TopDocs topDocs = indexSearcher.search(query, Integer.MAX_VALUE);
        System.out.println(topDocs.totalHits);
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        for (ScoreDoc scoreDoc :
                scoreDocs) {
            System.out.println(scoreDoc.doc);
            Document doc = indexSearcher.doc(scoreDoc.doc);
            System.out.println(doc.get("content"));
        }
    }

2.f. Combined query (merges several of the queries above; quite practical)

//Test a combined query (here a QueryParser query and a WildcardQuery are combined)
    @Test
    public void testBooleanQuery() throws ParseException, IOException {
        //First create the query parser, which analyzes the query string
        QueryParser queryParser = new QueryParser(Version.LATEST, "content", new IKAnalyzer());
        Query query = queryParser.parse("谷歌");//parse the value to search for
        //Create the wildcard query
        WildcardQuery wildcardQuery = new WildcardQuery(new Term("content", "*" + "facebook" + "*"));
        //Create the combined query
        BooleanQuery booleanQuery = new BooleanQuery();
        //SHOULD behaves like OR, while MUST behaves like AND
        booleanQuery.add(query,BooleanClause.Occur.SHOULD);
        booleanQuery.add(wildcardQuery,BooleanClause.Occur.SHOULD);
        IndexSearcher indexSearcher = new IndexSearcher(DirectoryReader.open(FSDirectory.open(new File("e:/tmp"))));
        TopDocs topDocs = indexSearcher.search(booleanQuery, Integer.MAX_VALUE);
        System.out.println(topDocs.totalHits);
        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        for (ScoreDoc scoreDoc :
                scoreDocs) {
            System.out.println(scoreDoc.doc);
            Document doc = indexSearcher.doc(scoreDoc.doc);
            System.out.println(doc.get("content"));
        }
    }
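
The OR/AND analogy can be illustrated with plain sets of matching document ids. This is a minimal sketch that assumes each clause has already produced its set of hits; BooleanCombineSketch, should and must are hypothetical names, and scoring is ignored. It only covers the two pure cases (all clauses SHOULD, or all clauses MUST).

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not Lucene code: combines the hit sets of two clauses.
class BooleanCombineSketch {
    // Both clauses SHOULD: a document matches if either clause matches (union).
    static Set<Integer> should(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.addAll(b);
        return result;
    }

    // Both clauses MUST: a document matches only if both clauses match (intersection).
    static Set<Integer> must(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> parsedQueryHits = new TreeSet<>(Arrays.asList(0, 1, 2));
        Set<Integer> wildcardHits = new TreeSet<>(Arrays.asList(2, 3));
        System.out.println(should(parsedQueryHits, wildcardHits)); // prints [0, 1, 2, 3]
        System.out.println(must(parsedQueryHits, wildcardHits));   // prints [2]
    }
}
```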

3. Updating documents in the index

//Update a document in the index
    @Test
    public void updateIndexData() throws IOException {
        //Create the Document that will replace the existing one
        Document document = new Document();
        document.add(new StringField("id","4", Field.Store.YES));
        document.add(new TextField("content","我很开心，我叫MT2",Field.Store.YES));
        FSDirectory directory = FSDirectory.open(new File("e:/tmp"));
        IndexWriter indexWriter = new IndexWriter(directory, new IndexWriterConfig(Version.LATEST, new IKAnalyzer()));
        indexWriter.updateDocument(new Term("id","4"),document);
        indexWriter.commit();
        indexWriter.close();
    }
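
updateDocument works as "delete, then add": it first deletes every document containing the given term and then adds the new document, so if nothing matches it simply adds. The sketch below mimics that behavior on an in-memory list; UpdateSketch and its methods are hypothetical names for illustration only, not Lucene code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for an index, used only to show
// the delete-then-add behavior of an update.
class UpdateSketch {
    private final List<Map<String, String>> docs = new ArrayList<>();

    // Builds a two-field document with "id" and "content".
    static Map<String, String> doc(String id, String content) {
        Map<String, String> d = new HashMap<>();
        d.put("id", id);
        d.put("content", content);
        return d;
    }

    void add(Map<String, String> d) {
        docs.add(d);
    }

    // Remove every document whose field equals value, then add the replacement.
    void update(String field, String value, Map<String, String> replacement) {
        docs.removeIf(d -> value.equals(d.get(field)));
        docs.add(replacement);
    }

    // Returns the content of every document, in insertion order.
    List<String> contents() {
        List<String> result = new ArrayList<>();
        for (Map<String, String> d : docs) {
            result.add(d.get("content"));
        }
        return result;
    }

    public static void main(String[] args) {
        UpdateSketch index = new UpdateSketch();
        index.add(doc("4", "old text"));
        index.update("id", "4", doc("4", "new text")); // replaces the existing document
        index.update("id", "5", doc("5", "added"));    // nothing to delete, so it just adds
        System.out.println(index.contents()); // prints [new text, added]
    }
}
```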

4. Deleting from the index

 //Delete specific documents from the index
    @Test
    public void deleteIndexData() throws IOException {
        FSDirectory directory = FSDirectory.open(new File("e:/tmp"));
        IndexWriter indexWriter = new IndexWriter(directory, new IndexWriterConfig(Version.LATEST, new IKAnalyzer()));
        indexWriter.deleteDocuments(new Term("id","4"));//delete the documents whose id is 4
        //To delete everything, use this instead
//        indexWriter.deleteAll();
        indexWriter.commit();
        indexWriter.close();
    }

That covers the basic Lucene operations. I'll follow up with a more advanced post later.
