Recently I needed to add a search engine to a project. I first tried Solr deployed on a Tomcat server, writing SQL in its configuration file to pull data straight from the database, but it never returned any data, and after a long investigation I still couldn't find the cause. Since time was tight, I switched to Lucene, which is comparatively simple to get working. The overall idea is a full re-index: take all the article data from the database and build a Lucene index on disk that stays in sync with it.
public static void Index(List<Article> rs, String lucenepath) {
    try {
        // Open (or create) the index directory on disk
        Directory directory = FSDirectory.open(new File(lucenepath));
        IndexWriter indexWriter = new IndexWriter(directory, LuceneUtils.analyzer, MaxFieldLength.LIMITED);
        for (Article article : rs) {
            // One Lucene Document per article, storing and analyzing each field
            Document doc = new Document();
            doc.add(new Field("id", article.getId(), Store.YES, org.apache.lucene.document.Field.Index.ANALYZED));
            if (article.getContent() != null) {
                doc.add(new Field("content", article.getContent(), Store.YES, org.apache.lucene.document.Field.Index.ANALYZED));
            }
            doc.add(new Field("title", article.getTitle(), Store.YES, org.apache.lucene.document.Field.Index.ANALYZED));
            doc.add(new Field("column_info_id", article.getColumnInfo().getId(), Store.YES, org.apache.lucene.document.Field.Index.ANALYZED));
            indexWriter.addDocument(doc);
        }
        // Merge segments, then flush and release the write lock
        indexWriter.optimize();
        indexWriter.close();
    } catch (IOException e) {
        System.out.println(e);
    }
}
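With the indexing method in place, a full re-index is just a matter of loading every article from the database and handing the list over. A minimal usage sketch, where articleService and the index path are placeholders for whatever data-access layer and directory the project really uses:

// Hypothetical caller: articleService and the index path are placeholders.
List<Article> allArticles = articleService.findAll();  // load every article from the database
Index(allArticles, "/data/lucene/articles");            // rebuild the Lucene index from the full result set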
Once the index is built, the next step is searching it:
public static List<Article> searcher(String queryString, String lucenepath) {
    List<Article> articleList = new ArrayList<Article>();
    try {
        Directory directory = FSDirectory.open(new File(lucenepath));
        IndexSearcher is = new IndexSearcher(directory);
        // Query both the title and content fields with the same analyzer used at index time
        MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30,
                new String[]{"title", "content"}, LuceneUtils.analyzer);
        /* QueryParser parser = new QueryParser(Version.LUCENE_30, "content", LuceneUtils.analyzer); */
        Query query = parser.parse(queryString);
        // Return at most the top 100 hits
        TopDocs docs = is.search(query, 100);
        ScoreDoc[] scoreDocs = docs.scoreDocs;
        for (ScoreDoc scoreDoc : scoreDocs) {
            int num = scoreDoc.doc;
            Document document = is.doc(num);
            Article article = DocumentUtils.document2Article(document);
            articleList.add(article);
        }
        // Drop duplicate results (relies on Article implementing equals/hashCode)
        articleList = articleList.stream().distinct()
                .collect(Collectors.toList());
        articleList.forEach(System.out::println);
    } catch (Exception e) {
        System.out.print(e);
    }
    return articleList;
}
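The search method relies on DocumentUtils.document2Article to turn a Lucene Document back into an Article. That helper isn't shown in the original; a minimal sketch of what it might look like, assuming Article exposes matching setters, is:

import org.apache.lucene.document.Document;

public class DocumentUtils {
    // Rebuild an Article from the stored fields written at index time.
    // The setter names are assumptions; adjust them to the real Article class.
    public static Article document2Article(Document doc) {
        Article article = new Article();
        article.setId(doc.get("id"));
        article.setTitle(doc.get("title"));
        article.setContent(doc.get("content"));
        // column_info_id was stored as well and can be restored the same way if needed
        return article;
    }
}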
At this point a simple Lucene setup is in place.
Of course the dependencies also have to be declared in pom.xml; getting the versions to work together cost me quite a bit of time.
<!-- lucene -->
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-core</artifactId>
    <version>3.0.1</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers</artifactId>
    <version>3.0.1</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-memory</artifactId>
    <version>3.0.1</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-highlighter</artifactId>
    <version>3.0.1</version>
</dependency>
<!-- mmseg4j analyzer -->
<dependency>
    <groupId>com.chenlb.mmseg4j</groupId>
    <artifactId>mmseg4j-core</artifactId>
    <version>1.10.0</version>
</dependency>
For Chinese word segmentation I also compared a few different analyzers and eventually settled on PanGu.
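Both snippets above reference LuceneUtils.analyzer, which isn't shown here. A minimal sketch of such a utility class, using Lucene's built-in StandardAnalyzer purely as a placeholder (swap in whichever segmenter you actually pick), might look like this:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

// Hypothetical sketch of the LuceneUtils helper referenced above: it holds one
// shared Analyzer instance so indexing and searching tokenize text the same way.
public class LuceneUtils {
    public static final Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_30);
}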
Once the results are in hand, the next issue is pagination:
// (Fragment, presumably from the doTag() method of a JSP tag handler;
//  DEFAULT_SIZE, pageStr, sizeStr, term, var and varPage are defined elsewhere in the class.)

// Index of the first record on the requested page
int begin = DEFAULT_SIZE * (Integer.parseInt(pageStr) - 1);
// Index one past the last record, capped at the result size
int end = Math.min(begin + DEFAULT_SIZE, articleList.size());
List<Article> articles = new ArrayList<Article>();
// Copy the current page's slice of the result list
for (int i = begin; i < end; i++) {
    articles.add(articleList.get(i));
}
Map<String, Object> pageMap = new HashMap<>();
pageMap.put("currentPage", pageStr);
pageMap.put("pageSize", sizeStr);
pageMap.put("totalCount", articleList.size());
pageMap.put("totalPage", getTotalPage(articleList.size(), Integer.parseInt(sizeStr)));
pageMap.put("pagination", getPagination(null));
pageMap.put("term", term);
getJspContext().setAttribute(var, articles);
getJspContext().setAttribute(varPage, pageMap);
getJspBody().invoke(null);
}
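The snippet calls a getTotalPage helper that isn't shown. A plausible implementation is simply ceiling division of the result count by the page size; the name and parameters are taken from the call above, the body is my assumption:

// Assumed implementation: total pages = ceil(totalCount / pageSize)
private int getTotalPage(int totalCount, int pageSize) {
    return (totalCount + pageSize - 1) / pageSize;
}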
With that, a basic search-and-pagination feature is working. There is of course plenty left to improve, such as highlighting the matched terms in the results and speeding up retrieval.
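For the highlighting part, the lucene-highlighter dependency already declared in the pom provides the pieces. A rough sketch of how each hit's content could be highlighted inside the search loop above; the fragment size and the HTML tags are arbitrary choices of mine, not the project's:

import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleFragmenter;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;

// Sketch: wrap matched terms in <em> tags, to be run per hit inside the search loop,
// where query, document and article are already in scope.
Highlighter highlighter = new Highlighter(
        new SimpleHTMLFormatter("<em>", "</em>"), new QueryScorer(query));
highlighter.setTextFragmenter(new SimpleFragmenter(100));  // ~100-char snippets
String storedContent = document.get("content");
if (storedContent != null) {
    String snippet = highlighter.getBestFragment(LuceneUtils.analyzer, "content", storedContent);
    if (snippet != null) {
        article.setContent(snippet);  // show the highlighted fragment instead of the full text
    }
}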