Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/qq_36059561/article/details/83890536
1. Creating an index with Tika
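Until now, every document we indexed was a .txt file. With Tika we can also index PDF, Word, HTML and other formats: Tika extracts their text, and we index that text. The indexing code is largely the same as before. Only the value of the content field changes: where we used to read the file with a FileReader, we now get a Reader from Tika's parse() method.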
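Put differently, the indexing loop only needs something that turns a file into a Reader, and Tika is one such thing. The minimal, standard-library-only sketch below illustrates that idea. TextExtractor, PlainTextExtractor and ExtractorDemo are hypothetical names introduced here, not part of Lucene or Tika. PlainTextExtractor stands in for the old FileReader approach (assuming UTF-8 text files). With Tika on the classpath, a Tika-backed extractor would simply be the lambda file -> tika.parse(file), since Tika's parse(File) also returns a Reader.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical abstraction: anything that turns a file into a Reader of plain text.
interface TextExtractor {
    Reader extract(File file) throws IOException;
}

// The "before" version: read text files directly.
// Assumes UTF-8, whereas FileReader would use the platform default charset.
class PlainTextExtractor implements TextExtractor {
    @Override
    public Reader extract(File file) throws IOException {
        return new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8));
    }
}

// With Tika on the classpath (not assumed by this sketch), a Tika-backed version would be:
//   Tika tika = new Tika();
//   TextExtractor tikaExtractor = file -> tika.parse(file);

class ExtractorDemo {
    // Drain the extractor's Reader into a String and close it afterwards.
    static String readAll(TextExtractor extractor, File file) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = extractor.extract(file)) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }
}
```

Because the indexing code depends only on the Reader, swapping FileReader for Tika leaves the rest of the Document construction untouched, which is exactly what the code below does.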
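import java.io.File;
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.apache.tika.Tika;
import org.apache.tika.metadata.Metadata;
import org.wltea.analyzer.lucene.IKAnalyzer;

public void index(boolean update) {
    IndexWriter indexWriter = null;
    try {
        Directory directory = FSDirectory.open(new File("E:\\Lucene\\IndexLibrary"));
        indexWriter = new IndexWriter(directory, new IndexWriterConfig(Version.LUCENE_35, new IKAnalyzer()));
        if (update) {
            // Rebuild: remove all existing documents before re-indexing
            indexWriter.deleteAll();
        }
        File[] files = new File("E:\\Lucene\\SearchSource\\TikaSource").listFiles();
        if (files == null) {
            return; // source directory missing or unreadable; finally still closes the writer
        }
        // A single Tika facade can be reused for every file
        Tika tika = new Tika();
        for (File file : files) {
            Document document = new Document();
            // Optional: Tika records the metadata it detects (e.g. content type) in this object
            Metadata metadata = new Metadata();
            // Tika detects the format and returns a Reader over the extracted text;
            // Field(String, Reader) makes "content" tokenized and indexed, but not stored
            document.add(new Field("content", tika.parse(file, metadata)));
            document.add(new Field("fileName", file.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            document.add(new Field("path", file.getAbsolutePath(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            // Last-modified time in milliseconds since the epoch
            document.add(new NumericField("date", Field.Store.YES, true).setLongValue(file.lastModified()));
            // Size in whole KB (integer division truncates)
            document.add(new NumericField("size", Field.Store.YES, true).setIntValue((int) (file.length() / 1024)));
            indexWriter.addDocument(document);
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (indexWriter != null) {
            try {
                indexWriter.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}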
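Apart from content, each document carries four stored fields computed from the File itself. The small sketch below isolates that computation in plain Java so the arithmetic is easy to check; FileFieldValues is a hypothetical helper, not something the original code defines. Note that the size field uses integer division, so any file under 1024 bytes is stored with size 0.

```java
import java.io.File;

// Hypothetical holder for the non-content values that index() stores for each file.
final class FileFieldValues {
    final String fileName;    // -> "fileName" field (stored, not analyzed)
    final String path;        // -> "path" field (stored, not analyzed)
    final long lastModified;  // -> "date" NumericField, ms since the epoch
    final int sizeKb;         // -> "size" NumericField, whole KB

    FileFieldValues(File file) {
        this.fileName = file.getName();
        this.path = file.getAbsolutePath();
        this.lastModified = file.lastModified();
        this.sizeKb = toKb(file.length());
    }

    // Same arithmetic as index(): truncating division, so 0..1023 bytes -> 0 KB.
    static int toKb(long bytes) {
        return (int) (bytes / 1024);
    }
}
```

If small files should not collapse to 0, rounding up with (bytes + 1023) / 1024 is one option; the original code keeps the truncating version.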
2. Searching the index built with Tika
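Once the index exists, searching is straightforward and works the same way as before; the real work lies in building the index. Tika is not involved at search time: the query runs against the text Tika extracted during indexing. Here is the code: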
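public void search() {
    IndexReader indexReader = null;
    try {
        Directory directory = FSDirectory.open(new File("E:\\Lucene\\IndexLibrary"));
        indexReader = IndexReader.open(directory);
        IndexSearcher indexSearcher = new IndexSearcher(indexReader);
        // TermQuery is not analyzed: "必须" must match a term exactly as IKAnalyzer indexed it
        TermQuery termQuery = new TermQuery(new Term("content", "必须"));
        // Top 20 hits; only stored fields (fileName, path, ...) can be read back, not content
        TopDocs topDocs = indexSearcher.search(termQuery, 20);
        for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
            Document document = indexSearcher.doc(scoreDoc.doc);
            System.out.println(document.get("fileName"));
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        if (indexReader != null) {
            try {
                indexReader.close();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}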
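TermQuery does not pass its text through an analyzer; it looks up one exact term in the content field. The toy inverted index below, written in plain Java, sketches why the search above finds a file only if IKAnalyzer emitted 必须 as a single token while indexing. ToyInvertedIndex is a hypothetical class for illustration only: it is not Lucene's data structure, it takes pre-tokenized terms instead of running an analyzer, and it returns hits in insertion order rather than ranking them by score as Lucene does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy inverted index: term -> names of the files whose content produced that term.
final class ToyInvertedIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // 'terms' stands in for the tokens an analyzer such as IKAnalyzer would emit.
    void add(String fileName, List<String> terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new LinkedHashSet<>()).add(fileName);
        }
    }

    // Like TermQuery: the query string is used as-is, so only an exact term match counts.
    // Returns at most n file names (no scoring, insertion order).
    List<String> termQuery(String term, int n) {
        Set<String> hits = postings.getOrDefault(term, Collections.<String>emptySet());
        List<String> all = new ArrayList<>(hits);
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }
}
```

For example, a file indexed with the tokens 必 and 须 separately would not match a TermQuery for 必须. For free-text input, Lucene 3.5's QueryParser, constructed with the same IKAnalyzer used at indexing time, analyzes the query string so its terms line up with the indexed ones.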