Notes: Building My Own Search Engine from Scratch

From 2019/11/04 to ...

Day1:

Open-source search engine projects: Lucene/Nutch (by Doug Cutting, also known for Hadoop; graduated from Stanford in 1985.)

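Books: Search Engines (Li Xiaoming edition); Search Engines: Information Retrieval in Practice; Introduction to Information Retrieval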

Day2:

Installing Nutch and Solr on Mac

1. java -version # check whether a JDK is installed
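A small sketch of this check, assuming bash: since the rest of this walkthrough targets Java 8, it helps to print the major version instead of eyeballing the output. java_major_version and check_java are hypothetical helpers, not part of any JDK tool; java -version writes to stderr, hence the 2>&1.

```shell
#!/usr/bin/env bash
# Sketch: report the installed Java major version (hypothetical helpers).

# Extract the major version from the first line of `java -version`.
# 'java version "1.8.0_161"' -> 8 (old "1.x" scheme); '"11.0.2"' -> 11 (new scheme).
java_major_version() {
  local line=$1 ver
  ver=${line#*\"}          # drop everything up to the first double quote
  ver=${ver%%\"*}          # drop everything from the closing double quote on
  case $ver in
    1.*) ver=${ver#1.} ;;  # old scheme: strip the leading "1."
  esac
  echo "${ver%%[._-]*}"    # keep only the digits before the first separator
}

check_java() {
  if ! command -v java >/dev/null 2>&1; then
    echo "java not found on PATH; install a JDK first" >&2
    return 1
  fi
  local first_line
  first_line=$(java -version 2>&1 | head -n 1)   # java -version prints to stderr
  echo "Found Java $(java_major_version "$first_line"): $first_line"
}

# Usage: check_java
```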

2. Download Ant 1.10.7 (the -bin.tar.gz archive, for Java 8) from https://ant.apache.org/bindownload.cgi. No compilation is needed; just extract it. Ant is used to build the Nutch source code.
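Downloading and unpacking can be scripted roughly as below, a minimal sketch assuming curl and tar are installed. fetch_and_extract is a hypothetical helper, and the archive URL in the comment is an assumption based on the Apache archive layout. The target directory matches the ANT_HOME used in step 3.

```shell
#!/usr/bin/env bash
# Sketch: download a .tar.gz archive and unpack it into a directory.
# Ant ships prebuilt, so extracting it is the whole "install".

# fetch_and_extract URL DEST_DIR   (hypothetical helper)
fetch_and_extract() {
  local url=$1 dest=$2
  mkdir -p "$dest" || return 1
  # -f: fail on HTTP errors, -sS: quiet but still report errors, -L: follow redirects.
  # pipefail (scoped to the subshell) makes a curl failure fail the whole pipe.
  ( set -o pipefail; curl -fsSL "$url" | tar -xzf - -C "$dest" )
}

# Assumed archive URL; check the Ant download page before relying on it:
# fetch_and_extract \
#   https://archive.apache.org/dist/ant/binaries/apache-ant-1.10.7-bin.tar.gz \
#   "$HOME/local/bin"
# "$HOME/local/bin/apache-ant-1.10.7/bin/ant" -version
```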

3. Set the JDK and Ant environment variables:

#JDK

JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home"
export JAVA_HOME
CLASS_PATH="$JAVA_HOME/lib"
PATH="$PATH:$JAVA_HOME/bin"

#ANT
ANT_HOME="/Users/*/local/bin/apache-ant-1.10.7"
export ANT_HOME
PATH="$PATH:$ANT_HOME/bin"
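These lines only take effect once they sit in a shell startup file that gets re-read. The sketch below assumes ~/.bash_profile (an interactive zsh reads ~/.zshrc instead) and adds a hypothetical check_home helper that confirms each variable points at a directory containing the expected bin/ executable.

```shell
#!/usr/bin/env bash
# Sketch: after adding the exports to ~/.bash_profile (assumed location),
# open a new terminal or run `source ~/.bash_profile`, then check them.

# check_home VAR_NAME EXECUTABLE   (hypothetical helper)
# Succeeds if the directory stored in $VAR_NAME contains an executable bin/EXECUTABLE.
check_home() {
  local name=$1 exe=$2 dir
  dir=${!name}                      # bash indirect expansion: value of the variable named $name
  if [ -z "$dir" ]; then
    echo "$name is not set" >&2
    return 1
  fi
  if [ ! -x "$dir/bin/$exe" ]; then
    echo "$name=$dir has no executable bin/$exe" >&2
    return 1
  fi
  echo "$name OK: $dir"
}

# Usage:
# check_home JAVA_HOME java && check_home ANT_HOME ant
# java -version && ant -version
```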

4. Download Nutch 1.10 (the -src.tar.gz archive) from https://archive.apache.org/dist/nutch/1.10/. Why this version? Only to match the version used in reference 1.

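Extract it, then copy the property below from apache-nutch-1.10/conf/nutch-default.xml into nutch-site.xml in the same conf directory, inside its <configuration> element (nutch-site.xml overrides the defaults). Only the value of http.agent.name needs to be changed.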

<property>
  <name>http.agent.name</name>
  <value>AlwaysLazy</value>
  <description>HTTP 'User-Agent' request header. MUST NOT be empty -
    please set this to a single word uniquely related to your organization.

    NOTE: You should also check other related properties:

    http.robots.agents
    http.agent.description
    http.agent.url
    http.agent.email
    http.agent.version

    and set their values appropriately.
  </description>
</property>
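If nutch-site.xml has to be written from scratch, a minimal version only needs this one property wrapped in <configuration>. The sketch below uses a hypothetical write_nutch_site helper that generates the file with a heredoc and backs up any existing copy; the description element is omitted because it is documentation only. The agent name should stay a single plain word, since it is pasted into the XML unescaped.

```shell
#!/usr/bin/env bash
# Sketch: write a minimal nutch-site.xml (hypothetical helper).
# Properties here override nutch-default.xml and must be inside <configuration>.

# write_nutch_site CONF_DIR AGENT_NAME
write_nutch_site() {
  local conf_dir=$1 agent=$2 target
  target="$conf_dir/nutch-site.xml"
  # Keep a copy of any existing file before overwriting it.
  if [ -e "$target" ]; then
    cp "$target" "$target.bak" || return 1
  fi
  # Unquoted EOF so ${agent} is expanded; the agent name is not XML-escaped.
  cat > "$target" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>http.agent.name</name>
    <value>${agent}</value>
  </property>
</configuration>
EOF
}

# Usage: write_nutch_site apache-nutch-1.10/conf AlwaysLazy
```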

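Then, in the apache-nutch-1.10 directory, run ant to compile Nutch. The first build takes a long time; if it hangs partway, press Ctrl+C, run ant clean, and then run ant again. (I killed it once; the second build took 22 minutes.)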
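The clean-and-retry routine can be wrapped as follows. This is only a sketch: build_nutch is a hypothetical name, it retries only when ant exits with an error (a hang still needs Ctrl+C), and it takes the presence of runtime/local, the directory used in step 5, as the sign of a successful build.

```shell
#!/usr/bin/env bash
# Sketch: build Nutch with Ant; on failure run "ant clean" and retry once.

# build_nutch NUTCH_SRC_DIR   (hypothetical helper)
build_nutch() {
  local src=$1
  (
    cd "$src" || exit 1
    ant || { echo "build failed; running ant clean and retrying" >&2; ant clean && ant; }
  ) || return 1
  # A successful build leaves the local runtime under runtime/local.
  [ -d "$src/runtime/local" ]
}

# Usage: build_nutch apache-nutch-1.10
```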

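5. Download Solr 4.10.4 (solr-4.10.4.tgz) from http://archive.apache.org/dist/lucene/solr/4.10.4/. This is the prebuilt binary, so no compilation is needed. Version 4.10.4 was chosen for the same reason as above.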

Extract it, then copy Nutch's Solr schema into the Solr core's conf directory:

cp apache-nutch-1.10/runtime/local/conf/schema-solr4.xml solr-4.10.4/example/solr/collection1/conf/

mv solr-4.10.4/example/solr/collection1/conf/schema-solr4.xml solr-4.10.4/example/solr/collection1/conf/schema.xml

(The mv overwrites the original schema.xml, so back it up before running it.)
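Doing the backup and the copy in one function makes the backup hard to forget. install_nutch_schema is a hypothetical helper: copying straight to schema.xml is equivalent to the cp plus mv above, and the original is kept as schema.xml.orig (an arbitrary name).

```shell
#!/usr/bin/env bash
# Sketch: back up Solr's schema.xml, then replace it with Nutch's schema-solr4.xml.

# install_nutch_schema NUTCH_DIR SOLR_DIR   (hypothetical helper)
install_nutch_schema() {
  local src="$1/runtime/local/conf/schema-solr4.xml"
  local conf="$2/example/solr/collection1/conf"
  [ -f "$src" ] || { echo "missing $src (build Nutch first)" >&2; return 1; }
  # Back up only once, so rerunning never overwrites the original backup.
  if [ -f "$conf/schema.xml" ] && [ ! -e "$conf/schema.xml.orig" ]; then
    cp "$conf/schema.xml" "$conf/schema.xml.orig" || return 1
  fi
  cp "$src" "$conf/schema.xml"
}

# Usage: install_nutch_schema apache-nutch-1.10 solr-4.10.4
```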

If starting Solr fails with the following error:

Could not load conf for core collection1: copyField dest :'location' is not an explicit field and doesn't match a ...

open solr-4.10.4/example/solr/collection1/conf/schema.xml and add this line alongside the other <field> definitions:

<field name="location" type="location" stored="true" indexed="true"/>
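The same edit can be applied from the shell. This sketch assumes the schema keeps its field definitions inside a <fields> ... </fields> block and inserts the line just before the closing tag; add_location_field is a hypothetical helper that leaves the file alone if the field is already defined.

```shell
#!/usr/bin/env bash
# Sketch: add the "location" field to schema.xml if it is missing.
# Assumption: fields are declared inside <fields> ... </fields>.

# add_location_field SCHEMA_XML   (hypothetical helper)
add_location_field() {
  local schema=$1 out
  if grep -q '<field name="location"' "$schema"; then
    echo "location field already present" >&2
    return 0
  fi
  grep -q '</fields>' "$schema" || { echo "no </fields> in $schema" >&2; return 1; }
  out=$(mktemp) || return 1
  # Print the new field right before the first closing </fields> tag.
  awk '
    /<\/fields>/ && !done {
      print "  <field name=\"location\" type=\"location\" stored=\"true\" indexed=\"true\"/>"
      done = 1
    }
    { print }
  ' "$schema" > "$out" || { rm -f "$out"; return 1; }
  # Write back through cat so the original file keeps its permissions.
  cat "$out" > "$schema" && rm -f "$out"
}

# Usage: add_location_field solr-4.10.4/example/solr/collection1/conf/schema.xml
```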

 

Then cd into solr-4.10.4/example and run: java -jar start.jar

Finally, the server is up: http://localhost:8983/solr/
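Because start.jar runs in the foreground and takes a few seconds to load the core, a second terminal can poll the URL above until it responds. wait_for_url is a hypothetical helper assuming curl is installed; the default attempt count and delay are arbitrary.

```shell
#!/usr/bin/env bash
# Sketch: poll a URL until it answers, e.g. to confirm Solr has finished starting.

# wait_for_url URL [ATTEMPTS] [DELAY_SECONDS]   (hypothetical helper)
wait_for_url() {
  local url=$1 attempts=${2:-30} delay=${3:-2} i
  for ((i = 1; i <= attempts; i++)); do
    # -f makes HTTP errors (e.g. 404/503 while starting up) count as failures.
    if curl -fsS -o /dev/null "$url" 2>/dev/null; then
      echo "up: $url"
      return 0
    fi
    if (( i < attempts )); then sleep "$delay"; fi
  done
  echo "no response from $url after $attempts attempts" >&2
  return 1
}

# Usage: wait_for_url http://localhost:8983/solr/
```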

6. References:

https://www.w3cschool.cn/ozbtsl/ltpuqozt.html

https://www.jianshu.com/p/bdca5215e9ca

https://bbs.csdn.net/topics/391093177

 

 

 


Reposted from www.cnblogs.com/alwayslazy/p/11792142.html