Started on 2019/11/04, to ...
Day1:
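Open-source search engine projects: Lucene/Nutch (both created by Doug Cutting, who also co-created Hadoop. He graduated from Stanford in 1985.)
Books: 搜索引擎 (Li Xiaoming edition); Search Engines: Information Retrieval in Practice; Introduction to Information Retrieval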
Day2:
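Install nutch and solr - Mac
1. java -version # check whether a JDK is installed
2. Download ant 1.10.7-bin.tar.gz for java8 - https://ant.apache.org/bindownload.cgi - no compilation needed, just extract and use - used to build the nutch source code
3. Set the JDK and Ant environment variables: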
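# JDK
JAVA_HOME="/Library/Java/JavaVirtualMachines/jdk1.8.0_161.jdk/Contents/Home"
export JAVA_HOME
# Note: the JVM reads CLASSPATH, not CLASS_PATH, so this unexported variable has no effect on java or ant
CLASS_PATH="$JAVA_HOME/lib"
PATH="$PATH:$JAVA_HOME/bin"
# ANT
ANT_HOME="/Users/*/local/bin/apache-ant-1.10.7"
export ANT_HOME
PATH="$PATH:$ANT_HOME/bin"
export PATH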
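After reloading the shell profile, it can help to confirm that both tools are actually reachable. The following is a minimal sketch; check_tool is a hypothetical helper name, not part of the JDK or Ant, and finding a tool on PATH does not by itself prove that JAVA_HOME or ANT_HOME point to the intended installation.

```shell
# check_tool is a hypothetical helper: it reports whether a command is on PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found at $(command -v "$1")"
  else
    echo "$1: NOT found on PATH"
    return 1
  fi
}

# Check the two tools needed to build nutch.
for tool in java ant; do
  check_tool "$tool" || echo "fix PATH for $tool before continuing"
done
```

If both are found, running java -version and ant -version shows which versions were picked up.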
4. Download nutch 1.10-src.tar.gz - https://archive.apache.org/dist/nutch/1.10/ - why this version? Only to stay consistent with the version used in reference 1.
Extract - copy the property below from apache-nutch-1.10/conf/nutch-default.xml to nutch-site.xml (which is used to override the default config) - only the value of http.agent.name needs to be changed
<property>
<name>http.agent.name</name>
<value>AlwaysLazy</value>
<description>HTTP 'User-Agent' request header. MUST NOT be empty -
please set this to a single word uniquely related to your organization.
NOTE: You should also check other related properties:
http.robots.agents
http.agent.description
http.agent.url
http.agent.email
http.agent.version
and set their values appropriately.
</description>
</property>
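Then, in the apache-nutch-1.10 directory, run the ant command to build nutch - the first build takes a long time; if it hangs midway, press ctrl+c, run ant clean, and then run ant again. (I killed it once; the second run took 22 minutes.)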
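The clean-and-rebuild step and a post-build check can be sketched as two small shell functions. Both function names are hypothetical. The check assumes that a successful build produces runtime/local/bin/nutch under the source directory, which matches the runtime/local path used in the later steps of these notes.

```shell
# rebuild_nutch is a hypothetical helper: clean, then build again from scratch.
# $1 is the nutch source directory, e.g. apache-nutch-1.10.
rebuild_nutch() {
  (cd "$1" && ant clean && ant)
}

# nutch_built is a hypothetical helper: it succeeds only if the build
# produced an executable launcher script under runtime/local/bin.
nutch_built() {
  [ -x "$1/runtime/local/bin/nutch" ]
}
```

For example: rebuild_nutch apache-nutch-1.10 && nutch_built apache-nutch-1.10 && echo "build OK"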
5. Download solr 4.10.4.tgz - http://archive.apache.org/dist/lucene/solr/4.10.4/ - download the prebuilt binary directly - 4.10.4 for the same reason as above
Extract - copy the config file
cp apache-nutch-1.10/runtime/local/conf/schema-solr4.xml solr-4.10.4/example/solr/collection1/conf/
mv solr-4.10.4/example/solr/collection1/conf/schema-solr4.xml solr-4.10.4/example/solr/collection1/conf/schema.xml (this overwrites the existing schema.xml, so back it up first if you want to keep it)
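The copy and rename can also be done in one step that keeps a backup of the old schema. This is a minimal sketch; install_schema and the schema.xml.bak file name are assumptions made for illustration, not something nutch or solr provides.

```shell
# install_schema is a hypothetical helper.
# $1: path to the nutch schema file (e.g. .../conf/schema-solr4.xml)
# $2: the solr core conf directory (e.g. .../collection1/conf)
install_schema() {
  # Keep a copy of the existing schema, if there is one.
  if [ -f "$2/schema.xml" ]; then
    cp "$2/schema.xml" "$2/schema.xml.bak" || return 1
  fi
  # Install the nutch schema under the name solr expects.
  cp "$1" "$2/schema.xml"
}
```

For example: install_schema apache-nutch-1.10/runtime/local/conf/schema-solr4.xml solr-4.10.4/example/solr/collection1/conf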
If you hit the following error when running solr:
Could not load conf for core collection1: copyField dest :'location' is not an explicit field and doesn't match a ...
Open the file solr-4.10.4/example/solr/collection1/conf/schema.xml and add:
<field name="location" type="location" stored="true" indexed="true"/>
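Before editing, a quick grep can tell whether the field is already declared, which avoids adding it twice. This is a rough sketch: has_location_field is a hypothetical helper, and the plain-text match assumes the declaration starts exactly with <field name="location", so a differently formatted declaration would be missed.

```shell
# has_location_field is a hypothetical helper: it succeeds if the given
# schema file already contains a <field name="location" ...> declaration.
has_location_field() {
  grep -q '<field name="location"' "$1"
}
```

For example: has_location_field solr-4.10.4/example/solr/collection1/conf/schema.xml || echo "location field is missing"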
cd solr-4.10.4/example, then run: java -jar start.jar
Finally, the server is up: http://localhost:8983/solr/
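Whether the server is really answering can also be checked from a second terminal. This is a minimal sketch that assumes curl is installed; solr_up is a hypothetical helper name, and the default URL is the one given above.

```shell
# solr_up is a hypothetical helper: it succeeds if the URL answers with a
# successful HTTP status within 5 seconds.
# $1: URL to probe (defaults to the local solr admin page).
solr_up() {
  curl -s -f --max-time 5 -o /dev/null "${1:-http://localhost:8983/solr/}"
}
```

For example: solr_up && echo "solr is up" || echo "solr is not responding"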
6. References:
https://www.w3cschool.cn/ozbtsl/ltpuqozt.html
https://www.jianshu.com/p/bdca5215e9ca
https://bbs.csdn.net/topics/391093177