Fetcher: No agents listed in 'http.agent.name' property.
Exception in thread "main" java.lang.IllegalArgumentException: Fetcher: No agents listed in 'http.agent.name' property.
    at org.apache.nutch.fetcher.Fetcher.checkConfiguration(Fetcher.java:1161)
    at org.apache.nutch.fetcher.Fetcher.fetch(Fetcher.java:1067)
    at org.apache.nutch.crawl.Crawl.main(Crawl.java:133)
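After some searching, I found that the cause is a property setting in nutch-default.xml: http.agent.name has an empty value.
The setting before the exception was thrown: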
<property>
  <name>http.agent.name</name>
  <value></value>
  <description>HTTP 'User-Agent' request header. MUST NOT be empty -
  please set this to a single word uniquely related to your organization.

  NOTE: You should also check other related properties:

    http.robots.agents
    http.agent.description
    http.agent.url
    http.agent.email
    http.agent.version

  and set their values appropriately.
  </description>
</property>
After giving the property a non-empty value as shown below, the exception no longer appears:
<property>
  <name>http.agent.name</name>
  <value>HD nutch agent</value>
  <description>HTTP 'User-Agent' request header. MUST NOT be empty -
  please set this to a single word uniquely related to your organization.

  NOTE: You should also check other related properties:

    http.robots.agents
    http.agent.description
    http.agent.url
    http.agent.email
    http.agent.version

  and set their values appropriately.
  </description>
</property>
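
To make the failure condition concrete, here is a minimal Java sketch of the kind of check that the stack trace points to in Fetcher.checkConfiguration. It is not the Nutch source: the class AgentNameCheck and its methods are hypothetical, and it assumes the check simply rejects a missing or blank http.agent.name. It parses a configuration string with the JDK's built-in XML parser and reports whether the property has a usable value.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only; it is not part of Nutch.
class AgentNameCheck {

    // Returns the trimmed <value> of the named property, or null if the
    // property does not appear in the given <configuration> XML.
    static String findProperty(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                NodeList names = prop.getElementsByTagName("name");
                NodeList values = prop.getElementsByTagName("value");
                if (names.getLength() == 0 || values.getLength() == 0) {
                    continue; // skip malformed <property> entries
                }
                if (name.equals(names.item(0).getTextContent().trim())) {
                    return values.item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    // Assumed rule: the agent name is usable only if it is present and non-blank.
    static boolean isAgentNameSet(String xml) {
        String value = findProperty(xml, "http.agent.name");
        return value != null && !value.isEmpty();
    }

    public static void main(String[] args) {
        String before = "<configuration><property><name>http.agent.name</name>"
                + "<value></value></property></configuration>";
        String after = "<configuration><property><name>http.agent.name</name>"
                + "<value>HD nutch agent</value></property></configuration>";
        System.out.println(isAgentNameSet(before)); // prints "false"
        System.out.println(isAgentNameSet(after));  // prints "true"
    }
}
```

Two notes on the fix itself. The property description asks for a single word, so a value without spaces is probably a safer choice than "HD nutch agent". Nutch also reads conf/nutch-site.xml, which overrides nutch-default.xml, so the same property block can be placed there instead of editing the defaults file.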