1. CrawlMetadata: including identification of crawler/operator
org.archive.modules.CrawlMetadata: Basic crawl metadata, as consulted by functional modules and recorded in ARCs/WARCs.
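As a rough illustration, a metadata bean in crawler-beans.cxml might look like the sketch below. The bean id `metadata` and the property names follow the stock Heritrix 3 profile as best recalled; the values (job name, description, contact URL) are placeholders assumed for this example, not taken from the original post.

```xml
<!-- Sketch only: values are placeholders, adjust for your own crawl -->
<bean id="metadata" class="org.archive.modules.CrawlMetadata" autowire="byName">
  <!-- URL where webmasters affected by the crawl can reach the operator (hypothetical value) -->
  <property name="operatorContactUrl" value="http://example.org/crawl-contact"/>
  <!-- Short name and description recorded in the ARC/WARC metadata (hypothetical values) -->
  <property name="jobName" value="basic"/>
  <property name="description" value="Basic crawl starting with useful defaults"/>
</bean>
```

The operator contact URL is the part that identifies the crawler operator to remote sites, so it is usually the first value to change in a new job.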
org.archive.modules.seeds.TextSeedModule
org.archive.modules.deciderules.DecideRuleSequence
org.archive.modules.CandidateChain
org.archive.modules.FetchChain
org.archive.modules.DispositionChain
org.archive.crawler.framework.CrawlController
org.archive.crawler.frontier.BdbFrontier
org.archive.crawler.util.BdbUriUniqFilter
forceRetire
smallBudget
veryPolite
highPrecedence
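The four names above appear to be the example settings sheets of the default profile. A minimal sketch of one of them, assuming the `org.archive.spring.Sheet` class and the `disposition.*` override keys of the stock configuration, could look like this; the numeric values are illustrative rather than authoritative.

```xml
<!-- Sketch only: a sheet of overrides that slows crawling for the hosts it is associated with -->
<bean id="veryPolite" class="org.archive.spring.Sheet">
  <property name="map">
    <map>
      <!-- Wait a multiple of the last fetch duration before the next request (illustrative value) -->
      <entry key="disposition.delayFactor" value="10"/>
      <!-- Lower and upper bounds on the politeness delay, in milliseconds (illustrative values) -->
      <entry key="disposition.minDelayMs" value="10000"/>
      <entry key="disposition.maxDelayMs" value="1000000"/>
    </map>
  </property>
</bean>
```

A sheet takes effect only once it is associated with specific URIs or hosts, which is the job of the sheetOverlaysManager bean listed further down.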
<!-- OPTIONAL BUT RECOMMENDED BEANS -->
actionDirectory
crawlLimiter
checkpointService
statisticsTracker
loggerModule
sheetOverlaysManager
cookieStorage
serverCache
configPathConfigurer
crawler-beans.cxml
Reposted from sharehua.iteye.com/blog/1745818