| Class Summary |
| AcceptRevisitProcessor |
Set a URI to be revisited by the ARFrontier. |
| ContentBasedWaitEvaluator |
A WaitEvaluator that compares the CrawlURI's content type to a configurable
regular expression. |
| CrawlStateUpdater |
A step, late in the processing of a CrawlURI, for updating the per-host
information that may have been affected by the fetch. |
| FrontierScheduler |
Schedules with the Frontier the CandidateURIs carried by the passed
CrawlURI. |
| ImageWaitEvaluator |
A specialized ContentBasedWaitEvaluator. |
| LinksScoper |
Determine which extracted links are within scope. |
| LowDiskPauseProcessor |
Processor module that uses 'df -k' (where it is available and produces
the expected output format, as on Linux) to monitor available
disk space, pausing the crawl if free space on a monitored
filesystem falls below a configured threshold. |
| RejectRevisitProcessor |
Set a URI to not be revisited by the ARFrontier. |
| SupplementaryLinksScoper |
Run CandidateURI links carried in the passed CrawlURI through a filter
and 'handle' rejections. |
| TextWaitEvaluator |
A specialized ContentBasedWaitEvaluator. |
| WaitEvaluator |
A processor that determines when a URI should be revisited next. |
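
The disk-space check described for LowDiskPauseProcessor can be illustrated with a minimal sketch. This is not the Heritrix implementation: the class name DfOutputSketch, the methods parseAvailableKb and shouldPause, and the sample 'df -k' output are all hypothetical, and the column positions assume the usual Linux layout (Filesystem, 1K-blocks, Used, Available, Use%, Mounted on). The sketch parses captured output rather than running 'df' itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the actual LowDiskPauseProcessor code.
class DfOutputSketch {

    /** Parses Linux-style 'df -k' output into a map of mount point to available KiB. */
    static Map<String, Long> parseAvailableKb(String dfOutput) {
        Map<String, Long> available = new LinkedHashMap<>();
        String[] lines = dfOutput.split("\\R");
        // Line 0 is the header row, so start at 1.
        for (int i = 1; i < lines.length; i++) {
            String[] cols = lines[i].trim().split("\\s+");
            if (cols.length < 6) {
                continue; // not the expected six-column format; ignore the line
            }
            try {
                // Assumed layout: column 3 is "Available", column 5 is the mount point.
                available.put(cols[5], Long.parseLong(cols[3]));
            } catch (NumberFormatException e) {
                // Unparseable size; skip rather than guess.
            }
        }
        return available;
    }

    /** Returns true if the given mount is known and its free space is below the threshold. */
    static boolean shouldPause(Map<String, Long> availableKb, String mount, long thresholdKb) {
        Long free = availableKb.get(mount);
        return free != null && free < thresholdKb;
    }

    public static void main(String[] args) {
        // Made-up sample output for demonstration.
        String sample = String.join("\n",
            "Filesystem     1K-blocks      Used Available Use% Mounted on",
            "/dev/sda1       41152736  30211048   9120552  77% /",
            "/dev/sdb1      103081248 102500000    400000 100% /data");
        Map<String, Long> free = parseAvailableKb(sample);
        System.out.println(shouldPause(free, "/data", 500_000L)); // prints "true"
        System.out.println(shouldPause(free, "/", 500_000L));     // prints "false"
    }
}
```

Mount points containing spaces, or long device names that cause 'df' to wrap a row across two lines, would not be handled by this simple whitespace split. That fragility is one reason the summary above limits the processor to systems with the expected output format.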