Class Summary

| Class | Description |
|---|---|
| CandidateURI | A URI, discovered or passed-in, that may be scheduled. |
| Checkpoint | Record of a specific checkpoint on disk. |
| CrawlHost | Represents a single remote "host". |
| CrawlOrder | Represents the 'root' of the settings hierarchy. |
| CrawlServer | Represents a single remote "server". |
| CrawlSubstats | Collector of statistics for a 'subset' of a crawl, such as a server (host:port), host, or frontier group (e.g. a queue). |
| CrawlURI | Represents a candidate URI and the associated state it collects as it is crawled. |
| CredentialStore | Front door to the credential store. |
| LocalizedError | |
| RobotsDirectives | Represents the directives that apply to a user-agent (or set of user-agents). |
| RobotsExclusionPolicy | Represents the actual policy adopted with respect to a specific remote server, usually constructed by consulting the robots.txt, if any, that the server provides. |
| RobotsHonoringPolicy | Represents the strategy used by the crawler to determine how robots.txt files will be honored. |
| Robotstxt | Utility class for parsing and representing 'robots.txt'-format directives as a list of named user-agents and a map from user-agent to RobotsDirectives. |
| ServerCache | Server and Host cache. |
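The Robotstxt entry above describes the core data structure involved: robots.txt text is parsed into groups of named user-agents, each mapped to the allow/disallow directives of its group. A minimal self-contained sketch of that idea follows; the `RobotsSketch`, `Directives`, `parse`, and `allowed` names are hypothetical illustrations, not the actual Robotstxt/RobotsDirectives API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch of robots.txt parsing (hypothetical, not Heritrix's API). */
public class RobotsSketch {

    /** Directives collected for one user-agent name. */
    public static class Directives {
        public final List<String> disallows = new ArrayList<>();
        public final List<String> allows = new ArrayList<>();

        /**
         * Simple prefix check against the disallow rules.
         * (Real matchers also apply allow-beats-disallow and
         * longest-match rules, omitted here for brevity.)
         */
        public boolean allowed(String path) {
            for (String prefix : disallows) {
                if (!prefix.isEmpty() && path.startsWith(prefix)) {
                    return false;
                }
            }
            return true;
        }
    }

    /** Parse robots.txt text into a map from user-agent name to its directives. */
    public static Map<String, Directives> parse(String text) {
        Map<String, Directives> map = new HashMap<>();
        // User-agents of the group currently being filled in; consecutive
        // User-agent lines share the directive lines that follow them.
        List<Directives> currentGroup = new ArrayList<>();
        boolean lastLineWasAgent = false;
        for (String line : text.split("\\r?\\n")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // strip comments
            }
            int colon = line.indexOf(':');
            if (colon < 0) {
                continue; // not a "Field: value" line
            }
            String field = line.substring(0, colon).trim().toLowerCase();
            String value = line.substring(colon + 1).trim();
            if (field.equals("user-agent")) {
                if (!lastLineWasAgent) {
                    currentGroup.clear(); // a directive line ended the prior group
                }
                currentGroup.add(
                        map.computeIfAbsent(value.toLowerCase(), k -> new Directives()));
                lastLineWasAgent = true;
            } else {
                lastLineWasAgent = false;
                for (Directives d : currentGroup) {
                    if (field.equals("disallow")) {
                        d.disallows.add(value);
                    } else if (field.equals("allow")) {
                        d.allows.add(value);
                    }
                }
            }
        }
        return map;
    }
}
```

A caller would look up the directives for its own user-agent name (falling back to `"*"`) and consult them before fetching each path.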