| Interface Summary | |
|---|---|
| AdaptiveRevisitAttributeConstants | Defines static constants for the Adaptive Revisiting module: the data keys it uses in the CrawlURI AList. |
| FrontierJournal | Record of key Frontier happenings. |

| Class Summary | |
|---|---|
| AbstractFrontier | Shared facilities for Frontier implementations. |
| AdaptiveRevisitFrontier | A Frontier that will repeatedly visit all encountered URIs. |
| AdaptiveRevisitHostQueue | A priority-based queue of CrawlURIs. |
| AdaptiveRevisitQueueList | Maintains an ordered list of AdaptiveRevisitHostQueues used by a Frontier. |
| AntiCalendarCostAssignmentPolicy | CostAssignmentPolicy that further penalizes URIs with calendar-suggestive strings in them, with an extra unit of cost. |
| BdbFrontier | A Frontier using several BerkeleyDB JE Databases to hold its record of known hosts (queues), and pending URIs. |
| BdbMultipleWorkQueues | A BerkeleyDB-database-backed structure for holding ordered groupings of CrawlURIs. |
| BdbWorkQueue | One independent queue of items with the same 'classKey' (e.g. host). |
| BucketQueueAssignmentPolicy | Uses the target IPs as basis for queue-assignment, distributing them over a fixed number of sub-queues. |
| CostAssignmentPolicy | Calculates an integer 'cost' value for the given CrawlURI. |
| DomainSensitiveFrontier | Deprecated. As of release 1.10.0. |
| HostnameQueueAssignmentPolicy | QueueAssignmentPolicy based on the hostname:port evident in the given CrawlURI. |
| IPQueueAssignmentPolicy | Uses target IP as basis for queue-assignment, unless it is unavailable, in which case it behaves as HostnameQueueAssignmentPolicy. |
| QueueAssignmentPolicy | Establishes a mapping from CrawlURIs to String keys (queue names). |
| RecoveryJournal | Helper class for managing a simple Frontier change-events journal which is useful for recovering from crawl problems. |
| RecyclingSerialBinding | A SerialBinding that recycles a single FastOutputStream per thread, avoiding reallocation of the internal buffer for either repeated serializations or because of mid-serialization expansions. |
| SurtAuthorityQueueAssignmentPolicy | QueueAssignmentPolicy based on the SURT form of the hostname. |
| TopmostAssignedSurtQueueAssignmentPolicy | Create a queueKey based on the SURT authority, reduced to the public-suffix-plus-one domain (topmost assignable domain). |
| UnitCostAssignmentPolicy | A CostAssignmentPolicy that uses a constant value of 1 for all CrawlURIs. |
| WagCostAssignmentPolicy | A CostAssignmentPolicy based on some wild guesses of kinds of URIs that should be deferred into the (potentially never-crawled) future. |
| WorkQueue | A single queue of related URIs to visit, grouped by a classKey (typically "hostname:port" or similar). |
| WorkQueueFrontier | A common Frontier base using several queues to hold pending URIs. |
| ZeroCostAssignmentPolicy | CostAssignmentPolicy considering all URIs costless -- essentially disabling budgeting features. |
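
The two policy families summarized above (queue assignment and cost assignment) can be illustrated with a minimal, self-contained sketch. It does not use this package's classes: `QueueKeyPolicy`, `CostPolicy`, and the constants below are hypothetical stand-ins that operate on plain `java.net.URI` values rather than CrawlURIs, and the rule of matching the literal substring "calendar" is an assumption made for illustration only.

```java
import java.net.URI;

public class FrontierPolicySketch {

    /** Hypothetical stand-in for a queue-assignment policy: URI -> queue name. */
    interface QueueKeyPolicy {
        String classKeyFor(URI uri);
    }

    /** Hypothetical stand-in for a cost-assignment policy: URI -> integer cost. */
    interface CostPolicy {
        int costOf(URI uri);
    }

    /** Keys a URI by its host, appending ":port" only when a port is explicit. */
    static final QueueKeyPolicy HOSTNAME_PORT = uri -> {
        int port = uri.getPort(); // -1 when the URI carries no explicit port
        return port == -1 ? uri.getHost() : uri.getHost() + ":" + port;
    };

    /** Every URI costs exactly one unit. */
    static final CostPolicy UNIT = uri -> 1;

    /** Every URI is free, so a cost budget is never consumed. */
    static final CostPolicy ZERO = uri -> 0;

    /** Unit cost plus one extra unit for URIs containing "calendar" (assumed rule). */
    static final CostPolicy ANTI_CALENDAR = uri ->
            UNIT.costOf(uri) + (uri.toString().toLowerCase().contains("calendar") ? 1 : 0);

    public static void main(String[] args) {
        URI plain = URI.create("http://example.org/index.html");
        URI withPort = URI.create("http://example.org:8080/calendar?month=5");

        // URIs with the same key would land in the same work queue.
        System.out.println(HOSTNAME_PORT.classKeyFor(plain));
        System.out.println(HOSTNAME_PORT.classKeyFor(withPort));

        // The calendar-like URI is penalized relative to the plain one.
        System.out.println(ANTI_CALENDAR.costOf(plain));
        System.out.println(ANTI_CALENDAR.costOf(withPort));
    }
}
```

Keeping the key function and the cost function as separate small interfaces mirrors the split in the class summary: the key decides which queue a URI joins, while the cost decides how much of a budget that URI consumes.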