java.lang.Object
  javax.management.Attribute
    org.archive.crawler.settings.Type
      org.archive.crawler.settings.ComplexType
        org.archive.crawler.settings.ModuleType
          org.archive.crawler.frontier.AbstractFrontier
            org.archive.crawler.frontier.WorkQueueFrontier
              org.archive.crawler.frontier.BdbFrontier
                org.archive.crawler.frontier.DomainSensitiveFrontier
Deprecated. Replaced by BdbFrontier and QuotaEnforcer.
public class DomainSensitiveFrontier
Behaves like BdbFrontier (i.e., a basic, mostly breadth-first
frontier), but additionally lets you set the number of documents
to download on a per-site basis.
This is useful when a frequently changing site is revisited often.

Choose the number of documents you want to download and specify
the count in max-docs. If count-per-host is
true (the default), the crawler downloads up to max-docs documents
per host. If you create an override, the overridden max-docs
count applies instead, whether it is higher or lower.

If count-per-host is false, max-docs
acts like the crawl order's max-docs setting and the crawler
downloads only this total number of documents. An override then
limits downloads to max-docs in total for the overridden domain.
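
The counting rules described above can be sketched roughly as follows. This is a minimal, hypothetical illustration and not Heritrix code: the class `MaxDocsSketch` and its methods `override` and `tryDownload` are invented for this example, a single string key stands in for both "host" and "domain", and the sketch assumes that an overridden domain keeps its own counter, separate from the global total.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the max-docs / count-per-host rules.
 * None of these names come from Heritrix.
 */
class MaxDocsSketch {
    private final long defaultMaxDocs;     // stands in for max-docs
    private final boolean countPerHost;    // stands in for count-per-host
    private final Map<String, Long> overrides = new HashMap<>();
    private final Map<String, Long> perKeyCounts = new HashMap<>();
    private long totalCount = 0;

    MaxDocsSketch(long defaultMaxDocs, boolean countPerHost) {
        this.defaultMaxDocs = defaultMaxDocs;
        this.countPerHost = countPerHost;
    }

    /** Registers an overridden max-docs value for one host or domain. */
    void override(String key, long maxDocs) {
        overrides.put(key, maxDocs);
    }

    /** Counts one download for the key if its quota allows it. */
    boolean tryDownload(String key) {
        Long override = overrides.get(key);
        if (countPerHost || override != null) {
            // Per-key quota: the override wins, whether higher or lower.
            long limit = (override != null) ? override : defaultMaxDocs;
            long seen = perKeyCounts.getOrDefault(key, 0L);
            if (seen >= limit) {
                return false;
            }
            perKeyCounts.put(key, seen + 1);
            return true;
        }
        // count-per-host is false and no override: one crawl-wide total.
        if (totalCount >= defaultMaxDocs) {
            return false;
        }
        totalCount++;
        return true;
    }

    public static void main(String[] args) {
        MaxDocsSketch perHost = new MaxDocsSketch(2, true);
        perHost.override("b.example.org", 1);
        System.out.println(perHost.tryDownload("a.example.org")); // true
        System.out.println(perHost.tryDownload("a.example.org")); // true
        System.out.println(perHost.tryDownload("a.example.org")); // false
        System.out.println(perHost.tryDownload("b.example.org")); // true
        System.out.println(perHost.tryDownload("b.example.org")); // false
    }
}
```

With `countPerHost` set to `false` and no override, the same sketch stops after `defaultMaxDocs` downloads in total, regardless of which hosts they came from.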
| Nested Class Summary |
|---|

| Nested classes/interfaces inherited from class org.archive.crawler.frontier.WorkQueueFrontier |
|---|
| WorkQueueFrontier.WakeTask |

| Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType |
|---|
| ComplexType.MBeanAttributeInfoIterator |

| Nested classes/interfaces inherited from interface org.archive.crawler.framework.Frontier |
|---|
| Frontier.FrontierGroup |
| Field Summary | |
|---|---|
| static java.lang.String[] | ATTR_AVAILABLE_MODES Deprecated. |
| static java.lang.String | ATTR_COUNTER_MODE Deprecated. |
| static java.lang.String | ATTR_MAX_DOCS Deprecated. |
| static java.lang.String | COUNT_DOMAIN Deprecated. |
| static java.lang.String | COUNT_HOST Deprecated. |
| static java.lang.String | COUNT_OVERRIDE Deprecated. |
| static java.lang.String | DEFAULT_MODE Deprecated. |

| Fields inherited from class org.archive.crawler.frontier.BdbFrontier |
|---|
| ATTR_DUMP_PENDING_AT_CLOSE, ATTR_INCLUDED, pendingUris |

| Fields inherited from class org.archive.crawler.settings.ComplexType |
|---|
| definition, definitionMap |

| Fields inherited from interface org.archive.crawler.framework.Frontier |
|---|
| ATTR_NAME |

| Constructor Summary |
|---|
| DomainSensitiveFrontier(java.lang.String name) Deprecated. |

| Method Summary | |
|---|---|
| void | crawledURIDisregard(CrawlURI curi) Deprecated. Notification of a crawled URI that is to be disregarded. |
| void | crawledURIFailure(CrawlURI curi) Deprecated. Notification of a failed crawling of a URI. |
| void | crawledURINeedRetry(CrawlURI curi) Deprecated. Notification of a failed crawl of a URI that will be retried (failure due to possible transient problems). |
| void | crawledURISuccessful(CrawlURI curi) Deprecated. Notification of a successfully crawled URI. |
| protected void | incrementHostCounters(CrawlURI curi) Deprecated. |
| void | initialize(CrawlController c) Deprecated. Initializes the Frontier, given the supplied CrawlController. |

| Methods inherited from class org.archive.crawler.frontier.BdbFrontier |
|---|
| closeQueue, crawlCheckpoint, crawlEnded, createAlreadyIncluded, deserializeAlreadySeen, dumpAllPendingToLog, getInitialMarker, getQueueFor, getQueueFor, getURIsList, getWorkQueues, initQueue, initQueuesOfQueues, reinit, workQueueDataOnDisk |

| Methods inherited from class org.archive.crawler.frontier.WorkQueueFrontier |
|---|
| appendQueueReports, asCrawlUri, averageDepth, congestionRatio, considerIncluded, deepestUri, deleted, deleteURIs, deleteURIs, discoveredUriCount, finished, forget, getGroup, getReports, isEmpty, kickUpdate, next, receive, reportTo, schedule, sendToQueue, singleLineLegend, singleLineReportTo, wakeQueues |

| Methods inherited from class org.archive.crawler.settings.ModuleType |
|---|
| addElement, listUsedFiles |

| Methods inherited from class org.archive.crawler.settings.Type |
|---|
| addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient |

| Methods inherited from class javax.management.Attribute |
|---|
| getName |

| Methods inherited from class java.lang.Object |
|---|
| clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |

| Field Detail |
|---|
public static final java.lang.String ATTR_MAX_DOCS
public static final java.lang.String ATTR_COUNTER_MODE
public static final java.lang.String COUNT_OVERRIDE
public static final java.lang.String COUNT_HOST
public static final java.lang.String COUNT_DOMAIN
public static final java.lang.String[] ATTR_AVAILABLE_MODES
public static final java.lang.String DEFAULT_MODE
| Constructor Detail |
|---|
public DomainSensitiveFrontier(java.lang.String name)
| Method Detail |
|---|
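public void initialize(CrawlController c)
throws FatalConfigurationException,
java.io.IOException
Deprecated. Description copied from class: WorkQueueFrontier
Initializes the Frontier, given the supplied CrawlController.
Specified by: initialize in interface Frontier
Overrides: initialize in class BdbFrontier
Parameters: c - The CrawlController that created the Frontier.
Throws:
FatalConfigurationException - If provided settings are illegal or otherwise unusable.
java.io.IOException - If there is a problem reading settings or seeds file from disk.
See Also: Frontier.initialize(org.archive.crawler.framework.CrawlController)

protected void incrementHostCounters(CrawlURI curi)
Deprecated.

public void crawledURISuccessful(CrawlURI curi)
Deprecated. Description copied from interface: CrawlURIDispositionListener
Notification of a successfully crawled URI.
Specified by: crawledURISuccessful in interface CrawlURIDispositionListener
Parameters: curi - The relevant CrawlURI

public void crawledURINeedRetry(CrawlURI curi)
Deprecated. Description copied from interface: CrawlURIDispositionListener
Notification of a failed crawl of a URI that will be retried (failure due to possible transient problems).
Specified by: crawledURINeedRetry in interface CrawlURIDispositionListener
Parameters: curi - The relevant CrawlURI

public void crawledURIDisregard(CrawlURI curi)
Deprecated. Description copied from interface: CrawlURIDispositionListener
Notification of a crawled URI that is to be disregarded.
Specified by: crawledURIDisregard in interface CrawlURIDispositionListener
Parameters: curi - The relevant CrawlURI

public void crawledURIFailure(CrawlURI curi)
Deprecated. Description copied from interface: CrawlURIDispositionListener
Notification of a failed crawling of a URI.
Specified by: crawledURIFailure in interface CrawlURIDispositionListener
Parameters: curi - The relevant CrawlURI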