java.lang.Object
  javax.management.Attribute
    org.archive.crawler.settings.Type
      org.archive.crawler.settings.ComplexType
        org.archive.crawler.settings.ModuleType
          org.archive.crawler.framework.AbstractTracker
public abstract class AbstractTracker
A partial implementation of the StatisticsTracking interface.
It covers thread handling (launching, pausing, etc.), including keeping track of the total time actually spent crawling. Several methods are provided to access the time the crawl started, finished, and so on.
To handle the thread work, the class implements CrawlStatusListener and uses its events to pause, resume and stop logging of statistics. The run() method calls logActivity() at intervals specified in the crawl order.
Implementation of logActivity (the actual logging) as well as listening for CrawlURIDisposition events is not addressed.
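The thread handling described above can be pictured with a small, self-contained sketch. This is not Heritrix code: the class name IntervalLoggerSketch, its constructor, and the Runnable callback standing in for logActivity() are all hypothetical, and the interval is passed in directly rather than read from the crawl order.

```java
/** Hypothetical sketch of an interval-driven logging loop; not the Heritrix implementation. */
public class IntervalLoggerSketch implements Runnable {
    private final long intervalMillis;   // assumed to come from configuration
    private final Runnable logActivity;  // stand-in for the actual logging
    private volatile boolean shouldRun = true;
    private volatile boolean paused = false;

    public IntervalLoggerSketch(long intervalMillis, Runnable logActivity) {
        this.intervalMillis = intervalMillis;
        this.logActivity = logActivity;
    }

    // Status events toggle the flags that the loop checks.
    public void crawlPaused()   { paused = true; }
    public void crawlResuming() { paused = false; }
    public void crawlEnded()    { shouldRun = false; }

    @Override
    public void run() {
        while (shouldRun) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and stop the loop.
                Thread.currentThread().interrupt();
                return;
            }
            // Skip logging while paused or once the crawl has ended.
            if (shouldRun && !paused) {
                logActivity.run();
            }
        }
    }
}
```

In this sketch the status events only flip volatile flags, so the logging thread notices a pause or stop at its next wake-up instead of being interrupted.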
See Also: StatisticsTracking, StatisticsTracker, Serialized Form

| Nested Class Summary |
|---|

| Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType |
|---|
| ComplexType.MBeanAttributeInfoIterator |
| Field Summary | |
|---|---|
| static java.lang.String | ATTR_STATS_INTERVAL: Attribute name for the logging interval (in seconds) setting |
| protected CrawlController | controller: A reference to the CrawlController of the crawl that we are to track statistics for. |
| protected long | crawlerEndTime |
| protected long | crawlerPauseStarted |
| protected long | crawlerStartTime |
| protected long | crawlerTotalPausedTime |
| static java.lang.Integer | DEFAULT_STATISTICS_REPORT_INTERVAL: Default period between logging stat values |
| protected long | lastLogPointTime: Timestamp of when this logger last wrote something to the log |
| protected boolean | shouldrun |

| Fields inherited from class org.archive.crawler.settings.ComplexType |
|---|
| definition, definitionMap |

| Fields inherited from interface org.archive.crawler.framework.StatisticsTracking |
|---|
| SEED_DISPOSITION_DISREGARD, SEED_DISPOSITION_FAILURE, SEED_DISPOSITION_NOT_PROCESSED, SEED_DISPOSITION_RETRY, SEED_DISPOSITION_SUCCESS |

| Constructor Summary | |
|---|---|
| AbstractTracker(java.lang.String name, java.lang.String description) | |
| Method Summary | |
|---|---|
| long | crawlDuration(): Returns how long the current crawl has been running (excluding any time spent paused/suspended/stopped) since it began. |
| void | crawlEnded(java.lang.String sExitMessage): Called when a CrawlController has ended a crawl and is about to exit. |
| void | crawlEnding(java.lang.String sExitMessage): Called when a CrawlController is ending a crawl (for any reason). |
| void | crawlPaused(java.lang.String statusMessage): Called when a CrawlController is actually paused (all threads are idle). |
| void | crawlPausing(java.lang.String statusMessage): Called when a CrawlController is going to be paused. |
| void | crawlResuming(java.lang.String statusMessage): Called when a CrawlController is resuming a crawl that had been paused. |
| void | crawlStarted(java.lang.String message): Called on crawl start. |
| protected void | dumpReports(): Dump reports, if any, on request or at crawl end. |
| protected void | finalCleanup(): Cleanup resources used, at crawl end. |
| long | getCrawlEndTime(): If crawl has ended it will return the time it ended (given by System.currentTimeMillis() at that time). |
| long | getCrawlerTotalElapsedTime(): Total amount of time spent actively crawling so far. |
| long | getCrawlPauseStartedTime(): Get the time when the crawl was last paused/suspended (as given by System.currentTimeMillis() at that time). |
| long | getCrawlStartTime(): Get the starting time of the crawl (as given by System.currentTimeMillis() when the crawl started). |
| long | getCrawlTotalPauseTime(): Returns the number of milliseconds that the crawl spent paused or otherwise in a nonactive state. |
| protected int | getLogWriteInterval(): The number of seconds to wait between writing snapshot data to log file. |
| void | initialize(CrawlController c): Sets up the Logger (including logInterval) and registers with the CrawlController for CrawlStatus and CrawlURIDisposition events. |
| protected void | logNote(java.lang.String note) |
| void | noteStart(): Notify tracker that crawl has begun. |
| protected void | progressStatisticsEvent(java.util.EventObject e): A method for logging current crawler state. |
| java.lang.String | progressStatisticsLegend() |
| void | run(): Start thread. |
| protected void | tallyCurrentPause(): For a current pause (if any), add paused time to total and reset |

| Methods inherited from class org.archive.crawler.settings.ModuleType |
|---|
| addElement, listUsedFiles |

| Methods inherited from class org.archive.crawler.settings.Type |
|---|
| addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient |

| Methods inherited from class javax.management.Attribute |
|---|
| getName |

| Methods inherited from class java.lang.Object |
|---|
| clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |

| Methods inherited from interface org.archive.crawler.framework.StatisticsTracking |
|---|
| activeThreadCount, averageDepth, congestionRatio, currentProcessedDocsPerSec, currentProcessedKBPerSec, deepestUri, getProgressStatistics, getProgressStatisticsLine, getSeedRecordsSortedByStatusCode, processedDocsPerSec, processedKBPerSec, successfullyFetchedCount, totalBytesCrawled, totalBytesWritten, totalCount |

| Methods inherited from interface org.archive.crawler.event.CrawlStatusListener |
|---|
| crawlCheckpoint |
| Field Detail |
|---|
public static final java.lang.Integer DEFAULT_STATISTICS_REPORT_INTERVAL
public static final java.lang.String ATTR_STATS_INTERVAL
protected transient CrawlController controller
protected long crawlerStartTime
protected long crawlerEndTime
protected long crawlerPauseStarted
protected long crawlerTotalPausedTime
protected long lastLogPointTime
protected boolean shouldrun
| Constructor Detail |
|---|
public AbstractTracker(java.lang.String name,
java.lang.String description)
Parameters:
name -
description -

| Method Detail |
|---|
public void initialize(CrawlController c) throws FatalConfigurationException
Specified by: initialize in interface StatisticsTracking
Parameters: c - A crawl controller instance.
Throws: FatalConfigurationException - Not thrown here. For overrides that go to settings system for configuration.
See Also: CrawlStatusListener, CrawlURIDispositionListener

public void run()
Specified by: run in interface java.lang.Runnable

public java.lang.String progressStatisticsLegend()
Specified by: progressStatisticsLegend in interface StatisticsTracking

public void noteStart()
Specified by: noteStart in interface StatisticsTracking

protected void progressStatisticsEvent(java.util.EventObject e)
Calls CrawlController.logProgressStatistics(java.lang.String) so that CrawlController can act on the progress statistics event.
Implementations should carefully consider whether this method needs to be synchronized in whole or in part.
Parameters: e - Progress statistics event.

public long getCrawlStartTime()
Get the starting time of the crawl (as given by System.currentTimeMillis() when the crawl started).
public long getCrawlEndTime()
If the crawl has ended, returns the time it ended (as given by System.currentTimeMillis() at that time). Otherwise, returns System.currentTimeMillis() at the time of the call.

public long getCrawlTotalPauseTime()

public long getCrawlPauseStartedTime()
Get the time when the crawl was last paused/suspended (as given by System.currentTimeMillis() at that time). Will be 0 if the crawl is not currently paused.

public long getCrawlerTotalElapsedTime()
Description copied from interface: StatisticsTracking
Returns the total time (in milliseconds) elapsed from the start of the crawl until the current time or, if the crawl has ended, until the end of the crawl, minus any time spent paused.
Specified by: getCrawlerTotalElapsedTime in interface StatisticsTracking

protected int getLogWriteInterval()
public void crawlPausing(java.lang.String statusMessage)
Description copied from interface: CrawlStatusListener
Called when a CrawlController is going to be paused.
Specified by: crawlPausing in interface CrawlStatusListener
Parameters: statusMessage - Should be STATUS_WAITING_FOR_PAUSE. Passed for convenience.
See Also: CrawlStatusListener.crawlPausing(java.lang.String)

protected void logNote(java.lang.String note)
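public void crawlPaused(java.lang.String statusMessage)
Description copied from interface: CrawlStatusListener
Called when a CrawlController is actually paused (all threads are idle).
Specified by: crawlPaused in interface CrawlStatusListener
Parameters: statusMessage - Should be CrawlJob.STATUS_PAUSED. Passed for convenience.

public void crawlResuming(java.lang.String statusMessage)
Description copied from interface: CrawlStatusListener
Called when a CrawlController is resuming a crawl that had been paused.
Specified by: crawlResuming in interface CrawlStatusListener
Parameters: statusMessage - Should be CrawlJob.STATUS_RUNNING. Passed for convenience.

protected void tallyCurrentPause()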
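public void crawlEnding(java.lang.String sExitMessage)
Description copied from interface: CrawlStatusListener
Called when a CrawlController is ending a crawl (for any reason).
Specified by: crawlEnding in interface CrawlStatusListener
Parameters: sExitMessage - Type of exit. Should be one of the STATUS constants defined in CrawlJob.
See Also: CrawlJob

public void crawlEnded(java.lang.String sExitMessage)
Description copied from interface: CrawlStatusListener
Called when a CrawlController has ended a crawl and is about to exit.
Specified by: crawlEnded in interface CrawlStatusListener
Parameters: sExitMessage - Type of exit. Should be one of the STATUS constants defined in CrawlJob.
See Also: CrawlStatusListener.crawlEnded(java.lang.String)

public void crawlStarted(java.lang.String message)
Description copied from interface: CrawlStatusListener
Called on crawl start.
Specified by: crawlStarted in interface CrawlStatusListener
Parameters: message - Start message.

protected void dumpReports()

protected void finalCleanup()

public long crawlDuration()
Description copied from interface: StatisticsTracking
Returns how long the current crawl has been running (excluding any time spent paused/suspended/stopped) since it began.
Specified by: crawlDuration in interface StatisticsTracking
See Also: StatisticsTracking.crawlDuration()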
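
The timing accessors documented above (getCrawlStartTime, getCrawlEndTime, getCrawlTotalPauseTime, getCrawlerTotalElapsedTime) and tallyCurrentPause() describe simple pause bookkeeping. The sketch below shows one way those semantics could fit together. It is an illustration under assumptions, not the Heritrix source: the class PauseClockSketch, the injected LongSupplier clock (used in place of System.currentTimeMillis() so the example can be tested), and the -1 "not ended" sentinel are all hypothetical. Only the field names are borrowed from the Field Summary.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for the timing bookkeeping; not the Heritrix implementation. */
public class PauseClockSketch {
    private final LongSupplier clock;         // injected clock, an assumption for testability
    private long crawlerStartTime;
    private long crawlerEndTime = -1;         // -1 means "not ended" (assumed sentinel)
    private long crawlerPauseStarted = 0;     // 0 means "not currently paused"
    private long crawlerTotalPausedTime = 0;

    public PauseClockSketch(LongSupplier clock) {
        this.clock = clock;
    }

    public void crawlStarted()  { crawlerStartTime = clock.getAsLong(); }
    public void crawlPaused()   { crawlerPauseStarted = clock.getAsLong(); }
    public void crawlResuming() { tallyCurrentPause(); }
    public void crawlEnded()    { tallyCurrentPause(); crawlerEndTime = clock.getAsLong(); }

    /** For a current pause (if any), add the paused time to the total and reset. */
    void tallyCurrentPause() {
        if (crawlerPauseStarted > 0) {
            crawlerTotalPausedTime += clock.getAsLong() - crawlerPauseStarted;
            crawlerPauseStarted = 0;
        }
    }

    /** End time if the crawl has ended, otherwise the current time. */
    public long getCrawlEndTime() {
        return crawlerEndTime == -1 ? clock.getAsLong() : crawlerEndTime;
    }

    /** Completed pauses plus any pause still in progress. */
    public long getCrawlTotalPauseTime() {
        long total = crawlerTotalPausedTime;
        if (crawlerPauseStarted > 0) {
            total += clock.getAsLong() - crawlerPauseStarted;
        }
        return total;
    }

    /** Time from start to end (or now), minus time spent paused. */
    public long getCrawlerTotalElapsedTime() {
        return getCrawlEndTime() - crawlerStartTime - getCrawlTotalPauseTime();
    }
}
```

Because 0 doubles as the "not paused" marker here, the sketch assumes the clock never returns 0 at the moment of a pause, which holds for real wall-clock timestamps.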