java.lang.Object
javax.management.Attribute
org.archive.crawler.settings.Type
org.archive.crawler.settings.ComplexType
org.archive.crawler.settings.ModuleType
org.archive.crawler.framework.Processor
org.archive.crawler.framework.WriterPoolProcessor
public abstract class WriterPoolProcessor
Abstract implementation of a file pool processor.
Subclass this class to implement a processor for a particular WriterPoolMember type.
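The subclassing contract can be sketched with stand-in types. This is a minimal illustration, not Heritrix code: `SketchPoolProcessor` and `SketchListProcessor` are hypothetical names, a plain `String` stands in for `CrawlURI`, and the `process` driver is an assumption about how the two abstract hooks documented on this page (`setupPool` and `innerProcess`) might be invoked.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for WriterPoolProcessor; NOT the Heritrix class.
abstract class SketchPoolProcessor {
    private final AtomicInteger serialNo = new AtomicInteger();
    private boolean poolReady = false;

    // Mirrors the documented abstract setupPool(AtomicInteger serialNo).
    protected abstract void setupPool(AtomicInteger serialNo);

    // Mirrors the documented abstract innerProcess(CrawlURI curi);
    // a plain String stands in for CrawlURI in this sketch.
    protected abstract void innerProcess(String uri);

    // Assumed driver: set up the pool once, then hand each URI to the subclass.
    public final void process(String uri) {
        if (!poolReady) {
            setupPool(serialNo);
            poolReady = true;
        }
        innerProcess(uri);
    }
}

// Hypothetical subclass for one writer format; it records lines in memory
// instead of writing files, and tags each line with a counter for illustration.
class SketchListProcessor extends SketchPoolProcessor {
    final List<String> written = new ArrayList<>();
    private AtomicInteger serial;

    @Override
    protected void setupPool(AtomicInteger serialNo) {
        this.serial = serialNo;
    }

    @Override
    protected void innerProcess(String uri) {
        written.add(serial.getAndIncrement() + " " + uri);
    }
}
```

A concrete subclass only supplies the two hooks; everything else (settings keys, checkpointing, crawl status callbacks) would come from the base class.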
| Nested Class Summary |
|---|
| Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType |
|---|
| ComplexType.MBeanAttributeInfoIterator |
| Field Summary | |
|---|---|
| protected static java.lang.String | ANNOTATION_UNWRITTEN - CrawlURI annotation indicating no record was written. |
| static java.lang.String | ATTR_COMPRESS - Key to use when asking settings for the file compression value. |
| static java.lang.String | ATTR_MAX_BYTES_WRITTEN - Key for the maximum bytes to write attribute. |
| static java.lang.String | ATTR_MAX_SIZE_BYTES - Key to use when asking settings for the file max size value. |
| static java.lang.String | ATTR_PATH - Key to use when asking settings for the arc path value. |
| static java.lang.String | ATTR_POOL_MAX_ACTIVE - Key to get the maximum pool size. |
| static java.lang.String | ATTR_POOL_MAX_WAIT - Key to get the maximum wait on a pool object before we give up and throw IOException. |
| static java.lang.String | ATTR_PREFIX - Key to use when asking settings for the file prefix value. |
| static java.lang.String | ATTR_SKIP_IDENTICAL_DIGESTS - Key for whether to skip writing records of content-digest repeats. |
| static java.lang.String | ATTR_SUFFIX - Key to use when asking settings for the file suffix value. |
| static boolean | DEFAULT_COMPRESS - Default as to whether we do compression of files. |
| Fields inherited from class org.archive.crawler.framework.Processor |
|---|
| ATTR_DECIDE_RULES, ATTR_ENABLED, attrDecideRules |
| Fields inherited from class org.archive.crawler.settings.ComplexType |
|---|
| definition, definitionMap |
| Constructor Summary | |
|---|---|
| WriterPoolProcessor(java.lang.String name) | |
| WriterPoolProcessor(java.lang.String name, java.lang.String description) | |
| Method Summary | |
|---|---|
| protected java.util.List<java.lang.String> | cacheMetadata() |
| protected void | checkBytesWritten() |
| protected void | checkpointRecover() - Called out of initialTasks() when recovering a checkpoint. |
| void | crawlCheckpoint(java.io.File checkpointDir) - Called by CrawlController when checkpointing. |
| void | crawlEnded(java.lang.String sExitMessage) - Called when a CrawlController has ended a crawl and is about to exit. |
| void | crawlEnding(java.lang.String sExitMessage) - Called when a CrawlController is ending a crawl (for any reason). |
| void | crawlPaused(java.lang.String statusMessage) - Called when a CrawlController is actually paused (all threads are idle). |
| void | crawlPausing(java.lang.String statusMessage) - Called when a CrawlController is going to be paused. |
| void | crawlResuming(java.lang.String statusMessage) - Called when a CrawlController is resuming a crawl that had been paused. |
| void | crawlStarted(java.lang.String message) - Called on crawl start. |
| java.lang.Object | getAttributeUnchecked(java.lang.String name) - Version of getAttributes that catches and logs exceptions and returns null if it fails to fetch the attribute. |
| protected java.lang.String | getCheckpointStateFile() |
| protected java.lang.String[] | getDefaultPath() |
| protected java.lang.String | getFirstrecordBody(java.io.File orderFile) - Write the arc metadata body content. |
| protected java.lang.String | getFirstrecordStylesheet() |
| protected java.lang.String | getHostAddress(CrawlURI curi) - Return IP address of given URI suitable for recording (as in a classic ARC 5-field header line). |
| long | getMaxSize() - Max size we want files to be (bytes). |
| long | getMaxToWrite() |
| java.util.List<java.lang.String> | getMetadata() - Return list of metadata items to add to the first arc file metadata record. |
| java.util.List<java.io.File> | getOutputDirs() |
| protected WriterPool | getPool() |
| int | getPoolMaximumActive() |
| int | getPoolMaximumWait() |
| java.lang.String | getPrefix() |
| protected java.util.concurrent.atomic.AtomicInteger | getSerialNo() |
| java.lang.String | getSuffix() |
| protected long | getTotalBytesWritten() |
| void | initialTasks() - Classes subclassing this one should override this method to perform processor-specific actions. |
| protected abstract void | innerProcess(CrawlURI curi) - Writes a CrawlURI and its associated data to store file. |
| boolean | isCompressed() |
| protected int | loadCheckpointSerialNumber() |
| protected void | saveCheckpointSerialNumber(java.io.File checkpointDir, int serialNo) |
| protected void | setPool(WriterPool pool) |
| protected void | setTotalBytesWritten(long totalBytesWritten) |
| protected abstract void | setupPool(java.util.concurrent.atomic.AtomicInteger serialNo) - Set up pool of files. |
| protected boolean | shouldWrite(CrawlURI curi) - Whether the given CrawlURI should be written to archive files. |
| Methods inherited from class org.archive.crawler.framework.Processor |
|---|
| checkForInterrupt, finalTasks, getController, getDecideRule, getDefaultNextProcessor, innerRejectProcess, isContentToProcess, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn |
| Methods inherited from class org.archive.crawler.settings.ModuleType |
|---|
| addElement, listUsedFiles |
| Methods inherited from class org.archive.crawler.settings.Type |
|---|
| addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient |
| Methods inherited from class javax.management.Attribute |
|---|
| getName |
| Methods inherited from class java.lang.Object |
|---|
| clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |
| Field Detail |
|---|
public static final java.lang.String ATTR_COMPRESS
public static final boolean DEFAULT_COMPRESS
public static final java.lang.String ATTR_PREFIX
public static final java.lang.String ATTR_PATH
public static final java.lang.String ATTR_SUFFIX
public static final java.lang.String ATTR_MAX_SIZE_BYTES
public static final java.lang.String ATTR_POOL_MAX_ACTIVE
public static final java.lang.String ATTR_POOL_MAX_WAIT
public static final java.lang.String ATTR_MAX_BYTES_WRITTEN
public static final java.lang.String ATTR_SKIP_IDENTICAL_DIGESTS
protected static final java.lang.String ANNOTATION_UNWRITTEN
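The behavior described for ATTR_POOL_MAX_ACTIVE and ATTR_POOL_MAX_WAIT (a bounded pool that gives up with an IOException after a maximum wait) can be sketched as follows. This is an illustration under assumptions, not the Heritrix WriterPool: `BoundedPoolSketch` is a hypothetical class, and treating the wait as milliseconds is an assumption, since this page does not state the unit.

```java
import java.io.IOException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical bounded pool; NOT the Heritrix WriterPool.
final class BoundedPoolSketch {
    private final Semaphore permits;
    private final long maxWaitMs; // assumed unit: milliseconds

    BoundedPoolSketch(int maxActive, long maxWaitMs) {
        this.permits = new Semaphore(maxActive);
        this.maxWaitMs = maxWaitMs;
    }

    // Borrow a slot, waiting at most maxWaitMs before giving up.
    void borrow() throws IOException {
        try {
            if (!permits.tryAcquire(maxWaitMs, TimeUnit.MILLISECONDS)) {
                throw new IOException("No writer available after " + maxWaitMs + " ms");
            }
        } catch (InterruptedException e) {
            // Restore the interrupt flag and report the failure as an I/O problem.
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while waiting for a writer", e);
        }
    }

    // Return a previously borrowed slot.
    void giveBack() {
        permits.release();
    }

    // Number of slots that can currently be borrowed without waiting.
    int available() {
        return permits.availablePermits();
    }
}
```

With a maximum of one active slot, a second `borrow()` fails with IOException once the wait elapses, and succeeds again after `giveBack()`.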
| Constructor Detail |
|---|
public WriterPoolProcessor(java.lang.String name)
name - Name of this processor.
public WriterPoolProcessor(java.lang.String name,
java.lang.String description)
name - Name of this processor.
description - Description for this processor.
| Method Detail |
|---|
protected java.lang.String[] getDefaultPath()
public void initialTasks()
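Description copied from class: Processor
Classes subclassing this one should override this method to perform processor-specific actions. This method is guaranteed to be called after the crawl is set up, but before any URI processing has occurred.
initialTasks in class Processor
protected java.util.concurrent.atomic.AtomicInteger getSerialNo()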
protected abstract void setupPool(java.util.concurrent.atomic.AtomicInteger serialNo)
protected abstract void innerProcess(CrawlURI curi)
innerProcess in class Processor
curi - CrawlURI to process.
protected void checkBytesWritten()
protected boolean shouldWrite(CrawlURI curi)
curi - CrawlURI
protected java.lang.String getHostAddress(CrawlURI curi)
curi - CrawlURI
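For context on where the address returned by getHostAddress ends up, the five-field header line of a version 1 ARC URL record has the form URL, IP address, archive date, content type, and length, separated by single spaces. The sketch below only formats such a line; `ArcHeaderSketch` and its parameters are hypothetical and not part of this class.

```java
// Illustrative only: builds a five-field ARC version 1 URL record header line.
// Field order: URL, IP address, 14-digit archive date, content type, length.
final class ArcHeaderSketch {
    private ArcHeaderSketch() {
    }

    static String headerLine(String url, String ip, String date14,
            String contentType, long length) {
        return url + " " + ip + " " + date14 + " " + contentType + " " + length;
    }
}
```

The IP address is the second field, which is why a recordable string form of the address is needed when the record is written.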
public java.lang.Object getAttributeUnchecked(java.lang.String name)
name - Attribute name.
public long getMaxSize()
public java.lang.String getPrefix()
public java.util.List<java.io.File> getOutputDirs()
public boolean isCompressed()
public int getPoolMaximumActive()
public int getPoolMaximumWait()
public java.lang.String getSuffix()
public long getMaxToWrite()
public void crawlEnding(java.lang.String sExitMessage)
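Description copied from interface: CrawlStatusListener
Called when a CrawlController is ending a crawl (for any reason).
crawlEnding in interface CrawlStatusListener
sExitMessage - Type of exit. Should be one of the STATUS constants defined in CrawlJob.
See Also: CrawlJob
public void crawlEnded(java.lang.String sExitMessage)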
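Description copied from interface: CrawlStatusListener
Called when a CrawlController has ended a crawl and is about to exit.
crawlEnded in interface CrawlStatusListener
sExitMessage - Type of exit. Should be one of the STATUS constants defined in CrawlJob.
See Also: CrawlJob
public void crawlStarted(java.lang.String message)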
Description copied from interface: CrawlStatusListener
Called on crawl start.
crawlStarted in interface CrawlStatusListener
message - Start message.
protected java.lang.String getCheckpointStateFile()
public void crawlCheckpoint(java.io.File checkpointDir)
throws java.io.IOException
Description copied from interface: CrawlStatusListener
Called by CrawlController when checkpointing.
crawlCheckpoint in interface CrawlStatusListener
checkpointDir - Checkpoint dir. Write checkpoint state here.
Throws: java.io.IOException
public void crawlPausing(java.lang.String statusMessage)
Description copied from interface: CrawlStatusListener
Called when a CrawlController is going to be paused.
crawlPausing in interface CrawlStatusListener
statusMessage - Should be STATUS_WAITING_FOR_PAUSE. Passed for convenience.
public void crawlPaused(java.lang.String statusMessage)
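Description copied from interface: CrawlStatusListener
Called when a CrawlController is actually paused (all threads are idle).
crawlPaused in interface CrawlStatusListener
statusMessage - Should be CrawlJob.STATUS_PAUSED. Passed for convenience.
public void crawlResuming(java.lang.String statusMessage)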
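Description copied from interface: CrawlStatusListener
Called when a CrawlController is resuming a crawl that had been paused.
crawlResuming in interface CrawlStatusListener
statusMessage - Should be CrawlJob.STATUS_RUNNING. Passed for convenience.
protected WriterPool getPool()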
protected void setPool(WriterPool pool)
protected long getTotalBytesWritten()
protected void setTotalBytesWritten(long totalBytesWritten)
protected void checkpointRecover()
Called out of initialTasks() when recovering a checkpoint. Restore state.
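Checkpoint recovery depends on state that was written into the checkpoint directory earlier, such as the writer serial number. The round trip can be sketched as below. This is a minimal illustration under assumptions: `SerialCheckpointSketch` and the file name `serial.sketch.txt` are hypothetical (the real file name would come from getCheckpointStateFile()), the plain-text format is assumed, and unlike the documented loadCheckpointSerialNumber(), the `load` method here takes the directory as a parameter to stay self-contained.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical helper; NOT the Heritrix implementation.
final class SerialCheckpointSketch {
    // Assumed file name, used only by this sketch.
    static final String STATE_FILE = "serial.sketch.txt";

    private SerialCheckpointSketch() {
    }

    // Write the serial number as decimal text into the checkpoint directory.
    static void save(File checkpointDir, int serialNo) throws IOException {
        File f = new File(checkpointDir, STATE_FILE);
        Files.write(f.toPath(),
                Integer.toString(serialNo).getBytes(StandardCharsets.UTF_8));
    }

    // Read the serial number back from the checkpoint directory.
    static int load(File checkpointDir) throws IOException {
        File f = new File(checkpointDir, STATE_FILE);
        String text = new String(Files.readAllBytes(f.toPath()),
                StandardCharsets.UTF_8);
        return Integer.parseInt(text.trim());
    }
}
```

Restoring the counter on recovery lets newly created files continue the numbering instead of restarting from zero.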
protected int loadCheckpointSerialNumber()
protected void saveCheckpointSerialNumber(java.io.File checkpointDir,
int serialNo)
throws java.io.IOException
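Throws: java.io.IOException
public java.util.List<java.lang.String> getMetadata()
Return list of metadata items to add to the first arc file metadata record. To specify a stylesheet, override getFirstrecordStylesheet().
Get XML files from the settings handler. Currently the order file is the only XML file. We're NOT adding seeds to the metadata.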
protected java.util.List<java.lang.String> cacheMetadata()
protected java.lang.String getFirstrecordStylesheet()
protected java.lang.String getFirstrecordBody(java.io.File orderFile)
orderFile - Order file.