java.lang.Object
  javax.management.Attribute
    org.archive.crawler.settings.Type
      org.archive.crawler.settings.ComplexType
        org.archive.crawler.settings.ModuleType
          org.archive.crawler.framework.Processor
            org.archive.crawler.fetcher.FetchFTP

public class FetchFTP
Fetches documents and directory listings using FTP. This class also tries to extract FTP "links" from directory listings. For this class to archive a directory listing, the remote FTP server must support the NLST command, which most modern FTP servers do.

| Nested Class Summary |
|---|

| Nested classes/interfaces inherited from class org.archive.crawler.settings.ComplexType |
|---|
| ComplexType.MBeanAttributeInfoIterator |

| Field Summary | |
|---|---|
| static java.lang.String | ATTR_BANDWIDTH - The name for the fetch-bandwidth attribute. |
| static java.lang.String | ATTR_MAX_LENGTH - The name for the max-length-bytes attribute. |
| static java.lang.String | ATTR_PASSWORD - The name for the password attribute. |
| static java.lang.String | ATTR_TIMEOUT - The name for the timeout-seconds attribute. |
| static java.lang.String | ATTR_USERNAME - The name for the username attribute. |

| Fields inherited from class org.archive.crawler.framework.Processor |
|---|
| ATTR_DECIDE_RULES, ATTR_ENABLED, attrDecideRules |

| Fields inherited from class org.archive.crawler.settings.ComplexType |
|---|
| definition, definitionMap |

| Constructor Summary |
|---|
| FetchFTP(java.lang.String name) - Constructs a new FetchFTP. |

| Method Summary | |
|---|---|
| java.lang.String | determinePassword(CrawlURI curi) - Determines the password for the given URI. |
| boolean | getExtractFromDirs(CrawlURI curi) - Returns the extract.from.dirs attribute for this FetchFTP and the given curi. |
| boolean | getExtractParent(CrawlURI curi) - Returns the extract.parent attribute for this FetchFTP and the given curi. |
| int | getFetchBandwidth(CrawlURI curi) - Returns the fetch-bandwidth attribute for this FetchFTP and the given curi. |
| long | getMaxLength(CrawlURI curi) - Returns the max-length-bytes attribute for this FetchFTP and the given curi. |
| int | getTimeout(CrawlURI curi) - Returns the timeout-seconds attribute for this FetchFTP and the given curi. |
| void | innerProcess(CrawlURI curi) - Processes the given URI. |

| Methods inherited from class org.archive.crawler.framework.Processor |
|---|
| checkForInterrupt, finalTasks, getController, getDecideRule, getDefaultNextProcessor, initialTasks, innerRejectProcess, isContentToProcess, isExpectedMimeType, isHttpTransactionContentToProcess, kickUpdate, process, report, rulesAccept, rulesAccept, setDefaultNextProcessor, spawn |

| Methods inherited from class org.archive.crawler.settings.ModuleType |
|---|
| addElement, listUsedFiles |

| Methods inherited from class org.archive.crawler.settings.Type |
|---|
| addConstraint, equals, getConstraints, getLegalValueType, isExpertSetting, isOverrideable, isTransient, setExpertSetting, setLegalValueType, setOverrideable, setTransient |

| Methods inherited from class javax.management.Attribute |
|---|
| getName |

| Methods inherited from class java.lang.Object |
|---|
| clone, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait |

| Field Detail |
|---|

public static final java.lang.String ATTR_USERNAME
The name for the username attribute.

public static final java.lang.String ATTR_PASSWORD
The name for the password attribute.

public static final java.lang.String ATTR_MAX_LENGTH
The name for the max-length-bytes attribute.

public static final java.lang.String ATTR_BANDWIDTH
The name for the fetch-bandwidth attribute.

public static final java.lang.String ATTR_TIMEOUT
The name for the timeout-seconds attribute.

| Constructor Detail |
|---|

public FetchFTP(java.lang.String name)
Constructs a new FetchFTP.
Parameters: name - the name of this processor

| Method Detail |
|---|

public void innerProcess(CrawlURI curi)
                  throws java.lang.InterruptedException
Processes the given URI. If the connection to the FTP server is successful, the fetcher attempts to CD to the path specified in the URI. If the remote CD command succeeds, the URI is assumed to represent a directory. If the CD command fails, the URI is assumed to represent a file.
For directories, the directory listing will be fetched using
the FTP LIST command, and saved to the HttpRecorder. If the
extract.from.dirs attribute is set to true, then
the files in the fetched list will be added to the curi as
extracted FTP links. (It was easier to do that here, rather
than writing a separate FTPExtractor.)
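
The link-extraction step described above can be sketched roughly as follows. This is an illustration only, not the Heritrix implementation: the class `FtpListingLinks` and the method `extractLinks` are hypothetical names, and the sketch assumes the listing contains one bare file name per line. A long-format listing would need real parsing first, and names are appended without any URI escaping.

```java
import java.util.ArrayList;
import java.util.List;

public class FtpListingLinks {

    /**
     * Turns the names in a directory listing into absolute FTP URIs.
     * Assumption: one bare file name per line; blank lines and the
     * "." and ".." entries are skipped.
     */
    static List<String> extractLinks(String directoryUri, String listing) {
        // Make sure the base ends with exactly one slash before appending names.
        String base = directoryUri.endsWith("/") ? directoryUri : directoryUri + "/";
        List<String> links = new ArrayList<>();
        for (String line : listing.split("\\r?\\n")) {
            String name = line.trim();
            if (name.isEmpty() || name.equals(".") || name.equals("..")) {
                continue;
            }
            links.add(base + name);
        }
        return links;
    }

    public static void main(String[] args) {
        String listing = "README\r\npub\r\n\r\nfile.txt\r\n";
        for (String link : extractLinks("ftp://example.org/docs", listing)) {
            System.out.println(link);
        }
    }
}
```

In the real class the resulting links are attached to the curi rather than returned as strings; the list here only stands in for that step.
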
For files, the file will be fetched using the FTP RETR command, and saved to the HttpRecorder.
All file transfers (including directory listings) occur using Binary mode transfer. Also, the local passive transfer mode is always used, to play well with firewalls.
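
The directory-or-file decision described above can be modelled in a few lines. The sketch below is a simplified illustration under stated assumptions: `FtpSession`, `Plan`, and `plan` are hypothetical names invented here and are not part of Heritrix or of any FTP library, and the session is reduced to the single operation the decision depends on.

```java
public class FtpFetchPlan {

    /** Hypothetical minimal view of an FTP connection (not a real library API). */
    interface FtpSession {
        /** Returns true if the remote CD command succeeded for the path. */
        boolean changeDirectory(String path);
    }

    enum Kind { DIRECTORY, FILE }

    /** The command and transfer settings chosen for one URI path. */
    static final class Plan {
        final Kind kind;
        final String command;
        final boolean binary = true;   // all transfers use binary mode
        final boolean passive = true;  // local passive mode is always used

        Plan(Kind kind, String command) {
            this.kind = kind;
            this.command = command;
        }
    }

    /** A successful CD means directory (fetch a listing); a failed CD means file (RETR). */
    static Plan plan(FtpSession session, String path) {
        if (session.changeDirectory(path)) {
            return new Plan(Kind.DIRECTORY, "LIST");
        }
        return new Plan(Kind.FILE, "RETR " + path);
    }

    public static void main(String[] args) {
        // Fake session for illustration: only "/pub" is treated as a directory.
        FtpSession fake = path -> path.equals("/pub");
        for (String path : new String[] {"/pub", "/pub/a.txt"}) {
            Plan p = plan(fake, path);
            System.out.println(path + " -> " + p.kind + " via " + p.command);
        }
    }
}
```

One consequence of this design is that a directory the crawler cannot enter (for example, because of permissions) is treated as a file, since the only signal used is whether the CD command succeeds.
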
Overrides: innerProcess in class Processor
Parameters: curi - the curi to process
Throws: java.lang.InterruptedException - if the thread is interrupted during processing

public boolean getExtractFromDirs(CrawlURI curi)
Returns the extract.from.dirs attribute for this FetchFTP and the given curi.
Parameters: curi - the curi whose attribute to return
Returns: the value of extract.from.dirs

public boolean getExtractParent(CrawlURI curi)
Returns the extract.parent attribute for this FetchFTP and the given curi.
Parameters: curi - the curi whose attribute to return
Returns: the value of extract-parent

public int getTimeout(CrawlURI curi)
Returns the timeout-seconds attribute for this FetchFTP and the given curi.
Parameters: curi - the curi whose attribute to return
Returns: the value of timeout-seconds

public long getMaxLength(CrawlURI curi)
Returns the max-length-bytes attribute for this FetchFTP and the given curi.
Parameters: curi - the curi whose attribute to return
Returns: the value of max-length-bytes

public int getFetchBandwidth(CrawlURI curi)
Returns the fetch-bandwidth attribute for this FetchFTP and the given curi.
Parameters: curi - the curi whose attribute to return
Returns: the value of fetch-bandwidth

public java.lang.String determinePassword(CrawlURI curi)
Determines the password for the given URI. The password attribute is consulted, and the value for that attribute is returned.
Parameters: curi - the curi whose password to return