| Interface Summary | |
|---|---|
| ExternalGeoLookupInterface | Interface used by ExternalGeoLocationDecideRule. |
| ExternalImplInterface | Interface used by ExternalImplDecideRule. |
| Class Summary | |
|---|---|
| AcceptDecideRule | Rule which responds ACCEPT to anything passed in. |
| AddRedirectFromRootServerToScope | |
| BeanShellDecideRule | Rule which runs a BeanShell script to make its decision. |
| ClassKeyMatchesRegExpDecideRule | Rule applies configured decision to any CrawlURI class key -- i.e. |
| ConfiguredDecideRule | Rule which can be configured to ACCEPT or REJECT at operator's option. |
| ContentTypeMatchesRegExpDecideRule | DecideRule whose decision is applied if the URI's content-type is present and matches the supplied regular expression. |
| ContentTypeNotMatchesRegExpDecideRule | DecideRule whose decision is applied if the URI's content-type is present and does not match the supplied regular expression. |
| DecideRule | Interface for rules which, given an object to evaluate, respond with a decision: DecideRule.ACCEPT, DecideRule.REJECT, or DecideRule.PASS. |
| DecideRuleSequence | RuleSequence represents a series of Rules, which are applied in turn to give the final result. |
| DecidingFilter | DecidingFilter: a classic Filter which makes its accept/reject decision based on whatever DecideRules have been set up inside it. |
| DecidingScope | DecidingScope: a Scope which makes its accept/reject decision based on whatever DecideRules have been set up inside it. |
| ExceedsDocumentLengthTresholdDecideRule | |
| ExternalGeoLocationDecideRule | A rule that can be configured to take alternate implementations of the ExternalGeoLookupInterface. |
| ExternalImplDecideRule | A rule that can be configured to take alternate implementations of the ExternalImplInterface. |
| FetchStatusDecideRule | Rule applies the configured decision for any URI which has a fetch status equal to the 'target-status' setting. |
| FetchStatusMatchesRegExpDecideRule | |
| FetchStatusNotMatchesRegExpDecideRule | |
| FilterDecideRule | FilterDecideRule wraps a legacy Filter for use in DecideRule contexts. |
| HasViaDecideRule | Rule applies the configured decision for any URI which has a 'via' (essentially, any URI that was not a seed or one of some kinds of mid-crawl adds). |
| HopsPathMatchesRegExpDecideRule | Rule applies configured decision to any CrawlURIs whose 'hops-path' (string like "LLXE" etc.) matches the supplied regexp. |
| IsCrossTopmostAssignedSurtHopDecideRule | Applies its decision if the current URI differs in that portion of its hostname/domain that is assigned/sold by registrars (AKA its 'topmost assigned SURT' or 'public suffix'.) |
| MatchesFilePatternDecideRule | Compares suffix of a passed CrawlURI, UURI, or String against a regular expression pattern, applying its configured decision to all matches. |
| MatchesListRegExpDecideRule | Rule applies configured decision to any CrawlURIs whose String URI matches the supplied regexps. |
| MatchesRegExpDecideRule | Rule applies configured decision to any CrawlURIs whose String URI matches the supplied regexp. |
| NotExceedsDocumentLengthTresholdDecideRule | |
| NotMatchesFilePatternDecideRule | Rule applies configured decision to any URIs which do *not* match the supplied (file-pattern) regexp. |
| NotMatchesListRegExpDecideRule | Rule applies configured decision to any URIs which do *not* match the supplied regexps. |
| NotMatchesRegExpDecideRule | Rule applies configured decision to any URIs which do *not* match the supplied regexp. |
| NotOnDomainsDecideRule | Rule applies configured decision to any URIs that are *not* in one of the domains in the configured set of domains, filled from the seed set. |
| NotOnHostsDecideRule | Rule applies configured decision to any URIs that are *not* on one of the hosts in the configured set of hosts, filled from the seed set. |
| NotSurtPrefixedDecideRule | Rule applies configured decision to any URIs that, when expressed in SURT form, do *not* begin with one of the prefixes in the configured set. |
| OnDomainsDecideRule | Rule applies configured decision to any URIs that are on one of the domains in the configured set of domains, filled from the seed set. |
| OnHostsDecideRule | Rule applies configured decision to any URIs that are on one of the hosts in the configured set of hosts, filled from the seed set. |
| PathologicalPathDecideRule | Rule REJECTs any URI which contains an excessive number of identical, consecutive path-segments (e.g., http://example.com/a/a/a/boo.html == 3 '/a' segments). |
| PredicatedDecideRule | Rule which applies the configured decision only if a test evaluates to true. |
| PrerequisiteAcceptDecideRule | Rule which ACCEPTs all 'prerequisite' URIs (those with a 'P' in the last hopsPath position). |
| QueueOverbudgetDecideRule | Applies configured decision to every candidate URI that would overbudget its queue. |
| RejectDecideRule | Rule which answers REJECT to everything evaluated. |
| ScopePlusOneDecideRule | Rule allows one level of discovery beyond configured scope (e.g. |
| SeedAcceptDecideRule | Rule which ACCEPTs all 'seed' URIs (those for which isSeed is true). |
| SurtPrefixedDecideRule | Rule applies configured decision to any URIs that, when expressed in SURT form, begin with one of the prefixes in the configured set. |
| TooManyHopsDecideRule | Rule REJECTs any CrawlURIs whose total number of hops (length of the hopsPath string, traversed links of any type) is over a threshold. |
| TooManyPathSegmentsDecideRule | Rule REJECTs any CrawlURIs whose total number of path-segments (as indicated by the count of '/' characters not including the first '//') is over a given threshold. |
| TransclusionDecideRule | Rule ACCEPTs any CrawlURIs whose path-from-seed ('hopsPath' -- see CandidateURI.getPathFromSeed()) ends with at least one, but not more than, the given number of non-navlink ('L') hops. |
Provides classes for a simple decision rules framework.
Each 'step' in a decision rule set which can affect an object's ultimate fate is called a DecideRule. Each DecideRule renders a decision (possibly neutral) on the passed object's fate.
Possible decisions are ACCEPT, REJECT, and PASS (the neutral outcome).
Each DecideRule is applied in turn; the last one to express a non-PASS preference wins.
For example, if the rules are:
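To allow this style of decision processing to be plugged into the existing Filter and Scope slots, the package provides DecidingFilter and DecidingScope, each of which makes its accept/reject decision based on whatever DecideRules have been set up inside it.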
See NewScopingModel for background.
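
The "last non-PASS decision wins" evaluation described above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the `Decision` enum, the `Rule` interface, and the `decideAll` and `sampleRules` helpers are hypothetical names invented for this sketch and are not the package's actual API, and the three sample rules are made up for demonstration.

```java
import java.util.ArrayList;
import java.util.List;

public class DecideSketch {
    /** The three outcomes named in the package description. */
    public enum Decision { ACCEPT, REJECT, PASS }

    /** Hypothetical stand-in for a DecideRule: maps a URI string to a decision. */
    public interface Rule {
        Decision decide(String uri);
    }

    /** Applies every rule in order; the last non-PASS decision wins. */
    public static Decision decideAll(List<Rule> rules, String uri) {
        Decision result = Decision.PASS; // neutral until some rule expresses a preference
        for (Rule rule : rules) {
            Decision d = rule.decide(uri);
            if (d != Decision.PASS) {
                result = d; // later rules override earlier ones
            }
        }
        return result;
    }

    /** Made-up rule set: reject all, accept one host, then reject .zip files. */
    public static List<Rule> sampleRules() {
        List<Rule> rules = new ArrayList<>();
        rules.add(uri -> Decision.REJECT);
        rules.add(uri -> uri.contains("example.com") ? Decision.ACCEPT : Decision.PASS);
        rules.add(uri -> uri.endsWith(".zip") ? Decision.REJECT : Decision.PASS);
        return rules;
    }

    public static void main(String[] args) {
        List<Rule> rules = sampleRules();
        System.out.println(decideAll(rules, "http://example.com/index.html")); // ACCEPT
        System.out.println(decideAll(rules, "http://example.com/big.zip"));    // REJECT
        System.out.println(decideAll(rules, "http://other.org/"));             // REJECT
    }
}
```

Because every rule is consulted and later rules override earlier ones, a sequence of this shape typically opens with a blanket decision and then refines it with progressively narrower rules.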