Database Group Publications List

This list is generated daily by searching for "http" in the DBG publications search engine.

1. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/fqo.ps
Title: Fusion Query Optimization
Authors: S. Abiteboul, H. Garcia-Molina, Y. Papakonstantinou, R. Yerneni
Year: 1996
Citation: Unpublished manuscript
Keywords: fusion queries, Internet databases, query optimization
Abstract: Fusion queries search for information integrated from distributed, autonomous sources over the Internet. In this context, data is not cleanly fragmented as in traditional distributed databases, and the number of sources participating in a typical query is large. We investigate optimization techniques for fusion queries. First, we focus on a very wide class of plans that capture the spirit of plans usually considered in commercial systems. We show how to find, under various realistic scenarios, an optimal plan within this large class. We evaluate the performance of this optimal plan and of other plans selected by simpler optimizers to demonstrate the benefits of our approach. We provide additional heuristics that, by considering plans outside our target class of plans, yield further performance improvements.

2. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/icdt97.semistructured.ps
Title: Querying Semi-Structured Data
Authors: S. Abiteboul
Year: 1996
Citation: ICDT, 1997
Keywords:
Abstract: The amount of data of all kinds available electronically has increased dramatically in recent years. The data resides in different forms, ranging from unstructured data in file systems to highly structured in relational database systems. Data is accessible through a variety of interfaces including Web browsers, database query languages, application-specific interfaces, or data exchange formats. Some of this data is raw data, e.g., images or sound. Some of it has structure even if the structure is often implicit, and not as rigid or regular as that found in standard database systems. Sometimes the structure exists but has to be extracted from the data. Sometimes also it exists but we prefer to ignore it for certain purposes such as browsing. We call here semi-structured data this data that is (from a particular viewpoint) neither raw data nor strictly typed, i.e., not table-orient

3. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/icdt97.www.ps
Title: Queries and Computation on the Web
Authors: S. Abiteboul, V. Vianu
Year: 1996
Citation: ICDT, 1997
Keywords:
Abstract: The paper introduces a model of the Web as an infinite, semistructured set of objects. We reconsider the classical notions of genericity and computability of queries in this new context and relate them to styles of computation prevalent on the Web, based on browsing and searching. We revisit several well-known declarative query languages (first-order logic, Datalog, and Datalog with negation) and consider their computational characteristics in terms of the notions introduced in this paper. In particular, we are interested in languages or fragments thereof which can be implemented by browsing, or by browsing and searching combined. Surprisingly, stratified and well-founded semantics for negation turn out to have basic shortcomings in this context, while inflationary semantics emerges as an appealing alternative.

4. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/icdt97.correspondance.ps
Title: Correspondence and Translation for Heterogeneous Data
Authors: S. Abiteboul, S. Cluet, T. Milo
Year: 1996
Citation: ICDT, 1997
Keywords:
Abstract: A primary motivation for new database technology is to provide support for the broad spectrum of multimedia data available notably through the network. These data are stored under different formats: SQL or ODMG (in databases), SGML or LaTeX (documents), DX formats (scientific data), Step (CAD/CAM), etc. Their integration is a very active field of research and development (see for instance, for a very small sample, [10, 6, 7, 9, 8, 12, 19]). In this paper, we provide a formal foundation to facilitate the integration of such heterogeneous data and the maintenance of heterogeneous replicated data. A sound solution for a data integration task requires a clean abstraction of the different formats in which data are stored, and means for specifying the co

5. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/pods96.tl.ps
Title: Temporal versus First-Order Logic to Query Temporal Databases
Authors: S. Abiteboul, L. Herr, J. Bussche
Year: 1996
Citation: PODS 96, 1996
Keywords:
Abstract: A database history can be modeled as a (finite) sequence of instances discretely ordered by time. Similarly, the behavior of a system such as an operating system or a reactive system can be modeled by an infinite such sequence. One can view the sequence as one single database where each relation has an additional column holding the time instant of validity of each tuple. The temporal database can then be queried using standard relational calculus (first-order logic) on this "timestamp" representation. One may alternatively use an implicit access to time and express queries in temporal logic. It is known that these two approaches yield the same expressive power in the propositional case. Their comparison in the predicate/database context remained open. We prove here that there are first-order logic queries on the timestamp representation that are not expressible in (extended) temporal logic. The proof technique is novel and is based on communication complexity.

6. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/lorel96.ps
Title: The Lorel Query Language for Semistructured Data
Authors: S. Abiteboul, D. Quass, J. McHugh, J. Widom, J. Wiener
Year: 1996
Citation: To appear in the Journal on Digital Libraries
Keywords: Lore, semistructured, unstructured, object-oriented, OQL, ODMG
Abstract: We present the Lorel language, designed for querying semistructured data. Semistructured data is becoming more and more prevalent, e.g., in structured documents such as HTML and when performing simple integration of data from multiple sources. Traditional data models and query languages are inappropriate, since semistructured data often is irregular: some data is missing, similar concepts are represented using different types, heterogeneous sets are present, or object structure is not fully known. Lorel is a user-friendly language in the SQL/OQL style for querying such data effectively. For wide applicability, the simple object model underlying Lorel can be viewed as an extension of the ODMG data model and the Lorel language as an extension of OQL. The main novelties of the Lorel language are: (i) the extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data; and (ii) powerful path expressions, which permit a flexible form o
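
To make path expressions over irregular data concrete, here is a toy evaluator; it is a sketch of the general idea only, not Lorel or its implementation, and the dict-based object encoding and the '%' wildcard are assumptions of this example.

```python
# A toy path-expression matcher over an OEM-style labeled object graph.
def match_path(obj, path):
    """Yield objects reachable from `obj` along `path`.
    `obj` is an atomic value or a dict mapping labels to lists of sub-objects.
    `path` is a list of labels; '%' matches any single label (assumed syntax)."""
    if not path:
        yield obj
        return
    if isinstance(obj, dict):
        head, rest = path[0], path[1:]
        for label, children in obj.items():
            if head == '%' or head == label:
                for child in children:
                    yield from match_path(child, rest)

# Heterogeneous structure: one member's name is nested, the other's is a string.
db = {"member": [
        {"name": [{"first": ["Serge"], "last": ["Abiteboul"]}], "office": ["B12"]},
        {"name": ["J. Widom"]},
     ]}
print(list(match_path(db, ["member", "name"])))       # mixed: a dict and a string
print(list(match_path(db, ["member", "name", "%"])))  # ['Serge', 'Abiteboul']
```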

7. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/oemview97.ps
Title: Views for Semistructured Data
Authors: S. Abiteboul, R. Goldman, J. McHugh, V. Vassalos, Y. Zhuge
Year: 1997
Citation: Technical Report
Keywords: Semistructured Databases, Heterogeneous Databases, Views
Abstract: Defining a view over a semistructured database introduces many new problems. In this paper we propose a view specification language and consider the problem of answering queries posed over views. The two main approaches, query rewriting and view materialization, are outlined with focus on the difficult problems caused by the semistructured nature of the data.

8. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/cost.ps
Title: Query Caching and Optimization in Mediator Systems
Authors: S. Adali, S. Candan, Y. Papakonstantinou, V. Subrahmanian
Year: 1995
Citation: ACM SIGMOD 96.
Keywords:
Abstract:

9. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/priority1.ps
Title: Emulating Soft Real-Time Scheduling Using Traditional Operating System Schedulers
Authors: B. Adelberg, H. Garcia-Molina, B. Kao
Year: 1994
Citation: Appeared in RTSS 1994. Technical note is in priority2.ps
Keywords: soft real-time, priority assignment, scheduling
Abstract: Real-time scheduling algorithms are usually only available in the kernels of real-time operating systems, and not in more general purpose operating systems, like Unix. For some soft real-time problems, a traditional operating system may be the development platform of choice. This paper addresses methods of emulating real-time scheduling algorithms on top of standard time-share schedulers. We examine (through simulations) three strategies for priority assignment within a traditional multi-tasking environment. The results show that the emulation algorithms are comparable in performance to the real-time algorithms and in some instances outperform them.
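
As a concrete illustration of one strategy in this space (a made-up example, not necessarily one of the paper's three), pending deadlines can be re-mapped periodically onto the fixed priority levels that a time-share scheduler exposes:

```python
# A toy sketch of mapping deadlines onto a fixed set of time-share priority
# slots; NUM_LEVELS and the task encoding are assumptions of this example.
NUM_LEVELS = 8  # assumed number of usable priority levels

def assign_priorities(tasks):
    """tasks: list of (pid, absolute_deadline); earlier deadline = more urgent."""
    ordered = sorted(tasks, key=lambda t: t[1])        # earliest deadline first
    return {pid: min(rank, NUM_LEVELS - 1)             # clamp into available slots
            for rank, (pid, _deadline) in enumerate(ordered)}

# Re-run periodically as deadlines change; level 0 is the most urgent slot.
# On Unix the result could be applied per process with os.setpriority.
print(assign_priorities([(101, 5.0), (102, 2.0), (103, 9.0)]))
# {102: 0, 101: 1, 103: 2}
```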

10. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/updates1.ps
Title: Applying Update Streams in a Soft Real-Time Database System
Authors: B. Adelberg, H. Garcia-Molina, B. Kao
Year: 1994
Citation: Appeared in SIGMOD 1995. Technical note is in updates2.ps
Keywords: soft real-time, temporal databases, materialized views, updates
Abstract: Many papers have examined how to efficiently export a materialized view but to our knowledge none have studied how to efficiently import one. To import a view, i.e., to install a stream of updates, a real-time database system must process new updates in a timely fashion to keep the database "fresh," but at the same time must process transactions and ensure they meet their time constraints. In this paper, we discuss the various properties of updates and views (including staleness) that affect this tradeoff. We also examine, through simulation, four algorithms for scheduling transactions and installing updates in a soft real-time database.

11. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/derived1.ps
Title: Database Support for Efficiently Maintained Derived Data
Authors: B. Adelberg, B. Kao, H. Garcia-Molina
Year: 1995
Citation: Appeared in EDBT '96
Keywords: derived data, view maintenance, active database system, transaction scheduling, update locality
Abstract: Derived data is maintained in a database system to correlate and summarize base data which record real world facts. As base data changes, derived data needs to be recomputed. A high performance system should execute all these updates and recomputations in a timely fashion so that the data remains fresh and useful, while at the same time executing user transactions quickly. This paper studies the intricate balance between recomputing derived data and transaction execution. Our focus is on efficient recomputation strategies -- how and when recomputations should be done to reduce their cost without jeopardizing data timeliness. We propose the Forced Delay recomputation algorithm and show how it can exploit update locality to improve both data freshness and transaction response time.
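
A minimal sketch of the batching idea behind a forced-delay strategy, with invented parameters and encodings (the paper's actual algorithm differs in detail):

```python
# Hold arriving base-data updates for a short window so that several updates
# to the same derived item are folded into a single recomputation.
from collections import defaultdict

DELAY = 0.05  # seconds to hold updates, hoping related ones arrive

def install_updates(update_stream, recompute):
    pending = defaultdict(list)   # derived item -> buffered base updates
    due = {}                      # derived item -> release time of its batch
    for item, value, arrival in update_stream:
        pending[item].append(value)
        due.setdefault(item, arrival + DELAY)
        for it in [i for i, t in due.items() if t <= arrival]:
            recompute(it, pending.pop(it))   # one recomputation per batch
            del due[it]
    for it, values in pending.items():       # flush whatever remains
        recompute(it, values)

stream = [("idx", 1, 0.00), ("idx", 2, 0.01), ("opt", 5, 0.02), ("idx", 3, 0.08)]
install_updates(stream, lambda item, vals: print(item, "recomputed from", vals))
# idx recomputed from [1, 2, 3]   <- three updates, one recomputation
# opt recomputed from [5]
```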

12. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/overview.ps
Title: An Overview of the STanford Real-time Information Processor (STRIP)
Authors: B. Adelberg, B. Kao, H. Garcia-Molina
Year: 1995
Citation: Appeared in SIGMOD Record, March '96
Keywords:
Abstract: We believe that the greatest growth potential for soft real-time databases is not as isolated monolithic databases but as components in open systems consisting of many heterogeneous databases. In such environments, the flexibility to deal with unpredictable situations and the ability to cooperate with other databases (often non-real-time databases) is just as important as the guarantee of stringent timing constraints. In this paper, we describe a database designed explicitly for heterogeneous environments, the STanford Real-time Information Processor (STRIP). STRIP, which runs on standard Posix Unix, is a soft real-time main memory database with special facilities for importing and exporting data as well as handling derived data. We will describe the architecture of STRIP, its unique features, and its potential uses in overall system architectures.

13. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/evaluating-strip.ps
Title: Project Synopsis: Evaluating STRIP
Authors: B. Adelberg, H. Garcia-Molina
Year: 1996
Citation: DART '96.
Keywords: STRIP, active database, derived data, benchmark, program trading
Abstract: This paper describes preliminary efforts at evaluating the performance of the STanford Real-time Information Processor (STRIP). We describe a benchmark for active real-time databases based on a program trading application and report STRIP's performance on the benchmark.

14. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/strip-rules.ps
Title: The STRIP Rule System For Efficiently Maintaining Derived Data
Authors: B. Adelberg, H. Garcia-Molina, J. Widom
Year: 1996
Citation: Technical report
Keywords: active database, derived data, materialized views, program trading
Abstract: Derived data is maintained in a database system to correlate and summarize base data which records real world facts. As base data changes, derived data needs to be recomputed. This is often implemented by writing active rules that are triggered by changes to base data. In a system with rapidly changing base data, a database with a standard rule system may consume most of its resources running rules to recompute data. This paper presents the rule system implemented as part of the STanford Real-time Information Processor (STRIP). The STRIP rule system is an extension of SQL3-type rules that allows groups of rule actions to be batched together to reduce the total recomputation load on the system. In this paper we describe the syntax and semantics of the STRIP rule system, present an example set of rules to maintain stock index and theoretical option prices in a program trading application, and report the results of experiments performed on the running system. The experiments verify that STRIP's r

15. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/vlsiDesign.ps
TitleMulti-Way VLSI Circuit Partitioning
AuthorsP. Agrawal , B. Narendran , N. Shivakumar
Year1996
CitationProceedings of 9th International Conference on VLSI Design, Bangalore, India, Jan'96
KeywordsVLSI, CAD, Partitioning, Circuit
Abstract:

16. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/behavior.ps
Title: Behavior of Database Production Rules: Termination, Confluence, and Observable Determinism
Authors: A. Aiken, J. Hellerstein, J. Widom
Year: 1992
Citation: Appeared in SIGMOD '92; published while at IBM Almaden; extended journal version appears in static-analysis.ps
Keywords: active database, static analysis, Starburst
Abstract: Static analysis methods are given for determining whether arbitrary sets of database production rules are (1) guaranteed to terminate; (2) guaranteed to produce a unique final database state; (3) guaranteed to produce a unique stream of observable actions. When the analysis determines that one of these properties is not guaranteed, it isolates the rules responsible for the problem and determines criteria that, if satisfied, guarantee the property. The analysis methods are presented in the context of the Starburst Rule System; they will form the basis of an interactive development environment for Starburst rule programmers.

17. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/static-analysis.ps
Title: Static Analysis Techniques for Predicting the Behavior of Database Production Rules
Authors: A. Aiken, J. Hellerstein, J. Widom
Year: 1995
Citation: In ACM TODS, March 1995; extended version of work initially reported in behavior.ps
Keywords: active database, Starburst
Abstract: Methods are given for statically analyzing sets of database production rules to determine if the rules are (1) guaranteed to terminate, (2) guaranteed to produce a unique final database state, and (3) guaranteed to produce a unique stream of observable actions. If the analysis determines that one of these properties is not guaranteed, it isolates the rules responsible for the problem and determines criteria that, if satisfied, guarantee the property. The analysis methods are presented in the context of the Starburst Rule System.
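
The flavor of this kind of analysis can be sketched with a triggering graph: draw an edge from r1 to r2 when r1's action may trigger r2's condition, and report guaranteed termination when the graph is acyclic. The rule set and may-trigger relation below are invented for illustration:

```python
# Cycle detection over a triggering graph (simplified view of the technique).
def has_cycle(graph):
    """graph: dict mapping rule -> set of rules its action may trigger."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {r: WHITE for r in graph}
    def visit(r):
        color[r] = GRAY
        for s in graph[r]:
            if color[s] == GRAY or (color[s] == WHITE and visit(s)):
                return True          # back edge: a triggering cycle exists
        color[r] = BLACK
        return False
    return any(color[r] == WHITE and visit(r) for r in graph)

rules = {
    "maintain_view": {"check_budget"},   # updating the view may wake the check
    "check_budget":  set(),              # raises an alert only
    "log_change":    {"maintain_view"},
}
print("termination guaranteed" if not has_cycle(rules) else
      "possible non-termination: inspect the rules on a cycle")
```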

18. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/metadata.ps
Title: The Stanford Digital Library Metadata Architecture
Authors: M. Baldonado, K. Chang, L. Gravano, A. Paepcke
Year: 1996
Citation: Submitted for publication
Keywords:
Abstract: We describe the requirements and design of a metadata architecture for the Stanford Digital Library project. The overall goal of this project is to provide an infrastructure that affords interoperability among heterogeneous, autonomous digital library services. These services include both search services and remotely usable information processing facilities. We situate our requirements for a metadata architecture in the context of our experience in building this infrastructure and its associated services. Our metadata architecture fits into our established infrastructure and promotes interoperability among existing and de-facto metadata standards. Several pieces of this architecture are implemented; others are under construction. The architecture includes attribute model proxies, attribute model translation services, metadata information facilities for search services, and local metadata repositories. In presenting and discussing the pieces of the architecture, we show how they addres

19. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/delta-relations.ps
Title: Using Delta Relations to Optimize Condition Evaluation in Active Databases
Authors: E. Baralis, J. Widom
Year: 1993
Citation: Stanford Technical Report Stan-CS-93-1495, Nov. 1993; shortened proceedings version appears in delta-relations-rids.ps
Keywords:
Abstract: We give a method for improving the efficiency of condition evaluation during rule processing in active database systems. The method derives, from a rule condition, two improved conditions that can be used in place of the original condition when a previous value (true or false) of the original condition is known. The derived conditions are more efficient than the original condition because they replace references to entire database relations by references to delta relations, which typically are much smaller. Delta relations are accessible to rule conditions in almost all current active database systems, making this optimization broadly applicable. We specify an implementation of our rewriting method based on attribute grammars.
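
A toy illustration of the delta-relation idea (assumptions mine, not the paper's rewriting method): when the previous value of a condition is known, only the small delta needs to be scanned.

```python
# Re-checking a rule condition against inserted tuples only.
def condition(relation):
    """Original condition: does any account have a negative balance?"""
    return any(balance < 0 for _acct, balance in relation)

def condition_after_insert(prev_value, inserted):
    """Derived condition, usable when the previous value is known.
    If it was false, only newly inserted tuples can make it true
    (deletions are ignored in this simplified sketch)."""
    return True if prev_value else condition(inserted)

accounts = [("a", 100), ("b", 40)]          # condition is false here
delta    = [("c", -5)]                      # newly inserted tuples
prev = condition(accounts)                  # one full scan, then remember it
print(condition_after_insert(prev, delta))  # True, found by scanning the delta only
```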

20. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/algebraic-analysis.ps
Title: An Algebraic Approach to Rule Analysis in Expert Database Systems
Authors: E. Baralis, J. Widom
Year: 1994
Citation: Stanford Technical Report Stan-CS-94-1504, Feb. 1994; shortened version appears in analysis-vldb.ps
Keywords: active database, static analysis
Abstract: Expert database systems extend the functionality of conventional database systems by providing a facility for creating and automatically executing Condition-Action rules. While Condition-Action rules in database systems are very powerful, they also can be very difficult to program, due to the unstructured and unpredictable nature of rule processing. We provide methods for static analysis of Condition-Action rules; our methods determine whether a given rule set is guaranteed to terminate, and whether rule execution is confluent (has a guaranteed unique final state). Our methods are based on previous methods for analyzing rules in active database systems. We improve considerably on the previous methods by providing analysis criteria that are much less conservative: our methods often determine that a rule set will terminate or is confluent when previous methods could not. Our improved analysis is based on a "propagation" algorithm, which uses a formal approach based on an extended relational alge

21. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/analysis-vldb.ps
Title: An Algebraic Approach to Rule Analysis in Expert Database Systems
Authors: E. Baralis, J. Widom
Year: 1994
Citation: Appeared in VLDB '94; shortened proceedings version of algebraic-analysis.ps
Keywords: active database, static analysis
Abstract: Expert database systems extend the functionality of conventional database systems by providing a facility for creating and automatically executing Condition-Action rules. While Condition-Action rules in database systems are very powerful, they also can be very difficult to program, due to the unstructured and unpredictable nature of rule processing. We provide methods for static analysis of Condition-Action rules; our methods determine whether a given rule set is guaranteed to terminate, and whether rule execution is confluent (has a guaranteed unique final state). Our methods are based on previous methods for analyzing rules in active database systems. We improve considerably on the previous methods by providing analysis criteria that are much less conservative: our methods often determine that a rule set will terminate or is confluent when previous methods could not. Our improved analysis is based on a "propagation" algorithm, which uses a formal approach based on an extended relational alge

22. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/delta-relations-rids.ps
Title: Using Delta Relations to Optimize Condition Evaluation in Active Databases
Authors: E. Baralis, J. Widom
Year: 1995
Citation: Appeared in RIDS '95; shortened proceedings version of delta-relations.ps
Keywords:
Abstract: We give a method for improving the efficiency of condition evaluation during rule processing in active database systems. The method derives, from a rule condition, two new conditions that can be used in place of the original condition when a previous value (true or false) of the original condition is known. The derived conditions are generally more efficient to evaluate than the original condition because they are incremental--they replace references to entire database relations by references to delta relations, which typically are much smaller. Delta relations are accessible to rule conditions in almost all current active database systems, making this optimization broadly applicable. We describe an attribute grammar based approach that we have used to implement our condition rewriting technique.

23. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/better-static-analysis.ps
Title: Better Static Rule Analysis for Active Database Systems
Authors: E. Baralis, J. Widom
Year:
Citation: Technical report, submitted for journal publication; preliminary version of some of this work appears in algebraic-analysis.ps
Keywords:
Abstract: Rules in active database systems can be very difficult to program, due to the unstructured and unpredictable nature of rule processing. We provide static analysis techniques for predicting whether a given rule set is guaranteed to terminate, and whether rule execution is confluent (guaranteed to have a unique final state). Our methods are based on previous techniques for analyzing rules in active database systems. We improve considerably on the previous techniques by providing analysis criteria that are much less conservative: our methods often determine that a rule set will terminate or is confluent when previous methods could not make this determination. Our improved analysis is based on a "propagation" algorithm, which uses an extended relational algebra to accurately determine when the action of one rule can affect the condition of another, and to determine when rule actions commute. We consider both Condition-Action rules and Event-Condition-Action rules, making our approach widely applic

24. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/1993/data-replication.short.ps
Title: Replicated Data Management in Mobile Environments: Anything New Under the Sun?
Authors: D. Barbara, H. Garcia-Molina
Year: 1993
Citation: IFIP Conference on Applications in Parallel and Distributed Computing, April 1994
Keywords:
Abstract:

25. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/ooindex.ps
Title: Centralized Versus Distributed Index Management in a Page Server OODBMS
Authors: J. Basu, A. Keller
Year: 1995
Citation: Unpublished manuscript
Keywords: OODB, index, simulation
Abstract:

26. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/isolation.ps
Title: Degrees of Transaction Isolation in SQL*Cache: A Predicate-based Client-side Caching System
Authors: J. Basu, A. Keller
Year: 1996
Citation: Unpublished manuscript
Keywords: caching, cache consistency, transaction consistency
Abstract:

27. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/multidatabase.ps
Title: Overview of Multidatabase Transaction Management
Authors: Y. Breitbart, H. Garcia-Molina, A. Silberschatz
Year: 1992
Citation: VLDB Journal, Vol. 1, No. 2, October 1992, pp. 181-23.
Keywords:
Abstract:

28. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/copy.ps
Title: Copy Detection Mechanisms for Digital Documents
Authors: S. Brin, J. Davis, H. Garcia-Molina
Year: 1995
Citation: Appeared in SIGMOD '95
Keywords:
Abstract: In a digital library system, documents are available in digital form and therefore are more easily copied and their copyrights are more easily violated. This is a very serious problem, as it discourages owners of valuable information from sharing it with authorized users. There are two main philosophies for addressing this problem: prevention and detection. The former actually makes unauthorized use of documents difficult or impossible while the latter makes it easier to discover such activity. In this paper we propose a system for registering documents and then detecting copies, either complete copies or partial copies. We describe algorithms for such detection, and metrics required for evaluating detection mechanisms (covering accuracy, efficiency, and security). We also describe a working prototype, called COPS, describe implementation issues, and present experimental results that suggest the proper settings for copy detection parameters.
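
A simplified sketch of chunk-based registration and overlap detection in the spirit of COPS; the sentence-level chunking and the threshold below are assumptions of this example, not the paper's parameters.

```python
# Register a document as a set of chunk hashes; flag a candidate whose
# chunks overlap a registered document beyond a threshold.
import hashlib

def fingerprints(text):
    """Hash each sentence-like chunk of the document."""
    chunks = [c.strip() for c in text.split(".") if c.strip()]
    return {hashlib.sha1(c.encode()).hexdigest() for c in chunks}

def overlap(candidate, registered):
    """Fraction of the candidate's chunks found in the registered document."""
    cand = fingerprints(candidate)
    return len(cand & registered) / len(cand) if cand else 0.0

registry = fingerprints("Copy detection matters. Prevention is the alternative.")
suspect  = "Copy detection matters. This sentence is new."
if overlap(suspect, registry) > 0.3:   # threshold chosen arbitrarily
    print("possible partial copy")
```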

29. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/near.ps
Title: Near Neighbor Search in Large Metric Spaces
Authors: S. Brin
Year: 1995
Citation: Appeared in VLDB '95
Keywords: near neighbor, metric space, approximate queries, data mining, Dirichlet domains, Voronoi regions
Abstract: Given user data, one often wants to find approximate matches in a large database. A good example of such a task is finding images similar to a given image in a large collection of images. We focus on the important and technically difficult case where each data element is high dimensional, or more generally, is represented by a point in a large metric space, and distance calculations are computationally expensive. In this paper we introduce a data structure to solve this problem called a GNAT (Geometric Near-neighbor Access Tree). It is based on the philosophy that the data structure should act as a hierarchical geometrical model of the data as opposed to a simple decomposition of the data that does not use its intrinsic geometry. In experiments, we find that GNATs outperform previous data structures in a number of applications.
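
A full GNAT is beyond a short example, but the metric-space trick it builds on can be sketched: cache distances to a reference point and use the triangle inequality to skip expensive distance computations.

```python
# Pivot-based pruning for range search in an arbitrary metric space.
def build(points, dist, pivot):
    return [(p, dist(p, pivot)) for p in points]   # cache d(p, pivot)

def range_search(index, dist, pivot, query, radius):
    dq = dist(query, pivot)
    hits = []
    for p, dp in index:
        if abs(dq - dp) > radius:      # triangle inequality: d(q,p) >= |dq - dp|
            continue                   # pruned without computing dist(query, p)
        if dist(query, p) <= radius:
            hits.append(p)
    return hits

dist = lambda a, b: abs(a - b)         # any metric works; 1-D keeps it short
index = build([1, 5, 9, 14, 22], dist, pivot=0)
print(range_search(index, dist, pivot=0, query=6, radius=2))  # [5]
```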

30. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/constraint-maintenance.ps
Title: Deriving Production Rules for Constraint Maintenance
Authors: S. Ceri, J. Widom
Year: 1990
Citation: Appeared in VLDB '90; published while at IBM Almaden
Keywords: active database, integrity, Starburst
Abstract: Traditionally, integrity constraints in database systems are maintained either by rolling back any transaction that produces an inconsistent state or by disallowing or modifying operations that may produce an inconsistent state. An alternative approach is to provide automatic "repair" of inconsistent states using production rules. For each constraint, a production rule is used to detect constraint violation and to initiate database operations that restore consistency. We describe an SQL-based language for defining integrity constraints and a framework for translating these constraints into constraint-maintaining production rules. Some parts of the translation are automatic while other parts require user intervention. Based on the semantics of our set-oriented production rules language and under certain assumptions, we prove that at the end of each transaction the rules are guaranteed to produce a state satisfying all defined constraints. We apply our approach to a good-sized example.

31. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/view-maintenance.ps
Title: Deriving Production Rules for Incremental View Maintenance
Authors: S. Ceri, J. Widom
Year: 1991
Citation: Appeared in VLDB '91 (award paper); published while at IBM Almaden
Keywords: active database, Starburst
Abstract: It is widely recognized that production rules in database systems can be used to automatically maintain derived data such as views. However, writing a correct set of rules for efficiently maintaining a given view can be a difficult and ad-hoc process. We provide a facility whereby a user defines a view as an SQL select expression, from which the system automatically derives set-oriented production rules that maintain a materialization of that view. The maintenance rules are triggered by operations on the view's base tables. Generally, the rules perform incremental maintenance: the materialized view is modified according to the sets of changes made to the base tables, which are accessible through logical tables provided by the rule language. However, for some operations substantial recomputation may be required. We give algorithms that, based on key information, perform syntactic analysis on a view definition to determine when efficient maintenance is possible.
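
A minimal sketch of incremental maintenance (mine, not the paper's SQL rule derivation): maintenance "rules" fold only the changed tuples into the materialized view instead of recomputing it.

```python
# Materialize SELECT ... WHERE qty > 2 and keep it current under changes.
base_table = [("widget", 3), ("gear", 5)]
view = {name: qty for name, qty in base_table if qty > 2}

def on_insert(rows):
    """Maintenance 'rule' for inserts: fold only the inserted rows in."""
    for name, qty in rows:
        if qty > 2:
            view[name] = qty

def on_delete(rows):
    """Maintenance 'rule' for deletes: remove only the deleted rows."""
    for name, _qty in rows:
        view.pop(name, None)

on_insert([("bolt", 9), ("pin", 1)])   # only "bolt" qualifies
on_delete([("gear", 5)])
print(view)                            # {'widget': 3, 'bolt': 9}
```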

32. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/par-dist-rules.ps
Title: Production Rules in Parallel and Distributed Database Environments
Authors: S. Ceri, J. Widom
Year: 1992
Citation: Appeared in VLDB '92; published while at IBM Almaden
Keywords: active database, consistency, Starburst
Abstract: In most database systems with production rule facilities, rules respond to operations on centralized data and rule processing is performed in a centralized, sequential fashion. In parallel and distributed database environments, for maximum autonomy it is desirable for rule processing to occur separately at each site (or node), responding to operations on data at that site. However, since rules at one site may read or modify data and interact with rules at other sites, independent rule processing at each site may be impossible or incorrect. We describe mechanisms that allow rule processing to occur separately at each site and guarantee correctness: parallel or distributed rule processing is provably equivalent to rule processing in the corresponding centralized environment. Our mechanisms include locking schemes, communication protocols, and rule restrictions. Based on a given parallel or distributed environment and desired level of transparency, the mechanisms may be combined or may be use

33. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/heterogeneity.ps
Title: Managing Semantic Heterogeneity with Production Rules and Persistent Queues
Authors: S. Ceri, J. Widom
Year: 1993
Citation: Appeared in VLDB '93; published while at IBM Almaden
Keywords: active database, heterogeneous database, Starburst
Abstract: We show that production rules and persistent queues together provide a convenient mechanism for maintaining consistency in semantically heterogeneous multidatabase environments. We describe a specification language and methods for automatically deriving production rules that maintain (1) existence dependencies, in which the presence of data in one database implies the presence of related data in another, and (2) value dependencies, in which the value of data in one database is based on the value of related data in another. The production rules derived from dependency specifications use persistent queues to monitor and maintain the dependencies automatically, asynchronously, incrementally, and correctly.

34. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/deductive-data.ps
Title: Deriving Incremental Production Rules for Deductive Data
Authors: S. Ceri, J. Widom
Year: 1994
Citation: Appeared in Information Systems, 1994
Keywords: active database, deductive database, Starburst
Abstract: We show that the production rule mechanism provided by active database systems can be used to quickly and easily implement the logic rule interface of deductive database systems. Deductive rules specify derived relations using Datalog with built-in predicates and stratified negation; the deductive rules are compiled automatically into production rules. We present a materialized approach, in which the derived relations are stored in the database and the production rules automatically and incrementally propagate base relation changes to the derived relations. We also present a non-materialized approach, in which the production rules compute the derived relations on demand.

35. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/video-long.ps
Title: Minimizing Memory Use in Video Servers
Authors: E. Chang, H. Garcia-Molina
Year: 1996
Citation: Stanford Technical Note SIDL-WP-1996-0050. Submitted for publication
Keywords: multimedia, disk scheduling, memory utilization
Abstract: A number of techniques have been developed for reducing disk latency in a video server, including disk arm scheduling and data placement ones. In this paper we carefully study memory utilization in video servers, leading to enhanced techniques that significantly reduce memory use. This is achieved by introducing artificial delays between IOs. Surprisingly, the delays lead to better overall performance as compared to "traditional" disk latency reduction techniques. We also propose a novel disk scheduling policy that reduces initial latency (i.e., the time between the arrival of a request and the start of the presentation) to a minimum. In addition, we study the memory requirements and costs of a video delivery system, showing how data placement on disk and multiple disks impact memory use. In doing so we show that it may not be advisable to achieve the maximum throughput that a disk can support.

36. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/latency-reduction.ps
Title: Reducing Initial Latency in Multimedia Storage Systems
Authors: E. Chang, H. Garcia-Molina
Year: 1996
Citation: Appeared in MMDBMS '96
Keywords:
Abstract: A multimedia server delivers presentations (e.g., videos, movies), providing high bandwidth and continuous real-time delivery. In this paper we present techniques for reducing the initial latency of presentations, i.e., for reducing the time between the arrival of a request and the start of the presentation. Traditionally, initial latency has not received much attention. This is because one major application of multimedia servers is movies on demand, where a delay of a few minutes before a new multi-hour movie starts is acceptable. However, latency reduction is important in interactive applications such as video games and browsing of multimedia documents. Various latency reduction schemes are proposed and analyzed, and their performance compared. We show that our techniques can significantly reduce (almost eliminate in some cases) initial latency without adversely affecting throughput. Moreover, a novel on-disk partial data replication scheme that we propose proves to be far more cost ef

37. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/bquery.ps
Title: Boolean Query Mapping Across Heterogeneous Information Sources
Authors: K. Chang, H. Garcia-Molina, A. Paepcke
Year: 1995
Citation: To appear in the IEEE Transactions on Knowledge and Data Engineering as part of a special section of concise research papers on Digital Libraries (Aug. 1996).
Keywords: Boolean queries, query translation, information retrieval, heterogeneity, digital libraries, query subsumption, filtering.
Abstract: Searching over heterogeneous information sources is difficult because of the non-uniform query languages. Our approach is to allow a user to compose Boolean queries in one rich front-end language. For each user query and target source, we transform the user query into a subsuming query that can be supported by the source but that may return extra documents. The results are then processed by a filter query to yield the correct final result. In this paper we introduce the architecture and associated algorithms for generating the supported subsuming queries and filters. We show that generated subsuming queries return a minimal number of documents; we also discuss how minimal cost filters can be obtained. We have implemented prototype versions of these algorithms and demonstrated them on heterogeneous Boolean systems.
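
The subsume-then-filter idea can be sketched as follows, assuming a source that supports only conjunctions of keywords (the query encoding and helper names are invented for illustration, not the paper's algorithms):

```python
# Send a weaker (subsuming) query to the source, then filter locally.
def subsuming_keywords(query):
    """For query = ('and', ...) trees, keep only the plain-keyword conjuncts;
    dropping the other conjuncts can only widen the result set."""
    op, *args = query
    assert op == "and"
    return [a for a in args if isinstance(a, str)]

def holds(query, doc):
    """Evaluate the full Boolean query locally (the filter step)."""
    if isinstance(query, str):
        return query in doc
    op, *args = query
    return all(holds(a, doc) for a in args) if op == "and" else \
           any(holds(a, doc) for a in args)

query = ("and", "database", ("or", "boolean", "query"))
docs = ["database boolean systems", "database tuning", "query languages"]
# The source evaluates only the AND-of-keywords part...
from_source = [d for d in docs if all(k in d for k in subsuming_keywords(query))]
# ...and the filter removes the extra documents it returned.
print([d for d in from_source if holds(query, d)])  # ['database boolean systems']
```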

38. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/pred_rewriting.ps
Title: Predicate Rewriting for Translating Boolean Queries in a Heterogeneous Information System
Authors: K. Chang, H. Garcia-Molina, A. Paepcke
Year: 1996
Citation: Unpublished manuscript
Keywords: Boolean queries, query translation, predicate rewriting, information retrieval, heterogeneity, digital libraries, query subsumption, filtering.
Abstract:

39. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/sigmod96.ps
Title: Optimizing Queries over Multimedia Repositories
Authors: S. Chaudhuri, L. Gravano
Year: 1996
Citation: Appeared in SIGMOD '96
Keywords:
Abstract: Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A selection on these attributes will typically produce not just a set of objects, as in the traditional relational query model (filtering), but also a grade of match associated with each object, indicating how well the object matches the selection condition (ranking). Also, multimedia repositories may allow access to the attributes of each object only through indexes. We investigate how to optimize the processing of queries over multimedia repositories. A key issue is the choice of the indexes used to search the repository. We define an execution space that is search-minimal, i.e., the set of indexes searched is minimal. Although the general problem of picking an optimal plan in the search-minimal execution space is NP-hard, we solve the problem efficiently when the predicates in the query are independent. We also show that the problem of optimizing queries that ask for

40. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/cm.ps
Title: Constraint Management in Loosely Coupled Distributed Databases
Authors: S. Chawathe, H. Garcia-Molina, J. Widom
Year: 1993
Citation: Technical report. Detailed version of papers appearing in ICDE 1996 and Data Engg. Bull. 17(2), June 1994
Keywords: constraint management, autonomous systems, heterogeneous systems
Abstract: We provide a framework for managing integrity constraints that span multiple databases in loosely coupled, heterogeneous environments. Our framework enables the formal description of (1) interfaces provided by a database for the data items involved in multi-database constraints; (2) strategies for monitoring and maintaining multi-database constraints; (3) guarantees regarding the consistency of multi-database constraints. With our approach one can define "relaxed" constraints that only hold at certain times or under certain conditions. Such constraints appear often in practice and cannot be handled effectively by conventional, transaction-based approaches. We also describe a toolkit, based on our framework, for enforcing constraints over heterogeneous systems. The toolkit includes a general-purpose, distributed constraint manager that can be easily configured to a given environment and constraints. A first version of the toolkit has been implemented and is under evaluation.

41. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/ind-nes-obj.ps
Title: On Index Selection Schemes for Nested Object Hierarchies
Authors: S. Chawathe, M. Chen, P. Yu
Year: 1993
Citation: VLDB 1994
Keywords: Object-oriented databases, nested objects, path expressions, index selection
Abstract: In this paper we address the problem of devising a set of indexes for a nested object hierarchy in an object-oriented database to improve the overall system performance. It is noted that the effects of two indexes could be entangled in that the inclusion of one index might affect the benefit achievable by the other index. Such a phenomenon is termed index interaction. Clearly, the effect of index interaction needs to be taken into consideration when a set of indexes is being built. The index selection problem is first formulated and four index selection algorithms are evaluated via simulation. The effects of different objective functions, which guide the search in the index selection algorithms, are also investigated. It is shown by simulation results that the greedy algorithm which is devised in light of the phenomenon of index interaction performs fairly well in most cases. Sensitivity analysis for various database parameters is conducted.

42. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/indexing-nested-objects.ps
Title: On Index Selection Schemes for Nested Object Hierarchies
Authors: S. Chawathe, M. Chen, P. Yu
Year: 1993
Citation: Technical report. Detailed version of paper that appears in VLDB 1994
Keywords: Object-oriented databases, nested objects, path expressions, index selection
Abstract: In this paper we address the problem of devising a set of indexes for a nested object hierarchy in an object-oriented database to improve the overall system performance. Note that the effects of two indexes could be entangled in that the inclusion of one index might affect the benefit achievable by the other index. Such a phenomenon is termed index interaction in this paper. Clearly, the effect of index interaction needs to be taken into consideration when a set of indexes is being built. The index selection problem is first formulated and four index selection algorithms are evaluated via simulation. The effects of different objective functions, which guide the search in the index selection algorithms, are also investigated. It is shown by simulation results that the greedy algorithm which is devised in light of the phenomenon of index interaction performs fairly well in most cases. It in fact agrees with the very nature of index interaction we identify in this study. Sensitivity an

43. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/tsimmis-overview.ps
Title: The TSIMMIS Project: Integration of Heterogeneous Information Sources
Authors: S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, J. Widom
Year: 1994
Citation: IPSJ Conference 1994 (Tokyo)
Keywords: Heterogeneous databases, Object Exchange Model, integration
Abstract: The goal of the Tsimmis Project is to develop tools that facilitate the rapid integration of heterogeneous information sources that may include both structured and unstructured data. This paper gives an overview of the project, describing components that extract properties from unstructured objects, that translate information into a common object model, that combine information from several sources, that allow browsing of information, and that manage constraints across heterogeneous sites. Tsimmis is a joint project between Stanford and the IBM Almaden Research Center.

44. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/autonomous-cm.ps
Title: Flexible Constraint Management for Autonomous Distributed Databases
Authors: S. Chawathe, H. Garcia-Molina, J. Widom
Year: 1994
Citation: Data Engineering Bulletin, 17(2), June 1994
Keywords: constraint management, autonomous systems, heterogeneous systems
Abstract: When databases inter-operate, integrity constraints arise naturally. For example, consider a flight reservation application that spans multiple airline databases. Airline A reserves a block of X seats from airline B. If A sells many seats from this block, it tries to increase X. For correctness, the value of X recorded in A's database must be the same as that recorded in B's database; this is a simple distributed copy constraint. However, the databases in the above example are owned by independent airlines and are therefore autonomous. Typically, the database of one airline will not participate in distributed transactions with other airlines, nor will it allow other airlines to lock its data. This renders traditional constra

45. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/tdiff3-8.ps
Title: Change Detection in Hierarchically Structured Information
Authors: S. Chawathe, A. Rajaraman, H. Garcia-Molina, J. Widom
Year: 1995
Citation: SIGMOD 1996
Keywords: change detection, semi-structured data, edit script, difference
Abstract: Detecting and representing changes to data is important for active databases, data warehousing, view maintenance, and version and configuration management. Most previous work in change management has dealt with flat-file and relational data; we focus on hierarchically structured data. Since in many cases changes must be computed from old and new versions of the data, we define the hierarchical change detection problem as the problem of finding a "minimum-cost edit script" that transforms one data tree to another, and we present efficient algorithms for computing such an edit script. Our algorithms make use of some key domain characteristics to achieve substantially better performance than previous, general-purpose algorithms. We study the performance of our algorithms both analytically and empirically, and we describe the application of our techniques to hierarchically structured documents.
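
What an edit script looks like can be sketched with a simplified operation set (insert/delete/update only; the papers' scripts and tree encoding differ, and the dict encoding here is an assumption of this example):

```python
# Apply a small edit script to a labeled tree.
tree = {"id": 1, "label": "doc", "children": [
            {"id": 2, "label": "sec A", "children": []}]}

def find(node, nid):
    """Locate the node with id `nid` by depth-first search."""
    if node["id"] == nid:
        return node
    for c in node["children"]:
        if (hit := find(c, nid)) is not None:
            return hit
    return None

def apply_script(root, script):
    for op in script:
        if op[0] == "insert":                 # ("insert", parent_id, new_node)
            find(root, op[1])["children"].append(op[2])
        elif op[0] == "update":               # ("update", node_id, new_label)
            find(root, op[1])["label"] = op[2]
        elif op[0] == "delete":               # ("delete", parent_id, node_id)
            parent = find(root, op[1])
            parent["children"] = [c for c in parent["children"] if c["id"] != op[2]]

apply_script(tree, [("update", 2, "sec A'"),
                    ("insert", 1, {"id": 3, "label": "sec B", "children": []})])
print([c["label"] for c in tree["children"]])   # ["sec A'", 'sec B']
```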

46. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/tdiff3-2.ps
Title: Change Detection in Hierarchically Structured Information
Authors: S. Chawathe, A. Rajaraman, H. Garcia-Molina, J. Widom
Year: 1995
Citation: Technical report. Detailed version of paper appearing in SIGMOD 1996
Keywords: change detection, semi-structured data, edit script, difference
Abstract: Detecting and representing changes to data is important for active databases, data warehousing, view maintenance, and version and configuration management. Most previous work in change management has dealt with flat-file and relational data; we focus on hierarchically structured data. Since in many cases changes must be computed from old and new versions of the data, we define the hierarchical change detection problem as the problem of finding a "minimum-cost edit script" that transforms one data tree to another, and we present efficient algorithms for computing such an edit script. Our algorithms make use of some key domain characteristics to achieve substantially better performance than previous, general-purpose algorithms. We study the performance of our algorithms both analytically and empirically, and we describe the application of our techniques to hierarchically structured documents.

47. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/cmtoolkit.ps
Title: A Toolkit for Constraint Management in Heterogeneous Information Systems
Authors: S. Chawathe, H. Garcia-Molina, J. Widom
Year: 1995
Citation: ICDE 1996
Keywords: constraint management, autonomous systems, heterogeneous systems
Abstract: We present a framework and a toolkit to monitor and enforce distributed integrity constraints in loosely coupled information systems. Our framework formalizes weakened notions of consistency, which are essential in such environments. Our framework is used to describe (1) interfaces provided by a database for the data items involved in intersite constraints; (2) strategies for monitoring and enforcing such constraints; (3) guarantees regarding the level of consistency the system can provide. Our toolkit uses this framework to provide a set of configurable modules that are used to monitor and enforce constraints spanning loosely coupled heterogeneous information systems.

48. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/bbdifftr.ps
Title: Meaningful Change Detection in Structured Data
Authors: S. Chawathe, H. Garcia-Molina
Year: 1996
Citation: Technical Report
Keywords: change detection, edit scripts, tree editing, diff, deltas, object-oriented databases, evolving data
Abstract: Detecting changes by comparing data snapshots is an important requirement for difference queries, active databases, and version and configuration management. In this paper we focus on detecting meaningful changes in hierarchically structured data, such as nested-object data. This is a much more challenging problem than the corresponding one for relational or flat-file data. In order to describe changes better, we base our work not just on the traditional "atomic" insert, delete, update operations, but also on operations that move an entire sub-tree of nodes, and that copy an entire sub-tree. This allows us to describe changes in a semantically more meaningful way. Since this change detection problem is NP-hard, in this paper we present a heuristic change detection algorithm that yields close to "minimal" descriptions of the changes, and that has fewer restrictions than previous algorithms. Our algorithm is based on transforming the change detection problem to a problem of computing a

49. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/doem.ps
Title: Representing and Querying Changes in Semistructured Data
Authors: S. Chawathe, S. Abiteboul, J. Widom
Year: 1997
Citation: Technical Report
Keywords: Semistructured data, deltas, change queries, standing queries
Abstract: Semistructured data may be irregular and incomplete and does not necessarily conform to a fixed schema. As with structured data, it is often desirable to maintain a history of changes to data, and to query over both the data and the changes. Representing and querying changes in semistructured data is more difficult than in structured data due to the irregularity and lack of schema. We present a model for representing changes in semistructured data and a language for querying over these changes. We discuss implementation strategies for our model and query language. We also describe the design and implementation of a "query subscription service" that permits standing queries over changes in semistructured information sources.

50. (matched words:1)
URL: http://www-db.stanford.edu/pub/papers/iccad94.ps
Title: Multi-way VLSI Circuit Partitioning Based on Dual Net Representation
Authors: J. Cong, W. Labio, N. Shivakumar
Year: 1994
Citation: IEEE International Conference on Computer-Aided Design, Nov '94. (ICCAD'94)
Keywords:
Abstract: In this paper, we study the area-balanced multi-way partitioning problem of VLSI circuits based on a new dual netlist representation named the hybrid dual netlist (HDN). Given a netlist, we first compute a K-way partition of the nets based on the HDN representation, and then transform a K-way net partition into a K-way module partitioning solution. The main contribution of our work is the formulation and solution of the K-way module contention (K-MC) problem, which determines the best assignment of the modules in contention to partitions, while maintaining user-specified area requirements, when we transform the net partition into a module partition. Under a natural definition of binding factor between nets and modules, and preference function between partitions and modules, we show that the K-MC problem can be reduced to a min-cost max-flow problem. We present efficient solutions to the K-MC problem based on network flow computation. Extensive experimental results show that our algorithm consistently outperforms the conventional K-FM partitioning algorithm by a significant margin.
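
The reduction at the heart of the K-MC formulation can be sketched with a standard min-cost max-flow solver; the graph construction below is schematic, and the modules, costs, and capacities are invented for illustration:

```python
# Assign contending modules to partitions via min-cost max-flow (networkx).
import networkx as nx

modules = ["m1", "m2", "m3"]
partitions = {"P1": 2, "P2": 1}            # partition -> remaining area slots
cost = {("m1", "P1"): 1, ("m1", "P2"): 3,  # lower cost = stronger preference
        ("m2", "P1"): 2, ("m2", "P2"): 1,
        ("m3", "P1"): 1, ("m3", "P2"): 2}

G = nx.DiGraph()
for m in modules:
    G.add_edge("s", m, capacity=1, weight=0)         # each module assigned once
    for p in partitions:
        G.add_edge(m, p, capacity=1, weight=cost[m, p])
for p, slots in partitions.items():
    G.add_edge(p, "t", capacity=slots, weight=0)     # respect area budgets

flow = nx.max_flow_min_cost(G, "s", "t")
assignment = {m: p for m in modules for p in partitions if flow[m].get(p)}
print(assignment)   # {'m1': 'P1', 'm2': 'P2', 'm3': 'P1'}
```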

52. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/vlsi.ps
TitleMulti-way VLSI Circuit Partitioning Based on Dual Net Representation
AuthorsJ. Cong , W. Labio , N. Shivakumar
Year1996
CitationIEEE Transactions on CAD, April 1996
Keywords
AbstractIn this paper, we study the area-balanced multi-way partitioning problem of VLSI circuits based on a new dual netlist representation named the hybrid dual netlist (HDN), and propose a general paradigm for multi-way circuit partitioning based on dual net transformation. Given a netlist, we first compute a K-way partitioning of nets based on the HDN representation, and then transform the K-way net partition into a K-way module partitioning solution. The main contribution of our work is in the formulation and solution of the K-way module contention (K-MC) problem, which determines the best assignment of the modules in contention to partitions while maintaining user-specified area requirements, when we transform the net partition into a module partition. Under a natural definition of binding function between nets and modules, and preference function between partitions and modules, we show that the K-MC problem can be reduced to a min-cost max-flow problem. We present an efficient solution to the K-MC problem based on network flow computation. We apply our dual transformation paradigm to the well-known K-way FM partitioning algorithm (K-FM) and show that the new algorithm, named
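
Both versions of this abstract reduce the K-MC assignment step to min-cost max-flow. Below is a hedged sketch of one such reduction using networkx; the modules, areas, budgets, and preference costs are invented, and in this toy formulation a module's area may split across partitions.

```python
# Hedged sketch (invented data, not the authors' code): casting a K-MC-style
# "assign modules in contention to partitions under area budgets" problem as
# min-cost max-flow. networkx is assumed available.
import networkx as nx

modules = {"m1": 2, "m2": 3}                      # module -> area
partitions = {"P1": 4, "P2": 4}                   # partition -> area budget
preference = {("m1", "P1"): 1, ("m1", "P2"): 3,   # lower weight = preferred
              ("m2", "P1"): 2, ("m2", "P2"): 1}

G = nx.DiGraph()
total_area = sum(modules.values())
G.add_node("s", demand=-total_area)               # source pushes all area
G.add_node("t", demand=total_area)                # sink absorbs it
for m, area in modules.items():
    G.add_edge("s", m, capacity=area, weight=0)
for p, budget in partitions.items():
    G.add_edge(p, "t", capacity=budget, weight=0)
for (m, p), cost in preference.items():
    G.add_edge(m, p, capacity=modules[m], weight=cost)

flow = nx.min_cost_flow(G)
for m in modules:                                 # note: a module's area may
    for p, f in flow[m].items():                  # split across partitions in
        if f:                                     # this toy formulation
            print(f"{m} -> {p}: {f} area units")
```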

53. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/book-chapter.ps
TitleActive Database Systems
AuthorsU. Dayal , E. Hanson , J. Widom
Year1994
CitationAppeared in Won Kim book "Modern Database Systems..."
Keywordsoverview, survey
AbstractIntegrating a production rules facility into a database system provides a uniform mechanism for a number of advanced database features including integrity constraint enforcement, derived data maintenance, triggers, alerters, protection, version control, and others. In addition, a database system with rule processing capabilities provides a useful platform for large and efficient knowledge-base and expert systems. Database systems with production rules are referred to as active database systems, and the field of active database systems has indeed been active. This chapter summarizes current work in active database systems; topics covered include active database rule models and languages, rule execution semantics, and implementation issues.

54. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/ifip.ps
TitleA Mechanism and Experimental System for Function-Based Sharing in Federated Databases
AuthorsD. Fang , J. Hammer , D. McLeod
Year1992
CitationIn D.K. Hsiao, E.J. Neuhold, and R. Sacks-Davis, editors, Interoperable Database Systems (DS-5) (A-25), pages 239-253. Elsevier Science Publisher
KeywordsRPC, sharing, function, behavior, schema, evolution
AbstractA function-based approach and mechanism to support sharing among the component database systems in a federation is described. In the context of a functional object-based database model, a technique to support inter-component information unit and behavior sharing is presented. An experimental system that implements the function-based sharing mechanism is described, its underlying algorithms are outlined, and its practical utility and effectiveness are assessed. This work is couched in the framework of the Remote-Exchange research project and experimental system.

55. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/iwdom.ps
TitleAn Approach to Behavior Sharing in Federated Database Systems
AuthorsD. Fang , J. Hammer , D. McLeod
Year1993
CitationIn Distributed Object Management. Morgan Kaufman, 1993, M. T. Ozsu, U.Dayal, and P. Valduriez, editors.
KeywordsRPC, sharing, method, behavior, schema, evolution
AbstractAn approach and mechanism to support the sharing of behavior among the component database systems in a federation is described. In the context of a functional object-based database model, a technique to support intercomponent behavior as well as inter-component information unit sharing is presented. An experimental implementation of the behavior sharing mechanism and algorithms is examined, and its practical utility and effectiveness are assessed. This work is couched in the framework of the Remote-Exchange research project and experimental system.

56. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/compcon.ps
TitleRemote-Exchange: An Approach to Controlled Sharing among Autonomous, Heterogeneous Database Systems
AuthorsD. Fang , J. Hammer , D. McLeod , A. Si
Year1991
CitationIn Proceedings of the IEEE Spring Compcon. IEEE, San Francisco, February 1991.
KeywordsIntegration, heterogeneity, schema, sharing, resolution
AbstractThis short paper describes several aspects of the Remote-Exchange project at USC, which focuses on an approach and experimental system for the controlled sharing and exchange of information among autonomous, heterogeneous database systems. The spectrum of heterogeneity which may exist among the components in a federation of database systems is examined, and an approach to accommodating such heterogeneity is described. An overview of the Remote-Exchange experimental system is provided, including the top level architecture, sharing mechanism, and sharing "advisor".

57. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/nestedSagas.ps
TitleCoordinating Multi-Transaction Activities
AuthorsH. Garcia-Molina , D. Gawlick , J. Klein , K. Kleissner , K. Salem
Year1990
CitationTHIS IS A REVISED VERSION OF TECHNICAL REPORT CS-TR-297-90, DEPARTMENT OF COMPUTER SCIENCE, PRINCETON UNIVERSITY, FEBRUARY 1990. A SHORTER VERSION OF THIS PAPER APPEARED AS "Modeling Long-Running Activities as Nested Sagas," in Database Engineering, Vol. 14, No. 1, March, 1991.
Keywords
AbstractData processing applications must often execute collections of related transactions. We propose a model for structuring and coordinating these multi-transaction activities. The model includes mechanisms for communication between transactions, for compensating transactions after an activity has failed, for dynamic creation and binding of activities, and for checkpointing the progress of an activity.

59. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/tsimmis-abstract.ps
TitleIntegrating and Accessing Heterogeneous Information Sources in TSIMMIS
AuthorsH. Garcia-Molina , J. Hammer , K. Ireland , Y. Papakonstantinou , J. Ullman , J. Widom
Year1994
CitationSubmitted for publication.
Keywords
AbstractA common problem facing many organizations today is that of multiple, disparate information sources and repositories, including databases, object stores, knowledge bases, file systems, digital libraries, information retrieval systems, and electronic mail systems. Decision makers often need information from multiple sources, but are unable to get and fuse the required information in a timely fashion due to the difficulties of accessing the different systems, and due to the fact that the information obtained can be inconsistent and contradictory. The goal of the TSIMMIS project is to provide tools for accessing, in an integrated fashion, multiple information sources. Numerous other recent projects have similar goals and are discussed briefly at the end of th

60. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/tsimmis-models-languages.ps
TitleThe TSIMMIS Approach to Mediation: Data Models and Languages
AuthorsH. Garcia-Molina , Y. Papakonstantinou , D. Quass , A. Rajaraman , Y. Sagiv , J. Ullman , J. Widom
Year1995
CitationTo appear in NGITS (Next Generation Information Technologies and Systems), Nahariya, Israel, June 27-29, 1995.
Keywords
AbstractTSIMMIS -- The Stanford-IBM Manager of Multiple Information Sources -- is a system for integrating information. It offers a data model and a common query language that are designed to support the combining of information from many different sources. It also offers tools for generating automatically the components that are needed to build systems for integrating information. In this paper we shall discuss the principal architectural features and their rationale.

61. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/tsimmis-abstract-aaai.ps
TitleIntegrating and Accessing Heterogeneous Information Sources in TSIMMIS
AuthorsH. Garcia-Molina , J. Hammer , K. Ireland , Y. Papakonstantinou , J. Ullman , J. Widom
Year1995
CitationTo appear in Proceedings of AAAI Spring Symposium on Information Gathering
Keywords
AbstractThe goal of the Tsimmis Project is to develop tools that facilitate the rapid integration of heterogeneous information sources that may include both structured and unstructured data. This paper gives an overview of the project, describing components that extract properties from unstructured objects, that translate information into a common object model, that combine information from several sources, and that allow browsing of information. A common problem facing many organizations today is that of multiple, disparate information sources and repositories, including databases, object stores, knowledge bases, file systems, digital libraries, information retrieval systems, and electronic mail systems. Decision makers often need information from multiple sources, but are unable to get and fuse the required information in a timely fashion due to the difficulties of accessing the different systems, and due to the fact that the information obtained can be inconsisten

62. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/pdis96.ps
TitledSCAM: Finding Document Copies across Multiple Databases
AuthorsH. Garcia-Molina , L. Gravano , N. Shivakumar
Year1996
CitationProceedings of 4th International Conference on Parallel and Distributed Information Systems (PDIS'96), Miami Beach, Florida
Keywords
AbstractThe advent of the Internet has made the illegal dissemination of copyrighted material easy. An important problem is how to automatically detect when a "new" digital document is "suspiciously close" to existing ones. The SCAM project at Stanford University has addressed this problem when there is a single registered-document database. However, in practice, text documents may appear in many autonomous databases, and one would like to discover copies without having to exhaustively search in all databases. Our approach, dSCAM, is a distributed version of SCAM that keeps succinct metainformation about the contents of the available document databases. Given a suspicious document S, dSCAM uses its information to prune all databases that cannot contain any document that is close enough to S, and hence the search can focus on the remaining databases. We also study how to query the remaining databases so as to minimize different querying costs. We empirically study the pruning and searching schemes, using
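
As a rough illustration of metainformation-based pruning (an invented stand-in, not dSCAM's actual closeness measure), each database could export per-word upper bounds on within-document frequency, and any database whose best-case overlap with S falls below the threshold is provably safe to skip:

```python
# Hedged, invented sketch in the spirit of the abstract: prune databases that
# provably cannot contain a document close to suspicious document S.
def prune_databases(db_meta, s_freq, threshold):
    """db_meta: {db_name: {word: upper bound on per-document frequency}}
       s_freq:  {word: frequency in the suspicious document S}"""
    survivors = []
    s_total = sum(s_freq.values())
    for db, bounds in db_meta.items():
        # Best-case overlap any single document in db could have with S
        best_overlap = sum(min(f, bounds.get(w, 0)) for w, f in s_freq.items())
        if best_overlap / s_total >= threshold:
            survivors.append(db)   # might hold a close document: must search
    return survivors

meta = {"dbA": {"copy": 3, "detect": 2}, "dbB": {"music": 5}}
print(prune_databases(meta, {"copy": 2, "detect": 2, "paper": 1}, 0.5))  # ['dbA']
```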

63. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/dataguide.ps
TitleDataGuides: Enabling Query Formulation and Optimization in Semistructured Databases
AuthorsR. Goldman , J. Widom
Year1997
CitationTechnical Report
Keywords
AbstractIn semistructured databases there is no schema fixed in advance. To provide the benefits of a schema in such environments, we introduce DataGuides: concise and accurate structural summaries of semistructured databases. DataGuides serve as dynamic schemas, generated from the database; they are useful for browsing database structure, formulating queries, storing information such as statistics and sample values, and enabling query optimization. This paper presents the theoretical foundations of DataGuides along with algorithms for their creation and incremental maintenance. We provide performance results based on our implementation of DataGuides in the Lore DBMS for semistructured data. We also describe the use of DataGuides in Lore, both in the user interface to enable structure browsing and query formulation, and as a means of guiding the query processor and optimizing query execution.

64. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/stan.cs.tn.93.002.ps
TitleThe Efficacy of GlOSS for the Text Database Discovery Problem
AuthorsL. Gravano , H. Garcia-Molina , A. Tomasic
Year1993
CitationStanford University Technical Note Number STAN-CS-TN-93-002, 1993; part of paper appeared in SIGMOD '94 (stan.cs.tn.93.002.sigmod94.ps)
Keywords
AbstractThe popularity of information retrieval has led users to a new problem: finding which text databases (out of thousands of candidate choices) are the most relevant to a user. Answering a given query with a list of relevant databases is the text database discovery problem. The first part of this paper presents a practical method for attacking this problem based on estimating the result size of a query and a database. The method is termed GlOSS (Glossary of Servers Server). The second part of this paper evaluates GlOSS using four different semantics to answer a user's queries. Real users' queries were used in the experiments. We also describe several variations of GlOSS and compare their efficacy. In addition, we analyze the storage cost of our approach to the problem.
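
The result-size estimation idea can be illustrated with a toy estimator that, like GlOSS, keeps only per-database word statistics; the independence assumption and all numbers below are illustrative, not the paper's exact estimators.

```python
# Hedged illustration (not GlOSS itself): estimate how many documents of a
# database might match a boolean AND query, from per-database word statistics
# only, assuming words occur independently. df = document frequency per word,
# n = number of documents in the database.
def estimate_matches(df: dict, n: int, query_words: list) -> float:
    """Estimate |{docs matching w1 AND w2 AND ...}| under independence."""
    est = float(n)
    for w in query_words:
        est *= df.get(w, 0) / n   # P(doc contains w) under independence
    return est

stats = {"db1": ({"query": 120, "optimizer": 40}, 1000),
         "db2": ({"query": 5, "optimizer": 2}, 500)}
for db, (df, n) in stats.items():
    print(db, estimate_matches(df, n, ["query", "optimizer"]))
# db1 scores 4.8 expected matches, db2 only 0.02: query db1 first.
```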

65. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/stan.cs.tn.93.002.sigmod94.ps
TitleThe Effectiveness of GlOSS for the Text-Database Discovery Problem
AuthorsL. Gravano , H. Garcia-Molina , A. Tomasic
Year1994
CitationAppeared in SIGMOD '94; shortened version of stan.cs.tn.93.002.ps
Keywords
AbstractThe popularity of on-line document databases has led to a new problem: finding which text databases (out of many candidate choices) are the most relevant to a user. Identifying the relevant databases for a given query is the text database discovery problem. The first part of this paper presents a practical solution based on estimating the result size of a query and a database. The method is termed GlOSS (Glossary of Servers Server). The second part of this paper evaluates the effectiveness of GlOSS based on a trace of real user queries. In addition, we analyze the storage cost of our approach.

66. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/stan.cs.tn.94.010.ps
TitlePrecision and Recall of GlOSS Estimators for Database Discovery
AuthorsL. Gravano , H. Garcia-Molina , A. Tomasic
Year1994
CitationStanford University Technical Note Number STAN-CS-TN-94-010, 1994; part of paper appeared in PDIS '94 (stan.cs.tn.94.010.pdis94.ps)
Keywords
AbstractThe availability of large numbers of network information sources has led to a new problem: finding which text databases (out of perhaps thousands of choices) are the most relevant to a query. We call this the text-database discovery problem. Our solution to this problem, GlOSS (Glossary-Of-Servers Server), keeps statistics on the available databases to decide which ones are potentially useful for a given query. In this paper we present different query-result size estimators for GlOSS and we evaluate them with metrics based on the precision and recall concepts of text-document information-retrieval theory. Our generalization of these metrics uses different notions of the set of relevant databases to define different query semantics.

67. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/stan.cs.tn.94.010.pdis94.ps
TitlePrecision and Recall of GlOSS Estimators for Database Discovery
AuthorsL. Gravano , H. Garcia-Molina , A. Tomasic
Year1994
CitationAppeared in PDIS '94; shortened version of stan.cs.tn.94.010.ps
Keywords
AbstractOn-line information vendors offer access to multiple databases. In addition, the advent of a variety of INTERNET tools [1, 2] has provided easy, distributed access to many more databases. The result is thousands of text databases from which a user may choose for a given information need. This paper, an abridged version of [3], presents a framework for (and analyzes a solution to) this problem, which we call the text-database discovery problem (see [3] for a survey of related work). Our solution to the text-database discovery problem is to build a service that can suggest potentially good databases to search. A user's query will go through two steps: first, the query is presented to our server (dubbed GlOSS, for Glossary-Of-Servers Server) to select

68. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/stan.cs.tn.95.21.ps
TitleGeneralizing GlOSS to Vector-Space Databases and Broker Hierarchies
AuthorsL. Gravano , H. Garcia-Molina
Year1995
CitationStanford University Technical Note Number STAN-CS-TN-95-21, May 1995; shortened version appeared in VLDB '95
Keywordsresource discovery, database selection, vector-space retrieval model, information retrieval, text databases
AbstractAs large numbers of text databases have become available on the Internet, it is getting harder to locate the right sources for given queries. In this paper we present gGlOSS, a generalized Glossary-Of-Servers Server, that keeps statistics on the available databases to estimate which databases are the potentially most useful for a given query. gGlOSS extends our previous work [GGMT94a], which focused on databases using the boolean model of document retrieval, to cover databases using the more sophisticated vector-space retrieval model. We evaluate our new techniques using real-user queries and 53 databases. Finally, we further generalize our approach by showing how to build a hierarchy of gGlOSS brokers. The top level of the hierarchy is so small it could be widely replicated, even at end-user workstations.

69. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/vldb95.ps
TitleGeneralizing GlOSS to Vector-Space Databases and Broker Hierarchies
AuthorsL. Gravano , H. Garcia-Molina
Year1995
CitationAppeared in VLDB '95; shortened version of stan.cs.tn.95.21.ps
Keywords
AbstractAs large numbers of text databases have become available on the Internet, it is harder to locate the right sources for given queries. In this paper we present gGlOSS, a generalized Glossary-Of-Servers Server, that keeps statistics on the available databases to estimate which databases are the potentially most useful for a given query. gGlOSS extends our previous work [1], which focused on databases using the boolean model of document retrieval, to cover databases using the more sophisticated vector-space retrieval model. We evaluate our new techniques using real-user queries and 53 databases. Finally, we further generalize our approach by showing how to build a hierarchy of gGlOSS brokers. The top level of the hierarchy is so small it could be widely replicated, even at end-user workstations.

70. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/sigmod97.ps
TitleSTARTS: Stanford Proposal for Internet Meta-Searching
AuthorsL. Gravano , K. Chang , H. Garcia-Molina , A. Paepcke
Year1996
CitationSubmitted for publication
Keywords
AbstractDocument sources are available everywhere, both within the internal networks of organizations and on the Internet. Even individual organizations use search engines from different vendors to index their internal document collections. These search engines are typically incompatible in that they support different query models and interfaces, they do not return enough information with the query results for adequate merging of the results, and finally, in that they do not export metadata about the collections that they index (e.g., to assist in resource discovery). This paper describes STARTS, an emerging protocol for Internet retrieval and search that facilitates the task of querying multiple document sources. STARTS has been developed in a unique way. It is not a standard, but a group effort coordinated by Stanford's Digital Library project, and involving over 11 companies and organizations. The objective of this paper is not only to give an overview of the STARTS protocol proposal, but also t

71. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/cc-federated.ps
TitleProtocols for Integrity Constraint Checking in Federated Databases
AuthorsP. Grefen , J. Widom
Year1994
CitationTechnical report; shortened proceedings version appears in coopis.ps
Keywordsdistributed database, heterogeneous database
AbstractA federated database is comprised of multiple interconnected database systems that primarily operate independently but cooperate to a certain extent. Global integrity constraints can be very useful in federated databases, but the lack of global queries, global transaction mechanisms, and global concurrency control renders traditional constraint management techniques inapplicable. This paper presents a threefold contribution to integrity constraint checking in federated databases: (1) The problem of constraint checking in a federated database environment is clearly formulated. (2) A family of protocols for constraint checking is presented. (3) The differences across protocols in the family are analyzed with respect to system requirements, properties guaranteed by the protocols, and processing and communication costs. Thus, our work yields a suite of options from which a protocol can be chosen to suit the system capabilities and integrity requirements of a particular federated database

72. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/coopis.ps
TitleIntegrity Constraint Checking in Federated Databases
AuthorsP. Grefen , J. Widom
Year1996
CitationTo appear in CoopIS '96; shortened proceedings version of cc-federated.ps
Keywordsdistributed database, heterogeneous database
AbstractA federated database is comprised of multiple interconnected databases that cooperate in an autonomous fashion. Global integrity constraints are very useful in federated databases, but the lack of global queries, global transaction mechanisms, and global concurrency control renders traditional constraint management techniques inapplicable. This paper presents a threefold contribution to integrity constraint checking in federated databases: (1) The problem of constraint checking in a federated database environment is clearly formulated. (2) A family of cooperative protocols for constraint checking is presented. (3) The differences across protocols in the family are analyzed with respect to system requirements, properties guaranteed, and costs involved. Thus, we provide a suite of options with protocols for various environments with specific system capabilities and integrity requirements.

73. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/sigmod93-1.ps
TitleLocal Verification of Global Integrity Constraints in Distributed Databases
AuthorsA. Gupta , J. Widom
Year1993
CitationAppeared in SIGMOD '93
Keywords
AbstractWe present an optimization for integrity constraint verification in distributed databases. The optimization allows a global constraint, i.e. a constraint spanning multiple databases, to be verified by accessing data at a single database, eliminating the cost of accessing remote data. The optimization is based on an algorithm that takes as input a global constraint and data to be inserted into a local database. The algorithm produces a local condition such that if the local data satisfies this condition then, based on the previous satisfaction of the global constraint, the global constraint is still satisfied. If the local data does not satisfy the condition, then a conventional global verification procedure is required.
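
A minimal sketch of the idea, with an invented referential constraint spanning two sites (not the paper's algorithm): if the inserted value already occurs locally in a relation that previously satisfied the global constraint, satisfaction is preserved without any remote access.

```python
# Hedged toy example (schemas invented). Global constraint spanning two sites:
# every part_no in local ORDERS must appear in remote PARTS. Since ORDERS
# previously satisfied the constraint, re-inserting an existing part_no cannot
# violate it, so the check succeeds locally.
def local_test(orders_part_nos: set, inserted_part_no: str) -> bool:
    """True  -> constraint provably still holds (no remote access needed).
       False -> fall back to conventional global verification."""
    return inserted_part_no in orders_part_nos

orders = {"p17", "p42"}
print(local_test(orders, "p42"))  # True: verified locally
print(local_test(orders, "p99"))  # False: must consult remote PARTS
```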

74. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/pods94.ps
TitleConstraint Checking with Partial Information
AuthorsA. Gupta , Y. Sagiv , J. Ullman , J. Widom
Year1994
CitationAppeared in PODS '94
Keywords
AbstractConstraints are a valuable tool for managing information across multiple databases, as well as for general purposes of assuring data integrity. However, efficient implementation of constraint checking is difficult. In this paper we explore techniques for assuring constraint satisfaction without performing a complete evaluation of the constraints. We consider methods that use only constraint definitions, methods that use constraints and updates, and methods that use constraints, updates, and "local" data.

75. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/GP2.ps
TitleGeneralized Projections: A Powerful Approach To Aggregation
AuthorsA. Gupta , V. Harinarayan , D. Quass
Year1995
CitationUnpublished manuscript; full version of the paper that appeared in VLDB '95.
Keywordsaggregation, data warehousing, materialized views, query optimization
AbstractIn this paper we introduce generalized projections (GPs), an extension of duplicate-eliminating projections, that capture aggregations, groupbys, conventional projections with duplicate elimination (distinct), and duplicate-preserving projections in a common unified framework. Using GPs we extend well-known and simple algorithms for SQL queries that use distinct projections to derive algorithms for queries using aggregations like sum, max, min, count, and avg. We develop powerful query rewrite rules for aggregate queries that unify and extend rewrite rules previously known in the literature. We then illustrate the power of our approach by solving a very practical and important problem in data warehousing: how to answer an aggregate query about base tables using materialized aggregate views (summary tables).

76. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/vldb.ps
TitleAggregate-Query Processing in Data Warehousing Environments
AuthorsA. Gupta , V. Harinarayan , D. Quass
Year1995
CitationAppeared in VLDB'95
Keywords
AbstractIn this paper we introduce generalized projections (GPs), an extension of duplicate-eliminating projections, that capture aggregations, groupbys, duplicate-eliminating projections (distinct), and duplicate-preserving projections in a common unified framework. Using GPs we extend well-known and simple algorithms for SQL queries that use distinct projections to derive algorithms for queries using aggregations like sum, max, min, count, and avg. We develop powerful query rewrite rules for aggregate queries that unify and extend rewrite rules previously known in the literature. We then illustrate the power of our approach by solving a very practical and important problem in data warehousing: how to answer an aggregate query on base tables using materialized aggregate views (summary tables).
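
The warehousing application in the last sentence can be illustrated with a toy re-aggregation (tables and numbers invented): a coarser SUM ... GROUP BY is answerable from a finer materialized summary instead of the base tables.

```python
# Hedged illustration: SUM(sales) GROUP BY region computed from a finer
# materialized summary SUM(sales) GROUP BY (region, product), never touching
# the base table. All data is invented.
from collections import defaultdict

summary = {("east", "pen"): 10, ("east", "ink"): 5, ("west", "pen"): 7}

by_region = defaultdict(int)
for (region, product), total in summary.items():
    by_region[region] += total        # SUM re-aggregates cleanly
print(dict(by_region))                # {'east': 15, 'west': 7}
```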

77. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/parc.ps
TitleCommunication Efficient Matrix-Multiplication on Hypercubes.
AuthorsH. Gupta , P. Sadayappan
Year1994
CitationIn Proceedings of the Sixth Annual ACM Symposium on Parallel Algorithms and Architectures, 1994 (SPAA '94). Extended version in Parallel Computing, 22 (1996) 75-99.
Keywords
AbstractDense matrix multiplication is used in a variety of applications and is one of the core components in many scientific computations. The standard way of multiplying two matrices of size n x n requires O(n^3) floating point operations on a sequential machine. Since dense matrix multiplication is computationally expensive, the development of efficient algorithms for large distributed memory machines is of great interest. Matrix multiplication is a very regular computation and lends itself well to parallel implementation. One of the efficient approaches to design other parallel matrix or graph algorithms is to decompose them into a sequence of matrix multiplications [3, 9]. One of the earliest distributed algorithms proposed for matrix multiplication was by Cannon [2] in 1969 for 2-D meshes. Ho, Johnsson, and Edelman in [8] presented a variant of Cannon's algorithm which uses the full bandwidth of a 2-D grid embedded in a hypercube. Some other algorithms are by Dekel, Nassimi
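
For concreteness, here is a hedged sequential simulation of Cannon's algorithm on a p x p block grid (numpy stands in for the distributed machine; this illustrates the classical algorithm cited in the abstract, not the paper's hypercube variant).

```python
# Sequential simulation of Cannon's algorithm; assumes p divides n.
import numpy as np

def cannon_matmul(A, B, p):
    n = A.shape[0]
    b = n // p                                   # block size
    Ab = [[A[i*b:(i+1)*b, j*b:(j+1)*b].copy() for j in range(p)] for i in range(p)]
    Bb = [[B[i*b:(i+1)*b, j*b:(j+1)*b].copy() for j in range(p)] for i in range(p)]
    C = [[np.zeros((b, b)) for _ in range(p)] for _ in range(p)]
    # Initial skew: shift row i of A left by i, column j of B up by j
    Ab = [row[i:] + row[:i] for i, row in enumerate(Ab)]
    Bb = [[Bb[(i + j) % p][j] for j in range(p)] for i in range(p)]
    for _ in range(p):
        for i in range(p):
            for j in range(p):
                C[i][j] += Ab[i][j] @ Bb[i][j]   # local block multiply-add
        Ab = [row[1:] + row[:1] for row in Ab]                            # A left by 1
        Bb = [[Bb[(i + 1) % p][j] for j in range(p)] for i in range(p)]   # B up by 1
    return np.block(C)

A = np.random.rand(6, 6); B = np.random.rand(6, 6)
assert np.allclose(cannon_matmul(A, B, 3), A @ B)
```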

78. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/joa.ps
TitleConstructing Piecewise Linear Homeomorphisms in Simple Polygons.
AuthorsH. Gupta , R. Wenger
Year1995
CitationTo appear in Journal of Algorithms.
Keywords
Abstract

79. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/ver_np.ps
TitleResult Verification Algorithms for Optimization Problems.
AuthorsH. Gupta
Year1995
CitationTechnical Report, UIUC, 1995.
Keywords
AbstractIn this article we discuss the design of result verification algorithms for optimization problems. In particular, we design time-optimal result verification algorithms which verify the solution of all-pairs shortest paths, maximum-flow in a network, and matching problems. We prove that polynomial-time verification algorithms for NP-complete problems do not exist, unless P = NP. Result verification problems for most of the NP-hard problems are not believed to be in NP. We also consider verification algorithms for approximation algorithms for NP-complete and NP-hard problems.

80. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/CubeIndex.ps
TitleIndex Selection for OLAP
AuthorsH. Gupta , V. Harinarayan , A. Rajaraman , J. Ullman
Year1996
CitationUnpublished manuscript.
Keywords
AbstractOn-line analytical processing (OLAP) is a recent and important application of database systems. Typically, OLAP data is presented as a multidimensional "data cube." OLAP queries are complex and can take many hours or even days to run, if executed directly on the raw data. The most common method of reducing execution time is to precompute some of the queries into summary tables (subcubes of the data cube) and then to build indexes on these summary tables. In most commercial OLAP systems today, the summary tables that are to be precomputed are picked first, followed by the selection of the appropriate indexes on them. A trial-and-error approach is used to divide the space available between the summary tables and the indexes. This two-step process can perform very poorly. Since both summary tables and indexes consume the same resource -- space -- their selection should be done together for the most efficient use of space. In this paper, we give algorithms that automate the selection of sum

81. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/CubeIndex.ps
TitleIndex Selection for OLAP
AuthorsH. Gupta , V. Harinarayan , A. Rajaraman
Year
CitationSubmitted
KeywordsViews, indexes, materialized views, selection algorithms
AbstractOn-line analytical processing (OLAP) is a recent and important application of database systems. Typically, OLAP data is presented as a multidimensional "data cube." OLAP queries are complex and can take many hours or even days to run, if executed directly on the raw data. The most common method of reducing execution time is to precompute some of the queries into summary tables (subcubes of the data cube) and then to build indexes on these summary tables. In most commercial OLAP systems today, the summary tables that are to be precomputed are picked first, followed by the selection of the appropriate indexes on them. A trial-and-error approach is used to divide the space available between the summary tables and the indexes. This two-step process can perform very poorly. Since both summary tables and indexes consume the same resource -- space -- their selection should be done together for the most efficient use of space. In this paper, we give algorithms that automate the selection of sum

82. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/garlic-debull.ps
TitleAn Optimizer for Heterogeneous Systems with Non-Standard Data and Search Capabilities
AuthorsL. Haas , D. Kossmann , E. Wimmers , J. Yang
Year1996
CitationIn Special Issue on Query Processing for Non-Standard Data, IEEE Data Engineering Bulletin, 19(4):37-43, December 1996.
Keywords
AbstractMuch of the world's nonstandard data resides in specialized data sources. This data must often be queried together with data from other sources to give users the information they desire. Queries that can span several specialized sources with diverse search capabilities pose new challenges for query optimization. This paper describes the design and illustrates the use of a general purpose optimizer that can be taught about the capabilities of a new data source. Our optimizer produces plans of high quality even in the presence of nonstandard data, strange methods, and unusual query processing capabilities, and it is easy to add new sources of standard or nonstandard data.

83. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/garlic.ps
TitleOptimizing Queries across Diverse Data Sources
AuthorsL. Haas , D. Kossmann , E. Wimmers , J. Yang
Year1997
CitationSubmitted for conference publication
Keywords
AbstractBusinesses today rely on large collections of data stored in diverse systems with differing capabilities. Database middleware systems offer the possibility of inter-relating data from these various sources via a single high-level query interface. One key issue for these systems is how to process queries in an environment in which each source of data may have its own unique set of query processing capabilities. In this paper we present the design of a query optimizer for Garlic [C+95], a middleware system designed to integrate data from a broad range of data sources, with very different query capabilities. Garlic's optimizer extends the rule-based approach of Lohman [Loh88] to work in a heterogeneous environment, by defining generic rules, which produce specific plans using wrapper-provided rules that encapsulate the capabilities of a particular data source. This approach offers great advantages in terms of plan quality, extensibility to new sources, incremental implementation of rul

84. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/ijicis-93.ps
TitleAn Approach to Resolving Semantic Heterogeneity in a Federation of Autonomous, Heterogeneous Database Systems
AuthorsJ. Hammer , D. McLeod
Year1993
CitationIn International Journal of Intelligent & Cooperative Information Systems, World Scientific, Vol. 2, Num. 1, 51-83, 1993, M. Papazoglou and T. Sellis, editors-in-chief.
KeywordsFederated, heterogeneous, semantic, integration, resolution, lexicon
AbstractAn approach to accommodating semantic heterogeneity in a federation of interoperable, autonomous, heterogeneous databases is presented. A mechanism is described for identifying and resolving semantic heterogeneity while at the same time honoring the autonomy of the database components that participate in the federation. A minimal, common data model is introduced as the basis for describing sharable information, and a three-pronged facility for determining the relationships between information units (objects) is developed. Our approach serves as a basis for the sharing of related concepts through (partial) schema unification without the need for a global view of the data that is stored in the different components. The mechanism presented here can be seen in contrast with more traditional approaches such as "integrated databases" or "distributed databases". An experimental prototype implementation has been constructed within the framework of the Remote-Exchange experimental system.

85. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/dbta.ps
TitleObject Discovery and Unification in a Federated Database System
AuthorsJ. Hammer , D. McLeod , A. Si
Year1993
CitationIn Proceedings of the Workshop on Interoperability of Database Systems and Database Applications, pages 3-18. Swiss Information Society, University of Fribourg, Switzerland, October 1993.
KeywordsFederated, semantic, integration, resolution, dictionary, broker
Abstract

86. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/hammer.thesis.ps
TitleResolving Semantic Heterogeneity in a Federation of Autonomous, Heterogeneous Database Systems
AuthorsJ. Hammer
Year1994
CitationTechnical Report USC-CS-94-569, Computer Science Department, University of Southern California, Los Angeles, CA 90089-0781, May 1994. Ph.D. thesis.
KeywordsFederated, heterogeneous, semantic, resolution, lexicon, integration
AbstractResolving Semantic Heterogeneity in a Federation of Autonomous, Heterogeneous Database Systems, by Joachim Hammer. A dissertation presented to the Faculty of the Graduate School, University of Southern California, in partial fulfillment of the requirements for the degree Doctor of Philosophy (Computer Science), August 1994. Copyright 1994 Joachim Hammer.

87. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/hicss.ps
TitleAn Intelligent System for Identifying and Integrating Non-Local Objects in Federated Database Systems
AuthorsJ. Hammer , D. McLeod , A. Si
Year1994
CitationIn Proceedings of the 27th Hawaii International Conference on System Sciences, pages 389-407. Computer Society of the IEEE, University of Hawaii, Hawaii, USA, January 1994.
KeywordsFederated, semantic, integration, resolution, dictionary, broker
AbstractSupport for interoperability among autonomous, heterogeneous database systems is emerging as a key information management problem for the 1990s. A key challenge for achieving interoperability among multiple database systems is to provide capabilities to allow information units and resources to be flexibly and dynamically combined and interconnected while, at the same time, preserving the investment in and autonomy of each existing system. This research specifically focuses on two key aspects: (1) how to discover the location and content of relevant, non-local information units, and (2) how to identify and resolve the semantic heterogeneity that exists between related information in different database components. We demonstrate and evaluate our approach using the Remote-Exchange experimental prototype system, which supports information sharing and exchange from the above perspe

88. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/demo-final.ps
TitleInformation Translation, Mediation, and Mosaic-Based Browsing in the TSIMMIS System
AuthorsJ. Hammer , H. Garcia-Molina , K. Ireland , Y. Papakonstantinou , J. Ullman , J. Widom
Year1995
CitationIn Exhibits Program, SIGMOD '95.
KeywordsWWW, interface, browsing, graphical, prototype
AbstractThe TSIMMIS project [1] provides an architecture and tools for accessing multiple heterogeneous information sources by translating source information into a common self-describing object model, called the Object Exchange Model (OEM) [2]. TSIMMIS provides integrated access to heterogeneous sources through a layer of source-specific translators as well as "intelligent" modules, called mediators. Translators (wrappers) convert queries over information in the common model (OEM) into requests the source can execute. The data returned by the source is converted back into the common model. Mediators are programs that collect information from one or more sources, process and combine it, and e

89. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/whips-overview.ps
TitleThe Stanford Data Warehousing Project
AuthorsJ. Hammer , H. Garcia-Molina , J. Widom , W. Labio , Y. Zhuge
Year1995
CitationIn IEEE Data Engineering Bulletin, June 1995
KeywordsWarehouse, view, integration, source monitor, diff operator,
AbstractThe goal of the data warehousing project at Stanford (the WHIPS project) is to develop algorithms and tools for the efficient collection and integration of information from heterogeneous and autonomous sources, including legacy sources. In this paper we give a brief overview of the WHIPS project, and we describe some of the research problems being addressed in the initial phase of the project.

90. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/chapter.ps
TitleOn the Resolution of Representational Diversity in Multidatabase Systems
AuthorsJ. Hammer , D. McLeod
Year1995
CitationSubmitted for publication
KeywordsMultidatabase system, resolution, semantic, integration, heterogeneity
AbstractA major problem that arises frequently when attempting to support information sharing among autonomous heterogeneous database systems is the occurrence of representational differences that exist among related data objects in different component systems. A collection of cooperating but heterogeneous, autonomous component database systems may be termed a multidatabase system (MDBS) [LMR90], or loosely-coupled federated database system [SL90]. Such cooperating systems attempt to support operations for partial and controlled sharing of data with minimal effect on existing applications. In this context, a key to supporting sharing involves folding of non-local data into the schema of an importing (local) component as gracefully a

91. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/rules-overview.ps
TitleAn Overview of Production Rules in Database Systems
AuthorsE. Hanson , J. Widom
Year1993
CitationAppeared in The Knowledge Engineering Review, June 1993; published while at IBM Almaden
Keywordsactive database, survey
AbstractDatabase researchers have recognized that integrating a production rules facility into a database system provides a uniform mechanism for a number of advanced database features including integrity constraint enforcement, derived data maintenance, triggers, protection, version control, and others. In addition, a database system with rule processing capabilities provides a useful platform for large and efficient knowledge-base and expert systems. Database systems with production rules are referred to as active database systems, and the field of active database systems has indeed been active. This paper summarizes current work in active database systems and suggests future research directions. Topics covered include database rule languages, rule processing semantics, and implementation issues.

92. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/icdt.ps
TitleOptimization Using Tuple Subsumption.
AuthorsV. Harinarayan , A. Gupta
Year1994
CitationAppeared in ICDT 1995
Keywords
AbstractA tuple t1 of relation R subsumes tuple t2 of R, with respect to a query Q, if for every database, tuple t1 derives all, and possibly more, answers to query Q than derived by tuple t2. Therefore, the subsumed tuple t2 can be ignored with respect to Q in the presence of tuple t1 in relation R. This property finds use in a large number of problems. For instance: during query optimization subsumed tuples can be ignored, thereby avoiding the computation of redundant answers; the size of cached information in distributed and object-oriented systems can be reduced by omitting subsumed tuples; constraints need not be checked and rules need not be recomputed when provably subsumed updates are made. We give algorithms for deciding efficiently when a tuple subsumes another tuple for queries that use arbitrary mathematical functions. We characterize queries for which, whenever a set of tuples T subsumes a tuple t, then one of the tuples in T also subsumes t, yielding efficiently verifiable cas
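
A toy instance of the definition (query and tuples invented, not from the paper): for Q(x) :- R(x, y), y > 10, tuple t1 subsumes t2 exactly when every answer t2 contributes is also contributed by t1.

```python
# Hedged toy example of tuple subsumption for Q(x) :- R(x, y), y > 10.
def answers(t):
    """Answers to Q that a single R-tuple t = (x, y) contributes."""
    x, y = t
    return {x} if y > 10 else set()

def subsumes(t1, t2):
    """t1 subsumes t2 w.r.t. Q if t1 yields every answer t2 yields."""
    return answers(t2) <= answers(t1)

print(subsumes(("a", 20), ("a", 15)))  # True: ("a", 15) is redundant for Q
print(subsumes(("a", 5), ("a", 15)))   # False
```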

93. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/GP.ps
TitleGeneralized Projections: A Powerful Query-Optimization Technique.
AuthorsV. Harinarayan , A. Gupta
Year1994
CitationTechnical Note: STAN-CS-TN-94-14
Keywords
Abstract

94. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/cube.ps
TitleImplementing Data Cubes Efficiently
AuthorsV. Harinarayan , A. Rajaraman , J. Ullman
Year1995
CitationAppeared in SIGMOD'96. Best Paper Award.
Keywords
AbstractDecision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total sales. The values of many of these cells are dependent on the values of other cells in the data cube. A common and powerful query optimization technique is to materialize some or all of these cells rather than compute them from raw data each time. Commercial systems differ mainly in their approach to materializing the data cube. In this paper, we investigate the issue of which cells (views) to materialize when it is too expensive to materialize all views. A lattice framework is used to express dependencies among views. We then present greedy algorithms that work off this lattice and determine a good set of views to materialize. The greedy algorithm performs within a small constant factor o
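
The greedy selection over the lattice can be sketched as follows; the tiny lattice, view sizes, and linear cost model are invented for illustration and only approximate the paper's setup.

```python
# Hedged sketch of greedy view selection on a data-cube lattice: repeatedly
# materialize the view with the largest total benefit, where a view's benefit
# is the query-cost reduction it gives every view at or below it.
size = {"psc": 6.0, "pc": 6.0, "ps": 0.8, "sc": 6.0,
        "p": 0.2, "s": 0.01, "c": 0.0001}          # millions of rows (invented)
below = {"psc": {"psc", "pc", "ps", "sc", "p", "s", "c"},
         "pc": {"pc", "p", "c"}, "ps": {"ps", "p", "s"}, "sc": {"sc", "s", "c"},
         "p": {"p"}, "s": {"s"}, "c": {"c"}}       # each view and its descendants

def greedy(k):
    materialized = {"psc"}                         # the top view is always kept
    for _ in range(k):
        def benefit(v):
            gain = 0.0
            for w in below[v]:
                # cheapest materialized ancestor currently answering w
                current = min(size[a] for a in materialized if w in below[a])
                gain += max(current - size[v], 0.0)
            return gain
        best = max((v for v in size if v not in materialized), key=benefit)
        materialized.add(best)
    return materialized

print(sorted(greedy(2)))   # picks 'ps' first, then 'c', under this toy lattice
```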

95. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/fdvsm.ps
TitleExploiting Dependencies to Enhance View Self-Maintainability
AuthorsN. Huyn
Year1996
CitationDetailed version of a paper submitted for publication.
Keywords
AbstractView self-maintenance is the process of incrementally refreshing a materialized view using the view instance and the update to some base relation, but without examining any of the base relations, or using only a specified subset of the base relations. A new problem is created when base data use is limited in view maintenance: can the view be maintained at all? When a base relation is not available for use in maintaining a view, the next-best form of knowledge that may be available is integrity constraints the relation is known to satisfy. Constraints are often available in practice, yet very little is known about how they can be exploited fully to enhance view self-maintainability. The problem becomes even harder when the self-maintenance (SM) tests are required to be in some efficient query form, e.g., SQL queries. In this paper, we focus on the problem of determining view SM in the presence of functional dependencies. We first show that SM of a conjunctive-query (CQ) view can be reduced to a problem of

96. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/cqvsm.ps
TitleEfficient View Self-Maintenance
AuthorsN. Huyn
Year1996
CitationAppeared in Views 96 Workshop.
Keywords
AbstractWe consider the problem of maintaining a materialized view without accessing the base relations. More specifically, we would like to find a maximal test that guarantees that a view is self-maintainable (abbrev. SM) under a given update to the base relations, i.e., can be maintained using only the view definition, its contents, and the update. We observe that SM evaluation can be separated into a view-definition-time portion where a maximal test is generated solely based on the view definition, and an update-time portion where the test can be efficiently applied to the view and the update. We call such a maximal test a Complete Test for View Self-Maintainability (abbrev. CTSM). This paper reports on some interesting new results for conjunctive-query views under insertion updates: 1) the CTSM's are extremely simple queries that look for certain tuples in the view to be maintained; 2) these CTSM's can be generated at view definition time using a very simple algorithm based on the concept of Minimal
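
To make the flavor of such a test concrete, here is an invented example (not the paper's CTSM construction) for a two-way join view under an insertion into one base relation.

```python
# Hedged toy example (schemas invented). View V(a, b, c) = R(a, b) JOIN S(b, c).
# For an insertion r = (a0, b0) into R, the S-tuples joining with r can be
# recovered from V itself whenever b0 already appears in V; otherwise the view
# is not self-maintainable from its own contents.
def self_maintain_insert(view, r):
    a0, b0 = r
    matches = {(b, c) for (_, b, c) in view if b == b0}
    if not matches:
        return None                      # test fails: S must be consulted
    return view | {(a0, b, c) for (b, c) in matches}

V = {("a1", "b1", "c1"), ("a2", "b1", "c2")}
print(self_maintain_insert(V, ("a3", "b1")))  # maintainable from V alone
print(self_maintain_insert(V, ("a3", "b9")))  # None: not self-maintainable
```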

97. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/testing-cqcn.ps
TitleTesting CQCN constraints Under Limited Data Access
AuthorsN. Huyn
Year1996
CitationTechnical Report.
Keywords
Abstract

98. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/cqvsm-tr.ps
TitleEfficient View Self-Maintenance, A TECHNICAL REPORT
AuthorsN. Huyn
Year1996
CitationTechnical report. Detailed version of a paper that appeared in Views 96 Workshop.
Keywords
AbstractWe revisit the problem of finding a maximal test that guarantees that a view can be maintained using only the view, its definition, and the update, regardless of the actual database instance. These maximal tests, which we call Complete Tests for View Self-Maintainability (abbrev. CTSM), are also known in the literature as necessary and sufficient conditions for Conditionally Autonomously Computable Updates ([TB88]). This paper focuses on finding an efficient evaluation of self-maintainability (abbrev. SM). Our working hypothesis is that at least for conjunctive-query (abbrev. CQ) views (i.e., select-project-join views with only equality comparisons), SM evaluation can be separated into a view-definition-time portion where a maximal test is generated solely based on the view definition, and an update-time portion where the test is actually applied to the view and the update. We also conjecture that these tests can be expressed as queries over the view in some traditional query language. The obvious practical significance would

99. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/cqcnclt.ps
TitleEfficient Complete Local Tests for Conjunctive Query Constraints with Negation
AuthorsN. Huyn
Year1996
CitationTo appear in Proceedings of ICDT '97, Springer-Verlag.
Keywords
AbstractWe consider the problem of incrementally checking global integrity constraints without using all the relations under constraint. In many application areas such as collaborative design, mobile computing, and enterprise information systems, total data availability cannot be assumed. Even if all base data is available, some of it may incur such a high cost that its use should only be considered as a last resort. Without looking at all the base data, how can one meaningfully check a constraint for violation? When the constraint is known to be satisfied prior to the update, the state of the relations that are available (aka local) can in principle be used to infer something about the relations that are not available. This observation is the basis for the existence of tests that guarantee that data integrity is preserved under a given update, without looking at all the base data. In order to make integrity maintenance practical, the challenge is to find those tests that are most general (we

100. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/cqcnclt-tr.ps
TitleTesting CQC^\neg constraints under limited data access.
AuthorsN. Huyn
Year1996
CitationTechnical report. Detailed version of a paper to appear in Proceedings of ICDT '97, Springer-Verlag.
Keywords
AbstractIn many important application areas such as collaborative design, mobile computing, and enterprise information systems, total data accessibility cannot be assumed. Even if all data can be accessed, some of them may incur such a high cost that their access should only be considered as a last resort. Efficiently checking if a local update violates a global integrity constraint spanning several relations poses new challenges when not all these relations are to be looked at. Without being given access to all relations under constraint, how can one meaningfully check a constraint for violation? When the constraint is known to be satisfied prior to the update, the state of the accessible relations can in principle be used to infer something about the inaccessible relations. Under limited data accessibility, the "complete local test" (or CLT in short) represents the best strategy to evaluate if the update can potentially violate the constraint. This paper addresses the problem of finding CLT'

101. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/qxlate-pmap.ps
TitleQuery Reformulation under Incomplete Mappings
AuthorsN. Huyn
Year1996
CitationComputer Science Technical Report STAN-CS-TR-96-1576.
Keywords
AbstractThis paper focuses on some of the important new translatability issues that arise in the problem of interoperation between two database schemas when mappings between these schemas are inherently more complex than traditional views or pure Datalog programs can capture. In many cases, sources cannot be redesigned, and mappings among them exhibit some form of incompleteness under which the question of whether a query can be translated across different schemas is not immediately obvious. The notion of query we consider here is the traditional one, in which the answers to a query are required to be definite: answers cannot be disjunctive or conditional and must refer only to domain constants. In this paper, mappings are modeled by Horn programs that allow existential variables, and queries are modeled by pure Datalog programs. We then consider the problem of eliminating functional terms from the answers to a Horn query where function symbols are allowed. We identify a class of Horn queries call

102. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/aggress-view-use.ps
TitleA More Aggressive Use Of Views To Extract Information
AuthorsN. Huyn
Year1996
CitationComputer Science Technical Report STAN-CS-TR-96-1577.
Keywords
AbstractMuch recent work has focused on using views to evaluate queries. More specifically, queries are rewritten to refer to views instead of the base relations over which the queries were originally written. The motivation is that the views represent the only ways in which some information source may be accessed. Another use of views that has been overlooked becomes important especially when no equivalent rewriting of a query in terms of views is possible: even though we cannot use the views to get all the answers to the query, we can still use them to deduce as many answers as possible. In many global information applications, the notion of equivalence used is often too restrictive. We propose a notion of pseudo-equivalence that allows more queries to be rewritten usefully: we show that if a query has an equivalent rewriting, the query also has a pseudo-equivalent rewriting. The converse is not true in general. In particular, when the views are conjunctive, we show that all Datalog queries ove
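
A toy example of partial answers from a view (relations invented): even without an equivalent rewriting, every answer the view yields is a sound answer to the query.

```python
# Hedged illustration. Query Q(x) :- R(x, y). View V(x) :- R(x, y), y > 0.
# V cannot reproduce Q exactly, but every V answer is a correct Q answer.
R = {("a", 1), ("b", -2), ("c", 3)}
V = {x for (x, y) in R if y > 0}     # what the source actually exports
Q_full = {x for (x, y) in R}         # the true answers: {'a', 'b', 'c'}
print(V)                             # {'a', 'c'}: partial but correct
print(V <= Q_full)                   # True: soundness of the partial answers
```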

103. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/zoo.ps
TitleDesktop Experiment Management
AuthorsY. Ioannidis , M. Livny , E. Haber , R. Miller , O. Tsatalos , J. Wiener
Year1993
CitationAppeared in IEEE Data Engineering Bulletin, 16(1), March 1993.
Keywords
AbstractTraditionally, the scale and scope of an experimental study was determined by the ability of the research team to generate data. Over the past few years, we have been experiencing an unprecedented increase in the ability of small teams of experimental scientists to generate data. This has led to a shift in the balance between the different components of an experimental study. Today, it is not uncommon to find a study whose scale and scope have been determined by the ability of the team to manage the data rather than to generate it. Whether the discipline is experimental computer science [4], genetics, earth and space sciences, soil sciences, or high-energy physics, scientists are faced in their daily work with an experiment and data managemen

104. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/gp.ps
TitleCracking and Co-evolving Randomizers
AuthorsJ. Jannink
Year1994
Citationin Advances in Genetic Programming; Ed. Kenneth E. Kinnear, chapter 20, MIT Press, 1994
KeywordsGenetic Programming, Randomizers, Co-evolution
AbstractCracking and Co-evolving Randomizers Jan Jannink Stanford University Computer Science Dept. Stanford, CA 94305 e-mail: jan@cs.stanford.edu

105. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/btree.ps
TitleImplementing Deletion in B+-trees
AuthorsJ. Jannink
Year1995
CitationSIGMOD RECORD, v.24, n.1, p.33-38, 1995
KeywordsB+tree, Deletion, Pseudo-code
AbstractThis paper describes algorithms for key deletion in B+-trees. There are published algorithms and pseudo-code for searching and inserting keys, but deletion, due to its greater complexity and perceived lesser importance, is glossed over completely or left as an exercise to the reader. To remedy this situation, we provide a well-documented flowchart, algorithm, and pseudo-code for deletion, their relation to search and insertion algorithms, and a reference to a freely available, complete B+-tree library written in the C programming language.
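
The paper's flowchart and pseudo-code cover the general case; purely as an illustration of the borrow-then-merge rebalancing it describes, here is a toy Python sketch for a two-level B+-tree (a root of separator keys over leaf nodes). The MIN_KEYS value and the flat list representation are assumptions for the sketch, not the paper's algorithm.

    MIN_KEYS = 2  # minimum keys per leaf (half of a fanout-4 node); an assumption

    def delete(root_keys, leaves, key):
        """Delete `key` from a two-level B+-tree; rebalance by borrow, else merge."""
        # Locate the leaf that should hold the key.
        i = 0
        while i < len(root_keys) and key >= root_keys[i]:
            i += 1
        leaf = leaves[i]
        if key not in leaf:
            return                                  # key absent: nothing to do
        leaf.remove(key)
        if len(leaf) >= MIN_KEYS:
            return                                  # no underflow
        if i > 0 and len(leaves[i - 1]) > MIN_KEYS:          # borrow from left
            leaf.insert(0, leaves[i - 1].pop())
            root_keys[i - 1] = leaf[0]                       # new separator
        elif i < len(leaves) - 1 and len(leaves[i + 1]) > MIN_KEYS:  # borrow from right
            leaf.append(leaves[i + 1].pop(0))
            root_keys[i] = leaves[i + 1][0]                  # new separator
        elif i > 0:                                          # merge into left sibling
            leaves[i - 1].extend(leaf)
            del leaves[i], root_keys[i - 1]
        else:                                                # merge right sibling into us
            leaf.extend(leaves[i + 1])
            del leaves[i + 1], root_keys[i]

    leaves = [[1, 3], [5, 7], [9, 11, 13]]
    root = [5, 9]
    delete(root, leaves, 7)      # leaf [5] underflows and borrows 9 from the right
    print(root, leaves)          # [5, 11] [[1, 3], [5, 9], [11, 13]]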

106. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/profile.ps
TitleData Management for User Profiles in Wireless Communication Systems
AuthorsJ. Jannink , D. Lam , N. Shivakumar , J. Widom , D. Cox
Year1995
CitationTechnical report
KeywordsMobile Communications, User Profile, Location Management, LMT, Hierarchical DB organization, Simulation
AbstractThe explosive growth in wireless communications systems and the demand for advanced mobility features have created novel data management problems. Current schemes to address these problems rely on database organizations that have limited functionality and performance anomalies. We propose a new data management scheme that is flexible and scalable, and that is incrementally deployable so it can coexist with current data management standards. We compare our protocol against current standards and other suggested protocols using realistic calling and mobility patterns. To do so, we have built Pleiades, an extensible event-driven simulator that is easily configurable to arbitrary geographies, networks, calling and mobility patterns, and data management schemes. We present, for the first time, models to closely approximate user calling and mobility patterns. These models are validated against real call traffic and urban vehicle data. Based on simulations for a representative 24-hour period i

107. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/hiper-conf.ps
TitleEfficient and Flexible Location Management Techniques for Wireless Communication Systems
AuthorsJ. Jannink , D. Lam , N. Shivakumar , J. Widom , D. Cox
Year1996
CitationProceedings of 2nd ACM/IEEE International Conference on Mobile Computing and Networking (MOBICOM'96), White Plains, New York, November 1996
KeywordsMobile Communications, User Profile, Location Management, LMT, Hierarchical DB organization, HiPeR family, Simulation
AbstractWe consider the problem of managing the information required to locate users in a wireless communication system, with a focus on designing and evaluating location management techniques that are efficient, scalable, and flexible. The three key contributions of this paper are: (1) A family of location management techniques, HiPER (for Hierarchical ProfilE Replication), that efficiently provide lifelong (non-geographic) numbering with fast location lookup; (2) Pleiades, a scalable event-driven wireless system simulator with realistic calling and mobility patterns derived from several months of real traffic traces; and (3) multi-day simulations comparing our proposed location management techniques with current and previously proposed techniques on a realistic geographical and network topology.

108. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/hiper-journ.ps
TitleEfficient and Flexible Location Management Techniques for Wireless Communication Systems
AuthorsJ. Jannink , D. Lam , N. Shivakumar , J. Widom , D. Cox
Year1996
CitationSubmitted for journal publication
KeywordsMobile Communications, User Profile, Location Management, LMT, Hierarchical DB organization, HiPeR family, Simulation
AbstractWe consider the problem of managing the information required to locate users in a wireless communication system, with a focus on designing and evaluating location management techniques that are efficient, scalable, and flexible. The three key contributions of this paper are: (1) A family of location management techniques, HiPER (for Hierarchical ProfilE Replication), that efficiently provide life-long (non-geographic) numbering with fast location lookup; (2) Pleiades, a scalable event-driven wireless system simulator with realistic calling and mobility patterns derived from several months of real traffic traces; and (3) multi-day simulations comparing our proposed location management techniques with current and previously proposed techniques on a realistic geographical and network topology. Research supported by the Center for Telecommunications and the Center for Integrated Systems at Stanford University and by equipment grants from Digital and IBM Corporations.

109. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/rtdb.ps
TitleAn Overview of Real-Time Database Systems
AuthorsB. Kao , H. Garcia-Molina
Year1993
CitationTo appear in the proceedings of NATO Advanced Study Institute on Real-Time Computing. St. Maarten, Netherlands Antilles, October 9, 1992, Springer-Verlag.
Keywords
AbstractAn Overview of Real-Time Database Systems. Ben Kao (Princeton University, Princeton NJ 08544, USA) and Hector Garcia-Molina (Stanford University, Stanford CA 94305, USA). 1 Introduction Traditionally, real-time systems manage their data (e.g. chamber temperature, aircraft locations) in application-dependent structures. As real-time systems evolve, their applications become more complex and require access to more data. It thus becomes necessary to manage the data in a systematic and organized fashion. Database management systems provide tools for such organization, so in recent years there has been interest in "merging" database and real-time technology. The resulting integrated system, which provides database operations with real-time constraints, is generally called a real-time database system (RTDBS) [1]. Like a conventional database system, a RTDBS functions as a repository of data, provides efficient storage, and performs retrieval and manipulation of information. However, as

110. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/sda1.ps
TitleDeadline Assignment in a Distributed Soft Real-Time System
AuthorsB. Kao , H. Garcia-Molina
Year1993
Citationin the proceedings of the 13th International Conference on Distributed computing Systems. 1993.
Keywords
AbstractIn a distributed environment, tasks often have processing demands on multiple different sites. A distributed task is usually divided up into several subtasks, each one to be executed at some site in order. In a real-time system, an overall deadline is usually specified by an application designer indicating when a distributed task is to be finished. However, the problem of how a global deadline is automatically translated to the deadline of each individual subtask has not been well studied. This paper examines (through simulations) four strategies for subtask deadline assignment in a distributed soft real-time environment. Keywords: soft real-time, distributed systems, deadline assignment, scheduling.
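
As a rough illustration of what subtask deadline assignment means, the Python sketch below shows two generic policies for dividing a global deadline D among serial subtasks with expected execution times ex. The formulas (even slack division and division proportional to expected cost) are common-sense stand-ins, not necessarily the four strategies the paper evaluates.

    def equal_slack(D, ex, now=0.0):
        """Spread the total slack evenly across subtasks."""
        slack = D - now - sum(ex)
        deadlines, t = [], now
        for e in ex:
            t += e + slack / len(ex)
            deadlines.append(t)
        return deadlines

    def proportional(D, ex, now=0.0):
        """Give each subtask a share of the window proportional to its cost."""
        window, total = D - now, sum(ex)
        deadlines, t = [], now
        for e in ex:
            t += window * e / total
            deadlines.append(t)
        return deadlines

    print(equal_slack(10.0, [2.0, 1.0, 3.0]))   # [3.33..., 5.66..., 10.0]
    print(proportional(10.0, [2.0, 1.0, 3.0]))  # [3.33..., 5.0, 10.0]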

111. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/dual_net.ps
TitleSoft Real-Time Communication Over Dual Non-Real-Time Networks
AuthorsB. Kao , H. Garcia-Molina
Year1993
Citationin the proceedings of the IEEE 1st International Workshop on Parallel and Distributed Real-Time Systems (1993).
Keywords
AbstractIn this paper we consider systems with redundant communication paths, and show how applications can exploit the redundancy to improve real-time communications. We consider two policies, one that evenly distributes load, and one that partitions load according to packet slackness. We evaluate the effectiveness of these policies through analysis and simulation.
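
A toy sketch of the two policies under made-up delay and threshold figures: even distribution alternates packets across the two networks, while slackness-based partitioning sends packets with little slack (deadline minus current time minus expected delay) on one network and the rest on the other.

    def route_even(packets):
        # Policy 1: distribute load evenly by alternating between the networks.
        return [(name, i % 2) for i, (name, _) in enumerate(packets)]

    def route_by_slack(packets, now, expected_delay=0.2, threshold=0.5):
        # Policy 2: partition by slackness; urgent packets go to network 0.
        routes = []
        for name, deadline in packets:
            slack = deadline - now - expected_delay
            routes.append((name, 0 if slack < threshold else 1))
        return routes

    packets = [("audio", 1.0), ("bulk", 5.0)]
    print(route_even(packets))               # [('audio', 0), ('bulk', 1)]
    print(route_by_slack(packets, now=0.5))  # [('audio', 0), ('bulk', 1)]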

112. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/sda2.ps
TitleSubtask Deadline Assignment for Complex Distributed Soft Real-Time Tasks
AuthorsB. Kao , H. Garcia-Molina
Year1993
CitationTechnical Report STAN-CS-93-1491, Stanford University, Oct. 1993.
Keywords
AbstractComplex distributed tasks often involve parallel execution of subtasks at different nodes. To meet the deadline of a global task, all of its parallel subtasks have to be finished on time. Compared to a local task (which involves execution at only one node), a global task may have a much harder time making its deadline because it is fairly likely that at least one of its subtasks runs into an overloaded node. Another problem with complex distributed tasks occurs when a global task consists of a number of serially executing subtasks. In this case, we have the problem of dividing up the end-to-end deadline of the global task and assigning the pieces to the intermediate subtasks. In this paper, we study both of these problems. Different algorithms for assigning deadlines to subtasks are presented and evaluated. Keywords: soft real-time, distributed systems, parallel systems, deadline assignment, scheduling.

113. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/dist_srt.ps
TitleOn Building Distributed Soft Real-Time Systems
AuthorsB. Kao , H. Garcia-Molina , B. Adelberg
Year1994
Citation
Keywords
AbstractWhen building a distributed real-time system, one can either build the whole system from scratch, or from pre-existing standard components. Although the former allows better scheduling design, it is not economical in terms of the cost and time of development. This paper studies the performance of distributed soft real-time systems that use standard components with various scheduling algorithms and suggests ways to improve them. Keywords: soft real-time, distributed systems, deadline assignment, priority assignment, scheduling.

114. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/view-cc-theory.ps
TitleConcurrency Control Theory for Deferred Materialized Views
AuthorsA. Kawaguchi , D. Lieuwen , I. Mumick , D. Quass , K. Ross
Year1996
CitationTo appear in ICDT'97
Keywords
AbstractWe consider concurrency control problems that arise in the presence of materialized views. Consider a database system supporting materialized views to speed up queries. For a range of important applications (e.g., banking, billing, network management), transactions that access materialized views would like to get some consistency guarantees--if a transaction reads a base relation after an update, and then reads a materialized view derived from the base relation, it expects to see the effect of the base update on the materialized view. If a transaction reads two views, it expects that the two views reflect a single consistent database state. Such guarantees are not easy to obtain, as materialized views become inconsistent upon updates to base relations. Immediate maintenance reestablishes consistency within the transaction that updates the base relation, but this consistency comes at the cost of delaying update transactions. Deferred maintenance has been proposed to avoid penalizing update

115. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/distributed-transactions.ps
TitleMaking Trust Explicit in Distributed Commerce Transactions
AuthorsS. Ketchpel , H. Garcia-Molina
Year1996
CitationAppeared in 16th International Conference on Distributed Computing Systems (DCS96), pp. 270-281
KeywordsDistributed commerce transactions, interaction graph, sequencing graph, trust, indemnities
AbstractIn a distributed environment where nodes are independently motivated, many transactions or commercial exchanges may be stymied due to a lack of trust between the participants. The addition of trusted intermediaries may facilitate some exchanges, but others are still problematic. We introduce a language for specifying these commercial exchange problems, and sequencing graphs, a formalism for determining whether a given exchange may occur. We also present an algorithm for generating a feasible execution sequence of pairwise exchanges between parties (when it exists). Indemnities may be offered to facilitate previously infeasible transactions. We show when and how they enable commercial transactions.

116. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/UPAI.ps
TitleU-PAI: A Universal Payment Application Interface
AuthorsS. Ketchpel , H. Garcia-Molina , A. Paepcke , S. Hassan , S. Cousins
Year1996
CitationAppeared in The Second USENIX Workshop on Electronic Commerce Proceedings. 1996. p. 105 - 121.
Keywordspayment mechanism, interoperation, AccountHandle, API
AbstractThe progress of electronic commerce has been stymied by the lack of widely accepted network payment mechanisms. A number of proposals have been put forward, and each one offers a slightly different protocol and set of features. Yet none has achieved the critical mass to become an accepted standard. We believe that there will continue to be a variety of payment mechanisms, so in this paper we propose U-PAI, a universal payment application interface that will enable a programmer to write for one interface, and then interact with any payment mechanism. Each payment mechanism can support the universal API directly, or a proxy or wrapper can be built to translate U-PAI calls to the appropriate native calls supported by the payment mechanisms. In this paper we illustrate how two such proxies could be built. We also provide, in the appendix, a full CORBA specification of U-PAI.
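
A hypothetical sketch of the U-PAI idea in Python rather than CORBA: application code targets one abstract payment interface, and a per-mechanism proxy translates the universal call into native ones. All class and method names below are invented for illustration; the actual operations are given in the paper's CORBA specification.

    from abc import ABC, abstractmethod

    class PaymentInterface(ABC):
        """The universal interface every mechanism (or its proxy) supports."""
        @abstractmethod
        def pay(self, payee_account, amount): ...

    class FooPayNative:
        """A stand-in 'native' payment mechanism with its own API."""
        def send_coins(self, dest, cents):
            print(f"FooPay: {cents} cents -> {dest}")

    class FooPayProxy(PaymentInterface):
        """Wrapper translating the universal call to FooPay's native call."""
        def __init__(self, native):
            self.native = native
        def pay(self, payee_account, amount):
            self.native.send_coins(payee_account, int(amount * 100))

    def checkout(payment: PaymentInterface, account, amount):
        payment.pay(account, amount)     # mechanism-independent application code

    checkout(FooPayProxy(FooPayNative()), "merchant-42", 9.99)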

117. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/aaai97.ps
TitleDistributed Commerce Transactions
AuthorsS. Ketchpel , H. Garcia-Molina
Year1997
CitationTech Report. 1997.
KeywordsDistributed Commerce Transactions, multi-agent systems, safe execution sequences, trusted intermediaries
AbstractIn situations where self-interested agents are interacting in an environment of distrust, commercial exchanges may be blocked due to a lack of trust. We propose a fully distributed algorithm that each agent may run to provide guarantees about the outcomes of such exchanges. The algorithm is shown in operation on two examples, one feasible and one not. Introduction In a multi-agent system with many information sources covering a vast range of topics, an agent with a query or information request might enlist the help of a number of brokers and sources in answering the query. These other agents may add value by combining results, or having better knowledge about or access to relevant sources. Information brokers may obtain documents from other sources and re-sell them to customers, who may in turn be brokering the documents to someone else. In exchange for provided services, these agents expect payment. The environment is one of distrust, so that a customer will not give payment before b

118. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/ijcai97.ps
TitleDistributed Commerce Transactions with Timing Deadlines and Direct Trust
AuthorsS. Ketchpel , H. Garcia-Molina
Year1997
CitationTech Report. 1997.
KeywordsDistributed Commerce Transactions, multi-agent systems, deadlines, direct trust
AbstractIn a multi-party transaction such as fulfilling an information request from multiple sources (also called a distributed commerce transaction), agents face risks from dealing with untrusted agents. These risks are compounded in the face of deadlines, e.g., an agent may fail to deliver purchased goods by the time the goods are needed. We present a distributed algorithm that mitigates these risks, showing that it is sound (produces only safe multi-agent action sequences) and complete (finds a safe sequence whenever one exists). We also show how the algorithm may be extended so that agents may interact directly with other participants rather than through a trusted intermediary. Introduction The explosion of networked information sources leads to the sense that the answer to any question is out there, if only one has access to the right combination of sources. Search engines and information brokers help navigate the space, yet adding information services and sources which require payment exposes a number of haza

119. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/shoppingmodels.ps
TitleShopping Models: A Flexible Architecture for Information Commerce
AuthorsS. Ketchpel , H. Garcia-Molina , A. Paepcke
Year1997
CitationTech Report. 1997. Stanford Digital Libraries Project Working Paper SIDL-WP-1996-0052.
KeywordsShopping Models, electronic commerce, subscription, pay-per-view, Payment Monitor, Delivery Monitor, Order Monitor, U-PAI
AbstractIn a digital library, there are many different interaction models between customers and information providers or merchants. Subscriptions, sessions, pay-per-view, shareware, and pre-paid vouchers are different models that each have different properties. A single merchant may use several of them. Yet if a merchant wants to support multiple models, there is a substantial amount of work to implement each one. In this paper, we formalize the shopping models which represent these different modes of consumer to merchant interaction. In addition to developing the overall architecture, we define the application program interfaces (API) to interact with the models. We show how a small number of primitives can be used to construct a wide range of shopping models that a digital library can support, and provide examples of the shopping models in operation, demonstrating their flexibility.

120. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/view.ps
TitlePhysical Database Design for Data Warehousing
AuthorsW. Labio , D. Quass , B. Adelberg
Year1996
CitationTechnical Note
Keywords
AbstractData warehouses collect copies of information from remote sources into a single database. Since the remote data is cached at the warehouse, it appears as local relations to the users of the warehouse. To improve query response time, the warehouse administrator (WHA) will often materialize views defined on the local relations to support common or complicated queries. Unfortunately, the requirement to keep the views consistent with the local relations creates additional overhead when the remote sources change. The warehouse is often kept only loosely consistent with the sources: it is periodically refreshed with changes sent from the source. When this happens, the warehouse is taken off-line until the local relations and materialized views can be updated. Clearly, the users would prefer as little down time as possible. Often the down time can be reduced by adding carefully selected materialized views or indexes to the physical schema. This paper studies how to select the sets of supporting views

121. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/window-short.ps
TitleEfficient Snapshot Differential Algorithms in Data Warehousing
AuthorsW. Labio , H. Garcia-Molina
Year1996
CitationVLDB, 1996
Keywords
AbstractDetecting and extracting modifications from information sources is an integral part of data warehousing. For unsophisticated sources, it is often necessary to infer modifications by periodically comparing snapshots of data from the source. Although this snapshot differential problem is closely related to traditional joins, there are significant differences, which lead to simple new algorithms. In particular, we present algorithms that perform compression of records. We also present a window algorithm that works very well if the snapshots are not "very different." The algorithms are studied via analysis and an implementation of two of them; the results illustrate the potential gains achievable with the new algorithms.
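
To make the idea concrete, here is a small Python sketch of a window-based snapshot differential, assuming records are (key, value) pairs and the two snapshots arrive in roughly the same order. The window size and the rule that records evicted unmatched are reported as deletes (old side) or inserts (new side) are assumptions, not the paper's exact algorithm.

    from itertools import zip_longest

    def window_diff(old, new, window=4):
        """Return (inserts, deletes, updates) between two snapshot files."""
        old_buf, new_buf = {}, {}            # unmatched records seen so far
        inserts, deletes, updates = [], [], []

        def drain(buf, out, limit):
            while len(buf) > limit:          # evict the oldest unmatched record
                k = next(iter(buf))
                out.append((k, buf.pop(k)))

        for o, n in zip_longest(old, new):   # one sequential pass over both
            if o is not None:
                k, v = o
                if k in new_buf:             # already seen on the new side
                    if new_buf.pop(k) != v:
                        updates.append((k, dict(new)[k] if False else v))
                else:
                    old_buf[k] = v
            if n is not None:
                k, v = n
                if k in old_buf:
                    if old_buf.pop(k) != v:
                        updates.append((k, v))   # record the new value
                else:
                    new_buf[k] = v
            drain(old_buf, deletes, window)
            drain(new_buf, inserts, window)
        drain(old_buf, deletes, 0)           # leftovers were deleted
        drain(new_buf, inserts, 0)           # leftovers were inserted
        return inserts, deletes, updates

    old = [("a", 1), ("b", 2), ("c", 3)]
    new = [("a", 1), ("c", 4), ("d", 5)]
    print(window_diff(old, new))  # ([('d', 5)], [('b', 2)], [('c', 4)])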

122. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/window.ps
TitleEfficient Snapshot Differential Algorithms in Data Warehousing
AuthorsW. Labio , H. Garcia-Molina
Year1996
CitationTechnical Note
Keywords
AbstractDetecting and extracting modifications from information sources is an integral part of data warehousing. For unsophisticated sources, in practice it is often necessary to infer modifications by periodically comparing snapshots of data from the source. Although this snapshot differential problem is closely related to traditional joins and outerjoins, there are significant differences, which lead to simple new algorithms. In particular, we present algorithms that perform (possibly lossy) compression of records. We also present a window algorithm that works very well if the snapshots are not "very different." The algorithms are studied via analysis and an implementation of two of them; the results illustrate the potential gains achievable with the new algorithms.

123. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/expire.ps
TitleExpiring Data from a Warehouse
AuthorsW. Labio , H. Garcia-Molina
Year1997
CitationTechnical Report
KeywordsData Warehouse, Storage Management, View Maintenance
AbstractData warehouses are used to collect and analyze data from remote sources. The data collected often originate from transactional information and can become very large. This paper presents a framework for incrementally removing warehouse data (without a need to fully recompute), offering two choices. One is to expunge data, in which case the result is as if the data had never existed. The second is to expire data, in which case views defined over the data are not necessarily affected. Using the framework, a user or administrator can specify what data to expire or expunge, what auxiliary data is to be kept for facilitating incremental view maintenance, what type of updates are expected from external sources, and how the system should compensate when data is expired or other parameters changed. We present algorithms for the various expiration and compensation actions, and we show how our framework can be implemented on top of a conventional RDBMS. Keywords: view maintenance, data warehouse

124. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/view-short.ps
TitlePhysical Database Design in Data Warehousing
AuthorsW. Labio , D. Quass , B. Adelberg
Year1997
CitationICDE
KeywordsData Warehousing, Views, Physical Database Design
AbstractData warehouses collect copies of information from remote sources into a single database. Since the remote data is cached at the warehouse, it appears as local relations to the users of the warehouse. To improve query response time, the warehouse administrator will often materialize views defined on the local relations to support common or complicated queries. Unfortunately, the requirement to keep the views consistent with the local relations creates additional overhead when the remote sources change. The warehouse is often kept only loosely consistent with the sources: it is periodically refreshed with changes sent from the source. When this happens, the warehouse is taken off-line until the local relations and materialized views can be updated. Clearly, the users would prefer as little down time as possible. Often the down time can be reduced by adding carefully selected materialized views or indexes to the physical schema. This paper studies how to select the sets of supporting view

125. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/whips-demo-sigmod97
TitleThe WHIPS Prototype for Data Warehouse Creation and Maintenance
AuthorsW. Labio , Y. Zhuge , J. Wiener , H. Gupta , H. Garcia-Molina , J. Widom
Year1997
CitationDemonstration description for SIGMOD 1997 and ICDE 1997.
Keywordsdemo, data warehousing, whips
Abstract

126. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/modeling.ps
TitleModeling Location Management in Personal Communications Services
AuthorsD. Lam , J. Jannink , D. Cox , J. Widom
Year1995
CitationTechnical report
KeywordsMobile Communications, Simulation, Topology Model, Call Model, Movement Model, IS-41
AbstractThis paper presents a framework for modeling and evaluating the performance of location management schemes for a Personal Communication Services (PCS) network. The models take into account complex human behaviors, and are based on actual measurements of these behaviors. Real measurements are a crucial factor in achieving realistic performance evaluations. We have developed a powerful and highly configurable simulator based on the framework. Simulation results, showing the performance of an existing location management standard, IS-41, are presented in terms of database performance requirements (transactions per second) and network traffic load (messages per

127. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/teletraffic.ps
TitleTeletraffic Modeling for Personal Communications Services
AuthorsD. Lam , D. Cox , J. Widom
Year1996
CitationTechnical report
Keywordsmobile computing, wireless computing, data modeling
AbstractThis paper presents a realistic teletraffic modeling framework for Personal Communications Services. The framework captures complex human behaviors and has been validated through analysis of actual call and mobility data. Using the proposed framework, a large-scale simulation was performed on a model of the San Francisco Bay Area. Simulation results showing the performance of IS-41 are presented.

128. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/icupc96.ps
TitleModeling Location Management in Personal Communications Services
AuthorsD. Lam , J. Jannink , D. Cox , J. Widom
Year1996
CitationTo appear in ICUPC '96, Cambridge MA
KeywordsMobile Communications, Simulation, Topology Model, Call Model, Movement Model, IS-41
AbstractModeling Location Management in Personal Communications Services. Derek Lam, Jan Jannink, Donald C. Cox, Jennifer Widom. Electrical Engineering & Computer Science Depts., Stanford University, Stanford, CA 94305. {dlam, dcox}@wireless.stanford.edu, {jan, widom}@cs.stanford.edu Abstract-- This paper presents a realistic modeling framework for evaluating the performance of location management schemes in PCS networks. The framework captures complex human behaviors and has been validated through analysis of actual call and mobility data. Simulation results, showing the performance of IS-41, are presented. I. Introduction Personal Communications Services (PCS) [2] presents many challenging problems in network data management [6;18]. A key problem in this area is location management. Location management refers to accessing and maintaining user information for call routing purposes. Important per-user information, such as current location, authentication information, and billing informati

129. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/lsw-cluster.ps
TitleClustering Association Rules
AuthorsB. Lent , A. Swami , J. Widom
Year1996
CitationTechnical report
Keywordsdata mining, association rules, clustering, segmentation, database
AbstractWe consider the problem of clustering two-dimensional association rules in large databases. We present a geometric-based algorithm, BitOp, for performing the clustering, embedded within an association rule clustering system, ARCS. Association rule clustering is useful when the user desires to segment the data. We measure the quality of the segmentation generated by ARCS using the Minimum Description Length (MDL) principle of encoding the clusters on several databases including noise and errors. Scale-up experiments show that ARCS, using the BitOp algorithm, scales linearly with the amount of data.
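
BitOp itself works with bitwise operations on bitmap rows; the Python sketch below illustrates only the clustering idea, segmenting the set bits of a two-dimensional rule bitmap (bit (x, y) set when the rule "X = x implies Y = y" holds) into rectangular clusters with a plain flood fill.

    from collections import deque

    def clusters(grid):
        """Return bounding boxes of connected groups of set bits."""
        rows, cols = len(grid), len(grid[0])
        seen = [[False] * cols for _ in range(rows)]
        rects = []
        for r in range(rows):
            for c in range(cols):
                if grid[r][c] and not seen[r][c]:
                    # BFS over connected set bits, tracking a bounding box.
                    q = deque([(r, c)])
                    seen[r][c] = True
                    lo_r, hi_r, lo_c, hi_c = r, r, c, c
                    while q:
                        cr, cc = q.popleft()
                        lo_r, hi_r = min(lo_r, cr), max(hi_r, cr)
                        lo_c, hi_c = min(lo_c, cc), max(hi_c, cc)
                        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                            nr, nc = cr + dr, cc + dc
                            if (0 <= nr < rows and 0 <= nc < cols
                                    and grid[nr][nc] and not seen[nr][nc]):
                                seen[nr][nc] = True
                                q.append((nr, nc))
                    rects.append(((lo_r, lo_c), (hi_r, hi_c)))
        return rects

    grid = [[1, 1, 0, 0],
            [1, 1, 0, 0],
            [0, 0, 0, 1]]
    print(clusters(grid))  # [((0, 0), (1, 1)), ((2, 3), (2, 3))]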

130. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/external-processors.ps
TitleAnswering Queries Using Limited External Query Processors
AuthorsA. Levy , A. Rajaraman , J. Ullman
Year1996
CitationAppears in PODS 1996.
Keywords
AbstractWhen answering queries using external information sources, their contents can be described by views. To answer a query, we must rewrite it using the set of views presented by the sources. When the external information sources also have the ability to answer some (perhaps limited) sets of queries that require performing operations on their data, the set of views presented by the source may be infinite (albeit encoded in some finite manner). Previous work on answering queries using views has only considered the case where the set of views is finite. In order to exploit the ability of information sources to answer more complex queries, we consider the problem of answering conjunctive queries using infinite sets of views. Our first result is that an infinite set of views can be partitioned into a finite number of equivalence classes, such that picking one view from every nonempty class is sufficient to determine whether the query can be answered using the views. Second, we show how to compute the se

131. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/manifold.ps
TitleQuerying Heterogeneous Information Sources Using Source Descriptions
AuthorsA. Levy , A. Rajaraman , J. Ordille
Year1996
CitationTo appear in VLDB 1996.
Keywords
AbstractWe witness a rapid increase in the number of structured information sources that are available online, especially on the WWW. These sources include commercial databases on product information, stock market information, real estate, automobiles, and entertainment. We would like to use the data stored in these databases to answer complex queries that go beyond keyword searches. We face the following challenges: (1) Several information sources store interrelated data, and any query-answering system must understand the relationships between their contents. (2) Many sources are not full-featured database systems and can answer only a small set of queries over their data (for example, forms on the WWW restrict the set of queries one can ask). (3) Since the number of sources is very large, effective techniques are needed to prune the set of information sources accessed to answer a query. (4) The details of interacting with each source vary greatly. We describe the Information Manifold, an implemen

132. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/aaai.ps
TitleQuery Answering Algorithms for Information Agents
AuthorsA. Levy , A. Rajaraman , J. Ordille
Year1996
CitationTo appear in AAAI 1996.
Keywords
AbstractThe database theory community, centered around the PODS (Principles of Database Systems) conference, has had a long-term interest in logic as a way to represent "data," "information," and "knowledge" (take your pick on the term -- it boils down to facts or atoms and rules, usually Horn clauses). The approach of this community has been "slow and steady," preferring to build up carefully from simple special cases to more general ideas, always paying attention to how efficiently we can process queries and perform other operations on the facts and rules. A powerful theory has developed, and it is beginning to have some impact on applications, especially information-integration engines. Datalog The term Datalog has been coined to refer to Prolog-like rules without function symbols, treated as a logic program. Unlike Prolog, however, the conventional least-fixed-point semantics of the rules is used whenever possible. Example 1: The rules for ancestors can be written as the following Datalog program. anc(X

133. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/external97.ps
TitleIntegrating Dynamically-Fetched External Information into a DBMS for Semistructured Data
AuthorsJ. McHugh , J. Widom
Year1997
CitationTechnical Report
KeywordsSemistructured Databases, Heterogeneous Databases, External data, Dynamic Integration
AbstractWe describe the external data manager component of the Lore database system for semistructured data. Lore's external data manager enables dynamic retrieval and integration of data from arbitrary, heterogeneous external sources during query processing. The distinction between Lore-resident and external data is invisible to the user. We introduce a flexible notion of arguments that limits the amount of data fetched from an external source, and we have incorporated optimizations to reduce the number of calls to an external source.
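
A sketch of the external-object idea under invented names (the wrapper protocol below is not Lore's actual interface): a wrapper fetches from the external source only when queried, keys the fetch on its arguments to limit what is retrieved, and caches results to reduce repeated calls.

    class ExternalObject:
        """Lazily fetch external data, one remote call per argument set."""
        def __init__(self, fetch_fn):
            self.fetch_fn = fetch_fn
            self.cache = {}

        def value(self, *args):
            if args not in self.cache:          # fetch only on first use
                self.cache[args] = self.fetch_fn(*args)
            return self.cache[args]

    calls = 0
    def stock_quote(symbol):
        """Stand-in for a remote source; counts how often it is contacted."""
        global calls
        calls += 1
        return {"IBM": 101.5, "HP": 52.25}[symbol]

    quotes = ExternalObject(stock_quote)
    print(quotes.value("IBM"), quotes.value("IBM"), calls)  # 101.5 101.5 1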

134. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/kyoto.ps
TitleThe Identification and Resolution of Semantic Heterogeneity
AuthorsD. McLeod , D. Fang , J. Hammer
Year1991
CitationIn Proceedings of First International Workshop on Interoperability in Multidatabase Systems. Kyoto, Japan, April 1991.
KeywordsSemantic, heterogeneity, resolution, integration, schema
AbstractThis paper describes several aspects of the Remote-Exchange project at USC, which focuses on the controlled sharing and exchange of information among autonomous, heterogeneous database systems. The spectrum of heterogeneity which may exist among the components in a federation of database systems is examined, and an approach to accommodating such heterogeneity is described. An overview of the Remote-Exchange experimental system is provided.

135. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/cube-maint.ps
TitleMaintenance of Data Cubes and Summary Tables in a Warehouse
AuthorsI. Mumick , D. Quass , B. Mumick
Year1996
CitationTechnical note
Keywordsview maintenance, aggregation, summary table, data warehousing, olap
AbstractData warehouses contain large amounts of information, often collected from a variety of independent sources. Decision-support functions in a warehouse, such as on-line analytical processing, involve hundreds of complex aggregate queries over large volumes of data. It is not feasible to compute these queries by scanning the data sets each time. Warehouse applications therefore build a number of summary tables, or materialized aggregate views, to help them increase the system performance. As changes, most notably new transactional data, are collected at the data sources, all summary tables at the warehouse that depend upon this data need to be updated. Usually, source changes are loaded into the warehouse at regular intervals, usually once a day, in a batch window, and the warehouse is made unavailable for querying while it is updated. Since the number of summary tables that need to be maintained is often large, a critical issue for data warehousing is how to maintain the summary tables

136. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/cube-maint-sigmod.ps
TitleMaintenance of Data Cubes and Summary Tables in a Warehouse
AuthorsI. Mumick , D. Quass , B. Mumick
Year1997
CitationTo appear in SIGMOD 1997; shortened proceedings version of cube-maint.ps
Keywordsview maintenance, aggregation, summary table, data warehousing, olap
AbstractData warehouses contain large amounts of information, often collected from a variety of independent sources. Decision-support functions in a warehouse, such as on-line analytical processing, involve hundreds of complex aggregate queries over large volumes of data. It is not feasible to compute these queries by scanning the data sets each time. Warehouse applications therefore build a large number of summary tables, or materialized aggregate views, to help them increase the system performance. As changes, most notably new transactional data, are collected at the data sources, all summary tables at the warehouse that depend upon this data need to be updated. Usually, source changes are loaded into the warehouse at regular intervals, usually once a day, in a batch window, and the warehouse is made unavailable for querying while it is updated. Since the number of summary tables that need to be maintained is often large, a critical issue for data warehousing is how to maintain the summary ta

137. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/variant-indexes.ps
TitleImproved Query Performance with Variant Indexes
AuthorsP. Neil , D. Quass
Year1996
CitationFull version of variant-sigmod.ps.
Keywordsindexes, indexing, bitmap, data warehousing, olap
AbstractThe read-mostly environment of data warehousing makes it possible to use more complex indexes to speed up queries than in situations where concurrent updates are present. The current paper presents a short review of current indexing technology, including row-set representation by Bitmaps, and then introduces two approaches we call Bit-Sliced indexing and Projection indexing. A Projection index basically materializes all values of a column in RID order, and a Bit-Sliced index essentially takes an orthogonal bit-by-bit view of the same data. While some of these concepts started with the MODEL 204 product, and both Bit-Sliced and Projection indexing are now fully realized in Sybase IQ, this is the first rigorous examination of such indexing capabilities in the literature. We compare algorithms that become feasible with these variant index types against algorithms using more conventional indexes. The analysis demonstrates important performance advantages for variant indexes in some types
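
As an illustration of the Bit-Sliced idea, the Python sketch below evaluates SUM(column) over a selected row set: slice i is a bitmap holding bit i of every row's value, so the sum is, over all slices i, 2^i times the population count of (slice_i AND foundset). Python integers stand in for the bitmaps; this shows the arithmetic, not a storage-level implementation.

    def build_slices(values, bits=8):
        """Slice i gets bit i of every row's value, indexed by RID."""
        slices = [0] * bits
        for rid, v in enumerate(values):
            for i in range(bits):
                if v >> i & 1:
                    slices[i] |= 1 << rid     # set row `rid` in slice i
        return slices

    def bitsliced_sum(slices, foundset):
        """SUM over the rows in `foundset` using only bitmap operations."""
        return sum(bin(s & foundset).count("1") << i
                   for i, s in enumerate(slices))

    values = [5, 12, 7, 9]          # column values in RID order
    slices = build_slices(values)
    foundset = 0b1011               # rows 0, 1, 3 satisfy the WHERE clause
    print(bitsliced_sum(slices, foundset))  # 5 + 12 + 9 = 26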

138. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/variant-sigmod.ps
TitleImproved Query Performance with Variant Indexes
AuthorsP. Neil , D. Quass
Year1997
CitationTo appear in SIGMOD 1997; shortened proceedings version of variant-indexes.ps
Keywordsindexes, indexing, bitmap, data warehousing, olap
AbstractThe read-mostly environment of data warehousing makes it possible to use more complex indexes to speed up queries than in situations where concurrent updates are present. The current paper presents a short review of current indexing technology, including row-set representation by Bitmaps, and then introduces two approaches we call Bit-Sliced indexing and Projection indexing. A Projection index materializes all values of a column in RID order, and a Bit-Sliced index essentially takes an orthogonal bit-by-bit view of the same data. While some of these concepts started with the MODEL 204 product, and both Bit-Sliced and Projection indexing are now fully realized in Sybase IQ, this is the first rigorous examination of such indexing capabilities in the literature. We compare algorithms that become feasible with these variant index types against algorithms using more conventional indexes. The analysis demonstrates important performance advantages for variant indexes in some types of SQL aggreg

139. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/representative-object.ps
TitleRepresentative Objects: Concise Representations of Semistructured, Hierarchical Data
AuthorsS. Nestorov , J. Ullman , J. Wiener , S. Chawathe
Year1997
CitationAppeared in ICDE'97.
Keywordssemistructured data, schema discovery
AbstractIn this paper we introduce the representative object, which uncovers the inherent schema(s) in semistructured, hierarchical data sources and provides a concise description of the structure of the data. Semistructured data, unlike data stored in typical relational or object-oriented databases, does not have a fixed schema that is known in advance and stored separately from the data. With the rapid growth of the World Wide Web, semistructured hierarchical data sources are becoming widely available to the casual user. The lack of external schema information currently makes browsing and querying these data sources inefficient at best, and impossible at worst. We show how representative objects make schema discovery efficient and facilitate the generation of meaningful queries over the data.

140. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/icde95.ps
TitleObject Exchange Across Heterogeneous Information Sources
AuthorsY. Papakonstantinou , H. Garcia-Molina , J. Widom
Year1995
CitationAppeared in ICDE '95; shortened proceedings version of pub/papakonstantinou/1994/object-exchange-heterogeneous-is.ps; one cut-and-paste figure missing from ps file
Keywordsdata model, semistructured
AbstractWe address the problem of providing integrated access to diverse and dynamic information sources. We explain how this problem differs from the traditional database integration problem and we focus on one aspect of the information integration problem, namely information exchange. We define an object-based information exchange model and a corresponding query language that we believe are well suited for integration of diverse information sources. We describe how the model and language have been used to integrate heterogeneous bibliographic information sources. We also describe two general-purpose libraries we have implemented for object exchange between clients and servers.

141. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/cbr.ps
TitleCapabilities-Based Query Rewriting in Mediator Systems
AuthorsY. Papakonstantinou , A. Gupta , L. Haas
Year1995
CitationIBM Almaden Technical Report.
Keywords
AbstractUsers today are struggling to integrate a broad range of information sources that provide different levels of query capabilities. Currently, data sources with different and limited capabilities are accessed either by writing rich functional wrappers for the more primitive sources, or by dealing with all sources at a "lowest common denominator". This paper explores a third approach, in which a mediator ensures that sources receive queries they can handle, while still taking advantage of all the query power of the source. We propose an architecture that enables this, and identify a key component of that architecture, the Capabilities-Based Rewriter (CBR). We provide a language to describe the query capability of data sources. The CBR takes as input a description of each data source's capability. Given a query posed to the mediator, the CBR determines the component queries to be sent to the sources, commensurate with their capabilities. It also computes a plan for combining the results of the component queri

142. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/oem.ps
TitleObject Exchange Across Heterogeneous Information Sources
AuthorsY. Papakonstantinou , H. Garcia-Molina , J. Widom
Year1994
CitationTechnical report; shortened version appeared in ICDE '95; one cut-and-paste figure missing from ps file
Keywordsdata model, semistructured
AbstractWe address the problem of providing integrated access to diverse and dynamic information sources. We explain how this problem differs from the traditional database integration problem and we focus on one aspect of the information integration problem, namely information exchange. We define an object-based information exchange model and a corresponding query language that we believe are well suited for integration of diverse information sources. We describe how the model and language have been used to integrate heterogeneous bibliographic information sources. We also describe two general-purpose libraries we have implemented for object exchange between clients and servers.

143. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/querying-full.ps
TitleQuerying Semistructured Heterogeneous Information
AuthorsD. Quass , A. Rajaraman , Y. Sagiv , J. Ullman , J. Widom
Year1994
CitationFull version of querying.ps
KeywordsLORE, tsimmis, semistructured, object-oriented
AbstractSemistructured data has no absolute schema fixed in advance and its structure may be irregular or incomplete. Such data commonly arises in sources that do not impose a rigid structure (such as the World-Wide Web) and when data is combined from several heterogeneous sources. Data models and query languages designed for well structured data are inappropriate in such environments. Starting with a "lightweight" object model adopted for the TSIMMIS project at Stanford, in this paper we describe a query language and object repository designed specifically for semistructured data. Our language provides meaningful query results in cases where conventional models and languages do not: when some data is absent, when data does not have regular structure, when similar concepts are represented using different types, when heterogeneous sets are present, and when object structure is not fully known. This paper motivates the key concepts behind our approach, describes the language through a series of

144. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/querying.ps
TitleQuerying Semistructured Heterogeneous Information
AuthorsD. Quass , A. Rajaraman , Y. Sagiv , J. Ullman , J. Widom
Year1995
CitationAppeared in the Proceedings of the 4th International Conference on Deductive and Object-Oriented Databases (DOOD) 1995
KeywordsLORE, tsimmis, semistructured, object-oriented
AbstractSemistructured data has no absolute schema fixed in advance and its structure may be irregular or incomplete. Such data commonly arises in sources that do not impose a rigid structure (such as the World-Wide Web) and when data is combined from several heterogeneous sources. Data models and query languages designed for well structured data are inappropriate in such environments. Starting with a "lightweight" object model adopted for the TSIMMIS project at Stanford, in this paper we describe a query language and object repository designed specifically for semistructured data. Our language provides meaningful query results in cases where conventional models and languages do not: when some data is absent, when data does not have regular structure, when similar concepts are represented using different types, when heterogeneous sets are present, and when object structure is not fully known. This paper motivates the key concepts behind our approach, describes the language through a series of

145. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/self-maint.ps
TitleMaking Views Self-Maintainable for Data Warehousing
AuthorsD. Quass , A. Gupta , I. Mumick , J. Widom
Year1996
CitationFull version of pub/quass/1996/self-maint-pdis.ps
Keywordsdata warehousing, view
AbstractA data warehouse stores materialized views over data from one or more sources in order to provide fast access to the integrated data, regardless of the availability of the data sources. Warehouse views need to be maintained in response to changes to the base data in the sources. Except for very simple views, maintaining a warehouse view requires access to data that is not available in the view itself. Hence, to maintain the view, one either has to query the data sources or store auxiliary data in the warehouse. We show that by using key and referential integrity constraints, we often can maintain a select-project-join view when there are insertions, deletions, and updates to the base relations without going to the data sources or replicating the base relations in their entirety in the warehouse. We derive a set of auxiliary views such that the warehouse view and the auxiliary views together are self-maintainable--they can be maintained without going to the data sources or replicating

146. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/self-maint-pdis.ps
TitleMaking Views Self-Maintainable for Data Warehousing
AuthorsD. Quass , A. Gupta , I. Mumick , J. Widom
Year1996
CitationTo appear in Parallel and Distributed Information Systems (PDIS) 1996; shortened proceedings version of pub/quass/1996/self-maint.ps
Keywordsdata warehousing, view
AbstractA data warehouse stores materialized views over data from one or more sources in order to provide fast access to the integrated data, regardless of the availability of the data sources. Warehouse views need to be maintained in response to changes to the base data in the sources. Except for very simple views, maintaining a warehouse view requires access to data that is not available in the view itself. Hence, to maintain the view, one either has to query the data sources or store auxiliary data in the warehouse. We show that by using key and referential integrity constraints, we often can maintain a select-project-join view without going to the data sources or replicating the base relations in their entirety in the warehouse. We derive a set of auxiliary views such that the warehouse view and the auxiliary views together are self-maintainable--they can be maintained without going to the data sources or replicating all base data. In addition, our technique can be applied to simplify tra

147. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/sm.ps
TitleMaking Views Self-Maintainable for Data Warehousing
AuthorsD. Quass , A. Gupta , I. Mumick , J. Widom
Year1996
CitationTechnical Report. Earlier version of pub/quass/1996/self-maint.ps that takes a slightly different approach.
Keywordswarehousing, materialized, view
AbstractA data warehouse stores materialized views over data from one or more sources in order to provide fast access to the integrated data, regardless of the availability of the data sources. Warehouse views need to be maintained in response to changes to the base data in the sources. Except for very simple views, maintaining a warehouse view requires access to data that is not available in the view itself. Hence, to maintain the view, one either has to query the data sources or store auxiliary data in the warehouse. We show that by using key and referential integrity constraints, we often can maintain a select-project-join view with respect to insertions, deletions, and updates to the base relations, without going to the data sources or replicating the base relations in their entirety in the warehouse. We derive a set of auxiliary views such that the warehouse view and the auxiliary views together are self-maintainable--they can be maintained without going to the data sources or replicating all

148. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/online.ps
TitleOn-Line Warehouse View Maintenance for Batch Updates
AuthorsD. Quass , J. Widom
Year1996
CitationTechnical note
Keywordswarehousing, materialized, view
AbstractData warehouses store materialized views over base data from external sources. Clients typically perform complex read-only queries on the views. The views are refreshed periodically by maintenance transactions, which propagate large batch updates from the base tables. In current warehousing systems, maintenance transactions usually are isolated from client read activity, limiting availability and/or size of the warehouse. We describe an algorithm called 2VNL that allows warehouse maintenance transactions to run concurrently with readers. By logically maintaining two versions of the database, no locking is required and serializability is guaranteed. We present our algorithm, explain its relationship to other multi-version concurrency control algorithms, and describe how it can be implemented on top of a conventional relational DBMS using a query rewrite approach.
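
A toy sketch of the two-version idea (the row layout and names below are illustrative, not 2VNL's actual schema): each row keeps its current value, its previous value, and the version at which it was last written, so readers pinned to version v still see the pre-maintenance state while maintenance installs version v+1.

    class TwoVersionTable:
        def __init__(self):
            self.rows = {}            # key -> (value, prev_value, version)
            self.current = 0          # version visible to newly started readers

        def reader_version(self):
            return self.current

        def write(self, key, value):  # runs inside the maintenance transaction
            old = self.rows.get(key, (None, None, 0))
            self.rows[key] = (value, old[0], self.current + 1)

        def commit_maintenance(self):
            self.current += 1         # atomically switch readers to the new version

        def read(self, key, v):
            value, prev, version = self.rows[key]
            return prev if version > v else value

    t = TwoVersionTable()
    t.write("x", 10); t.commit_maintenance()
    v = t.reader_version()            # a reader starts at version 1
    t.write("x", 20)                  # maintenance writes version 2 concurrently
    print(t.read("x", v))             # 10: the reader still sees version 1
    t.commit_maintenance()
    print(t.read("x", t.reader_version()))  # 20: new readers see version 2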

149. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/view-aggr.ps
TitleMaintenance Expressions for Views with Aggregation
AuthorsD. Quass
Year1996
CitationAppeared in Views 96 workshop, June 1996
Keywordswarehousing, materialized, view, aggregation
AbstractMaterialized views, especially views involving aggregation, are often used in environments such as data warehouses to speed up the evaluation of complex queries. Because the views can be quite complex, as changes are made to the underlying base data it is usually better to incrementally maintain a view by propagating changes to base data onto the view than to recompute the view from scratch. Griffin and Libkin [GL95] provide a framework for deriving incremental view maintenance expressions for a view with duplicate semantics, where the view can be defined using any number of select, project, join, bag union, and monus bag-algebra operators. However, aggregation was considered only in a limited sense; for example, aggregation with groupby attributes was not allowed. We extend the framework of [GL95] to include maintaining views with aggregation in the general case, where aggregation can include groupby attributes and a view can include several aggregate operators.
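
For the flavor of incremental maintenance with groupby aggregation (a far simpler case than the paper's general framework), the sketch below applies a batch of base insertions and deletions to a COUNT/SUM view rather than recomputing it, dropping a group when its count reaches zero.

    def apply_delta(view, inserted, deleted):
        """Maintain {group: [count, sum]} under inserted/deleted (group, x) tuples."""
        for g, x in inserted:
            cnt_sum = view.setdefault(g, [0, 0])
            cnt_sum[0] += 1
            cnt_sum[1] += x
        for g, x in deleted:
            cnt_sum = view[g]
            cnt_sum[0] -= 1
            cnt_sum[1] -= x
            if cnt_sum[0] == 0:       # last tuple of the group is gone
                del view[g]
        return view

    view = {"a": [2, 30], "b": [1, 5]}
    print(apply_delta(view, inserted=[("a", 10)], deleted=[("b", 5)]))
    # {'a': [3, 40]}  -- group 'b' disappears from the view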

150. (matched words:2)
URLhttp://www-db.stanford.edu/pub/papers/lore-demo.ps
TitleLORE: A Lightweight Object REpository for Semistructured Data
AuthorsD. Quass , J. Goldman , K. Haas , Q. Luo , J. McHugh , S. Nestorov , A. Rajaraman , H. Rivero , S. Abiteboul , J. Ullman , J. Wiener
Year1996
CitationPage accompanying demo appearing in SIGMOD '96
KeywordsLORE, tsimmis, semistructured, object-oriented
AbstractLORE: A Lightweight Object REpository for Semistructured Data Dallan Quass, Jennifer Widom, Roy Goldman, Kevin Haas, Qingshan Luo, Jason McHugh, Svetlozar Nestorov, Anand Rajaraman, Hugo Rivero, Serge Abiteboul, Jeff Ullman, Janet Wiener Stanford University Database Group, http://db.stanford.edu The number of information sources accessible electronically is growing rapidly. Many of these sources store and export unstructured data in addition to or instead of structured data. In most cases, however, the unstructured data is not entirely devoid of structure, i.e., the data is semistructured. We consider data to be semistructured when there is no schema fixed or known in advance and when the data may be incomplete or irregular. For example, HTML files on the World-Wide Web usually contain some structure, but often the data is irregular or incomplete. In addition, data integrated from multiple, heterogeneous information sources often is semistructured. Storing and querying semistructured data poses

151. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/limited-opsets.ps
TitleAnswering Queries using Templates with Binding Patterns
AuthorsA. Rajaraman , Y. Sagiv , J. Ullman
Year1995
CitationAppears in PODS 1995.
Keywords
AbstractWhen integrating heterogeneous information resources, it is often the case that the source is rather limited in the kinds of queries it can answer. If a query is asked of the entire system, we have a new kind of optimization problem, in which we must try to express the given query in terms of the limited query templates that this source can answer. For the case of conjunctive queries, we show how to decide with a nondeterministic polynomial-time algorithm whether the given query can be answered. We then extend our results to allow arithmetic comparisons in the given query and in the templates. I. Motivation A data-integration system such as Tsimmis (Papakonstantinou, Garcia, and Widom [1994], Chawathe et al. [1994]) translates information sources of arbitrary type into a common data model and language. If a source is an SQL database, then its interface with the Tsimmis system is fairly clear: one can ask it SQL queries over its database schema and nothing else. However, the source may

152. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/outerjoin.ps
TitleIntegrating Information by Outerjoins and Full Disjunctions
AuthorsA. Rajaraman , J. Ullman
Year1996
CitationAppears in PODS 1996.
Keywords
AbstractOur motivation is the piecing together of tidbits of information found on the "web" into a usable information structure. The problem is related to that of computing the natural outerjoin of many relations in a way that preserves all possible connections among facts. Such a computation has been termed a "full disjunction" by Galindo-Legaria. We are thus led to ask the question of when a full disjunction can be computed by some sequence of natural outerjoins. The answer involves a concept from Fagin [1983] called "gamma-acyclic hypergraphs." We prove that there is a natural outerjoin sequence producing the full disjunction if and only if the set of relation schemes forms a connected, gamma-acyclic hypergraph. I. Motivation Let us imagine we are constructing an information resource that accepts queries about university information. We gather our information from the on-line information found at the various universities and their schools or departments, and we integrate the information into an object
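
The natural (full) outerjoin the paper builds on can be sketched directly: tuples that agree on the shared attributes are joined, and unmatched tuples from either side are kept, padded with nulls. Whether a sequence of such pairwise outerjoins yields the full disjunction is exactly the gamma-acyclicity question above. A minimal Python version over relations as lists of dicts:

    def natural_outerjoin(r, s):
        """Full natural outerjoin of two relations, None for missing attributes."""
        shared = set(r[0]) & set(s[0])
        cols = list(r[0]) + [c for c in s[0] if c not in shared]
        out, matched_s = [], set()
        for t in r:
            hits = [u for u in s if all(t[c] == u[c] for c in shared)]
            if hits:
                for u in hits:
                    matched_s.add(id(u))
                    out.append({**{c: None for c in cols}, **t, **u})
            else:
                out.append({**{c: None for c in cols}, **t})    # dangling r-tuple
        for u in s:
            if id(u) not in matched_s:
                out.append({**{c: None for c in cols}, **u})    # dangling s-tuple
        return out

    r = [{"dept": "cs", "school": "eng"}]
    s = [{"dept": "cs", "chair": "jones"}, {"dept": "art", "chair": "smith"}]
    for row in natural_outerjoin(r, s):
        print(row)
    # {'dept': 'cs', 'school': 'eng', 'chair': 'jones'}
    # {'dept': 'art', 'school': None, 'chair': 'smith'}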

153. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/outerjoin-full.ps
TitleIntegrating Information by Outerjoins and Full Disjunctions
AuthorsA. Rajaraman , J. Ullman
Year1996
CitationFull version of PODS 1996 paper (includes proofs omitted in the PODS '96 version).
Keywords
AbstractOur motivation is the piecing together of tidbits of information found on the "web" into a usable information structure. The problem is related to that of computing the natural outerjoin of many relations in a way that preserves all possible connections among facts. Such a computation has been termed a "full disjunction" by Galindo-Legaria. We are thus led to ask the question of when a full disjunction can be computed by some sequence of natural outerjoins. The answer involves a concept from Fagin [1983] called "gamma-acyclic hypergraphs." We prove that there is a natural outerjoin sequence producing the full disjunction if and only if the set of relation schemes forms a connected, gamma-acyclic hypergraph. I. Motivation Let us imagine we are constructing an information resource that accepts queries about university information. We gather our information from the on-line information found at the various universities and their schools or departments, and we integrate the information into an ob

154. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/deductive-survey.ps
TitleA Survey of Research in Deductive Database Systems
AuthorsR. Ramakrishnan , J. Ullman
Year1995
CitationAppears in J. Logic Programming, May, 1995, pp. 125-149
Keywords
AbstractThe area of deductive databases has matured in recent years, and it now seems appropriate to reflect upon what has been achieved and what the future holds. In this paper, we provide an overview of the area and briefly describe a number of projects that have led to implemented systems.

155. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/dlmag.ps
TitleThe SCAM Approach To Copy Detection in Digital Libraries
AuthorsN. Shivakumar , H. Garcia-Molina
Year1995
CitationD-lib Magazine, November 1995
KeywordsSCAM, Copy detection, Plagiarism, Copyright
Abstract The SCAM Approach to Copy Detection in Digital Libraries Narayanan Shivakumar and Hector Garcia-Molina Department of Computer Science, Stanford University, Stanford, CA 94305, U.S.A. {shiva, hector}@cs.stanford.edu Scenario 1 Your local publishing company Books'R'Us decides to publish on the Internet its latest book in an effort to cut down on printing costs and book distribution expenses. Customers pay for the digital books using sophisticated electronic payment mechanisms such as DigiCash, First Virtual or InterPay. When the payment is received, the book distribution server at Books'R'Us sends a digital version of the book electronically to the paying customer. Books'R'Us expects to make higher profits on the digital book due to lower production and distribution costs, and larger markets on the Internet. It turns out, however, that very few books are sold since digital copies of the Books'R'Us book had been widely circulated on UseNet newsgroups, bulletin boards, and had been available for free

156. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/icmcs95.ps
TitleThe Concord Algorithm for Synchronization of Networked Multimedia Streams
AuthorsN. Shivakumar , C. Sreenan , B. Narendran , P. Agrawal
Year1995
Citation2nd IEEE International Conference on Multi-media Computing and Systems'95 (ICMCS'95), pp. 31 - 40.
KeywordsJitter-free, Multimedia, Packet networks
AbstractSynchronizing different data streams from multiple sources simultaneously at a receiver is one of the basic problems involved in multimedia distributed systems. This requirement stems from the nature of packet based networks which can introduce end-to-end delays that vary both within and across streams. In this paper, we present a new algorithm called Concord, which provides an integrated solution for these single and multiple stream synchronization problems. It is notable because it defines a single framework to deal with both problems, and operates under the influence of parameters which can be supplied by the application involved. In particular, these parameters are used to allow a trade-off between the packet loss rates, total end-to-end delay and skew for each of the streams. For applications like conferencing this is used to reduce delay by determining the minimum buffer delay/size required.

157. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/replication.ps
TitleUser Profile Replication for Faster Location Lookup in Mobile Environments
AuthorsN. Shivakumar , J. Widom
Year1995
CitationProceedings of 1st ACM International Conference on Mobile Computing and Networking (MCN'95), November 1995, Berkeley.
KeywordsWireless, Replication, Location Lookup
AbstractWe consider per-user profile replication as a mechanism for faster location lookup of mobile users in a Personal Communications Service system. We present a minimum-cost maximum-flow based algorithm to compute the set of sites at which a user profile should be replicated given known calling and user mobility patterns. We then present schemes for replication plans that gracefully adapt to changes in the calling and mobility patterns.
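
The Python sketch below is not the paper's min-cost max-flow algorithm; it only captures the unconstrained cost trade-off that algorithm optimizes (per-site lookup savings versus profile update cost), and every parameter name is hypothetical.

    def replica_sites(calls_from, moves_per_day, lookup_saving, update_cost):
        # calls_from: site -> expected daily calls to this user from there.
        # Replicating at a site saves `lookup_saving` per local call but
        # costs `update_cost` per user move to keep the replica current.
        # The paper's min-cost max-flow formulation additionally respects
        # per-site capacity constraints, which this sketch ignores.
        return [site for site, calls in calls_from.items()
                if calls * lookup_saving > moves_per_day * update_cost]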

158. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/scam.ps
TitleSCAM: A Copy Detection Mechanism for Digital Documents
AuthorsN. Shivakumar , H. Garcia-Molina
Year1995
CitationProceedings of 2nd International Conference in Theory and Practice of Digital Libraries (DL'95), June 11 - 13, Austin, Texas.
KeywordsSCAM, Copy detection, Plagiarism, Copyright
AbstractCopy detection in Digital Libraries may provide the necessary guarantees for publishers and newsfeed services to offer valuable on-line data. We consider the case for a registration server that maintains registered documents against which new documents can be checked for overlap. In this paper we present a new scheme for detecting copies based on comparing the word frequency occurrences of the new document against those of registered documents. We also report on an experimental comparison between our proposed scheme and COPS [6], a detection scheme based on sentence overlap. The tests involve over a million comparisons of netnews articles and show that in general the new scheme performs better in detecting documents that have partial overlap. Keywords: Copy Detection, Plagiarism, Registration Server, Databases.
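
As a hedged illustration of frequency-based comparison (SCAM's actual measure differs in detail), a cosine similarity over word-frequency vectors can be sketched in a few lines of Python:

    from collections import Counter
    from math import sqrt

    def word_overlap(doc_a, doc_b):
        # Cosine similarity of word-frequency vectors: 1.0 for identical
        # frequency profiles, near 0.0 for unrelated text.
        fa = Counter(doc_a.lower().split())
        fb = Counter(doc_b.lower().split())
        dot = sum(n * fb[w] for w, n in fa.items())
        norm = sqrt(sum(n * n for n in fa.values())) * \
               sqrt(sum(n * n for n in fb.values()))
        return dot / norm if norm else 0.0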

159. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/profile-journ.ps
TitlePer-User Profile Replication in Mobile Environments: Algorithms, Analysis, and Simulation Results
AuthorsN. Shivakumar , J. Jannink , J. Widom
Year1995
CitationUnpublished manuscript
KeywordsWireless, Replication, Location Lookup
AbstractWe consider per-user profile replication as a mechanism for faster location lookup of mobile users in a Personal Communications Service system. We present a minimum-cost maximum-flow based algorithm to compute the set of sites at which a user profile should be replicated given known calling and user mobility patterns. We then present schemes for replication plans that gracefully adapt to changes in the calling and mobility patterns. We show the costs and benefits of our replication algorithm against previous location lookup approaches through analysis. We also simulate our algorithm against other location lookup algorithms on a realistic model of a geographical area to evaluate critical system performance measures. A notable aspect of our simulations is that we use well-validated models of user calling and mobility patterns.

160. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/performance.ps
TitleBuilding a Scalable and Accurate Copy Detection Mechanism
AuthorsN. Shivakumar , H. Garcia-Molina
Year1996
CitationProceedings of 1st ACM International Conference on Digital Libraries (DL'96) March 1996, Bethesda Maryland.
KeywordsSCAM, Copy detection, Plagiarism, Copyright
AbstractOften, publishers are reluctant to offer valuable digital documents on the Internet for fear that they will be re-transmitted or copied widely. A Copy Detection Mechanism can help identify such copying. For example, publishers may register their documents with a copy detection server, and the server can then automatically check public sources such as UseNet articles and Web sites for potential illegal copies. The server can search for exact copies, and also for cases where significant portions of documents have been copied. In this paper we study, for the first time, the performance of various copy detection mechanisms, including the disk storage requirements, main memory requirements, response times for registration, and response time for querying. We also contrast performance to the accuracy of the mechanisms (how well they detect partial copies). The results are obtained using SCAM, an experimental server we have implemented, and a collection of 50,000 netnews articles.

161. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/huffman.ps
TitleTime-and-Energy Efficient Wireless Broadcasting
AuthorsN. Shivakumar , S. Venkatasubramanian
Year1996
CitationACM Journal of Wireless and Nomadic Applications (NOMAD)
KeywordsHuffman, Wireless, Indexing, Energy, Power, Broadcasting
AbstractWe consider the application of high volume information dissemination in broadcast based mobile environments. Since current mobile units accessing broadcast information have limited battery capacity, the problem of quick and energy-efficient access to data becomes particularly relevant as the number and sizes of information units increase. We propose several randomized and Huffman-encoding based indexing schemes that are sensitive to data popularity patterns to structure data transmission on the wireless medium, so that the average energy consumption of mobile units is minimized while trying to access desired data. We then propose an algorithm for PCS units to tune into desired data independent of the actual transmission scheme being used. We also empirically study the proposed schemes and propose different transmission modes for the base station to dynamically adapt to changes in the number of data files to be broadcast, the available bandwidth and the accuracy of data popularity patterns.
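
A small Python sketch of the Huffman connection (an illustration, not the paper's schemes): the expected Huffman code length for a set of access probabilities, which tracks the average tuning time under a popularity-sensitive broadcast index.

    import heapq

    def expected_code_length(access_probs):
        # Expected Huffman code length for the given access probabilities,
        # computed as the sum of all internal-node weights of the Huffman
        # tree; under a Huffman-organized broadcast index this tracks the
        # average time a mobile unit must stay tuned in to locate an item.
        heap = sorted(access_probs)
        heapq.heapify(heap)
        total = 0.0
        while len(heap) > 1:
            a = heapq.heappop(heap)
            b = heapq.heappop(heap)
            total += a + b
            heapq.heappush(heap, a + b)
        return total

    # expected_code_length([0.5, 0.25, 0.125, 0.125])   ->   1.75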

162. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/wave.ps
TitleWave-Indices: Indexing Evolving Databases
AuthorsN. Shivakumar , H. Garcia-Molina
Year1997
CitationTo appear in ACM International Conference on Management of Data (SIGMOD'97)
Keywordsindexing, sliding windows, SCAM, search engines
AbstractIn many applications, new data is being generated every day. Often an index of the data of a past window of days is required to answer queries efficiently. For example, in a warehouse one may need an index on the sales records of the last week for efficient data mining, or in a Web service one may provide an index of Netnews articles of the past month. In this paper, we propose a variety of wave indices where the data of a new day can be efficiently added, and old data can be quickly expired, to maintain the required window. We compare these schemes based on several system performance measures, such as storage, query response time, and maintenance work, as well as on their simplicity and ease of coding.
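
A toy Python sketch of the idea, under the assumption of one in-memory subindex per day (the paper studies several disk-based variants):

    from collections import deque

    class WaveIndex:
        # Toy wave index: one subindex per day; adding a new day may expire
        # the oldest, so the structure always covers the last `window` days.
        def __init__(self, window):
            self.window = window
            self.days = deque()                # (day, {term: set(doc_ids)})

        def add_day(self, day, docs):
            sub = {}
            for doc_id, text in docs.items():
                for term in text.split():
                    sub.setdefault(term, set()).add(doc_id)
            self.days.append((day, sub))
            if len(self.days) > self.window:
                self.days.popleft()            # expire the oldest day cheaply

        def query(self, term):
            hits = set()
            for _, sub in self.days:
                hits |= sub.get(term, set())
            return hits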

163. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/filter.ps
TitleFiltering Expensive Predicates
AuthorsN. Shivakumar , C. Chekuri , H. Garcia-Molina
Year1997
CitationSubmitted for publication.
KeywordsExpensive Predicates, SCAM, multi-media databases, extensible databases
AbstractIn many applications, databases are supporting increasingly more complex and expensive predicates, e.g., to find patterns in images, to check for containment of points in regions or to detect textual similarity. To reduce the expense of running the expensive predicate on all tuples in a database, application designers often define simpler and computationally cheaper predicates that "approximate" the complex predicate. For instance, if the complex predicate is to look for daisies in an image, the approximate predicate may look for a substantial yellow component in the image. The approximate predicate is then applied to all tuples, and only items that pass through the approximate predicate are operated on by the expensive predicate. In this paper we propose a general framework for expressing approximate predicates and composing given approximate tests to filter input to expensive predicates. For certain special cases, we also propose algorithms that construct provably "good" filters for
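
The core composition is simple enough to sketch in Python; the predicate names in the usage comment are hypothetical, and the paper's contribution is constructing provably good filters, not this wrapper:

    def filtered_select(items, cheap_test, expensive_test):
        # Apply the cheap approximate predicate first; only survivors pay
        # for the expensive one. The answer stays correct as long as
        # cheap_test never rejects an item expensive_test would accept.
        return [x for x in items if cheap_test(x) and expensive_test(x)]

    # e.g., with hypothetical predicates from the daisy example:
    #   filtered_select(images, has_big_yellow_component, contains_daisy)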

164. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/sigir.94.ps
TitleSynthetic Workload Performance Analysis of Incremental Updates
AuthorsK. Shoens , A. Tomasic , H. Garcia-Molina
Year1994
CitationSIGIR 94. A part of Stanford University Department of Computer Science Technical Report Number STAN-CS-TN-93-1.
Keywords
AbstractDeclining disk and CPU costs have kindled a renewed interest in efficient document indexing techniques. In this paper, the problem of incremental updates of inverted lists is addressed using a dual-structure index that dynamically separates long and short inverted lists and optimizes the retrieval, update, and storage of each type of list. The behavior of this index is studied with the use of a synthetically-generated document collection and a simulation model of the algorithm. The index structure is shown to support rapid insertion of documents, fast queries, and to scale well to large document collections and many disks.

165. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/mmcn.ps
TitleInternet Synchronization with Concord
AuthorsC. Sreenan , B. Narendran , P. Agrawal , N. Shivakumar
Year1996
CitationProceedings of Multimedia Computing and Networking (MMCN'96), San Jose, California, Jan 1996.
KeywordsMultimedia, Jitter-free, Packet networks
AbstractUsing packet networks to transport multimedia introduces delay variations within and across streams, necessitating synchronization at the receiver. This requires stream data to be buffered prior to presentation, which also increases its total end-to-end delay. Concord recognizes that applications may wish to influence the underlying synchronization policy in terms of its effect on quality of service. It provides a single framework for synchronization within and across streams and employs an application-specified tradeoff between packet losses, delay and inter-stream skew. This paper presents a new predictive approach for synchronization and a selection of results from an extensive evaluation of Concord for use in the Internet. A trace driven simulator is used, allowing a direct comparison with alternative approaches. A conclusion is that Concord can operate with lower maximum and variation in total end-to-end delay, which in turn can allow receiver buffer requirements to be reduced

166. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/stan.cs.92.1434.ps
TitlePerformance of Inverted Indices in Distributed Text Document Retrieval Systems
AuthorsA. Tomasic , H. Garcia-Molina
Year1992
CitationStanford University Department of Computer Science Technical Report Number STAN-CS-92-1434. Short version of paper appears in PDIS '93. Long version of paper appears in VLDB Journal '93.
Keywords
AbstractThe performance of distributed text document retrieval systems is strongly influenced by the organization of the inverted index. This paper compares the performance impact on query processing of various physical organizations for inverted lists. We present a new probabilistic model of the database and queries. Simulation experiments determine which variables most strongly influence response time and throughput. This leads to a set of design trade-offs over a wide range of hardware configurations and new parallel query processing strategies.

167. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/sigmod.93.ps
TitleCaching and Database Scaling in Distributed Shared-Nothing Information Retrieval Systems
AuthorsA. Tomasic , H. Garcia-Molina
Year1993
CitationSIGMOD '93
Keywords
AbstractA common class of existing information retrieval systems provides access to abstracts. For example Stanford University, through its FOLIO system, provides access to the INSPEC database of abstracts of the literature on physics, computer science, electrical engineering, etc. In this paper this database is studied by using a trace-driven simulation. We focus on physical index design, inverted index caching, and database scaling in a distributed shared-nothing system. All three issues are shown to have a strong effect on response time and throughput. Database scaling is explored in two ways. One way assumes an "optimal" configuration for a single host and then linearly scales the database by duplicating the host architecture as needed. The second way determines the optimal number of hosts given a fixed database size.

168. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/pdis.93.ps
TitlePerformance of Inverted Indices in Shared-Nothing Distributed Text Document Information Retrieval Systems
AuthorsA. Tomasic , H. Garcia-Molina
Year1993
CitationPDIS '93 (best paper award).
Keywords
AbstractThe performance of distributed text document retrieval systems is strongly influenced by the organization of the inverted index. This paper compares the performance impact on query processing of various physical organizations for inverted lists. We present a new probabilistic model of the database and queries. Simulation experiments determine which variables most strongly influence response time and throughput. This leads to a set of design trade-offs over a range of hardware configurations and new parallel query processing strategies.

169. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/stan.cs.tn.93.3.ps
TitleCorrect View Update Translations via Containment
AuthorsA. Tomasic
Year1993
CitationStanford report
Keywords
AbstractOne approach to the view update problem for deductive databases proves properties of translations; that is, a language specifies the meaning of an update to the intensional database (IDB) in terms of updates to the extensional database (EDB). We argue that the view update problem should be viewed as a question of the expressive power of the translation language and the computational cost of demonstrating properties of a translation. We use an active rule based database language as a means of specifying translations of updates on the IDB into updates on the EDB. This paper uses the containment of one datalog program (or conjunctive query) by another to demonstrate that a translation is semantically correct. We show that the complexity of correctness is lower for insertion than deletion. Finally, we discuss extensions to the translation language.

170. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/iclpwdd.94.ps
TitleDetermining Correct View Update Translations via Query Containment
AuthorsA. Tomasic
Year1994
CitationAppears in the International Conference on Logic Programming Workshop on Deductive Databases, 1994
Keywords
AbstractGiven an intensional database (IDB) and an extensional database (EDB), the view update problem translates updates on the IDB into updates on the EDB. One approach to the view update problem uses a translation language to specify the meaning of a view update. In this paper we prove properties of a translation language. This approach to the view update problem studies the expressive power of the translation language and the computational cost of demonstrating properties of a translation. We use an active rule based database language for specifying translations of view updates. This paper uses the containment of one datalog program (or conjunctive query) by another to demonstrate that a translation is semantically correct. We show that the complexity of correctness is lower for insertion than deletion. Finally, we discuss extensions to the translation language.

171. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/sigmod.94.ps
TitleIncremental Updates of Inverted Lists for Text Document Retrieval
AuthorsA. Tomasic , H. Garcia-Molina , K. Shoens
Year1994
CitationSIGMOD 94. A part of Stanford University Department of Computer Science Technical Report Number STAN-CS-TN-93-1.
Keywords
AbstractWith the proliferation of the world's "information highways," a renewed interest in efficient document indexing techniques has come about. In this paper, the problem of incremental updates of inverted lists is addressed using a new dual-structure index. The index dynamically separates long and short inverted lists and optimizes the retrieval, update, and storage of each type of list. To study the behavior of the index, a space of engineering tradeoffs which range from optimizing update time to optimizing query performance is described. We quantitatively explore this space by using actual data and hardware in combination with a simulation of an information retrieval system. We then describe the best algorithm for a variety of criteria.
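
A toy Python sketch of the dual-structure idea, with an in-memory dict standing in for the disk-resident structures the paper actually studies (the promotion threshold is an arbitrary illustration):

    class DualIndex:
        # Toy dual-structure index: short postings lists live in small
        # in-place lists (cheap to update); a list that outgrows
        # `threshold` is promoted to a "long" structure, here a set
        # standing in for a disk-friendly layout.
        def __init__(self, threshold=4):
            self.threshold = threshold
            self.short, self.long = {}, {}

        def add(self, term, doc_id):
            if term in self.long:
                self.long[term].add(doc_id)
                return
            postings = self.short.setdefault(term, [])
            postings.append(doc_id)
            if len(postings) > self.threshold:   # promote to long storage
                self.long[term] = set(self.short.pop(term))

        def lookup(self, term):
            return set(self.short.get(term, [])) | self.long.get(term, set())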

172. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/ibm_rj.ps
TitleData Structures for Efficient Broker Implementation
AuthorsA. Tomasic , L. Gravano , C. Lue , P. Schwarz , L. Haas
Year1995
CitationTechnical report, IBM Almaden Research Center, June 1995
Keywords
AbstractWith the profusion of text databases on the Internet, it is becoming increasingly hard to find the most useful databases for a given query. To attack this problem, several existing and proposed systems employ brokers to direct user queries, using a local database of summary information about the available databases. This summary information must effectively distinguish relevant databases, and must be compact while allowing efficient access. We offer evidence that one broker, GlOSS, can be effective at locating databases of interest even in a system of hundreds of databases, and examine the performance of accessing the GlOSS summaries for two promising storage methods: the grid file and partitioned hashing. We show that both methods can be tuned to provide good performance for a particular workload (within a broad range of workloads) and discuss the tradeoffs between the two data structures. As a side effect of our work, we show that grid files are more broadly applicable than previously thought;

173. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/tois96.ps
TitleData Structures for Efficient Broker Implementation
AuthorsA. Tomasic , L. Gravano , C. Lue , P. Schwarz , L. Haas
Year1996
CitationAppeared in TOIS '96; journal version of pub/gravano/1995/ibm_rj.ps, technical report, IBM Almaden Research Center, June 1995
Keywords
AbstractWith the profusion of text databases on the Internet, it is becoming increasingly hard to find the most useful databases for a given query. To attack this problem, several existing and proposed systems employ brokers to direct user queries, using a local database of summary information about the available databases. This summary information must effectively distinguish relevant databases, and must be compact while allowing efficient access. We offer evidence that one broker, GlOSS, can be effective at locating databases of interest even in a system of hundreds of databases, and examine the performance of accessing the GlOSS summaries for two promising storage methods: the grid file and partitioned hashing. We show that both methods can be tuned to provide good performance for a particular workload (within a broad range of workloads) and discuss the tradeoffs between the two data structures. As a side effect of our work, we show that grid files are more broadly applicable than previously thought; in p

174. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/negation.ps
TitleAssigning an Appropriate Meaning to Database Logic With Negation
AuthorsJ. Ullman
Year1994
CitationA corrected version of a paper that appeared in "Computers as Our Better Partners" (H. Yamada, Y. Kambayashi, and S. Ohta, eds.) pp. 216--225, World Scientific, Singapore, 1994.
Keywords
AbstractDeductive database systems -- that is, database systems with a query language based on logical rules -- must allow negated subgoals in rules to express an adequate range of queries. Adherence to classical deductive logic rarely offers the intuitively correct meaning of the rules. Thus, a variety of approaches to defining the "right" meaning of such rules have been developed. In this paper we survey the principal approaches, including stratified negation, well-founded negation, stable-model semantics, and modularly stratified semantics. 1. Motivation Database query languages based on logic in some form are a promising trend in the development of modern database systems. Building on SQL, a language whose logical nature is hidden by the syntax, logical query languages preserve the declarative "say what you want, not how to get it" nature of SQL, while extending the expressiveness of the language beyond what SQL provides. See Ramakrishnan and Ullman [1993] for a survey of languages and sy

175. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/integration-using-views.ps
TitleInformation Integration Using Logical Views
AuthorsJ. Ullman
Year1996
CitationInvited paper for 1997 ICDT
Keywords
AbstractA number of ideas concerning information-integration tools can be thought of as constructing answers to queries using views that represent the capabilities of information sources. We review the formal basis of these techniques, which are closely related to containment algorithms for conjunctive queries and/or Datalog programs. Then we compare the approaches taken by AT&T Labs' "Information Manifold" and the Stanford "Tsimmis" project in these terms. 1 Theoretical Background Before addressing information-integration issues, let us review some of the basic ideas concerning conjunctive queries, Datalog programs, and their containment. To begin, we use the logical rule notation from [Ull88]. Example
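
The containment test such techniques build on (the classical Chandra-Merlin procedure) can be sketched directly in Python by freezing one query into its canonical database and searching for a containment mapping; the brute-force search below is exponential, as the problem's NP-completeness suggests it must be in the worst case. The query encoding is an assumption for illustration.

    from itertools import product

    def contained_in(q1, q2):
        # Chandra-Merlin test for conjunctive query containment Q1 <= Q2:
        # freeze Q1's variables into constants (its canonical database)
        # and search exhaustively for a containment mapping from Q2 into
        # it. Queries are (head_tuple, body) with atoms as tuples.
        head1, body1 = q1
        head2, body2 = q2
        vars2 = sorted({a for atom in body2 for a in atom[1:]})
        terms1 = sorted({a for atom in body1 for a in atom[1:]})
        facts = set(body1)
        for image in product(terms1, repeat=len(vars2)):
            h = dict(zip(vars2, image))
            if tuple(h.get(v, v) for v in head2) != tuple(head1):
                continue
            if all((atom[0],) + tuple(h.get(a, a) for a in atom[1:]) in facts
                   for atom in body2):
                return True
        return False

    # q1 = (("x",), [("edge", "x", "y"), ("edge", "y", "x")])
    # q2 = (("x",), [("edge", "x", "y")])
    # contained_in(q1, q2)   ->   True   (a 2-cycle certainly has an edge)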

176. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/aaai.ps
TitleThe Database Approach to Knowledge Representation
AuthorsJ. Ullman
Year1996
CitationTo appear in AAAI Proceedings, 1996
Keywords
AbstractThe database theory community, centered around the PODS (Principles of Database Systems) conference has had a long-term interest in logic as a way to represent "data," "information," and "knowledge" (take your pick on the term -- it boils down to facts or atoms and rules, usually Horn clauses). The approach of this community has been "slow and steady," preferring to build up carefully from simple special cases to more general ideas, always paying attention to how efficiently we can process queries and perform other operations on the facts and rules. A powerful theory has developed, and it is beginning to have some impact on applications, especially information-integration engines. Datalog The term Datalog has been coined to refer to Prolog-like rules without function symbols, treated as a logic program. Unlike Prolog, however, the conventional least-fixed-point semantics of the rules is used whenever possible. Example 1: The rules for ancestors can be written as the following Datalog program. anc(X
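
The truncated example presumably continues with the standard ancestor program; a naive least-fixpoint evaluation of it can be sketched in Python (the relation encoding is an assumption for illustration):

    def ancestors(par):
        # Naive least-fixpoint evaluation of the program the example begins
        # to quote:  anc(X,Y) :- par(X,Y).   anc(X,Z) :- par(X,Y), anc(Y,Z).
        anc = set(par)
        while True:
            new = {(x, z) for (x, y) in par for (y2, z) in anc if y == y2}
            if new <= anc:
                return anc
            anc |= new

    # ancestors({("a", "b"), ("b", "c")})
    #   ->   {("a", "b"), ("b", "c"), ("a", "c")}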

177. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/techtransfer.ps
TitleAn Analysis of Factors Directing the Admission Process of Artificial Intelligence Technologies
AuthorsV. Vassalos , S. Venkatasubramanian
Year1995
Citation8th International Symposium on Artificial Intelligence, Monterrey, Mexico, Oct 1995.
KeywordsAI, industry, technology transfer
Abstract

178. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/query-cap.ps
TitleDescribing and Using Query Capabilities of Heterogeneous Sources
AuthorsV. Vassalos , Y. Papakonstantinou
Year1997
CitationTechnical Report
KeywordsInformation Integration, Heterogeneous Databases, Mediators
AbstractInformation integration systems have to cope with the different and limited query interfaces of the underlying information sources. First, the integration systems need descriptions of the query capabilities of each source, i.e., the set of queries supported by each source. Second, the integration systems need algorithms for deciding how a query can be answered given the capabilities of the sources. Third, they need to translate a query into the format that the source understands. We present two languages suitable for descriptions of query capabilities of sources and compare their expressive power. We also describe algorithms for deciding whether a query "matches" the description and show their application to the problem of translating user queries into source-specific queries and commands. Finally, we propose new improved algorithms for the problem of answering queries using these descriptions.

179. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/rule-language.ps
TitleSet-Oriented Production Rules in Relational Database Systems
AuthorsJ. Widom , S. Finkelstein
Year1990
CitationAppeared in SIGMOD '90; published while at IBM Almaden
Keywordsactive database, Starburst
AbstractWe propose incorporating a production rules facility into a relational database system. Such a facility allows definition of database operations that are automatically executed whenever certain conditions are met. In keeping with the set-oriented approach of relational data manipulation languages, our production rules are also set-oriented--they are triggered by sets of changes to the database and may perform sets of changes. The condition and action parts of our production rules may refer to the current state of the database as well as to the sets of changes triggering the rules. We define a syntax for production rule definition as an extension to SQL. A model of system behavior is used to give an exact semantics for production rule execution, taking into account externally-generated operations, selftriggering rules, and simultaneous triggering of multiple rules.
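
A minimal Python sketch of the set-oriented idea (not Starburst's language or execution semantics): rules whose conditions and actions see whole sets of inserted tuples, with actions able to re-trigger rules.

    class SetRuleEngine:
        # Toy set-oriented rule facility: each rule's condition and action
        # see the whole set of tuples inserted in a transition, not one
        # tuple at a time; actions may insert further tuples, re-triggering
        # rules until nothing new is produced. (A real system would add
        # priorities, deltas for updates/deletes, and termination checks.)
        def __init__(self):
            self.rules = []                    # (condition, action) pairs

        def add_rule(self, condition, action):
            self.rules.append((condition, action))

        def insert(self, tuples):
            batch = set(tuples)
            seen = set(batch)
            while batch:
                produced = set()
                for condition, action in self.rules:
                    if condition(batch):
                        produced |= set(action(batch))
                batch = produced - seen        # only new tuples re-fire rules
                seen |= produced
            return seen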

180. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/srs-implementation.ps
TitleImplementing Set-Oriented Production Rules as an Extension to Starburst
AuthorsJ. Widom , R. Cochrane , B. Lindsay
Year1991
CitationAppeared in VLDB '91; published while at IBM Almaden; one cut-and-paste figure missing from ps file
Keywordsactive database, implementation
AbstractThis paper describes the implementation of a set-oriented database production rule language proposed in earlier papers. Our implementation uses the extensibility features of the Starburst database system, and rule execution is fully integrated into database query and transaction processing.

181. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/srs-short.ps
TitleThe Starburst Rule System: Language Design, Implementation, and Applications
AuthorsJ. Widom
Year1992
CitationAppeared in IEEE Data Engineering Bulletin, Dec. 1992; published while at IBM Almaden; short overview, more detailed coverage of similar material appears in starburst-rule-system.ps
Keywordsactive database, overview
AbstractThis short paper provides an overview of the Starburst Rule System, a production rules facility integrated into the Starburst extensible database system. The rule language is based on arbitrary database state transitions rather than tuple- or statement-level changes, yielding a clear and flexible execution semantics. The rule system was implemented rapidly using the extensibility features of Starburst; it is integrated into all aspects of query and transaction processing, including concurrency control, authorization, recovery, etc. Using the Starburst Rule System, we have developed a number of methods for automatically generating database rule applications, including integrity constraints, materialized views, deductive rules, and semantic heterogeneity.

182. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/denotational.ps
TitleA Denotational Semantics for the Starburst Production Rule Language
AuthorsJ. Widom
Year1992
CitationAppeared in SIGMOD Record, Sep. 1992; published while at IBM Almaden
Keywordsactive database, formal semantics
AbstractResearchers often complain that the behavior of database production rules is difficult to reason about and understand, due in part to the lack of formal declarative semantics. It has even been claimed that database production rule languages inherently cannot be given declarative semantics, in contrast to, e.g., deductive database rule languages. In this short paper we dispute this claim by giving a denotational semantics for the Starburst database production rule language.

183. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/spectrum.ps
TitleDeductive and Active Databases: Two Paradigms or Ends of a Spectrum?
AuthorsJ. Widom
Year1993
CitationAppeared in RIDS '93; published while at IBM Almaden
Keywordssurvey
AbstractThis position paper considers several existing relational database rule languages with a focus on exploring the fundamental differences between deductive and active databases. We find that deductive and active databases do not form two discernible classes, but rather they delineate two ends of a spectrum of database rule languages. We claim that this spectrum corresponds to a notion of abstraction level, with deductive rule languages at a higher level and active rule languages at a lower level.

184. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/constraint-validity.ps
TitleValidating Constraints with Partial Information: Research Overview
AuthorsJ. Widom , A. Gupta , Y. Sagiv , J. Ullman
Year1994
CitationAppeared in DAISD '94; invited paper, similar material appears in PODS '94 paper by same authors
Keywordsintegrity
AbstractWe are interested in the problem of validating the consistency of integrity constraints when data is modified. In particular, we consider how constraints can be checked with only "partial information". Partial information may include: (1) the constraint specifications only, (2) the constraint specifications and the modified data, or (3) the constraint specifications, the modified data, and portions of the existing data. Methods for constraint checking with partial information can be much more efficient than traditional constraint checking methods (e.g. because work is done at compile time, or because less data is accessed). Partial information methods also enable constraint checking in environments where traditional constraint checking methods fail (e.g. in distributed environments where not all data is available). We explain how existing methods and results for query containment and for independence can be applied to problems (1) and (2) above, and we give an overview of our research into problem (3).

185. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/warehouse-research.ps
TitleResearch Problems in Data Warehousing
AuthorsJ. Widom
Year1995
CitationAppeared in CIKM '95 (invited paper)
Keywordsresearch issues, overview, survey
AbstractThe topic of data warehousing encompasses architectures, algorithms, and tools for bringing together selected data from multiple databases or other information sources into a single repository, called a data warehouse, suitable for direct querying or analysis. In recent years data warehousing has become a prominent buzzword in the database industry, but attention from the database research community has been limited. In this paper we motivate the concept of a data warehouse, we outline a general data warehousing architecture, and we propose a number of technical issues arising from the architecture that we believe are suitable topics for exploratory research.

186. (matched words:2)
URLhttp://www-db.stanford.edu/pub/papers/jw-position.ps
TitleIntegrating Heterogeneous Databases: Lazy or Eager?
AuthorsJ. Widom
Year1996
CitationACM Computing Surveys 28A(4), December 1996, invited position paper
Keywordsoverview
AbstractIntegrating Heterogeneous Databases: Lazy or Eager? (Position Statement) Jennifer Widom Stanford University widom@db.stanford.edu http://db.stanford.edu/~widom Providing integrated access to multiple, distributed, heterogeneous, autonomous databases and other information sources is a topic that has been studied in the database research community for well over a decade. There has been a surge of work in the area recently, due primarily to increased demand from customers ("real" customers as well as funding agencies). Nevertheless, despite the longevity of the subfield and the current large population of researchers working in the area, no winning solution or even consensus of approach has emerged. In the research community, most approaches to solving the data integration problem are based very roughly on the following two-step process: 1. Accept a query, determine the appropriate set of information sources to answer the query, and generate the appropriate subqueries or commands for each informati

187. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/starburst-rule-system.ps
TitleThe Starburst Active Database Rule System
AuthorsJ. Widom
Year1996
CitationTo appear in IEEE TKDE, 1996; one cut-and-paste figure missing from ps file
Keywordsoverview
AbstractThis paper describes our development of the Starburst Rule System, an active database rules facility integrated into the Starburst extensible relational database system at the IBM Almaden Research Center. The Starburst rule language is based on arbitrary database state transitions rather than tuple- or statement-level changes, yielding a clear and flexible execution semantics. The rule system has been implemented completely. Its rapid implementation was facilitated by the extensibility features of Starburst, and rule management and rule processing are integrated into all aspects of database processing. Index terms: active database systems, database production rules, extensible database systems, expert database systems

188. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/fox.ps
TitleA Moose and a Fox Can Aid Scientists with Data Management Problems
AuthorsJ. Wiener , Y. Ioannidis
Year1993
CitationExtended version of paper in DBPL 1993. Also UW-Madison Tech Report CS-TR-1182, 1993.
Keywords
AbstractFox (Finding Objects of eXperiments) is the declarative query language for Moose (Modeling Objects Of Scientific Experiments), an object-oriented data model at the core of a scientific experiment management system (EMS) being developed at Wisconsin. The goal of the EMS is to support scientists in managing their experimental studies and the data that are generated from them. Moose is unique among object-oriented data models in permitting sets to have relationships to classes other than their elements' class, in providing a construct for indexing collections by other collections, such as time series, and in distinguishing structural relationships from non-structural ones. Fox contains several new features necessary to manage experiments, such as support for associative element retrieval from (indexed) sets and highly expressive path expressions. Fox path expressions can traverse any relationship in the schema graph, including inheritance relationships, and in either direction of the relation

189. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/loading.ps
TitleBulk Loading into an OODB: A Performance Study
AuthorsJ. Wiener , J. Naughton
Year1994
CitationAppeared in VLDB 1994.
Keywords
AbstractObject-oriented database (OODB) users bring with them large quantities of legacy data (megabytes and even gigabytes). In addition, scientific OODB users continually generate new data. All this data must be loaded into the OODB. Every relational database system has a load utility, but most OODBs do not. The process of loading data into an OODB is complicated by inter-object references, or relationships, in the data. These relationships are expressed in the OODB as object identifiers, which are not known at the time the load data is generated; they may contain cycles; and there may be implicit system-maintained inverse relationships that must also be stored. We introduce seven algorithms for loading data into an OODB that examine different techniques for dealing with circular and inverse relationships. We present a performance study based on both an analytic model and an implementation of all seven algorithms on top of the Shore object repository. Our study demonstrates that it is important t

190. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/whips-arch.ps
TitleA System Prototype for Warehouse View Maintenance
AuthorsJ. Wiener , H. Gupta , W. Labio , Y. Zhuge , H. Garcia-Molina , J. Widom
Year1995
CitationAppeared in the Post-Sigmod Workshop on Materialized Views, June, 1996.
Keywords
AbstractA data warehouse collects and integrates data from multiple, autonomous, heterogeneous, sources. The warehouse effectively maintains one or more materialized views over the source data. In this paper we describe the architecture of the Whips prototype system, which collects, transforms, and integrates data for the warehouse. We show how the required functionality can be divided among cooperating distributed CORBA objects, providing both scalability and the flexibility needed for supporting different application needs and heterogeneous sources. The Whips prototype is a functioning system implemented at Stanford University and we provide preliminary performance results.

191. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/thesis.ps
TitleAlgorithms for Loading Object Databases
AuthorsJ. Wiener
Year1995
CitationPhD thesis from University of Wisconsin-Madison, July, 1995. Also UW-Madison Tech Report CS-TR-1278, 1995.
Keywords
Abstract

192. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/partitioned-list.ps
TitleOODB Loading Revisited: The Partitioned-List Approach
AuthorsJ. Wiener , J. Naughton
Year1995
CitationAppeared in VLDB 1995
Keywords
AbstractObject-oriented and object-relational databases (OODB) need to be able to load the vast quantities of data that OODB users bring to them. Loading OODB data is significantly more complicated than loading relational data due to the presence of relationships, or references, in the data; the presence of these relationships means that naive loading algorithms are slow to the point of being unusable. In our previous work, we presented the late-invsort algorithm, which performed significantly better than naive algorithms on all the data sets we tested. Unfortunately, further experimentation with the late-invsort algorithm revealed that for large data sets (ones in which a critical data structure of the load algorithm does not fit in memory), the performance of late-invsort rapidly degrades to where it, too, is unusable. In this paper we propose a new algorithm, the partitioned-list algorithm, whose performance almost matches that of late-invsort for smaller data sets but does not degrade for large data sets.

193. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/incremental.ps
TitleIncremental Loading of Object Databases
AuthorsJ. Wiener , J. Naughton
Year1996
CitationUnpublished manuscript, 1996.
Keywords
AbstractObject-oriented and object-relational databases (OODB) need to be able to load the vast quantities of data that OODB users bring to them. Loading OODB data is significantly more complicated than loading relational data due to the presence of relationships, or references, in the data. In our previous work, we presented algorithms for loading new objects that only share relationships with other new objects. However, it is frequently the case that new objects need to share relationships with objects already in the database. In this paper we propose using queries within the load data file to identify the existing objects and suggest using parameterized functions to designate similar queries. We then propose a novel evaluation strategy for the queries that defers evaluation until all the queries can be evaluated together. All of the instantiations of a single query function can then be treated as a join between the parameters and the collection over which the function ranges, rather than evalu

194. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/sdi-bool-model.ps
TitleIndex Structures for Selective Dissemination of Information Under the Boolean Model
AuthorsT. Yan , H. Garcia-Molina
Year1993
Citationsubmitted to TODS (short version of STAN-CS-92-1454)
Keywords
AbstractThe number, size, and user population of bibliographic and full text document databases are rapidly growing. With a high document arrival rate, it becomes essential for users of such databases to have access to the very latest documents; yet the high document arrival rate also makes it difficult for the users to keep themselves updated. It is desirable to allow users to subscribe profiles, i.e., queries that are constantly evaluated, so that they will be automatically informed of new additions that may be of interest. Such a service is traditionally called Selective Dissemination of Information (SDI). The high document arrival rate, the huge number of users, and the timeliness requirement of the service pose a challenge in achieving efficient SDI. In this paper, we propose several index structures for indexing profiles and algorithms that efficiently match documents against a large number of profiles. We also present analysis and simulation results to compare their performance under different scenarios.
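
One plausible profile-indexing structure (an illustration in Python, not necessarily one of the paper's proposals) indexes each conjunctive profile under its rarest term, so an arriving document touches few profiles:

    class ProfileIndex:
        # Toy index for Boolean SDI: each conjunctive profile is indexed
        # under its rarest term, so an arriving document only touches
        # profiles whose least frequent term it actually contains.
        def __init__(self):
            self.by_term = {}                # term -> [(pid, required terms)]

        def subscribe(self, pid, terms, doc_freq):
            key = min(terms, key=lambda t: doc_freq.get(t, 0))
            self.by_term.setdefault(key, []).append((pid, set(terms)))

        def match(self, doc_terms):
            doc = set(doc_terms)
            return [pid
                    for term in doc
                    for pid, required in self.by_term.get(term, [])
                    if required <= doc]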

195. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/sdi-vector-model-tr.ps
TitleIndex Structures for Information Filtering Under the Vector Space Model
AuthorsT. Yan , H. Garcia-Molina
Year1993
CitationSTAN-CS-93-1494
Keywords
AbstractWith the ever-increasing volumes of information generation, users of information systems are facing an information overload. It is desirable to support information filtering as a complement to traditional retrieval mechanisms. The number of users, and thus profiles (representing users' long-term interests) handled by an information filtering system is potentially huge, and the system has to process a constant stream of incoming information in a timely fashion. The efficiency of the filtering process is thus an important issue. In this paper, we study what data structures and algorithms can be used to efficiently perform large-scale information filtering under the vector space model, a retrieval model established as being effective. We apply the idea of the standard inverted index to index user profiles. We devise an alternative to the standard inverted index, in which we, instead of indexing every term in a profile, select only the significant ones to index. We evaluate their performance and show

196. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/sdi-vector-model.ps
TitleIndex Structures for Information Filtering Under the Vector Space Model
AuthorsT. Yan , H. Garcia-Molina
Year1993
CitationICDE '94 (short version of STAN-CS-93-1494)
Keywords
AbstractWith the ever-increasing volumes of electronic information generation, users of information systems are facing an information overload. It is desirable to support information filtering as a complement to traditional retrieval mechanisms. The number of users, and thus profiles (representing users' long-term interests) handled by an information filtering system is potentially huge, and the system has to process a constant stream of incoming information in a timely fashion. The efficiency of the filtering process is thus an important issue. In this paper, we study what data structures and algorithms can be used to efficiently perform large-scale information filtering under the vector space model, a retrieval model established as being effective. We apply the idea of the standard inverted index to index user profiles. We devise an alternative to the standard inverted index, in which we, instead of indexing every term in a profile, select the significant ones to index. We evaluate their performance

197. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/sift.ps
TitleSIFT -- A Tool for Wide-Area Information Dissemination
AuthorsT. Yan
Year1994
CitationUSENIX'95
Keywords
AbstractThe dissemination model is becoming increasingly important in wide-area information systems. In this model, the user subscribes to an information dissemination service by submitting profiles that describe his interests. He then passively receives new, filtered information. The Stanford Information Filtering Tool (SIFT) is a tool to help provide such a service. It supports full-text filtering using well-known information retrieval models. The SIFT filtering engine implements novel indexing techniques, capable of processing large volumes of information against a large number of profiles. It runs on several major Unix platforms and is freely available to the public. In this paper we present SIFT's approach to user interest modeling and user-server communication. We demonstrate the processing capability of SIFT by describing a running server that disseminates USENET News. We present an empirical study of SIFT's performance, examining its main memory requirement and ability to scale with info

198. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/openodb-tm.ps
TitleIntegrating a Structured-Text Retrieval System with an Object-Oriented Database System
AuthorsT. Yan
Year1994
CitationVLDB'94 Proceedings
Keywords
AbstractWe describe the integration of a structured-text retrieval system (TextMachine) into an object-oriented database system (OpenODB). Our approach is a light-weight one, using the external function capability of the database system to encapsulate the text retrieval system as an external information source. Yet, we are able to provide a tight integration in the query language and processing; the user can access the text retrieval system using a standard database query language. The efficient and effective retrieval of structured text performed by the text retrieval system is seamlessly combined with the rich modeling and general-purpose querying capabilities of the database system, resulting in an integrated system with querying power beyond those of the underlying systems. The integrated system also provides uniform access to textual data in the text retrieval system and structured data in the database system, thereby achieving information fusion. We discuss the design and implementation of our p

199. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/dsdi.ps
TitleDistributed Selective Dissemination of Information
AuthorsT. Yan
Year1994
CitationPDIS'94
Keywords
AbstractTo help users cope with information overload, Selective Dissemination of Information (SDI) will increasingly become an important tool in wide area information systems. In an SDI service, users post their long-term queries, called profiles, at some SDI servers and continuously receive new, filtered documents. To scale up with the volume of information and the size of user population, we need a distributed SDI service with multiple servers. In this paper we first address the key problem of how to replicate and distribute profiles and documents among SDI servers. We draw a parallel between distributed SDI and the well-studied replica control problem, adapt quorum-based protocols for use in distributed SDI, and compare the performances of the different protocols. Next we address another important problem, that of efficient document delivery mechanisms. We present and evaluate a practical scheme, called profile grouping, which exploits the geographical locality of users to cut down network

200. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/textjoin.ps
TitleJoin Queries with External Text Sources: Execution and Optimization Techniques
AuthorsT. Yan
Year1995
CitationSIGMOD'95
Keywords
AbstractText is a pervasive information type, and many applications require querying over text sources in addition to structured data. This paper studies the problem of query processing in a system that loosely integrates an extensible database system and a text retrieval system. We focus on a class of conjunctive queries that include joins between text and structured data, in addition to selections over these two types of data. We adapt techniques from distributed query processing and introduce a novel class of join methods based on probing that is especially useful for joins with text systems, and we present a cost model for the various alternative query processing methods. Experimental results confirm the utility of these methods. The space of query plans is extended due to the additional techniques, and we describe an optimization algorithm for searching this extended space. The techniques we describe in this paper are applicable to other types of external data managers loosely integrated wi
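
The probing idea can be sketched in Python; `text_search` stands in for the external text system's query interface and is an assumption for illustration:

    def probe_join(rows, key_of, text_search):
        # Probe-based join with an external text source: rather than fetch
        # the whole text collection, probe it once per distinct join key
        # from the structured side. `text_search(term)` is a hypothetical
        # call returning the ids of documents matching that term.
        cache, out = {}, []
        for row in rows:
            key = key_of(row)
            if key not in cache:
                cache[key] = text_search(key)  # one probe per distinct key
            out.extend((row, doc) for doc in cache[key])
        return out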

201. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/infoclient.ps
TitleA Powerful Wide-Area Information Client
AuthorsT. Yan
Year1995
CitationAlso appeared in COMPCON'95
Keywords
AbstractWide-area information sharing is rapidly gaining momentum. However, current information systems provide inadequate functionality for users to efficiently and effectively search and use the information made available. Users are faced with a vast information space apparently in disarray. Assuming current client-server information system architecture, we argue in this paper that more functionality is needed on the information client side. We describe an information client prototype that supports (1) the use of object technology to model the information space, (2) declarative querying as a complement to navigational search, (3) uniform access and integration of data from diverse sources, and (4) user-specified and transparent caching that is integrated with permanent local storage of remote data. We also identify the needed support from the server side to better achieve these capabilities.

202. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/dup.ps
TitleDuplicate Detection in Information Dissemination
AuthorsT. Yan
Year1995
CitationSubmitted for publication
Keywords
AbstractOur experience with the SIFT [YGM95] information dissemination system (in use by over 7,000 users daily) has identified an important and generic dissemination problem: duplicate information. In this paper we explain why duplicates arise, we quantify the problem, and we discuss why it impairs information dissemination. We then propose a Duplicate Removal Module (DRM) for an information dissemination system. The removal of duplicates operates on a per-user, per-document basis -- each document read by a user generates a request, or a duplicate restraint. In wide-area environments, the number of restraints handled is very large. We consider the implementation of a DRM, examining alternative algorithms and data structures that may be used. We present a performance evaluation of the alternatives and answer important design questions such as: Which implementation is the best? With the "best" scheme, how expensive will duplicate removal be? How much memory is required? How fast can restraints be p

203. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/yw-tempview.ps
TitleMaintaining Temporal Views Over Non-Historical Information Sources For Data Warehousing
AuthorsJ. Yang , J. Widom
Year1997
CitationSubmitted for conference publication
Keywords
AbstractAn important use of data warehousing is to provide temporal views over the history of source data that may itself be non-temporal. While recent work in view maintenance is applicable to data warehousing, only non-temporal views have been considered. In this paper, we introduce a framework for maintaining temporal views over non-historical information sources in a data warehousing environment. We describe an architecture for the temporal data warehouse that automatically maintains temporal views over non-temporal base relations, and allows users to ask temporal queries using these views. Because of the dimension of time, a materialized temporal view may need to be updated not only when base relations change, but also as time advances. We present incremental techniques to maintain temporal views for both cases, and outline an implementation of our approach in the WHIPS warehousing prototype at Stanford.

204. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/anomaly-short.ps
TitleView Maintenance in a Warehousing Environment
AuthorsY. Zhuge , H. Garcia-Molina , J. Hammer , J. Widom
Year1995
CitationTo appear in Proceedings of the ACM SIGMOD International Conference on Management of Data, San Jose, California, June 1995.
Keywords
AbstractA warehouse is a repository of integrated information drawn from remote data sources. Since a warehouse effectively implements materialized views, we must maintain the views as the data sources are updated. This view maintenance problem differs from the traditional one in that the view definition and the base data are now decoupled. We show that this decoupling can result in anomalies if traditional algorithms are applied. We introduce a new algorithm, ECA (for "Eager Compensating Algorithm") that eliminates the anomalies. ECA is based on previous incremental view maintenance algorithms, but extra "compensating" queries are used to eliminate anomalies. We also introduce two streamlined versions of ECA for special cases of views and updates, and we present an initial performance study that compares ECA to a view recomputation algorithm in terms of messages transmitted, data transferred, and I/O costs.
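
For context, the standard incremental step that ECA builds on can be sketched in Python for a two-relation join view under insertions; the compensating queries that are ECA's actual contribution are not modeled here, and the tuple encoding is an assumption for illustration:

    def join(r, s):
        # Natural join of (a, b) pairs with (b, c) pairs on the shared column.
        return [(a, b, c) for (a, b) in r for (b2, c) in s if b == b2]

    def maintain_view(view, r, s, dr, ds):
        # Incremental step for V = R join S under insertions dr, ds:
        #   V += (dr join S) + ((R + dr) join ds)
        # which covers dr join S, R join ds, and dr join ds without
        # recomputing V. The anomalies ECA compensates for arise when the
        # decoupled source has already changed by the time it answers the
        # dr join S subquery; that interleaving is not modeled here.
        view = view + join(dr, s) + join(r + dr, ds)
        return view, r + dr, s + ds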

205. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/anomaly-full.ps
TitleView Maintenance in a Warehousing Environment
AuthorsY. Zhuge , H. Garcia-Molina , J. Hammer , J. Widom
Year1995
CitationTechnical Report, long version of the SIGMOD95 paper
Keywords
AbstractA warehouse is a repository of integrated information drawn from remote data sources. Since a warehouse effectively implements materialized views, we must maintain the views as the data sources are updated. This view maintenance problem differs from the traditional one in that the view definition and the base data are now decoupled. We show that this decoupling can result in anomalies if traditional algorithms are applied. We introduce a new algorithm, ECA (for "Eager Compensating Algorithm") that eliminates the anomalies. ECA is based on previous incremental view maintenance algorithms, but extra "compensating" queries are used to eliminate anomalies. We also introduce two streamlined versions of ECA for special cases of views and updates, and we present an initial performance study that compares ECA to a view recomputation algorithm in terms of messages transmitted, data transferred, and I/O costs.

206. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/strobe-full.ps
TitleThe Strobe Algorithms for Multi-Source Warehouse Consistency
AuthorsY. Zhuge , H. Garcia-Molina , J. Wiener
Year1996
CitationTechnical Report, long version of the Strobe paper
Keywords
AbstractA warehouse is a data repository containing integrated information for efficient querying and analysis. Maintaining the consistency of warehouse data is challenging, especially if the data sources are autonomous and views of the data at the warehouse span multiple sources. Transactions containing multiple updates at one or more sources, e.g., batch updates, complicate the consistency problem. In this paper we identify and discuss three fundamental transaction processing scenarios for data warehousing. We define four levels of consistency for warehouse data and present a new family of algorithms, the Strobe family, that maintain consistency as the warehouse is updated, under the various warehousing scenarios. All of the algorithms are incremental and can handle a continuous and overlapping stream of updates from the sources. Our implementation shows that the algorithms are practical and realistic choices for a wide variety of update scenarios.
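
The central device here, buffering maintenance actions and installing them only when no delta queries are outstanding, can be sketched as below. The class name, the synchronous query_source callable, and the set-of-tuples view are assumptions for illustration; the actual Strobe family also handles multi-source joins, transactions, and asynchronous answers.

    class StrobeLikeMaintainer:
        """Defer view changes until a quiescent moment, so the
        materialized view never reflects a mix of source states."""

        def __init__(self, query_source, view):
            self.query_source = query_source  # callable: delta query -> tuples
            self.view = view                  # materialized view as a set
            self.actions = []                 # deferred ("insert"/"delete", tuples)
            self.unanswered = 0               # outstanding delta queries

        def on_delete(self, tuples):
            self.actions.append(("delete", set(tuples)))
            self._maybe_install()

        def on_insert(self, delta_query):
            self.unanswered += 1
            answer = self.query_source(delta_query)  # asynchronous in reality
            self.unanswered -= 1
            self.actions.append(("insert", set(answer)))
            self._maybe_install()

        def _maybe_install(self):
            # Install buffered actions only when no queries are pending.
            if self.unanswered:
                return
            for kind, tuples in self.actions:
                if kind == "insert":
                    self.view |= tuples
                else:
                    self.view -= tuples
            self.actions.clear()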

207. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/strobe.ps
TitleThe Strobe Algorithms for Multi-Source Warehouse Consistency
AuthorsY. Zhuge , H. Garcia-Molina , J. Wiener
Year1996
CitationPDIS 96
Keywords
AbstractA warehouse is a data repository containing integrated information for efficient querying and analysis. Maintaining the consistency of warehouse data is challenging, especially if the data sources are autonomous and views of the data at the warehouse span multiple sources. Transactions containing multiple updates at one or more sources, e.g., batch updates, complicate the consistency problem. In this paper we identify and discuss three fundamental transaction processing scenarios for data warehousing. We define four levels of consistency for warehouse data and present a new family of algorithms, the Strobe family, that maintain consistency as the warehouse is updated, under the various warehousing scenarios. All of the algorithms are incremental and can handle a continuous and overlapping stream of updates from the sources. Our implementation shows that the algorithms are practical and realistic choices for a wide variety of update scenarios.

208. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/mvc.ps
TitleMultiple View Consistency for Data Warehousing
AuthorsY. Zhuge , J. Wiener , H. Garcia-Molina
Year1996
CitationTechnical report
Keywords
AbstractA data warehouse stores integrated information from multiple distributed data sources. In effect, the warehouse stores materialized views over the source data. The problem of ensuring data consistency at the warehouse can be divided into two components: ensuring that each view reflects a consistent state of the base data, and ensuring that multiple views are mutually consistent. In this paper we study the latter problem, that of guaranteeing multiple view consistency. We identify and define formally three layers of consistency for materialized views in a distributed environment. We present a scalable architecture for consistently handling multiple views in a data warehouse, which we have implemented in the WHIPS (WareHousing Information Project at Stanford) prototype. Finally, we develop simple, scalable algorithms for achieving MVC at a warehouse.
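
The mutual-consistency layer can be pictured as a refresh step that recomputes every view against the same source state and publishes them together, so a reader never sees view A at one source state and view B at another. The names below are hypothetical and the recomputation is from scratch, far simpler than the incremental machinery in WHIPS.

    class ConsistentViewSet:
        def __init__(self, definitions):
            self.definitions = definitions  # view name -> function(source_state)
            self.published = {}             # versions that queries may read
            self.version = 0

        def refresh(self, source_state):
            # Recompute all views from one source state, then swap them in
            # together, keeping the set of views mutually consistent.
            fresh = {name: define(source_state)
                     for name, define in self.definitions.items()}
            self.published, self.version = fresh, self.version + 1

    views = ConsistentViewSet({"total": sum, "count": len})
    views.refresh([3, 4, 5])   # published: {"total": 12, "count": 3}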

209. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/dapd.ps
TitleConsistency Algorithms for Multi-Source Warehouse View Maintenance
AuthorsY. Zhuge , H. Garcia-Molina , J. Wiener
Year1997
CitationSubmitted to Distributed and Parallel Databases Journal (DAPD), Kluwer
Keywordsdata warehouse, data consistency, view maintenance
AbstractA warehouse is a data repository containing integrated information for efficient querying and analysis. Maintaining the consistency of warehouse data is challenging, especially if the data sources are autonomous and views of the data at the warehouse span multiple sources. Transactions containing multiple updates at one or more sources, e.g., batch updates, complicate the consistency problem. In this paper we identify and discuss three fundamental transaction processing scenarios for data warehousing. We define four levels of consistency for warehouse data and present a new family of algorithms, the Strobe family, that maintain consistency as the warehouse is updated, under the various warehousing scenarios. All of the algorithms are incremental and can handle a continuous and overlapping stream of updates from the sources. Our implementation shows that the algorithms are practical and realistic choices for a wide variety of update scenarios.

210. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/lagii.ps
TitleDatabase Research: Achievements and Opportunities into the 21st Century
AuthorsA. Silberschatz , M. Stonebraker , J. D. Ullman (editors)
Year1995
CitationA report of an NSF workshop on the future of database research held May, 1995. Appears in March 1996 Sigmod Record.
Keywords
AbstractDatabase Research: Achievements and Opportunities Into the 21st Century Avi Silberschatz, Mike Stonebraker, Jeff Ullman, editors Report of an NSF Workshop on the Future of Database Systems Research, May 26-27, 1995 1 Introduction In February of 1990, a group of database researchers met to examine the prospects for future database research efforts. The resulting report (Silberschatz et al. [1990]) was instrumental in bringing to the public's attention both the significance of prior database research and the number of challenging and important problems that lay ahead. We shall not repeat here the major points made in that report concerning the historical development of relational database systems and transaction management. Rather, the reader is encouraged to consult either that report or an on-line document (Gray), each of which discusses these and other historical achievements of database research. In May of 1995, a second workshop was convened to consider anew the prospects for databa

211. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/pub/ullman/1996/kdd.ps
TitleEfficient Implementation of Data Cubes via Materialized Views
Authors
Year
CitationAppears in KDD Proceedings 1996
Keywords
Abstract

212. (matched words:1)
URLhttp://www-db.stanford.edu/pub/papers/SelectionViews.ps
TitleSelection of Views to Materialize in a Data Warehouse
Authors
Year
CitationTo appear in ICDT '97.
Keywordsviews, data warehouse, materialized views, AND-OR graphs, selection algorithms
AbstractA data warehouse stores materialized views of data from one or more sources, with the purpose of efficiently implementing decision-support or OLAP queries. One of the most important decisions in designing a data warehouse is the selection of materialized views to be maintained at the warehouse. The goal is to select an appropriate set of views that minimizes total query response time and the cost of maintaining the selected views, given a limited amount of resources, e.g., materialization time, storage space, etc. In this article, we develop a theoretical framework for the general problem of selection of views in a data warehouse. We present competitive polynomial-time heuristics for selection of views to optimize total query response time, for some important special cases of the general data warehouse scenario, viz.: (i) an AND view graph, where each query/view has a unique evaluation, and (ii) an OR view graph, in which any view can be computed from any one of its related views, e.g., d
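
The greedy flavor of such heuristics is easy to sketch: repeatedly materialize the candidate whose ratio of query-cost reduction to resource consumed is best, until the budget is spent. The cost functions below are hypothetical stand-ins for the paper's cost model, not its actual AND/OR view-graph algorithms.

    def greedy_select(views, budget, query_cost, view_cost):
        """views: set of candidate views; budget: total resource available;
        query_cost(selected): total query response time given a selection;
        view_cost(v): resource used by materializing v (assumed > 0)."""
        selected, spent = set(), 0
        while True:
            base = query_cost(selected)
            best, best_ratio = None, 0.0
            for v in views - selected:
                if spent + view_cost(v) > budget:
                    continue  # does not fit in the remaining budget
                gain = base - query_cost(selected | {v})
                ratio = gain / view_cost(v)
                if ratio > best_ratio:
                    best, best_ratio = v, ratio
            if best is None:
                return selected  # nothing fits or nothing helps
            selected.add(best)
            spent += view_cost(best)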

Qingshan Luo / qluo@cs.stanford.edu / Last updated on 9/30/96.