URL | http://www-db.stanford.edu/pub/papers/fqo.ps |
---|---|
Title | Fusion Query Optimization |
Authors | S. Abiteboul , H. Garcia-Molina , Y. Papakonstantinou , R. Yerneni |
Year | 1996 |
Citation | Unpublished manuscript |
Keywords | fusion queries, Internet databases, query optimization |
Abstract | Fusion queries search for information integrated from distributed, autonomous sources over the Internet. In this context, data is not cleanly fragmented as in traditional distributed databases, and the number of sources participating in a typical query is large. We investigate optimization techniques for fusion queries. First, we focus on a very wide class of plans that capture the spirit of plans usually considered in commercial systems. We show how to find, under various realistic scenarios, an optimal plan within this large class. We evaluate the performance of this optimal plan and of other plans selected by simpler optimizers to demonstrate the benefits of our approach. We provide additional heuristics that, by considering plans outside our target class of plans, yield further performance improvements. |
URL | http://www-db.stanford.edu/pub/papers/icdt97.semistructured.ps |
---|---|
Title | Querying Semi-Structured Data |
Authors | S. Abiteboul |
Year | 1996 |
Citation | ICDT, 1997 |
Keywords | |
Abstract | The amount of data of all kinds available electronically has increased dramatically in recent years. The data resides in different forms, ranging from unstructured data in file systems to highly structured data in relational database systems. Data is accessible through a variety of interfaces including Web browsers, database query languages, application-specific interfaces, and data exchange formats. Some of this data is raw data, e.g., images or sound. Some of it has structure, even if the structure is often implicit and not as rigid or regular as that found in standard database systems. Sometimes the structure exists but has to be extracted from the data. Sometimes it exists but we prefer to ignore it for certain purposes such as browsing. We use the term semi-structured data for data that is (from a particular viewpoint) neither raw data nor strictly typed, i.e., not table-orient
URL | http://www-db.stanford.edu/pub/papers/icdt97.www.ps |
---|---|
Title | Queries and Computation on the Web |
Authors | S. Abiteboul , V. Vianu |
Year | 1996 |
Citation | ICDT, 1997 |
Keywords | |
Abstract | The paper introduces a model of the Web as an infinite, semistructured set of objects. We reconsider the classical notions of genericity and computability of queries in this new context and relate them to styles of computation prevalent on the Web, based on browsing and searching. We revisit several well-known declarative query languages (first-order logic, Datalog, and Datalog with negation) and consider their computational characteristics in terms of the notions introduced in this paper. In particular, we are interested in languages or fragments thereof which can be implemented by browsing, or by browsing and searching combined. Surprisingly, stratified and well-founded semantics for negation turn out to have basic shortcomings in this context, while inflationary semantics emerges as an appealing alternative. |
URL | http://www-db.stanford.edu/pub/papers/icdt97.correspondance.ps |
---|---|
Title | Correspondence and Translation for Heterogeneous Data |
Authors | S. Abiteboul , S. Cluet , T. Milo |
Year | 1996 |
Citation | ICDT, 1997 |
Keywords | |
Abstract | A primary motivation for new database technology is to provide support for the broad spectrum of multimedia data available, notably through the network. These data are stored in different formats: SQL or ODMG (databases), SGML or LaTeX (documents), DX formats (scientific data), Step (CAD/CAM), etc. Their integration is a very active field of research and development (see, for instance, for a very small sample, [10, 6, 7, 9, 8, 12, 19]). In this paper, we provide a formal foundation to facilitate the integration of such heterogeneous data and the maintenance of heterogeneous replicated data. A sound solution for a data integration task requires a clean abstraction of the different formats in which data are stored, and means for specifying the co
URL | http://www-db.stanford.edu/pub/papers/pods96.tl.ps |
---|---|
Title | Temporal versus First-Order Logic to Query Temporal Databases |
Authors | S. Abiteboul , L. Herr , J. Bussche |
Year | 1996 |
Citation | PODS 96, 1996 |
Keywords | |
Abstract | A database history can be modeled as a (finite) sequence of instances discretely ordered by time. Similarly, the behavior of a system such as an operating system or a reactive system can be modeled by an infinite such sequence. One can view the sequence as one single database where each relation has an additional column holding the time instant of validity of each tuple. The temporal database can then be queried using standard relational calculus (first-order logic) on this "timestamp" representation. One may alternatively use an implicit access to time and express queries in temporal logic. It is known that these two approaches yield the same expressive power in the propositional case. Their comparison in the predicate/database context remained open. We prove here that there are first-order logic queries on the timestamp representation that are not expressible in (extended) temporal logic. The proof technique is novel and is based on communication complexity. |
URL | http://www-db.stanford.edu/pub/papers/lorel96.ps |
---|---|
Title | The Lorel Query Language for Semistructured Data |
Authors | S. Abiteboul , D. Quass , J. McHugh , J. Widom , J. Wiener |
Year | 1996 |
Citation | to appear in the Journal on Digital Libraries |
Keywords | Lore, semistructured, unstructured, object-oriented, OQL, ODMG |
Abstract | We present the Lorel language, designed for querying semistructured data. Semistructured data is becoming more and more prevalent, e.g., in structured documents such as HTML and when performing simple integration of data from multiple sources. Traditional data models and query languages are inappropriate, since semistructured data often is irregular: some data is missing, similar concepts are represented using different types, heterogeneous sets are present, or object structure is not fully known. Lorel is a user-friendly language in the SQL/OQL style for querying such data effectively. For wide applicability, the simple object model underlying Lorel can be viewed as an extension of the ODMG data model and the Lorel language as an extension of OQL. The main novelties of the Lorel language are: (i) the extensive use of coercion to relieve the user from the strict typing of OQL, which is inappropriate for semistructured data; and (ii) powerful path expressions, which permit a flexible form o
URL | http://www-db.stanford.edu/pub/papers/oemview97.ps |
---|---|
Title | Views for Semistructured Data |
Authors | S. Abiteboul , R. Goldman , J. McHugh , V. Vassalos , Y. Zhuge |
Year | 1997 |
Citation | Technical Report |
Keywords | Semistructured Databases, Heterogeneous Databases, Views |
Abstract | Defining a view over a semistructured database introduces many new problems. In this paper we propose a view specification language and consider the problem of answering queries posed over views. The two main approaches, query rewriting and view materialization, are outlined with focus on the difficult problems caused by the semistructured nature of the data. |
URL | http://www-db.stanford.edu/pub/papers/cost.ps |
---|---|
Title | Query Caching and Optimization in Mediator Systems |
Authors | S. Adali , S. Candan , Y. Papakonstantinou , V. Subrahmanian |
Year | 1995 |
Citation | ACM SIGMOD 96. |
Keywords | |
Abstract |
URL | http://www-db.stanford.edu/pub/papers/priority1.ps |
---|---|
Title | Emulating Soft Real-Time Scheduling Using Traditional Operating System Schedulers |
Authors | B. Adelberg , H. Garcia-Molina , B. Kao |
Year | 1994 |
Citation | Appeared in RTSS 1994; technical note is in priority2.ps |
Keywords | |
Abstract | Real-time scheduling algorithms are usually only available in the kernels of real-time operating systems, and not in more general purpose operating systems, like Unix. For some soft real-time problems, a traditional operating system may be the development platform of choice. This paper addresses methods of emulating real-time scheduling algorithms on top of standard time-share schedulers. We examine (through simulations) three strategies for priority assignment within a traditional multi-tasking environment. The results show that the emulation algorithms are comparable in performance to the real-time algorithms and in some instances outperform them. Keywords: soft real-time, priority assignment, scheduling. |
URL | http://www-db.stanford.edu/pub/papers/updates1.ps |
---|---|
Title | Applying Update Streams in a Soft Real-Time Database System |
Authors | B. Adelberg , H. Garcia-Molina , B. Kao |
Year | 1994 |
Citation | Appeared in SIGMOD 1995; technical note is in updates2.ps |
Keywords | |
Abstract | Many papers have examined how to efficiently export a materialized view, but to our knowledge none have studied how to efficiently import one. To import a view, i.e., to install a stream of updates, a real-time database system must process new updates in a timely fashion to keep the database "fresh," but at the same time must process transactions and ensure they meet their time constraints. In this paper, we discuss the various properties of updates and views (including staleness) that affect this tradeoff. We also examine, through simulation, four algorithms for scheduling transactions and installing updates in a soft real-time database. Keywords: soft real-time, temporal databases, materialized views, updates. |
URL | http://www-db.stanford.edu/pub/papers/derived1.ps |
---|---|
Title | Database Support for Efficiently Maintained Derived Data |
Authors | B. Adelberg , B. Kao , H. Garcia-Molina |
Year | 1995 |
Citation | Appeared in EDBT '96 |
Keywords | |
Abstract | Derived data is maintained in a database system to correlate and summarize base data which records real world facts. As base data changes, derived data needs to be recomputed. A high performance system should execute all these updates and recomputations in a timely fashion so that the data remains fresh and useful, while at the same time executing user transactions quickly. This paper studies the intricate balance between recomputing derived data and transaction execution. Our focus is on efficient recomputation strategies -- how and when recomputations should be done to reduce their cost without jeopardizing data timeliness. We propose the Forced Delay recomputation algorithm and show how it can exploit update locality to improve both data freshness and transaction response time. Keywords: derived data, view maintenance, active database system, transaction scheduling, update locality. |
URL | http://www-db.stanford.edu/pub/papers/overview.ps |
---|---|
Title | An Overview of the STanford Real-time Information Processor (STRIP) |
Authors | B. Adelberg , B. Kao , H. Garcia-Molina |
Year | 1995 |
Citation | Appeared in SIGMOD Record, March '96 |
Keywords | |
Abstract | We believe that the greatest growth potential for soft real-time databases is not as isolated monolithic databases but as components in open systems consisting of many heterogeneous databases. In such environments, the flexibility to deal with unpredictable situations and the ability to cooperate with other databases (often non-real-time databases) are just as important as the guarantee of stringent timing constraints. In this paper, we describe a database designed explicitly for heterogeneous environments: the STanford Real-time Information Processor (STRIP). STRIP, which runs on standard POSIX Unix, is a soft real-time main-memory database with special facilities for importing and exporting data as well as handling derived data. We describe the architecture of STRIP, its unique features, and its potential uses in overall system architectures. |
URL | http://www-db.stanford.edu/pub/papers/evaluating-strip.ps |
---|---|
Title | Project Synopsis: Evaluating STRIP |
Authors | B. Adelberg , H. Garcia-Molina |
Year | 1996 |
Citation | DART '96. |
Keywords | STRIP, active database, derived data, benchmark, program trading |
Abstract | This paper describes preliminary efforts at evaluating the performance of the Stanford Real-time Information Processor (STRIP). We describe a benchmark for active real-time databases based on a program trading application and report STRIP's performance on the benchmark. Keywords: derived data, view maintenance, active database system, transaction scheduling, program trading, real-time database system. |
URL | http://www-db.stanford.edu/pub/papers/strip-rules.ps |
---|---|
Title | The STRIP Rule System For Efficiently Maintaining Derived Data |
Authors | B. Adelberg , H. Garcia-Molina , J. Widom |
Year | 1996 |
Citation | technical report |
Keywords | active database, derived data, materialized views, program trading |
Abstract | Derived data is maintained in a database system to correlate and summarize base data which records real world facts. As base data changes, derived data needs to be recomputed. This is often implemented by writing active rules that are triggered by changes to base data. In a system with rapidly changing base data, a database with a standard rule system may consume most of its resources running rules to recompute data. This paper presents the rule system implemented as part of the STanford Real-time Information Processor (STRIP). The STRIP rule system is an extension of SQL3-type rules that allows groups of rule actions to be batched together to reduce the total recomputation load on the system. In this paper we describe the syntax and semantics of the STRIP rule system, present an example set of rules to maintain stock index and theoretical option prices in a program trading application, and report the results of experiments performed on the running system. The experiments verify that STRIP's r
URL | http://www-db.stanford.edu/pub/papers/vlsiDesign.ps |
---|---|
Title | Multi-Way VLSI Circuit Partitioning |
Authors | P. Agrawal , B. Narendran , N. Shivakumar |
Year | 1996 |
Citation | Proceedings of 9th International Conference on VLSI Design, Bangalore, India, Jan'96 |
Keywords | VLSI, CAD, Partitioning, Circuit |
Abstract |
URL | http://www-db.stanford.edu/pub/papers/behavior.ps |
---|---|
Title | Behavior of Database Production Rules: Termination, Confluence, and Observable Determinism |
Authors | A. Aiken , J. Hellerstein , J. Widom |
Year | 1992 |
Citation | Appeared in SIGMOD '92; published while at IBM Almaden; extended journal version appears in static-analysis.ps |
Keywords | active database, static analysis, Starburst |
Abstract | Static analysis methods are given for determining whether arbitrary sets of database production rules are (1) guaranteed to terminate; (2) guaranteed to produce a unique final database state; (3) guaranteed to produce a unique stream of observable actions. When the analysis determines that one of these properties is not guaranteed, it isolates the rules responsible for the problem and determines criteria that, if satisfied, guarantee the property. The analysis methods are presented in the context of the Starburst Rule System; they will form the basis of an interactive development environment for Starburst rule programmers. |
URL | http://www-db.stanford.edu/pub/papers/static-analysis.ps |
---|---|
Title | Static Analysis Techniques for Predicting the Behavior of Database Production Rules |
Authors | A. Aiken , J. Hellerstein , J. Widom |
Year | 1995 |
Citation | In ACM TODS, March 1995; extended version of work initially reported in behavior.ps |
Keywords | active database, Starburst |
Abstract | Methods are given for statically analyzing sets of database production rules to determine if the rules are (1) guaranteed to terminate, (2) guaranteed to produce a unique final database state, and (3) guaranteed to produce a unique stream of observable actions. If the analysis determines that one of these properties is not guaranteed, it isolates the rules responsible for the problem and determines criteria that, if satisfied, guarantee the property. The analysis methods are presented in the context of the Starburst Rule System. |
URL | http://www-db.stanford.edu/pub/papers/metadata.ps |
---|---|
Title | The Stanford Digital Library Metadata Architecture |
Authors | M. Baldonado , K. Chang , L. Gravano , A. Paepcke |
Year | 1996 |
Citation | Submitted for publication |
Keywords | |
Abstract | We describe the requirements and design of a metadata architecture for the Stanford Digital Library project. The overall goal of this project is to provide an infrastructure that affords interoperability among heterogeneous, autonomous digital library services. These services include both search services and remotely usable information processing facilities. We situate our requirements for a metadata architecture in the context of our experience in building this infrastructure and its associated services. Our metadata architecture fits into our established infrastructure and promotes interoperability among existing and de-facto metadata standards. Several pieces of this architecture are implemented; others are under construction. The architecture includes attribute model proxies, attribute model translation services, metadata information facilities for search services, and local metadata repositories. In presenting and discussing the pieces of the architecture, we show how they addres |
URL | http://www-db.stanford.edu/pub/papers/delta-relations.ps |
---|---|
Title | Using Delta Relations to Optimize Condition Evaluation in Active Databases |
Authors | E. Baralis , J. Widom |
Year | 1993 |
Citation | Stanford Technical Report Stan-CS-93-1495, Nov. 1993; shortened proceedings version appears in delta-relations-rids.ps |
Keywords | |
Abstract | We give a method for improving the efficiency of condition evaluation during rule processing in active database systems. The method derives, from a rule condition, two improved conditions that can be used in place of the original condition when a previous value (true or false) of the original condition is known. The derived conditions are more efficient than the original condition because they replace references to entire database relations by references to delta relations, which typically are much smaller. Delta relations are accessible to rule conditions in almost all current active database systems, making this optimization broadly applicable. We specify an implementation of our rewriting method based on attribute grammars. |
URL | http://www-db.stanford.edu/pub/papers/algebraic-analysis.ps |
---|---|
Title | An Algebraic Approach to Rule Analysis in Expert Database Systems |
Authors | E. Baralis , J. Widom |
Year | 1994 |
Citation | Stanford Technical Report Stan-CS-94-1504, Feb. 1994; shortened version appears in analysis-vldb.ps |
Keywords | active database, static analysis |
Abstract | Expert database systems extend the functionality of conventional database systems by providing a facility for creating and automatically executing Condition-Action rules. While Condition-Action rules in database systems are very powerful, they also can be very difficult to program, due to the unstructured and unpredictable nature of rule processing. We provide methods for static analysis of Condition-Action rules; our methods determine whether a given rule set is guaranteed to terminate, and whether rule execution is confluent (has a guaranteed unique final state). Our methods are based on previous methods for analyzing rules in active database systems. We improve considerably on the previous methods by providing analysis criteria that are much less conservative: our methods often determine that a rule set will terminate or is confluent when previous methods could not. Our improved analysis is based on a "propagation" algorithm, which uses a formal approach based on an extended relational alge
URL | http://www-db.stanford.edu/pub/papers/analysis-vldb.ps |
---|---|
Title | An Algebraic Approach to Rule Analysis in Expert Database Systems |
Authors | E. Baralis , J. Widom |
Year | 1994 |
Citation | Appeared in VLDB '94; shortened proceedings version of algebraic-analysis.ps |
Keywords | active database, static analysis |
Abstract | Expert database systems extend the functionality of conventional database systems by providing a facility for creating and automatically executing Condition-Action rules. While Condition-Action rules in database systems are very powerful, they also can be very difficult to program, due to the unstructured and unpredictable nature of rule processing. We provide methods for static analysis of Condition-Action rules; our methods determine whether a given rule set is guaranteed to terminate, and whether rule execution is confluent (has a guaranteed unique final state). Our methods are based on previous methods for analyzing rules in active database systems. We improve considerably on the previous methods by providing analysis criteria that are much less conservative: our methods often determine that a rule set will terminate or is confluent when previous methods could not. Our improved analysis is based on a "propagation" algorithm, which uses a formal approach based on an extended relational alge
URL | http://www-db.stanford.edu/pub/papers/delta-relations-rids.ps |
---|---|
Title | Using Delta Relations to Optimize Condition Evaluation in Active Databases |
Authors | E. Baralis , J. Widom |
Year | 1995 |
Citation | Appeared in RIDS '95; shortened proceedings version of delta-relations.ps |
Keywords | |
Abstract | We give a method for improving the efficiency of condition evaluation during rule processing in active database systems. The method derives, from a rule condition, two new conditions that can be used in place of the original condition when a previous value (true or false) of the original condition is known. The derived conditions are generally more efficient to evaluate than the original condition because they are incremental--they replace references to entire database relations by references to delta relations, which typically are much smaller. Delta relations are accessible to rule conditions in almost all current active database systems, making this optimization broadly applicable. We describe an attribute grammar based approach that we have used to implement our condition rewriting technique. |
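The incremental idea in this abstract can be sketched in a few lines. The sketch below is a hypothetical illustration, not the paper's actual rewriting method (which works on rule-condition syntax via attribute grammars): for an existential condition over a relation, a previously-false condition need only be checked against the inserted delta, and a previously-true condition stays true unless tuples were deleted.

```python
def holds(tuples, pred):
    """Full evaluation: does any tuple satisfy the condition?"""
    return any(pred(t) for t in tuples)

def reevaluate(prev, inserted, deleted, new_relation, pred):
    """Incrementally re-evaluate an existential condition given its
    previous truth value and the delta relations (inserted/deleted tuples).
    Falls back to scanning the full relation only when it must."""
    if not prev:
        # Previously false: only newly inserted tuples can make it true.
        return holds(inserted, pred)
    if not deleted:
        # Previously true and nothing removed: still true.
        return True
    # A satisfying tuple may have been deleted: full scan required.
    return holds(new_relation, pred)
```

The common cases touch only the (typically small) delta relations, which is the source of the speedup the abstract describes.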
URL | http://www-db.stanford.edu/pub/papers/better-static-analysis.ps |
---|---|
Title | Better Static Rule Analysis for Active Database Systems |
Authors | E. Baralis , J. Widom |
Year | |
Citation | Technical report, submitted for journal publication; preliminary version of some of this work appears in algebraic-analysis.ps |
Keywords | |
Abstract | Rules in active database systems can be very difficult to program, due to the unstructured and unpredictable nature of rule processing. We provide static analysis techniques for predicting whether a given rule set is guaranteed to terminate, and whether rule execution is confluent (guaranteed to have a unique final state). Our methods are based on previous techniques for analyzing rules in active database systems. We improve considerably on the previous techniques by providing analysis criteria that are much less conservative: our methods often determine that a rule set will terminate or is confluent when previous methods could not make this determination. Our improved analysis is based on a "propagation" algorithm, which uses an extended relational algebra to accurately determine when the action of one rule can affect the condition of another, and to determine when rule actions commute. We consider both Condition-Action rules and Event-Condition-Action rules, making our approach widely applic
URL | http://www-db.stanford.edu/pub/papers/1993/data-replication.short.ps |
---|---|
Title | Replicated Data Management in Mobile Environments: Anything New Under the Sun? |
Authors | D. Barbara , H. Garcia-Molina |
Year | 1993 |
Citation | IFIP Conference on Applications in Parallel and Distributed Computing, April 1994 |
Keywords | |
Abstract |
URL | http://www-db.stanford.edu/pub/papers/ooindex.ps |
---|---|
Title | Centralized Versus Distributed Index Management in a Page Server OODBMS |
Authors | J. Basu , A. Keller |
Year | 1995 |
Citation | unpublished manuscript |
Keywords | OODB, index, simulation |
Abstract |
URL | http://www-db.stanford.edu/pub/papers/isolation.ps |
---|---|
Title | Degrees of Transaction Isolation in SQL*Cache: A Predicate-based Client-side Caching System |
Authors | J. Basu , A. Keller |
Year | 1996 |
Citation | unpublished manuscript |
Keywords | caching, cache consistency, transaction consistency |
Abstract |
URL | http://www-db.stanford.edu/pub/papers/multidatabase.ps |
---|---|
Title | Overview of Multidatabase Transaction Management |
Authors | Y. Breitbart , H. Garcia-Molina , A. Silberschatz |
Year | 1992 |
Citation | VLDB Journal, Vol. 1, No. 2, October 1992, pp.181-23. |
Keywords | |
Abstract |
URL | http://www-db.stanford.edu/pub/papers/copy.ps |
---|---|
Title | Copy Detection Mechanisms for Digital Documents |
Authors | S. Brin , J. Davis , H. Garcia-Molina |
Year | 1995 |
Citation | Appeared in SIGMOD '95 |
Keywords | |
Abstract | In a digital library system, documents are available in digital form and therefore are more easily copied and their copyrights are more easily violated. This is a very serious problem, as it discourages owners of valuable information from sharing it with authorized users. There are two main philosophies for addressing this problem: prevention and detection. The former actually makes unauthorized use of documents difficult or impossible, while the latter makes it easier to discover such activity. In this paper we propose a system for registering documents and then detecting copies, either complete copies or partial copies. We describe algorithms for such detection, and metrics required for evaluating detection mechanisms (covering accuracy, efficiency, and security). We also describe a working prototype, called COPS, describe implementation issues, and present experimental results that suggest the proper settings for copy detection parameters. |
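The registration-then-detection scheme can be illustrated with a toy fingerprinting registry. Everything below is a simplified stand-in, not COPS itself: the sentence-level chunking and the overlap metric are assumptions chosen for brevity, where the real system's choice of chunking unit and thresholds are tuning parameters.

```python
import hashlib

def chunks(text):
    """Crude sentence-level chunking (a stand-in for COPS's chunking unit)."""
    for sep in "?!":
        text = text.replace(sep, ".")
    return [s.strip().lower() for s in text.split(".") if s.strip()]

def fingerprint(text):
    """Represent a document by the set of hashes of its chunks."""
    return {hashlib.sha1(c.encode()).hexdigest() for c in chunks(text)}

class CopyRegistry:
    def __init__(self):
        self.registered = {}

    def register(self, name, text):
        self.registered[name] = fingerprint(text)

    def overlap(self, text):
        """Fraction of the query document's chunks found in each registered
        document; high values flag complete or partial copies."""
        fp = fingerprint(text)
        return {name: (len(fp & reg) / len(fp) if fp else 0.0)
                for name, reg in self.registered.items()}
```

Hashing chunk sets rather than comparing raw text keeps the registry compact and makes detection a cheap set intersection.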
URL | http://www-db.stanford.edu/pub/papers/near.ps |
---|---|
Title | Near Neighbor Search in Large Metric Spaces |
Authors | S. Brin |
Year | 1995 |
Citation | Appeared in VLDB '95 |
Keywords | |
Abstract | Given user data, one often wants to find approximate matches in a large database. A good example of such a task is finding images similar to a given image in a large collection of images. We focus on the important and technically difficult case where each data element is high dimensional, or more generally, is represented by a point in a large metric space, and distance calculations are computationally expensive. In this paper we introduce a data structure to solve this problem called a GNAT -- a Geometric Near-neighbor Access Tree. It is based on the philosophy that the data structure should act as a hierarchical geometrical model of the data, as opposed to a simple decomposition of the data that does not use its intrinsic geometry. In experiments, we find that GNATs outperform previous data structures in a number of applications. Keywords: near neighbor, metric space, approximate queries, data mining, Dirichlet domains, Voronoi regions. |
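A stripped-down version of the GNAT idea can be sketched as follows. This is a hypothetical simplification under assumed details (random split points, fixed fan-out, range search only, not the paper's construction): each node keeps, for every child group, its min/max distance to every split point, and the triangle inequality prunes groups whose distance annuli cannot intersect the query ball.

```python
import math
import random

def euclid(p, q):
    return math.dist(p, q)

class GNAT:
    """Simplified Geometric Near-neighbor Access Tree (range search only)."""
    def __init__(self, points, dist=euclid, fanout=3):
        self.dist = dist
        if len(points) <= fanout:          # leaf: store the points directly
            self.splits, self.children, self.ranges = list(points), [], []
            return
        self.splits = random.sample(points, fanout)
        groups = [[] for _ in self.splits]
        for p in points:
            if p in self.splits:
                continue
            i = min(range(fanout), key=lambda j: dist(p, self.splits[j]))
            groups[i].append(p)            # each point joins its nearest split
        # ranges[i][j] = (min, max) distance from split point j to group i
        self.ranges = [[(min(dist(s, p) for p in g), max(dist(s, p) for p in g))
                        if g else (0.0, 0.0) for s in self.splits]
                       for g in groups]
        self.children = [GNAT(g, dist, fanout) if g else None for g in groups]

    def range_search(self, q, r, out=None):
        out = [] if out is None else out
        ds = [self.dist(q, s) for s in self.splits]
        out.extend(s for s, d in zip(self.splits, ds) if d <= r)
        for i, child in enumerate(self.children):
            if child is None:
                continue
            # Triangle inequality: group i can hold an answer only if the
            # query ball intersects its annulus around every split point.
            if all(lo - r <= ds[j] <= hi + r
                   for j, (lo, hi) in enumerate(self.ranges[i])):
                child.range_search(q, r, out)
        return out
```

The pruning test never discards a true answer (both bounds follow from the triangle inequality), so the search is exact while skipping most distance computations.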
URL | http://www-db.stanford.edu/pub/papers/constraint-maintenance.ps |
---|---|
Title | Deriving Production Rules for Constraint Maintenance |
Authors | S. Ceri , J. Widom |
Year | 1990 |
Citation | Appeared in VLDB '90; published while at IBM Almaden |
Keywords | active database, integrity, Starburst |
Abstract | Traditionally, integrity constraints in database systems are maintained either by rolling back any transaction that produces an inconsistent state or by disallowing or modifying operations that may produce an inconsistent state. An alternative approach is to provide automatic "repair" of inconsistent states using production rules. For each constraint, a production rule is used to detect constraint violation and to initiate database operations that restore consistency. We describe an SQL-based language for defining integrity constraints and a framework for translating these constraints into constraint-maintaining production rules. Some parts of the translation are automatic while other parts require user intervention. Based on the semantics of our set-oriented production rules language and under certain assumptions, we prove that at the end of each transaction the rules are guaranteed to produce a state satisfying all defined constraints. We apply our approach to a good-sized example. |
URL | http://www-db.stanford.edu/pub/papers/view-maintenance.ps |
---|---|
Title | Deriving Production Rules for Incremental View Maintenance |
Authors | S. Ceri , J. Widom |
Year | 1991 |
Citation | Appeared in VLDB '91 (award paper); published while at IBM Almaden |
Keywords | active database, Starburst |
Abstract | It is widely recognized that production rules in database systems can be used to automatically maintain derived data such as views. However, writing a correct set of rules for efficiently maintaining a given view can be a difficult and ad-hoc process. We provide a facility whereby a user defines a view as an SQL select expression, from which the system automatically derives set-oriented production rules that maintain a materialization of that view. The maintenance rules are triggered by operations on the view's base tables. Generally, the rules perform incremental maintenance: the materialized view is modified according to the sets of changes made to the base tables, which are accessible through logical tables provided by the rule language. However, for some operations substantial recomputation may be required. We give algorithms that, based on key information, perform syntactic analysis on a view definition to determine when efficient maintenance is possible. |
URL | http://www-db.stanford.edu/pub/papers/par-dist-rules.ps |
---|---|
Title | Production Rules in Parallel and Distributed Database Environments |
Authors | S. Ceri , J. Widom |
Year | 1992 |
Citation | Appeared in VLDB '92; published while at IBM Almaden |
Keywords | active database, consistency, Starburst |
Abstract | In most database systems with production rule facilities, rules respond to operations on centralized data and rule processing is performed in a centralized, sequential fashion. In parallel and distributed database environments, for maximum autonomy it is desirable for rule processing to occur separately at each site (or node), responding to operations on data at that site. However, since rules at one site may read or modify data and interact with rules at other sites, independent rule processing at each site may be impossible or incorrect. We describe mechanisms that allow rule processing to occur separately at each site and guarantee correctness: parallel or distributed rule processing is provably equivalent to rule processing in the corresponding centralized environment. Our mechanisms include locking schemes, communication protocols, and rule restrictions. Based on a given parallel or distributed environment and desired level of transparency, the mechanisms may be combined or may be used |
URL | http://www-db.stanford.edu/pub/papers/heterogeneity.ps |
---|---|
Title | Managing Semantic Heterogeneity with Production Rules and Persistent Queues |
Authors | S. Ceri , J. Widom |
Year | 1993 |
Citation | Appeared in VLDB '93; published while at IBM Almaden |
Keywords | active database, heterogeneous database, Starburst |
Abstract | We show that production rules and persistent queues together provide a convenient mechanism for maintaining consistency in semantically heterogeneous multidatabase environments. We describe a specification language and methods for automatically deriving production rules that maintain (1) existence dependencies, in which the presence of data in one database implies the presence of related data in another, and (2) value dependencies, in which the value of data in one database is based on the value of related data in another. The production rules derived from dependency specifications use persistent queues to monitor and maintain the dependencies automatically, asynchronously, incrementally, and correctly. |
URL | http://www-db.stanford.edu/pub/papers/deductive-data.ps |
---|---|
Title | Deriving Incremental Production Rules for Deductive Data |
Authors | S. Ceri , J. Widom |
Year | 1994 |
Citation | Appeared in Information Systems, 1994 |
Keywords | active database, deductive database, Starburst |
Abstract | We show that the production rule mechanism provided by active database systems can be used to quickly and easily implement the logic rule interface of deductive database systems. Deductive rules specify derived relations using Datalog with built-in predicates and stratified negation; the deductive rules are compiled automatically into production rules. We present a materialized approach, in which the derived relations are stored in the database and the production rules automatically and incrementally propagate base relation changes to the derived relations. We also present a non-materialized approach, in which the production rules compute the derived relations on demand. |
URL | http://www-db.stanford.edu/pub/papers/video-long.ps |
---|---|
Title | Minimizing Memory Use in Video Servers |
Authors | E. Chang , H. Garcia-Molina |
Year | 1996 |
Citation | Stanford Technical Note SIDL-WP-1996-0050; submitted for publication |
Keywords | multimedia, disk scheduling, memory utilization |
Abstract | A number of techniques have been developed for reducing disk latency in a video server, including disk arm scheduling and data placement ones. In this paper we carefully study memory utilization in video servers, leading to enhanced techniques that significantly reduce memory use. This is achieved by introducing artificial delays between IOs. Surprisingly, the delays lead to better overall performance as compared to "traditional" disk latency reduction techniques. We also propose a novel disk scheduling policy that reduces initial latency (i.e., the time between the arrival of a request and the start of the presentation) to a minimum. In addition, we study the memory requirements and costs of a video delivery system, showing how data placement on a disk and across multiple disks impacts memory use. In doing so we show that it may not be advisable to achieve the maximum throughput that a disk can support. |
URL | http://www-db.stanford.edu/pub/papers/latency-reduction.ps |
---|---|
Title | Reducing Initial Latency in Multimedia Storage Systems |
Authors | E. Chang , H. Garcia-Molina |
Year | 1996 |
Citation | Appeared in MMDBMS '96 |
Keywords | |
Abstract | A multimedia server delivers presentations (e.g., videos, movies), providing high bandwidth and continuous real-time delivery. In this paper we present techniques for reducing the initial latency of presentations, i.e., for reducing the time between the arrival of a request and the start of the presentation. Traditionally, initial latency has not received much attention. This is because one major application of multimedia servers is movies on demand, where a delay of a few minutes before a new multi-hour movie starts is acceptable. However, latency reduction is important in interactive applications such as video games and browsing of multimedia documents. Various latency reduction schemes are proposed and analyzed, and their performance compared. We show that our techniques can significantly reduce (almost eliminate in some cases) initial latency without adversely affecting throughput. Moreover, a novel on-disk partial data replication scheme that we propose proves to be far more cost effective |
URL | http://www-db.stanford.edu/pub/papers/bquery.ps |
---|---|
Title | Boolean Query Mapping Across Heterogeneous Information Sources |
Authors | K. Chang , H. Garcia-Molina , A. Paepcke |
Year | 1995 |
Citation | To appear in the IEEE Transactions on Knowledge and Data Engineering as part of a special section of concise research papers on Digital Libraries (Aug. 1996). |
Keywords | Boolean queries, query translation, information retrieval, heterogeneity, digital libraries, query subsumption, filtering. |
Abstract | Searching over heterogeneous information sources is difficult because of the non-uniform query languages. Our approach is to allow a user to compose Boolean queries in one rich front-end language. For each user query and target source, we transform the user query into a subsuming query that can be supported by the source but that may return extra documents. The results are then processed by a filter query to yield the correct final result. In this paper we introduce the architecture and associated algorithms for generating the supported subsuming queries and filters. We show that generated subsuming queries return a minimal number of documents; we also discuss how minimal cost filters can be obtained. We have implemented prototype versions of these algorithms and demonstrated them on heterogeneous Boolean systems. |
URL | http://www-db.stanford.edu/pub/papers/pred_rewriting.ps |
---|---|
Title | Predicate Rewriting for Translating Boolean Queries in a Heterogeneous Information System |
Authors | K. Chang , H. Garcia-Molina , A. Paepcke |
Year | 1996 |
Citation | Unpublished manuscript |
Keywords | Boolean queries, query translation, predicate rewriting, information retrieval, heterogeneity, digital libraries, query subsumption, filtering. |
Abstract | |
URL | http://www-db.stanford.edu/pub/papers/sigmod96.ps |
---|---|
Title | Optimizing Queries over Multimedia Repositories |
Authors | S. Chaudhuri , L. Gravano |
Year | 1996 |
Citation | Appeared in SIGMOD '96 |
Keywords | |
Abstract | Repositories of multimedia objects having multiple types of attributes (e.g., image, text) are becoming increasingly common. A selection on these attributes will typically produce not just a set of objects, as in the traditional relational query model (filtering), but also a grade of match associated with each object, indicating how well the object matches the selection condition (ranking). Also, multimedia repositories may allow access to the attributes of each object only through indexes. We investigate how to optimize the processing of queries over multimedia repositories. A key issue is the choice of the indexes used to search the repository. We define an execution space that is search-minimal, i.e., the set of indexes searched is minimal. Although the general problem of picking an optimal plan in the search-minimal execution space is NP-hard, we solve the problem efficiently when the predicates in the query are independent. We also show that the problem of optimizing queries that ask for |
URL | http://www-db.stanford.edu/pub/papers/cm.ps |
---|---|
Title | Constraint Management in Loosely Coupled Distributed Databases |
Authors | S. Chawathe , H. Garcia-Molina , J. Widom |
Year | 1993 |
Citation | Technical report. Detailed version of papers appearing in ICDE 1996 and the Data Engineering Bulletin, 17(2), June 1994 |
Keywords | constraint management, autonomous systems, heterogeneous systems |
Abstract | We provide a framework for managing integrity constraints that span multiple databases in loosely coupled, heterogeneous environments. Our framework enables the formal description of (1) interfaces provided by a database for the data items involved in multi-database constraints; (2) strategies for monitoring and maintaining multi-database constraints; (3) guarantees regarding the consistency of multi-database constraints. With our approach one can define "relaxed" constraints that only hold at certain times or under certain conditions. Such constraints appear often in practice and cannot be handled effectively by conventional, transaction-based approaches. We also describe a toolkit, based on our framework, for enforcing constraints over heterogeneous systems. The toolkit includes a general-purpose, distributed constraint manager that can be easily configured to a given environment and constraints. A first version of the toolkit has been implemented and is under evaluation. |
URL | http://www-db.stanford.edu/pub/papers/ind-nes-obj.ps |
---|---|
Title | On Index Selection Schemes for Nested Object Hierarchies |
Authors | S. Chawathe , M. Chen , P. Yu |
Year | 1993 |
Citation | VLDB 1994 |
Keywords | Object-oriented databases, nested objects, path expressions, index selection |
Abstract | In this paper we address the problem of devising a set of indexes for a nested object hierarchy in an object-oriented database to improve the overall system performance. It is noted that the effects of two indexes could be entangled in that the inclusion of one index might affect the benefit achievable by the other index. Such a phenomenon is termed index interaction. Clearly, the effect of index interaction needs to be taken into consideration when a set of indexes is being built. The index selection problem is first formulated and four index selection algorithms are evaluated via simulation. The effects of different objective functions, which guide the search in the index selection algorithms, are also investigated. It is shown by simulation results that the greedy algorithm which is devised in light of the phenomenon of index interaction performs fairly well in most cases. Sensitivity analysis for various database parameters is conducted. |
URL | http://www-db.stanford.edu/pub/papers/indexing-nested-objects.ps |
---|---|
Title | On Index Selection Schemes for Nested Object Hierarchies |
Authors | S. Chawathe , M. Chen , P. Yu |
Year | 1993 |
Citation | Technical report. Detailed version of paper that appears in VLDB 1994 |
Keywords | Object-oriented databases, nested objects, path expressions, index selection |
Abstract | In this paper we address the problem of devising a set of indexes for a nested object hierarchy in an object-oriented database to improve the overall system performance. Note that the effects of two indexes could be entangled in that the inclusion of one index might affect the benefit achievable by the other index. Such a phenomenon is termed index interaction in this paper. Clearly, the effect of index interaction needs to be taken into consideration when a set of indexes is being built. The index selection problem is first formulated and four index selection algorithms are evaluated via simulation. The effects of different objective functions, which guide the search in the index selection algorithms, are also investigated. It is shown by simulation results that the greedy algorithm which is devised in light of the phenomenon of index interaction performs fairly well in most cases. It in fact agrees with the very nature of index interaction we identify in this study. Sensitivity analysis for various database parameters is conducted. |
URL | http://www-db.stanford.edu/pub/papers/tsimmis-overview.ps |
---|---|
Title | The TSIMMIS Project: Integration of Heterogeneous Information Sources |
Authors | S. Chawathe , H. Garcia-Molina , J. Hammer , K. Ireland , Y. Papakonstantinou , J. Ullman , J. Widom |
Year | 1994 |
Citation | IPSJ Conference 1994 (Tokyo) |
Keywords | Heterogeneous databases, Object Exchange Model, integration |
Abstract | The goal of the Tsimmis Project is to develop tools that facilitate the rapid integration of heterogeneous information sources that may include both structured and unstructured data. This paper gives an overview of the project, describing components that extract properties from unstructured objects, that translate information into a common object model, that combine information from several sources, that allow browsing of information, and that manage constraints across heterogeneous sites. Tsimmis is a joint project between Stanford and the IBM Almaden Research Center. 1 Overview A common problem facing many organizations today is that of multiple, disparate information sources and repositories, including databases, object stores, knowledge bases, file systems, digital libraries, information retrieval systems, and electronic mail systems. Decision makers often need information from multiple sources, but are unable to get and fuse the required information in a timely fashion due to the |
URL | http://www-db.stanford.edu/pub/papers/autonomous-cm.ps |
---|---|
Title | Flexible Constraint Management for Autonomous Distributed Databases |
Authors | S. Chawathe , H. Garcia-Molina , J. Widom |
Year | 1994 |
Citation | Data Engineering Bulletin, 17(2), June 1994 |
Keywords | constraint management, autonomous systems, heterogeneous systems |
Abstract | When databases inter-operate, integrity constraints arise naturally. For example, consider a flight reservation application that spans multiple airline databases. Airline A reserves a block of X seats from airline B. If A sells many seats from this block, it tries to increase X. For correctness, the value of X recorded in A's database must be the same as that recorded in B's database; this is a simple distributed copy constraint. However, the databases in the above example are owned by independent airlines and are therefore autonomous. Typically, the database of one airline will not participate in distributed transactions with other airlines, nor will it allow other airlines to lock its data. This renders traditional constra |
URL | http://www-db.stanford.edu/pub/papers/tdiff3-8.ps |
---|---|
Title | Change Detection in Hierarchically Structured Information |
Authors | S. Chawathe , A. Rajaraman , H. Garcia-Molina , J. Widom |
Year | 1995 |
Citation | SIGMOD 1996 |
Keywords | change detection, semi-structured data, edit script, difference |
Abstract | Detecting and representing changes to data is important for active databases, data warehousing, view maintenance, and version and configuration management. Most previous work in change management has dealt with flat-file and relational data; we focus on hierarchically structured data. Since in many cases changes must be computed from old and new versions of the data, we define the hierarchical change detection problem as the problem of finding a "minimum-cost edit script" that transforms one data tree to another, and we present effcient algorithms for computing such an edit script. Our algorithms make use of some key domain characteristics to achieve substantially better performance than previous, general-purpose algorithms. We study the performance of our algorithms both analytically and empirically, and we describe the application of our techniques to hierarchically structured documents. |
URL | http://www-db.stanford.edu/pub/papers/tdiff3-2.ps |
---|---|
Title | Change Detection in Hierarchically Structured Information |
Authors | S. Chawathe , A. Rajaraman , H. Garcia-Molina , J. Widom |
Year | 1995 |
Citation | Technical report. Detailed version of paper appearing in SIGMOD 1996 |
Keywords | change detection, semi-structured data, edit script, difference |
Abstract | Detecting and representing changes to data is important for active databases, data warehousing, view maintenance, and version and configuration management. Most previous work in change management has dealt with flat-file and relational data; we focus on hierarchically structured data. Since in many cases changes must be computed from old and new versions of the data, we define the hierarchical change detection problem as the problem of finding a minimum-cost edit script that transforms one data tree to another, and we present efficient algorithms for computing such an edit script. Our algorithms make use of some key domain characteristics to achieve substantially better performance than previous, general-purpose algorithms. We study the performance of our algorithms both analytically and empirically, and we describe the application of our techniques to hierarchically structured documents. |
URL | http://www-db.stanford.edu/pub/papers/cmtoolkit.ps |
---|---|
Title | A Toolkit for Constraint Management in Heterogeneous Systems |
Authors | S. Chawathe , H. Garcia-Molina , J. Widom |
Year | 1995 |
Citation | ICDE 1996 |
Keywords | constraint management, autonomous systems, heterogeneous systems |
Abstract | We present a framework and a toolkit to monitor and enforce distributed integrity constraints in loosely coupled information systems. Our framework formalizes weakened notions of consistency, which are essential in such environments. Our framework is used to describe (1) interfaces provided by a database for the data items involved in intersite constraints; (2) strategies for monitoring and enforcing such constraints; (3) guarantees regarding the level of consistency the system can provide. Our toolkit uses this framework to provide a set of configurable modules that are used to monitor and enforce constraints spanning loosely coupled heterogeneous information systems. |
URL | http://www-db.stanford.edu/pub/papers/bbdifftr.ps |
---|---|
Title | Meaningful Change Detection in Structured Data |
Authors | S. Chawathe , H. Garcia-Molina |
Year | 1996 |
Citation | Technical Report |
Keywords | change detection, edit scripts, tree editing, diff, deltas, object-oriented databases, evolving data |
Abstract | Detecting changes by comparing data snapshots is an important requirement for difference queries, active databases, and version and configuration management. In this paper we focus on detecting meaningful changes in hierarchically structured data, such as nested-object data. This is a much more challenging problem than the corresponding one for relational or flat-file data. In order to describe changes better, we base our work not just on the traditional "atomic" insert, delete, update operations, but also on operations that move an entire sub-tree of nodes, and that copy an entire sub-tree. This allows us to describe changes in a semantically more meaningful way. Since this change detection problem is NP-hard, in this paper we present a heuristic change detection algorithm that yields close to "minimal" descriptions of the changes, and that has fewer restrictions than previous algorithms. Our algorithm is based on transforming the change detection problem to a problem of computing a |
URL | http://www-db.stanford.edu/pub/papers/doem.ps |
---|---|
Title | Representing and Querying Changes in Semistructured Data |
Authors | S. Chawathe , S. Abiteboul , J. Widom |
Year | 1997 |
Citation | Technical Report |
Keywords | Semistructured data, deltas, change queries, standing queries |
Abstract | Semistructured data may be irregular and incomplete and does not necessarily conform to a fixed schema. As with structured data, it is often desirable to maintain a history of changes to data, and to query over both the data and the changes. Representing and querying changes in semistructured data is more difficult than in structured data due to the irregularity and lack of schema. We present a model for representing changes in semistructured data and a language for querying over these changes. We discuss implementation strategies for our model and query language. We also describe the design and implementation of a "query subscription service" that permits standing queries over changes in semistructured information sources. |
URL | http://www-db.stanford.edu/pub/papers/iccad94.ps |
---|---|
Title | Multi-way VLSI Circuit Partitioning Based on Dual Net Representation |
Authors | J. Cong , W. Labio , N. Shivakumar |
Year | 1994 |
Citation | IEEE International Conference on Computer-Aided Design, Nov '94. (ICCAD'94) |
Keywords | |
Abstract | In this paper, we study the area-balanced multi-way partitioning problem of VLSI circuits based on a new dual netlist representation named the hybrid dual netlist (HDN). Given a netlist, we first compute a K-way partition of the nets based on the HDN representation, and then transform a K-way net partition into a K-way module partitioning solution. The main contribution of our work is the formulation and solution of the K-way module contention (K-MC) problem, which determines the best assignment of the modules in contention to partitions, while maintaining user-specified area requirements, when we transform the net partition into a module partition. Under a natural definition of binding factor between nets and modules, and preference function between partitions and modules, we show that the K-MC problem can be reduced to a min-cost max-flow problem. We present efficient solutions to the K-MC problem based on network flow computation. Extensive experimental results show that our algorithm consistently outperforms the conventional K-FM partitioning algorithm by a significant margin. |
URL | http://www-db.stanford.edu/pub/papers/vlsi.ps |
---|---|
Title | Multi-way VLSI Circuit Partitioning Based on Dual Net Representation |
Authors | J. Cong , W. Labio , N. Shivakumar |
Year | 1996 |
Citation | IEEE Transactions in CAD, April 1996 |
Keywords | |
Abstract | In this paper, we study the area-balanced multi-way partitioning problem of VLSI circuits based on a new dual netlist representation named the hybrid dual netlist (HDN), and propose a general paradigm for multi-way circuit partitioning based on dual net transformation. Given a netlist, we first compute a K-way partitioning of nets based on the HDN representation, and then transform the K-way net partition into a K-way module partitioning solution. The main contribution of our work is in the formulation and solution of the K-way module contention (K-MC) problem, which determines the best assignment of the modules in contention to partitions while maintaining user-specified area requirements, when we transform the net partition into a module partition. Under a natural definition of binding function between nets and modules, and preference function between partitions and modules, we show that the K-MC problem can be reduced to a min-cost max-flow problem. We present an efficient solution to the K-MC problem based on network flow computation. We apply our dual transformation paradigm to the well-known K-way FM partitioning algorithm (K-FM) and show that the new algorithm, named |
URL | http://www-db.stanford.edu/pub/papers/book-chapter.ps |
---|---|
Title | Active Database Systems |
Authors | U. Dayal , E. Hanson , J. Widom |
Year | 1994 |
Citation | Appeared in Won Kim book "Modern Database Systems..." |
Keywords | overview, survey |
Abstract | Integrating a production rules facility into a database system provides a uniform mechanism for a number of advanced database features including integrity constraint enforcement, derived data maintenance, triggers, alerters, protection, version control, and others. In addition, a database system with rule processing capabilities provides a useful platform for large and efficient knowledge-base and expert systems. Database systems with production rules are referred to as active database systems, and the area of active database systems has indeed been active. This chapter summarizes current work in active database systems; topics covered include active database rule models and languages, rule execution semantics, and implementation issues. |
URL | http://www-db.stanford.edu/pub/papers/ifip.ps |
---|---|
Title | A Mechanism and Experimental System for Function-Based Sharing in Federated Databases |
Authors | D. Fang , J. Hammer , D. McLeod |
Year | 1992 |
Citation | In D.K. Hsiao, E.J. Neuhold, and R. Sacks-Davis, editors, Interoperable Database Systems (DS-5) (A-25), pages 239-253. Elsevier Science Publisher |
Keywords | RPC, sharing, function, behavior, schema, evolution |
Abstract | A function-based approach and mechanism to support sharing among the component database systems in a federation is described. In the context of a functional object-based database model, a technique to support inter-component information unit and behavior sharing is presented. An experimental system that implements the function-based sharing mechanism is described, its underlying algorithms are outlined, and its practical utility and effectiveness are assessed. This work is couched in the framework of the Remote-Exchange research project and experimental system. Keyword Codes: H.2.5. Keywords: Heterogeneous Databases |
URL | http://www-db.stanford.edu/pub/papers/iwdom.ps |
---|---|
Title | An Approach to Behavior Sharing in Federated Database Systems |
Authors | D. Fang , J. Hammer , D. McLeod |
Year | 1993 |
Citation | In Distributed Object Management. Morgan Kaufman, 1993, M. T. Ozsu, U.Dayal, and P. Valduriez, editors. |
Keywords | RPC, sharing, method, behavior, schema, evolution |
Abstract | An approach and mechanism to support the sharing of behavior among the component database systems in a federation is described. In the context of a functional object-based database model, a technique to support intercomponent behavior as well as inter-component information unit sharing is presented. An experimental implementation of the behavior sharing mechanism and algorithms is examined, and its practical utility and effectiveness are assessed. This work is couched in the framework of the Remote-Exchange research project and experimental system. |
URL | http://www-db.stanford.edu/pub/papers/compcon.ps |
---|---|
Title | Remote-Exchange: An Approach to Controlled Sharing among Autonomous, Heterogeneous Database Systems |
Authors | D. Fang , J. Hammer , D. McLeod , A. Si |
Year | |
Citation | In Proceedings of the IEEE Spring Compcon. IEEE, San Francisco, February 1991. |
Keywords | Integration, heterogeneity, schema, sharing, resolution |
Abstract | This short paper describes several aspects of the Remote-Exchange project at USC, which focuses on an approach and experimental system for the controlled sharing and exchange of information among autonomous, heterogeneous database systems. The spectrum of heterogeneity which may exist among the components in a federation of database systems is examined, and an approach to accommodating such heterogeneity is described. An overview of the Remote-Exchange experimental system is provided, including the top level architecture, sharing mechanism, and sharing "advisor". |
URL | http://www-db.stanford.edu/pub/papers/nestedSagas.ps |
---|---|
Title | Coordinating Multi-Transaction Activities |
Authors | H. Garcia-Molina , D. Gawlick , J. Klein , K. Kleissner , K. Salem |
Year | 1990 |
Citation | Revised version of Technical Report CS-TR-297-90, Department of Computer Science, Princeton University, February 1990. A shorter version of this paper appeared as "Modeling Long-Running Activities as Nested Sagas," in Database Engineering, Vol. 14, No. 1, March 1991. |
Keywords | |
Abstract | Data processing applications must often execute collections of related transactions. We propose a model for structuring and coordinating these multi-transaction activities. The model includes mechanisms for communication between transactions, for compensating transactions after an activity has failed, for dynamic creation and binding of activities, and for checkpointing the progress of an activity. |
URL | http://www-db.stanford.edu/pub/papers/tsimmis-abstract.ps |
---|---|
Title | Integrating and Accessing Heterogeneous Information Sources in TSIMMIS |
Authors | H. Garcia-Molina , J. Hammer , K. Ireland , Y. Papakonstantinou , J. Ullman , J. Widom |
Year | 1994 |
Citation | Submitted for publication. |
Keywords | |
Abstract | A common problem facing many organizations today is that of multiple, disparate information sources and repositories, including databases, object stores, knowledge bases, file systems, digital libraries, information retrieval systems, and electronic mail systems. Decision makers often need information from multiple sources, but are unable to get and fuse the required information in a timely fashion due to the difficulties of accessing the different systems, and due to the fact that the information obtained can be inconsistent and contradictory. The goal of the TSIMMIS project is to provide tools for accessing, in an integrated fashion, multiple information sources. Numerous other recent projects have similar goals and are discussed briefly at the end of th
URL | http://www-db.stanford.edu/pub/papers/tsimmis-models-languages.ps |
---|---|
Title | The TSIMMIS Approach to Mediation: Data Models and Languages |
Authors | H. Garcia-Molina , Y. Papakonstantinou , D. Quass , A. Rajaraman , Y. Sagiv , J. Ullman , J. Widom |
Year | 1995 |
Citation | To appear in NGITS (Next Generation Information Technologies and Systems), Naharia, Israel, June 27-29, 1995.
Keywords | |
Abstract | TSIMMIS -- The Stanford-IBM Manager of Multiple Information Sources -- is a system for integrating information. It offers a data model and a common query language that are designed to support the combining of information from many different sources. It also offers tools for generating automatically the components that are needed to build systems for integrating information. In this paper we shall discuss the principal architectural features and their rationale. |
URL | http://www-db.stanford.edu/pub/papers/tsimmis-abstract-aaai.ps |
---|---|
Title | Integrating and Accessing Heterogeneous Information Sources in TSIMMIS |
Authors | H. Garcia-Molina , J. Hammer , K. Ireland , Y. Papakonstantinou , J. Ullman , J. Widom |
Year | 1995 |
Citation | To appear in Proceedings of AAAI Spring Symposium on Information Gathering |
Keywords | |
Abstract | The goal of the Tsimmis Project is to develop tools that facilitate the rapid integration of heterogeneous information sources that may include both structured and unstructured data. This paper gives an overview of the project, describing components that extract properties from unstructured objects, that translate information into a common object model, that combine information from several sources, and that allow browsing of information. TSIMMIS and its Components A common problem facing many organizations today is that of multiple, disparate information sources and repositories, including databases, object stores, knowledge bases, file systems, digital libraries, information retrieval systems, and electronic mail systems. Decision makers often need information from multiple sources, but are unable to get and fuse the required information in a timely fashion due to the difficulties of accessing the different systems, and due to the fact that the information obtained can be inconsisten
URL | http://www-db.stanford.edu/pub/papers/pdis96.ps |
---|---|
Title | dSCAM: Finding Document Copies across Multiple Databases |
Authors | H. Garcia-Molina , L. Gravano , N. Shivakumar |
Year | 1996 |
Citation | Proceedings of 4th International Conference on Parallel and Distributed Information Systems (PDIS'96), Miami Beach, Florida |
Keywords | |
Abstract | The advent of the Internet has made the illegal dissemination of copyrighted material easy. An important problem is how to automatically detect when a "new" digital document is "suspiciously close" to existing ones. The SCAM project at Stanford University has addressed this problem when there is a single registered-document database. However, in practice, text documents may appear in many autonomous databases, and one would like to discover copies without having to exhaustively search in all databases. Our approach, dSCAM, is a distributed version of SCAM that keeps succinct metainformation about the contents of the available document databases. Given a suspicious document S, dSCAM uses its information to prune all databases that cannot contain any document that is close enough to S, and hence the search can focus on the remaining databases. We also study how to query the remaining databases so as to minimize different querying costs. We empirically study the pruning and searching schemes, using
URL | http://www-db.stanford.edu/pub/papers/dataguide.ps |
---|---|
Title | DataGuides: Enabling Query Formulation and Optimization in Semistructured Databases |
Authors | R. Goldman , J. Widom |
Year | 1997 |
Citation | Technical Report |
Keywords | |
Abstract | In semistructured databases there is no schema fixed in advance. To provide the benefits of a schema in such environments, we introduce DataGuides: concise and accurate structural summaries of semistructured databases. DataGuides serve as dynamic schemas, generated from the database; they are useful for browsing database structure, formulating queries, storing information such as statistics and sample values, and enabling query optimization. This paper presents the theoretical foundations of DataGuides along with algorithms for their creation and incremental maintenance. We provide performance results based on our implementation of DataGuides in the Lore DBMS for semistructured data. We also describe the use of DataGuides in Lore, both in the user interface to enable structure browsing and query formulation, and as a means of guiding the query processor and optimizing query execution.
URL | http://www-db.stanford.edu/pub/papers/stan.cs.tn.93.002.ps |
---|---|
Title | The Efficacy of GlOSS for the Text Database Discovery Problem |
Authors | L. Gravano , H. Garcia-Molina , A. Tomasic |
Year | 1993 |
Citation | Stanford University Technical Note Number STAN-CS-TN-93-002, 1993; part of paper appeared in SIGMOD '94 (stan.cs.tn.93.002.sigmod94.ps) |
Keywords | |
Abstract | The popularity of information retrieval has led users to a new problem: finding which text databases (out of thousands of candidate choices) are the most relevant to a user. Answering a given query with a list of relevant databases is the text database discovery problem. The first part of this paper presents a practical method for attacking this problem based on estimating the result size of a query and a database. The method is termed GlOSS (Glossary of Servers Server). The second part of this paper evaluates GlOSS using four different semantics to answer a user's queries. Real users' queries were used in the experiments. We also describe several variations of GlOSS and compare their efficacy. In addition, we analyze the storage cost of our approach to the problem.
URL | http://www-db.stanford.edu/pub/papers/stan.cs.tn.93.002.sigmod94.ps |
---|---|
Title | The Effectiveness of GlOSS for the Text-Database Discovery Problem |
Authors | L. Gravano , H. Garcia-Molina , A. Tomasic |
Year | 1994 |
Citation | Appeared in SIGMOD '94; shortened version of stan.cs.tn.93.002.ps |
Keywords | |
Abstract | The popularity of on-line document databases has led to a new problem: finding which text databases (out of many candidate choices) are the most relevant to a user. Identifying the relevant databases for a given query is the text database discovery problem. The first part of this paper presents a practical solution based on estimating the result size of a query and a database. The method is termed GlOSS (Glossary of Servers Server). The second part of this paper evaluates the effectiveness of GlOSS based on a trace of real user queries. In addition, we analyze the storage cost of our approach.
URL | http://www-db.stanford.edu/pub/papers/stan.cs.tn.94.010.ps |
---|---|
Title | Precision and Recall of GlOSS Estimators for Database Discovery |
Authors | L. Gravano , H. Garcia-Molina , A. Tomasic |
Year | 1994 |
Citation | Stanford University Technical Note Number STAN-CS-TN-94-010, 1994; part of paper appeared in PDIS '94 (stan.cs.tn.94.010.pdis94.ps) |
Keywords | |
Abstract | The availability of large numbers of network information sources has led to a new problem: finding which text databases (out of perhaps thousands of choices) are the most relevant to a query. We call this the text-database discovery problem. Our solution to this problem, GlOSS (Glossary-Of-Servers Server), keeps statistics on the available databases to decide which ones are potentially useful for a given query. In this paper we present different query-result size estimators for GlOSS and we evaluate them with metrics based on the precision and recall concepts of text-document information-retrieval theory. Our generalization of these metrics uses different notions of the set of relevant databases to define different query semantics.
URL | http://www-db.stanford.edu/pub/papers/stan.cs.tn.94.010.pdis94.ps |
---|---|
Title | Precision and Recall of GlOSS Estimators for Database Discovery |
Authors | L. Gravano , H. Garcia-Molina , A. Tomasic |
Year | 1994 |
Citation | Appeared in PDIS '94; shortened version of stan.cs.tn.94.010.ps |
Keywords | |
Abstract | On-line information vendors offer access to multiple databases. In addition, the advent of a variety of INTERNET tools [1, 2] has provided easy, distributed access to many more databases. The result is thousands of text databases from which a user may choose for a given information need. This paper, an abridged version of [3], presents a framework for (and analyzes a solution to) this problem, which we call the text-database discovery problem (see [3] for a survey of related work). Our solution to the text-database discovery problem is to build a service that can suggest potentially good databases to search. A user's query will go through two steps: first, the query is presented to our server (dubbed GlOSS, for Glossary-Of-Servers Server) to select
URL | http://www-db.stanford.edu/pub/papers/stan.cs.tn.95.21.ps |
---|---|
Title | Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies |
Authors | L. Gravano , H. Garcia-Molina |
Year | 1995 |
Citation | Stanford University Technical Note Number STAN-CS-TN-95-21, May 1995; shortened version appeared in VLDB '95 |
Keywords | resource discovery, database selection, vector-space retrieval model, information retrieval, text databases
Abstract | As large numbers of text databases have become available on the Internet, it is getting harder to locate the right sources for given queries. In this paper we present gGlOSS, a generalized Glossary-Of-Servers Server, that keeps statistics on the available databases to estimate which databases are the potentially most useful for a given query. gGlOSS extends our previous work [GGMT94a], which focused on databases using the boolean model of document retrieval, to cover databases using the more sophisticated vector-space retrieval model. We evaluate our new techniques using real-user queries and 53 databases. Finally, we further generalize our approach by showing how to build a hierarchy of gGlOSS brokers. The top level of the hierarchy is so small it could be widely replicated, even at end-user workstations.
URL | http://www-db.stanford.edu/pub/papers/vldb95.ps |
---|---|
Title | Generalizing GlOSS to Vector-Space Databases and Broker Hierarchies |
Authors | L. Gravano , H. Garcia-Molina |
Year | 1995 |
Citation | Appeared in VLDB '95; shortened version of stan.cs.tn.95.21.ps |
Keywords | |
Abstract | As large numbers of text databases have become available on the Internet, it is harder to locate the right sources for given queries. In this paper we present gGlOSS, a generalized Glossary-Of-Servers Server, that keeps statistics on the available databases to estimate which databases are the potentially most useful for a given query. gGlOSS extends our previous work [1], which focused on databases using the boolean model of document retrieval, to cover databases using the more sophisticated vector-space retrieval model. We evaluate our new techniques using real-user queries and 53 databases. Finally, we further generalize our approach by showing how to build a hierarchy of gGlOSS brokers. The top level of the hierarchy is so small it could be widely replicated, even at end-user workstations. This research was sponsored by the Advanced Research Projects Agency (ARPA) of the Department of Defense under Grant No. MDA972-92-J-1029 with the Corporation for National Research Initiatives Th |
URL | http://www-db.stanford.edu/pub/papers/sigmod97.ps |
---|---|
Title | STARTS: Stanford Proposal for Internet Meta-Searching |
Authors | L. Gravano , K. Chang , H. Garcia-Molina , A. Paepcke |
Year | 1996 |
Citation | Submitted for publication |
Keywords | |
Abstract | Document sources are available everywhere, both within the internal networks of organizations and on the Internet. Even individual organizations use search engines from different vendors to index their internal document collections. These search engines are typically incompatible in that they support different query models and interfaces, they do not return enough information with the query results for adequate merging of the results, and finally, in that they do not export metadata about the collections that they index (e.g., to assist in resource discovery). This paper describes STARTS, an emerging protocol for Internet retrieval and search that facilitates the task of querying multiple document sources. STARTS has been developed in a unique way. It is not a standard, but a group effort coordinated by Stanford's Digital Library project, and involving over 11 companies and organizations. The objective of this paper is not only to give an overview of the STARTS protocol proposal, but also t
URL | http://www-db.stanford.edu/pub/papers/cc-federated.ps |
---|---|
Title | Protocols for Integrity Constraint Checking in Federated Databases |
Authors | P. Grefen , J. Widom |
Year | 1994 |
Citation | Technical report; shortened proceedings version appears in coopis.ps |
Keywords | distributed database, heterogeneous database |
Abstract | A federated database is comprised of multiple interconnected database systems that primarily operate independently but cooperate to a certain extent. Global integrity constraints can be very useful in federated databases, but the lack of global queries, global transaction mechanisms, and global concurrency control renders traditional constraint management techniques inapplicable. This paper presents a threefold contribution to integrity constraint checking in federated databases: (1) The problem of constraint checking in a federated database environment is clearly formulated. (2) A family of protocols for constraint checking is presented. (3) The differences across protocols in the family are analyzed with respect to system requirements, properties guaranteed by the protocols, and processing and communication costs. Thus, our work yields a suite of options from which a protocol can be chosen to suit the system capabilities and integrity requirements of a particular federated database |
URL | http://www-db.stanford.edu/pub/papers/coopis.ps |
---|---|
Title | Integrity Constraint Checking in Federated Databases |
Authors | P. Grefen , J. Widom |
Year | 1996 |
Citation | To appear in CoopIS '96; shortened proceedings version of cc-federated.ps |
Keywords | distributed database, heterogeneous database |
Abstract | A federated database is comprised of multiple interconnected databases that cooperate in an autonomous fashion. Global integrity constraints are very useful in federated databases, but the lack of global queries, global transaction mechanisms, and global concurrency control renders traditional constraint management techniques inapplicable. This paper presents a threefold contribution to integrity constraint checking in federated databases: (1) The problem of constraint checking in a federated database environment is clearly formulated. (2) A family of cooperative protocols for constraint checking is presented. (3) The differences across protocols in the family are analyzed with respect to system requirements, properties guaranteed, and costs involved. Thus, we provide a suite of options with protocols for various environments with specific system capabilities and integrity requirements. |
URL | http://www-db.stanford.edu/pub/papers/sigmod93-1.ps |
---|---|
Title | Local Verification of Global Integrity Constraints in Distributed Databases |
Authors | A. Gupta , J. Widom |
Year | 1993 |
Citation | Appeared in SIGMOD '93 |
Keywords | |
Abstract | We present an optimization for integrity constraint verification in distributed databases. The optimization allows a global constraint, i.e. a constraint spanning multiple databases, to be verified by accessing data at a single database, eliminating the cost of accessing remote data. The optimization is based on an algorithm that takes as input a global constraint and data to be inserted into a local database. The algorithm produces a local condition such that if the local data satisfies this condition then, based on the previous satisfaction of the global constraint, the global constraint is still satisfied. If the local data does not satisfy the condition, then a conventional global verification procedure is required. |
URL | http://www-db.stanford.edu/pub/papers/pods94.ps |
---|---|
Title | Constraint Checking with Partial Information |
Authors | A. Gupta , Y. Sagiv , J. Ullman , J. Widom |
Year | 1994 |
Citation | Appeared in PODS '94 |
Keywords | |
Abstract | Constraints are a valuable tool for managing information across multiple databases, as well as for general purposes of assuring data integrity. However, efficient implementation of constraint checking is difficult. In this paper we explore techniques for assuring constraint satisfaction without performing a complete evaluation of the constraints. We consider methods that use only constraint definitions, methods that use constraints and updates, and methods that use constraints, updates, and "local" data.
URL | http://www-db.stanford.edu/pub/papers/GP2.ps |
---|---|
Title | Generalized Projections: A Powerful Approach To Aggregation |
Authors | A. Gupta , V. Harinarayan , D. Quass |
Year | 1995 |
Citation | Unpublished manuscript. A full version of the paper appeared in VLDB '95.
Keywords | aggregation, data warehousing, materialized views, query optimization
Abstract | In this paper we introduce generalized projections (GPs), an extension of duplicate-eliminating projections, that capture aggregations, groupbys, conventional projection with duplicate elimination (distinct), and duplicate-preserving projections in a common unified framework. Using GPs we extend well-known and simple algorithms for SQL queries that use distinct projections to derive algorithms for queries using aggregations like sum, max, min, count, and avg. We develop powerful query rewrite rules for aggregate queries that unify and extend rewrite rules previously known in the literature. We then illustrate the power of our approach by solving a very practical and important problem in data warehousing: how to answer an aggregate query about base tables using materialized aggregate views (summary tables).
URL | http://www-db.stanford.edu/pub/papers/vldb.ps |
---|---|
Title | Aggregate-Query Processing in Data Warehousing Environments |
Authors | A. Gupta , V. Harinarayan , D. Quass |
Year | 1995 |
Citation | Appeared in VLDB'95 |
Keywords | |
Abstract | In this paper we introduce generalized projections (GPs), an extension of duplicate-eliminating projections, that capture aggregations, groupbys, duplicate-eliminating projections (distinct), and duplicate-preserving projections in a common unified framework. Using GPs we extend well-known and simple algorithms for SQL queries that use distinct projections to derive algorithms for queries using aggregations like sum, max, min, count, and avg. We develop powerful query rewrite rules for aggregate queries that unify and extend rewrite rules previously known in the literature. We then illustrate the power of our approach by solving a very practical and important problem in data warehousing: how to answer an aggregate query on base tables using materialized aggregate views (summary tables).
URL | http://www-db.stanford.edu/pub/papers/parc.ps |
---|---|
Title | Communication Efficient Matrix-Multiplication on Hypercubes. |
Authors | H. Gupta , P. Sadayappan |
Year | 1994 |
Citation | In Proceedings of the Sixth Annual ACM Symposium on Parallel Algorithms and Architectures, 1994 (SPAA '94). Extended version in Parallel Computing, 22 (1996) 75-99. |
Keywords | |
Abstract | Dense matrix multiplication is used in a variety of applications and is one of the core components in many scientific computations. The standard way of multiplying two matrices of size n × n requires O(n^3) floating point operations on a sequential machine. Since dense matrix multiplication is computationally expensive, the development of efficient algorithms for large distributed memory machines is of great interest. Matrix multiplication is a very regular computation and lends itself well to parallel implementation. One of the efficient approaches to design other parallel matrix or graph algorithms is to decompose them into a sequence of matrix multiplications [3, 9]. One of the earliest distributed algorithms proposed for matrix multiplication was by Cannon [2] in 1969 for 2-D meshes. Ho, Johnsson, and Edelman in [8] presented a variant of Cannon's algorithm which uses the full bandwidth of a 2-D grid embedded in a hypercube. Some other algorithms are by Dekel, Nassimi
URL | http://www-db.stanford.edu/pub/papers/joa.ps |
---|---|
Title | Constructing Piecewise Linear Homeomorphisms in Simple Polygons. |
Authors | H. Gupta , R. Wenger |
Year | 1995 |
Citation | To appear in Journal of Algorithms. |
Keywords | |
Abstract | Constructing Piecewise Linear Homeomorphisms of Simple Polygons. Himanshu Gupta, Stanford University, Stanford, CA 94305 (hgupta@cs.stanford.edu); work done while at The Ohio State University, Columbus, Ohio 43210. Rephael Wenger, Ohio State University, Columbus, OH 43210. Supported in part by NSA grant MDA904-93-H-3026. December 5, 1995.
URL | http://www-db.stanford.edu/pub/papers/ver_np.ps |
---|---|
Title | Result Verification Algorithms for Optimization Problems. |
Authors | H. Gupta |
Year | 1995 |
Citation | Technical Report, UIUC, 1995. |
Keywords | |
Abstract | In this article we discuss the design of result verification algorithms for optimization problems. In particular, we design time-optimal result verification algorithms which verify the solution of all-pairs shortest paths, maximum-flow in a network, and matching problems. We prove that polynomial-time verification algorithms for NP-complete problems do not exist, unless P = NP. Result verification problems for most of the NP-hard problems are not believed to be in NP. We also consider verification algorithms for approximation algorithms for NP-complete and NP-hard problems.
URL | http://www-db.stanford.edu/pub/papers/CubeIndex.ps |
---|---|
Title | Index Selection for OLAP |
Authors | H. Gupta , V. Harinarayan , A. Rajaraman , J. Ullman |
Year | 1996 |
Citation | Unpublished manuscript. |
Keywords | |
Abstract | On-line analytical processing (OLAP) is a recent and important application of database systems. Typically, OLAP data is presented as a multidimensional "data cube." OLAP queries are complex and can take many hours or even days to run, if executed directly on the raw data. The most common method of reducing execution time is to precompute some of the queries into summary tables (subcubes of the data cube) and then to build indexes on these summary tables. In most commercial OLAP systems today, the summary tables that are to be precomputed are picked first, followed by the selection of the appropriate indexes on them. A trial-and-error approach is used to divide the space available between the summary tables and the indexes. This two-step process can perform very poorly. Since both summary tables and indexes consume the same resource -- space -- their selection should be done together for the most efficient use of space. In this paper, we give algorithms that automate the selection of sum
URL | http://www-db.stanford.edu/pub/papers/CubeIndex.ps |
---|---|
Title | Index Selection for OLAP |
Authors | H. Gupta , V. Harinarayan , A. Rajaraman |
Year | |
Citation | Submitted.
Keywords | views, indexes, materialized views, selection algorithms
Abstract | On-line analytical processing (OLAP) is a recent and important application of database systems. Typically, OLAP data is presented as a multidimensional "data cube." OLAP queries are complex and can take many hours or even days to run, if executed directly on the raw data. The most common method of reducing execution time is to precompute some of the queries into summary tables (subcubes of the data cube) and then to build indexes on these summary tables. In most commercial OLAP systems today, the summary tables that are to be precomputed are picked first, followed by the selection of the appropriate indexes on them. A trial-and-error approach is used to divide the space available between the summary tables and the indexes. This two-step process can perform very poorly. Since both summary tables and indexes consume the same resource -- space -- their selection should be done together for the most efficient use of space. In this paper, we give algorithms that automate the selection of sum
URL | http://www-db.stanford.edu/pub/papers/garlic-debull.ps |
---|---|
Title | An Optimizer for Heterogeneous Systems with Non-Standard Data and Search Capabilities |
Authors | L. Haas , D. Kossmann , E. Wimmers , J. Yang |
Year | 1996 |
Citation | In Special Issue on Query Processing for Non-Standard Data, IEEE Data Engineering Bulletin, 19(4):37-43, December 1996. |
Keywords | |
Abstract | Much of the world's nonstandard data resides in specialized data sources. This data must often be queried together with data from other sources to give users the information they desire. Queries that can span several specialized sources with diverse search capabilities pose new challenges for query optimization. This paper describes the design and illustrates the use of a general purpose optimizer that can be taught about the capabilities of a new data source. Our optimizer produces plans of high quality even in the presence of nonstandard data, strange methods, and unusual query processing capabilities, and it is easy to add new sources of standard or nonstandard data.
URL | http://www-db.stanford.edu/pub/papers/garlic.ps |
---|---|
Title | Optimizing Queries across Diverse Data Sources |
Authors | L. Haas , D. Kossmann , E. Wimmers , J. Yang |
Year | 1997 |
Citation | Submitted for conference publication |
Keywords | |
Abstract | Businesses today rely on large collections of data stored in diverse systems with differing capabilities. Database middleware systems offer the possibility of inter-relating data from these various sources via a single high-level query interface. One key issue for these systems is how to process queries in an environment in which each source of data may have its own unique set of query processing capabilities. In this paper we present the design of a query optimizer for Garlic [C+95], a middleware system designed to integrate data from a broad range of data sources, with very different query capabilities. Garlic's optimizer extends the rule-based approach of Lohman [Loh88] to work in a heterogeneous environment, by defining generic rules, which produce specific plans using wrapper-provided rules that encapsulate the capabilities of a particular data source. This approach offers great advantages in terms of plan quality, extensibility to new sources, incremental implementation of rul
URL | http://www-db.stanford.edu/pub/papers/ijicis-93.ps |
---|---|
Title | An Approach to Resolving Semantic Heterogeneity in a Federation of Autonomous, Heterogeneous Database Systems |
Authors | J. Hammer , D. McLeod |
Year | 1993 |
Citation | In International Journal of Intelligent & Cooperative Information Systems, World Scientific, Vol. 2, Num. 1, 51-83, 1993, M. Papazoglou and T. Sellis, editors-in-chief. |
Keywords | Federated, heterogeneous, semantic, integration, resolution, lexicon |
Abstract | An approach to accommodating semantic heterogeneity in a federation of interoperable, autonomous, heterogeneous databases is presented. A mechanism is described for identifying and resolving semantic heterogeneity while at the same time honoring the autonomy of the database components that participate in the federation. A minimal, common data model is introduced as the basis for describing sharable information, and a three-pronged facility for determining the relationships between information units (objects) is developed. Our approach serves as a basis for the sharing of related concepts through (partial) schema unification without the need for a global view of the data that is stored in the different components. The mechanism presented here can be seen in contrast with more traditional approaches such as "integrated databases" or "distributed databases". An experimental prototype implementation has been constructed within the framework of the Remote-Exchange experimental system. Keyw |
URL | http://www-db.stanford.edu/pub/papers/dbta.ps |
---|---|
Title | Object Discovery and Unification in a Federated Database System |
Authors | J. Hammer , D. McLeod , A. Si |
Year | 1993 |
Citation | In Proceedings of the Workshop on Interoperability of Database Systems and Database Applications, pages 3-18. Swiss Information Society, University of Fribourg, Switzerland, October 1993. |
Keywords | Federated, semantic, integration, resolution, dictionary, broker |
Abstract |
URL | http://www-db.stanford.edu/pub/papers/hammer.thesis.ps |
---|---|
Title | Resolving Semantic Heterogeneity in a Federation of Autonomous, Heterogeneous Database Systems
Authors | J. Hammer |
Year | 1994 |
Citation | Technical Report USC-CS-94-569, Computer Science Department, University of Southern California, Los Angeles, CA 90089-0781, May 1994. Ph.D. thesis. |
Keywords | Federated, heterogeneous, semantic, resolution, lexicon, integration |
Abstract | Resolving Semantic Heterogeneity in a Federation of Autonomous, Heterogeneous Database Systems, by Joachim Hammer. A Dissertation Presented to the Faculty of the Graduate School, University of Southern California, in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy (Computer Science), August 1994. Copyright 1994 Joachim Hammer.
URL | http://www-db.stanford.edu/pub/papers/hicss.ps |
---|---|
Title | An Intelligent System for Identifying and Integrating Non-Local Objects in Federated Database Systems |
Authors | J. Hammer , D. McLeod , A. Si |
Year | 1994 |
Citation | In Proceedings of the 27th Hawaii International Conference on System Sciences, pages 389-407. Computer Society of the IEEE, University of Hawaii, Hawaii, USA, January 1994. |
Keywords | Federated, semantic, integration, resolution, dictionary, broker |
Abstract | Support for interoperability among autonomous, heterogeneous database systems is emerging as a key information management problem for the 1990s. A key challenge for achieving interoperability among multiple database systems is to provide capabilities that allow information units and resources to be flexibly and dynamically combined and interconnected while, at the same time, preserving the investment in and autonomy of each existing system. This research focuses on two key aspects: (1) how to discover the location and content of relevant, non-local information units, and (2) how to identify and resolve the semantic heterogeneity that exists between related information in different database components. We demonstrate and evaluate our approach using the Remote-Exchange experimental prototype system, which supports information sharing and exchange from the above perspective.
URL | http://www-db.stanford.edu/pub/papers/demo-final.ps |
---|---|
Title | Information Translation, Mediation, and Mosaic-Based Browsing in the TSIMMIS System |
Authors | J. Hammer , H. Garcia-Molina , K. Ireland , Y. Papakonstantinou , J. Ullman , J. Widom |
Year | 1995 |
Citation | In Exhibits Program, SIGMOD '95. |
Keywords | WWW, interface, browsing, graphical, prototype |
Abstract | The TSIMMIS project [1] provides an architecture and tools for accessing multiple heterogeneous information sources by translating source information into a common self-describing object model, called the Object Exchange Model (OEM) [2]. TSIMMIS provides integrated access to heterogeneous sources through a layer of source-specific translators as well as "intelligent" modules, called mediators. Translators (wrappers) convert queries over information in the common model (OEM) into requests the source can execute. The data returned by the source is converted back into the common model. Mediators are programs that collect information from one or more sources, process and combine it, and export it.
URL | http://www-db.stanford.edu/pub/papers/whips-overview.ps |
---|---|
Title | The Stanford Data Warehousing Project |
Authors | J. Hammer , H. Garcia-Molina , J. Widom , W. Labio , Y. Zhuge |
Year | 1995 |
Citation | In IEEE Data Engineering Bulletin, June 1995 |
Keywords | Warehouse, view, integration, source monitor, diff operator, |
Abstract | The goal of the data warehousing project at Stanford (the WHIPS project) is to develop algorithms and tools for the efficient collection and integration of information from heterogeneous and autonomous sources, including legacy sources. In this paper we give a brief overview of the WHIPS project, and we describe some of the research problems being addressed in the initial phase of the project.
URL | http://www-db.stanford.edu/pub/papers/chapter.ps |
---|---|
Title | On the Resolution of Representational Diversity in Multidatabase Systems |
Authors | J. Hammer , D. McLeod |
Year | 1995 |
Citation | Submitted for publication |
Keywords | Multidatabase system, resolution, semantic, integration, heterogeneity |
Abstract | A major problem that arises frequently when attempting to support information sharing among autonomous heterogeneous database systems is the occurrence of representational differences among related data objects in different component systems. A collection of cooperating but heterogeneous, autonomous component database systems may be termed a multidatabase system (MDBS) [LMR90], or a loosely-coupled federated database system [SL90]. Such cooperating systems attempt to support operations for partial and controlled sharing of data with minimal effect on existing applications. In this context, a key to supporting sharing involves folding non-local data into the schema of an importing (local) component as gracefully as possible.
URL | http://www-db.stanford.edu/pub/papers/rules-overview.ps |
---|---|
Title | An Overview of Production Rules in Database Systems |
Authors | E. Hanson , J. Widom |
Year | 1993 |
Citation | Appeared in The Knowledge Engineering Review, June 1993; published while at IBM Almaden |
Keywords | active database, survey |
Abstract | Database researchers have recognized that integrating a production rules facility into a database system provides a uniform mechanism for a number of advanced database features including integrity constraint enforcement, derived data maintenance, triggers, protection, version control, and others. In addition, a database system with rule processing capabilities provides a useful platform for large and efficient knowledge-base and expert systems. Database systems with production rules are referred to as active database systems, and the field of active database systems has indeed been active. This paper summarizes current work in active database systems and suggests future research directions. Topics covered include database rule languages, rule processing semantics, and implementation issues.
URL | http://www-db.stanford.edu/pub/papers/icdt.ps |
---|---|
Title | Optimization Using Tuple Subsumption. |
Authors | V. Harinarayan , A. Gupta |
Year | 1994 |
Citation | Appeared in ICDT 1995 |
Keywords | |
Abstract | A tuple t1 of relation R subsumes tuple t2 of R with respect to a query Q if, for every database, tuple t1 derives all, and possibly more, answers to Q than are derived by tuple t2. Therefore, the subsumed tuple t2 can be ignored with respect to Q in the presence of tuple t1 in relation R. This property finds use in a large number of problems. For instance: during query optimization, subsumed tuples can be ignored, thereby avoiding the computation of redundant answers; the size of cached information in distributed and object-oriented systems can be reduced by omitting subsumed tuples; constraints need not be checked and rules need not be recomputed when provably subsumed updates are made. We give algorithms for deciding efficiently when a tuple subsumes another tuple for queries that use arbitrary mathematical functions. We characterize queries for which, whenever a set of tuples T subsumes a tuple t, one of the tuples in T also subsumes t, yielding efficiently verifiable cases
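The subsumption idea above can be made concrete with a small, hedged sketch; the query family, attribute names, and pruning routine below are illustrative assumptions, not the paper's algorithms for arbitrary mathematical functions:

```python
# Toy illustration of tuple subsumption: for the parametrized query family
# Q_T = "return t['a'] for tuples with t['b'] >= T", tuple t1 subsumes t2
# exactly when, for every threshold T, t1 yields every answer t2 yields:
# same projected value and a 'b' value at least as large.

def subsumes(t1, t2):
    return t1["a"] == t2["a"] and t1["b"] >= t2["b"]

def prune(relation):
    """Drop tuples subsumed by another tuple; Q_T's answers are unchanged."""
    kept = []
    for i, t in enumerate(relation):
        # a tuple is dropped if some other tuple strictly subsumes it,
        # or an equal-valued earlier tuple already represents it
        dominated = any(
            subsumes(u, t) and (not subsumes(t, u) or j < i)
            for j, u in enumerate(relation) if j != i)
        if not dominated:
            kept.append(t)
    return kept

r = [{"a": 1, "b": 5}, {"a": 1, "b": 3}, {"a": 2, "b": 4}]
print(prune(r))  # → [{'a': 1, 'b': 5}, {'a': 2, 'b': 4}]
```

The pruned relation gives the same answers as `r` for every value of the threshold T, which is the property that lets subsumed tuples be dropped from caches and intermediate results.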
URL | http://www-db.stanford.edu/pub/papers/GP.ps |
---|---|
Title | Generalized Projections: A Powerful Query-Optimization Technique. |
Authors | V. Harinarayan , A. Gupta |
Year | 1994 |
Citation | Technical Note: STAN-CS-TN-94-14 |
Keywords | |
Abstract |
URL | http://www-db.stanford.edu/pub/papers/cube.ps |
---|---|
Title | Implementing Data Cubes Efficiently |
Authors | V. Harinarayan , A. Rajaraman , J. Ullman |
Year | 1995 |
Citation | Appeared in SIGMOD'96. Best Paper Award. |
Keywords | |
Abstract | Decision support applications involve complex queries on very large databases. Since response times should be small, query optimization is critical. Users typically view the data as multidimensional data cubes. Each cell of the data cube is a view consisting of an aggregation of interest, like total sales. The values of many of these cells are dependent on the values of other cells in the data cube. A common and powerful query optimization technique is to materialize some or all of these cells rather than compute them from raw data each time. Commercial systems differ mainly in their approach to materializing the data cube. In this paper, we investigate the issue of which cells (views) to materialize when it is too expensive to materialize all views. A lattice framework is used to express dependencies among views. We then present greedy algorithms that work off this lattice and determine a good set of views to materialize. The greedy algorithm performs within a small constant factor of optimal
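The lattice-based greedy selection described in this abstract can be sketched as follows; the view names, sizes, and coverage relation are made-up example data, and the benefit computation follows the standard greedy formulation rather than the paper's exact pseudocode:

```python
# Hedged sketch of greedy view selection over a dependency lattice:
# repeatedly materialize the view with the largest total benefit, where
# answering a query costs the size of the smallest materialized view that
# covers it. All sizes and the 'covers' relation below are invented.

def greedy_select(sizes, covers, k, top):
    """Pick up to k extra views to materialize, starting from the top view."""
    materialized = {top}

    def cost(w):
        # cheapest materialized view whose cells can answer query w
        return min(sizes[v] for v in materialized if w in covers[v])

    for _ in range(k):
        best, best_benefit = None, 0
        for v in sizes:
            if v in materialized:
                continue
            # benefit: total cost reduction over all queries v can answer
            benefit = sum(max(cost(w) - sizes[v], 0)
                          for w in sizes if w in covers[v])
            if benefit > best_benefit:
                best, best_benefit = v, benefit
        if best is None:
            break  # no remaining view helps
        materialized.add(best)
    return materialized

# toy 3-attribute cube lattice; 'none' is the grand total
sizes = {"abc": 100, "ab": 50, "a": 20, "b": 30, "none": 1}
covers = {"abc": {"abc", "ab", "a", "b", "none"},
          "ab": {"ab", "a", "b", "none"},
          "a": {"a", "none"},
          "b": {"b", "none"},
          "none": {"none"}}
print(sorted(greedy_select(sizes, covers, 2, "abc")))  # → ['a', 'ab', 'abc']
```

In the first round `ab` wins (it halves the cost of four queries); in the second, `a` beats `b` and `none` because it saves more total cost given that `ab` is already materialized.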
URL | http://www-db.stanford.edu/pub/papers/fdvsm.ps |
---|---|
Title | Exploiting Dependencies to Enhance View Self-Maintainability |
Authors | N. Huyn |
Year | 1996 |
Citation | Detailed version of a paper submitted for publication. |
Keywords | |
Abstract | View self-maintenance (SM) is the process of incrementally refreshing a materialized view using the view instance and the update to some base relation, but without examining any of the base relations, or using only a specified subset of them. A new problem is created when base data use is limited in view maintenance: can the view be maintained at all? When a base relation is not available for use in maintaining a view, the next-best form of knowledge that may be available is the integrity constraints the relation is known to satisfy. Constraints are often available in practice, yet very little is known about how they can be exploited fully to enhance view self-maintainability. The problem becomes even harder when the SM tests are required to be in some efficient query form, e.g., SQL queries. In this paper, we focus on the problem of determining view SM in the presence of functional dependencies. We first show that SM of a conjunctive-query (CQ) view can be reduced to a problem of
URL | http://www-db.stanford.edu/pub/papers/cqvsm.ps |
---|---|
Title | Efficient View Self-Maintenance |
Authors | N. Huyn |
Year | 1996 |
Citation | Appeared in Views 96 Workshop. |
Keywords | |
Abstract | We consider the problem of maintaining a materialized view without accessing the base relations. More specifically, we would like to find a maximal test that guarantees that a view is self-maintainable (abbrev. SM) under a given update to the base relations, i.e., can be maintained using only the view definition, its contents, and the update. We observe that SM evaluation can be separated into a view-definition-time portion, where a maximal test is generated solely from the view definition, and an update-time portion, where the test can be efficiently applied to the view and the update. We call such a maximal test a Complete Test for View Self-Maintainability (abbrev. CTSM). This paper reports on some interesting new results for conjunctive-query views under insertion updates: 1) the CTSM's are extremely simple queries that look for certain tuples in the view to be maintained; 2) these CTSM's can be generated at view definition time using a very simple algorithm based on the concept of Minimal
URL | http://www-db.stanford.edu/pub/papers/testing-cqcn.ps |
---|---|
Title | Testing CQCN constraints Under Limited Data Access |
Authors | N. Huyn |
Year | 1996 |
Citation | Technical Report. |
Keywords | |
Abstract |
URL | http://www-db.stanford.edu/pub/papers/cqvsm-tr.ps |
---|---|
Title | Efficient View Self-Maintenance, A TECHNICAL REPORT |
Authors | N. Huyn |
Year | 1996 |
Citation | Technical report. Detailed version of a paper that appeared in Views 96 Workshop. |
Keywords | |
Abstract | We revisit the problem of finding a maximal test that guarantees that a view can be maintained using only the view, its definition, and the update, regardless of the actual database instance. These maximal tests, which we call Complete Tests for View Self-Maintainability (abbrev. CTSM), are also known in the literature as necessary and sufficient conditions for Conditionally Autonomously Computable Updates ([TB88]). This paper focuses on finding an efficient evaluation of self-maintainability (abbrev. SM). Our working hypothesis is that, at least for conjunctive-query (abbrev. CQ) views (i.e., select-project-join views with only equality conditions), SM evaluation can be separated into a view-definition-time portion, where a maximal test is generated solely from the view definition, and an update-time portion, where the test is actually applied to the view and the update. We also conjecture that these tests can be expressed as queries over the view in some traditional query language. The obvious practical significance would
URL | http://www-db.stanford.edu/pub/papers/cqcnclt.ps |
---|---|
Title | Efficient Complete Local Tests for Conjunctive Query Constraints with Negation |
Authors | N. Huyn |
Year | 1996 |
Citation | To appear in Proceedings of ICDT '97, Springer-Verlag. |
Keywords | |
Abstract | We consider the problem of incrementally checking global integrity constraints without using all the relations under constraint. In many application areas, such as collaborative design, mobile computing, and enterprise information systems, total data availability cannot be assumed. Even if all base data is available, some of it may incur such a high cost that its use should only be considered as a last resort. Without looking at all the base data, how can one meaningfully check a constraint for violation? When the constraint is known to be satisfied prior to the update, the state of the relations that are available (aka local) can in principle be used to infer something about the relations that are not available (aka non-local). This observation is the basis for the existence of tests that guarantee that data integrity is preserved under a given update, without looking at all the base data. In order to make integrity maintenance practical, the challenge is to find those tests that are most general (we
URL | http://www-db.stanford.edu/pub/papers/cqcnclt-tr.ps |
---|---|
Title | Testing CQC^\neg constraints under limited data access. |
Authors | N. Huyn |
Year | 1996 |
Citation | Technical report. Detailed version of a paper to appear in Proceedings of ICDT '97, Springer-Verlag. |
Keywords | |
Abstract | In many important application areas such as collaborative design, mobile computing, and enterprise information systems, total data accessibility cannot be assumed. Even if all data can be accessed, some of it may incur such a high cost that its access should only be considered as a last resort. Efficiently checking whether a local update violates a global integrity constraint spanning several relations poses new challenges when not all of these relations are to be looked at. Without access to all relations under constraint, how can one meaningfully check a constraint for violation? When the constraint is known to be satisfied prior to the update, the state of the accessible relations can in principle be used to infer something about the inaccessible relations. Under limited data accessibility, the "complete local test" (or CLT for short) represents the best strategy for evaluating whether the update can potentially violate the constraint. This paper addresses the problem of finding CLT's
URL | http://www-db.stanford.edu/pub/papers/qxlate-pmap.ps |
---|---|
Title | Query Reformulation under Incomplete Mappings |
Authors | N. Huyn |
Year | 1996 |
Citation | Computer Science Technical Report STAN-CS-TR-96-1576. |
Keywords | |
Abstract | This paper focuses on some of the important new translatability issues that arise in the problem of interoperation between two database schemas when mappings between these schemas are inherently more complex than traditional views or pure Datalog programs can capture. In many cases, sources cannot be redesigned, and mappings among them exhibit some form of incompleteness under which the question of whether a query can be translated across different schemas is not immediately obvious. The notion of query we consider here is the traditional one, in which the answers to a query are required to be definite: answers cannot be disjunctive or conditional and must refer only to domain constants. In this paper, mappings are modeled by Horn programs that allow existential variables, and queries are modeled by pure Datalog programs. We then consider the problem of eliminating functional terms from the answers to a Horn query where function symbols are allowed. We identify a class of Horn queries called
URL | http://www-db.stanford.edu/pub/papers/aggress-view-use.ps |
---|---|
Title | A More Aggressive Use Of Views To Extract Information |
Authors | N. Huyn |
Year | 1996 |
Citation | Computer Science Technical Report STAN-CS-TR-96-1577. |
Keywords | |
Abstract | Much recent work has focused on using views to evaluate queries. More specifically, queries are rewritten to refer to views instead of the base relations over which the queries were originally written. The motivation is that the views represent the only ways in which some information source may be accessed. Another use of views that has been overlooked becomes important especially when no equivalent rewriting of a query in terms of views is possible: even though we cannot use the views to get all the answers to the query, we can still use them to deduce as many answers as possible. In many global information applications, the notion of equivalence used is often too restrictive. We propose a notion of pseudo-equivalence that allows more queries to be rewritten usefully: we show that if a query has an equivalent rewriting, the query also has a pseudo-equivalent rewriting. The converse is not true in general. In particular, when the views are conjunctive, we show that all Datalog queries over
URL | http://www-db.stanford.edu/pub/papers/zoo.ps |
---|---|
Title | Desktop Experiment Management |
Authors | Y. Ioannidis , M. Livny , E. Haber , R. Miller , O. Tsatalos , J. Wiener |
Year | 1993 |
Citation | Appeared in IEEE Data Engineering Bulletin, 16(1), March 1993. |
Keywords | |
Abstract | Traditionally, the scale and scope of an experimental study was determined by the ability of the research team to generate data. Over the past few years, we have been experiencing an unprecedented increase in the ability of small teams of experimental scientists to generate data. This has led to a shift in the balance between the different components of an experimental study. Today, it is not uncommon to find a study whose scale and scope have been determined by the ability of the team to manage the data rather than to generate it. Whether the discipline is experimental computer science [4], genetics, earth and space sciences, soil sciences, or high-energy physics, scientists are faced in their daily work with an experiment and data management
URL | http://www-db.stanford.edu/pub/papers/gp.ps |
---|---|
Title | Cracking and Co-evolving Randomizers |
Authors | J. Jannink |
Year | 1994 |
Citation | in Advances in Genetic Programming; Ed. Kenneth E. Kinnear, chapter 20, MIT Press, 1994 |
Keywords | Genetic Programming, Randomizers, Co-evolution |
Abstract |
URL | http://www-db.stanford.edu/pub/papers/btree.ps |
---|---|
Title | Implementing Deletion in B+-trees |
Authors | J. Jannink |
Year | 1995 |
Citation | SIGMOD RECORD, v.24, n.1, p.33-38, 1995 |
Keywords | B+tree, Deletion, Pseudo-code |
Abstract | This paper describes algorithms for key deletion in B+-trees. There are published algorithms and pseudo-code for searching and inserting keys, but deletion, due to its greater complexity and perceived lesser importance, is glossed over completely or left as an exercise to the reader. To remedy this situation, we provide a well-documented flowchart, algorithm, and pseudo-code for deletion, their relation to search and insertion algorithms, and a reference to a freely available, complete B+-tree library written in the C programming language.
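As a rough illustration of the deletion rebalancing this paper documents, here is a hedged sketch restricted to a two-level tree (a root of separator keys over sorted leaf lists); a real B+-tree recurses up the tree, and the referenced library is in C, so the layout and constants below are illustrative assumptions only:

```python
# Two-level B+-tree deletion sketch: remove the key, then restore minimum
# leaf occupancy by borrowing from a sibling, or merging if both siblings
# are at minimum.

ORDER = 4              # max keys per leaf (illustrative choice)
MIN_KEYS = ORDER // 2  # minimum occupancy before rebalancing

def delete(root_keys, leaves, key):
    # 1. locate the leaf that would hold the key
    i = 0
    while i < len(root_keys) and key >= root_keys[i]:
        i += 1
    leaf = leaves[i]
    if key in leaf:
        leaf.remove(key)
    if len(leaf) >= MIN_KEYS or len(leaves) == 1:
        return  # occupancy still legal, nothing to rebalance
    # 2. try to borrow a key from an adjacent sibling
    if i > 0 and len(leaves[i - 1]) > MIN_KEYS:
        leaf.insert(0, leaves[i - 1].pop())   # take left sibling's largest
        root_keys[i - 1] = leaf[0]            # fix the separator key
    elif i + 1 < len(leaves) and len(leaves[i + 1]) > MIN_KEYS:
        leaf.append(leaves[i + 1].pop(0))     # take right sibling's smallest
        root_keys[i] = leaves[i + 1][0]
    else:
        # 3. otherwise merge with a sibling and drop one separator key
        j = i - 1 if i > 0 else i
        leaves[j] += leaves[j + 1]
        del leaves[j + 1]
        del root_keys[j]

root, leaves = [5, 9], [[1, 2], [5, 6], [9, 10]]
delete(root, leaves, 6)        # leaf underflows; no sibling can lend, so merge
print(root, leaves)            # → [9] [[1, 2, 5], [9, 10]]
```

The same call borrows instead of merging when a sibling is above minimum occupancy, which is the case the paper notes is usually glossed over.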
URL | http://www-db.stanford.edu/pub/papers/profile.ps |
---|---|
Title | Data Management for User Profiles in Wireless Communication Systems |
Authors | J. Jannink , D. Lam , N. Shivakumar , J. Widom , D. Cox |
Year | 1995 |
Citation | Technical report |
Keywords | Mobile Communications, User Profile, Location Management, LMT, Hierarchical DB organization, Simulation |
Abstract | The explosive growth in wireless communications systems and the demand for advanced mobility features have created novel data management problems. Current schemes to address these problems rely on database organizations that have limited functionality and performance anomalies. We propose a new data management scheme that is flexible and scalable, and that is incrementally deployable so it can coexist with current data management standards. We compare our protocol against current standards and other suggested protocols using realistic calling and mobility patterns. To do so, we have built Pleiades, an extensible event-driven simulator that is easily configurable to arbitrary geographies, networks, calling and mobility patterns, and data management schemes. We present, for the first time, models to closely approximate user calling and mobility patterns. These models are validated against real call traffic and urban vehicle data. Based on simulations for a representative 24-hour period in
URL | http://www-db.stanford.edu/pub/papers/hiper-conf.ps |
---|---|
Title | Efficient and Flexible Location Management Techniques for Wireless Communication Systems |
Authors | J. Jannink , D. Lam , N. Shivakumar , J. Widom , D. Cox |
Year | 1996 |
Citation | Proceedings of 2nd ACM/IEEE International Conference on Mobile Computing and Networking (MOBICOM'96), White Plains, New York, November 1996 |
Keywords | Mobile Communications, User Profile, Location Management, LMT, Hierarchical DB organization, HiPeR family, Simulation |
Abstract | We consider the problem of managing the information required to locate users in a wireless communication system, with a focus on designing and evaluating location management techniques that are efficient, scalable, and flexible. The three key contributions of this paper are: (1) a family of location management techniques, HiPER (for Hierarchical ProfilE Replication), that efficiently provide lifelong (non-geographic) numbering with fast location lookup; (2) Pleiades, a scalable event-driven wireless system simulator with realistic calling and mobility patterns derived from several months of real traffic traces; and (3) multi-day simulations comparing our proposed location management techniques with current and previously proposed techniques on a realistic geographical and network topology.
URL | http://www-db.stanford.edu/pub/papers/hiper-journ.ps |
---|---|
Title | Efficient and Flexible Location Management Techniques for Wireless Communication Systems |
Authors | J. Jannink , D. Lam , N. Shivakumar , J. Widom , D. Cox |
Year | 1996 |
Citation | Submitted for journal publication |
Keywords | Mobile Communications, User Profile, Location Management, LMT, Hierarchical DB organization, HiPeR family, Simulation |
Abstract | We consider the problem of managing the information required to locate users in a wireless communication system, with a focus on designing and evaluating location management techniques that are efficient, scalable, and flexible. The three key contributions of this paper are: (1) a family of location management techniques, HiPER (for Hierarchical ProfilE Replication), that efficiently provide life-long (non-geographic) numbering with fast location lookup; (2) Pleiades, a scalable event-driven wireless system simulator with realistic calling and mobility patterns derived from several months of real traffic traces; and (3) multi-day simulations comparing our proposed location management techniques with current and previously proposed techniques on a realistic geographical and network topology. Research supported by the Center for Telecommunications and the Center for Integrated Systems at Stanford University, and by equipment grants from Digital and IBM Corporations.
URL | http://www-db.stanford.edu/pub/papers/rtdb.ps |
---|---|
Title | An Overview of Real-Time Database Systems |
Authors | B. Kao , H. Garcia-Molina |
Year | 1993 |
Citation | To appear in the proceedings of the NATO Advanced Study Institute on Real-Time Computing, St. Maarten, Netherlands Antilles, October 9, 1992, Springer-Verlag.
Keywords | |
Abstract | Traditionally, real-time systems manage their data (e.g., chamber temperature, aircraft locations) in application-dependent structures. As real-time systems evolve, their applications become more complex and require access to more data. It thus becomes necessary to manage the data in a systematic and organized fashion. Database management systems provide tools for such organization, so in recent years there has been interest in "merging" database and real-time technology. The resulting integrated system, which provides database operations with real-time constraints, is generally called a real-time database system (RTDBS) [1]. Like a conventional database system, an RTDBS functions as a repository of data, provides efficient storage, and performs retrieval and manipulation of information. However, as
URL | http://www-db.stanford.edu/pub/papers/sda1.ps |
---|---|
Title | Deadline Assignment in a Distributed Soft Real-Time System |
Authors | B. Kao , H. Garcia-Molina |
Year | 1993 |
Citation | In the proceedings of the 13th International Conference on Distributed Computing Systems, 1993.
Keywords | |
Abstract | In a distributed environment, tasks often have processing demands on multiple different sites. A distributed task is usually divided up into several subtasks, each one to be executed at some site in order. In a real-time system, an overall deadline is usually specified by an application designer indicating when a distributed task is to be finished. However, the problem of how a global deadline is automatically translated to the deadline of each individual subtask has not been well studied. This paper examines (through simulations) four strategies for subtask deadline assignment in a distributed soft real-time environment. Keywords: soft real-time, distributed systems, deadline assignment, scheduling. |
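One plausible assignment strategy of the kind this paper simulates can be sketched as follows; the proportional-slack rule and all names here are illustrative assumptions, not necessarily one of the four strategies the paper examines:

```python
# Hedged sketch of subtask deadline assignment for serial subtasks:
# distribute the global task's slack (deadline minus total expected work)
# among subtasks in proportion to their expected execution times.

def proportional_deadlines(arrival, global_deadline, exec_times):
    """Return an absolute deadline for each serially executed subtask."""
    total = sum(exec_times)
    slack = global_deadline - arrival - total  # spare time to distribute
    deadlines, t = [], arrival
    for e in exec_times:
        t += e + slack * e / total  # each subtask gets slack pro rata
        deadlines.append(t)
    return deadlines

# global deadline 20, three subtasks with expected times 2, 3, 5
print(proportional_deadlines(0, 20, [2, 3, 5]))  # → [4.0, 10.0, 20.0]
```

By construction the last subtask's deadline equals the global deadline, so meeting every intermediate deadline guarantees the global task finishes on time.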
URL | http://www-db.stanford.edu/pub/papers/dual_net.ps |
---|---|
Title | Soft Real-Time Communication Over Dual Non-Real-Time Networks |
Authors | B. Kao , H. Garcia-Molina |
Year | 1993 |
Citation | in the proceedings of the IEEE 1st International Workshop on Parallel and Distributed Real-Time Systems (1993). |
Keywords | |
Abstract | In this paper we consider systems with redundant communication paths, and show how applications can exploit the redundancy to improve real-time communications. We consider two policies, one that evenly distributes load, and one that partitions load according to packet slackness. We evaluate the effectiveness of these policies through analysis and simulation. |
URL | http://www-db.stanford.edu/pub/papers/sda2.ps |
---|---|
Title | Subtask Deadline Assignment for Complex Distributed Soft Real-Time Tasks |
Authors | B. Kao , H. Garcia-Molina |
Year | 1993 |
Citation | Technical Report STAN-CS-93-1491, Stanford University, October 1993.
Keywords | |
Abstract | Complex distributed tasks often involve parallel execution of subtasks at different nodes. To meet the deadline of a global task, all of its parallel subtasks have to be finished on time. Compared to a local task (which involves execution at only one node), a global task may have a much harder time making its deadline, because it is fairly likely that at least one of its subtasks runs into an overloaded node. Another problem with complex distributed tasks occurs when a global task consists of a number of serially executing subtasks. In this case, we have the problem of dividing up the end-to-end deadline of the global task and assigning the pieces to the intermediate subtasks. In this paper, we study both of these problems. Different algorithms for assigning deadlines to subtasks are presented and evaluated. Keywords: soft real-time, distributed systems, parallel systems, deadline assignment, scheduling.
URL | http://www-db.stanford.edu/pub/papers/dist_srt.ps |
---|---|
Title | On Building Distributed Soft Real-Time Systems |
Authors | B. Kao , H. Garcia-Molina , B. Adelberg |
Year | 1994 |
Citation | |
Keywords | |
Abstract | When building a distributed real-time system, one can either build the whole system from scratch, or from pre-existing standard components. Although the former allows better scheduling design, it is not economical in terms of the cost and time of development. This paper studies the performance of distributed soft real-time systems that use standard components with various scheduling algorithms and suggests ways to improve them. Keywords: soft real-time, distributed systems, deadline assignment, priority assignment, scheduling. |
URL | http://www-db.stanford.edu/pub/papers/view-cc-theory.ps |
---|---|
Title | Concurrency Control Theory for Deferred Materialized Views |
Authors | A. Kawaguchi , D. Lieuwen , I. Mumick , D. Quass , K. Ross |
Year | 1996 |
Citation | To appear in ICDT'97 |
Keywords | |
Abstract | We consider concurrency control problems that arise in the presence of materialized views. Consider a database system supporting materialized views to speed up queries. For a range of important applications (e.g., banking, billing, network management), transactions that access materialized views would like to get some consistency guarantees: if a transaction reads a base relation after an update, and then reads a materialized view derived from the base relation, it expects to see the effect of the base update on the materialized view. If a transaction reads two views, it expects that the two views reflect a single consistent database state. Such guarantees are not easy to obtain, as materialized views become inconsistent upon updates to base relations. Immediate maintenance reestablishes consistency within the transaction that updates the base relation, but this consistency comes at the cost of delaying update transactions. Deferred maintenance has been proposed to avoid penalizing update
URL | http://www-db.stanford.edu/pub/papers/distributed-transactions.ps |
---|---|
Title | Making Trust Explicit in Distributed Commerce Transactions |
Authors | S. Ketchpel , H. Garcia-Molina |
Year | 1996 |
Citation | Appeared in 16th International Conference on Distributed Computing Systems (DCS96), pp. 270-281 |
Keywords | Distributed commerce transactions, interaction graph, sequencing graph, trust, indemnities |
Abstract | In a distributed environment where nodes are independently motivated, many transactions or commercial exchanges may be stymied due to a lack of trust between the participants. The addition of trusted intermediaries may facilitate some exchanges, but others are still problematic. We introduce a language for specifying these commercial exchange problems, and sequencing graphs, a formalism for determining whether a given exchange may occur. We also present an algorithm for generating a feasible execution sequence of pairwise exchanges between parties (when one exists). Indemnities may be offered to facilitate previously infeasible transactions. We show when and how they enable commercial transactions. |
URL | http://www-db.stanford.edu/pub/papers/UPAI.ps |
---|---|
Title | U-PAI: A Universal Payment Application Interface |
Authors | S. Ketchpel , H. Garcia-Molina , A. Paepcke , S. Hassan , S. Cousins |
Year | 1996 |
Citation | Appeared in The Second USENIX Workshop on Electronic Commerce Proceedings. 1996. p. 105 - 121. |
Keywords | payment mechanism, interoperation, AccountHandle, API |
Abstract | The progress of electronic commerce has been stymied by the lack of widely accepted network payment mechanisms. A number of proposals have been put forward, and each one offers a slightly different protocol and set of features. Yet none has achieved the critical mass to become an accepted standard. We believe that there will continue to be a variety of payment mechanisms, so in this paper we propose U-PAI, a universal payment application interface that will enable a programmer to write for one interface, and then interact with any payment mechanism. Each payment mechanism can support the universal API directly, or a proxy or wrapper can be built to translate U-PAI calls to the appropriate native calls supported by the payment mechanisms. In this paper we illustrate how two such proxies could be built. We also provide, in the appendix, a full CORBA specification of U-PAI. |
URL | http://www-db.stanford.edu/pub/papers/aaai97.ps |
---|---|
Title | Distributed Commerce Transactions |
Authors | S. Ketchpel , H. Garcia-Molina |
Year | 1997 |
Citation | Tech Report. 1997. |
Keywords | Distributed Commerce Transactions, multi-agent systems, safe execution sequences, trusted intermediaries |
Abstract | In situations where self-interested agents are interacting in an environment of distrust, commercial exchanges may be blocked due to a lack of trust. We propose a fully distributed algorithm that each agent may run to provide guarantees about the outcomes of such exchanges. The algorithm is shown in operation on two examples, one feasible and one not. Introduction In a multi-agent system with many information sources covering a vast range of topics, an agent with a query or information request might enlist the help of a number of brokers and sources in answering the query. These other agents may add value by combining results, or having better knowledge about or access to relevant sources. Information brokers may obtain documents from other sources and re-sell them to customers, who may in turn be brokering the documents to someone else. In exchange for provided services, these agents expect payment. The environment is one of distrust, so that a customer will not give payment before b |
URL | http://www-db.stanford.edu/pub/papers/ijcai97.ps |
---|---|
Title | Distributed Commerce Transactions with Timing Deadlines and Direct Trust |
Authors | S. Ketchpel , H. Garcia-Molina |
Year | 1997 |
Citation | Tech Report. 1997. |
Keywords | Distributed Commerce Transactions, multi-agent systems, deadlines, direct trust |
Abstract | In a multi-party transaction such as fulfilling an information request from multiple sources (also called a distributed commerce transaction), agents face risks from dealing with untrusted agents. These risks are compounded in the face of deadlines, e.g., an agent may fail to deliver purchased goods by the time the goods are needed. We present a distributed algorithm that mitigates these risks, showing that it is sound (produces only safe multi-agent action sequences) and complete (finds a safe sequence whenever one exists). We also show how the algorithm may be extended so that agents may interact directly with other participants rather than through a trusted intermediary. Introduction The explosion of networked information sources leads to the sense that the answer to any question is out there, if only one has access to the right combination of sources. Search engines and information brokers help navigate the space, yet adding information services and sources which require payment exposes a number of hazards. |
URL | http://www-db.stanford.edu/pub/papers/shoppingmodels.ps |
---|---|
Title | Shopping Models: A Flexible Architecture for Information Commerce |
Authors | S. Ketchpel , H. Garcia-Molina , A. Paepcke |
Year | 1997 |
Citation | Tech Report. 1997. Stanford Digital Libraries Project Working Paper SIDL-WP-1996-0052. |
Keywords | Shopping Models, electronic commerce, subscription, pay-per-view, Payment Monitor, Delivery Monitor, Order Monitor, U-PAI |
Abstract | In a digital library, there are many different interaction models between customers and information providers or merchants. Subscriptions, sessions, pay-per-view, shareware, and pre-paid vouchers are different models that each have different properties. A single merchant may use several of them. Yet if a merchant wants to support multiple models, there is a substantial amount of work to implement each one. In this paper, we formalize the shopping models which represent these different modes of consumer to merchant interaction. In addition to developing the overall architecture, we define the application program interfaces (API) to interact with the models. We show how a small number of primitives can be used to construct a wide range of shopping models that a digital library can support, and provide examples of the shopping models in operation, demonstrating their flexibility. |
URL | http://www-db.stanford.edu/pub/papers/view.ps |
---|---|
Title | Physical Database Design for Data Warehousing |
Authors | W. Labio , D. Quass , B. Adelberg |
Year | 1996 |
Citation | Technical Note |
Keywords | |
Abstract | Data warehouses collect copies of information from remote sources into a single database. Since the remote data is cached at the warehouse, it appears as local relations to the users of the warehouse. To improve query response time, the warehouse administrator (WHA) will often materialize views defined on the local relations to support common or complicated queries. Unfortunately, the requirement to keep the views consistent with the local relations creates additional overhead when the remote sources change. The warehouse is often kept only loosely consistent with the sources: it is periodically refreshed with changes sent from the source. When this happens, the warehouse is taken off-line until the local relations and materialized views can be updated. Clearly, the users would prefer as little down time as possible. Often the down time can be reduced by adding carefully selected materialized views or indexes to the physical schema. This paper studies how to select the sets of supporting views |
URL | http://www-db.stanford.edu/pub/papers/window-short.ps |
---|---|
Title | Efficient Snapshot Differential Algorithms in Data Warehousing |
Authors | W. Labio , H. Garcia-Molina |
Year | 1996 |
Citation | VLDB, 1996 |
Keywords | |
Abstract | Detecting and extracting modifications from information sources is an integral part of data warehousing. For unsophisticated sources, it is often necessary to infer modifications by periodically comparing snapshots of data from the source. Although this snapshot differential problem is closely related to traditional joins, there are significant differences, which lead to simple new algorithms. In particular, we present algorithms that perform compression of records. We also present a window algorithm that works very well if the snapshots are not "very different." The algorithms are studied via analysis and an implementation of two of them; the results illustrate the potential gains achievable with the new algorithms. |
URL | http://www-db.stanford.edu/pub/papers/window.ps |
---|---|
Title | Efficient Snapshot Differential Algorithms in Data Warehousing |
Authors | W. Labio , H. Garcia-Molina |
Year | 1996 |
Citation | Technical Note |
Keywords | |
Abstract | Detecting and extracting modifications from information sources is an integral part of data warehousing. For unsophisticated sources, in practice it is often necessary to infer modifications by periodically comparing snapshots of data from the source. Although this snapshot differential problem is closely related to traditional joins and outerjoins, there are significant differences, which lead to simple new algorithms. In particular, we present algorithms that perform (possibly lossy) compression of records. We also present a window algorithm that works very well if the snapshots are not "very different." The algorithms are studied via analysis and an implementation of two of them; the results illustrate the potential gains achievable with the new algorithms. |
URL | http://www-db.stanford.edu/pub/papers/expire.ps |
---|---|
Title | Expiring Data from a Warehouse |
Authors | W. Labio , H. Garcia-Molina |
Year | 1997 |
Citation | Technical Report |
Keywords | Data Warehouse, Storage Management, View Maintenance |
Abstract | Data warehouses are used to collect and analyze data from remote sources. The data collected often originate from transactional information and can become very large. This paper presents a framework for incrementally removing warehouse data (without a need to fully recompute), offering two choices. One is to expunge data, in which case the result is as if the data had never existed. The second is to expire data, in which case views defined over the data are not necessarily affected. Using the framework, a user or administrator can specify what data to expire or expunge, what auxiliary data is to be kept for facilitating incremental view maintenance, what type of updates are expected from external sources, and how the system should compensate when data is expired or other parameters changed. We present algorithms for the various expiration and compensation actions, and we show how our framework can be implemented on top of a conventional RDBMS. |
URL | http://www-db.stanford.edu/pub/papers/view-short.ps |
---|---|
Title | Physical Database Design in Data Warehousing |
Authors | W. Labio , D. Quass , B. Adelberg |
Year | 1997 |
Citation | ICDE |
Keywords | Data Warehousing, Views, Physical Database Design |
Abstract | Data warehouses collect copies of information from remote sources into a single database. Since the remote data is cached at the warehouse, it appears as local relations to the users of the warehouse. To improve query response time, the warehouse administrator will often materialize views defined on the local relations to support common or complicated queries. Unfortunately, the requirement to keep the views consistent with the local relations creates additional overhead when the remote sources change. The warehouse is often kept only loosely consistent with the sources: it is periodically refreshed with changes sent from the source. When this happens, the warehouse is taken off-line until the local relations and materialized views can be updated. Clearly, the users would prefer as little down time as possible. Often the down time can be reduced by adding carefully selected materialized views or indexes to the physical schema. This paper studies how to select the sets of supporting views |
URL | http://www-db.stanford.edu/pub/papers/whips-demo-sigmod97 |
---|---|
Title | The WHIPS Prototype for Data Warehouse Creation and Maintenance |
Authors | W. Labio , Y. Zhuge , J. Wiener , H. Gupta , H. Garcia-Molina , J. Widom |
Year | 1997 |
Citation | Demonstration description for SIGMOD 1997 and ICDE 1997. |
Keywords | demo, data warehousing, whips |
Abstract |
URL | http://www-db.stanford.edu/pub/papers/modeling.ps |
---|---|
Title | Modeling Location Management in Personal Communications Services |
Authors | D. Lam , J. Jannink , D. Cox , J. Widom |
Year | 1995 |
Citation | Technical report |
Keywords | Mobile Communications, Simulation, Topology Model, Call Model, Movement Model, IS-41 |
Abstract | This paper presents a framework for modeling and evaluating the performance of location management schemes for a Personal Communication Services (PCS) network. The models take into account complex human behaviors, and are based on actual measurements of these behaviors. Real measurements are a crucial factor in achieving realistic performance evaluations. We have developed a powerful and highly configurable simulator based on the framework. Simulation results, showing the performance of an existing location management standard, IS-41, are presented in terms of database performance requirements (transactions per second) and network traffic load (messages per second). |
URL | http://www-db.stanford.edu/pub/papers/teletraffic.ps |
---|---|
Title | Teletraffic Modeling for Personal Communications Services |
Authors | D. Lam , D. Cox , J. Widom |
Year | 1996 |
Citation | Technical report |
Keywords | mobile computing, wireless computing, data modeling |
Abstract | This paper presents a realistic teletraffic modeling framework for Personal Communications Services. The framework captures complex human behaviors and has been validated through analysis of actual call and mobility data. Using the proposed framework, a large-scale simulation was performed on a model of the San Francisco Bay Area. Simulation results showing the performance of IS-41 are presented. |
URL | http://www-db.stanford.edu/pub/papers/icupc96.ps |
---|---|
Title | Modeling Location Management in Personal Communications Services |
Authors | D. Lam , J. Jannink , D. Cox , J. Widom |
Year | 1996 |
Citation | To appear in ICUPC '96, Cambridge MA |
Keywords | Mobile Communications, Simulation, Topology Model, Call Model, Movement Model, IS-41 |
Abstract | This paper presents a realistic modeling framework for evaluating the performance of location management schemes in PCS networks. The framework captures complex human behaviors and has been validated through analysis of actual call and mobility data. Simulation results, showing the performance of IS-41, are presented. I. Introduction Personal Communications Services (PCS) [2] presents many challenging problems in network data management [6;18]. A key problem in this area is location management. Location management refers to accessing and maintaining user information for call routing purposes. Important per-user information, such as current location, authentication information, and billing information |
URL | http://www-db.stanford.edu/pub/papers/lsw-cluster.ps |
---|---|
Title | Clustering Association Rules |
Authors | B. Lent , A. Swami , J. Widom |
Year | 1996 |
Citation | technical report |
Keywords | data mining, association rules, clustering, segmentation, database |
Abstract | We consider the problem of clustering two-dimensional association rules in large databases. We present a geometric-based algorithm, BitOp, for performing the clustering, embedded within an association rule clustering system, ARCS. Association rule clustering is useful when the user desires to segment the data. We measure the quality of the segmentation generated by ARCS using the Minimum Description Length (MDL) principle of encoding the clusters on several databases including noise and errors. Scale-up experiments show that ARCS, using the BitOp algorithm, scales linearly with the amount of data. |
URL | http://www-db.stanford.edu/pub/papers/external-processors.ps |
---|---|
Title | Answering Queries Using Limited External Query Processors |
Authors | A. Levy , A. Rajaraman , J. Ullman |
Year | 1996 |
Citation | Appears in PODS 1996. |
Keywords | |
Abstract | When answering queries using external information sources, their contents can be described by views. To answer a query, we must rewrite it using the set of views presented by the sources. When the external information sources also have the ability to answer some (perhaps limited) sets of queries that require performing operations on their data, the set of views presented by the source may be infinite (albeit encoded in some finite way). Previous work on answering queries using views has only considered the case where the set of views is finite. In order to exploit the ability of information sources to answer more complex queries, we consider the problem of answering conjunctive queries using infinite sets of views. Our first result is that an infinite set of views can be partitioned into a finite number of equivalence classes, such that picking one view from every nonempty class is sufficient to determine whether the query can be answered using the views. Second, we show how to compute the se |
URL | http://www-db.stanford.edu/pub/papers/manifold.ps |
---|---|
Title | Querying Heterogeneous Information Sources Using Source Descriptions |
Authors | A. Levy , A. Rajaraman , J. Ordille |
Year | 1996 |
Citation | To appear in VLDB 1996. |
Keywords | |
Abstract | We witness a rapid increase in the number of structured information sources that are available online, especially on the WWW. These sources include commercial databases on product information, stock market information, real estate, automobiles, and entertainment. We would like to use the data stored in these databases to answer complex queries that go beyond keyword searches. We face the following challenges: (1) Several information sources store interrelated data, and any query-answering system must understand the relationships between their contents. (2) Many sources are not full-featured database systems and can answer only a small set of queries over their data (for example, forms on the WWW restrict the set of queries one can ask). (3) Since the number of sources is very large, effective techniques are needed to prune the set of information sources accessed to answer a query. (4) The details of interacting with each source vary greatly. We describe the Information Manifold, an implemen |
URL | http://www-db.stanford.edu/pub/papers/aaai.ps |
---|---|
Title | Query Answering Algorithms for Information Agents |
Authors | A. Levy , A. Rajaraman , J. Ordille |
Year | 1996 |
Citation | To appear in AAAI 1996. |
Keywords | |
Abstract | The database theory community, centered around the PODS (Principles of Database Systems) conference, has had a long-term interest in logic as a way to represent "data," "information," and "knowledge" (take your pick on the term -- it boils down to facts or atoms and rules, usually Horn clauses). The approach of this community has been "slow and steady," preferring to build up carefully from simple special cases to more general ideas, always paying attention to how efficiently we can process queries and perform other operations on the facts and rules. A powerful theory has developed, and it is beginning to have some impact on applications, especially information-integration engines. Datalog The term Datalog has been coined to refer to Prolog-like rules without function symbols, treated as a logic program. Unlike Prolog, however, the conventional least fixed-point semantics of the rules is used whenever possible. Example 1: The rules for ancestors can be written as the following Datalog program. anc(X |
URL | http://www-db.stanford.edu/pub/papers/external97.ps |
---|---|
Title | Integrating Dynamically-Fetched External Information into a DBMS for Semistructured Data |
Authors | J. McHugh , J. Widom |
Year | 1997 |
Citation | Technical Report |
Keywords | Semistructured Databases, Heterogeneous Databases, External data, Dynamic Integration |
Abstract | We describe the external data manager component of the Lore database system for semistructured data. Lore's external data manager enables dynamic retrieval and integration of data from arbitrary, heterogeneous external sources during query processing. The distinction between Lore-resident and external data is invisible to the user. We introduce a flexible notion of arguments that limits the amount of data fetched from an external source, and we have incorporated optimizations to reduce the number of calls to an external source. |
URL | http://www-db.stanford.edu/pub/papers/kyoto.ps |
---|---|
Title | The Identification and Resolution of Semantic Heterogeneity |
Authors | D. McLeod , D. Fang , J. Hammer |
Year | 1991 |
Citation | In Proceedings of First International Workshop on Interoperability in Multidatabase Systems. Kyoto, Japan, April 1991. |
Keywords | Semantic, heterogeneity, resolution, integration, schema |
Abstract | This paper describes several aspects of the Remote-Exchange project at USC, which focuses on the controlled sharing and exchange of information among autonomous, heterogeneous database systems. The spectrum of heterogeneity which may exist among the components in a federation of database systems is examined, and an approach to accommodating such heterogeneity is described. An overview of the Remote-Exchange experimental system is provided. |
URL | http://www-db.stanford.edu/pub/papers/cube-maint.ps |
---|---|
Title | Maintenance of Data Cubes and Summary Tables in a Warehouse |
Authors | I. Mumick , D. Quass , B. Mumick |
Year | 1996 |
Citation | Technical note |
Keywords | view maintenance, aggregation, summary table, data warehousing, olap |
Abstract | Data warehouses contain large amounts of information, often collected from a variety of independent sources. Decision-support functions in a warehouse, such as on-line analytical processing, involve hundreds of complex aggregate queries over large volumes of data. It is not feasible to compute these queries by scanning the data sets each time. Warehouse applications therefore build a number of summary tables, or materialized aggregate views, to help them increase the system performance. As changes, most notably new transactional data, are collected at the data sources, all summary tables at the warehouse that depend upon this data need to be updated. Usually, source changes are loaded into the warehouse at regular intervals, usually once a day, in a batch window, and the warehouse is made unavailable for querying while it is updated. Since the number of summary tables that need to be maintained is often large, a critical issue for data warehousing is how to maintain the summary tables |
URL | http://www-db.stanford.edu/pub/papers/cube-maint-sigmod.ps |
---|---|
Title | Maintenance of Data Cubes and Summary Tables in a Warehouse |
Authors | I. Mumick , D. Quass , B. Mumick |
Year | 1997 |
Citation | To appear in SIGMOD 1997; shortened proceedings version of cube-maint.ps |
Keywords | view maintenance, aggregation, summary table, data warehousing, olap |
Abstract | Data warehouses contain large amounts of information, often collected from a variety of independent sources. Decision-support functions in a warehouse, such as on-line analytical processing, involve hundreds of complex aggregate queries over large volumes of data. It is not feasible to compute these queries by scanning the data sets each time. Warehouse applications therefore build a large number of summary tables, or materialized aggregate views, to help them increase the system performance. As changes, most notably new transactional data, are collected at the data sources, all summary tables at the warehouse that depend upon this data need to be updated. Usually, source changes are loaded into the warehouse at regular intervals, usually once a day, in a batch window, and the warehouse is made unavailable for querying while it is updated. Since the number of summary tables that need to be maintained is often large, a critical issue for data warehousing is how to maintain the summary tables |
URL | http://www-db.stanford.edu/pub/papers/variant-indexes.ps |
---|---|
Title | Improved Query Performance with Variant Indexes |
Authors | P. O'Neil , D. Quass |
Year | 1996 |
Citation | Full version of variant-sigmod.ps |
Keywords | indexes, indexing, bitmap, data warehousing, olap |
Abstract | The read-mostly environment of data warehousing makes it possible to use more complex indexes to speed up queries than in situations where concurrent updates are present. The current paper presents a short review of current indexing technology, including row-set representation by Bitmaps, and then introduces two approaches we call Bit-Sliced indexing and Projection indexing. A Projection index basically materializes all values of a column in RID order, and a Bit-Sliced index essentially takes an orthogonal bit-by-bit view of the same data. While some of these concepts started with the MODEL 204 product, and both Bit-Sliced and Projection indexing are now fully realized in Sybase IQ, this is the first rigorous examination of such indexing capabilities in the literature. We compare algorithms that become feasible with these variant index types against algorithms using more conventional indexes. The analysis demonstrates important performance advantages for variant indexes in some types |
URL | http://www-db.stanford.edu/pub/papers/variant-sigmod.ps |
---|---|
Title | Improved Query Performance with Variant Indexes |
Authors | P. O'Neil , D. Quass |
Year | 1997 |
Citation | To appear in SIGMOD 1997; shortened proceedings version of variant-indexes.ps |
Keywords | indexes, indexing, bitmap, data warehousing, olap |
Abstract | The read-mostly environment of data warehousing makes it possible to use more complex indexes to speed up queries than in situations where concurrent updates are present. The current paper presents a short review of current indexing technology, including row-set representation by Bitmaps, and then introduces two approaches we call Bit-Sliced indexing and Projection indexing. A Projection index materializes all values of a column in RID order, and a Bit-Sliced index essentially takes an orthogonal bit-by-bit view of the same data. While some of these concepts started with the MODEL 204 product, and both Bit-Sliced and Projection indexing are now fully realized in Sybase IQ, this is the first rigorous examination of such indexing capabilities in the literature. We compare algorithms that become feasible with these variant index types against algorithms using more conventional indexes. The analysis demonstrates important performance advantages for variant indexes in some types of SQL aggreg |
URL | http://www-db.stanford.edu/pub/papers/representative-object.ps |
---|---|
Title | Representative Objects: Concise Representations of Semistructured, Hierarchical Data |
Authors | S. Nestorov , J. Ullman , J. Wiener , S. Chawathe |
Year | 1997 |
Citation | Appeared in ICDE'97. |
Keywords | semistructured data, schema discovery |
Abstract | In this paper we introduce the representative object, which uncovers the inherent schema(s) in semistructured, hierarchical data sources and provides a concise description of the structure of the data. Semistructured data, unlike data stored in typical relational or object-oriented databases, does not have a fixed schema that is known in advance and stored separately from the data. With the rapid growth of the World Wide Web, semistructured hierarchical data sources are becoming widely available to the casual user. The lack of external schema information currently makes browsing and querying these data sources inefficient at best, and impossible at worst. We show how representative objects make schema discovery efficient and facilitate the generation of meaningful queries over the data. |
URL | http://www-db.stanford.edu/pub/papers/icde95.ps |
---|---|
Title | Object Exchange Across Heterogeneous Information Sources |
Authors | Y. Papakonstantinou , H. Garcia-Molina , J. Widom |
Year | 1995 |
Citation | Appeared in ICDE '95; shortened proceedings version of pub/papakonstantinou/1994/object-exchange-heterogeneous-is.ps; one cut-and-paste figure missing from ps file |
Keywords | data model, semistructured |
Abstract | We address the problem of providing integrated access to diverse and dynamic information sources. We explain how this problem differs from the traditional database integration problem and we focus on one aspect of the information integration problem, namely information exchange. We define an object-based information exchange model and a corresponding query language that we believe are well suited for integration of diverse information sources. We describe how the model and language have been used to integrate heterogeneous bibliographic information sources. We also describe two general-purpose libraries we have implemented for object exchange between clients and servers. |
URL | http://www-db.stanford.edu/pub/papers/cbr.ps |
---|---|
Title | Capabilities-Based Query Rewriting in Mediator Systems |
Authors | Y. Papakonstantinou , A. Gupta , L. Haas |
Year | 1995 |
Citation | IBM Almaden Technical Report. |
Keywords | |
Abstract | Users today are struggling to integrate a broad range of information sources that provide different levels of query capabilities. Currently, data sources with different and limited capabilities are accessed either by writing rich functional wrappers for the more primitive sources, or by dealing with all sources at a "lowest common denominator". This paper explores a third approach, in which a mediator ensures that sources receive queries they can handle, while still taking advantage of all the query power of the source. We propose an architecture that enables this, and identify a key component of that architecture, the Capabilities-Based Rewriter (CBR). We provide a language to describe the query capability of data sources. The CBR takes as input a description of each data source's capability. Given a query posed to the mediator, the CBR determines the component queries to be sent to the sources, commensurate with their capabilities. It also computes a plan for combining the results of the component queries |
URL | http://www-db.stanford.edu/pub/papers/oem.ps |
---|---|
Title | Object Exchange Across Heterogeneous Information Sources |
Authors | Y. Papakonstantinou , H. Garcia-Molina , J. Widom |
Year | 1994 |
Citation | Technical report; shortened version appeared in ICDE '95; one cut-and-paste figure missing from ps file |
Keywords | data model, semistructured |
Abstract | We address the problem of providing integrated access to diverse and dynamic information sources. We explain how this problem differs from the traditional database integration problem and we focus on one aspect of the information integration problem, namely information exchange. We define an object-based information exchange model and a corresponding query language that we believe are well suited for integration of diverse information sources. We describe how the model and language have been used to integrate heterogeneous bibliographic information sources. We also describe two general-purpose libraries we have implemented for object exchange between clients and servers. |
URL | http://www-db.stanford.edu/pub/papers/querying-full.ps |
---|---|
Title | Querying Semistructured Heterogeneous Information |
Authors | D. Quass , A. Rajaraman , Y. Sagiv , J. Ullman , J. Widom |
Year | 1994 |
Citation | Full version of querying.ps |
Keywords | LORE, tsimmis, semistructured, object-oriented |
Abstract | Semistructured data has no absolute schema fixed in advance and its structure may be irregular or incomplete. Such data commonly arises in sources that do not impose a rigid structure (such as the World-Wide Web) and when data is combined from several heterogeneous sources. Data models and query languages designed for well structured data are inappropriate in such environments. Starting with a "lightweight" object model adopted for the TSIMMIS project at Stanford, in this paper we describe a query language and object repository designed specifically for semistructured data. Our language provides meaningful query results in cases where conventional models and languages do not: when some data is absent, when data does not have regular structure, when similar concepts are represented using different types, when heterogeneous sets are present, and when object structure is not fully known. This paper motivates the key concepts behind our approach, describes the language through a series of |
URL | http://www-db.stanford.edu/pub/papers/querying.ps |
---|---|
Title | Querying Semistructured Heterogeneous Information |
Authors | D. Quass , A. Rajaraman , Y. Sagiv , J. Ullman , J. Widom |
Year | 1995 |
Citation | Appeared in the Proceedings of the 4th International Conference on Deductive and Object-Oriented Databases (DOOD) 1995 |
Keywords | LORE, tsimmis, semistructured, object-oriented |
Abstract | Semistructured data has no absolute schema fixed in advance and its structure may be irregular or incomplete. Such data commonly arises in sources that do not impose a rigid structure (such as the World-Wide Web) and when data is combined from several heterogeneous sources. Data models and query languages designed for well structured data are inappropriate in such environments. Starting with a "lightweight" object model adopted for the TSIMMIS project at Stanford, in this paper we describe a query language and object repository designed specifically for semistructured data. Our language provides meaningful query results in cases where conventional models and languages do not: when some data is absent, when data does not have regular structure, when similar concepts are represented using different types, when heterogeneous sets are present, and when object structure is not fully known. This paper motivates the key concepts behind our approach, describes the language through a series of |
URL | http://www-db.stanford.edu/pub/papers/self-maint.ps |
---|---|
Title | Making Views Self-Maintainable for Data Warehousing |
Authors | D. Quass , A. Gupta , I. Mumick , J. Widom |
Year | 1996 |
Citation | Full version of pub/quass/1996/self-maint-pdis.ps |
Keywords | data warehousing, view |
Abstract | A data warehouse stores materialized views over data from one or more sources in order to provide fast access to the integrated data, regardless of the availability of the data sources. Warehouse views need to be maintained in response to changes to the base data in the sources. Except for very simple views, maintaining a warehouse view requires access to data that is not available in the view itself. Hence, to maintain the view, one either has to query the data sources or store auxiliary data in the warehouse. We show that by using key and referential integrity constraints, we often can maintain a select-project-join view when there are insertions, deletions, and updates to the base relations without going to the data sources or replicating the base relations in their entirety in the warehouse. We derive a set of auxiliary views such that the warehouse view and the auxiliary views together are self-maintainable--they can be maintained without going to the data sources or replicating |
URL | http://www-db.stanford.edu/pub/papers/self-maint-pdis.ps |
---|---|
Title | Making Views Self-Maintainable for Data Warehousing |
Authors | D. Quass , A. Gupta , I. Mumick , J. Widom |
Year | 1996 |
Citation | To appear in Parallel and Distributed Information Systems (PDIS) 1996; shortened proceedings version of pub/quass/1996/self-maint.ps |
Keywords | data warehousing, view |
Abstract | A data warehouse stores materialized views over data from one or more sources in order to provide fast access to the integrated data, regardless of the availability of the data sources. Warehouse views need to be maintained in response to changes to the base data in the sources. Except for very simple views, maintaining a warehouse view requires access to data that is not available in the view itself. Hence, to maintain the view, one either has to query the data sources or store auxiliary data in the warehouse. We show that by using key and referential integrity constraints, we often can maintain a select-project-join view without going to the data sources or replicating the base relations in their entirety in the warehouse. We derive a set of auxiliary views such that the warehouse view and the auxiliary views together are self-maintainable--they can be maintained without going to the data sources or replicating all base data. In addition, our technique can be applied to simplify tra |
URL | http://www-db.stanford.edu/pub/papers/sm.ps |
---|---|
Title | Making Views Self-Maintainable for Data Warehousing |
Authors | D. Quass , A. Gupta , I. Mumick , J. Widom |
Year | 1996 |
Citation | Technical Report. Earlier version of pub/quass/1996/self-maint.ps that takes a slightly different approach. |
Keywords | warehousing, materialized, view |
Abstract | A data warehouse stores materialized views over data from one or more sources in order to provide fast access to the integrated data, regardless of the availability of the data sources. Warehouse views need to be maintained in response to changes to the base data in the sources. Except for very simple views, maintaining a warehouse view requires access to data that is not available in the view itself. Hence, to maintain the view, one either has to query the data sources or store auxiliary data in the warehouse. We show that by using key and referential integrity constraints, we often can maintain a select-project-join view with respect to insertions, deletions, and updates to the base relations, without going to the data sources or replicating the base relations in their entirety in the warehouse. We derive a set of auxiliary views such that the warehouse view and the auxiliary views together are self-maintainable--they can be maintained without going to the data sources or replicating all
URL | http://www-db.stanford.edu/pub/papers/online.ps |
---|---|
Title | On-Line Warehouse View Maintenance for Batch Updates |
Authors | D. Quass , J. Widom |
Year | 1996 |
Citation | Technical note |
Keywords | warehousing, materialized, view |
Abstract | Data warehouses store materialized views over base data from external sources. Clients typically perform complex read-only queries on the views. The views are refreshed periodically by maintenance transactions, which propagate large batch updates from the base tables. In current warehousing systems, maintenance transactions usually are isolated from client read activity, limiting availability and/or size of the warehouse. We describe an algorithm called 2VNL that allows warehouse maintenance transactions to run concurrently with readers. By logically maintaining two versions of the database, no locking is required and serializability is guaranteed. We present our algorithm, explain its relationship to other multi-version concurrency control algorithms, and describe how it can be implemented on top of a conventional relational DBMS using a query rewrite approach. |
URL | http://www-db.stanford.edu/pub/papers/view-aggr.ps |
---|---|
Title | Maintenance Expressions for Views with Aggregation |
Authors | D. Quass |
Year | 1996 |
Citation | Appeared in Views 96 workshop, June 1996 |
Keywords | warehousing, materialized, view, aggregation |
Abstract | Materialized views, especially views involving aggregation, are often used in environments such as data warehouses to speed up the evaluation of complex queries. Because the views can be quite complex, as changes are made to the underlying base data it is usually better to incrementally maintain a view by propagating changes to base data onto the view than to recompute the view from scratch. Griffin and Libkin [GL95] provide a framework for deriving incremental view maintenance expressions for a view with duplicate semantics, where the view can be defined using any number of select, project, join, bag union, and monus bag-algebra operators. However, aggregation was considered only in a limited sense; for example, aggregation with groupby attributes was not allowed. We extend the framework of [GL95] to include maintaining views with aggregation in the general case, where aggregation can include groupby attributes and a view can include several aggregate operators.
URL | http://www-db.stanford.edu/pub/papers/lore-demo.ps |
---|---|
Title | LORE: A Lightweight Object REpository for Semistructured Data |
Authors | D. Quass , J. Goldman , K. Haas , Q. Luo , J. McHugh , S. Nestorov , A. Rajaraman , H. Rivero , S. Abiteboul , J. Ullman , J. Wiener |
Year | 1996 |
Citation | Page accompanying demo appearing in SIGMOD '96 |
Keywords | LORE, tsimmis, semistructured, object-oriented |
Abstract | LORE: A Lightweight Object REpository for Semistructured Data Dallan Quass, Jennifer Widom, Roy Goldman, Kevin Haas, Qingshan Luo, Jason McHugh, Svetlozar Nestorov, Anand Rajaraman, Hugo Rivero, Serge Abiteboul, Jeff Ullman, Janet Wiener Stanford University Database Group, http://db.stanford.edu The number of information sources accessible electronically is growing rapidly. Many of these sources store and export unstructured data in addition to or instead of structured data. In most cases, however, the unstructured data is not entirely devoid of structure, i.e., the data is semistructured. We consider data to be semistructured when there is no schema fixed or known in advance and when the data may be incomplete or irregular. For example, HTML files on the World-Wide Web usually contain some structure, but often the data is irregular or incomplete. In addition, data integrated from multiple, heterogeneous information sources often is semistructured. Storing and querying semistructured data poses
URL | http://www-db.stanford.edu/pub/papers/limited-opsets.ps |
---|---|
Title | Answering Queries using Templates with Binding Patterns |
Authors | A. Rajaraman , Y. Sagiv , J. Ullman |
Year | 1995 |
Citation | Appears in PODS 1995. |
Keywords | |
Abstract | When integrating heterogeneous information resources, it is often the case that the source is rather limited in the kinds of queries it can answer. If a query is asked of the entire system, we have a new kind of optimization problem, in which we must try to express the given query in terms of the limited query templates that this source can answer. For the case of conjunctive queries, we show how to decide with a nondeterministic polynomial-time algorithm whether the given query can be answered. We then extend our results to allow arithmetic comparisons in the given query and in the templates. I. Motivation A data-integration system such as Tsimmis (Papakonstantinou, Garcia, and Widom [1994], Chawathe et al. [1994]) translates information sources of arbitrary type into a common data model and language. If a source is an SQL database, then its interface with the Tsimmis system is fairly clear: one can ask it SQL queries over its database schema and nothing else. However, the source may |
URL | http://www-db.stanford.edu/pub/papers/outerjoin.ps |
---|---|
Title | Integrating Information by Outerjoins and Full Disjunctions |
Authors | A. Rajaraman , J. Ullman |
Year | 1996 |
Citation | Appears in PODS 1996. |
Keywords | |
Abstract | Our motivation is the piecing together of tidbits of information found on the "web" into a usable information structure. The problem is related to that of computing the natural outerjoin of many relations in a way that preserves all possible connections among facts. Such a computation has been termed a "full disjunction" by Galindo-Legaria. We are thus led to ask the question of when a full disjunction can be computed by some sequence of natural outerjoins. The answer involves a concept from Fagin [1983] called "γ-acyclic hypergraphs." We prove that there is a natural outerjoin sequence producing the full disjunction if and only if the set of relation schemes forms a connected, γ-acyclic hypergraph. I. Motivation Let us imagine we are constructing an information resource that accepts queries about university information. We gather our information from the on-line information found at the various universities and their schools or departments, and we integrate the information into an object
URL | http://www-db.stanford.edu/pub/papers/outerjoin-full.ps |
---|---|
Title | Integrating Information by Outerjoins and Full Disjunctions |
Authors | A. Rajaraman , J. Ullman |
Year | 1996 |
Citation | Full version of PODS 1996 paper (includes proofs omitted in the PODS '96 version). |
Keywords | |
Abstract | Our motivation is the piecing together of tidbits of information found on the "web" into a usable information structure. The problem is related to that of computing the natural outerjoin of many relations in a way that preserves all possible connections among facts. Such a computation has been termed a "full disjunction" by Galindo-Legaria. We are thus led to ask the question of when a full disjunction can be computed by some sequence of natural outerjoins. The answer involves a concept from Fagin [1983] called "γ-acyclic hypergraphs." We prove that there is a natural outerjoin sequence producing the full disjunction if and only if the set of relation schemes forms a connected, γ-acyclic hypergraph. I. Motivation Let us imagine we are constructing an information resource that accepts queries about university information. We gather our information from the on-line information found at the various universities and their schools or departments, and we integrate the information into an ob
URL | http://www-db.stanford.edu/pub/papers/deductive-survey.ps |
---|---|
Title | A Survey of Research in Deductive Database Systems |
Authors | R. Ramakrishnan , J. Ullman |
Year | 1995 |
Citation | Appears in J. Logic Programming, May, 1995, pp. 125-149 |
Keywords | |
Abstract | The area of deductive databases has matured in recent years, and it now seems appropriate to reflect upon what has been achieved and what the future holds. In this paper, we provide an overview of the area and briefly describe a number of projects that have led to implemented systems. |
URL | http://www-db.stanford.edu/pub/papers/dlmag.ps |
---|---|
Title | The SCAM Approach To Copy Detection in Digital Libraries |
Authors | N. Shivakumar , H. Garcia-Molina |
Year | 1995 |
Citation | D-lib Magazine, November 1995 |
Keywords | SCAM, Copy detection, Plagiarism, Copyright |
Abstract | The SCAM Approach to Copy Detection in Digital Libraries Narayanan Shivakumar and Hector Garcia-Molina Department of Computer Science, Stanford University, Stanford, CA 94305, U.S.A. {shiva, hector}@cs.stanford.edu Scenario 1 Your local publishing company Books'R'Us decides to publish on the Internet its latest book in an effort to cut down on printing costs and book distribution expenses. Customers pay for the digital books using sophisticated electronic payment mechanisms such as DigiCash, First Virtual or InterPay. When the payment is received, the book distribution server at Books'R'Us sends a digital version of the book electronically to the paying customer. Books'R'Us expects to make higher profits on the digital book due to lower production and distribution costs, and larger markets on the Internet. It turns out, however, that very few books are sold since digital copies of the Books'R'Us book had been widely circulated on UseNet newsgroups, bulletin boards, and had been available for free
URL | http://www-db.stanford.edu/pub/papers/icmcs95.ps |
---|---|
Title | The Concord Algorithm for Synchronization of Networked Multimedia Streams |
Authors | N. Shivakumar , C. Sreenan , B. Narendran , P. Agrawal |
Year | 1995 |
Citation | 2nd IEEE International Conference on Multi-media Computing and Systems'95 (ICMCS'95), pp. 31 - 40. |
Keywords | Jitter-free, Multimedia, Packet networks |
Abstract | Synchronizing different data streams from multiple sources simultaneously at a receiver is one of the basic problems involved in multimedia distributed systems. This requirement stems from the nature of packet based networks which can introduce end-to-end delays that vary both within and across streams. In this paper, we present a new algorithm called Concord, which provides an integrated solution for these single and multiple stream synchronization problems. It is notable because it defines a single framework to deal with both problems, and operates under the influence of parameters which can be supplied by the application involved. In particular, these parameters are used to allow a trade-off between the packet loss rates, total end-to-end delay and skew for each of the streams. For applications like conferencing this is used to reduce delay by determining the minimum buffer delay/size required.
URL | http://www-db.stanford.edu/pub/papers/replication.ps |
---|---|
Title | User Profile Replication for Faster Location Lookup in Mobile Environments |
Authors | N. Shivakumar , J. Widom |
Year | 1995 |
Citation | Proceedings of 1st ACM International Conference on Mobile Computing and Networking (MCN'95), November 1995, Berkeley. |
Keywords | Wireless, Replication, Location Lookup |
Abstract | We consider per-user profile replication as a mechanism for faster location lookup of mobile users in a Personal Communications Service system. We present a minimum-cost maximum-flow based algorithm to compute the set of sites at which a user profile should be replicated given known calling and user mobility patterns. We then present schemes for replication plans that gracefully adapt to changes in the calling and mobility patterns. |
URL | http://www-db.stanford.edu/pub/papers/scam.ps |
---|---|
Title | SCAM: A Copy Detection Mechanism for Digital Documents |
Authors | N. Shivakumar , H. Garcia-Molina |
Year | 1995 |
Citation | Proceedings of 2nd International Conference in Theory and Practice of Digital Libraries (DL'95), June 11 - 13, Austin, Texas. |
Keywords | SCAM, Copy detection, Plagiarism, Copyright |
Abstract | Copy detection in Digital Libraries may provide the necessary guarantees for publishers and newsfeed services to offer valuable on-line data. We consider the case for a registration server that maintains registered documents against which new documents can be checked for overlap. In this paper we present a new scheme for detecting copies based on comparing the word frequency occurrences of the new document against those of registered documents. We also report on an experimental comparison between our proposed scheme and COPS [6], a detection scheme based on sentence overlap. The tests involve over a million comparisons of netnews articles and show that in general the new scheme performs better in detecting documents that have partial overlap. Keywords: Copy Detection, Plagiarism, Registration Server, Databases.
URL | http://www-db.stanford.edu/pub/papers/profile-journ.ps |
---|---|
Title | Per-User Profile Replication in Mobile Environments: Algorithms, Analysis, and Simulation Results |
Authors | N. Shivakumar , J. Jannink , J. Widom |
Year | 1995 |
Citation | Unpublished manuscript |
Keywords | Wireless, Replication, Location Lookup |
Abstract | We consider per-user profile replication as a mechanism for faster location lookup of mobile users in a Personal Communications Service system. We present a minimum-cost maximum-flow based algorithm to compute the set of sites at which a user profile should be replicated given known calling and user mobility patterns. We then present schemes for replication plans that gracefully adapt to changes in the calling and mobility patterns. We show the costs and benefits of our replication algorithm against previous location lookup approaches through analysis. We also simulate our algorithm against other location lookup algorithms on a realistic model of a geographical area to evaluate critical system performance measures. A notable aspect of our simulations is that we use well-validated models of user calling and mobility patterns. |
URL | http://www-db.stanford.edu/pub/papers/performance.ps |
---|---|
Title | Building a Scalable and Accurate Copy Detection Mechanism |
Authors | N. Shivakumar , H. Garcia-Molina |
Year | 1996 |
Citation | Proceedings of 1st ACM International Conference on Digital Libraries (DL'96) March 1996, Bethesda Maryland. |
Keywords | SCAM, Copy detection, Plagiarism, Copyright |
Abstract | Often, publishers are reluctant to offer valuable digital documents on the Internet for fear that they will be re-transmitted or copied widely. A Copy Detection Mechanism can help identify such copying. For example, publishers may register their documents with a copy detection server, and the server can then automatically check public sources such as UseNet articles and Web sites for potential illegal copies. The server can search for exact copies, and also for cases where significant portions of documents have been copied. In this paper we study, for the first time, the performance of various copy detection mechanisms, including the disk storage requirements, main memory requirements, response times for registration, and response time for querying. We also contrast performance to the accuracy of the mechanisms (how well they detect partial copies). The results are obtained using SCAM, an experimental server we have implemented, and a collection of 50,000 netnews articles.
URL | http://www-db.stanford.edu/pub/papers/huffman.ps |
---|---|
Title | Time-and-Energy Efficient Wireless Broadcasting |
Authors | N. Shivakumar , S. Venkatasubramanian |
Year | 1996 |
Citation | ACM Journal of Wireless and Nomadic Applications (NOMAD) |
Keywords | Huffman, Wireless, Indexing, Energy, Power, Broadcasting |
Abstract | We consider the application of high volume information dissemination in broadcast based mobile environments. Since current mobile units accessing broadcast information have limited battery capacity, the problem of quick and energy-efficient access to data becomes particularly relevant as the number and sizes of information units increases. We propose several randomized and Huffman-encoding based indexing schemes that are sensitive to data popularity patterns to structure data transmission on the wireless medium, so that the average energy consumption of mobile units is minimized while trying to access desired data. We then propose an algorithm for PCS units to tune into desired data independent of the actual transmission scheme being used. We also empirically study the proposed schemes and propose different transmission modes for the base station to dynamically adapt to changes in the number of data files to be broadcast, the available bandwidth and the accuracy of data popularity pa
URL | http://www-db.stanford.edu/pub/papers/wave.ps |
---|---|
Title | Wave-Indices: Indexing Evolving Databases |
Authors | N. Shivakumar , H. Garcia-Molina |
Year | 1997 |
Citation | To appear in ACM International Conference on Management of Data (SIGMOD'97) |
Keywords | indexing, sliding windows, SCAM, search engines |
Abstract | In many applications, new data is being generated every day. Often an index of the data of a past window of days is required to answer queries efficiently. For example, in a warehouse one may need an index on the sales records of the last week for efficient data mining, or in a Web service one may provide an index of Netnews articles of the past month. In this paper, we propose a variety of wave indices where the data of a new day can be efficiently added, and old data can be quickly expired, to maintain the required window. We compare these schemes based on several system performance measures, such as storage, query response time, and maintenance work, as well as on their simplicity and ease of coding.
URL | http://www-db.stanford.edu/pub/papers/filter.ps |
---|---|
Title | Filtering Expensive Predicates |
Authors | N. Shivakumar , C. Chekuri , H. Garcia-Molina |
Year | 1997 |
Citation | Submitted for publication. |
Keywords | Expensive Predicates, SCAM, multi-media databases, extensible databases |
Abstract | In many applications, databases are supporting increasingly more complex and expensive predicates, e.g., to find patterns in images, to check for containment of points in regions or to detect textual similarity. To reduce the expense of running the expensive predicate on all tuples in a database, application designers often define simpler and computationally cheaper predicates that "approximate" the complex predicate. For instance, if the complex predicate is to look for daisies in an image, the approximate predicate may look for a substantial yellow component in the image. The approximate predicate is then applied to all tuples, and only items that pass through the approximate predicate are operated on by the expensive predicate. In this paper we propose a general framework for expressing approximate predicates and composing given approximate tests to filter input to expensive predicates. For certain special cases, we also propose algorithms that construct provably "good" filters for |
URL | http://www-db.stanford.edu/pub/papers/sigir.94.ps |
---|---|
Title | Synthetic Workload Performance Analysis of Incremental Updates |
Authors | K. Shoens , A. Tomasic , H. Garcia-Molina
Year | 1994 |
Citation | SIGIR 94. A part of Stanford University Department of Computer Science Technical Report Number STAN-CS-TN-93-1. |
Keywords | |
Abstract | Declining disk and CPU costs have kindled a renewed interest in efficient document indexing techniques. In this paper, the problem of incremental updates of inverted lists is addressed using a dual-structure index data structure that dynamically separates long and short inverted lists and optimizes the retrieval, update, and storage of each type of list. The behavior of this index is studied with the use of a synthetically-generated document collection and a simulation model of the algorithm. The index structure is shown to support rapid insertion of documents, fast queries, and to scale well to large document collections and many disks.
URL | http://www-db.stanford.edu/pub/papers/mmcn.ps |
---|---|
Title | Internet Synchronization with Concord |
Authors | C. Sreenan , B. Narendran , P. Agrawal , N. Shivakumar |
Year | 1996 |
Citation | Proceedings of Multimedia Computing and Networking (MMCN'96), San Jose, California, Jan 1996. |
Keywords | Multimedia, Jitter-free, Packet networks |
Abstract | Using packet networks to transport multimedia introduces delay variations within and across streams, necessitating synchronization at the receiver. This requires stream data to be buffered prior to presentation, which also increases its total end to end delay. Concord recognizes that applications may wish to influence the underlying synchronization policy in terms of its effect on quality of service. It provides a single framework for synchronization within and across streams and employs an application-specified tradeoff between packet losses, delay and inter-stream skew. This paper presents a new predictive approach for synchronization and a selection of results from an extensive evaluation of Concord for use in the Internet. A trace driven simulator is used, allowing a direct comparison with alternative approaches. A conclusion is that Concord can operate with lower maximum and variation in total end to end delay, which in turn can allow receiver buffer requirements to be reduced.
URL | http://www-db.stanford.edu/pub/papers/stan.cs.92.1434.ps |
---|---|
Title | Performance of Inverted Indices in Distributed Text Document Retrieval Systems |
Authors | A. Tomasic , H. Garcia-Molina |
Year | 1992 |
Citation | Stanford University Department of Computer Science Technical Report Number STAN-CS-92-1434. Short version of paper appears in PDIS '93. Long version of paper appears in VLDB Journal '93. |
Keywords | |
Abstract | The performance of distributed text document retrieval systems is strongly influenced by the organization of the inverted index. This paper compares the performance impact on query processing of various physical organizations for inverted lists. We present a new probabilistic model of the database and queries. Simulation experiments determine which variables most strongly influence response time and throughput. This leads to a set of design trade-offs over a wide range of hardware configurations and new parallel query processing strategies. |
URL | http://www-db.stanford.edu/pub/papers/sigmod.93.ps |
---|---|
Title | Caching and Database Scaling in Distributed Shared-Nothing Information Retrieval Systems |
Authors | A. Tomasic , H. Garcia-Molina |
Year | 1993 |
Citation | SIGMOD '93 |
Keywords | |
Abstract | A common class of existing information retrieval systems provides access to abstracts. For example Stanford University, through its FOLIO system, provides access to the INSPEC database of abstracts of the literature on physics, computer science, electrical engineering, etc. In this paper this database is studied by using a trace-driven simulation. We focus on physical index design, inverted index caching, and database scaling in a distributed shared-nothing system. All three issues are shown to have a strong effect on response time and throughput. Database scaling is explored in two ways. One way assumes an "optimal" configuration for a single host and then linearly scales the database by duplicating the host architecture as needed. The second way determines the optimal number of hosts given a fixed database size.
URL | http://www-db.stanford.edu/pub/papers/pdis.93.ps |
---|---|
Title | Performance of Inverted Indices in Shared-Nothing Distributed Text Document Information Retrieval Systems |
Authors | A. Tomasic , H. Garcia-Molina |
Year | 1993 |
Citation | PDIS '93 (best paper award). |
Keywords | |
Abstract | The performance of distributed text document retrieval systems is strongly influenced by the organization of the inverted index. This paper compares the performance impact on query processing of various physical organizations for inverted lists. We present a new probabilistic model of the database and queries. Simulation experiments determine which variables most strongly influence response time and throughput. This leads to a set of design trade-offs over a range of hardware configurations and new parallel query processing strategies. |
URL | http://www-db.stanford.edu/pub/papers/stan.cs.tn.93.3.ps |
---|---|
Title | Correct View Update Translations via Containment |
Authors | A. Tomasic |
Year | 1993 |
Citation | Stanford report |
Keywords | |
Abstract | One approach to the view update problem for deductive databases proves properties of translations; that is, a language specifies the meaning of an update to the intensional database (IDB) in terms of updates to the extensional database (EDB). We argue that the view update problem should be viewed as a question of the expressive power of the translation language and the computational cost of demonstrating properties of a translation. We use an active rule based database language as a means of specifying translations of updates on the IDB into updates on the EDB. This paper uses the containment of one datalog program (or conjunctive query) by another to demonstrate that a translation is semantically correct. We show that the complexity of correctness is lower for insertion than deletion. Finally, we discuss extensions to the translation language.
URL | http://www-db.stanford.edu/pub/papers/iclpwdd.94.ps |
---|---|
Title | Determining Correct View Update Translations via Query Containment |
Authors | A. Tomasic |
Year | 1994 |
Citation | Appears in the International Conference on Logic Programming Workshop on Deductive Databases, 1994 |
Keywords | |
Abstract | Given an intensional database (IDB) and an extensional database (EDB), the view update problem translates updates on the IDB into updates on the EDB. One approach to the view update problem uses a translation language to specify the meaning of a view update. In this paper we prove properties of a translation language. This approach to the view update problem studies the expressive power of the translation language and the computational cost of demonstrating properties of a translation. We use an active rule based database language for specifying translations of view updates. This paper uses the containment of one datalog program (or conjunctive query) by another to demonstrate that a translation is semantically correct. We show that the complexity of correctness is lower for insertion than deletion. Finally, we discuss extensions to the translation language.
URL | http://www-db.stanford.edu/pub/papers/sigmod.94.ps |
---|---|
Title | Incremental Updates of Inverted Lists for Text Document Retrieval |
Authors | A. Tomasic , H. Garcia-Molina , K. Shoens |
Year | 1994 |
Citation | SIGMOD 94. A part of Stanford University Department of Computer Science Technical Report Number STAN-CS-TN-93-1. |
Keywords | |
Abstract | With the proliferation of the world's "information highways," a renewed interest in efficient document indexing techniques has come about. In this paper, the problem of incremental updates of inverted lists is addressed using a new dual-structure index data structure. The index dynamically separates long and short inverted lists and optimizes the retrieval, update, and storage of each type of list. To study the behavior of the index, a space of engineering tradeoffs which range from optimizing update time to optimizing query performance is described. We quantitatively explore this space by using actual data and hardware in combination with a simulation of an information retrieval system. We then describe the best algorithm for a variety of criteria. |
URL | http://www-db.stanford.edu/pub/papers/ibm_rj.ps |
---|---|
Title | Data Structures for Efficient Broker Implementation |
Authors | A. Tomasic , L. Gravano , C. Lue , P. Schwarz , L. Haas |
Year | 1995 |
Citation | Technical report, IBM Almaden Research Center, June 1995 |
Keywords | |
Abstract | With the profusion of text databases on the Internet, it is becoming increasingly hard to find the most useful databases for a given query. To attack this problem, several existing and proposed systems employ brokers to direct user queries, using a local database of summary information about the available databases. This summary information must effectively distinguish relevant databases, and must be compact while allowing efficient access. We offer evidence that one broker, GlOSS, can be effective at locating databases of interest even in a system of hundreds of databases, and examine the performance of accessing the GlOSS summaries for two promising storage methods: the grid file and partitioned hashing. We show that both methods can be tuned to provide good performance for a particular workload (within a broad range of workloads) and discuss the tradeoffs between the two data structures. As a side effect of our work, we show that grid files are more broadly applicable than previously thought; |
URL | http://www-db.stanford.edu/pub/papers/tois96.ps |
---|---|
Title | Data Structures for Efficient Broker Implementation |
Authors | A. Tomasic , L. Gravano , C. Lue , P. Schwarz , L. Haas |
Year | 1996 |
Citation | Appeared in TOIS '96; journal version of pub/gravano/1995/ibm_rj.ps, technical report, IBM Almaden Research Center, June 1995 |
Keywords | |
Abstract | With the profusion of text databases on the Internet, it is becoming increasingly hard to find the most useful databases for a given query. To attack this problem, several existing and proposed systems employ brokers to direct user queries, using a local database of summary information about the available databases. This summary information must effectively distinguish relevant databases, and must be compact while allowing efficient access. We offer evidence that one broker, GlOSS, can be effective at locating databases of interest even in a system of hundreds of databases, and examine the performance of accessing the GlOSS summaries for two promising storage methods: the grid file and partitioned hashing. We show that both methods can be tuned to provide good performance for a particular workload (within a broad range of workloads) and discuss the tradeoffs between the two data structures. As a side effect of our work, we show that grid files are more broadly applicable than previously thought; in p |
URL | http://www-db.stanford.edu/pub/papers/negation.ps |
---|---|
Title | Assigning an Appropriate Meaning to Database Logic With Negation |
Authors | J. Ullman |
Year | 1994 |
Citation | A corrected version of a paper that appeared in "Computers as Our Better Partners" (H. Yamada, Y. Kambayashi, and S. Ohta, eds.) pp. 216--225, World Scientific, Singapore, 1994. |
Keywords | |
Abstract | Deductive database systems -- that is, database systems with a query language based on logical rules -- must allow negated subgoals in rules to express an adequate range of queries. Adherence to classical deductive logic rarely offers the intuitively correct meaning of the rules. Thus, a variety of approaches to defining the "right" meaning of such rules have been developed. In this paper we survey the principal approaches, including stratified negation, well-founded negation, stable-model semantics, and modularly stratified semantics. 1. Motivation Database query languages based on logic in some form are a promising trend in the development of modern database systems. Building on SQL, a language whose logical nature is hidden by the syntax, logical query languages preserve the declarative "say what you want, not how to get it" nature of SQL, while extending the expressiveness of the language beyond what SQL provides. See Ramakrishnan and Ullman [1993] for a survey of languages and sy |
URL | http://www-db.stanford.edu/pub/papers/integration-using-views.ps |
---|---|
Title | Information Integration Using Logical Views |
Authors | J. Ullman |
Year | 1996 |
Citation | Invited paper for 1997 ICDT |
Keywords | |
Abstract | A number of ideas concerning information-integration tools can be thought of as constructing answers to queries using views that represent the capabilities of information sources. We review the formal basis of these techniques, which are closely related to containment algorithms for conjunctive queries and/or Datalog programs. Then we compare the approaches taken by AT&T Labs' "Information Manifold" and the Stanford "Tsimmis" project in these terms. 1 Theoretical Background Before addressing information-integration issues, let us review some of the basic ideas concerning conjunctive queries, Datalog programs, and their containment. To begin, we use the logical rule notation from [Ull88]. Example |
URL | http://www-db.stanford.edu/pub/papers/aaai.ps |
---|---|
Title | The Database Approach to Knowledge Representation |
Authors | J. Ullman |
Year | 1996 |
Citation | To appear in AAAI Proceedings, 1996 |
Keywords | |
Abstract | The database theory community, centered around the PODS (Principles of Database Systems) conference, has had a long-term interest in logic as a way to represent "data," "information," and "knowledge" (take your pick on the term -- it boils down to facts or atoms and rules, usually Horn clauses). The approach of this community has been "slow and steady," preferring to build up carefully from simple special cases to more general ideas, always paying attention to how efficiently we can process queries and perform other operations on the facts and rules. A powerful theory has developed, and it is beginning to have some impact on applications, especially information-integration engines. Datalog The term Datalog has been coined to refer to Prolog-like rules without function symbols, treated as a logic program. Unlike Prolog, however, the conventional least fixed-point semantics of the rules is used whenever possible. Example 1: The rules for ancestors can be written as the following Datalog program. anc(X |
URL | http://www-db.stanford.edu/pub/papers/techtransfer.ps |
---|---|
Title | An Analysis of Factors Directing the Admission Process of Artificial Intelligence Technologies |
Authors | V. Vassalos , S. Venkatasubramanian |
Year | 1995 |
Citation | 8th International Symposium on Artificial Intelligence, Monterrey, Mexico, Oct 1995. |
Keywords | AI, industry, technology transfer |
Abstract |
URL | http://www-db.stanford.edu/pub/papers/query-cap.ps |
---|---|
Title | Describing and Using Query Capabilities of Heterogeneous Sources |
Authors | V. Vassalos , Y. Papakonstantinou |
Year | 1997 |
Citation | Technical Report |
Keywords | Information Integration, Heterogeneous Databases, Mediators |
Abstract | Information integration systems have to cope with the different and limited query interfaces of the underlying information sources. First, the integration systems need descriptions of the query capabilities of each source, i.e., the set of queries supported by each source. Second, the integration systems need algorithms for deciding how a query can be answered given the capabilities of the sources. Third, they need to translate a query into the format that the source understands. We present two languages suitable for descriptions of query capabilities of sources and compare their expressive power. We also describe algorithms for deciding whether a query "matches" the description and show their application to the problem of translating user queries into source-specific queries and commands. Finally, we propose new improved algorithms for the problem of answering queries using these descriptions. |
URL | http://www-db.stanford.edu/pub/papers/rule-language.ps |
---|---|
Title | Set-Oriented Production Rules in Relational Database Systems |
Authors | J. Widom , S. Finkelstein |
Year | 1990 |
Citation | Appeared in SIGMOD '90; published while at IBM Almaden |
Keywords | active database, Starburst |
Abstract | We propose incorporating a production rules facility into a relational database system. Such a facility allows definition of database operations that are automatically executed whenever certain conditions are met. In keeping with the set-oriented approach of relational data manipulation languages, our production rules are also set-oriented--they are triggered by sets of changes to the database and may perform sets of changes. The condition and action parts of our production rules may refer to the current state of the database as well as to the sets of changes triggering the rules. We define a syntax for production rule definition as an extension to SQL. A model of system behavior is used to give an exact semantics for production rule execution, taking into account externally-generated operations, self-triggering rules, and simultaneous triggering of multiple rules. |
URL | http://www-db.stanford.edu/pub/papers/srs-implementation.ps |
---|---|
Title | Implementing Set-Oriented Production Rules as an Extension to Starburst |
Authors | J. Widom , R. Cochrane , B. Lindsay |
Year | 1991 |
Citation | Appeared in VLDB '91; published while at IBM Almaden; one cut-and-paste figure missing from ps file |
Keywords | active database, implementation |
Abstract | This paper describes the implementation of a set-oriented database production rule language proposed in earlier papers. Our implementation uses the extensibility features of the Starburst database system, and rule execution is fully integrated into database query and transaction processing. |
URL | http://www-db.stanford.edu/pub/papers/srs-short.ps |
---|---|
Title | The Starburst Rule System: Language Design, Implementation, and Applications |
Authors | J. Widom |
Year | 1992 |
Citation | Appeared in IEEE Data Engineering Bulletin, Dec. 1992; published while at IBM Almaden; short overview, more detailed coverage of similar material appears in starburst-rule-system.ps |
Keywords | active database, overview |
Abstract | This short paper provides an overview of the Starburst Rule System, a production rules facility integrated into the Starburst extensible database system. The rule language is based on arbitrary database state transitions rather than tuple- or statement-level changes, yielding a clear and flexible execution semantics. The rule system was implemented rapidly using the extensibility features of Starburst; it is integrated into all aspects of query and transaction processing, including concurrency control, authorization, recovery, etc. Using the Starburst Rule System, we have developed a number of methods for automatically generating database rule applications, including integrity constraints, materialized views, deductive rules, and semantic heterogeneity. |
URL | http://www-db.stanford.edu/pub/papers/denotational.ps |
---|---|
Title | A Denotational Semantics for the Starburst Production Rule Language |
Authors | J. Widom |
Year | 1992 |
Citation | Appeared in SIGMOD Record, Sep. 1992; published while at IBM Almaden |
Keywords | active database, formal semantics |
Abstract | Researchers often complain that the behavior of database production rules is difficult to reason about and understand, due in part to the lack of formal declarative semantics. It has even been claimed that database production rule languages inherently cannot be given declarative semantics, in contrast to, e.g., deductive database rule languages. In this short paper we dispute this claim by giving a denotational semantics for the Starburst database production rule language. |
URL | http://www-db.stanford.edu/pub/papers/spectrum.ps |
---|---|
Title | Deductive and Active Databases: Two Paradigms or Ends of a Spectrum? |
Authors | J. Widom |
Year | 1993 |
Citation | Appeared in RIDS '93; published while at IBM Almaden |
Keywords | survey |
Abstract | This position paper considers several existing relational database rule languages with a focus on exploring the fundamental differences between deductive and active databases. We find that deductive and active databases do not form two discernible classes, but rather they delineate two ends of a spectrum of database rule languages. We claim that this spectrum corresponds to a notion of abstraction level, with deductive rule languages at a higher level and active rule languages at a lower level. |
URL | http://www-db.stanford.edu/pub/papers/constraint-validity.ps |
---|---|
Title | Validating Constraints with Partial Information: Research Overview |
Authors | J. Widom , A. Gupta , Y. Sagiv , J. Ullman |
Year | 1994 |
Citation | Appeared in DAISD '94; invited paper, similar material appears in PODS '94 paper by same authors |
Keywords | integrity |
Abstract | We are interested in the problem of validating the consistency of integrity constraints when data is modified. In particular, we consider how constraints can be checked with only "partial information". Partial information may include: (1) the constraint specifications only, (2) the constraint specifications and the modified data, or (3) the constraint specifications, the modified data, and portions of the existing data. Methods for constraint checking with partial information can be much more efficient than traditional constraint checking methods (e.g. because work is done at compile time, or because less data is accessed). Partial information methods also enable constraint checking in cases where traditional constraint checking methods fail (e.g. in distributed environments where not all data is available). We explain how existing methods and results for query containment and for independence can be applied to problems (1) and (2) above, and we give an overview of our research into problem |
URL | http://www-db.stanford.edu/pub/papers/warehouse-research.ps |
---|---|
Title | Research Problems in Data Warehousing |
Authors | J. Widom |
Year | 1995 |
Citation | Appeared in CIKM '95 (invited paper) |
Keywords | research issues, overview, survey |
Abstract | The topic of data warehousing encompasses architectures, algorithms, and tools for bringing together selected data from multiple databases or other information sources into a single repository, called a data warehouse, suitable for direct querying or analysis. In recent years data warehousing has become a prominent buzzword in the database industry, but attention from the database research community has been limited. In this paper we motivate the concept of a data warehouse, we outline a general data warehousing architecture, and we propose a number of technical issues arising from the architecture that we believe are suitable topics for exploratory research. |
URL | http://www-db.stanford.edu/pub/papers/jw-position.ps |
---|---|
Title | Integrating Heterogeneous Databases: Lazy or Eager? |
Authors | J. Widom |
Year | 1996 |
Citation | ACM Computing Surveys 28A(4), December 1996, invited position paper |
Keywords | overview |
Abstract | Providing integrated access to multiple, distributed, heterogeneous, autonomous databases and other information sources is a topic that has been studied in the database research community for well over a decade. There has been a surge of work in the area recently, due primarily to increased demand from customers ("real" customers as well as funding agencies). Nevertheless, despite the longevity of the subfield and the current large population of researchers working in the area, no winning solution or even consensus of approach has emerged. In the research community, most approaches to solving the data integration problem are based very roughly on the following two-step process: 1. Accept a query, determine the appropriate set of information sources to answer the query, and generate the appropriate subqueries or commands for each informati |
URL | http://www-db.stanford.edu/pub/papers/starburst-rule-system.ps |
---|---|
Title | The Starburst Active Database Rule System |
Authors | J. Widom |
Year | 1996 |
Citation | To appear in IEEE TKDE, 1996; one cut-and-paste figure missing from ps file |
Keywords | overview |
Abstract | This paper describes our development of the Starburst Rule System, an active-database rules facility integrated into the Starburst extensible relational database system at the IBM Almaden Research Center. The Starburst rule language is based on arbitrary database state transitions rather than tuple- or statement-level changes, yielding a clear and flexible execution semantics. The rule system has been implemented completely. Its rapid implementation was facilitated by the extensibility features of Starburst, and rule management and rule processing are integrated into all aspects of database processing. Index terms: active database systems, database production rules, extensible database systems, expert database systems |
URL | http://www-db.stanford.edu/pub/papers/fox.ps |
---|---|
Title | A Moose and a Fox Can Aid Scientists with Data Management Problems |
Authors | J. Wiener , Y. Ioannidis |
Year | 1993 |
Citation | Extended version of paper in DBPL 1993. Also UW-Madison Tech Report CS-TR-1182, 1993. |
Keywords | |
Abstract | Fox (Finding Objects of eXperiments) is the declarative query language for Moose (Modeling Objects Of Scientific Experiments), an object-oriented data model at the core of a scientific experiment management system (EMS) being developed at Wisconsin. The goal of the EMS is to support scientists in managing their experimental studies and the data that are generated from them. Moose is unique among object-oriented data models in permitting sets to have relationships to classes other than their elements' class, in providing a construct for indexing collections by other collections, such as time series, and in distinguishing structural relationships from non-structural ones. Fox contains several new features necessary to manage experiments, such as support for associative element retrieval from (indexed) sets and highly expressive path expressions. Fox path expressions can traverse any relationship in the schema graph, including inheritance relationships, and in either direction of the relation |
URL | http://www-db.stanford.edu/pub/papers/loading.ps |
---|---|
Title | Bulk Loading into an OODB: A Performance Study |
Authors | J. Wiener , J. Naughton |
Year | 1994 |
Citation | Appeared in VLDB 1994. |
Keywords | |
Abstract | Object-oriented database (OODB) users bring with them large quantities of legacy data (megabytes and even gigabytes). In addition, scientific OODB users continually generate new data. All this data must be loaded into the OODB. Every relational database system has a load utility, but most OODBs do not. The process of loading data into an OODB is complicated by inter-object references, or relationships, in the data. These relationships are expressed in the OODB as object identifiers, which are not known at the time the load data is generated; they may contain cycles; and there may be implicit system-maintained inverse relationships that must also be stored. We introduce seven algorithms for loading data into an OODB that examine different techniques for dealing with circular and inverse relationships. We present a performance study based on both an analytic model and an implementation of all seven algorithms on top of the Shore object repository. Our study demonstrates that it is important t |
URL | http://www-db.stanford.edu/pub/papers/whips-arch.ps |
---|---|
Title | A System Prototype for Warehouse View Maintenance |
Authors | J. Wiener , H. Gupta , W. Labio , Y. Zhuge , H. Garcia-Molina , J. Widom |
Year | 1995 |
Citation | Appeared in the Post-Sigmod Workshop on Materialized Views, June, 1996. |
Keywords | |
Abstract | A data warehouse collects and integrates data from multiple, autonomous, heterogeneous sources. The warehouse effectively maintains one or more materialized views over the source data. In this paper we describe the architecture of the Whips prototype system, which collects, transforms, and integrates data for the warehouse. We show how the required functionality can be divided among cooperating distributed CORBA objects, providing both scalability and the flexibility needed for supporting different application needs and heterogeneous sources. The Whips prototype is a functioning system implemented at Stanford University and we provide preliminary performance results. |
URL | http://www-db.stanford.edu/pub/papers/thesis.ps |
---|---|
Title | Algorithms for Loading Object Databases |
Authors | J. Wiener |
Year | 1995 |
Citation | PhD thesis from University of Wisconsin-Madison, July, 1995. Also UW-Madison Tech Report CS-TR-1278, 1995. |
Keywords | |
Abstract |
URL | http://www-db.stanford.edu/pub/papers/partitioned-list.ps |
---|---|
Title | OODB Loading Revisited: The Partitioned-List Approach |
Authors | J. Wiener , J. Naughton |
Year | 1995 |
Citation | Appeared in VLDB 1995 |
Keywords | |
Abstract | Object-oriented and object-relational databases (OODB) need to be able to load the vast quantities of data that OODB users bring to them. Loading OODB data is significantly more complicated than loading relational data due to the presence of relationships, or references, in the data; the presence of these relationships means that naive loading algorithms are slow to the point of being unusable. In our previous work, we presented the late-invsort algorithm, which performed significantly better than naive algorithms on all the data sets we tested. Unfortunately, further experimentation with the late-invsort algorithm revealed that for large data sets (ones in which a critical data structure of the load algorithm does not fit in memory), the performance of late-invsort rapidly degrades to where it, too, is unusable. In this paper we propose a new algorithm, the partitioned-list algorithm, whose performance almost matches that of late-invsort for smaller data sets but does not degrade for large d |
URL | http://www-db.stanford.edu/pub/papers/incremental.ps |
---|---|
Title | Incremental Loading of Object Databases |
Authors | J. Wiener , J. Naughton |
Year | 1996 |
Citation | Unpublished manuscript, 1996. |
Keywords | |
Abstract | Object-oriented and object-relational databases (OODB) need to be able to load the vast quantities of data that OODB users bring to them. Loading OODB data is significantly more complicated than loading relational data due to the presence of relationships, or references, in the data. In our previous work, we presented algorithms for loading new objects that only share relationships with other new objects. However, it is frequently the case that new objects need to share relationships with objects already in the database. In this paper we propose using queries within the load data file to identify the existing objects and suggest using parameterized functions to designate similar queries. We then propose a novel evaluation strategy for the queries that defers evaluation until all the queries can be evaluated together. All of the instantiations of a single query function can then be treated as a join between the parameters and the collection over which the function ranges, rather than evalu |
URL | http://www-db.stanford.edu/pub/papers/sdi-bool-model.ps |
---|---|
Title | Index Structures for Selective Dissemination of Information Under the Boolean Model |
Authors | T. Yan , H. Garcia-Molina |
Year | 1993 |
Citation | submitted to TODS (short version of STAN-CS-92-1454) |
Keywords | |
Abstract | The number, size, and user population of bibliographic and full text document databases are rapidly growing. With a high document arrival rate, it becomes essential for users of such databases to have access to the very latest documents; yet the high document arrival rate also makes it difficult for the users to keep themselves updated. It is desirable to allow users to subscribe profiles, i.e., queries that are constantly evaluated, so that they will be automatically informed of new additions that may be of interest. Such a service is traditionally called Selective Dissemination of Information (SDI). The high document arrival rate, the huge number of users, and the timeliness requirement of the service pose a challenge in achieving efficient SDI. In this paper, we propose several index structures for indexing profiles and algorithms that efficiently match documents against a large number of profiles. We also present analysis and simulation results to compare their performance under different scen |
URL | http://www-db.stanford.edu/pub/papers/sdi-vector-model-tr.ps |
---|---|
Title | Index Structures for Information Filtering Under the Vector Space Model |
Authors | T. Yan , H. Garcia-Molina |
Year | 1993 |
Citation | STAN-CS-93-1494 |
Keywords | |
Abstract | With the ever increasing volumes of information generation, users of information systems are facing an information overload. It is desirable to support information filtering as a complement to traditional retrieval mechanisms. The number of users, and thus profiles (representing users' long-term interests) handled by an information filtering system is potentially huge, and the system has to process a constant stream of incoming information in a timely fashion. The efficiency of the filtering process is thus an important issue. In this paper, we study what data structures and algorithms can be used to efficiently perform large-scale information filtering under the vector space model, a retrieval model established as being effective. We apply the idea of the standard inverted index to index user profiles. We devise an alternative to the standard inverted index, in which we, instead of indexing every term in a profile, select only the significant ones to index. We evaluate their performance and show |
URL | http://www-db.stanford.edu/pub/papers/sdi-vector-model.ps |
---|---|
Title | Index Structures for Information Filtering Under the Vector Space Model
Authors | T. Yan , H. Garcia-Molina |
Year | 1993 |
Citation | ICDE '94 (short version of STAN-CS-93-1494) |
Keywords | |
Abstract | With the ever increasing volumes of electronic information generation, users of information systems are facing an information overload. It is desirable to support information filtering as a complement to traditional retrieval mechanisms. The number of users, and thus profiles (representing users' long-term interests) handled by an information filtering system is potentially huge, and the system has to process a constant stream of incoming information in a timely fashion. The efficiency of the filtering process is thus an important issue. In this paper, we study what data structures and algorithms can be used to efficiently perform large-scale information filtering under the vector space model, a retrieval model established as being effective. We apply the idea of the standard inverted index to index user profiles. We devise an alternative to the standard inverted index, in which we, instead of indexing every term in a profile, select the significant ones to index. We evaluate their performance |
URL | http://www-db.stanford.edu/pub/papers/sift.ps |
---|---|
Title | SIFT -- A Tool for Wide-Area Information Dissemination |
Authors | T. Yan |
Year | 1994 |
Citation | USENIX'95 |
Keywords | |
Abstract | The dissemination model is becoming increasingly important in wide-area information systems. In this model, the user subscribes to an information dissemination service by submitting profiles that describe his interests. He then passively receives new, filtered information. The Stanford Information Filtering Tool (SIFT) is a tool to help provide such a service. It supports full-text filtering using well-known information retrieval models. The SIFT filtering engine implements novel indexing techniques, capable of processing large volumes of information against a large number of profiles. It runs on several major Unix platforms and is freely available to the public. In this paper we present SIFT's approach to user interest modeling and user-server communication. We demonstrate the processing capability of SIFT by describing a running server that disseminates USENET News. We present an empirical study of SIFT's performance, examining its main memory requirement and ability to scale with info |
URL | http://www-db.stanford.edu/pub/papers/openodb-tm.ps |
---|---|
Title | Integrating a Structured-Text Retrieval System with an Object- Oriented Database System |
Authors | T. Yan |
Year | 1994 |
Citation | VLDB'94 Proceedings |
Keywords | |
Abstract | We describe the integration of a structured-text retrieval system (TextMachine) into an object-oriented database system (OpenODB). Our approach is a light-weight one, using the external function capability of the database system to encapsulate the text retrieval system as an external information source. Yet, we are able to provide a tight integration in the query language and processing; the user can access the text retrieval system using a standard database query language. The efficient and effective retrieval of structured text performed by the text retrieval system is seamlessly combined with the rich modeling and general-purpose querying capabilities of the database system, resulting in an integrated system with querying power beyond those of the underlying systems. The integrated system also provides uniform access to textual data in the text retrieval system and structured data in the database system, thereby achieving information fusion. We discuss the design and implementation of our p |
URL | http://www-db.stanford.edu/pub/papers/dsdi.ps |
---|---|
Title | Distributed Selective Dissemination of Information |
Authors | T. Yan |
Year | 1994 |
Citation | PDIS'94 |
Keywords | |
Abstract | To help users cope with information overload, Selective Dissemination of Information (SDI) will increasingly become an important tool in wide area information systems. In an SDI service, users post their long term queries, called profiles, at some SDI servers and continuously receive new, filtered documents. To scale up with the volume of information and the size of user population, we need a distributed SDI service with multiple servers. In this paper we first address the key problem of how to replicate and distribute profiles and documents among SDI servers. We draw a parallel between distributed SDI and the well-studied replica control problem, adapt quorum-based protocols for use in distributed SDI, and compare the performances of the different protocols. Next we address another important problem, that of efficient document delivery mechanisms. We present and evaluate a practical scheme, called profile grouping, which exploits the geographical locality of users to cut down network |
URL | http://www-db.stanford.edu/pub/papers/textjoin.ps |
---|---|
Title | Join Queries with External Text Sources: Execution and Optimization Techniques |
Authors | T. Yan |
Year | 1995 |
Citation | SIGMOD'95 |
Keywords | |
Abstract | Text is a pervasive information type, and many applications require querying over text sources in addition to structured data. This paper studies the problem of query processing in a system that loosely integrates an extensible database system and a text retrieval system. We focus on a class of conjunctive queries that include joins between text and structured data, in addition to selections over these two types of data. We adapt techniques from distributed query processing and introduce a novel class of join methods based on probing that is especially useful for joins with text systems, and we present a cost model for the various alternative query processing methods. Experimental results confirm the utility of these methods. The space of query plans is extended due to the additional techniques, and we describe an optimization algorithm for searching this extended space. The techniques we describe in this paper are applicable to other types of external data managers loosely integrated wi
URL | http://www-db.stanford.edu/pub/papers/infoclient.ps |
---|---|
Title | A Powerful Wide-Area Information Client |
Authors | T. Yan |
Year | 1995 |
Citation | Also appeared in COMPCON'95 |
Keywords | |
Abstract | Wide-area information sharing is rapidly gaining momentum. However, current information systems provide inadequate functionality for users to efficiently and effectively search and use the information made available. Users are faced with a vast information space apparently in disarray. Assuming the current client-server information system architecture, we argue in this paper that more functionality is needed on the information client side. We describe an information client prototype that supports (1) the use of object technology to model the information space, (2) declarative querying as a complement to navigational search, (3) uniform access and integration of data from diverse sources, and (4) user-specified and transparent caching that is integrated with permanent local storage of remote data. We also identify the needed support from the server side to better achieve these capabilities.
URL | http://www-db.stanford.edu/pub/papers/dup.ps |
---|---|
Title | Duplicate Detection in Information Dissemination |
Authors | T. Yan |
Year | 1995 |
Citation | Submitted for publication |
Keywords | |
Abstract | Our experience with the SIFT [YGM95] information dissemination system (in use by over 7,000 users daily) has identified an important and generic dissemination problem: duplicate information. In this paper we explain why duplicates arise, we quantify the problem, and we discuss why it impairs information dissemination. We then propose a Duplicate Removal Module (DRM) for an information dissemination system. The removal of duplicates operates on a per user, per document basis – each document read by a user generates a request, or a duplicate restraint. In wide-area environments, the number of restraints handled is very large. We consider the implementation of a DRM, examining alternative algorithms and data structures that may be used. We present a performance evaluation of the alternatives and answer important design questions such as: Which implementation is the best? With the "best" scheme, how expensive will duplicate removal be? How much memory is required? How fast can restraints be p
URL | http://www-db.stanford.edu/pub/papers/yw-tempview.ps |
---|---|
Title | Maintaining Temporal Views Over Non-Historical Information Sources For Data Warehousing |
Authors | J. Yang , J. Widom |
Year | 1997 |
Citation | Submitted for conference publication |
Keywords | |
Abstract | An important use of data warehousing is to provide temporal views over the history of source data that may itself be non-temporal. While recent work in view maintenance is applicable to data warehousing, only non-temporal views have been considered. In this paper, we introduce a framework for maintaining temporal views over non-historical information sources in a data warehousing environment. We describe an architecture for the temporal data warehouse that automatically maintains temporal views over non-temporal base relations, and allows users to ask temporal queries using these views. Because of the dimension of time, a materialized temporal view may need to be updated not only when base relations change, but also as time advances. We present incremental techniques to maintain temporal views for both cases, and outline an implementation of our approach in the WHIPS warehousing prototype at Stanford. |
URL | http://www-db.stanford.edu/pub/papers/anomaly-short.ps |
---|---|
Title | View Maintenance in a Warehousing Environment |
Authors | Y. Zhuge , H. Garcia-Molina , J. Hammer , J. Widom |
Year | 1995 |
Citation | To appear in Proceedings of the ACM SIGMOD International Conference on Management of Data, San Jose, California, June 1995. |
Keywords | |
Abstract | A warehouse is a repository of integrated information drawn from remote data sources. Since a warehouse effectively implements materialized views, we must maintain the views as the data sources are updated. This view maintenance problem differs from the traditional one in that the view definition and the base data are now decoupled. We show that this decoupling can result in anomalies if traditional algorithms are applied. We introduce a new algorithm, ECA (for "Eager Compensating Algorithm"), that eliminates the anomalies. ECA is based on previous incremental view maintenance algorithms, but extra "compensating" queries are used to eliminate anomalies. We also introduce two streamlined versions of ECA for special cases of views and updates, and we present an initial performance study that compares ECA to a view recomputation algorithm in terms of messages transmitted, data transferred, and I/O costs.
URL | http://www-db.stanford.edu/pub/papers/anomaly-full.ps |
---|---|
Title | View Maintenance in a Warehousing Environment |
Authors | Y. Zhuge , H. Garcia-Molina , J. Hammer , J. Widom |
Year | 1995 |
Citation | Technical Report, long version of the SIGMOD95 paper |
Keywords | |
Abstract | A warehouse is a repository of integrated information drawn from remote data sources. Since a warehouse effectively implements materialized views, we must maintain the views as the data sources are updated. This view maintenance problem differs from the traditional one in that the view definition and the base data are now decoupled. We show that this decoupling can result in anomalies if traditional algorithms are applied. We introduce a new algorithm, ECA (for "Eager Compensating Algorithm"), that eliminates the anomalies. ECA is based on previous incremental view maintenance algorithms, but extra "compensating" queries are used to eliminate anomalies. We also introduce two streamlined versions of ECA for special cases of views and updates, and we present an initial performance study that compares ECA to a view recomputation algorithm in terms of messages transmitted, data transferred, and I/O costs.
URL | http://www-db.stanford.edu/pub/papers/strobe-full.ps |
---|---|
Title | The Strobe Algorithms for Multi-Source Warehouse Consistency |
Authors | Y. Zhuge , H. Garcia-Molina , J. Wiener |
Year | 1996 |
Citation | Technical Report, long version of the Strobe paper |
Keywords | |
Abstract | A warehouse is a data repository containing integrated information for efficient querying and analysis. Maintaining the consistency of warehouse data is challenging, especially if the data sources are autonomous and views of the data at the warehouse span multiple sources. Transactions containing multiple updates at one or more sources, e.g., batch updates, complicate the consistency problem. In this paper we identify and discuss three fundamental transaction processing scenarios for data warehousing. We define four levels of consistency for warehouse data and present a new family of algorithms, the Strobe family, that maintain consistency as the warehouse is updated, under the various warehousing scenarios. All of the algorithms are incremental and can handle a continuous and overlapping stream of updates from the sources. Our implementation shows that the algorithms are practical and realistic choices for a wide variety of update scenarios.
URL | http://www-db.stanford.edu/pub/papers/strobe.ps |
---|---|
Title | The Strobe Algorithms for Multi-Source Warehouse Consistency |
Authors | Y. Zhuge , H. Garcia-Molina , J. Wiener |
Year | 1996 |
Citation | PDIS 96 |
Keywords | |
Abstract | A warehouse is a data repository containing integrated information for efficient querying and analysis. Maintaining the consistency of warehouse data is challenging, especially if the data sources are autonomous and views of the data at the warehouse span multiple sources. Transactions containing multiple updates at one or more sources, e.g., batch updates, complicate the consistency problem. In this paper we identify and discuss three fundamental transaction processing scenarios for data warehousing. We define four levels of consistency for warehouse data and present a new family of algorithms, the Strobe family, that maintain consistency as the warehouse is updated, under the various warehousing scenarios. All of the algorithms are incremental and can handle a continuous and overlapping stream of updates from the sources. Our implementation shows that the algorithms are practical and realistic choices for a wide variety of update scenarios.
URL | http://www-db.stanford.edu/pub/papers/mvc.ps |
---|---|
Title | Multiple View Consistency for Data Warehousing |
Authors | Y. Zhuge , J. Wiener , H. Garcia-Molina |
Year | 1996 |
Citation | Technical report |
Keywords | |
Abstract | A data warehouse stores integrated information from multiple distributed data sources. In effect, the warehouse stores materialized views over the source data. The problem of ensuring data consistency at the warehouse can be divided into two components: ensuring that each view reflects a consistent state of the base data, and ensuring that multiple views are mutually consistent. In this paper we study the latter problem, that of guaranteeing multiple view consistency (MVC). We identify and define formally three layers of consistency for materialized views in a distributed environment. We present a scalable architecture for consistently handling multiple views in a data warehouse, which we have implemented in the WHIPS (WareHousing Information Project at Stanford) prototype. Finally, we develop simple, scalable algorithms for achieving MVC at a warehouse.
URL | http://www-db.stanford.edu/pub/papers/dapd.ps |
---|---|
Title | Consistency Algorithms for Multi-Source Warehouse View Maintenance |
Authors | Y. Zhuge , H. Garcia-Molina , J. Wiener |
Year | 1997 |
Citation | Submitted to Distributed and Parallel Databases Journal (DAPD), Kluwer |
Keywords | data warehouse, data consistency, view maintenance |
Abstract | A warehouse is a data repository containing integrated information for efficient querying and analysis. Maintaining the consistency of warehouse data is challenging, especially if the data sources are autonomous and views of the data at the warehouse span multiple sources. Transactions containing multiple updates at one or more sources, e.g., batch updates, complicate the consistency problem. In this paper we identify and discuss three fundamental transaction processing scenarios for data warehousing. We define four levels of consistency for warehouse data and present a new family of algorithms, the Strobe family, that maintain consistency as the warehouse is updated, under the various warehousing scenarios. All of the algorithms are incremental and can handle a continuous and overlapping stream of updates from the sources. Our implementation shows that the algorithms are practical and realistic choices for a wide variety of update scenarios.
URL | http://www-db.stanford.edu/pub/papers/lagii.ps |
---|---|
Title | Database Research: Achievements and Opportunities into the 21st Century
Authors | A. Silberschatz , M. Stonebraker , J. D. Ullman (editors)
Year | 1995 |
Citation | A report of an NSF workshop on the future of database research held May, 1995. Appears in March 1996 Sigmod Record. |
Keywords | |
Abstract | Database Research: Achievements and Opportunities Into the 21st Century. Avi Silberschatz, Mike Stonebraker, Jeff Ullman, editors. Report of an NSF Workshop on the Future of Database Systems Research, May 26–27, 1995. 1 Introduction. In February of 1990, a group of database researchers met to examine the prospects for future database research efforts. The resulting report (Silberschatz et al. [1990]) was instrumental in bringing to the public's attention both the significance of prior database research and the number of challenging and important problems that lay ahead. We shall not repeat here the major points made in that report concerning the historical development of relational database systems and transaction management. Rather, the reader is encouraged to consult either that report or an on-line document (Gray), each of which discusses these and other historical achievements of database research. In May of 1995, a second workshop was convened to consider anew the prospects for databa
URL | http://www-db.stanford.edu/pub/papers/pub/ullman/1996/kdd.ps |
---|---|
Title | Efficient Implementation of Data Cubes via Materialized Views |
Authors | |
Year | |
Citation | Appears in KDD Proceedings 1996 |
Keywords | |
Abstract |
URL | http://www-db.stanford.edu/pub/papers/SelectionViews.ps |
---|---|
Title | Selection of Views to Materialize in a Data Warehouse |
Authors | |
Year | |
Citation | To appear in ICDT '97
Keywords | Views, data warehouse, materialized views, AND-OR graphs, selection algorithms
Abstract | A data warehouse stores materialized views of data from one or more sources, with the purpose of efficiently implementing decision-support or OLAP queries. One of the most important decisions in designing a data warehouse is the selection of materialized views to be maintained at the warehouse. The goal is to select an appropriate set of views that minimizes total query response time and the cost of maintaining the selected views, given a limited amount of resources, e.g., materialization time, storage space, etc. In this article, we develop a theoretical framework for the general problem of selection of views in a data warehouse. We present competitive polynomial-time heuristics for selection of views to optimize total query response time, for some important special cases of the general data warehouse scenario, viz.: (i) an AND view graph, where each query/view has a unique evaluation, and (ii) an OR view graph, in which any view can be computed from any one of its related views, e.g., d
[Stanford University | Computer Science Dept | Database Group]
Qingshan Luo
/ qluo@cs.stanford.edu / Last updated on 9/30/96.