The extension of database systems to support multimedia applications requires new mechanisms to ensure the synchronized presentation of multiple media data streams. To present multimedia data streams to users flexibly and efficiently, media streams must be segmented into media objects, and time constraints among these objects must be specified and maintained. New management tools that effectively manage the time-related characteristics of multimedia data must thus be superimposed on existing database systems. In this paper, we discuss issues relevant to synchronized presentation management in multimedia database systems. We present principles underlying the synchronized presentation of multiple media data streams when delay effects are considered. Various synchronization mechanisms can be designed based on the proposed principles.
Multimedia data refers to the simultaneous use of data in different media forms, including images, audio, video, text, and numerical data. Many multimedia applications, such as recording and playback of motion video and audio, slide presentations, and video conferencing, require the continuous presentation of a media data stream and the synchronized display of multiple media data streams. Such synchronization requirements are generally specified by either spatial or temporal relationships among multiple data streams. For example, a motion video and its caption must be synchronized spatially at the appropriate position in a movie, and, in a slide presentation, a sequence of images and speech fragments must be temporally combined and presented to compose one unified and meaningful data stream. This research focuses only on temporal synchronization.
Current database systems are not equipped to represent the entire multimedia data flow, nor may it be desirable for them to support the plethora of retrieval types required to support multimedia data. Thus, continuous media data must be parsed into representable segments, which we term media objects. In order to re-present the original data stream to users, synchronization constraints among media objects must be specified and maintained. Such synchronization is usually termed intra-synchronization. However, if the data stream is composed of media objects from different media streams, additional complications may arise with the timing relationships that may exist among the different types of media data streams. Such media data streams may not be merged prior to storage in a database as such a merger will vastly compound the difficulties of retrieving component media. Thus, the synchronization of multiple media data streams, termed inter-synchronization, becomes an essential prerequisite to any successful multimedia database application. For these reasons, synchronization has been recognized as one of the central problems in multimedia system development [LG90,Ste90].
The literature is replete with commentary on synchronization within operating systems and network architectures [RV93,Ste90,AH91,RRK93,GR93]. However, the techniques proposed at these levels are insufficient to address the problems encountered at the presentation level in database systems. For example, Anderson et al. [AH91] describe techniques for recovering from loss of synchronization between interrupt-driven media I/O devices, and Rangan et al. [RRK93] devise techniques for inter-media synchronization during on-demand multimedia retrieval from a server to multiple destinations over integrated networks. Both research efforts are directed toward the synchronization of two types of media streams: master and slave streams. In cases of asynchrony, synchrony is restored by deleting (skipping) media units of a slower slave by the amount it lags behind the master, or by duplicating (pausing) media units of a faster slave by the amount by which it leads the master. Consequently, slower slaves catch up with the master, and faster slaves are forced to wait for it. At the presentation level, however, it may not be clear that such a master/slave relationship exists between media data streams. Furthermore, the presentation of some media streams may not allow skipping of media objects if complete preservation of content is crucial to the presentation.
This paper proposes a framework for supporting the continuous and synchronized presentation of multiple media data streams in a multimedia database system. We primarily seek to develop principles for restoring synchrony when the presentation of one or more streams is delayed.
Our framework proposes approaches for restoring synchrony in the presentation of asynchronous multiple media streams. These approaches include a buffering strategy and criteria for scheduling the skipping and pausing of media data streams during asynchronous presentations. These principles guide the design of the various algorithms needed to implement the presentation of multiple media streams. The pragmatic design of these presentation mechanisms is considered elsewhere.
A multimedia data stream consists of a set of data upon which some time constraints are defined. The time constraints may specify discrete, continuous, or step-wise constant time flow relationships among the data. For example, some multimedia streams, such as audio and video, are continuous in nature, in that they flow across time; other data streams, such as slide presentations and animation, have discrete or step-wise time constraints. The multimedia streams may not have convenient boundaries for data representation. To facilitate retrieval of such data in databases, we must break each media stream into a set of media objects. Each media object represents a minimum chunk of the media stream that bears some semantic meaning. Media objects in different media streams may have different internal structures. For example, a continuous video stream can be segmented into a set of media objects, each of which contains a set of video frames with specific semantic meaning. Similarly, a continuous audio stream can be segmented into a set of media objects, each of which contains a set of audio samples with specific semantic meaning. Without loss of generality, we assume that basic data elements delivered from the transportation layer are media objects.
Media objects from different data streams may need to be linked through time constraints to specify their synchronization. For example, in slide presentation applications, an audio object must be played along with a slide object. We define a multimedia unit to be the composition of a set of media objects $u_i = \{o_i^1, o_i^2, \ldots, o_i^n\}$, where $o_i^j$ represents the $i$th media object of the $j$th media stream participating in the synchronized stream. Thus, a composite data stream made up of multiple media streams consists of a set of multimedia units, where each multimedia unit unifies media objects from multiple media streams. Such multimedia units may also be considered as composite objects. Furthermore, a collection of objects from either a single data stream or a composite data stream may be conceptually modeled as a hierarchical structure. At each increasing level, a set of media objects may be considered to be a superclass object which may then, in turn, be a component of another superclass at a higher level. Further development of such a conceptual model will not be pursued here.
We will now address the synchronization requirements that need be placed on media data streams. There are two types of constraints that need to be specified on media objects: intra- and inter-stream constraints. Intra-stream constraints specify the synchronization requirements to be placed on a single media stream and inter-stream constraints specify the synchronization requirements to be placed on more than one media stream.
Let $m$ be a single media stream consisting of a set of media objects $o_1, o_2, \ldots, o_k$. Intra-stream constraints on $m$ define time flow relationships among these objects. The intra-stream constraints may be continuous, discrete, or step-wise constant. Unlike temporal constraints defined on data in temporal databases [TCG93], media objects are not typically statically associated with independent time constraints. Owing to the sequential character of media objects in a single media stream, each media object must be associated with a relative start time and a time interval specifying the duration of its retrieval, assuming that the first media object in the media stream starts at time zero. The actual start time of a media object is usually dynamically determined. Once a media stream is invoked, it is associated with an actual start time; the start time of each media object within that stream is then fixed relative to the actual start time of the first object in the stream. We use $(o, t, \delta)$ to denote the intra-stream constraint on object $o$ stating that $o$ is presented at time $t$ and lasts for a time period $\delta$.
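The binding of relative start times to an actual stream start time can be sketched as follows; the class and function names are illustrative only, not part of any proposed system.

```python
from dataclasses import dataclass

@dataclass
class MediaObject:
    stream_id: str
    seq_no: int          # position of the object within its stream

@dataclass
class IntraConstraint:
    obj: MediaObject
    rel_start: float     # start time relative to the stream's first object
    duration: float      # duration of the object's presentation

def actual_schedule(constraints, stream_start):
    """Bind relative intra-stream constraints to actual times once the
    stream is invoked at time stream_start."""
    return [(c.obj, stream_start + c.rel_start, c.duration)
            for c in constraints]
```

For instance, three one-second objects with relative starts 0, 1, and 2 in a stream invoked at time 100 are scheduled at times 100, 101, and 102.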
Let $M$ be a composite data stream of media streams $m_1, m_2, \ldots, m_n$. $M$ can then be considered as consisting of a set of multimedia units $u_1, u_2, \ldots$, with each multimedia unit $u_i$ being a composition of media objects $o_i^1, o_i^2, \ldots, o_i^n$ from $m_1, m_2, \ldots, m_n$, respectively. Inter-stream constraints on $M$ define synchronization requirements among the participating component media streams. Time-related inter-stream constraints are defined implicitly in each media object. That is, for any two media objects $o_i^j$ and $o_i^k$ ($j \neq k$) with constraints $(o_i^j, t_i^j, \delta_i^j)$ and $(o_i^k, t_i^k, \delta_i^k)$, we require $t_i^j = t_i^k$. This necessitates that the objects have the same duration, i.e., $\delta_i^j = \delta_i^k$. However, $\delta_i^j$ is not necessarily equal to $\delta_i^k$ in the original data streams. Under such circumstances, ``silence'' is introduced dynamically in the media object of lesser duration so that the presented durations become equal.
Other inter-stream constraints are specified on media objects as inter-dependency relationships that must be satisfied before the retrieval of the media objects is invoked. As discussed in [COC94], boolean expressions over media objects provide a simple but elegant way to express these inter-dependency relationships. For example, consider a training program that consists of three media streams $m_1$, $m_2$, and $m_3$. Let $m_1$ contain media objects $a_i$, $m_2$ contain media objects $b_i$, and $m_3$ contain media objects $c_i$. Let the inter-dependency relationships specify that $a_i$, $b_i$, and $c_i$ must be played together for all $i$. We then have the inter-stream constraints $b_i \wedge c_i$, $a_i \wedge c_i$, and $a_i \wedge b_i$ defined on $a_i$, $b_i$, and $c_i$ for all $i$, respectively.
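A sketch of how such conjunctive dependencies might be represented and checked at playout time follows; the dictionary encoding below is an assumption for illustration, not the representation used in [COC94].

```python
def make_conjunctive_deps(streams, n):
    """For each of n multimedia units, make every object in the unit depend
    on its peer objects from the other participating streams (a conjunction).
    Objects are identified by (stream name, unit index) pairs."""
    deps = {}
    for i in range(n):
        unit = [(s, i) for s in streams]
        for obj in unit:
            deps[obj] = [p for p in unit if p != obj]
    return deps

def can_start(obj, ready, deps):
    """A conjunctive dependency is satisfied when every peer object in the
    same multimedia unit is ready for delivery."""
    return all(p in ready for p in deps[obj])
```

In the training-program example, the $i$th audio object may start only when the $i$th objects of the other two streams are also ready.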
The dependency expressions, along with the time-related constraints, define the synchronization points within $M$. These synchronization points enforce the simultaneous delivery of media objects, thereby providing synchrony in the delivery of data streams, each possibly associated with a specific medium.
In this section, we will investigate the principles of synchronizing media data streams. We assume that the transportation level provides sufficient support for delivering media objects on time. A framework will be developed to permit efficient buffering and the resynchronization of the presentation of multiple media streams in the event of delays.
Intra-stream synchronization maintains intra-stream constraints that are defined on a single data stream. As discussed in Section 2, each data stream must be segmented into a set of media objects in order to facilitate flexible retrieval. The re-presentation of the original data stream can be achieved by enforcing the intra-stream constraints defined on the media objects of the stream. In this context, the central issue is to provide an efficient buffering mechanism so that neither starvation nor overrun will occur.
Let $l_m$ be the time at which the loading of data stream $m$ begins and loading function $L_m(t)$ be the total number of media objects of $m$ loaded at time $t$. Let $c_m$ be the time at which the consumption of data stream $m$ begins and consuming function $C_m(t)$ be the total number of media objects consumed at time $t$. The number of media objects that must be buffered at any given time is then $L_m(t) - C_m(t)$, which we denote as

$B_m(t) = L_m(t) - C_m(t)$.   (1)
In (1), $l_m$ is fixed when the loading of data stream $m$ is initiated, and $c_m \geq l_m$. We then want to find the smallest $c_m$ such that $B_m(t) \geq 0$ in the range $[c_m, e_m]$, where $e_m$ is the time at which the loading of data stream $m$ is completed. Obviously, if $B_m(t) \geq 0$ for any $t$ in the range $[l_m, e_m]$ when consumption starts immediately, then $c_m = l_m$. Otherwise, let the minimum value of $B_m(t)$, in the range $[l_m, e_m]$, be attained at time $t_0$ with $B_m(t_0) = -k$, where $k > 0$.
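In discrete time, the search for this minimum start time can be sketched directly from the definitions; this is a brute-force illustration under assumed integer ticks, not an efficient scheduler.

```python
def min_start_time(L, C, e_m):
    """Earliest consumption start offset x (in ticks after loading begins)
    such that cumulative consumption C never overtakes cumulative loading L
    before loading completes at tick e_m. L(t) and C(t) are cumulative
    object counts t ticks after their respective starts."""
    for x in range(e_m + 1):
        # Buffer occupancy L(t) - C(t - x) must stay non-negative on [x, e_m].
        if all(L(t) - C(t - x) >= 0 for t in range(x, e_m + 1)):
            return x
    return None  # no feasible start time before loading completes
```

For example, loading one object every two ticks while consuming one object per tick forces consumption to be deferred until enough objects have accumulated in the buffer.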
Suppose that a solution is to begin consumption at time $x$; that is, the buffer occupancy is at least zero at any time in $[x, e_m]$. Let $C_m^x(t) = C_m(t - (x - c_m))$ denote the consuming function shifted to start at $x$. Comparing $C_m^x(t)$ with $C_m(t)$ in the range $[x, e_m]$, we see that

$L_m(t) - C_m^x(t) = B_m(t) + \big(C_m(t) - C_m(t - (x - c_m))\big)$.   (2)

This shows us that, in the range $[x, e_m]$, the buffer occupancy under the shifted start differs from $B_m(t)$ by $C_m(t) - C_m(t - (x - c_m))$. In (2), we know that, in the range $[x, e_m]$, $B_m(t) \geq -k$ and $C_m(t) - C_m(t - (x - c_m)) \geq 0$. Thus, $x$ must be the minimum start time such that

$C_m(t) - C_m(t - (x - c_m)) \geq k$ for all $t$ in $[x, e_m]$.   (3)
Let us now consider a situation in which the consuming function is linear [GC92]. That is,

$C_m(t) = r\,(t - c_m)$ for $t \geq c_m$,

where $r$ is the consumption rate. Following (3), in the range $[x, e_m]$, we have

$C_m(t) - C_m(t - (x - c_m)) = r\,(x - c_m) \geq k$.   (4)

Thus, $x$ should be $c_m + k/r$. As the left side of (4) is a constant once $x$ is determined, $C_m(t)$ and the shifted function $C_m^x(t)$ have the same slope. As demonstrated by Gemmell [GC92], this value of $x$ is also the intersection of the line $C_m(t) - k$ with the time axis. In general, the consuming function may not be a single linear function or even a continuous function. In such a situation, we must ensure that formula (3) always holds true. Thus, formula (3) becomes a general buffering criterion for ensuring the continuity and smoothness of the presentation of a media stream. Different consuming functions may determine different minimum start times for their media streams.
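Under the linear-consumption assumption, the minimum start time can be computed by first measuring the worst deficit $k$ of the loading function against the consumption line. The following is a simple discretized sketch with illustrative function names.

```python
def worst_deficit(L, rate, c_m, e_m):
    """k: the largest shortfall of cumulative loading L(t) below linear
    consumption at the given rate, sampled at integer ticks on [c_m, e_m]."""
    return max(0.0, max(rate * (t - c_m) - L(t)
                        for t in range(c_m, e_m + 1)))

def linear_min_start(L, rate, c_m, e_m):
    """Minimum consumption start time x = c_m + k / rate for a linear
    consuming function."""
    return c_m + worst_deficit(L, rate, c_m, e_m) / rate
```

For example, if loading delivers one object every two ticks and consumption proceeds at half an object per tick, the worst deficit is half an object and consumption must start one tick after loading begins.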
We now discuss the effect of loading delays on the buffering criterion given above. Such delays may be caused by network delays or physical storage delays. The basic buffering criterion given in (2) and (3) must be revised in the event of loading delays. Let $D_m$ be the maximum number of media objects that need to be additionally buffered because of loading delays. In the worst case, we then have

$B'_m(t) = B_m(t) - D_m$,

where $B'_m(t)$ denotes the buffer occupancy in the presence of loading delays.
Thus, in the event of loading delays, formula (3) can be revised as

$C_m(t) - C_m(t - (x - c_m)) \geq k + D_m$ for all $t$ in $[x, e_m]$.   (5)
Comparing formula (5) with (3), we see that loading delays require additional buffer space.
Inter-stream synchronization maintains inter-stream constraints that are defined on multiple data streams. As media objects from different data streams are composed into multimedia units in the multimedia database, various inter-stream constraints specified on those units must be maintained during the re-presentation of these data streams. We will now discuss the maintenance of these inter-stream constraints.
Let $M$ be a multimedia stream of multimedia units $u_1, u_2, \ldots$ composed from media streams $m_1, m_2, \ldots, m_n$. Let each multimedia unit $u_i$ consist of media objects $o_i^1, o_i^2, \ldots, o_i^n$. Each media object $o_i^j$ ($1 \leq j \leq n$) in $u_i$ has intra-stream constraint $(o_i^j, t_i^j, \delta_i^j)$. Obviously, if all media objects can be displayed at the defined times, then all intra- and inter-stream constraints can be easily maintained. However, the presentation of a media object may be delayed for many reasons, including network delays and physical storage delays. When such delays arise, simply cancelling the whole presentation with its combined multiple media streams may not be an acceptable option. We thus investigate the principles of resynchronization among media objects in the event of delays.
We define a synchronization point to be a point held in common by all participating media streams that need to be synchronized. Each point of separation between two multimedia units forms a natural synchronization point. Users may define additional synchronization points within each multimedia unit. Synchronization points defined on the composite stream specify the places at which synchronous presentation must be checked and maintained. Let $s_j$ denote the maximum time interval that media stream $m_j$ can skip and $d_j$ denote the current delay interval in the presentation of media stream $m_j$. We then have the following situations: (1) $d_j \leq s_j$ for every delayed media stream; (2) $d_j \leq s_j$ for some, but not all, delayed media streams; and (3) $d_j > s_j$ for every delayed media stream.
In case (1), synchronous presentation can be restored by simply skipping, in each delayed media stream, the amount by which it lags behind. However, this approach may be inefficient. It requires either substantial extension of buffer space to hold extra media data or pausing of undelayed media streams to wait for the delayed media streams to be read into the buffer. Let $d_{\max}$ be the maximum delay interval occurring among the delayed participating media streams. If each delayed media stream $m_j$ skips the smaller time interval of either $d_j$ or $s_j$ and, simultaneously, the undelayed media streams pause for the largest residual interval $\max_j (d_j - \min(d_j, s_j))$, the size of the buffer can then be reduced and the pausing interval is also shortened. Clearly, case (2) can follow the same strategy. In case (3), pausing of the undelayed media streams is unavoidable. However, the handling of the partial delay of each media stream within the interval $s_j$ can still be compromised between skipping and pausing based on the above approach. In practice, as actual delays are not known in advance, criteria on the maximum delay permitted on media streams must be given. A detailed discussion of this subject can be found in [ZG95].
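The skip-and-pause compromise can be sketched as follows; this is a hypothetical helper, where `delays` and `max_skips` map stream identifiers to each stream's current delay and maximum skippable interval.

```python
def resync(delays, max_skips):
    """Each delayed stream skips as much of its delay as it is allowed to;
    all streams then pause for the largest residual delay.
    Returns (skip interval per delayed stream, common pause interval)."""
    skips = {j: min(d, max_skips[j]) for j, d in delays.items()}
    pause = max((delays[j] - skips[j] for j in delays), default=0)
    return skips, pause
```

A stream whose content must be fully preserved simply declares a maximum skippable interval of zero, forcing the other streams to cover its entire delay by pausing.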
Following the above approach, the basic buffering criterion given in (2) and (3) must be revised in the event of such delays. Let $D'_m$ be the number of media objects that need to be buffered because of these delays. Formula (3) then becomes

$C_m(t) - C_m(t - (x - c_m)) \geq k + D'_m$ for all $t$ in $[x, e_m]$.
A buffering mechanism can be designed based on the above formula. Detailed discussion on a buffering mechanism for video data processing can be found in [Zha96].
A multimedia playout management functionality was developed on top of an existing object-oriented database system. This functionality is integrated with the other services provided by the database system, such as transaction management, storage management, and concurrency control. One advantage of such an architecture is that it provides adequate database support for multimedia applications demanding script-based interactive multimedia presentations [TK95]. A client-server model in which the client performs the playout management locally is an ideal candidate for implementing the playout management service (see Figure 1). As everything is handled within the same system, efficient interplay between playout management components and other database management system components is possible.
As shown in Figure 1, the multimedia transaction manager contains two main modules: a multimedia transaction language (MTL) interpreter and a media manager (MM). The MTL interpreter allows users to specify a set of transactions associated with a multimedia transaction, including intra- and inter-synchronization requirements on component transactions. A multimedia transaction specified in MTL is processed by the interpreter, and data accesses are sent to both the MM and the underlying DBMS for processing. Note that the design strategies can be applied to any OODBMS environment that supports a C++ interface. Currently existing object-oriented database systems that fit into this category include ObjectStore and ODE.
Figure 1: System model
We have performed an initial experimental analysis based on the system model given in Figure 1. We measured four parameters, including average delay, speed ratio, skew, and utilization, during the presentation of two media streams consisting of audio and images, respectively. A detailed discussion of this topic can be found in [ZG95].