baiao.ilp03.doc - Description - University of Wisconsin?Madison
Ph.D. Philosophy, 2012. M.A., Philosophy, 2009. Dissertation: ... Handbuch
Verantwortung (Springer VS publishers, 2016). ?Review of Michael Bratman,
Shared ...
Part of the document
Applying Theory Revision to the
Design of Distributed Databases
Fernanda Baião1, Marta Mattoso1, Jude Shavlik2, Gerson Zaverucha1 1 Department of Computer Science - COPPE, Federal University of Rio de
Janeiro (UFRJ)
PO Box 68511, Rio de Janeiro, RJ 21941-972 Brazil
{baiao, marta, gerson}@cos.ufrj.br
2 Computer Sciences Department, University of Wisconsin-Madison
1210 West Dayton Street, Madison, WI 53706 USA
shavlik@cs.wisc.edu Abstract. This work presents the application of theory revision to the
design of distributed databases to automatically revise a heuristic-
based algorithm (called analysis algorithm) through the use of the
FORTE system. The analysis algorithm decides the fragmentation
technique to be used in each class of the database and its Prolog
implementation is provided as the initial domain theory. Fragmentation
schemas with previously known performance, obtained from experimental
results on top of an object database benchmark, are provided as the set
of examples. We show the effectiveness of our approach in finding
better fragmentation schemas with improved performance.
1 Introduction Distributed and parallel processing on database management systems are
efficient ways of improving performance of applications that manipulate
large volumes of data. This may be accomplished by removing irrelevant data
accessed during the execution of queries and by reducing the data exchange
among sites, which are the two main goals of the design of distributed
databases [28]. However, in order to improve performance of these
applications, it is very important to design information distribution
properly.
The distribution design involves making decisions on the fragmentation and
placement of data across the sites of a computer network. The first phase
of the distribution design is the fragmentation phase, which is the focus
of this work. To fragment a class of objects, it is possible to use two
basic techniques: horizontal and vertical fragmentation [28], which may be
combined and applied in many different ways to define the final
fragmentation schema.
The class fragmentation problem in the design of a distributed database is
known to be an NP-hard problem [28]. There are a number of works in the
literature addressing the horizontal [7, 14, 31] or vertical [6, 15] class
fragmentation technique, but not both. Even when the designer decides to
use a horizontal fragmentation algorithm to one class and a vertical
fragmentation algorithm to another class, he is left with no assistance to
make this decision. Our previous work proposed a set of heuristics to drive
the choice of the fragmentation technique to be applied in each class of
the database schema. Those heuristics were implemented in an algorithm
called "analysis algorithm" [2], and were incorporated in a methodology
that includes the analysis algorithm, horizontal and vertical class
fragmentation algorithms adapted from the literature. Experimental results
reported in [3, 4] show applications that were executed 3.4 times faster
when applying the fragmentation schema resulted from our methodology,
compared to other alternative fragmentation schemas proposed by other works
in the literature.
Experimental results from real applications can continuously provide
heuristics for the design of distributed object databases (DDODB) that may
be incorporated in our analysis algorithm. Indeed, we have tried to
manually improve the analysis algorithm using experimental results from
[23, 25], which required a detailed analysis of each result and manual
modifications on the analysis algorithm. However, the formalization of new
heuristics from these experiments and their incorporation in the analysis
algorithm, while maintaining previous heuristics consistent, proved to be
an increasingly difficult task.
This work proposes the use of Theory REvisioN on the Design of Distributed
Databases (TREND3), showing how it automatically improves our analysis
algorithm through the use of the FORTE system [29]. TREND3 is a module of a
framework that handles the class fragmentation problem of the design of
distributed databases, defined in [5].
There are approaches in the literature addressing the DDODB problem [4, 6,
7, 13, 14, 15, 16, 20, 24, 31]. However, none of them addresses the problem
of choosing the most adequate fragmentation technique to be applied to each
class of the database schema. Some works have been applying machine
learning techniques to solve database problems. For example, [8, 9] present
an approach for the inductive design of deductive databases, based on the
database instances to define some intentional predicates. Also, relational
bayesian networks were used to estimate query selectivity in a query
processor [19] and to predict the structure of relational databases [18].
However, considering the design of distributed databases as an application
for theory revision is a novel approach.
The paper is organized as follows: in section 2, the design of distributed
databases is defined and our framework for the design of distributed
databases is described. Theory revision is briefly reviewed in section 3,
while in section 4 we show how to improve a DDODB analysis algorithm
through the use of the FORTE system. Experimental results on top of the OO7
benchmark [12] are presented in section 5. Finally, section 6 presents some
conclusions and future work. 2 A Framework for the Design of Distributed Databases This section defines the problem of designing a distributed database,
focusing on the object-oriented model, and presents a framework we propose
for the class fragmentation phase of the distribution design. 2.1 The Design of Distributed Databases The distribution design of a database makes decisions on the fragmentation
and placement of data across the sites of a computer network. The first
phase of the distribution design is the fragmentation phase, which is the
process of isolating into fragments specific data accessed by the most
relevant applications that run over the database.
In an object-oriented database, data is represented as objects. The set of
objects sharing the same structure and behavior define a class, and classes
may be related to each other through relationships. A database schema
describes the set of classes and relationships. The UML diagram
representing the database schema of the OO7 benchmark [12] is illustrated
in figure 1. The OO7 benchmark is a generic application on top of a
database of design objects assembled through the composition of parts. We
may notice, for example, that each composite part is related to N atomic
parts (through the "parts" relationship), and that each atomic part "is
part of" one composite part. [pic]
Fig. 1. The OO7 benchmark database schema Given the schema of the database to be distributed, as in any distribution
design methodology, we need to capture the set of operations over the
database and quantitative information in order to define a fragmentation
schema to be applied on the database schema, which is the goal of the
fragmentation phase.
The operations are captured by decomposing the application running over
the database, and are classified into selection, projection or navigation
operations according to the definitions from [2]. Quantitative information
needed includes the cardinality of each class (i.e., its estimated size:
small, medium or large) and the execution frequency of each operation.
The fragmentation schema is composed of the choice of a fragmentation
technique and the definition of a set of fragments for each class of the
database schema. The two basic fragmentation techniques to be applied on a
class are horizontal and vertical fragmentation [28]. Vertical
fragmentation breaks the class logical structure (its attributes and
methods) and distributes them into fragments. Horizontal fragmentation
distributes class instances across the fragments. Thus, a horizontal
fragment of a class contains a subset of the whole class extension.
Horizontal fragmentation is usually subdivided into primary and derived
horizontal fragmentation. Primary horizontal fragmentation basically
optimizes selection and projection operations, while derived horizontal
fragmentation addresses the relationships between classes and improves
performance of navigation operations. It is also possible to apply both
vertical and primary horizontal fragmentation techniques to a class
simultaneously (which we call hybrid fragmentation) or to apply different
fragmentation techniques to different classes in the database schema (which
we call mixed fragmentation).
In the object oriented data model, additional issues contribute to
increase the difficulty of the class fragmentation and turn it into an even
more complex problem. Our previous work proposed a set of heuristics
implemented by an algorithm (called "analysis algorithm") [2]. Some
examples of the heuristics proposed are "in the case of a selection
operation on a class with a large cardinality, this class is indicated to
primary horizontal fragmentation", or "in the case of a projection
operation on a class with a large cardinality that is not derived
horizontally fragmented, this class is indicated to vertical
fragmentation". The alg