DARE: Drawing Adequate REpresentations *

Tiziana Catarci, Giuseppe Santucci
Dipartimento di Informatica e Sistemistica
Università di Roma Sapienza
Via Salaria 113, 00198 Roma, Italy
[catarci/santucci]@dis.uniroma1.it
Maria F. Costabile
Dipartimento di Informatica
Università di Bari
Via Orabona 4, 70126 Bari, Italy
fcosta@iesi.ba.cnr.it

Abstract:

We consider the problem of automatically associating a correct, complete, and possibly highly effective visual representation with any kind of database, regardless of the particular data model used, the size of the instance set, and the nature of the data to be represented.

1. Introduction

When building a database or, more generally, an information system, it is mandatory to design a friendly interface that allows the final user to easily access the data of interest while ignoring the logical and physical structures of the database, as well as any implementation detail. Very often, such an interface exploits the power of visualization and direct manipulation mechanisms [1]. However, things are not as trivial as they may seem. In particular, it is not sufficient to associate "any" visual representation with a database [2, 3, 4, 5]. The visual representation should be carefully chosen to effectively convey all and only the database information content. In other words, a visual representation should be first of all correct and complete in order to be adequate for a certain database [7]. A visual representation is complete if the user can perceive from it all the database information content, and it is correct if s/he can perceive only this.

As an example, let us consider the data in Table 1, which refer to towns in Italy: the number of people living in each town, the position of each town, and its distance in kilometers from Rome. We may visualize these data through a graph, such as the one in Figure 2. In this case the visualization is neither complete nor correct. It is not complete, since not all attribute values in Table 1 have an appropriate representation: nothing in the figure allows the user to infer the approximate number of people, the distance from Rome, or the mutual position of the towns. Moreover, the distribution of the towns in the graph may even convey wrong information about their position, since Naples is very likely to be interpreted as being North of Rome, Milan as being East, etc.; thus the visual representation is also not correct. Figure 3 shows another graph, from which the user can infer all the data in Table 1. That representation is complete and correct. Since we are referring to geographic information, we take into account that the observer usually interprets the distribution of data on the plane according to the four cardinal points. Therefore, the representation in Figure 4 is not correct, since the user infers the wrong information that Naples is South-West of Rome.

Figure 1: Example of database

Figure 2: Visual representation, which is neither complete nor correct, of data in Table 1

Figure 3: Complete and correct visual representation of data in Table 1

Figure 4: Complete but not correct visual representation of data in Table 1
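The three cases discussed for the towns example can be illustrated with a small sketch. It is a simplification made here for illustration, not the formal definition used by DARE: information is modelled as (subject, attribute, value) triples, a representation is taken to be complete when every (subject, attribute) pair of the database is perceivable, and correct when every perceived triple belongs to the database. The function names and the placeholder value POP_NAPLES are hypothetical; the actual rows of Table 1 are not reproduced.

```python
def is_complete(database, perceived):
    """Every (subject, attribute) pair in the database is shown in some form."""
    shown = {(subject, attribute) for subject, attribute, _ in perceived}
    return all((s, a) in shown for s, a, _ in database)


def is_correct(database, perceived):
    """Nothing is perceived that is not part of the database content."""
    return perceived <= database


# Placeholder facts about one town; POP_NAPLES stands for the stored value.
database = {
    ("Naples", "direction_from_Rome", "south-east"),
    ("Naples", "population", "POP_NAPLES"),
}

# A plain graph: population is not shown and the layout suggests a wrong position.
plain_graph = {("Naples", "direction_from_Rome", "north")}

# A representation conveying exactly the database content.
faithful = set(database)

# All attributes are shown, but the layout suggests the wrong direction.
misplaced = {
    ("Naples", "direction_from_Rome", "south-west"),
    ("Naples", "population", "POP_NAPLES"),
}

for name, perceived in [("plain", plain_graph), ("faithful", faithful), ("misplaced", misplaced)]:
    print(name, is_complete(database, perceived), is_correct(database, perceived))
```

Under these assumptions the plain graph is neither complete nor correct, the faithful representation is both, and the misplaced one is complete but not correct, mirroring the three figures.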

There is a huge amount of literature on this topic, ranging from Mackinlay's pioneering work on the automatic design of graphical presentations [6], to the ZOO project of the University of Wisconsin (see, e.g., [7]), to many AI-based proposals (see, e.g., [8] and [9]), to the EU-funded FADIVA project [10], and many others. Basically, all these proposals share two limitations: a) they try to automatically build complete representations, while correctness, even though it is considered a very relevant property, cannot be formally checked; b) they concentrate on the visualization of either the schema or the instances of the database, not both. Moreover, some proposals are restricted to specific domains and/or applications instead of providing a general solution.

We plan to overcome the above drawbacks by 1) proposing a general theory for establishing the adequacy of a visual representation once the database characteristics have been specified, and 2) developing a system, called DARE (Drawing Adequate REpresentations), which implements such a theory and works in two modalities, namely:

Representation Check - checking the adequacy of visual representations proposed by the user. Adequacy is expressed in terms of the completeness and correctness of a visual representation with respect to a database. A visual representation is complete if the user can perceive from it all the database information content, and it is correct if s/he can perceive only this.
Representation Generation - automatically associating with any database the most effective visual representation. Such a visual representation has to be not only adequate (as defined for Representation Check above), but also has to convey some database features specified by the designer (e.g., that some concepts are the most relevant).

2. DARE

The DARE system is based on a knowledge base containing different kinds of rules:

Visual rules. Visual rules characterize the different kinds of visual symbols (e.g., they list the visual attributes, see [12], which are associated with the different kinds of visual symbols).
Data rules. Data rules specify the characteristics of the data model, the database schema, and the database instances (e.g., if the designer is using the Entity-Relationship model, s/he will use a data rule to say that, for instance, Person is an Entity, as well as that John is a Person).
Mapping rules. Mapping rules specify the link between data and visual elements (e.g., entities are represented as rectangles, Person is a red rectangle, John is a small red rectangle). Note that special kinds of data objects which are naturally visual, such as images, charts, forms, force the visual representation to adhere to their natural representation.
Perceptual rules. Perceptual rules tell us how the user perceives a visual symbol (e.g., a line, a geometric figure, an icon), how s/he perceives relationships between symbols (e.g., the mutual placement of two figures on the plane), and what the perceptual effect is of relevant visual attributes such as color and texture.
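The four kinds of rules can be pictured as four groups of assertions in one knowledge base. The following is a minimal sketch under assumptions made here: rules are stored as ground facts in Python sets, the class KnowledgeBase, the method symbol_for, and the predicate names Entity, InstanceOf, and HasAttribute are hypothetical, and only Rep (written as Rep(visual, data)) follows the notation used later in the text. It is not the actual rule language of DARE.

```python
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Four rule groups, each stored here as a set of ground facts (tuples)."""
    visual: set = field(default_factory=set)      # kinds of symbols and their visual attributes
    data: set = field(default_factory=set)        # data model, schema, and instances
    mapping: set = field(default_factory=set)     # links between data and visual elements
    perceptual: set = field(default_factory=set)  # how symbols and attributes are perceived

    def symbol_for(self, data_object):
        """Return the visual element mapped to a data object, or None."""
        for predicate, visual, data in self.mapping:
            if predicate == "Rep" and data == data_object:
                return visual
        return None


kb = KnowledgeBase()

# Data rules from the Entity-Relationship example in the text.
kb.data |= {("Entity", "Person"), ("InstanceOf", "John", "Person")}

# Visual rules: visual attributes carried by a rectangle (attribute names assumed).
kb.visual |= {("HasAttribute", "rectangle", "color"), ("HasAttribute", "rectangle", "size")}

# Mapping rules from the text, written as Rep(visual, data).
kb.mapping |= {
    ("Rep", "rectangle", "Entity"),
    ("Rep", "red rectangle", "Person"),
    ("Rep", "small red rectangle", "John"),
}

# Perceptual rules are left empty in this sketch.

print(kb.symbol_for("Person"))  # red rectangle
print(kb.symbol_for("Mary"))    # None: no mapping rule mentions Mary
```

Keeping the four groups separate makes it possible to add or change one kind of rule, for instance perceptual rules, without touching the others.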

Note that the knowledge base contains predicates concerning different levels of the knowledge used by the system (e.g., instance, schema, and metaschema levels). In order to deal with this aspect, we have designed our system in such a way that:

the knowledge base is partitioned into two layers: the predicate layer and the object layer. As usual, the predicate layer deals with intensional knowledge, whereas the object layer is concerned with the extensional part of the knowledge base.
the predicates of the knowledge base are classified into three categories:
1. the predicates concerning those aspects related to the data;
2. the predicates concerning the visual symbols; and
3. the predicates concerning the links between the data and the visual symbols.

Some predicates, mainly concerning layout aspects, have a predefined meaning, and are not explicitly defined in the knowledge base. For instance, since two figures may be positioned exactly one on top of the other, there will be a corresponding predefined predicate Exactlyabove(x,y).
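A predefined layout predicate of this kind can be thought of as a built-in test computed from the geometry of the figures rather than from assertions stored in the knowledge base. The sketch below is only one possible reading of Exactlyabove(x,y), assumed here for illustration: figures are axis-aligned bounding boxes with y growing upward, and x is exactly above y when both span the same horizontal extent and the bottom edge of x lies on the top edge of y. The Figure class and its fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Figure:
    """Axis-aligned bounding box; (x, y) is the lower-left corner, y grows upward."""
    x: float
    y: float
    width: float
    height: float


def exactly_above(a: Figure, b: Figure) -> bool:
    """True if a sits directly on top of b under the assumptions stated above."""
    same_column = a.x == b.x and a.width == b.width
    touching = a.y == b.y + b.height
    return same_column and touching


base = Figure(x=0, y=0, width=4, height=2)
top = Figure(x=0, y=2, width=4, height=1)
shifted = Figure(x=1, y=2, width=4, height=1)

print(exactly_above(top, base))      # True
print(exactly_above(shifted, base))  # False: horizontally displaced
print(exactly_above(base, top))      # False: the order of arguments matters
```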

As an example of application, in the following we define: a simple data model, containing only classes, binary relationships between classes, and attributes; a data schema, representing people living in certain cities (the attributes of person are age and income, while the only attribute of city is extension); and a possible visual representation for such a data schema, which associates a redtriangle with the class person and a bluetriangle with the class city. Thick lines are associated with binary relationships, and different visual attributes with data attributes.

[Formula missing: definitions of the data model, data schema, and visual representation]

Note that, in order for such a representation to satisfy the mapping rules, redtriangle and bluetriangle have to be defined as visual sub-categories of triangle. Moreover, the representation is not complete, since lives has not been associated with a visual category. In checking modality, the system would alert the designer to this problem and propose to complete the representation automatically by adding the assertion Rep(thickline,lives) (since lives is the only relationship, there is no need to further specify thickline). Without further knowledge, e.g., some perceptual rules, once the data schema is populated by the corresponding instances, the system (working in representation generation modality) will propose a simple representation in which spatially distributed red triangles, having different texture and size, are linked by thick lines to blue triangles exhibiting different saturations of blue. Obviously, such a representation may be easily modified by the designer, and the new representation may be checked again for correctness and completeness. Finally, if some perceptual rules are added later on, the already generated representation may become inconsistent and therefore subject to modification by the system.
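The check described above can be sketched in a few lines. This is an illustrative approximation, not the DARE implementation: the schema is held in a plain dictionary, Rep assertions are (visual, data) pairs following the Rep(thickline,lives) notation, and the function names, the subcategory_of table, and the pairing of texture and size with age and income are assumptions introduced here.

```python
# Data schema of the running example, grouped by kind of concept.
schema = {
    "class": ["person", "city"],
    "relationship": ["lives"],
    "attribute": ["age", "income", "extension"],
}

# Rep(visual, data) assertions given by the designer; "lives" is missing.
# Which of texture/size encodes age or income is an arbitrary choice here.
rep = {
    ("redtriangle", "person"),
    ("bluetriangle", "city"),
    ("texture", "age"),
    ("size", "income"),
    ("saturation", "extension"),
}

# Visual sub-category declarations and the model-level mapping
# (classes are drawn as triangles, relationships as thick lines).
subcategory_of = {"redtriangle": "triangle", "bluetriangle": "triangle"}
model_symbol = {"class": "triangle", "relationship": "thickline"}


def unrepresented(schema, rep):
    """Schema concepts with no Rep assertion (completeness check)."""
    covered = {data for _, data in rep}
    return [c for kind in schema for c in schema[kind] if c not in covered]


def violates_mapping(schema, rep):
    """Classes whose symbol is not a sub-category of the model-level symbol."""
    symbol = {data: visual for visual, data in rep}
    return [c for c in schema["class"]
            if c in symbol and subcategory_of.get(symbol[c]) != model_symbol["class"]]


def propose_completion(schema, rep):
    """Rep assertions the checker could add for unrepresented relationships."""
    missing = unrepresented(schema, rep)
    return [(model_symbol["relationship"], r)
            for r in schema["relationship"] if r in missing]


print(unrepresented(schema, rep))       # ['lives']
print(violates_mapping(schema, rep))    # []
print(propose_completion(schema, rep))  # [('thickline', 'lives')]
```

Removing one of the subcategory_of entries would make violates_mapping report the corresponding class, which corresponds to the requirement that redtriangle and bluetriangle be declared as sub-categories of triangle.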

The designer will interact with the DARE system through a graphical interface that provides basic layout tools and visual mechanisms for specifying the different rules (similarly to what is proposed for the DOODLE system [11]).

References

1: T. Catarci, M.F. Costabile, S. Levialdi, and C. Batini. Visual Query Systems for Databases: A Survey. Journal of Visual Languages and Computing, 8(2):215-260, 1997.
2: T. Catarci, M.F. Costabile, and M. Matera. Visual Metaphors for Interacting with Databases. ACM SIGCHI Bulletin, 27(2), 1995.
3: E.R. Tufte. The Visual Display of Quantitative Information. Graphics Press, Cheshire, Conn., 1983.
4: E.R. Tufte. Envisioning Information. Graphics Press, Cheshire, Conn., 1990.
5: D. Harel. On Visual Formalisms. Communications of the ACM, 31(5):514-530, 1988.
6: J.D. Mackinlay. Automatic Design of Graphical Presentations. Ph.D. Thesis, Department of Computer Science, Stanford University, 1986.
7: E.M. Haber, Y.E. Ioannidis, and M. Livny. Foundations of Visual Metaphors for Schema Display. Journal of Intelligent Information Systems, 3:263-298, 1994.
8: Z. Ahmed (Ed.). Special Issue on Intelligent Visualization Systems. Journal of Visual Languages and Computing, 5, 1994.
9: R. Reiter and A.K. Mackworth. A Logical Framework for Depiction and Image Interpretation. Artificial Intelligence, 41:125-155, 1989.
10: Foundations of Advanced 3D Information Visualization, 1996. European (ESPRIT) Working Group Technical Reports available at http://www-cui.cs.darmstadt.gmd.de:80/visit/activities/IEEE/Fadiva.
11: I.F. Cruz. DOODLE: A Visual Language for Object-Oriented Databases. In ACM-SIGMOD Intl. Conf. on Management of Data, pages 71-80, 1992.
12: T. Catarci, G. Santucci, and M.F. Costabile. DARE: Drawing Adequate REpresentations. InterData Technical Report available at ftp://ftp.dis.uniroma1.it/pub/catarci/T5-R02.ps.

* Work supported by MURST, under the InterData project