Database challenges for exploratory computing

Buoncristiano, Marcello; Mecca, Giansalvatore; Quintarelli, Elisa; Roveri, Manuel; Santoro, Donatello; Tanca, Letizia

doi:10.1145/2814710.2814714

Helping users make sense of very large datasets is now considered an important research topic. However, the tools available for data analysis typically address professional data scientists who, besides a deep knowledge of the domain of interest, master one or more of the following disciplines: mathematics, statistics, computer science, computer engineering, and programming. By contrast, in our vision it is vital to also support other kinds of users who, for various reasons, may want to analyze the data and obtain new insight from them. Examples of these data enthusiasts [4, 9] are journalists, investors, and politicians: non-technical users who can benefit greatly from exploring the data and acquiring new and essential knowledge, rather than reading query results containing tons of records.

The term data exploration generally refers to a data user being able to find her way through large amounts of data in order to gather the necessary information. A more technical definition comes from the field of statistics and was introduced by Tukey [12]: in exploratory data analysis, the researcher explores the data in many possible ways, including graphical tools such as boxplots or histograms, and gains knowledge from the way the data are displayed.

Despite its emphasis on visualization, exploratory data analysis still assumes that the user understands at least the basics of statistics. In this paper, by contrast, we propose a paradigm for database exploration inspired by the exploratory computing vision [2]. We may describe exploratory computing as a step-by-step “conversation” between a user and a system that “help each other” to refine the data exploration process, ultimately gathering new knowledge that concretely fulfils the user's needs. The process is seen as a conversation because the system provides active support: it not only answers the user's requests, but also suggests one or more possible actions that may help the user focus the exploratory session.
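The section does not say how the system chooses its suggestions, so the following Python snippet is only a minimal, hypothetical sketch of a single conversation step, not the model proposed in this paper. It assumes a toy in-memory table (`ROWS`) and assumes, purely for illustration, that an attribute is worth suggesting when its value distribution inside the user's answer differs from its distribution over the whole dataset, measured here with total variation distance. All names (`exploration_step`, `distribution`, `total_variation`, `ROWS`) are made up for this example.

```python
from collections import Counter

# Hypothetical toy dataset (not from the paper): one dict per record.
ROWS = [
    {"region": "North", "sector": "tech", "size": "large"},
    {"region": "North", "sector": "tech", "size": "small"},
    {"region": "North", "sector": "retail", "size": "small"},
    {"region": "South", "sector": "retail", "size": "small"},
    {"region": "South", "sector": "retail", "size": "large"},
    {"region": "South", "sector": "agri", "size": "small"},
]


def distribution(rows, attr):
    """Relative frequency of each value of `attr` in `rows`."""
    counts = Counter(row[attr] for row in rows)
    total = sum(counts.values())
    return {value: n / total for value, n in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    values = set(p) | set(q)
    return 0.5 * sum(abs(p.get(v, 0.0) - q.get(v, 0.0)) for v in values)


def exploration_step(rows, request, attrs, k=2):
    """One hypothetical conversation step.

    Answers the user's request (a selection predicate) and suggests the k
    attributes whose distribution in the answer deviates most from the
    whole dataset (an assumed heuristic, for illustration only).
    """
    answer = [row for row in rows if request(row)]
    if not answer:
        return answer, []  # nothing to compare, so nothing to suggest
    scores = {
        attr: total_variation(distribution(answer, attr), distribution(rows, attr))
        for attr in attrs
    }
    ranked = sorted(attrs, key=lambda attr: scores[attr], reverse=True)[:k]
    return answer, [(attr, round(scores[attr], 3)) for attr in ranked]


if __name__ == "__main__":
    # The user asks for the records of the North region.
    answer, suggestions = exploration_step(
        ROWS, lambda row: row["region"] == "North", ["sector", "size"]
    )
    print(len(answer), suggestions)  # 3 [('sector', 0.333), ('size', 0.0)]
```

Here the system answers the request (three records) and, unprompted, points the user to `sector`, whose mix in the North differs from the overall one, while `size` is distributed exactly as in the whole table. Ranking attributes by deviation is just one simple way for a system to take the initiative in the dialogue.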
Such a conversation may draw on a wide range of techniques, including statistics and data analysis, query suggestion, and advanced visualization tools. The closest analogy [2] is that of a human-to-human dialogue, in which two people talk and continuously refer to their lives, priorities, knowledge, and beliefs, leveraging them to provide the best possible contribution to the dialogue. In essence, through the conversation they are exploring themselves as well as the information conveyed through their words. This exploration process therefore combines investigation, exploration-seeking, comparison-making, and learning. It is most appropriate for large collections of semantically rich data, which typically hide precious knowledge behind their complexity.

In this broad and innovative context, this paper intends to take a significant step further: it proposes a model to concretely perform this kind of exploration over a database. The model is general enough to encompass most data models and query languages proposed for data management in recent years. At the same time, it is precise enough to provide a first formalization of the problem and to reason about the research challenges that this new paradigm of interaction poses to database researchers.