Query and Curiosity – Scientific Data Exploration
Thursday, August 27, during the poster session at HLF15 I strolled around. As I am not a computer scientist or mathematician I honestly had problems in getting an idea, what most of these posters where about. But one, by Abdul Wasay, a second year PhD student at Harvard University, caught my attention.
Just by looking at the poster I got an idea, what Queriosity means and who might benefit from it. It appeared to be a software for scientists, helping them with data exploration. Although instantly I thought: Doesn’t this already exist? Abdul Wasay was kind enough to provide few answers:
Q: What exactly does Queriosity stand for?
Abdul Wasay: Queriosity is a portmanteau of query and curiosity. It is the vision to design a new class of data system tailored for the process of data exploration in science.
The motivation behind Queriosity is that we generate and collect a staggering amount of data. Unfortunately, all data is not created equal and information is not the same as knowledge. Collected data is often sifted through to extract areas of interest and modern data systems are ill-suited for this purpose.
Q: There exist plenty of different algorithms to filter information, used for example by Amazon, Google or Facebook. Why do we need such a system?
AW: Designing data systems for scientific applications is a zero-million dollar business. Well, not really, but is still not as lucrative as designing one for the Facebooks, Googles and Amazons of the world. As a result, considerably less research goes in designing data systems for application in the sciences – a different application domain from business.
In addition, data exploration is quite different from filtering data. Filtering data, inherently assumes that you have an idea of what you are looking for, i.e. the filter predicates. As opposed to this, in data exploration scenarios, you have no or only very little idea of what you are looking for. This difference between filtering and exploration is analogous to that of wine shopping and wine tasting. In the former, you know you want a chardonnay of a certain price range and in the latter, you are willing to try a whole bunch of different wines before settling for one.
Q: Wouldn’t it be enough to copy-paste and modify existing algorithms for this purpose?
AW: Commercial analytical data systems have been designed and optimized for quite some time. Frameworks, algorithms and design principles they use are indeed a good source of inspiration for a data exploration system. Nevertheless, we believe that data exploration is a fundamentally different use case than data analysis and there are several challenges in achieving the vision of a data exploration system, such as Queriosity.
- For instance, how would the notion of “interesting data” be defined and communicated to the data system?
- How can data exploration be done in a semi-automated to completely automated fashion?
- Can information obtained from one data exploration session be applied to another? etc.
Q: Who might benefit from Queriosity?
AW: Queriosity could find potential applications in natural sciences, public health and other fields of science as well as in analysis of public data. At Harvard University, we interact with scientists in various domains to understand the process of data exploration and design our system as a solution to their pain points.
Q: Queriosity is more usable for local data than open data, right?
AW: As it currently stands, the design of Queriosity is independent of the data source. This means that your data can originate locally or from a public source, as long as you have the data, Queriosity can help in the process of data exploration.
Q: We all experience a certain loss of information by filtering it for personalization. So when using Queriosity doesn’t this mean a possible knowledge loss in sciences? How can you reassure the system is trustable?
AW: This is something we are cognizant of and is part of our future research directions. Quoting Megan Price from the hot topic session, “you’d rather have no results than wrong results”. This is something I totally agree with and now that we have a prototype, we will evaluate the accuracy and the extent of applicability of our system.
A. Wasay, M. Athanassoulis, and S. Idreos, “Queriosity: Automated Data Exploration,” in Proceedings of the IEEE International Congress on Big Data, New York, USA, 2015.
For all interested: Here is a long advice list with DOs and DON’Ts for designing conference posters.