Scalable Reasoning in Knowledge Graphs
"Which countries are currently members of the European Union?" "Who is the latest winner of the Turing award?" These are questions whose answers are, today, only a web search and a few milliseconds of waiting away from us. Behind the scenes, we see Google's, Yahoo's, Bing's or IBM Watson's Knowledge Graph (KG) in action. --- "Is my organization compliant with anti money-laundering (AML) policies?" "Are our organization's salaries in line with regulations on eliminating the Gender-Pay-Gap?" These are questions that are of incredible importance, but need a knowledge graph that combines world knowledge, internal data, and, crucially, a deep representation of the relevant knowledge and the possibility to reason over it. Such knowledge graphs must offer sophisticated reasoning power, as well as scalability over large amounts of data. It is not completely surprising that organizations typically cannot give answers to such questions easily, even if they have the necessary understanding: The balance between reasoning power and scalability is an eternal struggle - approaches known so far fail to satisfy important requirements and central research questions remain open. Solving precisely this problem is the goal of the proposed Vienna Research Group backed up by its strong group of national and international collaborators. With the introduction of its Knowledge Graph (KG), Google has coined the name for a new generation of knowledge-based systems that go beyond what was previously expected of areas that include graph databases, knowledge bases, machine-learning systems and rule-based logical reasoners. Beyond Google, companies are recognizing the need for making use of their vast amounts of data and knowledge in the form of Knowledge Graphs. In the same way that databases created the need for Database Management Systems (DBMS) and knowledge bases fostered the creation of Knowledge Base Management Systems (KBMS), the interest in Knowledge Graphs creates a need for academia and industry to understand and develop Knowledge Graph Management Systems (KGMS). The quest towards this new generation of KGMSs meets its grand conflict: reasoning power vs scalability. On the one hand, we require at least *recursive reasoning* to be able to handle the nature of graph-structured data (in our anti money-laundering example, the need to traverse large graph-structured data of connections between organizations). We also need *ontological reasoning* and in particular existential quantification (in our example, to express the rich ontology underlying the financial domain). Beyond such formal requirements, a KGMS needs interfaces to many heterogeneous data sources, including: RDBMSs, graph databases, NoSQL stores, the web, machine-learning and analytics. The search for knowledge representation and reasoning formalisms that satisfy some or all of these criteria has led the community to develop a wide range of languages. Yet, to see the graveness of the problem, let us concretely pick (a) the ability to express Datalog (to capture the requirement for recursive reasoning) and (b) the ability to express SPARQL + OWL 2 QL (to capture the requirement for ontological reasoning). We have to accept at least PTIME data complexity to meet these requirements. However, while sufficient for many conventional applications, PTIME data complexity is often prohibitive for "Big Data" scenarios. By integrating further features needed in practice -- among them aggregation, arithmetic, support for arbitrary functions -- the complexity raises beyond PTIME -- in the extreme case even up to undecidability, depending on how they are integrated. In this Vienna Research Group, together with our national and international collaborators, we will tackle precisely these challenges. Our goal is very ambitious. We want to develop the theoretical and practical foundations for scalable reasoning in knowledge graphs. We will provide the logical, algorithmic and methodological principles, and scalable systems that bring this theory to practice. To achieve our goal, we will design new methods and algorithms that combine the best methodologies and techniques from the database and knowledge representation and reasoning (KRR) areas with those of selected other fields. The breakthrough which we are aiming at allows for explainable reasoning, statistical reasoning and maintainable reasoning at the same time.