182 lines
6.1 KiB
TeX
182 lines
6.1 KiB
TeX
\documentclass{article}
|
|
|
|
\usepackage{cours}
|
|
|
|
\title{Keys in Graphs}
|
|
\date{January, $13^{\text{th}}$ 2022}
|
|
|
|
\begin{document}
|
|
|
|
\maketitle
|
|
|
|
\section{Introduction}
|
|
|
|
This project aims to find Graph keys, as defined in
|
|
\footnote{\url{https://www.researchgate.net/publication/283189709_Keys_for_graphs}}.
|
|
A Graph Key describes the relations that an object can have with their keys, and
|
|
what relations these involved objects can have.
|
|
|
|
\medskip
|
|
|
|
For example, a Graph Key for a book can be:
|
|
|
|
\begin{center}
|
|
\begin{tikzpicture}[y=3cm]
|
|
\node[draw] (0) at (0, 0) {Book};
|
|
\node[] (00) at (-3, -1) {x};
|
|
\node[draw] (01) at (-1, -1) {Person};
|
|
\node[] (02) at (1, -1) {y};
|
|
\node[draw] (03) at (3, -1) {Company};
|
|
\node[draw] (010) at (-2, -2) {Country};
|
|
\node[] (011) at (0, -2) {z};
|
|
\node[] (030) at (3, -2) {t};
|
|
\draw[->] (0) -- (00) node[midway,above,sloped] {title};
|
|
\draw[->] (0) -- (01) node[midway,above,sloped] {author};
|
|
\draw[->] (0) -- (02) node[midway,above,sloped] {subtitle};
|
|
\draw[->] (0) -- (03) node[midway,above,sloped] {publisher};
|
|
\draw[->] (01) -- (010) node[midway,above,sloped] {nationality};
|
|
\draw[->] (01) -- (011) node[midway,above,sloped] {last name};
|
|
\draw[->] (03) -- (030) node[midway,above,sloped] {identifier};
|
|
\end{tikzpicture}
|
|
\end{center}
|
|
|
|
That's to say, a book can be described with its title and its subtitle, the last
|
|
name and the nationality of its author, and the public identifier of the publisher.
|
|
|
|
\medskip
|
|
|
|
To generate these keys, one suggests to find $n$-almost keys using SAKey,
|
|
then to explore involved relations that define a domain and a range, and to
|
|
explore recursively the related fields.
|
|
|
|
\section{Proposed solution}
|
|
|
|
The proposed tool is available here:
|
|
\url{https://gitlab.crans.org/ynerant/graph-keys}
|
|
|
|
\subsection{Requirements}
|
|
|
|
The project is made with Python 3.9 and Python 3.10, and uses \texttt{BeautifulSoup4}
|
|
and \texttt{SPARQLWrapper} as libraries.
|
|
|
|
\subsection{Principle}
|
|
|
|
\medskip
|
|
|
|
The program takes as an input the class name that we want to explore, and a
|
|
threshold $n$ that is the number of allowed exceptions for SAKey. The class name
|
|
has to be existing in DBPedia since we make only DBPedia queries.
|
|
|
|
\medskip
|
|
|
|
First, we load the ontology of DBPedia and load a lot of relations. The goal is
|
|
to define the range and the domain of most relations. For example, we learn that
|
|
the relation \texttt{inCemetery} has the domain \texttt{GraveMonument} and the
|
|
range \texttt{Cemetery}.
|
|
|
|
\medskip
|
|
|
|
The next step is to query DBPedia to get all elements of the type of the input
|
|
class, then to get all triples \texttt{?x ?r ?y} that are involving these elements.
|
|
Since datasets can be very big, we limit the output by default to 1000 triples,
|
|
but this value can be changed using the option \texttt{-{}-limit}.
|
|
|
|
\medskip
|
|
|
|
We now store all these triples, and give them to the \emph{SAKey} tool, in order to
|
|
extract the $n$-almost keys of the dataset, where $n$ is given in input. These keys
|
|
are relations between our input, and we can now explore further. To get relevant
|
|
data, we made the choice to consider only the relations that has a defined range,
|
|
which is not a primitive (like integers, strings, dates, $\ldots$). In a general
|
|
case, we get only a few but relevant results.
|
|
|
|
\medskip
|
|
|
|
In the following, we consider the exploration of Graph Keys for the class
|
|
\texttt{Library}. We run the following command:
|
|
|
|
\begin{center}
|
|
\texttt{./main.py Library 5 -{}-limit 3000 -{}-recursion 3}
|
|
\end{center}
|
|
|
|
The last option will be explained further.
|
|
|
|
\medskip
|
|
|
|
For this example, we get the relation \texttt{location}, which has for range
|
|
\texttt{Place}.
|
|
|
|
\begin{figure}[H]
|
|
\centering
|
|
\begin{tikzpicture}[y=3cm]
|
|
\node[draw] (0) at (0.00, -0) {Library};
|
|
\node[draw] (0-0) at (0.00, -1) {Place};
|
|
\draw[->] (0) -- (0-0) node[midway,above,sloped] {location};
|
|
\end{tikzpicture}
|
|
\caption{Single discovered key}
|
|
\end{figure}
|
|
|
|
Since we discovered a relation between \texttt{Library} and \texttt{Place}, we
|
|
can now explore the keys of the class \texttt{Place} and extend the key for
|
|
\texttt{Library}.
|
|
|
|
\medskip
|
|
|
|
The program takes as optional input the parameter \texttt{-{}recursion}, which
|
|
limits the height of the output graphs. A value of $1$ gives normal keys from
|
|
\emph{SAKey}.
|
|
|
|
\medskip
|
|
|
|
When we take the example of \texttt{Library}, we get multiple outputs, like the
|
|
tree bellow:
|
|
|
|
\begin{figure}[H]
|
|
\centering
|
|
\begin{tikzpicture}[y=3cm]
|
|
\node[draw] (0) at (0.00, -0) {Library};
|
|
\node[draw] (0-0) at (0.00, -1) {Place};
|
|
\node[draw] (0-0-0) at (0.00, -2) {City};
|
|
\node[draw] (0-0-0-0) at (0.00, -3) {Image};
|
|
\draw[->] (0-0-0) -- (0-0-0-0) node[midway,above,sloped] {thumbnail};
|
|
\draw[->] (0-0) -- (0-0-0) node[midway,above,sloped] {capital};
|
|
\draw[->] (0) -- (0-0) node[midway,above,sloped] {location};
|
|
\end{tikzpicture}
|
|
\caption{Sample output of the program}
|
|
\end{figure}
|
|
|
|
\subsection{Limitations and further works}
|
|
|
|
While SAKey gives a huge amount of keys, only a few of them are well-typed, in the
|
|
sense that they define a comprehensive range. That gives us a serious limitation
|
|
to our algorithm.
|
|
|
|
\medskip
|
|
|
|
Moreover, the current algorithm does not take into account the rules that don't
|
|
have any defined range, which are the most. This isn't really a problem and can
|
|
easily be patched, since this generates more graphs, but does not provide any
|
|
additional input to extend data, as said before.
|
|
|
|
\medskip
|
|
|
|
We can notice that graph keys are for the most only paths (degrees of the nodes are
|
|
2 except for the extremal nodes). This is related to the fact that generated keys
|
|
are minimal, and most of them contain only one parameter. Our algorithm should
|
|
be able to generate any type of tree graph.
|
|
|
|
\medskip
|
|
|
|
However, our algorithm may not guess all existing graph keys. Current output graphs
|
|
have the property that if we truncate each graph to a given depth, then it stays a
|
|
valid graph key (at least a $n$-almost graph key), which has no reason to be true
|
|
in a general context. To cover this issue, we may extend the SAKey algorithm to
|
|
extend directly the properties with their ranges.
|
|
|
|
\medskip
|
|
|
|
To go further, we should take into account the missing properties, and find a way
|
|
to generate more complex graphs, and to find minimal graphs that are not flat.
|
|
|
|
\end{document}
|