## Problem statement Story tags can have multiple parents. This establishes a hierarchy where a tag can have a tree of descendants. We want this directed graph to be acyclic (_DAG_) because cycles don't make sense in a topical hierarchy, and when we query the graph we would have to be extra cautious to not fall in infinite loops. ### Database schema Here are the tables created by django: ```sql CREATE TABLE IF NOT EXISTS "sic_tag" ( "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "name" varchar(40) NOT NULL UNIQUE, "created" datetime NOT NULL, "hex_color" varchar(7) NULL ); CREATE TABLE IF NOT EXISTS "sic_tag_parents" ( "id" integer NOT NULL PRIMARY KEY AUTOINCREMENT, "from_tag_id" integer NOT NULL REFERENCES "sic_tag" ("id") DEFERRABLE INITIALLY DEFERRED, "to_tag_id" integer NOT NULL REFERENCES "sic_tag" ("id") DEFERRABLE INITIALLY DEFERRED ); ``` The table `sic_tag_parents` contains one row for each directed edge. ## Recursive Common Table Expression (CTE) to the rescue `sqlite3` provides us with the ability to perform recursive queries with Common Table Expressions. We can use this to write a query that returns all paths within the graph. Then, when we insert a new parent-child relationship, we check if there already exists a directed path from the child to its parent. Unfortunately it's not possible to make a table `CHECK CONSTRAINT` since the constraint must refer to the table, which exists only after the `CREATE TABLE` statement is finished. We can instead make the check in a `BEFORE INSERT` trigger and `RAISE(ABORT)` if a cycle is detected, thus preventing the insertion. Firstly, the trigger shall execute a `SELECT` query for each inserted row. In order to raise an exception, we have to `SELECT` it: ```sql CREATE TRIGGER cycle_check BEFORE INSERT ON sic_tag_parents FOR EACH ROW BEGIN SELECT RAISE(ABORT, 'Cycle detected'); END ``` Secondly, in order to abort only when a specific condition is met, we can use a `WHERE EXISTS`: ```sql CREATE TRIGGER cycle_check BEFORE INSERT ON sic_tag_parents FOR EACH ROW BEGIN SELECT RAISE(ABORT, 'Cycle detected') WHERE EXISTS (...); END ``` In the `EXISTS` expression we shall put a recursive CTE that returns something (anything) if a cycle exists. If anything is returned, `EXISTS` evaluates to true and `RAISE` is selected. This is the CTE query: ```sql WITH RECURSIVE w(parent, last_visited, already_visited, cycle) AS ( SELECT DISTINCT to_tag_id AS parent, from_tag_id AS last_visited, to_tag_id AS already_visited, 0 AS cycle FROM sic_tag_parents UNION ALL SELECT t.to_tag_id AS parent, t.from_tag_id AS last_visited, already_visited || ', ' || t.to_tag_id, already_visited LIKE '%'||t.to_tag_id||"%" FROM sic_tag_parents AS t JOIN w ON w.last_visited = t.to_tag_id WHERE NOT cycle ) SELECT already_visited, cycle FROM w WHERE last_visited = NEW.to_tag_id AND already_visited LIKE '%'||NEW.from_tag_id||'%' ``` Line-by-line: ```sql WITH RECURSIVE w(parent, last_visited, already_visited, cycle) AS.. ``` This defines a temporary table/view called `w` with these columns as an expression. This is similar to defining regular views as the result of a `SELECT` statement. ```sql SELECT DISTINCT to_tag_id AS parent, from_tag_id AS last_visited, to_tag_id AS already_visited, 0 AS cycle FROM sic_tag_parents ``` Here we select every edge as the start of a potential path. These edges are paths of length 1, and are also acyclic, hence we select 0 as `cycle`. This line is performed one time at the beginning of the CTE, it's the basis of the recursion. ```sql UNION ALL ``` The base select is united with the select that follows: ```sql SELECT t.to_tag_id AS parent, t.from_tag_id AS last_visited, already_visited || ', ' || t.to_tag_id, already_visited LIKE '%'||t.to_tag_id||"%" FROM sic_tag_parents AS t JOIN w ON w.last_visited = t.to_tag_id WHERE NOT cycle ``` Here we `JOIN` `sic_tag_parents` with `w` itself, by adding edges to the already existing paths in the previous iteration of `w`. The path is built as a string of comma separated `id`s. `cycle` is true if `to_tag_id`, the parent, already exists in the previous path. The `WHERE NOT cycle` is necessary to prevent infinite recursion. ```sql SELECT already_visited, cycle FROM w WHERE last_visited = NEW.to_tag_id AND already_visited LIKE '%'||NEW.from_tag_id||'%' ``` Finally, we select the paths that begin with the parent and contain the child. The entire trigger: ```sql CREATE TRIGGER cycle_check BEFORE INSERT ON sic_tag_parents FOR EACH ROW BEGIN SELECT RAISE(ABORT, 'Cycle detected') WHERE EXISTS ( WITH RECURSIVE w(parent, last_visited, already_visited, cycle) AS ( SELECT DISTINCT to_tag_id AS parent, from_tag_id AS last_visited, to_tag_id AS already_visited, 0 AS cycle FROM sic_tag_parents UNION ALL SELECT t.to_tag_id AS parent, t.from_tag_id AS last_visited, already_visited || ', ' || t.to_tag_id, already_visited LIKE '%'||t.to_tag_id||"%" FROM sic_tag_parents AS t JOIN w ON w.last_visited = t.to_tag_id WHERE NOT cycle ) SELECT already_visited, cycle FROM w WHERE last_visited = NEW.to_tag_id AND already_visited LIKE '%'||NEW.from_tag_id||'%' ); END; ``` ## Appendix We can visualise the paths with a CTE. First, create a utility view that associates edges with their corresponding tag names: ```sql CREATE TEMPORARY VIEW sic_tag_parents_name AS select f.name AS from_name, from_tag_id, t.name AS to_name, to_tag_id from sic_tag_parents join sic_tag AS f on f.id = from_tag_id join sic_tag AS t on t.id = to_tag_id; ``` Sample output: ``` sqlite> SELECT * FROM sic_tag_parents_name LIMIT 2; from_name from_tag_id to_name to_tag_id ----------- ----------- ---------------- --------- programming 57 computer science 102 debugging 137 programming 57 ``` Then write a similar CTE as a view that creates a path string out of the tag names: ``` CREATE TEMPORARY VIEW sic_tag_parents_paths AS WITH RECURSIVE w(parent, last_visited, parent_name, last_visited_name, already_visited, cycle, length) AS ( SELECT DISTINCT to_tag_id AS parent, from_tag_id AS last_visited, to_name AS parent_name, from_name AS last_visited_name, to_name AS already_visited, 0 AS cycle, 1 as length FROM sic_tag_parents_name UNION ALL SELECT t.to_tag_id AS parent, t.from_tag_id AS last_visited, t.to_name, t.from_name, t.to_name || ' > ' || already_visited, already_visited LIKE '%'||t.to_name||"%", length+1 AS length FROM sic_tag_parents_name AS t JOIN w ON w.last_visited = t.to_tag_id WHERE NOT cycle ) SELECT * FROM w; ``` The output: ``` sqlite> select distinct(last_visited_name || ' > ' || already_visited) as path, length from sic_tag_parents_paths order by length desc limit 25; path length ------------------------------------------------------------------- ------ c > programming languages > programming > computer science 3 c++ > programming languages > programming > computer science 3 clojure > programming languages > programming > computer science 3 elixir > programming languages > programming > computer science 3 erlang > programming languages > programming > computer science 3 go > programming languages > programming > computer science 3 haskell > programming languages > programming > computer science 3 javascript > programming languages > programming > computer science 3 java > programming languages > programming > computer science 3 lisp > programming languages > programming > computer science 3 lua > programming languages > programming > computer science 3 python > programming languages > programming > computer science 3 ruby > programming languages > programming > computer science 3 rust > programming languages > programming > computer science 3 swift > programming languages > programming > computer science 3 emacs > commandline > unix > operating systems 3 emacs > commandline > unix 2 debugging > programming > computer science 2 programming languages > programming > computer science 2 vcs > programming > computer science 2 commandline > unix > operating systems 2 linux > unix > operating systems 2 openbsd > unix > operating systems 2 dragonflybsd > unix > operating systems 2 sqlite3 > databases > computer science 2 ```