Today I’m going to blog about persisting hierarchies (tree-like structures) in relational databases. Actually not persisting, but more retrieving — as you most probably know hierarchies are stored in a database with just a single additional field referring to a parent record (the “adjacency” model). I’m doing this more for my personal back reference, but of course I hope that it will help some other developers too. For our demonstration purposes we will refer to the most popular relational database in the world — SQLite. It is single-file-based and is demanding zero maintenance effort. We will retrieve data from Java code and the Xerial SQLite JDBC driver will help us to do that. There are different approaches to retrieving hierarchies from a relational database, they are discussed in a good number of books. Here I will discuss the approach that worked for me best from the performance point of view. I was faced with a requirement to be able to retrieve a hierarchy from a relational database starting from a defined node. Additional requirement was that it needed to be sorted in a threaded order, like forum messages: node a sub-node of node a sub-subnode of node a node b etc I altavisted for ‘Trees Hierarchies SQL’ and found this good article by Mr. Volk which described a general approach that actually addresses both requirements. Maintaining paths for each element appears to be a great way to address the sorting part of the requirements, in case nesting of elements is not too deep. Unfortunately it is not very good in addressing the first requirement — retrieving the tree starting from a given node. But lets try the sorting part first. Suppose that we need to persist and retrieve an org chart, e.g. the sample org chart from WikiPedia. First, we need to create a database to store the data. Once you have SQLite downloaded and installed (archieve unpacked) doing that is as easy as running this from the command line: sqlite3 martinets.db A database file called martinets.db gets created and SQLite prompt is ready for your input. We are ready to create a table to store our martinets, to do that copy and paste the next script into sqlite command line: CREATE TABLE MARTINET ( ID INTEGER not null, NAME VARCHAR(20), REPORTS_TO INTEGER, primary key (ID) ); To keep track of martinets we will need a separate table holding node paths: CREATE TABLE MARTINET_PATH ( MARTINET_ID INTEGER, PATH VARCHAR(1000), foreign key (MARTINET_ID) references MARTINET (ID) ); Each node path will be something like ‘/2/5/7/4’ — which means that node with ID 4 is a child of a node with ID 7 etc. To maintain path information in this table we need a trigger firing upon insertion of data in the martinet table: CREATE TRIGGER FILL_PATH AFTER INSERT ON MARTINET begin insert into MARTINET_PATH(MARTINET_ID, PATH) values (new.ID, ( select ifnull(PATH,'')||'/'||ifnull(new.ID,'') from MARTINET_PATH where MARTINET_ID=new.REPORTS_TO )); end; The next script represents a sample orgchart of martinets from WikiPedia. Copy and paste it into a new file Martinets.sql the sqlite3 console: insert into MARTINET(name,reports_to) values('general',null); insert into MARTINET(name,reports_to) values('colonel a',1); insert into MARTINET(name,reports_to) values('colonel b',1); insert into MARTINET(name,reports_to) values('captain a',3); insert into MARTINET(name,reports_to) values('captain b',3); insert into MARTINET(name,reports_to) values('captain c',3); insert into MARTINET(name,reports_to) values('sergeant a',4); insert into MARTINET(name,reports_to) values('sergeant b',4); insert into MARTINET(name,reports_to) values('private a',8); insert into MARTINET(name,reports_to) values('private b',8); It’s possible to run sql statements from a file with .read command: .read Martinets.sql After the above command finishes all paths get created by the trigger. To see all paths from the path storage table issue this SQL statement: select * from martinet_path; 1| 2|/2 3|/3 4|/3/4 5|/3/5 6|/3/6 7|/3/4/7 8|/3/4/8 9|/3/4/8/9 10|/3/4/8/10 Now everything is ready for a good SQL query that outputs org chart sorted as threads, i.e. node 7 coming just after its parent node 4, and node 5 appearing after node 4 and all of its siblings (nodes 7, 8, 9 and 10): select a.id,a.name from martinet a join martinet_path b on a.id=b.martinet_id order by b.path; 1|general 2|colonel a 3|colonel b 4|captain a 7|sergeant a 8|sergeant b 10|private b 9|private a 5|captain b 6|captain c That’s good, but we’re also required to be able to retrieve a tree of siblings for a given node. We can achieve this by simply adding a where clause to the query, something like: select a.id,a.name from martinet a join martinet_path b on a.id=b.martinet_id where b.path like '%/4/%' order by b.path; 7|sergeant a 8|sergeant b 10|private b 9|private a But that’s not good — you don’t want DB engine to do character manipulations on supposedly millions of martinet paths. I don’t think that ‘like’ expressions can ever get indexed. In my opinion it is a better idea to keep track of tree structure (martinet subordinations) in a separate table: CREATE TABLE SUBORDINATION_TREE ( MARTINET_ID INTEGER, REPORTS_TO INTEGER, RELATION INTEGER not null, foreign key (REPORTS_TO) references MARTINET (ID), foreign key (MARTINET_ID) references MARTINET (ID) ); Relation column contains the depth (level) of relation between martinet and his superviser. For a direct relation between captain a and colonel b it will contain 1, for second-level relations (e.g. relation between captain a and general) it will have 2. To populate subordination tree table when new martinet is inserted into martinets table we need a trigger: CREATE TRIGGER FILL_TREE AFTER INSERT ON MARTINET begin insert into SUBORDINATION_TREE(MARTINET_ID,REPORTS_TO,RELATION) values(new.ID, new.ID, 0); insert into SUBORDINATION_TREE(MARTINET_ID,REPORTS_TO,RELATION) select new.ID, A, 1 from (select ID as A from MARTINET where new.REPORTS_TO = ID); end; As you can see the trigger is inserting a record into subordination tree on each insertion to martinets table, with one exception – if report_to field of a martinet points to a non-existing martinet, which most probably means that the martinet being inserted is the root (a general). Trigger is inserting just one row with information about direct relations (relation column is 1). Now we need another trigger that would populate the rest of relation levels (deeper than 1) of a newly inserted record. The only way we can populate an unknown number of relation levels is to make trigger execute recursively: each new record inserted into subordination tree table will trigger the trigger. Here is the trigger itself: CREATE TRIGGER FILL_TREE_RECURSE INSERT ON SUBORDINATION_TREE begin insert into SUBORDINATION_TREE(MARTINET_ID,REPORTS_TO,RELATION) select new.MARTINET_ID, B.REPORTS_TO, new.RELATION+1 from MARTINET A join SUBORDINATION_TREE B on ID=new.REPORTS_TO and MARTINET_ID=new.REPORTS_TO and RELATION=1; end; Another important thing is that SQLite supports recursive triggers starting from version 3.6.18, and currently it needs to be explicitly instructed to turn on recursive trigggers by setting a pragma (look at my previous blog entry for more detail). So, the database part is ready. The full schema listing follows, copy the script into a file SqlTree.sql: CREATE TABLE MARTINET ( ID INTEGER not null, NAME VARCHAR(20), REPORTS_TO INTEGER, primary key (ID) ); CREATE TRIGGER FILL_TREE AFTER INSERT ON MARTINET begin insert into SUBORDINATION_TREE(MARTINET_ID,REPORTS_TO,RELATION) values(new.ID, new.ID, 0); insert into SUBORDINATION_TREE(MARTINET_ID,REPORTS_TO,RELATION) select new.ID, A, 1 from (select ID as A from MARTINET where new.REPORTS_TO = ID); end; CREATE TABLE SUBORDINATION_TREE ( MARTINET_ID INTEGER, REPORTS_TO INTEGER, RELATION INTEGER not null, foreign key (REPORTS_TO) references MARTINET (ID), foreign key (MARTINET_ID) references MARTINET (ID) ); CREATE TRIGGER FILL_TREE_RECURSE INSERT ON SUBORDINATION_TREE begin insert into SUBORDINATION_TREE(MARTINET_ID,REPORTS_TO,RELATION) select new.MARTINET_ID, B.REPORTS_TO, new.RELATION+1 from MARTINET A join SUBORDINATION_TREE B on ID=new.REPORTS_TO and MARTINET_ID=new.REPORTS_TO and RELATION=1; end; CREATE TABLE MARTINET_PATH ( MARTINET_ID INTEGER, PATH VARCHAR(1000), foreign key (MARTINET_ID) references MARTINET (ID) ); CREATE TRIGGER FILL_PATH AFTER INSERT ON MARTINET begin insert into MARTINET_PATH(MARTINET_ID, PATH) values (new.ID, ( select ifnull(PATH,'')||'/'||ifnull(new.ID,'') from MARTINET_PATH where MARTINET_ID=new.REPORTS_TO )); end; Delete the file martinets.db and recreate it again (issue ‘sqlite3 martinets.db’). Then create a schema from the SqlTree.sql (issue the next line from sqlite command line): .read SqlTree.sql Next step is to set PRAGMA to enable recursive triggers: PRAGMA recursive_triggers = true; This time when we read data from the Martinets.sql three triggers will get fired on each insertion: the one that fill paths (FILL_PATH), the one that inserts single record in subordination tree table (FILL_TREE), and the one that recurssively fills all levels of subordination (FILL_TREE_RECURSE). Re-issue the command: .read Martinets.sql To make sure that recursive trigger worked fine take a look at the subordination table contents, which for private b is supposed to look as: select * from SUBORDINATION_TREE where MARTINET_ID=10; 10|10|0 10|1|4 10|3|3 10|4|2 10|8|1 Now we are ready to wrap all that in a small Java class that will output the thread-like structure from the database, starting from any one. Since you’re reading this on Javaworld I assume you know Java and will not go into details about the Java part. Here is the code, put it into SqlTree.java file: import java.sql.*; public class SqlTree { public static void main(String[] args) throws Exception { Class.forName("org.sqlite.JDBC"); Connection conn = DriverManager.getConnection("jdbc:sqlite:martinets.db"); Statement stmt = conn.createStatement(); ResultSet rs = stmt.executeQuery("select c.relation,a.name from martinet a join martinet_path b on a.id=b.martinet_id join subordination_tree c on a.id=c.martinet_id and c.reports_to="+(args.length>0?args[0]:1)+" order by b.path"); while(rs.next()) { System.out.println(String.valueOf(new char[2*rs.getInt("relation")]).replace(" Software Development