What are the Options for Storing Hierarchical Data in a Relational Database?
- by orangepips
Good Overviews
One more Nested Intervals vs. Adjacency List comparison: the best comparison of Adjacency List, Materialized Path, Nested Set and Nested Interval I've found.
Models for hierarchical data: slides with good explanations of tradeoffs and example usage
Representing hierarchies in MySQL: very good overview of Nested Set in particular
Hierarchical data in RDBMSs: most comprehensive and well organized set of links I've seen, but not much in the way on explanation
Options
Ones I am aware of and general features:
Adjacency List:
Columns: ID, ParentID
Easy to implement.
Cheap node moves, inserts, and deletes.
Expensive to find level (can store as a computed column), ancestry & descendants (Bridge Hierarchy combined with level column can solve), path (Lineage Column can solve).
Use Common Table Expressions in those databases that support them to traverse.
Nested Set (a.k.a Modified Preorder Tree Traversal)
First described by Joe Celko - covered in depth in his book Trees and Hierarchies in SQL for Smarties
Columns: Left, Right
Cheap level, ancestry, descendants
Compared to Adjacency List, moves, inserts, deletes more expensive.
Requires a specific sort order (e.g. created). So sorting all descendants in a different order requires additional work.
Nested Intervals
Combination of Nested Sets and Materialized Path where left/right columns are floating point decimals instead of integers and encode the path information.
Bridge Table (a.k.a. Closure Table: some good ideas about how to use triggers for maintaining this approach)
Columns: ancestor, descendant
Stands apart from table it describes.
Can include some nodes in more than one hierarchy.
Cheap ancestry and descendants (albeit not in what order)
For complete knowledge of a hierarchy needs to be combined with another option.
Flat Table
A modification of the Adjacency List that adds a Level and Rank (e.g. ordering) column to each record.
Expensive move and delete
Cheap ancestry and descendants
Good Use: threaded discussion - forums / blog comments
Lineage Column (a.k.a. Materialized Path, Path Enumeration)
Column: lineage (e.g. /parent/child/grandchild/etc...)
Limit to how deep the hierarchy can be.
Descendants cheap (e.g. LEFT(lineage, #) = '/enumerated/path')
Ancestry tricky (database specific queries)
Database Specific Notes
MySQL
Use session variables for Adjacency List
Oracle
Use CONNECT BY to traverse Adjacency Lists
PostgreSQL
ltree datatype for Materialized Path
SQL Server
General summary
2008 offers HierarchyId data type appears to help with Lineage Column approach and expand the depth that can be represented.