I'm looking for an algorithm to infer a hierarchy from a set of tagged items.
E.g. if the following items have the tags:
1 a
2 a,b
3 a,c
4 a,c,e
5 a,b
6 a,c
7 d
8 d,f
Then I can construct an undirected graph (or graphs) by tallying the node weights and edge weights:
node weights edge weights
a 6 a-b 2
b 2 a-c 3
c 3 c-e 1
d 2 a-e 1 <-- this edge is parallel to a-c and c-e and not wanted
e 1 d-f 1
f 1
The first problem is how to drop any redundant edges to get to the simplified graph? Note that it's only appropriate to remove that redundant a-e edge in this case because something is tagged as a-c-e, if that wasn't the case and the tag was a-e, that edge would have to remain. I suspect that means the removal of edges can only happen during the construction of the graph, not after everything has been tallied up.
What I'd then like to do is identify the direction of the edges to create a directed graph (or graphs) and pick out root nodes to hopefully create a tree (or trees):
trees
a d
// \\ |
b c f
\
e
It seems like it could be a string algorithm - longest common subsequences/prefixes - or a tree/graph algorithm, but I am a little stuck since I don't know the correct terminology to search for it.