Algorithm to infer tag hierarchy
Posted
by
Tom
on Programmers
See other posts from Programmers
or by Tom
Published on 2013-10-08T22:41:39Z
Indexed on
2013/10/22
16:02 UTC
Read the original article
Hit count: 308
algorithms
I'm looking for an algorithm to infer a hierarchy from a set of tagged items.
E.g. if the following items have the tags:
1 a
2 a,b
3 a,c
4 a,c,e
5 a,b
6 a,c
7 d
8 d,f
Then I can construct an undirected graph (or graphs) by tallying the node weights and edge weights:
node weights edge weights
a 6 a-b 2
b 2 a-c 3
c 3 c-e 1
d 2 a-e 1 <-- this edge is parallel to a-c and c-e and not wanted
e 1 d-f 1
f 1
The first problem is how to drop any redundant edges to get to the simplified graph? Note that it's only appropriate to remove that redundant a-e edge in this case because something is tagged as a-c-e, if that wasn't the case and the tag was a-e, that edge would have to remain. I suspect that means the removal of edges can only happen during the construction of the graph, not after everything has been tallied up.
What I'd then like to do is identify the direction of the edges to create a directed graph (or graphs) and pick out root nodes to hopefully create a tree (or trees):
trees
a d
// \\ |
b c f
\
e
It seems like it could be a string algorithm - longest common subsequences/prefixes - or a tree/graph algorithm, but I am a little stuck since I don't know the correct terminology to search for it.
© Programmers or respective owner