Segmenting a double array of labels
- by Ami
The Problem:
I have a large double array populated with various labels. Each element (cell) in the double array contains a set of labels and some elements in the double array may be empty. I need an algorithm to cluster elements in the double array into discrete segments. A segment is defined as a set of pixels that are adjacent within the double array and one label that all those pixels in the segment have in common. (Diagonal adjacency doesn't count and I'm not clustering empty cells).
|-------|-------|------|
| Jane | Joe | |
| Jack | Jane | |
|-------|-------|------|
| Jane | Jane | |
| | Joe | |
|-------|-------|------|
| | Jack | Jane |
| | Joe | |
|-------|-------|------|
In the above arrangement of labels distributed over nine elements, the largest cluster is the “Jane” cluster occupying the four upper left cells.
What I've Considered:
I've considered iterating through every label of every cell in the double array and testing to see if the cell-label combination under inspection can be associated with a preexisting segment. If the element under inspection cannot be associated with a preexisting segment it becomes the first member of a new segment. If the label/cell combination can be associated with a preexisting segment it associates.
Of course, to make this method reasonable I'd have to implement an elaborate hashing system. I'd have to keep track of all the cell-label combinations that stand adjacent to preexisting segments and are in the path of the incrementing indices that are iterating through the double array. This hash method would avoid having to iterate through every pixel in every preexisting segment to find an adjacency.
Why I Don't Like it:
As is, the above algorithm doesn't take into consideration the case where an element in the double array can be associated with two unique segments, one in the horizontal direction and one in the vertical direction. To handle these cases properly, I would need to implement a test for this specific case and then implement a method that will both associate the element under inspection with a segment and then concatenate the two adjacent identical segments.
On the whole, this method and the intricate hashing system that it would require feels very inelegant. Additionally, I really only care about finding the large segments in the double array and I'm much more concerned with the speed of this algorithm than with the accuracy of the segmentation, so I'm looking for a better way.
I assume there is some stochastic method for doing this that I haven't thought of.
Any suggestions?