Java code optimization leads to numerical inaccuracies and errors
Posted
by
rano
on Stack Overflow
See other posts from Stack Overflow
or by rano
Published on 2011-01-09T09:44:35Z
Indexed on
2011/01/09
9:53 UTC
Read the original article
Hit count: 236
I'm trying to implement a version of the Fuzzy C-Means algorithm in Java and I'm trying to do some optimization by computing just once everything that can be computed just once.
This is an iterative algorithm and regarding the updating of a matrix, the clusters x pixels membership matrix U
, this is the update rule I want to optimize:
where the x are the element of a matrix X
(pixels x features) and v belongs to the matrix V
(clusters x features). And m
is a parameter that ranges from 1.1
to infinity
. The distance used is the euclidean norm.
If I had to implement this formula in a banal way I'd do:
for(int i = 0; i < X.length; i++)
{
int count = 0;
for(int j = 0; j < V.length; j++)
{
double num = D[i][j];
double sumTerms = 0;
for(int k = 0; k < V.length; k++)
{
double thisDistance = D[i][k];
sumTerms += Math.pow(num / thisDistance, (1.0 / (m - 1.0)));
}
U[i][j] = (float) (1f / sumTerms);
}
}
In this way some optimization is already done, I precomputed all the possible squared distances between X
and V
and stored them in a matrix D
but that is not enough, since I'm cycling througn the elements of V
two times resulting in two nested loops.
Looking at the formula the numerator of the fraction is independent of the sum so I can compute numerator and denominator independently and the denominator can be computed just once for each pixel.
So I came to a solution like this:
int nClusters = V.length;
double exp = (1.0 / (m - 1.0));
for(int i = 0; i < X.length; i++)
{
int count = 0;
for(int j = 0; j < nClusters; j++)
{
double distance = D[i][j];
double denominator = D[i][nClusters];
double numerator = Math.pow(distance, exp);
U[i][j] = (float) (1f / (numerator * denominator));
}
}
Where I precomputed the denominator into an additional column of the matrix D
while I was computing the distances:
for (int i = 0; i < X.length; i++)
{
for (int j = 0; j < V.length; j++)
{
double sum = 0;
for (int k = 0; k < nDims; k++)
{
final double d = X[i][k] - V[j][k];
sum += d * d;
}
D[i][j] = sum;
D[i][B.length] += Math.pow(1 / D[i][j], exp);
}
}
By doing so I encounter numerical differences between the 'banal' computation and the second one that leads to different numerical value in U
(not in the first iterates but soon enough). I guess that the problem is that exponentiate very small numbers to high values (the elements of U
can range from 0.0 to 1.0 and exp
, for m = 1.1
, is 10
) leads to ver
y small values, whereas by dividing the numerator and the denominator and THEN exponentiating the result seems to be better numerically. The problem is it involves much more operations.
Am I doing something wrong? Is there a possible solution to get both the code optimized and numerically stable? Any suggestion or criticism will be appreciated.
© Stack Overflow or respective owner