Parallelize or vectorize all-against-all operation on a large number of matrices?

Posted by reve_etrange on Stack Overflow
Published on 2010-05-20T08:56:46Z Indexed on 2010/05/20 9:00 UTC

I have approximately 5,000 matrices with the same number of rows and varying numbers of columns (20 x ~200). Each of these matrices must be compared against every other in a dynamic programming algorithm.

In an earlier question, I asked how to perform the comparison quickly and was given an excellent answer involving a 2D convolution. Applying that method serially in a nested loop, like so

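list = who('data_matrix_prefix*');    % names of the workspace variables holding the matrices
H = cell(numel(list), numel(list));
for i = 1:numel(list)
    for j = 1:numel(list)
        if i ~= j
            % Each matrix is a separate variable, so look it up by name with eval.
            H{i,j} = compare(eval(list{i}), eval(list{j}));
        end
    end
end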

is fast for small subsets of the data (e.g. for 9 matrices, 9*9 - 9 = 72 calls are made in ~1 s). However, operating on all the data requires almost 25 million calls (5000*5000 - 5000 = 24,995,000).
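To put numbers on that, here is a minimal sketch that enumerates the ordered (i,j) pairs for a small k and extrapolates the serial run time to the full data set. It assumes the per-call cost stays at the ~1 s per 72 calls measured above, which is only a rough guess since the matrices vary in width. All variable names are illustrative.

```matlab
k = 9;                                  % number of matrices in the small test
[I, J] = ndgrid(1:k, 1:k);              % all index combinations
offDiag = I ~= J;                       % drop self-comparisons
pairs = [I(offDiag), J(offDiag)];       % one row per compare() call
nCallsSmall = size(pairs, 1);           % equals k*k - k

kAll = 5000;                            % approximate size of the full data set
nCallsAll = kAll*kAll - kAll;

% Assumption: cost per call matches the small test (about 1 s for 72 calls).
secPerCall = 1 / nCallsSmall;
estDays = nCallsAll * secPerCall / 86400;
fprintf('%d calls for k = %d, %d calls for k = %d, roughly %.1f days serially\n', ...
        nCallsSmall, k, nCallsAll, kAll, estDays);
```

Under that assumption the full run comes out at roughly four days of serial computation, which is why some form of explicit parallelism looks attractive.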
I have also tried using deal() to make a cell array composed entirely of the next element in data, so I could use cellfun() in a single loop:

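% who(), load() and struct2cell() calls place k data matrices in a k-by-1 cell array called data.
H = cell(k,k);
nextData = cell(k,1);
for i = 1:k
    [nextData{:}] = deal(data{i});    % k copies of the i-th matrix
    % Column i holds every matrix compared against matrix i (including itself).
    H(:,i) = cellfun(@compare, data, nextData, 'UniformOutput', false);
end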

Unfortunately, this is not really any faster, because essentially all of the run time is spent in compare(). Neither code example seems well suited to parallelization: I'm having trouble figuring out how to restructure the loops so that my variables can be sliced (in the parfor sense).
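For reference, one way the slicing might be arranged is sketched below. It assumes the matrices have already been gathered into a cell array called data, that the Parallel Computing Toolbox is available (without it, parfor simply runs serially), and it uses a hypothetical stand-in compareFn because the real compare() is not shown. Each outer iteration builds one row of results in a temporary variable and writes it to H with a single sliced assignment.

```matlab
% Hypothetical stand-in for compare(); the real function is not shown here.
compareFn = @(A, B) conv2(A, rot90(B, 2));

% Toy data: k matrices with 20 rows and varying column counts.
k = 6;
data = cell(k, 1);
for n = 1:k
    data{n} = rand(20, 10 + n);
end

H = cell(k, k);
parfor i = 1:k
    row = cell(1, k);              % temporary variable, reset every iteration
    for j = 1:k
        if i ~= j
            row{j} = compareFn(data{i}, data{j});   % data is broadcast to the workers
        end
    end
    H(i, :) = row;                 % sliced output: i appears in exactly one index
end
```

Because data is indexed by both i and j, it is sent whole to every worker rather than sliced, which seems acceptable for a few thousand 20 x ~200 matrices. If compare() is symmetric, which is not established here, the inner loop could start at j = i+1 to halve the work, at the cost of uneven iteration lengths.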
compare() is totally vectorized; it uses matrix multiplication and conv2() exclusively (I am under the impression that all of these operations, including the cellfun(), should be multithreaded in MATLAB?).
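Whether implicit multithreading actually helps at this matrix size can be checked directly. The sketch below times a single conv2() call on matrices of the stated size with the default thread count and again with one thread. The conv2() call is only a hypothetical stand-in for the body of compare(), and timeit() assumes a reasonably recent MATLAB release.

```matlab
% Stand-in workload: one 2D convolution on matrices of the stated size.
A = rand(20, 200);
B = rand(20, 200);
f = @() conv2(A, rot90(B, 2));

nDefault = maxNumCompThreads;      % current number of computational threads
tMulti = timeit(f);                % timing with the default thread count

maxNumCompThreads(1);              % restrict MATLAB to a single thread
tSingle = timeit(f);
maxNumCompThreads(nDefault);       % restore the original setting

fprintf('threads: %d, default: %.3g s, single thread: %.3g s\n', ...
        nDefault, tMulti, tSingle);
```

If the two timings are close, the built-in multithreading is contributing little for inputs this small, and explicit parallelism over the (i,j) pairs is more likely to pay off.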

Does anyone see a (explicitly) parallelized solution or better vectorization of the problem?

© Stack Overflow or respective owner
