Is it possible to shuffle a 2D matrix while preserving row AND column frequencies?

Posted by j_random_hacker on Stack Overflow See other posts from Stack Overflow or by j_random_hacker
Published on 2011-02-23T04:03:41Z Indexed on 2011/02/23 7:25 UTC
Read the original article Hit count: 206

Filed under:
|
|

Suppose I have a 2D array like the following:

GACTG
AGATA
TCCGA

Each array element is taken from a small finite set (in my case, DNA nucleotides -- {A, C, G, T}). I would like to randomly shuffle this array somehow while preserving both row and column nucleotide frequencies. Is this possible? Can it be done efficiently?

[EDIT]: By this I mean I want to produce a new matrix where each row has the same number of As, Cs, Gs and Ts as the corresponding row of the original matrix, and where each column has the same number of As, Cs, Gs and Ts as the corresponding column of the original matrix. Permuting the rows or columns of the original matrix will not achieve this in general. (E.g. for the example above, the top row has 2 Gs, and 1 each of A, C and T; if this row was swapped with row 2, the top row in the resulting matrix would have 3 As, 1 G and 1 T.)

It's simple enough to preserve just column frequencies by shuffling a column at a time, and likewise for rows. But doing this will in general alter the frequencies of the other kind.

My thoughts so far: If it's possible to pick 2 rows and 2 columns so that the 4 elements at the corners of this rectangle have the pattern

XY
YX

for some pair of distinct elements X and Y, then replacing these 4 elements with

YX
XY

will maintain both row and column frequencies. In the example at the top, this can be done for (at least) rows 1 and 2 and columns 2 and 5 (whose corners give the 2x2 matrix AG;GA), and for rows 1 and 3 and columns 1 and 4 (whose corners give GT;TG). Clearly this could be repeated a number of times to produce some level of randomisation.

Generalising a bit, any "subrectangle" induced by a subset of rows and a subset of columns, in which the frequencies of all rows are the same and the frequencies of all columns are the same, can have both its rows and columns permuted to produce a valid complete rectangle. (Of these, only those subrectangles in which at least 1 element is changed are actually interesting.) Big questions:

  1. Are all valid complete matrices reachable by a series of such "subrectangle rearrangements"? I suspect the answer is yes.
  2. Are all valid subrectangle rearrangements decomposable into a series of 2x2 swaps? I suspect the answer is no, but I hope it's yes, since that would seem to make it easier to come up with an efficient algorithm.
  3. Can some or all of the valid rearrangements be computed efficiently?

This question addresses a special case in which the set of possible elements is {0, 1}. The solutions people have come up with there are similar to what I have come up with myself, and are probably usable, but not ideal as they require an arbitrary amount of backtracking to work correctly. Also I'm concerned that only 2x2 swaps are considered.

Finally, I would ideally like a solution that can be proven to select a matrix uniformly at random from the set of all matrices having identical row frequencies and column frequencies to the original. I know, I'm asking for a lot :)

© Stack Overflow or respective owner

Related posts about algorithm

Related posts about random