My goal is to "sum" two non-conformable matrices (matrices with different dimensions), matching cells by row and column names and preserving those names in the result.
I've come up with this approach: convert the matrices to data.table objects, join them, and then sum the matching column vectors.
An example:
> M1
  1 3 4 5 7 8
1 0 0 1 0 0 0
3 0 0 0 0 0 0
4 1 0 0 0 0 0
5 0 0 0 0 0 0
7 0 0 0 0 1 0
8 0 0 0 0 0 0
> M2
  1 3 4 5 8
1 0 0 1 0 0
3 0 0 0 0 0
4 1 0 0 0 0
5 0 0 0 0 0
8 0 0 0 0 0
> M1 %ms% M2
  1 3 4 5 7 8
1 0 0 2 0 0 0
3 0 0 0 0 0 0
4 2 0 0 0 0 0
5 0 0 0 0 0 0
7 0 0 0 0 1 0
8 0 0 0 0 0 0
This is my code:
library(data.table)
M1 <- matrix(c(0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0), byrow = TRUE, ncol = 6)
# row names are needed too, since keep.rownames = TRUE relies on them
rownames(M1) <- colnames(M1) <- c(1,3,4,5,7,8)
M2 <- matrix(c(0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0), byrow = TRUE, ncol = 5)
rownames(M2) <- colnames(M2) <- c(1,3,4,5,8)
# to data.table objects
DT1 <- data.table(M1, keep.rownames = TRUE, key = "rn")
DT2 <- data.table(M2, keep.rownames = TRUE, key = "rn")
# join and sum of common columns
if (nrow(DT1) > nrow(DT2)) {
  A <- DT2[DT1, roll = TRUE]
  A[, list(X1 = X1 + X1.1, X3 = X3 + X3.1, X4 = X4 + X4.1, X5 = X5 + X5.1, X7, X8 = X8 + X8.1), by = rn]
}
That outputs:
   rn X1 X3 X4 X5 X7 X8
1:  1  0  0  2  0  0  0
2:  3  0  0  0  0  0  0
3:  4  2  0  0  0  0  0
4:  5  0  0  0  0  0  0
5:  7  0  0  0  0  1  0
6:  8  0  0  0  0  0  0
Then I can convert this data.table back to a matrix and fix the row and column names.
My questions are:
1. How can I generalize this procedure?
I need a way to automatically create list(X1 = X1 + X1.1, X3 = X3 + X3.1, X4 = X4 + X4.1, X5 = X5 + X5.1, X7, X8 = X8 + X8.1), because I want to apply this function to matrices whose dimensions (and row/column names) are not known in advance.
In summary, I need a merge procedure that behaves as described.
2. Are there other strategies/implementations that achieve the same goal and are both faster and more general? (I'm hoping some data.table expert can help me.)
3. What kind of join (inner, outer, etc.) does this procedure correspond to?
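To make the join question concrete, here is a minimal base-R sketch (not taken from any answer) of how I read the operation: a full outer join on the (row label, column label) pair, with a cell that is missing on one side counted as 0. The matrices `a` and `b`, the helper `to_long`, and the `.a`/`.b` suffixes are made up for illustration.

```r
# Two small labelled matrices; b lacks the row/column labelled "7".
a <- matrix(0, 3, 3, dimnames = list(c("1", "4", "7"), c("1", "4", "7")))
a["1", "4"] <- 1
a["7", "7"] <- 1
b <- matrix(0, 2, 2, dimnames = list(c("1", "4"), c("1", "4")))
b["1", "4"] <- 1

# Hypothetical helper: one record per cell, with columns Var1, Var2, Freq.
to_long <- function(m) as.data.frame(as.table(m), stringsAsFactors = FALSE)

# all = TRUE keeps cells found in either input, i.e. a full outer join.
cells <- merge(to_long(a), to_long(b), by = c("Var1", "Var2"),
               all = TRUE, suffixes = c(".a", ".b"))

# A cell missing from one side comes back as NA; count it as 0 when summing.
cells$total <- rowSums(cells[, c("Freq.a", "Freq.b")], na.rm = TRUE)

# Rebuild a matrix, indexing it by the (row label, column label) pairs.
rows <- sort(unique(cells$Var1))
cols <- sort(unique(cells$Var2))
result <- matrix(0, length(rows), length(cols), dimnames = list(rows, cols))
result[cbind(cells$Var1, cells$Var2)] <- cells$total
result
```

Nothing here depends on knowing the labels in advance, so the hard-coded list(X1 = X1 + X1.1, ...) expression is not needed in this form. Note that sort() orders the labels as strings, so "10" would come before "3".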
Thanks in advance.
p.s.: I'm using data.table version 1.8.2
EDIT - SOLUTIONS
@Aaron's solution.
No external libraries, only base R. It also works on any number of matrices.
add_matrices_1 <- function(...) {
  a <- list(...)
  cols <- sort(unique(unlist(lapply(a, colnames))))
  rows <- sort(unique(unlist(lapply(a, rownames))))
  out <- array(0, dim = c(length(rows), length(cols)), dimnames = list(rows, cols))
  for (m in a) out[rownames(m), colnames(m)] <- out[rownames(m), colnames(m)] + m
  out
}
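The same idea can be wrapped in a binary operator to get the `M1 %ms% M2` notation from the example. This is only a sketch under my own assumptions: the operator body is a two-argument variant written for illustration, not @Aaron's code, and `A` and `B` are made-up matrices.

```r
# Hypothetical operator: adds two matrices cell by cell, matching on dimnames.
`%ms%` <- function(x, y) {
  # union() keeps labels in order of first appearance (no sorting).
  rows <- union(rownames(x), rownames(y))
  cols <- union(colnames(x), colnames(y))
  out <- matrix(0, length(rows), length(cols), dimnames = list(rows, cols))
  out[rownames(x), colnames(x)] <- x
  out[rownames(y), colnames(y)] <- out[rownames(y), colnames(y)] + y
  out
}

A <- matrix(1:4, 2, 2, dimnames = list(c("1", "3"), c("1", "3")))
B <- matrix(1, 2, 2, dimnames = list(c("3", "8"), c("3", "8")))

A %ms% B                       # 3 x 3 result with labels 1, 3, 8
Reduce(`%ms%`, list(A, B, B))  # folds the operator over a list of matrices
```

Reduce() is what lets a two-argument operator handle a list of arbitrary length.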
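@MadScone's solution. It uses the reshape2 package and works on only two matrices per call.
library(reshape2)
add_matrices_2 <- function(m1, m2) {
  # melt the arguments (not the globals M1 and M2), stack them,
  # and sum the cells that share the same row/column labels
  m <- acast(rbind(melt(m1), melt(m2)), Var1 ~ Var2, fun.aggregate = sum)
  # acast already labels rows and columns from Var1 and Var2, so no renaming is
  # needed; relabelling with unique(colnames(m1), colnames(m2)) ignored m2's names
  m
}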
BENCHMARK (100 runs with microbenchmark package)
Unit: microseconds
            expr       min         lq    median         uq       max
1 add_matrices_1   196.009   257.5865   282.027   291.2735   549.397
2 add_matrices_2 13737.851 14697.9790 14864.778 16285.7650 25567.448
The benchmark speaks for itself: @Aaron's solution wins, with a median roughly 50 times lower.
I'll continue to investigate a similar solution for data.table objects.
I'll add any other solutions that are reported or that I discover.
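As a starting point for the data.table route, here is a rough sketch that avoids the join altogether: melt every matrix to long format, stack the pieces, and sum by label pair. The names `to_long` and `add_matrices_dt` and the columns `rn`, `cn`, `value` are my own, and I have not benchmarked this against the solutions above.

```r
library(data.table)

# Hypothetical helper: one row per matrix cell, keyed by row and column label.
to_long <- function(m) {
  data.table(rn    = rownames(m)[row(m)],
             cn    = colnames(m)[col(m)],
             value = as.vector(m))
}

# Sketch: stack all cells from all matrices, then sum within each (rn, cn) pair.
add_matrices_dt <- function(...) {
  long   <- rbindlist(lapply(list(...), to_long))
  summed <- long[, list(value = sum(value)), by = list(rn, cn)]
  rows <- sort(unique(summed$rn))
  cols <- sort(unique(summed$cn))
  # Cells not covered by any input stay at 0.
  out <- matrix(0, length(rows), length(cols), dimnames = list(rows, cols))
  out[cbind(summed$rn, summed$cn)] <- summed$value
  out
}

X <- matrix(c(0, 1, 1, 0), 2, 2, dimnames = list(c("1", "4"), c("1", "4")))
Y <- matrix(1, 1, 1, dimnames = list("7", "7"))
add_matrices_dt(X, X, Y)
```

Because the grouping is done on labels, the function accepts any number of matrices with arbitrary names. It does require every input to carry both row and column names.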