Using ddply() to Get Frequency of Certain IDs, by Appearance in Multiple Rows (in R)

Posted by EconomiCurtis on Stack Overflow
Published on 2013-10-29T23:23:25Z Indexed on 2013/10/30 3:54 UTC

Goal

If the following description is hard to follow, please see the "before" and "after" data below for a straightforward example.

I have bartering data, with unique trade ids, and two sides of the trade. Side1 and Side2 are baskets, lists of item ids that represent both sides of the barter transaction.

I'd like to count the number of TRADES in which each ITEM appears. E.g., if item "001" appeared in 3 trades, I'd have a count of 3 (ignoring how many times the item appeared within each trade).

Further, I'd like to do this with the plyr ddply function.

(If you're interested in my motivation: I'm working over many hundreds of thousands of transactions and am already using a ddply call to calculate several other summary statistics. I'd like to add this count to that existing ddply call, rather than calculate it afterwards and merge it into the ddply output.)

The pseudocode I'm working from:

  1. merge each row of Side1 and Side2
  2. by row, get unique() appearances of each item id
  3. apply table() function
  4. transpose and relabel output from table
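
The four steps above can be sketched in base R roughly as follows. This is only an illustration on a made-up two-trade example: the names side1, side2, merged, per_trade, counts and item_freq are hypothetical and not part of my real data.

```r
# Hypothetical two-trade example; side1/side2 are illustrative names only.
side1 <- list(c("001", "001", "002"), c("002", "003"))
side2 <- list(c("001"), c("007"))

# 1. Merge each row of Side1 and Side2.
#    SIMPLIFY = FALSE makes mapply() always return a list.
merged <- mapply(c, side1, side2, SIMPLIFY = FALSE)

# 2. By row, keep each item id at most once per trade.
per_trade <- lapply(merged, unique)

# 3. Count how many trades each item id appears in.
counts <- table(unlist(per_trade))

# 4. Reshape the table into a two-column data frame.
item_freq <- data.frame(ItemID = names(counts),
                        RelFreq_byTrade = as.vector(counts),
                        stringsAsFactors = FALSE)
```

Here item "002" is on both trades, so its count is 2, while "001" is counted once even though it shows up three times within the first trade.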

Example of the structure of my data, and the output I desire.

Data Example (before):

df <- data.frame(TradeID = c("01","02","03","04"))
df$Side1 = list(c("001","001","002"),
                c("002","002","003"),
                c("001","004"),
                c("001","002","003","004"))
df$Side2 = list(c("001"),c("007"),c("009"),c())

Desired Output (after):

df.ItemRelFreq_byTradeID <- data.frame(ItemID = c("001","002","003","004","007","009"),
                                       RelFreq_byTrade = c(3,3,2,2,1,1))

One method to do this without ddply

I've worked out one way to do this below. My problem is that I can't quite seem to get ddply to do this for me.

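 # SIMPLIFY = FALSE keeps mapply() returning a list even if every trade has the same number of items
 merged <- mapply(c, df$Side1, df$Side2, SIMPLIFY = FALSE)
 temp <- table(unlist(lapply(merged, unique)))

 # as.vector() drops the table class so the counts land in a single column
 df.ItemRelFreq_byTradeID <- data.frame(ItemID = names(temp),
                                        RelFreq_byTrade = as.vector(temp))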
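
For comparison, here is a rough sketch of one direction that might work with ddply: flatten the list columns to one row per (trade, distinct item) pair first, then let ddply count rows per item. The names items_by_trade, long and item_freq are my own placeholders, and I'm assuming the flattening happens outside ddply, so this does not yet fold into an existing ddply call.

```r
library(plyr)

# Same example data as above (character(0) stands in for the empty basket).
df <- data.frame(TradeID = c("01", "02", "03", "04"))
df$Side1 <- list(c("001", "001", "002"),
                 c("002", "002", "003"),
                 c("001", "004"),
                 c("001", "002", "003", "004"))
df$Side2 <- list(c("001"), c("007"), c("009"), character(0))

# Combine both sides per trade and de-duplicate within the trade.
items_by_trade <- lapply(seq_len(nrow(df)), function(i) {
  unique(c(df$Side1[[i]], df$Side2[[i]]))
})

# Long format: one row per (trade, distinct item) pair.
n_items <- vapply(items_by_trade, length, integer(1))
long <- data.frame(TradeID = rep(as.character(df$TradeID), times = n_items),
                   ItemID  = unlist(items_by_trade),
                   stringsAsFactors = FALSE)

# Count the number of trades each item appears in.
item_freq <- ddply(long, .(ItemID), summarise,
                   RelFreq_byTrade = length(TradeID))
```

On the example data this should reproduce the desired counts (3, 3, 2, 2, 1, 1 for items "001" through "009").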

Thanks for any help you can offer!

Curtis
