Using ddply() to Get Frequency of Certain IDs, by Appearance in Multiple Rows (in R)
- by EconomiCurtis
Goal
If the following description is hard to follow, please see the "before" and "after" example further down.
I have bartering data in which each trade has a unique trade id and two sides. Side1 and Side2 are baskets: lists of item ids that represent the two sides of the barter transaction.
I'd like to count the number of TRADES in which each ITEM appears. E.g., if item "001" appeared in 3 trades, I'd have a count of 3 (ignoring how many times the item appeared within each trade).
Further, I'd like to do this with the plyr ddply function.
(If you're interested in my motivation: I am working with many hundreds of thousands of transactions and am already using ddply to calculate several other summary statistics. I'd like to add this count to the ddply call I'm already using, rather than calculate it afterwards and merge it into the ddply output.)
The pseudocode I'm working from:
merge each row of Side1 and Side2
by row, get unique() appearances of each item id
apply table() function
transpose and relabel output from table
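The four steps above can be sketched in base R as follows. This is only an illustration on hypothetical toy data (the vectors `side1` and `side2` and the intermediate names are made up for the example), not the ddply version being asked for.

```r
# Hypothetical toy data: two trades, each with two baskets of item ids
side1 <- list(c("001", "001", "002"), c("002", "003"))
side2 <- list(c("001"), c("007"))

# 1. Merge Side1 and Side2 for each trade (SIMPLIFY = FALSE always returns a list)
baskets <- mapply(c, side1, side2, SIMPLIFY = FALSE)

# 2. Within each trade, keep each item id only once
items_per_trade <- lapply(baskets, unique)

# 3. Count the number of trades in which each item appears
counts <- table(unlist(items_per_trade))

# 4. Reshape the table into a two-column data frame
result <- data.frame(ItemID = names(counts),
                     RelFreq_byTrade = as.vector(counts),
                     stringsAsFactors = FALSE)
print(result)
```

Here item "001" occurs three times in the first trade but is counted once, while "002" is counted twice because it appears in both trades.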
Below is an example of the structure of my data and the output I'd like.
Data Example (before):
df <- data.frame(TradeID = c("01","02","03","04"))
df$Side1 <- list(c("001","001","002"),
                 c("002","002","003"),
                 c("001","004"),
                 c("001","002","003","004"))
df$Side2 <- list(c("001"), c("007"), c("009"), c())
Desired Output (after):
df.ItemRelFreq_byTradeID <- data.frame(ItemID = c("001","002","003","004","007","009"),
                                       RelFreq_byTrade = c(3,3,2,2,1,1))
One method to do this without ddply
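I've worked out one way to do this below. My problem is that I can't quite get ddply to do the same thing for me.
# merge both sides per trade (kept as a list), de-duplicate within each trade, then count
temp <- table(unlist(lapply(mapply(c, df$Side1, df$Side2, SIMPLIFY = FALSE), unique)))
# as.vector() drops the table class so the counts become a single plain column
df.ItemRelFreq_byTradeID <- data.frame(ItemID = names(temp),
                                       RelFreq_byTrade = as.vector(temp))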
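One caveat about the mapply step, sketched on hypothetical data: with its default `SIMPLIFY = TRUE`, `mapply(c, ...)` returns a matrix whenever every merged basket happens to have the same length, and applying `unique` over that matrix works cell by cell, so the within-trade de-duplication is silently lost. Passing `SIMPLIFY = FALSE` keeps one list element per trade. The baskets below are invented to trigger the equal-length case.

```r
# Hypothetical baskets where every merged row has the same length (3 items)
side1 <- list(c("001", "001"), c("002", "003"))
side2 <- list(c("001"), c("002"))

# Default SIMPLIFY = TRUE collapses equal-length results into a character matrix
merged_default <- mapply(c, side1, side2)
print(is.matrix(merged_default))

# sapply() over a matrix visits single cells, so unique() removes nothing
raw_counts <- table(unlist(sapply(merged_default, unique)))

# SIMPLIFY = FALSE always returns a list with one element per trade
merged_list <- mapply(c, side1, side2, SIMPLIFY = FALSE)
per_trade_counts <- table(unlist(lapply(merged_list, unique)))

print(raw_counts)        # raw occurrences across all trades
print(per_trade_counts)  # number of trades containing each item
```

With these baskets, the matrix route counts "001" three times (every occurrence), whereas the list route counts it once (one trade).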
Thanks for any help you can offer!
Curtis