Summarising grouped records in a dataframe in R (...again)

Posted by monch1962 on Stack Overflow
Published on 2010-04-15T13:20:27Z Indexed on 2010/04/15 13:23 UTC

Hello all,

(I asked this question earlier today, but later realised I had over-simplified it; the answers I received were correct, but I couldn't use them because the original question left out part of the problem. Here's my second attempt...)

I have a data frame in R that looks like:

"Timestamp", "Source", "Target", "Length", "Content"
0.1        , P1      , P2      , 5       , "ABCDE"
0.2        , P1      , P2      , 3       , "HIJ"
0.4        , P1      , P2      , 4       , "PQRS"
0.5        , P2      , P1      , 2       , "ZY"
0.9        , P2      , P1      , 4       , "SRQP"
1.1        , P1      , P2      , 1       , "B"
1.6        , P1      , P2      , 3       , "DEF"
2.0        , P2      , P1      , 3       , "IJK"
...

and I want to convert this to:

"StartTime", "EndTime", "Duration", "Source", "Target", "Length", "Content"
0.1        , 0.4      , 0.3       , P1      , P2      , 12      , "ABCDEHIJPQRS"
0.5        , 0.9      , 0.4       , P2      , P1      , 6       , "ZYSRQP"
1.1        , 1.6      , 0.5       , P1      , P2      , 4       , "BDEF"
...

In plain English: I want to group consecutive records that share the same 'Source' and 'Target', then output a single record per group showing the StartTime (the group's first Timestamp), EndTime (its last Timestamp) and Duration (= EndTime - StartTime), along with the sum of the Lengths for that group and a concatenation of the Content values (which will all be strings) in that group.

The Timestamp values will always increase throughout the data frame.
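
To make the grouping rule concrete, here is a minimal base R sketch of one way it might be done; it is an illustration under stated assumptions, not necessarily the best approach. It assumes the data frame has at least one row and is already sorted by Timestamp. The names `df`, `pair`, `run_id`, `summarise_run` and `result` are made up for this example, and the data is just the eight rows shown above.

```r
# Example data: only the eight rows shown in the question.
df <- data.frame(
  Timestamp = c(0.1, 0.2, 0.4, 0.5, 0.9, 1.1, 1.6, 2.0),
  Source    = c("P1", "P1", "P1", "P2", "P2", "P1", "P1", "P2"),
  Target    = c("P2", "P2", "P2", "P1", "P1", "P2", "P2", "P1"),
  Length    = c(5, 3, 4, 2, 4, 1, 3, 3),
  Content   = c("ABCDE", "HIJ", "PQRS", "ZY", "SRQP", "B", "DEF", "IJK"),
  stringsAsFactors = FALSE
)

# Build one key per row from Source and Target, then start a new run
# whenever the key differs from the previous row's key.
# (Assumes " -> " never appears inside a Source or Target value.)
pair   <- paste(df$Source, df$Target, sep = " -> ")
run_id <- cumsum(c(TRUE, pair[-1] != pair[-length(pair)]))

# Collapse the rows of a single run into one summary row.
summarise_run <- function(rows) {
  data.frame(
    StartTime = min(rows$Timestamp),
    EndTime   = max(rows$Timestamp),
    Duration  = max(rows$Timestamp) - min(rows$Timestamp),
    Source    = rows$Source[1],
    Target    = rows$Target[1],
    Length    = sum(rows$Length),
    Content   = paste(rows$Content, collapse = ""),
    stringsAsFactors = FALSE
  )
}

# Split by run, summarise each piece, and stack the results.
result <- do.call(rbind, lapply(split(df, run_id), summarise_run))
rownames(result) <- NULL
print(result)
```

Grouping on `run_id` rather than directly on Source and Target is what keeps the two separate P1-to-P2 runs apart. With only these eight rows, the final P2-to-P1 record forms a one-row group of its own; in the full data it would be merged with whatever consecutive records follow it.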

I had a look at melt/recast and have a feeling that it could be used to solve the problem, but couldn't get my head around the documentation. I suspect it's possible to do this within R, but I really don't know where to start. In a pinch I could export the data frame out and do it in e.g. Python, but I'd prefer to stay within R if possible.

Thanks in advance for any assistance you can provide.

© Stack Overflow or respective owner