Let's say I have a list of data frames ldf:
df1 <- data.frame(date = c(1,2), value = c(4,5))
df2 <- data.frame(date = c(1,2), value = c(4,5))
ldf <- list(df1, df2)
What is the best way to get the sum (or any other function) of values by date, i.e. some data frame like:
data.frame(date = c(1,2), value = c(8,10))
You could use:
library(data.table)
dt1 <- rbindlist(ldf)
setkey(dt1,'date')
dt1[,list(value=sum(value)), by='date']
date value
1: 1 8
2: 2 10
If these rows were all in the same data frame, you would use aggregate to do the sum. You can combine them with rbind so they are in the same data frame:
aggregate(value ~ date, data=do.call(rbind, ldf), FUN=sum)
date value
1 1 8
2 2 10
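Since the question asks about "any other function", here is a minimal sketch of the same rbind-then-aggregate idea wrapped in a small helper. The name summarise_by_date is made up for illustration, and it assumes every data frame has columns called date and value, as in the question:

```r
df1 <- data.frame(date = c(1, 2), value = c(4, 5))
df2 <- data.frame(date = c(1, 2), value = c(4, 5))
ldf <- list(df1, df2)

# Hypothetical helper: stack the list into one data frame, then
# apply `fun` to `value` within each `date`.
summarise_by_date <- function(ldf, fun = sum) {
  aggregate(value ~ date, data = do.call(rbind, ldf), FUN = fun)
}

by_sum  <- summarise_by_date(ldf)        # default: sum per date
by_mean <- summarise_by_date(ldf, mean)  # any summary function works
```

Passing max, median, or your own function in place of mean works the same way, as long as it returns a single value per group.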
If the date columns in all the data frames are identical (same values in the same order), you can easily use Reduce to do the sum:
Reduce(function(x, y) data.frame(date=x$date, value=x$value+y$value), ldf)
date value
1 1 8
2 2 10
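This should be a lot faster than rbind-ing the data together and aggregating, since it is just a vectorized addition with no grouping step. It does rely on the rows lining up across the data frames, though.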
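The Reduce approach quietly gives wrong answers if the date columns do not line up, so it may be worth checking first. A minimal sketch of a guarded version, again assuming columns named date and value; sum_aligned is a hypothetical helper name, not something from base R:

```r
df1 <- data.frame(date = c(1, 2), value = c(4, 5))
df2 <- data.frame(date = c(1, 2), value = c(4, 5))
ldf <- list(df1, df2)

# Hypothetical helper: element-wise sum of `value` across data frames
# whose `date` columns are identical (same values, same order).
sum_aligned <- function(ldf) {
  dates <- ldf[[1]]$date
  # Guard: stop early if any data frame has different dates.
  same_dates <- vapply(ldf, function(d) identical(d$date, dates), logical(1))
  stopifnot(all(same_dates))
  # Pull out each `value` vector and add them up pairwise.
  data.frame(date = dates, value = Reduce(`+`, lapply(ldf, `[[`, "value")))
}

res <- sum_aligned(ldf)
```

If the check fails, fall back to the rbind and aggregate version, which does not care about row order.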
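Another option is to use unnest from "tidyr" in conjunction with the typical grouping and aggregation functions via "dplyr". Note that this relies on an older "tidyr" in which unnest accepted a plain list; with current versions, bind_rows(ldf) from "dplyr" does the same stacking step: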
library(dplyr)
library(tidyr)
unnest(ldf) %>%
group_by(date) %>%
summarise(value = sum(value))
# Source: local data frame [2 x 2]
#
# date value
# 1 1 8
# 2 2 10
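As far as I know, current tidyr releases expect unnest() to be given a data frame rather than a bare list, so here is a sketch of the same pipeline that stacks the list with dplyr's bind_rows() instead. It assumes dplyr is installed and that the columns are named date and value as in the question:

```r
library(dplyr)

df1 <- data.frame(date = c(1, 2), value = c(4, 5))
df2 <- data.frame(date = c(1, 2), value = c(4, 5))
ldf <- list(df1, df2)

# bind_rows() stacks the list of data frames into one,
# then the usual group_by/summarise does the aggregation.
res <- bind_rows(ldf) %>%
  group_by(date) %>%
  summarise(value = sum(value))
```

Swap sum for any other summary function inside summarise to get a different aggregate per date.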