data.table
library(data.table)
DT = data.table(x=c("b","b","b","a","a"),v=rnorm(5))
DT
x v
1: b 0.65853652
2: b -0.57938061
3: b 0.08485302
4: a -1.67034138
5: a -0.10346345
Alternatively, an existing data.frame can be converted directly to a data.table.
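As a minimal sketch of that conversion (the data.frames DF and DF2 are hypothetical and exist only for this illustration), as.data.table() returns a converted copy, while setDT() converts the object in place:

```r
library(data.table)

# DF and DF2 are hypothetical data.frames used only for illustration
DF  <- data.frame(x = c("b", "b", "a"), v = c(1.5, 2.5, 3.5))
DF2 <- data.frame(x = c("b", "b", "a"), v = c(1.5, 2.5, 3.5))

# as.data.table() returns a new data.table and leaves DF unchanged
DT1 <- as.data.table(DF)

# setDT() converts DF2 by reference, without copying the data
setDT(DF2)

class(DT1)  # "data.table" "data.frame"
```

Because a data.table is still a data.frame, it can be passed to functions that expect a data.frame.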
1. Keys
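Keys are an important concept in data.table. A data.table can have only one key, but that key may consist of several columns. Once a key is set, data.table sorts the rows by the key columns.
> setkey(DT,x) # use column x as the key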
> DT["b",]
x v
1: b 0.65853652
2: b -0.57938061
3: b 0.08485302
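> DT["b"] # by default every row of the matching group is returned (mult="all"); mult can only be "first", "last" or "all"
# to get something else, e.g. only the first or only the last matching row, set mult accordingly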
x v
1: b 0.65853652
2: b -0.57938061
3: b 0.08485302
> DT["b",mult="first"]
x v
1: b 0.6585365
> DT["b",mult="last"]
x v
1: b 0.08485302
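The effect of setkey() on row order, and of mult on a keyed lookup, can be checked with fixed values instead of random ones. This is a small sketch; the values of v are chosen only for illustration:

```r
library(data.table)

# Fixed values (illustrative) so the results are reproducible
DT <- data.table(x = c("b", "b", "b", "a", "a"), v = c(10, 20, 30, 40, 50))

setkey(DT, x)   # sorts DT by x, by reference

key(DT)         # "x"
DT$x            # "a" "a" "b" "b" "b": the rows are now ordered by the key

# Within group "b" the original row order is preserved
first_b <- DT["b", mult = "first"]$v   # 10
last_b  <- DT["b", mult = "last"]$v    # 30
```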
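system.time() reports how long an expression takes to run.
DT = as.data.table(DF) # DF: a data.frame with columns x and y (its definition is not shown)
system.time(setkey(DT,x,y)) # use x and y together as one key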
## user system elapsed
## 0.13 0.01 0.14
system.time(ans2 <- DT[list("R","h")])
## user system elapsed
## 0.02 0.00 0.02
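Since DF is not defined in the text, the timing above cannot be reproduced as written. The following sketch builds a synthetic DF (the size n, the seed, and the column contents are assumptions made for this example) and compares a plain data.frame scan with a keyed lookup. Timings depend on the machine, so none are quoted here:

```r
library(data.table)

set.seed(1)
n <- 1e5   # hypothetical size; increase it to make the difference more visible

# Synthetic stand-in for the DF used in the text
DF <- data.frame(x = sample(LETTERS, n, replace = TRUE),
                 y = sample(letters, n, replace = TRUE),
                 v = runif(n),
                 stringsAsFactors = FALSE)

# Vector scan on the data.frame: every row is compared
t_scan <- system.time(ans1 <- DF[DF$x == "R" & DF$y == "h", ])

DT <- as.data.table(DF)
setkey(DT, x, y)   # x and y together form the key

# Keyed lookup: binary search on the sorted key columns
t_key <- system.time(ans2 <- DT[list("R", "h")])

nrow(ans1) == nrow(ans2)   # both approaches select the same rows
```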
2. Fast grouping
Next we turn to the by argument of data.table, which evaluates the j expression separately for each group.
> DT[,sum(v)]
[1] -1.609796
> DT[,sum(v),by=x]
x V1
1: a -1.7738048
2: b 0.1640089
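The result column above gets the default name V1. As a short sketch with fixed, illustrative values, the aggregates can be named with .() and several can be computed at once; keyby additionally sorts the result by the grouping column:

```r
library(data.table)

# Fixed values (illustrative) so the results are reproducible
DT <- data.table(x = c("b", "b", "b", "a", "a"), v = c(1, 2, 3, 4, 5))

# Named aggregates; .N is the number of rows in each group.
# With by, groups appear in order of first occurrence: b, then a.
res <- DT[, .(total = sum(v), n = .N), by = x]

# keyby sorts the result by x and sets x as its key: a, then b
res_sorted <- DT[, .(total = sum(v)), keyby = x]
```

Here res holds total 6 and n 3 for group b, and total 9 and n 2 for group a.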
3. Fast joins
DT[X] joins X to DT by matching the key of X (or its first column, if X has no key) against the key of DT.
Likewise, X[DT] joins DT to X.
> DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
> DT
x y v
1: a 1 1
2: a 3 2
3: a 6 3
4: b 1 4
5: b 3 5
6: b 6 6
7: c 1 7
8: c 3 8
9: c 6 9
> G = data.table(c("b","c"),foo=c(4,2))
> G
V1 foo
1: b 4
2: c 2
> setkey(DT,x) # set the key before joining
> DT[G] # join: each row of G is looked up in DT
x y v foo
1: b 1 4 4
2: b 3 5 4
3: b 6 6 4
4: c 1 7 2
5: c 3 8 2
6: c 6 9 2
> setkey(G,V1) # to look up the rows of DT in G instead, set the key of G first
> G[DT]
V1 foo y v
1: a NA 1 1 # NA where G has no matching row
2: a NA 3 2
3: a NA 6 3
4: b 4 1 4
5: b 4 3 5
6: b 4 6 6
7: c 2 1 7
8: c 2 3 8
9: c 2 6 9
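If the unmatched rows are not wanted, the nomatch argument can drop them, which turns the lookup into an inner join. A minimal sketch using the same tables as above:

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
G  <- data.table(V1 = c("b", "c"), foo = c(4, 2))
setkey(DT, x)
setkey(G, V1)

# Default: rows of DT without a match in G are kept, with NA in foo
outer <- G[DT]

# nomatch = NULL drops the rows of DT that have no match in G
inner <- G[DT, nomatch = NULL]

nrow(outer)  # 9
nrow(inner)  # 6
```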
We can also use the on argument to join on a column that has the same name in both tables; no key needs to be set:
> DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
> X = data.table(x=c("b","c"),foo=c(4,2))
> DT[X, on="x"] # join on column 'x', which exists in both tables
x y v foo
1: b 1 4 4
2: b 3 5 4
3: b 6 6 4
4: c 1 7 2
5: c 3 8 2
6: c 6 9 2
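When the join columns have different names, as with x in DT and V1 in the earlier table G, on can map one name to the other. A brief sketch, reusing those tables:

```r
library(data.table)

DT <- data.table(x = rep(c("a", "b", "c"), each = 3), y = c(1, 3, 6), v = 1:9)
G  <- data.table(V1 = c("b", "c"), foo = c(4, 2))

# Left-hand name is the column in DT, right-hand name is the column in G;
# neither table needs a key
res <- DT[G, on = c(x = "V1")]

names(res)  # "x" "y" "v" "foo"
nrow(res)   # 6
```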
We can also use the merge method that data.table provides:
> letters
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o"
[16] "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
dt1 <- data.table(A = letters[1:10], X = 1:10, key = "A")
dt2 <- data.table(A = letters[5:14], Y = 1:10, key = "A")
merge(dt1, dt2)
A X Y
1: e 5 1
2: f 6 2
3: g 7 3
4: h 8 4
5: i 9 5
6: j 10 6
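By default merge() keeps only the keys present in both tables, as the six rows above show. As a sketch of the other join types, the all.x and all arguments keep unmatched rows and fill the missing side with NA:

```r
library(data.table)

dt1 <- data.table(A = letters[1:10], X = 1:10, key = "A")
dt2 <- data.table(A = letters[5:14], Y = 1:10, key = "A")

inner <- merge(dt1, dt2)                # keys in both tables: e to j
left  <- merge(dt1, dt2, all.x = TRUE)  # every row of dt1; Y is NA for a to d
full  <- merge(dt1, dt2, all = TRUE)    # every key from either table: a to n

nrow(inner)  # 6
nrow(left)   # 10
nrow(full)   # 14
```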