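Data structures
Vectors
A vector is a one-dimensional array that can hold numeric, character, or logical values; all elements of one vector must share the same type. Vectors are built with the c() function.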
> a = c(1, 2, 5, 3, 6, -2, 4)
> b = c("one", "two", "three")
> c = c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)
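Because a vector holds a single type, c() converts mixed inputs to one common type. A minimal sketch of that coercion; the variable names mixed_num and mixed_chr are made up for illustration:

```r
# Mixing types in c() coerces every element to one common type
mixed_num <- c(1, TRUE, FALSE)   # logical values become numbers
mixed_chr <- c(1, "two", TRUE)   # everything becomes a character string

class(mixed_num)   # "numeric"
class(mixed_chr)   # "character"
mixed_chr          # "1" "two" "TRUE"
```

If different types need to live side by side, a list or a data frame is the better fit.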
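Matrices
A matrix is a two-dimensional array and can store the same data types as a vector. Matrices are built with the matrix() function.
The default is byrow=FALSE, which fills the matrix column by column.
> x = matrix(1:20, nrow=5, ncol=4, byrow=TRUE) # values 1 to 20, 5 rows, 4 columns, filled by row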
> x
[,1] [,2] [,3] [,4]
[1,] 1 2 3 4
[2,] 5 6 7 8
[3,] 9 10 11 12
[4,] 13 14 15 16
[5,] 17 18 19 20
> y = matrix(1:20, nrow=5, ncol=4, byrow=FALSE) # values 1 to 20, 5 rows, 4 columns, filled by column
> y
[,1] [,2] [,3] [,4]
[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 13 18
[4,] 4 9 14 19
[5,] 5 10 15 20
> x[2,] # returns row 2
[1] 5 6 7 8
> x[,2] # returns column 2
[1] 2 6 10 14 18
> x[1,4] # returns a single element: row 1, column 4
[1] 4
> x[2,c(2,4)] # returns several elements: columns 2 and 4 of row 2
[1] 6 8
> x[3:5, 2] # returns several elements: rows 3 to 5 of column 2
[1] 10 14 18
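# Name the rows and columns of a matrix; the number of names must match the number of rows and columns, otherwise R raises an error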
> rnames=c("apple","banana","orange","melon","corn")
> cnames=c("cat","dog","bird","pig")
> x = matrix(1:20, nrow=5, ncol=4, byrow=TRUE)
> rownames(x)=rnames
> colnames(x)=cnames
> x
cat dog bird pig
apple 1 2 3 4
banana 5 6 7 8
orange 9 10 11 12
melon 13 14 15 16
corn 17 18 19 20
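Once a matrix has row and column names, those names can be used for indexing in place of positions. A small sketch that rebuilds the named matrix from the example so it runs on its own:

```r
# Rebuild the named matrix from the example above
x <- matrix(1:20, nrow = 5, ncol = 4, byrow = TRUE)
rownames(x) <- c("apple", "banana", "orange", "melon", "corn")
colnames(x) <- c("cat", "dog", "bird", "pig")

x["banana", "dog"]             # single element, same as x[2, 2]: 6
x[c("apple", "corn"), "pig"]   # named vector: apple = 4, corn = 20
dim(x)                         # 5 4
```

Indexing by name tends to be more readable and keeps working if the rows or columns are later reordered.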
Arrays
An array is similar to a matrix but can have more than two dimensions. Arrays are built with array().
array(data = NA, dim = length(data), dimnames = NULL)
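data holds the values, dim gives the size of each dimension, and dimnames gives the names for each dimension, which must be supplied as a list.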
> dim1 = c("A1", "A2")
> dim2 = c("B1", "B2", "B3")
> dim3 = c("C1", "C2", "C3", "C4")
> dim4 = c("D1", "D2", "D3")
> z = array(1:72, c(2, 3, 4, 3), dimnames=list(dim1, dim2, dim3, dim4))
> z
, , C1, D1
B1 B2 B3
A1 1 3 5
A2 2 4 6
, , C2, D1
B1 B2 B3
A1 7 9 11
A2 8 10 12
, , C3, D1
B1 B2 B3
A1 13 15 17
A2 14 16 18
, , C4, D1
B1 B2 B3
A1 19 21 23
A2 20 22 24
, , C1, D2
B1 B2 B3
A1 25 27 29
A2 26 28 30
, , C2, D2
B1 B2 B3
A1 31 33 35
A2 32 34 36
, , C3, D2
B1 B2 B3
A1 37 39 41
A2 38 40 42
, , C4, D2
B1 B2 B3
A1 43 45 47
A2 44 46 48
, , C1, D3
B1 B2 B3
A1 49 51 53
A2 50 52 54
, , C2, D3
B1 B2 B3
A1 55 57 59
A2 56 58 60
, , C3, D3
B1 B2 B3
A1 61 63 65
A2 62 64 66
, , C4, D3
B1 B2 B3
A1 67 69 71
A2 68 70 72
> z[1,2,3,]
D1 D2 D3
15 39 63
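The values 15, 39 and 63 follow from how array() fills its data: the first index varies fastest and the last one slowest. The sketch below recomputes the position of an element in the underlying vector; linear_index is a hypothetical helper written for this note, not a base R function:

```r
dims <- c(2, 3, 4, 3)
z <- array(1:72, dims)

# Position of element idx = (i, j, k, l) in the underlying vector:
# each index is weighted by the product of the sizes of all earlier dimensions
linear_index <- function(idx, dims) {
  strides <- cumprod(c(1, dims[-length(dims)]))   # 1, 2, 6, 24
  sum((idx - 1) * strides) + 1
}

linear_index(c(1, 2, 3, 1), dims)   # 15
z[1, 2, 3, 1]                       # also 15, because z was filled with 1:72
```

Moving one step along the fourth dimension adds 24 to the position, which is why z[1,2,3,] returns 15, 39, 63.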
Data Frame
A data frame resembles a matrix and is two-dimensional, but different columns can hold different data types.
> patientID = c(1, 2, 3, 4)
> age = c(25, 34, 28, 52)
> diabetes = c("Type1", "Type2", "Type1", "Type1")
> status = c("Poor", "Improved", "Excellent", "Poor")
> patientdata = data.frame(patientID, age, diabetes, status)
> patientdata
patientID age diabetes status
1 1 25 Type1 Poor
2 2 34 Type2 Improved
3 3 28 Type1 Excellent
4 4 52 Type1 Poor
> patientdata[1:2] # returns columns 1 and 2
patientID age
1 1 25
2 2 34
3 3 28
4 4 52
> patientdata[1:3,] # returns rows 1 to 3
patientID age diabetes status
1 1 25 Type1 Poor
2 2 34 Type2 Improved
3 3 28 Type1 Excellent
> patientdata[1,1:3] # returns columns 1 to 3 of row 1
patientID age diabetes
1 1 25 Type1
> patientdata[c(1,3),1:3] # returns columns 1 to 3 of rows 1 and 3
patientID age diabetes
1 1 25 Type1
3 3 28 Type1
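Besides positional indexing, columns of a data frame can be selected by name and rows can be filtered with a logical condition. A short sketch on the same patient data; the age threshold of 30 and the name older are arbitrary choices for illustration:

```r
patientID <- c(1, 2, 3, 4)
age <- c(25, 34, 28, 52)
diabetes <- c("Type1", "Type2", "Type1", "Type1")
status <- c("Poor", "Improved", "Excellent", "Poor")
patientdata <- data.frame(patientID, age, diabetes, status)

patientdata$age                                # the age column as a plain vector
patientdata[c("diabetes", "status")]           # select columns by name
older <- patientdata[patientdata$age > 30, ]   # keep rows where age exceeds 30
older$patientID                                # 2 4
```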
> swim = read.csv("http://www.macalester.edu/~kaplan/ISM/datasets/swim100m.csv")
> swim
year time sex
1 1905 65.80 M
2 1908 65.60 M
3 1910 62.80 M
4 1912 61.60 M
5 1918 61.40 M
6 1920 60.40 M
7 1922 58.60 M
8 1924 57.40 M
9 1934 56.80 M
10 1935 56.60 M
11 1936 56.40 M
12 1944 55.90 M
13 1947 55.80 M
14 1948 55.40 M
15 1955 54.80 M
16 1957 54.60 M
17 1961 53.60 M
18 1964 52.90 M
19 1967 52.60 M
20 1968 52.20 M
21 1970 51.90 M
22 1972 51.22 M
23 1975 50.59 M
24 1976 49.44 M
25 1981 49.36 M
26 1985 49.24 M
27 1986 48.74 M
28 1988 48.42 M
29 1994 48.21 M
30 2000 48.18 M
31 2000 47.84 M
32 1908 95.00 F
33 1910 86.60 F
34 1911 84.60 F
35 1912 78.80 F
36 1915 76.20 F
37 1920 73.60 F
38 1923 72.80 F
39 1924 72.20 F
40 1926 70.00 F
41 1929 69.40 F
42 1930 68.00 F
43 1931 66.60 F
44 1933 66.00 F
45 1934 65.40 F
46 1936 64.60 F
47 1956 62.00 F
48 1958 61.20 F
49 1960 60.20 F
50 1962 59.50 F
51 1964 58.90 F
52 1972 58.50 F
53 1973 57.54 F
54 1974 56.96 F
55 1976 55.65 F
56 1978 55.41 F
57 1980 54.79 F
58 1986 54.73 F
59 1992 54.48 F
60 1994 54.01 F
61 2000 53.77 F
62 2004 53.52 F
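attach() and detach()
attach() adds a data frame to the search path so that its columns can be referenced by name alone; detach() removes it again.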
> mtcars
mpg cyl disp hp drat
Mazda RX4 21.0 6 160.0 110 3.90
Mazda RX4 Wag 21.0 6 160.0 110 3.90
Datsun 710 22.8 4 108.0 93 3.85
Hornet 4 Drive 21.4 6 258.0 110 3.08
Hornet Sportabout 18.7 8 360.0 175 3.15
Valiant 18.1 6 225.0 105 2.76
Duster 360 14.3 8 360.0 245 3.21
Merc 240D 24.4 4 146.7 62 3.69
Merc 230 22.8 4 140.8 95 3.92
Merc 280 19.2 6 167.6 123 3.92
Merc 280C 17.8 6 167.6 123 3.92
Merc 450SE 16.4 8 275.8 180 3.07
Merc 450SL 17.3 8 275.8 180 3.07
Merc 450SLC 15.2 8 275.8 180 3.07
Cadillac Fleetwood 10.4 8 472.0 205 2.93
Lincoln Continental 10.4 8 460.0 215 3.00
Chrysler Imperial 14.7 8 440.0 230 3.23
Fiat 128 32.4 4 78.7 66 4.08
Honda Civic 30.4 4 75.7 52 4.93
Toyota Corolla 33.9 4 71.1 65 4.22
Toyota Corona 21.5 4 120.1 97 3.70
Dodge Challenger 15.5 8 318.0 150 2.76
AMC Javelin 15.2 8 304.0 150 3.15
Camaro Z28 13.3 8 350.0 245 3.73
Pontiac Firebird 19.2 8 400.0 175 3.08
Fiat X1-9 27.3 4 79.0 66 4.08
Porsche 914-2 26.0 4 120.3 91 4.43
Lotus Europa 30.4 4 95.1 113 3.77
Ford Pantera L 15.8 8 351.0 264 4.22
Ferrari Dino 19.7 6 145.0 175 3.62
Maserati Bora 15.0 8 301.0 335 3.54
Volvo 142E 21.4 4 121.0 109 4.11
wt qsec vs am gear
Mazda RX4 2.620 16.46 0 1 4
Mazda RX4 Wag 2.875 17.02 0 1 4
Datsun 710 2.320 18.61 1 1 4
Hornet 4 Drive 3.215 19.44 1 0 3
Hornet Sportabout 3.440 17.02 0 0 3
Valiant 3.460 20.22 1 0 3
Duster 360 3.570 15.84 0 0 3
Merc 240D 3.190 20.00 1 0 4
Merc 230 3.150 22.90 1 0 4
Merc 280 3.440 18.30 1 0 4
Merc 280C 3.440 18.90 1 0 4
Merc 450SE 4.070 17.40 0 0 3
Merc 450SL 3.730 17.60 0 0 3
Merc 450SLC 3.780 18.00 0 0 3
Cadillac Fleetwood 5.250 17.98 0 0 3
Lincoln Continental 5.424 17.82 0 0 3
Chrysler Imperial 5.345 17.42 0 0 3
Fiat 128 2.200 19.47 1 1 4
Honda Civic 1.615 18.52 1 1 4
Toyota Corolla 1.835 19.90 1 1 4
Toyota Corona 2.465 20.01 1 0 3
Dodge Challenger 3.520 16.87 0 0 3
AMC Javelin 3.435 17.30 0 0 3
Camaro Z28 3.840 15.41 0 0 3
Pontiac Firebird 3.845 17.05 0 0 3
Fiat X1-9 1.935 18.90 1 1 4
Porsche 914-2 2.140 16.70 0 1 5
Lotus Europa 1.513 16.90 1 1 5
Ford Pantera L 3.170 14.50 0 1 5
Ferrari Dino 2.770 15.50 0 1 5
Maserati Bora 3.570 14.60 0 1 5
Volvo 142E 2.780 18.60 1 1 4
carb
Mazda RX4 4
Mazda RX4 Wag 4
Datsun 710 1
Hornet 4 Drive 1
Hornet Sportabout 2
Valiant 1
Duster 360 4
Merc 240D 2
Merc 230 2
Merc 280 4
Merc 280C 4
Merc 450SE 3
Merc 450SL 3
Merc 450SLC 3
Cadillac Fleetwood 4
Lincoln Continental 4
Chrysler Imperial 4
Fiat 128 1
Honda Civic 2
Toyota Corolla 1
Toyota Corona 1
Dodge Challenger 2
AMC Javelin 2
Camaro Z28 4
Pontiac Firebird 2
Fiat X1-9 1
Porsche 914-2 2
Lotus Europa 2
Ford Pantera L 4
Ferrari Dino 6
Maserati Bora 8
Volvo 142E 2
> mpg
Error: object 'mpg' not found
> attach(mtcars)
> mpg
[1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4
[9] 22.8 19.2 17.8 16.4 17.3 15.2 10.4 10.4
[17] 14.7 32.4 30.4 33.9 21.5 15.5 15.2 13.3
[25] 19.2 27.3 26.0 30.4 15.8 19.7 15.0 21.4
> detach(mtcars)
> mpg
Error: object 'mpg' not found
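An alternative worth knowing is with(), which evaluates one expression inside a data frame without changing the search path, so there is nothing to detach afterwards. This is a sketch of that alternative, not something the example above requires; avg_mpg and heavy are illustrative names:

```r
# with() looks up column names in mtcars only while the expression is evaluated
avg_mpg <- with(mtcars, mean(mpg))
heavy <- with(mtcars, mpg[wt > 5])   # mpg of the cars whose wt exceeds 5

avg_mpg
heavy        # 10.4 10.4 14.7 (Cadillac Fleetwood, Lincoln Continental, Chrysler Imperial)
```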
Lists
A list is the most flexible data structure: it can store objects of any type, with no restriction on their dimensions or lengths.
> dat = data.frame(swim, x)
Error in data.frame(swim, x) : arguments imply differing number of rows: 62, 5
> mylist = list(patientdata, swim, x)
> mylist
[[1]]
patientID age diabetes status
1 1 25 Type1 Poor
2 2 34 Type2 Improved
3 3 28 Type1 Excellent
4 4 52 Type1 Poor
[[2]]
year time sex
1 1905 65.80 M
2 1908 65.60 M
3 1910 62.80 M
4 1912 61.60 M
5 1918 61.40 M
6 1920 60.40 M
7 1922 58.60 M
8 1924 57.40 M
9 1934 56.80 M
10 1935 56.60 M
11 1936 56.40 M
12 1944 55.90 M
13 1947 55.80 M
14 1948 55.40 M
15 1955 54.80 M
16 1957 54.60 M
17 1961 53.60 M
18 1964 52.90 M
19 1967 52.60 M
20 1968 52.20 M
21 1970 51.90 M
22 1972 51.22 M
23 1975 50.59 M
24 1976 49.44 M
25 1981 49.36 M
26 1985 49.24 M
27 1986 48.74 M
28 1988 48.42 M
29 1994 48.21 M
30 2000 48.18 M
31 2000 47.84 M
32 1908 95.00 F
33 1910 86.60 F
34 1911 84.60 F
35 1912 78.80 F
36 1915 76.20 F
37 1920 73.60 F
38 1923 72.80 F
39 1924 72.20 F
40 1926 70.00 F
41 1929 69.40 F
42 1930 68.00 F
43 1931 66.60 F
44 1933 66.00 F
45 1934 65.40 F
46 1936 64.60 F
47 1956 62.00 F
48 1958 61.20 F
49 1960 60.20 F
50 1962 59.50 F
51 1964 58.90 F
52 1972 58.50 F
53 1973 57.54 F
54 1974 56.96 F
55 1976 55.65 F
56 1978 55.41 F
57 1980 54.79 F
58 1986 54.73 F
59 1992 54.48 F
60 1994 54.01 F
61 2000 53.77 F
62 2004 53.52 F
[[3]]
cat dog bird pig
apple 1 2 3 4
banana 5 6 7 8
orange 9 10 11 12
melon 13 14 15 16
corn 17 18 19 20
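Components of a list can be pulled out by position with [[ ]] or, if they are named, with $. A minimal sketch with a small made-up list (the names title, scores and grid are assumptions for illustration):

```r
scores <- c(90, 85, 77)
grid <- matrix(1:6, nrow = 2)
mylist <- list(title = "demo", scores = scores, grid = grid)

mylist[[2]]          # second component: the numeric vector
mylist$grid[2, 3]    # row 2, column 3 of the stored matrix: 6
mylist["title"]      # single brackets return a list of length 1, not the string
length(mylist)       # 3
```

The difference between [ ] and [[ ]] is the usual stumbling block: the former keeps the list wrapper, the latter returns the component itself.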
Graphics
Graphical parameters
par()
Sets the graphical parameters.
> par(mfrow=c(2,2)) # set up a grid of two rows and two columns
> plot(rnorm(50),pch=17)
> plot(rnorm(20),type="l",lty=5)
> plot(rnorm(100),cex=0.5)
> plot(rnorm(200),lwd=2)
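Parameters:
- pch: the point symbol
- cex: the size of the points relative to the default; 1 is the default size, 0.5 is half of it, 1.5 is one and a half times
- lty: the line type
- lwd: the line width relative to the default; it applies to anything that is drawn with a stroke, not only to lines; 1 is the default width, 0.5 is half of it, 1.5 is one and a half times
- title(): adds titles
- axis(): adds axes
- legend(): adds a legend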
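The parameters and helper functions listed above can be combined in a single plot. The following is only a sketch: the seed, the labels and the tick positions are arbitrary choices, and par() is saved and restored so the settings do not leak into later plots.

```r
set.seed(1)                  # arbitrary seed so the plot is reproducible
y <- rnorm(20)

op <- par(mfrow = c(1, 1))   # par() returns the previous settings
plot(y, type = "b", pch = 17, cex = 1.5, lty = 5, lwd = 2,
     axes = FALSE, ann = FALSE)            # suppress default axes and labels
axis(1, at = seq(0, 20, by = 5))           # x axis with a tick every 5 units
axis(2)                                    # default y axis
box()                                      # frame around the plot region
title(main = "20 random normal values", xlab = "Index", ylab = "Value")
legend("topright", legend = "rnorm(20)", pch = 17, lty = 5, lwd = 2)
par(op)                      # restore the previous settings
```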
Reference blogs
https://www.cnblogs.com/xudongliang/p/6757195.html
https://www.cnblogs.com/xudongliang/p/6762618.html
https://blog.csdn.net/myl1992/article/details/45826931/
layout()
> attach(mtcars)
> layout(matrix(c(1,1,2,3), 2, 2, byrow = TRUE)) # two rows and two columns
> hist(wt) # occupies cells (1,1) and (1,2)
> hist(mpg) # occupies cell (2,1)
> hist(disp) # occupies cell (2,2)
> detach(mtcars)
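To check which region each plot will land in, layout.show() can outline the regions before anything is drawn. A sketch of the same layout, written with mtcars$ so that attach() is not needed; nf is just an illustrative name:

```r
# layout() returns the number of figure regions defined by the matrix
nf <- layout(matrix(c(1, 1, 2, 3), 2, 2, byrow = TRUE))
layout.show(nf)        # outline and number the three regions

hist(mtcars$wt)        # region 1: the whole top row
hist(mtcars$mpg)       # region 2: bottom left
hist(mtcars$disp)      # region 3: bottom right

layout(1)              # back to a single-figure layout
```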