I. Data structures
1. Creating a vector (one-dimensional array)
a1 <- c(1,2,3,4,5)
print(a1)
[1] 1 2 3 4 5
a2 <-c("山东","农业大学")
print(a2)
print(a2[2])
[1] "山东" "农业大学"
[1] "农业大学"
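As a small supplementary sketch (not part of the original notes), the lines below show a few more ways to index a vector and how arithmetic applies to every element at once. The variable names other than a1 are made up for this illustration.

```r
a1 <- c(1, 2, 3, 4, 5)

# Index by a range of positions, by a logical condition, or by exclusion
first_two <- a1[1:2]      # elements 1 and 2
big_ones  <- a1[a1 > 3]   # elements greater than 3
no_first  <- a1[-1]       # everything except the first element

# Arithmetic is vectorized: it is applied element by element
doubled <- a1 * 2

print(first_two)
print(big_ones)
print(no_first)
print(doubled)
```

Note that R indexes from 1, and a negative index removes elements rather than counting from the end.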
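2. Creating a matrix (two-dimensional array)
# Format
a1 <- matrix(data = NA,nrow = 1,ncol = 1,byrow = FALSE,dimnames = NULL)
# data: a vector holding the elements of the matrix
# nrow, ncol: the number of rows and columns
# byrow: logical; TRUE fills the matrix row by row, FALSE (the default) column by column
# dimnames: NULL, or a list of two vectors giving the row names and the column names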
a1 <- matrix(1:20,nrow = 4,ncol = 5,byrow = TRUE)
print(a1)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2    3    4    5
[2,]    6    7    8    9   10
[3,]   11   12   13   14   15
[4,]   16   17   18   19   20
a2 <- matrix(1:20,nrow = 4,ncol = 5,byrow = FALSE)
print(a2)
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
Accessing matrix elements
> print(a1[3,])
[1] 11 12 13 14 15
> print(a1[,2])
[1] 2 7 12 17
> print(a1[3,c(2,3)])
[1] 12 13
> print(a1[,c(2,3)])
     [,1] [,2]
[1,]    2    3
[2,]    7    8
[3,]   12   13
[4,]   17   18
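The dimnames argument is described above but not demonstrated, so here is a minimal sketch of it. The row names "r1" to "r4" and column names "c1" to "c5" are invented for this example.

```r
# dimnames is a list: first the row names, then the column names.
# The names "r1".."r4" and "c1".."c5" are made up for this illustration.
m <- matrix(1:20, nrow = 4, ncol = 5, byrow = TRUE,
            dimnames = list(paste0("r", 1:4), paste0("c", 1:5)))

print(m["r3", ])               # third row, selected by name
print(m[, "c2"])               # second column, selected by name
print(m["r3", c("c2", "c3")])  # two cells of the third row

# drop = FALSE keeps the result a matrix even when a single row is selected
row3 <- m["r3", , drop = FALSE]
print(dim(row3))
```

Selecting by name tends to be more robust than selecting by position when columns may be reordered later.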
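3. Arrays (three-dimensional arrays)
# Format
a1 <- array(data = NA,dim = length(data),dimnames = NULL)
# data: a vector holding the data in the array
# dim: a numeric vector giving the size of each dimension, i.e. the largest index along it
# dimnames: an optional list holding the name labels for each dimension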
a1 <- array(1:24,c(2,3,4))
print(a1)
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18

, , 4

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24

> a1[2,2,2]
[1] 10
> a1[2,2,]
[1] 4 10 16 22
x <- c("x1","x2")
y <- c("y1","y2","y3")
z <- c("z1","z2","z3","z4")
a1 <- array(1:24,c(2,3,4),dimnames = list(x,y,z))
a1
, , z1

   y1 y2 y3
x1  1  3  5
x2  2  4  6

, , z2

   y1 y2 y3
x1  7  9 11
x2  8 10 12

, , z3

   y1 y2 y3
x1 13 15 17
x2 14 16 18

, , z4

   y1 y2 y3
x1 19 21 23
x2 20 22 24

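Once an array has dimension names, its elements can be selected by name as well as by position. The sketch below rebuilds the named array from above and also uses apply() to summarise each slice along the third dimension; slice_sums is a name chosen for this illustration.

```r
x <- c("x1", "x2")
y <- c("y1", "y2", "y3")
z <- c("z1", "z2", "z3", "z4")
a1 <- array(1:24, c(2, 3, 4), dimnames = list(x, y, z))

# Select by name instead of by position
print(a1["x2", "y2", "z2"])   # same element as a1[2, 2, 2]
print(a1["x2", "y2", ])       # one value per z level

# apply() over the third dimension: the sum of each 2 x 3 slice
slice_sums <- apply(a1, 3, sum)
print(slice_sums)
```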
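4. Data frames (tabular data)
# Format
data.frame(...,row.names = NULL,check.rows = FALSE,
check.names = TRUE,fix.empty.names = TRUE,
stringsAsFactors = FALSE)
# ...: the columns, given as vectors of any type (integer, double, logical, character, and so on)
# row.names: NULL, or a vector giving the row names
# check.rows: logical; TRUE checks the rows for consistency of length and names
# check.names: logical; TRUE checks that the column names are valid and not duplicated
# fix.empty.names: logical; TRUE automatically fills in names for unnamed columns
# stringsAsFactors: logical; TRUE stores character columns as factors (the default is FALSE since R 4.0.0; older versions used default.stringsAsFactors())
Creating a student data frame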
number <- c(20210001,20210002,20210003,20210004,20210005)
name <- c("张三","李四","王五","常六","胡七")
age <-c(19,20,18,19,21)
sex <-c("女","男","女","女","男")
student <- data.frame(number,name,age,sex)
student
number name age sex
1 20210001 张三 19 女
2 20210002 李四 20 男
3 20210003 王五 18 女
4 20210004 常六 19 女
5 20210005 胡七 21 男
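A few common ways of inspecting and subsetting the student data frame are sketched below as a supplement. The data are the same as above; the variable older is a name introduced only for this example.

```r
number <- c(20210001, 20210002, 20210003, 20210004, 20210005)
name <- c("张三", "李四", "王五", "常六", "胡七")
age <- c(19, 20, 18, 19, 21)
sex <- c("女", "男", "女", "女", "男")
student <- data.frame(number, name, age, sex)

str(student)            # column types and a short preview
print(student$age)      # one column, returned as a vector
print(student[2, ])     # the second row

# Filter rows with a logical condition and keep only two columns
older <- student[student$age >= 20, c("name", "age")]
print(older)

print(mean(student$age))
```

The pattern student[rows, columns] mirrors matrix indexing, while $ extracts a single column by name.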
5. Lists
A list is similar to a data frame, but its components may have different lengths and types.
number <- c(20210001,20210002,20210003,20210004,20210005)
name <- c("张三","李四","王五","常六","胡七")
age <-c(19,20,18,19,21)
sex <-c("女","男","女","女","男")
list1 <- list(number = number,name = name,age = age,sex = sex)
print(list1)
$number
[1] 20210001 20210002 20210003 20210004 20210005

$name
[1] "张三" "李四" "王五" "常六" "胡七"

$age
[1] 19 20 18 19 21

$sex
[1] "女" "男" "女" "女" "男"

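To illustrate how list components are accessed, here is a short sketch. The school and passed components and their values are hypothetical, added only to show that components need not have the same length or type.

```r
age <- c(19, 20, 18, 19, 21)
# "school" is a made-up extra component, added only to show unequal lengths
list1 <- list(age = age, school = "SDAU")

print(list1$age)             # $ extracts a component by name
print(list1[["age"]][2])     # [[ ]] also extracts the component itself
print(class(list1["age"]))   # [ ] returns a sub-list, not the vector

# Components can be added after the list is created (hypothetical values)
list1$passed <- c(TRUE, FALSE)
print(names(list1))
print(lengths(list1))        # number of elements in each component
```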
II. Storing and reading data
2.1 Writing files
2.1.1 CSV files
number <- c(20210001,20210002,20210003,20210004,20210005)
name <- c("scott","jack","tom","daming","xibaozhi")
age <- c(19,20,18,19,21)
sex <- c("girl","boy","girl","boy","boy")
student <- data.frame(number,name,age,sex)
write.csv(student,"E:\\student.csv",row.names = FALSE)
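To check what write.csv() actually produces without touching a fixed drive path, one option is to write to a temporary file and read it back. This is a sketch under that assumption; path and back are names chosen for the illustration, and only three of the rows are used.

```r
student <- data.frame(
  number = c(20210001, 20210002, 20210003),
  name   = c("scott", "jack", "tom"),
  stringsAsFactors = FALSE
)

# tempfile() gives a throwaway path; replace it with a real path as needed
path <- tempfile(fileext = ".csv")
write.csv(student, path, row.names = FALSE)

print(readLines(path))   # the raw text that was written

back <- read.csv(path, stringsAsFactors = FALSE)
print(back)

unlink(path)             # delete the temporary file
```

With row.names = FALSE the file contains only the data columns; otherwise an extra unnamed first column of row numbers is written.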
2.2 Reading files
2.2.1 CSV files
read.csv("E:\\编程语言学习\\R语言学习\\weather1.csv", header = TRUE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "")
   precipitation cloud.cover solar.radiation Direct.radiation
1           0.25        0.20       2101007.9       1096400.81
2           0.00        0.26       2556632.9       1694003.33
3           0.00        0.99       1847945.9        817321.61
4           0.00        1.00       2153495.5        799061.95
5           2.50        1.00        369740.4          9546.60
6           0.00        0.46       2517514.9       1410573.42
7           0.00        0.00       3283947.0       2646156.76
8           0.00        0.39       3099281.6       2343757.00
9           0.18        1.00       2085675.1        900816.49
10          0.01        0.98       1670588.7        535728.73
11          4.93        1.00        305388.2         29720.95
12          0.00        0.53       2847285.9       1967227.04
13          0.00        0.15       3242648.5       2583893.18
14          0.00        0.76       2839078.1       1908628.28
15          0.00        0.82       2876004.4       2103748.27
16          0.00        0.94       2614186.1       1463502.07
17          1.01        1.00        614758.7         38135.31
18          0.02        1.00       1830789.3        682478.36
19          0.00        1.00       2843533.8       1954438.08
20          0.07        0.32       1637820.0        743462.77
21          0.00        0.00       3192254.5       2539045.51
22          0.00        0.30       2816551.3       2028372.93
23          0.19        0.61       2367547.1       1415872.30
24          0.08        1.00        879634.4          2841.58
25          0.07        0.29       2020917.1        931558.72
26          0.29        0.54       2218739.8       1461940.28
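After reading a CSV file it is usually worth checking the column types and taking a quick look at the contents. The sketch below does this with the first three rows of the table above, typed inline through the text argument of read.csv() so that it runs without the original file; weather1 and rainy are names assumed for the illustration.

```r
# First three rows of the table above, typed inline so the example runs without the file
csv_text <- "precipitation,cloud.cover,solar.radiation,Direct.radiation
0.25,0.20,2101007.9,1096400.81
0.00,0.26,2556632.9,1694003.33
0.00,0.99,1847945.9,817321.61"

weather1 <- read.csv(text = csv_text, header = TRUE)

str(weather1)              # column types and a short preview
print(head(weather1, 2))   # the first two rows
print(colnames(weather1))

# Keep only the rows with some precipitation
rainy <- weather1[weather1$precipitation > 0, ]
print(nrow(rainy))
```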
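III. Plotting
For example, I have a table of the hourly ideal solar irradiance and the actual solar irradiance in Weicheng District, Weifang, for July 2022. I want to draw it as a scatter plot and add a regression line. The code is as follows:
library(ggplot2)
# weather is the table described above; sun and sun_in_earth are its two irradiance columns
ggplot(data = weather,aes(x = sun,y = sun_in_earth))+
  geom_point()+
  # the default smoother is a local fit (loess) below 1000 points; use geom_smooth(method = "lm") for a straight regression line
  geom_smooth()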
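For readers who want the numbers behind a straight regression line, the following base R sketch fits a linear model and overlays it on a scatter plot. The data here are synthetic stand-ins generated for illustration, not the real Weifang measurements; only the column names sun and sun_in_earth are taken from the code above.

```r
# Made-up data standing in for the real table; values are NOT from the source
set.seed(1)
sun <- seq(0, 1000, length.out = 50)
weather <- data.frame(sun = sun,
                      sun_in_earth = 0.7 * sun + rnorm(50, sd = 20))

# Fit a straight line: sun_in_earth = intercept + slope * sun
fit <- lm(sun_in_earth ~ sun, data = weather)
print(coef(fit))    # the fitted intercept and slope

# Scatter plot with the fitted line drawn on top
plot(weather$sun, weather$sun_in_earth, xlab = "sun", ylab = "sun_in_earth")
abline(fit, col = "blue")
```

summary(fit) additionally reports standard errors and R-squared, which can help judge how well the line describes the data.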