R Language - Basic Data Structures

Vectors, matrices, and arrays can only store data of a single type.
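
The single-type rule is easiest to see by mixing types in one call to c(). The following is a minimal sketch; the variable names are made up for illustration.

```r
# Sketch: how c() coerces mixed inputs to a single type.
v_num <- c(1, 2, 3)         # all numeric
v_mix <- c(1, "a", TRUE)    # numbers and logicals become character
v_lgl <- c(1, TRUE, FALSE)  # logicals become numeric (TRUE -> 1, FALSE -> 0)

print(class(v_num))  # "numeric"
print(class(v_mix))  # "character"
print(v_mix)         # "1" "a" "TRUE"
print(v_lgl)         # 1 1 0
```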

Vectors

Creating vectors

> a <- c(1,3);
> print(a);
[1] 1 3
> b <- seq(from=1,to=10, by=2);
> print(b);
[1] 1 3 5 7 9
> d <- rep(0,5);
> print(d);
[1] 0 0 0 0 0
> labs <- paste(c("X","Y"), 1:10, sep="") ;
> print(labs);
 [1] "X1"  "Y2"  "X3"  "Y4"  "X5"  "Y6"  "X7"  "Y8"  "X9"  "Y10"

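Inserting and deleting elements

> a <- c(2,3,4,5,6,7)
> append(a,10)   # append 10 at the end
[1]  2  3  4  5  6  7 10
# a itself is unchanged here: the result was not assigned back, so a is still c(2,3,4,5,6,7)
> a <- append(a,10)
> print(a)
[1]  2  3  4  5  6  7 10
# only now is a equal to c(2,3,4,5,6,7,10)
# the same holds for replacing and deleting below: without assignment nothing changes

> append(a,8,after=3)   # insert 8 after the 3rd element (the result is not assigned back)
[1]  2  3  4  8  5  6  7 10

> a <- a[-c(2:3)]   # delete the 2nd and 3rd elements
> print(a)
[1]  2  5  6  7 10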
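
As a small sketch of the same ideas, the snippet below inserts at the front with after = 0 and deletes by value instead of by position. The names a_front, a_drop, and a_keep are illustrative only.

```r
a <- c(2, 3, 4, 5, 6, 7)

a_after <- append(a, 8, after = 3)  # 2 3 4 8 5 6 7
a_front <- append(a, 0, after = 0)  # after = 0 inserts before the first element
a_drop  <- a[-c(2, 3)]              # drop positions 2 and 3 -> 2 5 6 7
a_keep  <- a[a != 5]                # drop by value -> 2 3 4 6 7

print(a)  # still 2 3 4 5 6 7, because nothing was assigned back to a
```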

Replacing elements

> replace(a,2,0)   # set the 2nd element to 0 (returns a modified copy; a is unchanged)
[1]  2  0  6  7 10
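
For comparison, here is a short sketch showing that replace() is equivalent to index assignment on a copy, and that it accepts several positions at once. The names b, a2, and d are illustrative.

```r
a <- c(2, 5, 6, 7, 10)

b <- replace(a, 2, 0)                 # returns a copy with element 2 set to 0
a2 <- a
a2[2] <- 0                            # index assignment changes a2 directly
d <- replace(a, c(1, 5), c(-1, -10))  # replace positions 1 and 5 in one call

print(identical(b, a2))  # TRUE
print(d)                 # -1 5 6 7 -10
```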

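Functions

> a <- c(1:10)
> sum(a)   # sum
[1] 55
> max(a)
[1] 10
> min(a)
[1] 1
> range(a)   # minimum and maximum
[1]  1 10
> mean(a)   # mean
[1] 5.5
> var(a)   # sample variance: sum((a-mean(a))^2)/(n-1), where n is the number of elements in a
[1] 9.166667
> sort(a)   # sort in ascending order
 [1]  1  2  3  4  5  6  7  8  9 10
> rev(a)   # reverse the order (descending here only because a is already ascending)
 [1] 10  9  8  7  6  5  4  3  2  1
> prod(a)   # product of all elements
[1] 3628800
> prod(1:10)   # prod(1:n) is n factorial; a holds 1 to 10, so the result is the same as above
[1] 3628800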
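
The variance formula and the factorial remark can be checked directly. This is only a verification sketch; manual_var is an illustrative name.

```r
a <- 1:10
n <- length(a)

# Sample variance computed by hand, using the n - 1 denominator.
manual_var <- sum((a - mean(a))^2) / (n - 1)
print(all.equal(manual_var, var(a)))  # TRUE

# prod(1:n) equals n!, while prod(n:m) is just the product of n..m.
print(prod(1:10) == factorial(10))    # TRUE
print(prod(3:5))                      # 60

# For a guaranteed descending order, ask sort() for it explicitly.
desc <- sort(c(3, 1, 2), decreasing = TRUE)
print(desc)                           # 3 2 1
```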

Strings

Defining a string

> s <- 'hello world!'
> strsplit(s,' ')   # split s on ' '; the result is a list, and the pieces no longer contain the separator
[[1]]
[1] "hello"  "world!"

# unlist() turns that list into a character vector, which is easier to work with later
> unlist(strsplit(s,' '))
[1] "hello"  "world!"
> unlist(strsplit(s,' '))[1]
[1] "hello"
> unlist(strsplit(s,' '))[2]
[1] "world!"
# s itself is unchanged
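
One detail worth a sketch: strsplit() treats the separator as a regular expression unless fixed = TRUE is given, which matters for characters such as ".". The variable names below are illustrative.

```r
s <- "a.b.c"

# "." as a regular expression matches every character,
# so only empty strings are left.
regex_split <- unlist(strsplit(s, "."))

# fixed = TRUE treats "." literally.
fixed_split <- unlist(strsplit(s, ".", fixed = TRUE))
print(fixed_split)  # "a" "b" "c"

# strsplit() is vectorised: one list element per input string.
words <- strsplit(c("hello world!", "this is r"), " ")
print(lengths(words))  # 2 3
```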

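Concatenating strings

> s2 <- 'this is r'
> paste(s,s2)   # by default paste() puts a space between the strings
[1] "hello world! this is r"

> paste(s,s2,sep='')   # an empty sep joins them with nothing in between; any other separator can be given
[1] "hello world!this is r"
> paste0(s,s2)   # paste0() always joins without a separator and has no sep argument
[1] "hello world!this is r"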
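
A brief sketch of the difference between sep (between arguments) and collapse (between elements of one vector), which the examples above do not cover. The names are illustrative.

```r
parts <- c("hello", "world!")

joined  <- paste(parts, collapse = " ")  # one string: "hello world!"
dashed  <- paste("a", "b", sep = "-")    # "a-b"
labels  <- paste0("X", 1:3)              # vectorised: "X1" "X2" "X3"

print(joined)
print(dashed)
print(labels)
```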

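String length

> s
[1] "hello world!"
> length(s)   # length of the vector, i.e. the number of elements
[1] 1
> nchar(s)   # number of characters in each element
[1] 12
# the following example makes the difference clearer
> s3 <- unlist(strsplit(s, ' '))
> s3
[1] "hello"  "world!"
> length(s3)
[1] 2
> nchar(s3)
[1] 5 6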

Substrings

> substr(s,2,4)   # substr(s,start,stop) extracts characters start through stop of s
[1] "ell"

> substring(s,2,4)
[1] "ell"
> substring(s,2)   # in substring(s,first,last), last defaults to a very large number, i.e. the end of the string
[1] "ello world!"
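
As a further sketch, substr() also has a replacement form, and substring() is vectorised over its positions. The variable windows is an illustrative name.

```r
s <- "hello world!"

# Replacement form: overwrite characters 1..1 of s.
substr(s, 1, 1) <- "H"
print(s)  # "Hello world!"

# Vectorised start/stop positions give several substrings at once.
windows <- substring(s, 1:3, 3:5)
print(windows)  # "Hel" "ell" "llo"
```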

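Matrices

A two-dimensional table whose elements all have the same type; row and column names can be changed

> mat1 <- matrix(c(1:12),nrow=3,ncol =4)
> # define a matrix; the values are filled in column by column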
> mat1
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> dim(mat1)
> # get the dimensions (rows, columns)
[1] 3 4

Matrix arithmetic

> mat1*2
> # multiplying a matrix by a constant k multiplies every element by k
     [,1] [,2] [,3] [,4]
[1,]    2    8   14   20
[2,]    4   10   16   22
[3,]    6   12   18   24

> mat1+1
> # in R, as in MATLAB, adding a constant k to a matrix adds k to every element
     [,1] [,2] [,3] [,4]
[1,]    2    5    8   11
[2,]    3    6    9   12
[3,]    4    7   10   13

> mat1
     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
> mat2 <- matrix(c(1:16),nrow=4)
> # define a second matrix, mat2
> # the product A %*% B requires ncol(A) == nrow(B)
> mat2
     [,1] [,2] [,3] [,4]
[1,]    1    5    9   13
[2,]    2    6   10   14
[3,]    3    7   11   15
[4,]    4    8   12   16
> mat1 %*% mat2
> # %*% is the matrix multiplication operator
     [,1] [,2] [,3] [,4]
[1,]   70  158  246  334
[2,]   80  184  288  392
[3,]   90  210  330  450
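
The conformability rule can be checked in code. The sketch below also contrasts %*% with the element-wise * operator; the names elementwise and res are illustrative.

```r
mat1 <- matrix(1:12, nrow = 3, ncol = 4)
mat2 <- matrix(1:16, nrow = 4)

print(ncol(mat1) == nrow(mat2))  # TRUE, so mat1 %*% mat2 is defined
print(dim(mat1 %*% mat2))        # 3 4
print(dim(t(mat1)))              # 4 3 (transpose)

# * multiplies element by element and needs equal dimensions.
elementwise <- mat1 * mat1

# The reverse product is not conformable (4 columns vs 3 rows),
# so R raises an error; tryCatch turns it into a message here.
res <- tryCatch(mat2 %*% mat1, error = function(e) "non-conformable")
print(res)
```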

Matrix indexing

> colnames(mat1) <- c('序号','参数1','参数2','结果')
> # rename the columns (序号 = index, 参数1/参数2 = parameter 1/2, 结果 = result)
> mat1
     序号 参数1 参数2 结果
[1,]    1     4     7   10
[2,]    2     5     8   11
[3,]    3     6     9   12

> mat1[2,3]
> # extract a single element (row 2, column 3)
参数2 
    8 
> mat1[2,]
> # extract the second row
 序号 参数1 参数2  结果 
    2     5     8    11 
> mat1[,2]
> # extract the second column
[1] 4 5 6
> mat1[c(1:2),c(2:4)]
> # extract a sub-matrix: rows 1-2 by columns 2-4
     参数1 参数2 结果
[1,]     4     7   10
[2,]     5     8   11
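
Two indexing details are sketched below: columns can be selected by name, and drop = FALSE keeps a single row or column as a matrix instead of a plain vector. The names col_vec and col_mat are illustrative.

```r
mat1 <- matrix(1:12, nrow = 3, ncol = 4)
colnames(mat1) <- c("序号", "参数1", "参数2", "结果")

by_name <- mat1[, "参数2"]          # select a column by its name
col_vec <- mat1[, 2]                # a plain vector, dimensions dropped
col_mat <- mat1[, 2, drop = FALSE]  # still a 3 x 1 matrix

print(by_name)               # 7 8 9
print(is.null(dim(col_vec))) # TRUE
print(dim(col_mat))          # 3 1
```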

Filtering elements

# logical conditions
> mat1
     序号 参数1 参数2 结果
[1,]    1     4     7   10
[2,]    2     5     8   11
[3,]    3     6     9   12

> mat1[2,] >= 5
> # is each element of row 2 greater than or equal to 5?
 序号 参数1 参数2  结果 
FALSE  TRUE  TRUE  TRUE 

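> mat1[,3] >= 8
> # is each element of column 3 greater than or equal to 8?
[1] FALSE  TRUE  TRUE
# next, extract the data of the two rows that are TRUE
> mat1[mat1[,3] >= 8]
> # without a comma the logical vector is recycled over all elements, so the result is a plain vector
[1]  2  3  5  6  8  9 11 12
> mat1[mat1[,3] >= 8,]
> # with the trailing comma the condition selects rows and every column is kept; two values are TRUE, so the result has two rows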
     序号 参数1 参数2 结果
[1,]    2     5     8   11
[2,]    3     6     9   12

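# using which()
> mat1[which(mat1[,4]>11)]
[1] 3
# which() returns 3, the row whose "结果" value exceeds 11; without a comma, mat1[3] is then the 3rd element of the matrix, whose value also happens to be 3

> mat1[which(mat1[,4]>11),]
 序号 参数1 参数2  结果 
    3     6     9    12 
# the row of mat1 whose "结果" value is greater than 11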
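
To separate the row index from the values it selects, a sketch like the following may help. It also uses arr.ind = TRUE, which makes which() report row and column positions; the names rows and pos are illustrative.

```r
mat1 <- matrix(1:12, nrow = 3, ncol = 4)

rows <- which(mat1[, 4] > 11)   # row indices where column 4 exceeds 11
print(rows)                     # 3

# Positions of all elements greater than 10, as (row, col) pairs.
pos <- which(mat1 > 10, arr.ind = TRUE)
print(pos)

# drop = FALSE keeps the result a matrix even when one row matches.
selected <- mat1[rows, , drop = FALSE]
print(selected)
```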

The apply function

Apply Functions Over Array Margins
Description
Returns a vector or array or list of values obtained by applying a function to margins of an array or matrix.
Usage
apply(X, MARGIN, FUN, ...)
FUN can also be a function you define yourself
Arguments
X
an array, including a matrix.
MARGIN
a vector giving the subscripts which the function will be applied over. E.g., for a matrix 1 indicates rows, 2 indicates columns, c(1, 2) indicates rows and columns. Where X has named dimnames, it can be a character vector selecting dimension names.
FUN
the function to be applied: see ‘Details’. In the case of functions like +, %*%, etc., the function name must be backquoted or quoted.
...
optional arguments to FUN.

> apply(mat1,1,mean)
> # mean of each row
[1] 5.5 6.5 7.5

> apply(mat1,2,mean)
> # mean of each column
 序号 参数1 参数2  结果 
    2     5     8    11
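
Since FUN may be user-defined, here is a minimal sketch with anonymous functions. The names row_spread and col_share are illustrative, and rowMeans() is shown only as a built-in shortcut for the row-mean case.

```r
mat1 <- matrix(1:12, nrow = 3, ncol = 4)

# Custom function per row: spread between largest and smallest value.
row_spread <- apply(mat1, 1, function(r) max(r) - min(r))
print(row_spread)  # 9 9 9

# Custom function per column: each value as a share of its column total.
col_share <- apply(mat1, 2, function(x) x / sum(x))
print(colSums(col_share))  # every column sums to 1

# Built-in shortcut equivalent to apply(mat1, 1, mean).
print(rowMeans(mat1))  # 5.5 6.5 7.5
```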

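Arrays

A multi-dimensional data structure; not used very often in R

# general form
# arr <- array(data,dim)
# data is the data and dim describes the dimensions. Three examples follow: two-, three-, and four-dimensional.
# two-dimensional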
> arr1 <- array(seq(from = 2,to = 24,by = 2),dim=c(2,6))
> arr1
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    2    6   10   14   18   22
[2,]    4    8   12   16   20   24
# three-dimensional
> arr <- array(c(1:24),dim= c(2,3,4))
> arr
, , 1

     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

, , 2

     [,1] [,2] [,3]
[1,]    7    9   11
[2,]    8   10   12

, , 3

     [,1] [,2] [,3]
[1,]   13   15   17
[2,]   14   16   18

, , 4

     [,1] [,2] [,3]
[1,]   19   21   23
[2,]   20   22   24
# four-dimensional (16 values are needed, so the sequence stops at 30)
> arr2 <- array(seq(from = 0,to = 30,by = 2),dim=c(2,2,2,2))
> arr2
, , 1, 1

     [,1] [,2]
[1,]    0    4
[2,]    2    6

, , 2, 1

     [,1] [,2]
[1,]    8   12
[2,]   10   14

, , 1, 2

     [,1] [,2]
[1,]   16   20
[2,]   18   22

, , 2, 2

     [,1] [,2]
[1,]   24   28
[2,]   26   30

Accessing elements

> arr[1,,]
     [,1] [,2] [,3] [,4]
[1,]    1    7   13   19
[2,]    3    9   15   21
[3,]    5   11   17   23

Applying functions
As with the other data structures, the basic functions can all be applied to arrays

> sum(arr[])
[1] 300
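
A short sketch of indexing a single cell and of applying a function over one margin of the three-dimensional array defined earlier; slice_sums is an illustrative name.

```r
arr <- array(1:24, dim = c(2, 3, 4))

print(arr[2, 3, 4])     # 24, the last cell
print(dim(arr[, , 1]))  # 2 3, the first 2 x 3 slice

# MARGIN = 3 applies the function to each of the four slices.
slice_sums <- apply(arr, 3, sum)
print(slice_sums)       # 21 57 93 129
```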

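Data frames

A data structure that holds columns of different types together. Conceptually it resembles an Excel sheet, rows and columns can be named, and it is the most widely used structure in R.
Tip: use <- to assign variables and = for function arguments

# create a data frame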
> name <- c('小明','小红','小花')
> age <- c(22,20,24)
> sex <- c('m','f','m')
> person <- data.frame(name,sex,age)
> person
  name sex age
1 小明   m  22
2 小红   f  20
3 小花   m  24

# add a column
> person$weight <- c(75,50,60)
> person
  name sex age weight
1 小明   m  22     75
2 小红   f  20     50
3 小花   m  24     60

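# show the structure of the data frame
> str(person)
'data.frame':   3 obs. of  4 variables:
 $ name  : Factor w/ 3 levels "小红","小花",..: 3 1 2
 $ sex   : Factor w/ 2 levels "f","m": 2 1 2
 $ age   : num  22 20 24
 $ weight: num  75 50 60
# this output is from R before 4.0, where data.frame() converts strings to factors by default; from R 4.0 on, name and sex are chr columns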

# access elements
> person$name[2]
[1] 小红
Levels: 小红 小花 小明

> person$sex[2]
[1] f
Levels: f m

> person[1,2]
[1] m
Levels: f m

> person$age[2]
[1] 20
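
Beyond single cells, rows are usually selected with a logical condition. The sketch below rebuilds the same data with stringsAsFactors = FALSE (an assumption made here so that the result is identical across R versions); older and males are illustrative names.

```r
person <- data.frame(
  name = c("小明", "小红", "小花"),
  sex  = c("m", "f", "m"),
  age  = c(22, 20, 24),
  stringsAsFactors = FALSE  # keep strings as character in any R version
)

print(person[["age"]])                   # a column as a plain vector

older <- person[person$age > 20, ]       # rows where age > 20
males <- person[person$sex == "m", "name"]

print(older)
print(males)
```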

Joining on a key

# a second data frame to join with
> person1 <- data.frame(name = name, age = c(18, 16 , 24), height = c(180,160,168))
> person1
  name age height
1 小明  18    180
2 小红  16    160
3 小花  24    168
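# About the data: we want to join the rows of the two tables that belong to the same person. Only 小花 has the same name and age in both person and person1, so only that row satisfies the join; the ages of the other two do not match. The call below uses age as the join key.

# perform the join
> merge(person,person1,by.x = 'age',by.y = 'age')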
  age name.x sex weight name.y height
1  24   小花   m     60   小花    168
# conceptually this works like a join in a database
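
If the intent is to match on both name and age, the keys can be passed together in by. The following sketch rebuilds both tables (with stringsAsFactors = FALSE, an assumption for version independence) and also shows a left join via all.x = TRUE.

```r
name <- c("小明", "小红", "小花")
person <- data.frame(name, sex = c("m", "f", "m"), age = c(22, 20, 24),
                     weight = c(75, 50, 60), stringsAsFactors = FALSE)
person1 <- data.frame(name, age = c(18, 16, 24),
                      height = c(180, 160, 168), stringsAsFactors = FALSE)

# Inner join on both keys: only the row matching in name AND age survives.
both <- merge(person, person1, by = c("name", "age"))

# Left join: keep every row of person, with NA where person1 has no match.
left <- merge(person, person1, by = c("name", "age"), all.x = TRUE)

# Joining on name alone keeps all three rows and both age columns.
by_name <- merge(person, person1, by = "name")

print(both)
print(left)
print(names(by_name))
```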

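Combining

# rbind() combines by rows and requires the same number of columns
# cbind() combines by columns and requires the same number of rows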
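
A minimal sketch of both functions on matrices and on a small data frame. All names and values here are made up for illustration.

```r
m1 <- matrix(1:6, nrow = 2)   # 2 x 3
m2 <- matrix(7:12, nrow = 2)  # 2 x 3

print(dim(rbind(m1, m2)))  # 4 3: rows stacked
print(dim(cbind(m1, m2)))  # 2 6: columns placed side by side

df <- data.frame(x = 1:2, y = c("a", "b"), stringsAsFactors = FALSE)

# For data frames, rbind() also needs matching column names.
more_rows <- rbind(df, data.frame(x = 3L, y = "c", stringsAsFactors = FALSE))
more_cols <- cbind(df, z = c(TRUE, FALSE))

print(more_rows)
print(more_cols)
```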

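# a data frame is a special kind of list, so the apply family of functions can be used on it
lapply(): applies the given function to each component of a list and returns another list.
sapply(): like lapply(), but simplifies the result to a vector or matrix where possible and otherwise returns a list

> # fruit is a data frame with numeric columns price and weight; its definition is not shown here
> lapply(fruit,sum)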
$`price`
[1] 35

$weight
[1] 320

> sapply(fruit,sum)
 price weight 
    35    320

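vapply()
vapply() is similar to sapply(), but the type of the return value is specified in advance, which makes the result safer.
tapply()
tapply(x, f, g) takes a vector x (x cannot be a data frame), a factor or list of factors f, and a function g.
What tapply() does: it temporarily splits x into groups, one per factor level, giving sub-vectors of x, and then applies g to each sub-vector.
mapply()
A multi-argument version of sapply(). The first call passes the first element of each vector to FUN and computes a result; the second call passes the second elements; the third passes the third elements, and so on.
Reference: usage of apply, sapply, lapply, tapply, vapply, and mapply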
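
The three functions described above can be sketched as follows. The data (scores, group) and result names are invented for illustration.

```r
scores <- c(80, 90, 70, 50)
group  <- factor(c("a", "b", "a", "b"))

# tapply: split scores by group, then take the mean of each group.
group_mean <- tapply(scores, group, mean)
print(group_mean)  # a = 75, b = 70

# vapply: like sapply, but each result must be one numeric value.
means <- vapply(list(1:3, 4:6), mean, numeric(1))
print(means)       # 2 5

# mapply: call the function with the i-th element of each vector.
sums <- mapply(function(x, y) x + y, 1:3, 4:6)
print(sums)        # 5 7 9
```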

Lists

A more complex data structure that can hold elements of different types, similar to a struct in C

# define a list
> a <- 'hello world'
> b <- 168
> d <- c(1:10)
> l <-list(a,b,d)
> l
[[1]]
[1] "hello world"

[[2]]
[1] 168

[[3]]
 [1]  1  2  3  4  5  6  7  8  9 10

# plain positional lookup: a cumbersome way to access the data
> l[[1]]
[1] "hello world"

# naming the elements makes access much easier
> l <- list(a = a,b = b,d = d)
> l
$`a`
[1] "hello world"

$b
[1] 168

$d
 [1]  1  2  3  4  5  6  7  8  9 10

> l$a
[1] "hello world"
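
A short sketch of the difference between [ ] and [[ ]] on a list, plus adding and removing elements. The element e is introduced here purely for illustration.

```r
l <- list(a = "hello world", b = 168, d = 1:10)

print(class(l["b"]))    # "list": single brackets return a sub-list
print(class(l[["b"]]))  # "numeric": double brackets return the element itself
print(l$d[3])           # 3: index into the vector stored under d

l$e <- TRUE   # add a new named element
l$b <- NULL   # assigning NULL removes an element
print(names(l))  # "a" "d" "e"
```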

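Attaching a list
As with other uses of attach(), once a list is attached its elements can be found directly by name. In this session a, b, and d also exist in the global environment, so, as the message says, those global variables mask the attached ones.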

> attach(l)
The following objects are masked _by_ .GlobalEnv:

    a, b, d

> a
[1] "hello world"
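
Because attach() can be masked by existing variables, a scoped alternative is with(), which evaluates one expression inside the list. This is a sketch of that alternative, not something the original text uses; total is an illustrative name.

```r
l <- list(a = "hello world", b = 168, d = 1:10)

# with() looks names up in l first, for this expression only.
total <- with(l, b + sum(d))
print(total)  # 223

# If attach() is used, detach() undoes it when the work is finished.
attach(l)
detach(l)
```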

> l <- unlist(l)   # flatten the list into a named vector; the mixed types are all coerced to character
> l
            a             b            d1            d2            d3            d4 
"hello world"         "168"           "1"           "2"           "3"           "4" 
           d5            d6            d7            d8            d9           d10 
          "5"           "6"           "7"           "8"           "9"          "10"
