R: Cluster Analysis: Handling the Warning "NAs introduced by coercion"

1. Cluster analysis (hierarchical clustering)

This works on a distance matrix.

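D=dist(iris[,1:4])  # numeric columns only; dist(iris) would coerce the Species column to NA
hc=hclust(D,method="single")  # the default method is "complete" (farthest-neighbour linkage); "single" is nearest-neighbour
plot(hc)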
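To get cluster labels out of the tree rather than only a plot, the dendrogram can be cut with `cutree()`. The following is a minimal sketch on the built-in iris data; the choice of `k = 3` is an assumption made for illustration and does not come from the original experiment.

```r
# Keep only the four numeric measurement columns; Species is a factor.
X <- iris[, 1:4]

D <- dist(X)                          # Euclidean distance by default
hc <- hclust(D, method = "single")    # single linkage, as in the snippet above

# Cut the tree into k groups (k = 3 is an arbitrary, illustrative choice).
groups <- cutree(hc, k = 3)
print(table(groups))
```

`groups` is an integer vector with one cluster label per row of `X`.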

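2. K-means (K-means clustering; not the same thing as K-nearest neighbours, which is a classification method)

This works on an ordinary numeric data matrix, not a distance matrix.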

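set.seed(123)       # fix the random initial centres so the result is reproducible
km=kmeans(chart,5)  # chart is the numeric data matrix; 5 is the number of clusters
kc=km$cluster    # cluster is a component of the list km (one cluster label per row); $ extracts it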
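Since `chart` is not defined in this post, here is a runnable sketch of the same steps on the built-in iris data. Using iris and `centers = 3` are assumptions made for illustration only.

```r
set.seed(123)                        # reproducible initial centres

# kmeans() needs numeric data, so drop the Species factor column.
X <- as.matrix(iris[, 1:4])

km <- kmeans(X, centers = 3)         # 3 clusters is an illustrative choice
kc <- km$cluster                     # integer vector: one label per row of X

print(table(kc, iris$Species))       # compare the clusters with the known species
print(km$centers)                    # one row of column means per cluster
```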

Problem encountered in the experiment:

Warning message:

Warning message:
In dist(effection) : NAs introduced by coercion

> effection=read.xlsx('D:/大三下/多元统计分析R语言/实验报告/3-6.xlsx',rowNames=T,1)
> hc=hclust(dist(effection),method = "ward.D2")
Warning message:
In dist(effection) : NAs introduced by coercion
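The warning comes from converting text to numbers: any value that cannot be parsed as a number becomes NA. A minimal sketch of that behaviour, using made-up values (not the contents of 3-6.xlsx):

```r
# Hypothetical cell contents: two valid numbers, a word, and a number with a stray space.
x <- c("1.5", "2.0", "abc", "3 .1")

# as.numeric() would warn "NAs introduced by coercion" here;
# suppressWarnings() only hides the message, the NAs are still produced.
num <- suppressWarnings(as.numeric(x))
print(is.na(num))

# A data frame with even one character column becomes a character matrix,
# which is the kind of input dist() then has to coerce.
df <- data.frame(score = x, weight = c(10, 12, 11, 9), stringsAsFactors = FALSE)
m <- as.matrix(df)
print(is.character(m))
```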

This means there is a problem with the experimental data. There are three possible causes, each with its own fix:

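1. During import, an irrelevant first column (e.g. a serial-number or label column) was included in the calculation

--> Delete the irrelevant column directly in the Excel sheet

--> Or set rowNames=T when importing, so that the first column is used as the row names and does not take part in later calculations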
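The same clean-up can also be done in R after the import, without touching the Excel file. This is a sketch under assumptions: the small data frame below is a hypothetical stand-in for the imported sheet, with a label column named `id` that is not in the original data.

```r
# Hypothetical stand-in for the imported sheet: the first column is a label.
effection <- data.frame(
  id = c("a", "b", "c"),
  x1 = c(1, 2, 4),
  x2 = c(3, 5, 6),
  stringsAsFactors = FALSE
)

rownames(effection) <- effection$id   # keep the labels as row names
effection <- effection[, -1]          # drop the label column from the data

D <- dist(effection)                  # no coercion warning: all columns are numeric
hc <- hclust(D, method = "ward.D2")
print(hc$labels)
```

The labels survive as row names, so they still appear on the dendrogram.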

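2. The Excel sheet may also contain character-type values. These cannot be converted to numbers, so they are automatically set to NA (missing). The later calculation does not stop with an error, but the result is still affected.

--> Open the Excel sheet and check the data, especially for stray spaces inside numbers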
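Checking by eye is error-prone on a large sheet, so the offending cells can also be located in R. The sketch below assumes the columns were read as text; the data frame `raw` and its values are invented for illustration.

```r
# Hypothetical sheet read as text: one cell has a stray space, one holds a word.
raw <- data.frame(
  x1 = c("1.2", "3 .4", "5.6"),
  x2 = c("7", "8", "n/a"),
  stringsAsFactors = FALSE
)

# A cell is suspicious if it is present as text but fails numeric conversion.
bad <- sapply(raw, function(col) {
  !is.na(col) & is.na(suppressWarnings(as.numeric(col)))
})
print(which(bad, arr.ind = TRUE))     # row and column index of each bad cell

# Strip all whitespace, then convert; cells that are truly non-numeric stay NA.
clean <- as.data.frame(lapply(raw, function(col) {
  suppressWarnings(as.numeric(gsub("\\s+", "", col)))
}))
print(clean)
```

Cells that remain NA after the whitespace is removed (here the "n/a" entry) need to be corrected or handled by hand.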

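3. Character-type data took part in the calculation

--> Use R's built-in matrix() together with cbind() or rbind() to build a new matrix that contains no character data

e.g.:

The original data (iris, which includes the non-numeric Species column):

Use the following code to create the new matrix X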

> x1=matrix(iris$Sepal.Length)
> x2=matrix(iris$Sepal.Width)
> x3=matrix(iris$Petal.Length)
> x4=matrix(iris$Petal.Width)
> X=cbind(x1,x2,x3,x4)
> D=dist(X)
> hc=hclust(D,method = "single")
> plot(hc)

The new matrix contains no character data.
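When there are many columns, building the matrix one column at a time gets tedious. A possible shortcut, shown here as a sketch rather than as the method used in the experiment, is to select every numeric column with `is.numeric()`:

```r
# Logical vector: TRUE for each column of iris that holds numbers.
num_cols <- sapply(iris, is.numeric)

# Keep only those columns; Species (a factor) is dropped.
X <- as.matrix(iris[, num_cols])

D <- dist(X)
hc <- hclust(D, method = "single")
print(colnames(X))
```

This produces the same values as the cbind() version above, with the column names kept.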
