VI. Material Category Analysis

1. Import the data

# 2018 material data
library(xlsx)
x = read.xlsx("material_sales_order.xlsx", sheetIndex = 1, encoding = "UTF-8")
x = read.xlsx("material_sales_order_动态.xlsx", sheetIndex = 1, encoding = "UTF-8")  # file written by the dynamic clustering in step 4; this overwrites x
x = x[x$order_nums > 5, ]  # considering the actual data, rows whose value is …
x = x[x$km.cluster <= 2, ]  # keep clusters 1 and 2
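
The two row filters can be tried without the Excel files. The snippet below is a minimal sketch on a made-up data frame: the column names `order_amount`, `order_nums` and `km.cluster` follow the code above, but every value is invented for illustration.

```r
# Toy stand-in for the spreadsheet; all values are invented for illustration.
x <- data.frame(
  order_amount = c(120, 950, 40, 3100, 610),
  order_nums   = c(3, 12, 6, 48, 9),
  km.cluster   = c(1, 2, 1, 3, 2)
)

# Keep materials with more than 5 orders.
x <- x[x$order_nums > 5, ]

# Keep only the rows assigned to clusters 1 and 2.
x <- x[x$km.cluster <= 2, ]

print(x)
```

Note that `km.cluster` only exists in the file produced by the dynamic clustering step, so the second filter cannot be applied to the raw `material_sales_order.xlsx` data.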

2. Inspect the data distribution

# Order amount per material
boxplot(x$order_amount)
qqnorm(x$order_amount)
# Order count per material
boxplot(x$order_nums)
# Normality check of the data
qqnorm(x$order_nums)
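
A Q-Q plot is a visual check only. As a possible complement (not part of the original analysis), the sketch below adds a reference line with `qqline()` and runs the Shapiro-Wilk test from base R. The data are simulated, right-skewed counts standing in for `x$order_nums`.

```r
set.seed(1)  # make the simulated data reproducible

# Simulated right-skewed order counts; a stand-in for x$order_nums.
order_nums <- round(rexp(200, rate = 0.05)) + 1

qqnorm(order_nums)  # points follow a straight line if the data are roughly normal
qqline(order_nums)  # reference line through the first and third quartiles

# Shapiro-Wilk test: a small p-value is evidence against normality.
sw <- shapiro.test(order_nums)
print(sw$p.value)
```

`shapiro.test()` accepts between 3 and 5000 observations, so a larger data set would need to be sampled first.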

3. Standardize the raw data

X = cbind(x, scale(x$order_amount), scale(x$order_nums))  # standardize the data (z-scores)
write.xlsx(X, "material_sales_ordery_标准化.xlsx")
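
To make explicit what `scale()` does with its default arguments, the sketch below compares it with the z-score computed by hand, `(v - mean(v)) / sd(v)`. The vector `v` is invented for illustration.

```r
# Invented order counts, used only to illustrate the arithmetic.
v <- c(6, 12, 48, 9, 25)

# scale() returns a one-column matrix; as.numeric() drops the attributes.
z_scale <- as.numeric(scale(v))

# The same z-scores by hand: subtract the mean, divide by the sample sd.
z_manual <- (v - mean(v)) / sd(v)

print(all.equal(z_scale, z_manual))
```

After standardization each column has mean 0 and standard deviation 1, so order amount and order count become comparable despite their different units.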

4. Run the cluster analysis

## Use k-means to choose the number of clusters
d = data.frame(x$order_nums)
mydata <- d
wss <- (nrow(mydata) - 1) * sum(apply(mydata, 2, var))
for (i in 2:15) wss[i] <- sum(kmeans(mydata, centers = i)$withinss)
### wss is the within-cluster sum of squares
plot(1:15, wss, type = "b", xlab = "Number of clusters", ylab = "Error (sum of squares)")
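
Because `kmeans()` starts from random centers, the elbow curve can change between runs. The following is a minimal, self-contained sketch of the same elbow procedure with a fixed seed and several random restarts (`nstart`). The data are simulated with three loose groups and only stand in for `x$order_nums`; `max_k` is a name chosen for this example.

```r
set.seed(42)  # k-means starts from random centers, so fix the seed

# Simulated one-column data with three loose groups; a stand-in for x$order_nums.
mydata <- data.frame(
  order_nums = c(rnorm(50, 10, 2), rnorm(50, 40, 4), rnorm(50, 90, 6))
)

max_k <- 10
wss <- numeric(max_k)

# With a single cluster, the within-cluster sum of squares is the total sum of squares.
wss[1] <- (nrow(mydata) - 1) * sum(apply(mydata, 2, var))

for (k in 2:max_k) {
  # nstart = 25 reruns k-means from 25 random starts and keeps the best fit.
  wss[k] <- kmeans(mydata, centers = k, nstart = 25)$tot.withinss
}

plot(1:max_k, wss, type = "b",
     xlab = "Number of clusters", ylab = "Within-cluster sum of squares")
```

The number of clusters is then read off at the "elbow", the point after which adding another cluster reduces the sum of squares only slightly.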
## Dynamic clustering
km = kmeans(x$order_nums, 6)
plot(x$order_nums, col = km$cluster, pch = 1, xlab = "Index", ylab = "Order count")
X = cbind(x, km$cluster)  # attach the cluster labels
write.xlsx(X, "material_sales_order_动态.xlsx")
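
Before writing the labels out, it can help to look at the cluster sizes and centers. The sketch below does this on simulated data with three groups (the analysis above uses six clusters on the real order counts) and stores the labels under an explicit column name, `cluster`, which is a choice made for this example.

```r
set.seed(7)  # fix the random starting centers

# Simulated order counts with three groups; a stand-in for x$order_nums.
x <- data.frame(
  order_nums = c(rnorm(40, 8, 1.5), rnorm(40, 30, 3), rnorm(40, 70, 5))
)

km <- kmeans(x$order_nums, centers = 3, nstart = 25)

# Store the labels under an explicit column name.
x$cluster <- km$cluster

print(km$size)                                # number of rows per cluster
print(sort(km$centers[, 1]))                  # cluster centers, smallest first
print(tapply(x$order_nums, x$cluster, mean))  # mean order count per cluster
```

The cluster numbers themselves are arbitrary labels, so a filter such as `km.cluster <= 2` only selects the intended groups for the particular run whose results were saved.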
# Automatically find the number of clusters with the smallest spread in order counts
library(ykmeans)
a = data.frame(x$order_nums)
km = ykmeans(a, "x.order_nums", "x.order_nums", 3:6)
table(km$cluster)

Consider the time of each material's most recent order.
Could factor analysis be used to score them?
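
As one way the first idea could be started, the sketch below derives a recency feature, the number of days since the most recent order. Everything in it is an assumption for illustration: the column name `last_order_date`, the dates, and the reference date `ref_date` do not come from the data set above.

```r
# Invented last-order dates; `last_order_date` is a hypothetical column name.
x <- data.frame(
  material        = c("A", "B", "C"),
  last_order_date = as.Date(c("2018-12-20", "2018-06-01", "2018-10-15"))
)

# Reference date for the 2018 data (an assumption for this example).
ref_date <- as.Date("2018-12-31")

# Recency: days elapsed since the most recent order; smaller means more recent.
x$recency_days <- as.numeric(ref_date - x$last_order_date)

print(x)
```

Such a column could then be standardized with `scale()` and clustered together with order amount and order count.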

Reposted from blog.csdn.net/hua_chang/article/details/105034251