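Exploring Julia (part 14): Descriptive Statistics Case Study on Student Scores

Study notes, for reference only; corrections are welcome.

References: "Julia for Data Science" by Zacharias Voulgaris; the official documentation; "Common Julia Data-Processing Packages: DataFrames Package Test"

Uses Julia 1.1.1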



Descriptive Statistics Case Study on Student Scores


Load the packages and read the data:

using DataFrames
using CSV
using Statistics  # provides mean and cor, which are used further down
mydata = CSV.read("./data/score.csv");
println(mydata)

Output:

10×4 DataFrame
│ Row │ Column1 │ age   │ money │ score │
│     │ String  │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┤
│ 1   │ A       │ 19    │ 1000  │ 99    │
│ 2   │ B       │ 20    │ 2000  │ 100   │
│ 3   │ C       │ 19    │ 9999  │ 50    │
│ 4   │ D       │ 21    │ 3456  │ 69    │
│ 5   │ E       │ 22    │ 8999  │ 95    │
│ 6   │ F       │ 25    │ 887   │ 76    │
│ 7   │ G       │ 28    │ 2600  │ 85    │
│ 8   │ H       │ 20    │ 8000  │ 90    │
│ 9   │ I       │ 21    │ 2460  │ 77    │
│ 10  │ J       │ 19    │ 1000  │ 84    │
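
Since `./data/score.csv` does not ship with these notes, one way to follow along is to rebuild the table by hand. The sketch below is a stand-in under that assumption, not the author's file: the values are copied from the printed output, and the keyword `DataFrame` constructor takes the place of `CSV.read`. Also note that recent CSV.jl releases expect the target type as a second argument, as the comment shows.

```julia
using DataFrames

# Values copied from the printed table; this replaces reading score.csv.
# With recent CSV.jl releases the file-based equivalent would be
#   mydata = CSV.read("./data/score.csv", DataFrame)
mydata = DataFrame(
    Column1 = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    age     = [19, 20, 19, 21, 22, 25, 28, 20, 21, 19],
    money   = [1000, 2000, 9999, 3456, 8999, 887, 2600, 8000, 2460, 1000],
    score   = [99, 100, 50, 69, 95, 76, 85, 90, 77, 84],
)

println(size(mydata))   # (number of rows, number of columns)
```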

Show the first 6 rows of the data frame:

head(mydata)

Output:

6×4 DataFrame
│ Row │ Column1 │ age   │ money │ score │
│     │ String  │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┤
│ 1   │ A       │ 19    │ 1000  │ 99    │
│ 2   │ B       │ 20    │ 2000  │ 100   │
│ 3   │ C       │ 19    │ 9999  │ 50    │
│ 4   │ D       │ 21    │ 3456  │ 69    │
│ 5   │ E       │ 22    │ 8999  │ 95    │
│ 6   │ F       │ 25    │ 887   │ 76    │

Show the last 6 rows of the data:

tail(mydata)

Output:

6×4 DataFrame
│ Row │ Column1 │ age   │ money │ score │
│     │ String  │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┤
│ 1   │ E       │ 22    │ 8999  │ 95    │
│ 2   │ F       │ 25    │ 887   │ 76    │
│ 3   │ G       │ 28    │ 2600  │ 85    │
│ 4   │ H       │ 20    │ 8000  │ 90    │
│ 5   │ I       │ 21    │ 2460  │ 77    │
│ 6   │ J       │ 19    │ 1000  │ 84    │
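
In later DataFrames.jl releases `head` and `tail` are no longer available; as far as I know, `first(df, n)` and `last(df, n)` fill the same role. A minimal sketch, where `df` is a hypothetical stand-in frame holding only row ids and the score column from the table above:

```julia
using DataFrames

# Hypothetical stand-in frame: row ids plus the score column from above
df = DataFrame(id = 1:10, score = [99, 100, 50, 69, 95, 76, 85, 90, 77, 84])

top    = first(df, 6)   # same role as head(df)
bottom = last(df, 6)    # same role as tail(df)

println(top)
println(bottom)
```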

Return descriptive statistics for the data:

describe(mydata)

Output:

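│ Row │ variable │ mean   │ min │ median │ max  │ nunique │ nmissing │ eltype   │
│     │ Symbol   │ Union… │ Any │ Union… │ Any  │ Union…  │ Nothing  │ DataType │
├─────┼──────────┼────────┼─────┼────────┼──────┼─────────┼──────────┼──────────┤
│ 1   │ Column1  │        │ A   │        │ J    │ 10      │          │ String   │
│ 2   │ age      │ 21.4   │ 19  │ 20.5   │ 28   │         │          │ Int64    │
│ 3   │ money    │ 4040.1 │ 887 │ 2530.0 │ 9999 │         │          │ Int64    │
│ 4   │ score    │ 82.5   │ 50  │ 84.5   │ 100  │         │          │ Int64    │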
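
The numeric columns of this summary can be cross-checked with the `Statistics` standard library alone. The sketch below copies the three numeric columns from the table as plain vectors (so it does not depend on DataFrames) and prints the same four statistics that `describe` reports.

```julia
using Statistics

# Numeric columns copied from the data shown above
age   = [19, 20, 19, 21, 22, 25, 28, 20, 21, 19]
money = [1000, 2000, 9999, 3456, 8999, 887, 2600, 8000, 2460, 1000]
score = [99, 100, 50, 69, 95, 76, 85, 90, 77, 84]

# Print mean, min, median and max for each column
for (name, col) in (("age", age), ("money", money), ("score", score))
    println(name, ": mean=", mean(col), " min=", minimum(col),
            " median=", median(col), " max=", maximum(col))
end
```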

Return the records where age is greater than 22:

mydata[mydata[:age] .> 22, :]

Output:

│ Row │ Column1 │ age   │ money │ score │
│     │ String  │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────┼───────┼───────┤
│ 1   │ F       │ 25    │ 887   │ 76    │
│ 2   │ G       │ 28    │ 2600  │ 85    │
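
The `mydata[:age]` column syntax belongs to older DataFrames.jl versions. Assuming a current release, the same row selection can be sketched either with a boolean mask on `df.age` or with `filter`; `df` here is a hypothetical stand-in holding only two of the columns.

```julia
using DataFrames

# Hypothetical stand-in frame with the name and age columns from above
df = DataFrame(
    Column1 = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"],
    age     = [19, 20, 19, 21, 22, 25, 28, 20, 21, 19],
)

# Boolean mask: keep the rows whose age exceeds 22, all columns
older_mask = df[df.age .> 22, :]

# Row-wise predicate: same selection, returns a new data frame
older_filter = filter(row -> row.age > 22, df)

println(older_mask)
```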

Compute the mean of age and score:

colwise(mean, mydata[[:age, :score]])

Output:

2-element Array{Float64,1}:
 21.4
 82.5
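
`colwise` was dropped from later DataFrames.jl releases. Assuming a current version, two possible replacements are a comprehension over the selected columns, which gives a plain vector like the one above, or `combine` with `column => function` pairs, which gives a one-row data frame. `df` is a hypothetical stand-in with just the two columns involved.

```julia
using DataFrames
using Statistics

# Hypothetical stand-in frame with the two columns being averaged
df = DataFrame(
    age   = [19, 20, 19, 21, 22, 25, 28, 20, 21, 19],
    score = [99, 100, 50, 69, 95, 76, 85, 90, 77, 84],
)

# One mean per selected column, collected into a plain vector
means = [mean(df[!, c]) for c in [:age, :score]]

# Same numbers as a one-row data frame with columns age_mean and score_mean
means_df = combine(df, [:age, :score] .=> mean)

println(means)
println(means_df)
```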

Add a grade column to the mydata data frame:

mydata[:grade] = ["A", "B", "C", "D", "A", "A", "B", "B", "C", "D"]

Delete the last two rows of mydata:

deleterows!(mydata, 9:10);

Group the mydata data frame by grade and count the rows in each group:

by(mydata, :grade, nrow)

Output:

│ Row │ grade  │ nrow  │
│     │ String │ Int64 │
├─────┼────────┼───────┤
│ 1   │ A      │ 3     │
│ 2   │ B      │ 3     │
│ 3   │ C      │ 1     │
│ 4   │ D      │ 1     │
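
`by` and `deleterows!` were also replaced in later DataFrames.jl releases. The sketch below assumes DataFrames.jl 1.3 or newer, where `deleteat!` removes rows in place and `combine(groupby(...), nrow)` counts rows per group; `df` is a hypothetical stand-in with only the age column, and the three steps (add column, drop rows, count per grade) mirror the ones above.

```julia
using DataFrames

# Hypothetical stand-in frame; only the age column is needed here
df = DataFrame(age = [19, 20, 19, 21, 22, 25, 28, 20, 21, 19])

# Add the grade column (property syntax creates a new column)
df.grade = ["A", "B", "C", "D", "A", "A", "B", "B", "C", "D"]

# Drop the last two rows in place (assumes DataFrames.jl 1.3+)
deleteat!(df, 9:10)

# Count the remaining rows per grade; the count column is named nrow
counts = combine(groupby(df, :grade), nrow)

println(counts)
```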

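Compute the Pearson correlation coefficient between age and score. The value below matches all 10 rows, that is, the data before the last two rows were deleted:

cor(mydata[:age], mydata[:score])
# returned value
0.019667052513438126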
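
To make the coefficient less of a black box, the sketch below evaluates the Pearson formula directly and compares it with `cor` from the `Statistics` standard library. It uses plain vectors holding all ten age and score values from the original table, which is the input that reproduces the value shown.

```julia
using Statistics

# All ten age and score values from the original table
age   = [19, 20, 19, 21, 22, 25, 28, 20, 21, 19]
score = [99, 100, 50, 69, 95, 76, 85, 90, 77, 84]

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)),
# where dx and dy are deviations from the respective means
dx = age .- mean(age)
dy = score .- mean(score)
r_manual = sum(dx .* dy) / sqrt(sum(dx .^ 2) * sum(dy .^ 2))

# Library result for comparison
r_builtin = cor(age, score)

println(r_manual)
println(r_builtin)
```

A value this close to zero suggests there is essentially no linear relationship between age and score in this small sample.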


Reposted from blog.csdn.net/m0_37422217/article/details/107413134