Assignment 1 (using Jupyter Notebook)
Step 1. Import the required modules
import pandas as pd
import numpy as np
from pandas import Series,DataFrame
Step 2. The given raw dataset
raw_data = {'regiment': ['Nighthawks', 'Nighthawks', 'Nighthawks', 'Nighthawks', 'Dragoons', 'Dragoons', 'Dragoons', 'Dragoons', 'Scouts', 'Scouts', 'Scouts', 'Scouts'],
'company': ['1st', '1st', '2nd', '2nd', '1st', '1st', '2nd', '2nd','1st', '1st', '2nd', '2nd'],
'deaths': [523, 52, 25, 616, 43, 234, 523, 62, 62, 73, 37, 35],
'battles': [5, 42, 2, 2, 4, 7, 8, 3, 4, 7, 8, 9],
'size': [1045, 957, 1099, 1400, 1592, 1006, 987, 849, 973, 1005, 1099, 1523],
'veterans': [1, 5, 62, 26, 73, 37, 949, 48, 48, 435, 63, 345],
'readiness': [1, 2, 3, 3, 2, 1, 2, 3, 2, 1, 2, 3],
'armored': [1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1],
'deserters': [4, 24, 31, 2, 3, 4, 24, 31, 2, 3, 2, 3],
'origin': ['Arizona', 'California', 'Texas', 'Florida', 'Maine', 'Iowa', 'Alaska', 'Washington', 'Oregon', 'Wyoming', 'Louisana', 'Georgia']}
Step 3. Create a DataFrame from the raw dataset and assign it to the variable army
army = DataFrame(raw_data)
army
| | armored | battles | company | deaths | deserters | origin | readiness | regiment | size | veterans |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 5 | 1st | 523 | 4 | Arizona | 1 | Nighthawks | 1045 | 1 |
| 1 | 0 | 42 | 1st | 52 | 24 | California | 2 | Nighthawks | 957 | 5 |
| 2 | 1 | 2 | 2nd | 25 | 31 | Texas | 3 | Nighthawks | 1099 | 62 |
| 3 | 1 | 2 | 2nd | 616 | 2 | Florida | 3 | Nighthawks | 1400 | 26 |
| 4 | 0 | 4 | 1st | 43 | 3 | Maine | 2 | Dragoons | 1592 | 73 |
| 5 | 1 | 7 | 1st | 234 | 4 | Iowa | 1 | Dragoons | 1006 | 37 |
| 6 | 0 | 8 | 2nd | 523 | 24 | Alaska | 2 | Dragoons | 987 | 949 |
| 7 | 1 | 3 | 2nd | 62 | 31 | Washington | 3 | Dragoons | 849 | 48 |
| 8 | 0 | 4 | 1st | 62 | 2 | Oregon | 2 | Scouts | 973 | 48 |
| 9 | 0 | 7 | 1st | 73 | 3 | Wyoming | 1 | Scouts | 1005 | 435 |
| 10 | 1 | 8 | 2nd | 37 | 2 | Louisana | 2 | Scouts | 1099 | 63 |
| 11 | 1 | 9 | 2nd | 35 | 3 | Georgia | 3 | Scouts | 1523 | 345 |
Step 4. Set a column as the index: use the origin field as the index
army1 = army.set_index(["origin"])
army1
| origin | armored | battles | company | deaths | deserters | readiness | regiment | size | veterans |
|---|---|---|---|---|---|---|---|---|---|
| Arizona | 1 | 5 | 1st | 523 | 4 | 1 | Nighthawks | 1045 | 1 |
| California | 0 | 42 | 1st | 52 | 24 | 2 | Nighthawks | 957 | 5 |
| Texas | 1 | 2 | 2nd | 25 | 31 | 3 | Nighthawks | 1099 | 62 |
| Florida | 1 | 2 | 2nd | 616 | 2 | 3 | Nighthawks | 1400 | 26 |
| Maine | 0 | 4 | 1st | 43 | 3 | 2 | Dragoons | 1592 | 73 |
| Iowa | 1 | 7 | 1st | 234 | 4 | 1 | Dragoons | 1006 | 37 |
| Alaska | 0 | 8 | 2nd | 523 | 24 | 2 | Dragoons | 987 | 949 |
| Washington | 1 | 3 | 2nd | 62 | 31 | 3 | Dragoons | 849 | 48 |
| Oregon | 0 | 4 | 1st | 62 | 2 | 2 | Scouts | 973 | 48 |
| Wyoming | 0 | 7 | 1st | 73 | 3 | 1 | Scouts | 1005 | 435 |
| Louisana | 1 | 8 | 2nd | 37 | 2 | 2 | Scouts | 1099 | 63 |
| Georgia | 1 | 9 | 2nd | 35 | 3 | 3 | Scouts | 1523 | 345 |
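Note that `set_index` returns a new DataFrame and leaves `army` untouched, which is why the result is stored in a separate variable. A minimal sketch on a three-row stand-in table (the names `small` and `indexed` are illustrative, and only a subset of the data is used):

```python
import pandas as pd

# Small stand-in for the army table (illustrative subset, not the full data)
small = pd.DataFrame({
    "regiment": ["Nighthawks", "Dragoons", "Scouts"],
    "veterans": [1, 73, 48],
    "origin": ["Arizona", "Maine", "Oregon"],
})

# set_index returns a new object; the original keeps its default integer index
indexed = small.set_index("origin")

print(list(small.columns))               # 'origin' is still a regular column here
print(indexed.index.name)                # 'origin' is now the row index
print(indexed.loc["Maine", "veterans"])  # label-based lookup by origin
```

With the index set this way, rows can be looked up by state name through `.loc` instead of by integer position.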
Step 5. Print all values of the veterans column
army1["veterans"]
origin
Arizona 1
California 5
Texas 62
Florida 26
Maine 73
Iowa 37
Alaska 949
Washington 48
Oregon 48
Wyoming 435
Louisana 63
Georgia 345
Name: veterans, dtype: int64
Step 6. Print all data in the 'veterans' and 'deaths' columns
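The cell that produced the table for this step is not shown. A likely form, assuming `army1` from Step 4, is `army1[["veterans", "deaths"]]`: passing a list of column names inside the brackets returns a DataFrame with just those columns. A sketch on a small stand-in frame (`army1_demo` is a hypothetical name holding the first three rows):

```python
import pandas as pd

# Stand-in for army1: first three rows of the data, indexed by origin
army1_demo = pd.DataFrame(
    {"veterans": [1, 5, 62], "deaths": [523, 52, 25], "size": [1045, 957, 1099]},
    index=pd.Index(["Arizona", "California", "Texas"], name="origin"),
)

# A list of column names inside [] selects several columns at once
subset = army1_demo[["veterans", "deaths"]]
print(subset)
```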
| origin | veterans | deaths |
|---|---|---|
| Arizona | 1 | 523 |
| California | 5 | 52 |
| Texas | 62 | 25 |
| Florida | 26 | 616 |
| Maine | 73 | 43 |
| Iowa | 37 | 234 |
| Alaska | 949 | 523 |
| Washington | 48 | 62 |
| Oregon | 48 | 62 |
| Wyoming | 435 | 73 |
| Louisana | 63 | 37 |
| Georgia | 345 | 35 |
Step 7. Print all column labels
army1.columns
Index(['armored', 'battles', 'company', 'deaths', 'deserters', 'readiness',
'regiment', 'size', 'veterans'],
dtype='object')
Step 8. Select all rows whose regiment value is not "Dragoons"
army1.loc[army1["regiment"] != "Dragoons"]
| origin | armored | battles | company | deaths | deserters | readiness | regiment | size | veterans |
|---|---|---|---|---|---|---|---|---|---|
| Arizona | 1 | 5 | 1st | 523 | 4 | 1 | Nighthawks | 1045 | 1 |
| California | 0 | 42 | 1st | 52 | 24 | 2 | Nighthawks | 957 | 5 |
| Texas | 1 | 2 | 2nd | 25 | 31 | 3 | Nighthawks | 1099 | 62 |
| Florida | 1 | 2 | 2nd | 616 | 2 | 3 | Nighthawks | 1400 | 26 |
| Oregon | 0 | 4 | 1st | 62 | 2 | 2 | Scouts | 973 | 48 |
| Wyoming | 0 | 7 | 1st | 73 | 3 | 1 | Scouts | 1005 | 435 |
| Louisana | 1 | 8 | 2nd | 37 | 2 | 2 | Scouts | 1099 | 63 |
| Georgia | 1 | 9 | 2nd | 35 | 3 | 3 | Scouts | 1523 | 345 |
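The filter works because the comparison `army1["regiment"] != "Dragoons"` evaluates to a boolean Series, and `.loc` keeps only the rows where that Series is True. A sketch on stand-in data (`demo`, `mask`, and `kept` are illustrative names):

```python
import pandas as pd

# Four-row stand-in indexed by origin, as in army1
demo = pd.DataFrame(
    {"regiment": ["Nighthawks", "Dragoons", "Scouts", "Dragoons"],
     "size": [1045, 1592, 973, 849]},
    index=pd.Index(["Arizona", "Maine", "Oregon", "Washington"], name="origin"),
)

mask = demo["regiment"] != "Dragoons"  # one True/False per row
kept = demo.loc[mask]                  # rows where the mask is True

print(mask.tolist())
print(kept.index.tolist())
```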
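Step 9. Select rows 3 to 7 and columns 3 to 6 (counted from 1, inclusive)
army1.iloc[2:7, 2:6]
| origin | company | deaths | deserters | readiness |
|---|---|---|---|---|
| Texas | 2nd | 25 | 31 | 3 |
| Florida | 2nd | 616 | 2 | 3 |
| Maine | 1st | 43 | 3 | 2 |
| Iowa | 1st | 234 | 4 | 1 |
| Alaska | 2nd | 523 | 24 | 2 |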
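Positional slices in `iloc` are zero-based and exclude the stop position, so "rows 3 to 7" counted from one corresponds to `2:7`. A list such as `[2, 6]`, by contrast, picks exactly those two positions rather than a range. A small sketch on a hypothetical 8×8 grid:

```python
import pandas as pd

# Hypothetical 8x8 grid; each cell holds row*10 + column for easy reading
grid = pd.DataFrame(
    [[r * 10 + c for c in range(8)] for r in range(8)],
    columns=[f"c{c}" for c in range(8)],
)

block = grid.iloc[2:7, 2:6]      # 3rd-7th rows, 3rd-6th columns
picked = grid.iloc[2:7, [2, 6]]  # same rows, but only the 3rd and 7th columns

print(block.shape)
print(picked.columns.tolist())
```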
Assignment 2:
Analysis of student alcohol consumption data
Step 1. Import the required modules
import pandas as pd
import numpy as np
from pandas import Series,DataFrame
Step 2. Load the data and assign it to the variable df
df = pd.read_csv("./datasets/Student_Alcohol.csv")
df
| | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | … | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | 18 | U | GT3 | A | 4 | 4 | at_home | teacher | … | 6 | 5 | 6 | 6 |
| 1 | GP | F | 17 | U | GT3 | T | 1 | 1 | at_home | other | … | 4 | 5 | 5 | 6 |
| 2 | GP | F | 15 | U | LE3 | T | 1 | 1 | at_home | other | … | 10 | 7 | 8 | 10 |
| 3 | GP | F | 15 | U | GT3 | T | 4 | 2 | health | services | … | 2 | 15 | 14 | 15 |
| 4 | GP | F | 16 | U | GT3 | T | 3 | 3 | other | other | … | 4 | 6 | 10 | 10 |
| 5 | GP | M | 16 | U | LE3 | T | 4 | 3 | services | other | … | 10 | 15 | 15 | 15 |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| 391 | MS | M | 17 | U | LE3 | T | 3 | 1 | services | services | … | 3 | 14 | 16 | 16 |
| 392 | MS | M | 21 | R | GT3 | T | 1 | 1 | other | other | … | 3 | 10 | 8 | 7 |
| 393 | MS | M | 18 | R | LE3 | T | 3 | 2 | services | other | … | 0 | 11 | 12 | 10 |
| 394 | MS | M | 19 | U | LE3 | T | 1 | 1 | other | at_home | … | 5 | 8 | 9 | 9 |

395 rows × 33 columns
Step 3. Contiguous slice (select the columns from school through guardian, including everything in between)
df.iloc[:,0:12]
| | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | reason | guardian |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | 18 | U | GT3 | A | 4 | 4 | at_home | teacher | course | mother |
| 1 | GP | F | 17 | U | GT3 | T | 1 | 1 | at_home | other | course | father |
| 2 | GP | F | 15 | U | LE3 | T | 1 | 1 | at_home | other | other | mother |
| 3 | GP | F | 15 | U | GT3 | T | 4 | 2 | health | services | home | mother |
| 4 | GP | F | 16 | U | GT3 | T | 3 | 3 | other | other | home | father |
| … | … | … | … | … | … | … | … | … | … | … | … | … |
| 391 | MS | M | 17 | U | LE3 | T | 3 | 1 | services | services | course | mother |
| 392 | MS | M | 21 | R | GT3 | T | 1 | 1 | other | other | course | other |
| 393 | MS | M | 18 | R | LE3 | T | 3 | 2 | services | other | course | mother |
| 394 | MS | M | 19 | U | LE3 | T | 1 | 1 | other | at_home | course | father |

395 rows × 12 columns
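An alternative that does not depend on knowing column positions is label-based slicing with `loc`, where both endpoints are included. A sketch on a stand-in frame that borrows three of the column names (the values and the trailing `extra` column are illustrative):

```python
import pandas as pd

demo = pd.DataFrame({
    "school": ["GP", "MS"],
    "sex": ["F", "M"],
    "guardian": ["mother", "father"],
    "extra": [0, 1],  # hypothetical column that should be left out
})

by_label = demo.loc[:, "school":"guardian"]  # label slice: 'guardian' is included
by_position = demo.iloc[:, 0:3]              # positional slice: stop index excluded

print(by_label.columns.tolist())
print(by_label.equals(by_position))
```

The label form keeps working if columns are later inserted before `school`, whereas the positional form would need its numbers updated.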
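Step 5. Capitalize the first letter of every value in the Mjob and Fjob columns
data2 = df.iloc[:,[8,9]] # select the "Mjob" and "Fjob" columns
data21 = Series(data2["Mjob"]) # take each of the two columns as a Series
data22 = Series(data2["Fjob"])
df["Mjob"] = data21.map(lambda x: x.capitalize()) # capitalize the first letter of every "Mjob" value
df["Fjob"] = data22.map(lambda x: x.capitalize()) # capitalize the first letter of every "Fjob" value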
df
| | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | … | absences | G1 | G2 | G3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | 18 | U | GT3 | A | 4 | 4 | At_home | Teacher | … | 6 | 5 | 6 | 6 |
| 1 | GP | F | 17 | U | GT3 | T | 1 | 1 | At_home | Other | … | 4 | 5 | 5 | 6 |
| 2 | GP | F | 15 | U | LE3 | T | 1 | 1 | At_home | Other | … | 10 | 7 | 8 | 10 |
| 3 | GP | F | 15 | U | GT3 | T | 4 | 2 | Health | Services | … | 2 | 15 | 14 | 15 |
| 4 | GP | F | 16 | U | GT3 | T | 3 | 3 | Other | Other | … | 4 | 6 | 10 | 10 |
| 5 | GP | M | 16 | U | LE3 | T | 4 | 3 | Services | Other | … | 10 | 15 | 15 | 15 |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| 390 | MS | M | 20 | U | LE3 | A | 2 | 2 | Services | Services | … | 11 | 9 | 9 | 9 |
| 391 | MS | M | 17 | U | LE3 | T | 3 | 1 | Services | Services | … | 3 | 14 | 16 | 16 |
| 392 | MS | M | 21 | R | GT3 | T | 1 | 1 | Other | Other | … | 3 | 10 | 8 | 7 |
| 393 | MS | M | 18 | R | LE3 | T | 3 | 2 | Services | Other | … | 0 | 11 | 12 | 10 |
| 394 | MS | M | 19 | U | LE3 | T | 1 | 1 | Other | At_home | … | 5 | 8 | 9 | 9 |

395 rows × 33 columns
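The same capitalization can be done without intermediate variables or a lambda by using the vectorized string accessor `.str.capitalize()`. A sketch on stand-in values taken from the `Mjob` and `Fjob` columns (`jobs` is an illustrative name, not the full dataset):

```python
import pandas as pd

jobs = pd.DataFrame({
    "Mjob": ["at_home", "health", "other"],
    "Fjob": ["teacher", "services", "other"],
})

# .str.capitalize() upper-cases the first character of each string
# and lower-cases the rest
for col in ["Mjob", "Fjob"]:
    jobs[col] = jobs[col].str.capitalize()

print(jobs)
```

Unlike a plain `lambda x: x.capitalize()`, the `.str` accessor passes missing values through as NaN instead of raising an error.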
Step 6. Create a function named majority that returns a boolean based on the age column, and store the result in a new column named legal_drinker (an age above 17 counts as legal drinking age)
def majority(age):
    # True when the student is older than 17, otherwise False
    return age > 17
df["legal_drinker"] = df["age"].map(majority)
df
| | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | … | G1 | G2 | G3 | legal_drinker |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | GP | F | 18 | U | GT3 | A | 4 | 4 | At_home | Teacher | … | 5 | 6 | 6 | True |
| 1 | GP | F | 17 | U | GT3 | T | 1 | 1 | At_home | Other | … | 5 | 5 | 6 | False |
| 2 | GP | F | 15 | U | LE3 | T | 1 | 1 | At_home | Other | … | 7 | 8 | 10 | False |
| 3 | GP | F | 15 | U | GT3 | T | 4 | 2 | Health | Services | … | 15 | 14 | 15 | False |
| 4 | GP | F | 16 | U | GT3 | T | 3 | 3 | Other | Other | … | 6 | 10 | 10 | False |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| 391 | MS | M | 17 | U | LE3 | T | 3 | 1 | Services | Services | … | 14 | 16 | 16 | False |
| 392 | MS | M | 21 | R | GT3 | T | 1 | 1 | Other | Other | … | 10 | 8 | 7 | True |
| 393 | MS | M | 18 | R | LE3 | T | 3 | 2 | Services | Other | … | 11 | 12 | 10 | True |
| 394 | MS | M | 19 | U | LE3 | T | 1 | 1 | Other | At_home | … | 8 | 9 | 9 | True |

395 rows × 34 columns
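Because the rule is a single comparison, a boolean column can also be produced directly with a vectorized expression instead of calling a function per row. A sketch on a few stand-in ages (`ages` is an illustrative frame; the threshold of 17 is the one stated in this step):

```python
import pandas as pd

ages = pd.DataFrame({"age": [18, 17, 15, 21]})

def majority(age):
    # True when older than 17
    return age > 17

# Element-wise application of the function
ages["legal_drinker"] = ages["age"].map(majority)

# Equivalent vectorized comparison over the whole column
vectorized = ages["age"] > 17

print(ages)
print(vectorized.tolist())
```

Both forms yield the same True/False values; the vectorized comparison is usually the shorter and faster choice on large frames.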