导入pandas
import pandas as pd
1. 字典键将用作列标题,而每个列表中的值将用作的行DataFrame。
In [ 2 ] : df = pd. DataFrame( {
. . . : "Name" : [ "Braund, Mr. Owen Harris" ,
. . . : "Allen, Mr. William Henry" ,
. . . : "Bonnell, Miss. Elizabeth" ] ,
. . . : "Age" : [ 22 , 35 , 58 ] ,
. . . : "Sex" : [ "male" , "male" , "female" ] }
. . . : )
df
Name
Age
Sex
0
Braund, Mr. Owen Harris
22
male
1
Allen, Mr. William Henry
35
male
2
Bonnell, Miss. Elizabeth
58
female
2. DataFrame的每一列都是一个Series,并且每个Series都又是一个DataFrame
df[ "Age" ]
0 22
1 35
2 58
Name: Age, dtype: int64
ages = pd. Series( [ 22 , 35 , 58 ] , name= "Age" )
ages
0 22
1 35
2 58
Name: Age, dtype: int64
3. 对数据进行操作
df[ "Age" ] . max ( )
58
ages. max ( )
58
df. describe( )
Age
count
3.000000
mean
38.333333
std
18.230012
min
22.000000
25%
28.500000
50%
35.000000
75%
46.500000
max
58.000000
读取和写入表格数据
1.read_csv()将存储为csv文件的数据读入
titanic= pd. read_csv( "data/titanic.csv" )
titanic
PassengerId
Survived
Pclass
Name
Sex
Age
SibSp
Parch
Ticket
Fare
Cabin
Embarked
0
1
0
3
Braund, Mr. Owen Harris
male
22.0
1
0
A/5 21171
7.2500
NaN
S
1
2
1
1
Cumings, Mrs. John Bradley (Florence Briggs Th...
female
38.0
1
0
PC 17599
71.2833
C85
C
2
3
1
3
Heikkinen, Miss. Laina
female
26.0
0
0
STON/O2. 3101282
7.9250
NaN
S
3
4
1
1
Futrelle, Mrs. Jacques Heath (Lily May Peel)
female
35.0
1
0
113803
53.1000
C123
S
4
5
0
3
Allen, Mr. William Henry
male
35.0
0
0
373450
8.0500
NaN
S
...
...
...
...
...
...
...
...
...
...
...
...
...
886
887
0
2
Montvila, Rev. Juozas
male
27.0
0
0
211536
13.0000
NaN
S
887
888
1
1
Graham, Miss. Margaret Edith
female
19.0
0
0
112053
30.0000
B42
S
888
889
0
3
Johnston, Miss. Catherine Helen "Carrie"
female
NaN
1
2
W./C. 6607
23.4500
NaN
S
889
890
1
1
Behr, Mr. Karl Howell
male
26.0
0
0
111369
30.0000
C148
C
890
891
0
3
Dooley, Mr. Patrick
male
32.0
0
0
370376
7.7500
NaN
Q
891 rows × 12 columns
2.使用head()查看前n行,tail()查看后m行
titanic. head( 8 )
PassengerId
Survived
Pclass
Name
Sex
Age
SibSp
Parch
Ticket
Fare
Cabin
Embarked
0
1
0
3
Braund, Mr. Owen Harris
male
22.0
1
0
A/5 21171
7.2500
NaN
S
1
2
1
1
Cumings, Mrs. John Bradley (Florence Briggs Th...
female
38.0
1
0
PC 17599
71.2833
C85
C
2
3
1
3
Heikkinen, Miss. Laina
female
26.0
0
0
STON/O2. 3101282
7.9250
NaN
S
3
4
1
1
Futrelle, Mrs. Jacques Heath (Lily May Peel)
female
35.0
1
0
113803
53.1000
C123
S
4
5
0
3
Allen, Mr. William Henry
male
35.0
0
0
373450
8.0500
NaN
S
5
6
0
3
Moran, Mr. James
male
NaN
0
0
330877
8.4583
NaN
Q
6
7
0
1
McCarthy, Mr. Timothy J
male
54.0
0
0
17463
51.8625
E46
S
7
8
0
3
Palsson, Master. Gosta Leonard
male
2.0
3
1
349909
21.0750
NaN
S
titanic. tail( 3 )
PassengerId
Survived
Pclass
Name
Sex
Age
SibSp
Parch
Ticket
Fare
Cabin
Embarked
888
889
0
3
Johnston, Miss. Catherine Helen "Carrie"
female
NaN
1
2
W./C. 6607
23.45
NaN
S
889
890
1
1
Behr, Mr. Karl Howell
male
26.0
0
0
111369
30.00
C148
C
890
891
0
3
Dooley, Mr. Patrick
male
32.0
0
0
370376
7.75
NaN
Q
3. dtypes属性:解释每种列数据类型
titanic. dtypes
PassengerId int64
Survived int64
Pclass int64
Name object
Sex object
Age float64
SibSp int64
Parch int64
Ticket object
Fare float64
Cabin object
Embarked object
dtype: object
4.数据存储
read_*函数用来读取数据
to_*方法用于存储数据
to_excel()方法将数据存储为excel文件。
titanic. to_excel( 'titanic.xlsx' , sheet_name= 'passengers' , index= False )
titanic= pd. read_excel( 'titanic.xlsx' , sheet_name= 'passengers' )
titanic. head( )
PassengerId
Survived
Pclass
Name
Sex
Age
SibSp
Parch
Ticket
Fare
Cabin
Embarked
0
1
0
3
Braund, Mr. Owen Harris
male
22.0
1
0
A/5 21171
7.2500
NaN
S
1
2
1
1
Cumings, Mrs. John Bradley (Florence Briggs Th...
female
38.0
1
0
PC 17599
71.2833
C85
C
2
3
1
3
Heikkinen, Miss. Laina
female
26.0
0
0
STON/O2. 3101282
7.9250
NaN
S
3
4
1
1
Futrelle, Mrs. Jacques Heath (Lily May Peel)
female
35.0
1
0
113803
53.1000
C123
S
4
5
0
3
Allen, Mr. William Henry
male
35.0
0
0
373450
8.0500
NaN
S
5. 方法info()提供有关的技术信息
titanic. info( )
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 714 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 204 non-null object
11 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
小结:
read_*函数支持将数据从许多不同的文件格式或数据源导入。
通过不同的to_*方法可以将数据导出 。
head/ tail/ info方法和dtypes属性。
选择子集
1.选择单列
ages= titanic[ 'Age' ]
ages. head( )
0 22.0
1 38.0
2 26.0
3 35.0
4 35.0
Name: Age, dtype: float64
type ( titanic[ "Age" ] )
pandas.core.series.Series
titanic[ "Age" ] . shape
(891,)
2.选择多列
age_sex= titanic[ [ "Age" , "Sex" ] ]
age_sex. head( )
Age
Sex
0
22.0
male
1
38.0
female
2
26.0
female
3
35.0
female
4
35.0
male
type ( titanic[ [ "Age" , "Sex" ] ] )
pandas.core.frame.DataFrame
titanic[ [ "Age" , "Sex" ] ] . shape
(891, 2)
过滤特定行
above_35= titanic[ titanic[ "Age" ] > 35 ]
above_35. head( )
PassengerId
Survived
Pclass
Name
Sex
Age
SibSp
Parch
Ticket
Fare
Cabin
Embarked
1
2
1
1
Cumings, Mrs. John Bradley (Florence Briggs Th...
female
38.0
1
0
PC 17599
71.2833
C85
C
6
7
0
1
McCarthy, Mr. Timothy J
male
54.0
0
0
17463
51.8625
E46
S
11
12
1
1
Bonnell, Miss. Elizabeth
female
58.0
0
0
113783
26.5500
C103
S
13
14
0
3
Andersson, Mr. Anders Johan
male
39.0
1
5
347082
31.2750
NaN
S
15
16
1
2
Hewlett, Mrs. (Mary D Kingcome)
female
55.0
0
0
248706
16.0000
NaN
S
titanic[ "Age" ] > 35
0 False
1 True
2 False
3 False
4 False
...
886 False
887 False
888 False
889 False
890 False
Name: Age, Length: 891, dtype: bool
above_35. shape
(217, 12)
class_23= titanic[ titanic[ "Pclass" ] . isin( [ 2 , 3 ] ) ]
class_23. head( )
PassengerId
Survived
Pclass
Name
Sex
Age
SibSp
Parch
Ticket
Fare
Cabin
Embarked
0
1
0
3
Braund, Mr. Owen Harris
male
22.0
1
0
A/5 21171
7.2500
NaN
S
2
3
1
3
Heikkinen, Miss. Laina
female
26.0
0
0
STON/O2. 3101282
7.9250
NaN
S
4
5
0
3
Allen, Mr. William Henry
male
35.0
0
0
373450
8.0500
NaN
S
5
6
0
3
Moran, Mr. James
male
NaN
0
0
330877
8.4583
NaN
Q
7
8
0
3
Palsson, Master. Gosta Leonard
male
2.0
3
1
349909
21.0750
NaN
S
1.isin()条件函数
与条件表达式类似,返回布尔型
为在列表中的每一行某个不为空的值都返回一个True ( the isin() conditional function returns a True for each row the values are in the provided list.)
class_23= titanic[ ( titanic[ "Pclass" ] == 2 ) | ( titanic[ "Pclass" ] == 3 ) ]
class_23. head( )
PassengerId
Survived
Pclass
Name
Sex
Age
SibSp
Parch
Ticket
Fare
Cabin
Embarked
0
1
0
3
Braund, Mr. Owen Harris
male
22.0
1
0
A/5 21171
7.2500
NaN
S
2
3
1
3
Heikkinen, Miss. Laina
female
26.0
0
0
STON/O2. 3101282
7.9250
NaN
S
4
5
0
3
Allen, Mr. William Henry
male
35.0
0
0
373450
8.0500
NaN
S
5
6
0
3
Moran, Mr. James
male
NaN
0
0
330877
8.4583
NaN
Q
7
8
0
3
Palsson, Master. Gosta Leonard
male
2.0
3
1
349909
21.0750
NaN
S
注意
组合多个条件语句时,每个条件必须用括号括起来()。
不能使用 or,and只能使用|和&。
2.notna():为每一行的某一个not Null的值返回一个True
age_no_na= titanic[ titanic[ "Age" ] . notna( ) ]
age_no_na. shape
(714, 12)
选择特定的行和列
1. loc[] :行和列名
adult_names= titanic. loc[ titanic[ "Age" ] > 35 , "Name" ]
adult_names. head( )
1 Cumings, Mrs. John Bradley (Florence Briggs Th...
6 McCarthy, Mr. Timothy J
11 Bonnell, Miss. Elizabeth
13 Andersson, Mr. Anders Johan
15 Hewlett, Mrs. (Mary D Kingcome)
Name: Name, dtype: object
2. iloc[]:基于位置
titanic. iloc[ 9 : 25 , 2 : 5 ]
Pclass
Name
Sex
9
2
Nasser, Mrs. Nicholas (Adele Achem)
female
10
3
Sandstrom, Miss. Marguerite Rut
female
11
1
Bonnell, Miss. Elizabeth
female
12
3
Saundercock, Mr. William Henry
male
13
3
Andersson, Mr. Anders Johan
male
14
3
Vestrom, Miss. Hulda Amanda Adolfina
female
15
2
Hewlett, Mrs. (Mary D Kingcome)
female
16
3
Rice, Master. Eugene
male
17
2
Williams, Mr. Charles Eugene
male
18
3
Vander Planke, Mrs. Julius (Emelia Maria Vande...
female
19
3
Masselmani, Mrs. Fatima
female
20
2
Fynney, Mr. Joseph J
male
21
2
Beesley, Mr. Lawrence
male
22
3
McGowan, Miss. Anna "Annie"
female
23
1
Sloper, Mr. William Thompson
male
24
3
Palsson, Miss. Torborg Danira
female
3.将表中元素进行替换
titanic. iloc[ : 3 , 3 ] = "YuHongxia"
titanic. head( )
PassengerId
Survived
Pclass
Name
Sex
Age
SibSp
Parch
Ticket
Fare
Cabin
Embarked
0
1
0
3
YuHongxia
male
22.0
1
0
A/5 21171
7.2500
NaN
S
1
2
1
1
YuHongxia
female
38.0
1
0
PC 17599
71.2833
C85
C
2
3
1
3
YuHongxia
female
26.0
0
0
STON/O2. 3101282
7.9250
NaN
S
3
4
1
1
Futrelle, Mrs. Jacques Heath (Lily May Peel)
female
35.0
1
0
113803
53.1000
C123
S
4
5
0
3
Allen, Mr. William Henry
male
35.0
0
0
373450
8.0500
NaN
S
小结:
选择数据子集时,使用方括号[]不是()。
在括号内,可以使用单个列/行标签,列/行标签列表,标签切片,条件表达式或冒号。
选择特定的行和列时,如果使用行和列名可以使用loc
选择特定的行和列时,如果使用表中的位置可以使用iloc
可以基于loc/ iloc为选择分配新值。