Getting started with pandas: Series, DataFrame and Index

(1) Series

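A Pandas Series object can be thought of as a specialized Python dictionary "{ }": it maps typed keys to typed values, and the keys form an explicit index.

For a Series named data, the values and the index are available as data.values and data.index.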

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])  # create a pd.Series from a list of values with an explicit index
data:  
a 0.25
b 0.50
c 0.75
d 1.00
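The dictionary analogy can be checked directly. The sketch below rebuilds the data Series from above and uses dict-style operations on it; the extra key 'e' is purely illustrative and not part of the original example.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Dictionary-style lookup by key (an explicit index label)
print(data['b'])            # 0.5

# `in` checks the index labels, like `key in dict`
print('a' in data)          # True

# keys() returns the index; items() yields (label, value) pairs
keys = list(data.keys())
pairs = list(data.items())

# The two underlying pieces: a NumPy array of values and an Index of labels
print(data.values)
print(data.index)

# Assigning to a new key extends the Series, as it would a dict
data['e'] = 1.25
print(data)
```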

Series objects support slicing. When slicing by explicit labels, the end label is included:

data["a":"c"]
Output:
a    0.25
b    0.50
c    0.75
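Besides label slicing, a Series supports positional slicing, boolean masks and lists of labels. A minimal sketch, reusing the same data Series; the mask thresholds are arbitrary illustrative values.

```python
import pandas as pd

data = pd.Series([0.25, 0.5, 0.75, 1.0], index=['a', 'b', 'c', 'd'])

# Slicing by explicit label: the end label 'c' IS included
by_label = data['a':'c']

# Slicing by position with .iloc: the end position 2 is NOT included
by_position = data.iloc[0:2]

# Boolean masking keeps only entries where the condition is True
masked = data[(data > 0.3) & (data < 0.8)]

# Fancy indexing with a list of labels
picked = data[['a', 'd']]

print(by_label, by_position, masked, picked, sep='\n\n')
```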

(2) Index
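0: Supports array-like operations for retrieving values, including slicing.
1: Index objects are immutable: their values cannot be modified through indexing.
2: Supports set operations between the indexes of pandas objects: union, intersection and difference.
3: Note that slicing with an explicit index includes the final label, whereas slicing with an implicit (positional) index excludes the final position.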
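The first three properties above can be sketched as follows. The index contents are arbitrary example values. The set operations use the named methods intersection, union and difference rather than the &, | operators, because the meaning of those operators on Index objects has changed across pandas versions.

```python
import pandas as pd

ind = pd.Index([2, 3, 5, 7, 11])

# 0: array-like access and slicing
print(ind[1])       # 3
print(ind[::2])     # every second label: 2, 5, 11

# 1: Index objects are immutable; item assignment raises TypeError
immutable = False
try:
    ind[1] = 0
except TypeError as err:
    immutable = True
    print("TypeError:", err)

# 2: set operations via named methods
indA = pd.Index([1, 3, 5, 7, 9])
indB = pd.Index([2, 3, 5, 7, 11])
both = indA.intersection(indB)
either = indA.union(indB)
only_a = indA.difference(indB)
print(both, either, only_a, sep='\n')
```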

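Indexers: special attributes that expose a specific slicing interface, avoiding confusion between explicit and implicit indexing: loc (explicit labels), iloc (implicit positions) and ix (a hybrid of the two; deprecated and removed in pandas 1.0).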
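The confusion that loc and iloc resolve is easiest to see with an integer index whose labels differ from the positions. A minimal sketch with a hypothetical three-element Series:

```python
import pandas as pd

# Explicit integer labels 1, 3, 5 vs implicit positions 0, 1, 2
data = pd.Series(['a', 'b', 'c'], index=[1, 3, 5])

# .loc always uses explicit labels; the end label of a slice is included
print(data.loc[1])       # a
by_label = data.loc[1:3]     # labels 1 and 3

# .iloc always uses implicit positions; the end position is excluded
print(data.iloc[1])      # b
by_position = data.iloc[1:3] # positions 1 and 2

print(by_label, by_position, sep='\n\n')
```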

(3) DataFrame

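A DataFrame can be viewed as a generalized NumPy two-dimensional array of shape (a, b), or equivalently as a collection of Series objects, one Series per column. Both its rows and its columns can be accessed through indexes.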
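Both views of a DataFrame can be seen on one small frame. The sketch below uses hypothetical labels r0 to r2 and columns x, y:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1, 10], [2, 20], [3, 30]]),
                  index=['r0', 'r1', 'r2'], columns=['x', 'y'])

# Like a NumPy 2-D array: it has a shape (rows, columns) and a values array
print(df.shape)              # (3, 2)
print(df.values)

# Like a collection of Series: selecting one column yields a Series
col = df['y']
print(type(col).__name__)    # Series

# Rows are accessible by their index label as well
row = df.loc['r1']
print(row['x'], row['y'])    # 2 20
```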
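pd.DataFrame(...).index    # row index labels; in the output this is like the leftmost column of an Excel sheet
pd.DataFrame(...).columns  # returns an Index object holding the column labels
Missing values are represented as NaN (Not a Number).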
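The index and columns attributes, and NaN for missing values, can be seen together when two Series with only partly overlapping labels are combined. A minimal sketch; s1 and s2 are hypothetical names:

```python
import pandas as pd

s1 = pd.Series([1, 2], index=['x', 'y'])
s2 = pd.Series([10, 30], index=['x', 'z'])

df = pd.DataFrame({'s1': s1, 's2': s2})

print(df.index)    # row labels: the union x, y, z
print(df.columns)  # column labels s1, s2 as an Index object
print(df)          # labels missing from a Series show up as NaN
```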

Ways to create a pd.DataFrame:
0: from a single Series object
1: from a list of dicts
2: from a dict of Series objects
3: from a NumPy two-dimensional array
4: from a NumPy structured array

pd.DataFrame(population, columns=['population'])  # way0; population is assumed to be a Series indexed by state name
Output:
            population
California    38332521
Florida       19552860
Illinois      12882135
New York      19651127
Texas         26448193
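Because population is not defined in the snippet above, here is a self-contained sketch of way0. It assumes population is a Series indexed by state name, filled with the values from the output above. Series.to_frame is shown as an equivalent alternative.

```python
import pandas as pd

# Assumption: population is a Series indexed by state name
population = pd.Series({'California': 38332521, 'Florida': 19552860,
                        'Illinois': 12882135, 'New York': 19651127,
                        'Texas': 26448193})

# A single (unnamed) Series becomes a one-column DataFrame named via `columns`
df = pd.DataFrame(population, columns=['population'])

# Equivalent alternative: convert the Series and name its column
alt = population.to_frame(name='population')

print(df)
```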

data = [{'a': i, 'b': 2 * i} for i in range(3)]  # way1
pd.DataFrame(data)
Output:
   a  b
0  0  0
1  1  2
2  2  4
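When the dicts in the list do not all have the same keys, pandas fills the gaps with NaN. A minimal sketch with two hypothetical rows:

```python
import pandas as pd

# Each dict is one row; its keys become column names
rows = [{'a': 1, 'b': 2}, {'b': 3, 'c': 4}]
df = pd.DataFrame(rows)

# Column 'a' is NaN in row 1 and column 'c' is NaN in row 0
print(df)
```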

pd.DataFrame({'population': population, 'area': area})  # way2; population and area are assumed to be Series indexed by state name
Output:
            population    area
California    38332521  423967
Florida       19552860  170312
Illinois      12882135  149995
New York      19651127  141297
Texas         26448193  695662
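With a dict of Series, rows are aligned by index label, not by position. In this sketch, population and area are assumed to hold the values shown above, and area is deliberately built in a different order:

```python
import pandas as pd

states = ['California', 'Florida', 'Illinois', 'New York', 'Texas']
population = pd.Series([38332521, 19552860, 12882135, 19651127, 26448193],
                       index=states)

# Same labels, different order: alignment still matches each state correctly
area = pd.Series({'Texas': 695662, 'California': 423967, 'New York': 141297,
                  'Illinois': 149995, 'Florida': 170312})

# Dict keys become column labels (in key order in recent pandas versions)
df = pd.DataFrame({'population': population, 'area': area})
print(df)
```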

pd.DataFrame(np.random.rand(3, 2), columns=['foo', 'bar'], index=['a', 'b', 'c'])  # way3
Output:
        foo       bar
a  0.865257  0.213169
b  0.442759  0.108267
c  0.047110  0.905718
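When index and columns are omitted, pandas assigns default integer labels; when they are given, their lengths must match the array's shape. A minimal sketch using a deterministic array instead of random numbers:

```python
import numpy as np
import pandas as pd

arr = np.arange(6).reshape(3, 2)   # 3 rows x 2 columns

# Without labels: default integer labels 0..n-1 for rows and columns
df_default = pd.DataFrame(arr)

# With labels: 3 row labels and 2 column labels to match the shape
df_named = pd.DataFrame(arr, columns=['foo', 'bar'], index=['a', 'b', 'c'])

# A label count that does not match the shape raises ValueError
mismatch = False
try:
    pd.DataFrame(arr, columns=['only_one'])
except ValueError as err:
    mismatch = True
    print("ValueError:", err)

print(df_default)
print(df_named)
```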

A = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])  # way4
pd.DataFrame(A)
Output:
   A    B
0  0  0.0
1  0  0.0
2  0  0.0
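The structured array above contains only zeros, which hides the mapping from fields to columns. A sketch with illustrative non-zero values shows that field names become column names and field dtypes become column dtypes:

```python
import numpy as np
import pandas as pd

# Structured array: each field has its own name and dtype
A = np.zeros(3, dtype=[('A', 'i8'), ('B', 'f8')])
A['A'] = [1, 2, 3]
A['B'] = [0.5, 1.5, 2.5]

df = pd.DataFrame(A)
print(df)
print(df.dtypes)   # A: int64, B: float64
```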
