A First Look at Machine Learning (numpy)

numpy basics

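Attributes of the ndarray object

Attributes

ndarray.ndim: the number of axes (dimensions) of the array
ndarray.shape: the size of the array along each axis, as a tuple
ndarray.size: the total number of elements in the array
ndarray.dtype: the data type of the array's elements
ndarray.itemsize: the size in bytes of each element
ndarray.flat: a one-dimensional iterator over the array's elements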

Example

	from numpy import *
	a=arange(15).reshape(3,5)
	print(a.shape)
	print(a.ndim)
	print(a.dtype.name)
	print(a.itemsize)
	print(a.size)

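Output (the default integer type is platform-dependent, so int64 and 8 may appear instead of int32 and 4):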

	(3, 5)
	2
	int32
	4
	15

Creation

Common methods

zeros_like
ones_like
empty_like
arange
linspace
asarray
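
As a rough sketch of what the helpers listed above return (the variable names here are illustrative and not part of the original notes):

```python
import numpy as np

base = np.array([[1, 2, 3], [4, 5, 6]])

zeros = np.zeros_like(base)      # same shape and dtype as base, filled with 0
ones = np.ones_like(base)        # same shape and dtype as base, filled with 1
empty = np.empty_like(base)      # same shape and dtype, contents uninitialized
steps = np.arange(0, 10, 2)      # start, stop (exclusive), step
grid = np.linspace(0.0, 1.0, 5)  # 5 evenly spaced points, both ends included
same = np.asarray(base)          # no copy is made when the input is already an ndarray

print(steps.tolist())  # [0, 2, 4, 6, 8]
print(grid.tolist())   # [0.0, 0.25, 0.5, 0.75, 1.0]
print(same is base)    # True
```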

Similar methods

	from numpy import *
	a=array([2,3,4,5])
	b=array(([1,2,3],[4,5,6]))
	z=zeros((2,3))
	o=ones((2,3))
	e=empty((2,3))
	ey=eye(3)
	id=identity(3)

Loading from a file

	from numpy import *
	a=genfromtxt(fname='1',delimiter=',')
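
To try genfromtxt without a file on disk, a minimal sketch can pass a file-like object instead. The CSV content below is made up for illustration:

```python
import io
import numpy as np

# Hypothetical CSV content standing in for a real file.
csv_text = io.StringIO("1,2,3\n4,5,6\n")
table = np.genfromtxt(csv_text, delimiter=",")

print(table.shape)     # (2, 3)
print(table.dtype)     # float64
print(table.tolist())  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```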

Generating from a function

	from numpy import *
	a=fromfunction(lambda i,j:i+j,(2,3))
	print(a)

Operations

Arithmetic operators on arrays apply elementwise; a new array is created and filled with the result

	>>> a = array( [20,30,40,50] )
	>>> b = arange( 4 )
	>>> b
	array([0, 1, 2, 3])
	>>> c = a-b
	>>> c
	array([20, 29, 38, 47])
	>>> b**2
	array([0, 1, 4, 9])
	>>> 10*sin(a)
	array([ 9.12945251, -9.88031624,  7.4511316 , -2.62374854])
	>>> a<35
	array([True, True, False, False], dtype=bool)

The * operator works elementwise; dot computes the matrix product

	>>> A = array( [[1,1],
	...             [0,1]] )
	>>> B = array( [[2,0],
	...             [3,4]] )
	>>> A*B                         # elementwise product
	array([[2, 0],
	       [0, 4]])
	>>> dot(A,B)                    # matrix product
	array([[5, 4],
	       [3, 4]])

Operators such as +=, -= and *= modify the existing array in place instead of creating a new one

	>>> a = ones((2,3), dtype=int)
	>>> b = random.random((2,3))
	>>> a *= 3
	>>> a
	array([[3, 3, 3],
	       [3, 3, 3]])
	>>> b += a
	>>> b
	array([[ 3.69092703,  3.8324276 ,  3.0114541 ],
	       [ 3.18679111,  3.3039349 ,  3.37600289]])
	>>> a += b  # current NumPy raises a casting error here: float64 results cannot be stored in place in an integer array

When operating on arrays of different types, the result takes the more general or more precise type (upcasting)

	>>> a = ones(3, dtype=int32)
	>>> b = linspace(0,pi,3)
	>>> b.dtype.name
	'float64'
	>>> c = a+b
	>>> c
	array([ 1.        ,  2.57079633,  4.14159265])
	>>> c.dtype.name
	'float64'
	>>> d = exp(c*1j)
	>>> d
	array([ 0.54030231+0.84147098j, -0.84147098+0.54030231j,
	       -0.54030231-0.84147098j])
	>>> d.dtype.name
	'complex128'

Many unary operations, such as computing the sum of all the elements in the array, are implemented as methods of the ndarray class

	>>> a = random.random((2,3))
	>>> a
	array([[ 0.6903007 ,  0.39168346,  0.16524769],
	       [ 0.48819875,  0.77188505,  0.94792155]])
	>>> a.sum()
	3.4552372100521485
	>>> a.min()
	0.16524768654743593
	>>> a.max()
	0.9479215542670073

Common functions

sum cumsum

	>>> b = arange(12).reshape(3,4)
	>>> b
	array([[ 0,  1,  2,  3],
	       [ 4,  5,  6,  7],
	       [ 8,  9, 10, 11]])
	>>> b.sum(axis=0)        # sum of each column
	array([12, 15, 18, 21])
	>>> b.min(axis=1)        # min of each row
	array([0, 4, 8])
	>>> b.cumsum(axis=1)      # cumulative sum along each row
	array([[ 0,  1,  3,  6],
	       [ 4,  9, 15, 22],
	       [ 8, 17, 27, 38]])

exp sqrt add

	>>> B = arange(3)
	>>> B
	array([0, 1, 2])
	>>> exp(B)
	array([ 1.        ,  2.71828183,  7.3890561 ])
	>>> sqrt(B)
	array([ 0.        ,  1.        ,  1.41421356])
	>>> C = array([2., -1., 4.])
	>>> add(B, C)
	array([ 2.,  0.,  6.])

all any nonzero

	from numpy import *
	a=arange(10).reshape(2,5)
	b=a.copy()
	print((a==b).all())
	print((a==b).any())
	print(a.all())
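
The listing above does not exercise nonzero, so here is a small sketch that repeats the same checks and adds it. The threshold 6 is arbitrary:

```python
import numpy as np

a = np.arange(10).reshape(2, 5)
b = a.copy()

print((a == b).all())  # True: every element matches
print((a == b).any())  # True: at least one element matches
print(a.all())         # False: a contains a 0, which counts as False

# nonzero returns one index array per axis for the entries that are non-zero (True)
rows, cols = np.nonzero(a > 6)
print(rows.tolist())  # [1, 1, 1]
print(cols.tolist())  # [2, 3, 4]
```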

apply_along_axis

	from numpy import *
	def my_func(a):
			return  (a[0] + a[-1]) * 0.5
	b = array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
	print(apply_along_axis(my_func, 0, b))

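Common comparison functions

argmax, argmin, argsort: index of the maximum, index of the minimum, and the indices that would sort the array
average: the mean, optionally weighted
median: the median
max maximum: the maximum of an array; the elementwise maximum of two arrays
mean median: the mean; the median
min minimum: the minimum of an array; the elementwise minimum of two arrays
sort: sorting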
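
A short sketch of these functions on two made-up vectors may help. Note in particular that maximum and minimum compare two arrays element by element, unlike max and min:

```python
import numpy as np

v = np.array([3, 1, 4, 1, 5])
w = np.array([2, 7, 1, 8, 2])

print(np.argmax(v))               # 4, position of the largest value (5)
print(np.argmin(v))               # 1, first position of the smallest value (1)
order = np.argsort(v)             # indices that would sort v
print(v[order].tolist())          # [1, 1, 3, 4, 5]
print(np.sort(v).tolist())        # [1, 1, 3, 4, 5]
print(v.max(), v.min())           # 5 1
print(np.maximum(v, w).tolist())  # [3, 7, 4, 8, 5]
print(np.minimum(v, w).tolist())  # [2, 1, 1, 1, 2]
print(np.median(v))               # 3.0
weighted = np.average(v, weights=[1, 1, 1, 1, 6])  # weighted mean
```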

clip

	import numpy as np
	a = np.arange(10)
	# values below 1 become 1, and values above 8 become 8
	print(np.clip(a, 1, 8))
	# [1 1 2 3 4 5 6 7 8 8]
	# the result can be written into an existing array via out=
	np.clip(a, 3, 6, out=a)
	print(a)
	# [3 3 3 3 4 5 6 6 6 6]
	a = np.arange(10)
	# a bound can also be an array, compared element by element; it must be broadcastable to the shape of a
	print(np.clip(a, [3,4,1,1,1,4,4,4,4,4], 8))
	# [3 4 2 3 4 5 6 7 8 8]

bincount

	x = np.array([0, 1, 1, 3, 2, 1, 7])
	# the largest value above is 7, so think of the bins as b=[0,1,2,3,4,5,6,7]
	np.bincount(x)  # counts how many times each value in b occurs in x
	# array([1, 3, 1, 1, 0, 0, 0, 1])

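Rounding floats to integers

ceil: the smallest integer not less than the value
floor: the largest integer not greater than the value
round: rounds to the nearest integer; NumPy rounds exact halves to the nearest even integer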
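
A quick sketch of the three functions on made-up values. Note that the results are still floats, so an explicit cast is needed to get an integer dtype:

```python
import numpy as np

x = np.array([0.5, 1.5, 2.5, -1.2, 2.7])

print(np.ceil(x).tolist())   # [1.0, 2.0, 3.0, -1.0, 3.0]
print(np.floor(x).tolist())  # [0.0, 1.0, 2.0, -2.0, 2.0]
print(np.round(x).tolist())  # [0.0, 2.0, 2.0, -1.0, 3.0], halves go to the even neighbor

as_int = np.floor(x).astype(int)  # explicit cast to an integer dtype
print(as_int.tolist())            # [0, 1, 2, -2, 2]
```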

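Statistics

numpy.cov(): computes the covariance matrix
numpy.cross(): cross product of vectors
numpy.dot(): dot product of vectors
numpy.cumprod(): cumulative product
numpy.cumsum(): cumulative sum
numpy.diff(): discrete differences along the given axis
numpy.lexsort(): indirect sort using several keys, such as sorting rows by a chosen column
numpy.prod(): product of all elements
numpy.std(): standard deviation
numpy.trace(): trace of a matrix
numpy.transpose(): matrix transpose
numpy.var(): variance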
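
Most of these can be tried on a tiny matrix. The following is only a sketch with made-up data; the group and score arrays are hypothetical keys for lexsort:

```python
import numpy as np

m = np.array([[1, 2], [3, 4]])

print(np.cumsum(m).tolist())        # [1, 3, 6, 10], running sum of the flattened array
print(np.cumprod(m).tolist())       # [1, 2, 6, 24], running product
print(np.prod(m))                   # 24
print(np.diff(m, axis=1).tolist())  # [[1], [1]], differences along each row
print(np.trace(m))                  # 5, sum of the main diagonal
print(np.transpose(m).tolist())     # [[1, 3], [2, 4]]
print(np.var(m), np.std(m))         # 1.25 and its square root
print(np.dot([1, 2, 3], [4, 5, 6]))             # 32
print(np.cross([1, 0, 0], [0, 1, 0]).tolist())  # [0, 0, 1]

# lexsort: the LAST key is the primary one, so this sorts by group, then by score
group = np.array([1, 0, 1, 0])
score = np.array([2, 9, 1, 3])
order = np.lexsort((score, group))
print(order.tolist())               # [3, 1, 2, 0]
```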

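numpy.where(condition, x, y): returns x at the positions where the condition holds and y elsewhere; called with only a condition, it returns the indices where the condition holds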

	import numpy as np
	a=np.arange(10).reshape(2,5)
	b=np.where(a>5,a,0)
	print(b)
	c=np.where(a>5)
	print(c)
	print(list(zip(c[0],c[1])))

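numpy.ptp(): the range of the values, that is, the maximum minus the minimum
numpy.searchsorted(): finds the positions at which values should be inserted into a sorted array to keep it sorted
numpy.choose(a, choices): uses the indices in a to pick the corresponding entries from choices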

	>>> a = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]
	>>> choices = [-10, 10]
	>>> np.choose(a, choices)
	array([[ 10, -10,  10],
	       [-10,  10, -10],
	       [ 10, -10,  10]])

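np.compress(): keeps only the entries that satisfy a boolean condition
np.median(): returns the median
np.full(shape, fill_value): creates an array of the given shape filled with fill_value
np.put(a, ind, v): replaces the entries of a at the indices ind with the values v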
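
A brief sketch of ptp, searchsorted and the helpers listed above, using a made-up sorted array:

```python
import numpy as np

a = np.array([10, 20, 30, 40, 50])

print(np.ptp(a))                              # 40, max minus min
print(np.searchsorted(a, [25, 50]).tolist())  # [2, 4], insertion points that keep a sorted
print(np.compress(a > 25, a).tolist())        # [30, 40, 50]
filled = np.full((2, 3), 7)                   # 2x3 array where every entry is 7
print(filled.tolist())                        # [[7, 7, 7], [7, 7, 7]]

np.put(a, [0, 2], [-1, -3])                   # modifies a in place and returns None
print(a.tolist())                             # [-1, 20, -3, 40, 50]
```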

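np.putmask(a, mask, values): wherever mask is True, replaces the entry of a with the corresponding value; a is modified in place

	>>> x = np.arange(6).reshape(2, 3)
	>>> np.putmask(x, x>2, x**2)
	>>> x
	array([[ 0,  1,  2],
	       [ 9, 16, 25]])

or, when values is shorter than a, it is repeated:

	>>> x = np.arange(5)
	>>> np.putmask(x, x>1, [-33, -44])
	>>> x
	array([  0,   1, -33, -44, -33])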

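Set operations

numpy.unique(x): returns the sorted unique elements
numpy.intersect1d(x, y): intersection
numpy.union1d(x, y): union
numpy.setdiff1d(x, y): set difference, the elements of x that are not in y
numpy.setxor1d(x, y): elements that are in exactly one of the two arrays
numpy.in1d(x, y): tests whether each element of x is present in y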
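
The set functions can be sketched on two small made-up arrays. This sketch uses np.isin for the membership test, which performs the same check as in1d:

```python
import numpy as np

x = np.array([1, 2, 2, 3, 4])
y = np.array([3, 4, 5])

print(np.unique(x).tolist())          # [1, 2, 3, 4]
print(np.intersect1d(x, y).tolist())  # [3, 4]
print(np.union1d(x, y).tolist())      # [1, 2, 3, 4, 5]
print(np.setdiff1d(x, y).tolist())    # [1, 2]
print(np.setxor1d(x, y).tolist())     # [1, 2, 5]
print(np.isin(x, y).tolist())         # [False, False, False, True, True]
```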

Basic indexing

One-dimensional indexing

	>>> a = arange(10)**3
	>>> a
	array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])
	>>> a[2]
	8
	>>> a[2:5]
	array([ 8, 27, 64])
	>>> a[:6:2] = -1000    # equivalent to a[0:6:2] = -1000; from start to position 6, exclusive, set every 2nd element to -1000
	>>> a
	array([-1000,     1, -1000,    27, -1000,   125,   216,   343,   512,   729])
	>>> a[ : :-1]                                 # reversed a
	array([  729,   512,   343,   216,   125, -1000,    27, -1000,     1, -1000])

Multidimensional indexing

	>>> def f(x,y):
	...     return 10*x+y
	...
	>>> b = fromfunction(f,(5,4),dtype=int)
	>>> b
	array([[ 0,  1,  2,  3],
	       [10, 11, 12, 13],
	       [20, 21, 22, 23],
	       [30, 31, 32, 33],
	       [40, 41, 42, 43]])
	>>> b[2,3]
	23
	>>> b[0:5, 1]                       # each row in the second column of b
	array([ 1, 11, 21, 31, 41])
	>>> b[ : ,1]                        # equivalent to the previous example
	array([ 1, 11, 21, 31, 41])
	>>> b[1:3, : ]                      # each column in the second and third row of b
	array([[10, 11, 12, 13],
	       [20, 21, 22, 23]])
	>>> # when fewer indices are provided than the number of axes, the missing indices are treated as complete slices:
	>>> b[-1]                                  # the last row. Equivalent to b[-1,:]
	array([40, 41, 42, 43])

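Changing the shape of an array

	import numpy as np
	s = np.arange(4)
	ra = s.ravel()          # flattens to 1-D like flatten, but returns a view when possible
	re = s.reshape(1, 4)    # reads the elements row by row and refills the new shape; the total number of elements must match
	f = s.flatten()         # flattens to 1-D in row-major order by default; always returns a copy
	s.resize(1, 4)          # resizes in place and returns None; if the new shape is larger, the extra entries are filled with zeros
	np.resize(s, (1, 4))    # returns a new array; if the new shape is larger, the original data is repeated to fill it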

	>>> a=np.array([[0,1],[2,3]])
	>>> np.resize(a,(2,3))
	array([[0, 1, 2],
	       [3, 0, 1]])
	>>> np.resize(a,(1,4))
	array([[0, 1, 2, 3]])
	>>> np.resize(a,(2,4))
	array([[0, 1, 2, 3],
	       [0, 1, 2, 3]])

	>>> b = np.array([[0, 1], [2, 3]])
	>>> b.resize(2, 3)    # in place; returns None
	>>> b
	array([[0, 1, 2],
	       [3, 0, 0]])
	>>> # if NumPy reports that the array is referenced elsewhere, pass refcheck=False
	>>> b.resize(1,4)
	>>> b
	array([[0, 1, 2, 3]])
	>>> b.resize(2,4)
	>>> b
	array([[0, 1, 2, 3],
	       [0, 0, 0, 0]])

Combining arrays

Common methods

vstack
hstack
row_stack
column_stack
concatenate
stack

Simple vstack and hstack

	>>> a = floor(10*random.random((2,2)))
	>>> a
	array([[ 1.,  1.],
	       [ 5.,  8.]])
	>>> b = floor(10*random.random((2,2)))
	>>> b
	array([[ 3.,  3.],
	       [ 6.,  0.]])
	>>> # vstack: stack vertically, row-wise
	>>> vstack((a,b))
	array([[ 1.,  1.],
	       [ 5.,  8.],
	       [ 3.,  3.],
	       [ 6.,  0.]])
	>>> # hstack: stack horizontally, column-wise
	>>> hstack((a,b))
	array([[ 1.,  1.,  3.,  3.],
	       [ 5.,  8.,  6.,  0.]])

stack

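The stack function stacks arrays. It is called as follows:
np.stack(arrays,axis=0)
Here arrays is the sequence of arrays to stack and axis is the axis used for stacking, for example:
arrays = [[1,2,3,4], [5,6,7,8]]
This is a two-dimensional array; axis=0 refers to its first dimension, that is, arrays[0] = [1,2,3,4] or arrays[1] = [5,6,7,8]
With axis=i, the inputs are "packed" along a new axis inserted at position i of the result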

	>>> # np.stack(arrays, axis=0) takes each item along the first dimension, e.g. 1, 2, 3, 4, and packs it as [1, 2, 3, 4]; the rest are handled the same way, giving:
	>>> arrays = [[1,2,3,4], [5,6,7,8]]
	>>> arrays=np.array(arrays)
	>>> np.stack(arrays,axis=0)
	array([[1, 2, 3, 4],
	       [5, 6, 7, 8]])
	>>> # np.stack(arrays, axis=1) packs along the second dimension instead, so 1 and 5 are packed into [1, 5], and so on, giving:
	>>> np.stack(arrays, axis=1)
	array([[1, 5],
	       [2, 6],
	       [3, 7],
	       [4, 8]])

Higher-dimensional exercise

	>>> a = np.array([[1,2,3,4], [5,6,7,8]])
	>>> arrays = np.asarray([a, a , a])
	>>> np.stack(arrays, axis=0)
	array([[[1, 2, 3, 4],
	        [5, 6, 7, 8]],

	       [[1, 2, 3, 4],
	        [5, 6, 7, 8]],

	       [[1, 2, 3, 4],
	        [5, 6, 7, 8]]])

	>>> np.stack(arrays, axis=1)
	array([[[1, 2, 3, 4],
	        [1, 2, 3, 4],
	        [1, 2, 3, 4]],

	       [[5, 6, 7, 8],
	        [5, 6, 7, 8],
	        [5, 6, 7, 8]]])

	>>> np.stack(arrays, axis=2)
	array([[[1, 1, 1],
	        [2, 2, 2],
	        [3, 3, 3],
	        [4, 4, 4]],

	       [[5, 5, 5],
	        [6, 6, 6],
	        [7, 7, 7],
	        [8, 8, 8]]])

concatenate

Joins arrays along an existing axis; unlike stack, it does not create a new axis
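
Since the notes give no example for concatenate, here is a minimal sketch contrasting it with stack. The arrays are made up for illustration:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

rows = np.concatenate((a, b), axis=0)    # append b as a new row: shape (3, 2)
cols = np.concatenate((a, b.T), axis=1)  # append b as a new column: shape (2, 3)
stacked = np.stack((a, a), axis=0)       # stack adds an axis: shape (2, 2, 2)

print(rows.tolist())   # [[1, 2], [3, 4], [5, 6]]
print(cols.tolist())   # [[1, 2, 5], [3, 4, 6]]
print(rows.ndim, stacked.ndim)  # 2 3
```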

Splitting

	>>> a = floor(10*random.random((2,12)))
	>>> a
	array([[ 8.,  8.,  3.,  9.,  0.,  4.,  3.,  0.,  0.,  6.,  4.,  4.],
	       [ 0.,  3.,  2.,  9.,  6.,  0.,  4.,  5.,  7.,  5.,  1.,  4.]])
	>>> hsplit(a,3)   # Split a into 3
	[array([[ 8.,  8.,  3.,  9.],
	       [ 0.,  3.,  2.,  9.]]), array([[ 0.,  4.,  3.,  0.],
	       [ 6.,  0.,  4.,  5.]]), array([[ 0.,  6.,  4.,  4.],
	       [ 7.,  5.,  1.,  4.]])]
	>>> hsplit(a,(3,4))   # Split a after the third and the fourth column
	[array([[ 8.,  8.,  3.],
	       [ 0.,  3.,  2.]]), array([[ 9.],
	       [ 9.]]), array([[ 0.,  4.,  3.,  0.,  0.,  6.,  4.,  4.],
	       [ 6.,  0.,  4.,  5.,  7.,  5.,  1.,  4.]])]
	>>> # split can divide along any chosen axis, so it is more general than hsplit and vsplit
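
As a sketch of how split with an explicit axis relates to hsplit and vsplit (the array here is made up and uses arange rather than random values so the shapes are easy to follow):

```python
import numpy as np

a = np.arange(24).reshape(2, 12)

left, mid, right = np.split(a, 3, axis=1)  # same as np.hsplit(a, 3)
top, bottom = np.split(a, 2, axis=0)       # same as np.vsplit(a, 2)
parts = np.split(a, [3, 4], axis=1)        # cut before columns 3 and 4

print(left.shape, top.shape)     # (2, 4) (1, 12)
print([p.shape for p in parts])  # [(2, 3), (2, 1), (2, 8)]
```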

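Views and shallow copies

Views

Different array objects can share the same data. The view method creates a new array object that points at the same data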

	>>> a = array([[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]])
	>>> c = a.view()
	>>> c is a
	False
	>>> c.base is a                        # c is a view of the data owned by a
	True
	>>> c.flags.owndata
	False
	>>> c.shape = 2,6                      # a's shape doesn't change
	>>> a.shape
	(3, 4)
	>>> c[0,4] = 1234                      # a's data changes
	>>> a
	array([[   0,    1,    2,    3],
	       [1234,    5,    6,    7],
	       [   8,    9,   10,   11]])
	>>> # a view is a kind of shallow copy

Slicing an array returns a view of it

	>>> s = a[ : , 1:3]     # spaces added for clarity; could also be written "s = a[:,1:3]"
	>>> s[:] = 10           # s[:] is a view of s. Note the difference between s=10 and s[:]=10
	>>> a
	array([[   0,   10,   10,    3],
	       [1234,   10,   10,    7],
	       [   8,   10,   10,   11]])

Copies

The copy method makes a complete copy of the array and its data

	>>> d = a.copy()                          # a new array object with new data is created
	>>> d is a
	False
	>>> d.base is a                           # d doesn't share anything with a
	False
	>>> d[0,0] = 9999
	>>> a
	array([[   0,   10,   10,    3],
	       [1234,   10,   10,    7],
	       [   8,   10,   10,   11]])

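Broadcasting rules

First, check whether the two arrays have the same number of dimensions; if not, pad the shape of the one with fewer dimensions with ones. Note that "number of dimensions" here does not mean the values n and d of an n-by-d matrix; an ordinary matrix simply has 2 dimensions. When a two-dimensional array of shape (n, d) is multiplied by a one-dimensional array of shape (m,), padding turns the one-dimensional array into shape (1, m). The ones are always added at the start of the shape.

Second, the size of the output array along each dimension (axis) is the maximum of the input sizes along that dimension. For example, when multiplying (2, 3) by (3,), the first step adjusts the shapes to (2, 3) and (1, 3). The first dimension is then the larger of 2 and 1, which is 2, and the second is the larger of 3 and 3, which is 3, so the output shape is (2, 3).

Third, check each input array against the output array dimension by dimension: every input size must either equal the output size or be 1. In the example from the second step, input (2, 3) matches output (2, 3) in every dimension, and (1, 3) differs from (2, 3) in the first dimension but is 1 there, so the computation is allowed. A counterexample is (2, 4) with (3,): padding gives (2, 4) and (1, 3), and the output shape would be (2, 4). Input (2, 4) matches the output, but the second dimension of input (1, 3) neither equals 4 nor is 1, so the computation fails.

After the third step, every dimension of each input is either equal to the output size or equal to 1. Finally, the dimensions of size 1 are expanded by repetition. For example, (1, 3) and (3, 1) give an output of (3, 3). The (1, 3) array has its row repeated to become (3, 3), so [[2,3,4]] becomes [[2,3,4],[2,3,4],[2,3,4]]. The (3, 1) array has its column repeated to become (3, 3), so [[2],[3],[4]] becomes [[2,2,2],[3,3,3],[4,4,4]]. In short, whichever dimension has size 1 is repeated until the input shape matches the output shape. Once this is done the two arrays have identical shapes and the ordinary elementwise operation can be applied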
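
The shapes used in the explanation above can be checked directly. This is a small sketch; np.broadcast_to is used here only to display the expanded inputs and is not something the original notes rely on:

```python
import numpy as np

a = np.ones((2, 3))
b = np.arange(3)                 # shape (3,), treated as (1, 3)
print((a * b).shape)             # (2, 3)

row = np.array([[2, 3, 4]])      # shape (1, 3)
col = np.array([[2], [3], [4]])  # shape (3, 1)
total = row + col                # both inputs are expanded to (3, 3)
print(total.shape)               # (3, 3)
print(np.broadcast_to(row, (3, 3)).tolist())  # [[2, 3, 4], [2, 3, 4], [2, 3, 4]]
print(np.broadcast_to(col, (3, 3)).tolist())  # [[2, 2, 2], [3, 3, 3], [4, 4, 4]]

# (2, 4) and (3,) are incompatible: 4 != 3 and neither of them is 1
failed = False
try:
    np.ones((2, 4)) * np.ones(3)
except ValueError:
    failed = True
print(failed)                    # True
```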

Fancy indexing

Indexing with arrays of indices

	>>> a = arange(12)**2        # the first 12 square numbers
	>>> i = array( [ 1,1,3,8,5 ] )  # an array of indices
	>>> a[i]                # the elements of a at the positions i
	array([ 1,  1,  9, 64, 25])
	>>>
	>>> j = array( [ [ 3, 4], [ 9, 7 ] ] ) # a bidimensional array of indices
	>>> a[j]                 # the same shape as j
	array([[ 9, 16],
	       [81, 49]])

Key point: when the indexed array a is multidimensional, a single array of indices refers to the first dimension of a

	>>> palette = array( [ [0,0,0],                # black
	...              [255,0,0],              # red
	...              [0,255,0],              # green
	...              [0,0,255],              # blue
	...              [255,255,255] ] )       # white
	>>> image = array( [ [ 0, 1, 2, 0 ],           # each value corresponds to a color in the palette
	...             [ 0, 3, 4, 0 ]  ] )
	>>> palette[image]                     # the (2,4,3) color image
	array([[[  0,   0,   0],
	      [255,   0,   0],
	      [  0, 255,   0],
	      [  0,   0,   0]],
	     [[  0,   0,   0],
	      [  0,   0, 255],
	      [255, 255, 255],
	      [  0,   0,   0]]])

Multiple dimensions

	>>> a = arange(12).reshape(3,4)
	>>> a
	array([[ 0,  1,  2,  3],
	       [ 4,  5,  6,  7],
	       [ 8,  9, 10, 11]])
	>>> i = array( [ [0,1],      # indices for the first dim of a
	...          [1,2] ] )
	>>> j = array( [ [2,1],      # indices for the second dim
	...          [3,3] ] )
	>>>
	>>> a[i,j]               # i and j must have equal shape
	array([[ 2,  5],
	       [ 7, 11]])
	>>>
	>>> a[i,2]
	array([[ 2,  6],
	       [ 6, 10]])
	>>>
	>>> a[:,j]               # i.e., a[ : , j]
	array([[[ 2,  1],
	        [ 3,  3]],
	       [[ 6,  5],
	        [ 7,  7]],
	       [[10,  9],
	        [11, 11]]])

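Naturally, i and j can be put in a sequence and that sequence used as the index. In current NumPy the sequence must be a tuple, because a list is interpreted as a single array of indices

	>>> l = (i,j)
	>>> a[l]              # equivalent to a[i,j]
	array([[ 2,  5],
	       [ 7, 11]])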

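However, i and j cannot be put into a single array, because that array would be interpreted as indexing the first dimension of a

Indexing with arrays can also be used as the target of an assignment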

	>>> a = arange(5)
	>>> a
	array([0, 1, 2, 3, 4])
	>>> a[[1,3,4]] = 0
	>>> a
	array([0, 0, 2, 0, 0])

However, when the list of indices contains repetitions, the assignment is done several times, leaving behind the last value

	>>> a = arange(5)
	>>> a[[0,0,2]]=[1,2,3]
	>>> a
	array([2, 1, 3, 3, 4])

Take care when using +=:

	>>> a = arange(5)
	>>> a[[0,0,2]]+=1
	>>> a
	array([1, 1, 3, 3, 4])

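Even though 0 appears twice in the list of indices, the element at index 0 is incremented only once. This is because Python requires a+=1 to be equivalent to a=a+1.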
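
When every repeated index should count, one option is the unbuffered ufunc method np.add.at. This is a sketch of an alternative that the original notes do not mention:

```python
import numpy as np

a = np.arange(5)
a[[0, 0, 2]] += 1           # buffered: index 0 is incremented only once
print(a.tolist())           # [1, 1, 3, 3, 4]

b = np.arange(5)
np.add.at(b, [0, 0, 2], 1)  # unbuffered: each occurrence of an index is applied
print(b.tolist())           # [2, 1, 3, 3, 4]
```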

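Indexing with boolean arrays

When we index an array with an array of integers, we provide a list of indices to pick. Boolean arrays work differently: we explicitly choose which elements of the array we want and which we do not.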

	>>> a = arange(12).reshape(3,4)
	>>> b = a > 4
	>>> b                                          # b is a boolean with a's shape
	array([[False, False, False, False],
	      [False, True, True, True],
	      [True, True, True, True]], dtype=bool)
	>>> a[b]                                       # 1d array with the selected elements
	array([ 5,  6,  7,  8,  9, 10, 11])

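The second way of indexing with booleans is closer to integer indexing: for each dimension of the array we give a one-dimensional boolean array selecting the slices we want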

	>>> a = arange(12).reshape(3,4)
	>>> b1 = array([False,True,True])             # first dim selection
	>>> b2 = array([True,False,True,False])       # second dim selection
	>>>
	>>> a[b1,:]                                   # selecting rows
	array([[ 4,  5,  6,  7],
	     [ 8,  9, 10, 11]])
	>>>
	>>> a[b1]                                     # same thing
	array([[ 4,  5,  6,  7],
	     [ 8,  9, 10, 11]])
	>>>
	>>> a[:,b2]                                   # selecting columns
	array([[ 0,  2],
	     [ 4,  6],
	     [ 8, 10]])
	>>>
	>>> a[b1,b2]                                  # a weird thing to do
	array([ 4, 10])

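The ix_ function

The ix_ function can be used to combine different vectors so as to obtain the result for each n-tuple. For example, to compute a+b*c for all the triplets formed by taking one element from each of the vectors a, b and c: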

	>>> a = array([2,3,4,5])
	>>> b = array([8,5,4])
	>>> c = array([5,4,6,8,3])
	>>> ax,bx,cx = ix_(a,b,c)
	>>> ax
	array([[[2]],
	     [[3]],
	     [[4]],
	     [[5]]])
	>>> bx
	array([[[8],
	      [5],
	      [4]]])
	>>> cx
	array([[[5, 4, 6, 8, 3]]])
	>>> ax.shape, bx.shape, cx.shape
	((4, 1, 1), (1, 3, 1), (1, 1, 5))
	>>> result = ax+bx*cx

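Difference between shuffle and permutation

shuffle only accepts an array_like argument, whereas permutation accepts either an array_like or an int; given an int n, it returns a random permutation of numpy.arange(n).
shuffle returns None. This is easy to overlook: it shuffles its argument in place and has no return value, while permutation returns a shuffled array and leaves its input unchanged.

	import numpy as np
	# x is assumed to be an existing array or sequence
	shuffle_index=np.random.permutation(len(x))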
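
The difference can be seen side by side in a small sketch. The array data is made up, and the shuffled orders are random, so only order-independent facts are printed:

```python
import numpy as np

data = np.arange(10)

shuffled = np.random.permutation(data)  # returns a shuffled copy
print(data.tolist())                    # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], input untouched

returned = np.random.shuffle(data)      # shuffles data in place
print(returned)                         # None

index = np.random.permutation(5)        # an int n means: permute np.arange(n)
print(sorted(index.tolist()))           # [0, 1, 2, 3, 4]
```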
