NumPy Library Study Notes

Array Creation

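NumPy's array object, ndarray (also available under the alias array), is not the same as the standard Python library class array.array. It has many attributes that make an array easy to inspect and manipulate: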

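ndarray.ndim - the number of axes (dimensions) of the array. The number of dimensions is sometimes called the rank.
ndarray.shape - the dimensions of the array. This is a tuple of integers giving the size of the array along each dimension. For a matrix with n rows and m columns, shape is (n,m). The length of the shape tuple is therefore the rank, i.e. the number of dimensions ndim.
ndarray.size - the total number of elements of the array. This equals the product of the elements of shape.
ndarray.dtype - an object describing the type of the elements in the array. A dtype can be created or specified using standard Python types. NumPy also provides types of its own, for example numpy.int32, numpy.int16 and numpy.float64.
ndarray.itemsize - the size in bytes of each element of the array. For example, an array of float64 elements has itemsize 8 (=64/8), while an array of int32 elements has itemsize 4 (=32/8). It is equal to ndarray.dtype.itemsize.
ndarray.data - the buffer containing the actual elements of the array. Normally we do not need this attribute, because we access the elements of an array through indexing.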
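
The attributes above can be checked on a small array. This is a minimal sketch; the array a and its int32 dtype are chosen here only for illustration.

```python
import numpy as np

# A 3x5 integer array built from a range, purely for illustration.
a = np.arange(15, dtype=np.int32).reshape(3, 5)

print(a.ndim)      # number of axes: 2
print(a.shape)     # size along each axis: (3, 5)
print(a.size)      # total element count: 15 == 3 * 5
print(a.dtype)     # int32
print(a.itemsize)  # bytes per element: 4 == 32 / 8

# float64 elements take 8 bytes each; complex64 (two float32 parts) also takes 8.
f = np.zeros(4, dtype=np.float64)
c = np.zeros(4, dtype=np.complex64)
print(f.itemsize, c.itemsize)  # 8 8
```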

// Common ways to create an array:
>>> import numpy as np
>>> a = np.array([2,3,4])
>>> a
array([2, 3, 4])
>>> a.dtype
dtype('int64')
>>> b = np.array([1.2, 3.5, 5.1])
>>> b.dtype
dtype('float64')

Note that np.array expects a single sequence such as a list, not multiple numeric arguments.
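
A short sketch of the difference. The exception type raised for the wrong call has varied between NumPy versions, so both TypeError and ValueError are caught here as an assumption.

```python
import numpy as np

ok = np.array([1, 2, 3, 4])                  # one list argument: works
nested = np.array([(1.5, 2, 3), (4, 5, 6)])  # a sequence of sequences gives a 2D array

try:
    np.array(1, 2, 3, 4)                     # several positional numbers: rejected
    raised = False
except (TypeError, ValueError):
    raised = True

print(ok.shape, nested.shape, raised)
```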

Arrays with special initial content can be created with functions that NumPy provides, such as zeros, ones and empty:

>>> np.zeros( (3,4) )
array([[ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.]])
>>> np.ones( (2,3,4), dtype=np.int16 )                # dtype can also be specified
array([[[ 1, 1, 1, 1],
        [ 1, 1, 1, 1],
        [ 1, 1, 1, 1]],
       [[ 1, 1, 1, 1],
        [ 1, 1, 1, 1],
        [ 1, 1, 1, 1]]], dtype=int16)
>>> np.empty( (2,3) )                                 # uninitialized, output may vary
array([[  3.73603959e-262,   6.02658058e-154,   6.55490914e-260],
       [  5.30498948e-313,   3.14673309e-307,   1.00000000e+000]])

Note that the result has to be assigned to a variable if you want to keep the array.

To create sequences of numbers, NumPy provides arange, a function analogous to range that returns an array instead of a list:

>>> np.arange( 10, 30, 5 )
array([10, 15, 20, 25])
>>> np.arange( 0, 2, 0.3 )                 # it accepts float arguments
array([ 0. ,  0.3,  0.6,  0.9,  1.2,  1.5,  1.8])
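// The first two arguments of arange define the half-open interval [start, stop) and the third is the step. The stop value is excluded even when it falls exactly on a step: np.arange(10, 30, 5) ends at 25, not 30.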

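When arange is used with floating-point arguments, it is generally not possible to predict the number of elements obtained, because of finite floating-point precision. For this reason it is usually better to use linspace, which takes the number of elements we want as an argument instead of the step: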

>>> from numpy import pi
>>> np.linspace( 0, 2, 9 )                 # 9 numbers from 0 to 2
array([ 0.  ,  0.25,  0.5 ,  0.75,  1.  ,  1.25,  1.5 ,  1.75,  2.  ])
>>> x = np.linspace( 0, 2*pi, 100 )        # useful to evaluate function at lots of points
>>> f = np.sin(x)
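
The contrast between the two functions can be sketched as follows. The interval and step are arbitrary illustration values, and the arange count is printed rather than stated, since it depends on rounding.

```python
import numpy as np

start, stop = 0.0, 1.8

# linspace: the element count is fixed up front and both endpoints are included.
by_count = np.linspace(start, stop, 7)

# arange: the count follows from the step and is subject to rounding in
# (stop - start) / step, so it is printed here rather than assumed.
by_step = np.arange(start, stop, 0.3)

print(by_count.size)  # 7 by construction
print(by_step.size)
```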

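Printing Arrays

When you print an array, NumPy displays it in a similar way to nested lists, but with the following layout:

the last axis is printed from left to right,
the second-to-last axis is printed from top to bottom,
the remaining axes are also printed from top to bottom, with each slice separated from the next by an empty line.

One-dimensional arrays are then printed as rows, two-dimensional arrays as matrices and three-dimensional arrays as lists of matrices.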

>>> a = np.arange(6)                         # 1d array
>>> print(a)
[0 1 2 3 4 5]
>>>
>>> b = np.arange(12).reshape(4,3)           # 2d array
>>> print(b)
[[ 0  1  2]
 [ 3  4  5]
 [ 6  7  8]
 [ 9 10 11]]
>>>
>>> c = np.arange(24).reshape(2,3,4)         # 3d array (2D arrays nested inside a 3D array)
>>> print(c)
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]

If an array is too large to be printed, NumPy automatically skips the central part of the array and prints only the corners.

To disable this behavior and force NumPy to print the entire array, change the printing options with set_printoptions:

>>> import sys
>>> np.set_printoptions(threshold=sys.maxsize)       # raise the summarization threshold
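
Both behaviors can be seen side by side. This sketch assumes the default print options are in effect and uses np.printoptions, a context manager available in NumPy 1.15 and later, so that the setting is restored afterwards.

```python
import sys
import numpy as np

big = np.arange(2000)

# With default print options, an array this long is summarized with "...".
summarized = str(big)

# np.printoptions applies the option only inside the with block
# (assumes NumPy 1.15 or later).
with np.printoptions(threshold=sys.maxsize):
    full = str(big)

print("..." in summarized)
print("..." in full)
```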

Indexing, Slicing and Iterating

One-dimensional arrays can be indexed, sliced and iterated over, much like lists and other Python sequences:

>>> a = np.arange(10)**3
>>> a
array([  0,   1,   8,  27,  64, 125, 216, 343, 512, 729])
>>> a[2]
8
>>> a[2:5]
array([ 8, 27, 64])
>>> a[:6:2] = -1000    # equivalent to a[0:6:2] = -1000; from start to position 6, exclusive, set every 2nd element to -1000
>>> a
array([-1000,     1, -1000,    27, -1000,   125,   216,   343,   512,   729])
>>> a[ : :-1]                                 # reversed a
array([  729,   512,   343,   216,   125, -1000,    27, -1000,     1, -1000])
>>> for i in a:
...     print(i**(1/3.))
...
nan
1.0
nan
3.0
nan
5.0
6.0
7.0
8.0
9.0

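Note the slice a[2:5]: array indices start at 0, the element at the first index is included in the slice, and the element at the second index is not.
Slicing multidimensional arrays: a multidimensional array can have one index per axis. These indices are given as a tuple separated by commas: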

>>> def f(x,y):
...     return 10*x+y
...
>>> b = np.fromfunction(f,(5,4),dtype=int)
>>> b
array([[ 0,  1,  2,  3],
       [10, 11, 12, 13],
       [20, 21, 22, 23],
       [30, 31, 32, 33],
       [40, 41, 42, 43]])
>>> b[2,3]
23
>>> b[0:5, 1]                       # each row in the second column of b
array([ 1, 11, 21, 31, 41])
>>> b[ : ,1]                        # equivalent to the previous example
array([ 1, 11, 21, 31, 41])
>>> b[1:3, : ]                      # each column in the second and third row of b
array([[10, 11, 12, 13],
       [20, 21, 22, 23]])


In other words, each index in a multidimensional index, from first to last, selects a position along a different axis.
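
A small sketch of this on a three-dimensional array. The array c and the variable names are illustrative only; when fewer indices than axes are given, the missing trailing ones are treated as complete slices.

```python
import numpy as np

c = np.arange(24).reshape(2, 3, 4)

elem = c[1, 2, 3]   # one index per axis selects a single element
row = c[1, 2]       # axes 0 and 1 fixed; the last axis is taken whole
block = c[1]        # only axis 0 fixed; equivalent to c[1, :, :]
col = c[:, :, 0]    # first element along the last axis, for every position on axes 0 and 1

print(elem)         # 23
print(row)          # [20 21 22 23]
print(block.shape)  # (3, 4)
print(col.shape)    # (2, 3)
```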


Shape Manipulation

The shape of an array is given by the number of elements along each axis:

>>> a = np.floor(10*np.random.random((3,4)))
>>> a
array([[ 2.,  8.,  0.,  6.],
       [ 4.,  5.,  1.,  1.],
       [ 8.,  9.,  3.,  6.]])
>>> a.shape
(3, 4)

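The shape of an array can be changed with various commands. Note that the following three commands all return a modified array and do not change the original array: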

>>> a.ravel()  # returns the array, flattened
array([ 2.,  8.,  0.,  6.,  4.,  5.,  1.,  1.,  8.,  9.,  3.,  6.])
>>> a.reshape(6,2)  # returns the array with a modified shape
array([[ 2.,  8.],
       [ 0.,  6.],
       [ 4.,  5.],
       [ 1.,  1.],
       [ 8.,  9.],
       [ 3.,  6.]])
>>> a.T  # returns the array, transposed
array([[ 2.,  4.,  8.],
       [ 8.,  5.,  9.],
       [ 0.,  1.,  3.],
       [ 6.,  1.,  6.]])
>>> a.T.shape
(4, 3)
>>> a.shape
(3, 4)
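
A deterministic version of the same idea, using arange instead of random data so the shapes can be checked. The use of -1 in reshape is an addition not shown in the notes above: it asks NumPy to infer that dimension from the array size.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

flat = a.ravel()         # 1D result, shape (12,)
tall = a.reshape(6, 2)   # shape (6, 2)
auto = a.reshape(2, -1)  # -1 lets NumPy infer the second dimension: (2, 6)
trans = a.T              # transposed, shape (4, 3)

print(a.shape)           # the original is still (3, 4)
```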

Stacking Different Arrays Together

Several arrays can be stacked together along different axes, for example:

>>> a = np.floor(10*np.random.random((2,2)))
>>> a
array([[ 8.,  8.],
       [ 0.,  0.]])
>>> b = np.floor(10*np.random.random((2,2)))
>>> b
array([[ 1.,  8.],
       [ 0.,  4.]])
>>> np.vstack((a,b))  # stack row-wise (vertically)
array([[ 8.,  8.],
       [ 0.,  0.],
       [ 1.,  8.],
       [ 0.,  4.]])
>>> np.hstack((a,b))  # stack column-wise (horizontally)
array([[ 8.,  8.,  1.,  8.],
       [ 0.,  0.,  0.,  4.]])

The function column_stack stacks 1D arrays as columns into a 2D array. It is equivalent to hstack only for 2D arrays:

>>> from numpy import newaxis
>>> np.column_stack((a,b))     # with 2D arrays
array([[ 8.,  8.,  1.,  8.],
       [ 0.,  0.,  0.,  4.]])
>>> a = np.array([4.,2.])
>>> b = np.array([3.,8.])
>>> np.column_stack((a,b))     # returns a 2D array
array([[ 4., 3.],
       [ 2., 8.]])
>>> np.hstack((a,b))           # the result is different
array([ 4., 2., 3., 8.])
>>> a[:,newaxis]               # this gives a 2D column vector
array([[ 4.],
       [ 2.]])
>>> np.column_stack((a[:,newaxis],b[:,newaxis]))
array([[ 4.,  3.],
       [ 2.,  8.]])
>>> np.hstack((a[:,newaxis],b[:,newaxis]))   # the result is the same
array([[ 4.,  3.],
       [ 2.,  8.]])
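
The stacking rules can be summarized with fixed inputs, so the resulting shapes are reproducible. The arrays below are illustrative values, not taken from the session above.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6], [7, 8]])

v = np.vstack((a, b))         # rows of b appended below a: shape (4, 2)
h = np.hstack((a, b))         # columns of b appended right of a: shape (2, 4)

x = np.array([1, 2])
y = np.array([3, 4])

cs = np.column_stack((x, y))  # 1D inputs become columns: [[1, 3], [2, 4]]
hs = np.hstack((x, y))        # 1D inputs are concatenated: [1, 2, 3, 4]

print(v.shape, h.shape, cs.shape, hs.shape)
```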

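Splitting One Array into Several Smaller Ones

Using hsplit, you can split an array along its horizontal axis, either by specifying the number of equally shaped arrays to return, or by specifying the columns after which the division should occur: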

>>> a = np.floor(10*np.random.random((2,12)))
>>> a
array([[ 9.,  5.,  6.,  3.,  6.,  8.,  0.,  7.,  9.,  7.,  2.,  7.],
       [ 1.,  4.,  9.,  2.,  2.,  1.,  0.,  6.,  2.,  2.,  4.,  0.]])
>>> np.hsplit(a,3)   # Split a into 3
[array([[ 9.,  5.,  6.,  3.],
       [ 1.,  4.,  9.,  2.]]), array([[ 6.,  8.,  0.,  7.],
       [ 2.,  1.,  0.,  6.]]), array([[ 9.,  7.,  2.,  7.],
       [ 2.,  2.,  4.,  0.]])]
>>> np.hsplit(a,(3,4))   # Split a after the third and the fourth column
[array([[ 9.,  5.,  6.],
       [ 1.,  4.,  9.]]), array([[ 3.],
       [ 2.]]), array([[ 6.,  8.,  0.,  7.,  9.,  7.,  2.,  7.],
       [ 2.,  1.,  0.,  6.,  2.,  2.,  4.,  0.]])]
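
The two calling styles of hsplit, sketched on a deterministic array so that the shapes of the pieces are easy to verify.

```python
import numpy as np

a = np.arange(24).reshape(2, 12)

# An integer asks for that many equally wide pieces.
thirds = np.hsplit(a, 3)

# A tuple gives the column positions at which to cut:
# columns [0:3], [3:4] and [4:12].
pieces = np.hsplit(a, (3, 4))

print([p.shape for p in thirds])  # [(2, 4), (2, 4), (2, 4)]
print([p.shape for p in pieces])  # [(2, 3), (2, 1), (2, 8)]
```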

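Copies and Views

When operating on and manipulating arrays, their data is sometimes copied into a new array and sometimes not. This is often a source of confusion for beginners. There are three cases:

#No copy at all

Simple assignment makes no copy of array objects or of their data.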

>>> a = np.arange(12)
>>> b = a            # no new object is created
>>> b is a           # a and b are two names for the same ndarray object
True
>>> b.shape = 3,4    # changes the shape of a
>>> a.shape
(3, 4)

Python passes mutable objects as references, so function calls make no copy.

>>> def f(x):
...     print(id(x))
...
>>> id(a)                           # id is a unique identifier of an object
148293216
>>> f(a)
148293216
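
One practical consequence is that a function can modify the caller's array in place. A minimal sketch, where zero_first is a hypothetical helper written only for this illustration:

```python
import numpy as np

def zero_first(x):
    """Hypothetical helper: overwrite the first element of x in place."""
    x[0] = 0

a = np.arange(1, 5)   # [1 2 3 4]
zero_first(a)         # no copy is made, so the caller's array changes
print(a)              # [0 2 3 4]
```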

#View or shallow copy

Different array objects can share the same data. The view method creates a new array object that looks at the same data.

>>> c = a.view()
>>> c is a
False
>>> c.base is a                        # c is a view of the data owned by a
True
>>> c.flags.owndata
False
>>>
>>> c.shape = 2,6                      # a's shape doesn't change
>>> a.shape
(3, 4)
>>> c[0,4] = 1234                      # a's data changes
>>> a
array([[   0,    1,    2,    3],
       [1234,    5,    6,    7],
       [   8,    9,   10,   11]])

Slicing an array returns a view of it:

>>> s = a[ : , 1:3]     # spaces added for clarity; could also be written "s = a[:,1:3]"
>>> s[:] = 10           # s[:] is a view of s. Note the difference between s=10 and s[:]=10
>>> a
array([[   0,   10,   10,    3],
       [1234,   10,   10,    7],
       [   8,   10,   10,   11]])

#Deep copy

The copy method makes a complete copy of the array and its data.

>>> d = a.copy()                          # a new array object with new data is created
>>> d is a
False
>>> d.base is a                           # d doesn't share anything with a
False
>>> d[0,0] = 9999
>>> a
array([[   0,   10,   10,    3],
       [1234,   10,   10,    7],
       [   8,   10,   10,   11]])
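
The difference between a view and a deep copy can also be checked programmatically. This sketch uses np.shares_memory, which is not mentioned in the notes above, to test whether two arrays overlap in memory.

```python
import numpy as np

a = np.arange(6)
v = a.view()   # new array object, same data
d = a.copy()   # new array object, new data

print(np.shares_memory(a, v))  # True
print(np.shares_memory(a, d))  # False

v[0] = 99      # writes through to a
d[1] = -1      # does not affect a
print(a)       # [99  1  2  3  4  5]
```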

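Sometimes copy should be called after slicing if the original array is no longer needed. For example, suppose a is a huge intermediate result and the final result b contains only a small fraction of a. A deep copy should then be made when constructing b by slicing: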

>>> a = np.arange(int(1e8))
>>> b = a[:100].copy()
>>> del a  # the memory of ``a`` can be released.

If b = a[:100] is used instead, a is referenced by b and will persist in memory even if del a is executed.
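
The reference that keeps the large array alive is visible through the base attribute. A small sketch, using a smaller array than the example above and illustrative variable names:

```python
import numpy as np

big = np.arange(1_000_000)

view = big[:100]          # a slice: still points into big's buffer
small = big[:100].copy()  # an independent 100-element array

print(view.base is big)     # True: the view holds a reference to big
print(small.base is None)   # True: the copy references nothing else
print(small.flags.owndata)  # True: the copy owns its own buffer
```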
