Andrew Ng Course Assignment (latest version; bugs caused by earlier version inconsistencies have been fixed)

Foreword:

This is my first article on deep learning. I moved into this field not long ago, and there is still a lot I need to learn and fill in. To me, deep learning is full of the charm of mathematics and the fun of programming. I am writing this article partly to leave myself a deeper memory trace, and partly in the hope that it makes things easier for others who enjoy deep learning. The aim is to help you understand Andrew Ng's videos as painlessly as possible; if anything here is off, corrections are welcome.

===============================================================
Hello everyone. Today we begin day one of Andrew Ng's course. The first programming assignment is logistic regression:

Logistic Regression with a Neural Network mindset

Welcome to your first (required) programming assignment! You will build a logistic regression classifier to recognize cats. This assignment will step you through how to do this with a Neural Network mindset, and so will also hone your intuitions about deep learning.

Welcome to your first programming exercise, in which you will build a logistic regression classifier to recognize cats.

Instructions:

  • Do not use loops (for/while) in your code, unless the instructions explicitly ask you to do so.

Note:
Do not use loops (for/while) in your code unless the instructions explicitly ask you to do so.

You will learn to:

  • Build the general architecture of a learning algorithm, including:
    • Initializing parameters
    • Calculating the cost function and its gradient
    • Using an optimization algorithm (gradient descent)
  • Gather all three functions above into a main model function, in the right order.

You will learn to:
1. Build the general architecture of a learning algorithm, including:
1.1 Initializing parameters
1.2 Calculating the cost function and its gradient
1.3 Using an optimization algorithm (gradient descent)
2. Gather the three functions above into a main model function, in the right order.

1 - Packages

First, let’s run the cell below to import all the packages that you will need during this assignment.

  • numpy is the fundamental package for scientific computing with Python.
  • h5py is a common package to interact with a dataset that is stored on an H5 file.
  • matplotlib is a famous library to plot graphs in Python.
  • PIL and scipy are used here to test your model with your own picture at the end.

1 - Packages:
First, let's run the cell below to import all the packages you will need during this assignment.
numpy: the fundamental package for scientific computing with Python.
h5py: a common package to interact with a dataset stored in an H5 file.
matplotlib: a well-known library for plotting graphs in Python.
PIL and scipy: used here to test your model with your own picture at the end.

import numpy as np
import matplotlib.pyplot as plt
import h5py
import scipy
from PIL import Image
from scipy import ndimage
from lr_utils import load_dataset

%matplotlib inline
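
The helper lr_utils.load_dataset is not shown in this notebook. A minimal sketch of what it typically does is below, assuming the standard course file names (datasets/train_catvnoncat.h5, datasets/test_catvnoncat.h5) and HDF5 key names; your local copy of lr_utils.py may differ slightly.

import numpy as np
import h5py

def load_dataset():
    # Assumed file locations and key names; adjust them to your own copy of the data.
    with h5py.File("datasets/train_catvnoncat.h5", "r") as train_dataset:
        train_set_x_orig = np.array(train_dataset["train_set_x"][:])   # training images
        train_set_y_orig = np.array(train_dataset["train_set_y"][:])   # training labels
    with h5py.File("datasets/test_catvnoncat.h5", "r") as test_dataset:
        test_set_x_orig = np.array(test_dataset["test_set_x"][:])      # test images
        test_set_y_orig = np.array(test_dataset["test_set_y"][:])      # test labels
        classes = np.array(test_dataset["list_classes"][:])            # class names as bytes

    # Reshape the labels into row vectors of shape (1, m).
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))

    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes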

2 - Overview of the Problem set

Problem Statement: You are given a dataset (“data.h5”) containing:

  • a training set of m_train images labeled as cat (y=1) or non-cat (y=0)
  • a test set of m_test images labeled as cat or non-cat
  • each image is of shape (num_px, num_px, 3) where 3 is for the 3 channels (RGB). Thus, each image is square (height = num_px) and (width = num_px).

You will build a simple image-recognition algorithm that can correctly classify pictures as cat or non-cat.

Let’s get more familiar with the dataset. Load the data by running the following code.

2 - Overview of the problem set:
Problem statement: you are given a dataset ("data.h5") containing:
a training set of m_train images labeled as cat (y=1) or non-cat (y=0),
a test set of m_test images labeled as cat or non-cat,
each image of shape (num_px, num_px, 3), where 3 stands for the three RGB channels; the first num_px is the height and the second is the width.

You will build a simple image-recognition algorithm that correctly classifies a picture as cat or non-cat.
Let's get more familiar with the dataset by running the following code:

# Loading the data (cat/non-cat)
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
# Print basic information such as the shapes of these arrays for reference
print("train_set_x_orig.shape =",train_set_x_orig.shape)
print("train_set_y.shape =",train_set_y.shape)
print("test_set_x_orig.shape =",test_set_x_orig.shape)
print("test_set_y.shape =",test_set_y.shape)
print("classes =",classes)

Output:
train_set_x_orig.shape = (209, 64, 64, 3)
train_set_y.shape = (1, 209)
test_set_x_orig.shape = (50, 64, 64, 3)
test_set_y.shape = (1, 50)
classes = [b'non-cat' b'cat']

We added “_orig” at the end of image datasets (train and test) because we are going to preprocess them. After preprocessing, we will end up with train_set_x and test_set_x (the labels train_set_y and test_set_y don’t need any preprocessing).

Each line of your train_set_x_orig and test_set_x_orig is an array representing an image. You can visualize an example by running the following code. Feel free also to change the index value and re-run to see other images.

We added "_orig" at the end of the image datasets (train and test) because we are going to preprocess them. After preprocessing, we will end up with train_set_x and test_set_x (the labels train_set_y and test_set_y do not need any preprocessing).

Each row of train_set_x_orig and test_set_x_orig is an array representing an image. You can visualize an example by running the following code. Feel free to change the index value and re-run to see other images.

# Example of a picture
index = 25
# print(train_set_x_orig[index])  # extra line added by the author: prints the matrix corresponding to this image
plt.imshow(train_set_x_orig[index])
print ("y = " + str(train_set_y[:, index]) + ", it's a '" + classes[np.squeeze(train_set_y[:, index])].decode("utf-8") +  "' picture.")

Output:
y = [1], it's a 'cat' picture.
[Figure: the training-set image at index 25]

Many software bugs in deep learning come from having matrix/vector dimensions that don’t fit. If you can keep your matrix/vector dimensions straight you will go a long way toward eliminating many bugs.

Exercise: Find the values for:
- m_train (number of training examples)
- m_test (number of test examples)
- num_px (= height = width of a training image)
Remember that train_set_x_orig is a numpy-array of shape (m_train, num_px, num_px, 3). For instance, you can access m_train by writing train_set_x_orig.shape[0].

Many software bugs in deep learning come from matrix/vector dimensions that do not fit. If you keep your matrix/vector dimensions straight, you will go a long way toward eliminating many bugs.
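
As an illustration of that habit (not part of the graded code), you can assert the shapes you expect before using an array, so a mismatch fails loudly instead of silently producing wrong results:

# Illustration only: check shapes before a matrix product so mismatches fail loudly.
w = np.zeros((12288, 1))            # (num_px * num_px * 3, 1)
X = np.zeros((12288, 209))          # (num_px * num_px * 3, m_train)
assert w.shape[0] == X.shape[0], "w and X must agree on the feature dimension"
Z = np.dot(w.T, X)                  # shape (1, 209)
assert Z.shape == (1, X.shape[1])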

Exercise: find the values of:

  • m_train: the number of training examples
  • m_test: the number of test examples
  • num_px: the height (= width) of a training image

Remember that train_set_x_orig is a numpy array of shape (m_train, num_px, num_px, 3). For instance, you can get m_train by writing train_set_x_orig.shape[0].

### START CODE HERE ### (≈ 3 lines of code)
m_train = train_set_y.shape[1]  # number of training examples
m_test = test_set_y.shape[1]  # number of test examples
num_px = train_set_x_orig.shape[1]
### END CODE HERE ###

print ("Number of training examples: m_train = " + str(m_train))
print ("Number of testing examples: m_test = " + str(m_test))
print ("Height/Width of each image: num_px = " + str(num_px))
print ("Each image is of size: (" + str(num_px) + ", " + str(num_px) + ", 3)")
print ("train_set_x shape: " + str(train_set_x_orig.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x shape: " + str(test_set_x_orig.shape))
print ("test_set_y shape: " + str(test_set_y.shape))

Output:

Number of training examples: m_train = 209
Number of testing examples: m_test = 50
Height/Width of each image: num_px = 64
Each image is of size: (64, 64, 3)
train_set_x shape: (209, 64, 64, 3)
train_set_y shape: (1, 209)
test_set_x shape: (50, 64, 64, 3)
test_set_y shape: (1, 50)

For convenience, you should now reshape images of shape (num_px, num_px, 3) in a numpy-array of shape (num_px * num_px * 3, 1). After this, our training (and test) dataset is a numpy-array where each column represents a flattened image. There should be m_train (respectively m_test) columns.

Exercise: Reshape the training and test data sets so that images of size (num_px, num_px, 3) are flattened into single vectors of shape (num_px * num_px * 3, 1).

A trick when you want to flatten a matrix X of shape (a,b,c,d) to a matrix X_flatten of shape (b * c * d, a) is to use:

X_flatten = X.reshape(X.shape[0], -1).T      # X.T is the transpose of X

For convenience, you should now reshape images of shape (num_px, num_px, 3) into a numpy array of shape (num_px * num_px * 3, 1). After this, our training (and test) dataset is a numpy array
in which each column represents a flattened image. There should be m_train (respectively m_test) columns.

Exercise: reshape the training and test datasets so that images of shape (num_px, num_px, 3) are flattened into single vectors of shape (num_px * num_px * 3, 1).
A trick: when you want to flatten a matrix X of shape (a, b, c, d) into a matrix X_flatten of shape (b * c * d, a),
you can use:
X_flatten = X.reshape(X.shape[0], -1).T # X.T is the transpose of X

# Reshape the training and test examples

### START CODE HERE ### (≈ 2 lines of code)
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T  # flatten the training set from (209, 64, 64, 3) into (64*64*3, 209)
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T  # flatten the test set from (50, 64, 64, 3) into (64*64*3, 50)
### END CODE HERE ###

print ("train_set_x_flatten shape: " + str(train_set_x_flatten.shape))
print ("train_set_y shape: " + str(train_set_y.shape))
print ("test_set_x_flatten shape: " + str(test_set_x_flatten.shape))
print ("test_set_y shape: " + str(test_set_y.shape))
print ("sanity check after reshaping: " + str(train_set_x_flatten[0:5,0]))

Output:
train_set_x_flatten shape: (12288, 209)
train_set_y shape: (1, 209)
test_set_x_flatten shape: (12288, 50)
test_set_y shape: (1, 50)
sanity check after reshaping: [17 31 56 22 33]

To represent color images, the red, green and blue channels (RGB) must be specified for each pixel, and so the pixel value is actually a vector of three numbers ranging from 0 to 255.

One common preprocessing step in machine learning is to center and standardize your dataset, meaning that you subtract the mean of the whole numpy array from each example, and then divide each example by the standard deviation of the whole numpy array. But for picture datasets, it is simpler and more convenient and works almost as well to just divide every row of the dataset by 255 (the maximum value of a pixel channel).

Let’s standardize our dataset.

To represent a color image, the red, green, and blue channels (RGB) must be specified for each pixel, so a pixel value is actually a vector of three numbers ranging from 0 to 255.
One common preprocessing step in machine learning is to center and standardize your dataset, meaning that you subtract the mean of the whole numpy array from each example and then divide each example by the standard deviation of the whole numpy array. For picture datasets, however, it is simpler, more convenient, and works almost as well to just divide every row of the dataset by 255 (the maximum value of a pixel channel).

Let's standardize our dataset.

train_set_x = train_set_x_flatten/255.
test_set_x = test_set_x_flatten/255.
# print(train_set_x_flatten)
# print(test_set_x_flatten)
# print(train_set_x)
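
For reference, the general center-and-standardize preprocessing described above would look roughly like the sketch below; it is not used in this assignment, where dividing by 255 is sufficient, and the variable names here are just for illustration.

# Alternative preprocessing (not used here): subtract the mean and divide by the standard deviation.
mean = np.mean(train_set_x_flatten)
std = np.std(train_set_x_flatten)
train_set_x_centered = (train_set_x_flatten - mean) / std
test_set_x_centered = (test_set_x_flatten - mean) / std   # reuse the training statistics on the test set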

What you need to remember:

Common steps for pre-processing a new dataset are:

  • Figure out the dimensions and shapes of the problem (m_train, m_test, num_px, …)
  • Reshape the datasets such that each example is now a vector of size (num_px * num_px * 3, 1)
  • “Standardize” the data
What you need to remember:

Common steps for preprocessing a new dataset are:

  • Figure out the dimensions and shapes of the problem (m_train, m_test, num_px, ...)
  • Reshape the datasets so that each example is now a vector of shape (num_px * num_px * 3, 1)
  • "Standardize" the data

3 - General Architecture of the learning algorithm

It’s time to design a simple algorithm to distinguish cat images from non-cat images.

You will build a Logistic Regression, using a Neural Network mindset. The following Figure explains why Logistic Regression is actually a very simple Neural Network!

[Figure: logistic regression viewed as a very simple neural network]

Mathematical expression of the algorithm:

For one example $x^{(i)}$:
$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
$$\hat{y}^{(i)} = a^{(i)} = sigmoid(z^{(i)}) \tag{2}$$
$$\mathcal{L}(a^{(i)}, y^{(i)}) = - y^{(i)} \log(a^{(i)}) - (1-y^{(i)}) \log(1-a^{(i)}) \tag{3}$$

The cost is then computed by summing over all training examples:
$$J = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(a^{(i)}, y^{(i)}) \tag{6}$$

Key steps:
In this exercise, you will carry out the following steps:
- Initialize the parameters of the model
- Learn the parameters for the model by minimizing the cost
- Use the learned parameters to make predictions (on the test set)
- Analyse the results and conclude

3 - General architecture of the learning algorithm

With all of that said, it is time to design a simple algorithm to distinguish cat images from non-cat images.

You will build a logistic regression model with a neural-network mindset. The figure below explains why logistic regression is actually a very simple neural network!
[Figure: logistic regression viewed as a very simple neural network]
Mathematical expression of the algorithm:
For one example $x^{(i)}$:
$$z^{(i)} = w^T x^{(i)} + b \tag{1}$$
$$\hat{y}^{(i)} = a^{(i)} = sigmoid(z^{(i)}) \tag{2}$$
$$\mathcal{L}(a^{(i)}, y^{(i)}) = - y^{(i)} \log(a^{(i)}) - (1-y^{(i)}) \log(1-a^{(i)}) \tag{3}$$

The cost is then computed by summing over all training examples:
$$J = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(a^{(i)}, y^{(i)}) \tag{6}$$

Key steps: in this exercise, you will carry out the following steps:

  • Initialize the parameters of the model
  • Learn the parameters for the model by minimizing the cost
  • Use the learned parameters to make predictions (on the test set)
  • Analyze the results and conclude
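
To make equations (1)–(3) and (6) concrete, here is a tiny worked numpy example (illustration only; the numbers are made up and are not part of the assignment):

# Toy problem: 2 features, m = 3 examples.
w = np.array([[0.1], [0.2]])                  # (2, 1)
b = 0.0
X = np.array([[1.0, 2.0, -1.0],
              [0.5, -1.0, 2.0]])              # (2, 3)
Y = np.array([[1, 0, 1]])                     # (1, 3) true labels

Z = np.dot(w.T, X) + b                        # equation (1), shape (1, 3)
A = 1 / (1 + np.exp(-Z))                      # equation (2), sigmoid activations
losses = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))   # equation (3), per-example loss
J = np.mean(losses)                           # equation (6), cost = average loss over the m examples
print(J)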

4 - Building the parts of our algorithm

The main steps for building a Neural Network are:

  1. Define the model structure (such as number of input features)
  2. Initialize the model’s parameters
  3. Loop:
    • Calculate current loss (forward propagation)
    • Calculate current gradient (backward propagation)
    • Update parameters (gradient descent)

You often build 1-3 separately and integrate them into one function we call model().

4.1 - Helper functions

Exercise: Using your code from "Python Basics", implement sigmoid(). As you've seen in the figure above, you need to compute $sigmoid(w^T x + b) = \frac{1}{1 + e^{-(w^T x + b)}}$ to make predictions. Use np.exp().

4 - Building the parts of our algorithm

The main steps for building a neural network are:

  1. Define the model structure (such as the number of input features)
  2. Initialize the model's parameters
  3. Loop:
    • Calculate the current loss (forward propagation)
    • Calculate the current gradient (backward propagation)
    • Update the parameters (gradient descent)

You often build steps 1–3 separately and then integrate them into one function called model().

4.1 - Helper functions

Exercise: using your code from "Python Basics", implement sigmoid(). As you saw in the figure above, you need to compute $sigmoid(w^T x + b) = \frac{1}{1 + e^{-(w^T x + b)}}$ to make predictions. Use np.exp().

# GRADED FUNCTION: sigmoid

def sigmoid(z):
    """
    Compute the sigmoid of z

    Arguments:
    z -- A scalar or numpy array of any size.

    Return:
    s -- sigmoid(z)
    """

    ### START CODE HERE ### (≈ 1 line of code)
    s = 1 / (1 + np.exp(-z))  # implementation of the sigmoid function
    ### END CODE HERE ###
    
    return s
print ("sigmoid([0, 2]) = " + str(sigmoid(np.array([0,2]))))

Output:
sigmoid([0, 2]) = [0.5 0.88079708]

4.2 - Initializing parameters

Exercise: Implement parameter initialization in the cell below. You have to initialize w as a vector of zeros. If you don’t know what numpy function to use, look up np.zeros() in the Numpy library’s documentation.

4.2 - Initializing parameters

Exercise: implement parameter initialization in the cell below. You have to initialize w as a vector of zeros. If you do not know which numpy function to use, look up np.zeros() in the numpy library's documentation.

# GRADED FUNCTION: initialize_with_zeros

def initialize_with_zeros(dim):
    """
    This function creates a vector of zeros of shape (dim, 1) for w and initializes b to 0.
    
    Argument:
    dim -- size of the w vector we want (or number of parameters in this case)
    
    Returns:
    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
    """
    
    ### START CODE HERE ### (≈ 1 line of code)
    w = np.zeros(shape = (dim,1))  # initialize w as a (dim, 1) vector of zeros
    b = 0  # initialize b to 0
    ### END CODE HERE ###

    assert(w.shape == (dim, 1))  # check that w has shape (dim, 1); stop if not
    assert(isinstance(b, float) or isinstance(b, int))
    
    return w, b
dim = 2
w, b = initialize_with_zeros(dim)
print ("w = " + str(w))
print ("b = " + str(b))

Output:
w = [[0.]
[0.]]
b = 0

For image inputs, w will be of shape (num_px × num_px × 3, 1).
For image inputs, w will have shape (num_px × num_px × 3, 1).
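
As a quick usage illustration for this dataset (where num_px is 64):

w, b = initialize_with_zeros(64 * 64 * 3)
print(w.shape)  # (12288, 1)
print(b)        # 0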

4.3 - Forward and Backward propagation

Now that your parameters are initialized, you can do the “forward” and “backward” propagation steps for learning the parameters.

Exercise: Implement a function propagate() that computes the cost function and its gradient.

Hints:

Forward Propagation:

  • You get X
  • You compute $A = \sigma(w^T X + b) = (a^{(0)}, a^{(1)}, ..., a^{(m-1)}, a^{(m)})$
  • You calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m} y^{(i)}\log(a^{(i)}) + (1-y^{(i)})\log(1-a^{(i)})$

Here are the two formulas you will be using:

$$\frac{\partial J}{\partial w} = \frac{1}{m} X (A-Y)^T \tag{7}$$
$$\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (a^{(i)} - y^{(i)}) \tag{8}$$

4.3 - Forward and backward propagation

Now that your parameters are initialized, you can do the "forward" and "backward" propagation steps to learn the parameters.
Exercise: implement a function propagate() that computes the cost function and its gradient.
Hints:
Forward propagation:

  • You get X
  • You compute $A = \sigma(w^T X + b) = (a^{(0)}, a^{(1)}, ..., a^{(m-1)}, a^{(m)})$
  • You calculate the cost function: $J = -\frac{1}{m}\sum_{i=1}^{m} y^{(i)}\log(a^{(i)}) + (1-y^{(i)})\log(1-a^{(i)})$

Here are the two formulas for the gradients you will use:
$$\frac{\partial J}{\partial w} = \frac{1}{m} X (A-Y)^T \tag{7}$$
$$\frac{\partial J}{\partial b} = \frac{1}{m} \sum_{i=1}^{m} (a^{(i)} - y^{(i)}) \tag{8}$$
# GRADED FUNCTION: propagate

def propagate(w, b, X, Y):
    """
    Implement the cost function and its gradient for the propagation explained above

    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    Return:
    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b
    
    Tips:
    - Write your code step by step for the propagation. np.log(), np.dot()
    """
    
    m = X.shape[1]
    
    # FORWARD PROPAGATION (FROM X TO COST)
    ### START CODE HERE ### (≈ 2 lines of code)
    A = sigmoid(np.dot(w.T, X) + b)                                     # compute activation
#     cost = (-1 / m) * (np.dot(Y, np.log(A).T) + np.dot((1 - Y), np.log(1 - A).T))                                  # compute cost
    cost = (- 1 / m) * np.sum(Y * np.log(A) + (1 - Y) * (np.log(1 - A)))  # modified: compute the cost with np.sum
    ### END CODE HERE ###
    
    # BACKWARD PROPAGATION (TO FIND GRAD)
    ### START CODE HERE ### (≈ 2 lines of code)
    dw = (1 / m) * np.dot(X, (A - Y).T) 
    
    db = (1 / m) * np.sum(A - Y, keepdims=True)
    ### END CODE HERE ###

    assert(dw.shape == w.shape)
    assert(db.dtype == float)
    cost = np.squeeze(cost)
    assert(cost.shape == ())
    
    grads = {"dw": dw,
             "db": db}
    
    return grads, cost
w, b, X, Y = np.array([[1],[2]]), 2, np.array([[1,2],[3,4]]), np.array([[1,0]])
grads, cost = propagate(w, b, X, Y)
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))
print ("cost = " + str(cost))

dw = [[0.99993216]
[1.99980262]]
db = [[0.49993523]]
cost = 6.000064773192205

d) Optimization

  • You have initialized your parameters.
  • You are also able to compute a cost function and its gradient.
  • Now, you want to update the parameters using gradient descent.

Exercise: Write down the optimization function. The goal is to learn $w$ and $b$ by minimizing the cost function $J$. For a parameter $\theta$, the update rule is $\theta = \theta - \alpha \, d\theta$, where $\alpha$ is the learning rate.

d) Optimization

  • You have initialized your parameters.
  • You are also able to compute a cost function and its gradient.
  • Now, you want to update the parameters using gradient descent.

Exercise: write down the optimization function. The goal is to learn $w$ and $b$ by minimizing the cost function $J$. For a parameter $\theta$, the update rule is $\theta = \theta - \alpha \, d\theta$, where $\alpha$ is the learning rate.

# GRADED FUNCTION: optimize

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost = False):
    """
    This function optimizes w and b by running a gradient descent algorithm
    
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps
    
    Returns:
    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization, this will be used to plot the learning curve.
    
    Tips:
    You basically need to write down two steps and iterate through them:
        1) Calculate the cost and the gradient for the current parameters. Use propagate().
        2) Update the parameters using gradient descent rule for w and b.
    """
    
    costs = []
    
    for i in range(num_iterations):
        
        
        # Cost and gradient calculation (≈ 1-4 lines of code)
        ### START CODE HERE ### 
        grads, cost = propagate(w, b, X, Y)  # call propagate() to get the cost and the gradients
        ### END CODE HERE ###
        
        # Retrieve derivatives from grads
        dw = grads["dw"]
        db = grads["db"]
        
        # update rule (≈ 2 lines of code)
        ### START CODE HERE ###
        w = w - learning_rate * dw
        b = b - learning_rate * db
        ### END CODE HERE ###
        
        # Record the costs
        if i % 100 == 0:  # record the cost every 100 iterations
            costs.append(cost)
        
        # Print the cost every 100 training examples
        if print_cost and i % 100 == 0:
            print ("Cost after iteration %i: %f" %(i, cost))
    
    params = {"w": w,
              "b": b}
    
    grads = {"dw": dw,
             "db": db}
    
    return params, grads, costs
params, grads, costs = optimize(w, b, X, Y, num_iterations= 100, learning_rate = 0.009, print_cost = False)

print ("w = " + str(params["w"]))
print ("b = " + str(params["b"]))
print ("dw = " + str(grads["dw"]))
print ("db = " + str(grads["db"]))

Output:
w = [[0.1124579 ]
[0.23106775]]
b = [[1.55930492]]
dw = [[0.90158428]
[1.76250842]]
db = [[0.43046207]]

Exercise: The previous function will output the learned w and b. We are able to use w and b to predict the labels for a dataset X. Implement the predict() function. There are two steps to computing predictions:

  1. Calculate $\hat{Y} = A = \sigma(w^T X + b)$

  2. Convert the entries of A into 0 (if activation <= 0.5) or 1 (if activation > 0.5), and store the predictions in a vector Y_prediction. If you wish, you can use an if/else statement in a for loop (though there is also a way to vectorize this; see the sketch after the code below).

Exercise: the previous function outputs the learned w and b. We can use w and b to predict the labels for a dataset X. Implement the predict() function. There are two steps to computing predictions:

  1. Calculate $\hat{Y} = A = \sigma(w^T X + b)$
  2. Convert the entries of A into 0 (if the activation is <= 0.5) or 1 (if the activation is > 0.5), and store the predictions in the vector Y_prediction. If you wish, you can use an if/else statement inside a for loop (though there is also a way to vectorize this).
# GRADED FUNCTION: predict

def predict(w, b, X):
    '''
    Predict whether the label is 0 or 1 using learned logistic regression parameters (w, b)
    
    Arguments:
    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    
    Returns:
    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
    '''
    
    m = X.shape[1]  # number of examples (the number of columns of X)
    Y_prediction = np.zeros((1,m))
    w = w.reshape(X.shape[0], 1)  # make sure w has shape (num_px * num_px * 3, 1)
    
    # Compute vector "A" predicting the probabilities of a cat being present in the picture
    ### START CODE HERE ### (≈ 1 line of code)
    A = sigmoid(np.dot(w.T, X)+b)                                     # compute activation
    ### END CODE HERE ###
    
    for i in range(A.shape[1]):
        
        # Convert probabilities A[0,i] to actual predictions p[0,i]
        ### START CODE HERE ### (≈ 4 lines of code)
        if A[0,i] > 0.5:
            Y_prediction[0,i] = 1
        elif A[0,i] <= 0.5:
            Y_prediction[0,i] = 0
        ### END CODE HERE ###
    
    assert(Y_prediction.shape == (1, m))
    
    return Y_prediction
print ("predictions = " + str(predict(w, b, X)))

Output:
predictions = [[1. 1.]]
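
As mentioned in the exercise, the for loop over A can also be vectorized. One possible way (a sketch, equivalent to the loop inside predict() above) is:

# Vectorized thresholding: 1.0 where the activation is > 0.5, 0.0 otherwise.
Y_prediction = (A > 0.5).astype(float)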

What to remember:
You’ve implemented several functions that:

  • Initialize (w,b)
  • Optimize the loss iteratively to learn parameters (w,b)
  • computing the cost and its gradient - updating the parameters using gradient descent
  • Use the learned (w,b) to predict the labels for a given set of examples

What to remember:
You have implemented several functions that:

  • Initialize (w, b)
  • Optimize the loss iteratively to learn the parameters (w, b)
  • Compute the cost and its gradient, and update the parameters using gradient descent
  • Use the learned (w, b) to predict the labels for a given set of examples

5 - Merge all functions into a model

You will now see how the overall model is structured by putting together all the building blocks (functions implemented in the previous parts) together, in the right order.

Exercise: Implement the model function. Use the following notation:

  • Y_prediction for your predictions on the test set
  • Y_prediction_train for your predictions on the train set
  • w, costs, grads for the outputs of optimize()

5 - Merge all functions into a model

You will now see how the overall model is structured by putting all the building blocks (the functions implemented in the previous parts) together, in the right order.
Exercise: implement the model function. Use the following notation:

  • Y_prediction for your predictions on the test set
  • Y_prediction_train for your predictions on the train set
  • w, costs, grads for the outputs of optimize()
# GRADED FUNCTION: model

def model(X_train, Y_train, X_test, Y_test, num_iterations = 2000, learning_rate = 0.5, print_cost = False):
    """
    Builds the logistic regression model by calling the function you've implemented previously
    
    Arguments:
    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- Set to true to print the cost every 100 iterations
    
    Returns:
    d -- dictionary containing information about the model.
    """
    
    ### START CODE HERE ###
    
    # initialize parameters with zeros (≈ 1 line of code)
    w, b = initialize_with_zeros(X_train.shape[0])  # dim equals num_px * num_px * 3, the number of input features

    # Gradient descent (≈ 1 line of code)
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)
    
    # Retrieve parameters w and b from dictionary "parameters"
    w = parameters["w"]
    b = parameters["b"]
    
    # Predict test/train set examples (≈ 2 lines of code)
    Y_prediction_test = predict(w, b, X_test)
    Y_prediction_train = predict(w, b, X_train)

    ### END CODE HERE ###

    # Print train/test Errors
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}
    
    return d

Run the following cell to train your model.

d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 2000, learning_rate = 0.005, print_cost = True)

Output:
train accuracy: 99.04306220095694 %
test accuracy: 70.0 %

Comment: Training accuracy is close to 100%. This is a good sanity check: your model is working and has high enough capacity to fit the training data. Test accuracy is 70%. It is actually not bad for this simple model, given the small dataset we used and that logistic regression is a linear classifier. But no worries, you'll build an even better classifier next week!

Also, you see that the model is clearly overfitting the training data. Later in this specialization you will learn how to reduce overfitting, for example by using regularization. Using the code below (and changing the index variable) you can look at predictions on pictures of the test set.

Comment: training accuracy is close to 100%.
This is a good sanity check: your model is working and has high enough capacity to fit the training data. Test accuracy is 70%. Given the small dataset we used and the fact that logistic regression is a linear classifier, that is actually not bad for such a simple model. But no worries, you will build an even better classifier next week!
Also, you can see that the model is clearly overfitting the training data. Later in this specialization you will learn how to reduce overfitting, for example by using regularization. Using the code below (and changing the index variable), you can look at predictions on pictures of the test set.

# Example of a picture that was wrongly classified.
index = 1
plt.imshow(test_set_x[:,index].reshape((num_px, num_px, 3)))
print ("y = " + str(test_set_y[0,index]) + ", you predicted that it is a \"" + classes[d["Y_prediction_test"][0,index]].decode("utf-8") +  "\" picture.")

Let’s also plot the cost function and the gradients.

# Plot learning curve (with costs)
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()

Interpretation: You can see the cost decreasing. It shows that the parameters are being learned. However, you see that you could train the model even more on the training set. Try to increase the number of iterations in the cell above and rerun the cells. You might see that the training set accuracy goes up, but the test set accuracy goes down. This is called overfitting.
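
For example, you could re-run training with more iterations and compare the accuracies (a sketch; the exact numbers will vary from the run shown above):

# More iterations usually raise training accuracy but may lower test accuracy (overfitting).
d_more = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 3000, learning_rate = 0.005, print_cost = False)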

6 - Further analysis (optional/ungraded exercise)

Congratulations on building your first image classification model. Let's analyze it further, and examine possible choices for the learning rate $\alpha$.

Choice of learning rate

Reminder:
In order for Gradient Descent to work you must choose the learning rate wisely. The learning rate $\alpha$ determines how rapidly we update the parameters. If the learning rate is too large we may "overshoot" the optimal value. Similarly, if it is too small we will need too many iterations to converge to the best values. That's why it is crucial to use a well-tuned learning rate.

Let's compare the learning curve of our model with several choices of learning rates. Run the cell below. This should take about 1 minute. Feel free also to try different values than the three we have initialized the learning_rates variable to contain, and see what happens.


learning_rates = [0.01, 0.001, 0.0001]
models = {}
for i in learning_rates:
    print ("learning rate is: " + str(i))
    models[str(i)] = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations = 1500, learning_rate = i, print_cost = False)
    print ('\n' + "-------------------------------------------------------" + '\n')

for i in learning_rates:
    plt.plot(np.squeeze(models[str(i)]["costs"]), label= str(models[str(i)]["learning_rate"]))

plt.ylabel('cost')
plt.xlabel('iterations')

legend = plt.legend(loc='upper center', shadow=True)
frame = legend.get_frame()
frame.set_facecolor('0.90')
plt.show()

Interpretation:

  • Different learning rates give different costs and thus different predictions results.
  • If the learning rate is too large (0.01), the cost may oscillate up and down. It may even diverge (though in this example, using 0.01 still eventually ends up at a good value for the cost).
  • A lower cost doesn’t mean a better model. You have to check if there is possibly overfitting. It happens when the training accuracy is a lot higher than the test accuracy.
  • In deep learning, we usually recommend that you:
    • Choose the learning rate that better minimizes the cost function.
    • If your model overfits, use other techniques to reduce overfitting. (We’ll talk about this in later videos.)

7 - Test with your own image (optional/ungraded exercise)

Congratulations on finishing this assignment. You can use your own image and see the output of your model. To do that:
1. Click on “File” in the upper bar of this notebook, then click “Open” to go on your Coursera Hub.
2. Add your image to this Jupyter Notebook’s directory, in the “images” folder
3. Change your image’s name in the following code
4. Run the code and check if the algorithm is right (1 = cat, 0 = non-cat)!

## START CODE HERE ## (PUT YOUR IMAGE NAME) 
my_image = "my_image.jpg"   # change this to the name of your image file 
## END CODE HERE ##

# We preprocess the image to fit your algorithm.
fname = "images/" + my_image
# Note: scipy.ndimage.imread and scipy.misc.imresize were removed in recent SciPy versions,
# so the image is loaded and resized with PIL instead.
img = Image.open(fname).convert("RGB")
image = np.array(img)
my_image = np.array(img.resize((num_px, num_px))).reshape((1, num_px*num_px*3)).T
my_predicted_image = predict(d["w"], d["b"], my_image)

plt.imshow(image)
print("y = " + str(np.squeeze(my_predicted_image)) + ", your algorithm predicts a \"" + classes[int(np.squeeze(my_predicted_image)),].decode("utf-8") +  "\" picture.")

What to remember from this assignment:

  1. Preprocessing the dataset is important.
  2. You implemented each function separately: initialize(), propagate(), optimize(). Then you built a model().
  3. Tuning the learning rate (which is an example of a “hyperparameter”) can make a big difference to the algorithm. You will see more examples of this later in this course!
    Finally, if you’d like, we invite you to try different things on this Notebook. Make sure you submit before trying anything. Once you submit, things you can play with include:
    • Play with the learning rate and the number of iterations
    • Try different initialization methods and compare the results
    • Test other preprocessings (center the data, or divide each row by its standard deviation)
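
For instance, one alternative initialization you could compare against zeros is small random weights; the sketch below is only an illustration and is not part of the assignment (the function name is made up):

# Hypothetical alternative to initialize_with_zeros: small random initialization.
def initialize_with_small_random(dim, seed=1):
    rng = np.random.RandomState(seed)
    w = rng.randn(dim, 1) * 0.01   # small random weights
    b = 0
    return w, b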

Bibliography:

  • http://www.wildml.com/2015/09/implementing-a-neural-network-from-scratch/
  • https://stats.stackexchange.com/questions/211436/why-do-we-normalize-images-by-subtracting-the-datasets-image-mean-and-not-the-c

Reposted from blog.csdn.net/SmileBL/article/details/107738357