Week 2 Assignment: Logistic Regression with a Neural Network mindset (notes on the approach)




1. Loading the data

train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()
Number of training examples: m_train = 209
Number of testing examples: m_test = 50
Height/Width of each image: num_px = 64
Each image is of size: (64, 64, 3)
train_set_x shape: (209, 64, 64, 3)
train_set_y shape: (1, 209)
test_set_x shape: (50, 64, 64, 3)
test_set_y shape: (1, 50)


train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
train_set_x_flatten shape: (12288, 209)
train_set_y shape: (1, 209)
test_set_x_flatten shape: (12288, 50)
test_set_y shape: (1, 50)
sanity check after reshaping: [17 31 56 22 33]


train_set_x = train_set_x_flatten/255.
test_set_x = test_set_x_flatten/255.
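
The flatten-and-scale step above can be tried without the course dataset. The following is a minimal sketch: the random array is only a hypothetical stand-in for what load_dataset() returns, with the training-set shape reported above.

```python
import numpy as np

# Hypothetical stand-in for the array returned by load_dataset():
# 209 RGB images of 64x64 pixels with integer values in 0..255.
rng = np.random.default_rng(0)
train_set_x_orig = rng.integers(0, 256, size=(209, 64, 64, 3), dtype=np.uint8)

# Flatten each image into one column: (m, 64, 64, 3) -> (64*64*3, m).
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T

# Scale pixel values into the range [0, 1].
train_set_x = train_set_x_flatten / 255.0

print(train_set_x.shape)
```

After the transpose, each column holds one image, which is the layout the later steps expect for X.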


For one example $x^{(i)}$:

$$z^{(i)} = w^T x^{(i)} + b$$

$$\hat{y}^{(i)} = a^{(i)} = \mathrm{sigmoid}(z^{(i)})$$

$$\mathcal{L}(a^{(i)}, y^{(i)}) = -y^{(i)} \log(a^{(i)}) - (1 - y^{(i)}) \log(1 - a^{(i)})$$

The cost is then computed by summing over all training examples:

$$J = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(a^{(i)}, y^{(i)})$$

Key steps
1. Initialize the parameters of the model
2. Learn the parameters for the model by minimizing the cost
3. Use the learned parameters to make predictions (on the test set)
4. Analyse the results and conclude


Main steps for building the algorithm:
  1. Define the model structure (such as number of input features)
  2. Initialize the model’s parameters
  3. Loop:
    • Calculate current loss (forward propagation)
    • Calculate current gradient (backward propagation)
    • Update parameters (gradient descent)


z -- a scalar or numpy array of any size

s -- sigmoid(z)
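
A minimal sketch of a helper matching the z / s description above. The function name sigmoid is an assumption; the notes only describe the argument and the return value.

```python
import numpy as np

def sigmoid(z):
    """Return s = 1 / (1 + exp(-z)) for a scalar or a numpy array of any size."""
    z = np.asarray(z, dtype=float)  # accept plain scalars and lists as well
    return 1.0 / (1.0 + np.exp(-z))
```

Because np.exp works element-wise, the same function handles a single z and a whole row of z values.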

2. initializing parameters

    dim -- size of the w vector we want (or number of parameters in this case)

    w -- initialized vector of shape (dim, 1)
    b -- initialized scalar (corresponds to the bias)
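
One possible initializer for the w and b described above. Starting from zeros and the name initialize_with_zeros are assumptions here; the notes only fix the shapes.

```python
import numpy as np

def initialize_with_zeros(dim):
    """Return w of shape (dim, 1) filled with zeros and a scalar bias b = 0."""
    w = np.zeros((dim, 1))  # assumption: zero initialization
    b = 0.0
    return w, b
```

For the images in these notes, dim would be num_px * num_px * 3 = 12288.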

3. Forward and Backward propagation

Forward Propagation:
1. get X
2. compute $A = \sigma(w^T X + b) = (a^{(1)}, a^{(2)}, \ldots, a^{(m-1)}, a^{(m)})$
3. calculate the cost function: $J = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(a^{(i)}) + (1 - y^{(i)}) \log(1 - a^{(i)}) \right]$

    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat) of size (1, number of examples)

    cost -- negative log-likelihood cost for logistic regression
    dw -- gradient of the loss with respect to w, thus same shape as w
    db -- gradient of the loss with respect to b, thus same shape as b
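
The forward and backward pass can be sketched as one function. The notes do not spell out the gradient formulas, so the sketch uses the standard ones for this cost: dw = (1/m) X (A - Y)^T and db = (1/m) * sum(a(i) - y(i)). The name propagate and the layout of the grads dictionary are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    """Compute the cost and its gradients for logistic regression."""
    m = X.shape[1]  # number of examples

    # Forward propagation: activations, then the negative log-likelihood cost.
    A = sigmoid(np.dot(w.T, X) + b)  # shape (1, m)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m

    # Backward propagation: standard gradients of the cost above.
    dw = np.dot(X, (A - Y).T) / m  # same shape as w
    db = np.sum(A - Y) / m         # scalar, like b

    grads = {"dw": dw, "db": db}
    return grads, float(cost)
```

Everything is vectorized over the m columns of X, so no explicit loop over examples is needed.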

4. optimization

update the parameters using gradient descent

    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of shape (num_px * num_px * 3, number of examples)
    Y -- true "label" vector (containing 0 if non-cat, 1 if cat), of shape (1, number of examples)
    num_iterations -- number of iterations of the optimization loop
    learning_rate -- learning rate of the gradient descent update rule
    print_cost -- True to print the loss every 100 steps

    params -- dictionary containing the weights w and bias b
    grads -- dictionary containing the gradients of the weights and bias with respect to the cost function
    costs -- list of all the costs computed during the optimization; used to plot the learning curve
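
A sketch of the optimization loop described by the parameters above, using the plain gradient descent update w := w - learning_rate * dw and b := b - learning_rate * db. The function names are assumptions, and recording the cost only every 100 iterations is a choice made here to match print_cost.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    grads = {"dw": np.dot(X, (A - Y).T) / m, "db": np.sum(A - Y) / m}
    return grads, float(cost)

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
    """Run gradient descent and return the learned parameters."""
    costs = []
    grads = {}
    for i in range(num_iterations):
        # Cost and gradients at the current parameters.
        grads, cost = propagate(w, b, X, Y)

        # Gradient descent update (creates new arrays, inputs are not mutated).
        w = w - learning_rate * grads["dw"]
        b = b - learning_rate * grads["db"]

        # Record (and optionally print) the cost every 100 iterations.
        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                print(f"Cost after iteration {i}: {cost:f}")

    params = {"w": w, "b": b}
    return params, grads, costs
```

The returned grads are the ones computed in the last iteration, before the final update was applied.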


5. prediction

use the learned w, b to predict the labels for a dataset X
1. calculate $\hat{Y} = A = \sigma(w^T X + b)$
2. convert the entries of A into 0 (if activation <= 0.5) or 1 (if activation > 0.5), and store the predictions in a vector Y_prediction

    w -- weights, a numpy array of size (num_px * num_px * 3, 1)
    b -- bias, a scalar
    X -- data of size (num_px * num_px * 3, number of examples)

    Y_prediction -- a numpy array (vector) containing all predictions (0/1) for the examples in X
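
The two prediction steps can be sketched as follows. The function name predict is an assumption, and the thresholding is done with a vectorized comparison rather than a loop over examples.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(w, b, X):
    """Return a (1, m) array of 0/1 predictions for the columns of X."""
    # Step 1: activations, i.e. the estimated probability of label 1.
    A = sigmoid(np.dot(w.T, X) + b)

    # Step 2: 1 where the activation is > 0.5, otherwise 0.
    Y_prediction = (A > 0.5).astype(float)
    return Y_prediction
```

An activation of exactly 0.5 maps to 0, matching the "<= 0.5" rule above.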

6. merge all functions into a model

    X_train -- training set represented by a numpy array of shape (num_px * num_px * 3, m_train)
    Y_train -- training labels represented by a numpy array (vector) of shape (1, m_train)
    X_test -- test set represented by a numpy array of shape (num_px * num_px * 3, m_test)
    Y_test -- test labels represented by a numpy array (vector) of shape (1, m_test)
    num_iterations -- hyperparameter representing the number of iterations to optimize the parameters
    learning_rate -- hyperparameter representing the learning rate used in the update rule of optimize()
    print_cost -- set to True to print the cost every 100 iterations

    d -- dictionary containing information about the model.
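
Putting the pieces together, a compact end-to-end sketch could look like this. The helper names, the default hyperparameter values, and the keys stored in d are all assumptions; the notes only say that d holds information about the model.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    grads = {"dw": np.dot(X, (A - Y).T) / m, "db": np.sum(A - Y) / m}
    return grads, float(cost)

def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
    costs, grads = [], {}
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        w = w - learning_rate * grads["dw"]
        b = b - learning_rate * grads["db"]
        if i % 100 == 0:
            costs.append(cost)
            if print_cost:
                print(f"Cost after iteration {i}: {cost:f}")
    return {"w": w, "b": b}, grads, costs

def predict(w, b, X):
    return (sigmoid(np.dot(w.T, X) + b) > 0.5).astype(float)

def model(X_train, Y_train, X_test, Y_test,
          num_iterations=2000, learning_rate=0.5, print_cost=False):
    """Train logistic regression and collect the results in a dictionary."""
    # Assumption: parameters start at zero.
    w, b = np.zeros((X_train.shape[0], 1)), 0.0

    params, grads, costs = optimize(w, b, X_train, Y_train,
                                    num_iterations, learning_rate, print_cost)
    w, b = params["w"], params["b"]

    Y_prediction_train = predict(w, b, X_train)
    Y_prediction_test = predict(w, b, X_test)

    # Accuracy in percent: share of predictions that equal the labels.
    train_acc = 100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100
    test_acc = 100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100
    if print_cost:
        print(f"train accuracy: {train_acc} %")
        print(f"test accuracy: {test_acc} %")

    # Hypothetical set of keys for the result dictionary d.
    d = {"costs": costs, "w": w, "b": b,
         "Y_prediction_train": Y_prediction_train,
         "Y_prediction_test": Y_prediction_test,
         "train_accuracy": train_acc, "test_accuracy": test_acc,
         "learning_rate": learning_rate, "num_iterations": num_iterations}
    return d
```

With the arrays from step 1, a call would look like d = model(train_set_x, train_set_y, test_set_x, test_set_y, print_cost=True), and d["costs"] can then be plotted as the learning curve.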

