Triplet Loss: An Introduction

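Face verification vs. face recognition

Verification:
  • Input: an image and a name/ID.
  • Output: whether the input image is that of the claimed person.

Recognition:
  • Has a database of K persons.
  • Output: the ID if the input image is any of the K persons (or "not recognized").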


We can use a face verification system to build a face recognition system. The verification system has to be very accurate (around 99.9% or more) to be usable inside a recognition system. With K persons in the database, the recognition system effectively runs K verifications, so its accuracy will be lower than that of the verification system.
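A rough back-of-envelope model of why the verification accuracy must be so high. This assumes (a simplification, not something the course states) that each of the K comparisons errs independently with the same probability. Under that assumption, the chance of a pass with no errors over the whole database is the per-comparison accuracy raised to the power K:

```python
def recognition_accuracy(verification_accuracy: float, k: int) -> float:
    """Probability that all K verifications are correct.

    Simplifying assumption: errors are independent and equally likely
    for every person in the database.
    """
    return verification_accuracy ** k


for acc in (0.99, 0.999):
    print(f"verification {acc}: error-free over 100 persons ~ {recognition_accuracy(acc, 100):.3f}")
```

Under this model, 99% verification accuracy leaves only about a 37% chance of an error-free pass over 100 people. At 99.9% that chance rises to about 90%.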

One Shot Learning

  • One of the challenges of face recognition is solving the one-shot learning problem.
  • One-shot learning: a recognition system is able to recognize a person after learning from a single image.
  • Historically, deep learning doesn't work well with very little data.
    Instead, to make this work, we learn a similarity function:
    $d(\text{img1}, \text{img2})$ = degree of difference between the images.
    We want d to be small when the two images show the same face.
    We use $\tau$ as a threshold for d:
    If $d(\text{img1}, \text{img2}) \le \tau$, then the faces are the same.
  • The similarity function solves the one-shot learning problem. It is also robust to new inputs: a new person can be added to the database without retraining the network.
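The following minimal sketch shows how the similarity function and the threshold τ turn into one-shot recognition. The database stores a single encoding per person. A query is matched to the closest entry, but only if that distance is at most τ. The squared-L2 choice of `d`, the 2-D encodings, the names and the value of τ are all illustrative assumptions:

```python
from typing import Dict, List


def d(enc1: List[float], enc2: List[float]) -> float:
    """Degree of difference between two encodings (squared euclidean distance, assumed here)."""
    return sum((a - b) ** 2 for a, b in zip(enc1, enc2))


def recognize(query: List[float], database: Dict[str, List[float]], tau: float) -> str:
    """Return the closest person's name if d <= tau, otherwise 'not recognized'.

    `database` holds ONE encoding per person (one-shot): adding a person
    means adding one entry, not retraining.
    """
    best_name, best_dist = None, float("inf")
    for name, enc in database.items():
        dist = d(query, enc)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= tau else "not recognized"


# Hypothetical 2-D encodings, one per person
db = {"alice": [0.0, 1.0], "bob": [1.0, 0.0]}
print(recognize([0.1, 0.9], db, tau=0.1))  # alice (d is about 0.02)
print(recognize([5.0, 5.0], db, tau=0.1))  # not recognized
```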

Siamese Network

  • We implement the similarity function with a type of NN called a Siamese network: two (or more) inputs are passed through networks with the same architecture and the same parameters, and each input x is mapped to an encoding f(x).
  • The distance between two inputs is then $d(x^{(1)}, x^{(2)}) = \lVert f(x^{(1)}) - f(x^{(2)}) \rVert^2$.
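The key property of a Siamese network is that both inputs go through the same function f with the same parameters. In this minimal sketch, a fixed linear map `W` stands in for the trained CNN encoder. `W`, `make_encoder` and `siamese_distance` are hypothetical names chosen only for illustration:

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def make_encoder(weights: Matrix):
    """Return a (hypothetical) encoder f(x) = W x. Both branches of the Siamese
    network call this same f, i.e. same architecture AND same parameters."""
    def f(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return f


def siamese_distance(f, x1: Vector, x2: Vector) -> float:
    """d(x1, x2) = ||f(x1) - f(x2)||^2, computed with one shared encoder f."""
    e1, e2 = f(x1), f(x2)
    return sum((a - b) ** 2 for a, b in zip(e1, e2))


# Hypothetical parameters: a 2x3 projection from a 3-D "image" to a 2-D encoding
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 1.0]]
f = make_encoder(W)
print(siamese_distance(f, [1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0 for identical inputs
print(siamese_distance(f, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))  # 26.0, since f([1, 2, 3]) = [1, 5]
```

Because the parameters are shared, this d is symmetric and d(x, x) = 0 by construction.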

Triplet Loss


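  • Triplet loss is one of the loss functions we can use to learn the similarity distance in a Siamese network. Each training example is a triplet of images: an anchor A, a positive P (same person as A) and a negative N (a different person).
  • We want $\lVert f(A) - f(P) \rVert^2 \le \lVert f(A) - f(N) \rVert^2$,
  • i.e. $\lVert f(A) - f(P) \rVert^2 - \lVert f(A) - f(N) \rVert^2 \le 0$.
  • We tighten this to $\lVert f(A) - f(P) \rVert^2 - \lVert f(A) - f(N) \rVert^2 \le -\alpha$ with a margin $\alpha > 0$, so the network cannot satisfy the constraint trivially by outputting zeros (e.g. $f(x) = 0$ for every image makes both terms 0).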


  • Given 3 images (A, P, N):
  • $L(A, P, N) = \max\left(\lVert f(A) - f(P) \rVert^2 - \lVert f(A) - f(N) \rVert^2 + \alpha,\ 0\right)$
  • $J = \sum_{i=1}^{m} L(A^{(i)}, P^{(i)}, N^{(i)})$ over all m training triplets.
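The two formulas translate almost directly into code. This sketch works on precomputed encodings f(A), f(P), f(N). The value `alpha = 0.2` is an arbitrary example, and the first call shows why the margin matters: the trivial all-zero encoding satisfies the "≤ 0" constraint but still gets a positive loss.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def sq_dist(u: Vector, v: Vector) -> float:
    """||u - v||^2"""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(fa: Vector, fp: Vector, fn: Vector, alpha: float) -> float:
    """L(A, P, N) = max(||f(A) - f(P)||^2 - ||f(A) - f(N)||^2 + alpha, 0),
    where fa, fp, fn are the encodings f(A), f(P), f(N)."""
    return max(sq_dist(fa, fp) - sq_dist(fa, fn) + alpha, 0.0)


def cost(triplets: List[Tuple[Vector, Vector, Vector]], alpha: float) -> float:
    """J = sum of L over all training triplets."""
    return sum(triplet_loss(fa, fp, fn, alpha) for fa, fp, fn in triplets)


alpha = 0.2  # example margin value
zero = [0.0, 0.0]
# Trivial encoder f(x) = 0: both distances are 0, but the margin still penalizes it
print(triplet_loss(zero, zero, zero, alpha))                    # 0.2
# Well-separated triplet: d(A,P) = 0.01, d(A,N) = 1.0, so the loss is 0
print(triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0], alpha))  # 0.0
```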


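  • During training, if A, P, N are chosen randomly (subject to A and P being the same person and A and N being different people), the constraint $d(A, P) + \alpha \le d(A, N)$ is easily satisfied, so most triplets have zero loss and the network learns little from them.
  • Instead, we want to choose triplets that are hard to train on, i.e. where $d(A, P)$ is close to $d(A, N)$.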
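This sketch is one simple way to see which triplets are worth training on. It is not the course's prescribed selection procedure. The code computes each candidate's loss, keeps only the triplets with a positive loss, and orders them from hardest to easiest. The 1-D encodings are made-up examples:

```python
from typing import List, Sequence, Tuple

Triplet = Tuple[Sequence[float], Sequence[float], Sequence[float]]  # (f(A), f(P), f(N))


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def select_hard_triplets(triplets: List[Triplet], alpha: float) -> List[Triplet]:
    """Keep triplets with d(A,P) - d(A,N) + alpha > 0 (i.e. non-zero loss),
    sorted from hardest (largest loss) to easiest."""
    scored = []
    for fa, fp, fn in triplets:
        loss = sq_dist(fa, fp) - sq_dist(fa, fn) + alpha
        if loss > 0:
            scored.append((loss, (fa, fp, fn)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [t for _, t in scored]


candidates = [
    ([0.0], [0.1], [3.0]),  # easy: N is far away, loss is 0
    ([0.0], [0.5], [0.6]),  # hard: d(A,P) = 0.25 is close to d(A,N) = 0.36
    ([0.0], [1.0], [0.5]),  # violates the constraint: N is closer than P
]
hard = select_hard_triplets(candidates, alpha=0.2)
print(len(hard))  # 2
```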

Offline triplet mining


Online triplet mining

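1. For a batch of B samples, we can form at most $B^3$ triplets. Many of them are invalid (they do not consist of two images of the same class plus one image of a different class), but we can still obtain many more triplets from a single batch.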
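A brute-force count makes the $B^3$ bound concrete. This sketch enumerates every index triplet (a, p, n) of a toy batch and keeps the valid ones: a ≠ p, label(a) = label(p), label(n) ≠ label(a). The labels are a made-up example:

```python
from itertools import product
from typing import Sequence


def count_valid_triplets(labels: Sequence[int]) -> int:
    """Count (a, p, n) with a != p, label[a] == label[p] and label[n] != label[a]."""
    count = 0
    for a, p, n in product(range(len(labels)), repeat=3):
        if a != p and labels[a] == labels[p] and labels[n] != labels[a]:
            count += 1
    return count


labels = [0, 0, 1, 1]  # toy batch: B = 4, two samples from each of two classes
B = len(labels)
print(B ** 3, count_valid_triplets(labels))  # 64 8
```

Only 8 of the 64 index triplets are valid here. Every anchor has exactly one positive and two negatives.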
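2. Batch hard strategy: for each anchor, select the hardest positive (largest d(a, p)) and the hardest negative (smallest d(a, n)) within the batch.

  • Compute the 2D distance matrix, set the invalid entries to 0 and keep the valid pairs ($a \neq p$, and a and p have the same label), then take the maximum of each row of the modified matrix. This gives the hardest positive for each anchor.
  • When computing the minimum (the hardest negative), we cannot set the invalid entries to 0 (invalid means a and n have the same label), because 0 would always be picked as the minimum. Instead, we add the maximum value of each row to the invalid entries before taking the row minimum.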
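import tensorflow as tf


# Minimal implementations of the three helpers used below (assumed; the original
# post defines its own versions, which may differ in detail).
def _pairwise_distances(embeddings, squared=False):
    """Return the (batch_size, batch_size) matrix of euclidean distances."""
    # ||a - b||^2 = ||a||^2 - 2 <a, b> + ||b||^2
    dot_product = tf.matmul(embeddings, embeddings, transpose_b=True)
    square_norm = tf.linalg.diag_part(dot_product)
    distances = tf.expand_dims(square_norm, 0) - 2.0 * dot_product + tf.expand_dims(square_norm, 1)
    distances = tf.maximum(distances, 0.0)  # clip tiny negatives caused by rounding
    if not squared:
        # sqrt has an infinite gradient at 0, so offset exact zeros before taking it
        zero_mask = tf.cast(tf.equal(distances, 0.0), distances.dtype)
        distances = tf.sqrt(distances + zero_mask * 1e-16) * (1.0 - zero_mask)
    return distances


def _get_anchor_positive_triplet_mask(labels):
    """mask[a, p] is True iff a != p and label(a) == label(p)."""
    indices_not_equal = tf.logical_not(tf.cast(tf.eye(tf.shape(labels)[0]), tf.bool))
    labels_equal = tf.equal(tf.expand_dims(labels, 0), tf.expand_dims(labels, 1))
    return tf.logical_and(indices_not_equal, labels_equal)


def _get_anchor_negative_triplet_mask(labels):
    """mask[a, n] is True iff label(a) != label(n)."""
    return tf.not_equal(tf.expand_dims(labels, 0), tf.expand_dims(labels, 1))


def batch_hard_triplet_loss(labels, embeddings, margin, squared=False):
    """Build the triplet loss over a batch of embeddings.

    For each anchor, we get the hardest positive and hardest negative to form a triplet.

    Args:
        labels: labels of the batch, of size (batch_size,)
        embeddings: float tensor of shape (batch_size, embed_dim)
        margin: margin for triplet loss
        squared: Boolean. If true, output is the pairwise squared euclidean distance matrix.
                 If false, output is the pairwise euclidean distance matrix.

    Returns:
        triplet_loss: scalar tensor containing the triplet loss
    """
    # Get the pairwise distance matrix
    pairwise_dist = _pairwise_distances(embeddings, squared=squared)

    # For each anchor, get the hardest positive
    # First, we need to get a mask for every valid positive (they should have same label)
    # (tf.cast replaces tf.to_float, which no longer exists in TF2)
    mask_anchor_positive = tf.cast(_get_anchor_positive_triplet_mask(labels), tf.float32)

    # We put to 0 any element where (a, p) is not valid (valid if a != p and label(a) == label(p))
    anchor_positive_dist = tf.multiply(mask_anchor_positive, pairwise_dist)

    # shape (batch_size, 1)
    hardest_positive_dist = tf.reduce_max(anchor_positive_dist, axis=1, keepdims=True)

    # For each anchor, get the hardest negative
    # First, we need to get a mask for every valid negative (they should have different labels)
    mask_anchor_negative = tf.cast(_get_anchor_negative_triplet_mask(labels), tf.float32)

    # Invalid negatives (label(a) == label(n)) cannot be set to 0, since 0 would win the min;
    # instead, we add the maximum value of each row to them
    max_anchor_negative_dist = tf.reduce_max(pairwise_dist, axis=1, keepdims=True)
    anchor_negative_dist = pairwise_dist + max_anchor_negative_dist * (1.0 - mask_anchor_negative)

    # shape (batch_size, 1)
    hardest_negative_dist = tf.reduce_min(anchor_negative_dist, axis=1, keepdims=True)

    # Combine biggest d(a, p) and smallest d(a, n) into final triplet loss
    triplet_loss = tf.maximum(hardest_positive_dist - hardest_negative_dist + margin, 0.0)

    # Get final mean triplet loss
    return tf.reduce_mean(triplet_loss)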
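The masking tricks in the TensorFlow code can be checked against a plain-Python reference for small batches. This sketch applies the same batch-hard rule with squared distances, but it loops over the batch for each anchor instead of masking a matrix. It assumes the batch contains at least two classes, so every anchor has a negative. Its result should agree with `batch_hard_triplet_loss(..., squared=True)` up to floating-point error. The labels and 1-D embeddings are toy values:

```python
from typing import List, Sequence


def sq_dist(u: Sequence[float], v: Sequence[float]) -> float:
    """Squared euclidean distance ||u - v||^2."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def batch_hard_reference(labels: Sequence[int], embeddings: List[Sequence[float]], margin: float) -> float:
    """Batch-hard triplet loss with squared distances, computed by explicit loops.

    Assumes the batch contains at least two classes, so every anchor has a negative.
    """
    losses = []
    for a in range(len(labels)):
        pos = [sq_dist(embeddings[a], embeddings[p])
               for p in range(len(labels)) if p != a and labels[p] == labels[a]]
        neg = [sq_dist(embeddings[a], embeddings[n])
               for n in range(len(labels)) if labels[n] != labels[a]]
        # No valid positive gives 0, like the masked reduce_max in the TensorFlow version
        hardest_pos = max(pos, default=0.0)
        hardest_neg = min(neg)  # raises ValueError if the anchor has no negative
        losses.append(max(hardest_pos - hardest_neg + margin, 0.0))
    return sum(losses) / len(losses)


labels = [0, 0, 1, 1]
embeddings = [[0.0], [2.0], [1.0], [3.0]]  # toy 1-D embeddings
print(batch_hard_reference(labels, embeddings, margin=0.5))  # 3.5
```

For each anchor in this toy batch, the hardest positive is at squared distance 4 and the hardest negative is at 1. Every anchor therefore contributes 4 - 1 + 0.5 = 3.5.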

