Tensorflow2.0学习（16）：kaggle数据集——10_monkeys_model

10_monkeys数据集

10_monkeys是kaggle平台上的图片数据集，共有十种猴子的各种照片。根据此数据集应用卷积神经网络完成分类。

在Kaggle平台上的实战

创建储存图片的文件夹路径

# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

导包

import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
import numpy as np
import sklearn
import pandas as pd
import os
import sys
import time
import tensorflow as tf
from tensorflow import keras
print(tf.__version__)
print(sys.version_info)
for module in mpl, np ,pd, sklearn, tf, keras:
    print(module.__name__, module.__version__)

2.1.0
sys.version_info(major=3, minor=6, micro=6, releaselevel='final', serial=0)
matplotlib 3.0.3
numpy 1.18.1
pandas 0.25.3
sklearn 0.22.1
tensorflow 2.1.0
tensorflow_core.keras 2.2.4-tf

设置图片的路径

train_dir = "/kaggle/input/10-monkey-species/training/training"
valid_dir = "/kaggle/input/10-monkey-species/validation/validation"
label_file = "/kaggle/input/10-monkey-species/monkey_labels.txt"
print(os.path.exists(train_dir))
print(os.path.exists(valid_dir))
print(os.path.exists(label_file))

True
True
True

读取、观察标签

labels = pd.read_csv(label_file, header=0)
print(labels)

  Label     Latin Name              Common Name                     \
0  n0         alouatta_palliata\t    mantled_howler                   
1  n1        erythrocebus_patas\t    patas_monkey                     
2  n2        cacajao_calvus\t        bald_uakari                      
3  n3        macaca_fuscata\t        japanese_macaque                 
4  n4       cebuella_pygmea\t        pygmy_marmoset                   
5  n5       cebus_capucinus\t        white_headed_capuchin            
6  n6       mico_argentatus\t        silvery_marmoset                 
7  n7      saimiri_sciureus\t        common_squirrel_monkey           
8  n8       aotus_nigriceps\t        black_headed_night_monkey        
9  n9       trachypithecus_johnii    nilgiri_langur                   

    Train Images    Validation Images  
0             131                  26  
1             139                  28  
2             137                  27  
3             152                  30  
4             131                  26  
5             141                  28  
6             132                  26  
7             142                  28  
8             133                  27  
9             132                  26

读取图片和其标签，保存成模型读取的格式

# 做卷积的时候所有图片的尺寸应该是一样的
height = 128
width = 128
channels = 3 
batch_size = 64
num_classes = 10

# 读取训练数据并作数据增强
# 确定一些读取格式要求
train_datagen = keras.preprocessing.image.ImageDataGenerator(
    rescale = 1./255,
    # 图片旋转的角度范围，用来数据增强
    rotation_range = 40,
    # 水平平移
    width_shift_range = 0.2,
    # 高度平移
    height_shift_range = 0.2,
    # 剪切强度
    shear_range = 0.2,
    # 缩放强度
    zoom_range = 0.2,
    # 水平翻转
    horizontal_flip = True,
    # 对图片做处理时需要填充图片，用最近的像素点填充
    fill_mode = "nearest"
)
# 读取训练数据
train_generator = train_datagen.flow_from_directory(
    train_dir,
    # 读取后将图片存什么大小 
    target_size = (height, width),
    batch_size = batch_size,
    seed = 7,
    shuffle = True,
    # label的编码格式：这里为one-hot编码
    class_mode = 'categorical')

# 读取验证数据
valid_datagen = keras.preprocessing.image.ImageDataGenerator(rescale = 1./255)
valid_generator = valid_datagen.flow_from_directory(
    valid_dir,
    # 读取后将图片存什么大小 
    target_size = (height, width),
    batch_size = batch_size,
    seed = 7,
    shuffle = False,
    # label的编码格式：这里为one-hot编码
    class_mode = 'categorical')

train_num = train_generator.samples
valid_num = valid_generator.samples
print(train_num, valid_num)

Found 1098 images belonging to 10 classes.
Found 272 images belonging to 10 classes.
1098 272

观测输入数据的格式

for i in range(2):
    x, y = train_generator.next()
    print(x.shape, y.shape)
    print(y)

(64, 128, 128, 3) (64, 10)
[[0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]]
(64, 128, 128, 3) (64, 10)
[[0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
 [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 0. 0. 0. 0. 1.]
 [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
 [0. 0. 0. 0. 0. 0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1. 0. 0. 0. 0. 0.]]

构建模型
- 在后期运行程序的时候，出现了bug，发现是loss应用错了函数：
  如果targets函数是one-hot编码，用 categorical_crossentropy，one-hot 编码：[0, 0, 1], [1, 0, 0], [0, 1, 0]
  如果tagets函数是数字编码，用 sparse_categorical_crossentropy，数字编码：2, 0, 1

model = keras.models.Sequential([
    keras.layers.Conv2D(filters=32, kernel_size=3,
                             padding='same',
                             activation="relu",
                             input_shape=[width, height, channels]),
    keras.layers.Conv2D(filters=32, kernel_size=3,
                             padding='same',
                             activation="relu"
                             ),

    keras.layers.MaxPool2D(pool_size=2),
     keras.layers.Conv2D(filters=64, kernel_size=3,
                             padding='same',
                             activation="relu",
                             ),
    keras.layers.Conv2D(filters=64, kernel_size=3,
                             padding='same',
                             activation="relu"
                             ),

    keras.layers.MaxPool2D(pool_size=2),
     keras.layers.Conv2D(filters=128, kernel_size=3,
                             padding='same',
                             activation="relu",
                             ),
    keras.layers.Conv2D(filters=128, kernel_size=3,
                             padding='same',
                             activation="relu"
                             ),

    keras.layers.MaxPool2D(pool_size=2),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation='relu'),
    keras.layers.Dense(num_classes,activation='softmax')
    
])
model.compile(loss="categorical_crossentropy",
             optimizer="adam",
             metrics = ["accuracy"])
model.summary()

Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_18 (Conv2D)           (None, 128, 128, 32)      896       
_________________________________________________________________
conv2d_19 (Conv2D)           (None, 128, 128, 32)      9248      
_________________________________________________________________
max_pooling2d_9 (MaxPooling2 (None, 64, 64, 32)        0         
_________________________________________________________________
conv2d_20 (Conv2D)           (None, 64, 64, 64)        18496     
_________________________________________________________________
conv2d_21 (Conv2D)           (None, 64, 64, 64)        36928     
_________________________________________________________________
max_pooling2d_10 (MaxPooling (None, 32, 32, 64)        0         
_________________________________________________________________
conv2d_22 (Conv2D)           (None, 32, 32, 128)       73856     
_________________________________________________________________
conv2d_23 (Conv2D)           (None, 32, 32, 128)       147584    
_________________________________________________________________
max_pooling2d_11 (MaxPooling (None, 16, 16, 128)       0         
_________________________________________________________________
flatten_3 (Flatten)          (None, 32768)             0         
_________________________________________________________________
dense_6 (Dense)              (None, 128)               4194432   
_________________________________________________________________
dense_7 (Dense)              (None, 10)                1290      
=================================================================
Total params: 4,482,730
Trainable params: 4,482,730
Non-trainable params: 0
_________________________________________________________________

训练模型

epochs = 10 
# 数据是generator出来的，所以不能直接用fit
history = model.fit_generator(train_generator,
                               steps_per_epoch = train_num // batch_size,
                               epochs=epochs,
                              validation_data = valid_generator,
                              validation_steps = valid_num // batch_size
                             )

Train for 17 steps, validate for 4 steps
Epoch 1/10
17/17 [==============================] - 81s 5s/step - loss: 2.3143 - accuracy: 0.1132 - val_loss: 2.2271 - val_accuracy: 0.1953
Epoch 2/10
17/17 [==============================] - 80s 5s/step - loss: 2.2123 - accuracy: 0.1692 - val_loss: 2.1524 - val_accuracy: 0.1758
Epoch 3/10
17/17 [==============================] - 79s 5s/step - loss: 2.1053 - accuracy: 0.2012 - val_loss: 2.0001 - val_accuracy: 0.3086
Epoch 4/10
17/17 [==============================] - 79s 5s/step - loss: 1.9543 - accuracy: 0.2969 - val_loss: 1.8549 - val_accuracy: 0.3359
Epoch 5/10
17/17 [==============================] - 80s 5s/step - loss: 2.0232 - accuracy: 0.2689 - val_loss: 1.8926 - val_accuracy: 0.3477
Epoch 6/10
17/17 [==============================] - 79s 5s/step - loss: 1.8416 - accuracy: 0.3288 - val_loss: 1.6534 - val_accuracy: 0.3984
Epoch 7/10
17/17 [==============================] - 82s 5s/step - loss: 1.7299 - accuracy: 0.3759 - val_loss: 1.5653 - val_accuracy: 0.4688
Epoch 8/10
17/17 [==============================] - 82s 5s/step - loss: 1.6817 - accuracy: 0.4033 - val_loss: 1.4859 - val_accuracy: 0.4883
Epoch 9/10
17/17 [==============================] - 80s 5s/step - loss: 1.5872 - accuracy: 0.4449 - val_loss: 1.5066 - val_accuracy: 0.4805
Epoch 10/10
17/17 [==============================] - 80s 5s/step - loss: 1.6426 - accuracy: 0.4091 - val_loss: 1.4463 - val_accuracy: 0.5195

print(history.history.keys())

dict_keys(['loss', 'accuracy', 'val_loss', 'val_accuracy'])

训练结果

def plot_learning_curves(history, label, epochs, min_value, max_value):
    data = {}
    data[label]=history.history[label]
    data['val_'+label]=history.history['val_'+label]
    pd.DataFrame(data).plot(figsize=(8, 5))
    plt.grid(True)
    plt.axis([0, epochs, min_value, max_value])
    plt.show()
    
plot_learning_curves(history, 'accuracy', epochs, 0, 1)
plot_learning_curves(history, 'loss', epochs, 1.5, 2.5)

在这里插入图片描述

一枚小白的日常

发布了35 篇原创文章 · 获赞 3 · 访问量 2495

私信关注

Tensorflow2.0学习（16）：kaggle数据集——10_monkeys_model

10_monkeys数据集

在Kaggle平台上的实战

猜你喜欢