11.2. Convolutional Neural Network
Also known as CNN
Combination of:
Deep Neural Networks
Kernel Convolutions
With the assumption that the input is an image
11.2.1. Problems with image processing
shadows
overlapping images
changes in camera angle and tilt
angle of incident light
color variation
surface curvature
noise
11.2.2. What is a Kernel Convolution?
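A kernel convolution slides a small matrix (the kernel) over the image and replaces each pixel with a weighted sum of its neighbourhood; depending on the kernel values this blurs, sharpens, or detects edges. As a minimal sketch of the idea (assuming NumPy and SciPy are available; the toy image and kernel values are illustrative only):
import numpy as np
from scipy.signal import convolve2d
# toy 5x5 grayscale "image" with a bright vertical stripe
image = np.array([
    [0, 0, 10, 0, 0],
    [0, 0, 10, 0, 0],
    [0, 0, 10, 0, 0],
    [0, 0, 10, 0, 0],
    [0, 0, 10, 0, 0],
], dtype=float)
# 3x3 Sobel-like kernel that responds to vertical edges
kernel = np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
], dtype=float)
# each output pixel is the weighted sum of its 3x3 neighbourhood
edges = convolve2d(image, kernel, mode='same', boundary='fill')
print(edges)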
11.2.3. What is a Convolutional Neural Network (CNN / ConvNet)?
Convolutional Neural Networks are very similar to ordinary Neural Networks from the previous chapter: they are made up of neurons that have learnable weights and biases. Each neuron receives some inputs, performs a dot product and optionally follows it with a non-linearity. The whole network still expresses a single differentiable score function: from the raw image pixels on one end to class scores at the other. And they still have a loss function (e.g. SVM/Softmax) on the last (fully-connected) layer and all the tips/tricks we developed for learning regular Neural Networks still apply.
So what does change? ConvNet architectures make the explicit assumption that the inputs are images, which allows us to encode certain properties into the architecture. These then make the forward function more efficient to implement and vastly reduce the number of parameters in the network.
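To make that parameter saving concrete, the short sketch below (a hypothetical illustration, not part of the original examples) compares the weight count of a fully-connected layer with that of a single 3x3 convolutional layer on a 28x28 grayscale image; the layer sizes are arbitrary:
from keras.models import Sequential
from keras.layers import Dense, Flatten, Conv2D
# fully-connected: every one of the 784 pixels is wired to every unit
dense_model = Sequential([
    Flatten(input_shape=(28, 28, 1)),
    Dense(128, activation='relu'),  # 784 * 128 + 128 = 100480 parameters
])
# convolutional: 32 filters of size 3x3, shared across the whole image
conv_model = Sequential([
    Conv2D(32, kernel_size=(3, 3), activation='relu',
           input_shape=(28, 28, 1)),  # 3 * 3 * 1 * 32 + 32 = 320 parameters
])
dense_model.summary()
conv_model.summary()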
11.2.4. Handwritten digit recognition (MNIST) with sklearn
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_openml
from sklearn.neural_network import MLPClassifier
# fetch_mldata was removed from sklearn, so fetch MNIST from OpenML instead
mnist = fetch_openml('mnist_784', version=1, as_frame=False)
# rescale the data, use the traditional train/test split
features = mnist.data / 255.
labels = mnist.target.astype(int)
features_train = features[:60000]
features_test = features[60000:]
labels_train = labels[:60000]
labels_test = labels[60000:]
model = MLPClassifier(
    hidden_layer_sizes=(50,),
    max_iter=10,
    alpha=1e-4,
    solver='sgd',
    verbose=10,
    tol=1e-4,
    random_state=1,
    learning_rate_init=.1,
)
model.fit(features_train, labels_train)
training_score = model.score(features_train, labels_train)
test_score = model.score(features_test, labels_test)
print(f"Training set score: {training_score}")
print(f"Test set score: {test_score}")
fig, axes = plt.subplots(4, 4)
# use global min / max to ensure all weights are shown on the same scale
vmin = model.coefs_[0].min()
vmax = model.coefs_[0].max()
for coef, ax in zip(model.coefs_[0].T, axes.ravel()):
    # each image shows the input weights of one hidden neuron
    # there are 50 hidden neurons; only the first 16 fit into the 4x4 grid
    ax.matshow(
        coef.reshape(28, 28),
        cmap=plt.cm.gray,
        vmin=.5 * vmin,
        vmax=.5 * vmax)
    ax.set_xticks(())
    ax.set_yticks(())
plt.show()  # doctest: +SKIP
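As a quick usage check (a hypothetical follow-up, not part of the original listing), the trained classifier can be asked for a prediction on a single test image:
# predict the class of the first test image and compare with the true label
sample = features_test[0].reshape(1, -1)
predicted = model.predict(sample)[0]
print(f"Predicted: {predicted}, actual: {labels_test[0]}")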
11.2.5. Iris classification with tensorflow
import numpy as np
import tensorflow as tf
# Data sets
IRIS_TRAINING = '../_data/iris_training.csv'
IRIS_TEST = '../_data/iris_test.csv'
# Load datasets (note: tf.contrib.learn is a TensorFlow 1.x API, removed in 2.x).
training_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TRAINING,
    target_dtype=np.int,
    features_dtype=np.float32)
test_set = tf.contrib.learn.datasets.base.load_csv_with_header(
    filename=IRIS_TEST,
    target_dtype=np.int,
    features_dtype=np.float32)
# Specify that all features have real-value data
feature_columns = [tf.contrib.layers.real_valued_column("", dimension=4)]
# Build 3 layer DNN with 10, 20, 10 units respectively.
classifier = tf.contrib.learn.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 20, 10],
    n_classes=3,
    model_dir="/tmp/iris_model")
# Define the training inputs
def get_train_inputs():
    x = tf.constant(training_set.data)
    y = tf.constant(training_set.target)
    return x, y
# Fit model.
classifier.fit(input_fn=get_train_inputs, steps=2000)
# Define the test inputs
def get_test_inputs():
    x = tf.constant(test_set.data)
    y = tf.constant(test_set.target)
    return x, y
# Evaluate accuracy.
accuracy_score = classifier.evaluate(input_fn=get_test_inputs, steps=1)["accuracy"]
print(f"\nTest Accuracy: {accuracy_score:f}\n")
# Classify two new flower samples.
def new_samples():
    return np.array(
        [[6.4, 3.2, 4.5, 1.5],
         [5.8, 3.1, 5.0, 1.7]], dtype=np.float32)
predictions = list(classifier.predict_classes(input_fn=new_samples))
print(f"New Samples, Class Predictions: {predictions}\n")
# Test Accuracy: 0.966667
# New Samples, Class Predictions: [1, 1]
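Since tf.contrib.learn was removed in TensorFlow 2.x, a rough equivalent of the listing above can be written with the tf.estimator and tf.data APIs. The sketch below is an assumption-laden translation, not the original example: it reads the same iris_training.csv / iris_test.csv files (skipping the tutorial's one-line header) and assumes a TensorFlow version that still ships tf.estimator.
import numpy as np
import tensorflow as tf

def load_iris_csv(path):
    # skip the one-line header used by the TensorFlow iris CSV files
    data = np.genfromtxt(path, delimiter=',', skip_header=1)
    return data[:, :4].astype(np.float32), data[:, 4].astype(np.int32)

x_train, y_train = load_iris_csv('../_data/iris_training.csv')
x_test, y_test = load_iris_csv('../_data/iris_test.csv')

feature_columns = [tf.feature_column.numeric_column('x', shape=[4])]
classifier = tf.estimator.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[10, 20, 10],
    n_classes=3)

def train_input_fn():
    # shuffle and repeat the training data, feed it in mini-batches
    ds = tf.data.Dataset.from_tensor_slices(({'x': x_train}, y_train))
    return ds.shuffle(len(y_train)).repeat().batch(32)

def test_input_fn():
    # a single pass over the test data
    return tf.data.Dataset.from_tensor_slices(({'x': x_test}, y_test)).batch(32)

classifier.train(input_fn=train_input_fn, steps=2000)
accuracy = classifier.evaluate(input_fn=test_input_fn)['accuracy']
print(f'Test Accuracy: {accuracy:f}')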
11.2.6. Handwritten digit recognition (MNIST) with keras
Gets to 99.25% test accuracy after 12 epochs
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras import backend as K
batch_size = 128
num_classes = 10
epochs = 12
# input image dimensions
img_rows, img_cols = 28, 28
# the data, shuffled and split between train and test sets
(x_train, y_train), (x_test, y_test) = mnist.load_data()
if K.image_data_format() == 'channels_first':
    x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
    x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
    input_shape = (1, img_rows, img_cols)
else:
    x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
    x_test = x_test.reshape(x_test.shape[0], img_rows, img_cols, 1)
    input_shape = (img_rows, img_cols, 1)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# convert class vectors to binary class matrices
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3),
                 activation='relu',
                 input_shape=input_shape))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(128, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(num_classes, activation='softmax'))
model.compile(loss=keras.losses.categorical_crossentropy,
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(x_test, y_test))
score = model.evaluate(x_test, y_test, verbose=0)
print('Test loss:', score[0])
print('Test accuracy:', score[1])
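After training, the network can be persisted and reused for single predictions. A short follow-up sketch (the file name is arbitrary, not part of the original example):
import numpy as np
# save the trained network to disk so it can be reloaded later
model.save('mnist_cnn.h5')
# predict the digit in the first test image (y_test is one-hot encoded here)
probabilities = model.predict(x_test[:1])
print('Predicted digit:', np.argmax(probabilities, axis=1)[0])
print('Actual digit:', np.argmax(y_test[0]))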