# Train a CNN model to identify captcha code with TensorFlow and Keras

In the last post (Automatically fill in captcha code in course selection system), we exploited the "Play Audio" button to obtain the captcha code in my college's course selection system. Today, we will go through another approach: identifying the captcha code by training a CNN model with TensorFlow and Keras.

A captcha code from the course selection system.

## Install Needed Packages

Below are the environment and package versions I used for the training in this post.

macOS: 10.14.6
Python: 3.7.3
numpy: 1.18.0
scikit-learn: 0.22
TensorFlow: 2.0.0
Pillow: 6.2.1

If any of these packages is missing, install it with the corresponding command below.

NumPy: pip install numpy
scikit-learn: pip install scikit-learn
TensorFlow: pip install tensorflow
Pillow: pip install Pillow
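
To confirm which versions are actually installed before training, a small check like the following may help. This is only a sketch: the helper name `installed_version` is made up for illustration, and it assumes each package exposes a `__version__` attribute, which the four packages above do.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, '__version__', None)


# Import names differ from pip names for two packages:
# scikit-learn is imported as sklearn, and Pillow as PIL.
for name in ('numpy', 'sklearn', 'tensorflow', 'PIL'):
    print('{0}: {1}'.format(name, installed_version(name) or 'not available'))
```

Any package reported as not available can then be installed with the matching pip command.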

## Data Preparation

Data preparation is the most time-consuming part of machine learning. For this reason, I have prepared 100 captcha images for training in the training folder, and 5 for testing in the testing folder. The download link is provided below. Inside the zip file, each filename is the correct code for that captcha image. Unzip it and put the folders in the same directory as the code we will write later.

## Build Up and Train the Model

As the first step, create an empty file called train.py and import all the packages that we will use later.

import numpy as np
import os
from sklearn.model_selection import train_test_split
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras import models
from tensorflow.keras.preprocessing.image import img_to_array


Initialize the variables.

epochs = 10       # Number of training epochs
img_rows = None   # Height of the captcha image
img_cols = None   # Width of the captcha image
digits_in_img = 6 # Number of digits in each captcha
x_list = list()   # All single-digit images
y_list = list()   # The correct label for each single-digit image
x_train = list()  # Digit images used for training
y_train = list()  # Labels for the training images
x_test = list()   # Digit images used for testing
y_test = list()   # Labels for the testing images


Now, define a function that splits a captcha image into its individual digits. Each digit image is scaled to the range 0 to 1 and stored in x_list, and its correct label, taken from the filename, is stored in y_list.

def split_digits_in_img(img_array, x_list, y_list):
    # Note: the labels come from the global img_filename set by the loading loop.
    for i in range(digits_in_img):
        step = img_cols // digits_in_img
        x_list.append(img_array[:, i * step:(i + 1) * step] / 255)
        y_list.append(img_filename[i])
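
To see what the slicing does, here is a minimal standalone sketch on a made-up array. The image size of 20 by 60 pixels is an assumption for illustration only; the real captcha size is read from the loaded image.

```python
import numpy as np

digits_in_img = 6

# Hypothetical stand-in for one grayscale captcha: 20 rows, 60 columns, 1 channel,
# with pixel values in the 0-255 range.
img_array = np.random.randint(0, 256, size=(20, 60, 1)).astype('float32')

step = img_array.shape[1] // digits_in_img  # width of one digit in pixels
pieces = [img_array[:, i * step:(i + 1) * step] / 255 for i in range(digits_in_img)]

print(len(pieces))      # 6
print(pieces[0].shape)  # (20, 10, 1)
```

Because of the integer division, any leftover columns on the right edge are dropped when the width is not a multiple of six.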


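Next, iterate through all the .png captcha images in the training folder, load each one as grayscale, and split out the digits by calling the function we just wrote.

img_filenames = os.listdir('training')

for img_filename in img_filenames:
    if '.png' not in img_filename:
        continue
    # Load as grayscale so the array has a single channel, matching the model input.
    img = keras.preprocessing.image.load_img(os.path.join('training', img_filename), color_mode='grayscale')
    img_array = img_to_array(img)
    img_rows, img_cols, _ = img_array.shape
    split_digits_in_img(img_array, x_list, y_list)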
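
Since the labels come straight from the filenames, a stray file with an unexpected name would silently produce wrong labels. One possible safeguard is sketched below; the helper `labels_from_filename` is hypothetical and not part of train.py.

```python
import os
import string


def labels_from_filename(img_filename, digits_in_img=6):
    """Return the digit labels encoded in a filename such as '012345.png'."""
    stem, ext = os.path.splitext(img_filename)
    if ext.lower() != '.png':
        raise ValueError('not a .png file: {0}'.format(img_filename))
    if len(stem) != digits_in_img or any(ch not in string.digits for ch in stem):
        raise ValueError('unexpected captcha filename: {0}'.format(img_filename))
    return [int(ch) for ch in stem]


print(labels_from_filename('012345.png'))  # [0, 1, 2, 3, 4, 5]
```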


After splitting all the captcha images, we have to convert the correct answers into categorical (one-hot) format. For example, 1 is represented as [0, 1, 0, 0, 0, 0, 0, 0, 0, 0], 2 is represented as [0, 0, 1, 0, 0, 0, 0, 0, 0, 0], and so on.
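
The encoding can be reproduced with plain NumPy. This is only a sketch of the idea, not the Keras implementation, and the helper name `one_hot` is made up for illustration.

```python
import numpy as np


def one_hot(labels, num_classes=10):
    """Map integer labels to one-hot rows, e.g. 2 -> [0, 0, 1, 0, ...]."""
    labels = np.asarray(labels, dtype=int)
    # Row k of the identity matrix is the one-hot vector for class k.
    return np.eye(num_classes, dtype='float32')[labels]


encoded = one_hot([1, 2])
print(encoded[0])  # [0. 1. 0. 0. 0. 0. 0. 0. 0. 0.]
print(encoded[1])  # [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
```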

When the conversion is done, split the data into two piles: one for training and the other for testing. With its default settings, train_test_split holds out 25% of the samples for testing.

y_list = keras.utils.to_categorical(y_list, num_classes=10)
x_train, x_test, y_train, y_test = train_test_split(x_list, y_list)


For the model part, we first check whether a trained model file called cnn_model.h5 exists. If it does, load it into our program; otherwise, create a new CNN model with the following structure.

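if os.path.isfile('cnn_model.h5'):
    model = models.load_model('cnn_model.h5')
else:
    model = models.Sequential()
    model.add(layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(img_rows, img_cols // digits_in_img, 1)))
    # Assumption: the remaining layers are missing from this listing; this minimal head makes the model trainable.
    model.add(layers.Flatten())
    model.add(layers.Dense(10, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    print('New model created.')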
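
To get a feel for what the first convolutional layer produces, its output shape and parameter count can be worked out by hand. The sketch below assumes the Keras defaults of stride 1 and 'valid' padding, and a hypothetical digit size of 20 by 10 pixels; both helper functions are made up for illustration.

```python
def conv2d_output_shape(rows, cols, filters, kernel=(3, 3)):
    """Output shape of a Conv2D layer with stride 1 and 'valid' padding."""
    return (rows - kernel[0] + 1, cols - kernel[1] + 1, filters)


def conv2d_param_count(in_channels, filters, kernel=(3, 3)):
    """Each filter has kernel[0] * kernel[1] * in_channels weights plus one bias."""
    return (kernel[0] * kernel[1] * in_channels + 1) * filters


# Hypothetical digit size: 20 rows by 10 columns, 1 grayscale channel.
print(conv2d_output_shape(20, 10, filters=32))        # (18, 8, 32)
print(conv2d_param_count(in_channels=1, filters=32))  # 320
```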



With our model created or loaded, we can now move on to training. Because the model is not too complicated, a laptop can run the 10 epochs of training within a minute. When the training is done, evaluate the model and display the accuracy. If nothing goes wrong, save the model.

model.fit(np.array(x_train), np.array(y_train), batch_size=digits_in_img, epochs=epochs, verbose=1, validation_data=(np.array(x_test), np.array(y_test)))

loss, accuracy = model.evaluate(np.array(x_test), np.array(y_test), verbose=0)
print('Test loss:', loss)
print('Test accuracy:', accuracy)

model.save('cnn_model.h5')


Awesome. We have completed the first part, so save the file and close it. Run python train.py to train the model. During training, we can see the loss drop gradually while the accuracy rises. The test accuracy reaches 98% at the end.
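
Note that each test sample is a single digit, so the 98% figure is a per-digit accuracy. A rough estimate of how often all six digits of a captcha come out right, under the simplifying assumption that errors on different digits are independent, can be computed like this (the helper name is made up for illustration):

```python
def whole_captcha_accuracy(per_digit_accuracy, digits_in_img=6):
    """Probability that every digit is right, assuming independent per-digit errors."""
    return per_digit_accuracy ** digits_in_img


print(round(whole_captcha_accuracy(0.98), 3))  # 0.886
```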

With our captcha recognition model trained, it's time to try out some captcha images from the testing folder that the model has never seen before.

## Test the Model with Blind Data

This time, create another file called predict.py and import the needed packages.

import numpy as np
import os
import sys
from tensorflow.keras import models
from tensorflow.keras.preprocessing.image import img_to_array


Initialize the variables and set NumPy to display floating-point numbers with nine decimal places.

img_rows = None
img_cols = None
digits_in_img = 6
model = None
np.set_printoptions(suppress=True, linewidth=150, precision=9, formatter={'float': '{: 0.9f}'.format})
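
The effect of these print options is easiest to see on a small array. The sketch below uses made-up values and applies the same options temporarily through the `np.printoptions` context manager, so the global settings are left alone.

```python
import numpy as np

# Made-up confidences: one large value and one very small value.
confidences = np.array([0.5, 0.000000012345])

print(confidences)  # default formatting

# Same options as above, applied only inside the with-block.
with np.printoptions(suppress=True, linewidth=150, precision=9,
                     formatter={'float': '{: 0.9f}'.format}):
    formatted = str(confidences)
    print(formatted)  # fixed-point notation with nine decimal places
```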


As in train.py, define a function that splits the captcha image into its individual digits.

def split_digits_in_img(img_array):
    x_list = list()
    for i in range(digits_in_img):
        step = img_cols // digits_in_img
        x_list.append(img_array[:, i * step:(i + 1) * step] / 255)
    return x_list


Load the trained model named cnn_model.h5. If the file doesn't exist, terminate the program.

if os.path.isfile('cnn_model.h5'):
    model = models.load_model('cnn_model.h5')
else:
    print('No trained model found.')
    sys.exit(-1)


Prompt the user for the filename of a captcha image to predict. The image is loaded as grayscale. Next, we split out the digits in the captcha and store them in x_list.

# load_img is needed in addition to the imports at the top of the file.
from tensorflow.keras.preprocessing.image import load_img

img_filename = input('Verification code img filename: ')
img = load_img(img_filename, color_mode='grayscale')
img_array = img_to_array(img)
img_rows, img_cols, _ = img_array.shape
x_list = split_digits_in_img(img_array)


We then predict each digit in order. (The model can also predict all the digits in one batch.)

verification_code = list()
for i in range(digits_in_img):
    confidences = model.predict(np.array([x_list[i]]), verbose=0)
    # argmax over the class axis; Sequential.predict_classes was removed in newer TensorFlow 2.x releases.
    result_class = np.argmax(confidences, axis=-1)
    verification_code.append(result_class[0])
    print('Digit {0}: Confidence=> {1}    Predict=> {2}'.format(i + 1, np.squeeze(confidences), np.squeeze(result_class)))
print('Predicted verification code:', verification_code)
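
For the batch variant, passing all the digit images to the model at once should return one row of class confidences per digit, which can then be decoded in a single step. The sketch below shows only the decoding, on made-up confidences; `decode_confidences` is a hypothetical helper, not part of predict.py.

```python
import numpy as np


def decode_confidences(confidences):
    """Turn a (digits, 10) array of class confidences into a captcha string."""
    digits = np.argmax(confidences, axis=-1)
    return ''.join(str(d) for d in digits)


# Made-up confidences for a 3-digit example. In predict.py the array would
# come from model.predict(np.array(x_list)) and have one row per digit.
fake = np.zeros((3, 10))
fake[0, 4] = fake[1, 0] = fake[2, 9] = 1.0
print(decode_confidences(fake))  # 409
```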


That is all for predict.py. Now let's try some captcha images from the testing folder: run python predict.py and type in the filename. The result should show the confidences for each digit, with all six digits correctly identified.

Machine learning is amazing, isn't it? :)