Because of a busy work schedule, this series on the principles of face recognition has been paused for a while; starting today I will try to resume it. First, let's recall the work completed so far. The main task of the last few sections was to prepare enough data for training the neural network. The first step was to create images that contain no face, or whose overlap with the face region is below 30%: from every image in the face dataset we randomly pick a rectangular region and make sure it either does not intersect the annotated face box at all, or overlaps it by less than 30%. We call this data "neg"; its purpose is to teach the network what an image without a face looks like.
Next we pick another series of regions, this time making sure each region's overlap with the face box is above 30% but below 65%. We call this data "part"; its purpose is to train the network to recognize partial faces, strengthening its understanding of what a face is. The third kind consists of regions whose overlap with the face box exceeds 65%. This data is called "positive"; its purpose is to teach the network to recognize facial features.
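The "overlap" used to split crops into neg, part and positive is typically measured as intersection-over-union (IoU) between the candidate crop and the annotated face box. Below is a minimal sketch of that computation; the function name and the [x1, y1, x2, y2] box layout are my own choices for illustration, not taken from the original scripts:

```python
import numpy as np

def iou(box, boxes):
    # box: one candidate crop [x1, y1, x2, y2]; boxes: an Nx4 array of face boxes
    box_area = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
    areas = (boxes[:, 2] - boxes[:, 0] + 1) * (boxes[:, 3] - boxes[:, 1] + 1)
    # corners of the intersection rectangle of the crop with each face box
    xx1 = np.maximum(box[0], boxes[:, 0])
    yy1 = np.maximum(box[1], boxes[:, 1])
    xx2 = np.minimum(box[2], boxes[:, 2])
    yy2 = np.minimum(box[3], boxes[:, 3])
    # clamp to zero when the rectangles do not intersect
    inter = np.maximum(0, xx2 - xx1 + 1) * np.maximum(0, yy2 - yy1 + 1)
    return inter / (box_area + areas - inter)
```

A crop would then be labeled neg when its maximum IoU with any face box is below 0.3, part when it falls between 0.3 and 0.65, and positive when it exceeds 0.65.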
We also use the dataset from "Deep Convolutional Network Cascade for Facial Point Detection". It contains a large number of face images, each annotated with the coordinates of five facial key points: the left and right eyes, the nose, and the two corners of the mouth. We will train the network to locate these five points when analyzing an image, which substantially improves its ability to find faces.
When all this data is fed into the network, disk IO becomes the bottleneck. To read efficiently we need to pack the data into contiguous storage blocks so that loading it into memory is fast; keep in mind we will be feeding hundreds of thousands of small images to the network, so IO throughput is key to training effectively. Here we use TensorFlow's TFRecord format to store the data; its underlying principle is the same as the protocol buffers we covered in the previous section.
Next we need to process the image crops obtained in the previous sections, together with the normalized coordinates of the face boxes and of the five facial landmarks. In total there are roughly a million records, so handling the IO is tricky. The first step is to read all the coordinate information from the various annotation files into memory:
import os
import random
import time

import cv2
from tqdm import tqdm

def get_dataset(dir, item): # item is e.g. the generated train_pnet_landmark.txt
    dataset_dir = os.path.join(dir, item)
    print('dataset dir: ', dataset_dir)
    image_list = open(dataset_dir, 'r')
    dataset = []
    for line in tqdm(image_list.readlines()):
        info = line.strip().split(' ')
        if len(info) < 2:
            print('info err: ', info)
            continue # skip malformed lines instead of crashing on info[1]
        data_example = {}
        bbox = {}
        data_example['filename'] = info[0]
        data_example['label'] = int(info[1])
        # initialize the face box
        bbox['xmin'] = 0.0
        bbox['ymin'] = 0.0
        bbox['xmax'] = 0.0
        bbox['ymax'] = 0.0
        # initialize the 10 landmark coordinates
        bbox['xlefteye'] = 0.0
        bbox['ylefteye'] = 0.0
        bbox['xrighteye'] = 0.0
        bbox['yrighteye'] = 0.0
        bbox['xnose'] = 0.0
        bbox['ynose'] = 0.0
        bbox['xleftmouth'] = 0.0
        bbox['yleftmouth'] = 0.0
        bbox['xrightmouth'] = 0.0
        bbox['yrightmouth'] = 0.0
        if len(info) == 6: # this record contains only the face box
            bbox['xmin'] = float(info[2])
            bbox['ymin'] = float(info[3])
            bbox['xmax'] = float(info[4])
            bbox['ymax'] = float(info[5])
        if len(info) == 12: # this record contains the 10 landmark coordinates
            bbox['xlefteye'] = float(info[2])
            bbox['ylefteye'] = float(info[3])
            bbox['xrighteye'] = float(info[4])
            bbox['yrighteye'] = float(info[5])
            bbox['xnose'] = float(info[6])
            bbox['ynose'] = float(info[7])
            bbox['xleftmouth'] = float(info[8])
            bbox['yleftmouth'] = float(info[9])
            bbox['xrightmouth'] = float(info[10])
            bbox['yrightmouth'] = float(info[11])
        data_example['bbox'] = bbox
        dataset.append(data_example)
    return dataset
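To make the three branches above concrete, here is roughly what lines in the annotation files look like. The paths, coordinate values, and label conventions (0 for neg, 1 for positive, -2 for landmark records) are illustrative assumptions for this sketch, not values quoted from the actual files:

```python
# hypothetical annotation lines, one per training sample
samples = [
    "12/negative/0.jpg 0",                             # neg: filename and label only
    "12/positive/0.jpg 1 0.12 0.30 0.85 0.77",         # box record: plus 4 normalized box coords
    "12/landmark/0.jpg -2 " + " ".join(["0.5"] * 10),  # landmark record: plus 10 coords
]
# splitting yields the field counts that get_dataset branches on: 2, 6 and 12
lengths = [len(line.split(' ')) for line in samples]
print(lengths)  # → [2, 6, 12]
```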
These records live in the files we generated in earlier sections, such as pos_12.txt and landmark_12_aug.txt. Next we read each cropped image and convert it to a byte string in memory:
def process_image(filename):
    try:
        image = cv2.imread(filename)
        # verify the image loaded as a 3-channel color image before serializing
        assert len(image.shape) == 3
        assert image.shape[2] == 3
        height = image.shape[0]
        width = image.shape[1]
        # note: this serializes the raw uint8 pixel buffer, not a JPEG/PNG encoding
        image_data = image.tobytes()
        return image_data, height, width
    except Exception as e:
        # print('process image err: ', e, filename)
        return None, None, None
The third step is to write the information gathered in the two steps above into the TFRecord structure, which serializes it to a file in a specific format:
def _int64_feature(value):
    if not isinstance(value, list):
        value = [value]
    try:
        return tf.train.Feature(int64_list = tf.train.Int64List(value = value))
    except Exception as e:
        print('int64 err: ', e)

def _float_feature(value):
    if not isinstance(value, list):
        value = [value]
    try:
        return tf.train.Feature(float_list = tf.train.FloatList(value = value))
    except Exception as e:
        print('float err: ', e)

def _bytes_feature(value):
    if not isinstance(value, list):
        value = [value]
    try:
        return tf.train.Feature(bytes_list = tf.train.BytesList(value = value))
    except Exception as e:
        print('bytes err: ', e)

def convert_to_example(image_example, image_buffer):
    class_label = image_example['label']
    bbox = image_example['bbox']
    roi = [bbox['xmin'], bbox['ymin'], bbox['xmax'], bbox['ymax']]
    landmark = [bbox['xlefteye'], bbox['ylefteye'], bbox['xrighteye'], bbox['yrighteye'],
                bbox['xnose'], bbox['ynose'], bbox['xleftmouth'], bbox['yleftmouth'],
                bbox['xrightmouth'], bbox['yrightmouth']]
    try:
        example = tf.train.Example(features = tf.train.Features(feature = {
            'image/encoded': _bytes_feature(image_buffer),
            'image/label': _int64_feature(class_label),
            'image/roi': _float_feature(roi),
            'image/landmark': _float_feature(landmark)
        }))
        return example
    except Exception as e:
        print('example err: ', e, image_example)

def add_to_tfrecord(filename, image_example, tfrecord_writer):
    image_data, height, width = process_image(filename)
    if image_data is not None: # skip images that failed to load
        example = convert_to_example(image_example, image_data)
        tfrecord_writer.write(example.SerializeToString())
import tensorflow as tf

dataset_dir = '/content/drive/MyDrive/my_mtcnn/data'
def create_tf_record(size):
    output_dir = os.path.join(dataset_dir, str(size) + "/tf_record")
    if not os.path.exists(output_dir):
        os.mkdir(output_dir)
    if size == 12:
        net = 'PNet'
        tf_filenames = [os.path.join(output_dir, 'train%s_landmark.tfrecord' % (net))]
        items = ['train_pnet_landmark.txt']
    elif size == 24: # the remaining two cases will be handled later
        pass
    elif size == 48:
        pass
    if tf.io.gfile.exists(tf_filenames[0]):
        print("tf record file already created")
    # the loop looks redundant for size 12 but will be used for size 24 and 48 later
    for tf_filename, item in zip(tf_filenames, items):
        print('reading data....')
        dataset = get_dataset(dataset_dir, item)
        tf_filename = tf_filename + '_shuffle'
        random.shuffle(dataset)
        print('transform to tfrecord')
        with tf.io.TFRecordWriter(tf_filename) as tfrecord_writer:
            for image_example in tqdm(dataset):
                filename = image_example['filename']
                try:
                    add_to_tfrecord(filename, image_example, tfrecord_writer)
                except Exception as e:
                    print('tf record exception: ', e)
    print('completing transform..!')

create_tf_record(12)
Pay particular attention to the code above that builds the TFRecord structure, namely this part:
example = tf.train.Example(features = tf.train.Features(feature = {
    'image/encoded': _bytes_feature(image_buffer),
    'image/label': _int64_feature(class_label),
    'image/roi': _float_feature(roi),
    'image/landmark': _float_feature(landmark)
}))
As you can see, the TFRecord structure is much like JSON or a Python dictionary: it stores key-value pairs whose values are basic types such as bytes, floats and ints, which makes it well suited to storing binary data. Running the code above produces a TFRecord binary file that gathers all the training data generated in the previous sections into a single file. In my experiments this step is quite slow: I am running on Colab with Google Drive, and because the data consists of a huge number of small, scattered files, I estimate the step takes more than 10 hours. Once I have finished running it I will share the result, so that readers do not have to waste too much time on data preprocessing.
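To verify the generated file, the records can be read back with tf.data. The sketch below assumes the same feature keys used in convert_to_example, and that each crop was stored as a raw 12x12x3 uint8 pixel buffer (which is what serializing the cv2 image produces); the filename shown is the one create_tf_record would generate for PNet:

```python
import tensorflow as tf

features = {
    'image/encoded': tf.io.FixedLenFeature([], tf.string),
    'image/label': tf.io.FixedLenFeature([], tf.int64),
    'image/roi': tf.io.FixedLenFeature([4], tf.float32),
    'image/landmark': tf.io.FixedLenFeature([10], tf.float32),
}

def parse(serialized):
    # deserialize one Example back into image, label, box and landmark tensors
    parsed = tf.io.parse_single_example(serialized, features)
    image = tf.io.decode_raw(parsed['image/encoded'], tf.uint8)
    image = tf.reshape(image, [12, 12, 3])  # PNet input size
    return image, parsed['image/label'], parsed['image/roi'], parsed['image/landmark']

dataset = tf.data.TFRecordDataset('trainPNet_landmark.tfrecord_shuffle').map(parse)
```

This dataset can then be batched and fed directly into the training loop, which is exactly the IO benefit TFRecord was chosen for.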