
I.J. Engineering and Manufacturing, 2018, 1, 63-71

Published Online January 2018 in MECS

DOI: 10.5815/ijem.2018.01.06


Deep Neural Network for Human Face Recognition

Dr. Priya Gupta a, Nidhi Saxena a, Meetika Sharma a, Jagriti Tripathi a

a Maharaja Agrasen College, University of Delhi, Vasundhara Enclave, Delhi - 110096, India

Received: 19 May 2017; Accepted: 11 September 2017; Published: 08 January 2018

Abstract

Face recognition (FR), the process of identifying people through facial images, has numerous practical applications in biometrics, information security, access control, law enforcement, smart cards, and surveillance systems. Convolutional Neural Networks (CovNets), a type of deep network, have proved successful for FR. For real-time systems, some preprocessing steps like sampling need to be performed before images are passed to a CovNet; even then, complete images (all the pixel values) are given as input, and all the steps (feature selection, feature extraction, training) are performed by the network. This is why implementing CovNets is sometimes complex and time-consuming. CovNets are still at a nascent stage and, although the accuracies obtained are very high, they have a long way to go. The paper proposes a new way of using a deep neural network (another type of deep network) for face recognition: instead of providing raw pixel values as input, only the extracted facial features are provided. This lowers the complexity of the network while providing an accuracy of 97.05% on the Yale faces dataset.

Index Terms: Face recognition, Haar cascade, deep neural networks, convolutional neural networks, softmax.

© 2018 Published by MECS Publisher. Selection and/or peer review under responsibility of the Research Association of Modern Education and Computer Science.

1. Introduction

A face recognition (FR) system identifies a face by matching it against a facial database. The field has made great progress in recent years due to improvements in the design and learning of features and of face recognition models [1]. Humans have an exceptional ability to recognize people irrespective of age, lighting conditions, and varying expressions; the aim of researchers is to design an FR system that matches or even surpasses the human recognition rate, which is nearly 97.5%.

The techniques used in best facial recognition systems may depend on the application of system. Face recognition systems may be divided into two broad categories:

• Finding a person from an image in a large database of facial images (e.g., a police database). These systems return the details of the person being searched for. Often only one image is available per person, and recognition usually need not be done in real time.

• Identifying a person in real time. Such systems grant access to a certain group of people and deny access to others. Multiple images per person are often available for training, and real-time recognition is required.

The proposed idea targets the second type of system, with varying facial details, expressions, and angles. It remains an open problem to find an ideal facial feature which is robust for FR in unconstrained environments [2].

The conventional face recognition pipeline consists of four stages: face detection, face alignment, face representation (or feature extraction), and classification [2]. The proposed method extracts facial features from input images and feeds them to a deep neural network for training and classification (a softmax layer is used). The architecture of the network is very flexible: layers can be added or removed to get the best results. Numerous libraries, functions, and platforms are now available for creating and modifying such networks.

CovNets are a specialized kind of neural network for processing data that has a known, grid-like topology. These networks have been tremendously successful in practical applications involving time-series data, which can be thought of as a 1D grid of samples taken at regular time intervals, and image data, which can be thought of as a 2D grid of pixels. Convolutional networks are simply neural networks that use convolution in place of general matrix multiplication in at least one of their layers; the name "convolutional neural network" indicates that the network employs the mathematical operation called convolution, a specialized kind of linear operation [3].
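As a concrete illustration, the sketch below implements the discrete 2D convolution (strictly, the cross-correlation that most deep-learning libraries actually compute) in plain NumPy; the toy image and edge filter are arbitrary placeholders, not values from the paper.

```python
# Minimal sketch: a valid-padding, stride-1 2D convolution over an image,
# the operation CovNet layers use instead of general matrix multiplication.
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over `image` and return the valid feature map."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i+kh, j:j+kw] * kernel)
    return out

img = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
edge = np.array([[1.0, 0.0, -1.0]] * 3)          # simple vertical-edge filter
print(conv2d(img, edge).shape)                   # (3, 3) feature map
```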

1.1. Related Work

Recently, multiple CovNets and deep CovNets have shown good results for face verification. According to Yi Sun et al. [1], existing methods generally address FR in two steps: feature extraction (designing or learning features from each individual face image separately to acquire a better representation) and recognition (calculating a similarity score between two compared faces using the feature representation of each) [1]. Many approaches have been applied to FR, including neural networks [3,4,6], geometrical features, Eigenfaces, template matching, and graph matching. CovNets have shown many promising results for FR [2,4,5,7,8,9]. An automatic feature extraction method using ratios of distances, presented by Kanade [7], used geometrical features and reported a recognition rate of 45-75% on a database of 20 people.

Approaches like self-organizing maps (SOM) and the Karhunen-Loeve (KL) transform can both be used for dimensionality reduction, of which SOM proved to be the more efficient algorithm [3]. Principal Component Analysis (PCA) has also been successfully applied for the same purpose [3,4]. Although CovNets have shown promising results for FR, it remains unclear how to design a good CovNet architecture for a specific classification task, due to the lack of theoretical guidance [3]. According to [6], a CovNet-Restricted Boltzmann Machine (RBM) has shown 97.08% accuracy for matching two images of the same person in an unconstrained environment.

Brunelli and Poggio [8] computed a set of geometrical features such as nose width and length, mouth position, and chin shape, and reported a recognition rate of 90% on a database of 47 people. However, they showed that a simple template matching scheme achieves 100% recognition on the same database. Cox et al. [9] introduced a mixture-distance technique that achieved a recognition rate of 95% using a query database of 95 images, where each face was represented by 30 manually extracted features.

Pentland et al. [5,12] reported good results on a large database (95% recognition of 200 people out of 3000), though it is difficult to draw broad conclusions, as many images of the same people looked very similar [10]. In [10], Lades et al. presented a dynamic link architecture for distortion-invariant object recognition, which employs elastic graph matching to find the closest stored graph. The graphs are sparse; their vertices are labeled with a multi-resolution description in terms of a local power spectrum, and their edges are labeled with geometrical distances. They presented good results with a database of 87 people and test images composed of different expressions and faces turned by 15°. The matching process is computationally expensive, taking roughly 25 s to compare an input with 87 stored objects on a parallel machine with 23 transputers. In comparison, Eigenfaces is a fast, simple, and practical algorithm, though it may be limited because optimal performance requires a high degree of correlation between the pixel intensities of the training and test images [3]. Graph matching is another approach to face recognition.

Wiskott et al. [12] used an updated version of the technique and compared 300 faces against 300 different faces of the same people taken from the Face Recognition Technology (FERET) database, reporting a recognition rate of 97.3%.

In constrained environments, hand-crafted features such as Local Binary Patterns (LBP) [16] and Local Phase Quantization (LPQ) [15,17] have achieved respectable performance in FR. However, performance degrades dramatically on images taken in unconstrained environments with varying facial alignment, expression, and illumination.

High-level recognition is typically modeled with many stages of processing, as in Marr's paradigm of proceeding from images to surfaces to three-dimensional (3D) models to matched models [10]. However, Turk and Pentland [18] argue that there is also a recognition process based on two-dimensional (2D) image processing. They presented a face recognition scheme in which face images are projected onto the principal components of the original set of training images; the resulting Eigenfaces are classified by comparison with known individuals [3].

None of the previous methods used the idea of feeding only the extracted features into a deep neural network to accomplish FR. This paper proposes using the frontal-face Haar cascade to preprocess the images, which are then fed as input to a deep neural network for face recognition, rather than passing the raw pixel values directly to a CovNet.

2. Deep Neural Network

A Neural Network is a human-brain-inspired algorithm designed to recognize patterns in numerical datasets. Real-world data such as images, text, audio, and video need to be transformed into numerical vectors before neural nets can use them. A neural network is composed of different layers, and a layer is made up of multiple nodes. Based on the type of pattern the network is trying to learn, each input fed into a node is assigned some weight; these weights determine the importance of that input in producing the end result. The weighted sum of the inputs is calculated and, depending on a threshold bias, the output of the node is determined. The mapping of input to output is performed by an activation function.

The goal of a neural network is to approximate some function f. The task of a simple classifier y = f(x) is to map the input data x to a class y; the neural network finds the parameters β that result in the best approximating function y = f(x, β).

Fig.1. A neural node

A simple neural network is a chain of such functions, for example f(x) = f3(f2(f1(x))). In the chain, f1 is called the first layer, f2 the second layer, and so on; the length of the chain determines the depth of the network, and the final layer is called the output layer. A schematic representation of a neural network is depicted in Fig. 2. During training, the desired outputs of the intermediate layers are not directly observable, so these layers are called hidden layers. A Deep Neural Network (DNN) is a feed-forward Artificial Neural Network (ANN) with multiple hidden layers and a higher level of abstraction.
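The layer chain f(x) = f3(f2(f1(x))) can be made concrete in a few lines of NumPy; the layer sizes, tanh activations, and random weights below are illustrative assumptions only, not the paper's configuration.

```python
# Toy illustration of a depth-3 network built by composing layer functions.
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    """Return a randomly initialized dense layer as a callable."""
    W = rng.standard_normal((n_out, n_in)) * 0.1
    b = np.zeros(n_out)
    return lambda x: np.tanh(W @ x + b)

f1, f2, f3 = dense(4, 8), dense(8, 8), dense(8, 3)
f = lambda x: f3(f2(f1(x)))   # f3 is the output layer; depth = 3

print(f(np.ones(4)))          # one forward pass through the chain
```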

Fig.2. A Small Neural Network

The width of the DNN is determined by the dimensionality of its hidden layers, whose values are computed through activation functions. Learning in deep neural networks requires minimizing a cost function; in classification, the cost measures the difference between the actual and predicted labels, and gradient descent is generally used for this purpose. In modern neural networks, the Rectified Linear Unit (ReLU) is the recommended activation function. The activation h(i) of a single hidden unit is given by

h(i) = σ(w(i)ᵀ x)  (1)

where σ(·) is the tanh function, w(i) is the weight vector for the i-th hidden unit, and x is the input. This gives a nonlinear transformation that still remains very close to linear, allowing such models to be optimized easily by gradient descent [13,20].
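A minimal sketch of Eq. (1) follows, computing one hidden unit's activation with the tanh of the text and, for comparison, the ReLU recommended for modern networks; the weight and input values are made up for illustration.

```python
# One hidden unit: h = sigma(w^T x), as in Eq. (1).
import numpy as np

def hidden_unit(w, x, sigma=np.tanh):
    """Activation of a single hidden unit with activation function sigma."""
    return sigma(w @ x)

relu = lambda z: np.maximum(0.0, z)

w = np.array([0.5, -0.3, 0.8])   # weight vector for this unit (made up)
x = np.array([1.0, 2.0, 0.5])    # input vector (made up)
print(hidden_unit(w, x))         # tanh activation
print(hidden_unit(w, x, relu))   # ReLU activation
```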

Limited data generally causes overfitting in DNNs. To avoid this, dropout is used [6]: it randomly drops some nodes from the layers based on a probability. "Dropping out" means temporarily removing a unit along with its incoming and outgoing edges, as depicted in Fig. 3.

Fig.3. Dropout Neural Net Model
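The dropout mechanism can be sketched directly. The keep probability and activations below are placeholders, and the rescaling shown is the common "inverted dropout" formulation rather than anything specified in the paper.

```python
# Inverted dropout: keep each unit with probability p_keep during training,
# rescale the survivors, and leave the layer intact at test time.
import numpy as np

rng = np.random.default_rng(0)

def dropout(h, p_keep=0.8, training=True):
    if not training:
        return h
    mask = rng.random(h.shape) < p_keep   # randomly drop units
    return h * mask / p_keep              # rescale surviving activations

h = np.array([0.2, 1.5, -0.7, 0.9, 0.4])
print(dropout(h))                         # some entries zeroed out
```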

3. Proposed Work

The following subsections describe the components of the new system. The approach uses the frontal-face Haar cascade (defined in OpenCV) to preprocess the input images, feeding only the facial features to the network for learning and classification. The following steps are performed for FR on the Yale faces dataset.

3.1. Preprocessing

Feature selection is the process of selecting useful features and discarding extraneous ones. Feature extraction is the process of combining more than one feature into a single feature. The proposed approach uses the frontal-face cascade for both feature selection and extraction: it takes the image as input and returns only the values of the key facial features present in it. It first detects the face in the image (selecting the features that represent the face within the complete image) and then extracts facial features (such as building a set of features that denote the eyes, nose, and lips), as shown in Fig. 4.

Fig.4. Facial Features in Image
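A sketch of this preprocessing step using OpenCV's frontal-face Haar cascade is shown below. The image filename is a placeholder, the 64x64 crop size is an assumption (the paper does not state one), and cv2.data.haarcascades is available in recent opencv-python builds; older installs may need an explicit path to the cascade XML file.

```python
# Detect the frontal face with a Haar cascade and crop it, so only facial
# content (not the full pixel grid) reaches the network.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("subject01.normal.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder
faces = cascade.detectMultiScale(img, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:
    face = cv2.resize(img[y:y+h, x:x+w], (64, 64))  # cropped face region
```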

3.2. Learning

A multi-layer feed-forward deep neural network is designed to learn the simplified features from the previous step. Multiple sets of activation, dense (fully-connected), and dropout layers are added to make learning efficient. After each set the number of features is reduced, so that only the important features reach the classification in the last layer. The architecture of the network is shown in Fig. 5. The network is trained for 50 epochs; the training and testing accuracies are plotted in Fig. 6 and the losses in Fig. 7. As can be seen, the accuracies increase as the iterations proceed. Increasing the number of epochs further, to 75 or 100, yields the same final accuracy, so 50 is the most efficient value.

3.3. Classification

The last layer of the network is a softmax layer used for classification, since the extracted facial features of the training images are simply stored per class. Each authorized person has a class, and an input image must belong to one of these classes. If it matches a person present in the database, the system allows that person to enter the secured place or to access private information.
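For reference, the softmax used in the output layer can be written in a few lines of NumPy; the random scores standing in for the 15 class outputs are illustrative only.

```python
# Softmax: turn the 15 raw class scores into probabilities; the predicted
# identity is the class with the highest probability.
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # subtract max for numerical stability
    return e / e.sum()

scores = np.random.default_rng(0).standard_normal(15)  # one score per subject
probs = softmax(scores)
print(probs.sum(), probs.argmax())   # 1.0, index of the predicted person
```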

Fig.5. Architecture of DNN used

3.4. Algorithm

The method for the proposed FR system is defined as follows. The dataset used is Yale Faces A.

• Load the pixel values of all images from the dataset.

• Detect the facial features of all images using the Haar cascade.

• Crop each face according to the output of the previous step.

• Split the data for cross-validation in the ratio 9:1.

• Design the following neural network (a code sketch follows this list):

1. The model consists of four layers.

2. The first layer is a dense layer giving 512 outputs, with ReLU activation and dropout of 0.2.

3. The second layer is a dense layer giving 512 outputs, with ReLU activation and dropout of 0.2.

4. The third layer is a dense layer giving 256 outputs, with ReLU activation and dropout of 0.2.

5. The fourth layer (output layer) is a dense layer giving 15 outputs, with softmax activation and dropout of 0.2.

• Train the neural network for 50 epochs.

• Plot the training and testing accuracies.

• Calculate the final average accuracy.
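A Keras sketch of the network described in steps 1-5 is given below. The input dimension, optimizer, and loss are assumptions (the paper does not state them), and the dropout the paper lists on the output layer is omitted here, as dropout is conventionally applied only to hidden layers.

```python
# Sketch of the four-layer DNN from the algorithm above (Keras Sequential).
from keras.models import Sequential
from keras.layers import Dense, Dropout

n_features = 64 * 64   # flattened cropped face; size is an assumption

model = Sequential([
    Dense(512, activation="relu", input_shape=(n_features,)),
    Dropout(0.2),
    Dense(512, activation="relu"),
    Dropout(0.2),
    Dense(256, activation="relu"),
    Dropout(0.2),
    Dense(15, activation="softmax"),   # one class per Yale subject
])
model.compile(optimizer="adam",                 # optimizer is an assumption
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=50, validation_data=(X_test, y_test))
```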

3.5. Implementation Details

The proposed method is tested on the Yale face database, which consists of black-and-white images of 15 subjects, each with 11 images in different expressions, making a total of 165 images. A sample is shown in Fig. 8. The images for one subject cover different facial expressions or configurations: center-light, with glasses, happy, left-light, without glasses, normal, right-light, sad, sleepy, surprised, and wink. Each subject defines a labeled class, so there are 15 classes in total. The complete dataset is split into two parts: 148 images for training and 17 for testing. The final average accuracy is calculated at the softmax layer by counting the test samples identified correctly. The idea is implemented on a Python 3.5.3 (64-bit) system. The OpenCV package is used for preprocessing with the frontal-face Haar cascade. The neural network is created and trained using Keras, Theano, and TensorFlow (packages available in Python). The final average accuracy achieved by the proposed system is 97.05%, which is close to the human face recognition accuracy of nearly 97.5%. A comparative analysis with some previous FR systems is provided in the tables.

Fig.6. Graph Showing Testing and Training Accuracies

Fig.7. Losses

Fig.8. Yalefaces Dataset

Table 1. Accuracies of FR system

Table 2. Details of above FR system

4. Conclusion

Using the Haar cascade to extract facial features and feeding these, rather than raw pixel values, into the network decreases the complexity of the neural-network-based recognition framework, as the number of redundant input features is reduced. Using a DNN instead of CovNets also makes the process lighter and faster. Moreover, accuracy is not compromised in the proposed method: the average accuracy obtained is 97.05%. Although one additional step, extracting facial features from each image, is added, the process is still better suited to small datasets.

References

[1] Sun, Yi, Xiaogang Wang, and Xiaoou Tang. "Hybrid Deep Learning for Face Verification." IEEE Transactions on Pattern Analysis and Machine Intelligence 38.10 (2016): 1997-2009.

[2] Hu, Guosheng, et al. "When Face Recognition Meets with Deep Learning: An Evaluation of Convolutional Neural Networks for Face Recognition." Proceedings of the IEEE International Conference on Computer Vision Workshops, 2015.

[3] Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016.

[4] Zhang, Tong, et al. "A Deep Neural Network Driven Feature Learning Method for Multi-view Facial Expression Recognition." IEEE Transactions on Multimedia 99 (2016): 1.

[5] Lawrence, Steve, et al. "Face Recognition: A Convolutional Neural-Network Approach." IEEE Transactions on Neural Networks 8.1 (1997): 98-113.

[6] Srivastava, Nitish, et al. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." Journal of Machine Learning Research 15.1 (2014): 1929-1958.

[7] Kanade, Takeo. "Picture Processing System by Computer Complex and Recognition of Human Faces." Doctoral dissertation, Kyoto University 3952 (1973): 83-97.

[8] Brunelli, Roberto, and Tomaso Poggio. "Face Recognition: Features versus Templates." IEEE Transactions on Pattern Analysis and Machine Intelligence 15.10 (1993): 1042-1052.

[9] Cox, Ingemar J., Joumana Ghosn, and Peter N. Yianilos. "Feature-Based Face Recognition Using Mixture-Distance." Proceedings of CVPR '96, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, IEEE, 1996.

[10] Ahonen, Timo, et al. "Recognition of Blurred Faces Using Local Phase Quantization." Proceedings of ICPR 2008, 19th International Conference on Pattern Recognition, IEEE, 2008.

[11] Kohonen, Teuvo. Self-Organizing Maps. Vol. 30. Berlin, Heidelberg, New York: Springer, 1995.

[12] Wiskott, Laurenz, et al. "Face Recognition and Gender Determination." (1995): 92-97.

[13] Maas, Andrew L., Awni Y. Hannun, and Andrew Y. Ng. "Rectifier Nonlinearities Improve Neural Network Acoustic Models." Proc. ICML, Vol. 30, No. 1, 2013.

[14] Chan, Chi Ho, et al. "Multiscale Local Phase Quantization for Robust Component-Based Face Recognition Using Kernel Fusion of Multiple Descriptors." IEEE Transactions on Pattern Analysis and Machine Intelligence 35.5 (2013): 1164-1177.

[15] Moore, S., and R. Bowden. "Local Binary Patterns for Multi-view Facial Expression Recognition." Computer Vision and Image Understanding, 2011, pp. 541-558.

[16] Ahonen, Timo, Abdenour Hadid, and Matti Pietikäinen. "Face Recognition with Local Binary Patterns." Computer Vision - ECCV 2004 (2004): 469-481.

[17] Dahmane, M., and J. Meunier. "Emotion Recognition Using Dynamic Grid-Based HoG Features." Proc. IEEE International Conference on Automatic Face and Gesture Recognition and Workshops, 2011, pp. 884-888.

[18] Turk, Matthew, and Alex Pentland. "Eigenfaces for Recognition." Journal of Cognitive Neuroscience 3.1 (1991): 71-86.

[19] Ekman, P., W. V. Friesen, and J. C. Hager. Facial Action Coding System: The Manual on CD ROM. Research Nexus Division of Network Information Research Corporation, 2002.

[20] Lowe, D. G. "Distinctive Image Features from Scale-Invariant Keypoints." International Journal of Computer Vision, vol. 60, no. 2, pp. 91-110, Nov. 2004.

[21] Lowe, D. G. "Object Recognition from Local Scale-Invariant Keypoints." Proc. of International Conference on Computer Vision, pp. 1150-1157, 1999.

Authors’ Profiles

Dr. Priya Gupta is working as an Assistant Professor in the Department of Computer Science at Maharaja Agrasen College, University of Delhi. Her doctoral degree is from BIT (Mesra), Ranchi. She has more than 13 years of teaching and 5 years of industry experience. Her research interests lie in the areas of Machine Learning, Theory of Computation, Compiler Design, Data Mining, Artificial Intelligence, etc. She has authored the books "CRM System and Cross Selling in Indian Banking Industry", "Innovation in Payment Systems - An Approach Towards Cashless Mandis", and "Banking the Unbanked - A Step Towards Financial Inclusion in Indian Mandis". She has also edited "Innovative Minds - Technocrats with Vision" (Unfolding the Dimensions of Compiler Design and Microprocessor), Volume I, and "Innovative Minds - Technocrats with Vision" (Unfolding the Dimensions of Artificial Intelligence and Information Security), Volume II.

Nidhi Saxena is pursuing an M.Tech in Computer Science and Engineering with a specialization in 'Artificial Intelligence and Artificial Neural Networks' from the University of Petroleum and Energy Studies. She completed her B.Tech in Computer Science from Maharaja Agrasen College, University of Delhi, in 2017. She has published several research papers in the field of machine learning. Her areas of interest are machine learning, especially deep learning, and fuzzy logic.

Meetika Sharma was born in Delhi, India, in 1994. She received the B.Tech degree in computer science from the University of Delhi, Delhi, India, in 2017. She is currently pursuing a master's in computer science with a specialization in 'Intelligent Systems' at The University of Texas at Dallas, Dallas, USA. She has published a few research papers in the field of machine learning. Her current research interests include machine learning, pattern matching, and natural language processing. She was the recipient of the 'Indira Award' from the Directorate of Education, Delhi, India.

Jagriti Tripathi completed her B.Tech in Computer Science from Maharaja Agrasen College, University of Delhi, India, in 2017. Her research interests include machine learning, artificial intelligence, and pattern recognition.

How to cite this paper: Priya Gupta, Nidhi Saxena, Meetika Sharma, Jagriti Tripathi, "Deep Neural Network for Human Face Recognition", International Journal of Engineering and Manufacturing (IJEM), Vol. 8, No. 1, pp. 63-71, 2018. DOI: 10.5815/ijem.2018.01.06
