CUHK Passions and Pursuits

Reading Laughter and Tears
Wang Xiaogang perfects ‘deep learning’ for computers

Computer scientist Wang Xiaogang has improved the way that machines sort images to the point where they can match the human ability to recognize faces. He is now expanding on that expertise so that computers will be better equipped to detect and identify objects and actions, including what people are doing. They will also be able to report that information without a human having to sift through millions of images.

Professor Wang is using a technique known as Deep Learning to mimic the processes of the mind. Deep Learning is a way of structuring computer networks in which parallel computing passes information between millions of computation units, loosely simulating the way the neural networks of the human brain work. Once training is under way, the interaction between the units lets the system ‘learn’ the values of millions of parameters from examples, without a human having to keep programming new instructions.

Professor Wang’s group was the first to apply Deep Learning to detecting certain parts of the face, aligning faces and segmenting the body. He has also trained his computer systems to recognize what a person is doing in an image (laughing, eating, talking on a phone) and to estimate what pose that person is in. Through Deep Learning, his hierarchical computer system can work out which parts of a person appear in an image and what position he or she is in. Another problem he has tackled is ‘occlusion’: identifying a person in a crowd, or when only part of him or her is visible in an image.

After developing a new facial-recognition system, Professor Wang tested it on ‘Labeled Faces in the Wild,’ a database of thousands of face images collected from the Internet. Humans can correctly judge whether two cropped faces belong to the same person 97.5% of the time, and their accuracy rises to 99.2% when they are shown the complete, uncropped image. Before the introduction of Deep Learning, the most advanced computer systems were scoring 96.3%. Professor Wang has boosted performance to 99.15%. This is a breakthrough because computers now have essentially the same success rate as humans.

The improvement comes from increasing the ‘depth’ of Deep Learning by adding more layers of analysis, and from getting those layers to share information and re-use components. Professor Wang has outperformed other scientists in facial-recognition tests, including those at Facebook. He attributes this to experience and a willingness to share information. ‘Many computer scientists treat Deep Learning as a black box,’ Professor Wang said. He noted that many also use their own image databases and keep them in house. ‘We will open this box and carefully design this internal structure, by incorporating our research into computer vision conducted over the last 10 years.’

One of Professor Wang’s next challenges is to turn his machines’ attention to recognizing what people are doing within large crowds, such as thousands of pedestrians. This is far more complex than facial recognition, given the large number of people involved and the wide variety of ways they interact. Another challenge is to make facial recognition more sophisticated so that, for instance, a computer could take a side view of a face and reconstruct a full-frontal image.
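
To make the idea of stacked ‘layers of analysis’ a little more concrete, here is a minimal Python sketch of face verification with a layered network. It is a toy illustration only, not Professor Wang’s system or training method. Each layer multiplies its input by a weight matrix and applies a ReLU (a common choice of activation). Two faces are judged to be the same person when the cosine similarity of their final feature vectors exceeds a threshold. The names `forward`, `cosine_similarity` and `verify`, the three-number ‘face descriptors’, the hand-picked weights and the 0.9 threshold are all hypothetical. In a real Deep Learning system the weights would be learned from large numbers of labelled images rather than written by hand.

```python
import math

# Hypothetical hand-picked weights for a toy two-layer network.
# In a real Deep Learning system these values would be learned from data.
LAYERS = [
    [[1.0, -0.5, 0.2], [0.3, 0.8, -0.1]],   # layer 1: 3 inputs -> 2 units
    [[0.6, 0.4], [-0.2, 0.9], [0.5, 0.5]],  # layer 2: 2 inputs -> 3 units
]


def relu(x):
    """Rectified linear unit: keep positive values, zero out negative ones."""
    return max(0.0, x)


def forward(features, layers=LAYERS):
    """Pass an input vector through each layer in turn (weighted sums, then ReLU)."""
    activations = features
    for weights in layers:
        activations = [relu(sum(w * a for w, a in zip(row, activations)))
                       for row in weights]
    return activations


def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def verify(face_a, face_b, threshold=0.9):
    """Judge whether two (hypothetical) face descriptors show the same person."""
    return cosine_similarity(forward(face_a), forward(face_b)) >= threshold
```

With these toy weights, `verify([1.0, 0.2, 0.5], [0.9, 0.25, 0.5])` returns `True`, while `verify([1.0, 0.2, 0.5], [0.0, 1.0, 0.0])` returns `False`. Adding more entries to `LAYERS` is the toy equivalent of making the network ‘deeper’.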