Deep Learning at Apple

At the WWDC 2016 keynote, Craig Federighi showed the new “Advanced Computer Vision” capabilities that will be integrated into the Photos app.  Facial recognition (previously available only in Photos for OS X, now macOS) is now supplemented with more general object and scene recognition.  The end result is not just the ability to present a list of photos based on metadata from the time of capture (e.g. date, location), but new ways to automatically organize and search your photos based on a better understanding of the subjects and context of each shot.

Apple claims that the computer vision and deep learning algorithms all run on the user’s device to protect user privacy. In contrast, most other solutions, such as Google Photos, run in the cloud.

How can these computationally complex algorithms run efficiently without draining the battery of mobile devices?

Enter the Basic Neural Network Subroutines (BNNS). As part of the Accelerate framework, Apple has added new APIs for efficiently running artificial neural networks! From the reference documentation:

A neural network is a sequence of layers, each layer performing a filter operation on its input and passing the result as input to the next layer. The output of the last layer is an inference drawn from the initial input: for example, the initial input might be an image and the inference might be that it’s an image of a dinosaur.
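
Concretely, a BNNS “filter” is one layer: you describe the layer’s input and output, hand over its weights and bias, and then apply it to data. Here is a minimal sketch of the C API using a single fully connected layer with made-up dimensions, weights, and bias; in a real app, those parameters would come from a model trained elsewhere.

    #include <Accelerate/Accelerate.h>
    #include <stdio.h>

    int main(void) {
        /* Hypothetical parameters for one fully connected layer
         * (4 inputs, 2 outputs). In practice these would come from
         * a model trained outside of BNNS. */
        const float weights[2 * 4] = { 0.1f, 0.2f, 0.3f, 0.4f,
                                       0.5f, 0.6f, 0.7f, 0.8f };
        const float bias[2] = { 0.01f, 0.02f };

        /* Describe the layer's input and output vectors. */
        BNNSVectorDescriptor in_desc  = { .size = 4, .data_type = BNNSDataTypeFloat32 };
        BNNSVectorDescriptor out_desc = { .size = 2, .data_type = BNNSDataTypeFloat32 };

        /* Describe the layer itself: dimensions, weights, bias, activation. */
        BNNSFullyConnectedLayerParameters params = {
            .in_size  = 4,
            .out_size = 2,
            .weights  = { .data = weights, .data_type = BNNSDataTypeFloat32 },
            .bias     = { .data = bias,    .data_type = BNNSDataTypeFloat32 },
            .activation = { .function = BNNSActivationFunctionRectifiedLinear },
        };

        /* Create the filter; NULL selects the default filter parameters. */
        BNNSFilter layer = BNNSFilterCreateFullyConnectedLayer(&in_desc, &out_desc,
                                                               &params, NULL);
        if (layer == NULL) {
            fprintf(stderr, "failed to create BNNS filter\n");
            return 1;
        }

        /* Run the filter operation: input in, inference out. */
        const float input[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
        float output[2] = { 0 };
        if (BNNSFilterApply(layer, input, output) == 0) {
            printf("output: %f %f\n", output[0], output[1]);
        }

        BNNSFilterDestroy(layer);
        return 0;
    }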

While the new APIs make it easy to run a neural network efficiently, the all-important trained parameters (the weights derived from training done elsewhere) must be provided by the user of the API.

BNNS supports implementation and operation of neural networks for inference, using input data previously derived from training. BNNS does not do training, however. Its purpose is to provide very high performance inference on already trained neural networks.
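
To make that division of labor concrete, here is a sketch of a forward pass through a tiny two-layer network. The weight and bias arrays are zero-filled placeholders standing in for parameters exported from a training run performed elsewhere, in whatever framework you like; BNNS only executes the inference, with each layer’s output fed to the next layer as input.

    #include <Accelerate/Accelerate.h>
    #include <stdio.h>

    /* Placeholder parameters standing in for weights and biases exported
     * from a network trained offline; BNNS never performs the training. */
    static const float layer1_weights[8 * 4] = { 0 };  /* 8 outputs x 4 inputs */
    static const float layer1_bias[8]        = { 0 };
    static const float layer2_weights[3 * 8] = { 0 };  /* 3 outputs x 8 inputs */
    static const float layer2_bias[3]        = { 0 };

    /* Build one fully connected BNNS filter from previously trained parameters. */
    static BNNSFilter make_fc_layer(size_t in_size, size_t out_size,
                                    const float *weights, const float *bias,
                                    BNNSActivationFunction activation) {
        BNNSVectorDescriptor in_desc  = { .size = in_size,  .data_type = BNNSDataTypeFloat32 };
        BNNSVectorDescriptor out_desc = { .size = out_size, .data_type = BNNSDataTypeFloat32 };
        BNNSFullyConnectedLayerParameters params = {
            .in_size  = in_size,
            .out_size = out_size,
            .weights  = { .data = weights, .data_type = BNNSDataTypeFloat32 },
            .bias     = { .data = bias,    .data_type = BNNSDataTypeFloat32 },
            .activation = { .function = activation },
        };
        return BNNSFilterCreateFullyConnectedLayer(&in_desc, &out_desc, &params, NULL);
    }

    int main(void) {
        BNNSFilter layer1 = make_fc_layer(4, 8, layer1_weights, layer1_bias,
                                          BNNSActivationFunctionRectifiedLinear);
        BNNSFilter layer2 = make_fc_layer(8, 3, layer2_weights, layer2_bias,
                                          BNNSActivationFunctionIdentity);
        if (layer1 == NULL || layer2 == NULL) {
            fprintf(stderr, "failed to create BNNS filters\n");
            return 1;
        }

        float input[4]  = { 1.0f, 2.0f, 3.0f, 4.0f };  /* initial input      */
        float hidden[8] = { 0 };                       /* layer 1 -> layer 2 */
        float output[3] = { 0 };                       /* final inference    */

        /* Each layer performs its filter operation and passes the result on. */
        if (BNNSFilterApply(layer1, input, hidden) == 0 &&
            BNNSFilterApply(layer2, hidden, output) == 0) {
            printf("scores: %f %f %f\n", output[0], output[1], output[2]);
        }

        BNNSFilterDestroy(layer1);
        BNNSFilterDestroy(layer2);
        return 0;
    }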

It looks like WWDC Session 715 will cover the new neural network acceleration functions.

As for the new face, object, and scene recognition in Photos, where is Apple getting its training data?