Continuing our series of deep learning updates, we pulled together some of the awesome resources that have emerged since our last post on September 20th. In case you missed it, here are our past updates: September part 1, August part 2, August part 1, July part 2, July part 1, June, and the original set of 20+ resources we outlined in April. As always, this list is not comprehensive, so let us know if there’s something we should add, or if you’re interested in discussing this area further.
Open Sourcing 223GB of Driving Data by Oliver Cameron of Udacity. 223GB of image frames and log data from 70 minutes of driving in Mountain View on two separate days. Log data includes latitude, longitude, gear, brake, throttle, steering angles and speed. GitHub repo here.
ImageNet 2016: Large Scale Visual Recognition Challenge results (ILSVRC 2016). A seminal yearly competition where teams compete to classify and detect objects and scenes in images. The teams operate at the bleeding edge of image recognition. Learn about the teams here.
Generating Faces with Deconvolution Networks by Michael Flynn. Neural networks that generate and interpolate between faces, based on a large publicly available dataset. Inspired by this paper on image generation. GitHub repo here.
YouTube-8M Dataset by Google. 8 million video IDs with associated labels drawn from over 4,800 visual entities (e.g. vehicle, concert, music video), making it possible to advance research and applications in video understanding. Blog post here.
Deep3D: Automatic 2D-to-3D Video Conversion with CNNs by Eric Junyuan Xie. 3D videos are typically produced in one of two ways: shooting with a special 3D camera, or shooting in 2D and manually converting to 3D. Both are hard. This project demonstrates automatic 2D-to-3D conversion, so you could potentially take a 3D selfie with an ordinary smartphone.
Open Sourcing a Deep Learning Solution for Detecting NSFW Images by Yahoo. An open-source classifier for identifying NSFW content, based on a CNN architecture and implemented with Caffe. GitHub repo here.
Anticipating Visual Representations from Unlabeled Video by MIT. Anticipating actions and objects via computer vision is hard (e.g. recognizing that someone gesturing forward is about to shake hands). Humans do this through extensive experiential knowledge and inference; it’s much harder for a machine. This implementation trains deep neural networks to predict the visual representation of images in the future. Forbes article here.
TensorFlow in a Nutshell by Camron Godbout. A three-part series that explains Google’s deep learning framework TensorFlow. The guides cover the basics, hybrid learning, and an overview of supported models. Part 1, part 2, and now, part 3.
The Neural Network Zoo by Fjodor Van Veen. A cheat sheet that covers many of the popular neural network architectures. A great way to keep track of the various architectures and their underlying structures and relations. The cheat sheet describes each architecture and links to its original academic paper.
Torch Video Tutorials by Alfredo Canziani. A video collection of intro tutorials on leveraging Torch, providing an overview of Lua, Torch, neural networks, CNNs, and relevant Torch packages. RNNs coming soon.
Show and Tell: image captioning open sourced in TensorFlow by Google Brain. Chris Shallue and his team have made their image captioning system available. It’s faster to train, more detailed, and more accurate than past iterations. GitHub repo here. Original paper here.
The Alexa Prize by Amazon. A new annual competition for university students to advance the field of conversational AI. Participants develop a bot that converses coherently with humans for 20 minutes. The application process closes October 28, 2016 — apply here.
Bay Area Deep Learning School. Held at Stanford in late September and organized by Pieter Abbeel, Samy Bengio, and Andrew Ng. Speakers included Yoshua Bengio, Hugo Larochelle, Russ Salakhutdinov, and many others. All slide decks are available here, along with live stream videos from day 1 and day 2.