Up to Speed on AI & Deep Learning: December Update

Sharing some of the latest research, announcements, and resources on deep learning.

By Isaac Madan (email)

Continuing our series of deep learning updates, we pulled together some of the awesome resources that have emerged since our last post. In case you missed it, you can find all past updates here. As always, this list is not comprehensive, so let us know if there’s something we should add, or if you’re interested in discussing this area further. If you’re a machine learning practitioner or student, join our Talent Network here to get exposed to awesome ML opportunities.

Novel Applications of AI

Detecting Pneumonia: CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning by Rajpurkar et al of Stanford ML Group. We develop an algorithm that can detect pneumonia from chest X-rays at a level exceeding practicing radiologists. Our model, CheXNet, is a 121-layer convolutional neural network that inputs a chest X-ray image and outputs the probability of pneumonia along with a heatmap localizing the areas of the image most indicative of pneumonia. Original paper here.
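The model's two outputs are easy to picture: a global-average-pooled feature vector feeds a sigmoid for the pneumonia probability, and the same classifier weights applied per spatial location yield a class-activation-style heatmap. A NumPy sketch of that idea with made-up shapes and random values (illustrative, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical final conv feature maps: 7x7 spatial grid, 1024 channels
# (DenseNet-121's last block produces 1024 channels).
features = rng.random((7, 7, 1024))

# Hypothetical weights of the final dense layer for the pneumonia logit.
w = rng.normal(size=1024)
b = 0.0

# Global average pooling followed by a sigmoid gives the probability.
pooled = features.mean(axis=(0, 1))                    # shape (1024,)
prob = 1.0 / (1.0 + np.exp(-(pooled @ w + b)))

# Class-activation-style heatmap: the same weights applied per spatial
# location highlight the regions most indicative of the predicted class.
cam = np.tensordot(features, w, axes=([2], [0]))       # shape (7, 7)
cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
```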

Detecting cracks in nuclear reactors: NB-CNN: Deep Learning-based Crack Detection Using Convolutional Neural Network and Naïve Bayes Data Fusion by Chen et al of Purdue. A system under development at Purdue University uses artificial intelligence to detect cracks captured in videos of nuclear reactors and represents a future inspection technology to help reduce accidents and maintenance costs.

Self-learning robots: A.I. Researchers Leave Elon Musk Lab to Begin Robotics Start-Up. Embodied Intelligence will specialize in complex algorithms that allow machines to learn tasks on their own. Using these methods, existing robots could learn to, for example, install car parts that aren’t quite like the parts they have installed in the past, sort through a bucket of random holiday gifts as they arrive at a warehouse, or perform other tasks that machines traditionally could not. Founded by UC Berkeley professor Pieter Abbeel, former OpenAI researchers Peter Chen and Rocky Duan, and former Microsoft researcher Tianhao Zhang.

Face detection: An On-device Deep Neural Network for Face Detection by Computer Vision Machine Learning Team at Apple. Apple started using deep learning for face detection in iOS 10. With the release of the Vision framework, developers can now use this technology and many other computer vision algorithms in their apps. We faced significant challenges in developing the framework so that we could preserve user privacy and run efficiently on-device. This article discusses these challenges and describes the face detection algorithm.

Palliative care: Improving Palliative Care with Deep Learning by Avati et al of Stanford ML / BMI. Using a deep neural network to identify patients who are likely to benefit from palliative care services and bring them to the attention of palliative care professionals at a hospital for better outreach.

  • Improving the quality of end-of-life care for hospitalized patients is a priority for healthcare organizations. The algorithm is a deep neural network trained on EHR data from previous years to predict all-cause 3–12 month mortality of patients, as a proxy for patients who could benefit from palliative care. Our predictions enable the palliative care team to take a proactive approach in reaching out to such patients, rather than relying on referrals from treating physicians or conducting time-consuming chart reviews of all patients.

Coding / algorithm design: DLPaper2Code: Auto-generation of Code from Deep Learning Research Papers by Sethi et al. With an abundance of research papers in deep learning, reproducibility or adoption of the existing works becomes a challenge. We propose a novel extensible approach, DLPaper2Code, to extract and understand deep learning design flow diagrams and tables available in a research paper and convert them to an abstract computational graph. The extracted computational graph is then converted into execution ready source code in both Keras and Caffe, in real-time.


Announcements & Research

Call for an International Ban on the Weaponization of Artificial Intelligence by Ian Kerr, Geoff Hinton, Richard S. Sutton, Doina Precup and Yoshua Bengio. Open letter by leading AI researchers asking the Canadian government to urgently address the challenge of lethal autonomous weapons (often called “killer robots”) and to take a leading position against Autonomous Weapon Systems on the international stage at the upcoming UN meetings in Geneva.

Announcing TensorFlow Lite by Google TensorFlow team. TensorFlow’s lightweight solution for mobile and embedded devices. Enables low-latency inference of on-device machine learning models. Lightweight, cross-platform, and fast.

SLING: A Natural Language Frame Semantic Parser by Michael Ringgaard and Rahul Gupta of Google.

  • Until recently, most practical natural language understanding (NLU) systems used a pipeline of analysis stages, from part-of-speech tagging and dependency parsing to steps that computed a semantic representation of the input text. 
  • Today we are announcing SLING, an experimental system for parsing natural language text directly into a representation of its meaning as a semantic frame graph. 
  • SLING uses a special-purpose recurrent neural network model to compute the output representation of input text through incremental editing operations on the frame graph.

Introducing TensorFlow Feature Columns by Google TensorFlow team. We’re devoting this article to feature columns — a data structure describing the features that an Estimator requires for training and inference. As you’ll see, feature columns are very rich, enabling you to represent a diverse range of data.
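As a rough intuition for what one kind of feature column (a bucketized column) does, here is a hypothetical NumPy sketch that maps continuous values into one-hot buckets. The function name and boundaries are made up; this is the idea, not the TensorFlow API itself:

```python
import numpy as np

def bucketize_one_hot(values, boundaries):
    """Map continuous values to one-hot bucket indicators, mimicking
    what a bucketized feature column does with the given boundaries."""
    values = np.asarray(values, dtype=float)
    # np.digitize returns the bucket index for each value.
    idx = np.digitize(values, boundaries)
    one_hot = np.zeros((len(values), len(boundaries) + 1))
    one_hot[np.arange(len(values)), idx] = 1.0
    return one_hot

# Example: bucketize ages at boundaries 18, 35, 65.
ages = [12, 25, 40, 70]
encoded = bucketize_one_hot(ages, [18, 35, 65])
```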

One Network to Solve Them All — Solving Linear Inverse Problems using Deep Projection Models by Chang et al of CMU. We propose a general framework to train a single deep neural network that solves arbitrary linear inverse problems.


On AI & Enterprise Software

Software 2.0 by Andrej Karpathy of Tesla. The “classical stack” of Software 1.0 is what we’re all familiar with — it is written in languages such as Python, C++, etc. In contrast, Software 2.0 is written in neural network weights.

  • It turns out that a large portion of real-world problems have the property that it is significantly easier to collect the data than to explicitly write the program. A large portion of programmers of tomorrow do not maintain complex software repositories, write intricate programs, or analyze their running times. They collect, clean, manipulate, label, analyze and visualize data that feeds neural networks.

Considering TensorFlow for the Enterprise by Sean Murphy and Allen Leis of O’Reilly. Introduces deep learning from an enterprise perspective and offers an overview of the TensorFlow library and ecosystem. If your company is adopting deep learning, this report will help you navigate the initial decisions you must make — from choosing a deep learning framework to integrating deep learning with the other data analysis systems already in place — to ensure you’re building a system capable of handling your specific business needs.


Resources / Tutorials

Understanding Hinton’s Capsule Networks. Part I: Intuition and Part 2: How Capsules Work by Max Pechyonkin. A few weeks ago, Geoffrey Hinton and his team published two papers that introduced a completely new type of neural network based on so-called capsules. In addition, the team published an algorithm, called dynamic routing between capsules, that allows such a network to be trained. In this post, I will explain why this new architecture is so important, as well as the intuition behind it. In the following posts I will dive into technical details.

Capsule Networks (CapsNets) — Tutorial by Aurélien Géron. Explanation of CapsNets, a hot new architecture for neural networks, invented by Geoffrey Hinton, one of the godfathers of deep learning.

Feature Visualization by Google researchers. Explanation of feature visualization — how neural networks build up their understanding of images. There is a growing sense that neural networks need to be interpretable to humans. While feature visualization is a powerful tool, actually getting it to work involves a number of details. In this article, we examine the major issues and explore common approaches to solving them.

ICML 2017: A Review of Deep Learning Papers, Talks, and Tutorials by Satrajit Chatterjee of Two Sigma. A senior Two Sigma researcher provides an overview of some of the most interesting Deep Learning research from ICML 2017.

Reinforcement Learning: An Introduction by Richard S. Sutton and Andrew G. Barto. Complete draft of 2nd edition. A deep, technical and extensive dive into reinforcement learning.

Power & Limits of Deep Learning by Yann LeCun, Director of AI Research at Facebook. LeCun’s talk at AI & the Future of Work conference. 36 min video.

Understanding LSTM and its diagrams by Shi Yan. Great diagrams explaining how an LSTM works.
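The diagrams ultimately encode a handful of equations; a single LSTM time step can be written out directly. A minimal NumPy sketch with randomly initialized weights (shapes and names are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the stacked parameters
    for the input, forget, output, and candidate gates."""
    z = W @ x + U @ h_prev + b
    n = h_prev.size
    i = sigmoid(z[0*n:1*n])        # input gate
    f = sigmoid(z[1*n:2*n])        # forget gate
    o = sigmoid(z[2*n:3*n])        # output gate
    g = np.tanh(z[3*n:4*n])        # candidate cell state
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # new hidden state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):   # run over a sequence of 5 inputs
    h, c = lstm_step(x, h, c, W, U, b)
```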


By Isaac Madan. Isaac is an investor at Venrock (email). If you’re interested in deep learning or there are resources I should share in a future newsletter, I’d love to hear from you. If you’re a machine learning practitioner or student, join our Talent Network here to get exposed to awesome ML opportunities.

Up to Speed on AI & Deep Learning: November Update


Sharing some of the latest research, announcements, and resources on deep learning.

By Isaac Madan (email)

Continuing our series of deep learning updates, we pulled together some of the awesome resources that have emerged since our last post. In case you missed it, you can find all past updates here. As always, this list is not comprehensive, so let us know if there’s something we should add, or if you’re interested in discussing this area further. If you’re a machine learning practitioner or student, join our Talent Network here to get exposed to awesome ML opportunities.

Research & Announcements

OpenPose: Real-time multi-person keypoint detection library for body, face, and hands by CMU. OpenPose is a library for real-time multi-person keypoint detection and multi-threading written in C++ using OpenCV and Caffe.

‘Mind-reading’ AI: Neural Encoding and Decoding with Deep Learning for Dynamic Natural Vision by Wen et al of Purdue Engineering. Researchers have demonstrated how to decode what the human brain is seeing by using artificial intelligence to interpret fMRI scans from people watching videos, representing a sort of mind-reading technology. Youtube video here. Original paper here.

Dynamic Routing Between Capsules by Hinton et al of Google Brain. New paper by Geoffrey Hinton. A capsule is a group of neurons whose activity vector represents the instantiation parameters of a specific type of entity such as an object or object part. We show that a discriminatively trained, multi-layer capsule system achieves state-of-the-art performance on MNIST and is considerably better than a convolutional net at recognizing highly overlapping digits. PyTorch implementation of the paper here.
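The paper keeps each capsule's activity vector at a length between 0 and 1 with a "squash" nonlinearity that shrinks short vectors toward zero and long ones toward unit length. In NumPy, that function is a few lines:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Squash nonlinearity from Sabour, Frosst & Hinton (2017):
    v = (|s|^2 / (1 + |s|^2)) * (s / |s|)."""
    sq_norm = np.sum(s * s, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)

short = squash(np.array([0.1, 0.0]))   # maps near zero length
long = squash(np.array([100.0, 0.0]))  # approaches unit length
```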

DeepXplore: Automated Whitebox Testing of Deep Learning Systems by Pei et al of Columbia Engineering.

  • Researchers at Columbia and Lehigh universities have come up with a way to automatically error-check the thousands to millions of neurons in a deep learning neural network. Their tool feeds confusing, real-world inputs into the network to expose rare instances of flawed reasoning by clusters of neurons.
  • DeepXplore efficiently finds thousands of incorrect corner case behaviors (e.g., self-driving cars crashing into guard rails and malware masquerading as benign software) in state-of-the-art DL models with thousands of neurons trained on five popular datasets including ImageNet and Udacity self-driving challenge data.
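The tool's guiding metric is neuron coverage: the fraction of neurons activated above a threshold by at least one test input. A simplified NumPy sketch of the metric over a hypothetical two-layer ReLU network (random weights, purely illustrative):

```python
import numpy as np

def neuron_coverage(activations, threshold=0.0):
    """Fraction of neurons activated above `threshold` on at least
    one test input. `activations` is a list of (num_inputs,
    num_neurons) arrays, one per layer."""
    covered = total = 0
    for layer in activations:
        fired = (layer > threshold).any(axis=0)  # per neuron, any input
        covered += fired.sum()
        total += fired.size
    return covered / total

rng = np.random.default_rng(1)
# Hypothetical ReLU activations for 10 test inputs on two layers.
x = rng.normal(size=(10, 8))
layer1 = np.maximum(x @ rng.normal(size=(8, 16)), 0)
layer2 = np.maximum(layer1 @ rng.normal(size=(16, 4)), 0)
coverage = neuron_coverage([layer1, layer2])
```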

Voice Conversion with Non-Parallel Data (i.e. Speaking like Kate Winslet) by Ahn and Park. Deep neural networks for voice conversion (voice style transfer) in Tensorflow. GitHub repo.

Uber AI Labs Open Sources Pyro, a Deep Probabilistic Programming Language. Website and docs here.

  • Why probabilistic modeling? To correctly capture uncertainty in models and predictions for unsupervised and semi-supervised learning, and to provide AI systems with declarative prior knowledge.
  • Why (universal) probabilistic programs? To provide a clear and high-level, but complete, language for specifying complex models.
  • Why deep probabilistic models? To learn generative knowledge from data and reify knowledge of how to do inference.
  • Why inference by optimization? To enable scaling to large data and leverage advances in modern optimization and variational inference.

Resources, Tutorials & Data

Deep RL Bootcamp by UC Berkeley. This two-day bootcamp will teach you the foundations of Deep RL through a mixture of lectures and hands-on lab sessions, so you can go on and build new fascinating applications using these techniques and maybe even push the algorithmic frontier. Great topics, content, and speakers — the lectures and labs are available online.

Latest Deep Learning OCR with Keras and Supervisely in 15 minutes by Deep Systems. This tutorial is a gentle introduction to building a modern text recognition system using deep learning.

AMA with two of the team members at DeepMind who developed AlphaGo. AlphaGo Zero uses a quite different approach to deep RL than typical (model-free) algorithms such as policy gradient or Q-learning. By using AlphaGo search we massively improve the policy and self-play outcomes — and then we apply simple, gradient-based updates to train the next policy + value network. This appears to be much more stable than incremental, gradient-based policy improvements that can potentially forget previous improvements.

BTW: How to Read a Paper by S. Keshav of University of Waterloo. Not newly published, but very useful as the number of new ML papers continues to grow rapidly.



Up to Speed on AI & Deep Learning: Colaboratory, OpenFermion, Micromouse (October Update, Part 3)

Sharing some of the latest research, announcements, and resources on deep learning.

By Isaac Madan (email)

Continuing our series of deep learning updates, we pulled together some of the awesome resources that have emerged since our last post. In case you missed it, you can find all past updates here. As always, this list is not comprehensive, so let us know if there’s something we should add, or if you’re interested in discussing this area further. If you’re a machine learning practitioner or student, join our Talent Network here to get exposed to awesome ML opportunities.

Research & Announcements

Colaboratory by Google Research. Colaboratory is a data analysis tool that combines text, code, and code outputs into a single collaborative document. Google releases its own cloud notebook platform — try it.

OpenFermion: The Electronic Structure Package For Quantum Computers by Google. OpenFermion is an open source effort for compiling and analyzing quantum algorithms to simulate fermionic systems, including quantum chemistry. Among other functionalities, the current version features data structures and tools for obtaining and manipulating representations of fermionic and qubit Hamiltonians. Original paper here.

Deep learning and the Schrödinger equation by Mills et al of the University of Ontario Institute of Technology. We have trained a deep (convolutional) neural network to predict the ground-state energy of an electron in four classes of confining two-dimensional electrostatic potentials.

NVIDIA Deep Learning Accelerator. NVIDIA open sources its deep learning chip architecture to broaden its adoption as an IoT standard. The NVIDIA Deep Learning Accelerator (NVDLA) is a free and open architecture that promotes a standard way to design deep learning inference accelerators.

Micromouse contest first place video. Micromouse is an event where small robot mice solve a 16×16 maze (Wikipedia). Watch first-place winner Ning6A1 by BengKiat Ng solve the maze (Youtube video). Read more about the contest in this blog post.

Neural Networks API by Google. Google announces Neural Networks API for Android which executes against machine learning models on the device, bringing more AI to the edge. The Android Neural Networks API (NNAPI) is an Android C API designed for running computationally intensive operations for machine learning on mobile devices.

Generative Adversarial Networks: An Overview by Creswell et al of Imperial College London. The aim of this review paper is to provide an overview of GANs for the signal processing community, drawing on familiar analogies and concepts where possible. In addition to identifying different methods for training and constructing GANs, we also point to remaining challenges in their theory and application.
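At the heart of the formulation is the two-player objective: the discriminator maximizes log D(x) + log(1 - D(G(z))), while the generator, in the widely used non-saturating variant, maximizes log D(G(z)). A small sketch of the loss computation given discriminator outputs (the numbers are made up):

```python
import numpy as np

def gan_losses(d_real, d_fake, eps=1e-8):
    """d_real: D's probabilities on real samples;
    d_fake: D's probabilities on generated samples."""
    d_loss = -np.mean(np.log(d_real + eps) + np.log(1 - d_fake + eps))
    g_loss = -np.mean(np.log(d_fake + eps))  # non-saturating generator loss
    return d_loss, g_loss

# A confident discriminator: high probability on real, low on fake.
d_loss, g_loss = gan_losses(np.array([0.9, 0.8]), np.array([0.1, 0.2]))
# The generator's loss is high exactly when D rejects its samples.
```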

Announcing PlaidML: Open Source Deep Learning for Every Platform by Vertex AI. Open source portable deep learning engine. Our mission is to make deep learning accessible to every person on every device.

Resources, Tutorials & Data

Arxiv Vanity by Andreas Jansson and Ben Firshman. A handy tool that renders Arxiv academic papers as easy-to-read web pages, so you don’t have to read the PDF versions typical of most ML papers.

Video lectures accompanying Deep Learning book by Alena Kruchkova. Great series of lecture videos that follow the Deep Learning book by Goodfellow et al. Original book here.

Raspberry Pi: Deep learning object detection with OpenCV by Adrian Rosebrock. Tutorial demonstrating near real-time object detection via a Raspberry Pi.

Dimensionality Reduction: Principal Components Analysis, Part 1 by Data4Bio. Thorough and understandable explanation of Principal Component Analysis (Youtube video).
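The technique itself fits in a dozen lines: center the data, eigendecompose the covariance matrix, and project onto the top eigenvectors. A NumPy sketch on synthetic correlated data:

```python
import numpy as np

def pca(X, n_components):
    """Project X (n_samples, n_features) onto its top principal components."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # sort descending by variance
    components = eigvecs[:, order[:n_components]]
    return X_centered @ components, eigvals[order]

rng = np.random.default_rng(0)
# Correlated 2-D data: almost all variance lies along one direction.
t = rng.normal(size=200)
X = np.column_stack([t, 2 * t + 0.1 * rng.normal(size=200)])
projected, variances = pca(X, 1)
```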

Explaining Your Machine Learning Model (or 5 Ways to Assess Feature Importance) by ClearBrain. Knowing which features, inputs, or variables in a model are influencing its effectiveness is valuable to improving its actionability. Assessing feature importance though is not straightforward. Below we outline five ways of addressing feature importance, with a focus on logistic regression models for simplicity.
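The simplest of the approaches they describe, standardized logistic regression coefficients, can be sketched end to end: standardize the features, fit by plain gradient descent, and rank features by coefficient magnitude. Synthetic data, illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: feature 0 drives the label, feature 1 is pure noise.
X = rng.normal(size=(500, 2))
y = (X[:, 0] + 0.1 * rng.normal(size=500) > 0).astype(float)

# Standardize so coefficients are comparable across features.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

# Fit logistic regression by plain gradient descent.
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = 1 / (1 + np.exp(-(Xs @ w + b)))
    w -= 0.5 * (Xs.T @ (p - y) / len(y))
    b -= 0.5 * np.mean(p - y)

# Standardized-coefficient importance: larger magnitude = more influence.
importance = np.abs(w)
```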

Word embeddings in 2017: Trends and future directions by Sebastian Ruder. Word embedding is the collective name for a set of language modeling and feature learning techniques in natural language processing (NLP) where words or phrases from the vocabulary are mapped to vectors of real numbers (Wikipedia). This post will focus on the deficiencies of word embeddings and how recent approaches have tried to resolve them.
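The Wikipedia definition becomes concrete once you compare vectors: word similarity is just cosine similarity between embeddings. A toy sketch with hand-made 4-dimensional vectors (real embeddings have hundreds of dimensions learned from corpora):

```python
import numpy as np

# Hypothetical hand-made embeddings, purely for illustration.
emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.1, 0.8, 0.0]),
    "apple": np.array([0.0, 0.1, 0.0, 0.9]),
}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

sim_royal = cosine(emb["king"], emb["queen"])
sim_fruit = cosine(emb["king"], emb["apple"])
# Related words end up with higher cosine similarity.
```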

How to Win a Data Science Competition: Learn from Top Kagglers by Coursera. In this course, you will learn to competitively analyse and solve such predictive modelling tasks. Course just started October 23.



Up to Speed on AI & Deep Learning: October Update, Part 2

Sharing some of the latest research, announcements, and resources on deep learning.

By Isaac Madan (email)

Continuing our series of deep learning updates, we pulled together some of the awesome resources that have emerged since our last post. In case you missed it, you can find all past updates here. As always, this list is not comprehensive, so let us know if there’s something we should add, or if you’re interested in discussing this area further.

Research & Announcements

Mastering the game of Go without human knowledge by Silver et al of DeepMind. AlphaGo implementation that requires no prior knowledge — the system teaches itself. Here we introduce an algorithm based solely on reinforcement learning, without human data, guidance or domain knowledge beyond game rules. AlphaGo becomes its own teacher: a neural network is trained to predict AlphaGo’s own move selections and also the winner of AlphaGo’s games. Original paper here.

Introducing Gluon: a new library for machine learning from AWS and Microsoft. Microsoft and Amazon take on Google’s TensorFlow and Facebook’s PyTorch with their own new open source deep learning library. Gluon provides a clear, concise API for defining machine learning models using a collection of pre-built, optimized neural network components.

Artificial intelligence can say yes to the dress. GANs to generate garment photos for e-commerce. Online fashion tech startup Vue.ai is selling technology that analyzes pieces of clothing and automatically generates an image of the garment on a person of any size or shape, wearing any kind of shoes (company website here).

Introducing NNVM Compiler: A New Open End-to-End Compiler for AI Frameworks by AWS. We introduce the NNVM compiler, which compiles a high-level computation graph into optimized machine codes. This addresses some of the problems that arise when working across a number of AI frameworks and underlying hardware architectures.

Intel® Nervana™ Neural Network Processors (NNP) Redefine AI Silicon by Naveen Rao. Intel announces a new family of processors designed specifically for artificial intelligence.

Resources, Tutorials & Data

ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases by Wang et al of NIH. Researchers release medical data set for machine learning. The NIH Clinical Center recently released over 100,000 anonymized chest x-ray images and their corresponding data to the scientific community. NIH compiled the dataset of scans from more than 30,000 patients, including many with advanced lung disease. Original paper here. Data set available on Box here.

Spotify’s Discover Weekly: How machine learning finds your new music by Sophia Ciocca. Explanation of the machine learning approach to personalized music recommendations.

Visualizing convolutional neural networks by Justin Francis of University of Alberta. Building convnets from scratch with TensorFlow and TensorBoard.

Gradient descent, how neural networks learn by 3Blue1Brown. Digestible explanation of how gradient descent works (Youtube video).
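The idea in the video reduces to a few lines of code: repeatedly step opposite the gradient of the loss. Minimizing a simple one-dimensional quadratic, for illustration:

```python
def loss(w):
    return (w - 3.0) ** 2          # minimized at w = 3

def grad(w):
    return 2.0 * (w - 3.0)         # derivative of the loss

w = 0.0                            # arbitrary starting point
lr = 0.1                           # learning rate (step size)
history = [loss(w)]
for _ in range(100):
    w -= lr * grad(w)              # step downhill along the gradient
    history.append(loss(w))
```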

Protecting Against AI’s Existential Threat by Ilya Sutskever and Dario Amodei of OpenAI. Discussing how to perform safe AI research. How do you create AI that doesn’t pose a threat to humanity? By teaching it to work with humans.

Andrew Ng Has a Chatbot That Can Help with Depression. New chatbot backed by Andrew Ng focused on interactive behavioral therapy to improve mental health.



Up to Speed on Deep Learning: October Update

Sharing some of the latest research, announcements, and resources on deep learning.

By Isaac Madan (email)

Continuing our series of deep learning updates, we pulled together some of the awesome resources that have emerged since our last post. In case you missed it, here are our past updates: August, July, June (part 1, part 2, part 3, part 4), May, April (part 1, part 2), March part 1, February, November, September part 2 & October part 1, September part 1, August (part 1, part 2), July (part 1, part 2), June, and the original set of 20+ resources we outlined in April 2016. As always, this list is not comprehensive, so let us know if there’s something we should add, or if you’re interested in discussing this area further.


Up to Speed on Deep Learning: August Update

Sharing some of the latest research, announcements, and resources on deep learning.

By Isaac Madan (email)

Continuing our series of deep learning updates, we pulled together some of the awesome resources that have emerged since our last post. In case you missed it, here are our past updates: July, June (part 1, part 2, part 3, part 4), May, April (part 1, part 2), March part 1, February, November, September part 2 & October part 1, September part 1, August (part 1, part 2), July (part 1, part 2), June, and the original set of 20+ resources we outlined in April 2016. As always, this list is not comprehensive, so let us know if there’s something we should add, or if you’re interested in discussing this area further.


Research & Announcements

Cardiologist-Level Arrhythmia Detection With Convolutional Neural Networks by Rajpurkar et al of Stanford ML. We develop a model which can diagnose irregular heart rhythms, also known as arrhythmias, from single-lead ECG signals better than a cardiologist. Key to exceeding expert performance is a deep convolutional network which can map a sequence of ECG samples to a sequence of arrhythmia annotations along with a novel dataset two orders of magnitude larger than previous datasets of its kind. Original paper here.

Using Deep Learning to Create Professional-Level Photographs by Hui Wang of Google Research. Whether a photograph is beautiful or not is measured by its aesthetic value, which is a highly subjective concept. To explore how ML can learn subjective concepts, we introduce an experimental deep-learning system for artistic content creation. It mimics the workflow of a professional photographer, roaming landscape panoramas from Google Street View and searching for the best composition, then carrying out various postprocessing operations to create an aesthetically pleasing image. Original paper here.

How to turn audio clips into realistic lip-synced video by Suwajanakorn et al of University of Washington. Given audio of President Barack Obama, we synthesize a high quality video of him speaking with accurate lip sync, composited into a target video clip. Original paper here.


Up to Speed on Deep Learning: July Update

Sharing some of the latest research, announcements, and resources on deep learning.

By Isaac Madan (email)

Continuing our series of deep learning updates, we pulled together some of the awesome resources that have emerged since our last post. In case you missed it, here are our past updates: June (part 1, part 2, part 3, part 4), May, April (part 1, part 2), March part 1, February, November, September part 2 & October part 1, September part 1, August (part 1, part 2), July (part 1, part 2), June, and the original set of 20+ resources we outlined in April 2016. As always, this list is not comprehensive, so let us know if there’s something we should add, or if you’re interested in discussing this area further.

Research & Announcements

Apollo by Baidu. Newly launched source platform for building autonomous vehicles.

Neural Network Libraries by Sony. Sony demonstrates its interest in deep learning by releasing their own open source deep learning framework.

CAN (Creative Adversarial Network) — Explained by Harshvardhan Gupta. Facebook researchers propose a new system for generating art. The system generates art by looking at art and learning about style; and becomes creative by increasing the arousal potential of the generated art by deviating from the learned styles. This post walks through the paper and explains it. Original paper here.

‘Explainable Artificial Intelligence’: Cracking open the black box of AI by George Nott. One current downside, and ongoing research area, of deep neural networks is that they are black boxes, meaning their decision making and outcomes can’t be easily justified or explained. The article discusses various attempts and ongoing work in this area, including work by UC Berkeley and the Max Planck Institute described in the original paper here.

Interpreting Deep Neural Networks using Cognitive Psychology by DeepMind. In a similar vein to the article above, DeepMind researchers propose a new approach to interpreting/explaining deep neural network models by leveraging methods from cognitive psychology. For example, when children guess the meaning of a word from a single example (one-shot word learning), they employ a variety of inductive biases, such as shape bias. DeepMind assesses this bias in their models to improve their interpretation of what’s happening under the hood. Original paper here.

Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes by Wu et al. Digs into the question, why do deep neural networks generalize well?

Resources, Tutorials & Data

Under the Hood of a Self-Driving Taxi by Oliver Cameron of Voyage. A helpful overview of the tech stack powering a self-driving car, digging into Voyage’s compute, power, and drive-by-wire systems.

How HBO’s Silicon Valley built “Not Hotdog” with mobile TensorFlow, Keras & React Native by Tim Anglade. A walk-thru of how the Silicon Valley TV show built their app that famously identifies hotdogs and not hotdogs.

Machine UI, a new IDE purpose-built for machine learning with visual model representation. Video.

A 2017 Guide to Semantic Segmentation with Deep Learning by Qure.ai. Overview of the state of the art in semantic segmentation. As context, semantic segmentation is understanding an image at the pixel level, i.e., we want to assign each pixel in the image an object class.
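Concretely, a segmentation network outputs a score per class per pixel, and the predicted label map is a per-pixel argmax over those scores. A NumPy sketch with a made-up 4x4 output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical network output: class scores for a 4x4 image, 3 classes.
scores = rng.normal(size=(4, 4, 3))

# Semantic segmentation prediction: the best-scoring class per pixel.
label_map = scores.argmax(axis=-1)

# Per-pixel class probabilities via a softmax over the class axis.
exp = np.exp(scores - scores.max(axis=-1, keepdims=True))
probs = exp / exp.sum(axis=-1, keepdims=True)
```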



Up to Speed on Deep Learning: June Update, Part 4

Sharing some of the latest research, announcements, and resources on deep learning.

By Isaac Madan (email)

Continuing our series of deep learning updates, we pulled together some of the awesome resources that have emerged since our last post. In case you missed it, here are our past updates: June (part 1, part 2, part 3), May, April (part 1, part 2), March part 1, February, November, September part 2 & October part 1, September part 1, August (part 1, part 2), July (part 1, part 2), June, and the original set of 20+ resources we outlined in April 2016. As always, this list is not comprehensive, so let us know if there’s something we should add, or if you’re interested in discussing this area further.

Research & Announcements

Grounded Language Learning in a Simulated 3D World by Google DeepMind. Teaching an AI agent to learn & apply language. Here we present an agent that learns to interpret language in a simulated 3D environment where it is rewarded for the successful execution of written instructions. The agent’s comprehension of language extends beyond its prior experience, enabling it to apply familiar language to unfamiliar situations and to interpret entirely novel instructions.

One Model To Learn Them All by Google. Getting a deep learning model to work well for a specific task like speech recognition, translation, etc. can take lots of time researching architecture & tuning. A generalizable model that works well across various tasks would thus be quite useful — Google presents one such model. In particular, this single model is trained concurrently on ImageNet, multiple translation tasks, image captioning (COCO dataset), a speech recognition corpus, and an English parsing task.

Tensor2Tensor by Google Brain. An open-source system for training deep learning models in TensorFlow. T2T facilitates the creation of state-of-the art models for a wide variety of ML applications, such as translation, parsing, image captioning and more, enabling the exploration of various ideas much faster than previously possible. This release also includes a library of datasets and models, including the best models from a few recent papers. GitHub repo here.

TensorFlow Object Detection API by Google. Last year Google demonstrated state-of-the-art results in object detection, won the COCO detection challenge, and featured this work in products like Nest Cam. They’re now open sourcing this work: a framework built on top of TensorFlow that makes it easy to construct, train, and deploy object detection models. Their goal in designing this system was to support state-of-the-art models while allowing for rapid exploration and research.
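
To give a feel for what such detection pipelines do after the network runs, here is a minimal sketch (not the API itself, just standard post-processing logic) of intersection-over-union and non-max suppression, which collapse overlapping candidate boxes into a single detection:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box, drop heavily overlapping ones, repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2]: the two overlapping boxes collapse to one
```

The thresholds and box format here are illustrative; the real API wraps this logic (and much more) behind its model configuration.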

MobileNets: Open-Source Models for Efficient On-Device Vision by Google. It’s hard to run visual recognition models accurately on mobile devices given their limited computational power and storage. Hence MobileNets, a family of mobile-first computer vision models for TensorFlow, designed to maximize accuracy while being mindful of the restricted resources of an on-device or embedded application.
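
The efficiency gain comes largely from replacing standard convolutions with depthwise separable ones. The arithmetic is easy to sketch (the layer sizes below are illustrative, not MobileNet's actual configuration):

```python
def conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution."""
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    """A k x k depthwise filter per input channel, then a 1x1 pointwise conv."""
    return k * k * c_in + c_in * c_out

std = conv_params(3, 128, 256)                  # 294,912 weights
sep = depthwise_separable_params(3, 128, 256)   # 33,920 weights
print(std, sep, round(std / sep, 1))            # roughly an 8.7x reduction
```

A similar factor applies to multiply-adds, which is what makes these models practical on phones.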

deeplearning.ai by Andrew Ng. A new project by Andrew Ng coming up in August 2017. No details provided on the site yet.


Resources, Tutorials & Data

Building a Real-Time Object Recognition App with Tensorflow and OpenCV by Dat Tran. In this article, I will walk through how you can build your own real-time object recognition application with Tensorflow’s (TF) new Object Detection API and OpenCV in Python 3 (specifically 3.5). The focus will be on the challenges that I faced when building it. GitHub repo here.

What Can’t Deep Learning Do? by Bharath Ramsundar. A tweetstorm listing some of the known failure modes of deep learning methods. Helpful in understanding where future research may be directed.

Generative Adversarial Networks for Beginners by O’Reilly. Build a neural network that learns to generate handwritten digits. GANs are neural networks that learn to create synthetic data similar to some known input data. GitHub repo here.
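
Before diving into the tutorial, it can help to see the two objectives a GAN trains against. Below is a minimal numpy sketch (my own illustration, not code from the O’Reilly post) of the discriminator’s binary cross-entropy loss and the commonly used non-saturating generator loss:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def discriminator_loss(logits_real, logits_fake):
    # D is a binary classifier: push real logits toward 1, fake toward 0.
    return (-np.mean(np.log(sigmoid(logits_real)))
            - np.mean(np.log(1.0 - sigmoid(logits_fake))))

def generator_loss(logits_fake):
    # Non-saturating form: G wants D to score its fakes as real.
    return -np.mean(np.log(sigmoid(logits_fake)))

# A confidently correct discriminator has low loss...
print(discriminator_loss(np.array([4.0]), np.array([-4.0])))  # ~0.036
# ...and the generator's loss drops as its fakes start fooling D.
print(generator_loss(np.array([-4.0])), generator_loss(np.array([2.0])))
```

In the tutorial these losses are minimized alternately with gradient descent on the two networks.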

Measuring the Progress of AI Research by Electronic Frontier Foundation. Tracking what’s state-of-the-art in ML/AI and understanding how a specific subfield is progressing can get complicated. This pilot project collects problems and metrics/datasets from the AI research literature, and tracks progress on them.


By Isaac Madan. Isaac is an investor at Venrock (email). If you’re interested in deep learning or there are resources I should share in a future newsletter, I’d love to hear from you.

Up to Speed on Deep Learning: June 11–18 Update

Continuing our series of deep learning updates, we pulled together some of the awesome resources that have emerged since our last post. In case you missed it, here are our past updates: June (part 1, part 2), May, April (part 1, part 2), March part 1, February, November, September part 2 & October part 1, September part 1, August (part 1, part 2), July (part 1, part 2), June, and the original set of 20+ resources we outlined in April 2016. As always, this list is not comprehensive, so let us know if there’s something we should add, or if you’re interested in discussing this area further.

Research & Announcements

Learning to Speak via Interaction by Baidu Research. Teaching an AI agent to speak by interacting with a virtual agent. This represents an advancement in more closely replicating how humans learn, as well as advancing our goal to demonstrate general artificial intelligence. Our AI agent learns to speak in an interactive way similar to a baby. In contrast, the conventional approach relies on supervised training over a large, static, pre-collected corpus, which makes it hard to capture the interactive nature of language learning. Original paper here.

Deep Shimon: Robot that composes its own music by Mason Bretan of Georgia Tech. The robot Shimon composes and performs his first deep-learning-driven piece. A recurrent deep neural network is trained on a large database of classical and jazz music. Based on learned semantic relationships between musical units in this dataset, Shimon generates and performs a new musical piece. Video here.

Curiosity-driven Exploration by Self-supervised Prediction by Pathak et al. UC Berkeley researchers demonstrate artificial curiosity via an intrinsic curiosity model to control a virtual agent in a video game and understand its environment faster — which can accelerate problem solving. Original paper here and video here.

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour by Facebook Research. Deep learning benefits from massive data sets, but this means long training times that slow down development. Using commodity hardware, our implementation achieves ∼90% scaling efficiency when moving from 8 to 256 GPUs. This system enables us to train visual recognition models on internet-scale data with high efficiency. Original paper here.
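
Two ideas from the paper are easy to sketch: the linear scaling rule (scale the learning rate with the minibatch size) and a gradual warmup that ramps up to that scaled rate over the first few epochs. Here is a simplified per-epoch version (the paper ramps per iteration; defaults below follow its ImageNet setup of base rate 0.1 at batch 256):

```python
def learning_rate(epoch, batch_size, base_lr=0.1, base_batch=256, warmup_epochs=5):
    """Linear scaling rule with gradual warmup (per-epoch granularity)."""
    target = base_lr * batch_size / base_batch   # lr scales with batch size
    if epoch < warmup_epochs:
        # Ramp linearly from the base rate up to the scaled target.
        return base_lr + (target - base_lr) * epoch / warmup_epochs
    return target

print(learning_rate(0, 8192))  # 0.1 at the start of warmup
print(learning_rate(5, 8192))  # 3.2 = 0.1 * 8192 / 256 once warmup ends
```

Without the warmup, jumping straight to the large scaled rate destabilizes early training; with it, the paper matches small-batch accuracy at batch size 8192.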

Resources

A gentle introduction to deep learning with TensorFlow by Michelle Fullwood at PyCon 2017. This talk aims to gently bridge the divide by demonstrating how deep learning operates on core machine learning concepts and getting attendees started coding deep neural networks using Google’s TensorFlow library. 41 minute video. Slides here and GitHub here.

Deep Reinforcement Learning Demystified (Episode 0) by Moustafa Alzantot. A basic description of what reinforcement learning is, with examples of where it can be used. Covers the essential terminology of reinforcement learning and provides a quick tutorial on OpenAI Gym.
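
To make the terminology concrete, here is a self-contained tabular Q-learning sketch on a toy environment of my own (no Gym required): states 0 through 4 on a line, actions left/right, and a reward of 1 for reaching state 4. The states, rewards, and hyperparameters are all illustrative.

```python
import random
random.seed(0)

N_STATES, GOAL = 5, 4
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action], action 0 = left, 1 = right
alpha, gamma, epsilon = 0.5, 0.9, 0.3       # learning rate, discount, exploration rate

for _ in range(500):                         # episodes
    s = 0
    while s != GOAL:
        if random.random() < epsilon:        # explore
            a = random.randrange(2)
        else:                                # exploit current estimates
            a = max((0, 1), key=lambda act: Q[s][act])
        s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
        r = 1.0 if s2 == GOAL else 0.0
        # Q-learning update toward the bootstrapped Bellman target.
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

policy = [max((0, 1), key=lambda act: Q[s][act]) for s in range(GOAL)]
print(policy)  # the learned greedy policy: "go right" in every state
```

The same update rule, with the table replaced by a neural network, is the core of deep Q-learning covered later in the series.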

Neural Networks and Deep Learning by Michael Nielsen. Free online book that introduces neural networks and deep learning.

You can probably use deep learning even if your data isn’t that big by Andrew Beam. Article argues and explains how you can still use deep learning in (some) small data settings, if you train your model carefully. In response to Don’t use deep learning your data isn’t that big by Jeff Leek.

Posting on ArXiv is good, flag planting notwithstanding by Yann LeCun. In response to, and refuting, An Adversarial Review of “Adversarial Generation of Natural Language” by Yoav Goldberg of Bar Ilan University, which takes issue with deep learning researchers publishing aggressively on arXiv.

Tutorials & Data

Computational Neuroscience Coursera course by University of Washington. Starts July 3, enroll now. Learn how the brain processes information. This course provides an introduction to basic computational methods for understanding what nervous systems do and for determining how they function. We will explore the computational principles governing various aspects of vision, sensory-motor control, learning, and memory.

Core ML and Vision: Machine Learning in iOS 11 Tutorial by Audrey Tam. iOS 11 introduces two new frameworks related to machine learning, Core ML and Vision. This tutorial walks you through how to use these new APIs and build a scene classifier.

Deep Learning CNN’s in Tensorflow with GPUs by Cole Murray. In this tutorial, you’ll learn the architecture of a convolutional neural network (CNN), how to create a CNN in Tensorflow, and how to produce predictions on image labels. Finally, you’ll learn how to run the model on a GPU so you can spend your time creating better models, not waiting for them to converge.

Open-sourced Kinetics data set by Google DeepMind. Annotated data set of human actions — things like playing instruments, shaking hands, and hugging. Kinetics is a large-scale, high-quality dataset of YouTube video URLs which include a diverse range of human focused actions. The dataset consists of approximately 300,000 video clips, and covers 400 human action classes with at least 400 video clips for each action class.

Let’s evolve a neural network with a genetic algorithm by Matt Harvey of Coastline Automation. Applying a genetic algorithm to evolve a network with the goal of achieving optimal hyperparameters in a fraction of the time required to do a brute force search.
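
The genetic-algorithm loop itself is simple to sketch. In Harvey’s post the fitness of a genome is a trained network’s validation accuracy; the version below swaps in a cheap stand-in fitness so it runs instantly, and the search space and its "optimum" are entirely hypothetical:

```python
import random
random.seed(1)

SPACE = {"layers": [1, 2, 3, 4], "units": [16, 32, 64, 128], "lr": [0.1, 0.01, 0.001]}

def fitness(g):
    # Toy proxy that peaks at layers=3, units=64, lr=0.01 (real use: val accuracy).
    return -abs(g["layers"] - 3) - abs(g["units"] - 64) / 32 - abs(g["lr"] - 0.01) * 10

def random_genome():
    return {k: random.choice(v) for k, v in SPACE.items()}

def crossover(a, b):
    # Each hyperparameter inherited from one of the two parents.
    return {k: random.choice([a[k], b[k]]) for k in SPACE}

def mutate(g, rate=0.2):
    # Occasionally resample a hyperparameter at random.
    return {k: random.choice(SPACE[k]) if random.random() < rate else g[k] for k in SPACE}

pop = [random_genome() for _ in range(20)]
initial_best = max(pop, key=fitness)
for _ in range(15):                      # generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                   # elitism: the fittest half survives intact
    pop = parents + [mutate(crossover(*random.sample(parents, 2))) for _ in range(10)]

best = max(pop, key=fitness)
print(best, fitness(best))
```

Because the fittest genomes survive each generation unchanged, the best fitness can only improve, which is what makes the approach much cheaper than brute-force search when each evaluation is a full training run.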


Up to Speed on Deep Learning: June Update, Part 2

Continuing our series of deep learning updates, we pulled together some of the awesome resources that have emerged since our last post. In case you missed it, here are our past updates: June (part 1), May, April (part 1, part 2), March part 1, February, November, September part 2 & October part 1, September part 1, August (part 1, part 2), July (part 1, part 2), June, and the original set of 20+ resources we outlined in April 2016. As always, this list is not comprehensive, so let us know if there’s something we should add, or if you’re interested in discussing this area further.

Research & Announcements

Scalable and Sustainable Deep Learning via Randomized Hashing by Spring and Shrivastava of Rice University. Rice computer scientists have adapted a widely used technique for rapid data lookup to slash the amount of computation — and thus energy and time — required for deep learning. “This applies to any deep learning architecture, and the technique scales sublinearly, which means that the larger the deep neural network to which this is applied, the more the savings in computations there will be,” said Shrivastava. News article here.
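
The "rapid data lookup" technique is locality-sensitive hashing: similar vectors hash to the same bucket, so only a small subset of neurons needs to be evaluated. Here is a toy sketch of one common scheme, random-hyperplane hashing (SimHash); the dimensions and vectors are illustrative, not from the paper:

```python
import random
random.seed(0)

def simhash(v, planes):
    """Sign pattern of v against random hyperplanes; similar vectors collide."""
    return tuple(1 if sum(p_i * v_i for p_i, v_i in zip(p, v)) >= 0 else 0
                 for p in planes)

def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

dim, n_bits = 10, 8
planes = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

base = [random.gauss(0, 1) for _ in range(dim)]
near = [x + random.gauss(0, 0.01) for x in base]   # tiny perturbation of base
far = [-x for x in base]                            # opposite direction

print(hamming(simhash(base, planes), simhash(near, planes)))  # near-duplicate: few or no bits differ
print(hamming(simhash(base, planes), simhash(far, planes)))   # opposite vector: every bit flips
```

Looking up a hash code takes constant time regardless of how many neurons exist, which is where the sublinear scaling comes from.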

A neural approach to relational reasoning by DeepMind. Relational reasoning is the process of drawing conclusions about how things are related to one another, and is central to human intelligence. A key challenge in developing artificial intelligence systems with the flexibility and efficiency of human cognition is giving them a similar ability — to reason about entities and their relations from unstructured data. These papers show promising approaches to understanding the challenge of relational reasoning. Original papers here and here.

Resources

Applying deep learning to real-world problems by Rasmus Rothe of Merantix. A must-read on key learnings when using deep learning in the real world. Discussion of the value of pre-training, caveats of real-world label distributions, and understanding black box models.

CuPy by Preferred Networks. An open-source matrix library accelerated with NVIDIA CUDA. Compatible with, or a drop-in replacement for, Numpy. GitHub repo here.

Speaker Resources 2017 by The AI Conference. Various news articles, academic papers, and datasets shared by folks involved in and enthusiastic about AI. (h/t Michelle Valentine)

Neural Network Architectures by Eugenio Culurciello. An in-depth overview & history of neural network architectures in the context of deep learning, spanning LeNet5, AlexNet, GoogLeNet, Inception, and a discussion of where things are headed in the future. Original paper here.

Model Zoo by Sebastian Raschka. A collection of standalone TensorFlow models in Jupyter Notebooks, including classifiers, autoencoders, GANs, and more. The broader repo for Sebastian’s book is also useful, here.

Tutorials & Data

Sketch-RNN: A Generative Model for Vector Drawings by Google. A TensorFlow recurrent neural network model for teaching machines to draw. Overview of the model and how to use it. Described in greater depth by Google here and here.

Exploring LSTMs by Edwin Chen. It turns out LSTMs are a fairly simple extension to neural networks, and they’re behind a lot of the amazing achievements deep learning has made in the past few years. So I’ll try to present them as intuitively as possible — in such a way that you could have discovered them yourself. An overview of long short-term memory networks, and a tutorial on their use.
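
As a companion to the intuition in the post, the whole LSTM cell fits in a few lines of numpy. This is a minimal single-step forward pass with illustrative sizes (one fused weight matrix for all four gates, a common formulation; biases and initialization are simplified):

```python
import numpy as np
rng = np.random.default_rng(0)

def lstm_step(x, h, c, W, b):
    """One LSTM step: the four gates are slices of a single affine transform."""
    z = np.concatenate([x, h]) @ W + b            # shape (4 * hidden,)
    H = h.size
    i = 1 / (1 + np.exp(-z[:H]))                  # input gate
    f = 1 / (1 + np.exp(-z[H:2*H]))               # forget gate
    o = 1 / (1 + np.exp(-z[2*H:3*H]))             # output gate
    g = np.tanh(z[3*H:])                          # candidate cell update
    c_new = f * c + i * g                         # cell state: gated memory
    h_new = o * np.tanh(c_new)                    # hidden state: gated output
    return h_new, c_new

n_in, n_hid = 3, 5
W = rng.normal(size=(n_in + n_hid, 4 * n_hid)) * 0.1
b = np.zeros(4 * n_hid)
h = c = np.zeros(n_hid)
for x in rng.normal(size=(7, n_in)):              # run a length-7 sequence
    h, c = lstm_step(x, h, c, W, b)
print(h.shape, c.shape)  # (5,) (5,)
```

The forget gate's additive update of `c` is exactly the mechanism the post credits for carrying information across long sequences.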

Vistas Dataset by Mapillary. Free for research, the MVD is the world’s largest manually annotated semantic segmentation training data set for street-level imagery. It is primarily being used to train deep neural nets focused on object detection, semantic segmentation, and scene understanding for ADAS and autonomous driving. (h/t Andrew Mahon)

