This is the first in a series of blog posts on the convergence of computer vision and the Internet of Things. Part One provides a high-level overview of the drivers of this convergence. In subsequent parts, we will explore new frameworks, best practices and design methodologies for computer vision-IoT convergence.
Computer vision is ubiquitous today. It is found in everyday consumer products, from game consoles that recognize your gestures to cell phone cameras that automatically focus on people. But computer vision also has a long history in high performance commercial and government applications, including quality assurance in manufacturing, remote sensing for environmental management and high resolution cameras that collect intelligence over battlefields. These advanced systems use a variety of technologies, including optical sensors that can detect light across different spectral ranges. Some of these sensors are stationary, while others are mounted on moving platforms such as satellites, drones and vehicles.
Why computer vision?
Sight, or vision, is the most developed of the five human senses. We use it daily to recognize our friends, detect obstacles in our path, complete tasks and learn new things. Our physical surroundings are full of visual cues: street signs and signal lights help us get from one place to another, stores use signs to help us locate them and to advertise their goods, and computer and television screens display the information and entertainment we consume. Given the importance of sight, it’s only natural to extend its capabilities to computers and automation systems.
What is computer vision?
Computer vision starts with optical sensors that capture and store an image, or set of images, and then transforms those images into digital information that can be further acted upon. It comprises several technologies working together (Figure 1). Computer vision engineering is an interdisciplinary field requiring cross-functional and systems expertise across many of these technologies.
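The capture-transform-act flow above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the "sensor capture" is a synthetic array standing in for a real camera frame, and the thresholding and centroid steps are assumed, deliberately simple stand-ins for real segmentation and analysis stages.

```python
import numpy as np

# Synthetic 8x8 "sensor capture": a bright object on a dark background.
frame = np.zeros((8, 8), dtype=np.uint8)
frame[2:6, 2:6] = 200

# Stage 1: normalize raw pixel intensities to [0, 1].
normalized = frame.astype(np.float32) / 255.0

# Stage 2: segment the scene with a simple intensity threshold.
mask = normalized > 0.5

# Stage 3: extract actionable information -- here, the object's
# size and centroid, which downstream logic could act upon.
ys, xs = np.nonzero(mask)
area = int(mask.sum())
centroid = (float(ys.mean()), float(xs.mean()))

print(area)      # 16 pixels
print(centroid)  # (3.5, 3.5)
```

Each stage turns the raw pixels into a progressively more compact and actionable representation, which is the essence of any computer vision pipeline.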
In the recent past, computer vision applications were built on proprietary platforms. But combined with IP-based technologies, they enable a new set of applications that were not possible before. Computer vision, coupled with IP connectivity, advanced data analytics and artificial intelligence (AI), is a catalyst for innovation, driving revolutionary leaps in Internet of Things (IoT) innovations and use cases.
As an example, Microsoft Kinect combines depth sensing with 3D computer graphics algorithms to enable computer vision to analyze and understand three-dimensional scenes. It allows game developers to merge real-time full-body motion capture with artificial 3D environments. Beyond gaming, this opens new possibilities in areas like robotics, virtual reality (VR) and augmented reality (AR).
Sensor technology advancements are happening rapidly at many levels beyond conventional camera sensors. Some recent examples include:
- Infrared sensors and lasers that combine to sense depth and distance, a critical enabler of self-driving cars and 3D mapping applications
- Nonintrusive sensors that track vital signs of medical patients without physical contact
- High-frame-rate cameras that capture subtle movements imperceptible to the human eye, helping athletes analyze their gait
- Ultra-low-power, low-cost vision sensors that can run anywhere for long periods of time
Computer vision gets smart
Early adoption started with security
The surveillance industry was one of the early adopters of image processing techniques and video analytics. Video analytics is a specialized application of computer vision that focuses on finding patterns in hours of video footage. The ability to automatically detect and identify predefined patterns in real-world situations represents a huge market opportunity with hundreds of use cases.
The first video analytics tools used handcrafted algorithms to identify specific features in images and videos. They were accurate in laboratory settings and simulation environments, but performance dropped quickly when input conditions, such as lighting and camera angles, deviated from design assumptions.
Researchers and engineers spent many years tuning those algorithms and coming up with new ones to handle different conditions. However, cameras and video recorders using those algorithms were still not robust enough. Despite incremental progress over the years, poor real-world performance limited the usefulness and adoption of the technology.
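The brittleness of handcrafted algorithms is easy to demonstrate. The sketch below uses a classic handcrafted feature, a Sobel edge detector with a fixed threshold (a representative stand-in, not any specific commercial product's algorithm), and shows how an edge found under "lab" lighting is missed when the same scene is dimmer than the designer assumed.

```python
import numpy as np

def sobel_edges(img, threshold):
    """Handcrafted edge detector: fixed Sobel kernels plus a fixed threshold."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float32)
    ky = kx.T
    h, w = img.shape
    mag = np.zeros((h - 2, w - 2), dtype=np.float32)
    for y in range(h - 2):
        for x in range(w - 2):
            patch = img[y:y + 3, x:x + 3]
            # Gradient magnitude from horizontal and vertical responses.
            mag[y, x] = np.hypot((patch * kx).sum(), (patch * ky).sum())
    return mag > threshold

# A vertical step edge under bright "lab" lighting ...
bright = np.zeros((6, 6), dtype=np.float32)
bright[:, 3:] = 1.0
# ... and the exact same scene at 20% illumination.
dim = bright * 0.2

threshold = 2.0  # tuned against the bright scene
print(sobel_edges(bright, threshold).any())  # True: edge detected
print(sobel_edges(dim, threshold).any())     # False: same edge missed
```

The edge is still physically present in the dim scene; only the lighting changed. Retuning the threshold fixes this one case, but every new condition demands another round of manual tuning, which is exactly the treadmill described above.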
Deep learning revolutionizes computer vision
In recent years, the emergence of deep learning algorithms has reinvigorated computer vision. Deep learning uses artificial neural network (ANN) algorithms, which mimic the neurons of the human brain.
Starting in the early 2010s, computer performance, accelerated by graphics processing units (GPUs), grew powerful enough for researchers to realize the capabilities of complex ANNs. Driven partly by video sharing sites and the prevalence of IoT devices, researchers also gained access to large, diverse libraries of video and image data to train their neural networks.
In 2012, a type of deep neural network (DNN) called the convolutional neural network (CNN) demonstrated a huge leap in accuracy on the ImageNet image recognition challenge. That result drove renewed interest and excitement in the field of computer vision engineering. In applications such as image classification and facial recognition, deep learning algorithms now match or even exceed human performance. More importantly, just like humans, these algorithms have the ability to learn and adapt to different conditions.
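At its core, a CNN layer applies the same convolution operation as a handcrafted filter; the key difference is that the kernel weights are learned from data rather than designed by hand. The sketch below shows a single convolution-plus-ReLU step, with a kernel whose values stand in for weights a network might arrive at after training (the kernel values here are illustrative assumptions, not trained weights).

```python
import numpy as np

def conv2d(img, kernel):
    """Valid 2-D convolution: the core operation of a CNN layer."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1), dtype=np.float32)
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = (img[y:y + kh, x:x + kw] * kernel).sum()
    return out

def relu(x):
    """Nonlinearity applied after each convolution."""
    return np.maximum(x, 0.0)

# Toy 5x5 input containing a vertical stripe.
img = np.zeros((5, 5), dtype=np.float32)
img[:, 2] = 1.0

# A kernel that responds to vertical stripes; in a real CNN these
# weights would be learned by gradient descent, not written by hand.
kernel = np.array([[-1.0, 2.0, -1.0]] * 3, dtype=np.float32)

feature_map = relu(conv2d(img, kernel))
print(feature_map.shape)  # (3, 3)
print(feature_map[:, 1])  # strong response exactly where the stripe is
```

A real network stacks many such layers, so later layers combine simple responses like this into detectors for complex patterns, and training adjusts all the kernels jointly, which is what lets CNNs adapt to conditions that defeated handcrafted pipelines.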
New use cases emerge
We are entering an era of cognitive technology where computer vision and deep learning integrate to address high level, complex problems once the domain of the human brain (Figure 2). Some examples currently in use include:
- Agricultural drones that monitor the health of crops (Figure 3)
- Transportation infrastructure management
- Inspections by unmanned aerial vehicles (UAVs)
- Next generation home security cameras
These are just a few examples of how computer vision significantly increases productivity across different applications. As advanced as these technologies are, they will continue to evolve with faster processors, more sophisticated machine learning algorithms and deeper integration with edge devices. We are just scratching the surface of what is possible.
There are many problems to overcome in making the technology more practical and economical for mass market adoption:
- Embedded platforms need to integrate deep neural network designs, which involves difficult trade-offs among power consumption, cost, accuracy and flexibility.
- The industry needs standardization to allow smart devices and systems to communicate with each other and share metadata.
- Systems are no longer passive collectors of data; they need to act on it with minimal human intervention, and to learn and improve on their own. The whole software/firmware update process takes on new meaning in the machine learning era.
- Computer vision and AI introduce new security vulnerabilities that hackers could exploit, and designers need to take them into account.
As we enter the age of IoT, computer vision is set to accelerate innovation. The first phase focused on connecting devices, aggregating data and building up big data platforms. In the second phase, the focus will shift to making “things” more intelligent through technologies like computer vision and deep learning, generating more actionable data and automating tasks once the domain of the human brain.
In this post, we gave a brief introduction to computer vision and how it is becoming a critical component of many connected devices and applications. We predicted its imminent explosive growth and listed some of the hurdles to practical adoption. In subsequent posts in this series, we will explore new frameworks, best practices and design methodologies to overcome some of these challenges.
This article was written by Frank Lee, co-founder and CEO of DeepPhoton Inc., which is developing a platform that empowers IoT devices with cognitive technology. He has over 20 years of experience leading product development in IoT platforms, media processing, computer vision and cloud software, and is an enthusiast in computer vision, IoT and machine intelligence.
Thanks for reading. If you found this post useful, please share it with your network. Subscribe to our newsletter to be notified of new blog articles, and follow us on Twitter (@strategythings), LinkedIn or Facebook.