EXCLUSIVE: Analytics and AI – it starts with training

Video analytics and AI


Senior AI Engineer at Evolon, Srinath Kumar, explains how the way a vendor builds and trains its AI directly affects security accuracy.

The advance of AI

AI is a hot topic in the security industry; companies are developing technologies and end users worldwide are adopting the tools and learning alongside researchers.

We are in a period of rapid technological innovation and development, with AI improving constantly.

For security operators, it is exciting to be on the frontier of an impactful technological change, but it can be challenging to know what to look for.

As with any security technology, asking the right questions helps identify the best solution for your business, industry and use case.

Here, we will explore the technologies deployed in video surveillance AI, dive into the importance of training data and arm you with the information you need when evaluating which security AI is best for your organization.

Video surveillance AI – what’s under the hood?

AI is a broad term. Within the security industry, AI used in video surveillance often combines machine learning, computer vision and deep learning.

Machine learning is a field of AI focused on mimicking human learning.

Machine learning lets computer algorithms learn patterns from data rather than follow explicitly programmed rules, gradually improving their accuracy.

The technology requires a lot of high-quality data; in general, the more representative data a model sees, the better it performs.

Machine learning systems have three primary functions:

  • Descriptive: the AI uses the input data to explain what happened
  • Predictive: the AI uses input data to predict what may happen
  • Prescriptive: the AI uses input data to make recommendations on actions to take

Computer vision is a field of AI that enables computers to see and understand context through images, videos and other visual inputs.

Computer vision works similarly to human vision, though its computational power allows it to notice patterns, anomalies and minor differences that would be imperceptible to the human eye.

Computer vision capabilities include:

  • Object classification – objects in an image can be identified based on a defined category (people, vehicles, animals)
  • Object identification – the object category is distinguished and its appearance is analyzed to determine the identity or unique traits (colors, clothing, vehicle brand/model)
  • Object tracking – video input is used to process the movement of an object over time
  • Optical character recognition – data within images is analyzed to identify letters and numbers that are converted to text to be used by additional programs (license plates, addresses and other unique identifiers)
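
To make the object tracking capability above more concrete, here is a minimal sketch of one common way to link detections across video frames: matching bounding boxes by intersection over union (IoU). It is an illustration only, not a description of any vendor's tracker. The function names, the box format and the `min_iou` threshold of 0.3 are assumptions chosen for the example, and the detections are made-up values standing in for the output of an object classifier.

```python
from typing import Dict, List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def update_tracks(
    tracks: Dict[int, Box],
    detections: List[Box],
    next_id: int,
    min_iou: float = 0.3,  # hypothetical threshold, tuned per deployment
) -> Tuple[Dict[int, Box], int]:
    """Greedily assign each detection to the unclaimed track it overlaps most.

    Detections with no sufficiently overlapping track start a new track.
    Tracks that receive no detection in this frame are dropped.
    """
    updated: Dict[int, Box] = {}
    unclaimed = dict(tracks)
    for det in detections:
        best_id, best_score = None, min_iou
        for track_id, box in unclaimed.items():
            score = iou(box, det)
            if score >= best_score:
                best_id, best_score = track_id, score
        if best_id is None:
            best_id = next_id
            next_id += 1
        else:
            del unclaimed[best_id]
        updated[best_id] = det
    return updated, next_id


# Made-up detections: one object moves slightly, then a second one appears.
frame_1 = [(100.0, 100.0, 150.0, 200.0)]
frame_2 = [(105.0, 102.0, 155.0, 202.0), (400.0, 300.0, 520.0, 380.0)]

tracks, next_id = update_tracks({}, frame_1, next_id=0)
tracks, next_id = update_tracks(tracks, frame_2, next_id)
print(tracks)
```

In this sketch the first object keeps its track ID across both frames because its boxes overlap heavily, while the second object receives a new ID. Production trackers typically add motion models and appearance features on top of this kind of overlap matching.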

Deep learning is a subset of machine learning, mainly based on artificial neural networks (ANNs), inspired by the functioning of the human brain.

Deep learning networks are neural networks with many layers, allowing for added complexity and improved output accuracy.

Deep neural networks with multiple layers of interconnected nodes allow these AI models to solve complex data challenges by discovering patterns and features in the input data.

Deep learning algorithms can learn independently and improve from added data without ongoing manual engineering. 

Training deep neural networks requires extensive data pools and advanced computational resources.

In recent years, the advancement of technologies like graphics processing units (GPUs) and cloud computing architecture has made it easier to train and deploy deep neural networks with the specialized task of improving security.

The impact of deployment methods on AI accuracy

The three deployment methods for video surveillance AI are cloud, on-premises and in-camera.

A general rule of thumb is that the more computational power you can access, the more complex the AI model can be; constrained hardware forces a trade-off between model size and accuracy.

Based on this understanding, cloud-based security AI technologies are typically the most powerful options on the market.

Access to the cloud provides seemingly unrestricted computing power. Highly complex deep learning neural networks can be built on this architecture and deployed quickly anywhere in the world.  

On-premises and in-camera AI are inherently limited by the physical computing hardware and investment allocated, and both come with ongoing maintenance.

Pairing camera AI and cloud-based AI can create a layered approach to a security architecture.

The importance of built-for-security datasets

Understanding how the AI models you’re evaluating were trained before adding them to your security tech stack is essential.

Training an AI model requires an extensive and high-quality dataset. High quality in this context means the image and video data match real-world video surveillance deployments.

The most vital aspect of AI training is the diversity of data included.

Like a seasoned security professional, an AI model should be trained on and exposed to many possible scenarios, variables and threats.

Crucial ingredients for accurate security AI datasets are:

Camera type and quality

  • Datasets should include video surveillance images and video from various device types, such as red, green, blue (RGB), infrared (IR) and forward-looking infrared (FLIR)
  • Data from varying camera quality should be included. Most off-the-shelf datasets include only high-definition clips, whereas field solutions often range in camera capabilities. AI models deployed on existing infrastructure should be trained on all camera types

Variable conditions

  • Every change in visual clarity due to weather, including rain, fog, snow, dust and ice, adds complexity to the demands on an AI model. Within each type of weather, there should be training on a range of intensities, e.g., mist, drizzle, rain and cyclone
  • Cameras are subject to external environmental changes, including dust, cobwebs, ice, condensation and other lens obstructions
  • Many public datasets include ideal camera angles and proximity to objects of interest. AI models should be trained on variable angles and distances so that they perform well in legacy security deployments where conditions may not be ideal

Object-rich data

  • AI models can be trained on object classification (person, vehicle, animal) and identification (man, truck, deer). Datasets should include image and video data rich in combinations of objects to reflect real-world complexity. Many off-the-shelf datasets provide separate clips for each object type, whereas a real-world scene may contain people with dogs or vehicles with animals all at once

Atypical scenes

  • The real world is filled with unique activity that is hard to match with public data. Public datasets ignore many of the realities that security professionals deal with. An example within our dataset is the ability to identify people on Halloween. Where a model trained on basic datasets would miss the person because of the costume, training the AI model on atypical scenes gives it the context it needs to produce accurate results
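
One way to check a dataset against the ingredients listed above is to audit its metadata for missing combinations. The sketch below is a hypothetical example, not a description of how any particular vendor manages its data. It assumes each clip carries `camera`, `weather` and `objects` fields (an invented schema) and reports which camera and weather combinations have no clips, along with the share of clips that contain more than one object class.

```python
from collections import Counter
from itertools import product

# Hypothetical per-clip metadata; a real dataset would have its own schema.
clips = [
    {"camera": "rgb", "weather": "clear", "objects": ("person",)},
    {"camera": "rgb", "weather": "rain", "objects": ("person", "animal")},
    {"camera": "ir", "weather": "fog", "objects": ("vehicle",)},
    {"camera": "flir", "weather": "clear", "objects": ("person", "vehicle")},
]

# Conditions the deployment is expected to face (assumed for this example).
cameras = ["rgb", "ir", "flir"]
weathers = ["clear", "rain", "fog", "snow"]


def coverage_gaps(clips, cameras, weathers):
    """Return every (camera, weather) pair that no clip covers."""
    counts = Counter((clip["camera"], clip["weather"]) for clip in clips)
    return [combo for combo in product(cameras, weathers) if counts[combo] == 0]


def multi_object_share(clips):
    """Fraction of clips whose scene contains more than one object class."""
    if not clips:
        return 0.0
    multi = sum(1 for clip in clips if len(set(clip["objects"])) > 1)
    return multi / len(clips)


gaps = coverage_gaps(clips, cameras, weathers)
share = multi_object_share(clips)
print(f"{len(gaps)} uncovered camera/weather combinations: {gaps}")
print(f"Share of multi-object clips: {share:.0%}")
```

The same idea extends to other variables mentioned above, such as weather intensity, camera angle, distance and lens obstructions, by adding fields to the metadata and to the combinations being checked.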

Large companies like Google, Amazon, NVIDIA and others have large datasets to train computer vision AI models.

These off-the-shelf options make the deployment of AI fast; however, models trained on them often lack the accuracy needed for security deployments.

These datasets typically consist of high-definition images with ideal camera angles, good lighting and low scene complexity.

As security professionals, we know that an average deployment has challenging camera angles, variable lighting, weather and changing scene complexity. 

The ideal security AI is “field-trained” on real-world surveillance data so that the models can accurately accomplish their tasks when deployed.

Vendors who own their dataset can control quality, adjust for custom deployments/solutions and improve the dataset over time.

As AI accuracy relies on the dataset it uses for training, it’s essential to confirm that your industry and use case are represented in the training set before deploying the technology.

For example, AI applications in retail security and in perimeter security are vastly different.

The datasets and models used in the field should match these differences.

Training AI for retail security typically uses HD images of well-lit indoor scenes.

AI at the perimeter must handle long distances, variable lighting and weather, animals and other unique scene challenges. The two are not interchangeable.

The takeaways

AI is a growing field within the security industry and we are at an exciting time when technology exploration is rapidly advancing.

You must try the technology, learn with your peers and adapt as it evolves. AI models are not one-size-fits-all; understanding the type of technology and training dataset used in developing the tools you evaluate will help you succeed at using the new tools. 

To support your evaluation journey, here are a few questions you can ask:

  1. Did you use a publicly available dataset or build your own?
  2. Has your AI technology been successfully trained on my industry, use case and scene complexity?
  3. Do you have independent AI models for color and thermal cameras?
  4. How accurate is your AI model? (The F1 score, which balances precision and recall, is an excellent evaluation metric)
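
For readers unfamiliar with the F1 score mentioned in the last question, it is the harmonic mean of precision (how many raised alerts were real) and recall (how many real events were caught). The short sketch below shows the calculation. The function name and the alert counts are made up purely for illustration and do not represent any real system's results.

```python
def precision_recall_f1(true_pos: int, false_pos: int, false_neg: int):
    """Compute precision, recall and F1 from raw detection counts."""
    predicted = true_pos + false_pos  # everything the model flagged
    actual = true_pos + false_neg     # everything that really happened
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical evaluation: 80 real intrusions detected, 20 false alarms,
# 10 real intrusions missed.
precision, recall, f1 = precision_recall_f1(true_pos=80, false_pos=20, false_neg=10)
print(f"precision={precision:.2f} recall={recall:.2f} F1={f1:.2f}")
```

Because F1 drops when either false alarms or missed events rise, it is harder to inflate than a single accuracy figure. When comparing vendors, it is worth asking which scenes and conditions the reported score was measured on.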

About the author

Srinath Kumar is a Senior AI Engineer at Evolon, specializing in the research and development of deep learning-based technologies for perimeter intrusion detection systems (PIDS).

He has over 20 research publications in the fields of autonomous sensing, computer vision and machine learning.

Srinath earned his PhD from Purdue University, Master’s from the Hong Kong University of Science and Technology and Bachelor’s degree from IIT Bombay. 

This article was originally published in the October edition of Security Journal Americas.
