How AI architectures are transforming physical security


Quang Trinh, Business Development Manager, Platform Technologies at Axis Communications, explores how AI will continue to shape the security industry.

Progress in security

In 2019, the latest advancements in AI techniques, such as machine and deep learning, began to make a more significant impact on the security industry.

Since then, AI breakthroughs have only accelerated, especially generative AI which uses deep learning techniques such as large language models (LLMs), natural language processing (NLP), large vision models (LVMs) and multimodal models.

While the uses and applications of AI are broad, these architectures and techniques are making the greatest impact on modern computer vision, in part due to the ubiquity of surveillance cameras and the availability of quality video.

Together these factors are transforming the world of physical security.

From pixel-based analytics to deep learning detection

When it comes to surveillance, detecting movement – or, better yet, detecting objects – has obvious advantages when it comes to alerting property owners to an intrusion.

This ability originated with pixel-based analytics that recognize changes in pixel values in order to detect motion and alert a user.

However, these analytics are susceptible to false alarms due to changes in light and other environmental factors.

The transition from pixel-based analytics for motion detection to deep learning models for object detection (and classification) has helped significantly reduce the number of false positives – a game-changer for physical security.
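The limitation of the older pixel-based approach is easy to see in code. The following is a minimal sketch (the function name and thresholds are illustrative, not a real product API): motion is flagged when enough pixels change between consecutive frames, which means a global lighting change triggers the same alarm as a genuine intruder.

```python
import numpy as np

def detect_motion(prev_frame: np.ndarray, frame: np.ndarray,
                  pixel_threshold: int = 25, area_threshold: float = 0.01) -> bool:
    """Flag motion when enough pixels change between consecutive grayscale frames.

    A uniform brightness shift (e.g. clouds passing, lights switching on)
    changes many pixel values at once, which is why this approach is prone
    to false alarms.
    """
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    changed_fraction = (diff > pixel_threshold).mean()  # share of changed pixels
    return changed_fraction > area_threshold

# A static scene with only sensor noise: no alarm.
rng = np.random.default_rng(0)
scene = rng.integers(0, 256, size=(120, 160), dtype=np.uint8)
noise = rng.integers(-5, 6, size=scene.shape)
noisy = np.clip(scene.astype(np.int16) + noise, 0, 255).astype(np.uint8)
print(detect_motion(scene, noisy))     # False

# The same scene after a global +40 lighting change: a false alarm.
brighter = np.clip(scene.astype(np.int16) + 40, 0, 255).astype(np.uint8)
print(detect_motion(scene, brighter))  # True
```

A deep learning detector replaces the raw pixel-difference test with a classifier asking "is there a person or vehicle here?", which is what removes this class of false positive.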

As edge processing becomes more powerful, vendors and their customers are able to run “lightweight deep learning models” (which require fewer resources and less computational capability) at the edge to detect and classify the two most prevalent objects in physical security: people and vehicles.

Once a deep learning model is trained to detect and classify people and vehicles, vendors can package these AI models at the edge to significantly improve security systems.

Issues with shadows, swaying objects and non-human moving objects, like animals, can be ignored.

These advancements in deep learning models are not perfect, so security practitioners should always be aware of both strengths and limitations.

The importance of people, process and technology

One common and unrealistic expectation is the belief that an AI model will work consistently at various locations with the same output level.

In fact, as with any technology, AI models sit within comprehensive processes that require people to work with them to be successful.

It is not a “set it and forget it” scenario.

While deep learning models and the resultant object detection and classification will inevitably advance and drive more accurate incident alerts, humans are required to manage their development and assess their outcomes, such as determining the severity level of incidents and how to respond.

That said, over time, low-level incidents can be automated by the system.

As deep learning AI models become better at detecting and classifying people and vehicles, basic logic and context will make capturing data analytics about them much easier.

Whether counting objects, determining an object’s direction or capturing sub-features of an object, the customer can now decide what data they would like to obtain in order to help provide better context to what is happening in a scene.

Counts, occupancy, loitering, queueing, crossline detection, direction, frequency and speed are among the contextual data that end users can gather from an event, an incident or day-to-day operations to look for anomalies or gain insights.
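Of the contextual analytics listed above, crossline detection is a good illustration of how simple the logic can be once a detector supplies reliable object positions. The sketch below is a hypothetical implementation (the `Line` and `count_crossings` names are this example's own, not a vendor API): it counts how often a tracked object's centroid crosses a virtual tripwire.

```python
from dataclasses import dataclass

@dataclass
class Line:
    # Virtual tripwire defined by two endpoints (x1, y1) -> (x2, y2).
    x1: float
    y1: float
    x2: float
    y2: float

    def side(self, x: float, y: float) -> float:
        # Sign of the 2-D cross product tells which side of the line a point is on.
        return (self.x2 - self.x1) * (y - self.y1) - (self.y2 - self.y1) * (x - self.x1)

def count_crossings(line: Line, track: list) -> int:
    """Count how many times a tracked centroid crosses the tripwire."""
    sides = [line.side(x, y) for x, y in track]
    crossings = 0
    for a, b in zip(sides, sides[1:]):
        if a * b < 0:  # sign change between consecutive positions => a crossing
            crossings += 1
    return crossings

fence = Line(0, 50, 100, 50)                      # horizontal tripwire at y = 50
walk = [(10, 10), (20, 30), (30, 60), (40, 80)]   # centroid track from a detector
print(count_crossings(fence, walk))               # 1
```

Counting, direction and speed analytics are similar thin layers over the same per-frame detections, which is why accurate detection and classification is the enabling step.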

As AI and its various architectures and techniques continue to evolve, they will shift the security industry from a reactive to a proactive approach.

The role of large language models

Large language models (LLMs) – programs trained on vast amounts of data in order to recognize and generate language – will have a substantial impact on the security industry.

For example, LLMs can take the form of a more informed digital assistant that is fully aware of the context of a particular vendor’s portfolio of software, hardware and services.

As a result, vendors will incorporate LLMs into their support systems to improve the customer experience before, during and after the sale of a solution or service.

As security systems become more proactive, LLMs can be used as digital assistants during threat detection and response to assist a human operator through the process of engaging, reporting and documenting an incident.

Many organizations follow Security Information and Event Management (SIEM) – an approach toward security where technologies are integrated and information is synthesized to accelerate threat detection and response – and technologies in AI can help enhance SIEM systems to efficiently identify and correlate data with incidents.

Historical data on events and incidents can be used to fine-tune LLMs and augment their output, helping them proactively surface threats and risk patterns that inform real-world decisions.

An expanded view through large vision models

Large vision models (LVMs) use similar architectures to LLMs but are applied to images and videos.

Edge-based deep-learning models are adept at detecting and classifying humans and vehicles, but what about other objects?

While edge-based processing will continue to improve over time, LVMs could provide a more immediate answer for customers who are looking for solutions.

Whether they need a custom object, or multiple objects, detected with a certain level of accuracy, LVMs can expand their computer vision range with both cloud-based and on-premises solutions.

LVMs and other open-source image models currently acquire most of their training data from public sites on the internet to build thousands of object definitions, but think about the potential that exists with the security industry’s untapped video data repository.

This plethora of video could serve to re-train these models for specific detection and classification tasks.

Even though cameras in the real world will always have a variety of perspectives and lighting conditions, some of these open-source, public models perform quite well.

Ultimately, customers will decide whether to use these public models as-is or apply transfer learning with their own private data.

The decision comes down to their overall AI journey, the AI maturity within their organization and whether or not the ROI model fits.

Still, the potential contributions to LVMs by the security industry, and the impact on computer vision, are quite significant. LVMs and LLMs will be just some of many AI architectures that will enhance the detection, response and mitigation of security threats, especially when augmented by a human.

The correlation of image with language will be a significant factor in achieving these positive results.

Next up: large multimodal models

Large multimodal models fuse multiple data sources – text, images, video, audio, computer code, etc. – to provide better context and AI output.

For example, fusing radar data with video data provides more context and enables the detection of objects moving at a certain speed.

Similarly, acoustic and video data can be fused to provide context for aggression-type events and incidents.

By fusing or correlating multiple data sources in an AI model, outputs and outcomes drastically improve, reducing both false positives and false negatives.
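At its simplest, fusion can happen at the decision level. The sketch below is a toy illustration under stated assumptions (the `fused_alert` function, its labels and thresholds are all hypothetical): a corroborating radar reading lets the system accept a lower-confidence video detection, while a video-only detection must clear a higher bar, which is one way fusion reduces false positives without missing real events.

```python
def fused_alert(video_label, video_conf, radar_speed_kmh=None):
    """Decision-level fusion of a video classifier and a radar speed reading.

    A low-confidence video classification alone (e.g. a swaying branch
    scored as "vehicle" at 0.6) may be a false positive; radar evidence of
    a moving object corroborates it.
    """
    if video_label != "vehicle":
        return False
    if radar_speed_kmh is not None and radar_speed_kmh > 5:
        return video_conf > 0.5   # radar corroborates: accept lower confidence
    return video_conf > 0.9       # video only: demand high confidence

print(fused_alert("vehicle", 0.6, radar_speed_kmh=30))    # True
print(fused_alert("vehicle", 0.6, radar_speed_kmh=None))  # False
```

Multimodal models go further by fusing the raw signals inside the model rather than combining final decisions, but the intuition – independent sensors corroborating one another – is the same.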

Multimodal models that combine text and images will make smart searches for events, incidents and objects of interest the norm across security systems within the next few years.

One development worth paying attention to is OpenAI’s Contrastive Language-Image Pretraining (CLIP), a multimodal model that learns to connect images and text by encoding both modalities into a shared vector space.

This breakthrough will enhance text-to-image search, image classification and object detection.

In other words, searching through petabytes (i.e., 1 million gigabytes) of images and video data can be done more efficiently with models run in the cloud or on-premises.
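The mechanics behind CLIP-style search are straightforward once both modalities live in the same vector space: a free-text query is encoded once, then ranked against precomputed image embeddings by cosine similarity. The sketch below uses tiny made-up 4-D vectors in place of real CLIP embeddings (which are typically hundreds of dimensions), so only the search step is shown, not the encoders themselves.

```python
import numpy as np

def cosine_top_k(query_vec, image_vecs, k=3):
    """Rank image embeddings by cosine similarity to a text embedding.

    In a CLIP-style model, a text encoder and an image encoder map both
    modalities into a shared vector space, so nearest-neighbour search over
    stored image vectors answers free-text queries.
    """
    q = query_vec / np.linalg.norm(query_vec)
    m = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    scores = m @ q                         # cosine similarity per image
    top = np.argsort(scores)[::-1][:k]     # indices of the best matches
    return list(zip(top.tolist(), scores[top].tolist()))

# Toy embeddings standing in for archived video frames.
images = np.array([
    [0.9, 0.1, 0.0, 0.1],   # frame 0: "vehicle"-like
    [0.1, 0.9, 0.1, 0.0],   # frame 1: "person"-like
    [0.8, 0.2, 0.1, 0.0],   # frame 2: "vehicle"-like
])
text_query = np.array([1.0, 0.0, 0.0, 0.0])   # embedding of a query like "a vehicle"
print(cosine_top_k(text_query, images, k=2))  # frames 0 and 2 rank highest
```

Because image embeddings can be computed once at ingest and indexed, the expensive part of searching a petabyte-scale archive becomes a vector lookup rather than a re-scan of the video.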

Properly positioning yourself for the future of AI

With an increasing number of video surveillance cameras and Internet of Things (IoT) devices generating more and more data, AI offers the ability to extract value and provide real-time analysis, task automation and actionable insights.

The integration of language models, multiple modalities and computer vision applications presents new possibilities to interpret video data and enable our devices to better understand a scene and act upon it appropriately.

The implications for improved security, operations and business are significant.

For today’s security professionals and their organizations, it’s important to stay informed and on the leading edge.

As far as AI architectures and techniques are concerned, it’s essential to understand where they fit technologically, commercially and, most importantly, ethically and legally before incorporating them into your solutions or services.

There is no doubt that AI will continue to influence the security industry, so professionals must properly position themselves for the future.

This article was originally published in the August edition of Security Journal Americas.
