Smart Cities/Spaces

Linker Vision Uses Vision AI to Optimize City Operations

Kaohsiung City Government

Objective

Linker Vision provides data-centric computer vision solutions optimized for rapid deployment and efficient scaling of vision AI applications, from the cloud to the edge. Kaohsiung, one of the largest cities in Taiwan, is a key end customer for Linker Vision. Working with Linker Vision, the city has implemented advanced smart city solutions using NVIDIA Metropolis to address a critical urban challenge: departmental silos. These fragmented government infrastructures make it extremely difficult to share essential information quickly and efficiently, impeding coordinated responses to city-wide issues. Linker Vision applies NVIDIA's three-computer strategy to help cities improve situational awareness and make proactive, data-driven decisions: simulating digital twins with NVIDIA Omniverse™, training AI models on data curated with NVIDIA NeMo™ Curator, and deploying AI agents with the NVIDIA AI Blueprint for video search and summarization (VSS). This approach also helps break down organizational silos and lays the foundation for a truly smart city.

Customer

Kaohsiung City

Partner

Linker Vision

Use Case

Computer Vision / Video Analytics
Simulation / Modeling / Design

Products

NVIDIA Metropolis
NVIDIA AI Blueprint for Video Search & Summarization
NVIDIA Omniverse Enterprise

Reduced development effort by 85% using the VSS blueprint to build visual AI agents.

Reduced response times by up to 80%, enabling emergency services to reach sites faster.

Enabled more detailed incident reports to be built easily with vision language models (VLMs), helping assess risk levels.

Created a unified platform using VLMs and the VSS blueprint to break down information barriers and maximize effectiveness at minimal cost.

Enhancing Urban Situational Awareness With Vision Language Models

One of the key challenges in applying vision AI in cities is the high variability and unpredictability of abnormal events. Traditional computer vision systems are trained to detect standard objects like cars, buildings, or people. However, they often struggle to interpret the overall situation or understand critical events—such as a traffic accident, flooding, or a fallen tree.

To address this limitation, Linker Vision uses VLMs, powered by generative AI, to go beyond simple object detection by interpreting the relationships between visual elements and generating descriptive narratives of the scene. By prompting the VLM to describe what's happening, Linker Vision enables the system to provide intelligent explanations of complex scenarios, helping city responders and decision-makers better understand the situation in real time. This approach significantly improves situational awareness and response effectiveness, particularly in dynamic, unpredictable urban environments.

“Through the innovations of generative AI and VLM, we aim to demonstrate the immense potential of vision AI in smart city development. Integrating NVIDIA technologies, our solutions are becoming more efficient and valuable. The collaboration with NVIDIA showcases how smart technologies can align with urban visions to create meaningful and impactful changes.”

Willy Kuo
CTO & Co-Founder, Linker Vision

How Video Analytics Fuel Smarter, Connected Urban Infrastructure

Cities face a critical challenge: departmental silos. Historically, municipal departments such as the Water Resources Bureau and the Transportation Bureau operated on isolated systems developed by different system integrators (SIs) and vendors, which made it extremely difficult to coordinate timely responses to issues. Consider a flooding event detected by the Water Resources Bureau. This data is vital to the Transportation Bureau, because flooding can severely disrupt traffic flow and public safety. Yet the lack of a unified system meant the information couldn't be shared automatically or promptly. As a result, departments often worked in isolation, missing opportunities for coordinated responses that could mitigate the impact on citizens and infrastructure.

To address this gap, Linker Vision developed and deployed an integrated, vision AI-powered platform. It uses the NVIDIA AI Blueprint for video search and summarization (VSS) to build video analytics AI agents that can process thousands of live camera streams across the city and deliver deeper insights into traffic incidents. These insights help first responders react quickly and improve city operations. For example, when AI agents detect flooding on a major roadway, they automatically alert the relevant agencies and affected citizens with critical information on the location, timing, and suggested actions. The platform serves as a unified foundation for real-time data, enabling cross-departmental collaboration and raising the level of situational awareness and decision-making across the city.

The NVIDIA Three-Computer Strategy in Action at Linker Vision

Linker Vision structures its vision AI city solution around the NVIDIA three-computer strategy, powering each stage of the pipeline—simulation, training, and runtime.

First, Linker Vision converts satellite and aerial imagery into OpenUSD scenes and creates a digital twin of the city using NVIDIA Omniverse running on NVIDIA OVX™ servers. They use NVIDIA Cosmos™ to generate diverse synthetic video data for complex scenarios like infrastructure damage or flooding, helping cover long-tail corner cases that are hard to capture in the real world.

For training AI models, Linker Vision uses NeMo Curator and nv-grounding-dino for real-world data curation, annotation, and labeling. These real and synthetic datasets are used to fine-tune VLMs, improving accuracy and providing better insights into complex urban activities.

For deployment, Linker Vision uses the VSS blueprint, which combines NVIDIA Metropolis vision pipelines with generative AI models, including VLMs based on the NVIDIA VILA architecture, running on NVIDIA DGX™ servers. This lets AI agents detect, understand, and respond to real-world events with meaningful, timely insights for smart city operations.

Finally, Linker Vision connects its vision AI pipeline to a real-time digital twin environment powered by Omniverse. By integrating outputs from the vision analytics pipeline, it creates an interactive command center where city officials can intuitively monitor and respond to events across the city.

Advancing AI for City Operations

Linker Vision is actively contributing to the development of AI ecosystems, particularly in smart city governance, AI-powered infrastructure, and autonomous decision-making. In the city of Kaohsiung, Linker Vision is integrating 30,000 diverse smart city camera streams, all managed in a city-scale 3D digital twin platform. The system is trained to understand more than ten major urban and enterprise domains—including transportation, water management, healthcare, and logistics—and 300+ scenarios such as traffic accidents, disaster response, public safety, and infrastructure management.

The vision AI solutions improve livability in cities and reduce incident response times by up to 80%. Linker Vision's work, highlighted in a recent GTC talk, “City-Scale AI with Digital Twins,” showcases how the company integrates NVIDIA AI technologies for real-time AI processing, large-scale model training, and cross-domain AI applications in smart cities, industrial automation, and AI ecosystems.

Tap into the power of VLMs and start developing with NVIDIA AI Blueprints.
