Why Aptibit built a native C++17 streaming engine for Visylix instead of wrapping FFmpeg. The architectural decisions behind 5,000+ streams per node at 22% CPU.
Why Not Just Use FFmpeg?
When embarking on ambitious projects in the realm of video streaming, particularly those demanding high-performance live streaming capabilities, a foundational question inevitably arises: "Why not simply leverage FFmpeg?" It is a fair inquiry, given FFmpeg's status as a cornerstone of the digital media landscape. This open-source multimedia framework is the backbone of countless applications, from global video platforms to desktop media players. Many commercial streaming solutions and video management systems are built upon its robust foundation.
However, at Aptibit Technologies, when we began developing Visylix, our enterprise AI video management platform, we opted for a different path. We chose to build our streaming engine from the ground up rather than wrapping FFmpeg's extensive but general-purpose libraries. This decision was not made lightly; it represented a significant investment in time and engineering effort. The reasons behind this choice stem from the very nature of what we aimed to achieve: a platform capable of handling massive concurrency, sub-second latency, and integrated AI processing on every live stream.
The Problem We Were Trying to Solve
Visylix is engineered for enterprise-grade AI video management. Our clients typically connect thousands of cameras to a single deployment, requiring live video with ultra-low latency, ideally sub-second, and the simultaneous execution of multiple AI models on each stream. This demanding environment operates 24/7, necessitating unwavering stability and minimal resource degradation.
During our evaluation of existing video management systems built around FFmpeg, we encountered a critical bottleneck. These platforms began to falter around the 300 to 500 camera mark per server. CPU usage would spike to an unsustainable 94%, live stream latency would balloon to several seconds, and stream drops became commonplace. This was not a limitation of hardware; it was a fundamental architectural constraint imposed by the way FFmpeg was integrated. The sheer volume of video processing required for so many concurrent feeds overwhelmed the traditional approach.
What FFmpeg Was Built For
FFmpeg is, without question, a phenomenal tool. Its core strength lies in its ability to transcode media files, convert container formats, and manage individual streams, with an unparalleled breadth of support for video and audio codecs. It excels at tasks like converting file formats, remuxing between containers, and performing single-instance video recording. In essence, FFmpeg is a powerful toolkit: a collection of highly optimized libraries and a versatile command-line tool for media manipulation.
However, using FFmpeg for live video surveillance at the scale we envisioned means asking it to perform tasks far beyond its original design parameters. A typical FFmpeg-based live stream pipeline for such scenarios involves multiple decode-encode cycles: decoding the incoming stream, re-encoding it for storage, decoding it again for live viewing, and finally re-encoding it once more for browser-based delivery. Each of these operations is CPU-intensive. When multiplied by thousands of cameras, the cumulative CPU load becomes the limiting factor. Most video management platforms built on FFmpeg accept this limitation and advise customers to deploy more servers. Our objective was to create a solution that broke through this ceiling, not one that simply scaled out. The global video streaming market is projected to reach USD 137.9 billion in 2024 and grow at a compound annual growth rate of 22.3% until 2033 to reach a value of USD 843.0 billion, underscoring the need for scalable solutions.
The Decision to Build from Scratch
The decision to engineer a custom streaming engine, particularly in C++17, represented a significant deviation from the easier path of wrapping FFmpeg. This approach added months to our development timeline and necessitated reimplementing solutions that FFmpeg solved long ago, such as handling the many video codecs and file formats in the wild.
However, this intensive undertaking granted us the unparalleled ability to design every layer of our system with a single, definitive purpose: to efficiently manage massive concurrent video streams, integrate sophisticated AI video processing, and minimize resource consumption. This deep-level control allowed us to architect a platform that could scale linearly. Three critical architectural decisions set our custom engine apart from traditional FFmpeg-based solutions.
Proprietary Asynchronous I/O
A pervasive bottleneck in many traditional video streaming platforms, including those using FFmpeg, is the overhead associated with input/output operations. Paying a system call for every individual I/O event adds measurable overhead, and when managing thousands of streams, each generating dozens of I/O operations per second, that cumulative overhead can cripple performance.
To combat this, we developed a proprietary asynchronous I/O engine, designed so that the per-operation cost approaches zero. By drastically reducing the systemic cost of I/O, we achieved a throughput approximately three times higher than traditional approaches on identical hardware. This forms the bedrock of our ability to sustain more live streams and recording tasks without performance degradation.
Zero-Copy Memory Architecture
In a typical FFmpeg-based pipeline, video data is subjected to multiple memory copies as it navigates through various stages, such as decoding, AI analysis, and storage. For a scenario involving 5,000 streams at 1080p resolution running at 30 frames per second, even two full-frame copies per frame can consume over 900 GB/s of memory bandwidth, solely dedicated to moving data around. This is inefficient and fundamentally limits scalability.
Our custom architecture fundamentally eliminates these unnecessary data copies. Video data flows seamlessly through the entire pipeline without redundant duplication. This optimization significantly reduces memory bandwidth consumption by over 80% and, crucially, liberates substantial CPU cycles. These freed-up cycles can then be dedicated to actual video processing and AI inference, rather than being consumed by mere data wrangling.
Purpose-Built Memory Management
Standard memory allocators are designed for general-purpose applications, offering a balance of features for diverse workloads. However, video processing at scale is anything but general-purpose. Frames arrive at variable rates, AI models require dynamic memory allocations, and recording buffers must expand and contract fluidly based on event triggers.
We engineered a specialized memory management system optimized precisely for these dynamic patterns. In rigorous 30-day continuous operation tests, Visylix maintained consistent, high performance throughout the entire period. In stark contrast, FFmpeg-based platforms exhibited a performance degradation of 15 to 20% over the same duration. This specialized memory management is vital for maintaining predictable performance, whether handling live streams or complex video workflows.
The Results
The tangible outcomes of our architectural decisions are profound and speak volumes about the power of custom engineering. A single Visylix node can effortlessly handle over 5,000 concurrent streams while maintaining a CPU usage of just 22%. To put this into perspective, the same hardware running a traditional FFmpeg-based video management system typically manages only 300 to 500 streams at a crippling 94% CPU usage.
Furthermore, live view latency is reduced to below 500 ms, delivered seamlessly via WebRTC. This stands in sharp contrast to the 2 to 5 seconds of latency typically experienced with FFmpeg-based platforms relying on HLS stream delivery. Our architecture also supports over a million concurrent connections across a full deployment, a capability that few competitors in the video management space can claim or even market, as their underlying architectures simply cannot sustain such demand. This matters in today's rapidly growing live streaming market, which is projected to reach $345 billion by 2030.
Was It Worth It?
Undeniably, constructing a custom streaming engine from the ground up demanded significantly more time and resources than simply wrapping FFmpeg. We had to meticulously address challenges in video codecs, container format handling, streaming protocol negotiation, buffer management, and a myriad of other complex issues that FFmpeg solves out of the box.
However, the reward was the creation of a platform that delivers capabilities unattainable by any FFmpeg wrapper: a system that handles tenfold more streams on the same hardware, at roughly a quarter of the CPU usage (22% versus 94%), with sub-second latency. For our clients, this translates directly into fewer servers, reduced operational costs, and truly real-time video streaming that enhances their operations and user experiences.
Every startup in the video management and streaming space faces the fundamental question: build or wrap? Many opt to wrap FFmpeg for a faster time to market. Our strategic imperative, however, was not speed to market, but long-term performance and scalability. We optimized for the critical moment when a customer connects their 1,000th camera, ensuring that performance does not just hold, but excels.
The Takeaway
There is no substitute for building technology with architecture tailored to perform at scale. FFmpeg remains an incredible tool, indispensable for its intended purposes in video processing and format conversion, supporting a vast array of video codecs and audio encoders. However, when a product requires handling thousands of concurrent live streams, integrating complex AI video processing, and achieving sub-second latency, an architecture explicitly designed for these challenges is paramount.
We built Visylix from the ground up in Kolkata. Today, it demonstrably handles more concurrent streams than any other video management platform on the market. That would not have been possible by merely wrapping FFmpeg. Sometimes the hardest engineering decision is the right one.