Mobile video is growing like crazy. Zencoder customer PBS just announced that viewers watched over 88 million PBS videos on iOS devices in the month of November, 86 million for their PBS Kids app alone. That's a lot of video, and 89% growth since June.
All of this video - like most mobile video today - was delivered using HTTP Live Streaming (HLS), an Apple-created, HTTP-based format for streaming H.264 and AAC video in an MPEG-TS container. HTTP Live Streaming works by segmenting a long video into short pieces, typically 10 seconds each, and providing an M3U manifest that lists every segment. The player reads the manifest and pulls each segment in turn to ensure seamless playback. (If you want more in-depth info on HLS, check out our guide on best practices for iOS encoding.)
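For illustration, a minimal HLS media playlist looks something like this (the segment names and durations here are hypothetical):

```
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:0
#EXTINF:10.0,
segment-00000.ts
#EXTINF:10.0,
segment-00001.ts
#EXTINF:10.0,
segment-00002.ts
#EXT-X-ENDLIST
```

The player fetches the playlist, then downloads the .ts segments in sequence (re-polling the playlist as new segments appear, in the live case).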
The problem is that MPEG-TS is an inefficient format, especially at low bitrates. MPEG-TS can easily add 10-15% of overhead to a file compared to a format like MP4, which increases costs and decreases picture quality, MB-for-MB. And if you're deploying a video application to the App Store, HLS isn't optional - it's mandatory if you want to display video longer than 10 minutes. Apple has rejected countless applications from the App Store for not complying with this policy.
The good news for PBS and other Zencoder customers is that we have recently released major optimizations to our own custom HTTP Live Streaming segmenter. Our optimizations result in major file size improvements compared with many mainstream transcoding providers, and have the potential to save PBS 10% or more of their CDN and storage bill each month.
Here are some results. Notice that the overhead is significantly higher, as a percentage, on low bitrate content. Of course, mobile video delivery is usually done at low bitrates.
| Version | Requested bitrate | Overhead |
| --- | --- | --- |
| Zencoder (optimized) | 364 kbps | 4.83% |
| Zencoder (optimized) | 864 kbps | 3.23% |
| Zencoder (unoptimized) | 364 kbps | 16.34% |
| Zencoder (unoptimized) | 864 kbps | 8.80% |
| Rhozet | 364 kbps | 16.03% |
| Rhozet | 864 kbps | 8.48% |
| Sorenson | 364 kbps | 13.58% |
| Sorenson | 864 kbps | 10.44% |
These numbers were taken from Apple's Media Stream Validator tool (mediastreamvalidator). We ran the same file through each system using the same settings - H.264 video, AAC audio, 320p resolution, 64 kbps audio, 300 kbps/800 kbps video - and asked for HLS output with 10-second segments. Notice that the overhead goes down as bitrates increase, so at high bitrates, TS overhead matters less. (If you're interested in running some tests to verify the results, get in touch - after you run the tests, so we can't manipulate them - and we'll give you a service credit to cover the processing.)
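If you just want a rough sanity check without Apple's tool, you can compare the bytes on the wire against the bitrate you asked for. Here's a sketch, assuming a directory of .ts segments for one rendition - the path and numbers are assumptions, not our production setup:

```python
import os

segment_dir = "output/364k"     # hypothetical path to one rendition's segments
requested_bitrate = 364_000     # bits/sec we asked for (300k video + 64k audio)
segment_duration = 10.0         # seconds per segment, per our settings

segments = [f for f in os.listdir(segment_dir) if f.endswith(".ts")]
wire_bits = sum(
    os.path.getsize(os.path.join(segment_dir, f)) for f in segments
) * 8
payload_bits = requested_bitrate * segment_duration * len(segments)

# Rough: assumes the encoder hit its target bitrate and the last segment
# is full length; mediastreamvalidator measures this more precisely.
overhead = (wire_bits - payload_bits) / payload_bits
print(f"approximate container overhead: {overhead:.2%}")
```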
How is this possible?
MPEG-TS is an older container format that is widely used in the broadcast world, for everything from satellite transmission to Blu-ray discs. When the MPEG-TS standard was designed, long ago, it targeted resource-constrained devices that were more sensitive to processing power than to bandwidth. Because of that design, and because of its small fixed packet size (188 bytes), MPEG-TS can introduce far more file overhead than a format like MP4. That was an appropriate trade-off in 1995: inefficient but simple packet placement let early-90s hardware decode MPEG-TS, and before HLS came around, MPEG-TS was rarely used for low-bitrate transmission - a bit of padding overhead doesn't really matter on a 15 Mbps 1080p stream. But now that MPEG-TS is used in a low-bitrate context, and today's mobile devices have more processing power than the supercomputers of a few decades ago, the trade-off has reversed: bandwidth-efficient (but more computationally complex) packet placement is much preferable.
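To put rough numbers on that: a TS packet is 188 bytes with at least a 4-byte header, and when a muxer starts every PES packet on a fresh TS packet, the unused tail of the last packet gets stuffed with padding. A back-of-the-envelope sketch (the frame sizes below are illustrative assumptions, not measurements):

```python
import math

TS_PACKET = 188                  # bytes per TS packet, fixed by the spec
TS_HEADER = 4                    # minimum TS header, so ~2.1% is the floor
PAYLOAD = TS_PACKET - TS_HEADER  # 184 usable bytes per packet

def naive_overhead(pes_bytes: int) -> float:
    """Overhead when a PES packet gets its own run of TS packets and
    the leftover space in the final packet is padded out."""
    packets = math.ceil(pes_bytes / PAYLOAD)
    return (packets * TS_PACKET - pes_bytes) / pes_bytes

# Illustrative sizes: ~1250 bytes for one 300 kbps video frame at 30 fps,
# ~190 bytes for one 64 kbps AAC frame (~23 ms of audio).
for label, size in [("video frame", 1250), ("audio frame", 190)]:
    print(f"{label}: {naive_overhead(size):.1%} overhead")
```

Exactly how bad it gets depends on how a given muxer groups frames into PES packets, but tiny audio PES packets are where a naive muxer bleeds the most at low bitrates.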
HLS doesn't require the simplistic packet placement (and the padding overhead that comes with it) that traditional MPEG-TS muxers use, and so Apple has optimized its own proprietary HLS tools to reduce overhead. The reductions can be significant.
We've done the same at Zencoder. By combing through the MPEG-TS format at the TS and PES packet level, we've been able to achieve a level of optimization similar to Apple's own tools. We've done it through the same approach Apple takes: more efficient placement of PES packets, which allows unnecessary padding to be removed.
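As a sketch of why placement matters: packing more audio frames into each PES packet amortizes the padded final TS packet over more payload. (The 14-byte PES header and 186-byte AAC frame size below are typical values, used here as assumptions - this illustrates the principle, not our exact algorithm.)

```python
import math

TS_PACKET = 188
TS_HEADER = 4
PAYLOAD = TS_PACKET - TS_HEADER  # 184 usable bytes per TS packet
PES_HEADER = 14                  # typical PES header carrying a PTS
AAC_FRAME = 186                  # ~one frame of 64 kbps AAC (assumption)

def overhead(frames_per_pes: int) -> float:
    """Container overhead for an audio PES packet carrying N frames,
    with the final TS packet padded out to 188 bytes."""
    pes = PES_HEADER + frames_per_pes * AAC_FRAME
    wire = math.ceil(pes / PAYLOAD) * TS_PACKET
    return (wire - frames_per_pes * AAC_FRAME) / (frames_per_pes * AAC_FRAME)

for n in (1, 4, 16):
    print(f"{n:2d} frames per PES -> {overhead(n):6.1%} overhead")
```

The overhead falls from over 100% at one frame per PES to single digits at sixteen - the same kind of thinking, applied across TS and PES packet placement, is what drives the optimized numbers in the table above.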
Why not just use Apple's tools?
Some people already get this level of optimization by using Apple's proprietary segmenter (mediafilesegmenter). If you want to deploy on Apple hardware, that's a reasonable way to go. But Zencoder runs on Linux, in the cloud, and a large-scale deployment of Apple hardware just isn't practical for us. A heterogeneous environment is hard to manage: it takes significant devops work to run a cloud service, and each additional platform - hardware, operating system, software stack, security model, monitoring, scaling - adds significant complexity. Co-locating transcoding and HLS segmenting in the same system also gives us better performance.
But more importantly, HLS is no longer just an Apple standard. Many devices are starting to support it - Roku, Android, and a number of upcoming platforms are embracing HLS as a streaming standard. So locking ourselves into Apple's implementation may not be the right decision long-term. What happens if Apple's mediafilesegmenter tool doesn't support everything Roku supports, or one HLS implementation starts to deviate from another? Without access to the source code, we would be out of luck. HTTP Live Streaming is becoming important enough that we need a flexible implementation of our own - one that will work well with every player, on any device, both now and in the future.