Video Concatenation? HLS to the Rescue!

Or, how to deliver unique viewing experiences through dynamic manifest generation This article is focused on HTTP Live Streaming (HLS), but the basic concepts are valid for other HTTP-based streaming protocols as well. A deep dive into the HLS protocol is beyond the scope of this article, but a wealth of information is available online including the published standard: HTTP Live Streaming.  This article is part of a series on online video encoding strategies designed for growth.  You can download the series as a whitepaper here. Concatenation and The Old Way Content equals value, so, in the video world, one way to create more value is by taking a single video and mixing it with other videos to create a new piece of content. Many times this is done through concatenation, or the ability to stitch multiple videos together, which represents a basic form of editing. Add to that the creation of clips through edit lists and you have two of the most basic functions of a non-linear editor. As promising as concatenation appears, it can also introduce a burden on both infrastructure and operations. Imagine a social video portal. Depending on the devices they target, there could be anywhere between a handful to many dozens of output formats per video. Should they decide to concatenate multiple videos to extend the value of their library, they will also see a massive increase in storage cost and the complexity of managing assets. Each time a new combination of videos is created, a series of fixed assets are generated and need to be stored. HTTP Live Streaming and The Manifest File The introduction of manifest driven HTTP-based streaming protocols has created an entirely new paradigm for creating dynamic viewing experiences. Traditionally, the only option for delivering multiple combinations of clips from a single piece of content was through editing, which means the creation of fixed assets. With technology such as HLS--since the playable item is no longer a video file, but a simple text file--making edits to a video is the same as making edits to a document in a word processor. For a video platform, there are two ways to treat the HLS m3u8 manifest file. Most simply, the m3u8 file can be treated as a discrete, playable asset. In this model, the m3u8 is stored on the origin server alongside the segmented TS files and delivered to devices. The result is simple and quick to implement, but the m3u8 file can only be changed through a manual process. Instead, by treating the manifest as something that is dynamically generated, it becomes possible to deliver a virtually limitless combination of clips to viewers. In this model, the m3u8 is generated on the fly, so it doesn’t sit on the server, but will be created and delivered every time it's requested. Dynamic Manifest Generation What is a manifest file? Most basically, it is a combination of some metadata and links to segments of video:
Exemplary Video A #EXTM3U #EXT-X-MEDIA-SEQUENCE:0 #EXT-X-TARGETDURATION:10 #EXTINF:10, Exemplary_A_segment-01.ts #EXTINF:10, Exemplary_A_segment-02.ts
The above m3u8 has two video segments of 10 seconds each, so the total video length is 20 seconds. Exemplary Video A, which, by the way is a truly great video, is 20 seconds long. Now let’s imagine we also have:
Exemplary Video B #EXTM3U #EXT-X-MEDIA-SEQUENCE:0 #EXT-X-TARGETDURATION:10 #EXTINF:10, Exemplary_B_segment-01.ts #EXTINF:10, Exemplary_B_segment-02.ts
And let’s also say that we know that a particular viewer would be thrilled to watch a combination of both videos, with Video B running first and Video A running second:
Superb Video #EXTM3U #EXT-X-MEDIA-SEQUENCE:0 #EXT-X-TARGETDURATION:10 #EXTINF:10, Exemplary_B_segment-01.ts #EXTINF:10, Exemplary_B_segment-02.ts #EXT-X-DISCONTINUITY #EXTINF:10, Exemplary_A_segment-01.ts #EXTINF:10, Exemplary_A_segment-02.ts
Now, instantly, without creating any permanent assets that need to be stored on origin, and without having involved an editor to create a new asset, we have generated a new video for the user that begins with Video B followed by Video A. As if that wasn’t cool enough, the video will play seamlessly as though it was a single video. You may have noticed a small addition to the m3u8, the “Discontinuity Flag”:
#EXT-X-DISCONTINUITY
Placing this tag in the m3u8 tells the player to expect the next video segment to be a different resolution or have a different audio profile than the last. If the videos are all encoded with the same resolution, codecs, and profiles then this tag can be left out. Extending the New Model The heavy lifting for making a video platform capable of delivering on-the-fly, custom playback experiences is to treat the m3u8 manifest not as a fixed asset, but as something that needs to be generated per request. That means that the backend must be aware of the location of every segment of video, the total number of segments per item, and the length of each segment. There are ways to make this more simple. For example, by naming the files consistently, only the base filename needs to be known for all of the segments, and the segment iteration can be handled programmatically. It can be assumed that all segments except the final segment will be of the same target duration, so only the duration of the final segment needs to be stored. So, for a single video file with many video segments, all that needs to be stored is base path, base filename, number of segments, average segment length, and length of the last segment. By considering even long-form titles to be a combination of scenes, or even further, by considering scenes to be a combination of shots, there is an incredible amount of power that can be unlocked through dynamic manifest generation. If planned for and built early, the architecture of the delivery platform can achieve a great deal of flexibility without subsequent increase in operational or infrastructure costs.