Synchronizing MPEG DASH streams with traditional broadcast TV
Have you ever had a video call where the sound and image were out of sync? Or maybe you tried to watch a movie, but the subtitles were slightly delayed? Then you know how important synchronization is. Even the smallest offset can spoil an otherwise perfect experience.
In ImmersiaTV we give you additional video streams that extend what you see on TV. But we don’t want you to feel like you are watching two videos at once. You should be able to shift your focus smoothly from the TV to a tablet or HMD and back, as one consistent experience. All the screens show the same story, the same events and actions, perhaps from different points of view, but at exactly the same moment in time. And for that they need to be perfectly synchronized.
With modern “smart” devices this is relatively easy. Your TV, tablet, phone and HMD can use the network to agree, precisely enough for human perception, on what they should present and when. This method is accurate and easy to use, but we don’t want to rest on our laurels.
What if your TV is not connected to an IP network and only receives a broadcast television signal? The same channel arrives on different people’s TVs with a different delay, depending on their location, provider and transmission method (terrestrial, satellite or cable). Now your smartphone must be really smart to synchronize with your TV!
In fact, the only option it has is to recognize directly what content your TV is playing. It records a short fragment of video or sound, and recognition algorithms then look it up in a database. The common idea behind them is to compute a “fingerprint” – a set of characteristic features of the sample – that can be compared efficiently to find the best match. This way we can detect what is on TV and adjust the additional devices to it.
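To make the fingerprint idea concrete, here is a toy sketch in Python (not the ImmersiaTV implementation, and all names are hypothetical). It derives sub-fingerprints in the spirit of energy-band-difference hashing: each short frame becomes a pattern of bits, and a recorded snippet is located in the reference audio by minimizing the Hamming distance between bit patterns:

```python
import numpy as np

def fingerprint(signal, frame=256, bands=16):
    """Toy sub-fingerprints: one bit per band pair, set when the energy
    difference between neighbouring bands grows from one frame to the next."""
    n_frames = len(signal) // frame
    frames = signal[:n_frames * frame].reshape(n_frames, frame)
    spectra = np.abs(np.fft.rfft(frames, axis=1))
    # average FFT bins into coarse energy bands
    edges = np.linspace(0, spectra.shape[1], bands + 1, dtype=int)
    energy = np.stack([spectra[:, edges[b]:edges[b + 1]].sum(axis=1)
                       for b in range(bands)], axis=1)
    diff = np.diff(energy, axis=1)        # difference across bands
    bits = np.diff(diff, axis=0) > 0      # difference across time
    return bits                           # shape: (n_frames - 1, bands - 1)

def best_offset(ref_bits, sample_bits):
    """Slide the sample fingerprint over the reference fingerprint and
    return the frame offset with the smallest Hamming distance."""
    n = len(sample_bits)
    dists = [np.count_nonzero(ref_bits[i:i + n] != sample_bits)
             for i in range(len(ref_bits) - n + 1)]
    return int(np.argmin(dists))

# Synthetic "broadcast" audio; the phone records a short excerpt of it.
rng = np.random.default_rng(0)
broadcast = rng.standard_normal(48000)
excerpt = broadcast[10240:10240 + 8192]

offset_frames = best_offset(fingerprint(broadcast), fingerprint(excerpt))
print(offset_frames * 256)  # recovered start of the excerpt in samples
```

In practice a real fingerprint must also survive loudspeaker-to-microphone distortion and room noise, which is exactly what the published algorithms are designed for; this sketch only shows the matching principle.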
We decided to start with synchronization based on audio recognition. It should be easier to use (you don’t need to aim the camera at the TV) and potentially more precise (sound offers higher temporal resolution). There is a very promising library, Aurio, that implements several of the methods underlying music recognition services.
Our initial tests made us really optimistic – this approach is worth pursuing, so we researched it further. Can existing tools be adapted to our scenario? Will the precision and reliability be sufficient in a real environment? We may need to find our own solution, but expect us to include this scenario in Pilot 3.
Author: Szymon Malewski, PSNC
Photo: Eryk Skotarczak, PSNC