lowest latency HDMI/SDI/NDI capture
-
Hi everyone, I'm working on a project that uses a lot of live video, which is also going to be streamed simultaneously, so I'm trying to reduce the pain points of making it all work.
One of those pain points will always be syncing audio due to video input lag. Since we'll be using multiple cameras, it'll be really important to have consistent latency so that we've even got a chance of synced audio.
So the question is:
What hardware do I need to get 4 camera feeds into Isadora simultaneously with minimal (or at the very least consistent) latency?
things that I've played with already include elgato camlink, elgato HD60S, unbranded USB capture dongles, Blackmagic ATEM mini.
___________________________________________________
Below the line is further info for people who really want to get into the weeds with me:
I accept that the ATEM Mini would seem like a solid solution, as it's a single device so it'll have the same latency across the board, but it would limit what we're doing to what the ATEM Mini is capable of internally, as opposed to the broad suite of options that Isadora offers. I also haven't found a way to control the ATEM with Isadora that isn't too fiddly to be viable (I'm using Windows, so atemOSC isn't an option).

The project is a simultaneous live performance in a theatre and a live-streamed event, so the audio synchronisation thing is a bit of a headscratcher, to be honest. At the moment the plan is to broadcast the video stream using Isadora's RTMP capabilities, but that does come with kinks to iron out. Since the audio cues are all coming from Isadora, they may need to be triggered at slightly different times for the stream than they are for the room (i.e. for tightly choreographed dance to look right, the music will need to play a number of ms later for the stream than it does in the room with the performers).

Similarly, we also have some live audio spoken by the cast coming into Isadora that needs to be broadcast, which will have its own latency complications. If it was only one camera and 2 performers, then I could get away with syncing their speech by running radio mics through the audio input on the camera. That way (in theory) that audio will have the same latency as the video. However, with 4 cameras I need to make sure that each camera has the same latency if I want things to sync up at all.
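To make that cue-offset idea concrete: once the camera-to-stream chain latency is measured, each stream cue just fires that much later than the in-room cue. A quick sketch; the latency figure and function names here are made up for illustration, not measurements:

```python
# Hypothetical cue-offset helper. CHAIN_LATENCY_MS is an assumed, measured
# camera -> Isadora -> RTMP delay, not a real number from the show.
CHAIN_LATENCY_MS = 120

def stream_cue_time(room_cue_ms: int, chain_latency_ms: int = CHAIN_LATENCY_MS) -> int:
    """When to trigger the stream copy of a cue that fires at room_cue_ms in the room."""
    return room_cue_ms + chain_latency_ms

# Room cue at 5.000 s -> stream cue at 5.120 s with the assumed 120 ms chain.
print(stream_cue_time(5_000))  # 5120
```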
In terms of any amplified speech in the theatre, I'm sort of coming to terms with the idea that my performers may be working with 2 mics at any given time: one for the theatre, and another for the stream. That way their speech isn't having to go through Isadora before being amplified.
These are the headaches anyway.
Thanks in advance if you have any genius insights, or obvious solutions that I'm missing. -
Hi Matt,
Interesting problems... I have some experience with ATEM Minis (which I love), both as an Isadora input and for multicam live streaming of events.
Can you please be a bit clearer about the show? Are you projecting live camera feeds in the performance itself? Or are you using the cameras just to broadcast the show? Or are you using the same cameras to do both things? If you could be a bit clearer about that, I might have some suggestions.
BTW there are ways of controlling the ATEM with Isadora in Windows. There's a little app that someone made called Broadcast Controller which works, though can be temperamental. You can use OSC to control it.
More sophisticated is the open-source Bitfocus Companion, which is much more geared towards control from the Elgato Stream Deck, but which will let Isadora control your ATEM with OSC. Now, I have only had partial success doing so, but I suspect that's because my OSC skills are less than zero. Companion is really well supported, with an active community on Slack and also Facebook.
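If it helps to see what's actually going over the wire when Isadora (or anything else) talks OSC to Companion, the messages are simple enough to build by hand. A minimal Python sketch; the `/press/bank/<page>/<button>` address and port 12321 are what I believe some Companion builds use, so treat them as assumptions and check your version's OSC docs:

```python
import socket
import struct

def osc_message(address: str, value: int) -> bytes:
    """Encode a minimal OSC message carrying a single int32 argument."""
    def pad(b: bytes) -> bytes:
        # OSC strings are null-terminated and padded to a 4-byte boundary
        return b + b"\x00" * (4 - len(b) % 4)
    return pad(address.encode()) + pad(b",i") + struct.pack(">i", value)

# Assumed Companion address/port -- verify against your Companion version.
msg = osc_message("/press/bank/1/2", 1)  # "press" page 1, button 2
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(msg, ("127.0.0.1", 12321))   # fire-and-forget UDP, no reply expected
```

Isadora's own OSC Transmit actor does the same encoding for you; this is only to demystify what those actors send.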
Cheers
Mark (no, not him!) -
@mark_m I will be sending live camera feeds within the show as well as broadcasting.
For example, there are a few sections where we're creating a projected backdrop live, which is then being shot for the broadcast. That obviously couldn't be pulled off with an ATEM, since you can't process 2 different camera feeds simultaneously. I'm aware that we could use chromakey instead of rear projection, but I prefer the look of RP and it works better for the experience of the theatre audience.
Essentially, I lose the ability to work with the various video feeds in simultaneous & separate ways if I do everything through the ATEM which will become very limiting since the whole show is built around playing with live video in various ways.
Although, as I look to other projects, it's interesting to hear about Broadcast Controller. I've not come across that one (and I've had some less-than-stellar experiences with software that claims to provide MIDI control for an ATEM), so it's great to hear that you've had success with it.
I have tried Bitfocus Companion, but I was really put off by the hacky way you have to set it up to get around its Stream Deck focus, and by the fact that it doesn't send any commands over the USB connection but requires networking to operate. I know that second point is probably true for every technique you can use to control the ATEM, but it feels suboptimal for my use case. -
Thanks for the clarification. I wonder if you need to think about this as two separate projects: the live performance, and the streaming of the live performance.
When you say "What hardware do I need to get 4 camera feeds into Isadora simultaneously" are you trying to show four video feeds simultaneously, or do you just want to be able to switch between four different sources?
I always have my ATEM attached to the computer via ethernet: you have access to a lot more controls that way, including adjusting the latency. Both the techniques I mention need the ATEM to have an IP address, so they won't work for you if you just want to attach via USB. -
@mark_m Like I said, I will definitely be needing to process multiple camera feeds at the same time in various ways, which is the reason the ATEM is not a good fit for this specific project. The live performance and the stream are inherently connected; it isn't as simple as streaming a theatre show. The creation of a film in the live moment for streaming IS the show. So all of the cameras are on stage and being used for different things at different points.
It's very complicated, which is why I'm wanting to go into the next phase of development with as ideal a setup as I can.
It is relatively easy to capture 4 video feeds into Isadora; literally any of the devices that I mentioned in my original post will do the job (although avoid using any kind of USB hub for this kind of work, as they tend to complicate things, so you need 4 available USB ports). The only issue is the need for minimal latency, and for the latency that exists to be consistent between the devices.
Of course, this whole thing gets further complicated given that there is an inherent latency in getting the video signal from the stage to the show computer, which is in the tech box. So, to give you an idea of where we've come from: during the original R&D about a year ago, each of the 4 cameras (various Sony mirrorless cams, because that's what we had to hand at the time) sent a signal out through micro HDMI, which was converted with a signal booster box to a video-over-ethernet protocol, sent over 30m of Cat 7, then converted back to HDMI and routed into a PC via 4 Elgato Cam Links. At that time RTMP wasn't available in Isadora, so our main video out from Isadora went into an Elgato HD60 S (which has a passthrough as well as capture) so that we could send the feed to both a projector and a second PC running OBS, which took care of the streaming.
It was hacky and messy, but we learned a lot. I'm still in the process of gradually refining both the content of the show and the overall setup.
The main concern I have before going back into development is resolving some of the problems we encountered with audio sync, which was a bit of a nightmare. I think some of the problems I was having will be minimised by the mic audio being run through the cameras, as opposed to a sound desk to be recombined with the stream at the end (as was the case with the R&D), but I want to minimise any latency difference between the different camera feeds. -
Whilst writing that last reply, and describing some of the quirks of using multiple USB capture devices, I got to thinking about USB bandwidth. One of the quirks we encountered was that not only did we need to avoid USB hubs, but we needed to carefully work out which USB ports would work for video capture in conjunction with each other.
Essentially the capture cards overwhelm a bus, so each one needs to be kept away from the others.
So maybe the latency problems come from USB bandwidth issues.
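A quick back-of-envelope calculation supports the bandwidth theory. Assuming uncompressed UYVY 4:2:2 capture (2 bytes per pixel) and a rough real-world USB 3.0 figure of about 3.2 Gbit/s after protocol overhead (both assumptions, and compressed devices like the Cam Link will differ):

```python
# Rough uncompressed-capture bandwidth estimate, assuming UYVY 4:2:2
# at 2 bytes/pixel. The 3.2 Gbit/s "usable USB 3.0" figure is a rough
# practical assumption, not a spec value (nominal is 5 Gbit/s).
WIDTH, HEIGHT, FPS = 1920, 1080, 60
BYTES_PER_PIXEL = 2  # UYVY 4:2:2

gbits_per_feed = WIDTH * HEIGHT * BYTES_PER_PIXEL * FPS * 8 / 1e9
print(f"one 1080p60 feed: {gbits_per_feed:.2f} Gbit/s")  # ~1.99 Gbit/s

usable_usb3 = 3.2  # Gbit/s, assumed real-world throughput
feeds_per_bus = int(usable_usb3 // gbits_per_feed)
print(f"uncompressed feeds per USB 3.0 bus: {feeds_per_bus}")  # 1
```

So under those assumptions, a single uncompressed 1080p60 feed eats most of a bus on its own, which would explain why two capture cards sharing one controller fall over.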
Given that thought, does anyone have experience with PCI capture devices like the Blackmagic DeckLink Quad?
It's an expensive option (although it's probably cheaper than 4 Cam Link 4Ks), but if it significantly improves reliability and reduces latency it may be worth it. -
Hi @thatmattrogers,
I'm doing much the same thing as outlined in this post: DeckLink Quad 2 simultaneous input and output | TroikaTronix Forum
Cheers,
Hugh
-
@thatmattrogers said:
one of those pain points will always be syncing audio due to video input lag. since we'll be using multiple cameras it'll be really important to have consistent latency so that we've even got a chance of synced audio.
The folks in Office Hours would also be great to ask about this. They're very knowledgeable.
Here's the link to sign up for the Zoom call that's basically a hangout spot 22 hours a day for pro AV folks all over the world who are part of the Office Hours community: https://090-media.zoom.us/meeting/register/tZ0odu2tqD8rH9KS0B0Vwf5I6d1i6MxkjwQ7
You can just show up anytime and ask questions. It's great.
Best wishes,
Woland
-
Ah, you're making 'live cinema'. OK, now I understand what you're trying to do.
It seems to me that you're really trying to reinvent a wheel that's already been built and refined by the likes of Katie Mitchell, Imitating The Dog, Kiss & Cry Collective, etc. You could do worse than trying to pick their brains. Andrew from Imitating The Dog is on this forum somewhere.
But for the live audience the delight is in not just seeing the 'film' but also seeing how it's constructed. Think of a typical Katie Mitchell show, with the Cinema Screen above showing the 'film' and the making of the film going on below the screen. We, the audience, get to see the whole process. It seems to me that what you're proposing for the remote audience is only showing them the 'film'. Is that right, or have I got that wrong?
On to the mechanics of it: each link in your chain will introduce latency, so with something like HDMI to Ethernet, then back to HDMI you're introducing latency with each of those extender boxes. HDMI is not your friend in situations like these. You should be looking to use SDI instead, where you can have a single cable from camera to capture device over a much longer run than with HDMI. Camcorders like the Sony EX1 are excellent, have SDI out, and are cheap as chips now 'cause they're 'only' 1080p, not UHD or 4K.
Likewise, as you said, USB is slow. You want a computer with the Blackmagic Cards or similar if you want four simultaneous inputs. If you have four SDI cameras your latency would be very low, and would be the same for each camera. -
In my experience, NDI is not a reliable option for combining live video in your setup, because it tends to have unpredictable latency, and that is a big issue for audio sync. I suggest you stay with SDI as long as you can and convert to HDMI only when you deliver signals to a specific device (monitor, projector and so on). A pretty good solution to combine Isadora with cameras and also stream everything is to use vMix as the central hub for all your signals and route them where you need. Of course, you need some capture cards to convert SDI and grab it into vMix. To let Isadora and vMix see each other, you have a couple of options: 1) use NDI (if you stay on the same machine, latency is not an issue), or 2) use the recently developed screen capture actor (if I remember well, vMix also has a screen capture feature).
-
Hi All,
I've not been on the forum for a while (been very much lights, not so much video for the last few months). So, sorry I missed this thread when it started. (And thanks for referencing us @mark_m - hi ).
I (or Imitating The Dog) have had good results with 4-way PCI capture cards previously. I think we used a Datapath version on our last show, but have used BM DeckLink cards as well. We had some problems with the BM machine, I think, but I wasn't directly involved in that project so can't speak to why; it may well be that the new Issy release fixes any issues we were having.
If a 4-way (or 8-way) PCI card like the BM DeckLink is affordable, then that's definitely the best way to go. I wouldn't really have anything to add that people haven't said, or that isn't on the thread @CitizenJoe linked to.
There is one other option to consider, though, if you can't afford to build a machine around a PCI capture card: use a single capture card (USB, USB-C, whatever) with the four cameras combined into a single image using a multiviewer, like these from BM. The MV might add a frame of latency, but it will be consistent, and if you can stretch to a 4K MV and a single 4K capture card you won't be losing any resolution. So, assuming you can get all your video from the cameras to the MV with the same amount of latency, from that point on they will be locked in sync (as they're now part of a single 4K video feed). In Issy you then split that feed up into your 4 cameras.
This solution probably isn't 'as good' in the round as the 4-way PCI:
- you're adding an extra thing which will likely add some latency and is another point of failure
- there may also be performance impacts from capturing a single 4k source compared to 4x HD sources (or not, maybe the opposite, I don't know without testing)
- you've got to do some more work in Issy to break up the 4k image into the 4 cameras
But it might be a lot cheaper, and I imagine it will be very consistent. You can also combine the multiviewer option with other options; for example, if you need to sync audio to 4 'main' cameras but decide you need a 5th camera for things where the audio sync isn't key (backdrops, cutaways, etc.), then you can add the 5th camera on a different capture card on a different bus. Or have the multiview as one input to your ATEM and use the other inputs for other things...
[As a slight aside, the multiview trick can be a great way to get large numbers of video sources into a machine, especially if resolution isn't important. I think I once had 16x low-res security cameras coming in as a single source to be broken up and used from a single input. It was very much about quantity, not quality, but they were security cameras; it wasn't meant to look good!]
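For what it's worth, the crop maths for breaking a 2x2 multiview frame back into four feeds is trivial. In Issy you'd do the actual splitting with crop/chopper actors; this little Python sketch (layout and camera names assumed) just computes the rectangles those actors would use:

```python
# Crop rectangles for a 2x2 multiview layout. The "camN" names and the
# quadrant-to-camera mapping are assumptions; match them to however your
# multiviewer actually arranges its inputs.
def quadrant_crops(width: int, height: int) -> dict:
    """Return (x, y, w, h) crop rectangles for each quadrant of a 2x2 grid."""
    w, h = width // 2, height // 2
    return {
        "cam1": (0, 0, w, h),  # top-left
        "cam2": (w, 0, w, h),  # top-right
        "cam3": (0, h, w, h),  # bottom-left
        "cam4": (w, h, w, h),  # bottom-right
    }

# A UHD multiview splits into four full-HD feeds:
for name, rect in quadrant_crops(3840, 2160).items():
    print(name, rect)  # each rect is 1920 x 1080
```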
-
Now, sound is very much not my thing but here I go anyway...
I would suggest that, rather than giving your performers 2 mics (unless that's an artistic choice), you route your sound inputs through a desk as normal and mix everything there, including all your pre-recorded sound. You can build this mix to sound correct in the room as normal.
First, try taking that mix from the desk as a sound input for the stream. There's a good chance it will work and you won't need to do anything else to it (assuming you get your sync issue sorted).
If that doesn't work well, or you need a different mix, see if you can do it from the desk as a separate mix on an aux output, so you can get it right in the room but make separate adjustments to the sound for the stream.
I would be very wary of using the camera microphones, it feels like a way to make a lot of trouble for yourself. If you can fix the sync issues between the cameras (which I think you can) then you should be able to make your show audio sync to the video.
-
Finally, on the ATEM control topic, we've had excellent results using Companion to control ATEM switches. They were great in rehearsals when we needed buttons on the Streamdeck to quickly switch cameras.
I think for the show we stripped it back and used a wee app simply called atemOSC, which receives the OSC commands and passes them on to an ATEM. Eventually, the OSC came from the LX desk, which was driven by timecode; no buttons needed! But it could obviously be tied into your Issy patch.
-
@kathmandale I hear what you're saying, but for now the only thing I'm trying to solve is the live-streaming element of the project. So if the most reliable way to sync the audio over multiple camera feeds is to run our sound through one of those cameras' mic inputs... that's what I'm going to do. In reality, at the venues we're going to be using, we probably aren't even going to need to amplify the cast in the theatre space itself.
This thread is really only about trying to get that syncing problem sorted, and primarily it's about getting the right hardware for the job. I'm going to do some experiments with a PCI video capture card with 4 inputs next week, which should at least remove USB bandwidth issues from the equation, and I'll come back with the results of my testing so there's a resource on the latency of various devices available for anyone who needs it.
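For anyone who wants to run the same sort of glass-to-glass test: point each camera at a millisecond timer on a monitor, photograph the timer and the captured feed side by side, and read off both values. The maths is trivial; the numbers in this sketch are placeholders, not measurements:

```python
# Helper maths for a glass-to-glass latency test. The example numbers are
# made up for illustration; real values come from photographing a running
# millisecond timer alongside the captured feed showing that same timer.
def latency_ms(shown_ms: int, captured_ms: int) -> int:
    """Timer value on the monitor minus the (older) value visible in the captured feed."""
    return shown_ms - captured_ms

def latency_frames(ms: float, fps: float = 60.0) -> float:
    """Convert a latency in milliseconds to frames at a given frame rate."""
    return ms * fps / 1000.0

delay = latency_ms(shown_ms=14_262, captured_ms=14_112)  # placeholder readings
print(delay, "ms =", latency_frames(delay), "frames at 60fps")  # 150 ms = 9.0 frames
```

Repeating this per camera/capture device is what tells you whether the latency is consistent between feeds, which is the bit that actually matters for the audio sync.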