[ANSWERED] Large Quantity of NDI Outputs Causing Load Issues
-
Hello internet,
My apologies if this has been addressed somewhere else, but the search function was being very buggy for me. Unusable on Firefox and barely usable on Chrome.
I am using Isadora to output to more stages than ever before. The project calls for sending (40) 640x480 NDI streams out over a gigabit network to (40) Raspberry Pis, each connected to an individual CRT TV built into a huge wall. I will be using Izzy to map a single video image over the wall of TVs. So far I have tested the setup with (27) Pis. Each NDI output is a separate stage defined in Izzy.
With this many outputs Izzy is lagging. All I'm outputting is a simple test pattern I created, consisting of a background color cycling through hues, a text draw and a few shapes. Load hovers around 100% in Izzy, and the framerate doesn't quite reach the 30fps I want, BUT the playback computer has plenty of overhead: CPU, GPU and RAM loads are all only around 25%. The receiving Pis use about 75% CPU to decode on their end. So, why is Izzy so loaded down when the PC has more power to give? Any ideas on where the bottleneck is and how I could address it?
Playback PC:
Izzy 3.0.7
Windows 10 Pro
AMD Ryzen 3 3100
16GB RAM
NVIDIA GeForce GTX 1660
-
Unfortunately I think you are hitting a current bottleneck in Isadora 3. NDI is rather CPU intensive, and I believe this process is tied to the scene processing thread (@mark, can you confirm?), so that thread/CPU core is maxed out. (You will likely see one core running at 100% if you open Task Manager, click on the Performance tab, and right-click the graph to change it to "Logical processors".)
If you have a Mac you could likely get around this by installing Isadora into multiple application folders and running multiple copies (each will have its own core scene process), breaking the job up between these instances.
Unfortunately this workaround is not available on Windows (only one isadora.exe process is allowed to run).
-
Like DusX explained, Isadora can only run as a single instance on Windows; on Mac you can have multiple instances of Isadora running. NDI encoding is a heavy process in general. What you could try is sending fewer stages at a higher resolution and splitting them up again at the Pis. For example, the program running on your Pi could use an offset to determine which part of the NDI stream it should display. (Since you already have Ethernet running, you could also use OSC for this. Since you know the network IP of each of your Pis, it is quite easy to send them their offsets over OSC, for example during the startup of your patch.)
Hopefully your Pis will be able to manage a higher resolution :)
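To make the OSC idea concrete, here is a minimal sketch using only the Python standard library (so no python-osc dependency). The `/offset` address, port, IPs and tile coordinates are all hypothetical examples; whatever address pattern the receiver software actually listens for will differ:

```python
# Sketch: tell each Pi which tile of the big NDI frame to display, via OSC.
# The /offset address, IPs and port below are hypothetical examples;
# adapt them to whatever your receiver software actually listens for.
import socket
import struct

def osc_message(address: str, *ints: int) -> bytes:
    """Build a minimal OSC message carrying int32 arguments."""
    def osc_str(s: str) -> bytes:
        b = s.encode("ascii") + b"\x00"
        return b + b"\x00" * (-len(b) % 4)  # pad to a 4-byte boundary
    type_tags = "," + "i" * len(ints)
    payload = b"".join(struct.pack(">i", v) for v in ints)
    return osc_str(address) + osc_str(type_tags) + payload

# Example mapping of Pi IPs to (x, y) pixel offsets within the master frame
PIS = {
    "192.168.1.101": (0, 0),
    "192.168.1.102": (640, 0),
    "192.168.1.103": (0, 480),
}
OSC_PORT = 8000  # assumed listening port on the Pis

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
for ip, (x, y) in PIS.items():
    try:
        sock.sendto(osc_message("/offset", x, y), (ip, OSC_PORT))
    except OSError:
        pass  # host unreachable from this machine; skip it
```

Running this once at patch startup (e.g. triggered from a startup scene) would push each Pi its crop offset in one pass.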
-
I think the others have covered the performance expectations, as NDI is a processor-intensive task in general that will peg certain limited resources before others. But I am concerned about a particular point you mentioned that has not been addressed yet: you need 40 640x480 NDI streams to traverse a single gigabit network. There are some "gotchas" I wanted to share:
- You are extremely likely to saturate that gigabit connection unless you are running NDI at lower bitrates than the traditional spec. The general rule of thumb is that five 1080p 30fps camera feeds can traverse a gigabit network (in my experience, it's often fewer before saturation). If we assume that a 640x480 image will use 1/4 of the normal 150-200 Mb/s NDI bandwidth, landing you in the area of 35-50 Mb/s per stream, your total expected bandwidth is about 1.4 Gb/s on the low end. So you'd hit a wall there.
- Now, let's assume you are using a lower bitrate for the feeds, as is common in NDI feeds generated by scan converters, for example. Those hover around 50 Mb/s at 1080p, so let's again assume your feeds are a quarter of that; now you are in a comfortable range of 500 Mb/s total. I don't know which spec Isadora implements by default; however, in either situation there are some other issues:
- Packet frequency is an important element to consider on your NIC. Last month, I purchased a consumer-grade NIC for a computer to use in a new broadcast center, and when my NDI network collapsed, I realized it was because of the frequency of packets coming into my NIC not being handled properly by the chip in the NIC even though the 10Gig LAN was properly set up for the dozen or so NDI feeds we were using. This computer was no slouch, either, roughly $10K, but that NIC needed to be replaced with enterprise hardware. Hopefully your sending computer can be set up to manage this.
- On the decode side, microcontrollers and even chips like the Pi can run into trouble when there are many NDI feeds traversing a network, even when they are not actually subscribed to any of them. I found that a Skaarhoj controller in the new studio was malfunctioning because its internal CPU was overloaded merely by rejecting the mDNS packets from the eight NDI cameras on premises. I worry that 40 senders may interrupt the performance of your Pis, even if things appear to be working in isolation. There is a reason companies like BirdDog make specialized ASICs for decoding NDI; it's really tricky to do correctly.
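To make the bandwidth arithmetic above easy to rerun with your own numbers, here is a quick back-of-the-envelope sketch. The per-stream rates are my scaled-down assumptions (1/4 of the ~140 Mb/s low end of full NDI at 1080p30, and 1/4 of ~50 Mb/s low-bandwidth NDI), not measured values:

```python
# Rough NDI bandwidth budget for 40 streams on a gigabit link.
# Per-stream rates are assumptions scaled down from typical 1080p30
# figures (full NDI and low-bandwidth NDI), not measurements.
LINK_MBPS = 1000  # gigabit Ethernet, ignoring protocol overhead

def total_mbps(streams: int, per_stream_mbps: float) -> float:
    """Aggregate demand in Mb/s for a number of identical streams."""
    return streams * per_stream_mbps

for label, rate in [("full-bandwidth 640x480", 35.0),
                    ("low-bandwidth 640x480", 12.5)]:
    demand = total_mbps(40, rate)
    verdict = "over" if demand > LINK_MBPS else "under"
    print(f"{label}: {demand:.0f} Mb/s total ({verdict} a {LINK_MBPS} Mb/s link)")
```

At the full-bandwidth assumption the 40 streams land at 1400 Mb/s (over the link); at the low-bandwidth assumption they land at 500 Mb/s (comfortably under).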
Apologies if you have already considered and accounted for all of the above; I wanted to post these notes for the benefit of others who may come across this thread, as it seems like you have done a good share of testing. That studio I mentioned had a six-figure budget for dealing with 8 cameras, and getting the NDI network set up properly even in that world was very challenging (in part because we needed to tie multiple buildings back together and such, but in part due to some of these items).
Good luck with this project! I hope we get to see more of it.
-
Hopefully, our advice gave you some pointers / showed you the right direction.
I marked this topic as Answered, feel free to reply or create a new Technical Support ticket.
https://support.troikatronix.c...
- Juriaan
-
Thank you all for the info! My apologies for not responding sooner. This project is on the back burner at the moment and my mind moved on to other things.
That's a bummer, but it is understandable. Follow-up question: does each scene in Isadora run in its own thread, or are all active scenes running in the same thread? Is there any sense in getting around this by activating multiple scenes in Izzy and having each one output some of my NDI streams? Can I split the burden among my CPU cores that way?
The only Macs I have available are quite old. I think my solution might be to jump over to multiple Windows playback machines synced in their operation via OSC triggers. 27 outputs was almost stable on my current machine; if I pop over to two machines, each handling 20 streams, I think it could work.
-
Thanks for all the info! I did already consider some of these issues. It was a while back, so I can't cite my source at the moment, but I found a technical doc somewhere that listed NDI 640x480 bandwidth needs at 20 Mb/s, and I have found that to be true in real-world testing as well. 40 x 20 = 800 Mb/s, which I figure leaves a solid amount of overhead on a gigabit line. In any case, I told my client we'd probably bottleneck somewhere in the high 30s, and if we get to 40 streams, it will be a good day.
Additionally, my Raspberry Pi receiving software (https://dicaffeine.com/) has a checkbox for "low bandwidth". I'm not sure exactly what it does, but it lowers my needs to 12 Mb/s per stream, and with SD video on CRT screens, whatever quality loss happens is certainly not noticeable to me! So I think my bandwidth needs will actually be around 12 x 40 = 480 Mb/s.
The packet frequency and decode-side rejection issues I had not thought about (appreciate you mentioning them!), but I also feel pretty good about not hitting them. My half-scale (27-stream) test didn't show either issue. My Pi receivers were all happily displaying full-frame-rate streams; my only issues were on the supply side.
-
I have done that in a past life (4 years ago) when I was using TCP Syphon to send video signals to RPi 1B+ units. Unfortunately, my current receiver setup (https://dicaffeine.com/ running on a mix of RPi 3B and 3B+ units) doesn't have that capability. The units are already at 75-80% CPU just decoding the SD NDI streams, and the software doesn't support cutting things up.
Another option I am considering is forking my network and playback system into two layers. A master Izzy playback computer would send 2-4 NDI streams, each containing half or a quarter of the wall's imagery, to 2-4 matching second-tier Izzy computers over network A. The second-tier computers would then cut those images up and pop them out to their respective RPi receiving nodes over network B. Maybe a little crazy? Maybe it could work?