resources JS vs actors

DillTheKraut

Another question has come up regarding performance.

My patch contains 40 Text Draw actors. If I put half of them on "stand by", the LOAD drops to a certain extent, but far less than it does if I take them out completely (deleting them).
The same goes for the Video Delay actors (I did not try others).
Is this normal? I thought "stand by" would be comparable to "shut off" or "bypass" and would free up nearly 100% of the resources the actor uses.

mark

@dillthekraut said:

I added some video delay actors, and again Isadora's LOAD is going up to 120%, while the overall system CPU stays at 90% idle and the GPU is at around 25%.

The Video Delay actors convert the GPU-based image to a CPU-based one. I made this choice when designing the actor because of memory constraints: the memory on a typical video card is simply less than the amount of RAM on a system.

For example, a five second delay at 30fps of 1920x1080 images requires 1.2 Gigabytes. If you add four of those delays, you've now run out of memory on the 4 GB GPU of my relatively powerful MacBook Pro and Isadora crashes.
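As a rough check of that arithmetic, here is a minimal Python sketch. It assumes 4 bytes per pixel (8-bit RGBA), which is what the 8.29 MB per frame figure quoted later in this thread implies; the function names are made up for illustration and are not part of Isadora.

```python
def frame_bytes(width, height, bytes_per_pixel=4):
    """Size of one uncompressed frame. 4 bytes/pixel (8-bit RGBA) is an assumption."""
    return width * height * bytes_per_pixel


def delay_bytes(width, height, fps, seconds, bytes_per_pixel=4):
    """Memory needed to hold every frame of a delay of the given length."""
    return frame_bytes(width, height, bytes_per_pixel) * fps * seconds


one_frame = frame_bytes(1920, 1080)
five_sec = delay_bytes(1920, 1080, fps=30, seconds=5)

print(f"one frame:        {one_frame / 1e6:.2f} MB")
print(f"5 s delay:        {five_sec / 1e9:.2f} GB")
print(f"four such delays: {4 * five_sec / 1e9:.2f} GB")
```

Under these assumptions a single five second delay comes to roughly 1.24 GB, and four of them to just under 5 GB, which is more than a 4 GB card can hold.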

Given that most systems have so much more CPU memory than GPU memory, it seemed wise to make this choice. (I expect any professional-level system to have 16 GB, but really a lot of folks now have 32 GB or more.) Unfortunately there is a high cost performance-wise when moving images from the GPU to the CPU. That's what you're seeing when you add those Video Delay actors.

If I put half of them on "stand by", the LOAD drops to a certain extent, but far less than it does if I take them out completely

"stand by"? I don't know what you mean. Do you mean Pause Engine???

Best Wishes,
Mark

DusX

@dillthekraut said:

LOAD is going up to 120%

LOAD is a measure of how much time is being used to process each frame, relative to the target frame rate. It is NOT a measure of your system resource usage. In Isadora, the most important thing to know is whether the scene can be processed at the selected frame rate, and LOAD tells you that. A LOAD of 100% means that calculating/rendering a frame takes all the time available between frame deliveries. This will lead to dropped frames.
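To make that definition concrete, here is a small Python sketch. The formula is only an illustration of the description above (processing time divided by the time available per frame); how Isadora actually computes LOAD internally is not stated in this thread, and the function name is hypothetical.

```python
def load_percent(frame_time_ms, target_fps):
    """Share of the per-frame time budget used by processing (illustrative only)."""
    budget_ms = 1000.0 / target_fps  # time available between two frames
    return 100.0 * frame_time_ms / budget_ms


# At 30 fps the budget is about 33.3 ms per frame.
for frame_time_ms in (16.0, 33.0, 40.0):
    print(f"{frame_time_ms:5.1f} ms per frame -> LOAD {load_percent(frame_time_ms, 30):.0f}%")
```

With these example timings, a frame that takes 40 ms at a 30 fps target is over budget, which is the situation in which frames get dropped, regardless of how idle the CPU or GPU look in a system monitor.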

Isadora is both multi-threaded and single-threaded. Numerous processes, including video playback, are heavily multi-threaded. Video effects, mapping, compositing, etc. are massively multi-threaded due to the use of the GPU. The scene graph (the calculations, routing, etc.) you build within your scene is single-threaded.

DillTheKraut

Thank you for the explanation @mark, I suspected something like that.

@mark said:

If I put half of them on "stand by", the LOAD drops to a certain extent, but far less than it does if I take them out completely
"stand by"? I don't know what you mean. Do you mean Pause Engine???

No, I was just looking for a word to compare it with, and thought "bypass" would do. It seems that setting "bypass" to 'on' is not the same as deactivating the actor. What I expected was a full recovery of the resources the actor consumes while NOT bypassed.
E.g. LOAD without the actor at all = 50%,
adding an actor with 'bypass' off = actor is working = LOAD 80%,
set actor bypass to "on" = LOAD back to 50%.

But this isn't the case, instead it is like this:

LOAD without the actor at all = 50%,
adding an actor with 'bypass' off = actor is working = LOAD 80%,
set actor bypass to "on" = LOAD 65% instead of expected 50%,

This is only an example. The actual numbers may differ.

DillTheKraut

@dusx thank you for the clarification. I'm aware of this. But still, shouldn't there be a connection between system resources and the LOAD (or rather the possible frame rate and cycles)?

My question here is: what is the bottleneck if the CPU and GPU are far from being stressed? As there isn't any video file playing and all content is either generative or comes from video capture, it shouldn't be the flash drive (1500 Mbit/s).

Is it maybe the bus system that carries the data between CPU, RAM and GPU? Or maybe just the RAM itself?

That would fit Mark's explanation of how the Video Delay actor works.

DusX

@dillthekraut said:

maybe the bus system that carries the data between CPU, RAM and GPU?

Without looking at your file I can only guess, but one bottleneck that is certainly common is moving GPU data to the CPU (uploading to the GPU is fast).

If you would like me to take a deeper look, please feel free to open a support request, where I can then request a copy of your project file.

DillTheKraut

@dusx

Did it. Thank you!

mark

@dillthekraut said:

My question here is: what is the bottleneck if the CPU and GPU are far from being stressed? As there isn't any video file playing and all content is either generative or comes from video capture, it shouldn't be the flash drive (1500 Mbit/s).

The speed of the hard drive has nothing to do with this issue.

As I mentioned above about the Video Delay actor, it needs to move the image from the GPU to the CPU. Then I said "Unfortunately there is a high cost performance wise when moving images from the GPU to the CPU. That's what you're seeing when you add those video delay actors."

GPUs are designed to pull data from CPU RAM very, very quickly. But they are not designed to go in the other direction. (Why is this? Because GPUs are designed for gaming, not for video processing. A game never needs to get the image back from the GPU, so GPUs are not designed to deal with this use case.)

In any case, when you ask the GPU to give the data back to you, it causes what's called a "stall" -- the GPU needs to finish all operations at the moment you ask for the image. Such a stall destroys the parallel processing (= threading) that makes the GPU so fast. Moreover, the CPU needs to sit and wait for all the pending GPU operations to complete.
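A toy model of that effect, as a Python sketch: when CPU and GPU work overlap, the slower of the two sets the frame time, but a synchronous readback makes the CPU wait for the GPU and then for the copy, so the times add up instead. All timings below are made-up numbers for illustration, not measurements from Isadora, and the model ignores many details of real drivers.

```python
def frame_time_pipelined(cpu_ms, gpu_ms):
    """CPU and GPU work overlap, so the slower side sets the frame time."""
    return max(cpu_ms, gpu_ms)


def frame_time_stalled(cpu_ms, gpu_ms, readback_ms):
    """CPU waits for pending GPU work, then for the copy back: the work serializes."""
    return cpu_ms + gpu_ms + readback_ms


# Hypothetical per-frame costs in milliseconds.
cpu_ms, gpu_ms, readback_ms = 12.0, 15.0, 10.0
budget_ms = 1000.0 / 30  # time available per frame at 30 fps

pipelined = frame_time_pipelined(cpu_ms, gpu_ms)
stalled = frame_time_stalled(cpu_ms, gpu_ms, readback_ms)

print(f"budget at 30 fps: {budget_ms:.1f} ms")
print(f"pipelined frame:  {pipelined:.1f} ms")
print(f"stalled frame:    {stalled:.1f} ms")
```

In this simplified picture neither processor is doing much more work in the stalled case; they are mostly waiting on each other, which would be consistent with a high LOAD alongside low CPU and GPU usage.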

It is possible that we could make a Video Delay actor that keeps all the frames on the GPU, which would make it more efficient. The problem is it's not trivial to find out how much memory is available in the GPU and to get the actor to fail gracefully if there isn't enough GPU memory.

Again, every frame of a 1920x1080 image consumes 8.29MB. You want a ten second delay at 30 fps? That's 30 x 10 x 8.29MB = 2.49GB. A lot of GPUs could handle this, but some would run out of memory. It was this fact that led to my decision to keep the delayed frames in CPU RAM.

Best Wishes,
Mark

DillTheKraut

@mark said:

GPU to the CPU

OK, I missed that part, as I wasn't aware that the direction of the data transfer makes a difference. So the bottleneck is not the bus system but the design of the graphics cards? In that case VRAM/shared Mem comes to mind, but maybe that is a whole other story?

Thank you for always taking the time to explain things in depth.

It shows the complexity of programming live video tools, and therefore the effort you put into Isadora.

This community, and having the creator as such an active part of it, makes the Isadora project even more one of a kind!

Thanks a lot!

DillTheKraut

@mark said:

That's 30 x 10 x 8.29MB = 2.49GB

I guess that's what is copied back and forth per second? Given that a PCIe 2.0 x16 bus has a throughput of 8 GB/s (that's the spec of the old 5,1 Mac), this would only allow a maximum of 3 of those delays, right?

mark

@dillthekraut said:

I guess that's what is copied back and forth per s?

No, that's not the case. You need to get 8.29 MB for each frame. For each render cycle, you're grabbing the 'video in' frame from the GPU (slow, bottleneck with stalls) and storing it in CPU RAM, but you are also grabbing one of the delayed frames in the CPU and shipping it to the GPU (fast). Assuming a frame rate of 30fps, that's 2 x 8.29MB x 30 = 497.4 MB transferred per second.
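The same calculation as a Python sketch, set against the 8 GB/s PCIe figure mentioned earlier in the thread. It assumes 4 bytes per pixel, and it uses the unrounded frame size, so it lands at about 497.7 MB/s rather than 497.4. The bus figure is a theoretical peak, not a measured rate.

```python
FRAME_BYTES = 1920 * 1080 * 4  # assumes 4 bytes per pixel (about 8.29 MB per frame)
FPS = 30
BUS_BYTES_PER_S = 8e9          # PCIe 2.0 x16 figure quoted in this thread (theoretical peak)

# One frame read back from the GPU plus one frame uploaded to it, per render cycle.
per_second = 2 * FRAME_BYTES * FPS
bus_share = per_second / BUS_BYTES_PER_S

print(f"transfer per delay: {per_second / 1e6:.1f} MB/s")
print(f"share of the bus:   {bus_share:.1%}")
```

Under these assumptions one delay uses only a small fraction of the bus bandwidth, which fits the point that the cost comes from the stall on readback rather than from the raw amount of data moved.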

RE: this:

VRAM/shared Mem

Video RAM on the GPU is not shared with the CPU... at least on most existing computers with discrete graphics cards. Now, I'm actually not sure about integrated GPUs like the Intel UHD Graphics 630 1536 MB on my computer... maybe there is some sharing there? (I just looked it up; kinda complicated.) Interestingly, I know that on the new M1 chips the RAM is shared by the CPU and GPU, which might make this bottleneck go away as we transition to those machines. (Again, not 100% certain about this... inferring this from a few things I've read.)

Best Wishes,
Mark

liminal_andy

@mark any thoughts on the new resizable BAR technologies that have just become available on most GPUs? Would this have an impact on Isadora's capabilities?