@hiroko_tanahashi I would advise having all outputs come from a single device if possible. Moving texture data between devices is an expensive operation (less so with Apple silicon's unified memory). With a Mac Studio you can drive your 4 x 4K outputs individually from the machine and then use a Datapath or something similar to get 4 x HD from another single output.
If your videos are in ProRes they are hardware decoded on a Mac, so there is not much overhead here. The total frame requirement is only about 230 MB for one frame of all the videos at your resolution. Every step you use will likely add another buffer, so by the time you output the images you will likely use 8 or so times that. And then if you fade between scenes, at least double that. Ergo, 64 GB of RAM should be fine. As you cannot upgrade it later, though, get as much as you can afford when you buy an Apple silicon device.
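To put rough numbers on that estimate, here is a minimal sketch in Python. The 230 MB figure, the "8 or so" buffers and the doubling for crossfades all come straight from the paragraph above; they are ballpark guesses rather than measurements, and the function and variable names are just made up for illustration.

```python
# Back-of-envelope memory estimate using the rough figures from the post.
# All three inputs are estimates, not measured values.
FRAME_SET_MB = 230        # one frame of every video at the stated resolutions
BUFFERS_PER_PIPELINE = 8  # "8 or so" buffers by the time the image is output
CROSSFADE_FACTOR = 2      # two scenes are live at once during a fade


def estimate_working_set_mb(frame_set_mb, buffers, crossfade_factor):
    """Return the approximate texture working set in megabytes."""
    return frame_set_mb * buffers * crossfade_factor


working_set_mb = estimate_working_set_mb(
    FRAME_SET_MB, BUFFERS_PER_PIPELINE, CROSSFADE_FACTOR
)
working_set_gb = working_set_mb / 1000  # decimal gigabytes

print(f"Approximate working set: {working_set_mb} MB (~{working_set_gb:.1f} GB)")
```

Even if the real buffer count turns out to be several times higher, the result stays far below 64 GB, which is why that amount should be comfortable.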
Going the PC route will work as well, but I’m not into the upload and download of all the texture data to and from the GPU that is required to use multiple graphics cards or to use a DeckLink card for output. These transfers are limited by the GPU's download (readback) bandwidth, which is much lower than what you get when uploading data from the CPU to the GPU. Hardware decoding is a bit more tricky on a PC, as it depends on your GPU and drivers. HAP, which is great for PC playback, has no real hardware decoder. H.264 and H.265 do, but it's trickier to end up with a lossless codec that is the equivalent of ProRes 4444 (the only real way to retain all colour information is to use a 4:4:4 codec).
The cost will be pretty much equal for both systems by the time you get all the required bits. The Mac Studio is tiny, much less fragile and quieter (it fits easily in your suitcase on a plane). A PC to do this will be an excellent machine (bigger, noisier and harder to transport), and if you go the DeckLink route you will end up with high quality capture capabilities for another project if you ever need them.