My two cents:
If you want to use an ISO-focused workflow, you're actually doing really well. I'd suggest Ninja over Skype any day, though: Skype has a lot of reliability issues (particularly with audio), while Ninja is pretty rock solid, especially when OBS is on Windows. In the broadcast world, Skype is unofficially banned from modern workflows; even Skype TX boxes are being tossed out of windows in major studios.
If you are doing this with a professional production budget, the large event companies are using Zoom Rooms. Weird, right? A program designed for conference rooms and digital signage is powering most of the remote guest systems you see in the field. Well, it's true! Look no further than 090's public workflows (Alex Lindsay breaks them down). They bring individual contributions into their studio via a Zoom Room set up for each talent, which gives them three controllable outputs from each contributor. You could do a less aggressive version of this, perhaps with one room for every two or three participants. 090 bridges these rooms via hardware systems (Blackmagic ATEMs, Constellations, etc.) to send a mix-minus back to all participants, and then they have audio and video ISOs going into their broadcast systems. Some do this entirely in the cloud, merging hundreds of thousands of participants in AWS. Very cool, but very expensive! That would be the step above what is done with Ninja in the professional broadcast world to solve the issues you face.
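In case "mix-minus" is unfamiliar: each participant gets back the full mix minus their own signal, so they never hear themselves on a delay. Here is a minimal sketch of the idea in Python, assuming each source is just a list of integer samples. The function name and the sample data are made up for illustration; a real system does this in a hardware or software audio mixer, not in a script like this.

```python
def mix_minus(sources):
    """Build one return feed per participant: everyone's audio except their own.

    `sources` maps a participant name to a list of integer samples.
    This layout is a hypothetical one chosen only for the illustration.
    """
    if not sources:
        return {}
    # Only mix as many samples as the shortest source provides.
    length = min(len(samples) for samples in sources.values())
    # Sum every source once to get the full program mix.
    full_mix = [sum(samples[i] for samples in sources.values())
                for i in range(length)]
    # Subtract each participant's own samples from the full mix.
    return {
        name: [full_mix[i] - samples[i] for i in range(length)]
        for name, samples in sources.items()
    }


feeds = mix_minus({
    "host": [1, 2],
    "guest_a": [10, 20],
    "guest_b": [100, 200],
})
print(feeds["host"])  # the two guests summed, with the host left out
```

Summing once and subtracting each source keeps the work proportional to the number of participants, which is roughly why a single mixer can serve many return feeds.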
If you don't need genuine ISO workflows, ZoomOSC can help you quite a bit. Running in multi-monitor mode, you can have Isadora collect up to two 1080p feeds from each ZoomOSC instance (by pinning to screen 1 and screen 2). I treat my Zoom inputs like virtual PTZ cameras, triggered by vMix or Isadora. I break down my workflow in this video. This gets me most of the benefits of an ISO workflow at a fraction of the computational expense.
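To make the "virtual PTZ" idea concrete, here is a rough sketch of sending a pin command to ZoomOSC over OSC using only the Python standard library. The message encoding follows the OSC 1.0 spec. The address names (`/zoom/userName/pin`, `/zoom/userName/pin2`) and the receive port 9090 are assumptions on my part, so check them against the command list and network settings of your ZoomOSC version. `send_pin` is a hypothetical helper, not part of any library.

```python
import socket
import struct


def _osc_string(text):
    """Encode an OSC string: null-terminated, padded to a multiple of 4 bytes."""
    data = text.encode("utf-8") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address, *args):
    """Encode a basic OSC message with string and 32-bit integer arguments."""
    type_tags = ","
    payload = b""
    for arg in args:
        if isinstance(arg, str):
            type_tags += "s"
            payload += _osc_string(arg)
        elif isinstance(arg, int):
            type_tags += "i"
            payload += struct.pack(">i", arg)  # big-endian int32
        else:
            raise TypeError(f"unsupported OSC argument: {arg!r}")
    return _osc_string(address) + _osc_string(type_tags) + payload


def send_pin(user_name, second_screen=False, host="127.0.0.1", port=9090):
    """Ask ZoomOSC to pin a participant (addresses and port are assumed)."""
    address = "/zoom/userName/pin2" if second_screen else "/zoom/userName/pin"
    packet = osc_message(address, user_name)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


packet = osc_message("/zoom/userName/pin", "Alice")
print(len(packet), packet)
```

In practice you would fire `send_pin("Alice")` or `send_pin("Bob", second_screen=True)` from whatever is driving your cues. Isadora and vMix can send the same messages natively, so a script like this is mostly useful for testing.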
As @Armando suggests, you could use ZoomOSC alongside some of @peuclid's incredible user actors to get ISO outputs from the Zoom gallery view. This is fantastic if you don't need to scale the video feeds beyond the size they already have in the gallery view (you just want to rearrange, animate, or show/hide them). A ton of Zoom theater was produced this way over the Fall, using Isadora and ZoomOSC together. It is not as powerful as genuine ISO outs from Zoom (believe me, I am working on that too, and I am demoing a new product that can do this on AV Tech Talks tomorrow!). But this simulated workflow can do quite a bit for you.
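To show why the gallery approach limits your feed size, here is a small sketch that computes crop rectangles for each tile of a gallery capture. It assumes a plain uniform grid filling the whole frame, which is a simplification: Zoom's real gallery layout adds margins and centers partial rows, and the actual user actors get tile positions their own way. The function name is hypothetical.

```python
import math


def gallery_cells(count, width=1920, height=1080):
    """Return (x, y, w, h) crop rectangles for `count` tiles in a uniform grid.

    Assumes the grid fills the frame edge to edge, left to right, top to
    bottom. This is an idealized layout, not Zoom's exact one.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    cols = math.isqrt(count - 1) + 1   # smallest cols with cols * cols >= count
    rows = -(-count // cols)           # ceiling division
    cell_w = width // cols
    cell_h = height // rows
    return [
        ((i % cols) * cell_w, (i // cols) * cell_h, cell_w, cell_h)
        for i in range(count)
    ]


for rect in gallery_cells(9):
    print(rect)
```

With nine participants in a 1080p gallery, each tile in this idealized grid is only 640x360, so anything you crop out has to be scaled up to fill the frame. That is the trade-off compared to pinned 1080p feeds or genuine ISOs.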
Finally, if I can offer one specific piece of advice, it is to split your software across multiple computers. Have an ingest station that converts your WebRTC contributions to NDI within OBS, or multiple ZoomOSC pinning machines running NDI Scan Converter, or at least a computer dedicated to your RTMP encoding and AV sync, isolated from Isadora. You can open many doors for yourself by doing this.
Maybe one day Isadora will have native streaming inputs and outputs, but for now, this is how I would approach these challenges. Let me know if I can be of further assistance to you.