E+ and Reinforcement Learning - advice on approach and framework

Hello everyone! For a 'little' research project, I am developing and analyzing reinforcement learning (RL) algorithms for HVAC control, building control, and energy optimization using the E+ simulator. My testbed is a vertical farming facility.

This is a new project, and I am completely new to E+ and anything like it. I started reading about E+ and scoping out the available tools just a few weeks ago. My current focus is reviewing existing research and technology that combines RL and E+ (there are several papers). I am realizing there is a substantial learning curve for all of this.

To use E+ in an RL framework, I need to read the 'state' of the running simulation (temperature and humidity sensors, weather, etc.) and 'act' on it (actuate ventilation, fans, setpoints, etc.) at sub-hourly timesteps. The higher the frequency, the better.
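To make the interaction I have in mind concrete, here is a rough sketch of the observe/act loop in Python. Everything in it is a placeholder of my own: `ToyZoneSim`, its made-up first-order dynamics, the reward, and `thermostat_policy` are hypothetical stand-ins and not part of E+ or any of the tools listed below. The only point is the shape of the loop: read state, choose an action, advance one sub-hourly timestep, repeat.

```python
from dataclasses import dataclass


@dataclass
class ToyZoneSim:
    """Hypothetical stand-in for a running building simulation (NOT an E+ API)."""
    steps_per_hour: int = 6        # 6 steps per hour = 10-minute timesteps
    zone_temp_c: float = 24.0      # made-up initial zone temperature
    outdoor_temp_c: float = 30.0   # made-up constant weather
    step_count: int = 0

    def observe(self):
        """Return the 'state' the agent is allowed to see."""
        return {"zone_temp_c": self.zone_temp_c,
                "outdoor_temp_c": self.outdoor_temp_c}

    def step(self, cooling_kw):
        """Apply one action, advance one timestep, return (state, reward)."""
        dt_h = 1.0 / self.steps_per_hour
        # Toy dynamics: drift toward outdoor temperature, minus cooling effect.
        drift = 0.5 * (self.outdoor_temp_c - self.zone_temp_c)
        self.zone_temp_c += dt_h * (drift - cooling_kw)
        self.step_count += 1
        # Toy reward: penalize distance from a 22 C target and energy used.
        comfort_penalty = abs(self.zone_temp_c - 22.0)
        energy_penalty = 0.1 * cooling_kw * dt_h
        return self.observe(), -(comfort_penalty + energy_penalty)


def thermostat_policy(state):
    """Placeholder policy: cool at 5 kW whenever the zone is above 22 C."""
    return 5.0 if state["zone_temp_c"] > 22.0 else 0.0


def run_episode(policy, hours=24, steps_per_hour=6):
    """Run the state -> action -> step loop for a fixed number of hours."""
    sim = ToyZoneSim(steps_per_hour=steps_per_hour)
    state = sim.observe()
    total_reward = 0.0
    for _ in range(hours * steps_per_hour):
        action = policy(state)             # 'act' on the current state
        state, reward = sim.step(action)   # simulator advances one timestep
        total_reward += reward
    return sim, total_reward


if __name__ == "__main__":
    sim, total = run_episode(thermostat_policy)
    print(sim.step_count, round(total, 2))
```

An RL agent would replace `thermostat_policy`, and the real simulator would replace `ToyZoneSim`. What I am trying to figure out is which of the tools below gives me the equivalent of `observe()` and `step()` against a live E+ run.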

I am 'aware' of OpenStudio, OpenStudio Coalition, SketchUp plugin, Euclid, Modelica, BCVTB, EMS, Python EMS API, RL Testbed for E+ (IBM), and Spawn (yet to come?), but I have yet to dig into any serious documentation, and I don't know where to begin.

I am looking for advice on how to approach this project, and the learning it requires, in a logical order, so that I can avoid reading hundreds or thousands of pages of documentation serially. I know I have a lot of reading and learning ahead of me, but I can't afford to waste days or weeks digging into something that doesn't align with my project objectives.

Some foresight would be greatly appreciated. Thank you!

