PAT calibration project - performance and parallelization
Hi everyone,
I have been trying to use PAT to calibrate a building model to measured data. The building model was developed from the project specifications, and I have measured data for almost a full year at 10-minute granularity (timestep) for room temperatures and HVAC energy consumption. I also have an AMY weather file for that specific location.
I am currently running PAT in Algorithmic mode using OpenStudio-server on AWS, with node type "t2.large" and a maximum of 4 nodes.
My problem is the run time per simulation. Running the model locally with EnergyPlus takes around 30 s, while in PAT it takes almost 3 minutes.
I have done some testing and believe the problem is the "TimeSeries Objective Function" reporting measure I am using to calculate the calibration metric (CVRMSE). This measure is applied once per room with temperature readings (8 rooms), plus once for total HVAC energy consumption.
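(For context, the CVRMSE here is the usual ASHRAE Guideline 14 definition:

$$\mathrm{CV(RMSE)} = \frac{100}{\bar{m}} \sqrt{\frac{\sum_{i=1}^{n}(m_i - s_i)^2}{n - p}}$$

where $m_i$ and $s_i$ are the measured and simulated values at timestep $i$, $\bar{m}$ is the mean of the measured data, $n$ is the number of timesteps, and $p$ is the number of regression parameters, typically taken as 1.)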
I would like to know if there is a way to reduce the simulation time, and also whether it is possible to run simulations in parallel.
Thanks!
Sorry to be brief as I am traveling, but all the algorithms run in parallel. My recommendation is to choose a better instance type, with more cores, etc.
Thank you for your answer, and sorry for the late reply (I was on vacation). I have managed to run simulations in parallel by changing the eksctl cluster settings, increasing both the number of nodes and the minimum node count (e.g. --nodes 2 --nodes-min 2).
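For anyone else trying this, the full command I used looks roughly like the following (cluster name, region, and counts are placeholders for my setup):

```sh
# Sketch only: substitute your own cluster name, region, instance type, and node counts.
eksctl create cluster \
  --name openstudio-server \
  --region eu-west-1 \
  --node-type t2.large \
  --nodes 2 \
  --nodes-min 2 \
  --nodes-max 4
```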
Regarding the reporting measure, the problem still exists, since it increases the simulation time considerably. I have also noticed that having more run periods (my data has some missing days, so the year is split into several run periods) also has a huge impact. Could this be related to the SQL queries being repeated for every run period?
Also regarding the measure: it has a lot of cool features (like the graphical output), but could there be a way to report just the CVRMSE with less computational effort? Thank you!
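A minimal sketch of what I mean, assuming plain sqlite3 against eplusout.sql (table and column names follow the EnergyPlus SQLite output schema; the zone name is a placeholder for my rooms):

```python
import math
import sqlite3

def fetch_series(sql_path, key_value, var_name, frequency="Zone Timestep"):
    """Pull one output variable from eplusout.sql in a single query."""
    con = sqlite3.connect(sql_path)
    try:
        rows = con.execute(
            """
            SELECT rd.Value
            FROM ReportData rd
            JOIN ReportDataDictionary rdd
              ON rd.ReportDataDictionaryIndex = rdd.ReportDataDictionaryIndex
            WHERE rdd.KeyValue = ? AND rdd.Name = ? AND rdd.ReportingFrequency = ?
            ORDER BY rd.TimeIndex
            """,
            (key_value, var_name, frequency),
        ).fetchall()
    finally:
        con.close()
    # With several run periods, the Time table's EnvironmentPeriodIndex could be
    # joined here to filter, instead of re-querying once per run period.
    return [r[0] for r in rows]

def cvrmse(measured, simulated, p=1):
    """CV(RMSE) in percent, per the ASHRAE Guideline 14 definition."""
    n = len(measured)
    rmse = math.sqrt(sum((m - s) ** 2 for m, s in zip(measured, simulated)) / (n - p))
    return 100.0 * rmse / (sum(measured) / n)

# Placeholder zone name; the measured series would come from my CSV files,
# aligned to the same timesteps (dropping the missing days).
sim = fetch_series("eplusout.sql", "ZONE 1", "Zone Mean Air Temperature")
```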
Ah, it's coming back to me; sorry, it's been 6 years. There is an argument 'verbose_messages'. It is very useful for getting the output of the measure while debugging and getting the SQL query right, but it is a major performance hit, so make sure it is set to FALSE for production runs. I think there are a few other arguments that are not needed for production runs, like 'find_avail', which you can also set to FALSE.
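If you are editing the OSW directly, the reporting measure step should look roughly like this (measure dir name is from memory after 6 years, so double-check it against your measures folder):

```json
{
  "measure_dir_name": "TimeseriesObjectiveFunction",
  "arguments": {
    "verbose_messages": false,
    "find_avail": false
  }
}
```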
Thanks again for your answer. I was already setting all of these arguments to FALSE, but the reporting measures are still taking around 40% of the total simulation time. Is there something I am missing? Another question I have is related to parallelization: is it possible to have several simulation runs per node? I have managed to increase the number of parallel simulations to 6 by setting worker_hpa.minReplicas=6 in openstudio-server, but this just adds 6 instances (nodes), each with a CPU utilization of around 50%.
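For reference, this is roughly the override I am applying and how I am checking utilization (release and chart names are placeholders from my setup, and worker_hpa.maxReplicas is my assumption about the chart's HPA settings):

```sh
# Placeholder release/chart names; worker_hpa.minReplicas comes from my current setup,
# worker_hpa.maxReplicas is an assumption about the chart.
helm upgrade --install openstudio-server ./openstudio-server \
  --set worker_hpa.minReplicas=6 \
  --set worker_hpa.maxReplicas=6

# Per-node CPU utilization (requires metrics-server):
kubectl top nodes
```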