Upgrade OpenStudio from 3.2.1 to 3.3.0

asked 2021-11-09 11:44:51 -0500

Julio Betta

updated 2021-11-09 16:11:07 -0500

Hey everyone! I'm getting this error when I try to run a calibration simulation after upgrading from 3.2.1 to 3.3.0. Any ideas?

/opt/openstudio/server/app/lib/analysis_library/rgenoud.rb failed with voidEval failed: , /usr/local/lib/ruby/gems/2.7.0/gems/rserve-client-0.3.5/lib/rserve/connection.rb:178:in `void_eval'

complete error log:

/opt/openstudio/server/app/lib/analysis_library/rgenoud.rb failed with voidEval failed: , /usr/local/lib/ruby/gems/2.7.0/gems/rserve-client-0.3.5/lib/rserve/connection.rb:178:in `void_eval'
/usr/local/lib/ruby/gems/2.7.0/gems/rserve-simpler-0.0.6/lib/rserve/simpler.rb:74:in `command'
/opt/openstudio/server/app/lib/analysis_library/r/cluster.rb:83:in `start'
/opt/openstudio/server/app/lib/analysis_library/rgenoud.rb:224:in `perform'
/opt/openstudio/server/app/jobs/resque_jobs/run_analysis.rb:43:in `perform'
/usr/local/lib/ruby/gems/2.7.0/gems/resque-2.0.0/lib/resque/job.rb:168:in `perform'
/usr/local/lib/ruby/gems/2.7.0/gems/resque-2.0.0/lib/resque/worker.rb:308:in `perform'
/usr/local/lib/ruby/gems/2.7.0/gems/resque-2.0.0/lib/resque/worker.rb:897:in `block in perform_with_fork'
/usr/local/lib/ruby/gems/2.7.0/gems/resque-2.0.0/lib/resque/worker.rb:895:in `fork'
/usr/local/lib/ruby/gems/2.7.0/gems/resque-2.0.0/lib/resque/worker.rb:895:in `perform_with_fork'
/usr/local/lib/ruby/gems/2.7.0/gems/resque-2.0.0/lib/resque/worker.rb:264:in `work_one_job'
/usr/local/lib/ruby/gems/2.7.0/gems/resque-2.0.0/lib/resque/worker.rb:238:in `block in work'
/usr/local/lib/ruby/gems/2.7.0/gems/resque-2.0.0/lib/resque/worker.rb:235:in `loop'
/usr/local/lib/ruby/gems/2.7.0/gems/resque-2.0.0/lib/resque/worker.rb:235:in `work'
/usr/local/lib/ruby/gems/2.7.0/gems/resque-2.0.0/lib/resque/tasks.rb:20:in `block (2 levels) in <top (required)>'
/usr/local/lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/task.rb:281:in `block in execute'
/usr/local/lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/task.rb:281:in `each'
/usr/local/lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/task.rb:281:in `execute'
/usr/local/lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/task.rb:219:in `block in invoke_with_call_chain'
/usr/local/lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/task.rb:199:in `synchronize'
/usr/local/lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/task.rb:199:in `invoke_with_call_chain'
/usr/local/lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/task.rb:188:in `invoke'
/usr/local/lib/ruby/gems/2.7.0/gems/rake-13.0.6/lib/rake/application.rb:160 ...

Comments

Are you using the meta-CLI? If so, did you upgrade to the meta-CLI included in PAT 3.3.0, which just came out yesterday? https://github.com/NREL/OpenStudio-PA...

David Goldwasser ( 2021-11-09 12:28:45 -0500 )

hey David! so, I'm using openstudio_meta from Docker

FROM nrel/openstudio-server:3.3.0
Julio Betta ( 2021-11-09 12:38:52 -0500 )

@Julio Betta I was able to run the rgenoud algorithm on our 3.3.0, so it isn't a global issue with that algorithm. Try restarting the server and see if this goes away. If not, can you let me know whether it is a local or AWS deployment, and can you share the failed analysis log?

David Goldwasser ( 2021-11-09 13:57:42 -0500 )

we're using the latest version of openstudio-server-helm (v3.3.0) on Azure. Here are the links to the logs (https://www.dropbox.com/s/vvxs6zsgftd...) and the OSA (https://www.dropbox.com/s/drem25s9i89...)

thanks ;)

Julio Betta ( 2021-11-09 14:42:38 -0500 )

If restarting doesn't work, post the other logs (simulate_datapoint log, resque log from the admin page).
The R log looks like the R cluster never starts up. Are there worker nodes starting up? What does your helm configuration look like?

BrianLBall ( 2021-11-10 11:39:15 -0500 )

hey Brian! I recorded a quick video for you guys. This time I reinstalled OpenStudio on a different server (GCP), but I got the same error... I forgot to mention that this simulation was working in v3.2.1

https://www.dropbox.com/s/gk7vvfgb56p...

Julio Betta ( 2021-11-10 15:34:49 -0500 )

this is what a successful R cluster start up looks like from the logs:

[1] "Current working directory is /mnt/openstudio"

max_queued_jobs: 42

[1] "Starting cluster..."

[1] "Number of Workers: 100"

[1] "max timeout is: 180"

[1] "R cluster startup time: 24.0866537094116"

max_queued_jobs gets set to an ENV here https://github.com/NREL/OpenStudio-se...

and that ENV gets set here https://github.com/NREL/OpenStudio-se...

BrianLBall ( 2021-11-10 15:49:09 -0500 )

So why is your ENV for OS_SERVER_NUMBER_OF_WORKERS not being set? Are you running this on local hardware? If so, set that ENV to the number of workers you want and retry. If it's not running locally, who is hosting the server?

BrianLBall ( 2021-11-10 15:50:54 -0500 )

I'm running OS in a remote server (google cloud), and I followed the instructions from openstudio-server-helm (https://github.com/NREL/openstudio-se...). I didn't change any values... I did notice that OS_SERVER_NUMBER_OF_WORKERS isn't defined by values.yml. https://github.com/NREL/openstudio-se...

Julio Betta ( 2021-11-10 21:05:25 -0500 )

if you can, see if this PR from Tim helps https://github.com/NREL/openstudio-se...

BrianLBall ( 2021-11-12 09:41:30 -0500 )

I pulled the latest version of openstudio-server-helm and now OS_SERVER_NUMBER_OF_WORKERS is defined correctly. It's still throwing the same error though. This is what the R cluster startup looks like now:

[1] "Current working directory is /mnt/openstudio"
max_queued_jobs: 7[1] "Starting cluster..."
[1] "Number of Workers: 100"
[1] "max timeout is: 180"

... max_queued_jobs looks strange. 7 is the number of workers... Any idea what 7[1] means?

Julio Betta ( 2021-11-12 13:39:30 -0500 )

So 7 is the default number of workers for when it doesn't get set. You can set the size of the R cluster in the OSA like here: https://github.com/NREL/OpenStudio-se...

it should be the same size as the number of workers.

what error are you seeing now? can you post the logs

BrianLBall ( 2021-11-12 13:44:36 -0500 )
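
On the `7[1]` question: it most likely is not a single value. R's `cat()` does not append a newline and `print()` prefixes its output with `[1]`, so `max_queued_jobs: 7` and `[1] "Starting cluster..."` would simply be two outputs run together on one line (that the server logs it this way is an assumption). A minimal Python sketch that pulls both sizes out of the R cluster log so they can be compared with the OSA and the worker count; the helper name `cluster_sizes` is hypothetical and the log text is the one posted above:

```python
import re

def cluster_sizes(log_text):
    """Return (max_queued_jobs, number_of_workers) found in an R cluster log.

    Either value is None when its line is missing from the log.
    """
    # \d+ stops at the first non-digit, so the fused "7[1]" still yields 7.
    queued = re.search(r"max_queued_jobs:\s*(\d+)", log_text)
    workers = re.search(r"Number of Workers:\s*(\d+)", log_text)
    return (
        int(queued.group(1)) if queued else None,
        int(workers.group(1)) if workers else None,
    )

# Log excerpt copied from the comment above.
log = '''[1] "Current working directory is /mnt/openstudio"
max_queued_jobs: 7[1] "Starting cluster..."
[1] "Number of Workers: 100"
[1] "max timeout is: 180"
'''

queued, workers = cluster_sizes(log)
print(f"max_queued_jobs={queued}, Number of Workers={workers}")
if queued != workers:
    print("The two sizes differ; compare them with max_queued_jobs in the OSA.")
```

A difference between the two numbers is only a hint, not proof of the failure: the successful startup log earlier in this thread shows 42 and 100.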

it's the same error from the original message... about the osa: this is my "algorithm" attribute: https://pastebin.com/a9kpjsWL

I set "max_queued_jobs: 100", which is associated with "[1] Number of Workers: 100" in the R cluster log. So you're saying that OS_SERVER_NUMBER_OF_WORKERS should also be 100, right?

Julio Betta ( 2021-11-12 14:58:14 -0500 )

that was it! I changed the OSA to "max_queued_jobs: 7" and it works! Thanks Brian and David, you guys rock!!!

Julio Betta ( 2021-11-12 15:34:24 -0500 )
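
For anyone hitting the same error, the fix that worked in this thread (setting `max_queued_jobs` in the OSA to the number of workers) can be sketched in Python as follows. This assumes the OSA is JSON with the algorithm settings under `analysis -> problem -> algorithm`; that nesting and the helper name `align_max_queued_jobs` are assumptions for illustration, so check them against your own file:

```python
def align_max_queued_jobs(osa, n_workers):
    """Set max_queued_jobs in an OSA dict to n_workers; return the old value."""
    # Assumed layout: {"analysis": {"problem": {"algorithm": {...}}}}
    algorithm = osa["analysis"]["problem"]["algorithm"]
    previous = algorithm.get("max_queued_jobs")
    algorithm["max_queued_jobs"] = n_workers
    return previous

# Stand-in for json.load() on a real OSA file, using the values from this thread.
osa = {"analysis": {"problem": {"algorithm": {"max_queued_jobs": 100}}}}

previous = align_max_queued_jobs(osa, n_workers=7)
current = osa["analysis"]["problem"]["algorithm"]["max_queued_jobs"]
print(f"max_queued_jobs: {previous} -> {current}")
```

With a real file, load it with `json.load`, call the helper with the worker count your deployment actually has, and write it back with `json.dump` before resubmitting the analysis.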