Restart an AWS PAT?
Hello,
My connection to AWS seems to time out. I've been running fairly large Design of Experiments (DOE) algorithmic analyses, some of which need to run overnight. Twice now PAT has stopped mid-stream: the connection to AWS remains open, but models are no longer making progress. I'm on PAT version 2.7.0 with AMI 2.7.1. This time it hung at around 22% complete on a 10,240-case simulation. Thank you for any insight.
Regards.
Is it just PAT that is unresponsive, or is the server on AWS unresponsive too (for example, when you open it in a web browser)?
What's your server instance type? How many datapoints are you running, and how big is each datapoint file?
The server instance was an m3.2xlarge, running 10,240 datapoints; each run is approximately 50-100 MB.
The server seems alive: EC2 monitoring shows the CPU as idle. It seems like PAT just stops sending new run instructions, and the Resque monitoring shows no activity.
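To make "PAT just stops sending new run instructions" concrete, you could record the number of completed datapoints every so often and flag a stall once that count has stayed the same for some time window. The sketch below is hypothetical. The `Sample` type and `is_stalled` function are illustrative names, not part of PAT or the OpenStudio server. It also leaves out how the counts are collected (for example, read by hand off the server's web page). The 30-minute window is an arbitrary assumption; tune it to how long a single datapoint usually takes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    minutes: float  # time since the run started
    completed: int  # datapoints finished at that time


def is_stalled(samples: List[Sample], window_minutes: float = 30.0) -> bool:
    """Return True if the completed count has not increased during the last
    `window_minutes` (hypothetical threshold; tune it per model)."""
    if not samples:
        return False
    latest = samples[-1]
    # Track the most recent sample at which the completed count went up.
    last_change = samples[0].minutes
    for prev, cur in zip(samples, samples[1:]):
        if cur.completed > prev.completed:
            last_change = cur.minutes
    return latest.minutes - last_change >= window_minutes


# Illustrative numbers only, not measurements from this run.
history = [Sample(0, 0), Sample(60, 900), Sample(120, 2250), Sample(180, 2250)]
print(is_stalled(history))  # no progress for the last 60 minutes
```

A time window is used instead of a single unchanged reading because long-running datapoints can legitimately leave the count flat for a while.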
Also try the '2.7.1-largescale1' AMI. It includes some load-balancing changes that keep the server node from getting overloaded with worker processes, which can make it unresponsive.
Well, 10,240 datapoints at 50 MB each is 512 GB (and roughly 1 TB at 100 MB each), so your instance probably ran out of disk space. You can verify that by using the server.pem key to SSH into the server node (user name 'ubuntu') and running 'df -h'.
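The back-of-the-envelope estimate above can be written as a small Python sketch. It multiplies the datapoint count by a per-datapoint size and compares the result with the free space that `shutil.disk_usage` reports for a directory. Running it on the server node gives roughly the same answer as checking `df -h`. The function names and the default path `/` are illustrative assumptions, not part of PAT or the OpenStudio server. Point `path` at whichever volume actually holds the results. Sizes are decimal (1 MB = 10^6 bytes), which matches the 512 GB figure.

```python
import shutil

MB = 10**6  # decimal megabyte, matching the "50 MB x 10,240 = 512 GB" estimate
GB = 10**9


def estimate_storage_bytes(num_datapoints: int, mb_per_datapoint: float) -> int:
    """Total bytes needed if every datapoint result is kept on disk."""
    return int(num_datapoints * mb_per_datapoint * MB)


def fits_on_disk(required_bytes: int, path: str = "/") -> bool:
    """Compare the estimate with the free space on the filesystem holding `path`
    (assumed location; use the volume where results are actually written)."""
    free = shutil.disk_usage(path).free
    return required_bytes <= free


if __name__ == "__main__":
    for size_mb in (50, 100):
        need = estimate_storage_bytes(10_240, size_mb)
        print(f"{size_mb} MB/datapoint -> {need / GB:.0f} GB needed")
    print("fits on /:", fits_on_disk(estimate_storage_bytes(10_240, 50)))
```

If `fits_on_disk` returns False for the full run, the analysis will fill the disk at some point. That is consistent with a run that stops partway through while the server itself still looks healthy.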