Stale PBS jobs
Posted: Thu Nov 07, 2013 2:24 pm
Hi,
We are running CASINO on our cluster with PBS Torque and the Moab scheduler. Quite frequently I see orphaned CASINO processes left on nodes after a job has been deleted or has crashed. I am working with the users to determine whether these jobs were deleted by hand, crashed, or ran out of time.
Has anyone else seen this problem, or does anyone know why slave processes might be left over when the master dies?
Thanks,
Albert DeFusco, Ph.D.
Research Assistant Professor
Technical Director, Center for Simulation and Modeling
University of Pittsburgh
Pittsburgh, PA 15260
412-648-3094
http://www.sam.pitt.edu