Search Results

Search found 12 results on 1 pages for 'qsub'.

Page 1/1 | 1 

  • script unable to find directories/files when running from qsub cluster script

    - by user248237
    I'm calling several unix commands and python on a python script from a qsub shell script, meant to run on a cluster. The trouble is that when the script executes, something seems to go awry in the shell, so that directories and files that exist are not found. For example, in the .out output files of qsub I see the following errors: cd: /valid/dir/name: No such file or directory python valid/script/name.py python: can't open file 'valid/script/name.py': [Errno 2] No such file or directory so the script cannot cd into a dir that definitely exist. Similarly, calling python on a python script that definitely exists yields an error. any idea what might be going wrong here, or how I could try to debug this? thanks very much.

    Read the article

  • using qsub (sge) with multi-threaded applications

    - by dan12345
    i wanted to submit a multi-threaded job to the cluster network i'm working with - but the man page about qsub is not clear how this is done - By default i guess it just sends it as a normal job regardless of the multi-threading - but this might cause problems, i.e. sending many multi-threaded jobs to the same computer, slowing things down. Does anyone know how to accomplish this? thanks. The batch server system is sge.

    Read the article

  • qsub: How can I find out what DRM middleware exactly is installed on a cluster?

    - by gojira
    I have a user account on a very big cluster. I have previous experience with Grid Engine and want to use the cluster for array jobs. The documentation tells me to use "qsub" for load balancing / submission of many jobs. Therefore I assumed this means the cluster has Grid Engine. However all my Grid Engine scripts failed to run. I checked the documentation and it is a bit weird. Now I slowly suspect that this cluster does not actually have Grid Engine, maybe it's running something called Torque (?!). The whole terminology in the man pages is a bit weird for me as a Grid Engine user, for example they talk about "bulk jobs" instead of "array jobs". There is no referral to variables on which I rely on, like SGE_TASK_ID etc. Instead they refer to variables starting with PBS_. Still, there are qsub and qstat commands. Also qsub behaves differently, apparently it is not possible to specifiy the command line parameters with bash-script comments etc. There is a documentation for the cluster system, but it does not say what the DRM middleware actually is - it refers to the entire DRM system simply as "qsub". I tried qsub --version qsub: 1.2 2010/8/17 I am not sure what I am actually running when I invoke qsub on that cluster! My question is, how can I find out if I am running Grid Engine or Torque (or whatever it is), and which version?

    Read the article

  • PBS programming

    - by Werner
    Hi, some short and probably stupid questions about PBS: 1- I submit jobs using qsub job_file is it possible to submit a (sub)job inside a job file? 2- I have the following script: qsub job_a qsub job_b For launching job_b, it would be great to have before the results of job_a finished. Is it possible to put some kind of barrier or some otehr workaround so job_b is not launched until job_a finished? Thanks

    Read the article

  • How to make bash script run with a latency (i.e. wait 1 sec at each iterations)?

    - by user2413
    I have this bash script; for (( i = 1 ; i <= 160 ; i++ )); do qsub myccomputations"${i}".pbs done Basically, I would prefer if there was a 1 second delay between each iteration. The reason is that at each iterations, it sends the program file mycomputation"${i}$.pbs to a core node for solving. Solving in this instance involves the use of pseudo random numbers. I suspect the RNG I use (R's) uses CPU time as seed because as things are now I get repeating pseudo random numbers (at the rate of approx 1 out of 100). So how to you ask bash to for (( i = 1 ; i <= 160 ; i++ )); do wait 1 sec qsub myccomputations"${i}".pbs done

    Read the article

  • Sun Grid Engine : jobs are not well balanced

    - by GlinesMome
    I use Open Grid Scheduler (a fork/copy of Sun Grid Engine). I have tried this configuration from master: # qconf -mattr exechost complex_values slots=8 slave2 # qconf -mq all.q | grep slots slots 100,[slave1=1],[slave2=8] slave1 is down, then I run 10 qsub with a sleep example (so no CPU consumption) but only 4 jobs are run at the same time on slave2 instead of I have put 8 slots. What does I missed ? PS: my goal is to provide infinite slots to force SGE to schedule only via consummable ressources.

    Read the article

  • Running a lot of jobs with sun grid engine

    - by R S
    I want to run a very large number (~30000) of jobs with Sun Grid Engine. I can theoretically, perform 30000 times the "qsub" command to submit jobs. However, I am afraid that will be too much. Is there a better way to do it? (i.e. from a file) Or otherwise, do you think it will work nonetheless? Thank you

    Read the article

  • Sun Grid Engine: Automatically Terminating Idle Interactive Jobs

    - by dmcer
    We're considering using Sun Grid Engine on a small compute cluster. Right now, the current set up is pretty crude and just involves having people ssh to an open machine to run their jobs. We'd like to allow interactive jobs, since that should ease the transition from manually starting jobs to starting them using qsub. But, there is some concern that, if we do, people might accidentally leave their interactive sessions idle and block other jobs from being run on the machines. The issue isn't just theoretical, since we previously tried using OpenPBS and there was a problem with people opening up an interactive job in a screen session and essentially camping on a machine. Is there anyway to configure SGE to automatically kill idle interactive jobs? It looks like this was requested as an enhancement (Issue #:2447) way back in 2007. But, it doesn't seem like the request ever got implemented.

    Read the article

  • how to check if something is in the queue of torque?

    - by kloop
    I want to re-run some jobs that completed prematurely under torque. These jobs are run through .job scripts (using qsub). However, I don't want to re-run a job which is already in the queue. Given a script filename, how can I know whether it is already in torque's queue (using qstat?) or not? I prefer to do it programmatically, of course, so any oneliner that searches for a given script name would be great. I will note that I can grep submit_args in qstat -f, but I can't get it to display the whole script name when it is too long. This is crucial. EDIT: I managed to solve it using the following command: qstat -x | perl -pi -e 's/\<\//\n/g' | grep job$ | grep -v submit_args | perl -pi -e 's/Job_Id\>\<Job_Name\>//' works because all my scripts end in the string "job".

    Read the article

  • Embarrassingly parallel workflow creates too many output files

    - by Hooked
    On a Linux cluster I run many (N > 10^6) independent computations. Each computation takes only a few minutes and the output is a handful of lines. When N was small I was able to store each result in a separate file to be parsed later. With large N however, I find that I am wasting storage space (for the file creation) and simple commands like ls require extra care due to internal limits of bash: -bash: /bin/ls: Argument list too long. Each computation is required to run through a qsub scheduling algorithm so I am unable to create a master program which simply aggregates the output data to a single file. The simple solution of appending to a single fails when two programs finish at the same time and interleave their output. I have no admin access to the cluster, so installing a system-wide database is not an option. How can I collate the output data from embarrassingly parallel computation before it gets unmanageable?

    Read the article

  • Parallel Environment (PE) on Sun Grid Engine (6.2u5) won't run jobs: "only offers 0 slots"

    - by Peter Van Heusden
    I have Sun Grid Engine set up (version 6.2u5) on a Ubuntu 10.10 server with 8 cores. In order to be able to reserve multiple slots, I have a parallel environment (PE) set up like this: pe_name serial slots 999 user_lists NONE xuser_lists NONE start_proc_args /bin/true stop_proc_args /bin/true allocation_rule $pe_slots control_slaves FALSE job_is_first_task TRUE urgency_slots min accounting_summary FALSE This is associated with the all.q on the server in question (let's call the server A). However, when I submit a job that uses 4 threads with e.g. qsub -q all.q@A -pe serial 4 mycmd.sh, it never gets scheduled, and I get the following reasoning from qstat: cannot run in PE "serial" because it only offers 0 slots Why is SGE saying "serial" only offers 0 slots, since there are 8 slots available on the server I specified (server A)? The queue in question is configured thus (server names changed): qname all.q hostlist @allhosts seq_no 0 load_thresholds np_load_avg=1.75 suspend_thresholds NONE nsuspend 1 suspend_interval 00:05:00 priority 0 min_cpu_interval 00:05:00 processors UNDEFINED qtype BATCH INTERACTIVE ckpt_list NONE pe_list make orte serial rerun FALSE slots 1,[D=32],[C=8], \ [B=30],[A=8] tmpdir /tmp shell /bin/sh prolog NONE epilog NONE shell_start_mode posix_compliant starter_method NONE suspend_method NONE resume_method NONE terminate_method NONE notify 00:00:60 owner_list NONE user_lists NONE xuser_lists NONE subordinate_list NONE complex_values NONE projects NONE xprojects NONE calendar NONE initial_state default s_rt INFINITY h_rt 08:00:00 s_cpu INFINITY h_cpu INFINITY s_fsize INFINITY h_fsize INFINITY s_data INFINITY h_data INFINITY s_stack INFINITY h_stack INFINITY s_core INFINITY h_core INFINITY s_rss INFINITY h_rss INFINITY s_vmem INFINITY h_vmem INFINITY,[A=30g], \ [B=5g]

    Read the article

  • Submitting R jobs using PBS

    - by Tony
    I am submitting a job using qsub that runs parallelized R. My intention is to have R programme running on 4 different cores rather than 8 cores. Here are some of my settings in PBS file: #PBS -l nodes=1:ppn=4 .... time R --no-save < program1.R > program1.log I am issuing the command "ta job_id" and I'm seeing that 4 cores are listed. However, the job occupies a large amount of memory(31944900k used vs 32949628k total). If I were to use 8 cores, the jobs got hang due to memory limitation. top - 21:03:53 up 77 days, 11:54, 0 users, load average: 3.99, 3.75, 3.37 Tasks: 207 total, 5 running, 202 sleeping, 0 stopped, 0 zombie Cpu(s): 30.4%us, 1.6%sy, 0.0%ni, 66.8%id, 0.0%wa, 0.0%hi, 1.2%si, 0.0%st Mem: 32949628k total, 31944900k used, 1004728k free, 269812k buffers Swap: 2097136k total, 8360k used, 2088776k free, 6030856k cached Here is a snapshot when issuing command ta job_id PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1794 x 25 0 6247m 6.0g 1780 R 99.2 19.1 8:14.37 R 1795 x 25 0 6332m 6.1g 1780 R 99.2 19.4 8:14.37 R 1796 x 25 0 6242m 6.0g 1784 R 99.2 19.1 8:14.37 R 1797 x 25 0 6322m 6.1g 1780 R 99.2 19.4 8:14.33 R 1714 x 18 0 65932 1504 1248 S 0.0 0.0 0:00.00 bash 1761 x 18 0 63840 1244 1052 S 0.0 0.0 0:00.00 20016.hpc 1783 x 18 0 133m 7096 1128 S 0.0 0.0 0:00.00 python 1786 x 18 0 137m 46m 2688 S 0.0 0.1 0:02.06 R How can I prevent other users to use the other 4 cores? I like to mask somehow that my job is using 8 cores with 4 cores idling. Could anyone kindly help me out on this? Can this be solved using pbs? Many Thanks

    Read the article

1