Embarrassingly parallel workflow creates too many output files

Posted by Hooked on Stack Overflow, 2012-11-28

On a Linux cluster I run many (N > 10^6) independent computations. Each computation takes only a few minutes, and its output is a handful of lines. When N was small I could store each result in a separate file to be parsed later. With large N, however, I find that I am wasting storage space on per-file overhead, and simple commands like ls need extra care because the expanded glob overruns the kernel's argument-length limit: -bash: /bin/ls: Argument list too long.
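That error comes from the argument list built when the shell expands a glob such as ls out_*, so a minimal sketch of a day-to-day workaround is to let find walk the directory instead of expanding a glob (out_* is a hypothetical naming pattern for the per-job output files, not something from the original setup):

    # Count and sample the result files without building a huge argument list.
    # 'out_*' is a hypothetical per-job output naming pattern.
    find . -maxdepth 1 -name 'out_*' | wc -l           # how many results so far
    find . -maxdepth 1 -name 'out_*' -print | head -5  # peek at a few names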

Each computation must be submitted through the qsub scheduler, so I cannot run a master program that simply aggregates the output data into a single file. The obvious alternative of appending to a single file fails when two programs finish at the same time and interleave their output. I have no admin access to the cluster, so installing a system-wide database is not an option.
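One sketch of a workaround for the interleaving, not claimed as the definitive answer: serialize the appends with flock(1) so that jobs finishing simultaneously take turns. Here results.dat and results.lock are hypothetical file names, and whether the cluster's shared filesystem honors advisory locks (NFS in particular) is an assumption that needs checking:

    #!/bin/bash
    # At the end of each job: append its few lines of output to one shared
    # file, holding an exclusive advisory lock so simultaneous finishers
    # cannot interleave their writes.
    {
        flock -x 200                                 # block until we own the lock
        printf '%s\n' "$JOB_OUTPUT" >> results.dat   # $JOB_OUTPUT: this job's lines (hypothetical)
    } 200>results.lock

For output this small, a single write in append mode would rarely interleave on a local filesystem anyway, but the explicit lock makes the guarantee independent of write sizes and buffering.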

How can I collate the output data from this embarrassingly parallel computation before it becomes unmanageable?

