Search Results

Search found 6559 results on 263 pages for 'parallel foreach'.

Page 7/263 | < Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14  | Next Page >

  • multi-core processing in R on windows XP - via doMC and foreach

    - by Jan
    Hi guys, I'm posting this question to ask for advice on how to optimize the use of multiple processors from R on a Windows XP machine. At the moment I'm creating 4 scripts (each script with e.g. for (i in 1:100) and (i in 101:200), etc) which I run in 4 different R sessions at the same time. This seems to use all the available cpu. I however would like to do this a bit more efficient. One solution could be to use the "doMC" and the "foreach" package but this is not possible in R on a Windows machine. e.g. library("foreach") library("strucchange") library("doMC") # would this be possible on a windows machine? registerDoMC(2) # for a computer with two cores (processors) ## Nile data with one breakpoint: the annual flows drop in 1898 ## because the first Ashwan dam was built data("Nile") plot(Nile) ## F statistics indicate one breakpoint fs.nile <- Fstats(Nile ~ 1) plot(fs.nile) breakpoints(fs.nile) # , hpc = "foreach" --> It would be great to test this. lines(breakpoints(fs.nile)) Any solutions or advice? Thanks, Jan

    Read the article

  • Selenium navigation inside foreach loop

    - by smudgedlens
    I am having an issue with navigation. I get a list of rows from an html table. I iterate over the rows and scrape information from them. But there is also a link on the row that I click to go to more information related to the row to scrape. Then I navigate back to the page with the original table. This works for the first row, but for the subsequent rows, it throws an exception. I look at my row collection after the first time the link inside a row is clicked, and none of them have the correct values like they did before I clicked the link. I believe that there is something going on when I navigate to a different URL that I'm not getting. My code is below. How do I get this working so I can iterate over the parent table, click the links in each row, navigate to the child table, but still continue iterating over the rows in the parent table? private List<Document> getResults() { var documents = new List<Document>(); //Results IWebElement docsTable = this.webDriver.FindElements(By.TagName("table")) .Where(table => table.Text.Contains("Document List")) .FirstOrDefault(); var validDocRowRegex = new Regex(@"^(\d{3}\s+)"); var docRows = docsTable.FindElements(By.TagName("tr")) .Where(row => //It throws an exception with .FindElement() when there isn't one. row.FindElements(By.TagName("td")).FirstOrDefault() != null && //Yeah, I don't get this one either. I negate the match and so it works?? !validDocRowRegex.IsMatch( row.FindElement(By.TagName("td")).Text)) .ToList(); foreach (var docRow in docRows) { //Todo: find out why this is crashing on some documents. var cells = docRow.FindElements(By.TagName("td")); var document = new Document { DocID = Convert.ToInt32(cells.First().Text), PNum = Convert.ToInt32(cells[1].Text), AuthNum = Convert.ToInt32(cells[2].Text) }; //Go to history for the current document. cells.Where(cell => cell.FindElements(By.TagName("a")).FirstOrDefault() != null) .FirstOrDefault().Click(); //Todo: scrape child table. this.webDriver.Navigate().Back(); } return documents; }

    Read the article

  • Parallel.For System.OutOfMemoryException

    - by Martin Neal
    We have a fairly simple program that's used for creating backups. I'm attempting to parallelize it but am getting an OutofMemoryException within an AggregateExcption. Some of the source folders are quite large, and the program doesn't crash for about 40 minutes after it starts. I don't know where to start looking so the below code is a near exact dump of all code the code sans directory structure and Exception logging code. Any advise as to where to start looking? using System; using System.Diagnostics; using System.IO; using System.Threading.Tasks; namespace SelfBackup { class Program { static readonly string[] saSrc = { "\\src\\dir1\\", //... "\\src\\dirN\\", //this folder is over 6 GB }; static readonly string[] saDest = { "\\dest\\dir1\\", //... "\\dest\\dirN\\", }; static void Main(string[] args) { Parallel.For(0, saDest.Length, i => { try { if (Directory.Exists(sDest)) { //Delete directory first so old stuff gets cleaned up Directory.Delete(sDest, true); } //recursive function clsCopyDirectory.copyDirectory(saSrc[i], sDest); } catch (Exception e) { //standard error logging CL.EmailError(); } }); } } /////////////////////////////////////// using System.IO; using System.Threading.Tasks; namespace SelfBackup { static class clsCopyDirectory { static public void copyDirectory(string Src, string Dst) { Directory.CreateDirectory(Dst); /* Copy all the files in the folder If and when .NET 4.0 is installed, change Directory.GetFiles to Directory.Enumerate files for slightly better performance.*/ Parallel.ForEach<string>(Directory.GetFiles(Src), file => { /* An exception thrown here may be arbitrarily deep into this recursive function there's also a good chance that if one copy fails here, so too will other files in the same directory, so we don't want to spam out hundreds of error e-mails but we don't want to abort all together. Instead, the best solution is probably to throw back up to the original caller of copy directory an move on to the next Src/Dst pair by not catching any possible exception here.*/ File.Copy(file, //src Path.Combine(Dst, Path.GetFileName(file)), //dest true);//bool overwrite }); //Call this function again for every directory in the folder. Parallel.ForEach(Directory.GetDirectories(Src), dir => { copyDirectory(dir, Path.Combine(Dst, Path.GetFileName(dir))); }); } }

    Read the article

  • NHybernate, the Parallel Framework, and SQL Server

    - by andy
    hey guys, we have a loop that: 1.Loops over several thousand xml files. Altogether we're parsing millions of "user" nodes. 2.In each iteration we parse a "user" xml, do custom deserialization 3.finally, in each iteration, we send our object to nhibernate for saving. We use: .SaveOrUpdateAndFlush(user); This is a lengthy process, and we thought it would be a perfect candidate for testing out the .NET 4.0 Parallel libraries. So we wrapped the loop in a: Parallel.ForEach(); After doing this, we start getting "random" Timeout Exceptions from SQL Server, and finally, after leaving it running all night, OutOfMemory unhandled exceptions. I haven't done deep debugging on this yet, but what do you guys think. Is this simply a limitation of SQL Server, or could it be our NHibernate setup, or what? cheers andy

    Read the article

  • NHibernate, the Parallel Framework, and SQL Server

    - by andy
    hey guys, we have a loop that: 1.Loops over several thousand xml files. Altogether we're parsing millions of "user" nodes. 2.In each iteration we parse a "user" xml, do custom deserialization 3.finally, in each iteration, we send our object to nhibernate for saving. We use: .SaveOrUpdateAndFlush(user); This is a lengthy process, and we thought it would be a perfect candidate for testing out the .NET 4.0 Parallel libraries. So we wrapped the loop in a: Parallel.ForEach(); After doing this, we start getting "random" Timeout Exceptions from SQL Server, and finally, after leaving it running all night, OutOfMemory unhandled exceptions. I haven't done deep debugging on this yet, but what do you guys think. Is this simply a limitation of SQL Server, or could it be our NHibernate setup, or what? cheers andy

    Read the article

  • Using Parallel Extensions In Web Applications

    - by Greg
    I'd like to hear some opinions as to what role, if any, parallel computing approaches, including the potential use of the parallel extensions (June CTP for example), have a in web applications. What scenarios does this approach fit and/or not fit for? My understanding of how exactly IIS and web browsers thread tasks is fairly limited. I would appreciate some insight on that if someone out there has a good understanding. I'm more curious to know if the way that IIS and web browsers work limits the ROI of creating threaded and/or asynchronous tasks in web applications in general. Thanks in advance.

    Read the article

  • Mock Assertions on objects inside Parallel ForEach's???

    - by jacko
    Any idea how we can assert a mock object was called when it is being accessed inside Parallel.ForEach via a closure? I assume that because each invocation is on a different thread that Rhino Mocks loses track of the object? Pseudocode: var someStub = MockRepository.GenerateStub() Parallel.Foreach(collectionOfInts, anInt => someStub.DoSomething(anInt)) someStub.AssertWasCalled(s => s.DoSomething, Repeat.Five.Times) This test will return an expectation violation, expecting the stub to be called 5 times but being actually called 0 times. Any ideas how we can tell the lambdas to keep track of the thread-local stub object?

    Read the article

  • Parallel.For maintain input list order on output list

    - by romeozor
    I'd like some input on keeping the order of a list during heavy-duty operations that I decided to try to do in a parallel manner to see if it boosts performance. (It did!) I came up with a solution, but since this was my first attempt at anything parallel, I'd need someone to slap my hands if I did something very stupid. There's a query that returns a list of card owners, sorted by name, then by date of birth. This needs to be rendered in a table on a web page (ASP.Net WebForms). The original coder decided he would construct the table cell-by-cell (TableCell), add them to rows (TableRow), then each row to the table. So no GridView, allegedly its performance is bad, but the performance was very poor regardless :). The database query returns in no time, the most time is spent on looping through the results and adding table cells etc. I made the following method to maintain the original order of the list: private TableRow[] ComposeRows(List<CardHolder> queryResult) { int queryElementsCount = queryResult.Count(); // array with the query's size var rowArray = new TableRow[queryElementsCount]; Parallel.For(0, queryElementsCount, i => { var row = new TableRow(); var cell = new TableCell(); // various operations, including simple ones such as: cell.Text = queryResult[i].Name; row.Cells.Add(cell); // here I'm adding the current item to it's original index // to maintain order in the output list rowArray[i] = row; }); return rowArray; } So as you can see, because I'm returning a very different type of data (List<CardHolder> -> TableRow[]), I can't just simply omit the ordering from the original query to do it after the operations. Also, I also thought it would be a good idea to Dispose() the objects at the end of each loop, because the query can return a huge list and letting cell and row objects pile up in the heap could impact performance.(?) How badly did I do? Does anyone have a better solution in case mine is flawed?

    Read the article

  • Parallel.For(): Update variable outside of loop

    - by TSS
    I'm just looking in to the new .NET 4.0 features. With that, I'm attempting a simple calculation using Parallel.For and a normal for(x;x;x) loop. However, I'm getting different results about 50% of the time. long sum = 0; Parallel.For(1, 10000, y => { sum += y; } ); Console.WriteLine(sum.ToString()); sum = 0; for (int y = 1; y < 10000; y++) { sum += y; } Console.WriteLine(sum.ToString()); My guess is that the threads are trying to update "sum" at the same time. Is there an obvious way around it?

    Read the article

  • Parallel.For(): Different results for simple addition

    - by TSS
    I'm just looking in to the new .NET 4.0 features. With that, I'm attempting a simple calculation using Parallel.For and a normal for(x;x;x) loop. However, I'm getting different results about 50% of the time (no code change). long sum = 0; Parallel.For(1, 10000, y => { sum += y; }); Console.WriteLine(sum.ToString()); sum = 0; for (int y = 1; y < 10000; y++) { sum += y; } Console.WriteLine(sum.ToString()); Surely I'm missing something simple or do not completely understand the concept. My guess is that the threads are trying to update "sum" at the same time. If so, is there an obvious way around it?

    Read the article

  • Solving embarassingly parallel problems using Python multiprocessing

    - by gotgenes
    How does one use multiprocessing to tackle embarrassingly parallel problems? Embarassingly parallel problems typically consist of three basic parts: Read input data (from a file, database, tcp connection, etc.). Run calculations on the input data, where each calculation is independent of any other calculation. Write results of calculations (to a file, database, tcp connection, etc.). We can parallelize the program in two dimensions: Part 2 can run on multiple cores, since each calculation is independent; order of processing doesn't matter. Each part can run independently. Part 1 can place data on an input queue, part 2 can pull data off the input queue and put results onto an output queue, and part 3 can pull results off the output queue and write them out. This seems a most basic pattern in concurrent programming, but I am still lost in trying to solve it, so let's write a canonical example to illustrate how this is done using multiprocessing. Here is the example problem: Given a CSV file with rows of integers as input, compute their sums. Separate the problem into three parts, which can all run in parallel: Process the input file into raw data (lists/iterables of integers) Calculate the sums of the data, in parallel Output the sums Below is traditional, single-process bound Python program which solves these three tasks: #!/usr/bin/env python # -*- coding: UTF-8 -*- # basicsums.py """A program that reads integer values from a CSV file and writes out their sums to another CSV file. """ import csv import optparse import sys def make_cli_parser(): """Make the command line interface parser.""" usage = "\n\n".join(["python %prog INPUT_CSV OUTPUT_CSV", __doc__, """ ARGUMENTS: INPUT_CSV: an input CSV file with rows of numbers OUTPUT_CSV: an output file that will contain the sums\ """]) cli_parser = optparse.OptionParser(usage) return cli_parser def parse_input_csv(csvfile): """Parses the input CSV and yields tuples with the index of the row as the first element, and the integers of the row as the second element. The index is zero-index based. :Parameters: - `csvfile`: a `csv.reader` instance """ for i, row in enumerate(csvfile): row = [int(entry) for entry in row] yield i, row def sum_rows(rows): """Yields a tuple with the index of each input list of integers as the first element, and the sum of the list of integers as the second element. The index is zero-index based. :Parameters: - `rows`: an iterable of tuples, with the index of the original row as the first element, and a list of integers as the second element """ for i, row in rows: yield i, sum(row) def write_results(csvfile, results): """Writes a series of results to an outfile, where the first column is the index of the original row of data, and the second column is the result of the calculation. The index is zero-index based. :Parameters: - `csvfile`: a `csv.writer` instance to which to write results - `results`: an iterable of tuples, with the index (zero-based) of the original row as the first element, and the calculated result from that row as the second element """ for result_row in results: csvfile.writerow(result_row) def main(argv): cli_parser = make_cli_parser() opts, args = cli_parser.parse_args(argv) if len(args) != 2: cli_parser.error("Please provide an input file and output file.") infile = open(args[0]) in_csvfile = csv.reader(infile) outfile = open(args[1], 'w') out_csvfile = csv.writer(outfile) # gets an iterable of rows that's not yet evaluated input_rows = parse_input_csv(in_csvfile) # sends the rows iterable to sum_rows() for results iterable, but # still not evaluated result_rows = sum_rows(input_rows) # finally evaluation takes place as a chain in write_results() write_results(out_csvfile, result_rows) infile.close() outfile.close() if __name__ == '__main__': main(sys.argv[1:]) Let's take this program and rewrite it to use multiprocessing to parallelize the three parts outlined above. Below is a skeleton of this new, parallelized program, that needs to be fleshed out to address the parts in the comments: #!/usr/bin/env python # -*- coding: UTF-8 -*- # multiproc_sums.py """A program that reads integer values from a CSV file and writes out their sums to another CSV file, using multiple processes if desired. """ import csv import multiprocessing import optparse import sys NUM_PROCS = multiprocessing.cpu_count() def make_cli_parser(): """Make the command line interface parser.""" usage = "\n\n".join(["python %prog INPUT_CSV OUTPUT_CSV", __doc__, """ ARGUMENTS: INPUT_CSV: an input CSV file with rows of numbers OUTPUT_CSV: an output file that will contain the sums\ """]) cli_parser = optparse.OptionParser(usage) cli_parser.add_option('-n', '--numprocs', type='int', default=NUM_PROCS, help="Number of processes to launch [DEFAULT: %default]") return cli_parser def main(argv): cli_parser = make_cli_parser() opts, args = cli_parser.parse_args(argv) if len(args) != 2: cli_parser.error("Please provide an input file and output file.") infile = open(args[0]) in_csvfile = csv.reader(infile) outfile = open(args[1], 'w') out_csvfile = csv.writer(outfile) # Parse the input file and add the parsed data to a queue for # processing, possibly chunking to decrease communication between # processes. # Process the parsed data as soon as any (chunks) appear on the # queue, using as many processes as allotted by the user # (opts.numprocs); place results on a queue for output. # # Terminate processes when the parser stops putting data in the # input queue. # Write the results to disk as soon as they appear on the output # queue. # Ensure all child processes have terminated. # Clean up files. infile.close() outfile.close() if __name__ == '__main__': main(sys.argv[1:]) These pieces of code, as well as another piece of code that can generate example CSV files for testing purposes, can be found on github. I would appreciate any insight here as to how you concurrency gurus would approach this problem. Here are some questions I had when thinking about this problem. Bonus points for addressing any/all: Should I have child processes for reading in the data and placing it into the queue, or can the main process do this without blocking until all input is read? Likewise, should I have a child process for writing the results out from the processed queue, or can the main process do this without having to wait for all the results? Should I use a processes pool for the sum operations? If yes, what method do I call on the pool to get it to start processing the results coming into the input queue, without blocking the input and output processes, too? apply_async()? map_async()? imap()? imap_unordered()? Suppose we didn't need to siphon off the input and output queues as data entered them, but could wait until all input was parsed and all results were calculated (e.g., because we know all the input and output will fit in system memory). Should we change the algorithm in any way (e.g., not run any processes concurrently with I/O)?

    Read the article

  • [UNIX] Sort lines of massive file by number of words on line (ideally in parallel)

    - by conradlee
    I am working on a community detection algorithm for analyzing social network data from Facebook. The first task, detecting all cliques in the graph, can be done efficiently in parallel, and leaves me with an output like this: 17118 17136 17392 17064 17093 17376 17118 17136 17356 17318 12345 17118 17136 17356 17283 17007 17059 17116 Each of these lines represents a unique clique (a collection of node ids), and I want to sort these lines in descending order by the number of ids per line. In the case of the example above, here's what the output should look like: 17118 17136 17356 17318 12345 17118 17136 17356 17283 17118 17136 17392 17064 17093 17376 17007 17059 17116 (Ties---i.e., lines with the same number of ids---can be sorted arbitrarily.) What is the most efficient way of sorting these lines. Keep the following points in mind: The file I want to sort could be larger than the physical memory of the machine Most of the machines that I'm running this on have several processors, so a parallel solution would be ideal An ideal solution would just be a shell script (probably using sort), but I'm open to simple solutions in python or perl (or any language, as long as it makes the task simple) This task is in some sense very easy---I'm not just looking for any old solution, but rather for a simple and above all efficient solution

    Read the article

  • Perl Parallel::ForkManager wait_all_children() takes excessively long time

    - by zhang18
    I have a script that uses Parallel::ForkManager. However, the wait_all_children() process takes incredibly long time even after all child-processes are completed. The way I know is by printing out some timestamps (see below). Does anyone have any idea what might be causing this (I have 16 CPU cores on my machine)? my $pm = Parallel::ForkManager->new(16) for my $i (1..16) { $pm->start($i) and next; ... do something within the child-process ... print (scalar localtime), " Process $i completed.\n"; $pm->finish(); } print (scalar localtime), " Waiting for some child process to finish.\n"; $pm->wait_all_children(); print (scalar localtime), " All processes finished.\n"; Clearly, I'll get the Waiting for some child process to finish message first, with a timestamp of, say, 7:08:35. Then I'll get a list of Process i completed messages, with the last one at 7:10:30. However, I do not receive the message All Processes finished until 7:16:33(!). Why is that 6-minute delay between 7:10:30 and 7:16:33? Thx!

    Read the article

  • Parallel version of loop not faster than serial version

    - by Il-Bhima
    I'm writing a program in C++ to perform a simulation of particular system. For each timestep, the biggest part of the execution is taking up by a single loop. Fortunately this is embarassingly parallel, so I decided to use Boost Threads to parallelize it (I'm running on a 2 core machine). I would expect at speedup close to 2 times the serial version, since there is no locking. However I am finding that there is no speedup at all. I implemented the parallel version of the loop as follows: Wake up the two threads (they are blocked on a barrier). Each thread then performs the following: Atomically fetch and increment a global counter. Retrieve the particle with that index. Perform the computation on that particle, storing the result in a separate array Wait on a job finished barrier The main thread waits on the job finished barrier. I used this approach since it should provide good load balancing (since each computation may take differing amounts of time). I am really curious as to what could possibly cause this slowdown. I always read that atomic variables are fast, but now I'm starting to wonder whether they have their performance costs. If anybody has some ideas what to look for or any hints I would really appreciate it. I've been bashing my head on it for a week, and profiling has not revealed much.

    Read the article

< Previous Page | 3 4 5 6 7 8 9 10 11 12 13 14  | Next Page >