parallel computing - Page 41

Preallocate memory for a program in Linux before it gets started

- by Fyg

Hi, folks, I have a program that repeatedly solves large systems of linear equations using cholesky decomposition. Characterising is that I sometimes need to store the complete factorisation which can exceed about 20 GB of memory. The factorisation happens inside a library that I call. Furthermore, this matrix and the resulting factorisation changes quite frequently and as such the memory requirements as well. I am not the only person to use this compute-node. Therefore, is there a way to start the program under Linux and preallocate free memory for the process? Something like: $: prealloc -m 25G ./program

Read the article

is windows azure installable on my own server ?

- by Omu

I just want to know if it is possible to install windows azure on your own hardware, or the only possibility to use it is to use the Microsoft's hosting ?

Read the article

Higher speed options for executing very large (20 GB) .sql file in MySQL

- by Jonogan

My firm was delivered a 20+ GB .sql file in reponse to a request for data from the gov't. I don't have many options for getting the data in a different format, so I need options for how to import it in a reasonable amount of time. I'm running it on a high end server (Win 2008 64bit, MySQL 5.1) using Navicat's batch execution tool. It's been running for 14 hours and shows no signs of being near completion. Does anyone know of any higher speed options for such a transaction? Or is this what I should expect given the large file size? Thanks

Read the article

How to synchronize cuda threads when they are in the same loop and we need to synchronize them to ex

- by Vickey

Hi all, I have written a code and Now I want to implement this on cuda GPU but I'm new to synchronization so please help me with this, It's little urgent to me. Below I'm presenting the code and I want to that LOOP1 to be executed by all threads (heance I want to this portion to take advantage of cuda and the remaining portion (the portion other from the LOOP1) is to be executed by only a single thread. do{ point_set = master_Q[(*num_mas) - 1].q; List* temp = point_set; List* pa = point_set; if(master_Q[num_mas[0] - 1].max) max_level = (int) (ceilf(il2 * log(master_Q[num_mas[0] - 1].max))); *num_mas = (*num_mas) - 1; while(point_set){ List* insert_ele = temp; while(temp){ insert_ele = temp; if((insert_ele->dist[insert_ele->dist_index-1] <= pow(2, max_level-1)) || (top_level == max_level)){ if(point_set == temp){ point_set = temp->next; pa = temp->next; } else{ pa->next = temp->next; } temp = NULL; List* new_point_set = point_set; float maximum_dist = 0; if(parent->p_index != insert_ele->point_index){ List* tmp = new_point_set; float *b = &(data[(insert_ele->point_index)*point_len]); **LOOP 1:** while(tmp){ float *c = &(data[(tmp->point_index)*point_len]); float sum = 0.; for(int j = 0; j < point_len; j+=2){ float d1 = b[j] - c[j]; float d2 = b[j+1] - c[j+1]; d1 *= d1; d2 *= d2; sum = sum + d1 + d2; } tmp->dist[tmp->dist_index] = sqrt(sum); if(maximum_dist < tmp->dist[tmp->dist_index]) maximum_dist = tmp->dist[tmp->dist_index]; tmp->dist_index = tmp->dist_index+1; tmp = tmp->next; } max_distance = maximum_dist; } while(new_point_set || insert_ele){ List* far, *par, *tmp, *tmp_new; far = NULL; tmp = new_point_set; tmp_new = NULL; float level_dist = pow(2, max_level-1); float maxdist = 0, maxp = 0; while(tmp){ if(tmp->dist[(tmp->dist_index)-1] > level_dist){ if(maxdist < tmp->dist[tmp->dist_index-1]) maxdist = tmp->dist[tmp->dist_index-1]; if(tmp == new_point_set){ new_point_set = tmp->next; par = tmp->next; } else{ par->next = tmp->next; } if(far == NULL){ far = tmp; tmp_new = far; } else{ tmp_new->next = tmp; tmp_new = tmp; } if(parent->p_index != insert_ele->point_index) tmp->dist_index = tmp->dist_index - 1; tmp = tmp->next; tmp_new->next = NULL; } else{ par = tmp; if(maxp < tmp->dist[(tmp->dist_index)-1]) maxp = tmp->dist[(tmp->dist_index)-1]; tmp = tmp->next; } } if(0 == maxp){ tmp = new_point_set; aloc_mem[*tree_index].p_index = insert_ele->point_index; aloc_mem[*tree_index].no_child = 0; aloc_mem[*tree_index].level = max_level--; parent->children_index[parent->no_child++] = *tree_index; parent = &(aloc_mem[*tree_index]); tree_index[0] = tree_index[0]+1; while(tmp){ aloc_mem[*tree_index].p_index = tmp->point_index; aloc_mem[(*tree_index)].no_child = 0; aloc_mem[(*tree_index)].level = master_Q[(*cur_count_Q)-1].level; parent->children_index[parent->no_child] = *tree_index; parent->no_child = parent->no_child + 1; (*tree_index)++; tmp = tmp->next; } cur_count_Q[0] = cur_count_Q[0]-1; new_point_set = NULL; } master_Q[*num_mas].q = far; master_Q[*num_mas].parent = parent; master_Q[*num_mas].valid = true; master_Q[*num_mas].max = maxdist; master_Q[*num_mas].level = max_level; num_mas[0] = num_mas[0]+1; if(0 != maxp){ aloc_mem[*tree_index].p_index = insert_ele->point_index; aloc_mem[*tree_index].no_child = 0; aloc_mem[*tree_index].level = max_level; parent->children_index[parent->no_child++] = *tree_index; parent = &(aloc_mem[*tree_index]); tree_index[0] = tree_index[0]+1; if(maxp){ int new_level = ((int) (ceilf(il2 * log(maxp)))) +1; if (new_level < (max_level-1)) max_level = new_level; else max_level--; } else max_level--; } if( 0 == maxp ) insert_ele = NULL; } } else{ if(NULL == temp->next){ master_Q[*num_mas].q = point_set; master_Q[*num_mas].parent = parent; master_Q[*num_mas].valid = true; master_Q[*num_mas].level = max_level; num_mas[0] = num_mas[0]+1; } pa = temp; temp = temp->next; } } if((*num_mas) > 1){ List *temp2 = master_Q[(*num_mas)-1].q; while(temp2){ List* temp3 = master_Q[(*num_mas)-2].q; master_Q[(*num_mas)-2].q = temp2; if((master_Q[(*num_mas)-1].parent)->p_index != (master_Q[(*num_mas)-2].parent)->p_index){ temp2->dist_index = temp2->dist_index - 1; } temp2 = temp2->next; master_Q[(*num_mas)-2].q->next = temp3; } num_mas[0] = num_mas[0]-1; } point_set = master_Q[(*num_mas)-1].q; temp = point_set; pa = point_set; parent = master_Q[(*num_mas)-1].parent; max_level = master_Q[(*num_mas)-1].level; if(master_Q[(*num_mas)-1].max) if( max_level > ((int) (ceilf(il2 * log(master_Q[(*num_mas)-1].max)))) +1) max_level = ((int) (ceilf(il2 * log(master_Q[(*num_mas)-1].max)))) +1; num_mas[0] = num_mas[0]-1; } }while(*num_mas > 0);

Read the article

How can one manage to fully use the newly enhanced Parallelism features in .NET 4.0?

- by Will Marcouiller

I am pretty much interested into using the newly enhanced Parallelism features in .NET 4.0. I have also seen some possibilities of using it in F#, as much as in C#. Despite, I can only see what PLINQ has to offer with, for example, the following: var query = from c in Customers.AsParallel() where (c.Name.Contains("customerNameLike") select c; There must for sure be some other use of this parallelism thing. Have you any other examples of using it? Is this particularly turned toward PLINQ, or are there other usage as easy as PLINQ? Thanks! =)

Read the article

MongoDB: What's the point of using MapReduce without parallelism?

- by netvope

Quoting http://www.mongodb.org/display/DOCS/MapReduce#MapReduce-Parallelism As of right now, MapReduce jobs on a single mongod process are single threaded Without parallelism, what are the benefits of MapReduce compared to simpler or more traditional methods for queries and data aggregation? To avoid confusion: the question is NOT "what are the benefits of document-oriented DB over traditional relational DB"

Read the article

Improve performance writing 10 million records to text file using windows service

- by user1039583

I'm fetching more than 10 millions of records from database and writing to a text file. It takes hours of time to complete this operation. Is there any option to use TPL features here? It would be great if someone could get me started implementing this with the TPL. using (FileStream fStream = new FileStream("d:\\file.txt", FileMode.OpenOrCreate, FileAccess.ReadWrite)) { BufferedStream bStream = new BufferedStream(fStream); TextWriter writer = new StreamWriter(bStream); for (int i = 0; i < 100000000; i++) { writer.WriteLine(i); } bStream.Flush(); writer.Flush(); // empty buffer; fStream.Flush(); }

Read the article

What's the status of multicore programming in Haskell?

- by Don Stewart

What's the status of multicore programming in Haskell? What projects, tools, and libraries are available now? What experience reports have there been?

Read the article

Problem with Boost::Asio for C++

- by Martin Lauridsen

Hi there, For my bachelors thesis, I am implementing a distributed version of an algorithm for factoring large integers (finding the prime factorisation). This has applications in e.g. security of the RSA cryptosystem. My vision is, that clients (linux or windows) will download an application and compute some numbers (these are independant, thus suited for parallelization). The numbers (not found very often), will be sent to a master server, to collect these numbers. Once enough numbers have been collected by the master server, it will do the rest of the computation, which cannot be easily parallelized. Anyhow, to the technicalities. I was thinking to use Boost::Asio to do a socket client/server implementation, for the clients communication with the master server. Since I want to compile for both linux and windows, I thought windows would be as good a place to start as any. So I downloaded the Boost library and compiled it, as it said on the Boost Getting Started page: bootstrap .\bjam It all compiled just fine. Then I try to compile one of the tutorial examples, client.cpp, from Asio, found (here.. edit: cant post link because of restrictions). I am using the Visual C++ compiler from Microsoft Visual Studio 2008, like this: cl /EHsc /I D:\Downloads\boost_1_42_0 client.cpp But I get this error: /out:client.exe client.obj LINK : fatal error LNK1104: cannot open file 'libboost_system-vc90-mt-s-1_42.lib' Anyone have any idea what could be wrong, or how I could move forward? I have been trying pretty much all week, to get a simple client/server socket program for c++ working, but with no luck. Serious frustration kicking in. Thank you in advance.

Read the article

Best ways to utilize computer resources

- by Algorist

Hi, Do you ever feel bad keeping office or lab computers "on" after day work? Is there anything useful you guys utilize those computers till you come back the next day morning. Thank you.

Read the article

Is it possible to use the Spread Toolkit on Amazon EC2?

- by Dave Viner

The Spread Toolkit (http://www.spread.org) allows for easy distributed messaging using publish-subscribe semantics. Is it possible to use this toolkit on EC2? What other pub-sub message buses can be used on EC2 (other than Amazon's SQS)?

Read the article

is windows azure installable on my own hardware ?

- by Omu

I just want to know if it is possible to install windows azure or any other cloud technology on your own hardware, or the only possibility to use it is to use the Microsoft's hosting ?

Read the article

What should i do for accomodating large scale data storage and retrieval?

- by kailashbuki

There's two columns in the table inside mysql database. First column contains the fingerprint while the second one contains the list of documents which have that fingerprint. It's much like an inverted index built by search engines. An instance of a record inside the table is shown below; 34 "doc1, doc2, doc45" The number of fingerprints is very large(can range up to trillions). There are basically following operations in the database: inserting/updating the record & retrieving the record accoring to the match in fingerprint. The table definition python snippet is: self.cursor.execute("CREATE TABLE IF NOT EXISTS `fingerprint` (fp BIGINT, documents TEXT)") And the snippet for insert/update operation is: if self.cursor.execute("UPDATE `fingerprint` SET documents=CONCAT(documents,%s) WHERE fp=%s",(","+newDocId, thisFP))== 0L: self.cursor.execute("INSERT INTO `fingerprint` VALUES (%s, %s)", (thisFP,newDocId)) The only bottleneck i have observed so far is the query time in mysql. My whole application is web based. So time is a critical factor. I have also thought of using cassandra but have less knowledge of it. Please suggest me a better way to tackle this problem.

Read the article

How to printf a time_t variable as a floating point number?

- by soneangel

Hi guys, I'm using a time_t variable in C (openMP enviroment) to keep cpu execution time...I define a float value sum_tot_time to sum time for all cpu's...I mean sum_tot_time is the sum of cpu's time_t values. The problem is that printing the value sum_tot_time it appear as an integer or long, by the way without its decimal part! I tried in these ways: to printf sum_tot_time as a double being a double value to printf sum_tot_time as float being a float value to printf sum_tot_time as double being a time_t value to printf sum_tot_time as float being a time_t value Please help me!!

Read the article

Are batch mutations atomic in Cassandra?

- by user317459

The Cassandra API supports batch mutations: batch_mutate(keyspace, mutation_map, consistency_level): Executes the specified mutations on the keyspace. mutation_map is a map; the outer map maps the key to the inner map, which maps the column family to the Mutation; can be read as: map. To be more specific, the outer map key is a row key, the inner map key is the column family name. A Mutation specifies either columns to insert or columns to delete. See Mutation and Deletion above for more details. Are all mutations that are executed in a batch executed atomically? So if one of the mutations fails, do the others fail too?

Read the article

How can I merge two Linq IEnumerable<T> queries without running them?

- by makerofthings7

How do I merge a List<T> of TPL-based tasks for later execution? public async IEnumerable<Task<string>> CreateTasks(){ /* stuff*/ } My assumption is .Concat() but that doesn't seem to work: void MainTestApp() // Full sample available upon request. { List<string> nothingList = new List<string>(); nothingList.Add("whatever"); cts = new CancellationTokenSource(); delayedExecution = from str in nothingList select AccessTheWebAsync("", cts.Token); delayedExecution2 = from str in nothingList select AccessTheWebAsync("1", cts.Token); delayedExecution = delayedExecution.Concat(delayedExecution2); } /// SNIP async Task AccessTheWebAsync(string nothing, CancellationToken ct) { // return a Task } I want to make sure that this won't spawn any task or evaluate anything. In fact, I suppose I'm asking "what logically executes an IQueryable to something that returns data"? Background Since I'm doing recursion and I don't want to execute this until the correct time, what is the correct way to merge the results if called multiple times? If it matters I'm thinking of running this command to launch all the tasks var AllRunningDataTasks = results.ToList(); followed by this code: while (AllRunningDataTasks.Count > 0) { // Identify the first task that completes. Task<TableResult> firstFinishedTask = await Task.WhenAny(AllRunningDataTasks); // ***Remove the selected task from the list so that you don't // process it more than once. AllRunningDataTasks.Remove(firstFinishedTask); // TODO: Await the completed task. var taskOfTableResult = await firstFinishedTask; // Todo: (doen't work) TrustState thisState = (TrustState)firstFinishedTask.AsyncState; // TODO: Update the concurrent dictionary with data // thisState.QueryStartPoint + thisState.ThingToSearchFor Interlocked.Decrement(ref thisState.RunningDirectQueries); Interlocked.Increment(ref thisState.CompletedDirectQueries); if (thisState.RunningDirectQueries == 0) { thisState.TimeCompleted = DateTime.UtcNow; } }

Read the article

Erlang message loops

- by Roger Alsing

How does message loops in erlang work, are they sync when it comes to processing messages? As far as I understand, the loop will start by "receive"ing a message and then perform something and hit another iteration of the loop. So that has to be sync? right? If multiple clients send messages to the same message loop, then all those messages are queued and performed one after another, or? To process multiple messages in parallell, you would have to spawn multiple message loops in different processes, right? Or did I misunderstand all of it?

Read the article

What are the advantages / disadvantages of a Cloud-based / Web-based IDE?

- by Gabe

I'm writing this as DevConnections in Las Vegas is happening. Visual Studio 2010 has been released and I now have this 3GB beast installed to my machine. (I'll admit, it has some nice features.) However, while the install was monopolizing my computer's resources I began to wish that my IDE worked more like Google Documents (instantly available, available anywhere, easy to share, easy to collaborate, naturally versioned). A few Google (and StackOverflow) searches led me to : Coderun Bespin I'm well aware that these IDE's are missing a lot of what exists in VS 2010. However, that isn't my question. Instead, I'm wondering what benefits a web-based IDE might have? Assuming a company invests the time to create the missing features, what is the downside?

Read the article

Expanding Git SHA1 information into a checkin without archiving?

- by Tim Lin

Is there a way to include git commit hashes inside a file everytime I commit? I can only find out how to do this during archiving but I haven't been able to find out how to do this for every commit. I'm doing scientific programming with git as revision control, so this kind of functionality would be very helpful for reproducibility reasons (i.e., have the git hash automatically included in all result files and figures).

Read the article

Efficiency of manually written loops vs operator overloads (C++)

- by Sagekilla

Hi all, in the program I'm working on I have 3-element arrays, which I use as mathematical vectors for all intents and purposes. Through the course of writing my code, I was tempted to just roll my own Vector class with simple +, -, *, /, etc overloads so I can simplify statements like: for (int i = 0; i < 3; i++) r[i] = r1[i] - r2[i]; // becomes: r = r1 - r2; Which should be more or less identical in generated code. But when it comes to more complicated things, could this really impact my performance heavily? One example that I have in my code is this: Manually written version: for (int j = 0; j < 3; j++) { p.vel[j] = p.oldVel[j] + (p.oldAcc[j] + p.acc[j]) * dt2 + (p.oldJerk[j] - p.jerk[j]) * dt12; p.pos[j] = p.oldPos[j] + (p.oldVel[j] + p.vel[j]) * dt2 + (p.oldAcc[j] - p.acc[j]) * dt12; } Using a Vector class with operator overloads: p.vel = p.oldVel + (p.oldAcc + p.acc) * dt2 + (p.oldJerk - p.jerk) * dt12; p.pos = p.oldPos + (p.oldVel + p.vel) * dt2 + (p.oldAcc - p.acc) * dt12; I am compiling my code for maximum possible speed, as it's extremely important that this code runs quickly and calculates accurately. So will me relying on my Vector's for these sorts of things really affect me? For those curious, this is part of some numerical integration code which is not trivial to run in my program. Any insight would be appreciated, as would any idioms or tricks I'm unaware of.

Read the article

help me understand cuda

- by scatman

i am having some troubles understanding threads in NVIDIA gpu architecture with cuda. please could anybody clarify these info: an 8800 gpu has 16 SMs with 8 SPs each. so we have 128 SPs. i was viewing stanford's video presentation and it was saying that every SP is capable of running 96 threads cuncurrently. does this mean that it (SP) can run 96/32=3 warps concurrently? moreover, since every SP can run 96 threads and we have 8 SPs in every SM. does this mean that every SM can run 96*8=768 threads concurrently?? but if every SM can run a single Block at a time, and the maximum number of threads in a block is 512, so what is the purpose of running 768 threads concurrently and have a max of 512 threads? a more general question is:how are blocks,threads,and warps distributed to SMs and SPs? i read that every SM gets a single block to execute at a time and threads in a block is divided into warps (32 threads), and SPs execute warps.

Read the article

add uchar values in ushort array with sse2 or sse3

- by pompolus

i have an unsigned short dst[16][16] matrix and a larger unsigned char src[m][n] matrix. Now i have to access in the src matrix and add a 16x16 submatrix to dst, using sse2 or ss3. In a my older implementation, I was sure that my summed values ??were never greater than 256, so i could do this: for (int row = 0; row < 16; ++row) { __m128i subMat = _mm_lddqu_si128(reinterpret_cast<const __m128i*>(src)); dst[row] = _mm_add_epi8(dst[row], subMat); src += W; // Step to next row i need to add } where W is an offset to reach the desired rows. This code works, but now my values in src are larger and summed could be greater than 256, so i need to store them as ushort. i've tried this: for (int row = 0; row < 16; ++row) { __m128i subMat = _mm_lddqu_si128(reinterpret_cast<const __m128i*>(src)); dst[row] = _mm_add_epi16(dst[row], subMat); src += W; // Step to next row i need to add } but it doesn't work. I'm not so good with sse, so any help will be appreciated.

Read the article

Subdomain forwarding using .htaccess

- by RJ

I want to redirect a praticular subdomain to the main domain http(s)://dl.example.com/par1/par2 to http(s)://www.example.com/par1/par2 How to achieve the above using .htaccess Why i want to do this: Whenever any user download a file from my server, if the file is huge , then user cannot do any other operation until the file is downloaded completely...so the solution that i have thought is to forward the download request through subdomain so that the browser may continue with rest of the operation. Thanks

Read the article

How do I make an already written concurrent program run on a GPU array?

- by memius

I have a neural network written in Erlang, and I just bought a GeForce GTX 260 card with a 240 core GPU on it. Is it trivial to use CUDA as glue to run this on the graphics card?

Read the article

MPI Large Data all to all transfer

- by csslayer

My application of MPI has some process that generate some large data. Say we have N+1 process (one for master control, others are workers), each of worker processes generate large data, which is now simply write to normal file, named file1, file2, ..., fileN. The size of each file may be quite different. Now I need to send all fileM to rank M process to do the next job, So it's just like all to all data transfer. My problem is how should I use MPI API to send these files efficiently? I used to use windows share folder to transfer these before, but I think it's not a good idea. I have think about MPI_file and MPI_All_to_all, but these functions seems not to be so suitable for my case. Simple MPI_Send and MPI_Recv seems hard to be used because every process need to transfer large data, and I don't want to use distributed file system for now.

Search Results

Search found 3521 results on 141 pages for 'parallel computing'.

Page 41/141 | < Previous Page | 37 38 39 40 41 42 43 44 45 46 47 48 | Next Page >

- by Fyg

- by Omu

- by Jonogan

- by Vickey

- by Will Marcouiller

- by netvope

- by user1039583

- by Don Stewart

- by Martin Lauridsen

- by Algorist

- by Dave Viner

- by Omu

- by kailashbuki

- by soneangel

- by user317459

- by makerofthings7

- by Roger Alsing

- by Gabe

- by Tim Lin

- by Sagekilla

- by scatman

- by pompolus

- by RJ

- by memius

- by csslayer

< Previous Page | 37 38 39 40 41 42 43 44 45 46 47 48 | Next Page >