cloud computing - Page 66

median of a billion numbers

- by anony

If you have one billion numbers and one hundred computers, what is the best way to locate the median of the numbers?
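
One way to approach this, sketched below, is selection by counting: a coordinator binary-searches over the value range, and each machine only reports how many of its numbers fall at or below the current pivot. This is a minimal sketch, assuming integer data; the shards are plain lists standing in for the hundred machines, and the names `count_at_most` and `distributed_median` are hypothetical.

```python
from typing import List, Sequence


def count_at_most(shard: Sequence[int], pivot: int) -> int:
    """Work done locally on one machine: count values <= pivot."""
    return sum(1 for v in shard if v <= pivot)


def distributed_median(shards: List[Sequence[int]]) -> int:
    """Return the lower median of all values across the shards.

    Only per-shard counts, minima and maxima cross the (simulated) network.
    """
    total = sum(len(s) for s in shards)
    if total == 0:
        raise ValueError("no data")
    k = (total + 1) // 2  # 1-based rank of the lower median
    non_empty = [s for s in shards if s]
    lo = min(min(s) for s in non_empty)
    hi = max(max(s) for s in non_empty)
    # Binary search for the smallest value whose global rank is >= k.
    while lo < hi:
        mid = (lo + hi) // 2
        if sum(count_at_most(s, mid) for s in shards) >= k:
            hi = mid
        else:
            lo = mid + 1
    return lo
```

Each round costs one pass over every shard, and the number of rounds is bounded by the bit width of the value range (about 32 rounds for 32-bit integers), so very little data has to move between machines.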

Reproducibility in scientific programming

- by Andrew Grimm

Along with producing incorrect results, one of the worst fears in scientific programming is not being able to reproduce the results you've generated. What best practices help ensure your analysis is reproducible?
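
One practice that is often suggested is to write a small "run manifest" next to every result, so the run can be repeated later. The sketch below is only an illustration under stated assumptions: the function names (`file_sha256`, `start_run`, `save_manifest`) and the manifest fields are hypothetical, and it seeds only Python's built-in `random` module.

```python
import hashlib
import json
import platform
import random
import sys


def file_sha256(path: str) -> str:
    """Hash an input file in 1 MiB chunks so large files are fine."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def start_run(seed: int, input_paths=()) -> dict:
    """Seed the random number generator and describe the run."""
    random.seed(seed)
    return {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "seed": seed,
        "argv": list(sys.argv),
        "inputs": {path: file_sha256(path) for path in input_paths},
    }


def save_manifest(manifest: dict, path: str) -> None:
    """Write the manifest as sorted, indented JSON."""
    with open(path, "w") as handle:
        json.dump(manifest, handle, indent=2, sort_keys=True)
```

Calling `start_run(42, ["data.csv"])` at the top of an analysis and saving the result beside the output files records which inputs and which seed produced them (`data.csv` is a placeholder file name).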

Higher speed options for executing very large (20 GB) .sql file in MySQL

- by Jonogan

My firm was delivered a 20+ GB .sql file in response to a request for data from the government. I don't have many options for getting the data in a different format, so I need options for importing it in a reasonable amount of time. I'm running it on a high-end server (Win 2008 64-bit, MySQL 5.1) using Navicat's batch execution tool. It's been running for 14 hours and shows no sign of being near completion. Does anyone know of any faster options for such an import? Or is this what I should expect given the large file size? Thanks
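
One option that may be worth trying is to wrap the dump in session settings that defer commits and constraint checks, then feed it to the `mysql` command-line client rather than a GUI tool. The sketch below assumes the tables are InnoDB and that the dump does not already set these variables; neither is stated in the post. The helper names `wrap_dump` and `wrap_dump_file` are hypothetical.

```python
from typing import Iterable, Iterator

# Session settings commonly recommended for bulk loading into InnoDB.
PREAMBLE = (
    b"SET autocommit=0;\n",
    b"SET unique_checks=0;\n",
    b"SET foreign_key_checks=0;\n",
)
POSTAMBLE = (
    b"\n",  # guards against a dump whose last line has no newline
    b"COMMIT;\n",
    b"SET unique_checks=1;\n",
    b"SET foreign_key_checks=1;\n",
)


def wrap_dump(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Yield the dump's lines between the preamble and the postamble."""
    yield from PREAMBLE
    yield from lines
    yield from POSTAMBLE


def wrap_dump_file(src_path: str, dst_path: str) -> None:
    """Stream line by line so a 20 GB file never has to fit in memory."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        dst.writelines(wrap_dump(src))
```

The wrapped file could then be loaded with something like `mysql -u user -p dbname < wrapped.sql`, where the user and database names are placeholders.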

open source gossip-based membership protocol?

- by Aaron

I am looking for a library that implements any gossip-based membership protocol and that I can plug into a distributed application. Such a library would allow me to send/receive membership lists, merge received membership lists, etc. Even better would be if the library implemented a protocol with O(log n) performance guarantees. Does anyone know of any open source library like this? It doesn't need to meet all of the aforementioned requirements; even something partially implemented would be helpful.
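
To make the "merge received membership lists" step concrete, here is a minimal sketch of heartbeat-based merging with push gossip. It is not taken from any particular library; the `Membership` type and the functions `merge` and `gossip_round` are hypothetical names, and failure detection (expiring members whose heartbeat stops advancing) is left out.

```python
import random
from typing import Dict

Membership = Dict[str, int]  # node id -> highest heartbeat counter seen


def merge(local: Membership, received: Membership) -> Membership:
    """Keep, for every node, the larger of the two heartbeat counters."""
    merged = dict(local)
    for node, beat in received.items():
        if beat > merged.get(node, -1):
            merged[node] = beat
    return merged


def gossip_round(views: Dict[str, Membership], rng: random.Random) -> None:
    """Each node bumps its own heartbeat and pushes its view to one random peer.

    Requires at least two nodes, since every node needs a peer to talk to.
    """
    nodes = list(views)
    for node in nodes:
        views[node][node] = views[node].get(node, 0) + 1
        peer = rng.choice([n for n in nodes if n != node])
        views[peer] = merge(views[peer], views[node])
```

Because `merge` takes the maximum per node, it gives the same result whatever order messages arrive in and however often they are repeated, which is what lets gossip tolerate lost or duplicated messages.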

Is it possible to use the Spread Toolkit on Amazon EC2?

- by Dave Viner

The Spread Toolkit (http://www.spread.org) allows for easy distributed messaging using publish-subscribe semantics. Is it possible to use this toolkit on EC2? What other pub-sub message buses can be used on EC2 (other than Amazon's SQS)?

Preallocate memory for a program in Linux before it gets started

- by Fyg

Hi, folks, I have a program that repeatedly solves large systems of linear equations using Cholesky decomposition. A characteristic of the problem is that I sometimes need to store the complete factorisation, which can exceed about 20 GB of memory. The factorisation happens inside a library that I call. Furthermore, the matrix and the resulting factorisation change quite frequently, and so do the memory requirements. I am not the only person using this compute node. Therefore, is there a way to start the program under Linux and preallocate free memory for the process? Something like:

    $: prealloc -m 25G ./program

Problem with Boost::Asio for C++

- by Martin Lauridsen

Hi there,

For my bachelor's thesis, I am implementing a distributed version of an algorithm for factoring large integers (finding the prime factorisation). This has applications in e.g. the security of the RSA cryptosystem. My vision is that clients (Linux or Windows) will download an application and compute some numbers (these are independent, and thus suited to parallelization). The numbers (not found very often) will be sent to a master server, which collects them. Once enough numbers have been collected, the master server will do the rest of the computation, which cannot be easily parallelized.

Anyhow, to the technicalities. I was thinking of using Boost::Asio to do a socket client/server implementation for the clients' communication with the master server. Since I want to compile for both Linux and Windows, I thought Windows would be as good a place to start as any. So I downloaded the Boost library and compiled it, as it said on the Boost Getting Started page:

    bootstrap
    .\bjam

It all compiled just fine. Then I tried to compile one of the tutorial examples, client.cpp, from Asio, found (here.. edit: can't post link because of restrictions). I am using the Visual C++ compiler from Microsoft Visual Studio 2008, like this:

    cl /EHsc /I D:\Downloads\boost_1_42_0 client.cpp

But I get this error:

    /out:client.exe
    client.obj
    LINK : fatal error LNK1104: cannot open file 'libboost_system-vc90-mt-s-1_42.lib'

Does anyone have any idea what could be wrong, or how I could move forward? I have been trying pretty much all week to get a simple client/server socket program for C++ working, but with no luck. Serious frustration kicking in. Thank you in advance.

What should I do to accommodate large-scale data storage and retrieval?

- by kailashbuki

There are two columns in the table inside the MySQL database. The first column contains the fingerprint, while the second contains the list of documents which have that fingerprint. It's much like an inverted index built by search engines. An instance of a record inside the table is shown below:

    34    "doc1, doc2, doc45"

The number of fingerprints is very large (it can range up to trillions). There are basically the following operations on the database: inserting/updating a record, and retrieving a record according to a match on the fingerprint. The table definition Python snippet is:

    self.cursor.execute("CREATE TABLE IF NOT EXISTS `fingerprint` (fp BIGINT, documents TEXT)")

And the snippet for the insert/update operation is:

    if self.cursor.execute("UPDATE `fingerprint` SET documents=CONCAT(documents,%s) WHERE fp=%s", (","+newDocId, thisFP)) == 0L:
        self.cursor.execute("INSERT INTO `fingerprint` VALUES (%s, %s)", (thisFP, newDocId))

The only bottleneck I have observed so far is the query time in MySQL. My whole application is web-based, so time is a critical factor. I have also thought of using Cassandra but have little knowledge of it. Please suggest a better way to tackle this problem.
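
The table as defined above has no key on `fp`, so every `UPDATE ... WHERE fp=...` has to search the whole table. One alternative layout, sketched below, stores one row per (fingerprint, document) pair under a composite primary key, so that both the insert and the lookup go through an index. This is only an illustration: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the table name `fingerprint_doc` and the helper functions are hypothetical.

```python
import sqlite3


def open_index(path: str = ":memory:") -> sqlite3.Connection:
    """Create the posting table: one row per (fingerprint, document) pair."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fingerprint_doc ("
        " fp INTEGER NOT NULL,"
        " doc_id TEXT NOT NULL,"
        " PRIMARY KEY (fp, doc_id))"
    )
    return conn


def add_posting(conn: sqlite3.Connection, fp: int, doc_id: str) -> None:
    """Insert a pair; a duplicate pair is silently skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO fingerprint_doc (fp, doc_id) VALUES (?, ?)",
        (fp, doc_id),
    )


def documents_for(conn: sqlite3.Connection, fp: int) -> list:
    """Return the documents for one fingerprint via the primary-key index."""
    rows = conn.execute(
        "SELECT doc_id FROM fingerprint_doc WHERE fp = ? ORDER BY doc_id",
        (fp,),
    )
    return [row[0] for row in rows]
```

With this layout an insert never rewrites a growing text value, and the update-then-insert pair collapses into a single statement (in MySQL the corresponding form would be `INSERT IGNORE`). Whether it is fast enough at the scale described would need measuring.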

Best ways to utilize computer resources

- by Algorist

Hi, do you ever feel bad about leaving office or lab computers on after the working day? Is there anything useful you use those computers for until you come back the next morning? Thank you.

Are batch mutations atomic in Cassandra?

- by user317459

The Cassandra API supports batch mutations:

    batch_mutate(keyspace, mutation_map, consistency_level): Executes the specified mutations on the keyspace. mutation_map is a map; the outer map maps the key to the inner map, which maps the column family to the Mutation; can be read as: map. To be more specific, the outer map key is a row key, the inner map key is the column family name. A Mutation specifies either columns to insert or columns to delete. See Mutation and Deletion above for more details.

Are all mutations that are executed in a batch executed atomically? So if one of the mutations fails, do the others fail too?

Parallelizing a serial algorithm

- by user643813

Hej folks, I am working on porting a text mining/natural language application from single-core to a Map-Reduce style system. One of the steps involves a while loop similar to this:

    Queue<Element>;
    while (!queue.empty()) {
        Element e = queue.next();
        Set<Element> result = calculateResultSet(e);
        if (!result.empty()) {
            queue.addAll(result);
        }
    }

Each iteration depends on the result of the one before (kind of). There is no way of determining the number of iterations this loop will have to perform. Is there a way of parallelizing a serial algorithm such as this one? I am trying to think of a feedback mechanism that is able to provide its own input, but how would one go about parallelizing it? Thanks for any help/remarks
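
One possible restructuring is to process the queue in rounds: everything currently queued is expanded in parallel, and the union of the results becomes the next round's input. The sketch below assumes that the result for an element depends only on that element, that elements are hashable, and that an element already seen does not need processing again; none of this is stated in the post. `parallel_worklist` and `expand` are hypothetical names, with `expand` standing in for `calculateResultSet`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Hashable, Iterable, Set


def parallel_worklist(
    seeds: Iterable[Hashable],
    expand: Callable[[Hashable], Set[Hashable]],
    max_workers: int = 4,
) -> Set[Hashable]:
    """Expand a worklist round by round until no new elements appear."""
    seen = set(seeds)
    frontier = list(seen)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while frontier:
            next_frontier = set()
            # All elements of the current round are independent of each other.
            for result in pool.map(expand, frontier):
                next_frontier.update(result)
            next_frontier -= seen  # drop elements handled in earlier rounds
            seen |= next_frontier
            frontier = list(next_frontier)
    return seen
```

In a Map-Reduce setting each round would correspond to one job, with the map step calling `calculateResultSet` and the reduce step removing duplicates, and the driver would resubmit the job until a round produces nothing new.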

For distributed applications, which to use, ASIO vs. MPI?

- by Rhubarb

I am a bit confused about this. If you're building a distributed application, which in some cases may perform parallel operations (although not necessarily mathematical), should you use ASIO or something like MPI? I take it MPI is a higher level than ASIO, but it's not clear where in the stack one would begin.

Expanding Git SHA1 information into a checkin without archiving?

- by Tim Lin

Is there a way to include git commit hashes inside a file every time I commit? I can only find out how to do this during archiving, but I haven't been able to find out how to do it for every commit. I'm doing scientific programming with git as revision control, so this kind of functionality would be very helpful for reproducibility reasons (i.e., having the git hash automatically included in all result files and figures).
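
A commit's hash depends on the committed content, so a file cannot contain the hash of the very commit that adds it. A common workaround for result files is to look the hash up when the program runs and stamp it into the output. The sketch below does that with `git rev-parse HEAD`; the function names `git_commit_hash` and `stamp` and the header format are hypothetical, and it does not detect uncommitted changes in the working tree.

```python
import subprocess


def git_commit_hash(repo_dir: str = ".") -> str:
    """Return the hash of HEAD, or "unknown" if git or the repo is missing."""
    try:
        completed = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return "unknown"
    return completed.stdout.strip()


def stamp(text: str, commit: str) -> str:
    """Prefix a result file's text with a comment naming the commit."""
    return f"# git commit: {commit}\n{text}"
```

A result file could then be written as `stamp(csv_text, git_commit_hash())`, and the same string could go into a figure caption or title.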

Efficiency of manually written loops vs operator overloads (C++)

- by Sagekilla

Hi all, in the program I'm working on I have 3-element arrays, which I use as mathematical vectors for all intents and purposes. Through the course of writing my code, I was tempted to just roll my own Vector class with simple +, -, *, /, etc. overloads so I can simplify statements like:

    for (int i = 0; i < 3; i++)
        r[i] = r1[i] - r2[i];

    // becomes:
    r = r1 - r2;

Which should be more or less identical in generated code. But when it comes to more complicated things, could this really impact my performance heavily? One example that I have in my code is this:

Manually written version:

    for (int j = 0; j < 3; j++)
    {
        p.vel[j] = p.oldVel[j] + (p.oldAcc[j] + p.acc[j]) * dt2 + (p.oldJerk[j] - p.jerk[j]) * dt12;
        p.pos[j] = p.oldPos[j] + (p.oldVel[j] + p.vel[j]) * dt2 + (p.oldAcc[j] - p.acc[j]) * dt12;
    }

Using a Vector class with operator overloads:

    p.vel = p.oldVel + (p.oldAcc + p.acc) * dt2 + (p.oldJerk - p.jerk) * dt12;
    p.pos = p.oldPos + (p.oldVel + p.vel) * dt2 + (p.oldAcc - p.acc) * dt12;

I am compiling my code for maximum possible speed, as it's extremely important that this code runs quickly and calculates accurately. So will relying on my Vector class for these sorts of things really affect me? For those curious, this is part of some numerical integration code which is not trivial to run in my program. Any insight would be appreciated, as would any idioms or tricks I'm unaware of.

Understanding omission failure in distributed systems

- by karthik A

The following text says something I can't quite agree with: Client C sends a request R to server S. The time taken by a communication link to transport R over the link is D. P is the maximum time needed by S to receive, process and reply to R. If omission failure is assumed, then if no reply to R is received within 2(D+P), C will never receive a reply to R. Why is the time here 2(D+P)? As I understand it, shouldn't it be 2D+P?

Recommendations for Open Source Parallel programming IDE

- by Andrew Bolster

What are the best IDEs / IDE plugins / tools, etc. for programming with CUDA / MPI etc.? I've been working in these frameworks for a short while but feel like the IDE could be doing more heavy lifting in terms of scaling and job processing interactions. (I usually use Eclipse or NetBeans, and usually work in C/C++ with occasional Java. It's a vague question, but I can't think of any more specific way to put it.)

Cluster of computers for rent?

- by R S

I am doing a project at the university which requires running multiple instances (1000s) of a program I've written (in C++), which runs for quite a while (say 2 hours). The program is very self-contained: it does not require input files, and the only dependency, I think, is Boost. I'm currently using the university-owned cluster of computers. However, it's quite old and the job dispatching and monitoring services are pretty bad. So I was wondering whether I can run my jobs elsewhere, for some money. For example, I looked a bit into Google App Engine, but since it seems every job must end after 30 seconds, it is not suitable for me. Maybe Amazon EC2? Do you know of such options?

No recent books on MPI: is it dying?

- by Jono

I've never used Message Passing Interface (MPI), but I've heard its name thrown about, most recently with Windows HPC Server. I had a quick look on Amazon to see if there were any books on it, but they're all dated around 7 or more years ago. Is MPI still a valid technology choice for new applications, or has it been largely superseded by other distributed programming alternatives (e.g. DataSynapse GridServer)? As it's not really an implementation, but rather a standard, what is the likelihood (assuming it's not dead) that learning it will result in better design of distributed programming systems? Is there something else I should be looking at instead?

C++ Winsock P2P

- by Goober

Scenario: Does anyone have any good examples of peer-to-peer (p2p) networking in C++ using Winsock? It's a requirement I have for a client who specifically needs to use this technology (god knows why). I need to determine whether this is feasible. Any help would be greatly appreciated.

Does anyone know of any good tutorials for using the APIs from Amazon Web Services, namely CloudWatc

- by undefined

Hi, I have been wrestling with Amazon's CloudWatch API with limited success. Does anyone know of any good resources (other than Amazon's API docs) for using the APIs? I have tried to run them using the PHP library for CloudWatch but get nothing but error codes. I am configuring the GetMetricStatisticsSample.php file as follows:

    $request = array();
    $endTime = date("Y-m-d G:i:s");
    $yesterday = mktime (date("H"), date("i"), date("s"), date("m"), date("d")-1, date("Y"));
    $startTime = date("Y-m-d 00:00:00", $yesterday);
    $request["Statistics.member.1"] = "Average";
    $request["EndTime"] = $endTime;
    $request["StartTime"] = $startTime;
    $request["MeasureName"] = "CPUUtilization";
    $request["Unit"] = "Percent";
    invokeGetMetricStatistics($service, $request);

But this returns "Caught Exception: Internal Error Response Status Code: 400 Error Code: Error Type: Request ID: XML:"

I have also tried from the command line as follows:

    set JAVA_HOME=C:\Program Files\Java\jre1.6.0_05
    set AWS_CLOUDWATCH_HOME=C:\AmazonWebServices\API_tools\CloudWatch-1.0.0.24
    set PATH=%AWS_CLOUDWATCH_HOME%\bin
    mon-list-metrics

but get 'C:\Program' is not recognized as an internal or external command...

Any suggestions? Cheers
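
One thing that may be worth checking is the timestamp format. The snippet above sends times such as `2010-05-14 9:30:05`; the sketch below builds the same window (midnight yesterday up to now) in ISO 8601 UTC form instead. That the API expects ISO 8601 is an assumption here, since the post does not say, and the function names `iso8601_utc` and `last_day_window` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def iso8601_utc(moment: datetime) -> str:
    """Format a timezone-aware datetime as e.g. 2010-05-14T09:30:05Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def last_day_window(now: datetime):
    """Return (start, end): midnight UTC of the previous day up to `now`."""
    end = now.astimezone(timezone.utc)
    start = (end - timedelta(days=1)).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return iso8601_utc(start), iso8601_utc(end)
```

Calling `last_day_window(datetime.now(timezone.utc))` gives the two strings that would correspond to `StartTime` and `EndTime` in the request.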

Outer product using CBLAS

- by The Dude

I am having trouble utilizing CBLAS to perform an outer product. My code is as follows:

    //===SET UP===//
    double x1[] = {1,2,3,4};
    double x2[] = {1,2,3};
    int dx1 = 4;
    int dx2 = 3;
    double X[dx1 * dx2];
    for (int i = 0; i < (dx1*dx2); i++) {X[i] = 0.0;}

    //===DO THE OUTER PRODUCT===//
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasTrans, dx1, dx2, 1, 1.0, x1, dx1, x2, 1, 0.0, X, dx1);

    //===PRINT THE RESULTS===//
    printf("\nMatrix X (%d x %d) = x1 (*) x2 is:\n", dx1, dx2);
    for (i=0; i<4; i++) {
        for (j=0; j<3; j++) {
            printf ("%lf ", X[j+i*3]);
        }
        printf ("\n");
    }

I get:

    Matrix X (4 x 3) = x1 (*) x2 is:
    1.000000 2.000000 3.000000
    0.000000 -1.000000 -2.000000
    -3.000000 0.000000 7.000000
    14.000000 21.000000 0.000000

But the correct answer is found here: https://www.sharcnet.ca/help/index.php/BLAS_and_CBLAS_Usage_and_Examples

I have seen "Efficient computation of kronecker products in C", but it doesn't help me because they don't actually say how to utilize dgemm to do this. Any help? What am I doing wrong here?
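
As a reference point for checking the BLAS output, the expected result can be computed without BLAS. The sketch below fills a flat row-major buffer the same way the printing loop above reads it (index `i*3 + j`, i.e. a leading dimension equal to the number of columns). It may be worth comparing that stride with the leading-dimension arguments passed to `cblas_dgemm`. The function name `outer_row_major` is hypothetical.

```python
def outer_row_major(x, y):
    """Return the outer product of x and y as a flat row-major list."""
    m, n = len(x), len(y)
    ldc = n  # row-major leading dimension: number of elements per row
    out = [0.0] * (m * n)
    for i in range(m):
        for j in range(n):
            out[i * ldc + j] = x[i] * y[j]
    return out


if __name__ == "__main__":
    flat = outer_row_major([1, 2, 3, 4], [1, 2, 3])
    # Print one matrix row (3 values) per line.
    for i in range(4):
        print(flat[i * 3:(i + 1) * 3])
```

For the inputs in the post the rows are `1 2 3`, `2 4 6`, `3 6 9` and `4 8 12`.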

How do I control the output file names and content of a Hadoop streaming job?

- by Eran Kampf

Is there a way to control the output filenames of a Hadoop Streaming job? Specifically, I would like my job's output files' content and names to be organized by the key the reducer outputs: each file would only contain values for one key, and its name would be the key. Update: Just found the answer. Using a Java class that derives from MultipleOutputFormat as the job's output format allows control of the output file names. http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputFormat.html I haven't seen any samples for this out there... Can anyone point me to a Hadoop Streaming sample that makes use of a custom output format Java class?

What kind of work benefits from OpenCL

- by Daniel

Hey all,

First of all:

* I am well aware that OpenCL does not magically make everything faster
* I am well aware that OpenCL has limitations

So now to my question. I am used to doing different scientific calculations using programming. Some of the things I work with are pretty intense in regards to the complexity and number of calculations. So I was wondering, maybe I could speed things up by using OpenCL. What I would love to hear from you all is answers to some of the following [bonus for links]:

* What kinds of calculations/algorithms/general problems are suitable for OpenCL?
* What are the general guidelines for determining whether some particular code would benefit from migration to OpenCL?

Regards

Scientific Plotting in Python

- by user100046

I have a large data set of tuples containing (time of event, latitude, longitude) that I need to visualize. I was hoping to generate a 'movie'-like xy-plot, but was wondering if anyone has a better idea or if there is an easy way to do this in Python? Thanks in advance for the help, --Leo
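
Whatever plotting library is used, a "movie" needs the events grouped into frames first. The sketch below does only that grouping, assuming the event time is a number of seconds; the function name `frames_by_time` and the choice of fixed-width time windows are hypothetical.

```python
from collections import defaultdict


def frames_by_time(events, frame_seconds):
    """Group (time, latitude, longitude) tuples into frames of fixed duration.

    Returns a list of frames in time order; each frame is a list of
    (longitude, latitude) points, i.e. (x, y) pairs ready for an xy-plot.
    Time windows without events produce no frame.
    """
    buckets = defaultdict(list)
    for t, lat, lon in events:
        buckets[int(t // frame_seconds)].append((lon, lat))
    return [buckets[key] for key in sorted(buckets)]
```

Each frame could then be drawn with a scatter-plot call (for example matplotlib's `scatter`) and saved as one image of the sequence.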

Vector Calculations in LISP

- by abidikgubidik

How can I perform vector calculations in Lisp, such as the magnitude of a vector, the norm of a vector, the distance between two points, the dot product, the cross product, etc.? Thanks.
