Search Results

Search found 4842 results on 194 pages for 'computation expression'.

Page 119/194 | < Previous Page | 115 116 117 118 119 120 121 122 123 124 125 126  | Next Page >

  • SharePoint OCR image files indexing

    Introduction This article describes how to set up indexing of image files (including TIFF, PDF, JPEG, BMP...) using OCR technology. The indexing described below utilizes Microsoft IFilter technology and as such is not specific to SharePoint, but can be used with any product that uses Microsoft indexing: Microsoft Search, Desktop Search, SQL Server search, and, through plug-ins, Google Desktop Search. I, however, use it with Microsoft Windows SharePoint Services 2003. For those other products, the registration may need to be slightly different. Background One of the projects I was working on required storage of old documents scanned into PDF files. A separate team of people was then responsible for providing tags for a search engine so those image documents could be found. The whole process was clumsy, labor intensive, and error prone. That was what started me on my exploration path. OCR The first search I fired off was for Open Source OCR products. Pretty quickly, I narrowed it down to TESSERACT (http://code.google.com/p/tesseract-ocr/). Tesseract is an orphaned brainchild of HP, which worked on it from 1985 to 1995. It was then moved to open source, and now, if I understand it correctly, Google is working on it. With credentials like that, it's no wonder that Tesseract scores one of the highest marks on OCR recognition and accuracy. After downloading and struggling just a bit, I got Tesseract to work. The struggling part was that the home page claims its base input format is a TIFF file. Maybe my TIFFs were bad, but I was able to get it to work only with BMP files. Image file conversion So now that I have an OCR engine that can convert BMP files into text, how do I get text out of the image PDF files? One more search, and I settled on ImageMagick (http://www.imagemagick.org/). This is another wonderful Open Source utility that can convert almost any file into an image. It did work out of the box, converting TIFF files into bitmaps, but to get PDF files converted it requires GhostScript (http://mirror.cs.wisc.edu/pub/mirrors/ghost/GPL/gs864/gs864w32.exe). Dealing with text PDFs With that utility installed, I was cooking - I can convert any file (in particular PDF and TIFF) into a bitmap, and then I can extract the text out of the bitmap. The only consideration was to somehow treat PDF files that already contain text differently - after all, OCR is very computation intensive and somewhat error prone even with perfect image quality and resolution. So another quick search, and I have PDFTOTEXT (ftp://ftp.foolabs.com/pub/xpdf/xpdf-3.02pl4-win32.zip) - thank God for Open Source! With these tools, I can pull text out of a PDF in the blink of an eye. I would get nothing for pure image PDFs, but I already have a solution for that! Batch process It took another 15 minutes to set up a batch script to automate the process: check the file extension; if the file is a PDF, try to extract text out of it - if there is more than a certain amount of text in the file, done! If there is no text, convert the first page into a bitmap and run OCR on the bitmap. For any other file type, convert the file into a bitmap and run OCR on the bitmap. Once you unzip the attached project, check out the bin\OCR.BAT file. It will create a temporary file in the directory where your source file is, with the same name plus the '.txt' extension.
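    For illustration, here is a rough Python sketch of the same pipeline (the original is a Windows OCR.BAT; the sketch assumes tesseract, pdftotext, and ImageMagick's convert are on the PATH, and the text-size threshold is an arbitrary guess):

```python
import subprocess
import sys
from pathlib import Path

MIN_TEXT_BYTES = 100   # assumed threshold for "enough" embedded text


def run(*cmd):
    subprocess.run(cmd, check=True)


def extract_text(src: Path) -> Path:
    out_base = Path(str(src))          # tesseract appends ".txt" to this base name
    out_txt = Path(str(src) + ".txt")  # same name + '.txt', as OCR.BAT does
    bmp = Path(str(src) + ".bmp")
    if src.suffix.lower() == ".pdf":
        # Try the embedded text layer first; OCR only if it is (nearly) empty.
        run("pdftotext", str(src), str(out_txt))
        if out_txt.stat().st_size >= MIN_TEXT_BYTES:
            return out_txt
        # Pure image PDF: rasterize the first page only, then OCR it.
        run("convert", f"{src}[0]", str(bmp))
    else:
        # Any other image type (TIFF, JPEG, ...): convert to BMP, then OCR.
        run("convert", str(src), str(bmp))
    run("tesseract", str(bmp), str(out_base))
    return out_txt


if __name__ == "__main__":
    print(extract_text(Path(sys.argv[1])))
```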

    Read the article

  • Lazy Evaluation – Why being lazy in F# blows my mind!

    - by MarkPearl
    First of all – shout out to Peter Adams – from the feedback I have gotten from him on the last few posts on F# that I have done, my mind has just been expanded. I did a blog post a few days ago about infinite sequences – I didn't really understand what was going on with it, and I still don't really get it – but I am getting closer. In Peter's last comment he made mention of lazy evaluation. I am ashamed to say that up till then I had never heard about lazy evaluation – how can evaluation be lazy? I mean, I know about lazy loading and that makes sense… but surely something is either evaluated or not! Well… a bit of reading today and I have been enlightened to a point – if you know of any good articles explaining lazy evaluation, please send them to me. So what is lazy evaluation and why is it useful? Lazy evaluation is a process whereby the system only computes the values needed and "ignores" the computations not needed. I'm going out on a limb here, but with this explanation in hand, imagine the following C# code… public int CalculatedVal() { int Val1 = 0; int Val2 = 0; for (int Count = 0; Count < 1000000; Count++) { Val1++; } return Val2; }   Normally, even though Val1 is never needed, the system would loop 1000000 times and add 1 to the current value of Val1. Imagine if the system realized this and so just skipped this segment of code and instead did the following… public int CalculatedVal() { int Val2 = 0; return Val2; }   A massive saving in computation and wasted effort. Now I am pretty sure it isn't as simple as this, but I think this is the basic idea. For a more detailed explanation of lazy evaluation in C#, Pedram Rezei has a wonderful post on lazy evaluation and makes some C# comparisons. I am not going to steal any thunder from him by repeating everything he said, since I think he did such a good job of explaining it himself. What I am interested in, though, is how in F# you tell something to use lazy evaluation, and how you know whether something will be eager or lazy by looking at it. I found this post useful. From reading around, F# by default uses eager evaluation unless explicitly told to use lazy evaluation. One exception to this is sequences, which are lazy by default. Reading about lazy evaluation has also helped me understand more about F# coding… From my understanding of F#, because of its declarative nature, most of the code is declaring properties and rules – very little code is actually saying "do this right now" – but when it comes to a "do this" section, F# then evaluates and optimizes the code and applies the rules. So props to lazy evaluation and its optimizations…
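    Python rather than F#, but the idea translates - here is a tiny sketch (assuming Python 3.9+ so that functools.cache can play the evaluate-once-when-forced role of F#'s lazy values):

```python
import functools


def expensive():
    print("expensive() actually ran")
    total = 0
    for _ in range(1_000_000):
        total += 1
    return total


# Eager: calling expensive() here would do the work immediately, needed or not.
# Lazy: wrap it so nothing happens until someone forces the value; cache makes it
# evaluate at most once, which is how F#'s lazy values behave.
lazy_val = functools.cache(expensive)


def calculated_val(use_val: bool) -> int:
    return lazy_val() if use_val else 0


print(calculated_val(False))  # 0 -- the million-iteration loop never ran
print(calculated_val(True))   # forces it: prints "expensive() actually ran", then 1000000
print(calculated_val(True))   # cached: no second "actually ran" message
```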

    Read the article

  • Improving performance of a particle system (OpenGL ES)

    - by Jason
    I'm in the process of implementing a simple particle system for a 2D mobile game (using OpenGL ES 2.0). It's working, but it's pretty slow. I start getting frame rate battering after about 400 particles, which I think is pretty low. Here's a summary of my approach: I start with point sprites (GL_POINTS) rendered in a batch just using a native float buffer (I'm in Java-land on Android, so that translates as a java.nio.FloatBuffer). On GL context init, the following are set: GLES20.glViewport(0, 0, width, height); GLES20.glClearColor(0.0f, 0.0f, 0.0f, 0.0f); GLES20.glEnable(GLES20.GL_CULL_FACE); GLES20.glDisable(GLES20.GL_DEPTH_TEST); Each draw frame sets the following: GLES20.glEnable(GLES20.GL_BLEND); GLES20.glBlendFunc(GLES20.GL_ONE, GLES20.GL_ONE_MINUS_SRC_ALPHA); And I bind a single texture: GLES20.glActiveTexture(GLES20.GL_TEXTURE0); GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, textureHandle); GLES20.glUniform1i(mUniformTextureHandle, 0); Which is just a simple circle with some blur (and hence some transparency) http://cl.ly/image/0K2V2p2L1H2x Then there are a bunch of glVertexAttribPointer calls: mBuffer.position(position); mGlEs20.glVertexAttribPointer(mAttributeRGBHandle, valsPerRGB, GLES20.GL_FLOAT, false, stride, mBuffer); ...4 more of these Then I'm drawing: GLES20.glUniformMatrix4fv(mUniformProjectionMatrixHandle, 1, false, Camera.mProjectionMatrix, 0); GLES20.glDrawArrays(GLES20.GL_POINTS, 0, drawCalls); GLES20.glBindTexture(GLES20.GL_TEXTURE_2D, 0); My vertex shader does have some computation in it, but given that they're point sprites (with only 2 coordinate values) I'm not sure this is the problem: #ifdef GL_ES // Set the default precision to low. precision lowp float; #endif uniform mat4 u_ProjectionMatrix; attribute vec4 a_Position; attribute float a_PointSize; attribute vec3 a_RGB; attribute float a_Alpha; attribute float a_Burn; varying vec4 v_Color; void main() { vec3 v_FGC = a_RGB * a_Alpha; v_Color = vec4(v_FGC.x, v_FGC.y, v_FGC.z, a_Alpha * (1.0 - a_Burn)); gl_PointSize = a_PointSize; gl_Position = u_ProjectionMatrix * a_Position; } My fragment shader couldn't really be simpler: #ifdef GL_ES // Set the default precision to low. precision lowp float; #endif uniform sampler2D u_Texture; varying vec4 v_Color; void main() { gl_FragColor = texture2D(u_Texture, gl_PointCoord) * v_Color; } That's about it. I had read that transparent pixels in point sprites can cause issues, but surely not at only 400 points? I'm running on a fairly new device (12 month old Galaxy Nexus). My question is less about my approach (although I'm open to suggestion) but more about whether there are any specific OpenGL "no no's" that have leaked into my code. I'm sure there's GL master out there facepalming right now... I'd love to hear any critique.

    Read the article

  • Serial plans: Threshold / Parallel_degree_limit = 1

    - by jean-pierre.dijcks
    As a very short follow-up to the previous post, here is some more on getting a serial plan and why that happens. Another reason to get a serial plan - often more prevalent than auto DOP simply not being on, which we looked at in the earlier post - is that the plan does not take long enough to be considered for a parallel path. The resulting plan and note look like this (note that this is a serial plan!): explain plan for select count(1) from sales; SELECT PLAN_TABLE_OUTPUT FROM TABLE(DBMS_XPLAN.DISPLAY()); PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- Plan hash value: 672559287 -------------------------------------------------------------------------------------- | Id  | Operation            | Name  | Rows  | Cost (%CPU)| Time     | Pstart| Pstop | -------------------------------------------------------------------------------------- PLAN_TABLE_OUTPUT -------------------------------------------------------------------------------- |   0 | SELECT STATEMENT     |       |     1 |     5   (0)| 00:00:01 |       |     | |   1 |  SORT AGGREGATE      |       |     1 |            |          |       |     | |   2 |   PARTITION RANGE ALL|       |   960 |     5   (0)| 00:00:01 |     1 |  16 | |   3 |    TABLE ACCESS FULL | SALES |   960 |     5   (0)| 00:00:01 |     1 |  16 | Note -----    - automatic DOP: Computed Degree of Parallelism is 1 because of parallel threshold 14 rows selected. The parallel threshold refers to parallel_min_time_threshold, and since I did not change the default (10s), the plan is not being considered for a parallel degree computation and therefore stays with serial execution. Now we go into the land of crazy: assume I do want this DOP=1 to happen. I could set the parameter in the init.ora, but to highlight it in this case I changed it on the session: alter session set parallel_degree_limit = 1; The result I get is: ERROR: ORA-02097: parameter cannot be modified because specified value is invalid ORA-00096: invalid value 1 for parameter parallel_degree_limit, must be from among CPU IO AUTO INTEGER>=2 Which of course makes perfect sense...

    Read the article

  • New R Interface to Oracle Data Mining Available for Download

    - by charlie.berger
      The R Interface to Oracle Data Mining ( R-ODM) allows R users to access the power of Oracle Data Mining's in-database functions using the familiar R syntax. R-ODM provides a powerful environment for prototyping data analysis and data mining methodologies. R-ODM is especially useful for: Quick prototyping of vertical or domain-based applications where the Oracle Database supports the application Scripting of "production" data mining methodologies Customizing graphics of ODM data mining results (examples: classification, regression, anomaly detection) The R-ODM interface allows R users to mine data using Oracle Data Mining from the R programming environment. It consists of a set of function wrappers written in source R language that pass data and parameters from the R environment to the Oracle RDBMS enterprise edition as standard user PL/SQL queries via an ODBC interface. The R-ODM interface code is a thin layer of logic and SQL that calls through an ODBC interface. R-ODM does not use or expose any Oracle product code as it is completely an external interface and not part of any Oracle product. R-ODM is similar to the example scripts (e.g., the PL/SQL demo code) that illustrates the use of Oracle Data Mining, for example, how to create Data Mining models, pass arguments, retrieve results etc. R-ODM is packaged as a standard R source package and is distributed freely as part of the R environment's Comprehensive R Archive Network (CRAN). For information about the R environment, R packages and CRAN, see www.r-project.org. R-ODM is particularly intended for data analysts and statisticians familiar with R but not necessarily familiar with the Oracle database environment or PL/SQL. It is a convenient environment to rapidly experiment and prototype Data Mining models and applications. Data Mining models prototyped in the R environment can easily be deployed in their final form in the database environment, just like any other standard Oracle Data Mining model. What is R? R is a system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files. The design of R has been heavily influenced by two existing languages: Becker, Chambers & Wilks' S and Sussman's Scheme. Whereas the resulting language is very similar in appearance to S, the underlying implementation and semantics are derived from Scheme. R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the University of Auckland in Auckland, New Zealand. Since mid-1997 there has been a core group (the "R Core Team") who can modify the R source code archive. Besides this core group many R users have contributed application code as represented in the near 1,500 publicly-available packages in the CRAN archive (which has shown exponential growth since 2001; R News Volume 8/2, October 2008). Today the R community is a vibrant and growing group of dozens of thousands of users worldwide. It is free software distributed under a GNU-style copyleft, and an official part of the GNU project ("GNU S"). Resources: R website / CRAN R-ODM

    Read the article

  • How the SPARC T4 Processor Optimizes Throughput Capacity: A Case Study

    - by Ruud
    "This white paper demonstrates the architected latency hiding features of Oracle’s UltraSPARC T2+ and SPARC T4 processors." That is the first sentence from this technical white paper, but what exactly does it mean? Let's consider a very simple example, the computation of a = b + c. This boils down to the following (pseudo-assembler) instructions that need to be executed: load @b, r1; load @c, r2; add r1, r2, r3; store r3, @a. The first two instructions load variables b and c from an address in memory (here symbolized by @b and @c respectively). These values go into registers r1 and r2. The third instruction adds the values in r1 and r2. The result goes into register r3. The fourth instruction stores the contents of r3 into the memory address symbolized by @a. If we're lucky, both b and c are in a nearby cache and the load instructions only take a few processor cycles to execute. That is the good case, but what if b or c, or both, have to come from very far away? Perhaps both of them are in main memory, and then it easily takes hundreds of cycles for the values to arrive in the registers. Meanwhile the processor is doing nothing and simply waits for the data to arrive. Actually, it does do something: it burns cycles while waiting. That is a waste of time and energy. Why not use these cycles to execute instructions from another application, or from another thread in the case of a parallel program? That is exactly what latency hiding on the SPARC T-Series processors does. It is a hardware feature totally transparent to the user and application. As soon as there is a delay in the execution, the hardware uses these otherwise idle cycles to execute instructions from another process. As a result, the throughput capacity of the system improves because idle cycles are no longer wasted and therefore more jobs can be run per unit of time. This feature has been in the SPARC T-series from the beginning, so why this paper? The difference from previous publications on this topic is in the amount of detail given. How this all works under the hood is fully explained using two example programs. Starting from the assembly language instructions, it is demonstrated in what way these programs execute. To really see what is happening we go down to the processor pipeline level, where the gaps in the execution are, and show in what way these idle cycles are filled by other copies of the same program running simultaneously. Both the SPARC T4 and the older UltraSPARC T2+ processor are covered. You may wonder why the UltraSPARC T2+ is included. The focus of this work is on the SPARC T4 processor, but to explain the basic concept of latency hiding at this very low level, we start with the UltraSPARC T2+ processor because it is architecturally a much simpler design. From the single-issue, in-order pipelines of this processor we then shift gears and cover how this all works on the much more advanced dual-issue, out-of-order architecture of the T4. The analysis and performance experiments have been conducted on both processors. The results depend on the processor, but in all cases the theoretical estimates are confirmed by the experiments. If you're interested in reading a lot more about this and finding out how things really work under the hood, you can download a copy of the paper here. A paper like this could not have been produced without the help of several other people. I want to thank the co-author of this paper, Jared Smolens, for his very valuable contributions and our highly inspiring discussions.
I'm also indebted to Thomas Nau (Ulm University, Germany), Shane Sigler and Mark Woodyard (both at Oracle) for their feedback on earlier versions of this paper. Karen Perkins (Perkins Technical Writing and Editing) and Rick Ramsey at Oracle were very helpful in providing editorial and publishing assistance.

    Read the article

  • Seeking advice on tools and technology for my new game [closed]

    - by k.k. slider
    I'm a C# developer who has been programming a game in my spare time using XNA and Visual Studio. The game's logic is mostly done and I've completed a prototype that has most of the functionality of (what I envision to be) the final game. However, having heard about the uncertain future and (possibly) limited audience for XNA games, I'm looking to switch platforms... but I don't know what technology would best suit my needs. Below are some specifics about my game and what exactly I'm looking for, if you're interested: The game is a 2D turn-based tactical RPG (strategy game) for two players. It is a basic sprite and tile based game with animations and sound. 3D capabilities are not necessary. I'd like to allow players to compete with others online, and have a basic ranking/matchmaking system. I will probably need something that can interact with a server and a database (the game is turn-based and has no RNG, so cheating would be easy to detect even if most computation is done client-side and minimal data is sent to the server). Ideally, I would be able to release an early version of the game and have people give feedback as I develop additional features (similar to Minecraft). I'd prefer to have a way to release periodic updates to the game instead of releasing an absolute final product. To reach the widest possible audience, I'd prefer technology that allows me to release on PC, Android, iOS, and (maybe) Mac. This is a game with simple mouse inputs which can fit on a mobile touch screen. The game should be monetizable. If I find success with this game, then I may consider becoming a full-time indie game developer. I have several other game ideas and have learned quite a bit from my first attempt at game development. My first thought was an F2P/microtransaction model, but I'm open to other suggestions. Language isn't a primary concern of mine, since I have a decent amount of experience using several languages to program large projects. I'm willing to spend money (e.g. on a developer's license), but the more expensive it gets, the more hesitant I am to use it. I've looked into the following solutions... there are a LOT of tools out there... if anyone has experience with any of these and would like to recommend/reject any of them, it would be helpful. C#/.NET (XNA/MonoGame/SDL/SlimDX/Xamarin/ExEn/ANX?) HTML5/JS (AppMobi/PhoneGap/Marmalade/FlashCanvas/Cordova/libRocket?) Python (Pyglet/Pygame/Kivy?) Java (JavaFX/libGDX?) Unity/Construct 2/Cocos2D/NME/Corona/other game creation software? I'd like something that can do 2D and isn't limited by being too high-level. Other languages (Lua/LOVE? Moai?) Thanks for answering this rather long and tedious question...

    Read the article

  • Looking for a real-world example illustrating that composition can be superior to inheritance

    - by Job
    I watched a bunch of lectures on Clojure and functional programming by Rich Hickey as well as some of the SICP lectures, and I am sold on many concepts of functional programming. I incorporated some of them into my C# code at a previous job, and luckily it was easy to write C# code in a more functional style. At my new job we use Python and multiple inheritance is all the rage. My co-workers are very smart but they have to produce code fast given the nature of the company. I am learning both the tools and the codebase, but the architecture itself slows me down as well. I have not written the existing class hierarchy (nor would I be able to remember everything about it), and so, when I started adding a fairly small feature, I realized that I had to read a lot of code in the process. At the surface the code is neatly organized and split into small functions/methods and is not copy-paste-repetitive, but the flip side of not being repetitive is that there is some magic functionality hidden somewhere in the hierarchy chain that magically glues things together and does work on my behalf, and it is very hard to find and follow. I had to fire up a profiler and run it through several examples and plot the execution graph, as well as step through a debugger a few times, search the code for some substring, and just read pages at a time. I am pretty sure that once I am done, my resulting code will be short and neatly organized, and yet not very readable. What I write feels declarative, as if I were writing an XML file that drives some other magic engine, except that there is no clear documentation on what the XML should look like and what the engine does except for the existing examples that I can read as well as the source code for the 'engine'. There has got to be a better way. IMO using composition over inheritance can help quite a bit. That way the computation will be linear rather than jumping all over the hierarchy tree. Whenever the functionality does not quite fit into an inheritance model, it will need to be mangled to fit in, or the entire inheritance hierarchy will need to be refactored/rebalanced, sort of like an unbalanced binary tree needs reshuffling from time to time in order to improve the average seek time. As I mentioned before, my co-workers are very smart; they just have been doing things a certain way and probably have an ability to hold a lot of unrelated crap in their heads at once. I want to convince them to give composition, and a functional as opposed to OOP approach, a try. To do that, I need to find some very good material. I do not think that a SICP lecture or one by Rich Hickey will do - I am afraid it will be flagged down as too academic. Then, simple examples of Dog and Frog and AddressBook classes do not really convince one way or the other - they show how inheritance can be converted to composition but not why it is truly and objectively better. What I am looking for is some real-world example of code that was written with a lot of inheritance, then hit a wall and was re-written in a different style that uses composition. Perhaps there is a blog post or a chapter somewhere. I am looking for something that can summarize and illustrate the sort of pain that I am going through. I have already been throwing the phrase "composition over inheritance" around, but it was not received as enthusiastically as I had hoped. I do not want to be perceived as the new guy who likes to complain and bash existing code while looking for a perfect approach and not contributing fast enough.
At the same time, my gut is convinced that inheritance is often the instrument of evil and I want to show a better way in the near future. Have you stumbled upon any great resources that can help me?
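    For what it's worth, the kind of before/after being asked for can be boiled down to a minimal Python sketch like the one below (class names are invented purely for illustration): behaviour moves out of a deep base-class chain and into small collaborators that are passed in explicitly, so the call flow stays linear and visible.

```python
# Inheritance-heavy style: behaviour is buried somewhere up the chain.
class BaseExporter:
    def export(self, rows):
        # transform() and serialize() are defined... somewhere in a subclass or mixin.
        return self.serialize(self.transform(rows))


# Composition: the collaborators are explicit, so the flow reads top to bottom.
class CsvSerializer:
    def serialize(self, rows):
        return "\n".join(",".join(map(str, r)) for r in rows)


class UpperCaser:
    def transform(self, rows):
        return [[str(c).upper() for c in r] for r in rows]


class Exporter:
    def __init__(self, transformer, serializer):
        self.transformer = transformer
        self.serializer = serializer

    def export(self, rows):
        return self.serializer.serialize(self.transformer.transform(rows))


exporter = Exporter(UpperCaser(), CsvSerializer())
print(exporter.export([["a", 1], ["b", 2]]))  # prints: A,1  then  B,2
```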

    Read the article

  • Dynamically load and call delegates based on source data

    - by makerofthings7
    Assume I have a stream of records that need to have some computation. Records will have a combination of these functions run Sum, Aggregate, Sum over the last 90 seconds, or ignore. A data record looks like this: Date;Data;ID Question Assuming that ID is an int of some kind, and that int corresponds to a matrix of some delegates to run, how should I use C# to dynamically build that launch map? I'm sure this idea exists... it is used in Windows Forms which has many delegates/events, most of which will never actually be invoked in a real application. The sample below includes a few delegates I want to run (sum, count, and print) but I don't know how to make the quantity of delegates fire based on the source data. (say print the evens, and sum the odds in this sample) using System; using System.Threading; using System.Collections.Generic; internal static class TestThreadpool { delegate int TestDelegate(int parameter); private static void Main() { try { // this approach works is void is returned. //ThreadPool.QueueUserWorkItem(new WaitCallback(PrintOut), "Hello"); int c = 0; int w = 0; ThreadPool.GetMaxThreads(out w, out c); bool rrr =ThreadPool.SetMinThreads(w, c); Console.WriteLine(rrr); // perhaps the above needs time to set up6 Thread.Sleep(1000); DateTime ttt = DateTime.UtcNow; TestDelegate d = new TestDelegate(PrintOut); List<IAsyncResult> arDict = new List<IAsyncResult>(); int count = 1000000; for (int i = 0; i < count; i++) { IAsyncResult ar = d.BeginInvoke(i, new AsyncCallback(Callback), d); arDict.Add(ar); } for (int i = 0; i < count; i++) { int result = d.EndInvoke(arDict[i]); } // Give the callback time to execute - otherwise the app // may terminate before it is called //Thread.Sleep(1000); var res = DateTime.UtcNow - ttt; Console.WriteLine("Main program done----- Total time --> " + res.TotalMilliseconds); } catch (Exception e) { Console.WriteLine(e); } Console.ReadKey(true); } static int PrintOut(int parameter) { // Console.WriteLine(Thread.CurrentThread.ManagedThreadId + " Delegate PRINTOUT waited and printed this:"+parameter); var tmp = parameter * parameter; return tmp; } static int Sum(int parameter) { Thread.Sleep(5000); // Pretend to do some math... maybe save a summary to disk on a separate thread return parameter; } static int Count(int parameter) { Thread.Sleep(5000); // Pretend to do some math... maybe save a summary to disk on a separate thread return parameter; } static void Callback(IAsyncResult ar) { TestDelegate d = (TestDelegate)ar.AsyncState; //Console.WriteLine("Callback is delayed and returned") ;//d.EndInvoke(ar)); } }
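    To picture the 'launch map' idea, here is a sketch in Python rather than C# (the ID-to-function mapping below is completely made up): each record's ID indexes into a table of callables, so which computations fire is driven entirely by the data.

```python
from typing import Callable, Dict, List


def total(values: List[float]) -> float:
    return sum(values)


def count(values: List[float]) -> float:
    return float(len(values))


def ignore(values: List[float]) -> float:
    return 0.0


# Hypothetical mapping: each record ID selects the set of functions to run on it.
LAUNCH_MAP: Dict[int, List[Callable[[List[float]], float]]] = {
    1: [total],
    2: [count],
    3: [total, count],
    4: [ignore],
}


def process(record_id: int, values: List[float]) -> List[float]:
    # Look up the functions registered for this ID and fire them all.
    return [fn(values) for fn in LAUNCH_MAP.get(record_id, [ignore])]


print(process(3, [1.0, 2.0, 3.0]))  # [6.0, 3.0]
```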

    Read the article

  • Why are floating point values so prolific?

    - by Kibbee
    So, the title says it all. Why are floating point values so prolific in computer programming? Due to problems like rounding errors, and not being able to accurately represent even numbers such as 0.1, I really can't see how they got as far as they did. I understand that computation is faster with floating point numbers; however, I can think of only a few cases where they are actually the right data type to use. If you sat back and thought about every time you used a floating point value, how many times did you say, well, some error would be OK, as long as the result was a few microseconds faster? It really makes me think, because Jeff was talking about NP completeness, and how heuristics give an answer that is kind of right. And well, computers shouldn't do that. They should give you the answer that is correct. Yet we see floating point values used in many applications where they are completely not valid. What really bugs me isn't that floating point exists, but that in many languages there isn't even a viable alternative, non-floating point, decimal value. A lot of programmers doing financial applications have to fall back to storing the number of cents in an integer field, which brings with it all kinds of other problems. Why do floats continue to be so prolific, even though they can't represent the real answer, and we expect computers to be accurate? [EDIT] Just to clarify, I was talking about base 2 floating points, and not base 10 floating points. .NET offers the Decimal data type, which is a base 10 floating point value that offers a much better representation of the numbers we deal with on a daily basis in most computer programs. I find it hard to believe that even modern languages like Java don't support base 10 floating point values, unless you want to move into the realm of things like BigDecimal, which isn't really the right answer either in a lot of situations.
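    To make the 0.1 point concrete, here is a small base 2 versus base 10 comparison in Python (its decimal module plays the role that .NET's Decimal plays above):

```python
from decimal import Decimal

# Base-2 floats cannot represent 0.1 exactly, so repeated addition drifts.
print(sum(0.1 for _ in range(10)))             # 0.9999999999999999
print(0.1 + 0.2 == 0.3)                        # False

# A base-10 type keeps the arithmetic exact for the values we actually write down.
print(sum(Decimal("0.1") for _ in range(10)))  # 1.0
print(Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))  # True
```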

    Read the article

  • Genetic Algorithms applied to Curve Fitting

    - by devoured elysium
    Let's imagine I have an unknown function that I want to approximate via Genetic Algorithms. For this case, I'll assume it is y = 2x. I'd have a DNA composed of 5 elements, one y for each x, from x = 0 to x = 4, and, after a lot of trials and computation, I'd arrive near something of the form: best_adn = [ 0, 2, 4, 6, 8 ] Keep in mind I don't know beforehand if it is a linear function, a polynomial, or something way more ugly. Also, my goal is not to infer from the best_adn what the type of function is; I just want those points, so I can use them later. This was just an example problem. In my case, instead of having only 5 points in the DNA, I have something like 50 or 100. What is the best approach with GA to find the best set of points? Generate a population of 100 and discard the worst 20%? Recombine the remaining 80%? How? By cutting them at a random point and then putting together the first part of the DNA of the father with the second part of the DNA of the mother? Mutation - how should I define mutation in this kind of problem? Is it worth using elitism? Any other simple ideas worth using? Thanks
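    Purely as an illustration, a minimal GA loop for exactly this setup might look like the Python sketch below (population size, mutation rate, crossover scheme and the y = 2x samples are arbitrary choices, not recommendations):

```python
import random

TARGET = [2 * x for x in range(5)]          # the unknown function sampled at x = 0..4


def fitness(dna):
    # Lower is better: sum of squared errors against the sampled points.
    return sum((a - b) ** 2 for a, b in zip(dna, TARGET))


def crossover(mom, dad):
    cut = random.randrange(1, len(mom))     # single-point crossover
    return mom[:cut] + dad[cut:]


def mutate(dna, rate=0.1, step=1.0):
    # Nudge each gene a little with probability `rate`.
    return [g + random.uniform(-step, step) if random.random() < rate else g for g in dna]


population = [[random.uniform(-10, 10) for _ in range(5)] for _ in range(100)]
for _ in range(200):
    population.sort(key=fitness)
    survivors = population[:80]             # discard the worst 20%
    elite = survivors[:2]                   # elitism: keep the best untouched
    children = [mutate(crossover(*random.sample(survivors, 2))) for _ in range(98)]
    population = elite + children

print([round(g, 2) for g in min(population, key=fitness)])  # should be near [0, 2, 4, 6, 8]
```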

    Read the article

  • Fast similarity detection

    - by reinierpost
    I have a large collection of objects and I need to figure out the similarities between them. To be exact: given two objects I can compute their dissimilarity as a number, a metric - higher values mean less similarity and 0 means the objects have identical contents. The cost of computing this number is proportional to the size of the smaller object (each object has a given size). I need the ability to quickly find, given an object, the set of objects similar to it. To be exact: I need to produce a data structure that maps any object o to the set of objects no more dissimilar to o than d, for some dissimilarity value d, such that listing the objects in the set takes no more time than if they were in an array or linked list (and perhaps they actually are). Typically, the set will be very much smaller than the total number of objects, so it is really worthwhile to perform this computation. It's good enough if the data structure assumes a fixed d, but if it works for an arbitrary d, even better. Have you seen this problem before, or something similar to it? What is a good solution? To be exact: a straightforward solution involves computing the dissimilarities between all pairs of objects, but this is slow - O(n²) where n is the number of objects. Is there a general solution with lower complexity?
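    Since the dissimilarity is stated to be a metric, one standard trick worth sketching is pivot-based pruning via the triangle inequality: precompute every object's distance to a handful of reference objects, then at query time skip any candidate whose pivot distances already rule it out, so the expensive comparison only runs on the survivors. A rough Python sketch (the dissimilarity function and the object set are placeholders):

```python
import random


def dissimilarity(a, b):
    # Placeholder metric: absolute difference between numbers.
    return abs(a - b)


class PivotIndex:
    def __init__(self, objects, num_pivots=3):
        self.objects = list(objects)
        self.pivots = random.sample(self.objects, num_pivots)
        # Precompute one distance per (object, pivot) pair: O(n * num_pivots) work up front.
        self.table = [[dissimilarity(o, p) for p in self.pivots] for o in self.objects]

    def within(self, query, d):
        q_to_p = [dissimilarity(query, p) for p in self.pivots]
        result = []
        for obj, dists in zip(self.objects, self.table):
            # Triangle inequality: |d(obj,p) - d(query,p)| <= d(obj,query),
            # so if that lower bound already exceeds d, skip the expensive call.
            if any(abs(dp - qp) > d for dp, qp in zip(dists, q_to_p)):
                continue
            if dissimilarity(query, obj) <= d:
                result.append(obj)
        return result


index = PivotIndex(range(1000))
print(index.within(500, 3))  # objects no more dissimilar to 500 than 3
```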

    Read the article

  • Silverlight 4, Out of browser, Printing, Automatic updates

    - by minal
    I have a very critical business application presently running using WinForms. The application is a very basic UI shell: it accepts input data, calls a web service on my server to do the computation, displays the results in the WinForms app, and finally sends a print stream to the printer. Presently the application is deployed using ClickOnce. Moving forward, I am trying to decide whether I should move the application to Silverlight. A couple of reasons I am thinking Silverlight: It gives clients the feel of a cloud-based solution and can be accessed from any PC. While the ClickOnce app is able to do this as well, they have to install an app, and when updates are available they have to click "Yes" to update. The application presently has a drop-down list of customers; this list has expanded to over 3000 records and scrolling through it is very painful. With Silverlight I am thinking of the auto-complete ability. Out of browser - this will be handy for those users who use the app daily. I haven't used Silverlight previously, hence I'm looking for some expert advice on a few things: Printing - does Silverlight allow sending raw print data to the printer? The application prints to a Zebra thermal label printer, and I have to send raw bytes to the printer with the commands. Can this be done with SL, or will it always prompt the "Print" dialog? Out of browser - when SL apps are installed as out-of-browser, how do updates come through? Does the app update automatically, or is the user prompted to opt for the update?

    Read the article

  • How to set up matlabpool for multiple processors?

    - by JohnIdol
    I just set up an Extra Large Heavy Computation EC2 instance to throw at my Genetic Algorithms problem, hoping to speed things up. This instance has 8 Intel Xeon processors (around 2.4 GHz each) and 7 GB of RAM. On my machine I have an Intel Core Duo, and MATLAB is able to work with my two cores just fine by running: matlabpool open 2 On the EC2 instance, though, MATLAB is only capable of detecting 1 out of 8 processors, and if I try running: matlabpool open 8 I get an error saying that the ClusterSize is 1 since there's only 1 core on my CPU. True, there is only 1 core on each CPU, but I have 8 CPUs on the given EC2 instance! So the difference between my machine and the EC2 instance is that I have my 2 cores on a single processor locally, while the EC2 instance has 8 distinct processors. My question is, how do I get MATLAB to work with those 8 processors? I found this paper, but it seems related to setting up MATLAB with multiple EC2 instances (not to multiple processors on the same instance, EC2 or not), which is not my problem. Any help appreciated!

    Read the article

  • Posting XML from classic ASP to ASP.NET

    - by Chris Dunaway
    I apologize if this has been asked before. I searched and didn't find anything that matched my situation. Also bear in mind I am fairly new to asp/asp.net development. My current project is a relatively simple e-commerce site. The customer will connect to the site, select products, input shipping and billing information, payment information (credit card) and submit the order. The project is being split into two parts: The store front which includes displaying the items and taking the customer's shipping and billing information and the payment site which will collect the customers credit card, compute tax, and save the order into the company's system. The reason that the site was split up, was that our side (payment side) already has facilities for credit card handling and tax computation. There may also be some regulatory issues that the store front side does not want to deal with (which we already do). I'm working on the payment portion of the app and I am using asp.net. The store front side is being written in classic asp (not my decision). Each part will be hosted on different servers. The problem I am having is transferring the contents of the "shopping cart" to our app so that we can collect the cc info and submit the order. We had thought that the classic asp could somehow post an xml fragment which contains the billing/shipping info and the items selected. Our side would display a summary of the order, securely collect the credit card info, and submit the order to our system. But I have been unable to post or send the xml from a classic asp on one server, to our asp.net application on another. It all works just fine when I test on the same server. How can I post (or otherwise transfer) the shopping cart data from classic asp to asp.net across server boundaries and transfer control to the asp.net application? As I said, I am new to web development, so this is proving quite a challenge for me. Thanks

    Read the article

  • Static Data Structures on Embedded Devices (Android in particular)

    - by Mark
    I've started working on some Android applications and have a question regarding how people normally deal with situations where you have a static data set and an application where that data is needed in memory as one of the standard Java collections or as an array. In my current case I have a spreadsheet with some pre-calculated data. It consists of ~100 rows and 3 columns: one column is a string, one is a float, and one is an integer. I need access to this data as an array in Java. It seems like I could: 1) Encode it in XML - this would be CPU-intensive to decode, in my experience. 2) Build it into a SQLite database - seems like a lot of overhead for static data I only need array-style access to in RAM. 3) Build it into a binary blob and read it in (never done this in Java, I miss void *). 4) Build a Python script to take the CSV version of my data and spit out a Java function that adds the values to my desired structure with hard-coded values. 5) Store a string array via Android's resource mechanism and compute the other 2 columns on application load. In my case the computation would require a lot of calls to Math.log, Math.pow and Math.floor, which I'd rather not have to do for load time and battery usage reasons. I mostly work on low-power embedded applications in C, and as such #4 is what I'm used to doing in these situations. It just seems like it should be far easier to gain access to static data structures in Java/Android. Perhaps I'm just being too battery-usage conscious, and in my single case I imagine the answer is that it doesn't matter much, but if every application took that stance it could begin to matter. What approaches do people usually take in this situation? Anything I missed?
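    Option 4 above is the approach most familiar from C-style workflows; purely as an illustration, a throwaway Python generator for it could look like the sketch below (the input rows, the precomputed third column and the Java class layout are all invented):

```python
import math

# Hypothetical input: in practice these rows would come from csv.reader over the
# spreadsheet export; they are hard-coded here to keep the sketch self-contained.
ROWS = [("alpha", 1.5), ("beta", 3.0), ("gamma", 12.0)]


def emit_java(rows, class_name="StaticTable"):
    names = ", ".join(f'"{name}"' for name, _ in rows)
    floats = ", ".join(f"{value}f" for _, value in rows)
    # The third column is precomputed at generation time instead of at app load.
    ints = ", ".join(str(int(math.floor(math.log(value, 2)))) for _, value in rows)
    return (
        f"public final class {class_name} {{\n"
        f"    public static final String[] NAMES = {{ {names} }};\n"
        f"    public static final float[] VALUES = {{ {floats} }};\n"
        f"    public static final int[] LOG2_FLOOR = {{ {ints} }};\n"
        f"}}\n"
    )


if __name__ == "__main__":
    print(emit_java(ROWS))
```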

    Read the article

  • How are you taking advantage of Multicore?

    - by tgamblin
    As someone in the world of HPC who came from the world of enterprise web development, I'm always curious to see how developers back in the "real world" are taking advantage of parallel computing. This is much more relevant now that all chips are going multicore, and it'll be even more relevant when there are thousands of cores on a chip instead of just a few. My questions are: How does this affect your software roadmap? I'm particularly interested in real stories about how multicore is affecting different software domains, so specify what kind of development you do in your answer (e.g. server side, client-side apps, scientific computing, etc). What are you doing with your existing code to take advantage of multicore machines, and what challenges have you faced? Are you using OpenMP, Erlang, Haskell, CUDA, TBB, UPC or something else? What do you plan to do as concurrency levels continue to increase, and how will you deal with hundreds or thousands of cores? If your domain doesn't easily benefit from parallel computation, then explaining why is interesting, too. Finally, I've framed this as a multicore question, but feel free to talk about other types of parallel computing. If you're porting part of your app to use MapReduce, or if MPI on large clusters is the paradigm for you, then definitely mention that, too. Update: If you do answer #5, mention whether you think things will change if there get to be more cores (100, 1000, etc) than you can feed with available memory bandwidth (seeing as how bandwidth is getting smaller and smaller per core). Can you still use the remaining cores for your application?

    Read the article

  • Dilemma of Sending a Byte Array from JavaScript to COM.

    - by Voulnet
    Hello there, I'm having a bit of a problem because neither JavaScript nor ActiveX (written in C++) is behaving like a good little child. All I'm asking them to do is for JavaScript to send a byte array and for the ActiveX to receive the byte array correctly in order to do more computation. This is how I declared my byte array in JS, and I verified it inside JS: var arr = new Array(0x00, 0xA4, 0x04, 0x00, 0x10,0xA0, 0x00, 00, 00, 0x18, 0x30, 0x03, 0x01, 00, 00, 00, 00, 00, 00, 00, 00); JavaScript sends this array as an argument to an ActiveX method. Here is the tricky part: I want the ActiveX method to receive the byte array as a SAFEARRAY or VARIANT, but I can't get it to work for the life of me. I tried debugging and inspecting the contents received inside the ActiveX as both SAFEARRAY and VARIANT, but to no avail. Here is the IDL segment: [id(7), helpstring("NEW Sends APDU to Card and Reads the Response")] HRESULT NewSendAPDU( [in] VARIANT pAPDU, [out, retval]VARIANT* pAPDUResponse); Any help is greatly appreciated. Thanks in advance!

    Read the article

  • Autodetect Presence of CSV Headers in a File

    - by banzaimonkey
    Short question: How do I automatically detect whether a CSV file has headers in the first row? Details: I've written a small CSV parsing engine that places the data into an object that I can access as (approximately) an in-memory database. The original code was written to parse third-party CSV with a predictable format, but I'd like to be able to use this code more generally. I'm trying to figure out a reliable way to automatically detect the presence of CSV headers, so the script can decide whether to use the first row of the CSV file as keys / column names or start parsing data immediately. Since all I need is a boolean test, I could easily specify an argument after inspecting the CSV file myself, but I'd rather not have to (go go automation). I imagine I'd have to parse the first 3 to ? rows of the CSV file and look for a pattern of some sort to compare against the headers. I'm having nightmares about three particularly bad cases: the headers include numeric data for some reason; the first few rows (or large portions of the CSV) are null; the headers and data look too similar to tell them apart. If I can get a "best guess" and have the parser fail with an error or spit out a warning if it can't decide, that's OK. If this is something that's going to be tremendously expensive in terms of time or computation (and take more time than it's supposed to save me) I'll happily scrap the idea and go back to working on "important things". I'm working with PHP, but this strikes me as more of an algorithmic / computational question than something that's implementation-specific. If there's a simple algorithm I can use, great. If you can point me to some relevant theory / discussion, that'd be great, too. If there's a giant library that does natural language processing or 300 different kinds of parsing, I'm not interested.
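    One cheap heuristic along exactly those lines: read a handful of rows and score each column on whether a non-numeric first cell sits on top of mostly numeric data. The sketch below is Python rather than PHP, purely to show the idea (the thresholds are guesses; Python's own csv.Sniffer().has_header() uses a very similar trick):

```python
import csv
from io import StringIO


def looks_numeric(cell: str) -> bool:
    try:
        float(cell)
        return True
    except ValueError:
        return False


def has_header(sample_text: str, sample_rows: int = 5) -> bool:
    rows = list(csv.reader(StringIO(sample_text)))[:sample_rows + 1]
    if len(rows) < 2:
        return False          # not enough data to decide; let the caller ask the user
    first, rest = rows[0], rows[1:]
    score = 0
    for col in range(len(first)):
        header_is_numeric = looks_numeric(first[col])
        numeric_below = sum(looks_numeric(r[col]) for r in rest if col < len(r))
        if not header_is_numeric and numeric_below >= len(rest) / 2:
            score += 1        # text cell sitting on a mostly numeric column: header-ish
        elif header_is_numeric:
            score -= 1        # numeric first cell: data-ish
    return score > 0


print(has_header("name,age\nalice,30\nbob,25\n"))  # True
print(has_header("1,10\n2,20\n3,30\n"))            # False
```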

    Read the article

  • Reordering arguments using recursion (pros, cons, alternatives)

    - by polygenelubricants
    I find that I often make a recursive call just to reorder arguments. For example, here's my solution for endOther from codingbat.com: Given two strings, return true if either of the strings appears at the very end of the other string, ignoring upper/lower case differences (in other words, the computation should not be "case sensitive"). Note: str.toLowerCase() returns the lowercase version of a string. public boolean endOther(String a, String b) { return a.length() < b.length() ? endOther(b, a) : a.toLowerCase().endsWith(b.toLowerCase()); } I'm very comfortable with recursions, but I can certainly understand why some perhaps would object to it. There are two obvious alternatives to this recursion technique: Swap a and b traditionally public boolean endOther(String a, String b) { if (a.length() < b.length()) { String t = a; a = b; b = t; } return a.toLowerCase().endsWith(b.toLowerCase()); } Not convenient in a language like Java that doesn't pass by reference Lots of code just to do a simple operation An extra if statement breaks the "flow" Repeat code public boolean endOther(String a, String b) { return (a.length() < b.length()) ? b.toLowerCase().endsWith(a.toLowerCase()) : a.toLowerCase().endsWith(b.toLowerCase()); } Explicit symmetry may be a nice thing (or not?) Bad idea unless the repeated code is very simple ...though in this case you can get rid of the ternary and just || the two expressions So my questions are: Is there a name for these 3 techniques? (Are there more?) Is there a name for what they achieve? (e.g. "parameter normalization", perhaps?) Are there official recommendations on which technique to use (when)? What are other pros/cons that I may have missed?

    Read the article

  • My OpenCL kernel is slower on faster hardware... But why?

    - by matdumsa
    Hi folks, as I was finishing the code for my project for a multicore programming class, I came upon something really weird that I wanted to discuss with you. We were asked to create any program that would show significant improvement when programmed for a multi-core platform. I decided to try and code something on the GPU to try out OpenCL. I chose the matrix convolution problem since I'm quite familiar with it (I've parallelized it before with open_mpi with great speedup for large images). So here it is: I select a large GIF file (2.5 MB) [2816x2112], I run the sequential version (original code), and I get an average of 15.3 seconds. I then run the new OpenCL version I just wrote on my MBP's integrated GeForce 9400M and I get timings of 1.26 s on average. So far so good, it's a speedup of 12x!! But now I go into my energy saver panel to turn on the "Graphic Performance Mode". That mode turns off the GeForce 9400M and turns on the GeForce 9600M GT my system has. Apple says this card is twice as fast as the integrated one. Guess what, my timings using the kick-ass graphics card are 3.2 seconds on average… My 9600M GT seems to be more than two times slower than the 9400M. For those of you that are OpenCL inclined, I copy all data to remote buffers before starting, so the actual computation doesn't require a round trip to main RAM. Also, I let OpenCL determine the optimal local work size, as I've read they've done a pretty good implementation at figuring that parameter out. Anyone have a clue? edit: full source code with makefiles here http://www.mathieusavard.info/convolution.zip cd gimage make cd ../clconvolute make put a large input.gif in clconvolute and run it to see the results

    Read the article

  • Fast JSON serialization (and comparison with Pickle) for cluster computing in Python?

    - by user248237
    I have a set of data points, each described by a dictionary. The processing of each data point is independent and I submit each one as a separate job to a cluster. Each data point has a unique name, and my cluster submission wrapper simply calls a script that takes a data point's name and a file describing all the data points. That script then accesses the data point from the file and performs the computation. Since each job has to load the set of all points only to retrieve the point to be run, I wanted to optimize this step by serializing the file describing the set of points into an easily retrievable format. I tried using JSONpickle, using the following method, to serialize a dictionary describing all the data points to file: def json_serialize(obj, filename, use_jsonpickle=True): f = open(filename, 'w') if use_jsonpickle: import jsonpickle json_obj = jsonpickle.encode(obj) f.write(json_obj) else: simplejson.dump(obj, f, indent=1) f.close() The dictionary contains very simple objects (lists, strings, floats, etc.) and has a total of 54,000 keys. The json file is ~20 Megabytes in size. It takes ~20 seconds to load this file into memory, which seems very slow to me. I switched to using pickle with the same exact object, and found that it generates a file that's about 7.8 megabytes in size, and can be loaded in ~1-2 seconds. This is a significant improvement, but it still seems like loading of a small object (less than 100,000 entries) should be faster. Aside from that, pickle is not human readable, which was the big advantage of JSON for me. Is there a way to use JSON to get similar or better speed ups? If not, do you have other ideas on structuring this? (Is the right solution to simply "slice" the file describing each event into a separate file and pass that on to the script that runs a data point in a cluster job? It seems like that could lead to a proliferation of files). thanks.
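    For what it's worth, the gap is easy to measure directly, and two cheap things to try are plain (simple)json on the already-simple dict instead of jsonpickle, and pickle with its highest protocol instead of the default. A quick benchmark sketch along those lines (the 54,000-entry dictionary is synthetic):

```python
import json
import pickle
import time

# Synthetic stand-in for the real data: 54,000 points of simple values.
data = {f"point_{i}": {"name": f"point_{i}", "score": i * 0.5, "tags": ["a", "b"]}
        for i in range(54_000)}


def timed(label, fn):
    start = time.perf_counter()
    result = fn()
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result


json_text = timed("json dump", lambda: json.dumps(data))
pickle_bytes = timed("pickle dump", lambda: pickle.dumps(data, pickle.HIGHEST_PROTOCOL))

timed("json load", lambda: json.loads(json_text))
timed("pickle load", lambda: pickle.loads(pickle_bytes))
```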

    Read the article

  • Does the query plan optimizer work well with joined/filtered table-valued functions?

    - by smoothdeveloper
    In SQLSERVER 2005, I'm using table-valued function as a convenient way to perform arbitrary aggregation on subset data from large table (passing date range or such parameters). I'm using theses inside larger queries as joined computations and I'm wondering if the query plan optimizer work well with them in every condition or if I'm better to unnest such computation in my larger queries. Does query plan optimizer unnest table-valued functions if it make sense? If it doesn't, what do you recommend to avoid code duplication that would occur by manually unnesting them? If it does, how do you identify that from the execution plan? code sample: create table dbo.customers ( [key] uniqueidentifier , constraint pk_dbo_customers primary key ([key]) ) go /* assume large amount of data */ create table dbo.point_of_sales ( [key] uniqueidentifier , customer_key uniqueidentifier , constraint pk_dbo_point_of_sales primary key ([key]) ) go create table dbo.product_ranges ( [key] uniqueidentifier , constraint pk_dbo_product_ranges primary key ([key]) ) go create table dbo.products ( [key] uniqueidentifier , product_range_key uniqueidentifier , release_date datetime , constraint pk_dbo_products primary key ([key]) , constraint fk_dbo_products_product_range_key foreign key (product_range_key) references dbo.product_ranges ([key]) ) go . /* assume large amount of data */ create table dbo.sales_history ( [key] uniqueidentifier , product_key uniqueidentifier , point_of_sale_key uniqueidentifier , accounting_date datetime , amount money , quantity int , constraint pk_dbo_sales_history primary key ([key]) , constraint fk_dbo_sales_history_product_key foreign key (product_key) references dbo.products ([key]) , constraint fk_dbo_sales_history_point_of_sale_key foreign key (point_of_sale_key) references dbo.point_of_sales ([key]) ) go create function dbo.f_sales_history_..snip.._date_range ( @accountingdatelowerbound datetime, @accountingdateupperbound datetime ) returns table as return ( select pos.customer_key , sh.product_key , sum(sh.amount) amount , sum(sh.quantity) quantity from dbo.point_of_sales pos inner join dbo.sales_history sh on sh.point_of_sale_key = pos.[key] where sh.accounting_date between @accountingdatelowerbound and @accountingdateupperbound group by pos.customer_key , sh.product_key ) go -- TODO: insert some data -- this is a table containing a selection of product ranges declare @selectedproductranges table([key] uniqueidentifier) -- this is a table containing a selection of customers declare @selectedcustomers table([key] uniqueidentifier) declare @low datetime , @up datetime -- TODO: set top query parameters . select saleshistory.customer_key , saleshistory.product_key , saleshistory.amount , saleshistory.quantity from dbo.products p inner join @selectedproductranges productrangeselection on p.product_range_key = productrangeselection.[key] inner join @selectedcustomers customerselection on 1 = 1 inner join dbo.f_sales_history_..snip.._date_range(@low, @up) saleshistory on saleshistory.product_key = p.[key] and saleshistory.customer_key = customerselection.[key] I hope the sample makes sense. Much thanks for your help!

    Read the article

  • How do I find a source code position from an address given by a crash in Windows CE

    - by Shane MacLaughlin
    I have a Windows Mobile 4.0 application, written using EVC++ 4.0 SP4 with MFC, that is exhibiting a random occasional crash in the field, e.g. Exception 0x800000002 at 00112584. It does not happen under various emulators and simulators, and hence is very difficult to trace using a debugger. The crash throws up an address and exception type. Given that I have the PDB, is there any way to track this address back to the source? I can't recompile using VC++ 8 as it doesn't support the Mobile 4 SDK. My guess is that without a stack trace I'm not going to have much joy, as the chances are that the exception may not be in my source. Worth a try all the same. Edit As suggested, I have looked at the address in the context of the .MAP file for the program. This reveals the following: Address Publics by Value Rva+Base Lib:Object 0001:00000000 ?GetUnduValue@@YANMM@Z 00011000 f 7Par.obj ' ' ' 0001:001124b8 ?OnLButtonUp@CGXGridUserDragSelectRangeImp@@UAAHPAVCGXGridCore@@AAVCPoint@@AAI@Z 001234b8 f gxseldrg.obj 0001:001126d8 ?OnSelDragStart@CGXGridUserDragSelectRangeImp@@UAAHPAVCGXGridCore@@KK@Z 001236d8 f gxseldrg.obj Which suggests the error occurred during CGXGridUserDragSelectRangeImp::OnLButtonUp(), which seems a bit odd as I don't think there was a mouse / keyboard / screen button pressed at the time. It could be that the stack got fragged before the crash got reported, and I'm wasting my time. I'll recompile with assembler output to try to isolate it to a given line, but I don't hold out much hope :( Does the fact that the map file reports segmented addresses (e.g. 0001:xxxxxxxx) while the crash reports unsegmented addresses mean I have to carry out some computation to get the map address from the crash address?

    Read the article

  • C++ stack for multiple data types (RPN vector calculator)

    - by Arrieta
    Hello: I have designed a quick and basic vector arithmetic library in C++. I call the program from the command line when I need a rapid cross product, or the angle between vectors. I don't use MATLAB or Octave or anything related, because the startup time is larger than the computation time. Again, this is for very basic operations. I am extending this program, and I will make it work as an RPN calculator, for operations of the type: 1 2 3 4 5 6 x out: -3 6 -3 (give one vector, another vector, and the "cross" operator; spit out the cross product). The stack must accept 3D vectors or scalars, for operations like: 1 2 3 2 * out: 2 4 6 The lexer and parser for this mini-calculator are trivial, but I cannot seem to think of a good way of creating the internal stack. How would you create a stack for holding vectors or doubles? (I rolled my own very simple vector class - less than one hundred lines, and it does everything I need.) How can I create a simple stack which accepts elements of class Vector or of type double? Thank you.
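    Just to pin down the semantics before choosing a C++ representation (tagged union, a Boost-style variant, or a small class hierarchy), here is a rough Python sketch of the mixed scalar/vector RPN evaluation; dynamic typing sidesteps the storage question, but the operator dispatch is the part that carries over (only +, * and cross are shown, and vectors arrive pre-grouped as tuples rather than via a real grammar):

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def elementwise(op, x, y):
    # Scalar-scalar, vector-scalar and vector-vector cases share one dispatch path.
    fn = {"+": lambda p, q: p + q, "*": lambda p, q: p * q}[op]
    if isinstance(x, tuple) and isinstance(y, tuple):
        return tuple(fn(a, b) for a, b in zip(x, y))
    if isinstance(x, tuple):
        return tuple(fn(a, y) for a in x)
    if isinstance(y, tuple):
        return tuple(fn(x, b) for b in y)
    return fn(x, y)


def rpn(tokens):
    stack = []
    for tok in tokens:
        if tok == "x":
            b, a = stack.pop(), stack.pop()
            stack.append(cross(a, b))
        elif tok in ("+", "*"):
            b, a = stack.pop(), stack.pop()
            stack.append(elementwise(tok, a, b))
        elif isinstance(tok, tuple):
            stack.append(tok)              # a pre-grouped 3-vector
        else:
            stack.append(float(tok))       # a scalar literal
    return stack


print(rpn([(1.0, 2.0, 3.0), (4.0, 5.0, 6.0), "x"]))  # [(-3.0, 6.0, -3.0)]
print(rpn([(1.0, 2.0, 3.0), 2, "*"]))                # [(2.0, 4.0, 6.0)]
```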

    Read the article

< Previous Page | 115 116 117 118 119 120 121 122 123 124 125 126  | Next Page >