Search Results

Search found 5070 results on 203 pages for 'algorithm'.


Java split xml file

- by CC

Hi all, I'm working on a piece of code to split files. I want to split flat files (that part works fine) and XML files. The idea is to split based on a target number of output files: I have a file, and I want to split it into x files (x is a parameter). I do the split by dividing the size of the file by the number of files to produce. My solution was to use a BufferedReader in a loop like this:

    while ((n = reader.read(buffer, 0, buffer.length)) != -1) {

The main problem is that I cannot split an XML file at an arbitrary position; I have to split it on blocks delimited by a start tag and an end tag:

    <start tag> bla bla xml stuff </end tag>

So I cannot cut a block in the middle. If I am halfway through a block when the size of my new file exceeds the maximum, I have to keep reading until the end tag and only then start the next file. The problem is that there are all sorts of cases, and finding the end tag is a bit difficult:

- the buffer ends in the middle of the end tag
- the buffer ends exactly at the end of the end tag, with no other character after it
- etc.

At the same time I have to keep looping and reading the next buffer. Sometimes the end tag only appears once the end of one buffer is concatenated with the start of the next. I hope you get the idea. My question is: does anyone have an algorithm that does this more accurately and handles all the special cases? The idea is to split the file as quickly as possible. Thanks a lot.
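One possible way to cover the boundary cases described in this post is to keep a single running buffer and search it for the end tag, so that a tag straddling two reads is found once the second read has been appended. The sketch below is in Python rather than Java, purely to illustrate the logic. The names `split_blocks`, `end_tag`, `max_size`, and `chunk_size` are hypothetical, and it assumes the end tag is a fixed literal string that never appears nested inside a block.

```python
import io

def split_blocks(reader, end_tag, max_size, chunk_size=8192):
    """Yield parts of the input, each cut immediately after an end tag.

    A part is closed at the first end tag that finishes at or beyond
    max_size characters, so no block is ever cut in the middle.
    """
    buf = ""
    # Earliest index at which an end tag may start and still finish
    # at or beyond max_size characters.
    search_from = max(0, max_size - len(end_tag))
    while True:
        chunk = reader.read(chunk_size)
        if not chunk:
            break
        buf += chunk
        # The whole buffer is searched, so an end tag split across two
        # reads is found as soon as its second half has arrived.
        while True:
            idx = buf.find(end_tag, search_from)
            if idx == -1:
                break
            cut = idx + len(end_tag)
            yield buf[:cut]
            buf = buf[cut:]
    if buf:
        # Whatever is left after the last cut becomes the final part.
        yield buf

data = "<r>a</r><r>bb</r><r>ccc</r><r>d</r>"
parts = list(split_blocks(io.StringIO(data), "</r>", max_size=10, chunk_size=5))
```

Each yielded part would be written to its own output file. Re-searching the buffer from the same position on every read keeps the sketch short; a faster version could remember how far the previous search got.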

Issue with clipping rectangles and back to front rendering

- by Milo

Here is my problem. My rendering algorithm renders from back to front, but logically, clipping rectangles need to be applied from front to back. This is why the following does not work:

    void AguiWidgetManager::recursiveRender(const AguiWidget *root)
    {
        //recursively calls itself to render widgets from back to front
        AguiWidget* nonConstRoot = (AguiWidget*)root;
        if(!nonConstRoot->isVisable()) {
            return;
        }
        //push the clipping rectangle
        if(nonConstRoot->isClippingChildren()) {
            graphicsContext->pushClippingRect(nonConstRoot->getClippingRectangle());
        }
        if(nonConstRoot->isEnabled()) {
            nonConstRoot->paint(AguiPaintEventArgs(true,graphicsContext));
            for(std::vector<AguiWidget*>::const_iterator it = root->getPrivateChildBeginIterator();
                it != root->getPrivateChildEndIterator(); ++it) {
                recursiveRender(*it);
            }
            for(std::vector<AguiWidget*>::const_iterator it = root->getChildBeginIterator();
                it != root->getChildEndIterator(); ++it) {
                recursiveRender(*it);
            }
        }
        else {
            nonConstRoot->paint(AguiPaintEventArgs(false,graphicsContext));
            for(std::vector<AguiWidget*>::const_iterator it = root->getPrivateChildBeginIterator();
                it != root->getPrivateChildEndIterator(); ++it) {
                recursiveRenderDisabled(*it);
            }
            for(std::vector<AguiWidget*>::const_iterator it = root->getChildBeginIterator();
                it != root->getChildEndIterator(); ++it) {
                recursiveRenderDisabled(*it);
            }
        }
        //release clipping rectangle
        if(nonConstRoot->isClippingChildren()) {
            graphicsContext->popClippingRect();
        }
    }

I could of course go to the top of the tree and apply clipping rectangles inward until I get to the widget currently being rendered, but that would involve lots of clipping rectangles at 60 frames per second. I want to minimize the number of calls that push and pop rectangles. What could I do? Thanks

Calculate car filled up times

- by Ivan

Here is the question:

The driving distance between Perth and Adelaide is 1996 miles. On the average, the fuel consumption of a 2.0 litre 4 cylinder car is 8 litres per 100 kilometres. The fuel tank capacity of such a car is 60 litres. Design and implement a JAVA program that prompts for the fuel consumption and fuel tank capacity of the aforementioned car. The program then displays the minimum number of times the car’s fuel tank has to be filled up to drive from Perth to Adelaide. Note that 62 miles is equal to 100 kilometres. What data will you use to test that your algorithm works correctly?

Here is what I've done so far:

    import java.util.Scanner;//

    public class Ex4{
        public static void main( String args[] ){
            Scanner input = new Scanner( System.in );
            double distance, consumption, capacity, time;
            distance = Math.sqrt(1996/62*100);
            consumption = Math.sqrt(8/100);
            capacity = 60;
            time = Math.sqrt(distance*consumption/capacity);
            System.out.println("The car's fuel tank need to be filled up:" + time + "times");
        }
    }

I can compile it, but the problem is that the result is always 0.0. Can anyone help me find what's wrong with it?
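As a hedged worked example of the arithmetic in the exercise, the sketch below (in Python rather than Java, with the hypothetical helper name `fill_ups`) converts miles to kilometres, works out the litres needed, and rounds the number of tankfuls up. It assumes every tankful counts as a fill-up, including the first one, and it uses no square roots. For comparison, in Java the expression `8/100` on two int literals is an integer division and evaluates to 0, which is consistent with the 0.0 result reported in the post.

```python
import math

def fill_ups(distance_miles, litres_per_100km, tank_litres):
    """Minimum number of tankfuls needed to cover the distance."""
    distance_km = distance_miles / 62 * 100          # 62 miles = 100 km
    litres_needed = distance_km * litres_per_100km / 100
    # A partly used tank still has to be filled, so round up.
    return math.ceil(litres_needed / tank_litres)

# Figures from the exercise: 1996 miles, 8 l/100 km, 60 l tank.
trips = fill_ups(1996, 8, 60)
```

With the figures from the exercise, the trip is about 3219 km and needs about 257.5 litres, which is a little over four tankfuls.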

Using multiple aggregate functions in an (ANSI) SQL statement

- by morpheous

I have the aggregate functions foo(), foobar(), fredstats() and barneystats(). I want to create a domain specific query language (DSQL) above my DB, to facilitate using a domain language to query the DB. The 'language' comprises algebraic expressions (or more specifically SQL-like criteria) which I use to generate (ANSI) SQL statements which are sent to the db engine. The following are examples of what the language statements will look like, and hopefully they help clarify the concept:

**Example 1**
DQL statement: foobar('yellow') between 1 and 3 and fredstats('weight') > 42
Translation: fetch all rows in an underlying table where the computed value for aggregate function foobar() is between 1 and 3 AND the computed value for aggregate function fredstats() is greater than 42

**Example 2**
DQL statement: fredstats('weight') < barneystats('weight') AND foo('fighter') in (9,10,11) AND foobar('green') <> 42
Translation: fetch all rows where the specified criteria match

**Example 3**
DQL statement: foobar('green') / foobar('red') <> 42
Translation: fetch all rows where the specified criteria match

**Example 4**
DQL statement: foobar('green') - foobar('red') >= 42
Translation: fetch all rows where the specified criteria match

Given the following information:

- The table upon which the queries above are executed is called 'tbl'.
- Table 'tbl' has the following structure: (id int, name varchar(32), weight float)
- The result set returns only tbl.id, tbl.name and the names of the aggregate functions as columns, so for example the foobar() aggregate function column will be called foobar in the result set. The first DQL query will therefore return a result set with the following columns: id, name, foobar, fredstats

Given the above, my questions are:

- What would be the underlying SQL required for Example 1?
- What would be the underlying SQL required for Example 3?
- Given an algebraic expression comprising aggregate functions, is there a way of generalizing the algorithm needed to generate the required ANSI SQL statement(s)?

I am using PostgreSQL as the db, but I would prefer to use ANSI SQL wherever possible.

Trying to use boost lambda, but my code won't compile

- by hamishmcn

Hi, I am trying to use boost lambda to avoid having to write trivial functors. For example, I want to use the lambda to access a member of a struct or call a method of a class, e.g.:

    #include <vector>
    #include <utility>
    #include <algorithm>
    #include <boost/lambda/lambda.hpp>

    using namespace std;
    using namespace boost::lambda;

    vector< pair<int,int> > vp;
    vp.push_back( make_pair<int,int>(1,1) );
    vp.push_back( make_pair<int,int>(3,2) );
    vp.push_back( make_pair<int,int>(2,3) );
    sort(vp.begin(), vp.end(), _1.first > _2.first );

When I try to compile this I get the following errors:

    error C2039: 'first' : is not a member of 'boost::lambda::lambda_functor<T>'
        with [ T=boost::lambda::placeholder<1> ]
    error C2039: 'first' : is not a member of 'boost::lambda::lambda_functor<T>'
        with [ T=boost::lambda::placeholder<2> ]

Since vp contains pair<int,int> I thought that _1.first should work. What am I doing wrong?

Collaborative filtering in MySQL ?

- by user281434

Hi, I'm trying to develop a site that recommends items (e.g. books) to users based on their preferences. So far, I've read O'Reilly's "Collective Intelligence" and numerous other online articles. They all, however, seem to deal with single instances of recommendation, for example: if you like book A then you might like book B. What I'm trying to do is to create a set of 'preference-nodes' for each user on my site. Let's say a user likes books A, B and C. Then, when they add book D, I don't want the system to recommend other books based solely on other users' experience with book D. I want the system to look up similar 'preference-nodes' and recommend books based on that. Here's an example of 4 nodes:

    User1: 'book A'->'book B'->'book C'
    User2: 'book A'->'book B'->'book C'->'book D'
    User3: 'book X'->'book Y'->'book C'->'book Z'
    User4: 'book W'->'book Q'->'book C'->'book Z'

A recommendation system as described in the material I've read would recommend book Z to User1, because there are two people who recommend Z in conjunction with liking C (i.e. Z weighs more than D), even though a user with a similar 'preference-node', User2, would be more qualified to recommend book D because he has a more similar interest pattern. So do any of you have experience with this sort of thing? Is there something I should read, or are there any open source systems for this? Thanks for your time!

Small edit: I think last.fm's algorithm is doing exactly what I want my system to do: using the preference-trees of people to recommend music more personally, instead of just saying "you might like B because you liked A".

Bad crypto error in .NET 4.0

- by Andrey

Today I moved my web application to .NET 4.0 and Forms Auth just stopped working. After several hours of digging into my SqlMembershipProvider (a simplified version of the built-in SqlMembershipProvider), I found that the HMACSHA256 hash is not consistent. This is the encryption method:

    internal string EncodePassword(string pass, int passwordFormat, string salt)
    {
        if (passwordFormat == 0) // MembershipPasswordFormat.Clear
            return pass;

        byte[] bIn = Encoding.Unicode.GetBytes(pass);
        byte[] bSalt = Convert.FromBase64String(salt);
        byte[] bAll = new byte[bSalt.Length + bIn.Length];
        byte[] bRet = null;

        Buffer.BlockCopy(bSalt, 0, bAll, 0, bSalt.Length);
        Buffer.BlockCopy(bIn, 0, bAll, bSalt.Length, bIn.Length);
        if (passwordFormat == 1)
        { // MembershipPasswordFormat.Hashed
            HashAlgorithm s = HashAlgorithm.Create( Membership.HashAlgorithmType );
            bRet = s.ComputeHash(bAll);
        }
        else
        {
            bRet = EncryptPassword( bAll );
        }
        return Convert.ToBase64String(bRet);
    }

Passing the same password and salt twice returns different results!!! It was working perfectly in .NET 3.5. Is anyone aware of any breaking changes, or is it a known bug?

UPDATE: When I specify SHA512 as the hashing algorithm, everything works fine, so I do believe it's a bug in .NET 4.0 crypto.

Thanks! Andrey

Is it possible in any Java IDE to collapse the type definitions in the source code?

- by asmaier

Lately I often have to read Java code like this:

    LinkedHashMap<String, Integer> totals = new LinkedHashMap<String, Integer>(listOfRows.get(0));
    for (LinkedHashMap<String, Integer> row : (ArrayList<LinkedHashMap<String,Integer>>) table.getValue()) {
        for(Entry<String, Integer> elem : row.entrySet()) {
            String colName=elem.getKey();
            int Value=elem.getValue();
            int oldValue=totals.get(colName);
            int sum = Value + oldValue;
            totals.put(colName, sum);
        }
    }

Due to the long and nested type definitions, the simple algorithm becomes quite obscured. So I wish I could remove or collapse the type definitions with my IDE to see the Java code without types, like this:

    totals = new (listOfRows.get(0));
    for (row : table.getValue()) {
        for(elem : row.entrySet()) {
            colName=elem.getKey();
            Value=elem.getValue();
            oldValue=totals.get(colName);
            sum = Value + oldValue;
            totals.put(colName, sum);
        }
    }

The best way of course would be to collapse the type definitions but show the type as a tooltip when the mouse moves over a variable. Is there a Java IDE or a plugin for an IDE that can do this?

Replicating SQL's 'Join' in Python

- by Daniel Mathews

I'm in the process of trying to switch from R to Python (mainly because of issues around general flexibility). With Numpy, matplotlib and ipython, I've been able to cover all my use cases save for merging 'datasets'. I would like to simulate SQL's join clause (inner, outer, full) purely in Python. R handles this with the 'merge' function. I've tried numpy.lib.recfunctions join_by, but it has critical issues with duplicates along the 'key':

    join_by(key, r1, r2, jointype='inner', r1postfix='1', r2postfix='2', defaults=None, usemask=True, asrecarray=False)

    Join arrays r1 and r2 on key key. The key should be either a string or a sequence of string corresponding to the fields used to join the array. An exception is raised if the key field cannot be found in the two input arrays. Neither r1 nor r2 should have any duplicates along key: the presence of duplicates will make the output quite unreliable. Note that duplicates are not looked for by the algorithm.

source: http://presbrey.mit.edu:1234/numpy.lib.recfunctions.html

Any pointers or help will be most appreciated!
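A join that tolerates duplicate keys can be sketched in pure Python with a hash index, in the spirit of R's `merge`. This is only a minimal illustration under stated assumptions: rows are plain dicts, the function name `join` and its `how` parameter are hypothetical, only inner and left outer joins are covered, and on a column-name clash the left row's value wins.

```python
from collections import defaultdict

def join(left, right, key, how="inner"):
    """Hash join of two lists of dict rows on a single key column."""
    # Index the right side; each key maps to ALL rows that carry it,
    # so duplicate keys produce one output row per matching pair.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)

    out = []
    for lrow in left:
        matches = index.get(lrow[key], [])
        if matches:
            for rrow in matches:
                out.append({**rrow, **lrow})
        elif how == "left":
            # Keep unmatched left rows; right-hand columns are absent.
            out.append(dict(lrow))
    return out

left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}, {"id": 1, "a": "z"}]
right = [{"id": 1, "b": 10}, {"id": 1, "b": 11}, {"id": 3, "b": 12}]
inner = join(left, right, "id")
outer = join(left, right, "id", how="left")
```

A full outer join could be built on the same index by also emitting the right rows whose key never matched.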

Neural Network settings for fast training

- by danpalmer

I am creating a tool for predicting the time and cost of software projects based on past data. The tool uses a neural network to do this and so far, the results are promising, but I think I can do a lot more optimisation just by changing the properties of the network. There don't seem to be any rules or even many best-practices when it comes to these settings so if anyone with experience could help me I would greatly appreciate it. The input data is made up of a series of integers that could go up as high as the user wants to go, but most will be under 100,000 I would have thought. Some will be as low as 1. They are details like number of people on a project and the cost of a project, as well as details about database entities and use cases. There are 10 inputs in total and 2 outputs (the time and cost). I am using Resilient Propagation to train the network. Currently it has: 10 input nodes, 1 hidden layer with 5 nodes and 2 output nodes. I am training to get under a 5% error rate. The algorithm must run on a webserver so I have put in a measure to stop training when it looks like it isn't going anywhere. This is set to 10,000 training iterations. Currently, when I try to train it with some data that is a bit varied, but well within the limits of what we expect users to put into it, it takes a long time to train, hitting the 10,000 iteration limit over and over again. This is the first time I have used a neural network and I don't really know what to expect. If you could give me some hints on what sort of settings I should be using for the network and for the iteration limit I would greatly appreciate it. Thank you!

Output iterator's value_type

- by wilhelmtell

The STL commonly defines an output iterator like so:

    template<class Cont>
    class insert_iterator
        : public iterator<output_iterator_tag,void,void,void,void> {
        // ...

Why do output iterators define value_type as void? It would be useful for an algorithm to know what type of value it is supposed to output. For example, consider a function that translates a URL query "key1=value1&key2=value2&key3=value3" into any container that holds key-value string elements:

    template<typename Ch,typename Tr,typename Out>
    void parse(const std::basic_string<Ch,Tr>& str, Out result)
    {
        std::basic_string<Ch,Tr> key, value;
        // loop over str, parse into p ...
        *result = typename iterator_traits<Out>::value_type(key, value);
    }

The SGI reference page of value_type hints this is because it's not possible to dereference an output iterator. But that's not the only use of value_type: I might want to instantiate one in order to assign it to the iterator.

Approximate string matching with a letter confusion matrix?

- by zigglenaut

I'm trying to model a phonetic recognizer that has to isolate instances of words (strings of phones) out of a long stream of phones that doesn't have gaps between each word. The stream of phones may have been poorly recognized, with letter substitutions/insertions/deletions, so I will have to do approximate string matching. However, I want the matching to be phonetically-motivated, e.g. "m" and "n" are phonetically similar, so the substitution cost of "m" for "n" should be small, compared to say, "m" and "k". So, if I'm searching for [mein] "main", it would match the letter sequence [meim] "maim" with, say, cost 0.1, whereas it would match the letter sequence [meik] "make" with, say, cost 0.7. Similarly, there are differing costs for inserting or deleting each letter. I can supply a confusion matrix that, for each letter pair (x,y), gives the cost of substituting x with y, where x and y are any letter or the empty string. I know that there are tools available that do approximate matching such as agrep, but as far as I can tell, they do not take a confusion matrix as input. That is, the cost of any insertion/substitution/deletion = 1. My question is, are there any open-source tools already available that can do approximate matching with confusion matrices, and if not, what is a good algorithm that I can implement to accomplish this?
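The cost model described in this post maps onto a weighted edit distance, where the usual unit costs of the dynamic-programming recurrence are replaced by lookups in the confusion matrix. The sketch below is an illustration only: the function names and the `CONFUSION` table are hypothetical, the two costs (0.1 and 0.7) are the example figures from the post, and every pair not listed is assumed to cost 1.0. It compares two whole strings; searching inside a long stream would need a variant in which a match may start at any position.

```python
# Hypothetical confusion table: (x, y) -> cost of turning x into y.
# "" stands for the empty string (insertion or deletion).
CONFUSION = {("n", "m"): 0.1, ("n", "k"): 0.7}

def cost(x, y):
    """Cost of substituting x with y; unlisted pairs default to 1.0."""
    if x == y:
        return 0.0
    return CONFUSION.get((x, y), 1.0)

def weighted_edit_distance(source, target, cost):
    m, n = len(source), len(target)
    # d[i][j] = cheapest way to turn source[:i] into target[:j]
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + cost(source[i - 1], "")
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + cost("", target[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(
                d[i - 1][j] + cost(source[i - 1], ""),                 # deletion
                d[i][j - 1] + cost("", target[j - 1]),                 # insertion
                d[i - 1][j - 1] + cost(source[i - 1], target[j - 1]),  # substitution or match
            )
    return d[m][n]

maim = weighted_edit_distance("mein", "meim", cost)
make = weighted_edit_distance("mein", "meik", cost)
```

Here `maim` comes out at about 0.1 and `make` at about 0.7, matching the costs the post uses as its example.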

Serializing MDI Winforms for persistency

- by Serge

Hello, my project is an MDI WinForms application where a user can customize the interface by adding various controls and changing the layout. I would like to be able to save the state of the application for each user. I have done quite a bit of searching and found these:

http://stackoverflow.com/questions/2076259/how-to-auto-save-and-auto-load-all-properties-in-winforms-c
http://stackoverflow.com/questions/1669522/c-save-winform-or-controls-to-file

From what I understand, the best approach is to serialize the data to XML. However, WinForms controls are not serializable, so I would have to use surrogate classes:

http://www.codeproject.com/KB/dotnet/Surrogate_Serialization.aspx

Now, do I need to write a surrogate class for each of my controls? I would need to write some sort of recursive algorithm to save all my controls; what is the best approach to accomplish that? How would I then restore all the windows? Should I use the memento design pattern for that? If I want to support multiple users later, should I use NHibernate to store all the object data in a database? I am still trying to wrap my head around the problem, and if anyone has any experience or advice I would greatly appreciate it. Thanks.

Is there a PHP benchmark that meets these specific criteria? [closed]

- by Alex R

I'm working on a tool which converts PHP code to Scala. As one of the finishing touches, I'm in need of a really good (er, somewhat biased) benchmark. By dumb luck my first benchmark attempt was with some code which uses bcmath extensively, which unfortunately is 1000x slower in Java, making the Scala code 22x slower overall than the original PHP. So I'm looking for some meaningful PHP benchmark with the following characteristics:

- The PHP source needs to be in a single file.
- It should solve a real-world problem. No silly looping over empty methods etc.
- I need it to be simple to set up: no databases, hard-to-find input files, etc. Simple text input and output preferred.
- It should not use features that are slow in Java (BigInteger, trigonometric functions, etc).
- It should not use esoteric or dynamic PHP functions (e.g. no "eval" or "variable vars").
- It should not over-rely on built-in libraries, e.g. MD5, crypt, etc.
- It should not be I/O bound. A CPU-bound, memory-hungry algorithm is preferred.

Basically, intensive OO operations, integer and string manipulation, recursion, etc. would be great. Thanks

simple Java "service provider frameworks"?

- by Jason S

I refer to the "service provider framework" as discussed in Chapter 2 of Effective Java, which seems like exactly the right way to handle a problem I am having: I need to instantiate one of several classes at runtime, based on a String that selects the service and a Configuration object (essentially an XML snippet). But how do I get the individual service providers (e.g. a bunch of default providers + some custom providers) to register themselves?

    interface FooAlgorithm
    {
        /* methods particular to this class of algorithms */
    }

    interface FooAlgorithmProvider
    {
        public FooAlgorithm getAlgorithm(Configuration c);
    }

    class FooAlgorithmRegistry
    {
        private FooAlgorithmRegistry() {}
        static private final Map<String, FooAlgorithmProvider> directory =
            new HashMap<String, FooAlgorithmProvider>();
        static public FooAlgorithmProvider getProvider(String name)
        {
            return directory.get(name);
        }
        static public boolean registerProvider(String name, FooAlgorithmProvider provider)
        {
            if (directory.containsKey(name))
                return false;
            directory.put(name, provider);
            return true;
        }
    }

For example, if I write custom classes MyFooAlgorithm and MyFooAlgorithmProvider to implement FooAlgorithm, and I distribute them in a jar, is there any way to get registerProvider to be called automatically, or will my client programs that use the algorithm have to explicitly call FooAlgorithmRegistry.registerProvider() for each class they want to use?

What is the right approach to checksumming UDP packets

- by mr.b

I'm building a UDP server application in C#, and I've come across a packet checksum problem. As you probably know, each packet should carry some simple way of telling the receiver whether the packet data is intact. UDP already has a 2-byte checksum as part of its header, which is optional, at least in the IPv4 world. The alternative is to put a custom checksum in the data section of each packet and verify it on the receiver. My question boils down to: is it better to rely on the (optional) checksum in the UDP packet header, or to implement a custom checksum as part of the packet data section? Perhaps the right answer depends on circumstances (as usual), so one circumstance here is that, even though the code is written and developed in .NET on Windows, it might have to run under platform-independent Mono.NET, so the eventual solution should be compatible with other platforms. I believe that a custom checksum algorithm would be easily portable, but I'm not so sure about the header checksum. Any thoughts? General comments about packet checksumming are also welcome.
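The custom-checksum option mentioned in this post can be sketched as a small framing helper that appends a checksum to the payload before sending and verifies it on receipt. The example below is written in Python rather than C# and is only an illustration: the choice of CRC-32 and the function names `pack` and `unpack` are assumptions, not something the post specifies. The checksum is stored in network byte order so that both ends agree regardless of platform.

```python
import struct
import zlib

def pack(payload: bytes) -> bytes:
    """Append a 4-byte big-endian CRC-32 of the payload."""
    return payload + struct.pack("!I", zlib.crc32(payload) & 0xFFFFFFFF)

def unpack(datagram: bytes):
    """Return the payload if its checksum matches, otherwise None."""
    if len(datagram) < 4:
        return None  # too short to contain a checksum at all
    payload = datagram[:-4]
    (expected,) = struct.unpack("!I", datagram[-4:])
    if zlib.crc32(payload) & 0xFFFFFFFF != expected:
        return None
    return payload

good = pack(b"hello")
# Simulate corruption in transit by flipping one bit of the first byte.
bad = bytes([good[0] ^ 0x01]) + good[1:]
```

The bytes returned by `pack` are what would be handed to the socket's send call; a receiver would drop any datagram for which `unpack` returns `None`.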

Triangle numbers problem....show within 4 seconds

- by Daredevil

The sequence of triangle numbers is generated by adding the natural numbers. So the 7th triangle number would be 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28. The first ten terms would be: 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, ...

Let us list the factors of the first seven triangle numbers:

     1: 1
     3: 1,3
     6: 1,2,3,6
    10: 1,2,5,10
    15: 1,3,5,15
    21: 1,3,7,21
    28: 1,2,4,7,14,28

We can see that 28 is the first triangle number to have over five divisors. Given an integer n, display the first triangle number having at least n divisors.

Sample Input: 5
Output: 28
Input Constraints: 1<=n<=320

I was obviously able to do this question, but I used a naive algorithm: get n, find triangle numbers and check their number of factors using the mod operator. But the challenge was to show the output within 4 seconds of input. On high inputs like 190 and above it took almost 15-16 seconds. Then I tried to put the triangle numbers and their number of factors in a 2D array first, and then get the input from the user and search the array. But somehow I couldn't do it: I got a lot of processor faults. Please try doing it with this method and paste the code. Or if there are any better ways, please tell me.
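One common way to speed up the naive approach is to avoid factoring the triangle number itself. The k-th triangle number is k(k+1)/2, and k and k+1 share no common factor, so after halving whichever of them is even, the divisor count of the triangle number is the product of the divisor counts of the two much smaller parts. The sketch below illustrates this; the function names are hypothetical, and it makes no claim about meeting the 4-second limit on any particular machine.

```python
def count_divisors(n):
    """Count the divisors of n by trial division up to sqrt(n)."""
    count = 0
    i = 1
    while i * i <= n:
        if n % i == 0:
            # i and n // i are both divisors; count a square root once.
            count += 1 if i * i == n else 2
        i += 1
    return count

def first_triangle_with_divisors(min_divisors):
    """First triangle number with at least min_divisors divisors."""
    k = 1
    while True:
        # T(k) = k * (k + 1) / 2: halve the even factor so that
        # a and b are coprime and a * b == T(k).
        if k % 2 == 0:
            a, b = k // 2, k + 1
        else:
            a, b = k, (k + 1) // 2
        if count_divisors(a) * count_divisors(b) >= min_divisors:
            return a * b
        k += 1

sample = first_triangle_with_divisors(5)
```

For the sample input 5 this returns 28, as in the problem statement.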

A way for a file to have its own MD5 inside? Or a string that is its own MD5?

- by Eli

Hi all, In considering several possible solutions to a recent task, I found myself considering how to get a PHP file that includes its own MD5 hash. I ended up doing something else, but the question stayed with me. Something along the lines of:

    <?php echo("Hello, my MD5 is [MD5 OF THIS FILE HERE]"); ?>

Whatever placeholder you have in the file, the second you take its MD5 and insert it, you've changed the file, which changes its MD5, etc.

Edit: Perhaps I should rephrase my question: does anyone know if it has been proven impossible, or if there has been any research on an algorithm that would result in a file containing its own MD5 (or other hash)? I suppose if the MD5 were the only content in the file, then the problem could be restated as how to find a string that is its own MD5. It may well be impossible for us to create a process that will result in such a thing, but I can't think of any reason the solution itself can't exist. The question is basically whether it really is impossible, simply improbable (on the order of monkeys randomly typing Shakespeare), or actually solvable by somebody smarter than myself.

Is it immoral to write crappy code even if readability and correctness is not a requirement?

- by mafutrct

There are cases when crappy (i.e. unreadable and buggy) code is not much of a problem. For instance, imagine you need to generate a big text file that mostly follows a simple pattern with a few very complex exceptions. What do you do? You quickly write a simple algorithm and insert the exceptional bits in the output manually to save 4 hours. The code is unreadable, and the output is flawed, but it's still the correct way since it is way faster. But let's get this straight: I hate bad code. I've had to read and work with code that caused my stomach to hurt. I care a lot about good code. And actually, I caught myself thinking that it is immoral to write bad code even though the dirty approach is sometimes superior. I was surprised by myself and found my idea to be very irrational. Did you ever experience this? Should I just get rid of this stupid idea and use the most efficient approach to coding?

permutations gone wrong

- by vbNewbie

I have written code to implement an algorithm I found for string permutations. What I have is an ArrayList of words (up to 200), and I need to permute the list in levels of 5: basically, group the words in fives and permute each group. What I have takes the first 5 words, generates the permutations, and ignores the rest of the ArrayList. Any ideas appreciated.

    Private Function permute(ByVal chunks As ArrayList, ByVal k As Long) As ArrayList
        ReDim ItemUsed(k)
        pno = 0
        Permutate(k, 1)
        Return chunks
    End Function

    Private Shared Sub Permutate(ByVal K As Long, ByVal pLevel As Long)
        Dim i As Long, Perm As String
        Perm = pString ' Save the current Perm
        ' for each value currently available
        For i = 1 To K
            If Not ItemUsed(i) Then
                If pLevel = 1 Then
                    pString = chunks.Item(i)
                    'pString = inChars(i)
                Else
                    pString = pString & chunks.Item(i)
                    'pString += inChars(i)
                End If
                If pLevel = K Then 'got next Perm
                    pno = pno + 1
                    SyncLock outfile
                        outfile.WriteLine(pno & " = " & pString & vbCrLf)
                    End SyncLock
                    outfile.Flush()
                    Exit Sub
                End If
                ' Mark this item unavailable
                ItemUsed(i) = True
                ' gen all Perms at next level
                Permutate(K, pLevel + 1)
                ' Mark this item free again
                ItemUsed(i) = False
                ' Restore the current Perm
                pString = Perm
            End If
        Next
    End Sub

K above is equal to 5, the number of words in one permutation, but when I change the For loop to run to the ArrayList size I get an index out of bounds error.
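Read literally, the goal in this post is to walk the list five words at a time and emit every ordering of each group. The sketch below shows that reading in Python rather than VB.NET, using the standard library's `itertools.permutations`; the function name `permute_in_groups` is hypothetical, and the assumption that a shorter final group is permuted as-is is mine, not the post's.

```python
from itertools import permutations

def permute_in_groups(words, group_size=5):
    """Yield every ordering of each consecutive group of words."""
    for start in range(0, len(words), group_size):
        # Slicing never runs past the end of the list, so the last
        # group may simply be shorter than group_size.
        group = words[start:start + group_size]
        for perm in permutations(group):
            yield "".join(perm)

words = ["a", "b", "c", "d", "e", "f", "g"]
results = list(permute_in_groups(words, group_size=3))
```

With groups of five, each full group contributes 5! = 120 lines, so 200 words would produce 40 groups and 4,800 permutations in total.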

What is the best way to translate this recursive python method into Java?

- by Simucal

In another question I was provided with a great answer involving generating certain sets for the Chinese Postman Problem. The answer provided was:

    def get_pairs(s):
        if not s:
            yield []
        else:
            i = min(s)
            for j in s - set([i]):
                for r in get_pairs(s - set([i, j])):
                    yield [(i, j)] + r

    for x in get_pairs(set([1,2,3,4,5,6])):
        print(x)

This will output the desired result of:

    [(1, 2), (3, 4), (5, 6)]
    [(1, 2), (3, 5), (4, 6)]
    [(1, 2), (3, 6), (4, 5)]
    [(1, 3), (2, 4), (5, 6)]
    [(1, 3), (2, 5), (4, 6)]
    [(1, 3), (2, 6), (4, 5)]
    [(1, 4), (2, 3), (5, 6)]
    [(1, 4), (2, 5), (3, 6)]
    [(1, 4), (2, 6), (3, 5)]
    [(1, 5), (2, 3), (4, 6)]
    [(1, 5), (2, 4), (3, 6)]
    [(1, 5), (2, 6), (3, 4)]
    [(1, 6), (2, 3), (4, 5)]
    [(1, 6), (2, 4), (3, 5)]
    [(1, 6), (2, 5), (3, 4)]

This really shows off the expressiveness of Python, because this is almost exactly how I would write the pseudo-code for the algorithm. I especially like the usage of yield and the way that sets are treated as first class citizens. However, therein lies my problem. What would be the best way to:

1. Duplicate the functionality of the yield construct in Java? Would it instead be best to maintain a list and append my partial results to this list? How would you handle the yield keyword?
2. Handle the sets? I know that I could probably use one of the Java collections that implements the Set interface and then use things like removeAll() to give me a set difference. Is this what you would do in that case?

Ultimately, I'm looking to reduce this method to as concise and straightforward a form as possible in Java. I'm thinking the Java version of this method will likely return a list of int arrays or something similar. How would you handle the situations above when converting this method into Java?
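The "maintain a list" alternative raised in the first question can be shown in Python itself before translating anything: the generator is replaced by a function that builds and returns a list of results. This is only a sketch of that restructuring, not a Java answer. The name `get_pairs_list` is hypothetical, and `sorted` is added so that the output order does not depend on set iteration order.

```python
def get_pairs_list(s):
    """Return every way of splitting the set s into unordered pairs."""
    if not s:
        # One way to pair up nothing: the empty pairing.
        return [[]]
    results = []
    i = min(s)
    for j in sorted(s - {i}):
        # Pair i with j, then pair up whatever is left.
        for rest in get_pairs_list(s - {i, j}):
            results.append([(i, j)] + rest)
    return results

pairings = get_pairs_list({1, 2, 3, 4, 5, 6})
```

A version shaped like this has no generator state to emulate, at the cost of holding all results in memory at once: each `yield` becomes an append, and the set differences correspond to copying a set and removing elements from the copy.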

Which SCM/VCS cope well with moving text between files?

- by pfctdayelise

We are having havoc with our project at work, because our VCS is doing some awful merging when we move information across files. The scenario is thus: You have lots of files that, say, contain information about terms from a dictionary, so you have a file for each letter of the alphabet. Users entering terms blindly follow the dictionary order, so they will put an entry like "kick the bucket" under B if that is where the dictionary happened to list it (or it might have been listed under both B, bucket and K, kick). Later, other users move the terms to their correct files. Lots of work is being done on the dictionary terms all the time. e.g. User A may have taken the B file and elaborated on the "kick the bucket" entry. User B took the B and K files, and moved the "kick the bucket" entry to the K file. Whichever order they end up getting committed in, the VCS will probably lose entries and not "figure out" that an entry has been moved. (These entries are later automatically converted to an SQL database. But they are kept in a "human friendly" form for working on them, with lots of comments, examples etc. So it is not acceptable to say "make your users enter SQL directly".) It is so bad that we have taken to almost manually merging these kinds of files now, because we can't trust our VCS. :( So what is the solution? I would love to hear that there is a VCS that could cope with this. Or a better merge algorithm? Or otherwise, maybe someone can suggest a better workflow or file arrangement to try and avoid this problem?

Looking for advice on importing large dataset in sqlite and Cocoa/Objective-C

- by jluckyiv

I have a fairly large hierarchical dataset I'm importing. The total size of the database after import is about 270MB in sqlite. My current method works, but I know I'm hogging memory as I do it. For instance, if I run with Zombies, my system freezes up (although it will execute just fine if I don't use that Instrument). I was hoping for some algorithm advice.

I have three hierarchical tables comprising about 400,000 records. The highest level has about 30 records, the next has about 20,000, and the last has the balance. Right now, I'm using nested for loops to import. I know I'm creating an unreasonably large object graph, but I'm also looking to serialize to JSON or XML because I want to break up the records into downloadable chunks for the end user to import a la carte. I have the code written to do the serialization, but I'm wondering if I can serialize the object graph if I only have pieces in memory.

Here's pseudocode showing the basic process for the sqlite import. I left out the unnecessary detail.

    [database open];
    [database beginTransaction];
    NSArray *firstLevels = [[FirstLevel fetchFromURL:url] retain];
    for (FirstLevel *firstLevel in firstLevels) {
        [firstLevel save];
        int id1 = [firstLevel primaryKey];
        NSArray *secondLevels = [[SecondLevel fetchFromURL:url] retain];
        for (SecondLevel *secondLevel in secondLevels) {
            [secondLevel saveWithForeignKey:id1];
            int id2 = [secondLevel primaryKey];
            NSArray *thirdLevels = [[ThirdLevel fetchFromURL:url] retain];
            for (ThirdLevel *thirdLevel in thirdLevels) {
                [thirdLevel saveWithForeignKey:id2];
            }
            [database commit];
            [database beginTransaction];
            [thirdLevels release];
        }
        [secondLevels release];
    }
    [database commit];
    [database release];
    [firstLevels release];

Python performance improvement request for winkler

- by Martlark

I'm a Python n00b and I'd like some suggestions on how to improve the performance of this method, which computes the Jaro-Winkler distance of two names.

    def winklerCompareP(str1, str2):
        """Return approximate string comparator measure (between 0.0 and 1.0)

        USAGE:
          score = winkler(str1, str2)

        ARGUMENTS:
          str1  The first string
          str2  The second string

        DESCRIPTION:
          As described in 'An Application of the Fellegi-Sunter Model of
          Record Linkage to the 1990 U.S. Decennial Census' by William E. Winkler
          and Yves Thibaudeau.

          Based on the 'jaro' string comparator, but modifies it according to
          whether the first few characters are the same or not.
        """

        # Quick check if the strings are the same - - - - - - - - - - - - - - - -
        #
        jaro_winkler_marker_char = chr(1)
        if (str1 == str2):
            return 1.0

        len1 = len(str1)
        len2 = len(str2)
        # Floor division, so the window stays an int on Python 2 and Python 3
        halflen = max(len1, len2) // 2 - 1

        ass1 = ''  # Characters assigned in str1
        ass2 = ''  # Characters assigned in str2
        #ass1 = ''
        #ass2 = ''
        workstr1 = str1
        workstr2 = str2

        common1 = 0  # Number of common characters
        common2 = 0

        #print "'len1', str1[i], start, end, index, ass1, workstr2, common1"
        # Analyse the first string - - - - - - - - - - - - - - - - - - - - - - - -
        #
        for i in range(len1):
            start = max(0, i-halflen)
            end = min(i+halflen+1, len2)
            index = workstr2.find(str1[i], start, end)
            #print 'len1', str1[i], start, end, index, ass1, workstr2, common1
            if (index > -1):  # Found common character
                common1 += 1
                #ass1 += str1[i]
                ass1 = ass1 + str1[i]
                workstr2 = workstr2[:index]+jaro_winkler_marker_char+workstr2[index+1:]
        #print "str1 analyse result", ass1, common1

        #print "str1 analyse result", ass1, common1
        # Analyse the second string - - - - - - - - - - - - - - - - - - - - - - -
        #
        for i in range(len2):
            start = max(0, i-halflen)
            end = min(i+halflen+1, len1)
            index = workstr1.find(str2[i], start, end)
            #print 'len2', str2[i], start, end, index, ass1, workstr1, common2
            if (index > -1):  # Found common character
                common2 += 1
                #ass2 += str2[i]
                ass2 = ass2 + str2[i]
                workstr1 = workstr1[:index]+jaro_winkler_marker_char+workstr1[index+1:]

        if (common1 != common2):
            print('Winkler: Wrong common values for strings "%s" and "%s"' % \
                  (str1, str2) + ', common1: %i, common2: %i' % (common1, common2) + \
                  ', common should be the same.')
            common1 = float(common1+common2) / 2.0  ##### This is just a fix #####

        if (common1 == 0):
            return 0.0

        # Compute number of transpositions - - - - - - - - - - - - - - - - - - - -
        #
        transposition = 0
        for i in range(len(ass1)):
            if (ass1[i] != ass2[i]):
                transposition += 1
        transposition = transposition / 2.0

        # Now compute how many characters are common at beginning - - - - - - - -
        #
        minlen = min(len1, len2)
        for same in range(minlen+1):
            if (str1[:same] != str2[:same]):
                break
        same -= 1
        if (same > 4):
            same = 4

        common1 = float(common1)
        w = 1./3.*(common1 / float(len1) + common1 / float(len2) + (common1-transposition) / common1)

        wn = w + same*0.1 * (1.0 - w)
        return wn
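One possible direction, sketched here as the textbook Jaro-Winkler formulation rather than a drop-in replacement for the code above: match in a single pass and mark used positions in a boolean list, instead of rebuilding `workstr1`/`workstr2` by slicing on every match. The function name `jaro_winkler` and the `prefix_weight` parameter are illustrative. The scores may differ from the posted routine in edge cases, for example when one name is a prefix of the other.

```python
def jaro_winkler(s1: str, s2: str, prefix_weight: float = 0.1) -> float:
    """Textbook Jaro-Winkler similarity in [0.0, 1.0]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Characters match only if they are at most `window` positions apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    used2 = [False] * len2   # positions of s2 already matched
    matched1 = []            # matched characters of s1, in s1 order

    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(i + window + 1, len2)
        for j in range(lo, hi):
            if not used2[j] and s2[j] == ch:
                used2[j] = True
                matched1.append(ch)
                break

    m = len(matched1)
    if m == 0:
        return 0.0

    # Matched characters of s2, in s2 order; half the mismatches are
    # counted as transpositions.
    matched2 = [s2[j] for j in range(len2) if used2[j]]
    transpositions = sum(a != b for a, b in zip(matched1, matched2)) / 2.0

    jaro = (m / len1 + m / len2 + (m - transpositions) / m) / 3.0

    # Length of the common prefix, capped at 4 characters.
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == 4:
            break
        prefix += 1

    return jaro + prefix * prefix_weight * (1.0 - jaro)
```

Whether this is actually faster for a given workload is worth measuring with `timeit` on real name pairs. The main saving it aims for is avoiding the string copy made on every matched character.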

How to scan convert right edges and slopes less than one?

- by Zachary

I'm writing a program which will use scan conversion on triangles to fill in the pixels contained within the triangle. One thing that has me confused is how to determine the x increment for the right edge of the triangle, or for slopes less than or equal to one. Here is the code I have to handle left edges with a slope greater than one (obtained from Computer Graphics: Principles and Practice, second edition):

    for (y = ymin; y <= ymax; y++)
    {
        edge.increment += edge.numerator;
        if (edge.increment > edge.denominator)
        {
            edge.x++;
            edge.increment -= edge.denominator;
        }
    }

The numerator is set from (xMax-xMin), and the denominator is set from (yMax-yMin), which makes sense as it represents the slope of the line. As you move up the scan lines (represented by the y values), x is incremented by 1/(denominator/numerator), which results in x having a whole part and a fractional part. If the accumulated fractional part is greater than one, then the x value has to be incremented by 1 (as shown in edge.increment > edge.denominator). This works fine for any left-hand edge with a slope greater than one, but I'm having trouble generalizing it to any edge, and googling has proved fruitless. Does anyone know the algorithm for that?
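A sketch of one way the incremental idea can be generalized, given here as a variation under stated assumptions and not as the book's code: split dx/dy into a whole step and a remainder with floor division, so shallow slopes and negative dx are handled by the same loop. The function name `edge_xs` and the rounding convention (left edges round up, right edges round down) are assumptions for illustration.

```python
def edge_xs(x0: int, y0: int, x1: int, y1: int, left_edge: bool):
    """Yield (y, x) for each scanline of the edge (x0, y0)-(x1, y1), y0 < y1.

    The exact intersection is x + err/dy with 0 <= err < dy, kept in
    integers. Left edges round up, right edges round down.
    """
    dx, dy = x1 - x0, y1 - y0
    if dy <= 0:
        raise ValueError("edge must go upwards (y0 < y1)")

    # Floor division: 0 <= rem < dy even when dx is negative.
    step, rem = divmod(dx, dy)

    x, err = x0, 0
    for y in range(y0, y1 + 1):
        yield y, (x + 1 if left_edge and err > 0 else x)
        x += step          # whole part of the slope (0 for steep edges)
        err += rem         # fractional part, accumulated as a numerator
        if err >= dy:      # the fraction overflowed past one pixel
            x += 1
            err -= dy


# Steep edge (slope 2), once as a left edge and once as a right edge.
left = [x for _, x in edge_xs(0, 0, 3, 6, left_edge=True)]
right = [x for _, x in edge_xs(0, 0, 3, 6, left_edge=False)]
```

For slopes greater than one, `step` is 0 and the loop reduces to the accumulate-and-carry pattern in the question. For shallower edges, `step` carries the whole pixels moved per scanline, and only the remainder goes through the carry test.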
