Search Results

Search found 1933 results on 78 pages for 'genetic algorithms'.

Page 73/78 | < Previous Page | 69 70 71 72 73 74 75 76 77 78  | Next Page >

  • Calculating CPU frequency in C with RDTSC always returns 0

    - by Nazgulled
    Hi, The following piece of code was given to us from our instructor so we could measure some algorithms performance: #include <stdio.h> #include <unistd.h> static unsigned cyc_hi = 0, cyc_lo = 0; static void access_counter(unsigned *hi, unsigned *lo) { asm("rdtsc; movl %%edx,%0; movl %%eax,%1" : "=r" (*hi), "=r" (*lo) : /* No input */ : "%edx", "%eax"); } void start_counter() { access_counter(&cyc_hi, &cyc_lo); } double get_counter() { unsigned ncyc_hi, ncyc_lo, hi, lo, borrow; double result; access_counter(&ncyc_hi, &ncyc_lo); lo = ncyc_lo - cyc_lo; borrow = lo > ncyc_lo; hi = ncyc_hi - cyc_hi - borrow; result = (double) hi * (1 << 30) * 4 + lo; return result; } However, I need this code to be portable to machines with different CPU frequencies. For that, I'm trying to calculate the CPU frequency of the machine where the code is being run like this: int main(void) { double c1, c2; start_counter(); c1 = get_counter(); sleep(1); c2 = get_counter(); printf("CPU Frequency: %.1f MHz\n", (c2-c1)/1E6); printf("CPU Frequency: %.1f GHz\n", (c2-c1)/1E9); return 0; } The problem is that the result is always 0 and I can't understand why. I'm running Linux (Arch) as guest on VMware. On a friend's machine (MacBook) it is working to some extent; I mean, the result is bigger than 0 but it's variable because the CPU frequency is not fixed (we tried to fix it but for some reason we are not able to do it). He has a different machine which is running Linux (Ubuntu) as host and it also reports 0. This rules out the problem being on the virtual machine, which I thought it was the issue at first. Any ideas why this is happening and how can I fix it?

    Read the article

  • Constraint Satisfaction Problem

    - by Carl Smotricz
    I'm struggling my way through Artificial Intelligence: A Modern Approach in order to alleviate my natural stupidity. In trying to solve some of the exercises, I've come up against the "Who Owns the Zebra" problem, Exercise 5.13 in Chapter 5. This has been a topic here on SO but the responses mostly addressed the question "how would you solve this if you had a free choice of problem solving software available?" I accept that Prolog is a very appropriate programming language for this kind of problem, and there are some fine packages available, e.g. in Python as shown by the top-ranked answer and also standalone. Alas, none of this is helping me "tough it out" in a way as outlined by the book. The book appears to suggest building a set of dual or perhaps global constraints, and then implementing some of the algorithms mentioned to find a solution. I'm having a lot of trouble coming up with a set of constraints suitable for modelling the problem. I'm studying this on my own so I don't have access to a professor or TA to get me over the hump - this is where I'm asking for your help. I see little similarity to the examples in the chapter. I was eager to build dual constraints and started out by creating (the logical equivalent of) 25 variables: nationality1, nationality2, nationality3, ... nationality5, pet1, pet2, pet3, ... pet5, drink1 ... drink5 and so on, where the number was indicative of the house's position. This is fine for building the unary constraints, e.g. The Norwegian lives in the first house: nationality1 = { :norway }. But most of the constraints are a combination of two such variables through a common house number, e.g. The Swede has a dog: nationality[n] = { :sweden } AND pet[n] = { :dog } where n can range from 1 to 5, obviously. Or stated another way: nationality1 = { :sweden } AND pet1 = { :dog } XOR nationality2 = { :sweden } AND pet2 = { :dog } XOR nationality3 = { :sweden } AND pet3 = { :dog } XOR nationality4 = { :sweden } AND pet4 = { :dog } XOR nationality5 = { :sweden } AND pet5 = { :dog } ...which has a decidedly different feel to it than the "list of tuples" advocated by the book: ( X1, X2, X3 = { val1, val2, val3 }, { val4, val5, val6 }, ... ) I'm not looking for a solution per se; I'm looking for a start on how to model this problem in a way that's compatible with the book's approach. Any help appreciated.

    Read the article

  • Permutations of Varying Size

    - by waiwai933
    I'm trying to write a function in PHP that gets all permutations of all possible sizes. I think an example would be the best way to start off: $my_array = array(1,1,2,3); Possible permutations of varying size: 1 1 // * See Note 2 3 1,1 1,2 1,3 // And so forth, for all the sets of size 2 1,1,2 1,1,3 1,2,1 // And so forth, for all the sets of size 3 1,1,2,3 1,1,3,2 // And so forth, for all the sets of size 4 Note: I don't care if there's a duplicate or not. For the purposes of this example, all future duplicates have been omitted. What I have so far in PHP: function getPermutations($my_array){ $permutation_length = 1; $keep_going = true; while($keep_going){ while($there_are_still_permutations_with_this_length){ // Generate the next permutation and return it into an array // Of course, the actual important part of the code is what I'm having trouble with. } $permutation_length++; if($permutation_length>count($my_array)){ $keep_going = false; } else{ $keep_going = true; } } return $return_array; } The closest thing I can think of is shuffling the array, picking the first n elements, seeing if it's already in the results array, and if it's not, add it in, and then stop when there are mathematically no more possible permutations for that length. But it's ugly and resource-inefficient. Any pseudocode algorithms would be greatly appreciated. Also, for super-duper (worthless) bonus points, is there a way to get just 1 permutation with the function but make it so that it doesn't have to recalculate all previous permutations to get the next? For example, I pass it a parameter 3, which means it's already done 3 permutations, and it just generates number 4 without redoing the previous 3? (Passing it the parameter is not necessary, it could keep track in a global or static). The reason I ask this is because as the array grows, so does the number of possible combinations. Suffice it to say that one small data set with only a dozen elements grows quickly into the trillions of possible combinations and I don't want to task PHP with holding trillions of permutations in its memory at once.

    Read the article

  • Neural Network Always Produces Same/Similar Outputs for Any Input

    - by l33tnerd
    I have a problem where I am trying to create a neural network for Tic-Tac-Toe. However, for some reason, training the neural network causes it to produce nearly the same output for any given input. I did take a look at Artificial neural networks benchmark, but my network implementation is built for neurons with the same activation function for each neuron, i.e. no constant neurons. To make sure the problem wasn't just due to my choice of training set (1218 board states and moves generated by a genetic algorithm), I tried to train the network to reproduce XOR. The logistic activation function was used. Instead of using the derivative, I multiplied the error by output*(1-output) as some sources suggested that this was equivalent to using the derivative. I can put the Haskell source on HPaste, but it's a little embarrassing to look at. The network has 3 layers: the first layer has 2 inputs and 4 outputs, the second has 4 inputs and 1 output, and the third has 1 output. Increasing to 4 neurons in the second layer didn't help, and neither did increasing to 8 outputs in the first layer. I then calculated errors, network output, bias updates, and the weight updates by hand based on http://hebb.mit.edu/courses/9.641/2002/lectures/lecture04.pdf to make sure there wasn't an error in those parts of the code (there wasn't, but I will probably do it again just to make sure). Because I am using batch training, I did not multiply by x in equation (4) there. I am adding the weight change, though http://www.faqs.org/faqs/ai-faq/neural-nets/part2/section-2.html suggests to subtract it instead. The problem persisted, even in this simplified network. For example, these are the results after 500 epochs of batch training and of incremental training. Input |Target|Output (Batch) |Output(Incremental) [1.0,1.0]|[0.0] |[0.5003781562785173]|[0.5009731800870864] [1.0,0.0]|[1.0] |[0.5003740346965251]|[0.5006347214672715] [0.0,1.0]|[1.0] |[0.5003734471544522]|[0.500589332376345] [0.0,0.0]|[0.0] |[0.5003674110937019]|[0.500095157458231] Subtracting instead of adding produces the same problem, except everything is 0.99 something instead of 0.50 something. 5000 epochs produces the same result, except the batch-trained network returns exactly 0.5 for each case. (Heck, even 10,000 epochs didn't work for batch training.) Is there anything in general that could produce this behavior? Also, I looked at the intermediate errors for incremental training, and the although the inputs of the hidden/input layers varied, the error for the output neuron was always +/-0.12. For batch training, the errors were increasing, but extremely slowly and the errors were all extremely small (x10^-7). Different initial random weights and biases made no difference, either. Note that this is a school project, so hints/guides would be more helpful. Although reinventing the wheel and making my own network (in a language I don't know well!) was a horrible idea, I felt it would be more appropriate for a school project (so I know what's going on...in theory, at least. There doesn't seem to be a computer science teacher at my school). EDIT: Two layers, an input layer of 2 inputs to 8 outputs, and an output layer of 8 inputs to 1 output, produces much the same results: 0.5+/-0.2 (or so) for each training case. I'm also playing around with pyBrain, seeing if any network structure there will work. Edit 2: I am using a learning rate of 0.1. Sorry for forgetting about that. Edit 3: Pybrain's "trainUntilConvergence" doesn't get me a fully trained network, either, but 20000 epochs does, with 16 neurons in the hidden layer. 10000 epochs and 4 neurons, not so much, but close. So, in Haskell, with the input layer having 2 inputs & 2 outputs, hidden layer with 2 inputs and 8 outputs, and output layer with 8 inputs and 1 output...I get the same problem with 10000 epochs. And with 20000 epochs. Edit 4: I ran the network by hand again based on the MIT PDF above, and the values match, so the code should be correct unless I am misunderstanding those equations. Some of my source code is at http://hpaste.org/42453/neural_network__not_working; I'm working on cleaning my code somewhat and putting it in a Github (rather than a private Bitbucket) repository. All of the relevant source code is now at https://github.com/l33tnerd/hsann.

    Read the article

  • Algorithm to Render a Horizontal Binary-ish Tree in Text/ASCII form

    - by Justin L.
    It's a pretty normal binary tree, except for the fact that one of the nodes may be empty. I'd like to find a way to output it in a horizontal way (that is, the root node is on the left and expands to the right). I've had some experience expanding trees vertically (root node at the top, expanding downwards), but I'm not sure where to start, in this case. Preferably, it would follow these couple of rules: If a node has only one child, it can be skipped as redundant (an "end node", with no children, is always displayed) All nodes of the same depth must be aligned vertically; all nodes must be to the right of all less-deep nodes and to the left of all deeper nodes. Nodes have a string representation which includes their depth. Each "end node" has its own unique line; that is, the number of lines is the number of end nodes in the tree, and when an end node is on a line, there may be nothing else on that line after that end node. As a consequence of the last rule, the root node should be in either the top left or the bottom left corner; top left is preferred. For example, this is a valid tree, with six end nodes (node is represented by a name, and its depth): [a0]------------[b3]------[c5]------[d8] \ \ \----------[e9] \ \----[f5] \--[g1]--------[h4]------[i6] \ \--------------------[j10] \-[k3] Which represents the horizontal, explicit binary tree: 0 a / \ 1 g * / \ \ 2 * * * / \ \ 3 k * b / / \ 4 h * * / \ \ \ 5 * * f c / \ / \ 6 * i * * / / \ 7 * * * / / \ 8 * * d / / 9 * e / 10 j (branches folded for compactness; * representing redundant, one-child nodes; note that *'s are actual nodes, storing one child each, just with names omitted here for presentation sake) (also, to clarify, I'd like to generate the first, horizontal tree; not this vertical tree) I say language-agnostic because I'm just looking for an algorithm; I say ruby because I'm eventually going to have to implement it in ruby anyway. Assume that each Node data structure stores only its id, a left node, and a right node. A master Tree class keeps tracks of all nodes and has adequate algorithms to find: A node's nth ancestor A node's nth descendant The generation of a node The lowest common ancestor of two given nodes Anyone have any ideas of where I could start? Should I go for the recursive approach? Iterative?

    Read the article

  • Classifying captured data in unknown format?

    - by monch1962
    I've got a large set of captured data (potentially hundreds of thousands of records), and I need to be able to break it down so I can both classify it and also produce "typical" data myself. Let me explain further... If I have the following strings of data: 132T339G1P112S 164T897F5A498S 144T989B9B223T 155T928X9Z554T ... you might start to infer the following: possibly all strings are 14 characters long the 4th, 8th, 10th and 14th characters may always be alphas, while the rest are numeric the first character may always be a '1' the 4th character may always be the letter 'T' the 14th character may be limited to only being 'S' or 'T' and so on... As you get more and more samples of real data, some of these "rules" might disappear; if you see a 15 character long string, then you have evidence that the 1st "rule" is incorrect. However, given a sufficiently large sample of strings that are exactly 14 characters long, you can start to assume that "all strings are 14 characters long" and assign a numeric figure to your degree of confidence (with an appropriate set of assumptions around the fact that you're seeing a suitably random set of all possible captured data). As you can probably tell, a human can do a lot of this classification by eye, but I'm not aware of libraries or algorithms that would allow a computer to do it. Given a set of captured data (significantly more complex than the above...), are there libraries that I can apply in my code to do this sort of classification for me, that will identify "rules" with a given degree of confidence? As a next step, I need to be able to take those rules, and use them to create my own data that conforms to these rules. I assume this is a significantly easier step than the classification, but I've never had to perform a task like this before so I'm really not sure how complex it is. At a guess, Python or Java (or possibly Perl or R) are possibly the "common" languages most likely to have these sorts of libraries, and maybe some of the bioinformatic libraries do this sort of thing. I really don't care which language I have to use; I need to solve the problem in whatever way I can. Any sort of pointer to information would be very useful. As you can probably tell, I'm struggling to describe this problem clearly, and there may be a set of appropriate keywords I can plug into Google that will point me towards the solution. Thanks in advance

    Read the article

  • Why is it still so hard to write software?

    - by nornagon
    Writing software, I find, is composed of two parts: the Idea, and the Implementation. The Idea is about thinking: "I have this problem; how do I solve it?" and further, "how do I solve it elegantly?" The answers to these questions are obtainable by thinking about algorithms and architecture. The ideas come partially through analysis and partially through insight and intuition. The Idea is usually the easy part. You talk to your friends and co-workers and you nut it out in a meeting or over coffee. It takes an hour or two, plus revisions as you implement and find new problems. The Implementation phase of software development is so difficult that we joke about it. "Oh," we say, "the rest is a Simple Matter of Code." Because it should be simple, but it never is. We used to write our code on punch cards, and that was hard: mistakes were very difficult to spot, so we had to spend extra effort making sure every line was perfect. Then we had serial terminals: we could see all our code at once, search through it, organise it hierarchically and create things abstracted from raw machine code. First we had assemblers, one level up from machine code. Mnemonics freed us from remembering the machine code. Then we had compilers, which freed us from remembering the instructions. We had virtual machines, which let us step away from machine-specific details. And now we have advanced tools like Eclipse and Xcode that perform analysis on our code to help us write code faster and avoid common pitfalls. But writing code is still hard. Writing code is about understanding large, complex systems, and tools we have today simply don't go very far to help us with that. When I click "find all references" in Eclipse, I get a list of them at the bottom of the window. I click on one, and I'm torn away from what I was looking at, forced to context switch. Java architecture is usually several levels deep, so I have to switch and switch and switch until I find what I'm really looking for -- by which time I've forgotten where I came from. And I do that all day until I've understood a system. It's taxing mentally, and Eclipse doesn't do much that couldn't be done in 1985 with grep, except eat hundreds of megs of RAM. Writing code has barely changed since we were staring at amber on black. We have the theoretical groundwork for much more advanced tools, tools that actually work to help us comprehend and extend the complex systems we work with every day. So why is writing code still so hard?

    Read the article

  • Recursive N-way merge/diff algorithm for directory trees?

    - by BobMcGee
    What algorithms or Java libraries are available to do N-way, recursive diff/merge of directories? I need to be able to generate a list of folder trees that have many identical files, and have subdirectories with many similar files. I want to be able to use 2-way merge operations to quickly remove as much redundancy as possible. Goals: Find pairs of directories that have many similar files between them. Generate short list of directory pairs that can be synchronized with 2-way merge to eliminate duplicates Should operate recursively (there may be nested duplicates of higher-level directories) Run time and storage should be O(n log n) in numbers of directories and files Should be able to use an embedded DB or page to disk for processing more files than fit in memory (100,000+). Optional: generate an ancestry and change-set between folders Optional: sort the merge operations by how many duplicates they can elliminate I know how to use hashes to find duplicate files in roughly O(n) space, but I'm at a loss for how to go from this to finding partially overlapping sets between folders and their children. EDIT: some clarification The tricky part is the difference between "exact same" contents (otherwise hashing file hashes would work) and "similar" (which will not). Basically, I want to feed this algorithm at a set of directories and have it return a set of 2-way merge operations I can perform in order to reduce duplicates as much as possible with as few conflicts possible. It's effectively constructing an ancestry tree showing which folders are derived from each other. The end goal is to let me incorporate a bunch of different folders into one common tree. For example, I may have a folder holding programming projects, and then copy some of its contents to another computer to work on it. Then I might back up and intermediate version to flash drive. Except I may have 8 or 10 different versions, with slightly different organizational structures or folder names. I need to be able to merge them one step at a time, so I can chose how to incorporate changes at each step of the way. This is actually more or less what I intend to do with my utility (bring together a bunch of scattered backups from different points in time). I figure if I can do it right I may as well release it as a small open source util. I think the same tricks might be useful for comparing XML trees though.

    Read the article

  • Combinations and Permutations in F#

    - by Noldorin
    I've recently written the following combinations and permutations functions for an F# project, but I'm quite aware they're far from optimised. /// Rotates a list by one place forward. let rotate lst = List.tail lst @ [List.head lst] /// Gets all rotations of a list. let getRotations lst = let rec getAll lst i = if i = 0 then [] else lst :: (getAll (rotate lst) (i - 1)) getAll lst (List.length lst) /// Gets all permutations (without repetition) of specified length from a list. let rec getPerms n lst = match n, lst with | 0, _ -> seq [[]] | _, [] -> seq [] | k, _ -> lst |> getRotations |> Seq.collect (fun r -> Seq.map ((@) [List.head r]) (getPerms (k - 1) (List.tail r))) /// Gets all permutations (with repetition) of specified length from a list. let rec getPermsWithRep n lst = match n, lst with | 0, _ -> seq [[]] | _, [] -> seq [] | k, _ -> lst |> Seq.collect (fun x -> Seq.map ((@) [x]) (getPermsWithRep (k - 1) lst)) // equivalent: | k, _ -> lst |> getRotations |> Seq.collect (fun r -> List.map ((@) [List.head r]) (getPermsWithRep (k - 1) r)) /// Gets all combinations (without repetition) of specified length from a list. let rec getCombs n lst = match n, lst with | 0, _ -> seq [[]] | _, [] -> seq [] | k, (x :: xs) -> Seq.append (Seq.map ((@) [x]) (getCombs (k - 1) xs)) (getCombs k xs) /// Gets all combinations (with repetition) of specified length from a list. let rec getCombsWithRep n lst = match n, lst with | 0, _ -> seq [[]] | _, [] -> seq [] | k, (x :: xs) -> Seq.append (Seq.map ((@) [x]) (getCombsWithRep (k - 1) lst)) (getCombsWithRep k xs) Does anyone have any suggestions for how these functions (algorithms) can be sped up? I'm particularly interested in how the permutation (with and without repetition) ones can be improved. The business involving rotations of lists doesn't look too efficient to me in retrospect. Update Here's my new implementation for the getPerms function, inspired by Tomas's answer. Unfortunately, it's not really any fast than the existing one. Suggestions? let getPerms n lst = let rec getPermsImpl acc n lst = seq { match n, lst with | k, x :: xs -> if k > 0 then for r in getRotations lst do yield! getPermsImpl (List.head r :: acc) (k - 1) (List.tail r) if k >= 0 then yield! getPermsImpl acc k [] | 0, [] -> yield acc | _, [] -> () } getPermsImpl List.empty n lst

    Read the article

  • Checking if an int is prime more efficiently

    - by SipSop
    I recently was part of a small java programming competition at my school. My partner and I have just finished our first pure oop class and most of the questions were out of our league so we settled on this one (and I am paraphrasing somewhat): "given an input integer n return the next int that is prime and its reverse is also prime for example if n = 18 your program should print 31" because 31 and 13 are both prime. Your .class file would then have a test case of all the possible numbers from 1-2,000,000,000 passed to it and it had to return the correct answer within 10 seconds to be considered valid. We found a solution but with larger test cases it would take longer than 10 seconds. I am fairly certain there is a way to move the range of looping from n,..2,000,000,000 down as the likely hood of needing to loop that far when n is a low number is small, but either way we broke the loop when a number is prime under both conditions is found. At first we were looping from 2,..n no matter how large it was then i remembered the rule about only looping to the square root of n. Any suggestions on how to make my program more efficient? I have had no classes dealing with complexity analysis of algorithms. Here is our attempt. public class P3 { public static void main(String[] args){ long loop = 2000000000; long n = Integer.parseInt(args[0]); for(long i = n; i<loop; i++) { String s = i +""; String r = ""; for(int j = s.length()-1; j>=0; j--) r = r + s.charAt(j); if(prime(i) && prime(Long.parseLong(r))) { System.out.println(i); break; } } System.out.println("#"); } public static boolean prime(long p){ for(int i = 2; i<(int)Math.sqrt(p); i++) { if(p%i==0) return false; } return true; } } ps sorry if i did the formatting for code wrong this is my first time posting here. Also the output had to have a '#' after each line thats what the line after the loop is about Thanks for any help you guys offer!!!

    Read the article

  • Are there programs that iteratively write new programs?

    - by chris
    For about a year I have been thinking about writing a program that writes programs. This would primarily be a playful exercise that might teach me some new concepts. My inspiration came from negentropy and the ability for order to emerge from chaos and new chaos to arise out of order in infinite succession. To be more specific, the program would start by writing a short random string. If the string compiles the programs will log it for later comparison. If the string does not compile the program will try to rewrite it until it does compile. As more strings (mini 'useless' programs) are logged they can be parsed for similarities and used to generate a grammar. This grammar can then be drawn on to write more strings that have a higher probability of compilation than purely random strings. This is obviously more than a little silly, but I thought it would be fun to try and grow a program like this. And as a byproduct I get a bunch of unique programs that I can visualize and call art. I'll probably write this in Ruby due to its simple syntax and dynamic compilation and then I will visualize in processing using ruby-processing. What I would like to know is: Is there a name for this type of programming? What currently exists in this field? Who are the primary contributors? BONUS! - In what ways can I procedurally assign value to output programs beyond compiles(y/n)? I may want to extend the functionality of this program to generate a program based on parameters, but I want the program to define those parameters through running the programs that compile and assigning meaning to the programs output. This question is probably more involved than reasonable for a bonus, but if you can think of a simple way to get something like this done in less than 23 lines or one hyperlink, please toss it into your response. I know that this is not quite meta-programming and from the little I know of AI and generative algorithms they are usually more goal oriented than what I am thinking. What would be optimal is a program that continually rewrites and improves itself so I don't have to ^_^

    Read the article

  • Using boost::iterator

    - by Neil G
    I wrote a sparse vector class (see #1, #2.) I would like to provide two kinds of iterators: The first set, the regular iterators, can point any element, whether set or unset. If they are read from, they return either the set value or value_type(), if they are written to, they create the element and return the lvalue reference. Thus, they are: Random Access Traversal Iterator and Readable and Writable Iterator The second set, the sparse iterators, iterate over only the set elements. Since they don't need to lazily create elements that are written to, they are: Random Access Traversal Iterator and Readable and Writable and Lvalue Iterator I also need const versions of both, which are not writable. I can fill in the blanks, but not sure how to use boost::iterator_adaptor to start out. Here's what I have so far: template<typename T> class sparse_vector { public: typedef size_t size_type; typedef T value_type; private: typedef T& true_reference; typedef const T* const_pointer; typedef sparse_vector<T> self_type; struct ElementType { ElementType(size_type i, T const& t): index(i), value(t) {} ElementType(size_type i, T&& t): index(i), value(t) {} ElementType(size_type i): index(i) {} ElementType(ElementType const&) = default; size_type index; value_type value; }; typedef vector<ElementType> array_type; public: typedef T* pointer; typedef T& reference; typedef const T& const_reference; private: size_type size_; mutable typename array_type::size_type sorted_filled_; mutable array_type data_; // lots of code for various algorithms... public: class sparse_iterator : public boost::iterator_adaptor< sparse_iterator // Derived , array_type::iterator // Base (the internal array) (this paramater does not compile! -- says expected a type, got 'std::vector::iterator'???) , boost::use_default // Value , boost::random_access_traversal_tag? // CategoryOrTraversal > class iterator_proxy { ??? }; class iterator : public boost::iterator_facade< iterator // Derived , ????? // Base , ????? // Value , boost::?????? // CategoryOrTraversal > { }; };

    Read the article

  • How to avoid repetition when working with primitive types?

    - by I82Much
    I have the need to perform algorithms on various primitive types; the algorithm is essentially the same with the exception of which type the variables are. So for instance, /** * Determine if <code>value</code> is the bitwise OR of elements of <code>validValues</code> array. * For instance, our valid choices are 0001, 0010, and 1000. * We are given a value of 1001. This is valid because it can be made from * ORing together 0001 and 1000. * On the other hand, if we are given a value of 1111, this is invalid because * you cannot turn on the second bit from left by ORing together those 3 * valid values. */ public static boolean isValid(long value, long[] validValues) { for (long validOption : validValues) { value &= ~validOption; } return value != 0; } public static boolean isValid(int value, int[] validValues) { for (int validOption : validValues) { value &= ~validOption; } return value != 0; } How can I avoid this repetition? I know there's no way to genericize primitive arrays, so my hands seem tied. I have instances of primitive arrays and not boxed arrays of say Number objects, so I do not want to go that route either. I know there are a lot of questions about primitives with respect to arrays, autoboxing, etc., but I haven't seen it formulated in quite this way, and I haven't seen a decisive answer on how to interact with these arrays. I suppose I could do something like: public static<E extends Number> boolean isValid(E value, List<E> numbers) { long theValue = value.longValue(); for (Number validOption : numbers) { theValue &= ~validOption.longValue(); } return theValue != 0; } and then public static boolean isValid(long value, long[] validValues) { return isValid(value, Arrays.asList(ArrayUtils.toObject(validValues))); } public static boolean isValid(int value, int[] validValues) { return isValid(value, Arrays.asList(ArrayUtils.toObject(validValues))); } Is that really much better though? Any thoughts in this matter would be appreciated.

    Read the article

  • Looking for an Open Source Project in need of help

    - by hvidgaard
    Hi StackOverflow! I'm a CS student on well on my way to graduate. I have had a difficult time of finding relevant student jobs (they seems to be taken merely hours after the notice gets on the board) , so instead I'm looking for an open source project in need of help. I'm aware that I should choose one that I use, but I'm not aware of any OS-project that I use that needs help. That's why I'm asking you. I don't have any deep experience, but I here are some of my biggest projects so far: BitTorrent-ish client in Python (a subset of BitTorrent) HTTP 1.1 webserver in Java Compiler from a subset of Java to run on JRE Flash-framework project to model an iPad look and feel (not to run actual iPad programs) complete with an API for programs. Complete MySQL database for a booking system, with departure and arrival times, so you could only book valid tickets (with a Java frontend). I know, Java and languages like AS3 and C# feels natural per se, Python, and have done a fair bit of hacking around in C, but I don't feel very comfortable with it. Mostly I'm afraid to make a fuckup because I have such a high degree of control. I would like to think I'm well aware of good software design practices, but in reality what I do is ask myself "would I like to use/maintain this?", and I love to refactor my code because I see optimizations. I love algorithms and to make them run in the best possible time. I don't have any preferred domain to work in, but I wouldn't mind it to be graphics or math heavy. Ideally I'm looking for a project in C++ to learn the in's and out's of it, but I'm well aware that I don't know that language very well. I would like to have a mentor-like figure until I'm confident enough to stand on my own, not one to review all my code (I'm sure someone will to start with anyway), but to ask questions about the project and language in question. I do have a wife and two children, so don't expect me to put in 10+ hours every week. In return I can work on my own, I strive to program modular and maintainable code. Know how to read an API, use Google, StackOverflow and online resources in general. If you have any questions, shoot. I'm looking forward to your suggestions.

    Read the article

  • Which programming language to choose? (for a specific problem/domain, details inside)

    - by Bijan
    I am building a trading portfolio management system that is responsible for production, optimization, and simulation of non-high frequency trading portfolios (dealing with 1min or 3min bars of data, not tick data). I plan on employing Amazon web services to take on the entire load of the application. I have four choices that I am considering as language. a) Java b) C++ c) C# d) Python Here is the scope of the extremes of the project scope. This isn't how it will be, maybe ever, but it's within the scope of the requirements: Weekly simulation of 10,000,000 trading systems. (Each trading system is expected to have its own data mining methods, including feature selection algorithms which are extremely computationally-expensive. Imagine 500-5000 features using wrappers. These are not run often by any means, but it's still a consideration) Real-time production of portfolio w/ 100,000 trading strategies Taking in 1 min or 3 min data from every stock/futures market around the globe (approx 100,000) Portfolio optimization of portfolios with up to 100,000 strategies. (rather intensive algorithm) Speed is a concern, but I believe that Java can handle the load. I just want to make sure that Java CAN handle the above requirements comfortably. I don't want to do the project in C++, but I will if it's required. The reason C# is on there is because I thought it was a good alternative to Java, even though I don't like Windows at all and would prefer Java if all things are the same. Python - I've read somethings on PyPy and pyscho that claim python can be optimized with JIT compiling to run at near C-like speeds.... That's pretty much the only reason it is on this list, besides that fact that Python is a great language and would probably be the most enjoyable language to code in, which is not a factor at all for this project, but a perk. To sum up: - real time production - weekly simulations of a large number of systems - weekly/monthly optimizations of portfolios - large numbers of connections to collect data from There is no dealing with millisecond or even second based trades. The only consideration is if Java can possibly deal with this kind of load when spread out of a necessary amount of EC2 servers. Thank you guys so much for your wisdom.

    Read the article

  • Finding N contiguous zero bits in an integer to the left of the MSB position of another integer

    - by James Morris
    The problem is: given an integer val1 find the position of the highest bit set (Most Significant Bit) then, given a second integer val2 find a contiguous region of unset bits, with the minimum number of zero bits given by width to the left of the position (ie, in the higher bits). Here is the C code for my solution: typedef unsigned int t; unsigned const t_bits = sizeof(t) * CHAR_BIT; _Bool test_fit_within_left_of_msb( unsigned width, t val1, t val2, unsigned* offset_result) { unsigned offbit = 0; unsigned msb = 0; t mask; t b; while(val1 >>= 1) ++msb; while(offbit + width < t_bits - msb) { mask = (((t)1 << width) - 1) << (t_bits - width - offbit); b = val2 & mask; if (!b) { *offset_result = offbit; return true; } if (offbit++) /* this conditional bothers me! */ b <<= offbit - 1; while(b <<= 1) offbit++; } return false; } Aside from faster ways of finding the MSB of the first integer, the commented test for a zero offbit seems a bit extraneous, but necessary to skip the highest bit of type t if it is set. I have also implemented similar algorithms but working to the right of the MSB of the first number, so they don't require this seemingly extra condition. How can I get rid of this extra condition, or even, are there far more optimal solutions? Edit: Some background not strictly required. The offset result is a count of bits from the high bit, not from the low bit as maybe expected. This will be part of a wider algorithm which scans a 2D array for a 2D area of zero bits. Here, for testing, the algorithm has been simplified. val1 represents the first integer which does not have all bits set found in a row of the 2D array. From this the 2D version would scan down which is what val2 represents. Here's some output showing success and failure: t_bits:32 t_high: 10000000000000000000000000000000 ( 2147483648 ) --------- ----------------------------------- *** fit within left of msb test *** ----------------------------------- val1: 00000000000000000000000010000000 ( 128 ) val2: 01000001000100000000100100001001 ( 1091569929 ) msb: 7 offbit:0 + width: 8 = 8 mask: 11111111000000000000000000000000 ( 4278190080 ) b: 01000001000000000000000000000000 ( 1090519040 ) offbit:8 + width: 8 = 16 mask: 00000000111111110000000000000000 ( 16711680 ) b: 00000000000100000000000000000000 ( 1048576 ) offbit:12 + width: 8 = 20 mask: 00000000000011111111000000000000 ( 1044480 ) b: 00000000000000000000000000000000 ( 0 ) offbit:12 iters:10 ***** found room for width:8 at offset: 12 ***** ----------------------------------- *** fit within left of msb test *** ----------------------------------- val1: 00000000000000000000000001000000 ( 64 ) val2: 00010000000000001000010001000001 ( 268469313 ) msb: 6 offbit:0 + width: 13 = 13 mask: 11111111111110000000000000000000 ( 4294443008 ) b: 00010000000000000000000000000000 ( 268435456 ) offbit:4 + width: 13 = 17 mask: 00001111111111111000000000000000 ( 268402688 ) b: 00000000000000001000000000000000 ( 32768 ) ***** mask: 00001111111111111000000000000000 ( 268402688 ) offbit:17 iters:15 ***** no room found for width:13 ***** (iters is the count of iterations of the inner while loop)

    Read the article

  • How to check for palindrome using Python logic

    - by DrOnline
    My background is only a 6 month college class in basic C/C++, and I'm trying to convert to Python. I may be talking nonsense, but it seems to me C, at least at my level, is very for-loop intensive. I solve most problems with these loops. And it seems to me the biggest mistake people do when going from C to Python is trying to implement C logic using Python, which makes things run slowly, and it's just not making the most of the language. I see on this website: http://hyperpolyglot.org/scripting (serach for "c-style for", that Python doesn't have C-style for loops. Might be outdated, but I interpret it to mean Python has its own methods for this. I've tried looking around, I can't find much up to date (Python 3) advice for this. How can I solve a palindrome challenge in Python, without using the for loop? I've done this in C in class, but I want to do it in Python, on a personal basis. The problem is from the Euler Project, great site btw. def isPalindrome(n): lst = [int(n) for n in str(n)] l=len(lst) if l==0 || l==1: return True elif len(lst)%2==0: for k in range (l) ##### else: while (k<=((l-1)/2)): if (list[]): ##### for i in range (999, 100, -1): for j in range (999,100, -1): if isPalindrome(i*j): print(i*j) break I'm missing a lot of code here. The five hashes are just reminders for myself. Concrete questions: 1) In C, I would make a for loop comparing index 0 to index max, and then index 0+1 with max-1, until something something. How to best do this in Python? 2) My for loop (in in range (999, 100, -1), is this a bad way to do it in Python? 3) Does anybody have any good advice, or good websites or resources for people in my position? I'm not a programmer, I don't aspire to be one, I just want to learn enough so that when I write my bachelor's degree thesis (electrical engineering), I don't have to simultaneously LEARN an applicable programming language while trying to obtain good results in the project. "How to go from basic C to great application of Python", that sort of thing. 4) Any specific bits of code to make a great solution to this problem would also be appreciated, I need to learn good algorithms.. I am envisioning 3 situations. If the value is zero or single digit, if it is of odd length, and if it is of even length. I was planning to write for loops... PS: The problem is: Find the highest value product of two 3 digit integers that is also a palindrome.

    Read the article

  • Android library to get pitch from WAV file

    - by Sakura
    I have a list of sampled data from the WAV file. I would like to pass in these values into a library and get the frequency of the music played in the WAV file. For now, I will have 1 frequency in the WAV file and I would like to find a library that is compatible with Android. I understand that I need to use FFT to get the frequency domain. Is there any good libraries for that? I found that [KissFFT][1] is quite popular but I am not very sure how compatible it is on Android. Is there an easier and good library that can perform the task I want? EDIT: I tried to use JTransforms to get the FFT of the WAV file but always failed at getting the correct frequency of the file. Currently, the WAV file contains sine curve of 440Hz, music note A4. However, I got the result as 441. Then I tried to get the frequency of G4, I got the result as 882Hz which is incorrect. The frequency of G4 is supposed to be 783Hz. Could it be due to not enough samples? If yes, how much samples should I take? //DFT DoubleFFT_1D fft = new DoubleFFT_1D(numOfFrames); double max_fftval = -1; int max_i = -1; double[] fftData = new double[numOfFrames * 2]; for (int i = 0; i < numOfFrames; i++) { // copying audio data to the fft data buffer, imaginary part is 0 fftData[2 * i] = buffer[i]; fftData[2 * i + 1] = 0; } fft.complexForward(fftData); for (int i = 0; i < fftData.length; i += 2) { // complex numbers -> vectors, so we compute the length of the vector, which is sqrt(realpart^2+imaginarypart^2) double vlen = Math.sqrt((fftData[i] * fftData[i]) + (fftData[i + 1] * fftData[i + 1])); //fd.append(Double.toString(vlen)); // fd.append(","); if (max_fftval < vlen) { // if this length is bigger than our stored biggest length max_fftval = vlen; max_i = i; } } //double dominantFreq = ((double)max_i / fftData.length) * sampleRate; double dominantFreq = (max_i/2.0) * sampleRate / numOfFrames; fd.append(Double.toString(dominantFreq)); Can someone help me out? EDIT2: I manage to fix the problem mentioned above by increasing the number of samples to 100000, however, sometimes I am getting the overtones as the frequency. Any idea how to fix it? Should I use Harmonic Product Frequency or Autocorrelation algorithms?

    Read the article

  • Finding the most frequent subtrees in a collection of (parse) trees

    - by peter.murray.rust
    I have a collection of trees whose nodes are labelled (but not uniquely). Specifically the trees are from a collection of parsed sentences (see http://en.wikipedia.org/wiki/Treebank). I wish to extract the most common subtrees from the collection - performance is not (yet) an issue. I'd be grateful for algorithms (ideally Java) or pointers to tools which do this for treebanks. Note that order of child nodes is important. EDIT @mjv. We are working in a limited domain (chemistry) which has a stylised language so the varirty of the trees is not huge - probably similar to children's readers. Simple tree for "the cat sat on the mat". <sentence> <nounPhrase> <article/> <noun/> </nounPhrase> <verbPhrase> <verb/> <prepositionPhrase> <preposition/> <nounPhrase> <article/> <noun/> </nounPhrase> </prepositionPhrase> </verbPhrase> </sentence> Here the sentence contains two identical part-of-speech subtrees (the actual tokens "cat". "mat" are not important in matching). So the algorithm would need to detect this. Note that not all nounPhrases are identical - "the big black cat" could be: <nounPhrase> <article/> <adjective/> <adjective/> <noun/> </nounPhrase> The length of sentences will be longer - between 15 to 30 nodes. I would expect to get useful results from 1000 trees. If this does not take more than a day or so that's acceptable. Obviously the shorter the tree the more frequent, so nounPhrase will be very common. EDIT If this is to be solved by flattening the tree then I think it would be related to Longest Common Substring, not Longest Common Sequence. But note that I don't necessarily just want the longest - I want a list of all those long enough to be "interesting" (criterion yet to be decided).

    Read the article

  • Primary language - C++/Qt, C#, Java?

    - by Airjoe
    I'm looking for some input, but let me start with a bit of background (for tl;dr skip to end). I'm an IT major with a concentration in networking. While I'm not a CS major nor do I want to program as a vocation, I do consider myself a programmer and do pretty well with the concepts involved. I've been programming since about 6th grade, started out with a proprietary game creation language that made my transition into C++ at college pretty easy. I like to make programs for myself and friends, and have been paid to program for local businesses. A bit about that- I wrote some programs for a couple local businesses in my senior year in high school. I wrote management systems for local shops (inventory, phone/pos orders, timeclock, customer info, and more stuff I can't remember). It definitely turned out to be over my head, as I had never had any formal programming education. It was a great learning experience, but damn was it crappy code. Oh yeah, by the way, it was all vb6. So, I've used vb6 pretty extensively, I've used c++ in my classes (intro to programming up to algorithms), used Java a little bit in another class (had to write a ping client program, pretty easy) and used Java for some simple Project Euler problems to help learn syntax and such when writing the program for the class. I've also used C# a bit for my own simple personal projects (simple programs, one which would just generate an HTTP request on a list of websites and notify if one responded unexpectedly or not at all, and another which just held a list of things to do and periodically reminded me to do them), things I would've written in vb6 a year or two ago. I've just started using Qt C++ for some undergrad research I'm working on. Now I've had some formal education, I [think I] understand organization in programming a lot better (I didn't even use classes in my vb6 programs where I really should have), how it's important to structure code, split into functions where appropriate, document properly, efficiency both in memory and speed, dynamic and modular programming etc. I was looking for some input on which language to pick up as my "primary". As I'm not a "real programmer", it will be mostly hobby projects, but will include some 'real' projects I'm sure. From my perspective: QtC++ and Java are cross platform, which is cool. Java and C# run in a virtual machine, but I'm not sure if that's a big deal (something extra to distribute, possibly a bit slower? I think Qt would require additional distributables too, right?). I don't really know too much more than this, so I appreciate any help, thanks! TL;DR Am an avocational programmer looking for a language, want quick and straight forward development, liked vb6, will be working with database driven GUI apps- should I go with QtC++, Java, C#, or perhaps something else?

    Read the article

  • Initialize a Variable Again.

    - by SoulBeaver
    That may sound a little confusing. Basically, I have a function CCard newCard() { /* Used to store the string variables intermittantly */ std::stringstream ssPIN, ssBN; int picker1, picker2; int pin, bankNum; /* Choose 5 random variables, store them in stream */ for( int loop = 0; loop < 5; ++loop ) { picker1 = rand() % 8 + 1; picker2 = rand() % 8 + 1; ssPIN << picker1; ssBN << picker2; } /* Convert them */ ssPIN >> pin; ssBN >> bankNum; CCard card( pin, bankNum ); return card; } that creates a new CCard variable and returns it to the caller CCard card = newCard(); My teacher advised me that doing this is a violation of OOP principles and has to be put in the class. He told me to use this method as a constructor. Which I did: CCard::CCard() { m_Sperre = false; m_Guthaben = rand() % 1000; /* Work */ /* Convert them */ ssPIN >> m_Geheimzahl; ssBN >> m_Nummer; } All variables with m_ are member variables. However, the constructor works when I initialize the card normally CCard card(); at the start of the program. However, I also have a function, that is supposed to create a new card and return it to the user, this function is now broken. The original command: card = newCard(); isn't available anymore, and card = new CCard(); doesn't work. What other options do I have? I have a feeling using the constructor won't work, and that I probably should just create a class method newCard, but I want to see if it is somehow at all possible to do it the way the teacher wanted. This is creating a lot of headaches for me. I told the teacher that this is a stupid idea and not everything has to be classed in OOP. He has since told me that Java or C# don't allow code outside of classes, which sounds a little incredible. Not sure that you can do this in C++, especially when templated functions exist, or generic algorithms. Is it true that this would be bad code for OOP in C++ if I didn't force it into a class?

    Read the article

  • Blending pixels from Two Bitmaps

    - by MarkPowell
    I'm beating my head against a wall here, and I'm fairly certain I'm doing something stupid, so time to make my stupidity public. I'm trying to take two images, blend them together into a third image using standard blending algorithms (Hardlight, softlight, overlay, multiply, etc). Because Android does not have such blend properties build in, I've gone down the path of taking each pixel and combine them using an algorithm. However, the results are garbage. Any help would be appreciated. Below is the code, which I've tried to strip out all the "junk", but some may have made it through. I'll clean it up if something isn't clear. Bitmap src = BitmapFactory.decodeResource(getResources(), R.drawable.base, options); Bitmap mutableBitmap = src.copy(Bitmap.Config.RGB_565, true); int imageId = getResources().getIdentifier("drawable/" + filter, null, getPackageName()); Bitmap filterBitmap = BitmapFactory.decodeResource(getResources(), imageId, options); float scaleWidth = ((float) mutableBitmap.getWidth()) / filterBitmap.getWidth(); float scaleHeight = ((float) mutableBitmap.getHeight()) / filterBitmap.getHeight(); IntBuffer buffSrc = IntBuffer.allocate(src.getWidth() * src.getHeight()); mutableBitmap.copyPixelsToBuffer(buffSrc); buffSrc.rewind(); IntBuffer buffFilter = IntBuffer.allocate(resizedFilterBitmap.getWidth() * resizedFilterBitmap.getHeight()); resizedFilterBitmap.copyPixelsToBuffer(buffFilter); buffFilter.rewind(); IntBuffer buffOut = IntBuffer.allocate(src.getWidth() * src.getHeight()); buffOut.rewind(); while (buffOut.position() < buffOut.limit()) { int filterInt = buffFilter.get(); int srcInt = buffSrc.get(); int alphaValueFilter = Color.alpha(filterInt); int redValueFilter = Color.red(filterInt); int greenValueFilter = Color.green(filterInt); int blueValueFilter = Color.blue(filterInt); int alphaValueSrc = Color.alpha(srcInt); int redValueSrc = Color.red(srcInt); int greenValueSrc = Color.green(srcInt); int blueValueSrc = Color.blue(srcInt); int alphaValueFinal = convert(alphaValueFilter, alphaValueSrc); int redValueFinal = convert(redValueFilter, redValueSrc); int greenValueFinal = convert(greenValueFilter, greenValueSrc); int blueValueFinal = convert(blueValueFilter, blueValueSrc); int pixel = Color.argb(alphaValueFinal, redValueFinal, greenValueFinal, blueValueFinal); buffOut.put(pixel); } buffOut.rewind(); mutableBitmap.copyPixelsFromBuffer(buffOut); BitmapDrawable drawable = new BitmapDrawable(getResources(), mutableBitmap); imageView.setImageDrawable(drawable); } int convert (int in1, int in2) { //simple multiply for example return in1 * in2 / 255; }

    Read the article

  • Java: Combination of recursive loops which has different FOR loop inside; Output: FOR loops indexes

    - by vvinjj
    currently recursion is fresh & difficult topic for me, however I need to use it in one of my algorithms. Here is the challenge: I need a method where I specify number of recursions (number of nested FOR loops) and number of iterations for each FOR loop. The result should show me, something simmilar to counter, however each column of counter is limited to specific number. ArrayList<Integer> specs= new ArrayList<Integer>(); specs.add(5); //for(int i=0 to 5; i++) specs.add(7); specs.add(9); specs.add(2); specs.add(8); specs.add(9); public void recursion(ArrayList<Integer> specs){ //number of nested loops will be equal to: specs.size(); //each item in specs, specifies the For loop max count e.g: //First outside loop will be: for(int i=0; i< specs.get(0); i++) //Second loop inside will be: for(int i=0; i< specs.get(1); i++) //... } The the results will be similar to outputs of this manual, nested loop: int[] i; i = new int[7]; for( i[6]=0; i[6]<5; i[6]++){ for( i[5]=0; i[5]<7; i[5]++){ for(i[4] =0; i[4]<9; i[4]++){ for(i[3] =0; i[3]<2; i[3]++){ for(i[2] =0; i[2]<8; i[2]++){ for(i[1] =0; i[1]<9; i[1]++){ //... System.out.println(i[1]+" "+i[2]+" "+i[3]+" "+i[4]+" "+i[5]+" "+i[6]); } } } } } } I already, killed 3 days on this, and still no results, was searching it in internet, however the examples are too different. Therefore, posting the programming question in internet first time in my life. Thank you in advance, you are free to change the code efficiency, I just need the same results.

    Read the article

  • Primary language - QtC++, C#, Java?

    - by Airjoe
    I'm looking for some input, but let me start with a bit of background (for tl;dr skip to end). I'm an IT major with a concentration in networking. While I'm not a CS major nor do I want to program as a vocation, I do consider myself a programmer and do pretty well with the concepts involved. I've been programming since about 6th grade, started out with a proprietary game creation language that made my transition into C++ at college pretty easy. I like to make programs for myself and friends, and have been paid to program for local businesses. A bit about that- I wrote some programs for a couple local businesses in my senior year in high school. I wrote management systems for local shops (inventory, phone/pos orders, timeclock, customer info, and more stuff I can't remember). It definitely turned out to be over my head, as I had never had any formal programming education. It was a great learning experience, but damn was it crappy code. Oh yeah, by the way, it was all vb6. So, I've used vb6 pretty extensively, I've used c++ in my classes (intro to programming up to algorithms), used Java a little bit in another class (had to write a ping client program, pretty easy) and used Java for some simple Project Euler problems to help learn syntax and such when writing the program for the class. I've also used C# a bit for my own simple personal projects (simple programs, one which would just generate an HTTP request on a list of websites and notify if one responded unexpectedly or not at all, and another which just held a list of things to do and periodically reminded me to do them), things I would've written in vb6 a year or two ago. I've just started using Qt C++ for some undergrad research I'm working on. Now I've had some formal education, I [think I] understand organization in programming a lot better (I didn't even use classes in my vb6 programs where I really should have), how it's important to structure code, split into functions where appropriate, document properly, efficiency both in memory and speed, dynamic and modular programming etc. I was looking for some input on which language to pick up as my "primary". As I'm not a "real programmer", it will be mostly hobby projects, but will include some 'real' projects I'm sure. From my perspective: QtC++ and Java are cross platform, which is cool. Java and C# run in a virtual machine, but I'm not sure if that's a big deal (something extra to distribute, possibly a bit slower? I think Qt would require additional distributables too, right?). I don't really know too much more than this, so I appreciate any help, thanks! TL;DR Am an avocational programmer looking for a language, want quick and straight forward development, liked vb6, will be working with database driven GUI apps- should I go with QtC++, Java, C#, or perhaps something else?

    Read the article

  • Approaches for Content-based Item Recommendations

    - by PartlyCloudy
    Hello, I'm currently developing an application where I want to group similar items. Items (like videos) can be created by users and also their attributes can be altered or extended later (like new tags). Instead of relying on users' preferences as most collaborative filtering mechanisms do, I want to compare item similarity based on the items' attributes (like similar length, similar colors, similar set of tags, etc.). The computation is necessary for two main purposes: Suggesting x similar items for a given item and for clustering into groups of similar items. My application so far is follows an asynchronous design and I want to decouple this clustering component as far as possible. The creation of new items or the addition of new attributes for an existing item will be advertised by publishing events the component can then consume. Computations can be provided best-effort and "snapshotted", which means that I'm okay with the best result possible at a given point in time, although result quality will eventually increase. So I am now searching for appropriate algorithms to compute both similar items and clusters. At important constraint is scalability. Initially the application has to handle a few thousand items, but later million items might be possible as well. Of course, computations will then be executed on additional nodes, but the algorithm itself should scale. It would also be nice if the algorithm supports some kind of incremental mode on partial changes of the data. My initial thought of comparing each item with each other and storing the numerical similarity sounds a little bit crude. Also, it requires n*(n-1)/2 entries for storing all similarities and any change or new item will eventually cause n similarity computations. Thanks in advance! UPDATE tl;dr To clarify what I want, here is my targeted scenario: User generate entries (think of documents) User edit entry meta data (think of tags) And here is what my system should provide: List of similar entries to a given item as recommendation Clusters of similar entries Both calculations should be based on: The meta data/attributes of entries (i.e. usage of similar tags) Thus, the distance of two entries using appropriate metrics NOT based on user votings, preferences or actions (unlike collaborative filtering). Although users may create entries and change attributes, the computation should only take into account the items and their attributes, and not the users associated with (just like a system where only items and no users exist). Ideally, the algorithm should support: permanent changes of attributes of an entry incrementally compute similar entries/clusters on changes scale something better than a simple distance table, if possible (because of the O(n²) space complexity)

    Read the article

< Previous Page | 69 70 71 72 73 74 75 76 77 78  | Next Page >