Search Results

Search found 5070 results on 203 pages for 'algorithm'.

Page 81/203 | < Previous Page | 77 78 79 80 81 82 83 84 85 86 87 88  | Next Page >

  • Naive Bayesian classification (spam filtering) - Doubt in one calculation? Which one is right? Plz c

    - by Microkernel
    Hi guys, I am implementing Naive Bayesian classifier for spam filtering. I have doubt on some calculation. Please clarify me what to do. Here is my question. In this method, you have to calculate P(S|W) - Probability that Message is spam given word W occurs in it. P(W|S) - Probability that word W occurs in a spam message. P(W|H) - Probability that word W occurs in a Ham message. So to calculate P(W|S), should I do (1) (Number of times W occuring in spam)/(total number of times W occurs in all the messages) OR (2) (Number of times word W occurs in Spam)/(Total number of words in the spam message) So, to calculate P(W|S), should I do (1) or (2)? (I thought it to be (2), but I am not sure, so plz clarify me) I am refering http://en.wikipedia.org/wiki/Bayesian_spam_filtering for the info by the way. I got to complete the implementation by this weekend :( Thanks and regards, MicroKernel :) @sth: Hmm... Shouldn't repeated occurrence of word 'W' increase a message's spam score? In the your approach it wouldn't, right?. Lets take a scenario and discuss... Lets say, we have 100 training messages, out of which 50 are spam and 50 are Ham. and say word_count of each message = 100. And lets say, in spam messages word W occurs 5 times in each message and word W occurs 1 time in Ham message. So total number of times W occuring in all the spam message = 5*50 = 250 times. And total number of times W occuring in all Ham messages = 1*50 = 50 times. Total occurance of W in all of the training messages = (250+50) = 300 times. So, in this scenario, how do u calculate P(W|S) and P(W|H) ? Naturally we should expect, P(W|S) P(W|H)??? right. Please share your thought...

    Read the article

  • Travelling Salesman Problem Constraint Representation

    - by alex25
    Hey! I read a couple of articles and sample code about how to solve TSP with Genetic Algorithms and Ant Colony Optimization etc. But everything I found didn't include time (window) constraints, eg. "I have to be at customer x before 12am)" and assumed symmetry. Can somebody point me into the direction of some sample code or articles that explain how I can add constraints to TSP and how I can represent those in code. Thanks!

    Read the article

  • Why is my producer-consumer blocking?

    - by User007
    My code is here: http://pastebin.com/Fi3h0E0P Here is the output 0 Should we take order today (y or n): y Enter order number: 100 More customers (y or n): n Stop serving customers right now. Passing orders to cooker: There are total of 1 order(s) 1 Roger, waiter. I am processing order #100 The goal is waiter must take orders and then give them to the cook. The waiter has to wait cook finishes all pizza, deliver the pizza, and then take new orders. I asked how P-V work in my previous post here. I don't think it has anything to do with \n consuming? I tried all kinds of combination of wait(), but none work. Where did I make a mistake? The main part is here: //Producer process if(pid > 0) { while(1) { printf("0"); P(emptyShelf); // waiter as P finds no items on shelf; P(mutex); // has permission to use the shelf waiter_as_producer(); V(mutex); // cooker now can use the shelf V(orderOnShelf); // cooker now can pickup orders wait(); printf("2"); P(pizzaOnShelf); P(mutex); waiter_as_consumer(); V(mutex); V(emptyShelf); printf("3 "); } } if(pid == 0) { while(1) { printf("1"); P(orderOnShelf); // make sure there is an order on shelf P(mutex); //permission to work cooker_as_consumer(); // take order and put pizza on shelf printf("return from cooker"); V(mutex); //release permission printf("just released perm"); V(pizzaOnShelf); // pizza is now on shelf printf("after"); wait(); printf("4"); } } So I imagine this is the execution path: enter waiter_as_producer, then go to child process (cooker), then transfer the control back to parent, finish waiter_as_consumer, switch back to child. The two waits switch back to parent (like I said I tried all possible wait() combination...).

    Read the article

  • how to develop a program to minimize errors in human transcription of hand written surveys

    - by Alex. S.
    I need to develop custom software to do surveys. Questions may be of multiple choice, or free text in a very few cases. I was asked to design a subsystem to check if there is any error in the manual data entry for the multiple choices part. We're trying to speed up the user data entry process and to minimize human input differences between digital forms and the original questionnaires. The surveys are filled with handwritten marks and text by human interviewers, so it's possible to find hard to read marks, or also the user could accidentally select a different value in some question, and we would like to avoid that. The software must include some automatic control to detect possible typing differences. Each answer of the multiple choice questions has the same probability of being selected. This question has two parts: The GUI. The most simple thing I have in mind is to implement the most usable design of the questions display: use of large and readable fonts and space generously the choices. Is there something else? For faster input, I would like to use drop down lists (favoring keyboard over mouse). Given the questions are grouped in sections, I would like to show the answers selected for the questions of that section, but this could slow down the process. Any other ideas? The error checking subsystem. What else can I do to minimize or to check human typos in the multiple choice questions? Is this a solvable problem? is there some statistical methodology to check values that were entered by the users are the same from the hand filled forms? For example, let's suppose the survey has 5 questions, and each has 4 options. Let's say I have n survey forms filled in paper by interviewers, and they're ready to be entered in the software, then how to minimize the accidental differences that can have the manual transcription of the n surveys, without having to double check everything in the 5 questions of the n surveys? My first suggestion is that at the end of the processing of all the hand filled forms, the software could choose some forms randomly to make a double check of the responses in a few instances, but on what criteria can I make this selection? This validation would be enough to cover everything in a significant way? The actual survey is nation level and it has 56 pages with over 200 questions in total, so it will be a lot of hand written pages by many people, and the intention is to reduce the likelihood of errors and to optimize speed in the data entry process. The surveys must filled in paper first, given the complications of taking laptops or handhelds with the interviewers.

    Read the article

  • Non-Linear color interpolation?

    - by user146780
    If I have a straight line that mesures from 0 to 1, then I have colorA(255,0,0) at 0 on the line, then at 0.3 I have colorB(20,160,0) then at 1 on the line I have colorC(0,0,0). How could I find the color at 0.7? Thanks

    Read the article

  • Sync Algorithms

    - by Kristopher Johnson
    Are there any good references out there for sync algorithms? I'm interested in algorithms that synchronize the following kinds of data between multiple users: calendars documents lists and outlines I'm not just looking for synchronization of contents of directories a la rsync; I am interested in merging the data within individual files.

    Read the article

  • How can I group an array of rectangles into "Islands" of connected regions?

    - by Eric
    The problem I have an array of java.awt.Rectangles. For those who are not familiar with this class, the important piece of information is that they provide an .intersects(Rectangle b) function. I would like to write a function that takes this array of Rectangles, and breaks it up into groups of connected rectangles. Lets say for example, that these are my rectangles (constructor takes the arguments x, y, width,height): Rectangle[] rects = new Rectangle[] { new Rectangle(0, 0, 4, 2), //A new Rectangle(1, 1, 2, 4), //B new Rectangle(0, 4, 8, 2), //C new Rectangle(6, 0, 2, 2) //D } A quick drawing shows that A intersects B and B intersects C. D intersects nothing. A tediously drawn piece of ascii art does the job too: +-------+ +---+ ¦A+---+ ¦ ¦ D ¦ +-+---+-+ +---+ ¦ B ¦ +-+---+---------+ ¦ +---+ C ¦ +---------------+ Therefore, the output of my function should be: new Rectangle[][]{ new Rectangle[] {A,B,C}, new Rectangle[] {D} } The failed code This was my attempt at solving the problem: public List<Rectangle> getIntersections(ArrayList<Rectangle> list, Rectangle r) { List<Rectangle> intersections = new ArrayList<Rectangle>(); for(Rectangle rect : list) { if(r.intersects(rect)) { list.remove(rect); intersections.add(rect); intersections.addAll(getIntersections(list, rect)); } } return intersections; } public List<List<Rectangle>> mergeIntersectingRects(Rectangle... rectArray) { List<Rectangle> allRects = new ArrayList<Rectangle>(rectArray); List<List<Rectangle>> groups = new ArrayList<ArrayList<Rectangle>>(); for(Rectangle rect : allRects) { allRects.remove(rect); ArrayList<Rectangle> group = getIntersections(allRects, rect); group.add(rect); groups.add(group); } return groups; } Unfortunately, there seems to be an infinite recursion loop going on here. My uneducated guess would be that java does not like me doing this: for(Rectangle rect : allRects) { allRects.remove(rect); //... } Can anyone shed some light on the issue?

    Read the article

  • k-combinations of a set of integers in ascending size order

    - by Adamski
    Programming challenge: Given a set of integers [1, 2, 3, 4, 5] I would like to generate all possible k-combinations in ascending size order in Java; e.g. [1], [2], [3], [4], [5], [1, 2], [1, 3] ... [1, 2, 3, 4, 5] It is fairly easy to produce a recursive solution that generates all combinations and then sort them afterwards but I imagine there's a more efficient way that removes the need for the additional sort.

    Read the article

  • How would you write a program to find the shortest pangram in a list of words?

    - by jonathanasdf
    Given a list of words which contains the letters a-z at least once, how would you write a program to find the shortest pangram counted by number of characters (not counting spaces) as a combination of the words? Since I am not sure whether short answers exist, this is not code golf, but rather just a discussion of how you would approach this. However, if you think you can manage to write a short program that would do this, then go ahead, and this might turn into code golf :)

    Read the article

  • question about LSD radix sort

    - by davit-datuashvili
    hello i have following code public class LSD{ public static int R=1<<8; public static int bytesword=4; public static void radixLSD(int a[],int l,int r){ int aux[]=new int[a.length]; for (int d=bytesword-1;d>=0;d--){ int i, j; int count[]=new int[R+1]; for ( j=0;j<R;j++) count[j]=0; for (i=l;i<=r;i++) count[digit(a[i],d)+1]++; for (j=1;j<R;j++) count[j]+=count[j-1]; for (i=l;i<=r;i++) aux[count[digit(a[i],d)]++]=a[i]; for (i=l;i<=r;i++) a[i]=aux[i-1]; } } public static void main(String[]args){ int a[]=new int[]{3,6,5,7,4,8,9}; radixLSD(a,0,a.length-1); for (int i=0;i<a.length;i++){ System.out.println(a[i]); } } public static int digit(int n,int d){ return (n>>d)&1; } } but it show me mistake java.lang.ArrayIndexOutOfBoundsException: -1 at LSD.radixLSD(LSD.java:19) at LSD.main(LSD.java:29) please help me

    Read the article

  • Position elements without overlap

    - by eWolf
    I have a number of rectangular elements that I want to position in a 2D space. I calculate an ideal position for each element. Now my problem is that many elements overlap as very often the ideal positions are concentrated in one region. I want to avoid overlap as much as possible (doesn't have to be perfect, though). How can I do this? I've heard physics simulations are suitable for this - is that correct? And can anyone provide an example/tutorial? By the way: I'm using XNA, if you know any .NET library that does exactly this job - tell me!

    Read the article

  • Javascript algorithm that calculates week number in Fiscal Year

    - by ForeignerBR
    Hi, I have been looking for a Javascript algorithms that gives me the week number of a given Date object within a custom fiscal year. The fiscal year of my company starts on 1 September and ends on 31 August. Say today happens to be September 1st and I pass in a newly instanced Date object to this function; I would expect it to return 1. Hopefully someone will be able to help me with it. thanks, fbr

    Read the article

  • Fast dot product for a very special case

    - by psihodelia
    Given a vector X of size L, where every scalar element of X is from a binary set {0,1}, it is to find a dot product z=dot(X,Y) if vector Y of size L consists of the integer-valued elements. I suggest, there must exist a very fast way to do it. Let's say we have L=4; X[L]={1, 0, 0, 1}; Y[L]={-4, 2, 1, 0} and we have to find z=X[0]*Y[0] + X[1]*Y[1] + X[2]*Y[2] + X[3]*Y[3] (which in this case will give us -4). It is obvious that X can be represented using binary digits, e.g. an integer type int32 for L=32. Then, all what we have to do is to find a dot product of this integer with an array of 32 integers. Do you have any idea or suggestions how to do it very fast?

    Read the article

  • how to elegantly duplicate a graph (neural network)

    - by macias
    I have a graph (network) which consists of layers, which contains nodes (neurons). I would like to write a procedure to duplicate entire graph in most elegant way possible -- i.e. with minimal or no overhead added to the structure of the node or layer. Or yet in other words -- the procedure could be complex, but the complexity should not "leak" to structures. They should be no complex just because they are copyable. I wrote the code in C#, so far it looks like this: neuron has additional field -- copy_of which is pointer the the neuron which base copied from, this is my additional overhead neuron has parameterless method Clone() neuron has method Reconnect() -- which exchanges connection from "source" neuron (parameter) to "target" neuron (parameter) layer has parameterless method Clone() -- it simply call Clone() for all neurons network has parameterless method Clone() -- it calls Clone() for every layer and then it iterates over all neurons and creates mappings neuron=copy_of and then calls Reconnect to exchange all the "wiring" I hope my approach is clear. The question is -- is there more elegant method, I particularly don't like keeping extra pointer in neuron class just in case of being copied! I would like to gather the data in one point (network's Clone) and then dispose it completely (Clone method cannot have an argument though).

    Read the article

  • Ideas for designing an automated content tagging system needed

    - by Benjamin Smith
    I am currently designing a website that amongst other is required to display and organise small amounts of text content (mainly quotes, article stubs, etc.). I currently have a database with 250,000+ items and need to come up with a method of tagging each item with relevant tags which will eventually allow for easy searching/browsing of the content for users. A very simplistic idea I have (and one that I believe is employed by some sites that I have been looking to for inspiration (http://www.brainyquote.com/quotes/topics.html)), is to simply search the database for certain words or phrases and use these words as tags for the content. This can easily be extended so that if for example a user wanted to show all items with a theme of love then I would just return a list of items with words and phrases relating to this theme. This would not be hard to implement but does not provide very good results. For example if I were to search for the month 'May' in the database with the aim of then classifying the items returned as realting to the topic of Spring then I would get back all occurrences of the word May, regardless of the semantic meaning. Another shortcoming of this method is that I believe it would be quite hard to automate the process to any large scale. What I really require is a library that can take an item, break it down and analyse the semantic meaning and also return a list of tags that would correctly classify the item. I know this is a lot to ask and I have a feeling I will end up reverting to the aforementioned method but I just thought I should ask if anyone knew of any pre-existing solution. I think that as the items in the database are short then it is probably quite a hard task to analyse any meaning from them however I may be mistaken. Another path to possibly go down would be to use something like amazon turk to outsource the task which may produce good results but would be expensive. Eventually I would like users to be able to (and want to!) tag content and to vote for the most relevant tags, possibly using a gameification mechanic as motivation however this is some way down the line. A temporary fix may be the best thing if this were the route I decided to go down as I could use the rough results I got as the starting point for a more in depth solution. If you've read this far, thanks for sticking with me, I know I'm spitballing but any input would be really helpful. Thanks.

    Read the article

  • How do I find all paths through a set of given nodes in a DAG?

    - by Hanno Fietz
    I have a list of items (blue nodes below) which are categorized by the users of my application. The categories themselves can be grouped and categorized themselves. The resulting structure can be represented as a Directed Acyclic Graph (DAG) where the items are sinks at the bottom of the graph's topology and the top categories are sources. Note that while some of the categories might be well defined, a lot is going to be user defined and might be very messy. Example: On that structure, I want to perform the following operations: find all items (sinks) below a particular node (all items in Europe) find all paths (if any) that pass through all of a set of n nodes (all items sent via SMTP from example.com) find all nodes that lie below all of a set of nodes (intersection: goyish brown foods) The first seems quite straightforward: start at the node, follow all possible paths to the bottom and collect the items there. However, is there a faster approach? Remembering the nodes I already passed through probably helps avoiding unnecessary repetition, but are there more optimizations? How do I go about the second one? It seems that the first step would be to determine the height of each node in the set, as to determine at which one(s) to start and then find all paths below that which include the rest of the set. But is this the best (or even a good) approach? The graph traversal algorithms listed at Wikipedia all seem to be concerned with either finding a particular node or the shortest or otherwise most effective route between two nodes. I think both is not what I want, or did I just fail to see how this applies to my problem? Where else should I read?

    Read the article

  • Calculating holidays

    - by Ralph Shillington
    A number of holidays move around from year to year. For example, in Canada Victoria day (aka the May two-four weekend) is the Monday before May 25th, or Thanksgiving is the 2nd Monday of October (in Canada). I've been using variations on this Linq query to get the date of a holiday for a given year: var year = 2011; var month = 10; var dow = DayOfWeek.Monday; var instance = 2; var day = (from d in Enumerable.Range(1,DateTime.DaysInMonth(year,month)) let sample = new DateTime(year,month,d) where sample.DayOfWeek == dow select sample).Skip(instance-1).Take(1); While this works, and is easy enough to understand, I can imagine there is a more elegant way of making this calculation versus this brute force approach. Of course this doesn't touch on holidays such as Easter and the many other lunar based dates.

    Read the article

  • Converting to a column oriented array in Java

    - by halfwarp
    Although I have Java in the title, this could be for any OO language. I'd like to know a few new ideas to improve the performance of something I'm trying to do. I have a method that is constantly receiving an Object[] array. I need to split the Objects in this array through multiple arrays (List or something), so that I have an independent list for each column of all arrays the method receives. Example: List<List<Object>> column-oriented = new ArrayList<ArrayList<Object>>(); public void newObject(Object[] obj) { for(int i = 0; i < obj.length; i++) { column-oriented.get(i).add(obj[i]); } } Note: For simplicity I've omitted the initialization of objects and stuff. The code I've shown above is slow of course. I've already tried a few other things, but would like to hear some new ideas. How would you do this knowing it's very performance sensitive?

    Read the article

  • Where can I learn more about datastructure tricky questions?

    - by Sandbox
    I am relatively new to programming (around 1 year programming C#-winforms). Also, I come from a non CS background (no formal degree) Recently, while being interviewed for a job, I was asked about implementing a queue using a stack. I fumbled and wan't able to answer the question. After, the interview I could do it(had to spend some time). I have learnt (and think that I know it well) basic algorithms in datastructures using the book Data Structures: A Pseudocode Approach with C - Richard F. Gilberg (Author) . I want to know about sites/ books which have such questions along with answers. I think this will allow me to develop my CS specific problem solving skills. Any help is appreciated. BOUNTY: I am looking at some blog/website with datastructure and algorithms Q&A.

    Read the article

< Previous Page | 77 78 79 80 81 82 83 84 85 86 87 88  | Next Page >