Search Results

Search found 7240 results on 290 pages for 'natural join'.

Page 40/290 | < Previous Page | 36 37 38 39 40 41 42 43 44 45 46 47  | Next Page >

  • How to make concept representation with the help of bag of words

    - by agazerboy
    Hi All, Thanks for stoping to read my question :) this is very sweet place full of GREAT peoples ! I have a question about "creating sentences with words". NO NO it is not about english grammar :) Let me explain, If I have bag of words like "person apple apple person person a eat person will apple eat hungry apple hungry" and it can generate some kind of following sentence "hungry person eat apple" I don't in which field this topic will relate. Where should I try to find an answer. I tried to search google but I only found english grammar stuff :) Any body there who can tell me which algo can work in this problem? or any program Thanks P.S: It is not an assignment :) if it would be i would ask for source code ! I don't even know in which field I should look for :)

    Read the article

  • Determining whether values can potentially match a regular expression, given more input

    - by Andreas Grech
    I am currently writing an application in JavaScript where I'm matching input to regular expressions, but I also need to find a way how to match strings to parts of the regular expressions. For example: var invalid = "x", potentially = "g", valid = "ggg", gReg = /^ggg$/; gReg.test(invalid); //returns false (correct) gReg.test(valid); //returns true (correct) Now I need to find a way to somehow determine that the value of the potentially variable doesn't exactly match the /^ggg$/ expression, BUT with more input, it potentially can! So for example in this case, the potentially variable is g, but if two more g's are appended to it, it will match the regular expression /^ggg$/ But in the case of invalid, it can never match the /^ggg$/ expression, no matter how many characters you append to it. So how can I determine if a string has or doesn't have potential to match a particular regular expression?

    Read the article

  • Naive Bayesian for Topic detection using "Bag of Words" approach

    - by AlgoMan
    I am trying to implement a naive bayseian approach to find the topic of a given document or stream of words. Is there are Naive Bayesian approach that i might be able to look up for this ? Also, i am trying to improve my dictionary as i go along. Initially, i have a bunch of words that map to a topics (hard-coded). Depending on the occurrence of the words other than the ones that are already mapped. And depending on the occurrences of these words i want to add them to the mappings, hence improving and learning about new words that map to topic. And also changing the probabilities of words. How should i go about doing this ? Is my approach the right one ? Which programming language would be best suited for the implementation ?

    Read the article

  • Building dictionary of words from large text

    - by LiorH
    I have a text file containing posts in English/Italian. I would like to read the posts into a data matrix so that each row represents a post and each column a word. The cells in the matrix are the counts of how many times each word appears in the post. The dictionary should consist of all the words in the whole file or a non exhaustive English/Italian dictionary. I know this is a common essential preprocessing step for NLP. Does anyone know of a tool\project that can perform this task? Someone mentioned apache lucene, do you know if lucene index can be serialized to a data-structure similar to my needs?

    Read the article

  • Defining the context of a word - Python

    - by RadiantHex
    Hi folks, I think this is an interesting question, at least for me. I have a list of words, let's say: photo, free, search, image, css3, css, tutorials, webdesign, tutorial, google, china, censorship, politics, internet and I have a list of contexts: Programming World news Technology Web Design I need to try and match words with the appropriate context/contexts if possible. Maybe discovering word relationships in some way. Any ideas? Help would be much appreciated!

    Read the article

  • Special Ocassion parser in JAVA

    - by Pranav
    Hey guys, I am working on a date parser in Java. Just wanted some information on if there is any java library which could parse special occasions like for example if I give input as: Christmas or new year, it returns a date for this. Thanks in advance. Regards, Pranav

    Read the article

  • Dependency parsing

    - by C.
    Hi I particularly like the transduce feature offered by agfl in their EP4IR http://www.agfl.cs.ru.nl/EP4IR/english.html The download page is here: http://www.agfl.cs.ru.nl/download.html Is there any way i can make use of this in a c# program? Do I need to convert classes to c#? Thanks :)

    Read the article

  • translate by replacing words inside existing text

    - by Berry Tsakala
    What are common approaches for translating certain words (or expressions) inside a given text, when the text must be reconstructed (with punctuations and everythin.) ? The translation comes from a lookup table, and covers words, collocations, and emoticons like L33t, CUL8R, :-), etc. Simple string search-and-replace is not enough since it can replace part of longer words (cat dog ? caterpillar dogerpillar). Assume the following input: s = "dogbert, started a dilbert dilbertion proces cat-bert :-)" after translation, i should receive something like: result = "anna, started a george dilbertion process cat-bert smiley" I can't simply tokenize, since i loose punctuations and word positions. Regular expressions, works for normal words, but don't catch special expressions like the smiley :-) but it does . re.sub(r'\bword\b','translation',s) ==> translation re.sub(r'\b:-\)\b','smiley',s) ==> :-) for now i'm using the above mentioned regex, and simple replace for the non-alphanumeric words, but it's far from being bulletproof. (p.s. i'm using python)

    Read the article

  • Detecting syllables in a word

    - by user50705
    I need to find a fairly efficient way to detect syllables in a word. E.g., invisible - in-vi-sib-le There are some syllabification rules that could be used: V CV VC CVC CCV CCCV CVCC *where V is a vowel and C is a consonant. e.g., pronunciation (5 Pro-nun-ci-a-tion; CV-CVC-CV-V-CVC) I've tried few methods, among which were using regex (which helps only if you want to count syllables) or hard coded rule definition (a brute force approach which proves to be very inefficient) and finally using a finite state automata (which did not result with anything useful). The purpose of my application is to create a dictionary of all syllables in a given language. This dictionary will later be used for spell checking applications (using Bayesian classifiers) and text to speech synthesis. I would appreciate if one could give me tips on an alternate way to solve this problem besides my previous approaches. I work in Java, but any tip in C/C++, C#, Python, Perl... would work for me.

    Read the article

  • Are there any well known algorithms to detect the presence of names?

    - by Rhubarb
    For example, given a string: "Bob went fishing with his friend Jim Smith." Bob and Jim Smith are both names, but bob and smith are both words. Weren't for them being uppercase, there would be less indication of this outside of our knowledge of the sentence. Without doing grammar analysis, are there any well known algorithms for detecting the presence of names, at least Western names?

    Read the article

  • Recognizing language of a short text? - Python

    - by RadiantHex
    Hi folks, I'm have a list of articles, each article has its own title and description. Unfortunately, from the sources I am using, there is no way to know what language they are written. Also, text is not entirely written in 1 language; almost always English words are present. I reckon I would need dictionary databases stored on my machine, but it feels a bit unpractical. What would you suggest I do?

    Read the article

  • How to identify ideas and concepts in a given text

    - by Nick
    I'm working on a project at the moment where it would be really useful to be able to detect when a certain topic/idea is mentioned in a body of text. For instance, if the text contained: Maybe if you tell me a little more about who Mr Balzac is, that would help. It would also be useful if I could have a description of his appearance, or even better a photograph? It'd be great to be able to detect that the person has asked for a photograph of Mr Balzac. I could take a really naïve approach and just look for the word "photo" or "photograph", but this would obviously be no good if they wrote something like: Please, never send me a photo of Mr Balzac. Does anyone know where to start with this? Is it even possible? I've looked into things like nltk, but I've yet to find an example of someone doing something similar and am still not entirely sure what this kind of analysis is called. Any help that can get me off the ground would be great. Thanks!

    Read the article

  • English dictionary as txt or xml file with support of synonyms

    - by Simon
    Can someone point me to where I can download English dictionary as a txt or xml file. I am building a simple app for myself and looking for something what I could start using immediately without learning complex API. Support for synonyms would be great, that is it should be easier to retrieve all the synonyms for particular word. It would be absolutely fantastic if dictionary would be listing British and American spelling of the words where they are differ. Even if it would be small dictionary (few 000's words) that's ok, I only need it for small project. I even would be willing to buy one if the price is reasonable, and dictionary is easy to use - simple xml wold be great. Any directions please.

    Read the article

  • Generating easy-to-remember random identifiers

    - by Carl Seleborg
    Hi all, As all developers do, we constantly deal with some kind of identifiers as part of our daily work. Most of the time, it's about bugs or support tickets. Our software, upon detecting a bug, creates a package that has a name formatted from a timestamp and a version number, which is a cheap way of creating reasonably unique identifiers to avoid mixing packages up. Example: "Bug Report 20101214 174856 6.4b2". My brain just isn't that good at remembering numbers. What I would love to have is a simple way of generating alpha-numeric identifiers that are easy to remember. Examples would be "azil3", "ulmops", "fel2way", etc. I just made these up, but they are much easier to recognize when you see many of them at once. I know of algorithms that perform trigram analysis on text (say you feed them a whole book in German) and that can generate strings that look and feel like German words. This requires lots of data, though, and makes it slightly less suitable for embedding in an application just for this purpose. Do you know of anything else? Thanks! Carl

    Read the article

  • I'm looking for a way to evaluate reading rate in several languages

    - by i30817
    I have a software that is page oriented instead of scrollbar oriented so i can easily count the words, but i'd like a way to filter outliers and some default value for the text language (that is known). The goal is from the remaining text to calculate the remaining time. I'm not sure what is the best unit to use. WPM (words per minute) from here seems very fuzzy and human oriented. Besides i don't know how many "words" remain in the text. http://www.sfsu.edu/~testing/CalReadRate.htm So i came up with this: The user is reading the text. The total text size in characters is known. His position in the text is known. So the remaining characters to read is also known. If a language has a median word length of say 5 chars, then if i had a WPM speed for the user, i could calculate the remaining time. 3 things are needed for this: 1) A table of the median word length of the language. 2) A table of the median WPM of a median user per language. 3) Update the WPM to fit the user as data becomes available, filtering outliers. However i can't find these tables. And i'm not sure how precise it is assuming median word length.

    Read the article

  • Automated Legal Processing

    - by Chris S
    Will it ever be possible to make legal systems quantifiable enough to process with computer algorithms? What technologies would have to be in place before this is possible? Are there any existing technologies that are already trying to accomplish this? Out of curiosity, I downloaded the text for laws in my local municipality, and tried applying some simple NLP tricks to extract rules from sentences. I had mixed results. Some sentences were very explicit (e.g. "Cars may not be left in the park overnight"), but other sentences seemed hopelessly vague (e.g. "The council's purpose is to ensure the well-being of the community"). I apologize if this is too open-ended a topic, but I've often wondered what society would look like if legal systems were based on less ambiguous language. Lawyers, and the legal process in general, are so expensive because they have to manually process a complex set of rules codified in ambiguous legal texts. If this system could be represented in software, this huge expense could potentially be eliminated, making the legal system more accessible for everyone.

    Read the article

  • Is there a list of language only character regions for UTF-8 somewhere?

    - by Brehtt
    I'm trying to analyze some UTF-8 encoded documents in a way that recognizes different language characters. For my approach to work I need to ignore non-language characters, such as control characters, mathematical symbols etc. Just trying to dissect the basic Latin section of the UTF standard has resulted in multiple regions, with characters like the division symbol being right in the middle of a range of valid Latin characters. Is there a list somewhere that identifies these regions? Or better yet, a Regex that defines the regions or something in C# that can identify the different characters?

    Read the article

  • Mapping words to numbers with respect to definition

    - by thornate
    As part of a larger project, I need to read in text and represent each word as a number. For example, if the program reads in "Every good boy deserves fruit", then I would get a table that converts 'every' to '1742', 'good' to '977513', etc. Now, obviously I can just use a hashing algorithm to get these numbers. However, it would be more useful if words with similar meanings had numerical values close to each other, so that 'good' becomes '6827' and 'great' becomes '6835', etc. As another option, instead of a simple integer representing each number, it would be even better to have a vector made up of multiple numbers, eg (lexical_category, tense, classification, specific_word) where lexical_category is noun/verb/adjective/etc, tense is future/past/present, classification defines a wide set of general topics and specific_word is much the same as described in the previous paragraph. Does any such an algorithm exist? If not, can you give me any tips on how to get started on developing one myself? I code in C++.

    Read the article

  • Which SQL query is faster? Filter on Join criteria or Where clause?

    - by Jon Erickson
    Compare these 2 queries. Is it faster to put the filter on the join criteria or in the were clause. I have always felt that it is faster on the join criteria because it reduces the result set at the soonest possible moment, but I don't know for sure. I'm going to build some tests to see, but I also wanted to get opinions on which would is clearer to read as well. Query 1 SELECT * FROM TableA a INNER JOIN TableXRef x ON a.ID = x.TableAID INNER JOIN TableB b ON x.TableBID = b.ID WHERE a.ID = 1 /* <-- Filter here? */ Query 2 SELECT * FROM TableA a INNER JOIN TableXRef x ON a.ID = x.TableAID AND a.ID = 1 /* <-- Or filter here? */ INNER JOIN TableB b ON x.TableBID = b.ID

    Read the article

  • Database strucure for versioning and multiple languages

    - by phobia
    Hi, I'm wondering how to best solve the issue of content existing in multiple versions and multiple languages. An image of my current structure can be seen here: http://i46.tinypic.com/72fx3k.png Each content can only have one active version in each language, and that's how I'm curious on how to best solve. Right now I have a column of the contentversions table, which means for each change of active version I have to run a update and set active=false on all version and then a update to set active=true for the piece of content in question. Any thoughts? :)

    Read the article

  • Entity Framework and associations between string keys

    - by fredrik
    Hi, I am new to Entity Framework, and ORM's for that mather. In the project that I'm involed in we have a legacy database, with all its keys as strings, case-insensitive. We are converting to MSSQL and want to use EF as ORM, but have run in to a problem. Here is an example that illustrates our problem: TableA has a primary string key, TableB has a reference to this primary key. In LINQ we write something like: var result = from t in context.TableB select t.TableA; foreach( var r in result ) Console.WriteLine( r.someFieldInTableA ); if TableA contains a primary key that reads "A", and TableB contains two rows that references TableA but with different cases in the referenceing field, "a" and "A". In our project we want both of the rows to endup in the result, but only the one with the matching case will end up there. Using the SQL Profiler, I have noticed that both of the rows are selected. Is there a way to tell Entity Framework that the keys are case insensitive? Edit:We have now tested this with NHibernate and come to the conclution that NHibernate works with case-insensitive keys. So NHibernate might be a better choice for us.I am however still interested in finding out if there is any way to change the behaviour of Entity Framework.

    Read the article

  • How does Amazon's Statistically Improbable Phrases work?

    - by ??iu
    How does something like Statistically Improbable Phrases work? According to amazon: Amazon.com's Statistically Improbable Phrases, or "SIPs", are the most distinctive phrases in the text of books in the Search Inside!™ program. To identify SIPs, our computers scan the text of all books in the Search Inside! program. If they find a phrase that occurs a large number of times in a particular book relative to all Search Inside! books, that phrase is a SIP in that book. SIPs are not necessarily improbable within a particular book, but they are improbable relative to all books in Search Inside!. For example, most SIPs for a book on taxes are tax related. But because we display SIPs in order of their improbability score, the first SIPs will be on tax topics that this book mentions more often than other tax books. For works of fiction, SIPs tend to be distinctive word combinations that often hint at important plot elements. For instance, for Joel's first book, the SIPs are: leaky abstractions, antialiased text, own dog food, bug count, daily builds, bug database, software schedules One interesting complication is that these are phrases of either 2 or 3 words. This makes things a little more interesting because these phrases can overlap with or contain each other.

    Read the article

  • Algorithm for sentence analysis and tokenization

    - by Andrea Nagar
    I need to analyze a document and compile statistics as to how many times each a sequence of words is used (so the analysis is not on single words but of batch of recurring words). I read that compression algorithms do something similar to what I want - creating dictionaries of blocks of text with a piece of information reporting its frequency. It should be something similar to http://www.codeproject.com/KB/recipes/Patterns.aspx Do you have anything written in C#?

    Read the article

< Previous Page | 36 37 38 39 40 41 42 43 44 45 46 47  | Next Page >