Search Results

Search found 19359 results on 775 pages for 'natural key'.


  • Hashing words to numbers with respect to definition

    - by thornate
    As part of a larger project, I need to read in text and represent each word as a number. For example, if the program reads in "Every good boy deserves fruit", then I would get a table that converts 'every' to '1742', 'good' to '977513', etc. Obviously I can just use a hashing algorithm to get these numbers, but it would be more useful if words with similar meanings had numerical values close to each other, so that 'good' becomes '6827' and 'great' becomes '6835', etc.

    As another option, instead of a single integer representing each word, it would be even better to have a vector made up of multiple numbers, e.g. (lexical_category, tense, classification, specific_word), where lexical_category is noun/verb/adjective/etc., tense is future/past/present, classification defines a broad set of general topics, and specific_word is much the same as described in the previous paragraph.

    Does any such algorithm exist? If not, can you give me any tips on how to get started on developing one myself? I code in C++. (A rough sketch follows this item.)

    Read the article
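
    A minimal sketch of the vector idea, in Python with NLTK's WordNet corpus rather than the C++ the question targets: each word maps to a (lexical category, synset id) pair, so exact synonyms that share a WordNet synset get identical ids. It does not guarantee that all related words land on nearby numbers - that is what vector space models of meaning (e.g. latent semantic analysis) aim for - but it shows the shape of the table.

        # A rough illustration only; assumes NLTK and its WordNet data are installed.
        from nltk.corpus import wordnet as wn

        POS_CODES = {"n": 1, "v": 2, "a": 3, "s": 3, "r": 4}  # noun, verb, adj, satellite adj, adverb

        def word_vector(word):
            """Map a word to (lexical_category, word_id)."""
            synsets = wn.synsets(word)
            if not synsets:
                # Unknown word: fall back to an ordinary hash.
                return (0, hash(word) & 0xFFFFFF)
            first = synsets[0]
            # offset() is WordNet's numeric id for the synset, shared by its synonyms.
            return (POS_CODES.get(first.pos(), 0), first.offset())

        print(word_vector("good"), word_vector("great"))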

  • Entity set expansion in Python

    - by Nicolas M.
    Do you know of any existing implementation, in any language (preferably Python), of any entity set expansion algorithms, such as the one behind Google Sets (http://labs.google.com/sets)? I couldn't find any library implementing such algorithms, and I'd like to play with some of them to see how they would perform on a specific task I would like to implement. Any help is welcome! Thanks a lot, Nicolas. (A naive sketch follows this item.)

    Read the article
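
    I am not aware of a drop-in library either; the sketch below is only a naive distributional stand-in for what Google Sets does: it ranks corpus terms by how much their context overlaps with the contexts of the seed terms. Everything here (the window size, the scoring) is an assumption chosen for brevity.

        from collections import Counter, defaultdict

        def expand_set(seeds, sentences, top_n=5):
            """Rank candidate terms by shared context with the seed terms."""
            contexts = defaultdict(Counter)          # term -> words seen near it
            for sentence in sentences:
                tokens = sentence.lower().split()
                for i, tok in enumerate(tokens):
                    window = tokens[max(0, i - 2):i] + tokens[i + 1:i + 3]
                    contexts[tok].update(window)

            seed_set = {s.lower() for s in seeds}
            seed_context = Counter()
            for s in seed_set:
                seed_context.update(contexts[s])

            scores = {}
            for term, ctx in contexts.items():
                if term in seed_set:
                    continue
                # Overlap = how much of the term's context the seeds also share.
                scores[term] = sum(min(cnt, seed_context[w]) for w, cnt in ctx.items())
            return [t for t, _ in Counter(scores).most_common(top_n)]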

  • Sentiment analysis for twitter in python

    - by Ran
    I'm looking for an open source implementation, preferably in Python, of textual sentiment analysis (http://en.wikipedia.org/wiki/Sentiment_analysis). I'm writing an application that searches Twitter for some search term, say "youtube", and counts "happy" tweets vs. "sad" tweets. I'm using Google App Engine, so it's in Python. I'd like to be able to classify the returned search results from Twitter, ideally in Python, but if the implementation is in another language, hopefully I can translate it. I haven't been able to find such a sentiment analyzer so far, specifically not in Python.

    Note that the texts I'm analyzing are VERY short; they are tweets. So ideally, this classifier is optimized for such short texts. BTW, Twitter does support the ":)" and ":(" operators in search, which aim to do just this, but unfortunately the classification they provide isn't that great, so I figured I might give this a try myself. (A hedged sketch of one option follows this item.)

    Thanks! An early demo is here and the code I have so far is here, and I'd love to open-source it with any interested developer.

    Read the article
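
    For what it's worth, a tiny sketch of the emoticon-as-noisy-label idea using NLTK's built-in naive Bayes classifier (NLTK may not be usable inside App Engine, so training would likely have to happen offline; the training tweets below are made up):

        import nltk

        def tweet_features(text):
            # Plain bag-of-words features; tweets are short, so keep it simple.
            return {word: True for word in text.lower().split()}

        # Hypothetical seed data: in practice, collect tweets via the ":)" / ":("
        # search operators and use the emoticon as a noisy label.
        training_tweets = [
            ("i love this video so much :)", "happy"),
            ("great song, this made my day", "happy"),
            ("this is terrible, what a waste of time :(", "sad"),
            ("so sad they removed my favourite clip :(", "sad"),
        ]

        classifier = nltk.NaiveBayesClassifier.train(
            [(tweet_features(text), label) for text, label in training_tweets])

        print(classifier.classify(tweet_features("youtube made my day")))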

  • Perl Lingua giving weird error on install

    - by user299306
    I am trying to install the Perl Lingua module onto a Unix system (Ubuntu, latest version). Of course I am root. When I go into the package to install using 'perl Makefile.PL' I get this error:

        [root@csisl27 Lingua-Lid-0.01]# perl Makefile.PL
        /opt/ls//lib does not exist at Makefile.PL line 48.

    I have tried playing with the path on line 48, but nothing changes. Here is what lines 48-50 look like:

        Line 48: die "$BASE/lib does not exist" unless -d "$BASE/lib";
        Line 49: die "$BASE/include does not exist" unless -d "$BASE/include";
        Line 50: die "lid.h is missing in $BASE/include" unless -e "$BASE/includ/lid.h";

    The variable $BASE is declared like this:

        $BASE = "/opt/ls/" if ($^O eq "linux" or $^O eq "solaris");
        $BASE = "/usr/local/" if ($^O eq "freebsd");
        $BASE = $ENV{LID_BASE_DIR} if (defined $ENV{LID_BASE_DIR});

    The Perl program I am trying to write simply looks like this (just my base):

        #!/usr/bin/perl
        use Lingua::LinkParser;
        use strict;
        print "Hello world!\n";

    When I run this, trying to use Lingua, here is my error:

        [root@csisl27 assign4]# ./perl_parser_1.pl
        Can't locate Lingua/LinkParser.pm in @INC (@INC contains: /usr/lib/perl5/site_perl/5.10.0/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.10.0 /usr/lib/perl5/vendor_perl/5.10.0/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.10.0 /usr/lib/perl5/5.10.0/x86_64-linux-thread-multi /usr/lib/perl5/5.10.0 /usr/lib/perl5/site_perl /usr/lib/perl5/vendor_perl .) at ./perl_parser_1.pl line 3.
        BEGIN failed--compilation aborted at ./perl_parser_1.pl line 3.

    I also tried installing this from CPAN, but it still doesn't work properly.

    Read the article

  • How to implement a SIMPLE "You typed ACB, did you mean ABC?"

    - by marcgg
    I know this is not a straight-up question, so if you need me to provide more information about the scope of it, let me know. There are a bunch of questions that address almost the same issue (they are linked here), but never the exact same one with the same kind of scope and objective - at least as far as I know.

    Context: I have an MP3 file with ID3 tags for artist name and song title. I have two tables, Artists and Songs. The ID3 tags might be slightly off (e.g. Mikaell Jacksonne). I'm using ASP.NET + C# and an MSSQL database. I need to synchronize the MP3s with the database, meaning: the user launches a script, the script browses through all the MP3s, the script asks "Is 'Mikaell Jacksonne' 'Michael Jackson'? YES/NO", the user picks, and we start over.

    Examples of what the system could find, given SONGS = {"This is a great song title", "This is a song title"} and ARTISTS = {"Michael Jackson"} in the database:

        "This is a grt song title" did you mean "This is a great song title"?
        "This is song title" did you mean "This is a song title"?
        "This si a song title" did you mean "This is a song title"?
        "This si song a title" did you mean "This is a song title"?
        "Jackson, Michael" did you mean "Michael Jackson"?
        "JacksonMichael" did you mean "Michael Jackson"?
        "Michael Jacksno" did you mean "Michael Jackson"?
        etc.

    I read some documentation from this /how-do-you-implement-a-did-you-mean and it is not exactly what I need, since I don't want to check an entire dictionary. I also can't really use a web service, since it depends a lot on what I already have in my database. If possible I'd also like to avoid dealing with distances and other complicated things. I could use the Google API (or something similar) to do this, meaning the script would try spell checking and test it against the database, but I feel there could be a better solution, since my database might end up being really specific, with weird songs and artists, making spell checking useless. I could also try something like what has been explained in this post, using Soundex for C#. A regular spell checker won't work because I won't be dealing with words but with names and titles.

    So my question is: is there a relatively simple way of doing this, and if so, what is it? (A small illustrative sketch follows this item.) Any kind of help would be appreciated. Thanks!

    Read the article
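
    One "relatively simple" option is plain fuzzy string matching against the values already in the database, rather than against a dictionary. Below is a rough sketch in Python (the same idea ports to C# with a Levenshtein-style ratio helper); the 0.6 cutoff is just a guess to tune against real tags.

        import difflib

        ARTISTS = ["Michael Jackson"]
        SONGS = ["This is a great song title", "This is a song title"]

        def suggest(tag, candidates, cutoff=0.6):
            """Return the closest known value, or None if nothing is similar enough."""
            # Normalise "Jackson, Michael"-style tags before comparing.
            tag = " ".join(reversed(tag.split(", "))) if ", " in tag else tag
            lowered = [c.lower() for c in candidates]
            matches = difflib.get_close_matches(tag.lower(), lowered, n=1, cutoff=cutoff)
            return candidates[lowered.index(matches[0])] if matches else None

        print(suggest("Mikaell Jacksonne", ARTISTS))   # -> Michael Jackson
        print(suggest("This si a song title", SONGS))  # -> This is a song title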

  • How to honor/inherit user's language settings in WinForm app

    - by msorens
    I have worked with globalization settings in the past, but not within the .NET environment, which is the topic of this question. What I am seeing is most certainly due to knowledge I have yet to learn, so I would appreciate illumination on the following.

    Setup: My default language setting is English (en-US specifically). I added a second language (Danish) on my development system (WinXP) and then opened the language bar so I could select either at will. I selected Danish on the language bar, then opened Notepad and found the language reverted to English on the language bar. I understand that the language setting is per application, so it seemed that Notepad set the default back to English. (I found that strange, since Windows and thus Notepad is used all over the world.) Closing Notepad returned the setting on the language bar to Danish. I then launched my own custom WinForm application--which I know does not set the language--and it also reverted from Danish to English when opened, then back to Danish when terminated!

    Question #1A: How do I get my WinForm application upon launch to inherit the current setting of the language bar? My experiment seems to indicate that each application starts with the system default and requires the user to manually change it once the app is running--this would seem to be a major inconvenience for anyone who wants to work with more than one language!

    Question #1B: If one must, in fact, set the language manually in a multi-language scenario, how do I change my default system language (e.g. to Danish) so I can test my app's launch in another language?

    I added a display of the current language in my application for this next experiment. Specifically, I set a MouseEnter handler on a label that sets its tooltip to CultureInfo.CurrentCulture.Name, so each time I mouse over I expected to see the current language setting. Since setting the language before I launch my app did not work, I launched it and then set the language to Danish. I found that some things (like typing in a TextBox) did honor this Danish setting, but mousing over the instrumented label still showed en-US!

    Question #2A: Why does CultureInfo.CurrentCulture.Name not reflect the change from my language bar while other parts of my app seem to recognize the change? (Trying CultureInfo.CurrentUICulture.Name produced the same result.)

    Question #2B: Is there an event that fires upon changes on the language bar, so I could recognize within my app when the language setting changes?

    Read the article

  • Latent Dirichlet Allocation, pitfalls, tips and programs

    - by Gregg Lind
    I'm experimenting with Latent Dirichlet Allocation for topic disambiguation and assignment, and I'm looking for advice.

    1. Which program is the "best", where best is some combination of easiest to use, best prior estimation, and fast?
    2. How do I incorporate my intuitions about topicality? Let's say I think I know that some items in the corpus are really in the same category, like all articles by the same author. Can I add that into the analysis?
    3. Any unexpected pitfalls or tips I should know before embarking?

    I'd prefer it if there are R or Python front ends for whatever program, but I expect (and accept) that I'll be dealing with C. (A brief sketch follows this item.)

    Read the article
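
    Of the usual candidates, gensim is a pure-Python option that is simple to drive; a minimal sketch is below (toy corpus, made-up parameters). On incorporating intuitions such as "these documents share an author": plain LDA has no supervision hook, so that usually means either injecting such information as extra tokens/features or moving to a supervised variant such as labelled LDA.

        # Assumes gensim is installed (pip install gensim).
        from gensim import corpora, models

        docs = ["the court ruled on the copyright case",
                "the judge delayed the copyright trial",
                "the home team won the final match",
                "fans celebrated the match victory"]
        texts = [doc.lower().split() for doc in docs]

        dictionary = corpora.Dictionary(texts)
        corpus = [dictionary.doc2bow(text) for text in texts]

        # num_topics and passes are toy values; real corpora need tuning.
        lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=20)
        for topic in lda.print_topics(num_topics=2):
            print(topic)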

  • How to make a concept representation with the help of a bag of words

    - by agazerboy
    Hi all! I have a question about "creating sentences with words" - no, it is not about English grammar :) Let me explain: if I have a bag of words like "person apple apple person person a eat person will apple eat hungry apple hungry", I would like to generate from it some kind of sentence such as "hungry person eat apple". I don't know which field this topic belongs to, or where I should try to find an answer; I tried searching Google but only found English grammar material. Can anyone tell me which algorithm, or which program, could work for this problem? Thanks. P.S. It is not an assignment :) If it were, I would be asking for source code!

    Read the article

  • Determining whether values can potentially match a regular expression, given more input

    - by Andreas Grech
    I am currently writing an application in JavaScript where I'm matching input against regular expressions, but I also need a way to determine whether a string matches part of a regular expression. For example:

        var invalid = "x",
            potentially = "g",
            valid = "ggg",
            gReg = /^ggg$/;

        gReg.test(invalid); // returns false (correct)
        gReg.test(valid);   // returns true (correct)

    Now I need to find a way to somehow determine that the value of the potentially variable doesn't exactly match the /^ggg$/ expression, BUT with more input it potentially can! So for example in this case, the potentially variable is "g", but if two more g's are appended to it, it will match the regular expression /^ggg$/. In the case of invalid, however, it can never match the /^ggg$/ expression, no matter how many characters you append to it.

    So how can I determine whether a string has the potential to match a particular regular expression?

    Read the article

  • Naive Bayesian for Topic detection using "Bag of Words" approach

    - by AlgoMan
    I am trying to implement a naive Bayesian approach to find the topic of a given document or stream of words. Is there a reference Naive Bayesian approach that I might be able to look up for this?

    Also, I am trying to improve my dictionary as I go along. Initially, I have a bunch of words that map to topics (hard-coded). Depending on the occurrences of words other than the ones that are already mapped, I want to add them to the mappings, hence improving and learning about new words that map to topics, and also adjusting the probabilities of words. How should I go about doing this? Is my approach the right one? Which programming language would be best suited for the implementation? (A from-scratch sketch follows this item.)

    Read the article
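
    There is no single canonical implementation to point at, but the method itself is small; here is a from-scratch multinomial naive Bayes sketch in Python (the language choice is arbitrary). The hard-coded word-to-topic seeds can simply be fed in as initial training documents, and calling train() on newly classified documents is one crude way to grow the mappings, at the risk of reinforcing early mistakes.

        import math
        from collections import Counter, defaultdict

        class NaiveBayesTopics:
            def __init__(self):
                self.word_counts = defaultdict(Counter)  # topic -> word frequencies
                self.doc_counts = Counter()              # topic -> training documents
                self.vocab = set()

            def train(self, words, topic):
                self.doc_counts[topic] += 1
                self.word_counts[topic].update(words)
                self.vocab.update(words)

            def classify(self, words):
                total_docs = sum(self.doc_counts.values())
                best_topic, best_score = None, float("-inf")
                for topic in self.doc_counts:
                    # log prior + add-one-smoothed log likelihoods
                    score = math.log(self.doc_counts[topic] / total_docs)
                    topic_total = sum(self.word_counts[topic].values())
                    for w in words:
                        count = self.word_counts[topic][w] + 1
                        score += math.log(count / (topic_total + len(self.vocab)))
                    if score > best_score:
                        best_topic, best_score = topic, score
                return best_topic

        nb = NaiveBayesTopics()
        nb.train("goal match referee striker".split(), "sports")
        nb.train("election senate vote policy".split(), "politics")
        print(nb.classify("the striker scored a late goal".split()))  # -> 'sports'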

  • Building dictionary of words from large text

    - by LiorH
    I have a text file containing posts in English/Italian. I would like to read the posts into a data matrix so that each row represents a post and each column a word; the cells in the matrix are the counts of how many times each word appears in the post. The dictionary should consist of all the words in the whole file, or of a non-exhaustive English/Italian dictionary. I know this is a common, essential preprocessing step for NLP. Does anyone know of a tool/project that can perform this task? Someone mentioned Apache Lucene; do you know if a Lucene index can be serialized to a data structure similar to what I need? (A bare-bones sketch follows this item.)

    Read the article
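
    If an existing tool turns out to be overkill, the counting step itself is short; a bare-bones Python sketch (whitespace tokenisation only, no English/Italian dictionary filtering) follows.

        from collections import Counter

        def document_term_matrix(posts):
            """posts: list of strings. Returns (vocabulary, rows of counts)."""
            tokenized = [post.lower().split() for post in posts]
            vocabulary = sorted({word for doc in tokenized for word in doc})
            index = {word: i for i, word in enumerate(vocabulary)}
            matrix = []
            for doc in tokenized:
                row = [0] * len(vocabulary)
                for word, count in Counter(doc).items():
                    row[index[word]] = count
                matrix.append(row)
            return vocabulary, matrix

        vocab, rows = document_term_matrix(["the cat sat", "the cat ate the fish"])
        print(vocab)  # ['ate', 'cat', 'fish', 'sat', 'the']
        print(rows)   # [[0, 1, 0, 1, 1], [1, 1, 1, 0, 2]]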

  • Defining the context of a word - Python

    - by RadiantHex
    Hi folks, I think this is an interesting question, at least for me. I have a list of words, let's say: photo, free, search, image, css3, css, tutorials, webdesign, tutorial, google, china, censorship, politics, internet. And I have a list of contexts: "Programming", "World news", "Technology", "Web Design". I need to try to match the words with the appropriate context or contexts, if possible - maybe by discovering word relationships in some way. Any ideas? Help would be much appreciated! (A cheap-experiment sketch follows this item.)

    Read the article
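
    One cheap experiment, assuming NLTK's WordNet corpus is available: describe each context with a few seed words (the seeds below are my own guesses) and score a word against each context by its best path similarity to a seed. Words WordNet does not know (css3, webdesign) will score zero everywhere and need another strategy, e.g. co-occurrence statistics from a corpus.

        from nltk.corpus import wordnet as wn  # assumes the WordNet data is installed

        CONTEXTS = {
            "Programming": ["software", "code"],
            "World news": ["politics", "government"],
            "Technology": ["computer", "internet"],
            "Web Design": ["website", "layout"],
        }

        def best_context(word):
            scores = {}
            for context, seeds in CONTEXTS.items():
                best = 0.0
                for seed in seeds:
                    for s1 in wn.synsets(word):
                        for s2 in wn.synsets(seed):
                            sim = s1.path_similarity(s2)
                            if sim is not None and sim > best:
                                best = sim
                scores[context] = best
            return max(scores, key=scores.get), scores

        print(best_context("censorship"))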

  • Special Occasion parser in Java

    - by Pranav
    Hey guys, I am working on a date parser in Java. I wanted to know whether there is any Java library which can parse special occasions: for example, if I give "Christmas" or "New Year" as input, it returns the corresponding date. Thanks in advance. Regards, Pranav

    Read the article

  • Dependency parsing

    - by C.
    Hi, I particularly like the transduce feature offered by AGFL in their EP4IR (http://www.agfl.cs.ru.nl/EP4IR/english.html; the download page is http://www.agfl.cs.ru.nl/download.html). Is there any way I can make use of this in a C# program? Do I need to convert the classes to C#? Thanks :)

    Read the article

  • translate by replacing words inside existing text

    - by Berry Tsakala
    What are common approaches for translating certain words (or expressions) inside a given text, when the text must be reconstructed (with punctuation and everything)? The translation comes from a lookup table and covers words, collocations, and emoticons like L33t, CUL8R, :-), etc. Simple string search-and-replace is not enough, since it can replace part of longer words (replacing cat -> dog turns caterpillar into dogerpillar).

    Assume the following input:

        s = "dogbert, started a dilbert dilbertion proces cat-bert :-)"

    After translation, I should receive something like:

        result = "anna, started a george dilbertion process cat-bert smiley"

    I can't simply tokenize, since I lose punctuation and word positions. Regular expressions work for normal words, but they don't catch special expressions like the smiley :-):

        re.sub(r'\bword\b', 'translation', s)  ==> translation
        re.sub(r'\b:-\)\b', 'smiley', s)       ==> :-)

    For now I'm using the above-mentioned regex, plus a simple replace for the non-alphanumeric expressions, but it's far from being bulletproof. (P.S. I'm using Python. A sketch of one approach follows this item.)

    Read the article
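
    A sketch of one way to make this less fragile: build a single compiled pattern from the lookup table, giving word-like keys \b boundaries and everything else (emoticons, punctuation-bearing slang) whitespace-based lookarounds, then replace through a callback. The table entries below mirror the example.

        import re

        TABLE = {"dogbert": "anna", "dilbert": "george", ":-)": "smiley", "CUL8R": "see you later"}

        def build_pattern(table):
            parts = []
            # Longest keys first so overlapping keys don't clip each other.
            for key in sorted(table, key=len, reverse=True):
                escaped = re.escape(key)
                if re.match(r"\w+$", key):
                    parts.append(r"\b%s\b" % escaped)          # ordinary words: word boundaries
                else:
                    parts.append(r"(?<!\S)%s(?!\S)" % escaped)  # emoticons etc.: whitespace boundaries
            return re.compile("|".join(parts))

        def translate(text, table):
            pattern = build_pattern(table)
            return pattern.sub(lambda m: table[m.group(0)], text)

        s = "dogbert, started a dilbert dilbertion proces cat-bert :-)"
        print(translate(s, TABLE))
        # -> "anna, started a george dilbertion proces cat-bert smiley"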

  • Detecting syllables in a word

    - by user50705
    I need to find a fairly efficient way to detect syllables in a word. E.g., invisible - in-vi-sib-le. There are some syllabification rules that could be used: V, CV, VC, CVC, CCV, CCCV, CVCC, where V is a vowel and C is a consonant. E.g., pronunciation (5 syllables: pro-nun-ci-a-tion; CV-CVC-CV-V-CVC).

    I've tried a few methods, among which were using regex (which helps only if you want to count syllables), hard-coded rule definitions (a brute-force approach which proves to be very inefficient), and finally finite state automata (which did not result in anything useful).

    The purpose of my application is to create a dictionary of all syllables in a given language. This dictionary will later be used for spell-checking applications (using Bayesian classifiers) and text-to-speech synthesis. I would appreciate tips on an alternative way to solve this problem besides my previous approaches. I work in Java, but any tip in C/C++, C#, Python, Perl... would work for me. (A naive sketch follows this item.)

    Read the article
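
    As a starting point only: the rules listed in the question can be applied greedily over a consonant/vowel pattern string. The sketch below (Python for brevity) splits "invisible" as in-vis-ib-le rather than in-vi-sib-le, which illustrates why real syllabifiers add maximal-onset and exception handling on top of templates like these.

        VOWELS = set("aeiouy")
        # Templates from the question, longest first so the greedy match prefers them.
        TEMPLATES = ["CCCV", "CVCC", "CCV", "CVC", "CV", "VC", "V"]

        def syllables(word):
            word = word.lower()
            pattern = "".join("V" if ch in VOWELS else "C" for ch in word)
            pieces, start = [], 0
            while start < len(pattern):
                for template in TEMPLATES:
                    if pattern.startswith(template, start):
                        pieces.append(word[start:start + len(template)])
                        start += len(template)
                        break
                else:
                    # Leftover consonant cluster: attach it to the previous piece.
                    if pieces:
                        pieces[-1] += word[start:]
                    else:
                        pieces.append(word[start:])
                    break
            return pieces

        print(syllables("invisible"))      # ['in', 'vis', 'ib', 'le']
        print(syllables("pronunciation"))  # naive output; compare with pro-nun-ci-a-tion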

  • Are there any well known algorithms to detect the presence of names?

    - by Rhubarb
    For example, given the string "Bob went fishing with his friend Jim Smith.", Bob and Jim Smith are both names, while "bob" and "smith" are also ordinary words. Were it not for the capitalization, there would be little indication of this beyond our knowledge of the sentence. Without doing grammar analysis, are there any well-known algorithms for detecting the presence of names, at least Western names? (A sketch using an off-the-shelf tool follows this item.)

    Read the article
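
    The usual name for this task is named-entity recognition (NER). A small sketch using NLTK's off-the-shelf chunker follows; it needs the tokenizer, tagger and chunker models downloaded, and, as the question anticipates, it leans heavily on capitalisation.

        import nltk  # assumes the punkt, tagger and maxent NE chunker data are downloaded

        def person_names(text):
            tree = nltk.ne_chunk(nltk.pos_tag(nltk.word_tokenize(text)))
            names = []
            for chunk in tree:
                # Entities come back as subtrees labelled PERSON, ORGANIZATION, ...
                if hasattr(chunk, "label") and chunk.label() == "PERSON":
                    names.append(" ".join(word for word, tag in chunk.leaves()))
            return names

        print(person_names("Bob went fishing with his friend Jim Smith."))  # e.g. ['Bob', 'Jim Smith']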

  • Recognizing language of a short text? - Python

    - by RadiantHex
    Hi folks, I have a list of articles, and each article has its own title and description. Unfortunately, from the sources I am using, there is no way to know what language they are written in. Also, the text is not entirely written in one language; English words are almost always present. I reckon I would need dictionary databases stored on my machine, but that feels a bit impractical. What would you suggest I do? (A small sketch follows this item.)

    Read the article
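
    A dictionary-free trick that often works even on short, English-contaminated text: count how many of each language's stop words (function words) appear. A sketch using NLTK's bundled stop-word lists, assuming that corpus is downloaded:

        from nltk.corpus import stopwords  # assumes the NLTK stopwords corpus is downloaded

        def guess_language(text):
            tokens = set(text.lower().split())
            scores = {lang: len(tokens & set(stopwords.words(lang)))
                      for lang in stopwords.fileids()}      # 'english', 'italian', ...
            return max(scores, key=scores.get)

        print(guess_language("questo articolo parla del nuovo sito e del suo layout"))  # -> 'italian'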

  • How to identify ideas and concepts in a given text

    - by Nick
    I'm working on a project at the moment where it would be really useful to be able to detect when a certain topic/idea is mentioned in a body of text. For instance, if the text contained:

        Maybe if you tell me a little more about who Mr Balzac is, that would help. It would also be useful if I could have a description of his appearance, or even better a photograph?

    it'd be great to be able to detect that the person has asked for a photograph of Mr Balzac. I could take a really naïve approach and just look for the word "photo" or "photograph", but this would obviously be no good if they wrote something like:

        Please, never send me a photo of Mr Balzac.

    Does anyone know where to start with this? Is it even possible? I've looked into things like nltk, but I've yet to find an example of someone doing something similar and am still not entirely sure what this kind of analysis is called. Any help that can get me off the ground would be great. Thanks!

    Read the article

  • English dictionary as txt or xml file with support of synonyms

    - by Simon
    Can someone point me to where I can download an English dictionary as a txt or XML file? I am building a simple app for myself and am looking for something I could start using immediately without learning a complex API. Support for synonyms would be great; that is, it should be easy to retrieve all the synonyms for a particular word. It would be absolutely fantastic if the dictionary listed British and American spellings of words where they differ. Even a small dictionary (a few thousand words) would be fine; I only need it for a small project. I would even be willing to buy one if the price is reasonable and the dictionary is easy to use - simple XML would be great. Any directions, please.

    Read the article

  • Generating easy-to-remember random identifiers

    - by Carl Seleborg
    Hi all, as all developers do, we constantly deal with some kind of identifiers as part of our daily work. Most of the time, it's about bugs or support tickets. Our software, upon detecting a bug, creates a package whose name is formatted from a timestamp and a version number, which is a cheap way of creating reasonably unique identifiers to avoid mixing packages up. Example: "Bug Report 20101214 174856 6.4b2". My brain just isn't that good at remembering numbers.

    What I would love to have is a simple way of generating alpha-numeric identifiers that are easy to remember. Examples would be "azil3", "ulmops", "fel2way", etc. I just made these up, but they are much easier to recognize when you see many of them at once. I know of algorithms that perform trigram analysis on text (say you feed them a whole book in German) and that can generate strings that look and feel like German words. This requires lots of data, though, which makes it slightly less suitable for embedding in an application just for this purpose. Do you know of anything else? (A tiny sketch follows this item.) Thanks! Carl

    Read the article
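
    A trigram model is overkill if the identifiers just need to be pronounceable; alternating consonants and vowels already gets most of the way there. A tiny sketch (the alphabet and lengths are arbitrary choices):

        import random

        CONSONANTS = "bdfglmnprstvz"   # drops easily-confused letters like q and c
        VOWELS = "aeiou"

        def memorable_id(syllable_count=3, digit_suffix=True):
            """Pronounceable ids like 'zilopa7': consonant-vowel pairs plus a digit."""
            word = "".join(random.choice(CONSONANTS) + random.choice(VOWELS)
                           for _ in range(syllable_count))
            if digit_suffix:
                word += str(random.randint(2, 9))   # skip 0/1 (they look like O/l)
            return word

        print([memorable_id() for _ in range(5)])
        # With 13 consonants and 5 vowels, three syllables give 65**3 = 274625 stems,
        # so collisions are possible; append the timestamp if uniqueness matters.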

  • I'm looking for a way to evaluate reading rate in several languages

    - by i30817
    I have software that is page oriented rather than scrollbar oriented, so I can easily count the words on a page. I'd like a way to filter outliers and a sensible default reading rate per text language (the language is known). The goal is to calculate, from the remaining text, the remaining reading time.

    I'm not sure what the best unit to use is. WPM (words per minute), as described at http://www.sfsu.edu/~testing/CalReadRate.htm, seems very fuzzy and human oriented; besides, I don't know how many "words" remain in the text. So I came up with this: the user is reading the text, the total text size in characters is known, and his position in the text is known, so the number of remaining characters to read is also known. If a language has a median word length of, say, 5 characters, then if I had a WPM speed for the user, I could calculate the remaining time. Three things are needed for this:

    1) A table of the median word length per language.
    2) A table of the median WPM of a median user per language.
    3) A way to update the WPM to fit the user as data becomes available, filtering outliers.

    However, I can't find these tables, and I'm not sure how precise it is to assume a median word length. (A small sketch of point 3 follows this item.)

    Read the article
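
    I could not point to published per-language tables either, so the sketch below just starts from guessed defaults (the numbers are assumptions, not measurements) and then adapts to the actual reader: characters per minute as the unit side-steps the "what is a word" problem, an exponential moving average smooths the estimate, and wildly implausible page times are discarded as outliers.

        class ReadingRate:
            # Starting guesses only (assumed average chars per word * assumed words per minute).
            DEFAULT_CPM = {"en": 5.1 * 200, "de": 6.3 * 180, "fi": 7.8 * 170}

            def __init__(self, language="en", smoothing=0.2):
                self.cpm = self.DEFAULT_CPM.get(language, 5.0 * 200)  # chars per minute
                self.smoothing = smoothing

            def record_page(self, chars_read, minutes):
                observed = chars_read / minutes
                # Outlier filter: ignore pages read absurdly fast (skimmed/skipped)
                # or absurdly slowly (reader walked away).
                if 0.2 * self.cpm < observed < 5 * self.cpm:
                    self.cpm += self.smoothing * (observed - self.cpm)

            def minutes_remaining(self, chars_left):
                return chars_left / self.cpm

        rate = ReadingRate("en")
        rate.record_page(chars_read=1800, minutes=2.0)
        print(round(rate.minutes_remaining(45000), 1))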
