natural language process - Page 100

Special occasion parser in Java

- by Pranav

Hey guys, I am working on a date parser in Java. Is there any Java library that can parse special occasions? For example, if I give "Christmas" or "New Year" as input, it should return the corresponding date. Thanks in advance. Regards, Pranav
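
A minimal sketch of the kind of lookup being asked about, written in Python rather than Java for brevity. The table `FIXED_OCCASIONS` and the function `parse_occasion` are hypothetical names, not part of any library, and only fixed-date occasions are covered; movable ones such as Easter would need their own rule.

```python
from datetime import date

# Hypothetical lookup table: fixed-date occasions as (month, day).
FIXED_OCCASIONS = {
    "christmas": (12, 25),
    "new year": (1, 1),
    "new year's day": (1, 1),
}


def parse_occasion(text, year):
    """Return the date of a known fixed-date occasion in `year`, or None."""
    # Normalize case and collapse repeated whitespace before the lookup.
    key = " ".join(text.lower().split())
    month_day = FIXED_OCCASIONS.get(key)
    if month_day is None:
        return None
    return date(year, *month_day)
```

A caller would try this lookup first and fall back to the regular date parser when it returns None.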

Dependency parsing

- by C.

Hi, I particularly like the transduce feature offered by AGFL in their EP4IR: http://www.agfl.cs.ru.nl/EP4IR/english.html The download page is here: http://www.agfl.cs.ru.nl/download.html Is there any way I can make use of this in a C# program? Do I need to convert classes to C#? Thanks :)

Are there any well known algorithms to detect the presence of names?

- by Rhubarb

For example, given a string: "Bob went fishing with his friend Jim Smith." Bob and Jim Smith are both names, but "bob" and "smith" are also ordinary words. Were it not for the capitalization, there would be little indication of this beyond our knowledge of the sentence. Without doing grammar analysis, are there any well-known algorithms for detecting the presence of names, at least Western names?
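
The capitalization cue mentioned in the question can be sketched as follows. This is only a heuristic baseline, not a known name-detection algorithm: `capitalized_runs` is a hypothetical helper that groups consecutive capitalized words and flags runs that start the sentence, since those are capitalized regardless of whether they are names. It assumes a single sentence as input and ignores punctuation between words.

```python
import re


def capitalized_runs(sentence):
    """Return (phrase, is_sentence_initial) for each run of capitalized words."""
    tokens = re.findall(r"[A-Za-z']+", sentence)
    runs, current, start = [], [], 0
    for i, tok in enumerate(tokens):
        if tok[0].isupper():
            if not current:
                start = i  # remember where this run began
            current.append(tok)
        elif current:
            runs.append((" ".join(current), start == 0))
            current = []
    if current:  # flush a run that reaches the end of the sentence
        runs.append((" ".join(current), start == 0))
    return runs
```

For the example sentence this yields "Bob" (sentence-initial, so ambiguous) and "Jim Smith" (mid-sentence, so a stronger candidate).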

What should every programmer know?

- by Matt Lacey

Regardless of programming language(s) or operating system(s) used or the environment they develop for, what should every programmer know? Some background: I'm interested in becoming the best programmer I can. As part of this process I'm trying to understand what I don't know that would benefit me a lot if I did. While there are loads of lists around along the lines of "n things every [insert programming language] developer should know", I have yet to find anything similar which isn't limited to a specific language. I also expect this information to be of interest and benefit to others.

How to identify ideas and concepts in a given text

- by Nick

I'm working on a project at the moment where it would be really useful to be able to detect when a certain topic/idea is mentioned in a body of text. For instance, if the text contained: "Maybe if you tell me a little more about who Mr Balzac is, that would help. It would also be useful if I could have a description of his appearance, or even better a photograph?" it would be great to be able to detect that the person has asked for a photograph of Mr Balzac. I could take a really naïve approach and just look for the word "photo" or "photograph", but this would obviously be no good if they wrote something like: "Please, never send me a photo of Mr Balzac." Does anyone know where to start with this? Is it even possible? I've looked into things like NLTK, but I've yet to find an example of someone doing something similar and am still not entirely sure what this kind of analysis is called. Any help that can get me off the ground would be great. Thanks!
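
To make the limitation of the naïve keyword approach concrete, here is a small sketch that looks for the keyword and then checks a few preceding words for a negation cue. The word lists and the five-word window are arbitrary assumptions, and `mentions_request` is a hypothetical name; this is a crude baseline, not a real intent-detection method.

```python
import re

KEYWORDS = {"photo", "photograph"}
NEGATIONS = {"never", "not", "no", "don't"}  # assumed, far from complete


def mentions_request(sentence, window=5):
    """True if a keyword appears without a negation cue shortly before it."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    for i, tok in enumerate(tokens):
        if tok in KEYWORDS:
            preceding = tokens[max(0, i - window):i]
            # Only the first keyword occurrence is considered.
            return not any(w in NEGATIONS for w in preceding)
    return False
```

It separates the two sentences quoted above, but it would still misread something like "I would not mind a photo", which is why a pure word-list approach falls short.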

English dictionary as txt or xml file with support of synonyms

- by Simon

Can someone point me to where I can download an English dictionary as a txt or XML file? I am building a simple app for myself and am looking for something I can start using immediately without learning a complex API. Support for synonyms would be great; that is, it should be easy to retrieve all the synonyms for a particular word. It would be absolutely fantastic if the dictionary listed the British and American spellings of words where they differ. Even a small dictionary (a few thousand words) is OK; I only need it for a small project. I would even be willing to buy one if the price is reasonable and the dictionary is easy to use; simple XML would be great. Any directions, please.

Generating easy-to-remember random identifiers

- by Carl Seleborg

Hi all, As all developers do, we constantly deal with some kind of identifiers as part of our daily work. Most of the time, it's about bugs or support tickets. Our software, upon detecting a bug, creates a package that has a name formatted from a timestamp and a version number, which is a cheap way of creating reasonably unique identifiers to avoid mixing packages up. Example: "Bug Report 20101214 174856 6.4b2". My brain just isn't that good at remembering numbers. What I would love to have is a simple way of generating alpha-numeric identifiers that are easy to remember. Examples would be "azil3", "ulmops", "fel2way", etc. I just made these up, but they are much easier to recognize when you see many of them at once. I know of algorithms that perform trigram analysis on text (say you feed them a whole book in German) and that can generate strings that look and feel like German words. This requires lots of data, though, and makes it slightly less suitable for embedding in an application just for this purpose. Do you know of anything else? Thanks! Carl
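
One lightweight alternative to trigram models, sketched here under the assumption that alternating consonants and vowels is pronounceable enough: it needs no training data at all. The letter sets and the name `memorable_id` are illustrative choices, not an established scheme.

```python
import random

# Assumed letter sets; ambiguous or hard-to-pronounce letters are left out.
CONSONANTS = "bdfgklmnprstvz"
VOWELS = "aeiou"


def memorable_id(syllables=3, rng=None):
    """Build an identifier from consonant+vowel syllables, e.g. 6 letters for 3."""
    if rng is None:
        rng = random.Random()
    return "".join(
        rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(syllables)
    )
```

With these sets, three syllables give (14 × 5)³ = 343,000 possible identifiers, so uniqueness would still have to be checked against existing ones or backed by the timestamp.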

Problems with noobs putting my GA code into their sites

- by dclowd9901

I don't mean for the title to be derogatory, but this is a rather frustrating problem, and I'm looking for a good workaround, given a language barrier involved. I have a site set up for a plugin I wrote, and, rather than use the site's resources to write their own code, I've had people simply rip the code from the samples on the site. Normally, this wouldn't be any issue at all, but they are also taking my Google Analytics instantiation, so my Analytics data is getting very skewed by incorporating visitation data from their websites. I've been able to contact the English-speaking site owners with little issue. The problem lies in the Japanese language sites that are yanking the code. I have no idea how to ask them to take down the analytics portion. Long-term, I'm providing a package that streamlines the learning-to-use process, but in the meantime, what can I do about this language barrier? Is there a way around this problem that I haven't thought of?

Mapping words to numbers with respect to definition

- by thornate

As part of a larger project, I need to read in text and represent each word as a number. For example, if the program reads in "Every good boy deserves fruit", then I would get a table that converts 'every' to '1742', 'good' to '977513', etc. Now, obviously I can just use a hashing algorithm to get these numbers. However, it would be more useful if words with similar meanings had numerical values close to each other, so that 'good' becomes '6827' and 'great' becomes '6835', etc. As another option, instead of a simple integer representing each word, it would be even better to have a vector made up of multiple numbers, e.g. (lexical_category, tense, classification, specific_word), where lexical_category is noun/verb/adjective/etc., tense is future/past/present, classification defines a wide set of general topics, and specific_word is much the same as described in the previous paragraph. Does any such algorithm exist? If not, can you give me any tips on how to get started on developing one myself? I code in C++.
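
The hashing baseline mentioned in the question can be sketched like this (in Python rather than C++ for brevity; `word_to_int` and the bucket count are assumptions). It shows why hashing alone does not help: the numbers are stable per word but carry no information about meaning, so 'good' and 'great' land at unrelated values.

```python
import hashlib


def word_to_int(word, buckets=1_000_000):
    """Map a word to a stable integer in [0, buckets) via SHA-1."""
    digest = hashlib.sha1(word.lower().encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % buckets
```

Getting similar words close together requires an external source of meaning, such as a thesaurus or vectors learned from text, rather than a different hash function.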

Algorithm for sentence analysis and tokenization

- by Andrea Nagar

I need to analyze a document and compile statistics on how many times each sequence of words is used (so the analysis is not of single words but of recurring groups of words). I read that compression algorithms do something similar to what I want: creating dictionaries of blocks of text, each with a piece of information reporting its frequency. It should be something similar to http://www.codeproject.com/KB/recipes/Patterns.aspx Do you have anything written in C#?
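
Counting recurring word sequences is usually described as n-gram counting. A minimal sketch in Python (not C#), assuming a simple regex tokenizer; `ngram_counts` is a hypothetical helper name:

```python
import re
from collections import Counter


def ngram_counts(text, n=2):
    """Count every sequence of n consecutive words in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
```

Calling `ngram_counts(document, 3).most_common(10)` would list the ten most frequent three-word sequences.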

How does Amazon's Statistically Improbable Phrases work?

- by ??iu

How does something like Statistically Improbable Phrases work? According to Amazon: "Amazon.com's Statistically Improbable Phrases, or "SIPs", are the most distinctive phrases in the text of books in the Search Inside!™ program. To identify SIPs, our computers scan the text of all books in the Search Inside! program. If they find a phrase that occurs a large number of times in a particular book relative to all Search Inside! books, that phrase is a SIP in that book. SIPs are not necessarily improbable within a particular book, but they are improbable relative to all books in Search Inside!. For example, most SIPs for a book on taxes are tax related. But because we display SIPs in order of their improbability score, the first SIPs will be on tax topics that this book mentions more often than other tax books. For works of fiction, SIPs tend to be distinctive word combinations that often hint at important plot elements." For instance, for Joel's first book, the SIPs are: leaky abstractions, antialiased text, own dog food, bug count, daily builds, bug database, software schedules. One interesting complication is that these are phrases of either 2 or 3 words. This makes things a little more interesting because these phrases can overlap with or contain each other.
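
Amazon does not publish its scoring method, so the following is only one plausible reading of the description above: score each 2- or 3-word phrase by its rate in the book divided by its (smoothed) rate in the whole corpus. The function names, the tokenizer, and the add-one smoothing are all assumptions.

```python
import re
from collections import Counter


def phrases(text, sizes=(2, 3)):
    """Count all phrases of the given word lengths in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(
        " ".join(words[i:i + n])
        for n in sizes
        for i in range(len(words) - n + 1)
    )


def improbable_phrases(book, corpus, top=5):
    """Rank the book's phrases by book rate relative to corpus rate."""
    book_counts = phrases(book)
    corpus_counts = Counter()
    for doc in corpus:
        corpus_counts.update(phrases(doc))
    book_total = sum(book_counts.values()) or 1
    corpus_total = sum(corpus_counts.values()) or 1

    def score(phrase):
        book_rate = book_counts[phrase] / book_total
        # Add-one smoothing so phrases unseen in the corpus do not divide by zero.
        corpus_rate = (corpus_counts[phrase] + 1) / (corpus_total + 1)
        return book_rate / corpus_rate

    return sorted(book_counts, key=score, reverse=True)[:top]
```

This sketch does nothing about the overlap issue raised above; a phrase and a longer phrase containing it are scored independently.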

Entity Framework and associations between string keys

- by fredrik

Hi, I am new to Entity Framework, and to ORMs for that matter. In the project that I'm involved in we have a legacy database with all its keys as case-insensitive strings. We are converting to MSSQL and want to use EF as the ORM, but have run into a problem. Here is an example that illustrates it: TableA has a primary string key, and TableB has a reference to this primary key. In LINQ we write something like: var result = from t in context.TableB select t.TableA; foreach( var r in result ) Console.WriteLine( r.someFieldInTableA ); Suppose TableA contains a primary key that reads "A", and TableB contains two rows that reference TableA but with different cases in the referencing field, "a" and "A". In our project we want both rows to end up in the result, but only the one with the matching case does. Using the SQL Profiler, I have noticed that both rows are selected. Is there a way to tell Entity Framework that the keys are case-insensitive? Edit: We have now tested this with NHibernate and come to the conclusion that NHibernate works with case-insensitive keys, so NHibernate might be a better choice for us. I am, however, still interested in finding out whether there is any way to change the behaviour of Entity Framework.

What applications is Python optimal for?

- by Alan

I'm already a professional J2EE developer by day, and a Rails developer by night. I'm planning on adding Python to my list of skills. I'm already convinced a language is just a tool, so I'm not interested in a religious war. I agree with the Pragmatic Programmers that learning one language per year is a good thing for your professional development. So, in your considered opinion, for what kinds of applications does Python hit the sweet spot? And why? What advantages does it have, and why do these advantages outweigh the costs of adopting Python? ADD: I also plan on learning a pure functional language like Scheme.

Data clean up: are there libraries of common permutations that we can use? Or is there a better appr

- by anyaelena

We are working on clean-up and analysis of a lot of human-entered customer data. We need to decide programmatically whether 2 addresses (for example) are the same, even though the data was entered with slight variations. Right now we run each address through fairly simplistic string replacement (replacing avenue with ave, for example), concatenate the fields and compare the results. We are doing something similar with names. At the very least, it seems like our list of search-replace values should already exist somewhere. Or perhaps you can suggest a totally different and superior way to detect matches?
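
The replace-then-compare approach described above, plus a fuzzy fallback for typos, might look like the sketch below. The abbreviation table is a tiny made-up sample rather than an existing library of permutations, and the helper names are hypothetical; `difflib.SequenceMatcher` is from the Python standard library.

```python
import re
from difflib import SequenceMatcher

# Assumed sample of replacements; a real list would be much longer.
ABBREVIATIONS = {"avenue": "ave", "street": "st", "road": "rd", "suite": "ste"}


def normalize_address(address):
    """Lowercase, drop punctuation, and apply the abbreviation table."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def same_address(a, b):
    """Exact match after normalization."""
    return normalize_address(a) == normalize_address(b)


def similarity(a, b):
    """Similarity in [0, 1] between the normalized forms, tolerant of typos."""
    return SequenceMatcher(None, normalize_address(a), normalize_address(b)).ratio()
```

Pairs that fail `same_address` but score highly on `similarity` could be queued for manual review; the threshold would need tuning on real data.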

Corpus/data set of English words with syllabic stress information?

- by endtime

I know this is a long shot, but does anyone know of a dataset of English words that has stress information by syllable? Something as simple as the following would be fantastic:
AARD vark
A ble
a BOUT
ac COUNT
AC id
ad DIC tion
ad VERT ise ment
...
Thanks in advance!
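
If data were available in the format shown in the question (space-separated syllables, with the stressed one in upper case), reading it would be straightforward. A small sketch, where `stress_pattern` is a hypothetical helper:

```python
def stress_pattern(entry):
    """Return 1 for each upper-case (stressed) syllable and 0 otherwise."""
    return [1 if syllable.isupper() else 0 for syllable in entry.split()]
```

For example, "ad VERT ise ment" maps to [0, 1, 0, 0].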

New or not so well-known paradigms, syntax features and behaviours of programming languages?

- by George B

I've designed some educational programming languages and interpreters for them, but my problem has always been that they ended up "normal" and "boring", mostly similar to some existing language (ASM and BASIC). I find it really hard to come up with new ideas for syntax features, "neat things" and new or heavily modified programming paradigms for them. I have always found it hard to come up with good new things, as opposed to merely fun or useless ones. I wondered if you could help me out with your creativity: What features in terms of language syntax and built-in functions, as well as maybe even new paradigms, can I work into my language to keep it useless but more fun, enjoyable, interesting and/or different to program in?

Given a document select a relevant snippet.

- by BCS

When I ask a question here, the tool tips for the questions returned by the auto search give the first little bit of each question, but a decent percentage of them don't give any text that is more useful for understanding the question than the title. Does anyone have an idea about how to make a filter to trim out useless bits of a question? My first idea is to trim any leading sentences that contain only words in some list (for instance, stop words, plus words from the title, plus words from the SO corpus that have very weak correlation with tags, that is, words that are equally likely to occur in any question regardless of its tags).
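
The first idea above can be sketched directly: split the text into sentences and drop leading ones whose words all come from an "ignorable" list. The sentence splitter is a naive regex, and how the list is built (stop words, title words, tag-uncorrelated words) is left to the caller; `trim_leading_filler` is a hypothetical name.

```python
import re


def trim_leading_filler(text, ignorable):
    """Drop leading sentences made up entirely of words in `ignorable`."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for i, sentence in enumerate(sentences):
        words = re.findall(r"[a-z']+", sentence.lower())
        # Keep everything from the first sentence with an informative word.
        if any(w not in ignorable for w in words):
            return " ".join(sentences[i:])
    return ""
```

With `ignorable = {"hi", "all", "thanks", "in", "advance"}`, the text "Hi all. Thanks in advance. How do I parse dates?" is trimmed to "How do I parse dates?".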

Inter-rater agreement (Fleiss' Kappa, Krippendorff's Alpha etc) Java API?

- by adam

I am working on building a Question Classification/Answering corpus as part of my master's thesis. I'm looking at evaluating my expected answer type taxonomy with respect to inter-rater agreement/reliability, and I was wondering: does anybody know of any decent (preferably free) Java API(s) that can do this? I'm reasonably certain all I need is Fleiss' Kappa and Krippendorff's Alpha at this point. Weka provides a kappa statistic in its evaluation package, but I think it can only evaluate a classifier and I'm not at that stage yet (because I'm still building the data set and classes). Thanks.
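
Fleiss' kappa is small enough to compute by hand if no library turns up. The sketch below is in Python rather than Java and follows the standard definition: with N subjects, n ratings per subject, and n_ij ratings of subject i in category j, P_i = (Σ_j n_ij² − n) / (n(n − 1)), P̄ is the mean of the P_i, P_e = Σ_j p_j² with p_j the overall share of category j, and κ = (P̄ − P_e) / (1 − P_e). The function name and input layout are assumptions.

```python
def fleiss_kappa(counts):
    """counts[i][j] = number of raters who assigned subject i to category j.

    Assumes at least two ratings per subject, the same number for every
    subject, and that not all ratings fall in a single category (which
    would make the denominator zero).
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must have the same number of ratings")
    n_categories = len(counts[0])

    # Per-subject agreement P_i, then its mean.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects

    # Chance agreement from the overall category proportions p_j.
    p_j = [
        sum(row[j] for row in counts) / (n_subjects * n_raters)
        for j in range(n_categories)
    ]
    p_e = sum(p * p for p in p_j)

    return (p_bar - p_e) / (1 - p_e)
```

Krippendorff's alpha is more involved (it handles missing ratings and different distance metrics) and is not covered by this sketch.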

Which OSS can extract a synopsis from a text?

- by Aaron Digulla

Is there any OSS that can compress a text to a synopsis? My goal is to build an editor for SciFi novels which can either automatically create synopses for chapters or at least make a suggestion for one.

SQL Server 2008 Restore from Backup fails with error 3241 'cannot process this media family'

- by pearcewg

I am attempting to back up a database from a SQL Server instance on one machine and restore it to another, and I am encountering the frequently reported 'SQL Server cannot process this media family' error. Both instances are SQL Server 2008, but at different patch levels. Restore: 10.0.2531.0. Backup: 10.0.1600.22 ((SQL_PreRelease).080709-1414 ). The restore DB is Express. Not sure about the backup version. The backup instance is on a virtual private server. The restore instance is on my development box. When I restore to a different database on the source (backup) server, it restores fine. There is lots of material on Google and some on Stack Overflow about this issue, but nothing that matches this exact situation. Any thoughts? It should be straightforward to do a backup and restore from one machine to another (I have done this thousands of times with SQL 6.5, 7, 2000, and 2005). Any ideas how to restore a database in this situation, given this error? PARTIAL RESOLUTION: When I restored to a different box, running SQL 2008 Express on Windows Server 2003, all worked well. It just wouldn't work on the Windows 7 box. Not sure why. If anyone else has had a similar experience, please let me know (there are many similar issues in different forums out there).

Squid external_acl_type Cannot run process

- by Alex Rezistorman

I want to restrict uploading for a group of users via Squid, so I've chosen to use external_acl_type, but after reloading Squid it returns an error:
    WARNING: Cannot run '/usr/local/etc/squid/lists/newupload.sh' process.
The permissions of newupload.sh and squid are the same, and newupload.sh is executable. How can I solve this problem? Thanks in advance.
newupload.sh:
    #!/bin/sh
    while read line; do
        set -- $line
        length=$1
        limit=$2
        if [ -z "$length" ] || [ "$length" -le "$2" ]; then
            echo OK
        else
            echo ERR
        fi
    done
Lines from squid.conf:
    external_acl_type request_body protocol=2.5 %{Content-Lenght} /usr/local/etc/squid/lists/newupload.sh
    acl request_max_size external request_body 5000
    http_access allow users request_max_size
Squid version (squid -v):
    Squid Cache: Version 3.2.13
    configure options: '--with-default-user=squid' '--bindir=/usr/local/sbin' '--sbindir=/usr/local/sbin' '--datadir=/usr/local/etc/squid' '--libexecdir=/usr/local/libexec/squid' '--localstatedir=/var' '--sysconfdir=/usr/local/etc/squid' '--with-logdir=/var/log/squid' '--with-pidfile=/var/run/squid/squid.pid' '--with-swapdir=/var/squid/cache/squid' '--enable-auth' '--enable-build-info' '--enable-loadable-modules' '--enable-removal-policies=lru heap' '--disable-epoll' '--disable-linux-netfilter' '--disable-linux-tproxy' '--disable-translation' '--enable-auth-basic=PAM' '--disable-auth-digest' '--enable-external-acl-helpers= kerberos_ldap_group' '--enable-auth-negotiate=kerberos' '--disable-auth-ntlm' '--without-pthreads' '--enable-storeio=diskd ufs' '--enable-disk-io=AIO Blocking DiskDaemon IpcIo Mmapped' '--enable-log-daemon-helpers=file' '--disable-url-rewrite-helpers' '--disable-ipv6' '--disable-snmp' '--disable-htcp' '--disable-forw-via-db' '--disable-cache-digests' '--disable-wccp' '--disable-wccpv2' '--disable-ident-lookups' '--disable-eui' '--disable-ipfw-transparent' '--disable-pf-transparent' '--disable-ipf-transparent' '--disable-follow-x-forwarded-for' '--disable-ecap' '--disable-icap-client' '--disable-esi' '--enable-kqueue' '--with-large-files' '--enable-cachemgr-hostname=proxy.adir.vbr.ua' '--with-filedescriptors=131072' '--disable-auto-locale' '--prefix=/usr/local' '--mandir=/usr/local/man' '--infodir=/usr/local/info/' '--build=amd64-portbld-freebsd8.3' 'build_alias=amd64-portbld-freebsd8.3' 'CC=cc' 'CFLAGS=-O2 -fno-strict-aliasing -frename-registers -fweb -fforce-addr -fmerge-all-constants -maccumulate-outgoing-args -pipe -march=core2 -I/usr/local/include -DLDAP_DEPRECATED' 'LDFLAGS= -L/usr/local/lib' 'CPPFLAGS=-I/usr/local/include' 'CXX=c++' 'CXXFLAGS=-O2 -fno-strict-aliasing -frename-registers -fweb -fforce-addr -fmerge-all-constants -maccumulate-outgoing-args -pipe -march=core2 -I/usr/local/include -DLDAP_DEPRECATED' 'CPP=cpp' --enable-ltdl-convenience
Related post: Restrict uploading for groups in squid http://squid-web-proxy-cache.1019090.n4.nabble.com/flexible-managing-of-request-body-max-size-with-squid-2-5-STABLE12-td1022653.html

Redhat 5.5: Multi-thread process only uses 1 CPU of the available 8

- by Tonny

Weird situation: Red Hat Enterprise 5.5 (stock install, no updates, x64) on an HP z800 workstation. (Dual Xeon 2.2 GHz, 8 cores, 16 if you count Hyper-Threading. RH sees 16 cores.) We have an application that can utilize 1, 2 or 4 threads for heavy calculations. Somehow all these threads run on the same core at 100% load (the other 15 cores are nearly idle), so there is absolutely no benefit from the extra threads. In fact there is a slight slowdown as the threads get in each other's way on the single core. How do I get them to run on separate cores (if possible)? The application is 64-bit. I can't change anything about the software except the threads setting. Is there some obscure Linux setting I can try to change? (I'm a Tru64 and AIX guy. I use Linux, but have no in-depth knowledge of process scheduling on Linux.)

Apache2 refuses to process php files - "Snow Leopard" OSX 10.6.4

- by w-01

I have a MacBook Pro i5. My understanding is that by default it should be able to serve PHP 5. I have uncommented the relevant line in /etc/apache2/httpd.conf:
    LoadModule php5_module libexec/apache2/libphp5.so
I have restarted Apache with sudo apachectl -k restart, and when I try to access a file with a .php extension, Apache prompts me to download the file; i.e., instead of processing the PHP and sending me HTML, it thinks I want to download the file. When I look in the Apache error log I see this:
    [Fri Nov 12 10:16:14 2010] [notice] Apache/2.2.14 (Unix) PHP/5.3.2 mod_ssl/2.2.14 OpenSSL/0.9.8l DAV/2 mod_wsgi/3.2 Python/2.6.1 configured -- resuming normal operations
So it looks like PHP 5 is loading properly. I'd like to know either: How do I fix this? Or: how do I reinstall Apache 2 so that it's as if I had just installed the OS? Thanks in advance.
Update: @Zayne - the end of my httpd.conf has
    Include /private/etc/apache2/other/*.conf
and I have a file /etc/apache2/other/php.conf with the contents
    <IfModule php5_module>
        AddType application/x-httpd-php .php
        AddType application/x-httpd-php-source .phps
        <IfModule dir_module>
            DirectoryIndex index.html index.php
        </IfModule>
    </IfModule>
@Zayne I've already copied php.ini.default to php.ini in the same folder. When I run sudo apachectl configtest I get
    /usr/sbin/apachectl: line 82: ulimit: open files: cannot modify limit: Invalid argument
    httpd: Could not reliably determine the server's fully qualified domain name, using ::1 for ServerName
    Syntax OK
Furthermore, I decided to try apachectl -M, which shows all loaded modules. Most importantly, in the list of loaded modules I got
    Loaded Modules: php5_module (shared)
Since the module is being loaded, it seems the issue has more to do with making Apache use the PHP engine to process the PHP files... so is something wrong with the IfModule directive?

Sporadic crash of master-slave MySQL replication process

- by obarshay

Hello, I was wondering if someone has experienced this and can perhaps provide some insight into this issue. We have a plain-vanilla MySQL master-slave replication set up. The tables are MyISAM and the master can get quite read/write active. We use the slave instance to perform full daily backups in order to avoid bringing down the master server. The backup process does the following:
    STOP SLAVE SQL_THREAD
    mysqlhotcopy all tables
    START SLAVE SQL_THREAD
Every once in a while (once a month or so) the replication breaks with varying error messages indicating a corrupt query or log file. Here's one that happened last night:
    mysql> show slave status \G
    *************************** 1. row ***************************
    Slave_IO_State: Waiting for master to send event
    Master_Host: server8.propreports.com
    Master_User: nexus8
    Master_Port: 3306
    Connect_Retry: 60
    Master_Log_File: bin.000045
    Read_Master_Log_Pos: 581644327
    Relay_Log_File: relay.000086
    Relay_Log_Pos: 94131
    Relay_Master_Log_File: bin.000045
    Slave_IO_Running: Yes
    Slave_SQL_Running: No
    Replicate_Do_DB:
    Replicate_Ignore_DB:
    Replicate_Do_Table:
    Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
    Replicate_Wild_Ignore_Table:
    Last_Errno: 1064
    Last_Error: Error 'You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '138070603'£' at line 1' on query. Default database: 'wtsdb'. Query: 'UPDATE fill SET clearing_fee='0.0E id='138070603'£'
    Skip_Counter: 0
    Exec_Master_Log_Pos: 4164743
    Relay_Log_Space: 577574251
    Until_Condition: None
    Until_Log_File:
    Until_Log_Pos: 0
    Master_SSL_Allowed: No
    Master_SSL_CA_File:
    Master_SSL_CA_Path:
    Master_SSL_Cert:
    Master_SSL_Cipher:
    Master_SSL_Key:
    Seconds_Behind_Master: NULL
I use the following procedure to recover from the above error and resume replication:
    stop slave;
    change master to MASTER_LOG_POS = 4164743, MASTER_LOG_FILE = 'bin.000045';
    start slave;
We have multiple servers set up this way and they all sporadically stop replicating with a similar error. Any advice on how to resolve this would be greatly appreciated.

USB drive dead after stopping copying process on Snow Leopard Server

- by Anriëtte Combrink

Hi there, I was copying to a flash drive from our Snow Leopard server when I stopped the copying process halfway through. The device then disappeared from the Desktop, so I unplugged it and plugged it right back in. The device just didn't show up. I then plugged it into a Windows XP machine as well as a Windows 7 machine. On both machines, I right-clicked "My Computer" and selected "Manage…". On both PCs, the device was listed under Removable Storage, but had no size and no drive letter. It shows up in "My Computer", but when I choose "Format…" from the right-click (context) menu, it says the drive could not be formatted. Can someone please advise me? The flash drive is about 5 minutes old and should have no reason to be dead. I really can't lose this drive (I don't need the data on it, I just need it to work again). Any help would be appreciated. Thanks in advance.

Search Results

Search found 33424 results on 1337 pages for 'natural language process'.
