stanford nlp - Page 4 - Developer IT

Algorithm to classify a list of products?

- by Martin

I have a list representing products which are more or less the same. For instance, in the list below, they are all Seagate hard drives. Seagate Hard Drive 500Go Seagate Hard Drive 120Go for laptop Seagate Barracuda 7200.12 ST3500418AS 500GB 7200 RPM SATA 3.0Gb/s Hard Drive New and shinny 500Go hard drive from Seagate Seagate Barracuda 7200.12 Seagate FreeAgent Desk 500GB External Hard Drive Silver 7200RPM USB2.0 Retail For a human being, the hard drives 3 and 5 are the same. We could go a little bit further and suppose that the products 1, 3, 4 and 5 are the same and put in other categories the product 2 and 6. We have a huge list of products that I would like to classify. Does anybody have an idea of what would be the best algorithm to do such thing. Any suggestions? I though of a Bayesian classifier but I am not sure if it is the best choice. Any help would be appreciated! Thanks.

Read the article

Natural Language Processing in Ruby

- by Joey Robert

I'm looking to do some sentence analysis (mostly for twitter apps) and infer some general characteristics. Are there any good natural language processing libraries for this sort of thing in Ruby? Similar to http://stackoverflow.com/questions/870460/java-is-there-a-good-natural-language-processing-library but for Ruby. I'd prefer something very general, but any leads are appreciated!

Read the article

Algorithm to match natural text in mail

- by snøreven

I need to separate natural, coherent text/sentences in emails from lists, signatures, greetings and so on before further processing. example: Hi tom, last monday we did bla bla, lore Lorem ipsum dolor sit amet, consectetur adipisici elit, sed eiusmod tempor incidunt ut labore et dolore magna aliqua. list item 2 list item 3 list item 3 Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquid x ea commodi consequat. Quis aute iure reprehenderit in voluptate velit regards, K. ---line-of-funny-characters-####### example inc. 33 evil street, london mobile: 00 234534/234345 Ideally the algorithm would match only the bold parts. Is there any recommended approach - or are there even existing algorithms for that problem? Should I try approximate regular expressions or more statistical stuff based on number of punctation marks, length and so on?

Read the article

How does Amazon's Statistically Improbable Phrases work?

- by ??iu

How does something like Statistically Improbable Phrases work? According to amazon: Amazon.com's Statistically Improbable Phrases, or "SIPs", are the most distinctive phrases in the text of books in the Search Inside!™ program. To identify SIPs, our computers scan the text of all books in the Search Inside! program. If they find a phrase that occurs a large number of times in a particular book relative to all Search Inside! books, that phrase is a SIP in that book. SIPs are not necessarily improbable within a particular book, but they are improbable relative to all books in Search Inside!. For example, most SIPs for a book on taxes are tax related. But because we display SIPs in order of their improbability score, the first SIPs will be on tax topics that this book mentions more often than other tax books. For works of fiction, SIPs tend to be distinctive word combinations that often hint at important plot elements. For instance, for Joel's first book, the SIPs are: leaky abstractions, antialiased text, own dog food, bug count, daily builds, bug database, software schedules One interesting complication is that these are phrases of either 2 or 3 words. This makes things a little more interesting because these phrases can overlap with or contain each other.

Read the article

PyParsing: Is this correct use of setParseAction()?

- by Rosarch

I have strings like this: "MSE 2110, 3030, 4102" I would like to output: [("MSE", 2110), ("MSE", 3030), ("MSE", 4102)] This is my way of going about it, although I haven't quite gotten it yet: def makeCourseList(str, location, tokens): print "before: %s" % tokens for index, course_number in enumerate(tokens[1:]): tokens[index + 1] = (tokens[0][0], course_number) print "after: %s" % tokens course = Group(DEPT_CODE + COURSE_NUMBER) # .setResultsName("Course") course_data = (course + ZeroOrMore(Suppress(',') + COURSE_NUMBER)).setParseAction(makeCourseList) This outputs: >>> course.parseString("CS 2110") ([(['CS', 2110], {})], {}) >>> course_data.parseString("CS 2110, 4301, 2123, 1110") before: [['CS', 2110], 4301, 2123, 1110] after: [['CS', 2110], ('CS', 4301), ('CS', 2123), ('CS', 1110)] ([(['CS', 2110], {}), ('CS', 4301), ('CS', 2123), ('CS', 1110)], {}) Is this the right way to do it, or am I totally off? Also, the output of isn't quite correct - I want course_data to emit a list of course symbols that are in the same format as each other. Right now, the first course is different from the others. (It has a {}, whereas the others don't.)

Read the article

How to estimate the quality of a web page?

- by roddik

Hello, I'm doing a university project, that must gather and combine data on a user provided topic. The problem I've encountered is that Google search results for many terms are polluted with low quality autogenerated pages and if I use them, I can end up with wrong facts. How is it possible to estimate the quality/trustworthiness of a page? You may think "nah, Google engineers are working on the problem for 10 years and he's asking for a solution", but if you think about it, SE must provide up-to-date content and if it marks a good page as a bad one, users will be dissatisfied. I don't have such limitations, so if the algorithm accidentally marks as bad some good pages, that wouldn't be a problem.

Read the article

Measuring the performance of classification algorithm

- by Silver Dragon

I've got a classification problem in my hand, which I'd like to address with a machine learning algorithm ( Bayes, or Markovian probably, the question is independent on the classifier to be used). Given a number of training instances, I'm looking for a way to measure the performance of an implemented classificator, with taking data overfitting problem into account. That is: given N[1..100] training samples, if I run the training algorithm on every one of the samples, and use this very same samples to measure fitness, it might stuck into a data overfitting problem -the classifier will know the exact answers for the training instances, without having much predictive power, rendering the fitness results useless. An obvious solution would be seperating the hand-tagged samples into training, and test samples; and I'd like to learn about methods selecting the statistically significant samples for training. White papers, book pointers, and PDFs much appreciated!

Read the article

Python/YACC Lexer: Token priority?

- by Rosarch

I'm trying to use reserved words in my grammar: reserved = { 'if' : 'IF', 'then' : 'THEN', 'else' : 'ELSE', 'while' : 'WHILE', } tokens = [ 'DEPT_CODE', 'COURSE_NUMBER', 'OR_CONJ', 'ID', ] + list(reserved.values()) t_DEPT_CODE = r'[A-Z]{2,}' t_COURSE_NUMBER = r'[0-9]{4}' t_OR_CONJ = r'or' t_ignore = ' \t' def t_ID(t): r'[a-zA-Z_][a-zA-Z_0-9]*' if t.value in reserved.values(): t.type = reserved[t.value] return t return None However, the t_ID rule somehow swallows up DEPT_CODE and OR_CONJ. How can I get around this? I'd like those two to take higher precedence than the reserved words.

Read the article

How to conjugate English words in Java?

- by roddik

Hello. Say I have a base form of a word and a tag from the Penn Treebank Tag Set. How can I get the conjugated form? For example for "do" and "VBN" how can I get "done"?

Read the article

How to write efficient code for extracting Noun phrases?

- by Arun Abraham

I am trying to extract phrases using rules such as the ones mentioned below on text which has been POS tagged 1) NNP - NNP (- indicates followed by) 2) NNP - CC - NNP 3) VP - NP etc.. I have written code in this manner, Can someone tell me how i can do in a better manner. List<String> nounPhrases = new ArrayList<String>(); for (List<HasWord> sentence : documentPreprocessor) { //System.out.println(sentence.toString()); System.out.println(Sentence.listToString(sentence, false)); List<TaggedWord> tSentence = tagger.tagSentence(sentence); String lastTag = null, lastWord = null; for (TaggedWord taggedWord : tSentence) { if (lastTag != null && taggedWord.tag().equalsIgnoreCase("NNP") && lastTag.equalsIgnoreCase("NNP")) { nounPhrases.add(taggedWord.word() + " " + lastWord); //System.out.println(taggedWord.word() + " " + lastWord); } lastTag = taggedWord.tag(); lastWord = taggedWord.word(); } } In the above code, i have done only for NNP followed by NNP extraction, how can i generalise it so that i can add other rules too. I know that there are libraries available for doing this , but wanted to do this manually.

Read the article

Searching Natural Language Sentence Structure

- by Cerin

What's the best way to store and search a database of natural language sentence structure trees? Using OpenNLP's English Treebank Parser, I can get fairly reliable sentence structure parsings for arbitrary sentences. What I'd like to do is create a tool that can extract all the doc strings from my source code, generate these trees for all sentences in the doc strings, store these trees and their associated function name in a database, and then allow a user to search the database using natural language queries. So, given the sentence "This uploads files to a remote machine." for the function upload_files(), I'd have the tree: (TOP (S (NP (DT This)) (VP (VBZ uploads) (NP (NNS files)) (PP (TO to) (NP (DT a) (JJ remote) (NN machine)))) (. .))) If someone entered the query "How can I upload files?", equating to the tree: (TOP (SBARQ (WHADVP (WRB How)) (SQ (MD can) (NP (PRP I)) (VP (VB upload) (NP (NNS files)))) (. ?))) how would I store and query these trees in a SQL database? I've written a simple proof-of-concept script that can perform this search using a mix of regular expressions and network graph parsing, but I'm not sure how I'd implement this in a scalable way. And yes, I realize my example would be trivial to retrieve using a simple keyword search. The idea I'm trying to test is how I might take advantage of grammatical structure, so I can weed-out entries with similar keywords, but a different sentence structure. For example, with the above query, I wouldn't want to retrieve the entry associated with the sentence "Checks a remote machine to find a user that uploads files." which has similar keywords, but is obviously describing a completely different behavior.

Read the article

Python: How best to parse a simple grammar?

- by Rosarch

Ok, so I've asked a bunch of smaller questions about this project, but I still don't have much confidence in the designs I'm coming up with, so I'm going to ask a question on a broader scale. I am parsing pre-requisite descriptions for a course catalog. The descriptions almost always follow a certain form, which makes me think I can parse most of them. From the text, I would like to generate a graph of course pre-requisite relationships. (That part will be easy, after I have parsed the data.) Some sample inputs and outputs: "CS 2110" => ("CS", 2110) # 0 "CS 2110 and INFO 3300" => [("CS", 2110), ("INFO", 3300)] # 1 "CS 2110, INFO 3300" => [("CS", 2110), ("INFO", 3300)] # 1 "CS 2110, 3300, 3140" => [("CS", 2110), ("CS", 3300), ("CS", 3140)] # 1 "CS 2110 or INFO 3300" => [[("CS", 2110)], [("INFO", 3300)]] # 2 "MATH 2210, 2230, 2310, or 2940" => [[("MATH", 2210), ("MATH", 2230), ("MATH", 2310)], [("MATH", 2940)]] # 3 If the entire description is just a course, it is output directly. If the courses are conjoined ("and"), they are all output in the same list If the course are disjoined ("or"), they are in separate lists Here, we have both "and" and "or". One caveat that makes it easier: it appears that the nesting of "and"/"or" phrases is never greater than as shown in example 3. What is the best way to do this? I started with PLY, but I couldn't figure out how to resolve the reduce/reduce conflicts. The advantage of PLY is that it's easy to manipulate what each parse rule generates: def p_course(p): 'course : DEPT_CODE COURSE_NUMBER' p[0] = (p[1], int(p[2])) With PyParse, it's less clear how to modify the output of parseString(). I was considering building upon @Alex Martelli's idea of keeping state in an object and building up the output from that, but I'm not sure exactly how that is best done. def addCourse(self, str, location, tokens): self.result.append((tokens[0][0], tokens[0][1])) def makeCourseList(self, str, location, tokens): dept = tokens[0][0] new_tokens = [(dept, tokens[0][1])] new_tokens.extend((dept, tok) for tok in tokens[1:]) self.result.append(new_tokens) For instance, to handle "or" cases: def __init__(self): self.result = [] # ... self.statement = (course_data + Optional(OR_CONJ + course_data)).setParseAction(self.disjunctionCourses) def disjunctionCourses(self, str, location, tokens): if len(tokens) == 1: return tokens print "disjunction tokens: %s" % tokens How does disjunctionCourses() know which smaller phrases to disjoin? All it gets is tokens, but what's been parsed so far is stored in result, so how can the function tell which data in result corresponds to which elements of token? I guess I could search through the tokens, then find an element of result with the same data, but that feel convoluted... What's a better way to approach this problem?

Read the article

Online job-searching is tedious. Help me automate it.

- by ehsanul

Many job sites have broken searches that don't let you narrow down jobs by experience level. Even when they do, it's usually wrong. This requires you to wade through hundreds of postings that you can't apply for before finding a relevant one, quite tedious. Since I'd rather focus on writing cover letters etc., I want to write a program to look through a large number of postings, and save the URLs of just those jobs that don't require years of experience. I don't require help writing the scraper to get the html bodies of possibly relevant job posts. The issue is accurately detecting the level of experience required for the job. This should not be too difficult as job posts are usually very explicit about this ("must have 5 years experience in..."), but there may be some issues with overly simple solutions. In my case, I'm looking for entry-level positions. Often they don't say "entry-level", but inclusion of the words probably means the job should be saved. Next, I can safely exclude a job the says it requires "5 years" of experience in whatever, so a regex like /\d\syears/ seems reasonable to exclude jobs. But then, I realized some jobs say they'll take 0-2 years of experience, matches the exclusion regex but is clearly a job I want to take a look at. Hmmm, I can handle that with another regex. But some say "less than 2 years" or "fewer than 2 years". Can handle that too, but it makes me wonder what other patterns I'm not thinking of, and possibly excluding many jobs. That's what brings me here, to find a better way to do this than regexes, if there is one. I'd like to minimize the false negative rate and save all the jobs that seem like they might not require many years of experience. Does excluding anything that matches /[3-9]\syears|1\d\syears/ seem reasonable? Or is there a better way? Training a bayesian filter maybe?

Read the article

POS tagger in SharpNLP

- by C.

I am using SharpNLP for my POS tagging: EnglishMaximumEntropyPosTagger posTagger = new EnglishMaximumEntropyPosTagger(mModelPath); String tagSentence = posTagger.TagSentence(question); I only have 3 tags. How can I load a set of Penn treebank or some other tagging tree banks to use? Thanks :)

Read the article

Corpus/data set of English words with syllabic stress information?

- by endtime

I know this is a long shot, but does anyone know of a dataset of English words that has stress information by syllable? Something as simple as the following would be fantastic: AARD vark A ble a BOUT ac COUNT AC id ad DIC tion ad VERT ise ment ... Thanks in advance!

Read the article

Keyword sorting algorithm

- by Nai

I have over 1000 surveys, many of which contains open-ended replies. I would like to be able to 'parse' in all the words and get a ranking of the most used words (disregarding common words) to spot a trend. How can I do this? Is there a program I can use? EDIT If a 3rd party solution is not available, it would be great if we can keep the discussion to microsoft technologies only. Cheers.

Read the article

Natural Language Processing Solution in Java ?

- by Kim Jong Woo

Are there any equally great packages like Python's NTLK in Java world ?

Read the article

A StringToken Parser which gives Google Search style "Did you mean:" Suggestions

- by _ande_turner_

Seeking a method to: Take whitespace separated tokens in a String; return a suggested Word ie: Google Search can take "fonetic wrd nterpreterr", and atop of the result page it shows "Did you mean: phonetic word interpreter" A solution in any of the C* languages or Java would be preferred. Are there any existing Open Libraries which perform such functionality? Or is there a way to Utilise a Google API to request a suggested word?

Read the article

Python: Trouble with YACC

- by Rosarch

I'm parsing sentences like: "CS 2310 or equivalent experience" The desired output: [[("CS", 2310)], ["equivalent experience"]] YACC tokenizer symbols: tokens = [ 'DEPT_CODE', 'COURSE_NUMBER', 'OR_CONJ', 'MISC_TEXT', ] t_DEPT_CODE = r'[A-Z]{2,}' t_COURSE_NUMBER = r'[0-9]{4}' t_OR_CONJ = r'or' t_ignore = ' \t' terms = {'DEPT_CODE': t_DEPT_CODE, 'COURSE_NUMBER': t_COURSE_NUMBER, 'OR_CONJ': t_OR_CONJ} for name, regex in terms.items(): terms[name] = "^%s$" % regex def t_MISC_TEXT(t): r'\S+' for name, regex in terms.items(): # print "trying to match %s with regex %s" % (t.value, regex) if re.match(regex, t.value): t.type = name return t return t (MISC_TEXT is meant to match anything not caught by the other terms.) Some relevant rules from the parser: precedence = ( ('left', 'MISC_TEXT'), ) def p_statement_course_data(p): 'statement : course_data' p[0] = p[1] def p_course_data(p): 'course_data : course' p[0] = p[1] def p_course(p): 'course : DEPT_CODE COURSE_NUMBER' p[0] = make_course(p[1], int(p[2])) def p_or_phrase(p): 'or_phrase : statement OR_CONJ statement' p[0] = [[p[1]], [p[3]]] def p_misc_text(p): '''text_aggregate : MISC_TEXT MISC_TEXT | MISC_TEXT text_aggregate | text_aggregate MISC_TEXT ''' p[0] = "%s %s" % (p[0], [1]) def p_text_aggregate_statement(p): 'statement : text_aggregate' p[0] = p[1] Unfortunately, this fails: # works as it should >>> token_list("CS 2110 or equivalent experience") [LexToken(DEPT_CODE,'CS',1,0), LexToken(COURSE_NUMBER,'2110',1,3), LexToken(OR_CONJ,'or',1,8), LexToken(MISC_TEXT,'equivalent',1,11), LexToken(MISC_TEXT,'experience',1,22)] # fails. bummer. >>> parser.parse("CS 2110 or equivalent experience") Syntax error in input: LexToken(MISC_TEXT,'equivalent',1,11) What am I doing wrong? I don't fully understand how to set precedence rules. Also, this is my error function: def p_error(p): print "Syntax error in input: %s" % p Is there a way to see which rule the parser was trying when it failed? Or some other way to make the parser print which rules its trying?

Read the article

Learning Objective-C: Need advice on populating NSMutableDictionary

- by Zigrivers

I am teaching myself Objective-C utilizing a number of resources, one of which is the Stanford iPhone Dev class available via iTunes U (past 2010 class). One of the home work assignments asked that I populate a mutable dictionary with a predefined list of keys and values (URLs). I was able to put the code together, but as I look at it, I keep thinking there is probably a much better way for me to approach what I'm trying to do: Populate a NSMutableDictionary with the predefined keys and values Enumerate through the keys of the dictionary and check each key to see if it starts with "Stanford" If it meets the criteria, log both the key and the value I would really appreciate any feedback on how I might improve on what I've put together. I'm the very definition of a beginner, but I'm really enjoying the challenge of learning Objective-C. void bookmarkDictionary () { NSMutableDictionary* bookmarks = [NSMutableDictionary dictionary]; NSString* one = @"Stanford University", *two = @"Apple", *three = @"CS193P", *four = @"Stanford on iTunes U", *five = @"Stanford Mall"; NSString* urlOne = @"http://www.stanford.edu", *urlTwo = @"http://www.apple.com", *urlThree = @"http://cs193p.stanford.edu", *urlFour = @"http://itunes.stanford.edu", *urlFive = @"http://stanfordshop.com"; NSURL* oneURL = [NSURL URLWithString:urlOne]; NSURL* twoURL = [NSURL URLWithString:urlTwo]; NSURL* threeURL = [NSURL URLWithString:urlThree]; NSURL* fourURL = [NSURL URLWithString:urlFour]; NSURL* fiveURL = [NSURL URLWithString:urlFive]; [bookmarks setObject:oneURL forKey:one]; [bookmarks setObject:twoURL forKey:two]; [bookmarks setObject:threeURL forKey:three]; [bookmarks setObject:fourURL forKey:four]; [bookmarks setObject:fiveURL forKey:five]; NSString* akey; NSString* testString = @"Stanford"; for (akey in bookmarks) { if ([akey hasPrefix:testString]) { NSLog(@"Key: %@ URL: %@", akey, [bookmarks objectForKey:akey]); } } } Thanks for your help!

Read the article

Agile methodologies. Is it a by-product of mind control techniques as NLP / Scientology?

- by Bobb

The more I read about contemporary methods combinging scrum, tdd and xp, the more I feel like I already seen the methods. I am not arguing that agile approach is much more progressive than older rigid structures like waterfall, what I am saying is that it seems to me that agile methodologies are ideal to be used as a nest for a brainwashing business. I read few articles which kept referring to authors which I checked afterwards and they call themselves - coaches, trainers (usual thing when NLP specialists are involved) with no apparent software development history. Also I met a guy who is a scrum faciltator (term widly used in relation to scientology) in a high profile company. I talked to him less than 5 min but I got complete feeling that he is either on drugs or he has been programmed by a powerful NLP specialist. The way to talk and his body movements witnessed he is not an average normal person (in terms of normal distribution :))... Please dont get me wrong. I am not a fun of conspiracy theories. But I had an experience with a member of church of scientology tried to invade a commercial firm and actually went half way through to very top in just 3 weeks. I saw his work. For now I have complete impression is that psycho manipulators are now invading IT industry through the convenient door of agile techniques. Anyone has the same feeling/thoughts?

Read the article

WhatApp?

Web and mobile apps come under review on new Stanford site sport - Games - Video Games - Stanford University - Mobile

Read the article

Classnotfound exception while running hadoop

- by vana

Hi, I am new to hadoop. I have a file Wordcount.java which refers hadoop.jar and stanford-parser.jar I am running the following commnad javac -classpath .:hadoop-0.20.1-core.jar:stanford-parser.jar -d ep WordCount.java jar cvf ep.jar -C ep . bin/hadoop jar ep.jar WordCount gutenburg gutenburg1 After executing i am getting the following error: lang.ClassNotFoundException: edu.stanford.nlp.parser.lexparser.LexicalizedParser The class is in stanford-parser.jar ... What can be the possible problem? Thanks

Read the article

Ubuntu server has slow performance

- by Rich

I have a custom built Ubuntu 11.04 server with a 6 disk software RAID 10 primary drive. On it I'm primarily running a PostgreSQL and a few other utilities that stream data from the web. I often find after a few hours of uptime the server starts to lag with all kinds of processes. For example, it may take 10-15 seconds after log-in to get a shell prompt. It might take 5-10 seconds for top to come up. An ls might take a second or two. When I look at top there is almost no CPU usage. There's a fair amount of memory used by the PostgreSQL server but not enough to bleed into swap. I have no idea where to go from here, other than to suspect the RAID10 (I've only ever had software RAID 1's before). Edit: Output from top: top - 11:56:03 up 1:46, 3 users, load average: 0.89, 0.73, 0.72 Tasks: 119 total, 1 running, 118 sleeping, 0 stopped, 0 zombie Cpu(s): 0.2%us, 0.0%sy, 0.0%ni, 93.5%id, 6.2%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 16325596k total, 3478248k used, 12847348k free, 20880k buffers Swap: 19534176k total, 0k used, 19534176k free, 3041992k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 1747 woodsp 20 0 109m 10m 4888 S 1 0.1 0:42.70 python 357 root 20 0 0 0 0 S 0 0.0 0:00.40 jbd2/sda3-8 1 root 20 0 24324 2284 1344 S 0 0.0 0:00.84 init 2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd 3 root 20 0 0 0 0 S 0 0.0 0:00.24 ksoftirqd/0 6 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/0 7 root RT 0 0 0 0 S 0 0.0 0:00.01 watchdog/0 8 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/1 10 root 20 0 0 0 0 S 0 0.0 0:00.02 ksoftirqd/1 12 root RT 0 0 0 0 S 0 0.0 0:00.01 watchdog/1 13 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/2 14 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/2:0 15 root 20 0 0 0 0 S 0 0.0 0:00.00 ksoftirqd/2 16 root RT 0 0 0 0 S 0 0.0 0:00.01 watchdog/2 17 root RT 0 0 0 0 S 0 0.0 0:00.00 migration/3 18 root 20 0 0 0 0 S 0 0.0 0:00.00 kworker/3:0 19 root 20 0 0 0 0 S 0 0.0 0:00.02 ksoftirqd/3 20 root RT 0 0 0 0 S 0 0.0 0:00.01 watchdog/3 21 root 0 -20 0 0 0 S 0 0.0 0:00.00 cpuset 22 root 0 -20 0 0 0 S 0 0.0 0:00.00 khelper 23 root 20 0 0 0 0 S 0 0.0 0:00.00 kdevtmpfs 24 root 0 -20 0 0 0 S 0 0.0 0:00.00 netns 26 root 20 0 0 0 0 S 0 0.0 0:00.00 sync_supers df -h rpsharp@ncp-skookum:~$ df -h Filesystem Size Used Avail Use% Mounted on /dev/sda3 1.8T 549G 1.2T 32% / udev 7.8G 4.0K 7.8G 1% /dev tmpfs 3.2G 492K 3.2G 1% /run none 5.0M 0 5.0M 0% /run/lock none 7.8G 0 7.8G 0% /run/shm /dev/sda2 952M 128K 952M 1% /boot/efi /dev/md0 5.5T 562G 4.7T 11% /usr/local free -m psharp@ncp-skookum:~$ free -m total used free shared buffers cached Mem: 15942 3409 12533 0 20 2983 -/+ buffers/cache: 405 15537 Swap: 19076 0 19076 tail -50 /var/log/syslog Jul 3 06:31:32 ncp-skookum rsyslogd: [origin software="rsyslogd" swVersion="5.8.6" x-pid="1070" x-info="http://www.rsyslog.com"] rsyslogd was HUPed Jul 3 06:39:01 ncp-skookum CRON[14211]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jul 3 06:40:01 ncp-skookum CRON[14223]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Jul 3 07:00:01 ncp-skookum CRON[14328]: (woodsp) CMD (/home/woodsp/bin/mail_tweetupdate # email an update) Jul 3 07:00:01 ncp-skookum CRON[14327]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Jul 3 07:00:28 ncp-skookum sendmail[14356]: q63E0SoZ014356: from=woodsp, size=2328, class=0, nrcpts=2, msgid=<201207031400.q63E0SoZ014356@ncp-skookum.Stanford.EDU>, relay=woodsp@localhost Jul 3 07:00:29 ncp-skookum sm-mta[14357]: q63E0Si6014357: from=<woodsp@ncp-skookum.Stanford.EDU>, size=2569, class=0, nrcpts=2, msgid=<201207031400.q63E0SoZ014356@ncp-skookum.Stanford.EDU>, proto=ESMTP, daemon=MTA-v4, relay=localhost [127.0.0.1] Jul 3 07:00:29 ncp-skookum sendmail[14356]: q63E0SoZ014356: to=Spencer Wood <[email protected]>,Martin Lacayo <[email protected]>, ctladdr=woodsp (1004/1005), delay=00:00:01, xdelay=00:00:01, mailer=relay, pri=62328, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (q63E0Si6014357 Message accepted for delivery) Jul 3 07:00:29 ncp-skookum sm-mta[14359]: STARTTLS=client, relay=mx3.stanford.edu., version=TLSv1/SSLv3, verify=FAIL, cipher=DHE-RSA-AES256-SHA, bits=256/256 Jul 3 07:00:29 ncp-skookum sm-mta[14359]: q63E0Si6014357: to=<[email protected]>,<[email protected]>, ctladdr=<woodsp@ncp-skookum.Stanford.EDU> (1004/1005), delay=00:00:01, xdelay=00:00:00, mailer=esmtp, pri=152569, relay=mx3.stanford.edu. [171.67.219.73], dsn=2.0.0, stat=Sent (Ok: queued as 8F3505802AC) Jul 3 07:09:08 ncp-skookum CRON[14396]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jul 3 07:17:01 ncp-skookum CRON[14438]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly) Jul 3 07:20:01 ncp-skookum CRON[14453]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Jul 3 07:39:01 ncp-skookum CRON[14551]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jul 3 07:40:01 ncp-skookum CRON[14562]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Jul 3 08:00:01 ncp-skookum CRON[14668]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Jul 3 08:09:01 ncp-skookum CRON[14724]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jul 3 08:17:01 ncp-skookum CRON[14766]: (root) CMD ( cd / && run-parts --report /etc/cron.hourly) Jul 3 08:20:01 ncp-skookum CRON[14781]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Jul 3 08:39:01 ncp-skookum CRON[14881]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -d /var/lib/php5 ] && find /var/lib/php5/ -depth -mindepth 1 -maxdepth 1 -type f -cmin +$(/usr/lib/php5/maxlifetime) ! -execdir fuser -s {} 2>/dev/null \; -delete) Jul 3 08:40:01 ncp-skookum CRON[14892]: (smmsp) CMD (test -x /etc/init.d/sendmail && /usr/share/sendmail/sendmail cron-msp) Output of hdparm -t /dev/sd{a,b,c,d,e,f} This looks suspicious? /dev/sda: Timing buffered disk reads: 2 MB in 4.84 seconds = 423.39 kB/sec /dev/sdb: Timing buffered disk reads: 420 MB in 3.01 seconds = 139.74 MB/sec /dev/sdc: Timing buffered disk reads: 390 MB in 3.00 seconds = 129.87 MB/sec /dev/sdd: Timing buffered disk reads: 416 MB in 3.00 seconds = 138.51 MB/sec /dev/sde: Timing buffered disk reads: 422 MB in 3.00 seconds = 140.50 MB/sec /dev/sdf: Timing buffered disk reads: 416 MB in 3.01 seconds = 138.26 MB/sec

Read the article

How to disable log4j logging from Java code

- by Erel Segal Halevi

I use a legacy library that writes logs using log4j. My default log4j.properties file directs the log to the console, but in some specific functions of my main program, I would like to disable logging altogether (from all classes). I tried this: Logger.getLogger(BasicImplementation.class.getName()).setLevel(Level.OFF); where "BasicImplementation" is one of the main classes that does logging, but it didn't work - the logs are still written to the console. Here is my log4j.properties: log4j.rootLogger=warn, stdout log4j.logger.ac.biu.nlp.nlp.engineml=info, logfile log4j.logger.org.BIU.utils.logging.ExperimentLogger=warn log4j.appender.stdout = org.apache.log4j.ConsoleAppender log4j.appender.stdout.layout = org.apache.log4j.PatternLayout log4j.appender.stdout.layout.ConversionPattern = %-5p %d{HH:mm:ss} [%t]: %m%n log4j.appender.logfile = ac.biu.nlp.nlp.log.BackupOlderFileAppender log4j.appender.logfile.append=false log4j.appender.logfile.layout = org.apache.log4j.PatternLayout log4j.appender.logfile.layout.ConversionPattern = %-5p %d{HH:mm:ss} [%t]: %m%n log4j.appender.logfile.File = logfile.log

Search Results

Search found 234 results on 10 pages for 'stanford nlp'.

Page 4/10 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 | Next Page >

- by Martin

- by Joey Robert

- by snøreven

- by ??iu

- by Rosarch

- by roddik

- by Silver Dragon

- by Rosarch

- by roddik

- by Arun Abraham

- by Cerin

- by Rosarch

- by ehsanul

- by C.

- by endtime

- by Nai

- by Kim Jong Woo

- by _ande_turner_

- by Rosarch

- by Zigrivers

- by Bobb

- by vana

- by Rich

- by Erel Segal Halevi

< Previous Page | 1 2 3 4 5 6 7 8 9 10 | Next Page >