-
as seen on Stack Overflow
Does anybody know where I can get a free web crawler that actually works with minimal coding on my part? I've googled it and can only find really old ones that don't work, or OpenWebSpider, which doesn't seem to work either.
Ideally I'd like to store just the web addresses and which links each page contains.
Any suggestions…
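A minimal sketch of what the question asks for, using only the Python standard library: a breadth-first crawl that records, for each page, the absolute links it contains. Assumptions: the crawl follows links only on the start URL's host, pages decode as UTF-8, and the names `crawl`, `fetch_page` and the `max_pages` cap are hypothetical choices for illustration. It does not check robots.txt or rate-limit requests, which a crawler pointed at other people's sites should do.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch_page(url):
    """Download a page and decode it, assuming UTF-8 (bad bytes are replaced)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=50, fetch=fetch_page):
    """Breadth-first crawl limited to start_url's host.

    Returns {page_url: [absolute http(s) links found on that page]}.
    Links to other hosts are recorded but not followed.
    """
    host = urlparse(start_url).netloc
    graph = {}
    queue = deque([start_url])
    seen = {start_url}
    while queue and len(graph) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:  # urllib's URLError/HTTPError are OSError subclasses
            continue
        parser = LinkCollector()
        parser.feed(html)
        parser.close()
        links = []
        for href in parser.hrefs:
            absolute, _fragment = urldefrag(urljoin(url, href))
            parts = urlparse(absolute)
            if parts.scheme not in ("http", "https"):
                continue  # skip mailto:, javascript:, etc.
            links.append(absolute)
            if parts.netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
        graph[url] = links
    return graph
```

For example, `crawl("https://example.com/", max_pages=20)` returns a plain dict mapping each visited URL to its outgoing links, which is easy to dump to CSV or a database table. The `fetch` parameter is there so the download step can be swapped out (for caching, politeness delays, or testing).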
-
as seen on Stack Overflow
I am building a web application crawler that's meant not only to find all the links or pages in a web application, but also to perform all the allowed actions in the app (such as pushing buttons, filling in forms, and noticing changes in the DOM even if they did not trigger a request).
Basically, this is…
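For the "find the allowed actions" part, a rough first step is to enumerate what a page lets the user do. The sketch below (standard library only; `ActionFinder` and its fields are hypothetical names) lists forms with their named fields, plus clickable elements outside forms. It only sees static HTML: DOM changes made by JavaScript, which the question also asks about, need a real browser engine driving the page, so treat this as a partial illustration rather than a design for the whole crawler.

```python
from html.parser import HTMLParser


class ActionFinder(HTMLParser):
    """Lists forms (with their named fields) and clickable elements outside forms."""

    def __init__(self):
        super().__init__()
        self.forms = []        # each: {"action": str, "method": str, "fields": [(name, type)]}
        self.buttons = []      # (tag, id-or-name) for clickables outside any form
        self._current = None   # form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "form":
            self._current = {
                "action": a.get("action") or "",
                "method": (a.get("method") or "get").lower(),
                "fields": [],
            }
        elif tag in ("input", "select", "textarea"):
            if self._current is not None and a.get("name"):
                # An <input> without a type attribute behaves as type="text".
                kind = a.get("type") or ("text" if tag == "input" else tag)
                self._current["fields"].append((a["name"], kind))
        elif tag == "button" or a.get("onclick") is not None:
            if self._current is None:  # buttons inside forms are part of the form
                self.buttons.append((tag, a.get("id") or a.get("name") or ""))

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None
```

Each recorded form is a candidate action (submit with generated values), and each clickable element is a candidate click; a browser-driven crawler would then perform them and compare the resulting DOM.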
-
as seen on Stack Overflow
I built an App Engine web app, cricket.hover.in. The web app has about 15k URLs linked in it, but even a long time after launch, no pages are indexed on Google.
Any base link placed on my root site hover.in gets indexed within minutes,
but I placed the same link on the home page of the root…
-
as seen on Programmers
I would like to extract data from the internet like www.mozenda.com does, but I want to write my own program to do that. The specific data I'm looking for is various event data.
Based on my research, I think a custom web crawler is my answer, but I would like to confirm that and see if there are any suggestion…
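If the target sites publish their events as schema.org `Event` objects in JSON-LD (an assumption: many sites don't, and then site-specific parsing is needed), a custom crawler can pull them out without per-site scraping rules. A minimal sketch using only the standard library; `extract_events` is a hypothetical name, and it only matches `@type` exactly equal to `"Event"`, so subtypes such as `MusicEvent` and `@graph` wrappers are not handled.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buf = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buf))
            self._in_jsonld = False


def extract_events(html):
    """Return name/startDate/location for each schema.org Event found in the page."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    events = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed markup: skip it rather than abort the crawl
        items = data if isinstance(data, list) else [data]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Event":
                location = item.get("location")
                if isinstance(location, dict):  # usually a nested Place object
                    location = location.get("name")
                events.append({
                    "name": item.get("name"),
                    "startDate": item.get("startDate"),
                    "location": location,
                })
    return events
```

Run per page inside a crawl loop, this turns each fetched document into a list of event records; sites without structured data would need their own extraction rules on top.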
-
as seen on Stack Overflow
I want to crawl useful resources (like background pictures) from certain websites. It is not a hard job, especially with the help of some wonderful projects like Scrapy.
The problem is that I don't just want to crawl this site ONE TIME. I also want to keep my crawl running long-term and crawl the updated…
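One common way to keep a crawl running and pick up updates is to revisit each URL on its own schedule: hash the fetched content, shorten the revisit interval when the hash changes, and lengthen it when it doesn't. The sketch below shows that idea on its own (the `RecrawlScheduler` name and the interval bounds are hypothetical); it is not a Scrapy feature, and wiring it into a Scrapy project would be up to you. HTTP conditional requests (`ETag`/`If-None-Match`, `Last-Modified`/`If-Modified-Since`) are a complementary way to avoid re-downloading unchanged pages.

```python
import hashlib
import heapq


class RecrawlScheduler:
    """Adaptive revisit schedule: changed pages come back sooner, stable ones later.

    Times and intervals are in seconds; the default bounds are arbitrary.
    """

    def __init__(self, min_interval=3600, max_interval=7 * 24 * 3600):
        self.min_interval = min_interval
        self.max_interval = max_interval
        self.state = {}  # url -> (last content hash, current interval)
        self.heap = []   # (next due time, url), earliest first

    def add(self, url, now):
        """Register a URL and make it due immediately."""
        self.state.setdefault(url, (None, self.min_interval))
        heapq.heappush(self.heap, (now, url))

    def due(self, now):
        """Pop and return every URL whose next visit time has arrived."""
        urls = []
        while self.heap and self.heap[0][0] <= now:
            urls.append(heapq.heappop(self.heap)[1])
        return urls

    def record(self, url, content, now):
        """Store a fetch result (bytes), reschedule the URL, return True if it changed."""
        digest = hashlib.sha256(content).hexdigest()
        old_digest, interval = self.state[url]
        changed = digest != old_digest
        if changed:
            interval = max(self.min_interval, interval // 2)  # revisit sooner
        else:
            interval = min(self.max_interval, interval * 2)   # back off
        self.state[url] = (digest, interval)
        heapq.heappush(self.heap, (now + interval, url))
        return changed
```

A long-running loop would call `due(time.time())`, fetch those URLs, and pass each body to `record`; pages that never change drift toward `max_interval`, so most requests go to pages that actually update.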
-
as seen on Stack Overflow
Zend Lucene and Java Lucene are written in PHP and Java respectively, and PHP is a higher-level language than Java.
I'm just wondering how big the performance difference between the two is, for index building and for searching.
Is it much more effective to let Java create and rebuild the index, and…
-
as seen on Stack Overflow
I've read some Jira entries that mentioned moving fast-vector-highlighter to core about a year ago, but it never made it.
Looking at the SVN for contrib, it seems incomplete:
There are no tests for FastVectorHighlighter
Documentation is lacking
No samples anywhere on apache.org
Anyone have…
-
as seen on Stack Overflow
I am trying to install PyLucene (pylucene-3.3-3-src.tar.gz) on Ubuntu Linux 11.10.
I have Python 2.7.2. I was able to compile JCC (I think), because I didn't see any errors when I installed it. When I tried to install PyLucene, I got the following error. Can someone help? Thanks.
ICU not installed
/usr/bin/python…
-
as seen on Stack Overflow
I am trying to get the Highlighter class from Lucene to work properly with tokens coming from Solr's WordDelimiterFilter. It works 90% of the time, but if the matching text contains a ',', such as "1,500", the output is incorrect:
Expected: 'test 1,500 this'
Observed: 'test 11,500 this'
I…
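A toy model (not Lucene's Highlighter code) of one way output like 'test 11,500 this' can arise: if the filter emits tokens whose offsets overlap, such as a hypothetical "1" at offsets (5, 6) and a catenated "1500" at (5, 10), a highlighter that copies source text token by token copies the overlapping characters twice. Merging overlapping offsets first gives the expected text. The function names and token offsets are made up for illustration; whether this is the actual cause depends on the filter's settings and the Lucene version in use.

```python
def rebuild_naive(text, spans):
    """Copy text token by token, as a simplistic highlighter might.

    Overlapping (start, end) spans cause the same characters to be copied twice.
    """
    out, last_end = [], 0
    for start, end in spans:
        if start > last_end:
            out.append(text[last_end:start])  # untokenized gap, e.g. a space
        out.append(text[start:end])
        last_end = max(last_end, end)
    out.append(text[last_end:])
    return "".join(out)


def merge_spans(spans):
    """Merge overlapping (start, end) offsets so each character is covered once."""
    merged = []
    for start, end in sorted(spans):
        if merged and start < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


text = "test 1,500 this"
# Hypothetical token offsets: "test", "1", catenated "1500", "this".
spans = [(0, 4), (5, 6), (5, 10), (11, 15)]
print(rebuild_naive(text, spans))               # test 11,500 this
print(rebuild_naive(text, merge_spans(spans)))  # test 1,500 this
```

In Lucene terms, the analogous fix is making sure overlapping tokens are grouped (or their offsets corrected) before fragments are assembled, rather than emitting each token's text independently.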
-
as seen on Stack Overflow
How do I handle this error in Lucene:
java.lang.AbstractMethodError: org.apache.lucene.store.Directory.listAll()[Ljava/lang/String;
at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:568)
at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:69)
…