crawling pasta hellion - Page 8

Investment advice data dump analysis

- by portoalet

For my year-end pet project, I'd like to analyze investment advice and its correlation with stock market performance. The problem is: where can I get a free dump of investment advice data? Something like the stackoverflow.com data dump would be nice. Or would it be easier to do distributed crawling of public finance web pages for investment advice? By investment advice I mean buy/sell recommendations for stocks/forex issued by institutions or investment advisors.
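One way to frame the analysis once the data is in hand, sketched in Python under assumptions the question does not specify. Each piece of advice is reduced to a direction (+1 for buy, -1 for sell) and paired with the stock's return over some later horizon. The `Advice` record, the signal encoding, and the horizon are all hypothetical. The sketch computes a Pearson correlation and a simple hit rate, not a full event study.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Advice:
    ticker: str
    signal: int            # hypothetical encoding: +1 = buy, -1 = sell
    forward_return: float  # return over a chosen horizon after the advice date


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length sequences with at least two items")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(vx * vy)


def hit_rate(advices):
    """Fraction of advices whose direction matched the sign of the later return."""
    hits = sum(1 for a in advices if a.signal * a.forward_return > 0)
    return hits / len(advices)
```

Usage would be `pearson([a.signal for a in advices], [a.forward_return for a in advices])` alongside `hit_rate(advices)`. The hit rate is easier to interpret for binary buy/sell signals, while the correlation also reflects how large the subsequent moves were.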

List of default managed properties in SharePoint search

- by stranger001

Hi, I would like to know which managed properties are available in a default SharePoint installation. I would also like to know the name of the default crawled property that is mapped to the managed property "ModifiedBy". Thanks.

Retrieving coordinates in this page

- by hao

Hey guys, I'm trying to do some data mining and analyze data based on location. For this site, http://www.dianping.com/shop/1898365, I am trying to find the latitude and longitude by crawling, but I can't figure out where this information is stored. Can someone give me some pointers?

Which Linux distribution is best suited for Nutch-Hadoop?

- by vipin k.

Hi experts, we are trying to figure out which Linux distribution would be best suited for a Nutch-Hadoop integration. We are planning to use clusters to crawl large amounts of content with Nutch. Let me know if you need more clarification on this question. Thank you.

Is there a Gmail API for Java?

- by chenxgre

I am trying to crawl a Gmail inbox and download the messages into a database. Is there a Gmail Java API that can do this job?

Redraw balloon tip and tooltip in C#?

- by Rryk

I wrote a C# application that is a simple countdown timer. I use it myself to keep track of cooking time (so I don't forget about boiling pasta) and for other purposes. It runs in the system tray. When hovering over the icon, it shows the remaining time as a tooltip. When clicked, it shows the remaining time in a balloon tip. I would like the displayed time to "tick down", i.e. to update every second. How do I force an update/redraw of the balloon tip and the tooltip?

Spotlight on an office - Moscow

- by Maria Sandu

Probably the most famous place in Moscow, after Red Square, is the centre of Moscow. Here you can find beautiful buildings that seem to touch the sky, located on the banks of the river. In one of these high towers you can find the Oracle offices, friendly and modern. The stunning view will capture your attention for a couple of minutes, and then you can enjoy a delicious coffee and take a seat at your desk, starting a new day. My name is Dmitry, and I can tell you that we’re enjoying every minute spent in the office, and that’s because of the pleasant atmosphere. As soon as you enter the offices, the friendly environment will make you feel more relaxed. Even though the space is split between the different departments, we interact and communicate a lot. We take our cup of coffee or tea together and discuss our achievements and all sorts of subjects in the kitchen or in the open space. One of my favorite parts is the festive events, when we celebrate with cakes and goodies. Any birthday or new arrival is a good reason for a tea party! We have some work-related traditions that help us as employees. One of them is the monthly Tech Hour, when experts from the Pre-sales team discuss technical topics and the most recent innovations within the company. Lunch is another good opportunity to interact and chat. We have a variety of options, such as the two kitchens or the vast number of restaurants where you can find anything you want. As we are right in the centre of Moscow, you can choose between sushi, Italian pasta and all sorts of food. We usually go to lunch with our colleagues. If you care about your health, I have very good news for you: nearby there are two first-class fitness centres with swimming pools, yoga and various sports classes that you can attend. My suggestion would be to either start or end your day with a visit to the swimming pool for a well-deserved hour of relaxation.
As I mentioned before, we’re right in the heart of Moscow, so after work you can spend some time in the large shopping centres, where you can choose between many different entertainment options. We often go bowling or to the cinema. I hope I have given you a glimpse into working life at the Oracle offices in Moscow, a really great and pleasant place to work, so follow us on http://campus.oracle.com for our latest vacancies and internships.

Agile team with no dedicated Tester members. Insane or efficient?

- by MetaFight

I'm a software developer. I've been thinking a lot about the efficiency of the Software Testers I've worked with so far in my career. In fact, I've been thinking a lot about the Software Tester's role in general and have reached a potentially contentious conclusion: non-developer Software Testers are less efficient at software testing than developers. Now, before everyone gets upset, hear me out. This isn't mere opinion: Software Testing and Software Development both require a lot of skills in common: problem solving, thinking about corner cases, analytical skills, and the ability to define clear and concise step-by-step scenarios. What developers have in addition is the ability to automate their tests. Yes, I know non-dev testers can automate their tests too, but that often becomes a test maintenance issue. Because automating UI tests is essentially programming, non-dev members encounter all the same difficulties software developers encounter: copy-pasta, lack of code reusability/maintainability, etc. So, I was wondering: why not replace all non-dev roles with developer roles? Developers have the skills required to perform Software Testing tasks, and they have the skills to automate tests and keep them maintainable. Would the following work? Hire a bunch of developers and split them into 2 roles: software developers; software developers doing testing (some manual, mostly automated by writing integration tests, unit tests, etc.); and software developers doing application support (I've removed this as it is probably a separate question altogether). And, in our case, since we're doing Agile development, rotate the roles every sprint or two. Also, if at all possible, try to have people spend their developer stints and testing stints on different projects. Ideally you would want to reduce the turnover rate per rotation, so maybe you could have 2 groups and make sure the rotation cycles of the groups are staggered.
So, for example, if each rotation was two sprints long, the two groups would have their rotations 1 sprint apart. That way there's only a 50% turnover rate per sprint. Am I crazy, or could this work? (Obviously a key component to this working is that all devs want to be in the 3 roles. Let's assume I'm starting a new company and I can hire these ideal people.) Edit: I've removed the phrase "QA", as apparently we are using it incorrectly where I work.
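The staggered schedule can be checked with a small simulation. This Python sketch rests on assumptions the question does not state: two equal-sized groups, each split evenly between developing and testing, with every group swapping roles every two sprints and the second group's swap points shifted by one sprint. The names `Member`, `role_at`, `turnover`, and `team` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    group: int       # index into `offsets`, e.g. 0 or 1
    start_role: str  # "dev" or "test" in sprint 0


def role_at(member, sprint, period=2, offsets=(0, 1)):
    """Role in a given sprint: each group swaps roles every `period` sprints,
    with the group's swap points shifted by its offset."""
    swaps = (sprint + offsets[member.group]) // period
    if swaps % 2 == 0:
        return member.start_role
    return "test" if member.start_role == "dev" else "dev"


def turnover(members, sprint, period=2, offsets=(0, 1)):
    """Fraction of people whose role changes between `sprint` and `sprint + 1`."""
    changed = sum(
        role_at(m, sprint, period, offsets) != role_at(m, sprint + 1, period, offsets)
        for m in members
    )
    return changed / len(members)


def team(size_per_group=4):
    """Hypothetical team: two groups, each split evenly between dev and test."""
    return [
        Member(f"g{g}-{i}", g, "dev" if i < size_per_group // 2 else "test")
        for g in (0, 1)
        for i in range(size_per_group)
    ]
```

With `offsets=(0, 1)`, exactly one group swaps at every sprint boundary, so `turnover(team(), s)` is 0.5 for every sprint `s`, and the number of people developing stays constant. With `offsets=(0, 0)`, the whole team swaps at once every second sprint, which is the 100% turnover the staggering is meant to avoid.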

SharePoint Search crawl not working

- by Satish

Search crawling is erroring out on my MOSS 2007 installation. For all of my web apps, I get the following error in the crawl logs: "http://mysites.devserver URL could not be resolved. The host may be unavailable, or the proxy settings are not configured correctly on the index server." The Application Event log also has the following corresponding error: "The start address http://mysites.devserver cannot be crawled. Context: Application 'SSPMain', Catalog 'Portal_Content' Details: The URL of the item could not be resolved. The repository might be unavailable, or the crawler proxy settings are not configured. To configure the crawler proxy settings, use the Proxy and Timeout page in search administration. (0x80041221)" I'm using Windows Server 2008. I tried accessing the site using the above-mentioned URL, and it's available. I applied the registry setting for the loopback issue described at http://support.microsoft.com/kb/896861, but still no luck. Any ideas?

SQL Server 2005, Sudden increase of connections - SharePoint 2007

- by CrazyNick

We observed a sudden increase in SQL connections during a specific hour on the SQL Server that is the back end of a SharePoint 2007 farm. From the SharePoint 2007 perspective: 1. Incremental crawling is scheduled at that time, and a few timer jobs (normal timer jobs) are scheduled to run every minute or every 10 minutes. 2. The number of user requests is low. From the SQL Server 2005 perspective: 1. A transaction log backup is scheduled at that time. 2. No other scheduled jobs are running at that time. So how can we narrow down the issue, and what could be causing the sudden increase in SQL connections?

What sources do spammers use to get email addresses?

- by Andrew Grimm

From what sources do email spammers get their addresses? Wikipedia mentions the following: harvesting email addresses from publicly available sources, including web pages (web crawling), Usenet posts, mailing list archives, and DNS and WHOIS records; guessing email addresses (directory harvest attack); asking people for their email addresses for one purpose, such as jokes of the day, and selling the addresses elsewhere; getting access to people's address books (which Quechup utilized); and scanning infected computers for email addresses. Are there any other techniques used? Are any of the techniques above now obsolete?

How do I rate limit Google's crawl of my class C IP block?

- by Zak

I have several sites in a class C network that all get crawled by Google on a pretty regular basis. Normally this is fine. However, when Google starts crawling all the sites at the same time, the small set of servers that back this IP block can take a pretty big hit on load. With Google Webmaster Tools, you can rate limit Googlebot on a given domain, but I haven't found a way to limit the bot across an IP network yet. Does anyone have experience with this? How did you fix it?
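One possible approach, sketched in Python. It rests on assumptions the question does not confirm: every site in the block is served through a common front end (a reverse proxy or shared middleware), and that front end keeps a single token bucket per /24 network. That way, crawler requests to all domains in the block draw from the same budget. Matching on the `Googlebot` user-agent substring is a simplification, since user agents can be spoofed. What the front end returns for a rejected request is left open. `TokenBucket`, `NetworkCrawlLimiter`, and their parameters are hypothetical.

```python
import ipaddress
import time


class TokenBucket:
    """Token bucket: refills at `rate_per_sec`, holds at most `capacity` tokens."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self):
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class NetworkCrawlLimiter:
    """One bucket per IPv4 /24 of the *server* address, so crawler hits on
    every site hosted in that block share a single budget."""

    def __init__(self, rate_per_sec, capacity, crawler_token="Googlebot", clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.crawler_token = crawler_token
        self.clock = clock
        self.buckets = {}

    def allow(self, server_ip, user_agent):
        if self.crawler_token not in user_agent:
            return True  # only crawler traffic is throttled in this sketch
        network = ipaddress.ip_network(f"{server_ip}/24", strict=False)
        bucket = self.buckets.get(network)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.capacity, self.clock)
            self.buckets[network] = bucket
        return bucket.allow()
```

The injectable `clock` makes the limiter testable without sleeping. The `rate_per_sec` and `capacity` values are tuning parameters to be chosen from observed load, not recommendations.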

My HP Vista-based laptop has become very slow recently

- by goldenmean

My HP laptop runs Vista Home Premium. When I start Firefox or Internet Explorer, it becomes very slow; no other application has this effect. When I checked the Performance tab in Task Manager, it showed Physical Memory Free as 0 bytes almost all the time. This started recently; earlier it wasn't zero. The laptop has 2 GB of RAM. I have nothing running in my tray except sound control, the laptop power plan indicator, and the network status indicator. No other processes use enough memory to bring free memory down to 0. So what could be hogging the memory and making the laptop so slow? Any pointers would help, as it is crawling at the moment.

How do multiple displays work on an AMD 785G / ATI HD 4200 motherboard?

- by aireq

I just ordered an ASUS M4A785TD-V EVO, which has the AMD 785G chipset and HD 4200 integrated graphics. The board has VGA, DVI, and HDMI outputs. I'm wondering how many outputs I can run at once, and from which connectors. My guess is that I can only use the VGA and either the DVI or the HDMI in a dual setup, but not the HDMI and the DVI at the same time. Is this correct? If I have devices plugged into both the HDMI and the DVI ports, is there a way to choose which port I want to use? I have a dual 19" monitor setup, as well as an LCD TV. I'd like to run the VGA and the DVI into my two monitors, and the HDMI to my TV. Then, when I want to watch something on the TV, I'd like to be able to switch over from the DVI to the HDMI. Is this possible without crawling under my desk and unplugging/plugging things in?

How much HDD space would I need to cache the web while respecting robots.txt?

- by Koning Baard XIV

I want to experiment with creating a web crawler. I'll start by indexing a few medium-sized websites like Stack Overflow or Smashing Magazine. If it works, I'd like to start crawling the entire web. I'll respect robots.txt. I'll save all HTML, PDF, Word, Excel, PowerPoint, Keynote, etc. documents (not EXEs, DMGs, etc., just documents) in a MySQL DB. Next to that, I'll have a second table containing all results and descriptions, and a table with words and the pages on which to find those words (aka an index). How much HDD space do you think I need to save all the pages? Is it as low as 1 TB, or is it about 10 TB, 20? Maybe 30? 1000? Thanks
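The answer depends almost entirely on two numbers the question leaves open: how many documents get stored and their average size. A back-of-the-envelope calculation in Python makes that dependence explicit. The function name and every input (document count, average size, compression ratio, index overhead) are placeholders to be replaced with measurements, not figures for the real web.

```python
def estimate_storage_tib(num_documents, avg_doc_kib, compression_ratio=1.0, index_overhead=0.0):
    """Rough storage estimate in TiB (hypothetical helper).

    num_documents     -- how many documents you expect to store (assumption)
    avg_doc_kib       -- average size per document in KiB (assumption)
    compression_ratio -- e.g. 4.0 if documents compress 4:1 before storage
    index_overhead    -- extra space for the results and word-index tables,
                         as a fraction of the stored documents (0.5 = +50%)
    """
    if compression_ratio <= 0:
        raise ValueError("compression_ratio must be positive")
    raw_bytes = num_documents * avg_doc_kib * 1024
    stored_bytes = raw_bytes / compression_ratio * (1 + index_overhead)
    return stored_bytes / 1024**4
```

For instance, 1024**3 (about 1.07 billion) documents averaging 1 MiB each come to exactly 1024 TiB before compression or indexing. Storing them compressed 4:1 with a 50% index overhead gives 384 TiB. A pilot crawl of the medium-sized sites would supply realistic averages to plug in.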

Do virtual machines perform better on the host HDD or USB drive?

- by Jeremy Ricketts

The question I'm asking is fairly general, but I'll also give specifics about my setup. Here's the main question: Do virtual machines generally perform better on the host HDD, or is it better to run them from an external disk? My specific setup: a MacBook Pro with a nearly full internal 7200 RPM SATA drive. On this system I'm running large programs like Photoshop and some other RAM-intensive applications. I've dedicated 2 of my 8 GB of RAM to my VMware Fusion virtual machine, which runs Windows 7 and Visual Studio and sits on the same drive. When that thing boots up, my system really starts crawling. I have an external USB drive (specifics of that drive are here) which I'm thinking about moving the VM to. Obviously a USB drive is slower than my internal HDD, but maybe having two operating systems using the same disk is WORSE than putting one of them on a separate (albeit slower) disk. Is this a bad idea?

Merging and re-formatting paragraphs in Microsoft Word 2007

- by thkala

After a copy/paste mishap in Microsoft Word 2007, I ended up with text looking like this: This line breaks up here continues here, and so on here, when it should all be in a single line without all the random whitespace. I confirmed that there are paragraph separators and extra whitespace between each line, probably due to hard-coded newlines in the original source. Is there a (preferably easy) way to merge paragraphs in Microsoft Word? Is there a way to re-format a paragraph so that extraneous whitespace is removed? I can change the flush style, but the whitespace remains. I (obviously?) do not have any experience with Word, being more of a TeX person, but I have been searching Google and crawling the menus for a few hours and I have yet to find a solution...
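Outside Word, the cleanup being asked for amounts to joining hard-wrapped lines and collapsing runs of whitespace. Here is a minimal Python sketch of that transformation. It assumes the text has been copied out as plain text, that a blank line marks a real paragraph break, and that a single newline is just a wrap. The function name is hypothetical, and this is not a Word feature.

```python
import re


def merge_hard_wrapped(text):
    """Join hard-wrapped lines into paragraphs and collapse runs of whitespace.

    Blank lines (possibly containing spaces) are kept as paragraph breaks;
    every other newline, tab, or run of spaces becomes a single space.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(" ".join(p.split()) for p in paragraphs if p.strip())
```

`str.split()` with no argument splits on any run of whitespace, including newlines, which is why joining its pieces with single spaces removes both the wraps and the stray spacing in one step.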

How to display recently installed programs and when they were installed?

- by salvationishere

I have a Windows XP laptop, and I just installed about 12 new programs. Big mistake! Before I installed these programs, my internet connection was running great. But now, after installing them and restarting my laptop, the internet is crawling. How can I see what was changed? Hint: prior to installing these 12 programs, I installed IE 8. Removing it might fix the problem, but I need IE for my SQL/C# web application to work properly.

OWA no longer accessing one back-end Exchange server

- by Morchuboo

We have IIS hosting OWA as the web front end to 3 back-end Exchange servers. Yesterday we got a lot of event 9791 warnings: "Cleanup of the DeliveredTo table for database 'Second Storage Group\Mailbox Store EUROPE 2' was pre-empted because the database engine's version store was growing too large. 0 entries were purged." At this point the server was crawling. Our mail admin is currently away and not contactable, so we rebooted the server. Everything seems OK when reading mail from Outlook and evolution-mapi clients, but OWA and ActiveSync connections cannot access it. When logging into OWA, users whose mailboxes are not on this back-end server are fine. Users on this server can log into the OWA front end, but once they submit their credentials, the page returns a 503 Service Unavailable error. We have since restarted the affected Exchange server and the IIS server, and also run iisreset /noforce, but the problem persists. Can anyone suggest what we should look at?

Private Git repo using Smart HTTP with LDAP authentication

- by ALOToverflow

I've been crawling the interwebz and getting my hands dirty for the last few days, but I can't seem to make it all work together. I managed to get an HTTP repo working on Ubuntu 10.04 over Smart HTTP (pull and push over HTTP) for a single repo. This means that I do the initial setup over SSH on the server (git init --bare), and after that the clients can pull and push to it (git clone http://servername/allgitrepos/repo.git). Unfortunately, it's impossible to add a new repo without SSHing to the server and adding it manually: git push http://servername/allgitrepos/repo2.git (allgitrepos is readable, writable, and executable by everyone) fails with a message about git update-server-info (which seems to be a general error message). So far the repository is anonymous, so I would like to authenticate using LDAP and also use the LDAP credentials to make the git commit. So, how can I push new repos to the server, and how can I use the LDAP credentials to make the git commit? Thanks

Building intranet search

- by gmkv

At work, we have lots of information squirreled away in many different sites -- wikis, product docs, ticketing system, etc -- many of which require authentication. I'm very interested in having a single way to search all our various silos, and in my spare time have looked at Nutch, Grub, Django + Haystack, etc. None of these is a complete solution a la Google Mini or Google Search Appliance. Has anybody built a basic intranet search engine out of a mixture of these tools? Would you have recommendations about how to go about it? I like Django, and Haystack seems to be a mildly popular search solution for it, but I'd need to wire up a crawler that can support crawling authenticated sites to it.

De-index URL parameters by value

- by Doug Firr

On rereading, this question is lengthy, so allow me to provide a one-sentence summary: I need to get Google to de-index URLs that have parameters with certain values appended. I have a website, example.com, with language translations. There used to be many translations, but I deleted them all so that only English (default) and French remain. When one selects a language option, a parameter is added to the URL. For example, the home page: https://example.com (default) and https://example.com/main?l=fr_FR (French). I added a robots.txt to stop Google from crawling any of the language translations:

    # robots.txt generated at http://www.mcanerin.com
    User-agent: *
    Disallow:
    Disallow: /cgi-bin/
    Disallow: /*?l=

So any pages containing "?l=" should not be crawled. I checked in GWT using the robots testing tool, and it works. But under HTML Improvements, the previously crawled language translation URLs remain indexed. The internet says to return a 404 for the removed URLs so that Google knows to de-index them. I checked what my CMS returns if I visit one of the URLs that should no longer exist. This URL was listed in GWT under duplicate title tags (one of the reasons I want to clean up my URLs): https://example.com/reports/view/884?l=vi_VN&l=hy_AM This URL should not exist, since I removed the language translations, yet the page loads! I played around and typed example.com?whatever123. It seems that parameters always load as long as everything before the question mark is a real URL. So if Google has indexed all these URLs with parameters, how do I remove them? I cannot check whether a 404 is being returned, because the page always loads; it's the parameter, not the page, that needs to be de-indexed.
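To see which URLs a rule like `Disallow: /*?l=` actually covers, a simplified matcher helps. This Python sketch follows the wildcard conventions Google documents for robots.txt: `*` matches any run of characters, a trailing `$` anchors the end, and otherwise a rule is a prefix match. It ignores `Allow` lines and rule precedence, so treat it as an illustration rather than Google's parser. The function names are hypothetical.

```python
import re


def robots_pattern_to_regex(pattern):
    """Translate a Google-style robots.txt path pattern into a compiled regex.

    '*' matches any run of characters, a trailing '$' anchors the end,
    and everything else is matched literally as a prefix of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_disallowed(path_and_query, disallow_patterns):
    """True if any non-empty Disallow pattern matches from the start of the path.

    An empty 'Disallow:' line disallows nothing, so it is skipped.
    """
    return any(
        robots_pattern_to_regex(p).match(path_and_query)
        for p in disallow_patterns
        if p
    )
```

Both translation URLs from the question match `/*?l=`, while `/` and `/?whatever123` do not. One consequence worth checking against the indexed URLs: the rule requires the literal text `?l=`, so a URL where `l` is not the first parameter, such as `/main?x=1&l=fr_FR`, is not matched by it.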

Do or can robots cause considerable performance issues?

- by Anicho

So the question in the title is exactly what I am trying to find out. My case is this: at work we are in a discussion with team members who seem to think bots will cause performance problems on our services website. Our setup: let's say I have the site www.mysite.co.uk; this is a shop window to our online services, which sit on www.mysiteonline.co.uk. When people search Google for mysite, they see mysiteonline.co.uk as well as mysite.co.uk. Cases against stopping bots from crawling: we don't store GBs of data publicly available on the web; most friendly bots, if they were going to cause issues, would have done so already; in our instance the bots can't crawl the site because it requires a username and password; stopping bots with robots.txt causes an SEO issue (ref. 1); and if it were a malicious bot, it would ignore robots.txt or meta tags anyway. Ref. 1: If we were to block robots from crawling mysiteonline.co.uk, this would affect SEO rankings and make it inconvenient for users who actively search for mysite to find mysiteonline, which we can show is the case for a good portion of our users.

Search Results

Search found 287 results on 12 pages for 'crawling pasta hellion'.
