Search Results

Search found 261 results on 11 pages for 'crawler'.


Proper Usage of SqlConnection in .NET

- by Jojo

Hi guys, I'd like an opinion on the proper usage, or a proper design, for the SqlConnection object. Which of the two approaches below is better?

Option 1: a data provider class in which each method creates its own SqlConnection object and disposes of it when done:

    IList<Employee> GetAllEmployees()
    {
        using (SqlConnection connection = new SqlConnection(this.connectionString))
        {
            // Code goes here...
        }
    }

    Employee GetEmployee(int id)
    {
        using (SqlConnection connection = new SqlConnection(this.connectionString))
        {
            // Code goes here...
        }
    }

Option 2: a single SqlConnection member that every method opens and closes:

    SqlConnection connection; // initialized in constructor

    IList<Employee> GetAllEmployees()
    {
        this.TryOpenConnection(); // tries to open member SqlConnection instance
        // Code goes here...
        this.CloseConnection();
        // return
    }

    Employee GetEmployee(int id)
    {
        this.TryOpenConnection(); // tries to open member SqlConnection instance
        // Code goes here...
        this.CloseConnection();
        // return
    }

Or is there a better approach than either of these? I have a focused web crawler type of application that will crawl 50 or more websites simultaneously (multithreaded). Each website is handled by its own crawler object, and each crawler object has an instance of the data provider class above. Please advise. Thanks.
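As a language-neutral illustration of the first option (a short-lived connection opened and disposed of inside each method), here is a minimal Python sketch. It uses the standard-library sqlite3 module purely as a stand-in for SqlConnection; the EmployeeProvider class, the employees table, and the sample rows are hypothetical. In ADO.NET, connection pooling generally keeps this open-late, close-early pattern cheap, and it avoids sharing one connection object between crawler threads.

```python
import os
import sqlite3
import tempfile
import threading
from contextlib import closing


class EmployeeProvider:
    """Hypothetical data provider: every method opens its own connection."""

    def __init__(self, db_path):
        self.db_path = db_path

    def get_all_employees(self):
        # closing() guarantees the connection is closed, like a C# using block.
        with closing(sqlite3.connect(self.db_path)) as connection:
            return connection.execute(
                "SELECT id, name FROM employees ORDER BY id").fetchall()

    def get_employee(self, employee_id):
        with closing(sqlite3.connect(self.db_path)) as connection:
            return connection.execute(
                "SELECT id, name FROM employees WHERE id = ?",
                (employee_id,)).fetchone()


def make_demo_database():
    """Create a throwaway database file with two rows (illustrative data)."""
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    with closing(sqlite3.connect(path)) as connection:
        connection.execute(
            "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT)")
        connection.executemany(
            "INSERT INTO employees VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
        connection.commit()
    return path


def crawl_worker(provider, results, index):
    # Each crawler thread calls the provider; no connection object is shared.
    results[index] = provider.get_all_employees()


db_path = make_demo_database()
provider = EmployeeProvider(db_path)
results = [None] * 5
threads = [threading.Thread(target=crawl_worker, args=(provider, results, i))
           for i in range(5)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
```

Because no state is shared between calls, the same provider instance can be handed to any number of crawler threads without extra locking around the connection.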

Problem in Application_Error in Global.asax

- by mmtemporary

My problem is that User.Identity.Name and Request.Url.AbsoluteUri come out empty in the exception email that is sent to me. This is the Application_Error code:

    void Application_Error(object sender, EventArgs e)
    {
        Server.Transfer("~/errors/default.aspx");
    }

And this is the default.aspx code:

    protected void Page_Load(object sender, EventArgs e)
    {
        if (Server.GetLastError() == null)
            return;
        Exception ex = Server.GetLastError().GetBaseException();
        if (ex == null)
            return;

        string message = string.Format("User: ", User.Identity.Name);
        message += Environment.NewLine;
        message += string.Format("AbsoluteUri: ", Request.Url.AbsoluteUri);
        message += Environment.NewLine;
        message += string.Format("Form: ", Request.Form.ToString());
        message += Environment.NewLine;
        message += string.Format("QueryString: ", Request.QueryString.ToString());
        message += Environment.NewLine;

        HttpBrowserCapabilities browser = Request.Browser;
        string s = "Browser Capabilities:\n"
            + "Type = " + browser.Type + "\n"
            + "Name = " + browser.Browser + "\n"
            + "Version = " + browser.Version + "\n"
            + "Platform = " + browser.Platform + "\n"
            + "Is Crawler = " + browser.Crawler + "\n"
            + "Supports Cookies = " + browser.Cookies + "\n"
            + "Supports JavaScript = " + browser.EcmaScriptVersion.ToString() + "\n"
            + "\n";
        message += s;
        message += Environment.NewLine;
        message += ex.ToString();

        Exception lastException = (Exception)Application["LastException"];
        if (lastException == null || lastException.Message != ex.Message)
        {
            Application.Lock();
            Application["LastException"] = ex;
            Application.UnLock();
            SiteHelper.SendEmail(SiteHelper.AdministratorEMail, "Error!!!", message, false);
        }
        Server.ClearError();
    }

But I receive an email like this (this is the header, without the full exception content):

    User:
    AbsoluteUri:
    Form:
    QueryString:
    Browser Capabilities:
    Type = IE8
    Name = IE
    Version = 8.0
    Platform = WinXP
    Is Crawler = False
    Supports Cookies = True
    Supports JavaScript = 1.2

Why are the user name and the request URL empty? The problem persists when I replace Transfer with Redirect, or when I use neither. Thanks.
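One detail in the code above may be worth checking: the format strings passed to string.Format contain no {0} placeholder, and C#'s string.Format ignores arguments that no placeholder refers to. Python's str.format treats surplus positional arguments the same way, so the effect can be reproduced in a small sketch (the user name and URL are made-up values):

```python
user_name = "alice"                       # hypothetical value
absolute_uri = "http://example.com/page"  # hypothetical value

# No placeholder: the argument is silently ignored.
broken = "User: ".format(user_name)

# With a placeholder the value is inserted.
fixed = "User: {0}".format(user_name)
fixed_uri = "AbsoluteUri: {0}".format(absolute_uri)
```

If that is indeed the cause, it would also explain why switching between Transfer and Redirect makes no difference: the values are dropped while the message is built, not lost along with the request.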

Issue with the Entity Manager and phpunit in Symfony 2

- by rgazelot

I have an issue with my Entity Manager in phpunit. This is my test:

    public function testValidChangeEmail()
    {
        $client = self::createAuthClient('user','password');
        $crawler = $client->request('GET', '/user/edit/30');
        $crawler = $client->submit($crawler->selectButton('submit')->form(array(
            'form[email]' => '[email protected]',
        )));

        /*
         * With this em, this work perfectly
         * $em = $client->getContainer()->get('doctrine.orm.entity_manager');
         */
        $user = self::$em->getRepository('MyBundle:User')->findUser('[email protected]');
        die(var_dump($user->getEmail()));
    }

And this is my WebTestCase, which extends the original WebTestCase:

    class WebTestCase extends BaseWebTestCase
    {
        static protected $container;
        static protected $em;

        static protected function createClient(array $options = array(), array $server = array())
        {
            $client = parent::createClient($options, $server);
            self::$em = $client->getContainer()->get('doctrine.orm.entity_manager');
            self::$container = $client->getContainer();
            return $client;
        }

        protected function createAuthClient($user, $pass)
        {
            return self::createClient(array(), array(
                'PHP_AUTH_USER' => $user,
                'PHP_AUTH_PW' => $pass,
            ));
        }
    }

As you can see, I replace self::$em when I create my client.

My issue: in my test, the die() gives me the old email and not the new email ([email protected]) that was registered in the test. However, in my database, [email protected] is correctly saved. When I retrieve my user from the database, I use self::$em. If I use the $em shown in the comment, I retrieve the right, new email. I don't understand why, in my WebTestCase, I can't access the new Entity Manager...

Sharepoint Search crawl not working

- by Satish

Search crawling is erroring out on my MOSS 2007 installation. I get the following error in the crawl logs for all of my web apps:

    http://mysites.devserver
    URL could not be resolved. The host may be unavailable, or the proxy settings are not configured correctly on the index server.

The Application event log also has the following corresponding error:

    The start address http://mysites.devserver cannot be crawled.
    Context: Application 'SSPMain', Catalog 'Portal_Content'
    Details: The URL of the item could not be resolved. The repository might be unavailable, or the crawler proxy settings are not configured. To configure the crawler proxy settings, use the Proxy and Timeout page in search administration. (0x80041221)

I'm using Windows Server 2008. I tried accessing the site using the above-mentioned URL and it's available. I applied the registry setting for the loopback issue described at http://support.microsoft.com/kb/896861, but still no luck. Any ideas?

SharePoint Server Search Not Crawling

- by tekiegreg

Hi there, we recently moved some sites into a new farm. Everything seems to be doing fine, but search, for reasons I can't identify, is not crawling the migrated content. We're getting this message in our crawl log for every document:

    http://xxx/sites/...announcements
    The object was not found. (The item was deleted because it was either not found or the crawler was denied access to it.)

Of course, the first thing I suspected was the crawler access account, so I logged into SharePoint with that account and was able to access the content via that URL just fine. I tried upping permissions (even all the way up to Admin), but to no avail. Thoughts?

How can I run my program on a large number of computers? [closed]

- by zenpoy

I'm looking for a (preferably free) service for running an executable I wrote. It's not malicious, it's not a virus, it's not a scam, and if this is really important I can upload the Python source code instead. I wrote a small crawler to gather information about the style of web pages for my MA project, and I need a lot more data.

EDIT: Here is more information on my problem, how I am approaching it, and where I'm stuck. As part of my research I'm trying to classify text based on its style (font-family for now). My data is based on web pages, so I wrote a client/server application: the client is a crawler that gathers this data and sends it to the server. The problem is that something like 99% of the internet is Arial, Verdana and Helvetica. Other fonts are far rarer, so I need to spend a very long time gathering enough data about them. Hope this explains it.

Can I use a 302 redirect to serve up static content from a URL with _escaped_fragment_?

- by Starfs

We would like to serve up SEO-friendly Ajax-driven content. We are following this documentation. Has anyone ever tried to write a 302 redirect into the .htaccess file that takes the ?_escaped_fragment_= string and sends it to a static page, for example /snapshot/yourfilename/? How will Google react to this?

I've gone through the documentation and it's not very clear. The quote below, from Google's documentation, is what I found. I'm not sure whether they are saying that you can redirect the _escaped_fragment_ URL to a different static page, or whether this is about redirecting the hashtag URL to static content. Thoughts?

From Google's site:

Question: Can I use redirects to point the crawler at my static content?

Redirects are okay to use, as long as they eventually get you to a page that's equivalent to what the user would see on the #! version of the page. This may be more convenient for some webmasters than serving up the content directly. If you choose this approach, please keep the following in mind:

Compared to serving the content directly, using redirects will result in extra traffic because the crawler has to follow redirects to get the content. This will result in a somewhat higher number of fetches/second in crawl activity.

Note that if you use a permanent (301) redirect, the url shown in our search results will typically be the target of the redirect, whereas if a temporary (302) redirect is used, we'll typically show the #! url in search results. Depending on how your site is set up, showing #! may produce a better user experience, because the user will be taken straight into the AJAX experience from the Google search results page. Clicking on a static page will take them to the static content, and they may experience avoidable extra page load time if the site later wants to switch them to the AJAX experience.
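To make the redirect idea concrete, here is a minimal Python sketch of the decision a server (or rewrite rule) would have to make. Under Google's AJAX crawling scheme the crawler requests a #! URL with the state moved into an _escaped_fragment_ query parameter; everything else here is an assumption for illustration, in particular the snapshot_redirect function and the rule that the fragment value itself becomes the snapshot name under /snapshot/.

```python
from urllib.parse import urlsplit, parse_qs, quote


def snapshot_redirect(request_url):
    """Return (302, location) for crawler requests, or None for normal ones."""
    query = parse_qs(urlsplit(request_url).query, keep_blank_values=True)
    if "_escaped_fragment_" not in query:
        return None  # ordinary request: serve the AJAX page as usual
    fragment = query["_escaped_fragment_"][0]
    # Hypothetical mapping: use the fragment itself as the snapshot name.
    name = quote(fragment.strip("/"), safe="") or "index"
    return 302, "/snapshot/{}/".format(name)


crawler_request = snapshot_redirect(
    "http://example.com/?_escaped_fragment_=yourfilename")
normal_request = snapshot_redirect("http://example.com/page")
```

Returning 302 rather than 301 matches the quoted guidance that a temporary redirect typically keeps the #! URL in the search results.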

Working with Google Webmaster Tools

- by com

My first question is about Crawl errors in Google Webmaster Tools. Crawl errors are divided into a few sections, one of which is HTTP. I assume that all the broken links under HTTP were somehow found by the crawler and are not links from the sitemap. If they were found by scanning all the sitemap pages for links, why doesn't it mention the source page, as the Sitemap section does with its Linked From column? And what does Linked From mean? I thought that, since the section is named Sitemap, all URLs would be taken from the sitemap, so why is there a Linked From column?

The second question: what is the best way to treat search on the site? How come the search result pages are getting indexed? Because all the search result pages are getting indexed, I have too many pages in Linked From. What's the right practice?

Question three: in order to improve response time in WMT, can I redirect all the crawler's requests to a designated free web server? Is this good practice?

Question four: how should I treat the Google Analytics code (with the parameters PageView, PageLoadTime) when a user requests a non-existent page? Should I render the Google code or not? Right now I use the Google Analytics code in the common template page, so every page, including non-existent pages that show an error message, contains the Google Analytics code. It seems to have an influence on WMT.

CodePlex Daily Summary for Tuesday, April 20, 2010

New Projects

ASP.NET MVC Extensibility: ASP.NET MVC Extensibility.
ASP.NET MVC Starter: Tekpub's ASP.NET MVC 2.0 Starter Site, as put together by Rob Conery in Episode 15 of Mastering ASP.NET MVC (http://tekpub.com/production/starter)
AzureDemo: An internal Azure demo and test bed for some projects. After demo is complete this project will be closed.
Basic Sprite Sheet Creator: A basic c# program to create sprite sheets.
CodeDefender: Protect your .Net codes easily with this smart obfuscator!
Crawlr: Tema 2 project
Document Session Manager - Visual Studio addin: Document Session Manager is a Visual Studio 2008 addin for saving and restoring the list of opened documents (xml files, source files, winforms, et...
Esferatec.Text.RegularExpressions: assembly to build regular expression patterns
FIFA World Cup 2010 Mobile Sticker Checklist: FIFA World Cup 2010 Mobile Sticker Checklist is a small application for Windows Mobile developed in CF 3.5 to keep tracking of your sticker album. ...
Finia.net: 追忆游乐网·幻之大地
FusspawnsAI: Fusspawns UT AI is a small test engine for a classic ut remote bot api. intending to improve ut's ai to a god like level without cheating bots(bots...
G.A.E.T.: This is a Graphical Asymmetric Encryption Tool based on R.S.A. algorithm with the help of Java Language. Even though, this may be a small applicatio...
ItzyBitzySpider: Webcrawler project from computer science at UCN.
JingQiao.Ads: My DDD NTier Architecture example project.
Managed Meizu SDK Demo: In this project we are sharing the source code to demonstrate the usage of managed SDK for Meizu cell phones, currently for M8. With the help of th...
MaxxUtils.MaxxTagger: MaxxTagger: An Mp3 Tag Editor.. Add /Edit/Remove MP3 ID3 V1 and 2.3 Tags like Title, Artist, Album, Album Art, Genre. Besides tag editing, it also ...
Maya Project Management: The Maya Project Management is a clone of RedMine with all its functions and plug-in support, using the following technologies: Microsoft .net Fra...
MessageBoxLib: A simple, robust library for Xbox 360 and Windows development using the XNA Game Studio that makes using the Guide class's message box functionalit...
MyWSAT - ASP.NET Membership Administration Tool: MyWSAT aka ASP.NET WSAT is a WebForms based website Starter Kit for the ASP.NET Membership Provider. It is a feature rich application that takes ca...
OntologyCreator: this is my thesis and it is not finished yet
POS for .Net Handheld Products Service Object: POS for .Net Service Object Handheld Products Bar Code Scanner
PostBinder: PostBinder is a small helper library that deserializes ASP.NET requests into C# classes. This eliminates having to write repeated hand wiring co...
PostSharp for ASP.NET Web Sites: Adds support for PostSharp 2.0 on ASP.NET Web Sites.
Rapid Dictionary: * Rapid Dictionary is a Translation Dictionary initialized by language learning network http://wordsteps.com. * Dictionary developed in C# and Co...
ROrganizer: If you feel your movie files are kept in messy way, try out the ROrganizer which helps you rearrange them.
RoRoWoBlog: RoRoWo personal blog open source project
SPGroupDeflector - Explicitly deny groups to webs within your Site Collection: Secure webs within your MOSS or WSS Portal by explicitly denying access to specific users in SharePoint groups.
SSIS ShapeFileSource: SSIS ShapeFileSource imports ESRI Shapefiles, and the associated attribute file (.dbf). The component based on the free Shapefile C Library.
StoreManagement: University assignment. The task is to build an application that can perform basic CRUD operations on a property and use an arbitrary database. ...
Surfium: TODO ;-)
TaskCleaner: This is a Windows Forms project created to kill some running process in order to enhance the performance of Windows execution. Sometimes it is desi...
The Expert Calendar: The Expert Calendar is a MOSS 2007 webpart which allows to connect to a Event Item List and display event items in a small design customizable cale...
Visual Studio Find Results Window Tweak: This is a Visual Studio 2010 add-in which enables you to adjust the format of the Find Results Window. It is written in C#, .NET 4.0 and requires ...
Weightlifting Sinclair coeficient calculator: Weightlifting Sinclair coeficient calculator for competitors (for Windows Mobile platform)
Windows Azure Web Storage Explorer: Windows Azure Web Storage Explorer makes it easier for developers to browse and manage Blobs, Queues and Tables from Windows Azure Storage account....

New Releases

#SNMP - C# Based Open Source SNMP for .NET and Mono: CatPaw (5.0) Beta 1: SNMP v3 support in snmpd is complete.
ASP.Net MVC Crud with JqGrid: Mvc Crud with JqGrid 0.3.0: Fairly major reworking of the GenericDataGrid (with alot of work from James). Most noticeable is the replacing of Edit and Delete with action butt...
Basic Sprite Sheet Creator: Sprite Tool v1.1: Fixed the progress bar, it now correctly displays text and progress. Also download will now come with an installer and an executable so you don't h...
Basic Sprite Sheet Creator: Sprite Tool Version 1.0: Program used to make basic sprite sheets. please visit http://coderplex.blogspot.com for more info
Braintree Client Library: Braintree-1.2.1: Escape all XML
CodeDefender: CodeDefender v0.1: Protect your .Net exe and dll files with this smart tool.
ColinTesting: test: test
ColinTesting: test2: test2
ColinTesting: test3: test3
ColinTesting: test4: test4
ColinTesting: test6: test6
CycleMania Starter Kit EAP - ASP.NET 4 Problem - Design - Solution: Cyclemania 0.08.63: See Source Code tab for recent change history.
Document Session Manager - Visual Studio addin: Release v0.45948: Release v0.45948
DotNetNuke® Community Edition: 05.04.00: Major Highlights Fixed issue where portal settings were not saved per portal. Fixed issue with importing page templates. Fixed issue with...
DotNetNuke® Postgres Data Provider: DNN PG Provider 01.00.00 Beta2: Fixes problems with deprecated datatype money in Postgres. Upgrades DotnetNuke code base to 04.09.05 It comes with a patch for the DotNetNuke insta...
FIFA World Cup 2010 Mobile Sticker Checklist: FIFA World Cup 2010 Mobile Sticker Checklist v0.1b: FIFA World Cup 2010 Mobile Sticker Checklist v0.1b First beta release. Requires Microsoft Compact Framework 3.5. It was tested on an HTC Touch Viva...
FIFA World Cup 2010 Mobile Sticker Checklist: FIFA World Cup 2010 Mobile Sticker Checklist v0.2b: FIFA World Cup 2010 Mobile Sticker Checklist v0.2b Second beta release. Requires Microsoft Compact Framework 3.5. It was tested on an HTC Touch Viv...
Fluent Ribbon Control Suite: Fluent Ribbon Control Suite 1.2: Fluent Ribbon Control Suite 1.2(supports .NET 3.5 and .NET 4 RTM) Includes: Fluent.dll (with .pdb and .xml) Showcase Application Samples Found...
G.A.E.T.: Graphical Asymmetric Encryption Tool: User Interface The GAET User Interface is a window with five buttons. Each button is explained the following sections. Each button has a functional...
HTML Ruby: 6.21.7: As long as I don't find anything else that I can improve, this will be submitted to Mozilla for review tomorrow. Added back process inserted conten...
IBCSharp: IBCSharp 1.03: What IBCSharp 1.03.zip unzips to: http://i43.tinypic.com/24ffbqr.png Note: The above solution has MSTest, Typemock Isolator, and Microsoft CHESS c...
LogikBug's IoC Container: Second Release: This project is dependent upon Microsoft.Practices.ServiceLocation and must be referenced when referencing LogikBug.Injection. Click here to view d...
Managed Meizu SDK Demo: Library and Demo: Library and Demo
MaxxUtils.MaxxTagger: MaxxUtils.MaxxTagger: Version: 1.0.0 (Beta) Instructions: Unzip the files to a folder and then dbl click on the exe. Known Issues: 1. When u copy or move a folde...
OrthoLab: Cellule: Compile with Autodesk Maya 2008 32bit and 2010 64bit.
OWASP Code Crawler: OWASP Code Crawler 2.7: Code Crawler 2.7 Description In terms of functionality there is not much new stuff in this release. We transplanted the new engine. Code Crawler is ...
PerceptiveMCAPI - A .NET wrapper for the MailChimp Api: V1.2.3 PerceptiveMCAPI .Net Wrapper [Beta 2]: PerceptiveMCAPI – v 1.2.3 Change log Functionality through MailChimp API announce v1.2.5 on 15-Feb-2010 .NET Wrapper New wrapper directives; api_Me...
POS for .Net Handheld Products Service Object: POS for .Net Handhelp Products Service Object: The Service Object contained herein is a work in progress. This Service Object's is written as VS 2008 C# Project. The Target Platform is x86. ...
PostSharp for ASP.NET Web Sites: R1: First release.
Rich Ajax empowered Web/Cloud Applications: 6.4 beta 2c: A revision to the first fully featured version of Visual webGui offering web/cloud development tool that puts all ASP.NET Ajax limits behind with e...
Should: Beta - 1.0: This is the initial release of the Should assertions extensions.
Shrinkr: v1.0: First public release.
Site Directory for SharePoint 2010 (from Microsoft Consulting Services, UK): v1.2: Address a bug found in v1.1 relating to the Delete Site Listings job not incrementing the 'Site Missing Count' for some SharePoint sites.
Software Localization Tool: SharpSLT 1.0: New functions Backup before saving Delete entries Undo deletion Added more comments in the code
SPGroupDeflector - Explicitly deny groups to webs within your Site Collection: SPGroupDeflector: Download the source code, the wsp solution package, and Setup.doc
SSIS ShapeFileSource: Version 0.1: Short Preview of SSIS ShapeFileSource Component
Starter Kit Mytrip.Mvc.Entity: Mytrip.Mvc.Entity 1.0: Warning Install MySql Connector/Net 6.3 MySQL Membership MSSQL Membership XML Membership UserManager FileManager Localization Captcha ...
Surfium: Linux Expo Prebuild: First public release
TaskCleaner: Initial Working Version: In this version we have all the features listed in the project description working fine. Built under Framework 3.5.
Text to HTML: 0.4.5.0: Changes: Substitution of the following characters: Previous: " < > ¡ © º ¿ Á Ä É Í Ñ Ó Ö Ú Ü ß á ä é í ñ ó ö ú ü € New: & ´ ≈ ¦ • ¸ ˆ ↓ ð … ∫ ...
TS3QueryLib.Net: TS3QueryLib.Net Version 0.21.16.0: This release contains a bugfix for a bug that caused connection problems when connecting using an IP for some cases. So it's strongly recommended t...
Tweety - Twitter Client: Tweety - 0.96: Form activation from system tray improved. General fixes. General code refactor.
Web/Cloud Applications Development Framework | Visual WebGui: 6.4 Beta 2c: A revision to the first fully featured version of Visual webGui offering unique developer/designer interface and enhanced extensibility and customi...
Windows Azure - PHP contributions: PhpAzureExtensions (Azure Drives) - 0.2.0: Extension for use with Windows Azure SDK 1.1! Breaking changes! Documentation can be found at http://phpazurecontrib.codeplex.com/wikipage?title=A...
WoW Character Viewer: Viewer (40545): New setup build for 40545.
Xrns2XMod: Xrns2XMod 0.0.5.3: Major Source code optimization: >> Separated logical code of xm/mod conversion from renoiseSong xml. Now all necessary renoise song data code is st...
XsltDb - DotNetNuke XSLT module: 01.00.99: callable tag is introduced - create javascript ajax functions more easy import/export bug is fixed mdo:ajax checkbox processing is now the same...

Most Popular Projects

Rawr
WBFS Manager
Silverlight Toolkit
AJAX Control Toolkit
Microsoft SQL Server Product Samples: Database
Windows Presentation Foundation (WPF)
ASP.NET
patterns & practices – Enterprise Library
PHPExcel
Microsoft SQL Server Community & Samples

Most Active Projects

Rawr
patterns & practices – Enterprise Library
Industrial Dashboard
Ionics Isapi Rewrite Filter
Farseer Physics Engine
BlogEngine.NET
PHPExcel
Caliburn: An Application Framework for WPF and Silverlight
NB_Store - Free DotNetNuke Ecommerce Catalog Module
TweetSharp

Getting IIS Worker Process Crash dumps

- by CVertex

I'm doing something bad in my ASP.NET app. It could be any of the CTP libraries I'm using, or I'm just not disposing of something properly. But when I redeploy my ASP.NET app to my Vista IIS7 install or my server's IIS6 install, I crash an IIS worker process. I've narrowed the problem down to my HTTP crawler, which is a multithreaded beast that crawls sites for useful information when asked to. After I start a crawler and redeploy the app over the top, rather than gracefully unloading the appDomain and reloading, an IIS worker process will crash (popping up a crash message) and then continue reloading the app domain. When this crash happens, where can I find the crash dump for analysis?

SwingWorker in Java (beginner question)

- by Malachi

I am relatively new to multi-threading and want to execute a background task using a SwingWorker thread. The method that is called does not actually return anything, but I would like to be notified when it has completed. The code I have so far doesn't appear to be working:

    private void crawl(ActionEvent evt) {
        try {
            SwingWorker<Void, Void> crawler = new SwingWorker<Void, Void>() {
                @Override
                protected Void doInBackground() throws Exception {
                    Discoverer discover = new Discoverer();
                    discover.crawl();
                    return null;
                }

                @Override
                protected void done() {
                    JOptionPane.showMessageDialog(jfThis, "Finished Crawling", "Success", JOptionPane.INFORMATION_MESSAGE);
                }
            };
            crawler.execute();
        } catch (Exception ex) {
            JOptionPane.showMessageDialog(this, ex.getMessage(), "Exception", JOptionPane.ERROR_MESSAGE);
        }
    }

Any feedback/advice would be greatly appreciated, as multi-threading is a big area of programming that I am weak in.
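The "run in the background, get notified when done" pattern can be sketched outside Swing as well. The Python version below is only an analogy using the standard concurrent.futures module; crawl and on_done are hypothetical stand-ins for Discoverer.crawl() and done(). One point carries over directly: with SwingWorker, an exception thrown inside doInBackground() does not reach the try/catch around execute(); it only surfaces when get() is called, typically inside done(). The sketch makes the same check with future.result().

```python
import threading
from concurrent.futures import ThreadPoolExecutor

finished = threading.Event()
messages = []


def crawl():
    """Stand-in for Discoverer.crawl(); returns nothing."""
    return None


def on_done(future):
    # result() re-raises any exception from the background task,
    # much like calling get() inside SwingWorker.done().
    try:
        future.result()
        messages.append("Finished Crawling")
    except Exception as ex:
        messages.append("Exception: {}".format(ex))
    finally:
        finished.set()


with ThreadPoolExecutor(max_workers=1) as executor:
    executor.submit(crawl).add_done_callback(on_done)

finished.wait(timeout=5)
```

Unlike SwingWorker.done(), which runs on the Event Dispatch Thread, this callback runs on whichever thread completes the future, so a GUI program would still need to hand the notification back to its UI thread.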

MOSS search crawl fails with "Access is denied ..."

- by strongopinions

Recently the search crawler stopped working on my MOSS installation. The message in the crawl log is:

    Access is denied. Check that the Default Content Access Account has access to this content, or add a crawl rule to crawl this content. (The item was deleted because it was either not found or the crawler was denied access to it.)

The default content access account is an admin on the site collection that I am trying to crawl. Almost every result for this error on Google tells me to add the DisableLoopbackCheck registry key with a value of 1. I have done this and rebooted, and the error continues. The "Do not allow Basic Authentication" checkbox on my crawl rule screen is unchecked. Is there anything else that could be causing this error? Something with file system or database permissions, maybe?

Include models in CakePHP without going through index.php

- by Nicklas Ansman

Hey! I'm using CakePHP to build a site which has a crawler that mines data from different sites. The problem is that I would like to be able to call the crawler from the command line (since it can take hours for it to finish), and I would like to use the CakePHP models when saving the data. Is there a file I could include that includes the rest for me? If I just include index.php, it dispatches the call, which I don't want. Regards, Nicklas

How to enable indexing of pages with dynamic data?

- by mithunb

I have a site where certain URLs point to pages with permanent data and others point to dynamic web pages. Google indexes both regularly. By the time a user finds one of the dynamic content URLs, the data on the page has already changed and the user does not find what he was looking for. Further, the dynamic URL pages contain links to the permanent URLs (which I want Google or any crawler to index). Google's crawler controls (Webmaster Tools) cannot be made to read URLs from a page without indexing it. Solutions? crawling strategies *system architecture*.

AJAX, DHTML, and SEO Search Engine Optimization

AJAX and DHTML can be used to give websites a rich user experience, but they do not work well with SEO: search crawler robots do not crawl and analyze the JavaScript.

Algorithm to map an area [on hold]

- by user37843

I want to create a crawler that starts in a room and moves North, East, West and South from it until there aren't any new rooms to visit. I don't want duplicates, and each line of the output should look something like this: current room, neighbour 1, neighbour 2 ... At the end I want to apply the BFS algorithm to find the shortest path between two rooms. Can anyone offer me some suggestions on what to use? Thanks
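One possible approach, sketched in Python under stated assumptions: the map is a small hypothetical grid in which '.' is a room and '#' is a wall, and rooms are identified by (row, column) pairs. The explore function visits each reachable room once and records its neighbours; shortest_path then runs a breadth-first search over that adjacency map.

```python
from collections import deque

# Hypothetical map: '#' is a wall, '.' is a room; rooms are (row, col) pairs.
GRID = [
    "..#",
    ".#.",
    "...",
]
MOVES = [(-1, 0), (0, 1), (0, -1), (1, 0)]  # North, East, West, South


def neighbours(room):
    """Yield the rooms directly reachable from the given room."""
    row, col = room
    for d_row, d_col in MOVES:
        r, c = row + d_row, col + d_col
        if 0 <= r < len(GRID) and 0 <= c < len(GRID[r]) and GRID[r][c] == ".":
            yield (r, c)


def explore(start):
    """Visit every reachable room once; return {room: [neighbours]}."""
    adjacency = {}
    queue = deque([start])
    while queue:
        room = queue.popleft()
        if room in adjacency:
            continue  # already mapped, so no duplicates
        adjacency[room] = list(neighbours(room))
        queue.extend(n for n in adjacency[room] if n not in adjacency)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns the list of rooms, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        room = queue.popleft()
        if room == goal:
            path = []
            while room is not None:
                path.append(room)
                room = previous[room]
            return path[::-1]
        for n in adjacency[room]:
            if n not in previous:
                previous[n] = room
                queue.append(n)
    return None


rooms = explore((0, 0))
for room, adjacent in rooms.items():
    # One line per room: current room, neighbour 1, neighbour 2, ...
    print(room, *adjacent, sep=", ")
path = shortest_path(rooms, (0, 0), (1, 2))
```

A dictionary keyed by room doubles as the visited set, which is what keeps duplicates out of the output; in a real crawler the neighbours function would be replaced by whatever call reports a room's exits.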

Monitoring GWT Website

- by Raf

We currently monitor our webapps using curl. More and more of our webapps use the GWT framework, which relies on tons of JavaScript, so we can no longer rely on our curl system for monitoring. We are therefore looking for the right tool, but it seems difficult to find a crawler which is light (no Selenium please) but handles JavaScript correctly. PS: we host our webapps as well as the probes, and we don't want any Internet monitoring service.

Learning About SEO

SEO stands for Search Engine Optimization. It is a way of improving your site by making it friendly to the search engine crawler. Most SEO specialists devote most of their time to getting a given website to the top of the search results in Google, Yahoo!, etc.

The Job of SEO Spiders

The World Wide Web, often loosely called the Internet, is a very complex world. Search engines like Google need a software program that can read what's on the web. This software program is known as a bot, spider, or crawler.

wget crawling search results of news website

- by kiltek

I am trying to crawl the search results of a news website using wget. The name of the website is www.voanews.com. After typing in my search keyword and clicking search, it proceeds to the results. Then I can specify a "to" and a "from" date and hit search again. After this the URL becomes:

    http://www.voanews.com/search/?st=article&k=mykeyword&df=10%2F01%2F2013&dt=09%2F20%2F2013&ob=dt#article

and the actual content of the results is what I want to download. To achieve this I created the following wget command:

    wget --reject=js,txt,gif,jpeg,jpg \
         --accept=html \
         --user-agent=My-Browser \
         --recursive --level=2 \
         www.voanews.com/search/?st=article&k=germany&df=08%2F21%2F2013&dt=09%2F20%2F2013&ob=dt#article

Unfortunately, the crawler doesn't download the search results. It only gets into the upper link bar, which contains the "Home,USA,Africa,Asia,..." links, and saves the articles they link to. It seems like the crawler doesn't check the search result links at all. What am I doing wrong, and how can I modify the wget command to download only the links in the search results list (and of course the pages they link to)?
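One likely contributor is shell quoting: the URL is passed to wget unquoted, so the shell treats each & as a command separator and wget only receives the part up to ?st=article. The Python sketch below shows one way to build the same URL from its parameters and print a safely quoted command line; it uses only urllib.parse.urlencode and shlex.quote from the standard library, and the parameter values are the ones from the command above. It addresses the quoting only and is not a guarantee that wget will then follow the result links.

```python
import shlex
from urllib.parse import urlencode

# Search parameters taken from the wget command in the question.
params = {
    "st": "article",
    "k": "germany",
    "df": "08/21/2013",
    "dt": "09/20/2013",
    "ob": "dt",
}
url = "http://www.voanews.com/search/?" + urlencode(params)

command = [
    "wget",
    "--reject=js,txt,gif,jpeg,jpg",
    "--accept=html",
    "--user-agent=My-Browser",
    "--recursive",
    "--level=2",
    url,
]
# shlex.quote wraps the URL in single quotes so '&' reaches wget intact.
print(" ".join(shlex.quote(part) for part in command))
```

The #article fragment is left out on purpose: fragments are handled by the browser and are not sent to the server, so they do not change what wget downloads.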

Read the article
Agile: User Stories for Machine Learning Project?

- by benjismith

I've just finished up with a prototype implementation of a supervised learning algorithm, automatically assigning categorical tags to all the items in our company database (roughly 5 million items). The results look good, and I've been given the go-ahead to plan the production implementation project. I've done this kind of work before, so I know the functional components of the software. I need a collection of web crawlers to fetch data. I need to extract features from the crawled documents. Those documents need to be segregated into a "training set" and a "classification set", and feature-vectors need to be extracted from each document. Those feature vectors are self-organized into clusters, and the clusters are passed through a series of rebalancing operations. Etc etc etc etc. So I put together a plan, with about 30 unique development/deployment tasks, each with time estimates. The first stage of development -- ignoring some advanced features that we'd like to have in the long-term, but aren't high enough priority to make it into the development schedule yet -- is slated for about two months' worth of work. (Keep in mind that I already have a working prototype, so the final implementation is significantly simpler than if the project was starting from scratch.) My manager said the plan looked good to him, but he asked if I could reorganize the tasks into user stories, for a few reasons: (1) our project management software is totally organized around user stories; (2) all of our scheduling is based on fitting entire user stories into sprints, rather than individually scheduling tasks; (3) other teams -- like the web developers -- have made great use of agile methodologies, and they've benefited from modelling all the software features as user stories. So I created a user story at the top level of the project: As a user of the system, I want to search for items by category, so that I can easily find the most relevant items within a huge, complex database. 
Or maybe a better top-level story for this feature would be: As a content editor, I want to automatically create categorical designations for the items in our database, so that customers can easily find high-value data within our huge, complex database. But that's not the real problem. The tricky part, for me, is figuring out how to create subordinate user stories for the rest of the machine learning architecture. Case in point... I know that the algorithm requires two major architectural subdivisions: (A) training, and (B) classification. And I know that the training portion of the architecture requires construction of a cluster-space. All the Agile Development literature I've read seems to indicate that a user story should be the "smallest possible implementation that provides any business value". And that makes a lot of sense when designing a piece of end-user software. Start small, and then incrementally add value when users demand additional functionality. But a cluster-space, in and of itself, provides zero business value. Nor does a crawler, or a feature-extractor. There's no business value (not for the end-user, or for any of the roles internal to the company) in a partial system. A trained cluster-space is only possible with the crawler and feature extractor, and only relevant if we also develop an accompanying classifier. I suppose it would be possible to create user stories where the subordinate components of the system act as the users in the stories: As a supervised-learning cluster-space construction routine, I want to consume data from a feature extractor, so that I can exist. But that seems really weird. What benefit does it provide me as the developer (or our users, or any other stakeholders, for that matter) to model my user stories like that? Although the main story can be easily divided along architectural-component boundaries (crawler, trainer, classifier, etc), I can't think of any useful decomposition from a user's perspective. 
What do you guys think? How do you plan Agile user stories for sophisticated, indivisible, non-user-facing components?

Read the article
Best practice to hide/encrypt email address in webpage

- by Sebi

I couldn't find a similar question, so here it is: What's the best way to hide or encrypt an email link in a website, so that a crawler can't read it, but the user can nevertheless click it? I don't want to confuse the users by writing the email like this: john (at) mail.com or in similar ways. (And I think this kind of link can nevertheless be read by crawlers?) I also tried things like this: <script>// <![CDATA[eval(unescape('%76%61%72%20%73%3D%27%61%6D%6C%69%6F%74%72%3A%62%61%40%65%64%61%6E%6F%6C%2E%69%27%3B%76%61%72%20%72%3D%27%27%3B%66%6F%72%28%76%61%72%20%69%3D%30%3B%69%3C%73%2E%6C%65%6E%67%74%68%3B%69%2B%2B%2C%69%2B%2B%29%7B%72%3D%72%2B%73%2E%73%75%62%73%74%72%69%6E%67%28%69%2B%31%2C%69%2B%32%29%2B%73%2E%73%75%62%73%74%72%69%6E%67%28%69%2C%69%2B%31%29%7D%64%6F%63%75%6D%65%6E%74%2E%77%72%69%74%65%28%27%3C%61%20%68%72%65%66%3D%22%27%2B%72%2B%27%22%3E%4F%62%65%72%70%61%72%6C%65%69%74%65%72%3C%2F%61%3E%27%29%3B'))]]></script> but I heard this can also be read by crawlers and isn't really good practice. Are there any common approaches?
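When percent-decoded, the script in the question appears to store the `mailto:` link with each pair of adjacent characters swapped and to swap them back in the browser before writing the anchor. A minimal Python sketch of that pair-swapping step, as it might be run server-side to produce the scrambled string, is shown here. The function name `swap_pairs` is hypothetical, and the address is the placeholder from the question, not a real one. This only illustrates the transformation; it makes no claim about how well it holds up against crawlers that execute JavaScript.

```python
def swap_pairs(text: str) -> str:
    """Swap each pair of adjacent characters.

    A trailing character at an odd position has no partner and stays put,
    which mirrors the substring-based loop in the question's script.
    """
    out = []
    for i in range(0, len(text), 2):
        # text[i + 1:i + 2] is empty when i is the last index.
        out.append(text[i + 1:i + 2] + text[i:i + 1])
    return "".join(out)


# Placeholder address taken from the question's "john (at) mail.com".
plain = "mailto:john@mail.com"
scrambled = swap_pairs(plain)

print(scrambled)              # what would be embedded in the page
print(swap_pairs(scrambled))  # what the browser-side script restores
```

Because the swap is its own inverse, the same routine both scrambles and restores the string.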

Read the article
Wordpress paths issue

- by Martin

I have set up a crawler in WordPress that grabs stock data and writes it to a file. When a user enters a symbol/ticker, the variable is read, and if it matches data from a previous crawl for that company, the text file is echoed on the page. If no data is found, the crawler goes off, grabs it, and writes it to a file so it is saved for the next time that symbol is used. The problem I'm having is that everything works fine apart from one thing: when the content is written to file, it is saved in the WP root and not inside a subfolder of the theme. This means the root becomes untidy very quickly, and if the theme is used on another site it's not practical, as some important info is missing. I have tried bloginfo and an absolute path; both fail in the same way. This is the code I am using to write to file; as I say, it works apart from writing the file into the root. <?php $CompDetails = "http://www.devserverurl.com/mattv1/wp-content/themes/stocks/tools/modules/Stock_Quote/company_details/$Symbol.txt"; if (file_exists($CompDetails)) {} else { include ('crawler_file.php'); $html = file_get_html("http://targeturl.com/research/stocks/snapshot/snapshot.asp?ticker=$Symbol:US"); $es = $html->find('div[class="detailsDataContainerLt"]'); $tickerdetails = ("$es[0]"); $FileHandle2 = fopen($CompDetails, 'w') or die("can't open file"); fwrite($FileHandle2, $tickerdetails); fclose($FileHandle2); } ?> Edit: I have also tried the following, and the same thing happens as above: <?php if (file_exists($_SERVER['DOCUMENT_ROOT'] . "/wp-content/themes/stocks/tools/modules/Stock_Quote/company_details/$Symbol.txt")) {} else { include ('crawler_file.php'); $html = file_get_html("http://targeturl.com/research/stocks/snapshot/snapshot.asp?ticker=$Symbol:US"); $es = $html->find('div[class="detailsDataContainerLt"]'); $tickerdetails = ("$es[0]"); $FileHandle2 = fopen($_SERVER['DOCUMENT_ROOT'] . "/wp-content/themes/stocks/tools/modules/Stock_Quote/company_details/$Symbol.txt", 'w') or die("can't open file"); fwrite($FileHandle2, $tickerdetails); fclose($FileHandle2); } ?>

Read the article
