crawl rate - Page 41 - Developer IT

Does SEO optimisation count on the responsive side of a site?

- by Rick Donohoe

I'm looking at making some SEO optimisation fixes, and at this point I'm sorting out the heading structure and keywords - H1's, H2's etc We have a site where there are a number of similar blocks, and one is always visible, and one is hidden depending on the screen size. This is our method of making a single site responsive. Firstly, how does this technique affect the SEO, and in general does the responsive side of a site matter at all to search engines? What I mean by this is if the site has different content depending on screen sizes, then which content would the search spider crawl?

Read the article

Crawling for geotagged data

- by abe3

I have no experience with web crawlers -- but I know that Apache maintains an open source web crawler called "Lucene." How would I go about writing such a crawler to search the web for geo tagged data close to a particular location? What would a general road map look like? How do I pick which slice of the web to crawl? Do I use regular expressions to find things that look like longitudes and latitudes? What does a general sketch of that solution look like?

Read the article

robots.txt, how effective is it and how long does it take?

- by Stefan

We recently updated the site to a single page site using jQuery to slide between "pages". So we now have only index.php. When you search the company on engines such as Google, you get the site and a listing of its sub pages which now lead to outdated pages. Our plan doesn't allow us to edit the .htaccess and the old pages are .html docs so I cannot use PHP redirects either. So if I put in place a robots.txt telling the engines to not crawl beyond index.php, how effective will this be in preventing/removing crawled sub pages. And rough guess, how long before the search engines would update?

Read the article

I've changed my URL schema. How do I tell Google to index the new schema and forget the old one?

- by growse

I had a site where the urls were constructed like this /index.php/Topic /index.php/AnotherTopic These were indexed in google, and search results returned that pointed to these. However, I've recently replatformed that site, and reconfigured it so the above urls would be: /index.php?title=Topic /index.php?title=AnotherTopic The original urls are returning 404s. The site is linking to the correct URL schema internally, but Google is retaining the original schema in its search results. I've updated and resubmitted the sitemap which only contains the new schema. Also, Google's webmasters tool is going slightly bananas at the fact there's now a spike in 404 errors in its crawl results. What would be the best approach to get Google to 'forget' about the old schema, and instead index the new schema? Should I try blocking /index.php/ in robots.txt? Should I be returning 301 codes instead of 404 for the original urls?

Read the article

Did Google Delete my index?

- by Sei

I have a website that I haven't had much time to take care of. I updated it 4 times since last October, the contents are all original and informative. I can see those cache from Way back Machine and the site was indexed in Yahoo, but not in google. Did I get my index on Google deleted because I did not update it often? Normally Google crawl often and index fast, I just got really worried if I did something wrong. Is it possible that I set something wrong in the hosting or something? Please help me

Read the article

SEO: Joomla Category Page Optimization + Canonical Linking

- by Huberis

I'm wondering how best to optimize my Joomla site's SEO. I have pages with multiple articles on each page. Either via category-type pages, or via modules. In each case, I'm not wanting users to access the articles separately from the forward facing, menu-linked pages. I understand however that Joomla still generates a url for those articles, and Google can still crawl and display these articles separate from the pages. My question is what is the best way to control this so that my users get directed only to the front-facing pages? By using the canonical element for each article to point to the front-facing page it's on? Or is there a better method? Thanks for your help!

Read the article

problem showing my website correctly in search engines

- by dinbrca

Hello guys, I have a website which i have indexed on google for example (like 15 days ago). some of my pages pass arguments like: http://www.bla.com/products.php?pro=bla&page=view suddently i saw that passing arguments like this isn't good for SEO purposes and started using htaccess rewrite. and changed the arguments to like this: http://www.bla.com/products/bla/*view*/ now my site on google still shows as i showed at link number 1 what should i do? i thought i should wait for the search engine to crawl my site again but nothing happened. thanks in advanced, Din

Read the article

Data architecture for event log metrics?

- by elliot42

My service has a large ongoing number of user events, and we would like to do things like "count occurrence of event type T since date D." We are trying to make two basic decisions: What to store? Storing every event vs. only storing aggregates (Event log style) log every event and count them later, vs. (Time-series style) store a single aggregated "count of event E for date D" for every day Where to store the data In a relational database (particularly MySQL) In a non-relational (NoSQL) database In flat log files (collected centrally over the network via syslog-ng) What is standard practice / where can I read more about comparing the different types of systems? Additional details: The total event stream is large, potentially hundreds of thousands of entries per day But our current need is only to count certain types of events within it We don't necessarily need real-time access to the raw data or aggregation results IMHO, "log all events to files, crawl them at a later time to filter and aggregate the stream" is a pretty standard UNIX Way, but my Rails-y compatriots seem to think that nothing is real unless it's in MySQL.

Read the article

My website google index suddenly increase and also suddenly reduced

- by Jeg Bagus

Yesterday before i sleep, i check my site index. i get about 50 index on google. today morning when i wake up, i get 250 index on google. and my page ranking better on several keyword. than i add 1 page and 2 canonical link, add 404 page header, and resubmit sitemap. and after 2 hour, its going down to 50 index again. and my page ranking just rolled back to previous day. what is actually happen? is it because i resubmit sitemap? until now, google still crawl my website. do they try to refresh the index?

Read the article

Google is still crawling and indexing my old, dummy, test pages which now are 404 not found

- by Ace

I have set up my site with sample pages and data (lorem ipsum, etc..) and Google has crawled these pages. I deleted all these pages and actually added real content but in webmaster tools, i still get a lot of 404 errors Google trying to crawl these pages. I have set them to "mark as resolved" but some pages still come back as 404. Furthermore, I have a lot of these sample pages still listed when i do a search of my site on Google. How to remove them. I think these irrelevant pages are hurting my rating. I actually wanted to erase all these pages and start getting my site being being indexed as a new one but I read it's not possible? (I have submitted a sitemap and used "Fetch as Google.")

Read the article

Space in img:s "ALT" attribute good/bad for search engines?

- by Camran

I am trying to make it easier for search engines to crawl my website, as it is almost 100% dynamic. I have a couple of transparent images which are actually links to sections of my page. I wonder, if I add an "alt" attribute containing space characters to explain the target, will this improve SE rankings etc? For example: <img src="blabla.png" alt="post new classified"> Or will this just result in errors? Ànd, what should I put in the alt attribute if I can't use space? PS: Another different and short question, will javascript-rich content make a page less important to crawlers? Thanks

Read the article

Google doesn't index a subdomain. What can be the problem and what can be done?

- by fudge

I have a domain, let's call it example.com, which has a subdomain, games.example.com. I maintain a games forum using phpbbseo which is located at games.example.com/forum. The problem is that the forum is not being crawled. I used Google's webmaster tools and tested that the page is seen by google. P.S. There is a link from games.example.com to games.example.com/forum. What can I do? How can I make google crawl my forum?

Read the article

Issue with sitemap in GWT

- by Anusha

I have an e-commerce website www.beyondtime.in, i have been constantly monitoring the google bot crawling on my website and my webmaster account. Lately, i have found two issues that i have not been able to understand and hence want your help. 1.) The Google Bots have been only crawling www.beyondtime.in/telecom.php this URL of my website, when the URL is not even valid. So, kindly help me understand what needs to be done to let Google crawl other pages of the website as well. 2.) The second question is about the Google Webmaster account, where i've submitted my sitmap with 227 URLs, but out of that only 156 have been indexed. Also none of the images of my website have been indexed by Google. So kindly help me with this as well. Thanks

Read the article

Should I include everything in the sitemap or only new content?

- by Mee

For a website with dynamic content (new content is constantly being added), should I only include the newest content in the sitemap or should I include everything (with a sitemap index)? What are the best practices for sitemaps esp. for large sites? Also, is there anyway to make google (and other search engines) only crawl the pages in the sitemap? Thanks Update: Also, any idea how stackoverflow handle this? I'd like to know but unfortunately (also understandingly) they have blocked access to their sitemap.

Read the article

Ubuntu slows down even after cpu-intensive process is ended

- by Matt2

After a Skype video call, or the use of virtualbox, Ubuntu slows down to a crawl, even after the process is ended. Running htop reveals that processes that used little CPU before are now all using about 30% cpu (namely Compiz, Firefox, Python, and Skype, but I'm sure there are others), to the point where all my cores are at 99%. All I can do from here is restart. Any idea why this is happpening? I'm running Ubuntu 12.04 64-bit on 3.7 GiB of memory, Intel® Core™ i3 CPU M 330 @ 2.13GHz × 4, VESA: M92 graphics driver. Not sure why I'm running VESA, I installed fglrx, but I suppose that's a different question. Thanks in advance!

Read the article

How to make a jar file run on startup & and when you log out?

- by RanZilber

I have no idea where to start looking. I've been reading about daemons and didn't understand the concept. More details : I've been writing a crawler which never stops and crawlers over RSS in the internet. The crawler has been written in java - therefore its a jar right now. I'm an administrator on a machine that has Ubuntu 11.04 . There is some chances for the machine to crash , so I'd like the crawler to run every time you startup the machine. Furthermore, I'd like it to keep running even when i logged out. I'm not sure this is possible, but most of the time I'm logged out, and I still want to it crawl. Any ideas? Can someone point me in the right direction? Just looking for the simplest solution.

Read the article

Restricting A Directory Through .htaccess

- by Whitechapel

I'm trying to put all of my FTP accounts into a folder on /public_html/ftp and password protect it so search bots can't crawl their private files. I'm also trying to redirect all site traffic from the non-www to www. I keep getting 500 errors when accessing the site, and I need to point it to www.vivalanation.com/ftp to www.vivalanation.com/ftp/, because the /ftp just errors out, you need the trailing slash. Here is my .htaccess in the /public_html/ftp folder: RewriteEngine on RewriteBase / RewriteCond %{HTTP_HOST} !^www\. [NC] RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L] AuthName "FTP Access" AuthType Basic AuthUserFile /home1/vivalst/.htpasswds/public_html/ftp/passwd Require valid-user I created a passwd file in /.htpasswds/public_html/ftp And here is my basic .htaccess in the root of /public_html/: RewriteEngine on RewriteBase / RewriteCond %{HTTP_HOST} !^www\. [NC] RewriteRule ^(.*)$ http://www.%{HTTP_HOST}/$1 [R=301,L]

Read the article

Site returning 404 header to google, not sure why

- by Damon

A Drupal site that works fine for regular users returns a 404 not found error when I try to use the W3C validator on it; it is also not being indexed by google at all (which is the main issue but I suspect there is a connection). It is a https:// site with .htaccess rule to redirect any http:// request to the https://. I had had it running in google webmaster tools and thought it was fine, but it turns out I had not added the https domain. After adding the https domain it's also returning the header as HTTP/1.1 404 Not Found Date: Mon, 15 Oct 2012 19:37:43 GMT Server: Apache Expires: Sun, 19 Nov 1978 05:00:00 GMT Cache-Control: no-cache, must-revalidate, post-check=0, pre-check=0 Robots.txt just has User-agent: * Crawl-delay: 10 # Files Disallow: /cron.php How can I check what the issue is here?

Read the article

If C-Panel Indexing Manager sets a folder to "No Indexing" can it be crawled by a webcrawler?

- by Graham

People are able to view directories / folders on my site right now. So, they could go to mysite.com/images and see the full index. To prevent this, C-Panel offers an option to set a directory / folder to "No Indexing" under the "Index Manager." Will this option allow webcrawlers to crawl / index the images? Or, is there a simpler alternative to block access to all folders directly while still having it SEO friendly? My old server restricted direct access to folders by default. But, the new one does not. Any ideas on this? Thanks!

Read the article

Hide from google while developing

- by user210757

I will be building a (wordpress) web site. While I am developing, other team members will be pushing content. I'd like to have it hidden from google while under development. It will be hosted on godaddy. I have thought of not pointing the domain name to it until live and using "preview dns", or buying a static IP during development. Or hosting dev site in a sub-directory ("/dev/") until ready and then moving it up a level. If in the dev directory I'd add htaccess or robots.txt to not crawl. Is any of this a bad idea? Will google penalize for any of this - like search by IP and then associate that with the domain later on? Any better ideas?

Read the article

SEO Influenc search result per device class (mobile/desktop)

- by user32224

We're currently building a new responsive website and while working on the site map figured that we don't want to show certain sections on mobile devices. This can be easily done by hiding the navigation parts using css/media queries. However, trouble is that the hidden sites would still show up in search engines' search results. If a user happens to click on one of these links she might happen to see a badly formatted page as we'd use desktop/tablet only code to show images and video. Is there any way to "influence" to exclude certain pages if the search is done on a mobile device? Do search engines crawl pages once or with a device specific view twice? Could we set a noindex meta tag for a specific device class?

Read the article

Exclude pages from search results based on device class (mobile/desktop)

- by user32224

We're currently building a new responsive website. While working on the site map, we figured that we don't want to show certain sections on mobile devices. This can be easily done by hiding the navigation parts using CSS/media queries. However, the trouble is that the hidden sites would still show up in search engine results. If a user happens to click on one of these links she might happen to see a badly formatted page as we'd use desktop/tablet only code to show images and video. Is there any way influence the search engines to exclude certain pages if the search is done on a mobile device? Do search engines crawl pages once or with a device specific view twice? Could we set a noindex meta tag for a specific device class?

Read the article

what does calling ´this´ outside of a jquery plugin refer to

- by Richard

Hi, I am using the liveTwitter plugin The problem is that I need to stop the plugin from hitting the Twitter api. According to the documentation I need to do this $("#tab1 .container_twitter_status").each(function(){ this.twitter.stop(); }); Already, the each does not make sense on an id and what does this refer to? Anyway, I get an undefined error. I will paste the plugin code and hope it makes sense to somebody MY only problem thusfar with this plugin is that I need to be able to stop it. thanks in advance, Richard /* * jQuery LiveTwitter 1.5.0 * - Live updating Twitter plugin for jQuery * * Copyright (c) 2009-2010 Inge Jørgensen (elektronaut.no) * Licensed under the MIT license (MIT-LICENSE.txt) * * $Date: 2010/05/30$ */ /* * Usage example: * $("#twitterSearch").liveTwitter('bacon', {limit: 10, rate: 15000}); */ (function($){ if(!$.fn.reverse){ $.fn.reverse = function() { return this.pushStack(this.get().reverse(), arguments); }; } $.fn.liveTwitter = function(query, options, callback){ var domNode = this; $(this).each(function(){ var settings = {}; // Handle changing of options if(this.twitter) { settings = jQuery.extend(this.twitter.settings, options); this.twitter.settings = settings; if(query) { this.twitter.query = query; } this.twitter.limit = settings.limit; this.twitter.mode = settings.mode; if(this.twitter.interval){ this.twitter.refresh(); } if(callback){ this.twitter.callback = callback; } // ..or create a new twitter object } else { // Extend settings with the defaults settings = jQuery.extend({ mode: 'search', // Mode, valid options are: 'search', 'user_timeline' rate: 15000, // Refresh rate in ms limit: 10, // Limit number of results refresh: true }, options); // Default setting for showAuthor if not provided if(typeof settings.showAuthor == "undefined"){ settings.showAuthor = (settings.mode == 'user_timeline') ? false : true; } // Set up a dummy function for the Twitter API callback if(!window.twitter_callback){ window.twitter_callback = function(){return true;}; } this.twitter = { settings: settings, query: query, limit: settings.limit, mode: settings.mode, interval: false, container: this, lastTimeStamp: 0, callback: callback, // Convert the time stamp to a more human readable format relativeTime: function(timeString){ var parsedDate = Date.parse(timeString); var delta = (Date.parse(Date()) - parsedDate) / 1000; var r = ''; if (delta < 60) { r = delta + ' seconds ago'; } else if(delta < 120) { r = 'a minute ago'; } else if(delta < (45*60)) { r = (parseInt(delta / 60, 10)).toString() + ' minutes ago'; } else if(delta < (90*60)) { r = 'an hour ago'; } else if(delta < (24*60*60)) { r = '' + (parseInt(delta / 3600, 10)).toString() + ' hours ago'; } else if(delta < (48*60*60)) { r = 'a day ago'; } else { r = (parseInt(delta / 86400, 10)).toString() + ' days ago'; } return r; }, // Update the timestamps in realtime refreshTime: function() { var twitter = this; $(twitter.container).find('span.time').each(function(){ $(this).html(twitter.relativeTime(this.timeStamp)); }); }, // Handle reloading refresh: function(initialize){ var twitter = this; if(this.settings.refresh || initialize) { var url = ''; var params = {}; if(twitter.mode == 'search'){ params.q = this.query; if(this.settings.geocode){ params.geocode = this.settings.geocode; } if(this.settings.lang){ params.lang = this.settings.lang; } if(this.settings.rpp){ params.rpp = this.settings.rpp; } else { params.rpp = this.settings.limit; } // Convert params to string var paramsString = []; for(var param in params){ if(params.hasOwnProperty(param)){ paramsString[paramsString.length] = param + '=' + encodeURIComponent(params[param]); } } paramsString = paramsString.join("&"); url = "http://search.twitter.com/search.json?"+paramsString+"&callback=?"; } else if(twitter.mode == 'user_timeline') { url = "http://api.twitter.com/1/statuses/user_timeline/"+encodeURIComponent(this.query)+".json?count="+twitter.limit+"&callback=?"; } else if(twitter.mode == 'list') { var username = encodeURIComponent(this.query.user); var listname = encodeURIComponent(this.query.list); url = "http://api.twitter.com/1/"+username+"/lists/"+listname+"/statuses.json?per_page="+twitter.limit+"&callback=?"; } $.getJSON(url, function(json) { var results = null; if(twitter.mode == 'search'){ results = json.results; } else { results = json; } var newTweets = 0; $(results).reverse().each(function(){ var screen_name = ''; var profile_image_url = ''; if(twitter.mode == 'search') { screen_name = this.from_user; profile_image_url = this.profile_image_url; created_at_date = this.created_at; } else { screen_name = this.user.screen_name; profile_image_url = this.user.profile_image_url; // Fix for IE created_at_date = this.created_at.replace(/^(\w+)\s(\w+)\s(\d+)(.*)(\s\d+)$/, "$1, $3 $2$5$4"); } var userInfo = this.user; var linkified_text = this.text.replace(/[A-Za-z]+:\/\/[A-Za-z0-9-_]+\.[A-Za-z0-9-_:%&\?\/.=]+/, function(m) { return m.link(m); }); linkified_text = linkified_text.replace(/@[A-Za-z0-9_]+/g, function(u){return u.link('http://twitter.com/'+u.replace(/^@/,''));}); linkified_text = linkified_text.replace(/#[A-Za-z0-9_\-]+/g, function(u){return u.link('http://search.twitter.com/search?q='+u.replace(/^#/,'%23'));}); if(!twitter.settings.filter || twitter.settings.filter(this)) { if(Date.parse(created_at_date) > twitter.lastTimeStamp) { newTweets += 1; var tweetHTML = '<div class="tweet tweet-'+this.id+'">'; if(twitter.settings.showAuthor) { tweetHTML += '<img width="24" height="24" src="'+profile_image_url+'" />' + '<p class="text"><span class="username"><a href="http://twitter.com/'+screen_name+'">'+screen_name+'</a>:</span> '; } else { tweetHTML += '<p class="text"> '; } tweetHTML += linkified_text + ' <span class="time">'+twitter.relativeTime(created_at_date)+'</span>' + '</p>' + '</div>'; $(twitter.container).prepend(tweetHTML); var timeStamp = created_at_date; $(twitter.container).find('span.time:first').each(function(){ this.timeStamp = timeStamp; }); if(!initialize) { $(twitter.container).find('.tweet-'+this.id).hide().fadeIn(); } twitter.lastTimeStamp = Date.parse(created_at_date); } } }); if(newTweets > 0) { // Limit number of entries $(twitter.container).find('div.tweet:gt('+(twitter.limit-1)+')').remove(); // Run callback if(twitter.callback){ twitter.callback(domNode, newTweets); } // Trigger event $(domNode).trigger('tweets'); } }); } }, start: function(){ var twitter = this; if(!this.interval){ this.interval = setInterval(function(){twitter.refresh();}, twitter.settings.rate); this.refresh(true); } }, stop: function(){ if(this.interval){ clearInterval(this.interval); this.interval = false; } } }; var twitter = this.twitter; this.timeInterval = setInterval(function(){twitter.refreshTime();}, 5000); this.twitter.start(); } }); return this; }; })(jQuery);

Read the article

How can I use Perl regular expressions to parse XML data?

- by Luke

I have a pretty long piece of XML that I want to parse. I want to remove everything except for the subclass-code and city. So that I am left with something like the example below. EXAMPLE TEST SUBCLASS|MIAMI CODE <?xml version="1.0" standalone="no"?> <web-export> <run-date>06/01/2010 <pub-code>TEST <ad-type>TEST <cat-code>Real Estate</cat-code> <class-code>TEST</class-code> <subclass-code>TEST SUBCLASS</subclass-code> <placement-description></placement-description> <position-description>Town House</position-description> <subclass3-code></subclass3-code> <subclass4-code></subclass4-code> <ad-number>0000284708-01</ad-number> <start-date>05/28/2010</start-date> <end-date>06/09/2010</end-date> <line-count>6</line-count> <run-count>13</run-count> <customer-type>Private Party</customer-type> <account-number>100099237</account-number> <account-name>DOE, JOHN</account-name> <addr-1>207 CLARENCE STREET</addr-1> <addr-2> </addr-2> <city>MIAMI</city> <state>FL</state> <postal-code>02910</postal-code> <country>USA</country> <phone-number>4014612880</phone-number> <fax-number></fax-number> <url-addr> </url-addr> <email-addr>[email protected]</email-addr> <pay-flag>N</pay-flag> <ad-description>DEANESTATES2BEDS2BATHSAPPLIANCED</ad-description> <order-source>Import</order-source> <order-status>Live</order-status> <payor-acct>100099237</payor-acct> <agency-flag>N</agency-flag> <rate-note></rate-note> <ad-content> MIAMI/Dean Estates: 2 beds, 2 baths. Applianced. Central air. Carpets. Laundry. 2 decks. Pool. Parking. Close to everything.No smoking. No utilities. $1275 mo. 401-578-1501. </ad-content> </ad-type> </pub-code> </run-date> </web-export> PERL So what I want to do is open an existing file read the contents then use regular expressions to eliminate the unnecessary XML tags. open(READFILE, "FILENAME"); while(<READFILE>) { $_ =~ s/<\?xml version="(.*)" standalone="(.*)"\?>\n.*//g; $_ =~ s/<subclass-code>//g; $_ =~ s/<\/subclass-code>\n.*/|/g; $_ =~ s/(.*)PJ RER Houses /PJ RER Houses/g; $_ =~ s/\G //g; $_ =~ s/<city>//g; $_ =~ s/<\/city>\n.*//g; $_ =~ s/<(\/?)web-export>(.*)\n.*//g; $_ =~ s/<(\/?)run-date>(.*)\n.*//g; $_ =~ s/<(\/?)pub-code>(.*)\n.*//g; $_ =~ s/<(\/?)ad-type>(.*)\n.*//g; $_ =~ s/<(\/?)cat-code>(.*)<(\/?)cat-code>\n.*//g; $_ =~ s/<(\/?)class-code>(.*)<(\/?)class-code>\n.*//g; $_ =~ s/<(\/?)placement-description>(.*)<(\/?)placement-description>\n.*//g; $_ =~ s/<(\/?)position-description>(.*)<(\/?)position-description>\n.*//g; $_ =~ s/<(\/?)subclass3-code>(.*)<(\/?)subclass3-code>\n.*//g; $_ =~ s/<(\/?)subclass4-code>(.*)<(\/?)subclass4-code>\n.*//g; $_ =~ s/<(\/?)ad-number>(.*)<(\/?)ad-number>\n.*//g; $_ =~ s/<(\/?)start-date>(.*)<(\/?)start-date>\n.*//g; $_ =~ s/<(\/?)end-date>(.*)<(\/?)end-date>\n.*//g; $_ =~ s/<(\/?)line-count>(.*)<(\/?)line-count>\n.*//g; $_ =~ s/<(\/?)run-count>(.*)<(\/?)run-count>\n.*//g; $_ =~ s/<(\/?)customer-type>(.*)<(\/?)customer-type>\n.*//g; $_ =~ s/<(\/?)account-number>(.*)<(\/?)account-number>\n.*//g; $_ =~ s/<(\/?)account-name>(.*)<(\/?)account-name>\n.*//g; $_ =~ s/<(\/?)addr-1>(.*)<(\/?)addr-1>\n.*//g; $_ =~ s/<(\/?)addr-2>(.*)<(\/?)addr-2>\n.*//g; $_ =~ s/<(\/?)state>(.*)<(\/?)state>\n.*//g; $_ =~ s/<(\/?)postal-code>(.*)<(\/?)postal-code>\n.*//g; $_ =~ s/<(\/?)country>(.*)<(\/?)country>\n.*//g; $_ =~ s/<(\/?)phone-number>(.*)<(\/?)phone-number>\n.*//g; $_ =~ s/<(\/?)fax-number>(.*)<(\/?)fax-number>\n.*//g; $_ =~ s/<(\/?)url-addr>(.*)<(\/?)url-addr>\n.*//g; $_ =~ s/<(\/?)email-addr>(.*)<(\/?)email-addr>\n.*//g; $_ =~ s/<(\/?)pay-flag>(.*)<(\/?)pay-flag>\n.*//g; $_ =~ s/<(\/?)ad-description>(.*)<(\/?)ad-description>\n.*//g; $_ =~ s/<(\/?)order-source>(.*)<(\/?)order-source>\n.*//g; $_ =~ s/<(\/?)order-status>(.*)<(\/?)order-status>\n.*//g; $_ =~ s/<(\/?)payor-acct>(.*)<(\/?)payor-acct>\n.*//g; $_ =~ s/<(\/?)agency-flag>(.*)<(\/?)agency-flag>\n.*//g; $_ =~ s/<(\/?)rate-note>(.*)<(\/?)rate-note>\n.*//g; $_ =~ s/<ad-content>(.*)\n.*//g; $_ =~ s/\t(.*)\n.*//g; $_ =~ s/<\/ad-content>(.*)\n.*//g; } close( READFILE1 ); Is there an easier way of doing this? I don't want to use any modules. I know that it might make this easier but the file I am reading has a lot of data in it.

Read the article

Problems with real-valued input deep belief networks (of RBMs)

- by Junier

I am trying to recreate the results reported in Reducing the dimensionality of data with neural networks of autoencoding the olivetti face dataset with an adapted version of the MNIST digits matlab code, but am having some difficulty. It seems that no matter how much tweaking I do on the number of epochs, rates, or momentum the stacked RBMs are entering the fine-tuning stage with a large amount of error and consequently fail to improve much at the fine-tuning stage. I am also experiencing a similar problem on another real-valued dataset. For the first layer I am using a RBM with a smaller learning rate (as described in the paper) and with negdata = poshidstates*vishid' + repmat(visbiases,numcases,1); I'm fairly confident I am following the instructions found in the supporting material but I cannot achieve the correct errors. Is there something I am missing? See the code I'm using for real-valued visible unit RBMs below, and for the whole deep training. The rest of the code can be found here. rbmvislinear.m: epsilonw = 0.001; % Learning rate for weights epsilonvb = 0.001; % Learning rate for biases of visible units epsilonhb = 0.001; % Learning rate for biases of hidden units weightcost = 0.0002; initialmomentum = 0.5; finalmomentum = 0.9; [numcases numdims numbatches]=size(batchdata); if restart ==1, restart=0; epoch=1; % Initializing symmetric weights and biases. vishid = 0.1*randn(numdims, numhid); hidbiases = zeros(1,numhid); visbiases = zeros(1,numdims); poshidprobs = zeros(numcases,numhid); neghidprobs = zeros(numcases,numhid); posprods = zeros(numdims,numhid); negprods = zeros(numdims,numhid); vishidinc = zeros(numdims,numhid); hidbiasinc = zeros(1,numhid); visbiasinc = zeros(1,numdims); sigmainc = zeros(1,numhid); batchposhidprobs=zeros(numcases,numhid,numbatches); end for epoch = epoch:maxepoch, fprintf(1,'epoch %d\r',epoch); errsum=0; for batch = 1:numbatches, if (mod(batch,100)==0) fprintf(1,' %d ',batch); end %%%%%%%%% START POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% data = batchdata(:,:,batch); poshidprobs = 1./(1 + exp(-data*vishid - repmat(hidbiases,numcases,1))); batchposhidprobs(:,:,batch)=poshidprobs; posprods = data' * poshidprobs; poshidact = sum(poshidprobs); posvisact = sum(data); %%%%%%%%% END OF POSITIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% poshidstates = poshidprobs > rand(numcases,numhid); %%%%%%%%% START NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% negdata = poshidstates*vishid' + repmat(visbiases,numcases,1);% + randn(numcases,numdims) if not using mean neghidprobs = 1./(1 + exp(-negdata*vishid - repmat(hidbiases,numcases,1))); negprods = negdata'*neghidprobs; neghidact = sum(neghidprobs); negvisact = sum(negdata); %%%%%%%%% END OF NEGATIVE PHASE %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% err= sum(sum( (data-negdata).^2 )); errsum = err + errsum; if epoch>5, momentum=finalmomentum; else momentum=initialmomentum; end; %%%%%%%%% UPDATE WEIGHTS AND BIASES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% vishidinc = momentum*vishidinc + ... epsilonw*( (posprods-negprods)/numcases - weightcost*vishid); visbiasinc = momentum*visbiasinc + (epsilonvb/numcases)*(posvisact-negvisact); hidbiasinc = momentum*hidbiasinc + (epsilonhb/numcases)*(poshidact-neghidact); vishid = vishid + vishidinc; visbiases = visbiases + visbiasinc; hidbiases = hidbiases + hidbiasinc; %%%%%%%%%%%%%%%% END OF UPDATES %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% end fprintf(1, '\nepoch %4i error %f \n', epoch, errsum); end dofacedeepauto.m: clear all close all maxepoch=200; %In the Science paper we use maxepoch=50, but it works just fine. numhid=2000; numpen=1000; numpen2=500; numopen=30; fprintf(1,'Pretraining a deep autoencoder. \n'); fprintf(1,'The Science paper used 50 epochs. This uses %3i \n', maxepoch); load fdata %makeFaceData; [numcases numdims numbatches]=size(batchdata); fprintf(1,'Pretraining Layer 1 with RBM: %d-%d \n',numdims,numhid); restart=1; rbmvislinear; hidrecbiases=hidbiases; save mnistvh vishid hidrecbiases visbiases; maxepoch=50; fprintf(1,'\nPretraining Layer 2 with RBM: %d-%d \n',numhid,numpen); batchdata=batchposhidprobs; numhid=numpen; restart=1; rbm; hidpen=vishid; penrecbiases=hidbiases; hidgenbiases=visbiases; save mnisthp hidpen penrecbiases hidgenbiases; fprintf(1,'\nPretraining Layer 3 with RBM: %d-%d \n',numpen,numpen2); batchdata=batchposhidprobs; numhid=numpen2; restart=1; rbm; hidpen2=vishid; penrecbiases2=hidbiases; hidgenbiases2=visbiases; save mnisthp2 hidpen2 penrecbiases2 hidgenbiases2; fprintf(1,'\nPretraining Layer 4 with RBM: %d-%d \n',numpen2,numopen); batchdata=batchposhidprobs; numhid=numopen; restart=1; rbmhidlinear; hidtop=vishid; toprecbiases=hidbiases; topgenbiases=visbiases; save mnistpo hidtop toprecbiases topgenbiases; backpropface; Thanks for your time

Search Results

Search found 2823 results on 113 pages for 'crawl rate'.

Page 41/113 | < Previous Page | 37 38 39 40 41 42 43 44 45 46 47 48 | Next Page >

- by Rick Donohoe

- by abe3

- by Stefan

- by growse

- by Sei

- by Huberis

- by dinbrca

- by elliot42

- by Jeg Bagus

- by Ace

- by Camran

- by fudge

- by Anusha

- by Mee

- by Matt2

- by RanZilber

- by Whitechapel

- by Damon

- by Graham

- by user210757

- by user32224

- by user32224

- by Richard

- by Luke

- by Junier

< Previous Page | 37 38 39 40 41 42 43 44 45 46 47 48 | Next Page >