Search Results

Search found 7625 results on 305 pages for 'scraper sites'.

  • A global login (many sites)

    - by John
    We are a growing network, and we want the user to need only one account in order to access the network's different sites (similar to Stack Overflow's login: if you log in to another "site", you use your existing account credentials and your account is created there). We want our own login system (username, password) and not OpenID; we'd probably add that in the future, but the main focus right now is the global login. How can I do this? Do a cURL request and send back a cookie? Have a "database" just for the login procedure, and on first login also create a new "User" in the site-specific database? Suggestions?
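    For what it's worth, the usual shape of this setup is a central login database plus a session cookie set on the sites' common parent domain and signed with a secret every member site knows, so each site can verify the session without an extra cURL round-trip. A minimal sketch of the cookie part in Python (the secret, the helper names, and the single-parent-domain assumption are all hypothetical):

        import hashlib
        import hmac
        import time
        from typing import Optional

        SECRET_KEY = b"shared-secret-known-to-all-sites"  # hypothetical shared secret

        def make_sso_cookie(user_id: str) -> str:
            """Build a tamper-evident cookie value: payload plus HMAC signature."""
            payload = "%s:%d" % (user_id, int(time.time()))
            sig = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
            return payload + ":" + sig

        def verify_sso_cookie(value: str) -> Optional[str]:
            """Return the user id if the signature checks out, else None."""
            payload, _, sig = value.rpartition(":")
            expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
            if hmac.compare_digest(sig, expected):
                return payload.split(":", 1)[0]
            return None

        cookie = make_sso_cookie("john")
        print(verify_sso_cookie(cookie))  # -> john

    On first login at any member site, the site would verify the cookie and create its local "User" row, which matches the create-on-first-login idea in the question.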

  • PGU HTML Renderer can't render most sites

    - by None
    I am trying to make a web browser using pygame. I am using PGU for HTML rendering. It works fine when I visit simple sites, like example.com, but when I try to load anything more complex that uses an HTML form, like Google, I get this error:

        UnboundLocalError: local variable 'e' referenced before assignment

    I looked in the PGU HTML rendering file and found this code segment:

        def start_input(self,attrs):
            r = self.attrs_to_map(attrs)
            params = self.map_to_params(r)
            #why bother
            #params = {}
            type_,name,value = r.get('type','text'),r.get('name',None),r.get('value',None)
            f = self.form
            if type_ == 'text':
                e = gui.Input(**params)
                self.map_to_connects(e,r)
                self.item.add(e)
            elif type_ == 'radio':
                if name not in f.groups:
                    f.groups[name] = gui.Group(name=name)
                g = f.groups[name]
                del params['name']
                e = gui.Radio(group=g,**params)
                self.map_to_connects(e,r)
                self.item.add(e)
                if 'checked' in r:
                    g.value = value
            elif type_ == 'checkbox':
                if name not in f.groups:
                    f.groups[name] = gui.Group(name=name)
                g = f.groups[name]
                del params['name']
                e = gui.Checkbox(group=g,**params)
                self.map_to_connects(e,r)
                self.item.add(e)
                if 'checked' in r:
                    g.value = value
            elif type_ == 'button':
                e = gui.Button(**params)
                self.map_to_connects(e,r)
                self.item.add(e)
            elif type_ == 'submit':
                e = gui.Button(**params)
                self.map_to_connects(e,r)
                self.item.add(e)
            elif type_ == 'file':
                e = gui.Input(**params)
                self.map_to_connects(e,r)
                self.item.add(e)
                b = gui.Button(value='Browse...')
                self.item.add(b)
                def _browse(value):
                    d = gui.FileDialog()
                    d.connect(gui.CHANGE,gui.action_setvalue,(d,e))
                    d.open()
                b.connect(gui.CLICK,_browse,None)
            self._locals[r.get('id',None)] = e

    I got the error in the last line, because e wasn't defined. I am guessing the reason is that the if statement that checks the type of the input and creates the e variable didn't match anything. I added a line to print the type_ variable, and I got 'hidden' when I tried Google and Apple. Is there any way to render form items that have the type 'hidden' with PGU?
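    One workable direction, sketched as a modification of the method quoted above (not an official PGU fix, and whether a gui.Input can simply be left un-drawn is an assumption worth checking): define e up front, add a branch for 'hidden', and only register e when one was created.

        def start_input(self, attrs):
            r = self.attrs_to_map(attrs)
            params = self.map_to_params(r)
            type_, name, value = r.get('type', 'text'), r.get('name', None), r.get('value', None)
            f = self.form
            e = None  # defined up front so an unhandled type can't crash the last line
            if type_ == 'text':
                e = gui.Input(**params)
                self.map_to_connects(e, r)
                self.item.add(e)
            # ... the radio/checkbox/button/submit/file branches exactly as quoted ...
            elif type_ == 'hidden':
                # No visible widget is needed; keeping an Input object around
                # preserves the name/value pair so the form still submits it.
                e = gui.Input(**params)
                self.map_to_connects(e, r)
            if e is not None:
                self._locals[r.get('id', None)] = e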

  • Convert a (nested)HTML unordered list of links to PHP array of links

    - by Klark
    Hi, I have a regular, nested HTML unordered list of links, and I'd like to scrape it with PHP and convert it to an array. The original list looks something like this:

        <ul>
            <li><a href="http://someurl.com">First item</a>
                <ul>
                    <li><a href="http://someotherurl.com/">Child of First Item</a></li>
                    <li><a href="http://someotherurl.com/">Second Child of First Item</a></li>
                </ul>
            </li>
            <li><a href="http://bogusurl.com">Second item</a></li>
            <li><a href="http://bogusurl.com">Third item</a></li>
            <li><a href="http://bogusurl.com">Fourth item</a></li>
        </ul>

    Any of the items can have children. (The actual screen scraping is not a problem; I can do that.) I'd like to turn this into a PHP array, of just the links, while keeping the hierarchical nature of the list. Any ideas? I've looked at using htmlsimpledom and phpQuery, which both use jQuery-like syntax, but I can't seem to get the syntax right. Thanks.
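    Whatever library ends up doing the parsing, the traversal is the same recursion: for each li, take its a, then recurse into any nested ul. Here is a sketch of that shape using Python's standard library on the sample above, purely to illustrate the walk; in PHP the same recursion applies to DOMDocument/DOMXPath nodes (this works here because the sample list is well-formed):

        import xml.etree.ElementTree as ET

        HTML = """
        <ul>
            <li><a href="http://someurl.com">First item</a>
                <ul>
                    <li><a href="http://someotherurl.com/">Child of First Item</a></li>
                    <li><a href="http://someotherurl.com/">Second Child of First Item</a></li>
                </ul>
            </li>
            <li><a href="http://bogusurl.com">Second item</a></li>
            <li><a href="http://bogusurl.com">Third item</a></li>
            <li><a href="http://bogusurl.com">Fourth item</a></li>
        </ul>
        """

        def walk(ul):
            """Map a <ul> element to a list of dicts, preserving nesting."""
            items = []
            for li in ul.findall('li'):      # direct <li> children only
                a = li.find('a')
                node = {'href': a.get('href'), 'text': a.text, 'children': []}
                nested = li.find('ul')       # a nested list, if any
                if nested is not None:
                    node['children'] = walk(nested)
                items.append(node)
            return items

        print(walk(ET.fromstring(HTML)))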

  • Are relative-path symlinks reliable on Rackspace Cloud Sites?

    - by Jakobud
    Rackspace's Cloud Sites have a lot of stupid limitations. For example, no SSH (in or out), no shell, no rsync, etc. (even through cron). Recently I learned that you can't reliably use symlinks in Cloud Sites either. Apparently this is because the absolute path of your sites could change at any moment, since it's a shared host environment split up between many disks/servers. I guess different accounts' sites get moved from disk to disk whenever Rackspace decides to, supposedly to increase efficiency across the board.

    After talking with a Rackspace tech, he said they cannot guarantee that symlinks would always work. Obviously this is because if you have a symlink that uses an absolute path like this:

        /mnt/disk-34566/home/user34566/files/sites/www.mysite.com/mydir

    and your files get moved to a different disk (or whatever they do), then the absolute path would be different and the link would now be broken. That makes sense. So next, I asked the Rackspace tech if relative-path symlinks were reliable. Say I have the following link:

        files/sites/www.mysite.com/mylink --> ../www.myothersite.com/anotherdir

    The symlink simply points to a nearby directory's sub-directory. He said they cannot guarantee that even those would always work. Since it uses a relative path to another nearby directory, I'm not sure how it could ever break from something Rackspace would do. Do relative symlinks somehow rely on absolute paths underneath? Or is Rackspace using some weird custom filesystem where they would break from absolute path changes? It seems like a relative-path symlink would be fine and would only break if the user did something to mess up the directories involved. But when the techs say they "don't officially support symlinks of any kind", that makes me hesitant to use them for large commercial websites in Cloud Sites. Can anyone with Rackspace experience give input on this topic?
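    On the "do relative symlinks rely on absolute paths underneath?" part: on an ordinary POSIX filesystem they don't. The link stores the literal relative text, which is resolved against the link's own directory at access time, so moving the whole tree as a unit leaves it intact. A quick sketch demonstrating that (which of course says nothing about what Rackspace's storage layer actually does):

        import os
        import tempfile

        root = tempfile.mkdtemp()
        os.makedirs(os.path.join(root, "sites", "www.myothersite.com", "anotherdir"))
        os.makedirs(os.path.join(root, "sites", "www.mysite.com"))

        link = os.path.join(root, "sites", "www.mysite.com", "mylink")
        os.symlink("../www.myothersite.com/anotherdir", link)

        print(os.readlink(link))       # the stored text: ../www.myothersite.com/anotherdir
        print(os.path.realpath(link))  # resolved against the link's own directory

        # Moving the parent tree as a unit keeps the link working, because
        # nothing absolute is stored in it:
        os.rename(root, root + "-moved")
        moved_link = os.path.join(root + "-moved", "sites", "www.mysite.com", "mylink")
        print(os.path.exists(moved_link))  # True -- the target still resolves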

  • Git for Websites / post-receive / Separation of Test and Production Sites

    - by Walt W
    Hi all, I'm using Git to manage my website's source code and deployment, and currently have the test and live sites running on the same box. Following this resource http://toroid.org/ams/git-website-howto originally, I came up with the following post-receive hook script to differentiate between pushes to my live site and pushes to my test site:

        while read ref
        do
            #echo "Ref updated:"
            #echo $ref -- would print something like example at top of file
            result=`echo $ref | gawk -F' ' '{ print $3 }'`
            if [ $result != "" ]; then
                echo "Branch found: "
                echo $result
                case $result in
                    refs/heads/master )
                        git --work-tree=c:/temp/BLAH checkout -f master
                        echo "Updated master"
                        ;;
                    refs/heads/testbranch )
                        git --work-tree=c:/temp/BLAH2 checkout -f testbranch
                        echo "Updated testbranch"
                        ;;
                    * )
                        echo "No update known for $result"
                        ;;
                esac
            fi
        done
        echo "Post-receive updates complete"

    However, I have doubts that this is actually safe :) I'm by no means a Git expert, but I am guessing that Git probably keeps track of the currently checked-out branch head, and this approach probably has the potential to confuse it to no end. So a few questions:

    1. Is this safe?
    2. Would a better approach be to have my base repository be the test site repository (with corresponding working directory), and then have that repository push changes to a new live site repository, which has a corresponding working directory to the live site base? This would also allow me to move production to a different server and keep the deployment chain intact.
    3. Is there something I'm missing? Is there a different, clean way to differentiate between test and production deployments when using Git for managing websites?

    As an additional note in light of Vi's answer, is there a good way to do this that would handle deletions without mucking with the file system much?

    Thank you, -Walt

    PS - The script I came up with for the multiple repos (and am using unless I hear better) is as follows:

        sitename=`basename \`pwd\``
        while read ref
        do
            #echo "Ref updated:"
            #echo $ref -- would print something like example at top of file
            result=`echo $ref | gawk -F' ' '{ print $3 }'`
            if [ $result != "" ]; then
                echo "Branch found: "
                echo $result
                case $result in
                    refs/heads/master )
                        git checkout -q -f master
                        if [ $? -eq 0 ]; then
                            echo "Test Site checked out properly"
                        else
                            echo "Failed to checkout test site!"
                        fi
                        ;;
                    refs/heads/live-site )
                        git push -q ../Live/$sitename live-site:master
                        if [ $? -eq 0 ]; then
                            echo "Live Site received updates properly"
                        else
                            echo "Failed to push updates to Live Site"
                        fi
                        ;;
                    * )
                        echo "No update known for $result"
                        ;;
                esac
            fi
        done
        echo "Post-receive updates complete"

    And then the repo in ../Live/$sitename (these are "bare" repos with working trees added after init) has the basic post-receive:

        git checkout -f
        if [ $? -eq 0 ]; then
            echo "Live site `basename \`pwd\`` checked out successfully"
        else
            echo "Live site failed to checkout"
        fi

  • Mac OS - Built SVN from source, now Apache2 not loading sites

    - by Geuis
    This relates to another question I asked earlier today. I built SVN 1.6.2 from source. In the process, it has completely screwed up my dev environment. After I built SVN, Apache wasn't loading. It was giving me this error:

        Syntax error on line 117 of /private/etc/apache2/httpd.conf: Cannot load /usr/libexec/apache2/mod_dav_svn.so into server: dlopen(/usr/libexec/apache2/mod_dav_svn.so, 10): no suitable image found. Did find:
            /usr/libexec/apache2/mod_dav_svn.so: mach-o, but wrong architecture

    It appears that SVN overwrote the old mod_dav_svn.so; I am not able to get the new one to build as fat, and I can't recover whatever was originally there. I resolved this (temporarily?) by commenting out the line that was loading mod_dav_svn.so, and got Apache to start at that point. However, even though Apache is running, I am now getting this error when trying to access my dev sites:

        Directory index forbidden by Options directive: /usr/share/tomcat6/webapps/ROOT/

    I have Apache2 sitting in front of Tomcat6. I access my local dev site using the internal name http://localthesite. I have had virtual directories set up that have worked until this SVN debacle. Tomcat is installed at /usr/local/apache-tomcat, and webapps is /usr/local/apache-tomcat/webapps. Our production servers deploy Tomcat to /usr/share/tomcat6, so I have symlinks set up on my system to replicate this as well; these point back to the actual installation path. This has all been working fine too. None of our configurations for Apache2, Tomcat, or .htaccess have changed.

    Over the weekend, I performed a "Repair Disk Permissions" on the system. This was before I discovered the mod_dav_svn.so problem. I have been reading up on this all morning, and the most common answer is that there is an Options -Indexes set. We have this in a config file, but it was there before, and when I removed it during testing I still got the same errors from Apache.

    At this point, I'm assuming I either totally borked the native Apache2 installation on this Mac, or there is a permissions error somewhere that I'm missing. The permissions error could be from the SVN installation or from my repair process. Does anyone have any idea what could be the problem? I'm totally blocked right now and have no idea where to check next.
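    One checkable detail in the first half of this: the "wrong architecture" message can be confirmed by comparing the module's architecture slices against the Apache binary's. A small sketch shelling out to the standard file tool (paths taken from the error above; /usr/sbin/httpd is the stock Apache location on OS X):

        import subprocess

        for path in ("/usr/libexec/apache2/mod_dav_svn.so", "/usr/sbin/httpd"):
            result = subprocess.run(["file", path], capture_output=True, text=True)
            print(result.stdout.strip())

        # If httpd reports x86_64 but the module only reports i386, dlopen()
        # fails exactly as in the error above; rebuilding SVN with matching
        # (or universal) -arch flags is the usual cure.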

  • Google Site Data fetching

    - by inTagger
    Hail! I want to fetch an image from a NOT PUBLIC Google Site's page. I'm using WebClient for this purpose:

        var uri = new Uri("http://sites.google.com/a/MYDOMAIN.COM/SITENAME/" +
                          "_/rsrc/1234567890/MYIMAGE.jpg");
        string fileName = "d:\\!temp\\MYIMAGE.jpg";
        if (File.Exists(fileName))
            File.Delete(fileName);
        using (var webClient = new WebClient())
        {
            var networkCredential = new NetworkCredential("USERNAME", "PASSWORD");
            var credentialCache = new CredentialCache
            {
                {new Uri("sites.google.com"), "Basic", networkCredential},
                {new Uri("www.google.com"), "Basic", networkCredential}
            };
            webClient.Credentials = credentialCache;
            webClient.DownloadFile(uri, fileName);
        }

    It doesn't download the image; an HTML file with a login form is downloaded instead. If I open the link in a browser, it shows me the login form, I enter my username and password, and then I can see the image. How must I use my credentials to download the file with WebClient or HttpWebRequest?
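    A note on why this fails: sites.google.com does not issue an HTTP Basic challenge, so the CredentialCache is never consulted and Google answers with its HTML login form, which is exactly the file that gets saved. Form-based logins generally need the form submitted once, with the resulting session cookies carried on the image request. A rough sketch of that pattern, shown in Python with the requests library; the login URL and field names below are placeholders, not Google's real endpoint, and should be read off the actual form in the downloaded HTML:

        import requests

        session = requests.Session()  # keeps cookies across requests

        # 1. Submit the login form once. URL and field names are hypothetical.
        session.post(
            "https://www.google.com/a/MYDOMAIN.COM/LoginAction",
            data={"Email": "USERNAME", "Passwd": "PASSWORD"},
        )

        # 2. The session now carries the auth cookies, so the image fetch
        #    should return the real bytes instead of the login page.
        resp = session.get(
            "http://sites.google.com/a/MYDOMAIN.COM/SITENAME/_/rsrc/1234567890/MYIMAGE.jpg"
        )
        with open(r"d:\!temp\MYIMAGE.jpg", "wb") as f:
            f.write(resp.content)

    In C# the equivalent is an HttpWebRequest pair sharing a CookieContainer: POST the form first, then GET the image.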

  • How do I sync Dreamweaver site definitions across 2 computers?

    - by baritoneuk
    I use Dreamweaver on my laptop and desktop PC and frequently change between them. I keep all my sites synced using Syncplicity (similar to Dropbox, which incidentally I also use), and I want my site definitions synced across the two computers as well: if I add a new site on one computer, I want it to appear on the other one. I know I can export all sites on one computer, let Syncplicity sync the files to the other computer, and then import them there. However, this relies on me remembering to do it each time I add a new site, and it's also quite time consuming. As far as I can tell, Dreamweaver (at least up to CS4; not sure about CS5) stupidly stores the site definitions in the registry. I really don't know why they do this; if they stored them in XML files, I could easily sync the information. Does anyone know if what I am asking is possible?

  • Anyone have a good solution for scraping the HTML source of a page with content (in this case, HTML tables) generated with Javascript?

    - by phpwns
    Anyone have a good solution for scraping the HTML source of a page with content (in this case, HTML tables) generated with Javascript? An embarrassingly simple, though workable, solution using Crowbar:

        <?php
        function get_html($url) // $url must be urlencode(d)
        {
            $context = stream_context_create(array(
                'http' => array('timeout' => 120) // HTTP timeout in seconds
            ));
            $html = substr(file_get_contents('http://127.0.0.1:10000/?url='
                . $url . '&delay=3000&view=browser', 0, $context), 730, -32);
            // substr removes HTML from the Crowbar web service,
            // returning only the $url HTML
            return $html;
        }
        ?>

    The advantage of using Crowbar is that the tables will be rendered (and accessible) thanks to the headless Mozilla-based browser. The problem, of course, is being dependent on an external web service, especially given that SIMILE seems to undergo regular server maintenance. :( A pure PHP solution would be nice, but any functional (and reliable) alternatives would be great.

  • Clearing C#'s WebBrowser control's cookies for all sites WITHOUT clearing for IE itself

    - by Helgi Hrafn Gunnarsson
    Hail StackOverflow! The short version of what I'm trying to do is in the title. Here's the long version.

    I have a bit of a complex problem which I'm sure will attract a lot of guesses as a response. In order to keep the well-intended but unfortunately useless guesses to a minimum, let me first mention that the solution to this problem is not simple, so simple suggestions will unfortunately not help at all, even though I appreciate the effort. C#'s WebBrowser component is fundamentally IE itself, so solutions with any sort of caveat will almost certainly not work. I need to do exactly what I'm trying to do, and even a seemingly minor caveat will defeat the purpose completely. At the risk of sounding arrogant, I need assistance from someone who really has in-depth knowledge about C#'s WebBrowser and/or WinInet and/or how to communicate with Windows's underlying system from C#... or how to encapsulate C++ code in C#. That said, I don't expect anyone to do this for me, and I've found some promising hints which are explained later in this question.

    But first... what I'm trying to achieve is this. I have a Windows.Forms component which contains a WebBrowser control. This control needs to:

    1. Clear ALL cookies for ALL websites.
    2. Visit several websites, one after another, and record cookies and handle them correctly. (This part works fine already, so I don't have any problems with it.)
    3. Rinse and repeat... theoretically forever.

    Now, here's the real problem. I need to clear all those cookies (for any and all sites), but only for the WebBrowser control itself and NOT the cookies which IE proper uses. What's fundamentally wrong with this approach is of course the fact that C#'s WebBrowser control is IE. But I'm a stubborn young man and I insist on it being possible, or else! ;)

    Here's where I'm stuck at the moment. It is quite simply impossible to clear all cookies for the WebBrowser control programmatically through C# alone. One must use DllImport and all the crazy stuff that comes with it. This chunk works fine for that purpose:

        [DllImport("wininet.dll", SetLastError = true)]
        private static extern bool InternetSetOption(IntPtr hInternet, int dwOption,
            IntPtr lpBuffer, int lpdwBufferLength);

    And then, in the function that actually does the clearing of the cookies:

        InternetSetOption(IntPtr.Zero, INTERNET_OPTION_END_BROWSER_SESSION, IntPtr.Zero, 0);

    Then all the cookies get cleared and, as such, I'm happy. The program works exactly as intended, aside from the fact that it also clears the cookies for IE proper, which must not be allowed to happen.

    In a comment on a different problem, fellow StackOverflower (if that's a word) Sheng Jiang proposed this, but didn't elaborate further: "If you want to isolate your application's cookies you need to override the Cache directory registry setting via IDocHostUIHandler2::GetOverrideKeyPath". I've looked around the internet for IDocHostUIHandler2 and GetOverrideKeyPath, but I've got no idea how to use them from C# to isolate cookies to my WebBrowser control. My experience with the Windows registry is limited to RegEdit (so I understand that it's a tree structure with different data types, but that's about it... I have no in-depth knowledge of the registry's relationship with IE, for example).

    Here's what I dug up on MSDN:

    IDocHostUIHandler2 docs: http://msdn.microsoft.com/en-us/library/aa753275%28VS.85%29.aspx
    GetOverrideKeyPath docs: http://msdn.microsoft.com/en-us/library/aa753274%28VS.85%29.aspx

    I think I know roughly what these things do, I just don't know how to use them. So, I guess that's it! Any help is greatly appreciated.

  • How come many project-hosting sites don't have a forum feature?

    - by george
    I'm considering starting an open-source project, so I shopped around some popular project hosting sites. What I find surprising is that many (see here for a nice feature table) of the popular project hosting sites (e.g. GitHub, BitBucket) don't have a forum feature, i.e. a place where users can talk to the devs, ask questions, raise ideas, etc. IMHO an active forum is an important factor in creating a user community around a project, so I would expect that most project owners would be interested in such a feature. I've also noticed that some projects do have support forums (or mailing lists) hosted elsewhere - e.g. Ruby on Rails is hosted on GitHub but has a Google Groups support group, and TortoiseHG is hosted on BitBucket but has a mailing list on SourceForge - so it's not like this feature is unneeded. So how come many project hosting sites don't have a forum feature?

  • How to handle possible duplicate content across multiple sites?

    - by ElHaix
    Let's say I have two sites that cover the same vertical/topic: one in the USA and one in Canada. Both sites have local content, which is obviously unique by location, but they will share common news or blog pages. How do I avoid getting hit with duplicate-content penalties on both sites for those news/blog pages? If the content is exactly the same, I'm guessing I would have to pick which site's content I want to noindex,nofollow. Is that correct? And if so, is that all I have to add, on the URL links to those pages and in the pages' meta tags?

  • Is there a decent way to maintain development of WordPress sites using the same base?

    - by Joakim Johansson
    We've been churning out WordPress sites for a while, and we'd like to keep a base repository that can be used when starting a new project, as well as for updating existing sites with changes to the WordPress base. Am I wrong in assuming this would be a good thing? We take care of updating the sites, so having a common base would make this easier. I've been looking at solutions using Git, such as forking a base repository and using it to pull changes to the WordPress base while committing the site to its own repository. Or maybe, if it's possible, storing the base as a Git submodule, though this would require storing themes and plugins outside of it. Is there any common way to go about this kind of website development?

  • List of eCommerce sites that use end-to-end SSL?

    - by Jon Schneider
    My development team is considering implementing an eCommerce site using end-to-end SSL -- that is, every page on the site is accessed via an https:// URL -- rather than the more traditional "mixed mode" where most pages are accessed via http:// and only "secure" pages such as login and credit card entry are redirected to https://. Pros of doing such a "pure SSL" approach include avoidance of some session-hijacking attacks such as Firesheep; cons include performance considerations. My question is: Is anyone aware of a list of eCommerce websites (especially USA-based sites), or even specific websites, that use this end-to-end SSL approach? I'm especially interested in "regular" eCommerce sites rather than banks or other "financial" sites.

  • Google penalizes the ranking of sites stuffed with ads that complicate access to content; they are considered spam

    Matt Cutts, head of Google's anti-spam team, came to the PubCon conference with good news and bad news. The search engine now penalizes the ranking of sites so stuffed with ads that reaching the pages' relevant content becomes difficult. What counts, Cutts explained in his keynote, is "how much content is above the fold [...] If you have ads obscuring your content, you'd better rethink it", implying that sites that complicate...

  • SEO: Google rolls out a new algorithm that penalizes sites using domain names made of keywords

    Via a message on his Twitter account, Matt Cutts of Google's anti-webspam team announced the rollout of a new algorithm on the Google engine. The new algorithm aims to refine Google's search results by penalizing sites with low-quality content even when the site's domain name matches the search terms. The company has found that websites resort to so-called "exact-match domains", names that reuse search terms likely...

  • Attackers are increasingly using legitimate sites for their exploits, a Kaspersky Lab report reveals

    Kaspersky Lab has just published its latest observations on the evolution of IT security threats. They highlight a rise in online attacks in 2010, with more than 580 million incidents detected. A new trend has emerged: the risks no longer hang only over sites offering illegal content, but also over legitimate pages (such as shopping or online-gaming sites), which cyber-criminals increasingly turn to their advantage. In general, they attack vulnerable servers and inject malicious code...

  • Google wants to stop the conversion of YouTube videos to MP3; the company threatens to sue the sites offering this service

    Google is going after services that freely convert YouTube streams to MP3. The search giant has sent a cease-and-desist letter to the conversion sites YouTube-mp3.org and Music-Clips.net, threatening them with legal action if they do not stop their activity. Dozens of websites offer online services for converting YouTube videos to MP3 from nothing more than the URL of the YouTube video. These services, which enjoy a certain popularity, are used...

  • IIS6 host multiple websites under same sub-domain (or something similar)

    - by user28502
    I'm trying to figure out a structure for a hosted application that I'm working on. I've got a domain, let's call it app.company.com (a sub-domain of company.com, of course), that is set up to redirect to my IIS 6 web server. I would like to set up one website in IIS for each client that will use this application, and have the URL schema be like this:

        app.company.com/clientA -- would point to the ClientA website in IIS
        app.company.com/clientB -- would point to the ClientB website in IIS

    Do you guys have any pointers or best practices for my scenario?

  • Multiple domains, one static IP address, and latency

    - by shirish
    How is latency affected when multiple domains use one single static IP address? The scenario is shared web hosting, and by latency I mean the DNS lookup the client has to do. As far as I understand it, the browser hits the root servers to figure out the IP address and where it belongs; then, when the request reaches the correct server, the server presumably looks up some sort of table to determine which site name matches, and serves that site to the user via the browser. Is my understanding correct, backwards, or what?
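    For what it's worth, the lookup cost is per name, not per address: the resolver maps each domain to the (shared) IP and caches the answer, and the web server then chooses the site from the Host header of the request. A small sketch of both halves (example.com is a real resolvable name; the shared-hosting scenario is the assumption):

        import socket

        host = "example.com"
        ip = socket.gethostbyname(host)  # the one DNS lookup the client pays for
        print(host, "->", ip)

        # Speak HTTP to the raw IP; the Host header is what selects the site.
        # On a shared host, a different Host sent to the same IP yields a
        # different website.
        with socket.create_connection((ip, 80), timeout=10) as s:
            s.sendall(b"GET / HTTP/1.1\r\nHost: example.com\r\nConnection: close\r\n\r\n")
            print(s.recv(200).decode(errors="replace"))

    So sharing one IP among many domains neither adds to nor saves on the client's DNS work; each domain name is resolved (and cached) on its own.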

  • Setting up a copy of a site with IIS 7?

    - by SJaguar13
    I have a site running on IIS with a dyndns.org domain that points to the IP of the Windows 2008 machine hosting it. I need a copy of that site for development purposes, so I set up another folder with all the files and created a new site in IIS. I don't really have a domain for it, so I was just going to use the IP address. When I go to localhost, 127.0.0.1, or the internal IP, I get "bad hostname". If I use the IP address on port 80 (the same as the real version of the site), I get 404 Not Found. If I use a different port, so I don't have them both on the same IP with the same port, I get "connection timed out". How do I go about setting this up?

  • Two different servers and websites on one IP

    - by bob
    I'd like to set up two different websites on one IP address using two different servers. Right now we are using Apache on our one Mac server, and it forwards correctly both for the physical IP address and for the domain. We'd like to add another domain on a new computer, also running Leopard and Apache.

  • IIS6 host multiple websites under same sub-domain (or something similar)

    - by Sigurbjörn
    Hi there! I'm trying to figure out a structure for a hosted application that I'm working on. I've got a domain, let's call it app.company.com (a sub-domain of company.com, of course), that is set up to redirect to my IIS 6 web server. I would like to set up one website in IIS for each client that will use this application, and have the URL schema be like this:

        app.company.com/clientA -- would point to the ClientA website in IIS
        app.company.com/clientB -- would point to the ClientB website in IIS

    Do you guys have any pointers or best practices for my scenario?

  • Are there any B-tree programs or sites that show visually how a B-tree works?

    - by Phenom
    I found this website that lets you insert and delete items from a B-tree and shows you visually what the B-tree looks like: java b-tree. I'm looking for another website or program similar to this. This site does not allow you to specify a B-tree of order 4 (4 pointers and 3 elements); it only lets you specify B-trees with an even number of elements. Also, if possible, I'd like to be able to insert letters instead of numbers. I think I actually found a different site a while ago, but I can't find it anymore.
