Search Results

Search found 2812 results on 113 pages for 'michael lu'.

  • Improving HTML scraper efficiency with pcntl_fork()

    - by Michael Pasqualone
    With help from two previous questions, I now have a working HTML scraper that feeds product information into a database. What I am now trying to do is improve efficiency by getting the scraper working with pcntl_fork. If I split my php5-cli script into 10 separate chunks, total runtime improves by a large factor, so I know I am not I/O or CPU bound but just limited by the linear nature of my scraping functions. Using code I've cobbled together from multiple sources, I have this working test:

        <?php
        libxml_use_internal_errors(true);
        ini_set('max_execution_time', 0);
        ini_set('max_input_time', 0);
        set_time_limit(0);

        $hrefArray = array("http://slashdot.org", "http://slashdot.org", "http://slashdot.org", "http://slashdot.org");

        function doDomStuff($singleHref, $childPid) {
            $html = new DOMDocument();
            $html->loadHtmlFile($singleHref);
            $xPath = new DOMXPath($html);
            $domQuery = '//div[@id="slogan"]/h2';
            $domReturn = $xPath->query($domQuery);
            foreach ($domReturn as $return) {
                $slogan = $return->nodeValue;
                echo "Child PID #" . $childPid . " says: " . $slogan . "\n";
            }
        }

        $pids = array();
        foreach ($hrefArray as $singleHref) {
            $pid = pcntl_fork();
            if ($pid == -1) {
                die("Couldn't fork, error!");
            } elseif ($pid > 0) {
                // We are the parent
                $pids[] = $pid;
            } else {
                // We are the child
                $childPid = posix_getpid();
                doDomStuff($singleHref, $childPid);
                exit(0);
            }
        }

        foreach ($pids as $pid) {
            pcntl_waitpid($pid, $status);
        }

        // Clear the libxml buffer so it doesn't fill up
        libxml_clear_errors();

    Which raises the following questions:

    1) My hrefArray contains 4 URLs. If the array were to contain, say, 1,000 product URLs, would this code spawn 1,000 child processes? If so, what is the best way to limit the number of processes to, say, 10 and, again using 1,000 URLs as an example, split the workload to 100 products per child (10 x 100)?

    2) I've learned that pcntl_fork creates a copy of the process and all variables, classes, etc. What I would like to do is replace my hrefArray variable with a DOMDocument query that builds the list of products to scrape, and then feeds them off to child processes to do the processing, so spreading the load across 10 child workers. My brain is telling me I need to do something like the following (obviously this doesn't work, so don't run it):

        <?php
        libxml_use_internal_errors(true);
        ini_set('max_execution_time', 0);
        ini_set('max_input_time', 0);
        set_time_limit(0);

        $maxChildWorkers = 10;

        $html = new DOMDocument();
        $html->loadHtmlFile('http://xxxx');
        $xPath = new DOMXPath($html);
        $domQuery = '//div[@id=productDetail]/a';
        $domReturn = $xPath->query($domQuery);
        $hrefsArray[] = $domReturn->getAttribute('href');

        function doDomStuff($singleHref) {
            // Do stuff here with each product
        }

        // To figure out: Split href array into $maxChildWorkers # of workArray1, workArray2 ... workArray10.

        $pids = array();
        foreach ($workArray(1,2,3 ... 10) as $singleHref) {
            $pid = pcntl_fork();
            if ($pid == -1) {
                die("Couldn't fork, error!");
            } elseif ($pid > 0) {
                // We are the parent
                $pids[] = $pid;
            } else {
                // We are the child
                $childPid = posix_getpid();
                doDomStuff($singleHref);
                exit(0);
            }
        }

        foreach ($pids as $pid) {
            pcntl_waitpid($pid, $status);
        }

        // Clear the libxml buffer so it doesn't fill up
        libxml_clear_errors();

    But what I can't figure out is how to build my hrefsArray[] in the master/parent process only and feed it off to the child processes. Currently everything I've tried causes loops in the child processes, i.e. my hrefsArray gets built in the master and again in each subsequent child process. I am sure I am going about this all totally wrong, so I would greatly appreciate just a general nudge in the right direction.
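The chunk-per-worker idea asked about above can be sketched as follows. This is a Python illustration rather than PHP, using `os.fork`, which behaves like `pcntl_fork` on Unix-like systems. The names `split_into_chunks`, `run_workers` and `handle_url` are hypothetical, not part of the original script. The sketch assumes the URL list is built once in the parent before any fork, and that each child exits as soon as its chunk is done so it never continues the parent's loop.

```python
import os


def split_into_chunks(items, n_chunks):
    """Split items into at most n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty slices when there are fewer items than chunks
            chunks.append(items[start:end])
        start = end
    return chunks


def run_workers(urls, max_workers, handle_url):
    """Fork one child per chunk; return the children's exit codes in fork order."""
    pids = []
    for chunk in split_into_chunks(urls, max_workers):
        pid = os.fork()
        if pid == 0:
            # Child: work through its own chunk only, then exit immediately
            # so it never falls back into the parent's loop and forks again.
            code = 0
            try:
                for url in chunk:
                    handle_url(url)
            except Exception:
                code = 1
            os._exit(code)
        # Parent: remember the child and move on to the next chunk.
        pids.append(pid)
    # Parent: wait for every child and decode its exit code.
    return [os.WEXITSTATUS(os.waitpid(pid, 0)[1]) for pid in pids]
```

With 1,000 URLs and `max_workers=10`, this forks 10 children that each handle 100 URLs. A static split is the simplest choice; if some pages are much slower than others, a shared work queue would balance the load better, at the cost of inter-process communication.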

  • UITableView Reusable cells

    - by Michael
    Can someone explain to me how reusable cells work for a single table view? How many reusable cells should a data source create? So far, in all the samples I've seen, there is only one. Would one even need more?

  • "_FILE_AND_LINE_ is not defined in this scope" (compiling RakNet NAT examples in OS X)

    - by Michael F
    Hello! I'm working on a RakNet-based project (using 3.8 on OS X 10.6), and I'm trying to work through the various examples that demonstrate the parts of RakNet I want to use. For the "NatCompleteClient" example, I've imported the source into a command-line project in Xcode, along with the UPNP dependency. At compile time I've had a few errors in the UPNP section, though, and I can't find any guidance on this. In UPNPPortForwarder.mm, there are 7 lines that use _FILE_AND_LINE_, and the compiler is not happy; for example, on line 232:

        foundInterfaces.Deallocate(r1,_FILE_AND_LINE_);

    causes:

        UPNPPortForwarder.mm:232: error: '_FILE_AND_LINE_' was not declared in this scope

    Can anyone tell me what this is all about? That variable doesn't seem to get talked about very often... or Google doesn't like to find it.

  • best way to export data from pdfs

    - by michael
    Hi, I work at a newspaper and we are looking for a way to make archive material available. At the moment our pages come in PDF format, so we need a way to export text and images from the PDFs so that they can be added to a database. We've had a look at the News studio plugin for Adobe Acrobat from Iceni Technology, but I'm just wondering if anyone knows of other options for exporting PDF data. Thanks.

  • Backup Google Calendar programmatically: https://www.google.com/calendar/exporticalzip

    - by Michael
    I'm struggling to write a Python script that automatically grabs the zip file containing all my Google calendars and stores it (as a backup) on my hard disk. I'm using ClientLogin to get an authentication token (and can successfully obtain the token). Unfortunately, I'm unable to retrieve the export file (https://www.google.com/calendar/exporticalzip). The server always asks me for the login credentials again by returning a login page as HTML (instead of the zip). Here's the critical code:

        post_data = urllib.urlencode({'auth': token, 'continue': zip_url})
        request = urllib2.Request('https://www.google.com/calendar', post_data, header)
        try:
            f = urllib2.urlopen(request)
            result = f.read()
        except:
            print "Error"

    Does anyone have any ideas, or has anyone done this before? Or is there an alternative way to back up all my calendars (automatically!)?

  • Using same Debug settings for Start External Program across 32 bit and 64 bit debug environments

    - by Michael Prewecki
    We use a mixture of 32-bit and 64-bit development environments. Some of our class libraries are debugged using a 32-bit application, so we have debug settings for "Start External Program" and "Working Directory". The problem is that the settings need to be different, since the 32-bit application is installed to C:\Program Files\xxx (on the 32-bit dev environment) or C:\Program Files (x86)\xxx (on the 64-bit dev environment). Is there a way to use some sort of tag like %PROGRAMFILES% or $(ProgramFiles) so that Visual Studio 2008 will know where to look for the external program? This wouldn't be a major issue except that the solution file (where the debug information is saved) is checked into source control, so getting the latest version of the solution from our source repository keeps yo-yoing the debug settings between the two Program Files locations.

  • Detecting if MSBuild/.net 4 is installed from C# code running on 3.5?

    - by Michael Stum
    I have an application that is running on .net 3.5 SP1 and that is supposed to check if .net 4 is installed. Actually, I'm more interested if MSBuild v4 is installed, which would boil down to a simple File.Exists(@"C:\Windows\Microsoft.NET\Framework\v4.0.30319\msbuild.exe"); However, apart from the fragility of the 4.0.30319 Version (and the Windir, but that's easy to solve), I wonder if there is a more appropriate way, like an API?

  • How to copy a variable in JavaScript?

    - by Michael Stum
    I have this JavaScript code:

        for (var idx in data) {
            var row = $("<tr></tr>");
            row.click(function() { alert(idx); });
            table.append(row);
        }

    So I'm looping through an array, dynamically creating rows (the part where I create the cells is omitted as it's not important). What is important is that I create a new function which encloses the idx variable. However, idx is only a reference, so at the end of the loop, all rows have the same function and all alert the same value. One way I solve this at the moment is by doing this:

        function GetRowClickFunction(idx) {
            return function() { alert(idx); }
        }

    and in the calling code I call:

        row.click(GetRowClickFunction(idx));

    This works, but is somewhat ugly. I wonder if there is a better way to just copy the current value of idx inside the loop? While the problem itself is not jQuery specific (it's related to JavaScript closures/scope), I use jQuery and hence a jQuery-only solution is okay if it works.
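The closure behaviour described above is not unique to JavaScript. As a rough analogue, the following Python sketch reproduces the same effect and shows one way to bind the current value at each iteration. It is an illustration in a different language, not a jQuery solution, and the function names are hypothetical.

```python
from functools import partial


def make_handlers_late(keys):
    """Every handler closes over the same variable, so all see its final value."""
    handlers = []
    for idx in keys:
        handlers.append(lambda: idx)  # idx is looked up when the handler runs
    return handlers


def make_handlers_bound(keys):
    """partial() stores the value idx has at this iteration inside each handler."""
    return [partial(lambda value: value, idx) for idx in keys]
```

Calling the handlers from `make_handlers_late([0, 1, 2])` yields the last key three times, while those from `make_handlers_bound([0, 1, 2])` yield 0, 1 and 2. This is the same idea as the `GetRowClickFunction` factory in the post: the value is passed as an argument, so each handler gets its own copy.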

  • XP Leveling System - PHP

    - by Michael Rich
    Rank Table columns:

        ID, Primary Key
        RANK, The rank or level, 1 being the highest and 3 the lowest
        MIN_SCORE, The minimum number of points or XP needed to reach the rank
        NAME, The associated name of the rank

    Rank Table

        +----+------+-----------+-------------------------+
        | ID | RANK | MIN_SCORE | NAME                    |
        +----+------+-----------+-------------------------+
        |  1 |    1 |     18932 | Editor-in-Chief         |
        |  2 |    2 |     15146 | Senior Technical Writer |
        |  3 |    3 |     12116 | Senior Copywriter       |
        +----+------+-----------+-------------------------+

    Ranking Table columns:

        ID, Primary Key
        FK_MEMBER_ID, Foreign Key to member's Primary Key
        FK_RANK, Foreign Key to Author Rank Table's Rank column (top)
        SCORE, The member's current earned score or XP

    Ranking Table

        +-----+--------------+---------+-------+
        | ID  | FK_MEMBER_ID | FK_RANK | SCORE |
        +-----+--------------+---------+-------+
        |   1 |            1 |       1 | 17722 |
        |   2 |            2 |       2 | 16257 |
        |   3 |            3 |       3 | 12234 |
        +-----+--------------+---------+-------+

    In my class I have stored the ranks -- matching those in the Rank Table -- and correlating minimum scores; RANK as key and MINIMUM_SCORE as value. When a member's score (XP) is updated (up/down) I want to test that updated score against the below array to determine if their rank needs updating too.

        private $scores = array('3' => '12116', '2' => '15146', '1' => '18932',);

    Using the updated score, how could I determine the correlating rank from the above array? Everything is open to scrutiny; this is my first time creating a ranking system, so I hope to get it right :)
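One possible way to map a score to a rank is to check the thresholds from highest to lowest and take the first one the score meets. The sketch below is in Python rather than PHP. The thresholds are copied from the Rank Table in the post, while `rank_for_score` and the choice to return `None` for scores below every threshold are assumptions made for illustration.

```python
# Thresholds copied from the Rank Table in the post: rank -> minimum score.
MIN_SCORES = {3: 12116, 2: 15146, 1: 18932}


def rank_for_score(score, min_scores=MIN_SCORES):
    """Return the best rank whose minimum score is met, or None if unranked."""
    # Walk the thresholds from highest to lowest so the first match is the best rank.
    ordered = sorted(min_scores.items(), key=lambda pair: pair[1], reverse=True)
    for rank, minimum in ordered:
        if score >= minimum:
            return rank
    return None
```

Comparing the returned rank with the stored FK_RANK value would then tell whether the member's row needs updating. Note that the thresholds are kept as integers here; the PHP array in the post stores them as strings, which would need casting before a numeric comparison.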

  • Image.createImage problem in J2ME

    - by Michael
    Hi all, I tried this on J2ME:

        try {
            Image immutableThumb = Image.createImage(temp, 0, temp.length);
        } catch (Exception ex) {
            System.out.println(ex);
        }

    I hit this error:

        java.lang.IllegalArgumentException:

    How do I solve this?

  • Inject a EJB into a JSF converter with JEE6

    - by Michael Bavin
    Hi, I have a stateless EJB that accesses my database. I need this bean in a JSF 2 converter to retrieve an entity object from the String value parameter. I'm using JEE6 with Glassfish V3. The @EJB annotation does not work and I get an NPE, because the converter is in the faces context and has no access to the EJB context. My question is: is it still possible to inject this bean (with a @Resource or other annotation, a JNDI lookup, ...), or do I need a workaround? Thank you.

    Solution: do a JNDI lookup like this:

        try {
            ic = new InitialContext();
            myejb = (MyEJB) ic.lookup("java:global/xxxx/MyEJB");
        } catch (NamingException e) {
            e.printStackTrace();
        }

  • For business people to manage, keep binary images in MySQL or just the urls?

    - by Michael Mao
    Hello everyone: I am working on a task to enable image uploading and auto-scaling (from full-sized to thumbnail) with jQuery & PHP. I can naturally come up with two approaches: first, store both images as binary objects directly in MySQL; second, store only URLs to the images and keep the images somewhere on the server. The images are for everyone to view, so there are no security restrictions, as far as I know. Personally I don't have any preference; however, at the end of the day, it is the business people who are going to manage the images as part of the system (CRUD). So I am wondering which approach would be better for them? Of course I am building an easy-to-use, visual web interface for the staff to control the process, but I am not sure if that is enough. Experience has taught me that if I don't think about the future and seek the most flexible approach, then I will probably screw myself sooner or later. PS. The following link is what I've found so far, which is pretty cool, no Flash involved :) Andrew Valum's ajax image upload jQuery plugin

  • Problem with absolute positioning div over SWF and IE 8.

    - by Michael S. Kelly
    I'm attempting to use the old IFrame-over-SWF trick to get HTML to display "inside" a SWF. I'm following the example provided by Brian Deitte at: http://www.deitte.com/IFrameDemo3/IFrameDemo.html. (The source code can be viewed and downloaded by right-clicking on the SWF and selecting "View Source".) In the latest versions of Firefox, Google, Opera, and Safari on the Mac, it all looks good. But in IE 8 the absolutely positioned div containing the IFrame is positioned too far up and left, and the height and width are considerably smaller. Thoughts?

  • Strange behaviour of NSScanner on simple whitespace removal

    - by Michael Waterfall
    I'm trying to replace all multiple whitespace in some text with a single space. This should be a very simple task; however, for some reason it's returning a different result than expected. I've read the docs on NSScanner and it seems like it's not working properly!

        NSScanner *scanner = [[NSScanner alloc] initWithString:@"This is a test of NSScanner !"];
        NSMutableString *result = [[NSMutableString alloc] init];
        NSString *temp;
        NSCharacterSet *whitespace = [NSCharacterSet whitespaceCharacterSet];
        while (![scanner isAtEnd]) {
            // Scan up to and stop before any whitespace
            [scanner scanUpToCharactersFromSet:whitespace intoString:&temp];
            // Add all non-whitespace characters to string
            [result appendString:temp];
            // Scan past all whitespace and replace with a single space
            if ([scanner scanCharactersFromSet:whitespace intoString:NULL]) {
                [result appendString:@" "];
            }
        }

    But for some reason the result is @"ThisisatestofNSScanner!" instead of @"This is a test of NSScanner !". If you read through the comments and what each line should achieve, it seems simple enough!? scanUpToCharactersFromSet should stop the scanner just as it encounters whitespace. scanCharactersFromSet should then progress the scanner past the whitespace up to the non-whitespace characters. And then the loop continues to the end. What am I missing or not understanding?
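For comparison, here is the intended scan loop written out in Python, next to a one-line regular-expression version. This is a sketch of the algorithm the comments describe, not a fix for the Objective-C code. One likely cause worth checking in the original is that NSScanner skips whitespace by default (its `charactersToBeSkipped` set), so the second scan may never see any whitespace. The function names below are hypothetical, and only spaces and tabs are treated as whitespace.

```python
import re


def collapse_whitespace(text):
    """Replace each run of spaces or tabs with a single space."""
    return re.sub(r"[ \t]+", " ", text)


def collapse_by_scanning(text, whitespace=" \t"):
    """Same result using an explicit scan loop like the one in the post."""
    result, i, n = [], 0, len(text)
    while i < n:
        # Scan up to, and stop before, any whitespace.
        start = i
        while i < n and text[i] not in whitespace:
            i += 1
        result.append(text[start:i])
        # Scan past all whitespace and emit a single space in its place.
        if i < n:
            while i < n and text[i] in whitespace:
                i += 1
            result.append(" ")
    return "".join(result)
```

Both functions leave a single leading or trailing space in place if the input starts or ends with whitespace; stripping those is a separate decision.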

  • Is there a browser-agnostic way to detect client-side script errors with Watin?

    - by Michael
    We're using WatiN to test our web portals. During the course of an E2E test, we'll occasionally see client-side script errors on the IE status bar. I'd like to chain a handler onto the script error event and record the error for later analysis and bug filing. The problem is, I don't know whether there's a global script error event or how to chain into it. And if there's no browser-agnostic way to accomplish this, I can create MyIE and MyFF subclasses, but then this becomes two browser-specific questions. In essence, I'm thinking of something like this entirely made-up call: browser.ScriptEngine.SetCustomErrorHandler(LogScriptingError); ... where LogScriptingError is my code that does the obvious. Many of our client-side scripting errors don't necessarily prevent the test from continuing (a pretty UI element didn't animate, for example, but the underlying form is still submittable), so I'd like to log the error and forge ahead in most cases.

  • iPhone apps for company-internal use - possible?

    - by Michael Stum
    I hope this is still programming related, as SuperUser doesn't seem the appropriate place. Basically I wonder if it is possible to have Applications that are internal to a company on the iPhone? That is something like a companion Application to an Intranet (when Safari and Mail just don't cut it) which wouldn't make sense on the AppStore (and likely wouldn't get approved anyway). Is something like that possible (without Jailbreaking or doing anything else that Apple doesn't normally want)?

  • What's a good way to encrypt data using an asymmetric key, that's available to both java and ruby?

    - by Michael Campbell
    I have a customer that wants to encrypt some data in his database (not passwords; this needs actual encryption, not hashing). The application which will be doing the encrypting/writing is in Java, but the process which will DECRYPT it is behind a secure firewall, and is written in ruby. The idea was to use a public/private key scheme; the java system would encrypt it with the public key, then the process on his local box would use the private key to decrypt it as needed. I'm looking for any experience anyone has doing something like that; my main question is what sorts of libraries on java and ruby can interoperate with the same keys and data.

  • php DOM, get values from xml document, php xml

    - by Michael
    I'm trying to get some information (itemID, title, price and mileage) for multiple listings from the eBay website using their API. So far I have this link working: http://open.api.ebay.com/shopping?callname=GetMultipleItems&responseencoding=XML&appid=Morcovar-c74b-47c0-954f-463afb69a4b3&siteid=0&version=525&IncludeSelector=ItemSpecifics&ItemID=220617293997,250645537939,230485306218 I've saved the document as an .xml file using PHP curl, and now I need to extract the values (itemID, title, price and mileage) into arrays and store them in a database. Unfortunately I have never worked with the PHP DOM and I can't figure out how to extract the values. I tried to follow the tutorial on the IBM website, http://www.ibm.com/developerworks/library/os-xmldomphp/ but I had no success. Some help would be highly appreciated.
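The extraction step can be sketched with Python's standard `xml.etree.ElementTree` module. The sample document below is hand-written for illustration and is only an assumption about the shape of the response (element names such as `Item`, `ItemID`, `Title`, `CurrentPrice` and `ItemSpecifics/NameValueList`, and the namespace URI); the actual saved file should be checked against it. `extract_items` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Hand-written sample; the real response layout is assumed, not verified.
SAMPLE = """<GetMultipleItemsResponse xmlns="urn:ebay:apis:eBLBaseComponents">
  <Item>
    <ItemID>111</ItemID>
    <Title>Example car</Title>
    <CurrentPrice currencyID="USD">9500.0</CurrentPrice>
    <ItemSpecifics>
      <NameValueList><Name>Mileage</Name><Value>42000</Value></NameValueList>
    </ItemSpecifics>
  </Item>
</GetMultipleItemsResponse>"""

# Prefix-to-URI map so the queries can address namespaced elements.
NS = {"e": "urn:ebay:apis:eBLBaseComponents"}


def extract_items(xml_text):
    """Return one dict per Item with id, title, price and mileage as strings."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.findall("e:Item", NS):
        # Turn the Name/Value pairs into a lookup table for this item.
        specifics = {
            pair.findtext("e:Name", namespaces=NS): pair.findtext("e:Value", namespaces=NS)
            for pair in item.findall("e:ItemSpecifics/e:NameValueList", NS)
        }
        rows.append({
            "item_id": item.findtext("e:ItemID", namespaces=NS),
            "title": item.findtext("e:Title", namespaces=NS),
            "price": item.findtext("e:CurrentPrice", namespaces=NS),
            "mileage": specifics.get("Mileage"),
        })
    return rows
```

The namespace is the detail that most often makes such queries return nothing: an un-prefixed element in the document still belongs to the default namespace, so the query has to name it explicitly. The same applies in PHP when using DOMXPath, where the namespace has to be registered before querying.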

  • Cannot access NSDictionary

    - by michael blaize
    I created a JSON document using a PHP script. I am reading the JSON and can see that the data has been read correctly. However, when it comes to accessing the objects, I get "unrecognized selector sent to instance...". I cannot seem to find out why, after too many hours! Any help would be great! My code looks like this:

        NSDictionary *json = [[NSDictionary alloc] init];
        json = [NSJSONSerialization JSONObjectWithData:receivedData options:kNilOptions error:&error];
        NSLog(@"raw json = %@,%@",json,error);
        NSMutableArray *name = [[NSMutableArray alloc] init];
        [name addObjectsFromArray: [json objectForKey:@"name"]];

    The code crashes when reaching the last line above. The output looks like this:

        raw json = (
            {
                category = vacancies;
                link = "http://blablabla.com";
                name = "name 111111";
                tagline = "tagline 111111";
            },
            {
                category = vacancies;
                link = "http://blobloblo.com";
                name = "name 222222222";
                tagline = "tagline 222222222";
            }
        ),(null)
        2012-06-23 21:46:57.539 Wind expert[4302:15203] -[__NSCFArray objectForKey:]: unrecognized selector sent to instance 0xdcfb970

    HELP!
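The logged output is wrapped in parentheses and the exception names `__NSCFArray`, which suggests the top-level JSON value is an array of records rather than a dictionary. The Python sketch below shows the same situation with the standard `json` module: the parsed value is a list, so the names have to be collected by iterating over it. The `RAW` string is reconstructed from the log output in the post, and `names_from` is a hypothetical helper.

```python
import json

# Reconstructed from the log output in the post: an array of two objects.
RAW = """[
  {"category": "vacancies", "link": "http://blablabla.com",
   "name": "name 111111", "tagline": "tagline 111111"},
  {"category": "vacancies", "link": "http://blobloblo.com",
   "name": "name 222222222", "tagline": "tagline 222222222"}
]"""


def names_from(raw):
    """Parse a JSON array of records and return each record's name."""
    records = json.loads(raw)
    # Check the top-level type before treating the result as keyed data.
    if not isinstance(records, list):
        raise TypeError("expected a JSON array at the top level")
    return [record["name"] for record in records]
```

In the Objective-C code, the equivalent change would be to treat the parsed result as an array and read the "name" key from each element, instead of calling objectForKey: on the top-level object.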
