Search Results

Search found 5322 results on 213 pages for 'answers'.

Page 175/213 | < Previous Page | 171 172 173 174 175 176 177 178 179 180 181 182  | Next Page >

  • Which key value store is the most promising/stable?

    - by Mike Trpcic
    I'm looking to start using a key/value store for some side projects (mostly as a learning experience), but so many have popped up in the recent past that I've got no idea where to begin. Just listing from memory, I can think of: CouchDB MongoDB Riak Redis Tokyo Cabinet Berkeley DB Cassandra MemcacheDB And I'm sure that there are more out there that have slipped through my search efforts. With all the information out there, it's hard to find solid comparisons between all of the competitors. My criteria and questions are: (Most Important) Which do you recommend, and why? Which one is the fastest? Which one is the most stable? Which one is the easiest to set up and install? Which ones have bindings for Python and/or Ruby? Edit: So far it looks like Redis is the best solution, but that's only because I've gotten one solid response (from ardsrk). I'm looking for more answers like his, because they point me in the direction of useful, quantitative information. Which Key-Value store do you use, and why? Edit 2: If anyone has experience with CouchDB, Riak, or MongoDB, I'd love to hear your experiences with them (and even more so if you can offer a comparative analysis of several of them)

    Read the article

  • Java hangs when trying to close a ProcessBuilder OutputStream

    - by Jeff Bullard
    I have the following Java code to start a ProcessBuilder, open an OutputStream, have the process write a string to an OutputStream, and then close the OutputStream. The whole thing hangs indefinitely when I try to close the OutputStream. This only happens on Windows, never on Mac or Linux. Some of the related questions seem to be close to the same problem I'm having, but I haven't been able to figure out how to apply the answers to my problem, as I am a relative newbie with Java. Here is the code. You can see I have put in a lot of println statements to try to isolate the problem. System.out.println("GenMic trying to get the input file now"); System.out.flush(); OutputStream out = child.getOutputStream(); try { System.out.println("GenMic getting ready to write the input file to out"); System.out.flush(); out.write(intext.getBytes()); System.out.println("GenMic finished writing to out"); System.out.flush(); out.close(); System.out.println("GenMic closed OutputStream"); System.out.flush(); } catch (IOException iox) { System.out.println("GenMic caught IOException 2"); System.out.flush(); String detailedMessage = iox.getMessage(); System.out.println("Exception: " + detailedMessage); System.out.flush(); throw new RuntimeException(iox); } And here is the output when this chunk is executed: GenMic trying to get the input file now GenMic getting ready to write the input file to out GenMic finished writing to out

    Read the article

  • Java appending XML data

    - by Travis
    I've already read through a few of the answers on this site but none of them worked for me. I have an XML file like this: <root> <character> <name>Volstvok</name> <charID>(omitted)</charID> <userID>(omitted)</userID> <apiKey>(omitted)</apiKey> </character> </root> I need to add another <character> somehow. I'm trying this but it does not work: public void addCharacter(String name, int id, int userID, String apiKey){ Element newCharacter = doc.createElement("character"); Element newName = doc.createElement("name"); newName.setTextContent(name); Element newID = doc.createElement("charID"); newID.setTextContent(Integer.toString(id)); Element newUserID = doc.createElement("userID"); newUserID.setTextContent(Integer.toString(userID)); Element newApiKey = doc.createElement("apiKey"); newApiKey.setTextContent(apiKey); //Setup and write newCharacter.appendChild(newName); newCharacter.appendChild(newID); newCharacter.appendChild(newUserID); newCharacter.appendChild(newApiKey); doc.getDocumentElement().appendChild(newCharacter); }

    Read the article

  • How would you go about parsing markdown?

    - by John Leidegren
    You can find the syntax here. The thing is, the source that follows with the download is written in perl. Which I have no intentions of honoring. It is riddled with regex and it relies on MD5 hashes to escape certain characters. Something is just wrong about that! I'm about to hard code a parser for markdown and I'm wonder if someone had some experience with this? Edit: If you don't have anything meaningful to say about the actual parsing of markdown, spare me the time. (This might sound harsh, but yes, I'm looking for insight, not a solution i.e. third-party library). To help a bit with the answers, regex are meant to identify patterns! NOT to parse an entire grammar. That people consider doing so is foobar. If you think about markdown, it's fundamentally based around the concept of paragraphs. As such, a reasonable approach might be to split the input into paragraphs. There are many kinds of paragraphs e.g. heading, text, list, blockquote, code. The challenge is thus to identify these paragraphs and in what context they occur. I'll be back with a solution, once I find it's worthy to be shared.

    Read the article

  • Use AJAX to reload captcha

    - by arik-so
    Hello! This question may have been asked already - but unfortunately, I could not find any satisfactory answers. I will just ask it for my concrete case and ask the admins not to delete the question for at least a few days so I can try it out... I have a page. It uses a captcha. Like so: <?php session_start(); // the captcha saves the md5 into the session ?> <img src="captcha.php" onclick="this.src = this.src" /> That was my first code. It did not work, because the browser condsidered it useless to reload an image if the source is the same. My current solution is to pass a get parameter: onclick="this.src = 'captcha.php?randomNumber='+ranNum" The JavaScript variable var ranNum is generated randomly every time the onclick event fires. It works fine, still, I don't like the possibility, if the - though improbable - case of two numbers being the same twice in a row. Although the random number varies between -50,000 and 50,000 - I still do not like it. And I don't think the method is right. I would like to know the 'righter' method, by means of AJAX. I know it's possible. I hope you know how it's possible ^^ In that case, please show me. Thanks in advance! By the way - if I spell cap(t)cha differently, never mind, the reference to the PHP file is right in my code: I use randomImage.php

    Read the article

  • How do I detect if both left and right buttons are pushed?

    - by Greg McGuffey
    I would like have three mouse actions over a control: left, right and BOTH. I've got the left and right and am currently using the middle button for the third, but am curious how I could use the left and right buttons being pressed together, for those situations where the user has a mouse without a middle button. This would be handled in the OnMouseDown method of a custom control. UPDATE After reviewing the suggested answers, I need to clarify that what I was attempting to do was to take action on the mouse click in the MouseDown event (actually OnMouseDown method of a control). Because it appears that .NET will always raise two MouseDown events when both the left and right buttons on the mouse are clicked (one for each button), I'm guessing the only way to do this would be either do some low level windows message management or to implement some sort of delayed execution of an action after MouseDown. In the end, it is just way simpler to use the middle mouse button. Now, if the action took place on MouseUp, then Gary's or nos's suggestions would work well. Any further insights on this problem would be appreciated. Thanks!

    Read the article

  • FILE* issue PPU side code

    - by Cristina
    We are working on a homework on CELL programming for college and their feedback response to our questions is kinda slow, thought i can get some faster answers here. I have a PPU side code which tries to open a file passed down through char* argv[], however this doesn't work it cannot make the assignment of the pointer, i get a NULL. Now my first idea was that the file isn't in the correct directory and i copied in every possible and logical place, my second idea is that maybe the PPU wants this pointer in its LS area, but i can't deduce if that's the bug or not. So... My question is what am i doing wrong? I am working with a Fedora 7 SDK Cell, with Eclipse as an IDE. Maybe my argument setup is wrong tho he gets the name of the file correctly. Code on request: images_t *read_bin_data(char *name) { FILE *file; images_t *img; uint32_t *buffer; uint8_t buf; unsigned long fileLen; unsigned long i; //Open file file = (FILE*)malloc(sizeof(FILE)); file = fopen(name, "rb"); printf("[Debug]Opening file %s\n",name); if (!file) { fprintf(stderr, "Unable to open file %s", name); return NULL; } //....... } Main launch: int main(int argc,char* argv[]) { int i,img_width; int modif_this[4] __attribute__ ((aligned(16))) = {1,2,3,4}; images_t *faces, *nonfaces; spe_context_ptr_t ctxs[SPU_THREADS]; pthread_t threads[SPU_THREADS]; thread_arg_t arg[SPU_THREADS]; //intializare img_width img_width = atoi(argv[1]); printf("[Debug]Img size is %i\n",img_width); faces = read_bin_data(argv[3]); //....... } Thanks for the help.

    Read the article

  • Javascript Object.Watch for all browsers?

    - by SeanW
    Hey all, I was looking for an easy way to monitor an object or variable for changes, and I found Object.Watch that's supported in Mozilla browsers, but not IE. So I started searching around to see if anyone had written some sort of equivalent. About the only thing I've found has been a jQuery plugin (http://plugins.jquery.com/files/jquery-watch.js.txt), but I'm not sure if that's the best way to go. I certainly use jQuery in most of my projects, so I'm not worried about the jQuery aspect... Anyway, the question: can someone show me a working example of that jQuery plugin? I'm having problems making it work... Or, does anyone know of any better alternatives that would work cross browser? Thanks! Update after answers: Thanks everyone for the responses! I tried out the code posted here: http://webreflection.blogspot.com/2009/01/internet-explorer-object-watch.html But I couldn't seem to make it work with IE. The code below works fine in FireFox, but does nothing in IE. In Firefox, each time watcher.status is changed, the document.write in watcher.watch is called and you can see the output on the page. In IE, that doesn't happen, but I can see that watcher.status is updating the value, because the last document.write shows the correct value (in both IE and FF). But, if the callback function isn't called, then that's kind of pointless... :) Am I missing something? var options = {'status': 'no status'}, watcher = createWatcher(options); watcher.watch("status", function(prop, oldValue, newValue) { document.write("old: " + oldValue + ", new: " + newValue + "<br>"); return newValue; }); watcher.status = 'asdf'; watcher.status = '1234'; document.write(watcher.status + "<br>");

    Read the article

  • Why is creating a ring buffer shared by different processes so hard (in C++), what I am doing wrong?

    - by recipriversexclusion
    I am being especially dense about this but it seems I'm missing an important, basic point or something, since what I want to do should be common: I need to create a fixed-size ring buffer object from a manager process (Process M). This object has write() and read() methods to write/read from the buffer. The read/write methods will be called by independent processes (Process R and W) I have implemented the buffer, SharedBuffer<T&>, it allocates buffer slots in SHM using boost::interprocess and works perfectly within a single process. I have read the answers to this question and that one on SO, as well as asked my own, but I'm still in the dark about how to have different processes access methods from a common object. The Boost doc has an example of creating a vector in SHM, which is very similar to what I want, but I want to instantiate my own class. My current options are: Use placement new, as suggested by Charles B. to my question; however, he cautions that it's not a good idea to put non-POD objects in SHM. But my class needs the read/write methods, how can I handle those? Add an allocator to my class definition, e.g. have SharedBuffer<T&, Alloc> and proceed similarly to the vector example given in boost. This sounds really complicated. Change SharedBuffer to a POD class, i.e. get rid of all the methods. But then how to synchronize reading and writing between processes? What am I missing? Fixed-length ring buffers are very common, so either this problem has a solution or else I'm doing something wrong.

    Read the article

  • Flex Spark TitleWindow bad redraw on dragging

    - by praksant
    Hi, I have a problem with redrawing in flex 4. I have a spark titleWindow, and if i drag it faster, it looks like it's mask is one frame late after the component. it's easily visible with 1pixel thin border, because it becomes invisible even with slower movement. You can try it here (what is not my page, but it's easier to show you here than uploading example): http://flexponential.com/2010/01/10/resizable-titlewindow-in-flex-4/ If you move in direction up, you see disappearing top border. in another directions it's not that sensitive as it has wide shadow, and it's not very visible on shadow. On my computer i see it on every spark TitleWindow i have found on google, although it's much less visible with less contrast skins, without borders or with shadows. Do you see it there? i had never this problem with halo components. It's doing the same thing with different skins. I tried to delete masks from skin, cache component, skin even an application as bitmap with no success. I also turned on redraw regions in flash player, and it looks like it's one frame late after titlewindow too. Does anyone know why is it doing this or how can i prevent it? Thank you UPDATE: no answers? really?

    Read the article

  • Removing elements using XSLT 1.0

    - by pmdarrow
    I'm attempting to remove Component elements from the XML below that have File children with the extension "config." I've managed to do this part, but I also need to remove the matching ComponentRef elements that have the same IDs as these Components. <Fragment> <DirectoryRef Id="MyWebsite"> <Component Id="Comp1"> <File Source="Web.config" /> </Component> <Component Id="Comp2"> <File Source="Default.aspx" /> </Component> </DirectoryRef> </Fragment> <Fragment> <ComponentGroup Id="MyWebsite"> <ComponentRef Id="Comp1" /> <ComponentRef Id="Comp2" /> </ComponentGroup> </Fragment> Based on other SO answers, I've come up with the following XSLT to remove these Component elements: <?xml version="1.0" encoding="utf-8"?> <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"> <xsl:output method="xml" indent="yes" /> <xsl:template match="Component[File[substring(@Source, string-length(@Source)- string-length('config') + 1) = 'config']]" /> <xsl:template match="@*|node()"> <xsl:copy> <xsl:apply-templates select="@*|node()"/> </xsl:copy> </xsl:template> </xsl:stylesheet> Unfortunately, this doesn't remove the matching ComponentRef elements. The XSLT will remove the component with the Id "Comp1" but not the ComponentRef with Id "Comp1". How do I achieve this using XSLT 1.0?

    Read the article

  • What are some good ways to store performance statistics in a database for querying later?

    - by Nathan
    Goal: Store arbitrary performance statistics of stuff that you care about (how many customers are currently logged on, how many widgets are being processed, etc.) in a database so that you can understand what how your servers are doing over time. Assumptions: A database is already available, and you already know how to gather the information you want and are capable of putting it in the database however you like. Some Ideal Attributes of a Solution Causes no noticeable performance hit on the server being monitored Has a very high precision of measurement Does not store useless or redundant information Is easy to query (lends itself to gathering/displaying useful information) Lends itself to being graphed easily Is accurate Is elegant Primary Questions 1) What is a good design/method/scheme for triggering the storing of statistics? 2) What is a good database design for how to actually store the data? Example answers...that are sort of vague and lame... 1) I could, once per [fixed time interval], store a row of data with all the performance measurements I care about in each column of one big flat table indexed by timestamp and/or server. 2) I could have a daemon monitoring performance stuff I care about, and add a row whenever something changes (instead of at fixed time intervals) to a flat table as in #1. 3) I could trigger either as in #2, but I could store information about each aspect of performance that I'm measuring in separate tables, opening up the possibility of adding tons of rows for often-changing items, and few rows for seldom-changing items. Etc. In the end, I will implement something, even if it's some super-braindead approach I make up myself, but I'm betting there are some really smart people out there willing to share their experiences and bright ideas!

    Read the article

  • When is someone else's code I use from the internet "mine"?

    - by robault
    I'm building a library from methods that I've found on the internet. Some are free to use or modify with no requirements, others say that if I leave a comment in the code it's okay to use, others say when I use the code I have to attribute the use of someone's code in my application (in the credits for my app I guess). What I've been doing is reorganizing classes, renaming methods, adding descriptions (code comments), renaming the parameters and names inside the methods to something meaningful, optimizing loops if applicable, changing return types, adding try/catch/throw blocks, adding parameter checks and cleaning up resources in the methods. For example; I didn't come up with the algorithm for blurring a Bitmap but I've taken the basic example of iterating through the pixels and turned it into a decent library method (applying the aforementioned modifications). I understand how to go about building it now myself but I didn't actually hit the keystrokes to make it and I couldn't have come up with it before learning from their example. What about code people get in answers on Stackoverflow or examples from Codeproject? At what point can I drop their requirements because at n% their code became mine? FWIW I intend on using the libraries to create products that I will sell.

    Read the article

  • How do I write unescaped XML outside of a CDATA

    - by kazanaki
    Hello I am trying to write XML data using Stax where the content itself is HTML If I try xtw.writeStartElement("contents"); xtw.writeCharacters("<b>here</b>"); xtw.writeEndElement(); I get this <contents>&lt;b&gt;here&lt;/b&gt;</contents> Then I notice the CDATA method and change my code to: xtw.writeStartElement("contents"); xtw.writeCData("<b>here</b>"); xtw.writeEndElement(); and this time the result is <contents><![CDATA[<b>here</b>]]></contents> which is still not good. What I really want is <contents><b>here</b></contents> So is there an XML API/Library that allows me to write raw text without being in a CDATA section? So far I have looked at Stax and JDom and they do not seem to offer this. In the end I might resort to good old StringBuilder but this would not be elegant. Update I agree mostly with the answers so far. However instead of <b>here</b> I could have a 1MB HTML document that I want to embed in a bigger XML document. What you suggest means that I have to parse this HTML document in order to understand its structure. I would like to avoid this if possible.

    Read the article

  • How to reuse results with a schema for end of day stock-data

    - by Vishalrix
    I am creating a database schema to be used for technical analysis like top-volume gainers, top-price gainers etc.I have checked answers to questions here, like the design question. Having taken the hint from boe100 's answer there I have a schema modeled pretty much on it, thusly: Symbol - char 6 //primary Date - date //primary Open - decimal 18, 4 High - decimal 18, 4 Low - decimal 18, 4 Close - decimal 18, 4 Volume - int Right now this table containing End Of Day( EOD) data will be about 3 million rows for 3 years. Later when I get/need more data it could be 20 million rows. The front end will be asking requests like "give me the top price gainers on date X over Y days". That request is one of the simpler ones, and as such is not too costly, time wise, I assume. But a request like " give me top volume gainers for the last 10 days, with the previous 100 days acting as baseline", could prove 10-100 times costlier. The result of such a request would be a float which signifies how many times the volume as grown etc. One option I have is adding a column for each such result. And if the user asks for volume gain in 10 days over 20 days, that would require another table. The total such tables could easily cross 100, specially if I start using other results as tables, like MACD-10, MACD-100. each of which will require its own column. Is this a feasible solution? Another option being that I keep the result in cached html files and present them to the user. I dont have much experience in web-development, so to me it looks messy; but I could be wrong ( ofc!) . Is that a option too? Let me add that I am/will be using mod_perl to present the response to the user. With much of the work on mysql database being done using perl. I would like to have a response time of 1-2 seconds.

    Read the article

  • How to use an Audio Unit on the iPhone

    - by CodeToaster
    I'm looking for a way to change the pitch of recorded audio as it is saved to disk, or played back (in real time). I understand Audio Units can be used for this. The iPhone offers limited support for Audio Units (for example it's not possible to create/use custom audio units, as far as I can tell), but several out-of-the-box audio units are available, one of which is AUPitch. How exactly would I use an audio unit (specifically AUPitch)? Do you hook it into an audio queue somehow? Is it possible to chain audio units together (for example, to simultaneously add an echo effect and a change in pitch)? EDIT: After inspecting the iPhone SDK headers (I think AudioUnit.h, I'm not in front of a Mac at the moment), I noticed that AUPitch is commented out. So it doesn't look like AUPitch is available on the iPhone after all. weep weep Apple seems to have better organized their iPhone SDK documentation at developer.apple.com of late - now its more difficult to find references to AUPitch, etc. That said, I'm still interested in quality answers on using Audio Units (in general) on the iPhone.

    Read the article

  • Discussion - Allowing / blocking user access to pages (Client Side Only!) - Javascript / Jquery

    - by Ozaki
    TLDR Using plain HTML / Javascript (Client Side) I want to prevent viewing of certain pages. The user will have to type a username and password and depending on that they get access to different pages. Answers can NOT include server side whatsoever It does not matter if they can break it easily. There is no sensitive information etc. Also the target audience will not have access to internet OR probably know what a cookie is... At some point the user will have to type username / password.(I can define the cookie here) Currently I thought of using cookies to set a cookie for each page to say "true" / "false" but that would get messy with so many cookies. Or setting an array within a cookie for each page? I have div field "#Content" which as it looks encompasses all of my content on the page so blocking out content will be as simple as replacing it with ("sorry you don't have access") etc. For Example: $.cookie("Access","page1, page2, page3"{ expires: 1 }); I am looking for anyway to do this does not have to be with cookies. Would be nice to get a discussion of different ways this can be done. So the question is: What do YOU think would be a good way to go about doing this with client side validation?

    Read the article

  • C Run-Time library part 2

    - by b-gen-jack-o-neill
    Hi, I was suggested when I have some further questions on my older ones, to create newer Question and reffer to old one. So, this is the original question: What is the C runtime library? OK, from your answers, I now get thet statically linked libraries are Microsoft implementation of C standart functions. Now: If I get it right, the scheme would be as follow: I want to use printf(), so I must include which just tels compiler there us functio printf() with these parameters. Now, when I compile code, becouse printf() is defined in C Standart Library, and becouse Microsoft decided to name it C Run Time library, it gets automatically statically linked from libcmt.lib (if libcmt.lib is set in compiler) at compile time. I ask, becouse on wikipedia, in article about runtime library there is that runtime library is linked in runtime, but .lib files are linked at compile time, am I right? Now, what confuses me. There is .dll version of C standart library. But I thought that to link .dll file, you must actually call winapi program to load that library. So, how can be these functions dynamically linked, if there is no static library to provide code to tell Windows to load desired functions from dll? And really last question on this subject - are C Standart library functions also calls to winapi even they are not .dll files like more advanced WinAPI functions? I mean, in the end to access framebuffer and print something you must tell Windows to do it, since OS cannot let you directly manipulate HW. I think of it like the OS must be written to support all C standart library functions same way across similiar versions, since they are statically linked, and can differently support more complex WinAPI calls becouse new version of OS can have adjustements in the .dll file.

    Read the article

  • Help with possible Haskell type inference quiz questions

    - by Linda Cohen
    foldr:: (a -> b -> b) -> b -> [a] -> b map :: (a -> b) -> [a] -> [b] mys :: a -> a (.) :: (a -> b) -> (c -> a) -> c -> b what is inferred type of: a.map mys :: b.mys map :: c.foldr map :: d.foldr map.mys :: I've tried to create mys myself using mys n = n + 2 but the type of that is mys :: Num a => a -> a What's the difference between Num a = a - a and just a - a? Or what does 'Num a =' mean? Is it that mys would only take Num type? So anyways, a) I got [a] - [a] I think because it would just take a list and return it +2'd according to my definition of mys b) (a - b) - [a] - [b] I think because map still needs take both arguments like (*3) and a list then returns [a] which goes to mys and returns [b] c) I'm not sure how to do this 1. d) I'm not sure how to do this 1 either but map.mys means do mys first then map right? Are my answers and thoughts correct? If not, why not? THANKS!

    Read the article

  • JSF Float Conversion

    - by Phill Sacre
    I'm using JSF 1.2 with IceFaces 1.8 in a project here. I have a page which is basically a big edit grid for a whole bunch of floating-point number fields. This is implemented with inputText fields on the page pointing at a value object with primitive float types Now, as a new requirement sees some of the fields be nullable, I wanted to change the value object to use Float objects rather than primitive types. I didn't think I'd need to do anything to the page to accomodate this. However, when I make the change I get the following error: /pages/page.xhtml @79,14 value="#{row.targetValue}": java.lang.IllegalArgumentException: argument type mismatch And /pages/page.xhtml @79,14 value="#{row.targetValue}": java.lang.IllegalArgumentException: java.lang.ClassCastException@1449aa1 The page looks like this: <ice:inputText value="#{row.targetValue}" size="4"> <f:convertNumber pattern="###.#" /> </ice:inputText> I've also tried adding in <f:convert convertId="javax.faces.Float" /> in there as well but that doesn't seem to work either! Neither does changing the value object types to Double. I'm sure I'm probably missing something really simple but I've been staring at this for a while now and no answers are immediately obvious!

    Read the article

  • Which version of Grady Booch's OOA/D book should I buy?

    - by jackj
    Grady Booch's "Object-Oriented Analysis and Design with Applications" is available brand new in both the 2nd edition (1993) and the 3rd edition (2007), while many used copies of both editions are available. Here are my concerns: 1) The 2nd edition uses C++: given that I just finished reading my first two C++ books (Accelerated C++ and C++ Primer) I guess practical tips can only help, so the 2nd edition is probably best (I think the 3rd edition has absolutely no code). On the other hand, the C++ books I read insist on the importance of using standard C++, whereas Booch's 2nd edition was published before the 1998 standard. 2) The 2nd edition is shorter (608 pages vs. 720) so, I guess, it will be slightly easier to get through. 3) The 3rd edition uses UML 2.0, whereas the 2nd edition is pre-UML. Some reviews say that the notation in the 2nd edition is close enough to UML, so it doesn't matter, but I don't know if I should be worrying about this or not. 4) The 2nd edition is available in good-shape used copies for considerably less than what the 3rd one goes for. Given all the above factors, do you think I should buy the 2nd or the 3rd edition? Recommendations on other books are also welcome but I would prefer it if whoever answers has read at least one of the versions of Booch's book (preferably both!). I have already bought but not read GoF and Riel's books. I also know that I should practice a lot with real-life code. Thanks.

    Read the article

  • What techniques can be used to detect so called "black holes" (a spider trap) when creating a web crawler?

    - by Tom
    When creating a web crawler, you have to design somekind of system that gathers links and add them to a queue. Some, if not most, of these links will be dynamic, which appear to be different, but do not add any value as they are specifically created to fool crawlers. An example: We tell our crawler to crawl the domain evil.com by entering an initial lookup URL. Lets assume we let it crawl the front page initially, evil.com/index The returned HTML will contain several "unique" links: evil.com/somePageOne evil.com/somePageTwo evil.com/somePageThree The crawler will add these to the buffer of uncrawled URLs. When somePageOne is being crawled, the crawler receives more URLs: evil.com/someSubPageOne evil.com/someSubPageTwo These appear to be unique, and so they are. They are unique in the sense that the returned content is different from previous pages and that the URL is new to the crawler, however it appears that this is only because the developer has made a "loop trap" or "black hole". The crawler will add this new sub page, and the sub page will have another sub page, which will also be added. This process can go on infinitely. The content of each page is unique, but totally useless (it is randomly generated text, or text pulled from a random source). Our crawler will keep finding new pages, which we actually are not interested in. These loop traps are very difficult to find, and if your crawler does not have anything to prevent them in place, it will get stuck on a certain domain for infinity. My question is, what techniques can be used to detect so called black holes? One of the most common answers I have heard is the introduction of a limit on the amount of pages to be crawled. However, I cannot see how this can be a reliable technique when you do not know what kind of site is to be crawled. A legit site, like Wikipedia, can have hundreds of thousands of pages. Such limit could return a false positive for these kind of sites. Any feedback is appreciated. Thanks.

    Read the article

  • Convert UCS-2 characters to UTF-8 Using C#

    - by quanticle
    I'm pulling some internationalized text from a MS SQL Server 2005 database. As per the defaults for that DB, the characters are stored as UCS-2. However, I need to output the data in UTF-8 format, as I'm sending it out over the web. Currently, I have the following code to convert: SqlString dbString = resultReader.GetSqlString(0); byte[] dbBytes = dbString.GetUnicodeBytes(); byte[] utf8Bytes = System.Text.Encoding.Convert(System.Text.Encoding.Unicode, System.Text.Encoding.UTF8, dbBytes); System.Text.UTF8Encoding encoder = new System.Text.UTF8Encoding(); string outputString = encoder.GetString(utf8Bytes); However, when I examine the output in the browser, it appears to be garbage, no matter what I set the encoding to. What am I missing? EDIT: In response to the answers below, the reason I thought I had to perform a conversion is because I can output literal multibyte strings just fine. For example: OutputControl.Text = "????????????????????????????????????????????????????????????????"; works. Here, OutputControl is an ASP.Net Literal. However, OutputControl.Text = outputString; //Output from above snippet results in mangled output as described above. My hypothesis was that the database's output was somehow getting mangled by ASP.Net. If that's not the case, then what are some other possibilities?

    Read the article

  • What is the most efficient/elegant way to parse a flat table into a tree?

    - by Tomalak
    Assume you have a flat table that stores an ordered tree hierarchy: Id Name ParentId Order 1 'Node 1' 0 10 2 'Node 1.1' 1 10 3 'Node 2' 0 20 4 'Node 1.1.1' 2 10 5 'Node 2.1' 3 10 6 'Node 1.2' 1 20 What minimalistic approach would you use to output that to HTML (or text, for that matter) as a correctly ordered, correctly intended tree? Assume further you only have basic data structures (arrays and hashmaps), no fancy objects with parent/children references, no ORM, no framework, just your two hands. The table is represented as a result set, which can be accessed randomly. Pseudo code or plain English is okay, this is purely a conceptional question. Bonus question: Is there a fundamentally better way to store a tree structure like this in a RDBMS? EDITS AND ADDITIONS To answer one commenter's (Mark Bessey's) question: A root node is not necessary, because it is never going to be displayed anyway. ParentId = 0 is the convention to express "these are top level". The Order column defines how nodes with the same parent are going to be sorted. The "result set" I spoke of can be pictured as an array of hashmaps (to stay in that terminology). For my example was meant to be already there. Some answers go the extra mile and construct it first, but thats okay. The tree can be arbitrarily deep. Each node can have N children. I did not exactly have a "millions of entries" tree in mind, though. Don't mistake my choice of node naming ('Node 1.1.1') for something to rely on. The nodes could equally well be called 'Frank' or 'Bob', no naming structure is implied, this was merely to make it readable. I have posted my own solution so you guys can pull it to pieces.

    Read the article

  • Optimization in Python - do's, don'ts and rules of thumb.

    - by JV
    Well I was reading this post and then I came across a code which was: jokes=range(1000000) domain=[(0,(len(jokes)*2)-i-1) for i in range(0,len(jokes)*2)] I thought wouldn't it be better to calculate the value of len(jokes) once outside the list comprehension? Well I tried it and timed three codes jv@Pioneer:~$ python -m timeit -s 'jokes=range(1000000);domain=[(0,(len(jokes)*2)-i-1) for i in range(0,len(jokes)*2)]' 10000000 loops, best of 3: 0.0352 usec per loop jv@Pioneer:~$ python -m timeit -s 'jokes=range(1000000);l=len(jokes);domain=[(0,(l*2)-i-1) for i in range(0,l*2)]' 10000000 loops, best of 3: 0.0343 usec per loop jv@Pioneer:~$ python -m timeit -s 'jokes=range(1000000);l=len(jokes)*2;domain=[(0,l-i-1) for i in range(0,l)]' 10000000 loops, best of 3: 0.0333 usec per loop Observing the marginal difference 2.55% between the first and the second made me think - is the first list comprehension domain=[(0,(len(jokes)*2)-i-1) for i in range(0,len(jokes)*2)] optimized internally by python? or is 2.55% a big enough optimization (given that the len(jokes)=1000000)? If this is - What are the other implicit/internal optimizations in Python ? What are the developer's rules of thumb for optimization in Python? Edit1: Since most of the answers are "don't optimize, do it later if its slow" and I got some tips and links from Triptych and Ali A for the do's. I will change the question a bit and request for don'ts. Can we have some experiences from people who faced the 'slowness', what was the problem and how it was corrected? Edit2: For those who haven't here is an interesting read Edit3: Incorrect usage of timeit in question please see dF's answer for correct usage and hence timings for the three codes.

    Read the article

< Previous Page | 171 172 173 174 175 176 177 178 179 180 181 182  | Next Page >