Search Results

Search found 4702 results on 189 pages for 'coding district'.

Page 171/189 | < Previous Page | 167 168 169 170 171 172 173 174 175 176 177 178  | Next Page >

  • Python: How to read huge text file into memory

    - by asmaier
    I'm using Python 2.6 on a Mac Mini with 1GB RAM. I want to read in a huge text file $ ls -l links.csv; file links.csv; tail links.csv -rw-r--r-- 1 user user 469904280 30 Nov 22:42 links.csv links.csv: ASCII text, with CRLF line terminators 4757187,59883 4757187,99822 4757187,66546 4757187,638452 4757187,4627959 4757187,312826 4757187,6143 4757187,6141 4757187,3081726 4757187,58197 So each line in the file consists of a tuple of two comma separated integer values. I want to read in the whole file and sort it according to the second column. I know, that I could do the sorting without reading the whole file into memory. But I thought for a file of 500MB I should still be able to do it in memory since I have 1GB available. However when I try to read in the file, Python seems to allocate a lot more memory than is needed by the file on disk. So even with 1GB of RAM I'm not able to read in the 500MB file into memory. My Python code for reading the file and printing some information about the memory consumption is: #!/usr/bin/python # -*- coding: utf-8 -*- import sys infile=open("links.csv", "r") edges=[] count=0 #count the total number of lines in the file for line in infile: count=count+1 total=count print "Total number of lines: ",total infile.seek(0) count=0 for line in infile: edge=tuple(map(int,line.strip().split(","))) edges.append(edge) count=count+1 # for every million lines print memory consumption if count%1000000==0: print "Position: ", edge print "Read ",float(count)/float(total)*100,"%." mem=sys.getsizeof(edges) for edge in edges: mem=mem+sys.getsizeof(edge) for node in edge: mem=mem+sys.getsizeof(node) print "Memory (Bytes): ", mem The output I got was: Total number of lines: 30609720 Position: (9745, 2994) Read 3.26693612356 %. Memory (Bytes): 64348736 Position: (38857, 103574) Read 6.53387224712 %. Memory (Bytes): 128816320 Position: (83609, 63498) Read 9.80080837067 %. Memory (Bytes): 192553000 Position: (139692, 1078610) Read 13.0677444942 %. Memory (Bytes): 257873392 Position: (205067, 153705) Read 16.3346806178 %. Memory (Bytes): 320107588 Position: (283371, 253064) Read 19.6016167413 %. Memory (Bytes): 385448716 Position: (354601, 377328) Read 22.8685528649 %. Memory (Bytes): 448629828 Position: (441109, 3024112) Read 26.1354889885 %. Memory (Bytes): 512208580 Already after reading only 25% of the 500MB file, Python consumes 500MB. So it seem that storing the content of the file as a list of tuples of ints is not very memory efficient. Is there a better way to do it, so that I can read in my 500MB file into my 1GB of memory?

    Read the article

  • Toolbar items in sub-nib

    - by roe
    This question has probably been asked before, but my google-fu must be inferior to everybody else's, cause I can't figure this out. I'm playing around with the iPhone SDK, and I'm building a concept app I've been thinking about. If we have a look at the skeleton generated with a navigation based app, the MainWindow.xib contains a navigation controller, and within that a root-view controller (and a navigation bar and toolbar if you play around with it a little). The root-view controller has the RootViewController-nib associated with it, which loads the table-view. So far so good. To add content to the tool bar and to the navigation bar, I'm supposed to add those to in the hierarchy below the Root View Controller (which works, no problem). However, what I can't figure out is, this is all still within the MainWindow.xib (or, at runtime, nib). How would I define a xib in order for it to pick up tool bar items from that? I want to do (the equivalent of, just reusing the name here) RootViewController *controller = [[RootViewController alloc] initWithNibName:nil bundle:nil]; [self.navigationController pushViewController:controller animated:YES]; [controller release]; and have the navigation controller pick-up on the tool bar items defined in that nib. The logical place to put it would be in the hierarchy under File's Owner (which is of type RootViewController), but it doesn't appear to be possible. Currently, I'm assigning these (navigationItem and toolbarItems) manually in the viewDidLoad method, or define them in the MainWindow.xib directly to be loaded when the app initializes. Any ideas? Edit I guess I'll try to explain with a picture. This is the Interface Builder of the main window, pretty much as it comes out of the wizard to create a navigation based project. I've added a toolbar item for clarity though. You can see the navigation controller, with a toolbar and a navigation bar, and the root view controller. Basically, the Root View Controller has a bar button item and a navigation item as you can see. The thing is, it's also got a nib associated with it, which, when loaded will instantiate a view, and assign it to the view outlet of the controller (which in that nib is File's Owner, of type RootViewController, as should be). How can I get the toolbar item, and the navigation item, into the other nib, the RootViewController.nib so I can remove them here. The RootViewController.nib adds everything else to the Root View Controller, why not these items? The background for this is that I want to simply instantiate RootViewController, initialize it with its own nib (i.e. initWithNibName:nil shown above), and push it onto the navigation controller, without having to add the navigation/toolbar items in coding (as I do it now).

    Read the article

  • Any one like me who still believes in AspNet Web forms? Or has evryone switched to MVC

    - by The_AlienCoder
    After years mastering Aspnet webforms I recently decided to try out ASpnet MVC. Naturally my first action was to google 'Aspnet webforms vs Aspnet MVC'. I hoped to get an honest comparison of the two development techniques and guidelines on when to use which one. ..But I was completely turned off by the MVC proponents. In almost every post on the net comparing the two 'platforms' the MVC camp is simply bashing webform developers like me. They go on and on about how wrong and stupid using webforms is. As if what we have been doing the past decade has been pointless - all those websites built(& still running), all those clever controls, the mighty gridview..ALL POINTLESS ! Karl Seguin especially with his stupid webforms rant really turned me off. If his intention was to convert people like me he did the oposite and made me defensive. If anything I am now convinced that the webforms approach is better. consider the following All the critical shortcomings of aspnet webforms have now been Addressed in visual studio 2010 with Aspnet 4.0.- Cleaner html, cleaner control IDs, friendly urls, leaner viewstate etc Why would any one want to implement the mighty gridview and other wonderful controls from scratch ? In MVC you have to do this yourself because ABSTRACTION IS PLAIN STUPID- Instead of writing loops why not just code using 1s and 0s then? A stateless web is a WEAKNESS so why would anyone want to get rid of viewstate? Everyone would like a better implementation but getting rid of it is a step backwards not forward. Unit testing is great but not a critical requirement for most web projects. I thought inline codes were dead with asp. But now they are back and fashionable - Thanks to MVC. I dont know about you people but codebehind was REVOLUTIONARY. Ive had the Ajax Update panel do so many wonderful things without writing a line of code so why demonise it? Ive succesfully implemented a chat client, IM client and Status bar using nothing but the update panel and script manager.Ofcourse you cant use it for everything but most of the time its appropriate. And finally the last word from JQUERY - 'Write less do more !' - That's what webforms is all about ! So Am I the only who still believes in webforms(and it getting better as Aspnet 4.0 has shown) or will it be dead and gone a few years from now like asp? I mean if inline coding is 'the future' why not just switch to PHP !

    Read the article

  • Using FFmpeg in .net?

    - by daniel
    So i know its a fairly big challenge but i want to write a basic movie player/converter in c# using the FFmpeg library. However the first obstacle i need to overcome is wrapping the FFmpeg library in c#. I downloaded ffmpeg but couldn't compile it on windows, so i downloaded a precompiled version for me. Ok awesome. Then i started looking for c# wrappers. I have looked around and have found a few wrappers such as SharpFFmpeg (http://sourceforge.net/projects/sharpffmpeg/) and ffmpeg-sharp (http://code.google.com/p/ffmpeg-sharp/). First of all i wanted to use ffmpeg-sharp as its LGPL and SharpFFmpeg is GPL. However it had quite a few compile errors. Turns out it was written for the mono compiler, i tried compiling it with mon but couldn't figure out how. I then started to manually fix the compiler errors myself, but came across a few scary ones and thought i'd better leave those alone. So i gave up on ffmpeg-sharp. Then i looked at SharpFFmpeg and it looks like what i want, all the functions P/Invoked for me. However its GPL? Both the AVCodec.cs and AVFormat.cs files look like ports of avcodec.c and avformat.c which i reckon i could port myself? Then not have to worry about licencing. But i wan't to get this right before i go ahead and start coding. Should I: Write my own c++ library for interacting with ffmpeg, then have my c# program talk to the c++ library in order to play/convert videos etc. OR Port avcodec.h and avformat.h (is that all i need?) to c# by using a whole lot of DllImports and write it entirely in c#? First of all consider that i'm not great at c++ as i rarely use it but i know enough to get around. The reason i'm thinking #1 might be the better option is that most FFmpeg tutorials are in c++ and i'd also have more control over memory management than if i was to do it in c#. What do you think? Also would you happen to have any usefull links (perhaps a tutorial) for using FFmpeg? EDIT: spelling mistakes

    Read the article

  • Another design-related C++ question

    - by Kotti
    Hi! I am trying to find some optimal solutions in C++ coding patterns, and this is one of my game engine - related questions. Take a look at the game object declaration (I removed almost everything, that has no connection with the question). // Abstract representation of a game object class Object : public Entity, IRenderable, ISerializable { // Object parameters // Other not really important stuff public: // @note Rendering template will never change while // the object 'lives' Object(RenderTemplate& render_template, /* params */) : /*...*/ { } private: // Object rendering template RenderTemplate render_template; public: /** * Default object render method * Draws rendering template data at (X, Y) with (Width, Height) dimensions * * @note If no appropriate rendering method overload is specified * for any derived class, this method is called * * @param Backend & b * @return void * @see */ virtual void Render(Backend& backend) const { // Render sprite from object's // rendering template structure backend.RenderFromTemplate( render_template, x, y, width, height ); } }; Here is also the IRenderable interface declaration: // Objects that can be rendered interface IRenderable { /** * Abstract method to render current object * * @param Backend & b * @return void * @see */ virtual void Render(Backend& b) const = 0; } and a sample of a real object that is derived from Object (with severe simplifications :) // Ball object class Ball : public Object { // Ball params public: virtual void Render(Backend& b) const { b.RenderEllipse(/*params*/); } }; What I wanted to get is the ability to have some sort of standard function, that would draw sprite for an object (this is Object::Render) if there is no appropriate overload. So, one can have objects without Render(...) method, and if you try to render them, this default sprite-rendering stuff is invoked. And, one can have specialized objects, that define their own way of being rendered. I think, this way of doing things is quite good, but what I can't figure out - is there any way to split the objects' "normal" methods (like Resize(...) or Rotate(...)) implementation from their rendering implementation? Because if everything is done the way described earlier, a common .cpp file, that implements any type of object would generally mix the Resize(...), etc methods implementation and this virtual Render(...) method and this seems to be a mess. I actually want to have rendering procedures for the objects in one place and their "logic implementation" - in another. Is there a way this can be done (maybe alternative pattern or trick or hint) or this is where all this polymorphic and virtual stuff sucks in terms of code placement?

    Read the article

  • Visual Web Developer 2008 Express and XHTML 1.1 (not applying)

    - by Mike Valeriano
    Hello. I may be missing something since I'm not used to work with IDEs for web development, so please understand if I'm doing something stupid. I've picked a copy of Beginning ASP.NET 3.5 in C# and VB to try and learn this thing quickly. I'm no HTML guru, but I know something about the DTDs and I want to either use XHTML 1.0 Strict or XHTML 1.1 while learning the ins and outs of ASP.NET. Following the book (just started actually), I'm not having any trouble understanding the concepts, since I have a C# background, but what I don't understand is how VWD goes about applying the schema you select for validation. The book explains that the drop-down list on the toolbar is enough to set this, so that's what I did: I've selected XHTML 1.1 (since there is no 1.0 strict option, what I find really odd) and then started the project. The thing is that the code generated automatically for the Default page had the XHTML 1.0 transitional DTD. Every other page added had the same DTD, even though XHTML 1.1 is still the selected schema in the drop-down list. I decided to test it out and there it was: inserting some text in the design view and then applying a bold formatting just inserted < b tags on the code. Changing color does add a span tag with added CSS though. What I want to know is: if I want strict XHTML (or 1.1) should I just code it manually? Or is there a "fix" for this problem (having a schema selected but not applied by the IDE)? Am I missing something really easy and dumb? I have no problem with coding it manually - I actually prefer doing that. The thing is that I really wanna try this WYSIWYG approach for once, since some developers swear by it. Plus I don't know yet if I'll need special tags for ASP.NET, so I think having them inserted automatically should be the best practice. I couldn't find any similar problems around (MSDN, Google, here), and as far as the book goes (up to the point I've read), this is the expected VWD behavior, which I find suspect at least. Sorry for the long text (and poor English - not my primary language). Thanks in advance for any help/tips.

    Read the article

  • C# Nested Property Accessing overloading OR Sequential Operator Overloading

    - by Tim
    Hey, I've been searching around for a solution to a tricky problem we're having with our code base. To start, our code resembles the following: class User { int id; int accountId; Account account { get { return Account.Get(accountId); } } } class Account { int accountId; OnlinePresence Presence { get { return OnlinePresence.Get(accountId); } } public static Account Get(int accountId) { // hits a database and gets back our object. } } class OnlinePresence { int accountId; bool isOnline; public static OnlinePresence Get(int accountId) { // hits a database and gets back our object. } } What we're often doing in our code is trying to access the account Presence of a user by doing var presence = user.Account.Presence; The problem with this is that this is actually making two requests to the database. One to get the Account object, and then one to get the Presence object. We could easily knock this down to one request if we did the following : var presence = UserPresence.Get(user.id); This works, but sort of requires developers to have an understanding of the UserPresence class/methods that would be nice to eliminate. I've thought of a couple of cool ways to be able to handle this problem, and was wondering if anyone knows if these are possible, if there are other ways of handling this, or if we just need to think more as we're coding and do the UserPresence.Get instead of using properties. Overload nested accessors. It would be cool if inside the User class I could write some sort of "extension" that would say "any time a User object's Account property's Presence object is being accessed, do this instead". Overload the . operator with knowledge of what comes after. If I could somehow overload the . operator only in situations where the object on the right is also being "dotted" it would be great. Both of these seem like things that could be handled at compile time, but perhaps I'm missing something (would reflection make this difficult?). Am I looking at things completely incorrectly? Is there a way of enforcing this that removes the burden from the user of the business logic? Thanks! Tim

    Read the article

  • detect face from image and crop face from that photo

    - by Siddhpura Amit
    I have done coding in that i am successfully getting face with rectangle drawing now i want to crop that rectangle area. if there are many rectangle( mean many faces) than user can select one of the face or rectangle and that rectangle areal should be cropped can any body help me... Below is my code class AndroidFaceDetector extends Activity { public String path; @Override public void onCreate(Bundle savedInstanceState) { super.onCreate(savedInstanceState); Bundle bundle = this.getIntent().getExtras(); path = bundle.getString("mypath"); setContentView(new myView(this)); } class myView extends View { private int imageWidth, imageHeight; private int numberOfFace = 5; private FaceDetector myFaceDetect; private FaceDetector.Face[] myFace; float myEyesDistance; int numberOfFaceDetected; Bitmap myBitmap; public myView(Context context) { super(context); System.out.println("CONSTRUCTOR"); System.out.println("path = "+path); if (path != null) { BitmapFactory.Options BitmapFactoryOptionsbfo = new BitmapFactory.Options(); BitmapFactoryOptionsbfo.inPreferredConfig = Bitmap.Config.RGB_565; myBitmap = BitmapFactory.decodeFile(path, BitmapFactoryOptionsbfo); imageWidth = myBitmap.getWidth(); imageHeight = myBitmap.getHeight(); myFace = new FaceDetector.Face[numberOfFace]; myFaceDetect = new FaceDetector(imageWidth, imageHeight, numberOfFace); numberOfFaceDetected = myFaceDetect.findFaces(myBitmap, myFace); } else { Toast.makeText(AndroidFaceDetector.this, "Please Try again", Toast.LENGTH_SHORT).show(); } } @Override protected void onDraw(Canvas canvas) { System.out.println("ON DRAW IS CALLED"); if (myBitmap != null) { canvas.drawBitmap(myBitmap, 0, 0, null); Paint myPaint = new Paint(); myPaint.setColor(Color.GREEN); myPaint.setStyle(Paint.Style.STROKE); myPaint.setStrokeWidth(3); for (int i = 0; i < numberOfFaceDetected; i++) { Face face = myFace[i]; PointF myMidPoint = new PointF(); face.getMidPoint(myMidPoint); myEyesDistance = face.eyesDistance(); canvas.drawRect((int) (myMidPoint.x - myEyesDistance), (int) (myMidPoint.y - myEyesDistance), (int) (myMidPoint.x + myEyesDistance), (int) (myMidPoint.y + myEyesDistance), myPaint); } } } } }

    Read the article

  • How to Practice Unix Programming in C?

    - by danben
    After five years of professional Java (and to a lesser extent, Python) programming and slowly feeling my CS education slip away, I decided I wanted to broaden my horizons / general usefulness to the world and do something that feels more (to me) like I really have an influence over the machine. I chose to learn C and Unix programming since I feel like that is where many of the most interesting problems are. My end goal is to be able to do this professionally, if for no other reason than the fact that I have to spend 40-50 hours per week on work that pays the bills, so it may as well also be the type of coding I want to get better at. Of course, you don't get hired to do things you haven't dont before, so for now I am ramping up on my own. To this end, I started with K&R, which was a great resource in part due to the exercises spread throughout each chapter. After that I moved on to Computer Systems: A Programmer's Perspective, followed by ten chapters of Advanced Programming in the Unix Environment. When I am done with this book, I will read Unix Network Programming. What I'm missing in the Stevens books is the lack of programming problems; they mainly document functionality and provide examples, with a few end-of-chapter questions following. I feel that I would benefit much more from being challenged to use the knowledge in each chapter ala K&R. I could write some test program for each function, but this is a less desirable method as (1) I would probably be less motivated than if I were rising to some external challenge, and (2) I will naturally only think to use the function in the ways that have already occurred to me. So, I'd like to get some recommendations on how to practice. Obviously, my first choice would be to find some resource that has Unix programming challenges. I have also considered finding and attempting to contribute to some open source C project, but this is a bit daunting as there would be some overhead in learning to use the software, then learning the codebase. The only open-source C project I can think of that I use regularly is Python, and I'm not sure how easy that would be to get started on. That said, I'm open to all kinds of suggestions as there are likely things I haven't even thought of.

    Read the article

  • Windows Service HTTPListener Memory Issue

    - by crawshaws
    Hi all, Im a complete novice to the "best practices" etc of writing in any code. I tend to just write it an if it works, why fix it. Well, this way of working is landing me in some hot water. I am writing a simple windows service to server a single webpage. (This service will be incorperated in to another project which monitors the services and some folders on a group of servers.) My problem is that whenever a request is recieved, the memory usage jumps up by a few K per request and keeps qoing up on every request. Now ive found that by putting GC.Collect in the mix it stops at a certain number but im sure its not meant to be used this way. I was wondering if i am missing something or not doing something i should to free up memory. Here is the code: Public Class SimpleWebService : Inherits ServiceBase 'Set the values for the different event log types. Public Const EVENT_ERROR As Integer = 1 Public Const EVENT_WARNING As Integer = 2 Public Const EVENT_INFORMATION As Integer = 4 Public listenerThread As Thread Dim HTTPListner As HttpListener Dim blnKeepAlive As Boolean = True Shared Sub Main() Dim ServicesToRun As ServiceBase() ServicesToRun = New ServiceBase() {New SimpleWebService()} ServiceBase.Run(ServicesToRun) End Sub Protected Overrides Sub OnStart(ByVal args As String()) If Not HttpListener.IsSupported Then CreateEventLogEntry("Windows XP SP2, Server 2003, or higher is required to " & "use the HttpListener class.") Me.Stop() End If Try listenerThread = New Thread(AddressOf ListenForConnections) listenerThread.Start() Catch ex As Exception CreateEventLogEntry(ex.Message) End Try End Sub Protected Overrides Sub OnStop() blnKeepAlive = False End Sub Private Sub CreateEventLogEntry(ByRef strEventContent As String) Dim sSource As String Dim sLog As String sSource = "Service1" sLog = "Application" If Not EventLog.SourceExists(sSource) Then EventLog.CreateEventSource(sSource, sLog) End If Dim ELog As New EventLog(sLog, ".", sSource) ELog.WriteEntry(strEventContent) End Sub Public Sub ListenForConnections() HTTPListner = New HttpListener HTTPListner.Prefixes.Add("http://*:1986/") HTTPListner.Start() Do While blnKeepAlive Dim ctx As HttpListenerContext = HTTPListner.GetContext() Dim HandlerThread As Thread = New Thread(AddressOf ProcessRequest) HandlerThread.Start(ctx) HandlerThread = Nothing Loop HTTPListner.Stop() End Sub Private Sub ProcessRequest(ByVal ctx As HttpListenerContext) Dim sb As StringBuilder = New StringBuilder sb.Append("<html><body><h1>Test My Service</h1>") sb.Append("</body></html>") Dim buffer() As Byte = Encoding.UTF8.GetBytes(sb.ToString) ctx.Response.ContentLength64 = buffer.Length ctx.Response.OutputStream.Write(buffer, 0, buffer.Length) ctx.Response.OutputStream.Close() ctx.Response.Close() sb = Nothing buffer = Nothing ctx = Nothing 'This line seems to keep the mem leak down 'System.GC.Collect() End Sub End Class Please feel free to critisise and tear the code apart but please BE KIND. I have admitted I dont tend to follow the best practice when it comes to coding.

    Read the article

  • How would you organize a large complex web application (see basic example)?

    - by Anurag
    How do you usually organize complex web applications that are extremely rich on the client side. I have created a contrived example to indicate the kind of mess it's easy to get into if things are not managed well for big apps. Feel free to modify/extend this example as you wish - http://jsfiddle.net/NHyLC/1/ The example basically mirrors part of the comment posting on SO, and follows the following rules: Must have 15 characters minimum, after multiple spaces are trimmed out to one. If Add Comment is clicked, but the size is less than 15 after removing multiple spaces, then show a popup with the error. Indicate amount of characters remaining and summarize with color coding. Gray indicates a small comment, brown indicates a medium comment, orange a large comment, and red a comment overflow. One comment can only be submitted every 15 seconds. If comment is submitted too soon, show a popup with appropriate error message. A couple of issues I noticed with this example. This should ideally be a widget or some sort of packaged functionality. Things like a comment per 15 seconds, and minimum 15 character comment belong to some application wide policies rather than being embedded inside each widget. Too many hard-coded values. No code organization. Model, Views, Controllers are all bundled together. Not that MVC is the only approach for organizing rich client side web applications, but there is none in this example. How would you go about cleaning this up? Applying a little MVC/MVP along the way? Here's some of the relevant functions, but it will make more sense if you saw the entire code on jsfiddle: /** * Handle comment change. * Update character count. * Indicate progress */ function handleCommentUpdate(comment) { var status = $('.comment-status'); status.text(getStatusText(comment)); status.removeClass('mild spicy hot sizzling'); status.addClass(getStatusClass(comment)); } /** * Is the comment valid for submission */ function commentSubmittable(comment) { var notTooSoon = !isTooSoon(); var notEmpty = !isEmpty(comment); var hasEnoughCharacters = !isTooShort(comment); return notTooSoon && notEmpty && hasEnoughCharacters; } // submit comment $('.add-comment').click(function() { var comment = $('.comment-box').val(); // submit comment, fake ajax call if(commentSubmittable(comment)) { .. } // show a popup if comment is mostly spaces if(isTooShort(comment)) { if(comment.length < 15) { // blink status message } else { popup("Comment must be at least 15 characters in length."); } } // show a popup is comment submitted too soon else if(isTooSoon()) { popup("Only 1 comment allowed per 15 seconds."); } });

    Read the article

  • iGoogle Stack Overflow Gadget [closed]

    - by Charango
    Since SO's question database is becoming an excellent first point of call for finding answers to coding problems, some people (like me) might like to be able to fire up searches from their iGoogle home page, among the other searches you might launch from there. I've created a very simple gadget to do this, and put the source below. My hope is that this might provide a foundation on which community members with better ideas for performing this function, or ideas for enhancing it, can update it. Perhaps we could make it configurable to search via a site scope search from Google and / or a question search, for instance. I've hosted and registered this first version but if anyone makes changes and can host a new version / new pics elsewhere, please feel free. <?xml version="1.0" encoding="UTF-8"?> <Module> <ModulePrefs title="Stack Overflow Search" author="Community Wiki" author_email="[email protected]" author_affiliation="Stack Overflow" author_location="The Internet" author_aboutme="All sorts" author_link="http://stackoverflow.com" author_quote="Hofstadter's Law: It always takes longer than you expect, even when you take into account Hofstadter's Law." description="Stack Overflow is rapidly becoming one of the best resources for finding answers to your programming questions. This gadget adds a question search box to your iGoogle homepage" screenshot="http://arkios-solutions.com/misc/sogadget/SOGadget.png" thumbnail="http://arkios-solutions.com/misc/sogadget/SOGadget_Thumbnail.png" singleton="true" title_url="http://stackoverflow.com" /> <Content type="html"><![CDATA[ <img src="http://i.stackoverflow.com/Content/Img/stackoverflow-logo-250.png" alt="Stackoverflow Logo"/> <div style="font-family:arial;font-size:0.8em;"> This gadget allows you to search the Stack Overflow question database. </div> <form name='SOQueryForm' action="http://stackoverflow.com/search" method="get" target="new"> <p>Your question: <input type='text' size='34' name='q' /></p> <p><input type='submit' value='Go' /></p> </form> ]]> </Content> </Module> The source / installable gadget xml are also hosted here.

    Read the article

  • GOTO still considered harmful?

    - by Kyle Cronin
    Everyone is aware of Dijkstra's Letters to the editor: go to statement considered harmful (also here .html transcript and here .pdf) and there has been a formidable push since that time to eschew the goto statement whenever possible. While it's possible to use goto to produce unmaintainable, sprawling code, it nevertheless remains in modern programming languages. Even the advanced continuation control structure in Scheme can be described as a sophisticated goto. What circumstances warrant the use of goto? When is it best to avoid? As a followup question: C provides a pair of functions, setjmp and longjmp, that provide the ability to goto not just within the current stack frame but within any of the calling frames. Should these be considered as dangerous as goto? More dangerous? Dijkstra himself regretted that title, of which he was not responsible for. At the end of EWD1308 (also here .pdf) he wrote: Finally a short story for the record. In 1968, the Communications of the ACM published a text of mine under the title "The goto statement considered harmful", which in later years would be most frequently referenced, regrettably, however, often by authors who had seen no more of it than its title, which became a cornerstone of my fame by becoming a template: we would see all sorts of articles under the title "X considered harmful" for almost any X, including one titled "Dijkstra considered harmful". But what had happened? I had submitted a paper under the title "A case against the goto statement", which, in order to speed up its publication, the editor had changed into a "letter to the Editor", and in the process he had given it a new title of his own invention! The editor was Niklaus Wirth. A well thought out classic paper about this topic, to be matched to that of Dijkstra, is Structured Programming with go to Statements (also here .pdf), by Donald E. Knuth. Reading both helps to reestablish context and a non-dogmatic understanding of the subject. In this paper, Dijkstra's opinion on this case is reported and is even more strong: Donald E. Knuth: I believe that by presenting such a view I am not in fact disagreeing sharply with Dijkstra's ideas, since he recently wrote the following: "Please don't fall into the trap of believing that I am terribly dogmatical about [the go to statement]. I have the uncomfortable feeling that others are making a religion out of it, as if the conceptual problems of programming could be solved by a single trick, by a simple form of coding discipline!"

    Read the article

  • Minimalistic tools for developer documentation

    - by Pekka
    I am currently working on a large PHP CMS / Framework and documenting it extensively as I go along. In addition to phpdoc-style inline comments, I need to document XML structures, details on concepts and practices, write HOWTOs and so on. At the moment, I am using simple OpenOffice documents for that, but I'm unhappy with it and looking for a "real" documentation system. So, I am looking for recommendations for robust, minimalistic, easy-to-use documentation software. I have tried a number of Wikis, most prominently Dokuwiki. I like the open-minded approach, the freedom in editing, and the simplicity, but they provide little support in structuring a multi-chapter documentation, and make basic reorganisation tasks very difficult (e.g. moving pages to a different namespace). Working with the plugins is Cumbersome, and they are not really easy to use. Open Source would be a plus but is not a requirement. Thanks for all the suggestions. I have not had time to look into each one in detail. I will be trying Sphinx, especially because it provides so much support for a good structure. I may update this post later when I'm done and report how it worked out. The suggestions Trac's built-in wiki which is great but for my taste provides too little support for keeping a structure - it's perfect though for "normal", smaller size project documentation Markdown my current favourite because of its minimalism, however not sure yet whether maintaining a structure will be easy enough. A Markdown-Based system would of course be very easy to extend, e.g. to look up cross references from the project's code base. Of course it would be great to find something that already has that out of the box. The DocBook format and to edit, the commercial Oxygen XML Editor - a great standard for building documentation, no doubt. Maybe too "technical" for my purposes as I need something to open quickly, write into and go on coding. Still always worth a mention. Sphinx an Open Source, Python based documentation generator, promising structured documentation and extensive cross-referencing. Interesting and will take a look. Confluence a commercial but very affordable Wiki. XWiki, an Open Source playing in Confluence's league with numerous extensions and connectors to Eclipse and Microsoft Office. TiddlyWiki an open-source Wiki.

    Read the article

  • Drawing Bresenham’s Line- Algorithm in all quadrants

    - by Yoyo2965259
    I am newbie for OpenGL. I am practicing the exercises from my textbook but I could not get the outputs which is should be in Bresenham's Line Algorithm in all quadrants. Here's the coding: #include <Windows.h> #include <GL/glut.h> void init(void) { glClearColor(0.0, 0.0, 0.0, 0.0); glShadeModel(GL_FLAT); } void BresnCir(void) { int delta, deltadash; glClear(GL_COLOR_BUFFER_BIT); glPointSize(3.0); int r = 150; int x = 0; int y = r; int D = 2 * (1 - r); glBegin(GL_POINTS); do { glVertex2i(x, y); if (D < 0) { delta = 2 * D + 2 * y - 1; if (delta <= 0) { x++; Right(x); } else { x++; y--; Diagonal(x, y); } glVertex2i(x, y); } else { deltadash = 2 * D - 2 * x - 1; if (deltadash <= 0) { x++; y--; Diagonal(x, y); } else { y--; Down(y); } glVertex2i(x, y); } if (D == 0) { x++; y--; Diagonal(x, y); glVertex2i(x, y); } } while (y > 0); glEnd(); glFlush(); } int main(int argc, char** argv) { glutInit(&argc, argv); glutInitDisplayMode(GLUT_SINGLE | GLUT_RGB); glutInitWindowSize(400, 150); glutInitWindowPosition(100, 100); glutCreateWindow(argv[0]); init(); glutDisplayFunc(BresnCir); glutMainLoop(); return 0; } But, it keep comes out with errors C3861.

    Read the article

  • Problem with signal handlers being called too many times [closed]

    - by Hristo
    how can something print 3 times when it only goes the printing code twice? I'm coding in C and the code is in a SIGCHLD signal handler I created. void chld_signalHandler() { int pidadf = (int) getpid(); printf("pidafdfaddf: %d\n", pidadf); while (1) { int termChildPID = waitpid(-1, NULL, WNOHANG); if (termChildPID == 0 || termChildPID == -1) { break; } dll_node_t *temp = head; while (temp != NULL) { printf("stuff\n"); if (temp->pid == termChildPID && temp->type == WORK) { printf("inside if\n"); // read memory mapped file b/w WORKER and MAIN // get statistics and write results to pipe char resultString[256]; // printing TIME int i; for (i = 0; i < 24; i++) { sprintf(resultString, "TIME; %d ; %d ; %d ; %s\n",i,1,2,temp->stats->mboxFileName); fwrite(resultString, strlen(resultString), 1, pipeFD); } remove_node(temp); break; } temp = temp->next; } printf("done printing from sigchld \n"); } return; } the output for my MAIN process is this: MAIN PROCESS 16214 created WORKER PROCESS 16220 for file class.sp10.cs241.mbox pidafdfaddf: 16214 stuff stuff inside if done printing from sigchld MAIN PROCESS 16214 created WORKER PROCESS 16221 for file class.sp10.cs225.mbox pidafdfaddf: 16214 stuff stuff inside if done printing from sigchld and the output for the MONITOR process is this: MONITOR: pipe is open for reading MONITOR PIPE: TIME; 0 ; 1 ; 2 ; class.sp10.cs225.mbox MONITOR PIPE: TIME; 0 ; 1 ; 2 ; class.sp10.cs225.mbox MONITOR PIPE: TIME; 0 ; 1 ; 2 ; class.sp10.cs241.mbox MONITOR: end of readpipe ( I've taken out repeating lines so I don't take up so much space ) Thanks, Hristo

    Read the article

  • How should I provide access to this custom DAL?

    - by Casey
    I'm writing a custom DAL (VB.NET) for an ordering system project. I'd like to explain how it is coded now, and receive some alternate ideas to make coding against the DAL easier/more readable. The DAL is part of an n-tier (not n-layer) application, where each tier is in it's own assembly/DLL. The DAL consists of several classes that have specific behavior. For instance, there is an Order class that is responsible for retrieving and saving orders. Most of the classes have only two methods, a "Get" and a "Save," with multiple overloads for each. These classes are marked as Friend and are only visible to the DAL (which is in it's own assembly). In most cases, the DAL returns what I will call a "Data Object." This object is a class that contains only data and validation, and is located in a common assembly that both the BLL and DAL can read. To provide public access to the DAL, I currently have a static (module) class that has many shared members. A simplified version looks something like this: Public Class DAL Private Sub New End Sub Public Shared Function GetOrder(OrderID as String) as OrderData Dim OrderGetter as New OrderClass Return OrderGetter.GetOrder(OrderID) End Function End Class Friend Class OrderClass Friend Function GetOrder(OrderID as string) as OrderData End Function End Class The BLL would call for an order like this: DAL.GetOrder("123456") As you can imagine, this gets cumbersome very quickly. I'm mainly interested in structuring access to the DAL so that Intellisense is very intuitive. As it stands now, there are too many methods/functions in the DAL class with similar names. One idea I had is to break down the DAL into nested classes: Public Class DAL Private Sub New End Sub Public Class Orders Private Sub New End Sub Public Shared Function Get(OrderID as string) as OrderData End Function End Class End Class So the BLL would call like this: DAL.Orders.Get("12345") This cleans it up a bit, but it leaves a lot of classes that only have references to other classes, which I don't like for some reason. Without resorting to passing DB specific instructions (like where clauses) from BLL to DAL, what is the best or most common practice for providing a single point of access for the DAL?

    Read the article

  • How to make efficient code emerge through unit testing

    - by Jean
    Hi, I participate in a TDD Coding Dojo, where we try to practice pure TDD on simple problems. It occured to me however that the code which emerges from the unit tests isn't the most efficient. Now this is fine most of the time, but what if the code usage grows so that efficiency becomes a problem. I love the way the code emerges from unit testing, but is it possible to make the efficiency property emerge through further tests ? Here is a trivial example in ruby: prime factorization. I followed a pure TDD approach making the tests pass one after the other validating my original acceptance test (commented at the bottom). What further steps could I take, if I wanted to make one of the generic prime factorization algorithms emerge ? To reduce the problem domain, let's say I want to get a quadratic sieve implementation ... Now in this precise case I know the "optimal algorithm, but in most cases, the client will simply add a requirement that the feature runs in less than "x" time for a given environment. require 'shoulda' require 'lib/prime' class MathTest < Test::Unit::TestCase context "The math module" do should "have a method to get primes" do assert Math.respond_to? 'primes' end end context "The primes method of Math" do should "return [] for 0" do assert_equal [], Math.primes(0) end should "return [1] for 1 " do assert_equal [1], Math.primes(1) end should "return [1,2] for 2" do assert_equal [1,2], Math.primes(2) end should "return [1,3] for 3" do assert_equal [1,3], Math.primes(3) end should "return [1,2] for 4" do assert_equal [1,2,2], Math.primes(4) end should "return [1,5] for 5" do assert_equal [1,5], Math.primes(5) end should "return [1,2,3] for 6" do assert_equal [1,2,3], Math.primes(6) end should "return [1,3] for 9" do assert_equal [1,3,3], Math.primes(9) end should "return [1,2,5] for 10" do assert_equal [1,2,5], Math.primes(10) end end # context "Functionnal Acceptance test 1" do # context "the prime factors of 14101980 are 1,2,2,3,5,61,3853"do # should "return [1,2,3,5,61,3853] for ${14101980*14101980}" do # assert_equal [1,2,2,3,5,61,3853], Math.primes(14101980*14101980) # end # end # end end and the naive algorithm I created by this approach module Math def self.primes(n) if n==0 return [] else primes=[1] for i in 2..n do if n%i==0 while(n%i==0) primes<<i n=n/i end end end primes end end end

    Read the article

  • Problem with signal handlers

    - by Hristo
    how can something print 3 times when it only goes the printing code twice? I'm coding in C and the code is in a SIGCHLD signal handler I created. void chld_signalHandler() { int pidadf = (int) getpid(); printf("pidafdfaddf: %d\n", pidadf); while (1) { int termChildPID = waitpid(-1, NULL, WNOHANG); if (termChildPID == 0 || termChildPID == -1) { break; } dll_node_t *temp = head; while (temp != NULL) { printf("stuff\n"); if (temp->pid == termChildPID && temp->type == WORK) { printf("inside if\n"); // read memory mapped file b/w WORKER and MAIN // get statistics and write results to pipe char resultString[256]; // printing TIME int i; for (i = 0; i < 24; i++) { sprintf(resultString, "TIME; %d ; %d ; %d ; %s\n",i,1,2,temp->stats->mboxFileName); fwrite(resultString, strlen(resultString), 1, pipeFD); } remove_node(temp); break; } temp = temp->next; } printf("done printing from sigchld \n"); } return; } the output for my MAIN process is this: MAIN PROCESS 16214 created WORKER PROCESS 16220 for file class.sp10.cs241.mbox pidafdfaddf: 16214 stuff stuff inside if done printing from sigchld MAIN PROCESS 16214 created WORKER PROCESS 16221 for file class.sp10.cs225.mbox pidafdfaddf: 16214 stuff stuff inside if done printing from sigchld and the output for the MONITOR process is this: MONITOR: pipe is open for reading MONITOR PIPE: TIME; 0 ; 1 ; 2 ; class.sp10.cs225.mbox MONITOR PIPE: TIME; 0 ; 1 ; 2 ; class.sp10.cs225.mbox MONITOR PIPE: TIME; 0 ; 1 ; 2 ; class.sp10.cs241.mbox MONITOR: end of readpipe ( I've taken out repeating lines so I don't take up so much space ) Thanks, Hristo

    Read the article

  • NoSQL DB for .Net document-based database (ECM)

    - by Dane
    I'm halfway through coding a basic multi-tenant SaaS ECM solution. Each client has it's own instance of the database / datastore, but the .Net app is single instance. The documents are pretty much read only (i.e. an image archive of tiffs or PDFs) I've used MSSQL so far, but then started thinking this might be viable in a NoSQL DB (e.g. MongoDB, CouchDB). The basic premise is that it stores documents, each with their own particular indexes. Each tenant can have multiple document types. e.g. One tenant might have an invoice type, which has Customer ID, Invoice Number and Invoice Date. Another tenant might have an application form, which has Member Number, Application Number, Member Name, and Application Date. So far I've used the old method which Sharepoint (used?) to use, and created a document table which has int_field_1, int_field_2, date_field_1, date_field_2, etc. Then, I've got a "mapping" table which stores the customer specific index name, and the database field that will map to. I've avoided the key-value pair model in the DB due to volume of documents. This way, we can support multiple document types in the one table, and get reasonably high performance out of it, and allow for custom document type searches (i.e. user selects a document type, then they're presented with a list of search fields). However, a NoSQL DB might make this a lot simpler, as I don't need to worry about denormalizing the document. However, I've just got concerns about the rest of the data around a document. We store an "action history" against the document. This tracks views, whether someone emails the document from within the system, and other "future" functionality (e.g. faxing). We have control over the document load process, so we can manipulate the data however it needs to be to get it in the document store (e.g. assign unique IDs). Users will not be adding in their own documents, so we shouldn't need to worry about ACID compliance, as the documents are relatively static. So, my questions I guess : Is a NoSQL DB a good fit Is MongoDB the best for Asp.Net (I saw Raven and Velocity, but they're still kinda beta) Can I store a key for each document, and then store the action history in a MSSQL DB with this key? I don't need to do joins, it would be if a person clicks "View History" against a document. How would performance compare between the two (NoSQL DB vs denormalized "document" table) Volumes would be up to 200,000 new documents per month for a single tenant. My current scaling plan with the SQL DB involves moving the SQL DB into a cluster when certain thresholds are reached, and then reviewing partitioning and indexing structures.

    Read the article

  • Serializing Class Derived from Generic Collection yet Deserializing the Generic Collection

    - by Stacey
    I have a Repository Class with the following method... public T Single<T>(Predicate<T> expression) { using (var list = (Models.Collectable<T>)System.Xml.Serializer.Deserialize(typeof(Models.Collectable<T>), FileName)) { return list.Find(expression); } } Where Collectable is defined.. [Serializable] public class Collectable<T> : List<T>, IDisposable { public Collectable() { } public void Dispose() { } } And an Item that uses it is defined.. [Serializable] [System.Xml.Serialization.XmlRoot("Titles")] public partial class Titles : Collectable<Title> { } The problem is when I call the method, it expects "Collectable" to be the XmlRoot, but the XmlRoot is "Titles" (all of object Title). I have several classes that are collected in .xml files like this, but it seems pointless to rewrite the basic methods for loading each up when the generic accessors do it - but how can I enforce the proper root name for each file without hard coding methods for each one? The [System.Xml.Serialization.XmlRoot] seems to be ignored. When called like this... var titles = Repository.List<Models.Title>(); I get the exception <Titlesxmlns=''> was not expected. The XML is formatted such as. .. <?xml version="1.0" encoding="utf-16"?> <Titles xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema"> <Title> <Id>442daf7d-193c-4da8-be0b-417cec9dc1c5</Id> </Title> </Titles> Here is the deserialization code. public static T Deserialize<T>(String xmlString) { System.Xml.Serialization.XmlSerializer XmlFormatSerializer = new System.Xml.Serialization.XmlSerializer(typeof(T)); StreamReader XmlStringReader = new StreamReader(xmlString); //XmlTextReader XmlFormatReader = new XmlTextReader(XmlStringReader); try { return (T)XmlFormatSerializer.Deserialize(XmlStringReader); } catch (Exception e) { throw e; } finally { XmlStringReader.Close(); } }

    Read the article

  • Do you still limit line length in code?

    - by Noldorin
    This is a matter on which I would like to gauge the opinion of the community: Do you still limit the length of lines of code to a fixed maximum? This was certainly a convention of the past for many languages; one would typically cap the number of characters per line to a value such as 80 (and more recnetly 100 or 120 I believe). As far as I understand, the primary reasons for limiting line length are: Readability - You don't have to scroll over horizontally when you want to see the end of some lines. Printing - Admittedly (at least in my experience), most code that you are working on does not get printed out on paper, but by limiting the number of characters you can insure that formatting doesn't get messed up when printed. Past editors (?) - Not sure about this one, but I suspect that at some point in the distant past of programming, (at least some) text editors may have been based on a fixed-width buffer. I'm sure there are points that I am still missing out, so feel free to add to these... Now, when I tend to observe C or C# code nowadays, I often see a number of different styles, the main ones being: Line length capped to 80, 100, or even 120 characters. As far as I understand, 80 is the traditional length, but the longer ones of 100 and 120 have appeared because of the widespread use of high resolutions and widescreen monitors nowadays. No line length capping at all. This tends to be pretty horrible to read, and I don't see it too often, though it's certainly not too rare either. Inconsistent capping of line length. The length of some lines are limited to a fixed maximum (or even a maximum that changes depending on the file/location in code), while others (possibly comments) are not at all. My personal preference here (at least recently) has been to cap the line length to 100 in the Visual Studio editor. This means that in a decently sized window (on a non-widescreen monitor), the ends of lines are still fully visible. I can however see a few disadvantages in this, especially when you end up writing code that's indented 3 or 4 levels and then having to include a long string literal - though I often take this as a sign to refactor my code! In particular, I am curious what the C and C# coders (or anyone who uses Visual Studio for that matter) think about this point, though I would be interested in hearing anyone's thoughts on the subject. Edit Thanks for the all answers - I appreciate the variety of opinions here, all presenting sound reasons. Consensus does seem to be tipping in the direction of always (or almost always) limit the line length. Interestingly, it seems to be in various coding standards to limit the line length. Judging by some of the answers, both the Python and Google CPP guidelines set the limit at 80 chars. I haven't seen anything similar regarding C# or VB.NET, but I would be curious to see if there are ones anywhere.

    Read the article

  • General Questions on setting up Subversion (w/Netbeans) and web development workflow.

    - by Roeland
    Hey guys, I am web developer who currently works mostly on his own but, some projects require outside coding help (my brother.) Anyway, after running in to the problem of "working on the same files" and "saving over each others edits" I decided to research ways to avoid this. Through the help of stackoverflow I decided on subversion. My setup is the following: A windows 2008 server with a clean install. My desktop which is on the local network of the server. Then my brothers desktop which is at his house, not on lan. We both prefer to use netbeans for development. My Questions: How do we set this thing up correctly and most optimally. Here is my current setup and work flow. dev site: in the past I just created sub domains with my web host for test sites (company1.bythepixel.com, company2.bythepixel.com), and editted those sites with netbeans set up having remote sources (ftp). how do i set up my netbeans now? should I set it up with remote sources? I guess I may need to set up a web server on my local server. I'm just not sure of the work flow. When i hit save in netbeans.. will it update the repository.. then do i need to update the site from the repository somehow? live site: when going live, i would just copy all the files from the dev site into the live site. from what i gather i should be able to update the site from the dev repository? Currently I am toying with virtualsvn server (http://www.visualsvn.com/server/) on my local server. It looks like it is set up to use the http protocol. Is there advantages to this or should I consider something that does the file//. Do you recommend any other subversion software that would run on my 2008 box? how will my brother connect? should i set up a permanent vpn from his house to mine? suggestions? (not that important) how do I deal with databases, there anyway to do subversion on database? I know I have a lot of questions and I am trying to read / make sense of the free online subversion book, but its all so overwhelming! Wish there was a small subversion for dummies guide :)

    Read the article

  • Is it normal for a programmer with 2 years experience to take a long time to code simple programs?

    - by ajax81
    Hi all, I'm a relatively new programmer (18 months on the scene), and I'm finally getting to the point where I'm comfortable accepting projects and developing solutions under minimal supervision. Unfortunately, this also means that I've become acutely aware of my performance shortfalls, the most prevalent of which is the amount of time it takes me to develop, test, and submit algorithms for review. A great example of what I'm talking about occurred this week when I was tasked with developing a simple XML web service (asp.net 3.5) callable via client-side JavaScript, that accepts a single parameter and returns a dataset output to a modal window (please note this is the first time I've had to develop a web service and have had ZERO experience creating/consuming them...let alone calling them from JS client side). Keeping a long story short -- I worked on it for 4 days straight, all day each day, for a grand total of 36 hours, not including the time I spent dwelling on the problem in the shower, the morning commute, and laying awake in bed at night. I learned a great deal about web services and xml/json/javascript...but was called in for a management review to discuss the length of time it took me to develop the solution. In the meeting, I was praised for the quality of my work and was in fact told that my effort was commendable. However, they (senior leads and pm's) weren't impressed with the amount of time it took me to develop the solution and expressed that they would have liked to see the solution in roughly 1/3 of the time it took me. I guess what concerns me the most is that I've identified this pattern as common for myself. Between online videos, book research, and trial/error coding...if its something I haven't seen before, I can spend up to two weeks on a problem that seems to only take the pros in the videos moments to code up. And of course, knowing that management isn't happy with this pattern has shaken me up a bit. To sum up, I have some very specific questions I'd like to ask, and would greatly appreciate your objective professional feedback. Is my experience as a junior programmer common among new developers? Or is it possible that I'm just not cut out for the work? If you suspect that my experience is not common and that there may be an aptitude issue, do you have any suggestions/solutions that I could propose to management to help bring me up to speed? Do seasoned, professional programmers ever encounter knowledge barriers that considerably delay deliverables? When you started out in the industry, did you know how to "do it all"? If not, how long did it take you to be perceived as "proficient"? Was it a natural progression of trial and error, or was there a particular zen moment when you knew you had achieved super saiyen power level? Anyways, thanks for taking the time to read my question(s). I don't know if this is the right place to ask for professional career guidance, but I greatly appreciate your willingness to help me out. Cheers, Daniel

    Read the article

  • Fuzzy Regex, Text Processing, Lexical Analysis?

    - by justinzane
    I'm not quite sure what terminology to search for, so my title is funky... Here is the workflow I've got: Semi-structured documents are scanned to file. The files are OCR'd to text. The text is parsed into Python objects The objects are serialized (to SQL, JSON, whatever) for use. The documents are structures like this: HEADER blah blah, Page ### blah Garbage text... 1. Question Text... continued until now. A. Choice text... adsadsf. B. Another Choice... 2. Another Question... I need to extract the questions and choices. The problem is that, because the text is OCR output, there are occasional strange substitutions like '2' - 'Z' which makes ordinary regular expressions useless. I've tried the Levenshtein module and it helps, but it requires prior knowledge of what edit distance is to be expected. I don't know whether I'm looking to create a parser? a lexer? something else? This has lead me down all kinds of interesting but nonrelevant paths. Guidance would be greatly appreciated. Oh, also, the text is generally from specific technical domains, so general spelling tools are not so helpful. Regarding the structure of the documents, there is no clear visual pattern -- like line breaks or indentation -- with the exception of the fact that "questions" usually begin a line. Crap on the document can cause characters to appear before the actual beginning of the line, which means that something along the lines of r'^[0-9]+' does not reliably work. Though the "questions" always begin with an int, a period and a space; the OCR can substitute other characters or skip characters. This is not so much a problem with Tesseract or Cunieform, rather with the poor quality of the paper documents. # Note: for the project in question, it was decided that having a human prep the OCR'd text was better that spending the time coding a solution. I'd still love good pointers, however.

    Read the article

< Previous Page | 167 168 169 170 171 172 173 174 175 176 177 178  | Next Page >