Program structure in long running data processing python script

Posted by fmark on Stack Overflow See other posts from Stack Overflow or by fmark
Published on 2010-05-27T13:11:20Z Indexed on 2010/05/27 15:01 UTC
Read the original article Hit count: 248

For my current job I am writing some long-running (think hours to days) scripts that do CPU intensive data-processing. The program flow is very simple - it proceeds into the main loop, completes the main loop, saves output and terminates: The basic structure of my programs tends to be like so:

<import statements>
<constant declarations>

<misc function declarations>

def main():
   for blah in blahs():
      <lots of local variables>
      <lots of tightly coupled computation>

      for something in somethings():
          <lots more local variables>
          <lots more computation>

   <etc., etc.>

   <save results>

if __name__ == "__main__":
    main()

This gets unmanageable quickly, so I want to refactor it into something more manageable. I want to make this more maintainable, without sacrificing execution speed.

Each chuck of code relies on a large number of variables however, so refactoring parts of the computation out to functions would make parameters list grow out of hand very quickly. Should I put this sort of code into a python class, and change the local variables into class variables? It doesn't make a great deal of sense tp me conceptually to turn the program into a class, as the class would never be reused, and only one instance would ever be created per instance.

What is the best practice structure for this kind of program? I am using python but the question is relatively language-agnostic, assuming a modern object-oriented language features.

© Stack Overflow or respective owner

Related posts about python

Related posts about best-practices