There's a difference between code you know you'll be using later and code you're going to throw away. There's also a difference between being lazy and taking the time to follow best practices. Below are two command line programs that do the same thing. One is 3 lines long and the other is about 100. (Of course the example is contrived, and usually the logic will be more than 3 lines. For cat I would usually just use the command line cat.) Because the actual logic portion is so small, the longer program doubles as boilerplate code for starting Python command line programs.

The extra length needs to buy us something. Here are my ideas of what it gives me:

  • Order - I know where items belong
  • Isolated Functionality - No (or limited) execution when importing/reusing
  • Logging - rather than dealing with prints that clutter the screen
  • Testing - I've got doctest built in, but I have a unittest file as well that gives decent coverage
  • Documentation - Illustrates module and function docs as well as doctests
  • Command line docs - use of optparse
  • __main__ - Illustrates use of a main statement (with an argument for reusability)

So, lazyweb: am I a PEP 8-following geek who just complicates things? Should boilerplate code include more? Less?

I'm asking because I'm teaching two intro to Python classes soon, and I thought I might use this as an example since it covers a bunch of ground.

Throwaway Code

import sys

for line in open(sys.argv[1]):
    print line,

More complicated, yet testable, maintainable

#! /usr/bin/env python
"""
canonicalcat is an example of writing a python program that perhaps
you want to distribute or will be maintained over time.  As such one
should add documentation and this is an example of module level
documentation.  If one were to import canonicalcat and type
`help(canonicalcat)` it would spit this out.
"""

# These are the imports from the python stdlib
import sys
import logging
import optparse

# Some like to separate additional libraries from the standard ones

# file meta data
__version__ = "0.1"
__author__ = "matt harrison"
__license__ = "psf"

# Since python doesn't have constants, we can emulate them
# using (naming) conventions.  Put them here

logging.basicConfig(filename=".concat.log", level=logging.DEBUG)

def cat_file(fin, outfile=None):
    """
    Main logic to cat file

    >>> import StringIO
    >>> fout = StringIO.StringIO()
    >>> cat_file("small.txt", fout)
    >>> lines = fout.getvalue()
    >>> lines
    """

    logging.log(logging.DEBUG, "CONCAT: %s"% fin)
    if outfile is None:
        fout = sys.stdout
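    elif isinstance(outfile, str):
        # a filename was given, so open it for writing
        fout = open(outfile, 'w')
    else:
        # otherwise assume a file-like object (e.g. StringIO)
        fout = outfile

    for line in open(fin):
        fout.write(line)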

def _test():
    import doctest
    doctest.testmod()

def main(prog_args=None):
    """
    A main function.  Rather than putting this logic into the if
    __name__ statement below, creating a main function allows other
    programs to use main logic.  It also allows for testing, by
    passing in args rather than monkey patching sys.argv (which won't
    be thread safe).

    >>> main(["", "small.txt"])
    """

    if prog_args is None:
        prog_args = sys.argv

    parser = optparse.OptionParser()
    parser.usage = """A python implementation of 'cat', default use is to
    provide a filename to cat"""
    parser.add_option("-o", "--output-file", dest="fileout",
                      help="specify file to cat to (default is stdout)")
    parser.add_option("-t", "--test", dest="test", action="store_true",
                      help="run doctests")

    opt, args = parser.parse_args(prog_args)

    if args[1:]:
        cat_file(args[1], opt.fileout)

    if opt.test:
        _test()

if __name__ == "__main__":
    # when one "executes" a python program, its __name__ is
    # __main__, otherwise its name will be the module name
    main()
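
The unittest file mentioned in the list of benefits isn't shown here. As a rough sketch of what such a file could contain, the block below is written for Python 3 and carries its own stand-in for cat_file so that it runs by itself. The class name CatFileTest, the temp-file fixture, and the test names are all assumptions for illustration, not the actual test file.

```python
import io
import os
import sys
import tempfile
import unittest


def cat_file(fin, outfile=None):
    """Python 3 stand-in for the cat_file function in the listing above."""
    if outfile is None:
        fout = sys.stdout
    elif isinstance(outfile, str):
        fout = open(outfile, "w")
    else:
        fout = outfile
    with open(fin) as source:
        for line in source:
            fout.write(line)
    if isinstance(outfile, str):
        fout.close()  # only close files this function opened itself


class CatFileTest(unittest.TestCase):  # hypothetical test case
    def setUp(self):
        # Build a throwaway input file so the tests don't depend on small.txt.
        handle, self.path = tempfile.mkstemp(suffix=".txt", text=True)
        with os.fdopen(handle, "w") as tmp:
            tmp.write("first\nsecond\n")

    def tearDown(self):
        os.remove(self.path)

    def test_writes_to_file_like_object(self):
        # Passing a file-like object avoids touching stdout or the disk.
        fout = io.StringIO()
        cat_file(self.path, fout)
        self.assertEqual(fout.getvalue(), "first\nsecond\n")

    def test_writes_to_named_file(self):
        # Passing a string exercises the "open this filename" branch.
        out_path = self.path + ".out"
        try:
            cat_file(self.path, out_path)
            with open(out_path) as result:
                self.assertEqual(result.read(), "first\nsecond\n")
        finally:
            os.remove(out_path)
```

Saved next to the module, a file like this could be run with `python -m unittest`. Because cat_file accepts a file-like object, the first test never has to patch sys.stdout, which is the same reasoning the main docstring gives for passing in args instead of monkey patching sys.argv.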