Standard Library Rage Of The Now
Probably one of the most common operations that build/release automation systems do is running other simple commands1, and making sure that they succeed. Sometimes, you end up wanting to parse the output of the command you ran in your native language. Maybe you want to log it to a file, too.
At first, you might think this is an easy problem: most scripting languages give you a multitude of ways to accomplish this seemingly trivial and common operation.
Except… they mostly leave something out. A lot of the most commonly-used methods for this—backticks and system()—really mess it up, save for the most trivial, “No, I don’t actually care if you ran the command I told you to run”-cases.
It turns out that if you want to call lots of commands in robust ways, it’s a somewhat complicated process.2
I’m generally defining “robust” as:
- it handles command arguments safely and securely
- you can pass it a timeout value, and it’ll do the right thing
- if some external force kills your subprocess, it’ll do the right thing, and communicate as much information as possible about what happened
- you can easily get access to standard out and standard error (SEPARATELY3), and manipulate them to your liking
- it has other niceties, like letting you know how long the program took to execute and changing the working directory
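To make that list concrete, here’s roughly the calling convention I have in mind, sketched as a hypothetical helper (the name and return structure are mine alone, and—spoiler for the rest of this post—it leans on subprocess and still punts on the timeout and killed-by-signal requirements):

    import subprocess
    import time

    def run_cmd(argv, cwd=None):
        # Hypothetical helper, not anything the standard library hands you:
        # argv is a list (no shell interpolation), stdout and stderr come
        # back separately, and you get the exit status and elapsed time.
        # Timeouts and "something killed my child" reporting are left as a
        # (nine-month) exercise for the reader.
        start = time.time()
        proc = subprocess.Popen(argv, cwd=cwd,
                                stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        stdout, stderr = proc.communicate()
        return {"returncode": proc.returncode,  # negative if killed by a signal
                "stdout": stdout,
                "stderr": stderr,
                "elapsed": time.time() - start}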
None of Perl’s4 built-in constructs do this for you, which is why there’s a beautiful5 263 lines of Perl I wrote almost two years ago to do this correctly. And it took about nine months to get all the corner cases correct and all of the bugs ironed out.6
Here at the Nest, we mostly use Python for build stuff, so when the need for “running a subprocess and collecting its output” came up, as it invariably does, I thought to myself “Well, with the Python book coming in at almost 1,600 pages, there’s gotta be something that wraps all the ugly, operating system-level (and -specific!) functions to do this!”
It turns out there is: the subprocess module!
At first, I was ecstatic, but as I played around with it more, it turns out that it’s completely and utterly useless for arbitrary data output sizes.
And I don’t mean “arbitrary” in the “gigabytes of output”-sense, but “arbitrary” in the “whatever the size of your kernel’s pipe() buffer happens to be”-sense.
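Concretely, here’s the kind of thing that wedges (a minimal sketch; the child command is just an example that produces more output than a typical POSIX pipe buffer of a few tens of kilobytes will hold):

    import subprocess
    import sys

    # The child writes far more than the pipe can hold; since nothing drains
    # the pipe before wait(), the child blocks in write(), wait() never
    # returns, and parent and child deadlock.
    proc = subprocess.Popen(
        [sys.executable, "-c", "print('x' * 1000000)"],
        stdout=subprocess.PIPE,
    )
    proc.wait()                    # hangs once the pipe fills
    output = proc.stdout.read()    # never reached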
Oh, and no… there isn’t a desire to make the library actually… y’know… useful.
Sigh.
Oh Lazy Web7 please tell me I’m missing something very obvious, and that there is a Python module out there for me.
I have high hopes that this is, indeed, 2008, and I don’t have to implement this myself.
Again.8
‘Cause I don’t wanna.
I really don’t wanna.
________________________________
1 Like “make” and “cp”
2 No pun inten… oh, sure, whatever. Sure. Pun intended.
3 Or not.
4 Let us say…
5 If I do say so myself…
6 Well… not exactly, ’cause I just found a minor bug while perusing it now…
7 Dear Lazy Web: I cannot link to you because you are not defined on Wikipedia or Urban Dictionary. Plz to be fixing so I can has. <3 preed
8 For the fourth time in the third language…
________________________________
Lazyweb (one word)
@<strike>Anonymous</strike> LazyWeb:
I’m obviously too lazy to have researched that further myself.
Thanks.
<3 Preed
Why don’t you just use the perl code? You should be able to call it from Python (all those languages tend to be really good with the interop stuff).
Like popen2?
Dealing with subprocesses is hard, and subprocess.py gets basically everything right.
It’s your code that is wrong. The library does not say that subprocess.PIPE will have an infinite buffer. Just that it creates an OS-level communication channel from which you must read. Your code is just blocked in wait().
Most likely, you want to use .communicate() which multiplexes between waiting for the process to exit and reading output.
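For illustration, a minimal sketch of that pattern (the command is just an example):

    import subprocess

    proc = subprocess.Popen(
        ["ls", "-l", "/tmp"],            # example command
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    # communicate() services both pipes while waiting for the process to
    # exit, so neither pipe can fill up and wedge the child.
    stdout, stderr = proc.communicate()
    print(proc.returncode)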
Re: subprocess blocking on full pipes… er, read from them then?
If you don’t care about real-time interaction, subprocess.Popen.communicate blocks till the process ends then returns (stdout, stderr) – and should handle races and pipe buffer issues for you.
More code samples and fewer sarcy footnotes might make your problem clearer though, could be something else entirely.
Python is awesome!
I’ve run into similar issues as you, and have had to make my own helper functions to do this kind of thing too. Regarding the issue of arbitrary data output sizes, I’ve used the ‘tempfile’ module to create temporary stdout/stderr outputs and read from those. There’s also a python package called ‘processing’ (I’d include a link to it, but I’m afraid my comment would get tagged as spam, so just google for ‘processing python’) which might help out more with this, but I’m not sure.
If you like, I could either send you the sample code I’ve written, or I could whip up a module if one doesn’t already exist, because I think that one really needs to exist.
Oh, another thing that might help you do what you want is SCons, but that’s probably too heavyweight for what you need–it’s a Pythonic replacement for make and autotools.
Another thing in the standard library that might help out is something under the ‘distutils’ package.
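A rough sketch of the tempfile approach I mean (the command is a placeholder, and the with-statement form assumes a reasonably modern Python):

    import subprocess
    import tempfile

    # Send stdout/stderr to temporary files instead of pipes; output size is
    # then limited by disk space, not by the kernel's pipe buffer.
    with tempfile.TemporaryFile() as out, tempfile.TemporaryFile() as err:
        returncode = subprocess.call(["make", "all"], stdout=out, stderr=err)
        out.seek(0)
        err.seek(0)
        stdout_data = out.read()
        stderr_data = err.read()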
@Bill Barry:
Yah, someone also suggested that I could call my perl code from Python via XPCOM!
Again, hoping it’s 2008 here, and I really don’t have to jump through flaming hoops of stack assembly code to do something so simple.
@Ian:
Popen2 (the class, not the function) really isn’t what I want; it requires that I deal with stdin, and since I don’t care about that case in 90% of what I’m doing, it becomes a bucket of frustration.
@Colin:
I understand what’s going on here (I had the pleasure of taking Implementation of Operating Systems too!); I’m just saying it’s annoying.
I’m fully willing to consider (let’s face it: admit) that my code is the problem, and it seems like calling communicate() sort of does what I wanted. Sort of.
I didn’t see the minor bit in the (frankly pretty crappy) documentation for that function which says “Wait for process to terminate.”
It also doesn’t make clear whether or not I need to call communicate() multiple times or risk my buffer filling up again (it does vaguely note: “The data read is buffered in memory, so do not use this method if the data size is large or unlimited.”)
@Anon:
Yes, looks like calling communicate() before wait() might fix my issue. Lamentably, it still doesn’t really meet my requirement(s) (timeouts, etc.), but then… Perl’s standard library offerings didn’t either.
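If I end up having to bolt the timeout on myself, something like this might work (an unverified sketch on my part; Popen.kill() assumes Python 2.6 or newer):

    import subprocess
    import threading

    def run_with_timeout(argv, timeout_secs):
        # Hypothetical helper: kill the child if it runs longer than
        # timeout_secs, while communicate() keeps the pipes drained.
        proc = subprocess.Popen(argv, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        timer = threading.Timer(timeout_secs, proc.kill)
        timer.start()
        try:
            stdout, stderr = proc.communicate()
        finally:
            timer.cancel()
        return proc.returncode, stdout, stderr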
@Atul:
I’ll check that out; it looks like I might have solved my immediate problem, but as I said, it still doesn’t entirely address my requirements.
It’s interesting to note that Perl’s motto is TMTOWTDI, whereas Python’s motto seems to be TMTOMTDI.
I’ve heard of SCons before; I’ve never had a chance to play with it, but any build system framework that doesn’t require you to enclose every variable longer than a single character and has else if’s gets my vote!
Why not just read from stdout optimistically, i.e. while wait: string_io_buffer.write(stdout.read()), or redirect stdout to a file?
Er, while wait: read won’t work, see this bug. You need to read before waiting or use communicate. More info here: http://bugs.python.org/issue1606
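In other words, something like this (the command is just an example):

    import subprocess

    proc = subprocess.Popen(["du", "-a", "/usr"], stdout=subprocess.PIPE)
    # Drain stdout completely first; read() returns once the child closes
    # its end of the pipe, after which wait() can reap the exit status
    # without any risk of the pipe-full deadlock described in issue 1606.
    output = proc.stdout.read()
    returncode = proc.wait()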