How should I use grep from python?

Matthew Wilson

I'm writing a command-line application and I want to search through lots
of text files for a string. Instead of writing the python code to do
this, I want to use grep.

This is the command I want to run:

$ grep -l foo dir

In other words, I want to list all files in the directory dir that
contain the string "foo".

I'm looking for the "one obvious way to do it" and instead I found no
consensus. I could os.popen, commands.getstatusoutput, the subprocess
module, backticks, etc.

As of May 2009, what is the recommended way to run an external process
like grep and capture STDOUT and the error code?


TIA

Matt
 
Diez B. Roggisch

Matthew said:
I'm writing a command-line application and I want to search through lots
of text files for a string. Instead of writing the python code to do
this, I want to use grep.

This is the command I want to run:

$ grep -l foo dir

In other words, I want to list all files in the directory dir that
contain the string "foo".

I'm looking for the "one obvious way to do it" and instead I found no
consensus. I could os.popen, commands.getstatusoutput, the subprocess
module, backticks, etc.

As of May 2009, what is the recommended way to run an external process
like grep and capture STDOUT and the error code?

subprocess, which becomes pretty clear when reading its docs:

"""
The subprocess module allows you to spawn new processes, connect to their
input/output/error pipes, and obtain their return codes. This module
intends to replace several other, older modules and functions, such as:
os.system
os.spawn*
os.popen*
popen2.*
commands.*
"""

Diez
 
Tim Chase

Matthew said:
I'm writing a command-line application and I want to search through lots
of text files for a string. Instead of writing the python code to do
this, I want to use grep.

This is the command I want to run:

$ grep -l foo dir

In other words, I want to list all files in the directory dir that
contain the string "foo".

I'm looking for the "one obvious way to do it" and instead I found no
consensus. I could os.popen, commands.getstatusoutput, the subprocess
module, backticks, etc.

While it doesn't use grep or external processes, I'd just do it
in pure Python:

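import os

def files_containing(location, search_term):
    """Yield names of files directly in location that contain search_term."""
    for fname in os.listdir(location):
        fullpath = os.path.join(location, fname)
        if os.path.isfile(fullpath):  # skip subdirectories
            with open(fullpath) as f:
                for line in f:
                    if search_term in line:
                        yield fname
                        break  # one match is enough; go to the next file

for fname in files_containing('/tmp', 'term'):
    print(fname)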

It's fairly readable, you can easily tweak the search methods
(case sensitive, etc), change it to be recursive by using
os.walk() instead of listdir(), it's cross-platform, and doesn't
require the overhead of an external process (along with the
"which call do I use to spawn the function" questions that come
with it :)
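
The recursive variant mentioned above might look roughly like this. It is only a sketch: the name files_containing_recursive is made up, it assumes Python 3 (for the errors argument to open), and it yields full paths rather than bare file names so that files with the same name in different directories can be told apart.

```python
import os


def files_containing_recursive(location, search_term):
    """Yield paths of files anywhere under location that contain search_term."""
    for dirpath, dirnames, filenames in os.walk(location):
        for fname in filenames:
            fullpath = os.path.join(dirpath, fname)
            try:
                # errors="replace" keeps undecodable (e.g. binary) files
                # from raising UnicodeDecodeError.
                with open(fullpath, errors="replace") as handle:
                    if any(search_term in line for line in handle):
                        yield fullpath
            except OSError:
                continue  # unreadable file: skip it
```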

However, to answer your original question, I'd use os.popen which
is the one I see suggested most frequently.

-tkc
 
Matthew Wilson

Diez said:
subprocess, which becomes pretty clear when reading its docs:

Yeah, that's what I figured, but I wondered if there was already
something newer and shinier aiming to bump subprocess off its throne.

I'll just stick with subprocess for now. Thanks for the feedback!
 
Matthew Wilson

Tim said:
While it doesn't use grep or external processes, I'd just do it
in pure Python:

Thanks for the code!

I'm reluctant to take that approach for a few reasons:

1. Writing tests for that code seems like a fairly large amount of work.
I think I'd need to either mock out lots of stuff or create a bunch
of temporary directories and files for each test run.

I don't intend to test that grep works like it says it does. I'll
just test that my code calls a mocked-out grep with the right options
and arguments, and that my code behaves nicely when my mocked-out
grep returns errors.

2. grep is crazy fast. For a search through just a few files, I doubt
it would matter, but when searching through a thousand files (which is
likely) I suspect that an all-python approach might lag behind. I'm
speculating here, though.

3. grep has lots and lots of cute options. I don't want to think about
implementing stuff like --color, for example. If I just pass all the
heavy lifting to grep, I'm already done.
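
The testing idea from point 1 could be sketched along these lines. Everything here is hypothetical: grep_files stands in for whatever wrapper the application ends up with, and the sketch assumes Python 3, where mock ships as unittest.mock. subprocess.Popen is patched, so no real grep ever runs.

```python
import subprocess
from unittest import mock


def grep_files(pattern, directory):
    """Hypothetical wrapper under test: list files containing pattern."""
    proc = subprocess.Popen(
        ["grep", "-l", "-r", "-e", pattern, directory],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    out, _ = proc.communicate()
    if proc.returncode not in (0, 1):  # 1 only means "no match"
        raise RuntimeError("grep failed with exit code %d" % proc.returncode)
    return out.splitlines()


def test_grep_files_builds_expected_command():
    with mock.patch("subprocess.Popen") as fake_popen:
        fake_proc = fake_popen.return_value
        fake_proc.communicate.return_value = ("dir/a.txt\ndir/b.txt\n", None)
        fake_proc.returncode = 0
        assert grep_files("foo", "dir") == ["dir/a.txt", "dir/b.txt"]
    # The first positional argument to Popen is the command list.
    assert fake_popen.call_args[0][0] == ["grep", "-l", "-r", "-e", "foo", "dir"]


def test_grep_files_raises_on_error():
    with mock.patch("subprocess.Popen") as fake_popen:
        fake_popen.return_value.communicate.return_value = ("", None)
        fake_popen.return_value.returncode = 2  # simulate a grep error
        try:
            grep_files("foo", "dir")
        except RuntimeError:
            pass
        else:
            raise AssertionError("expected RuntimeError")
```

Neither test touches the filesystem, which is the appeal of this approach over creating temporary directories for each run.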

On the other hand, your solution is platform-independent and has no
dependencies. Mine depends on an external grep command.

Thanks again for the feedback!

Matt
 
Marco Mariani

Matthew said:
consensus. I could os.popen, commands.getstatusoutput, the subprocess
module, backticks, etc.

Backticks do _not_ do what you think they do: in Python 2, `x` is
shorthand for repr(x), not shell command substitution.

And with py3k they're also as dead as a dead parrot.
 
