fastest way to test file for string?

kj

Hi. I need to implement, within a Python script, the same
functionality as that of Unix's

grep -rl some_string some_directory

I.e. find all the files under some_directory that contain the string
"some_string".

I imagine that I can always resort to the shell for this, but is
there an efficient way of doing it within Python?

(BTW, portability is not high on my list here; this will run on a
Unix system, and non-portable Unix-specific solutions are fine with
me.)

TIA!
--
 
Tim Chase

I'd do something like this untested function:

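import os

def find_files_containing(base_dir, string_to_find):
    # os.walk yields (dirpath, dirnames, filenames) for each directory
    for path, dirs, files in os.walk(base_dir):
        for fname in files:
            full_name = os.path.join(path, fname)
            try:
                # Stop reading at the first matching line; the file is
                # closed before its name is yielded. Undecodable bytes
                # (e.g. in binary files) are replaced instead of raising.
                with open(full_name, errors="replace") as f:
                    found = any(string_to_find in line for line in f)
            except OSError:
                # Skip unreadable files (permissions, broken symlinks, ...)
                continue
            if found:
                yield full_name

for filename in find_files_containing(
        "/path/to/wherever/",
        "some_string"
        ):
    print(filename)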

It's not very gracious regarding binary files, but caveat coder.
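One way around the binary-file caveat (not something posted in the thread) is to compare raw bytes instead of decoded text lines, reading each file in fixed-size chunks so that a binary file with no newlines is never pulled into memory in one piece. This is a sketch under stated assumptions: the search string is encoded as UTF-8 by default, and `file_contains`, `find_files_containing_bytes` and the 64 KiB `chunk_size` are hypothetical names and values chosen for illustration.

```python
import os


def file_contains(path, needle, chunk_size=64 * 1024):
    """Return True if the raw bytes of the file at `path` contain `needle`."""
    # Carry the last len(needle) - 1 bytes into the next chunk so that a
    # match straddling a chunk boundary is not missed.
    overlap = len(needle) - 1
    tail = b""
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return False
            data = tail + chunk
            if needle in data:
                return True
            tail = data[-overlap:] if overlap > 0 else b""


def find_files_containing_bytes(base_dir, text, encoding="utf-8"):
    """Yield every file under base_dir whose bytes contain `text`.

    Assumption: `encoding` matches the encoding of the files searched.
    """
    needle = text.encode(encoding)
    for dirpath, _dirnames, filenames in os.walk(base_dir):
        for fname in filenames:
            full_name = os.path.join(dirpath, fname)
            try:
                if file_contains(full_name, needle):
                    yield full_name
            except OSError:
                # Unreadable file (permissions, broken symlink, ...): skip it.
                continue
```

Keeping the last `len(needle) - 1` bytes of each chunk is what lets a match that spans two reads still be found, at the cost of a small copy per chunk.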

-tkc
 
pruebauno

You can write your own version of grep in Python using os.walk, open,
read and find. I don't see why one would want to do that except for
portability reasons. It will be pretty hard to beat grep in efficiency
and succinctness. The most sensible thing IMHO is a shell script, or
calling grep via os.system (or subprocess).
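Following the subprocess suggestion, a minimal sketch of calling grep from Python without going through a shell might look like this. It assumes a grep that supports `-F` and `--null` (GNU grep does), and it adds `-F` so that the search string is matched literally rather than as a regular expression, which differs slightly from the plain `grep -rl` in the question. `grep_rl` is a hypothetical name.

```python
import os
import subprocess


def grep_rl(pattern, directory):
    """Return the paths under `directory` whose contents include `pattern`.

    Hypothetical helper: runs grep as a child process, no shell involved.
    """
    # -r recurse, -l list file names only, -F fixed string (not a regex),
    # --null ends each name with a NUL byte so names containing newlines
    # survive; "--" stops a pattern beginning with "-" being read as an option.
    cmd = ["grep", "-rlF", "--null", "--", pattern, directory]
    result = subprocess.run(cmd, stdout=subprocess.PIPE)
    # grep exits 0 if something matched, 1 if nothing matched, >1 on error.
    if result.returncode > 1:
        raise subprocess.CalledProcessError(result.returncode, cmd)
    return [os.fsdecode(name) for name in result.stdout.split(b"\0") if name]
```

Passing the arguments as a list avoids the quoting problems `os.system` would have with a search string containing spaces or quotes. Note that grep also exits with status 2 when some file under the directory could not be read, so this sketch raises in that case even if other files matched.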
 
Hendrik van Rooyen

Use grep. You will not beat its performance.

- Hendrik
 
Aaron Brady

The 'mmap.mmap' class has a 'find' method. Not what you asked for,
but possibly an alternative. No regex 'search' method either.
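A minimal sketch of the `mmap` approach, assuming the search string is given as bytes; `mmap_contains` and `mmap_search` are hypothetical helper names. Although an `mmap` object has no `search` method of its own, the Python documentation notes that mmap objects can be used in most places where a `bytearray` is expected, including with the `re` module, so a bytes regex can be run over the mapped file directly.

```python
import mmap
import os
import re


def mmap_contains(path, needle):
    """Return True if the file at `path` contains the bytes `needle`."""
    # mmap refuses to map an empty file (ValueError), so an empty file is
    # treated as containing nothing.
    if os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # find() returns the lowest index where needle starts, or -1.
            return mm.find(needle) != -1


def mmap_search(path, pattern):
    """Return True if the bytes regex `pattern` matches anywhere in the file."""
    if os.path.getsize(path) == 0:
        return False
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # re accepts the mmap object directly because it supports the
            # buffer protocol; the pattern must therefore be bytes.
            return re.search(pattern, mm) is not None
```

Either helper can be called on each `full_name` produced by an `os.walk` loop; the operating system pages the file in as it is scanned, so the contents are never copied into a Python string first.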
 
Steven D'Aprano

Hendrik van Rooyen said:

Use grep. You will not beat its performance.


What makes you think that performance of the find itself is the only, or
even the main, consideration?
 
