Re: Search for a string in binary files

> How could I use python to search for a string in binary files? From the
> command line, I would do something like this on a Linux machine to find
> this string:

> grep -a "Microsoft Excel" *.xls

> How can I do this in Python?

Quite easily. To get you started, here is an untested draft; I leave it to
you to try it and debug it.

import glob
for name in glob.glob('*.xls'):
    # Read the whole file as bytes and look for the byte string in it.
    with open(name, 'rb') as handle:
        if handle.read().find(b'Microsoft Excel') >= 0:
            print("Found in", name)

François Pinard

John Hunter
>>>>> "hokieghal99" == hokieghal99 writes:

hokieghal99> And, would it be more efficient (faster) to just call
hokieghal99> grep from python to do the searching?

Depending on how you call grep, probably. If you spawn a new grep for
each file, it might be slower than the Python solution. If you first
build the list of all the files you want to search and then call grep
once on the whole list, it will likely be a good bit faster. But you
will have to deal with issues like quoting spaces in filenames, etc.
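
As a rough illustration of the "one grep for the whole list" approach, here is a minimal sketch. The helper names `build_grep_command` and `grep_files` are made up for this example, and it assumes a `grep' that accepts `-a' (GNU grep does). Passing the arguments as a list, without going through a shell, is one way to sidestep the quoting of spaces in filenames.

```python
import glob
import subprocess


def build_grep_command(pattern, filenames):
    """Build one grep invocation that covers every file in `filenames`."""
    # -a: treat binary files as text, -l: print only matching file names,
    # -F: take the pattern as a fixed string, --: end of options, so that
    # names starting with a dash are not mistaken for flags.
    return ["grep", "-a", "-l", "-F", "--", pattern] + list(filenames)


def grep_files(pattern, filenames):
    """Return the names that grep reports as containing `pattern`."""
    filenames = list(filenames)
    if not filenames:
        return []  # with no file arguments grep would read standard input
    # An argument list (no shell involved) needs no quoting of the names.
    result = subprocess.run(
        build_grep_command(pattern, filenames),
        stdout=subprocess.PIPE,
        check=False,  # grep exits with status 1 when nothing matches
    )
    return result.stdout.decode(errors="replace").splitlines()


if __name__ == "__main__":
    for name in grep_files("Microsoft Excel", glob.glob("*.xls")):
        print("Found in", name)
```

With a very long file list the command line may exceed the system limit on argument length, so splitting the list into batches might be needed.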



> One last question: does grep actually open files when it searches them?

I have not looked at the `grep' sources for a good while and might not
remember correctly, so read this with caution. `grep' might try to `mmap'
a file if the file (and the underlying system) allows it, and there is
system overhead associated with that call, just as there is with `open'.
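
For comparison, Python can map a file too, through its standard `mmap' module. The sketch below only shows what that looks like from the Python side; it says nothing about what `grep' really does, and the function name `contains_mmap` is invented for the example.

```python
import mmap


def contains_mmap(path, needle):
    """Return True if the bytes `needle` occur anywhere in the file at `path`."""
    with open(path, "rb") as handle:
        try:
            # Length 0 maps the whole file; the mapping is read-only.
            view = mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ)
        except ValueError:
            return False  # an empty file cannot be mapped
        with view:
            return view.find(needle) != -1
```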

> And, would it be more efficient (faster) to just call grep from python to
> do the searching?

I have no doubt that calling `grep' directly is more efficient than using
Python at all. However, if Python is already running, doing the work from
within Python is more efficient than launching an external program such as
`grep', as there is non-negligible system overhead in doing so. (Yet for
only a few files, launching `grep' is fast enough that the user would not
notice it anyway.)

Still, there are special cases, unusual in practice, when `grep' might be
faster despite the overhead of calling it. When the file is long enough,
and the string to be searched for meets some special conditions, the
Boyer-Moore algorithm in `grep' might progressively beat the likely more
simple-minded search technique used within `string.find'. Yet if Python's
`string.find' relies on `strstr' in GNU `libc', it might be quite fast
already. The implementation of such basic routines in `libc' has varied
over time; at one point, at least, they were extremely well implemented
for speed, cleverly using bits of assembler here and there. For `strstr' in
particular, there was once some good code from Stephen van den Berg. I do
not know what `libc' uses nowadays, nor whether Python takes advantage of it.
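
To give an idea of why such an algorithm can win, here is a minimal sketch of the Horspool simplification of Boyer-Moore (a swapped-in, simpler variant, and not the code of `grep' or of Python). The function name `horspool_find` is made up. The point is that the search window may jump ahead by up to the full length of the searched string whenever the byte under the end of the window does not occur in that string, so longer strings tend to allow bigger jumps.

```python
def horspool_find(haystack, needle):
    """Return the index of the first occurrence of needle in haystack, or -1.

    Both arguments are bytes objects.
    """
    m, n = len(needle), len(haystack)
    if m == 0:
        return 0
    # For each byte of the needle except its last one: how far the window
    # may move when that byte sits under the window's last position.
    # Later occurrences overwrite earlier ones, giving the smaller, safe shift.
    shift = {byte: m - 1 - i for i, byte in enumerate(needle[:-1])}
    pos = 0
    while pos + m <= n:
        if haystack[pos:pos + m] == needle:
            return pos
        # Bytes absent from the table allow a jump of the full needle length.
        pos += shift.get(haystack[pos + m - 1], m)
    return -1
```

In pure Python this will most likely be slower than the built-in `find', which runs in C; the sketch is only meant to show the skipping idea.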

Finally, for huge files, proper reading in Python has to be done in chunks,
and the string to be searched for may happen to span chunks. Doing it
properly might require more care than one might think at first. But in
practice, on average, for files of reasonable size, staying in Python wins.
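
One possible way to handle the spanning problem, sketched under the assumption that the searched string is much shorter than a chunk, is to carry the tail of each chunk over to the next read. The name `contains_chunked` and the default chunk size of one megabyte are arbitrary choices for this example.

```python
def contains_chunked(path, needle, chunk_size=1 << 20):
    """Return True if the bytes `needle` occur in the file, reading by chunks."""
    if not needle:
        return True
    overlap = len(needle) - 1
    tail = b""
    with open(path, "rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                return False  # end of file reached without a match
            window = tail + chunk
            if needle in window:
                return True
            # Keep the last len(needle) - 1 bytes: the longest start of a
            # match that the next chunk could still complete.
            tail = window[-overlap:] if overlap else b""
```

Keeping exactly `len(needle) - 1` bytes is enough, because a match that were wholly inside the kept tail would already have been found in the current window.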

François Pinard
