Re: More Help with python .find function

Steven D'Aprano
On Fri, 07 Jan 2011 22:43:54 -0600, Keith Anthony wrote:

> My previous question asked how to read a file into a structure a line at
> a time. Figured it out. Now I'm trying to use .find to separate out
> the PDF objects. (See code) PROBLEM/QUESTION: My call to lines[i].find
> does NOT find all instances of endobj. Any help available? Any
> insights?
> #!/usr/bin/python
> inputfile = file('sample.pdf','rb')  # This is PDF with which we will work
> lines = inputfile.readlines()        # read file one line at a time

That's incorrect. readlines() reads the entire file in one go, and splits
it into individual lines.
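
To make the difference concrete, here is a small sketch (Python 3, with a made-up in-memory byte string standing in for sample.pdf, so the names and data are assumptions, not the original poster's file). Both approaches give the same lines; the first builds the whole list up front, the second fetches one line per loop iteration.

```python
import io

# Hypothetical stand-in for the contents of 'sample.pdf'.
data = b"1 0 obj\n<< >>\nendobj\n"

# readlines(): the whole file is read, then returned as a list of lines.
all_lines = io.BytesIO(data).readlines()

# Iterating the file object: one line is produced per loop iteration.
lazy_lines = []
for line in io.BytesIO(data):
    lazy_lines.append(line)

print(all_lines == lazy_lines)
```

For a small file the two are interchangeable; the iteration form matters once the file is too large to hold comfortably in memory.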

> linestart = []  # Starting address for each line
> lineend = []    # Ending address for each line
> linetype = []

*raises eyebrow*

How is an empty list a starting or ending address?

The only thing worse than no comments where you need them is misleading
comments. A variable called "linestart" implies that it should be a
position, e.g. linestart = 0. Or possibly a flag.

> print len(lines) # print number of lines
> i = 0 # define an iterator, i

Again, 0 is not an iterator. 0 is a number.
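
A tiny illustration of the distinction, using hypothetical sample data: an index is a number you use to look items up, while an iterator is an object that hands the items out one at a time.

```python
lines = ["first\n", "second\n"]   # hypothetical sample data

i = 0              # an integer index: lines[i] looks an item up by position
it = iter(lines)   # an iterator: next(it) returns the items one by one

first_by_index = lines[i]
first_by_iterator = next(it)
second_by_iterator = next(it)

print(first_by_index == first_by_iterator)
```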

> addr = 0                  # and address pointer
> while i < len(lines):     # Go through each line
>     linestart = linestart + [addr]
>     length = len(lines[i])
>     lineend = lineend + [addr + (length-1)]
>     addr = addr + length
>     i = i + 1

Complicated and confusing and not the way to do it in Python. Something
like this is much simpler:

linetypes = []  # note plural
inputfile = open('sample.pdf','rb')  # Don't use file, use open.

for line_number, line in enumerate(inputfile):
    # Process one line at a time. No need for that nonsense with manually
    # tracked line numbers, enumerate() does that for us.
    # No need to initialise linetypes.
    status = 'normal'
    i = line.find(' obj')
    if i >= 0:
        print "Object found at offset %d in line %d" % (i, line_number)
        status = 'object'
    i = line.find('endobj')
    if i >= 0:
        print "endobj found at offset %d in line %d" % (i, line_number)
        if status == 'normal': status = 'endobj'
        else: status = 'object & endobj'  # both found on the one line
    # What if obj or endobj exist more than once in a line?
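
On that last question: .find only ever reports the first match, which is one likely reason some endobj instances go missing. One way to catch the rest is to call .find repeatedly, passing the position to resume from as its second argument. A minimal sketch in Python 3 follows; find_all is a hypothetical helper name and the sample line is made up.

```python
def find_all(line, token):
    """Return every offset at which token starts in line (non-overlapping)."""
    offsets = []
    start = 0
    while True:
        pos = line.find(token, start)   # search from 'start' onwards
        if pos < 0:                     # -1 means no further match
            break
        offsets.append(pos)
        start = pos + len(token)        # resume just past this match
    return offsets

# Hypothetical line holding two complete objects.
sample = b"1 0 obj endobj 2 0 obj endobj\n"
print(find_all(sample, b"endobj"))
print(find_all(sample, b" obj"))
```

Note that the leading space in ' obj' is what keeps it from matching inside 'endobj'.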

One last thing... if PDF files are a binary format, what makes you think
that they can be processed line-by-line? They may not have lines, except
by accident.
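
If lines can't be trusted, one alternative is to skip them entirely and search the raw bytes. The sketch below (Python 3, made-up data) uses carriage returns as line endings, which PDF permits; readlines() in binary mode splits only on "\n", so it sees the whole thing as a single line, while a scan over the byte string still finds every endobj. This is a naive scan rather than a PDF parser: it would also match the bytes "endobj" if they happened to occur inside a stream.

```python
import io
import re

# Hypothetical file contents using "\r" rather than "\n" as the line ending.
data = b"1 0 obj\r<< >>\rendobj\r2 0 obj\r<< >>\rendobj\r"

# Binary-mode readlines() splits on b"\n" only, so this is one long "line".
lines = io.BytesIO(data).readlines()

# Scan the whole byte string instead; offsets are positions within the file.
offsets = [m.start() for m in re.finditer(b"endobj", data)]

print(len(lines))
print(offsets)
```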
