Velocity Reviews - Computer Hardware Reviews


Parsing text

 
 
sicvic
 
      12-19-2005
I was wondering if there's a way for Python to read through the lines
of a text file searching for a key phrase, then write that line and
all lines following it up to a certain point, such as until it sees a
string of "---------------------".

Right now I can only get Python to write the line the key phrase is
found in.

Thanks,
Victor

 
Peter Hansen
 
      12-20-2005
sicvic wrote:
> I was wondering if theres a way where python can read through the lines
> of a text file searching for a key phrase then writing that line and
> all lines following it up to a certain point, such as until it sees a
> string of "---------------------"
>
> Right now I can only have python write just the line the key phrase is
> found in.


That's a good start. Maybe you could post the code that you've already
got that does this, and people could comment on it and help you along.
(I'm suggesting that partly because this almost sounds like homework,
but you'll benefit more by doing it this way than just by having an
answer handed to you whether this is homework or not.)

-Peter

 
Noah
 
      12-20-2005
sicvic wrote:
> I was wondering if theres a way where python can read through the lines
> of a text file searching for a key phrase then writing that line and
> all lines following it up to a certain point, such as until it sees a
> string of "---------------------"
>...
> Thanks,
> Victor


You did not specify the "key phrase" that you are looking for, so for
the sake of this example I will assume that it is "key phrase".
I assume that you don't want "key phrase" or "---------------------"
to be returned as part of your match, so we capture only the text
between them in a group, and make that group non-greedy (.*?) so it
stops at the first separator rather than the last one.
You also want your regular expression to use the re.DOTALL flag,
because by default "." does not match a newline; DOTALL is what lets a
match span multiple lines. The simplest way to set this flag is to put
it at the front of your regular expression using the (?s) notation.

This gives you something like this:
import re
print(re.findall("(?s)key phrase(.*?)---------------------",
                 your_string_to_search)[0])

So what that basically says is:
1. (?s) -- let "." match newlines too, so the match can span lines
2. Match "key phrase"
3. Capture everything up to the first separator, non-greedily (.*?)
4. Match "---------------------"
5. Print the first match in the list [0]
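A small self-contained check of the points above, written for Python 3. The sample text and the names `SEP`, `non_greedy`, `greedy`, `no_dotall` and `flag_form` are made up for illustration. It compares the non-greedy pattern with a greedy one and with one lacking DOTALL, and shows that passing `re.DOTALL` as a flag behaves like the inline `(?s)`:

```python
import re

SEP = "-" * 21  # the separator string from the example above

# Two made-up records, each closed by a separator line.
text = "\n".join([
    "header",
    "key phrase first line",
    "second line",
    SEP,
    "key phrase another",
    SEP,
]) + "\n"

# (?s) lets '.' match newlines; (.*?) stops at the FIRST separator.
non_greedy = re.findall(r"(?s)key phrase(.*?)" + SEP, text)

# A greedy (.*) runs on to the LAST separator, swallowing both records.
greedy = re.findall(r"(?s)key phrase(.*)" + SEP, text)

# Without DOTALL, '.' stops at each newline, so nothing spanning lines matches.
no_dotall = re.findall(r"key phrase(.*?)" + SEP, text)

# Passing re.DOTALL as a flag is equivalent to the inline (?s).
flag_form = re.findall(r"key phrase(.*?)" + SEP, text, re.DOTALL)

print(non_greedy)
print(greedy)
print(no_dotall)
```

Note that `findall(...)[0]` raises an IndexError when there is no match at all, so real code may want to check the list first.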

Yours,
Noah

 
Bengt Richter
 
      12-20-2005
On 19 Dec 2005 15:15:10 -0800, "sicvic" wrote:

>I was wondering if theres a way where python can read through the lines
>of a text file searching for a key phrase then writing that line and
>all lines following it up to a certain point, such as until it sees a
>string of "---------------------"
>
>Right now I can only have python write just the line the key phrase is
>found in.
>

This sounds like homework, so just a (big) hint: have a look at
itertools.dropwhile and itertools.takewhile. The solution is potentially
a one-liner, depending on your matching criteria (e.g., case-sensitive
fixed string vs regular expression).
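A minimal sketch of that hint in Python 3, assuming the key is a case-sensitive fixed string and the separator is a full line of dashes (46 here, matching the sample data posted later in the thread); `section` and `SEPARATOR` are hypothetical names. `dropwhile` skips lines until the key appears, and `takewhile` keeps lines until the separator:

```python
from itertools import dropwhile, takewhile

SEPARATOR = "-" * 46  # assumed separator line, as in the sample data


def section(lines, key):
    """Return the first line containing `key` and every line after it,
    up to (but not including) the next separator line."""
    # dropwhile discards lines until one contains the key...
    start = dropwhile(lambda line: key not in line, lines)
    # ...then takewhile keeps lines until it reaches the separator.
    return list(takewhile(lambda line: line.strip() != SEPARATOR, start))


sample = [
    "aaaaa bbbbb Person: Jimmy\n",
    "Current Location: Denver\n",
    "Next Location: Chicago\n",
    SEPARATOR + "\n",
    "aaaaa bbbbb Person: Sarah\n",
    "Current Location: San Diego\n",
    "Next Location: Miami\n",
    "Next Location: New York\n",
    SEPARATOR + "\n",
]

sarah = section(sample, "Person: Sarah")
```

Only the first matching section is returned. If `lines` is an open file, `takewhile` consumes the separator line it stops on, so calling `section` again on the same file continues after that separator.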

Regards,
Bengt Richter
 
sicvic
 
      12-20-2005
Not homework...not even in school (do any universities even teach
classes using Python?). I'm just not a programmer. Anyway, I should
probably be clearer about what I'm trying to do.

Since I can't show the actual output file, let's say I had an output
file that looked like this:

aaaaa bbbbb Person: Jimmy
Current Location: Denver
Next Location: Chicago
----------------------------------------------
aaaaa bbbbb Person: Sarah
Current Location: San Diego
Next Location: Miami
Next Location: New York
----------------------------------------------

Now I want to put this (and every recurrence of "Person: Jimmy")

Person: Jimmy
Current Location: Denver
Next Location: Chicago

in a file called jimmy.txt

and the same for Sarah in sarah.txt

The code I currently have looks something like this:

import re
import sys

person_jimmy = open('jimmy.txt', 'w') #creates jimmy.txt
person_sarah = open('sarah.txt', 'w') #creates sarah.txt

f = open(sys.argv[1]) #opens output file
#loop that goes through all lines and parses specified text
for line in f.readlines():
    if re.search(r'Person: Jimmy', line):
        person_jimmy.write(line)
    elif re.search(r'Person: Sarah', line):
        person_sarah.write(line)

#closes all files

person_jimmy.close()
person_sarah.close()
f.close()

However, this only produces output files that look like this:

jimmy.txt:

aaaaa bbbbb Person: Jimmy

sarah.txt:

aaaaa bbbbb Person: Sarah

My question is: what else do I need to add (such as a nested loop
where the if statements are?) so the files look like this:

aaaaa bbbbb Person: Jimmy
Current Location: Denver
Next Location: Chicago

and

aaaaa bbbbb Person: Sarah
Current Location: San Diego
Next Location: Miami
Next Location: New York


Basically, after finding that line, I need to copy all the lines
following it, stopping when it sees
'----------------------------------------------'

Any help is greatly appreciated.

 
rzed
 
      12-20-2005
"sicvic" wrote:

> Basically I need to add statements that after finding that line
> copy all the lines following it and stopping when it sees
> '----------------------------------------------'
>
> Any help is greatly appreciated.
>


Something like this, maybe?

"""
This iterates through a file, with subloops to handle the
special cases. I'm assuming that Jimmy and Sarah are not the
only people of interest. I'm also assuming (for no very good
reason) that you do want the separator lines, but do not want
the "Person:" lines in the output file. It is easy enough to
adjust those assumptions to taste.

Each "Person:" line will cause a file to be opened (if it is
not already open) and will write the subsequent lines to it
until the separator is found. Be aware that all files remain
open until the loop at the end closes them all.
"""

outfs = {}
f = open('shouldBeDatabase.txt')
for line in f:
    if line.find('Person:') >= 0:
        # the name is whatever follows "Person:" on this line
        ofkey = line[line.find('Person:')+7:].strip()
        if ofkey not in outfs:
            outfs[ofkey] = open('%s.txt' % ofkey, 'w')
        outf = outfs[ofkey]
        # copy the following lines, including the separator itself
        while line.find('-----------------------------') < 0:
            line = next(f)
            outf.write(line)
f.close()
for outf in outfs.values():
    outf.close()

--
rzed
 
Dennis Lee Bieber
 
      12-20-2005
On 20 Dec 2005 08:06:39 -0800, "sicvic"
declaimed the following in comp.lang.python:

> The code I currently have looks something like this:
>
> import re


For a "non-programmer" you jumped into using a module I've never
made use of...

> import sys
>
> person_jimmy = open('jimmy.txt', 'w') #creates jimmy.txt
> person_sarah = open('sarah.txt', 'w') #creates sarah.txt
>

This presupposes that only these two names are of interest

> f = open(sys.argv[1]) #opens output file


Pardon, isn't that the input file?

> #loop that goes through all lines and parses specified text
> for line in f.readlines():
> if re.search(r'Person: Jimmy', line):
> person_jimmy.write(line)


Well, if you want all lines up to some terminator, shouldn't you be
writing them <G>
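To illustrate the point, here is a minimal in-memory sketch of the "keep writing until the terminator" idea (Python 3). The names `group_records` and `is_separator`, and the rule that a separator is any line made only of dashes, are assumptions for illustration. It returns a dict instead of writing files, so a repeated name simply extends the same list:

```python
START_FLAG = "Person: "


def is_separator(text):
    # Assumed rule: a separator is a non-empty line made only of dashes.
    return text != "" and text.strip("-") == ""


def group_records(lines):
    """Map each name to the lines of that person's records.

    Separator lines are dropped; a repeated name appends to its list."""
    records = {}
    current = None  # the list being filled, or None outside a record
    for line in lines:
        text = line.rstrip("\n")
        if current is not None and is_separator(text):
            current = None                      # record finished
        elif current is None and START_FLAG in text:
            name = text.split(START_FLAG, 1)[1].strip()
            current = records.setdefault(name, [])
            current.append(text)                # keep the "Person:" line
        elif current is not None:
            current.append(text)                # a line inside a record
        # otherwise: outside any record and not a start line, so skip it
    return records


sample = [
    "aaaaa bbbbb Person: Jimmy\n",
    "Current Location: Denver\n",
    "Next Location: Chicago\n",
    "-" * 46 + "\n",
    "aaaaa bbbbb Person: Sarah\n",
    "Current Location: San Diego\n",
    "-" * 46 + "\n",
]
records = group_records(sample)
```

With a real file, passing the open file object instead of `sample` should give the same structure, and each list could then be written to `name + ".txt"`.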

> elif re.search(r'Person: Sarah', line):
> person_sarah.write(line)
>
> #closes all files
>
> person_jimmy.close()
> person_sarah.close()
> f.close()
>


I have not tested this; nor is it the most optimal coding -- I tried
to keep each line simple... (hope your font reads better... Agent uses
one in which lower-L and upper-I look alike: iIlL; ln is lower-L+n, fIn
is f+upper-I+n [best to cut&paste rather than type by hand])

-=-=-=-=-=-=-=-
import sys
import os.path

START_FLAG = "Person: "
END_FLAG = "----------------------------------------------"

def personFile(s):
    pName = s[s.find(START_FLAG) + len(START_FLAG):]
    pFID = pName + ".txt"
    if os.path.exists(pFID):
        pOut = open(pFID, "a")
    else:
        pOut = open(pFID, "w")
    pOut.write(START_FLAG)
    pOut.write(pName)
    pOut.write("\n")
    return pOut

def processFile(fIn):
    pOut = None
    for ln in fIn:
        ln = ln.strip() #get rid of trailing line ending, etc.
        if pOut and ln == END_FLAG:
            pOut.close()
            pOut = None
        elif not pOut and ln.find(START_FLAG) != -1:
            pOut = personFile(ln)
        elif pOut:
            pOut.write(ln)
            pOut.write("\n")
        else:
            # No output file, not a start flag... skip the line
            pass

if __name__ == "__main__":
    if len(sys.argv) > 1:   # sys.argv[1] alone raises IndexError if missing
        dIn = open(sys.argv[1], "r")
        processFile(dIn)
        dIn.close()
    else:
        print("\n\nUsage: whatever Input_File_Name\n\n")
-=-=-=-=-=-=-=-=-

 
Gerard Flanagan
 
      12-20-2005
sicvic wrote:

> Since I cant show the actual output file lets say I had an output file
> that looked like this:
>
> aaaaa bbbbb Person: Jimmy
> Current Location: Denver


It may be the output of another process but it's the input file as far
as the parsing code is concerned.

The code below gives the following output, if that's any help (just
adapting Noah's idea above). Note that it deals with the input as a
single string rather than line by line.


Jimmy
Jimmy.txt

Current Location: Denver
Next Location: Chicago

Sarah
Sarah.txt

Current Location: San Diego
Next Location: Miami
Next Location: New York



data='''
aaaaa bbbbb Person: Jimmy
Current Location: Denver
Next Location: Chicago
----------------------------------------------
aaaaa bbbbb Person: Sarah
Current Location: San Diego
Next Location: Miami
Next Location: New York
----------------------------------------------
'''

import io
import re


src = io.StringIO(data)

for name in ['Jimmy', 'Sarah']:
    # non-greedy match from "Person: <name>" up to the first "--"
    exp = "(?s)Person: %s(.*?)--" % name
    filename = "%s.txt" % name
    info = re.findall(exp, src.getvalue())[0]
    print(name)
    print(filename)
    print(info)
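A hypothetical generalization of the loop above (Python 3): instead of hard-coding 'Jimmy' and 'Sarah', capture the name after "Person:" and the body up to the next line of dashes with `re.finditer`, collecting repeated names in a list. The pattern, the made-up third record and the `by_name` dict are illustrative assumptions; `re.MULTILINE` is needed so that `^` and `$` match at line boundaries:

```python
import re

# Sample input; the third (second Jimmy) record is made up to show recurrences.
data = """\
aaaaa bbbbb Person: Jimmy
Current Location: Denver
Next Location: Chicago
----------------------------------------------
aaaaa bbbbb Person: Sarah
Current Location: San Diego
Next Location: Miami
Next Location: New York
----------------------------------------------
aaaaa bbbbb Person: Jimmy
Current Location: Chicago
----------------------------------------------
"""

# Group 1: rest of the "Person:" line. Group 2: everything (non-greedy)
# up to the next line consisting only of dashes.
RECORD = re.compile(r"Person: ([^\n]+)\n(.*?)^-+$", re.DOTALL | re.MULTILINE)

by_name = {}
for match in RECORD.finditer(data):
    name = match.group(1).strip()
    by_name.setdefault(name, []).append(match.group(2))
```

This relies on every record being closed by a dashed line; if one were missing, the non-greedy body would run on into the next record.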



hth

Gerard

 
Scott David Daniels
 
      12-20-2005
sicvic wrote:
> Not homework...not even in school (do any universities even teach
> classes using python?).

Yup, at least 6, and 20 wouldn't surprise me.

> The code I currently have looks something like this:
> ...
> f = open(sys.argv[1]) #opens output file
> #loop that goes through all lines and parses specified text
> for line in f.readlines():
> if re.search(r'Person: Jimmy', line):
> person_jimmy.write(line)
> elif re.search(r'Person: Sarah', line):
> person_sarah.write(line)

Using re here seems pretty excessive.
How about:
...
f = open(sys.argv[1])  # opens input file ### get comments right
source = iter(f)  # files serve lines at their own pace. Let them
for line in source:
    if line.endswith('Person: Jimmy\n'):
        dest = person_jimmy
    elif line.endswith('Person: Sarah\n'):
        dest = person_sarah
    else:
        continue
    # copy lines up to (not including) the dashed separator line;
    # startswith() tolerates separators of any length
    while not line.startswith('---------------'):
        dest.write(line)
        line = next(source)
f.close()
person_jimmy.close()
person_sarah.close()

--Scott David Daniels
 
sicvic
 
      12-20-2005
Thank you everyone!!!

I got a lot more information than I expected. You guys got my brain
thinking in the right direction, and I'm starting to like programming.
You've got a great community here. Keep it up.

Thanks,
Victor

 