Help needed to retrieve text from a text-file using RegEx

 
 
Bruno Desthuilliers
 
      02-09-2009
Oltmans wrote:
> Here is the scenario:
>
> It's a command line program. I ask user for a input string. Based on
> that input string I retrieve text from a text file. My text file looks
> like following
>
> Text-file:
> -------------
> AbcManager=C:\source\code\Modules\Code-AbcManager\
> AbcTest=C:\source\code\Modules\Code-AbcTest\
> DecConnector=C:\source\code\Modules\Code-DecConnector\
> GHIManager=C:\source\code\Modules\Code-GHIManager\
> JKLConnector=C:\source\code\Modules\Code-JKLConnector
>
> -------------
>
> So now if I run the program and user enters
>
> DecConnector
>
> Then I'm supposed to show them this text "C:\source\code\Modules\Code-
> DecConnector" from the text-file. Right now I'm retrieving using the
> following code which seems quite ineffecient and inelegant at the same
> time
>
> with open('MyTextFile.txt')


This will look for MyTextFile.txt in the system's current working
directory, which is not necessarily the script's directory.
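
One way to avoid depending on the current working directory is to build the path from the script's own location. This is only a sketch: the helper name path_next_to is made up for illustration, and it assumes the data file sits in the same directory as the script.

```python
import os

def path_next_to(script_file, filename):
    """Return an absolute path to `filename` in the directory of `script_file`.

    `path_next_to` is a hypothetical helper, not part of the original program.
    """
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, filename)

# Typical use inside a script:
#     with open(path_next_to(__file__, 'MyTextFile.txt')) as the_file:
#         ...
```

The current working directory depends on where the user launched the program from, while the location of the script file does not.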

> as file:


This shadows the builtin 'file' symbol.

>     for line in file:
>
>         if mName in line: #mName is the string that contains user input


>
>             Path =str(line).strip('\n')


'line' is already a string.

>             tempStr=Path
>
>             Path=tempStr.replace(mName+'=',"",1)


You don't need the temporary variable here. Also, you may want to use
str.split instead:


# NB : renaming for conformity to
# Python's official naming conventions

# 'name' => what the user looks for
# 'path_to_file' => fully qualified path to the 'database' file

target = "%s=" % name  # what we are really looking for

with open(path_to_file) as the_file:
    for line in the_file:
        # special bonus : handles empty lines and 'comment' lines
        # feel free to comment out the three following lines if
        # you're sure you don't need them !-)
        line = line.strip()
        if not line or line.startswith('#') or line.startswith(';'):
            continue

        # faster and simpler than a regexp
        if line.startswith(target):
            # since the '=' is in target, we can safely assume
            # that line.split('=') will return at least a
            # 2-elements list
            path = line.split('=')[1]
            # no need to look further
            break
    else:
        # target not found...
        path = None



> I was wondering if using RegEx will make this look better.


I don't think so. Really.
 
Oltmans
 
      02-09-2009
Here is the scenario:

It's a command-line program. I ask the user for an input string. Based on
that input string, I retrieve text from a text file. My text file looks
like the following:

Text-file:
-------------
AbcManager=C:\source\code\Modules\Code-AbcManager\
AbcTest=C:\source\code\Modules\Code-AbcTest\
DecConnector=C:\source\code\Modules\Code-DecConnector\
GHIManager=C:\source\code\Modules\Code-GHIManager\
JKLConnector=C:\source\code\Modules\Code-JKLConnector

-------------

So now if I run the program and the user enters

DecConnector

then I'm supposed to show them the text
"C:\source\code\Modules\Code-DecConnector" from the text file. Right now
I'm retrieving it using the following code, which seems quite inefficient
and inelegant at the same time:

with open('MyTextFile.txt') as file:

    for line in file:

        if mName in line: #mName is the string that contains user input

            Path =str(line).strip('\n')

            tempStr=Path

            Path=tempStr.replace(mName+'=',"",1)

I was wondering if using RegEx will make this look better. If so, can
you please suggest a Regular Expression for this? Any help is highly
appreciated. Thank you.
 
Chris Rebert
 
      02-09-2009
On Mon, Feb 9, 2009 at 9:22 AM, Oltmans <(E-Mail Removed)> wrote:
> Here is the scenario:
>
> It's a command line program. I ask user for a input string. Based on
> that input string I retrieve text from a text file. My text file looks
> like following
>
> Text-file:
> -------------
> AbcManager=C:\source\code\Modules\Code-AbcManager\
> AbcTest=C:\source\code\Modules\Code-AbcTest\
> DecConnector=C:\source\code\Modules\Code-DecConnector\
> GHIManager=C:\source\code\Modules\Code-GHIManager\
> JKLConnector=C:\source\code\Modules\Code-JKLConnector
>
> -------------
>
> So now if I run the program and user enters
>
> DecConnector
>
> Then I'm supposed to show them this text "C:\source\code\Modules\Code-
> DecConnector" from the text-file. Right now I'm retrieving using the
> following code which seems quite ineffecient and inelegant at the same
> time
>
> with open('MyTextFile.txt') as file:
>
>     for line in file:
>
>         if mName in line: #mName is the string that contains user input
>
>             Path =str(line).strip('\n')
>
>             tempStr=Path
>
>             Path=tempStr.replace(mName+'=',"",1)
>
> I was wondering if using RegEx will make this look better. If so, can
> you please suggest a Regular Expression for this? Any help is highly
> appreciated. Thank you.


If I might repeat Jamie Zawinski's immortal quote:
Some people, when confronted with a problem, think "I know, I'll
use regular expressions." Now they have two problems.

If you add one section header (e.g. "[main]") to the top of the file,
you'll have a valid INI-format file which can be parsed by the
ConfigParser module --
http://docs.python.org/library/configparser.html
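
A rough sketch of that suggestion, written for Python 3, where the module is named configparser (ConfigParser in Python 2). The "[main]" header, the in-memory sample text, and the function name find_path are assumptions made for illustration; a real program would read the file from disk instead.

```python
import configparser

# Hypothetical file contents: some of the original entries under a "[main]" header.
INI_TEXT = r"""
[main]
AbcManager=C:\source\code\Modules\Code-AbcManager\
DecConnector=C:\source\code\Modules\Code-DecConnector\
JKLConnector=C:\source\code\Modules\Code-JKLConnector
"""

def find_path(name, ini_text=INI_TEXT):
    """Return the path stored under `name`, or None if there is no such key."""
    # interpolation=None stops a '%' in a path from being treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return parser.get("main", name, fallback=None)

print(find_path("DecConnector"))
```

Note that configparser lowercases option names by default, so lookups are case-insensitive unless the parser's optionxform is overridden.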

Cheers,
Chris

--
Follow the path of the Iguana...
http://rebertia.com
 
 
rdmurray@bitdance.com
 
      02-09-2009
Oltmans <(E-Mail Removed)> wrote:
> Here is the scenario:
>
> It's a command line program. I ask user for a input string. Based on
> that input string I retrieve text from a text file. My text file looks
> like following
>
> Text-file:
> -------------
> AbcManager=C:\source\code\Modules\Code-AbcManager\
> AbcTest=C:\source\code\Modules\Code-AbcTest\
> DecConnector=C:\source\code\Modules\Code-DecConnector\
> GHIManager=C:\source\code\Modules\Code-GHIManager\
> JKLConnector=C:\source\code\Modules\Code-JKLConnector
>
> -------------
>
> So now if I run the program and user enters
>
> DecConnector
>
> Then I'm supposed to show them this text "C:\source\code\Modules\Code-
> DecConnector" from the text-file. Right now I'm retrieving using the
> following code which seems quite ineffecient and inelegant at the same
> time
>
> with open('MyTextFile.txt') as file:
>     for line in file:
>         if mName in line: #mName is the string that contains user input
>             Path =str(line).strip('\n')
>             tempStr=Path
>             Path=tempStr.replace(mName+'=',"",1)


I've normalized your indentation and spacing, for clarity.

> I was wondering if using RegEx will make this look better. If so, can
> you please suggest a Regular Expression for this? Any help is highly
> appreciated. Thank you.


This smells like it might be homework, but I'm hoping you'll learn some
useful Python from what follows regardless of whether it is or not.

Since your complaint is that the above code is inelegant and inefficient,
let's clean it up. The first three lines that open the file and set up
your loop are good, and I think you will agree that they are pretty clean.
So, I'm just going to help you clean up the loop body.

'line' is already a string, since it was read from a file. No need to
wrap it in 'str':

Path = line.strip('\n')
tempStr=Path
Path=tempStr.replace(mName+'=',"",1)

'strip' removes characters from _both_ ends of the string. If you are
trying to make sure that you _only_ strip a trailing newline, then you
should be using rstrip. If, on the other hand, you just want to get
rid of any leading or trailing whitespace, you could just call 'strip()'.
Since your goal is to print the text from after the '=', I'll assume
that stripping whitespace is desirable:

Path = line.strip()
tempStr=Path
Path=tempStr.replace(mName+'=',"",1)
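
A small sketch of the difference between the three calls discussed above, using a made-up line that has padding and newlines at both ends:

```python
line = "\n  DecConnector=C:\\source  \n"   # made-up sample line

both_newlines = line.strip('\n')    # removes '\n' from both ends, keeps the spaces
trailing_only = line.rstrip('\n')   # removes only the trailing '\n'
all_whitespace = line.strip()       # removes all leading and trailing whitespace

print(repr(both_newlines))
print(repr(trailing_only))
print(repr(all_whitespace))
```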

The statement 'tempStr=Path' doesn't do what you think it does.
It just creates an alternate name for the string pointed to by Path.
Further, there is no need to have an intermediate variable to hold a
value during transformation. The right hand side is computed, using
the current values of any variables mentioned, and _then_ the left hand
side is rebound to point to the result of the computation. So we can
just drop that line entirely, and use 'Path' in the 'replace' statement:

Path = line.strip()
Path = Path.replace(mName+'=',"",1)
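
The rebinding behaviour described above can be checked directly. In this sketch the lowercase names and the sample string are illustrative only:

```python
# Illustrative names and data; not taken from the original program.
path = "DecConnector=C:\\source\\code"
temp = path                     # a second name for the same string object
same_object = temp is path      # no copy was made

# The right-hand side is evaluated first, then `path` is rebound to the result.
path = path.replace("DecConnector=", "", 1)

print(same_object)   # True
print(path)          # C:\source\code
print(temp)          # DecConnector=C:\source\code (the old string is untouched)
```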

However, you can also chain method calls, so really there's no need for
two statements here, since both calls are simple:

Path = line.strip().replace(mName+'=',"",1)

To make things even simpler, Python strings have a 'split' method. Given the
syntax of your input file I think we can assume that '=' never appears
in a variable name. split returns a list of strings constructed by
breaking the input string at the split character, and it has an optional
argument that gives the maximum number of splits to make. So by doing
"split('=', 1)", we will get back a list consisting of the variable name
and the remainder of the line. The remainder of the line is exactly
what you are looking for, and that will be the second element of the
returned list. So now your loop body is:

Path = line.strip().split('=', 1)[1]

and your whole loop looks like this:

with open('MyTextFile.txt') as file:
    for line in file:
        if mName in line:
            Path = line.strip().split('=', 1)[1]

I think that looks pretty elegant. Oh, and you might want to add a
'break' statement to the loop, and also an 'else:' clause (to the for
loop) so you can issue a 'not found' message to the user if they type
in a name that does not appear in the input file.
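
A sketch of what the loop might look like with the 'break' and the for/else clause added. The function name lookup and the sample lines are made up for illustration. It also swaps the 'in' test for startswith(name + '='), so that a partial name such as "Manager" does not match "AbcManager"; that is a change from the loop above, not part of it.

```python
def lookup(lines, name):
    """Return the text after 'name=' on the first matching line, else None."""
    for line in lines:
        if line.startswith(name + '='):
            path = line.strip().split('=', 1)[1]
            break  # first match wins; no need to read further
    else:
        # Runs only when the loop ended without hitting `break`.
        path = None
    return path

sample_lines = [
    "AbcManager=C:\\source\\code\\Modules\\Code-AbcManager\\\n",
    "DecConnector=C:\\source\\code\\Modules\\Code-DecConnector\\\n",
]

result = lookup(sample_lines, "DecConnector")
print(result if result is not None else "not found")
```

Since iterating over an open file yields its lines, the same function should work when passed the file object from the 'with' statement.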

--RDM

 
 
Paul McGuire
 
      02-09-2009
On Feb 9, 11:22 am, Oltmans <(E-Mail Removed)> wrote:
> Here is the scenario:
>
> It's a command line program. I ask user for a input string. Based on
> that input string I retrieve text from a text file. My text file looks
> like following
>
> Text-file:
> -------------
> AbcManager=C:\source\code\Modules\Code-AbcManager\
> AbcTest=C:\source\code\Modules\Code-AbcTest\
> DecConnector=C:\source\code\Modules\Code-DecConnector\
> GHIManager=C:\source\code\Modules\Code-GHIManager\
> JKLConnector=C:\source\code\Modules\Code-JKLConnector
>


Assuming the text file is under about 30 MB, I would just read
the whole thing into a dict at startup, and then use the dict over and
over.

data = file(filename).read()
lookup = dict(line.split('=', 1) for line in data.splitlines() if line)

# now no further need to access text file, just use lookup variable

while True:
    user_entry = raw_input("Lookup key: ").strip()
    if not user_entry:
        break
    if user_entry in lookup:
        print lookup[user_entry]
    else:
        print "No entry for '%s'" % user_entry
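
The code above uses Python 2 idioms (file(), raw_input, the print statement). A rough Python 3 sketch of the dict-building step follows. The name build_lookup and the sample text are assumptions for illustration, and unlike the one-liner above it skips any line without an '=' rather than raising an error.

```python
# Made-up sample standing in for the contents of the text file.
SAMPLE = (
    "AbcManager=C:\\source\\code\\Modules\\Code-AbcManager\\\n"
    "\n"
    "DecConnector=C:\\source\\code\\Modules\\Code-DecConnector\\\n"
    "Odd=key=with=equals\n"
)

def build_lookup(text):
    """Map each name to everything after the first '=' on its line."""
    # maxsplit=1 keeps any later '=' inside the value.
    return dict(line.split('=', 1) for line in text.splitlines() if '=' in line)

lookup = build_lookup(SAMPLE)
print(lookup.get("DecConnector", "No entry"))
print(lookup["Odd"])
```

With a real file, the text could be obtained with "with open(filename) as f: text = f.read()", which also closes the file once it has been read.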
 