Velocity Reviews - Computer Hardware Reviews

Using wildcards...

 
 
Harlin Seritt
 
      05-02-2005
I looked all over the net but could not find if it is possible to
insert wildcards into strings. What I am trying to do is this: I am
trying to parse text from a Bible file. In case you're not familiar
with the way the Bible organizes itself, it is broken down into Books >
Chapters > Verses. The particular text I am working with is organized
into Book files (*.txt -- flat text files). Here is what the file looks
like:

{1:1} Random text here. {1:2} More text here. and so on.

Of course the {*} can be of any length, so I can't just do .split()
based on the length of the bracket text. What I would like to do is to
.split() using something akin to this:

textdata.split('{*}') # The '*' being a wildcard

Is this possible to do? If so, how is it done?

Thanks,

Harlin Seritt

 
 
Roel Schroeven
 
      05-02-2005
Harlin Seritt wrote:

> I looked all over the net but could not find if it is possible to
> insert wildcards into strings. What I am trying to do is this: I am
> trying to parse text from a Bible file. In case you're not familiar
> with the way the Bible organizes itself, it is broken down into Books >
> Chapters > Verses. The particular text I am working with are organized
> into Book files (*.txt -- flat text file). Here is what the file looks
> like:
>
> {1:1} Random text here. {1:2} More text here. and so on.
>
> Of course the {*} can be of any length, so I can't just do .split()
> based on the length of the bracket text. What I would like to do is to
> .split() using something akin to this:
>
> textdata.split('{*}') # The '*' being a wildcard
>
> Is this possible to do? If so, how is it done?


You can use the split function in the re module with a suitable regular
expression:

>>> import re
>>> re.split(r'{\d+:\d+}', textdata)
['', ' Random text here. ', ' More text here. and so on.']

{\d+:\d+} means 'match {, then one or more digits, then :, then one or
more digits, then }'.

re.split('{.*}', textdata) would be a more direct translation of your
wildcard, but that doesn't work: .* matches as much as possible, so in
your example it would match '{1:1} Random text here. {1:2}' instead of
just '{1:1}' and '{1:2}'.
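
To make the difference concrete, here is a small sketch comparing the greedy wildcard with its non-greedy form, .*?, which stops at the first closing brace. The sample string is the one from the question; the names greedy and lazy are only for illustration.

```python
import re

textdata = '{1:1} Random text here. {1:2} More text here. and so on.'

# Greedy: .* runs to the LAST '}' in the string, so one long match is removed.
greedy = re.split(r'\{.*\}', textdata)

# Non-greedy: .*? stops at the FIRST '}' after each '{'.
lazy = re.split(r'\{.*?\}', textdata)

print(greedy)  # ['', ' More text here. and so on.']
print(lazy)    # ['', ' Random text here. ', ' More text here. and so on.']
```

The non-greedy version gives the same result as the explicit \d+:\d+ pattern here, but it would also split on any other text that happens to sit between braces.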

--
If I have been able to see further, it was only because I stood
on the shoulders of giants. -- Isaac Newton

Roel Schroeven
 
 
George Yoshida
 
      05-02-2005
Harlin Seritt wrote:
> {1:1} Random text here. {1:2} More text here. and so on.
>
> Of course the {*} can be of any length, so I can't just do .split()
> based on the length of the bracket text. What I would like to do is to
> .split() using something akin to this:
>
> textdata.split('{*}') # The '*' being a wildcard
>
> Is this possible to do? If so, how is it done?
>


You should look into the re module.
Regular expressions offer more flexible text processing than the
string module or string methods.

- Regular expression operations
http://docs.python.org/lib/module-re.html
- HOWTO
http://www.amk.ca/python/howto/regex/

In your case, the code would go like this:

>>> import re
>>> text = '{1:1} Random text here. {1:2} More text here. and so on.'
>>> pattern = re.compile(r'{\d+:\d+}')
>>> pattern.split(text)
['', ' Random text here. ', ' More text here. and so on.']
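
Note that the result starts with an empty string (the text before the first marker) and that each piece keeps its surrounding spaces. If that gets in the way, one possible cleanup is sketched below. The list name verses is just an illustration, and this assumes you do not need to keep empty verses.

```python
import re

text = '{1:1} Random text here. {1:2} More text here. and so on.'
pattern = re.compile(r'\{\d+:\d+\}')

# Strip surrounding whitespace and drop pieces that are empty afterwards.
verses = [piece.strip() for piece in pattern.split(text) if piece.strip()]

print(verses)  # ['Random text here.', 'More text here. and so on.']
```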

--
george

http://www.dynkin.com/
 
 
Kent Johnson
 
      05-02-2005
Harlin Seritt wrote:
> I looked all over the net but could not find if it is possible to
> insert wildcards into strings. What I am trying to do is this: I am
> trying to parse text from a Bible file. In case you're not familiar
> with the way the Bible organizes itself, it is broken down into Books >
> Chapters > Verses. The particular text I am working with are organized
> into Book files (*.txt -- flat text file). Here is what the file looks
> like:
>
> {1:1} Random text here. {1:2} More text here. and so on.
>
> Of course the {*} can be of any length, so I can't just do .split()
> based on the length of the bracket text. What I would like to do is to
> .split() using something akin to this:
>
> textdata.split('{*}') # The '*' being a wildcard


You can do this with the re module. For example:

>>> import re
>>> s = '{1:1} Random text here. {1:2} More text here. and so on.'
>>> re.split(r'\{[^}]+\}', s)
['', ' Random text here. ', ' More text here. and so on.']

If you want to be a little stricter in what you accept for the split, you could look explicitly for
digits:

>>> re.split(r'\{\d+:\d+\}', s)
['', ' Random text here. ', ' More text here. and so on.']
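
If you also need to know which chapter:verse each piece of text belongs to, one option (a sketch, assuming the file always follows the {chapter:verse} layout shown above) is to put a capturing group around the reference. re.split then keeps the captured text in its result. The names parts and verses are only for illustration.

```python
import re

s = '{1:1} Random text here. {1:2} More text here. and so on.'

# The parentheses form a capturing group, so re.split returns the
# matched references in between the pieces of text.
parts = re.split(r'\{(\d+:\d+)\}', s)
# parts[0] is whatever precedes the first marker (empty here); after
# that, references and verse texts alternate.
verses = {ref: text.strip() for ref, text in zip(parts[1::2], parts[2::2])}

print(parts)   # ['', '1:1', ' Random text here. ', '1:2', ' More text here. and so on.']
print(verses)  # {'1:1': 'Random text here.', '1:2': 'More text here. and so on.'}
```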

Kent
 
 
Harlin Seritt
 
      05-02-2005
George, that is what I'm looking for. Thanks, Harlin

 