Tag parsing in python

 
 
agnibhu
 
      08-28-2010
Hi all,

I'm a newbie in Python. I'm trying to create a library for parsing
certain keywords.
For example, say I have keywords like abc:, bcd:, cde: and so on. The
user may write something like
abc: How are you bcd: I'm fine cde: ok

So I have to extract "How are you", "I'm fine" and "ok", and
assign them to abc:, bcd: and cde: respectively. Combinations of
keywords may be introduced in the future, for example
abc: xy: How are you
so that new keywords qualify other keywords, and so on.
So I would like to know the Python way of doing this. Is there an
existing library that would make my work easier?

~
Agnibhu
 
Tim Chase
 
      08-28-2010
On 08/28/10 11:14, agnibhu wrote:
> For example, say I have keywords like abc:, bcd:, cde: and so on. The
> user may write something like
> abc: How are you bcd: I'm fine cde: ok
>
> So I have to extract "How are you", "I'm fine" and "ok", and
> assign them to abc:, bcd: and cde: respectively.


For this, you can do something like

>>> s = "abc: how are you bcd: I'm fine cde: ok"
>>> import re
>>> r = re.compile(r'(\w+):\s*((?:[^:](?!\w+:))*)')
>>> r.findall(s)

[('abc', 'how are you'), ('bcd', "I'm fine"), ('cde', 'ok')]

Yes, it's a bit of a gnarled regexp, but it seems to do the job.
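
As a minimal sketch of how a pattern like this could be wrapped for reuse, the snippet below collects the (keyword, text) pairs into a dict. The names TAG_RE and parse_tags are hypothetical, not from any library, and a repeated keyword would simply overwrite the earlier value.

```python
import re

# A keyword followed by a colon, then every character that is not a colon
# and is not immediately followed by the next "word:" tag.
TAG_RE = re.compile(r'(\w+):\s*((?:[^:](?!\w+:))*)')

def parse_tags(text):
    """Return a dict mapping each keyword to the text that follows it."""
    return {key: value.strip() for key, value in TAG_RE.findall(text)}

result = parse_tags("abc: How are you bcd: I'm fine cde: ok")
print(result)
```

A dict is convenient for lookups, but it loses the order and any duplicate keywords; keeping the list returned by findall avoids that.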

> Combinations of keywords may be introduced in the future, for example
> abc: xy: How are you
> so that new keywords qualify other keywords, and so on.


I'm not sure I understand this bit of what you're asking. If you
have

s = "abc: xy: How are you"

why should that not be parsed as

>>> r.findall("abc: xy: How are you")

[('abc', ''), ('xy', 'How are you')]

as your initial description prescribes?

-tkc





 
Josh English
 
      08-28-2010


Have you looked at pyparsing? (http://pyparsing.wikispaces.com/) It may
be possible to use that library to do this.

Josh



 
 
Paul McGuire
 
      08-29-2010


Here's how pyparsing can parse your keyword/tags:

from pyparsing import (Combine, Word, alphas, Group, OneOrMore, empty,
    SkipTo, LineEnd)

text1 = "abc: How are you bcd: I'm fine cde: ok"
text2 = "abc: xy: How are you"

# a tag is a word immediately followed by a colon, e.g. "abc:"
tag = Combine(Word(alphas) + ":")
# one or more tags, then the body text up to the next tag or end of line
tag_defn = (Group(OneOrMore(tag))("tag") + empty
            + SkipTo(tag | LineEnd())("body"))

for text in (text1, text2):
    print(text)
    for td in tag_defn.searchString(text):
        print(td.dump())
    print()

Prints:

abc: How are you bcd: I'm fine cde: ok
[['abc:'], 'How are you']
- body: How are you
- tag: ['abc:']
[['bcd:'], "I'm fine"]
- body: I'm fine
- tag: ['bcd:']
[['cde:'], 'ok']
- body: ok
- tag: ['cde:']

abc: xy: How are you
[['abc:', 'xy:'], 'How are you']
- body: How are you
- tag: ['abc:', 'xy:']
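
For comparison, roughly the same grouping of consecutive tags can be sketched with only the standard library's re module. This is a swapped-in technique, not the pyparsing approach itself; TAG_GROUP_RE and parse_tag_groups are hypothetical names, and tags are assumed to be purely alphabetic words followed by a colon.

```python
import re

# One or more consecutive "word:" tags, then the shortest body that runs
# up to the next tag or to the end of the line.
TAG_GROUP_RE = re.compile(
    r'((?:[A-Za-z]+:\s*)+)(.*?)(?=\s*[A-Za-z]+:|$)', re.MULTILINE)

def parse_tag_groups(text):
    """Return a list of (tags, body) pairs, where tags is a tuple of strings."""
    results = []
    for tags, body in TAG_GROUP_RE.findall(text):
        results.append((tuple(tags.split()), body.strip()))
    return results

for sample in ("abc: How are you bcd: I'm fine cde: ok",
               "abc: xy: How are you"):
    print(parse_tag_groups(sample))
```

The regex version is shorter, but it gets harder to extend once tags need escaping or nesting, which is where a grammar-based parser tends to pay off.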



Now here's how to further use pyparsing to actually use those tags as
substitution macros:

from pyparsing import (Forward, MatchFirst, Literal, And, replaceWith,
    FollowedBy)

# now combine macro detection with substitution
macros = {}
macro_substitution = Forward()
def make_macro_sub(tokens):
    # save this tag (or tag combination) and its body
    macros[tuple(tokens.tag)] = tokens.body

    # define macro substitution
    macro_substitution << MatchFirst(
        [(Literal(k[0]) if len(k)==1
            else And([Literal(kk) for kk in k])).setParseAction(replaceWith(v))
                for k,v in macros.items()] ) + ~FollowedBy(tag)

    # return empty string, so the definition is removed from the output
    return ""
tag_defn.setParseAction(make_macro_sub)

scan_pattern = macro_substitution | tag_defn

test_text = (text1 + "\nBob said, 'abc' I said, 'bcd:.'" + text2 +
    "\nThen Bob said 'abc: xy'")

print(test_text)
print(scan_pattern.transformString(test_text))


Prints:

abc: How are you bcd: I'm fine cde: ok
Bob said, 'abc' I said, 'bcd:.'abc: xy: How are you
Then Bob said 'abc: xy'

Bob said, 'How are you?' I said, 'I'm fine.'
Then Bob said 'How are you?'

 
 
Paul McGuire
 
      08-29-2010


I got to thinking more about your keywords-qualifying-keywords
example, and I thought this would be a good way to support
locale-specific tags. I also thought about how one might want tags
within tags, to be substituted later. That requires an escaped form of
"abc:", written "abc::", so that the embedded tag is replaced with the
value of tag "abc:" as a late binding.

Wasn't too hard to modify what I posted yesterday, and now I rather
like it.

-- Paul


# tag_substitute.py

from pyparsing import (Combine, Word, alphas, FollowedBy, Group, OneOrMore,
    empty, SkipTo, LineEnd, Optional, Forward, MatchFirst, Literal,
    And, replaceWith)

# a tag is a word followed by a single ':' (a word followed by '::' is
# the escaped form, not a tag)
tag = Combine(Word(alphas) + ~FollowedBy("::") + ":")
tag_defn = (Group(OneOrMore(tag))("tag") + empty
            + SkipTo(tag | LineEnd())("body") + Optional(LineEnd().suppress()))


# now combine macro detection with substitution
macros = {}
macro_substitution = Forward()
def make_macro_sub(tokens):
    # unescape '::' and substitute any embedded tags
    tag_value = macro_substitution.transformString(
        tokens.body.replace("::", ":"))

    # save this tag and value (or overwrite previous)
    macros[tuple(tokens.tag)] = tag_value

    # define overall macro substitution expression
    macro_substitution << MatchFirst(
        [(Literal(k[0]) if len(k)==1
            else And([Literal(kk) for kk in k])).setParseAction(replaceWith(v))
                for k,v in macros.items()] ) + ~FollowedBy(tag)

    # return empty string, so macro definitions don't show up in final
    # expanded text
    return ""

tag_defn.setParseAction(make_macro_sub)

# define pattern for macro scanning
scan_pattern = macro_substitution | tag_defn


sorry = """\
nm: Dave
sorry: en: I'm sorry, nm::, I'm afraid I can't do that.
sorry: es: Lo siento nm::, me temo que no puedo hacer eso.
Hal said, "sorry: en:"
Hal dijo, "sorry: es:" """
print(scan_pattern.transformString(sorry))

Prints:

Hal said, "I'm sorry, Dave, I'm afraid I can't do that."
Hal dijo, "Lo siento Dave, me temo que no puedo hacer eso."
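
To make the define-then-substitute flow easier to follow, here is a rough standard-library sketch of the same idea, without pyparsing. It is a simplification under stated assumptions: each definition sits on its own line, tags are alphabetic words followed by a single colon, and substitution is plain string replacement. DEFN_RE, substitute and expand are hypothetical names.

```python
import re

# A definition line: one or more "word:" tags (but not the escaped "word::")
# at the start of the line, followed by the body.
DEFN_RE = re.compile(r'^((?:[A-Za-z]+:(?!:)\s*)+)(.*)$')

def substitute(s, macros):
    """Replace every known tag sequence in s with its stored value."""
    # Longest keys first, so "sorry: en:" is tried before any shorter key.
    for key in sorted(macros, key=len, reverse=True):
        s = s.replace(key, macros[key])
    return s

def expand(text):
    """Collect definitions line by line and expand tags in the other lines."""
    macros = {}
    output = []
    for line in text.splitlines():
        m = DEFN_RE.match(line)
        if m:
            key = " ".join(m.group(1).split())
            # Unescape "name::" to "name:" and expand tags defined so far.
            macros[key] = substitute(m.group(2).replace("::", ":"), macros)
        else:
            output.append(substitute(line, macros))
    return "\n".join(output)

text = '''\
nm: Dave
sorry: en: I'm sorry, nm::, I'm afraid I can't do that.
sorry: es: Lo siento nm::, me temo que no puedo hacer eso.
Hal said, "sorry: en:"
Hal dijo, "sorry: es:"
'''
result = expand(text)
print(result)
```

Note that this sketch expands an escaped tag when the definition is read, so a tag has to be defined before any definition that embeds it.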
 
 
agnibhu
 
      08-30-2010


Thanks, all, for the great solutions. I'm happy to see the
responses.
I'll try these out and post a reply soon.

Thanks once again,
Agnibhu..
 