splitting a words of a line

 
 
Sumit
      12-06-2007
Hi,
I am trying to split a line which has the format below:

AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] "162.44.245.32 CN=dddd
cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk
Secure,DC=customer,DC=rxcorp,DC=com" "plysmhc03zp GET /mci/performance/
SelectProducts.aspx?
p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]

Here are all the strings I want to split it into:
---------------------------------
AzAccept
PLYSSTM01
[23/Sep/2005:16:14:28 -0500]
162.44.245.32
CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk
Secure,DC=customer,DC=rxcorp,DC=com"
GET
/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc
d4b62ca2-09a0-4334622b-0e1c-03c42ba5
0
--------------------------------

I am trying to use the re.split() method to split them, but I am unable
to get the exact result.

Any help on this is highly appreciated.

Thanks
Sumit
 
John Machin
 
      12-06-2007
On Dec 7, 2:21 am, Sumit <sumit.na...@gmail.com> wrote:
> Hi,
> I am trying to split a line which has the format below:
>
> AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] "162.44.245.32 CN=dddd
> cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk
> Secure,DC=customer,DC=rxcorp,DC=com" "plysmhc03zp GET /mci/performance/
> SelectProducts.aspx?
> p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]


Because lines are mangled in transmission, it is rather difficult to
guess exactly what you have in your input and what your expected
results are.

Also you don't show exactly what you have tried.

At the end is a small script that contains my guess as to your input
and expected results, shows an example of what the re.VERBOSE flag is
intended for, and how you might debug your results.

So that you don't get your homework done 100% for free, I haven't
corrected the last mistake I made.

As usual, re may not be the best way of doing this exercise. Your
*single* piece of evidence may not be enough. It appears to be a
horrid conglomeration of instances of different things, each with its
own grammar. You may find that something like PyParsing would be more
legible and more robust.
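
To make the "each with its own grammar" point concrete, here is a minimal two-stage sketch that uses only the standard library. It is not from the original posts: the names TOKEN and tokenize are made up for illustration, the syntax is Python 3 (the code elsewhere in this thread is Python 2), and it assumes that brackets and double quotes are never nested or escaped inside a field. Stage 1 finds the top-level fields with re.finditer() instead of re.split(); stage 2 splits each quoted field by its own rule.

```python
import re

# Sample line reconstructed from the thread (split here only for readability).
line = (
    'AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] '
    '"162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,'
    'OU=kkk Secure,DC=customer,DC=rxcorp,DC=com" '
    '"plysmhc03zp GET /mci/performance/SelectProducts.aspx?'
    'p=0&V=C&a=29&menu=adhoc" '
    '[d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]'
)

# Stage 1: a top-level field is [bracketed], "quoted", or a bare word.
TOKEN = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')


def tokenize(text):
    """Return the top-level fields with brackets or quotes stripped."""
    # Only one alternative matches per token, so lastindex names its group.
    return [m.group(m.lastindex) for m in TOKEN.finditer(text)]


fields = tokenize(line)

# Stage 2: each quoted field has its own small grammar.
ip, dn = fields[3].split(None, 1)        # IP address, then the rest
host, method, url = fields[4].split()    # three whitespace-separated parts

result = fields[:3] + [ip, dn, host, method, url] + fields[5:]
for item in result:
    print(item)
```

Finding tokens tends to be easier than splitting here because the separators (spaces) also occur inside the bracketed and quoted fields.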

>
> Here are all the strings I want to split it into:
> ---------------------------------
> AzAccept
> PLYSSTM01
> [23/Sep/2005:16:14:28 -0500]
> 162.44.245.32
> CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk
> Secure,DC=customer,DC=rxcorp,DC=com"
> GET
> /mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc
> d4b62ca2-09a0-4334622b-0e1c-03c42ba5
> 0
> --------------------------------
>
> I am trying to use the re.split() method to split them, but I am unable
> to get the exact result.
>


C:\junk>type sumit.py
import re

textin = \
    """AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] "162.44.245.32 CN=dddd """ \
    """cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk """ \
    """Secure,DC=customer,DC=rxcorp,DC=com" "plysmhc03zp GET /mci/performance/""" \
    """SelectProducts.aspx?""" \
    """p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]"""

expected = [
    "AzAccept",
    "PLYSSTM01",
    "23/Sep/2005:16:14:28 -0500",
    "162.44.245.32",
    "CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com",
    "plysmhc03zp",
    "GET",
    "/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc",
    "d4b62ca2-09a0-4334622b-0e1c-03c42ba5",
    "0",
    ]

pattern = r"""
    (\S+) # AzAccept
    \s+
    (\S+) # PLYSSTM01
    \s+\[
    ([^]]+) # 23/Sep/2005:16:14:28 -0500
    ]\s+"
    (\S+) # 162.44.245.32
    \s+
    ([^"]+) # CN=dddd cojack (890),OU=1, etc etc,DC=rxcorp,DC=com
    "\s+"
    (\S+) # plysmhc03zp
    \s+
    (\S+) # GET
    \s+
    (\S+) # /mci/performance/ ... menu=adhoc
    \s+\[
    ([^]]+) # d4b62ca2-09a0-4334622b-0e1c-03c42ba5
    ]\s+\[
    ([^]]+) # 0
    ]$
    """

mobj = re.match(pattern, textin, re.VERBOSE)
if not mobj:
    print "Bzzzt!"
else:
    result = mobj.groups()
    print "len check", len(result) == len(expected), len(result), len(expected)
    for a, b in zip(result, expected):
        print a == b, repr(a), repr(b)



C:\junk>python sumit.py
len check True 10 10
True 'AzAccept' 'AzAccept'
True 'PLYSSTM01' 'PLYSSTM01'
True '23/Sep/2005:16:14:28 -0500' '23/Sep/2005:16:14:28 -0500'
True '162.44.245.32' '162.44.245.32'
True 'CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com' 'CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com'
True 'plysmhc03zp' 'plysmhc03zp'
True 'GET' 'GET'
False '/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc"' '/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc'
True 'd4b62ca2-09a0-4334622b-0e1c-03c42ba5' 'd4b62ca2-09a0-4334622b-0e1c-03c42ba5'
True '0' '0'

C:\junk>
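
The single False row in the output above comes from the (\S+) group for the request path: \S+ also swallows the closing double quote. One possible correction is sketched below. It is an editorial guess at the intended fix, not John Machin's own code, and it is written in Python 3 syntax. The path group becomes ([^"\s]+), which stops before the quote, and the quote is then matched explicitly. It assumes the path never contains a double quote or whitespace.

```python
import re

# Same sample line as in the script above, joined into one string.
textin = (
    'AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] '
    '"162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,'
    'OU=kkk Secure,DC=customer,DC=rxcorp,DC=com" '
    '"plysmhc03zp GET /mci/performance/SelectProducts.aspx?'
    'p=0&V=C&a=29&menu=adhoc" '
    '[d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]'
)

pattern = r"""
    (\S+)       # AzAccept
    \s+
    (\S+)       # PLYSSTM01
    \s+\[
    ([^]]+)     # timestamp
    ]\s+"
    (\S+)       # IP address
    \s+
    ([^"]+)     # distinguished name
    "\s+"
    (\S+)       # host
    \s+
    (\S+)       # HTTP method
    \s+
    ([^"\s]+)   # request path, stopping before the closing quote
    "\s+\[
    ([^]]+)     # request id
    ]\s+\[
    ([^]]+)     # trailing counter
    ]$
"""

match = re.match(pattern, textin, re.VERBOSE)
groups = match.groups() if match else ()
for group in groups:
    print(repr(group))
```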
 
Paul McGuire
 
      12-07-2007
On Dec 6, 9:21 am, Sumit <sumit.na...@gmail.com> wrote:
> Hi,
> I am trying to split a line which has the format below:
>
> AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500] "162.44.245.32 CN=dddd
> cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk
> Secure,DC=customer,DC=rxcorp,DC=com" "plysmhc03zp GET /mci/performance/
> SelectProducts.aspx?
> p=0&V=C&a=29&menu=adhoc" [d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]
>


As John Machin mentioned, pyparsing may be helpful to you. Here is a
simple version:

data = """AzAccept PLYSSTM01 [23/Sep/2005:16:14:28 -0500]
"162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com"
"plysmhc03zp GET /mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc"
[d4b62ca2-09a0-4334622b-0e1c-03c42ba5] [0]"""

# Version 1 - simple
from pyparsing import *
LBRACK,RBRACK,COMMA = map(Suppress,"[],")
num = Word(nums)
date = Combine(num+"/"+Word(alphas)+"/"+num+":"+num+":"+num+":"+num) + \
       oneOf("+ -") + num
date.setParseAction(keepOriginalText)
uuid = delimitedList(Word(hexnums),"-",combine=True)
logString = Word(alphas,alphanums) + Word(alphas,alphanums) + \
            LBRACK + date + RBRACK + quotedString + quotedString + \
            LBRACK + uuid + RBRACK + LBRACK + Word(nums) + RBRACK

print logString.parseString(data)

Prints out:
['AzAccept', 'PLYSSTM01', '23/Sep/2005:16:14:28 -0500',
'"162.44.245.32 CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,OU=kkk Secure,DC=customer,DC=rxcorp,DC=com"',
'"plysmhc03zp GET /mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc"',
'd4b62ca2-09a0-4334622b-0e1c-03c42ba5', '0']


And here is a slightly fancier version, which parses the quoted
strings (uses the pprint pretty-printing module to show structure of
the parsed results):

# Version 2 - fancy
from pyparsing import *
LBRACK,RBRACK,COMMA = map(Suppress,"[],")
num = Word(nums)
date = Combine(num+"/"+Word(alphas)+"/"+num+":"+num+":"+num+":"+num) + \
       oneOf("+ -") + num
date.setParseAction(keepOriginalText)
uuid = delimitedList(Word(hexnums),"-",combine=True)

ipAddr = delimitedList(Word(nums),".",combine=True)
keyExpr=Word(alphas.upper())
valExpr=CharsNotIn(',')
qs1Expr = ipAddr + Group(delimitedList(Combine(keyExpr + '=' + valExpr)))
def parseQS1(t):
    return qs1Expr.parseString(t[0])
def parseQS2(t):
    return t[0].split()

qs1 = quotedString.copy().setParseAction(removeQuotes, parseQS1)
qs2 = quotedString.copy().setParseAction(removeQuotes, parseQS2)

logString = Word(alphas,alphanums) + Word(alphas,alphanums) + \
            LBRACK + date + RBRACK + qs1 + qs2 + \
            LBRACK + uuid + RBRACK + LBRACK + Word(nums) + RBRACK

from pprint import pprint
pprint(logString.parseString(data).asList())

Prints:
['AzAccept',
 'PLYSSTM01',
 '23/Sep/2005:16:14:28 -0500',
 '162.44.245.32',
 ['CN=dddd cojack (890)',
  'OU=1',
  'OU=Customers',
  'OU=ISM-Users',
  'OU=kkk Secure',
  'DC=customer',
  'DC=rxcorp',
  'DC=com'],
 'plysmhc03zp',
 'GET',
 '/mci/performance/SelectProducts.aspx?p=0&V=C&a=29&menu=adhoc',
 'd4b62ca2-09a0-4334622b-0e1c-03c42ba5',
 '0']
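
For readers without pyparsing installed, the second-stage split of the first quoted field (the comma-separated KEY=value list) can be approximated with plain string methods. This is a rough sketch in Python 3 and not part of Paul's post. The helper name split_dn is hypothetical, and it makes the same assumption as CharsNotIn(',') above: a value never contains a comma. Real directory names can escape commas with a backslash, which this sketch does not handle.

```python
def split_dn(dn):
    """Split 'CN=a,OU=b' into [('CN', 'a'), ('OU', 'b')].

    Returns a list of pairs rather than a dict because keys such as
    OU and DC repeat. Assumes no commas occur inside the values.
    """
    pairs = []
    for part in dn.split(","):
        # partition() splits on the first '=' only, so later '=' stay in the value.
        key, _, value = part.partition("=")
        pairs.append((key.strip(), value.strip()))
    return pairs


dn = ("CN=dddd cojack (890),OU=1,OU=Customers,OU=ISM-Users,"
      "OU=kkk Secure,DC=customer,DC=rxcorp,DC=com")
pairs = split_dn(dn)
for key, value in pairs:
    print(key, value)
```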

Find more about pyparsing at http://pyparsing.wikispaces.com.

-- Paul


 