Partition Recursive

 
 
macm
12-23-2010
Hi Folks

I have this:

url = 'http://docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition'

So I want to convert it to:

myList = ['http', ':', '//', 'docs', '.', 'python', '.', 'org', '/', 'dev', '/', 'library', '/',
          'stdtypes', '.', 'html', '?', 'highlight', '=', 'partition', '#', 'str', '.', 'partition']

The reserved chars are:

specialMeaning = ["//", ";", "/", "?", ":", "@", "=", "&", "#"]

Regards

Mario
 
 
 
 
 
MRAB
12-23-2010
On 23/12/2010 17:26, macm wrote:
> Hi Folks
>
> I have this:
>
> url = 'http://docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition'
>
> So I want to convert it to:
>
> myList = ['http', ':', '//', 'docs', '.', 'python', '.', 'org', '/', 'dev', '/', 'library', '/',
>           'stdtypes', '.', 'html', '?', 'highlight', '=', 'partition', '#', 'str', '.', 'partition']
>
> The reserved chars are:
>
> specialMeaning = ["//", ";", "/", "?", ":", "@", "=", "&", "#"]
>

I would use re.findall.
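
MRAB doesn't show the pattern, but a minimal sketch of how the re.findall approach might look (with '.' added to the reserved set so the output matches the list above):

import re

url = ('http://docs.python.org/dev/library/stdtypes.html'
       '?highlight=partition#str.partition')

# Match either the two-character '//' token, a single reserved
# character, or a run of anything that is not reserved.
tokens = re.findall(r'//|[;/?:@=&#.]|[^;/?:@=&#.]+', url)
print(tokens)
# ['http', ':', '//', 'docs', '.', 'python', '.', 'org', '/', 'dev', ...]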
 
 
 
 
 
Jon Clements
12-23-2010
On Dec 23, 5:26 pm, macm <(E-Mail Removed)> wrote:
> Hi Folks
>
> I have this:
>
> url = 'http://docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition'
>
> So I want to convert it to:
>
> myList = ['http', ':', '//', 'docs', '.', 'python', '.', 'org', '/', 'dev', '/', 'library', '/',
>           'stdtypes', '.', 'html', '?', 'highlight', '=', 'partition', '#', 'str', '.', 'partition']
>
> The reserved chars are:
>
> specialMeaning = ["//", ";", "/", "?", ":", "@", "=", "&", "#"]
>
> Regards
>
> Mario


I would use urlparse.urlsplit, then split further, if required.

>>> urlsplit(url)
SplitResult(scheme='http', netloc='docs.python.org', path='/dev/library/stdtypes.html',
            query='highlight=partition', fragment='str.partition')
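
A rough sketch of that two-step idea (the second step is my own illustration, not from Jon's post): urlsplit for the coarse structure, then a further split of each piece on the remaining reserved characters.

from urlparse import urlsplit   # Python 2, as in the thread; Python 3 uses urllib.parse
import re

url = ('http://docs.python.org/dev/library/stdtypes.html'
       '?highlight=partition#str.partition')

parts = urlsplit(url)
# Split one component further, keeping '.' and '/' as separate items.
print([t for t in re.split(r'([./])', parts.path) if t])
# ['/', 'dev', '/', 'library', '/', 'stdtypes', '.', 'html']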



Jon.
 
 
macm
12-23-2010
Hi

urlparse isn't an option.

My result must be:

myList = ['http', ':', '//', 'docs', '.', 'python', '.', 'org', '/', 'dev', '/', 'library', '/',
          'stdtypes', '.', 'html', '?', 'highlight', '=', 'partition', '#', 'str', '.', 'partition']

The re module is slow.

Even if I loop over urlparse.urlsplit I can lose the specialMeaning order.

It seems easy, but the best approach would be recursive.
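
macm doesn't post the recursive version; a minimal sketch of what a str.partition-based recursive splitter could look like (my own illustration, with '.' added to the separator list so the output matches the list above):

specialMeaning = ["//", ";", "/", "?", ":", "@", "=", "&", "#", "."]

def split_url(text, seps=specialMeaning):
    # Find the leftmost separator occurrence, preferring the longer
    # separator ('//' over '/') when two start at the same position.
    best = None
    for sep in seps:
        pos = text.find(sep)
        if pos != -1 and (best is None or pos < best[0] or
                          (pos == best[0] and len(sep) > len(best[1]))):
            best = (pos, sep)
    if best is None:
        return [text] if text else []
    head, sep, tail = text.partition(best[1])   # split at that first occurrence
    return split_url(head, seps) + [sep] + split_url(tail, seps)

# split_url(url) -> ['http', ':', '//', 'docs', '.', 'python', '.', 'org', ...]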

Regards

Mario




On Dec 23, 3:57 pm, Jon Clements <(E-Mail Removed)> wrote:
> On Dec 23, 5:26 pm, macm <(E-Mail Removed)> wrote:
> > Hi Folks
> >
> > I have this:
> >
> > url = 'http://docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition'
> >
> > So I want to convert it to:
> >
> > myList = ['http', ':', '//', 'docs', '.', 'python', '.', 'org', '/', 'dev', '/', 'library', '/',
> >           'stdtypes', '.', 'html', '?', 'highlight', '=', 'partition', '#', 'str', '.', 'partition']
> >
> > The reserved chars are:
> >
> > specialMeaning = ["//", ";", "/", "?", ":", "@", "=", "&", "#"]
> >
> > Regards
> >
> > Mario
>
> I would use urlparse.urlsplit, then split further, if required.
>
> >>> urlsplit(url)
> SplitResult(scheme='http', netloc='docs.python.org', path='/dev/library/stdtypes.html',
>             query='highlight=partition', fragment='str.partition')
>
> Jon.

 
 
kj
12-24-2010
In <(E-Mail Removed)> macm <(E-Mail Removed)> writes:

> url = 'http://docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition'
>
> So I want to convert it to:
>
> myList = ['http', ':', '//', 'docs', '.', 'python', '.', 'org', '/', 'dev', '/', 'library', '/',
>           'stdtypes', '.', 'html', '?', 'highlight', '=', 'partition', '#', 'str', '.', 'partition']
>
> The reserved chars are:
>
> specialMeaning = ["//", ";", "/", "?", ":", "@", "=", "&", "#"]

You forgot '.'.

>>> import re  # sorry
>>> sp = re.compile('(//?|[;?:@=&#.])')
>>> filter(len, sp.split(url))
['http', ':', '//', 'docs', '.', 'python', '.', 'org', '/', 'dev', '/', 'library', '/', 'stdtypes',
 '.', 'html', '?', 'highlight', '=', 'partition', '#', 'str', '.', 'partition']
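
For reference, the separators survive the split only because the pattern is wrapped in a capturing group; the same thing as a standalone snippet:

import re

url = ('http://docs.python.org/dev/library/stdtypes.html'
       '?highlight=partition#str.partition')

# The parentheses form a capturing group, so re.split() keeps each matched
# separator in the result; the comprehension drops the empty strings left
# between adjacent separators and at the ends.
sp = re.compile(r'(//?|[;?:@=&#.])')
print([t for t in sp.split(url) if t])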

~kj
 
 
Ian Kelly
12-24-2010
On 12/23/2010 10:03 PM, kj wrote:
> >>> import re  # sorry
> >>> sp = re.compile('(//?|[;?:@=&#.])')
> >>> filter(len, sp.split(url))


Perhaps I'm being overly pedantic, but I would likely have written that
as "filter(None, sp.split(url))" for the same reason that "if string:"
is generally preferred to "if len(string):".
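
A quick illustration of the point:

pieces = ['http', '', ':', '', '//', 'docs']
print([p for p in pieces if p])       # ['http', ':', '//', 'docs']
print(list(filter(None, pieces)))     # same result, without a len() call per item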

Cheers,
Ian

 
 
macm
12-24-2010
Thanks all


In [11]: reps = 5

In [12]: t = Timer("url = 'http://docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition'; sp = re.compile('(//?|[;?:@=&#.])'); filter(len, sp.split(url))", 'import re')

In [13]: print sum(t.repeat(repeat=reps, number=1)) / reps
4.94003295898e-05

In [65]: t = Timer("url = 'http://docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition'; sp = re.compile('(//?|[;?:@=&#.])'); filter(None, sp.split(url))", 'import re')

In [66]: print sum(t.repeat(repeat=reps, number=1)) / reps
3.50475311279e-05


Ian, with None it is a little faster. Thanks kj!

Hi Mr. James, speed is always important. But OK, re is fine (though it could be e-07).

As a next step I'll go to Cython to gain something.
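
For what it's worth, numbers like these are usually steadier if the regex is compiled in the setup string and each repeat runs many iterations; a sketch along those lines (not the measurements above):

from timeit import Timer

setup = (
    "import re\n"
    "url = ('http://docs.python.org/dev/library/stdtypes.html'\n"
    "       '?highlight=partition#str.partition')\n"
    "sp = re.compile('(//?|[;?:@=&#.])')"
)
# Best of 5 repeats of 100000 calls each, reported per call.
# (Python 2 semantics, as in the thread; in Python 3 filter() is lazy,
# so you would time list(filter(None, sp.split(url))) instead.)
t = Timer("filter(None, sp.split(url))", setup)
print(min(t.repeat(repeat=5, number=100000)) / 100000)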

Regards

Mario



On Dec 24, 3:33 am, Ian Kelly <(E-Mail Removed)> wrote:
> On 12/23/2010 10:03 PM, kj wrote:
> > >>> import re  # sorry
> > >>> sp = re.compile('(//?|[;?:@=&#.])')
> > >>> filter(len, sp.split(url))
>
> Perhaps I'm being overly pedantic, but I would likely have written that
> as "filter(None, sp.split(url))" for the same reason that "if string:"
> is generally preferred to "if len(string):".
>
> Cheers,
> Ian


 
 
DevPlayer
12-28-2010

# parse_url11.py

# http://www.velocityreviews.com/forums/(E-Mail Removed)
# 2010-12 (Dec)-27
# A brute force ugly hack from a novice programmer.

# You're welcome to use the code, clean it up, make positive suggestions
# for improvement.

"""
Parse a url string into a list using a generator.
"""

#special_itemMeaning = ";?:@=&#."
#"//",
#"/",
special_item = [";", "?", ":", "@", "=", "&", "#", ".", "/", "//"]

# drop urls with obviously bad formatting - NOTIMPLEMENTED
drop_item = ["|", "localhost", "..", "///"]
ignore_urls_containing = ["php", "cgi"]

def url_parser_generator(url):
    len_text = len(url)
    index = 0
    start1 = 0  # required here if url contains ONLY specials
    start2 = 0  # required here if url contains ONLY non specials
    while index < len_text:

        # LOOP1 == Get an item in the special_item list; can be any length
        if url[index] in special_item:
            start1 = index
            inloop1 = True
            while inloop1:
                if inloop1:
                    if url[start1:index+1] in special_item:
                        #print "[", start1, ":", index+1, "] = ", url[start1:index+1]
                        inloop1 = True
                    else:  # not in ANYMORE, but was in special_item
                        #print "[", start1, ":", index, "] = ", url[start1:index]
                        yield url[start1:index]
                        start1 = index
                        inloop1 = False

                if inloop1:
                    if index < len_text-1:
                        index = index + 1
                    else:
                        #yield url[start1:index]  # NEW
                        inloop1 = False

        elif url[index] in drop_item:
            # not properly implemented at all
            raise NotImplementedError(
                "Processing items in the drop_item list is not "
                "implemented.", url[index])

        elif url[index] in ignore_urls_containing:
            # not properly implemented at all
            raise NotImplementedError(
                "Processing items in the ignore_urls_containing list "
                "is not implemented.", url[index])

        # LOOP2 == Get any item not in the special_item list; can be any length
        elif not url[index] in special_item:
            start2 = index
            inloop2 = True
            while inloop2:
                if inloop2:
                    #if not url[start2:index+1] in special_item:  # <- doesn't work
                    if not url[index] in special_item:
                        #print "[", start2, ":", index+1, "] = ", url[start2:index+1]
                        inloop2 = True
                    else:  # not in ANYMORE, but item was not in special_item before
                        #print "[", start2, ":", index, "] = ", url[start2:index]
                        yield url[start2:index]
                        start2 = index
                        inloop2 = False

                if inloop2:
                    if index < len_text-1:
                        index = index + 1
                    else:
                        #yield url[start2:index]  # NEW
                        inloop2 = False

        else:
            print url[index], "Not Implemented"  # should not get here
            index = index + 1

        if index >= len_text-1:
            break

    # Process any remaining part of URL and yield it to caller.
    # Don't know if last item in url is a special or non special.
    # Used start1 and start2 instead of start and
    # used inloop1 and inloop2 instead of inloop
    # to help debug, as using just "start" and "inloop" can be
    # harder to track in a generator.
    if start1 >= start2:
        start = start1
    else:
        start = start2
    yield url[start: index+1]

def parse(url):
    mylist = []
    words = url_parser_generator(url)
    for word in words:
        mylist.append(word)
        #print word
    return mylist

def test():
    urls = {
        0: (True, "http://docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition"),

        1: (True, "/http:///docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition"),
        2: (True, "//http:///docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition"),
        3: (True, "///http:///docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition"),

        4: (True, "/http:///docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition/"),
        5: (True, "//http:///docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition//"),
        6: (True, "///http:///docs.python.org/dev/library/stdtypes.html?highlight=partition#str.partition///"),

        7: (True, "/#/http:///#docs.python..org/dev//////library/stdtypes./html??highlight=p=partition#str.partition///"),

        8: (True, "httpdocspythonorgdevlibrarystdtypeshtmlhighlightpartitionstrpartition"),
        9: (True, "httpdocs.pythonorgdevlibrarystdtypeshtmlhighlightpartitionstrpartition"),
        10: (True, ":httpdocspythonorgdevlibrarystdtypeshtmlhighlightpartitionstrpartition"),
        11: (True, "httpdocspythonorgdevlibrarystdtypeshtmlhighlightpartitionstrpartition/"),

        12: (True, "///:;#.???"),   # only special_items
        13: (True, "///a:;#.???"),  # only 1 non special_item
        14: (True, "///:;#.???a"),  # only 1 non special_item
        15: (True, "a///:;#.???"),  # only 1 non special_item
        16: (True, "http://docs.python.php"),
        17: (True, "http://php.python.org"),
        18: (True, "http://www.localhost.com"),
        }

    # test various combinations of special_item characters possible in urls
    for url_num in range(len(urls)):
        value = urls[url_num]
        test, url = value
        if test:  # allow for single testing
            mylist = parse(url)
            print
            print
            print "url:", url_num, " ", url
            print
            print mylist
            print
    return mylist

test()
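
A hypothetical sanity check, not part of DevPlayer's post, that could be run after the definitions above to compare the generator's output for the thread's example URL with kj's regex split:

import re

url = ('http://docs.python.org/dev/library/stdtypes.html'
       '?highlight=partition#str.partition')
sp = re.compile(r'(//?|[;?:@=&#.])')
print(parse(url) == [t for t in sp.split(url) if t])   # expected: True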
 