python re - a not needed

 
 
kepes.krisztian
      12-16-2004
Hi!

I want to extract some information from an HTML page, and I need to match all characters except <.
"All characters" means everything above chr(31), including those above chr(128) (Hungarian accented letters).
The .* is very greedy: it eats the < characters too.

If I could use a "not", I could simply write a regexp like this:
[not<]*</a>

It would get everything inside the link.

I wrote this program, but I think it is too complex:

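import re

# Build the character class by hand: every printable ASCII character
# from chr(33) to chr(64), except the tag delimiters '<' and '>'.
l = []
for i in range(33, 65):
    if i != ord('<') and i != ord('>'):
        l.append(re.escape(chr(i)))  # re.escape avoids bad escapes such as '\8'
chars = ''.join(l)  # no '|' needed: inside [...] it is a literal, not "or"
# Word characters, whitespace, chr(128)..chr(255) for the accents, plus the list above.
allowed = r'\w\s%s-%s%s' % (chr(128), chr(255), chars)
sre = '<Subj>([%s]{1,1024})</d>' % allowed
#sre='<Subj>([?!\\<]{1,1024})</d>'
s = '<Subj>xmvccv มมม sdfkdsfj eirfie</d><A></d>'

print(sre)
print(s)
cp = re.compile(sre)
m = cp.search(s)
print(m.groups())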

Does Python's regexp syntax have an exclusion or "not" feature? How do I use it?

Thanks for the help,
kk
 
Peter Otten
      12-16-2004
You could try these regexps or variants thereof:

"<Subj>([^<]*)"

A leading '^' negates the character set: it matches any character except those
listed after the '^'.

"<Subj>(.*?)<"

The '?' makes the preceding '*' non-greedy, i.e. it matches as few characters as
possible, so the following '<' will match the first '<' character encountered
after "<Subj>".

Peter
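
To see the difference between the two suggestions above and the greedy '.*' from the question, here is a minimal sketch. The sample string is made up for illustration (it is not the original poster's data), and the variable names are hypothetical.

```python
import re

# Made-up sample with Hungarian accented letters (an assumption for illustration).
sample = "<Subj>xmvccv árvíztűrő sdfkdsfj eirfie</d><A></d>"

# Greedy: '.*' runs to the end, then backtracks to the LAST '<' in the string.
greedy = re.search(r"<Subj>(.*)<", sample).group(1)

# Negated character set: stops at the first '<' because '<' is excluded.
negated = re.search(r"<Subj>([^<]*)", sample).group(1)

# Non-greedy: '.*?' grows one character at a time until a '<' follows.
lazy = re.search(r"<Subj>(.*?)<", sample).group(1)

print(greedy)   # xmvccv árvíztűrő sdfkdsfj eirfie</d><A>
print(negated)  # xmvccv árvíztűrő sdfkdsfj eirfie
print(lazy)     # xmvccv árvíztűrő sdfkdsfj eirfie
```

Both alternatives give the same result here. The negated set also matches across newlines, while '.' does not unless re.DOTALL is passed.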

 
Max M
      12-16-2004
kepes.krisztian wrote:

> I want to extract some information from an HTML page, and I need to match all characters except <.
> "All characters" means everything above chr(31), including those above chr(128) (Hungarian accented letters).
> The .* is very greedy: it eats the < characters too.


Rather than writing ad-hoc HTML parsers, use BeautifulSoup.

http://www.crummy.com/software/BeautifulSoup/

It will most likely do what you want in 2 or 3 lines of code.

--

hilsen/regards Max M, Denmark

http://www.mxm.dk/
IT's Mad Science
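
As a rough illustration of the parser-based approach, here is a sketch that collects the text inside every <a> tag. It deliberately uses the standard library's html.parser as a stand-in, since BeautifulSoup is a third-party package; the class name TagTextCollector and the sample markup are hypothetical and not taken from the thread.

```python
from html.parser import HTMLParser


class TagTextCollector(HTMLParser):
    """Collect the text found inside each occurrence of one tag (no nesting support)."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag.lower()  # HTMLParser reports tag names in lower case
        self.inside = False
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.inside = True
            self.texts.append("")  # start a new entry for this element

    def handle_endtag(self, tag):
        if tag == self.tag:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.texts[-1] += data  # text may arrive in several chunks


# Made-up sample markup (an assumption for illustration).
page = '<p>See <a href="http://www.mxm.dk/">Max M\'s site</a> and <a href="#x">árvíztűrő</a>.</p>'

collector = TagTextCollector("a")
collector.feed(page)
collector.close()
print(collector.texts)  # ["Max M's site", 'árvíztűrő']
```

A parser like this does not care which characters appear in the text, so no hand-built character class is needed.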
 
 
Paul Rubin
      12-16-2004
Max M writes:
> Rather than writing ad-hoc HTML parsers, use BeautifulSoup.
>
> http://www.crummy.com/software/BeautifulSoup/


Hey, I like that. Thanks.
 