On Mon, 23 Jun 2003 17:52:30 -0700, "Mike Rovner" wrote:
>John Fitzsimons wrote:
Hi Mike,
>> I want to search an ordered text file and list web links such as :
>> and Output like ;
>> ftp://ftp.eunet.bg/pub/simtelnet
>> http://clients.net2000.com.au/~johnf/faq
>> www.fourmilab.ch/annoyance-filter/
>> Can anyone suggest the code and/or a python program/script I could
>> adapt to do this please ?
>Here's a very dirty URL recognizer (expect plenty of false positives
>and false negatives):
>import re
>url=re.compile(r'(?<=\s)(?:(?:ftp|http|https)://|www(?:\.[^\. ]+){2,}).*?(?=\s)')
>print '\n'.join(re.findall(url, your_text_goes_here))
As I am a total newbie, I will need to work out how to turn that into
a Python file, but since you have done the hard work it should be a
lot easier now.
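
For anyone else starting from the same place, here is a minimal sketch
of how the quoted snippet might be saved as a standalone file. It is
written for Python 3 (print as a function), and the names find_links
and findlinks.py are hypothetical. The pattern is adapted from the one
quoted above, reconstructed as best I can read it, with one change that
is purely an assumption: [^\. ] becomes [^.\s] so that a www link at
the end of a line does not run on into the next line.

```python
import re
import sys

# Adapted from the pattern quoted above (reconstructed from the wrapped post).
# Assumption: [^\. ] is tightened to [^.\s] so a www link at the end of a
# line cannot run on into the next line.
URL = re.compile(r'(?<=\s)(?:(?:ftp|http|https)://|www(?:\.[^.\s]+){2,}).*?(?=\s)')


def find_links(text):
    """Return every link-like string in text, in order of appearance."""
    # Pad with spaces so links at the very start or end still have the
    # whitespace that the lookbehind and lookahead require.
    return URL.findall(' ' + text + ' ')


if __name__ == '__main__' and len(sys.argv) == 2:
    # Hypothetical usage: python findlinks.py links.txt
    with open(sys.argv[1]) as handle:
        print('\n'.join(find_links(handle.read())))
```

Saved as findlinks.py, it would be run as "python findlinks.py
links.txt", where links.txt stands in for the text file to search. It
is still a dirty recognizer, so false hits and misses are to be
expected.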
It also looks close to something I could use as a regex string in a
program like NoteTab. With search and replace, though, I would need to
make it match everything that does NOT fit the above syntax and
replace that with nothing.
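
In Python, the "replace everything that is not a link with nothing"
idea can be sketched with re.sub, by matching either a link (kept) or
any single other character (dropped). The LINK pattern below is a
simplified stand-in, not the pattern quoted above, and keep_only_links
is a hypothetical name. Whether NoteTab's regex dialect accepts
anything similar is not something this sketch shows.

```python
import re

# Simplified stand-in for a link pattern (an assumption, not the quoted regex).
LINK = r'(?:(?:ftp|https?)://|www\.)\S+'

# Group 1 captures a link; the bare "." alternative consumes one character
# of anything else. DOTALL lets "." consume newlines as well.
KEEP_OR_DROP = re.compile('(' + LINK + ')|.', re.DOTALL)


def keep_only_links(text):
    """Replace everything that is not a link with nothing, one link per line."""
    def replace(match):
        link = match.group(1)
        # Keep a captured link (plus a newline); drop any other character.
        return link + '\n' if link else ''
    return KEEP_OR_DROP.sub(replace, text)
```

Because the link alternative is tried first at each position, ordinary
text is only consumed one character at a time where no link starts.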
I will also need to work out what "Closure cannot immediately follow
BegOfLine, EndOfLine or another closure" means, and fix it.
Many thanks for your help.
Regards, John.