MoinMoin WikiName and python regexes

 
 
Ara.T.Howard
Guest
Posts: n/a
 
      06-08-2005

hi-

i know nada about python so please forgive me if this is way off base. i'm
trying to fix a bug in MoinMoin whereby

WordsWithTwoCapsInARowLike
                  ^^
                  ^^
                  ^^

do not become WikiNames. this is because the wikiname pattern is
basically

/([A-Z][a-z]+){2,}/

but should be (IMHO)

/([A-Z]+[a-z]+){2,}/

however, the way the patterns are constructed like

word_rule = ur'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s][%(l)s]+){2,})+(?![%(u)s%(l)s]+)' % {
'u': config.chars_upper,
'l': config.chars_lower,
'subpages': config.allow_subpages and (wikiutil.CHILD_PREFIX + '?') or '',
'parent': config.allow_subpages and (ur'(?:%s)?' % re.escape(PARENT_PREFIX)) or '',
}


and i'm not that familiar with python syntax. to me this looks like a map
used to bind variables into the regex - or is it binding into a string then
compiling that string into a regex - regexes don't seem to be literal objects
in python AFAIK... i'm thinking i need something like

word_rule = ur'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s]+[%(l)s]+){2,})+(?![%(u)s%(l)s]+)' % {
                                                                    ^
                                                                    ^
                                                                    ^
'u': config.chars_upper,
'l': config.chars_lower,
'subpages': config.allow_subpages and (wikiutil.CHILD_PREFIX + '?') or '',
'parent': config.allow_subpages and (ur'(?:%s)?' % re.escape(PARENT_PREFIX)) or '',
}

and this seems to work - but i'm wondering what the 's' in '%(u)s' implies?
obviously the u is the char range (unicode?)... but what's the 's'?

i'm looking at

http://docs.python.org/lib/re-syntax.html
http://www.amk.ca/python/howto/regex/

and coming up dry. sorry i don't have more time to rtfm - just want to
implement this simple fix and get on to fcgi configuration!

cheers.

-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================

 
 
 
 
 
Don
Guest
Posts: n/a
 
      06-08-2005
Ara.T.Howard wrote:

>
> hi-
>
> i know nada about python so please forgive me if this is way off base.
> i'm trying to fix a bug in MoinMoin whereby
>
> WordsWithTwoCapsInARowLike
> ^^
> ^^
> ^^
>
> do not become WikiNames. this is because the wikiname pattern is
> basically
>

[snip]

PHPWiki has the same "feature", BTW. (Sorry, couldn't get MoinMoin to work
on Sourceforge, had to use PHPWiki).

-Don

 
 
 
 
 
deelan
Guest
Posts: n/a
 
      06-08-2005
Ara.T.Howard wrote:
(...)
> and i'm not that familiar with python syntax. to me this looks like a map
> used to bind variables into the regex - or is it binding into a string then
> compiling that string into a regex - regexes don't seem to be literal objects
> in python AFAIK... i'm thinking i need something like
>
> word_rule =
> ur'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s]+[%(l)s]+){2,})+(?![%(u)s%(l)s]+)'
> % {
> ^
> ^
> ^
> 'u': config.chars_upper,
> 'l': config.chars_lower,
> 'subpages': config.allow_subpages and (wikiutil.CHILD_PREFIX +
> '?') or '',
> 'parent': config.allow_subpages and (ur'(?:%s)?' %
> re.escape(PARENT_PREFIX)) or '',
> }
>
> and this seems to work - but i'm wondering what the 's' in '%(u)s' implies?
> obviously the u is the char range (unicode?)... but what's the 's'?


an example may help here:

>>> a = 123
>>> '%04d' % a

'0123'
>>> '%f' % a

'123.000000'
>>> '%s' % a

'123'

that "s" tells python to convert the value to a string. the form %(key)s
tells python to look up "key" in a dictionary and format the value found
there as a string:

>>> d = {'key': 123}
>>> '%(key)s' % d

'123'

so in your code there are keys named 'u', 'l', 'subpages', etc. and
their values are substituted into that big RE, replacing the
corresponding %(key)s placeholders.
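
To connect this to the regex question: the % operator only builds an ordinary string, and that string becomes a regex when it is handed to re.compile. The following is a minimal sketch, written for Python 3 (hence r'' rather than ur''), and it assumes the plain ranges 'A-Z' and 'a-z' as stand-ins for config.chars_upper and config.chars_lower, whose real values are not shown in this thread.

```python
import re

# Assumed stand-ins for MoinMoin's config.chars_upper / config.chars_lower.
chars_upper = 'A-Z'
chars_lower = 'a-z'

# Step 1: plain string formatting; nothing regex-specific happens here.
pattern_text = r'(?:[%(u)s][%(l)s]+){2,}' % {'u': chars_upper, 'l': chars_lower}

# Step 2: the finished string is compiled into a pattern object.
word_re = re.compile(pattern_text)

print(pattern_text)
print(word_re.search('a WikiName here').group(0))
```

Only a fragment of the full rule is used here so that the two steps stay visible.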

HTH.

--
deelan <http://www.deelan.com/>


 
 
Terry Reedy
Guest
Posts: n/a
 
      06-08-2005

"Ara.T.Howard" <(E-Mail Removed)> wrote in message
news(E-Mail Removed) oaa.gov...
> i'm trying to fix a bug in MoinMoin whereby


A 'bug' is a discrepancy between promise (specification) and performance
(implementation). Have you really found such -- does MoinMoin not follow
the Wiki standard -- or are you just trying to customize MoinMoin to your
different specification?

> WordsWithTwoCapsInARowLike
> ^^
> do not become WikiNames.


Would your proposed change to make the above into a WikiName also make
all-cap sequences like NATO, FTP, and API into WikiNames, and do you really
want that? If WikiNum, appearing in one place, were also mistyped as WikeNUm
(from holding down the shift key too long, which I do occasionally), should
the latter become a separate WikiName? I can certainly understand why the
Wiki designers might have answered both questions 'No.'

Terry J. Reedy



 
 
Ara.T.Howard
Guest
Posts: n/a
 
      06-08-2005
On Wed, 8 Jun 2005, Terry Reedy wrote:

>
> "Ara.T.Howard" <(E-Mail Removed)> wrote in message
> news(E-Mail Removed) oaa.gov...
>> i'm trying to fix a bug in MoinMoin whereby

>
> A 'bug' is a discrepancy between promise (specification) and performance
> (implementation). Have you really found such -- does MoinMoin not follow
> the Wiki standard -- or are you just trying to customize MoinMoin to your
> different specification?


well, according to the specification at

http://moinmoin.wikiwikiweb.de/WikiN...%28wikiname%29

ThisIsAWikiName

there seems to be general agreement here

http://wikka.jsnx.com/WikiName
http://twiki.org/cgi-bin/view/TWiki/WikiWord

though not all wikis agree.

in moinmoin others have noted the inconsistency and filed a bug as noted in

http://moinmoin.wikiwikiweb.de/MoinM...%28wikiname%29

the problem being that the specification is simply vague here and does not
specifically prohibit AWikiName.

>
>> WordsWithTwoCapsInARowLike
>> ^^
>> do not become WikiNames.

>
> Would your proposed change to make the above into a WikiName also make
> all-cap sequences like NATO, FTP, and API into WikiNames


it wouldn't since

NATO !~ /^([A-Z]+[a-z]+){2,}$/
FTP !~ /^([A-Z]+[a-z]+){2,}$/
API !~ /^([A-Z]+[a-z]+){2,}$/

the pattern is

word = one, or more, upper case letters followed by one, or more, lower case
letters

wikiword = at least two words together

so

FOobar is not a link

but

AFooBar is
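
a quick way to check these claims in python itself. this is only a sketch of the two simplified, anchored patterns from this thread (not MoinMoin's full word_rule), and the names strict and relaxed are made up for the example:

```python
import re

# Simplified, anchored forms of the two patterns discussed in this thread.
strict = re.compile(r'^([A-Z][a-z]+){2,}$')    # current behaviour
relaxed = re.compile(r'^([A-Z]+[a-z]+){2,}$')  # proposed change

samples = ['WikiName', 'AFooBar', 'WordsWithTwoCapsInARowLike',
           'FOobar', 'NATO', 'FTP', 'API', 'WikeNUm']

# Map each sample to (matches strict, matches relaxed).
results = {s: (bool(strict.match(s)), bool(relaxed.match(s))) for s in samples}

for name, (is_strict, is_relaxed) in results.items():
    print('%-28s strict=%-5s relaxed=%s' % (name, is_strict, is_relaxed))
```

the all-caps words stay unlinked under both patterns, but a shift-key slip such as WikeNUm does match the relaxed pattern, so that part of the objection still applies.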

> If WikiNum, appearing in one place, were also mistyped as WikeNUm (from holding
> down the shift key too long, which I do occasionally), should the latter
> become a separate WikiName? I can certainly understand why the Wiki
> designers might have answered both questions 'No.'


perhaps - it's just inconsistent the way it is now.

cheers.


-a
--
===============================================================================
| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
| phone :: 303.497.6469
| My religion is very simple. My religion is kindness.
| --Tenzin Gyatso
===============================================================================

 
 
Paul Bredbury
Guest
Posts: n/a
 
      06-08-2005
Ara.T.Howard wrote:
> i know nada about python so please forgive me if this is way off base. i'm
> trying to fix a bug in MoinMoin whereby
>
> WordsWithTwoCapsInARowLike


I don't think there is such a thing as the perfect "hyperlink vs
just-text" convention. In MoinMoin, you can force a custom link using e.g.:

[wiki:WebsiteSecurity this is the link text to WebsiteSecurity so call
it whatever you want such as WebsiteSecurities]

This custom linking, whilst obviously not ideal, solves the problems
mentioned at http://www.c2.com/cgi/wiki?WikiName

This seems better than producing endless confusing variations on the
"standard" (be it formal, actual, or simply obviously desired).

I'm not convinced of the usefulness of MoinMoin's "subpages" idea, while
we're on the (related) subject - they seem to create more problems than
they solve:
http://moinmoin.wikiwikiweb.de/HelpOnEditing/SubPages
 
 
Bengt Richter
Guest
Posts: n/a
 
      06-26-2005
On Wed, 8 Jun 2005 09:49:51 -0600, "Ara.T.Howard" <(E-Mail Removed)> wrote:

>
>hi-
>
>i know nada about python so please forgive me if this is way off base. i'm
>trying to fix a bug in MoinMoin whereby
>
> WordsWithTwoCapsInARowLike
> ^^
> ^^
> ^^
>
>do not become WikiNames. this is because the wikiname pattern is
>basically
>
> /([A-Z][a-z]+){2,}/
>
>but should be (IMHO)
>
> /([A-Z]+[a-z]+){2,}/

That would take care of the example above, but does it change an official spec?

>
>however, the way the patterns are constructed like
>
> word_rule = ur'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s][%(l)s]+){2,})+(?![%(u)s%(l)s]+)' % {
> 'u': config.chars_upper,
> 'l': config.chars_lower,
> 'subpages': config.allow_subpages and (wikiutil.CHILD_PREFIX + '?') or '',
> 'parent': config.allow_subpages and (ur'(?:%s)?' % re.escape(PARENT_PREFIX)) or '',
> }
>
>
>and i'm not that familiar with python syntax. to me this looks like a map
>used to bind variables into the regex - or is it binding into a string then
>compiling that string into a regex - regexes don't seem to be literal objects
>in python AFAIK... i'm thinking i need something like
>
> word_rule = ur'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s(?:[%(u)s]+[%(l)s]+){2,})+(?![%(u)s%(l)s]+)' % {
> ^
> ^
> ^
> 'u': config.chars_upper,
> 'l': config.chars_lower,
> 'subpages': config.allow_subpages and (wikiutil.CHILD_PREFIX + '?') or '',
> 'parent': config.allow_subpages and (ur'(?:%s)?' % re.escape(PARENT_PREFIX)) or '',
> }
>
>and this seems to work - but i'm wondering what the 's' in '%(u)s' implies?
>obviously the u is the char range (unicode?)... but what's the 's'?

'u' doesn't stand for unicode here. It is the key to look up config.chars_upper from the dict. That could
be unicode, and probably is. The 's' is the final part of a formatting spec which says how to convert the
data looked up, and 's' is for string, which doesn't change string data (unless, and UIAM, a conversion to unicode is required).

All of the above is making use of the % operator of strings, as in the expression
fmt % data
where fmt is a string containing ordinary characters and formatting specs in the form
of substrings escaped by a leading character '%'. The formatting specs take two basic
alternative forms: %<spec> or %(name)<spec>. If a '%' is followed by a parenthesized name,
as in '%(u)s', the data to be formatted is retrieved from data['u'], where data must be a mapping such as a dict.
If there is no parenthesized name, the data is retrieved from data[i] where data must be a tuple and
i is the positional count of format specs in fmt. In some cases where there is no ambiguity,
and there is only one datum, data[0] may be written as the non-tuple value expression, e.g.,
instead of (123,) that data could be written as (123,)[0] or plain 123.
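
The three forms described above can be sketched as follows. This is only an illustration with made-up values, written so that it also runs on Python 3:

```python
# Positional form: data is a tuple, consumed left to right.
pair = '%s has %d items' % ('cart', 3)

# Named form: data is a mapping, looked up by the parenthesized key.
named = '[%(u)s][%(l)s]+' % {'u': 'A-Z', 'l': 'a-z'}

# Single datum: a one-element tuple and the bare value give the same result.
single_tuple = '%04d' % (123,)
single_bare = '%04d' % 123

# A datum that is itself a tuple must be wrapped in another tuple,
# otherwise its elements are taken as separate positional arguments.
wrapped = '%s' % ((1, 2),)

print(pair)
print(named)
print(single_tuple, single_bare)
print(wrapped)
```

The last case is the one ambiguity worth remembering: a bare value is only safe as the right-hand operand when it cannot itself be a tuple.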

In the word_rule above, %(u)s uses 'u' as a key to get data from the dictionary { 'u': config.chars_upper, ...}
to substitute in the [%(u)s] as a string (that's what the 's' specifies), so config.chars_upper will
presumably have had a string value such as u'ABC..Z' and that will then be inserted in place of the %(u)s to
get u'...[ABC..Z]...' (if fmt is unicode, the resulting string will be unicode, UIAM)
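
As a rough check of what the substitution produces, here is a sketch for Python 3 that fills in both the original and the modified template and tries them on the example word. It assumes the rule begins with (?:(?<! and uses ASCII-only stand-ins ('A-Z', 'a-z', and empty strings for the subpage parts) in place of the real MoinMoin config values, which are not shown in this thread.

```python
import re

# Hypothetical stand-ins for the MoinMoin values (subpages disabled).
values = {'u': 'A-Z', 'l': 'a-z', 'subpages': '', 'parent': ''}

ORIGINAL = (r'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s'
            r'(?:[%(u)s][%(l)s]+){2,})+(?![%(u)s%(l)s]+)')
# Same template with the extra '+' after the upper-case class.
RELAXED = (r'(?:(?<![%(l)s])|^)%(parent)s(?:%(subpages)s'
           r'(?:[%(u)s]+[%(l)s]+){2,})+(?![%(u)s%(l)s]+)')

# The % operator yields plain pattern strings.
original_text = ORIGINAL % values
relaxed_text = RELAXED % values

word = 'WordsWithTwoCapsInARowLike'
original_hit = re.search(original_text, word)
relaxed_hit = re.search(relaxed_text, word)

print(relaxed_text)
print(original_hit.group(0) if original_hit else None)
print(relaxed_hit.group(0) if relaxed_hit else None)
```

With these stand-ins the original rule picks up only the tail RowLike, while the modified rule takes the whole word; an all-caps word such as NATO matches neither.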

>
>i'm looking at
>
> http://docs.python.org/lib/re-syntax.html
> http://www.amk.ca/python/howto/regex/
>

See also
http://www.python.org/doc/current/li...q-strings.html
(which IMO should be easier to find, but if you click on the index square
at the top right of any library reference page, you can see a "%formatting" link)

>and coming up dry. sorry i don't have more time to rtfm - just want to
>implement this simple fix and get on to fcgi configuration!
>
>cheers.
>
>-a
>--
>================================================= ==============================
>| email :: ara [dot] t [dot] howard [at] noaa [dot] gov
>| phone :: 303.497.6469
>| My religion is very simple. My religion is kindness.
>| --Tenzin Gyatso
>================================================= ==============================
>


Regards,
Bengt Richter
 