excluding search string in regular expressions

 
 
Franz Steinhaeusler
 
      10-21-2004
Hello,

Following problem:

find only occurrences where the line contains '::' characters and
the preceding line is not equal to '**/'

so 2) and 3) should be found and 1) not.

1)
"""
**/
void C::B
"""

2)
"""

void C::B
"""

3)
"""
*/
void C::B
"""

I tried something like
"\*\*/\n.*::"

But this is the opposite.

So my question is: how can I exclude a pattern?

Single characters can be excluded with [^ab], but I need not(ab)

not_this_brace_pattern(\*\*/\n).*::

thank you in advance,
--
Franz Steinhaeusler
 
Franz Steinhaeusler
 
      10-21-2004
On Thu, 21 Oct 2004 13:36:46 +0200, Franz Steinhaeusler
<(E-Mail Removed)> wrote:

>
>single characters with [^ab] but I need not(ab)
>
>not_this_brace_pattern(\*\*/\n).*::


Sorry,
is this the solution (simply concatenating [^*][^*][^/]\n.*::)?


The background:
I want to scan a cpp file to check whether its functions already have a
doxygen comment. It should find all positions where one is missing:

ok

doxygen comment
**/
void CBs::InitButtonPanel (int progn1, int progn2)

The problem is to find the method or function definition, and for
that I need a regex.
It should ignore calls such as blabla::InitButtonPanel(a, b);

So one indicator is that if there is a semicolon at the end,
it is not a function or method definition.

So I would need
[^*][^*][^/]\n.*[)]*[^;]
but this is not working.
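
The semicolon rule described above could be sketched in Python roughly like this. It is only an illustration under assumptions: the name looks_like_definition and the pattern DEFINITION_RE are made up for this example, and the check only accepts lines of the form Class::method(args) with nothing after the closing parenthesis.

```python
import re

# Hypothetical pattern: "Class::method(args)" followed only by optional
# whitespace, so a trailing ';' (a call or a declaration) is rejected.
DEFINITION_RE = re.compile(r"\w+::\w+\s*\([^;]*\)\s*$")

def looks_like_definition(line):
    """Return True if the line resembles a C++ method definition header."""
    return DEFINITION_RE.search(line) is not None

print(looks_like_definition("void CBs::InitButtonPanel (int progn1, int progn2)"))  # True
print(looks_like_definition("blabla::InitButtonPanel(a, b);"))  # False
```

A header with the opening brace on the same line, or with a trailing const, would not be recognized by this sketch; the pattern would have to be extended for those cases.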

Thank you again in advance!
--
Franz Steinhaeusler
 
Mitja
 
      10-21-2004
Franz Steinhaeusler wrote:
> On Thu, 21 Oct 2004 13:36:46 +0200, Franz Steinhaeusler
> <(E-Mail Removed)> wrote:
>
>>
>> single characters with [^ab] but I need not(ab)
>>
>> not_this_brace_pattern(\*\*/\n).*::

>
> Sorry,
> is this the solution (simple concatenating
> [^*][^*][^/]\n.*:: ?


That should do, though it's admittedly far from elegant; I, too, would like to see a nicer solution.

> The background:
> I want to scan cpp file, whether the have a doxygen
> comment already: It should find all postitions, where
> this is missing:
>
> ok
>
> doxygen comment
> **/
> void CBs::InitButtonPanel (int progn1, int progn2)


In this case, I'd replace \n with \s*, meaning any amount of whitespace.


 
 
Franz Steinhaeusler
 
      10-21-2004
On Thu, 21 Oct 2004 14:40:24 +0200, "Mitja" <(E-Mail Removed)> wrote:

>Franz Steinhaeusler wrote:
>> On Thu, 21 Oct 2004 13:36:46 +0200, Franz Steinhaeusler
>> <(E-Mail Removed)> wrote:
>>
>>>
>>> single characters with [^ab] but I need not(ab)
>>>
>>> not_this_brace_pattern(\*\*/\n).*::

>>
>> Sorry,
>> is this the solution (simple concatenating
>> [^*][^*][^/]\n.*:: ?

>
>That should do, though it's admittedly far from elegant; I, too, would like to see a nicer solution.
>
>> The background:
>> I want to scan cpp file, whether the have a doxygen
>> comment already: It should find all postitions, where
>> this is missing:
>>
>> ok
>>
>> doxygen comment
>> **/
>> void CBs::InitButtonPanel (int progn1, int progn2)

>
>In this case, I'd replace \n with \s*, meaning any amount of whitespace.
>


Hello, thank you.

Oh, this is not really right (for finding a C function/method definition):

[^*][^*][^/]\s*.*[)]*[^;]


if func()
{

would also be found.

A more general solution for detecting functions/methods would be fine.

[^*][^*][^/]\s*--c-method/function/definition


--
Franz Steinhaeusler
 
 
Oliver Fromme
 
      10-21-2004
Mitja <(E-Mail Removed)> wrote:
> Franz Steinhaeusler wrote:
> > Franz Steinhaeusler wrote:
> > > [...]
> > > single characters with [^ab] but I need not(ab)
> > >
> > > not_this_brace_pattern(\*\*/\n).*::

> >
> > Sorry,
> > is this the solution (simple concatenating
> > [^*][^*][^/]\n.*:: ?

>
> That should do, though it's admittedly far from elegant; I, too,
> would like to see a nicer solution.


It won't work correctly. Franz needs a sub-expression that
matches anything which is not "**/". However, [^*][^*][^/]
is a character-wise negation, not word-wise. It doesn't
match "**/", but neither does it match "xx/", nor any other
string which has only one or two of the characters at the
right position.
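
To make the difference concrete, here is a small check of the character-wise pattern against a few "previous line" contents. The test strings are invented for illustration.

```python
import re

# The character-wise attempt: three single-character negations, then a newline.
charwise = re.compile(r"[^*][^*][^/]\n.*::")

# Previous line is "**/": not matched, as intended.
print(bool(charwise.search("**/\nvoid C::B")))  # False
# Previous line is "xx/": should be found, but [^/] rejects the '/'.
print(bool(charwise.search("xx/\nvoid C::B")))  # False
# Previous line is "*/": fewer than three characters before the newline.
print(bool(charwise.search("*/\nvoid C::B")))   # False
# Only a previous line with no '*' or '/' in the wrong spot gets through.
print(bool(charwise.search("foo\nvoid C::B")))  # True
```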

What you need is a "negative look-behind assertion". The
following Python-RE will do: (?<!\*\*/)\n.*::
Remember to use raw string notation, or you need to double
the backslashes:

import re
my_re_str = r"(?<!\*\*/)\n.*::"
my_re_obj = re.compile(my_re_str)

Note that you might want to use \s* instead of \n, so any
amount of whitespace (including newlines) is matched, not
just one single newline.
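
As a quick check, the look-behind pattern can be run against the three cases from the original question. The sample strings below are reconstructed from that post, and loose_re_obj is a name invented here to try the \s* variant.

```python
import re

# Pattern from the post above: a newline not preceded by "**/",
# followed by a line containing "::".
my_re_obj = re.compile(r"(?<!\*\*/)\n.*::")

# The three cases from the original question, as (label, text) pairs.
samples = [
    ("1", '"""\n**/\nvoid C::B\n"""'),
    ("2", '"""\n\nvoid C::B\n"""'),
    ("3", '"""\n*/\nvoid C::B\n"""'),
]

found = [label for label, text in samples if my_re_obj.search(text)]
print(found)  # ['2', '3']

# Variant with \s* in place of \n, to check how it treats case 1.
loose_re_obj = re.compile(r"(?<!\*\*/)\s*.*::")
print(bool(loose_re_obj.search(samples[0][1])))  # True
```

In this sketch the \s* variant also matches case 1: since \s* may match the empty string, the search can start just after the newline, where the three preceding characters are no longer "**/". Keeping a literal \n directly after the look-behind avoids this for the three samples.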

For more information about regular expressions supported by
Python, refer to the Library Reference manual:

http://docs.python.org/lib/re-syntax.html

Best regards
Oliver

--
Oliver Fromme, Konrad-Celtis-Str. 72, 81369 Munich, Germany

``All that we see or seem is just a dream within a dream.''
(E. A. Poe)
 
 
Diez B. Roggisch
 
      10-21-2004
>
> A more common solution for detecting functions/Methods would be fine.


Maybe you should go for a real parser here - together with a
C-syntax-grammar. Trying to cram this stuff into regexps is bound to miss
special cases. And it's generally difficult to have a regexp _not_
match a certain word.

Another approach would be to look for closing comments and function
definitions with several regexes, and use Python logic:

if doxy_close_rex.match(line):
    line = next(lines)
    if fun_def_rex.match(line):
        ....
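
A fuller sketch of this idea, adapted to report the definitions that lack a comment, might look as follows. The two regex names come from the snippet above, but their patterns, the function undocumented_definitions and the sample lines are assumptions made for this example.

```python
import re

# Assumed patterns: a line holding only "**/", and a line containing
# something like "Class::method(".
doxy_close_rex = re.compile(r"\s*\*\*/\s*$")
fun_def_rex = re.compile(r".*\w+::\w+\s*\(")

def undocumented_definitions(lines):
    """Yield definition-like lines not directly preceded by a '**/' line."""
    previous_closes_comment = False
    for line in lines:
        if fun_def_rex.match(line) and not previous_closes_comment:
            yield line.rstrip()
        # Remember whether this line closes a doxygen comment.
        previous_closes_comment = bool(doxy_close_rex.match(line))

source = [
    "/** doxygen comment",
    "**/",
    "void CBs::InitButtonPanel (int progn1, int progn2)",
    "",
    "void CBs::Other (int x)",
]
print(list(undocumented_definitions(source)))  # ['void CBs::Other (int x)']
```

Keeping the "previous line" state in plain Python means neither regex has to express a negation.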


--
Regards,

Diez B. Roggisch
 
 
Bengt Richter
 
      10-21-2004
On Thu, 21 Oct 2004 13:36:46 +0200, Franz Steinhaeusler <(E-Mail Removed)> wrote:

>Hello,
>
>Following Problem:
>
>find only occurances, where in the line are'::' characters and
>the former line is not equal '**/'
>
>so 2) and 3) should be found and 1) not.
>
>1)
>"""
>**/
>void C::B
>"""
>
>2)
>"""
>
>void C::B
>"""
>
>3)
>"""
>*/
>void C::B
>"""
>
>I tried something
>"\*\*/\n.*::"
>
>But this is the opposite.
>
>So my question is: how can I exclude a pattern?
>
>single characters with [^ab] but I need not(ab)
>
>not_this_brace_pattern(\*\*/\n).*::
>
>thank you in advance,


To look back a line, I think I'd just use a generator, and test current
and last lines to get what I wanted. E.g., perhaps you can adapt this:
(I am just going literally by
"""
find only occurances, where in the line are'::' characters and
the former line is not equal '**/'
"""
which doesn't need a regex)

>>> def findem(lineseq):
...     last = None
...     for line in lineseq:
...         curr = line.rstrip()
...         if last is not None and '::' in curr and last != '**/':
...             yield curr
...         last = curr
...

I made a file, modifying your data a little:

>>> print('----\n%s----' % open('franz.txt').read())

----
1)
"""
**/
void C::B -- no (1)
"""

2)
"""

void C::B -- yes (2)
"""

3)
"""
*/
void C::B -- yes (3)
"""
----

Here's what the generator returns:

>>> for line in findem(open('franz.txt')): print(repr(line))

...
'void C::B -- yes (2)'
'void C::B -- yes (3)'


Regards,
Bengt Richter
 
 
Franz Steinhaeusler
 
      10-22-2004
On Thu, 21 Oct 2004 15:32:37 +0200, "Diez B. Roggisch"
<(E-Mail Removed)> wrote:

>>
>> A more common solution for detecting functions/Methods would be fine.

>
>Maybe you should go for a real parser here - together with a
>C-syntax-grammar. Trying to cram this stuff into regexps is bound for not
>catching special cases. And its gereally difficult to have a regexp _not_
>macht a certain word.
>


Hello Diez,

thanks, yes, it is difficult to express "do not match this search string"
in a regex.

I only want to find a regex for an editor (which is written in Python)
so that it has a general function for finding a function/method
definition (of course it cannot be as accurate as a parser).

>Another approach would be to look for closing comments and function
>definitions in several rexes, and use python-logic:
>
>if doxy_close_rex.match(line):
>    line = next(lines)
>    if fun_def_rex.match(line):
>        ....


--
Franz Steinhaeusler
 
 
Franz Steinhaeusler
 
      10-22-2004
On 21 Oct 2004 13:28:28 GMT, Oliver Fromme <(E-Mail Removed)>
wrote:

>Mitja <(E-Mail Removed)> wrote:
> > Franz Steinhaeusler wrote:
> > > Franz Steinhaeusler wrote:
> > > > [...]
> > > > single characters with [^ab] but I need not(ab)
> > > >
> > > > not_this_brace_pattern(\*\*/\n).*::
> > >
> > > Sorry,
> > > is this the solution (simple concatenating
> > > [^*][^*][^/]\n.*:: ?

> >
> > That should do, though it's admittedly far from elegant; I, too,
> > would like to see a nicer solution.

>


Hello Oliver,

>It won't work correctly. Franz needs a sub-expression that
>matches anything which is not "**/". However, [^*][^*][^/]
>is a character-wise negation, not word-wise. It doesn't
>match "**/", but neither does it match "xx/", nor any other
>string which has only one or two of the characters at the
>right position.


yes, you are right, the approach above is wrong.

>
>What you need is a "negative look-behind assertion".


??, sounds interesting

>The
>following Python-RE will do: (?<!\*\*/)\n.*::
>Remember to use raw string notation, or you need to double
>the backslashes:
>
>my_re_str = r"(?<!\*\*/)\n.*::"



>my_re_obj = re.compile(my_re_str)
>
>Note that you might want to use \s* instead of \n, so any
>amount of whitespace (including newlines) is matched, not
>just one single newline.
>
>For more information about regular expressions supported by
>Python, refer to the Library Reference manual:
>
>http://docs.python.org/lib/re-syntax.html
>


(?<!...)
Matches if the current position in the string is not preceded by a
match for ...

That is it.

Many thanks for your helpful reply,

--
Franz Steinhaeusler
 
 
Franz Steinhaeusler
 
      10-22-2004
On Thu, 21 Oct 2004 22:38:00 GMT, (E-Mail Removed) (Bengt Richter) wrote:

>
>To look back a line, I think I'd just use a generator, and test current
>and last lines to get what I wanted. E.g., perhaps you can adapt this:
>(I am just going literally by
> """
> find only occurances, where in the line are'::' characters and
> the former line is not equal '**/'
> """
>which doesn't need a regex)
>
> >>> def findem(lineseq):
> ...     last = None
> ...     for line in lineseq:
> ...         curr = line.rstrip()
> ...         if last is not None and '::' in curr and last != '**/':
> ...             yield curr
> ...         last = curr
> ...
>[...]
>
>Regards,
>Bengt Richter


Hello Bengt,

thank you for suggesting this interesting approach,

regards
--
Franz Steinhaeusler
 