Parser to list function names in C++?

 
 
Henrik Goldman
      12-05-2006
Hi,

I would like to create a simplistic parser which goes through each .h file
and finds each function prototype (or inline implementation) along with
class names and member functions.

Examples:

test.h:

void f1();
inline int f2() {return 0;}

class A
{
void f3();
};

How would I approach this from a simple viewpoint, without a steep learning
curve? I know there exist a dozen parsers, which are all pretty advanced and
require lots of background knowledge, but for my simple needs I think they
might be a bit overkill.
The parser should be in C++ too, since the rest of the app is also C++.

Any ideas how to proceed?

-- Henrik


 
Gianni Mariani
      12-05-2006

A true C++ parser is a lot of work.

You could take an open source program that has a parser and teach it to
do what you want.

Perhaps you can look at doxygen or gcc.


G
 
jussij@zeusedit.com
      12-05-2006
Henrik Goldman wrote:

> I would like to create a simplistic parser which goes through
> each .h file and finds each function prototype (or inline
> implementation) along with class names and member
> functions.
> ....
> Any ideas how to proceed?


One approach would be to use a regular expression engine
to do the searching.

For example if I load your 'test.h' example header file
into Zeus and search for this regular expression:

[_a-z0-9]+[ &*\t]+[_a-z0-9 \t]*[_a-z0-9]+[ \t]*[(]+

it only finds these lines:

void f1();
inline int f2() {return 0;}
void f3();
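
The same search can be sketched in C++ itself. This is only an illustration, assuming a C++11 compiler with std::regex; the helper name find_candidate_lines is hypothetical and not part of any tool mentioned in this thread. It applies the pattern above, unchanged, to each line of a header held in a string.

```cpp
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Returns every line of `text` that contains a match for the pattern
// quoted in the post. The pattern is copied as-is, so it only knows
// lowercase letters, digits and '_' in identifiers.
std::vector<std::string> find_candidate_lines(const std::string& text) {
    static const std::regex pattern(
        R"([_a-z0-9]+[ &*\t]+[_a-z0-9 \t]*[_a-z0-9]+[ \t]*[(]+)");
    std::vector<std::string> hits;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        // regex_search looks for the pattern anywhere in the line.
        if (std::regex_search(line, pattern)) hits.push_back(line);
    }
    return hits;
}
```

Fed the test.h example, this keeps the same three lines listed above. It is purely textual, though: a line such as "return foo(1);" also matches, so the result is a list of candidates rather than a list of declarations.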

Jussi Jumppanen
Zeus For Windows - "The ultimate programmer's editor/IDE"
http://www.zeusedit.com

 
Joseph Paterson
      12-06-2006
I suggest you have a look at flex/bison, or ANTLR.

Joseph.

 
 
CTG
      12-06-2006
First of all, sit down and work out the rules.
For example:

The declaration of each function has a '(' followed by a ')' and a ';'
at the end, except in the case of an inline one.


I don't think it's hard at all.


 
 
Evan
      12-06-2006
There are sort of two approaches I see. One is to use text pattern
matching like jussij suggests. (Though remember to also search for A-Z
and if you want to be pedantic, stuff like $ that you can also use in
identifiers but probably no one actually does. Also his won't spot
things like constructors (no return value), functions where there are
newlines in the whitespace (you can't use grep for those), operators,
and probably some other special cases.) There's a variant of this which
would use something like Flex to create a lexer, in which case you just
have to deal with whole tokens. This might be easier if you know
at least a little Flex (or the ideas behind it) and can find the file
that GCC uses or something to do their lexing. Then again, it might
not.
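
The token-based variant can be sketched without Flex. What follows is a hand-written tokenizer swapped in for a generated lexer, under loud assumptions: the input has no comments, string literals or preprocessor lines, and the function names tokenize and names_before_paren are made up for this illustration. Because whitespace is dropped while tokenizing, newlines between a name and its '(' no longer matter, and uppercase letters are handled.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits source text into identifier/number tokens and single-character
// punctuation tokens. All whitespace, including newlines, is skipped.
// Assumption: no comments, string literals or preprocessor directives.
std::vector<std::string> tokenize(const std::string& src) {
    std::vector<std::string> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) {
            ++i;
        } else if (std::isalnum(c) || c == '_') {
            std::size_t start = i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) ||
                    src[i] == '_')) {
                ++i;
            }
            tokens.push_back(src.substr(start, i - start));
        } else {
            tokens.push_back(std::string(1, src[i]));
            ++i;
        }
    }
    return tokens;
}

// Reports every identifier token that is directly followed by a '(' token.
std::vector<std::string> names_before_paren(const std::string& src) {
    const std::vector<std::string> tokens = tokenize(src);
    std::vector<std::string> names;
    for (std::size_t i = 0; i + 1 < tokens.size(); ++i) {
        const std::string& t = tokens[i];
        const bool is_identifier =
            std::isalpha(static_cast<unsigned char>(t[0])) || t[0] == '_';
        if (is_identifier && tokens[i + 1] == "(") names.push_back(t);
    }
    return names;
}
```

This picks up constructors and names split from their parenthesis by a newline, which the line-based regex misses. It still cannot tell a declaration from a call, and it reports keywords such as "if" or "while" when they precede a parenthesis, so some filtering on top would be needed.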

The problem with that is that I'm not sure how hard it would be to get
just the lines in question. I mean, I know that jussij probably didn't
spend a lot of time working on that and could get something more to the
point with some more effort, but I suspect that it would be very
difficult to get something that works in full generality. At the same
time, if your results don't have to be perfect, this solution could be
very lightweight, even to the point of running a slightly modified
version of jussij's regex over your code with grep.

Now, as for if you want exact answers, you might have to go with one of
those parsers. I'll just give a shoutout for one that I know personally
called Elsa. It is complete and accurate enough to parse its own source
then output the source again in a form where it can be compiled and the
rebuilt version used to run the regression suite. At least, I think it
is, though I'm not quite sure how, because I'm currently fixing a
number of "pretty-printing" bugs that block correct translation of the
GCC 3.4 headers. (I'm working on a project that uses it for
source-to-source transformations.) There is one semi-show-stopping bug
in the parsing end though, which is that code containing endl or flush
confuses it. However, replacing endl with "\n" except in the definition
(I use a regex for telling apart uses and the definition; it's not
perfect either) will let things work right. (I know it's not quite
semantics preserving.) However, if you can stand to do that change,
it's quite easy to write an extension that will do what you want.
http://www.cs.berkeley.edu/~smcpeak/...lsa/semgrep.cc
has about a two and a half page long program that is "semantic grep";
you give it a variable name, and it will tell you all the places a
variable with that name is declared or used. On the other hand, if you
want to include it in another project... probably this is not the best
option. See www.cubewano.org/oink.

So the pro of the parser approach is that it's very robust, modulo bugs in
the implementation (which, in the case of Elsa, will hopefully go away
in the fairly near future... Mozilla is eyeing the Oink project --
which now more or less includes Elsa -- for helping them), but the con
is that it is pretty much by definition quite heavyweight. And there
are of course other options here. The other one that might be useful is
OpenC++, though I don't know much about that project. You could also try to
hack the GCC front end. Those are all the open-source C++ parsers I know
of.

Evan Driscoll

 
 
AnonMail2005@gmail.com
      12-06-2006

Your tool to do this will depend on what you want to do with
the output.

As someone else mentioned, you could get the output using
doxygen. I spent a day and a half playing around with its
options and got it to produce what you need plus a ton of
other dependency-related diagrams - class dependencies,
include file dependencies, and function call dependencies.

It's very flexible. I produced HTML output but it can also
produce XML output which can then be processed by some
other program.
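
As a rough sketch of that setup, a Doxyfile fragment along these lines switches off HTML and LaTeX and asks for XML only. The option names are standard doxygen settings; the values (current directory as input, headers only, output directory "xml") are assumptions to adapt.

```
# Doxyfile fragment (sketch): scan headers only, emit XML only
INPUT            = .
FILE_PATTERNS    = *.h
RECURSIVE        = YES
EXTRACT_ALL      = YES
GENERATE_HTML    = NO
GENERATE_LATEX   = NO
GENERATE_XML     = YES
XML_OUTPUT       = xml
```

Running "doxygen Doxyfile" with these settings should write an index.xml into the XML output directory. That file lists each compound (class, file, and so on) together with its members and their kind, such as "function", which may be enough for listing names without reading the per-class files.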

 
 
Henrik Goldman
      12-06-2006
Hi,

> As someone else mentioned, you could get the output using
> doxygen. I spent a day and a half playing around with its
> options and got it to produce what you need plus a ton of
> other dependency-related diagrams - class dependencies,
> include file dependencies, and function call dependencies.
>
> It's very flexible. I produced HTML output but it can also
> produce XML output which can then be processed by some
> other program.


That actually sounds like a very useful idea. I just had a quick look and it
certainly looks interesting. It seems to give what I need but generates a lot
of output, so I must look into which files need to be parsed etc.

-- Henrik


 
 
Henrik Goldman
      12-06-2006
Hi Evan,

Thanks for the suggestions.

I did look into Elsa but found it rather huge for my simple needs. Basically
I am trying to create an obfuscator which just changes the names of functions
and classes. Elsa can probably do a lot more than just this, but the time to
learn how things work far exceeds the needs of my project.

-- Henrik


 
 
Default User
      12-06-2006
CTG wrote:

> you first of all sit down and work out the rules:



Please don't top-post. Your replies belong following or interspersed
with properly trimmed quotes. See the majority of other posts in the
newsgroup, or the group FAQ list:
<http://www.parashift.com/c++-faq-lite/how-to-post.html>

> examples:
>
> declaration of each function has a '(' followed by a ')' and a ';'
> at the end, except in the case of an inline one.


How do you distinguish that from a function call?

>
> I don't think it's hard at all.


That probably means you haven't thought enough.

Such prototype declarations are not required by the language.

You have to be able to handle this as well:


void f()
{
    return;
}

int main()
{
    f();
    return 0;
}


So no semicolon and no inline keyword to help. I recommend not trying
to roll your own on this. Use one of the prefab programs mentioned
elsewhere.
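
For readers who still want to experiment, one heuristic for telling the definitions above apart from the call is to skip function bodies entirely. This is a technique brought in for illustration, not something proposed in this thread, and the names skip_balanced and declared_functions are hypothetical. The sketch records a name followed by '(', skips the parameter list, and if a '{' comes before the next ';' it skips to the matching '}', so calls inside bodies are never examined. It assumes no comments, string literals, macros or brace-initialised constructor members.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Given the index of an opening bracket, returns the index just past the
// matching closing bracket, or s.size() if the input is unbalanced.
static std::size_t skip_balanced(const std::string& s, std::size_t i,
                                 char open, char close) {
    int depth = 0;
    for (; i < s.size(); ++i) {
        if (s[i] == open) ++depth;
        else if (s[i] == close && --depth == 0) return i + 1;
    }
    return s.size();
}

// Collects names that look like function declarations or definitions
// outside of function bodies. Class and namespace braces are walked
// through; function bodies are skipped as a whole.
std::vector<std::string> declared_functions(const std::string& src) {
    std::vector<std::string> names;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (!(std::isalpha(c) || c == '_')) { ++i; continue; }
        // Read one identifier.
        std::size_t start = i;
        while (i < src.size() &&
               (std::isalnum(static_cast<unsigned char>(src[i])) ||
                src[i] == '_')) {
            ++i;
        }
        // Look past whitespace (including newlines) for a '('.
        std::size_t j = i;
        while (j < src.size() &&
               std::isspace(static_cast<unsigned char>(src[j]))) {
            ++j;
        }
        if (j >= src.size() || src[j] != '(') continue;
        names.push_back(src.substr(start, i - start));
        j = skip_balanced(src, j, '(', ')');  // skip the parameter list
        // A ';' ends a prototype; a '{' opens a body that is skipped.
        while (j < src.size() && src[j] != ';' && src[j] != '{') ++j;
        if (j < src.size() && src[j] == '{') {
            j = skip_balanced(src, j, '{', '}');
        } else if (j < src.size()) {
            ++j;  // consume the ';'
        }
        i = j;
    }
    return names;
}
```

On the example above this yields f and main and ignores the call to f() inside main. It is easy to fool: a namespace-scope initialiser such as "int x = foo(3);" is still reported, and operators, function pointers and templates are not handled, which supports the advice to use an existing tool for anything beyond rough results.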



Brian
 
 
 
 