Parsing C header files with python

 
 
Ian McConnell
 
      08-21-2004
I've got a header file which lists a whole load of C functions of the form

int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);

It's a numerical library so all functions return an int and accept varying
combinations of float pointers, ints and doubles.

What's the easiest way of breaking down this header file into a list of
functions and their arguments using python? Is there something that will
parse this (perhaps a protoize.py)? I don't want (or understand!) a full C
parser, just this simple case.

It seems like someone should have done something like this before, but
googling for python, header file and protoize just gives me information on
compiling python. If there isn't anything I'll have a go with regexps.

The reason for parsing the header file is that I want to generate (using
python) a wrapper to allow the library to be called from a different language.
I've only got to generate this wrapper once, so the python doesn't have to
be efficient.

Thanks,
Ian



--
"Thinks: I can't think of a thinks. End of thinks routine": Blue Bottle
 
Ville Vainio
      08-21-2004
>>>>> "Ian" == Ian McConnell <> writes:

Ian> I've got a header file which lists a whole load of C functions of the form
Ian> int func1(float *arr, int len, double arg1);
Ian> int func2(float **arr, float *arr2, int len, double arg1, double arg2);

Ian> It's a numerical library so all functions return an int and
Ian> accept varying combinations of float pointers, ints and
Ian> doubles.

Ian> What's the easiest way breaking down this header file into a
Ian> list of functions and their argument using python? Is there

Well, what comes immediately to mind (I might be overlooking
something) is that the function name is immediately before '(', and
arguments come after it separated by ','. Start with regexps and work
from there...
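
A minimal sketch of that regexp approach, assuming every prototype has the form "int name(args);", every argument is named, and there are no comments, macros or function-pointer arguments. The names PROTO_RE and parse_prototypes are made up for illustration:

```python
import re

# Matches prototypes of the form "int name(args);" only.
PROTO_RE = re.compile(r"\bint\s+(\w+)\s*\(([^)]*)\)\s*;")


def parse_prototypes(text):
    """Return a list of (function_name, [(type, name), ...]) tuples."""
    functions = []
    for match in PROTO_RE.finditer(text):
        func_name, arg_text = match.groups()
        args = []
        for arg in arg_text.split(","):
            arg = arg.strip()
            if not arg or arg == "void":
                continue
            # The argument name is the trailing identifier; everything
            # before it (including any '*') is treated as the type.
            m = re.match(r"(.*?)(\w+)$", arg)
            if m is None:
                raise ValueError("cannot parse argument: %r" % arg)
            # Dropping spaces is fine for the one-word types in this
            # thread, but would mangle e.g. "unsigned int".
            ctype = m.group(1).replace(" ", "")
            args.append((ctype, m.group(2)))
        functions.append((func_name, args))
    return functions


hdr = """
int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);
"""
for name, args in parse_prototypes(hdr):
    print(name, args)
```

Matching against the whole text rather than line by line means a declaration that wraps over several lines is still found.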


--
Ville Vainio http://tinyurl.com/2prnb
 
Paul McGuire
      08-21-2004
"Ian McConnell" <> wrote in message
news:...
> I've got a header file which lists a whole load of C functions of the form
>
> int func1(float *arr, int len, double arg1);
> int func2(float **arr, float *arr2, int len, double arg1, double arg2);
>
> It's a numerical library so all functions return an int and accept varying
> combinations of float pointers, ints and doubles.
>


If regexp's give you pause, try this pyparsing example. It makes heavy use
of setting results names, so that the parsed tokens can be easily retrieved
from the results as if they were named attributes.

Download pyparsing at http://pyparsing.sourceforge.net.

-- Paul


------------------------
from pyparsing import *

testdata = """
int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);
"""

# a C identifier: a letter followed by letters, digits or underscores
ident = Word(alphas, alphanums + "_")
# a base type plus optional '*'s, joined into one token such as "float**"
vartype = Combine(oneOf("float double int") + Optional(Word("*")),
                  adjacent=False)
arglist = delimitedList(Group(vartype.setResultsName("type") +
                              ident.setResultsName("name")))
functionCall = (Literal("int") + ident.setResultsName("name") +
                "(" + arglist.setResultsName("args") + ")" + ";")

# scanString yields (tokens, start, end) for every match in the text
for fn, s, e in functionCall.scanString(testdata):
    print(fn.name)
    for a in fn.args:
        print(" -", a.type, a.name)

------------------------
gives the following output:

func1
 - float* arr
 - int len
 - double arg1
func2
 - float** arr
 - float* arr2
 - int len
 - double arg1
 - double arg2


 
Paddy McCarthy
      08-22-2004
Ian McConnell <> wrote in message news:<>...
> I've got a header file which lists a whole load of C functions of the form
>
> int func1(float *arr, int len, double arg1);
> int func2(float **arr, float *arr2, int len, double arg1, double arg2);
>
> It's a numerical library so all functions return an int and accept varying
> combinations of float pointers, ints and doubles.
>
> What's the easiest way breaking down this header file into a list of
> functions and their argument using python? Is there something that will
> parse this (Perhaps a protoize.py) ? I don't want (or understand!) a full C
> parser, just this simple case.
>

<<SNIP>>
>
> Thanks,
> Ian

Would this suffice:

<CODE>

>>> import re
>>> import pprint
>>> hdr = '''int func1(float *arr, int len, double arg1);
... int func2(float **arr, float *arr2, int len, double arg1, double arg2);
... '''
>>> print(hdr)
int func1(float *arr, int len, double arg1);
int func2(float **arr, float *arr2, int len, double arg1, double arg2);

>>> func2args = {}
>>> for line in hdr.split('\n'):
...     # split on whitespace and punctuation, dropping empty strings
...     words = [word for word in re.split(r'[\s,;()]+', line) if word]
...     # words[0] is the return type, words[1] the function name
...     if len(words) > 2:
...         func2args[words[1]] = words[2:]
...
>>> pprint.pprint(func2args)
{'func1': ['float', '*arr', 'int', 'len', 'double', 'arg1'],
 'func2': ['float',
           '**arr',
           'float',
           '*arr2',
           'int',
           'len',
           'double',
           'arg1',
           'double',
           'arg2']}
>>>


</CODE>
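
The flat token lists above still have the pointer stars attached to the argument names. One possible follow-up step, sketched here under the assumption that the tokens strictly alternate type, name (so "float *arr" or "float* arr", but not "float * arr"), is to pair them up. The helper name pair_arguments is hypothetical:

```python
def pair_arguments(tokens):
    """Turn ['float', '*arr', 'int', 'len'] into
    [('float*', 'arr'), ('int', 'len')]."""
    if len(tokens) % 2:
        raise ValueError("expected alternating type/name tokens")
    pairs = []
    for ctype, name in zip(tokens[0::2], tokens[1::2]):
        # Count leading '*' on the name and move them onto the type.
        stars = len(name) - len(name.lstrip("*"))
        pairs.append((ctype + "*" * stars, name[stars:]))
    return pairs


print(pair_arguments(['float', '**arr', 'float', '*arr2', 'int', 'len']))
```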
 
Ian McConnell
      08-22-2004
"Paul McGuire" <._bogus_.com> writes:

> "Ian McConnell" <> wrote in message
> news:...
>> I've got a header file which lists a whole load of C functions of the form
>>
>> int func1(float *arr, int len, double arg1);
>> int func2(float **arr, float *arr2, int len, double arg1, double arg2);
>>
>> It's a numerical library so all functions return an int and accept varying
>> combinations of float pointers, ints and doubles.
>>

>
> If regexp's give you pause, try this pyparsing example. It makes heavy use
> of setting results names, so that the parsed tokens can be easily retrieved
> from the results as if they were named attributes.
>
> Download pyparsing at http://pyparsing.sourceforge.net.


Thanks. Your example with pyparsing was just what I was looking for. It also
copes very nicely with newlines and spacing in the header file.

 
Paul McGuire
      08-23-2004
"Ian McConnell" <> wrote in message
news:...
> "Paul McGuire" <._bogus_.com> writes:
>

<snip>
>
> Thanks. Your example with pyparsing was just what I was looking for. It also
> copes very nicely with newlines and spacing in the header file.
>

Ian -

It is at just this kind of one-off parsing job that I think pyparsing really
shines. I am sure that you could have accomplished this with regexp's, but
a) it would have taken at least a bit longer
b) it would have required more whitespace handling (such as function decls
that span linebreaks)
c) it would have been trickier to accommodate unanticipated changes (support
for other arg data types (such as char, long), embedded comments, etc.)

BTW, all it takes to make this grammar comment-immune is to add the
following statement before calling scanString():

functionCall.ignore( cStyleComment )

cStyleComment is predefined in the pyparsing module to recognize /* ... */
comments. Adding this will properly handle (i.e., skip over) definitions
like:

/*
int commentedOutFunc(float arg1, float arg2);
*/

Try that with regexp's!
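
For comparison, the regexp route needs an explicit comment-stripping pass before any matching. A minimal sketch, assuming only /* ... */ comments (no // comments, no comment markers inside string literals) and the simple "int name(...);" prototypes from this thread. The names COMMENT_RE, PROTO_RE and function_names are illustrative only:

```python
import re

# Non-greedy so that each comment ends at its own "*/";
# DOTALL lets a comment span several lines.
COMMENT_RE = re.compile(r"/\*.*?\*/", re.DOTALL)
PROTO_RE = re.compile(r"\bint\s+(\w+)\s*\(([^)]*)\)\s*;")


def function_names(text):
    """Return the names of prototypes that are not commented out."""
    # Replace each comment with a space so the tokens on either
    # side of it do not run together.
    uncommented = COMMENT_RE.sub(" ", text)
    return [m.group(1) for m in PROTO_RE.finditer(uncommented)]


header = """
int func1(float *arr, int len, double arg1);
/*
int commentedOutFunc(float arg1, float arg2);
*/
int func2(float **arr,
          float *arr2, int len,
          double arg1, double arg2);
"""
print(function_names(header))  # prints ['func1', 'func2']
```

It works for this case, but each new wrinkle (another comment style, another type) means another hand-written pattern.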

-- Paul


 
Miki Tebeka
      08-23-2004
Hello Ian,

> I've got a header file which lists a whole load of C functions of the form
>
> int func1(float *arr, int len, double arg1);
> int func2(float **arr, float *arr2, int len, double arg1, double arg2);
>
> It's a numerical library so all functions return an int and accept varying
> combinations of float pointers, ints and doubles.
>
> What's the easiest way breaking down this header file into a list of
> functions and their argument using python? Is there something that will
> parse this (Perhaps a protoize.py) ? I don't want (or understand!) a full C
> parser, just this simple case.

There is an ANSI-C parser in ply (http://systems.cs.uchicago.edu/ply/)
which you can use.

Bye.
--
------------------------------------------------------------------------
Miki Tebeka <>
http://tebeka.spymac.net
The only difference between children and adults is the price of the toys
 