parse bib file in C

 
 
Rudra Banerjee
      06-30-2012
I would be grateful if somebody could show me the way to parse a bib file using a C program.
I am a novice in C, so despite the many hits I get when I search for file parsers in C, I have failed to create a bib parser.
A bib file has the following structure:


###############################
## SAMPLE BIB FILE FORMAT ##
@article{key1(alpha-numeric),
Title="Some Title(char)",
Author="Author List(char)",
Year="2012(int)",
volume="123(int)",
Pages="321(int)",
journal="Publishers(char)"
}


@book{key2(alpha-numeric),
Title="Some OTHER Title(char)",
Author="OTHER Author List(char)",
Year="2010(int)",
volume="1234(int)",
Pages="4321(int)",
Publishers="Publishers(char)",
Address="Publishers Address(alpha-numeric)",
Edition="Books Edition"
}
#################################
I want to parse this type of file and put it in a 2d array.
Please help.
 
Rudra Banerjee
      06-30-2012
Thanks a lot for trying to help me.

On Saturday, 30 June 2012 10:00:18 UTC+5:30, Barry Schwarz wrote:
> A 2d array of what? What parts of the above data do you want to save?


I want a 2d array of the entries, say,
array[1][0]=article; array[1][1]=key1; array[1][2]=Some Title; array[1][3]=Author List
array[2][0]=book; array[2][1]=key2; array[2][2]=Some OTHER Title; array[2][3]=OTHER Author List
etc.

> How many @article entries are there in the input? How many @book
> entries? Do you intend for the entire array to be in memory at once?


I would really like to keep it general, so it should parse both @article and @book entries. Also, the @article entries are not all grouped together ahead of the @book entries; the two kinds can be interleaved. I would like to store the output in memory on the fly.

> What will you do with the data once you parse it? Are the int values
> actually imbedded in quotes?

Yes
>Is the order of the data fixed?


No

> Is every entry guaranteed to have all the data you show?


No

> Is the file well behaved (every left brace has a matching right brace, the reverse
> also, every left parenthesis has a corresponding right parenthesis and
> conversely, all quotes occur in pairs, etc)?


Yes
> Are the datum ID and the datum always on the same line?

That is the general practice, but it's not a RULE.
> Do any lines have multiple data?


They may, if there are several authors. But for my purpose, parsing the first two authors will be sufficient.

> Are the volume and journal IDs really not capitalized?

The initial letters are capitalized, as in Phys. Rev. B.

Following are three entries from a real bib file. I hope this helps.

@article{Armgnac1930,
author = "Armgnac, Alden C.",
journal = "Popular Science",
month = "December",
pages = "31",
title = "{New Steel Alloy Is Rust Proof}",
year = "1930"
}

@book{ashcroftsolid,
author = "Ashcroft, NW and Mermin, ND",
booktitle = "{Solid State Physics}",
publisher = "Brooks Cole",
title = "{Solid State Physics}",
x-fetchedfrom = "Google Scholar",
year = "1976"
}

@article{Banerjee2010a,
author = "Banerjee, Mitali and {\textbf{Rudra Banerjee}} and Majumdar, A.K. and Mookerjee, Abhijit and Sanyal, Biplab and Nigam, A.K.",
doi = "10.1016/j.physb.2010.07.028",
file = ":home/rudra/Documents/papers/sdarticle(3).pdf:pdf",
issn = "09214526",
journal = "Physica B: Condensed Matter",
keywords = "Magnetic phases; Spin glasses",
number = "20",
pages = "4287--4293",
publisher = "Elsevier",
title = "{Magnetism in NiFeMo disordered alloys: Experiment and theory}",
volume = "405",
year = "2010"
}




 
Rudra Banerjee
      06-30-2012
If anyone could kindly provide a small code sample, I can try to build on it and come back with any problems.
 
Rudra Banerjee
      06-30-2012
I would be grateful if someone could show me some sample code, so that I can build on it and come back if I face any problems.
 
none
      06-30-2012
Rudra Banerjee wrote:
>I would be grateful if somebody could show me the way to parse a bib file
>using a C program. I am a novice in C, so despite the many hits I get when
>I search for file parsers in C, I have failed to create a bib parser.


You may want to have a look at the bibtex parser called btparse.
You will find it at:

http://www.cpan.org/authors/Greg_War...se-0.34.tar.gz

But since you say you are a novice, you will probably
find it too advanced. You will need to know something
about lexical analyzers and parsers to begin making sense
of the program. Consider setting this project aside for a
later time and picking something more suitable for a novice.

--
Rouben Rostamian

 
Rudra Banerjee
      06-30-2012
Thanks for the link.
Previously I wrote a naive XML-to-BibTeX converter using libxml.
So I hope that if someone shows me the first step, I can manage.

Being a novice, though, I was not aware of terms like lexical parser. Thanks for pointing me in the right direction.
 
Ben Bacarisse
      06-30-2012
Rudra Banerjee writes:

> I would be grateful if somebody could show me the way to parse a bib file
> using a C program. I am a novice in C, so despite the many hits I get when
> I search for file parsers in C, I have failed to create a bib parser.


The default answer to any such question is "use lex and yacc" (the widely
used free implementations being flex and GNU bison). Today there are many other similar
programs, but lex and yacc have lots of tutorial material written about
them so I think they might still be the beginner's choice. For Usenet
help on using them, comp.unix.programmer might be the best place, though
there may be more specific groups.

But another question comes to mind. If you are a C novice, why are you
doing this in C? You say elsewhere that you want the result in a "2d
array" but that does not seem like the right structure for any bibtex
processing that I can think of. What is the top-level task you are
trying to achieve, and why do you think C is the right way to do it?

<snip>
--
Ben.
 
Jorgen Grahn
      06-30-2012
On Sat, 2012-06-30, Rudra Banerjee wrote:
> I would be grateful if somebody could show me the way to parse a bib file
> using a C program.

....
> ## SAMPLE BIB FILE FORMAT ##
> @article{key1(alpha-numeric),
> Title="Some Title(char)",
> Author="Author List(char)",
> Year="2012(int)",
> volume="123(int)",
> Pages="321(int)",
> journal="Publishers(char)"
> }


To be pedantic, you probably mean BibTeX. Bib is another, much older
tool which used (more or less) the refer(1) file format:

%T Some Title
%A Author1
%A Author2
%A Author3
%D 2012
%V 123
%J Publishers

/Jorgen

--
// Jorgen Grahn <grahn@ Oo o. . .
\X/ snipabacken.se> O o .
 
Rudra Banerjee
      06-30-2012
On Saturday, 30 June 2012 17:22:02 UTC+5:30, Ben Bacarisse wrote:
> But another question comes to mind. If you are a C novice, why are you
> doing this in C? You say elsewhere that you want the result in a "2d
> array" but that does not seem like the right structure for any bibtex
> processing that I can think of. What is the top-level task you are
> trying to achieve, and why do you think C is the right way to do it?
> Ben.


What I want to achieve is a JabRef-like viewer built with GTK. My primary programming knowledge is in Fortran, and this is a hobby project, so I don't think Python would be a very good option for me. As for the 2d array, say array[i][j], it is as shown previously:
"I want a 2d array of the entries, say,
array[1][0]=article; array[1][1]=key1; array[1][2]=Some Title; array[1][3]=Author List
array[2][0]=book; array[2][1]=key2; array[2][2]=Some OTHER Title; array[2][3]=OTHER Author List
etc."
 
Stefan Ram
      06-30-2012
Rudra Banerjee writes:
>I would be grateful if someone could show me some sample code, so that I
>can build on it and come back if I face any problems.


To write a parser for a language, one does not want to use
examples of that language, but a grammar for that language.
Give a C programmer a grammar and some money and he'll
happily write a parser for you. As for an example:

(When replying to this post, please do not quote all of it,
only the few lines you directly refer to.)

In order to interpret or translate an expression (term), it is
decomposed into lexical units (tokens, words), which then are
used by a parser to build symbols and a structured
representation of the input. This representation then might be
evaluated or translated into some other representation.

The syntactic structuring mirrors the rules for the
construction of an expression, which are often given as
so-called "productions" in EBNF (Extended Backus-Naur Form)
and which are sometimes left-recursive.

When writing a parser, the left-recursive productions sometimes
are a worry to the author, because it is not obvious how to
avoid an infinite recursion. The solution is to rewrite them as
right-recursive productions.

Addition with a binary infix operator, for example, is
left-associative. However, it is simpler to analyze in a
right-associative manner. Therefore, one analyzes the source
using right-associative rules and then creates a result
using a left-associative interpretation.

A left-associative grammar might be, for example, as follows.

<numeral> ::= '2' | '4' | '5'.
<expression> ::= <numeral> | <expression> '+' <numeral>.
start symbol: <expression>.

To analyze this using a recursive descent parser, one
prefers to use the following grammar.

<numeral> ::= '2' | '4' | '5'.
<expression> ::= <numeral>[ '+' <expression> ].
start symbol: <expression>.

This can be written using iteration as follows.

<numeral> ::= '2' | '4' | '5'.
<expression> ::= <numeral>{ '+' <numeral> }.
start symbol: <expression>.

However, the result is still computed in the sense of the
first grammar, that is, left-associatively. Example code follows.

#include <stdio.h> /* printf */

/* scanner: return the next character of the input and advance */

static inline char get( void )
{ static char const * const source = "2+4+5)";
  static int pos = 0;
  return source[ pos++ ]; }

/* parser */

static inline int numeral( void ){ return get() - '0'; }

/* <expression> ::= <numeral>{ '+' <numeral> }. */
static int sum( void ){ int result = numeral();
  while( '+' == get() )result += numeral();
  return result; }

/* main */

int main( void ){ printf( "sum = %d\n", sum() ); return 0; }

To handle operators of higher priority, such as
multiplication, the grammar can be extended.

<numeral> ::= '2' | '4' | '5'.
<product> ::= <numeral> | <product> '*' <numeral>.
<sum> ::= <product> | <sum> '+' <product>.
start symbol: <sum>.

In iterative notation:

<numeral> ::= '2' | '4' | '5'.
<product> ::= <numeral>{ '*' <numeral> }.
<sum> ::= <product>{ '+' <product> }.
start symbol: <sum>.

In C:

#include <stdio.h> /* printf */

/* scanner: return the current character, then advance by `move`
   (get( 0 ) peeks, get( 1 ) consumes) */

static inline char get( int const move )
{ static char const * const source = "2+4*5)";
  static int pos = 0;
  char const c = source[ pos ];
  pos += move;
  return c; }

/* parser */

static inline int numeral( void ){ return get( 1 ) - '0'; }

static int product( void ){ int result = numeral();
  while( '*' == get( 0 )){ get( 1 ); result *= numeral(); }
  return result; }

static int sum( void ){ int result = product();
  while( '+' == get( 0 )){ get( 1 ); result += product(); }
  return result; }

/* main */

int main( void ){ printf( "sum = %d\n", sum() ); return 0; }

Exercises

- What is the output of the above programs?

- Extend the last grammar and the last program so as
to handle subtraction.

- Extend the result of the last exercise in order
to handle division.

- Extend the result of the last exercise so that also
numbers with multiple digits are accepted.

- Extend the result of the last exercise so that also
terms in parentheses are accepted. The input "(2+4)*5)"
should give the result "30".

- Extend the result of the last exercise so that
also a unary minus "-" is recognized.

- Extend the result of the last exercise so that
more operators and functions are recognized.

- Extend the result of the last exercise so that
meaningful error messages are created for all
inputs that do not fulfill the rules of the input
language.

- Extend the result of the last exercise so that the
error messages also show the location where the error
was detected. It should be possible to enter an expression
that spans multiple lines, and an error message should
contain the number of the line where the error was
detected.

See also:

http://compilers.iecc.com/crenshaw/


 