Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > getc and ungetc

Reply
Thread Tools

getc and ungetc

 
 
Bill Cunningham
Guest
Posts: n/a
 
      11-18-2004
Would getc and ungetc be the best and most simple what to parse
expressions for a parser?

Bill



 
Reply With Quote
 
 
 
 
Trent Buck
Guest
Posts: n/a
 
      11-18-2004
Quoth Bill Cunningham on or about 2004-11-18:
> Would getc and ungetc be the best and most simple what to parse
> expressions for a parser?


ungetc can't be applied more than once `in a row' (i.e. sequentially).
I suspect that makes for a rather unsuitable function for a parser.

-t
 
Reply With Quote
 
 
 
 
Chris Barts
Guest
Posts: n/a
 
      11-19-2004
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Bill Cunningham wrote:
| Would getc and ungetc be the best and most simple what to parse
| expressions for a parser?
|

It's usually easier to do like K&R did in "The C Programming Language"
(at the end, when they designed the RPN calculator): Read in a block of
text, and then define your own getchar()/ungetchar() functions to push
and pop characters on and off that buffer. You can read in more text (a
block at a time, for efficiency and convenience) when your getchar()
function tries to read beyond the end of your buffer.

You need to write a bit more code yourself, but it's fairly trivial code
and you can reuse it in any other parser you write.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)
Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org

iD8DBQFBnXYsKxatjOtX+j0RAnY8AJ4gpW7DPFF/VgtMv7V2cdfzYINTawCeOcqi
fpvs+q6HPDjLx6sVc1ozUXk=
=A6oi
-----END PGP SIGNATURE-----
 
Reply With Quote
 
Malcolm
Guest
Posts: n/a
 
      11-20-2004
"Trent Buck" <(E-Mail Removed)> wrote
>
> > Would getc and ungetc be the best and most simple what to parse
> > expressions for a parser?

>
> ungetc can't be applied more than once `in a row' (i.e. sequentially).
> I suspect that makes for a rather unsuitable function for a parser.
>

It depends on your parser design.
Most simple parsers used for things like computer languages divide the input
stream into token, and then parse from left to right with one token of "look
ahead". So if your tokens are single characters then getc() and ungetc() may
be adequate.
Of course if you have ambitions to build a natural language parser, or even
some simple grammars with unusual characteristics, then this scheme won't
work, and you will need some method of scanning up and down many tokens on
input.


 
Reply With Quote
 
SM Ryan
Guest
Posts: n/a
 
      11-20-2004
# Of course if you have ambitions to build a natural language parser, or even
# some simple grammars with unusual characteristics, then this scheme won't
# work, and you will need some method of scanning up and down many tokens on
# input.

Such a parser runs the risk of exponential running time, while a tabular
parser doesn't need to back up and has cubic time worst case. ungetc has
at most marginal usability for some types of lexical scanners.

--
SM Ryan http://www.rawbw.com/~wyrmwif/
There are subtler ways of badgering a witness.
 
Reply With Quote
 
Bill Cunningham
Guest
Posts: n/a
 
      11-20-2004

"SM Ryan" <(E-Mail Removed)> wrote in
message news:(E-Mail Removed)...
> # Of course if you have ambitions to build a natural language parser, or

even
> # some simple grammars with unusual characteristics, then this scheme

won't
> # work, and you will need some method of scanning up and down many tokens

on
> # input.
>
> Such a parser runs the risk of exponential running time, while a tabular
> parser doesn't need to back up and has cubic time worst case. ungetc has
> at most marginal usability for some types of lexical scanners.
>

I have k&r 2. What about redirecting fgetc to stdio and using it insteat
getc?

Bill


 
Reply With Quote
 
Bill Cunningham
Guest
Posts: n/a
 
      11-20-2004

"Bill Cunningham" <(E-Mail Removed)> wrote in message
news:cbNnd.214$(E-Mail Removed)-media.com...
>
> "SM Ryan" <(E-Mail Removed)> wrote in
> message news:(E-Mail Removed)...
> > # Of course if you have ambitions to build a natural language parser, or

> even
> > # some simple grammars with unusual characteristics, then this scheme

> won't
> > # work, and you will need some method of scanning up and down many

tokens
> on
> > # input.
> >
> > Such a parser runs the risk of exponential running time, while a tabular
> > parser doesn't need to back up and has cubic time worst case. ungetc has
> > at most marginal usability for some types of lexical scanners.
> >

> I have k&r 2. What about redirecting fgetc to stdio and using it insteat
> getc?
>
> Bill


I've heard of top down recursive parsers.
Bill


 
Reply With Quote
 
Herbert Rosenau
Guest
Posts: n/a
 
      11-21-2004
On Thu, 18 Nov 2004 20:21:41 UTC, "Bill Cunningham" <(E-Mail Removed)>
wrote:

> Would getc and ungetc be the best and most simple what to parse
> expressions for a parser?


Yes. But be aware of that ungetc can only unget ONE char at a time.

On other hand you can simply by macro or function extend getc and
ungetc to unget more than one char. Your UNGETC() would accept a
number of chars to give back in your GETC() in reverse order. Your
GETC() will give back the chars UNGETC had received before it gets new
chars from the stream itself.

A bit tricky is to ungent a char that you have never gotten - legally
as there is nothing that forbids it and ungenc requires the char that
is to unget. Be sure that you does NOT tries to unget EOF, this won't
work. This can be very useful when you has a long list of keyword
delemiters with same meaning to a long list of similar keywords.

Your parser may convert keywords or keychars into tokens and save the
tokens until all or a part of the input stream is readed and then work
on the generated token, it may work in other ways you thinks it
matches your requirements.

getc() (in conjunktion with ungetc() ) gives you the strongest
possible control over the stream you can ever need. When needed you
can siply count the number of chars, words, lines... readed in as side
effect, reset these counters as needed..... You avoids supervising of
buffers - you does need one.

Build your parser as state mashine and you can reuse the same code
again and again beside the little number of statements you needs to
handle a specific (sub)state. You gets high flexibility as you would
easy extend the functionality of the parser by create a new
(sub)state. Makes maintenance an easy work.

--
Tschau/Bye
Herbert

Visit http://www.ecomstation.de the home of german eComStation

 
Reply With Quote
 
Malcolm
Guest
Posts: n/a
 
      11-22-2004

"Bill Cunningham" <(E-Mail Removed)> wrote
>
> I've heard of top down recursive parsers.
>

What is important in terms of the tokenizer is that they be left-right with
only one token of lookahead. Most practical parsers are in this class.

Unfortunately, using getc / ungetc means that the tokens are constrained to
be single characters. This may be Ok for a very simple application, but if
the tokens are naturally several characters long it will be a nuisance. You
can do it, for instance if you have a mathematical function tan() then you
could build it up from the latters 't' 'a' and 'n' rather than reading it in
as a single token TAN. But it is generally a lot easier to do the lexical
analysis before the parse proper.


 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
StringIO#ungetc returns dupes in Ruby1.9? David Masover Ruby 0 06-06-2008 09:51 AM
getc() and EOF stephen.r.gray@gmail.com C Programming 5 01-27-2007 08:24 AM
scanf(), ungetc() behaviour. Argento C Programming 62 10-06-2006 11:04 PM
ungetc av C Programming 15 07-19-2006 05:31 PM
getc can return EOF, but ungetc can't sent it back... why? TTroy C Programming 11 03-14-2005 12:13 AM



Advertisments