Histogram of character frequencies

 
 
John Bode
Guest
Posts: n/a
 
      12-03-2007
On Dec 1, 5:02 pm, (E-Mail Removed) wrote:
> Johannes Bauer wrote:
> > (E-Mail Removed) wrote:

>
> > > int x[256]; // frequencies

>
> > Global.

>
> It's completely acceptable to have variables defined at file scope in
> C!
>


But it's often not a good idea. You shouldn't define a variable at
file scope unless it *needs* to be at file scope.

>
>
> > > void main()

>
> > Illegal.

>
> Why does everyone have this hangup about this? I took a class in C a
> while back and my teacher always used void main() { ... }. I can
> confirm that it works fine with both MicroSoft compiler and BorLand.
>


The document which defines the C language provides two possible
definitions for main():

int main(void)
int main(int argc, char **argv) /* or char *argv[], which is equivalent */

Many implementations will load and execute the program with no
*apparent* problem if main() is typed void, but that's no guarantee
that the system has not been left in bad or inconsistent state. There
is at least one platform out there (Acorn?) that will not load the
program *at all* if main() is not typed int.

Using int main() is guaranteed to work everywhere (at least on every
hosted implementation). Using void main() is *not* guaranteed to work
everywhere.

>
>
> > Either you are pretty dumb or you actually do not read ANY of the
> > answers which are given to you. I vote for number one.

>
> I read the answers but mostly people only comment on trivial things
> that aren't even errors! I'll be glad to have substantial comments on
> my code.
>


void main() is an error unless your particular compiler explicitly
supports it (which it probably doesn't).
 
 
 
 
 
John Bode
Guest
Posts: n/a
 
      12-03-2007
On Dec 1, 12:21 pm, (E-Mail Removed) wrote:
> Hello everyone,
>
> Thanks again for all the suggestions, though I think some people are a
> bit fussy in their answers.
>
> Here is a solution to Exercise 1.14. It deals well with control
> characters too.
>
> // make histogram of character frequencies
>
> int x[256]; // frequencies
>
> void main()


int main(void)

> {
> char c;


int c; /* getchar() returns int */

> int i, y=0, z;
> while(! feof(stdin) )
> if(++x[c=getchar()]>y)
> y=x[z=c];


feof() will not return true until *after* you try to read past the end
of file. You should check against the result of getchar() first:

while ((c = getchar()) != EOF)
{
    if (++x[c] > y)
        y = x[c]; /* z never gets used again, so we don't bother with it */
}
if (c == EOF)
{
    if (feof(stdin))
    {
        /* reached end-of-file condition */
    }
    else
    {
        /* Some other read error */
    }
}

Also, years of experience have taught me that it's best to use
compound statements (i.e., braces) for all but the innermost scope of
a nested statement like that; it just makes things easier to read.

> do {
> for(i=0; i<256; i++)
> if(x[i]>0)
> printf("%s", x[i]>y ? " * " : " ");
> printf("\n");
> } while(y--);
> for(i=0; i<256; i++)
> if(x[i]>0)
> if(i>32)
> printf(" %c ", i);
> else
> printf("%02x ", i);


Again, I'd suggest using compound statements for all but the innermost
scope; it will just make things easier to follow in the future.

> printf("\n");
>
> }
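
Putting those corrections together, a minimal sketch might look like the following. The names count_chars(), print_histogram() and NCHARS are made up for illustration and are not from the original program; the sketch assumes the caller passes in a zeroed array, and it uses isgraph() from <ctype.h> in place of the comparison against 32.

```c
#include <ctype.h>
#include <limits.h>
#include <stdio.h>

#define NCHARS (UCHAR_MAX + 1)   /* one slot per possible getc() value */

/* Count how often each character occurs in `in`.
   `freq` must be zeroed by the caller. Returns the largest count. */
int count_chars(FILE *in, int freq[NCHARS])
{
    int c;          /* int, not char, so EOF stays distinguishable */
    int max = 0;

    while ((c = getc(in)) != EOF) {
        if (++freq[c] > max) {
            max = freq[c];
        }
    }
    return max;
}

/* Print a vertical histogram: one column per character that occurred. */
void print_histogram(FILE *out, const int freq[NCHARS], int max)
{
    int row, i;

    for (row = max; row > 0; row--) {
        for (i = 0; i < NCHARS; i++) {
            if (freq[i] > 0) {
                fputs(freq[i] >= row ? " * " : "   ", out);
            }
        }
        putc('\n', out);
    }
    for (i = 0; i < NCHARS; i++) {
        if (freq[i] > 0) {
            if (isgraph(i)) {
                fprintf(out, " %c ", i);              /* visible glyph */
            } else {
                fprintf(out, "%02x ", (unsigned)i);   /* hex code otherwise */
            }
        }
    }
    putc('\n', out);
}
```

A main() declared as int main(void) would define int freq[NCHARS] = {0}, call count_chars(stdin, freq), and pass the result to print_histogram(stdout, freq, max).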


 
 
 
 
 
Keith Thompson
Guest
Posts: n/a
 
      12-03-2007
santosh <(E-Mail Removed)> writes:
> Keith Thompson wrote:
>> (E-Mail Removed) writes:

> <snip>
>
>>> if(x[i]>0)
>>> if(i>32)

>>
>> Where do you get the value 32? It happens to be the code for the ' '
>> character in ASCII. Your code will be much clearer if you use a
>> character constant ' ' rather than an integer constant 32.
>>
>> Presumably the point is to determine whether i is a printable
>> character. Use the isgraph() function, declared in <ctype.h>, for
>> this.

>
> <snip>
>
> Why not isprint() instead of isgraph()?


Because the original program prints the hex code, not the character
itself, for the space character. isgraph() tests for printing
characters other than ' '.
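
To make the difference concrete, here is a small sketch. The helper name label() is hypothetical; it formats one column label the way the original program does, but asks isgraph() instead of comparing against 32. The argument is assumed to be a value representable as unsigned char, as the <ctype.h> functions require.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper: write a three-column label for character code c
   into buf. Characters with a visible glyph are written as themselves;
   everything else, including ' ', is written as two hex digits.
   c must be representable as unsigned char. */
int label(char *buf, size_t size, int c)
{
    if (isgraph(c)) {
        return snprintf(buf, size, " %c ", c);
    }
    return snprintf(buf, size, "%02x ", (unsigned)c);
}
```

With isprint() in place of isgraph(), the space character would be written as itself and the column would look empty.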

--
Keith Thompson (The_Other_Keith) <(E-Mail Removed)>
Looking for software development work in the San Diego area.
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
 
Chris Torek
Guest
Posts: n/a
 
      12-03-2007
>santosh wrote:
>> A C "byte" may be wider than 8 bits.


In article <fj13cg$me0$(E-Mail Removed)> O_TEXT <(E-Mail Removed)> wrote:
>By usual definition, a byte is exactly 8 bits.


See <http://web.torek.net/torek/c/silliness.html>

>Most platforms works with ASCII character set.
>
>Do you know many platforms which work with a character set not
>compatible with ASCII?


Sure: a number of IBM mainframes use EBCDIC.
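
As an illustration of what breaks there: in EBCDIC the lowercase letters are split into separate runs, so the common idiom c - 'a' does not yield 0..25. A sketch of a version that makes no assumption about the encoding follows; letter_index() is a hypothetical name, and the lookup string is the small extra table such code usually needs.

```c
#include <string.h>

/* Hypothetical helper: return 0..25 for a lowercase letter of the basic
   alphabet, or -1 for anything else. It does not assume that the letters
   have consecutive codes. */
int letter_index(int c)
{
    static const char alphabet[] = "abcdefghijklmnopqrstuvwxyz";
    const char *p;

    if (c == '\0') {
        return -1;              /* strchr() would match the terminator */
    }
    p = strchr(alphabet, c);
    return p != NULL ? (int)(p - alphabet) : -1;
}
```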

>How do you program with a variety of distincts characters on
>distincts platforms with the standard C?


It is mostly a matter of finding assumptions about character sets
(such as the assumption that "all alphabetic characters are
contiguous") and cleaning them up. There is usually a tradeoff
involved -- you may need an extra lookup table here or there, for
instance -- but in general the cost of accommodating different
"native text" encodings is relatively low, especially when compared
with the cost of accommodating UTF-8, UTF-16, Unicode, and/or
"internationalization".
--
In-Real-Life: Chris Torek, Wind River Systems
Salt Lake City, UT, USA (40°39.22'N, 111°50.29'W) +1 801 277 2603
email: forget about it http://web.torek.net/torek/index.html
Reading email is like searching for food in the garbage, thanks to spammers.
 
 
Barry Schwarz
Guest
Posts: n/a
 
      12-04-2007
On Sat, 1 Dec 2007 09:21:15 -0800 (PST),
(E-Mail Removed) wrote:

>Hello everyone,
>
>Thanks again for all the suggestions, though I think some people are a
>bit fussy in their answers.
>
>Here is a solution to Exercise 1.14. It deals well with control
>characters too.


For some version of "solution" and "deals well".

>
>// make histogram of character frequencies
>
>int x[256]; // frequencies
>
>void main()


Trolling are you?

>{
> char c;
> int i, y=0, z;
> while(! feof(stdin) )


And again?

> if(++x[c=getchar()]>y)


If char defaults to signed, this will invoke undefined behavior just
after reading the last char (if you can generate an end of file
condition on stdin).
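
A sketch of how to keep the index in range; both helper names are hypothetical. The first handles a value that came straight from getchar(), the second a plain char that has already been stored somewhere, such as in a string.

```c
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: count one getchar() result.
   Returns 1 if a character was counted, 0 if c was EOF. */
int count_result(int freq[UCHAR_MAX + 1], int c)
{
    if (c == EOF) {
        return 0;   /* EOF is negative: never use it as an index */
    }
    freq[c]++;      /* any other getchar() result is in 0..UCHAR_MAX */
    return 1;
}

/* Hypothetical helper: index for a plain char. The cast keeps the
   result non-negative on implementations where char is signed. */
int char_index(char ch)
{
    return (unsigned char)ch;
}
```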

> y=x[z=c];


What do you use z for?

> do {
> for(i=0; i<256; i++)
> if(x[i]>0)
> printf("%s", x[i]>y ? " * " : " ");


Your viewing device supports 256 characters on a single line?

> printf("\n");
> } while(y--);
> for(i=0; i<256; i++)
> if(x[i]>0)
> if(i>32)


What do you think is magical about 32? My system has numerous
unprintable characters above that value.

> printf(" %c ", i);
> else
> printf("%02x ", i);
> printf("\n");
>}



Remove del for email
 
 
Barry Schwarz
Guest
Posts: n/a
 
      12-06-2007
On Sun, 2 Dec 2007 06:39:20 -0800 (PST),
(E-Mail Removed) wrote:

>James Kuyper wrote:
>> (E-Mail Removed) wrote:
>> > Johannes Bauer wrote:
>> >> (E-Mail Removed) wrote:
>> >>
>> >>> int x[256]; // frequencies
>> >> Global.
>> >
>> > It's completely acceptable to have variables defined at file scope in
>> > C!

>>
>> What's acceptable is not always a good idea. Global objects have many
>> disadvantages; they should be avoided except when necessary; they aren't
>> necessary in this case.

>
>In this case they help simplify the code - the array gets initialized
>to 0 at compile-time, instead of needing extra code for an
>initialization loop bad for efficiency!


But you could achieve the same effect without any of the problems that
global variables cause simply by declaring the array static inside the
function.
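
For instance, a minimal sketch (tally() is a hypothetical helper, not part of the original program): the array has static storage duration, so it starts out zeroed exactly like a file-scope array, but its name is visible only inside the function.

```c
#include <limits.h>

/* Hypothetical helper: count one character and return its new total.
   freq is zero-initialised before program startup because it is static,
   and it cannot be touched from outside this function. */
int tally(unsigned char c)
{
    static int freq[UCHAR_MAX + 1];

    return ++freq[c];
}
```

Note that the counts persist from one call to the next, so the function still carries hidden state.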

snip

>OK you're right I should remember that. However I don't think it's the
>end of the world - the standard library is always linked in so the
>right functions will be found in the end by the linker.


The headers have very little to do with what the linker will link in
with your code and everything to do with the compiler generating
correct code. Leaving out stdlib.h and calling malloc introduces
undefined behavior. Leaving out string.h and passing anything other
than a void* or char* to memcpy or memset introduces undefined
behavior.

snip


Remove del for email
 
 
Peter 'Shaggy' Haywood
Guest
Posts: n/a
 
      12-06-2007
Groovy hepcat (E-Mail Removed) was jivin' in comp.lang.c
on Mon, 3 Dec 2007 1:39 am. It's a cool scene! Dig it.

> James Kuyper wrote:
>> (E-Mail Removed) wrote:
>> > Johannes Bauer wrote:
>> >> (E-Mail Removed) wrote:
>> >>
>> >>> int x[256]; // frequencies
>> >> Global.
>> >
>> > It's completely acceptable to have variables defined at file scope
>> > in C!

>>
>> What's acceptable is not always a good idea. Global objects have many
>> disadvantages; they should be avoided except when necessary; they
>> aren't necessary in this case.

>
> In this case they help simplify the code - the array gets initialized
> to 0 at compile-time, instead of needing extra code for an
> initialization loop bad for efficiency!


Others have shown you how to initialise block scope arrays. The
generated object code may simply be a loop in which elements are
assigned a given value. In that case initialisation may be no more
efficient than a loop you write yourself. This is also true of the
implicit initialisation of a file scope array.
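
A short sketch of a block scope array zeroed by its initialiser; max_frequency() is a hypothetical function used only to give the array something to do.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: return the count of the most frequent byte
   in data[0..n-1]. */
int max_frequency(const unsigned char *data, size_t n)
{
    /* The first element is set to 0 explicitly; the remaining elements
       are zeroed implicitly because the initialiser list is shorter
       than the array. */
    int freq[UCHAR_MAX + 1] = {0};
    int max = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (++freq[data[i]] > max) {
            max = freq[data[i]];
        }
    }
    return max;
}
```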
Code that makes extensive use of things like global variables is often
called "spaghetti code". Only meat ball programmers write code like
that.

>> > Why does everyone have this hangup about this? I took a class in C
>> > a while back and my teacher always used void main() { ... }. I can
>> > confirm that it works fine with both MicroSoft compiler and
>> > BorLand.

>>
>> That doesn't make it legal. A conforming implementation of C is
>> allowed to reject a program which declares main() that way.

>
> But no "conforming implementation" on Windows rejects it! I don't
> believe any C compiler anywhere would reject it.


Let's test that assertion, shall we? I reboot to Windoze (because I'm
using Linux), open a console window and enter these lines:

-----------------------------------------------------------------------
copy con testing.c
#include <stdio.h>

void main(void)
{
puts("Hello, World!");
}
^z
bcc32 -A -etesting.exe -w testing.c
-----------------------------------------------------------------------

The resulting output from Borland Builder is as follows:

-----------------------------------------------------------------------
Borland C++ 5.3 for Win32 Copyright (c) 1993, 1998 Borland International
testing.c:
Error testing.c 4: main must have a return type of int in function main
*** 1 errors in Compile ***
-----------------------------------------------------------------------

That's an error message (which halts compilation), not merely a warning
(which allows compilation to continue). When invoked with the -A
command line option (which forces it to be standard compliant) the
Borland compiler rejects void as a return type for main(). Not only has
"any C compiler anywhere" rejected it, but a "'conforming
implementation' on Windows" has rejected it.
Clearly, therefore, you are wrong.

>> > I read the answers but mostly people only comment on trivial things
>> > that aren't even errors! I'll be glad to have substantial comments
>> > on my code.

>>
>> Several of the "trivial" things people have commented on ARE errors,
>> and serious ones - you don't seem to understand how serious. Most
>> importantly, #inclusion of the appropriate standard headers is
>> absolutely essential for your code to even compile, at least under
>> most implementations of C. If what you've given us is the complete text
>> of your program, and if you are using a compiler which accepts your
>> code as written, junk it - it's teaching you some very bad habits.

>
> OK you're right I should remember that. However I don't think it's the
> end of the world - the standard library is always linked in so the
> right functions will be found in the end by the linker.


Who says? The library may not be linked without the compiler magic
contained in the headers. Or it may be linked, but the functions not
called properly. The point is that failing to include the proper
headers is a very serious error, and you must understand this.

>> Also, you're using feof() incorrectly, and until you understand why
>> the way that you're using it is incorrect, I would not recommend
>> relying upon any of your programs to function properly.

>
> I don't really understand the problem with feof - it just checks if
> the EOF indicator is set in a given FILE * struct. Anyway I'll read
> about it.


Many newbies think feof()'s purpose is to indicate when the end of a
file is reached by a read function. This is incorrect. Its purpose is
to indicate when a file stream's end of file indicator is set. This
only happens when you try to read from a stream that has *already*
reached the end. To explain this more clearly, consider the following
situation.
Suppose you have a stream containing three bytes, and you are reading
one byte at a time, using getchar(), in a loop, like so:

while(!feof(stdin))
{
    c = getchar();
    putchar(c);
}

On the first iteration of the loop you test the end of file indicator
for the input stream (stdin in this example), and it is not set, so you
then read the first byte and write this out. On the second iteration of
the loop you test the end of file indicator again, and it is not set,
so you then read the second byte and write this out. On the third
iteration of the loop you test the end of file indicator again, and it
is not set, so you then read the third byte and write this out. So far
so good. But the end of file indicator is still *not* set, and feof()
will return false. So you iterate a fourth time and try another read.
The read fails, of course, because there are no more bytes in the
stream. This is the perfect time to exit the loop; and since getchar()
returns EOF to indicate that it failed to read a byte, you have the
perfect way to detect this situation. However, you ignore this value
and simply continue processing the (now invalid) data you think you've
read from the file: you pass EOF to putchar(), which converts it to
unsigned char and writes a junk byte to stdout. The failed read has set
the end of file indicator, so feof() now returns true. But it's too
late. You don't test for this until the beginning of the fifth
iteration, *after* you've used the invalid data. You've read in three
valid bytes and written out four bytes, one of which is not valid.
What you should be doing is this:

int c; /* c must be an int so we can detect EOF. */

while(EOF != (c = getchar()))
{
    putchar(c);
}
if(!feof(stdin)) /* Or we could use if(ferror(stdin)). */
{
    /* File read error: handle it somehow. */
}

Here we attempt to read a byte with getchar(), and only enter the loop
if the return value does not indicate a failure to read a byte. After
the failure code (EOF) has been detected, the loop is exited, and we
then attempt to determine whether the failure occurred due to an error
or an end of file condition. Here's a breakdown of how it works (using
the same 3 byte example input as before).
On the first iteration we read the first byte and test whether the
read was successful. It was, so we output the byte. On the second
iteration we read the second byte and test whether the read was
successful. It was, so we output the byte. On the third iteration we
read the third byte and test whether the read was successful. It was,
so we output the byte. On the fourth iteration we read a byte, but the
stream is exhausted and the read fails; getchar() returns EOF. We
detect this and exit the loop. We've read in three valid bytes and
written out three bytes, all of which are valid. *Now* we call feof()
to test whether the failure was due to an end of file condition, and,
if so, skip the error handling code. However, if feof() returns false,
then the read failure must have been due to an error, in which case we
handle the error somehow (perhaps by emitting a diagnostic message and
quitting).
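
The two walkthroughs can be checked with a small sketch that counts loop iterations instead of writing bytes. The function names are made up for illustration, and the sketch assumes an ordinary stream that ends without a read error.

```c
#include <stdio.h>

/* The pattern to avoid: test feof() before reading. Returns how many
   times the loop body ran. (If a read error occurred, feof() would
   never become true and this loop would not terminate.) */
int count_with_feof(FILE *in)
{
    int n = 0;

    while (!feof(in)) {
        int c = getc(in);
        (void)c;    /* a real program would use c here, even when it is EOF */
        n++;
    }
    return n;
}

/* The recommended pattern: test the value returned by the read. */
int count_with_eof_check(FILE *in)
{
    int c;
    int n = 0;

    while ((c = getc(in)) != EOF) {
        n++;
    }
    return n;
}
```

On a three-byte stream the first function runs its body four times and the second three times, matching the description above.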

--
Dig the sig!

----------- Peter 'Shaggy' Haywood ------------
Ain't I'm a dawg!!
 
 
Flash Gordon
Guest
Posts: n/a
 
      12-06-2007
Barry Schwarz wrote, On 06/12/07 02:14:
> On Sun, 2 Dec 2007 06:39:20 -0800 (PST),
> (E-Mail Removed) wrote:
>
>> James Kuyper wrote:
>>> (E-Mail Removed) wrote:
>>>> Johannes Bauer wrote:
>>>>> (E-Mail Removed) wrote:
>>>>>
>>>>>> int x[256]; // frequencies
>>>>> Global.
>>>> It's completely acceptable to have variables defined at file scope in
>>>> C!
>>> What's acceptable is not always a good idea. Global objects have many
>>> disadvantages; they should be avoided except when necessary; they aren't
>>> necessary in this case.

>> In this case they help simplify the code - the array gets initialized
>> to 0 at compile-time, instead of needing extra code for an
>> initialization loop bad for efficiency!

>
> But you could achieve the same effect without any of the problems that
> global variables cause simply by declaring the array static inside the
> function.


That still has a number of the problems of global variables, just not
all of them.

>> OK you're right I should remember that. However I don't think it's the
>> end of the world - the standard library is always linked in so the
>> right functions will be found in the end by the linker.

>
> The headers have very little to do with what the linker will link in
> with your code and everything to do with the compiler generating
> correct code. Leaving out stdlib.h and calling malloc introduces
> undefined behavior. Leaving out string.h and passing anything other
> than a void* or char* to memcpy or memset introduces undefined
> behavior.


Actually, any call to memcpy or memset without a declaration in scope
invokes undefined behaviour even if you pass void* parameters. This is
because the return type is void*, and the return type being wrong causes
undefined behaviour even if you do not use the value returned. I can
even think of ways it could cause problems!
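
For comparison, a sketch of the well-defined version, with both headers included so the compiler sees the real prototypes; copy_ints() is a hypothetical helper. Without the headers, a C90 compiler would assume that malloc() and memcpy() return int, which is where the trouble starts on implementations whose pointers and ints differ in size or are returned differently.

```c
#include <stdlib.h>   /* declares malloc() and free() */
#include <string.h>   /* declares memcpy() */

/* Hypothetical helper: return a malloc'ed copy of src[0..n-1],
   or NULL if the allocation fails. The caller frees the result. */
int *copy_ints(const int *src, size_t n)
{
    int *dst = malloc(n * sizeof *dst);

    if (dst != NULL) {
        memcpy(dst, src, n * sizeof *dst);
    }
    return dst;
}
```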
--
Flash Gordon
 
 
RoS
Guest
Posts: n/a
 
      12-09-2007
On Fri, 07 Dec 2007 00:52:19 +1100, Peter 'Shaggy' Haywood
wrote:

>while(!feof(stdin))
>{
> c = getchar();
> putchar(c);
>}
>
>On the first iteration of the loop you test the end of file indicator
>for the input stream (stdin in this example), and it is not set, so you
>then read the first byte and write this out. On the second iteration of
>the loop you test the end of file indicator again, and it is not set,
>so you then read the second byte and write this out. On the third
>iteration of the loop you test the end of file indicator again, and it
>is not set, so you then read the third byte and write this out. So far
>so good. But the end of file indicator is still *not* set, and feof()
>will return false. So you iterate a fourth time and try another read.
>The read fails, of course, because there are no more bytes in the
>stream. This is the perfect time to exit the loop; and since getchar()
>returns EOF to indicate that it failed to read a byte, you have the
>perfect way to detect this situation. However, you ignore this value
>and simply continue processing the (now invalid) data you think you've
>read from the file. You send EOF to stdout. *Now* the end of file
>indicator is set, and feof() returns true. But it's too late. You don't
>test for this until the beginning of the fifth iteration, *after*
>you've used the invalid data. You've read in three valid bytes and
>written out four bytes, one of which is not valid.
> What you should be doing is this:
>
>int c; /* c must be an int so we can detect EOF. */
>
>while(EOF != (c = getchar()))
>{
> putchar(c);
>}
>if(!feof(stdin)) /* Or we could use if(ferror(stdin)). */
>{
> /* File read error: handle it somehow. */
>}


so what is right?

while(1)
{
    c = getchar();
    if( feof(stdin) || ferror(stdin) ) break;
    putchar(c);
}

or

while(1)
{
    c = getchar();
    if( feof(stdin) ) break;
    putchar(c);
}

or

while(1)
{
    c = getchar();
    if( ferror(stdin) ) break;
    putchar(c);
}

or none of the above?

thank you

>Here we attempt to read a byte with getchar(), and only enter the loop
>if the return value does not indicate a failure to read a byte. After
>the failure code (EOF) has been detected, the loop is exited, and we
>then attempt to determine whether the failure occurred due to an error
>or an end of file condition. Here's a breakdown of how it works (using
>the same 3 byte example input as before).
> On the first iteration we read the first byte and test whether the
>read was successful. It was, so we output the byte. On the second
>iteration we read the second byte and test whether the read was
>successful. It was, so we output the byte. On the third iteration we
>read the third byte and test whether the read was successful. It was,
>so we output the byte. On the fourth iteration we read a byte, but the
>stream is exhausted and the read fails; getchar() returns EOF. We
>detect this and exit the loop. We've read in three valid bytes and
>written out three bytes, all of which are valid. *Now* we call feof()
>to test whether the failure was due to an end of file condition, and,
>if so, skip the error handling code. However, if feof() returns false,
>then the read failure must have been due to an error, in which case we
>handle the error somehow (perhaps by emitting a diagnostic message and
>quitting).

 
 
William Pursell
Guest
Posts: n/a
 
      12-09-2007
On 9 Dec, 06:51, RoS <(E-Mail Removed)> wrote:

> so what is right?

<snip bad code>
>
> or none of the above?


None of the above. Try:

while( EOF != ( c = getchar())) {
    /* ... */
}

/* Now check feof() and ferror() */
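
Filled in, that outline might look like the sketch below. The function copy_stream() and the enum names are hypothetical; the only library calls used are getc(), putc() and ferror().

```c
#include <stdio.h>

/* Hypothetical status codes for the sketch. */
enum copy_status { COPY_OK, COPY_READ_ERROR, COPY_WRITE_ERROR };

/* Copy everything from in to out, one character at a time. */
enum copy_status copy_stream(FILE *in, FILE *out)
{
    int c;  /* int, so that EOF can be told apart from every character */

    while ((c = getc(in)) != EOF) {
        if (putc(c, out) == EOF) {
            return COPY_WRITE_ERROR;
        }
    }

    /* getc() returned EOF; only now ask the stream why. */
    if (ferror(in)) {
        return COPY_READ_ERROR;
    }
    return COPY_OK;     /* no error, so the end of file indicator is set */
}
```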
 