Velocity Reviews - Computer Hardware Reviews


simple file compression program

 
 
sophia
      03-26-2008
Dear all,

The following is a file compression program, using elimination of
spaces, which I saw in a book:

#include<stdio.h>
#include<stdlib.h>

int main(int argc,char * argv[])
{

FILE* fs,*ft;

fs = fopen(argv[1],"r");
if(fs == NULL)
{
printf("\n Cannot open the file %s",argv[1]);
exit(1);
}

ft = fopen(argv[2],"w");
if(fs == NULL)
{
printf("\n Cannot open the file %s",argv[2]);
exit(1);
}

while( (ch=fgetc(fs)) != EOF)
{

if(ch == 32)
{
if( (ch=fgetc(fs)) != EOF)
fputc(ch+127,ft);
}
else
fputc(ch,ft);

}

fclose(fs);
fclose(ft);

return EXIT_SUCCESS;
}

Now my questions are as follows:

1) Is there any other simpler method to compress text files, similar
to the above program (other than standard algorithms like Huffman, LZW)?
 
 
 
 
 
mstorkamp@yahoo.com
 
      03-26-2008
On Mar 26, 3:09 pm, sophia <(E-Mail Removed)> wrote:
> if(ch == 32)
> {
> if( (ch=fgetc(fs)) != EOF)
> fputc(ch+127,ft);
> }
> else
> fputc(ch,ft);


What happens when the character represented by the value 32 is the
last character in the file? You are not writing any representation of
that character to your output file. You will not be able to recreate
your source file.

> Now my questions are as follows:
>
> 1) Is there any other simpler method to compress text files, similar
> to the above program (other than standard algorithms like Huffman, LZW)?


Yes. Not really a C issue, however. First define what you mean by 'text
file', then devise a way of mapping the (smaller) domain of your text
file into the (larger) domain of an unsigned char. And don't forget to
open your destination file for binary access.
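A minimal sketch of that suggestion, assuming "text file" means 7-bit ASCII (bytes 0 to 127): a space followed by a byte c is packed into the single byte c | 0x80, which cannot occur in such input, and a space at the very end of the file is written literally, so nothing is lost. The names pack_spaces and unpack_spaces are hypothetical; in a real program both files would be opened with "rb" and "wb".

```c
#include <stdio.h>

/* Sketch only. ASSUMES the input is 7-bit ASCII text (bytes 0..127), so
   byte values 0x80..0xFF are free to carry "space + character" pairs. */

/* Packs each ' ' followed by a 7-bit byte c into the single byte c | 0x80.
   A space at the end of the file is written as a plain ' '.
   Returns 0 on success, or -1 if a byte >= 0x80 is seen (the output is
   then incomplete). */
int pack_spaces(FILE *in, FILE *out)
{
    int ch;                              /* int, so EOF can be detected */

    while ((ch = fgetc(in)) != EOF) {
        if (ch >= 0x80)
            return -1;
        if (ch == ' ') {
            int next = fgetc(in);
            if (next == EOF) {           /* trailing space: keep it */
                fputc(' ', out);
                break;
            }
            if (next >= 0x80)
                return -1;
            fputc(next | 0x80, out);     /* space + next -> one byte */
        } else {
            fputc(ch, out);
        }
    }
    return 0;
}

/* Reverses pack_spaces: a byte with the high bit set expands to ' '
   followed by its low seven bits; every other byte is copied. */
void unpack_spaces(FILE *in, FILE *out)
{
    int ch;

    while ((ch = fgetc(in)) != EOF) {
        if (ch & 0x80) {
            fputc(' ', out);
            fputc(ch & 0x7F, out);
        } else {
            fputc(ch, out);
        }
    }
}
```

Because the decoder only needs the high bit to tell packed bytes from plain ones, the scheme is reversible exactly when the input really is 7-bit; that is why the encoder refuses anything else instead of guessing.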
 
 
 
 
 
Walter Roberson
 
      03-26-2008
In article <(E-Mail Removed)>,
sophia <(E-Mail Removed)> wrote:

>The following is a file compression program, using elimination of
>spaces, which I saw in a book:
>
>#include<stdio.h>
>#include<stdlib.h>
>
>int main(int argc,char * argv[])
>{
>
> FILE* fs,*ft;
>
> fs = fopen(argv[1],"r");
> if(fs == NULL)
> {
> printf("\n Cannot open the file %s",argv[1]);


You are not outputting a \n as the last character. It is
implementation-defined whether the last output line will
appear in such a case (and it is also possible that it will appear
but then be immediately overwritten by the next shell prompt, making
it seem that it did not appear).

Error messages are better output to stderr.

> exit(1);


The effect of exit(1) is implementation-defined. The only arguments
with a portable meaning are 0, EXIT_SUCCESS and EXIT_FAILURE.
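Putting the last two points together, here is a small sketch (open_or_report and open_pair are hypothetical helper names) that writes the message to stderr with a trailing newline and reports failure with EXIT_FAILURE. It also tests ft after the second fopen; the original program tests fs a second time, so a failure to open the output file goes unnoticed.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: opens path in the given mode; on failure prints a
   newline-terminated message on stderr and returns NULL. */
FILE *open_or_report(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);

    if (fp == NULL)
        fprintf(stderr, "Cannot open the file %s\n", path);
    return fp;
}

/* Hypothetical helper: opens the input and output files in binary mode.
   Returns EXIT_SUCCESS, or EXIT_FAILURE if either open fails; nothing is
   left open on failure. */
int open_pair(const char *src, const char *dst, FILE **fs, FILE **ft)
{
    *fs = open_or_report(src, "rb");
    if (*fs == NULL)
        return EXIT_FAILURE;

    *ft = open_or_report(dst, "wb");
    if (*ft == NULL) {                   /* check ft, not fs */
        fclose(*fs);
        *fs = NULL;
        return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

In main this might be used as `int status = open_pair(argv[1], argv[2], &fs, &ft); if (status != EXIT_SUCCESS) return status;`, since returning a value from main is equivalent to calling exit with it.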

> }
>
> ft = fopen(argv[2],"w");
> if(fs == NULL)
> {
> printf("\n Cannot open the file %s",argv[2]);
> exit(1);
> }
>
>while( (ch=fgetc(fs)) != EOF)


You have not declared ch by this point. The exact definition of ch
is important to the program. For example, if it were declared as
'char' and 'char' happened to be unsigned on that system, then
it would not be possible for ch to compare equal to EOF, which is
always negative.
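A sketch of the reading loop with ch declared as int and the space written as ' ' rather than 32 (count_spaces is a hypothetical name used only for illustration):

```c
#include <stdio.h>

/* Counts the spaces in a stream. ch is an int because fgetc returns
   either a character as an unsigned char converted to int, or the
   negative value EOF; on typical systems a plain char cannot hold all of
   those values distinctly. Comparing with ' ' instead of 32 works in any
   character set. */
long count_spaces(FILE *fp)
{
    int ch;
    long n = 0;

    while ((ch = fgetc(fp)) != EOF)
        if (ch == ' ')
            n++;
    return n;
}
```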

>{
>
> if(ch == 32)


What is 32? If you mean a space, code a space, ' ' . The numerical
values of particular characters are not specified in C.

> {
> if( (ch=fgetc(fs)) != EOF)
> fputc(ch+127,ft);


As the character set representation is not specified by C, it
is possible that ch+127 is itself a valid character that can also appear
in the input, in which case the output cannot be decoded unambiguously.

If the file ends in a 32 then that trailing 32 will be lost with
your logic.

I note that you do not open the files in binary mode. In the input,
space characters could often immediately precede end-of-line
indicators. The end-of-line indicators would be read as '\n', and that
'\n' would be transformed by your compressor to '\n'+127, which is
unlikely to be an end-of-line indicator. You could thus end up with
output lines that exceed the maximum text line length supported by the
implementation. You could also happen upon characters for which the
character + 127 comes out as '\n', thus introducing an end of line
where there was none before.

> }
> else
> fputc(ch,ft);
>
>}
>
> fclose(fs);
> fclose(ft);
>
> return EXIT_SUCCESS;
>}



>Now my questions are as follows:
>
>1) Is there any other simpler method to compress text files, similar
>to the above program (other than standard algorithms like Huffman, LZW)?


Yes, many of them, most equally inefficient. The code you give at
best compresses space followed by a character to a different character
code, and leaves everything else alone -- it doesn't even try to
compress runs of spaces into something more efficient. If the code
were to be applied to typical English text, it would produce a
more efficient output if, instead of compressing spaces, it compressed
'e', 't', 'a', 'i', 'o', or 'n', all of which occur in English text
with greater frequency than space does.
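Rather than guessing which symbols are most common, one can measure them for the files actually being compressed before choosing what to encode specially. A minimal sketch (byte_frequencies and most_frequent are hypothetical names) that tallies every byte value in a stream:

```c
#include <limits.h>
#include <stdio.h>

/* Tallies how many times each byte value occurs in a stream. */
void byte_frequencies(FILE *fp, unsigned long counts[UCHAR_MAX + 1])
{
    int ch;

    for (int i = 0; i <= UCHAR_MAX; i++)
        counts[i] = 0;
    while ((ch = fgetc(fp)) != EOF)
        counts[ch]++;                    /* ch is 0..UCHAR_MAX here */
}

/* Returns the most frequent byte value; ties go to the smaller value. */
int most_frequent(const unsigned long counts[UCHAR_MAX + 1])
{
    int best = 0;

    for (int i = 1; i <= UCHAR_MAX; i++)
        if (counts[i] > counts[best])
            best = i;
    return best;
}
```

Running this over a representative sample of the intended input shows directly which characters an ad hoc scheme would gain the most from.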
--
"The whole history of civilization is strewn with creeds and
institutions which were invaluable at first, and deadly
afterwards." -- Walter Bagehot
 
 
Malcolm McLean
 
      03-26-2008

"sophia" <(E-Mail Removed)> wrote in message
> 1) Is there any other simpler method to compress text files, similar
> to the above program (other than standard algorithms like Huffman, LZW)?
>

squnch compression. It's a sliding dictionary method that has seen
industrial use because of its super-fast decompression. Look in the Basic
Algorithms pages of my website.

--
Free games and programming goodies.
http://www.personal.leeds.ac.uk/~bgy1mm

 
 
Bartc
 
      03-26-2008

"sophia" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Dear all,
>
> The following is a file compression program, using elimination of
> spaces, which I saw in a book:

....
> Now my questions are as follows:
>
> 1) Is there any other simpler method to compress text files, similar
> to the above program (other than standard algorithms like Huffman, LZW)?


Knowing nothing about compression, I had a go myself.

My first attempt looked promising, but I wasn't processing the entire file
so it was actually *doubling* the size!

Had a second attempt, and I think if done properly (tie up all loose ends)
that could achieve 20-30% (reduction that is). But it is not that simple. In
fact it's very fiddly (and requires 2 passes of the input). I guess I could
get it up to 50% if I tried hard.

What compression levels are you trying to achieve? And how simple do you
want it?

In practice I guess it would be a much better idea to use an existing
compression library, unless you like a challenge.

--
Bart



 
 
Barry Schwarz
 
      03-27-2008
On Wed, 26 Mar 2008 13:09:35 -0700 (PDT), sophia
<(E-Mail Removed)> wrote:

>Dear all,
>
>The following is a file compression program, using elimination of
>spaces, which I saw in a book:


Was it listed as a bad example? Perhaps the book was intended as a
satire?

>
>#include<stdio.h>
>#include<stdlib.h>
>
>int main(int argc,char * argv[])
>{
>
> FILE* fs,*ft;
>
> fs = fopen(argv[1],"r");


How does the program know argv[1] is not NULL or for that matter that
it even exists?
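One way to guard against that, sketched with a hypothetical helper name and a made-up usage string, is to check argc before touching argv[1] or argv[2]:

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if exactly two file names were given,
   otherwise prints a usage line on stderr and returns 0. argv[argc] is
   always a null pointer, so argv[1] and argv[2] must not be passed to
   fopen unless argc is at least 3. */
int have_two_args(int argc, char *argv[])
{
    if (argc != 3) {
        /* argv[0] may be an empty string; "compress" is a made-up name */
        const char *prog = (argc > 0 && argv[0][0] != '\0') ? argv[0] : "compress";
        fprintf(stderr, "usage: %s infile outfile\n", prog);
        return 0;
    }
    return 1;
}
```

main would then start with `if (!have_two_args(argc, argv)) return EXIT_FAILURE;`.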

> if(fs == NULL)
> {
> printf("\n Cannot open the file %s",argv[1]);
> exit(1);
> }
>
> ft = fopen(argv[2],"w");
> if(fs == NULL)
> {
> printf("\n Cannot open the file %s",argv[2]);
> exit(1);
> }
>
>while( (ch=fgetc(fs)) != EOF)


Where is ch declared?

>{
>
> if(ch == 32)


32 is not the value of ' ' on my system.

> {
> if( (ch=fgetc(fs)) != EOF)
> fputc(ch+127,ft);


On my system adding 127 to a printable character value will produce a
value that won't fit in a char. While this technically isn't overflow
since fputc takes an int, it will mess up the output file.

It appears to skip only one space. And it does so without regard to
whether the space is "significant".

> }
> else
> fputc(ch,ft);
>
>}
>
> fclose(fs);
> fclose(ft);
>
> return EXIT_SUCCESS;
>}
>
>Now my questions are as follows:
>
>1) Is there any other simpler method to compress text files, similar
>to the above program (other than standard algorithms like Huffman, LZW)?



 
sophia
 
      03-27-2008
On Mar 27, 10:11 am, Barry Schwarz <(E-Mail Removed)> wrote:
> On Wed, 26 Mar 2008 13:09:35 -0700 (PDT), sophia
>
> <(E-Mail Removed)> wrote:
> >Dear all,

>
> >The following is a file compression program, using elimination of
> >spaces, which I saw in a book:

>
> Was it listed as a bad example? Perhaps the book was intended as a
> satire?


I don't know whether the book was intended as satire or not.
The book's ISBN is 81-7656-537-7, and this program is given on
page 55.

> >while( (ch=fgetc(fs)) != EOF)

>
> Where is ch declared?
>
> >{

>
> > if(ch == 32)

>
> 32 is not the value of ' ' on my system.
>
> > {
> > if( (ch=fgetc(fs)) != EOF)
> > fputc(ch+127,ft);

>
> On my system adding 127 to a printable character value will produce a
> value that won't fit in a char. While this technically isn't overflow
> since fputc takes an int, it will mess up the output file.
>
> It appears to skip only one space. And it does so without regard to
> whether the space is "significant".
>
>

I think that to skip more than one space, the following changes can be
made (assuming 32 stands for ' '):


while( (ch=fgetc(fs)) != EOF)
{
if(ch == 32)
{
count = 1;
while( (ch=fgetc(fs)) == 32)
count++;
fputc(count+127,ft);
}
fputc(ch,ft);
}
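For comparison, here is a sketch of the run-of-spaces idea with the loop's edge cases handled, assuming 7-bit ASCII input. In the loop above, if the file ends in spaces the inner while stops at EOF and the final fputc(ch,ft) writes EOF converted to unsigned char; with 8-bit bytes, count+127 also exceeds 255 for runs longer than 128 spaces. The sketch flushes trailing runs explicitly, splits runs longer than 127, and copies a lone space unchanged. The 0x80 + n encoding and the function names are assumptions for illustration, not the book's scheme.

```c
#include <stdio.h>

#define MAX_RUN 127   /* 0x80 + 127 = 0xFF, the largest byte value */

/* Sketch only. ASSUMES 7-bit input (bytes 0..127). A run of n spaces with
   2 <= n <= MAX_RUN is written as the byte 0x80 + n; longer runs are split
   and a lone space is copied unchanged. */
static void flush_run(long count, FILE *out)
{
    while (count > 0) {
        int n = count > MAX_RUN ? MAX_RUN : (int)count;

        fputc(n == 1 ? ' ' : 0x80 + n, out);
        count -= n;
    }
}

/* Returns 0 on success, or -1 if a byte >= 0x80 is seen. */
int encode_space_runs(FILE *in, FILE *out)
{
    int ch;
    long count = 0;

    while ((ch = fgetc(in)) != EOF) {
        if (ch >= 0x80)
            return -1;
        if (ch == ' ') {
            count++;
            continue;
        }
        flush_run(count, out);           /* spaces before this character */
        count = 0;
        fputc(ch, out);                  /* ch is never EOF here */
    }
    flush_run(count, out);               /* spaces at the end of the file */
    return 0;
}

/* Reverses encode_space_runs. */
void decode_space_runs(FILE *in, FILE *out)
{
    int ch;

    while ((ch = fgetc(in)) != EOF) {
        if (ch > 0x80) {
            for (int n = ch - 0x80; n > 0; n--)
                fputc(' ', out);
        } else {
            fputc(ch, out);
        }
    }
}
```

Writing single spaces literally keeps the encoded value range at 0x82..0xFF, so the decoder never has to guess whether a byte is a run or a character.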

Does fputc take a signed int or an unsigned int?
 
 
santosh
 
      03-27-2008
sophia wrote:

> On Mar 27, 10:11 am, Barry Schwarz <(E-Mail Removed)> wrote:
>> On Wed, 26 Mar 2008 13:09:35 -0700 (PDT), sophia
>>
>> <(E-Mail Removed)> wrote:
>> >Dear all,

>>
>> >The following is a file compression program, using elimination of
>> >spaces, which I saw in a book:

>>
>> Was it listed as a bad example? Perhaps the book was intended as a
>> satire?

>
> I don't know whether the book was intended as satire or not.
> The book's ISBN is 81-7656-537-7, and this program is given on
> page 55.
>
>> >while( (ch=fgetc(fs)) != EOF)

>>
>> Where is ch declared?
>>
>> >{

>>
>> > if(ch == 32)

>>
>> 32 is not the value of ' ' on my system.
>>
>> > {
>> > if( (ch=fgetc(fs)) != EOF)
>> > fputc(ch+127,ft);

>>
>> On my system adding 127 to a printable character value will produce a
>> value that won't fit in a char. While this technically isn't
>> overflow since fputc takes an int, it will mess up the output file.
>>
>> It appears to skip only one space. And it does so without regard to
>> whether the space is "significant".
>>
>>

> I think that to skip more than one space, the following changes can be
> made (assuming 32 stands for ' '):
>
>
> while( (ch=fgetc(fs)) != EOF)
> {
> if(ch == 32)


Why not make this ASCII-independent by replacing 32 with ' '?

> {
> count = 1;
> while( (ch=fgetc(fs)) == 32)
> count++;
> fputc(count+127,ft);


And this is also implementation-defined behaviour.

> }
> fputc(ch,ft);
> }
>
> Does fputc take a signed int or an unsigned int?


It takes a signed int argument, but converts that to an unsigned char
before writing it to the stream. If the write fails it returns EOF;
otherwise it returns the character written, converted to int.
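A quick way to see that conversion, using tmpfile() as a scratch binary stream (roundtrip_byte is a hypothetical name):

```c
#include <stdio.h>

/* Writes value with fputc to a scratch binary stream, stores fputc's
   return value in *returned, and returns the byte read back (or EOF if
   the scratch file cannot be created). Both results show the int
   argument converted to unsigned char. */
int roundtrip_byte(int value, int *returned)
{
    FILE *fp = tmpfile();
    int back;

    if (fp == NULL)
        return EOF;
    *returned = fputc(value, fp);
    rewind(fp);
    back = fgetc(fp);
    fclose(fp);
    return back;
}
```

With 8-bit bytes, roundtrip_byte(300, &r) reads back 44 (300 - 256), which is why adding 127 to a character can silently wrap around to an unrelated byte.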

 
 
Barry Schwarz
 
      03-28-2008
On Thu, 27 Mar 2008 02:45:44 -0700 (PDT), sophia
<(E-Mail Removed)> wrote:

>On Mar 27, 10:11 am, Barry Schwarz <(E-Mail Removed)> wrote:
>> On Wed, 26 Mar 2008 13:09:35 -0700 (PDT), sophia
>>
>> <(E-Mail Removed)> wrote:
>> >Dear all,

>>
>> >The following is a file compression program, using elimination of
>> >spaces, which I saw in a book:

>>
>> Was it listed as a bad example? Perhaps the book was intended as a
>> satire?

>
>I don't know whether the book was intended as satire or not.
>The book's ISBN is 81-7656-537-7, and this program is given on
>page 55.
>
>> >while( (ch=fgetc(fs)) != EOF)

>>
>> Where is ch declared?
>>
>> >{

>>
>> > if(ch == 32)

>>
>> 32 is not the value of ' ' on my system.
>>
>> > {
>> > if( (ch=fgetc(fs)) != EOF)
>> > fputc(ch+127,ft);

>>
>> On my system adding 127 to a printable character value will produce a
>> value that won't fit in a char. While this technically isn't overflow
>> since fputc takes an int, it will mess up the output file.
>>
>> It appears to skip only one space. And it does so without regard to
>> whether the space is "significant".
>>
>>

> I think that to skip more than one space, the following changes can be
>made (assuming 32 stands for ' '):


Why assume something known to be false when the expression ' ' will
work every time?

>
>
>while( (ch=fgetc(fs)) != EOF)
>{
> if(ch == 32)
> {
> count = 1;
> while( (ch=fgetc(fs)) == 32)


What happens if the last three characters in the stream are blank?

> count++;
> fputc(count+127,ft);
> }
> fputc(ch,ft);
>}
>
> Does fputc take a signed int or an unsigned int?



 
 
 