Velocity Reviews


analyst41@hotmail.com 11-04-2011 10:07 PM

newby question on C I/O
 
I have a file that looks like

Value Cost Special_Status
12 34 Yes
21 44 yes
32 43 no
......................

I can read it into some arrays (or a data frame using R) using Fortran
trivially

subroutine getdat(values,costs,statuses.items)

character statuses*3
real values,costs
dimension values(1000),Costs(1000),Statuses(1000)

integer items, j

open (unit=1,file="filename")

read (1,*)

j = 0
do
j = j + 1
read (1,*,end=100) values(j),costs(j),statuses(j)
enddo

100 continue

items = j - 1

stop
end

Seems much harder in C - can anyone post an intuitive, easy solution
to do this in C?

Thanks.

Eric Sosman 11-05-2011 12:50 AM

Re: newby question on C I/O
 
On 11/4/2011 6:07 PM, analyst41@hotmail.com wrote:
> I have a file that looks like
>
> Value Cost Special_Status
> 12 34 Yes
> 21 44 yes
> 32 43 no
> .....................
>
> I can read it into some arrays (or a data frame using R) using Fortran
> trivially


Hmmph. This isn't the FORTRAN I knew. But anyhow ...

> subroutine getdat(values,costs,statuses.items)


I don't comprehend the "statuses.items" notation -- as I hinted,
my acquaintance with FORTRAN has been tenuous for the last few decades.
I'll proceed on the assumption that "." should have been ",".

> character statuses*3
> real values,costs
> dimension values(1000),Costs(1000),Statuses(1000)


Is Fortran case-blind? FORTRAN knew only one letter case, but
most compilers would have been mightily confused by the difference
between 'C' and 'c' or 'S' and 's'. (I recall a little game we used
to play in school: The object was to produce an inexplicable listing
of a program and its output, and to post same as a puzzle. One of the
better hinged on the difference between 'C' and 'c', both of which
appeared as 'C' on the monocase printer, but only one of which was
recognized as "this introduces a comment." Coupled with a non-blank
but non-printing character in the "this line continues the previous"
position, this allowed something that looked like a comment in the
listing to be actual, executable code ...)

> integer items, j
>
> open (unit=1,file="filename")
>
> read (1,*)
>
> j = 0
> do
> j = j + 1
> read (1,*,end=100) values(j),costs(j),statuses(j)
> enddo
>
> 100 continue
>
> items = j - 1
>
> stop
> end
>
> Seems much harder in C - can anyone post an intuitive, easy solution
> to do this in C?


    #include <stdio.h>
    #include <stdlib.h>
    void getdat(float values[], float costs[],
                char statuses[][4], int items) {
        char buffer[200];  /* big enough for a line */
        FILE *unit = fopen("filename", "r");
        fgets(buffer, sizeof buffer, unit);  /* read 1st line */
        for (int j = 0; j < items; ++j) {
            /* note that C arrays are 0-based */
            fgets(buffer, sizeof buffer, unit);
            /* %3s: at most 3 characters plus the terminating '\0' */
            sscanf(buffer, "%f%f%3s", &values[j], &costs[j],
                   statuses[j]);
        }
        exit(EXIT_SUCCESS);
    }

A few notes: First, what I've provided is TERRIBLE code,
because it completely ignores the possibility of error: garbage in
the input, actual I/O errors, or anything else. I left it all out
because I don't know what your newfangled Fortran does in such cases
and thus don't know how to imitate it. In my day, FORTRAN had FORMAT
constructs, and there were fatal run-time errors if the input didn't
match the FORMAT.

Second, it would be simple to combine the fgets/sscanf pair into
a single fscanf call. If the absence of error-checking were acceptable
this would be a sensible thing to do. But when error-checking is added,
it will probably turn out to be useful to separate the physical I/O
(fgets) from the interpretation of the data (sscanf) instead of trying
to put them into one portmanteau (fscanf).

Third, both code snippets "leak" an open I/O stream. Since both
terminate the program that's probably not important; the operating
system's post-termination cleanup will probably dispose of them (in
C's case, "probably" becomes "definitely"). But if the eventual intent
is to do something more than just read and ignore all the data, you'd
want an fclose() call in C and an I-don't-know-what in Fortran.

Finally, I reiterate: I'm just guessing at what your Fortran does;
it seems to be some kind of mutant offspring of the FORTRAN I once knew,
and the genetic damage may extend to more than just appearance.
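
The split described in the second note might look roughly like the sketch below, with the error checks added. The names parse_row and read_rows, the FILE * parameter, and the policy of giving up at the first malformed line (a blank line counts as malformed) are assumptions made for illustration; they are not part of the code above.

```c
#include <stdio.h>

/* Interpretation only: examines one line of text and does no I/O.
 * Returns 1 if the line holds exactly two numbers and a status word
 * of at most 3 characters, otherwise 0. */
int parse_row(const char *line, float *value, float *cost, char status[4])
{
    char extra;  /* catches an over-long status or an unexpected 4th field */

    /* %3s stores at most 3 characters plus the terminating '\0'.
     * The trailing " %c" converts only if something non-blank is left over,
     * in which case sscanf() returns 4 and the line is rejected. */
    return sscanf(line, "%f%f%3s %c", value, cost, status, &extra) == 3;
}

/* Physical I/O only: reads lines with fgets() and hands each to parse_row().
 * Returns the number of rows stored, or -1 on a malformed line or I/O error. */
int read_rows(FILE *in, float values[], float costs[],
              char statuses[][4], int maxrows)
{
    char buffer[200];  /* assumed big enough for a line */
    int j = 0;

    if (fgets(buffer, sizeof buffer, in) == NULL)  /* skip the header line */
        return ferror(in) ? -1 : 0;

    while (j < maxrows && fgets(buffer, sizeof buffer, in) != NULL) {
        if (!parse_row(buffer, &values[j], &costs[j], statuses[j]))
            return -1;
        ++j;
    }
    return ferror(in) ? -1 : j;
}
```

Because parse_row() never touches the stream, a bad line cannot push later lines out of step, and the caller remains free to report the line number, skip the line, or stop.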

--
Eric Sosman
esosman@ieee-dot-org.invalid

James Kuyper 11-05-2011 01:46 AM

Re: newby question on C I/O
 
On 11/04/2011 06:07 PM, analyst41@hotmail.com wrote:
> I have a file that looks like
>
> Value Cost Special_Status
> 12 34 Yes
> 21 44 yes
> 32 43 no
> .....................
>
> I can read it into some arrays (or a data frame using R) using Fortran
> trivially
>
> subroutine getdat(values,costs,statuses.items)
>
> character statuses*3
> real values,costs
> dimension values(1000),Costs(1000),Statuses(1000)
>
> integer items, j
>
> open (unit=1,file="filename")
>
> read (1,*)
>
> j = 0
> do
> j = j + 1
> read (1,*,end=100) values(j),costs(j),statuses(j)


The '*' in that READ statement, where a FORMAT would ordinarily go,
indicates list-directed (implicit) formatting. A key difference between C and Fortran is
that I/O is a built-in part of the Fortran language, while in C I/O is
handled by standard library functions. Given the way C functions work,
that means that something equivalent to the implicit formatting used by
the READ statement above isn't feasible in C. The simplest approach
would be to use fscanf():

    FILE *infile = fopen("filename", "r");

    if(infile)
    {
        int j = 0;

        while(fscanf(infile, "%f%f%3s\n",
            values[j], costs[j], statuses[j]) == 3)
        {
            // process the line of data
            j++;
        }
        if(ferror(infile))
        {
            // Error handling.
        }
        else
        {
            // Reached end of file.
        }

        if(fclose(infile))
        { // File couldn't be closed
            // error handling
        }
    }
    else
    { // File couldn't be opened
        // error handling
    }

I tried to keep this code simple, so it can be compared to your
Fortran. I added only a little more error handling than is present in
your code. However, by my standards, this code is still less than ideal,
because it doesn't behave well in the face of format errors in the input
file.

If any line contains a number too big to be represented as a floating
point value, the behavior of fscanf() is undefined. If one of the
numbers is incorrectly formatted, the error detection capabilities of
this approach are limited. If any line contains too many fields, or too
few, the remaining lines in the file will be parsed incorrectly, because
fscanf() doesn't attach any special significance to newlines; they're
just whitespace, equivalent to tabs or spaces. I'm not sufficiently
familiar with the Fortran READ statement to be sure, but I suspect that
many of those issues would be equally problematic for your Fortran code.

I would use fgets() to fill in a line buffer, so I can process the data
one line at a time, checking for the possibility that the line being
read in is longer than the buffer. Format errors on one line won't carry
over to other lines.
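
A minimal sketch of that line-buffer check follows. The name read_line, its return codes, and the choice to discard the tail of an over-long line are assumptions for illustration only.

```c
#include <stdio.h>
#include <string.h>

/* Reads one line into buf (capacity size, which must be at least 2).
 * Returns  1 if a whole line was read,
 *          0 at end of file or on a read error,
 *         -1 if the line did not fit; the rest of that line is then
 *            discarded so the next call starts on a fresh line. */
int read_line(FILE *in, char *buf, size_t size)
{
    int ch;

    if (fgets(buf, (int)size, in) == NULL)
        return 0;

    if (strchr(buf, '\n') != NULL)
        return 1;                   /* newline present: the line fit */

    /* No newline in the buffer.  Peek at what follows. */
    ch = getc(in);
    if (ch == EOF || ch == '\n')
        return 1;                   /* the text fit; only the newline was left */

    while (ch != '\n' && ch != EOF) /* throw away the rest of the long line */
        ch = getc(in);
    return -1;
}
```

The test for a newline in the buffer is what distinguishes a complete line from a truncated one, since fgets() itself reports success in both cases.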

I've only recently learned how to avoid the undefined behavior when the
numbers are too big, by using strtod(). It's more complicated to use
than fscanf("%f"), but it provides much better detection of possible
input errors.
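
For comparison, a hedged sketch of the strtod() approach. The helper name parse_number and its 1/0 return convention are invented for this example; the errno and end-pointer checks are the standard way to detect strtod() failures.

```c
#include <errno.h>
#include <stdlib.h>

/* Converts the number at the start of text.
 * On success stores it in *out, sets *end to the first character after
 * the number, and returns 1.  Returns 0 if no number is present or if
 * strtod() reports a range error. */
int parse_number(const char *text, double *out, char **end)
{
    double d;

    errno = 0;                 /* strtod() only sets errno, never clears it */
    d = strtod(text, end);

    if (*end == text)
        return 0;              /* nothing was converted */
    if (errno == ERANGE)
        return 0;              /* overflow (and, on some systems, underflow) */

    *out = d;
    return 1;
}
```

Unlike fscanf("%f"), an out-of-range value here produces a defined result (a huge value plus ERANGE), so the caller can reject the line instead of invoking undefined behavior. Storing the result in a float would need a further range check, which this sketch leaves out.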
--
James Kuyper

analyst41@hotmail.com 11-05-2011 01:57 AM

Re: newby question on C I/O
 
On Nov 4, 8:50 pm, Eric Sosman <esos...@ieee-dot-org.invalid> wrote:
> Finally, I reiterate: I'm just guessing at what your Fortran does;
> it seems to be some kind of mutant offspring of the FORTRAN I once knew,
> and the genetic damage may extend to more than just appearance.


How weird, you want WORKING code :-)

OK then, if you insist.

> cat data.txt

value cost premium_status
23.0 43 Yes
23 23.3 no
43 60 yes

> cat fortranreader.f95

program main

parameter (maxrows = 10)

real values,costs
character*3 statuses
dimension values(maxrows),costs(maxrows),statuses(1000)

integer items,j

call getdat(values,costs,statuses,items)

do j = 1,items

print *,values(j),costs(j),trim(statuses(j))

enddo

! Do something with the data

stop
end

subroutine getdat(values,costs,statuses,items)

parameter (maxrows = 10)

real values,costs
character*3 statuses
dimension values(maxrows),costs(maxrows),statuses(1000)

integer items,j

open (unit=1,file = 'data.txt')

read (1,*)

do j = 1,maxrows

read (1,*,end=100) values(j),costs(j),statuses(j)

enddo

100 continue

items = j - 1

return
end

> a.out

23.000000 43.000000 Yes
23.000000 23.299999 no
43.000000 60.000000 yes

I am going to type your code in and see what it does. But the working
Fortran shows what I want to do in C (the reason is that I have to
call a C library with the data).

Ike Naar 11-05-2011 06:52 AM

Re: newby question on C I/O
 
On 2011-11-05, James Kuyper <jameskuyper@verizon.net> wrote:
> while(fscanf(infile, "%f%f%3s\n",
> values[j], costs[j], statuses[j]) == 3)


The infamous scanf trap :-)

&values[j], &costs[j], statuses[j]) == 3)

Just curious: is the trailing "\n" in the fscanf format useful,
or would the behaviour be the same if it were omitted?

Malcolm McLean 11-05-2011 11:13 AM

Re: newby question on C I/O
 
On Nov 5, 8:52 am, Ike Naar <i...@iceland.freeshell.org> wrote:
>
> Just curious: is the trailing "\n" in the fscanf format useful,
> or would the behaviour be the same if it were omitted?
>

Whitespace matches whitespace. It's better to put a newline rather
than a space, because it indicates to a reader what you expect, but
from the program's point of view it makes no difference. This is a
flaw in fscanf(). It's a very old function, and in the light of thirty
years' experience we'd specify it a bit differently if it were new.
--
Basic Algorithms - how to do 3D games
http://www.malcolmmclean.site11.com/www

James Kuyper 11-05-2011 12:16 PM

Re: newby question on C I/O
 
On 11/05/2011 02:52 AM, Ike Naar wrote:
> On 2011-11-05, James Kuyper <jameskuyper@verizon.net> wrote:
>> while(fscanf(infile, "%f%f%3s\n",
>> values[j], costs[j], statuses[j]) == 3)

>
> The infamous scanf trap :-)
>
> &values[j], &costs[j], statuses[j]) == 3)


Sorry for the mistake. I don't use fscanf() very often, and not just
because of the problems I mentioned with out-of-range numeric values.
For the last 15 years, most of the files my programs read are in HDF-EOS
format, which they read using third-party library functions. Most of the
rest are in CCSDS Production Data Set format, which they read using fread().

> Just curious: is the trailing "\n" in the fscanf format useful,
> or would the behaviour be the same if it were omitted?


I remember reaching the conclusion, a decade or two ago, that it did
have some useful effect, but I can't find support for that concept by
reading the standard's description of fscanf(). I believe I was assuming
that a white-space directive requires the presence of at least one
corresponding character of white-space, but I can't find explicit
wording to that effect.
--
James Kuyper

Eric Sosman 11-05-2011 01:13 PM

Re: newby question on C I/O
 
On 11/5/2011 2:52 AM, Ike Naar wrote:
> On 2011-11-05, James Kuyper <jameskuyper@verizon.net> wrote:
>> while(fscanf(infile, "%f%f%3s\n",
>> values[j], costs[j], statuses[j]) == 3)

>
> The infamous scanf trap :-)
>
> &values[j],&costs[j], statuses[j]) == 3)
>
> Just curious: is the trailing "\n" in the fscanf format useful,
> or would the behaviour be the same if it were omitted?


It's "useful" if the intended "use" is to convert three
items and then gobble all the whitespace that follows, on the
current and all subsequent lines. If '\n' were omitted the
behavior would change: There'd be no post-conversion gobbling.
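
The difference can be observed with the %n directive, which records how many characters have been consumed so far. The sketch below uses sscanf() on a string rather than fscanf() on a file so that it needs no input file; the function name compare_formats is made up for the illustration.

```c
#include <stdio.h>

/* Parses the same input twice, once without and once with a trailing
 * "\n" in the format, and reports how many characters each call consumed.
 * Returns 1 if both calls converted all three items, otherwise 0. */
int compare_formats(const char *input, int *without_nl, int *with_nl)
{
    float value, cost;
    char status[4];
    int a, b;

    *without_nl = *with_nl = -1;
    a = sscanf(input, "%f%f%3s%n",   &value, &cost, status, without_nl);
    b = sscanf(input, "%f%f%3s\n%n", &value, &cost, status, with_nl);
    return a == 3 && b == 3;
}
```

For the input "12 34 Yes\n\n  21 44 yes\n" the first count is 9 (parsing stops right after "Yes") and the second is 13: the trailing "\n" has swallowed both newlines and the two leading spaces of the following line.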

--
Eric Sosman
esosman@ieee-dot-org.invalid

Eric Sosman 11-05-2011 01:21 PM

Re: newby question on C I/O
 
On 11/5/2011 8:16 AM, James Kuyper wrote:
> On 11/05/2011 02:52 AM, Ike Naar wrote:
>> On 2011-11-05, James Kuyper <jameskuyper@verizon.net> wrote:
>>> while(fscanf(infile, "%f%f%3s\n",
>>> values[j], costs[j], statuses[j]) == 3)

>>
>> The infamous scanf trap :-)
>>
>> &values[j],&costs[j], statuses[j]) == 3)

>
> Sorry for the mistake. I don't use fscanf() very often, and not just
> because of the problems I mentioned with out-of-range numeric values.
> For the last 15 years, most of the files my programs read are in HDF-EOS
> format, which they read using third-party library functions. Most of the
> rest are in CCSDS Production Data Set format, which they read using fread().
>
>> Just curious: is the trailing "\n" in the fscanf format useful,
>> or would the behaviour be the same if it were omitted?

>
> I remember reaching the conclusion, a decade or two ago, that it did
> have some useful effect, but I can't find support for that concept by
> reading the standard's description of fscanf(). I believe I was assuming
> that a white-space directive requires the presence of at least one
> corresponding character of white-space, but I can't find explicit
> wording to that effect.


A single white space character in the format string matches any
number of white space characters in the input, including zero. All
that matters is the whiteness: '\n' will gobble '\n', but also '\t'
and '\r' and '\v' and '\f' and ' ', and perhaps other locale-specific
spaces, in solid runs or in combinations.
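
Both halves of that statement can be checked with a small sketch. The function names are invented for the example, and sscanf() stands in for fscanf() so no input file is needed.

```c
#include <stdio.h>

/* "\n" in the format matches ZERO whitespace characters here:
 * the input has none, yet %c still reads the 'X'. */
int newline_matches_nothing(int *number, char *letter)
{
    return sscanf("7X", "%d\n%c", number, letter);
}

/* The same "\n" also swallows a mixed run of space, tab, carriage
 * return, newline, vertical tab, and form feed before the 'X'. */
int newline_matches_mixed_run(int *number, char *letter)
{
    return sscanf("7 \t\r\n\v\f X", "%d\n%c", number, letter);
}
```

Both calls return 2, storing 7 and 'X', which is why a "\n" in a scanf format cannot be used to insist that a line actually ends at that point.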

--
Eric Sosman
esosman@ieee-dot-org.invalid

BartC 11-06-2011 09:23 PM

Re: newby question on C I/O
 
<analyst41@hotmail.com> wrote in message
news:0f0ee88e-09f4-43ca-8c95-84f9a4846455@c1g2000vbw.googlegroups.com...
> I have a file that looks like
>
> Value Cost Special_Status
> 12 34 Yes
> 21 44 yes
> 32 43 no
> .....................
>
> I can read it into some arrays (or a data frame using R) using Fortran
> trivially
>
> subroutine getdat(values,costs,statuses.items)


> character statuses*3
> real values,costs
> dimension values(1000),Costs(1000),Statuses(1000)


This looks like weird Fortran (I'm sure you never needed to declare arrays in
two steps.)

But anyway, you have a file containing three fields per line (two numbers and
a name), and hopefully not more than 1000 lines, to read into these arrays.

This sounds like line-oriented input. I can do this in C, but with several
layers of functions over the standard C functions. At the bottom of the pile
of functions, I think, there is an fgets() call, an sscanf() call (a version
of scanf), and a bit of string processing to chop up each line into tokens.
sscanf() is made to operate on one token at a time (each a separate string).

Quite a lot of work, but it means that in the end I can just do:

'read the next line into a buffer'

'read a floating point number' (into values[i])
'read another number' (into costs[i])
'read a name' (into status[i])

without ever again having to delve into the mysteries of scanf().

If end-of-buffer is encountered, then it will read 0.0 or "", rather than
wrap to the next line and get everything out of step.

So, yes it is harder than Fortran, but it's doable with some effort.
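
A rough sketch of what such a layer could look like is given below. Every name in it (LineReader, reader_start, next_name, next_float) is hypothetical and only illustrates the idea of tokenizing a line that has already been read with fgets(); it is not the actual library described above.

```c
#include <stdio.h>
#include <string.h>

#define WHITESPACE " \t\r\n\v\f"

/* A cursor over one line of text that has already been read into a buffer. */
typedef struct {
    const char *pos;
} LineReader;

void reader_start(LineReader *r, const char *line)
{
    r->pos = line;
}

/* Copies the next blank-delimited token into out (capacity size >= 1),
 * truncating it if necessary.  Returns the token's length in the input;
 * at end of buffer it stores "" and returns 0. */
size_t next_name(LineReader *r, char *out, size_t size)
{
    size_t len, copy;

    r->pos += strspn(r->pos, WHITESPACE);  /* skip leading blanks */
    len = strcspn(r->pos, WHITESPACE);     /* length of the token */
    copy = len < size ? len : size - 1;    /* truncate if it does not fit */
    memcpy(out, r->pos, copy);
    out[copy] = '\0';
    r->pos += len;
    return len;
}

/* Reads the next token as a number.  Yields 0.0 at end of buffer or
 * for a token that is not numeric, and never runs on to another line. */
double next_float(LineReader *r)
{
    char token[64];
    double d = 0.0;

    if (next_name(r, token, sizeof token) == 0)
        return 0.0;
    if (sscanf(token, "%lf", &d) != 1)     /* sscanf sees one token only */
        return 0.0;
    return d;
}
```

With a line such as "12 34.5 Yes" in the buffer, two next_float() calls followed by one next_name() call pick up the three fields; any further call returns 0.0 or "" because the cursor has reached the end of the buffer.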

--
Bartc


