Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > Looking for advice on how to deal with array of structs

Reply
Thread Tools

Looking for advice on how to deal with array of structs

 
 
Cliff Martin
Guest
Posts: n/a
 
      01-25-2007
Hi,

I am reading a fairly large file a line at a time, doing some
processing, and filtering out bits of the line. I am storing the
interesting information in a struct and then printing it out. This
works without any problems. I now would like to filter "duplicate"
records. They aren't really duplicate, I can't just qsort as is and
eliminate the matching rows. I have number of fields that will be the
same, but some of the fields will differ and the timestamp may be off
by a second or two. I want to eliminate records that have the fields
that match where the difference between timestamps is less than n
seconds. Again, this is not a problem, I can get seconds since epoch
and compare, or just use difftime. My problem is this, I want to build
an array of structs, that will hold lines that match, sort them on the
date member and then eliminate based on matching timestamps.

For example in the set:
foo,bar ...,a, Thu Jan 25 01:40:11 EST 2007
foo,bar ...,a, Thu Jan 25 01:45:35 EST 2007
foo,bar ...,a, Thu Jan 25 01:48:09 EST 2007
foo,bar ...,b, Thu Jan 25 01:40:12 EST 2007
foo,baz ..., Thu Jan 25 01:40:11 EST 2007

I would like to read the first 4 lines into structs, store them in
array, sort them and then print them out, while eliminating the
"duplicate" line 4. I would not want to read the 5th line, yet, because
it is dissimilar to the first 4 - foo, baz instead of foo,bar. I am
ignoring the field that has a or b in it (one of the reasons a simple
sort will not work)

My problem comes from not knowing where to create the array, what size
to allocate, and how to re-initialize it when I move to the next set to
sort.

I have included a mix of psuedo code and real code. I have also made
everything generic. My ultimate questions are:

* Is it a good idea to declare the struct instance outside the while
loop and the reinitialize it every time through the loop, or would it
be better to make it local? The real struct is 132 bytes.

* Is this the best way to (re)initialize a struct (init_cr)?

* How do I (re)initialize an array of struct?

* any comments on how I plan to tackle this

Code:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

typedef struct {
char field1[11];
char field1[11];
time_t date_secs;
} record;

int main(int argc, char *argv[]) {
FILE *fp;
fp = stdin;
record cr, lr;
int len = 512;
char buf[len+1];

record records[11];

fp = fopen(argv[1], "r");
if(fp == NULL) {
fputs("Could not open file for reading", stderr);
exit(1);
}

while(fgets(buf, len, fp)) {
init_cr(&cr);

/* split line into fields */
/* process fields and save in struct */

/* if the fields of interest match last fields */
/* save struct in next position in array */
/* increment counter for how many structs are saved */
/* resize the array to accommodate more structs if needed
*/
/* else */
/* print array of structs */
/* reset array ??? */
}
return 0;
}

void init_cr(callrecord *cr) {
cr->field1[0] = '\0';
cr->field2[0] = '\0';
cr->date_secs = 0;
}


Cliff

 
Reply With Quote
 
 
 
 
santosh
Guest
Posts: n/a
 
      01-25-2007
Cliff Martin wrote:
> Hi,
>
> I am reading a fairly large file a line at a time, doing some
> processing, and filtering out bits of the line. I am storing the
> interesting information in a struct and then printing it out. This
> works without any problems. I now would like to filter "duplicate"
> records. They aren't really duplicate, I can't just qsort as is and
> eliminate the matching rows. I have number of fields that will be the
> same, but some of the fields will differ and the timestamp may be off
> by a second or two. I want to eliminate records that have the fields
> that match where the difference between timestamps is less than n
> seconds. Again, this is not a problem, I can get seconds since epoch
> and compare, or just use difftime. My problem is this, I want to build
> an array of structs, that will hold lines that match, sort them on the
> date member and then eliminate based on matching timestamps.
>
> For example in the set:
> foo,bar ...,a, Thu Jan 25 01:40:11 EST 2007
> foo,bar ...,a, Thu Jan 25 01:45:35 EST 2007
> foo,bar ...,a, Thu Jan 25 01:48:09 EST 2007
> foo,bar ...,b, Thu Jan 25 01:40:12 EST 2007
> foo,baz ..., Thu Jan 25 01:40:11 EST 2007
>
> I would like to read the first 4 lines into structs, store them in
> array, sort them and then print them out, while eliminating the
> "duplicate" line 4. I would not want to read the 5th line, yet, because
> it is dissimilar to the first 4 - foo, baz instead of foo,bar. I am
> ignoring the field that has a or b in it (one of the reasons a simple
> sort will not work)
>
> My problem comes from not knowing where to create the array, what size
> to allocate, and how to re-initialize it when I move to the next set to
> sort.


>From your requirements it appears that a linked list will be a better

option than an array of structs. It's easy to keep it sorted as you add
elements to the list or remove them.

> I have included a mix of psuedo code and real code. I have also made
> everything generic. My ultimate questions are:
>
> * Is it a good idea to declare the struct instance outside the while
> loop and the reinitialize it every time through the loop, or would it
> be better to make it local? The real struct is 132 bytes.


Depends on the desired lifetime for the structure instance.

> * Is this the best way to (re)initialize a struct (init_cr)?
>
> * How do I (re)initialize an array of struct?


Use a FOR loop.

> * any comments on how I plan to tackle this
>
> Code:
> #include <stdio.h>
> #include <stdlib.h>
> #include <time.h>
>
> typedef struct {
> char field1[11];
> char field1[11];


Two objects in scope with the same identifier.

> time_t date_secs;
> } record;
>
> int main(int argc, char *argv[]) {
> FILE *fp;
> fp = stdin;


This is not guaranteed to work on all C implementations.

> record cr, lr;
> int len = 512;
> char buf[len+1];


Nor this.

> record records[11];
>
> fp = fopen(argv[1], "r");
> if(fp == NULL) {
> fputs("Could not open file for reading", stderr);
> exit(1);


Use EXIT_FAILURE instead of 1, unless you have a good reason for using
the latter.

> }
>
> while(fgets(buf, len, fp)) {


fgets() will store only len-1 characters into the buffer so sizeof(buf)
would do.

> init_cr(&cr);
>
> /* split line into fields */
> /* process fields and save in struct */
>
> /* if the fields of interest match last fields */
> /* save struct in next position in array */
> /* increment counter for how many structs are saved */
> /* resize the array to accommodate more structs if needed
> */


Note that you cannot resize a statically allocated array, unless it's a
VLA. Allocate using malloc() and resize with realloc().

> /* else */
> /* print array of structs */
> /* reset array ??? */


If the next iteration will rewrite to the array, then probably this is
not needed.

> }
> return 0;
> }
>
> void init_cr(callrecord *cr) {
> cr->field1[0] = '\0';
> cr->field2[0] = '\0';
> cr->date_secs = 0;
> }


I still feel a linked list of structures may serve your purpose better.

 
Reply With Quote
 
 
 
 
Cliff Martin
Guest
Posts: n/a
 
      01-25-2007
> > typedef struct {
> > char field1[11];
> > char field1[11];Two objects in scope with the same identifier.


should be field2, but just for the example. real code has about 12
different identifiers, all of which are unique.

> > int main(int argc, char *argv[]) {
> > FILE *fp;
> > fp = stdin;

> This is not guaranteed to work on all C implementations.


assigning to a file pointer? So I should just open stdin like a regular
file if the fp is NULL? Is the a compiler option to force this usage?
I'm using gcc.

> > record cr, lr;
> > int len = 512;
> > char buf[len+1];

> Nor this.


what's wrong with this?

> > exit(1);

>Use EXIT_FAILURE instead of 1, unless you have a good reason for using
> the latter.


ok.

> > while(fgets(buf, len, fp)) {

> fgets() will store only len-1 characters into the buffer so sizeof(buf)
> would do.


did not know this. Use this style everywhere. I will correct it.

> */Note that you cannot resize a statically allocated array, unless it's a
> VLA. Allocate using malloc() and resize with realloc().


What is a VLA - Very Large Array?

Thanks,

Cliff

 
Reply With Quote
 
Cliff Martin
Guest
Posts: n/a
 
      01-25-2007
OK, I am using the command line options -ansi and -pedantic. They tell
me some things I'm doing wrong.

I don't understand why this is wrong:

record cr, lr;

The compiler complains I'm mixing declarations and code.

The compiler does not catch the assigning stdin to fp, how can I
enforce that?

Cliff

 
Reply With Quote
 
Cliff Martin
Guest
Posts: n/a
 
      01-25-2007
On Jan 25, 9:16 am, "Cliff Martin" <(E-Mail Removed)> wrote:
> OK, I am using the command line options -ansi and -pedantic. They tell
> me some things I'm doing wrong.
>
> I don't understand why this is wrong:
>
> record cr, lr;
>
> The compiler complains I'm mixing declarations and code.
>


Never mind about the mixing declarations and code, I just had some
lines out of order.

Cliff

 
Reply With Quote
 
Chris Dollin
Guest
Posts: n/a
 
      01-25-2007
Cliff Martin wrote:

(You need to quote more context. Remember not all of us are reading
this on Google Groups.)

> OK, I am using the command line options -ansi and -pedantic. They tell
> me some things I'm doing wrong.
>
> I don't understand why this is wrong:
>
> record cr, lr;
>
> The compiler complains I'm mixing declarations and code.


The preceeding lines are:

> > int main(int argc, char *argv[]) {
> > FILE *fp;
> > fp = stdin;


So you have a declaration (`record ...`) following a statement
(`fp = stdin;`). You can't do this in C90 (which is what the
-ansi -pedantic options get you, I believe) [you /can/ in C99,
and it's a widely available extension].

Easy fix:

FILE *fp = stdin;

Now it's a declaration.

> The compiler does not catch the assigning stdin to fp, how can I
> enforce that?


You don't need to. (I think santosh may have meant that you were
mixing declarations and statements: assigning `stdin` to a FILE*
variable seems OK to me.)

--
Chris "electric hedgehog" Dollin
"A facility for quotation covers the absence of original thought." /Gaudy Night/

 
Reply With Quote
 
David T. Ashley
Guest
Posts: n/a
 
      01-25-2007


"santosh" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed) oups.com...
> Cliff Martin wrote:
>> Hi,
>>
>> I am reading a fairly large file a line at a time, doing some
>> processing, and filtering out bits of the line. I am storing the
>> interesting information in a struct and then printing it out. This
>> works without any problems. I now would like to filter "duplicate"
>> records. They aren't really duplicate, I can't just qsort as is and
>> eliminate the matching rows. I have number of fields that will be the
>> same, but some of the fields will differ and the timestamp may be off
>> by a second or two. I want to eliminate records that have the fields
>> that match where the difference between timestamps is less than n
>> seconds. Again, this is not a problem, I can get seconds since epoch
>> and compare, or just use difftime. My problem is this, I want to build
>> an array of structs, that will hold lines that match, sort them on the
>> date member and then eliminate based on matching timestamps.
>>
>> For example in the set:
>> foo,bar ...,a, Thu Jan 25 01:40:11 EST 2007
>> foo,bar ...,a, Thu Jan 25 01:45:35 EST 2007
>> foo,bar ...,a, Thu Jan 25 01:48:09 EST 2007
>> foo,bar ...,b, Thu Jan 25 01:40:12 EST 2007
>> foo,baz ..., Thu Jan 25 01:40:11 EST 2007
>>
>> I would like to read the first 4 lines into structs, store them in
>> array, sort them and then print them out, while eliminating the
>> "duplicate" line 4. I would not want to read the 5th line, yet, because
>> it is dissimilar to the first 4 - foo, baz instead of foo,bar. I am
>> ignoring the field that has a or b in it (one of the reasons a simple
>> sort will not work)
>>
>> My problem comes from not knowing where to create the array, what size
>> to allocate, and how to re-initialize it when I move to the next set to
>> sort.

>
>>From your requirements it appears that a linked list will be a better

> option than an array of structs. It's easy to keep it sorted as you add
> elements to the list or remove them.


A somewhat simpler data structure is an array pointers, each of which points
to a structure.

In this case, insert operations boil down to allocating memory for the new
structure, perhaps realloc()'ing the array of pointers (I'd arrange not to
do this every time), then moving a bunch of pointers down by one element and
putting the new pointer in.

It is likely that memmove()'ing a bunch of pointers is not more expensive
than traversing a linked list to find the right insert point.

However, on a modern machine and with less than 1E6 records, I doubt anyone
would notice a difference.

--
David T. Ashley ((E-Mail Removed))
http://www.e3ft.com (Consulting Home Page)
http://www.dtashley.com (Personal Home Page)
http://gpl.e3ft.com (GPL Publications and Projects)


 
Reply With Quote
 
CBFalconer
Guest
Posts: n/a
 
      01-25-2007
Cliff Martin wrote:
>
> OK, I am using the command line options -ansi and -pedantic. They
> tell me some things I'm doing wrong.
>
> I don't understand why this is wrong:
>
> record cr, lr;
>
> The compiler complains I'm mixing declarations and code.


Possibly because you are mixing declarations and code.

>
> The compiler does not catch the assigning stdin to fp, how can I
> enforce that?


The crystal ball is murky, but appears to be saying 'line 142'.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews


 
Reply With Quote
 
Dave Thompson
Guest
Posts: n/a
 
      02-06-2007
On 24 Jan 2007 23:52:01 -0800, "santosh" <(E-Mail Removed)>
wrote:

> Cliff Martin wrote:


> > #include <stdio.h>


> > FILE *fp;
> > fp = stdin;

>
> This is not guaranteed to work on all C implementations.
>

Yes it is. Well, all conforming ones except freestanding which don't
have stdio at all, but I don't think that's what you meant.

Copying an _actual_ FILE
FILE fobject = * stdin;
and using it is not guaranteed to work, although usually it does.

Assigning _to_ stdin,out,err is not guaranteed to work, and often
doesn't.

Copying any FILE* including stdin,out,err to another pointer and using
it (consistent with the open and any other uses) is legal, and
sometimes useful, although in this example (snipped) it was not useful
because it was subsequently unconditionally overwritten.

> > char buf[len+1];

>
> Nor this.
>

Correct there.

> > while(fgets(buf, len, fp)) {

>
> fgets() will store only len-1 characters into the buffer so sizeof(buf)
> would do.


Up to len-1 characters from the input including the \n if reached,
PLUS the terminating null byte, for a total of at most the size given,
so yes sizeof(buf) would do.

Nit: and the parentheses aren't needed for sizeof primaryexpr, nor
sizeof *ptr or sizeof ptr->field or sizeof obj.field, although they
are for some more complicated expressions and typenames.

- David.Thompson1 at worldnet.att.net
 
Reply With Quote
 
CBFalconer
Guest
Posts: n/a
 
      02-06-2007
Dave Thompson wrote:
>

.... snip ...
>
> Copying any FILE* including stdin,out,err to another pointer and
> using it (consistent with the open and any other uses) is legal,
> and sometimes useful, although in this example (snipped) it was
> not useful because it was subsequently unconditionally overwritten.


No it wasn't (unconditionally overwritten). Look again. That was
the whole point of the extract, since stdin is a text file, and
when overwritten it would be a binary file.

--
<http://www.cs.auckland.ac.nz/~pgut001/pubs/vista_cost.txt>
<http://www.securityfocus.com/columnists/423>

"A man who is right every time is not likely to do very much."
-- Francis Crick, co-discover of DNA
"There is nothing more amazing than stupidity in action."
-- Thomas Matthews

 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Re: How include a large array? Edward A. Falk C Programming 1 04-04-2013 08:07 PM
Array of structs instead of an array with pointers to structs? Paminu C Programming 5 10-11-2005 07:18 PM
length of an array in a struct in an array of structs in a struct in an array of structs Tuan Bui Perl Misc 14 07-29-2005 02:39 PM
const structs in other structs Chris Hauxwell C Programming 6 04-27-2004 07:03 PM
structs with fields that are structs Patricia Van Hise C Programming 5 04-05-2004 01:37 AM



Advertisments