On 9/3/2011 9:04 AM, pozz wrote:
> Suppose I have a structure:
>
> typedef struct {
> int version;
> DUMMY dummy;
> FOO foo;
> BAR bars[128];
> } CONFIG;
>
> stored in a "config.dat" file with fwrite(). At startup, the application
> open the file and read the configuration. I think it is a normal
> approach to store the configuration of an application in a non-volatile
> way.
> Of course, there are many file types for storing application
> configuration (INI, XML, CSV, database...), but in my case a pure binary
> file is sufficient and simple to use.
>
> Now suppose I have a new version of the software and a new version of
> the CONFIG structure:
>
> typedef struct {
> int version;
> DUMMY dummy;
> FOOOLD foo;
> BAR bars[128];
> } CONFIGOLD;
>
> typedef struct {
> int version;
> DUMMY dummy;
> FOO foo;
> NEWELEM newelem;
> BAR bars[256];
> } CONFIG;
>
> Note that some elements are inserted in the middle of the structure, the
> size of the array bars is changed and the definition of sub-structure
> (FOO in the example) is also changed.
>
> I want to write a function that opens the configuration file and, based
> on the version, read the configuration or make an upgrade of the
> configuration file.
>
> Normally I would proceed opening the file, reading the version and, in
> the case it is old, reading the old configuration structure, copying to
> the new configuration structure (making adaptation), deleting the old
> file and creating/writing the new structure to the file. Something
> similar to this (without error checking):
>
> int fd;
> CONFIG cfg;
> fd = open("config.dat", O_RDONLY);
It seems odd that you use C's fwrite() for output but then
resort to non-C methods to read it again. Why not fread()?
> read(fd, &cfg.version, sizeof(cfg.version));
> if (cfg.version == 2) {
> lseek(fd, 0, SEEK_SET);
> read(fd, &cfg, sizeof(cfg));
The seeking seems superfluous. Why not just keep on reading
from the current file position, taking into account the fact that
you've read the version number already?
    read(fd, (char*)&cfg + sizeof(cfg.version),
         sizeof(cfg) - sizeof(cfg.version));
> close(fd);
> } else if (cfg.version == 1) {
> CONFIGOLD cfgold;
> BAR bar_default = { ... };
> lseek(fd, 0, SEEK_SET);
> read(fd, &cfgold, sizeof(cfgold));
> /* Copy from old to new configuration, filling the new elements
> * with default values */
> cfg.version = 2;
> cfg.dummy = cfgold.dummy;
> <...adapt cfgold.foo to cfg.foo, it's application dependent...>
> cfg.newelem = newelem_default;
> memcpy(cfg.bars, cfgold.bars, 128 * sizeof(BAR));
> memcpy(&cfg.bars[128], &bar_default, 128 * sizeof(BAR));
> close(fd);
> remove("config.dat");
Aside: You may live to regret this. What if the system crashes
just after removing the old configuration file but before creating
the new one? It might be better to write the new data to "config.tmp"
and then remove("config.dat"), rename("config.tmp", "config.dat")
once you're sure the new data has been safely written. Better still:
    /* ... write "config.tmp" ... */
    remove("config.bak");
    rename("config.dat", "config.bak");
    rename("config.tmp", "config.dat");
... and still more elaborate schemes are possible.
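A minimal sketch of that write-then-rename sequence with error checking
added. The helper name replace_file() and its parameters are invented
for illustration, and it assumes the configuration is already available
as a block of bytes. Note that fclose() only hands the data to the
system; a platform that needs stronger durability guarantees would have
to add its own sync call.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: store `size` bytes from `data` in `path`,
 * going through the temporary file `tmp` and keeping the previous
 * contents of `path` in `bak`.  Returns 0 on success, -1 on failure. */
int replace_file(const char *path, const char *tmp, const char *bak,
                 const void *data, size_t size)
{
    FILE *fp;

    assert(path != NULL && tmp != NULL && bak != NULL && data != NULL);

    fp = fopen(tmp, "wb");
    if (fp == NULL)
        return -1;
    if (fwrite(data, 1, size, fp) != size) {
        fclose(fp);
        remove(tmp);
        return -1;
    }
    if (fclose(fp) != 0) {      /* buffered data could not be flushed */
        remove(tmp);
        return -1;
    }

    remove(bak);                /* fails harmlessly if there is no backup */
    rename(path, bak);          /* fails harmlessly on the very first save */
    if (rename(tmp, path) != 0)
        return -1;              /* old data, if any, is still in `bak` */
    return 0;
}
```

At every instant at least one of the three files holds a complete
configuration, so a startup routine can fall back to "config.bak" if
"config.dat" is missing.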
> fd = open("config.dat", O_WRONLY | O_CREAT);
> write(fd, &cfg, sizeof(cfg));
> close(fd);
> }
>
> This algorithm assumes to maintain both structures in RAM, but I
> couldn't on my embedded platform with a small amount of memory.
You need both only while the load-and-convert is in progress.
If `cfgold' is an `auto' variable it will go away when the function
returns; if you get its space from malloc() you can free() it when
conversion is finished.
But if even that is too much of a burden, you can perhaps read
the old configuration piecemeal instead of in one big gulp. It looks
like the DUMMY element can be read directly into `cfg' without using
extra storage. You haven't revealed the relationship between FOOOLD
and FOO, but you can surely perform the conversion with no more than
sizeof(FOOOLD) additional memory, perhaps less. If the expanded BAR
array just has the old BAR elements as a prefix you need no extra
space; if the conversion is more complicated you might need some.
But in all, you need at most max(sizeof(FOOOLD), 128*sizeof(BAR))
additional memory, possibly less.
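Here is a rough sketch of such a piecemeal conversion. It assumes the
old file is a raw image of CONFIGOLD, so each old field sits at
offsetof(CONFIGOLD, field) in the file. The member layouts of DUMMY,
FOOOLD, FOO, NEWELEM and BAR below are placeholders (the post does not
show them), as are the names read_at() and load_v1() and the
FOOOLD-to-FOO adaptation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Placeholder definitions; the real ones are application-specific. */
typedef struct { int a; } DUMMY;
typedef struct { short x; } FOOOLD;
typedef struct { long x; long y; } FOO;
typedef struct { int n; } NEWELEM;
typedef struct { char name[4]; int value; } BAR;

typedef struct { int version; DUMMY dummy; FOOOLD foo;
                 BAR bars[128]; } CONFIGOLD;
typedef struct { int version; DUMMY dummy; FOO foo; NEWELEM newelem;
                 BAR bars[256]; } CONFIG;

/* Read `size` bytes located at byte offset `off` of the file. */
static int read_at(FILE *fp, size_t off, void *dst, size_t size)
{
    if (fseek(fp, (long)off, SEEK_SET) != 0)
        return -1;
    return fread(dst, size, 1, fp) == 1 ? 0 : -1;
}

/* Convert a version-1 file image into *cfg without ever holding a
 * whole CONFIGOLD in memory.  Returns 0 on success, -1 on failure. */
int load_v1(FILE *fp, CONFIG *cfg,
            const NEWELEM *newelem_default, const BAR *bar_default)
{
    FOOOLD old_foo;             /* the only temporary storage needed */
    size_t i;

    assert(fp != NULL && cfg != NULL);

    /* DUMMY is unchanged: read it straight into its new home. */
    if (read_at(fp, offsetof(CONFIGOLD, dummy),
                &cfg->dummy, sizeof cfg->dummy) != 0)
        return -1;

    /* FOOOLD needs adapting; this particular mapping is made up. */
    if (read_at(fp, offsetof(CONFIGOLD, foo),
                &old_foo, sizeof old_foo) != 0)
        return -1;
    cfg->foo.x = old_foo.x;
    cfg->foo.y = 0;

    /* The old BARs form a prefix of the new array. */
    if (read_at(fp, offsetof(CONFIGOLD, bars),
                cfg->bars, 128 * sizeof cfg->bars[0]) != 0)
        return -1;
    for (i = 128; i < 256; i++)
        cfg->bars[i] = *bar_default;    /* fill each new slot */

    cfg->newelem = *newelem_default;
    cfg->version = 2;
    return 0;
}
```

With these placeholder types the extra memory is one FOOOLD, and the
new BAR slots are filled element by element rather than by copying
128*sizeof(BAR) bytes out of a single default object.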
> [...]
> The problem I couldn't solve is related to the reading/writing of each
> field. Indeed, between fields the compiler could add padding bytes, so
> reading/writing the entire structure (with padding) is completely
> different than reading/writing field by field (without padding).
You don't need an actual instance of the struct to determine
how many padding bytes, if any, are present. If you're writing
a struct S { T1 f1; T2 f2; ... } field-by-field using independent
sources for the f1,f2,... you can do something like this:
    T1 x_f1 = ...;
    T2 x_f2 = ...;
    ...
    size_t written = 0;  // bytes written thus far
    fwrite (&x_f1, sizeof x_f1, 1, stream);
    written += sizeof x_f1;
    while (written < offsetof(struct S, f2)) {
        putc('\0', stream);  // write padding bytes
        ++written;
    }
    fwrite (&x_f2, sizeof x_f2, 1, stream);
    written += sizeof x_f2;
    ...
A similar approach works for reading: Just use getc() to consume
and ignore padding bytes instead of putc() to create them.
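The reading side might look something like this. The two-member
struct S is a made-up example chosen because it is likely to contain
padding, and skip_to() and read_S() are invented names; the sketch
also consumes any trailing padding up to sizeof(struct S) so that the
stream is left at the end of the struct image.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct S { char f1; double f2; };   /* hypothetical example layout */

/* Consume and discard bytes until *consumed reaches `target`. */
static int skip_to(FILE *stream, size_t *consumed, size_t target)
{
    assert(*consumed <= target);
    while (*consumed < target) {
        if (getc(stream) == EOF)
            return -1;
        ++*consumed;
    }
    return 0;
}

/* Read one struct S image field by field into independent
 * destinations.  Returns 0 on success, -1 on failure. */
int read_S(FILE *stream, char *x_f1, double *x_f2)
{
    size_t consumed = 0;    /* bytes consumed thus far */

    if (fread(x_f1, sizeof *x_f1, 1, stream) != 1)
        return -1;
    consumed += sizeof *x_f1;

    /* Skip the padding, if any, between f1 and f2. */
    if (skip_to(stream, &consumed, offsetof(struct S, f2)) != 0)
        return -1;

    if (fread(x_f2, sizeof *x_f2, 1, stream) != 1)
        return -1;
    consumed += sizeof *x_f2;

    /* Skip trailing padding, if any. */
    return skip_to(stream, &consumed, sizeof(struct S));
}
```

This only works when the reader is built with the same compiler and
options as the writer, since offsetof() reports the layout of the
program doing the reading.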
> What do you think? Do you have other better suggestions?
Design a better configuration file format. Seriously. You
are in this bind and going to all this work *because* you've got
an on-disk image of an in-memory object, and because the in-memory
object's form is subject to incompatible changes. If you had
written the data field-by-field in the first place you would not
need to worry about padding bytes. If you had changed the `cfg'
solely by adding things to the end instead of roiling the middle,
you could read the prefix, check the version, and then maybe read
more. If you had adopted a more flexible format than image-of-RAM
you would have even more freedom to adapt and extend. In short,
your difficulties seem mostly self-inflicted.
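As one illustration of "field-by-field", an integer field can be
written as a fixed number of bytes in a fixed order, which makes the
file independent of padding, of sizeof(int), and of byte order. The
function names and the choice of four bytes, least significant first,
are assumptions made for this sketch and not anything your existing
file uses.

```c
#include <assert.h>
#include <stdio.h>

/* Write the low 32 bits of v as four bytes, least significant first.
 * Returns 0 on success, -1 on failure. */
int put_u32(FILE *fp, unsigned long v)
{
    int i;

    assert(v <= 0xFFFFFFFFUL);  /* value must fit in four bytes */
    for (i = 0; i < 4; i++) {
        if (putc((int)((v >> (8 * i)) & 0xFF), fp) == EOF)
            return -1;
    }
    return 0;
}

/* Read four bytes written by put_u32() back into *v.
 * Returns 0 on success, -1 on failure. */
int get_u32(FILE *fp, unsigned long *v)
{
    unsigned long r = 0;
    int i, c;

    for (i = 0; i < 4; i++) {
        c = getc(fp);
        if (c == EOF)
            return -1;
        r |= (unsigned long)c << (8 * i);
    }
    *v = r;
    return 0;
}
```

A version number stored this way can be read first, and the reader can
then decide which further fields to expect.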
--
Eric Sosman