Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > C Programming > Data relocation (pointer subtraction undefined except within array)

Reply
Thread Tools

Data relocation (pointer subtraction undefined except within array)

 
 
chrisbazley@bigfoot.com
Guest
Posts: n/a
 
      12-31-2009
Hello,

I wasted a lot of time yesterday writing some code which manages a
collection of strings within a single heap block allocated by a
function similar to 'malloc'. A separate array of structs includes
members which point to the start of each string. My intention is to
replace existing (working) code where each string is held in a
separate heap block, to reduce a high turnover of blocks.

When replacing existing strings with longer strings, or adding strings
to the collection, the heap block containing the strings must be
extended using a function like 'realloc'. However, 'realloc' may move
the base address of the block, which invalidates the pointers to the
start of each string. I don't really want to replace all my 'char *'
members with 'ptrdiff_t' (offset from start of heap block containing
strings) because of the loss of type-safety and additional cost of
accessing the strings.

I thought I could solve my problem by adding a relocation offset to
each pointer immediately after calling 'realloc', derived by
subtracting the old from the new address of the heap block. However,
turning to the appendix of my copy of Kernighan & Ritchie, I
discovered - to my dismay - that the result of subtracting one pointer
from another is undefined unless both point to objects within the same
array.

Presumably that means the following code would have undefined effects:

unsigned int i;
char *strings, *new_strings;
ptrdiff_t relocate;
struct
{
char *string;
}
objs[10];

/* Resize heap block containing strings */
new_strings = realloc(strings, new_size);
if (new_strings == NULL)
{
/* ...handle error... */
}

/* Relocate pointers to the start of each string */
relocate = new_strings - strings;
for (i = 0; i < sizeof(objs) / sizeof(objs[0]); i++)
{
objs[i].string += relocate;
}
strings = new_strings;

Can anyone think of a machine architecture where the above code would
not work? It seems likely to be a common idiom, even if not strictly
legal.

It occurs to me that I could circumvent the K&R restriction by
utilitising the old address of the heap block within my relocation
loop:

for (i = 0; i < sizeof(objs) / sizeof(objs[0]); i++)
{
objs[i].string = new_strings + (objs[i].string - strings);
}

However, given that 'strings' is no longer a valid pointer, is this
version any less reprehensible than my original code?

I look forward to your comments.

TIA,
--
Christopher Bazley
 
Reply With Quote
 
 
 
 
Ben Bacarisse
Guest
Posts: n/a
 
      12-31-2009
http://www.velocityreviews.com/forums/(E-Mail Removed) writes:
<snip>
> When replacing existing strings with longer strings, or adding strings
> to the collection, the heap block containing the strings must be
> extended using a function like 'realloc'. However, 'realloc' may move
> the base address of the block, which invalidates the pointers to the
> start of each string. I don't really want to replace all my 'char *'
> members with 'ptrdiff_t' (offset from start of heap block containing
> strings) because of the loss of type-safety and additional cost of
> accessing the strings.
>
> I thought I could solve my problem by adding a relocation offset to
> each pointer immediately after calling 'realloc', derived by
> subtracting the old from the new address of the heap block. However,
> turning to the appendix of my copy of Kernighan & Ritchie, I
> discovered - to my dismay - that the result of subtracting one pointer
> from another is undefined unless both point to objects within the same
> array.
>
> Presumably that means the following code would have undefined effects:
>
> unsigned int i;
> char *strings, *new_strings;
> ptrdiff_t relocate;
> struct
> {
> char *string;
> }
> objs[10];
>
> /* Resize heap block containing strings */
> new_strings = realloc(strings, new_size);
> if (new_strings == NULL)
> {
> /* ...handle error... */
> }
>
> /* Relocate pointers to the start of each string */
> relocate = new_strings - strings;
> for (i = 0; i < sizeof(objs) / sizeof(objs[0]); i++)
> {
> objs[i].string += relocate;
> }
> strings = new_strings;
>
> Can anyone think of a machine architecture where the above code would
> not work? It seems likely to be a common idiom, even if not strictly
> legal.


I can't think of a current one, no, but then I don't know many
architectures anymore. Old ones, yes. Then again, can you say for
sure that there won't be any in the future? [On a segmented
architecture, pointer arithmetic may be done on the offset alone,
provided every array lies wholly within a segment. Inter-array
arithmetic can be meaningless in such situations.]

> It occurs to me that I could circumvent the K&R restriction by
> utilitising the old address of the heap block within my relocation
> loop:
>
> for (i = 0; i < sizeof(objs) / sizeof(objs[0]); i++)
> {
> objs[i].string = new_strings + (objs[i].string - strings);
> }
>
> However, given that 'strings' is no longer a valid pointer, is this
> version any less reprehensible than my original code?


I think this is safer from a purely practical point of view. It is
still undefined, but I suspect it is more likely to work on systems
where the first would fail.

Another strategy is to scan the old block replacing the pointers with
offsets within the block. I.e. you rely on the implementation defined
conversion from an integer type (ptrdiff_t) to char *. If you keep
the difference positive, it is very likely to work everywhere.

--
Ben.
 
Reply With Quote
 
 
 
 
Seebs
Guest
Posts: n/a
 
      12-31-2009
On 2009-12-31, (E-Mail Removed) <(E-Mail Removed)> wrote:
> However,
> turning to the appendix of my copy of Kernighan & Ritchie, I
> discovered - to my dismay - that the result of subtracting one pointer
> from another is undefined unless both point to objects within the same
> array.


Yes. That's largely because of old-style DOS systems, though there may
come future systems with similar behavior.

> However, given that 'strings' is no longer a valid pointer, is this
> version any less reprehensible than my original code?


Interestingly:

There are conceivable (probably even real) implementations on which this will
work, and the other won't. There are conceivable (probably even real)
implementations on which the other will work, and this won't.

The two things you might run into are:

1. Systems in which loading an invalid pointer into an address register
can page fault even if you don't try to dereference it.
2. Systems in which pointers to different objects may be in different
memory regions such that subtraction doesn't work as expected.

Suggestion: If you can handle the overhead, then malloc-copy-adjust-free
rather than using realloc. In practice, if realloc is changing addresses,
it MUST be possible to malloc, copy, and then free. However, I am sure
that if you tried hard enough, you could find some implementation somewhere
on which there existed M and N such that allocating M bytes and reallocing
to N succeeded, but trying to allocate M and N bytes simultaneously would
fail.

I wouldn't worry about it. The canonical thing to do is indeed to do
the copy/adjust phase yourself in cases like this.

Or! You could use ptrdiff_t offsets internally, and provide an API which
yields a pointer for a given object. This could be as simple as
#define STR(x) (global_base + x->offset)

-s
--
Copyright 2009, all wrongs reversed. Peter Seebach / (E-Mail Removed)
http://www.seebs.net/log/ <-- lawsuits, religion, and funny pictures
http://en.wikipedia.org/wiki/Fair_Game_(Scientology) <-- get educated!
 
Reply With Quote
 
bartc
Guest
Posts: n/a
 
      12-31-2009

<(E-Mail Removed)> wrote in message
news:(E-Mail Removed)...
> Hello,
>
> I wasted a lot of time yesterday writing some code which manages a
> collection of strings within a single heap block allocated by a
> function similar to 'malloc'. A separate array of structs includes
> members which point to the start of each string. My intention is to
> replace existing (working) code where each string is held in a
> separate heap block, to reduce a high turnover of blocks.
>
> When replacing existing strings with longer strings, or adding strings
> to the collection, the heap block containing the strings must be
> extended using a function like 'realloc'. However, 'realloc' may move
> the base address of the block, which invalidates the pointers to the
> start of each string. I don't really want to replace all my 'char *'
> members with 'ptrdiff_t' (offset from start of heap block containing
> strings) because of the loss of type-safety and additional cost of
> accessing the strings.
>
> I thought I could solve my problem by adding a relocation offset to
> each pointer immediately after calling 'realloc', derived by
> subtracting the old from the new address of the heap block. However,
> turning to the appendix of my copy of Kernighan & Ritchie, I
> discovered - to my dismay - that the result of subtracting one pointer
> from another is undefined unless both point to objects within the same
> array.


If you allocate a single block with malloc(), that is effectively a single
array. It doesn't matter if you then choose to divide it into multiple
arrays.

--
Bartc


 
Reply With Quote
 
Eric Sosman
Guest
Posts: n/a
 
      12-31-2009
On 12/31/2009 8:52 AM, (E-Mail Removed) wrote:
> Hello,
>
> I wasted a lot of time yesterday writing some code which manages a
> collection of strings within a single heap block allocated by a
> function similar to 'malloc'. A separate array of structs includes
> members which point to the start of each string. My intention is to
> replace existing (working) code where each string is held in a
> separate heap block, to reduce a high turnover of blocks.
>[...]


Others have discussed the problems of after-the-fact
readjustment of pointers. May I suggest that you might not
need it?

If the goal is to reduce overhead by holding many small
strings in one big block, perhaps you could consider allocating
a second big block when the first is full, leaving the first in
place and still holding its unmoved strings. If the blocks are
"large" compared to the per-block overhead, you'd waste only a
small amount of space -- and you'd completely avoid the dangers
of pointer-fiddling.

Maybe the data structure you're using isn't amenable to
this -- but it might be worth a thought or two.

--
Eric Sosman
(E-Mail Removed)lid
 
Reply With Quote
 
Barry Schwarz
Guest
Posts: n/a
 
      12-31-2009
On 31 Dec 2009 16:19:06 GMT, Seebs <(E-Mail Removed)> wrote:

>On 2009-12-31, (E-Mail Removed) <(E-Mail Removed)> wrote:
>> However,
>> turning to the appendix of my copy of Kernighan & Ritchie, I
>> discovered - to my dismay - that the result of subtracting one pointer
>> from another is undefined unless both point to objects within the same
>> array.

>
>Yes. That's largely because of old-style DOS systems, though there may
>come future systems with similar behavior.
>
>> However, given that 'strings' is no longer a valid pointer, is this
>> version any less reprehensible than my original code?

>
>Interestingly:
>
>There are conceivable (probably even real) implementations on which this will
>work, and the other won't. There are conceivable (probably even real)
>implementations on which the other will work, and this won't.
>
>The two things you might run into are:
>
>1. Systems in which loading an invalid pointer into an address register
>can page fault even if you don't try to dereference it.
>2. Systems in which pointers to different objects may be in different
>memory regions such that subtraction doesn't work as expected.
>
>Suggestion: If you can handle the overhead, then malloc-copy-adjust-free
>rather than using realloc. In practice, if realloc is changing addresses,
>it MUST be possible to malloc, copy, and then free. However, I am sure
>that if you tried hard enough, you could find some implementation somewhere
>on which there existed M and N such that allocating M bytes and reallocing
>to N succeeded, but trying to allocate M and N bytes simultaneously would
>fail.


Consider the case where realloc attempts to extend the existing area
and only assigns a new area when the extension fails. For example, if
the "heap" is 10 bytes and the first 6 have been allocated,
reallocating to 7 is easy but allocating a new 7 is not possible.
There is a similar situation possible if the heap is fragmented.

--
Remove del for email
 
Reply With Quote
 
chrisbazley@bigfoot.com
Guest
Posts: n/a
 
      01-27-2010
On Dec 31 2009, 4:21*pm, "bartc" <(E-Mail Removed)> wrote:
> <(E-Mail Removed)> wrote in message
>
> news:(E-Mail Removed)...
> > Hello,

>
> > I wasted a lot of time yesterday writing some code which manages a
> > collection of strings within a single heap block allocated by a
> > function similar to 'malloc'. A separate array of structs includes
> > members which point to the start of each string. My intention is to
> > replace existing (working) code where each string is held in a
> > separate heap block, to reduce a high turnover of blocks.

>
> > When replacing existing strings with longer strings, or adding strings
> > to the collection, the heap block containing the strings must be
> > extended using a function like 'realloc'. However, 'realloc' may move
> > the base address of the block, which invalidates the pointers to the
> > start of each string. I don't really want to replace all my 'char *'
> > members with 'ptrdiff_t' (offset from start of heap block containing
> > strings) because of the loss of type-safety and additional cost of
> > accessing the strings.

>
> > I thought I could solve my problem by adding a relocation offset to
> > each pointer immediately after calling 'realloc', derived by
> > subtracting the old from the new address of the heap block. However,
> > turning to the appendix of my copy of Kernighan & Ritchie, I
> > discovered - to my dismay - that the result of subtracting one pointer
> > from another is undefined unless both point to objects within the same
> > array.

>
> If you allocate a single block with malloc(), that is effectively a single
> array. It doesn't matter if you then choose to divide it into multiple
> arrays.


I think you misunderstood my problem. I was not doing arithmetic
between pointers to objects within a single block; I was trying to
relocate pointers so that they point to the new location of objects
within a block that has been moved by 'realloc'.

--
Christopher Bazley
 
Reply With Quote
 
chrisbazley@bigfoot.com
Guest
Posts: n/a
 
      01-27-2010
On Dec 31 2009, 5:04*pm, Eric Sosman <(E-Mail Removed)>
wrote:
> On 12/31/2009 8:52 AM, (E-Mail Removed) wrote:
>
> > Hello,

>
> > I wasted a lot of time yesterday writing some code which manages a
> > collection of strings within a single heap block allocated by a
> > function similar to 'malloc'. A separate array of structs includes
> > members which point to the start of each string. My intention is to
> > replace existing (working) code where each string is held in a
> > separate heap block, to reduce a high turnover of blocks.
> >[...]

>
> * * *Others have discussed the problems of after-the-fact
> readjustment of pointers. *May I suggest that you might not
> need it?
>
> * * *If the goal is to reduce overhead by holding many small
> strings in one big block, perhaps you could consider allocating
> a second big block when the first is full, leaving the first in
> place and still holding its unmoved strings. *If the blocks are
> "large" compared to the per-block overhead, you'd waste only a
> small amount of space -- and you'd completely avoid the dangers
> of pointer-fiddling.
>
> * * *Maybe the data structure you're using isn't amenable to
> this -- but it might be worth a thought or two.


It sounds as though you are suggesting a linked list of large buffers,
each holding multiple strings. Unfortunately I don't think that would
suit my needs because strings are often replaced by other strings of
different length, or deleted entirely. I don't think I would be happy
with memory usage increasing with each modification of one of my
objects-with-associated-strings, even if all that memory could be
recovered (by walking the linked list) upon deletion of the object.

--
Christopher Bazley
 
Reply With Quote
 
chrisbazley@bigfoot.com
Guest
Posts: n/a
 
      01-27-2010
Thanks to everyone who posted replies (especially Ben, Peter and
Eric); you helped me to organise my thoughts and eventually arrive at
a solution. I feel like I owe an explanation of how I solved my
problem, so here it is...

On Dec 31 2009, 4:19*pm, Seebs <(E-Mail Removed)> wrote:
[snip]
> The two things you might run into are:
>
> 1. *Systems in which loading an invalid pointer into an address register
> can page fault even if you don't try to dereference it.


Ah, I wouldn't have thought of that! I am used to the ARM
architecture, where all registers (except the program counter) are
general purpose; the CPU can't tell whether a value is a pointer or
not until it is used as the base address in a load or store
instruction.

> 2. *Systems in which pointers to different objects may be in different
> memory regions such that subtraction doesn't work as expected.


Like the BBC Master Series microcomputer, where additional memory (so-
called 'sideways RAM') was paged into the memory map between 0x8000
and 0xC000 in 16 KB chunks.

> Suggestion: *If you can handle the overhead, then malloc-copy-adjust-free
> rather than using realloc. *In practice, if realloc is changing addresses,
> it MUST be possible to malloc, copy, and then free. *However, I am sure
> that if you tried hard enough, you could find some implementation somewhere
> on which there existed M and N such that allocating M bytes and reallocing
> to N succeeded, but trying to allocate M and N bytes simultaneously would
> fail.
>
> I wouldn't worry about it. *The canonical thing to do is indeed to do
> the copy/adjust phase yourself in cases like this.


This is closest to the solution that I eventually adopted.

My original idea of shuffling strings around the buffer when a string
is deleted or replaced turned into a nightmare (which is why my
initial implementation was to put each string in a separate heap
block). A single call through my API allows several elements of an
array to be modified in an atomic operation. The strings associated
with some elements may grow, and those associated with others may
shrink. It's basically a sliding heap, but more complex because of the
need for atomicity!

Before any strings could be moved, I needed to calculate the 'peak'
memory usage for the edit about to happen. This may be greater than
the final string buffer requirement, if short strings are replaced by
long strings before short strings are replaced by long strings. I also
felt that I should periodically minimise the string buffer size (e.g.
when in a quiescent state). Every time the string buffer was resized,
I would have needed to traverse the array, relocating every string
pointers.

In the end, I decided that the pain of maintaining my own sliding heap
within a single block just wasn't worth it. Without the guarantee of
contiguous address space from a fixed base address, resizing the
string buffer would likely require its entire content to be copied
(whether implicitly by 'realloc' or explicitly by 'memcpy'). Although
likely to be quicker than copying strings piecemeal using 'strcpy',
that is offset by the fact that some of the strings will be destined
for the scrapheap anyway.

Before editing the array of structs, I now sum the expected change in
string buffer requirement for each array element (e.g. -3 when
replacing "ladder" with "car") and add it to the current buffer size.
This allows me to calculate the required string buffer size without
traversing the whole array twice (unless every element is to be
modified).

I then allocate a new string buffer of exactly the right size and
traverse the array of structs from the beginning, copying each string
from the old buffer to the new buffer, or else replacing it. Rather
than relocating pointers, I get fresh pointers returned from 'strcpy'.
Lastly, I free the old string buffer. Thus, the buffer is never longer
than it needs to be and each edit operation requires only one
'malloc'/'free'.

My performance tests show speed changes of 27% to 27923% relative to
the previous strategy of allocating a separate heap block for each
string!

> Or! *You could use ptrdiff_t offsets internally, and provide an API which
> yields a pointer for a given object. *This could be as simple as
> * * * * #define STR(x) *(global_base + x->offset)


Unfortunately, I'm wedded to the idea of using the same struct type on
both sides of an API. The client passes a pointer to an array of this
type, each element of which may contain one or more string pointers.
Internally, my module makes a deep copy of each struct and bundles it
with some internal data.

Requiring the client to pass a string buffer pointer and specify
strings as ptrdiff_t offsets within that buffer would make using my
API more painful. Alternatively, I could store ptrdiff_t value(s) as
part of the internal data associated with each struct, but it would be
messy to keep that synchronised with the struct type definition.

Cheers,
--
Christopher Bazley
 
Reply With Quote
 
Christopher Bazley
Guest
Posts: n/a
 
      01-30-2010
In message <(E-Mail Removed)
ps.com>
(E-Mail Removed) wrote:

[snip]

I doubt anyone read this far, but there were a number of mistakes in
my original posting...

> Before any strings could be moved, I needed to calculate the 'peak'
> memory usage for the edit about to happen. This may be greater than
> the final string buffer requirement, if short strings are replaced by
> long strings before short strings are replaced by long strings.


Of course I meant "...if short strings are replaced by long strings
before long strings are replaced by short strings."

[snip]

> My performance tests show speed changes of 27% to 27923% relative to
> the previous strategy of allocating a separate heap block for each
> string!


I meant 127% to 28023% of the original speed. (One of my tests took
1/280.23 of the number of clock ticks that it previously did.)

--
Chris Bazley
Star Fighter 3000: http://starfighter.acornarcade.com/
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
try -> except -> else -> except? David House Python 2 07-06-2009 05:48 PM
who is simpler? try/except/else or try/except Fabio Z Tessitore Python 5 08-13-2007 12:52 AM
converting a nested try/except statement into try/except/else John Salerno Python 20 08-11-2006 02:48 PM
Embeded Perl, Dynaloader, relocation error: undefined symbol: Perl_Tstack_sp_ptr kyle.burton@gmail.com Perl Misc 0 06-24-2005 06:35 PM
relocation error: nimage_c.so: undefined symbol: str2cstr ara howard Ruby 1 10-28-2003 08:57 AM



Advertisments