How do linkers work?

 
 
jacob navia
 
      03-23-2008
OK, after the stack and the debuggers, let's look a little bit
more in depth into this almost ignored piece of the language,
the linker.

Obviously, the C standard doesn't mention this [1]. And we almost
never discuss it here.

Like many other things, this is an error because the linker
is an *essential* piece of the language. Without it, nothing
would ever work.

Separate compilation
--------------------
C supports the separate compilation of modules. Each module
is compiled into an independent object code file (.o in Unix,
or .obj under Microsoft) and those separate object files are
assembled into the executable by the "link editor" or linker
for short.
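
As a rough sketch of what separate compilation means in practice, the listing below shows two hypothetical files (counter.c and user.c, names invented for illustration) in a single listing. One only declares what the other defines, and when they are compiled separately it is the linker that connects the two.

```c
/* ---- counter.c (hypothetical file): defines the symbols ---- */
int shared_counter = 0;          /* definition: exported to the linker */

int bump(void)                   /* definition: exported to the linker */
{
    return ++shared_counter;
}

/* ---- user.c (hypothetical file): only declares them ---- */
extern int shared_counter;       /* declaration: resolved at link time */
int bump(void);                  /* declaration: resolved at link time */

int bump_twice(void)
{
    bump();
    return bump();               /* when compiled separately, the linker
                                    fills in the address of bump here */
}
```

The listing also compiles as one file, since a declaration may follow a definition. Split at the marked comment, each half would compile to its own object file.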

There are several standards for object file formats:
o The "ELF" format used in most Unix systems
o The COFF format used under 32-bit Windows
o The OMF format used by 16-bit DOS/Windows systems

and many others I do not know...

What matters for this discussion is what an object file
contains, seen from the language's viewpoint.

An object file contains:
o A symbol table of exported symbols
o Several "sections" of data.
o Relocation information


Sections
--------
The "sections" are logical parts of the program that are
assembled into the final executable. Basically there are three
kinds of sections:
1) The code section, which holds the binary opcodes for the
processor
2) The data section, which holds the initialized tables, strings,
and numbers contained in the module
3) The uninitialized data section, which is basically just
a size: XXX bytes should be reserved for uninitialized
variables

For example:

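#include <string.h>

int function(char *a)
{
    static int bss;    /* no initializer: goes in the uninitialized section */

    if (strcmp(a, "foobar"))    /* non-zero means the strings differ */
        return 42;
    else
        return 366554 + bss++;
}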

In the code section we would have:
o The prologue code
o The call, the if, and the return with its epilogue code

In the data section we would have the "foobar" array
of characters followed by a zero, plus the numbers 42 and
366554 if the processor doesn't support inlined integer
constants. If the processor DOES support inlined constant
values (the x86, for example), the two integer values
would go in the code section.

The non-initialized section would contain sizeof(int)
bytes to hold the integer called "bss".

Relocations
-----------
The symbol "strcmp" is not defined in the module, and its
address is not known at compile time. The object module
contains just a record to indicate to the linker:

From: compiler
To: linker

Dear Linker:
Please fill sizeof(void *) bytes at offset 4877 in the code section
with the address of the symbol "strcmp".

Thanks in advance

Your compiler

The relocations can be much more complicated than that, but basically,
all of them are just that.
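
The letter above can be sketched in C. This is only an illustration under simplifying assumptions: the struct reloc layout and the apply_reloc function are invented here and do not match any real object file format, which would also record a relocation type, an addend and so on.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified relocation record: "store the address of
   `symbol` at `offset` bytes into the code section". */
struct reloc {
    size_t      offset;   /* where in the section to patch */
    const char *symbol;   /* name the linker must resolve  */
};

/* Patch one relocation. Returns 0 on success, -1 if the patch would
   run past the end of the section. */
int apply_reloc(unsigned char *section, size_t section_size,
                const struct reloc *r, uintptr_t symbol_address)
{
    if (r->offset > section_size ||
        section_size - r->offset < sizeof symbol_address)
        return -1;
    /* memcpy avoids alignment problems at arbitrary offsets */
    memcpy(section + r->offset, &symbol_address, sizeof symbol_address);
    return 0;
}
```

The linker would call something like this once per record, after it has decided where "strcmp" finally lives.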

The symbol table
----------------
The object module defines some symbols, and imports some symbols
from other modules. All those symbols are specified in the object
module symbol table. In some object code formats we find also
debug information records in the symbol table. In others,
the debug information is written into a separate section.
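
To make this concrete, here is a minimal sketch of what a symbol table for the example module might look like. The struct symbol layout, the enum and find_symbol are assumptions made up for illustration; real formats store considerably more per entry.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified symbol table entry. */
enum sym_kind { SYM_DEFINED, SYM_UNDEFINED };

struct symbol {
    const char   *name;
    enum sym_kind kind;     /* defined here, or imported from elsewhere */
    size_t        offset;   /* offset in its section; unused when imported */
};

/* What the table for the example module might contain. */
static const struct symbol example_symbols[] = {
    { "function", SYM_DEFINED,   0 },
    { "strcmp",   SYM_UNDEFINED, 0 },
};

/* Linear search by name; returns NULL when the name is absent. */
const struct symbol *find_symbol(const struct symbol *table, size_t count,
                                 const char *name)
{
    for (size_t i = 0; i < count; i++)
        if (strcmp(table[i].name, name) == 0)
            return &table[i];
    return NULL;
}
```

Note that the static variable "bss" from the example does not appear: it has no linkage, so other modules never need to see it.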

Libraries
---------
Static libraries are just a bunch of object code modules
stored in a single file for convenience. The linker treats
them much like a collection of separate object files.

----------------------------------------------------------------------

With all this information, the linker goes through all object files
noting which symbols are defined in which module, and which symbols are
required by one module and defined in another, until there are no
more object files or libraries. It then checks that all symbols are
defined (complaining if any are not) and builds the executable.
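
The pass described above could be sketched as follows. It is a toy model, not how any particular linker is written: struct module, is_defined and count_unresolved are names invented for this example, and a real linker would use a hash table rather than nested loops.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified view of one object module. */
struct module {
    const char *const *defined;  size_t n_defined;  /* symbols it exports */
    const char *const *needed;   size_t n_needed;   /* symbols it imports */
};

/* Is `name` defined by any of the modules? */
static int is_defined(const struct module *mods, size_t n_mods,
                      const char *name)
{
    for (size_t m = 0; m < n_mods; m++)
        for (size_t i = 0; i < mods[m].n_defined; i++)
            if (strcmp(mods[m].defined[i], name) == 0)
                return 1;
    return 0;
}

/* Count the imported symbols that no module defines.
   A real linker would report each one and refuse to build. */
size_t count_unresolved(const struct module *mods, size_t n_mods)
{
    size_t missing = 0;
    for (size_t m = 0; m < n_mods; m++)
        for (size_t i = 0; i < mods[m].n_needed; i++)
            if (!is_defined(mods, n_mods, mods[m].needed[i]))
                missing++;
    return missing;
}
```

If count_unresolved returns zero, every reference has a definition somewhere and the relocations can be applied.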

Linkers can be very complex beasts, like, for instance, the GNU "ld"
linker. This is a linker that features:
o A "link editor language" (linker scripts) that allows you to
change the workings of the linker and describe your own
executable format...
o An apparent "machine independence" (what this means in
a linker is not obvious to me) that allows it to link
object modules from different formats...
o The "BFD" (Binary File Descriptor) library, a GNU machine
independent layer over many object file formats.

Other linkers, like lcc-win's for instance, are completely stupid beasts
that can only link the format generated by lcc-win and nothing else.
Obviously, the only thing *you* care about in a linker is how fast it is,
so in this sense lcc-win is a better choice: it is quite fast. But
you pay the price: it can only link lcc-win's code...

In the next installment we will go in detail into the dark corners of
the linkers, specifically, the problems with symbol collision.
-------------
[1] The only mention of the linker in the standard comes where it
speaks about extended characters in identifiers:

<quote>

On systems in which linkers cannot accept extended characters, an
encoding of the universal character name may be used in forming valid
external identifiers. For example, some otherwise unused
character or sequence of characters may be used to encode the \u in a
universal character name. Extended characters may produce a long
external identifier.

<end quote>

Nowhere is the "linker" defined.
--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
 
Richard Heathfield
 
      03-23-2008
jacob navia said:

> OK, after the stack and the debuggers, let's look a little bit
> more in depth into this almost ignored piece of the language,
> the linker.
>
> Obviously, the C standard doesn't mention this [1].


See 2.1.1.2(8) of C89 or 5.1.1.2(8) of C99.

<snip>

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
 
jacob navia
 
      03-24-2008
Richard Heathfield wrote:
> jacob navia said:
>
>> OK, after the stack and the debuggers, let's look a little bit
>> more in depth into this almost ignored piece of the language,
>> the linker.
>>
>> Obviously, the C standard doesn't mention this [1].
>
> See 2.1.1.2(8) of C89 or 5.1.1.2(8) of C99.
>
> <snip>
>


5.1.1.2(8) says:
<quote>
All external object and function references are resolved. Library
components are linked to satisfy external references to functions and
objects not defined in the current translation. All such translator
output is collected into a program image which contains information
needed for execution in its execution environment.
<end quote>

Not really a specification of a linker!

It does NOT say:

1) What to do when several modules define the same symbol
2) What to do when a symbol has contradictory definitions
in different modules.

We will see what consequences those omissions have in the next
installment.



--
jacob navia
jacob at jacob point remcomp point fr
logiciels/informatique
http://www.cs.virginia.edu/~lcc-win32
 
Richard Heathfield
 
      03-24-2008
jacob navia said:

> Richard Heathfield wrote:
>> jacob navia said:
>>
>>> OK, after the stack and the debuggers, let's look a little bit
>>> more in depth into this almost ignored piece of the language,
>>> the linker.
>>>
>>> Obviously, the C standard doesn't mention this [1].
>>
>> See 2.1.1.2(8) of C89 or 5.1.1.2(8) of C99.
>>

<snip>

> Not really a specification of a linker!


It isn't intended to be.

> It does NOT say:
>
> 1) What to do when several modules define the same symbol


The behaviour is undefined (see 6.2.2).

> 2) What to do when a symbol has contradictory definitions
> in different modules.


The behaviour is undefined (see 6.7(4)).

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
 
Keith Thompson
 
      03-24-2008
jacob navia writes:
> OK, after the stack and the debuggers, let's look a little bit
> more in depth into this almost ignored piece of the language,
> the linker.
>
> Obviously, the C standard doesn't mention this [1]. And we almost
> never discuss it here.
>
> Like many other things, this is an error because the linker
> is an *essential* piece of the language. Without it, nothing
> would ever work.

[...]

Consider that all compiled languages depend on linkers just as much as
C does. If a post would be just as relevant to comp.lang.whatever as
it is to comp.lang.c, why post it to comp.lang.c rather than
comp.lang.c++ or comp.lang.fortran?

--
Keith Thompson (The_Other_Keith)
Nokia
"We must do something. This is something. Therefore, we must do this."
-- Antony Jay and Jonathan Lynn, "Yes Minister"
 
Tony Giles
 
      03-24-2008
Keith Thompson wrote:

> [...]
>
> Consider that all compiled languages depend on linkers just as much as
> C does. If a post would be just as relevant to comp.lang.whatever as
> it is to comp.lang.c, why post it to comp.lang.c rather than
> comp.lang.c++ or comp.lang.fortran?
>


Sorry Keith, I don't really understand what you are saying here. Are you
saying that subjects like this should be cross posted to
comp.lang.whatever (and risk the wrath of the usual suspects) or not be
posted at all?

I am one year into programming now and hope to make a living out of it
someday. Personally speaking I have zero experience with other languages
outside of C and hence have found Jacob's recent posts about stacks,
debugging and linking very informative - if he had posted them to
comp.lang.fortran or wherever I would have missed them.

I have learned one hell of a lot here recently just by browsing (when I
get completely stuck I'm sure I'll be asking questions) but have been
getting increasingly ****ed off with the "off topic" brigade and the
(seeming) mob of regulars who just bitch about one another.

To all: give a thought for us learners. I for one am here for an
education - in the art of C and programming in general. For me, topics
as mentioned before are very much on topic. Maybe I should trawl through
comp.lang.endless but I'd rather the one stop on what I am learning!
 
Flash Gordon
 
      03-24-2008
Tony Giles wrote, On 24/03/08 07:23:
> Keith Thompson wrote:
>
>> [...]
>>
>> Consider that all compiled languages depend on linkers just as much as
>> C does. If a post would be just as relevant to comp.lang.whatever as
>> it is to comp.lang.c, why post it to comp.lang.c rather than
>> comp.lang.c++ or comp.lang.fortran?
>
> Sorry Keith, I don't really understand what you are saying here. Are you
> saying that subjects like this should be cross posted to
> comp.lang.whatever (and risk the wrath of the usual suspects) or not be
> posted at all?


Keith is pointing out that it is not really topical here. There is
comp.lang.programming for general programming, comp.lang.misc (which I've
not looked into), comp.compilers, OS-specific groups, etc.

> I am one year into programming now and hope to make a living out of it
> someday. Personally speaking I have zero experience with other languages
> outside of C and hence have found Jacob's recent posts about stacks,
> debugging and linking very informative - if he had posted them to
> comp.lang.fortran or wherever I would have missed them.


That does not make them topical here. See above for suggestions of other
groups where they could be topical.

> I have learned one hell of a lot here recently just by browsing (when I
> get completely stuck I'm sure I'll be asking questions) but have been
> getting increasingly ****ed off with the "off topic" brigade and the
> (seeming) mob of regulars who just bitch about one another.


So you are ****ed off with the people who point out when something is
off topic but *not* with the people who come back and say (often with
insults) that it is on topic?

> To all: give a thought for us learners. I for one am here for an
> education - in the art of C and programming in general. For me, topics
> as mentioned before are very much on topic. Maybe I should trawl through
> comp.lang.endless but I'd rather the one stop on what I am learning!


You will be looking for a job at some point; should job applications be
topical here? If you want to learn about other topics you have to look
in other places, just as you need multiple text books. Where is the
problem with that, given that the other places *do* exist?
--
Flash Gordon
 
Ben Bacarisse
 
      03-24-2008
Tony Giles writes:

> Keith Thompson wrote:
>
>> [...]
>>
>> Consider that all compiled languages depend on linkers just as much as
>> C does. If a post would be just as relevant to comp.lang.whatever as
>> it is to comp.lang.c, why post it to comp.lang.c rather than
>> comp.lang.c++ or comp.lang.fortran?
>
> Sorry Keith, I don't really understand what you are saying here. Are
> you saying that subjects like this should be cross posted to
> comp.lang.whatever (and risk the wrath of the usual suspects) or not
> be posted at all?


I think he is proposing a test for topicality: if it is topical only
here post it here, otherwise post it to a single more suitable group
(where one exists). Cross-posting should be a last resort and should be
done to the very smallest possible set of groups.

> I am one year into programming now and hope to make a living out of it
> someday. Personally speaking I have zero experience with other
> languages outside of C and hence have found Jacob's recent posts about
> stacks, debugging and linking very informative - if he had posted them
> to comp.lang.fortran or wherever I would have missed them.


They belong in comp.programming -- a group that would have benefited
from a lively discussion of various approaches to debugging.

> For me, topics
> as mentioned before are very much on topic. Maybe I should trawl
> through comp.lang.endless but I'd rather the one stop on what I am
> learning!


You should probably add comp.programming. comp.lang.c can't include
everything that you should be learning about.

--
Ben.
 
Richard Heathfield
 
      03-24-2008
Tony Giles said:

<snip>

> I am one year into programming now and hope to make a living out of it
> someday. Personally speaking I have zero experience with other languages
> outside of C and hence have found Jacob's recent posts about stacks,
> debugging and linking very informative - if he had posted them to
> comp.lang.fortran or wherever I would have missed them.


How can you tell whether the information he has presented is authoritative?
If such articles are posted in the kind of group where they are topical,
they stand a much higher chance of getting proper peer review.

<snip>

> To all: give a thought for us learners.


We do. We assume you come here to learn more about C, and as a group we
provide an astoundingly authoritative resource - on C. Not on debuggers,
linkers, stacks, and the like. If you want peer-reviewed, authoritative
articles on those subjects, you'd be better off finding a group where
debuggers, linkers, and stacks are topical, because that's where you're
most likely to find the experts.

--
Richard Heathfield <http://www.cpax.org.uk>
Email: -http://www. +rjh@
Google users: <http://www.cpax.org.uk/prg/writings/googly.php>
"Usenet is a strange place" - dmr 29 July 1999
 
Eric Sosman
 
      03-24-2008
jacob navia wrote:
> OK, after the stack and the debuggers, let's look a little bit
> more in depth into this almost ignored piece of the language,
> the linker.
>
> Obviously, the C standard doesn't mention this [1]. And we almost
> never discuss it here.
> [...]


Your copy of the Standard must be incomplete. It appears
you haven't seen sections 5.1.1.1, 5.1.1.2, 6.2.2, and 6.9.

--
Eric Sosman