16-byte stack alignment - is it really necessary?

 
 
Jeremy Gordon
 
      08-01-2005
I have a query about how important it really is to maintain RSP on 16-byte
alignment. It seems to be a "given" that this will aid performance and the
X64 documentation is very insistent about this.

However the AMD documentation says:-
"Stack Alignment. Control-transfer performance can degrade significantly
when the stack pointer is not aligned properly. Stack pointers should be word
aligned in 16-bit segments, doubleword aligned in 32-bit segments, and
quadword aligned in 64-bit mode."
Section 3.73 Chapter 3 "General Purpose Programming" AMD64.
In other words the processor manufacturer recommends only 8-byte alignment
for the stack.

The reason I ask is that I am converting my assembler (GoAsm) to work with
64-bit source code for applications running under Windows XP64. It will
certainly be far easier to keep the stack on 8-byte alignment, rather than
16-byte alignment.

Bearing in mind the processor manufacturer's recommendation, I cannot at
present understand the requirement to align on 16 bytes. Why is this
insisted upon? Is this a hangover from some previous thinking? Is it
something which may be reconsidered and eventually dropped? I understand no
exception will be generated by stack alignment on 8 bytes rather than
16 bytes, but it is said that performance may be affected. Why is this? I
will of course do my own speed trials, but at my current planning stage for
GoAsm, any insight into this would be very useful.

-----
Jeremy Gordon
The "Go" tools
http://www.GoDevTool.com
 
Darrell Gorter[MSFT]
      08-03-2005
Hello Jeremy,
This is required as part of the amd64 calling convention; it is not for
performance only.
In general, writing asm on x64 is quite different from writing it on x86.
Here is a document on MSDN that may be useful.
http://msdn.microsoft.com/library/de...us/kmarch/hh/kmarch/64bitAMD_6ec00b51-bf75-41bf-8635-caa8653c8bd9.xml.asp
Thanks,
Darrell Gorter[MSFT]

This posting is provided "AS IS" with no warranties, and confers no rights
Jeremy Gordon
 
      08-04-2005
Thanks, Darrell, for your answer.
I was aware of the article you mentioned; in fact it was that article which
first alerted me to the supposed requirement that the stack is always aligned
on a 16-byte boundary except within prolog and epilog code in a stack frame
(when, of course, it could not always be so aligned).

The particular parts of the article which worry me are:-
"Dynamic Parameter Stack Area Construction"
"The stack is always 16-byte aligned when a call instruction is executed."
And in the article:-
"Function types"
"Leaf function- A function that does not require a stack frame. A leaf
function does not require a function table entry. It cannot call any
functions, allocate space, or save any nonvolatile registers. It can leave
the stack unaligned while it executes."

Here it is said that a leaf function cannot call another function. This is
enormously restrictive for compilers, and seems to be a consequence of the
requirement that the stack should always be aligned on a 16-byte boundary at
the time of a call. Of course the compiler could ensure that the alignment
is achieved just prior to the call, and restored afterwards, but this would
add bloat (extra opcodes). I need to adjust the output of my development
tools to ensure that the requirements of x64 are always met, but only in so
far as each requirement is strictly necessary. This is why I need to
understand the reason for (a) leaf functions being prohibited from calling
another leaf function, which seems to be related to (b) the requirement that
the stack be 16-byte aligned when calling even a leaf function.
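
To make the alignment rule concrete, here is a minimal sketch in Python of the arithmetic a code generator might use. It is not taken from the MSDN article: the function name `frame_allocation` is hypothetical, and the 32-byte register-parameter area is assumed from the Windows x64 calling convention. On entry to a function RSP is 8 bytes off a 16-byte boundary, because the CALL has just pushed the return address, so the prolog has to allocate an amount that restores alignment before any further call.

```python
RETURN_ADDRESS_BYTES = 8   # pushed by the CALL that entered the function
SHADOW_SPACE_BYTES = 32    # assumed register-parameter area (Windows x64 convention)


def frame_allocation(pushed_registers: int, local_bytes: int) -> int:
    """Smallest SUB RSP amount that leaves RSP 16-byte aligned for later calls."""
    # Bytes already on the stack since the caller's aligned RSP.
    used = RETURN_ADDRESS_BYTES + 8 * pushed_registers
    wanted = SHADOW_SPACE_BYTES + local_bytes
    # Pad so that used + allocation is a multiple of 16.
    padding = -(used + wanted) % 16
    return wanted + padding


for pushes, local_size in [(0, 0), (1, 0), (0, 8), (2, 24)]:
    alloc = frame_allocation(pushes, local_size)
    print(f"{pushes} pushes, {local_size} bytes of locals: sub rsp, {alloc}")
```

With no pushes and no locals this gives the familiar `sub rsp, 40`; with one pushed register the 8 bytes of padding are no longer needed and the allocation drops to 32.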

Bearing in mind that the AMD64 literature requires stack alignment only on
an 8-byte boundary which is what one would expect, I believe the requirement
in x64 for stack alignment on a 16-byte boundary is related to exception
handling. I can see that in frame functions, which have prolog and epilog
code, it may well be the case that the exception handler requires all calls
within the frame function to be carried out when the stack pointer is aligned
on a 16-byte boundary. This may be necessary in order to achieve an orderly
unwind (going back through the calls until the correct unwind data is found,
and then continuing the unwind back beyond that).
However, I do not understand at present why this is necessary for leaf
functions. Generally a leaf function will not have its own unwind data. It
will not have a function table entry. It will therefore be invisible to the
exception handler. Instead, the exception handler will identify the function
which called the leaf function. To my mind, therefore, if there is a series
of leaf calls (i.e. in which leaf functions call other leaf functions) the
exception handler will ignore them all and automatically start the unwind at
the frame function which called the very first leaf function in the series.
This is why I am sceptical about the documented prohibition against leaf
functions calling other leaf functions and the requirement of 16-byte
alignment of the stack when a leaf function is called.
However, perhaps this requirement is not to do with exception handling. If
so, what does the requirement relate to?
Any help you can give me would be appreciated.
--
Jeremy Gordon
The "Go" tools

Darrell Gorter[MSFT]
      08-05-2005
Hello Jeremy,
A function cannot call another function without first reserving stack for
parameters. A leaf function cannot do this because the unwinder assumes
the caller's return address is at rsp.
The unwinder needs to be able to find the first function with unwind info.
If you are in a leaf function then the return address into the caller is at rsp.
If you execute a call instruction then a return address inside the calling
leaf function is pushed onto the stack and rsp is changed, so the address at
rsp is no longer that of the first non-leaf function (i.e. one that has
unwind info).
You can tail-call from leaf functions to other leaf functions (or non-leaf
functions) through a jump instruction, as that doesn't affect the stack.
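
A toy model of this in Python, purely illustrative: `unwind_leaf` is a hypothetical stand-in for what the unwinder does when it finds no function table entry for the current address, and all addresses are made up. It shows only the first point above, that once a function without unwind info has moved rsp to reserve parameter space, the value found at rsp is no longer the return address.

```python
def unwind_leaf(stack: dict, rsp: int) -> tuple:
    """Unwind a function that has no unwind info: pop the value at [rsp]."""
    return stack[rsp], rsp + 8


# Hypothetical addresses, chosen only for illustration.
RETURN_INTO_FRAME_FUNCTION = 0x401000
stack = {0x1000: RETURN_INTO_FRAME_FUNCTION}  # stack contents on entry to the leaf
rsp = 0x1000

# Untouched leaf: [rsp] really is the return address into the caller.
print(hex(unwind_leaf(stack, rsp)[0]))

# The same function after "sub rsp, 40" to reserve parameter space.
rsp_after_alloc = rsp - 40
stack[rsp_after_alloc] = 0xDEADBEEF  # whatever happens to occupy the new slot
print(hex(unwind_leaf(stack, rsp_after_alloc)[0]))
```

The second lookup returns the stray value rather than the return address, which is why a function that adjusts rsp needs a function table entry describing the adjustment.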

Thanks,
Darrell Gorter[MSFT]

Jeremy Gordon
 
      08-05-2005
Thanks Darrell, I understand this. I hope to find a way to set up the
exception records to avoid difficulty arising from this in leaf functions
which call other leaf functions.
A colleague also suggested to me, from recent documentation he had read, that
16-byte stack alignment ensures that XMM data can be moved on and off the
stack without causing an exception. For example, the instruction MOVDQA
requires its memory operand to be 16-byte aligned. This may be another
reason for the requirement for 16-byte stack alignment. Again, I believe this
can be dealt with by a compiler.
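
As a rough illustration of the MOVDQA point, here is a small Python sketch. The helper names and the stack pointer values are hypothetical. It checks whether a 16-byte slot at a fixed offset from rsp sits on a 16-byte boundary, which is what an aligned XMM store or load needs in order not to fault.

```python
def is_aligned(address: int, boundary: int = 16) -> bool:
    """True if address is a multiple of boundary."""
    return address % boundary == 0


def xmm_slot_ok(rsp: int, offset: int) -> bool:
    """True if an aligned 16-byte access at [rsp + offset] would be permitted."""
    return is_aligned(rsp + offset)


aligned_rsp = 0x7FF0      # hypothetical 16-byte aligned stack pointer
misaligned_rsp = 0x7FF8   # the same stack, 8 bytes out of alignment

print(xmm_slot_ok(aligned_rsp, 32))
print(xmm_slot_ok(misaligned_rsp, 32))
```

A slot offset that is a multiple of 16 is only safe when rsp itself is 16-byte aligned, so a compiler that knows rsp is aligned can choose XMM save slots at assembly time without a run-time check.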
Thanks very much for your help.
--
Jeremy Gordon
The "Go" tools

Jeremy Gordon
 
      08-20-2005
I did experiment by intentionally putting the stack out of alignment by 8
bytes, and at least within a window procedure, DefWindowProc objected. The
debugger showed an exception at a MOVAPS instruction within system code.

However, I tried the same thing with the latest version of Windows x64,
5.2.3790 Service Pack 1, and there is no MOVAPS exception any more.

I suspect that the system has been made more robust and now itself ensures
that the stack is properly aligned. If so, then this makes it much easier
for a compiler to generate the correct code, because it would not have to
ensure that the stack is always 16-byte aligned.

Has anyone found any APIs sensitive to stack misalignment under the latest
version of x64? If I find any I shall post a report here.
--
Jeremy Gordon
The "Go" tools


"Jeremy Gordon" wrote:

> Thanks Darrell, I understand this. I hope to find a way to set up the
> exception records to avoid difficulty arising from this in leaf functions
> which call other leaf functions.
> A colleague also suggested to me from recent documentation he had read, that
> 16-byte stack alignment ensures that XMM data can be moved on and off the
> stack without causing an exception. For example the instruction MOVDQA
> requires its data to be 16-byte aligned. This may be another reason for the
> requirement for 16-byte stack alignment. Again I believe this can be dealt
> with by a compiler.
> Thanks very much for your help.
> --
> Jeremy Gordon
> The "Go" tools
>
>
> ""Darrell Gorter[MSFT]"" wrote:
>
> > Hello Jeremy,
> > A function can not call another function without first reserving stack for
> > parameters. A leaf function can not do this because, the unwinder assumes
> > the caller address is at rsp.
> > The unwinder needs to be able to find the first function with unwind info.
> > If you are in a leaf function then the address of the caller is at rsp.
> > If you execute a call instruction then the address of the callee leaf
> > function is pushed onto the stack and rsp is changed, thus the address at
> > rsp no longer is that of the first non-leaf function (i.e. has unwind info).
> > You can tailcall from leaf functions to other leaf functions (or non-leaf
> > functions) through a jump instruction as that doesn't affect the stack.
> >
> > Thanks,
> > Darrell Gorter[MSFT]
> >
> > This posting is provided "AS IS" with no warranties, and confers no rights
> > --------------------
> > <From: Jeremy Gordon
> > <Subject: RE: 16-byte stack alignment - is it really necessary?
> > <Date: Thu, 4 Aug 2005 01:02:02 -0700
> > <Newsgroups: microsoft.public.windows.64bit.general
> > <
> > <Thanks, Darrell, for your answer.
> > <I was aware of the article you mentioned; in fact it was that article which
> > <first alerted me to the supposed requirement that the stack is always aligned
> > <on a 16-byte boundary except within prolog and epilog code in a stack frame
> > <(when, of course, it could not always be so aligned).
> > <
> > <The particular parts of the article which worry me are:-
> > <"Dynamic Parameter Stack Area Construction"
> > <"The stack is always 16-byte aligned when a call instruction is executed."
> > <And in the article:-
> > <"Function types"
> > <"Leaf function- A function that does not require a stack frame. A leaf
> > <function does not require a function table entry. It cannot call any
> > <functions, allocate space, or save any nonvolatile registers. It can leave
> > <the stack unaligned while it executes."
> > <
> > <Here it is said that a leaf function cannot call another function. This is
> > <enormously prohibitive for compilers, and seems to be a consequence of the
> > <requirement that the stack should always be aligned on a 16-byte boundary at
> > <the time of a call. Of course the compiler could ensure that the alignment
> > <is achieved just prior to the call, and restored afterwards, but this would
> > <add bloat (extra opcodes). I need to adjust the output of my development
> > <tools to ensure that the requirements of x64 are always met but only in so
> > <far as that requirement is strictly necessary. This is why I need to
> > <understand the reason for (a) leaf functions being prohibited from calling
> > <another leaf function, which seems to be related to (b) the requirement that
> > <the stack be 16-byte aligned when calling even a leaf function.
> > <
> > <Bearing in mind that the AMD64 literature requires stack alignment only on
> > <an 8-byte boundary which is what one would expect, I believe the requirement
> > <in x64 for stack alignment on a 16-byte boundary is related to exception
> > <handling. I can see that in frame functions, which have prolog and epilog
> > <code, it may well be the case that the exception handler requires all calls
> > <within the frame function to be carried out when the stack pointer is aligned
> > <on a 16-byte boundary. This may be necessary in order to achieve an orderly
> > <unwind (going back through the calls until the correct unwind data is found,
> > <and then continuing the unwind back beyond that).
> > <However, I do not understand at present why this is necessary for leaf
> > <functions. Generally a leaf function will not have its own unwind data. It
> > <will not have a function table entry. It will therefore be invisible to the
> > <exception handler. Instead, the exception handler will identify the function
> > <which called the leaf function. To my mind, therefore, if there are a series
> > <of leaf calls (ie. in which leaf functions call other leaf functions) the
> > <exception handler will ignore them all and automatically start the unwind at
> > <the frame function which called the very first leaf function in the series.
> > <This is why I am sceptical about the documented prohibition against leaf
> > <functions calling other leaf functions and the requirement of 16-byte
> > <alignment of the stack when a leaf function is called.
> > <However, perhaps this requirement is not to do with exception handling. If
> > <so, what is the requirement to do with?
> > <Any help you can give me would be appreciated.
> > <--
> > <Jeremy Gordon
> > <The "Go" tools
> > <
> > <
> > <""Darrell Gorter[MSFT]"" wrote:
> > <
> > <> Hello Jeremy,
> > <> This is required as part of the amd64 calling convention, it is not for
> > <> performance only.
> > <> In general writing asm on x64 is quite different than writing it on x86.
> > <> Here is a document on MSDN that may be useful.
> > <>
> > <> http://msdn.microsoft.com/library/de...us/kmarch/hh/kmarch/64bitAMD_6ec00b51-bf75-41bf-8635-caa8653c8bd9.xml.asp
> > <> Thanks,
> > <> Darrell Gorter[MSFT]
> > <>
> > <> This posting is provided "AS IS" with no warranties, and confers no rights
> > <> --------------------
> > <> <From: Jeremy Gordon
> > <> <Subject: 16-byte stack alignment - is it really necessary?
> > <> <Date: Mon, 1 Aug 2005 14:57:03 -0700
> > <> <Newsgroups: microsoft.public.windows.64bit.general
> > <> <
> > <> <I have a query about how important it really is to maintain RSP on 16-byte
> > <> <alignment. It seems to be a "given" that this will aid performance and the
> > <> <X64 documentation is very insistent about this.
> > <> <
> > <> <However the AMD documentation says:-
> > <> <"Stack Alignment. Control-transfer performance can degrade significantly
> > <> <when the stack pointer is not aligned properly. Stack pointers should be word
> > <> <aligned in 16-bit segments, doubleword aligned in 32-bit segments, and
> > <> <quadword aligned in 64-bit mode."
> > <> <Section 3.73 Chapter 3 "General Purpose Programming" AMD64.
> > <> <In other words the processor manufacturer recommends only 8-byte alignment
> > <> <for the stack.
> > <> <
> > <> <The reason I ask is that I am converting my assembler (GoAsm) to work with
> > <> <64-bit source code for applications running under Windows XP64. It will
> > <> <certainly be far easier to keep the stack on 8-byte alignment, rather than
> > <> <16-byte alignment.
> > <> <
> > <> <Bearing in mind the processor manufacturer's requirement, I cannot at
> > <> <present understand the requirement to align on 16-bytes. Why is this
> > <> <insisted upon? Is this a hangover from some previous thinking? Is it
> > <> <something which may be reconsidered and eventually dropped? I understand no
> > <> <exception will be generated by stack alignment on 8-bytes rather than
> > <> <16-byte, but it is said that performance may be affected. Why is this? I
> > <> <will of course do my own speed trials, but at my current planning stage for
> > <> <GoAsm, any insight into this would be very useful.
> > <> <
> > <> <-----
> > <> <Jeremy Gordon
> > <> <The "Go" tools
> > <> <http://www.GoDevTool.com
> > <> <
> > <>
> > <>
> > <
> >
> >
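
The alignment rule quoted in this thread ("The stack is always 16-byte aligned when a call instruction is executed") can be worked through with plain arithmetic. What follows is only a minimal sketch in Python, not code from any of the posters: it models pointer arithmetic, not a real CPU or the Windows unwinder. The starting RSP value and all function names are hypothetical, and the 32-byte parameter area is an assumption taken from the Windows x64 calling convention rather than from the posts above.

```python
# Illustrative model of x64 stack-pointer arithmetic (not real machine code).
# The starting address and every name below are hypothetical.

RETURN_ADDRESS_SIZE = 8   # a CALL pushes an 8-byte return address
PARAM_AREA = 32           # assumed: space a caller reserves for the callee's register parameters


def is_aligned(rsp: int, boundary: int = 16) -> bool:
    """Return True if rsp is a multiple of boundary."""
    return rsp % boundary == 0


def after_call(rsp: int) -> int:
    """RSP on entry to the callee, after the return address has been pushed."""
    return rsp - RETURN_ADDRESS_SIZE


def frame_allocation(local_bytes: int) -> int:
    """Bytes a prolog could subtract so that RSP is 16-byte aligned again.

    On entry RSP is 8 mod 16, so the allocation must itself be 8 mod 16.
    """
    total = local_bytes + PARAM_AREA + RETURN_ADDRESS_SIZE
    rounded = (total + 15) // 16 * 16   # round up to a multiple of 16
    return rounded - RETURN_ADDRESS_SIZE


caller_rsp = 0x7FF0_0000_1000        # hypothetical, 16-byte aligned at the CALL
entry_rsp = after_call(caller_rsp)   # inside the callee, before any prolog
alloc = frame_allocation(local_bytes=8)
body_rsp = entry_rsp - alloc         # after a prolog equivalent to "sub rsp, alloc"

print("aligned at call site:", is_aligned(caller_rsp))
print("aligned on callee entry:", is_aligned(entry_rsp))
print("prolog allocation:", alloc, "aligned afterwards:", is_aligned(body_rsp))
```

In this model a leaf function, which never adjusts RSP, runs with the stack pointer at 8 mod 16 throughout. A call issued from it without any adjustment would therefore break the quoted rule, and a 16-byte XMM save relative to that RSP would not sit on the boundary that MOVDQA requires.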

 