Jeremy Gordon
Guest
Posts: n/a
I have a query about how important it really is to maintain RSP on 16-byte alignment. It seems to be a "given" that this will aid performance, and the x64 documentation is very insistent about this.

However, the AMD documentation says:
"Stack Alignment. Control-transfer performance can degrade significantly when the stack pointer is not aligned properly. Stack pointers should be word aligned in 16-bit segments, doubleword aligned in 32-bit segments, and quadword aligned in 64-bit mode."
Section 3.73 Chapter 3 "General Purpose Programming" AMD64.

In other words, the processor manufacturer recommends only 8-byte alignment for the stack.

The reason I ask is that I am converting my assembler (GoAsm) to work with 64-bit source code for applications running under Windows XP64. It will certainly be far easier to keep the stack on 8-byte alignment, rather than 16-byte alignment.

Bearing in mind the processor manufacturer's recommendation, I cannot at present understand the requirement to align on 16 bytes. Why is this insisted upon? Is this a hangover from some previous thinking? Is it something which may be reconsidered and eventually dropped? I understand no exception will be generated by stack alignment on 8 bytes rather than 16 bytes, but it is said that performance may be affected. Why is this? I will of course do my own speed trials, but at my current planning stage for GoAsm, any insight into this would be very useful.

-----
Jeremy Gordon
The "Go" tools
http://www.GoDevTool.com
Darrell Gorter[MSFT]
Guest
Posts: n/a
Hello Jeremy,

This is required as part of the AMD64 calling convention; it is not for performance only. In general, writing asm on x64 is quite different from writing it on x86. Here is a document on MSDN that may be useful:
http://msdn.microsoft.com/library/de...us/kmarch/hh/k march/64bitAMD_6ec00b51-bf75-41bf-8635-caa8653c8bd9.xml.asp

Thanks,
Darrell Gorter[MSFT]

This posting is provided "AS IS" with no warranties, and confers no rights
--------------------
<From: Jeremy Gordon
<Subject: 16-byte stack alignment - is it really necessary?
<Date: Mon, 1 Aug 2005 14:57:03 -0700
<Newsgroups: microsoft.public.windows.64bit.general
Jeremy Gordon
Guest
Posts: n/a
Thanks, Darrell, for your answer.

I was aware of the article you mentioned; in fact it was that article which first alerted me to the supposed requirement that the stack is always aligned on a 16-byte boundary except within prolog and epilog code in a stack frame (when, of course, it could not always be so aligned).

The particular parts of the article which worry me are:
"Dynamic Parameter Stack Area Construction"
"The stack is always 16-byte aligned when a call instruction is executed."
And in the article:
"Function types"
"Leaf function- A function that does not require a stack frame. A leaf function does not require a function table entry. It cannot call any functions, allocate space, or save any nonvolatile registers. It can leave the stack unaligned while it executes."

Here it is said that a leaf function cannot call another function. This is enormously prohibitive for compilers, and seems to be a consequence of the requirement that the stack should always be aligned on a 16-byte boundary at the time of a call. Of course the compiler could ensure that the alignment is achieved just prior to the call, and restored afterwards, but this would add bloat (extra opcodes). I need to adjust the output of my development tools to ensure that the requirements of x64 are always met, but only in so far as each requirement is strictly necessary. This is why I need to understand the reason for (a) leaf functions being prohibited from calling another leaf function, which seems to be related to (b) the requirement that the stack be 16-byte aligned when calling even a leaf function.

Bearing in mind that the AMD64 literature requires stack alignment only on an 8-byte boundary, which is what one would expect, I believe the requirement in x64 for stack alignment on a 16-byte boundary is related to exception handling. I can see that in frame functions, which have prolog and epilog code, it may well be the case that the exception handler requires all calls within the frame function to be carried out when the stack pointer is aligned on a 16-byte boundary. This may be necessary in order to achieve an orderly unwind (going back through the calls until the correct unwind data is found, and then continuing the unwind back beyond that).

However, I do not understand at present why this is necessary for leaf functions. Generally a leaf function will not have its own unwind data. It will not have a function table entry. It will therefore be invisible to the exception handler. Instead, the exception handler will identify the function which called the leaf function. To my mind, therefore, if there is a series of leaf calls (i.e. in which leaf functions call other leaf functions) the exception handler will ignore them all and automatically start the unwind at the frame function which called the very first leaf function in the series. This is why I am sceptical about the documented prohibition against leaf functions calling other leaf functions and the requirement of 16-byte alignment of the stack when a leaf function is called.

However, perhaps this requirement is not to do with exception handling. If so, what is the requirement to do with?

Any help you can give me would be appreciated.

--
Jeremy Gordon
The "Go" tools
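To make the alignment arithmetic in the post above concrete, here is a minimal Python sketch (it is not GoAsm or compiler output). It assumes the caller had RSP 16-byte aligned when it executed CALL, and that the callee reserves a 32-byte register-parameter area before making its own calls, as the Microsoft x64 convention describes. The names `rsp_on_entry` and `frame_allocation` are hypothetical helpers invented for illustration.

```python
RETURN_ADDRESS_SIZE = 8   # CALL pushes an 8-byte return address in 64-bit mode
SHADOW_SPACE = 32         # assumption: four 8-byte register-parameter home slots


def rsp_on_entry(rsp_at_call: int) -> int:
    """RSP as seen by the first instruction of the callee."""
    return rsp_at_call - RETURN_ADDRESS_SIZE


def frame_allocation(local_bytes: int, pushed_registers: int = 0) -> int:
    """Bytes N for `sub rsp, N` so RSP is 16-byte aligned at the next CALL.

    Hypothetical helper: assumes RSP was 16-byte aligned at the caller's CALL.
    """
    needed = SHADOW_SPACE + local_bytes
    # Bytes already placed on the stack since the caller's aligned RSP.
    already = RETURN_ADDRESS_SIZE + 8 * pushed_registers
    # Pad so that (already + allocation) is a multiple of 16.
    padding = -(already + needed) % 16
    return needed + padding


entry = rsp_on_entry(0x1000)          # 0x0FF8, which is 8 modulo 16
allocation = frame_allocation(0)      # 40 bytes: 32 for parameters, 8 padding
print(hex(entry), allocation, (entry - allocation) % 16)
```

Under these assumptions the cost of keeping the stack aligned is a single, slightly larger `sub rsp, N` in the prolog rather than an adjustment around every call.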
Darrell Gorter[MSFT]
Guest
Posts: n/a
Hello Jeremy,

A function cannot call another function without first reserving stack for parameters. A leaf function cannot do this because the unwinder assumes the caller address is at rsp.

The unwinder needs to be able to find the first function with unwind info. If you are in a leaf function then the address of the caller is at rsp. If you execute a call instruction, then a return address within the calling leaf function is pushed onto the stack and rsp is changed; thus the address at rsp is no longer that of the first non-leaf function (i.e. one that has unwind info).

You can tail-call from leaf functions to other leaf functions (or non-leaf functions) through a jump instruction, as that doesn't affect the stack.

Thanks,
Darrell Gorter[MSFT]

This posting is provided "AS IS" with no warranties, and confers no rights
--------------------
<From: Jeremy Gordon
<Subject: RE: 16-byte stack alignment - is it really necessary?
<Date: Thu, 4 Aug 2005 01:02:02 -0700
<Newsgroups: microsoft.public.windows.64bit.general
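The point about the unwinder reading the caller's address at rsp can be illustrated with a toy model. This is only a sketch of the rule as stated in the reply above, not the real Windows unwinder; the stack is modelled as a dictionary from address to value, and `FRAME_FN_RETURN_SITE` and `return_address_seen_by_unwinder` are hypothetical names.

```python
def return_address_seen_by_unwinder(stack: dict, rsp: int) -> int:
    """For a function with no unwind info, take the return address from [rsp]."""
    return stack[rsp]


FRAME_FN_RETURN_SITE = 0x401234   # hypothetical address inside the frame function

# State on entry to a leaf function: CALL has just pushed the return address.
rsp = 0x0FF8
stack = {rsp: FRAME_FN_RETURN_SITE}

# Case 1: the leaf leaves RSP alone, so [rsp] is still the return address.
untouched = return_address_seen_by_unwinder(stack, rsp)

# Case 2: the leaf reserves 40 bytes (parameter area plus padding) before a CALL.
rsp_after_sub = rsp - 40
stack[rsp_after_sub] = 0          # arbitrary contents of the new top-of-stack slot
moved = return_address_seen_by_unwinder(stack, rsp_after_sub)

# Case 3: a tail call via JMP pushes nothing, so RSP is unchanged.
rsp_after_jmp = rsp
tail = return_address_seen_by_unwinder(stack, rsp_after_jmp)

print(hex(untouched), hex(moved), hex(tail))
```

In this model only case 2 loses track of the frame function, which is why a function that moves rsp needs unwind data describing the adjustment.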
Jeremy Gordon
Guest
Posts: n/a
Thanks Darrell, I understand this. I hope to find a way to set up the exception records to avoid difficulty arising from this in leaf functions which call other leaf functions.

A colleague also suggested to me, from recent documentation he had read, that 16-byte stack alignment ensures that XMM data can be moved on and off the stack without causing an exception. For example, the instruction MOVDQA requires its memory operand to be 16-byte aligned. This may be another reason for the requirement for 16-byte stack alignment. Again, I believe this can be dealt with by a compiler.

Thanks very much for your help.

--
Jeremy Gordon
The "Go" tools
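As a small illustration of the XMM point, the sketch below checks whether a stack slot used by an aligned 16-byte move (MOVDQA or MOVAPS) sits on a 16-byte boundary. The helper names and the example addresses are assumptions made for illustration; the only fact relied on is that these instructions fault when their memory operand is not 16-byte aligned.

```python
def is_aligned(address: int, alignment: int) -> bool:
    """True if address is a multiple of alignment."""
    return address % alignment == 0


def xmm_slot_faults(rsp: int, offset: int) -> bool:
    """True if an aligned 16-byte access at [rsp + offset] would fault."""
    return not is_aligned(rsp + offset, 16)


# Same slot offset, two different stack pointers (hypothetical values).
aligned_rsp = 0x0FD0      # multiple of 16
misaligned_rsp = 0x0FD8   # off by 8, as after an unbalanced push

print(xmm_slot_faults(aligned_rsp, 0x20))     # slot is on a 16-byte boundary
print(xmm_slot_faults(misaligned_rsp, 0x20))  # slot is 8 bytes off
```

Code that picks its XMM save slots as fixed offsets from rsp is therefore relying on the caller having honoured the alignment rule.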
Jeremy Gordon
Guest
Posts: n/a
I did experiment by intentionally putting the stack out of alignment by 8
bytes, and at least within a window procedure, DefWindowProc objected. The debugger showed an exception at a MOVAPS instruction within system code. However, I tried the same thing with the latest version of Windows x64, 5.2.3790 Service Pack 1, and there is no MOVAPS exception any more. I suspect that the system has been made more robust and now itself ensures that the stack is properly aligned. If so, then this makes it much easier for a compiler to generate the correct code, because it would not have to ensure that the stack is always 16-byte aligned. Has anyone found any APIs sensitive to stack misalignment under the latest version of x64? If I find any I shall post a report here. -- Jeremy Gordon The "Go" tools "Jeremy Gordon" wrote: > Thanks Darrell, I understand this. I hope to find a way to set up the > exception records to avoid difficulty arising from this in leaf functions > which call other leaf functions. > A colleague also suggested to me from recent documentation he had read, that > 16-byte stack alignment ensures that XMM data can be moved on and off the > stack without causing an exception. For example the instruction MOVDQA > requires its data to be 16-byte aligned. This may be another reason for the > requirement for 16-byte stack alignment. Again I believe this can be dealt > with by a compiler. > Thanks very much for your help. > -- > Jeremy Gordon > The "Go" tools > > > ""Darrell Gorter[MSFT]"" wrote: > > > Hello Jeremy, > > A function can not call another function without first reserving stack for > > parameters. A leaf function can not do this because, the unwinder assumes > > the caller address is at rsp. > > The unwinder needs to be able to find the first function with unwind info. > > If you are in a leaf function then the address of the caller is at rsp. 
> > If you execute a call instruction then the address of the callee leaf > > function is pushed onto the stack and rsp is changed, thus the address at > > rsp no longer is that of the first non-leaf function (i.e. has unwind info). > > You can tailcall from leaf functions to other leaf functions (or non-leaf > > functions) through a jump instruction as that doesn't affect the stack. > > > > Thanks, > > Darrell Gorter[MSFT] > > > > This posting is provided "AS IS" with no warranties, and confers no rights > > -------------------- > > <Thread-Topic: 16-byte stack alignment - is it really necessary? > > <thread-index: AcWYysxdafDeOw9iRcieK8pLIz47wA== > > <X-WBNR-Posting-Host: 213.162.104.195 > > <From: "=?Utf-8?B?SmVyZW15IEdvcmRvbg==?=" > > <> > > <References: <55DA3544-59C3-4F81-8708-> > > <> > > <Subject: RE: 16-byte stack alignment - is it really necessary? > > <Date: Thu, 4 Aug 2005 01:02:02 -0700 > > <Lines: 131 > > <Message-ID: <30642324-54CB-4CC8-B5AD-> > > <MIME-Version: 1.0 > > <Content-Type: text/plain; > > < charset="Utf-8" > > <Content-Transfer-Encoding: 7bit > > <X-Newsreader: Microsoft CDO for Windows 2000 > > <Content-Class: urn:content-classes:message > > <Importance: normal > > <Priority: normal > > <X-MimeOLE: Produced By Microsoft MimeOLE V6.00.3790.0 > > <Newsgroups: microsoft.public.windows.64bit.general > > <NNTP-Posting-Host: TK2MSFTNGXA03.phx.gbl 10.40.2.250 > > <Path: TK2MSFTNGXA01.phx.gbl!TK2MSFTNGXA03.phx.gbl > > <Xref: TK2MSFTNGXA01.phx.gbl microsoft.public.windows.64bit.general:13887 > > <X-Tomcat-NG: microsoft.public.windows.64bit.general > > < > > <Thanks, Darrell for your answer. > > <I was aware of the article you mentioned, in fact it was that article > > which > > <first alerted me to the supposed requirement that the stack is always > > aligned > > <on a 16-byte boundary except within prolog and epolog code in a stack > > frame > > <(when, of course it could not always be so aligned). 
> > >
> > > The particular parts of the article which worry me are:-
> > > "Dynamic Parameter Stack Area Construction"
> > > "The stack is always 16-byte aligned when a call instruction is executed."
> > > And in the article:-
> > > "Function types"
> > > "Leaf function- A function that does not require a stack frame. A leaf function does not require a function table entry. It cannot call any functions, allocate space, or save any nonvolatile registers. It can leave the stack unaligned while it executes."
> > >
> > > Here it is said that a leaf function cannot call another function. This is enormously prohibitive for compilers, and seems to be a consequence of the requirement that the stack should always be aligned on a 16-byte boundary at the time of a call. Of course the compiler could ensure that the alignment is achieved just prior to the call, and restored afterwards, but this would add bloat (extra opcodes). I need to adjust the output of my development tools to ensure that the requirements of x64 are always met but only in so far as that requirement is strictly necessary. This is why I need to understand the reason for (a) leaf functions being prohibited from calling another leaf function, which seems to be related to (b) the requirement that the stack be 16-byte aligned when calling even a leaf function.
> > >
> > > Bearing in mind that the AMD64 literature requires stack alignment only on an 8-byte boundary, which is what one would expect, I believe the requirement in x64 for stack alignment on a 16-byte boundary is related to exception handling. I can see that in frame functions, which have prolog and epilog code, it may well be the case that the exception handler requires all calls within the frame function to be carried out when the stack pointer is aligned on a 16-byte boundary.
> > > This may be necessary in order to achieve an orderly unwind (going back through the calls until the correct unwind data is found, and then continuing the unwind back beyond that).
> > > However, I do not understand at present why this is necessary for leaf functions. Generally a leaf function will not have its own unwind data. It will not have a function table entry. It will therefore be invisible to the exception handler. Instead, the exception handler will identify the function which called the leaf function. To my mind, therefore, if there is a series of leaf calls (i.e. in which leaf functions call other leaf functions) the exception handler will ignore them all and automatically start the unwind at the frame function which called the very first leaf function in the series.
> > > This is why I am sceptical about the documented prohibition against leaf functions calling other leaf functions and the requirement of 16-byte alignment of the stack when a leaf function is called.
> > > However, perhaps this requirement is not to do with exception handling. If so, what is the requirement to do with?
> > > Any help you can give me would be appreciated.
> > > --
> > > Jeremy Gordon
> > > The "Go" tools
> > >
> > > "Darrell Gorter[MSFT]" wrote:
> > >
> > > > Hello Jeremy,
> > > > This is required as part of the amd64 calling convention, it is not for performance only.
> > > > In general writing asm on x64 is quite different than writing it on x86.
> > > > Here is a document on msdn that may be useful.
> > > > http://msdn.microsoft.com/library/de...us/kmarch/hh/k march/64bitAMD_6ec00b51-bf75-41bf-8635-caa8653c8bd9.xml.asp
> > > > Thanks,
> > > > Darrell Gorter[MSFT]
> > > >
> > > > This posting is provided "AS IS" with no warranties, and confers no rights
> > > > --------------------
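For readers following the MOVAPS and MOVDQA points raised in this thread, the alignment arithmetic can be sketched roughly as below. This is only a small Python model, not code from any of the posters and not output from GoAsm; the names `is_aligned`, `rsp_at_entry` and `frame_size`, and the sample RSP value, are made up for illustration. It assumes the documented Windows x64 rule quoted above (the stack is 16-byte aligned when a call instruction is executed), so RSP sits 8 bytes past a 16-byte boundary at the callee's first instruction.

```python
ALIGN = 16          # alignment expected whenever a CALL executes
RET_ADDR_SIZE = 8   # bytes pushed by CALL in 64-bit mode


def is_aligned(addr, align=ALIGN):
    """True if addr is a multiple of align (what MOVAPS/MOVDQA memory operands need)."""
    return addr % align == 0


def rsp_at_entry(rsp_before_call):
    """RSP seen by the callee's first instruction, after CALL pushed the return address."""
    return rsp_before_call - RET_ADDR_SIZE


def frame_size(local_bytes):
    """Smallest allocation >= local_bytes that brings RSP back to a 16-byte
    boundary when subtracted from the entry RSP (which is 8 past a boundary)."""
    total = local_bytes + RET_ADDR_SIZE
    rounded = (total + ALIGN - 1) // ALIGN * ALIGN
    return rounded - RET_ADDR_SIZE


if __name__ == "__main__":
    rsp = 0x12FF00                 # hypothetical, 16-byte aligned RSP just before a CALL
    entry = rsp_at_entry(rsp)
    size = frame_size(32)          # 32 bytes of register parameter space for callees
    print(hex(entry), is_aligned(entry))
    print(size, is_aligned(entry - size))
```

Under these assumptions `frame_size(32)` comes out as 40, which is the familiar `sub rsp, 28h` seen in many x64 prologs. A function that makes a call without such an adjustment hands its callee an RSP that is off by 8, and any aligned XMM store relative to RSP in the callee can then fault, which would be consistent with the MOVAPS exception reported above.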




