16-byte stack alignment - is it really necessary?

Discussion in 'Windows 64bit' started by Jeremy Gordon, Aug 1, 2005.

  1. I have a query about how important it really is to maintain RSP on 16-byte
    alignment. It seems to be a "given" that this will aid performance and the
    X64 documentation is very insistent about this.

    However the AMD documentation says:-
    "Stack Alignment. Control-transfer performance can degrade significantly
    when the stack pointer is not aligned properly. Stack pointers should be word
    aligned in 16-bit segments, doubleword aligned in 32-bit segments, and
    quadword aligned in 64-bit mode."
    Section 3.73 Chapter 3 "General Purpose Programming" AMD64.
    In other words the processor manufacturer recommends only 8-byte alignment
    for the stack.

    The reason I ask is that I am converting my assembler (GoAsm) to work with
    64-bit source code for applications running under Windows XP64. It will
    certainly be far easier to keep the stack on 8-byte alignment, rather than
    16-byte alignment.

    Bearing in mind the processor manufacturer's recommendation, I cannot at
    present understand the requirement to align on 16 bytes. Why is this
    insisted upon? Is this a hangover from some previous thinking? Is it
    something which may be reconsidered and eventually dropped? I understand no
    exception will be generated by stack alignment on 8 bytes rather than
    16 bytes, but it is said that performance may be affected. Why is this? I
    will of course do my own speed trials, but at my current planning stage for
    GoAsm, any insight into this would be very useful.

    -----
    Jeremy Gordon
    The "Go" tools
    http://www.GoDevTool.com
     
    Jeremy Gordon, Aug 1, 2005
    #1

  2. Hello Jeremy,
    This is required as part of the amd64 calling convention; it is not only a
    matter of performance.
    In general, writing asm on x64 is quite different from writing it on x86.
    Here is a document on MSDN that may be useful:
    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/kmarch/hh/kmarch/64bitAMD_6ec00b51-bf75-41bf-8635-caa8653c8bd9.xml.asp
    Thanks,
    Darrell Gorter[MSFT]

    This posting is provided "AS IS" with no warranties, and confers no rights
     
    Darrell Gorter[MSFT], Aug 3, 2005
    #2

  3. Thanks, Darrell, for your answer.
    I was aware of the article you mentioned; in fact it was that article which
    first alerted me to the supposed requirement that the stack is always aligned
    on a 16-byte boundary except within prolog and epilog code in a stack frame
    (when, of course, it could not always be so aligned).

    The particular parts of the article which worry me are:-
    "Dynamic Parameter Stack Area Construction"
    "The stack is always 16-byte aligned when a call instruction is executed."
    And in the article:-
    "Function types"
    "Leaf function- A function that does not require a stack frame. A leaf
    function does not require a function table entry. It cannot call any
    functions, allocate space, or save any nonvolatile registers. It can leave
    the stack unaligned while it executes."

    Here it is said that a leaf function cannot call another function. This is
    enormously restrictive for compilers, and seems to be a consequence of the
    requirement that the stack should always be aligned on a 16-byte boundary at
    the time of a call. Of course the compiler could ensure that the alignment
    is achieved just prior to the call, and restored afterwards, but this would
    add bloat (extra opcodes). I need to adjust the output of my development
    tools to ensure that the requirements of x64 are always met, but only in so
    far as each requirement is strictly necessary. This is why I need to
    understand the reason for (a) leaf functions being prohibited from calling
    another leaf function, which seems to be related to (b) the requirement that
    the stack be 16-byte aligned when calling even a leaf function.
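
    The alignment arithmetic behind the quoted rule can be sketched as
    follows. This is only an illustration: the addresses are made-up example
    values, the helper names are hypothetical, and the 32-byte parameter
    home area is an assumption taken from the Microsoft x64 calling
    convention rather than from this thread. It also assumes the prolog
    pushes no registers.

```python
# Toy arithmetic for RSP around a call; all addresses are example values.
RETURN_ADDRESS_SIZE = 8   # 'call' pushes an 8-byte return address in 64-bit mode
SHADOW_SPACE = 32         # assumed home area for the four register parameters


def is_aligned(address, boundary):
    """True if address is a multiple of boundary."""
    return address % boundary == 0


def frame_allocation(local_bytes):
    """Smallest N for 'sub rsp, N' that covers the home area plus locals
    and leaves RSP 16-byte aligned again (assumes no pushes in the prolog)."""
    needed = SHADOW_SPACE + local_bytes
    # At entry RSP is 8 mod 16, so N + 8 must be a multiple of 16.
    total = ((needed + RETURN_ADDRESS_SIZE + 15) // 16) * 16
    return total - RETURN_ADDRESS_SIZE


rsp_at_call = 0x7FF0                                  # 16-byte aligned at the call
rsp_at_entry = rsp_at_call - RETURN_ADDRESS_SIZE      # after the return address is pushed
rsp_in_body = rsp_at_entry - frame_allocation(0)      # after the prolog's sub rsp, N

print(is_aligned(rsp_at_entry, 16))   # False
print(is_aligned(rsp_at_entry, 8))    # True
print(frame_allocation(0))            # 40
print(is_aligned(rsp_in_body, 16))    # True
```

    Under these assumptions a function that makes calls pays for one
    "sub rsp, N" in its prolog rather than an adjustment around every call.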

    Bearing in mind that the AMD64 literature requires stack alignment only on
    an 8-byte boundary, which is what one would expect, I believe the requirement
    in x64 for stack alignment on a 16-byte boundary is related to exception
    handling. I can see that in frame functions, which have prolog and epilog
    code, it may well be the case that the exception handler requires all calls
    within the frame function to be carried out when the stack pointer is aligned
    on a 16-byte boundary. This may be necessary in order to achieve an orderly
    unwind (going back through the calls until the correct unwind data is found,
    and then continuing the unwind back beyond that).
    However, I do not understand at present why this is necessary for leaf
    functions. Generally a leaf function will not have its own unwind data. It
    will not have a function table entry. It will therefore be invisible to the
    exception handler. Instead, the exception handler will identify the function
    which called the leaf function. To my mind, therefore, if there is a series
    of leaf calls (i.e. in which leaf functions call other leaf functions), the
    exception handler will ignore them all and automatically start the unwind at
    the frame function which called the very first leaf function in the series.
    This is why I am sceptical about the documented prohibition against leaf
    functions calling other leaf functions and the requirement of 16-byte
    alignment of the stack when a leaf function is called.
    However, perhaps this requirement is not to do with exception handling. If
    so, what is the requirement to do with?
    Any help you can give me would be appreciated.
    --
    Jeremy Gordon
    The "Go" tools


     
    Jeremy Gordon, Aug 4, 2005
    #3
  4. Hello Jeremy,
    A function cannot call another function without first reserving stack space
    for parameters. A leaf function cannot do this because the unwinder assumes
    the caller's return address is at rsp.
    The unwinder needs to be able to find the first function with unwind info.
    If you are in a leaf function, then the return address into the caller is at rsp.
    If you execute a call instruction, then a return address into the calling leaf
    function is pushed onto the stack and rsp changes, so the address at
    rsp is no longer that of the first non-leaf function (i.e. one that has unwind info).
    You can tail-call from leaf functions to other leaf functions (or non-leaf
    functions) through a jump instruction, as that doesn't affect the stack.
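
    The assumption described above can be illustrated with a toy model. This
    is a simplified sketch, not the real Windows unwinder: the stack is a
    plain dictionary, leaf_unwind_step is a hypothetical helper, and the
    32-byte parameter area is assumed from the Microsoft x64 calling
    convention.

```python
# Toy model only: the stack is a dict of address -> value, and the
# "unwinder" step below is a simplification, not the real Windows routine.
SHADOW_SPACE = 32  # assumed bytes a caller reserves before making a call


def leaf_unwind_step(stack, rsp):
    """Unwind a function that has no unwind info: treat the 8 bytes at
    [rsp] as the return address and pop them."""
    return stack[rsp], rsp + 8


stack = {}
rsp = 0x1000                      # RSP in frame function F at the call site
rsp -= 8                          # 'call leaf' pushes the return address
stack[rsp] = "return into F"

# Case 1: the leaf function never moves RSP, so the assumption holds.
rip_untouched, _ = leaf_unwind_step(stack, rsp)

# Case 2: the "leaf" reserves space so that it can make a call itself.
rsp_moved = rsp - (SHADOW_SPACE + 8)   # 40 bytes keeps RSP 16-byte aligned
stack[rsp_moved] = "scratch data"      # whatever happens to be stored there
rip_moved, _ = leaf_unwind_step(stack, rsp_moved)

print(rip_untouched)  # return into F
print(rip_moved)      # scratch data
```

    In the second case the model reads scratch data as a return address,
    which is why a function that moves rsp needs unwind info describing the
    adjustment.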

    Thanks,
    Darrell Gorter[MSFT]

    This posting is provided "AS IS" with no warranties, and confers no rights
    Darrell Gorter[MSFT], Aug 5, 2005
    #4
  5. Thanks Darrell, I understand this. I hope to find a way to set up the
    exception records to avoid difficulty arising from this in leaf functions
    which call other leaf functions.
    A colleague also suggested to me, from recent documentation he had read, that
    16-byte stack alignment ensures that XMM data can be moved on and off the
    stack without causing an exception. For example, the instruction MOVDQA
    requires its memory operand to be 16-byte aligned. This may be another
    reason for the requirement for 16-byte stack alignment. Again, I believe
    this can be dealt with by a compiler.
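
    The XMM point can be sketched with simple address arithmetic. The
    helper names and addresses here are hypothetical; the only fact relied
    on is that MOVDQA (like MOVAPS) faults when its memory operand is not
    16-byte aligned.

```python
XMM_BYTES = 16  # MOVDQA / MOVAPS move 16 bytes and need a 16-byte-aligned address


def aligned_move_faults(address):
    """True if an aligned 16-byte move at this address would raise an exception."""
    return address % XMM_BYTES != 0


def spill_slot(rsp, offset):
    """Address of a stack slot at a fixed offset from RSP."""
    return rsp + offset


aligned_rsp = 0x2000               # example value, 16-byte aligned
misaligned_rsp = aligned_rsp - 8   # same stack, off by 8 bytes

# The same 'movdqa [rsp+32], xmm6' is safe or faulting depending only on RSP.
print(aligned_move_faults(spill_slot(aligned_rsp, 32)))     # False
print(aligned_move_faults(spill_slot(misaligned_rsp, 32)))  # True
```

    A compiler that knows RSP is 16-byte aligned can pick slot offsets that
    are multiples of 16 at compile time; without that guarantee it would
    have to use unaligned moves or realign the stack itself.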
    Thanks very much for your help.
    --
    Jeremy Gordon
    The "Go" tools


     
    Jeremy Gordon, Aug 5, 2005
    #5
  6. I did experiment by intentionally putting the stack out of alignment by 8
    bytes, and at least within a window procedure, DefWindowProc objected. The
    debugger showed an exception at a MOVAPS instruction within system code.

    However, I tried the same thing with the latest version of Windows x64,
    5.2.3790 Service Pack 1, and there is no MOVAPS exception any more.

    I suspect that the system has been made more robust and now itself ensures
    that the stack is properly aligned. If so, then this makes it much easier
    for a compiler to generate the correct code, because it would not have to
    ensure that the stack is always 16-byte aligned.
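
    If code does realign the stack for itself, the arithmetic would look
    something like the sketch below. Whether the system code actually does
    this is only the suspicion stated above, not something confirmed here;
    align_down is a hypothetical helper that models an instruction such as
    "and rsp, -16".

```python
def align_down(rsp, boundary=16):
    """Round rsp down to a multiple of boundary (a power of two).
    The stack grows downwards, so rounding down never overwrites
    anything that is already on the stack."""
    return rsp & ~(boundary - 1)


incoming_rsp = 0x1008              # example value: misaligned by 8 bytes
realigned = align_down(incoming_rsp)

print(hex(realigned))              # 0x1000
print(incoming_rsp - realigned)    # 8
```

    Code that realigns this way has to keep the original rsp somewhere (for
    example in a frame register) so that it can restore it before returning.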

    Has anyone found any APIs sensitive to stack misalignment under the latest
    version of x64? If I find any I shall post a report here.
    --
    Jeremy Gordon
    The "Go" tools


     
    =?Utf-8?B?SmVyZW15IEdvcmRvbg==?=, Aug 20, 2005
    #6