Frame Relay EEK versus Traffic Shaping

Discussion in 'Cisco' started by Vincent C Jones, Oct 31, 2005.

  1. Frame relay traffic shaping and QoS appear to be incompatible with
    frame relay end-to-end keepalives. Need traffic shaping to provide
    production traffic with guaranteed bandwidth in the presence of other
    traffic. Need end-to-end keepalives (EEK) to reliably detect loss of
    frame connectivity at the link level (for policy routing).

    The configuration which follows works fine as long as there is not too
    much production traffic. No problems (except for surfers) when
    background traffic exceeds available bandwidth. But major problems when
    production traffic exceeds allocated bandwidth. This forces the PVC down
    due to loss of end-to-end keepalives. It appears that end-to-end
    keepalives have lower priority than reserved bandwidth, unlike routing
    keepalives, which have their own high priority queue.

    Side note: the background traffic is policed because if left to
    contend for available bandwidth, the frame relay traffic shaping
    kicks in before the background traffic is rate limited, boosting the
    queuing delays for production traffic to unacceptable (approx five
    seconds one way) levels. As configured, production traffic delay
    still suffers a major, but not fatal, hit, rising from 60 ms at no
    load to around 150 ms. Using "priority" rather than "bandwidth"
    does not improve the delay hit, implying the queuing is occurring
    after the QoS queuing has already been applied.

    Another hint: With both production and background traffic sources
    running at overload, production traffic is not being limited to
    the allocation, behaving more like priority queuing than
    QoS queuing.

    Serial0/0 is a full T1 at this end, a 256K fractional T1 at the
    destination end.

    Anyone have any ideas what is going on or how to fix it? Ideal would be
    getting QoS to work the way it does on a leased line, but just getting
    frame EEK to queue at a higher priority would be acceptable.

    Configuration excerpts:

    Cisco Internetwork Operating System Software
    IOS (tm) C1700 Software (C1700-SY7-M), Version 12.3(10b), RELEASE SOFTWARE (fc3)
    cisco 1760 (MPC860P) processor (revision 0x500) with 56320K/9216K bytes of memory.
    Processor board ID FOC08091V2T (923892838), with hardware revision 0000
    MPC860P processor: part number 5, mask 2

    !
    class-map match-all APPL-priority
     match access-group name PRIORITY-TRAFFIC
    !
    policy-map FRAME256policy
     class APPL-priority
      bandwidth 100
     class class-default
      fair-queue
      police cir 120000
        conform-action transmit
        exceed-action drop
        violate-action drop
    !
    interface Serial0/0
     description DLCI 100
     no ip address
     encapsulation frame-relay IETF
     logging event dlci-status-change
     serial restart-delay 0
     no fair-queue
     frame-relay traffic-shaping
    !
    interface Serial0/0.150 point-to-point
     description TestLink
     bandwidth 240
     ip address 206.208.93.37 255.255.255.252
     delay 1000
     frame-relay class FrameShape256
     frame-relay interface-dlci 150
    !
    ip access-list extended PRIORITY-TRAFFIC
     permit icmp host 192.168.100.131 any
    !
    map-class frame-relay FrameShape256
     frame-relay end-to-end keepalive mode bidirectional
     frame-relay cir 256000
     frame-relay mincir 256000
     frame-relay traffic-rate 240000 240000
     service-policy output FRAME256policy
    !

    Thanks in advance...
     
    Vincent C Jones, Oct 31, 2005
    #1

  2. Vincent C Jones

    Igor Mamuzic Guest

    If we assume that the FR end-to-end keepalives fall into the default
    class and carry no QoS marking, then maybe your background traffic
    consists of a huge number of small packets, forcing the keepalives to
    compete with the background... I assume you use a route-map with the
    tracking option, using FR IP SLA monitoring to detect link faults...
    If so, why do you have to rely on the FR mechanism? Could you use ICMP
    echo probes or TCP-based SLA monitoring instead? That way you can mark
    those control packets to ensure they get the highest priority...
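
    For example, a minimal sketch of an ICMP echo probe plus a tracked
    object (the peer address 206.208.93.38, the operation numbers and the
    timers are only assumptions, and the exact syntax varies by IOS
    release -- older 12.3 images use "rtr" where newer ones use "ip sla
    monitor"):

    ip sla monitor 10
     type echo protocol ipIcmpEcho 206.208.93.38
     tos 160
     timeout 1000
     frequency 5
    ip sla monitor schedule 10 life forever start-time now
    !
    track 10 rtr 10 reachability
    !
    ! Setting a ToS on the probe (or matching it in PRIORITY-TRAFFIC)
    ! lets it ride in the priority class instead of class-default.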

    B.R.
    Igor
     
    Igor Mamuzic, Nov 1, 2005
    #2

  3. Thanks for the response. Tracked down the EEK drops to IOS getting
    confused by the many test changes and not executing frame relay
    traffic shaping despite the configuration... "copy run start"
    followed by "reload" and the frame relay traffic shaping started
    working correctly and EEK no longer had problems, even under extreme
    overload, at least with CIR/bandwidth of 256Kbps and traffic shaped
    to 240Kbps.

    However, I still have a problem with queueing delays in the
    reserved bandwidth traffic when the background traffic goes into
    overload. E.g. applying 500Kbps of background traffic brings the
    delays and packet losses for background traffic up to the 10's
    of seconds and 60-80% range, but also hits the foreground (using
    130Kbps of 179Kbps reserved) with 2 second delays and 25% packet
    loss, which is unacceptable. I can cure it by applying policing to
    the background traffic, but then the background traffic is policed
    even when foreground traffic is not present. Foreground delay and loss
    are acceptable (under 200 ms and negligible) as long as foreground
    traffic is around 90 Kbps or less, regardless of background traffic.
     
    Vincent C Jones, Nov 1, 2005
    #3
  4. Vincent C Jones

    Merv Guest

    Would you need to implement frame-relay fragmentation on the 256K PVC?
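
    For reference, a rough sketch of what FRF.12 fragmentation would look
    like on the existing map-class (320 bytes is just an illustrative
    size, roughly 10 ms of serialization delay at 256 kbps):

    map-class frame-relay FrameShape256
     frame-relay fragment 320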
     
    Merv, Nov 1, 2005
    #4
  5. Vincent C Jones

    anybody43 Guest

    I have seen this once before. Needed TAC to fix it but
    they thought it quite common and suggested it
    right away when presented with whatever
    anomaly it was.

    Warning, I am speculating here (as usual?)
    It also seems rather likely that you will have
    considered these possibilities.

    Can you reduce the queue sizes?
    Is RED an option? (W or no W)
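
    Something along these lines, perhaps (16 packets is just an
    illustrative value, and whether queue-limit and random-detect can be
    combined with fair-queue in class-default depends on the IOS release):

    policy-map FRAME256policy
     class class-default
      fair-queue
      queue-limit 16
      random-detect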

    Kind of reading between the lines
    I guess that you are using synthetic test traffic.
    Possibly in practice (with TCP anyway) the
    delays will be lower since the transmitters
    will back off. Unless of course the number of
    'sessions' is large.

    Good luck.
     
    anybody43, Nov 1, 2005
    #5
  6. Vincent C Jones

    Igor Mamuzic Guest

    Vincent,

    Maybe you should consider implementing LLQ, since it is normal for your
    foreground traffic to experience some delay and loss (OK, I admit that
    25% isn't "some loss" :) it's a loss that can seriously impact
    performance), because the router dequeues the background traffic too,
    just several times less often than the foreground. If you want to
    ensure absolute priority then you need to implement LLQ. In that case
    the router will not forward packets from the background queue until the
    foreground queue is empty. Of course, this may lead to background
    traffic starvation, so it's important to police the foreground traffic
    to keep it from occupying all the available bandwidth. LLQ handles
    exactly that: when the foreground exceeds its policed limit, it still
    gets appropriate WFQ treatment, and LLQ is Cisco's recommendation for
    VoIP and similar delay- and loss-sensitive traffic.
    If LLQ isn't appropriate for your traffic (as it isn't in my case
    either), then you have to get down on your knees and pray to your
    management for extra bandwidth, I'm afraid :)
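
    A minimal sketch of the LLQ variant of the existing policy; the
    "priority" keyword carries an implicit policer that limits the class
    to its configured rate during congestion, which is what keeps the
    background class from being starved completely:

    policy-map FRAME256policy
     class APPL-priority
      priority 100
     class class-default
      fair-queue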

    B.R.
    Igor
     
    Igor Mamuzic, Nov 2, 2005
    #6
  7. Maybe I'm missing something, but in my original post, I mentioned
    that I get the same behavior using either the "priority" or the
    "bandwidth" keyword. I did not try fragmentation, but I did not
    see where that would help when the delays were tens of packets in
    duration and packet loss rates for priority packets were over 20%
    even though the priority allocation was not even close to being
    consumed.

    And yes, for my testing, both foreground and background traffic were
    artificially generated, and well behaved background traffic would be
    throttled by the dropped packets. But the requirement was for the
    foreground application to work well even in the presence of hostile
    background traffic (remember Nachi/Welchia?).
     
    Vincent C Jones, Nov 3, 2005
    #7
  8. Vincent C Jones

    Merv Guest

    IOS bug CSCsa65035 ?
     
    Merv, Nov 3, 2005
    #8
  9. Vincent C Jones

    dennis Guest

    It's almost comical that you claim to need EEKs to "ensure reliability"
    when they are causing you to lose traffic by falsely determining the
    link to be down. Another one of cisco's great "enhancements". How the
    world survived for so many years without EEKs is a mystery to me. EEKs
    are a waste of bandwidth. They don't ensure anything. If the link is
    down, the FR LMI will notify the router as per the FR spec. You don't
    need extra crap to do it. It's just cisco taking advantage of people
    who don't know how FR works.

    I'm not sure what the mumbo-jumbo in your FR settings means (police cir,
    etc.), but surely you aren't trying to force or limit your traffic to
    always be below the CIR, are you? You don't need to throttle below your
    CIR unless you get congestion notifications from the network. CIR
    doesn't mean that's all you can get (it actually doesn't MEAN anything,
    really); it's supposed to mean that anything below that will not be
    discarded by the network unless absolutely necessary. I've had links
    with a 256K CIR that could send at full T1 all day long without ever
    losing a packet.
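
    If one did want to shape to port speed and throttle back toward the
    256K CIR only when the network signals congestion, the map-class
    might look roughly like this (rates are illustrative; "frame-relay
    adaptive-shaping becn" tells FRTS to slow toward mincir on BECNs):

    map-class frame-relay FrameShape256
     frame-relay cir 1536000
     frame-relay mincir 256000
     frame-relay adaptive-shaping becn
     service-policy output FRAME256policy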

    Are you trying to tell me that ciscos, with all of their "Features",
    can't properly react to network congestion notifications?

    I'd certainly suggest trying to get rid of your "exceed-action drop"
    directive, because all you're doing is discarding traffic that likely
    doesn't need to be dropped. It's completely ridiculous for you to have
    your router purposely drop traffic when the entire point of shaping is
    to avoid drops in the first place.

    Dennis
    www.etinc.com
     
    dennis, Nov 3, 2005
    #9
  10. A few comments on your comments in-line...

    Hence my assumption that the behavior observed is due to a defect. My
    posting was an attempt to solicit others' opinions as to whether
    the defect was in my configuration or in the IOS. As I posted in
    a followup, a reboot of the router cured the situation, strongly
    indicating that it was due to an IOS defect.
    Primarily by using routing protocols over the links which detect
    unreported failures through the use of network layer keepalive
    exchanges. Unfortunately, there are others of "cisco's great
    'enhancements'" that require the failure of a path to be detected at the
    link layer, which routing protocols do not provide.
    If properly implemented and configured, they should ensure that any
    frame relay network error which blocks a PVC but is not reported by
    the LMI will cause the router to declare the subinterface defined
    for that DLCI to be down.
    Nice concept if it were true. Have you read the FR spec? Can you
    refer us to where it states that the LMI must detect and report all
    failures which render a DLCI unusable? I believe you will
    find that the LMI is only REQUIRED to be locally meaningful and
    that any reflection of end-to-end status is OPTIONAL. Regardless,
    what it says in the FR spec (and technically, you would also need to
    specify which one) is moot. There are real world frame relay service
    providers who do not always signal PVC connectivity problems in
    the LMI exchanges between the service provider and the service user.
    Au contraire, mon ami. It is just cisco responding to the needs of
    practitioners whose networks must work reliably in the real world.
    Yes, I am, and for good reason. At the other side of the frame relay
    network, the physical link servicing this PVC is a fractional T1
    with a physical data rate which matches the CIR.
    Frequently, but not always true, as already explained. You need to
    understand the specifics of an implementation before attempting to
    apply blanket generalities. In this application, by the time BECNs
    were received, priority packets would be queueing up at the other
    side of the network awaiting delivery out the fractional T1 and
    the delays would engender unacceptable application performance.
    Yes. This was actually a common problem when frame relay first
    came out twenty years ago. Prototypes would perform beautifully,
    and sometimes even the production rollout would be flawless, but
    eventually, the overall traffic levels supported by the service
    provider would grow to reach their design levels and CIR would
    start to be enforced and the applications which depended on "free
    bandwidth" would fall flat on their faces.
    As stated above, BECN and FECN are irrelevant in this application and
    were not tested.
    There is traffic which pays for the network and there is traffic
    which is along for a free ride. I am inclined to agree with your
    definition of the "entire point of shaping", but remember that
    I was doing more than shaping. I was also providing Quality of
    Service to ensure that the paying traffic got the service it
    required in order to be willing to continue paying the bills. The
    traffic policing applied to freeloader traffic was an attempt to
    get around faults uncovered in the traffic shaping which allowed
    the freeloader traffic to slow down the paying traffic long before
    the paying traffic reached the level it was paying for. I
    included it to reinforce the fact that the problem was in the traffic
    shaping, not an inability of the router to cope with gross overload in
    the freeloader traffic.
    P.S. I normally ignore flames, but after reading the white paper
    "Bandwidth Management for ISPs and Universities" on your web site,
    I thought I would give you the benefit of the doubt. Just be careful
    you don't fall into the same trap you warn your customers against:
    "The worst case is if you've used some other bandwidth management
    product and you think you know everything. Unfortunately, what you
    know is defined by terms that likely only apply to the product
    you have been using." Been there, done that, been burnt...it's a
    philosophy that applies to far more than just bandwidth management.
     
    Vincent C Jones, Nov 4, 2005
    #10
  11. Vincent C Jones

    dennis Guest

    I guess you didn't find the frame relay FAQ, once a top-rated page and
    now a bit out of date but entertaining nonetheless:

    http://www.etinc.com/frfaq.htm

    I wrote what was once the de facto standard frame relay implementation
    for linux/BSD (and in fact our bandwidth management product was
    originally written for use on frame relay networks to minimize drops),
    and thus I've read the specs many, many times.

    I may have forgotten that the assumption that FR switches are
    implemented according to spec is a dubious one much of the time, but
    LMI reporting is SUPPOSED to reflect the end-to-end status. However, I
    generally would equate an FR keepalive to a PPP keepalive. They're much
    more likely to generate a false negative than they are to save the day.
    In most cases, knowing the DLCI isn't working (but the switch isn't
    reporting it down) isn't very useful information, since you probably
    know it isn't working just as quickly from someone complaining about
    it. Your routing protocols (if you need to switch something on a down link)
    have keepalives of their own, so the extra few seconds you may save
    once in a blue moon don't seem worth it.

    I think when you use "extras" you have to weigh the chances that they
    are going to "save the day" against the damage they may cause by
    failing themselves. Just because something seems like a good idea
    doesn't mean it is (spanning tree comes to mind). In this case, are the
    EEKs going to save you tremendous time and/or money more often than
    they're going to cause a perfectly good frame relay connection to be
    thought to be down? Another prime example is vlan trunking. You make
    an utter mess of your network in order to secure against some unlikely
    event. So you suffer 100% of the time in order to avoid a .1% chance of
    a problem. You make your network unmanageable by the average technician,
    and you limit your choice of equipment that you can use on your
    network. And in the end the vlan trunking itself causes problems much
    more often than the events that you're trying to avoid. It's crazy.

    DB
     
    dennis, Nov 5, 2005
    #11
  12. Vincent C Jones

    dennis Guest

    Also of note is that EEKs will not detect "all failures which render a
    DLCI unusable". If they did, there would be a better case.
     
    dennis, Nov 5, 2005
    #12
  13. A few comments in-line to your response, followed by my response...

    Actually, I was looking for a reference to an accepted standard
    such as one from the Frame Relay Forum or ITU-T. You might want to
    go back and read your FAQ yourself, as even there the only comment
    re: LMI meaning is the statement: "DLCIs are marked "ACTIVE" if
    there is a valid connection set up for that particular DLCI. If
    you do not get an ACTIVE response from the switch, then the frame
    relay network provider probably does not have the connection set up
    properly." Being configured correctly with successful handshaking
    over the local loop does not equate to end-to-end connectivity,
    at least not in the real world of commercial frame relay networks.
    Excuse me, but I can't go to Sprint or Verizon and say "Dennis claims
    your LMI should be more meaningful." I need a citation to a specific
    page & paragraph number in a binding standard. Which neither your
    linux/BSD implementation nor your frame relay FAQ is. And no,
    I am not about to waste an hour or two going over the standards
    just to prove you wrong; I don't care, because I have to
    deal with the LMI as provided by my clients' suppliers.
    As I have said multiple times, I would love to have a citation to an
    accepted standard that specifies this requirement for LMI.
    Good analogy.
    Hmmm. I have not had this problem with PPP. I have had problems with
    brain dead implementations of PPP LQM, but that is a different story.
    My clients pay a lot of money to minimize the probability that their
    users will ever notice, let alone complain about, a down link. There
    is no way to achieve even two or three nines of availability if you
    have to wait for users to complain before taking action. It can be
    scary how many people have to discover the hard way that redundancy
    without proper attention to design is a good way to spend money
    with no improvement to availability.
    As I have stated previously, some Cisco features are not integrated
    with the routing protocols and will only work if a link goes down
    at the link layer. Not my choice.
    Agree. You are preaching to the choir on this one.
    Spanning tree is OK for what it was designed to achieve. How it is used
    is another story. When you try to make something idiot proof, along
    comes a bigger idiot.
    No, hence the original posting in an attempt to determine why EEKs were
    failing in my test configuration.
    Agree that it is scary how many networks are designed by people who
    have no idea of how networking protocols work, and consequently no
    idea of their limitations.
    I left my original response intact just in case you want to take the
    time to read it more carefully, because you appear to be reacting
    to key words rather than the intent of my posting. Instead of
    assuming I'm a wet-behind-the-ears newbie, go back and re-read my
    postings assuming competence on my part.
     
    Vincent C Jones, Nov 6, 2005
    #13
  14. Very true, nor will OSPF or EIGRP hello exchanges. But all will
    detect common failures which are not reported by LMI on major
    provider networks and should be able to detect all instances of
    total loss of communications capability. Detecting excessive BER is
    much trickier. PPP LQM is designed to, but Cisco's implementation
    is not useful. ISIS allows configuring large hello packets, but the
    cost of the IOS upgrades required to get support makes it difficult
    to justify. The money is better spent on a good network management
    system which tracks and reports rising BER on a link.
     
    Vincent C Jones, Nov 6, 2005
    #14