Dog-slow 2621...any ideas why?

Discussion in 'Cisco' started by John Caruso, Apr 30, 2004.

  1. John Caruso

    John Caruso Guest

    We have a 2621 router that's receiving traffic on its fast0/0 interface
    and then putting it back out over a GRE tunnel (tunnel0) to another site
    (the GRE tunnel is encrypted with IPSEC as it passes through a perimeter
    PIX). The router is receiving a 2-3Mbps data stream on f0/0 and is then
    passing that same data stream out over tunnel0 (which is routed out
    again via fast0/0).

    With this relatively tiny traffic flow, the router's CPU is consistently
    pegged at 60% (?!?). Is that an expected utilization level for such
    limited throughput? Here's a representative sample of "show proc cpu
    sorted" output:

    CPU utilization for five seconds: 61%/17%; one minute: 59%; five minutes: 60%
    PID Runtime(ms) Invoked uSecs 5Sec 1Min 5Min TTY Process
    34 4948576 121764 40640 34.00% 33.30% 33.01% 0 IP Input
    5 1006912 82233 12244 8.47% 8.23% 8.24% 0 Pool Manager
    36 20204 3064 6593 1.83% 0.61% 0.99% 66 Virtual Exec
    3 5152 5953 865 0.07% 0.03% 0.03% 0 OSPF Hello
    1 0 2 0 0.00% 0.00% 0.00% 0 Chunk Manager
    2 336 2140 157 0.00% 0.00% 0.00% 0 Load Meter
    4 8992 1247 7210 0.00% 0.05% 0.05% 0 Check heaps
    6 0 1 0 0.00% 0.00% 0.00% 0 AAA_SERVER_DEADT
    7 0 2 0 0.00% 0.00% 0.00% 0 Timers
    8 0 2 0 0.00% 0.00% 0.00% 0 Serial Backgroun
    9 0 2 0 0.00% 0.00% 0.00% 0 AAA high-capacit

    (As a side question, why is the CPU total so much more than the total of
    the individual columns? This appears to be a consistent behavior.)

    When the throughput sometimes goes up to 4Mbps, the 2621 CPU is 100% busy.

    Here's the relevant config from the router, such as it is:

    ip cef
    interface Tunnel0
    bandwidth 100000
    ip address a.b.c.d
    tunnel source Loopback1
    tunnel destination e.f.g.h
    interface FastEthernet0/0
    ip address i.j.k.l
    no ip redirects
    no ip proxy-arp
    duplex auto
    speed auto
    no cdp enable

    As you can see, not much there, and no ACLs or anything else. And here's
    what the interface stats look like while the CPU is spiking:

    router#show int tunnel0
    Tunnel0 is up, line protocol is up
    Hardware is Tunnel
    MTU 1514 bytes, BW 100000 Kbit, DLY 500000 usec,
    reliability 255/255, txload 1/255, rxload 7/255
    Encapsulation TUNNEL, loopback not set
    Keepalive not set
    Tunnel protocol/transport GRE/IP, key disabled, sequencing disabled
    Tunnel TTL 255
    Checksumming of packets disabled, fast tunneling enabled
    Last input 00:00:00, output 00:00:00, output hang never
    Last clearing of "show interface" counters never
    Input queue: 8/75/0/0 (size/max/drops/flushes); Total output drops: 13
    Queueing strategy: fifo
    Output queue: 0/0 (size/max)
    5 minute input rate 2865000 bits/sec, 305 packets/sec
    5 minute output rate 142000 bits/sec, 154 packets/sec
    10426041 packets input, 80312218 bytes, 0 no buffer
    Received 0 broadcasts, 0 runts, 0 giants, 0 throttles
    0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored, 0 abort
    3815651 packets output, 360576285 bytes, 0 underruns
    0 output errors, 0 collisions, 0 interface resets
    0 output buffer failures, 0 output buffers swapped out

    router#show int f0/0
    FastEthernet0/0 is up, line protocol is up
    Hardware is AmdFE, address is
    MTU 1500 bytes, BW 100000 Kbit, DLY 100 usec,
    reliability 255/255, txload 7/255, rxload 7/255
    Encapsulation ARPA, loopback not set
    Keepalive set (10 sec)
    Full-duplex, 100Mb/s, 100BaseTX/FX
    ARP type: ARPA, ARP Timeout 04:00:00
    Last input 00:00:00, output 00:00:00, output hang never
    Last clearing of "show interface" counters never
    Input queue: 49/75/12723/0 (size/max/drops/flushes); Total output drops: 0
    Queueing strategy: fifo
    Output queue: 0/40 (size/max)
    5 minute input rate 3127000 bits/sec, 710 packets/sec
    5 minute output rate 3037000 bits/sec, 457 packets/sec
    22780914 packets input, 848212467 bytes
    Received 125 broadcasts, 0 runts, 0 giants, 0 throttles
    0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
    0 watchdog
    0 input packets with dribble condition detected
    14270991 packets output, 394991767 bytes, 0 underruns
    0 output errors, 0 collisions, 5 interface resets
    0 babbles, 0 late collision, 0 deferred
    0 lost carrier, 0 no carrier
    0 output buffer failures, 0 output buffers swapped out

    If anyone has any ideas or suggestions about what might be going on here
    or how to remedy it (beyond getting a faster router), please let me know.
    BTW, the router is running IOS 12.3(6a).

    - John
    John Caruso, Apr 30, 2004

  2. mh

    mh Guest

    check the output of "sh int stat" to see what switching path is being
    used on each interface
    mh, Apr 30, 2004

  3. John Caruso

    John Caruso Guest

    Mainly process switching:

    router# show int stat

    Switching path Pkts In Chars In Pkts Out Chars Out
    Processor 22623952 280167058 11364006 3920105756
    Route cache 8050498 922485732 7959718 982775134
    Total 30674450 1202652790 19323724 607913594
    Switching path Pkts In Chars In Pkts Out Chars Out
    Processor 11292688 4024970618 22117 3042325
    Route cache 2458419 361698763 5499181 568888751
    Total 13751107 91702085 5521298 571931076

    But CEF is enabled, as I showed in the config. I did also try
    "ip route-cache same-interface" earlier during a large transfer, since
    the packets are coming in fast0/0 and going back out on it as well
    (well, out the *physical* interface--logically, they're going back out
    on tunnel0), but it didn't help.
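
    Out of curiosity, the counters above can be turned into a share; a quick
    sketch (values copied from the first "show int stat" block, which I take
    to be fast0/0):

```python
# Share of input packets taking the process-switched path, using the
# counters from the first "show int stat" block quoted above (assumed
# to be fast0/0).
processor_pkts_in = 22623952   # "Processor" row, Pkts In
route_cache_pkts_in = 8050498  # "Route cache" row, Pkts In

total_pkts_in = processor_pkts_in + route_cache_pkts_in
share = processor_pkts_in / total_pkts_in
print(f"{share:.0%} of input packets were process switched")
```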

    Maybe the GRE tunnel is forcing the router to process switch, since all
    packets have to be encapsulated?

    Any suggestions for what to do, or is this just a natural result of this
    configuration?

    (As a point of comparison, this same traffic flow is going through another
    2621 router on its way to the outside world, and that router is only
    topping out at 15%...but it's just handing the packets from f0/0 to f0/1,
    and doing almost no process switching.)

    - John
    John Caruso, May 1, 2004
  4. mh

    mh Guest

    > Maybe the GRE tunnel is forcing the router to process switch, since all
    > packets have to be encapsulated?

    See if you can change the router to one having two Fast Ethernet
    interfaces to determine if that makes a difference.
    mh, May 1, 2004
  5. mh

    mh Guest

    BTW, what is the GRE tunnel being used for - to carry non-IP traffic
    or routing protocols?
    mh, May 1, 2004
  6. Terry Baranski

    Terry Baranski Guest

    That looks likely at this point, though CEF support for multipoint GRE
    tunnels was introduced fairly recently, so it would be odd if
    point-to-point tunnels weren't able to be CEF-switched. Try
    'show cef not-cef-switched' to see if the 'Unsupported' column is
    incrementing.

    Terry Baranski, May 1, 2004
  7. Matthew Melbourne

    Matthew Melbourne Guest

    I agree; from the config snippet it doesn't appear that mGRE tunnels are
    in use, just ordinary point-to-point GRE tunnels.

    I have a 3620 router which is performing the same function, and it does
    appear to be fast-switching packets which enter through one interface,
    are GRE-encapsulated, and are then sent out the same interface:

    tunnel-rtr#sh int stat

    Switching path Pkts In Chars In Pkts Out Chars Out
    Processor 49011 9393449 94980 9849685
    Route cache 262630 108017658 261848 111169096
    Total 311641 117411107 356828 121018781
    Switching path Pkts In Chars In Pkts Out Chars Out
    Processor 1293 134005 517 82135
    Route cache 126618 27312552 136012 84898793
    Total 127911 27446557 136529 84980928

    This router is running 11.3(11e), IP Only, and CEF is not available (first
    available in IOS 12.0). "Show ip interface ..." states that fast-switching
    is enabled for the FastEthernet0/0 and Tunnel0 interfaces.


    Matthew Melbourne, May 1, 2004
  8. AnyBody43

    AnyBody43 Guest

    As you have identified, the router is process switching
    the packets.

    "CPU utilization for five seconds: 61%/17%"
    means total load 61%, of which 17% is at interrupt level.
    Often this can be taken as 44% process switching and 17% fast
    switching.
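
    That split can be checked against the "show proc cpu" output in the
    first post; a small sketch of the arithmetic (figures copied from that
    output):

```python
# "show proc cpu" accounting: the 61%/17% total includes interrupt-level
# time (17%), while the per-process rows only cover process-level time.
# That is why the rows sum to well below the total.
total_cpu = 61.0        # five-second total
interrupt_level = 17.0  # after the slash: time spent at interrupt level
process_level = total_cpu - interrupt_level  # attributable to processes

# Non-zero per-process 5Sec figures from the first post:
per_process = [34.00, 8.47, 1.83, 0.07]
print(process_level, round(sum(per_process), 2))  # the two roughly agree
```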

    It may well be that tunnelled traffic is always process switched
    and in any case there will most likely be a big overhead
    for the tunnel. For example a new TCP checksum
    may need to be computed, buffers allocated, data copied
    between buffers . . .? I don't know the architecture of tunnels.
    There is no entry in the index of "Inside Cisco IOS Software
    Architecture" for "Tunnel", "GRE", or "Generic ... Encap" either.

    Maybe routers that have "Particle Buffers" do tunneling better?

    You will need to search the docs to see if there are any clues about
    whether tunnels can be fast switched or not.

    Another thing to consider is the possibility that the router
    is having to fragment a substantial proportion of the traffic.
    If that is the case then eliminating that operation could well
    make a big difference.
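
    A minimal sketch of that effect, assuming standard IPv4 fragmentation
    (fragment payloads in multiples of 8 bytes) and the usual 24 bytes of
    GRE-over-IP overhead:

```python
import math

# If a full-size 1500-byte packet enters the tunnel, the 24 bytes of
# GRE/IP overhead (20-byte outer IPv4 header + 4-byte GRE header) push
# the encapsulated packet past the 1500-byte physical MTU, so the outer
# packet has to be fragmented (and reassembled at the far end).
def fragment_count(packet_len, mtu, ip_header=20):
    payload = packet_len - ip_header
    per_fragment = (mtu - ip_header) // 8 * 8  # payload per fragment
    return math.ceil(payload / per_fragment)

GRE_OVERHEAD = 24
encapsulated = 1500 + GRE_OVERHEAD  # 1524 bytes on the wire
print(fragment_count(encapsulated, 1500))  # splits into 2 fragments
```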

    These things (routers) do not do magic, just be amazed that
    it can run at 100% CPU without crashing every 5 minutes:)

    If you need a bigger router then you need a bigger router.
    AnyBody43, May 1, 2004
  9. John Caruso

    John Caruso Guest

    Yeah, agreed (and thanks for the link).
    Nope; the only non-zero columns are "No_adj" and "Receive" (39646 and
    26591106 packets, respectively). And for the "Receive" column, the Cisco
    docs say:

    Indicates the number of packets ultimately destined to the router, or
    packets destined to a tunnel endpoint on the router. If the decapsulated
    tunnel packet is IP, the packet is CEF switched. Otherwise, packets are
    process switched.

    However, *all* of the packets in question on this GRE tunnel are IP (and
    more specifically TCP--part of a snapmirror transfer over TCP between two
    Netapp filers). Yet clearly they're not being CEF switched.

    I'm starting to think I may need to refer this issue to the TAC, since it
    seems clear that the bulk of these packets should not be process switched.
    Perhaps this is a bug in IOS 12.3(6a).

    - John
    John Caruso, May 3, 2004
  10. John Caruso

    John Caruso Guest

    That's correct (the router does have two other tunnels, but they're both
    point-to-point also, and neither of them are involved in the transfer
    that's causing the CPU spike).
    Very interesting. As a test, I decided to try turning off CEF entirely
    so that the interfaces would perform fast switching, as in your environment
    (verified with "show ip int"). Now the CPU hit was 100% instead of
    just 60%. So CEF is indeed helping, although it's not handling nearly as
    many packets as it should be handling, based on the Cisco documentation.

    If anyone has any other ideas or suggestions for configurations to try, I'd
    be interested to hear them.

    - John
    John Caruso, May 3, 2004
  11. Matthew Melbourne

    Matthew Melbourne Guest

    I wonder whether this is Cisco bug CSCdy55588, "Packets are not
    fast-switched with GRE", first reported in 12.3(0.5).

    Are you able to try an alternative, such as the latest 12.2 GD?


    Matthew Melbourne, May 3, 2004
  12. John Caruso

    John Caruso Guest

    Yes, this was it. I'd checked for fragmentation last week and I'd swear
    there wasn't very much...but clearly I was wrong, because when I checked
    again today, fragmentation was running at about 22%. And not only that,
    but the 2621 is on the reassembly end of these fragments (which is the
    more CPU-intensive half of the operation).

    The fix was to lower the GRE tunnel MTU to 1476 bytes. What I'm not clear
    on is why the tunnel MTU wasn't set to 1476 in the first place, though.
    Various Cisco docs claim that the default *should* be equal to the MTU of
    the output interface minus the 24 bytes of GRE overhead.

    But clearly this wasn't the case, since when I hardcoded "ip mtu 1476"
    into the tunnel configuration, that fixed the problem. Also, oddly, a
    show interface on this tunnel interface (or any of the other tunnel
    interfaces on this router) ALWAYS shows an MTU size of 1514 bytes:

    Tunnel0 is up, line protocol is up
    Hardware is Tunnel
    Internet address is a.b.c.d/e
    MTU 1514 bytes, BW 100000 Kbit, DLY 500000 usec,

    This is with the MTU set to 1476, but the displayed value always stays at
    1514 bytes no matter what MTU size I set in the configuration. Weird.
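
    For reference, the arithmetic behind 1476, assuming a plain GRE tunnel
    (no key, sequencing, or checksum options) over a 1500-byte Ethernet MTU:

```python
# GRE encapsulation adds a 20-byte outer IPv4 header and a 4-byte GRE
# header, so the inner (tunnel) IP MTU must shrink by 24 bytes to keep
# the outer packet within the physical MTU.
PHYSICAL_MTU = 1500
OUTER_IP_HEADER = 20
GRE_HEADER = 4  # basic GRE; options like checksums or keys add more

tunnel_ip_mtu = PHYSICAL_MTU - OUTER_IP_HEADER - GRE_HEADER
print(tunnel_ip_mtu)  # 1476, the value set with "ip mtu 1476"
```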

    BTW, trying "ip tcp adjust-mss 1436" as suggested in that same document
    was a complete bust (surprisingly), even though this was a TCP transfer;
    with this as the only change, the router was again pegged at 100%. My
    only guess is that the packet inspection required by this command forces
    the router to process switch.
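
    For what it's worth, the 1436 figure follows the same arithmetic,
    assuming 20-byte inner IP and TCP headers with no options:

```python
# The advertised MSS must leave room for the inner IP and TCP headers
# within the 1476-byte tunnel IP MTU.
TUNNEL_IP_MTU = 1476
INNER_IP_HEADER = 20
TCP_HEADER = 20  # no TCP options assumed

mss = TUNNEL_IP_MTU - INNER_IP_HEADER - TCP_HEADER
print(mss)  # 1436, as in "ip tcp adjust-mss 1436"
```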

    Thanks to everyone for the assistance!

    - John
    John Caruso, May 4, 2004
  13. mh

    mh Guest

    so what is your average CPU utilization now?
    mh, May 4, 2004
  14. John Caruso

    John Caruso Guest

    About 5-7% during the snapmirror transfers. I'm now seeing a large rsync
    over the same path that's hitting 20% (formerly masked by the snapmirror
    transfers); my guess is that the initiating host isn't doing PMTUD, or
    isn't doing it correctly.

    I may file bug reports with the TAC regarding the fact that the GRE MTU
    wasn't already set to 1476 (as per various docs), and the fact that "show
    int" always shows an MTU of 1514 for any GRE tunnels.

    - John
    John Caruso, May 6, 2004
  15. AnyBody43

    AnyBody43 Guest

    Thanks for posting the outcome.

    MTU over any kind of tunnels is something that worries me a lot.

    Maybe one day I will understand it all.
    AnyBody43, May 6, 2004
  16. Captain

    Captain Guest

    How do you check for packet fragmentation?
    (I may have a similar mtu issue!)
    Captain, May 12, 2004
  17. mh

    mh Guest

    Use the command "show ip traffic"

    See the Frags: counters under IP statistics
    mh, May 13, 2004
  18. Hansang Bae

    Hansang Bae Guest

    sho ip traffic

    Or you can ping with "don't fragment" bit set and see when it breaks.
    Don't forget to account for the 8 byte ICMP header.
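
    The size arithmetic for such a probe, assuming IPv4 and a ping that
    takes the ICMP payload size as its argument (implementations differ on
    what "size" means):

```python
# Largest don't-fragment ping payload that fits a given path MTU: the
# 20-byte IP header and 8-byte ICMP header must fit inside the MTU too.
def max_ping_payload(path_mtu, ip_header=20, icmp_header=8):
    return path_mtu - ip_header - icmp_header

print(max_ping_payload(1500))  # 1472: largest DF ping over plain Ethernet
print(max_ping_payload(1476))  # 1448: largest DF ping through the GRE tunnel
```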



    "Somehow I imagined this experience would be more rewarding" Calvin
    *************** USE ROT13 TO SEE MY EMAIL ADDRESS ****************
    Due to the volume of email that I receive, I may not be able to
    reply to emails sent to my account. Please post a followup instead.
    Hansang Bae, May 13, 2004
  19. Captain

    Captain Guest

    I get the following:
    Router>sh ip traffic
    IP statistics:
    Rcvd: 767573726 total, 20914467 local destination
    1 format errors, 118 checksum errors, 414776 bad hop count
    0 unknown protocol, 0 not a gateway
    0 security failures, 0 bad options, 435 with options
    Opts: 0 end, 0 nop, 0 basic security, 0 loose source route
    0 timestamp, 0 extended security, 0 record route
    0 stream ID, 0 strict source route, 435 alert, 0 cipso
    0 other
    Frags: 7439848 reassembled, 4 timeouts, 0 couldn't reassemble
    34827678 fragmented, 0 couldn't fragment
    Bcast: 29686 received, 13473 sent
    Mcast: 0 received, 0 sent
    Sent: 16477883 generated, 446310396 forwarded
    Drop: 2644 encapsulation failed, 199 unresolved, 0 no adjacency
    12884 no route, 0 unicast RPF, 0 forced drop

    The fragmented count looks high; is this a problem?

    All my other routers show 0.
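
    One rough way to read those counters (I'm not certain whether
    "fragmented" counts packets or the fragments produced, so treat this as
    a ballpark):

```python
# Rough share of sent traffic involving fragmentation, from the
# "show ip traffic" counters quoted above.
fragmented = 34827678
generated = 16477883   # locally generated packets sent
forwarded = 446310396  # transit packets sent

share = fragmented / (generated + forwarded)
print(f"roughly {share:.1%} of sent packets involved fragmentation")
```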
    Captain, May 13, 2004
  20. Ivan Ostres

    Ivan Ostres Guest

    Fragmented and reassembled both look very high. It is important to note
    that both operations are CPU- and memory-intensive, so that might be a
    reason for high CPU and memory utilisation.

    It would be good to find out why there is so much
    fragmentation/reassembly going on.

    Ivan Ostres, May 13, 2004
