C6500 High Interrupt Load caused by ARP

Discussion in 'Cisco' started by Holger Amberg, May 10, 2006.

  1. Hello,

    I have a Cisco 6509 with a Sup720 running IOS (tm) s72033_rp Software
    (s72033_rp-IPSERVICESK9_WAN-M), Version 12.2(18)SXE4. Over the last 5 days
    I have noticed a strange problem: the CPU load has risen from 5-10% to
    50-90% at interrupt level, although the traffic hasn't changed much. If I
    run clear arp-cache, the load immediately drops back to 5-10% for a short
    time (max. 1 hour). Once the ARP table grows above 8000 entries, the load
    rises to 90% again. What could be the reason for that? As far as I know,
    the Cisco 6500 can handle more than 64k ARP entries.

    c6500#sh ip arp sum
    9136 IP ARP entries, with 532 of them incomplete

    If I change the ARP timeout to, for example, 60 seconds, everything works fine.

    All port cards are equipped with DFC modules. The Sup720 is equipped
    with a PFC3B card.

    Hopefully one of you has a hint or a solution for me. Thank you in
    advance. If you need additional information, please let me know.

    Best regards,

    Holger Amberg
     
    Holger Amberg, May 10, 2006
    #1

  2. Holger Amberg

    Merv Guest

    Your 6500 could be the victim of an ARP DOS attack.

    Because ARPs are broadcast traffic, they are very easy to capture with
    a sniffer (Ethereal) to see whether a high volume of ARPs is being
    sourced from the same MAC address.

    What is particularly difficult to track down are DoS programs that change
    the source MAC address.
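
    As a rough illustration of the per-source tally described above, the
    following Python sketch counts ARP requests by sender MAC. It assumes
    the frames are already available as raw bytes (for example exported
    from a sniffer) and are untagged Ethernet II. The names parse_arp,
    count_arp_requests and make_arp_request are made up for this example
    and do not come from any tool.

```python
import struct
from collections import Counter

ETHERTYPE_ARP = 0x0806
ARP_REQUEST = 1


def parse_arp(frame: bytes):
    """Parse an untagged Ethernet II frame carrying IPv4 ARP.

    Returns (opcode, sender_mac, sender_ip, target_ip), or None when the
    frame is not an IPv4-over-Ethernet ARP packet.
    """
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP payload
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETHERTYPE_ARP:
        return None
    htype, ptype, hlen, plen, opcode = struct.unpack("!HHBBH", frame[14:22])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None
    sender_mac = frame[22:28].hex(":")
    sender_ip = ".".join(str(b) for b in frame[28:32])
    target_ip = ".".join(str(b) for b in frame[38:42])
    return opcode, sender_mac, sender_ip, target_ip


def count_arp_requests(frames):
    """Count ARP requests per sender MAC address."""
    counts = Counter()
    for frame in frames:
        parsed = parse_arp(frame)
        if parsed and parsed[0] == ARP_REQUEST:
            counts[parsed[1]] += 1
    return counts


def make_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Build a synthetic broadcast ARP request (only for trying things out)."""
    mac = bytes.fromhex(sender_mac.replace(":", ""))
    spa = bytes(int(p) for p in sender_ip.split("."))
    tpa = bytes(int(p) for p in target_ip.split("."))
    ether = b"\xff" * 6 + mac + struct.pack("!H", ETHERTYPE_ARP)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, ARP_REQUEST)
    return ether + arp + mac + spa + b"\x00" * 6 + tpa


if __name__ == "__main__":
    # Synthetic frames with documentation addresses, not real capture data.
    frames = [
        make_arp_request("00:00:5e:00:53:01", "192.0.2.1", "192.0.2.10"),
        make_arp_request("00:00:5e:00:53:01", "192.0.2.1", "192.0.2.11"),
        make_arp_request("00:00:5e:00:53:02", "192.0.2.20", "192.0.2.1"),
    ]
    for mac, n in count_arp_requests(frames).most_common():
        print(mac, n)
```

    A single MAC dominating the tally points at one noisy host. A tool
    that randomises its source MAC would show up instead as a long tail
    of addresses with one or two requests each.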
     
    Merv, May 10, 2006
    #2

  3. Hello,
    Thanks for your reply. Unfortunately, this doesn't seem to be the reason.
    Most of the ARP requests (around 90%) are sourced by the 6500's VLAN
    interface. The remaining requests are sourced by some servers in the
    datacenter, but there are very few of them.

    Best regards,

    Holger Amberg
     
    Holger Amberg, May 10, 2006
    #3
  4. Holger Amberg

    anybody43 Guest

    i have a Cisco 6509 with Sup720 running IOS (tm) s72033_rp Software
    At first sight this appears to be unrelated to ARP. The only thing that
    the device does at Interrupt Level (as I understand it) is "Fast Switch"
    packets.

    I think that normally a 6500 SE 720 does CEF switching (routing, that
    is) in hardware; however, some packets cannot be hardware switched, and
    in that case the packets are passed to the Processor to be routed.

    I would proceed as for CPU load troubleshooting on a traditional
    router.

    sh int switching
    sh int statistics

    Do you have 8000 nodes in your network?

    As Merv says consider a DOS attack.

    I am not saying that the issue is nothing to do with the
    8000 ARP entries just that I doubt that the traffic that is
    causing the high CPU is ARP traffic.

    Could the large number of ARP entries be related to proxy ARP?


    Post more detail. ARP tables, what are your address ranges?

    I wonder if you can find some way to examine the software
    routing process's routing-cache?


    e.g. Can you turn off software CEF but leave hardware CEF
    on and then do
    sh ip route-cache

    This would be an out of hours job 'cos it could all melt.
     
    anybody43, May 11, 2006
    #4
  5. Holger Amberg

    Merv Guest

    The 6500 will ARP for packets it is trying to deliver onto a VLAN
    segment.

    An IP scanner running through your subnets would generate that type of
    traffic and result in a large number of incompletes if your IP subnets
    are sparsely populated.
     
    Merv, May 11, 2006
    #5
  6. Holger Amberg

    jay Guest

    Hi,

    What does 'show int stats' show ?

    #show int stats
    Vlan1
    Switching path        Pkts In   Chars In   Pkts Out  Chars Out
    Processor            24483518 1744541863   13997696 1450102892
    Route cache               273      44304     770707  173521523
    Distributed cache           0          0          0          0
    Total                24483791 1744586167   14768403 1623624415
    Vlan2
    Switching path        Pkts In   Chars In   Pkts Out  Chars Out
    Processor            69662642  457846948   30465060  435446238
    Route cache          78161980 1497793252  100308501 1929661234
    Distributed cache           0          0          0          0
    Total               147824622 1955640200  130773561 2365107472
    Vlan3
    Switching path        Pkts In   Chars In   Pkts Out  Chars Out
    Processor             6879745  358345705      22815    2486835
    Route cache                 0          0          0          0
    Distributed cache           0          0          0          0
    Total                 6879745  358345705      22815    2486835
    Vlan4
    Switching path        Pkts In   Chars In   Pkts Out  Chars Out
    Processor             6879166  358286507      22815    2486835
    Route cache                 0          0          0          0
    Distributed cache           0          0          0          0
    Total                 6879166  358286507      22815    2486835
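
    To read a table like the one above at a glance, a small Python sketch
    can compute, per interface, the fraction of inbound packets counted on
    the Processor path. It assumes the output has been pasted into a
    string, that an interface name is the only token on its line, and that
    each counter row ends in four integers. processor_share is a made-up
    name, and the counters are cumulative since they were last cleared, so
    the result is only a long-term average.

```python
def processor_share(stats_text: str) -> dict:
    """Return {interface: share of inbound packets on the Processor path}."""
    per_iface = {}
    current = None
    for line in stats_text.splitlines():
        tokens = line.split()
        if len(tokens) == 1:
            # Assumption: a lone token on a line is an interface name.
            current = per_iface.setdefault(tokens[0], {})
        elif (
            current is not None
            and len(tokens) >= 5
            and all(t.isdigit() for t in tokens[-4:])
        ):
            # Row label is everything before the four counters;
            # the first counter is "Pkts In".
            current[" ".join(tokens[:-4])] = int(tokens[-4])
    return {
        name: rows["Processor"] / rows["Total"]
        for name, rows in per_iface.items()
        if rows.get("Total") and "Processor" in rows
    }


if __name__ == "__main__":
    sample = """
    Vlan2
    Switching path Pkts In Chars In Pkts Out Chars Out
    Processor 69662642 457846948 30465060 435446238
    Route cache 78161980 1497793252 100308501 1929661234
    Distributed cache 0 0 0 0
    Total 147824622 1955640200 130773561 2365107472
    Vlan3
    Switching path Pkts In Chars In Pkts Out Chars Out
    Processor 6879745 358345705 22815 2486835
    Route cache 0 0 0 0
    Distributed cache 0 0 0 0
    Total 6879745 358345705 22815 2486835
    """
    for name, share in processor_share(sample).items():
        print(f"{name}: {share:.1%} of inbound packets on the Processor path")
```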

    Is it actually ARP input ?

    #show processes cpu | ex 0.00%
    CPU utilization for five seconds: 0%/0%; one minute: 1%; five minutes: 0%
    PID Runtime(ms)   Invoked  uSecs   5Sec   1Min   5Min TTY Process
      9     9879256  29456820    335  0.08%  0.03%  0.05%   0 ARP Input
     31    19741468 442585601     44  0.24%  0.25%  0.16%   0 IP Input
     
    jay, May 11, 2006
    #6
  7. Holger Amberg

    Merv Guest

    clear interface counters, wait 5 minutes, then capture and
    post the output for "sh int acc" for vlan 1 thru vlan 4 interfaces
     
    Merv, May 11, 2006
    #7
  8. Holger Amberg

    Merv Guest

    The other thing you might consider doing is enabling NetFlow accounting
    on each of the VLAN interfaces so that you can see the source of the
    packets being sent to non-existent destination IP addresses in each of
    the VLAN IP subnets.
     
    Merv, May 12, 2006
    #8
  9. Hi,

    Below is the requested data:

    c6500#sh int vlan1 acc
    Vlan1 VLAN1
    Protocol         Pkts In        Chars In        Pkts Out       Chars Out
    Other              42986         4318002               0               0
    IP           67839362074  32993319263061     67276120863  32858452676179
    DEC MOP              330           25410             330           42570
    ARP              5293273       318627286        37479315      4197683280


    c6500#show processes cpu | ex 0.00%
    CPU utilization for five seconds: 54%/50%; one minute: 57%; five minutes: 58%
    PID Runtime(ms)   Invoked  uSecs   5Sec   1Min   5Min TTY Process
      8     3848836   3499370   1099  0.71%  1.17%  1.02%   0 ARP Input
    118     4563888  11286276    404  1.43%  1.87%  2.01%   0 IP Input
    159      826800     40730  20299  0.31%  0.42%  0.42%   0 Adj Manager
    164      813396     81268  10008  0.79%  0.36%  0.32%   0 IPC LC Message H
    167      339552    353729    959  0.31%  0.26%  0.24%   0 CEF process
    245      183132    364740    502  0.15%  0.04%  0.01%   0 RPC c6k_rp_envir
    262     1078256   7232968    149  0.71%  0.33%  0.34%   0 Port manager per
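
    For readers unfamiliar with the "54%/50%" notation in the summary
    line: the first figure is total CPU utilization and the second is the
    part spent at interrupt level, so the difference is what the scheduled
    processes use. A minimal Python sketch of that arithmetic follows;
    split_cpu is a made-up helper name, and the input is assumed to be the
    summary line copied from show processes cpu.

```python
import re

# Matches the "five seconds: <total>%/<interrupt>%" part of the summary line.
CPU_LINE = re.compile(r"CPU utilization for five seconds:\s*(\d+)%/(\d+)%")


def split_cpu(line: str) -> dict:
    """Split the five-second figure into total, interrupt and process level."""
    match = CPU_LINE.search(line)
    if match is None:
        raise ValueError("not a 'show processes cpu' summary line")
    total, interrupt = int(match.group(1)), int(match.group(2))
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}


if __name__ == "__main__":
    line = "CPU utilization for five seconds: 54%/50%; one minute: 57%; five minutes: 58%"
    print(split_cpu(line))
```

    With the output above this leaves about 4% at process level, which
    agrees with the 5Sec column: the listed processes add up to roughly
    4.4%. ARP Input and IP Input are therefore not where the CPU time is
    going.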

    We have about 2000 servers with around 8000 assigned IP addresses.

    c6500#sh ip arp sum
    7991 IP ARP entries, with 775 of them incomplete

    If I clear the ARP cache, the CPU load falls to 5-10% for a
    short time. I have no real idea why. I've scanned the network for
    attacks, but was unable to find anything suspicious. Proxy ARP is
    disabled on all interfaces.

    Best regards,

    Holger Amberg
     
    Holger Amberg, May 13, 2006
    #9
  10. Holger Amberg

    Merv Guest

    Put a sniffer on a port that is in VLAN1 and capture ARP requests and
    replies to see if there is any pattern to the ARP requests.

    If, for example, the destination address increments in each ARP
    request, then those requests would be considered very suspicious.

    Also, is the encapsulation failure counter seen in show ip traffic
    rapidly incrementing?
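
    The incrementing-destination pattern mentioned above can be checked
    mechanically once the target IPs of the captured ARP requests are
    listed in capture order. The Python sketch below is only an
    illustration: longest_incrementing_run is a made-up name, the input is
    assumed to be a list of dotted-quad strings, and what run length
    counts as suspicious is left to the reader.

```python
import ipaddress


def longest_incrementing_run(targets) -> int:
    """Length of the longest run in which each target is the previous one + 1."""
    best = 0
    run = 0
    prev = None
    for text in targets:
        addr = int(ipaddress.IPv4Address(text))
        if prev is not None and addr == prev + 1:
            run += 1
        else:
            run = 1  # start a new run at this address
        best = max(best, run)
        prev = addr
    return best


if __name__ == "__main__":
    # Documentation addresses, not data from the thread.
    targets = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.9", "192.0.2.10"]
    print(longest_incrementing_run(targets))
```

    A sweep through a sparsely populated subnet would give long runs,
    while ordinary server traffic should mostly give runs of length one
    or two.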
     
    Merv, May 13, 2006
    #10
  11. Holger Amberg

    Merv Guest

    Please post the output for "show int vlan 1" so we can see how many
    packets are being switched at layer 2 and how many at layer 3
     
    Merv, May 13, 2006
    #11
  12. Holger Amberg

    Merv Guest

    Until you find the cause of your issue, you might want to disable ICMP
    unreachable on the VLAN interfaces
     
    Merv, May 13, 2006
    #12
  13. Holger Amberg

    Merv Guest

    It may be that the 6500 continues to try to resolve incomplete entries.
    If that is the case then that would explain the lower CPU usage after
    you clear the ARP cache.

    If this is the case then large number of incompletes would be quite
    bad...
     
    Merv, May 13, 2006
    #13
  14. Hello,
    Attached is the full output:

    c6500#sh int vlan1
    Vlan1 is up, line protocol is up
    Hardware is EtherSVI, address is 0015.2ccb.3a00 (bia 0015.2ccb.3a00)
    Description: VLAN1
    Internet address is XXX.XX.XXX.XXX/24
    MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec,
    reliability 255/255, txload 29/255, rxload 29/255
    Encapsulation ARPA, loopback not set
    ARP type: ARPA, ARP Timeout 01:00:00
    Last input 00:00:00, output 00:00:00, output hang never
    Last clearing of "show interface" counters never
    Input queue: 1/4096/342883/342883 (size/max/drops/flushes); Total output drops: 0
    Queueing strategy: fifo
    Output queue: 0/4096 (size/max)
    5 minute input rate 1143780000 bits/sec, 322918 packets/sec
    5 minute output rate 1148701000 bits/sec, 324610 packets/sec
    L2 Switched: ucast: 5032784627 pkt, 3875091409210 bytes - mcast: 2690022 pkt, 334426979 bytes
    L3 in Switched: ucast: 117782279980 pkt, 57251998773550 bytes - mcast: 0 pkt, 0 bytes mcast
    L3 out Switched: ucast: 117777279685 pkt, 57260176366988 bytes mcast: 0 pkt, 0 bytes
    120346959816 packets input, 59364744882436 bytes, 0 no buffer
    Received 2259513 broadcasts (367382 IP multicast)
    0 runts, 0 giants, 0 throttles
    0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
    119435831020 packets output, 59176291086168 bytes, 0 underruns
    0 output errors, 0 interface resets
    0 output buffer failures, 0 output buffers swapped out

    Today i found some new log entries:

    May 14 22:50:53: %EARL_L3_ASIC-SP-3-INTR_WARN: EARL L3 ASIC: Non-fatal interrupt Packet Parser block interrupt
    May 14 22:51:09: %MLS_STAT-SP-4-IP_CSUM_ERR: IP checksum errors
    May 14 22:51:53: %EARL_L3_ASIC-SP-3-INTR_WARN: EARL L3 ASIC: Non-fatal interrupt Packet Parser block interrupt
    May 14 22:52:54: %EARL_L3_ASIC-SP-3-INTR_WARN: EARL L3 ASIC: Non-fatal interrupt Packet Parser block interrupt
    May 14 22:53:56: %EARL_L3_ASIC-SP-3-INTR_WARN: EARL L3 ASIC: Non-fatal interrupt Packet Parser block interrupt

    A Google search doesn't give me any useful information about these error
    messages. Maybe the large number of incompletes is the problem, but then
    it would be very hard to solve (except by using static ARP entries?).

    The unreachables option is already set:

    no ip redirects
    no ip unreachables
    no ip proxy-arp


    Best regards,

    Holger Amberg
     
    Holger Amberg, May 15, 2006
    #14
  15. Holger Amberg

    Merv Guest

    By the looks of the L2 and L3 counter values, most of your traffic is
    being handled by the CPU. It would be useful if you cleared the counters
    first and then reposted the result.

    Do the ARP incompletes occur across all of the VLANs or just a few?

    In order to find the source IP address of the packets that are resulting
    in ARP incompletes, you could define an extended access list that permits
    some of the VLAN1 destination IP addresses with incomplete entries and
    then permits any any. You can then use the ACL match counters to tell
    you whether a particular VLAN is the source of the traffic. This ACL would
    be applied on all or some of the VLAN interfaces other than VLAN1.

    Something like:

    ip access-list extended ARP_SOURCE
    permit ip any host x.x.x.x ! where x.x.x.x is one of the incomplete ARP entries
    permit ip any any
    exit

    show access-list ARP_SOURCE ! check match counters
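
    With several hundred incomplete entries, typing one permit line per
    host gets tedious. As a sketch only, the Python helper below renders
    the same ACL shape as above from a list of addresses. The function
    name build_arp_source_acl and the sample addresses are assumptions for
    illustration, and the generated lines should be reviewed before being
    pasted into a production switch.

```python
import ipaddress


def build_arp_source_acl(name: str, incomplete_hosts) -> list:
    """Render an extended ACL with one 'permit ... host' line per address.

    The catch-all 'permit ip any any' is kept last so the ACL only counts
    matches and does not drop any traffic.
    """
    lines = [f"ip access-list extended {name}"]
    for host in incomplete_hosts:
        # IPv4Address raises ValueError on malformed input, so a typo
        # never reaches the generated configuration.
        addr = ipaddress.IPv4Address(host)
        lines.append(f" permit ip any host {addr}")
    lines.append(" permit ip any any")
    return lines


if __name__ == "__main__":
    # Hypothetical addresses standing in for incomplete ARP entries.
    for line in build_arp_source_acl("ARP_SOURCE", ["192.0.2.10", "192.0.2.77"]):
        print(line)
```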
     
    Merv, May 15, 2006
    #15
  16. Holger Amberg

    Merv Guest

    Also what does "show mls statistics" display ?
     
    Merv, May 15, 2006
    #16
  17. Holger Amberg

    anybody43 Guest

    c6500#sh int vlan1
    <snipped for brevity>

    Merv said:-
    Merv,

    I am not up to date with the sup720 architecture.
    Would you mind explaining the above please?

    Does the SUP720 use MLS?

    I had assumed that it was like the 4500 SUP 4/5 which
    I think of as "Hardware CEF". There is no MLS,
    no "first packet" just wire rate forwarding on loads of ports:)
    at layer 3 (or 2 clearly).

    Also:-
    Back to the OP's issue.

    I am having trouble with the idea that the root cause of this
    is ARP related. Sure it may be a symptom, of some
    inappropriate network or end station behaviour but as I see it
    the 50% Interrupt Level CPU must be a consequence of
    switching actual traffic and not anything directly
    related to ARP packets being processed by the box.

    I agree that it is a good idea to track down the sources of the
    incomplete ARP entries.

    I have:-

    sw1#sh ip arp sum
    414 IP ARP entries, with 3 of them incomplete

    sw2#sh ip arp sum
    421 IP ARP entries, with 5 of them incomplete

    1% vs the OP's 10%.
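
    The percentages quoted above follow directly from the sh ip arp sum
    lines. As a quick check, this Python sketch parses such a line and
    computes the incomplete share; incomplete_ratio is a made-up helper
    name, and the line format is taken from the outputs posted in this
    thread.

```python
import re

# Format as seen in the thread: "<N> IP ARP entries, with <M> of them incomplete"
ARP_SUMMARY = re.compile(r"(\d+) IP ARP entries, with (\d+) of them incomplete")


def incomplete_ratio(summary_line: str) -> float:
    """Return incomplete entries as a fraction of all ARP entries."""
    match = ARP_SUMMARY.search(summary_line)
    if match is None:
        raise ValueError("unrecognised ARP summary line")
    total, incomplete = int(match.group(1)), int(match.group(2))
    return incomplete / total if total else 0.0


if __name__ == "__main__":
    for line in (
        "414 IP ARP entries, with 3 of them incomplete",
        "421 IP ARP entries, with 5 of them incomplete",
        "7991 IP ARP entries, with 775 of them incomplete",
    ):
        print(f"{incomplete_ratio(line):.1%}  {line}")
```

    That gives roughly 0.7% and 1.2% for sw1 and sw2, against about 9.7%
    for the OP's latest output.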


    Thanks.
     
    anybody43, May 15, 2006
    #17
  18. Holger Amberg

    Merv Guest

    Merv, May 15, 2006
    #18
  19. Hello,
    Attached are the statistics after clearing the counters:

    c6500#sh int vla1
    Vlan1 is up, line protocol is up
    Hardware is EtherSVI, address is 0015.2ccb.3a00 (bia 0015.2ccb.3a00)
    Description: VLAN1
    Internet address is 193.22.254.200/24
    MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec,
    reliability 255/255, txload 30/255, rxload 31/255
    Encapsulation ARPA, loopback not set
    ARP type: ARPA, ARP Timeout 01:00:00
    Last input 00:00:00, output 00:00:00, output hang never
    Last clearing of "show interface" counters 00:05:01
    Input queue: 3/4096/0/0 (size/max/drops/flushes); Total output drops: 0
    Queueing strategy: fifo
    Output queue: 0/4096 (size/max)
    5 minute input rate 1224177000 bits/sec, 251822 packets/sec
    5 minute output rate 1202331000 bits/sec, 246864 packets/sec
    L2 Switched: ucast: 17856288 pkt, 19350134534 bytes - mcast: 2057 pkt, 251767 bytes
    L3 in Switched: ucast: 60307951 pkt, 28623964763 bytes - mcast: 0 pkt, 0 bytes mcast
    L3 out Switched: ucast: 60287765 pkt, 28625838855 bytes mcast: 0 pkt, 0 bytes
    76192988 packets input, 46145818370 bytes, 0 no buffer
    Received 1685 broadcasts (499185 IP multicast)
    0 runts, 0 giants, 0 throttles
    0 input errors, 0 CRC, 0 frame, 0 overrun, 0 ignored
    77917013 packets output, 46823199043 bytes, 0 underruns
    0 output errors, 0 interface resets
    0 output buffer failures, 0 output buffers swapped out

    At the moment we have only the default vlan1.


    c6500#show mls statistics

    Statistics for Earl in Module 5

    L2 Forwarding Engine
    Total packets Switched : 167069132363

    L3 Forwarding Engine
    Total packets L3 Switched : 167068616897 @ 328278 pps

    Total Packets Bridged : 19461984156
    Total Packets FIB Switched : 147452107429
    Total Packets ACL Routed : 0
    Total Packets Netflow Switched : 0
    Total Mcast Packets Switched/Routed : 26251871
    Total ip packets with TOS changed : 8256444465
    Total ip packets with COS changed : 978
    Total non ip packets COS changed : 0
    Total packets dropped by ACL : 2049762
    Total packets dropped by Policing : 12213434

    Errors
    MAC/IP length inconsistencies : 0
    Short IP packets received : 0
    IP header checksum errors : 304618
    TTL failures : 1914145
    MTU failures : 0

    Total packets L3 Switched by all Modules: 167068616897 @ 328278 pps

    Best regards,

    Holger Amberg
     
    Holger Amberg, May 16, 2006
    #19
  20. Holger Amberg

    Merv Guest


    I believe you said the ARPs were being sourced from VLAN1, which implies
    that there is more than one VLAN.


    What else does this switch connect to, i.e. what is the network topology?
     
    Merv, May 16, 2006
    #20