redundant switches / redundant server NICs

Discussion in 'Cisco' started by Stuart Kendrick, Aug 9, 2004.

  1. hi folks,

    i'm analyzing the interaction between Catalyst 4000s and servers
    configured with redundant NICs (Intel's TEAMing software).

    we install two C4K in each server room, and cable the "a" NIC of each
    server to the "a" ethernet switch, and the "b" NIC of each server to
    the "b" ethernet switch.

    when we unplug one of these cables, a continuous ping to the target
    typically shows 0-1 missed pings ... and syslog on the server shows
    the kernel detecting the loss of link, disengaging the primary NIC,
    and activating the standby NIC. when we plug the cable back in, the
    kernel detects this event and reverses the procedure. and we're back
    to where we started. cool.

    when we reboot a switch, we see the same behavior ... but about 30
    seconds after the reboot, the C4K brings link up on its ports ... the
    kernel obligingly changes its view of active and standby ... and the
    server has just isolated itself, but the C4K is in no position to
    forward traffic. in fact, it won't be ready to forward traffic for
    another couple minutes. (near the end of that time, it will take link
    down for ~30 seconds, before bringing it up just prior to going fully

    i have a TAC case open on this -- the engineer says that the C4K
    raises link in order to perform hardware level testing on the port ...
    part of the power-on diagnostics ... if the testing fails, then the
    Sup card will log an error message. this is good stuff.

    however, in the meantime, the server sees link up, thinks it can use
    that NIC ... and forwards packets into oblivion.
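
The sequence described above can be sketched as a small timeline simulation. This is only an illustrative model, not anything from the TAC case: the constant names and the exact timings are assumptions chosen to match the rough figures in the post (link raised about 30 seconds into the reboot, forwarding a couple of minutes later, a ~30 second link drop near the end), and the model treats the final link-up and the start of forwarding as simultaneous.

```python
# Hypothetical timings, in seconds after the "a" switch starts rebooting.
DIAG_LINK_UP = 30       # link raised for power-on diagnostics (~30 s per the post)
FINAL_LINK_DROP = 120   # assumed: link taken down again near the end of boot
FORWARDING_READY = 150  # assumed: link back up and switch finally forwarding


def link_up_a(t):
    """True if the rebooting switch presents link to the server at time t."""
    return DIAG_LINK_UP <= t < FINAL_LINK_DROP or t >= FORWARDING_READY


def forwarding_a(t):
    """True if the rebooting switch can actually forward traffic at time t."""
    return t >= FORWARDING_READY


def lost_seconds(failback, horizon=300):
    """Count seconds in which the server transmits into a non-forwarding switch.

    The "b" switch is assumed healthy throughout. The server starts on NIC "b"
    because it already failed over when switch "a" went down.
    """
    active = "b"
    lost = 0
    for t in range(horizon):
        if active == "a" and not link_up_a(t):
            active = "b"            # link lost: fail over to the standby NIC
        elif active == "b" and failback and link_up_a(t):
            active = "a"            # link seen on preferred NIC: fail back
        if active == "a" and not forwarding_a(t):
            lost += 1               # packets go "into oblivion"
    return lost


if __name__ == "__main__":
    print("failback on link-up:", lost_seconds(failback=True), "s lost")
    print("no failback:        ", lost_seconds(failback=False), "s lost")
```

Under these assumed timings, a team that fails back purely on link state is isolated for the whole diagnostic window, while a team that never fails back stays on the healthy switch. The sub-second loss during the initial failover is ignored here.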

    i can see various ways to solve this. i could disable diagnostics ...
    but then i miss the benefit of having the C4K identify failed ports
    for me. i could configure the servers not to failback ... but then,
    at any given moment, my servers are in an indeterminate state,
    network-wise (I won't know a priori which NIC is active).

    how have other folks handled this problem?


    stuart kendrick
    Stuart Kendrick, Aug 9, 2004

  2. This delay is probably spanning tree related: you could look at your
    portfast/uplinkfast configuration to bring this time down.
    When I have configured Intel teaming in the past I've used the smart-switch
    feature which makes the active nic the current one until it fails. In other
    words, if the switch the active nic is connected to fails then the team
    switches to use the standby nic, but does not switch back once the 1st
    switch returns to active duty.
    BTW, this sort of redundancy was not designed to give instant failover with
    no dropped packets, but to allow the continued operation of a service after
    a failure. Losing a few seconds of availability is better than losing it for

    Buzz Lightbeer, Aug 9, 2004

  3. Stuart Kendrick

    Hansang Bae Guest

    Are you sure you're not over thinking this problem with TAC? I.e. doing
    a "set port host x/y" will fix the 50 sec delay you're talking about.
    And when the Cat brings up the port for diags, I'm not sure that it
    would send out the necessary link pulse to negotiate with the other
    side. I could be wrong, but I don't think it would do this.

    The delay you're talking about sounds like the result of spanning tree
    calculation, trunk negotiation, and PAgP negotiation, all of which can
    be turned off with "set port host".
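
As a rough illustration of where a delay of about 50 seconds could come from, the sketch below adds up the three contributions named above. Only the spanning tree figure is standard (two default 15 second forward-delay intervals for listening and learning); the PAgP and trunk negotiation figures are ballpark assumptions picked for illustration, not measured values, and the function name is hypothetical.

```python
FORWARD_DELAY_S = 15   # IEEE 802.1D default forward delay

# Assumptions for illustration only; real values depend on platform and version.
ASSUMED_PAGP_S = 15    # assumed time for channel (PAgP) negotiation to settle
ASSUMED_DTP_S = 5      # assumed time for trunk negotiation to settle


def port_startup_delay(portfast=False, channel_off=False, trunk_off=False):
    """Estimate seconds between link-up and forwarding on an access port."""
    # Without portfast the port sits in listening, then learning.
    stp = 0 if portfast else 2 * FORWARD_DELAY_S
    pagp = 0 if channel_off else ASSUMED_PAGP_S
    dtp = 0 if trunk_off else ASSUMED_DTP_S
    return stp + pagp + dtp


if __name__ == "__main__":
    print("defaults:      ", port_startup_delay(), "s")
    print("host-port style:", port_startup_delay(True, True, True), "s")
```

With all three features left at their defaults the estimate lands near the 50 second figure; with portfast enabled and channeling and trunking off, the modeled delay drops to zero.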



    "Somehow I imagined this experience would be more rewarding" Calvin
    *************** USE ROT13 TO SEE MY EMAIL ADDRESS ****************
    Due to the volume of email that I receive, I may not be able to
    reply to emails sent to my account. Please post a followup instead.
    Hansang Bae, Aug 10, 2004
  4. yes, it is quite possible that i'm making this harder than it really
    is ...

    however, i think i have the "set port host x/y" thing down ... i.e.
    portfast enabled, trunking disabled, channeling disabled, and so

    mp-a-esx> sh port cap 6/27
    Model WS-X4448-GB-RJ45
    Port 6/27
    Type 10/100/1000
    Speed auto,10,100,1000
    Duplex half,full
    Trunk encap type 802.1Q
    Trunk mode on,off,desirable,auto,nonegotiate
    Channel 6/1-48
    Flow control
    Security yes
    Dot1x yes
    Membership static,dynamic
    Fast start yes
    QOS scheduling rx-(none),tx-(2q1t)
    CoS rewrite no
    ToS rewrite no
    Rewrite no
    UDLD yes
    Inline power no
    AuxiliaryVlan 1..1000,1025..4094,untagged,none
    SPAN source,destination,reflector
    Link debounce timer yes
    IGMPFilter yes
    Dot1q-all-tagged no
    Jumbo frames no

    and from the config file:

    #module 6 : 48-port 10/100/1000 Ethernet
    set vlan 42 6/1-48
    set port auxiliaryvlan 6/1 642
    set port auxiliaryvlan 6/2 642
    set port enable 6/1-48
    set port level 6/1-48 normal
    set port speed 6/1-48 auto
    set port clock 6/1-48 auto
    set port trap 6/1-48 disable
    set port name 6/1-48
    set port security 6/1-48 disable age 0 maximum 1 shutdown 0 unicast-flood enable violation shutdown
    set port dot1x 6/1-48 port-control force-authorized
    set port dot1x 6/1-48 multiple-host disable
    set port dot1x 6/1-48 shutdown-timeout disable
    set port dot1x 6/1-48 re-authentication disable
    set port membership 6/1-48 static
    set port protocol 6/1-48 ip on
    set port protocol 6/1-48 ipx auto
    set port protocol 6/1-48 group auto
    set port flowcontrol 6/18-19 send desired
    set port flowcontrol 6/1-17,6/20-48 send on
    set port flowcontrol 6/1-48 receive desired
    set cdp enable 6/1-48
    set udld disable 6/1-48
    set udld aggressive-mode disable 6/1-48
    set trunk 6/1 off dot1q 1-1005,1025-4094
    set trunk 6/2 off dot1q 1-1005,1025-4094
    set spantree portfast 6/1-48 enable
    set spantree bpdu-filter 6/1-48 default
    set spantree bpdu-guard 6/1-48 default
    set spantree mst link-type 6/1-48 auto
    set spantree portpri 6/1-48 32 mst
    set spantree portinstancepri 6/1 0 mst
    set spantree portinstancepri 6/2 0 mst
    set spantree guard none 6/1-48
    set port gvrp 6/1-48 disable
    set gvrp registration normal 6/1-48
    set gvrp applicant normal 6/1-48
    set port gmrp 6/1-48 enable
    set gmrp registration normal 6/1-48
    set gmrp fwdall disable 6/1-48
    set port debounce 6/1 disable
    set port debounce 6/2 disable
    set port unicast-flood 6/1-48 enable
    set port errdisable-timeout 6/1-48 enable
    set cam notification added disable 6/1-48
    set cam notification removed disable 6/1-48
    set port channel 6/33-34 mode on
    set port channel 6/1-32,6/35-48 mode off
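
A config dump this long is easy to misread, so here is a minimal sketch of checking one port's host-related settings mechanically. It assumes the line formats shown in the excerpt above; the function names and the regular expressions are illustrative, not part of any Cisco tool, and only a few lines from the dump are included.

```python
import re

# A few lines copied from the config excerpt above.
CONFIG = """\
set trunk 6/1 off dot1q 1-1005,1025-4094
set trunk 6/2 off dot1q 1-1005,1025-4094
set spantree portfast 6/1-48 enable
set port channel 6/33-34 mode on
set port channel 6/1-32,6/35-48 mode off
"""

PATTERNS = {
    "portfast": re.compile(r"^set spantree portfast (\S+) (enable|disable)"),
    "channel": re.compile(r"^set port channel (\S+) mode (\S+)"),
    "trunk": re.compile(r"^set trunk (\S+) (\S+)"),
}


def expand_ports(spec):
    """Expand a port list such as '6/1-32,6/35-48' into a set of 'mod/port'."""
    ports = set()
    for part in spec.split(","):
        mod, rng = part.split("/")
        lo, _, hi = rng.partition("-")
        for n in range(int(lo), int(hi or lo) + 1):
            ports.add(f"{mod}/{n}")
    return ports


def port_settings(config, port):
    """Return the portfast/channel/trunk settings that explicitly name `port`."""
    found = {}
    for line in config.splitlines():
        line = line.strip()
        for name, pattern in PATTERNS.items():
            m = pattern.match(line)
            if m and port in expand_ports(m.group(1)):
                found[name] = m.group(2)
    return found


if __name__ == "__main__":
    print(port_settings(CONFIG, "6/27"))
```

For port 6/27 this reports portfast enabled and channel mode off. It reports no trunk setting, because the excerpt only shows explicit trunk lines for 6/1 and 6/2; the trunk mode of 6/27 would have to be confirmed separately on the switch.
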
    Stuart Kendrick, Aug 10, 2004
  5. yes, i can see myself going to this "don't switch back to active duty"
    approach, too. but before i go there, i want confidence that i
    understand what is happening, and that i'm not missing some cleaner
    solution. i guess what you're saying is that this is the cleanest
    solution you know of. thanx for the input!

    Stuart Kendrick, Aug 10, 2004