redundant switches / redundant server NICs

 
 
Stuart Kendrick
08-09-2004
hi folks,

i'm analyzing the interaction between Catalyst 4000s and servers
configured with redundant NICs (Intel's TEAMing software).

we install two C4K in each server room, and cable the "a" NIC of each
server to the "a" ethernet switch, and the "b" NIC of each server to
the "b" ethernet switch.

when we unplug one of these cables, a continuous ping to the target
typically shows 0-1 missed pings ... and syslog on the server shows
the kernel detecting the loss of link, disengaging the primary NIC,
and activating the standby NIC. when we plug the cable back in, the
kernel detects this event and reverses the procedure. and we're back
to where we started. cool.

when we reboot a switch, we see the same behavior ... but about 30
seconds after the reboot, the C4K brings link up on its ports ... the
kernel obligingly changes its view of active and standby ... and the
server has just isolated itself, but the C4K is in no position to
forward traffic. in fact, it won't be ready to forward traffic for
another couple minutes. (near the end of that time, it will take link
down for ~30 seconds, before bringing it up just prior to going fully
functional).

i have a TAC case open on this -- the engineer says that the C4K
raises link in order to perform hardware level testing on the port ...
part of the power-on diagnostics ... if the testing fails, then the
Sup card will log an error message. this is good stuff.

however, in the meantime, the server sees link up, thinks it can use
that NIC ... and forwards packets into oblivion.

i can see various ways to solve this. i could disable diagnostics ...
but then i miss the benefit of having the C4K identify failed ports
for me. i could configure the servers not to failback ... but then,
at any given moment, my servers are in an indeterminate state,
network-wise (I won't know a priori which NIC is active).

how have other folks handled this problem?

--sk

stuart kendrick
fhcrc
 
Buzz Lightbeer
08-09-2004
Stuart Kendrick wrote:
> hi folks,
>
> i'm analyzing the interaction between Catalyst 4000s and servers
> configured with redundant NICs (Intel's TEAMing software).
>
> we install two C4K in each server room, and cable the "a" NIC of each
> server to the "a" ethernet switch, and the "b" NIC of each server to
> the "b" ethernet switch.
>
> when we unplug one of these cables, a continuous ping to the target
> typically shows 0-1 missed pings ... and syslog on the server shows
> the kernel detecting the loss of link, disengaging the primary NIC,
> and activating the standby NIC. when we plug the cable back in, the
> kernel detects this event and reverses the procedure. and we're back
> to where we started. cool.
>
> when we reboot a switch, we see the same behavior ... but about 30
> seconds after the reboot, the C4K brings link up on its ports ... the
> kernel obligingly changes its view of active and standby ... and the
> server has just isolated itself, but the C4K is in no position to
> forward traffic. in fact, it won't be ready to forward traffic for
> another couple minutes. (near the end of that time, it will take link
> down for ~30 seconds, before bringing it up just prior to going fully
> functional).


This delay is probably spanning tree related: you could look at your
portfast/uplinkfast configuration to bring this time down.

> i have a TAC case open on this -- the engineer says that the C4K
> raises link in order to perform hardware level testing on the port ...
> part of the power-on diagnostics ... if the testing fails, then the
> Sup card will log an error message. this is good stuff.
>
> however, in the meantime, the server sees link up, thinks it can use
> that NIC ... and forwards packets into oblivion.
>
> i can see various ways to solve this. i could disable diagnostics ...
> but then i miss the benefit of having the C4K identify failed ports
> for me. i could configure the servers not to failback ... but then,
> at any given moment, my servers are in an indeterminate state,
> network-wise (I won't know a priori which NIC is active).
>
> how have other folks handled this problem?


When I have configured Intel teaming in the past I've used the smart-switch
feature which makes the active nic the current one until it fails. In other
words, if the switch the active nic is connected to fails then the team
switches to use the standby nic, but does not switch back once the 1st
switch returns to active duty.
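The trade-off between failing back and staying on the standby NIC can be sketched as a small simulation. This is only an illustration, not Intel's teaming logic: the `SwitchTimeline` class, the `blackholed_seconds` function, and the timings (link up for diagnostics at about 30 s, link dropped again at about 120 s, forwarding from about 150 s) are assumptions loosely based on the behaviour described in the original post.

```python
from dataclasses import dataclass


@dataclass
class SwitchTimeline:
    """Assumed link/forwarding state of the rebooting switch (seconds after reboot)."""
    diag_link_up: int = 30     # link raised for power-on diagnostics (assumed)
    diag_link_down: int = 120  # link dropped again near the end of boot (assumed)
    forwarding: int = 150      # link back up and switch forwarding (assumed)

    def link_up(self, t: int) -> bool:
        return self.diag_link_up <= t < self.diag_link_down or t >= self.forwarding

    def is_forwarding(self, t: int) -> bool:
        return t >= self.forwarding


def blackholed_seconds(timeline: SwitchTimeline, failback: bool, duration: int = 200):
    """Return (seconds of traffic sent into a non-forwarding switch, final active NIC).

    NIC "a" is cabled to the rebooting switch, NIC "b" to the healthy one.
    The team only looks at link state, as in the scenario from the thread.
    """
    active = "a"
    lost = 0
    for t in range(duration):
        a_link = timeline.link_up(t)
        if active == "a" and not a_link:
            active = "b"  # fail over on loss of link
        elif active == "b" and a_link and failback:
            active = "a"  # fail back as soon as link returns
        if active == "a" and not timeline.is_forwarding(t):
            lost += 1     # link is up, but the switch cannot forward yet
    return lost, active


if __name__ == "__main__":
    print("failback:   ", blackholed_seconds(SwitchTimeline(), failback=True))
    print("no failback:", blackholed_seconds(SwitchTimeline(), failback=False))
```

With these assumed timings, the failback run spends 90 simulated seconds sending into a switch that cannot forward, while the no-failback run loses nothing but finishes on NIC "b", which is the "indeterminate state" concern raised in the original post.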

>
> --sk
>
> stuart kendrick
> fhcrc


BTW, this sort of redundancy was not designed to give instant failover with
no dropped packets, but to allow the continued operation of a service after
a failure. Losing a few seconds of availability is better than losing it for
hours.

BL
--
As the days go by, we face the increasing inevitability that we are alone in
a godless, uninhabited, hostile and meaningless universe. Still, you've got
to laugh, haven't you? - Holly


 
Hansang Bae
08-10-2004

Are you sure you're not overthinking this problem with TAC? I.e. doing
a "set port host x/y" will fix the 50 sec delay you're talking about.
And when the Cat brings up the port for diags, I'm not sure that it
would send out the necessary link pulse to negotiate with the other
side. I could be wrong, but I don't think it would do this.

The delay you're talking about sounds like the result of spanning tree
calculation, trunk negotiation and PAgP negotiation. "set port host"
takes care of all three: it enables portfast and turns trunking and
channeling off.

--

hsb

"Somehow I imagined this experience would be more rewarding" Calvin
*************** USE ROT13 TO SEE MY EMAIL ADDRESS ****************
************************************************** ******************
Due to the volume of email that I receive, I may not be able to
reply to emails sent to my account. Please post a followup instead.
************************************************** ******************
 
Stuart Kendrick
08-10-2004
Hansang Bae wrote:

yes, it is quite possible that i'm making this harder than it really
is ...

however, i think i have the "set port host x/y" thing down ... i.e.
portfast enabled, trunking disabled, channeling disabled, and so
forth.

mp-a-esx> sh port cap 6/27
Model WS-X4448-GB-RJ45
Port 6/27
Type 10/100/1000
Speed auto,10,100,1000
Duplex half,full
Trunk encap type 802.1Q
Trunk mode on,off,desirable,auto,nonegotiate
Channel 6/1-48
Flow control receive-(off,on,desired),send-(off,on,desired)
Security yes
Dot1x yes
Membership static,dynamic
Fast start yes
QOS scheduling rx-(none),tx-(2q1t)
CoS rewrite no
ToS rewrite no
Rewrite no
UDLD yes
Inline power no
AuxiliaryVlan 1..1000,1025..4094,untagged,none
SPAN source,destination,reflector
Link debounce timer yes
IGMPFilter yes
Dot1q-all-tagged no
Jumbo frames no
mp-a-esx>

and from the config file:

#module 6 : 48-port 10/100/1000 Ethernet
set vlan 42 6/1-48
set port auxiliaryvlan 6/1 642
set port auxiliaryvlan 6/2 642
[...]
set port enable 6/1-48
set port level 6/1-48 normal
set port speed 6/1-48 auto
set port clock 6/1-48 auto
set port trap 6/1-48 disable
set port name 6/1-48
set port security 6/1-48 disable age 0 maximum 1 shutdown 0 unicast-flood enable violation shutdown
set port dot1x 6/1-48 port-control force-authorized
set port dot1x 6/1-48 multiple-host disable
set port dot1x 6/1-48 shutdown-timeout disable
set port dot1x 6/1-48 re-authentication disable
set port membership 6/1-48 static
set port protocol 6/1-48 ip on
set port protocol 6/1-48 ipx auto
set port protocol 6/1-48 group auto
set port flowcontrol 6/18-19 send desired
set port flowcontrol 6/1-17,6/20-48 send on
set port flowcontrol 6/1-48 receive desired
set cdp enable 6/1-48
set udld disable 6/1-48
set udld aggressive-mode disable 6/1-48
set trunk 6/1 off dot1q 1-1005,1025-4094
set trunk 6/2 off dot1q 1-1005,1025-4094
[...]
set spantree portfast 6/1-48 enable
set spantree bpdu-filter 6/1-48 default
set spantree bpdu-guard 6/1-48 default
set spantree mst link-type 6/1-48 auto
set spantree portpri 6/1-48 32 mst
set spantree portinstancepri 6/1 0 mst
set spantree portinstancepri 6/2 0 mst
[...]
set spantree guard none 6/1-48
set port gvrp 6/1-48 disable
set gvrp registration normal 6/1-48
set gvrp applicant normal 6/1-48
set port gmrp 6/1-48 enable
set gmrp registration normal 6/1-48
set gmrp fwdall disable 6/1-48
set port debounce 6/1 disable
set port debounce 6/2 disable
[...]
set port unicast-flood 6/1-48 enable
set port errdisable-timeout 6/1-48 enable
set cam notification added disable 6/1-48
set cam notification removed disable 6/1-48
set port channel 6/33-34 mode on
set port channel 6/1-32,6/35-48 mode off
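A quick way to confirm that a given port carries the three settings mentioned above (portfast enabled, trunking off, channeling off) is to scan the config text for it. The sketch below is a hypothetical helper, not a Cisco tool: `expand_ports`, `host_port_report`, and the `SAMPLE` text are made up for illustration, and the patterns only cover the line formats that appear in the excerpt above. A `False` result means "not found in this text", which can simply reflect lines elided with `[...]`.

```python
import re


def expand_ports(spec: str) -> set:
    """Expand a CatOS-style port list such as '6/1-32,6/35-48' into {(6, 1), ...}."""
    ports = set()
    for part in spec.split(","):
        mod, rng = part.split("/")
        lo, _, hi = rng.partition("-")
        for p in range(int(lo), int(hi or lo) + 1):
            ports.add((int(mod), p))
    return ports


# Only the line formats seen in the posted excerpt are recognised.
PATTERNS = {
    "portfast": re.compile(r"^set spantree portfast (\S+) enable$"),
    "trunk_off": re.compile(r"^set trunk (\S+) off\b"),
    "channel_off": re.compile(r"^set port channel (\S+) mode off$"),
}


def host_port_report(config: str, port: tuple) -> dict:
    """Report which host-port settings are explicitly present for `port`."""
    report = {name: False for name in PATTERNS}
    for line in config.splitlines():
        line = line.strip()
        for name, pattern in PATTERNS.items():
            match = pattern.match(line)
            if match and port in expand_ports(match.group(1)):
                report[name] = True
    return report


# Lines copied from the excerpt above; elided lines are not reconstructed.
SAMPLE = """\
set trunk 6/1 off dot1q 1-1005,1025-4094
set trunk 6/2 off dot1q 1-1005,1025-4094
set spantree portfast 6/1-48 enable
set port channel 6/33-34 mode on
set port channel 6/1-32,6/35-48 mode off
"""

if __name__ == "__main__":
    for port in [(6, 1), (6, 33)]:
        print(port, host_port_report(SAMPLE, port))
```

Run against the excerpt, port 6/1 shows all three settings, while 6/33 does not show channeling off, which matches the `set port channel 6/33-34 mode on` line in the posted config.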


> Are you sure you're not overthinking this problem with TAC? I.e. doing
> a "set port host x/y" will fix the 50 sec delay you're talking about.
> And when the Cat brings up the port for diags, I'm not sure that it
> would send out the necessary link pulse to negotiate with the other
> side. I could be wrong, but I don't think it would do this.
>
> The delay you're talking about sounds like the result of Spanning tree
> calculation, trunking protocol and PaGP calculation. All of which can
> be turned off with "set port host"
>
> --
>
> hsb
>

 
Stuart Kendrick
08-10-2004
yes, i can see myself going to this "don't switch back to active duty"
approach, too. but before i go there, i want confidence that i
understand what is happening, and that i'm not missing some cleaner
solution. i guess what you're saying is that this is the cleanest
solution you know of. thanx for the input!

--sk

> When I have configured Intel teaming in the past I've used the smart-switch
> feature which makes the active nic the current one until it fails. In other
> words, if the switch the active nic is connected to fails then the team
> switches to use the standby nic, but does not switch back once the 1st
> switch returns to active duty.
>
> >
> > --sk
> >
> > stuart kendrick
> > fhcrc

>
> BTW, this sort of redundancy was not designed to give instant failover with
> no dropped packets, but to allow the continued operation of a service after
> a failure. Losing a few seconds of availability is better than losing it for
> hours.
>
> BL

 