Connectivity Problem with WS-C3750G-24TS and Broadcom BCM5708C SLB

Discussion in 'Cisco' started by Torsten Steuerer, Feb 22, 2008.

  1. Hello,

    I have a problem with Broadcom Smart Load Balancing and Cisco WS-C3750G-24TS switches.

    We have a stack consisting of two Cisco WS-C3750G-24TS.
    A Dell PowerEdge Server with two Broadcom BCM5708C Cards was
    connected to the Stack; one Card per Switch. The two Cards were teamed
    with Broadcom Smart Load Balancing, using the Broadcom Advanced
    Control Suite.

    About every 20 minutes, the Dell server was able to ping some
    machines (e.g. A, B and C) but not others (D, E and F).
    After one or two minutes, every address was pingable again. Machines
    D, E and F remained reachable from everywhere else in the network.
    There is no log entry in the Cisco or in the Windows event log
    correlating to the problem.

    We upgraded the NIC drivers and changed cables, without any result.
    When we removed one of the Broadcom NIC connections, connectivity
    remained stable.

    The strange thing is that we have a lot of similar configurations
    (3750 Stack connected to Dell Servers with teamed Broadcom NICs)
    running without any Problems.

    The Switch Port configuration is as follows:

    interface GigabitEthernet1/0/4
    switchport access vlan 10
    switchport mode access
    macro description cisco-desktop
    spanning-tree portfast
    spanning-tree bpduguard enable

    IOS Version is 12.2(25)SEE2

    The particular switch that gives us the problem has been up for about
    a year and ten weeks. Though I am not 100 percent sure, I believe
    that the above configuration had also been running on that switch
    without problems for months (I don't always get informed about what
    our server admins are doing :-( )

    Any clues?

    Thanks in advance,

    Torsten Steuerer, Feb 22, 2008

  2. Peter Guest

    Hi Torsten,
    I have been involved in a similar setup (in our case a stack of
    2 x 3750 48-port switches and MS teaming on the servers), and seen
    something very similar, i.e. periods where things just "go to sleep"...
    switch wide! It all depended on the MODE of teaming being used on the
    server. Now I am NOT a server person, but it was discovered that a
    configuration change had been applied on the server end, and I was
    able to come up with a work-around that did what we wanted for our
    setup, using the following.

    If I understand it correctly, MS Teaming seems to provide about 4
    possible configuration modes for Teaming operation (all using just 1
    IP address on the server) -
    1. Load balancing using Single MAC sharing.
    2. Path Redundancy using Single MAC sharing.
    3. Load balancing using 2 MACs.
    4. Path Redundancy using 2 MACs.

    When configured for Mode 1, we had an almost identical situation to
    yours, except in our case the switch just dumped its entire MAC
    address table and refreshed it (very slowly), which seemed to take
    quite a bit of time, so it used to stop ALL traffic on the switch
    stack (i.e. all 96 ports) until something was done to sort out the
    issue.

    My theory was that Mode 1 was presenting an IDENTICAL MAC on 2
    different Switch ports, causing an issue for the 3750's MAC table.
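    That theory can be sketched with a toy model of a switch's MAC address table. Nothing below is a real Cisco or Broadcom API; the MAC and port names are invented for illustration. It just shows why a team that shares ONE MAC across two uplinks makes the table entry flap between ports, while a team that keeps one MAC per NIC stays stable:

```python
# Toy model of switch MAC learning: a {mac: port} table updated from
# the source MAC of each incoming frame. All names are illustrative.

def learn(table, frames):
    """Update a {mac: port} table from (src_mac, ingress_port) frames,
    counting how often an existing entry moves to a different port."""
    moves = 0
    for mac, port in frames:
        if mac in table and table[mac] != port:
            moves += 1  # the switch relearns the MAC on another port
        table[mac] = port
    return moves

# Shared-MAC team (like Mode 1): both NICs transmit with the same MAC,
# so alternating frames arrive on two different switch ports.
shared = {}
flaps = learn(shared, [("aa:aa", "Gi1/0/4"), ("aa:aa", "Gi2/0/4")] * 3)

# Two-MAC team (like Modes 3/4): each NIC keeps its own MAC.
distinct = {}
stable = learn(distinct, [("aa:aa", "Gi1/0/4"), ("bb:bb", "Gi2/0/4")] * 3)
```

    With the shared MAC, every other frame forces the entry to move (`flaps` ends up nonzero); with distinct MACs it never moves (`stable` is 0). A real 3750 has to do far more work per relearn, which fits the "table dump and slow refresh" behaviour described above.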

    We found that Mode 2 worked fine, which gave us what we really
    needed: a backup or alternate path. Only 1 real path was ever active
    at any one time; the other path NEVER saw the MAC of the server on it
    until the first path had died.

    The downside of this is that you remain vulnerable to the server
    doing something stupid that re-enables path sharing (highly possible
    in an MS environment).

    Modes 3 & 4 should be fine, as only 1 MAC per port was ever seen by
    the switch.
    So if you use Teaming for redundancy purposes, and your version of
    Teaming allows this, and works the same as the MS way, then it should
    be possible to get it to work.
    Ours was a failure about every 20 minutes or so as well, except it
    would knock down the entire stack and take 5-10 minutes to recover.
    Nope, not a squeak anywhere on what was going on. Our only clue was
    that Network access to the 3750 stack completely died until the MAC
    table had been re-built, but by then everything looked fine again.

    Exactly... it's my guess you are getting a single MAC appearing on 2
    different switch ports, and this is causing the switch to choke.
    How is the teaming configured on these? We also had a set of servers
    using MS teaming running fine for about a year, but it was only when
    new servers were added with a different MODE of teaming that the
    problem emerged.

    Pretty much identical, I think, except we were at SEE3.

    Let me guess, they were "tweaking the way things were configured" to
    get it to run better?.....;-)

    Peter, Feb 23, 2008

  3. Thrill5 Guest

    Thrill5, Feb 23, 2008
  4. Hello Peter,

    I asked our server admin to point me to a configuration similar to
    the one that is causing problems.
    Unfortunately, he is on holiday at the moment, so I checked a few
    servers that I have access to.
    On the four servers I checked, I found !!! three !!! different
    teaming configurations:

    - Adapter Fault Tolerance
    - Adaptive Load Balancing
    - Switch Fault Tolerance

    Though all those servers have Intel cards, which obviously don't
    cause the same problems, the server configs seem in no way
    consistent. Or: on the n-th server, they tried the (n-1)th config
    (see above) %-}

    Greetz to NZ,

    Torsten Steuerer, Feb 26, 2008
    I thought about that option, but it would mean additional config and
    management burden for me, which I want to avoid.
    Especially since I can never be sure whether somebody has repatched
    something.
    Torsten Steuerer, Feb 26, 2008
  6. Bod43 Guest

    I agree that the LACP scheme is the way to go.
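    For reference, the LACP approach would pair an IEEE 802.3ad team on the server with a cross-stack EtherChannel on the 3750s. A rough sketch of what the switch side might look like, in the same style as the config posted earlier (the channel-group number and port numbers are illustrative, and cross-stack LACP needs a suitably recent IOS; older 3750 releases only supported mode "on" across stack members):

```
interface Port-channel1
 switchport access vlan 10
 switchport mode access
!
interface GigabitEthernet1/0/4
 switchport access vlan 10
 switchport mode access
 channel-group 1 mode active
!
interface GigabitEthernet2/0/4
 switchport access vlan 10
 switchport mode access
 channel-group 1 mode active
```

    `channel-group 1 mode active` makes the ports negotiate via LACP, so the bundle only comes up if the server side actually speaks 802.3ad.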

    Sadly it seems that Server NIC configuration is not considered
    a networking job in a lot of places.

    ServerAdmin - "It pings so the network must be OK."
    Bod43, Feb 28, 2008
    Finally I found the reason. One machine on the net was poisoning ARP
    caches. It happened like this:

    Take three machines: DBServer, AppServer and BadGuy.

    From time to time, BadGuy sent an ARP request for DBServer.
    In that ARP request, the sender address consisted of the MAC address
    of BadGuy and the IP address of AppServer!
    So DBServer overwrote its ARP cache entry for AppServer, which now
    pointed to the MAC address of BadGuy, and it was unable to reach
    AppServer. That is, until the "real" AppServer sent an ARP request
    for DBServer.
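    The mechanism can be sketched in a few lines. Classic ARP processing (RFC 826) has a host update an *existing* cache entry from the sender fields of any ARP packet it sees, including a bogus request that lies about its sender; the IPs and MAC labels below are made up:

```python
# Minimal sketch of ARP cache poisoning via the sender fields of an
# ARP request, as described above. Addresses are illustrative only.

def handle_arp(cache, sender_ip, sender_mac):
    """If the host already has an entry for sender_ip, it updates the
    entry from the packet's sender fields -- no questions asked."""
    if sender_ip in cache:
        cache[sender_ip] = sender_mac
    return cache

# DBServer's cache starts with the real entry for AppServer.
db_cache = {"10.0.0.20": "mac-appserver"}

# BadGuy broadcasts an ARP request for DBServer, but fills the sender
# fields with its own MAC and AppServer's IP.
handle_arp(db_cache, sender_ip="10.0.0.20", sender_mac="mac-badguy")
poisoned = db_cache["10.0.0.20"]   # now points at BadGuy

# Later, the real AppServer ARPs for DBServer, repairing the entry.
handle_arp(db_cache, sender_ip="10.0.0.20", sender_mac="mac-appserver")
repaired = db_cache["10.0.0.20"]
```

    Until that repair happens, DBServer sends AppServer's traffic to BadGuy's MAC, which matches the intermittent "some hosts unreachable for a minute or two" symptom exactly.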

    This behaviour was most likely caused by Broadcom NICs in combination
    with VMware. Unfortunately, so far I could only find a couple of blog
    entries (in which Broadcom confessed the ARP cache poisoning problem)
    pointing out the issue.
    How could I ever blame my good ol' Ciscos ;-)
    Torsten Steuerer, Mar 4, 2008
