Switches made things slower

Discussion in 'Cisco' started by Trent Collicutt, Jan 19, 2004.

  1. I have a floor in our building which was serviced by D-Link hubs. I
    decided that things might speed up a bit if we replaced the D-Link
    hubs with Catalyst 2950T switches. We connected one of the switches
    to the backbone cable, and the rest of the switches were daisy chained
    using the 1000BaseT ports. The 10/100 ports were left to
    auto-negotiate with the user machines, and portfast was turned on.

    Now, it appears that things are slower. File transfers that used to
    take 5 minutes now take 20. Database operations take longer. No
    errors show up on the switchport counters. The machines all
    auto-negotiate to 100 Full.

    Most of the NICs are 3C905b and a bunch of Intel Pro/100 models.

    I am in the middle of trying to convince different departments to
    spend the cash to replace the 10 yr old hubs, but this will be
    difficult if it actually makes things slower.

    Any ideas?
     
    Trent Collicutt, Jan 19, 2004
    #1

  2. :I have a floor in our building which was serviced by D-Link hubs. I
    :decided that things might speed up a bit if we replaced the D-Link
    :hubs with Catalyst 2950T switches. We connected one of the switches
    :to the backbone cable, and the rest of the switches were daisy chained
    :using the 1000BaseT ports. The 10/100 ports were left to
    :auto-negotiate with the user machines, and portfast was turned on.

    :Now, it appears that things are slower. File transfers that used to
    :take 5 minutes, now take 20. Database operations take longer. There
    :are no errors on the switchports, which show up on the counters. The
    :machines all auto to 100 Full.

    Chances are excellent that you have a duplex conflict.

    Windows 2000 Server (and possibly some other Windows) sometimes lies
    about the duplex that has been negotiated. I have personally seen a
    case in which W2K Server claimed it was running at 100 Full, but it was
    really only running at 100 Half. Possibly the ethernet card was lying
    to W2K itself, I don't know. The cure was to force both ends of the
    link to 100 Full rather than autonegotiating.
     
    Walter Roberson, Jan 19, 2004
    #2

  3. > decided that things might speed up a bit if we replaced the D-Link

    If the switches, the connections between them, and the clients all
    work, there is perhaps a bottleneck somewhere?

    Consider a case like this: you used 10meg hubs from end to end.
    Everyone works at 10meg, and the actual rate is perhaps 4meg or so
    for a transfer in real life. Assume there is a single transfer and
    the server only feeds data to the client. Everything else is quiet
    on your network.

    Now, you upgrade to a 100meg switch at *one* end. You feed the line
    at 100meg and eventually it ends up on a 10meg segment. What
    now happens is that the 10meg segment is practically 100% used.
    Why? Because the switch buffers the 100meg feed and sends it
    as soon as there is any room on the 10meg line. They keep it busy
    and it's a half-duplex line.

    Collision rates like 25% or even more might be just normal in this
    case. So, it means a lot of packets get lost. They are mostly the packets
    sent by the slower device on the 10meg line. So, perhaps the server
    sends at 100meg, which eventually drops close to the 10meg line rate
    the client is on. At that point most of the acks from the client receiving
    the data stream still get lost and the transfer rate is slowed down.

    Notice that actually every collision is a collision for an ack if there
    is a single transfer! A collision happens only as the client sends
    something and for a single-direction transfer it sends (almost) only
    acks.

    There is only one fix for this: make sure the slow device is able to
    work at full duplex, or accept that it will be slowed down a great deal.
    (And make sure there is no other slow half-duplex link somewhere
    between the ends.)

    Even upgrading it to a 10meg switch makes a big difference
    because the acks are not lost as often (theoretically never if
    full duplex is being used). Better, though, to upgrade the whole
    network properly, and everything gets faster like you expected. Partial
    solutions can have drawbacks, as seen here.

    I've seen the same problem appear for the reason described above.
    Getting the whole network upgraded solves things.
     
    Harri Suomalainen, Jan 19, 2004
    #3
  4. Trent Collicutt

    Hansang Bae Guest

    [snip]
    You can never lose packets (frames) because of collisions. That's the
    whole point of collisions and collision detect mechanism.

    The original posters problems could be (in order of likelihood)

    1) Duplex mismatch at the client end (ones connecting to the switch or
    from hub to switch)

    2) Cabling is marginal. This works fine for 10Mbps but won't work
    for 100Mbps.

    3) The switched traffic is overwhelming the servers. Before the
    upgrade, built-in flow control of 10Mbps/HD Ethernet took care of this
    problem. With switches, it may no longer be true.


    --

    hsb

    "Somehow I imagined this experience would be more rewarding" Calvin
    *************** USE ROT13 TO SEE MY EMAIL ADDRESS ****************
    ********************************************************************
    Due to the volume of email that I receive, I may not be able to
    reply to emails sent to my account. Please post a followup instead.
    ********************************************************************
     
    Hansang Bae, Jan 19, 2004
    #4
  5. mmm, a bit vague but you can check the logs on all the machines. 'sh
    log' and note any error syslogs.

    Try removing all etherchannels and loops and see if you get the same
    problem.

    Check the speed and duplex of each port to make sure they are
    100/full. (Where applicable)

    Also try upgrading all the switches to the latest code.

    Can you post a sample config?

    Simon
     
    Simon Tibbitts, Jan 19, 2004
    #5
  6. Trent Collicutt

    chris Guest

    Ok, first mistake. You didn't measure the existing traffic levels to
    figure out if you needed to upgrade. You might decide to upgrade for
    other reasons such as better reliability or features like
    manageability and security. In reality those 10-meg clients probably
    were fine and it was the servers that needed upgrading. Don't worry,
    I'd probably upgrade if I could get someone to buy nicer equipment.
    You're probably slamming the single port you connected to the
    backbone. Look at the traffic levels, duplex, and collisions. If the
    rest of the network is still hubs this may cause its own problems.
    Put together a coherent upgrade plan that identifies what you want to
    replace and why. From a business perspective, you need to justify the
    expense as a longer-term saving. For example, having managed
    switches will save xxx hours a month in troubleshooting problems,
    users will wait on average xxx seconds shorter for data, etc.

    The honest reality is that most people don't need gigabit backbones
    and 100-meg to the desktop.

    -Chris
     
    chris, Jan 20, 2004
    #6
  7. Trent Collicutt

    AnyBody43 Guest

    Loads of good stuff here.

    "Now, it appears that things are slower"

    Switches are slower than hubs for a single point
    to point data transfer.
    Cisco 2950T (and any other switches that convert between
    media of different speeds) receive a whole packet before
    starting to transmit it out the other side. In contrast a
    hub receives a single bit before starting to transmit
    it out the other side.

    Since you are also
    upgrading to 100/1000 (FE/GE) the switched infrastructure might
    reasonably be expected to be faster than a 10M shared infrastructure.
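
    To put a rough number on the store-and-forward delay described above,
    here is a minimal Python sketch. It assumes a maximum-size 1518-byte
    frame (an assumption, not a figure from the thread) and ignores switch
    processing time and propagation delay. The function name is
    illustrative only.

```python
def serialization_delay_us(frame_bytes: int, link_mbps: float) -> float:
    """Time to clock one whole frame onto a link, in microseconds.

    bits / (Mbit/s) gives microseconds directly.
    """
    return frame_bytes * 8 / link_mbps


# Assumed worst case: a full-size untagged Ethernet frame.
FRAME_BYTES = 1518

for mbps in (10, 100, 1000):
    delay = serialization_delay_us(FRAME_BYTES, mbps)
    # A store-and-forward switch must receive the entire frame before it
    # can start sending it, so each hop adds at least this much delay.
    print(f"{mbps:>4} Mbps: {delay:8.1f} us per store-and-forward hop")
```

    Even at 10 Mbps the added delay is on the order of a millisecond per
    hop, so it matters mainly to applications that wait for each packet to
    be acknowledged before sending the next one.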


    "We connected one of the switches to the backbone cable."
    > You're probably slamming the single port you connected to the
    > backbone.
    Yes.


    BTW-You don't say what protocols you are running however it would
    be VERY unusual for say a server on a fast cable to in any sense
    swamp a workstation on a slow cable. The people who designed this
    stuff thought of that and it just doesn't happen.

    "the rest of the switches were daisy chained using the
    1000BaseT ports"
    This does not sound too good. How many in the chain?
    However this is unlikely, given the 1000M connection,
    to be the root of your problem.

    From the sound of it you are responsible for a sizable
    organisation and I would suggest that you should buy
    some time from a network consultant who will be able
    to guide you in the right direction. This sort of help
    might be available from a Cisco Partner as part of the
    purchase price. You may have to pay a bit more for the
    kit but it may be cheaper than hiring a consultant
    separately.

    I _also_ do not trust the reported duplex setting.
    (Thanks Walter for confirming my own suspicions).

    I recommend checking the duplex settings by using the counters
    on the switch.

    If switch end is FD and you get CRC (FCS) and/or align/frame
    errors reported then the other end is HD.

    If switch end is HD and you get late collisions
    then the other end is FD.


    Obviously one or two errors of any kind tell you nothing but
    modern installations normally run with pretty much zero errors.
    More than about 1 error in 1,000,000 frames is unusual.
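
    The rule of thumb above can be sketched in Python as follows. This is
    only an illustration: the PortCounters fields are hypothetical names,
    not the output of any real switch command, and the threshold is the
    "about 1 error in 1,000,000 frames" figure from this post. You would
    fill in the numbers by hand from the switch's interface counters.

```python
from dataclasses import dataclass


@dataclass
class PortCounters:
    """Hypothetical container for counters read off one switch port."""
    duplex: str               # "full" or "half", as set/reported on the switch
    frames: int               # total frames seen on the port
    fcs_errors: int = 0       # CRC/FCS errors
    align_errors: int = 0     # alignment/frame errors
    late_collisions: int = 0


# "More than about 1 error in 1,000,000 frames is unusual."
ERROR_RATE_THRESHOLD = 1e-6


def suspect_mismatch(c: PortCounters) -> str:
    """Apply the FD/HD rule of thumb to one port's counters."""
    if c.frames == 0:
        return "no traffic, cannot tell"
    if c.duplex == "full":
        # Switch at FD seeing CRC/alignment errors suggests the far end is HD.
        errors = c.fcs_errors + c.align_errors
        far_end = "half"
    else:
        # Switch at HD seeing late collisions suggests the far end is FD.
        errors = c.late_collisions
        far_end = "full"
    if errors / c.frames > ERROR_RATE_THRESHOLD:
        return f"far end is probably {far_end} duplex"
    return "no mismatch indicated"


print(suspect_mismatch(PortCounters("full", 1_000_000, fcs_errors=250)))
print(suspect_mismatch(PortCounters("half", 1_000_000, late_collisions=40)))
print(suspect_mismatch(PortCounters("full", 5_000_000, fcs_errors=1)))
```

    A clean result does not prove the link is healthy. It only means the
    counters give no hint of a mismatch.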
     
    AnyBody43, Jan 20, 2004
    #7

  8. Tried that tack. I work for government. The feeling is that the time
    is already budgeted for, and even if we save the time it is still
    being paid for. Therefore, there is no cost savings.
     
    Trent Collicutt, Jan 20, 2004
    #8
  9. version 12.1
    no service single-slot-reload-enable
    no service pad
    service timestamps debug uptime
    service timestamps log uptime
    no service password-encryption
    !
    hostname 4Sull3
    !
    enable secret 5 $1$Ed7B$DzekNHWUXhx/dcSqH09ep1
    !
    ip subnet-zero
    ip domain-name gov.pe.ca
    ip name-server 10.102.20.2
    !
    no spanning-tree optimize bpdu transmission
    spanning-tree extend system-id
    !
    !
    interface FastEthernet0/1
    no ip address
    duplex half
    speed 10
    spanning-tree portfast
    !
    interface FastEthernet0/2
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/3
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/4
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/5
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/6
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/7
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/8
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/9
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/10
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/11
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/12
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/13
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/14
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/15
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/16
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/17
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/18
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/19
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/20
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/21
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/22
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/23
    no ip address
    spanning-tree portfast
    !
    interface FastEthernet0/24
    no ip address
    spanning-tree portfast
    !
    interface GigabitEthernet0/1
    no ip address
    !
    interface GigabitEthernet0/2
    switchport mode access
    no ip address
    !
    interface Vlan1
    ip address 10.10.10.10 255.255.255.0
    no ip route-cache
    !
    ip default-gateway 10.10.10.254
    no ip http server
    !
    snmp-server engineID local 800000090300000C85536641
    snmp-server community islandtel01 RO
    snmp-server location 4th Sullivan
    snmp-server contact Trent Collicutt
    banner login ^CCWARNING, YOU ARE ABOUT TO ACCESS A RESTRICTED DEVICE.
    IF YOU ARE NOT ARE NOT AN AUTHORIZED USER, DISCONNECT IMMEDIATELY TO
    AVOID BEING PROSECUTED.^C
    !
    line con 0
    password fakepass
    login
    line vty 0 4
    password fakepass
    login
    line vty 5 15
    password fakepass
    login
    !
    end

    4Sull3#
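
    With a config this long it is easy to miss the one port that is not
    auto-negotiating. The following Python sketch lists interfaces that
    carry an explicit speed or duplex line. It assumes the config has been
    saved as plain text in the flattened form shown above, and the function
    name is made up for this example.

```python
from typing import Dict, List


def hard_set_ports(config_text: str) -> Dict[str, List[str]]:
    """Map interface name -> explicit speed/duplex lines found under it."""
    ports: Dict[str, List[str]] = {}
    current = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if line.startswith("interface "):
            current = line.split(None, 1)[1]
        elif line == "!":
            # "!" separates sections in the config, so the interface ends here.
            current = None
        elif current and line.startswith(("speed ", "duplex ")):
            ports.setdefault(current, []).append(line)
    return ports


# Excerpt taken from the config posted above.
SAMPLE = """\
interface FastEthernet0/1
no ip address
duplex half
speed 10
spanning-tree portfast
!
interface FastEthernet0/2
no ip address
spanning-tree portfast
!
"""

print(hard_set_ports(SAMPLE))
# prints {'FastEthernet0/1': ['duplex half', 'speed 10']}
```

    Any port listed here needs the device at the other end set to the same
    speed and duplex by hand. A far end left on auto will typically fall
    back to half duplex when its partner does not negotiate.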


    Actually, one particular case stemmed from a Netware server that
    crashed when a UPS failed. Other machines still report problems, and
    I have entire floors who say they would not have noticed if I hadn't
    told them I upgraded.

    Fortunately, I am upgrading core routing from a RSM to a 6509, and the
    backbone from 100 meg copper to 2000 meg fibre. Hopefully they notice
    that.
     
    Trent Collicutt, Jan 20, 2004
    #9
  10. :Tried that tack. I work for government. The feeling is that the time
    :is already budgeted for, and even if we save the time it is still
    :being paid for. Therefore, there is no cost savings.

    ???

    I work for government too, but we typically work at 300%+
    task overload, so saving time is important.

    [I handed off my routine interruptions to someone. He's kept
    completely busy just with the interruptions, with little time to
    make progress. And I'm now back up to about 200% overtime, just trying
    to keep up with administration, technical research, and development.
    If I had to take on all those interruptions again, I wouldn't be able
    to get more than 3 hours sleep a night :( ]
     
    Walter Roberson, Jan 20, 2004
    #10
  11. Trent Collicutt

    Sam Wilson Guest

    [major snips in the above]

    We found here that for some reason some Novell servers (I don't have
    the details and I'm not sure I can get them now) worked better at 100M
    HD than 100M FD - on FD they would drop packets.

    Sam
     
    Sam Wilson, Jan 20, 2004
    #11
  12. A few comments inline...

    See the white paper "Performance Impact of Backbone Speed in Switched
    Backbone Architectures" on my web site for the gory details of why
    switching can be slower than using repeaters (simple hubs). But be
    aware that the chances of seeing any significant impact when the
    endpoint links are at 100Mbps or faster are minuscule unless the
    applications are already fully utilizing the entire 100Mbps bandwidth.

    This assumes the interfaces on the systems can actually handle the higher
    data rate efficiently. As soon as anyone starts dropping packets, all
    performance gains go out the window.

    The trick may be finding the right consultant. This could be anything
    from an application tuning problem to backbone cabling not up to the
    requirements of the higher speeds being run.

    Duplex mismatch is the most common cause of performance degradation
    during switch upgrades. Hint: depending on the kit, not only can you
    not trust auto-negotiation, sometimes you can't even trust what the
    interfaces report their status to be!

    Any number of late collisions greater than one per physical
    connect/disconnect cycle is too many.

    Been there, done that... You are in a maze of twisty little passages,
    all of them the same... Good luck and good hunting!
     
    Vincent C Jones, Jan 20, 2004
    #12
  13. Workload? yes. Budget for equipment? No. On a good day, I'm allowed
    to buy a patch cable.
     
    Trent Collicutt, Jan 20, 2004
    #13
  14. 4 switches connected together via Gig ports. The traffic on the line
    going to the backbone is about 4Mbps, on average.

    I'm not really expecting a huge speed increase. The capacity of the
    core router may not handle any more, until I install the 6509. I just
    didn't expect some of these machines to have such a drastic slow down.

    I thought about cabling. It checks out fine. 10 Half settings don't
    bring it up to what it was with the hubs. Some are slower, most don't
    notice a difference, and none really see any improvement at all.

    I'm hoping it turns out to be server related.
     
    Trent Collicutt, Jan 20, 2004
    #14

  15. Actually, I have found that in one particular case, involving a Netware
    server, I changed the duplex to half and the throughput shot up. I put
    it back to full duplex and the throughput has stayed higher: 2 min
    for a 40 Mbyte transfer, rather than 37. I've noticed that some of
    the servers, even if the duplex is set in both the OS and the NIC's
    hardware, spontaneously change their mind about what duplex they
    should be at.
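
    For scale, the two transfer times quoted above work out as follows.
    This is a quick Python calculation, assuming 1 Mbyte means 10^6 bytes
    and ignoring protocol overhead.

```python
def throughput_mbps(megabytes: float, minutes: float) -> float:
    """Average throughput in Mbit/s for a transfer of the given size and time."""
    return megabytes * 8 / (minutes * 60)


before = throughput_mbps(40, 37)   # the slow case: 37 minutes
after = throughput_mbps(40, 2)     # after toggling duplex: 2 minutes

print(f"37 min: {before:.2f} Mbit/s")    # prints "37 min: 0.14 Mbit/s"
print(f" 2 min: {after:.2f} Mbit/s")     # prints " 2 min: 2.67 Mbit/s"
print(f"speed-up: {after / before:.1f}x")  # prints "speed-up: 18.5x"
```

    Both figures are far below what a 100 Mbps link can carry, so even the
    faster result leaves room for a remaining bottleneck somewhere other
    than the raw line rate.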
     
    Trent Collicutt, Jan 20, 2004
    #15
  16. I read the paper. Interesting. As for pipelined protocols, one of
    the applications is a database app. The programmer seems to have
    decided UDP and SQL commands directly from the client to the database
    server with explicit acknowledgement of each packet was the most
    efficient way of doing things. This app has slowed drastically. The
    file transfer problems look like inefficiencies in the Netware
    client.
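
    An application that waits for an acknowledgement of each packet before
    sending the next can have only one packet in flight per round trip, so
    its throughput is bounded by packet size divided by round-trip time.
    The Python sketch below shows that relationship. The payload size and
    round-trip times are made-up illustrative values, not measurements from
    this network.

```python
def stop_and_wait_mbps(payload_bytes: int, round_trip_ms: float) -> float:
    """Upper bound on throughput with one packet in flight per round trip.

    bits / microseconds gives Mbit/s.
    """
    return payload_bytes * 8 / (round_trip_ms * 1000)


PAYLOAD_BYTES = 1000  # hypothetical payload per request/ack exchange

for rtt_ms in (0.5, 1.0, 2.0):  # hypothetical round-trip times
    rate = stop_and_wait_mbps(PAYLOAD_BYTES, rtt_ms)
    print(f"RTT {rtt_ms:.1f} ms -> at most {rate:.1f} Mbit/s")
```

    Under this model the link speed does not appear at all, only the round
    trip does. Doubling the round-trip time halves the ceiling, which is
    one way small added per-hop delays can show up as a large slowdown.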

    Your charts seem to indicate that if I change from a 100 meg backbone
    to a 1 Gig backbone, or etherchanneled 2 Gig, my users will probably
    complain of slower application responses.

    As for ideas about the lines to the switch stacks being a bottleneck,
    we use one core RSM for routing. When the total network traffic hits
    about 40 Meg, the network slows to a crawl. I think this might be why
    performance didn't increase. That is being dealt with. I just found
    it a surprise with the slowdown. Explainable? yes. Just not
    expected.

    Someone suggested a consultant. We did that before we started the
    installs. I designed the system, and the consultant came in and said
    "That'll work"
     
    Trent Collicutt, Jan 20, 2004
    #16
  17. Hello, Trent!
    You wrote on 20 Jan 2004 15:09:31 -0800:

    TC> As for ideas about the lines to the switch stacks being a
    TC> bottleneck, we use one core RSM for routing. When the total
    TC> network traffic hits about 40 Meg, the network slows to a
    TC> crawl. I think this might be why performance didn't increase.
    TC> That is being dealt with. I just found it a surprise with the
    TC> slowdown. Explainable? yes. Just not expected.

    What Supervisor Engine do you have on the switch where RSM is installed? Is it
    equipped with NFFC card?
    My gut feeling is that you are using RSM as a router-on-a-stick right now. If
    you have NFFC equipped Supervisor you may try to switch to MLS instead. Here is
    the link with overview -

    http://www.cisco.com/en/US/products/sw/iosswrel/ps1831/products_configuration_guide_chapter09186a00800ca6cf.html

    With best regards,
    Andrey.
     
    Andrey Tarasov, Jan 21, 2004
    #17
  18. Trent Collicutt

    Hansang Bae Guest

    Pretty common with crappy network drivers (NLMs). For example, I've
    sworn off of 3Com POS NICs. They had a problem with the 3c509b variety
    and never admitted to it, even after extensive testing and proof that I
    provided. RMA'ed the entire lot. Bastards.

    As to why some servers work better at HD... the collision mechanism is a
    time-tested throttle mechanism. Sometimes, servers choke when the lanes
    getting to them are too wide open.

    Looking at the NIC stats on the MONITOR screen can determine if this is
    happening or not.

    --

    hsb

    "Somehow I imagined this experience would be more rewarding" Calvin
     
    Hansang Bae, Jan 21, 2004
    #18
  19. Supervisor II in the core switch with RSM, and there is no MLS
    running. The other buildings use 5000s with Sup 1 engines. They are
    being replaced with 3750 stacks and the 5000 with the RSM is being
    replaced with a 6509 with MSFC and PFC with MLS running. Ultimately
    we plan to run gig fibre lines from the 6509 to each 3750, and use a
    2950G floor switch, with fibre lines running to each of the building
    3750s.

    There will be the same number of switch hops as now, but I hope if we
    get this far, we don't have to leave hubs in the wiring closets.
     
    Trent Collicutt, Jan 21, 2004
    #19
  20. Trent Collicutt

    CiscoTech Guest

    > Supervisor II in the core switch with RSM, and there is no MLS

    Funny you should bring that up.....
    We were running a 6506 with the MSM (Multilayer Switching Module, which
    provided layer 3 routing).
    When one more 500 node network was added, the whole thing slowed to a crawl.

    After several calls to TAC, we found out that the MSM module for the Sup1 was
    based on the 7000 router chipset. The packet rate increased just enough when
    the last network was added that it brought the whole network down.....

    An upgrade to the SUP2 and the MSFCII module fixed us up.

    The CPU on the MSM module was over 50%.
    Someone had asked if MLS was turned on...Turning on the MLS could possibly hurt
    instead of help...on the MSM module, enabling MLS increased the CPU load and
    hurt the overall performance of the core switch. (Just the opposite of what one
    would think...I tried it)

    Just a thought.....
     
    CiscoTech, Jan 22, 2004
    #20