complex design issue...

Discussion in 'Cisco' started by Jason, Mar 23, 2005.

  1. Jason

    Jason Guest

    ----enet---- : denotes Ethernet
    ----e1---- : denotes E-1 leased facility

    Two main sites:
    R1a----enet----R1b----e1----R34a----enet----R34b

    Chain of routers:
    Group one:
    R2----e1----R3----e1----R4----e1----R5----e1----R6----e1----R7----e1----R8

    Group two:
    R8----e1----R9----e1----R10----e1----R11----e1----R12----e1----R13----e1----R14

    Group three:
    R14----e1----R15----e1----R16----e1----R17----e1----R18----e1----R19----e1----R20----e1----R21

    Group four:
    R21----e1----R22----e1----R23----e1----R24----e1----R25----e1----R26----e1----R27----e1----R28----e1----R29

    Group five:
    R29----e1----R30----e1----R31----e1----R32----e1----R33

    Each of the start and end routers in each group also has connections to the
    main sites, like this:

    R2----e1----R1a
    R8----e1----R1a and ----e1----R34a
    R14----e1----R1b and ----e1----R34b
    R21----e1----R1a and ----e1----R34a
    R29----e1----R1b and ----e1----R34b
    R33----e1----R34b

    [Crude ASCII-art diagram of the topology listed above: all connections E-1
    except the Ethernet links R1a-R1b and R34a-R34b.]

    I can’t change the E1 connections (customer provided).

    possibilities:
    OSPF area 0 is the R1a-R1b-R34a-R34b chain of routers.
    OSPF area 1 is R1a-R2-R3-R4-R5-R6-R7-R8-R1a
    OSPF area 2 is R1a-R8-R9-R10-R11-R12-R13-R14-R1b-R1a
    OSPF area 3 is R1b-R14-R15-R16-R17-R18-R19-R20-R21-R34a-R1b
    OSPF area 4 is R34a-R21-R22-R23-R24-R25-R26-R27-R28-R29-R34b-R34a
    OSPF area 5 is R34b-R29-R30-R31-R32-R33-R34b

    How to handle the R1a-R21, R1b-R29, R34a-R8, and R34b-R14 connections? Area 0?
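    A quick way to answer "which links are left over?" is to transcribe the
    proposed rings and diff them against the physical link list. The sketch below
    is Python; the helper names (`link`, `run`, `path`, `ring`) are made up for
    illustration, and the link list is transcribed from the topology above. It
    reports links that belong to no proposed area, and links that two ring
    descriptions both claim. In basic OSPF each interface sits in exactly one
    area, so every shared link still needs a single owner.

```python
from collections import defaultdict

def link(a, b):
    """Undirected link between two routers."""
    return frozenset((a, b))

def run(lo, hi):
    """Routers R<lo>..R<hi> in order (hypothetical helper)."""
    return [f"R{i}" for i in range(lo, hi + 1)]

def path(routers):
    """Links between consecutive routers in a list."""
    return {link(a, b) for a, b in zip(routers, routers[1:])}

def ring(routers):
    """Closed ring: consecutive links plus last back to first."""
    return path(routers + routers[:1])

# Every physical link from the post: core, five chains, core-to-chain circuits.
ALL_LINKS = path(["R1a", "R1b", "R34a", "R34b"])
for lo, hi in [(2, 8), (8, 14), (14, 21), (21, 29), (29, 33)]:
    ALL_LINKS |= path(run(lo, hi))
ALL_LINKS |= {link("R2", "R1a"), link("R8", "R1a"), link("R8", "R34a"),
              link("R14", "R1b"), link("R14", "R34b"), link("R21", "R1a"),
              link("R21", "R34a"), link("R29", "R1b"), link("R29", "R34b"),
              link("R33", "R34b")}

# Proposed areas, transcribed from the "possibilities" list.
AREAS = {
    0: path(["R1a", "R1b", "R34a", "R34b"]),
    1: ring(["R1a"] + run(2, 8)),
    2: ring(["R1a"] + run(8, 14) + ["R1b"]),
    3: ring(["R1b"] + run(14, 21) + ["R34a"]),
    4: ring(["R34a"] + run(21, 29) + ["R34b"]),
    5: ring(["R34b"] + run(29, 33)),
}

# Map each link to the list of areas that claim it (areas visited in order).
areas_of = defaultdict(list)
for area, links in AREAS.items():
    for l in links:
        areas_of[l].append(area)

uncovered = ALL_LINKS - set(areas_of)
shared = {l: a for l, a in areas_of.items() if len(a) > 1}

for l in sorted(uncovered, key=sorted):
    print("no area:", "-".join(sorted(l)))
for l, a in sorted(shared.items(), key=lambda kv: sorted(kv[0])):
    print("claimed by areas", a, ":", "-".join(sorted(l)))
```

    Against the link list above, the leftovers come out as R1a-R21, R1b-R29,
    R34a-R8 and R34b-R14, and seven links are claimed twice (for example R1a-R8
    by both area 1 and area 2).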

    I have strong doubts that this is going to work with any routing protocol. For
    example, how do I make traffic originating at R5 default to the R4 link? With a
    floating static, I think.

    How will a failure of R2 propagate? R3 should reroute traffic to R4, R4 to R5,
    etc. until R8, which should send it to R1a or R34a, depending on the ultimate
    destination (other networks on the other side of R1 and R34).

    The idea is that a failure of any one router will affect only that site, no
    others. Or, if Site 1 goes down, all traffic can go to Site 34 via one of the
    other connections. Let’s say Site 1 is dead and the connection from R8 to R34a
    is also down; then traffic from R2 thru R13 should get forwarded to R14 (via
    the chain of routers), which will then forward it to R34b. So, in general, the
    concept is a highly resilient network. I just have doubts that the daisy-chain
    of routers is going to perform anywhere near what the customer wants.
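    The "any single failure affects only that site" goal can be checked on paper
    before any routing protocol is involved. Below is a minimal Python sketch,
    assuming only the physical links listed above; it says nothing about OSPF
    convergence or timers. A breadth-first search outward from the surviving core
    routers reports which surviving routers are cut off. `chain` and
    `stranded_after` are hypothetical helper names.

```python
from collections import deque

def chain(lo, hi):
    """Links R<lo>-R<lo+1>-...-R<hi> for one group of the daisy chain."""
    return [(f"R{i}", f"R{i + 1}") for i in range(lo, hi)]

# Physical links as listed in the post (Ethernet core + E1s).
LINKS = [("R1a", "R1b"), ("R1b", "R34a"), ("R34a", "R34b")]
LINKS += chain(2, 8) + chain(8, 14) + chain(14, 21) + chain(21, 29) + chain(29, 33)
LINKS += [("R2", "R1a"), ("R8", "R1a"), ("R8", "R34a"), ("R14", "R1b"),
          ("R14", "R34b"), ("R21", "R1a"), ("R21", "R34a"), ("R29", "R1b"),
          ("R29", "R34b"), ("R33", "R34b")]
CORE = {"R1a", "R1b", "R34a", "R34b"}
ROUTERS = {r for pair in LINKS for r in pair}

def stranded_after(dead_routers=frozenset(), dead_links=frozenset()):
    """Surviving routers with no path to any surviving core router."""
    dead_links = {frozenset(l) for l in dead_links}
    adj = {r: set() for r in ROUTERS - set(dead_routers)}
    for a, b in LINKS:
        if a in adj and b in adj and frozenset((a, b)) not in dead_links:
            adj[a].add(b)
            adj[b].add(a)
    seen = {r for r in CORE if r in adj}
    queue = deque(seen)
    while queue:  # breadth-first search outward from the core
        for nxt in adj[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return set(adj) - seen

single = {r: stranded_after({r}) for r in ROUTERS}
print(all(not s for s in single.values()))  # True
print(sorted(stranded_after({"R1a", "R8"})))  # ['R2', 'R3', 'R4', 'R5', 'R6', 'R7']
print(stranded_after({"R1a", "R1b"}, {("R8", "R34a")}))  # set()
```

    With this link list no single router failure strands anyone; the "Site 1 dead
    and R8-R34a down" case still leaves R2 thru R13 a path via R14 and R34b, while
    losing R1a and R8 together isolates R2 thru R7. Physical connectivity is only
    the necessary condition; whether the routing protocol finds those paths within
    the time budget is the harder question.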

    The customer also has a requirement for recovery (i.e. re-convergence) from any
    router/link failure in at most _eighteen_ seconds. This network will support a
    life/safety application. (Hmm, doesn’t Cisco have a ‘don’t sue us if you use
    this in a life/safety network and someone dies’ clause in the license agreement?)

    Comments and suggestions welcome. I can’t provide much more technical info, as
    an NDA applies. Opportunities to re-connect the routers differently are
    severely limited.

    Don’t even ask about the IP address structure the customer proposed (and has
    written proprietary device code to) that simply will not work.

    Jason.

    remove the obvious to get my email........
     
    Jason, Mar 23, 2005
    #1

  2. thrill5

    thrill5 Guest

    Whoever designed this network forgot the golden rule, the KISS principle
    (Keep It Simple, Stupid). I have never been a fan of OSPF because it doesn't
    scale without a lot of manual configuration. EIGRP scales extremely well
    and would work fine in this network. The biggest problem I see is that
    there are too many paths and too much redundancy. After a certain point,
    adding more redundancy does not increase reliability; it decreases it, because
    the routing becomes very complex (as in this scenario) and makes
    troubleshooting difficult. If going the EIGRP path, use summary routes on as
    many links as you can, as this will keep the convergence time low.

    Scott
     
    thrill5, Mar 24, 2005
    #2

  3. Jason

    Jason Guest

    Scott, thanks.

    Well, more bad news for this network. Ultimately, it will not be Cisco
    equipment, so EIGRP is out. Cisco will be used for the lab and pilot. Some
    'ruggedized' router will be used for production, made by a company from which
    I have so far been unable to find a successful router product.

    The topology does a pretty good job of providing circuit redundancy while
    minimizing the number of E1s required. Also, this network is geographically
    linear. Think of, oh, say, exits along an interstate (that's not what this is,
    but the linear nature is pretty close). Site 1 and Site 34 are at the two ends.
    Travelling along the interstate, you need to communicate with local devices to
    find out if the way ahead is clear. This is done via a parallel RF network that
    interfaces to this network at Site 1 and Site 34. Based on your location, you
    talk to a local device.

    A very large problem is actually a result of all the redundancy: when Site 1
    goes down, and the traffic from Site 2 propagates down the chain of routers,
    finally getting to Site 8, where, oops, the connection between area 1 and area
    0 is broken, how do we pass the traffic out of area 1 into area 2 and get it
    forwarded to Site 34?

    Ow, my head begins to hurt.

    I think I'm going to have to figure out how to tell the customer this design
    (dreamt up by non-IP-network types) simply will not work, and ask them to give
    me a week to come up with a decent solution. But they are not the ultimate
    customer; he/she/they/it are half a world away, and they control the E1
    provisioning.

    More ideas?

    Jason.
     
    Jason, Mar 24, 2005
    #3
  4. stephen

    stephen Guest

    well - maybe.

    I didn't see the original post, but 30+ routers is fine in a single OSPF
    area.

    > EIGRP scales extremely well

    Scaling with OSPF usually gets to be an issue once there are multiple areas
    (because OSPF is only a link state protocol within an area, not between
    them).

    so the 1st Q should be - do I need areas at all?

    it doesn't seem to make much sense to deploy a protocol optimised for complex
    networks and then confine the resilient protocol to subsets of the network
    that aren't really resilient.

    the main reason in this network may be that the designer wants to summarise
    routes at area boundaries - again you need to check if the scale is such
    that you care to add the complexity and the risk that the summarisations
    will break the resilience.

    golden rule: set up the routing protocol properly, put in statics as close
    as possible to where the static "starts" in the cloud, and let it propagate.
    if you need lots of statics for similar routes scattered across the network,
    then the design is wrong.
    1 area fixes this.
    the reliability is going to depend on link reliability (which, if they are
    radio, is a problem, or if they are off an SDH / fibre link system, should be
    fine) - a decent modelling package can let you figure out what is likely.

    you will have to alter the default timeouts - the dead timer + convergence
    is going to have to be sub 18 sec combined, and the default dead timer is
    around 40 sec. Cisco also only re-evaluates the link state database every
    few seconds.
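    That arithmetic can be written as a rough serial budget: time to declare the
    neighbour dead, plus LSA flooding along the chain, plus the SPF delay, plus
    installing the new routes. A minimal Python sketch follows. Only the 40 s
    dead timer and the 18 s requirement come from this thread; the 5 s SPF
    delay, 10 hops, 0.1 s per hop and 0.5 s route-install time are placeholder
    assumptions, and the production (non-Cisco) routers may behave quite
    differently.

```python
def convergence_estimate(dead_interval, spf_delay, hops, per_hop_flood, fib_update=0.0):
    """Serial worst-case estimate in seconds: detect, flood, compute, install.

    Every input is an assumption to be replaced with measured values.
    """
    detection = dead_interval          # neighbour declared down
    flooding = hops * per_hop_flood    # LSA crosses the chain hop by hop
    return detection + flooding + spf_delay + fib_update

BUDGET = 18.0  # customer's maximum re-convergence time, seconds

scenarios = {
    # 40 s dead timer from the thread; the other numbers are placeholders.
    "default-ish timers": convergence_estimate(40, 5, hops=10, per_hop_flood=0.1),
    "tuned timers": convergence_estimate(4, 1, hops=10, per_hop_flood=0.1, fib_update=0.5),
}
for name, seconds in scenarios.items():
    verdict = "within" if seconds <= BUDGET else "OVER"
    print(f"{name}: ~{seconds:.1f} s ({verdict} the {BUDGET:.0f} s budget)")
```

    With these placeholders the default-ish case lands around 46 s and the tuned
    case around 6.5 s; the dead timer dominates either way, which is why it is
    the first thing to tune.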

    maybe you need a reliable underlying network? SDH is the standard carrier
    tool for this, and if you have the capacity and money and the right kind of
    underlying pipes, you can get link recovery in 50 ms.

    high speed convergence in ISP backbones often uses IS-IS, where the hello
    times can be sub-second - not sure if that is practical with these link
    speeds though - you don't want minor hits or flapping links to cripple the
    network.
     
    stephen, Mar 24, 2005
    #4
  5. Vincent C Jones

    Interesting challenge. You'll probably get lots of recommendations to
    "just use EIGRP" even though this is a classic worst-case topology
    for EIGRP. I'd stick with a link-state protocol (OSPF or IS-IS).

    Unless your diagram is only a small sample of the final
    implementation, you can resolve your concerns with OSPF simply
    by expanding backbone area zero to include the second layer of
    interconnects (so it contains R1a, R1b, R34a, R34b, R2, R8, R14, R21,
    R29, and R33). The assignment of other areas then becomes obvious.

    You will need to tune the timers to get your recovery time within
    spec. This could be a challenge if the network expands by an
    order of magnitude, as you need to not only speed up the hello/dead
    times, but also the database update and propagate timers. If the
    previous sentence is not "preaching to the choir" and this really
    is a life/safety network, get some help from a competent consultant
    before you make any commitments.

    Also, don't forget the monitoring and management side of the network:
    the topology cannot tolerate multiple failures, so any time a link
    or site fails, you're playing "beat the clock" to get it fixed
    before the next link or site fails. Note that no matter how good a
    job you do, you're looking at a finite probability of disconnects
    of working systems. A routing protocol is not a substitute for
    disaster recovery planning, nor does it take a disaster to be a
    disaster from the networking viewpoint.

    Good luck and have fun. And make sure your liability insurance is up to date.
     
    Vincent C Jones, Mar 24, 2005
    #5
  6. Hansang Bae

    Hansang Bae Guest


    Surely you jest. This is *THE* scenario where I would stay away from
    EIGRP.


    --

    hsb


    "Somehow I imagined this experience would be more rewarding" Calvin
    **************************ROT13 MY ADDRESS*************************
    Due to the volume of email that I receive, I may not be able to
    reply to emails sent to my account. Please post a followup instead.
    ********************************************************************
     
    Hansang Bae, Mar 24, 2005
    #6
  7. Jason

    Jason Guest

    Vincent, and everyone else...
    Well, it won't be EIGRP because the ultimate production network will use
    ruggedized routers, not from Cisco (a whole 'nuther ball of string, as they
    say). Going w/ OSPF so far. Both you and HSB have said that this is a "classic
    worst-case" for EIGRP (or similar words...) Why?
    This is the complete network that is of concern. There are some other parts that
    feed this one, but they are trivial at this point (ISDN/PRI access from an RF
    network controller at site 1 and site 34, for example). I had planned that Area
    0 be the four core routers (two at Site 1 and two at Site 34) and all their
    interfaces, as well as the interface on the router at the other end of those
    circuits. Then as you say, the other areas become obvious. However, I have
    concerns that if/when a failure occurs in such a way as to make an area
    disconnected from area 0 (say, lose the circuits from R2 and R8 to area 0),
    then there is a path via area 2, but there is no longer a link from area 1 to
    area 0. So, virtual link, you say? ahh, but how to make that dynamic, so it
    comes into play _only_ under certain failure scenarios? I haven't figured that
    out yet.
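    The detached-area case can be checked mechanically. Below is a simplified
    Python sketch of the plan just described (area 0 = core links plus every
    circuit from the core to a chain end; areas 1 to 5 = the five chains). It
    flags any non-backbone area in which no router still has a live area 0 link.
    It deliberately ignores whether area 0 itself stays contiguous and whether
    the area is internally connected, and the helper names are hypothetical.

```python
def link(a, b):
    """Undirected link between two routers."""
    return frozenset((a, b))

def chain(lo, hi):
    """Links R<lo>-R<lo+1>-...-R<hi>."""
    return {link(f"R{i}", f"R{i + 1}") for i in range(lo, hi)}

# Area 0 per the plan: core links plus every circuit from the core to a chain end.
AREA_LINKS = {0: {link("R1a", "R1b"), link("R1b", "R34a"), link("R34a", "R34b"),
                  link("R2", "R1a"), link("R8", "R1a"), link("R8", "R34a"),
                  link("R14", "R1b"), link("R14", "R34b"), link("R21", "R1a"),
                  link("R21", "R34a"), link("R29", "R1b"), link("R29", "R34b"),
                  link("R33", "R34b")}}
# Areas 1..5 are the five daisy-chain groups.
for area, (lo, hi) in enumerate([(2, 8), (8, 14), (14, 21), (21, 29), (29, 33)], start=1):
    AREA_LINKS[area] = chain(lo, hi)

def detached_areas(failed_links=()):
    """Non-backbone areas with no router that still has a live area 0 link."""
    failed = {link(a, b) for a, b in failed_links}
    backbone_routers = set().union(*(AREA_LINKS[0] - failed))
    detached = set()
    for area, links in AREA_LINKS.items():
        if area == 0:
            continue
        routers = set().union(*(links - failed))
        if not routers & backbone_routers:
            detached.add(area)
    return detached

print(detached_areas())  # set()
print(detached_areas([("R2", "R1a"), ("R8", "R1a"), ("R8", "R34a")]))  # {1}
```

    Losing R2's and R8's core circuits detaches area 1 while area 2 stays
    attached through R14, so the textbook OSPF fix would be a virtual link across
    area 2 between R8 and R14. Running the same check over every single and
    double link failure is a cheap way to list exactly which cases would need one.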

    So, my three options right now are: A) everything in Area 0, B) split up into
    multiple areas and deal with virtual links and redundant/fallback paths, or C)
    come up with a new topology.

    Well, the last statement first: I _am_ the competent consultant. Unfortunately,
    I was brought into this design well over 3/4 of the way into the project. My
    customer has already sold the design to their customer.

    Yes, had a long conversation today with my customer and his management. The one
    thing that could get the end customer to come up with more E-1s is the inability
    to get this network to converge/recover quickly enough in the event of an
    outage. Another earlier reply alluded to the quality of the E1s as they are
    being provided by the customer off their own SDH network. We discussed how a
    dynamic routing protocol works, how changes are not immediately transmitted to
    everyone, how things have to expire and propagate, etc. How one router tells
    another, who has to do things before telling another, etc. I also brought up
    that the hello timers, hold-down timers, garbage-collect timers, etc. can be
    changed, but finding a fine balance between convergence time and
    overwhelming the network with OSPF overhead could be touchy. The life/safety
    part comes in because the ultimate device receiving information from this
    network is self-powered, and very heavy, on very fixed pathways, and very long,
    and very hard to stop in a short distance. (did that give you enough hints
    without violating the NDA? :) )
    I brought this up again today as well. The documents I'm working from have no
    mention of management, security, performance monitoring, "nuth'n". I made some
    recommendations. Will have to see where they go. My customer has thought about a
    couple of these things, but I don't think they gave it much more than, "oh, the
    equipment vendor has software for that...." All I can do is make the
    recommendations that they examine the issues. Your other point, that a routing
    protocol is not a substitute for disaster recovery planning, is well understood
    (by me, at least; I'm trying to get that through to my customer...)
    Yeah, I hear that about the insurance. The good thing is that the ultimate
    implementation is in another hemisphere, and I doubt that that country much
    cares about some round-eye's legal liability - they tend to execute their own
    people rather easily. :=<

    Thanks for the comments. I'm actually hoping I can get this contract over with
    quickly, because it is such a nightmare. Lesson to self: look closer at the
    scenarios during the interview process.....

    Jason.
     
    Jason, Mar 25, 2005
    #7
  8. Walter Roberson

    :. So, virtual link, you say? ahh, but how to make that dynamic, so it
    :comes into play _only_ under certain failure scenarios? I haven't figured that
    :out yet.

    Read Vincent's book ;-)

    I'm bogged down a bit at the moment on the chapter "50 Ways to Lose
    Your LAN (and Survive)"
     
    Walter Roberson, Mar 25, 2005
    #8
