linux networking buffer

Discussion in 'Linux Networking' started by Surinder, Mar 30, 2013.

  1. Surinder

    Surinder Guest


    A) RX count as seen and set by ethtool.
    B) sysctl -w net.core.netdev_max_backlog=3000
    C) sysctl -w net.core.rmem_max=256000 along with setsockopt(SO_RCVBUF)

    on transmit side
    D) TX count as seen and set by ethtool
    E) txqueuelen as seen and set by ifconfig
    F) sysctl -w net.core.wmem_max and setsockopt(SO_SNDBUF)

    I guess:
    A,D are set in the NIC EEPROM and apply to NIC memory,
    or to the NIC driver (host non-pageable memory)?
    B,E are kept in the kernel, outside any device driver.
    C,F are kept in the kernel networking stack (not part of the NIC driver).

    A,D changes apply to that one NIC only.
    B,E are about system-wide queue sizes.
    C,F allow per-socket buffer size tuning.
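
Each sysctl key named above is also exposed as a file under /proc/sys, with the dots replaced by slashes (net.core.rmem_max is /proc/sys/net/core/rmem_max). The following is a minimal sketch in C, assuming Linux with /proc mounted, of reading such a value from a program. The helper name read_long_from_file is hypothetical and is not part of any program discussed in this thread.

```c
#include <stdio.h>

/*
 * Read a single integer from a text file such as
 * /proc/sys/net/core/rmem_max.
 * Returns the value, or -1 if the file cannot be opened or parsed.
 */
long read_long_from_file(const char *path)
{
    FILE *f = fopen(path, "r");
    long value = -1;

    if (f == NULL)
        return -1;
    if (fscanf(f, "%ld", &value) != 1)
        value = -1;
    fclose(f);
    return value;
}
```

For example, read_long_from_file("/proc/sys/net/core/netdev_max_backlog") would return the current backlog limit (B above). The ethtool ring sizes (A, D) and txqueuelen (E) are not sysctl values and are not covered by this sketch.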

    I wrote a ping program in C on Linux that sends ICMP echo requests to
    1000 hosts. I got around 170 drops in a single iteration, i.e. without
    retry.
    These are my buffer values:
    RX(200), backlog(1000), net.core.rmem_max = 131071, SO_RCVBUF: 262142

    A packet sniffer saw all 1000 reply packets.
    Packet size is about 100 bytes on the wire and 84 bytes with the IP
    header, so 262142 should easily hold 2600 packets without a drop.
    Changing RX made no difference.
    On increasing rmem_max, though, all 1000 are received.
    There was no other traffic to the collector (as verified by the packet
    capture tool).

    - Surinder
    Surinder, Mar 30, 2013

  2. Surinder

    Rick Jones Guest

    Except that Linux socket buffer limits are applied to not just the
    packet headers/data but also the size of the buffer(s) used to hold
    the data. So, if the NIC driver happens to post 2048 byte buffers to
    the card, those 262142 bytes would be able to hold no more than 127
    packets. So, my guess would be that your receiver wasn't keeping-up
    with all the replies coming-in and lost the race a few times.
    Suggesting that the "queue" which overflows is not the NIC's RX
    That would be consistent then with the increased rmem_max size being
    sufficient to cover those cases where your receiver wasn't able to
    keep-up for a short length of time.

    If your application is simply sending all 1000 requests and *then*
    looking to receive/process replies, you should alter the logic of your
    application to be able to also look for replies as you are sending

    rick jones
    Rick Jones, Apr 5, 2013

  3. Surinder

    Tauno Voipio Guest

    I'm not sure if it is wise to continue handholding the OP. The task
    at hand smells a bit too much of first stage of a spam-o-bot.

    Luckily it seems that he's not too much up to the task.
    Tauno Voipio, Apr 6, 2013
  4. Surinder

    Rick Jones Guest

    Oh, back when my primary workstation was an HP-UX 11.11-running HP9000
    J5600 I used the same version of browser/email client for years at a
    time. I wasn't being forced-upgraded to a new version every couple

    rick jones
    Rick Jones, Apr 8, 2013
  5. Surinder

    Surinder Guest

    Thanks for the reply.
    The SO_RCVBUF of 262142 probably comes into the picture after the kernel
    has received the packet and is putting it into the socket-specific
    queue. I understand you are suggesting that the total packet capacity is
    calculated from some constant per-packet size. But that would waste
    memory when the packets are small. As the maximum TCP or UDP packet size
    is 65K, a buffer of 262142 could hold only 4 packets of 65K each.
    My program works properly with the bigger buffer size; I just want to
    optimise the buffer size. A huge rmem_max could mean more non-pageable
    memory reserved for the kernel itself. Not that I am running low on RAM,
    but I want to be conservative.
    I could get the required reliability and performance with
    all-sends-then-all-recvs, which keeps the code simpler and more readable.
    If I hit the wall, I would run a sender thread and a receiver thread in
    parallel, which would ease the load on the queues, even though it would
    add the complexity of synchronizing.

    I also wanted to understand the queue sizing mechanism between the Linux
    socket sender and receiver.

    This is my Linux info though I intended to ask for Linux in general.
    Linux khyber 2.6.13-15.18-default #1 Tue Oct 2 17:36:20 UTC 2007 i686
    i686 i386 GNU/Linux

    Thanks Again.
    - Surinder

    Surinder, Apr 9, 2013
  6. Surinder

    Surinder Guest


    This is my Linux info though I intended to ask for Linux in general.
    2.6.13-15.18-default #1 Tue Oct 2 17:36:20 UTC 2007 i686 i386 GNU/Linux

    Surely a lot has changed in the Linux kernel since then.

    How would the Thunderbird make or version relate to my question? :)

    - Surinder
    Surinder, Apr 9, 2013
  7. Surinder

    Surinder Guest


    A tool has no intent other than doing its job.
    It's the person who has intents.
    And it is a different thing to guess at the intents of others.
    And to guess at their capacity to follow through on those intents.
    And to do that on a forum meant for understanding how a tool is made
    and how it works.
    And to act as a wet blanket.

    I hate spam more than I hate mosquitoes.
    Spam has (despite all the anti-spam tools) caused a lot of irreversible
    damage to the Internet, the worst affected being NNTP newsgroups.
    Sometimes a valid email ends up in the spam box.

    The concerned application is ping check for a set of devices just for

    - Surinder
    Surinder, Apr 9, 2013
  8. Surinder

    Rick Jones Guest

    Socket buffers in general, and things like net.core.[rw]mem_max
    specifically are not preallocations. They are limits. So, an
    SO_RCVBUF of 256KB is not consuming 256 KB of memory unless there are
    actually data/packets waiting therein.

    For the inbound data path, the allocations are actually made by the
    NIC driver when it allocates buffers to post to the NIC for inbound
    DMA. The strategies employed for buffer sizes there will vary from
    NIC to NIC, and will depend on the NIC's programming model.

    rick jones
    Rick Jones, Apr 9, 2013
  9. Surinder

    Surinder Guest

    Good to hear that.

    - Surinder
    Surinder, Apr 11, 2013
  10. Surinder

    Surinder Guest

    The following kernel code answers a few of my questions
    (in line with what Rick said):
        sk->sk_userlocks |= SOCK_RCVBUF_LOCK;
        /*
         * We double it on the way in to account for
         * "struct sk_buff" etc. overhead. Applications
         * assume that the SO_RCVBUF setting they make will
         * allow that much actual data to be received on that
         * socket.
         *
         * Applications are unaware that "struct sk_buff" and
         * other overheads allocate from the receive buffer
         * during socket buffer allocation.
         *
         * And after considering the possible alternatives,
         * returning the value we actually used in getsockopt
         * is the most desirable behavior.
         */
        sk->sk_rcvbuf = max_t(u32, val * 2, SOCK_MIN_RCVBUF);
        break;
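
The doubling described in that comment can be observed from user space by setting SO_RCVBUF and reading it back. The following is a minimal sketch in C, assuming Linux; set_and_get_rcvbuf is a hypothetical helper name. On Linux the requested value is first capped at net.core.rmem_max and then doubled, which would be consistent with the values reported earlier in the thread (rmem_max = 131071, SO_RCVBUF: 262142).

```c
#include <sys/socket.h>

/*
 * Ask for `requested` bytes of receive buffer on fd, then read back
 * the value the kernel actually recorded for the socket.
 * Returns the effective value, or -1 if either call fails.
 */
int set_and_get_rcvbuf(int fd, int requested)
{
    int effective = 0;
    socklen_t len = sizeof effective;

    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
                   &requested, sizeof requested) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len) < 0)
        return -1;
    return effective;
}
```

The returned number is a limit on memory charged to the socket, including per-packet bookkeeping, so it should not be read as the number of payload bytes the socket can queue.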

    As my packets carry less than 100 bytes of data, the sk_buff overhead is
    proportionally quite high. Reading the struct sk_buff definition, its
    size on a 32-bit system appears to be around 200 bytes.
    For a packet of 100 bytes on the wire, that means only one-third of
    rcvbuf holds packet data; two-thirds is sk_buff overhead.

    So for SO_RCVBUF: 262142,
    262142/3 = 87380 bytes are available for packet data.
    87380/100 = 873 is the number of packets that can be held in it.

    1000 - 873 = 127 drops out of 1000 are expected.
    What I got was 170, which is not far from the expectation.
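
The estimate above can be written as a small calculation. This is a sketch in C with hypothetical function names; the 200-byte overhead is the poster's visual estimate of sizeof(struct sk_buff) on a 32-bit kernel, not a measured value, and the real per-packet charge also depends on the buffer size the NIC driver allocates.

```c
/*
 * Number of packets that fit under a receive-buffer limit when each
 * packet is charged data_bytes + overhead_bytes against that limit.
 * Returns -1 if the per-packet charge is not positive.
 */
long packets_that_fit(long rcvbuf_bytes, long data_bytes,
                      long overhead_bytes)
{
    long charge = data_bytes + overhead_bytes;

    if (charge <= 0)
        return -1;
    return rcvbuf_bytes / charge;   /* integer division rounds down */
}

/* Replies expected to be dropped if none are read before all arrive. */
long expected_drops(long sent, long capacity)
{
    return sent > capacity ? sent - capacity : 0;
}
```

With the thread's numbers, packets_that_fit(262142, 100, 200) gives 873 and expected_drops(1000, 873) gives 127. The same helper with a flat 2048-byte charge per packet, packets_that_fit(262142, 2048, 0), gives the figure of 127 packets mentioned earlier in the thread.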

    The following chapters/books were really helpful:
    [1] Understanding Linux Network Internals By Christian Benvenuti
    Part III: Transmission and Reception

    [2] Essential Linux Device Drivers by Sreekrishnan Venkateswaran
    Chapter 15. Network Interface Cards

    Thanks all folks here.

    - Surinder
    Surinder, Apr 13, 2013
