Data sent through sockets are *sometimes* inverted ?!!!

Discussion in 'Linux Networking' started by mast4as, Feb 17, 2012.

  1. mast4as

    mast4as Guest

    Hi everyone

    I have a simple client/server app where the client sends data (TCP/IP)
    in the form of squares of pixel colors (3 floats per pixel) to the
    server. I can change the size of the square, but by default I started
    with something like 32x32 pixels. So overall that's packets of
    32*32*3*8 bytes (8 bytes per float) sent to the server. These packets
    are sent to the server for an entire image, which can be, say,
    1024x1024 in resolution. For my testing I set the pixel color to
    always be 1 0 0.1.

    On the client side:
    * 1 write -> 4 integers to specify pos and size of the square in the
    frame
    * 1 write -> square dim^2 * 3 (RGB) * 8 bytes pixel color data

    On the server side I do 2 reads:
    * do a first read to find the dimension and position of the square in
    the frame with 1 read
    * read all the pixels for the current square (square size^2 * 3 (RGB)
    * 8 bytes) with 1 read

    It runs okay but occasionally the squares have wrong values. For a
    block of 32x32 pixels, say, the first 20 pixels have RGB values that
    are 1 0 0.1, then the next 20 pixels have their RGB inverted, say
    0 0.1 1, then the rest of the pixels will be 0.1 0 1, say. This
    problem only happens when the squares reach a certain size. When the
    squares are 8x8 or 16x16.

    So I was wondering if it is possible that when the packet of data sent
    is too big, that the order of the bytes changes when it's read by the
    server. Or is this not supposed to happen at all which would suggest a
    bug in the code (however I really checked that I was writing and
    reading the correct number of bytes so I find that strange).

    If the write/read process doesn't guarantee that the bytes are not
    reordered, does that only happen once the data written and read goes
    over a certain limit? Can I find this limit? In other words, is there
    a limit (number of bytes) under which it is certain that bytes won't
    be reordered? It seems that I can send the data in much smaller
    packets (by sending, say, 8 pixels at a time from a square of 64x64),
    but isn't that rather inefficient?

    Thanks a lot for your help

    -coralie
     
    mast4as, Feb 17, 2012
    #1

  2. mast4as

    Chris Davies Guest

    I can't help but feel you're reinventing an image format and/or part
    of the VNC protocol.

    TCP guarantees ordered delivery. (Or no delivery at all.) So there will be
    nothing at the transport layer that changes the order of your bytes.

    Is your code going to be portable across multiple CPU
    architectures? (Might it need to be so? Consider mobile devices, for a
    start.) If so, you really need to be aware of things such as "network
    byte order".
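
    As a rough sketch of what "network byte order" could mean for the
    four-integer header described in the original post: each field is
    converted with htonl() before it is written and with ntohl() after it
    is read. The struct name square_header, its field names, and the
    16-octet layout are assumptions made for illustration, not the
    poster's actual code.

```c
#include <arpa/inet.h>  /* htonl, ntohl */
#include <stdint.h>
#include <string.h>

/* Hypothetical header: position and size of one square in the frame. */
struct square_header {
    uint32_t x, y, width, height;
};

/* Encode the header into 16 octets, each field in network (big-endian) order. */
void header_encode(const struct square_header *h, unsigned char out[16])
{
    uint32_t fields[4] = { htonl(h->x), htonl(h->y),
                           htonl(h->width), htonl(h->height) };
    memcpy(out, fields, sizeof fields);
}

/* Decode 16 octets received from the socket back into host byte order. */
void header_decode(const unsigned char in[16], struct square_header *h)
{
    uint32_t fields[4];
    memcpy(fields, in, sizeof fields);
    h->x = ntohl(fields[0]);
    h->y = ntohl(fields[1]);
    h->width = ntohl(fields[2]);
    h->height = ntohl(fields[3]);
}
```

    With this layout the octets on the wire are the same whether the
    sender is little-endian or big-endian, so the receiver does not need
    to know which kind of machine produced them.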

    Chris
     
    Chris Davies, Feb 17, 2012
    #2

  3. TCP does not reorder bytes.
    Without seeing your code it's impossible to tell what's going wrong, but
    a very common error is to assume that write() sends everything you ask
    in one go or that each read() corresponds exactly to one write().
     
    Richard Kettlewell, Feb 17, 2012
    #3
  4. mast4as

    Jorgen Grahn Guest

    TCP only deals with series of octets/bytes. The rest is up to your
    application to handle. If you're thinking of it as "sending three
    floats", your approach is wrong. If you cannot describe the protocol
    in terms of octets, your approach is wrong.
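
    One way to describe a single color channel in terms of octets, as a
    sketch only: it assumes the "8 bytes per float" values are 64-bit
    IEEE 754 doubles on both ends, and that the host stores doubles with
    the same byte order as 64-bit integers (true on common platforms, but
    an assumption nonetheless). The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Write a double as 8 octets, most significant octet first.
   Assumes double is a 64-bit IEEE 754 value on both ends. */
void put_double_be(double v, unsigned char out[8])
{
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits);   /* copy the bit pattern, no aliasing tricks */
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(bits >> (56 - 8 * i));
}

/* Rebuild a double from 8 octets laid out by put_double_be(). */
double get_double_be(const unsigned char in[8])
{
    uint64_t bits = 0;
    for (int i = 0; i < 8; i++)
        bits = (bits << 8) | in[i];
    double v;
    memcpy(&v, &bits, sizeof v);
    return v;
}
```

    A pixel would then be 24 octets (R, G, B in that order), and a square
    of dim x dim pixels would be dim*dim*24 octets following the header.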

    [snip]

    I didn't read the rest carefully, but I think this may be the core of
    your problem.

    /Jorgen
     
    Jorgen Grahn, Feb 17, 2012
    #4
  5. mast4as

    Joe Pfeiffer Guest

    And, a particular way this can cause a bug in your code: when you
    write() some number n bytes and then attempt to read() n bytes, there is
    no guarantee you will actually get all n bytes in a single read() -- you
    may well need to make several read() calls to get all the bytes, using
    something like

    for (count = 0; count < n; count += num) {
        num = read(fd, buf + count, n - count);
        if (num < 0) {
            // some sort of error-handling code
            break;
        }
        if (num == 0)
            break;  // peer closed the connection before all n bytes arrived
    }

    (untested off-the-top-of-my-head code but you get the idea)

    If your code is assuming you get all the bytes in one read() (and I do
    read your description as saying that's what you're doing), then part
    of your buffer may contain old values and your next read() may return
    some data from the prior write(). This could cause the sort of symptoms
    you're reporting.

    write() is also not guaranteed to send all the bytes out in a single
    try, so careful programming would require a similar loop for your
    write()s. I've never actually seen a write() do anything but either
    send all the bytes or else fail, however.
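
    The "similar loop" for write() might look like the sketch below. The
    name write_all and its return convention are made up for this
    example; the only calls used are write() and the errno value EINTR.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write exactly n bytes to fd, looping over partial writes.
   Returns 0 on success, -1 on error (errno is left as set by write). */
int write_all(int fd, const void *buf, size_t n)
{
    const unsigned char *p = buf;
    while (n > 0) {
        ssize_t num = write(fd, p, n);
        if (num < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before anything was written: retry */
            return -1;
        }
        p += num;           /* skip past the bytes that were accepted */
        n -= (size_t)num;
    }
    return 0;
}
```

    Even if the loop body only ever runs once in practice, the cost of
    having it is a few lines.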
     
    Joe Pfeiffer, Feb 17, 2012
    #5
  6. mast4as

    Jorgen Grahn Guest

    That wasn't what I was writing about, but you're right, of course.
    With TCP you're responsible for cutting the byte stream into pieces if
    asked to, and reassembling the pieces you read, when needed.
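
    Applied to the header-then-pixels exchange described in the original
    post, the reassembly could be sketched as follows. The 16-byte header
    size and the names read_full and recv_message are assumptions for
    illustration only.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Read exactly n bytes from fd. Returns 0 on success, -1 on error or early EOF. */
int read_full(int fd, void *buf, size_t n)
{
    unsigned char *p = buf;
    while (n > 0) {
        ssize_t num = read(fd, p, n);
        if (num < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before any data arrived: retry */
            return -1;
        }
        if (num == 0)
            return -1;      /* peer closed before the message was complete */
        p += num;
        n -= (size_t)num;
    }
    return 0;
}

/* Hypothetical framing: a 16-byte header, then payload_len bytes of pixel data. */
int recv_message(int fd, unsigned char header[16],
                 unsigned char *payload, size_t payload_len)
{
    if (read_full(fd, header, 16) != 0)
        return -1;
    return read_full(fd, payload, payload_len);
}
```

    The point is that message boundaries come from the byte counts the
    application agreed on, not from how many read() calls it took to
    collect them.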

    ....
    Have you looked? If you have a TCP socket with just N bytes of free Tx
    buffer, and you write N+1 bytes to it, I'd expect a partial write to
    happen. I'd consider /not/ handling that case a serious bug.

    /Jorgen
     
    Jorgen Grahn, Feb 19, 2012
    #6
  7. mast4as

    Joe Pfeiffer Guest

    I'd actually argue that that was *exactly* what you were writing about.
    The programmer needs to make sure that his code correctly translates the
    objects he wants to think about into the bytes that TCP will be
    "thinking" about, and that those bytes get sent and received
    correctly :)
    Yes, I have -- as part of debugging more instances than I want to admit
    to of discovering I wasn't doing a good enough job of what we discuss
    above.

    I've wound up dumping lots and lots of "tried to write %d, write
    returned %d" and I can't remember ever seeing an instance of something
    other than -1 or what I was trying to send. It's an easy enough
    loop to write that I always do it (you're right, not handling the case
    would be a bug that might turn up now and might turn up in a decade),
    but so far I can't remember ever going around that loop twice.
     
    Joe Pfeiffer, Feb 20, 2012
    #7
  8. mast4as

    Jorgen Grahn Guest

    Ok. My mental image of it is two separate parts: (a) expressing your
    data in terms of streams of octets (or lines of text, as in the more
    popular RFCs), and (b) mapping those streams to socket read/write
    semantics.
    Interesting -- I must try it tonight. (On Linux, since I don't have
    easy access to any other Unix outside work.)

    /Jorgen
     
    Jorgen Grahn, Feb 20, 2012
    #8
  9. mast4as

    Jorgen Grahn Guest

    So far I have failed to disprove what you wrote. Netcat to a
    SIGSTOPped server, until first the server's RX buffer got filled and
    then the client's TX buffer got filled. The client blocked instead of
    performing a partial write.

    /Jorgen
     
    Jorgen Grahn, Feb 22, 2012
    #9
  10. mast4as

    Jerry Peters Guest

    What happens if you set the socket to non-blocking?

    Jerry
     
    Jerry Peters, Feb 22, 2012
    #10
  11. mast4as

    Jorgen Grahn Guest

    Something else, probably. But at this point I'd rather look at the
    linux/net/ sources -- when I have the time and energy.

    No one here has claimed that you don't need to check for partial writes,
    but it could be very useful to know (e.g. for debugging purposes) how
    Linux behaves in this case.

    /Jorgen
     
    Jorgen Grahn, Feb 23, 2012
    #11
  12. Jorgen Grahn wrote:
    [...]
    Well write(2) and send(2) both return the number
    of octets written, or -1.

    What's the question here ?
    Of course both might wait forever if the socket is
    blocked for some reason.

    Set O_NONBLOCK upon creating the socket and see what happens :)
    Another possibility is using a 'timed write', which could get you out
    of waiting for(;;) or while(1) :p
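
    A small experiment along those lines, as a sketch: it uses an AF_UNIX
    socketpair() as a stand-in for a TCP connection (an assumption made
    to keep the example self-contained; TCP buffer sizes and behavior may
    differ), sets O_NONBLOCK on one end, never reads from the other, and
    reports what a single large write() returns. The function name is
    hypothetical.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Try to write `total` bytes in one call to a non-blocking stream socket whose
   peer never reads. Returns what write() returned, or -2 on setup failure. */
ssize_t try_big_nonblocking_write(size_t total)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -2;

    int flags = fcntl(sv[0], F_GETFL, 0);
    char *buf = calloc(1, total);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) < 0 || buf == NULL) {
        free(buf);
        close(sv[0]);
        close(sv[1]);
        return -2;
    }

    ssize_t n = write(sv[0], buf, total);   /* sv[1] is never read from */

    free(buf);
    close(sv[0]);
    close(sv[1]);
    return n;
}
```

    With a request much larger than the socket buffers (say 16 MB), I
    would expect the return value to be positive but smaller than the
    request, i.e. a partial write, rather than a block or an error.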

    -rasp
    PS: The OP should apply some sort of checksumming if he
    really thinks he's losing some bytes....
     
    Ralph Spitzner, Feb 23, 2012
    #12
  13. mast4as

    Jorgen Grahn Guest

    The question is "under which situations does write() on a TCP socket
    return a partial write?". ("Partial write" meaning that some, but not
    all of the data you fed it got sent.)
    Why? TCP doesn't lose bytes[1]; only programming errors cause that.

    /Jorgen

    [1] Except the rare cases where packets get corrupted in transit
    in such a way that the checksum is still valid.
     
    Jorgen Grahn, Feb 23, 2012
    #13
  14. mast4as

    Chris Davies Guest

    Empirically, I can confirm that partial writes can happen when the
    network connection is broken.

    I started with a socket connection to a remote host, and then had a loop
    that wrote approximately 500 bytes per iteration. I used usleep() to slow
    the loop a little. I had nc listening on the other end. I then manipulated
    the firewall to simulate a network glitch - variously experimenting with
    REJECT and DROP.

    With the data session established but then blocked, I would stop the
    netcat (nc) and then unblock the data path. The result was an immediate
    failure of write.

    For those who want to repeat my experiment, you can get my client from
    http://www.roaima.co.uk/stuff/2012/02/23/client.c, and the commands I
    ran were these:

    nc -l -vvv \* 50194 # Listener

    make client && ./client # Client

    iptables -I OUTPUT -p tcp --dport 50194 -j REJECT # Client block
    iptables -D OUTPUT -p tcp --dport 50194 -j REJECT # Client unblock

    Chris
     
    Chris Davies, Feb 23, 2012
    #14
  15. Jorgen Grahn wrote:
    [...]
    Well if the connection somehow gets interrupted write will return
    a number less than the number of bytes you fed to it.

    [....]
    Well yes, but the OP _believes_ that TCP is doing some evil to his data.
    Apart from this not being true, and not knowing his programming
    techniques and/or skills, a checksum might help him :p

    -rasp
     
    Ralph Spitzner, Feb 23, 2012
    #15
  16. mast4as

    Joe Pfeiffer Guest

    At this point, the OP seems to have vanished back into the wilderness.
    However, I think making sure he's really writing and reading all the
    bytes he thinks he is will help him a lot more than adding another
    checksum in his own code.

    Note that my earlier observation that I don't remember ever seeing a
    partial write() wasn't a result of an experiment seeing if I could
    trigger it, it was just an observation of what I've encountered in
    debugging my own code. It is interesting, though, to see people
    experimenting with trying to find circumstances under which it can
    happen.
     
    Joe Pfeiffer, Feb 23, 2012
    #16
  17. My _guess_ is that a signal received during the write might cause
    that, but I've never actually seen it happen nor have I analyzed the
    kernel source code to see if it would.
    I've seen that happen with Ethernet frames, but the TCP checksum
    detected it and caused a retransmission.
     
    Grant Edwards, Feb 23, 2012
    #17
  18. mast4as

    Jorgen Grahn Guest

    W. Richard Stevens did a nice investigation of this; it's in TCP/IP
    Illustrated, I think. He looked at live data and could tell that back
    then, a non-negligible number of packets which get corrupted pass both
    the Ethernet and TCP checksum. Enough so you have to consider it when
    designing application protocols.
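
    If an application protocol did want its own integrity check, one
    common choice is a CRC-32 over each message payload, sent alongside
    the payload and recomputed by the receiver. The sketch below is a
    plain bitwise implementation of the usual reflected CRC-32
    (polynomial 0xEDB88320); the function name is hypothetical and
    nothing in this thread prescribes this particular check.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-32 using the reflected polynomial 0xEDB88320. */
uint32_t crc32_bytes(const unsigned char *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            /* shift right; XOR in the polynomial if the low bit was set */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

    The standard check value for the nine ASCII characters "123456789"
    is 0xCBF43926, which is a quick way to confirm an implementation.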

    /Jorgen
     
    Jorgen Grahn, Feb 23, 2012
    #18
  19. mast4as

    Jerry Peters Guest

    Why? The behavior could change at the next kernel upgrade.

    But how do you know how it behaves in all circumstances? What about a
    loaded server, where the problem is *NOT* the client has stopped
    accepting packets, but the server is running low on transmit buffers?

    Jerry
     
    Jerry Peters, Feb 23, 2012
    #19
  20. mast4as

    Jorgen Grahn Guest

    Yes, but it's still useful to know how your code usually behaves in
    run-time, isn't it?

    A concrete example: an application fails, and when you read the code
    you find it doesn't handle partial writes correctly. Should you focus
    on this, or keep looking elsewhere?
    Look, all I'm trying to do is find one scenario where partial writes
    happen. It started upthread, when Joe Pfeiffer said he had (almost)
    never seen it, while my base assumption was that it was common in
    anything but the most naive tests.

    /Jorgen
     
    Jorgen Grahn, Feb 23, 2012
    #20