Interplatform (interprocess, interlanguage) communication

 
 
Arne Vajhøj
 
      02-04-2012
On 2/4/2012 4:22 PM, Roedy Green wrote:
> On Sat, 04 Feb 2012 12:50:15 -0800, Roedy Green
> <(E-Mail Removed)> wrote, quoted or indirectly quoted
> someone who said :
>> both read and write same file on SSD

>
> Let's say you used a simple RandomAccessFile. How could you implement
> a busy lock field in the file to indicate the file was busy being
> updated? or busy being read? In RAM you have test and set locks to
> check a value and set the value in one atomic operation. How could
> you simulate that without test and set hardware on the SSD?


java.nio.channels.FileLock with the caveats about what the OS
supports.
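
As an illustration only (the class `LockedFile` and the method `appendByte` are made-up names, not from this thread), a writer could take an exclusive lock before touching the file. Whether the lock is mandatory or merely advisory depends on the OS, and locks are held on behalf of the whole JVM, so this coordinates separate processes rather than threads inside one process.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;

// Hypothetical helper: appends one byte to a shared file while holding an
// exclusive lock on the whole file.
final class LockedFile {
    // Returns the file length after the append.
    static long appendByte(Path file, int value) {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw");
             FileChannel channel = raf.getChannel();
             FileLock lock = channel.lock()) {   // blocks until the lock is granted
            raf.seek(raf.length());
            raf.write(value);
            return raf.length();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The lock is released first, then the channel and file are closed.
    }
}
```

A reader would take the same lock (or a shared one via `channel.lock(0, Long.MAX_VALUE, true)` on a readable channel) before reading.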

> You can't
> very well share a RAM lock between separate jobs.


You can in most OSes. It is just not well supported in Java.

Arne

 
BGB
 
      02-06-2012
On 2/3/2012 12:52 PM, Stefan Ram wrote:
> »X« below is another language than Java, for example,
> VBA, C#, or C.
>


I am mostly a C developer, so I am writing more from my perspective...


> When an X process and a Java process have to exchange
> information on the same computer, what possibilities are
> there? The Java process should act as a client, sending
> commands to the X process and also wants to read answers
> from the X process. So, the X process is a kind of server.
>
> My criteria are: reliability and it should not be extremely
> slow (say exchanging a string should not take more than
> about 10 ms). The main criterion is reliability.
>
> »Reliability« means little risk of creating problems, little
> risk of failure at run-time. (It might help when the client
> [=Java process] can reset the communication to a known and
> sane start state in case of problems detected at run-time.)
>
> The host OS is Windows, but a portable solution won't hurt.
>
> A list of possibilities I am aware of now:
>
> Pipes
>
> I have no experience with this. I heard one can establish
> a new process »proc« with »exec« and then use
>
> BufferedWriter out = new BufferedWriter(new
> OutputStreamWriter(proc.getOutputStream()));
> BufferedReader in = new BufferedReader(new
> InputStreamReader(proc.getInputStream()));
>


no real comment, as I don't have much experience using pipes on Windows.


> Files
>
> One process writes to the end of a file, the other reads
> from the end of the file? - I never tried this, don't know
> if it is guaranteed to work that one process can detect and
> read, whether the other has just appended something to a file.
>
> What if the processes run very long and the files get too
> large? But OTOH this is very transparent, which makes it easy
> to debug, since one can open the files and directly inspect
> them, or even append commands manually with »copy con file«.
>


IME, I have often seen synchronization issues in these cases. sometimes
the OS will refuse to let multiple programs access the same file at the
same time, but sometimes it does work (I think depending on how the file
is opened and which flags are given and similar).

if just naively using "fopen()" or similar (in C), IME/IIRC, the OS will
typically only allow a single version of the file to be open at once
(not necessarily as limiting as it may seem).


in scenarios where it has worked (multiple versions can be opened), it
often seems like the OS is "lazy": one process will see an out-of-date
version of the file data (the data will often be out-of-date until the
writer closes the file or similar).

I never really felt all that inclined to look into the how/why/when
aspects of all this.

a partial exception is when using shared-memory, which tends to stay
up-to-date.


these issues don't seem to really pop up so much if one passes data in
an "open file, write, close file" or "open file, read, close file"
strategy (then the file is always seen up-to-date, and typically the
chance of clash remains fairly small).

this strategy is arguably not very efficient, but it is fairly simple
and tends to work "well enough" for many use cases (particularly passing
"globs of data once in a great while", or when operating at
"user-interaction" time-frames, such as the file is reloaded, say,
because the user just saved to it).

if done well, this can be used to implement things like a "magic
notepad", whereby data edited/saved in Notepad is automatically
reflected in the running app (say, by polling+"stat()", then processing
the file if it has changed).

conceptually, the latency should only really be limited by polling rate
(although granted polling isn't free, and a process bogging down the
system by polling a file in a tight loop isn't necessarily desirable
either).
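
A rough Java equivalent of the polling+"stat()" idea might look like the sketch below. `FilePoller` and `hasChanged` are names invented for this example; the caller is assumed to invoke `hasChanged()` on a timer at whatever polling rate it can afford, and to reload the file when it returns true.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Hypothetical watcher: remembers the last modification time it saw and
// reports whether the timestamp differs on the next check.
final class FilePoller {
    private final Path file;
    private FileTime lastSeen;

    FilePoller(Path file) {
        this.file = file;
        this.lastSeen = modifiedTime();
    }

    private FileTime modifiedTime() {
        try {
            return Files.getLastModifiedTime(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true once for each observed change of the timestamp.
    boolean hasChanged() {
        FileTime now = modifiedTime();
        if (now.equals(lastSeen)) {
            return false;
        }
        lastSeen = now;
        return true;
    }
}
```

Timestamp granularity varies by file system, so two saves within the same tick may be seen as one change.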


another advantage of files is that they are more amenable to "makeshift
options" than some of the other strategies (one doesn't really need to
care what apps are thrown in the mix, so long as they can read/write the
files in question).


> Sockets
>
> This is slightly less transparent than files, but has the
> advantage that it becomes very easy to have the two
> processes running on different computers later, if this
> should ever be required. Debugging should be possible
> by a man-in-the-middle proxy that prints all information
> it sees or by connecting to the server with a terminal.
>


I have used sockets for IPC before, and they worked fairly well.

a minor issue with TCP for IPC though is that sometimes the buffering
does something very annoying:
small writes can sit unsent for a while, since by default TCP (Nagle's
algorithm) holds back small segments while earlier data is still
unacknowledged (one can disable this with TCP_NODELAY, but unbuffered
sockets can be evil on a network if used naively, such as writing an
individual byte or datum at a time, rather than sending the entire
message in a single write, since an unbuffered socket may attempt to send
a packet for *every* write to the socket).

TCP works fairly well for transmitting lots of small messages (and apart
from the potential buffering issue has very little latency).
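
A minimal Java sketch of "disable the delay, but send each message in one write". The class `MessageSender`, its methods, and the newline terminator are assumptions made for this example, not something from the thread.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for small request/response messages over TCP.
final class MessageSender {
    // Opens a connection with Nagle's algorithm switched off (TCP_NODELAY).
    static Socket open(String host, int port) {
        Socket socket = new Socket();
        try {
            socket.setTcpNoDelay(true);
            socket.connect(new InetSocketAddress(host, port));
            return socket;
        } catch (IOException e) {
            try { socket.close(); } catch (IOException ignored) { }
            throw new UncheckedIOException(e);
        }
    }

    // Builds the complete message first, then hands it over in a single
    // write, so an undelayed socket does not emit one packet per field.
    static void send(OutputStream out, String message) {
        byte[] bytes = (message + "\n").getBytes(StandardCharsets.UTF_8);
        try {
            out.write(bytes);
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Typical use would be `MessageSender.send(socket.getOutputStream(), "PING")` on a socket returned by `open`.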

UDP also has some merit, but comes with the big annoying hassle of having
to pack one's messages into UDP datagrams (UDP is much more resistant to
stalls, which can easily become an issue for TCP sockets when going over
the wider internet, but UDP is also unreliable and unordered, which needs
to be taken into account).


> JNI
>
> JNI might be used to access code written in C or
> ABI-compatible languages. This should be fast, but I heard
> that it is error prone to write JNI code and needs some
> learning (code less maintainable)?
>


JNI can work, but is also annoying in some ways.

if one simply wants to call functions or pass data or messages to/from C
code, it works fairly well. JNI is, however, not readily capable of IPC
AFAIK. it also may result in some level of "physical coupling" between
code in the languages in question (may or may not be desirable, probably
depends on the task; IME it is often preferable to avoid coupling where
possible, often even between code written in the same language).

it is also not necessarily all that much more convenient than options
such as sockets (likely depends a lot on the task though, for many, it
may just be easier to write a message parser/dispatcher for whatever
comes over the socket).

 
BGB
 
      02-07-2012
On 2/7/2012 11:11 AM, jebblue wrote:
> On Fri, 03 Feb 2012 19:52:08 +0000, Stefan Ram wrote:
>
>> »X« below is another language than Java, for example,
>> VBA, C#, or C.
>>
>> When an X process and a Java process have to exchange information on
>> the same computer, what possibilities are there? The Java process
>> should act as a client, sending commands to the X process and also
>> wants to read answers from the X process. So, the X process is a kind
>> of server.
>>
>> My criteria are: reliability and it should not be extremely slow (say
>> exchanging a string should not take more than about 10 ms). The main
>> criterion is reliability.
>>

>
>> Sockets
>>
>> This is slightly less transparent than files, but has the advantage
>> that it becomes very easy to have the two processes running on
>> different computers later, if this should ever be required. Debugging
>> should be possible by a man-in-the-middle proxy that prints all
>> information it sees or by connecting to the server with a terminal.
>>

>
> I recommend using sockets.
>


in general, I agree (sockets generally make the most sense), although
there are cases where file-based communications can make sense, although
probably not in the form as described in the OP.


another issue (besides how to pass messages), is what sort of form to
pass messages in.

usually, in my case, if storing data in files, I tend to prefer
ASCII-based formats.

usually, for passing messages over sockets, I have used "compact"
specialized binary formats, typically serialized data from some other
form (such as XML nodes or S-Expressions). although "magic byte value"
based message formats are initially simpler, they tend to be harder to
expand later (whereas encoding/decoding some more generic form, though
initially more effort, can turn out to be easier to maintain and extend
later).

note: this does not mean SOAP or CORBA or some other "standardized"
messaging system, rather just that one initially builds and processes
the messages in some form that is more high-level than spitting out
bytes, and processing everything via a loop and a big "switch()" or
similar (although this can be an initially fairly simple option, so has
some merit due to ease of implementation).


the main reason for picking a binary message-serialization format (for
something like S-Expressions or XML nodes), would be mostly if there is
a chance that the serialized data will go over the internet, and a
textual format can be a bit bulkier (and thus slower to transmit over a
slower connection), as well as typically being slower to decode (a
sanely designed message format can be much more quickly unpacked than a
textual format can be parsed).

sending text over sockets may have merits as well, and is generally
preferable for "open" protocols.


or such...
 
Arved Sandstrom
 
      02-08-2012
On 12-02-07 07:38 PM, BGB wrote:
> On 2/7/2012 11:11 AM, jebblue wrote:

[ SNIP ]
>>
>> I recommend using sockets.
>>

>
> in general, I agree (sockets generally make the most sense), although
> there are cases where file-based communications can make sense, although
> probably not in the form as described in the OP.
>
>
> another issue (besides how to pass messages), is what sort of form to
> pass messages in.
>
> usually, in my case, if storing data in files, I tend to prefer
> ASCII-based formats.
>
> usually, for passing messages over sockets, I have used "compact"
> specialized binary formats, typically serialized data from some other
> form (such as XML nodes or S-Expressions). although "magic byte value"
> based message formats are initially simpler, they tend to be harder to
> expand later (whereas encoding/decoding some more generic form, though
> initially more effort, can turn out to be easier to maintain and extend
> later).
>
> note: this does not mean SOAP or CORBA or some other "standardized"
> messaging system, rather just that one initially builds and processes
> the messages in some form that is more high-level than spitting out
> bytes, and processing everything via a loop and a big "switch()" or
> similar (although this can be an initially fairly simple option, so has
> some merit due to ease of implementation).
>
>
> the main reason for picking a binary message-serialization format (for
> something like S-Expressions or XML nodes), would be mostly if there is
> a chance that the serialized data will go over the internet, and a
> textual format can be a bit bulkier (and thus slower to transmit over a
> slower connection), as well as typically being slower to decode (a
> sanely designed message format can be much more quickly unpacked than a
> textual format can be parsed).
>
> sending text over sockets may have merits as well, and is generally
> preferable for "open" protocols.
>
> or such...


I've done a fair bit with sockets myself, including recently, in fact
including on a current gig. Some of the message formats have been
designed by others, some by me. A few of them are specialized industry
standards, some are very custom and bespoke.

A few of the formats have been binary: fixed-length blocks of data with
fields at various offsets. Works well enough if it suits the data.

A bunch of others have been text and line-oriented: a fixed number of
lines of data in known order, so that line 10 is always the data for a
particular field.

Other things to consider: JAXB, JSON etc. Minimum coding fuss at the
endpoints if that's what's appropriate for constructing message payloads.

I like text-based protocols, for some simple situations, that behave
like SMTP or POP. But it obviously depends on what you expect your
client and server to do, it's just another approach to be aware of.

One of the big things in designing one's own messaging is error
handling. People generally do just fine with the happy path, but ignore
comprehensive error handling, or get wrapped around the axle trying to
do it.

A lot of situations admit of more than one approach.

AHS
--
....wherever the people are well informed they can be trusted with their
own government...
-- Thomas Jefferson, 1789
 
Arne Vajhøj
 
      02-08-2012
On 2/7/2012 6:38 PM, BGB wrote:
> On 2/7/2012 11:11 AM, jebblue wrote:
>> On Fri, 03 Feb 2012 19:52:08 +0000, Stefan Ram wrote:
>>> »X« below is another language than Java, for example,
>>> VBA, C#, or C.
>>>
>>> When an X process and a Java process have to exchange information on
>>> the same computer, what possibilities are there? The Java process
>>> should act as a client, sending commands to the X process and also
>>> wants to read answers from the X process. So, the X process is a kind
>>> of server.
>>>
>>> My criteria are: reliability and it should not be extremely slow (say
>>> exchanging a string should not take more than about 10 ms). The main
>>> criterion is reliability.
>>>

>>
>>> Sockets
>>>
>>> This is slightly less transparent than files, but has the advantage
>>> that it becomes very easy to have the two processes running on
>>> different computers later, if this should ever be required. Debugging
>>> should be possible by a man-in-the-middle proxy that prints all
>>> information it sees or by connecting to the server with a terminal.
>>>

>>
>> I recommend using sockets.

>
> in general, I agree (sockets generally make the most sense),


> another issue (besides how to pass messages), is what sort of form to
> pass messages in.
>
> usually, in my case, if storing data in files, I tend to prefer
> ASCII-based formats.
>
> usually, for passing messages over sockets, I have used "compact"
> specialized binary formats, typically serialized data from some other
> form (such as XML nodes or S-Expressions). although "magic byte value"
> based message formats are initially simpler, they tend to be harder to
> expand later (whereas encoding/decoding some more generic form, though
> initially more effort, can turn out to be easier to maintain and extend
> later).


If you want compact and text go for JSON.

Arne

 
Martin Gregorie
 
      02-08-2012
On Tue, 07 Feb 2012 16:38:31 -0700, BGB wrote:

> in general, I agree (sockets generally make the most sense), although
> there are cases where file-based communications can make sense, although
> probably not in the form as described in the OP.
>

Yes, for small amounts of data or message passing between processes I
tend to like sockets - as others have said, the fact that they are
agnostic about the location of the communicating processes is often very
useful.

> usually, for passing messages over sockets, I have used "compact"
> specialized binary formats,
>

Yep. ASN.1 has to be about the most compact way of encoding structured,
multi-field messages with XML occupying the other end of the scale.

That said, for short list-of-fields messages I often use a CSV string
preceded by an unsigned binary byte value containing the string length:
this type of message is both easy to transfer, even if the connection
wants to fragment it during transmission, and, by having a printable text
payload, it's also convenient for troubleshooting.
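
A sketch of such a frame in Java, under stated assumptions: `CsvFrame` is a made-up name, the payload is plain ASCII, fields never contain commas (no quoting is attempted), and the one-byte length limits the payload to 255 bytes.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical framing helper: [1 length byte][CSV text of that length].
final class CsvFrame {
    static byte[] encode(String... fields) {
        byte[] payload = String.join(",", fields).getBytes(StandardCharsets.US_ASCII);
        if (payload.length > 255) {
            throw new IllegalArgumentException("payload too long for a one-byte length");
        }
        byte[] frame = new byte[payload.length + 1];
        frame[0] = (byte) payload.length;            // stored as an unsigned value
        System.arraycopy(payload, 0, frame, 1, payload.length);
        return frame;
    }

    static String[] decode(InputStream in) {
        try {
            DataInputStream data = new DataInputStream(in);
            int length = data.readUnsignedByte();
            byte[] payload = new byte[length];
            data.readFully(payload);                 // keeps reading until all bytes arrived
            return new String(payload, StandardCharsets.US_ASCII).split(",", -1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because `readFully` loops until the announced number of bytes is in, the receiver does not care how the connection split the message.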


--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |
 
BGB
 
      02-08-2012
On 2/7/2012 6:31 PM, Martin Gregorie wrote:
> On Tue, 07 Feb 2012 16:38:31 -0700, BGB wrote:
>
>> in general, I agree (sockets generally make the most sense), although
>> there are cases where file-based communications can make sense, although
>> probably not in the form as described in the OP.
>>

> Yes, for small amounts of data or message passing between processes I
> tend to like sockets - as others have said, the fact that they are
> agnostic about the location of the communicating processes is often very
> useful.
>


yep.


>> usually, for passing messages over sockets, I have used "compact"
>> specialized binary formats,
>>

> Yep. ASN.1 has to be about the most compact way of encoding structured,
> multi-field messages with XML occupying the other end of the scale.
>


I disagree partly WRT ASN.1:
a disadvantage of ASN.1 is that a lot of times it tends to use
fixed-width integer encodings (and often sends structures in a
"reasonably raw" form), whereas one can shave more bytes using a
variable-length-integer scheme (why encode an integer in 4 bytes if you
only need 1 byte in a given case?). it is also possible to shave more
bytes if one makes the format use an adaptive/context-sensitive encoding
scheme and maybe a variant of Huffman coding or similar (and possibly
encode integer values using a similar scheme to that used in Deflate).
it is in-fact not particularly difficult to outperform ASN.1 in these
regards.


granted, yes, custom Huffman-based data encodings are probably not "the
norm" for network protocols (though some programs, such as the Quake 3
engine, have used Huffman-compressed network protocols).

there is also "arithmetic coding" and "range coding", but with these it
is a lot harder to make the codec be acceptably fast (whereas there are
some tricks to allow optimizing Huffman codecs).


in cases where I have used XML, I have typically used a custom binary
XML variant, which can greatly reduce the overhead vs textual XML. in
terms of saving bytes, my encoding can be more compact than WBXML or
XML+Deflate, but is arguably more "esoteric", and as-is doesn't make use
of schemas (it is instead a basic adaptive coding, and is vaguely
similar to an LZ-Markov coding, attempting to exploit repeating patterns
in tag-structure and similar via prediction, but like most adaptive
codings initially transmits the data in a less dense form as it needs to
build up a new context for each message). the coding in question doesn't
use Huffman coding (for sake of simplicity, and because I don't always
particularly need "maximum compactness"), but a Huffman-based variant
could be created if needed.

there is also EXI, but I don't know how my encoding compares (EXI
probably does better though, given that IIRC it uses binary universal
codes and schemas).


for something else of mine I am using S-Expression based messages
(currently between components within the same process), and had
considered using a vaguely similar binary coding if/when I get around to it.


> That said, for short list-of-fields messages I often use a CSV string
> preceded by an unsigned binary byte value containing the string length:
> this type of message is both easy to transfer, even if the connection
> wants to fragment it during transmission, and, by having a printable text
> payload, it's also convenient for troubleshooting.
>


yes, this is possible.

also possible would be a TLV encoding (say, possibly doing something
similar to the Matroska MKV file-format).


say, the integer values are encoded something like (range, encoding):
  0-127            0xxxxxxx
  128-16383        10xxxxxx xxxxxxxx
  16384-2097151    110xxxxx xxxxxxxx xxxxxxxx
  2097152-...      ...

likewise, one can get a signed variant by folding the sign into the LSB,
forming a pattern like: 0, -1, 1, -2, 2, ...
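
The one- to three-byte forms listed above, plus the sign folding, could be sketched in Java roughly as follows. `Vli` and its method names are invented for this example, and the longer forms (left open as "..." above) are deliberately not covered.

```java
// Hypothetical variable-length-integer codec for values below 2^21.
final class Vli {
    static byte[] encode(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("fold signed values first");
        }
        if (value < (1 << 7)) {                      // 0xxxxxxx
            return new byte[] {(byte) value};
        }
        if (value < (1 << 14)) {                     // 10xxxxxx xxxxxxxx
            return new byte[] {(byte) (0x80 | (value >> 8)), (byte) value};
        }
        if (value < (1 << 21)) {                     // 110xxxxx xxxxxxxx xxxxxxxx
            return new byte[] {(byte) (0xC0 | (value >> 16)), (byte) (value >> 8), (byte) value};
        }
        throw new IllegalArgumentException("longer forms are not covered by this sketch");
    }

    static int decode(byte[] b) {
        int first = b[0] & 0xFF;
        if ((first & 0x80) == 0) {
            return first;
        }
        if ((first & 0xC0) == 0x80) {
            return ((first & 0x3F) << 8) | (b[1] & 0xFF);
        }
        if ((first & 0xE0) == 0xC0) {
            return ((first & 0x1F) << 16) | ((b[1] & 0xFF) << 8) | (b[2] & 0xFF);
        }
        throw new IllegalArgumentException("longer forms are not covered by this sketch");
    }

    // Folds the sign into the lowest bit: 0, -1, 1, -2, 2 become 0, 1, 2, 3, 4.
    static int fold(int signed) {
        return (signed << 1) ^ (signed >> 31);
    }

    static int unfold(int folded) {
        return (folded >>> 1) ^ -(folded & 1);
    }
}
```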

then, one defines tags as:
{
VLI tag;
VLI length;
byte data[length];
}

where tags can hold either data or messages (and, the smallest tag size
needs 2 bytes, or 3 bytes if one has 1 byte of payload for the tag).
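
For illustration, the tag layout above could be written and read like this in Java. The class `Tlv` is a made-up name, and to keep the sketch short its integer helper only handles the one- and two-byte forms (values below 16384).

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Hypothetical tag/length/value record: VLI tag, VLI length, raw data bytes.
final class Tlv {
    final int tag;
    final byte[] data;

    Tlv(int tag, byte[] data) {
        this.tag = tag;
        this.data = data;
    }

    private static void writeVli(ByteArrayOutputStream out, int value) {
        if (value < 0 || value >= (1 << 14)) {
            throw new IllegalArgumentException("only values below 16384 in this sketch");
        }
        if (value < 0x80) {
            out.write(value);                        // 0xxxxxxx
        } else {
            out.write(0x80 | (value >> 8));          // 10xxxxxx
            out.write(value & 0xFF);                 // xxxxxxxx
        }
    }

    private static int readVli(ByteArrayInputStream in) {
        int first = in.read();
        if (first < 0) throw new IllegalArgumentException("truncated input");
        if ((first & 0x80) == 0) return first;
        if ((first & 0xC0) != 0x80) throw new IllegalArgumentException("longer forms not covered");
        int second = in.read();
        if (second < 0) throw new IllegalArgumentException("truncated input");
        return ((first & 0x3F) << 8) | second;
    }

    byte[] encode() {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVli(out, tag);
        writeVli(out, data.length);
        out.write(data, 0, data.length);
        return out.toByteArray();
    }

    static Tlv decode(ByteArrayInputStream in) {
        int tag = readVli(in);
        int length = readVli(in);
        byte[] data = new byte[length];
        int read = (length == 0) ? 0 : in.read(data, 0, length);
        if (read != length) throw new IllegalArgumentException("truncated payload");
        return new Tlv(tag, data);
    }
}
```

Since a payload may itself be a sequence of encoded tags, nesting falls out of the same two functions.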


if the length is optional (presence depends on tag), one can reduce the
typical tag size to 1 byte. likewise, tags can be combined with an
MTF/MRU scheme such that any recently used tags have a small value (and
can thus be encoded in a single byte). (many of my formats define tags
inline, rather than relying on some large hard-coded tag-list).

more bytes can be saved if more of the message structure is known, say
that not only does the tag encode a particular tag-type, but also may
carry information about what follows after it (various combinations of
attributes, and if it contains sub-tags and what they might be, ...).

if a new tag is defined, it is added to the MRU, but if not used
frequently may move "backwards" (towards higher index numbers) or
eventually be forgotten (falls off the end of the list).
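
One simple way to get that behaviour is a plain move-to-front list, sketched below in Java. This is only one possible policy, and `TagMru`, `use`, and the fixed capacity are assumptions of the example rather than a description of any actual format. Encoder and decoder would each keep such a list and update it identically.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical move-to-front tag table: recently used tags get small indices,
// and the least recently used tag is forgotten when the table is full.
final class TagMru {
    private final List<String> tags = new ArrayList<>();
    private final int capacity;

    TagMru(int capacity) {
        this.capacity = capacity;
    }

    // Returns the tag's index before this use (-1 if it was not in the table,
    // meaning it has to be transmitted in full), then moves it to the front.
    int use(String tag) {
        int index = tags.indexOf(tag);
        if (index >= 0) {
            tags.remove(index);
        } else if (tags.size() == capacity) {
            tags.remove(tags.size() - 1);            // falls off the end of the list
        }
        tags.add(0, tag);
        return index;
    }
}
```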

note that some hard-coded tag-numbers will be needed for basic control
purposes (encoding new/unfamiliar tags, ...).


a Huffman-based variant could be similar, just one may encode integers
differently. an example scheme is to use a prefix value (Huffman coded)
and a suffix bit pattern (similar to Deflate). a simpler (but less
compact) scheme was used in JPEG, and IIRC I had before "compromised"
between them by having the Huffman table be stored using Rice codes.


example (prefix range, value range, suffix bits):
  0-15     0-15         0
  16-23    16-31        1
  24-31    32-63        2
  32-39    64-127       3
  40-47    128-255      4
  48-55    256-511      5
  56-63    512-1023     6
  64-71    1024-2047    7
  72-79    2048-4095    8
  80-87    4096-8191    9
  ...
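
A sketch of how a value would be split into prefix and suffix under this kind of scheme. It assumes that every band of eight prefixes with n suffix bits covers 8 * 2^n consecutive values, which is the pattern of the first five rows; the Huffman coding of the prefix itself is left out, and `PrefixSuffix` is a made-up name.

```java
// Hypothetical split of an integer into a (to-be-Huffman-coded) prefix symbol
// and a run of raw suffix bits, in the style of Deflate's length/distance codes.
final class PrefixSuffix {
    // Number of raw suffix bits that follow the prefix for this value.
    static int suffixBits(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("non-negative values only");
        }
        if (value < 16) {
            return 0;
        }
        return (31 - Integer.numberOfLeadingZeros(value)) - 3;
    }

    static int prefix(int value) {
        int n = suffixBits(value);
        if (n == 0) {
            return value;                            // 0-15 are their own prefix
        }
        // Band start, plus the three bits just below the leading 1 bit.
        return 16 + (n - 1) * 8 + ((value >> n) & 7);
    }

    static int suffix(int value) {
        return value & ((1 << suffixBits(value)) - 1);
    }

    // Inverse operation: rebuilds the value from prefix and suffix.
    static int join(int prefix, int suffix) {
        if (prefix < 16) {
            return prefix;
        }
        int n = (prefix - 16) / 8 + 1;
        int top = 8 + ((prefix - 16) % 8);           // leading 1 bit plus three bits
        return (top << n) | suffix;
    }
}
```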

also note that a nifty thing (also used in Deflate) is to compress the
Huffman table itself using Huffman coding.


likewise, one can save a few bytes if the encoder is smart enough to
recognize when tags encode numeric data (mostly specific to XML, with
S-Expressions or similar one knows when they are dealing with numeric data).

likewise, one can encode floats as a pair of integer values (although
floats present a few of their own complexities). one can also devise
special encodings for things like numeric vectors, quaternions, ... if
needed as well.


likewise, either an LZ77 or LZ-Markov scheme can be used for encoding
strings (an example would be to use a fixed-size rotating window like
in Deflate, and essentially using the same basic encoding for strings,
albeit likely with the use of an "End-Of-String" marker).

say (range, meaning):
0-255: literal byte values
258: End Of String
259-321: LZ77 Run (encodes length, followed by window offset).

String encoding would be used, say, for encoding both literal text, and
also for escaping things like tag and attribute names.

....


the main variability is mostly in terms of the type of payload being
transmitted:
be it XML-based, S-Expression based, or potentially object-based
(similar to either JSON, or a sort of "heap pickling" style system).


for most structured data, it shouldn't be needed to change the
"fundamentals" too much. the main difference is between tree-structured
and heap-like / graph-structured data, as graph-structured data is often
better sent as a flat list of objects with a certain entry being a "root
node" than as a tree (this can be accomplished either by building a
list, or using an algorithm to detect and break-up cycles when needed).


granted, for most use-cases something like this is likely to be overkill.


or such...

 
BGB
 
      02-08-2012
On 2/7/2012 5:26 PM, Arved Sandstrom wrote:
> On 12-02-07 07:38 PM, BGB wrote:
>> On 2/7/2012 11:11 AM, jebblue wrote:

> [ SNIP ]
>>>
>>> I recommend using sockets.
>>>

>>
>> in general, I agree (sockets generally make the most sense), although
>> there are cases where file-based communications can make sense, although
>> probably not in the form as described in the OP.
>>
>>
>> another issue (besides how to pass messages), is what sort of form to
>> pass messages in.
>>
>> usually, in my case, if storing data in files, I tend to prefer
>> ASCII-based formats.
>>
>> usually, for passing messages over sockets, I have used "compact"
>> specialized binary formats, typically serialized data from some other
>> form (such as XML nodes or S-Expressions). although "magic byte value"
>> based message formats are initially simpler, they tend to be harder to
>> expand later (whereas encoding/decoding some more generic form, though
>> initially more effort, can turn out to be easier to maintain and extend
>> later).
>>
>> note: this does not mean SOAP or CORBA or some other "standardized"
>> messaging system, rather just that one initially builds and processes
>> the messages in some form that is more high-level than spitting out
>> bytes, and processing everything via a loop and a big "switch()" or
>> similar (although this can be an initially fairly simple option, so has
>> some merit due to ease of implementation).
>>
>>
>> the main reason for picking a binary message-serialization format (for
>> something like S-Expressions or XML nodes), would be mostly if there is
>> a chance that the serialized data will go over the internet, and a
>> textual format can be a bit bulkier (and thus slower to transmit over a
>> slower connection), as well as typically being slower to decode (a
>> sanely designed message format can be much more quickly unpacked than a
>> textual format can be parsed).
>>
>> sending text over sockets may have merits as well, and is generally
>> preferable for "open" protocols.
>>
>> or such...

>
> I've done a fair bit with sockets myself, including recently, in fact
> including on a current gig. Some of the message formats have been
> designed by others, some by me. A few of them are specialized industry
> standards, some are very custom and bespoke.
>
> A few of the formats have been binary: fixed-length blocks of data with
> fields at various offsets. Works well enough if it suits the data.
>
> A bunch of others have been text and line-oriented: a fixed number of
> lines of data in known order, so that line 10 is always the data for a
> particular field.
>
> Other things to consider: JAXB, JSON etc. Minimum coding fuss at the
> endpoints if that's what's appropriate for constructing message payloads.
>
> I like text-based protocols, for some simple situations, that behave
> like SMTP or POP. But it obviously depends on what you expect your
> client and server to do, it's just another approach to be aware of.
>


well, text need not be all that limiting.
if one has XML or free-form S-Expressions (in their true sense, like in
Lisp or Scheme, not the mutilated/watered-down Rivest ones), then one
can do a fair amount with text.

IME, there are many tradeoffs (regarding ease of use, ...) between XML
and S-Exps, and neither seems "clearly better" (as far as
representations go, I find S-Exps easier to work with, but namespaces
and attributes in XML can make it more flexible, as one can more easily
throw new tags or attributes at the problem with less chance of breaking
existing code).

an example is this:
<foo> <bar value="3"/> </foo>
and:
(foo (bar 3))

now, consider one wants to add a new field to 'foo' (say 'ln').
<foo ln="15"> <bar value="3"/> </foo>
and:
(foo 15 (bar 3))

a difference here is that existing code will probably not even notice
the new XML attribute, whereas the positional nature of most
S-Expressions makes the latter far more likely to break something (and
there is no good way to "annotate" an S-Exp, whereas with XML it is
fairly solidly defined that one can simply add new attributes).
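
To make the XML half of that concrete, here is a small Java sketch using the JDK's DOM parser. `FooReader` and `barValue` are names invented for the example. The reader looks up what it knows by name, so the extra 'ln' attribute passes unnoticed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical "existing code": reads the value of <bar> inside <foo> and
// pays no attention to attributes it does not know about.
final class FooReader {
    static int barValue(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element foo = doc.getDocumentElement();
            Element bar = (Element) foo.getElementsByTagName("bar").item(0);
            return Integer.parseInt(bar.getAttribute("value"));
        } catch (Exception e) {
            throw new IllegalArgumentException("unreadable message", e);
        }
    }
}
```

Both versions of the message give 3 here, whereas a positional reader that takes "the first item after foo" from the S-Expression would get 15 after the change.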


note: my main way of working with XML is typically via DOM-style
interfaces (if I am using it, it is typically because I am directly
working with the data structure, and not as the result of some dumb-ass
"data binding" crud...).


typically, the "internal representation" and "concrete serialization"
are different:
I may use a textual XML serialization, or just as easily, I could use a
binary format;
likewise for S-Exps (actually, I probably far more often represent
S-Exps as a binary format of one form or another than I use them in a
form externally serialized as text).

all hail the mighty DOM-node or CONS-cell...


> One of the big things in designing one's own messaging is error
> handling. People generally do just fine with the happy path, but ignore
> comprehensive error handling, or get wrapped around the axle trying to
> do it.
>


yeah, but this applies to programming in general, so message-passing is
likely nothing special here. one issue maybe special to sockets though
is the matter of whether or not the whole message has been received,
often resulting in some annoying code to basically read messages from
the socket and not decode them until the entire message has been received.
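
The "annoying code" tends to look something like the following Java sketch: bytes are accumulated as they arrive and a message is only handed on once all of it is there. The two-byte big-endian length prefix and the name `MessageAssembler` are assumptions of this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reassembly buffer for messages framed as
// [2-byte big-endian length][body].
final class MessageAssembler {
    private byte[] pending = new byte[0];

    // Takes the bytes returned by one read() call and returns every message
    // that is now complete; leftover bytes are kept for the next call.
    List<byte[]> feed(byte[] chunk, int count) {
        byte[] joined = new byte[pending.length + count];
        System.arraycopy(pending, 0, joined, 0, pending.length);
        System.arraycopy(chunk, 0, joined, pending.length, count);

        List<byte[]> complete = new ArrayList<>();
        int pos = 0;
        while (joined.length - pos >= 2) {
            int length = ((joined[pos] & 0xFF) << 8) | (joined[pos + 1] & 0xFF);
            if (joined.length - pos - 2 < length) {
                break;                               // body not fully received yet
            }
            complete.add(Arrays.copyOfRange(joined, pos + 2, pos + 2 + length));
            pos += 2 + length;
        }
        pending = Arrays.copyOfRange(joined, pos, joined.length);
        return complete;
    }
}
```

The caller would loop over `in.read(buffer)` and pass each result to `feed`, decoding only what comes back.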


> A lot of situations admit of more than one approach.
>


agreed.

it is like me and file-formats.
often I just use ASCII text (simple, easy, editable in Notepad or
similar, ...).

I make plenty of use of simple line-oriented text formats as well.

other times, I might use more advanced binary formats, or maybe even
employ the use of "data compression" techniques (such as Huffman
coding), so a lot depends.

 
Arved Sandstrom
 
      02-08-2012
On 12-02-08 04:41 AM, BGB wrote:
> On 2/7/2012 5:26 PM, Arved Sandstrom wrote:
>> On 12-02-07 07:38 PM, BGB wrote:
>>> On 2/7/2012 11:11 AM, jebblue wrote:

>> [ SNIP ]
>>>>
>>>> I recommend using sockets.
>>>>
>>>
>>> in general, I agree (sockets generally make the most sense), although
>>> there are cases where file-based communications can make sense, although
>>> probably not in the form as described in the OP.
>>>
>>>
>>> another issue (besides how to pass messages), is what sort of form to
>>> pass messages in.
>>>
>>> usually, in my case, if storing data in files, I tend to prefer
>>> ASCII-based formats.
>>>
>>> usually, for passing messages over sockets, I have used "compact"
>>> specialized binary formats, typically serialized data from some other
>>> form (such as XML nodes or S-Expressions). although "magic byte value"
>>> based message formats are initially simpler, they tend to be harder to
>>> expand later (whereas encoding/decoding some more generic form, though
>>> initially more effort, can turn out to be easier to maintain and extend
>>> later).
>>>
>>> note: this does not mean SOAP or CORBA or some other "standardized"
>>> messaging system, rather just that one initially builds and processes
>>> the messages in some form that is more high-level than spitting out
>>> bytes, and processing everything via a loop and a big "switch()" or
>>> similar (although this can be an initially fairly simple option, so has
>>> some merit due to ease of implementation).
>>>
>>>
>>> the main reason for picking a binary message-serialization format (for
>>> something like S-Expressions or XML nodes), would be mostly if there is
>>> a chance that the serialized data will go over the internet, and a
>>> textual format can be a bit bulkier (and thus slower to transmit over a
>>> slower connection), as well as typically being slower to decode (a
>>> sanely designed message format can be much more quickly unpacked than a
>>> textual format can be parsed).
>>>
>>> sending text over sockets may have merits as well, and is generally
>>> preferable for "open" protocols.
>>>
>>> or such...

>>
>> I've done a fair bit with sockets myself, including recently, in fact
>> including on a current gig. Some of the message formats have been
>> designed by others, some by me. A few of them are specialized industry
>> standards, some are very custom and bespoke.
>>
>> A few of the formats have been binary: fixed-length blocks of data with
>> fields at various offsets. Works well enough if it suits the data.
>>
>> A bunch of others have been text and line-oriented: a fixed number of
>> lines of data in known order, so that line 10 is always the data for a
>> particular field.
>>
>> Other things to consider: JAXB, JSON etc. Minimum coding fuss at the
>> endpoints if that's what's appropriate for constructing message payloads.
>>
>> I like text-based protocols, for some simple situations, that behave
>> like SMTP or POP. But it obviously depends on what you expect your
>> client and server to do, it's just another approach to be aware of.
>>

>
> well, text need not be all that limiting.


You may have misunderstood something I said if you got that impression
from me, that text is all that limiting.

[ SNIP ]

> note: my main way of working with XML is typically via DOM-style
> interfaces (if I am using it, it is typically because I am directly
> working with the data structure, and not as the result of some dumb-ass
> "data binding" crud...).


I haven't been able to completely avoid using the DOM, but I loathe the
API. If I'm using XML at all, and JAXB suits, I'll use JAXB. More
generally I'll use SAX or StAX.

I almost never encounter a situation where DOM is called for, simply
because no random access to the document is called for. When I send XML
back and forth as a payload, the entire thing is meant to be used, and
it makes sense to do the immediate and complete conversion into real
information rather than storing it into an opaque and kludgy DOM
representation.

For a lot of situations, not just message passing between endpoints, I
have backed away from XML anyway. For configuration files I have gotten
newly enthused by .properties files, because so often they fit the bill
much better than XML configuration files. And I mentioned JSON
previously, I prefer that to XML in many situations now.

[ SNIP ]

>> One of the big things in designing one's own messaging is error
>> handling. People generally do just fine with the happy path, but ignore
>> comprehensive error handling, or get wrapped around the axle trying to
>> do it.

>
> yeah, but this applies to programming in general, so message-passing is
> likely nothing special here.


That's true, but it's maybe a bit more of an art form with messages.
Your message producer may be Java and produce beautiful exceptions in
your carefully designed exception hierarchy, but your clients may very
well not be Java at all, in which case you may end up with an error
message sub-protocol that borrows ideas from HTTP status codes.

A lot of Java programmers these days maybe have never really dealt with
return codes, because we sort of tell them not to use them in Java, but
in the case of implementation-neutral status codes (including ones for
errors) that's really the design mindset that you need to be in: status
codes.
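
As a rough illustration of that status-code mindset, here is a minimal Java sketch. Everything in it is an assumption made up for the example: the enum name Status, the particular codes (borrowed from HTTP only by analogy), the "code, space, detail" reply-line layout, and the exception mapping. It is not a protocol anyone in this thread described.

```java
import java.util.NoSuchElementException;

/** Hypothetical status codes for a language-neutral reply line such as "404 no such object". */
enum Status {
    OK(200),
    BAD_REQUEST(400),
    NOT_FOUND(404),
    SERVER_ERROR(500);

    final int code;

    Status(int code) { this.code = code; }

    /** Maps a Java exception onto a code that a non-Java client can act on. */
    static Status fromException(Exception e) {
        if (e instanceof NoSuchElementException) return NOT_FOUND;
        if (e instanceof IllegalArgumentException) return BAD_REQUEST;
        return SERVER_ERROR;
    }

    /** Formats the first line of a reply: numeric code, one space, free-form detail. */
    String replyLine(String detail) {
        return code + " " + detail;
    }

    /** Parses the numeric code at the start of a reply line; unknown codes fall back by range. */
    static Status parse(String line) {
        int space = line.indexOf(' ');
        int code = Integer.parseInt(space < 0 ? line : line.substring(0, space));
        for (Status s : values()) {
            if (s.code == code) return s;
        }
        if (code >= 200 && code < 300) return OK;
        if (code >= 400 && code < 500) return BAD_REQUEST;
        return SERVER_ERROR;
    }
}
```

The point of the fallback in parse() is that a client written against an older code list can still classify a newer code by its range, which is one of the ideas HTTP status codes make easy to borrow.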

> one issue maybe special to sockets though
> is the matter of whether or not the whole message has been received,
> often resulting in some annoying code to basically read messages from
> the socket and not decode them until the entire message has been received.


There is that. Although I find that once you've worked through one or
two socket implementations that you tend to devise some pretty re-usable
code for handling the incomplete message situations.
[ SNIP ]

AHS
--
...wherever the people are well informed they can be trusted with their
own government...
-- Thomas Jefferson, 1789
 
 
BGB
Guest
Posts: n/a
 
      02-08-2012
On 2/8/2012 4:19 AM, Arved Sandstrom wrote:
> On 12-02-08 04:41 AM, BGB wrote:
>> On 2/7/2012 5:26 PM, Arved Sandstrom wrote:
>>> On 12-02-07 07:38 PM, BGB wrote:
>>>> On 2/7/2012 11:11 AM, jebblue wrote:
>>> [ SNIP ]
>>>>>
>>>>> I recommend using sockets.
>>>>>
>>>>
>>>> in general, I agree (sockets generally make the most sense), although
>>>> there are cases where file-based communications can make sense, although
>>>> probably not in the form as described in the OP.
>>>>
>>>>
>>>> another issue (besides how to pass messages), is what sort of form to
>>>> pass messages in.
>>>>
>>>> usually, in my case, if storing data in files, I tend to prefer
>>>> ASCII-based formats.
>>>>
>>>> usually, for passing messages over sockets, I have used "compact"
>>>> specialized binary formats, typically serialized data from some other
>>>> form (such as XML nodes or S-Expressions). although "magic byte value"
>>>> based message formats are initially simpler, they tend to be harder to
>>>> expand later (whereas encoding/decoding some more generic form, though
>>>> initially more effort, can turn out to be easier to maintain and extend
>>>> later).
>>>>
>>>> note: this does not mean SOAP or CORBA or some other "standardized"
>>>> messaging system, rather just that one initially builds and processes
>>>> the messages in some form that is more high-level than spitting out
>>>> bytes, and processing everything via a loop and a big "switch()" or
>>>> similar (although this can be an initially fairly simple option, so has
>>>> some merit due to ease of implementation).
>>>>
>>>>
>>>> the main reason for picking a binary message-serialization format (for
>>>> something like S-Expressions or XML nodes), would be mostly if there is
>>>> a chance that the serialized data will go over the internet, and a
>>>> textual format can be a bit bulkier (and thus slower to transmit over a
>>>> slower connection), as well as typically being slower to decode (a
>>>> sanely designed message format can be much more quickly unpacked than a
>>>> textual format can be parsed).
>>>>
>>>> sending text over sockets may have merits as well, and is generally
>>>> preferable for "open" protocols.
>>>>
>>>> or such...
>>>
>>> I've done a fair bit with sockets myself, including recently, in fact
>>> including on a current gig. Some of the message formats have been
>>> designed by others, some by me. A few of them are specialized industry
>>> standards, some are very custom and bespoke.
>>>
>>> A few of the formats have been binary: fixed-length blocks of data with
>>> fields at various offsets. Works well enough if it suits the data.
>>>
>>> A bunch of others have been text and line-oriented: a fixed number of
>>> lines of data in known order, so that line 10 is always the data for a
>>> particular field.
>>>
>>> Other things to consider: JAXB, JSON etc. Minimum coding fuss at the
>>> endpoints if that's what's appropriate for constructing message payloads.
>>>
>>> I like text-based protocols, for some simple situations, that behave
>>> like SMTP or POP. But it obviously depends on what you expect your
>>> client and server to do, it's just another approach to be aware of.
>>>

>>
>> well, text need not be all that limiting.

>
> You may have misunderstood something I said if you got that impression
> from me, that text is all that limiting.
>
> [ SNIP ]
>


ok.

it came off that you were implying that text only really worked well for
simple protocols, like SMTP, POP, HTTP, ...


>> note: my main way of working with XML is typically via DOM-style
>> interfaces (if I am using it, it is typically because I am directly
>> working with the data structure, and not as the result of some dumb-ass
>> "data binding" crud...).

>
> I haven't been able to completely avoid using the DOM, but I loathe the
> API. If I'm using XML at all, and JAXB suits, I'll use JAXB. More
> generally I'll use SAX or StAX.
>


I have rarely done things for which SAX has made sense...
usually in cases where SAX would make sense, I end up using
line-oriented text formats instead (because there is often little
obvious reason for why XML syntax would make much sense).


> I almost never encounter a situation where DOM is called for, simply
> because no random access to the document is called for. When I send XML
> back and forth as a payload, the entire thing is meant to be used, and
> it makes sense to do the immediate and complete conversion into real
> information rather than storing it into an opaque and kludgy DOM
> representation.
>


often, I use it for things like compiler ASTs, where it competes somewhat
against S-Expressions (they are produced by the main parser, worked on,
and then later converted into bytecode or similar).

typically, one works by walking the tree, and potentially
rebuilding/rewriting a new tree in the process, or maybe adding
annotations to the existing tree.


a recent case where I did consider using XML as a message-passing
protocol, I ended up opting for S-Expressions (or, more properly,
Lisp-style lists) instead, mostly because they are a lot easier to build
and process, and much less painful than working with a DOM-style API
(and also because S-Expressions tend to perform better and use less
memory in my case as well...).

typically, the messages are tree-structured data of some sort (in the
recent example, it was being used for scene-graph delta messages, which
basically update the status of various objects in the scene, as well as
passing other events for things "going on", like sound-effects being
heard, updates to camera location and status, ...).


it is also desirable to keep the serialized representation small, since
a lot may be going on (in real time), and it would be annoying (say, to
players) if the connection got needlessly bogged down sending lots of
overly verbose update messages (more so if one has stuff like
network-synchronized rag-dolls or similar, where a ragdoll may send
position updates for nearly every bone for every frame).

say:
(bonedelta 499 (bone 0 (org ...) (rot ...)) (bone 1 (org ...) (rot ...))
...)
(bonedelta 515 ...)
...


hence, it may make a little sense to employ a compressed binary format.
I also personally dislike schemas or similar concepts, as they tend to
make things brittle (both the transmitter and receiver need a correct
and up-to-date schema, creating a higher risk of version issues), and
typically don't really compress all that much better (and are
potentially worse) than what a decent adaptive coding can do.

("on the wire", S-Exps and XML are not all that drastically different,
the main practical differences are more in terms of how one may work
with them in-program).


granted, yes, text+deflate also works OK if one is feeling lazy (since
IME Deflate will typically reduce textual XML or S-Exps to around
10%-25% of their original size, vs say the 5%-10% one might get with a
specialized binary format).
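
For the "lazy" text+deflate route, a minimal Java sketch using the standard java.util.zip classes might look like the following. The class name DeflateText and the choice of UTF-8 and BEST_COMPRESSION are assumptions for the example, and the sketch makes no claim about the ratios quoted above; actual sizes depend heavily on the data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Hypothetical helper: compress a textual message (XML, S-Exps, ...) before sending it. */
final class DeflateText {
    /** Encodes the text as UTF-8 and deflates it in one shot. */
    static byte[] compress(String text) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(text.getBytes(StandardCharsets.UTF_8));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();   // release native zlib resources
        return out.toByteArray();
    }

    /** Inflates a complete compressed message back into text. */
    static String decompress(byte[] packed) {
        Inflater inflater = new Inflater();
        inflater.setInput(packed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                // No output and nothing left to read means the input was cut short.
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported deflate data");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt deflate data", e);
        } finally {
            inflater.end();
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

This compresses each message independently, which is the simplest option; keeping one compressor alive across messages on a connection usually compresses small messages better, at the cost of more state to reset on errors.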

there is also the tradeoff of designing a binary format to be standalone
(say, including its own Huffman compressor), or to be used in
combination with deflate (at which point one tries to design the format
to instead produce data which deflate can utilize efficiently).

in the latter option, there is the secondary concern of external deflate
(assuming that the data will probably be sent in a compressed channel or
stored in a ZIP file or similar), or using deflate internally (like in
PNG or similar).

there are many tradeoffs...


> For a lot of situations, not just message passing between endpoints, I
> have backed away from XML anyway. For configuration files I have gotten
> newly enthused by .properties files, because so often they fit the bill
> much better than XML configuration files. And I mentioned JSON
> previously, I prefer that to XML in many situations now.
>


I typically use line-oriented text formats for most of these purposes...

never really did understand why someone would use XML for things like
configuration files (it neither makes them easier to process, nor does
it help anything with users trying to edit them).


as-is, my configuration format consists of "console commands", which may
in turn set "cvars" or issue key-binding commands, ...

for another (more serious) system, I am using a format which is
partially a hybrid of INI and REG files (it is for a registry-like
hierarchical database). I have on/off considered switching to a binary
database format, but never got around to it.

some amount of other data is stored in formats similar to the Quake map
format, or other special-purpose text formats.


> [ SNIP ]
>
>>> One of the big things in designing one's own messaging is error
>>> handling. People generally do just fine with the happy path, but ignore
>>> comprehensive error handling, or get wrapped around the axle trying to
>>> do it.

>>
>> yeah, but this applies to programming in general, so message-passing is
>> likely nothing special here.

>
> That's true, but it's maybe a bit more of an art form with messages.
> Your message producer may be Java and produce beautiful exceptions in
> your carefully designed exception hierarchy, but your clients may very
> well not be Java at all, in which case you may end up with an error
> message sub-protocol that borrows ideas from HTTP status codes.
>
> A lot of Java programmers these days maybe have never really dealt with
> return codes, because we sort of tell them not to use them in Java, but
> in the case of implementation-neutral status codes (including ones for
> errors) that's really the design mindset that you need to be in: status
> codes.
>


granted, I am actually primarily a C and C++ programmer, but
message-passing isn't particularly language-specific. granted, yes, the
lack of "standard" exceptions is an annoyance in C, where typically one
either needs to not use exceptions, or ends up using non-portable
exception mechanisms, and there is no particularly good way to "build
one's own", although some people have done some fairly "creative"
things with macros...


>> one issue maybe special to sockets though
>> is the matter of whether or not the whole message has been received,
>> often resulting in some annoying code to basically read messages from
>> the socket and not decode them until the entire message has been received.

>
> There is that. Although I find that once you've worked through one or
> two socket implementations that you tend to devise some pretty re-usable
> code for handling the incomplete message situations.
> [ SNIP ]
>


yep.

one can always tag messages and prefix them with a length.

{ tag, length, data[length] }
message is then not processed until entire data region is received.
typically, this is plenty sufficient.
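
A minimal Java sketch of the { tag, length, data[length] } idea follows, assuming a 1-byte tag and a 4-byte big-endian length (the field widths are not specified above). The class name TlvFramer, the Message holder and the 1 MiB sanity limit are likewise made up for the example. The decoder buffers whatever the socket delivers and only hands back messages whose data region has fully arrived.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical framing helper: { tag (1 byte), length (4 bytes, big-endian), data[length] }. */
final class TlvFramer {
    /** One fully received message. */
    static final class Message {
        final int tag;
        final byte[] data;
        Message(int tag, byte[] data) { this.tag = tag; this.data = data; }
    }

    private static final int HEADER_SIZE = 5;       // 1 tag byte + 4 length bytes
    private static final int MAX_LENGTH = 1 << 20;  // assumed sanity limit (1 MiB)

    private byte[] pending = new byte[0];            // bytes received but not yet consumed

    /** Builds one frame ready to be written to the socket. */
    static byte[] encode(int tag, byte[] data) {
        return ByteBuffer.allocate(HEADER_SIZE + data.length)
                .put((byte) tag).putInt(data.length).put(data).array();
    }

    /** Feeds the first count bytes of chunk; returns only the messages that are now complete. */
    List<Message> feed(byte[] chunk, int count) {
        byte[] joined = Arrays.copyOf(pending, pending.length + count);
        System.arraycopy(chunk, 0, joined, pending.length, count);
        ByteBuffer buf = ByteBuffer.wrap(joined);
        List<Message> out = new ArrayList<>();
        while (buf.remaining() >= HEADER_SIZE) {
            buf.mark();
            int tag = buf.get() & 0xFF;
            int length = buf.getInt();
            if (length < 0 || length > MAX_LENGTH) {
                throw new IllegalStateException("implausible length: " + length);
            }
            if (buf.remaining() < length) {  // data region not fully received yet
                buf.reset();                 // rewind to the start of this frame
                break;
            }
            byte[] data = new byte[length];
            buf.get(data);
            out.add(new Message(tag, data));
        }
        // Keep the unconsumed tail (a partial frame, if any) for the next call.
        pending = Arrays.copyOfRange(joined, buf.position(), joined.length);
        return out;
    }
}
```

In use, the read loop would call feed(buffer, n) with the byte count returned by InputStream.read(buffer) and dispatch each returned Message by its tag. Copying the pending bytes on every call keeps the sketch short; a real implementation would more likely reuse one growable buffer.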


likewise, a PPP/HDLC style system (message start/end codes) could also
be used.


depending on other factors, one can also do things like in JPEG or MPEG,
and use a special escape-code for messages and control-codes.

this can allow a top-level message format like:
{ escape-code, tag [ length, data[length] ... ] }

typically, in the cases I have seen, there have been ways to escape the
escape-code, for when it appears by chance in the data. this in turn adds
the annoyance of having to escape any escape-codes in the payload data.

some others have partly worked around the above by making the escape
code fairly long (32 or 48 bits or more) and very unlikely to appear by
chance, and likely involving "sanity checks" to try to rule out false
positives.

say: { escape-magic, tag, length, data[length], checksum }
with the assumption that chance is very unlikely to lead to all of:
an escape magic, a valid tag value, a sane length, and a valid checksum.
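
A minimal Java sketch of the { escape-magic, tag, length, data[length], checksum } layout and its sanity checks follows. All specifics are assumptions for illustration: the 32-bit magic value, the 1-byte tag with an assumed valid range, the 4-byte big-endian length, CRC-32 as the checksum, and the names CheckedFrame and findFirstPayload.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;

/** Hypothetical frame: magic (4) | tag (1) | length (4) | data[length] | CRC-32 (4). */
final class CheckedFrame {
    static final int MAGIC = 0x7E5AC3A5;      // arbitrary value chosen for this sketch
    static final int MAX_TAG = 0x0F;          // assumed range of valid tags: 1..15
    static final int MAX_LENGTH = 65536;      // assumed "sane length" limit
    static final int OVERHEAD = 4 + 1 + 4 + 4;

    static byte[] encode(int tag, byte[] data) {
        ByteBuffer buf = ByteBuffer.allocate(OVERHEAD + data.length);
        buf.putInt(MAGIC).put((byte) tag).putInt(data.length).put(data);
        // The checksum covers tag, length and data, but not the magic itself.
        buf.putInt((int) crc(buf.array(), 4, 5 + data.length));
        return buf.array();
    }

    /** Scans for the first position that passes every sanity check; returns its data or null. */
    static byte[] findFirstPayload(byte[] stream) {
        ByteBuffer buf = ByteBuffer.wrap(stream);
        for (int start = 0; start + OVERHEAD <= stream.length; start++) {
            if (buf.getInt(start) != MAGIC) continue;                     // no magic here
            int tag = stream[start + 4] & 0xFF;
            if (tag < 1 || tag > MAX_TAG) continue;                       // not a valid tag
            int length = buf.getInt(start + 5);
            if (length < 0 || length > MAX_LENGTH) continue;              // not a sane length
            if (start + OVERHEAD + length > stream.length) continue;      // frame would overrun
            int expected = buf.getInt(start + 9 + length);
            if ((int) crc(stream, start + 4, 5 + length) != expected) continue;  // bad checksum
            return Arrays.copyOfRange(stream, start + 9, start + 9 + length);
        }
        return null;
    }

    private static long crc(byte[] bytes, int offset, int count) {
        CRC32 c = new CRC32();
        c.update(bytes, offset, count);
        return c.getValue();
    }
}
```

A candidate position is only accepted when the magic, tag, length and checksum all agree, so a magic value that merely happens to occur in payload data is skipped and the scan moves on one byte at a time. A frame that has not fully arrived is treated the same as "no frame yet" here.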

depending, the escape-magic and tag can be the same value.

for example:
the byte 0x7E is magic;
7E,00 escapes 7E (or maybe 7E,7E)
7E,01 Start Of Message (followed by message data)
7E,02 End Of Message (maybe, followed by checksum)
others: reserved for link-control messages.


then one can pass encoded messages over the link.
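
A minimal Java sketch of the 0x7E scheme listed above, taking the "7E,00 escapes 7E" variant. The class name EscapeFramer is made up for the example, the optional checksum after End Of Message is left out, and the reserved link-control codes are simply ignored, which is an assumption rather than anything specified above.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical byte-stuffing framer: 7E,01 starts a message, 7E,02 ends it, 7E,00 is a literal 7E. */
final class EscapeFramer {
    static final int ESC = 0x7E;
    static final int ESC_LITERAL = 0x00;
    static final int START = 0x01;
    static final int END = 0x02;

    private final ByteArrayOutputStream current = new ByteArrayOutputStream();
    private boolean inMessage = false;   // between Start Of Message and End Of Message
    private boolean sawEscape = false;   // previous byte was 7E, possibly in an earlier chunk

    /** Wraps a payload in start/end markers, escaping any 7E bytes inside it. */
    static byte[] encode(byte[] payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(payload.length + 4);
        out.write(ESC);
        out.write(START);
        for (byte b : payload) {
            out.write(b);
            if ((b & 0xFF) == ESC) out.write(ESC_LITERAL);
        }
        out.write(ESC);
        out.write(END);
        return out.toByteArray();
    }

    /** Feeds received bytes; returns the payloads of messages completed by this chunk. */
    List<byte[]> feed(byte[] chunk) {
        List<byte[]> done = new ArrayList<>();
        for (byte raw : chunk) {
            int b = raw & 0xFF;
            if (sawEscape) {
                sawEscape = false;
                switch (b) {
                    case ESC_LITERAL:
                        if (inMessage) current.write(ESC);
                        break;
                    case START:
                        current.reset();          // a new start discards any unfinished message
                        inMessage = true;
                        break;
                    case END:
                        if (inMessage) done.add(current.toByteArray());
                        inMessage = false;
                        break;
                    default:
                        break;                    // reserved link-control code: ignored in this sketch
                }
            } else if (b == ESC) {
                sawEscape = true;
            } else if (inMessage) {
                current.write(b);
            }
        }
        return done;
    }
}
```

Because the decoder keeps its state between calls, an escape pair split across two socket reads is handled the same way as one that arrives whole, and a receiver that joins mid-stream just drops bytes until the next Start Of Message.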

typically, I have not tried parsing incomplete messages, as trying to
make a message decoder deal gracefully with truncated data is a bit more
of a hassle.

depending on other factors (say, if one is using Huffman), then one can
also use special markers to transmit the Huffman tables and other things.

say:
7E,03: Stream Reset (possibly followed by a stream/protocol ID magic)
7E,04-07: Huffman Tables 0-3
7E,08: End Of Huffman Table
...
 