Special Character Token

 
 
Sameer
      03-01-2005
Hello,
In the process of designing a chatting system, I have to send some text
from one machine to another. This text usually contains 3 to 4 parts
separated by a token like ~ or ^ or $. At the other end I use
StringTokenizer to decode the text.
This only works if the texts separated by these tokens do not themselves
contain the tokens. We cannot expect that from users: a user may type a
message that contains these tokens, and that will make the chat system
malfunction.

Can I insert some special token characters that cannot easily be
generated from the keyboard or in normal typing?
How do I generate such token characters?
Please give answer in Java and Unicode context.
Give methods for coding and decoding of characters and to embed them in
text.
-Sameer

 
Eric Sosman
      03-01-2005


Sameer wrote:
> Hello,
> In the process of designing a chatting system, I have to send some text
> from one machine to another. This text usually contains 3 to 4 parts
> separated by a token like ~ or ^ or $. At the other end I use
> StringTokenizer to decode the text.
> It is expected that the texts separated by these tokens must not
> contain such tokens. We do not expect such things from users and a user
> may type a message which contain these tokens and it will lead to
> malfunctioning of the chatting system.
>
> Can I insert some special character tokens which can not be generated
> by keyboard easily or in general typing.
> How to generate such token characters?
> Please give answer in Java and Unicode context.
> Give methods for coding and decoding of characters and to embed them in
> text.


"Security by obscurity" is not very robust. As soon as
somebody figures out the right ALT sequence or similar trick,
the vandals will have a field day with your chat system.

A better way is to develop an encoding that can handle
all characters, even those that would ordinarily have special
meaning. One simple approach is to double a special character
whenever it appears in a non-special context (e.g., in the
message body). For example, if you use # to delimit the
parts of the message and the three parts are

Knick-knack paddy-whack

Give # dog # bone

This old ### came rolling home

... you could transmit the message as

#
Knick-knack paddy-whack
#
Give ## dog ## bone
#
This old ###### came rolling home
#

When the receiver gets this stream of characters it looks
for each #. If a # is followed by another #, the two become
one # considered as an ordinary data character. But if a #
is followed by something other than a second #, it is a part
separator, not a data character.
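
A minimal Java sketch of the doubling scheme described above. It is an illustration, not code from this thread: the class name HashFraming and its method names are made up, and it assumes each part is terminated by # followed by a newline (rather than by a bare #), so that a part that begins or ends with # still decodes unambiguously. It is also stricter than the description: a # followed by anything other than # or a newline is rejected instead of being treated as a separator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class HashFraming {
    static final char MARK = '#';   // delimiter character, doubled when it appears in data
    static final char END = '\n';   // assumption: MARK + END terminates a part

    /** Doubles every MARK in each part and appends MARK + END after it. */
    static String encode(List<String> parts) {
        StringBuilder out = new StringBuilder();
        for (String part : parts) {
            for (int i = 0; i < part.length(); i++) {
                char c = part.charAt(i);
                if (c == MARK) {
                    out.append(MARK);   // data '#' becomes '##'
                }
                out.append(c);
            }
            out.append(MARK).append(END);
        }
        return out.toString();
    }

    /** Reverses encode(); throws IllegalArgumentException on malformed input. */
    static List<String> decode(String wire) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < wire.length()) {
            char c = wire.charAt(i);
            if (c != MARK) {
                current.append(c);      // ordinary data character
                i++;
                continue;
            }
            if (i + 1 >= wire.length()) {
                throw new IllegalArgumentException("dangling " + MARK + " at end of input");
            }
            char next = wire.charAt(i + 1);
            if (next == MARK) {
                current.append(MARK);   // '##' is one data '#'
            } else if (next == END) {
                parts.add(current.toString());   // '#' + newline ends the part
                current.setLength(0);
            } else {
                throw new IllegalArgumentException("unexpected character after " + MARK);
            }
            i += 2;
        }
        if (current.length() > 0) {
            throw new IllegalArgumentException("last part is not terminated");
        }
        return parts;
    }

    public static void main(String[] args) {
        List<String> parts = Arrays.asList(
                "Knick-knack paddy-whack", "Give # dog # bone", "This old ### came rolling home");
        String wire = encode(parts);
        System.out.print(wire);
        System.out.println(decode(wire).equals(parts));
    }
}
```

Because data characters only ever produce # in pairs, the decoder can read left to right and always knows whether a # starts a doubled data character or a terminator.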


 
Oscar kind
      03-01-2005
Sameer wrote:
> In the process of designing a chatting system, I have to send some text
> from one machine to another. This text usually contains 3 to 4 parts
> separated by a token like ~ or ^ or $. At the other end I use
> StringTokenizer to decode the text.
> It is expected that the texts separated by these tokens must not
> contain such tokens. We do not expect such things from users and a user
> may type a message which contain these tokens and it will lead to
> malfunctioning of the chatting system.
>
> Can I insert some special character tokens which can not be generated
> by keyboard easily or in general typing.
> How to generate such token characters?
> Please give answer in Java and Unicode context.
> Give methods for coding and decoding of characters and to embed them in
> text.


As stated earlier by Eric, such a thing will not work because the text of
the user can include anything. His idea of doubling special characters is
therefore a good one.

<plug mode="shameless">

Another solution is to use CSV records, although implementing this from
scratch would be more work. See my playground project on
http://oscar.stachanov.com/java/
(look for the classes CSVParser & CSVFormatter)

</plug mode="shameless">
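
To give an idea of what the CSV approach involves on the sending side, here is a small Java sketch. It is not the CSVFormatter class from the project mentioned above; the class and method names are hypothetical, and the quoting rules are the commonly used ones (assumed here): a field is wrapped in double quotes if it contains a comma, a double quote, a line break, or leading/trailing whitespace, and embedded double quotes are doubled.

```java
import java.util.Arrays;
import java.util.List;

class CsvRecordSketch {
    /** Quotes a single field only when it needs it. */
    static String quoteField(String field) {
        boolean needsQuotes = field.indexOf(',') >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0
                || !field.equals(field.trim());   // leading or trailing whitespace
        if (!needsQuotes) {
            return field;
        }
        // Double every embedded quote, then wrap the whole field in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Joins the fields of one record with commas. */
    static String formatRecord(List<String> fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append(quoteField(fields.get(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical chat message with three parts: sender, text, remark.
        System.out.println(formatRecord(Arrays.asList("Sameer", "Hi, all", "say \"hello\"")));
    }
}
```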


--
Oscar Kind http://home.hccnet.nl/okind/
Software Developer for contact information, see website

PGP Key fingerprint: 91F3 6C72 F465 5E98 C246 61D9 2C32 8E24 097B B4E2
 
shriop
      03-03-2005
I went and looked at this project of yours. Do you really think
wrapping up ReadLine.Split(',') inside a class is going to fool anyone?
And your description for the project says that you're cleanly and
correctly handling the csv format. This is totally wrong. I'm sorry.

 
Oscar kind
      03-03-2005
shriop wrote:
> I went and looked at this project of yours. Do you really think
> wrapping up ReadLine.Split(',') inside a class is going to fool anyone?
> And your description for the project says that you're cleanly and
> correctly handling the csv format. This is totally wrong. I'm sorry.


The implementation is correct: it handles the CSV format exactly as
specified here:
http://www.creativyst.com/Doc/Articles/CSV/CSV01.htm

This implementation behaves more predictably than, for example,
Microsoft's: that one uses the list separator from the regional settings,
but sometimes silently ignores it. Microsoft didn't document that their
implementation does this, let alone when, nor which separator is used
instead.

Also, IMHO, using String.split(String, int) doesn't make an implementation
unclean (and there is no ReadLine class btw). I'm therefore not trying to
fool anyone.

Admittedly, there are improvements possible, and I welcome any
constructive criticism. This requires arguments though. Did you have any?


 
shriop
      03-04-2005
You're absolutely right. I was too quick to judge, and now I see how
you're handling all the situations. On a second look, the only rule
that, as far as I can see, you're still violating is

Fields with leading or trailing spaces must be delimited with
double-quote characters.

You appear to always be trimming leading and trailing whitespace
whether in quotes or not. Other than that, and other than the fact
that your class is very string heavy, it does appear correct.

 
Oscar kind
      03-04-2005
shriop wrote:
> You appear to always be trimming leading and trailing whitespace
> whether in quotes or not. Other than that, and other than that fact
> that your class is very string heavy, it does appear correct.


It is rather heavy: String.split(String, int) uses regular expressions,
which isn't efficient for a case as simple as this. It's just easy to
understand and maintain.

Also, note that I trim leading and trailing whitespace first, and then
remove surrounding quotes (if present): the field separator (',') may
be surrounded by whitespace. This whitespace isn't considered part of
the fields (hence it's trimmed). This is also the reason that fields
with leading and/or trailing whitespace should be quoted.
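
The order of the two steps can be illustrated with a short Java sketch. This is not the code from the project under discussion; the class and method names are hypothetical, and it assumes the raw field text has already been cut out of the record.

```java
class FieldCleanupSketch {
    /** Trims whitespace around the raw field first, then strips surrounding quotes. */
    static String cleanField(String raw) {
        String s = raw.trim();   // whitespace around the separator is not part of the field
        if (s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"")) {
            // Whitespace inside the quotes survives; doubled quotes become single quotes.
            s = s.substring(1, s.length() - 1).replace("\"\"", "\"");
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println("[" + cleanField("  plain  ") + "]");
        System.out.println("[" + cleanField("  \"  padded  \"  ") + "]");
    }
}
```

Because trimming happens before the quotes are removed, whitespace outside the quotes is dropped while whitespace inside them is kept.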

If I were to optimize it, I would need to do the following:
- Read the stream character by character (probably buffered, but still)
- Add field values character by character instead of token by token

This works approximately the same, but the algorithm is (IMHO) less easy
to understand, as it is more low-level. I'm not used to that.
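
For comparison, a character-by-character reader along the lines sketched in the list above might look roughly like this in Java. It is an illustration under assumptions, not the project's code: all names are hypothetical, the separator is fixed to a comma, a record ends at a line break, and a java.io.PushbackReader is used so that one character of lookahead is available for doubled quotes.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CsvCharParserSketch {
    /** Reads one record; returns null when the input is exhausted. */
    static List<String> readRecord(PushbackReader in) throws IOException {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;   // currently between an opening and a closing quote
        boolean quoted = false;     // this field was quoted, so its whitespace is kept
        boolean sawInput = false;
        int c;
        while ((c = in.read()) != -1) {
            sawInput = true;
            char ch = (char) c;
            if (inQuotes) {
                if (ch != '"') {
                    field.append(ch);
                } else {
                    int next = in.read();
                    if (next == '"') {
                        field.append('"');      // "" inside quotes is a literal quote
                    } else {
                        inQuotes = false;       // closing quote
                        if (next != -1) {
                            in.unread(next);    // let the main loop handle this character
                        }
                    }
                }
            } else if (ch == ',') {
                fields.add(finish(field, quoted));
                field.setLength(0);
                quoted = false;
            } else if (ch == '\n') {
                break;                          // end of record
            } else if (ch == '"' && !quoted && field.toString().trim().isEmpty()) {
                field.setLength(0);             // drop whitespace before the opening quote
                inQuotes = true;
                quoted = true;
            } else if (quoted && Character.isWhitespace(ch)) {
                // whitespace after the closing quote belongs to the separator
            } else if (ch != '\r') {
                field.append(ch);
            }
        }
        if (!sawInput) {
            return null;
        }
        fields.add(finish(field, quoted));
        return fields;
    }

    /** Unquoted fields are trimmed; quoted fields are kept as they are. */
    private static String finish(StringBuilder field, boolean quoted) {
        return quoted ? field.toString() : field.toString().trim();
    }

    /** Convenience wrapper for a single line held in memory. */
    static List<String> parseLine(String line) {
        try {
            return readRecord(new PushbackReader(new StringReader(line)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parseLine("Sameer ,  \"Hi, all\" , \" padded \""));
    }
}
```

When reading from a socket or file, wrapping the source in a BufferedReader before the PushbackReader would keep the single-character reads cheap.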


 
shriop
      03-05-2005
You got me again, you're right about the trimming.

 