Regexp and Pattern.class

 
 
roger_varley@yahoo.com
Guest
Posts: n/a
 
      12-17-2004
Hi

I've got an application (over which I have no control) that presents
its data as a single string. The data contains ' (single quote)
characters that denote end of line. However, the data can also
legitimately contain the ' character, so the generating program escapes
any embedded ' characters with ? (question mark). (It's a
TRADACOMS-formatted EDI file, if anyone is interested.)

How can I phrase the regexp parameter to the Pattern.split() method to
split the string back into the original lines? Once I've cracked this,
the + and : characters used to split each line into groups and
individual fields should be easy.

Or am I going to have to hand-roll this by reading the string a
character at a time?

Regards
Roger

Tilman Bohn
Guest
Posts: n/a
 
      12-17-2004
In message <(E-Mail Removed) .com>,
(E-Mail Removed) wrote on 17 Dec 2004 08:19:21 -0800:

> Hi
>
> I've got an application (over which I have no control) that presents
> its data as a single string. The data contains ' (single quote)
> characters that denote end of line. However, the data can also
> legitimately contain the ' character, so the generating program escapes
> any embedded ' characters with ? (question mark). (It's a
> TRADACOMS-formatted EDI file, if anyone is interested.)


First question: Can a question mark followed by an apostrophe be
legal application data? If so, how is the question mark or the
complete sequence escaped?

For now I'll assume the sequence ?' can never occur legally in
the application data.

> How can I phrase the regexp parameter to the Pattern.split() method to
> split the string back into the original lines?


Under the above assumption you would split either on "(?<!\\?)'"
or on "(?<=[^?])'", according to taste. The look-behind assertions
are needed so the last character of each line isn't cut off.
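
A minimal sketch of the two suggested patterns, with a plain "[^?]'" pattern added for contrast to show the cut-off character that the look-behind avoids. The class name and the sample data are made up for illustration, and the sketch relies on the assumption stated above that ?' never occurs as real data.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LookBehindSplit {
    // Apostrophe NOT preceded by '?' (negative look-behind).
    static final Pattern NEGATIVE = Pattern.compile("(?<!\\?)'");
    // Apostrophe preceded by some character other than '?' (positive look-behind).
    // Unlike NEGATIVE, this one cannot match an apostrophe at index 0.
    static final Pattern POSITIVE = Pattern.compile("(?<=[^?])'");
    // For contrast only: no look-behind, so the preceding character is consumed.
    static final Pattern NO_LOOKBEHIND = Pattern.compile("[^?]'");

    public static void main(String[] args) {
        // Hypothetical sample: two lines, the first containing an escaped apostrophe.
        String data = "AB+1?'2'CD+3'";
        System.out.println(Arrays.toString(NEGATIVE.split(data)));
        System.out.println(Arrays.toString(POSITIVE.split(data)));
        System.out.println(Arrays.toString(NO_LOOKBEHIND.split(data)));
    }
}
```

With this sample the first two patterns give the lines AB+1?'2 and CD+3, while the contrast pattern loses the final character of each line.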

> Once I've cracked this,
> the + and : characters used to split each line into groups and
> individual fields should be easy


So no help needed there then. Ok.

> Or am I going to have to hand-roll this by reading the string a
> character at a time?


Nope. The above should work.

--
Cheers, Tilman

--
`Boy, life takes a long time to live...' -- Steven Wright
 
roger_varley@yahoo.com
Guest
Posts: n/a
 
      12-17-2004

>
> First question: Can a question mark followed by an apostrophe be
> legal application data? If so, how is the question mark or the
> complete sequence escaped?
>


I've never seen that combination in <mumble> years of handling
TRADACOMS EDI files so I've had to actually go and test it. The
generating program emits ???' wherever the sequence ?' occurs.


> For now I'll assume the sequence ?' can never occur legally in
> the application data.
>


Thanks for your help.

Regards
Roger

 
klynn47@comcast.net
Guest
Posts: n/a
 
      12-17-2004
Sometimes I find it easier to use the Unicode representation of certain
characters.
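
One possible reading of this suggestion, sketched under the same assumption as the earlier look-behind pattern: the regex engine accepts \uhhhh escapes, so the question mark and the apostrophe can be written by code point instead of as backslash-escaped literals. The class name and sample data are hypothetical.

```java
import java.util.regex.Pattern;

class UnicodeEscapes {
    // The doubled backslashes pass the escapes through to the regex engine,
    // which reads them as a literal U+003F ('?') and a literal U+0027 (').
    // Equivalent to the pattern "(?<!\\?)'".
    static final Pattern TERMINATOR = Pattern.compile("(?<!\\u003F)\\u0027");

    public static void main(String[] args) {
        // Hypothetical sample data.
        for (String line : TERMINATOR.split("A+B?'C'D+E'")) {
            System.out.println(line);
        }
    }
}
```

The benefit is mostly readability: a code point is never mistaken for a regex quantifier, so there is no need to think about escaping the question mark.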

 
Tilman Bohn
Guest
Posts: n/a
 
      12-17-2004
In message <(E-Mail Removed). com>,
(E-Mail Removed) wrote on 17 Dec 2004 09:24:20 -0800:

[...]
> I've never seen that combination in <mumble> years of handling
> TRADACOMS EDI files so I've had to actually go and test it. The
> generating program emits ???' wherever the sequence ?' occurs.


Interesting. Ok, in this case the pattern I gave you won't work
correctly. Before you can find the correct one you'll need to try
what happens for a) ??' and b) ???'.

--
Cheers, Tilman

`Boy, life takes a long time to live...' -- Steven Wright
 
Tilman Bohn
Guest
Posts: n/a
 
      12-17-2004
In message <(E-Mail Removed) .com>,
(E-Mail Removed) wrote on 17 Dec 2004 08:19:21 -0800:

[...]
> How can I phrase the regexp parameter to the Pattern.split() method to
> split the string back into the original lines?


BTW, that's backwards. The regexp gets passed to Pattern.compile()
first, then your input is the parameter to the split() method executed
on the resulting Pattern object.
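
The call order described here can be sketched as follows. String.split() is shown next to it because that is the variant where the regexp really is the parameter. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class CompileThenSplit {
    // The regex goes to Pattern.compile(); the input goes to split().
    static String[] viaPattern(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);
        return pattern.split(input);
    }

    // Shortcut on String: here the regex is the argument to split().
    static String[] viaString(String regex, String input) {
        return input.split(regex);
    }

    public static void main(String[] args) {
        String regex = "(?<!\\?)'";
        String input = "A+B?'C'D+E'"; // hypothetical sample data
        System.out.println(Arrays.toString(viaPattern(regex, input)));
        System.out.println(Arrays.toString(viaString(regex, input)));
    }
}
```

When the same expression is applied to many strings, compiling the Pattern once and reusing it avoids recompiling the regex on every call.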

--
Cheers, Tilman

`Boy, life takes a long time to live...' -- Steven Wright
 
roger_varley@yahoo.com
Guest
Posts: n/a
 
      12-20-2004
Hi Tilman

??' in the input produces ?????' in the output file, and ???' in the
input produces ???????' in the output.

Regards
Roger
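
These results are consistent with the generator escaping every ? as ?? and every ' as ?', although the thread does not confirm that rule. Assuming it holds, a real line terminator is an apostrophe preceded by an even number of question marks, which a single-character look-behind cannot express. A rough sketch of one alternative, matching whole lines with a Matcher instead of splitting, might look like this (class and method names and the sample data are hypothetical):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeAwareSplit {
    // One line: any run of ordinary characters or "? + any character" pairs,
    // followed by an unescaped apostrophe. \G pins each match to the end of
    // the previous one, so a match can never start inside an escape pair.
    private static final Pattern LINE =
            Pattern.compile("\\G((?:[^?']|\\?.)*)'", Pattern.DOTALL);

    // Returns the lines with their escapes still in place.
    static List<String> splitLines(String data) {
        List<String> lines = new ArrayList<>();
        Matcher m = LINE.matcher(data);
        while (m.find()) {
            lines.add(m.group(1));
        }
        return lines;
    }

    // ASSUMED escape rule: "??" stands for "?" and "?'" stands for "'".
    static String unescape(String line) {
        return line.replaceAll("(?s)\\?(.)", "$1");
    }

    public static void main(String[] args) {
        String data = "A+B?'C'D??'E+F???'G'";
        for (String line : splitLines(data)) {
            System.out.println(line + " -> " + unescape(line));
        }
    }
}
```

Note that unescaping is left as a separate step: if the + and : separators can be escaped the same way, the fields would need to be split before the ? characters are removed. Any text after the last unescaped apostrophe is silently ignored by this sketch.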
