Confusion about String.matches method

 
 
Joshua Cranmer
 
      06-01-2011
On 06/01/2011 10:10 AM, laredotornado wrote:
> which returns false. If I remove the new line ("\n"), it matches, but
> I can't guarantee my input won't contain new lines. How can I modify
> my regular expression to match? Thanks, - Dave


There is a flag that you can set to treat newlines as regular characters.

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth
 
 
 
 
 
Daniele Futtorovic
 
      06-01-2011
On 01/06/2011 17:17, Joshua Cranmer allegedly wrote:
> On 06/01/2011 10:10 AM, laredotornado wrote:
>> which returns false. If I remove the new line ("\n"), it matches, but
>> I can't guarantee my input won't contain new lines. How can I modify
>> my regular expression to match? Thanks, - Dave

>
> There is a flag that you can set to treat newlines as regular characters.
>


Pattern.DOTALL, to be precise. You can't use that in combination with
String#matches() however, as it is an argument to Pattern#compile. So
either go that way, or use the embedded flag, "(?s)" (put it at the
start of your regex).
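
A minimal sketch of both routes, using the regex and input from the original question. The class and method names (DotallDemo, matchesWithFlag, matchesWithEmbeddedFlag) are made up for illustration; Pattern.DOTALL and "(?s)" are the standard java.util.regex facilities.

```java
import java.util.regex.Pattern;

final class DotallDemo {
    // Regex and input taken from the original question.
    static final String REGEX = "^.*\\Q$45,750\\E.*$";
    static final String INPUT = "G37 Convertible\n$45,750*";

    // Route 1: compile with the DOTALL flag so '.' also matches line terminators.
    static boolean matchesWithFlag(String input) {
        return Pattern.compile(REGEX, Pattern.DOTALL).matcher(input).matches();
    }

    // Route 2: keep String#matches and switch DOTALL on with the embedded flag (?s).
    static boolean matchesWithEmbeddedFlag(String input) {
        return input.matches("(?s)" + REGEX);
    }

    public static void main(String[] args) {
        System.out.println(INPUT.matches(REGEX));            // no flag: '.' stops at the newline
        System.out.println(matchesWithFlag(INPUT));          // Pattern.DOTALL
        System.out.println(matchesWithEmbeddedFlag(INPUT));  // embedded (?s)
    }
}
```

The first line printed should be false and the other two true. If the same pattern is applied many times, compiling it once and reusing the Pattern object avoids recompiling on every call.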

--
DF.
Determinism trumps correctness.
 
 
 
 
 
Ian Shef
 
      06-01-2011
laredotornado <(E-Mail Removed)> wrote:

<snip>
> K, thought I had this all figured out thanks to everyone's
> suggestions, but I still have this one RE that's failing and I can't
> figure out why. I have
>
> "G37 Convertible\n$45,750*".matches("^.*\\Q$45,750\\E.*$")
>
> which returns false. If I remove the new line ("\n"), it matches, but
> I can't guarantee my input won't contain new lines. How can I modify
> my regular expression to match? Thanks, - Dave
>

You have not provided sufficient information. Could the new line be located
anywhere, or only adjacent to and in front of the dollar sign?

If the answer is "anywhere", it may be easier to discard all newlines first.
e.g.

String s ;
..
..
..
s = s.replace("\n", "") ;
..
..
..
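
A runnable sketch of the discard-the-newlines idea, applied to the input from the question. The class and method names (StripNewlinesDemo, stripLineBreaks, containsPrice) are hypothetical, and removing "\r" as well as "\n" is an extra assumption to cover Windows-style line endings.

```java
final class StripNewlinesDemo {
    // Remove carriage returns and line feeds before matching.
    static String stripLineBreaks(String s) {
        return s.replace("\r", "").replace("\n", "");
    }

    // Apply the regex from the question to the cleaned-up input.
    static boolean containsPrice(String s) {
        return stripLineBreaks(s).matches("^.*\\Q$45,750\\E.*$");
    }

    public static void main(String[] args) {
        System.out.println(containsPrice("G37 Convertible\n$45,750*"));
    }
}
```

Note that this joins the text on either side of each line break, so it only suits cases where the line structure does not matter afterwards.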


Another way if the line terminator could be anywhere is to enable dotall
mode. This causes period to also match line terminators. See the
documentation for Pattern for how to enable this mode.


 
 
Roedy Green
 
      06-06-2011
On Wed, 01 Jun 2011 16:02:16 +0100, Nigel Wade <(E-Mail Removed)>
wrote, quoted or indirectly quoted someone who said :

> So, to get your '\\n' in the RE you need to
>have '\\\\n' in the string.


Oops.

If you are trying to match an EOL char in a regex the two chars in RAM
will be \ n

If you are creating a string literal it will be "\\n"

The extra \ is to tell Java this is not a Java literal.

The easy way to create these strings is to use Quoter.
See http://mindprod.com/applet/quoter.html

Once you get the hang of it, you can write them off the top of your
head.


--
Roedy Green Canadian Mind Products
http://mindprod.com
How long did it take after the car was invented before owners understood
cars would not work unless you regularly changed the oil and the tires?
We have gone 33 years and still it is rare to uncover a user who
understands computers don't work without regular backups.

 
 
Esmond Pitt
 
      06-06-2011
On 6/06/2011 5:50 PM, Roedy Green wrote:
> On Wed, 01 Jun 2011 16:02:16 +0100, Nigel Wade<(E-Mail Removed)>
> wrote, quoted or indirectly quoted someone who said :
>
>> So, to get your '\\n' in the RE you need to
>> have '\\\\n' in the string.

>
> Oops.


No 'oops' about it. The poster is correct.

> If you are trying to match an EOL char in a regex the two chars in RAM
> will be \ n
>
> If you are creating a string literal it will be "\\n"


And if you are creating a regex it will be "\\\\n".

> The extra \ is to tell Java this is not a Java literal.


And the doubled \\ are there to tell the regex this is a backslashed
backslash, i.e. a real backslash, not a regex escape.

> The easy way to create these strings is to use Quoter.


I don't see how any piece of software can understand whether a quoted
string is for use in a regular expression or not.
 
 
Nigel Wade
 
      06-06-2011
On 06/06/11 08:50, Roedy Green wrote:
> On Wed, 01 Jun 2011 16:02:16 +0100, Nigel Wade <(E-Mail Removed)>
> wrote, quoted or indirectly quoted someone who said :
>
>> So, to get your '\\n' in the RE you need to
>> have '\\\\n' in the string.

>
> Oops.


Oops yourself.

>
> If you are trying to match an EOL char in a regex the two chars in RAM
> will be \ n


If they are, you won't match a newline. The '\' needs to be escaped in
the RE. The string in the RE needs to be \\n.

>
> If you are creating a string literal it will be "\\n"
>
> The extra \ is to tell Java this is not a Java literal.
>
> The easy way to create these strings is to use Quoter.
> See http://mindprod.com/applet/quoter.html
>
> Once you get the hang of it, you can write them off the top of your
> head.
>


and, apparently, get them wrong.

I repeat what I said in my previous post:

Anyone who claims they understand RE is just someone who hasn't yet
realized they don't.


--
Nigel Wade



 
 
Roedy Green
 
      06-08-2011
On Mon, 06 Jun 2011 20:18:24 +1000, Esmond Pitt
<(E-Mail Removed)> wrote, quoted or indirectly quoted someone
who said :

>>> So, to get your '\\n' in the RE you need to
>>> have '\\\\n' in the string.

>>
>> Oops.

>
>No 'oops' about it. The poster is correct.



I presumed you were trying to match a single 0x0a, the usual case.

If you were going to look for it without regexes you would look for
"\n".

If you wanted a regex you want the two chars \ and n in the regex
string.

However, if you do that, Java will think \ is an escape char, so you
need to escape the escape:

"\\n"

If you were trying to scan for the pair of characters \ n
then it gets weird since \ is an escape character both in Java and in
Regex. You scan for
"\\\\n"

Someday we will stop using "in-band" controls and the goofy quoting
problems will go away. You could use two "colours" one for commands
and one for data. We are hamstringing ourselves by imagining our
programming tools are limited to TTYs.
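
The escaping levels discussed in this thread (a single 0x0a versus the pair of characters \ n) can be checked with a small sketch. The class name and the sample strings are made up for illustration; only String#matches from the standard library is used.

```java
final class EscapeLevelsDemo {
    // Input containing one real line feed (0x0a) between 'a' and 'b'.
    static final String WITH_NEWLINE = "a\nb";
    // Input containing the two characters backslash and 'n' between 'a' and 'b'.
    static final String WITH_BACKSLASH_N = "a\\nb";

    public static void main(String[] args) {
        // Java literal "\\n" hands the regex engine the two chars \ and n,
        // which the engine reads as its own newline escape.
        System.out.println(WITH_NEWLINE.matches("a\\nb"));
        // Java literal "\n" puts a raw line feed into the pattern; it matches itself.
        System.out.println(WITH_NEWLINE.matches("a\nb"));
        // Java literal "\\\\n" hands the engine \\n: an escaped backslash, then 'n'.
        System.out.println(WITH_BACKSLASH_N.matches("a\\\\nb"));
        // The four-backslash form does not match a real line feed.
        System.out.println(WITH_NEWLINE.matches("a\\\\nb"));
    }
}
```

The first three checks print true and the last prints false: both "\n" and "\\n" in a Java literal match a real line feed, while "\\\\n" is only for the literal pair \ n.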

 
 
Roedy Green
 
      06-08-2011
On Mon, 06 Jun 2011 20:18:24 +1000, Esmond Pitt
<(E-Mail Removed)> wrote, quoted or indirectly quoted someone
who said :

>I don't see how any piece of software can understand whether a quoted
>string is for use in a regular expression or not.


It can't. It presumes everything is data, and quotes minimally. You
then apply your commands on top of the prequoted sample string.

Try it out. It can save you quite a bit of time going cross-eyed
proofreading.
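
The same "quote the data first, then add the commands" idea can be sketched with the standard library. This is not Quoter itself; it swaps in Pattern.quote, which handles the regex level of quoting at run time. The class and method names (QuoteDataDemo, containsLiteral) are hypothetical.

```java
import java.util.regex.Pattern;

final class QuoteDataDemo {
    // Build a regex that looks for 'data' verbatim anywhere in the input.
    // Pattern.quote makes every character of 'data' a literal;
    // the regex commands (?s) and .* are then applied around it.
    static boolean containsLiteral(String input, String data) {
        String regex = "(?s).*" + Pattern.quote(data) + ".*";
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(containsLiteral("G37 Convertible\n$45,750*", "$45,750"));
        System.out.println(containsLiteral("G37 Convertible\n$45,750*", "$45.750"));
    }
}
```

The first call should print true and the second false, since the quoted "." only matches a literal dot. For a plain "is this text present" check, String#contains would do the same job without any regex.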

The problem with regexes is all it takes is one char off and the whole
thing does not work. You have no clue where the problem is. You
rarely find errors with syntax checking. There is no trace.
The other problem is a regex will work 90% of the time. It may be
quietly rejecting a small percentage of the strings, and you might not
notice.
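
One way to catch the quiet rejections is a small table of sample strings with the result you expect for each. The sketch below is only an illustration: the class name, the method names and the sample strings are all assumptions, and the regex is the one from the original question with "(?s)" added.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class RegexSelfCheck {
    static final String REGEX = "(?s)^.*\\Q$45,750\\E.*$";

    // Returns how many samples produce a result different from the expected one.
    static int countFailures(String regex, Map<String, Boolean> samples) {
        int failures = 0;
        for (Map.Entry<String, Boolean> e : samples.entrySet()) {
            boolean actual = e.getKey().matches(regex);
            if (actual != e.getValue()) {
                failures++;
                System.out.println("MISMATCH: " + e.getKey() + " -> " + actual);
            }
        }
        return failures;
    }

    // Hypothetical sample inputs mapped to the expected match result.
    static Map<String, Boolean> samples() {
        Map<String, Boolean> m = new LinkedHashMap<>();
        m.put("G37 Convertible $45,750*", true);
        m.put("G37 Convertible\n$45,750*", true);
        m.put("G37 Convertible\r\n$45,750*", true);
        m.put("G37 Convertible $45.750*", false);
        return m;
    }

    public static void main(String[] args) {
        System.out.println(countFailures(REGEX, samples()) + " mismatches");
    }
}
```

Running the same table against the regex without "(?s)" reports the two samples that contain line breaks, which is exactly the kind of partial failure that is easy to miss by eye.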

 
 
Gene Wirchenko
 
      06-08-2011
On Tue, 07 Jun 2011 21:18:54 -0700, Roedy Green
<(E-Mail Removed)> wrote:

[snip]

>The problem with regexes is all it takes is one char off and the whole
>thing does not work. You have no clue where the problem is. You
>rarely find errors with syntax checking. There is no trace.
>The other problem is a regex will work 90% of the time. It may be
>quietly rejecting a small percentage of the strings, and you might not
>notice.


There are more problems than that.

I assume that you are familiar with this quote:
Some people, when confronted with a problem, think "I know, I'll use
regular expressions." Now they have two problems.

I find regexes to be less than totally useful. I sometimes have
to define a format string with substitution parameters. Here is an
example:
Per client's instruction, the total of all invoices for the current
month will be charged against the supplied credit card number on %D
unless we hear otherwise prior to that date.

The date gets substituted for the %D. There are a few rules.
There must be one and only one "%D" string. "%" is an escape character
and is doubled for the literal "%".

I could write a regex for this, BUT I also have to have a routine
for executing the string substitution, and regexes do not help with
this. I do not want two rather different versions of the code. (As
it is, I have two versions of code that are somewhat similar.) More
importantly, if one routine gets changed, so should the other, and it
should be obvious how to do it.
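
The rules above can be sketched as a single pass that validates and substitutes at the same time, so there is only one piece of code to change. This is a guess at the shape of such a routine, not the actual code; the class name, the method name and the choice to reject any other use of "%" are assumptions.

```java
final class TemplateDemo {
    // Validates and expands a template in one pass.
    // Rules from the post: "%%" is a literal '%', "%D" is the date,
    // and "%D" must appear exactly once. Rejecting every other use
    // of '%' is an assumption made for this sketch.
    static String expand(String template, String date) {
        StringBuilder out = new StringBuilder();
        int dateCount = 0;
        for (int i = 0; i < template.length(); i++) {
            char c = template.charAt(i);
            if (c != '%') {
                out.append(c);
                continue;
            }
            if (i + 1 >= template.length()) {
                throw new IllegalArgumentException("dangling % at end of template");
            }
            char next = template.charAt(++i);  // consume the character after '%'
            if (next == '%') {
                out.append('%');
            } else if (next == 'D') {
                out.append(date);
                dateCount++;
            } else {
                throw new IllegalArgumentException("unknown escape %" + next);
            }
        }
        if (dateCount != 1) {
            throw new IllegalArgumentException("expected exactly one %D, found " + dateCount);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("Charged on %D (100%% of the total).", "2011-06-30"));
    }
}
```

Adding a second variable, such as a contact name, would mean one more branch and one more counter in the same loop.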

If I wanted to add a second variable to the example above, say a
contact name, and wanted the constraint of appearing once and only
once, using a regex would get even uglier.

I could use regexes for such things as validating with no
interpretation, but such data that I have to validate usually has
trivial formatting. For example, a Canadian Postal Code is "A9A 9A9"
with some limitations on the alphabetic characters. A regex would be
overkill.
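
For comparison, a hand-written check of the "A9A 9A9" shape might look like the sketch below. The class and method names are made up, upper-case letters are assumed, and the limitations on which letters may appear are deliberately left out.

```java
final class PostalCodeDemo {
    // Checks the shape "A9A 9A9": letter, digit, letter, space, digit, letter, digit.
    // The restrictions on which letters are allowed are not implemented here.
    static boolean hasPostalCodeShape(String s) {
        if (s == null || s.length() != 7 || s.charAt(3) != ' ') {
            return false;
        }
        return isUpper(s.charAt(0)) && isDigit(s.charAt(1)) && isUpper(s.charAt(2))
            && isDigit(s.charAt(4)) && isUpper(s.charAt(5)) && isDigit(s.charAt(6));
    }

    private static boolean isUpper(char c) { return c >= 'A' && c <= 'Z'; }
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    public static void main(String[] args) {
        System.out.println(hasPostalCodeShape("K1A 0B1"));
        System.out.println(hasPostalCodeShape("K1A0B1"));
    }
}
```

The first call should print true and the second false, because the space is missing.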

Sincerely,

Gene Wirchenko
 
 
Michael Wojcik
 
      06-08-2011
Roedy Green wrote:
>
> Someday we will stop using "in-band" controls and the goofy quoting
> problems will go away. You could use two "colours" one for commands
> and one for data. We are hamstringing ourselves by imagining our
> programming tools are limited to TTYs.


While in-band signaling in strings is a problem (in fact a number of
problems, leading to many of the most common software vulnerabilities,
such as C buffer overflows and formatting errors), color-coding causes
as many problems as it solves. Not all programmers have "normal" color
vision, for one thing.

Color-coding in programming languages has been tried, notably in the
original Smalltalk. It didn't catch on, and for good reason.

Of course, many people find color-coded views of program source
useful. But since they're optional, they don't penalize programmers
who can't use them, or don't want to.

--
Michael Wojcik
Micro Focus
Rhetoric & Writing, Michigan State University
 
 
 
 