Checking that user has entered a word or words in text input form using regular expressions...

 
 
Luke Matuszewski
      04-17-2006
Hi!

I have faced the problem of checking that the user has entered the
Unicode letter (not only the ASCII set of letters). It seems that
ECMAScript 3rd edition regular expressions do not include Unicode
property classes like:

\p{L}

which stands for a Unicode letter. Has anyone done this already, perhaps
by negating other known and defined character classes in the RegExp
object?

Please help.

Best regards
Luke M.

 
Hal Rosser
      04-18-2006

"Luke Matuszewski" <(E-Mail Removed)> wrote in message
news:(E-Mail Removed) ups.com...
> Hi !
>
> I have faced the problem of checking that the user has entered the
> unicode letter (not only ASCII set of letters...). It seems that
> ECMAScript 3rd regular expressions do not include posix character
> classes like:
>
> \p{L}
>
> , which above stands for Unicode letter. Maybe someone has done it ?
> (thru negating other known and defined character classes in RegExp
> object).
>
> Please help.
>
> Best regards
> Luke M.
>

How about checking whether value.length is > 0?
Anything they could paste or type would be covered.
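
A minimal sketch of that length check, assuming the field's text is already available as a string. The name hasInput is hypothetical, and stripping surrounding whitespace first is an added assumption so that a field holding only spaces does not count as filled in.

```javascript
// Hypothetical helper: true when the field holds anything besides whitespace.
function hasInput(value) {
  // Assumption: leading and trailing whitespace should not count as input.
  var stripped = value.replace(/^\s+|\s+$/g, '');
  return stripped.length > 0;
}

console.log(hasInput('żółw')); // true
console.log(hasInput('   '));  // false
```

Note that this accepts digits and punctuation as well, so it only answers "was something entered", not "was a letter entered".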


 
Lasse Reichstein Nielsen
      04-18-2006
"Luke Matuszewski" <(E-Mail Removed)> writes:

> I have faced the problem of checking that the user has entered the
> unicode letter (not only ASCII set of letters...).


As you say, RegExp's are not helping. Nor is there the more direct
approach which would be String.prototype.isAlpha. I'm afraid you will
have to do it yourself.

As a shorthand, you can try testing whether
string.toLowerCase() != string.toUpperCase()
but I bet there are letters with only one case.
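
A rough sketch of that case comparison, applied to each character rather than to the whole string so that one cased letter cannot hide a digit elsewhere. The name looksLikeCasedLetters is hypothetical, and the per-character loop is a variation on the suggestion above, not the exact test.

```javascript
// Hypothetical helper: true when every UTF-16 code unit in the string has
// distinct lower-case and upper-case forms. This is a heuristic, not a
// real Unicode letter test.
function looksLikeCasedLetters(text) {
  if (text.length === 0) {
    return false;
  }
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    if (ch.toLowerCase() === ch.toUpperCase()) {
      return false; // digit, punctuation, or a letter without case
    }
  }
  return true;
}

console.log(looksLikeCasedLetters('Łukasz')); // true
console.log(looksLikeCasedLetters('abc123')); // false
console.log(looksLikeCasedLetters('中文'));    // false: letters, but caseless
```

The last line shows the weakness already suspected above: scripts without case distinctions are rejected even though their characters are letters.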

You could also consider why it's so important that only letters
are entered. After all, there are some pretty weird letters out
there, where a normal digit would look much nicer.

You might also consider whether "letter" is the correct description,
or if what you want is what the Unicode specification calls "Alphabetic".
See <URL:http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt>
(You can also see why it's something of a mouthful to create a regexp
for it.)

/L
--
Lasse Reichstein Nielsen - (E-Mail Removed)
DHTML Death Colors: <URL:http://www.infimum.dk/HTML/rasterTriangleDOM.html>
'Faith without judgement merely degrades the spirit divine.'
 
RobG
      04-18-2006
Lasse Reichstein Nielsen said on 18/04/2006 4:06 PM AEST:
> "Luke Matuszewski" <(E-Mail Removed)> writes:
>
>
>> I have faced the problem of checking that the user has entered the
>>unicode letter (not only ASCII set of letters...).

>
>
> As you say, RegExp's are not helping. Nor is there the more direct
> approach which would be String.prototype.isAlpha. I'm afraid you will
> have to do it yourself.


[...]

> You might also consider whether "letter" is the correct description,


The phrase 'the letter' has me confused. You seem to have interpreted
it as 'a letter', which may well be what the OP meant.


> or if what you want is what the Unicode specification calls "Alphabetic".
> See <URL:http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt>
> (You can also see why it's something of a mouthful to create a regexp
> for it


If that is the requirement, why not:

if ( !/\d/.test(inputValue) )
{
// inputValue doesn't have any digits
}


--
Rob
Group FAQ: <URL:http://www.jibbering.com/FAQ>
 
Lasse Reichstein Nielsen
      04-18-2006
RobG <(E-Mail Removed)> writes:

> Lasse Reichstein Nielsen said on 18/04/2006 4:06 PM AEST:
>> or if what you want is what the Unicode specification calls "Alphabetic".
>> See <URL:http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt>
>> (You can also see why it's something of a mouthful to create a regexp
>> for it

>
> If that is the requirement, why not:
>
> if ( !/\d/.test(inputValue) )
> {
> // inputValue doesn't have any digits


Because there's more (much more) to Unicode than letters and digits.
In the file linked, the Grapheme_Base and Math groups contain symbols
that are neither digit nor letter. Take, e.g., codepoint 0x3251:
"circled number twenty one", or 0x4dc0 "Hexagram for the creative
heaven".
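
A short sketch of the point, using the two code points named above. The name hasNoAsciiDigit is hypothetical; it only wraps the !/\d/ expression from the earlier post. In ECMAScript, \d matches only the ASCII digits 0-9.

```javascript
// The two code points named above, written as escapes.
var circledTwentyOne = '\u3251'; // CIRCLED NUMBER TWENTY ONE
var hexagram = '\u4DC0';         // HEXAGRAM FOR THE CREATIVE HEAVEN

// Hypothetical wrapper around the "contains no digit" check.
function hasNoAsciiDigit(inputValue) {
  return !/\d/.test(inputValue);
}

console.log(hasNoAsciiDigit(circledTwentyOne)); // true
console.log(hasNoAsciiDigit(hexagram));         // true
console.log(hasNoAsciiDigit('abc1'));           // false
```

Both symbols pass the check although neither is a letter, so "no digits" cannot stand in for "letters only".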

/L
 
Luke Matuszewski
      04-18-2006

Lasse Reichstein Nielsen wrote:
> RobG <(E-Mail Removed)> writes:
>
> > Lasse Reichstein Nielsen said on 18/04/2006 4:06 PM AEST:
> >> or if what you want is what the Unicode specification calls "Alphabetic".
> >> See <URL:http://www.unicode.org/Public/UNIDATA/DerivedCoreProperties.txt>
> >> (You can also see why it's something of a mouthful to create a regexp
> >> for it

> >
> > If that is the requirement, why not:
> >
> > if ( !/\d/.test(inputValue) )
> > {
> > // inputValue doesn't have any digits

>
> Because there's more (much more) to Unicode than letters and digits.
> In the file linked, the Grapheme_Base and Math groups contain symbols
> that are neither digit nor letter. Take, e.g., codepoint 0x3251:
> "circled number twenty one", or 0x4dc0 "Hexagram for the creative
> heaven".
>


There are Unicode letters and Unicode blocks (like InMongolian). To
better understand what I really mean, please read the "Unicode support"
paragraph at the following URL:

<URL:http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html>

(see also: http://www.unicode.org/unicode/reports/tr18/ ).

I have not checked the ECMAScript 4 proposal/standards track, but they
should 'upgrade' regular expressions to support classes for Unicode
blocks and categories.

Best regards
Luke M.

 
Thomas 'PointedEars' Lahn
      04-18-2006
Lasse Reichstein Nielsen wrote:

> "Luke Matuszewski" <(E-Mail Removed)> writes:
>> I have faced the problem of checking that the user has entered the
>> unicode letter (not only ASCII set of letters...).

>
> As you say, RegExp's are not helping.


But they are. It is just a matter of how complex the RegExp should/can be.

> Nor is there the more direct approach which would be
> String.prototype.isAlpha. I'm afraid you will
> have to do it yourself.


But one does not have to reinvent the wheel completely, and can use the
definition for name characters in XML specifications[1], for example,
instead. That also works for identifiers, BTW.
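
One way this could look, as a sketch only. It assumes the ranges of the NameStartChar production from XML 1.1, with ":" and "_" dropped and the range above U+FFFF left out because that would need surrogate pairs in this kind of pattern. The names NAME_START_RANGES and isNameStartText are hypothetical.

```javascript
// Assumption: BMP ranges taken from the XML 1.1 NameStartChar production,
// minus ":" and "_". The range [#x10000-#xEFFFF] is deliberately omitted.
var NAME_START_RANGES =
  'A-Za-z' +
  '\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02FF' +
  '\\u0370-\\u037D\\u037F-\\u1FFF' +
  '\\u200C\\u200D' +
  '\\u2070-\\u218F\\u2C00-\\u2FEF' +
  '\\u3001-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFFD';

var nameStartOnly = new RegExp('^[' + NAME_START_RANGES + ']+$');

// Hypothetical helper: true when every character falls in the ranges above.
function isNameStartText(text) {
  return nameStartOnly.test(text);
}

console.log(isNameStartText('Zażółć')); // true
console.log(isNameStartText('abc1'));   // false
```

These ranges are intentionally broad and include code points that are not letters (Arabic-Indic digits such as U+0660 fall inside \u037F-\u1FFF, for example), so this is an approximation of "letter", not a definition of it.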


PointedEars
___________
[1] <URL:http://www.w3.org/XML/Core/#Publications>
 
Dr John Stockton
      04-19-2006
JRS: In article <(E-Mail Removed). com>,
dated Mon, 17 Apr 2006 16:41:43 remote, seen in
news:comp.lang.javascript, Luke Matuszewski
<(E-Mail Removed)> posted :
>
> I have faced the problem of checking that the user has entered the
>unicode letter (not only ASCII set of letters...).


More generally, ISTM that it would be useful to extend RegExp notation.

\w is really a misnomer, since it refers to more than the general
"English word" characters A-Z; \i for Identifier would have been better.

\z appears to be free (so is \l; but that looks like \1), and could be
used to mean "letter of the current language".

It could be preset by default to A-Z or browser preference, reset by any
recognised language indication among the page headers, and resettable by
giving a country code or some form of expression ( [A-Z]-[AEIOU] meaning
the consonants, for example) ; it should be saveable in a variable.
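
The proposed \z notation and the [A-Z]-[AEIOU] subtraction are hypothetical, but the consonants example can be approximated in existing RegExp syntax with a negative lookahead. A minimal sketch, with the names consonantsOnly and isConsonants invented for illustration:

```javascript
// "[A-Z] minus [AEIOU]": the lookahead rejects a vowel before [A-Z]
// gets to consume the character.
var consonantsOnly = /^(?:(?![AEIOU])[A-Z])+$/;

// Hypothetical helper: true for a non-empty run of upper-case consonants.
function isConsonants(text) {
  return consonantsOnly.test(text);
}

console.log(isConsonants('BCD')); // true
console.log(isConsonants('BAD')); // false
```

The same lookahead trick works with any pair of character classes, though it does not give a saveable, language-dependent class of the kind described above.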

--
John Stockton, Surrey, UK. ?@merlyn.demon.co.uk Turnpike v4.00 IE 4
<URL:http://www.jibbering.com/faq/> JL/RC: FAQ of news:comp.lang.javascript
<URL:http://www.merlyn.demon.co.uk/js-index.htm> jscr maths, dates, sources.
<URL:http://www.merlyn.demon.co.uk/> TP/BP/Delphi/jscr/&c, FAQ items, links.
 
Luke Matuszewski
      04-22-2006

Dr John Stockton wrote:
>
> \z appears to be free (so is \l; but that looks like \1), and could be
> used to mean "letter of the current language".
>


Also the Perl \p{prop} and \P{prop} notation could be included.

<quote
url="http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Pattern.html#ubc">

\p{prop} matches if the input has the property prop, while \P{prop}
does not match if the input has that property. Blocks are specified
with the prefix In, as in InMongolian. Categories may be specified with
the optional prefix Is: Both \p{L} and \p{IsL} denote the category of
Unicode letters. Blocks and categories can be used both inside and
outside of a character class.

</quote>
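
For comparison, later editions of ECMAScript (ES2018 onward) did add \p{...} and \P{...}, available only when the regular expression carries the "u" flag. The sketch below assumes an engine with that support; it would be a syntax error or a silent mismatch in an ECMAScript 3 engine. The variable names are illustrative.

```javascript
// Assumption: the engine supports ES2018 Unicode property escapes.
var lettersOnly = /^\p{L}+$/u;
// One or more words made of letters, separated by whitespace.
var wordsOnly = /^\p{L}+(?:\s+\p{L}+)*$/u;

console.log(lettersOnly.test('Łukasz'));    // true
console.log(lettersOnly.test('中文'));       // true
console.log(lettersOnly.test('abc123'));    // false
console.log(wordsOnly.test('dzień dobry')); // true

// Scripts are written as Script=Name; the Java "In" block prefix
// (InMongolian) is not part of this syntax.
console.log(/^\p{Script=Mongolian}+$/u.test('\u1820\u1821')); // true
```

\p{Alphabetic} is also accepted under the same flag, which covers the "Alphabetic" property mentioned earlier in the thread.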

BR
Luke M.

 