Trying to write my first Regex's

 
 
Robert TV
 
      06-25-2004
Hi, I am trying to learn the fine points of writing correct regexes to
untaint my data. I have gone through a few tutorials and I have a very basic
idea of their operations. I would like some assistance writing them
correctly.

Example 1

$name = "Jimmy Spenser";
# allow $name to only have letters or spaces by filtering out unwanted junk
if ($name =~ /\d|[\!\@\#\$\%\^\&\*\(\)\-\=\_\+]/) {
print "Bad"
} else {
print "Good";
}

I'm sure the above is sloppy and right now you're laughing. Also, there are
other characters that were not included in the filter. My goal was to
filter out any digits ("\d") and all the characters listed after it. I tried
$name =~ /\W/ but that wouldn't allow spaces. What is the best way to allow
$name to contain only letters (of any case) or spaces?

Example 2

$address = "#12 - 4243 Jones Street.";
# allow $address to only have letters, digits, the # sign or spaces by
# filtering out unwanted junk
if ($address =~ /[\!\@\$\%\^\&\*\(\)\-\=\_\+]/) {
print "Bad"
} else {
print "Good";
}

Now my filter needs to allow digits and the # sign as well as letters,
periods, spaces, etc. Is there a better way to write these filters so that
I can "define" what I consider allowable instead of filtering out what is
bad? The value is allowed to have, for instance, digits, letters, the number
sign, periods and spaces, but does not HAVE to contain them; any other
character would be detected as bad.

My end goal is to create a web form that is secure by not allowing
bad stuff.

Thank you all

Robert


 
 
 
 
 
Bob Walton
 
      06-25-2004
Robert TV wrote:

> Hi, I am trying to learn the fine points of writing correct regexes to
> untaint my data. I have gone through a few tutorials and I have a very basic
> idea of their operations. I would like some assistance writing them
> correctly.
>
> Example 1
>
> $name = "Jimmy Spenser";
> # allow $name to only have letters or spaces by filtering out unwanted junk
> if ($name =~ /\d|[\!\@\#\$\%\^\&\*\(\)\-\=\_\+]/) {



You'd better carefully read and study "perldoc perlre" -- that regexp
isn't even close. It will match any string containing, anywhere in it,
one of these characters: a digit, !, @, #, $, %, ^, &, *, (, ), -, =, _,
or +. It will fail to match many other characters you probably don't
want either, such as the control characters, ~, `, [, {, |, \, and so on.
If you wanted to match any string which contains a character that is
not a letter or whitespace, you might try:

if($name =~ /[^a-z\s]/i){

But warning: that is not how to untaint stuff. Keep reading.


> print "Bad"
> } else {
> print "Good";
> }
>



Well, you want to design a regexp that will allow only what you want,
not one that disallows specific stuff -- if you happen to neglect a
disallow item, it would get through. So to have a regexp that matches
only when the string consists entirely of letters or whitespace, try:

if($name =~ /^[a-z\s]*$/i){
print "Good\n";
}
else{
print "Bad\n";
}

In that regexp, the /i switch on the end makes it case
insensitive (saves writing the character class as [a-zA-Z\s]). The ^
anchors the start of the match at the beginning of the string so
something like ***blah won't match, and the $ anchors the end of the
match at the end of the string (or just before a trailing newline) so
something like blah*** won't match. Note that \s matches any single
whitespace character.
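
A minimal sketch of that whitelist check run against a few sample strings. The sub name is_valid_name and the sample inputs are made up for illustration; the pattern is the one given above.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $name holds only letters and whitespace.
sub is_valid_name {
    my ($name) = @_;
    return $name =~ /^[a-z\s]*$/i ? 1 : 0;
}

for my $candidate ("Jimmy Spenser", "***blah", "blah***", "Jimmy5") {
    printf "%-15s %s\n", $candidate, is_valid_name($candidate) ? "Good" : "Bad";
}
```

Because of the * quantifier, an empty string also passes; changing * to + would require at least one character.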

You should also read up on tainting (perldoc perlsec) where you will
learn that you need to assign a variable's value from one of the $1, $2
etc variables which result from a successful pattern match from a regexp
containing parentheses groupings. This means something like:

...
if($name =~ /^([a-z\s]*)$/i){
$name=$1; #$name is now untainted
}
else{
die "\$name had a bad value which I refuse to untaint: $name";
}
...
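
The same capture-and-assign idea can be wrapped in a small sub so every form field goes through one place. This is only a sketch: the name untaint_name is hypothetical, and it returns undef rather than dying so the caller can decide what to do with a bad value.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured value, or undef if the input
# contains anything other than letters and whitespace.
sub untaint_name {
    my ($raw) = @_;
    return undef unless defined $raw;
    if ($raw =~ /^([a-z\s]*)$/i) {
        return $1;    # $1 comes from the match, so Perl treats it as untainted
    }
    return undef;
}

my $name = untaint_name("Jimmy Spenser");
print defined $name ? "Accepted: $name\n" : "Rejected\n";
```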


> I'm sure the above is sloppy and right now you're laughing. Also, there are
> other characters that were not included in the filter. My goal was to
> filter out any digits ("\d") and all the characters listed after it. I tried
> $name =~ /\W/ but that wouldn't allow spaces. What is the best way to allow
> $name to contain only letters (of any case) or spaces?
>
> Example 2
>
> $address = "#12 - 4243 Jones Street.";
> # allow $address to only have letters, digits, the # sign or spaces by
> # filtering out unwanted junk
> if ($address =~ /[\!\@\$\%\^\&\*\(\)\-\=\_\+]/) {
> print "Bad"
> } else {
> print "Good";
> }
>



Again, write a regexp to match only on what you *want to permit*, like:

if($address =~ /^([a-z\d#\s]*)$/i){
$address=$1; #$address now untainted
}
else {
die "I refuse to untaint this tainted crap: $address";
}

I note, though, that this will fail on your example string because it
contains a period and a hyphen, neither of which is among your defined
permitted characters above.
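
If the period and hyphen are meant to be allowed, one possible adjustment is to add them to the class. This is a sketch under that assumption; the sub name untaint_address and the second sample string are invented for illustration.

```perl
use strict;
use warnings;

# Assumed whitelist: letters, digits, #, period, hyphen and whitespace.
sub untaint_address {
    my ($raw) = @_;
    return $raw =~ /^([a-z\d#.\-\s]*)$/i ? $1 : undef;
}

for my $candidate ("#12 - 4243 Jones Street.", "4243 Jones St; ls") {
    my $clean = untaint_address($candidate);
    print "$candidate => ", (defined $clean ? "Good" : "Bad"), "\n";
}
```

Inside a character class the period is literal, and the hyphen is escaped so it is not read as a range.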


> Now my filter needs to allow digits and the # sign as well as letters,
> periods, spaces, etc. Is there a better way to write these filters so that
> I can "define" what I consider allowable instead of filtering out what is
> bad? The value is allowed to have, for instance, digits, letters, the number
> sign, periods and spaces, but does not HAVE to contain them; any other
> character would be detected as bad.
>
> My end goal is to create a web form that is secure by not allowing
> bad stuff.



An admirable goal. Be sure to very carefully think through what you
permit, as making a bad decision in your untainting regexp can leave
security holes. Just the fact that Perl considers the data to be
untainted does not mean it is secure -- that is up to your regexp. Perl
helps you a lot by guaranteeing that you did pass the
data through an untainting regexp.
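
To illustrate the point, here is a sketch comparing a pattern that captures everything with the restrictive one from earlier. The input string is a made-up hostile value; both patterns would count as "untainting" to Perl, but only one actually checks anything.

```perl
use strict;
use warnings;

my $input = "Jimmy; rm -rf /";    # hypothetical hostile form value

# Too permissive: captures the whole string whatever it contains.
my ($careless) = $input =~ /^(.*)$/s;

# Restrictive: only letters and whitespace get through.
my ($careful) = $input =~ /^([a-z\s]*)$/i;

print "careless: ", (defined $careless ? "accepted" : "rejected"), "\n";
print "careful:  ", (defined $careful  ? "accepted" : "rejected"), "\n";
```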


....


> Robert


--
Bob Walton
Email: http://bwalton.com/cgi-bin/emailbob.pl

 
 
 
 
 
Iain Chalmers
 
      06-25-2004
In article <ILLCc.847056$Pk3.308032@pd7tw1no>,
"Robert TV" wrote:

> Hi, I am trying to learn the fine points of writing correct regexes to
> untaint my data. I have gone through a few tutorials and I have a very basic
> idea of their operations. I would like some assistance writing them
> correctly.
>
> Example 1
>
> $name = "Jimmy Spenser";
> # allow $name to only have letters or spaces by filtering out unwanted junk
> if ($name =~ /\d|[\!\@\#\$\%\^\&\*\(\)\-\=\_\+]/) {
> print "Bad"
> } else {
> print "Good";
> }
>
> I'm sure the above is sloppy and right now you're laughing. Also, there are
> other characters that were not included in the filter. My goal was to
> filter out any digits ("\d") and all the characters listed after it. I tried
> $name =~ /\W/ but that wouldn't allow spaces. What is the best way to allow
> $name to contain only letters (of any case) or spaces?


Note the ^ as the first character in a character class negates the
class, so:


if ($name =~ /[^A-Za-z ]/) { print "Bad"}

means "if name contains anything that's not [A-Za-z ]"
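
As a sketch, the negated class and an anchored whitelist can be written as two small subs that should agree on every input. The sub names and sample strings are made up; \A and \z are used in the anchored version so that a trailing newline is not quietly accepted.

```perl
use strict;
use warnings;

# Reject if any character falls outside the allowed set.
sub name_ok_negated {
    my ($s) = @_;
    return $s !~ /[^A-Za-z ]/ ? 1 : 0;
}

# Accept only if the whole string is made of allowed characters.
# \A and \z anchor at the true start and end of the string.
sub name_ok_anchored {
    my ($s) = @_;
    return $s =~ /\A[A-Za-z ]*\z/ ? 1 : 0;
}

for my $s ("Jimmy Spenser", "Jimmy5", "Jimmy\n") {
    (my $shown = $s) =~ s/\n/\\n/g;    # make the newline visible when printing
    printf "%-15s negated=%d anchored=%d\n",
        $shown, name_ok_negated($s), name_ok_anchored($s);
}
```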

>
> Example 2
>
> $address = "#12 - 4243 Jones Street.";
> # allow $address to only have letters, digits, the # sign or spaces by
> # filtering out unwanted junk
> if ($address =~ /[\!\@\$\%\^\&\*\(\)\-\=\_\+]/) {
> print "Bad"
> } else {
> print "Good";
> }


if ($address=~ /[^0-9A-Za-z#. ]/) { print "Bad"}

>
> Now my filter needs to allow digits and the # sign as well as letters,
> periods, spaces, etc. Is there a better way to write these filters so that
> I can "define" what I consider allowable instead of filtering out what is
> bad? The value is allowed to have, for instance, digits, letters, the number
> sign, periods and spaces, but does not HAVE to contain them; any other
> character would be detected as bad.


See character classes in perlre

perldoc perlre

cheers,

big
--
"I ran out of gas! I got a flat tire! I didn't have change for cab fare!
I lost my tux at the cleaners! I locked my keys in the car! An old friend
came in from out of town! Someone stole my car! There was an earthquake!
A terrible flood! Locusts! It wasn't my fault I swear to god!" Jake Blues
 
 
Robert TV
 
      06-25-2004
"Bob Walton" wrote:
> An admirable goal. Be sure to very carefully think through what you
> permit, as making a bad decision in your untainting regexp can leave
> security holes. Just the fact that Perl considers the data to be
> untainted does not mean it is secure -- that is up to your regexp. Perl
> helps you a lot by guaranteeing that you did pass the
> data through an untainting regexp.


Thank you, Bob. That was an excellent reply; your suggestions and advice will
be of great value in my learning process. I really appreciate your
assistance.

Robert


 
 
Daedalus
 
      06-25-2004
> > Now my filter needs to allow digits and the # sign as well as letters,
> > periods, spaces, etc. Is there a better way to write these filters so that
> > I can "define" what I consider allowable instead of filtering out what is
> > bad? The value is allowed to have, for instance, digits, letters, the number
> > sign, periods and spaces, but does not HAVE to contain them; any other
> > character would be detected as bad.
> >
> > My end goal is to create a web form that is secure by not allowing
> > bad stuff.
>
> An admirable goal. Be sure to very carefully think through what you
> permit, as making a bad decision in your untainting regexp can leave
> security holes. Just the fact that Perl considers the data to be
> untainted does not mean it is secure -- that is up to your regexp. Perl
> helps you a lot by guaranteeing that you did pass the
> data through an untainting regexp.


It might be a good idea to write a more precise regexp when permitting
a special character, specifying where it can appear in the string rather
than just permitting it within a class.
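
For example, a sketch that allows # only as a unit prefix at the very start of an address. The address layout assumed here (optional "#unit - ", a street number, one or more words, an optional final period) is invented for illustration and would be far too strict for real addresses; the names $address_re and untaint_address are hypothetical.

```perl
use strict;
use warnings;

my $address_re = qr/
    \A
    (?: \# \d+ \s* - \s* )?           # optional unit prefix such as "#12 - "
    \d+ \s+                           # street number
    [A-Za-z]+ (?: \s+ [A-Za-z]+ )*    # one or more words
    \.?                               # optional trailing period
    \z
/x;

# Returns the captured address, or undef if it does not fit the layout.
sub untaint_address {
    my ($raw) = @_;
    return $raw =~ /($address_re)/ ? $1 : undef;
}

for my $candidate ("#12 - 4243 Jones Street.", "42#43 Jones Street") {
    my $clean = untaint_address($candidate);
    print "$candidate => ", (defined $clean ? "Good" : "Bad"), "\n";
}
```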

DAE



 