Robert TV wrote:
> Hi, I am trying to learn the fine points of writing correct regexes to
> untaint my data. I have gone through a few tutorials and I have a very basic
> idea of their operations. I would like some assistance writing them
> correctly.
>
> Example 1
>
> $name = "Jimmy Spenser";
> # allow $name to only have letters or spaces by filtering out unwanted junk
> if ($name =~ /\d|[\!\@\#\$\%\^\&\*\(\)\-\=\_\+]/) {
You'd better carefully read and study "perldoc perlre" -- that regexp
isn't even close. It will match any string containing, anywhere in it,
one of these characters: a digit, !, @, #, $, %, ^, &, *, (, ), -, =, _,
or +. It will fail to match many other characters you probably don't
want either, such as control characters, ~, `, [, {, |, \, and so on.
If you wanted to match any string which contains a character that is
not a letter or whitespace, you might try:
    if ($name =~ /[^a-z\s]/i) {
But be warned: that is not how to untaint data. Keep reading.
> print "Bad"
> } else {
> print "Good";
> }
>
Well, you want to design a regexp that allows only what you want,
not one that disallows specific stuff -- if you happen to neglect a
disallowed item, it would get through. So for a regexp that matches
only when the whole string is letters or whitespace, try:
    if ($name =~ /^[a-z\s]*$/i) {
        print "Good\n";
    }
    else {
        print "Bad\n";
    }
In that regexp, the /i switch on the end makes it case insensitive
(saves writing the character class as [a-zA-Z\s]). The ^ anchors the
start of the match at the beginning of the string, so something like
***blah won't match, and the $ anchors the end of the match at the end
of the string, so something like blah*** won't match. Note that \s
matches any single whitespace character, which includes tabs and
newlines as well as spaces.
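To see what the anchors buy you, here is a small sketch comparing the
anchored pattern with an unanchored one. The helper name check_name and
the sample strings are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "Good" only if the whole string consists
# of letters and whitespace, using the anchored pattern from above.
sub check_name {
    my ($name) = @_;
    return $name =~ /^[a-z\s]*$/i ? "Good" : "Bad";
}

for my $name ("Jimmy Spenser", "***blah", "blah***", "Jimmy2") {
    printf "%-16s %s\n", "'$name'", check_name($name);
}

# Without the anchors, the pattern only has to find zero or more
# permitted characters somewhere in the string, so it always matches.
print "unanchored pattern matches '***blah'\n"
    if "***blah" =~ /[a-z\s]*/i;
```

Note that the empty string also counts as "Good" here, because * allows
zero characters. Use + instead of * if an empty name should be rejected.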
You should also read up on tainting (perldoc perlsec), where you will
learn that to untaint a value you must assign it from one of the $1, $2,
etc. variables set by a successful match against a regexp containing
parenthesized groups. This means something like:
...
    if ($name =~ /^([a-z\s]*)$/i) {
        $name = $1; # $name is now untainted
    }
    else {
        die "\$name had a bad value which I refuse to untaint: $name";
    }
...
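If you want to watch the taint flag change, a sketch along these lines
may help. It assumes you run it as "perl -T script.pl SomeName" so that
the command-line argument is tainted; without -T nothing is tainted and
both checks report "not tainted". The helper name untaint_name and the
fallback value are my own inventions. tainted() comes from the core
Scalar::Util module.

```perl
use strict;
use warnings;
use Scalar::Util qw(tainted);

# Hypothetical helper: returns the captured (untainted) value, or undef
# if the input holds anything other than letters and whitespace.
sub untaint_name {
    my ($value) = @_;
    return $value =~ /^([a-z\s]*)$/i ? $1 : undef;
}

# Under -T, command-line arguments are tainted. The fallback string is
# only here so the script also runs with no arguments.
my $raw  = @ARGV ? $ARGV[0] : "Jimmy Spenser";
my $name = untaint_name($raw);
defined $name
    or die "Refusing to untaint bad value: $raw\n";

printf "raw value:   %s\n", tainted($raw)  ? "tainted" : "not tainted";
printf "clean value: %s\n", tainted($name) ? "tainted" : "not tainted";
```

Returning undef on failure, rather than the original string, keeps a
rejected value from slipping through by accident.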
> I'm sure the above is sloppy and right now you're laughing. Also there are
> other characters that exist that were not included in the filter. It was my
> goal to filter out any digits "\d" and all the trailing characters. I tried
> $name =~ /\W/ but that wouldn't allow spaces. What is the best way to allow
> $name to only have any case letters or spaces?
>
> Example 2
>
> $address = "#12 - 4243 Jones Street.";
> # allow $address to only have letters, digits, the # sign or spaces by
> # filtering out unwanted junk
> if ($name =~ /[\!\@\$\%\^\&\*\(\)\-\=\_\+]/) {
> print "Bad"
> } else {
> print "Good";
> }
>
Again, write a regexp to match only on what you *want to permit*, like:
    if ($name =~ /^([a-z\d#\s]*)$/i) {
        $name = $1; # $name is now untainted
    }
    else {
        die "I refuse to untaint this tainted crap: $name";
    }
I note, though, that this will fail on your example string because it
contains a period and a hyphen, neither of which is among your defined
permitted characters above.
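As a quick check of that, the sketch below runs the example address
against the pattern above and against an extended one. The extended
pattern, which also permits a period and a hyphen, is my assumption
about what you might want, not something you asked for; decide for
yourself whether those characters are safe for your use.

```perl
use strict;
use warnings;

my $address = "#12 - 4243 Jones Street.";

# Pattern from above: letters, digits, '#', and whitespace only.
my $strict = qr/^([a-z\d#\s]*)$/i;

# Assumed extension: also permit '.' and '-'. The hyphen goes last in
# the class so it is taken literally rather than as a range operator.
my $extended = qr/^([a-z\d#\s.-]*)$/i;

print "strict:   ", ($address =~ $strict   ? "Good" : "Bad"), "\n";
print "extended: ", ($address =~ $extended ? "Good" : "Bad"), "\n";
```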
> Now my filter needs to allow digits and the # sign as well as letters and
> periods and spaces etc. Is there a way to better write these filters so that
> I can "define" what I consider allowable instead of filtering out what is
> bad? $name is allowed to have for instance /digits/letters/number
> sign/period/spaces/ but does not HAVE to contain them, any other character
> would be detected as bad.
>
> My end goal will be creating a web form that will be secure by not allowing
> bad stuff.
An admirable goal. Be sure to think through very carefully what you
permit, as a bad decision in your untainting regexp can leave
security holes. Just because Perl considers the data untainted does
not mean it is secure -- that is up to your regexp. Perl helps you a
lot by making certain that you did pass the data through an untainting
regexp.
....
> Robert
--
Bob Walton
Email:
http://bwalton.com/cgi-bin/emailbob.pl