matching all perldoc names but no more

 
 
wana
      11-06-2004
I was getting carried away answering myself in another thread so I thought I
should purify my actual problem:

I am allowing a user to enter a perldoc name and I will run 'perldoc $name'
for them.

What regex will match all perldoc names but not allow for a command to be
slipped into the name?

for example, here is my latest:

/^[a-zA-Z1-9\:]+$/

if you allowed just anything:

/.*/

a user could enter 'perlref | rm -r ./*' or something like that.

previous attempts:

/^[a-z]+$/

seemed perfect but left out perlfaq1-9

/^[a-z1-9]+$/

left out CGI and other ones with caps.

Is there a rule for all current and future perldoc names? I mean, they
can't possibly have a | or a > in their name or even a space in the middle,
right?

wana
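A side note on the question above, as a minimal sketch rather than a definitive answer: whatever pattern is chosen, the shell can be kept out of the picture by calling `system` with a list of arguments. The helper name `clean_perldoc_name` and the exact character class are assumptions made for illustration (ASCII letters, digits, underscore, with `::` as a separator). Note that the `1-9` range in the attempts above leaves out the digit `0`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the name if it looks like a perldoc page or
# module name, otherwise undef. The character class is an assumption.
sub clean_perldoc_name {
    my ($name) = @_;
    return unless defined $name;
    # \A and \z anchor the whole string; unlike $, \z rejects a trailing newline.
    return $name =~ /\A([A-Za-z0-9_]+(?:::[A-Za-z0-9_]+)*)\z/ ? $1 : undef;
}

# Only runs perldoc when a name is given on the command line.
if (@ARGV) {
    my $name = clean_perldoc_name( $ARGV[0] );
    die "not a plausible perldoc name\n" unless defined $name;

    # List form of system: the arguments go straight to perldoc, no shell
    # ever sees them, so | and > have no special meaning.
    system( 'perldoc', $name ) == 0
        or die "perldoc failed: $?\n";
}
```

Because the class contains no `-`, a name that starts with a dash (and would be read by perldoc as an option) is rejected as well.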
 
Tad McClellan
      11-06-2004
wana wrote:

> I am allowing a user to enter a perldoc name and I will run 'perldoc $name'
> for them.
>
> What regex will match all perldoc names but not allow for a command to be
> slipped into the name.



You won't need to solve that problem if you choose an approach
that does not require solving that problem.

If they can only look up the std docs, then build a lookup table
of the actual installed std docs, see code below.

Or maybe process the =head2 POD tags in perltoc.pod for legal names.

I think this ought to work though: /^(\w|:)+$/

(leaving out single quote on purpose since it is deprecated.)


---------------------------------
#!/usr/bin/perl
use warnings;
use strict;

foreach my $pod ( 'foo bar', qw/ perlnope perl perltoc perlfunc / ) {
    if ( is_pod($pod) )
        { print "$pod is a POD\n" }
    else
        { print "$pod is *not* a POD\n" }
}


BEGIN {
    my %pods;   # names of the installed std PODs, without the ".pod" suffix

    # ask perldoc where perlfunc.pod lives, then keep only the directory part
    chomp( my $dir = qx/ perldoc -l perlfunc / );
    $dir =~ s#/[^/]+$##; # should use File::Basename here...

    opendir POD, $dir or die "could not open '$dir' directory $!";
    # the dot is escaped so that only a literal ".pod" suffix is stripped
    $pods{ $_ } = 1 for map { s/\.pod$// ? $_ : () } readdir POD;
    closedir POD;

    sub is_pod { exists $pods{ $_[0] } ? 1 : 0 }
}
---------------------------------
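The comment in the code above says File::Basename should be used for the directory step. A hedged sketch of that variant follows; the helper name `pod_names_in` is made up for illustration, and it uses a lexical directory handle instead of the bareword `POD`.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);

# Hypothetical helper: collect the names of all *.pod files in one
# directory and return them as a hash set (suffix removed).
sub pod_names_in {
    my ($dir) = @_;
    opendir( my $dh, $dir ) or die "could not open '$dir' directory: $!";
    my %pods = map { /\A(.+)\.pod\z/ ? ( $1 => 1 ) : () } readdir $dh;
    closedir $dh;
    return \%pods;
}

# Usage sketch (needs perldoc on the PATH, so it is left as a comment):
#   chomp( my $file = qx/perldoc -l perlfunc/ );
#   my $pods = pod_names_in( dirname($file) );
#   print "perlre is a POD\n" if exists $pods->{perlre};
```

`dirname` replaces the hand-written `s#/[^/]+$##` substitution, so the path handling is left to a core module.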


--
Tad McClellan SGML consulting
Perl programming
Fort Worth, Texas
 
A. Sinan Unur
      11-06-2004
wana wrote:

> I was getting carried away answering myself in another thread so I
> thought I should purify my actual problem:
>
> I am allowing a user to enter a perldoc name and I will run 'perldoc
> $name' for them.


I think you are going down the wrong road. You know exactly the list of
phrases you want to allow. Why don't you just restrict the options to that?
Even if you do not have Perl on your computer, it is not hard to write a
script to parse the output of perldoc perltoc. That will give you the list
of allowable phrases. Then you can make sure the phrase sent to your CGI
matches one of those in the set of allowable perldoc arguments.

Sinan
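A rough sketch of the whitelist idea, working from the POD source as suggested earlier in the thread. It assumes the table of contents marks each document with a line of the form `=head2 NAME - description`; that layout, the helper name `names_from_toc`, and the sample text are assumptions for illustration, so check them against the perltoc on your own system.

```perl
use strict;
use warnings;

# Hypothetical helper: pull NAME out of every "=head2 NAME - description"
# line in a chunk of POD source and return the names as a hash set.
sub names_from_toc {
    my ($pod_text) = @_;
    my %allowed;
    while ( $pod_text =~ /^=head2\s+([A-Za-z0-9_:]+)\s+-\s/mg ) {
        $allowed{$1} = 1;
    }
    return \%allowed;
}

# Made-up sample input in the assumed layout.
my $sample = <<'END_POD';
=head2 perlre - Perl regular expressions

=head2 File::Basename - Parse file paths into directory, filename and suffix.

=head2 Some heading without a dash
END_POD

my $allowed = names_from_toc($sample);
print "$_\n" for sort keys %$allowed;
```

The user's input is then accepted only if `exists $allowed->{$input}` is true, so no pattern has to anticipate every dangerous character.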
 
wana
      11-06-2004
Tad McClellan wrote:

> wana wrote:
>
>> I am allowing a user to enter a perldoc name and I will run 'perldoc
>> $name' for them.
>>
>> What regex will match all perldoc names but not allow for a command to be
>> slipped into the name.

>
>
> You won't need to solve that problem if you choose an approach
> that does not require solving that problem.
>
> If they can only look up the std docs, then build a lookup table
> of the actual installed std docs, see code below.
>
> Or maybe process the =head2 POD tags in perltoc.pod for legal names.
>
> I think this ought to work though: /^(\w|:)+$/


I only avoided \w because perlre states that it is not portable across
character sets and may be insecure, which is critical in my case. That may
or may not be an issue in my program.

wana

>
> (leaving out single quote on purpose since it is deprecated.)
>
>
> ---------------------------------
> #!/usr/bin/perl
> use warnings;
> use strict;
>
> foreach my $pod ( 'foo bar', qw/ perlnope perl perltoc perlfunc / ) {
> if ( is_pod($pod) )
> { print "$pod is a POD\n" }
> else
> { print "$pod is *not* a POD\n" }
> }
>
>
> BEGIN {
> my %pods;
>
> chomp( my $dir = qx/ perldoc -l perlfunc / );
> $dir =~ s#/[^/]+$##; # should use File::Basename here...
>
> opendir POD, $dir or die "could not open '$dir' directory $!";
> $pods{ $_ } = 1 for map { s/\.pod$// ? $_ : () } readdir POD;
> closedir POD;
>
> sub is_pod { exists $pods{ $_[0] } ? 1 : 0 }
> }
> ---------------------------------
>
>


 
wana
      11-08-2004
Jim Gibson wrote:

> wana wrote:
>
>> Tad McClellan wrote:
>>
>> > wana wrote:
>> >

>
> [ problem of untainting perldoc subjects snipped ]
>
>> >
>> > I think this ought to work though: /^(\w|:)+$/

>>
>> I only avoided \w because perlre states that it is not portable across
>> character sets and may be insecure, which is critical in my case. That
>> may or may not be an issue in my program.

>
> Where in perldoc perlre does it say that? It does not say it in the
> version (5.8.5) on my computer. I could not find the string 'insecure'
> anywhere in 'perldoc perlre', and 'portable' only occurs once in a
> discussion of character ranges.


The words to look for are 'unsafe' and 'unportable' about 78% into perlre.
The discussion about character ranges is what I am talking about.
[a-zA-Z1-9] is safe but \w may vary in different locales.

wana
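To make the difference concrete, here is a small sketch. It swaps in Unicode rules rather than a locale to show `\w` accepting more than the spelled-out class, since that behaves the same on every machine; the variable names are illustrative. The `/a` modifier needs perl 5.14 or later, so it is not available on the 5.8 perls discussed in this thread.

```perl
use strict;
use warnings;

my $ascii = 'perlfaq1';
my $alpha = "perl\x{3b1}";    # ends in GREEK SMALL LETTER ALPHA (U+03B1)

# Explicit class: accepts only the characters that are spelled out.
my $explicit = qr/\A[A-Za-z0-9_]+\z/;

# Plain \w: Unicode rules apply to this string, so the Greek letter
# counts as a word character.
my $word = qr/\A\w+\z/;

# \w under /a: restricted to the ASCII word characters.
my $word_ascii = qr/\A\w+\z/a;

my %result = (
    explicit   => ( $alpha =~ $explicit   ? 1 : 0 ),
    word       => ( $alpha =~ $word       ? 1 : 0 ),
    word_ascii => ( $alpha =~ $word_ascii ? 1 : 0 ),
);

# Prints: explicit: 0, word: 1, word_ascii: 0 (one per line)
print "$_: $result{$_}\n" for sort keys %result;
```

The pure-ASCII string `$ascii` matches all three patterns; only the string with the non-ASCII letter tells them apart.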
 
Alan J. Flavell
      11-08-2004
On Mon, 8 Nov 2004, wana wrote:

> Jim Gibson wrote:
>
> > wana wrote:


> >> I only avoided \w because perlre states that it is not portable
> >> across character sets and may be insecure, which is critical in
> >> my case. That may or may not be an issue in my program.


That depends on what you mean by "insecure".

> > Where in perldoc perlre does it say that? It does not say it in
> > the version (5.8.5) on my computer. I could not find the string
> > 'insecure' anywhere in 'perldoc perlre', and 'portable' only
> > occurs once in a discussion of character ranges.

>
> The words to look for are 'unsafe' and 'unportable' about 78% into perlre.


I don't read that as being about "security" (in the usual meaning of
that term)...

> The discussion about character ranges is what I am talking about.
> [a-zA-Z1-9] is safe


It'll reliably do a specific job. I'd suggest that the use of the
word "unsafe" in the documentation is a bit misleading. I think in
this specific reference it means "might not do what the naive reader
expects"; but "unsafe" often refers to the possibility of malicious
data causing security-relevant damage to result (such as, for example,
unintended interpolation taking place using externally-derived data),
and that's not what is intended here, AFAICS.

> but \w may vary in different locales.


Which, in some situations, might be exactly what one wants.

all the best
 
Ben Morrow
      11-09-2004

Quoth "Alan J. Flavell":
> On Mon, 8 Nov 2004, wana wrote:
> > Jim Gibson wrote:
> >
> > > wana wrote:

>
> > >> I only avoided \w because perlre states that it is not portable
> > >> across character sets and may be insecure, which is critical in
> > >> my case. That may or may not be an issue in my program.

>

<snip>
>
> It'll reliably do a specific job. I'd suggest that the use of the
> word "unsafe" in the documentation is a bit misleading. I think in
> this specific reference it means "might not do what the naive reader
> expects"; but "unsafe" often refers to the possibility of malicious
> data causing security-relevant damage to result (such as, for example,
> unintended interpolation taking place using externally-derived data),
> and that's not what is intended here, AFAICS.


The locale is externally-derived data. A malicious user could (under
some OSen at least) construct their own locale that said ';' was a word
character.

I would hope (but I haven't tested) that if 'use locale' is in effect
and the locale setting was tainted then such regexen won't untaint...
One can always secure things by explicitly asking for the C locale, or
simply not using 'locale', which will cause \w to match what you expect.
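A minimal sketch of the "not using locale" option: put `no locale` in the block that does the laundering, which is the approach perlsec describes for locale-aware programs. The helper name `launder_name` is hypothetical, and the pattern is only an example.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the captured name, or undef if the input
# contains anything other than word characters and colons.
sub launder_name {
    my ($name) = @_;
    no locale;    # \w in this block does not consult the process locale
    return $name =~ /\A([\w:]+)\z/ ? $1 : undef;
}

my $ok  = launder_name('CGI::Carp');           # 'CGI::Carp'
my $bad = launder_name('perlref; rm -r .');    # undef: ';' and spaces rejected
```

`no locale` is harmless when `use locale` was never in effect, so the block behaves the same wherever it is pasted.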

> > but \w may vary in different locales.

>
> Which, in some situations, might be exactly what one wants.


Of course, but not when dealing with shell metachars.

Ben

--
"The Earth is degenerating these days. Bribery and corruption abound.
Children no longer mind their parents, every man wants to write a book,
and it is evident that the end of the world is fast approaching."
-Assyrian stone tablet, c.2800 BC
 
wana
      11-09-2004
Jim Gibson wrote:

> wana wrote:
>
>> Jim Gibson wrote:
>>
>> > wana wrote:
>> >
>> >> Tad McClellan wrote:
>> >>
>> >> > wana wrote:
>> >> >
>> >
>> > [ problem of untainting perldoc subjects snipped ]
>> >
>> >> >
>> >> > I think this ought to work though: /^(\w|:)+$/
>> >>
>> >> I only avoided \w because perlre states that it is not portable across
>> >> character sets and may be insecure, which is critical in my case.
>> >> That may or may not be an issue in my program.
>> >
>> > Where in perldoc perlre does it say that? It does not say it in the
>> > version (5.8.5) on my computer. I could not find the string 'insecure'
>> > anywhere in 'perldoc perlre', and 'portable' only occurs once in a
>> > discussion of character ranges.

>>
>> The words to look for are 'unsafe' and 'unportable' about 78% into
>> perlre. The discussion about character ranges is what I am talking about.
>> [a-zA-Z1-9] is safe but \w may vary in different locales.

>
> The warning is about defining your own character ranges, such as [ -~]
> for the ascii printable set. That may give an error in other character
> sets. The doc says nothing about character classes such as \w being
> unsafe or unportable across character sets. In fact, it implies that
> using \w is safer than defining your own character sets.
>
> Here it is from perlre:
>
> "Note also that the whole range idea is rather unportable between
> character sets--and even within character sets they may cause results you
> probably didn't expect. A sound principle is to use only ranges that
> begin from and end at either alphabets of equal case ([a-e], [A-E]), or
> digits ([0-9]). Anything else is unsafe. If in doubt, spell out the
> character sets in full."


for example:

my $comm = $ARGV[0];
if ($comm =~ /^\w+$/) # the same as ^[a-zA-Z0-9_]+$
{
        `echo $comm`;
}

this prevents a user from slipping in dangerous characters like | or >
etc...

Suppose a new character set comes along and is described by a different
locale. Then suppose this code is cut and pasted, or otherwise included,
within the new locale, which has a character in its alphabet that the shell
interprets as |, for example. Now there is a security compromise, hence it
is insecure and unsafe. I don't know if this is possible, but that's what
I read into the statement in perlre. If this is possible, it is clearly a
potential, though unlikely, security risk. I believe perlsec touches
briefly on the same subject.

wana
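Backticks may hand the whole command string to a shell, which is why the pattern check matters so much in the example above. As a hedged alternative sketch, the list form of a piped `open` passes the argument to the program directly, so shell metacharacters are just text. The helper name `run_echo` is made up; this form assumes a Unix-like system with an `echo` program on the PATH.

```perl
use strict;
use warnings;

# Hypothetical helper: run echo with one argument, no shell involved,
# and return whatever it printed.
sub run_echo {
    my ($arg) = @_;
    # '-|' with a list: fork and exec echo directly, reading its output.
    open( my $fh, '-|', 'echo', $arg ) or die "cannot run echo: $!";
    my $out = do { local $/; <$fh> };    # slurp all output
    close $fh;
    return $out;
}

# The pipe character is passed through as ordinary text; nothing is run
# after it.
print run_echo('perlre | rm -r ./*');
```

Validation is still worth doing, since the program being called may interpret its arguments in its own way, but a missed character no longer turns into a second command.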
 
Alan J. Flavell
      11-09-2004
On Tue, 9 Nov 2004, Ben Morrow wrote:

> Quoth "Alan J. Flavell":
> >
> > It'll reliably do a specific job. I'd suggest that the use of the
> > word "unsafe" in the documentation is a bit misleading. I think in
> > this specific reference it means "might not do what the naive reader
> > expects"; but "unsafe" often refers to the possibility of malicious
> > data causing security-relevant damage to result (such as, for example,
> > unintended interpolation taking place using externally-derived data),
> > and that's not what is intended here, AFAICS.

>
> The locale is externally-derived data. A malicious user could (under
> some OSen at least) construct their own locale that said ';' was a word
> character.


Good call. I withdraw the comment.

> I would hope (but I haven't tested) that if 'use locale' is in effect
> and the locale setting was tainted then such regexen won't untaint...


Let's hope so.

> > > but \w may vary in different locales.

> >
> > Which, in some situations, might be exactly what one wants.

>
> Of course, but not when dealing with shell metachars.


I take it you were commenting here on the specific problem, rather
than on the cited documentation as such.

cheers
 
Alan J. Flavell
      11-09-2004
On Tue, 9 Nov 2004, wana wrote:

> Alan J. Flavell wrote:

[snip]
> > Good call. I withdraw the comment.

[snip]

> Thanks to all for further discussion. I still think that the security issue
> with tainted data is at least partly the intent of this paragraph in
> perlre.


Just so. That's why I accepted that my comment had been misguided.

> I mentioned that the \w topic is also discussed in perlsec:


[...]

> The second paragraph makes it clear that this is the issue. It is
> really not a big deal and on the outer fringes of my perl knowledge
> as a newbie and an amateur. I just wanted to make my point that
> what I read in perlre meant what I thought it meant. At least I am
> finally reading my perldocs before posting!


Absolutely. My apologies that I missed this point the first time
around. It'll remind me to check the documentation properly myself
instead of just skim-reading it.

Umble pie for tea today...

cheers
 