Velocity Reviews - Computer Hardware Reviews

filemask to regex

 
 
George Mpouras
      08-24-2013
I want to convert an OS filemask with possible wildcards to a regex.
What do you think of the following approach?

$mask = "??-media*.wm?";
$mask=~s|\*\.\*|\*|g; # *.* -> *
$mask=~s|\.|\\.|g; # . -> \.
$mask=~s|\?|.|g; # ? -> .
$mask=~s|\*|.*?|g; # * -> .*?
$mask=~s/(\(|\)|\+|\^|\[|\]|\{|\}|\$|\@|\%)/\\$1/g; #escape ()+^[]{}$@%
$mask = qr/^$mask$/i;
 
Rainer Weikusat
      08-25-2013
George Mpouras writes:
> I want to convert an OS filemask with possible wildcards to a regex.
> What do you think of the following approach?
>
> $mask = "??-media*.wm?";
> $mask=~s|\*\.\*|\*|g; # *.* -> *


[...]

> $mask=~s|\*|.*?|g; # * -> .*?


This sequence of conversions is wrong because it will translate *.* to
.*?, i.e., something which matches a string with no . in it.
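
To see the effect concretely, here is a minimal sketch that replays the quoted substitution sequence on the mask *.* and tests the result against a name without a dot. The helper name mask_to_regex is made up for this illustration; the substitutions themselves are copied from the original post.

```perl
use strict;
use warnings;

# Replays the substitution sequence from the original post.
# mask_to_regex is a hypothetical name, not something from the thread.
sub mask_to_regex {
    my ($mask) = @_;
    $mask =~ s|\*\.\*|\*|g;   # *.* -> *
    $mask =~ s|\.|\\.|g;      # .   -> \.
    $mask =~ s|\?|.|g;        # ?   -> .
    $mask =~ s|\*|.*?|g;      # *   -> .*?
    $mask =~ s/(\(|\)|\+|\^|\[|\]|\{|\}|\$|\@|\%)/\\$1/g;   # escape ()+^[]{}$@%
    return $mask;
}

my $re = mask_to_regex('*.*');
print "$re\n";                                # prints .*?
print "abc matches\n" if 'abc' =~ /^$re$/i;   # 'abc' has no dot, yet it matches
```

Because the first substitution folds *.* into * before the dot is escaped, the literal dot is gone by the time the regex is built.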
 
Rainer Weikusat
      08-25-2013
George Mpouras writes:
> I want to convert an OS filemask with possible wildcards to a regex.
> What do you think of the following approach?
>
> $mask = "??-media*.wm?";
> $mask=~s|\*\.\*|\*|g; # *.* -> *
> $mask=~s|\.|\\.|g; # . -> \.
> $mask=~s|\?|.|g; # ? -> .
> $mask=~s|\*|.*?|g; # * -> .*?
> $mask=~s/(\(|\)|\+|\^|\[|\]|\{|\}|\$|\@|\%)/\\$1/g; #escape ()+^[]{}$@%
> $mask = qr/^$mask$/i;


I think I would again prefer to do a part-by-part lexical analysis of
the input, mainly because this means that quotemeta can be used to
quote metacharacters in the 'text' parts:

------------------
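sub xlate_tin_pattern
{
    my $out;

    for ($_[0]) {
        # a run of ? becomes one . per character
        /\G(\?+)/gc && do {
            $out .= '.' x length($1);
            redo;
        };

        # a run of * collapses into a single non-greedy .*?
        /\G\*+/gc && do {
            $out .= '.*?';
            redo;
        };

        # anything else is literal text; quotemeta escapes it
        /\G([^?*]+)/g && do {
            $out .= quotemeta($1);
            redo;
        };
    }

    return $out;
}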

print(xlate_tin_pattern($_), "\n") for @ARGV;
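
The function above only returns the regex source, so it still has to be anchored and compiled before it can be matched against names. The following is a rough sketch of that last step, assuming the anchoring and the /i flag from the original post are wanted. It uses the same \G lexing idea but with a plain while loop, and the name mask_to_qr is hypothetical.

```perl
use strict;
use warnings;

# Same \G-based lexing idea, written as a while loop.
# mask_to_qr is a hypothetical name; ^...$ and /i follow the original post.
sub mask_to_qr {
    my ($mask) = @_;
    my $out = '';
    pos($mask) = 0;
    while (pos($mask) < length $mask) {
        if    ($mask =~ /\G(\?+)/gc)    { $out .= '.' x length $1 }   # one . per ?
        elsif ($mask =~ /\G\*+/gc)      { $out .= '.*?' }             # run of * -> .*?
        elsif ($mask =~ /\G([^?*]+)/gc) { $out .= quotemeta $1 }      # literal text, escaped
    }
    return qr/^$out$/i;
}

my $re = mask_to_qr('??-media*.wm?');
my @names = ('01-media-intro.wmv', 'AB-MEDIA.WMA', 'x-media.wmv', '01-media.wm');
for my $name (@names) {
    printf "%-20s %s\n", $name, ($name =~ $re ? 'match' : 'no match');
}
```

The first two names match; the third is one character short before -media and the fourth lacks the final character after .wm.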
 
George Mpouras
      08-25-2013
On 25/8/2013 7:52 PM, Rainer Weikusat wrote:
> George Mpouras writes:
>> I want to convert an OS filemask with possible wildcards to a regex.
>> What do you think of the following approach?
>>
>> $mask = "??-media*.wm?";
>> $mask=~s|\*\.\*|\*|g; # *.* -> *
>> $mask=~s|\.|\\.|g; # . -> \.
>> $mask=~s|\?|.|g; # ? -> .
>> $mask=~s|\*|.*?|g; # * -> .*?
>> $mask=~s/(\(|\)|\+|\^|\[|\]|\{|\}|\$|\@|\%)/\\$1/g; #escape ()+^[]{}$@%
>> $mask = qr/^$mask$/i;

>
> I think I would again prefer to do a part-by-part lexical analysis of
> the input, mainly because this means that quotemeta can be used to
> quote metacharacters in the 'text' parts:
>
> ------------------
> sub xlate_tin_pattern
> {
> my $out;
>
> for ($_[0]) {
> /\G(\?+)/gc && do {
> $out .= '.' x length($1);
> redo;
> };
>
> /\G\*+/gc && do {
> $out .= '.*?';
> redo;
> };
>
> /\G([^?*]+)/g && do {
> $out .= quotemeta($1);
> redo;
> };
> }
>
> return $out;
> }
>
> print(xlate_tin_pattern($_), "\n") for @ARGV;
>



Very good!
But the line
/\G(\?+)/gc && do { $out .= '.' x length($1); redo };
fries my brain.
So I think I will stick with the equivalent f1():

print xlate_tin_pattern('@s??im..pl%e.???a'), "\n";
print f1('@s??im..pl%e.???a'), "\n";


sub xlate_tin_pattern
{
my $out;
for ($_[0]){
/\G(\?+)/gc && do { $out .= '.' x length($1); redo };
/\G\*+/gc && do { $out .= '.*?'; redo };
/\G([^?*]+)/g && do { $out .= quotemeta($1); redo }}
$out
}


sub f1
{
my $out=$_[0]; # lexical copy; avoids writing to a global $out
$out=~s/([^?*]+)/\Q$1\E/g;
$out=~s|\?|.|g;
$out=~s|\*+|.*?|g;
$out
}
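
A quick way to check that the two functions really agree is to run both over a handful of masks and compare the strings they return. This is only a spot check on a few inputs, not a proof. The names by_lexer and by_subst are stand-ins for xlate_tin_pattern and f1, both initialised with an empty string so that an empty mask yields '' in either version.

```perl
use strict;
use warnings;

# Stand-in for xlate_tin_pattern (hypothetical name).
sub by_lexer {
    my $out = '';
    for ($_[0]) {
        /\G(\?+)/gc   && do { $out .= '.' x length($1); redo };
        /\G\*+/gc     && do { $out .= '.*?';            redo };
        /\G([^?*]+)/g && do { $out .= quotemeta($1);    redo };
    }
    return $out;
}

# Stand-in for f1 (hypothetical name).
sub by_subst {
    my $out = $_[0];
    $out =~ s/([^?*]+)/\Q$1\E/g;   # escape the literal runs first
    $out =~ s/\?/./g;              # ? -> .
    $out =~ s/\*+/.*?/g;           # run of * -> .*?
    return $out;
}

my @masks = ('??-media*.wm?', '@s??im..pl%e.???a', 'a**b', '*.?*');
for my $mask (@masks) {
    my ($x, $y) = (by_lexer($mask), by_subst($mask));
    print "$mask => $x ", ($x eq $y ? '(same)' : "(differs: $y)"), "\n";
}
```

The substitution version works because quotemeta never introduces a ? or *, so the later substitutions only ever see the wildcards that were in the mask to begin with.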



 
Ben Bacarisse
      08-25-2013
Rainer Weikusat writes:

> George Mpouras writes:
>> I want to convert an OS filemask with possible wildcards to a regex.
>> What do you think of the following approach?
>>
>> $mask = "??-media*.wm?";
>> $mask=~s|\*\.\*|\*|g; # *.* -> *
>
> [...]
>
>> $mask=~s|\*|.*?|g; # * -> .*?
>
> This sequence of conversions is wrong because it will translate *.* to
> .*?, i.e., something which matches a string with no . in it.

I think that may be deliberate. I was going to ask "what OS?", but when
I saw that, I remembered that in MS-DOS (and maybe others), *.* means
all files. Similarly X*.* means all files beginning with X. (The reason
being that the . is not in the file name, just in the presentation of
it, though I still think that's a weak argument.)

I'm not saying the translation is correct -- I can't remember all of
MS-DOS's rules, and it's likely to be wrong if the target is not an
MS-DOS-like OS.

--
Ben.
 
Rainer Weikusat
      08-26-2013
Ben Bacarisse writes:
> Rainer Weikusat writes:
>> George Mpouras writes:
>>> I want to convert an OS filemask with possible wildcards to a regex.
>>> What do you think of the following approach?
>>>
>>> $mask = "??-media*.wm?";
>>> $mask=~s|\*\.\*|\*|g; # *.* -> *
>>
>> [...]
>>
>>> $mask=~s|\*|.*?|g; # * -> .*?
>>
>> This sequence of conversions is wrong because it will translate *.* to
>> .*?, i.e., something which matches a string with no . in it.
>
> I think that may be deliberate. I was going to ask "what OS?", but when
> I saw that, I remembered that in MS-DOS (and maybe others), *.* means
> all files. Similarly X*.* means all files beginning with X. (The reason
> being that the . is not in the file name, just in the presentation of
> it, though I still think that's a weak argument.)


'DOS filenames' (and very likely VMS filenames as well) are not plain
strings but consist of two components, a 'name' part and a 'type'
part, and because of this, *.* means 'all names and all types', i.e.
'every file'. If the input these patterns are supposed to be matched
against is really a list of 'DOS filenames', translating *.* to .*\..*
(or .+\..+) instead of .* (or .+) will make no difference because the
extension is always going to be there. But when it is just a list of
strings, making '*.*' match both abc and abc.def is IMHO
counterintuitive. It also precludes some possibly useful applications
such as 'match everything which has an extension'.
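
The difference between the two readings of *.* shows up as soon as the input contains a name without a dot. A small sketch, with the two regexes written out by hand and variable names chosen only for this example:

```perl
use strict;
use warnings;

my $dos_style    = qr/^.*$/;       # *.* read as "every file"
my $string_style = qr/^.*\..*$/;   # *.* read literally: a dot is required

for my $name ('abc', 'abc.def') {
    printf "%-8s dos:%-3s string:%s\n", $name,
        ($name =~ $dos_style    ? 'yes' : 'no'),
        ($name =~ $string_style ? 'yes' : 'no');
}
```

Only the literal reading can tell abc and abc.def apart, which is what 'match everything which has an extension' would need.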
 
George Mpouras
      08-26-2013
On 26/8/2013 1:50 AM, Ben Bacarisse wrote:
>
> I'm not saying the translation is correct -- I can't remember all of
> MS-DOS's rules, and it's likely to be wrong if the target is not an
> MS-DOS-like OS.
>


Yes, you are correct, it was intended: on Windows, *.* means *!
But Rainer's approach in his other answer is very clever and correct.
 
George Mpouras
      08-26-2013
If we forget Windows, bash also has the interesting range
operator!

ls -l somefile{01,02,03,07}
ls -l somefile{01..05}
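
For comparison, here is a sketch of how the same two name lists could be produced from Perl. The comma list goes through the built-in glob, which expands csh-style braces on its own. The {01..05} form is a bash feature that, as far as I know, Perl's glob does not expand, so the sketch uses Perl's range operator on strings instead.

```perl
use strict;
use warnings;

# Comma list: glob expands the braces even if no such files exist,
# because the pattern contains no * ? or [ wildcards.
my @listed = glob('somefile{01,02,03,07}');

# Range: the string range '01' .. '05' keeps the leading zeros.
my @ranged = map { "somefile$_" } '01' .. '05';

print "@listed\n";
print "@ranged\n";
```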



 
Ben Bacarisse
      08-26-2013
Rainer Weikusat writes:

> Ben Bacarisse writes:
>> Rainer Weikusat writes:
>>> George Mpouras writes:
>>>> I want to convert an OS filemask with possible wildcards to a regex.
>>>> What do you think of the following approach?
>>>>
>>>> $mask = "??-media*.wm?";
>>>> $mask=~s|\*\.\*|\*|g; # *.* -> *
>>>
>>> [...]
>>>
>>>> $mask=~s|\*|.*?|g; # * -> .*?
>>>
>>> This sequence of conversions is wrong because it will translate *.* to
>>> .*?, i.e., something which matches a string with no . in it.
>>
>> I think that may be deliberate. I was going to ask "what OS?", but when
>> I saw that, I remembered that in MS-DOS (and maybe others), *.* means
>> all files. Similarly X*.* means all files beginning with X. (The reason
>> being that the . is not in the file name, just in the presentation of
>> it, though I still think that's a weak argument.)
>
> 'DOS filenames' (and very likely VMS filenames as well) are not plain
> strings but consist of two components, a 'name' part and a 'type'
> part, and because of this, *.* means 'all names and all types', i.e.
> 'every file'. If the input these patterns are supposed to be matched
> against is really a list of 'DOS filenames', translating *.* to .*\..*
> (or .+\..+) instead of .* (or .+) will make no difference because the
> extension is always going to be there.


I don't follow. If I get a list of DOS file names using, say, DIR,
those with no extension have no dot. .*\..* won't match them but .*
will. You can write a file with no extension as "XYZ." as well as "XYZ"
but, IIRC, many programs dropped the '.' if there was no extension.

> But when it was just a list of
> strings, making '*.*' match both abc and abc.def is IMHO
> counterintuitive. It also precludes some possibly useful applications
> such as 'match everything which has an extension'.


I must be missing your point because I don't follow this either. The
DOS way to match names with an extension was to write *.?*, and the
suggested translation will work for that.
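
As a sanity check on that claim, the mask *.?* can be translated by hand with the rules from the thread (* to .*?, . to \., ? to .) and tried against a few names. The regex below is that hand translation; the variable name and the test names are chosen only for this sketch.

```perl
use strict;
use warnings;

# Hand translation of the mask *.?* :
#   *  -> .*?    .  -> \.    ?  -> .    *  -> .*?
my $has_ext = qr/^.*?\...*?$/;

for my $name ('XYZ', 'XYZ.', 'XYZ.TXT', 'a.b') {
    printf "%-8s %s\n", $name, ($name =~ $has_ext ? 'match' : 'no match');
}
```

XYZ and XYZ. do not match, since the regex needs at least one character after a dot; XYZ.TXT and a.b do.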

--
Ben.
 
Dr.Ruud
      08-26-2013
On 24/08/2013 16:03, George Mpouras wrote:

> I want to convert an OS filemask with possible wildcards to a regex.
> What do you think of the following approach?
>
> $mask = "??-media*.wm?";
> $mask=~s|\*\.\*|\*|g; # *.* -> *
> $mask=~s|\.|\\.|g; # . -> \.
> $mask=~s|\?|.|g; # ? -> .
> $mask=~s|\*|.*?|g; # * -> .*?
> $mask=~s/(\(|\)|\+|\^|\[|\]|\{|\}|\$|\@|\%)/\\$1/g; #escape ()+^[]{}$@%
> $mask = qr/^$mask$/i;


Also check out `perldoc -f glob`.
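
If the names to be matched are real files, the mask can be handed to the globbing code directly, with no regex involved. The sketch below assumes that this is the case: it creates three empty files in a temporary directory and applies the mask from the original post. It uses bsd_glob from File::Glob rather than plain glob because bsd_glob does not split the pattern on whitespace. The file names are invented for the example, and the matching follows the rules of the platform it runs on rather than DOS rules.

```perl
use strict;
use warnings;
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Scratch directory with a few made-up file names; removed on exit.
my $dir = tempdir(CLEANUP => 1);
for my $name ('01-media-intro.wmv', '02-media.wma', 'notes.txt') {
    open my $fh, '>', "$dir/$name" or die "cannot create $name: $!";
    close $fh;
}

# Apply the file mask directly.
my @hits = bsd_glob("$dir/??-media*.wm?");
print "$_\n" for @hits;
```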

--
Ruud

 