Velocity Reviews - Computer Hardware Reviews

Regexp, Strings and spaces

 
 
Florent Carli
 
      06-23-2004
Hello experts,

I'm looking for a regexp to get the information from something like this:

field1="value with or without spaces" field2=valuewithoutspaces

My only concern is that I don't want to match the quote characters.
For now I came up with:
my (@res) = $line =~ m/=(?(?<=["])[^"]+(?=["])|(?<!["])\S+(?!["])))/g
But the lookbehinds do not work...

Is there any way to do this without using lookbehinds?

Thanks!
 
 
 
 
 
Anno Siegel
 
      06-23-2004
Florent Carli <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> Hello experts,
>
> I'm looking for a regexp to get the information from something like this:
>
> field1="value with or without spaces" field2=valuewithoutspaces
>
> My only concern is that I don't want to match the quote characters.
> For now I came up with:
> my (@res) = $line =~ m/=(?(?<=["])[^"]+(?=["])|(?<!["])\S+(?!["])))/g
> But the lookbehinds do not work...
>
> Is there any way to do this without using lookbehinds?


Sure: /"?([^"]*)/

Take a look at one or another of the csv modules too.

Anno
 
 
 
 
 
J. Romano
 
      06-23-2004
(E-Mail Removed) (Florent Carli) wrote in message news:<(E-Mail Removed).com>...
> I'm looking for a regexp to get the information from something like this:
>
> field1="value with or without spaces" field2=valuewithoutspaces
>
> My only concern is that I don't want to match the quote characters.
> For now I came up with:
> my (@res) = $line =~ m/=(?(?<=["])[^"]+(?=["])|(?<!["])\S+(?!["])))/g
> But the lookbehinds do not work...


This is easier done without lookbehinds:

$line =
'field1="value with or without spaces" field2=valuewithoutspaces';

while ( $line =~ m/="([^"]*)"|=(\w*)/g )
{
    push @res, $1 if defined $1;
    push @res, $2 if defined $2;
}

Essentially, the above lines of code loop through every instance of
either
="some text"
or
=some_text
The first instance is matched by m/="[^"]*"/ and the second by
m/=\w*/ . Therefore, I put them together (joining them with the "|"
symbol and putting capturing parentheses around the text I'm
interested in) to get the regular expression m/="([^"]*)"|=(\w*)/g .

The "/g" is used to loop through every match, populating either $1
or $2 every time through the loop. Inside the loop, I push either $1
or $2 into the @res array, depending on which one is defined (that is,
which one happened to match).
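To make the "either $1 or $2" behaviour visible, here is a minimal sketch (assuming any reasonably modern Perl 5) that runs the same loop on the sample line but records which group matched on each pass instead of pushing the bare value. The @trace array is an illustrative name, not part of the code above.

```perl
use strict;
use warnings;

my $line = 'field1="value with or without spaces" field2=valuewithoutspaces';
my @trace;    # one [group number, value] pair per match

while ($line =~ m/="([^"]*)"|=(\w*)/g) {
    # Exactly one of $1 and $2 is defined on each pass through the loop.
    push @trace, defined $1 ? [1, $1] : [2, $2];
}

printf "\$%d matched \"%s\"\n", @$_ for @trace;
```

The quoted value arrives in $1 and the unquoted one in $2, which is why the loop body has to check both.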

I hope this helps.

-- Jean-Luc
 
 
Florent Carli
 
      06-24-2004
>
> Sure: /"?([^"]*)/
>

This does not work since 'field=hello field2="world"' would get you
'hello field2=' into $1.
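A short sketch makes the objection concrete. It applies Anno's pattern to the counter-example exactly as posted, and again anchored after the first "=" (the anchored form is an assumption about how the pattern would be used). Because [^"]* only stops at a quote, an unquoted value runs on into the next field either way.

```perl
use strict;
use warnings;

my $line = 'field=hello field2="world"';

# Anno's pattern as posted: nothing anchors it, so it starts at column 0.
my ($as_posted) = $line =~ /"?([^"]*)/;

# The same pattern anchored after the first '=' (an assumed usage).
my ($after_eq) = $line =~ /="?([^"]*)/;

print "as posted: '$as_posted'\n";   # 'field=hello field2='
print "after '=': '$after_eq'\n";    # 'hello field2='
```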
 
 
Florent Carli
 
      06-24-2004
> $line =
> 'field1="value with or without spaces" field2=valuewithoutspaces';
>
> while ( $line =~ m/="([^"]*)"|=(\w*)/g )
> {
>     push @res, $1 if defined $1;
>     push @res, $2 if defined $2;
> }
>


I think my specifications were bad.
The "line" can be arbitrarily long, with any number of fields.
It can be field1="test" field2=test2 field3="test 3" field4="testagain"
and the next line could be
field1="test 4" field2="test 5" field3=test_6 field4="test n7"

What I need is the value of field2, whatever form field2 takes:
"value with space", "valuewithoutspace", valuewithoutspace, or
even empty or "".
In all cases, the value alone (without quotes) must go into $1, and $1
only.

For now, the only regexp able to do this I have found is :
field2=["]?((?<=["])[^"]*(?=["])|(?<!["])\S*(?!["]))
But like I said, the software I use to parse is using a version of
perl that does not support lookbehinds ...

I'm trying to do basically the same thing Windows does when you type:
copy "my file.doc" "d:\my documents"
or
copy myfile.doc d:\

But only with one regexp (and no second pass in Perl to remove the
quotes, for instance).
Any idea?
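For reference, this sketch runs the lookbehind pattern above, unchanged, on a Perl that does support lookbehind. It uses the sample lines from this thread plus two empty-value lines added for illustration, and shows the behaviour any lookbehind-free replacement has to reproduce: the bare value in $1 for quoted, unquoted and empty values.

```perl
use strict;
use warnings;

# Florent's lookbehind pattern, unchanged apart from being wrapped in qr//.
my $field2 = qr/field2=["]?((?<=["])[^"]*(?=["])|(?<!["])\S*(?!["]))/;

my @lines = (
    'field1="test" field2=test2 field3="test 3" field4="testagain"',
    'field1="test 4" field2="test 5" field3=test_6 field4="test n7"',
    'field1=x field2="" field3=y',    # added: quoted empty value
    'field1=x field2= field3=y',      # added: unquoted empty value
);

my @values;
for my $line (@lines) {
    # $1 holds the value without quotes whichever branch matched.
    push @values, $line =~ $field2 ? $1 : undef;
}
print "'$_'\n" for @values;
```

It prints 'test2', 'test 5' and two empty strings.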
 
 
Anno Siegel
 
      06-24-2004
Florent Carli <(E-Mail Removed)> wrote in comp.lang.perl.misc:
> >
> > Sure: /"?([^"]*)/
> >

> This does not work since 'field=hello field2="world"' would get you
> 'hello field2=' into $1.


I didn't read your original specification that way.

The best solution is probably a module (Text::Balanced, or one of
the CSV modules). For background information, see the FAQ:

"How can I split a [character]-delimited string except when inside [character]?" (perlfaq4)
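As one hedged illustration of the module route, here is a sketch using Text::ParseWords, a module that ships with Perl but is not named in this thread. Its shellwords() splits on whitespace outside quotes and drops the quotes. This only helps when you can write Perl code around the match.

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

my $line = 'field1="value with or without spaces" field2=valuewithoutspaces';

# shellwords() splits on whitespace outside quotes and strips the quotes,
# so each word comes back as 'name=value'.
my %field;
for my $word (shellwords($line)) {
    my ($name, $value) = split /=/, $word, 2;
    $field{$name} = $value;
}

print "$_ => '$field{$_}'\n" for sort keys %field;
```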

Anno
 
 
A. Sinan Unur
 
      06-24-2004
(E-Mail Removed) (Florent Carli) wrote in
news:(E-Mail Removed) m:

> For now, the only regexp able to do this I have found is :
> field2=["]?((?<=["])[^"]*(?=["])|(?<!["])\S*(?!["]))
> But like I said, the software I use to parse is using a version of
> perl that does not support lookbehinds ...
>
> I'm trying to do basically the same thing Windows does when you type:
> copy "my file.doc" "d:\my documents"
> or
> copy myfile.doc d:\
>
> But only with one regexp (and no second pass in Perl to remove the
> quotes, for instance).
> Any idea?


Is this just out of curiosity?

If there is some other purpose to this, take a look at Text::Balanced.
The few times I needed this type of functionality, that module worked
very well for me.

--
A. Sinan Unur
(E-Mail Removed) (reverse each component for email address)
 
 
Florent Carli
 
      06-25-2004
The problem is that I have to enter a regex into the config file of a
piece of software which does not understand lookbehinds (probably an old
version of Perl, since I get a "bad pattern <?..." error).
Anyway, I'm not using Perl directly for this; I have to find a regex
that does it without lookbehinds, that's all.
That's why I cannot code a second pass to remove the quotes after a
/field2=("[^"]*"|\S*)/ for instance, or code anything that would give me
the single backreference I need after a /field2=(?:"([^"]*)"|(\S*))/.
I can't use a Perl module either, of course.
In fact, I cannot code at all; the only thing I control is one
regexp.
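The difference between the two patterns mentioned above is easy to see from Perl, assuming you could run code. The one-group pattern keeps the quotes, while the two-group pattern leaves the bare value in either $1 or $2. Merging them takes exactly the kind of second step (here the defined test) that a single regex in a config file cannot express.

```perl
use strict;
use warnings;

my $line = 'field1="test 4" field2="test 5" field3=test_6 field4="test n7"';

# One capture group: the value comes back with its quotes still attached.
my ($raw) = $line =~ /field2=("[^"]*"|\S*)/;

# Two capture groups: the bare value lands in $1 or $2, never both.
my ($quoted, $bare) = $line =~ /field2=(?:"([^"]*)"|(\S*))/;
my $value = defined $quoted ? $quoted : $bare;

print "one group:  $raw\n";     # "test 5" (quotes included)
print "two groups: $value\n";   # test 5
```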

Thanks!

> Is this just out of curiosity?
>
> If there is some other purpose to this, take a look at Text::Balanced.
> The few times I needed this type of functionality, that module worked
> very well for me.

 
 
J. Romano
 
      06-25-2004
(E-Mail Removed) (Florent Carli) wrote in message news:<(E-Mail Removed) om>...
>
> I think my specifications were bad.


I'm sorry, but did you even try out my code? It does exactly what
you want. I even tested it.

> The "line" can be arbitrarily long, with any number of fields.
> It can be field1="test" field2=test2 field3="test 3" field4="testagain"
> and the next line could be
> field1="test 4" field2="test 5" field3=test_6 field4="test n7"


It does exactly that. I even created a short script for you to run
to show you that it works. Here, try this:

#!/usr/bin/perl -w
use strict;
my @res; # results will be stored here
# Process the input lines (from the DATA section):
while (<DATA>)
{
    while ( m/="([^"]*)"|=(\w*)/g )
    {
        push @res, $1 if defined $1;
        push @res, $2 if defined $2;
    }
}
# Print out the @res array to show the results:
for my $i (0 .. $#res)
{
    print "\$res[$i] = \"$res[$i]\"\n";
}
# Everything after __DATA__ is read as input, so the sample
# lines below must not contain comments:
__DATA__
field1="test" field2=test2 field3="test 3" field4="testagain"
field1="test 4" field2="test 5" field3=test_6 field4="test n7"
field1=""


> What I need is the value of field2, whatever form field2 takes:
> "value with space", "valuewithoutspace", valuewithoutspace, or
> even empty or "".
> In all cases, the value alone (without quotes) must go into $1, and $1
> only.


No, I think you are mistaken. The value alone (without quotes)
must go into the @res array, and not necessarily into $1. The match
will either temporarily be in $1 or $2, but regardless of which it
goes into, it WILL be placed into the @res array, which is what you
want.

> For now, the only regexp able to do this I have found is :
> field2=["]?((?<=["])[^"]*(?=["])|(?<!["])\S*(?!["]))
> But like I said, the software I use to parse is using a version of
> perl that does not support lookbehinds ...


Don't use look-behinds. They are not needed for your task. And
please test the code I gave you before saying that it doesn't do what
you want.

-- Jean-Luc
 
 
J. Romano
 
      06-25-2004
(E-Mail Removed) (Florent Carli) wrote in message news:<(E-Mail Removed) om>...
> The problem is that I have to enter a regex into the config file of a
> piece of software which does not understand lookbehinds (probably an old
> version of Perl, since I get a "bad pattern <?..." error).


Oh, so that's why you had all those restrictions. Without the
knowledge of your restrictions, we couldn't really give you a complete
answer.

> Anyway, I'm not using Perl directly for this; I have to find a regex
> that does it without lookbehinds, that's all.


Are you sure you are using Perl for this? I've done similar things
myself (that is, putting a regular expression in a config file), but I
don't think it was Perl that was evaluating them. It could be that
Perl has nothing to do with this.

> That's why I cannot code a second pass to remove the quotes after a
> /field2=("[^"]*"|\S*)/ for instance, or code anything that would give me
> the single backreference I need after a /field2=(?:"([^"]*)"|(\S*))/.
> I can't use a Perl module either, of course.
> In fact, I cannot code at all; the only thing I control is one
> regexp.


The main problem is that you are searching for different patterns,
depending on what your delimiter is. If you have 'value="some text"',
then you will be looking for the next '"' character to signal the end
of your pattern. But if you have 'value=some_text', then you will be
looking for whitespace to signal the end of your pattern. This flow
of logic (if-then-else) is something that regular expressions alone
weren't made to handle.
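When you can write code rather than a single regex, that if-then-else is short to spell out. Below is a minimal sketch; the field_value helper is hypothetical. It finds "name=", then uses \G with /gc to keep scanning from that point, choosing the terminator according to whether the value opens with a quote.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the value of $name in $line, or undef if absent.
sub field_value {
    my ($line, $name) = @_;

    # Find "name=" and leave pos($line) just after the '='.
    return unless $line =~ /\b\Q$name\E=/g;

    # If the value opens with a quote, it ends at the next quote...
    return $1 if $line =~ /\G"([^"]*)"/gc;

    # ...otherwise it ends at the next whitespace (and may be empty).
    $line =~ /\G(\S*)/gc;
    return $1;
}

print field_value('field1="test 4" field2="test 5" field3=test_6', 'field2'), "\n";
print field_value('field1="test" field2=test2 field3="test 3"', 'field2'), "\n";
```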

I don't think your problem has a working solution because regular
expressions lack the ability to carry out the above logic. So let me
propose two work-arounds:

1. You could modify the program that reads the config files to handle
the logic you need.

or

2. You can write a simple Perl script to convert your config file so
that all the fields have quotes around the values (whether they need
them or not). In other words, your script would change all instances
of:

field1=some_text

to:

field1="some_text"

Then you could just set your regular expression to be:

m/field[0-9]+="([^"]*)"/

and then all your fields would be extracted. Problem solved.
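Work-around 2 could look roughly like this. It is a sketch assuming well-formed input (balanced quotes, no embedded quote characters), and quote_all_values is a hypothetical name. Quoted strings are matched first and copied through unchanged, so an "=" inside a quoted value is never treated as the start of a field.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap every unquoted value in double quotes.
sub quote_all_values {
    my ($line) = @_;
    # Alternative 1 copies an existing quoted string as-is;
    # alternative 2 quotes a value that follows '=' without a quote.
    $line =~ s/("[^"]*")|=(?!")(\S*)/defined $1 ? $1 : qq{="$2"}/ge;
    return $line;
}

my $converted = quote_all_values('field1="test" field2=test2 field3="test 3" field4=');
print "$converted\n";

# Once every value is quoted, one capture group is enough.
my @values = $converted =~ m/field[0-9]+="([^"]*)"/g;
print "'$_'\n" for @values;
```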

Of course, I would imagine that the second work-around will be much
easier for you to implement, unless there is some other restriction
that you haven't shared with us.

Hopefully you'll find a solution that works for you.

-- Jean-Luc
 
 
 
 