Regexp, Strings and spaces
Hello experts,
I'm looking for a regexp to get the information from something like this:

    field1="value with or without spaces" field2=valuewithoutspaces

My only concern is that I don't want to match the quote characters.
For now I came up with:

    my (@res) = $line =~ m/=(?:((?<=["])[^"]+(?=["])|(?<!["])\S+(?!["])))/g

But the lookbehinds do not work ...

Any way to do this without using lookbehinds?

Thanks!
Re: Regexp, Strings and spaces
Florent Carli <nospam@tomcat.ca.tc> wrote in comp.lang.perl.misc:
> Hello experts,
>
> I'm looking for a regexp to get the information from something like this:
>
> field1="value with or without spaces" field2=valuewithoutspaces
>
> My only concern is that I don't want to match the quote characters.
> For now I came up with:
> my (@res) = $line =~ m/=(?:((?<=["])[^"]+(?=["])|(?<!["])\S+(?!["])))/g
> But the lookbehinds do not work ...
>
> Any way to do this without using lookbehinds?

Sure:

    /"?([^"]*)/

Take a look at one or another of the CSV modules too.

Anno
Re: Regexp, Strings and spaces
nospam@tomcat.ca.tc (Florent Carli) wrote in message news:<6d12cccb.0406230205.5c5b99df@posting.google.com>...
> I'm looking for a regexp to get the information from something like this:
>
> field1="value with or without spaces" field2=valuewithoutspaces
>
> My only concern is that I don't want to match the quote characters.
> For now I came up with:
> my (@res) = $line =~ m/=(?:((?<=["])[^"]+(?=["])|(?<!["])\S+(?!["])))/g
> But the lookbehinds do not work ...

This is easier done without lookbehinds:

    $line = 'field1="value with or without spaces" field2=valuewithoutspaces';

    while ( $line =~ m/="([^"]*)"|=(\w*)/g )
    {
        push @res, $1 if defined $1;
        push @res, $2 if defined $2;
    }

Essentially, the above lines of code loop through every instance of either

    ="some text"
or
    =some_text

The first form is matched by m/="[^"]*"/ and the second by m/=(\w*)/ . Therefore, I put them together (by joining them with the "|" symbol and putting capturing parentheses around the text I'm interested in) into the regular expression m/="([^"]*)"|=(\w*)/g .

The "/g" is used to loop through every match, populating either $1 or $2 every time through the loop. Inside the loop, I push either $1 or $2 into the @res array, depending on which one is defined (that is, which one happened to match).

I hope this helps.

-- Jean-Luc
Re: Regexp, Strings and spaces
> Sure: /"?([^"]*)/

This does not work, since 'field=hello field2="world"' would get you 'hello field2=' into $1.
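The objection above can be checked directly. What follows is a minimal sketch, assuming the suggested pattern is applied with a leading `=` in front of it (the pattern as posted has no such prefix, so that part is an assumption); the sample string is the one from the post.

```perl
use strict;
use warnings;

# Assumption: the suggested pattern is preceded by a literal '=',
# which the pattern as posted does not include.
my $line = 'field=hello field2="world"';

my ($first) = $line =~ /="?([^"]*)/;

# The optional quote matches nothing after 'field=', so [^"]* runs on
# until the first '"' and swallows the start of the next field.
print "captured: '$first'\n";
```

With an unquoted first value, nothing stops `[^"]*` at the whitespace, which is why the capture spills into the next field name.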
Re: Regexp, Strings and spaces
> $line =
> 'field1="value with or without spaces" field2=valuewithoutspaces'
>
> while ( $line =~ m/="([^"]*)"|=(\w*)/g )
> {
> push @res, $1 if defined $1;
> push @res, $2 if defined $2;
> }

I think my specifications were bad.
The "line" can be as long as it wants, with any number of fields.
It can be

    field1="test" field2=test2 field3="test 3" field4="testagain"

and the next line could be

    field1="test 4" field2="test 5" field3=test_6 field4="test n°7"

What I need is to get the value of field2, whatever form it takes: "value with space", "valuewithoutspace", valuewithoutspace, or even empty or "".
In all cases, the value alone (without quotes) must go into $1 and $1 only.

For now, the only regexp I have found that does this is:

    field2=["]?((?<=["])[^"]*(?=["])|(?<!["])\S*(?!["]))

But like I said, the software I use to parse is using a version of perl that does not support lookbehinds ...

I'm trying to do basically the same thing Windows does when you type:

    copy "my file.doc" "d:\my documents"
or
    copy myfile.doc d:\

But with only one regexp (and no second pass in perl to remove the quotes, for instance ;) )

Any idea?
Re: Regexp, Strings and spaces
Florent Carli <nospam@tomcat.ca.tc> wrote in comp.lang.perl.misc:
> > Sure: /"?([^"]*)/
>
> This does not work since 'field=hello field2="world"' would get you
> 'hello field2=' into $1.

I didn't read your original specification that way.

The best solution is probably a module (Text::Balanced, or one of the CSV modules). For background information, see the FAQ:

    How can I split a [character] delimited string except when inside [character]

Anno
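As an illustration of the module route, here is a minimal sketch that uses Text::ParseWords, a core module, rather than the Text::Balanced or CSV modules named above; that substitution is a choice made for this example only. It assumes the program can run Perl code at all, and that values contain no backslashes, which `quotewords` would treat as escapes. The sample line is the one from the original question.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $line = 'field1="value with or without spaces" field2=valuewithoutspaces';

# Split on whitespace that is not inside quotes; a false second
# argument tells quotewords to drop the quote characters.
my @pairs = quotewords('\s+', 0, $line);

# Split each name=value pair at the first '=' only.
my %value = map { my ($k, $v) = split /=/, $_, 2; ($k => $v) } @pairs;

print "$_ => '$value{$_}'\n" for sort keys %value;
```

The quote stripping happens inside the module, so no lookbehind and no second regex pass is needed in the calling code.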
Re: Regexp, Strings and spaces
nospam@tomcat.ca.tc (Florent Carli) wrote in news:6d12cccb.0406240335.e7fceed@posting.google.com:

> For now, the only regexp able to do this I have found is :
> field2=["]?((?<=["])[^"]*(?=["])|(?<!["])\S*(?!["]))
> But like I said, the software I use to parse is using a version of
> perl that does not support lookbehinds ...
>
> I'm trying to do basically the same thing windows does when you type :
> copy "my file.doc" "d:\my documents"
> or
> copy myfile.doc d:\
>
> But only with one regexp (and no second pass in perl to remove the
> quotes for instance ;) )
> any idea ?

Is this just out of curiosity?

If there is some other purpose to this, take a look at Text::Balanced. The few times I needed this type of functionality, that module worked very well for me.

--
A. Sinan Unur
1usa@llenroc.ude (reverse each component for email address)
Re: Regexp, Strings and spaces
The problem is that I have to enter a regex into a config file of a piece of software which does not understand lookbehinds (probably an old version of perl, since I get a "bad pattern <?...").

Anyway, I'm not using perl directly for this. I have to find a regex that does it without lookbehinds, that's it.

That's why I cannot code a second pass to remove the quotes after a /field2=("[^"]*"|\S*)/ for instance, or something that would give me the one backreference I need after a /field2=(?:"([^"]*)"|(\S*))/.
I can't use a perl module either, of course.
In fact, I cannot code at all; the only thing I can control is 1 regexp.

Thanks!

> Is this just out of curiosity?
>
> If there is some other purpose to this, take a look at Text::Balanced.
> The few times I needed this type of functionality, that module worked
> very well for me.
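The two-group pattern mentioned above shows why a single backreference is hard to get: which group is set depends on whether the value was quoted. The sketch below illustrates that and, purely for comparison, the branch-reset group `(?|...)` available in Perl 5.10 and later, which numbers the groups of each alternative identically. It assumes a Perl of at least 5.10, so it is unlikely to help with the older engine described in this thread; the sample lines are adapted from earlier posts.

```perl
use strict;
use warnings;

my @lines = (
    'field1="test" field2=test2 field3="test 3"',
    'field1="test 4" field2="test 5" field3=test_6',
);

my (@two_group, @branch_reset);
for my $line (@lines) {
    # Plain alternation: quoted values land in $1, unquoted ones in $2.
    if ($line =~ /field2=(?:"([^"]*)"|(\S*))/) {
        push @two_group, defined $1 ? "\$1=$1" : "\$2=$2";
    }
    # Branch reset (Perl 5.10+): both alternatives capture into $1.
    if ($line =~ /field2=(?|"([^"]*)"|(\S*))/) {
        push @branch_reset, $1;
    }
}

print "two groups:   $_\n" for @two_group;
print "branch reset: $_\n" for @branch_reset;
```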
Re: Regexp, Strings and spaces
nospam@tomcat.ca.tc (Florent Carli) wrote in message news:<6d12cccb.0406240335.e7fceed@posting.google.com>...
> I think my specifications were bad.

I'm sorry, but did you even try out my code? It does exactly what you want. I even tested it.

> The "line" can be as long as it wants with so many fields.
> It can be field1="test" field2=test2 field3="test 3"
> field4="testagain"
> and the next line could be
> field1="test 4" field2="test 5" field3=test_6 field4="test n°7"

It does exactly that. I even created a short script for you to run to show you that it works. Here, try this:

    #!/usr/bin/perl -w
    use strict;

    my @res;  # results will be stored here

    # Process the input lines (from the DATA section):
    while (<DATA>)
    {
        while ( m/="([^"]*)"|=(\w*)/g )
        {
            push @res, $1 if defined $1;
            push @res, $2 if defined $2;
        }
    }

    # Print out the @res array to show the results:
    for my $i (0 .. $#res)
    {
        print "\$res[$i] = \"$res[$i]\"\n";
    }

    __DATA__
    # These are sample input lines:
    field1="test" field2=test2 field3="test 3" field4="testagain"
    field1="test 4" field2="test 5" field3=test_6 field4="test n°7"
    field1=""
    __END__

> What I need was to get value of field2 for any type of field2 I can
> get : "value with space", "valuewithoutspace", valuewithoutspace, or
> even empty or "".
> Any all cases, the value alone (without quotes) must go into $1 and $1
> only.

No, I think you are mistaken. The value alone (without quotes) must go into the @res array, and not necessarily into $1. The match will temporarily be in either $1 or $2, but regardless of which it goes into, it WILL be placed into the @res array, which is what you want.

> For now, the only regexp able to do this I have found is :
> field2=["]?((?<=["])[^"]*(?=["])|(?<!["])\S*(?!["]))
> But like I said, the software I use to parse is using a version of
> perl that does not support lookbehinds ...

Don't use look-behinds. They are not needed for your task.

And please test the code I gave you before saying that it doesn't do what you want.

-- Jean-Luc
Re: Regexp, Strings and spaces
nospam@tomcat.ca.tc (Florent Carli) wrote in message news:<6d12cccb.0406242332.188d519@posting.google.com>...
> The problem is that I have to enter a regex into a config file of a
> software which does not understand lookbehinds (probably a old version
> of perl, since I get a "bad pattern <?...").

Oh, so that's why you had all those restrictions. Without knowing about them, we couldn't really give you a complete answer.

> Anyway, I'm not using perl directly for this, I have to find a regex
> to do that, without lookbehinds, that's it.

Are you sure you are using Perl for this? I've done similar things myself (that is, putting a regular expression in a config file), but I don't think it was Perl that was evaluating them. It could be that Perl has nothing to do with this.

> That's why I can not code a second pass to remove quotes after a
> /field2=("[^"]*"|\S*)/ for instance, or something that would give me
> the one backreference I need after a /field2=(?:"([^"]*)"|(\S*))/.
> I can't use a perl module either, of course.
> If fact, I cannot code at all, the only thing I can control is 1
> regexp.

The main problem is that you are searching for different patterns, depending on what your delimiter is. If you have 'value="some text"', then you will be looking for the next '"' character to signal the end of your pattern. But if you have 'value=some_text', then you will be looking for whitespace to signal the end of your pattern.

This flow of logic (if-then-else) is something that regular expressions alone weren't made to handle. I don't think your problem has a working solution, because regular expressions lack the ability to carry out the above logic.

So let me propose two work-arounds:

1. You could modify the program that reads the config files to handle the logic you need.

or

2. You can write a simple Perl script to convert your config file so that all the fields have quotes around the values (whether they need them or not). In other words, your script would change all instances of:

       field1=some_text

   to:

       field1="some_text"

   Then you could just set your regular expression to be:

       m/field[0-9]+="([^"]*)"/

   and then all your fields would be extracted. Problem solved.

Of course, I would imagine that the second work-around will be much easier for you to implement, unless there is some other restriction that you haven't shared with us.

Hopefully you'll find a solution that works for you.

-- Jean-Luc
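The second work-around could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested converter for any particular file: the helper name `quote_all_values` is hypothetical, field names are assumed to consist of word characters, and quoted values are assumed never to contain a `=` themselves.

```perl
use strict;
use warnings;

# Hypothetical helper (the name is illustrative): wrap every unquoted
# value in double quotes and leave already-quoted values untouched.
# Assumption: quoted values never contain a '=' themselves.
sub quote_all_values {
    my ($line) = @_;
    $line =~ s/(\w+)=(?!")(\S*)/$1="$2"/g;
    return $line;
}

my $line   = 'field1="test 4" field2=test2 field3=test_6 field4=';
my $quoted = quote_all_values($line);

# After normalising, the single quoted-value pattern is enough.
my @values = $quoted =~ m/field[0-9]+="([^"]*)"/g;

print "$quoted\n";
print "[$_]\n" for @values;
```

The negative lookahead `(?!")` is used only in the conversion script, which runs under an ordinary Perl; the pattern that goes into the restricted config file needs no lookaround at all.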