Help with split using multiple delimiters

 
 
geeknc@yahoo.com
      07-27-2005
I have a file that contains 5 elements per line, each separated by white
space; however, the 4th element is surrounded by quotes.

Each line in a file looks like this:

ItemA ItemB 1.1.1.1.1 "xxx xx xxxxxx" ItemD

I was hoping to do something like this....

($a,$b,$c,$d,$e) = split(/split on white space or "...."/, $string);

and end up with....

$a = "ItemA";
$b = "ItemB";
$c = "1.1.1.1.1";
$d = "xxx xx xxxxxx";
$e = "ItemD";

I have tried multiple delimiters, but nothing seems to return 5
elements. Thank you, in advance, for any help you can offer.

 
it_says_BALLS_on_your forehead
      07-27-2005
don't use split--use a regex.

($a, $b, $c, $d, $e) = $string =~
/(\S+)\s+(\S+)\s+(\S+)\s+"(.+)"\s+(\S+)/;

or if using $_

($a, $b, $c, $d, $e) = /(\S+)\s+(\S+)\s+(\S+)\s+"(.+)"\s+(\S+)/;

you can wrap each element in double quotes later.

you may be able to do

@array = /(\S+)\s+(\S+)\s+(\S+)\s+"(.+)"\s+(\S+)/;

for (@array) {
    $_ = qq{"$_"};
}
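
For anyone who wants to try the fixed-position regex above on the sample line from the question, here is a minimal, self-contained sketch. The @fields name and the die message are illustrative additions, not part of the original reply, and it assumes exactly the layout the OP showed (three bare words, one quoted field, one bare word).

```perl
use strict;
use warnings;

# Sample line copied from the original question.
my $string = 'ItemA ItemB 1.1.1.1.1 "xxx xx xxxxxx" ItemD';

# Five captures; the fourth is whatever sits between the double quotes.
my @fields = $string =~ /(\S+)\s+(\S+)\s+(\S+)\s+"(.+)"\s+(\S+)/;

# A failed match returns an empty list, so check the count before use.
die "Line did not match: $string\n" unless @fields == 5;

print "[$_]\n" for @fields;
```

Checking the count matters because a line that does not fit the pattern would otherwise leave every variable undefined without any warning at the point of the match.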

 
Paul Lalli
      07-27-2005
geeknc@yahoo.com wrote:
> I have a file that contains 5 elements per line, each separated by white
> space; however, the 4th element is surrounded by quotes.


Can you explain what was wrong with the solution you found in the FAQ?
You did, of course, search the FAQ before asking hundreds of other
people for help, right?

perldoc -q split
How can I split a [character] delimited string except when
inside [character]? (Comma-separated files)

In your case, the first [character] is a space, the second is a
double-quotes.
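
As a rough illustration of the kind of answer that FAQ entry points toward, here is a sketch using quotewords from the core Text::ParseWords module. The FAQ's exact wording varies between Perl versions, so treat this as one possible reading of it rather than a quotation. The @words name is made up for the example.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Sample line copied from the original question.
my $string = 'ItemA ItemB 1.1.1.1.1 "xxx xx xxxxxx" ItemD';

# quotewords(DELIM, KEEP, LINES): split on the DELIM pattern except inside
# quotes; a false KEEP strips the quotes from the returned words.
my @words = quotewords('\s+', 0, $string);

print scalar(@words), " fields\n";
print "[$_]\n" for @words;
```

Note that quotewords also treats single quotes and backslashes specially, so it may not suit data that contains apostrophes or literal backslashes.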

Paul Lalli

 
James Taylor
      07-28-2005
it_says_BALLS_on_your forehead wrote:
>
> don't use split--use a regex.
>
> ($a, $b, $c, $d, $e) = $string =~
> /(\S+)\s+(\S+)\s+(\S+)\s+"(.+)"\s+(\S+)/;


If you don't know in advance which fields will be quoted,
you can use this regex instead:

my ($a, $b, $c, $d, $e) = $string =~ /("[^"]*"|\S+)/g;
# but then you need to remove any quotes by saying:
s/^"([^"]*)"$/$1/ foreach $a, $b, $c, $d, $e;

If you don't mind the fields all going in one array, you
could do it all in one go like this:

my @fields;
push @fields, $+ while $string =~ /"([^"]*)"|(\S+)/g;

Of course, nothing stops you then assigning the @fields
array to individual scalar variables:

my ($a, $b, $c, $d, $e) = @fields;
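
To see the while loop handle quoted fields in any position, the same regex can be wrapped in a small sub. This is only a sketch: split_quoted is a hypothetical helper name, and the second test line is invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the fields of one line, quotes removed.
sub split_quoted {
    my ($line) = @_;
    my @fields;
    # Each /g iteration matches either a quoted run (group 1) or a bare
    # word (group 2); $+ holds whichever group actually matched.
    push @fields, $+ while $line =~ /"([^"]*)"|(\S+)/g;
    return @fields;
}

my @op_line = split_quoted('ItemA ItemB 1.1.1.1.1 "xxx xx xxxxxx" ItemD');
my @mixed   = split_quoted('"first field" ItemB "third field"');

print "[$_]\n" for @op_line;
print "[$_]\n" for @mixed;
```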

If a single line while loop with a fairly simple regex seems too
easy or too efficient, you can always spend time reading up on
the various CPAN modules suggested by the FAQ (perldoc -q split),
work out how to set up the necessary OO object instances, how
to call the provided methods to get the result you require,
test that it does what you expect, pray that there are no
earlier versions of the module around that are buggy, pray
that no future versions will be buggy, load the whole module
at compile time and hope that this and the method call interface
don't hit performance too much, and then sit back and enjoy
the somewhat dubious pleasures of OPC (Other People's Code)
in the knowledge that at least you didn't have to do the
work yourself. (Irony intended.)

Even if you wanted to use a module, I note that the FAQ
entry "How can I split a [character] delimited string except
when inside [character]?" recommends the use of Text::CVS or
Text::CVS_XS but I don't believe CVS is what's needed here.

--
James Taylor, London, UK PGP key: 3FBE1BF9
To protect against spam, the address in the "From:" header is not valid.
In any case, you should reply to the group so that everyone can benefit.
If you must send me a private email, use james at oakseed demon co uk.

 
it_says_BALLS_on_your forehead
      07-29-2005
i don't know if that would work because of greedy matching. you may
need a ? after your asterisk, to make it stingy matching.

 
Anno Siegel
      07-29-2005
James Taylor <spam-block-@-SEE-MY-SIG.com> wrote in comp.lang.perl.misc:
> it_says_BALLS_on_your forehead wrote:


[...]

> Even if you wanted to use a module, I note that the FAQ
> entry "How can I split a [character] delimited string except
> when inside [character]?" recommends the use of Text::CVS or
> Text::CVS_XS but I don't believe CVS is what's needed here.


That must be a typo in the FAQ. s/CVS/CSV/g.

Anno
--
If you want to post a followup via groups.google.com, don't use
the broken "Reply" link at the bottom of the article. Click on
"show options" at the top of the article, then click on the
"Reply" at the bottom of the article headers.
 
James Taylor
      07-29-2005
Simon, I'm not sure which bit of my post you were replying
to, or even if it was me you were replying to, as you did
not quote any context. I will therefore attempt to rebuild
the relevant context below with the correct attributions.
You probably need to get a better news reader if you can.

it_says_BALLS_on_your forehead wrote:
>
> James Taylor wrote:
> >
> > If you don't know in advance which fields will be quoted,
> > you can use this regex instead:
> >
> > my ($a, $b, $c, $d, $e) = $string =~ /("[^"]*"|\S+)/g;
> > # but then you need to remove any quotes by saying:
> > s/^"([^"]*)"$/$1/ foreach $a, $b, $c, $d, $e;
> >
> > If you don't mind the fields all going in one array, you
> > could do it all in one go like this:
> >
> > my @fields;
> > push @fields, $+ while $string =~ /"([^"]*)"|(\S+)/g;

>
> i don't know if that would work because of greedy matching. you may
> need a ? after your asterisk, to make it stingy matching.


If we're sure that the OP's input lines contain simple
double quoted strings that do not themselves contain double
quotes (and this is what his example illustrated) then a
greedy [^"]* will swallow everything up to the next double
quote just as we require. Obviously, if the closing quote was
missing, it wouldn't capture the correct thing. (I think it
would backtrack and treat the opening quote as part of a
space delimited word instead). The OP could check there are
an even number of double quotes beforehand by saying:

die "Bad input line: $string\n" if $string =~ tr/"// % 2;
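
A non-fatal variant of that check, as a sketch: it collects the offending lines instead of dying on the first one. The @lines data and the @bad name are assumptions made up for the example; the second line is deliberately missing its closing quote.

```perl
use strict;
use warnings;

# Hypothetical sample lines: the second is missing its closing quote.
my @lines = (
    'ItemA ItemB 1.1.1.1.1 "xxx xx xxxxxx" ItemD',
    'ItemA ItemB 1.1.1.1.1 "xxx xx xxxxxx ItemD',
);

my @bad;
for my $string (@lines) {
    # tr/"// with an empty replacement list only counts the quotes.
    my $count = ($string =~ tr/"//);
    push @bad, $string if $count % 2;
}

print "Bad input line: $_\n" for @bad;
```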

If the input lines were similar to CSV in allowing strings
that themselves contain double quotes, doubled up like this:

ItemA ItemB 1.1.1.1.1 "He said ""Hello"" to me" ItemD

then a more complex regex would be required. If this is what the
OP wants he can ask, but I don't believe it is. What he shouldn't
do, though, is use Text::ParseWords because, contrary to popular
belief, it doesn't handle CSV style quotes.
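
For the curious, one possible shape for that more complex regex is sketched below. It is an assumption about what would be needed, not something tested against real CSV data: the quoted alternative accepts either a non-quote character or a doubled quote, and the doubled quotes are collapsed afterwards.

```perl
use strict;
use warnings;

# The doubled-quote example line from above.
my $string = 'ItemA ItemB 1.1.1.1.1 "He said ""Hello"" to me" ItemD';

my @fields;
while ($string =~ /"((?:[^"]|"")*)"|(\S+)/g) {
    my $field = $+;
    # Collapse doubled quotes, but only for fields that were quoted.
    $field =~ s/""/"/g if defined $1;
    push @fields, $field;
}

print "[$_]\n" for @fields;
```

A bare word that happens to contain "" is left untouched, because the substitution only runs when the quoted alternative (group 1) matched.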


 
James Taylor
      07-29-2005
In article <dccplg$oqk$>,
Anno Siegel wrote:
>
> James Taylor wrote:
> >
> > Even if you wanted to use a module, I note that the FAQ
> > entry "How can I split a [character] delimited string except
> > when inside [character]?" recommends the use of Text::CVS or
> > Text::CVS_XS but I don't believe CVS is what's needed here.

>
> That must be a typo in the FAQ. s/CVS/CSV/g.


Who's responsible for maintaining the FAQ?
What's the correct procedure for nudging them?


 
it_says_BALLS_on_your forehead
      07-29-2005
> James Taylor wrote:
> If you don't know in advance which fields will be quoted,
> you can use this regex instead:



....so based on that (you said fieldS), the greedy matching would have
caused the regex to do something that was unintended.

> James Taylor also wrote:
> If this is what the
> OP wants he can ask, but I don't believe it is.


....referring to nested quotes. you're right, he didn't ask that. nor did
i assume he did. the example that he gave suggests that the 4th field
would always be the quoted field, so that's why i gave him the simple
regex that i did.

i was simply pointing out what i thought was an oversight in your
regex, because my interpretation was that you thought the OP may have
to deal with multiple quoted fields, and if that were the case, the
default greedy matching would eat up all but the last quote.

 
xhoster@gmail.com
      07-29-2005
"it_says_BALLS_on_your forehead" wrote:
> > James Taylor wrote:
> > If you don't know in advance which fields will be quoted,
> > you can use this regex instead:

>
> ...so based on that (you said fieldS), the greedy matching would have
> caused the regex to do something that was unintended.


Can you illustrate this alleged problem?
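
One way to probe the question is to run the patterns under discussion against a line with two quoted fields. This is a minimal sketch; the input line is invented for the test and does not come from the OP's data, and the variable names are illustrative.

```perl
use strict;
use warnings;

# Made-up line with two quoted fields.
my $string = 'ItemA "one two" ItemC "three four" ItemE';

# Greedy dot: runs from the first quote to the LAST quote on the line.
my ($greedy_dot) = $string =~ /"(.+)"/;

# Greedy negated class: cannot cross a quote, so each field stays separate.
my @negated = $string =~ /"([^"]*)"/g;

# Non-greedy dot: stops at the first closing quote it can.
my @lazy_dot = $string =~ /"(.+?)"/g;

print "greedy dot : [$greedy_dot]\n";
print "negated    : [$_]\n" for @negated;
print "lazy dot   : [$_]\n" for @lazy_dot;
```

On this input the greedy [^"]* behaves the same as the non-greedy .+? version, because the character class itself excludes the closing quote; only the greedy .+ swallows the text between the two quoted fields.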

Xho
