Substituting in a group

 
 
aquadoll
      06-20-2008
(Duplicate copy - not sure if the previous msg got posted !!)

Hello,
I have lines like the following:

ABC XXX,2231,"Math, Physics",0.45,2
PQR ERR,2217,"Physics, Chemistry, Math",0.21,5
ABC PQR,1213,Physics,0.5,1

I want to detect when there are groups of subjects in the 3rd column,
remove the quotes in those cases and replace the comma by # inside the
groups. So, the above lines would be transformed to:

ABC XXX,2231,Math# Physics,0.45,2
PQR ERR,2217,Physics# Chemistry# Math,0.21,5
ABC PQR,1213,Physics,0.5,1

I could not think of any one-liner, so I tried the following:
(Assuming I am reading each line in a variable called $Entry)

if($Entry =~ /"[A-Za-z\s]*(,[A-Za-z\s]*)+"/)
{
    my $TempEntry=$Entry;
    $TempEntry =~ s/"([A-Za-z\s]*([,][A-Za-z\s]*)+)"/$1/;
    # Change comma to # in this phrase
    $TempEntry =~ s/,/#/g;
    print "TempEntry=$TempEntry\n";
    # Now replace the original phrase with this phrase in the original entry
    $Entry =~ s/"[A-Za-z\s]*(,[A-Za-z\s]*)+"/$TempEntry/;
    print "New Entry=$Entry\n";
}


The above does not work - for some reason all commas get transformed
into # for the first two lines. Where is the problem?

Also, is there a not-so-cryptic one-liner for this one?

Thanks.

 
patrick
      06-20-2008
On Jun 20, 8:51 am, aquadoll wrote:
> [snip]
> Also, is there a not-so-cryptic one-liner for this one?


You might try
perl -F'"' -lane '$F[0] =~ s/"//; $F[1] =~ s/"//;$F[1] =~ s/,/#/;print
@F' in.txt > out.txt

Patrick
 
Paul Lalli
      06-20-2008
On Jun 20, 11:51 am, aquadoll wrote:
> [snip]
> if($Entry =~ /"[A-Za-z\s]*(,[A-Za-z\s]*)+"/)
> {
>    my $TempEntry=$Entry;
>    $TempEntry =~ s/"([A-Za-z\s]*([,][A-Za-z\s]*)+)"/$1/;


This gets rid of all the quotes in the TempEntry.

>    # Change comma to # in this phrase
>    $TempEntry =~ s/,/#/g;


This changes ALL commas in the entire entry, not just the commas that
were originally part of the quoted material.

>    print "TempEntry=$TempEntry\n";
>    # Now replace the original phrase with this phrase in the original entry
>    $Entry =~ s/"[A-Za-z\s]*(,[A-Za-z\s]*)+"/$TempEntry/;
>    print "New Entry=$Entry\n";
>
> }
>
> The above does not work - for some reason all commas get transformed
> into # for the first two lines. Where is the problem?


$TempEntry is the whole line, not just the part of $Entry you cared
about.

# First obtain the grouped items substring (quotes included)
my ($group) = ($TempEntry =~ /("[^"]+?")/);
# Create a copy of the group string to modify:
my $mod_group = $group;
# Replace all commas in the group with #
$mod_group =~ tr/,/#/;
# Remove the quotes from the group:
$mod_group =~ s/^"|"$//g;
# Replace the original group with the modified group in the original entry.
# \Q...\E keeps any regex metacharacters in $group literal.
$TempEntry =~ s/\Q$group\E/$mod_group/;
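
To make the diagnosis above concrete, here is a small sketch that replays the three steps of the original code on the first sample line and keeps each intermediate value. The sub name trace_original_code and its return shape are made up for illustration; the regexes are the ones from the original post.

```perl
use strict;
use warnings;

# Illustrative helper (not from the thread): replay the original three steps
# and return the value after each one.
sub trace_original_code {
    my ($Entry) = @_;

    # Step 1: copy the WHOLE line, then strip the quotes around the group.
    my $TempEntry = $Entry;
    $TempEntry =~ s/"([A-Za-z\s]*([,][A-Za-z\s]*)+)"/$1/;
    my $after_step1 = $TempEntry;

    # Step 2: nothing limits this to the group, so every comma is rewritten.
    $TempEntry =~ s/,/#/g;
    my $after_step2 = $TempEntry;

    # Step 3: the quoted group is replaced by that whole rewritten line.
    $Entry =~ s/"[A-Za-z\s]*(,[A-Za-z\s]*)+"/$TempEntry/;

    return ($after_step1, $after_step2, $Entry);
}

print "$_\n" for trace_original_code('ABC XXX,2231,"Math, Physics",0.45,2');
```

The second value printed is the full line with every comma turned into #, which matches the symptom described in the original post.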


Hope that helps,
Paul Lalli
 
Dave B
      06-20-2008
aquadoll wrote:

> ABC XXX,2231,"Math, Physics",0.45,2
> PQR ERR,2217,"Physics, Chemistry, Math",0.21,5
> ABC PQR,1213,Physics,0.5,1
>
> I want to detect when there are groups of subjects in the 3rd column,
> remove the quotes in those cases and replace the comma by # inside the
> groups. So, the above lines would be transformed to:
>
> ABC XXX,2231,Math# Physics,0.45,2
> PQR ERR,2217,Physics# Chemistry# Math,0.21,5
> ABC PQR,1213,Physics,0.5,1
>[snip]
> Also, is there a not-so-cryptic one-liner for this one?


I'm a beginner in Perl, so please forgive any naivety. This one-liner seems
to work:

$ perl -pe 'if (s/"([^"]*)"/$1/) {$m=$n=$1; $n=~s/,/#/g; s/$m/$n/;}' file
ABC XXX,2231,Math# Physics,0.45,2
PQR ERR,2217,Physics# Chemistry# Math,0.21,5
ABC PQR,1213,Physics,0.5,1

This assumes that the text between double quotes (the part that is matched
in the first place) does not appear elsewhere before the double quotes, and
assumes that it's the only text in double quotes in the line.
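
One way to avoid both assumptions is to do the comma rewrite inside the replacement itself, using the /e modifier, so the matched text never has to be searched for a second time. The following is only a sketch: the sub name hash_quoted_groups and the extra test input are hypothetical. Note that it strips the quotes from every quoted group, whether or not the group contains a comma.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every quoted group on a line in one pass.
# With /e the replacement is evaluated as Perl code; its last value is used.
# With /g the same rewrite is applied to each quoted group on the line.
sub hash_quoted_groups {
    my ($line) = @_;
    $line =~ s{"([^"]*)"}{ (my $group = $1) =~ tr/,/#/; $group }ge;
    return $line;
}

print hash_quoted_groups($_), "\n" for (
    'ABC XXX,2231,"Math, Physics",0.45,2',
    'PQR ERR,2217,"Physics, Chemistry, Math",0.21,5',
    'ABC PQR,1213,Physics,0.5,1',
);
```

Because the group text is never interpolated into a second pattern, regex metacharacters inside the quotes need no escaping here.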

--
D.
 
aquadoll
      06-20-2008
On Jun 20, 11:02 am, Paul Lalli wrote:
> [snip]
> $TempEntry is the whole line, not just the part of $Entry you cared
> about.


Hello,
Thanks for all the replies. I was actually trying to get the part of
$Entry I am interested in into $TempEntry.
I used the following 2 lines (as shown in the OP):
$TempEntry=$Entry;
$TempEntry =~ s/"([A-Za-z\s]*([,][A-Za-z\s]*)+)"/$1/;

Why did the above not put "the part of $Entry I am interested in"
into $TempEntry? What did I do wrong?
Thanks.
 
John W. Krahn
      06-20-2008
patrick wrote:
> On Jun 20, 8:51 am, aquadoll wrote:
>>
>> I am having the following kind of lines:
>>
>> ABC XXX,2231,"Math, Physics",0.45,2
>> PQR ERR,2217,"Physics, Chemistry, Math",0.21,5
>> ABC PQR,1213,Physics,0.5,1
>>
>> I want to detect when there are groups of subjects in the 3rd column,
>> remove the quotes in those cases and replace the comma by # inside the
>> groups. So, the above lines would be transformed to:
>>
>> ABC XXX,2231,Math# Physics,0.45,2
>> PQR ERR,2217,Physics# Chemistry# Math,0.21,5
>> ABC PQR,1213,Physics,0.5,1

>
> You might try
> perl -F'"' -lane '$F[0] =~ s/"//; $F[1] =~ s/"//;$F[1] =~ s/,/#/;print
> @F' in.txt > out.txt


split() *removes* the expression you are splitting on, so there are no
'"' characters in @F to remove. That could be simplified to:

perl -F'"' -lane '$F[1] =~ s/,/#/;print @F' in.txt > out.txt

But that only changes the first ',' to a '#' and not all of them, so you
probably want this instead:

perl -F'"' -lane '$F[1] =~ s/,/#/g;print @F' in.txt > out.txt

Or:

perl -F'"' -lane '$F[1] =~ tr/,/#/;print @F' in.txt > out.txt
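
The same idea can be written as a short script, which makes it easier to see what split leaves in each field. This is a sketch only; the sub name hash_quoted_field is invented here, and like the one-liners above it handles only the first quoted group on a line.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the -F'"' one-liner: split on the quote
# character, rewrite commas in the second field, and glue the line back.
sub hash_quoted_field {
    my ($line) = @_;
    my @fields = split /"/, $line;   # split discards the '"' delimiters
    # A line with no quotes yields a single field, so leave it alone.
    $fields[1] =~ tr/,/#/ if @fields > 1;
    return join '', @fields;
}

print hash_quoted_field($_), "\n" for (
    'ABC XXX,2231,"Math, Physics",0.45,2',
    'PQR ERR,2217,"Physics, Chemistry, Math",0.21,5',
    'ABC PQR,1213,Physics,0.5,1',
);
```

For the first sample line, @fields holds three elements: the text before the opening quote, the group itself, and the text after the closing quote.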



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall
 
Willem
      06-20-2008
aquadoll wrote:
) ABC XXX,2231,"Math, Physics",0.45,2
) PQR ERR,2217,"Physics, Chemistry, Math",0.21,5
) ABC PQR,1213,Physics,0.5,1
)
) I want to detect when there are groups of subjects in the 3rd column,
) remove the quotes in those cases and replace the comma by # inside the
) groups. So, the above lines would be transformed to:
)
) ABC XXX,2231,Math# Physics,0.45,2
) PQR ERR,2217,Physics# Chemistry# Math,0.21,5
) ABC PQR,1213,Physics,0.5,1
)
) I could not think of any one-liner, so I tried the following:
) (Assuming I am reading each line in a variable called $Entry)

How about:

while (s/(")(.*?)"/$2/) { substr($_,$+[1]-1,$+[2]-$+[1]) =~ s/,/#/g }

Which should do what you want, even for multiple quoted strings.


SaSW, Willem
--
Disclaimer: I am in no way responsible for any of the statements
made in the above text. For all I know I might be
drugged or something..
No I'm not paranoid. You all think I'm paranoid, don't you !
#EOT
 
Willem
      06-20-2008
aquadoll wrote:
) Hello,
) Thanks for all the replies. I was actually trying to get the part of
) $Entry I am interested in, in $TempEntry.
) I used the following 2 lines (as shown in the OP):
) $TempEntry=$Entry;
) $TempEntry =~ s/"([A-Za-z\s]*([,][A-Za-z\s]*)+)"/$1/;
)
) Why did the above not put "the part of $Entry I am interested in"
) in $TempEntry? What did I do wrong?

It's a substitution. You substitute the quoted part with the part
between quotes. The rest remains intact.

To get just the part between quotes, use this:

my ($TempEntry) = $Entry =~ /"(.*?)"/;

Why the complicated match string, by the way?
Do you only want to match quoted strings that contain a comma?
It seems needlessly complex.
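
The difference between the two operations can be seen side by side in this small sketch. The variable names $substituted and $group are chosen here for illustration; the sample line is the second one from the original post.

```perl
use strict;
use warnings;

my $Entry = 'PQR ERR,2217,"Physics, Chemistry, Math",0.21,5';

# s/// edits its target in place: only the matched part is replaced,
# and everything else on the line stays where it was.
(my $substituted = $Entry) =~ s/"(.*?)"/$1/;

# A match in list context returns the captured text and nothing else.
my ($group) = $Entry =~ /"(.*?)"/;

print "substituted: $substituted\n";
print "group:       $group\n";
```

The first variable ends up holding the whole line minus the quotes, while the second holds only the text that was between them.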


SaSW, Willem

Willem
      06-20-2008
Willem wrote:
) while (s/(")(.*?)"/$2/) { substr($_,$+[1]-1,$+[2]-$+[1]) =~ s/,/#/g }

Of course,
while (s/"(.*?)"/$1/) { substr($_,$-[1]-1,$+[1]-$-[1]) =~ s/,/#/g }
is slightly easier.


SaSW, Willem

aquadoll
      06-20-2008
On Jun 20, 12:01 pm, Willem wrote:
> Willem wrote:
>
> ) while (s/(")(.*?)"/$2/) { substr($_,$+[1]-1,$+[2]-$+[1]) =~ s/,/#/g }
>
> Of course,
>   while (s/"(.*?)"/$1/) { substr($_,$-[1]-1,$+[1]-$-[1]) =~ s/,/#/g }
> is slightly easier.


Thanks for the great discussion - I learnt a few things.
One last question: what do $- and [1] stand for in "$-[1]-1" in the
above post? Where can I find more about that in perldoc?
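
For reference, $-[1] is element 1 of the special array @-, and $+[1] is element 1 of @+. Both arrays are documented in perlvar (perldoc perlvar): after a successful match, $-[0] and $+[0] are the start and end offsets of the whole match, and $-[N] and $+[N] are the start and end offsets of capture group N. The sketch below prints those offsets for the first sample line and then wraps the loop quoted above in a sub; the names group_offsets and hash_quoted are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: offsets of the whole match and of capture group 1.
sub group_offsets {
    my ($line) = @_;
    return unless $line =~ /"(.*?)"/;
    return ($-[0], $+[0], $-[1], $+[1]);
}

# The loop from the post above. After s/// removes the opening quote, the
# group sits one character further left, hence the "- 1" on the start offset.
sub hash_quoted {
    local $_ = shift;
    while (s/"(.*?)"/$1/) {
        substr($_, $-[1] - 1, $+[1] - $-[1]) =~ s/,/#/g;
    }
    return $_;
}

my $line = 'ABC XXX,2231,"Math, Physics",0.45,2';
my ($m_start, $m_end, $g_start, $g_end) = group_offsets($line);
print "whole match: $m_start..$m_end\n";
print "group 1:     $g_start..$g_end\n";
print "group text:  ", substr($line, $g_start, $g_end - $g_start), "\n";
print hash_quoted($line), "\n";
```

The length argument $+[1] - $-[1] is simply the end offset of the group minus its start offset, so the substr call selects exactly the text that used to be between the quotes.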
 