"un-meta" the control characters

 
 
Paul Lalli
      11-02-2009
A coworker just presented me with this task. I came up with two
solutions, but I don't like either of them. He has a text document
and wants to scan it for characters such as newline, tab, form feed,
carriage return, vertical tab. If found, he wants to replace them
with their typical representation (ie, \n, \t, \f, \r, \v).

I first gave him the obvious:
$string =~ s/\n/\\n/;
$string =~ s/\t/\\t/;
$string =~ s/\f/\\f/;
$string =~ s/\r/\\r/;
$string =~ s/\v/\\v/;

which I don't like because of how much copy/paste is involved. Then I
came up with:

for (qw/n t f r v/) {
my $meta = eval("\\$_");
$string =~ s/$meta/\\$_/;
}

which I don't like, because the comment he'd have to put in the code
to explain it would be longer than the code itself, or the first
version.

So can anyone think of a better way? Is there any kind of intrinsic
link between a newline character and the letter 'n' that could be used
to go "backwards" here?

Thanks,
Paul Lalli
 
Uri Guttman
      11-02-2009
>>>>> "PL" == Paul Lalli <> writes:

PL> A coworker just presented me with this task. I came up with two
PL> solutions, but I don't like either of them. He has a text document
PL> and wants to scan it for characters such as newline, tab, form feed,
PL> carriage return, vertical tab. If found, he wants to replace them
PL> with their typical representation (ie, \n, \t, \f, \r, \v).

PL> I first gave him the obvious:
PL> $string =~ s/\n/\\n/;
PL> $string =~ s/\t/\\t/;
PL> $string =~ s/\f/\\f/;
PL> $string =~ s/\r/\\r/;
PL> $string =~ s/\v/\\v/;

PL> which I don't like because of how much copy/paste is involved. Then I
PL> came up with:

use a hash table for the conversion:

my %controls = (
"\n" => '\\n',
"\t" => '\\t',
"\r" => '\\r',
"\f" => '\\f',
"\v" => '\\v',
) ;

$string =~ s/([\n\t\r\f\v])/$controls{$1}/g;
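For anyone who wants to run the lookup-table idea as a complete script, here is a minimal sketch. The names %escape_for and $class are made up for illustration, and it assumes the vertical tab should be written as "\x0B" in the table, since Perl string literals do not treat "\v" as an escape. The character class is built from the hash keys so the table and the pattern cannot drift apart.

```perl
use strict;
use warnings;

# Map each real control character to its two-character escape.
# Assumption: "\x0B" stands for the vertical tab.
my %escape_for = (
    "\n"   => '\n',
    "\t"   => '\t',
    "\r"   => '\r',
    "\f"   => '\f',
    "\x0B" => '\v',
);

# Build the character class from the hash keys; quotemeta keeps each
# key literal inside the brackets.
my $class = join '', map { quotemeta } keys %escape_for;

my $string = "a\tb\nc\x0Bd";
$string =~ s/([$class])/$escape_for{$1}/g;

print "$string\n";    # prints a\tb\nc\vd
```

The /g flag matters here: without it only the first control character in the string would be replaced.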

and if you want to get anal about dups of the chars do this:

my @controls = qw( n t r f v ) ;
my %control_to_escape = map { eval( qq{"\\$_"} ) => "\\$_" } @controls ;

my $controls_re = '[' . join( '', map "\\$_", @controls ) . ']' ;

$string =~ s/($controls_re)/$control_to_escape{$1}/g;

see ma! only one use of the actual control letters!

uri

--
Uri Guttman ------ -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
 
Randal L. Schwartz
      11-02-2009
>>>>> "Uri" == Uri Guttman <> writes:

>>>>> "PL" == Paul Lalli <> writes:

PL> A coworker just presented me with this task. I came up with two
PL> solutions, but I don't like either of them. He has a text document
PL> and wants to scan it for characters such as newline, tab, form feed,
PL> carriage return, vertical tab. If found, he wants to replace them
PL> with their typical representation (ie, \n, \t, \f, \r, \v).

PL> I first gave him the obvious:
PL> $string =~ s/\n/\\n/;
PL> $string =~ s/\t/\\t/;
PL> $string =~ s/\f/\\f/;
PL> $string =~ s/\r/\\r/;
PL> $string =~ s/\v/\\v/;

PL> which I don't like because of how much copy/paste is involved. Then I
PL> came up with:

Uri> use a hash table for the conversion:

Uri> my %controls = (
Uri> "\n" => '\\n',
Uri> "\t" => '\\t',
Uri> "\r" => '\\r',
Uri> "\f" => '\\f',
Uri> "\v" => '\\v',
Uri> ) ;

Just to scare people:

my %controls = (
"\n" => '\n',
"\t" => '\t',
"\r" => '\r',
"\f" => '\f',
"\v" => '\v',
);

Ok, that's downright evil.
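Why the two tables are equivalent: inside single quotes only \\ and \' are special, so '\\n' and '\n' are the same two-character string. A small sketch to check this (variable names are illustrative only):

```perl
use strict;
use warnings;

# In single quotes, only \\ and \' are special; \n stays backslash + n.
my $doubled = '\\n';    # backslash, n
my $single  = '\n';     # also backslash, n
my $newline = "\n";     # one real newline character

printf "%d %d %d\n", length($doubled), length($single), length($newline);
print $doubled eq $single ? "same\n" : "different\n";
# prints "2 2 1" and then "same"
```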

print "Just another Perl hacker,"; # the original

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion
 
sln@netherlands.com
      11-02-2009
On Mon, 2 Nov 2009 11:07:56 -0800 (PST), Paul Lalli <> wrote:

>A coworker just presented me with this task. I came up with two
>solutions, but I don't like either of them. He has a text document
>and wants to scan it for characters such as newline, tab, form feed,
>carriage return, vertical tab. If found, he wants to replace them
>with their typical representation (ie, \n, \t, \f, \r, \v).
>
>I first gave him the obvious:
>$string =~ s/\n/\\n/;
>$string =~ s/\t/\\t/;
>$string =~ s/\f/\\f/;
>$string =~ s/\r/\\r/;
>$string =~ s/\v/\\v/;
>
>which I don't like because of how much copy/paste is involved. Then I
>came up with:
>
>for (qw/n t f r v/) {
> my $meta = eval("\\$_");
> $string =~ s/$meta/\\$_/;
>}
>
>which I don't like, because the comment he'd have to put in the code
>to explain it would be longer than the code itself, or the first
>version.
>
>So can anyone think of a better way? Is there any kind of intrinsic
>link between a newline character and the letter 'n' that could be used
>to go "backwards" here?
>


Yet another way..

use strict;
use warnings;

my %translation = (
'\n'=>"\n",
'\t'=>"\t",
'\f'=>"\f",
'\r'=>"\r",
# ,'\v'=>"\v" - no 'v' for 'm'e, vt?
);

my $sample = "line 1\tsome\nline 2\t\t\f\n\rline 3\n";

while (my ($literal,$actual) = each %translation) {
$sample =~ s/$actual/$literal/eg;
}

print $sample;

__END__

-sln
 
sln@netherlands.com
      11-02-2009
On Mon, 02 Nov 2009 12:21:06 -0800, wrote:

>On Mon, 2 Nov 2009 11:07:56 -0800 (PST), Paul Lalli <> wrote:
>
>while (my ($literal,$actual) = each %translation) {
> $sample =~ s/$actual/$literal/eg;

$sample =~ s/$actual/$literal/g;
-sln
 
John W. Krahn
      11-02-2009
Paul Lalli wrote:
> A coworker just presented me with this task. I came up with two
> solutions, but I don't like either of them. He has a text document
> and wants to scan it for characters such as newline, tab, form feed,
> carriage return, vertical tab. If found, he wants to replace them
> with their typical representation (ie, \n, \t, \f, \r, \v).
>
> I first gave him the obvious:
> $string =~ s/\n/\\n/;
> $string =~ s/\t/\\t/;
> $string =~ s/\f/\\f/;
> $string =~ s/\r/\\r/;
> $string =~ s/\v/\\v/;


Perl doesn't have a "\v" character:

$string =~ s/\cK/\\v/;

Or:

$string =~ s/\13/\\v/;

Or:

$string =~ s/\xB/\\v/;
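A quick way to confirm that these spellings name the same character, and to see what "\v" does in a string literal, is the sketch below. The variable names are made up for illustration, and the 'misc' warning category is switched off only to silence the "Unrecognized escape" warning for the demonstration.

```perl
use strict;
use warnings;

my $vt = chr 11;    # vertical tab, code point 0x0B

# Three spellings of the same character in a double-quoted string.
my @spellings = ( "\cK", "\013", "\x0B" );
my $all_equal = !grep { $_ ne $vt } @spellings;

# In a string literal "\v" is not an escape; it is passed through as "v".
my $not_vt = do { no warnings 'misc'; "\v" };

my $string = "a${vt}b";
$string =~ s/\x0B/\\v/g;
print "$string\n";    # prints a\vb
```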




John
--
The programmer is fighting against the two most
destructive forces in the universe: entropy and
human stupidity. -- Damian Conway
 
C.DeRykus
      11-03-2009
On Nov 2, 11:07 am, Paul Lalli <mri...@gmail.com> wrote:
> A coworker just presented me with this task. I came up with two
> solutions, but I don't like either of them. He has a text document
> and wants to scan it for characters such as newline, tab, form feed,
> carriage return, vertical tab. If found, he wants to replace them
> with their typical representation (ie, \n, \t, \f, \r, \v).
>
> I first gave him the obvious:
> $string =~ s/\n/\\n/;
> $string =~ s/\t/\\t/;
> $string =~ s/\f/\\f/;
> $string =~ s/\r/\\r/;
> $string =~ s/\v/\\v/;
>
> which I don't like because of how much copy/paste is involved. Then I
> came up with:
>
> for (qw/n t f r v/) {
>     my $meta = eval("\\$_");
>     $string =~ s/$meta/\\$_/;
> }
> ...



Did that work? I don't understand why the eval is needed
at all:

my $string = "1\n 2\t 3\f 4\r 5\cK";
for (qw/n t f r cK/) {
my $meta = "\\$_";
$string =~ s/$meta/\\$_/;
}
print $string; # 1\n 2\t 3\f 4\r 5\cK

--
Charles DeRykus
 
Randal L. Schwartz
      11-03-2009
>>>>> "Ben" == Ben Morrow <> writes:

Ben> For extra added evil:

Ben> my $bs = "\\";
Ben> $string =~ s/$bs$_/$bs$_/g for qw/n r t f/;

And I thought *I* was being bad.

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<> <URL:http://www.stonehenge.com/merlyn/>
Smalltalk/Perl/Unix consulting, Technical writing, Comedy, etc. etc.
See http://methodsandmessages.vox.com/ for Smalltalk and Seaside discussion
 
C.DeRykus
      11-04-2009
On Nov 3, 11:18 am, Ben Morrow <b...@morrow.me.uk> wrote:
> Quoth "C.DeRykus" <dery...@gmail.com>:
>
> > On Nov 2, 11:07 am, Paul Lalli <mri...@gmail.com> wrote:
> >
> > > for (qw/n t f r v/) {
> > >     my $meta = eval("\\$_");
> > >     $string =~ s/$meta/\\$_/;
> > > }
> >
> > Did that work? I don't understand why the eval is needed
> > at all:
> >
> > my $string = "1\n 2\t 3\f 4\r 5\cK";
> > for (qw/n t f r cK/) {
> >     my $meta = "\\$_";
> >     $string =~ s/$meta/\\$_/;
> > }
> > print $string; # 1\n 2\t 3\f 4\r 5\cK
>
> That's... evil. It relies on the fact that regexes undergo two separate
> expansion phases, and requires that variable expansion happens in the
> first phase but other qqish escapes are expanded in the second. I'm not
> entirely convinced that's documented behaviour: anyone care to dig out
> perlre and prove it one way or the other?
>
> For extra added evil:
>
>     my $bs = "\\";
>     $string =~ s/$bs$_/$bs$_/g for qw/n r t f/;


Perl magic is evil? Say it ain't so

I didn't spot a full explanation in perlre but I see perlop
steps through the compilation in "gory details of parsing
quoted constructs" and ends with what happens at runtime
in "parsing regular expressions".
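The two-step behaviour can be seen in a small experiment. This is only a sketch with made-up variable names, not a statement about what is or is not guaranteed by the documentation: a variable holding backslash + n is interpolated into the pattern first, and the regex compiler then reads the resulting text as the newline escape. Wrapping the variable in \Q...\E suppresses that second reading.

```perl
use strict;
use warnings;

# $pat holds two characters: a backslash and the letter n.
my $pat = '\n';

# Interpolation drops those two characters into the pattern, and the
# regex compiler then reads them as the escape for a newline.
my $matches_newline = "one\ntwo" =~ /$pat/ ? 1 : 0;

# A string containing a literal backslash-n has no newline to match.
my $matches_literal = 'one\ntwo' =~ /$pat/ ? 1 : 0;

# \Q...\E quotes the interpolated text, so the two characters
# themselves are matched.
my $quoted_literal = 'one\ntwo' =~ /\Q$pat\E/ ? 1 : 0;

print "$matches_newline $matches_literal $quoted_literal\n";    # prints 1 0 1
```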

This closely mirrors Chapter 7's section - Perl Regular
Expressions in J.Friedl's "Mastering Regular Expressions"
1st ed.


--
Charles DeRykus
 