transforming german characters

 
 
steve_f
      08-06-2004
I want to transform special German characters to obtain the following
variations:

groß bräu
gross bräu
gross braeu

there are two sets -

set one:
ß = ss = \xDF

set two:
Ä = Ae = \xC4
Ö = Oe = \xD6
Ü = Ue = \xDC
ä = ae = \xE4
ö = oe = \xF6
ü = ue = \xFC

Basically, the rules are: transform ß independently; the characters in
set two are either all transformed or all left alone together.

I wrote the following, which works well but looks pretty bad, I think.
So again, this is a style question...
Can anyone suggest a cleaner approach? TIA

sub transform_characters {
    my @input = @_;
    my @output;
    for my $string (@input) {
        push @output, $string;
        if ($string =~ /\xDF/) {
            $string =~ s/\xDF/ss/g;
            push @output, $string;
            if (test_for_character($string)) {
                $string = swap_all($string);
                push @output, $string;
            }
            next;
        }
        if (test_for_character($string)) {
            $string = swap_all($string);
            push @output, $string;
        }
    }
    return @output;
}

sub test_for_character {
    my $string = shift;
    if ($string =~ /\xC4/ ||
        $string =~ /\xD6/ ||
        $string =~ /\xDC/ ||
        $string =~ /\xE4/ ||
        $string =~ /\xF6/ ||
        $string =~ /\xFC/) {
        return 1
    } else {
        return 0
    }
}

sub swap_all {
    my $string = shift;
    $string =~ s/\xC4/Ae/g;
    $string =~ s/\xD6/Oe/g;
    $string =~ s/\xDC/Ue/g;
    $string =~ s/\xE4/ae/g;
    $string =~ s/\xF6/oe/g;
    $string =~ s/\xFC/ue/g;
    return $string;
}

 
John W. Krahn
      08-06-2004
steve_f wrote:
> I want to transform special German characters to obtain the following
> variations:
>
> groß bräu
> gross bräu
> gross braeu
>
> there are two sets -
>
> set one:
> ß = ss = \xDF
>
> set two:
> Ä = Ae = \xC4
> Ö = Oe = \xD6
> Ü = Ue = \xDC
> ä = ae = \xE4
> ö = oe = \xF6
> ü = ue = \xFC
>
> basically, the rules are transform ß independently
> and with set two, they are either all on or off together.
>
> I wrote the follow which works well, but looks
> pretty bad I think.


It doesn't look too bad, I've seen worse.


> so again this is a style question...
> can anyone suggest a cleaner approach? TIA


The usual idiom is to use a hash for the search and replace tables.


> sub transform_characters {
> my @input = @_;
> my @output;
> for my $string (@input) {
> push @output, $string;
> if ($string =~ /\xDF/) {
> $string =~ s/\xDF/ss/g;


Using a match followed by a substitution is a common beginner mistake.
You only need the substitution, because s/// returns a true value only
when it actually replaced something.

if ( $string =~ s/\xDF/ss/g ) {
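
A minimal sketch of why the separate match is redundant: s///g returns the number of replacements it made, or a false value if it made none, so it can serve as the condition by itself. The sample strings and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Sample strings are illustrative only; \xDF is the ß byte in Latin-1.
my $with    = "gro\xDF";
my $without = "gross";

# s///g returns the number of replacements, or a false value if none.
my $count_with    = ( $with    =~ s/\xDF/ss/g );
my $count_without = ( $without =~ s/\xDF/ss/g );

print "changed: $with\n"      if $count_with;            # prints "changed: gross"
print "unchanged: $without\n" unless $count_without;     # prints "unchanged: gross"
```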


> push @output, $string;
> if (test_for_character($string)) {
> $string = swap_all($string);
> push @output, $string;
> }
> next;
> }
> if (test_for_character($string)) {
> $string = swap_all($string);
> push @output, $string;
> }
> }
> return @output;
> }
>
> [snip code]


Using a hash you could write that as:

my %set1 = (
    "\xDF" => 'ss',
);
# Use a character class because all keys are single characters
# If keys are multiple characters use alternation instead
my $key1 = '[' . join( '', keys %set1 ) . ']';

my %set2 = (
    "\xC4" => 'Ae',
    "\xD6" => 'Oe',
    "\xDC" => 'Ue',
    "\xE4" => 'ae',
    "\xF6" => 'oe',
    "\xFC" => 'ue',
);
my $key2 = '[' . join( '', keys %set2 ) . ']';

sub transform_characters {
    my @input = @_;
    my @output;
    for my $string ( @input ) {
        push @output, $string;
        if ( $string =~ s/($key1)/$set1{$1}/og ) {
            push @output, $string;
            if ( $string =~ s/($key2)/$set2{$1}/og ) {
                push @output, $string;
            }
            next;
        }
        if ( $string =~ s/($key2)/$set2{$1}/og ) {
            push @output, $string;
        }
    }
    return @output;
}
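
The set-two branch appears twice above. Since set two is applied whether or not set one matched, the two branches can probably be folded into a loop over the tables, pushing a copy only when a substitution changed something. This is only a sketch under the same assumption of Latin-1 byte strings; the name transform_variants and the @steps list are hypothetical, not part of the code above.

```perl
use strict;
use warnings;

my %set1 = ( "\xDF" => 'ss' );
my %set2 = (
    "\xC4" => 'Ae', "\xD6" => 'Oe', "\xDC" => 'Ue',
    "\xE4" => 'ae', "\xF6" => 'oe', "\xFC" => 'ue',
);

# One compiled character class per table, paired with its lookup hash.
my $class1 = join '', keys %set1;
my $class2 = join '', keys %set2;
my @steps  = ( [ qr/([$class1])/, \%set1 ], [ qr/([$class2])/, \%set2 ] );

# Hypothetical helper: returns each string followed by its transformed variants.
sub transform_variants {
    my @input = @_;    # copy, so the caller's strings are not modified
    my @output;
    for my $string (@input) {
        push @output, $string;
        for my $step (@steps) {
            my ( $re, $map ) = @$step;
            # Push a new variant only if this table changed the string.
            push @output, $string if $string =~ s/$re/$map->{$1}/g;
        }
    }
    return @output;
}

my @variants = transform_variants("gro\xDF br\xE4u");
# @variants is ("gro\xDF br\xE4u", "gross br\xE4u", "gross braeu")
```

A string containing only set-two characters yields two entries (original and transformed), and a string with none of the characters yields just itself.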



John
--
use Perl;
program
fulfillment
 
Gunnar Hjalmarsson
      08-06-2004
steve_f wrote:
> I want to transform special German characters to obtain the
> following variations:
>
> groß bräu
> gross bräu
> gross braeu
>
> there are two sets -
>
> set one:
> ß = ss = \xDF
>
> set two:
> Ä = Ae = \xC4
> Ö = Oe = \xD6
> Ü = Ue = \xDC
> ä = ae = \xE4
> ö = oe = \xF6
> ü = ue = \xFC
>
> basically, the rules are transform ß independently
> and with set two, they are either all on or off together.


As John said, there is no reason to look for the characters with
separate regexes, and accordingly there is no reason to distinguish
between two sets.

> for my $string (@input) {
> push @output, $string;


Here you copy the whole original text to @output ...

> if ($string =~ /\xDF/) {
> $string =~ s/\xDF/ss/g;
> push @output, $string;


... and here you *add* the converted string. In the suggestion below,
I'm assuming that was a mistake.

sub transform_characters {
    my @text = @_;

    my %replace = (
        "\xDF" => 'ss',
        "\xC4" => 'Ae',
        "\xD6" => 'Oe',
        "\xDC" => 'Ue',
        "\xE4" => 'ae',
        "\xF6" => 'oe',
        "\xFC" => 'ue',
    );

    for (@text) {
        s/(\xDF|\xC4|\xD6|\xDC|\xE4|\xF6|\xFC)/$replace{$1}/g;
    }

    @text
}

my @output = transform_characters(@input);

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl
 
steve_f
      08-07-2004
Thanks Gunnar, some great stuff here... I can use simple
statements to just brute-force things, but I know there is
a more elegant way.

On Fri, 06 Aug 2004 21:35:18 +0200, Gunnar Hjalmarsson wrote:

>steve_f wrote:
>> I want to transform special German characters to obtain the
>> following variations:
>>
>> groß bräu
>> gross bräu
>> gross braeu
>>
>> there are two sets -
>>
>> set one:
>> ß = ss = \xDF
>>
>> set two:
>> Ä = Ae = \xC4
>> Ö = Oe = \xD6
>> Ü = Ue = \xDC
>> ä = ae = \xE4
>> ö = oe = \xF6
>> ü = ue = \xFC
>>
>> basically, the rules are transform ß independently
>> and with set two, they are either all on or off together.

>
>As John said, there is no reason to look for the characters with
>separate regexes, and accordingly there is no reason to distinguish
>between two sets.


The ß can be either on or off independently of the others, so
you can get:

groß bräu
gross bräu
gross braeu

I should have stated the problem more directly:

if set one is present:
  - set one on  & set two on
  - set one off & set two on
  - set one off & set two off

if only set two is present:
  - set two all on
  - set two all off

>> for my $string (@input) {
>> push @output, $string;

>
>Here you copy the whole original text to @output ...
>
>> if ($string =~ /\xDF/) {
>> $string =~ s/\xDF/ss/g;
>> push @output, $string;

>
>... and here you *add* the converted string. In the suggestion below,
>I'm assuming that was a mistake.
>
> sub transform_characters {
> my @text = @_;
>
> my %replace = (
> "\xDF" => 'ss',
> "\xC4" => 'Ae',
> "\xD6" => 'Oe',
> "\xDC" => 'Ue',
> "\xE4" => 'ae',
> "\xF6" => 'oe',
> "\xFC" => 'ue',
> );
>

I really like the idea of the hash. Yes, I have heard you are not
thinking in Perl if you are not using hashes.

> for (@text) {
> s/(\xDF|\xC4|\xD6|\xDC|\xE4|\xF6|\xFC)/$replace{$1}/g;
> }

this is super! thanks
>
> @text
> }
>
> my @output = transform_characters(@input);


 
steve_f
      08-07-2004
Thank you John, this is really useful. To start with, I must always remind
myself that if I am doing something too many times, I should generalize.

>John W. Krahn wrote:


[ snip - my statement of problem ]

>>
>> I wrote the follow which works well, but looks
>> pretty bad I think.

>
>It doesn't look too bad, I've seen worse.
>

I was able to brute force my way through it
>
>> so again this is a style question...
>> can anyone suggest a cleaner approach? TIA

>
>The usual idiom is to use a hash for the search and replace tables.
>


yes, I see and it is very good...changes the whole approach

>
>> sub transform_characters {
>> my @input = @_;
>> my @output;
>> for my $string (@input) {
>> push @output, $string;
>> if ($string =~ /\xDF/) {
>> $string =~ s/\xDF/ss/g;

>
>Using a match followed by a substitution is a usual beginner mistake.
>You only need the substitution.
>
> if ( $string =~ s/\xDF/ss/g ) {
>


ahh...ok...that's good to learn

[ snip code ]

>
>Using a hash you could write that as:
>
>my %set1 = (
> "\xDF" => 'ss',
> );
># Use a character class because all keys are single characters
># If keys are multiple characters use alternation instead


can you explain this a bit further? I'm not quite sure what you mean
by alternation, but I really only looked up the escaped values for
this particular problem.

>my $key1 = '[' . join( '', keys %set1 ) . ']';


Also, here I start to get really lost... OK, you are loading the keys
into a scalar as one long string, joining them with nothing in between
and wrapping them in square brackets, so

$key1 = [\xDF]
$key2 = [\xC4\xD6\xDC\xE4\xF6\xFC]
correct?

I see you use it down below in this substitution but it is a bit hard
for me to understand:

if ( $string =~ s/($key1)/$set1{$1}/og )

Well, if you have the time, please give me a bit more clarification
on this because I haven't seen it before.

>
>my %set2 = (
> "\xC4" => 'Ae',
> "\xD6" => 'Oe',
> "\xDC" => 'Ue',
> "\xE4" => 'ae',
> "\xF6" => 'oe',
> "\xFC" => 'ue',
> );
>my $key2 = '[' . join( '', keys %set2 ) . ']';
>
>sub transform_characters {
> my @input = @_;
> my @output;
> for my $string ( @input ) {
> push @output, $string;
> if ( $string =~ s/($key1)/$set1{$1}/og ) {
> push @output, $string;
> if ( $string =~ s/($key2)/$set2{$1}/og ) {
> push @output, $string;
> }
> next;
> }
> if ( $string =~ s/($key2)/$set2{$1}/og ) {
> push @output, $string;
> }
> }
> return @output;
> }
>
>
>
>John


Thanks again John.

Steve

 
Joe Smith
      08-07-2004
Gunnar Hjalmarsson wrote:

> steve_f wrote:
>
>> I want to transform special German characters to obtain the
>> following variations:
>>
>> groß bräu
>> gross bräu
>> gross braeu


>> for my $string (@input) {
>> push @output, $string;

>
> Here you copy the whole original text to @output ...
>
>> if ($string =~ /\xDF/) {
>> $string =~ s/\xDF/ss/g;
>> push @output, $string;

>
>
> ... and here you *add* the converted string. In the suggestion below,
> I'm assuming that was a mistake.


As I read it, steve_f wants to output three separate lines for each
line of input that has both sets of characters.
line 1 = original string.
line 2 = string after doing just the ss substitution
line 3 = string after doing ss and all the other substitutions.
If so, adding the converted string with a second and third push is correct.
-Joe
 
John W. Krahn
      08-08-2004
steve_f wrote:
>
>>John W. Krahn wrote:
>>
>>Using a hash you could write that as:
>>
>>my %set1 = (
>> "\xDF" => 'ss',
>> );
>># Use a character class because all keys are single characters
>># If keys are multiple characters use alternation instead

>
> can you explain this a bit further? I'm not quite sure what you mean
> by alternation, but I really only looked up the escaped values for
> this particular problem.


Gunnar's example uses alternation.


>>my $key1 = '[' . join( '', keys %set1 ) . ']';


Changing this to use alternation would look something like:

my $key1 = '(?:' . join( '|', keys %set1 ) . ')';
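
When keys can be longer than one character, or might contain regex metacharacters, the alternation is usually built with quotemeta and with longer keys listed first, so a shorter key cannot shadow a longer one. A sketch only; the %replace table below, including its 'a.' and 'a.b' keys, is invented for illustration.

```perl
use strict;
use warnings;

# Illustrative table: one single-character key and two overlapping keys.
my %replace = (
    "\xDF" => 'ss',
    'a.'   => 'A',
    'a.b'  => 'AB',
);

# Longest keys first; each key escaped so '.' matches only a literal dot.
my $alternation = join '|',
    map  { quotemeta($_) }
    sort { length($b) <=> length($a) } keys %replace;
my $re = qr/($alternation)/;

( my $text = "gro\xDF a.b a.c axb" ) =~ s/$re/$replace{$1}/g;
# $text is now "gross AB Ac axb"
```

With single-character keys, as in the tables above, the order does not matter and the character class is the simpler choice.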


> also here I start to get really lost....ok, you are loading into a scalar
> the keys as one long string...joining them with no space between...
> with two brackets so
>
> $key1 = [\xDF]
> $key2 = [\xC4\xD6\xDC\xE4\xF6\xFC]
> correct?


Yes.


> I see you use it down below in this substitution but it is a bit hard
> for me to understand:
>
> if ( $string =~ s/($key1)/$set1{$1}/og )
>
> well, if you have the time please give me a bit more clarrification
> on this because I haven't seen it before.


The substitution and match operators interpolate variables like double
quoted strings so after interpolation the substitution operator sees:

if ( $string =~ s/([\xDF])/ss/g )
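
To make the two halves concrete: the pattern side is interpolated into a regex, while the replacement side is evaluated afresh for every match, with $1 holding the character just matched. For %set1 that lookup always gives 'ss'; with a larger table it differs per match. A small sketch with made-up sample text; qr// is used here to compile the pattern once, which is roughly the job /o does in the code above.

```perl
use strict;
use warnings;

my %set2 = ( "\xE4" => 'ae', "\xF6" => 'oe', "\xFC" => 'ue' );
my $key2 = '[' . join( '', keys %set2 ) . ']';

# The pattern is built once from $key2 ...
my $pattern = qr/($key2)/;

# ... but $set2{$1} is looked up again for each match.
my $text  = "sch\xF6n gr\xFCn";
my $count = ( $text =~ s/$pattern/$set2{$1}/g );
# $text is now "schoen gruen" and $count is 2
```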


John
--
use Perl;
program
fulfillment
 
steve_f
      08-09-2004
On Sun, 08 Aug 2004 23:41:42 GMT, "John W. Krahn" wrote:

>steve_f wrote:
>>
>>>John W. Krahn wrote:
>>>
>>>Using a hash you could write that as:
>>>
>>>my %set1 = (
>>> "\xDF" => 'ss',
>>> );
>>># Use a character class because all keys are single characters
>>># If keys are multiple characters use alternation instead

>>
>> can you explain this a bit further? I'm not quite sure what you mean
>> by alternation, but I really only looked up the escaped values for
>> this particular problem.

>
>Gunnar's example uses alternation.
>
>
>>>my $key1 = '[' . join( '', keys %set1 ) . ']';

>
>Changing this to use alternation would look something like:
>
>my $key1 = '(?:' . join( '|', keys %set1 ) . ')';
>
>
>> also here I start to get really lost....ok, you are loading into a scalar
>> the keys as one long string...joining them with no space between...
>> with two brackets so
>>
>> $key1 = [\xDF]
>> $key2 = [\xC4\xD6\xDC\xE4\xF6\xFC]
>> correct?

>
>Yes.
>
>
>> I see you use it down below in this substitution but it is a bit hard
>> for me to understand:
>>
>> if ( $string =~ s/($key1)/$set1{$1}/og )
>>
>> well, if you have the time please give me a bit more clarrification
>> on this because I haven't seen it before.

>
>The substitution and match operators interpolate variables like double
>quoted strings so after interpolation the substitution operator sees:
>


ahhhhhhhhhh...all very fancy stuff, but I got it! thanks for
showing me this

>if ( $string =~ s/([\xDF])/ss/g )
>
>
>John


 