Velocity Reviews - Computer Hardware Reviews



Special character to &abc equivalents

 
 
Colin Peters
      05-07-2005
Hi,

I'm reading a file and writing it to the html output for a page.

I've come across two difficulties which I would like to solve.

The files contain special characters from European alphabets, namely
those which have the two little dots above the vowels called umlauts.

Normally, these are rendered in HTML using "&auml;", but in the file
they are just ä.

1. I'm using a StreamReader to read the file, and I have found that if I
don't use System.Text.Encoding.UTF7, the characters are lost
completely. Is this the correct way, or is there a way to get the
StreamReader to select the correct encoding automatically, or to use
other code to determine which would be best?

2. Once the character has been read from the file, it is output
literally to the HTML, which I guess is to be expected. Is there a way
to process a string in order to change the ä to &auml; and so on?

Thanks in advance for any replies.
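As later replies point out, entity references are not needed when the page encoding is right. If you do want named references like &auml; anyway (question 2), one option is a small lookup table. The following is a minimal C# sketch, not a built-in API: `NamedEntities` and `Encode` are hypothetical names, and the table only covers the German umlauts, ß and the HTML markup characters, so extend it for any other letters you need.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: swaps a few characters for named HTML entities.
public static class NamedEntities
{
    // Deliberately small table; add further characters as required.
    private static readonly Dictionary<char, string> Map = new Dictionary<char, string>
    {
        ['&'] = "&amp;",
        ['<'] = "&lt;",
        ['>'] = "&gt;",
        ['"'] = "&quot;",
        ['\u00E4'] = "&auml;",  // ä
        ['\u00C4'] = "&Auml;",  // Ä
        ['\u00F6'] = "&ouml;",  // ö
        ['\u00D6'] = "&Ouml;",  // Ö
        ['\u00FC'] = "&uuml;",  // ü
        ['\u00DC'] = "&Uuml;",  // Ü
        ['\u00DF'] = "&szlig;", // ß
    };

    public static string Encode(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            // Use the named entity when the table has one, otherwise copy the character.
            if (Map.TryGetValue(c, out string entity))
                sb.Append(entity);
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(NamedEntities.Encode("B\u00E4r & \u00D6l")); // "Bär & Öl"
    }
}
```

For the input "Bär & Öl" this produces B&auml;r &amp; &Ouml;l. Note that it only helps if the string was decoded correctly in the first place.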


 
Yunus Emre ALPÖZEN [MCAD.NET]
      05-07-2005
My advice: set the underlying operating system encoding to whatever you
want, and use StreamReader and StreamWriter with
System.Text.Encoding.Default, which uses the underlying OS encoding.

I had the same problems with Turkish encoding, and this is the best
solution (IMHO).

--

Thanks,
Yunus Emre ALPÖZEN
BSc, MCAD.NET
Joerg Jooss
      05-07-2005
Colin Peters wrote:

> 1. I'm using a StreamReader to read the file and I have found that if
> I don't use System.Text.Encoding.UTF7 then the characters are lost
> completely.


UTF-7 is hardly what you want. Did you try ISO-8859-1? Or Windows-1252?

> Is this the correct way, or is there a way to
> automatically get the Stream Reader to select the correct encoding,
> or use other code to determine which would be best?


In general, there's no way to guess a character encoding because
there's no universal metadata that could tell you what encoding is
being used.

To put it differently: You must know the encoding, or allow the user to
switch between possible encodings.
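To illustrate why guessing is unreliable: a Latin-1 file and a UTF-8 file both decode "successfully" under the wrong encoding, just with wrong characters. The only hint .NET's StreamReader picks up on its own is a byte order mark (BOM), via its `detectEncodingFromByteOrderMarks` parameter; without one it uses whatever encoding you pass in. A minimal C# sketch, assuming the files are either Latin-1 without a BOM or UTF-8 with one (`EncodingGuess` and its methods are illustrative names):

```csharp
using System;
using System.IO;
using System.Text;

public static class EncodingGuess
{
    // ISO-8859-1 (Latin-1), looked up by name.
    public static readonly Encoding Latin1 = Encoding.GetEncoding("ISO-8859-1");

    // No exception either way: Encoding.UTF8 replaces invalid bytes with U+FFFD,
    // and every byte maps to some Latin-1 character, so bytes alone prove nothing.
    public static string DecodeAs(byte[] bytes, Encoding encoding) => encoding.GetString(bytes);

    // StreamReader can only detect UTF-8/UTF-16/UTF-32 from a BOM;
    // without a BOM it falls back to the encoding passed in.
    public static string ReadWithFallback(byte[] bytes, Encoding fallback)
    {
        using (var reader = new StreamReader(new MemoryStream(bytes), fallback, true))
        {
            return reader.ReadToEnd();
        }
    }
}

public static class Program
{
    public static void Main()
    {
        byte[] latin1Bar = { 0x42, 0xE4, 0x72 };               // "Bär" saved as Latin-1
        byte[] utf8WithBom = { 0xEF, 0xBB, 0xBF, 0xC3, 0xA4 }; // "ä" saved as UTF-8 with BOM
        byte[] utf8NoBom = { 0xC3, 0xA4 };                     // "ä" saved as UTF-8, no BOM

        Console.WriteLine(EncodingGuess.DecodeAs(latin1Bar, EncodingGuess.Latin1));          // right encoding
        Console.WriteLine(EncodingGuess.DecodeAs(latin1Bar, Encoding.UTF8));                 // contains U+FFFD
        Console.WriteLine(EncodingGuess.ReadWithFallback(utf8WithBom, EncodingGuess.Latin1)); // BOM wins
        Console.WriteLine(EncodingGuess.ReadWithFallback(utf8NoBom, EncodingGuess.Latin1));   // wrong guess
    }
}
```

The last line shows the failure mode: UTF-8 data without a BOM, read as Latin-1, comes out as "Ã¤" with no error raised.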


> 2. Having read the character from the file, it is output literally to
> the html, which I guess is to be expected. Is there a way to process
> a string in order to change the ä to &auml; and so on.


That's not necessary if the page is encoded correctly.

Cheers,
--
http://www.joergjooss.de
Colin Peters
      05-07-2005
Yunus Emre ALPÖZEN [MCAD.NET] wrote:

> My advice u set underlying operating system encoding whatever u want. And
> use streamreader and streamwriter with System.Text.Encoding.Default which
> uses underlying OS encoding.
>
> I had same problems with Turkish encoding but this is the best solution
> (IMHO)
>


Unfortunately I'm using shared hosting. I have little influence over
operating system parameters.

Thanks anyway.
 
Colin Peters
      05-07-2005
Joerg Jooss wrote:

> UTF-7 is hardly what you want. Did you try ISO-8859-1? Or Windows-1252?



I didn't see this as an option offered by IntelliSense for the class
System.Text.Encoding.

Thanks anyway.

Juan T. Llibre
      05-07-2005
You can set the encoding as a Page directive.

<%@Page Language="VB" ResponseEncoding="UTF-8"%>

<%@Page Language="C#" ResponseEncoding="ISO-8859-1"%>
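The directive matters because the same text becomes different bytes under different encodings, and the browser has to be told which one was used; as far as I know, ResponseEncoding sets both the encoding of the response bytes and the charset ASP.NET declares. A small C# sketch of the mismatch (`ResponseBytes` is a hypothetical helper), showing that "Grüße" is 7 bytes in UTF-8 but 5 in ISO-8859-1, and that UTF-8 bytes read as Latin-1 garble the umlauts:

```csharp
using System;
using System.Text;

public static class ResponseBytes
{
    // Encode with one encoding and decode with another, as happens when a page
    // is written in one encoding but the browser assumes a different one.
    public static string Misread(string text, string writtenAs, string readAs)
    {
        byte[] bytes = Encoding.GetEncoding(writtenAs).GetBytes(text);
        return Encoding.GetEncoding(readAs).GetString(bytes);
    }

    // Number of bytes the text occupies in the named encoding (no BOM counted).
    public static int ByteCount(string text, string encodingName) =>
        Encoding.GetEncoding(encodingName).GetByteCount(text);
}

public static class Program
{
    public static void Main()
    {
        const string word = "Gr\u00FC\u00DFe"; // "Grüße"
        Console.WriteLine(ResponseBytes.ByteCount(word, "UTF-8"));      // ü and ß take 2 bytes each
        Console.WriteLine(ResponseBytes.ByteCount(word, "ISO-8859-1")); // 1 byte per character
        Console.WriteLine(ResponseBytes.Misread(word, "UTF-8", "ISO-8859-1")); // garbled
        Console.WriteLine(ResponseBytes.Misread(word, "UTF-8", "UTF-8"));      // round-trips
    }
}
```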





Juan T. Llibre
ASP.NET MVP
http://asp.net.do/foros/
ASP.NET forums in Spanish
Come and let's talk about ASP.NET...
======================
Joerg Jooss
      05-07-2005
Colin Peters wrote:

> Joerg Jooss wrote:
>
> > UTF-7 is hardly what you want. Did you try ISO-8859-1? Or
> > Windows-1252?
>
>
> I didn't see this as an option provided by Intellisense for the class:
> System.Text.Encoding


There are only a few predefined instances in Encoding. You can construct
any other supported encoding by name using Encoding.GetEncoding(), e.g.

Encoding enc = Encoding.GetEncoding("ISO-8859-1");
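Applied to the original problem, a minimal C# sketch that reads a file whose encoding is assumed (not detected) to be ISO-8859-1. `Latin1File` is a hypothetical name. If the files turn out to be Windows-1252 instead, pass "windows-1252"; on .NET Core and .NET 5+ that name additionally requires `Encoding.RegisterProvider(CodePagesEncodingProvider.Instance)`, while the .NET Framework supports it out of the box.

```csharp
using System;
using System.IO;
using System.Text;

public static class Latin1File
{
    // Read a whole text file that is assumed to be ISO-8859-1.
    // This StreamReader overload still honours a byte order mark if one is present.
    public static string Read(string path)
    {
        Encoding latin1 = Encoding.GetEncoding("ISO-8859-1"); // same as GetEncoding(28591)
        using (var reader = new StreamReader(path, latin1))
        {
            return reader.ReadToEnd();
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Create a sample file containing "Bär" in Latin-1, then read it back.
        string path = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(path, new byte[] { 0x42, 0xE4, 0x72 });
            Console.WriteLine(Latin1File.Read(path));
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```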

Cheers,
--
http://www.joergjooss.de
Colin Peters
      05-07-2005
Aha! The penny has dropped. Or in this case, the Euro.

Many thanks to all.



Paul Parkinson
      05-09-2005
Server.HtmlEncode(string) will convert any "special chars" from a text
file to the relevant &abc; equivalent without you having to worry about
code pages. I use it in my chat application to prevent malicious code
from being inserted into the database.
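Outside a page (for example in a class library), System.Net.WebUtility.HtmlEncode does much the same job as Server.HtmlEncode. A small C# sketch with `SafeOutput` as a hypothetical wrapper. Two caveats: as far as I know, both methods emit numeric references such as &#228; for Latin-1 letters rather than named ones like &auml; (browsers render them the same), and they can only encode the characters the StreamReader actually produced, so the file still has to be decoded with the right encoding first.

```csharp
using System;
using System.Net;

public static class SafeOutput
{
    // Escape text so it is not interpreted as markup when written into a page
    // (<, >, &, quotes and some other characters become entity references).
    public static string ForHtml(string text) => WebUtility.HtmlEncode(text);
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(SafeOutput.ForHtml("<script>alert('hi')</script>"));
        Console.WriteLine(SafeOutput.ForHtml("B\u00E4r")); // "Bär"; exact output may vary by framework version
    }
}
```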

Regards,

Paul Parkinson (www.elysaria.com)
Similar Threads
Thread | Thread Starter | Forum | Replies | Last Post
are there Tomboy and F-Spot equivalents? | Tshepang Lekhonkhobe | Python | 1 | 12-23-2006 05:52 AM
equivalents between MS and Borland C++ | Allen F. | C++ | 3 | 02-09-2005 12:39 PM
CSS equivalents for attributes | Jeff Thies | HTML | 36 | 07-14-2004 11:14 PM
Any digital equivalents of Olympus Stylus Epic? | Alan D. | Digital Photography | 5 | 01-03-2004 09:24 PM
New to Python; Command equivalents | Code_Dark | Python | 2 | 11-05-2003 12:32 PM


