Java - UTF8 characters not appearing correctly in email subject line |
#1 |
|
Hi everyone,

Thanks for taking the time to look at this.

I've got a problem trying to send emails with 'unusual' characters (e.g. ó) in the Subject (& contents). The following code gives an example :-

    package mail;

    import javax.mail.*;
    import java.util.*;
    import javax.mail.internet.*;

    public class Send {
        static String mailServer = "smtp.server.com";
        static String mailFrom = "";
        static String mailTo = "";
        static String mailSubject = "Actualización";
        static String mailBody = "Actualización";
        static Properties props = System.getProperties();

        public static void Send() {

            props.put("mail.host",mailServer);
            props.put("mail.transport.protocol","smtp");
            Session mailSession = Session.getDefaultInstance(props,null);
            mailSession.setDebug(false);
            MimeMessage msg = new MimeMessage(mailSession);
            try {
                msg.setFrom(new InternetAddress(mailFrom));
                InternetAddress[] address = {new InternetAddress(mailTo)};
                msg.setRecipients(Message.RecipientType.TO,address );
                msg.setSubject(mailSubject, "UTF8");
                msg.setSentDate(new Date());
                msg.setText(mailBody, "UTF8");
                Transport.send(msg);
            } catch (Exception e) {
            }
        }

        public static void main(String[] args) {
            Send();
        }
    }

When I run this I end up with '=?UTF8?Q?Actualizaci=C3=B3n?=' in the subject & the contents of 'This message uses a character set that is not supported by the Internet Service. To view the original message content, open the attached message. If the text doesn't display correctly, save the attachment to disk, and then open it using a viewer that can display the original character set.'.

The email server can deal with these special characters as I can use Outlook to create the email I require.

Thanks in advance for any solutions.

Andee

Andee Weir
|
|
|
|
#2 |
|
Posts: n/a
|
I am not an expert in internet mail but my understanding is that there are still a significant number of SMTP servers that only support 7-bit ASCII encoding.

If your email happens to pass through just one of these it is likely to get mangled. You send email to your SMTP server to send on to the final destination. It will pass through other servers on its way and one of these may only support ASCII.

Perhaps someone can expand on this or correct it if necessary,

Barry

Andee Weir wrote:
> Hi everyone,
>
> Thanks for taking the time to look at this.
>
> I've got a problem trying to send emails with 'unusual' characters
> (e.g. ó) in the Subject (& contents). The following code gives an
> example :-
>
> package mail;
>
> import javax.mail.*;
> import java.util.*;
> import javax.mail.internet.*;
>
> public class Send {
>     static String mailServer = "smtp.server.com";
>     static String mailFrom = "";
>     static String mailTo = "";
>     static String mailSubject = "Actualización";
>     static String mailBody = "Actualización";
>     static Properties props = System.getProperties();
>
>     public static void Send() {
>
>         props.put("mail.host",mailServer);
>         props.put("mail.transport.protocol","smtp");
>         Session mailSession = Session.getDefaultInstance(props,null);
>         mailSession.setDebug(false);
>         MimeMessage msg = new MimeMessage(mailSession);
>         try {
>             msg.setFrom(new InternetAddress(mailFrom));
>             InternetAddress[] address = {new InternetAddress(mailTo)};
>             msg.setRecipients(Message.RecipientType.TO,address );
>             msg.setSubject(mailSubject, "UTF8");
>             msg.setSentDate(new Date());
>             msg.setText(mailBody, "UTF8");
>             Transport.send(msg);
>         } catch (Exception e) {
>         }
>     }
>
>     public static void main(String[] args) {
>         Send();
>     }
> }
>
> When I run this I end up with '=?UTF8?Q?Actualizaci=C3=B3n?=' in the
> subject & the contents of 'This message uses a character set that is
> not supported by the Internet Service. To view the original message
> content, open the attached message. If the text doesn't display
> correctly, save the attachment to disk, and then open it using a
> viewer that can display the original character set.'.
>
> The email server can deal with these special characters as I can use
> Outlook to create the email I require.
>
> Thanks in advance for any solutions.
>
> Andee
|
|
|
#3 |
|
Posts: n/a
|
Andee Weir wrote:
> Hi everyone,
>
> Thanks for taking the time to look at this.
>
> I've got a problem trying to send emails with 'unusual' characters
> (e.g. ó) in the Subject (& contents). The following code gives an
> example :-

> static String mailSubject = "Actualización";
> static String mailBody = "Actualización";

One minor point. Instead of using non-ASCII characters directly in your sources, use Unicode escapes (\uxxxx). native2ascii can help you with this.

> msg.setSubject(mailSubject, "UTF8");
> msg.setSentDate(new Date());
> msg.setText(mailBody, "UTF8");

Not sure if this is a problem, but as of JDK 1.2 and higher it was changed to the "official" string of "UTF-8".

> When I run this I end up with '=?UTF8?Q?Actualizaci=C3=B3n?=' in the
> subject & the contents of 'This message uses a character set that is
> not supported by the Internet Service. To view the original message
> content, open the attached message. If the text doesn't display
> correctly, save the attachment to disk, and then open it using a
> viewer that can display the original character set.'.

Aha!

Looks like your mail went out exactly as you told it to. That is the proper way to encode non-ASCII mail headers.

However... the proper encoding name is "UTF-8". "UTF8" is just Java's internal key used to identify the encoding, not its public name.
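To illustrate the naming point, here is a minimal sketch using only the JDK (no mail API). The class and method names are made up for this example. It asks Java for the canonical name behind the "UTF8" alias and shows the subject string written with a Unicode escape rather than a literal accented character.

```java
import java.nio.charset.Charset;

public class CharsetNameCheck {
    // Resolve any alias Java accepts to the charset's canonical (public) name.
    static String canonicalName(String alias) {
        return Charset.forName(alias).name();
    }

    public static void main(String[] args) {
        // Unicode escape instead of a literal o-acute in the source file.
        String subject = "Actualizaci\u00f3n";
        System.out.println(subject.length());       // 13 chars, escape counts as one
        System.out.println(canonicalName("UTF8"));  // prints UTF-8
    }
}
```

Passing the canonical name (with the hyphen) to the mail API is what the advice above amounts to.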
|
|
|
#4 |
|
Posts: n/a
|
Barry White wrote:
> I am not an expert in internet mail but my understanding is that there
> are still a significant number of SMTP servers that only support 7bit
> ASCII encoding.
>
> If your email happens to pass through just one of these it is likely to
> get mangled. You send email to your smtp server to send on to the final
> destination. It will pass through other servers on its way and one of
> these may only support ASCII.
>
> Perhaps someone can expand on this or correct it if necessary,

Yes. Non-ASCII text generally needs to get handled explicitly to make it through mail gateways. Hence the need for the character set. Once that is on there, intermediate mail gateways are free to change the transfer encoding to get through.

Mail headers are a little different in how they get across. However, his are properly encoded in ASCII-only escapes:

'=?UTF8?Q?Actualizaci=C3=B3n?='

That means "Character set is 'UTF8'" followed by "This data is Q-encoded".

The OP's problem is most likely only that he is using Java's internal name of "UTF8" instead of the proper public name of "UTF-8".
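The structure of that header value can be reproduced with a small hand-rolled encoder. This is only a sketch of the Q-encoding idea, not what the mail API does internally: the class and method names are hypothetical, it always labels the word as UTF-8, it escapes every byte that is not an ASCII letter or digit (more than strictly required), and it ignores the length limit on encoded words.

```java
import java.nio.charset.StandardCharsets;

public class QEncodeSketch {
    // Build "=?UTF-8?Q?...?=" from a string: charset label, "Q" marker, payload.
    static String encodeWord(String text) {
        StringBuilder sb = new StringBuilder("=?UTF-8?Q?");
        for (byte b : text.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            boolean safe = (v >= 'a' && v <= 'z')
                    || (v >= 'A' && v <= 'Z')
                    || (v >= '0' && v <= '9');
            if (safe) {
                sb.append((char) v);                  // plain ASCII letter or digit
            } else if (v == ' ') {
                sb.append('_');                       // space is written as underscore
            } else {
                sb.append(String.format("=%02X", v)); // any other byte as =XX hex
            }
        }
        return sb.append("?=").toString();
    }

    public static void main(String[] args) {
        // prints =?UTF-8?Q?Actualizaci=C3=B3n?=
        System.out.println(encodeWord("Actualizaci\u00f3n"));
    }
}
```

The two bytes C3 B3 are the UTF-8 form of ó; everything else in the word passes through unchanged.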
|
|
|
#5 |
|
Posts: n/a
|
Thanks for the help guys - it worked a treat.

Just a word of warning - when I ran native2ascii in DOS & typed in ó it returned \u00a2, which when I used it in the email code returned a cent symbol. The actual code I required was \u00f3 (found at http://www.unicode.org/charts/PDF/U0080.pdf).

Thanks again,

Andee
|
|
|
#6 |
|
Posts: n/a
|
Andee Weir wrote:
> Thanks for the help guys - it worked a treat.
>
> Just a word of warning - when I ran native2ascii in dos & typed in ó
> it returned \u00a2 which when I used it in the email code returned a
> cent symbol. The actual code I required was \u00f3

Then you probably were using different encodings for creating the file and running native2ascii.
|
|
|
#7 |
|
Posts: n/a
|
"Jon A. Cruz" <> wrote in message news:...
> Barry White wrote:
> Mail headers are a little different in how they get across. However, his
> are properly encoded in ASCII-only escapes:
> '=?UTF8?Q?Actualizaci=C3=B3n?='
>
> That means "Character set is 'UTF8'" followed by "This data is Q-encoded".

Watch out: UTF-8 is not ASCII.

UTF-8 uses 8 bits, while ASCII uses 7 bits. You won't notice a difference as long as you use ASCII text only, as both charsets map to the same bit values here. But as soon as you need other characters, UTF-8 will introduce a multibyte character with the most significant bit in the first byte set => no ASCII.

To ensure such text goes through 7-bit clean ASCII mail transfer agents, use something like MIME encoding, such as Base64.

Hiran
|
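The 7-bit versus 8-bit distinction can be checked directly. The following is a small sketch with made-up class and method names; it tests whether any byte has its high bit set. In UTF-8 every byte of a multi-byte sequence has that bit set, while the Q-encoded header text from this thread contains ASCII characters only.

```java
import java.nio.charset.StandardCharsets;

public class SevenBitCheck {
    // True if no byte has the most significant bit set.
    static boolean isSevenBitClean(byte[] data) {
        for (byte b : data) {
            if ((b & 0x80) != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] plain = "Actualizacion".getBytes(StandardCharsets.UTF_8);
        byte[] accented = "Actualizaci\u00f3n".getBytes(StandardCharsets.UTF_8);
        byte[] header = "=?UTF-8?Q?Actualizaci=C3=B3n?=".getBytes(StandardCharsets.US_ASCII);

        System.out.println(isSevenBitClean(plain));    // true: ASCII text is unchanged in UTF-8
        System.out.println(isSevenBitClean(accented)); // false: o-acute becomes two high-bit bytes
        System.out.println(accented.length);           // 14 bytes for 13 characters
        System.out.println(isSevenBitClean(header));   // true: the encoded word is plain ASCII
    }
}
```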
|
|
#8 |
|
Posts: n/a
|
Hiran Chaudhuri wrote:
> Watch out: UTF-8 is not ASCII
>
> UTF-8 uses 8 bits, while ASCII uses 7 bits.

Jon got it absolutely right. I have explained that a few times here. Encoding information such as "UTF-8" in a mail header like this does not mean that the data is in that encoding. It means the data WAS in that encoding. It is now in plain 7-bit ASCII. The encoding information is there so MUAs can reconstruct the original string.

> To ensure such text goes through 7bit clean ASCII mail transfer agents, use
> something like MIME encoding, such as Base64.

In headers you use Q or B encoding.

/Thomas
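As a rough illustration of the reconstruction step a mail reader performs, here is a sketch that turns the Q-encoded payload back into the original string. The names are hypothetical, the charset is hard-coded to UTF-8 (a real reader takes it from the header label), and malformed input is not handled.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class QDecodeSketch {
    // Decode the payload of a Q-encoded word (the part between "?Q?" and "?=").
    static String decodePayload(String payload) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < payload.length(); i++) {
            char c = payload.charAt(i);
            if (c == '=' && i + 2 < payload.length()) {
                // "=XX" stands for one raw byte given as two hex digits.
                out.write(Integer.parseInt(payload.substring(i + 1, i + 3), 16));
                i += 2;
            } else if (c == '_') {
                out.write(' ');   // underscore stands for a space
            } else {
                out.write(c);     // ordinary ASCII character
            }
        }
        // The raw bytes are interpreted in the charset named in the header.
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String decoded = decodePayload("Actualizaci=C3=B3n");
        System.out.println(decoded.equals("Actualizaci\u00f3n")); // true
    }
}
```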
|
|
|
#9 |
|
Posts: n/a
|
Hiran Chaudhuri wrote:
> "Jon A. Cruz" <> wrote in message
> news:...
>
>>Barry White wrote:
>>Mail headers are a little different in how they get across. However, his
>>are properly encoded in ASCII-only escapes:
>>'=?UTF8?Q?Actualizaci=C3=B3n?='
>>
>>That means "Character set is 'UTF8'" followed by "This data is Q-encoded".
>
>
> Watch out: UTF-8 is not ASCII
>

Yes, I know. And so does the mail API the poster was using. That's the reason for the second part of my statement "This data is Q-encoded".

> UTF-8 uses 8 bits, while ASCII uses 7 bits. You won't notice a difference as
> long as you use ASCII text only, as both charsets map to the same bit values
> here. But as soon as you need other characters, UTF-8 will introduce a
> multibyte character with the most significant bit in the first byte set =>
> no ASCII.

Right. Look at that subject line. That's what is there. Multibyte UTF-8 characters encoded into 7-bit ASCII only. Notice the difference between "character set" and "encoding".

>
> To ensure such text goes through 7bit clean ASCII mail transfer agents, use
> something like MIME encoding, such as Base64.

Or for header lines do like the poster's mail API does and follow RFC 2047. "Q" encoding is most often used when only a few characters are non-ASCII, or when enough of them to carry meaning are ASCII. "B" encoding uses Base64.

BTW, that subject line *is* using MIME, since it's following RFC-2045 through RFC-2049.

However... the person using the API doesn't have to worry about the mechanics. They pass in a Unicode string (because Java is Unicode) along with the character set they'd like it to try to use, and the mail API takes care of the rest.
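For comparison with the Q form, a "B" encoded word carries the same UTF-8 bytes as Base64. The sketch below uses java.util.Base64 from the JDK; the class and method names are invented for the example, the charset label is fixed to UTF-8, and no line-length handling is attempted.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BEncodeSketch {
    private static final String PREFIX = "=?UTF-8?B?";
    private static final String SUFFIX = "?=";

    // Wrap the Base64 form of the UTF-8 bytes in an encoded-word frame.
    static String encodeWord(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return PREFIX + Base64.getEncoder().encodeToString(utf8) + SUFFIX;
    }

    // Reverse of encodeWord; assumes the word was produced by encodeWord.
    static String decodeWord(String word) {
        String payload = word.substring(PREFIX.length(), word.length() - SUFFIX.length());
        return new String(Base64.getDecoder().decode(payload), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String word = encodeWord("Actualizaci\u00f3n");
        System.out.println(word);
        System.out.println(decodeWord(word).equals("Actualizaci\u00f3n")); // true
    }
}
```

With only one non-ASCII character in the subject, the Q form stays readable to a human looking at the raw header, which matches the rule of thumb above.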
|
|
|
#10 |
|
Posts: n/a
|
Michael Borgwardt wrote:
> Andee Weir wrote:
>
>> Thanks for the help guys - it worked a treat.
>>
>> Just a word of warning - when I ran native2ascii in dos & typed in ó
>> it returned \u00a2 which when I used it in the email code returned a
>> cent symbol. The actual code I required was \u00f3
>
>
> Then you probably were using different encodings for creating the file
> and running native2ascii.
>
>

Yes.

That was *exactly* one of the largest reasons not to use non-ASCII literally in sources.

"o acute" is 0xa2 in CodePage 437 (the DOS code page) and CodePage 850 (the DOS international code page), while it is 0xf3 in Latin-1 (ISO-8859-1) and CodePage 1252 (default Windows western).

One thing native2ascii lets you set is the encoding. If you've been editing sources in Windows, you'd probably want to call it with Cp1252 as the explicit encoding.
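The mismatch can be reproduced in a few lines. This is a sketch with hypothetical names: it decodes a single raw byte under a given charset and prints the resulting character as an escape in the same form quoted in the thread. The IBM850 lookup is guarded because that charset is not guaranteed to be present in every Java runtime.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CodePageMismatch {
    // Render a char as a Java-style escape with four lowercase hex digits.
    static String escape(char c) {
        return String.format("\\u%04x", (int) c);
    }

    // Decode one raw byte under the given charset and return its escape.
    static String escapeOf(int rawByte, Charset charset) {
        String decoded = new String(new byte[] { (byte) rawByte }, charset);
        return escape(decoded.charAt(0));
    }

    public static void main(String[] args) {
        // Byte 0xA2 read as Latin-1 is the cent sign, the result reported above.
        System.out.println(escapeOf(0xA2, StandardCharsets.ISO_8859_1));
        // Byte 0xF3 read as Latin-1 is o-acute, the escape that was actually wanted.
        System.out.println(escapeOf(0xF3, StandardCharsets.ISO_8859_1));
        // The DOS byte 0xA2 decodes to o-acute only when read as a DOS code page.
        if (Charset.isSupported("IBM850")) {
            System.out.println(escapeOf(0xA2, Charset.forName("IBM850")));
        }
    }
}
```

The same byte yields two different characters depending on the charset used to read it, which is why the tool needs to be told the encoding the text was typed in.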
|