Umlaute (äöü) in iso-8859-15 8bit mails/news aren't shown

Discussion of bugs in Mozilla Thunderbird
askwar
Posts: 45
Joined: April 25th, 2003, 12:29 am

Umlaute (äöü) in iso-8859-15 8bit mails/news aren't shown

Post by askwar »

Hello.

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4b) Gecko/20030416 Thunderbird/0.1a

Most of the time when I read a mail/news with Umlaute (äöü), Thunderbird doesn't show the Umlaut correctly; instead, it shows a ? (question mark). In the source view of the mail (Ctrl+U), I can see the Umlaut just fine.

This happens with many mails which use the iso-8859-15 charset and the 8bit Content-Transfer-Encoding; however, it doesn't happen all the time. In particular, 8bit mails which I wrote myself with Thunderbird are shown correctly, whereas 8bit mails written with mutt on Linux are not.

I first noticed this with posts from Thomas 'PointedEars' Lahn in the German newsgroup de.comm.software.mozilla.mailnews, for example his post in the thread "Kein Killfile in NS 7.0 Compact?!", Message-Id: <4454943.DU9NshWDTl@pointedears.de>.

This doesn't happen with Mozilla 1.3 MailNews.

I've set the same font in Mozilla and Thunderbird (Courier New).

Did anyone else notice this? And - how can I make it go away or find out why this happens?

Thanks,
Alexander
RIV@NVX
Posts: 467
Joined: December 24th, 2002, 7:32 am

Post by RIV@NVX »

I noticed the same thing. This is the only bug I've noticed so far.
Also, Japanese messages aren't shown in the right font either.
I wonder what's causing it... is there a Bugzilla ID?
Why would you even consider to use the OS that is older and more obsolete than your computer?
See, that's just one of the reasons why I pick Linux.
SoaRex
Posts: 369
Joined: April 20th, 2003, 10:01 pm
Location: Japan, the earth, the solar system

Re: Umlaute (äöü) in iso-8859-15 8bit mails/news aren't show

Post by SoaRex »

I experienced the same problem with some iso-2022-jp and shift_jis (Japanese) news posts in both the 04/28 and 05/03 builds.
It seems to be caused by quoting of the charset value in the Content-Type header.
Content-Type: text/plain; charset="iso-2022-jp" => displayed as if US-ASCII
Content-Type: text/plain; charset=iso-2022-jp => displayed properly

Do the mails or posts contain a quoted charset value?
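
For comparison: RFC 2045 treats a quoted and an unquoted parameter value as equivalent, so both header forms above should select the same charset. Here is a minimal sketch using Python's standard `email` package. It is not Thunderbird code, and the helper name `charset_of` is made up for illustration; it only shows what a conforming parser is expected to return.

```python
from email import message_from_string
from typing import Optional


def charset_of(content_type_header: str) -> Optional[str]:
    """Parse one header line and return its charset parameter, if any."""
    # A blank line ends the header section; the body text is irrelevant here.
    msg = message_from_string(content_type_header + "\n\nbody\n")
    return msg.get_content_charset()


quoted = charset_of('Content-Type: text/plain; charset="iso-2022-jp"')
unquoted = charset_of("Content-Type: text/plain; charset=iso-2022-jp")

# Both forms should yield the same charset name, without the quotes.
print(quoted, unquoted, quoted == unquoted)
```

If a mail client treats the two forms differently, the parameter parsing is a reasonable first place to look.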
Last edited by SoaRex on May 7th, 2003, 12:06 am, edited 1 time in total.
RIV@NVX
Posts: 467
Joined: December 24th, 2002, 7:32 am

Post by RIV@NVX »

Well, as far as I know, it doesn't happen on Mozilla Mail/News... so it's something Thunderbird specific.
askwar
Posts: 45
Joined: April 25th, 2003, 12:29 am

Post by askwar »

I suppose this is related to http://bugzilla.mozilla.org/show_bug.cgi?id=164629: "regression : iso-8859-15 character set wrongly displayed in latest nightly builds". Or do you know of a better bug?
RIV@NVX
Posts: 467
Joined: December 24th, 2002, 7:32 am

Post by RIV@NVX »

askwar wrote:I suppose this is related to http://bugzilla.mozilla.org/show_bug.cgi?id=164629: "regression : iso-8859-15 character set wrongly displayed in latest nightly builds". Or do you know of a better bug?


Perhaps this thread covers two problems then: the first is about iso-8859-15 characters, the second is about Japanese and some other characters when the message's MIME type isn't identified correctly. I am sure that the second one is Thunderbird specific. I can't confirm the first one (because I don't get iso-8859-15 messages).
SoaRex
Posts: 369
Joined: April 20th, 2003, 10:01 pm
Location: Japan, the earth, the solar system

Post by SoaRex »

RIV@NVX wrote:second is about japanese and some other characters when message mimetype isn't identified correctly.
I am sure that second one is Thunderbird specific.


The second problem (the quoted charset value case), which affects the message body, can be bypassed by setting "Apply default (character coding) to all messages in folder" (General Information in the newsgroup properties).

In addition to the message body problem, many (but not all) iso-2022-jp subjects are displayed improperly.
The above workaround has no effect on the iso-2022-jp subject problem.

These are Thunderbird-specific problems, as you say.
askwar
Posts: 45
Joined: April 25th, 2003, 12:29 am

Post by askwar »

I think the thread talks about the same problem. My default character encoding is set to iso-8859-1. However, when I receive a message which is encoded in iso-8859-15, Umlaute (äöü) (i.e. encoded characters) aren't decoded. Instead I see a ? (question mark). This applies to both header and body encoding.

However, header encoding is even more broken (I suppose because of the way headers need to be encoded; see below). Instead of just one ?, the whole encoded word (or words) is replaced by one ?. When I change the character set to iso-8859-15, the encoded word is shown.


Example:
Subject: Re: =?iso-8859-15?Q?Unterst=FCtzt?= Mozilla Unicode
=?iso-8859-15?Q?vollst=E4ndig=3F?=

This is shown in Thunderbird as:

Re: ? Mozilla Unicode ?

(with question marks).
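
For reference, this is what the subject is expected to decode to under RFC 2047. The sketch below uses Python's standard `email.header.decode_header`; it is not how Thunderbird does it, and the helper `decode_subject` is a name invented for this example. The folded header line is written here as one string with a single space between the parts.

```python
from email.header import decode_header

raw_subject = (
    "Re: =?iso-8859-15?Q?Unterst=FCtzt?= Mozilla Unicode "
    "=?iso-8859-15?Q?vollst=E4ndig=3F?="
)


def decode_subject(value: str) -> str:
    """Decode every RFC 2047 encoded word and keep plain text as it is."""
    pieces = []
    for chunk, charset in decode_header(value):
        if isinstance(chunk, bytes):
            # Plain (unencoded) parts come back with charset None.
            pieces.append(chunk.decode(charset or "ascii", errors="replace"))
        else:
            pieces.append(chunk)
    return "".join(pieces)


decoded = decode_subject(raw_subject)
print(decoded)
```

Each encoded word carries its own charset label, so decoding the subject should not depend on the default character coding or on the charset of the message body.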
SoaRex
Posts: 369
Joined: April 20th, 2003, 10:01 pm
Location: Japan, the earth, the solar system

Post by SoaRex »

askwar wrote:This applies to both header and body encoding.

askwar wrote:Example:
Subject: Re: =?iso-8859-15?Q?Unterst=FCtzt?= Mozilla Unicode
=?iso-8859-15?Q?vollst=E4ndig=3F?=

This is shown in Thunderbird as:

Re: ? Mozilla Unicode ?


Your problems are the same as mine, I think. They are not character set specific.

askwar, please check the Content-Type header first via "View Message Source" or "Headers All".
Is the charset value quoted? ( charset=iso-8859-15 or charset="iso-8859-15" ? )
If it is charset="iso-8859-15" (quoted), force iso-8859-15 coding on the message body:
(1) Check "Apply default (character coding) to all messages in folder" under General Information in the folder properties.
(2) Re-open the message.
Is the message body displayed properly then?

The subject may not be displayed properly even after forcing "Apply default (character coding)".
The subject case differs from the message body case, since the encoding/decoding of the subject is independent of the charset of the message body.

My guess is as follows:
When the subject contains (a) an encoded part followed by a non-encoded part, or (b) multiple encoded parts, Thunderbird fails to decode it (this is your example case).
If the subject consists of (c) a single encoded part only, no problem occurs.
askwar
Posts: 45
Joined: April 25th, 2003, 12:29 am

Post by askwar »

Hi.

I found another broken message.

Message-ID: b9bjun$bir$07$1@news.t-online.com
Newsgroups: de.comm.software.mozilla.browser
From: Joachim Pense <joachim.pense@t-online.de>
Subject: Re: [Phoenix]Outlook Express Links in gleichem Fenster

Content-Type: text/plain; charset=iso-8859-15
Content-Transfer-Encoding: 8Bit

The message contains Umlaute which at first aren't displayed. After switching charset to iso-8859-15 via selecting View -> Character Coding -> Western (ISO-8859-15), the Umlaute are displayed. I didn't have to Re-open the message; the change worked "instantaneously" in the message preview pane.
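
To illustrate why the declared charset matters for 8bit bodies, here is a small Python sketch. It assumes nothing about Thunderbird's internals; it only shows what the raw bytes for äöü turn into under different charsets. The variable names are made up for the example.

```python
# Raw 8bit bytes for "äöü" as they would appear in an iso-8859-15 body.
raw = b"\xe4\xf6\xfc"

as_latin9 = raw.decode("iso-8859-15")            # the declared charset
as_latin1 = raw.decode("iso-8859-1")             # same result for these letters
as_ascii = raw.decode("ascii", errors="replace")  # bytes >= 0x80 are invalid

# iso-8859-1 and iso-8859-15 differ in only a few positions, e.g. 0xA4.
euro_byte = b"\xa4"
euro_latin9 = euro_byte.decode("iso-8859-15")    # euro sign
euro_latin1 = euro_byte.decode("iso-8859-1")     # generic currency sign

print(as_latin9, as_latin1, as_ascii, euro_latin9, euro_latin1)
```

A decoder that falls back to US-ASCII has to substitute every byte above 0x7F with a placeholder, which would match the question marks described here. Since äöü sit at the same positions in iso-8859-1 and iso-8859-15, a default of iso-8859-1 alone would not explain the missing Umlaute.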
Last edited by askwar on May 8th, 2003, 4:38 am, edited 1 time in total.
SoaRex
Posts: 369
Joined: April 20th, 2003, 10:01 pm
Location: Japan, the earth, the solar system

Post by SoaRex »

askwar wrote:I found another broken message.
Content-Type: text/plain; charset=iso-8859-15

I found that whether the charset value is quoted or not is not the cause.
No message body with charset=iso-2022-jp was displayed in Japanese characters.
The case that worked was charset=utf-8. Sorry for my mistake.

I could not access news.t-online.com because I'm not an authorized user.
But I could observe your iso-8859-15 case in the newsgroups opera.deutsch and opera.italiano (news server news.opera.com).
Both the iso-2022-jp (bad) and utf-8 (good) cases can be observed in the newsgroup opera.japanese.
news.opera.com is run by Opera Software and is open to the public.
RIV@NVX
Posts: 467
Joined: December 24th, 2002, 7:32 am

Post by RIV@NVX »

askwar wrote:The message contains Umlaute which at first aren't displayed. After switching charset to iso-8859-15 via selecting View -> Character Coding -> Western (ISO-8859-15), the Umlaute are displayed. I didn't have to Re-open the message; the change worked "instantaneously" in the message preview pane.


The same thing happens with Croatian characters (iso-8859-2), but I have to use "Apply default to all messages (ignore coding specified by MIME header)". Otherwise, they are all displayed as question marks.

Also, Japanese messages don't get a special font when they use our characters.

I actually have no idea what could cause it, because it all works in Mozilla. Guess I will have to stick to Mozilla Mail for a while (at least until this is fixed).