Are Your CGIs Ready For HTTP 1.1 CONTENT_TYPE?
Friday July 30th, 1999

Eric Krock of Netscape has a second piece of news to pass along: "The HTTP 1.0 CONTENT_TYPE line looked like this:

Content-Type: application/x-www-form-urlencoded

The HTTP 1.1 CONTENT_TYPE line looks like this:

Content-Type: application/x-www-form-urlencoded; charset=ISO-8859-1

In the interest of being standards-compliant and supporting multiple languages, Navigator 5 will support this feature of HTTP 1.1. Improperly designed CGI scripts that aren't forward-compatible with this change to the HTTP protocol will need to be fixed to support HTTP 1.1. Make sure your CGI scripts are ready for the HTTP 1.1 CONTENT_TYPE header. Read this new View Source article to find out how!"

I hope no one misreads the story and presumes the presence of "charset=ISO-8859-1" specifically. I think Krock means to indicate that a charset will be present; it may or may not be ISO 8859-1.

It's not quite the same issue, but similar: recently I've heard that IE supports UTF-8 in query strings. Up to now I haven't had time to search through the RFCs for a corresponding standard. Will Mozilla support Unicode strings (e.g. from text boxes) in query strings?
Klaus

I once messed around with an IE4 beta and read a bit about UTF-8 in URLs (not just in the query string). This is what I learned: the problem is that UTF-8 is incompatible with the old encoding, which said "convert everything (8-bit) to %xx", without regard to the charset. For 7-bit characters this is fine, and the old-style and new-style encodings are just the same. That is of course not true for "real" 8-bit or Unicode characters; e.g. every Latin-1 umlaut becomes a two-byte sequence in UTF-8. The suggested way to handle this problem was to let the server check whether the URL is valid UTF-8 (which can be done semi-heuristically), then look for the resource on the local filesystem as UTF-8 and, if it is not found, fall back to the old-style encoding. So much for new standards and backward compatibility.
Masi (masi@blackbox.net)

To all who are interested: in the meantime I found a draft (via the W3C) for Unicode in URLs (and therefore also in query strings): http://www.ietf.org/internet-drafts/draft-masinter-url-i18n-04.txt So since the W3C doesn't list any standard on this, Microsoft is once again making its own standards...
Klaus

Anon is correct. I was giving charset=ISO-8859-1 as an example, rather than implying that value is hardcoded. I think people will get the idea. If I get lots of flame mail about Netscape imposing ISO-8859-1 on the entire world, I guess I'll know otherwise... ;->
Eric

This is ironic: Mozilla doesn't handle charsets in Content-Type correctly. Going to www.w3.org in the latest build doesn't work, because it reports: Unknown File Type: text/html; charset=iso-8859-1. Bugzilla, here I come...

Can Apache help out and tack on a default charset for us? I mean, will Mozilla completely get pissed off if one is absent? Wouldn't it be bad for Netscape to crash on all the CGI scripts it used to handle just fine?

Actually, no. You see, badly written CGIs have been a major problem for quite some time (can you say ?something&param1&param2?). If we're going to be standards-compliant, we will have to take the chance of "breaking" badly designed code in some areas. And Mozilla won't exactly "crash on all the CGI scripts it used to handle just fine"; it will break some badly written CGIs which can't handle the W3C standard, I think. Remember the old cgi-lib.pl script? Anyone who still uses it will need to tweak their copy to work with HTTP/1.1. I came across this earlier when playing around with Mozilla and some CGI scripts.
narbey

When you say script?param&param&param is wrong, does that mean script?name=param&name=param&... would be considered OK? I often use GET for scripts because it is bookmarkable. I've never used this Content-Type stuff you're talking about, and I've been working with scripts for almost three years now :P. I have used Content-Type: text/html...
and a few other MIME types, but that's the general structure!! HELP!!!
kevin@idzi.com

Just wanted to emphasize: Mozilla is going to send along the chosen encoding, and *only* CGI scripts which check the Content-Type header they *get* will break. In my experience, those are not so many. Sorry. A little addition: of course, only CGI scripts which do not expect the charset part will break.

?foo&foo is incorrect inside an HTML document because the & needs to be escaped; ?foo&amp;foo is correct.
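The ampersand-escaping point in the last comment can be illustrated with a short sketch (using Python's standard-library html.escape; the URL and link markup here are invented for the example):

```python
from html import escape

# A raw URL with an unescaped ampersand: fine on the wire, but
# inside an HTML page the parser may read "&param" as a character
# entity reference, mangling the query string.
href = "script.cgi?foo=1&param=2"

# escape() turns & into &amp; (and <, >, quotes likewise), so the
# link survives HTML parsing intact.
link = '<a href="%s">run</a>' % escape(href, quote=True)
print(link)  # <a href="script.cgi?foo=1&amp;param=2">run</a>
```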
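The fix the news item at the top asks for amounts to parsing the media type and its parameters separately, instead of string-comparing the whole header value. A minimal sketch in Python (the function name and example values are mine, not from the article):

```python
def parse_content_type(header):
    """Split a Content-Type value into media type and parameters.

    Handles both the HTTP/1.0 form
        application/x-www-form-urlencoded
    and the HTTP/1.1 form
        application/x-www-form-urlencoded; charset=ISO-8859-1
    """
    media_type, _, rest = header.partition(";")
    params = {}
    for part in rest.split(";"):
        name, sep, value = part.partition("=")
        if sep:  # skip empty fragments when no parameters follow
            params[name.strip().lower()] = value.strip()
    return media_type.strip().lower(), params

# The brittle HTTP/1.0-era check that breaks under HTTP/1.1:
#     if content_type == "application/x-www-form-urlencoded": ...
# The forward-compatible check compares the media type only:
mtype, params = parse_content_type(
    "application/x-www-form-urlencoded; charset=ISO-8859-1")
assert mtype == "application/x-www-form-urlencoded"
assert params.get("charset") == "ISO-8859-1"
```

The same comparison still succeeds on the old parameterless form, which is what makes the script forward-compatible rather than merely patched.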
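Masi's two observations about UTF-8 in URLs, that a Latin-1 umlaut becomes two %xx escapes under UTF-8, and that a server can semi-heuristically test whether a URL's bytes are valid UTF-8 before falling back to the old interpretation, can both be sketched with the standard library (the helper name is mine):

```python
from urllib.parse import quote

# One character, two encodings: Latin-1 'ä' is a single escape the
# old way, but two escapes when encoded as UTF-8.
assert quote("ä", encoding="latin-1") == "%E4"
assert quote("ä", encoding="utf-8") == "%C3%A4"

def looks_like_utf8(raw: bytes) -> bool:
    """The semi-heuristic server-side check: bytes that decode as
    UTF-8 are treated as UTF-8; anything else falls back to the old
    single-byte interpretation."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

assert looks_like_utf8(b"\xc3\xa4")       # UTF-8 'ä'
assert not looks_like_utf8(b"\xe4")       # bare Latin-1 'ä'
```

Note that pure-ASCII bytes also pass the UTF-8 test, which is exactly the backward-compatible case Masi describes: for 7-bit characters the two encodings coincide.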