Bug 319057 - unicode characters in virtual host name
Summary: unicode characters in virtual host name
Status: CLOSED WONTFIX
Alias: None
Product: Jetty
Classification: RT
Component: server
Version: unspecified
Hardware: Macintosh Mac OS X - Carbon (unsup.)
Importance: P5 enhancement
Target Milestone: 7.1.x
Assignee: Jan Bartel CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 343018
 
Reported: 2010-07-06 17:19 EDT by Kjell Tillstrand CLA
Modified: 2015-07-01 01:10 EDT
CC List: 3 users

See Also:


Attachments
tcpdump file (2.16 KB, application/octet-stream)
2010-07-07 05:49 EDT, Kjell Tillstrand CLA
no flags

Description Kjell Tillstrand CLA 2010-07-06 17:19:57 EDT
Build Identifier: jetty-hightide-7.1.5.v20100705

When using Unicode characters in a virtual host name, the server fails to handle requests. I've taken a working webapp and changed the virtual host entry to one with Unicode characters in it. Browsing to the URL, I get a long connecting period before the browser returns its default error page.

Reproducible: Always

Steps to Reproduce:
1. Add a virtual host name with Unicode characters such as the Swedish åäö.
2. Let context deploy. 
3. Browse url.
Comment 1 Jan Bartel CLA 2010-07-06 19:41:03 EDT
Kjell,

Were there any debug log messages on the server side?

Also, can you capture a tcpdump (e.g. with Wireshark) of the request so we can see what bytes were transmitted for the URL?

Note that URLs are really supposed to be ASCII only, with non-ASCII characters percent-encoded. The actual character encoding of those bytes probably depends on your browser settings. So your browser may be using a different encoding, whereas Jetty's default is UTF-8.

So, if you can see that the request is going to be arriving in some other encoding, you can either call request.setCharacterEncoding() before you read any contents, or you can set the  org.eclipse.jetty.util.UrlEncoding.charset system property.
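
To illustrate why the charset matters, here is a minimal Java sketch (not Jetty code) that percent-encodes the same three Swedish letters in UTF-8 and in ISO-8859-1 and then decodes with the wrong and the right charset. The class name UrlCharsetDemo and its helper methods are made up for illustration, and the Charset overloads used here assume Java 10 or later.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UrlCharsetDemo {
    // Percent-encode text using the given charset (Java 10+ overload).
    static String encode(String text, Charset charset) {
        return URLEncoder.encode(text, charset);
    }

    // Percent-decode text, interpreting the decoded bytes in the given charset.
    static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // U+00E5, U+00E4, U+00F6: a-ring, a-umlaut, o-umlaut
        String text = "\u00e5\u00e4\u00f6";

        String utf8 = encode(text, StandardCharsets.UTF_8);
        String latin1 = encode(text, StandardCharsets.ISO_8859_1);
        System.out.println(utf8);   // %C3%A5%C3%A4%C3%B6
        System.out.println(latin1); // %E5%E4%F6

        // Bytes sent as ISO-8859-1 do not survive a UTF-8 decode...
        System.out.println(text.equals(decode(latin1, StandardCharsets.UTF_8)));      // false
        // ...but decode correctly once the matching charset is used.
        System.out.println(text.equals(decode(latin1, StandardCharsets.ISO_8859_1))); // true
    }
}
```

If the trace shows the shorter one-byte-per-letter form, the client is not sending UTF-8, and that is the case where overriding the charset as described above would apply.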

It's probably worthwhile reading this wiki page as background:

http://docs.codehaus.org/display/JETTY/International+Characters+and+Character+Encodings

Finally, if nothing works, capture a Wireshark/tcpdump trace and attach it to this bug report so we can take a look at what is actually on the wire.

thanks
Jan
Comment 2 Greg Wilkins CLA 2010-07-06 20:29:05 EDT
Ah yes!   UTF-8 names are now allowed!

Can you capture a tcpdump or Wireshark trace of the actual bytes being sent by your browser?


thanks
Comment 3 Kjell Tillstrand CLA 2010-07-07 05:49:41 EDT
Created attachment 173628
tcpdump file
Comment 4 Kjell Tillstrand CLA 2010-07-07 05:57:05 EDT
Added a tcpdump file.

Use case:
I browsed to localhost:8080 and got the 404 page showing the list of contexts (1). I note that the context string shown in the browser has the correct format. I then clicked on the context and waited for the browser to time out. I'm using Chrome version 5.0.375.99, but I get a similar response when using Firefox.

I'm guessing it's RFC 3490 support that is failing, or me doing something totally wrong. Is this RFC officially supported?
Comment 6 Jan Bartel CLA 2010-07-13 22:55:45 EDT
Hi Kjell,

Firstly, here are some good links for information on international characters in domain names:

http://www.chromium.org/developers/design-documents/idn-in-google-chrome
http://en.wikipedia.org/wiki/Punycode
http://tools.ietf.org/html/rfc3492
http://unicode.org/faq/idn.html

In a nutshell, when you enter a URL with non-ASCII characters in the hostname, the browser will "punycode" it to an ASCII representation. This ASCII representation must be configured into your DNS service, and also *as the virtual host* for Jetty.

For example, say I have the domain www.åäö.com and I'm running a webapp on port 8080 at context /test. The URL I type into my browser is:

http://www.åäö.com:8080/test/

The browser translates this to the ascii equivalent:

http://www.xn--4cab6c.com:8080/test/
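
The same translation can be reproduced outside the browser. The following is a minimal sketch using the JDK's java.net.IDN class, which implements the RFC 3490 ToASCII and ToUnicode operations; the class name PunycodeDemo and its helper methods are made up for illustration and are not part of Jetty.

```java
import java.net.IDN;

public class PunycodeDemo {
    // Convert a hostname that may contain non-ASCII labels to its ASCII (punycode) form.
    static String toAsciiHost(String host) {
        return IDN.toASCII(host);
    }

    // Convert an ASCII (punycode) hostname back to its Unicode form.
    static String toUnicodeHost(String host) {
        return IDN.toUnicode(host);
    }

    public static void main(String[] args) {
        // The example hostname, written with escapes so the source file encoding does not matter
        // (U+00E5, U+00E4, U+00F6: a-ring, a-umlaut, o-umlaut).
        String unicodeHost = "www.\u00e5\u00e4\u00f6.com";

        String asciiHost = toAsciiHost(unicodeHost);
        System.out.println(asciiHost); // www.xn--4cab6c.com

        // Round trip back to the Unicode form.
        System.out.println(unicodeHost.equals(toUnicodeHost(asciiHost))); // true
    }
}
```

The ASCII form printed here is the value that has to appear in DNS and in the virtual host configuration.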

If www.åäö.com is a virtual host, then I would configure its ASCII equivalent in the context XML file for the context:

<Configure class="org.eclipse.jetty.webapp.WebAppContext">

  <Set name="contextPath">/test</Set>
  <Set name="war"><SystemProperty name="jetty.home" default="."/>/webapps/test.war</Set>

  <Set name="virtualHosts">
    <Array type="String">
      <Item>www.xn--4cab6c.com</Item>
    </Array>
  </Set>
</Configure>

Now, as I have no webapp deployed at /, if I hit http://www.åäö.com:8080/, Jetty's default handler will show me the virtual host www.xn--4cab6c.com. Clicking on the link provided will take me to my webapp.

I think it would be nicer if you could configure the original form of the hostname in the jetty config files, rather than the punycoded form - it's friendlier :) So I'm changing this issue to an enhancement.
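
As a rough idea of what such an enhancement could look like, here is a minimal sketch that normalizes configured host names to their ASCII form once, then compares the incoming Host header against that set. This is not Jetty's implementation: VirtualHostMatcher, normalize and matches are hypothetical names, the port stripping assumes a plain host:port value (no IPv6 literals), and wildcard virtual hosts are not handled.

```java
import java.net.IDN;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public class VirtualHostMatcher {
    // Configured virtual hosts, stored in lower-case ASCII (punycode) form.
    private final Set<String> asciiHosts = new HashSet<>();

    // Accepts host names in either Unicode or punycode form.
    public VirtualHostMatcher(String... configuredHosts) {
        for (String host : configuredHosts) {
            asciiHosts.add(normalize(host));
        }
    }

    // Convert to ASCII via RFC 3490 ToASCII, then lower-case for comparison.
    static String normalize(String host) {
        return IDN.toASCII(host).toLowerCase(Locale.ROOT);
    }

    // Match the value of an HTTP Host header, ignoring an optional :port suffix.
    // Assumption: no IPv6 literals, so the last ':' always introduces the port.
    public boolean matches(String hostHeader) {
        int colon = hostHeader.lastIndexOf(':');
        String host = colon >= 0 ? hostHeader.substring(0, colon) : hostHeader;
        return asciiHosts.contains(normalize(host));
    }

    public static void main(String[] args) {
        // Configure the friendly Unicode form; browsers will send the punycoded form.
        VirtualHostMatcher matcher = new VirtualHostMatcher("www.\u00e5\u00e4\u00f6.com");
        System.out.println(matcher.matches("www.xn--4cab6c.com:8080")); // true
        System.out.println(matcher.matches("www.example.com:8080"));    // false
    }
}
```

Normalizing at configuration time keeps the per-request comparison a plain set lookup, and the punycoded form keeps working unchanged for existing configurations.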

cheers
Jan
Comment 7 Jan Bartel CLA 2015-07-01 01:10:31 EDT
Well this has been open for years as an enhancement and there seems to be zero demand for it. I'm going to close it. If anyone is desperately keen for it, then please reopen and attach your code contribution to implement it :)

Jan