utf-8 (swedish) characters in the URL

Mats Andersson-2

I have tried hard to get this working in Tapestry 5.1, but can't get it
to work 100%.

My intention is to have search parameters in the activation context to
support RESTful URLs for my search results page. This works out of the
box in Tapestry, but since the Tapestry-specific encoding of non-ASCII
characters ($nnnn) is not very user-friendly or search-engine-friendly,
I have replaced the URLEncoder with my own implementation. Currently it
lets the Swedish characters through as is, just like the normal ASCII
characters. This also requires that Tomcat is set up correctly:

<Connector ... URIEncoding="UTF-8"/>

This way the user can enter Swedish characters in the URL, and they are
handled correctly on the server. So far so good.



The problem arises when returning UTF-8 strings from onPassivate(). When
the value arrives in onActivate(), the Swedish characters have all been
replaced by the replacement character U+FFFD (decimal 65533, shown as a
diamond with a question mark), making it impossible to tell what the
original character was. It seems to be
HttpServletRequest.getServletPath(), called from RequestImpl.java, that
causes this.

From the service method of a contributed HttpServletRequestHandler you
can see the results of calling the HttpServletRequest methods:

         request.getServletPath: /searchresults/?vrigt
         request.getRequestURI: /searchresults/%F6vrigt
         service: request.getRequestURL: http://192.168.0.100:8080/searchresults/%F6vrigt
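
A note on the output above: %F6 is the single-byte ISO-8859-1 encoding of ö, whereas the UTF-8 encoding of ö is %C3%B6. A lone 0xF6 byte is not valid UTF-8, so anything that decodes the path as UTF-8 would be expected to produce U+FFFD, which matches the symptom described. The Java sketch below illustrates the mismatch using only the standard java.net classes. The class and method names are made up for illustration, and it does not reproduce Tomcat's or Tapestry's own decoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper, not part of Tapestry or Tomcat.
class EncodingMismatchDemo {

    // Percent-encode a string using the named charset.
    static String encode(String s, String charset) {
        try {
            return URLEncoder.encode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Percent-decode a string, interpreting the bytes in the named charset.
    static String decode(String s, String charset) {
        try {
            return URLDecoder.decode(s, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String word = "\u00f6vrigt"; // "övrigt", escaped to avoid source-encoding issues

        // Same text, two different byte encodings.
        System.out.println(encode(word, "ISO-8859-1")); // %F6vrigt
        System.out.println(encode(word, "UTF-8"));      // %C3%B6vrigt

        // Decoding the Latin-1 form as UTF-8 loses the original character.
        String broken = decode("%F6vrigt", "UTF-8");
        System.out.println((int) broken.charAt(0));     // code point of the first char

        // Decoding the UTF-8 form as UTF-8 round-trips cleanly.
        System.out.println(decode("%C3%B6vrigt", "UTF-8").equals(word));
    }
}
```

If this reading is right, the URL is being percent-encoded as ISO-8859-1 somewhere before it reaches the server, so encoding it explicitly as UTF-8 on the way out should keep both sides consistent.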

Has anyone solved this, or am I doing something that is not supposed to
work in Tapestry 5?


Regards,
Mats



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: utf-8 (swedish) characters in the URL

Martin Strand-4
Hi Mats.
I believe you need to URL-encode those characters before returning them
from your custom URLEncoder.
This is what I'm doing and it works fine in our app: Unicode characters
show up "pretty" in the address field.

/searchresult/övrigt
/searchresult/日本


PrettyURLEncoder.java:

public String encode(String input)
{
   String output = customEncoding(input);
   ...
   // Encode characters before giving the URL to the client...
   return java.net.URLEncoder.encode(output, "UTF-8");
}

public String decode(String input)
{
   ...
   // ...and decode them on their way back
   input = java.net.URLDecoder.decode(input, "UTF-8");
   ...
}
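
For readers who want something they can compile, here is a minimal self-contained sketch of the same encode/decode pairing. It is an illustration, not the actual PrettyURLEncoder: the class name is hypothetical, the elided steps (including customEncoding) are left out entirely, and it does not implement Tapestry's URLEncoder interface.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical stand-in; the elided custom steps from the post are omitted.
class PrettyUrlCodecSketch {

    // Percent-encode as UTF-8 before the URL is handed to the client.
    static String encode(String input) {
        try {
            return URLEncoder.encode(input, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    // Reverse the percent-encoding when the value comes back in a request.
    static String decode(String input) {
        try {
            return URLDecoder.decode(input, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String swedish = "\u00f6vrigt";   // "övrigt"
        String japanese = "\u65e5\u672c"; // "日本"

        System.out.println(encode(swedish));  // %C3%B6vrigt
        System.out.println(encode(japanese)); // %E6%97%A5%E6%9C%AC

        // Both values survive a round trip.
        System.out.println(decode(encode(swedish)).equals(swedish));
        System.out.println(decode(encode(japanese)).equals(japanese));
    }
}
```

One caveat: java.net.URLEncoder performs form encoding, so a space becomes "+" and "/" becomes "%2F". Whether that is acceptable in a path segment depends on what the rest of the custom encoder does.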


We are, however, using Tapestry 5.2 and Jetty; I'm not sure whether that
makes a difference.


Martin


(In reply to Mats Andersson's message of Thu, 13 Jan 2011 12:20:38 +0100.)

Re: utf-8 (swedish) characters in the URL

Mats Andersson-2
Thanks, that did it!

The characters are handled correctly in the app and show up pretty in
Firefox and Google Chrome. MSIE 8 shows the encoded characters in the
URL (%nn%mm), but that is probably not my fault; I see that other sites
have the same issue in MSIE.


Mats



(In reply to Martin Strand's message of 2011-01-13 12:58.)