[LONG] java.net.URI encoding weirdness

  • Thread starter Stanimir Stamenkov

Stanimir Stamenkov

This is a long-standing observation, but I wanted to summarize it and
give a heads-up to those who might not have encountered it yet.

It doesn't appear that java.net.URI behaves in an undocumented way, just
in no useful way. In my experience java.net.URI is only
suitable for parsing certain URI parts, and not for constructing URI
instances, either from the properties of an existing URI or from
values obtained some other way.

My use case is simple: I have an input URI, I want to modify
certain components/properties of it, and produce a new URI. For
example, change the 'host' or 'path' of an HTTP URL.

The first example behaves pretty much as I expect:

import java.net.URI;
import java.net.URLEncoder;

public class URITest {

    public static void main(String[] args) throws Exception {
        System.out.println(URLEncoder
                .encode("#%&/;=?@", "US-ASCII"));

        URI u = URI.create("http://user%40domain@server1:8080"
                + "/path?param=value#fragment");
        System.out.println(u.toASCIIString());

        // Decoded components
        URI v = new URI(u.getScheme(),
                        u.getUserInfo(),
                        "server2",
                        u.getPort(),
                        u.getPath(),
                        u.getQuery(),
                        u.getFragment());
        System.out.println(v.toASCIIString());

        // Raw (already encoded) components
        URI w = new URI(u.getScheme(),
                        u.getRawUserInfo(),
                        "server3",
                        u.getPort(),
                        u.getRawPath(),
                        u.getRawQuery(),
                        u.getRawFragment());
        System.out.println(w.toASCIIString());
    }

}

It tests the behavior of the URI(scheme, userInfo, host, port, path,
query, fragment) constructor, and the output is:

http://user%40domain@server1:8080/path?param=value#fragment
http://user%40domain@server2:8080/path?param=value#fragment
http://user%2540domain@server3:8080/path?param=value#fragment

As I would expect, the 'userInfo' is encoded properly when given as a
decoded value (and double-encoded when given as a raw, already encoded
value). The other properties, in this case, don't make a difference
because their values are the same in raw and decoded form.

----

Now, I expect the URI(scheme, authority, path, query, fragment)
constructor would need a raw 'authority' value as it gets parsed
into 'userInfo', 'host' and 'port' components/properties:

import java.net.URI;

public class URITest2 {

    public static void main(String[] args) throws Exception {
        URI u = URI.create("http://user%40domain@server1:8080"
                + "/path?param=value#fragment");
        System.out.println(u.toASCIIString());

        // Decoded authority
        URI v = new URI(u.getScheme(),
                        u.getAuthority(),
                        "/htap",
                        u.getQuery(),
                        u.getFragment());
        System.out.println(v.toASCIIString());

        // Raw (already encoded) authority
        URI w = new URI(u.getScheme(),
                        u.getRawAuthority(),
                        "/htap",
                        u.getQuery(),
                        u.getFragment());
        System.out.println(w.toASCIIString());
    }

}

The output:

http://user%40domain@server1:8080/path?param=value#fragment
http://user@domain@server1:8080/htap?param=value#fragment
http://user%2540domain@server1:8080/htap?param=value#fragment

shows there's no way to re-construct a correct URI using it.

----

The constructor URI(str) is not particularly interesting as it
parses the complete URI string, and I've further tried the simpler
URI(scheme, ssp, fragment) one:

import java.net.URI;

public class URITest2a {

    public static void main(String[] args) throws Exception {
        URI u = URI.create("http://user%40domain@server1:8080"
                + "/path?param=value#frag%23ment");
        System.out.println(u.toASCIIString());

        // Decoded scheme-specific part, decoded fragment
        URI v = new URI(u.getScheme(),
                        u.getSchemeSpecificPart(),
                        u.getFragment());
        System.out.println(v.toASCIIString());

        // Raw scheme-specific part, raw fragment
        URI w = new URI(u.getScheme(),
                        u.getRawSchemeSpecificPart(),
                        u.getRawFragment());
        System.out.println(w.toASCIIString());

        // Raw scheme-specific part, decoded fragment
        URI x = new URI(u.getScheme(),
                        u.getRawSchemeSpecificPart(),
                        u.getFragment());
        System.out.println(x.toASCIIString());
    }

}

The output:

http://user%40domain@server1:8080/path?param=value#frag%23ment
http://user@domain@server1:8080/path?param=value#frag%23ment
http://user%2540domain@server1:8080/path?param=value#frag%2523ment
http://user%2540domain@server1:8080/path?param=value#frag%23ment

shows the 'fragment' is properly encoded when given decoded, but
neither the 'rawSchemeSpecificPart' nor the decoded
'schemeSpecificPart' yields a correct new URI.

----

It becomes even funnier when dealing with 'path' and 'query'
components which contain special URI characters (back to using the
"most specific" constructor from the first example):

import java.net.URI;

public class URITest3 {

    public static void main(String[] args) throws Exception {
        URI u = URI.create("http://server1/path"
                + "?param%3D1=value%261&param%3F2=value%232"
                + "#fragment");
        System.out.println(u.toASCIIString());

        // Decoded components
        URI v = new URI(u.getScheme(),
                        u.getUserInfo(),
                        "server2",
                        u.getPort(),
                        u.getPath(),
                        u.getQuery(),
                        u.getFragment());
        System.out.println(v.toASCIIString());

        // Raw (already encoded) components
        URI w = new URI(u.getScheme(),
                        u.getRawUserInfo(),
                        "server3",
                        u.getPort(),
                        u.getRawPath(),
                        u.getRawQuery(),
                        u.getRawFragment());
        System.out.println(w.toASCIIString());
    }

}

Output:

http://server1/path?param%3D1=value%261&param%3F2=value%232#fragment
http://server2/path?param=1=value&1&param?2=value%232#fragment
http://server3/path?param%253D1=value%25261&param%253F2=value%25232#fragment

The query part gets damaged either way.
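
To illustrate why the decoded query cannot be fed back safely, here is a minimal sketch (the class name AmbiguousQuery and the sample URIs are made up for illustration). Two raw queries with different structure decode to the same getQuery() string, so the information needed to re-encode correctly is already gone:

```java
import java.net.URI;

public class AmbiguousQuery {

    // Returns the decoded query component, as URI.getQuery() reports it.
    public static String decodedQuery(String uri) {
        return URI.create(uri).getQuery();
    }

    public static void main(String[] args) {
        // One parameter whose name is "a&b".
        String one = "http://server1/path?a%26b=c";
        // Two parameters: "a" (no value) and "b".
        String two = "http://server1/path?a&b=c";

        System.out.println(decodedQuery(one));
        System.out.println(decodedQuery(two));
        // Both decode to the same string, so the structure is lost.
        System.out.println(decodedQuery(one).equals(decodedQuery(two)));
    }
}
```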

----

The only way to construct a proper URI, changing just certain
components of a source URI, seems to be to construct it manually:

import java.net.URI;

public class URITest4 {

    public static void main(String[] args) throws Exception {
        URI u = URI.create("http://server1/path"
                + "?param%3D1=value%261&param%3F2=value%232"
                + "#fragment");
        System.out.println(u.toASCIIString());

        // Copy the raw components so nothing gets re-encoded
        StringBuilder v = new StringBuilder();
        v.append(u.getScheme()).append("://");
        if (u.getRawUserInfo() != null) {
            v.append(u.getRawUserInfo()).append('@');
        }
        v.append(u.getHost());
        if (u.getPort() != -1) {
            v.append(':').append(u.getPort());
        }

        v.append("/pat2"); // Replace path

        if (u.getRawQuery() != null) {
            v.append('?').append(u.getRawQuery());
        }

        if (u.getRawFragment() != null) {
            v.append('#').append(u.getRawFragment());
        }

        System.out.println(v);
    }

}

I think all this mess is caused by the URI constructors blindly
encoding special URI characters in given 'path', 'query' etc.
without considering the context, and you probably shouldn't be using
the java.net.URI constructors for any serious work.
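
For what it's worth, the manual approach can be wrapped in a small helper. This is only a sketch under assumptions: the names RawUriRebuild and withRawPath are hypothetical, the input is assumed to be a hierarchical URI with an authority, and the new path is assumed to be percent-encoded already. It copies the raw components and lets the single-argument parser do the rest:

```java
import java.net.URI;

public class RawUriRebuild {

    // Hypothetical helper: returns a copy of 'u' with its path replaced.
    // 'rawPath' must already be percent-encoded; nothing is re-encoded here.
    public static URI withRawPath(URI u, String rawPath) {
        StringBuilder sb = new StringBuilder();
        sb.append(u.getScheme()).append("://");
        // getRawAuthority() keeps userInfo, host and port exactly as parsed.
        sb.append(u.getRawAuthority());
        sb.append(rawPath);
        if (u.getRawQuery() != null) {
            sb.append('?').append(u.getRawQuery());
        }
        if (u.getRawFragment() != null) {
            sb.append('#').append(u.getRawFragment());
        }
        // The single-argument parser leaves existing escapes as they are.
        return URI.create(sb.toString());
    }

    public static void main(String[] args) {
        URI u = URI.create("http://user%40domain@server1:8080/path"
                + "?param%3D1=value%261&param%3F2=value%232#fragment");
        System.out.println(withRawPath(u, "/pat2").toASCIIString());
    }
}
```

The price is the extra parse on every construction, but the escapes in the untouched components survive unchanged.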

Do you think Oracle should reconsider the java.net.URI
implementation so it becomes more useful? What alternatives to
java.net.URI are you aware of (maybe something like
javax.ws.rs.core.UriBuilder) for this kind of
manipulation/construction use case?
 

Mike Amling

... TL;DR

Looks like you need

URI v = new URI(
        u.getScheme(),
        u.getAuthority().someKindOfEncodeFunction(),
        "/htap",
        u.getQuery(),
        u.getFragment());


Mike Amling
 

Mike Amling

Oh, no, that won't work, either. Sorry.

Mike Amling
 

Stanimir Stamenkov

Mon, 05 May 2014 11:37:11 -0500, /Mike Amling/:
Looks like you need

URI v = new URI(
        u.getScheme(),
        u.getAuthority().someKindOfEncodeFunction(),
        "/htap",
        u.getQuery(),
        u.getFragment());

What kind of encoding function would one need to construct the
following URI:

http://user%40domain@server1:8080/htap?param=value#fragment

?

If you take a closer look, I've also used:

URI w = new URI(u.getScheme(),
                u.getRawAuthority(),
                "/htap",
                u.getQuery(),
                u.getFragment());

where the 'rawAuthority' is already encoded as:

user%40domain@server1:8080

but then the result is:

http://user%2540domain@server1:8080/htap?param=value#fragment
 

markspace

I think all this mess is caused by the URI constructors blindly encoding
special URI characters in given 'path', 'query' etc. without considering
the context, and you probably shouldn't be using the java.net.URI
constructors for any serious work.

*cough* java.net.URLDecoder.
 

Stanimir Stamenkov

Mon, 05 May 2014 11:58:29 -0700, /markspace/:
On 5/5/2014 6:11 AM, Stanimir Stamenkov wrote:

I think all this mess is caused by the URI constructors blindly encoding
special URI characters in given 'path', 'query' etc. without considering
the context, and you probably shouldn't be using the java.net.URI
constructors for any serious work.

*cough* java.net.URLDecoder.

I don't really get whether that is meant as a solution, but have you
noticed that the URI constructors produce wrong (or at least
unexpected) results with both encoded and decoded values?
 

markspace

Mon, 05 May 2014 11:58:29 -0700, /markspace/:
I don't really get if it is meant as a solution, but have you noticed
the URI constructors produce wrong (at least unexpected) results with
encoded and decoded values?

I did not notice that, no. All of your ctors had encoded values. None
were unencoded.

I haven't researched this fully, but a quick test indicated it does work.
 

Stanimir Stamenkov

Mon, 05 May 2014 16:51:29 -0700, /markspace/:
I did not notice that, no. All of your ctors had encoded values.
None were unencoded.

I haven't researched this fully, but a quick test indicated it does
work.

Could you share your exact test? And no, my tests didn't use all
encoded values – here's one of the examples, once again:

public class URITest3 {

    public static void main(String[] args) throws Exception {
        URI u = URI.create("http://server1/path"
                + "?param%3D1=value%261&param%3F2=value%232"
                + "#fragment");
        System.out.println(u.toASCIIString());

        // Decoded components
        URI v = new URI(u.getScheme(),
                        u.getUserInfo(),
                        "server2",
                        u.getPort(),
                        u.getPath(),
                        u.getQuery(),
                        u.getFragment());
        System.out.println(v.toASCIIString());

        // Raw (already encoded) components
        URI w = new URI(u.getScheme(),
                        u.getRawUserInfo(),
                        "server3",
                        u.getPort(),
                        u.getRawPath(),
                        u.getRawQuery(),
                        u.getRawFragment());
        System.out.println(w.toASCIIString());
    }

}

The original URI 'u' is:

http://server1/path?param%3D1=value%261&param%3F2=value%232#fragment

Then the URI 'v' which is constructed with all decoded arguments
becomes:

http://server2/path?param=1=value&1&param?2=value%232#fragment

Note how the 'query' is wrong now.

However, the URI 'w' which uses the encoded arguments is also wrong:

http://server3/path?param%253D1=value%25261&param%253F2=value%25232#fragment
 

Steven Simpson

The original URI 'u' is:

http://server1/path?param%3D1=value%261&param%3F2=value%232#fragment

Then the URI 'v' which is constructed with all decoded arguments becomes:

http://server2/path?param=1=value&1&param?2=value%232#fragment

Note how the 'query' is wrong now.

However, the URI 'w' which uses the encoded arguments is also wrong:

http://server3/path?param%253D1=value%25261&param%253F2=value%25232#fragment

This rings a bell. I was finding URI's multi-arg constructors useless.
Since the query string must be submitted as a unit, it must already be
using & and =, and any special characters in it that are to be taken
literally must already be encoded, yet the constructor treats it as
unencoded, and adds another level of encoding. That also goes for your
userInfo component with an @ in it to separate user and domain.

I think the solution for me was just to use URLEncoder, concatenate the
strings, then pass to URI.create().
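
A minimal sketch of that approach, assuming UTF-8 and with hypothetical names (EncodedQueryBuilder, enc, withQuery). Keep in mind that URLEncoder produces application/x-www-form-urlencoded output, so a space becomes '+' rather than %20:

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodedQueryBuilder {

    // Encodes a single parameter name or value (form encoding, UTF-8).
    public static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Hypothetical helper: encodes each name and value separately, joins
    // them with '=' and '&', and hands the finished string to URI.create().
    public static URI withQuery(String base, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(base);
        char sep = '?';
        for (Map.Entry<String, String> e : params.entrySet()) {
            sb.append(sep)
              .append(enc(e.getKey()))
              .append('=')
              .append(enc(e.getValue()));
            sep = '&';
        }
        return URI.create(sb.toString());
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("param=1", "value&1");
        params.put("param?2", "value#2");
        System.out.println(withQuery("http://server1/path", params));
    }
}
```

Because the delimiters are added after each piece is encoded, the literal '=' and '&' inside names and values stay escaped.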
 

Stanimir Stamenkov

Tue, 06 May 2014 08:47:09 +0100, /Steven Simpson/:
This rings a bell. I was finding URI's multi-arg constructors
useless. Since the query string must be submitted as a unit, it must
already be using & and =, and any special characters in it that are
to be taken literally must already be encoded, yet the constructor
treats it as unencoded, and adds another level of encoding. That
also goes for your userInfo component with an @ in it to separate
user and domain.

I think the solution for me was just to use URLEncoder, concatenate
the strings, then pass to URI.create().

Yeah, that was what I was trying to avoid – the manual URI
construction/concatenation, and then the overhead of parsing (for
every construction). This appears the most natural usage of the URI
constructors to me – having the different components/properties
already parsed.
 

markspace

public static void main(String[] args) throws Exception {
    URI u = URI.create("http://server1/path"
            + "?param%3D1=value%261&param%3F2=value%232"
                     ^^^       ^^^       ^^^       ^^^
            + "#fragment");

Note how the 'query' is wrong now.

Doesn't each % above indicate an encoded value, or are you referring to
something else? I'm not sure we're not talking cross purposes here.

URI u = URI.create( decode("http://server1/path"
        + "?param%3D1=value%261&param%3F2=value%242"
        + "#fragment"));
System.out.println(u.toASCIIString());

run:
http://server1/path?param=1=value&1&param?2=value$2#fragment
BUILD SUCCESSFUL (total time: 1 second)
 

Stanimir Stamenkov

Tue, 06 May 2014 10:26:56 +0300, /markspace/:
public static void main(String[] args) throws Exception {
    URI u = URI.create("http://server1/path"
            + "?param%3D1=value%261&param%3F2=value%232"
                     ^^^       ^^^       ^^^       ^^^
            + "#fragment");

Note how the 'query' is wrong now.

Doesn't each % above indicate an encoded value, or are you referring
to something else?

You're missing that I go on to use:

URI v = new URI(u.getScheme(),
                u.getUserInfo(),
                "server2",
                u.getPort(),
                u.getPath(),
                u.getQuery(),
                u.getFragment());

The 'query' property value is already decoded.

I'm not sure we're not talking cross purposes here.

URI u = URI.create( decode("http://server1/path"
        + "?param%3D1=value%261&param%3F2=value%242"
        + "#fragment"));
System.out.println(u.toASCIIString());

run:
http://server1/path?param=1=value&1&param?2=value$2#fragment
BUILD SUCCESSFUL (total time: 1 second)

The URI.create(String) factory and the URI(String) constructor perform
complete parsing, and they are working fine. Note that, after decoding,
the result of your example is already wrong, different from the
original URI:

http://server1/path?param%3D1=value%261&param%3F2=value%242#fragment

There are two parameters (decoded names):

param=1
param?2

and their (decoded) values are:

value&1
value#2

What would be the query parameters in your result:

http://server1/path?param=1=value&1&param?2=value$2#fragment

?
 

markspace

public static void main(String[] args) throws Exception {
    URI u = URI.create("http://server1/path"
            + "?param%3D1=value%261&param%3F2=value%232"
            + "#fragment");

Note how the 'query' is wrong now.


OK, I think I see what is going on. Most of your OP was frankly wrong,
however this one represents an actual bug in the URI parsing:


public class URITest3 {

    public static void main(String[] args) throws Exception {
        URI u = URI.create("http://server1/path"
                + "?param%3D1=value%261&param%3F2=value%232"
                + "#fragment");
        System.out.println(u.toASCIIString());

        URI v = new URI(u.getScheme(),
                        u.getUserInfo(),
                        "server2",
                        u.getPort(),
                        u.getPath(),
                        u.getQuery(),
                        u.getFragment());
        System.out.println(v.toASCIIString());

        // "raw" is user error, omitted.
    }

}

Output:

http://server1/path?param%3D1=value%261&param%3F2=value%232#fragment
http://server2/path?param=1=value&1&param?2=value%232#fragment

Yes clearly the query part was not parsed correctly. It's ambiguous and
can't be parsed. (Incidentally, '=' and '&' weren't parsed correctly,
but I think '?' was, and was incorrectly escaped in your original
string. Double check with the spec on that.)

To fix this, Oracle should at minimum deprecate this ctor. It's plainly
broken.

Oracle should consider replacing it with a ctor that allows the query to
be parsed correctly:

public URI( String scheme, String userInfo, String host,
            int port, String path, String fragment,
            String[] ... paramValue ) ...

where paramValue is a vararg of pairs of strings which can then be
parsed and assembled correctly

new URI( ...,
         new String[] {"param=1","value&1"},
         new String[] {"param?2","value#2"} );

However, this is getting a bit complicated, so I think Oracle should
consider adding a builder class to build URIs a little more
conveniently. They should consider adding both parsed (like URI's
current multi-argument ctors) and raw (already quoted) variants for easy
"matching" of existing URIs. I think they should also make public some
parts of the Parser (currently a private class inside URI) and make at
least the various quoting methods available to developers; URLEncoder
looks ill-suited for this purpose.

public class URI {
    public static quoteScheme( .. ){...
    public static quoteUserInfo( .. ){...
    public static quoteHost( .. ){...
    public static quotePath( .. ){...
    ... etc.

public class Builder {
    public Builder() {...
    public Builder setScheme(...
    public Builder setHost(...
    public Builder setPath(...
    public Builder setRawPath(...
    public Builder addParameter( String param, String value )..
    public Builder changeParameter( String param, String value )..
    public Builder addRawParameter( String paramValue)..
    public Builder setFragment(...
    public Builder setRawFragment(...
    public Builder clearParams()...
    public Builder clearAll()...
    ... etc. as needed
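
As a rough illustration of what such per-component quoting methods might look like, here is a sketch. Nothing in it exists in the JDK: the class ComponentQuoting and its methods are hypothetical, and the sets of characters left unescaped are my reading of RFC 3986 (unreserved characters plus a component-specific selection of sub-delims), so treat them as assumptions:

```java
import java.nio.charset.StandardCharsets;

public class ComponentQuoting {

    private static final String UNRESERVED =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";

    // Percent-encodes every UTF-8 byte that is neither unreserved
    // nor listed in 'extra'.
    public static String quote(String s, String extra) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            char c = (char) v;
            if (v < 0x80 && (UNRESERVED.indexOf(c) >= 0 || extra.indexOf(c) >= 0)) {
                sb.append(c);
            } else {
                sb.append(String.format("%%%02X", v));
            }
        }
        return sb.toString();
    }

    // One path segment: '/' must be escaped so it is not read as a separator.
    public static String quotePathSegment(String segment) {
        return quote(segment, "!$&'()*+,;=:@");
    }

    // One query parameter name or value: '&', '=' and '+' must be escaped.
    public static String quoteQueryParam(String s) {
        return quote(s, "!$'()*,;:@/?");
    }

    public static void main(String[] args) {
        System.out.println(quotePathSegment("a/b"));
        System.out.println(quoteQueryParam("param=1") + "="
                + quoteQueryParam("value&1"));
        System.out.println(quoteQueryParam("value#2"));
    }
}
```

Note that under these assumptions '?' is left alone inside a query parameter, which matches the remark above that it did not need escaping there.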

You should double check your test cases and then submit a bug report to
the OpenJDK project (I have better luck communicating with them than
Oracle's bug reports page). That's about the best I can do right now.

To fix an immediate problem, I'm pretty sure all your other test cases
were wrong, so you should use test case 3 above as a template and you'll
have to deal with parsing the query string yourself, unfortunately.
 

Steven Simpson

public static void main(String[] args) throws Exception {
    URI u = URI.create("http://server1/path"
            + "?param%3D1=value%261&param%3F2=value%232"
                     ^^^       ^^^       ^^^       ^^^
            + "#fragment");

Note how the 'query' is wrong now.

Doesn't each % above indicate an encoded value, or are you referring
to something else? I'm not sure we're not talking cross purposes here.


The name of the first parameter is "param=1", and its value is "value&1".

The name of the second parameter is "param?2", with the value "value$2".

Because = and & are used to delimit parameters in the query string, the
literals in these parameter names and values have to be encoded by the
user before going into the string.

URI u = URI.create( decode("http://server1/path"
        + "?param%3D1=value%261&param%3F2=value%242"
        + "#fragment"));
System.out.println(u.toASCIIString());

run:
http://server1/path?param=1=value&1&param?2=value$2#fragment

This query string no longer matches the intended input.

Why decode() before passing into create()? The URI class needs to parse
the string before anything gets decoded.

import java.net.URI;

public class URITest4 {
    public static void main(String[] args) throws Exception {
        URI u = URI.create("http://server1/path"
                + "?param%3D1=value%261&param%3F2=value%232"
                + "#fragment");
        System.out.println(u.toASCIIString());

        // Compare the decoded and the raw form of the query
        System.out.println("    Query: " + u.getQuery());
        System.out.println("Raw query: " + u.getRawQuery());
    }
}

% java URITest4
http://server1/path?param%3D1=value%261&param%3F2=value%232#fragment
    Query: param=1=value&1&param?2=value#2
Raw query: param%3D1=value%261&param%3F2=value%232

This shows that getQuery() is not useful, as it decodes too soon. The
value must be split at & first, then at =, then the names and values
should be decoded. This is why v is wrong in URITest3.

w is wrong in URITest3 because, although getRawQuery()'s correct value
is provided, the URI constructor incorrectly encodes it again.

I guess the problem stems from java.net.URI only partially parsing some
components. For those components that have no further structure, it's
okay to decode. But the query string has more structure, which must be
parsed before decoding. The same goes for userInfo() to some extent,
since : is a special character in it, which the user might want to use
literally.

Correspondingly, the parts which are externally assembled should not be
encoded by multi-arg URI constructors, because the caller will already
have had to do that.
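
The parsing order described above (split at '&', then at '=', then decode) can be sketched like this. The class and method names are hypothetical, repeated parameter names simply keep the last value, and URLDecoder is used for decoding, which also turns '+' into a space (form-encoding semantics):

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class RawQueryParser {

    // Decodes a single name or value (form decoding, UTF-8).
    public static String dec(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    // Splits the RAW query at '&', each pair at its first '=',
    // and only then decodes the pieces.
    public static Map<String, String> parse(URI u) {
        Map<String, String> params = new LinkedHashMap<>();
        String raw = u.getRawQuery();
        if (raw == null || raw.isEmpty()) {
            return params;
        }
        for (String pair : raw.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(dec(name), dec(value));
        }
        return params;
    }

    public static void main(String[] args) {
        URI u = URI.create("http://server1/path"
                + "?param%3D1=value%261&param%3F2=value%232#fragment");
        for (Map.Entry<String, String> e : parse(u).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```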
 

Stanimir Stamenkov

Tue, 06 May 2014 12:24:44 -0700, /markspace/:
OK, I think I see what is going on. Most of your OP was frankly
wrong, however this one represents an actual bug in the URI parsing:
[...]
// "raw" is user error, omitted.

I was just demonstrating that it is impossible to get a correct result
either way.

Oracle should consider replacing it with a ctor that allows the
query to be parsed correctly:

public URI( String scheme, String userInfo, String host,
            int port, String path, String fragment,
            String[] ... paramValue ) ...
[...]

It's not just about the query parameters. It could be that the path
contains percent-encoded characters such as / (slash) and ;
(semicolon), which should not be decoded before being dereferenced
(for example, before parsing matrix parameters).

http://tools.ietf.org/html/rfc3986#section-2.4
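
A small sketch of the path case (the class name and the sample URI are made up for illustration): an escaped slash inside a segment is indistinguishable from a real separator once getPath() has decoded it.

```java
import java.net.URI;

public class EncodedSlashDemo {

    // Counts the non-empty segments obtained by splitting at '/'.
    public static int segmentCount(String path) {
        int count = 0;
        for (String segment : path.split("/")) {
            if (!segment.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Two segments: "a/b" (with an escaped slash) and "c".
        URI u = URI.create("http://server1/a%2Fb/c");

        System.out.println(u.getRawPath() + " has "
                + segmentCount(u.getRawPath()) + " segments");
        // After decoding, the escaped slash looks like a separator.
        System.out.println(u.getPath() + " has "
                + segmentCount(u.getPath()) + " segments");
    }
}
```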
 

markspace

The name of the first parameter is "param=1", and its value is "value&1".

The name of the second parameter is "param?2", with the value "value$2".

Yes, mistake on my part. I made a quick test that appeared to work, so
I posted it. Mea culpa.

% java URITest4
http://server1/path?param%3D1=value%261&param%3F2=value%232#fragment
    Query: param=1=value&1&param?2=value#2
Raw query: param%3D1=value%261&param%3F2=value%232

This shows that getQuery() is not useful, as it decodes too soon. The

Yup, see my post about 5 minutes just before yours. Clearly that's the
problem, the query can't possibly be parsed as a single long string.
It's a bug.
 

markspace

It's not just about the query parameters. It could be the path contains
percent encoded characters such as / (slash) and ; (semicolon) which
should not be decoded before being dereferenced (like before parsing
matrix parameters).

I'm still not 100% sure what you're saying, or if it's correct. The
multi-argument ctors for URI specifically say that they will encode any
illegal character, including '%'. The single-argument ctor says it
does not, and expects all URIs to be already properly quoted. That's
the spec for the URI class. Read the docs.
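
That documented difference is easy to see in isolation. A minimal sketch (class and method names are hypothetical) that passes the same path text through a multi-argument constructor and through the single-argument parser:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PercentHandling {

    // Multi-argument form: the path is treated as unencoded text,
    // so '%' itself gets quoted as %25.
    public static String viaComponents(String path) {
        try {
            return new URI("http", "server1", path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Single-argument form: the path is parsed as already-encoded text.
    public static String viaString(String rawPath) {
        return URI.create("http://server1" + rawPath).toASCIIString();
    }

    public static void main(String[] args) {
        System.out.println(viaComponents("/a b"));    // space is quoted once
        System.out.println(viaComponents("/a%20b"));  // '%' is quoted again
        System.out.println(viaString("/a%20b"));      // escape is kept as is
    }
}
```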

The query part of the multi-argument ctor doesn't encode all query
strings correctly. That I feel is a bug. It ought to at least throw an
error if it gets something ambiguous.

If there are other multi-argument ctors for the URI class that also
don't quote characters correctly, you ought to file separate bug reports
for those as well. I'm not 100% convinced this is true, but I'm not
really going to pursue it either. Your use of the 'raw' methods passed
to the multi-argument ctors is definitely wrong though, you really
aren't reading the docs there.

Good luck with your bug reports.
 

Stanimir Stamenkov

Tue, 06 May 2014 12:24:44 -0700, /markspace/:
URI v = new URI(u.getScheme(),
                u.getUserInfo(),
                "server2",
                u.getPort(),
                u.getPath(),
                u.getQuery(),
                u.getFragment());
[...]
To fix this, Oracle should at minimum deprecate this ctor. It's
plainly broken.

Not just this one, but all except URI(String) are broken (that's
what I've demonstrated in my OP). As they don't work correctly
with almost any data that needs to be percent-encoded, I think
they should just accept encoded data and not perform additional
encoding, apart from encoding the separator characters which
distinguish them from the other components.

Again, here's the relevant "When to Encode or Decode" section from
the "URI: Generic Syntax" specification:

http://tools.ietf.org/html/rfc3986#section-2.4

The URI class may continue to provide decoded component values for
the cases one doesn't need to parse further into them, but then the
safest way would be for the constructors to accept encoded values.
All of:

schemeSpecificPart
authority
path
query

should not be decoded before being actually dereferenced.
 

Stanimir Stamenkov

Tue, 06 May 2014 13:04:22 -0700, /markspace/:
I'm still not sure 100% what you're saying, or if it's correct. The
multi-argument ctors for URI specifically say that they will encode
any illegal character, including '%'.

I've already stated in my OP that it is not that the URI class behaves
in an undocumented (or, in this regard, wrong) way – just in no
practically useful way.
 
