Stanimir Stamenkov
This is a long-standing observation, but I wanted to summarize it and
give a heads-up to those who might not have encountered it yet.
java.net.URI doesn't appear to behave in an undocumented way, just
in a way that isn't useful. In my experience java.net.URI is only
suitable for parsing out certain URI parts, and not for constructing
URI instances, whether from the properties of an existing URI or from
values obtained some other way.
My use case is simple: I have an input URI, I want to modify certain
components/properties of it, and produce a new URI. For example,
change the 'host' or 'path' of an HTTP URL.
The first example behaves pretty much as I expect:
import java.net.URI;
import java.net.URLEncoder;

public class URITest {
    public static void main(String[] args) throws Exception {
        // For reference: percent-encoded form of some special characters
        System.out.println(URLEncoder
                .encode("#%&/;=?@", "US-ASCII"));
        URI u = URI.create("http://user%40domain@server1:8080"
                + "/path?param=value#fragment");
        System.out.println(u.toASCIIString());
        // Rebuild from the decoded components, changing the host
        URI v = new URI(u.getScheme(),
                        u.getUserInfo(),
                        "server2",
                        u.getPort(),
                        u.getPath(),
                        u.getQuery(),
                        u.getFragment());
        System.out.println(v.toASCIIString());
        // Rebuild from the raw (still encoded) components
        URI w = new URI(u.getScheme(),
                        u.getRawUserInfo(),
                        "server3",
                        u.getPort(),
                        u.getRawPath(),
                        u.getRawQuery(),
                        u.getRawFragment());
        System.out.println(w.toASCIIString());
    }
}
It tests the behavior of the URI(scheme, userInfo, host, port, path,
query, fragment) constructor, and the output is:
http://user%40domain@server1:8080/path?param=value#fragment
http://user%40domain@server2:8080/path?param=value#fragment
http://user%2540domain@server3:8080/path?param=value#fragment
As I would expect, the 'userInfo' is encoded properly when given as a
decoded value (and double-encoded when given as a raw, already encoded
value). The other properties make no difference in this case,
because their values are the same in raw and decoded form.
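The double encoding follows from a rule stated in the java.net.URI class documentation: the multi-argument constructors always quote the percent character. A minimal sketch of just that effect (the class and method names here are made up for illustration and are not part of the test programs above):

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, not one of the original test programs.
final class UserInfoQuoting {

    // Builds an HTTP URI around the given userInfo with the
    // seven-argument constructor and returns its ASCII form.
    static String build(String userInfo) {
        try {
            return new URI("http", userInfo, "server3", 8080,
                    "/path", null, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Decoded input: the '@' is quoted once, as %40.
        System.out.println(build("user@domain"));
        // Raw input: the '%' itself is quoted, so %40 becomes %2540.
        System.out.println(build("user%40domain"));
    }
}
```

So the constructor has no way to tell "this is already encoded" from "this is a literal percent sign"; it assumes the latter.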
----
Now, I expect the URI(scheme, authority, path, query, fragment)
constructor would need a raw 'authority' value as it gets parsed
into 'userInfo', 'host' and 'port' components/properties:
import java.net.URI;

public class URITest2 {
    public static void main(String[] args) throws Exception {
        URI u = URI.create("http://user%40domain@server1:8080"
                + "/path?param=value#fragment");
        System.out.println(u.toASCIIString());
        // Decoded authority, new path
        URI v = new URI(u.getScheme(),
                        u.getAuthority(),
                        "/htap",
                        u.getQuery(),
                        u.getFragment());
        System.out.println(v.toASCIIString());
        // Raw authority, new path
        URI w = new URI(u.getScheme(),
                        u.getRawAuthority(),
                        "/htap",
                        u.getQuery(),
                        u.getFragment());
        System.out.println(w.toASCIIString());
    }
}
The output:
http://user%40domain@server1:8080/path?param=value#fragment
http://user@domain@server1:8080/htap?param=value#fragment
http://user%2540domain@server1:8080/htap?param=value#fragment
shows there's no way to re-construct a correct URI using it.
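To make the damage more visible than the printed strings do, one can look at how the rebuilt URIs parse. The sketch below is only an illustration (the class and method names are hypothetical); it assumes the usual JDK behavior where an authority that cannot be parsed as user-info, host and port is kept as an opaque registry-based authority.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, not one of the original test programs.
final class AuthorityReparse {

    // Rebuilds a URI around the given authority string using the
    // five-argument (authority-based) constructor.
    static URI rebuild(String authority) {
        try {
            return new URI("http", authority, "/htap",
                    "param=value", "fragment");
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Decoded authority: two '@' signs, so it no longer parses
        // as a server-based authority; host and userInfo are null.
        URI v = rebuild("user@domain@server1:8080");
        System.out.println(v.getAuthority());
        System.out.println(v.getUserInfo());
        System.out.println(v.getHost());

        // Raw authority: parses fine, but the userInfo is now
        // double-encoded (user%2540domain).
        URI w = rebuild("user%40domain@server1:8080");
        System.out.println(w.getRawUserInfo());
        System.out.println(w.getHost());
    }
}
```

Either way the result differs from the source URI: the first loses the host and user-info components, the second changes the user-info value.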
----
The constructor URI(str) is not particularly interesting as it
parses the complete URI string, and I've further tried the simpler
URI(scheme, ssp, fragment) one:
import java.net.URI;

public class URITest2a {
    public static void main(String[] args) throws Exception {
        URI u = URI.create("http://user%40domain@server1:8080"
                + "/path?param=value#frag%23ment");
        System.out.println(u.toASCIIString());
        // Decoded scheme-specific part, decoded fragment
        URI v = new URI(u.getScheme(),
                        u.getSchemeSpecificPart(),
                        u.getFragment());
        System.out.println(v.toASCIIString());
        // Raw scheme-specific part, raw fragment
        URI w = new URI(u.getScheme(),
                        u.getRawSchemeSpecificPart(),
                        u.getRawFragment());
        System.out.println(w.toASCIIString());
        // Raw scheme-specific part, decoded fragment
        URI x = new URI(u.getScheme(),
                        u.getRawSchemeSpecificPart(),
                        u.getFragment());
        System.out.println(x.toASCIIString());
    }
}
The output:
http://user%40domain@server1:8080/path?param=value#frag%23ment
http://user@domain@server1:8080/path?param=value#frag%23ment
http://user%2540domain@server1:8080/path?param=value#frag%2523ment
http://user%2540domain@server1:8080/path?param=value#frag%23ment
shows the 'fragment' is properly encoded when given decoded, but
neither the 'rawSchemeSpecificPart' nor the decoded
'schemeSpecificPart' yields a correct new URI.
----
It becomes even funnier when dealing with 'path' and 'query'
components which contain special URI characters (back to using the
"most specific" constructor from the first example):
import java.net.URI;

public class URITest3 {
    public static void main(String[] args) throws Exception {
        URI u = URI.create("http://server1/path"
                + "?param%3D1=value%261&param%3F2=value%232"
                + "#fragment");
        System.out.println(u.toASCIIString());
        // Rebuild from the decoded components, changing the host
        URI v = new URI(u.getScheme(),
                        u.getUserInfo(),
                        "server2",
                        u.getPort(),
                        u.getPath(),
                        u.getQuery(),
                        u.getFragment());
        System.out.println(v.toASCIIString());
        // Rebuild from the raw (still encoded) components
        URI w = new URI(u.getScheme(),
                        u.getRawUserInfo(),
                        "server3",
                        u.getPort(),
                        u.getRawPath(),
                        u.getRawQuery(),
                        u.getRawFragment());
        System.out.println(w.toASCIIString());
    }
}
Output:
http://server1/path?param%3D1=value%261&param%3F2=value%232#fragment
http://server2/path?param=1=value&1&param?2=value%232#fragment
http://server3/path?param%253D1=value%25261&param%253F2=value%25232#fragment
The query part gets damaged either way.
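The two failure modes can be isolated by comparing raw query strings directly. This is just a sketch with hypothetical class and method names; it rebuilds the same source URI twice, once passing the decoded query and once the raw one.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class, not one of the original test programs.
final class QueryRoundTrip {

    // Rebuilds u with the given query via the seven-argument
    // constructor and returns the raw query of the result.
    private static String rawQueryAfterRebuild(URI u, String query) {
        try {
            return new URI(u.getScheme(), u.getUserInfo(), u.getHost(),
                    u.getPort(), u.getPath(), query,
                    u.getFragment()).getRawQuery();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Passes the decoded query: %3D, %26 and %3F come back as literal
    // '=', '&' and '?', which changes the parameter structure.
    static String viaDecoded(String uri) {
        URI u = URI.create(uri);
        return rawQueryAfterRebuild(u, u.getQuery());
    }

    // Passes the raw query: every '%' is quoted again as %25.
    static String viaRaw(String uri) {
        URI u = URI.create(uri);
        return rawQueryAfterRebuild(u, u.getRawQuery());
    }

    public static void main(String[] args) {
        String src = "http://server1/path"
                + "?param%3D1=value%261&param%3F2=value%232#fragment";
        System.out.println(viaDecoded(src));
        System.out.println(viaRaw(src));
    }
}
```

Once the query has been decoded, a '=' that was part of a parameter name is indistinguishable from the '=' separating name and value, so no constructor could restore the original from that string alone.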
----
The only way to construct a proper URI, changing just certain
components of a source URI, seems to be to assemble it manually:
import java.net.URI;

public class URITest4 {
    public static void main(String[] args) throws Exception {
        URI u = URI.create("http://server1/path"
                + "?param%3D1=value%261&param%3F2=value%232"
                + "#fragment");
        System.out.println(u.toASCIIString());
        // Assemble the new URI string from the raw components
        StringBuilder v = new StringBuilder();
        v.append(u.getScheme()).append("://");
        if (u.getRawUserInfo() != null) {
            v.append(u.getRawUserInfo()).append('@');
        }
        v.append(u.getHost());
        if (u.getPort() != -1) {
            v.append(':').append(u.getPort());
        }
        v.append("/pat2"); // Replace path
        if (u.getRawQuery() != null) {
            v.append('?').append(u.getRawQuery());
        }
        if (u.getRawFragment() != null) {
            v.append('#').append(u.getRawFragment());
        }
        System.out.println(v);
    }
}
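The same manual approach can be wrapped in a small helper that hands back a URI instead of a string. A sketch only: the UriRewrite class and withRawPath method are names I made up, and the caller is assumed to pass a path that is already percent-encoded.

```java
import java.net.URI;

// Hypothetical helper, not an existing library class.
final class UriRewrite {

    // Returns a copy of u with its path replaced by rawPath.
    // rawPath must already be percent-encoded; all other components
    // are copied in raw form so nothing is decoded or re-encoded.
    static URI withRawPath(URI u, String rawPath) {
        StringBuilder s = new StringBuilder();
        if (u.getScheme() != null) {
            s.append(u.getScheme()).append(':');
        }
        if (u.getRawAuthority() != null) {
            s.append("//").append(u.getRawAuthority());
        }
        s.append(rawPath);
        if (u.getRawQuery() != null) {
            s.append('?').append(u.getRawQuery());
        }
        if (u.getRawFragment() != null) {
            s.append('#').append(u.getRawFragment());
        }
        // The single-argument parser preserves escaped octets as given.
        return URI.create(s.toString());
    }

    public static void main(String[] args) {
        URI u = URI.create("http://server1/path"
                + "?param%3D1=value%261&param%3F2=value%232#fragment");
        System.out.println(withRawPath(u, "/pat2"));
    }
}
```

Going through URI.create at the end at least validates the assembled string, which the plain StringBuilder version does not.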
I think all this mess is caused by the URI constructors blindly
encoding special URI characters in the given 'path', 'query' etc.
without considering the context, and you probably shouldn't be using
the java.net.URI constructors for any serious work.
Do you think Oracle should reconsider the java.net.URI
implementation so it becomes more useful? What alternatives to
java.net.URI are you aware of (maybe something like
javax.ws.rs.core.UriBuilder) for this kind of
manipulation/construction use case?