Michael Winter wrote:
[snip]
[...] the path doesn't contain hierarchical information.
It does not have to:
Yes, in general, URIs do not need to have hierarchical paths, but HTTP
URIs are hierarchical. That said, the hierarchy can be arbitrary; it
certainly doesn't need to follow a directory structure, for example.
Yes, I know you know that; I'm just trying to head off an unnecessary
response.
[snip]
These productions do not apply here.
I never said they did. You seemed to miss the part where I wrote, "in a
relative-path reference" (a term defined in 4.2 Relative Reference). The
URI suggested by the OP isn't a relative-path reference, but an absolute
URI (4.3 - though a fragment might not be prohibited). The information
above was just a qualification to prevent misinterpretation of the
preceding statement. That is, a colon can appear in path segments, but
not /all/ path segments.
[snip]
You are misunderstanding the RFC, and your logic is flawed.
I disagree. I have simply stated facts. However, it's hard for me to
respond conclusively unless you identify what you think I have
misunderstood, or where exactly my logic has apparently failed.
An /(ht|f)tps?:/ URI/URL (see subsection 1.1.3) must contain an
authority component because of the need for a host (a general URI
does not need to, as it may be a URN).
I already knew that. I think you misunderstood why I wrote what I did,
though that is perhaps my fault. I started by making comments specific
to HTTP URIs, but then shifted to a more generic treatment without
explicitly noting it.
Your problem with what I wrote would seem to revolve around my mention
of the authority component. It only occurred in relation to empty path
segments in general, and not specifically to the URI suggested by the OP
(so the specifics of HTTP URIs are irrelevant in this instance).
To some, allowing empty path segments might seem to be an oversight, or
a simplification of the grammar. Given the number of revisions to the
URI syntax RFCs, the former is unlikely, but the latter isn't entirely
unreasonable. However, even if that were the case, the RFC needn't have
limited itself to stating:
If a URI does not contain an authority component, then the path
cannot begin with two slash characters ("//").
-- 3.3 Path
It could simply have forbidden empty segments instead.
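For reference, the leading-slash rules that 3.3 does impose fit in a
tiny check. This is a minimal Python sketch, assuming the authority and
path have already been separated; the name `path_allowed` is
hypothetical, and only the two 3.3 leading-slash rules are tested, not
the full path grammar.

```python
def path_allowed(path: str, has_authority: bool) -> bool:
    """Hypothetical helper: apply only the leading-slash rules of
    RFC 3986, section 3.3. Empty segments ("a//b") are not rejected."""
    if has_authority:
        # With an authority, the path must be empty or begin with "/".
        return path == "" or path.startswith("/")
    # Without one, "//" would be read as the start of an authority.
    return not path.startswith("//")


print(path_allowed("/a//b", has_authority=False))  # True: empty segment mid-path
print(path_allowed("//a/b", has_authority=False))  # False
```

Note that "/a//b" passes: an empty segment is fine as long as it does
not sit at the very start of an authority-less path.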
As I didn't know what the grounds were for your objection to the
proposed URI, I hoped to cover the two obvious syntactic possibilities.
On reflection, I should have just asked.
[snipped well-intentioned quotation of the grammar]
As per
| 2.2. Reserved Characters
`:' is such a character:
| reserved = gen-delims / sub-delims
|
| gen-delims = ":" / "/" / "?" / "#" / "[" / "]" / "@"
Furthermore (in the same subsection),
| URI producing applications should percent-encode data octets that
| correspond to characters in the reserved set unless these characters
| are specifically allowed by the URI scheme to represent data in that
| component.
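In practice, percent-encoding a reserved character that appears in data
is a one-liner. A minimal sketch using Python's standard
`urllib.parse.quote`; passing `safe=""` matters here, because the
default leaves "/" unencoded.

```python
from urllib.parse import quote, unquote

GEN_DELIMS = ":/?#[]@"

# safe="" exempts nothing beyond letters, digits and "_.-~".
print(quote(GEN_DELIMS, safe=""))   # %3A%2F%3F%23%5B%5D%40
print(quote("foo:bar", safe=""))    # foo%3Abar
print(unquote("foo%3Abar"))         # foo:bar
```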
The reason for this recommendation (SHOULD) is that another `scheme:'
character sequence within the URI can render the URI ambiguous.
I don't see why. A scheme can only occur at the start of a URI, and a
URI reference can only start in five ways. Three of those begin with an
unambiguous delimiter: authority (//), query (?), and fragment (#). The
remaining two are the scheme itself and a path. If the path begins with
a slash, that too is unambiguous. If it doesn't, a colon in the first
segment would be mistaken for a scheme delimiter, but this is resolved
by adding a leading dot segment (foo:bar -> ./foo:bar).
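The dot-segment workaround is mechanical. A minimal Python sketch,
assuming the input is a path reference with no scheme of its own;
`protect_first_segment` is a hypothetical name. Python's `urlsplit` is
used only to show how a real parser treats the two forms.

```python
import re
from urllib.parse import urlsplit


def protect_first_segment(path: str) -> str:
    """Hypothetical helper: prefix "./" when the first path segment
    (which ends at the first "/", "?" or "#") contains a colon."""
    first = re.split(r"[/?#]", path, maxsplit=1)[0]
    return "./" + path if ":" in first else path


print(protect_first_segment("foo:bar"))        # ./foo:bar
print(protect_first_segment("a/b:c"))          # a/b:c (colon not in first segment)
print(urlsplit("foo:bar").scheme)              # foo (read as a scheme)
print(repr(urlsplit("./foo:bar").scheme))      # '' (no scheme)
```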
So, if a colon occurs before any slash, question mark, or hash
characters, it delimits the scheme. Anywhere else and it is part of some
(sub-)component.
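This positional rule is what the parsing expression in RFC 3986,
Appendix B encodes: the optional scheme group is `[^:/?#]+` followed by
a colon. A Python sketch using that expression; the `split_reference`
wrapper is hypothetical. The expression only breaks a reference into
components and does not validate them (for instance, it does not check
that a scheme starts with a letter).

```python
import re

# Parsing expression from RFC 3986, Appendix B. Groups: 2 = scheme,
# 4 = authority, 5 = path, 7 = query, 9 = fragment.
URI_REF = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def split_reference(ref: str):
    """Hypothetical wrapper returning (scheme, authority, path, query,
    fragment); absent components are None, the path is always a string."""
    m = URI_REF.match(ref)  # every part is optional, so this always matches
    return m.group(2, 4, 5, 7, 9)


print(split_reference("foo:bar"))      # ('foo', None, 'bar', None, None)
print(split_reference("./foo:bar"))    # (None, None, './foo:bar', None, None)
print(split_reference("http://example.com/a:b?c:d#e:f"))
```

Colons after the first "/", "?" or "#" simply end up inside the path,
query or fragment, as in the last example.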
So the example given by the OP may be a syntactically valid URI, but one
that is at least unwise to use.
I agree.
[snip]
Mike