M
Marshall
Joachim said:They are, when it comes to aliasing of mutable data. I think it's
justified by the fact that aliased mutable data has a galling tendency
to break abstraction barriers. (More on this on request.)
I can see how pointers, or other kinds of aliasing
(by my more restricted definition) break abstraction barries;
it is hard to abstract something that can change out from
under you uncontrollably.
The identity isn't visible from inside SQL. (Unless there's an OID
facility available, which *is* an explicit identity.)
Agreed about OIDs.
Definitely not. You can have two equal records and update just one of
them, yielding non-equal records; by my definition (and by intuition),
this means that the records were equal but not identical.
This sort of thing comes up when one has made the mistake of
not defining any keys on one's table and needs to rectify the
situation. It is in fact considered quite a challenge to do.
My preferred technique for such a situation is to create
a new table with the same columns and SELECT DISTINCT ...
INTO ... then recreate the original table. So that way doesn't
fit your model.
How else might you do it? I suppose there are nonstandard
extensions, such as UPDATE ... LIMIT. And in fact there
might be some yucky thing that made it in to the standard
that will work.
But what you descrbe is certainly *not* possible in the
relational algebra; alas that SQL doesn't hew closer
to it. Would you agree? Would you also agree that
if a language *did* adhere to relation semantics,
then relation elements would *not* have identity?
(Under such a language, one could not have two
equal records, for example.)
[The above paragraph is the part of this post that I
really care about; feel free to ignore the rest if it's naive
or annoying or whatever, but please do respond to this.]
Such a mapping is indeed possible. Simply extend the table with a new
column, number the columns consecutively, and identify the records via
that column.
That doesn't work for me. It is one thing to say that for all
tables T, for all elements e in T, e has identity. It is a different
thing to say that for all tables T, there exists a table T' such
that for all elements e in T', e has identity.
But even if you don't do that, there's still identity. It is irrelevant
whether the programs can directly read the value of the identity field;
the adverse effects happen because updates are in-place. (If every
update generated a new record, then we'd indeed have no identity.)
Okay. At this point, though, the term aliasing has become extremely
general. I believe "i+1+1" is an alias for "i+2" under this definition.
No, "i+1+1" isn't an alias in itself. It's an expression - to be an
alias, it would have to be a reference to something.
However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
important - 1+1 is replacable by 2 in every context, so this is
essentially the same as saying "a[i+2] is an alias of a[i+2]", which is
vacuously true.
To me, the SQL with the where clause is like i+2, not like a[2].
A query is simply an expression that makes mention of a variable.
However I would have to admit that if SQL table elements have
identity, it would be more like the array example.
(The model of SQL I use in my head is one with set semantics,
and I am careful to avoid running in to bag semantics by, for
example, always defining at least one key, and using SELECT
DISTINCT where necessary. But it is true that the strict set
semantics in my head are not the wobbly bag semantics in
the standard.)
There's another aspect here. If two expressions are always aliases to
the same mutable, that's usually easy to determine; this kind of
aliasing is usually not much of a problem.
What's more of a problem are those cases where there's occasional
aliasing. I.e. a and a[j] may or may not be aliases of each other,
depending on the current value of i and j, and *that* is a problem
because the number of code paths to be tested doubles. It's even more of
a problem because testing with random data will usually not uncover the
case where the aliasing actually happens; you have to go around and
construct test cases specifically for the code paths that have aliasing.
Given that references may cross abstraction barriers (actually that's
often the purpose of constructing a reference in the first place), this
means you have to look for your test cases at multiple levels of
software abstraction, and *that* is really, really bad.
That is so general that I am concerned it has lost its ability to
identify problems specific to pointers.
If the reference to "pointers" above means "references", then I don't
know about any pointer problems that cannot be reproduced, in one form
or the other, in any of the other aliasing mechanisms.
It seems we agree that, for example, raw C pointers can easily
cause aliasing problems. To me, Java references, which are much
more tightly controlled, are still subject to aliasing problems,
although
in practice the situation is significantly less severe. (It is possible
to have multiple references to a single Java mutable object and
not know it.) I would assume the same is posible with, say, SML
refs.
However the situation in SQL strikes me as being different.
If one is speaking of base tables, and one has two queries,
one has everything one needs to know simply looking at
the queries. If views are involved, one must further examine
the definition of the view.
For example, we might ask in C, if we update a, will
the value of b[j] change? And we can't say, because
we don't know if there is aliasing. But in Fortran, we
can say that it won't, because they are definitely
different variables. But even so in Fortran, if we ask
if we update a, will a[j] change, and we know we
can't tell unless we know the values of i and j.
In SQL, if we update table T, will a SELECT on table
S change? We know it won't; they are different variables.
But if we update T, will a select on T change? It
certainly might, but I wouldn't call it aliasing, because
the two variables involved are the same variable, T.
Aliasing is indeed a more general idea that goes beyond address spaces.
However, identity and aliasing can be defined in fully abstract terms,
so I welcome this opportunity to get rid of a too-concrete model.
Yes.
Note that the WHERE clause properly includes array indexing (just set up
a table that has continuous numeric primary key, and a single other column).
I am still not convinced. It is not sufficient to show how sometimes
you can establish identity, or how you could establish identity based
on some related table; it must be shown how the identity can be
established in general, and I don't think it can. If it can't, then
I claim we are back to expressions that include a variable, and
not aliasing, even by the wider definition.
I.e. I'm not talking about how a is an alias of a[i+1] after updating
i, I'm talking about how a may be an alias of a[j].
It seems odd to me to suggest that "i+1" has identity.
It doesn't (unless it's passed around as a closure, but that's
irrelevant to this discussion).
"i" does have identity. "a" does have identity. "a[i+1]" does have
identity.
Let me say that for purposes of this discussion, if it can be assigned
to (or otherwise mutated), it has identity. (We *can* assign identity to
immutable things, but it's equivalent to equality and not interesting
for this discussion.)
I can see
that i has identity, but I would say that "i+1" has only value.
Agreed.
Cool.
But perhaps the ultimate upshoot of this thread is that my use
of terminology is nonstandard.
It's somewhat restricted, but not really nonstandard.
Whew!
Hmmm. Is it that there are so many, or that they are simply not
part of our daily concern?
I guess it's the latter. IIRC there are four or five isolation levels.
Yes, four.
Indeed.
Though the number of parameter passing mechanisms isn't that large
anyway. Off the top of my head, I could recite just three (by value, by
reference, by name aka lazy), the rest are just combinations with other
concepts (in/out/through, most notably) or a mapping to implementation
details (by reference vs. "pointer by value" in C++, for example).
Fair enough. I could only remember three, but I was sure someone
else could name more.
Marshall