What is Expressiveness in a Computer Language

M

Marshall

Joachim said:
They are, when it comes to aliasing of mutable data. I think it's
justified by the fact that aliased mutable data has a galling tendency
to break abstraction barriers. (More on this on request.)

I can see how pointers, or other kinds of aliasing
(by my more restricted definition) break abstraction barries;
it is hard to abstract something that can change out from
under you uncontrollably.

The identity isn't visible from inside SQL. (Unless there's an OID
facility available, which *is* an explicit identity.)

Agreed about OIDs.

Definitely not. You can have two equal records and update just one of
them, yielding non-equal records; by my definition (and by intuition),
this means that the records were equal but not identical.

This sort of thing comes up when one has made the mistake of
not defining any keys on one's table and needs to rectify the
situation. It is in fact considered quite a challenge to do.
My preferred technique for such a situation is to create
a new table with the same columns and SELECT DISTINCT ...
INTO ... then recreate the original table. So that way doesn't
fit your model.

How else might you do it? I suppose there are nonstandard
extensions, such as UPDATE ... LIMIT. And in fact there
might be some yucky thing that made it in to the standard
that will work.

But what you descrbe is certainly *not* possible in the
relational algebra; alas that SQL doesn't hew closer
to it. Would you agree? Would you also agree that
if a language *did* adhere to relation semantics,
then relation elements would *not* have identity?
(Under such a language, one could not have two
equal records, for example.)

[The above paragraph is the part of this post that I
really care about; feel free to ignore the rest if it's naive
or annoying or whatever, but please do respond to this.]

Such a mapping is indeed possible. Simply extend the table with a new
column, number the columns consecutively, and identify the records via
that column.

That doesn't work for me. It is one thing to say that for all
tables T, for all elements e in T, e has identity. It is a different
thing to say that for all tables T, there exists a table T' such
that for all elements e in T', e has identity.

But even if you don't do that, there's still identity. It is irrelevant
whether the programs can directly read the value of the identity field;
the adverse effects happen because updates are in-place. (If every
update generated a new record, then we'd indeed have no identity.)
Okay. At this point, though, the term aliasing has become extremely
general. I believe "i+1+1" is an alias for "i+2" under this definition.

No, "i+1+1" isn't an alias in itself. It's an expression - to be an
alias, it would have to be a reference to something.

However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
important - 1+1 is replacable by 2 in every context, so this is
essentially the same as saying "a[i+2] is an alias of a[i+2]", which is
vacuously true.

To me, the SQL with the where clause is like i+2, not like a[2].
A query is simply an expression that makes mention of a variable.
However I would have to admit that if SQL table elements have
identity, it would be more like the array example.

(The model of SQL I use in my head is one with set semantics,
and I am careful to avoid running in to bag semantics by, for
example, always defining at least one key, and using SELECT
DISTINCT where necessary. But it is true that the strict set
semantics in my head are not the wobbly bag semantics in
the standard.)

There's another aspect here. If two expressions are always aliases to
the same mutable, that's usually easy to determine; this kind of
aliasing is usually not much of a problem.
What's more of a problem are those cases where there's occasional
aliasing. I.e. a and a[j] may or may not be aliases of each other,
depending on the current value of i and j, and *that* is a problem
because the number of code paths to be tested doubles. It's even more of
a problem because testing with random data will usually not uncover the
case where the aliasing actually happens; you have to go around and
construct test cases specifically for the code paths that have aliasing.
Given that references may cross abstraction barriers (actually that's
often the purpose of constructing a reference in the first place), this
means you have to look for your test cases at multiple levels of
software abstraction, and *that* is really, really bad.
That is so general that I am concerned it has lost its ability to
identify problems specific to pointers.

If the reference to "pointers" above means "references", then I don't
know about any pointer problems that cannot be reproduced, in one form
or the other, in any of the other aliasing mechanisms.


It seems we agree that, for example, raw C pointers can easily
cause aliasing problems. To me, Java references, which are much
more tightly controlled, are still subject to aliasing problems,
although
in practice the situation is significantly less severe. (It is possible
to have multiple references to a single Java mutable object and
not know it.) I would assume the same is posible with, say, SML
refs.

However the situation in SQL strikes me as being different.
If one is speaking of base tables, and one has two queries,
one has everything one needs to know simply looking at
the queries. If views are involved, one must further examine
the definition of the view.

For example, we might ask in C, if we update a, will
the value of b[j] change? And we can't say, because
we don't know if there is aliasing. But in Fortran, we
can say that it won't, because they are definitely
different variables. But even so in Fortran, if we ask
if we update a, will a[j] change, and we know we
can't tell unless we know the values of i and j.

In SQL, if we update table T, will a SELECT on table
S change? We know it won't; they are different variables.
But if we update T, will a select on T change? It
certainly might, but I wouldn't call it aliasing, because
the two variables involved are the same variable, T.

Aliasing is indeed a more general idea that goes beyond address spaces.

However, identity and aliasing can be defined in fully abstract terms,
so I welcome this opportunity to get rid of a too-concrete model.


Yes.
Note that the WHERE clause properly includes array indexing (just set up
a table that has continuous numeric primary key, and a single other column).

I am still not convinced. It is not sufficient to show how sometimes
you can establish identity, or how you could establish identity based
on some related table; it must be shown how the identity can be
established in general, and I don't think it can. If it can't, then
I claim we are back to expressions that include a variable, and
not aliasing, even by the wider definition.


I.e. I'm not talking about how a is an alias of a[i+1] after updating
i, I'm talking about how a may be an alias of a[j].
It seems odd to me to suggest that "i+1" has identity.

It doesn't (unless it's passed around as a closure, but that's
irrelevant to this discussion).
"i" does have identity. "a" does have identity. "a[i+1]" does have
identity.
Let me say that for purposes of this discussion, if it can be assigned
to (or otherwise mutated), it has identity. (We *can* assign identity to
immutable things, but it's equivalent to equality and not interesting
for this discussion.)
I can see
that i has identity, but I would say that "i+1" has only value.
Agreed.

Cool.

But perhaps the ultimate upshoot of this thread is that my use
of terminology is nonstandard.

It's somewhat restricted, but not really nonstandard.
Whew!

Hmmm. Is it that there are so many, or that they are simply not
part of our daily concern?

I guess it's the latter. IIRC there are four or five isolation levels.


Yes, four.

Indeed.
Though the number of parameter passing mechanisms isn't that large
anyway. Off the top of my head, I could recite just three (by value, by
reference, by name aka lazy), the rest are just combinations with other
concepts (in/out/through, most notably) or a mapping to implementation
details (by reference vs. "pointer by value" in C++, for example).

Fair enough. I could only remember three, but I was sure someone
else could name more. :)


Marshall
 
J

Joachim Durchholz

Marshall said:
This sort of thing comes up when one has made the mistake of
not defining any keys on one's table and needs to rectify the
situation. It is in fact considered quite a challenge to do.
My preferred technique for such a situation is to create
a new table with the same columns and SELECT DISTINCT ...
INTO ... then recreate the original table. So that way doesn't
fit your model.

I was mentioning multiple equal records just as an example where the
records have an identity that's independent of the record values.
How else might you do it? I suppose there are nonstandard
extensions, such as UPDATE ... LIMIT. And in fact there
might be some yucky thing that made it in to the standard
that will work.

Right, it might be difficult to update multiple records that are exactly
the same. It's not what SQL was intended for, and difficult to do anyway
- I was thinking of LIMIT (I wasn't aware that it's nonstandard), and I
agree that there may be other ways to do it.

However, I wouldn't overvalue that case. The Jane/John Doe example
posted elsewhere in this thread is strictly within the confines of what
SQL was built for, yet there is aliasing.
But what you descrbe is certainly *not* possible in the
relational algebra; alas that SQL doesn't hew closer
to it. Would you agree?

Yup, SQL (particularly its update semantics) aren't relational semantics.
Still, it's SQL we've been talking about.
Would you also agree that
if a language *did* adhere to relation semantics,
then relation elements would *not* have identity?
(Under such a language, one could not have two
equal records, for example.)

Any identity that one could define within relational semantics would be
just equality, so no, it wouldn't be very helpful (unless as part of a
greater framework).
However, that's not really a surprise. Aliasing becomes relevant (even
observable) only when there's updates, and relational algebra doesn't
have updates.
That doesn't work for me. It is one thing to say that for all
tables T, for all elements e in T, e has identity. It is a different
thing to say that for all tables T, there exists a table T' such
that for all elements e in T', e has identity.

Let me continue that argument:
Records in T' have identity.
From an SQL point-of-view, there's no difference between the records in
T' and records in other tables, so they must share all properties, in
particular that of having an identity.

(I agree that's not as convincing as seeing aliasing in action. See below.)
But even if you don't do that, there's still identity. It is irrelevant
whether the programs can directly read the value of the identity field;
the adverse effects happen because updates are in-place. (If every
update generated a new record, then we'd indeed have no identity.)
Okay. At this point, though, the term aliasing has become extremely
general. I believe "i+1+1" is an alias for "i+2" under this definition.
No, "i+1+1" isn't an alias in itself. It's an expression - to be an
alias, it would have to be a reference to something.

However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
important - 1+1 is replacable by 2 in every context, so this is
essentially the same as saying "a[i+2] is an alias of a[i+2]", which is
vacuously true.

To me, the SQL with the where clause is like i+2, not like a[2].

I'd say WHERE is like [i+2]: neither is valid on its own, it's the
"selector" part of a reference into a table.
A query is simply an expression that makes mention of a variable.
However I would have to admit that if SQL table elements have
identity, it would be more like the array example.
OK.

(The model of SQL I use in my head is one with set semantics,
and I am careful to avoid running in to bag semantics by, for
example, always defining at least one key, and using SELECT
DISTINCT where necessary. But it is true that the strict set
semantics in my head are not the wobbly bag semantics in
the standard.)

Set-vs.-bag doesn't affect SQL's aliasing in general, it was a specific
property of the example that I chose. So you don't have to worry wether
that might be the cause of the misunderstanding.
It seems we agree that, for example, raw C pointers can easily
cause aliasing problems. To me, Java references, which are much
more tightly controlled, are still subject to aliasing problems,
although
in practice the situation is significantly less severe. (It is possible
to have multiple references to a single Java mutable object and
not know it.)
Exactly.

I would assume the same is posible with, say, SML refs.

So do I.
However the situation in SQL strikes me as being different.
If one is speaking of base tables, and one has two queries,
one has everything one needs to know simply looking at
the queries. If views are involved, one must further examine
the definition of the view.

Sure. Usually, SQL itself is enough abstract that no additional
abstraction happens; so even if you have aliasing, you usually know
about it.
I.e. when compared to what you were saying about Java: "It is possible
to have multiple references to a single Java mutable object and
not know it.", the "and not know it" part is usually missing.

The picture changes as soon as the SQL statements get hidden behind
layers of abstraction, say result sets and prepared query handles.

.... Um... let me also restate the preconditions for aliasing become a
problem. You need three elements for that:
1) Identities that can be stored in multiple places.
2) Updates.
3) Abstraction (of updatable entities).

Without identities, there cannot be aliasing.
Without updates, the operation that makes aliasing problematic is excluded.
Without abstraction, you may have aliasing but can clearly identify it,
and don't have a problem.

I think the roadblock you're hitting is that you keep looking for
nonlocal trouble to identify spots where SQL might have aliasing; since
SQL isn't used with abstraction often, you come up empty and conclude
there's no aliasing.
My position is that not all aliasing is evil, and that there's a
technical definition: two program entities (l-expressions in C parlance)
are aliases if changing the referenced data through one also changes the
other.
For example, we might ask in C, if we update a, will
the value of b[j] change? And we can't say, because
we don't know if there is aliasing. But in Fortran, we
can say that it won't, because they are definitely
different variables.


Even that may be the case.
There's an EQUIVALENCE statement that can explicitly alias array sections.
Also, I suspect that newer versions of Fortran have reference parameters.
But even so in Fortran, if we ask
if we update a, will a[j] change, and we know we
can't tell unless we know the values of i and j.


In fact, that's the usual form of aliasing in Fortran.
In SQL, if we update table T, will a SELECT on table
S change? We know it won't; they are different variables.
But if we update T, will a select on T change? It
certainly might, but I wouldn't call it aliasing, because
the two variables involved are the same variable, T.

I fail to see how this is different from the aliasing in a and a[j].

It's just the same kind of aliasing that I have with the statements
SELECT * FROM T WHERE key = $i
and
SELECT * FROM T WHERE key = $j
(where $i and $j are placeholders for variables from program variables).

After running
UPDATE T SET f = 5 WHERE key = $i
not only the first SELECT may give different results, the second may,
too (in those cases where $i and $j happen to have the same value).


I can also construct pointer-like aliasing in SQL. If I have
SELECT * FROM employees WHERE country = "US"
and
SELECT * FROM employees WHERE name = "Doe"
then somebody with a grudge against all US employees running
UPDATE employees SET salary = 0 WHERE country = "US"
will (on purpose or not) also modify the results of the second query.

Not a problem while all three queries are in plain view, but imagine
them wrapped in some opaque object or behind a module interface.
I am still not convinced. It is not sufficient to show how sometimes
you can establish identity, or how you could establish identity based
on some related table; it must be shown how the identity can be
established in general, and I don't think it can. If it can't, then
I claim we are back to expressions that include a variable, and
not aliasing, even by the wider definition.

See above.

I admit that identity cannot be reliably established in SQL. The
identity of a record isn't available (unless via OID extensions).
However, it is there and it can have effect.

Translated to real life, "my best friend" and "my employer" may be
referring to the same identity, even if these descriptions may change in
the future.
Fair enough. I could only remember three, but I was sure someone
else could name more. :)

I got a private mail naming more parameter passing mechanisms. It's a
bit difficult define what's a parameter passing mechanism and what's
just an attribute for such a mechanism. If you count all combinations as
an extra mechanism, you can easily get dozens.

Regards,
Jo
 
C

Chris F Clark

Marshall said:
In general, I feel that "records" are not the right conceptual
level to think about.

Unfortunately, they are the right level. Actually,the right level
might even be lower, the fields within a record, but that's moving
even farther away from the direction you wish to go. The right level
is the lowest level at which the programmer can see and manipulate
values. Thus, since a programmer can see and manipulate fields within
records in SQL that is the level we need to concern ourselves with
when talking about aliasing in SQL. In other words, since a SELECT
(or UPDATE) statement can talk about specific fields of specific
records within the table (not directly, but reliably, which I'll
explain in a moment) then those are the "primitive objects" in SQL
that can be aliased.

What I meant by "not directly, but reliably":

If you have a fixed database, and you do two selects which specify the
same sets of fields to be selected and the same keys to select records
with, one expects the two selects to return the same values. You do
not necessarily expect the records to be returned in the same order,
but you do expect the same set of records to be returned and the
records to have the same values in the same fields. Further, if you
do an update, you expect certain fields of certain records to change
(and be reflected in subsequent selects).

However, therein lies the rub, if you do a select on some records, and
then an update that changes those records, the records you have from
the select have either changed or show outdated values. If there is
some way to refer to the records you first selected before the update,
then you have an aliasing problem, but maybe one can't do that. One
could also have an aliasing problem, if one were allowed to do two
updates simultaneously, so that one update could changed records in
the middle of the other changing the records. However, I don't know
if that's allowed in the relational algebra either. (I think I finally
see your point, and more on that later.)
I do not see that they have
any identity outside of their value. We can uniquely identify
any particular record via a key, but a table may have more
than one key, and an update may change the values of one
key but not another. So it is not possible in general to
definitely and uniquely assign a mapping from each record
of a table after an update to each record of the table before
the update, and if you can't do that, then where
is the record identity?

I don't know if your point of having two keys within one table amkes a
difference. If one can uniquely identify a record, then it has
identity. The fact that there is another mapping where one may not be
able to uniquely identify the record is not necessarily relevant.
But what you descrbe is certainly *not* possible in the
relational algebra; alas that SQL doesn't hew closer
to it. Would you agree? Would you also agree that
if a language *did* adhere to relation semantics,
then relation elements would *not* have identity?
(Under such a language, one could not have two
equal records, for example.)

Ok, here I think I will explain my understanding of your point. If we
treat each select statement and each update statements as a separate
program, i.e. we can't save records from one select statement to a
later one (and I think in the relational algebra one cannot), then
each single (individual) select statement or update statement is a
functional program. That is we can view a single select statement as
doing a fold over some incoming data, but it cannot change the data.
Similarly, we can view an update statement as creating a new copy of
the table with mutations in it, but the update statement can only
refer to the original immutable table that was input. (Well, that's
atleast how I recall the relational algebra.) It is the second point,
about the semantics of the update statement which is important. There
are no aliasing issues because in the relational algebra one cannot
refer to the mutated values in an update statement, every reference in
an update statement is refering to the unmutated input (again, that's
my recollection). (Note, if my recollection is off, and one can refer
to the mutated records in an update statement, perhaps they can
explain why aliasing is not an issue in it.)

Now for some minor points:
For example, we might ask in C, if we update a, will
the value of b[j] change? And we can't say, because
we don't know if there is aliasing. But in Fortran, we
can say that it won't, because they are definitely
different variables.


Unfortunately, your statement about FORTRAN is not true, if a and b
are parameters to a subroutine (or members of a common block, or
equivalenced, or ...), then an update of a might change b[j].

This is in fact, an important point, it is in subroutines (and
subroutine like things), where we have local names for things
(bindings) that are actually names for things which may or may not be
distinct, that aliasing is the primary issue. I don't recall the
relational algebra having subroutines, and even if it did they would
be unimportant, given that one could not refer within a single update
statement to the records that a subroutine changed (since one cannot
refer in an update statement to any record which has been changed, see
the caveats above).

Joachim Durchholz wrote: > >
Marshall again:
Fair enough. I could only remember three, but I was sure someone
else could name more. :)

There actual are some more, but very rare, for example there was call-
by-text-string, which is sort of like call-by-name, but allows a
parameter to reach into the calling routine and mess with local
variables therein. Most of the rare ones have really terrible
semantics, and are perhaps best forgotten except to keep future
implementors from choosing them. For example, call-by-text-string is
really easy to implement in a naive interpreter that reparses each
statement at every invocation, but but it exposes the implementation
properties so blatantly that writing a different interpretor is nearly
impossible.

If I recall correctly, there are also some call-by- methods that take
into account networks and addressing space issues--call-by-reference
doesn't work if one cannot see the thing being referenced. Of coruse,
one must then ask whether one considers "remote-procedure-call" and/or
message-passing to be the same thing as a local call.

One last nit, Call-by-value is actually call-by-copy-in-copy-out when
generalized.

There are some principles that one can use to organize the parameter
passing methods. However, the space is much richer than people
commonly know about (and that is actually a good thing, that most
people aren't aware of the richness, simplicity is good).

-Chris
 
D

Dr.Ruud

Chris F Clark schreef:
If you have a fixed database, and you do two selects which specify the
same sets of fields to be selected and the same keys to select records
with, one expects the two selects to return the same values.

When your "fixed" means read-only, or (fully) locked, then yes.

Modern databases also have modes without locks, so without special
attention (like maybe your "fixed") the two selects do not necessarily
return the same set of records.

Further, if you
do an update, you expect certain fields of certain records to change
(and be reflected in subsequent selects).

Doesn't your "fixed" block updates?

However, therein lies the rub, if you do a select on some records, and
then an update that changes those records, the records you have from
the select have either changed or show outdated values.

Not necessarily outdated: values can be fixed in time for a purpose.
Example: calculating interest is done once per day, but the whole
process takes more than a day.

Some systems implemented a lock plus requery just before the update, to
check for unacceptable changes in the stored data; this to prevent
having to keep locks while waiting.

If there is
some way to refer to the records you first selected before the update,
then you have an aliasing problem, but maybe one can't do that. One
could also have an aliasing problem, if one were allowed to do two
updates simultaneously, so that one update could changed records in
the middle of the other changing the records.

Some databases allow you to travel back in time: run this query on the
data of 1 year ago. All previous values are kept "behind" the current
value.
 
M

Marshall

Joachim said:
Yup, SQL (particularly its update semantics) aren't relational semantics.
Still, it's SQL we've been talking about.


Any identity that one could define within relational semantics would be
just equality, so no, it wouldn't be very helpful (unless as part of a
greater framework).
However, that's not really a surprise. Aliasing becomes relevant (even
observable) only when there's updates, and relational algebra doesn't
have updates.

Good point. Perhaps I should have said "relational algebra +
variables with assignment." It is interesting to consider
assignment vs. the more restricted update operators: insert,
update, delete. I think I see what you mean about the restricted
update operators working on identity.

Okay. At this point, though, the term aliasing has become extremely
general. I believe "i+1+1" is an alias for "i+2" under this definition.
No, "i+1+1" isn't an alias in itself. It's an expression - to be an
alias, it would have to be a reference to something.

However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
important - 1+1 is replacable by 2 in every context, so this is
essentially the same as saying "a[i+2] is an alias of a[i+2]", which is
vacuously true.

To me, the SQL with the where clause is like i+2, not like a[2].

I'd say WHERE is like [i+2]: neither is valid on its own, it's the
"selector" part of a reference into a table.

Is it possible you're being distracted by the syntax? WHERE is a
binary operation taking a relation and a filter function. I don't
think of filters as being like array indexing; do they appear
analogous to you? (Always a difficult question because often
times A and B share some qualities and not others, and
what does one call that?)

Sure. Usually, SQL itself is enough abstract that no additional
abstraction happens; so even if you have aliasing, you usually know
about it.
I.e. when compared to what you were saying about Java: "It is possible
to have multiple references to a single Java mutable object and
not know it.", the "and not know it" part is usually missing.
Yes!


The picture changes as soon as the SQL statements get hidden behind
layers of abstraction, say result sets and prepared query handles.

... Um... let me also restate the preconditions for aliasing become a
problem. You need three elements for that:
1) Identities that can be stored in multiple places.
2) Updates.
3) Abstraction (of updatable entities).

Without identities, there cannot be aliasing.
Without updates, the operation that makes aliasing problematic is excluded.
Without abstraction, you may have aliasing but can clearly identify it,
and don't have a problem.

Aha! That description works very well for me.

I admit that identity cannot be reliably established in SQL. The
identity of a record isn't available (unless via OID extensions).
However, it is there and it can have effect.

Yes, I think I see what you mean now. This is in part a consequence
of the restricted update operators.

Thanks,

Marshall
 
M

Marshall

Chris said:
Unfortunately, they are the right level. Actually,the right level
might even be lower, the fields within a record, but that's moving
even farther away from the direction you wish to go. The right level
is the lowest level at which the programmer can see and manipulate
values.

But how is this not always "the bit"?


I don't know if your point of having two keys within one table amkes a
difference. If one can uniquely identify a record, then it has
identity. The fact that there is another mapping where one may not be
able to uniquely identify the record is not necessarily relevant.

The issue with two keys is that the keys may not *agree* on the
mapping before vs. after the update.

For example, we might ask in C, if we update a, will
the value of b[j] change? And we can't say, because
we don't know if there is aliasing. But in Fortran, we
can say that it won't, because they are definitely
different variables.


Unfortunately, your statement about FORTRAN is not true, if a and b
are parameters to a subroutine (or members of a common block, or
equivalenced, or ...), then an update of a might change b[j].


Ah, common blocks. How that takes me back!

But your point is a good one; I oversimplified.

Marshall again:

There actual are some more, but very rare, for example there was call-
by-text-string, which is sort of like call-by-name, but allows a
parameter to reach into the calling routine and mess with local
variables therein. Most of the rare ones have really terrible
semantics, and are perhaps best forgotten except to keep future
implementors from choosing them. For example, call-by-text-string is
really easy to implement in a naive interpreter that reparses each
statement at every invocation, but but it exposes the implementation
properties so blatantly that writing a different interpretor is nearly
impossible.

If I recall correctly, there are also some call-by- methods that take
into account networks and addressing space issues--call-by-reference
doesn't work if one cannot see the thing being referenced. Of coruse,
one must then ask whether one considers "remote-procedure-call" and/or
message-passing to be the same thing as a local call.

One last nit, Call-by-value is actually call-by-copy-in-copy-out when
generalized.

There are some principles that one can use to organize the parameter
passing methods. However, the space is much richer than people
commonly know about (and that is actually a good thing, that most
people aren't aware of the richness, simplicity is good).


Fair enough.


Marshall
 
C

Chris Smith

We Chris's stick together, as always.

Marshall said:
But how is this not always "the bit"?

First of all, we should be consistent in speaking about the logical
language semantics, which may not include bits anyway.

That said, if bits were the primitive concept of data in a language,
then we'd be able to get away with talking about higher-level concepts
because we agree to always manipulate a larger structure (a byte, for
example) as a pure value. If we were using bitwise operators, then we
wouldn't be able to get away with it, for example. That said, it's also
quite possible to consider aliasing on higher levels as well; it's just
not possible to point out the lack of aliasing for higher levels of
abstraction, and thus conclude that no aliasing exists. Aliasing is
still possible for entities within those layers of abstraction.
 
C

Chris Smith

Marshall said:
Is it possible you're being distracted by the syntax? WHERE is a
binary operation taking a relation and a filter function. I don't
think of filters as being like array indexing; do they appear
analogous to you? (Always a difficult question because often
times A and B share some qualities and not others, and
what does one call that?)

This is definitely a good question. Nevertheless, I think we're fooling
ourselves if we decide that we've made progress by defining part of the
problem out of existence. The original problem was that when there are
several ways of getting access to something that is identical (rather
than just equal), this causes problems for invariant checking. These
several ways of accessing data are being called 'aliasing' -- I have no
comment on whether that's standard usage or not, because I don't know.
I suspect that aliasing is normally used for something closer to the
physical level. The point is that whatever "aliasing" really means,
what we're discussing is having two paths by which one may access
identical data.

So there are infinitely complex ways in which this can occur. I can
have pointers that are aliased. I can have integers, which by
themselves are just plain values, but can alias each other as indices
into an array. I can have strings that do the same thing for an
associative array. I can, of course, have arbitrarily more complex data
structures with exactly the same issue. I can also have two different
WHERE clauses, which when applied to the same relation yield exactly the
same set of tuples. The problem arises when code is written to update
some value (or set of tuples, in the SQL case, since definitions differ
there) using one pointer/int/string/WHERE clause/etc, and at the same
time an invariant was written using the other pointer/int/WHERE/etc, the
result is that either of:

a) The compiler has to be able to prove that the two are not aliased

b) The compiler has to re-evaluate the invariant from scratch when this
operation is performed.

(Yeah, there's actually a whole continuum between a and b, based on more
limited abilities of the compiler to prove things. I'm oversimplifying,
and I think it's rather benign in this case.)

So in this sense, a WHERE clause is a binary operation in exactly the
same way that an array indexing expression is a binary operation, and
both are capable of aliasing in at least this logical sense.

Now it's certainly true that languages with unrestricted pointers are
less successful at limiting the scope of re-evaluation of invariants.
In the array or SQL relation cases, there's a limited set of value/tuple
modifications that might cause the invariant to need rechecking
(specifically those that point to the same array, or to the relation or
one of its views). A language with unrestricted untyped pointers, if it
doesn't get lucky with the capture analysis, may have to re-evaluate all
invariants in the application on any given assignment. Nevertheless,
the re-evaluation of the invariant still needs to be done.
 
J

Joachim Durchholz

Marshall said:
Good point. Perhaps I should have said "relational algebra +
variables with assignment." It is interesting to consider
assignment vs. the more restricted update operators: insert,
update, delete.

Actually I see it the other way round: assignment is strictly less
powerful than DML since it doesn't allow creating or destroying
variables, while UPDATE does cover assignment to fields.
(However, it's usually new+assignment+delete vs. INSERT+UPDATE+DELETE,
at which point there is not much of a difference.)
Okay. At this point, though, the term aliasing has become extremely
general. I believe "i+1+1" is an alias for "i+2" under this definition.
No, "i+1+1" isn't an alias in itself. It's an expression - to be an
alias, it would have to be a reference to something.

However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
important - 1+1 is replacable by 2 in every context, so this is
essentially the same as saying "a[i+2] is an alias of a[i+2]", which is
vacuously true.

To me, the SQL with the where clause is like i+2, not like a[2].
I'd say WHERE is like [i+2]: neither is valid on its own, it's the
"selector" part of a reference into a table.

Is it possible you're being distracted by the syntax?

I think that's very, very unlikely ;-)
WHERE is a
binary operation taking a relation and a filter function. I don't
think of filters as being like array indexing; do they appear
analogous to you?

Yes.

Filters are just like array indexing: both select a subset of variables
from a collection. In SQL, you select a subset of a table, in a
programming language, you select a subset of an array.

(The SQL selection mechanism is far more flexible in the kinds of
filtering you can apply, while array indexing allows filtering just by
ordinal position. However, the relevant point is that both select things
that can be updated.)
Yes, I think I see what you mean now. This is in part a consequence
of the restricted update operators.

I don't think so. You can update SQL records any way you want.
The unavailability of a record identity is due to the fact that, well,
it's unavailable in standard SQL.

Regards,
Jo
 
M

Marshall

Joachim said:
Actually I see it the other way round: assignment is strictly less
powerful than DML since it doesn't allow creating or destroying
variables, while UPDATE does cover assignment to fields.

Oh, my.

Well, for all table variables T, there exists some pair of
values v and v', such that we can transition the value of
T from v to v' via assignment, but not by any single
insert, update or delete. The reverse is not true. I
would consider that a solid demonstration of which
one is more powerful. Further, it is my understanding
that your claim of row identity *depends* on the restricted
nature of DML; if the only mutator operation is assignment,
then there is definitely no record identity.

(However, it's usually new+assignment+delete vs. INSERT+UPDATE+DELETE,
at which point there is not much of a difference.)

I am not sure what this means. Delete can be expressed in
terms of assignment. (As can insert and update.) (Assignment
can also be expressed in terms of insert and delete.) I
don't know what "new" would be in a value-semantics, relational
world. So I would say it's assignment vs. INSERT+UPDATE+DELETE,
but perhaps I'm not understanding what you mean.

Assignment by itself is complete. Insert and delete together
are complete.

Okay. At this point, though, the term aliasing has become extremely
general. I believe "i+1+1" is an alias for "i+2" under this definition.
No, "i+1+1" isn't an alias in itself. It's an expression - to be an
alias, it would have to be a reference to something.

However, a[i+1+1] is an alias to a[i+2]. Not that this is particularly
important - 1+1 is replacable by 2 in every context, so this is
essentially the same as saying "a[i+2] is an alias of a[i+2]", which is
vacuously true.

To me, the SQL with the where clause is like i+2, not like a[2].
I'd say WHERE is like [i+2]: neither is valid on its own, it's the
"selector" part of a reference into a table.

Is it possible you're being distracted by the syntax?

I think that's very, very unlikely ;-)

Well, I just thought I'd ask. :)

Yes.

Filters are just like array indexing: both select a subset of variables
from a collection.

I can't agree with this wording. A filter produces a collection
value from a collection value. I don't see how variables
enter in to it. One can filter either a collection constant or
a collection variable; if one speaks of filtering a collection
variable, on is really speaking of filtering the collection value
that the variable currently contains; filtering is not an operation
on the variable as such, the way the "address of" operator is.
Note you can't update the result of a filter.

In SQL, you select a subset of a table, in a
programming language, you select a subset of an array.

(The SQL selection mechanism is far more flexible in the kinds of
filtering you can apply, while array indexing allows filtering just by
ordinal position. However, the relevant point is that both select things
that can be updated.)

When you have been saying "select things that can be updated"
I have been assuming you meant that one can derive values
from variables, and that some other operation can update that
variable, causing the expression, if re-evaluated, to produce
a different value. However the phrase also suggests that
you mean that the *result* of the select can *itself* be
updated. Which one do you mean? (Or is there a third
possibility?)

I don't think so. You can update SQL records any way you want.
The unavailability of a record identity is due to the fact that, well,
it's unavailable in standard SQL.

I disagree. The concept of record-identity is quite tenuous, and
depends on specifics of SQL semantics. (In fact I'm not entirely
convinced it's definitely there even in SQL.) It certainly does not
hold in, say, set theory. Set elements do not have identity; they
only have value. If we have a set-semantics language that
supports set variables with assignment, we still do not have
element-identity.


Marshall
 
C

Chris Smith

David Hopwood said:
Chris said:
If checked by execution, yes. In which case, I am trying to get my head
around how it's any more true to say that functional languages are
compilable postconditions than to say the same of imperative languages.

A postcondition must, by definition [*], be a (pure) assertion about the
values that relevant variables will take after the execution of a subprogram.

Okay, my objection was misplaced, then, as I misunderstood the original
statement. I had understood it to mean that programs written in pure
functional languages carry no additional information except for their
postconditions.
 
C

Chris Smith

Darren New said:
I'm not sure what linear or uniqueness typing is. It's typestate, and if
I remember correctly the papers I read 10 years ago, the folks at
TJWatson that invented Hermes also invented the concept of typestate.
They at least claim to have coined the term.

Coining the term is one thing, but I feel pretty confident that the idea
was not invented in 1986 with the Hermes language, but rather far
earlier. Perhaps they may have invented the concept of considering it
any different from other applications of types, though. I still have
trouble delineating how to consider "typestate" different from ordinary
types in formal terms, unless I make the formal model complex enough to
include some kind of programmer-defined identifiers such as variables.
The distinction is not at all relevant to common type system models
based on the lambda calculus.

While acknowledging, on the one hand, that the word "typestate" is used
to describe this, I also object that types have *always* been assigned
to expressions in differing type environments. Otherwise, it would be
impossible to assign types to lambda abstractions in the simply typed
lambda calculus, the simplest of all generally studied type systems.
What is being named here is the overcoming of a limitation that
programming language designers imposed upon themselves, whether from not
understanding the theoretical research or not believing it important, I
don't know.
 
Z

Zoltan Somogyi

Andreas Rossberg said:
Uh, aliasing all over the place! Actually, I think that logic
programming, usually based on deep unification, brings by far the worst
incarnation of aliasing issues to the table.

I agree that deep unification, as implemented in Prolog, brings with it
lots of aliasing problems. However, these problems have been recognized
from the seventies, and people have tried to solve them. There have
been a huge variety of mode systems added to various dialects of Prolog
over the years, which each attempt to tackle (various parts of) the aliasing
problem, none of them fully successfully in my view, since their designers
usually actually *wanted* to retain at least *some* of the expressive power
of the logic variable.

Some non-Prolog logic languages have departed from this approach, the earliest
being the Relational Language. To tie this message to another thread, the
Relational Language had a strict mode system that is as identical as possible
to the notion of typestate in NIL and Hermes given the limitations of
comparing declarative apples with imperative oranges, but significantly
earlier, 1979 vs 1986 IIRC.
I have explored the OO path to its bitter end and am
convinced it is not the way. So what is left? Uniqueness
types and logic programming, I suppose. I enjoy logic
programming but it doesn't seem quite right. But notice:
no pointers there! And it doesn't seem to suffer from the
lack.

Of course, the main logic programming language today that disallows the use
of unification for arbitrary aliasing is Mercury. It enforces this through
a strong mode system. Our motivation for the design of this mode system
was precisely to eliminate the problems Andreas identified above. However
it has also turned out to be an excellent foundation for Mercury's notion of
unique modes, its equivalent of uniqueness types, which Mercury uses to
express I/O.

Zoltan Somogyi <[email protected]> http://www.cs.mu.oz.au/~zs/
Department of Computer Science and Software Engineering, Univ. of Melbourne
 
J

Joachim Durchholz

Marshall said:
Oh, my.

Well, for all table variables T, there exists some pair of
values v and v', such that we can transition the value of
T from v to v' via assignment, but not by any single
insert, update or delete.

I fail to see an example that would support such a claim.

On the other hand, UPDATE can assign any value to any field of any
record, so it's doing exactly what an assignment does. INSERT/DELETE can
create resp. destroy records, which is what new and delete operators
would do.

I must really be missing the point.
Further, it is my understanding
that your claim of row identity *depends* on the restricted
nature of DML; if the only mutator operation is assignment,
then there is definitely no record identity.

Again, I fail to connect.

I and others have given aliasing examples that use just SELECT and UPDATE.
I am not sure what this means. Delete can be expressed in
terms of assignment. (As can insert and update.)

INSERT cannot be expressed in terms of assignment. INSERT creates a new
record; there's no way that assignment in a language like C can create a
new data structure!
The same goes for DELETE.
(Assignment can also be expressed in terms of insert and delete.)

Agreed.

I also realize that this makes it a bit more difficult to nail down the
nature of identity in a database. It's certainly not storage location:
if you DELETE a record and then INSERT it with the same values, it may
be allocated somewhere entirely else, and our intuition would say it's
not "the same" (i.e. not identical). (In a system with OID, it would
even be impossible to recreate such a record, since it would have a
different OID. I'm not sure whether this makes OID systems better or
worse at preserving identity, but that's just a side track.)

Since intuition gives me ambivalent results here, I'll go back to my
semiformal definition (and take the opportunity to make it a bit more
precise):
Two path expressions (lvalues, ...) are aliases if and only if the
referred-to values compare equal, and if they stay equal after applying
any operation to the referred-to value through either of the path
expressions.

In the context of SQL, this means that identity isn't the location where
the data is stored. It's also not the values stored in the record -
these may change, including key data. SQL record identity is local, it
can be defined from one operation to the next, but there is no such
thing as a global identity that one can memorize and look up years
later, without looking at the intermediate states of the store.

It's a gross concept, now that I think about it. Well, or at least
rather alien for us programmers, who are used to taking the address of a
variable to get a tangible identity that will stay stable over time.

On the other hand, variable addresses as tangible identities don't hold
much water anyway.
Imagine data that's written out to disk at program end, and read back
in. Further imagine that while the data is read into main memory,
there's a mechanism that redirects all further reads and writes to the
file into the read-in copy in memory, i.e. whenever any program changes
the data, all other programs see the change, too.
Alternatively, think about software agents that move from machine to
machine, carrying their data with them. They might be communicating with
each other, so they need some means of establishing identity
("addressing") the memory buffers that they use for communication.
I don't know what "new" would be in a value-semantics, relational
world.

It would be INSERT.

Um, my idea of "value semantics" is associated with immutable values.
SQL with INSERT/DELETE/UPDATE certainly doesn't match that definition.

So by my definition, SQL doesn't have value semantics, by your
definition, it would have value semantics but updates which are enough
to create aliasing problems, so I'm not sure what point you're making
here...
I can't agree with this wording. A filter produces a collection
value from a collection value. I don't see how variables
enter in to it.

A collection can consist of values or variables.

And yes, I do think that WHERE is a selection over a bunch of variables
- you can update records after all, so they are variables! They don't
have a name, at least none which is guaranteed to be constant over their
lifetime, but they can be mutated!
One can filter either a collection constant or
a collection variable; if one speaks of filtering a collection
variable, on is really speaking of filtering the collection value
that the variable currently contains; filtering is not an operation
on the variable as such, the way the "address of" operator is.
Note you can't update the result of a filter.

If that's your definition of a filter, then WHERE is not a filter,
simple as that.
When you have been saying "select things that can be updated"
I have been assuming you meant that one can derive values
from variables, and that some other operation can update that
variable, causing the expression, if re-evaluated, to produce
a different value.

That's what I meant.
However the phrase also suggests that
you mean that the *result* of the select can *itself* be
updated.

The "that" in "things that can be updated" refers to the selected
things. I'm not sure how this "that" could be interpreted to refer to
the selection as a whole (is my understanding of English really that bad?)
Which one do you mean? (Or is there a third
possibility?)

I couldn't tell - I wouldn't have thought that there are even two
possibilities.

Regards,
Jo
 
R

Rob Warnock

+---------------
| INSERT cannot be expressed in terms of assignment. INSERT creates a new
| record; there's no way that assignment in a language like C can create a
| new data structure! The same goes for DELETE.
+---------------

Well, what about "malloc()" & "free()"? I mean, if your
"database" is a simple linked list, then INSERT is just:

ROW *p = malloc(sizeof *p);
p->field1 = value1;
p->field2 = value2;
...
p->fieldN = valueN;
database = cons(p, database); /* Common Lisp's CONS */

and DELETE is just:

ROW *p = find_if(predicate_function, database); /* CL's FIND-IF */
database = delete(p, database); /* CL's DELETE */
free(p);

[There are single-pass methods, of course, but...]


-Rob
 
J

Joachim Durchholz

Chris said:
David Hopwood said:
Chris said:
If checked by execution, yes. In which case, I am trying to get my head
around how it's any more true to say that functional languages are
compilable postconditions than to say the same of imperative languages.

A postcondition must, by definition [*], be a (pure) assertion about the
values that relevant variables will take after the execution of a subprogram.

Okay, my objection was misplaced, then, as I misunderstood the original
statement. I had understood it to mean that programs written in pure
functional languages carry no additional information except for their
postconditions.

Oh, but that's indeed correct for pure functional languages. (Well, they
*might* carry additional information anyway, but it's not a requirement
to make it a programming language.)

The answer that I have for your original objection may be partial, but
here goes anyway:

Postconditions cannot easily capture all side effects.
E.g. assume there's a file-open routine but no way to test whether a
given file handle was ever opened (callers are supposed to test the
return code from the open() call).
In an imperative language, that's a perfectly valid (though possibly not
very good) library design.
Now the postcondition would have to say something like "from now on,
read() and write() are valid on the filehandle that I returned". I know
of no assertion language that can express such temporal relationships,
and even if there is (I'm pretty sure there is), I'm rather sceptical
that programmers would be able to write correct assertions, or correctly
interpret them - temporal logic offers several traps for the unwary.
(The informal postcondition above certainly isn't complete, since it
doesn't cover close(); I shunned the effort to get a wording that would
correctly cover sequences like open()-close()-open(), or
open()-close()-close()-open().)

Regards,
Jo
 
J

Joachim Durchholz

Rob said:
+---------------
| INSERT cannot be expressed in terms of assignment. INSERT creates a new
| record; there's no way that assignment in a language like C can create a
| new data structure! The same goes for DELETE.
+---------------

Well, what about "malloc()" & "free()"?

These do exactly what assignment cannot do.

The correspondence between SQL and C would be:
INSERT - malloc() (create a new data structure)
DELETE - free() (destroy a data structure)
UPDATE - assignment (change data within a data structure)

Regards,
Jo
 
C

Chris Smith

Joachim Durchholz said:
I fail to see an example that would support such a claim.

On the other hand, UPDATE can assign any value to any field of any
record, so it's doing exactly what an assignment does. INSERT/DELETE can
create resp. destroy records, which is what new and delete operators
would do.

I must really be missing the point.

I *think* I understand Marshall here. When you are saying "assignment",
you mean assignment to values of attributes within tuples of the cell.
When Marshall is saying "assignment", he seems to mean assigning a
completely new *table* value to a relation; i.e., wiping out the entire
contents of the relation and replacing it with a whole new set of
tuples. Your assignment is indeed less powerful than DML, whereas
Marshall's assignment is more powerful than DML.
 
M

Marshall

Chris said:
I *think* I understand Marshall here. When you are saying "assignment",
you mean assignment to values of attributes within tuples of the cell.
When Marshall is saying "assignment", he seems to mean assigning a
completely new *table* value to a relation; i.e., wiping out the entire
contents of the relation and replacing it with a whole new set of
tuples. Your assignment is indeed less powerful than DML, whereas
Marshall's assignment is more powerful than DML.

Exactly.


Marshall
 
M

Marshall

Joachim said:
I fail to see an example that would support such a claim.

variable T : unary relation of int
T = { 1, 2, 3 }; // initialization
T := { 4, 5 }; // assignment

The above transition of the value of T cannot be be
done by any one single insert, update or delete.
Two would suffice, however. (In fact, any assignement
can be modeled at a full delete followed by an insert
of the new value.)

On the other hand, UPDATE can assign any value to any field of any
record,
Yes.

so it's doing exactly what an assignment does.

No. The variable is the table, not the records. Relations are not
arrays.
Records are not lvalues.

INSERT/DELETE can
create resp. destroy records, which is what new and delete operators
would do.

I must really be missing the point.


Again, I fail to connect.

I and others have given aliasing examples that use just SELECT and UPDATE.

Sure, but update's semantics are defined in a per-record way,
which is consistent with record identity. Assignment's isn't.

INSERT cannot be expressed in terms of assignment. INSERT creates a new
record; there's no way that assignment in a language like C can create a
new data structure!
The same goes for DELETE.

I was intendind to be discussing a hypothetical relation-based
language,
so while I generally agree with you statement about C, I don't see
how it applies.

Agreed.

I also realize that this makes it a bit more difficult to nail down the
nature of identity in a database.

I would propose that variables have identity, and values do not.
In part this is via the supplied definition of identity, in which, when
you change one thing, if something else changes as well, they
share identity. Since one cannot change values, they necessarily
lack identity.

It's certainly not storage location:
if you DELETE a record and then INSERT it with the same values, it may
be allocated somewhere entirely else, and our intuition would say it's
not "the same" (i.e. not identical).

Well, it would depend on how our intuition had been primed. If it
was via implementation techniques in low level languages, we
might reach a different conclusion than if our intuition was primed
via logical models and relation theory.

(In a system with OID, it would
even be impossible to recreate such a record, since it would have a
different OID. I'm not sure whether this makes OID systems better or
worse at preserving identity, but that's just a side track.)

OIDs are something of a kludge, and they break set semantics.

Since intuition gives me ambivalent results here, I'll go back to my
semiformal definition (and take the opportunity to make it a bit more
precise):
Two path expressions (lvalues, ...) are aliases if and only if the
referred-to values compare equal, and if they stay equal after applying
any operation to the referred-to value through either of the path
expressions.

Alas, this leaves me confused. I don't see how a path expression
(in this case, SELECT ... WHERE) can be an l-value. You cannot
apply imperative operations to the result. (Also I think the use
of equality here is too narrow; it is only necessary to show that
two things both change, not that they change in the same way.)

I was under the impression you agred that "i+2" was not
a "path expression". If our hypothetical language lacks record
identity, then I would say that any query is simply an expression
that returns a value, as in "i+2."

In the context of SQL, this means that identity isn't the location where
the data is stored. It's also not the values stored in the record -
these may change, including key data. SQL record identity is local, it
can be defined from one operation to the next, but there is no such
thing as a global identity that one can memorize and look up years
later, without looking at the intermediate states of the store.

Yes, however all of this depends on record identity.

It's a gross concept, now that I think about it. Well, or at least
rather alien for us programmers, who are used to taking the address of a
variable to get a tangible identity that will stay stable over time.

It is certaily alien if one is not used to relation semantics, which
is the default case.

On the other hand, variable addresses as tangible identities don't hold
much water anyway.
Imagine data that's written out to disk at program end, and read back
in. Further imagine that while the data is read into main memory,
there's a mechanism that redirects all further reads and writes to the
file into the read-in copy in memory, i.e. whenever any program changes
the data, all other programs see the change, too.
Alternatively, think about software agents that move from machine to
machine, carrying their data with them. They might be communicating with
each other, so they need some means of establishing identity
("addressing") the memory buffers that they use for communication.

These are exactly why content-based addressing is so important.
Location addressing depends on an address space, and this
concept does not distribute well.

It would be INSERT.

Um, my idea of "value semantics" is associated with immutable values.
SQL with INSERT/DELETE/UPDATE certainly doesn't match that definition.

Sorry, I was vague. Compare, in OOP, the difference between a value
object and a "regular" object.

So by my definition, SQL doesn't have value semantics, by your
definition, it would have value semantics but updates which are enough
to create aliasing problems, so I'm not sure what point you're making
here...


A collection can consist of values or variables.

And yes, I do think that WHERE is a selection over a bunch of variables
- you can update records after all, so they are variables! They don't
have a name, at least none which is guaranteed to be constant over their
lifetime, but they can be mutated!

We seem to have slipped back from the hypothetical relation language
with only assignement back to SQL.

If that's your definition of a filter, then WHERE is not a filter,
simple as that.

Fair enough! Can you correct my definition of filter, though?
I am still unaware of the difference.

That's what I meant.


The "that" in "things that can be updated" refers to the selected
things. I'm not sure how this "that" could be interpreted to refer to
the selection as a whole (is my understanding of English really that bad?)

Your English is extraordinary. I could easily conclude that you
were born in Boston and educated at Harvard, and either have
Germanic ancestry or have simply adopted a Germanic name
out of whimsy. If English is not your native tongue, there is no
way to detect it.

Argh, late for dropping off my daughter at school now. Must run.
Sorry if I was a bit unclear due to being rushed.


Marshall
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
473,995
Messages
2,570,236
Members
46,822
Latest member
israfaceZa

Latest Threads

Top