Writing single bits to a file


Charlie Gordon

Charlie Gordon said:
cr88192 said:
Keith Thompson said:
[...]
for exponentiation, I would suggest you use ** as in Fortran. It
is not ambiguous because a ** b has no meaning for b of arithmetic or
struct type. You would need to bend the grammar a bit, but at least
the precedence would be much better suited than that of ^.

I had a '**' operator at one time before, but it kept clashing with
pointers handling. as a result, if it existed, it would have to be
parsed specially, and have rules to avoid accidentally mis-parsing a
pointer operation:
"i**s++", where the intention was actually "i*(*s++)". provably
disambiguating this would require more info than the parser has
available (the only real option would be, for example, requiring
spaces...).

'a^.b' could be another option (where '^.' is given a high rather
than a low precedence).
[...]

"**" can be unambiguous if you allow parsing to be affected by
semantic analysis. But typically the source is tokenized before it's
parsed and analyzed (even though all these things theoretically happen
in translation phase 7). If you see ``x**y'', you can't tell whether
it's ``x ** y'' (an exponentiation) or ``x * *y'' without knowing the
type of y.
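A minimal sketch of what standard C does with these spellings today: because ** is not a token, x**y is always lexed as x * *y, and the thread's i**s++ is i * (*s++). The function names times_deref and scaled_sum are made up for illustration.

```c
/* Standard C has no ** token, so x**y is lexed as x * *y:
   a binary multiplication whose right operand is *y. */
int times_deref(int x, const int *y)
{
    return x**y;            /* same as x * (*y) */
}

/* The case from the thread: i**s++ is i * (*s++).
   Postfix ++ binds tighter than unary *, so s is advanced
   after the element it pointed to has been read. */
int scaled_sum(int i, const int *s, int n)
{
    int total = 0;
    while (n-- > 0)
        total += i**s++;    /* i * (*s++) */
    return total;
}
```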

knowing the type of y is the problem, though theoretically it could be
handled by parse-tree tweaking, if I had the parse tree at the same point
I was doing type handling (in my compiler, I do not, since these are
handled as different stages).

a later frontend may also make type info available in more of the upper
compiler, such that such inferences can be made...

Probably your best option.
yeah, I considered, but did not accept these ideas...
it matters to me that compatibility not be broken.

Wise choice.
however, I may at some point add such an operator (after all, my last
script language had such an operator...).

short circuit xor does not get much usage IMHO.
'^.' still seems like a better option IMO, since it resembles '^', but is
a different operator...
(I can just make it have a very different precedence than '^').

ok, this drops the precedence-similarity idea (if the new operators have
different precedences than the old ones they resemble).

'&.', '|.', and '^.' might be made tightly binding (slightly more tightly
than '*' and '/').
'*.' and '/.' will be the same as '*' and '/'.
'+.' and '-.' will be the same as '+' and '-'.

'*.' could thus be an alternative for dot product, and maybe an
additional multiply form (in some other cases).
'/.' could be used for a 'reverse divide' for types with non-commutative
multiplication and division (such as quaternions, which currently use a
builtin function for this). potentially, it could also serve as a
shorthand for dividing ints and getting a float (aka: cast-free).
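To make the quaternion case concrete, here is a rough sketch (the quat type and the qmul, qinv, qdiv_right and qdiv_left names are hypothetical, not from any library). Because quaternion multiplication does not commute, a * inv(b) and inv(b) * a generally differ, which is the distinction a separate 'reverse divide' operator would capture.

```c
/* Hypothetical quaternion type: w + x*i + y*j + z*k. */
typedef struct { double w, x, y, z; } quat;

/* Hamilton product a*b; in general qmul(a, b) != qmul(b, a). */
static quat qmul(quat a, quat b)
{
    quat r;
    r.w = a.w*b.w - a.x*b.x - a.y*b.y - a.z*b.z;
    r.x = a.w*b.x + a.x*b.w + a.y*b.z - a.z*b.y;
    r.y = a.w*b.y - a.x*b.z + a.y*b.w + a.z*b.x;
    r.z = a.w*b.z + a.x*b.y - a.y*b.x + a.z*b.w;
    return r;
}

/* Inverse: conjugate divided by the squared norm (q must be nonzero). */
static quat qinv(quat q)
{
    double n = q.w*q.w + q.x*q.x + q.y*q.y + q.z*q.z;
    quat r = { q.w / n, -q.x / n, -q.y / n, -q.z / n };
    return r;
}

/* a * b^-1: what an ordinary '/' would most naturally mean. */
static quat qdiv_right(quat a, quat b) { return qmul(a, qinv(b)); }

/* b^-1 * a: the "reverse divide" role suggested for '/.'. */
static quat qdiv_left(quat a, quat b) { return qmul(qinv(b), a); }
```

For example, with i and j as the unit quaternions, qdiv_right(i, j) gives -k while qdiv_left(i, j) gives k.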

No, these tokens are really problematic. I pointed out that ``1.^2''
would become ambiguous if you attach semantics to ^ for floating-point
values (as you may have), since it is unequivocally parsed as 1. ^ 2;
at least the .^ and more generally .-prefixed arithmetic operators you
are considering would not cause incompatibilities with current C syntax,
just parsing surprises for programmers trying to use your extensions.
Adding tokens with a trailing . does create incompatibilities with the
current C syntax, as it would cause legitimate expressions to be parsed
differently. Consider these:

.5==x^.5==y // x equals 0.5 or y equals 0.5 but not both.
.8<x&.9>x
1.1==x|.9==y
x/.1
y*.2
z-.2
...
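For reference, each of the lines above is already valid C. The sketch below wraps them in functions (the names are illustrative only) with comments showing how a conforming compiler groups them; a lexer that recognized ^., &., |., /., *. or -. as tokens would split these differently.

```c
/* Each expression is valid C today; the comments show the grouping.
   Comparison operators bind tighter than ^, & and |. */
int xor_eq_half(double x, double y)
{
    return .5==x^.5==y;        /* (0.5 == x) ^ (0.5 == y) */
}

int in_open_range(double x)
{
    return .8<x&.9>x;          /* (0.8 < x) & (0.9 > x) */
}

int either(double x, double y)
{
    return 1.1==x|.9==y;       /* (1.1 == x) | (0.9 == y) */
}

double div_tenth(double x)   { return x/.1; }   /* x / 0.1 */
double times_fifth(double y) { return y*.2; }   /* y * 0.2 */
double minus_fifth(double z) { return z-.2; }   /* z - 0.2 */
```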

I always put around binary operators but a lot of programmers don't.

I always put *spaces* around binary operators but a lot of programmers
don't.
 

Kenneth Brody

Eric Sosman wrote:
[...]
gets worse: Some of the best compressors encode their output
in fractions of bits; if a file length of 10007 bits is bad,
a length of 10006.59029663+ bits is *really* bad!
[...]

I've never heard of that. What's a fraction of a bit? Is it
something that needs to hold less than two states?

What would you call those states? "True" and "Fal"? :)

--
+-------------------------+--------------------+-----------------------+
| Kenneth J. Brody | www.hvcomputer.com | #include |
| kenbrody/at\spamcop.net | www.fptech.com | <std_disclaimer.h> |
+-------------------------+--------------------+-----------------------+
Don't e-mail me at: <mailto:[email protected]>
 

Ben Bacarisse

Charlie Gordon said:
I always put *spaces* around binary operators but a lot of programmers
don't.

I found your original version delightfully self-referential -- to the
point where if you had written "I put around operators but many put."
I would have thought it deliberate!
 

Keith Thompson

Charlie Gordon said:
Of course, I'm not proposing ** to be a token, but x * *y to be
reinterpreted as fexp(x, y) if y is a numeric type. This trick can be
played on the parse tree if you have one, at code generation time, or on the
fly if you generate code directly. The programmer would be more inclined to
write x ** y or x**y, but it is parsed as x * *y. This trick would be more
difficult to play in an interpreter with dynamic typing, but still possible,
by sticking the appropriate behaviour to fexp(x, y) for y pointer type.
[...]

My gut reaction to this idea is: Ick.

If I were designing a new language with a "**" operator, I'd just make
"**" a token. If "*" is also a unary operator, then "x * *y" would
require a space. The kind of special-case treatment you suggest is,
in my opinion, just too convoluted.

I like the way tokenization and analysis are separated in C. It makes
the language easier to implement and, more importantly, easier to
describe. A more complicated definition might allow "x+++++y" to be
legal, but at the cost of creating odd corner cases that couldn't be
resolved without detailed analysis of the standard.
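Keith's x+++++y example comes from C's longest-match tokenization rule (C99 6.4p4). A small sketch of the consequence (the function names are hypothetical):

```c
/* C99 6.4p4: the next token is always the longest sequence of
   characters that could form a token. So "x+++++y" is lexed as
   x ++ ++ + y, i.e. ((x++)++) + y, which is a constraint violation
   because x++ is not a modifiable lvalue. It is left commented out. */
int tokens_demo(int x, int y)
{
    /* return x+++++y; */      /* does not compile */
    return x++ + ++y;           /* spaces give the intended tokens */
}

/* Longest match also explains this one: x+++y is (x++) + y. */
int three_plus(int x, int y)
{
    return x+++y;
}
```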

And if you're adding extensions to the language, it's not unlikely
that you'd eventually want to add operator overloading. How do you
overload "**" if it's a composite of "*" and "*", and how do you
interpret "x**y" if either interpretation could be correct?
 

Keith Thompson

cr88192 said:
however, I may at some point add such an operator (after all, my last
script language had such an operator...).

Um, a short-circuit xor operator is logically impossible; you have to
know the values of both operands to determine the result.

On the other hand, a logical xor might make sense (it would yield 0 or
1 rather than the bitwise result), and "^^" would be a sensible symbol
for it.
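A minimal sketch of the logical xor Keith describes, written in standard C since ^^ does not exist (logical_xor is a hypothetical name). Normalizing each operand with ! and comparing with != yields 0 or 1; both operands must be evaluated, which is why no short-circuit form is possible.

```c
/* C has no ^^ operator. Applying ! to each operand maps it to 0 or 1,
   so != then acts as a logical exclusive-or. Both operands are always
   evaluated: unlike && and ||, the left operand alone never decides
   the result. */
static int logical_xor(int a, int b)
{
    return !a != !b;
}

/* Contrast with bitwise ^, which works on the individual bits:
   2 ^ 1 is 3, while logical_xor(2, 1) is 0 (both are "true"). */
```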

Objective-C, if I recall correctly, uses the "@" character for all its
extensions to C, which avoids any incompatibilities. You might
consider a similar approach. For example, you might make "@**" a
token and use it as an exponentiation operator.
 

Harald van Dijk

I had a '**' operator at one time before, but it kept clashing with
pointers handling. as a result, if it existed, it would have to be
parsed specially, and have rules to avoid accidentally mis-parsing a
pointer operation: "i**s++", where the intention was actually
"i*(*s++)". provably disambiguating this would require more info than
the parser has available (the only real option would be, for example,
requiring spaces...).

Wouldn't you be able to define the ** operator as either accepting two
arithmetic types, or accepting an arithmetic left operand and a pointer
right operand? This would mean you could write a * *p to mean a * (*p),
but not to mean pow(a, p), and that you could write a**b to mean either
a * (*b) or pow(a, b). Additionally, it would mean you don't have parsing
problems: a**b is always two operands for one operator, regardless of
whether this operator performs one operation or two.
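Standard C cannot add such an operator, but C11's _Generic can roughly model the type-directed choice Harald describes. This is only a sketch under stated assumptions: POW_OR_MUL, power_int and mul_deref are hypothetical names, only int exponents and int * operands are handled, and it needs a C11 compiler.

```c
/* Hypothetical helpers for the two readings of a**b. */

/* Exponentiation by repeated multiplication; exp is assumed >= 0. */
static long power_int(long base, int exp)
{
    long result = 1;
    while (exp-- > 0)
        result *= base;
    return result;
}

/* The reading C uses today: a * (*p). */
static long mul_deref(long a, int *p)
{
    return a * *p;
}

/* Pick the meaning from the type of b: an int selects exponentiation,
   an int * selects multiply-by-dereference. _Generic chooses a function
   designator, so each function only has to accept its own operand type. */
#define POW_OR_MUL(a, b) _Generic((b),  \
        int:   power_int,               \
        int *: mul_deref)((a), (b))
```

This models only the type-based choice; it says nothing about operator precedence.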
 

cr88192

Charlie Gordon said:
[...]

short circuit xor does not get much usage IMHO.

has more use, but more as a 'logical xor', since it is not possible to
short-circuit as is possible with '&&' or '||'.
No, these tokens are really problematic. I pointed out that ``1.^2''
would become ambiguous if you attach semantics to ^ for floating-point
values (as you may have), since it is unequivocally parsed as 1. ^ 2;
at least the .^ and more generally .-prefixed arithmetic operators you
are considering would not cause incompatibilities with current C syntax,
just parsing surprises for programmers trying to use your extensions.
Adding tokens with a trailing . does create incompatibilities with the
current C syntax, as it would cause legitimate expressions to be parsed
differently. Consider these:

.5==x^.5==y // x equals 0.5 or y equals 0.5 but not both.
.8<x&.9>x
1.1==x|.9==y
x/.1
y*.2
z-.2
...

I always put around binary operators but a lot of programmers don't.

odd, I always thought the preceding decimal digits were required.
at least in my parser, the number will not be recognized as a number, unless
it starts with a decimal digit, say, '0'...

 

Keith Thompson

Harald van Dijk said:
Wouldn't you be able to define the ** operator as either accepting two
arithmetic types, or accepting an arithmetic left operand and a pointer
right operand? This would mean you could write a * *p to mean a * (*p),
but not to mean pow(a, p), and that you could write a**b to mean either
a * (*b) or pow(a, b). Additionally, it would mean you don't have parsing
problems: a**b is always two operands for one operator, regardless of
whether this operator performs one operation or two.

Interesting idea, but it would change operator precedence in ways that
I don't want to think about.
 

Keith Thompson

cr88192 said:
odd, I always thought the preceding decimal digits were required.
at least in my parser, the number will not be recognized as a number,
unless it starts with a decimal digit, say, '0'...

Take a look at the syntax for a floating-constant, C99 6.4.4.2.

Just as a matter of style, I never use a leading or trailing decimal
point (I at least prepend or append a 0), but it's permitted.
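The relevant forms from C99 6.4.4.2, as a quick sketch (the variable names are illustrative): the digits on either side of the decimal point may be omitted, though not both, and an exponent or suffix may still follow.

```c
/* All of these are floating-constants under C99 6.4.4.2. A lone "."
   is not: at least one digit sequence must be present. */
static const double leading_point  = .5;     /* 0.5   */
static const double trailing_point = 5.;     /* 5.0   */
static const double trailing_exp   = 5.e1;   /* 50.0  */
static const double leading_exp    = .5e1;   /* 5.0   */
static const float  suffixed       = .25f;   /* 0.25f */
```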
 

CBFalconer

Kenneth said:
Eric Sosman wrote:
[...]
gets worse: Some of the best compressors encode their output
in fractions of bits; if a file length of 10007 bits is bad,
a length of 10006.59029663+ bits is *really* bad!
[...]

I've never heard of that. What's a fraction of a bit? Is it
something that needs to hold less than two states?

Look up arithmetic compression.
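Arithmetic coding is too long to show here, but the basic idea that a symbol can cost a non-integer number of bits can be sketched with plain base-3 packing, a much simpler relative of it: five ternary digits have 3^5 = 243 combinations, which fit in one byte, so each digit costs 8/5 = 1.6 bits instead of 2 (the ideal is log2 3, about 1.585). pack_trits and unpack_trits are hypothetical names.

```c
#include <stdint.h>

/* Pack five ternary digits (each 0, 1 or 2) into one byte by
   treating them as a base-3 number: at most 3^5 - 1 = 242.
   t[0] is the least significant digit. */
static uint8_t pack_trits(const int t[5])
{
    unsigned v = 0;
    for (int i = 4; i >= 0; i--)
        v = v * 3 + (unsigned)t[i];
    return (uint8_t)v;
}

/* Recover the five digits by repeated division by 3. */
static void unpack_trits(uint8_t byte, int t[5])
{
    unsigned v = byte;
    for (int i = 0; i < 5; i++) {
        t[i] = (int)(v % 3);
        v /= 3;
    }
}
```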
 

cr88192

Keith Thompson said:
Take a look at the syntax for a floating-constant, C99 6.4.4.2.

Just as a matter of style, I never use a leading or trailing decimal
point (I at least prepend or append a 0), but it's permitted.

ok, I missed that, having assumed the leading digits were required.

I am not sure if in practice things are done this way, or not.
2 options here:
disallow floating point numbers lacking a numeric prefix ('.5' being
technically invalid);
add a 'disambiguation rule', such that whitespace is required after the
dot whenever the text following the dot could otherwise be read as a
fractional number.

'2^.3' would thus be invalid (parsed as '2 ^ .3'), and would thus have to be
written:
'2^. 3'.
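The disambiguation rule described above can be sketched as a small lexer: an operator character followed by '.' becomes a two-character operator only when the '.' is not followed by a digit. This is a hypothetical toy, not anyone's actual compiler; it handles only identifiers, simple decimal constants without exponents, and the operator characters + - * / ^ & |.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy lexer for the proposed extension. It writes the
   tokens of src into out (outsz > 0), separated by single spaces. */
static void lex(const char *src, char *out, size_t outsz)
{
    size_t n = 0;
    const char *p = src;

    while (*p != '\0') {
        const char *start = p;

        if (isspace((unsigned char)*p)) {
            p++;
            continue;
        }
        if (isdigit((unsigned char)*p) ||
            (*p == '.' && isdigit((unsigned char)p[1]))) {
            /* decimal constant: digits with at most one '.', as in 1. or .5 */
            int seen_dot = 0;
            while (isdigit((unsigned char)*p) || (*p == '.' && !seen_dot)) {
                if (*p == '.')
                    seen_dot = 1;
                p++;
            }
        } else if (isalpha((unsigned char)*p) || *p == '_') {
            /* identifier */
            while (isalnum((unsigned char)*p) || *p == '_')
                p++;
        } else if (strchr("+-*/^&|", *p) != NULL) {
            p++;
            /* The disambiguation rule: the '.' joins the operator only
               when the next character cannot start a fractional part.
               Otherwise it is left to begin a constant, as in C. */
            if (*p == '.' && !isdigit((unsigned char)p[1]))
                p++;
        } else {
            p++;    /* any other character is a one-character token */
        }

        size_t len = (size_t)(p - start);
        if (n + len + 2 > outsz)
            break;              /* out of room: stop early */
        if (n > 0)
            out[n++] = ' ';
        memcpy(out + n, start, len);
        n += len;
    }
    out[n] = '\0';
}
```

With this rule, "2^.3" lexes as 2 ^ .3 and "2^. 3" as 2 ^. 3, while C expressions such as x/.1 and 1.^2 keep their standard tokenization.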

'^:' is also possible, but IMO uglier, and has potential implications (the
above operator style at least has precedent in certain functional
languages...).

'^,' is also possible, since these operators are not allowed standalone or
in suffix position.

(just here looking for a rule I can generalize to create a number of
auxiliary operators is all).
 

Keith Thompson

cr88192 said:
ok, I missed that, having assumed the leading digits were required.

I am not sure if in practice things are done this way, or not.

I'm not sure what you mean. In practice, all C compilers allow
floating-constants with leading or trailing decimal points.
2 options here:
disallow floating point numbers lacking a numeric prefix ('.5' being
technically invalid);

If you're designing your own language, you can do that. If you want
something compatible with C, you can't. '.5' is perfectly valid.

[...]
(just here looking for a rule I can generalize to create a number of
auxiliary operators is all).

Just my opinion: if you have to jump through a lot of hoops to make
something work, you should consider backing off and using a different
approach.

I think I suggested using '@' for any added syntax, since C doesn't
already use it. '$' and '`' are also possibilities (though '`' and
''' can be difficult to distinguish).

Incidentally, please don't quote signatures (the stuff following
"-- ") unless you're actually commenting on them.
 

cr88192

Keith Thompson said:
I'm not sure what you mean. In practice, all C compilers allow
floating-constants with leading or trailing decimal points.

well, it is not so much about compilers or standards, but about existing
code and coding practices.
do people actually write code this way?...
I guess I will have to assume they do.

2 options here:
disallow floating point numbers lacking a numeric prefix ('.5' being
technically invalid);

If you're designing your own language, you can do that. If you want
something compatible with C, you can't. '.5' is perfectly valid.

yeah.

[...]

(just here looking for a rule I can generalize to create a number of
auxiliary operators is all).

Just my opinion: if you have to jump through a lot of hoops to make
something work, you should consider backing off and using a different
approach.

I think I suggested using '@' for any added syntax, since C doesn't
already use it. '$' and '`' are also possibilities (though '`' and
''' can be difficult to distinguish).

yeah.
@ and $ are possible, and are good for standalone operators, but not for
masses of operators (for example, with the conventions before, it was fairly
easy to come up with 20+ new possible operators).

@ and $ are ok.
@+, +@, $/, ... now you are just getting ugly...


so, in my view, we need a dominant character, and a softer modifier
character ('.' was considered as a character, because this is what OCaml
did...). (many of the good double-dominant combos are already used
anyways...).



softer modifiers should be:
characters that hopefully have no other contextually meaningful use (a
character that already means something in that position would lead to
ambiguity);
characters that are not visually misleading (IMO, this makes , ; and :
bad choices for modifiers).

+`, -`, ~`, ... are probably better options though, since otherwise this
character is not used for anything (and in all sanity, this should not clash
with anything). it will be assumed that we know what character it is due to
context.

though still possible, '.' would probably require disambiguating spaces in
too many situations...


so, possible '2`3' representing fexp(2, 3)...

Incidentally, please don't quote signatures (the stuff following
"-- ") unless you're actually commenting on them.

ok, will try to remember this...
 

Ben Pfaff

Keith Thompson said:
I think I suggested using '@' for any added syntax, since C doesn't
already use it. '$' and '`' are also possibilities (though '`' and
''' can be difficult to distinguish).

Using any of these would require not just new syntax but an
addition to the basic source character set, which does not
currently contain any of those characters.
 

Keith Thompson

Ben Pfaff said:
Using any of these would require not just new syntax but an
addition to the basic source character set, which does not
currently contain any of those characters.

Which is no more problematic than adding new operator symbols.
 

Keith Thompson

cr88192 said:
well, it is not so much about compilers or standards, but about
existing code and coding practices.
do people actually write code this way?...
I guess I will have to assume they do.

*What* is about existing code and coding practices?

You're implementing a C-like compiler that provides some significant
set of extensions, right? If you're not concerned about the
requirements of the existing C standard, why post here?
 

Ben Pfaff

Keith Thompson said:
Which is no more problematic than adding new operator symbols.

Sure it is. Why would adding new operators that use characters
that are already in the character set be problematic, especially
if the operators were carefully chosen to avoid ambiguity? But
adding new characters to the character set potentially requires
some people to get new keyboards, fonts, etc., and practically
speaking would necessitate new digraphs, trigraphs, additions to
<iso646.h>, and so on. In the extreme, consider APL.
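For a sense of what such alternatives look like, C99 already provides digraphs (6.4.6) and the <iso646.h> macros (7.9). A short sketch using both (digraph_demo is a hypothetical name):

```c
#include <iso646.h>

/* Digraphs: <: :> stand for [ ], and <% %> stand for { }.
   <iso646.h> macros: and is &&, xor is ^, not_eq is !=. */
int digraph_demo(int a, int b)
<%
    int v<:2:> = <% a, b %>;                      /* int v[2] = { a, b }; */
    return (v<:0:> and v<:1:>) xor (a not_eq b);  /* (v[0] && v[1]) ^ (a != b) */
%>
```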
 

Charlie Gordon

Ben Pfaff said:
Sure it is. Why would adding new operators that use characters
that are already in the character set be problematic, especially
if the operators were carefully chosen to avoid ambiguity? But
adding new characters to the character set potentially requires
some people to get new keyboards, fonts, etc., and practically
speaking would necessitate new digraphs, trigraphs, additions to
<iso646.h>, and so on. In the extreme, consider APL.

While using non-ASCII characters would cause headaches, $ and especially @
are now universally supported. I used to program in APL a lot: I stopped
when I no longer had access to the proper keyboard ;-)
 

Richard Heathfield

Charlie Gordon said:
I used to program in APL a lot: I stopped
when I no longer had access to the proper keyboard ;-)

Ah, you should have said before. I chucked out an APL keyboard two or three
weeks ago. It'll be about fifteen feet deep in the landfill by now.
 
