Isn't 'std::string::npos' an integral constant?


Hendrik Schober

Hi,

this
#include <string>
class test {
typedef std::string::size_type size_type;
static const size_type x = std::string::npos;
};
doesn't compile using either VC9 ("expected constant expression")
or Comeau Online ("constant value is not known"). If I replace
'std::string::npos' by '-1' it compiles.
Why isn't 'std::string::npos' a "known constant expression"? What
am I missing?

TIA,

Schobi
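A minimal workaround, sketched under the assumption that all one can rely on is what the standard does guarantee, namely that npos has the value -1 converted to size_type: spell that value out directly. 'static_cast<size_type>(-1)' is an integral constant expression by itself, so the in-class initializer no longer depends on how the library initializes npos. The helper 'x_matches_npos' is hypothetical and only shows that the two spellings agree at run time.

```cpp
#include <string>

class test {
public:
    typedef std::string::size_type size_type;
    // size_type(-1) is an integral constant expression on its own, so this
    // initializer does not depend on whether the library makes the
    // initializer of std::string::npos visible at this point.
    static const size_type x = static_cast<size_type>(-1);
};

// Namespace-scope definition without an initializer; needed only if
// test::x is odr-used (for example, bound to a const reference).
const test::size_type test::x;

// Hypothetical helper: at run time both spellings give the same value.
inline bool x_matches_npos()
{
    return test::x == std::string::npos;
}
```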
 

James Kanze

Hard to say. AFAICT, 14.7.1/1 requirements are met, the
static data member 'npos' should be instantiated (and the
definition should exist), and since 'npos' itself was
initialised with a const expression, it is allowed to be used
in another const expression (5.19/1). Seems like a bug in
both compilers.

Or in the standard? I'm not convinced that there is a
requirement that npos be an integral constant expression.
Basically, it has to be a static member of const size_type, and
it has to have the value of -1, but that's all I see (unless you
consider the presentation of the class as binding---I'm not sure
it is supposed to be). I would expect that basic_string has
been explicitly specialized for char and wchar_t (in order to
possibly use special optimizations), and in that case, there's
absolutely no guarantee that the compiler can see the
initialization of npos.
 

Hendrik Schober

James said:
Or in the standard? I'm not convinced that there is a
requirement that npos be an integral constant expression.
Basically, it has to be a static member of const size_type, and
it has to have the value of -1, but that's all I see (unless you
consider the presentation of the class as binding---I'm not sure
it is supposed to be). I would expect that basic_string has
been explicitly specialized for char and wchar_t (in order to
possibly use special optimizations), and in that case, there's
absolutely no guarantee that the compiler can see the
initialization of npos.

Thank you both for taking the time to look at this.
However, I'm not convinced.
So I went and had a look at the standard (which I don't
like, as I have a really hard time understanding that
legalese).

For one, there's 5.19 which says:
"An integral constant expression can involve [...]
static data members of integral or enumeration types
[...]."
So it seems that a static data member of integral type
is a constant integral expression.
Then there's 9.4.2/4:
"If a static data member is of const integral or const
enumeration type, its declaration in the class
definition can specify a constant initializer which
shall be an integral constant expression (5.19)."
The way I read this, my 'test::x', which is of integral
type, can be initialized in-class, if I use a constant
integral expression.
Finally, 21.3 shows 'std::basic_string<T>::npos' as
static const size_type npos = -1;
(which is the only thing I was able to find resembling
a definition of what it should be).

To me this looks like 'std::string::npos' is defined as
being '-1', which would make it an integral constant
expression, which in turn means it could be used to
initialize a constant static data member of integral type.
Which is what I thought I was doing.

I'm sure I missed something along the way (for example,
I have no idea what 14.7.1 has to do with all this), but
I wouldn't know what.

Schobi
 

James Kanze

Thank you both for taking the time to look at this.
However, I'm not convinced.
So I went and had a look at the standard (which I don't
like, as I have a really hard time understanding that
legalese).
For one, there's 5.19 which says:
"An integral constant expression can involve [...]
static data members of integral or enumeration types
[...]."
So it seems that a static data member of integral type
is a constant integral expression.

Not necessarily. You cut part of the requirements. The
essential parts read "An integral constant expression can
involve [...] const variables or static data members of integral
or enumeration types initialized with constant
expressions,[...]" It's quite clear that the "or" between
"variables" and "static data members" joins just those two
nominal groups; that everything else (const, initialized with
constant expressions) applies to both. The standard requires
that std::string::npos be const, and that it have the value -1,
but it doesn't require that it be initialized with a constant
expression. And in practice, the usual interpretation here is
that the initialization with the constant expression must be
visible in the translation unit which is being compiled.
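To illustrate the reading of 5.19 above with a small sketch: a const integral variable counts as an integral constant expression when it is initialized with a constant expression the compiler can see, and not otherwise. 'runtime_value' is a hypothetical function standing in for any initializer that is not a constant expression.

```cpp
const int n = 10;            // const, initialized with a constant expression
int buffer[n];               // OK: n is an integral constant expression

int runtime_value() { return 10; }   // hypothetical: not a constant expression
const int m = runtime_value();       // const, but initialized at run time
// int buffer2[m];           // ill-formed: m is not an integral constant expression
```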
Then there's 9.4.2/4:
"If a static data member is of const integral or const
enumeration type, its declaration in the class
definition can specify a constant initializer which
shall be an integral constant expression (5.19)."
The way I read this, my 'test::x', which is of integral
type, can be initialized in-class, if I use a constant
integral expression.

It can be. Yes.
Finally, 21.3 shows 'std::basic_string<T>::npos' as
static const size_type npos = -1;
(which is the only thing I was able to find resembling
a definition of what it should be).

And this is where it is unclear. How significant is the fact
that the value is given in this format? It is obvious that the
standard doesn't require textual identity with its class
definitions, but just what does it require? (And I'd consider
it a defect that there is nothing about npos outside of the
class definition.)
To me this looks like 'std::string::npos' is defined as
being '-1', which would make it an integral constant
expression, which in turn means it could be used to
initialize a constant static data member of integral type.
Which is what I thought I was doing.
I'm sure I missed something along the way (for example,
I have no idea what 14.7.1 has to do with all this), but
I wouldn't know what.

The real question is whether the initializer of npos is visible
when you want to use it as an integral constant expression. I
would sort of expect it to be in most implementations, because
this seems like the simplest way of doing it. But I'm far from
sure that it is required.
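One practical way to find out whether a given implementation makes that initializer visible, sketched for C++11 or later (which postdates this thread): 'static_assert' accepts only constant expressions, so the following translation unit compiles only if 'std::string::npos' is usable as one with the library at hand. With current mainstream libraries, which initialize npos inside the class, it is expected to compile.

```cpp
#include <string>

// Accepted only if std::string::npos can be used in a constant expression
// at this point, i.e. the library's initializer for npos is visible here.
static_assert(std::string::npos == static_cast<std::string::size_type>(-1),
              "std::string::npos is usable as a constant expression");
```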
 

Hendrik Schober

James said:
For one, there's 5.19 which says:
"An integral constant expression can involve [...]
static data members of integral or enumeration types
[...]."
So it seems that a static data member of integral type
is a constant integral expression.

Not necessarily. You cut part of the requirements. The
essential parts read "An integral constant expression can
involve [...] const variables or static data members of integral
or enumeration types initialized with constant
expressions,[...]" It's quite clear that the "or" between
"variables" and "static data members" joins just those two
nominal groups; that everything else (const, initialized with
constant expressions) applies to both. [...]

(Is it? Well, did I say I dread having to read the standard?)
Then there's 9.4.2/4:
"If a static data member is of const integral or const
enumeration type, its declaration in the class
definition can specify a constant initializer which
shall be an integral constant expression (5.19)."
The way I read this, my 'test::x', which is of integral
type, can be initialized in-class, if I use a constant
integral expression.

It can be. Yes.
Finally, 21.3 shows 'std::basic_string<T>::npos' as
static const size_type npos = -1;
(which is the only thing I was able to find resembling
a definition of what it should be).

And this is where it is unclear. How significant is the fact
that the value is given in this format? It is obvious that the
standard doesn't require textual identity with its class
definitions, but just what does it require? [...]

It doesn't? I would have thought it does.
The real question is whether the initializer of npos is visible
when you want to use it as an integral constant expression. I
would sort of expect it to be in most implementations, because
this seems like the simplest way of doing it. But I'm far from
sure that it is required.

In VC9's implementation (Dinkumware), 'npos' is initialized
after the class template's definition. But so it was in VC71
(Dinkumware, too), where the same code used to compile...

Anyway, thanks for taking the time to explain.

Schobi
 

Hendrik Schober

Paavo said:
Integral const expressions are useful mostly for declaring C-style array
sizes and for template specialization. Most probably you never want to
declare an array of std::string::npos elements. OTOH, it may be imaginable
that you want to specialize a template with that value, but I guess this
was not considered a sufficiently sound reason to require std::string::npos
to be a const expression.

In my case it was some open source lib which, for reasons I haven't
looked into, introduced its own string class while depending on
'std::string' to supply its 'npos' value...
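Paavo's remark about templates can be made concrete with a sketch. A non-type template argument must be an integral constant expression, so this is one place where the value really must be known at compile time. To avoid depending on how a particular library initializes npos, the sketch uses 'static_cast<string_size>(-1)'; the names 'string_size' and 'is_npos_value' are hypothetical.

```cpp
#include <string>

typedef std::string::size_type string_size;

// Primary template: any value other than the npos value.
template <string_size N>
struct is_npos_value {
    static const bool value = false;
};

// Explicit specialization selected only for string_size(-1), the value
// the standard requires std::string::npos to have.
template <>
struct is_npos_value<static_cast<string_size>(-1)> {
    static const bool value = true;
};
```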

Schobi
 

James Kanze

James Kanze wrote:

[...]
In VC9's implementation (Dinkumware), 'npos' is initialized
after the class template's definition. But so it was in VC71
(Dinkumware, too), where the same code used to compile...

That's interesting. If I understand correctly, you're saying
that the code is basically something like:

template< typename charT ... >
class basic_string
{
// ...
static size_t const npos ;
// ...
} ;

template< typename charT ... >
size_t const basic_string< charT ... >::npos = -1 ;

without any explicit specialization of basic_string.

If so, it's an interesting case. According to §14.6.4.1, the
point of instantiation of the members of the class "immediately
follows the namespace scope declaration or definition that
refers to the specialization." So given:

struct Test
{
static std::size_t const t = std::string::npos ;
} ;

, the instantiation of the static class member (which is what
contains the initialization) doesn't occur until after the end
of Test, so at the point of declaration of Test::t, the
initializer is not visible.

Should the compiler "see" this initialization or not? Quite
frankly, I think that the standard is ambiguous in this regard.
Or rather, it clearly specifies something that isn't
implementable, so we have to guess what is really meant. The
actual words are "[...] const variables or static data members of
integral or enumeration types initialized with constant
expressions." There's actually nothing there which constrains
the requirement to "visible" initializations, and taken
literally (and ignoring template issues and such), given
something like:

struct X
{
static int const i ;
} ;

, X::i should be an integral constant expression if it was
initialized with a constant expression, even if the actual
initialization were in a completely different translation unit.
This is obviously not the intent, of course, since it makes
separate compilation impossible. But what is the intent? If it
is that the initialization must be visible at the point where
the expression is used, then your example shouldn't compile,
because the initialization isn't visible until the static data
member of the template is initialized, after the class
definition which uses it. If it is that the initialization must
be visible in the translation unit, then your code is legal.

Sounds like a defect report is in order.
 

Hendrik Schober

James said:
James Kanze wrote:
[...]
In VC9's implementation (Dinkumware), 'npos' is initialized
after the class template's definition. But so it was in VC71
(Dinkumware, too), where the same code used to compile...

That's interesting. If I understand correctly, you're saying
that the code is basically something like:

template< typename charT ... >
class basic_string
{
// ...
static size_t const npos ;
// ...
} ;

template< typename charT ... >
size_t const basic_string< charT ... >::npos = -1 ;

Yep.
without any explicit specialization of basic_string.

I haven't found one. ('std::string' is a simple 'typedef' in
both versions.)
If so, it's an interesting case. According to §14.6.4.1, the
point of instantiation of the members of the class "immediately
follows the namespace scope declaration or definition that
refers to the specialization." So given:

struct Test
{
static std::size_t const t = std::string::npos ;
} ;

, the instantiation of the static class member (which is what
contains the initialization) doesn't occur until after the end
of Test, so at the point of declaration of Test::t, the
initializer is not visible.

That is the same as for member functions which are parsed
as if they were defined immediately following the class,
right?
Should the compiler "see" this initialization or not? Quite
frankly, I think that the standard is ambiguous in this regard.
Or rather, it clearly specifies something that isn't
implementable, so we have to guess what is really meant. The
actual words are "[...] const variables or static data members of
integral or enumeration types initialized with constant
expressions." There's actually nothing there which constrains
the requirement to "visible" initializations, and taken
literally (and ignoring template issues and such), given
something like:

struct X
{
static int const i ;
} ;

, X::i should be an integral constant expression if it was
initialized with a constant expression, even if the actual
initialization were in a completely different translation unit.

Which is what I thought.
This is obviously not the intent, of course, since it makes
separate compilation impossible.

Does it? If so, I'm missing why the actual value of your
'Test::t' needs to be known at compile-time. Is this what
is wanted? Is this specified somewhere?
But what is the intent? If it
is that the initialization must be visible at the point where
the expression is used, then your example shouldn't compile,
because the initialization isn't visible until the static data
member of the template is initialized, after the class
definition which uses it. If it is that the initialization must
be visible in the translation unit, then your code is legal.

Interestingly, Comeau compiles this
struct test1 {
static const int x = -1;
};

struct test2 {
static const int x = test1::x;
};
without any complaints. However, it compiles this
struct test1 {
static const int x;
};
const int test1::x = 1;

struct test2 {
static const int x = test1::x;
};
just as well.
Sounds like a defect report is in order.

Probably, but I won't do it. I have a hard time /reading/
this. No chance of me /writing/ any of it. :(

Schobi
 

James Kanze

James said:
James Kanze wrote:
[...]
In VC9's implementation (Dinkumware), 'npos' is initialized
after the class template's definition. But so it was in VC71
(Dinkumware, too), where the same code used to compile...
That's interesting. If I understand correctly, you're saying
that the code is basically something like:
template< typename charT ... >
class basic_string
{
// ...
static size_t const npos ;
// ...
} ;
template< typename charT ... >
size_t const basic_string< charT ... >::npos = -1 ;
without any explicit specialization of basic_string.
I haven't found one. ('std::string' is a simple 'typedef' in
both versions.)

That would be the case even if there were an explicit
specialization.

Out of curiosity, I generated the preprocessor output for a
program consisting of just an #include <string>, for all of the
compiler/library combinations I have available. None seem to
have an explicit specialization. They split with regards to
where npos is initialized, with those that initialize it in the
class never providing a definition (which results in undefined
behavior if you use it in a context where the object is
required---but compilers can define undefined behavior).
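A minimal sketch of the issue James mentions, with hypothetical names: a static const member initialized in the class can be used as a constant without further ado, but once it is odr-used (for instance, bound to a const reference, as 'std::max' does with its arguments), a namespace-scope definition must exist somewhere in the program.

```cpp
#include <algorithm>

struct S {
    static const int n = 3;   // in-class initializer: usable as a constant
};

// Namespace-scope definition, without an initializer. std::max takes its
// arguments by const reference, so the call below odr-uses S::n; without
// this definition the program would have undefined behavior (in practice,
// usually an unresolved-symbol error at link time).
const int S::n;

int larger_than_two()
{
    return std::max(S::n, 2);
}

// Using S::n as an array bound needs only its value, not its address.
int array_of_n[S::n];
```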
That is the same as for member functions which are parsed as
if they were defined immediately following the class, right?
Exactly.
Should the compiler "see" this initialization or not? Quite
frankly, I think that the standard is ambiguous in this regard.
Or rather, it clearly specifies something that isn't
implementable, so we have to guess what is really meant. The
actual words are "[...] const variables or static data members of
integral or enumeration types initialized with constant
expressions." There's actually nothing there which constrains
the requirement to "visible" initializations, and taken
literally (and ignoring template issues and such), given
something like:
struct X
{
static int const i ;
} ;
, X::i should be an integral constant expression if it was
initialized with a constant expression, even if the
actual initialization were in a completely different
translation unit.
Which is what I thought.

That's what it actually says. It presents a certain number of
difficulties for the implementation, however.
Does it? If so, I'm missing why the actual value of your
'Test::t' needs to be known at compile-time. Is this what is
wanted? Is this specified somewhere?

Think of something like:

struct X
{
static int const i ;
} ;

double a[ X::i ] ;

Do you really expect this to compile if the initialization value
of X::i is in a different compilation unit? An "integral
constant expression" is designed to support things like array
dimensions. There are a number of places where the language
requires integral constant expressions; places where the
compiler really does need to know the numeric value.

It's what you can do with an integral constant expression which
implies that the compiler must know its value.
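For contrast, a sketch in which the initializer is visible where the array is declared, so the compiler can size the array. The struct is renamed 'Y' (a hypothetical name) to keep it apart from the 'X' above.

```cpp
#include <cstddef>

struct Y {
    static int const i = 4;   // initializer visible in the class definition
};

double a[Y::i];               // OK: Y::i is an integral constant expression

// Element count, computed entirely at compile time.
const std::size_t a_length = sizeof a / sizeof a[0];
```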
Interestingly, Comeau compiles this
struct test1 {
static const int x = -1;
};
struct test2 {
static const int x = test1::x;
};
without any complaints. However, it compiles this
struct test1 {
static const int x;
};
const int test1::x = 1;

struct test2 {
static const int x = test1::x;
};
just as well.

In both cases, the initialization value of the variable is
visible at the point of use, so no problem. In the second case,
try moving the definition of test1::x to after the definition of
test2.
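A sketch of how to check this directly in C++11 or later: 'static_assert' accepts only constant expressions, so the assertion below compiles with the definition of 'test1::x' placed before 'test2'. Following the reasoning above, moving that definition after 'test2' would be expected to make the in-class initialization of 'test2::x' (and hence the assertion) ill-formed.

```cpp
struct test1 {
    static const int x;              // declared without an initializer
};
const int test1::x = 1;              // definition with initializer, visible below

struct test2 {
    static const int x = test1::x;   // OK: test1::x's initializer precedes this
};

// Compiles only because test2::x is an integral constant expression.
static_assert(test2::x == 1, "test2::x is a compile-time constant");
```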
Probably, but I won't do it. I have a hard time /reading/
this. No chance of me /writing/ any of it. :(

I've raised the question in the committee. We'll see what they
make of it.
 

James Kanze

Indeed. However, if test1 is a template, it fails. Seems a bit
inconsistent.

Not really. A template definition is not a variable definition;
only the instantiation of a template definition is a variable
definition. And the "point of instantiation" of a member
function or member static variable is immediately following the
namespace scope declaration which triggers the instantiation.
So if you write:

template< typename T >
struct test1
{
static int const x ;
} ;

template< typename T >
int const test1< T >::x = 1 ;

struct test2
{
static int const x = test1< int >::x ;
} ;

, the compiler inserts the instantiation of the class test1<int>
immediately before the definition of test2, and the
instantiation of the static data member immediately after, i.e.:

struct test1< int >
{
static int const x ;
} ;

struct test2
{
static int const x = test1< int >::x ;
} ;
int const test1< int >::x = 1 ;

And the value of test1< int >::x isn't visible when test2::x is
declared.
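One way around this, sketched here rather than taken from any particular library: put the initializer on the declaration inside the class template. Implicitly instantiating 'test1<int>' instantiates the member declarations, initializer included, so the value is already known where 'test2' uses it. This matches the libraries described earlier in the thread that initialize npos inside the class.

```cpp
template <typename T>
struct test1 {
    static int const x = 1;      // initializer is part of the class template
};

// Namespace-scope definition (no initializer), needed only if x is odr-used.
template <typename T>
int const test1<T>::x;

struct test2 {
    // OK: instantiating test1<int> makes the initializer visible here.
    static int const x = test1<int>::x;
};
```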
 

James Kanze

And the implementation of the standard library does not have
to be written in portable C++, something which even members of
the standards committee have to be reminded of.

That was more or less what I meant to imply with my comment in
parentheses. In this case, the implementation of
std::basic_string in the g++ library clearly contains undefined
behavior, according to the standard. But it works with g++,
because of what the compiler does in this particular case, and
that's all that matters; it's not an error in the library.
 

Hendrik Schober

Paavo said:
Hendrik Schober said:
struct Test
{
static std::size_t const t = std::string::npos ;
} ;
[...]

Does it? If so, I'm missing why the actual value of your
'Test::t' needs to be known at compile-time. Is this what
is wanted? Is this specified somewhere?

This is what all the hassle about const expressions is about. They have
to be known at compile-time in order to specify array sizes, etc. This is
why the special syntax of initializing a static integral constant inside
the class was invented in the first place.

Ah, I see. This actually makes sense to me.
Given that Test::t is not portably known at compile phase, one should
just use the regular out-of-class initialization for it, that's it.

As I said -- it wasn't my code. I changed it anyway.
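Paavo's suggestion, as a minimal sketch: drop the in-class initializer and initialize 'Test::t' at namespace scope, where no integral constant expression is required. The trade-off is that code which sees only the declaration of 'Test::t' cannot use it where a compile-time constant is required, such as an array bound.

```cpp
#include <cstddef>
#include <string>

struct Test {
    static std::size_t const t;   // declaration only, no in-class initializer
};

// Ordinary namespace-scope definition. The initializer here does not have
// to be a constant expression, so using std::string::npos is fine whether
// or not the library makes npos's initializer visible.
std::size_t const Test::t = std::string::npos;
```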
There are other people who claim that out-of-class initialization is
cumbersome and who propose to extend the in-class initialization rules,
to allow floating-point numbers for example. I am not sure if they have
also proposed to allow non-constant expressions in the initializer. Given
that the compilers/linkers have to cope with static template member
definitions in the headers anyway, it should be technically solvable.

Good. However, I doubt we'll see this soon...
[...]
Paavo

Schobi
 
