<string> to lowercase

kanze

A question: what does the double-colon mean in this context, and from
which library does the tolower function come?

It tells the compiler only to look in global namespace, not in std::.

In the actual code in question, according to the standard, there was no
function tolower in global namespace. In practice, however, most, if
not all implementations are broken in this regard, and the C version of
tolower (the one in <cctype>) is visible in global namespace.

Of course, he could have gotten the same effect (I think -- I'm not
really that sure of the standard here, and of course, no two compilers
do exactly the same thing anyway) by including <ctype.h>.
 
kanze

llewelly said:
(e-mail address removed) writes:
(e-mail address removed) (Brian Stone) wrote in message
The easiest way I know is to use the transform() function from the
<algorithm> library. Here's an example of how to apply this to a
string to convert the case...
#include <iostream>
#include <string>
#include <algorithm>
#include <functional>
#include <cctype>
using namespace std;
int main ( int argc, char **argv )
{
    string A = "TeStInG!";
    cout << A << endl; // output: TeStInG!
    transform ( A.begin(), A.end(), A.begin(), ptr_fun(::tolower) );
    cout << A << endl; // output: testing!
    transform ( A.begin(), A.end(), A.begin(), ptr_fun(::toupper) );
    cout << A << endl; // output: TESTING!
}
[snip]
In fact, the only variant which compiled (and that got a warning
from Sun CC) is yours, with ::tolower and ::toupper. And you are
playing on a bug in practically every implementation of <cctype>,
which exposes ::tolower and ::toupper (rather than only having them
available in std::, as the standard requires). [snip]

The 'using namespace std;' at global scope makes std::tolower
and std::toupper be available at global scope. (See 3.4.3.2)

Now if only someone would interpret the second paragraph of that section
into something I could understand.

There is a statement to the effect that "using-directives are ignored in
any namespace, including X, directly containing one or more declarations
of m". If there is a toupper declared in :: (the standard forbids it,
but all of the implementations I know do have one), then the
using-directives in :: should be ignored. Which would make this code
work. But it wouldn't work if the code, including the using directive,
were in another namespace.

This is all getting a bit beyond me. What I do know is that the
compilers I have (Sun CC 5.1 and g++ 3.4.0) do NOT treat tolower and
toupper as ambiguous here, even if I include <locale> (so that there are
any number of toupper and tolower functions available). Whereas they do
if I leave the :: out. So whatever the standard says...

The whole thing has gotten me totally confused. For the moment, I'll
just stick with my current solution :

#include <ctype.h>

namespace {
    struct ToUpper
    {
        char operator()( char ch ) const
        {
            return ::toupper( (unsigned char)ch ) ;
        }
    } ;
}

// ...

transform( a.begin(), a.end(), a.begin(), ToUpper() ) ;

It also has the advantage of not having undefined behavior when one of
the char values happens to be negative. And it should still work even
if an implementor eventually does come up with a conforming <cctype>.
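
For readers who want to try the functor approach, a self-contained variant might look like the sketch below. The helper name `to_upper_copy` is hypothetical (it is not in the post), and the sketch assumes single-byte characters in the default "C" locale. It includes `<ctype.h>` rather than `<cctype>`, so `::toupper` is declared in the global namespace by the header itself.

```cpp
#include <ctype.h>    // C header: declares ::toupper in the global namespace
#include <algorithm>  // std::transform
#include <string>

namespace {
    // Function object wrapping ::toupper. Converting to unsigned char first
    // avoids undefined behavior when a char value is negative.
    struct ToUpper
    {
        char operator()( char ch ) const
        {
            return static_cast<char>(
                ::toupper( static_cast<unsigned char>( ch ) ) );
        }
    };
}

// Hypothetical helper (not from the original post): returns an
// upper-cased copy of s, converting one char at a time.
std::string to_upper_copy( std::string s )
{
    std::transform( s.begin(), s.end(), s.begin(), ToUpper() );
    return s;
}
```

Because `ToUpper` is a class type and not an overloaded function name, `std::transform` has nothing to disambiguate, whichever `toupper` overloads the library headers happen to expose.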

(Of course, in practice I never use toupper or tolower anyway, because I
often have to deal with things like 'ß'. But I do use other functions
with other algorithms [...]
Even without the 'using namespace std', we have 17.4.3.1.3/5:
# Each function signature from the Standard C library declared
# with external linkage is reserved to the implementation for use
# as a function signature with both extern "C" and extern "C++"
# linkage, (168) or as a name of namespace scope in the global
# namespace.

That's a reservation of the name; it doesn't mean that the name is
visible. What it means is that if you define an `isupper' function at
global scope, your program might not work.
 
Gabriel Dos Reis

(e-mail address removed) writes:

[...]

| > The 'using namespace std;' at global scope makes std::tolower
| > and std::toupper be available at global scope. (See 3.4.3.2)
|
| Now if only someone would interpret the second paragraph of that section
| into something I could understand.

I think only compiler writers care about what it means ;-p

| There is a statement to the effect that "using-directives are ignored in
| any namespace, including X, directly containing one or more declarations
| of m". If there is a toupper declared in :: (the standard forbids it,
| but all of the implementations I know do have one), then the
| using-directives in :: should be ignored. Which would make this code
| work. But it wouldn't work if the code, including the using directive,
| were in another namespace.

The executive summary is that if you write X::m, then any actual
declaration of "m" in "X" hides any other declarations that would have
been found by searching namespaces nominated, directly or indirectly,
in using-directives reachable from X. If no actual declaration
for "m" is made in "X", then the result of the name lookup will be
that of applying the rule recursively to the nominated namespaces.

| This is all getting a bit beyond me. What I do know is that the
| compilers I have (Sun CC 5.1 and g++ 3.4.0) do NOT treat tolower and
| toupper as ambiguous here, even if I include <locale> (so that there are
| any number of toupper and tolower functions available). Whereas they do
| if I leave the :: out. So whatever the standard says...

because the declaration of ::toupper "hides" other declarations for
toupper in the used namespace std.
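
The hiding rule summarized in this post can be checked with user-defined namespaces, which avoids any doubt about what a library header has declared. The following is only an illustrative sketch; the names `N`, `X`, `lookup_m` and `lookup_n` are made up for the example.

```cpp
namespace N {
    int m = 1;
    int n = 10;
}

namespace X {
    using namespace N;  // nominates N for qualified lookups through X
    int m = 2;          // declared directly in X
}

// X declares its own m, so the using-directive is ignored for X::m.
int lookup_m() { return X::m; }  // X's own m

// X declares no n, so lookup continues into the nominated namespace N.
int lookup_n() { return X::n; }  // N::n, found through the using-directive
```

With these definitions `lookup_m()` yields 2 and `lookup_n()` yields 10: a declaration made directly in `X` wins, and the nominated namespace is consulted only when `X` has none.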
 
kanze

Gabriel Dos Reis said:
(e-mail address removed) writes:

| > The 'using namespace std;' at global scope makes std::tolower and
| > std::toupper be available at global scope. (See 3.4.3.2)
| Now if only someone would interpret the second paragraph of that
| section into something I could understand.
I think only compiler writers care about what it means ;-p

I think that there's a lot like that in the standard :). Maybe one day,
I'll get the chance to write a compiler, instead of just complaining
about them. (The current crop of compilers is pretty lousy -- they
always do what I say, and never what I mean :).)
| There is a statement to the effect that "using-directives are
| ignored in any namespace, including X, directly containing one or
| more declarations of m". If there is a toupper declared in :: (the
| standard forbids it, but all of the implementations I know do have
| one), then the using-directives in :: should be ignored. Which
| would make this code work. But it wouldn't work if the code,
| including the using directive, were in another namespace.
The executive summary is that if you write X::m, then any actual
declaration of "m" in "X" hides any other declarations that would have
been found by searching namespaces nominated, directly or indirectly,
in using-directives reachable from X. If no actual declaration
for "m" is made in "X", then the result of the name lookup will be
that of applying the rule recursively to the nominated namespaces.

In sum, what I had intuitively expected (and what the compilers I use
seem to implement). So why did the other posters say that "using
namespace std" meant that ::toupper would find a toupper in std::.

But wait a minute. If I say explicitly that the only toupper that I
want considered is the one in global namespace (e.g. ::toupper), and
there isn't one in global namespace, the compiler will look elsewhere?
That doesn't sound intuitively right -- I would expect an error.
| This is all getting a bit beyond me. What I do know is that the
| compilers I have (Sun CC 5.1 and g++ 3.4.0) do NOT treat tolower and
| toupper as ambiguous here, even if I include <locale> (so that there
| are any number of toupper and tolower functions available). Whereas
| they do if I leave the :: out. So whatever the standard says...
because the declaration of ::toupper "hides" other declarations for
toupper in the used namespace std.

Except that, of course, if the libraries were conforming, there wouldn't
have been a ::toupper in global namespace :).

Anyhow, I still contend that the only "correct" solution using transform
involves something like:
boost::bind( (char (*)(char, std::locale const&))&std::toupper,
_1, std::locale() )
For a pretty weak definition of correct, even then -- any toupper that
doesn't convert "Maße" to "MASSE" is irremediably broken, and won't be
acceptable to some of my customers.
 
Gabriel Dos Reis

(e-mail address removed) writes:

[...]

| > | There is a statement to the effect that "using-directives are
| > | ignored in any namespace, including X, directly containing one or
| > | more declarations of m". If there is a toupper declared in :: (the
| > | standard forbids it, but all of the implementations I know do have
| > | one), then the using-directives in :: should be ignored. Which
| > | would make this code work. But it wouldn't work if the code,
| > | including the using directive, were in another namespace.
|
| > The executive summary is that if you write X::m, then any actual
| > declaration of "m" in "X" hides any other declarations that would have
| > been found by searching namespaces nominated, directly or indirectly,
| > in using-directives reachable from X. If no actual declaration
| > for "m" is made in "X", then the result of the name lookup will be
| > that of applying the rule recursively to the nominated namespaces.
|
| In sum, what I had intuitively expected (and what the compilers I use
| seem to implement). So why did the other posters say that "using
| namespace std" meant that ::toupper would find a toupper in std::.

I cannot speak for them and I hope they will clarify what they meant.
And to tell the truth, I've lost most of those postings.

| But wait a minute. If I say explicitly that the only toupper that I
| want considered is the one in global namespace (e.g. ::toupper), and
| there isn't one in global namespace, the compiler will look elsewhere?

Yes. This is called in TC++PL3 "namespace composition" -- you trick
people into thinking that the "m" they're referring to in X::m comes
from your "X", whereas you may have just "stolen" it through
using-directives. E.g.

namespace N {
    int m;
}

namespace X {
    using namespace N; // compose X with N
}

int main()
{
    return X::m; // finds N::m;
}

| That doesn't sound intuitively right -- I would expect an error.

I guess, it depends on how you look at the "::".
A view is that it is a scope resolution operator, i.e. it
disambiguates when there is a scope problem -- either there is no
visible declaration or there are too many visible declarations from
different scopes.

| > | This is all getting a bit beyond me. What I do know is that the
| > | compilers I have (Sun CC 5.1 and g++ 3.4.0) do NOT treat tolower and
| > | toupper as ambiguous here, even if I include <locale> (so that there
| > | are any number of toupper and tolower functions available). Whereas
| > | they do if I leave the :: out. So whatever the standard says...
|
| > because the declaration of ::toupper "hides" other declarations for
| > toupper in the used namespace std.
|
| Except that, of course, if the libraries were conforming, there wouldn't
| have been a ::toupper in global namespace :).

Yes, but you (James) won't quibble me on that; right? :)

| Anyhow, I still contend that the only "correct" solution using transform
| involves something like:
| boost::bind( (char (*)(char, std::locale const&))&std::toupper,
| _1, std::locale() )

I really do dislike the cast notation in front of std::toupper. It is
not a cast, it is an abuse of notation (manual overload resolution).
People should not be tricked into thinking that someone is doing a
weird cast from std::toupper; let's sequester cast notations to cast.
 
kanze

Gabriel Dos Reis said:
(e-mail address removed) writes:

| > | There is a statement to the effect that "using-directives are
| > | ignored in any namespace, including X, directly containing one or
| > | more declarations of m". If there is a toupper declared in ::
| > | (the standard forbids it, but all of the implementations I know
| > | do have one), then the using-directives in :: should be ignored.
| > | Which would make this code work. But it wouldn't work if the
| > | code, including the using directive, were in another namespace.
| > The executive summary is that if you write X::m, then any actual
| > declaration of "m" in "X" hides any other declarations that would
| > have been found by searching namespaces nominated, directly or
| > indirectly, in using-directives reachable from X. If no actual
| > declaration for "m" is made in "X", then the result of the name
| > lookup will be that of applying the rule recursively to the
| > nominated namespaces.
| In sum, what I had intuitively expected (and what the compilers I
| use seem to implement). So why did the other posters say that
| "using namespace std" meant that ::toupper would find a toupper in
| std::.
I cannot speak for them and I hope they will clarify what they meant.
And to tell the truth, I've lost most of those postings.

I may have misunderstood what they were trying to say, but I got the
impression from them that what they were saying was that the « using
namespace std » was why the compiler was finding a toupper in std
namespace.
| But wait a minute. If I say explicitly that the only toupper that I
| want considered is the one in global namespace (e.g. ::toupper), and
| there isn't one in global namespace, the compiler will look
| elsewhere?
Yes. This is called in TC++PL3 "namespace composition" -- you trick
people into thinking that the "m" they're referring to in X::m comes
from your "X", whereas you may have just "stolen" it through
using-directives. E.g.
namespace N {
    int m;
}
namespace X {
    using namespace N; // compose X with N
}
int main()
{
    return X::m; // finds N::m;
}
| That doesn't sound intuitively right -- I would expect an error.
I guess, it depends on how you look at the "::".

Yes. It occurred to me shortly after posting that this is sort of the
way the :: works within a class hierarchy. It will look deeper than the
classname given, but only if it doesn't find the name in the first
class.
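
The class-hierarchy behaviour alluded to here can be sketched as follows. This only illustrates the analogy; the types `Base` and `Derived` and the helper functions are invented for the example.

```cpp
struct Base {
    static int id()           { return 1; }
    static int only_in_base() { return 10; }
};

struct Derived : Base {
    static int id() { return 2; }  // hides Base::id
};

// Derived declares id itself, so Derived::id never reaches Base::id.
int derived_id() { return Derived::id(); }

// Derived declares no only_in_base, so lookup continues into Base.
int derived_only_in_base() { return Derived::only_in_base(); }
```

As with namespaces, the qualifier names the scope where lookup starts, not necessarily the scope where the name is finally found.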

The analogy is far from exact, but it is enough to make me suspicious of
my "intuitively". The situation has enough variety that there is no
intuitiveity.
A view is that it is a scope resolution operator, i.e. it
disambiguates when there is a scope problem -- either there is no
visible declaration or there are too many visible declarations from
different scopes.
| > | This is all getting a bit beyond me. What I do know is that the
| > | compilers I have (Sun CC 5.1 and g++ 3.4.0) do NOT treat tolower
| > | and toupper as ambiguous here, even if I include <locale> (so
| > | that there are any number of toupper and tolower functions
| > | available). Whereas they do if I leave the :: out. So whatever
| > | the standard says...
| > because the declaration of ::toupper "hides" other declarations
| > for toupper in the used namespace std.
| Except that, of course, if the libraries were conforming, there
| wouldn't have been a ::toupper in global namespace :).
Yes, but you (James) won't quibble me on that; right? :)

Well, I don't think that it's your fault, even if you actively work on
one of the libraries:).

Realistically, I wonder if the standard doesn't ask too much here. Maybe
it should make it unspecified whether <cctype> introduces the names into
global scope or not. Theoretically, I find what the standard requires
much better, but that doesn't do me any good if all of the implementors
ignore the requirement.
| Anyhow, I still contend that the only "correct" solution using
| transform involves something like:
| boost::bind( (char (*)(char, std::locale const&))&std::toupper,
| _1, std::locale() )
I really do dislike the cast notation in front of std::toupper. It is
not a cast, it is an abuse of notation (manual overload resolution).
People should not be tricked into thinking that someone is doing a
weird cast from std::toupper; let's sequester cast notations to cast.

I said correct, and I even put correct in quotes; I certainly didn't say
that it was elegant, nor that I liked it:). For once, in fact, I agree
with you 100%.
From a practical point of view: it's what the standard says, and most, if
not all, implementations seem to be conformant on this particular point.
From an even more practical point of view: it's overly complex, totally
illegible, and so not really maintainable. In production code, I'd
always write a custom function which used the standard function, so that
overload resolution would handle the issue automatically.
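
One way to read the "custom function" suggestion is the sketch below. It is an assumption about what such a wrapper could look like, not the poster's actual code; the names `to_upper_char` and `upper_copy` are hypothetical. The wrapper calls the two-argument `std::toupper` from `<locale>`, so ordinary overload resolution on the call picks the locale-aware version without a cast.

```cpp
#include <algorithm>  // std::transform
#include <locale>     // two-argument std::toupper, std::locale
#include <string>

// Hypothetical wrapper: a single, non-overloaded function, so it can be
// passed to std::transform without a cast or boost::bind.
char to_upper_char( char ch )
{
    // std::locale() is a copy of the current global locale. Building it
    // per character is simple but not efficient; a functor holding the
    // locale would avoid the repeated construction.
    return std::toupper( ch, std::locale() );
}

// Hypothetical helper: returns an upper-cased copy of s.
std::string upper_copy( std::string s )
{
    std::transform( s.begin(), s.end(), s.begin(), to_upper_char );
    return s;
}
```

This still converts one `char` at a time, so it shares the limitation mentioned earlier in the thread: it cannot map "ß" to "SS".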
 
llewelly

Gabriel Dos Reis said:
(e-mail address removed) writes:

| > | There is a statement to the effect that "using-directives are
| > | ignored in any namespace, including X, directly containing one or
| > | more declarations of m". If there is a toupper declared in ::
| > | (the standard forbids it, but all of the implementations I know
| > | do have one), then the using-directives in :: should be ignored.
| > | Which would make this code work. But it wouldn't work if the
| > | code, including the using directive, were in another namespace.
| > The executive summary is that if you write X::m, then any actual
| > declaration of "m" in "X" hides any other declarations that would
| > have been found by searching namespaces nominated, directly or
| > indirectly, in using-directives reachable from X. If no actual
| > declaration for "m" is made in "X", then the result of the name
| > lookup will be that of applying the rule recursively to the
| > nominated namespaces.
| In sum, what I had intuitively expected (and what the compilers I
| use seem to implement). So why did the other posters say that
| "using namespace std" meant that ::toupper would find a toupper in
| std::.
I cannot speak for them and I hope they will clarify what they meant.
And to tell the truth, I've lost most of those postings.

I may have misunderstood what they were trying to say, but I got the
impression from them that what they were saying was that the « using
namespace std » was why the compiler was finding a toupper in std
namespace.

That's my understanding. I get it from 3.4.3/4 :

# A name prefixed by the unary scope operator :: (5.1) is looked
# up in global scope, in the translation unit where it is
# used. The name shall be declared in global namespace scope or
# shall be a name whose declaration is visible in global scope
# because of a using-directive (3.4.3.2). The use of :: allows a
# global name to be referred to even if its identifier has been
# hidden (3.3.7).

Note that I think 17.4.3.1.3/4 (see below) allows 'toupper' to be
found in the global namespace even without the 'using namespace
std'; AFAICT the 'using namespace std' serves only to require
that the name found have the semantics you expect for 'toupper',
as opposed to implementation-defined semantics.

[snip]
Realistically, I wonder if the standard doesn't ask too much here. Maybe
it should make it unspecified whether <cctype> introduces the names into
global scope or not.
[snip]

I think the standard already goes farther than that, see 17.4.3.1.3/4:

# Each name from the Standard C library declared with external
# linkage is reserved to the implementation for use as a name with
# extern "C" linkage, both in namespace std and in the global
# namespace.

There has been some dispute here and on comp.std.c++ about what this
means, but my interpretation (which I would rather be wrong) is
that if a name such as 'toupper' is *not* brought into the global
namespace by a sequence such as:

#include<cstddef>
using namespace std;

'::toupper' has implementation-defined semantics.

Someday, I am going to make time to test snippets such as:

//note - no headers #included.
extern "C" double qsort(double,double,double,double);

int main()
{
double d= qsort(1.0,1.0,1.0,1.0);
return (int)d;
}

on several different implementations. (On g++ 3.4 on freebsd, it
compiles with no errors or warnings, and dumps core at runtime. )
Theoretically, I find what the standard requires
much better,

I think you're mistaken about what it requires, though I wish you
were right.
but that doesn't do me any good if all of the implementors
ignore the requirement.

Agreed. I do wonder if #including the C++ <cxxx> headers is actually
more dangerous than #including the older equivalent inherited from
C89.
I said correct, and I even put correct in quotes; I certainly didn't say
that it was elegant, nor that I liked it:). For once, in fact, I agree
with you 100%.

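From a practical point of view: it's what the standard says, and most, if
not all, implementations seem to be conformant on this particular point.
From an even more practical point of view: it's overly complex, totally
illegible, and so not really maintainable.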

In some cases (though not the above) I find it cleaner to specify the
template arguments to the function template explicitly.
In production code, I'd
always write a custom function which used the standard function, so that
overload resolution would handle the issue automatically.
[snip]

Agreed.
 
kanze

llewelly said:
(e-mail address removed) writes:
Gabriel Dos Reis said:
(e-mail address removed) writes:
[...]
| > | There is a statement to the effect that "using-directives
| > | are ignored in any namespace, including X, directly
| > | containing one or more declarations of m". If there is a
| > | toupper declared in :: (the standard forbids it, but all of
| > | the implementations I know do have one), then the
| > | using-directives in :: should be ignored. Which would make
| > | this code work. But it wouldn't work if the code, including
| > | the using directive, were in another namespace.
| > The executive summary is that if you write X::m, then any
| > actual declaration of "m" in "X" hides any other declarations
| > that would have been found by searching namespaces nominated,
| > directly or indirectly, in using-directives reachable from
| > X. If no actual declaration for "m" is made in "X", then the
| > result of the name lookup will be that of applying the rule
| > recursively to the nominated namespaces.
| In sum, what I had intuitively expected (and what the compilers
| I use seem to implement). So why did the other posters say that
| "using namespace std" meant that ::toupper would find a toupper
| in std::.
I cannot speak for them and I hope they will clarify what they
meant. And to tell the truth, I've lost most of those postings.
I may have misunderstood what they were trying to say, but I got
the impression from them that what they were saying was that the
« using namespace std » was why the compiler was finding a
toupper in std namespace.
That's my understanding. I get it from 3.4.3/4 :
# A name prefixed by the unary scope operator :: (5.1) is looked
# up in global scope, in the translation unit where it is
# used. The name shall be declared in global namespace scope or
# shall be a name whose declaration is visible in global scope
# because of a using-directive (3.4.3.2). The use of :: allows a
# global name to be referred to even if its identifier has been
# hidden (3.3.7).

There are two things involved here. First, although the standard
doesn't allow it, there is a toupper in global namespace. Second, there
are a number of toupper in std:: (since in the implementation I use,
<iostream> does indirectly pull in <locale>). What I don't understand
is this: having done "using namespace std;",

- if I write ::toupper, and the toupper in global namespace isn't
available, do I find the toupper in std::, and

- if the above is true, and I do have a toupper in global namespace,
why isn't the function call ambiguous.

(I think that to really understand this, I'm going to have to find time
to write up some simple examples of my own. It's difficult following
toupper, because you are never 100% sure what the library may have done
with it.)
Note that I think 17.4.3.1.3/4 (see below) allows 'toupper' to be
found in the global namespace even without the 'using namespace
std'; AFAICT the 'using namespace std' serves only to require
that the name found have the semantics you expect for 'toupper',
as opposed to implementation-defined semantics.

I'm not sure. It certainly means that I cannot define a toupper of my
own in global namespace.

I think the intent is just a practical one (for compiler implementers).
Regardless of the namespace in which I declare or define an ``extern
"C"'' function, the name must appear to the linker as if the function
were defined in the global namespace, since C can't do it any
differently. If the name isn't reserved to the implementation, I could
legally define a function of this name myself, and the linker would take
it instead of the one from the C library. That doesn't mean that my
code can see the name outside of std::.

But there may be other "special features" of ``extern "C"'' that I'm not
familiar with, which do make it visible.
[snip]
Realistically, I wonder if the standard doesn't ask too much here.
Maybe it should make it unspecified whether <cctype> introduces the
names into global scope or not.
[snip]

I think the standard already goes farther than that, see 17.4.3.1.3/4:
# Each name from the Standard C library declared with external
# linkage is reserved to the implementation for use as a name
# with extern "C" linkage, both in namespace std and in the
# global namespace.

The name is reserved to the implementation. That doesn't mean that I
can see it. Or does it?

I think a clarification is in order.
There has been some dispute here and on comp.std.c++ about what this
means, but my interpretation (which I would rather be wrong) is
that if a name such as 'toupper' is *not* brought into the global
namespace by a sequence such as:
#include<cstddef>
using namespace std;
'::toupper' has implementation-defined semantics.

What interests me is what happens without the "using namespace std;".
Is the compiler still allowed to find toupper?
Someday, I am going to make time to test snippets such as:
//note - no headers #included.
extern "C" double qsort(double,double,double,double);
int main()
{
double d= qsort(1.0,1.0,1.0,1.0);
return (int)d;
}
on several different implementations. (On g++ 3.4 on freebsd, it
compiles with no errors or warnings, and dumps core at runtime. )

As far as I can see, you've violated §17.4.3.1.3/5, so your code has
undefined behavior. (Or does this paragraph only apply if you include
at least one standard header?)
I think you're mistaken about what it requires, though I wish you
were right.

I think it is open to interpretation.

Personally, for the moment, I stick with the good old <ctype.h> -- at
least I know what I'm getting:). Sort of, because of course,
Agreed. I do wonder if #including the C++ <cxxx> headers is actually
more dangerous than #including the older equivalent inherited from
C89.

That's been my fear. I don't know whether it is really justified by the
standard, but it does seem that what actual implementations do is less
clear than in the case of <ctype.h>. (Of course, one of the actual
implementations I still have to deal with is g++ 2.95.2. Which
complicates the issue because of its particular handling of std::.)
 
Gabriel Dos Reis

(e-mail address removed) writes:

| | > (e-mail address removed) writes:
|
| > > | > >> (e-mail address removed) writes:
|
| > >> [...]
|
| > >> | > | There is a statement to the effect that "using-directives
| > >> | > | are ignored in any namespace, including X, directly
| > >> | > | containing one or more declarations of m". If there is a
| > >> | > | toupper declared in :: (the standard forbids it, but all of
| > >> | > | the implementations I know do have one), then the
| > >> | > | using-directives in :: should be ignored. Which would make
| > >> | > | this code work. But it wouldn't work if the code, including
| > >> | > | the using directive, were in another namespace.
|
| > >> | > The executive summary is that if you write X::m, then any
| > >> | > actual declaration of "m" in "X" hides any other declarations
| > >> | > that would have been found by searching namespaces nominated,
| > >> | > directly or indirectly, in using-directives reachable from
| > >> | > X. If no actual declaration for "m" is made in "X", then the
| > >> | > result of the name lookup will be that of applying the rule
| > >> | > recursively to the nominated namespaces.
|
| > >> | In sum, what I had intuitively expected (and what the compilers
| > >> | I use seem to implement). So why did the other posters say that
| > >> | "using namespace std" meant that ::toupper would find a toupper
| > >> | in std::.
|
| > >> I cannot speak for them and I hope they will clarify what they
| > >> meant. And to tell the truth, I've lost most of those postings.
|
| > > I may have misunderstood what they were trying to say, but I got
| > > the impression from them that what they were saying was that the
| > > « using namespace std » was why the compiler was finding a
| > > toupper in std namespace.
|
| > That's my understanding. I get it from 3.4.3/4 :
|
| > # A name prefixed by the unary scope operator :: (5.1) is looked
| > # up in global scope, in the translation unit where it is
| > # used. The name shall be declared in global namespace scope or
| > # shall be a name whose declaration is visible in global scope
| > # because of a using-directive (3.4.3.2). The use of :: allows a
| > # global name to be referred to even if its identifier has been
| > # hidden (3.3.7).
|
| There are two things involved here. First, although the standard
| doesn't allow it, there is a toupper in global namespace. Second, there
| are a number of toupper in std:: (since in the implementation I use,
| <iostream> does indirectly pull in <locale>). What I don't understand
| is this: having done "using namespace std;",
|
| - if I write ::toupper, and the toupper in global namespace isn't
| available, do I find the toupper in std::, and

Yes, I already explained why.

| - if the above is true, and I do have a toupper in global namespace,
| why isn't the function call ambiguous.

Because the declaration at the global scope "hides" other declarations
available through used namespaces. See my previous explanation.
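
Both answers can be reproduced without involving the library at all. The sketch below uses a made-up namespace `lib` in place of `std`, and made-up functions `f` and `g` in place of `toupper`; it is meant as an illustration of the lookup rule, not as a statement about any particular implementation of `<cctype>`.

```cpp
namespace lib {
    int f() { return 1; }
    int g() { return 2; }
}

int f() { return 0; }  // declared directly in the global namespace

using namespace lib;   // makes lib's names visible at global scope

// ::f has a declaration in the global namespace, so the using-directive
// is ignored and lib::f is not considered: no ambiguity.
int call_qualified_f() { return ::f(); }

// There is no ::g declared directly at global scope, so qualified lookup
// falls back to the nominated namespace and finds lib::g.
int call_qualified_g() { return ::g(); }

// An unqualified call such as f() would be ambiguous here, because
// unqualified lookup sees both ::f and lib::f:
// int call_unqualified_f() { return f(); }  // error if uncommented
```

This matches the reported compiler behaviour: `::toupper` compiles when a global `toupper` exists, while the unqualified name is rejected as ambiguous.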

| (I think that to really understand this, I'm going to have to find time
| to write up some simple examples of my own. It's difficult following
| toupper, because you are never 100% sure what the library may have done
| with it.)

Probably.
 
Gabriel Dos Reis

| (e-mail address removed) writes:
|
| > | >> (e-mail address removed) writes:
| >
| >> [...]
| >
| >> | > | There is a statement to the effect that "using-derectives are
| >> | > | ingored in any namspace, including X, directly containing one or
| >> | > | more declarations of m". If there is a toupper declared in ::
| >> | > | (the standard forbids it, but all of the implementations I know
| >> | > | do have one), then the using-directives in :: should be ignored.
| >> | > | Which would make this code work. But it wouldn't work if the
| >> | > | code, including the using directive, where in another namespace.
| >
| >> | > The executive summary is that if you write X::m, then any actual
| >> | > declaration of "m" in "X" hides any other declarations that would
| >> | > have been found by searching namespaces nominated, directly or
| >> | > indirectly, in using declarations reachable from X. If no actually
| >> | > declaration for "m" is made in "X", then the result of the name
| >> | > lookup will be that of applying the rule recursively to the
| >> | > nominated namespaces.
| >
| >> | In sum, what I had intuitively expected (and what the compilers I
| >> | use seem to implement). So why did the other posters say that
| >> | "using namespace std" meant that ::toupper would find a toupper in
| >> | std::.
| >
| >> I cannot speak for them and I hope they will clarify hat they meant.
| >> And to tell the truth, I've lost most of those postings.
| >
| > I may have misunderstood what they were trying to say, but I got the
| > impression from them that what they were saying was that the « using
| > namespace std » was why the compiler was finding a toupper in std
| > namespace.
|
| That's my understanding. I get it from 3.4.3/4 :
|
| # A name prefixed by the unary scope operator :: (5.1) is looked
| # up in global scope, in the translation unit where it is
| # used. The name shall be declared in global namespace scope or
| # shall be a name whose declaration is visible in global scope
| # because of a using-directive (3.4.3.2). The use of :: allows a
| # global name to be referred to even if its identifier has been
| # hidden (3.3.7).


What this says is that with ::toupper, you find those in the
global namespace *if* there are corresponding declarations there;
*otherwise*, you find those available through searching of used
namespaces (directly or indirectly). In particular, you do NOT find
both categories.
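
The "otherwise" branch can be sketched as follows. All namespace and function names here are hypothetical, and the example assumes nothing beyond the lookup rule as described above: a namespace that declares the name itself stops the search, while one that does not passes the search on to the namespaces it nominates, transitively.

```cpp
#include <cassert>

namespace inner  { int m() { return 10; } }

// middle declares no m of its own; it only nominates inner.
namespace middle { using namespace inner; }

// X declares no m either, so X::m is searched for in middle, then in inner.
namespace X { using namespace middle; }

// Y nominates the same namespace but also declares its own m,
// so Y::m never looks past Y.
namespace Y {
    using namespace middle;
    int m() { return 20; }
}

int from_x() { return X::m(); }  // resolves to inner::m
int from_y() { return Y::m(); }  // resolves to Y::m

void demo_fallthrough() {
    assert(from_x() == 10);
    assert(from_y() == 20);
}
```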

| Note that I think 17.4.3.1.3/4 (see below) allows 'toupper' to be
| found in the global namespace even without the 'using namespace
| std';

No, it does not. What that paragraph means is that a user cannot
define them at global scope, or declare them with C language linkage.

| AFAICT the 'using namespace std' serves only to require
| that the name found have the semantics you expect for 'toupper',
| as opposed to implementation-defined semantics.
|
| [snip]
| > Realistically, I wonder if the standard doesn't ask too much here. Maybe
| > it should make it unspecified whether <cctype> introduces the names into
| > global scope or not.
| [snip]
|
| I think the standard already goes farther than that, see 17.4.3.1.3/4:
|
| # Each name from the Standard C library declared with external
| # linkage is reserved to the implementation for use as a name with
| # extern "C" linkage, both in namespace std and in the global
| # namespace.
|
| There has been some dispute here and on comp.std.c++ about what this
| means, but my interpretation (which I would rather be wrong) is
| that if a name such as 'toupper' is *not* brought into the global
| namespace by a sequence such as:
|
| #include<cstddef>
| using namespace std;
|
| '::toupper' has implementation-defined semantics.

I don't see how you derive that. Certainly <cstddef> is not described
to define names in the global namespace.

| Someday, I am going to make time to test snippets such as:
|
| //note - no headers #included.
| extern "C" double qsort(double,double,double,double);

Strictly speaking, you get into undefined behaviour territory.
Precisely because of the very paragraph you quote above.
 
L

LR

Zombie said:
Hi, what is the correct way of converting contents of a <string> to
lowercase?
There are no methods of <string> class to do this

I know I'm a little late responding, but I was reading this thread and
browsing through my compiler docs and the standard, and I was wondering
if there would be something wrong with:

std::string t("MiXeD cAsE");
std::ctype<std::string::value_type>().tolower(t.begin(),t.end());

Or have I missed something really obvious? Character sets other than
ASCII 0...127?

LR
 
S

sanjay

I think you didn't even try compiling your code. std::ctype::~ctype is
protected and won't let you compile.

Thanks,
Sanjay.
 
K

kanze

LR said:
Zombie wrote:
I know I'm a little late responding, but I was reading this
thread and browsing through my compiler docs and the standard,
and I was wondering if there would be something wrong with:
std::string t("MiXeD cAsE");
std::ctype<std::string::value_type>().tolower(t.begin(),t.end());

Or have I missed something really obvious?

Just that ctype::tolower doesn't take string iterators as
parameters. (I'm also a bit dubious about using the default
constructor of ctype -- I'm not sure what the resulting class is
supposed to do. The whole point of ctype is that you get it
from a defined locale.)
Character sets other than ASCII 0...127?

That shouldn't be a problem. The conversion is obviously
locale specific, but for a given locale, and an adequately
loose definition of work, it should work.
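
For reference, a minimal sketch of the range form of `ctype::tolower` with the facet obtained from a locale rather than constructed directly. The function name `to_lower_in_place` is made up for this example, and taking `&s[0]` as a pointer to contiguous storage assumes C++11 or later.

```cpp
#include <cassert>
#include <locale>
#include <string>

// Lower-cases s in place using the ctype facet that belongs to loc.
void to_lower_in_place(std::string& s, const std::locale& loc = std::locale()) {
    if (s.empty()) return;
    // The facet comes from the locale; it is never constructed here,
    // which also sidesteps the protected destructor.
    const std::ctype<char>& ct = std::use_facet<std::ctype<char> >(loc);
    // The range overload takes pointers, not string iterators.
    ct.tolower(&s[0], &s[0] + s.size());
}

void demo_ctype() {
    std::string t("MiXeD cAsE");
    to_lower_in_place(t, std::locale::classic());
    assert(t == "mixed case");
}
```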
 
M

msb222

I have one minor point to add to this discussion... While the tolower
and std::tolower, and locale stuff is all well and good for
lower-casing text strings in general, there is a non-trivial
performance cost to this operation. Depending on the implementations I
have measured including VC++7, which does case-insensitive symbol
lookups at runtime I am able to squeeze almost 15% faster performance
out of the code by using a cache strategy.

What we did was to write our own wrapper function structure called
fast_tolower... at startup time, an array is created of 256 bytes (if
you are using wide characters, you would end up with a 65536 array of
wchar_t, which would be about 128k of memory... which seems like a lot,
but our program does massive crunching on the order of gigs so it's
worth it). That then gets populated with the results of tolower() for all
of the values in that byte range.

So all the fast_tolower lookup does is a constant array access, and
then we use that with std::transform. It's definitely overkill for a
small little program, but if you have a large system working on lots of
text or symbols that need to be case insensitive (dealing with one
locale of course) I think it's a good idea to do it this way.

One way you could generalize this cache is to have something possibly
templatized on the locale info, and then there would be one static
cache for each locale you are actually using if you care about case
info in that locale

Marc
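
A rough sketch of the table approach described above, for plain `char` only. This is not the poster's actual code: the `lower_table` and `lowered` names are invented for the example, and the table is filled from whatever C locale is in effect when static initialization runs.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <climits>
#include <string>

namespace {
    // One entry per possible unsigned char value.
    struct lower_table {
        unsigned char map[UCHAR_MAX + 1];
        lower_table() {
            for (int c = 0; c <= UCHAR_MAX; ++c)
                map[c] = static_cast<unsigned char>(std::tolower(c));
        }
    };
    // Built once, during static initialization of this translation unit.
    const lower_table g_lower;
}

// Function object whose call is a single array access.
struct fast_tolower {
    char operator()(char c) const {
        // Index through unsigned char so negative char values stay in range.
        return static_cast<char>(g_lower.map[static_cast<unsigned char>(c)]);
    }
};

// Works on any iterator range; here it is applied to a std::string copy.
std::string lowered(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), fast_tolower());
    return s;
}

void demo_table() {
    assert(lowered("TeStInG!") == "testing!");
}
```

Whether this beats the library's own `tolower` depends on the implementation, so it is worth measuring before adopting it.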
 
L

LR

Just that ctype::tolower doesn't take string iterators as
parameters. (I'm also a bit dubious about using the default
constructor of ctype -- I'm not sure what the resulting class is
supposed to do. The whole point of ctype is that you get it
from a defined locale.)


Thanks for responding, and also to Sanjay, who pointed out the problem
of std::ctype's dtor being protected. I completely missed that. But my
compiler (VC++ 6.0) did compile and run the code I posted. A more recent
MS compiler doesn't compile it. I also tried simply constructing a
std::ctype<char> at www.comeaucomputing.com and it seemed to compile.


LR
 
K

kanze

msb222 said:
I have one minor point to add to this discussion... While the
tolower and std::tolower, and locale stuff is all well and
good for lower-casing text strings in general, there is a
non-trivial performance cost to this operation.

I imagine that that is why the functions taking char* (instead
of just a single char) were added to the locale mechanism. In
most cases, it probably doesn't matter, but there are
conceivably cases where a virtual function call (as opposed to
an inlined function with no call) might be too expensive.
Depending on the implementations I have measured including
VC++7, which does case-insensitive symbol lookups at runtime I
am able to squeeze almost 15% faster performance out of the
code by using a cache strategy.
What we did was to write our own wrapper function structure
called fast_tolower... at startup time, an array is created of
256 bytes

If this isn't what the implementation of ctype does, there's
something wrong with it. At least in the specialization for
char.
(if you are using wide characters, you would end up with a
65536 array of wchar_t, which would be about 128k of
memory... which seems like a lot, but our program does massive
crunching on the order of gigs so it's worth it). That then
gets populated with the results of tolower() for all of the values
in that byte range.

To be really useful, wchar_t should be at least 21 bits. On the
machines I usually work on, it's 32 bits. And over 4 billion 4
byte elements isn't going to cut it.

In practice, of course, most of the code blocks don't have
upper/lower case, so using an additional level of indirection,
and only implementing the full table for blocks with at least
one upper/lower would probably be acceptable.
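
The extra level of indirection could look something like the sketch below. Everything in it is an assumption made for illustration: the `sparse_case_map` class, the 256-entry block size, and the use of `char32_t` (C++11) in place of `wchar_t`. The mappings are supplied by the caller, since where they come from is a separate question.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

namespace {
    const std::size_t kBlockSize  = 256;
    const std::size_t kBlockCount = 0x110000 / kBlockSize;  // covers U+0000..U+10FFFF
    const std::size_t kNoBlock    = static_cast<std::size_t>(-1);
}

class sparse_case_map {
public:
    sparse_case_map() : index_(kBlockCount, kNoBlock) {}

    // Records from -> to, allocating a full block only on first use.
    void add(char32_t from, char32_t to) {
        if (from > 0x10FFFF) return;
        const std::size_t b = from / kBlockSize;
        if (index_[b] == kNoBlock) {
            std::array<char32_t, 256> block;
            for (std::size_t i = 0; i < kBlockSize; ++i)
                block[i] = static_cast<char32_t>(b * kBlockSize + i);  // identity
            index_[b] = blocks_.size();
            blocks_.push_back(block);
        }
        blocks_[index_[b]][from % kBlockSize] = to;
    }

    // Two array accesses at most; blocks with no case pairs cost one.
    char32_t lookup(char32_t c) const {
        if (c > 0x10FFFF) return c;
        const std::size_t slot = index_[c / kBlockSize];
        return slot == kNoBlock ? c : blocks_[slot][c % kBlockSize];
    }

    std::size_t allocated_blocks() const { return blocks_.size(); }

private:
    std::vector<std::size_t> index_;                  // first level
    std::vector<std::array<char32_t, 256> > blocks_;  // second level
};

void demo_sparse() {
    sparse_case_map m;
    m.add(U'A', U'a');
    m.add(0x0410, 0x0430);              // Cyrillic capital A -> small a
    assert(m.lookup(U'A') == U'a');
    assert(m.lookup(0x4E2D) == 0x4E2D); // no block allocated for this one
    assert(m.allocated_blocks() == 2);
}
```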
So all the fast_tolower lookup does is a constant array
access, and then we use that with std::transform. It's
definitely overkill for a small little program, but if you
have a large system working on lots of text or symbols that
need to be case insensitive (dealing with one locale of
course) I think it's a good idea to do it this way.

If profiling shows that your compiler's implementation of
ctype::tolower( char*, char const* ) is flakey, it's definitely a
solution to be considered. You might also want to consider it
simply because it means that you can pass the function arbitrary
iterators, rather than only char*'s -- if it avoids an otherwise
unnecessary copy, you have a speed advantage there as well.

Of course, if you want an internationalized environment, and to
do the conversion correctly, it gets a lot more complicated, and
the standard functions quickly become unusable (since toupper
will sometimes return two characters for a single lower case,
and things like that).
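
To make the one-to-two case concrete, here is a deliberately tiny sketch. The function `to_upper_expanding` is hypothetical and handles only ASCII letters plus the German sharp s (U+00DF, upper-cased as "SS"); a real solution would need full Unicode case-mapping data. It assumes C++11 for `char32_t` and `std::u32string`.

```cpp
#include <cassert>
#include <string>

// Upper-cases text into a new string, because the result may be longer
// than the input. A char-to-char function cannot express that.
std::u32string to_upper_expanding(const std::u32string& in) {
    std::u32string out;
    out.reserve(in.size());
    for (char32_t c : in) {
        if (c == 0x00DF) {                       // LATIN SMALL LETTER SHARP S
            out += U"SS";                        // one character becomes two
        } else if (c >= U'a' && c <= U'z') {
            out += static_cast<char32_t>(c - U'a' + U'A');
        } else {
            out += c;                            // everything else untouched
        }
    }
    return out;
}

void demo_expanding() {
    const std::u32string in = U"stra\u00DFe";
    const std::u32string up = to_upper_expanding(in);
    assert(up == U"STRASSE");
    assert(up.size() == in.size() + 1);
}
```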
 
B

Ben Hutchings

msb222 wrote:
What we did was to write our own wrapper function structure called
fast_tolower... at startup time, an array is created of 256 bytes
So all the fast_tolower lookup does is a constant array access, and
then we use that with std::transform.
<snip>

If you look at any implementation of std::tolower, I'm fairly sure
you'll find it does the same! The speed advantage of fast_tolower
probably comes either from inlining (if tolower is not inline), static
linking to the array (if the C library is dynamically linked) or the
avoidance of conversions.
 
Hi guys,
I use this tool to convert a string to lowercase, pretty useful when you need to do it right away
stringfunction.com/string-lowercase.html
8)
David
 