sorting 5million minute data: choose std::list or std::vector

M

Marc

Paavo said:
Bjarke Hammersholt Roune said:
In this case -1 is merely a shorter way of saying
std::numeric_limits<size_t>::max(), and once you've typed that a few
times you come to appreciate (size_t)-1. [...]
Once you realize that you
also realize that in fact npos could never be a valid index since
valid indexes are strictly less than the size of the array (or in this
case string) you are indexing into, and the maximum possible number of
addressable bytes is precisely std::numeric_limits<size_t>::max, or
npos.

Nope, the maximum number of addressable bytes is std::numeric_limits
<size_t>::max()+1.

What should s.length() return in that case?
 
B

Bjarke Hammersholt Roune

Once you realize that you
Nope, the maximum number of addressable bytes is std::numeric_limits
<size_t>::max()+1. Are you trying to claim that if I use a 8-bit unsigned
char for indexing I cannot address the full 256-byte array? Or are you
trying to say that an operating system cannot access the last byte in the
address space?
Correct, an OS can address all its memory because it does not address
that memory as indexes into std::string. However, strings are not like
raw arrays in that strings store their size. They store their size in
a size_t so the size cannot exceed npos. Hence the maximum possible
valid index is npos - 1.
Does not follow either.
You want std::string to support negative indexes?

Cheers
Bjarke Hammersholt Roune
 
M

Michael Doubez

On 12/02/2011 19:27, Juha Nieminen wrote:
Array indices are never negative.
Do you know what value std::string::npos usually is? That's right,
a value which would otherwise be a completely valid index, but which
is just agreed to mean "not any valid index". Usually this value is
size_t(-1). In other words, the natural value for npos would be -1,
but since the index type is unsigned, it cannot be used.
Even if we conceded that array indices are never negative, the result
of subtracting to array indices can be, in which case you have to do the
manual casting to avoid problems. (Why do you think ptrdiff_t is signed?)
Array indices are never negative; the result of subtracting array indices
cannot be used as an array index if the result would otherwise be negative
(if signed).
char str[24];
char* p_str =  str+12;
p_str[-12] ='A';

Touch ?  I don't think so.  Pointers are not arrays.  In C and C++ array
indexes are never negative.  An array is a distinct concept in C/C++ and
has a distinct type; "p_str" above could subsequently be made to point
to something else (other than an array).


Then lets take another example:
int array1[2][5];
int (&array2) [5] = array1[1];

array2[-1] = 42;

You will admit that array2 is a reference to a 'array of 5 int
element'.
And I can rightfully use a -1 indexing.
Not true; an array is a distinct concept, a pointer is a distinct concept:

template <typename T>
void is_same_i_dont_think_so(const T&, const T&) {}
...
is_same_i_dont_think_so(str, p_str);

int array3[5];
is_same_i_think_so(array2,array3);

Just because an array can decay to a pointer does not mean that arrays
are the same as pointers.

But the subscript operator on an array has exactely the same effect as
on a pointer: E1[E2] <=> *(E1+E2)

Now just to get the misunderstanding out, I think you are right for a
given definition of index: 'index' meaning the i-th element - where it
therefore cannot be negative.

But we commonly use 'index' to name the integral part of the subscript
operator and in this case, it can be negative.
 
J

James Kanze

On 12/02/2011 19:27, Juha Nieminen wrote:
Array indices are never negative.
Do you know what value std::string::npos usually is? That's right,
a value which would otherwise be a completely valid index, but which
is just agreed to mean "not any valid index". Usually this value is
size_t(-1). In other words, the natural value for npos would be -1,
but since the index type is unsigned, it cannot be used.
Even if we conceded that array indices are never negative, the result
of subtracting to array indices can be, in which case you have to do the
manual casting to avoid problems. (Why do you think ptrdiff_t is signed?)
Array indices are never negative; the result of subtracting array indices
cannot be used as an array index if the result would otherwise be negative
(if signed).
char str[24];
char* p_str = str+12;
p_str[-12] ='A';
Touch .
Touch ? I don't think so. Pointers are not arrays.

Red herring. Nobody said they were.
In C and C++ array
indexes are never negative. An array is a distinct concept in C/C++ and
has a distinct type; "p_str" above could subsequently be made to point
to something else (other than an array).

Arrays and pointers are two very distinct things, but in an
expression, there is no way to index into an array. You have to
convert the array to a pointer first.
Not true; an array is a distinct concept, a pointer is a distinct concept:
template <typename T>
void is_same_i_dont_think_so(const T&, const T&) {}
...
is_same_i_dont_think_so(str, p_str);
Just because an array can decay to a pointer does not mean that arrays
are the same as pointers.

I know that, and you know that I know that. On the other hand,
there is no "array index operator" which operates on an array.
To index into an array, you have to convert the array to
a pointer, and use pointer arithmetic.
 
P

Paul

James Kanze said:
On 12/02/2011 19:27, Juha Nieminen wrote:
Array indices are never negative.
Do you know what value std::string::npos usually is? That's right,
a value which would otherwise be a completely valid index, but which
is just agreed to mean "not any valid index". Usually this value is
size_t(-1). In other words, the natural value for npos would be -1,
but since the index type is unsigned, it cannot be used.
Even if we conceded that array indices are never negative, the
result
of subtracting to array indices can be, in which case you have to do
the
manual casting to avoid problems. (Why do you think ptrdiff_t is
signed?)
Array indices are never negative; the result of subtracting array
indices
cannot be used as an array index if the result would otherwise be
negative
(if signed).
char str[24];
char* p_str = str+12;
p_str[-12] ='A';
Touch .
Touch ? I don't think so. Pointers are not arrays.

Red herring. Nobody said they were.
In C and C++ array
indexes are never negative. An array is a distinct concept in C/C++ and
has a distinct type; "p_str" above could subsequently be made to point
to something else (other than an array).

Arrays and pointers are two very distinct things, but in an
expression, there is no way to index into an array. You have to
convert the array to a pointer first.
Not true; an array is a distinct concept, a pointer is a distinct
concept:
template <typename T>
void is_same_i_dont_think_so(const T&, const T&) {}
...
is_same_i_dont_think_so(str, p_str);
Just because an array can decay to a pointer does not mean that arrays
are the same as pointers.

I know that, and you know that I know that. On the other hand,
there is no "array index operator" which operates on an array.
To index into an array, you have to convert the array to
a pointer, and use pointer arithmetic.

--

Boost::multi_array also provides support for negative array bases.
Also note that it claims to be be more efficient than vector, which seems to
be Leighs preferred std::container.
"Boost MultiArray is a more efficient and convenient way to express
N-dimensional arrays than existing alternatives (especially the
std::vector<std::vector<...>> formulation of N-dimensional arrays). "
 
G

gwowen

Please cite the part of the Standard which says that indexing into an
array *requires* converting the array into a pointer.  Although the
array/pointer *equivalence* exists it is not a *requirement* that such
*equivalence* always be utilized by a compiler (implementation) as far
as I can tell (I could be wrong; I am not familiar with the entire
Standard).

Well, the "as if" rule applies as normal, so there is no requirement
on the compiler to implement array subscript as *((E1)+(E2)). All
that is required is that array subscripting E1[E2] behaves in a way
which is functionally identical to *((E1)+(E2)). If you don't want to
describe that as "utilized by a compiler", that's your perogative, but
it is literally a distinction without a difference.
 
P

Paul

Leigh Johnston said:
A compiler can optimize indexing into a local array as direct memory read
of the stack (i.e. offsetting the stack pointer) without performing any
pointer arithmetic at all.

Please cite the part of the Standard which says that indexing into an
array *requires* converting the array into a pointer. Although the
array/pointer *equivalence* exists it is not a *requirement* that such
*equivalence* always be utilized by a compiler (implementation) as far as
I can tell (I could be wrong; I am not familiar with the entire Standard).
8.3.4
"Except where it has been declared for a class (13.5.5), the subscript
operator [] is interpreted in such a way
that E1[E2] is identical to *((E1)+(E2)). "

This guarantees the following to work on *all* compliant implementations:

int* p_arr = new int[10]; //create an array
p_arr +=5; //offset to -5 based indexing
p_arr[-5]; //index the 1st element in the array.
 
M

Michael Doubez

On 13/02/2011 12:47, James Kanze wrote:
On 12/02/2011 19:27, Juha Nieminen wrote:
Array indices are never negative.
Do you know what value std::string::npos usually is? That's right,
a value which would otherwise be a completely valid index, but which
is just agreed to mean "not any valid index". Usually this value is
size_t(-1). In other words, the natural value for npos would be -1,
but since the index type is unsigned, it cannot be used.
Even if we conceded that array indices are never negative, the result
of subtracting to array indices can be, in which case you have to do the
manual casting to avoid problems. (Why do you think ptrdiff_t is signed?)
Array indices are never negative; the result of subtracting array indices
cannot be used as an array index if the result would otherwise be negative
(if signed).
char str[24];
char* p_str =  str+12;
p_str[-12] ='A';
Touch .
Touch ?  I don't think so.  Pointers are not arrays.  In C and C++ array
indexes are never negative.  An array is a distinct concept in C/C++ and
has a distinct type; "p_str" above could subsequently be made to point
to something else (other than an array).
Then lets take another example:
int array1[2][5];
int (&array2) [5] = array1[1];
array2[-1] = 42;
You will admit that array2 is a reference to a 'array of 5 int
element'.
And I can rightfully use a -1 indexing.

Indeed this is a case I did not consider when posting, however...

[snip]


But the subscript operator on an array has exactely the same effect as
on a pointer: E1[E2]<=>  *(E1+E2)

n3225 says:

"Because of the conversion rules that apply to +, if E1 is an
array and E2 an integer, then E1[E2] refers to the E2-th member of E1."

Well, the standard doesn't make clear what a i-th member is except for
the array declaration:
"[...]If the value of the constant expression is N, the array has N
elements numbered 0 to N-1[...]"

IMO it is just to make clear that although array is not first class
citizen, the notations are equivalent.
I agree.




Given what the standard says I am not convinced that -1 is an allowable
array index as an array does not have an element at the i-th position if
i is negative.

But the standard clearly states that arr <=> i[arr] <=> *(arr+i).
And you even quoted that the + operator causes the array to decay to
pointer, in this case, the integral part of subscript only applies to
a pointer, not an array.

[snip]
I think the standard could perhaps make it clear that for operations on
an array reference to a "sub-array" of a larger array the allowable
index can be out of bounds of the "sub-array".

I wonder what motivated the phrase you quoted. I found the same
wording in my copy of C99, so it must date from then back.

However, AFAIS there is no notion of out-of-bound for plain arrays and
there isn't any requirement for bound checking; in this regard the
standard seems to rely on pointer's semantic.
The fact that these array operations are probably fine on most if not
all sane implementations is a different issue.

Considering the mass of low level code relying on out of bound access
(like in packet decoding), that would be a bad move to forbid it.
 
G

gwowen

"E1[E2] designates the E2-th
element of E1 (counting from zero)."

which in my eyes seems to explicitly forbid negative array indices

If you think that explicitly forbids negative array indices, we do not
share a common understanding of the word "explicitly".
 
G

gwowen

int main()
{
   int array1[2][5];
   int (&array2) [5] = array1[1];
   array2[-1] = 42;

}

"ComeauTest.c", line 5: warning: subscript out of range
     array2[-1] = 42;

Comeau C++ also warns you if you write
int x = 7, y=0;
if(x=3) y=3;

"ComeauTest.c", line 5: warning: use of "=" where "==" may have been
intended
if(x=3) y =3;
"ComeauTest.c", line 3: warning: variable "x" was set but never used
int x = 7;

And, in all these both cases, Comeau is quite right to warn, it's
horrible style.
But the fact that its horrible style has no bearing on whether its
valid or well defined.
 
G

gwowen

Oh I have just about had enough of you and your feeble trolls.  *plonk*

Since you believe such knowledgeable people James Kanze and Juha
Nieminen (and, as far as I can tell, practically everyone else who's
ever disagreed with you) to be a troll, I shall wear that particular
badge with pride. (I fear I may be unworthy, for unlike yourself, I am
sufficiently self-aware to know that I don't know half as much as
either of those gentlemen).
 
P

Paul

Leigh Johnston said:
Please cite the part of the Standard which says that indexing into an
array *requires* converting the array into a pointer. Although the
array/pointer *equivalence* exists it is not a *requirement* that such
*equivalence* always be utilized by a compiler (implementation) as far
as I can tell (I could be wrong; I am not familiar with the entire
Standard).

Well, the "as if" rule applies as normal, so there is no requirement
on the compiler to implement array subscript as *((E1)+(E2)). All
that is required is that array subscripting E1[E2] behaves in a way
which is functionally identical to *((E1)+(E2)). If you don't want to
describe that as "utilized by a compiler", that's your perogative, but
it is literally a distinction without a difference.

Yes but the Standard also states that E1[E2] refers to the E2-th member of
E1 if E1 is an array which in my eyes disallows out-of-bounds access of E1
even if E1 is a "sub-array reference" (i.e. part of a larger N-dimensional
array) where the effective index is valid for the containing array.

The C standard also states the following:

"E1[E2] designates the E2-th
element of E1 (counting from zero)."

which in my eyes seems to explicitly forbid negative array indices
assuming my analysis of the text is not flawed (it is ambiguous at best
IMO).
So in your eyes Boost::multi_array must be invalid code.

It is not in the interest of C++ to impose a restiction on array indexing.
Can you give one valid reason why C++ should impose such a restriction on
the language?
 
P

Paul

Oh I have just about had enough of you and your feeble trolls. *plonk*

Since you believe such knowledgeable people James Kanze and Juha
Nieminen (and, as far as I can tell, practically everyone else who's
ever disagreed with you) to be a troll, I shall wear that particular
badge with pride. (I fear I may be unworthy, for unlike yourself, I am
sufficiently self-aware to know that I don't know half as much as
either of those gentlemen).
'''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''

You forgot about me , I am uber troll and you also don't know half as much
as me about *anything* :), especially Malawi cichlids, bespoke toucans and
the theory of screwdriver ratchet mechanisms.
..
 
I

Ian Collins

#include<string>
#include<iostream>

int main() {
std::string s = "abc";
// unsigned used because "indexes are not negative"
unsigned int k = s.find('x');
if (k==s.npos) {
std::cout<< "ok\n";
} else {
std::cout<< "FAIL\n";
}
}

This works fine if size_type is the same size than unsigned int (e.g.
32bit), but fails in a common 64-bit compilation.

Even worse, if you were to use unsigned long, it will work on Unix and
Unix like 64 bit systems, but fail on windows.
If npos was signed and -1, then this would work fine with "int k",
regardless of the size of std::string::size_type. Actually, it seems to
work fine with "int k" even now in a 64-bit MSVC compilation, but this
involves unsigned -> signed conversion of an out-of-range value and is
thus implementation-defined behavior.

Currently the correct solution is to use std::string::size_type instead
of int or unsigned int. However, this is an extra piece of detail to
follow and not so easy to get correct at first attempt. If I know that my
app is only dealing with ordinary strings, easily addressable by a plain
int, why should I need to take care about such details?

I've often wondered why std::string find doesn't return an iterator, to
be consistent with std::find and container find methods. Then there
would be no need for std::string::npos.
 
P

Paul

Leigh Johnston said:
What's the big deal? Just use std::string::size_type (I do). On a
related note typedef is a powerful C++ tool and a decent C++ programmer
should be using it a lot (I do).

People in the "use int everywhere" camp are lazy fuckers who no doubt
prefer "int" because it only involves typing 3 characters .. oh no teh
source codez doesn't fit in my editor windowz.. must use "int".

On the contrary its people who use STL everywhere who are lazy.
Do you think you alone are the only person who uses typedef? You didn't even
know what it meant a month a go ref your reply to the following:
You suggest you know assembly so surely you know how you create a class
type in ASM. You simply typedef the construct in whatever segment you
choose.

What the f**k are you talking about?
</quote>

You are in the use STL everywhere camp because you feel it is tried and
tested and safe and it often removes the burden of responsibility.
Personally I normally use unsigned int for array indexing but I don't think
this is a 'one solution for all situations'. You seem to prefer the 'one
solution for all situations' way of thinking i.e: always use std::vector,
always use std::string, arrays are always zero based indexed. This is a
person who can't be bother to check for bounds or do memory allocation and
IMO is a lazy thinker.
 
J

James Kanze

On 14 fév, 15:30, Leigh Johnston <[email protected]> wrote:

[...]
Then lets take another example:
int array1[2][5];
int (&array2) [5] = array1[1];
array2[-1] = 42;
You will admit that array2 is a reference to a 'array of
5 int element'. And I can rightfully use a -1 indexing.

Actually, according the C and (I think) the C++ standards, the
above is undefined behavior.

[...]
The standard clearly recognizes "arrays": pointer arithmetic is
only valid within an array. (For this purpose, a scalar is
treated as an array of 1 element.)
But the standard clearly states that arr <=> i[arr] <=>
*(arr+i).


More precisely, the standard defines + and *, and defines a
as being *(a+b). It does not define an indexing operator on
arrays.

Having said that, pointer arithmetic is only valid within an
array. The standard allows a pointer to carry around
information concerning the array into which it points---this is
commonly called a fat pointer, and do all sorts of bounds
checking at runtime. At least one implementation actually did
(and maybe still does) this; in practice, however, there is
a strong argument for using the same pointer representation as
the system API.
[snip]
I think the standard could perhaps make it clear that for operations on
an array reference to a "sub-array" of a larger array the allowable
index can be out of bounds of the "sub-array".
I wonder what motivated the phrase you quoted. I found the same
wording in my copy of C99, so it must date from then back.

I don't have access to any copy of the C standard at present, so
I can't quote the text, but from discussions with some of the
people involved, I know that the intent (at least of some of the
authors of the C standard) was that operations on a sub-array of
a larger array are not allowed to be out of bounds, that this
was undefined behavior.
However, AFAIS there is no notion of out-of-bound for plain arrays and
there isn't any requirement for bound checking; in this regard the
standard seems to rely on pointer's semantic.

There's no requirement for bounds checking, but it's not
forbidden either. At least one implementation (ObjectCenter)
did it; I don't know the current status. The semantics of C++
(pointer arithmetic and no array indexation) are such, however,
to make bound checking particularly expensive, so it isn't
widely used.
Considering the mass of low level code relying on out of bound access
(like in packet decoding), that would be a bad move to forbid it.

Formally, it is forbidden. The fact that it is widely used
probably means that no compiler will ever dare enforce it.
 
J

James Kanze

On 14/02/2011 14:40, James Kanze wrote:
A compiler can optimize indexing into a local array as direct memory
read of the stack (i.e. offsetting the stack pointer) without performing
any pointer arithmetic at all.
Please cite the part of the Standard which says that indexing into an
array *requires* converting the array into a pointer. Although the
array/pointer *equivalence* exists it is not a *requirement* that such
*equivalence* always be utilized by a compiler (implementation) as far
as I can tell (I could be wrong; I am not familiar with the entire
Standard).

§5.2.1:
A postfix expression followed by an expression in square
brackets is a postfix expression. One of the expressions
shall have the type “pointer to T” and the other shall
have enumeration or integral type. The result is an
lvalue of type “T.” The type “T” shall be
a completely-defined object type. The expression E1[E2]
is identical (by definition) to *((E1)+(E2)).

It's the definition of the [] operator. (Note that this allows
things like 1["abc"].)

After that, of course, there is the "as if" rule, which says
that the compiler can do pretty much anything it wants, as long
as the "observable behavior" of the program doesn't change. The
standard really doesn't impose much on the generated code.

In the case of user defined [], all bets are off, and the
language imposes no relationship between it and any other
operator. Most pre-standard array or vector classes I've seen
do define it as an index operator (and do not define any
associated pointer operations). In this regard, std::vector and
std::array are a bit special: although they do define [] as an
indexing operator, they also support pointers (and iterators)
into the array, and that pointer arithmetic on those pointers
works as it does with built-in arrays.
 
J

James Kanze

On 14/02/2011 22:38, Ian Collins wrote:

[...]
People in the "use int everywhere" camp are lazy fuckers who no doubt
prefer "int" because it only involves typing 3 characters .. oh no teh
source codez doesn't fit in my editor windowz.. must use "int".

One can see the level of your communication skills. Personally,
I don't think people like Stroustrup are "lazy fuckers".

The language designers intended for int to be used everywhere,
unless special considerations dictated otherwise. If you have
some concrete arguments as to why this situation has changed,
present them. But just calling the language designers "lazy
fuckers" isn't much of an argument.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

Forum statistics

Threads
474,143
Messages
2,570,821
Members
47,367
Latest member
mahdiharooniir

Latest Threads

Top