sorting the input

B

Ben Bacarisse

arnuld said:
I have shown you the output and it seems ok to me. Did you find something
wrong ?

Yes. Not in the sense of "it won't work" but wrong in the sense of
not right! If I do this:

#define MAX_SIZE 1000
...
char line[MAX_SIZE];
...
if (fgets(line, 902, fp) != NULL) ...

is it wrong? I think so, but the code will compile and execute quite
safely. That is all I was pointing out (see below).
I can't understand the 13.8 of FAQs: http://c-faq.com/lib/qsort1.html

/* compare strings via pointers */
int pstrcmp(const void *p1, const void *p2)
{
return strcmp(*(char * const *)p1, *(char * const *)p2);
}

parameters are made void* but in fact we are passing char**, which is 2 levels of
indirection. and what about the cast:

(char* const*)p1

we are casting a <pointer to void> to <pointer to a const pointer to char>
, right ?

Yup. qsort is given an array to sort. All it know about this data is
how big each element is and how many there are:

+----------+----------+----------+----------+----
base-----> | element1 | element2 | element3 | element4 | ...
+----------+----------+----------+----------+----

If qsort needs to compare element1 with element4, it passes a pointer
to these elements to the comparison function. These pointers must be
void * because qsort has no type information at all. Your comparison
function gets two pointers like this:

+----------+----------+----------+----------+----
base-----> | element1 | element2 | element3 | element4 | ...
+----------+----------+----------+----------+----
^ ^
| |
+---------------+ |
| |
int compare(const void *p1, const void * p2)

In your case, the elements are actually char *s -- they point to
strings like this:

+---+---+---+---+---+ +---+---+---+---+---+
| a | b | c | \n| \0| | d | e | f | \n| \0|
+---+---+---+---+---+ +---+---+---+---+---+
^ ^
| |
+---|------+----------+----------+---|------+----
base-----> | element1 | element2 | element3 | element4 | ...
+----------+----------+----------+----------+----
^ ^
| |
+---------------+ |
| |
int compare(const void *p1, const void * p2)

so although p1 is declared void * what it *really* is is a char ** --
it points to a char *. To get at the strings to compare you must
convert p1 to a char ** and access the char * it finds there (by
applying the * operator to the result). We make everything const, but
just blank that out to follow the intent:

int compare(const void *p1, const void * p2)
{
const char *const *cp1 = p1; /* char **cp1 = p1; in effect */
const char *const *cp2 = p2;
return strcmp(*cp1, *cp2);
}

*cp1 is 'element1' -- a char * pointing at "abc\n" and *cp2 is a char
pointing at "def\n". The code in the FAQ does the same (with one less
const) all in a single expression. I prefer to avoid the cast. Does
that help?
incorrect ?

Yes, you pass ARR_SIZE as the max parameter, but temp_arr is of size
STR_SIZE. Now, as it happens, max is smaller than STR_SIZE so this is
safe but fgets shoudl be passed the size of the thing is is to use,
not some other size that relates to something else!
you mean, by default, if I type: <malloc( size_arr )> then it will be
converted to <malloc(size_arr * 1)> ?

Not really. If you write x it is not converted to x * 1 even though
they are the same. If you write malloc(size_arr) you get size_arr
bytes. Multiplying by sizeof(char) is pointless because sizeof(char)
is 1. Some people like to write it out because it reminds them that
are allocating space for chars, but I prefer not to do that.
 
B

Ben Bacarisse

Richard Heathfield said:
Richard Tobin said:

In effect, [sizeof] instructs the compiler to find the number of bytes
used by the argument. That depends on the *type* of the argument,
not its value. And you don't need to evaluate an expression to
determine its type: if a and b are ints, a+b is an int regardless
of the values of a and b; if x is an array of doubles, x[5] is always
a double, and so on.

The compiler does some calculations to determine the type and hence
the size, so you could say it evaluates *something*, but it doesn't
evaluate the argument itself.

s/argument/operand/g

Sorry to be picky, but too few people realise that sizeof is an operator
rather than a function, so I think it's better to be precise in this
situation. (Yes, I know that you know. Just a typo, understood.)

That may be being a bit over-picky. The distinction is such a fine
one that the standard gets it wrong. Although the definition of
"argument" states that these are what get passed in function calls,
the text later refers to operators taking arguments.
 
A

arnuld

You can omit sizeof(char). It is 1 by definition. I can see why some
people might want a size there, but if you are one of them then you
should use the c.l.c-approved idiom:

if( (p = malloc(size_arr * sizeof *p)) )

where is "p" pointing to ? nowhere . so you can't dereference it yet.
 
A

arnuld

I have tried the strcmp function from FAQ:

http://c-faq.com/lib/qsort1.html

it fails to do its job. It Segfaults :( . If I remove its call from the
program, my program compiles and runs fine:


/* write a program to read a set of lines from input and sort them
* and then print them.
*
* version 1.0
*/


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum MAXLINES { ARR_SIZE = 100, STR_SIZE = 1000 };

char* arr_of_ptr[ARR_SIZE];
/* char arr_of_char[STR_SIZE]; */

int readlines( char**, const int );
void printlines( char** );
int p_strcmp( const void*, const void* );


/* main() will simply call the other functions to do the job */
int main( void )
{
if( readlines( arr_of_ptr, ARR_SIZE ) > 0 )
{
qsort( arr_of_ptr, ARR_SIZE, sizeof( char* ), p_strcmp );
printlines( arr_of_ptr );
}


return 0;
}


/* 1) read lines till we get the NULL,
* 2) store those lines into an array of characters <arr_of_lines>,
* 3) pointer of arry of pointers <arr_of_ptr> will point to the
* individual elements of array of characters <arr_of_lines>,
*
*/

int readlines( char* arr_of_ptr[], const int max )
{
char *p, **p_arrptr;
int num_lines, size_arr;
char temp_arr[STR_SIZE];

num_lines = 0;
p_arrptr = arr_of_ptr;

while( fgets(temp_arr, max, stdin) && num_lines < max )
{
size_arr = strlen( temp_arr ) + 1;
if( (p = malloc( size_arr * sizeof( char ))) )
{
strcpy( p, temp_arr );
*p_arrptr++ = p;
++num_lines;
}

}

return num_lines;
}



/* it will simply print the lines pointed to by the elements of
* arrays of pointers <arr_of_ptr>.
*
*/
void printlines( char* arr_of_ptr[] )
{
printf("\n-------------------------\n");
while( *arr_of_ptr )
{
printf("%s", *arr_of_ptr++ );
}
}


/* compare 2 strings using pointers */
int p_strcmp( const void* pv1, const void* pv2 )
{
return strcmp( *(char* const*)pv1, *(char* const*)pv2 );
}
 
A

arnuld

He isn't dereferencing it: sizeof does not evaluate its operand (except
for one tiny corner-case in C99 that doesn't apply here), so no
dereferencing is taking place.

so, sizeof() operator does not evaluate its operands. It finds the number
of bytes without evaluating anything ?
 
A

arnuld

The second argument to qsort is the number of array elements to
process. While there are ARR_SIZE elements, most of them contain NULL
and should not be processed. readlines told you how many elements to
process but you threw that information away.


Ohh.. No.. Why on Earth I forgot to use readlines :(


int main( void )
{
const int read_num = readlines( arr_of_ptr, ARR_SIZE );

if( read_num )
{
qsort( arr_of_ptr, read_num, sizeof( char* ), p_strcmp );
printlines( arr_of_ptr );
}



People have already told you that max is not appropriate in the call
to fgets. If you won't listen, why post?

That was my typo, I think I need a break.



It is a nit but this will include the '\n' that fgets inserts for a
line shorter than the buffer. Most seem to think it is worth
eliminating.

fgets() stops reading when it will encounter a '\n' and this is what
reading a line means. So it will retain every newline and in the end will
put '\0'. I think its fine.

I did not get what you mean by "inserts for a line shorter than the
buffer". The last 2 elements of the malloc-ed array will always be '\n'
and '\0', not matter whether the line is small or large, as long it is of
STR_SIZE, I have defined for the limit.



When either v1 or v2 is NULL, this invoked undefined behavior. A
segfault is one of the nicer manifestations of same.


I just needed to replace the "max" with "STR_SIZE" and it works fine now :)


/* write a program to read a set of lines from input and sort them
* and then print them.
*
* version 2.0
*/


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum MAXLINES { ARR_SIZE = 100, STR_SIZE = 1000 };

char* arr_of_ptr[ARR_SIZE];
/* char arr_of_char[STR_SIZE]; */

int readlines( char**, const int );
void printlines( char** );
int p_strcmp( const void*, const void* );


/* main() will simply call the other functions to do the job */
int main( void )
{
const int read_num = readlines( arr_of_ptr, STR_SIZE );

if( read_num )
{
qsort( arr_of_ptr, read_num, sizeof( char* ), p_strcmp );
printlines( arr_of_ptr );
}


return 0;
}


/* 1) read lines till we get the NULL,
* 2) store those lines into an array of characters <arr_of_lines>,
* 3) pointer of arry of pointers <arr_of_ptr> will point to the
* individual elements of array of characters <arr_of_lines>,
*
*/

int readlines( char* arr_of_ptr[], const int max )
{
char *p, **p_arrptr;
int num_lines, size_arr;
char temp_arr[STR_SIZE];

num_lines = 0;
p_arrptr = arr_of_ptr;

while( fgets(temp_arr, max, stdin) && num_lines < max )
{
size_arr = strlen( temp_arr ) + 1;
if( (p = malloc( size_arr * sizeof( char ))) )
{
strcpy( p, temp_arr );
*p_arrptr++ = p;
++num_lines;
}

}

return num_lines;
}



/* it will simply print the lines pointed to by the elements of
* arrays of pointers <arr_of_ptr>.
*
*/
void printlines( char* arr_of_ptr[] )
{
printf("\n-------------------------\n");
while( *arr_of_ptr )
{
printf("%s", *arr_of_ptr++ );
}
}


/* compare 2 strings using pointers */
int p_strcmp( const void* pv1, const void* pv2 )
{
return strcmp( *(char* const*)pv1, *(char* const*)pv2 );
}

========== OUTPUT =================
/home/arnuld/programs/C $ gcc -ansi -pedantic -Wall -Wextra 5-7.c
/home/arnuld/programs/C $ ./a.out
like
zzz
abc
 
A

arnuld

Yup. qsort is given an array to sort. All it know about this data is
how big each element is and how many there are:


so, <char** cp> can be stored in <void* vp>. I thought a ** (pointer to
pointer) needs a ** type to store but here we are storing a ** into a
single *. are they of same size ?



+----------+----------+----------+----------+----
base-----> | element1 | element2 | element3 | element4 | ...
+----------+----------+----------+----------+----
....[SNIP]...
int compare(const void *p1, const void * p2) {
const char *const *cp1 = p1; /* char **cp1 = p1; in effect */
const char *const *cp2 = p2;
return strcmp(*cp1, *cp2);
}
}
*cp1 is 'element1' -- a char * pointing at "abc\n" and *cp2 is a char
pointing at "def\n". The code in the FAQ does the same (with one less
const) all in a single expression. I prefer to avoid the cast. Does
that help?


helps in knowing that you think like me on this aspect. I also do not like
casts ;)

thanks for all the effort you put in :)
 
A

arnuld

In effect, it instructs the compiler to find the number of bytes
used by the argument. That depends on the *type* of the argument,
not its value. And you don't need to evaluate an expression to
determine its type: if a and b are ints, a+b is an int regardless
of the values of a and b; if x is an array of doubles, x[5] is always
a double, and so on.

The compiler does some calculations to determine the type and hence
the size, so you could say it evaluates *something*, but it doesn't
evaluate the argument itself.


I never expected this to be like this way. I am quite surprised at this
C feature. Are there any other C operators who do not evaluate their
operands ?
 
B

Barry Schwarz

so, <char** cp> can be stored in <void* vp>. I thought a ** (pointer to
pointer) needs a ** type to store but here we are storing a ** into a
single *. are they of same size ?

You don't care about the size. A void* is guaranteed convertible to
or from any other type of object pointer. One type of object pointer
is char*. Another is char**. Another is char***. Any of these can
be assigned to a void* and the compiler will generate the appropriate
code to perform the implicit conversion.


Remove del for email
 
B

Barry Schwarz

snip



fgets() stops reading when it will encounter a '\n' and this is what
reading a line means. So it will retain every newline and in the end will
put '\0'. I think its fine.

I did not get what you mean by "inserts for a line shorter than the
buffer". The last 2 elements of the malloc-ed array will always be '\n'
and '\0', not matter whether the line is small or large, as long it is of

If the amount of data processed by fgets fills your buffer, then fgets
will not have room for the '\n'. In this case the last two characters
will be the (N-1)_th character fgets read followed by a '\0'. There
will be no '\n' in the string.

There is a '\n' in the string returned from fgets only when the amount
of data read leaves at least two unused bytes in the buffer.

snip some more


Remove del for email
 
B

Barry Schwarz

On Wed, 23 Apr 2008 08:43:13 +0500, arnuld
snip
I came up with the Static Version of this function with 1 error and I
can't seem to find a way either to remove that error or to convert it to a
dynamic version of the program:

You really should get your previous version of the program working
before you fly off on a tangent.
/* write a program to read a set of lines from input and sort them
* and then print them.
*
* version 1.0

Surely it should be at least 2.0.
*/


#include <stdio.h>
#include <stdlib.h>
#include <string.h>

enum MAXLINES { ARRSIZE = 100 };

char* arr_of_ptr[ARRSIZE];
char arr_of_lines[ARRSIZE];

You must recognize that there is only room for one line here.

Are you sure that 100 characters is large enough to hold your lines.
int readlines( char**, char*, const int max );
void printlines( char** );


/* main() will simply call the other functions to do the job */

int main( void )
{
if( readlines( arr_of_ptr, arr_of_lines, ARRSIZE ) > 0 )
{
qsort( arr_of_ptr, ARRSIZE, sizeof( char* ) ); /* Line 26 */

How many arguments does qsort require? How many are you providing?
Which one did you forget?

And you still have the wrong number of elements to sort. Why did you
throw the return from readlines away?
printlines( arr_of_ptr );
}
else
{
fprintf( stderr, "error: out of memory\n" );
}

return 0;
}


/* 1) read lines till we get the NULL,
* 2) store those lines into an array of characters <arr_of_lines>,
* 3) pointer of arry of pointers <arr_of_ptr> will point to the
* individual elements of array of characters <arr_of_lines>,
*
*/

int readlines( char* arr_of_ptr[], char arr_of_lines[], int max )
{
int num_lines;

char temp_arr[ARRSIZE];

num_lines = 0;

while( fgets(temp_arr, max, stdin) )
{
strcpy( arr_of_lines, temp_arr );
*arr_of_ptr++ = arr_of_lines++;

It would make commenting on your code a lot easier if you didn't reuse
the object names. arr_of_ptr will be correctly incremented so it
points to the second element in your global pointer array. What will
arr_of_lines be incremented to point to?
++num_lines;
}

return num_lines;
}


/* it will simply print the lines pointed to by the elements of
* arrays of pointers <arr_of_ptr>.
*
*/
void printlines( char* arr_of_ptr[] )
{
while( *arr_of_ptr != '\0' )
{
puts( *arr_of_ptr++ );
}
}
================ OUTPUT ====================
[arnuld@raj C]$ gcc -ansi -pedantic -Wall -Wextra 5-7.c
5-7.c: In function `main':
5-7.c:27: error: too few arguments to function `qsort'
[arnuld@raj C]$



I know that qsort needs a compare function and I found this in FAQ:

http://www.c-faq.com/lib/qsort1.html


but a statement like: *(char * const *)p1

is totally beyond my capability.


Remove del for email
 
S

santosh

Barry said:
If the amount of data processed by fgets fills your buffer, then fgets
will not have room for the '\n'. In this case the last two characters
will be the (N-1)_th character fgets read followed by a '\0'. There
will be no '\n' in the string.

There is a '\n' in the string returned from fgets only when the amount
of data read leaves at least two unused bytes in the buffer.

This is system specific but at least on some systems you can enter data
two bytes less than the buffer and still dispense with the newline if
you terminate your data by two successive end-of-file sequences. I
think a better way to phrase what you noted above might be to say that
there is a '\n' in the string returned from fgets only if a newline
character is read on or before byte N-1 is read where N is the size of
the buffer supplied to fgets.
 
S

santosh

arnuld said:
yes, this is exactly what I am thinking about. It i snot about newline
in particular it is about the size of the input. Right now, It is
fixed to STR_SIZE limit. Can we make the program to accept any amount
of input ?

Yes. If you had been even cursorily following this group you'd have
known that it's a frequently discussed problem and several solutions by
the regulars have been periodically posted. Richard Heathfield
maintains a page on this issue with links to his and other solutions.

<http://www.cpax.org.uk/prg/writings/fgetdata.php>
 
B

Barry Schwarz

This is system specific but at least on some systems you can enter data
two bytes less than the buffer and still dispense with the newline if
you terminate your data by two successive end-of-file sequences. I

We agree that an end-of-file sequence is a system specific concept.
think a better way to phrase what you noted above might be to say that
there is a '\n' in the string returned from fgets only if a newline
character is read on or before byte N-1 is read where N is the size of
the buffer supplied to fgets.

My system does not terminate a "line" with a '\n' character in the
file. For files with fixed length records, the line ends after the
specified number of bytes have been processed. For variable length
records, the length of the line is stored in a 16-bit field before the
text of the line and the system processes that many bytes. Therefore,
any behavior dependent on reading a '\n' is specific to the way your
system stores files.

This in no way affects the portable behavior of fgets. If the amount
of data in the "line" leaves at least two unused bytes in the buffer,
fgets MUST terminate the string with a '\n' followed by a '\0'.
Failure to do so violates the required performance specified in the
standard.


Remove del for email
 
B

Barry Schwarz

yes, this is exactly what I am thinking about. It i snot about newline in
particular it is about the size of the input. Right now, It is fixed to
STR_SIZE limit. Can we make the program to accept any amount of input ?

One common method of dealing with "lines" of unknown length is

call fgets to read sizeof(buffer) characters
if '\n' present in buffer
line is complete
else
do
(process incomplete data as best you can)*
call fgets to read next sizeof(buffer) characters
until '\n' present in buffer

* This is obviously the tricky part. For example:

If the buffer is dynamically allocated, you could use realloc
to enlarge it and read the next set of data into the extended part of
the buffer. When the '\n finally appears, your buffer contains the
complete line.

If your only task is to count character frequency in the file
(a frequent student assignment), you could process the partial line
and then simply reuse the buffer.


Remove del for email
 
B

Ben Bacarisse

Barry Schwarz said:
This in no way affects the portable behavior of fgets. If the amount
of data in the "line" leaves at least two unused bytes in the buffer,
fgets MUST terminate the string with a '\n' followed by a '\0'.
Failure to do so violates the required performance specified in the
standard.

Not by my reading of it. It seems entirely within the allowed
behaviour that the last line (i.e. the one that triggers the
end-of-file condition) might not leave a \n in the buffer. In fact,
both reading a newline and seeing the end of the file are listed as
reasons to stop input and null-terminate the string. Am I missing
something?
 
L

lawrence.jones

Ben Bacarisse said:
That may be being a bit over-picky. The distinction is such a fine
one that the standard gets it wrong. Although the definition of
"argument" states that these are what get passed in function calls,
the text later refers to operators taking arguments.

Only in one place. I've made a note to fix it.

-Larry Jones

In short, open revolt and exile is the only hope for change? -- Calvin
 
A

arnuld

If the amount of data processed by fgets fills your buffer, then fgets
will not have room for the '\n'. In this case the last two characters
will be the (N-1)_th character fgets read followed by a '\0'. There
will be no '\n' in the string.

There is a '\n' in the string returned from fgets only when the amount
of data read leaves at least two unused bytes in the buffer.

yes, this is exactly what I am thinking about. It i snot about newline in
particular it is about the size of the input. Right now, It is fixed to
STR_SIZE limit. Can we make the program to accept any amount of input ?
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Members online

No members online now.

Forum statistics

Threads
473,995
Messages
2,570,230
Members
46,819
Latest member
masterdaster

Latest Threads

Top