POSIX enhancements to printf

Ian Collins

jacob said:
On 26/02/2014 21:43, Ian Collins wrote:

Linux Mint is in English and my LANG environment variable is:
LANG=en_US.UTF-8

and it doesn't work!

What doesn't?

Do you call setlocale()?
On the Macintosh (not a laptop but a tower Mac) with OS X 10.9.2 (the
most recent version, downloaded 6 hours ago!):
~ $ uname -a
Darwin macpro.local 13.1.0 Darwin Kernel Version 13.1.0: Thu Jan 16
19:40:37 PST 2014; root:xnu-2422.90.20~2/RELEASE_X86_64 x86_64

the call printf("%'d\n",123456789);

will NOT work out of the box without setting the locale to en_US
within the program.

Doesn't setlocale(LC_ALL, "") work? It should and it does on mine.
This is really weird.

Most locale related stuff is!
In my implementation I default to "," as the thousands separator. You can
probably use setlocale too, but I wouldn't be sure of that; my
implementation of all this stuff is probably not the best part of
lcc-win :)

In any case under lcc-win it works out of the box without any locale stuff.

Given that the POSIX extension is locale specific, that's a bit of an odd claim!
 
Ian Collins

James said:
On 02/26/2014 04:28 PM, Ian Collins wrote:
....

As I understand it, calling those functions without first calling
setlocale() is perfectly acceptable - calling setlocale() is required
only if you want to use a locale other than the "C" locale.

That's right; I should have explained it better. The basic problem is
that for this extension, the "C" locale isn't much use.
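Since the default "C" locale is of little use for the ' flag, the usual start-up idiom is to adopt the locale named by the environment (LANG, LC_ALL, LC_NUMERIC, ...) with setlocale(LC_ALL, ""). A minimal sketch, assuming a hypothetical helper called init_user_locale() that falls back to "C" when the environment names a locale that is not installed:

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: switch from the default "C" locale to the one
   named by the environment, falling back to "C" if that fails.
   Returns the name of the locale actually in effect. */
const char *init_user_locale(void)
{
    const char *name = setlocale(LC_ALL, "");   /* honour LANG, LC_* */

    if (name == NULL) {
        /* The environment names a locale that is not installed. */
        fprintf(stderr, "warning: locale from environment not available\n");
        name = setlocale(LC_ALL, "C");
    }
    return name;
}
```

After this call, printf("%'d\n", 123456789) groups digits only if the chosen locale actually defines a thousands separator.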
 
Geoff

jacob@linux-mint ~ $ cat tf.c
#include <stdio.h>
int main(void) { printf("%'d\n",123456789); }

jacob@linux-mint ~ $ gcc tf.c && ./a.out
123456789

You're doing it wrong.

Experimentation will tell you what your locale will do in context. It
appears the gcc implementation on your system doesn't set the numeric
behavior for your locale. The examples heretofore presented also fail
to check the return value of setlocale() to see whether it succeeded,
a shameful oversight given the pedantic behavior exhibited by some of
the posters in this group toward others when presented with code such
as they have posted in this thread.
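The point about unchecked return values can be made concrete. A minimal sketch, assuming a hypothetical helper named format_in_locale(): it saves the caller's locale, switches only if setlocale() accepts the requested name, formats with the POSIX ' flag via snprintf(), and then restores the saved locale. Like anything built on setlocale(), it changes process-wide state and is not thread-safe.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format value with the POSIX ' (grouping) flag in
   the locale locale_name, but only if setlocale() accepts that name.
   Returns 1 on success, 0 if the locale is unavailable or buf is too
   small. The caller's locale is restored afterwards. */
int format_in_locale(const char *locale_name, int value,
                     char *buf, size_t size)
{
    char saved[256];
    const char *current = setlocale(LC_ALL, NULL);   /* query only */
    int ok = 0;

    /* The string returned by setlocale() may be overwritten by later
       calls, so copy it before switching locales. */
    if (current == NULL || strlen(current) >= sizeof saved)
        return 0;
    strcpy(saved, current);

    if (setlocale(LC_ALL, locale_name) != NULL) {     /* check the result */
        int n = snprintf(buf, size, "%'d", value);
        ok = n >= 0 && (size_t)n < size;
        setlocale(LC_ALL, saved);                     /* restore */
    }
    return ok;
}
```

For example, format_in_locale("en_US.UTF-8", 123456789, buf, sizeof buf) would typically yield "123,456,789" where that locale is installed, and returns 0 (leaving the locale untouched) where it is not.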

If your system doesn't format numbers in such a format as you want
them then you must use a mixed locale:


#include <stdio.h>
#include <locale.h>

int main (void)
{
const char * mylocale = "";

// C locale
printf("%'d %s\n", 123456789, mylocale);

// use en_US locale if implemented; setlocale() returns NULL if not,
// and passing NULL to %s is undefined, so print "(null)" explicitly
mylocale = setlocale(LC_ALL, "en_US");
printf("%'d %s\n", 123456789, mylocale ? mylocale : "(null)");

// use fr_FR locale if implemented
mylocale = setlocale(LC_ALL, "fr_FR");
printf("%'d %s\n", 123456789, mylocale ? mylocale : "(null)");

// use de_DE locale if implemented
mylocale = setlocale(LC_ALL, "de_DE");
printf("%'d %s\n", 123456789, mylocale ? mylocale : "(null)");

// use mixed French/US locale if implemented
mylocale = setlocale(LC_ALL, "fr_FR");
mylocale = setlocale(LC_NUMERIC, "en_US");
printf("%'d %s\n", 123456789, mylocale ? mylocale : "(null)");

return 0;
}

The man files I've searched state there are only three standard locale
names: C, POSIX and "". What's most aggravating about the
documentation is that it never presents any method for finding out
what other locale names an implementation supports. One ends up
guessing what locale names
 
Ian Collins

Geoff said:
The man files I've searched state there are only three standard locale
names: C, POSIX and "". What's most aggravating about the
documentation is that it never presents any method for finding out
what other locale names an implementation supports. One ends up
guessing what locale names

Solaris appears to be alone in having localelist(), which does that job.
To quote the setlocale() man page:

To get the list of installed locales, instead of calling
setlocale() over a list of potentially installed locales and
checking on the return values, using localelist(3C) is
recommended. The localelist() function does not switch
locales and it is more efficient, faster, and fully MT-safe.
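Without localelist(), the portable fallback is exactly what that man page advises against: probing candidate names with setlocale() and checking the return value. A sketch with a hypothetical probe_locales() helper (not a library function); the candidate names are guesses supplied by the caller, and only "C" (ISO C) and "POSIX" (POSIX) are guaranteed to exist. On Linux, the `locale -a` command lists the installed locales, which is a better source of candidates.

```c
#include <locale.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: try each candidate name with setlocale(LC_ALL, ...),
   print the ones that are accepted, and return how many were accepted.
   The caller's locale is restored afterwards. */
size_t probe_locales(const char *const names[], size_t count)
{
    char saved[256];
    const char *current = setlocale(LC_ALL, NULL);
    size_t found = 0;

    if (current == NULL || strlen(current) >= sizeof saved)
        return 0;
    strcpy(saved, current);        /* later calls may overwrite it */

    for (size_t i = 0; i < count; i++) {
        if (setlocale(LC_ALL, names[i]) != NULL) {
            printf("available: %s\n", names[i]);
            found++;
        }
    }
    setlocale(LC_ALL, saved);      /* restore the caller's locale */
    return found;
}
```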
 
jacob navia

On 27/02/2014 03:47, Geoff wrote:
#include <stdio.h>
#include <locale.h>

int main (void)
{
const char * mylocale = "";

// C locale
printf("%'d %s\n", 123456789, mylocale);

// use en_US locale if implemented; setlocale() returns NULL if not,
// and passing NULL to %s is undefined, so print "(null)" explicitly
mylocale = setlocale(LC_ALL, "en_US");
printf("%'d %s\n", 123456789, mylocale ? mylocale : "(null)");

// use fr_FR locale if implemented
mylocale = setlocale(LC_ALL, "fr_FR");
printf("%'d %s\n", 123456789, mylocale ? mylocale : "(null)");

// use de_DE locale if implemented
mylocale = setlocale(LC_ALL, "de_DE");
printf("%'d %s\n", 123456789, mylocale ? mylocale : "(null)");

// use mixed French/US locale if implemented
mylocale = setlocale(LC_ALL, "fr_FR");
mylocale = setlocale(LC_NUMERIC, "en_US");
printf("%'d %s\n", 123456789, mylocale ? mylocale : "(null)");

return 0;
}

Mac OS X:
123456789
123,456,789 en_US
123456789 fr_FR
123456789 de_DE
123,456,789 en_US

(The German and French locales do not work.)

Under Linux Mint I get:
123456789
123456789 (null)
123456789 (null)
123456789 (null)
123456789 (null)

Under Linux nothing works, whatever the reason. I tried to figure out
what should be done, but this is taking us too far afield; it is too much
work to figure out. I used the Synaptic package manager and searched for
the "locale" keyword, but after downloading several of the dozens of
packages it found, I gave up: nothing changed.

One of the problems with the package manager under Linux is that it
downloads the package but never TELLS YOU where the package is stored,
so you have to search the whole file system for some executable (whose
name, of course, you do not know)...

Thanks for your input
 
Ike Naar

That's right; I should have explained it better. The basic problem is
that for this extension, the "C" locale isn't much use.

Apparently this behaviour is system-specific; for me,

/* begin code */
#include <stdio.h>
#include <locale.h>

int main(void)
{
if (setlocale(LC_ALL, "C"))
{
printf("%'d\n", 123456789);
}
return 0;
}
/* end code */

prints

123,456,789
 
Mark Storkamp

Experimentation will tell you what your locale will do in context. It
appears the gcc implementation on your system doesn't set the numeric
behavior for your locale. The examples heretofore presented also fail
to check the return value of setlocale() to see whether it succeeded,
a shameful oversight given the pedantic behavior exhibited by some of
the posters in this group toward others when presented with code such
as they have posted in this thread.

Better?
$ cat test.c
#include <stdio.h>
#include <stdlib.h>
#include <locale.h>
#include <string.h>
#include <strings.h>   /* strcasecmp() is POSIX and is declared here */

int main(void)
{
char buffer[50];
char *tryLocale;
size_t stringLength;   /* strlen() returns size_t */
do
{
if (printf("Which locale? ") < 0)
return EXIT_FAILURE;
if (fflush(stdout) != 0)
return EXIT_FAILURE;
if ((tryLocale = fgets(buffer, 50, stdin)) == NULL)
return EXIT_FAILURE;
stringLength = strlen(tryLocale);
if (stringLength > 1)
{
tryLocale[stringLength-1] = 0;
} else
{
return EXIT_FAILURE;
}
if (!strcasecmp(tryLocale, "quit"))
return EXIT_SUCCESS;
if (!strcasecmp(tryLocale, "exit"))
return EXIT_SUCCESS;
if (setlocale(LC_ALL, tryLocale) == NULL)
{
if (printf("unknown locale %s\n", tryLocale) < 0)
return EXIT_FAILURE;
} else
{
if (printf("%'d\n", 123456789) < 0)
return EXIT_FAILURE;
}
} while (1);
}

$ ./test
Which locale? en_US
123,456,789
Which locale? de_DE
123456789
Which locale? fi_FI
123.456.789
Which locale? C
123456789
Which locale? quit
$
 
Ian Collins

jacob said:
Under linux-mint I get
123456789
123456789 (null)
123456789 (null)
123456789 (null)
123456789 (null)

You should try appending .UTF8 to the locale strings ("en_US.UTF-8", for
example).
Under linux nothing works, whatever the reason...

One of the reasons I try and avoid it :)
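Because the accepted spelling of a locale name varies between systems, a program can try a few variants in turn. A sketch, assuming a hypothetical helper set_locale_variant() and the suffix list "", ".UTF-8", ".utf8" (a guess at common spellings, not a standard or exhaustive list):

```c
#include <locale.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: for the given category, try "base", then
   "base.UTF-8", then "base.utf8". Returns setlocale()'s result for the
   first accepted name, or NULL if none is accepted. */
const char *set_locale_variant(int category, const char *base)
{
    static const char *const suffixes[] = { "", ".UTF-8", ".utf8" };
    char name[128];

    for (size_t i = 0; i < sizeof suffixes / sizeof suffixes[0]; i++) {
        int n = snprintf(name, sizeof name, "%s%s", base, suffixes[i]);
        if (n < 0 || (size_t)n >= sizeof name)
            return NULL;                    /* name would not fit */
        const char *result = setlocale(category, name);
        if (result != NULL)
            return result;
    }
    return NULL;
}
```

For instance, set_locale_variant(LC_NUMERIC, "en_US") would pick up "en_US.UTF-8" on a system that only has the UTF-8 variant installed.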
 
BartC

jacob navia said:
I am implementing the POSIX enhancements to printf under lcc-win.

I had already implemented the ' flag, which groups the digits of big numbers
into sequences of 3, i.e. instead of

File size is 633455543 bytes

to write

File size is 633 455 543 bytes.

MUCH more readable!

Yes, but reading this thread, it seems extraordinarily difficult to guarantee
that it works on every C system. But writing user code to do the same isn't
too demanding (example code below).

Separators (but not commas...) would also be useful in numeric constants in
C source code, but I can't see that ever happening, even though it would be
trivial to implement in any specific compiler.
I am now starting to implement positional arguments:

printf("%d %d %d\n", 1, 2, 3);
output:
1 2 3

printf("%3$d %2$d %1$d\n",1,2,3);
output:
3 2 1

These are really nice things to have, and it is a pity they aren't more
used.

Maybe because no-one can see the point! Do you have a better example?

---------------------

Example code to insert separators into a number. (This function is designed
to be called by others that take care of negative numbers, leading zeros,
etc.):

#include <stdio.h>
#include <stdint.h>

char* u32_to_str(uint32_t a, char* s, int base, int sep) {
/* Convert 32-bit unsigned int a to string in s (which must be big enough).
Base is number base, usually 10 but can be 2 or 16 or anything
in-between.
Sep is 0, or separator character inserted every 3 (base 10) or 4 chars.
Return s
*/
char* u;
char t[100]; /* allow for 64-bit binary with separators */
int i, k, g;
char digits[] =
{'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F'};

i=-1;
k=0; /* count digits to next separator */
g=(base==10?3:4); /* separated group size */

do {
t[++i]=digits[a%base];

a=a/base;
if (sep && a && ++k==g) {
t[++i]=sep;
k=0;
}
}while (a!=0);

u=s;
while (i>=0) {
*u=t[i--];
++u;
}
*u=0; /* terminate the string */

return s;
}

int main(void){
char str[100];
printf("%s\n",u32_to_str(123456789,str,10,','));
return 0;
}
 
Melzzzzz

You should try appending .UTF8 to the locale strings ("en_US.UTF-8"
for example).
mylocale = setlocale(LC_ALL, "en_US.UTF8");
printf("%'d %s\n", 123456789, mylocale);

This works.
 
Kenny McCormack

Melzzzzz said:
You should try appending .UTF8 to the locale strings ("en_US.UTF-8"
for example).
mylocale = setlocale(LC_ALL, "en_US.UTF8");
printf("%'d %s\n", 123456789, mylocale);

This works.

Good to hear. However (and FWIW), note that it works correctly on my
system either with or without the ".UTF8". So, as (I believe it was) BartC
posted, it is kind of a crap-shoot. It depends on what exactly is installed
on your system (and, as Kiki will tell us, is not at all covered by the C
standards documents - and is therefore completely off-topic here).

BartC also indicates that this uncertainty bothers him and he would, thus,
prefer to do it himself in user-code. He might be interested to read the
parallel thread that I have started in comp.lang.awk, in which I argue the
opposite (which boils down to "uncertainty is good" [heh heh]).
 
James Kuyper

Maybe because no-one can see the point! Do you have a better example?

The main reason I've seen given for positional arguments is
multi-lingual messages - a different format string is used for each
language, with the same list of arguments. The problem is that the most
natural order for those arguments may be language-dependent. I am not
sufficiently fluent in enough different languages to come up with a good
example of my own, but I found one at
<http://www.gnu.org/software/gawk/manual/html_node/Printf-Ordering.html>.
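A small illustration of that idea: the argument list stays the same, and each language's format string picks the order with the %n$ operands (a POSIX extension, not ISO C). The message table, the enum and format_where() are hypothetical names made up for this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical message catalogue: one format string per language. */
enum lang { LANG_EN, LANG_DE };

static const char *const where_formats[] = {
    [LANG_EN] = "%1$s lives in %2$s.",
    [LANG_DE] = "In %2$s wohnt %1$s.",   /* German puts the city first */
};

/* Same arguments for every language; the format string decides the
   order in which they appear. Returns snprintf()'s result. */
int format_where(char *buf, size_t size, enum lang language,
                 const char *person, const char *city)
{
    return snprintf(buf, size, where_formats[language], person, city);
}
```

With person "Anna" and city "Berlin", the English entry gives "Anna lives in Berlin." and the German one gives "In Berlin wohnt Anna.".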
 
jacob navia

On 27/02/2014 11:37, BartC wrote:
Yes, but reading this thread, it seems extraordinarily difficult to
guarantee that it works on every C system.

It works with lcc-win under windows.

Problems arise with gcc when the THOUSANDS_SEPARATOR is not defined, since
that implementation apparently doesn't have any fallback option.
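In standard C terms, that separator is the thousands_sep member of the struct lconv returned by localeconv(), with the group sizes in its grouping member; in the "C" locale both are empty strings. A sketch of a hypothetical check a program could run before relying on the ' flag, falling back to its own formatting code when the locale defines no grouping:

```c
#include <limits.h>
#include <locale.h>

/* Hypothetical helper: returns 1 if the current LC_NUMERIC locale defines
   digit grouping (a non-empty thousands_sep and a usable first group
   size), else 0. */
int locale_has_grouping(void)
{
    const struct lconv *lc = localeconv();

    return lc->thousands_sep[0] != '\0'
        && lc->grouping[0] != '\0'
        && lc->grouping[0] != CHAR_MAX;   /* CHAR_MAX: no grouping */
}
```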
 
BartC

Example code to insert separators into a number. .....
u=s;
while (i>=0) {
*u=t[i--];
++u;
}

(If anyone is actually trying to run or understand this function, there's a
line *u=0; that got left out, to terminate the string, after this
copy/reversal operation.)
 
Kenny McCormack

Example code to insert separators into a number. ....
u=s;
while (i>=0) {
*u=t[i--];
++u;
}

(If anyone is actually trying to run or understand this function, there's a
line *u=0; that got left out, to terminate the string, after this
copy/reversal operation.)

The problem with doing this in user-code is that you are only doing it for
one specific case - that of US (and, if I am reading some of the other
posts correctly, France also, but not many of the other continental
countries) conventions. So, you're really just solving your own personal
problem (which is fine as an end-user programmer, not so good if you are
trying to be a systems programmer).

For example, Kiki quite correctly pointed out what happens when you set
your locale to something Indian (the country in Asia). They have
completely different conventions about how to do this sort of thing.
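User code can respect such conventions without per-country special cases, because localeconv() describes them with its grouping string: each element is a group size counted from the right, the terminating 0 means "repeat the previous size", and CHAR_MAX means "no further grouping". The Indian style 12,34,56,789 corresponds to the grouping string "\3\2". A sketch of a hypothetical group_digits() that applies such a string to a string of decimal digits (the 256-byte work buffer is an arbitrary limit for the sketch):

```c
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: insert sep into the decimal digit string digits
   according to a localeconv()-style grouping string. Returns 0 on
   success, -1 if a buffer is too small. */
int group_digits(const char *digits, const char *grouping, const char *sep,
                 char *out, size_t size)
{
    char tmp[256];                   /* result built right-to-left */
    size_t t = 0, seplen = strlen(sep);
    const char *g = grouping;
    int group = (*g > 0 && *g != CHAR_MAX) ? *g : 0;   /* 0: no grouping */
    int in_group = 0;

    for (size_t i = strlen(digits); i-- > 0; ) {
        if (group > 0 && in_group == group) {
            if (t + seplen >= sizeof tmp)
                return -1;
            for (size_t k = seplen; k-- > 0; )   /* separator, reversed */
                tmp[t++] = sep[k];
            in_group = 0;
            if (g[1] != '\0') {                  /* else last size repeats */
                g++;
                group = (*g > 0 && *g != CHAR_MAX) ? *g : 0;
            }
        }
        if (t + 1 >= sizeof tmp)
            return -1;
        tmp[t++] = digits[i];
        in_group++;
    }
    if (t + 1 > size)
        return -1;
    for (size_t k = 0; k < t; k++)               /* reverse into out */
        out[k] = tmp[t - 1 - k];
    out[t] = '\0';
    return 0;
}
```

Passing localeconv()->grouping and localeconv()->thousands_sep would apply whatever the current locale specifies; "\3" with "," gives the US-style 123,456,789.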
 
James Kuyper

On 02/26/2014 09:47 PM, Geoff wrote:
....
The man files I've searched state there are only three standard locale
names: C, POSIX and "". What's most aggravating about the
documentation is that it never presents any method for finding out
what other locale names an implementation supports. One ends up
guessing what locale names

On my Linux system, "man -k locale" led me to the locale command, which,
with the -a option, lists all available locales.
 
BartC

jacob navia said:
On 27/02/2014 11:37, BartC wrote:

It works with lcc-win under windows.

Yes, but sometimes people want to compile their apps with more than one
compiler (for various reasons). Or maybe create a source distribution that
tries not to specify a particular compiler.
Problems arise with gcc when the THOUSANDS_SEPARATOR is not defined, since
that implementation apparently doesn't have any fallback option.

Does POSIX define a separator for hex output? (Which would be every four
digits.) In my experience, long numbers are more likely to be in hex!

(BTW, does anyone in continental Europe actually use periods (".") to
separate thousands, and commas as decimal points?)
 
BartC

The problem with doing this in user-code is that you are only doing it for
one specific case - that of US (and, if I am reading some of the other
posts correctly, France also, but not many of the other continental
countries) conventions. So, you're really just solving your own personal
problem (which is fine as an end-user programmer, not so good if you are
trying to be a systems programmer).

I was writing software for use in Europe twenty years ago (predating a lot
of this regional/locale stuff, too). Back then, things like thousands
separators, the decimal-point character, date format, etc. were just
options in the software.

And in my example, I kept the actual separator character unspecified (I used
a comma for the test).

But the grouping format (into thousands for decimal) would be the same. If
any country liked using some wacky scheme (arranging a long number into a
matrix for example), then they're out of luck.

(Take the Indian example: when you start looking, you can find all sorts of
conventions, many of them to do with denoting monetary quantities. At that
point I think it's no longer a language issue that you fix with an
apostrophe in a format string, any more than C's qsort() function can
arrange the entries in one of their telephone directories without a massive
amount of extra coding.)

I doubt that it was taken seriously at the time anyway: clients seemed quite
capable of using US/UK-style number formatting, if they even bothered to
invoke the thousand-separator style at all.

Perhaps it's the same now (does a printed floating point value use something
other than a period as a decimal point in certain locales?)
 
BartC

Richard said:
Are you being serious?

I did this stuff too, and official accounting reports MUST have the
correct formatting or it's WRONG.

Undoubtedly.

However I'm asking whether C will directly print, for example, floating
point values with a comma and whatever other local conventions demand.
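For the record, ISO C does cover this: the decimal-point character used by printf()'s floating-point conversions comes from the LC_NUMERIC category of the current locale, and localeconv()->decimal_point reports it. A minimal sketch (both helper names are hypothetical):

```c
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: the decimal-point string of the current
   LC_NUMERIC locale, as used by %f, %e and %g. */
const char *current_decimal_point(void)
{
    return localeconv()->decimal_point;
}

/* Hypothetical helper: format x with two decimals in the current locale.
   Returns snprintf()'s result. */
int format_amount(char *buf, size_t size, double x)
{
    return snprintf(buf, size, "%.2f", x);
}
```

In the "C" locale format_amount() produces "1234.50" for 1234.5; after a successful setlocale(LC_NUMERIC, ...) to a locale such as de_DE, where installed, the same call would typically print a comma instead of the period.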
 
Kenny McCormack

Richard said:
For example, Kiki quite correctly pointed out what happens when you set
your locale to something Indian (the country in Asia). They have
completely different conventions about how to do this sort of thing.

Even in "western Europe" it's all totally different. Compare German
number formatting with that of the UK.

I am quite aware that various continental countries swap the use of dots
and commas, but I gave the example of India because they also change the
grouping - i.e., it is no longer "thousands separators" for them:

$ LC_ALL=en_IN gawk 'BEGIN {printf("%\047d\n",123456789)}'
12,34,56,789
$
