sscanf question

Brian O'Brien

I am trying to extract an item description and price from a line of text.

A Cup of Coffee $1.23

n = sscanf(str, "%s%*[\t $]%f", item, &price);

But how can I make "A Cup of Coffee" as a single string? How does sscanf know that I don't want just the "A" of the sentence?
 
BartC

Brian O'Brien said:
I am trying to extract an item description and price from a line of text.

A Cup of Coffee $1.23

n = sscanf(str, "%s%*[\t $]%f", item, &price);

But how can I make "A Cup of Coffee" as a single string? How does sscanf
know that I don't want just the "A" of the sentence?

First specify how you would know, anyway, where the description ends, and
the price begins.

If the description is in a fixed-width field, then you might not need sscanf
at all. (And if variable width, I would probably use quotes around the
description.)
 
Les Cargill

Brian said:
I am trying to extract an item description and price from a line of text.

A Cup of Coffee $1.23

n = sscanf(str, "%s%*[\t $]%f", item, &price);

But how can I make "A Cup of Coffee" as a single string? How does sscanf know that I don't want just the "A" of the sentence?

How reliable is the "$" token? Is strstr(s, "$") any help?

char s[] = "A Cup Of Coffee $1.23";

char *p = strstr(s, "$");
if (p != NULL)
{
    p++;
    sscanf(...
}
 
Ben Bacarisse

Les Cargill said:
Brian said:
I am trying to extract an item description and price from a line of text.

A Cup of Coffee $1.23

n = sscanf(str, "%s%*[\t $]%f", item, &price);

But how can I make "A Cup of Coffee" as a single string? How does sscanf know that I don't want just the "A" of the sentence?

How reliable is the "$" token? Is strstr(s, "$") any help?

char s[] = "A Cup Of Coffee $1.23";

char *p = strstr(s, "$");
if (p != NULL)
{
    p++;
    sscanf(...
}

I might not bother, as one can test for the $ in the sscanf call by
explicitly matching it. I would, however, make a couple of other
refinements. First, I'd limit the length of string input to avoid
buffer overruns and, second, I'd add a couple of spaces into the format
to make it more flexible. Basically they mean that extra spaces alone
will never be the reason that the scan fails.

char b[80];   /* must be large enough for the longest description */
float p;
if (sscanf(str, " %79[^$\t] $%f", b, &p) == 2) {
    // good to go...
}

You still need to trim trailing spaces from b, but it's probably about
as close to what the OP wants as possible.
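
For reference, the approach above, including the trailing-space trim, might look like the sketch below. The names rtrim and parse_line are made up for illustration, and the 80-byte buffer is an assumption, not something the original poster specified.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Remove trailing whitespace from s in place. */
static void rtrim(char *s)
{
    size_t n = strlen(s);
    while (n > 0 && isspace((unsigned char)s[n - 1]))
        s[--n] = '\0';
}

/* Hypothetical helper: split a line such as "A Cup of Coffee $1.23"
   into a description and a price. item must have room for 80 bytes.
   Returns 1 on success, 0 if the line does not match. */
static int parse_line(const char *str, char *item, float *price)
{
    /* %79[^$\t] reads up to 79 characters that are neither '$' nor tab;
       the spaces in the format skip any amount of whitespace. */
    if (sscanf(str, " %79[^$\t] $%f", item, price) != 2)
        return 0;
    rtrim(item);
    return 1;
}
```

Called as parse_line("A Cup of Coffee $1.23", item, &price), this should leave "A Cup of Coffee" in item. A line with no '$' makes sscanf return fewer than 2 conversions, so the helper reports failure.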
 
Edward A. Falk

I am trying to extract an item description and price from a line of text.

A Cup of Coffee $1.23

n = sscanf(str, "%s%*[\t $]%f", item, &price);

But how can I make "A Cup of Coffee" as a single string? How does sscanf know that I
don't want just the "A" of the sentence?

You're dangerously close to needing to write your own parser or
using regexes, but I think this is do-able, as long as your
input format doesn't vary too much.

char item[80];

/* alphanum & blanks only */
n = sscanf(str, "%79[\t 0-9A-Za-z]$%f", item, &price);
OR
/* Anything but '$' */
n = sscanf(str, "%79[^$]$%f", item, &price);

You'll need to strip the trailing blanks from item. I don't think
there's a way to avoid that (unless you can guarantee that the separator
is a tab and there are no tabs in the item description).

Note the use of the field width. It's really not optional. Omitting it is a
security hole if you accept input from any untrusted source, and it's a
potential bug even if you don't (someday, someone will edit that database
and not know about the length limitation).

One final note: the use of float (or even double) to store monetary
amounts is looking for trouble. I speak from experience here.
 
Prathamesh Kulkarni

Brian O'Brien said:
I am trying to extract an item description and price from a line of text.
A Cup of Coffee $1.23
n = sscanf(str, "%s%*[\t $]%f", item, &price);
But how can I make "A Cup of Coffee" as a single string? How does sscanf know that I
don't want just the "A" of the sentence?

Edward A. Falk said:
You're dangerously close to needing to write your own parser or
using regexes, but I think this is do-able, as long as your
input format doesn't vary too much.

char item[80];

/* alphanum & blanks only */
n = sscanf(str, "%79[\t 0-9A-Za-z]$%f", item, &price);
OR
/* Anything but '$' */
n = sscanf(str, "%79[^$]$%f", item, &price);

You'll need to strip the trailing blanks from item. I don't think
there's a way to avoid that (unless you can guarantee that the separator
is a tab and there are no tabs in the item description).

Note the use of the field width. It's really not optional. Omitting it is a
security hole if you accept input from any untrusted source, and it's a
potential bug even if you don't (someday, someone will edit that database
and not know about the length limitation).

One final note: the use of float (or even double) to store monetary
amounts is looking for trouble. I speak from experience here.

Please could you suggest some better ways
than float/double to represent monetary amounts?
Thanks ;)
 
Fred K

Integer times 100 - i.e., in integer units of the smallest unit in the monetary system.

So $1.23 is stored as 123

The reason is to avoid roundoff and truncation errors.
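
A minimal sketch of that idea, assuming non-negative amounts and a currency with 100 minor units per major unit. The helper names line_total_cents and format_cents are hypothetical.

```c
#include <stdio.h>

/* Price times quantity, entirely in integer cents. */
static long line_total_cents(long price_cents, long quantity)
{
    return price_cents * quantity;
}

/* Format a non-negative amount in cents as dollars, e.g. 123 as "$1.23". */
static void format_cents(long cents, char *buf, size_t size)
{
    snprintf(buf, size, "$%ld.%02ld", cents / 100, cents % 100);
}
```

The amount only becomes "dollars and cents" at the moment it is printed. Everything before that is exact integer arithmetic, provided the values stay within the range of long.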
 
BartC

Fred K said:
Integer times 100 - i.e., in integer units of the smallest unit in the
monetary system.

So $1.23 is stored as 123

The reason is to avoid roundoff and truncation errors.

You still get problems when doing calculations (there are remainders to take
care of).

Floating point is workable if you know that an amount might be meant to
represent a whole number of cents for example. Or you can use floating
point to represent cents rather than dollars ...
 
Eric Sosman

Please could you suggest some better ways
than float/double to represent monetary amounts ?

Represent sums of money as integer counts of the smallest
unit: Use cents or pence instead of dollars or pounds.

There's no really smooth way to do this with sscanf() and
friends. You can read "1.23" as a double, multiply by 100
(cents per dollar), and convert to integer, but be sure to
round before converting. (If you don't, numbers that aren't
exactly representable as double values may be mis-read. For
example, sscanf() might turn "1.23" into something along the
lines of 1.229999942656; the product would be 122.9999942656,
and un-rounded conversion would give you 122 cents instead
of the desired 123.)
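
The round-before-converting step could be sketched as below. The function name read_cents is hypothetical, and the rounding is done by adding or subtracting 0.5 before the cast, which is an assumption about the desired behaviour (round half away from zero).

```c
#include <stdio.h>

/* Hypothetical helper: read a decimal amount such as "1.23" and store it
   as a whole number of cents. Returns 1 on success, 0 on failure. */
static int read_cents(const char *text, long *cents)
{
    double dollars;

    if (sscanf(text, "%lf", &dollars) != 1)
        return 0;

    /* Round to the nearest cent instead of truncating: a plain
       (long)(dollars * 100.0) can come out one cent low. */
    double scaled = dollars * 100.0;
    *cents = (long)(scaled < 0 ? scaled - 0.5 : scaled + 0.5);
    return 1;
}
```

On typical IEEE 754 doubles, an input such as "4.35" is a case where the scaled value lands slightly below 435, so truncation would give 434 while the rounded conversion gives 435.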

Another problem with sscanf() for this purpose is that
it will accept "1.123" without complaint, even though that
input is probably a typo in this context. One possibility
would be to read the dollars and cents separately with
something like "%d.%2d" and then glue the pieces together,
but that approach would happily accept "12.3" -- again,
probably not a valid price. (Also, it would reject ".95",
which you might want to consider perfectly legal.)

It seems to me that getting sscanf() to do the job for
you, and do it well, might be at least as much work as doing
the job yourself with strtol() and strchr() and the rest.
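
As one illustration of doing the job by hand, here is a sketch that insists on exactly two digits after the decimal point. The name parse_price and the exact acceptance rules are assumptions made for the example; it does no overflow checking on the dollars part.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: accepts "1.23", ".95" and "12.00"; rejects
   "1.123", "12.3", "12" and "abc". Stores the amount in cents and
   returns 1 on success, 0 on failure. No overflow checking. */
static int parse_price(const char *s, long *cents)
{
    const char *dot = strchr(s, '.');
    char *end;
    long dollars = 0;

    if (dot == NULL)
        return 0;                       /* require a decimal point */

    if (dot != s) {                     /* the dollars part is optional */
        if (!isdigit((unsigned char)s[0]))
            return 0;                   /* no sign or leading blanks */
        dollars = strtol(s, &end, 10);
        if (end != dot)
            return 0;                   /* junk before the point */
    }

    /* Exactly two digits after the point, then end of string. */
    if (!isdigit((unsigned char)dot[1]) ||
        !isdigit((unsigned char)dot[2]) ||
        dot[3] != '\0')
        return 0;

    *cents = dollars * 100 + (dot[1] - '0') * 10 + (dot[2] - '0');
    return 1;
}
```

Because the result is built from digits, no floating-point value is ever involved, so the "1.229999..." problem does not arise.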
 
Keith Thompson

BartC said:
You still get problems when doing calculations (there are remainders to take
care of).

Floating point is workable if you know that an amount might be meant to
represent a whole number of cents for example. Or you can use floating
point to represent cents rather than dollars ...

Floating-point can't exactly represent all integer values within its
range -- and operations that yield inexact results generally don't give
any warning about a loss of precision. If you can guarantee that all
the numbers you'll be dealing with are within the contiguous range of
integers that can be represented exactly, then you'll probably be ok.
But then why use floating-point rather than a large integer type in the first
place?
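
To make the "contiguous range" point concrete, here is a small check, assuming double is an IEEE 754 64-bit type with a 53-bit significand. The function name exact_in_double is made up for the example.

```c
#include <stdint.h>

/* Returns 1 if n survives a round trip through double unchanged.
   Intended for values well inside the int64_t range (say |n| < 2^62),
   so that the conversion back to an integer is always defined. */
static int exact_in_double(int64_t n)
{
    volatile double d = (double)n;   /* volatile forces a real double */
    return (int64_t)d == n;
}
```

Under that assumption, 9007199254740992 (2 to the power 53) round-trips exactly, but 9007199254740993 does not: it is silently stored as its even neighbour, with no warning of any kind.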

For *serious* financial calculations, there are typically standards that
specify exactly how those calculations are to be carried out. In such
contexts, you need to find those standards and follow them.
 
glen herrmannsfeldt

(snip, someone wrote)
Floating-point can't exactly represent all integer values within its
range -- and operations that yield inexact results generally don't give
any warning about a loss of precision. If you can guarantee that all
the numbers you'll be dealing with are within the contiguous range of
integers that can be represented exactly, then you'll probably be ok.

Also, as long as you don't do division. Well, it will work for a
while, but can fail long before add, subtract, and multiply when the
result is rounded differently than you expect.

Now, with IEEE you know that the result is rounded.

IBM S/360 (and HFP on newer processors) truncates the quotient
(except on the 360/91 and related processors). The truncated
quotient might be a better choice if you are using it for
integer arithmetic.

But then why use floating-point rather than a large integer type
in the first place?

Yes, good question. In the days of 16 bit integers and 64 bit
floating point, I know some used floating point for larger
integer values, but now with 64 bit integers on most machines
there isn't much of an excuse anymore.

For *serious* financial calculations, there are typically standards that
specify exactly how those calculations are to be carried out. In such
contexts, you need to find those standards and follow them.

Yes.

And also, as Knuth says, for typesetting.

-- glen
 
BartC

Floating-point can't exactly represent all integer values within its
range

64-bit floats can still represent some $45 trillion, even if you
work in cents. Anything more than that, then perhaps you should make use of
some 'serious' methods. But we're talking about the price of coffee.

-- and operations that yield inexact results generally don't give
any warning about a loss of precision.

Integers aren't much better, when division is introduced. And division is
necessary to do things like percentages. Unless the proposal is not only to
use integers, but to use rationals so that you never get to do the division
(but I'd be interested in how you'd print out the results!).

But then why use floating-point rather than a large integer type in the first
place?

Because integers don't do fractions very well. Not without using special
methods, at which point it becomes easier to just use floating point.

If you *know* that intermediate results need to be rounded to the nearest
cent, for example, then you just round to the nearest cent (ie. to the
nearest whole number). As far as add, subtract and multiply are concerned,
calculating using floating point whole numbers is little different to using
integers. But if you then have to calculate 15.9% tax on $53.49, it's a lot
easier with floating point.

(I've done this stuff where the total needed to be calculated in two
separate currencies. The conversion rate between them was specified to four
decimal places. Or you're buying by weight or length, and need to use either
metric or Imperial measure. These things just aren't integer-friendly!)
 
glen herrmannsfeldt

BartC said:
64-bit floats can still represent some $45 trillion, even if you
work in cents. Anything more than that, then perhaps you should make use of
some 'serious' methods. But we're talking about the price of coffee.

Integers aren't much better, when division is introduced. And division is
necessary to do things like percentages. Unless the proposal is not only to
use integers, but to use rationals so that you never get to do the division
(but I'd be interested in how you'd print out the results!).

Integers are still a lot better.

Because integers don't do fractions very well. Not without using special
methods, at which point it becomes easier to just use floating point.

If you *know* that intermediate results need to be rounded to the nearest
cent, for example, then you just round to the nearest cent (ie. to the
nearest whole number). As far as add, subtract and multiply are concerned,
calculating using floating point whole numbers is little different to using
integers. But if you then have to calculate 15.9% tax on $53.49, it's a lot
easier with floating point.

Take 5349 (cents) multiply by 159 and then divide by 1000 to get the
truncated tax in cents. But with tax, you have to watch the rounding.
If you want the rounded value, add 500 before dividing by 1000.

Some years ago I lived in Maryland, where, as far as I could tell, the
tax was rounded up. You can add from 0 to 999 before dividing depending
on how you want it rounded. You do have to be sure that the intermediate
doesn't overflow, but with 64 bit integers that shouldn't be hard.

With floating point, the intermediate values might be rounded. You then
have to be careful to avoid double rounding, which can give an
unexpected wrong result.

With integer division, you control the rounding.

(I've done this stuff where the total needed to be calculated in two
separate currencies. The conversion rate between them was specified to four
decimal places. Or you're buying by weight or length, and need to use either
metric or Imperial measure. These things just aren't integer-friendly!)

Multiply by the value with four decimal places, after moving the decimal
point to the right four places, then divide by 10000. Not hard at all.
Again, add the appropriate amount before dividing to round as needed.
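
The integer recipes in this post could be sketched as follows. The function names and the scaled-rate parameters (tenths of a percent for tax, ten-thousandths for exchange rates) are conventions chosen for the example; both functions assume non-negative inputs that do not overflow 64 bits.

```c
#include <stdint.h>

/* Tax on amount_cents, with the rate in tenths of a percent
   (159 means 15.9%). round_add is added before the division:
   0 truncates, 500 rounds to nearest, 999 rounds up. */
static int64_t tax_cents(int64_t amount_cents, int64_t rate_tenths_pct,
                         int64_t round_add)
{
    return (amount_cents * rate_tenths_pct + round_add) / 1000;
}

/* Currency conversion with a rate given to four decimal places,
   passed as an integer (12345 means 1.2345). Rounds to nearest. */
static int64_t convert_cents(int64_t amount_cents, int64_t rate_e4)
{
    return (amount_cents * rate_e4 + 5000) / 10000;
}
```

For the 15.9% tax on $53.49 discussed above, 5349 * 159 is 850491, so truncating or rounding to nearest both give 850 cents, while rounding up gives 851.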

-- glen
 
Brian O'Brien

Yep.... I hear you when it comes to the floating point numbers...
Particularly .1 and .2. These numbers can't be expressed exactly in binary, so yes, you can get lots of rounding errors.

I think scanf may not be the best way. Perhaps scanning from right to left until you find something that isn't a digit or '.' would help extract the number. Then everything from the left up to that location would be the item, less any currency symbol.
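
A rough sketch of that right-to-left scan, assuming '$' is the only currency symbol and the price is the last thing on the line. The name split_from_right is invented for the example.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: copies the description into item and returns a
   pointer to the start of the numeric text inside line. Returns NULL if
   the line does not end in a number or the description does not fit. */
static const char *split_from_right(const char *line, char *item,
                                    size_t item_size)
{
    size_t end = strlen(line);
    while (end > 0 && isspace((unsigned char)line[end - 1]))
        end--;                              /* ignore trailing blanks */

    size_t num = end;
    while (num > 0 && (isdigit((unsigned char)line[num - 1]) ||
                       line[num - 1] == '.'))
        num--;                              /* walk left over digits and '.' */
    if (num == end)
        return NULL;                        /* no number at the end */

    size_t len = num;
    while (len > 0 && (line[len - 1] == '$' ||
                       isspace((unsigned char)line[len - 1])))
        len--;                              /* drop the '$' and blanks */
    if (len >= item_size)
        return NULL;

    memcpy(item, line, len);
    item[len] = '\0';
    return line + num;                      /* the rest of the line */
}
```

The returned pointer still includes any trailing blanks from the original line, and the numeric text would still need validating before being turned into cents.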

Thank you all for your input.
B.
 
BartC

glen herrmannsfeldt said:
Take 5349 (cents) multiply by 159 and then divide by 1000 to get the
truncated tax in cents. But with tax, you have to watch the rounding.
If you want the rounded value, add 500 before dividing by 1000.

OK, by using 'special methods' as I mentioned. You've now also introduced
the arbitrary figures 10 (10*15.9), 500 and 1000, and presumably need to
impose some arbitrary convention as to how percentage values are going to be
represented in the program (I guess you can't just have 15.9 or 0.159).

At this level of program, just using floating point is a lot easier! You
just need to round to a cent at every stage, taking care when values are
negative; one little function.

With floating point, the intermediate values might be rounded.

Do you have a calculation where that rounding (which will be to the
low-order bits of the representation) will actually give a different result
to doing the whole thing with integers? (And with ordinary amounts that are
likely to be encountered.)

Multiply by the value with four decimal places, after moving the decimal
point to the right four places, then divide by 10000. Not hard at all.

No. Waste of time inventing floating point, really!
 
glen herrmannsfeldt

(snip, someone wrote)
(then I wrote)
OK, by using 'special methods' as I mentioned. You've now also introduced
the arbitrary figures 10 (10*15.9), 500 and 1000, and presumably need to
impose some arbitrary convention as to how percentage values are going to be
represented in the program (I guess you can't just have 15.9 or 0.159).

Well, you can do it in fixed point decimal just fine. If you write
it in PL/I, instead of C, 15.9 is FIXED DECIMAL(3,1), that is, three
digits with one after the decimal point. If you multiply, then the
compiler knows what you mean and does it all for you, in fixed point.

They added complex numbers to C, yet not many people use them. Maybe
next we can have fixed decimal to make these calculations easier.

At this level of program, just using floating point is a lot easier! You
just need to round to a cent at every stage, taking care when values are
negative; one little function.

Do you have a calculation where that rounding (which will be to the
low-order bits of the representation) will actually give a different result
to doing the whole thing with integers? (And with ordinary amounts that are
likely to be encountered.)

With interest rates, it might be unusual, but, as previously mentioned,
exchange rates often go to four decimal places. It won't take such big
values to get rounding errors in that case. How long will it take you
to show that they don't occur?

You might look at: http://dl.acm.org/citation.cfm?id=221334

No. Waste of time inventing floating point, really!

Floating point was invented for scientific problems that range over
many orders of magnitude, and have inherent relative uncertainty.
That is, the uncertainty scales with the value.

In finance and typesetting, the uncertainty (cents or pixels) is fixed,
independent of the size of the values. In the case of absolute
uncertainty, fixed point arithmetic is a much better choice.

Calculations in exadollars or femtodollars are pretty rare,
but femtometers and exameters not so rare. (Nuclear physics
and astrophysics, respectively.)

It will take longer to prove that the rounding is right
in floating point than to do it right in fixed point.

-- glen
 
Ben Bacarisse

Brian O'Brien said:
Yep.... I hear you when it comes to the floating point numbers...
Particularly .1 and .2 These number don't express themselves in
binary ... at least not exactly..

It's a little odd to single out .1 and .2. The same is true of .3, .4,
.6, .7, .8 and .9.

<snip>
 
BartC

With interest rates, it might be unusual, but, as previously mentioned,
exchange rates often go to four decimal places. It won't take such big
values to get rounding errors in that case. How long will it take you
to show that they don't occur?

I'm asking you for *one* example where there might be a problem, but you're
asking me to verify zillions of possible calculations? That's not fair!

Actually I've now found my own examples where there might be differences.

For example, 1% of $1234.50 (0.01*123450). Whenever a result has an odd 0.5
cents that needs to be rounded up or down (with different rounding methods,
the problem just shifts elsewhere).

Although neither integer nor floating point will give the 'right' answer,
with integers it's possible to consistently round the same way.

So you're right, although the intermediate rounding, while affecting which
way it might go, has less to do with it than the errors in representing
values such as 0.01.

Of course you can just use whole number calculations as you suggested, but
still using floating point; showing they are at least versatile!
 
glen herrmannsfeldt

(snip, I wrote)
I'm asking you for *one* example where there might be a problem,
but you're asking me to verify zillions of possible calculations?
That's not fair!

When you put your money in a bank, you want to know that they will
do the interest computation right. You won't ask if they have an
example where they did it wrong, but that they do it right under
all conditions.

Fortunately, you don't have to ask that because regulators will
ask for you.

-- glen
 
