Problem reading/writing U.K. pound sign

loial

I am reading and writing files which contain the U.K. pound sign £.

But it is not being written correctly to the output file, even though
I am specifying UTF-8.

Should this code work?

Reading :

InputStream fr;
BufferedReader br;
try {
    fr = new FileInputStream(strDocumentFile);
    // Decode the input bytes as UTF-8
    br = new BufferedReader(new InputStreamReader(fr, "UTF-8"));
}
catch (java.io.FileNotFoundException e) {
    String strReturn = "FileNotFoundException trying to open " + strInFile;
    traceError(strReturn);
    return strReturn;
}
catch (java.io.UnsupportedEncodingException e) {
    String strReturn = "UnsupportedEncodingException trying to open " + strInFile;
    traceError(strReturn);
    return strReturn;
}

String s = null;
do {
    try {
        s = br.readLine();
    } catch (Exception e) {
        String strReturn = getExceptionStackString(e) + " while reading " + strInFile;
        traceError(strReturn);
        return strReturn;
    }

Writing :


OutputStream fw;
BufferedWriter bw;
try {
    // Open in append mode and encode the output as UTF-8
    fw = new FileOutputStream(strOutFile, true);
    bw = new BufferedWriter(new OutputStreamWriter(fw, "UTF-8"));
}
catch (java.io.FileNotFoundException e) {
    strReturn = "FileNotFoundException trying to open " + strOutFile;
    traceError(strReturn);
    return strReturn;
}
catch (java.io.UnsupportedEncodingException e) {
    strReturn = "UnsupportedEncodingException trying to open " + strOutFile;
    traceError(strReturn);
    return strReturn;
}

bw.write(s);
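
For comparison, here is a minimal sketch of the same read-then-append round trip using try-with-resources and StandardCharsets. It assumes Java 7 or later, which is newer than the JVMs mentioned in this thread, and the class name Utf8Copy and the command-line paths are made up for illustration. One difference worth knowing: Files.newBufferedReader reports malformed UTF-8 input as an IOException instead of silently substituting a replacement character, so an input file that is really Latin-1 tends to fail loudly.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class Utf8Copy {
    /** Appends every line of in to out, decoding and encoding as UTF-8. */
    static void copyLines(Path in, Path out) throws IOException {
        try (BufferedReader br = Files.newBufferedReader(in, StandardCharsets.UTF_8);
             BufferedWriter bw = Files.newBufferedWriter(out, StandardCharsets.UTF_8,
                     StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            String s;
            while ((s = br.readLine()) != null) {
                bw.write(s);
                bw.newLine();
            }
        } // both streams are closed (and flushed) here, even on error
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java Utf8Copy <input file> <output file>
        copyLines(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```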
 
loial

Here is my one-line test data file:

001£999.99

Here is my test program code, which I am compiling and running on
Linux.


import java.io.*;

public class poundtesting {
    public static void main(String[] args) {

        String strReturn;
        String strDocumentFile = "/home/john/poundtest";
        InputStream fr;
        BufferedReader br = null;
        try {
            fr = new FileInputStream(strDocumentFile);
            // Decode the input bytes as UTF-8
            br = new BufferedReader(new InputStreamReader(fr, "UTF-8"));
        }
        catch (java.io.FileNotFoundException e) {
            strReturn = "FileNotFoundException trying to open " + strDocumentFile;
            System.out.println(strReturn);
        }
        catch (java.io.UnsupportedEncodingException e) {
            strReturn = "UnsupportedEncodingException trying to open " + strDocumentFile;
            System.out.println(strReturn);
        }

        String s = null;
        String outline = null;
        do {
            try {
                s = br.readLine();
                if (s != null) {
                    outline = s;  // remember the last line read
                    System.out.println(s);
                }
            } catch (Exception e) {
                strReturn = " Error while reading " + strDocumentFile;
                System.out.println(strReturn);
                s = null;  // stop reading after an error
            }
        } while (s != null);

        String strBatchFile = "/home/john/poundout";
        OutputStream fw;
        BufferedWriter bw = null;
        try {
            // Open in append mode and encode the output as UTF-8
            fw = new FileOutputStream(strBatchFile, true);
            bw = new BufferedWriter(new OutputStreamWriter(fw, "UTF-8"));
        }
        catch (java.io.FileNotFoundException e) {
            strReturn = "FileNotFoundException trying to open " + strBatchFile;
            System.out.println(strReturn);
        }
        catch (java.io.UnsupportedEncodingException e) {
            strReturn = "UnsupportedEncodingException trying to open " + strBatchFile;
            System.out.println(strReturn);
        }

        try {
            bw.write(outline);
            bw.newLine();
        } catch (IOException e) {
            strReturn = "IOException trying to write " + strBatchFile;
            System.out.println(strReturn);
        }

        try {
            br.close();
            bw.close();
        } catch (Exception e) {
            // Don't care
        }
    }
}
 
RedGrittyBrick

loial said:
Here is my one line test data file
001£999.99

Here is my test program code which I am compiling and running on
linux.

<snip>

Your code works correctly on my PC. The output file contains a pound
sign encoded as code-point 0xa3 which is correct for UTF-8 and for
ISO-8859-1 Latin1.

What byte-values do you see in your output file? Can you post a hex dump?
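
If no hex dump tool is at hand, a few lines of Java can print the raw bytes. This is only a sketch; the class name HexDump and the helper toHex are invented for illustration, and the file to dump is taken from the command line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

public class HexDump {
    /** Formats bytes as space-separated, two-digit lowercase hex. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes do not print as ffffff..
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical usage: java HexDump /home/john/poundout
        System.out.println(toHex(Files.readAllBytes(Paths.get(args[0]))));
    }
}
```

A correctly encoded UTF-8 pound sign shows up as the pair c2 a3; a lone a3 suggests the file is Latin-1.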
 
John B. Matthews

loial said:
Here is my one line test data file

001£999.99

Your program appears to work as expected on Mac OS X 10.5.8, Java 1.5
and Ubuntu 9.10, Java version 1.6, as does Pete's. Here's the data file
contents, identical before and after. It looks correctly encoded to me:

000000: 30 30 31 c2 a3 39 39 39 2e 39 39 0a  001£999.99.

loial said:
Here is my test program code which I am compiling and running
on linux.

What distribution, version and Java implementation?
 
Lothar Kimmeringer

RedGrittyBrick said:
The output file contains a pound
sign encoded as code-point 0xa3 which is correct for UTF-8 and for
ISO-8859-1 Latin1.

It surely isn't correct for UTF-8. You have missed the preceding
0xc2 or there is something wrong with your test.
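
The difference is easy to check from Java itself. The sketch below (class and helper names are invented for illustration) encodes the same one-character string in both charsets and prints the resulting bytes.

```java
import java.nio.charset.StandardCharsets;

public class PoundBytes {
    /** Formats bytes as space-separated, two-digit lowercase hex. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pound = "\u00a3"; // the pound sign, written as an escape
        // Latin-1 stores the code point in a single byte.
        System.out.println("ISO-8859-1: " + hex(pound.getBytes(StandardCharsets.ISO_8859_1)));
        // UTF-8 needs two bytes for anything from U+0080 to U+07FF.
        System.out.println("UTF-8:      " + hex(pound.getBytes(StandardCharsets.UTF_8)));
    }
}
```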


Regards, Lothar
--
Lothar Kimmeringer E-Mail: (e-mail address removed)
PGP-encrypted mails preferred (Key-ID: 0x8BC3CD81)

Always remember: The answer is forty-two, there can only be wrong
questions!
 
Lothar Kimmeringer

bugbear said:
Depends whether you're talking about an encoding or a code point.

The thread is about encoding and "RedGrittyBrick" says "encoded
as", leading me to the assumption that his posting is as well.


Regards, Lothar
 
RedGrittyBrick

Lothar said:
It surely isn't correct for UTF-8. You have missed the preceding
0xc2 or there is something wrong with your test.

I was using gvim to inspect the output file; it showed the £ correctly
when I told it the file was UTF-8 encoded. I used gvim's ga command to
show the code-point of the character under the cursor. I forgot about
the multibyte encoding details (for which I should have used the g8
command).

Thanks for the correction.
 
John B. Matthews

Lothar Kimmeringer said:
The thread is about encoding and "RedGrittyBrick" says "encoded
as", leading me to the assumption that his posting is as well.

I couldn't see anything wrong with RGB's statement, but I frequently
stumble over the terminology. This is especially true in column A of the
Latin-1 Supplement [1], where the code-point and code-value seem to
coincide. I've broken it down to make sure I understand [2]. I'd welcome
any corrections or clarifications.

Glyph: £ (pound sign)
Unicode code-point, escape: \u00a3
UCS-4/UTF-32 code-value, hex: 0xa3
UTF-8 encoding, no BOM, hex octets: c2a3

Given the UTF-8 octet sequence for the UCS-4 range 0000 0080-0000 07FF,

         110xxxxx 10xxxxxx
         -------- --------
c2 a3 =  11000010 10100011
               10   100011 = 10100011 = a3
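
The same bit manipulation can be written out in Java. This is a sketch for the two-byte form only, with invented names (TwoByteUtf8, decode); it checks the 110/10 prefixes but does not reject overlong sequences, which a real decoder must.

```java
public class TwoByteUtf8 {
    /** Decodes a two-byte UTF-8 sequence (110xxxxx 10xxxxxx) to a code point. */
    static int decode(int lead, int trail) {
        if ((lead & 0xE0) != 0xC0 || (trail & 0xC0) != 0x80) {
            throw new IllegalArgumentException("not a two-byte UTF-8 sequence");
        }
        // Keep the 5 payload bits of the lead byte and the 6 of the trail byte.
        return ((lead & 0x1F) << 6) | (trail & 0x3F);
    }

    public static void main(String[] args) {
        int codePoint = decode(0xc2, 0xa3);
        System.out.println(Integer.toHexString(codePoint)); // prints a3
    }
}
```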

Mac users may like the desktop Calculator's "Programmer View", which
conveniently displays ASCII or Unicode glyphs.

[1]<http://www.unicode.org/charts/PDF/U0080.pdf>
[2]<http://www.ietf.org/rfc/rfc2279.txt>
 
Roedy Green

I am reading and writing a files which contains the U.K pound sign £

But it is not being written correctly to the output file, even though
I am specifying UTF-8

See http://mindprod.com/jgloss/sscce.html

If you post a complete program, it is easier for people to help you.
They don't then have to write a sandwich to run your code.

For code to write/read a pound sign see
http://mindprod.com/applet/fileio.html

For the pound sign use '\u00a3'. If you use the character plain, it
may get scrambled if your source code is not UTF-8 too.
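
A minimal sketch of the escape in use follows; the class name PoundEscape and the helper price are invented for illustration. Because the escape is plain ASCII, the compiled result does not depend on how the source file is encoded. The alternative, if you want the literal character in the source, is to tell the compiler the file's encoding with javac -encoding UTF-8.

```java
public class PoundEscape {
    // Written as an escape, so the source file stays pure ASCII.
    static final char POUND = '\u00a3';

    /** Prefixes an amount with the pound sign. */
    static String price(String amount) {
        return POUND + amount;
    }

    public static void main(String[] args) {
        String p = price("999.99");
        // Print the code point in hex rather than the glyph, so the result
        // does not depend on the console's encoding.
        System.out.println(Integer.toHexString(p.charAt(0)) + " + " + p.substring(1));
    }
}
```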

Recall that the console is likely not UTF-8, so displaying a result
there will likely get scrambled.
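
If you do want to print to a terminal that expects UTF-8, one option is to wrap System.out in a PrintStream with an explicit encoding. This is a sketch with invented names (Utf8Console, utf8); whether the glyph then displays correctly still depends on the terminal's own settings.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Console {
    /** Wraps a byte stream so that printed text is encoded as UTF-8. */
    static PrintStream utf8(OutputStream target) throws UnsupportedEncodingException {
        return new PrintStream(target, true, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        PrintStream out = utf8(System.out);
        out.println("001\u00a3999.99");
    }
}
```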

see http://mindprod.com/jgloss/encoding.html
 
Martin Gregorie

I am reading and writing a files which contains the U.K pound sign £

But it is not being written correctly to the output file, even though I
am specifying UTF-8

[snip code]

One alternative is to use "GBP" instead, at least for output. How much
control do you have over the format of the input files?
In a multicurrency financial program I'd expect to see the ISO currency
codes used rather than currency symbols for both input and output.

Many systems will accept further abbreviations too, e.g. "GBP 32.00" or
"USD 1.5B", and I wouldn't expect drop-down currency lists to be used
either, since entering a single field like those shown is faster than
using a mouse to select from a currency list and then typing the amount.
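
As a rough illustration of that kind of single-field entry, the sketch below validates the ISO code with java.util.Currency and expands a magnitude suffix. The class name MoneyField and the K/M/B suffix meanings (thousand, million, billion) are assumptions made for the example, not a description of any particular system.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;
import java.util.Currency;

public class MoneyField {
    /**
     * Parses "CODE AMOUNT", e.g. "GBP 32.00" or "USD 1.5B".
     * The K/M/B suffixes are an assumed convention.
     */
    static BigDecimal parse(String field) {
        String[] parts = field.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected \"CODE AMOUNT\": " + field);
        }
        // Throws IllegalArgumentException for codes the JDK does not know.
        Currency currency = Currency.getInstance(parts[0]);

        String number = parts[1];
        int exponent;
        switch (Character.toUpperCase(number.charAt(number.length() - 1))) {
            case 'K': exponent = 3; break;
            case 'M': exponent = 6; break;
            case 'B': exponent = 9; break;
            default:  exponent = 0;
        }
        if (exponent != 0) {
            number = number.substring(0, number.length() - 1);
        }
        BigDecimal value = new BigDecimal(number).scaleByPowerOfTen(exponent);
        // UNNECESSARY makes setScale throw ArithmeticException if the input
        // has more decimal places than the currency allows.
        return value.setScale(currency.getDefaultFractionDigits(), RoundingMode.UNNECESSARY);
    }

    public static void main(String[] args) {
        System.out.println(parse("GBP 32.00").toPlainString());
        System.out.println(parse("USD 1.5B").toPlainString());
    }
}
```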
 
Roedy Green

In a multicurrency financial program I'd expect to see the ISO currency
codes used rather than currency symbols for both input and output.

This is the CSV file that CurrCon uses to determine the currency symbol:

#currency abbr, decimals, currency symbol, currency name
AED, 2, \u00a4, Utd. Arab Emir. Dirham
AFA, 2, \u00a4, Afghanistan Afghani
ALL, 2, \u00a4, Albanian Lek
ANG, 2, \u00a4, NL Antillian Guilders
AON, 2, \u00a4, Angolan New Kwanza
ARS, 2, \u20b1, Argentine Pesos
AUD, 2, $, Australian Dollars
AWG, 2, \u00a4, Aruban Florins
BBD, 2, $, Barbados Dollars
BDT, 2, \u00a4, Bangladeshi Taka
BGL, 2, \u00a4, Bulgarian Lev
BHD, 2, \u00a4, Bahraini Dinars
BIF, 0, \u20a3, Burundi Francs
BMD, 2, $, Bermudian Dollars
BND, 2, $, Brunei Dollars
BOB, 2, \u00a4, Bolivian Boliviano
BRL, 2, \u20a2, Brazilian Real
BSD, 2, $, Bahamian Dollars
BTN, 2, \u00a4, Bhutan Ngultrum
BWP, 2, \u00a4, Botswana Pula
BZD, 2, $, Belize Dollars
CAD, 2, $, Canadian Dollars
CHF, 2, \u20a3, Swiss Francs
CLP, 0, \u20b1, Chilean Pesos
CNY, 2, \u00a4, Chinese Yuan Renminbi
COP, 2, \u20b1, Colombian Pesos
CRC, 2, \u20a1, Costa Rican Colon
CSK, 2, \u00a4, Czech Koruna
CUP, 2, \u20b1, Cuban Pesos
CVE, 2, \u00a4, Cape Verde Escudos
CYP, 2, \u00a3, Cyprus Pound
DJF, 0, \u20a3, Djibouti Francs
DKK, 2, \u00a4, Danish Krone
DOP, 2, \u20b1, Dominican R. Pesos
DZD, 2, \u00a4, Algerian Dinars
ECS, 0, \u00a4, Ecuador Sucre
EEK, 2, \u00a4, Estonian Kroon
EGP, 2, \u00a3, Egyptian Pounds
ETB, 2, \u00a4, Ethiopian Birr
EUR, 2, \u20ac, Euros
FJD, 2, $, Fiji Dollars
FKP, 2, \u00a3, Falkland Islands Pounds
GBP, 2, \u00a3, British Pounds
GHC, 2, \u20b5, Ghanaian Cedi
GIP, 2, \u00a3, Gibraltar Pounds
GMD, 2, \u00a4, Gambian Dalasi
GNF, 0, \u20a3, Guinea Francs
GTQ, 2, \u00a4, Guatemalan Quetzal
GYD, 2, $, Guyanese Dollars
HKD, 2, $, Hong Kong Dollars
HNL, 2, \u00a4, Honduran Lempira
HRK, 2, \u00a4, Croatian Kuna
HTG, 2, \u00a4, Haitian Gourde
HUF, 2, \u00a4, Hungarian Forint
IDR, 2, \u00a4, Indonesian Rupiah
ILS, 2, \u20aa, Israeli New Shekels
INR, 2, \u20a8, Indian Rupee
IRR, 2, \ufdfc, Iranian Rial
ISK, 2, \u00a4, Iceland Krona
JMD, 2, $, Jamaican Dollars
JOD, 2, \u00a4, Jordanian Dinars
JPY, 0, \u00a5, Japanese Yen
KES, 2, \u00a4, Kenyan Shillings
KHR, 2, \u17db, Cambodian Riel
KMF, 0, \u20a3, Comoros Francs
KPW, 2, \u20a9, North Korean Won
KRW, 0, \u20a9, South-Korean Won
KWD, 2, \u00a4, Kuwaiti Dinar
KYD, 2, $, Cayman Islands Dollars
KZT, 2, \u00a4, Kazakhstan Tenge
LAK, 2, \u20ad, Lao Kip
LBP, 2, \u00a3, Lebanese Pounds
LKR, 2, \u00a4, Sri Lanka Rupees
LRD, 2, $, Liberian Dollars
LSL, 2, \u00a4, Lesotho Loti
LTL, 2, \u00a4, Lithuanian Litas
LVL, 2, \u00a4, Latvian Lats
LYD, 2, \u00a4, Libyan Dinar
MAD, 2, \u00a4, Moroccan Dirham
MGF, 0, \u20a3, Malagasy Francs
MMK, 2, \u00a4, Myanmar Kyat
MNT, 2, \u20ae, Mongolian Tugrik
MOP, 2, \u00a4, Macau Pataca
MRO, 2, \u00a4, Mauritanian Ouguiya
MTL, 2, \u20a4, Maltese Lira
MUR, 2, \u00a4, Mauritius Rupee
MVR, 2, \u00a4, Maldive Rufiyaa
MWK, 2, \u00a4, Malawi Kwacha
MXP, 2, \u20b1, Mexican Pesos
MYR, 2, \u00a4, Malaysian Ringgit
MZM, 2, \u00a4, Mozambique Metical
NAD, 2, $, Namibia Dollars
NGN, 2, \u20a6, Nigerian Naira
NIO, 2, \u00a4, Nicaraguan Cordoba Oro
NOK, 2, \u00a4, Norwegian Kroner
NPR, 2, \u00a4, Nepalese Rupees
NZD, 2, $, New Zealand Dollars
OMR, 2, \ufdfc, Omani Rial
PAB, 2, \u00a4, Panamanian Balboa
PEN, 2, \u00a4, Peruvian Nuevo Sol
PGK, 2, \u00a4, Papua New Guinea Kina
PHP, 2, \u20b1, Philippine Pesos
PKR, 2, \u00a4, Pakistan Rupee
PLZ, 2, \u00a4, Polish Zloty
PYG, 0, \u20b2, Paraguay Guarani
QAR, 2, \ufdfc, Qatari Rial
ROL, 2, \u00a4, Romanian Leu
RSD, 0, \u00a4, Serbian dinar
RUB, 2, \u00a4, Russian Roubles
SAR, 2, \u00a4, Saudi Riyal
SBD, 2, $, Solomon Islands Dollars
SCR, 2, \u00a4, Seychelles Rupees
SDD, 2, \u00a4, Sudanese Dinars
SEK, 2, \u00a4, Swedish Krona
SGD, 2, $, Singapore Dollars
SHP, 2, \u00a3, St. Helena Pounds
SIT, 2, \u00a4, Slovenian Tolar
SLL, 2, \u00a4, Sierra Leone Leone
SOS, 2, \u00a4, Somali Shillings
SRG, 2, \u00a4, Suriname Guilder
STD, 2, \u00a4, Sao Tome/Principe Dobra
SVC, 2, \u20a1, El Salvador Colon
SYP, 2, \u00a3, Syrian Pounds
SZL, 2, \u00a4, Swaziland Lilangeni
THB, 2, \u0e3f, Thai Baht
TND, 2, \u00a4, Tunisian Dinars
TOP, 2, \u00a4, Tonga Pa'anga
TRL, 0, \u20a4, Turkish Lira
TTD, 2, $, Trinidad/Tobago Dollars
TWD, 2, $, Taiwan Dollars
TZS, 2, \u00a4, Tanzanian Shillings
UAH, 2, \u20b4, Ukraine Hryvnia
UGS, 2, \u00a4, Uganda Shillings
USD, 2, $, US Dollars
UYP, 2, \u20b1, Uruguayan Pesos
VEB, 2, \u00a4, Venezuelan Bolivar
VND, 2, \u20ab, Vietnamese Dong
VUV, 0, \u00a4, Vanuatu Vatu
WST, 2, \u00a4, Samoan Tala
XAF, 0, \u20a3, CFA Franc BEAC
XCD, 2, $, East Caribbean Dollars
XOF, 0, \u20a3, CFA Franc BCEAO
XAG, 2, \u0020, Silver (oz.)
XAU, 3, \u0020, Gold (oz.)
XPT, 3, \u0020, Platinum (oz.)
YER, 2, \ufdfc, Yemeni Rial
YUN, 2, \u00a4, Yugoslav Dinars
ZAR, 2, \u00a4, South African Rand
ZMK, 2, \u00a4, Zambian Kwacha
ZWD, 2, $, Zimbabwe Dollars
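
A file in this layout can be read with a small parser that turns the six-character escapes into real characters. The sketch below is not CurrCon's code; the class CurrencyRow and its methods are invented for illustration, and it assumes that no field before the name contains a comma.

```java
public class CurrencyRow {
    final String code;
    final int decimals;
    final String symbol;
    final String name;

    CurrencyRow(String code, int decimals, String symbol, String name) {
        this.code = code;
        this.decimals = decimals;
        this.symbol = symbol;
        this.name = name;
    }

    /** Turns a six-character backslash-u escape into the character it names. */
    static String unescape(String field) {
        if (field.length() == 6 && field.startsWith("\\u")) {
            return String.valueOf((char) Integer.parseInt(field.substring(2), 16));
        }
        return field; // plain symbols such as "$" pass through unchanged
    }

    /** Parses one data row: abbreviation, decimals, symbol, name. */
    static CurrencyRow parse(String line) {
        String[] f = line.split(",\\s*", 4);
        if (f.length != 4) {
            throw new IllegalArgumentException("expected 4 fields: " + line);
        }
        return new CurrencyRow(f[0].trim(), Integer.parseInt(f[1].trim()),
                unescape(f[2].trim()), f[3].trim());
    }

    public static void main(String[] args) {
        // The doubled backslash keeps the escape as literal text in the string.
        CurrencyRow row = parse("GBP, 2, \\u00a3, British Pounds");
        System.out.printf("%s U+%04X %s%n", row.code, (int) row.symbol.charAt(0), row.name);
    }
}
```

Lines starting with # are comments in the file and would need to be skipped before calling parse.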
 
Martin Gregorie

This is the CSV file that CurrCon uses to determine the currency symbol:

#currency abbr, decimals, currency symbol, currency name

Yes, that's the info you'd need, but it's faster for the user if it's used
to validate the input string after entry, rather than to produce a long,
scrollable selection list. You need the decimal place info for both
validation and to expand abbreviations like 1.5M correctly. I'm intrigued
to see that there are no longer any currencies with three decimal places.
Some years back a few Middle Eastern currencies used them.
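
For comparison with the CSV, the JDK ships its own ISO 4217 table, and java.util.Currency exposes the decimal places from it. The sketch below (class and method names invented for illustration) simply prints what the running JDK reports; its data may differ from CurrCon's file.

```java
import java.util.Currency;

public class FractionDigits {
    /** Returns the JDK's default number of decimal places for an ISO 4217 code. */
    static int digits(String isoCode) {
        return Currency.getInstance(isoCode).getDefaultFractionDigits();
    }

    public static void main(String[] args) {
        // A few codes, including two Middle Eastern dinars.
        for (String code : new String[] {"GBP", "JPY", "BHD", "KWD"}) {
            System.out.println(code + " " + digits(code));
        }
    }
}
```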
 
Lew

Martin said:
I'm intrigued
to see that there are no longer any currencies with three decimal places.
Some years back a few middle eastern currencies used them.

While it is unconventional to quote, say, USD prices to the eighth-
cent, mil or beyond, it is not unheard of.

Admittedly it's a special-purpose use case when it happens, but one
should not always be too convinced that the minimum useful monetary
precision is one one-hundredth of the currency unit.
 
Tom Anderson

While it is unconventional to quote, say, USD prices to the eighth-
cent, mil or beyond, it is not unheard of.

Admittedly it's a special-purpose use case when it happens, but one
should not always be too convinced that the minimum useful monetary
precision is one one-hundredth of the currency unit.

British Telecom phone bills are worked out in tenths of a penny
(millipounds?), then rounded (down) to pence at the end.

tom

--
The other big thing is the method by which these new discoveries had
been made. They had not been made in studies. They were not made by
the ransacking of ancient texts. Nobody deduced the existence of Nova
Scotia. These things were discovered by the very simple process of
driving a ship into them. A ship is a form of scientific instrument. --
Allan Chapman
 
Martin Gregorie

While it is unconventional to quote, say, USD prices to the eighth-
cent, mil or beyond, it is not unheard of.

Admittedly it's a special-purpose use case when it happens, but one
should not always be too convinced that the minimum useful monetary
precision is one one-hundredth of the currency unit.

Sure, but that sort of manipulation will usually be in some sort of
costing calculation - part of a BOM package. You also see it in UK equity
prices, which are quoted in pence to two decimal places rather than
pounds.

I was referring in particular to currency amounts within financial
transactions which AFAIK always specify the number of decimal places to
be used for a given currency.

Thomas Pornin is also correct - it's usual to specify the exact
calculation rules to be applied to currency conversions. The European
Central Bank certainly defines the rules for converting between Euros and
other currencies and IIRC so does S.W.I.F.T.
 
Martin Gregorie

[...] I'm intrigued
to see that there are no longer any currencies with three decimal
places. [...]

Today's quote for Motors Liquidation Company (MTLQQ.PK, the
successor/inheritor/janitor/whatever for General Motors) is a whopping
0.7112 USD, up 0.0302 from yesterday's close of 0.681. (Source:
<http://finance.yahoo.com/q?s=MTLQQ.PK>)

That's a stock exchange price, not a currency amount, which is what I was
muttering about.
 
Arne Vajhøj

[...] I'm intrigued
to see that there are no longer any currencies with three decimal
places. [...]

Today's quote for Motors Liquidation Company (MTLQQ.PK, the
successor/inheritor/janitor/whatever for General Motors) is a whopping
0.7112 USD, up 0.0302 from yesterday's close of 0.681. (Source:
<http://finance.yahoo.com/q?s=MTLQQ.PK>)

That's a stock exchange price, not a currency amount, which is what I was
muttering about.

Words can have different meanings to different people.

But I would expect most developers to consider any price
including the price of 1 stock a currency amount.

Arne
 
Martin Gregorie

On 1/13/2010 8:08 AM, Martin Gregorie wrote:
[...] I'm intrigued
to see that there are no longer any currencies with three decimal
places. [...]

Today's quote for Motors Liquidation Company (MTLQQ.PK, the
successor/inheritor/janitor/whatever for General Motors) is a whopping
0.7112 USD, up 0.0302 from yesterday's close of 0.681. (Source:
<http://finance.yahoo.com/q?s=MTLQQ.PK>)

That's a stock exchange price, not a currency amount, which is what I
was muttering about.

Words can have different meanings to different people.

But I would expect most developers to consider any price including the
price of 1 stock a currency amount.

Respectfully disagree. It's a common issue with pricing anything that is
sold only in multi-packs - and this includes equities, which are almost
never bought or sold as single items. For instance:

- Eric quoted MTLQQ.PK at 0.7112 USD. You'd normally buy equities
in shapes of at least $100 but, if you held a minimum quantity and took
dividends as additional shares you might end up with a dividend of,
say, 6 shares, value $4.2672, plus $0.68 c/f from a dividend of $4.95

IOW, the calculation will always be adjusted so any cash amount, in
this case the dividend and the c/f value, will be in dollars and whole
cents.

- If you're building a widget that needs 8 rivets, which are sold at
$37.50 per pack of 1000, the BOM package will for certain use a unit price
of 0.03750 when costing the widget but, again, you'll never see that
used as a monetary amount. It's just a cost factor.
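
The rivet example can be sketched with BigDecimal: the unit price is carried at five decimal places and only the final cost is rounded to whole cents. The class name RivetCost, the five-place scale and the HALF_UP rounding are assumptions for illustration; a real BOM package would follow whatever rounding rules it has been given.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class RivetCost {
    /** Unit price as a cost factor, kept at five decimal places. */
    static BigDecimal unitPrice(BigDecimal packPrice, int packSize) {
        return packPrice.divide(BigDecimal.valueOf(packSize), 5, RoundingMode.HALF_UP);
    }

    /** Cost of a quantity, rounded to cents only at the end. */
    static BigDecimal lineCost(BigDecimal unitPrice, int quantity) {
        return unitPrice.multiply(BigDecimal.valueOf(quantity))
                .setScale(2, RoundingMode.HALF_UP);
    }

    public static void main(String[] args) {
        BigDecimal unit = unitPrice(new BigDecimal("37.50"), 1000);
        System.out.println(unit.toPlainString());              // prints 0.03750
        System.out.println(lineCost(unit, 8).toPlainString()); // prints 0.30
    }
}
```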
 
Lew

Martin said:
Respectfully disagree. Its a common issue with pricing anything that is
sold only in multi-packs - and this includes equities, which are almost
never bought or sold as single items. For instance:

- Eric quoted MTLQQ.PK at 0.7112 USD. You'd normally buy equities
  in shapes of at least $100 but, if you held a minimum quantity and took
  dividends as additional shares you might end up with a dividend of,
  say, 6 shares, value $4.2672, plus $0.68 c/f from a dividend of $4.95

  IOW, the calculation will always be adjusted so any cash amount, in
  this case the dividend and the c/f value, will be in dollars and whole
  cents.

- If you're building a widget that needs 8 rivets, which are sold $37.50
  in packs of 1000 the BOM package will for certain use a unit price
  of 0.03750 when costing the widget but, again, you'll never see that
  used as a monetary amount. Its just a cost factor.

"Monetary" is not synonymous with "currency"?

Just because 0.03750 is not an amount you actually paid in the
transaction doesn't mean it isn't a currency amount; it's just not the
actual currency amount of the actual transaction. You still have to
deal with the unit price when working up the transaction.

The computer system for Acme Widget Co. still has to handle that unit
price in its calculations and records. How else would you display the
answer to the query, "What is the unit price of the widget?" other
than as a currency amount?

Assuming you're using "currency amount" in the obvious, normal sense
of "monetary amount expressed in currency units". If you have some
non-obvious, ergo far less useful definition of "currency amount", do
please share it.
 
