Yohan N. Leder
You can convert between ISO-8859-15 and UTF8, too.
What's the advantage over just using ISO-8859-15 everywhere?
I can't really recommend one or the other. I prefer vendor-independent
standards and I'm a Unix guy, so I would generally prefer iso-8859-15.
OTOH, you probably have more Windows than Unix users, and the Unix users
are probably more able to work around charset issues, so windows-1252
will probably be less trouble to support.
Quoth Yohan N. Leder:
And, knowing the only difference between ISO-8859-1 and ISO-8859-15 is
the euro sign (from what I've understood), why not continue to use
ISO-8859-1 and manage to translate any euro sign to its HTML entity
(&euro;)?
The main concern about input, in this case, is to know when to convert
this euro sign: before submission (maybe using javascript)
Yeuch!
or at STDIN
parsing time. The second one requires that STDIN not be corrupted by
the presence of this out-of-charset character.
Quoth Yohan N. Leder:
What's the advantage over just using ISO-8859-15 everywhere?
Yohan said:
And, knowing the only difference between ISO-8859-1 and ISO-8859-15 is
the euro sign (from what I've understood)
Yohan said:
What you say here is that PHP can *include* a Perl script?
You're wrong there: there's more than one difference in the
conversion table.
Column 1 is the local, single byte character value, column 2 is
Unicode, which is identical to Latin-1 for characters with code
under 256.
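To see which positions actually differ, the two tables can be compared byte by byte. This is only a sketch: it assumes Perl 5.8 or later with the core Encode module (so it is no help on the older Perls discussed in this thread), and the helper name `differing_bytes` is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Return the byte values (0..255) that decode to different Unicode
# characters under the two named charsets.
sub differing_bytes {
    my ($charset_a, $charset_b) = @_;
    my @diff;
    for my $byte (0 .. 255) {
        my $in_a = decode($charset_a, chr($byte));
        my $in_b = decode($charset_b, chr($byte));
        push @diff, $byte if $in_a ne $in_b;
    }
    return @diff;
}

# Print: local byte, its ISO-8859-1 code point, its ISO-8859-15 code point.
for my $byte (differing_bytes('iso-8859-1', 'iso-8859-15')) {
    printf "0x%02X  U+%04X  U+%04X\n",
        $byte,
        ord(decode('iso-8859-1',  chr($byte))),
        ord(decode('iso-8859-15', chr($byte)));
}
```

This lists eight bytes (0xA4, 0xA6, 0xA8, 0xB4, 0xB8, 0xBC, 0xBD and 0xBE); the euro sign at 0xA4 is only one of them.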
Well, frankly it would be better to write the geriatric Perl versions
out of the way, and get on and do the job properly! That's the best
answer, to be honest.
No: PHP can load and execute other PHP files. It's the PHP equivalent of
modules in Perl.
That's fine for output, but if forms are submitted in the same charset
as the page the form was on, people won't be able to submit an entry
containing a euro. At least, not in any form you will be able to
understand.
There is no way of identifying a euro sign, however the browser submits
it (and non-broken browsers won't, anyway, as it's not valid). Every
8-bit byte is a valid ISO8859-1 character, so whatever single- or
multi-byte sequence the browser transmits for euro will just look like a
sequence of perfectly valid, but wrong, ISO8859-1 characters.
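A small illustration of that point, assuming for the sake of the example that the browser sent the euro as UTF-8 (it may send something else entirely). It needs Perl 5.8+ with Encode, and `latin1_code_points` is a hypothetical helper name.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Decode a byte string as ISO-8859-1 and return the resulting code points.
# This never fails, because every byte value is a valid ISO-8859-1 character.
sub latin1_code_points {
    my ($octets) = @_;
    return map { ord($_) } split //, decode('iso-8859-1', $octets);
}

# The three bytes that encode the euro sign (U+20AC) in UTF-8.
my @seen = latin1_code_points("\xE2\x82\xAC");
print join(' ', map { sprintf 'U+%04X', $_ } @seen), "\n";
```

The three bytes come back as three unrelated characters (U+00E2, U+0082, U+00AC), with nothing to mark them as a mangled euro.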
I think I would recommend either using 8859-15, or, if you think that's
dodgy,
1. work internally in iso8859-15,
2. make sure your output data is plain 7-bit ascii (HTML-escape
   everything else),
3. mark the data as UTF-8 (this is valid, as UTF-8 is a strict
   superset of 7-bit ascii)
4. decode the UTF-8 submissions into iso8859-15 yourself. This
   shouldn't be too hard: there will be some 128 multi-byte sequences
   (two bytes each, apart from the euro sign, which takes three)
   you want to translate to single bytes, and any other top-bit-set
   character is an error. If you're feeling lazy you could fork
   iconv(1). You may be able to rip bits from one of the
   Unicode::* modules, though I'd expect the actual decoding
   routines to be in C (which I guess is no use to you).
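Steps 2 and 4 might look roughly like this in plain Perl with no modules, so that it does not depend on Encode. Treat it as a sketch under assumptions: both function names are invented here, the input is assumed to be a byte string, only top-bit-set bytes are escaped (not `&`, `<` or `>`), and the C1 control bytes 0x80-0x9F are not treated specially.

```perl
use strict;

# ISO-8859-15 bytes whose Unicode code point is not simply the byte value.
my %LATIN9_TO_UNICODE = (
    0xA4 => 0x20AC, 0xA6 => 0x0160, 0xA8 => 0x0161, 0xB4 => 0x017D,
    0xB8 => 0x017E, 0xBC => 0x0152, 0xBD => 0x0153, 0xBE => 0x0178,
);
my %UNICODE_TO_LATIN9 = reverse %LATIN9_TO_UNICODE;

# Step 2: replace every top-bit-set byte with a numeric character
# reference, leaving pure 7-bit ASCII output.
sub latin9_to_ascii_html {
    my ($bytes) = @_;
    $bytes =~ s{([\x80-\xFF])}{
        my $byte = ord($1);
        my $cp   = exists $LATIN9_TO_UNICODE{$byte}
                 ? $LATIN9_TO_UNICODE{$byte}
                 : $byte;
        "&#$cp;"
    }ge;
    return $bytes;
}

# Step 4: decode a UTF-8 byte string into ISO-8859-15 bytes.
# Returns undef if the input is malformed or contains a character
# that ISO-8859-15 cannot represent.
sub utf8_to_latin9 {
    my ($bytes) = @_;
    my $out = '';
    while (length $bytes) {
        my $cp;
        if ($bytes =~ s/^([\x00-\x7F]+)//) {
            $out .= $1;                       # ASCII passes through unchanged
            next;
        }
        elsif ($bytes =~ s/^([\xC2-\xDF])([\x80-\xBF])//) {
            # Two-byte sequence: 110xxxxx 10xxxxxx
            $cp = ((ord($1) & 0x1F) << 6) | (ord($2) & 0x3F);
        }
        elsif ($bytes =~ s/^\xE2\x82\xAC//) {
            $cp = 0x20AC;                     # euro: the only three-byte case
        }
        else {
            return undef;                     # malformed or out of range
        }

        if (exists $UNICODE_TO_LATIN9{$cp}) {
            $out .= chr($UNICODE_TO_LATIN9{$cp});
        }
        elsif ($cp <= 0xFF && !exists $LATIN9_TO_UNICODE{$cp}) {
            $out .= chr($cp);                 # same value in both tables
        }
        else {
            return undef;                     # no ISO-8859-15 equivalent
        }
    }
    return $out;
}
```

For instance, `utf8_to_latin9("10 \xE2\x82\xAC")` gives the four bytes `"10 \xA4"`, and `latin9_to_ascii_html` turns those back into `10 &#8364;` for output.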
Ben
Hmm, I have to think deeper about your solution above and the one Alan
talks about (UTF-8 support through full "&code;" encoding).
I have to decide, but I'm still a little undecided at this time: why
wasn't Unicode invented from the beginning :-?
Quoth Yohan N. Leder:
Hmm; OK about the html entities to handle unicode values... Now, I have
to turn around and think deeper about your way and Ben's.
Well, however, imagine I release a second version without any support
for Perl before 5.8: how do I support UTF-8 in full (i/o and
internally)? What are the key points to check and/or rewrite in scripts?
Do all regexes and built-in functions work on UTF-8 strings?
What about literal strings (the configurable ones I told you about)?
I believe Alan and I are suggesting materially the same thing, if that
helps you understand it better (two ways of explaining things are
usually better than one).
It may be easier to write that version first, and get the
algorithms/whatever right before worrying about the character encoding.
If you write clean code it should be fairly straightforward to add the
conversions afterwards.
Yup. You need to mark filehandles with their encoding, using binmode or
3-arg open: see perlunicode. If you're getting data from CGI variables,
you may need to decode it into Perl's internal format: see Encode for
that.
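Roughly, on Perl 5.8 or later, those two pieces could look like the following. This is a sketch, not a recipe: `decode_param` and `save_text` are made-up names, the raw value stands in for whatever your CGI parsing hands you, and it assumes the form really was submitted as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Turn the raw octets of a submitted value into a Perl character string.
# Assumes the submission was UTF-8.
sub decode_param {
    my ($octets) = @_;
    return decode('UTF-8', $octets);
}

# Write a character string to a file in the given charset, using
# 3-arg open with an :encoding layer.
sub save_text {
    my ($path, $charset, $text) = @_;
    open my $fh, ">:encoding($charset)", $path
        or die "Cannot open $path: $!";
    print {$fh} $text;
    close $fh or die "Cannot close $path: $!";
    return;
}

# Stand-in for a raw CGI value: "10 " followed by a UTF-8 euro sign.
my $text = decode_param("10 \xE2\x82\xAC");

# Mark STDOUT with its encoding so characters are encoded on the way out.
binmode STDOUT, ':encoding(UTF-8)';
print $text, " (", length($text), " characters)\n";
```

A call such as `save_text('entry.txt', 'iso-8859-15', $text)` would then store the same string as ISO-8859-15 bytes; the file name is just a placeholder.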
You can use the encoding or utf8 pragmas to specify what charset your
source file is in, which includes the literal strings. See their
documentation.
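For literal strings, a minimal sketch with the utf8 pragma, assuming the script itself is saved as UTF-8 and Perl 5.8 or later; the label text is invented for the example.

```perl
use strict;
use warnings;
use utf8;    # tells perl that this source file is encoded in UTF-8

# With the pragma in effect the literal is a character string, so the
# euro sign counts as one character rather than three bytes.
my $label = "Prix : 10 €";

binmode STDOUT, ':encoding(UTF-8)';
print $label, "\n";
print length($label), "\n";
```

With `use utf8` the length is counted in characters; without it, the same literal would be seen as raw bytes.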
Ben
Quoth Yohan N. Leder:
OK, it's effectively less complex. It's a pity I have to support these
old platforms first...
But an idea came to mind while reading you: what do you think about the
tools which turn a Perl source into an exe? Don't they embed a 5.8 Perl
interpreter?
But you can write the new version first, if you like. Install perl on
your local machine.
They do. However, if your admins will let you install a custom CGI
binary, but won't upgrade perl, then they need their heads examining.