williams.wilkie
Hello! I have recently been turned on to Encode. We have some folks
who are copying and pasting from Word straight into our CMS and the
need to convert from "Windows-1252" to "utf-8" is now critical.
For a one-liner I have been using this:
perl -MEncode=from_to -i -pe 'from_to($_, "windows-1252", "utf-8")' file1.txt file2.txt
It works well for editing in place.
My quandary is that I now need to tackle multiple files in a directory,
and another developer mentioned that if "UTF-8" and "Windows-1252" are
intermixed in a file, the conversion may get confused and I should do a
transliteration instead, like:
tr/\x93/\N{LEFT DOUBLE QUOTATION MARK}/;
I wonder if that's really true. Also, when it comes to opening and
closing file handles for this, should I be using something like
"binmode OUTPUTFILEHANDLE, ':bytes';"?
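
To make the two questions concrete: from_to() treats every input byte as Windows-1252, so running it over text that is already UTF-8 would encode those bytes a second time. Below is a rough sketch of one possible way around that, not a tested solution. It tries a strict UTF-8 decode on each line first and falls back to Windows-1252 only when that fails, and it opens both handles with the ':raw' layer so Perl passes bytes through untouched. The helper names to_utf8 and convert_file and the 'content/*.txt' pattern are made up for illustration, and the fallback is an assumption rather than real encoding detection.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: return UTF-8 bytes for a line that is either
# already valid UTF-8 or (by assumption) Windows-1252.
sub to_utf8 {
    my ($bytes) = @_;
    my $copy = $bytes;    # decode() with a CHECK argument may modify its input
    my $text = eval { decode('UTF-8', $copy, Encode::FB_CROAK) };
    # Not valid UTF-8, so assume Windows-1252 (a guess, not a detection).
    $text = decode('cp1252', $bytes) unless defined $text;
    return encode('UTF-8', $text);
}

# Hypothetical helper: rewrite one file in place, line by line.
sub convert_file {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "Cannot read $path: $!";
    my @lines = map { to_utf8($_) } <$in>;
    close $in;

    open my $out, '>:raw', $path or die "Cannot write $path: $!";
    print {$out} @lines;
    close $out or die "Cannot close $path: $!";
}

# 'content/*.txt' is a placeholder pattern for the directory in question.
convert_file($_) for glob('content/*.txt');
```

Two caveats with this sketch: a single line that mixes both encodings still falls through to the Windows-1252 branch as a whole, so its UTF-8 parts would be double-encoded, and the file is overwritten without a backup.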
I am impressed with Encode but any advice or words that anyone wants
to throw in would be greatly appreciated.
Wilkie
flames go quietly to /dev/null