East European characters from LaTeX to UTF-8

Francois

Hi,
With the modules TeX::Encode and Encode, I convert characters from
LaTeX to UTF-8. It works great except for characters used in Slovakia,
for example c or z with caron: č ž

TeX::Encode uses the following modules:
use Encode::Encoding;
use Pod::LaTeX;
use HTML::Entities;

and, from the comments in TeX::Encode: "It uses the mapping from
Pod::LaTeX, but we use HTML::Entities
to get the Unicode character".
Is there another module I should install to convert these East
European characters?
Thanks for any advice!

Francois
 
Joost Diepenmaat

Hi,
With the modules TeX::Encode and Encode, I convert characters from LaTeX
to UTF-8. It works great except for characters used in Slovakia, for
example c or z with caron: č ž

Which encoding are your original LaTeX files in? Plain 7-bit ASCII, or
ISO-8859-1 with LaTeX markup for the special characters, or something else?

If something else, it may help to open and read the LaTeX files using the
right "lower level" encoding layer. For example, if you're using cp1250
for the LaTeX files:

open my $fh,"<:encoding(cp1250)","/my/latex/file.tex" or die $!;

print decode('latex',<$fh>);

See also the manpages for perlio and Encode

Joost.
 
Joost Diepenmaat

print decode('latex',<$fh>);

Oops. That should probably be

print decode('latex',join('',<$fh>))

or something similar - decode accepts only a single input string.

Joost.
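
Putting the two posts together, a minimal sketch of the read step might look like this. It uses only core modules; the temporary file and its two cp1250 bytes are made up for illustration, and the final decode('latex', ...) call is left as a comment because it needs TeX::Encode installed.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Illustrative sample only: the cp1250 bytes 0xE8 and 0x9E,
# i.e. c and z with caron.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "\xE8 \x9E\n";
close $out or die "Cannot close $path: $!";

# The :encoding layer turns the cp1250 bytes into Perl characters on read.
open my $fh, '<:encoding(cp1250)', $path or die "Cannot open $path: $!";

# Slurp the whole file into ONE string instead of a list of lines.
my $content = do { local $/; <$fh> };
close $fh;

# With TeX::Encode loaded, the single string could then be passed on:
#   print decode('latex', $content);

# Show which code points were read.
printf "U+%04X\n", ord($_) for grep { /\S/ } split //, $content;
```

Slurping with a localized $/ does the same job as join('', <$fh>) without building the intermediate list of lines.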
 
Francois

Which encoding are your original LaTeX files in? Plain 7-bit ASCII, or
ISO-8859-1 with LaTeX markup for the special characters, or something else?

The file is ASCII: it's from Google Scholar with the "Import BibTeX"
option on:

@article{fedor2007dea,
title={{Dissociative electron attachment to HBr: A temperature
effect}},
author={Fedor, J. and Cingel, M. and Skaln{\`y}, JD and Scheier, P.
and M{\"a}rk, TD and {\v{C}}{\'\i}{\v{z}}ek, M. and Koloren{\v{c}}, P.
and Hor{\'a}{\v{c}}ek, J.},
journal={Physical Review A},
volume={75},
number={2},
pages={22703},
year={2007},
publisher={APS}
}
 
Joost Diepenmaat

The file is ASCII: it's from Google Scholar with the "Import BibTeX"
option on

Hmm... Looks like Pod::LaTeX only handles ISO 8859-1 characters.
You will probably have to add the extra characters you're using to
TeX::Encode yourself, or find some other way of converting LaTeX to text.

Joost.
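
One way to read "add the extra characters yourself" is a small substitution pass that turns LaTeX accent macros into Unicode combining characters and lets NFC normalization compose them. This is only a sketch with core modules: latex_accents_to_unicode, _compose and the accent table are hypothetical names, not part of TeX::Encode or Pod::LaTeX, and the table covers just the four accents that appear in the BibTeX entry above.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC);

# Hypothetical table: LaTeX accent command => Unicode combining character.
my %COMBINING = (
    'v' => "\x{030C}",   # caron
    "'" => "\x{0301}",   # acute
    '`' => "\x{0300}",   # grave
    '"' => "\x{0308}",   # diaeresis
);

# A base is a single ASCII letter or the dotless-i/j macros \i and \j.
my $base = qr/\\[ij](?![A-Za-z])|[A-Za-z]/;

sub _compose {
    my ($accent, $letter) = @_;
    $letter =~ s/^\\([ij])$/$1/;               # \i and \j become plain i, j
    return NFC($letter . $COMBINING{$accent}); # compose into one code point
}

sub latex_accents_to_unicode {
    my ($text) = @_;
    # Braced form: \v{c}, \'{a}
    $text =~ s/\\([v'`"])\{($base)\}/_compose($1, $2)/ge;
    # Unbraced form, symbol accents only: \'a, \"a, \'\i
    $text =~ s/\\(['`"])($base)/_compose($1, $2)/ge;
    # Drop grouping braces left around a single character: {X} -> X
    $text =~ s/\{(.)\}/$1/g;
    return $text;
}

binmode STDOUT, ':encoding(UTF-8)';
print latex_accents_to_unicode(q({\v{C}}{\'\i}{\v{z}}ek, M.)), "\n";
```

Anything the table does not know about is left untouched, so unsupported macros stay visible in the output rather than being silently dropped.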
 
