PDF::Writer and Unicode

Xavier Noria

According to the current manual PDF documents generated by
PDF::Writer can use UTF-16BE, but after a few trials with iconv I
can't get my UTF-8 strings right. Example:

$KCODE = 'u'

require 'rubygems'
require 'pdf/writer'
require 'iconv'

str = Iconv.iconv('UTF-16BE', 'UTF-8', 'á ß €')
pdf = PDF::Writer.new

# renders á and ß right, but not €
pdf.text str

# same output with garbage prepended
pdf.text "\xfe\xff#{str}"
pdf.save_as('unicode_test.pdf')

The manual does not document if any encoding is needed for
select_font, I've played around with variations of

# gives complete garbage
pdf.select_font 'Times-Roman', :encoding => 'UTF-16BE'

without luck.

TextMate is generating UTF-8 source files for sure. Any ideas?

-- fxn
 
Vincent Fourmond

Xavier said:
The manual does not document if any encoding is needed for select_font,
I've played around with variations of

# gives complete garbage
pdf.select_font 'Times-Roman', :encoding => 'UTF-16BE'

without luck.

I'm not familiar with PDF::Writer, but I would be surprised if you
really had all the glyphs for 'UTF-16BE' by default. What is the exact
output? Does it produce the PDF file, or does it simply fail with an
exception, or crash?

If a PDF file is produced (of reasonable size), would you mind posting
it?

Cheers,

Vince
 
Xavier Noria

Vincent said:
I'm not familiar with PDF::Writer, but I would be surprised if you
really had all the glyphs for 'UTF-16BE' by default. What is the exact
output? Does it produce the PDF file, or does it simply fail with an
exception, or crash?

If a PDF file is produced (of reasonable size), would you mind posting
it?

Sure, it's just 4KB. This is the PDF generated by

$KCODE = 'u'

require 'rubygems'
require 'pdf/writer'
require 'iconv'

str = Iconv.iconv('UTF-16BE', 'UTF-8', 'á ß €')
pdf = PDF::Writer.new
pdf.text str
pdf.text "\xfe\xff#{str}"
pdf.save_as('unicode_test.pdf')

As you see, the glyph we get wrong in this small test is the euro
symbol. This is important to me because not only is my database in
UTF-8, coming from an unrestricted UTF-8 frontend (website), but the
application has money here and there and needs to be able to output
that currency symbol.

-- fxn


[Attachment: unicode_test.pdf]
 
Vincent Fourmond

Xavier said:
Sure, it's just 4KB. This is the PDF generated by

$KCODE = 'u'

require 'rubygems'
require 'pdf/writer'
require 'iconv'

str = Iconv.iconv('UTF-16BE', 'UTF-8', 'á ß €')
pdf = PDF::Writer.new
pdf.text str
pdf.text "\xfe\xff#{str}"
pdf.save_as('unicode_test.pdf')

As you see, the glyph we get wrong in this small test is the euro
symbol. This is important to me because not only is my database in
UTF-8, coming from an unrestricted UTF-8 frontend (website), but the
application has money here and there and needs to be able to output
that currency symbol.

Actually, what you see on the screen is the latin1 representation of
your UTF-16BE string (see below). ^@ means chr 0 and seems to be ignored
by the PDF viewers, and UTF-16BE has the good taste to map to latin1 for
values up to 255. See what less unicode_test.pdf is giving me (I'm on a
latin1 locale):

BT 36.000 744.440 Td /F1 10.0 Tf 0 Tr (^@á^@ ^@ß^@ ¬) Tj ET
BT 36.000 732.880 Td /F1 10.0 Tf 0 Tr (þÿ^@á^@ ^@ß^@ ¬) Tj ET
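
The byte pattern in that dump can be reproduced without PDF::Writer. The following is a minimal sketch, assuming Ruby 1.9 or later (String#encode instead of the Iconv calls used in the thread); the method names are made up for illustration. It prints the UTF-16BE bytes of the test string and then reads the same bytes as Windows-1252, which agrees with latin1 for every byte that occurs here.

```ruby
# Sketch only: helper names are hypothetical, not part of PDF::Writer.
# Unicode escapes are used so the result does not depend on the
# encoding of the source file.
TEST_STRING = "\u00E1 \u00DF \u20AC" # "á ß €"

# Hex dump of the UTF-16BE bytes of a string.
def utf16be_hex(text)
  text.encode("UTF-16BE").bytes.map { |b| format("%02X", b) }.join(" ")
end

# The same bytes, reinterpreted one byte per character as Windows-1252
# and converted back to UTF-8 so they can be printed.
def utf16be_seen_as_cp1252(text)
  text.encode("UTF-16BE").force_encoding("Windows-1252").encode("UTF-8")
end

puts utf16be_hex(TEST_STRING)
# 00 E1 00 20 00 DF 00 20 20 AC

puts utf16be_seen_as_cp1252(TEST_STRING).inspect
```

The last two bytes, 20 AC, are the UTF-16BE code unit for the euro sign (U+20AC). Read one byte at a time they are a space followed by ¬, which matches the end of both text lines in the dump.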

Moreover, in this particular case, you are using the Helvetica
built-in font, and I'm pretty sure it doesn't have glyphs for a Euro
symbol. Finally, acroread says that the encoding of the font is 'ansi'.
That is definitely not what you want. Keep in mind that most of the
fonts (about everywhere) are defined for a small encoding (ansi/latin1,
or other 8-bit encodings). I unfortunately don't think I can help you
further. If you don't rely too much yet on PDF::Writer, you could use
pdfLaTeX as an alternative, although the PDF produced will be
significantly bigger (for small files)...
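
Since the font in the generated file is declared with /WinAnsiEncoding, one thing to try is handing the library single-byte Windows-1252 text instead of UTF-16BE. The following is a minimal sketch of only the transcoding step, assuming Ruby 1.9 or later; to_winansi is a hypothetical helper, and whether PDF::Writer and the viewer's Helvetica then show a euro glyph is an assumption that has not been tested here. On Ruby 1.8 the equivalent conversion would presumably be done with Iconv.conv.

```ruby
# Sketch only: to_winansi is a hypothetical helper, not a PDF::Writer API.
# Characters with no Windows-1252 equivalent are replaced rather than
# raising Encoding::UndefinedConversionError, because the input comes
# from an unrestricted UTF-8 frontend.
def to_winansi(utf8_text, replacement = "?")
  utf8_text.encode("Windows-1252",
                   invalid: :replace, undef: :replace, replace: replacement)
end

bytes = to_winansi("\u00E1 \u00DF \u20AC").bytes # "á ß €"
puts bytes.map { |b| format("%02X", b) }.join(" ")
# E1 20 DF 20 80

# A character outside Windows-1252 (here a CJK ideograph) becomes "?".
puts to_winansi("\u4E2D").bytes.inspect
# [63]
```

In Windows-1252 the euro sign is the single byte 0x80, so the converted string has one byte per character and no zero bytes. Replacing unmappable characters silently loses data; raising or logging instead may be the better choice for an application that handles money.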

Welcome to the nightmare world of fonts and encodings...

Vince
 
Austin Ziegler

Xavier said:
According to the current manual PDF documents generated by
PDF::Writer can use UTF-16BE, but after a few trials with iconv I
can't get my UTF-8 strings right. Example:

The manual is incorrect; I have recently figured out how to write
UTF-16 strings, but the current PDF::Writer doesn't do this (and there
are issues that I need to resolve before this will even show up in any
release of PDF::Writer).

-austin
 
Simon Kröger

Vincent said:
Welcome to the nightmare world of fonts and encodings...

... and PDF generation in Ruby.

If this helps, you can see me struggling with the same
problem here:

http://groups.google.de/group/comp.lang.ruby/browse_thread/thread/54336c6a932903fe/f0bb48520dac2ba5

I ended up using libharu (http://libharu.sourceforge.net/).

It is cross-platform, FAST and has Ruby bindings (it is a little bit
clumsy to use and the Ruby bindings are missing some functions, but
it is the best I could find).

example:
-----------------------------------------------------------------------
require "hpdf"

pdf = HPDFDoc.new
font = pdf.get_font("Helvetica", "CP1254")

page = pdf.add_page

page.set_size(HPDFDoc::HPDF_PAGE_SIZE_A4, HPDFDoc::HPDF_PAGE_PORTRAIT)
page.set_font_and_size(font, 96)

page.begin_text

page.move_text_pos(100, 700)
page.show_text("\x80")

page.end_text

pdf.save_to_file "c:/temp/test.pdf"
-----------------------------------------------------------------------
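
The "\x80" passed to show_text above is a raw byte in the code page given to get_font. As a minimal sketch, assuming Ruby 2.0 or later (for String#b), the following decodes that byte to show which character it stands for; decode_single_byte is a hypothetical helper and does not use libharu.

```ruby
# Sketch only: decode_single_byte is a hypothetical helper.
# It interprets a raw byte string in the given single-byte code page
# and returns the result as UTF-8.
def decode_single_byte(raw, code_page)
  raw.dup.force_encoding(code_page).encode("UTF-8")
end

puts decode_single_byte("\x80".b, "Windows-1254") # €
puts decode_single_byte("\x80".b, "Windows-1252") # €
```

Both code pages place the euro sign at byte 0x80, so the example above asks libharu to draw a euro sign.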

With a little love to the wrapper this could be really good...

cheers

Simon
 
Xavier Noria

Vincent said:
Moreover, in this particular case, you are using the Helvetica
built-in font, and I'm pretty sure it doesn't have glyphs for a Euro
symbol.

Austin explained the issue. But just to understand that remark: is
the Helvetica in the PDF different from the Helvetica I use on my
system? The Helvetica here on the Mac certainly has the euro
symbol.

-- fxn
 
