html to pdf

Surbhi

Hi
We have HTML datasheets, but now we want them in PDF format because
the page layout is very bad when the HTML is printed.

I am through with the XML and XSLT part, but I do not have any idea of
XSL-FO.
I guess I would need to use "fo" tags in the XSLT. Could someone
suggest some good reference material, or pointers for this?

It's a bit urgent, so please do help.

Best Regards
Surbhi
 
Andy Dingley

We have HTML datasheets but now we want them in PDF format because
page layout is very bad when HTML is printed.

I wouldn't give up on printed HTML, but I can understand how you're
thinking here.

You have two options. The first is "Print HTML to PDF", using tools
such as Adobe's (expensive) or Foxit (simpler, free).

Otherwise go down the XSLT, XSL:FO, PDF route. You'll probably find
Apache FOP to be the easiest route from :FO to PDF. I do this a lot (a
quarter of my working day) and host it all within Java and Ant as a
"make" framework to glue it all together. For bigger systems, Cocoon
or Apache Forrest are worth looking at too.

Learning to code the XSL:FO is painful; the rest is well-established
pipeline tools that just get on with it and work. You'll find that
good HTML + CSS knowledge is a good starting point for understanding
XSL:FO properties and rendering. If you have that, then just the W3C
specs for XSL:FO are enough to work with. If you want a CSS
background, read Lie & Bos, "Cascading Style Sheets". The Usenet group
c.i.w.a.s (comp.infosystems.www.authoring.stylesheets) is good too.
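
To illustrate the point about CSS knowledge carrying over, here is a rough Python sketch that turns an inline CSS declaration string into XSL-FO attributes. The function name and the whitelist are hypothetical and chosen for illustration; the whitelist only covers a handful of properties that exist under the same name in both CSS and XSL-FO, and everything else is simply dropped.

```python
# A small whitelist of properties that exist under the same name in both
# CSS and XSL-FO. This set is an illustrative assumption, not a full mapping.
SHARED_PROPERTIES = {
    "font-family", "font-size", "font-weight", "font-style",
    "color", "text-align", "line-height",
}


def css_to_fo_attributes(style):
    """Parse an inline CSS declaration string into a dict of FO attributes."""
    attributes = {}
    for declaration in style.split(";"):
        name, sep, value = declaration.partition(":")
        name, value = name.strip().lower(), value.strip()
        # Keep only well-formed declarations for whitelisted properties.
        if sep and value and name in SHARED_PROPERTIES:
            attributes[name] = value
    return attributes
```

The resulting dict could be used as the attribute set of an fo:block. Layout-related CSS (floats, positioning, display) has no such one-to-one equivalent and needs real design decisions.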

This stuff isn't an easy bit of knowledge to learn, so start simple
and get _something_ working first, then look to expand it. It's very
useful long-term though, so it does repay the effort.
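
As one possible "get _something_ working" starting point, the sketch below builds a one-page XSL-FO document with Python's standard library. The helper names (`fo`, `hello_fo`), the master name "A4" and the margins are arbitrary choices for illustration; only the fo: element and attribute names come from the XSL-FO vocabulary.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)  # serialise with the usual fo: prefix


def fo(tag):
    """Return the namespace-qualified name for an fo: element."""
    return f"{{{FO_NS}}}{tag}"


def hello_fo(text):
    """Build a one-page A4 XSL-FO tree holding a single block of text."""
    root = ET.Element(fo("root"))

    # Page geometry: one simple page master (the name "A4" is arbitrary).
    masters = ET.SubElement(root, fo("layout-master-set"))
    page = ET.SubElement(masters, fo("simple-page-master"), {
        "master-name": "A4",
        "page-width": "210mm",
        "page-height": "297mm",
        "margin-top": "20mm",
        "margin-bottom": "20mm",
        "margin-left": "20mm",
        "margin-right": "20mm",
    })
    ET.SubElement(page, fo("region-body"))

    # Content: a page sequence that flows one block into the body region.
    seq = ET.SubElement(root, fo("page-sequence"), {"master-reference": "A4"})
    flow = ET.SubElement(seq, fo("flow"), {"flow-name": "xsl-region-body"})
    block = ET.SubElement(flow, fo("block"), {"font-size": "12pt"})
    block.text = text
    return root
```

Writing the tree out with `ET.ElementTree(hello_fo("Hello")).write("hello.fo", encoding="utf-8", xml_declaration=True)` should give a file that an FO processor can render; once that works, headings, tables and images can be added one at a time.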
 
Ken Starks

Whether you would be better off transforming your XML directly to
'Formatting Objects', or transforming it indirectly (to 'DocBook' or
something similar with an off-the-shelf transformation to XSL-FO), is
a moot point.

There are also a few off-the-shelf stylesheets that convert HTML
directly to PDF, but the typographic quality varies.

Whatever you choose, you still need a 'serialiser' to convert the XSL-FO
into PDF.
Apache FOP is a popular open source one.
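
For reference, the FO-to-PDF step with FOP can be driven from a script. The sketch below assumes the `fop` launcher script from an Apache FOP installation is on the PATH; the function names are hypothetical, and the `-fo` and `-pdf` options are FOP's command-line switches for the input FO file and the output PDF.

```python
import subprocess


def fop_command(fo_path, pdf_path, fop_executable="fop"):
    """Return the argument list for rendering an XSL-FO file to PDF."""
    return [fop_executable, "-fo", fo_path, "-pdf", pdf_path]


def render_pdf(fo_path, pdf_path, fop_executable="fop"):
    """Run FOP; raises CalledProcessError if it exits with an error status."""
    subprocess.run(fop_command(fo_path, pdf_path, fop_executable), check=True)
```

Calling `render_pdf("datasheet.fo", "datasheet.pdf")` would then produce the PDF, provided FOP is installed and the file names (made up here) exist.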

Wikipedia is a good place to start:

http://en.wikipedia.org/wiki/XSL_Formatting_Objects


If you want a complete system, so that you can concentrate on learning
one bit at a time (such as XSL-FO), you could try Apache Cocoon and use
the 'Hello World' example, where the same XML is converted to MANY
output formats, including PDF, XHTML, SVG, PostScript, Flash,
OpenDocument, Excel ...

There is an example of the kind of stuff you need to do at:

http://cocoon.apache.org/2.1/howto/howto-html-pdf-publishing.html


Some of the major software vendors have tutorials on the use
of XSL-FO, for example RenderX and Antenna House; most of
the material is just as relevant to a free serialiser such as
Apache FOP. (If graphics are important, you may need a 'try it and
see' approach, particularly with vector graphics and
transparency, which are degraded or lost by some serialisers.)

On the other hand, at least one serialiser now goes beyond the XSL-FO
specification, allowing rudimentary interactive forms in the PDF.

Finally, there are systems which you can use to convert your XML
to LaTeX, and from there you will get very high quality output. But
LaTeX is yet another massive learning task if none of your team already
knows it.
 
Andy Dingley

Whether you would be better off transforming your XML directly to
'Formatting Objects', or transforming it indirectly (to 'DocBook' or
something similar with an off-the-shelf transformation to XSL-FO), is
a moot point.

I wouldn't go down that route, via DocBook.

Of course this all depends a _lot_ on the quality of the HTML. HTML
3.2 with presentation guff all over it is a lot more trouble to work
with than pure-semantics HTML 4 + CSS. This is true for any processing
toolset. HTML 4 with a bad case of "divitis" is actually one of the
easiest targets for conversion to XSL:FO. Bad practice for coding
semantic HTML, but a closer match to your target here.

HTML is somewhat more generalised than DocBook, so converting
"upwards" to DocBook is unlikely to have any more structure implied in
it than is simply inferred automatically from the HTML. DocBook isn't
some fantastic panacea anyway - I've rarely used it in practice, as
its minor advantages over HTML are all too often outweighed by being
yet another format. Unless you need book-level structuring, if all you
need is inline markup, paragraphs and headings, then HTML 4 gives you
nearly as much anyway.

I'd consider going from HTML to DocBook if I was concatenating a
number of pages to make one single DocBook representing the whole set
as a site, but very rarely for single page stuff.

As for the pre-existing transforms from DocBook to XSL:FO, these are
certainly available and well done, but they're not as useful as one
might think. This is for two reasons: they're not as necessary
as one might think, and it's not so hard to do without them.

The off-the-shelf DocBook stylesheets have a big advantage in that
they're competent, full implementations of all DocBook elements. Now
most of us just don't need that, because we only author a tiny subset
of DocBook anyway. I've never used the <kitchen-sink> element,
although I'm sure DocBook has one somewhere. This is particularly the
case for auto-generated DocBook out of HTML. Secondly, it's not that
hard to write a minimal XSLT to make simple (i.e. little formatting
subtlety) XSL:FO. Thirdly, it's harder to make XSL:FO with complex
formatting. If you don't need this, either use the pre-existing
stylesheet or write your own - neither is impossibly hard. If you _do_
need complex formatting, you probably have to write your own XSLT
whether you like it or not.
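
To give a feel for how small a "minimal" transform can be, here is a rough sketch of the same idea in Python rather than XSLT (a swapped-in technique, not the approach described above). It assumes well-formed, un-namespaced HTML whose body contains only h1, h2 and p children; the style table and function names are hypothetical, and inline markup is flattened to plain text.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)

# Hypothetical house style: block-level HTML tag -> fo:block properties.
BLOCK_STYLES = {
    "h1": {"font-size": "18pt", "font-weight": "bold", "space-after": "6pt"},
    "h2": {"font-size": "14pt", "font-weight": "bold", "space-after": "4pt"},
    "p": {"font-size": "10pt", "space-after": "4pt"},
}


def fo(tag):
    """Return the namespace-qualified name for an fo: element."""
    return f"{{{FO_NS}}}{tag}"


def html_body_to_flow(body):
    """Convert the direct h1/h2/p children of a <body> into an fo:flow."""
    flow = ET.Element(fo("flow"), {"flow-name": "xsl-region-body"})
    for child in body:
        style = BLOCK_STYLES.get(child.tag)
        if style is None:
            continue  # elements without a style entry are skipped here
        block = ET.SubElement(flow, fo("block"), dict(style))
        # itertext() flattens inline markup such as <b> or <em> to text.
        block.text = "".join(child.itertext()).strip()
    return flow
```

The returned fo:flow still has to be placed inside an fo:page-sequence within an fo:root that declares a page master. Tables, images and nested lists are where the real work starts, which matches the point above about complex formatting.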
 
Ken Starks

I agree with you, Andy. DocBook is a poor example, being far too
heavy. I think I was really thinking of something more lightweight
such as LinuxDoc (which you can take into LyX for tweaking). I have
also, recently, given DITA a quick spin, but it also seems to
be yet another format. (It too has many more elements than HTML,
by the way.)

Yours,

Ken.
 
