PDF to HTML conversion help...

Guest

Hello,

I posted the following question a few weeks ago and never received any replies,
so I figured I'd try again.

I'm seeking suggestions for an interesting problem I have. I'm building a
web application that presents legal documents that users must accept in order
to proceed on the site. The documents are in PDF format. The legal
department requires that I convert the PDF docs to HTML and display them on a
web page. I can't just display the PDF document (which would be simple
enough) but rather need to convert the PDF to HTML. This will allow me to
include the standard "I agree..." checkbox and submit button at the bottom of
the page. Users will only be able to get to the submit button by scrolling
down the entire HTML page, and thus theoretically reading the entire
agreement. I found a tool called "ABC Amber PDF Converter" that converts PDF
files to HTML, but the output quality is poor enough to be unusable. Plus, it
requires shelling out to launch a command-line exe. Do you have any other
ideas, or know of any other tools that might do the trick?

Thanks for your help,
Greg S.
 
Anthony Biondo Jr.

I know Adobe Acrobat Writer allows you to embed forms in a PDF. You could
easily create a form that posts to your server and insert the data.

Hope this helps,
Anthony
 
Guest

Hello Anthony,

Thanks for your suggestion. Actually, I'm using a tool named PDFKit.Net to
generate the PDF. Basically, I'm doing what you suggested in that I have a
PDF Form in which I dynamically insert data from the db. The only problem is
that the legal department wants me to email the customer the PDF version of
the document I generate, but display an HTML version of it on the web page.
Their requirement is that I display the whole thing as one long document
(with no scroll bars) so that the user is forced to scroll down the entire
page to get to the accept checkbox and submit button. They won't let me just
display the PDF document in a web page or in some sort of viewer control. So
I'm trying to find some way to convert the PDF document to HTML while keeping
most of the formatting intact. Hopefully there's something out there that
does this...
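
To make the required layout concrete, here is a minimal sketch of the page structure in Python (the real application is .NET, so treat it as an illustration only). The function name `build_agreement_page`, the `accept` field name, and the example URL are all made up for this sketch; the point is simply that the converted agreement comes first and the acceptance form comes last, in one page with no inner scrolling container.

```python
from html import escape


def build_agreement_page(agreement_html: str, action_url: str) -> str:
    """Return one long HTML page: agreement text first, acceptance form last.

    agreement_html is assumed to be trusted, already-converted HTML.
    action_url is escaped because it is placed inside an attribute.
    """
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<body>\n"
        '<div id="agreement">\n'
        f"{agreement_html}\n"
        "</div>\n"
        f'<form method="post" action="{escape(action_url, quote=True)}">\n'
        '  <label><input type="checkbox" name="accept" value="yes" required>'
        " I agree to the terms above</label>\n"
        '  <input type="submit" value="Submit">\n'
        "</form>\n"
        "</body>\n</html>"
    )


if __name__ == "__main__":
    # Hypothetical agreement body and URL, for demonstration only.
    print(build_agreement_page("<p>Full agreement text goes here.</p>", "/accept"))
```

Because the form is emitted after the agreement in normal document flow, the only way to reach the checkbox is to scroll past the whole text.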

Thanks for your help,
Greg
 
Guest

Ben,

Thanks for your suggestions. Unfortunately, because the server my web app
sits on is used for various other sites as well, and Adobe Acrobat tends to
be a bit of a resource hog, my bosses don't want to install it on the server.
Plus, since I will be generating the documents on the fly, I'm not sure how
Acrobat would handle simultaneous requests. I know MS doesn't recommend
using Office products in this fashion.

As for embedding the PDF in the HTML page, I thought of that one, too. The
problem is the legal department doesn't want it to be inside of a separate
control with scrollbars, as this conceivably would allow the user to scroll
down the page to get to the "I Agree" button without actually scrolling
through the legal agreement, so that one was a no go.

I really do appreciate your input, though.

Greg
 
Hans Kesting

webgreginsf said:
Hello Anthony,

Thanks for your suggestion. Actually, I'm using a tool named
PDFKit.Net to generate the PDF.

You are *generating* those PDFs yourself? Why not generate the
HTML version from the same data, instead of first going to PDF
and then to HTML?
If you want output in PDF format, use your existing code,
if you want output in HTML, generate that directly from the data,
exactly the way *you* want it.

Hans Kesting
 
Guest

Hans,

Thanks for your idea. That is actually a very good idea and might be what I
have to do. The only reason I was trying to steer clear of generating the
HTML is that these documents I'm generating/displaying are 30+ page monster
legal contracts that can change fairly often. The legal department creates
the originals in MS Word, which we then convert to a PDF form using Acrobat
Writer. This form serves as a "template" that I place out on the web server
and populate certain user-specific pieces of data like the user's name,
company info, etc. at run time. We were hoping to automate the process so
that the users can generate the PDF templates without any manual intervention
from my team. If I use the HTML method like you talked about, I'll probably
have to manually convert the monsters to HTML on a fairly regular basis.
They don't want MS Word on the server, so I can't just use the built-in Word
to HTML conversion to generate the HTML, as the generated HTML uses
Word-specific tags that require Word to be on the web server.

I think you're right, though, and doing the HTML myself is probably the best
option, so that I'll have two templates for each doc (both PDF and HTML
versions). I can then dynamically fill in the same info into both of them at
run time. It requires some manual steps, but hey, something is better than
nothing, right?
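
Roughly, the HTML half of that two-template approach could look like the sketch below. It is written in Python purely for illustration (the actual site is .NET), and the template text, the placeholder names `user_name` and `company`, and the helper `fill_html_template` are all hypothetical. The same `fields` dictionary would also be handed to whatever code fills in the PDF form, which is not shown here.

```python
from html import escape
from string import Template

# Hypothetical HTML template; in practice this would be the hand-converted
# contract with $placeholders where the user-specific data goes.
HTML_TEMPLATE = Template(
    "<h1>Agreement</h1>\n"
    "<p>This agreement is between $company and $user_name.</p>"
)


def fill_html_template(template: Template, fields: dict[str, str]) -> str:
    """Substitute user-specific fields into the template, HTML-escaping each value."""
    safe_fields = {key: escape(value) for key, value in fields.items()}
    # substitute() raises KeyError if the template names a field we did not supply,
    # which catches a template/data mismatch early.
    return template.substitute(safe_fields)


if __name__ == "__main__":
    fields = {"user_name": "Pat Smith", "company": "Acme & Sons"}
    print(fill_html_template(HTML_TEMPLATE, fields))
```

Keeping the field names identical in both templates means one lookup from the db can drive both outputs.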

Thanks for your help,
Greg
 
Kelly White

webgreginsf said:
They don't want MS Word on the server, so I can't just use the built-in Word
to HTML conversion to generate the HTML, as the generated HTML uses
Word-specific tags that require Word to be on the web server.

If you are using one of the latest versions of Office, save the document as
XML and transform that file into HTML on the server.
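
As a rough illustration of that idea, the sketch below pulls paragraph text out of a Word XML file and wraps it in `<p>` tags. It assumes the Word 2003 XML format (WordprocessingML, with the `w:p` / `w:r` / `w:t` elements); the post does not say which XML format is meant, and a real transform would more likely be an XSLT stylesheet that also carries over headings, lists, and tables. The function name `wordml_to_html` and the sample document are made up for this example, and Python is used only for illustration.

```python
import xml.etree.ElementTree as ET
from html import escape

# Namespace used by Word 2003 XML documents (an assumption about the format).
W_NS = "http://schemas.microsoft.com/office/word/2003/wordml"


def wordml_to_html(xml_text: str) -> str:
    """Convert each non-empty w:p paragraph to an HTML <p>, dropping all formatting."""
    root = ET.fromstring(xml_text)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is spread over one or more w:t elements inside runs.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        if text.strip():
            paragraphs.append(f"<p>{escape(text)}</p>")
    return "\n".join(paragraphs)


# Tiny hand-written sample standing in for a file saved from Word.
SAMPLE = f"""<w:wordDocument xmlns:w="{W_NS}">
  <w:body>
    <w:p><w:r><w:t>Terms &amp; Conditions</w:t></w:r></w:p>
    <w:p><w:r><w:t>1. The user </w:t></w:r><w:r><w:t>agrees.</w:t></w:r></w:p>
  </w:body>
</w:wordDocument>"""

if __name__ == "__main__":
    print(wordml_to_html(SAMPLE))
```

Since this only needs an XML parser, nothing from Office has to be installed on the web server.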
 
