What is the best HTML to LaTeX program on the market or the internet?

vasan999 · Oct 22, 2007

Basically, it should do everything the tools listed below do and, in
addition:

1/
Human-readable output that preserves the line structure of the source,
i.e. it does not scramble the text lines or insert or remove newlines
unnecessarily, and it inserts minimal LaTeX markup.

2/
Maintains cross-links, i.e. converts <a href=...> to \ref and
<a name=...> to \label.

If the set of HTML files is incomplete, it should proceed on the
assumption that the reference target exists, i.e. it should not delete
the links or modify them or their addresses. One of the tools I tested
is too smart in this respect and actually ruins the result.

3/
Proper conversion of images, tables, etc. There is no math in the
HTML.

4/
Perhaps a guru could even write an Emacs Lisp function that does the
job.

5/
Is there any commercial WYSIWYG tool?

LaTeX etc

* html2latex is a program based on the NCSA html parser. Contact:
(e-mail address removed).
* Another html2latex can combine several HTML files into a single
LaTeX file, converting links between the files to references. External
URLs can be converted into footnotes or into a bibliography sorted on
URL. Contact: (e-mail address removed) (Frans J. Faase)
* Another html2latex implemented on Linux by yacc+lex+C. Also
available from the TSX-11 Linux FTP site as nc-html2latex-0.97.tar.gz.
Contact: (e-mail address removed) (Naoya Tozuka)
* htmlatex.pl is a perl script to do the conversion (may be moving
soon). Contact: (e-mail address removed) (Jake Kesinger)
* There is also a sed script to convert HTML into LaTeX.

vasan999 · Oct 23, 2007

The site says that this will convert HTML to LaTeX. Can anyone explain
this code to me? I am not familiar with such difficult commands,
especially since there is no line-by-line commentary or explanation of
the overall operation.

1i\
\\documentstyle{article}
1i\
\\begin{document}
$a\
\\end{document}
# Too bad there's no way to make sed ignore case!
/<[Xx][Mm][Pp]>/,/<.[Xx][Mm][Pp]>/b lit
/<.[Xx][Mm][Pp]>/b lit
/<[Ll][Ii][Ss][Tt][Ii][Nn][Gg]>/,/<.[Ll][Ii][Ss][Tt][Ii][Nn][Gg]>/b lit
/<.[Ll][Ii][Ss][Tt][Ii][Nn][Gg]>/b lit
/<[Pp][Rr][Ee]>/,/<.[Pp][Rr][Ee]>/b pre
/<.[Pp][Rr][Ee]>/b pre
# Stuff to ignore
s?<[Ii][Ss][Ii][Nn][Dd][Ee][Xx]>??
s?</[Aa][Dd][Dd][Rr][Ee][Ss][Ss]>??g
s?<[Nn][Ee][Xx][Tt][Ii][Dd][^>]*>??g
# character set translations for LaTex special chars
s?&gt.?>?g
s?&lt.?<?g
s?\\?\\backslash ?g
s?{?\\{?g
s?}?\\}?g
s?%?\\%?g
s?\$?\\$?g
s?&?\\&?g
s?#?\\#?g
s?_?\\_?g
s?~?\\~?g
s?\^?\\^?g
# Paragraph borders
s?<[Pp]>?\\par ?g
s?</[Pp]>??g
# Headings
s?<[Tt][Ii][Tt][Ll][Ee]>\([^<]*\)</[Tt][Ii][Tt][Ll][Ee]>?\
\\section*{\1}?g
s?<[Hh]n>?\\part{?g
s?</[Hh]n>?}?g
s?<[Hh]1>?\\section*{?g
s?</[Hh][0-9]>?}?g
s?<[Hh]2>?\\subsection*{?g
s?<[Hh]3>?\\subsubsection*{?g
s?<[Hh]4>?\\subsubsection*{?g
s?<[Hh]5>?\\paragraph{?g
s?<[Hh]6>?\\subparagraph{?g
# UL is itemize
s?<[Uu][Ll]>?\\begin{itemize}?g
s?</[Uu][Ll]>?\\end{itemize}?g
s?<[Ll][Ii]>?\\item ?g
# DL is description
s?<[Dd][Ll]>?\\begin{description}?g
s?</[Dd][Ll]>?\\end{description}?g
# closing delimiter for DT is first < or end of line which ever comes first NO
#s?<[Dd][Tt]>\([^<]*\)<?\\item[\1]<?g
#s?<[Dd][Tt]>\([^<]*\)$?\\item[\1]?g
#s?<[Dd][Dd]>??g
s?<[Dd][Tt]>?\\item[<?g
s?<[Dd][Dd]>?]?g
# Other common SGML markup. this is ad-hoc
s?<sec[ab]>??
s?</sec[ab]>??g
# Italics
s?<it>\([^<]*\)</it>?{\\it \1 }?g
# Get rid of Anchors

re
s?<[Aa][^>]*>??g
s?</[Aa]>??g
# This is a subroutine in sed, in case you are not a sed guru
: lit
s?<[Xx][Mm][Pp]>?\\begin{verbatim}?g
s?</[Xx][Mm][Pp]>?\\end{verbatim}?
s?<[Ll][Ii][Ss][Tt][Ii][Nn][Gg]>?\\begin{verbatim}?g
s?</[Ll][Ii][Ss][Tt][Ii][Nn][Gg]>?\\end{verbatim}?

vasan999 · Oct 23, 2007

Maybe I should post in the European TeX groups as well.

Edd Barrett · Oct 23, 2007

> Maybe I should post in the European TeX groups as well.

Hi,

I don't know if this can be of help:
http://openwetware.org/wiki/User:Austin_J._Che/Extensions/LatexDoc

This is something that we are looking into to allow researchers to
distribute documents in both PDF and web-based (we hope).

Thanks

Edd

metaperl.com · Oct 23, 2007

I like PlasTeX.SF.Net

gnuist006 · Oct 23, 2007

> I like PlasTeX.SF.Net

I think OP wanted html->latex

http://plastex.sourceforge.net/

SAS is currently using plasTeX to generate HTML and DocBook for
10,000+ pages of scientific documentation nightly.

Peter Flynn · Oct 23, 2007

> The site says that this will convert HTML to LaTeX. Can anyone explain
> this code to me? I am not familiar with such difficult commands,
> especially since there is no line-by-line commentary or explanation of
> the overall operation.
>
> 1i\
> \\documentstyle{article}

[snip]

This is a sed(1) script. sed is a stream editor, available on most
platforms.

///Peter
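
To expand on that a little: each `s?pattern?replacement?g` line in the
script is a substitution that uses `?` as the delimiter instead of the
usual `/`, and the bracket pairs such as `[Hh]` are a manual way of
matching either case. The script escapes LaTeX special characters first
and only then turns tags into LaTeX commands, so the backslashes and
braces it inserts are not escaped again. A rough Python sketch of that
idea follows. It is not a port of the script: the rule list is a small
illustrative subset, and the names `RULES`, `escape_specials` and
`convert_line` are made up for this example.

```python
import re

# Characters the sed script escapes with a leading backslash
# (the backslash, ~ and ^ rules are left out of this sketch).
SPECIALS = re.compile(r"[{}%$&#_]")

# A few tag rules, mirroring lines such as  s?<[Hh]1>?\\section*{?g
RULES = [
    (re.compile(r"<p>", re.I), r"\\par "),
    (re.compile(r"</p>", re.I), ""),
    (re.compile(r"<h1>", re.I), r"\\section*{"),
    (re.compile(r"<h2>", re.I), r"\\subsection*{"),
    (re.compile(r"</h[0-9]>", re.I), "}"),
    (re.compile(r"<ul>", re.I), r"\\begin{itemize}"),
    (re.compile(r"</ul>", re.I), r"\\end{itemize}"),
    (re.compile(r"<li>", re.I), r"\\item "),
]


def escape_specials(text):
    """Prefix each LaTeX special character with a backslash."""
    return SPECIALS.sub(lambda m: "\\" + m.group(0), text)


def convert_line(line):
    """Escape specials first, then apply every tag rule in order."""
    line = escape_specials(line)
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line


if __name__ == "__main__":
    print(convert_line("<H1>Costs & benefits</H1>"))
```

Like the sed script, this works line by line and knows nothing about
nesting, so it only copes with simple, regular HTML.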

Peter Flynn · Oct 23, 2007

> Basically, it should do everything the tools listed below do and, in
> addition:

You've already asked this, and been given the answer, but in case you
didn't see it...

XSLT.

Run your HTML through Tidy to produce XHTML.
Then write an XSLT script to transform it to LaTeX.
This gives you 100% control and ensures robustness.

However, handling all the stupid things HTML authors do may make it
long-winded if you want to cope with them all. On the other hand, if
you are dealing with a reasonably consistent subset, it's probably the
most reliable method.

///Peter
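
To make the shape of that approach concrete without assuming an XSLT
processor is installed, here is a sketch that uses Python's standard
xml.etree.ElementTree as a stand-in: it walks well-formed XHTML (such as
Tidy produces) and applies one rule per element, much as XSLT templates
would. This is not XSLT itself. The names `WRAPPERS`, `to_latex` and
`anchor_wrapper` are invented for the example, special characters are
not escaped, `\url` assumes a package such as url or hyperref, and only
a handful of elements are covered. Internal links always become `\ref`,
whether or not the target is present in the input.

```python
import xml.etree.ElementTree as ET

# Element name -> (text emitted before the content, text emitted after it).
WRAPPERS = {
    "h1": ("\\section*{", "}\n"),
    "h2": ("\\subsection*{", "}\n"),
    "p": ("", "\n\n"),
    "em": ("\\emph{", "}"),
    "ul": ("\\begin{itemize}\n", "\\end{itemize}\n"),
    "li": ("\\item ", "\n"),
}


def anchor_wrapper(node):
    """Map <a name=...> to \\label and <a href="#..."> to \\ref."""
    name = node.get("name")
    href = node.get("href", "")
    if name:
        return ("\\label{" + name + "}", "")
    if href.startswith("#"):
        # Keep the reference even if its target is not in this file.
        return ("", "~\\ref{" + href[1:] + "}")
    if href:
        # External link: keep the address as a footnote.
        return ("", "\\footnote{\\url{" + href + "}}")
    return ("", "")


def to_latex(node):
    """Recursively convert one XHTML element and its children."""
    tag = node.tag.rsplit("}", 1)[-1]  # drop an XHTML namespace, if any
    if tag == "a":
        before, after = anchor_wrapper(node)
    else:
        before, after = WRAPPERS.get(tag, ("", ""))
    parts = [before, node.text or ""]
    for child in node:
        parts.append(to_latex(child))
        parts.append(child.tail or "")
    parts.append(after)
    return "".join(parts)


if __name__ == "__main__":
    sample = (
        '<div><h1>Intro</h1>'
        '<p>See <a href="#sec2">below</a>.</p>'
        '<p><a name="sec2"/>Target</p></div>'
    )
    print(to_latex(ET.fromstring(sample)))
```

Elements without a rule are passed through as plain text, which is
where the long-winded part starts once real-world HTML is involved.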

gnuist006 · Oct 24, 2007

> You've already asked this, and been given the answer, but in case you
> didn't see it...
>
> XSLT.
>
> Run your HTML through Tidy to produce XHTML.
> Then write an XSLT script to transform it to LaTeX.
> This gives you 100% control and ensures robustness.
>
> However, handling all the stupid things HTML authors do may make it
> long-winded if you want to cope with them all. On the other hand, if
> you are dealing with a reasonably consistent subset, it's probably the
> most reliable method.
>
> ///Peter

forgot to cc to myself.
Janusz

Victor Ivrii · Oct 24, 2007

> You've already asked this, and been given the answer, but in case you
> didn't see it...
>
> XSLT.
>
> Run your HTML through Tidy to produce XHTML.
> Then write an XSLT script to transform it to LaTeX.
> This gives you 100% control and ensures robustness.
>
> However, handling all the stupid things HTML authors do may make it
> long-winded if you want to cope with them all. On the other hand, if
> you are dealing with a reasonably consistent subset, it's probably the
> most reliable method.

One should remember that although a TeX engine (tex, latex, ...) can run
in quiet mode, that is not the default, so a finished TeX document
normally contains no TeX errors. By contrast, few HTML parsers (web
browsers) even report errors. As a result, the vast majority of HTML
sources contain errors, from a few to a few hundred (the latter is
usually the case with commercial web pages produced by community
college graduates who check their pages only against a specific
version of MSIE). The task of converting such HTML sources to
error-free TeX seems really daunting.

tsy · Oct 24, 2007

(e-mail address removed) wrote:
> Run your HTML through Tidy to produce XHTML.
> Then write an XSLT script to transform it to LaTeX.
> This gives you 100% control and ensures robustness.

Is XSLT way easier than using a decent scripting language with a SAX
library?

Peter Flynn · Oct 27, 2007

> Is XSLT way easier than using a decent scripting language with a SAX
> library?

Yes. XSLT *is* a decent scripting (well, transformation-to-other-formats)
language.

///Peter
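
For comparison, the SAX route asked about above might look roughly like
the following, using Python's standard xml.sax module on input that is
already well-formed XHTML. This is only a sketch: the class name
`LatexHandler`, the helper `sax_to_latex` and the tiny tag tables are
made up for the example, and there is no escaping of special characters
and no handling of attributes.

```python
import xml.sax


class LatexHandler(xml.sax.ContentHandler):
    """Collect LaTeX text while the parser reports XHTML events."""

    # Text emitted when an element opens or closes; unknown tags emit nothing.
    OPEN = {"h1": "\\section*{", "em": "\\emph{", "li": "\\item "}
    CLOSE = {"h1": "}\n", "em": "}", "p": "\n\n", "li": "\n"}

    def __init__(self):
        super().__init__()
        self.out = []

    def startElement(self, name, attrs):
        self.out.append(self.OPEN.get(name, ""))

    def endElement(self, name):
        self.out.append(self.CLOSE.get(name, ""))

    def characters(self, content):
        self.out.append(content)


def sax_to_latex(xhtml):
    """Parse an XHTML string and return the collected LaTeX text."""
    handler = LatexHandler()
    xml.sax.parseString(xhtml.encode("utf-8"), handler)
    return "".join(handler.out)


if __name__ == "__main__":
    print(sax_to_latex("<div><h1>Title</h1><p>Some <em>text</em>.</p></div>"))
```

The handler sees one event at a time, so any rule that depends on
context (for example, what a `li` means inside `ol` versus `ul`) needs
state kept by hand, whereas XSLT templates can match on that context
directly.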
