Marcel Kessler
Hi there
Does anyone know a good way of converting HTML to plain text, keeping as
much of the formatting as possible?
The HTML will be produced by an editor like FCKEditor, and the
transformation should happen in Java.
So far I've found the following options, none of them really convincing:
# Using w3m or lynx to convert HTML to plain text
(http://www.biglist.com/lists/xsl-list/archives/200406/msg00689.html)
+ neat output
- need to call a native (C) program from Java
# Google gdata routine
(http://www.biglist.com/lists/xsl-list/archives/200406/msg00689.html)
+ Java source available
- only basic stripping; no tables, etc.
# Use XML & XSLT
(http://www-128.ibm.com/developerworks/java/library/x-xmlist1/)
+ good result
- complicated approach; cannot use a WYSIWYG editor like FCKEditor
# Use other tools like docfraq, detagger, NoteTab, etc.
- results no better than with w3m
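For comparison, here is a minimal pure-Java sketch of the "basic stripping" approach that keeps a little more layout: it uses the HTML parser that ships with the JDK (`javax.swing.text.html.parser.ParserDelegator` with an `HTMLEditorKit.ParserCallback`), so no native program or extra library is needed. The class name `HtmlToText`, the `convert` method and the output conventions (block elements on their own line, `* ` for list items, a tab between table cells) are my own assumptions for illustration. Note that this parser is based on the HTML 3.2 DTD, so markup from a modern editor may be parsed loosely; I haven't tested it against real FCKEditor output.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Set;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class HtmlToText {

    // Tags whose content should start and end on its own line in the text output.
    private static final Set<HTML.Tag> BLOCK_TAGS = Set.of(
            HTML.Tag.P, HTML.Tag.DIV, HTML.Tag.BLOCKQUOTE, HTML.Tag.PRE,
            HTML.Tag.H1, HTML.Tag.H2, HTML.Tag.H3, HTML.Tag.H4, HTML.Tag.H5, HTML.Tag.H6,
            HTML.Tag.UL, HTML.Tag.OL, HTML.Tag.TABLE, HTML.Tag.TR);

    /** Converts an HTML fragment to plain text using the JDK's built-in HTML parser. */
    public static String convert(String html) throws IOException {
        StringBuilder out = new StringBuilder();
        HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                // Entities such as &amp; have already been decoded by the parser.
                out.append(data);
            }

            @Override
            public void handleStartTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                if (BLOCK_TAGS.contains(tag)) {
                    newLine(out);
                } else if (tag == HTML.Tag.LI) {
                    newLine(out);
                    out.append("* "); // simple bullet for both <ul> and <ol> items
                } else if ((tag == HTML.Tag.TD || tag == HTML.Tag.TH) && !atLineStart(out)) {
                    out.append('\t'); // separate cells of the same table row
                }
            }

            @Override
            public void handleEndTag(HTML.Tag tag, int pos) {
                if (BLOCK_TAGS.contains(tag) || tag == HTML.Tag.LI) {
                    newLine(out);
                }
            }

            @Override
            public void handleSimpleTag(HTML.Tag tag, MutableAttributeSet attrs, int pos) {
                // Empty elements such as <br> are reported here, not via handleStartTag.
                if (tag == HTML.Tag.BR) {
                    out.append('\n');
                }
            }
        };
        // true = ignore any charset declared in a <meta> tag; the input is already a String.
        new ParserDelegator().parse(new StringReader(html), callback, true);
        return out.toString().trim();
    }

    private static boolean atLineStart(StringBuilder sb) {
        return sb.length() == 0 || sb.charAt(sb.length() - 1) == '\n';
    }

    // Appends a line break unless the output is empty or already ends with one.
    private static void newLine(StringBuilder sb) {
        if (!atLineStart(sb)) {
            sb.append('\n');
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(convert("<p>Hello</p><ul><li>one</li><li>two</li></ul>"));
    }
}
```

The rules live entirely in the callback, so tables or lists could be rendered differently (e.g. padded columns) without touching the parsing code; it is still nowhere near as polished as w3m's table layout.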
Thanks and regards
Marcel