I have a small Eclipse Java project which includes the XML export repair
and demonstrates how the repaired XML can be transformed through a series
of steps into a suitable XSL-FO format which can be converted to PDF,
PostScript or some other document format (RTF, DOCX, ODT, WML) using either
Apache FOP or XMLmind FC (XFC). How do I share this code? Do I have to be a
committer on the project?
Alternatively, give me an email address or some other location I can
post/send it to...
Would be cool to see someone take my efforts a step further!
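
For anyone who wants a feel for the step-by-step transformation before I get the project posted, here is a minimal sketch using only the standard JAXP API. It is not the code from my project: the class name XsltChain and the method applyChain are made up for illustration, the stylesheets are passed in as strings to keep the example self-contained, and the final rendering of the XSL-FO through Apache FOP or XFC is left out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.List;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of the project described above.
public class XsltChain {

    /**
     * Applies each stylesheet in order, feeding the output of one
     * step into the next, and returns the output of the last step.
     */
    public static String applyChain(String xml, List<String> stylesheets)
            throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        String current = xml;
        for (String xslt : stylesheets) {
            // Compile this step's stylesheet.
            Transformer step = factory.newTransformer(
                    new StreamSource(new StringReader(xslt)));
            // Run it on the result of the previous step.
            StringWriter out = new StringWriter();
            step.transform(new StreamSource(new StringReader(current)),
                    new StreamResult(out));
            current = out.toString();
        }
        return current;
    }
}
```

In a real setup the repaired XML export would be the input, the last stylesheet would emit XSL-FO, and that result would then be handed to the FO processor. Keeping each step as its own stylesheet makes it easy to inspect the intermediate XML when something goes wrong.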
I have previously been looking into OAW and their EMF toolkit.
I think you could leverage the OAW framework to automate importing
documentation written in Microsoft Word. I imagine you could set up a
folder structure with the Word documents saved as (filtered) htm,
including their resources (pictures) and a small metadata file (xml) for
each folder.
Then traverse the whole folder structure and generate the appropriate XMI
model, upload the images into the /ressources folder of the plugin and
reconfigure the internal img hrefs to point to the images in the
/ressources folder. I also noticed that the Express Word to HTML
converter ignores a lot of special characters if they have not been
properly escaped (such as the Danish characters æ, ø, å). All the .htm files
should therefore be properly HTML-encoded before they are converted.
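
To make the last two steps a bit more concrete, here is a rough sketch of how the encoding and the image path rewriting could look in plain Java. Everything in it is an assumption on my part: the class HtmlPrep and its method names are invented, I assume the images are referenced through a double-quoted src attribute on img tags, and a regular expression stands in for a real HTML parser, which would be more robust.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper; names and approach are illustrative only.
public class HtmlPrep {

    /**
     * Replaces every non-ASCII character with a numeric character
     * reference, e.g. U+00E6 becomes &#230;.
     */
    public static String encodeNonAscii(String html) {
        StringBuilder sb = new StringBuilder(html.length());
        html.codePoints().forEach(cp -> {
            if (cp < 128) {
                sb.appendCodePoint(cp);
            } else {
                sb.append("&#").append(cp).append(';');
            }
        });
        return sb.toString();
    }

    // Assumption: image paths sit in a double-quoted src attribute.
    private static final Pattern IMG_SRC = Pattern.compile(
            "(<img\\b[^>]*?\\bsrc\\s*=\\s*\")([^\"]*)(\")",
            Pattern.CASE_INSENSITIVE);

    /**
     * Points every img src at targetFolder, keeping only the file name
     * of the original path.
     */
    public static String rewriteImgSrc(String html, String targetFolder) {
        Matcher m = IMG_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String src = m.group(2);
            int slash = Math.max(src.lastIndexOf('/'), src.lastIndexOf('\\'));
            String fileName = src.substring(slash + 1);
            String replacement = m.group(1) + targetFolder + "/" + fileName + m.group(3);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Calling HtmlPrep.encodeNonAscii on each .htm file first, and then HtmlPrep.rewriteImgSrc(html, "ressources"), leaves a pure ASCII file whose images all point into one folder. Numeric references are used instead of named entities so that no entity table is needed.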