[news.eclipse.technology.epf] WordHtmForEPF - Preparing Word saved .htm pages for instant EPF import

Hi,

I was annoyed that pasting a .htm document saved from Word into an EPF rich text editor gave rather useless results, so I wrote a small Groovy script that greatly improves the process.

Usage guide:
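Save the Word document as a filtered .htm page. Word creates a folder next to it holding the external resources it uses, such as images, named image001, image002 and so on. Those generic names are a real pain when importing into EPF.

Run WordHtmForEPF.groovy from the directory containing the saved .htm files, optionally passing a file pattern to select which .htm files to process. The script treats the pattern as a regular expression matched against the whole file name, not as a shell wildcard, e.g.: groovy WordHtmForEPF.groovy "Guide.*\.htm"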

For each .htm file matching the file pattern, the script creates a new .htm file with an EPF_ prefix. In these new files the image src attributes are redirected to images in a resources folder. The images from the Word resource folders are copied to that resources folder under more useful names, prefixed with the name of the Word resource folder (which contains the document name).
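
As a rough illustration of the src redirection described above, here is a minimal sketch run on a single made-up line of HTML. The document name "Guide 1" and the sample img tag are assumptions for illustration only, and the sketch only handles spaces in the folder name.

```groovy
// Hypothetical sample line, roughly what Word might emit for a document named "Guide 1"
def html = '<img width=100 src="Guide%201-filer/image001.gif">'

// Capture the resource folder and the file name of each src attribute,
// then point the attribute at resources/<folder>-<file>
def rewritten = html.replaceAll(/src="([^"\/]*)\/([^"]*)"/) { full, folder, file ->
    'src="resources/' + folder.replaceAll(/%20|\s/, '-') + '-' + file + '"'
}

println rewritten
// prints: <img width=100 src="resources/Guide-1-filer-image001.gif">
```

The image itself then has to be copied to resources/Guide-1-filer-image001.gif for the new reference to resolve.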

Now if you open an EPF_xxx.htm file in a browser, you can copy and paste it into an EPF rich text editor (RTE) and the result will be just beautiful. All images are transferred and stored in the EPF resources folder for that type of method element, and each image is named to indicate the method element it belongs to :)

It even handles special characters in file names (only Danish characters as of now...)
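
To give an idea of what the special character handling amounts to, here is a small table-driven sketch that applies the same kind of replacements as the script below. The names replacements and encodeName are made up for this example and do not appear in the script.

```groovy
// Replacement table mirroring the rules in the script: Danish letters,
// the URL-encoded form of 'æ', and 'é'
def replacements = [
    'æ': 'ae', 'Æ': 'ae', '%C3%A6': 'ae',
    'ø': 'oe', 'Ø': 'oe',
    'å': 'aa', 'Å': 'aa',
    'é': 'e',  'É': 'e'
]

// Hypothetical helper: turn spaces (literal or %20) into dashes,
// then apply every entry of the table in turn
def encodeName = { String name ->
    def result = name.replaceAll(/%20|\s/, '-')
    replacements.each { from, to -> result = result.replace(from, to) }
    result
}

println encodeName('Blåbær grød.gif')
// prints: Blaabaer-groed.gif
```

Supporting another language would then mostly be a matter of adding entries to the table.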

---

/**
* @author kristian@xxxxxxxxxx
*
*/
//import java.net commons
public class WordHtmForEPF {

    // Replace spaces and Danish/accented characters so the name is safe to use in a URL
    private static encode(first) {
        def prefix = first.replaceAll(/(%20|\s)/, '-')
        prefix = prefix.replaceAll(/(æ|Æ|%C3%A6)/, 'ae')
        prefix = prefix.replaceAll(/(ø|Ø)/, 'oe')
        prefix = prefix.replaceAll(/(å|Å)/, 'aa')
        prefix = prefix.replaceAll(/(é|É)/, 'e')
        return prefix
    }

    /**
     * @param args optional: [0] regex selecting the .htm files, [1] 'remove' to strip existing prefixes
     */
    public static void main(def args) {
        println "WordHtmEPF v.1.0 - by Kristian Mandrup consulting"
        // Default: every .htm file whose name does not already start with EPF_
        def filePattern = '(?!EPF_).*\\.htm'
        if (args.length > 0) {
            filePattern = args[0]
        }
        def removePrefix = args.length > 1 && args[1] == 'remove'

        def dir = new File(".")
        dir.eachFileMatch(~"${filePattern}") { File f ->
            println "Generating EPF image references for: ${f.name}"
            def str = f.getText()
            // Match one attribute at a time, so several src attributes on one line are handled
            def replaced = str.replaceAll(/src="([^"\/]*)\/([^"]*)"/) { fullMatch, first, second ->
                return 'src="resources/' + encode(first) + '-' + second + '"'
            }
            // Overwrite rather than append, so the script can be re-run safely
            new File('EPF_' + f.name).text = replaced
            // Folder Word created for this document. '-filer' is the suffix this script expects;
            // English-language Word uses '_files', so adjust as needed
            String dirName = f.name[0..-5] + '-filer'
            File fdir = new File(dirName)
            if (fdir.isDirectory()) {
                renameResources(fdir, removePrefix)
            }
            File resDir = new File('resources')
            if (!resDir.exists()) {
                fdir.renameTo(new File('resources'))
            }
        }
    }

    // Copy every image in the given Word resource folder to resources/, adding the folder name as prefix
    private static void renameResources(File directory, removePref) {
        def renameClos = { dir, filePattern, prefix, removePrefix ->
            println "Prefix ${prefix}"
            println "filePattern ${filePattern}"
            dir.eachFileMatch(~"${filePattern}") { File f ->
                String replaceName
                if (f.name.startsWith(prefix)) {
                    // Prefix already present: strip it on request, otherwise keep the name as is
                    replaceName = removePrefix ? f.name[(prefix.length() + 1)..-1] : f.name
                } else {
                    String newFileName = "${prefix}-${f.name}"
                    replaceName = encode(newFileName)
                }

                // Copy all files to the resources dir
                def d1 = new File('resources')
                d1.mkdir()
                File newResourceFile = new File(d1, replaceName)
                println "Copy ${f.name} to resources/${newResourceFile.name}"
                new AntBuilder().copy(file: f.canonicalPath, tofile: newResourceFile.canonicalPath)
            }
        }
        def filePattern = '.*\\.(jpg|JPG|gif|GIF|png|PNG|bmp|BMP)'
        // The directory name is the default prefix (File.name works with any path separator)
        String prefix = directory.canonicalFile.name
        println "Directory name used as default prefix: ${prefix}"
        println "Renaming all pictures (jpg|JPG|gif|GIF|png|PNG|bmp|BMP)"
        renameClos(directory, filePattern, prefix, removePref)
    }

}