[epf-dev] WordHtmForEPF - Preparing Word saved .htm pages for instant EPF import


Hi,

I was annoyed that I couldn't paste a .htm document saved from Word into an EPF rich text editor without getting rather useless results, so I wrote a small Groovy script that greatly improves the process.

Usage guide:
Save the Word document as a filtered .htm page. Word creates a folder holding the external resources the page uses, such as images, numbered image001, image002 and so on. These generic names are a real nuisance when importing into EPF.

Run WordHtmForEPF.groovy in this directory, optionally with a file pattern argument selecting which .htm files to process. The script treats the pattern as a regular expression, not a shell glob, e.g.: groovy WordHtmForEPF.groovy "Guide.*\.htm"
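
Because the script passes the pattern to eachFileMatch as a regular expression, a shell-style glob may not select the files you expect. A small check of how a pattern is compared against whole file names (the file names are made up for illustration):

```groovy
// Hypothetical file names, only for illustration
def names = ['Guide1.htm', 'Guide-intro.htm', 'EPF_Guide1.htm', 'Notes.htm']

// eachFileMatch compares the pattern against the whole file name,
// which is also what the ==~ operator does
def regex = ~/Guide.*\.htm/
def selected = names.findAll { it ==~ regex }
println selected   // [Guide1.htm, Guide-intro.htm]

// Read as a regex, 'Guide*.htm' means: "Guid", any number of 'e',
// exactly one arbitrary character, then "htm"
def globLike = ~/Guide*.htm/
def globSelected = names.findAll { it ==~ globLike }
println globSelected   // []
```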

The script creates a new .htm file with an EPF_ prefix for each .htm file matching the file pattern. In these new files the image src attributes are redirected to images in a /resources folder.
The images from the resource folders Word created are renamed more usefully (prefixed with the document name) and copied to the new /resources folder.
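
As a rough sketch of the redirection described above, here is how one image reference could be rewritten. It assumes Word writes references as src="<folder>/<file>" and uses a hypothetical document called Guide. In the script the prefix is the name of the resource folder, so the image ends up as resources/Guide-filer-image001.jpg:

```groovy
// One image tag as Word might write it (hypothetical document name "Guide")
def html = '<img width=100 height=50 src="Guide-filer/image001.jpg">'

// Capture the folder and the file name, then point the reference at
// resources/<folder>-<file>
def rewritten = html.replaceAll(/src="([^"\/]+)\/([^"]+)"/) { full, folder, file ->
    'src="resources/' + folder + '-' + file + '"'
}
println rewritten
// <img width=100 height=50 src="resources/Guide-filer-image001.jpg">
```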

Now if you open an EPF_xxx.htm file in a browser, you can copy and paste it into an EPF RTE and the result will be just beautiful. All images are transferred and stored in the EPF /resources folder for the type of method element, and each image is named to indicate the method element it is part of :)

It even handles special characters (only Danish characters as of now...)
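
To give an idea of what that handling amounts to, here is a minimal sketch of the same substitutions as a table. The names replacements and encodeName are only illustrative and do not appear in the script, which uses a method called encode:

```groovy
// Substitutions in the order the script applies them:
// spaces (plain or URL-encoded) first, then the special characters
def replacements = [
    [/%20|\s/, '-'],
    [/æ|Æ|%C3%A6/, 'ae'],
    [/ø|Ø/, 'oe'],
    [/å|Å/, 'aa'],
    [/é|É/, 'e'],
]

// Apply every substitution in turn to the given name
def encodeName = { String name ->
    replacements.inject(name) { acc, pair -> acc.replaceAll(pair[0], pair[1]) }
}

println encodeName('Smørrebrød guide-filer')   // Smoerrebroed-guide-filer
println encodeName('Blåbær%20café')            // Blaabaer-cafe
```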

/**
 * @author kristian@xxxxxxxxxx
 *
 */
 //import java.net commons
public class WordHtmForEPF {

        // Make a name safe for use in an image reference: spaces (plain or
        // URL-encoded) become '-', Danish and accented letters become ASCII
        private static encode(first) {
                def prefix = first.replaceAll(/(%20|\s)/, '-')
                prefix = prefix.replaceAll(/(æ|Æ|%C3%A6)/, 'ae')
                prefix = prefix.replaceAll(/(ø|Ø)/, 'oe')
                prefix = prefix.replaceAll(/(å|Å)/, 'aa')
                prefix = prefix.replaceAll(/(é|É)/, 'e')
                return prefix
        }
       
        /**
         * @param args
         */
        public static void main(def args){
                println "WordHtmEPF v.1.0 - by Kristian Mandrup consulting"
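                // default: every .htm file that does not already start with EPF_
                // ("[^EPF_]" would only exclude names starting with E, P, F or _)
                def filePattern = /(?!EPF_).*\.htm/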
                if (args.length > 0) {
                        filePattern = args[0]
                }
                def removePrefix = false
                if (args.length > 1) {
                        if (args[1] == 'remove')
                                removePrefix = true                        
                }
               
                def dir = new File(".")
                dir.eachFileMatch(~"${filePattern}") {File f ->
                        println "Generating EPF image references for: ${f.name}"
                        def str = f.getText()
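                        // NOTE: the pattern and the 'resources/' target are reconstructed; the
                        // list archive stripped the original src values. This assumes image
                        // references of the form src="<folder>/<image>"> in the Word output.
                        def replaced = str.replaceAll(/src="([^"\/]+)\/([^"]+)">/) {fullMatch, first, second ->
                                        return 'src="resources/' + encode(first) + '-' + second + '">'
                                }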
                        // println replaced
                        def f2 = new File('EPF_' + f.name)
                        // overwrite rather than append, so a second run does not duplicate the page
                        f2.text = replaced
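                        // Word's resource folder: the file name minus '.htm' plus the '-filer' suffix
                        // (English-language versions of Word use '_files' instead)
                        String dirName = f.name[0..-5] + '-filer'
                        File fdir = new File(dirName)
                        // pages without images have no resource folder
                        if (fdir.isDirectory()) {
                                renameResources(fdir, removePrefix)
                        }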
                        File resDir = new File('resources')
                        if (!resDir.exists()) {
                                // println "Renaming  ${resDir.name} to resources "
                                fdir.renameTo(new File('resources'))
                        }
                }
        }
       
        private static void renameResources(File directory, removePref) {        
                def renameClos = { dir, filePattern, prefix, removePrefix ->
                        println "Prefix ${prefix}"
                        println "filePattern ${filePattern}"
                dir.eachFileMatch(~"${filePattern}") {File f ->
                        // is prefix already present in start of name!?
                        String newFileName = "${prefix}-${f.name}"
                        def replaceName = encode(newFileName)                        
                        if (!f.name.startsWith(prefix)) {                                                                                    
                            if (removePrefix) {
                                    int index = prefix.length()+1
                                    replaceName = f.name[index..-1]
                            }
                        } else {
                                // don't add prefix if present already!
                                replaceName = f.name
                        }
                       
                                // copy all files to resources dir
                                String newDirectoryName = 'resources/'
                                def d1 = new File(newDirectoryName)
                                d1.mkdir()
                                File newResourceFile = new File(newDirectoryName + '/' + replaceName)
                                // copy file
                                println "Copy ${f.name} to resources/${newResourceFile.name} "                                
                                new AntBuilder().copy(file: "${f.canonicalPath}",
                         tofile:"${newResourceFile.canonicalPath}")
                }
                }                
                // escape the dot so only real file extensions match
                def filePattern = /.*\.(jpg|JPG|gif|GIF|png|PNG|bmp|BMP)/
                // the folder name is the prefix; File.name works with both '\' and '/' separators
                String prefix = directory.canonicalFile.name
                println "Directory name used as default prefix: ${prefix}"                        
                println "Renaming all pictures (jpg|JPG|gif|GIF|png|PNG|bmp|BMP)"
                renameClos( directory, filePattern, prefix, removePref )
        }        
       
}