[platform-swt-dev] XOM vs DOM4j; Tagsoup, JTidy, cyberneko

Dear All:

I am rewriting the Web Automation Framework I mentioned earlier (www.mkosh.com) to add XPath support.

Unfortunately, I have run into an issue that I am not very familiar with, and I would appreciate your help and advice.

Because Internet Explorer largely lacks XPath support, I use a "trick": I parse the HTML document on the fly and evaluate the expressions with Java-based XPath libraries.
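A minimal sketch of the Java side of this approach, assuming the HTML has already been converted into well-formed markup with no default namespace (the class and method names are made up for illustration). It uses only the JDK's javax.xml.xpath API to evaluate an expression against a W3C DOM:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathOverDom {
    // Parses markup that is assumed to be well-formed already (in the real
    // framework an HTML parser such as Tagsoup would produce it first).
    static Document parse(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("markup is not well-formed", e);
        }
    }

    // Evaluates an XPath expression and returns the number of matching nodes.
    static int count(Document doc, String expr) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(expr, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalArgumentException("bad XPath expression: " + expr, e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<html><body><a href='/x'>x</a><p/><a href='/y'>y</a></body></html>");
        System.out.println(count(doc, "//a[@href]")); // prints 2
    }
}
```

Pages served with the XHTML namespace would additionally need a NamespaceContext (or namespace-agnostic expressions), which this sketch does not handle.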

Internet Explorer also lacks hooks for DOM modification events (MutationEvents), so my parsed copy of the DOM has to be rebuilt after _each_ execution of JavaScript that could potentially change the page's HTML content.
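One way to soften the cost of rebuilding, sketched with a hypothetical DomSnapshot class: keep the last markup string and re-parse only when the markup fetched after a script actually differs. How the current markup is obtained from the browser is left out, and the JDK XML parser stands in for the real HTML parser:

```java
import java.io.StringReader;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper: caches the last parsed DOM and rebuilds it only when
// the page markup has changed since the previous script execution.
public class DomSnapshot {
    private final Function<String, Document> parser;
    private String lastMarkup;
    private Document lastDoc;
    private int parseCount;

    public DomSnapshot(Function<String, Document> parser) {
        this.parser = parser;
    }

    // Call after every script execution with the page's current markup.
    public Document refresh(String currentMarkup) {
        if (lastDoc == null || !currentMarkup.equals(lastMarkup)) {
            lastDoc = parser.apply(currentMarkup);
            lastMarkup = currentMarkup;
            parseCount++;
        }
        return lastDoc;
    }

    public int parseCount() {
        return parseCount;
    }

    // Stand-in parser for well-formed markup, used only in this example.
    static Document parseXml(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException("markup is not well-formed", e);
        }
    }

    public static void main(String[] args) {
        DomSnapshot snap = new DomSnapshot(DomSnapshot::parseXml);
        snap.refresh("<html><body/></html>");
        snap.refresh("<html><body/></html>");           // unchanged: no re-parse
        snap.refresh("<html><body><p/></body></html>"); // changed: re-parse
        System.out.println(snap.parseCount());          // prints 2
    }
}
```

The string comparison is still linear in the page size, but it is typically far cheaper than a full re-parse.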

I started out using Tagsoup+DOM, but I am now trying some of the other combinations.
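For reference, a Tagsoup+DOM setup can be wired through the standard identity-transform pattern: any SAX XMLReader feeds a Transformer whose DOMResult collects the tree. Tagsoup's org.ccil.cowan.tagsoup.Parser implements XMLReader, so it could be passed in; this sketch uses the JDK's own (namespace-aware) XML reader so that it runs without extra jars, and the class name is illustrative:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class SaxToDom {
    // Builds a W3C DOM from whatever SAX XMLReader is supplied. Passing a
    // Tagsoup parser here would let it accept non-well-formed HTML.
    static Document buildDom(XMLReader reader, String markup) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            DOMResult result = new DOMResult(); // no node set: a new Document is created
            identity.transform(new SAXSource(reader, new InputSource(new StringReader(markup))), result);
            return (Document) result.getNode();
        } catch (Exception e) {
            throw new IllegalStateException("could not build DOM", e);
        }
    }

    // The JDK's XML reader, standing in for Tagsoup in this example.
    static XMLReader jdkReader() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newSAXParser().getXMLReader();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = buildDom(jdkReader(), "<html><body><p>hi</p></body></html>");
        System.out.println(doc.getDocumentElement().getNodeName()); // prints html
    }
}
```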

Speed is important, but so is the size of the libraries I have to bundle.

It seems the main candidates are Tagsoup+XOM and Cyberneko (NekoHTML)+DOM4j: Tagsoup+XOM is slightly slower, whereas NekoHTML+DOM4j is slightly faster but much bulkier, because it requires the Xerces library.
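To put numbers on "slightly slower" and "slightly faster", a rough timing harness could run each stack over the same page. This is a hypothetical sketch: only the JDK DOM parser is wired in, and the Tagsoup+XOM and NekoHTML+DOM4j entries are left as comments to be filled in with the real parser calls:

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical micro-benchmark for comparing parser stacks on one page.
public class ParserBench {
    // Average wall-clock time in microseconds per parse over `runs` timed
    // runs, after `warmup` untimed runs to let the JIT settle.
    static double averageMicros(Function<String, ?> parser, String markup, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) parser.apply(markup);
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) parser.apply(markup);
        return (System.nanoTime() - start) / 1000.0 / runs;
    }

    // Baseline candidate: the JDK's DOM parser (well-formed input only).
    static Document jdkDom(String markup) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder page = new StringBuilder("<html><body>");
        for (int i = 0; i < 500; i++) {
            page.append("<div id='d").append(i).append("'><a href='#'>x</a></div>");
        }
        page.append("</body></html>");

        Map<String, Function<String, ?>> candidates = new LinkedHashMap<>();
        candidates.put("JDK DOM", ParserBench::jdkDom);
        // candidates.put("Tagsoup+XOM", ...);    // fill in with the real calls
        // candidates.put("NekoHTML+DOM4j", ...); // fill in with the real calls
        for (Map.Entry<String, Function<String, ?>> e : candidates.entrySet()) {
            double us = averageMicros(e.getValue(), page.toString(), 20, 100);
            System.out.printf("%s: %.1f us/parse%n", e.getKey(), us);
        }
    }
}
```

Micro-benchmarks like this are sensitive to JIT warm-up and garbage collection, so the warm-up runs and repeated measurements matter more than any single number.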

Any ideas or advice? Would it make sense to include both?

Thank you