[platform-swt-dev] XOM vs DOM4j; Tagsoup, JTidy, cyberneko
- From: Misha Koshelev <misha680@xxxxxxxxx>
- Date: Wed, 12 May 2010 14:25:00 -0500
Dear All:
I am rewriting the Web Automation Framework I had mentioned earlier (www.mkosh.com) to include XPath support.
Unfortunately, I have run into an issue that is somewhat unfamiliar to me, and I would appreciate your help/advice.
Specifically, as XPath support is largely unavailable in Internet Explorer, I use the "trick" of
parsing my HTML document on the fly and running Java-based XPath libraries against the result.
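To make the "trick" concrete, here is a minimal sketch using only the JDK's built-in DOM and XPath classes. It assumes the markup is already well-formed; in practice a tolerant parser such as Tagsoup would sit in front of this step to clean up real-world HTML. The class name XPathOverHtml and its helper methods are hypothetical, chosen only for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: builds a DOM tree from markup and queries it with XPath.
public class XPathOverHtml {

    // Parses markup into a W3C DOM tree. Assumption: the input is well-formed;
    // a tolerant HTML parser would be needed for arbitrary pages.
    static Document parse(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse markup", e);
        }
    }

    // Returns how many nodes in the tree match the XPath expression.
    static int count(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes =
                (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate " + expression, e);
        }
    }

    public static void main(String[] args) {
        String page = "<html><body><a href='a.html'>A</a>"
                    + "<p><a href='b.html'>B</a></p></body></html>";
        Document doc = parse(page);
        // Count every anchor element that carries an href attribute.
        System.out.println(count(doc, "//a[@href]"));
    }
}
```

The same two steps (build a tree, evaluate an expression) apply with XOM or DOM4j; only the tree type and the XPath entry point differ.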
Unfortunately, Internet Explorer also lacks hooks for DOM modification events (MutationEvents), and thus the
DOM structure will have to be reloaded after _each_ execution of JavaScript that can potentially change the HTML content of the page.
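The reload rule can be sketched as a small cache that is marked stale after every script execution and re-parsed lazily on the next query. This is only an illustration under stated assumptions: the class StaleAwareCache, its methods, and the idea of passing the HTML source and the parser as functions are all hypothetical and not part of the framework.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical cache: holds a parsed tree and rebuilds it only when marked stale.
public class StaleAwareCache<T> {
    private final Supplier<String> htmlSource;   // fetches the current HTML from the browser
    private final Function<String, T> parser;    // turns HTML into a tree (DOM, XOM, DOM4j, ...)
    private T cached;
    private boolean stale = true;                // nothing parsed yet
    private int parseCount = 0;

    public StaleAwareCache(Supplier<String> htmlSource, Function<String, T> parser) {
        this.htmlSource = htmlSource;
        this.parser = parser;
    }

    // Call after each script execution that may have changed the page,
    // since no mutation events are available to signal the change.
    public void markStale() {
        stale = true;
    }

    // Returns the parsed tree, re-parsing only if the cache is stale.
    public T get() {
        if (stale) {
            cached = parser.apply(htmlSource.get());
            stale = false;
            parseCount++;
        }
        return cached;
    }

    // Number of times the parser has actually run.
    public int parseCount() {
        return parseCount;
    }
}
```

With this shape, several XPath queries between two script executions share one parse, so parser speed matters once per script run and not once per query.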
Specifically, I started out using Tagsoup+DOM, but am now trying some of these other alternatives.
Speed is important, but so is size.
It seems Tagsoup+XOM and Cyberneko+DOM4j are the alternatives: Tagsoup+XOM is slightly slower, whereas Cyberneko (NekoHTML)+DOM4j is slightly faster but much bulkier (as it must include the Xerces library).
Any ideas/advice? Do you think it would be a good idea to include both?
Thank you
Misha