[platform-swt-dev] XOM vs DOM4j; Tagsoup, JTidy, cyberneko
- From: Misha Koshelev <misha680@xxxxxxxxx>
- Date: Wed, 12 May 2010 14:25:00 -0500
I am rewriting the Web Automation Framework I had mentioned earlier (www.mkosh.com) to include XPath support.
Unfortunately, I have run into an issue that is somewhat unfamiliar to me and would appreciate your help/advice.
Specifically, since Internet Explorer largely lacks XPath support, I have used the "trick" of
parsing my HTML document on the fly and querying it with Java-based XPath libraries.
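A minimal sketch of that "trick" using only JDK classes (`javax.xml.transform` and `javax.xml.xpath`). A markup snapshot is fed through a SAX `XMLReader` into a DOM by an identity transform, and an XPath expression is then evaluated against that parsed copy. The class and method names are hypothetical. The JDK reader used here accepts only well-formed XML. TagSoup's `org.ccil.cowan.tagsoup.Parser` implements `XMLReader` and could be passed in its place for real-world HTML, though that swap is not shown. As far as I know, TagSoup puts elements in the XHTML namespace by default, so the XPath expressions would then need a namespace context.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.sax.SAXSource;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class SnapshotXPath {

    // Build a DOM from a markup snapshot using any SAX XMLReader.
    // An identity Transformer copies the SAX events into a new Document.
    static Document parse(String markup, XMLReader reader) throws Exception {
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        DOMResult result = new DOMResult(); // no node given, so a Document is created
        identity.transform(new SAXSource(reader, new InputSource(new StringReader(markup))), result);
        return (Document) result.getNode();
    }

    // The JDK's own namespace-aware reader (XML only; assumption: stands in for an HTML-tolerant one).
    static XMLReader jdkReader() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        return factory.newSAXParser().getXMLReader();
    }

    // Evaluate an XPath expression against the parsed copy and return the matching nodes.
    static NodeList query(Document doc, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
    }

    public static void main(String[] args) throws Exception {
        String snapshot = "<html><body><a href='/one'>One</a><p><a href='/two'>Two</a></p></body></html>";
        Document doc = parse(snapshot, jdkReader());
        NodeList hrefs = query(doc, "//a/@href");
        for (int i = 0; i < hrefs.getLength(); i++) {
            System.out.println(hrefs.item(i).getNodeValue()); // prints /one, then /two
        }
    }
}
```

Keeping the reader as a parameter means the HTML parser can be changed without touching the XPath side.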
Unfortunately, Internet Explorer also does not provide hooks for DOM modification events (MutationEvents), and thus, the
I started out using Tagsoup+DOM, but am now trying some of the alternatives listed in the subject line.
Speed is important, but so is size.
It seems that Tagsoup+XOM and Cyberneko+DOM4j are the main alternatives: Tagsoup+XOM is slightly slower, whereas Cyberneko (NekoHTML)+DOM4j is slightly faster but much bulkier, since it requires the Xerces library.
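To put numbers on "slightly slower" versus "slightly faster" for one's own pages, a small timing harness could help. This is a minimal sketch under stated assumptions: each parse+query pipeline is wrapped as a hypothetical `String -> int` function (returning, say, the number of matched nodes). Only a JDK DOM + XPath baseline is registered here; the TagSoup+XOM and NekoHTML+DOM4j slots are left for the reader to fill in, and no results for them are implied. It measures time only, not jar size.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.ToIntFunction;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class PipelineTiming {

    // Run one parse+query pipeline repeatedly; return the mean time per run in microseconds.
    static double meanMicros(ToIntFunction<String> pipeline, String snapshot, int warmup, int runs) {
        int sink = 0;
        for (int i = 0; i < warmup; i++) sink += pipeline.applyAsInt(snapshot); // let the JIT compile hot paths
        long start = System.nanoTime();
        for (int i = 0; i < runs; i++) sink += pipeline.applyAsInt(snapshot);
        long elapsed = System.nanoTime() - start;
        if (sink < 0) throw new IllegalStateException("unexpected negative count"); // keeps results observable
        return elapsed / 1_000.0 / runs;
    }

    // Baseline pipeline using only the JDK: parse well-formed markup, then count //a elements via XPath.
    static int jdkDomLinkCount(String markup) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            NodeList links = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//a", doc, XPathConstants.NODESET);
            return links.getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        // Synthetic test page with 500 links.
        StringBuilder page = new StringBuilder("<html><body>");
        for (int i = 0; i < 500; i++) page.append("<p><a href='/p").append(i).append("'>link</a></p>");
        page.append("</body></html>");

        // Hypothetical slots: wrap TagSoup+XOM and NekoHTML+DOM4j in the same
        // String -> int shape and register them next to the JDK baseline.
        Map<String, ToIntFunction<String>> pipelines = new LinkedHashMap<>();
        pipelines.put("JDK DOM + XPath (baseline)", PipelineTiming::jdkDomLinkCount);

        for (Map.Entry<String, ToIntFunction<String>> entry : pipelines.entrySet()) {
            double micros = meanMicros(entry.getValue(), page.toString(), 20, 100);
            System.out.printf("%s: %.1f us per parse+query%n", entry.getKey(), micros);
        }
    }
}
```

Running every candidate against the same saved page snapshots keeps the comparison fair; jar size can then be weighed separately against the measured difference.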
Any ideas/advice? Do you think it is worth including both?