If a documentation plugin has content stored in an HTML file whose DOCTYPE is XHTML 1.0 Transitional, named entities such as &nbsp; or &rarr; will either cause a parsing error or simply fail to display. When an error occurs, it looks like this:

org.xml.sax.SAXParseException: Reference to undefined entity "&rarr;"

I have seen this behavior on Windows XP and Linux (Red Hat) systems. I have also seen the page in question rendered properly *except* for the named entities; this happens on a Windows XP system with IE 7 installed.

If the DOCTYPE is set to HTML 4.01 Transitional instead, the file displays properly in all cases. Note that the named entity references display properly in Eclipse 3.2.

Although there is a workaround (using the decimal entity numbers rather than the entity names), I've marked this as major because files that rendered properly in 3.2 will now cause errors or display incorrectly.

I have a simple doc plugin that demonstrates the problem, but I'm not sure how to attach it to this bug report. I'm happy to supply it if someone contacts me.
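For anyone applying the workaround to many files, the substitution of entity numbers for entity names can be done mechanically. What follows is only a minimal sketch under stated assumptions: the class name EntityRewriter and its three-entry table are hypothetical and are not part of Eclipse, and a real conversion would need the full HTML 4 entity list. Names that are not in the table, including XML's predefined ones such as &amp;, are left untouched.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EntityRewriter {
    // Illustrative subset of the HTML 4 named entities and their code points.
    // (Assumption: extend this table to cover the entities your docs use.)
    private static final Map<String, Integer> CODE_POINTS = Map.of(
            "nbsp", 160,
            "copy", 169,
            "rarr", 8594);

    // Matches a named entity reference such as &nbsp; (numeric ones start with '#').
    private static final Pattern NAMED = Pattern.compile("&([A-Za-z][A-Za-z0-9]*);");

    /** Replaces known named entities with decimal character references. */
    static String toNumeric(String xhtml) {
        Matcher m = NAMED.matcher(xhtml);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            Integer codePoint = CODE_POINTS.get(m.group(1));
            // Unknown names (and predefined ones like &amp;) are kept as written.
            String replacement = (codePoint == null) ? m.group() : "&#" + codePoint + ";";
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toNumeric("<p>Next&nbsp;&rarr;&nbsp;step &amp; more</p>"));
    }
}
```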
In the initial description I inserted "&amp;" for the ampersand character, thinking it might be intercepted by the Bugzilla web interface. It looks like it's not. The entity names in question are really "&nbsp;" and "&rarr;" and the like. Sorry for the confusion.
Can I go ahead and close this bug?
Or is the bug still valid but the description has changed?
My comment #1 had only to do with the formatting of the text of the initial bug report. None of the particulars of the bug are changed. Thanks...
If you can attach the plugin, that would be very helpful. File/Export/Plug-in Development/Deployable Plugins and Fragments is one way to do this; the default settings will create a single jar file which can be attached.
Created attachment 68836: Example doc plugin to demonstrate html entity problem

To see the behavior described in this bug, put the attached html_entities.jar file in the plugins directory and launch the help system. The top-level TOC entry "HTML Entities Test Plugin" contains example HTML and XHTML. With this plugin, both the HTML and XHTML pages will display (even though the XHTML page displays incorrectly) on my WinXP SP2 machine with IE7 installed. (The behavior is the same if I use Firefox as the help system browser.) On some other WinXP machines and on Linux (RedHat Enterprise 3) machines, I see the SAX parser errors described above.
I can see the bad rendering (using IE6) but not the parse error - can you paste a complete stack trace in?
Here's the stack trace from a Windows XP SP2 box:

An error occured while processing the requested document:
org.xml.sax.SAXParseException: Reference to undefined entity "&rarr;".
    at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3376)
    at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3370)
    at org.apache.crimson.parser.Parser2.expandEntityInContent(Parser2.java:2697)
    at org.apache.crimson.parser.Parser2.maybeReferenceInContent(Parser2.java:2606)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:2017)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1691)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1963)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1691)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1963)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1691)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1963)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1691)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1963)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1691)
    at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:667)
    at org.apache.crimson.parser.Parser2.parse(Parser2.java:337)
    at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:448)
    at org.apache.crimson.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:185)
    at org.eclipse.help.internal.dynamic.DocumentReader.read(DocumentReader.java:56)
    at org.eclipse.help.internal.dynamic.XMLProcessor.process(XMLProcessor.java:49)
    at org.eclipse.help.internal.xhtml.DynamicXHTMLProcessor.process(DynamicXHTMLProcessor.java:66)
    at org.eclipse.help.internal.webapp.servlet.DynamicXHTMLFilter$1.close(DynamicXHTMLFilter.java:79)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
    at org.eclipse.help.internal.webapp.servlet.FilterHTMLHeadAndBodyOutputStream.close(FilterHTMLHeadAndBodyOutputStream.java:290)
    at org.eclipse.help.internal.webapp.servlet.EclipseConnector.transfer(EclipseConnector.java:136)
    at org.eclipse.help.internal.webapp.servlet.ContentServlet.doGet(ContentServlet.java:42)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:596)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at org.eclipse.equinox.http.registry.internal.ServletManager$ServletWrapper.service(ServletManager.java:177)
    at org.eclipse.equinox.http.servlet.internal.ServletRegistration.handleRequest(ServletRegistration.java:91)
    at org.eclipse.equinox.http.servlet.internal.ProxyServlet.processAlias(ProxyServlet.java:110)
    at org.eclipse.equinox.http.servlet.internal.ProxyServlet.service(ProxyServlet.java:68)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at org.eclipse.equinox.http.jetty.internal.HttpServerManager$InternalHttpServiceServlet.service(HttpServerManager.java:277)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:428)
    at org.mortbay.jetty.servlet.ServletHandler.dispatch(ServletHandler.java:677)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:568)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1530)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1482)
    at org.mortbay.http.HttpServer.service(HttpServer.java:909)
    at org.mortbay.http.HttpConnection.service(HttpConnection.java:820)
    at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:986)
    at org.mortbay.http.HttpConnection.handle(HttpConnection.java:837)
    at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:245)
    at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
    at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)

Here's the trace from a Linux (RedHat 3 Enterprise) box:

An error occured while processing the requested document:
org.xml.sax.SAXParseException: Reference to undefined entity "&rarr;".
    at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3339)
    at org.apache.crimson.parser.Parser2.fatal(Parser2.java:3333)
    at org.apache.crimson.parser.Parser2.expandEntityInContent(Parser2.java:2660)
    at org.apache.crimson.parser.Parser2.maybeReferenceInContent(Parser2.java:2569)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1980)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1654)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1926)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1654)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1926)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1654)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1926)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1654)
    at org.apache.crimson.parser.Parser2.content(Parser2.java:1926)
    at org.apache.crimson.parser.Parser2.maybeElement(Parser2.java:1654)
    at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:634)
    at org.apache.crimson.parser.Parser2.parse(Parser2.java:333)
    at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:448)
    at org.apache.crimson.jaxp.DocumentBuilderImpl.parse(DocumentBuilderImpl.java:185)
    at org.eclipse.help.internal.dynamic.DocumentReader.read(DocumentReader.java:56)
    at org.eclipse.help.internal.dynamic.XMLProcessor.process(XMLProcessor.java:49)
    at org.eclipse.help.internal.xhtml.DynamicXHTMLProcessor.process(DynamicXHTMLProcessor.java:66)
    at org.eclipse.help.internal.webapp.servlet.DynamicXHTMLFilter$1.close(DynamicXHTMLFilter.java:79)
    at java.io.FilterOutputStream.close(FilterOutputStream.java:143)
    at org.eclipse.help.internal.webapp.servlet.FilterHTMLHeadAndBodyOutputStream.close(FilterHTMLHeadAndBodyOutputStream.java:290)
    at org.eclipse.help.internal.webapp.servlet.EclipseConnector.transfer(EclipseConnector.java:136)
    at org.eclipse.help.internal.webapp.servlet.ContentServlet.doGet(ContentServlet.java:42)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:596)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at org.eclipse.equinox.http.registry.internal.ServletManager$ServletWrapper.service(ServletManager.java:177)
    at org.eclipse.equinox.http.servlet.internal.ServletRegistration.handleRequest(ServletRegistration.java:91)
    at org.eclipse.equinox.http.servlet.internal.ProxyServlet.processAlias(ProxyServlet.java:110)
    at org.eclipse.equinox.http.servlet.internal.ProxyServlet.service(ProxyServlet.java:68)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:689)
    at org.eclipse.equinox.http.jetty.internal.HttpServerManager$InternalHttpServiceServlet.service(HttpServerManager.java:277)
    at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:428)
    at org.mortbay.jetty.servlet.ServletHandler.dispatch(ServletHandler.java:677)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:568)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1530)
    at org.mortbay.http.HttpContext.handle(HttpContext.java:1482)
    at org.mortbay.http.HttpServer.service(HttpServer.java:909)
    at org.mortbay.http.HttpConnection.service(HttpConnection.java:820)
    at org.mortbay.http.HttpConnection.handleNext(HttpConnection.java:986)
    at org.mortbay.http.HttpConnection.handle(HttpConnection.java:837)
    at org.mortbay.http.SocketListener.handleConnection(SocketListener.java:245)
    at org.mortbay.util.ThreadedServer.handle(ThreadedServer.java:357)
    at org.mortbay.util.ThreadPool$PoolThread.run(ThreadPool.java:534)
Hi Adam, can you take a look into this?
I've reproduced both the org.xml.sax.SAXParseException as described and the behaviour that Chris was seeing, which is that no exception is thrown but the named entities do not render correctly. The SAXParseException can only be seen with the Apache Crimson parser, which is no longer used by more recent Sun JREs (it is used by Sun JRE v1.4.2). With all newer parsers, the named entities are still not rendered correctly, but no exception is thrown.

The reason for this behaviour is that the help system runs XHTML documents through an XML parser before it passes the HTML on to the browser, in order to determine whether any dynamic content has been included that needs to be resolved (see http://help.eclipse.org/help32/topic/org.eclipse.platform.doc.isv/guide/ua_dynamic.htm). The XML parser tries to resolve every entity it sees to determine whether it is valid. To do this it first checks the XML DTD (included with the parser), and if the entity is not found there it attempts to go to the DTD for the specified doctype. This requires an excessive number of network calls, which are now being suppressed in 3.3 for performance reasons. This is why things like "&gt;" still work: they are included in the XML DTD and are therefore found by the parser.

Fixing this would require significant changes to the code, and we are now too late in the cycle to address this for 3.3. For now, XHTML docs will need to be written (or updated) to use the entity numbers rather than the entity names. Here is a list of the most common entities: http://www.w3schools.com/tags/ref_entities.asp
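The difference between built-in, numeric, and named entities described above can be seen with the stock JAXP parser. This is a rough sketch, not the help system's code: the class EntityParseDemo is hypothetical, and the snippets deliberately carry no DOCTYPE so that no DTD is available at all, which only approximates the suppressed-DTD case. With a DOCTYPE present but its DTD unreadable, parsers differ (as noted above, newer ones may not throw).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class EntityParseDemo {
    /** Returns the text of the root element, or "ERROR: ..." if parsing fails. */
    static String parseText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing them to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder.parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (SAXException e) {
            return "ERROR: " + e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O problems are not expected for in-memory input.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Predefined by XML itself, so it resolves without any DTD.
        System.out.println(parseText("<p>a &gt; b</p>"));
        // Numeric character reference: also needs no DTD.
        System.out.println(parseText("<p>a &#8594; b</p>"));
        // Named HTML entity with no declaration available: a fatal parse error.
        System.out.println(parseText("<p>a &rarr; b</p>"));
    }
}
```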
Created attachment 72626: patch

We stopped shipping the DTDs due to legal concerns, and we started suppressing the excessive number of network calls to them for performance reasons. This patch works around both of those issues by retrieving each DTD the first time it is requested (with a network call) and caching it in the Eclipse configuration directory (under "<configuration>/org.eclipse.help/DTDs"). It then uses the cached copies for all subsequent requests.
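The attached patch is the authoritative version. Purely to illustrate the fetch-once-then-cache idea, here is a hypothetical sketch of a SAX EntityResolver built along those lines. The class CachingDtdResolver, its Fetcher hook, and the hash-based file naming are all assumptions made for this example and are not taken from the patch.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

final class CachingDtdResolver implements EntityResolver {
    /** Hypothetical hook so the download step can be replaced, e.g. in tests. */
    interface Fetcher {
        byte[] fetch(String systemId) throws IOException;
    }

    private final Path cacheDir;
    private final Fetcher fetcher;

    CachingDtdResolver(Path cacheDir, Fetcher fetcher) {
        this.cacheDir = cacheDir;
        this.fetcher = fetcher;
    }

    /** A possible default fetcher: one network call per DTD. */
    static byte[] download(String systemId) throws IOException {
        try (InputStream in = URI.create(systemId).toURL().openStream()) {
            return in.readAllBytes();
        }
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws IOException {
        // Assumption: a hash of the system id is good enough as a cache file name here;
        // a real implementation would want a collision-free naming scheme.
        Path cached = cacheDir.resolve(Integer.toHexString(systemId.hashCode()) + ".dtd");
        if (!Files.exists(cached)) {
            // First request: fetch once and store the bytes on disk.
            Files.createDirectories(cacheDir);
            Files.write(cached, fetcher.fetch(systemId));
        }
        // Every request, including the first, is served from the cached copy.
        InputSource source = new InputSource(new ByteArrayInputStream(Files.readAllBytes(cached)));
        source.setPublicId(publicId);
        source.setSystemId(systemId);
        return source;
    }
}
```

Such a resolver would be installed with DocumentBuilder.setEntityResolver, for example as new CachingDtdResolver(dir, CachingDtdResolver::download). If the first fetch fails, nothing is cached and the entities stay unresolved.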
The patch is good and has been applied to HEAD. The patch relies on an internet connection to read the DTDs but this is an improvement over being unable to resolve the entities. Without an internet connection the original problem still exists.
Backing out the patch and reopening as the JUnit test "testXMLProcessor" is failing. I should have run the JUnits before I committed this patch.
The test is failing because the DocumentReader is now adding the following to the script tag for live help: xml:space="preserve" This must have to do with the presence of the DTD, but I'm not yet sure why the parser would be arbitrarily inserting this attribute.
Created attachment 73432: patch

After some investigation, it seems that the parser inserts the default value whenever it finds a node that is missing a required attribute. Since it now has access to the DTDs, it is able to find these nodes. This does not affect the functionality of the parsing; it just means that the output document model does not exactly match the input XHTML. To work around this problem, the <script> and <a> nodes in "xhtml_expected.txt" have been updated in this new patch to include the missing required attributes ("xml:space" and "shape", respectively). The alternative would be to implement our own XML parser that does not insert defaults, which seems like overkill.
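The attribute insertion can be reproduced outside the help system. The following is a small sketch under stated assumptions: the class DefaultAttributeDemo is hypothetical, and it uses a tiny internal DTD subset rather than the real XHTML DTD so that it runs without network access. The ATTLIST line is my assumption of how xml:space is declared for <script>. Once the parser can see a DTD that declares a default for an attribute, the DOM contains that attribute even though the source never wrote it.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class DefaultAttributeDemo {
    // Assumption: a stand-in for the relevant part of the XHTML DTD.
    static final String XML =
            "<!DOCTYPE script [\n"
          + "  <!ELEMENT script (#PCDATA)>\n"
          + "  <!ATTLIST script xml:space (preserve) #FIXED \"preserve\">\n"
          + "]>\n"
          + "<script>liveAction();</script>";

    /** Parses the given XML and returns its root element. */
    static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Element script = parseRoot(XML);
        Attr space = script.getAttributeNode("xml:space");
        // The value comes from the DTD, not from the document text.
        System.out.println(space.getValue());
        // getSpecified() reports whether the attribute was written in the source.
        System.out.println(space.getSpecified());
    }
}
```

Checking Attr.getSpecified() when serializing is one way to tell DTD-supplied attributes from authored ones, though the patch sidesteps the question by putting the attributes into the test files.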
With the new patch will the tests pass with or without an internet connection?
Created attachment 73443: patch

With the DTDs not yet in the config directory and no internet connection, the test would fail, since it would revert to the old behaviour and would not add the required attributes. In this version of the patch, the XHTML input file for the test also contains the default values. This ensures that even with the old behaviour (no DTD) they will be included in the output. The test should now pass under all circumstances.
Patch committed to HEAD (with copyright statements added).
*** Bug 214376 has been marked as a duplicate of this bug. ***
Reopening. This was re-discovered in https://bugs.eclipse.org/bugs/show_bug.cgi?id=245984
What version of Eclipse are you using? I just tried this using I20080812-0800 with the example plugin which is attached to this bug, and it is working fine. Can you test using the example attached to this bug? It is possible that the other bug you referred to is using an entity which is not in one of the DTDs that get included in Eclipse.
It was (re)reported in https://bugs.eclipse.org/bugs/show_bug.cgi?id=245984. The text uses the &nbsp; and &copy; entities: Copyright &copy; 2006, 2008, Oracle. All rights reserved.
I was able to use the &copy; and &nbsp; entities without problem in an XHTML document and have them render correctly using Eclipse 3.4. It seems that Bug 245984 is a different problem, and one which does not have a test case. If Bug 245984 is a problem with the help system, then a new bug with a test case should be opened. Since the example in this bug works fine and continues to work when I add a copy or nbsp entity, I plan to set the state of this bug back to fixed.
I'm going to set the state back to FIXED because the test case attached works fine. If you have a different test case or a different scenario that is failing then please open a new bug.