Community
Participate
Working Groups
When the artifact and metadata repositories are written with the XMLWriter in org.eclipse.equinox.internal.p2.persistence with indentation (generated by the indent method) https://github.com/eclipse/rt.equinox.p2/blob/master/bundles/org.eclipse.equinox.p2.repository/src/org/eclipse/equinox/internal/p2/persistence/XMLWriter.java#L271 Although compression largely gets rid of the space in the compressed formats, it is still smaller still not to generate indentation. It also improves performance since the XML parser doesn't have to skip whitepsace when parsing the files; when starting up, for example, Eclipse will read the artifacts.xml file in uncompressed format. On my local Eclipse 4.5 install (which isn't heavily loaded) this is the size difference that it can make: -rw-r--r--@ 1 180626 9 Sep 22:06 artifacts-no-space.xml -rw-r--r--@ 1 211754 9 Sep 22:04 artifacts.xml On a 512-block system, it's a difference of 352 blocks vs 413 to read the data off disk. This is with both whitespace and newlines removed; a configuration property to allow the data to be written without whitespace and newlines could be passed in, and then have the printwriters add indentation where applicable.
*** Bug 432654 has been marked as a duplicate of this bug. ***
Alex, could you provide a Gerrit for this?
(In reply to Lars Vogel from comment #2) > Alex, could you provide a Gerrit for this? Ping?
New Gerrit change created: https://git.eclipse.org/r/165521
Gerrit change https://git.eclipse.org/r/c/equinox/rt.equinox.p2/+/165521 was merged to [master]. Commit: http://git.eclipse.org/c/equinox/rt.equinox.p2.git/commit/?id=79b2508ee17685b18fd72dc79e41ef916d43231a
Thanks, Alex. Please set milestone and update bug status, if this work is done.
Alex, can you add this to the N&N and also describe the p2.useWhitespace option in this entry? I guess some people would like to make the generated files readable for debugging purpose.
Sure, I can add a N&N once we've bedded down RC1 and we're happy with how we're generating the update sites. I wasn't sure if the code for generating the sites hosted on Eclipse would use the nightly builds, or if they would use a stable version (so we might not have seen it in use yet). Given that they are XML files, you can pipe them through either xmltidy or open them up in the browser; the browser's built-in CSS for navigating XML file should show them as navigable, even if the underlying source isn't.
I'm quite unhappy with this change. Oomph uses the XMLWriter to present information about IUs to the user in the Repository Explorer, and that feature has now become unusable. Firstly because it's unreadable and secondly because it's one giant long string, presenting that in StyledText performs so badly that the UI freezes for more than a minute in some cases. Of course setting the text and styling it must be done in the UI thread, so there's no avoiding that. These static final constants were added. org.eclipse.equinox.internal.p2.persistence.XMLWriter.useWhitespace org.eclipse.equinox.internal.p2.update.XMLWriter.useWhitespace The naming is already poor because final constants should be all upper case. I would like to change each of these to: protected boolean useWhitespace = Boolean.getBoolean("p2.useWhitespace"); //$NON-NLS-1$ This way a client can change the value without having global impact. I suppose even private would be okay because I could set and restore the system property, though that's pretty questionable as well. But the current design of a global final singleton value is not workable.
(In reply to Ed Merks from comment #9) > I'm quite unhappy with this change. > > Oomph uses the XMLWriter to present information about IUs to the user in the > Repository Explorer, and that feature has now become unusable. > > Firstly because it's unreadable and secondly because it's one giant long > string, presenting that in StyledText performs so badly that the UI freezes > for more than a minute in some cases. Of course setting the text and > styling it must be done in the UI thread, so there's no avoiding that. > > These static final constants were added. > > org.eclipse.equinox.internal.p2.persistence.XMLWriter.useWhitespace > org.eclipse.equinox.internal.p2.update.XMLWriter.useWhitespace > > The naming is already poor because final constants should be all upper case. Can be changed, as they have not found their way into a release. Please provide Gerrit. > I would like to change each of these to: > > protected boolean useWhitespace = Boolean.getBoolean("p2.useWhitespace"); > //$NON-NLS-1$ > > This way a client can change the value without having global impact. Sounds good to me, please upload a Gerrit.
(In reply to Ed Merks from comment #9) > I'm quite unhappy with this change. > > Oomph uses the XMLWriter to present information about IUs to the user in the > Repository Explorer, and that feature has now become unusable. > > Firstly because it's unreadable and secondly because it's one giant long > string, presenting that in StyledText performs so badly that the UI freezes > for more than a minute in some cases. Of course setting the text and > styling it must be done in the UI thread, so there's no avoiding that. > > These static final constants were added. > > org.eclipse.equinox.internal.p2.persistence.XMLWriter.useWhitespace > org.eclipse.equinox.internal.p2.update.XMLWriter.useWhitespace > > The naming is already poor because final constants should be all upper case. > > I would like to change each of these to: > > protected boolean useWhitespace = Boolean.getBoolean("p2.useWhitespace"); > //$NON-NLS-1$ > > This way a client can change the value without having global impact. > > I suppose even private would be okay because I could set and restore the > system property, though that's pretty questionable as well. > > But the current design of a global final singleton value is not workable. Why does Oomph use a P2-specific internal XML writer for this, instead of using its own mechanism?
Why does anyone reuse any software rather than rewriting the whole thing from scratch themselves? The underlying thinking behind this change seems to be that no human ever looks at these things so who cares that it's now become one giant unreadable line of gibberish. Certainly no one will be able to read it now and if no one needs to read it ever, why bother writing it in XML in the first place? I also question to change the default behavior rather than allowing for opt-in as opposed to forcing opt-out?
It's an internal implementation detail of the P2 persistence mechanism. It shouldn't be used by those outside of P2. If you want to have an Oomph specific XMLWriter that you have control of, it would make sense to use that instead of having an internal dependency on something that may break/change without notice.
(In reply to Alex Blewitt from comment #13) > It's an internal implementation detail of the P2 persistence mechanism. It > shouldn't be used by those outside of P2. If you want to have an Oomph > specific XMLWriter that you have control of, it would make sense to use that > instead of having an internal dependency on something that may break/change > without notice. API-rights activists discussions will get us nowhere. Yes, these classes are internal, but unfortunately p2 is so rife with internals that one can't actually implement useful utilities on top of it without using internals. If there were a toXML utility API somewhere, that would be great, but there isn't. If you look at the history of org.eclipse.equinox.internal.p2.persistence.XMLWriter, you'll see how it evolved to support generic requirements (Bug 528387) and that should make it clear that duplicating this code base is also not a solution to changing implementation details in p2. It just means that I would have to keep duplicate code in lock-step just to appease the API-rights activists.
I think you both are right (Alex with we should not use internal API and Ed with sometimes it is just practical impossible to do not use internals). I think Ed parameter suggestion does not do any harm so if you could provide a Gerrit that should be fine. Ed, please open also a bug to request a toXML API so that this can be used to official usage.
I can provide a patch with the suggested changes for review. Note that I was just suggesting to make protected non-final variables (initialize the same way as now) instead. Okay?
(In reply to Ed Merks from comment #16) > I can provide a patch with the suggested changes for review. > > Note that I was just suggesting to make protected non-final variables > (initialize the same way as now) instead. Okay? Fine for me.
The more I look at and think about the impact of this change, it really bothers me a lot. I know for sure that this will break long-established processing steps that have been written based on the fact that elements are on separate lines, e.g., https://git.eclipse.org/c/oomph/org.eclipse.oomph.git/tree/releng/org.eclipse.oomph.releng/src/ArtifactRepositoryAdjuster.java So I look at how I could pass a system property to the tycho-p2-repository-plugin to turn this new behavior off. But I don't see how I can do that. I believe this change also impacts org.eclipse.equinox.internal.p2.persistence.CompositeWriter because it extends org.eclipse.equinox.internal.p2.persistence.XMLWriter and I'm sure that many people edit the composite manually or via scripts, though perhaps no so many applications actually create the composite using p2... I really don't think the new behavior should be the new default. I think new options such as this should be opt-in not opt-out. But that begs the question of how to enable the opt-in behavior in things like tycho-p2-repository-plugin. I believe that has been given no thought. I think the change would be less likely to be disruptive if linefeeds were not eliminated, but rather only the indentation was eliminated. That way at least the XML is less readable rather than unreadable. Certainly Eclipse editors *do not* like (does not properly display at least on Windows) lines that are several megabytes long.
Alexander, WDYT?
(In reply to Ed Merks from comment #18) > The more I look at and think about the impact of this change, it really > bothers me a lot. > > I know for sure that this will break long-established processing steps that > have been written based on the fact that elements are on separate lines, > e.g., > > https://git.eclipse.org/c/oomph/org.eclipse.oomph.git/tree/releng/org. > eclipse.oomph.releng/src/ArtifactRepositoryAdjuster.java Since it's an XML file, the processing should be done using XML tools, not line/text/regex oriented ones. It's good that we're catching such bugs and can fix them. In this case, it would seem that XPath would be the right candidate for determining the greatest version, and the Transformer would be appropriate for re-writing attributes. Then again, it's such a simple transformation that using a SAX parser would do the trick as well, since you'd only have to process an attribute. https://docs.oracle.com/javase/7/docs/api/javax/xml/xpath/XPath.html https://docs.oracle.com/javase/6/docs/api/javax/xml/transform/Transformer.html > So I look at how I could pass a system property to the > tycho-p2-repository-plugin to turn this new behavior off. But I don't see > how I can do that. Why should anyone using the artifacts/repository files care whether they're compressed or not? The APIs exist to be able to read/write them already, and we already have XML processing tools. In fact, a lot of work in Eclipse (e.g. EMF, E4) works almost exclusively with XML files. > I believe this change also impacts > org.eclipse.equinox.internal.p2.persistence.CompositeWriter because it > extends org.eclipse.equinox.internal.p2.persistence.XMLWriter and I'm sure > that many people edit the composite manually or via scripts, though perhaps > no so many applications actually create the composite using p2... > > I really don't think the new behavior should be the new default. I think > new options such as this should be opt-in not opt-out. But that begs the > question of how to enable the opt-in behavior in things like > tycho-p2-repository-plugin. I believe that has been given no thought. No, because there wasn't the intention that this would have to be an opt-in change. We want to be able to write out more minimal files; having it as an opt-in property wouldn't make sense, because then you'd have to enable it at every build tool that might be using the P2 APIs. Having it opt-out was only there as a break-glass incident/access rather than an expected form of using the files. > I think the change would be less likely to be disruptive if linefeeds were > not eliminated, but rather only the indentation was eliminated. That way at > least the XML is less readable rather than unreadable. Certainly Eclipse > editors *do not* like (does not properly display at least on Windows) lines > that are several megabytes long. But who is opening an editor on an XML file, and certainly the artifacts.xml/content.xml files, and why? I get that writing hand-held compositeContent/Artifacts.xml files might make sense, but in general they're pretty small in either case. In the interest in moving forwards, perhaps a half-way house would be to re-enable the end-of-line markers for now (so they don't cause problems with those editing it) and would allow a transition for processing tools like ArtifactRepositoryAdjuster to be adjusted to do XML transformations instead of text ones? Then if necessary, a subsequent change could enable the stripping of newlines as well.
> > So I look at how I could pass a system property to the > > tycho-p2-repository-plugin to turn this new behavior off. But I don't see > > how I can do that. > > Why should anyone using the artifacts/repository files care whether they're > compressed or not? The APIs exist to be able to read/write them already, and > we already have XML processing tools. In fact, a lot of work in Eclipse > (e.g. EMF, E4) works almost exclusively with XML files. > I just explained why someone cares and I've pointed out two things that are broken or unusable as a result of the assumption that no one ought to care. > > I believe this change also impacts > > org.eclipse.equinox.internal.p2.persistence.CompositeWriter because it > > extends org.eclipse.equinox.internal.p2.persistence.XMLWriter and I'm sure > > that many people edit the composite manually or via scripts, though perhaps > > no so many applications actually create the composite using p2... > > > > I really don't think the new behavior should be the new default. I think > > new options such as this should be opt-in not opt-out. But that begs the > > question of how to enable the opt-in behavior in things like > > tycho-p2-repository-plugin. I believe that has been given no thought. > > No, because there wasn't the intention that this would have to be an opt-in > change. We want to be able to write out more minimal files; having it as an > opt-in property wouldn't make sense, because then you'd have to enable it at > every build tool that might be using the P2 APIs. Having it opt-out was only > there as a break-glass incident/access rather than an expected form of using > the files. > I'm not sure who "we" are in this case. I would hope that "we" want to make changes that have maximal positive impact and minimal negative impact. > > I think the change would be less likely to be disruptive if linefeeds were > > not eliminated, but rather only the indentation was eliminated. That way at > > least the XML is less readable rather than unreadable. Certainly Eclipse > > editors *do not* like (does not properly display at least on Windows) lines > > that are several megabytes long. > > But who is opening an editor on an XML file, and certainly the > artifacts.xml/content.xml files, and why? I get that writing hand-held > compositeContent/Artifacts.xml files might make sense, but in general > they're pretty small in either case. > Have you never needed to understand what p2 ends up publishing and why something isn't ending up in a repository? Certainly the repository explorer tries to eliminate the need to do this manually, but assuming no human ever does this is a bad arbitrary assumption. > In the interest in moving forwards, perhaps a half-way house would be to > re-enable the end-of-line markers for now (so they don't cause problems with > those editing it) and would allow a transition for processing tools like > ArtifactRepositoryAdjuster to be adjusted to do XML transformations instead > of text ones? Then if necessary, a subsequent change could enable the > stripping of newlines as well. In the interest of gatherings facts, I compared the size of this repo's content jar: http://download.eclipse.org/releases/2020-09/202008071000/ with the size of what's produced now. The saving is roughly 7% so that seems fairly significant. But then I though, what about the *.xml.xz? Unfortunately the mirror application doesn't support using org.eclipse.equinox.p2.internal.repository.tools.XZCompressor for that purpose. In fact *nothing* in the platform in actually uses this class! So I guess only Tycho uses it, but it's internal! Certainly I use it for producing EMF's update sites and for producing JustJ's update sites, but then I'm a bad person because I use p2 internal "APIs". Of course the fact that there no way to produce the *.xml.xz without using internals kind of loops back into a pointlessly circular API discussion. In any case, if I modify the MirrorApplication to use this class, the process of compressing the XML does not complete for many minutes. In fact, it takes so long that I have lost patience waiting for it to complete after more than 10 minutes. If I disable the whitespace "optimization" via the system property, the processing completes quickly as normal and the size difference is 814K versus 817K which is less than .4% size savings. Clearly the unacceptable performance of XZCompressor for a single very long line of XML is a disaster waiting to happen. And this will not be noticed except by downstream consumers using p2 internal APIs, which is the only "API" for producing a p2.index and the *.xml.xz variants. In my opinion, this change should simply be reverted. It does not significantly improve the size of the *.xml.xz and its definitely going to break clients, including Oomph, EMF, JustJ and Tycho. If the goal is to improve p2 performance we should rather ensure that projects all produce *.xml.xz rather than trying to make the content.jar smaller. Fortunately that is Tychos's default: https://www.eclipse.org/tycho/sitedocs/tycho-p2/tycho-p2-repository-plugin/assemble-repository-mojo.html#xzCompress
I'm also either for reverting this change or for making it opt-in. My project's build and promotion system uses line-by-line processing of p2 files since more than 10 years. Whether one likes that or not, it works and is fast. The size reduction through whitespace removal is less than 12 KB per content.jar. In addition I personally have to look at the inner details of these files on almost a daily basis. And of course I use to use my machine's text editor for this purpose. (In reply to Alex Blewitt from comment #0) > On a 512-block system, it's a difference of 352 blocks vs 413 to read the > data off disk. Even for 1000 copies of this file that would save about 30 MB of disk space. The cost of this space can not be expressed in cents, I guess. Size reduction is probably more interesting when it comes to (repeated) network transfers. Also, I haven't seen any processing time benchmarks/comparisons which would justify this non-optional change. > [...] a configuration property > to allow the data to be written without whitespace and newlines could be > passed in, and then have the printwriters add indentation where applicable. This sounded more like an opt-in approach.
I'll revert the change as we are really close to M3 and I won't be available to review new patch prior to M3. Alex, if you enhance it to be opt-in I'll review but it will probably be for the December release.
New Gerrit change created: https://git.eclipse.org/r/c/equinox/rt.equinox.p2/+/167636
Today I tried to measure better what was going wrong with XZCompressor performance and today it did not take exceedingly long with the whitespace missing. I'm not sure what was going on yesterday. Perhaps some Windows anti-virus behavior... :-( In any case, the 814K content.xml.xz is reduced to 793K, so less than a 3.5% reduction. I just don't believe this saving justifies the reduced human readability. Just last week when working on JustJ trying to generate an "a.jre.*" IU (for use in a Tycho build as an environment) using a p2.inf, I was constantly looking at the p2 metadata of the repo and even the target/p2content.xml (generated by Tycho using XMLWriter) so I could see what the publisher is actually publishing. This is when I noticed the repository explorer performs poorly now and that led to finding the changes in this Bugzilla as the cause...
New Gerrit change created: https://git.eclipse.org/r/c/equinox/rt.equinox.p2/+/167639
Ed/Eike, can you file bugs that highlight the areas that you have found where tools have been written assuming a particular layout of the XML file so they can be fixed? Feel free to assign them to me.
Gerrit change https://git.eclipse.org/r/c/equinox/rt.equinox.p2/+/167639 was merged to [master]. Commit: http://git.eclipse.org/c/equinox/rt.equinox.p2.git/commit/?id=9834b0e8ea73543e7235c923301ee653e0e64f5d
Alex, with the change being made opt-in now - it would need a way to tell tycho to opt-in. Are you interested in this?
Yes, we'll need a way of making sure that the builds hosted at Eclipse.org produce the compact repositories by default, but I think that some of the processing in Oomph as highlighted by Ed needs to be resolved before we can make it so. Either way, there should be a follow up to make this the default either for all P2 users (and allow Oomph to opt out where necessary) or for making the Tycho builds opt-in. Do you think it would make sense for that bug to live in P2 so it can co-ordinate users of P2 code?
(In reply to Alex Blewitt from comment #30) > Yes, we'll need a way of making sure that the builds hosted at Eclipse.org > produce the compact repositories by default, but I think that some of the > processing in Oomph as highlighted by Ed needs to be resolved before we can > make it so. > > Either way, there should be a follow up to make this the default either for > all P2 users (and allow Oomph to opt out where necessary) or for making the > Tycho builds opt-in. Do you think it would make sense for that bug to live > in P2 so it can co-ordinate users of P2 code? It's fine IMO.
Raised bug 566070 to investigate surrounding issues. Ed, can you mark any bugs you raise regarding Oomph etc as blockers on that bug?
I'm moving this out to 4.18 although my vote would be to not bother with this at all and keep the human readable whitespace.
Let's say there is bigger fish to catch now.