Bug 310132 - [publisher] Publisher app + ant task should support statsURI option and generating required metadata for tracking download stats
Summary: [publisher] Publisher app + ant task should support statsURI option and gener...
Status: RESOLVED DUPLICATE of bug 436872
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: p2 (show other bugs)
Version: 3.6   Edit
Hardware: PC Linux
: P3 enhancement (vote)
Target Milestone: ---   Edit
Assignee: Pascal Rapicault CLA
QA Contact:
URL:
Whiteboard:
Keywords: helpwanted, polish
Depends on:
Blocks: 313602 316111 341744
  Show dependency tree
 
Reported: 2010-04-22 11:18 EDT by Nick Boldt CLA
Modified: 2014-09-01 07:27 EDT (History)
19 users (show)

See Also:


Attachments
publisher action that supports adding the p2.stats* properties (5.55 KB, text/plain)
2010-05-19 19:47 EDT, Hugues Malphettes CLA
no flags Details
Extended UpdateSite publisher application (8.27 KB, application/zip)
2010-05-25 12:24 EDT, Hugues Malphettes CLA
no flags Details
Updated app for adding p2 stats (8.52 KB, application/x-zip)
2010-06-08 08:05 EDT, Martin Oberhuber CLA
no flags Details
updated app for -p2.statsTrackedFeatures (8.88 KB, application/x-zip)
2010-06-09 01:34 EDT, Martin Oberhuber CLA
no flags Details
updated app v3 for -p2.statsSuffix (8.95 KB, application/x-zip)
2010-06-09 04:43 EDT, Martin Oberhuber CLA
no flags Details

Note You need to log in before you can comment on or make changes to this bug.
Description Nick Boldt CLA 2010-04-22 11:18:25 EDT
Per discussion in bug 302160, the Publisher should be able to add metadata required to track one or more bundles as described in bug 302160 comment 17 and bug 302160 comment 18.

It would be awesome if this could be done in time for people to adopt it for Helios, so we can finally get an accurate picture of p2 install download stats.
Comment 1 Hugues Malphettes CLA 2010-05-19 19:47:10 EDT
Created attachment 169250 [details]
publisher action that supports adding the p2.stats* properties

When this publisher action is performed, the p2.stats properties are added to the artifact repository.
I have integrated this to tycho with an extended UpdateSitePublisherApplication and I am using this for the jetty build.

The extended PublisherApplication uses those arguments:
-p2.statsURI http://eclipse.org/stats -p2.statsTrackedBundles org.eclipse.jetty.server,org.eclipse.jetty.osgi.pde.templates
Comment 2 Martin Oberhuber CLA 2010-05-21 17:17:50 EDT
Is there any remote chance getting this patch released for 3.6?

Or if not, then perhaps have it committed to HEAD and tagged for the 
  http://wiki.eclipse.org/Platform-releng-basebuilder ?

I don't know how others feel, but I don't feel very much like manually unjarring my artifacts.jar to hack in some XML data as per
  http://wiki.eclipse.org/Equinox_p2_download_stats
on each Helios contribution from now on...
Comment 3 Pascal Rapicault CLA 2010-05-21 17:54:10 EDT
Adding it to the RC3 list for consideration especially that there is already a patch attached.
Comment 4 Denis Roy CLA 2010-05-21 18:32:24 EDT
I hope this won't just track every single jar download, right?

> The extended PublisherApplication uses those arguments:
> -p2.statsURI http://eclipse.org/stats -p2.statsTrackedBundles
> org.eclipse.jetty.server,org.eclipse.jetty.osgi.pde.templates

For Eclipse, the statsURI should be http://download.eclipse.org/stats
Comment 5 Hugues Malphettes CLA 2010-05-21 18:54:30 EDT
(In reply to comment #4)
> I hope this won't just track every single jar download, right?
> 
> > The extended PublisherApplication uses those arguments:
> > -p2.statsURI http://eclipse.org/stats -p2.statsTrackedBundles
> > org.eclipse.jetty.server,org.eclipse.jetty.osgi.pde.templates
> 
> For Eclipse, the statsURI should be http://download.eclipse.org/stats

Thanks for the note Denis, I was using the wrong url :/
This (example) code only tracks the 2 bundles "org.eclipse.jetty.server,org.eclipse.jetty.osgi.pde.templates"
Comment 6 Andrew Niefer CLA 2010-05-25 10:13:31 EDT
The patch is incomplete, there is nothing there to hook it up to the outside world.

We are in RC3, I think it is way too late to be putting in new features.
http://www.eclipse.org/eclipse/development/plans/freeze_plan_3.6.php#FixPassAfterRC2
While this is nice to have, I don't think it qualifies as a "serious defect".

I would suggest making something available separately but not releasing this in p2 itself for 3.6
Comment 7 Hugues Malphettes CLA 2010-05-25 12:24:18 EDT
Created attachment 169851 [details]
Extended UpdateSite publisher application

Attached is an extension of the UpdateSitePublisherApplication that uses the DownloadableStats. The zip contains the exported bundle and the source bundle; all EPL.

It is invoked just like the standard UpdateSite Publisher using this application id:
- application org.sonatype.tycho.p2.updatesite.UpdateSitePublisherWithJRE
and it will read the parameters -p2.statsURI and -p2.statsTrackedBundles as described in the previous comment.

This is not a good solution as it only applies to update sites.

It is certainly too late in the game and I don't mean to push this for 3.6.
Anyways, I would need some guidance as to where you would like to plug this create this publisher's action if you want me to make a complete patch:
- in the AbstractPublisherApplication ?
- in a separate publisher application ?

I hope this helps.
Comment 8 Andrew Niefer CLA 2010-05-25 16:46:26 EDT
For the code itself, I see 
  if (artifact instanceof ArtifactDescriptor) {
     //should we check that this root IU is indeed a bundle?
     //(don't know how to do that)
     ((ArtifactDescriptor)artifact).setProperty("p2.statsURI", getStatsValue(publisherInfo, artifact));
  }

From http://wiki.eclipse.org/Equinox_p2_download_stats, this should probably be "download.stats".

As the patch stands, I would put it in a separate publisher application and perhaps also on the featuresAndBundles and UpdateSite applications as optional arguments.  This is the quickest and simplest thing for now.

More generally, there are basically 2 places to put things
1) p2.publisher - generally the place for generating new metadata
2) p2.repository.tools - generally the place for working with existing repos

publisher - this could be supported in the p2.inf file, there is already IPropertyAdvice which sets repository properties on simple artifact descriptors.  We could add a second method for normal artifact descriptor properties, we would just need AdviceFileAdvice (ie the p2.inf) to implement this.  We would still need an action to add the main repository property, but the properties on individual artifacts would be done by p2.inf

repository.tools -  my first thought was to add something to the <p2.process.repo/> ant task (RecreateRepositoryApplication in org.eclipse.equinox.p2.repository.tools).  This task goes over an artifact repo and fixes the artifact size, download size and md5sum for each descriptor.  This doesn't use the publisher actions, and it also isn't currently available as a command line app.
Comment 9 Martin Oberhuber CLA 2010-06-08 07:43:37 EDT
I tried using the app, but I'm finding the following issues:

1.) For individual artifacts, it generates a "p2.statsURI" property, but
    as andrew notes this must be a "download.stats" property.

2.) My understanding is that the download.stats property should be associated
    with bundles only (classifier='osgi.bundle') but this app also generates
    them for features (classifier='org.eclipse.update.feature').
    I'm concerned that this may lead to counting each install twice, since
    both the feature.group and the bundle would be installed.

3.) Up to now, I have been using the 
       -application org.eclipse.equinox.p2.metadata.generator.EclipseGenerator
    I had been using this with the arguments
       -compress \
       -reusePack200Files \
    and the result was that in my artifacts.xml, I found each artifact listed
    twice (once the compressed form and once the uncompressed one). I am 
    concerned that the new UpdateSitePublisher doesn't enlist the compressed
    artifacts any more. Is this acceptable?

For the records, I do want to enable download stats for the TM Helios contribution (as per bug 316111), but I definitely don't want to manually edit my artifacts.jar / artifacts.xml file and I also see no way patching the file with shellscripts. So my desire is blocked by fixing the 3 issues above. Here is how I'm installing the contributions and calling the app:

cd ${HOME}/ws2/eclipse/dropins/eclipse/plugins
wget --no-check-certificate \
  "https://bugs.eclipse.org/bugs/attachment.cgi?id=169851"
mv attachment.cgi\?id\=169851 x.zip
unzip x.zip
cd ../../..
tgtlauncher=`ls plugins/org.eclipse.equinox.launcher_* | sort | tail -1`
SITE=/home/data/httpd/download.eclipse.org/dsdp/tm/updates/3.2
CMD="java -jar ${HOME}/ws2/eclipse/${tgtlauncher} \
    -application org.sonatype.tycho.p2.updatesite.UpdateSitePublisherWithJRE \
    -source ${SITE} \
    -metadataRepository file:${SITE} \
    -artifactRepository file:${SITE} \
    -compress \
    -p2.statsURI http://download.eclipse.org/stats/tm \
    -p2.statsTrackedBundles org.eclipse.rse.core,org.eclipse.tm.terminal.local \
    -vmargs -Xmx256M"
echo $CMD
$CMD
echo "Result: $?"
Comment 10 Martin Oberhuber CLA 2010-06-08 08:05:24 EDT
Created attachment 171389 [details]
Updated app for adding p2 stats

Attached ZIP fixes issues (1) and (2) from my previous comment. The ZIP contains both the binary bundle as well as a source bundle. The relevant changes are in DownloadStatsAction:

  if (artifact instanceof ArtifactDescriptor) {
    //check that this root IU is indeed a bundle
    String classifier = artifact.getArtifactKey().getClassifier();
    if ("osgi.bundle".equals(classifier)) {
      ((ArtifactDescriptor)artifact).setProperty("download.stats",
          getStatsValue(publisherInfo, artifact));
      System.out.println("Added stats: "+artifact);
    } else {
      System.out.println("SKIPPED stats: "+artifact);
    }
  }

Regarding my concern #3 from the previous comment, I would like to hear input from others ... is it OK to not have the pack.gz artifacts listed explicitly in artifacts.jar? I'm concerned that without this, the pack.gz artifacts might not be used.
Comment 11 Hugues Malphettes CLA 2010-06-08 12:14:20 EDT
(In reply to comment #10)
> 
> Regarding my concern #3 from the previous comment, I would like to hear input
> from others ... is it OK to not have the pack.gz artifacts listed explicitly in
> artifacts.jar? I'm concerned that without this, the pack.gz artifacts might not
> be used.

Thanks for the updated patch.
This UpdateSitePublisher is not a replacement for the EclipseGenerator.
I think it is necessary to extend the EclipseGenerator and use the DownloadStatsAction in there.

In particular the UpdateSitePublisher does not take care of the packed artifacts.
I am using yet another custom application that invokes RecreateRepositoryApplication from the command-line to list the packed artifacts. Thanks Andrew for pointing to that application.

Martin, your comment makes me realize that I probably need to extend (or fork) the RecreateRepositoryApplication to add the download stats properties to the pack.gz artifacts. Right now they are not generated there so I doubt that packed artifacts will get hit.

I apologize as my well intentioned bit of code is so incomplete that I am afraid it will end up breaking other builds.

I am happy to attach my bundle and its sources that are extending almost all the publisher apps if it does not add to the confusion.
Comment 12 Martin Oberhuber CLA 2010-06-08 13:21:04 EDT
Well, before moving forward I would like to get a statement from p2 how UpdateSitePublisher relates to packed artifacts as per my question 3 in 
comment 9 and comment 10.

John or Andrew, can you shed some light on this?

Hugues, I don't see much risk in this breaking a build. And I'm afraid that attaching your "everything" would really add more confusion than help.

I would appreciate more of us in the Eclipse Community working together on a solution for adding the p2 download stats property. It just doesn't work manually editing those files.
Comment 13 John Arthorne CLA 2010-06-08 14:38:43 EDT
(In reply to comment #10)
> Regarding my concern #3 from the previous comment, I would like to hear input
> from others ... is it OK to not have the pack.gz artifacts listed explicitly in
> artifacts.jar? I'm concerned that without this, the pack.gz artifacts might not
> be used.

Correct. If it's not listed in artifacts.jar it will never be discovered by p2. p2 only knows about the artifacts that are described in the index file (artifacts.jar). Andrew should know more about what publisher application/options to use here to ensure both packed and unpacked artifacts are listed.
Comment 14 Andrew Niefer CLA 2010-06-08 14:50:14 EDT
(In reply to comment #13)
There is an ant task <p2.process.repo/> which will update the artifacts.jar to add pack.gz entries corresponding to existing jar entries if there is a pack.gz file on disk.  It will also at the same time update the MD5sum, artifact size, and download size.  

This is the task mentioned at the bottom of comment #8
Comment 15 David Williams CLA 2010-06-08 15:09:37 EDT
(In reply to comment #9)

> 2.) My understanding is that the download.stats property should be associated
>     with bundles only (classifier='osgi.bundle') but this app also generates
>     them for features (classifier='org.eclipse.update.feature').
>     I'm concerned that this may lead to counting each install twice, since
>     both the feature.group and the bundle would be installed.

I don't think that's true about "... should be associated with bundles only ...". I think either would be fine, in fact, to me "features" are more conceptually correct since users install things by feature, but, they should be telling you about the same thing, as the idea is just to pick a few of one or the other that represent some area of interest. Definitely don't count each artifact ... so I've heard (concerns of too many HEAD requests going back to eclipse.org). 

The count twice issue is probably valid if feature and bundle have same id. 
(Which isn't done in webtools, but I know some projects do, so the ultimate solution would need to account for that).
Comment 16 Martin Oberhuber CLA 2010-06-08 16:04:18 EDT
John - thanks for the insight.

Andrew - an ant task doesn't help me much, is there an application that I could run to have pack.gz files listed in artifacts.jar? As mentioned, up until today I've been using this:

  java -jar ${basebuilder}/plugins/org.eclipse.equinox.launcher.jar \
    -application org.eclipse.equinox.p2.metadata.generator.EclipseGenerator \
    -updateSite ${SITE}/ \
    -site file:${SITE}/site.xml \
    -metadataRepository file:${SITE}/ \
    -metadataRepositoryName "${TPVERSION} Update Site" \
    -artifactRepository file:${SITE}/ \
    -artifactRepositoryName "${TPVERSION} Artifacts" \
    -compress \
    -reusePack200Files \
    -noDefaultIUs \

But since docs tell me that "...Old style integration was done by calling the metadata generator ..." I assume there must be some new style application that can do the same for me? Preferredly as a post-processing step of Hugues' modified UpdateSitePublisher, such that the download.stats property remains in there?

David - I tend to prefer taking stats of bundles over features, because commercial applications on top of Eclipse (like ours) tend to replace the original features by home-grown features. So if a commercial up runs an "update" it wouldn't count the download of a feature, since the feature is not being downloaded. Same is true if the feature layout changes while keeping bundles around.

I also think that if I only have bundle counting, I can always find a bundle to use for any particular feature; but not the other way round.

Sure the "double counting" is not an issue when the bundle / feature ID's are not the same. In my case they happen to be the same, so that's why I changed the app to only inject stats entries for bundles.
Comment 17 Martin Oberhuber CLA 2010-06-09 01:34:06 EDT
Created attachment 171491 [details]
updated app for -p2.statsTrackedFeatures

So... I have slightly improved the app for generating stats, adding a new commandline option
   -p2.statsTrackedFeatures

which takes a comma-separated list of feature ID's to add the p2.downloadStats property to. The caller of the app can now decide what features to add stats for and what bundles to add stats for, without risking unexpected side effects due to name aliasing.

I have moved to tracking features in my own (DSDP-TM) Helios contribution since that works around the issue I have with packed bundles (since features are never packed). My stats contribution has a tiny bit of redundancy built in, so we'll see how that goes compared to to tracking bundles. I found out that if I first use the Publisher for adding download stats, and then the old generator for adding the packk200 artifacts, I seem to get what I need. For those who are interested, my shellscript is here:
http://dev.eclipse.org/viewcvs/index.cgi/org.eclipse.tm.rse/releng/org.eclipse.rse.updatesite/bin/mkTestUpdates.sh?root=DSDP_Project&view=markup

I do think though, that a problematic lack of functionality has been introduced by deprecating the p2 metadata generator (Eclipse Generator) in favor of the publisher, while not providing a commandline publisher app that's capable of
  - generating proper artifact metadata for packed bundles
    (the old -reusePack200Files option)
  - specifying an explicit name for the repository
    (the old -metadataRepositoryName option)

I'm wondering whether there are any existing bugs already to follow up on this lack of functionality?
Comment 18 Martin Oberhuber CLA 2010-06-09 04:43:26 EDT
Created attachment 171500 [details]
updated app v3 for -p2.statsSuffix

Here is a 3rd update of the app, now supporting an argument
  -p2.statsSuffix
in order to append a fixed suffix to all stats properties.

I found that this is useful after looking at some existing stats. The point is that /releases/helios is a combined repository, and will be holding the SR1 and SR2 releases in the future. In order to be able and see the difference between the SR0 downloads and the SR1 ones, they must use different versions.

Some projects append the feature / bundle versions to their stats ID's but I found a single suffix to uniquely identify my release more helpful. In fact I'm appending "_tm320" to all my stats ID's for identifying the TM 3.2.0 release -- that way, I can query the stats db just for _tm320 and will get downloads from both my project site as well as Helios.
Comment 19 Pascal Rapicault CLA 2011-05-02 15:10:23 EDT
Tobias could you please see if you can release this. Thx.
Comment 20 Tobias Oberlies CLA 2011-06-03 07:07:13 EDT
It's unlikely that I will have time to review the proposed changes. Sorry for that.
Reassigning to inbox.
Comment 21 Thomas Watson CLA 2011-06-08 11:28:48 EDT
Move all 3.8 bugs to Juno.
Comment 22 David Williams CLA 2012-03-02 01:11:05 EST
FWIW, WTP has our own "custom app" used for stats and p2.mirrorsURL which I also use in finalizing some release-train repos. See 

http://wiki.eclipse.org/WTP/Releng/Tools/addRepoProperties

if interested. (It does not provide ant tasks or anything cool, just a raw Eclipse Application).
Comment 23 Pascal Rapicault CLA 2014-01-14 22:12:36 EST
I'm looking into this for Luna but without any promise.
The current thinking is to enhance the category editor to allow for the URL of the stats server to be specified and also list the artifacts for which stats must be tracked.
Comment 24 Pascal Rapicault CLA 2014-05-05 10:21:29 EDT
Will not be addressed in Luna.
Comment 25 Mickael Istria CLA 2014-08-25 04:32:28 EDT
Isn't is a dup of bug 436872 ?
Comment 26 Tobias Oberlies CLA 2014-09-01 07:27:06 EDT
Also looks like a duplicate to me.

*** This bug has been marked as a duplicate of bug 436872 ***