RE: [cross-project-issues-dev] Download stats and p2
Hi Wayne,
thanks for your answers. I have two notes:
1 - Even if we get only one mirror's logs that may be helpful
to double check whether our mirroring / p2 strategies
do really work as expected. How often is content.jar fetched?
How often are the pack.gz files fetched vs. the original .jar?
Is there any kind of request that goes to eclipse.org only?
How many failures are reported? ...
2 - Again I don't know what web server logs look like, but
from my naïve understanding we could go a pretty long way
with something very simple. If we just want total numbers
(not grouped by geo region of downloader), this may be enough:
cat /var/logs/httpd.access \
| grep '{interesting date range}' \
| grep /path/to/mirrors/eclipse \
| sed -e '{extract filename only}' \
| sort \
| awk '{count consecutive occurrences}'
Assuming that Apache is the prevalent web server, the server logs
shouldn't be all that different. If we test such a script on our
own server to get total numbers and then ask one or two mirrors
to run this and mail back the results every day...
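
To make the pipeline above a bit more concrete, here is a minimal sketch of what such a script might look like. It assumes the mirrors write Apache's Common or Combined Log Format (so the timestamp is field 4, the request path field 7, and the status code field 9); the function name `count_downloads` and its two arguments are made up for illustration, and only plain 200 responses are counted.

```shell
#!/bin/sh
# Sketch only: count successful downloads per file name from an Apache
# access log read on stdin.
# Assumption: Common/Combined Log Format, i.e. after whitespace splitting
#   $4 = "[dd/Mon/yyyy:hh:mm:ss", $7 = request path, $9 = HTTP status.
# count_downloads is a hypothetical helper, not an existing tool.
count_downloads() {
    # $1 = date prefix such as "18/Jun/2009"
    # $2 = path prefix of the mirrored tree
    awk -v day="$1" -v prefix="$2" '
        index($4, "[" day) == 1 && index($7, prefix) == 1 && $9 == 200 {
            n = split($7, parts, "/")   # keep only the file name
            count[parts[n]]++
        }
        END { for (f in count) print count[f], f }
    ' | sort -k1,1nr -k2,2              # highest count first, then by name
}
```

A mirror admin could then run something like `count_downloads "18/Jun/2009" /path/to/mirrors/eclipse < /var/logs/httpd.access` and mail back the output. Counting in awk rather than with sort plus a consecutive-line count keeps it to one pass over the log; partial (206) responses and query strings are deliberately ignored here and would need handling in a real script.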
Cheers,
--
Martin Oberhuber, Senior Member of Technical Staff, Wind River
Target Management Project Lead, DSDP PMC Member
http://www.eclipse.org/dsdp/tm
> -----Original Message-----
> From: cross-project-issues-dev-bounces@xxxxxxxxxxx
> [mailto:cross-project-issues-dev-bounces@xxxxxxxxxxx] On
> Behalf Of Wayne Beaton
> Sent: Thursday, 18 June 2009 04:51
> To: Cross project issues
> Subject: Re: [cross-project-issues-dev] Download stats and p2
>
> Before I respond to Martin's question, I'd like to apologize
> to the p2
> team. While the suggested solution may have been a hack (unintended
> behaviour) in the update manager, it really is a purpose-designed
> feature in p2. It is hardly a "hack". I have added a section to the
> "Equinox p2 Getting Started for Releng" page [1] that describes the
> "Artifacts.xml mapping rule change".
>
> Onto the response...
>
> I believe that the list of mirrors sent to p2 *does not* include
> eclipse.org (I may be incorrect) so we won't even have enough data to
> base an approximation upon. We are leaning very heavily on
> our mirrors
> for this release (we are not adding any additional bandwidth).
>
> The concern with using server logs from mirrors is that the
> best we can
> hope for is an approximation of what's really happening.
>
> It is doubtful that all mirrors will participate. If only a couple of
> the major mirrors do not participate, our numbers will be woefully
> incorrect. Mirrors come and go, which would make maintaining
> an accurate
> approximation challenging.
>
> We anticipate that none of our major mirror providers will consent to
> providing us with their data. I may, for example, be able to convince
> the good folks at the University of Waterloo to hand it over;
> I might be
> able to convince them to set up some kind of job to do it on
> a regular
> basis. However, I am skeptical that they actually will.
>
> You should also keep in mind that these organizations provide mirrors
> for many sites. Even if they do decide to hand it over, we
> will likely
> find ourselves buried in irrelevant log data.
>
> I am willing to try approaching one or two of the mirror providers to
> see how feasible this is, but I am not hopeful.
>
> FWIW, I haven't heard anything on this topic after the board meeting
> this week. Hopefully tomorrow, I'll get some feedback to see
> how big a
> deal this really is.
>
> Wayne
>
> [1]http://wiki.eclipse.org/Equinox_p2_Getting_Started_for_Releng
>
> Oberhuber, Martin wrote:
> > Hi Wayne et al,
> >
> > I'd like to follow up on option (1) from your e-mail,
> > direct download stats from the web and ftp servers' access
> > logs on Eclipse.org (and those mirrors who happen to give them
> > to us).
> >
> > I'm assuming that for download.eclipse.org such logs already
> > exist, and recalling Denis' excited "shooting for 1 Mio
> > downloads now" blog or similar in previous years, I'm further
> > assuming that at least for Eclipse.org the analysis is not
> > that bad.
> >
> > Going for the server logs gives the most accurate data at
> > zero impact for the release itself. I'm not a web guy, but
> > I do assume that tools exist for analyzing those access logs.
> > Why not just go and ask some of the mirrors and see who is
> > willing to collaborate?
> >
> > But perhaps that is happening already, the stats are being
> > prepared but details are confidential for strategic members
> > only [some small reward for strategic membership]... while
> > some aggregate numbers are shared with the Community...
> >
> > Cheers,
> > --
> > Martin Oberhuber, Senior Member of Technical Staff, Wind River
> > Target Management Project Lead, DSDP PMC Member
> > http://www.eclipse.org/dsdp/tm
> >
> >
> >
> >
> >> -----Original Message-----
> >> From: cross-project-issues-dev-bounces@xxxxxxxxxxx
> >> [mailto:cross-project-issues-dev-bounces@xxxxxxxxxxx] On
> >> Behalf Of Wayne Beaton
> >> Sent: Friday, 12 June 2009 20:51
> >> To: cross project issues
> >> Subject: [cross-project-issues-dev] Download stats and p2
> >>
> >> Greetings all. We have a small problem. Actually, I guess that the
> >> problem is as big as you choose to decide it is...
> >>
> >> The Eclipse Foundation tracks downloads that go through the
> >> download.php
> >> script:
> >>
> >> http://www.eclipse.org/downloads/download.php?file=[...]
> >>
> >> This includes things like the packages and direct downloads
> >> provided by
> >> projects (assuming that everybody is using the script in
> >> their download
> >> links).
> >>
> >> Downloads that occur through p2 do not go through this
> >> script. They go
> >> directly to our download server and to our mirrors. The
> >> mirrors do not
> >> (and arguably cannot reasonably) provide us with download stats.
> >>
> >> So... if somebody, for example, downloads the "Eclipse IDE for PHP
> >> Developers" we will know that we have one more download of
> >> PDT. If they
> >> instead download the "Eclipse IDE for Java Developers" and
> >> then use p2
> >> to add PDT to their configuration, we currently do not have
> >> any way of
> >> tracking that download of PDT.
> >>
> >> Inability to accurately track downloads is a huge concern for the
> >> Eclipse Board.
> >>
> >> We have explored several mechanisms for tracking this download.
> >> Unfortunately, we've not been holding these conversations as
> >> publicly as
> >> I'd like, so I'll summarize them briefly below...
> >>
> >> 1. Get mirrors to give us their download stats. We could ask.
> >> But most
> >> will not give them to us. Besides, their logs probably contain
> >> information about everything they mirror, which will be way more
> >> information than we need. And it'll be a heck of a lot of
> >> information
> >> for our webmasters to weed through.
> >>
> >> 2. Add a plug-in that gathers information from p2 post
> >> install and send
> >> that information to eclipse.org. Effectively, this is a call-home
> >> mechanism that will require some additional UI elements and
> >> considerable
> >> effort awfully late in our development cycle. Ultimately, it will
> >> require some kind of opt-in from the user; many of whom
> >> will refuse
> >> leaving us with incomplete data. FWIW, we could use the
> >> UDC for this,
> >> but it has the same problem.
> >>
> >> 3. All p2 downloads go through eclipse.org. Denis is
> >> concerned that the
> >> download.php script and--to some degree--the rest of our
> >> infrastructure
> >> will not be able to scale to handle the volume that can
> >> potentially come
> >> from p2 downloads. FWIW, we're not increasing our bandwidth
> >> for Galileo;
> >> instead, we're depending very heavily on mirrors.
> >>
> >> Bug 239668 [1] has been open for some time to discuss this issue.
> >>
> >> We've decided that the best approach is something that we've been
> >> calling the "Single File Hack". In this hack, we configure the p2
> >> metadata (artifacts.xml) to send requests for some small
> >> subset of the
> >> files to eclipse.org. Ideally, we send requests for one plug-in or
> >> feature for each thing that we need to track. The number of
> >> files needs
> >> to be kept relatively small.
> >>
> >> There are problems with this hack. For one, eclipse.org
> >> becomes a single
> >> point of failure for all downloads. Further, we will have to let
> >> organizations that mirror our downloads for internal
> >> consumption know
> >> how to turn it off.
> >>
> >> What we're going to need from each project is the names of
> >> the files
> >> that we need to be tracking.
> >>
> >> I'd love to hear your thoughts on this topic.
> >>
> >> Wayne
> >>
> >> [1]https://bugs.eclipse.org/bugs/show_bug.cgi?id=239668
> >> _______________________________________________
> >> cross-project-issues-dev mailing list
> >> cross-project-issues-dev@xxxxxxxxxxx
> >> https://dev.eclipse.org/mailman/listinfo/cross-project-issues-dev
> >>
> >>