Archive for the ‘UDC’ Category

Wanna see some numbers?

Tuesday, August 5th, 2008

I’ve got some numbers from the usage data collector for the month of July. It’s the first full month of real data. Currently, I’ve only got data for commands, views, editors, and perspectives used; the information is restricted to only show those entries that start with “org.eclipse.”. You can change the sort order of the table by clicking on the column headings.

I haven’t been able to update the “project proximity” report. There’s just too much data for the query I’m running. Once I optimize the query, I’ll update that report too.

Holy Cow!

Thursday, July 17th, 2008

Participation in the Usage Data Collector has been nothing short of outstanding. As of this moment, we have gathered usage data from more than 22,000 users (more than 20,000 new participants this month alone). The usage trends page shows this in a cool graph.

The other reports that we’ve generated are a little out of date since running those reports is pretty darned time consuming. I’ve optimized the database for storage space first, writes second, and reading/reporting third. I’m going to spend some time today making reading a little less painful so that we can actually run those reports.

22,000 users participating. Staggering.

Project Pairing Visualization

Tuesday, June 24th, 2008

At Nick’s suggestion, I’ve been tinkering with using Zest to visualize the project pairing data that I’ve spent so darned much time gathering from the Usage Data Collector results over the past week.

This image was generated using the “Spring” graph layout algorithm to represent how closely related the various projects are.

proximity.png

The spring layout algorithm pulls the most frequently used project combinations towards each other, so those projects most commonly found in the wild naturally drift to the middle.
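To make the idea concrete, here is a minimal force-directed sketch. It is not Zest's implementation; it only assumes the general spring model described above, where every pair of nodes repels and each pairing acts as a spring whose pull grows with its weight. The function names, the step size, and the sample weights are all made up for illustration.

```python
import math

def spring_layout(nodes, edges, iterations=500, step=0.01, repulsion=1.0):
    """Place nodes in 2D so that heavily weighted edges end up shorter.

    nodes: list of names; edges: list of (a, b, weight) with small,
    pre-scaled weights (large weights would need a smaller step).
    """
    count = len(nodes)
    # Deterministic start: spread the nodes evenly around a unit circle.
    pos = {
        name: [math.cos(2 * math.pi * i / count),
               math.sin(2 * math.pi * i / count)]
        for i, name in enumerate(nodes)
    }
    for _ in range(iterations):
        force = {name: [0.0, 0.0] for name in nodes}
        # Every pair of nodes pushes apart (inverse-square repulsion).
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                push = repulsion / dist ** 2
                force[a][0] += push * dx / dist
                force[a][1] += push * dy / dist
                force[b][0] -= push * dx / dist
                force[b][1] -= push * dy / dist
        # Each edge is a zero-rest-length spring; its pull grows with weight.
        for a, b, weight in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += weight * dx
            force[a][1] += weight * dy
            force[b][0] -= weight * dx
            force[b][1] -= weight * dy
        # Move every node a small step along its net force.
        for name in nodes:
            pos[name][0] += step * force[name][0]
            pos[name][1] += step * force[name][1]
    return pos

def distance(pos, a, b):
    """Euclidean distance between two placed nodes."""
    return math.hypot(pos[a][0] - pos[b][0], pos[a][1] - pos[b][1])

# Made-up weights, not UDC data.
layout = spring_layout(
    ["platform", "jdt", "rap"],
    [("platform", "jdt", 10.0), ("platform", "rap", 1.0)],
)
```

With these sample weights, the heavily weighted platform/jdt pair settles closer together than the lightly weighted platform/rap pair, which is the effect that makes common combinations cluster.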

I’m not very happy with the way that it’s drawn; I’ll work on that over the next few days. Right now, it runs as an SWT application. I may turn it into a plug-in.

The graph helped me notice a few other bundles that needed to be rolled into their project (recall that the names shown on the report and in this graph are taken from the third segment of each bundle’s symbolic name). So, in the next version of the report (and this graph), ‘cvs’ will be rolled into ‘platform’, and ‘contribution’ into ‘ajdt’. I also decided to roll those curious ‘mylar’ references into ‘mylyn’.
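The naming rule and the roll-ups can be sketched roughly as follows. This is only an illustration of the rule as described, not the actual report query; the function name is hypothetical and the bundle names at the bottom are examples rather than entries taken from the data.

```python
# Roll-ups named in the post: these third segments are folded into
# another project's entry.
ROLLUPS = {"cvs": "platform", "contribution": "ajdt", "mylar": "mylyn"}

def project_for_bundle(symbolic_name):
    """Return the project for an org.eclipse.* bundle, or None otherwise."""
    segments = symbolic_name.split(".")
    if len(segments) < 3 or segments[:2] != ["org", "eclipse"]:
        return None
    project = segments[2]  # the third segment names the project
    return ROLLUPS.get(project, project)

# Example bundle names, used only to exercise the rule.
examples = [
    "org.eclipse.mylar.tasks.ui",
    "org.eclipse.contribution.xref.core",
    "org.eclipse.jdt.ui",
    "org.tigris.subversion.subclipse.ui",
]
projects = [project_for_bundle(name) for name in examples]
```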

Project Pairing reports: Now with column sorting!

Monday, June 23rd, 2008

Last week, I wrote about a new report I added for the UDC data that shows project pairings “in the wild”. In a comment, Nick effectively killed my weekend. Nick, my wife’s not happy with you. I showed her your picture. Be careful.

The report can now be sorted on columns. Actually, this part was easy to implement.

The hard part is that I noticed something when I started sorting the data on the different columns. I noticed that, while I expected that each pairing should be represented twice, some of them are only represented once. I expected to see, for example, a pairing of ‘ajdt’ and ‘platform’ along with a pairing of ‘platform’ and ‘ajdt’; but I only see one (the latter). I tried several different queries, and even tried assembling the data more manually (the final query is surprisingly simple, yet yields the same results as some of its more complicated cousins). I stumbled into two known MySQL bugs, and (possibly) one unknown one. To ensure that things stayed interesting, I opted to move the main query into a stored procedure, which put the Data Tools Project’s SQL editing support to full use (though it was complicated by the fact that I had never before created a stored procedure).
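One way to spot the one-sided pairings outside the database is a quick symmetry check over the report rows. This is a sketch under the assumption that each row is a (first project, second project, count) tuple; the function name and the counts are invented for illustration.

```python
def missing_mirrors(rows):
    """Return the (a, b) pairings whose mirror (b, a) is absent."""
    seen = {(a, b) for a, b, _count in rows}
    return sorted((a, b) for a, b in seen if (b, a) not in seen)

# Made-up rows: ('platform', 'ajdt') has no ('ajdt', 'platform') partner.
rows = [
    ("platform", "ajdt", 12),
    ("platform", "jdt", 40),
    ("jdt", "platform", 40),
]
lonely = missing_mirrors(rows)
```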

I’ll keep hacking at it, but in the meantime, I’ve made the new version available anyway.

In my efforts to refine the query, I discovered that I had previously been doubling the actual numbers (in my defense, I did state that I thought they felt a little high). That’s been fixed.

Once I can figure out a reasonable home for the queries, I’ll make ‘em available so that Nick can lose a weekend too.

I also spent a little time looking for a visualization package; something that will let me draw a diagram showing the relative proximities of the projects. I did stumble upon a neat little physics package that I figure I can use: each project can be represented as a particle, with the number of users represented as the strength of a spring joining the particles. I think it’ll look cool. But, unfortunately, it’s become low priority; at least for the next week or so.

In the meantime, does anybody know of some decent software that can be used to represent this sort of information?

Project pairings

Thursday, June 19th, 2008

I decided that it might be cool to find out which projects are being used together in the wild, so I wrote a new report using data collected by the Usage Data Collector (UDC). The report shows a table where each row contains a pair of projects and a number. The number represents the number of users who have used at least one of the bundles from each of the projects. I’ve garnered some delightful insight from this report.

First, it’s interesting to me to note that there are two people out there that have mylar bundles installed (“mylar” is the former name of the “mylyn” project). I assume that this is somebody who just has the bridge “do nothing” bundles that were created as part of the transition to the new name. What’s particularly interesting about that is that somebody actually installed the UDC into what I’m sure is an existing Eclipse 3.3-based product to produce this input.

Second, RAP seems to be used in conjunction with JST by a lot of folks. I guess that this feels natural, but in my own dealings with RAP thus far, I have not used both RAP and Java EE explicitly together (and no, JST is not a prerequisite for RAP). My guess is that a lot of folks out there doing RAP stuff are also doing Java EE stuff.

Some of the usage numbers are impressive. Data tools, Device Debugging, Test and Performance Tools Platform are used by a lot of folks.

The report needs some improvement. First, I’m identifying the project by extracting the third segment from the bundle name (i.e. “org.eclipse..whatever”). I’ve tried to gather up all the names used by the Platform project and dump them into a single “platform” entry, but I think that I’ve missed a couple. Also, there are some other bundles that are not quite following the rules that I’ll have to sort out. Presently, the report only looks at bundles with names that start with “org.eclipse.” and doesn’t consider versions.

I’m also a little skeptical about some of the numbers. Some of them feel a little high (especially when compared against numbers from the other reports). While I’m confident in the pairings and the relative magnitude of the numbers, it’s probably a little early to start basing any decisions on them.

Anyway… enjoy the Project Pairings report. And please do let me know if you can think of any way to improve it.

Usage Data Collection and Testing

Friday, June 13th, 2008

Earlier this week, while helping James prepare some words to include in this week’s EclipseZone newsletter on the Usage Data Collector (UDC), a thought occurred to me (which I included in the text): the information gathered by the UDC should help in testing.

Think about the magnitude of the testing problem that faces Eclipse projects. Consider just Ganymede. A user/organization assembling a development environment will choose to either include or not include each of these projects. That means that there is an upper limit of 2^24 = 16,777,216 possible combinations of projects. This assumes that each user is going to take all the available bits of each project; if you try to factor in that each project produces some number of bundles that can either be included or not included, then this number gets way bigger.

Now, with the assumption that each user/organization is going to take all the bits of each project, the number is actually a bit lower. To have a development environment, you need to have the Eclipse top-level project’s code, which drops the upper limit to 2^23 = 8,388,608 possible combinations of projects.

If each project has exactly one automated unit test that takes 1 second to run (which, if you consider the time required to start up the test environment, is probably a little low), running the tests on all of the combinations will take, well… a heck of a lot of time (2,330 days). And that completely disregards any time required to actually configure all those different combinations. In fairness, the upper limit is probably (though not necessarily) a little high. Let’s say that the actual number of combinations of projects in the wild is really two orders of magnitude lower than that upper limit (which brings us down to a scant 23 days of automated unit testing).
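The arithmetic can be checked with a few lines. The post does not spell out the formula behind the day counts; the sketch below assumes that every combination runs one 1-second test for each of the 24 projects, which happens to reproduce the quoted figures. The variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

projects = 24
all_combinations = 2 ** projects               # 16,777,216
with_platform = 2 ** (projects - 1)            # 8,388,608

# Assumption: each combination runs one 1-second test per project.
total_seconds = with_platform * projects * 1
days = total_seconds / SECONDS_PER_DAY         # roughly 2,330 days

# Two orders of magnitude fewer combinations in the wild.
days_in_the_wild = days / 100                  # roughly 23 days
```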

The point is that rigorously testing all possible combinations of Eclipse projects is—while not impossible—certainly impractical.

One of the handy things that the UDC captures is combinations of software that actually exist in the wild. We can tell from the data, for example, that certain features are actually used in combination. We will be able to tell, for example, how many people are actually using RAP in combination with BIRT. Or Subversive with Mylyn. Or Subversive and CVS in conjunction with TPTP and Buckminster. I have this vision in my head of a graph showing all the different bundles clustered into groups where the closeness between bundles indicates how commonly they are seen together in the wild.

From this, I expect that we’ll end up with a relatively small (i.e. much smaller than the upper limit) set of combinations of projects and bundles that are really used in the wild. From there, we can set priorities for testing, usability, interoperability, integration, etc.

What do we do with this information? At a minimum, we (“we” the Eclipse community, not necessarily “we” the Eclipse Foundation) might consider using the Eclipse Packaging Project (EPP) to build these most commonly found combinations to make them available for testing. It has been suggested that the Eclipse Foundation should invest some resources into testing; maybe having the packages based on real combinations in the wild is a step toward that, or maybe even something else entirely. Frankly, I think it’d be cool if we had a Testing project. At least we’ll know what to test.

I’ve added a new report that shows (1) how many people are participating each month, (2) how many new people are contributing data for the first time each month, and (3) how many events are uploaded each month. The report is updated weekly.
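The three numbers in that report could be derived from raw upload records along these lines. This is a sketch only: it assumes each event is available as a (user id, 'YYYY-MM') tuple, which is not necessarily how the UDC database stores it, and the sample events are invented.

```python
from collections import defaultdict

def monthly_summary(events):
    """Summarize (user_id, 'YYYY-MM') event tuples per month."""
    users_by_month = defaultdict(set)
    events_by_month = defaultdict(int)
    for user_id, month in events:
        users_by_month[month].add(user_id)
        events_by_month[month] += 1

    seen = set()  # everyone who contributed in an earlier month
    summary = {}
    for month in sorted(users_by_month):
        users = users_by_month[month]
        summary[month] = {
            "participants": len(users),
            "new": len(users - seen),
            "events": events_by_month[month],
        }
        seen |= users
    return summary

# Invented sample events.
sample_events = [
    ("u1", "2008-05"), ("u1", "2008-05"), ("u2", "2008-05"),
    ("u1", "2008-06"), ("u3", "2008-06"),
]
report = monthly_summary(sample_events)
```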

Lots of custom views, editors, and perspectives; not so many commands

Thursday, May 29th, 2008

If you take a look at some of the recent reports generated from the usage data collector, you may notice that there are a lot of custom (i.e. not from eclipse.org) views, editors, and perspectives out there, but not so many custom commands.

This is merely my own observation based on what I see in the report (so reality might be different). What I expected to see was a lot more xxx entries (what we use to indicate data collected about non-org.eclipse.* bundles) in the commands table than appear in the others. I expect to see commands being invoked a heck of a lot more often than the various views, editors, and perspectives are opened. But I don’t.

I assume that most folks are running the usage data collector in their development environments (you’d have to explicitly include it in your RCP application’s configuration if you wanted to see it there), which seems to indicate that use of the command infrastructure is very low amongst tool developers working outside of eclipse.org projects.

The reason for this low penetration seems pretty clear to me: the command infrastructure is a little hard to get your brain around (plus it is “relatively new”). There certainly are a lot of questions about it on the various Eclipse newsgroups (like newcomer). From this data, I can see that (a) the Command infrastructure needs to be easier to use, (b) better documentation and examples are required, or (c) both. A small group of us spoke at EclipseCon about writing an article to fill the documentation gaps; or at least to try and bring as many of the pieces together into a coherent whole. That effort has taken the form of Bug 223445. Unfortunately, the effort seems to have stalled (as is often the case after the initial excitement wanes). But new life will be breathed into the effort. Oh yes… new life.

The usage data collector is part of the Ganymede packages, so the data that we’re getting is—thus far—only from those early adopters who have been testing our milestone and release candidate builds. It’ll be interesting to see what other information we can glean as more users opt to provide their data.

FWIW, we are very careful to control the kind of data we’re collecting. First, we only capture the names of commands, views, editors, perspectives, and bundles; along with bundle versions. With each data upload, we also capture the country from which the upload occurred. We are not capturing anything of a personal nature. We do not, for example, capture IP addresses, file names, email addresses, the name of your dog, or any other such thing. Second, we are careful about what we report. We’re obfuscating the names of bundles, views, editors, and commands that do not originate from eclipse.org (we also don’t obfuscate the names from the Subclipse and ICU4J bundles).
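The reporting rule amounts to a simple prefix check. The sketch below is not the actual reporting code; the "xxx" placeholder matches what shows up in the reports, while the Subclipse and ICU4J prefixes are assumptions based on those bundles' usual names.

```python
# Names reported in the clear. The last two prefixes are assumptions
# standing in for the Subclipse and ICU4J bundles.
CLEAR_PREFIXES = ("org.eclipse.", "org.tigris.subversion.", "com.ibm.icu")

def report_name(name):
    """Return the name as it would appear in a report."""
    # str.startswith accepts a tuple and matches any of its entries.
    return name if name.startswith(CLEAR_PREFIXES) else "xxx"

reported = [
    report_name("org.eclipse.ui.file.save"),
    report_name("org.tigris.subversion.subclipse.ui.commit"),
    report_name("com.example.secret.view"),  # hypothetical third-party id
]
```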

So… tell me honestly: how do you feel about your Eclipse IDE capturing usage data?

Lots of Web Developers

Thursday, March 20th, 2008

Participation with the Usage Data Collector (UDC) continues to increase. Over the past 14 days, 2,757,764 usage data events were generated by 547 users (an average of 5,042 events per user). It seems that 370 of you used the JDT Package Explorer and 88 used the WST Servers view. I haven’t run any analysis to prove it, but I imagine that those 88 users of the Servers view are also included in the Package Explorer number. My little brain tells me that a lot of folks are using Eclipse to write enterprise applications using Web Tools.

More results (raw data) can be found on the UDC results page.

Subversive or Subclipse?

Friday, March 14th, 2008

Jesper was curious about Subclipse use captured by the Usage Data Collector, so I tweaked my queries to stop obfuscating the Subclipse bundles. The results for Subclipse and Subversive use by 465 users over the past 14 days are shown here.

First, the commands:

  • org.tigris.subversion.subclipse.ui.commit (invoked 805 times)
  • org.tigris.subversion.subclipse.ui.update (invoked 556 times)

It doesn’t appear that the Subversive plug-ins make use of the command framework (or I’m looking for the wrong name).

Perspectives:

  • org.tigris.subversion.subclipse.ui.svnPerspective (activated 160 times)
  • org.eclipse.team.svn.ui.repository.RepositoryPerspective (activated 128 times)

Views:

  • org.tigris.subversion.subclipse.ui.repository.RepositoriesView (opened 119 times by 32 users)
  • org.eclipse.team.svn.ui.repository.RepositoriesView (opened 178 times by 22 users)
  • org.eclipse.team.svn.ui.repository.browser.RepositoryBrowser (opened 20 times by 9 users)
  • org.eclipse.team.svn.ui.properties.PropertiesView (opened 20 times by 12 users)

For comparison, the CVS Repositories view was opened 189 times by 57 users.

It looks pretty close to me; at this point, I don’t think we have enough data to make any real conclusions…

Usage Data Collector Usage

Friday, March 14th, 2008

Over the past 14 days, 453 (526 in total) of you have consented to upload your usage data to the Eclipse Foundation server. There hasn’t been a lot of change in the top five commands:

  • org.eclipse.ui.file.save 87098
  • org.eclipse.ui.edit.text.goto.wordNext 71274
  • org.eclipse.ui.edit.delete 66449
  • org.eclipse.ui.edit.paste 57309
  • org.eclipse.ui.edit.text.goto.wordPrevious 51138

Perspective use has some interesting parts, though they occur a little further down the list. It seems that most of us are using Eclipse for Java development:

  • org.eclipse.jdt.ui.JavaPerspective 4592
  • org.eclipse.debug.ui.DebugPerspective 2240
  • org.eclipse.team.ui.TeamSynchronizingPerspective 1155
  • org.eclipse.jst.j2ee.J2EEPerspective 1095
  • org.eclipse.jdt.ui.JavaBrowsingPerspective 111

I find it curious that so many among us use the Java Browsing Perspective. It must be all the Smalltalkers in the crowd…

A few fewer of us are using the PDE:

  • org.eclipse.pde.ui.PDEPerspective 299

Use of the CDT and PDT appears to be quite healthy:

  • org.eclipse.cdt.ui.CPerspective 182
  • org.eclipse.php.perspective 87

Subversive seems popular:

  • org.eclipse.team.svn.ui.repository.RepositoryPerspective 122

You can view all the results on the EPP Usage Data Collector results page.
