
[technology-pmc] Eclipse Recommenders: Usage Data Collector

Hi PMC,

we've been quite busy in the last months pushing Code Recommenders toward the vision of "IDE 2.0" (see http://code-recommenders.blogspot.com/2010/08/eclipse-and-academia-briding-gap.html for a summary of the vision). I'm happy to announce that we are just a few small steps away from making this happen.

Recently, Lars Vogel and others asked us to support platforms such as Android. The challenges with platforms such as Android are (i) that there are only a few example applications available for analysis, and (ii) that Android's bytecode format makes it impossible for us to analyze such apps to extract knowledge about how developers should program against Android APIs. For that reason we started a project to write a usage data collector that collects the knowledge about how programmers use APIs directly from inside the IDE. With this collector, developers can (if they wish to support Eclipse Recommenders) share their experience with the community, i.e., upload anonymized usage data describing which API methods they used in their code. Andreas Frankenberger volunteered to develop such a system for Android, which has recently been contributed to Code Recommenders: http://dev.eclipse.org/ipzilla/show_bug.cgi?id=5453.

However, whenever data gets uploaded, privacy becomes a serious topic that needs careful consideration. Wayne asked me to discuss potential threats and solutions with the Technology PMC, which is the reason for this mail.

The current solution collects anonymized API usage information and stores it in a central Apache CouchDB database. Every night, new call recommendation models, documentation, usage statistics, bug detection models etc. are generated from this database and offered to users for download. The server is currently hosted at the university.
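
To make the kind of data under discussion more concrete, here is a minimal sketch of what one uploaded usage record could look like. It is an illustration only, not the collector's actual format: the class and field names (UsageRecordSketch, ApiUsage, contextHash) are hypothetical, and replacing the developer's own method name by a SHA-256 digest is an assumed scheme that this mail does not specify. The JSON rendering reflects only that CouchDB stores JSON documents.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

public class UsageRecordSketch {

    /** Hypothetical record: one API type and the API methods called on it. */
    public static final class ApiUsage {
        public final String contextHash;        // digest of the user's own method name
        public final String apiType;            // public API type, kept in clear text
        public final List<String> calledMethods; // public API methods, kept in clear text

        ApiUsage(String contextHash, String apiType, List<String> calledMethods) {
            this.contextHash = contextHash;
            this.apiType = apiType;
            this.calledMethods = List.copyOf(calledMethods);
        }

        /** Naive JSON rendering; assumes the names contain no quotes or backslashes. */
        public String toJson() {
            StringBuilder sb = new StringBuilder();
            sb.append("{\"context\":\"").append(contextHash).append("\",");
            sb.append("\"type\":\"").append(apiType).append("\",\"calls\":[");
            for (int i = 0; i < calledMethods.size(); i++) {
                if (i > 0) sb.append(',');
                sb.append('"').append(calledMethods.get(i)).append('"');
            }
            return sb.append("]}").toString();
        }
    }

    /** Assumed scheme: replace a user-specific name by its SHA-256 hex digest. */
    public static String anonymize(String userSpecificName) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(userSpecificName.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    /** Builds a record in which only the enclosing user method is hashed. */
    public static ApiUsage toUsage(String enclosingMethod, String apiType, List<String> calls) {
        return new ApiUsage(anonymize(enclosingMethod), apiType, calls);
    }

    public static void main(String[] args) {
        ApiUsage usage = toUsage(
                "com.example.MyActivity.onCreate",   // made-up user code location
                "android.widget.TextView",
                List.of("setText", "setVisibility"));
        System.out.println(usage.toJson());
    }
}
```

Note that a plain hash is only a pseudonym: identical names map to identical digests, and guessable names can be recovered by hashing candidates. Whether that is acceptable is exactly the kind of question raised here.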

Wayne pointed out that the upload target and privacy warrant further discussion with the PMC (see the last comment at http://dev.eclipse.org/ipzilla/show_bug.cgi?id=5453). I would be glad to take your questions on this. Which concerns do you have regarding data sharing and the server upload target? And do you have any thoughts on how to address them?

Regarding the system demands:
The current solution (anonymized, share-if-you-like, own server for model generation, data collection, and traffic) seems acceptable to me, since I don't have any server access at Eclipse, nor can I actually estimate the demands of such a system at the moment. No one has tried this before, and it is unclear whether our current architecture scales well to large deployments. We need the flexibility to discard non-working solutions quickly and replace them with better ones as soon as we hit a limit. The server side is, in a way, also part of the research.



A few additional points I would like to share:

In the last months we worked hard to make Eclipse Recommenders known to companies and researchers alike and to spread the word about research at Eclipse. I hope http://www.slideshare.net/Microbiotic/ide-20-research-at-eclipse-ecoop-2011 summarizes the efforts and results to some extent. For instance, we built connections to several universities such as MIT, McGill Montreal, Rio de Janeiro, Munich and Kassel, which are considering bringing parts of their research to Eclipse. This process is slow, but they are starting to consider it.

Code Recommenders itself offers quite an interesting set of tools such as the extdocs platform, code completion, code search etc.

Additionally, we are currently working on four contributions to JDT: a subwords completion engine (350000), an extension of the Completion Proposal (340876), a clean code method sorter (344394), as well as a JDT-based chain completion engine. Other collaborations are in progress. All these tools evolved from our research and are continuously refined to become stable for the Eclipse community.

Most of our tools are built around an active community that shares knowledge, and so are most of the other research tools we aim to host at Eclipse. Some improvements to Eclipse, such as the intelligent calls completion, subwords, or call chain completion engines, become possible only because statistical usage data is available. To continue this work and to make it a healthy part of the Eclipse ecosystem, we need sharing concepts that are simple to use (and safe) for programmers who want to share their knowledge with their community. We need a convincing solution that is attractive for Eclipse and for potential future research groups that wish to contribute to Eclipse.


Thanks,
Marcel

-- 
Eclipse Code Recommenders:

