Search source code in CVS… Can that be useful?
Folks have asked for the ability to search through CVS in the past, so I figured I’d see if I can leverage the existing software we run on eclipse.org to accomplish this. It’s important for me to try to use what we already have to keep software maintenance to a minimum. We’re running ViewVC already, which provides a nice web interface to CVS/SVN, and we have a search engine, so I’ve launched the Indexer right now on portions of CVS to see if it can work.
Here is what is being indexed:
- each file’s commit log (example). This page contains revision, tag and branch strings, as well as commit comments (which often include a Bugzilla bug number!)
- the actual source code contained in the HEAD stream only (example). Picking only the HEAD stream will ensure the index isn’t contaminated with code from older revisions.
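To make the two bullet points above concrete, here is a rough sketch of how the list of pages handed to the indexer could be generated, two URLs per file. It is only an illustration, not the setup actually running on eclipse.org: the base URL and the function name `urls_to_index` are made up, and it assumes ViewVC's `view=log` and `view=markup` query parameters, with the markup view showing the HEAD revision when no revision is specified.

```python
from urllib.parse import quote, urlencode

# Hypothetical ViewVC root; the real eclipse.org URL layout is not given in this post.
VIEWVC_BASE = "https://example.org/viewvc"


def urls_to_index(file_path, base=VIEWVC_BASE):
    """Return the two pages to crawl for one file: commit log and HEAD source."""
    # Keep the slashes in the repository path, escape everything else.
    path = quote(file_path.lstrip("/"))
    # Commit log page: revisions, tags, branches and commit comments.
    log_url = f"{base}/{path}?{urlencode({'view': 'log'})}"
    # Source page with no revision parameter (assumed to show HEAD only).
    head_url = f"{base}/{path}?{urlencode({'view': 'markup'})}"
    return [log_url, head_url]


if __name__ == "__main__":
    for url in urls_to_index("org.eclipse.ui/src/Foo.java"):
        print(url)
```

Leaving out any revision parameter is what keeps older revisions of the code out of the index in this sketch.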
Try a couple of searches with the very limited data that’s already been indexed (results may be slow as the indexer is working):
Search for a Bugzilla bug reference: Bug 134394
Another one: 171518
Search by package name: “org.eclipse.ui.tutorials.rcp.part1”
Search for words in source code: TODO
I guesstimate it would take a few days to index all of CVS and SVN, and it will add a few hundred thousand URLs to our search database, so this is all just a test. I’d likely need to upgrade the search engine to the latest release, which offers better performance for large sites like Eclipse.
Again, the key here is leveraging our existing setup for this. Ideally, I could install some new whizbang application that arguably does a better job, but using ViewVC and Search means zero added maintenance for us.
So the question is: Do you think this is useful?
Posted September 26th, 2007 by Denis Roy in category: Uncategorized
11 Responses to “Search source code in CVS… Can that be useful?”
Mark Phippard Says:
September 26th, 2007 at 7:22 pm
Have you looked at Krugle?
http://www.krugle.com/
If you get all of the Eclipse projects indexed you just need to add a search box somewhere on your pages.
We have something like this on openCollabNet:
http://www.open.collab.net/
Mark
pombreda Says:
September 26th, 2007 at 8:23 pm
Dear awesome webmasters, I suggest that you look at opengrok http://www.opensolaris.org/os/project/opengrok/
It is open source, based on the excellent lucene, and supports CVS and SVN as well as multiple languages, including Java of course.
Originally, the Solaris folks made it so that it would be possible to search the newly open sourced Solaris code base.
It provides a Google-like search syntax and can search through code very nicely.
Chris Aniszczyk (zx) Says:
September 26th, 2007 at 8:30 pm
yes, opengrok would be great.
Check out this bug:
https://bugs.eclipse.org/bugs/show_bug.cgi?id=162160
Denis Roy Says:
September 26th, 2007 at 8:53 pm
Krugle? OpenGrok? Glad to see you guys read the “I Don’t Wanna Install Yet Another Darn Open Source App That I Need To Maintain” part.
pombreda Says:
September 26th, 2007 at 10:28 pm
Denis:
You guys use lucene already right?
Opengrok is based on lucene.
Denis Roy Says:
September 27th, 2007 at 5:37 am
No, we don’t use lucene.
pombreda Says:
September 27th, 2007 at 10:17 am
So it is a build vs “buy” decision.
I would go for “buy”: getting the code search part right is absolutely not trivial.
So you could roll your own, but if it does not help developers much, that would be wasted time.
OpenGrok does it reasonably well.
And OpenGrok with lucene provides a very solid indexing technology that requires little maintenance and additional work once deployed.
I would be ready to help if you have issues.
Note that you could also consider an alternative to ViewVC, Mozilla’s LXR, which also does search.
http://lxr.mozilla.org/
Kevin McGuire Says:
September 27th, 2007 at 12:45 pm
Cool. It could be useful. A case I just hit is, I’m triaging bugs and want to map a package name to a project. Since these don’t always align well in org.eclipse.ui, and I don’t have all the source loaded, I could use the search as a reverse lookup and find out who works in that code area to assign the bug. Unfortunately I guess things aren’t fully indexed yet because my package query resulted in no hits (and should’ve).
I’m guessing there are lots of other uses too, but it’d have to be accurate and reasonably performant, otherwise folks won’t bother. If both of those can’t be achieved by rolling our own, then we should either use a third party, as suggested in other comments, or not bother.
John Arthorne Says:
September 27th, 2007 at 2:36 pm
Note that Eclipse projects are already indexed by Krugle… there is nothing for the foundation to install (i.e., WORKSFORME). For example, here is a link to search results for “WorkbenchAdvisor” in Eclipse:
http://krugle.com/kse/files?query=WorkbenchAdvisor&lang=java&findin=code&project=eclipse
Denis Roy Says:
September 27th, 2007 at 6:10 pm
Thanks John. WORKSFORME works for me.
Nick Says:
September 28th, 2007 at 2:05 am
For another way to search CVS — by filename / history / committer / activity — instead of by substring, see Search CVS. Not only was it grown at Eclipse itself, but it’s EPL’d and in use already.