Better search
The search engine at eclipse.org gets its fair share of use; unfortunately, the results don’t always seem relevant. The search engine itself, although not a Google, is not a bad piece of software in its own right; it simply has a hard time indexing the entire site with the current configuration. The main problems are:
- There are a bazillion pages on the site, and the indexer is not configured to rank them properly. Currently, the body of a page has (by far) the most weight, and the URL, title and META tags have little to none. With this configuration, a mail archive page containing the word “SWT” 10 times seems more relevant than the SWT home page when simply searching for “SWT”.
- The download.eclipse.org and Infocenter (help.eclipse.org) sections are currently not indexed, yet those two servers contain valuable information.
- The sheer number of mail and news archive pages is astonishing (300,000 and counting). Because of the poor ranking configuration and the natural recurrence of rich keywords within each page, these documents often get top ranks, yet mean little to the crowd actually using the search engine.
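To see why a body-heavy configuration pushes archive pages to the top, here is a toy scoring model. It is a minimal sketch, not mnogo's actual ranking formula: the `Page` fields, the `OLD_WEIGHTS` values and both example pages are hypothetical, chosen only to mimic "the body has by far the most weight, the URL and META tags little to none".

```python
import re
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    title: str
    meta: str  # META keywords and description, concatenated
    body: str


def count(term: str, text: str) -> int:
    """Count case-insensitive occurrences of term as a whole token in text."""
    return re.findall(r"[a-z0-9]+", text.lower()).count(term.lower())


# Hypothetical weights approximating the old setup: the body dominates,
# while the URL, title and META tags contribute little or nothing.
OLD_WEIGHTS = {"url": 0, "title": 1, "meta": 0, "body": 10}


def score(page: Page, term: str, weights: dict) -> int:
    """Weighted sum of per-field term counts."""
    return sum(w * count(term, getattr(page, field)) for field, w in weights.items())


# Illustrative pages; the URLs and contents are made up for this example.
home = Page(
    url="https://www.eclipse.org/swt/",
    title="SWT: The Standard Widget Toolkit",
    meta="SWT widget toolkit Java",
    body="SWT is an open source widget toolkit for Java.",
)
archive = Page(
    url="https://www.eclipse.org/archives/msg00123.html",
    title="Re: build question",
    meta="",
    body=" ".join(["SWT"] * 10) + " build fails on GTK",
)

print(score(home, "swt", OLD_WEIGHTS), score(archive, "swt", OLD_WEIGHTS))
# prints: 11 100
```

Because only the body really counts, the archive page's ten repetitions of “SWT” beat the home page's single mention in each field.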
To achieve better search results, I’ve installed the latest version of mnoGoSearch (mnogo) and tuned it for the eclipse.org content:
- the title has the most weight, followed by META tags (keywords, description)
- the body comes in next, but much lower
- the URL now plays an important part - a page with “birt” in the URL will rank higher when searching for birt
- page headings (h1) are now considered as well
- default search will exclude mail and news archives. Most searches use generic terms, so those looking for a snippet of code buried deep in an archive will know to use an extended search
- a more user-friendly scope for searching: website only, downloads, documentation, archives
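The tuned ordering and the default scope can be sketched with the same kind of toy model. This is an illustration under stated assumptions, not mnogo's configuration syntax: the `WEIGHTS` values, the `section` labels, the `search()` helper and the example pages are all hypothetical. The weights only follow the ordering described in the list: title first, then META tags and URL, then h1 headings, with the body far behind.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    url: str
    section: str  # hypothetical: "website", "downloads", "documentation" or "archives"
    title: str
    meta: str  # META keywords and description
    h1: str  # text of <h1> headings
    body: str


def count(term: str, text: str) -> int:
    """Count case-insensitive occurrences of term as a whole token in text."""
    return re.findall(r"[a-z0-9]+", text.lower()).count(term.lower())


# Hypothetical weights: title > META = URL > h1 > body.
WEIGHTS = {"title": 20, "meta": 10, "url": 10, "h1": 8, "body": 1}


def score(page: Page, term: str) -> int:
    return sum(w * count(term, getattr(page, f)) for f, w in WEIGHTS.items())


def search(pages: List[Page], term: str, scope: Optional[str] = None) -> List[str]:
    """Return matching URLs, best first. With no scope, archives are skipped."""
    if scope is None:
        candidates = [p for p in pages if p.section != "archives"]
    else:
        candidates = [p for p in pages if p.section == scope]
    scored = [(score(p, term), p.url) for p in candidates]
    hits = [hit for hit in scored if hit[0] > 0]
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return [url for _, url in hits]


# Illustrative pages; the URLs and contents are made up for this example.
home = Page("https://www.eclipse.org/swt/", "website",
            "SWT: The Standard Widget Toolkit", "SWT widget toolkit Java",
            "SWT", "SWT is an open source widget toolkit for Java.")
archive = Page("https://www.eclipse.org/archives/msg00123.html", "archives",
               "Re: build question", "", "",
               " ".join(["SWT"] * 10) + " build fails on GTK")
pages = [home, archive]

print(search(pages, "swt"))                    # default: archives excluded
print(search(pages, "swt", scope="archives"))  # extended search into archives
```

With these weights the home page scores 49 against the archive page's 10, so it would win even without the scope filter; the default scope simply keeps the 300,000 archive pages out of generic searches unless someone asks for them.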
The new search engine is indexing the site now; it should take another day or two to complete. I’ll post details when it’s ready for you to give it a test run.
Posted September 28th, 2005 by Denis Roy in category: Uncategorized
4 Responses to “Better search”
Vineet Says:
September 28th, 2005 at 10:31 am
I know this is an additional project, but as an Eclipse plug-in developer, it would be nice to have the entire source code indexed as well. Possibly something like LXR.
Denis Roy Says:
September 28th, 2005 at 5:35 pm
This is a really neat idea. I’m not a developer, so I have never needed to search the source code.
Thanks!
Katrina Says:
October 31st, 2005 at 6:52 pm
Hey,
Very nice blog and some interesting posts. Have a look at my new BitTorrent site, Mininova.
Thanks
David Skul Says:
November 19th, 2005 at 9:21 am
This is a great resource for webmasters and designers…
Do you want better search results? Read Art of the Content Site by Nathan Anderson.
After reading the newest ebook published by Nathan Anderson, I was floored at how easy it was to generate great search engine results without paying a guru an outrageous sum of money to do so. The search engines are accessible to everyone because of this new look at search engine optimization. Nathan Anderson takes the time to write down-to-earth explanations that any beginner can grasp and apply immediately. His writing style makes the technical jargon used by many technology nerds palatable to the amateur user.
Nathan clearly defines what a content site is and how it works. He also goes into detail about the components that make up a content site. I especially enjoyed this quote from the book’s opening letter: “Many SEOs are quite proud of their content creation abilities, claiming it to be an ‘art’. They call it ‘SEO Copywriting’ in order to make themselves feel more important than they are… when in fact, any monkey that knows English can perform nearly on par with these ‘experts.’”
This comment says a lot about Anderson and the kind of person he is. The sheer simplicity of the seven clearly outlined components of a content site is indeed the work of monkeys. Thanks to Nathan’s lack of egotism and his new ebook we can all share in better search results. Nathan holds nothing back as he goes into on-page and off-page optimization factors. Anderson reveals twelve points that any SEO would charge thousands of dollars to reveal.
Just when you thought that the book was worth every minute spent signing up to his exclusive opt-in list, Nathan goes on to reveal the seven components of a content site in a well-defined, easy-to-understand way. This portion of his new ebook clearly explains that the following seven components are general characteristics of a content site.
1. Houses primarily HTML Text
2. Includes elements of interaction
3. May include multiple media types
4. Grows over time
5. Is tightly themed (niched)
6. Takes advantage of personal passions
7. Community = Success
Anderson goes on to explain every characteristic in great detail to ensure that the reader knows exactly how to compile, combine, and complete a site that embodies “the art” of a true content site. The simplicity of every single component is emphasized with examples and clear explanations.
Nathan goes as far as to give a list of resources and tools that is unmatched anywhere on the net. He goes into the programs and tools he uses personally to construct his own content sites. It was a true pleasure to read this ebook and use the tips and techniques Nathan has given to the world. The ebook is free at http://www.artofthecontentsite.com. Take a minute to download it and start living your dreams.
SEO Solutions and this one-way link provided by LinkAcquire.com
David C Skul, CEO of LinkAcquire.com and Relativity, Inc., is pleased to serve his clients through traffic-generating articles and one-way links.