A WebMaster’s view of Eclipse.org

Rants, praise and observations related to the technical and psychological challenges of running servers for a pretty busy site.

Galileo SR1 - available early for Friends of Eclipse

Galileo SR1 is available a day early for Friends of Eclipse.  Here is your link:

http://friends.eclipse.org/galileo_sr1.html

Of course, it’s never a bad time to become a Friend:

http://www.eclipse.org/donate/

What is my NFS server doing?

If you’re running an NFS daemon (nfsd), at some point you have probably wondered what it is doing right now.  If it’s running in kernel space, tools like lsof and strace don’t work, so you’re left guessing.

After much Googling and some inspecting of the kernel source code, I discovered some debugging values that can be poked into /proc/sys/sunrpc/nfsd_debug.  The most useful was 32, which I used like this:

echo 32 > /proc/sys/sunrpc/nfsd_debug; tail -f /var/log/messages | grep lookup

Essentially, this will give you an idea as to what files are being served up by nfsd.  Be careful, though: on a busy NFS server, this will spew lots of output to /var/log/messages.

After stopping the above command with CTRL+C, don’t forget to turn off nfsd_debug:

echo 0 > /proc/sys/sunrpc/nfsd_debug

With this trick I was able to find some nasties that were hurting our NFS performance.
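
The "don’t forget to turn it off" step is easy to miss, so here is a minimal sketch of the same trick wrapped in Python. It assumes Python 3 is available on the NFS server and that you run it as root. The names nfsd_debug() and lookup_lines() are made up for this example; only the /proc path, the value 32 and the "lookup" filter come from the commands above.

```python
import contextlib

# Path and value taken from the commands in the post.
NFSD_DEBUG = "/proc/sys/sunrpc/nfsd_debug"


@contextlib.contextmanager
def nfsd_debug(flags=32, path=NFSD_DEBUG):
    """Write `flags` to the nfsd debug file, then write 0 back on exit."""
    with open(path, "w") as f:
        f.write(f"{flags}\n")
    try:
        yield
    finally:
        # Runs on normal exit, on exceptions and on Ctrl+C
        # (KeyboardInterrupt), so debugging is not left switched on.
        with open(path, "w") as f:
            f.write("0\n")


def lookup_lines(lines):
    """Yield only the lines that mention "lookup", like `grep lookup`."""
    for line in lines:
        if "lookup" in line:
            yield line


# Example use (hypothetical; needs root and a kernel NFS server):
#
#     import subprocess
#     with nfsd_debug():
#         tail = subprocess.Popen(["tail", "-f", "/var/log/messages"],
#                                 stdout=subprocess.PIPE, text=True)
#         for line in lookup_lines(tail.stdout):
#             print(line, end="")
```

Stopping the loop with Ctrl+C leaves the `with` block, which is what writes the 0 back. The warning about log volume on a busy server still applies.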

Fun times at work

What a fun quarter this has been so far.  It started with the new Forums site, then Matt and I performed some much-needed hardware maintenance, Karl and I swapped all our Cisco devices for new ones, and I upgraded Bugzilla last weekend.  In the mix, I’ve been hunting down MySQL and NFS problems and looking for all kinds of optimizations to try to restore some snappiness to our site.

Both Bugzilla and the Forums still need a bit of work, and after that I’ll tackle another big toy: Git.

Fun times!

Bugzilla 3.4 coming soon

This Saturday I’ll be upgrading our “ancient” Bugzilla 3.0 to the latest Bugzilla 3.4.  Lots of new features will be available to everyone, and many of those are features our committers have requested.  Just check out this dependency tree!

Here’s a highlight of a few new features:

A new Guided Bug Entry wizard to help new users file bugs in the correct locations and reduce triage/duplication.

A much improved UI:

Uncluttered attachments by default:

... and many more:

  • Better database storage for improved performance
  • Better email handling for improved performance
  • UTF-8 for better internationalization

I’ve set up a Bugzilla 3.4 sandbox for your testing enjoyment, and you can read about the new features in Bugzilla 3.2 and 3.4.

Network Upgrade

Tomorrow, Denis and I will be heading in to the data center to upgrade our network gear, taking advantage of the equipment that Cisco donated to us.  This is a big upgrade, since the key pieces of our infrastructure, including the LocalDirector load balancer and the PIX firewall, are as old as the Foundation and are topped out in bandwidth capacity.  The new gear includes an ASA firewall, a CSS load balancer, and a few new switches, all of which are gigabit-capable.  This gives us some headroom over the next couple of years and will let us increase our bandwidth, which we can’t do right now.  Thanks to Cisco for the new gear!

As we announced last week on the committers list, this is going to affect a lot of stuff.  Since this amounts to a major organ transplant for our site, everything will be offline for a period of time.  If you try to access Eclipse.org services this weekend and encounter unavailability or intermittent availability, rest assured we’re working to resolve it.  If everything goes smoothly, it could be a very short interruption.

Squeezing more performance out of your Apache web server

You can learn a lot about your web server by strace’ing it.  Look at this, with my commentary on the lines marked ^^:

www-vm1:~ # strace -p 5827
Process 5827 attached - interrupt to quit
poll(
  ^^ Your shell will sit here until this process
     receives a request

[{fd=14, events=POLLIN, revents=POLLIN}], 1, 15000) = 1
read(14, "GET /modeling/images/dl-more.gif"..., 8000) = 677
  ^^ A request for a plain old GIF file

gettimeofday({1249652331, 82192}, NULL) = 0
stat("/home/local/data/httpd/www.eclipse.org/html/modeling/images/dl-more.gif", {st_mode=S_IFREG|0654, st_size=111, ...}) = 0
  ^^ This is the absolute path to the file on disk.

lstat("/home", {st_mode=S_IFDIR|0755, st_size=192, ...}) = 0
lstat("/home/local", {st_mode=S_IFDIR|0755, st_size=120, ...}) = 0
lstat("/home/local/data", {st_mode=S_IFDIR|0755, st_size=72, ...}) = 0
lstat("/home/local/data/httpd", {st_mode=S_IFDIR|0755, st_size=112, ...}) = 0
lstat("/home/local/data/httpd/www.eclipse.org", {st_mode=S_IFDIR|0755, st_size=72, ...}) = 0
lstat("/home/local/data/httpd/www.eclipse.org/html", {st_mode=S_IFDIR|0750, st_size=4808, ...}) = 0
  ^^ Apache crawls the entire directory structure leading
     up to the file

open("/home/local/data/httpd/www.eclipse.org/html/.htaccess", O_RDONLY) = -1 ENOENT (No such file or directory)
lstat("/home/local/data/httpd/www.eclipse.org/html/modeling", {st_mode=S_IFDIR|0750, st_size=768, ...}) = 0
open("/home/local/data/httpd/www.eclipse.org/html/modeling/.htaccess", O_RDONLY) = -1 ENOENT (No such file or directory)
lstat("/home/local/data/httpd/www.eclipse.org/html/modeling/images", {st_mode=S_IFDIR|0755, st_size=2056, ...}) = 0
open("/home/local/data/httpd/www.eclipse.org/html/modeling/images/.htaccess", O_RDONLY) = -1 ENOENT (No such file or directory)
  ^^ Unless you have AllowOverride None, Apache will
     look for an .htaccess file in every directory from
     the DocumentRoot down to the requested file

lstat("/home/local/data/httpd/www.eclipse.org/html/modeling/images/dl-more.gif", {st_mode=S_IFREG|0654, st_size=111, ...}) = 0
open("/home/local/data/httpd/www.eclipse.org/html/modeling/images/dl-more.gif", O_RDONLY) = 15
close(15)                               = 0
read(14, 0x555555a356b8, 8000)          = -1 EAGAIN (Resource temporarily unavailable)
writev(14, [{"HTTP/1.1 304 Not Modified\r\nDate:"..., 170}], 1) = 170
  ^^ All of that hard work to simply respond "Use your cached copy"
     to the client!

write(10, "192.168.0.1 - - [07/Aug/2009:"..., 256) = 256
  ^^ IP address changed to protect the innocent.

poll(
  ^^ That request is complete, wait for another

All of this happens lightning fast. But do consider: had the request been for a PHP file, a similar path walk would have been repeated for each nested require(), require_once() and include() file.  So the moral of this story is:

1. Don’t nest your DocumentRoot too deeply.  We could trim /home/local/data/httpd/www.eclipse.org/html to save lots of CPU and disk cycles.

2. Keep AllowOverride None to avoid accessing .htaccess files everywhere, unless you really need them.  We’re investigating this seriously for www.eclipse.org.

3. Keep your web-visible directory structure short, too.  http://www.eclipse.org/some/directory/structure/that/is/really/deep/and/nested/page.php will obviously generate lots of file stats (especially with AllowOverride enabled).

4. If you use PHP files, only include what you need; otherwise you’re stat’ing (and reading, and possibly compiling) PHP files for nothing.

The more you reduce disk and CPU cycles for one request, the more your web server will scale.
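
To get a feel for how points 1 to 3 add up, here is a rough sketch in Python that lists the filesystem calls you would expect for one static file, based purely on the pattern in the trace above. It is a model of that one trace, not of Apache’s internals: real behaviour depends on your symlink options, AllowOverride settings and any caching modules. The function name expected_probes() and the shorter DocumentRoot /srv/www are made up for illustration.

```python
from pathlib import PurePosixPath


def expected_probes(doc_root, url_path, allow_override=True):
    """Roughly model the stat()/lstat()/open() calls seen in the trace."""
    root = PurePosixPath(doc_root)
    rel = PurePosixPath(url_path.lstrip("/"))
    target = root / rel

    # The trace starts with a stat() of the requested file itself.
    probes = [("stat", str(target))]

    # One lstat() per directory from the top of the filesystem down to
    # the DocumentRoot. parents is deepest-first and ends with "/".
    for directory in reversed(list(root.parents)):
        if str(directory) != "/":
            probes.append(("lstat", str(directory)))
    probes.append(("lstat", str(root)))
    if allow_override:
        probes.append(("open", str(root / ".htaccess")))

    # Then one lstat(), plus an .htaccess lookup if overrides are allowed,
    # for each directory between the DocumentRoot and the file.
    current = root
    for part in rel.parts[:-1]:
        current = current / part
        probes.append(("lstat", str(current)))
        if allow_override:
            probes.append(("open", str(current / ".htaccess")))

    # Finally the file itself is lstat()'ed and opened.
    probes.append(("lstat", str(target)))
    probes.append(("open", str(target)))
    return probes


deep = "/home/local/data/httpd/www.eclipse.org/html"
url = "/modeling/images/dl-more.gif"
print(len(expected_probes(deep, url)))
print(len(expected_probes(deep, url, allow_override=False)))
print(len(expected_probes("/srv/www", url, allow_override=False)))
```

For the GIF in the trace, this model reproduces the same sequence of stat, lstat and open calls shown above. Switching off overrides drops the three .htaccess lookups, and the shallower (hypothetical) DocumentRoot drops four more lstat calls.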

The endless list of Forums

First off: I’d like to thank Eric Rizzo for helping out with the new Forums.  As if he doesn’t help out enough by answering questions and providing insight, now he’s organizing the Forums, giving them descriptive names and pinging projects to get updated descriptions.

Now, on to the problem.  There are about 115 forums for you to browse through, and as more and more projects get created, the page will only gather more clutter. Do you have any thoughts as to how all these forums should be organized?  If so, we could really use your help in bug 284281.

99.999% uptime?

Gazillions of dollars in the bank, and it seems the Five Nines still elude even Google.
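
For a sense of how little slack that target leaves, here is a quick back-of-the-envelope calculation in Python. The function name allowed_downtime_minutes() is made up for this example, and it assumes a plain 365-day year.

```python
def allowed_downtime_minutes(availability_percent, days=365):
    """Minutes of downtime permitted over `days` at a given availability."""
    total_minutes = days * 24 * 60
    return total_minutes * (1 - availability_percent / 100)


for nines in (99.9, 99.99, 99.999):
    minutes = allowed_downtime_minutes(nines)
    print(f"{nines}% -> {minutes:.2f} minutes per year")
```

At 99.999%, that works out to a little over five minutes of downtime for the whole year.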

Ready to kick the tires

It’s not done yet, but if you want, go ahead and try out the Eclipse Community Forums.  The Control Panel needs work but the rest should function quite well.

If you see anything broken, please comment on bug 284281.

Newsgroups? No way, let’s call them something else.

While many[1] enjoy the finer charms of NNTP newsgroups, it’s no secret that NNTP is one of those old-school protocols, predating the joys of Facebook, Tweetering and instant texting.

On some blog somewhere, someone suggested I look at FUDForum for Eclipse.org, since it’s one of those cool Forum apps that can act as a front end to NNTP newsgroups.  So I did, and I’m preparing to put this out for everyone to see.  See bug 284281 for details.

But I have one problem: What do we call it?  Eclipse Newsgroups?  No way — the cool kids will think we’re stuck in 1971.  Eclipse Forums? Eclipse Chat?  Help!!!

Help me put a name to this page.  If I like your ideas, I’ll send some Eclipse swag your way.

[1] Please tell me I’m not the only one…
