Bug 507401 - NoClassDefFoundError: org/eclipse/.../Foo$1 when the pool is on a shared drive
Status: RESOLVED WONTFIX
Alias: None
Product: Oomph
Classification: Tools
Component: Setup
Version: 1.6.0
Hardware: All All
Importance: P3 enhancement
Target Milestone: ---
Assignee: Project Inbox CLA
Reported: 2016-11-11 08:41 EST by Aaron Digulla CLA
Modified: 2021-02-25 04:01 EST
Description Aaron Digulla CLA 2016-11-11 08:41:53 EST
I just installed Eclipse Java EE Neon using Oomph.

After a couple of minutes, Eclipse locks up. Looking into the log, there are a lot of weird NoClassDefFoundErrors:

java.lang.NoClassDefFoundError: org/eclipse/wst/sse/ui/StructuredTextEditor$8
	at org.eclipse.wst.sse.ui.StructuredTextEditor$CharacterPairListener.verifyKey(StructuredTextEditor.java:923)

If you look closely, you'll see that Foo can't load Foo$1, i.e. an inner class of itself. I can't imagine how that could happen when the build succeeds. It looks like the inner classes are missing in the JARs.

This left me completely baffled until I looked into the Eclipse install: The folder was nearly empty. Which got me wondering where Eclipse gets its plugins from ... until I realized that maybe Oomph doesn't copy the bundles from the pool.

That is a really, really bad idea when the pool is on a shared drive which can have lots of mini-outages which you normally never notice. But eventually, Windows will cut the connection, and Java will fail to lazy load more classes.

Suggestions: If the pool is on a network drive, create a local pool (one on the local harddisk) and copy the bundles into the local pool. That saves download times.

Or reject bundle pools on shared drives.
Comment 1 Ed Merks CLA 2016-11-12 04:07:13 EST
Certainly you have control over whether a pool is used or not, and this shared drive thing seems to me to be a problem only depending on the reliability of your network, so simply disallowing it on the principle that it might probably maybe for sure be a problem seems excessive.

But definitely the idea of having a bundle pool to speed up installations by avoiding additional downloads, while also being able to create a "traditional non-shared-pool" installation is an interesting idea that we talked about before. It would definitely be a nice enhancement that provides some of the advantages of each approach.
Comment 2 Aaron Digulla CLA 2016-11-12 15:07:48 EST
This isn't about where the pool is. Your decision to allow this just wasted one day of my life. My gut feeling is that other people will also think that putting the pool on a shared drive is a good idea, especially since Eclipse downloads are so unreliable.

Some timings. When I install a new Eclipse using a pool on my local hard disk, that takes 53 seconds. Of that, the installer hangs for 47 seconds to download the tiny Eclipse launcher. From Switzerland. Our pipes are good.

That's why I can't follow your "oh, that's just a nice to have, who cares" reasoning.

It's a nasty bug, and Eclipse can't do anything about it because it happens at the VM level. Even catching the error won't really work because there might be real NoClassDefFoundErrors.

What's worse, when the bug hits either of these will happen:

- Eclipse will lock up. You can't quit anymore, the window won't respond to mouse or keyboard. Your only way out is killing the process. You won't see an error in the UI because at this time, Eclipse can't load any classes anymore.

- The VM will suddenly crash, possibly ruining your workspace. This happens when the ZIP library gets interrupted at a bad time.

That's why I suggest refusing to share the pool on a network drive for the time being. At this time, until a better solution is found, putting the pool on a shared Windows drive is high risk with little benefit.

If you don't do it, it feels like you don't care about your users or the perceived quality of your product. A lot of my co-workers are leaving for IDEA because "Eclipse breaks too often". I'm trying my best to give local support but replies like that make me grind my teeth in anger.
Comment 3 Ed Merks CLA 2016-11-12 18:02:09 EST
From what you describe it sounds like it's just always bad to use a shared drive either for the pool or for the installation, if the installation doesn't use a shared pool, and probably even if it does. In fact, it's probably a bad idea to use a shared drive for the workspace as well.  Or perhaps more generally, shared drives are questionable for storing any data because they are generally unreliable and in all cases we should warn the user about that.

Is that a reasonable general conclusion?  It does make me wonder what a shared drive is ever good for if they're inherently unreliable.  But definitely I've seen enough forum posts to know that the workspace on a shared/network drive appears to be a bad idea, so I don't doubt your conclusion.

But I'm not sure how I can replicate such an issue locally...  And I'm not sure there is an API in Java to tell me a java.io.File is on a shared drive...

As an aside, when it comes to wasting a day of your life, I'm really very sorry for that.  Just keep in mind that I've spent several years of my life to provide something I hope will be of value to the community.  Of course I've been around long enough that I don't realistically expect a thank you for my efforts.  That would certainly be nice, but generally what I expect is mostly complaints about things that don't work well or could work better.  There are always lots of those, and then I need to decide which of those are most important to warrant further investment of effort. So I generally persevere, without grinding my teeth in anger.  That being said, strongly expressed negativity never sits very well, and most certainly it never raises the priority of my efforts.

The best way to raise the priority of a problem is when someone attempts to contribute a solution via Gerrit.  I.e., the more effort I see someone else invest in solving a problem the more I'm willing to invest in helping them.  So rather than expressing your anger you might better spend that time showing how simple it is to write a method that returns true or false for whether a File is on a shared drive.

And all that being said, I'm not entirely convinced that installing using shared drives is a problem that affects a significant portion of the user base.  

Being able to create a "standard" installation that uses a pool simply to make the installation process faster is something I can imagine benefiting far more users...
Comment 4 Aaron Digulla CLA 2016-11-13 03:54:25 EST
Thanks for your understanding. It's appreciated. I still think this is a critical bug:

- It's easy to trigger inadvertently.
- There is a long time between triggering the bug and its effect.
- When the bug strikes, it makes Eclipse unusable in a very strange way
- It's hard to figure out what's happening; the error message gives no clue at all.
- There is a small possibility to corrupt your workspace.
- Most people running into this bug will just give up on Eclipse when it happens.
- My team members now think that the quality of Eclipse Neon is very, very bad if something like this can happen.
- There seems to be a simple workaround for this bug but I feel that it's too hard to figure that out.

Long story: I tried to use Oomph to create setup files so new developers can be productive after a couple of minutes instead of spending a day or more to get everything right. Most of the things that I care about worked - importing preferences, selecting extra plugins.

The process wasn't painless but I got something after two days. So ... it's okay. I was very happy with the way to allow to select bundles and get an Eclipse with everything that we need. I despise p2 and everything related because of all the problems I have with it; your solution worked pretty well. I wasn't happy with the lack of documentation but then, I've used OSS for some time. I know how it is. Google did the trick for me.

I used the version of Eclipse which I've created this way, to make sure it lives up to the promise.

After a few minutes, it locked up. Like never before. I was completely puzzled. The window was there but it stopped responding to key strokes. WTF??? Okay, let's restart. WTF??? The close button doesn't work anymore. Alt+F4 doesn't work. File menu ... WTF??? It's blank - the size is correct but there is no text and no icons. Okay, I know that Exit is the last item so ... nope, doesn't work either.

I've never experienced this behavior before with any software. I was completely stumped. But I've seen Eclipse act up before, so I killed the process and tried again.

And it locked up again after a couple of minutes. What is going on here? I started it, I waited for the build to complete, I closed it. That worked.

Weird. I had no idea what to try next, so I reported to my manager that I couldn't get Oomph to work. We would have to find another way to pre-package Eclipse. Conclusion after two days of work: Custom Oomph setup is a nice idea but something is very, very badly broken. Don't touch it again - ever.

A few days later, I tried to use the normal installation process (Eclipse installer + manually setting up everything). Took me two days. On day one, Oomph download would timeout. Since it downloads only index files (everything else is already in the shared pool, remember?) that sucked big time. I went home with the impression that Oomph is more of a problem than a solution.

Day two: Downloads worked again. Hmph. Nice. After downloading a ton of index files (every time I start Oomph; what happened to caching?) and the bundle org.eclipse.equinox.launcher (why does it do that? It's just 50KB but why not load it from the pool???), I could install a pure JEE Neon. I set up the workspace, configured everything, and left.

Half an hour later: "Eclipse is stuck."

And there was my bug again. AAAARRRRGGHHH!! Everyone in the team is now thinking that Oomph and/or Eclipse is a piece of crap.

Since I had no choice, I dug into the problem. Error log. NoClassDefFoundError. Lots of them. Weird ones.

And then, it dawned on me. Our shared drives aren't very reliable for some reason that I can't fathom. I'm not a Windows expert. I just happened to know that when I leave an Explorer window open for some time, it will vanish eventually. No idea why.

And that's when I finally had everything to write this bug report.

And you came, looked at it, and switched the priority from "critical" to "meh".

That made me very angry. I felt that many people would stumble over this and just leave Eclipse forever when it happens to them. It just feels like a very stupid failure of the quality assurance of you people when Eclipse stubbornly hangs like that. It's exactly the kind of behavior which makes many developers I've met turn away from Eclipse: "It's just so unreliable. IDEA just works."

I know that Windows has its share of bugs, and there is nothing that Eclipse can do about that. I have no idea why our network behaves so strangely. I wouldn't know what to tell our sysadmins to fix. I have no idea how the error could be turned into a useful error message.

That's why I suggest to prevent pools on network drives. If that was possible, I'd suggest to add code around every I/O operation in Eclipse to do some extra checks because of shared drives, because they can fail at any time. No matter what you do, they are never as reliable as local disks (and those would also profit from a more resilient I/O layer).
Comment 5 Aaron Digulla CLA 2016-11-13 03:55:35 EST
How to detect network drives: If you're running Java 8, you can try java.nio.file.FileStore. For Java 7, try FileSystemView (http://stackoverflow.com/questions/9163707/java-how-to-determine-the-type-of-drive-a-file-is-located-on).

If that fails, I suggest this approach:

- Add a fake bundle (just a JAR with some classes) to the pool.
- Create a thread with a classloader that loads a class from the bundle.
- Sleep for 10 minutes.
- Try to load another class from the bundle.
- If that fails, warn the user about it.
- If it succeeds, recreate the classloader and loop.

My gut feeling is that this is a bad solution but I can't think of anything better.
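
The probing steps above can be sketched roughly as follows. This is only an illustration under assumptions: the class PoolProbe, its method names, the probe JAR and the two class names are hypothetical and not part of Oomph, and whether a dropped share really surfaces as a failed class load here is the guess made in this comment, not something verified.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Oomph: probes a "fake bundle" JAR in the pool.
final class PoolProbe {

    // One round: load a class, pause, then load a second class from the same JAR.
    // Returns false if either load fails (JAR missing or no longer readable).
    static boolean probeOnce(Path probeJar, String firstClass, String secondClass, long pauseMillis) {
        // Parent is null so the probe classes can only come from the JAR (or the JDK itself).
        try (URLClassLoader loader = new URLClassLoader(new URL[] { probeJar.toUri().toURL() }, null)) {
            loader.loadClass(firstClass);   // opens the JAR lazily
            Thread.sleep(pauseMillis);      // the comment suggests 10 minutes
            loader.loadClass(secondClass);  // fails if the JAR cannot be read anymore
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return true;                    // stopped on purpose, no verdict
        } catch (ClassNotFoundException | LinkageError | IOException e) {
            return false;
        }
    }

    // Loop: recreate the classloader each round, warn once on the first failure.
    static void watch(Path probeJar, String firstClass, String secondClass, long pauseMillis, Runnable warn) {
        while (!Thread.currentThread().isInterrupted()) {
            if (!probeOnce(probeJar, firstClass, secondClass, pauseMillis)) {
                warn.run();
                return;
            }
        }
    }

    public static void main(String[] args) {
        // Assumed usage: first argument is the path to the probe JAR in the pool.
        Path jar = Paths.get(args.length > 0 ? args[0] : "probe.jar");
        boolean ok = probeOnce(jar, "probe.First", "probe.Second", 1000L);
        System.out.println(ok ? "pool still readable" : "pool not readable");
    }
}
```

In a real setup watch(...) would run on a daemon thread so that it never blocks the IDE, and the warning callback would have to avoid loading new classes itself, since that is exactly what fails in this scenario.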

I tried to submit code to Eclipse several times; that never went well. So I won't do it again - ever.
Comment 6 Ed Merks CLA 2016-11-13 04:28:42 EST
Unfortunately download.eclipse.org has been ill-behaved the last months.  For sure every Tuesday at 10:00AM it will become unusable for most of the day (because all running Eclipse instances in a given time zone will check for updates at the same time).  I opened a Bugzilla:

https://bugs.eclipse.org/bugs/show_bug.cgi?id=498116

Perhaps it will be fixed for Neon.2, though I'm not so hopeful...

This week yet another problem recurred.  The download server stopped returning a populated mirror list. When this happens, the main download server in Canada serves all p2 requests and that too guarantees the server will become unresponsive.

It's all pretty frustrating because the user's perception is that Oomph is unreliable when in fact it's the server that's unreliable, and as you can see in the bug I opened, it's guaranteed to be unreliable one day per week. :-(

If there were a reliable way to detect an unreliable filesystem I would simply add the code to check that.  The links you gave are interesting, but don't appear to provide a reliable solution either. :-( I suppose one way would be to look for the \\foo\... in the file path, but if I understand correctly, one can bind a shared drive to a drive letter and then I have no reliable way to detect that the file system is a shared drive. Certainly we can't do a long running reliability check during the installation process itself.  How did you specify the use of the shared drive, was it using \\foo\... or using X:\...?

I'm sure IDEA would not work well on a shared drive either. I doubt they warn you when you do that...

When it comes to always downloading a few things for every install, the problem is that binaries (e.g., IUs for executables) are not put into the pool.  That's annoying, but Oomph does work offline, so these things are cached somewhere.  I opened https://bugs.eclipse.org/bugs/show_bug.cgi?id=507430 to track that issue and see if there's something we could do to improve this behavior.

When it comes to contributing, yes it can be rather discouraging. But the platform team has changed over the years, and I have had good success with contributions more recently.  A good recent example (for the EGit project) is https://bugs.eclipse.org/bugs/show_bug.cgi?id=501392 which involved more than a week of effort to complete.  But I was sick of the problem and I wanted it fixed.  I was convinced that no one else would ever fix it.  Now I personally enjoy the benefits and I am proud that I contributed.
Comment 7 Aaron Digulla CLA 2016-11-17 12:03:17 EST
Thanks for the explanation, Ed. I've played with the ideas and here is what I found so far:

- java.io.File doesn't help
- java.nio.file.Path doesn't help
- java.nio.file.FileSystem doesn't help
- java.nio.file.spi.FileSystemProvider doesn't help
- java.nio.file.FileStore looks promising (type() method) but doesn't help (type is always NTFS).
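
For reference, the FileStore check from the last bullet can be reproduced with a few lines. This is a minimal sketch; the class and method names are made up for illustration. As reported above, on a mapped Windows share it may simply print NTFS, so the output is not a reliable indicator.

```java
import java.io.IOException;
import java.nio.file.FileStore;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: prints what java.nio.file.FileStore reports for a path.
final class FileStoreProbe {

    // Returns "name (type)" of the store holding the path, or a note if it cannot be determined.
    static String describe(Path path) {
        try {
            FileStore store = Files.getFileStore(path);
            return store.name() + " (" + store.type() + ")";
        } catch (IOException e) {
            return "unknown: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        Path path = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(path.toAbsolutePath() + " -> " + describe(path));
    }
}
```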

So I've looked at an odd suggestion: javax.swing.filechooser.FileSystemView.

FileSystemView.getFileSystemView().getSystemTypeDescription(file)

where "file" is java.io.File. That will print the type of the file ... The good news:

1. This doesn't seem to start the whole of Swing.
2. It actually prints something

BUT

The code relies on native code. The native code will return a String. That String is locale dependent. It doesn't respond to Locale.setDefault().

*sigh* I'm not one of those people who are happy about all the things they know which don't work...

Last resort: Windows command line. That would work along these lines:

1. Detect that this is Windows.
2. If the path starts with // or \\, reject it (UNC path, most likely remote).
3. If it's a drive letter, run "net use <letter>:", i.e. "net use c:".

For local drives the "net.exe" will terminate with an error. For network drives, it will return 0.

As far as I can tell, that's the best way to test this condition.
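
A rough sketch of these three steps in Java follows. It is not the code from the Gist mentioned later and not part of Oomph; the class and method names are hypothetical. It relies on the exit-code behaviour of "net use" described above (0 for a network drive, an error for a local one) and treats every non-Windows system as "not detected".

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Locale;

// Hypothetical helper implementing the three steps; all names are illustrative.
final class NetworkDriveCheck {

    // Step 1: detect Windows.
    static boolean isWindows() {
        return System.getProperty("os.name", "").toLowerCase(Locale.ROOT).contains("windows");
    }

    // Step 2: UNC paths start with \\ or // and are most likely remote.
    static boolean isUncPath(String path) {
        return path.startsWith("\\\\") || path.startsWith("//");
    }

    static boolean startsWithDriveLetter(String path) {
        return path.length() >= 2 && Character.isLetter(path.charAt(0)) && path.charAt(1) == ':';
    }

    // Step 3: run "net use X:" and report whether it exits with 0.
    static boolean netUseSucceeds(String drive) {
        try {
            Process process = new ProcessBuilder("net", "use", drive).redirectErrorStream(true).start();
            try (InputStream out = process.getInputStream()) {
                byte[] buffer = new byte[4096];
                while (out.read(buffer) != -1) {
                    // Drain the output so the child process cannot block on a full pipe.
                }
            }
            return process.waitFor() == 0;
        } catch (IOException e) {
            return false; // net.exe missing or not runnable: no verdict
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // True only if the path is recognised as remote; false also means "could not tell".
    static boolean looksLikeNetworkPath(String path) {
        if (!isWindows()) {
            return false;
        }
        if (isUncPath(path)) {
            return true;
        }
        return startsWithDriveLetter(path) && netUseSucceeds(path.substring(0, 2));
    }

    public static void main(String[] args) {
        String path = args.length > 0 ? args[0] : "C:\\pool";
        System.out.println(path + " -> network path: " + looksLikeNetworkPath(path));
    }
}
```

A false result should be read as "no network drive detected", not as proof that the drive is local.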

Another option would be "fsutil":

fsutil fsinfo drivetype {drive letter}

(see http://stackoverflow.com/a/15639941/34088). That should work as well, but I'm not sure whether fsutil.exe is a standard Windows utility. net.exe should be available everywhere.

Related:

https://sites.google.com/site/baohuagu/how-to-detect-if-a-drive-is-network-drive
Comment 8 Aaron Digulla CLA 2016-11-18 03:30:54 EST
I've created a Gist which should work in many cases. Use as you wish:
https://gist.github.com/digulla/31eed31c7ead29ffc7a30aaf87131def
Comment 9 Ed Merks CLA 2020-01-02 06:52:46 EST
Sorry, there just isn't a general way to detect such things. :-(