Bug 174930 - [build] full build after starting with a newly installed build
Status: VERIFIED FIXED
Alias: None
Product: Equinox
Classification: Eclipse Project
Component: Framework
Version: 3.3
Hardware: PC Windows XP
Importance: P3 major
Target Milestone: 3.3 M7
Assignee: Thomas Watson CLA
QA Contact:
URL:
Whiteboard:
Keywords: polish
Depends on:
Blocks:
 
Reported: 2007-02-21 04:33 EST by Dani Megert CLA
Modified: 2007-04-11 04:45 EDT

See Also:


Attachments
Full debug log (55.95 KB, text/plain, text/file)
2007-02-21 04:36 EST, Dani Megert CLA
Debug info for the session where I imported and exited (280.39 KB, application/x-zip-compressed)
2007-03-16 11:51 EDT, Dani Megert CLA
Debug info from the restart (625.83 KB, application/x-zip-compressed)
2007-03-16 11:52 EDT, Dani Megert CLA
Another trace with a full build (535.01 KB, application/x-zip-compressed)
2007-03-16 13:40 EDT, Dani Megert CLA
ZIP over M6 (173.33 KB, application/x-zip-compressed)
2007-03-27 11:02 EDT, Dani Megert CLA
patch (2.33 KB, patch)
2007-04-02 15:24 EDT, Thomas Watson CLA
patch (2.43 KB, patch)
2007-04-03 09:46 EDT, Thomas Watson CLA
Debug info from using I20070403-1110 + patched osgi plug-in from Tom (11.74 KB, application/x-zip-compressed)
2007-04-04 10:55 EDT, Dani Megert CLA
patch (2.55 KB, patch)
2007-04-04 17:34 EDT, Thomas Watson CLA

Description Dani Megert CLA 2007-02-21 04:33:43 EST
I20070220-1330

When I start up my dev workspace where all binary plug-ins are imported and no PDE projects are selected with a newly installed build then a full build happens.

Switched from I20070213-0907 to I20070220-1330.
Comment 1 Dani Megert CLA 2007-02-21 04:36:09 EST
Created attachment 59458 [details]
Full debug log

The debug log shows many INCREMENTAL but also some FULL builds.
Comment 2 Eric Jodet CLA 2007-02-21 05:00:50 EST
This might be related to the correction of bug
https://bugs.eclipse.org/bugs/show_bug.cgi?id=172444
Comment 3 Dani Megert CLA 2007-02-21 05:11:49 EST
Note that the FULL build:

Successfully read state for org.apache.lucene.analysis
Clearing last state : State for org.apache.lucene.analysis (#0 @ Tue Feb 20 18:07:05 CET 2007)
FULL build
Recording new state : State for org.apache.lucene.analysis (#0 @ Wed Feb 21 10:28:22 CET 2007)
Finished build of org.apache.lucene.analysis @ Wed Feb 21 10:28:22 CET 2007

happens on an imported BINARY plug-in.
Comment 4 Eric Jodet CLA 2007-02-21 06:08:58 EST
(In reply to comment #2)
after further investigation, bug 172444 seems unrelated 
Comment 5 Dani Megert CLA 2007-03-16 11:49:53 EDT
Using I20070313-1051 it got harder to reproduce. In the normal exit/start scenario I couldn't make it happen anymore BUT: when I did these steps:

1. start workspace
2. import binary plug-ins
3. wait until building done
4. exit
5. start
==> full build
Comment 6 Dani Megert CLA 2007-03-16 11:51:25 EDT
Created attachment 61116 [details]
Debug info for the session where I imported and exited
Comment 7 Dani Megert CLA 2007-03-16 11:52:10 EDT
Created attachment 61117 [details]
Debug info from the restart
Comment 8 Dani Megert CLA 2007-03-16 13:40:01 EDT
Created attachment 61141 [details]
Another trace with a full build

OK, this really seems to be the pattern. Here's another debug trace. And yes - I waited until the workbench was calm before I exited and restarted ;-)
Comment 9 Jerome Lanneluc CLA 2007-03-19 05:56:36 EDT
Thanks Dani. It looks like the order of access rules is being changed between exit and restart, thus causing the full build.

before exit:
        [...]
	pattern=org/eclipse/core/resources/team/* (ACCESSIBLE)
	pattern=org/eclipse/core/internal/resources/refresh/win32/* (DISCOURAGED | IGNORE IF BETTER)
	pattern=org/eclipse/core/internal/indexing/* (DISCOURAGED | IGNORE IF BETTER)
        [...]

after restart:
        [...]
	pattern=org/eclipse/core/resources/team/* (ACCESSIBLE)
	pattern=org/eclipse/core/internal/indexing/* (DISCOURAGED | IGNORE IF BETTER)
	pattern=org/eclipse/core/internal/resources/refresh/win32/* (DISCOURAGED | IGNORE IF BETTER)
        [...]

Investigating who is changing this order...
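
For illustration only: a minimal sketch of why a pure reordering can be enough to invalidate saved build state, assuming the saved and recomputed rule lists are compared element by element. The class and method names below are hypothetical and this is not JDT's actual code.

```java
import java.util.Arrays;
import java.util.List;

class AccessRuleOrderDemo {
    // Hypothetical stand-in for an order-sensitive comparison of access rules.
    // List.equals compares element by element, so the same rules in a
    // different order are reported as a change.
    static boolean sameRules(List<String> saved, List<String> current) {
        return saved.equals(current);
    }

    public static void main(String[] args) {
        List<String> beforeExit = Arrays.asList(
            "org/eclipse/core/resources/team/*",
            "org/eclipse/core/internal/resources/refresh/win32/*",
            "org/eclipse/core/internal/indexing/*");
        List<String> afterRestart = Arrays.asList(
            "org/eclipse/core/resources/team/*",
            "org/eclipse/core/internal/indexing/*",
            "org/eclipse/core/internal/resources/refresh/win32/*");

        // Same three patterns, different order: the comparison fails.
        System.out.println(sameRules(beforeExit, afterRestart));
    }
}
```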
Comment 10 Jerome Lanneluc CLA 2007-03-20 10:24:22 EDT
Dani is using -Dosgi.clean=true to launch his workspace, Wassim could that be the problem (i.e. are you ensuring the same order for access rules is used when this option is used) ?
Comment 11 Wassim Melhem CLA 2007-03-20 10:31:45 EDT
-Dosgi.clean=true does not matter since we store our state in the workspace metadata.  The order of access rules is computed based on the order that the runtime gives us.

In this case though, I suspect the fact that many new plug-ins were added to the sdk between the two builds may have something to do with the difference in the classpaths from one week to the next.
Comment 12 Jerome Lanneluc CLA 2007-03-20 10:39:07 EDT
The title might say otherwise, but I believe that Dani is restarting the workbench on the same SDK, so no new plug-ins are added. Am I right Dani ?
Comment 13 Dani Megert CLA 2007-03-20 10:42:18 EDT
>In this case though, I suspect the fact that many new plug-ins were added to
>the sdk between the two builds may have something to do with the difference in
>the classpaths from one week to the next.
NOTE: the problem happens after I started with the new build (see comment 5 for the scenario). After exiting I would expect that my workspace is in a stable state again.
Comment 14 Wassim Melhem CLA 2007-03-20 10:46:38 EDT
oh wait.  this bug was reported on a relatively ancient (Feb) build.

As for the present, what plug-ins can we import as binary as per comment 5 to reproduce the issue?
Comment 15 Dani Megert CLA 2007-03-20 10:48:54 EDT
>oh wait.  this bug was reported on a relatively ancient (Feb) build.
Nope. Latest logs are from I20070313-1051

I have almost all plug-ins imported as binary (all that are required by JDT UI minus the platform-text cvs module).
Comment 16 Jerome Lanneluc CLA 2007-03-20 11:34:46 EDT
Moving to PDE/UI as it seems that access rules are changing between shutdown and restart.
Comment 17 Wassim Melhem CLA 2007-03-20 11:36:41 EDT
yes, we need to investigate on the pde side.
Comment 18 Philipe Mulet CLA 2007-03-21 08:27:40 EDT
Wassim ? Are you fixing it for M6 ?
I'd like M6 to include all our fixes for classpath issues (either JDT or PDE). On JDT front, all our available fixes are in. 
Comment 19 Wassim Melhem CLA 2007-03-21 10:51:07 EDT
I plan to take a look/fix for M6.  It came into our bucket a bit late though.
Comment 20 Wassim Melhem CLA 2007-03-21 15:49:55 EDT
Brian, upon shutting down and restarting, the access rules and their order returned should be exactly the same since the workspace/target has not changed.

can you verify that the rules are being unnecessarily reordered?  Thanks.
Comment 21 Dani Megert CLA 2007-03-22 06:40:41 EDT
Hi Brian,

here are the easy reproducible steps:
1. start fresh workspace using I20070321-1800
2. import ALL plug-ins and fragments as binary projects
3. wait until workspace has been built
4. exit
5. restart
==> full build

Here's how I launch my workspace from the command line in case that matters:
C:\JavaSDKs\jdk1.5.0_10\bin\java -showversion -Xms50M -Xmx350M -Dosgi.clean=true -jar plugins\org.eclipse.equinox.launcher_*.jar
 -update -debug -keyring c:\eclipse\.keyring -application org.eclipse.ui.ide.workbench -showlocation -data c:\eclipse\workspaces\tmpx   1>log.txt 2>&1
Comment 22 Philipe Mulet CLA 2007-03-26 12:48:19 EDT
I saw this bug again today, self-hosting on 3.3M6.
Looks like the target of the bug should be moved to M7.
Comment 23 Jerome Lanneluc CLA 2007-03-27 04:56:55 EDT
After several attempts following steps in comment #21, I was not able to reproduce the full build problem. Dani, when you follow steps in comment #21 (assuming you are running on a plain 3.3M6), do you see the full build problem on every attempt ?
Comment 24 Dani Megert CLA 2007-03-27 05:00:19 EDT
The problem is apparently still there as you can see from comment 22.

I will try again later today. Did you try with the command line args I provided? NOTE: I do not start eclipse.exe.
Comment 25 Jerome Lanneluc CLA 2007-03-27 05:14:12 EDT
I agree that the problem is still there. I'm just trying to find reliable steps to reproduce the problem so that Brian can debug his code.

Yes, I used the same command line as you provided.
Comment 26 Dani Megert CLA 2007-03-27 11:00:50 EDT
I can reproduce it with fresh 3.3 M6.

HOWEVER, I found out something important: the problem goes away if I disable my other (linked) locations that come in via 'links' folder.

So, try this:
1. install fresh M6 into c:\eclipse\drops ==> c:\eclipse\drops
2. rename c:\eclipse\drops\eclipse to c:\eclipse\drops\3.3_M6
3. download and unzip the attached bug.zip to c:\
   (verify that your install gets a 'links' directory)
4. continue with comment 21

Philippe, are you also using additional locations / 'links' folder?
Comment 27 Dani Megert CLA 2007-03-27 11:02:15 EDT
Created attachment 62107 [details]
ZIP over M6
Comment 28 Philipe Mulet CLA 2007-03-27 11:35:02 EDT
Re: comment 26.

I wasn't using other locations/links folder.
Comment 29 Philipe Mulet CLA 2007-03-27 11:36:18 EDT
My impression was that the behavior was VM specific (I often switched between VMs), but this could be a red herring.
Comment 30 Dani Megert CLA 2007-03-28 04:49:26 EDT
Could anyone now reproduce the problem with my latest steps?
Comment 31 Jerome Lanneluc CLA 2007-03-28 07:00:42 EDT
(In reply to comment #30)
> Could anyone now reproduce the problem with my latest steps?
I just tried it with your links folder but I was not able to reproduce the problem.
Comment 32 Dani Megert CLA 2007-03-28 07:19:36 EDT
That's really strange. Can you verify the setup by checking the PDE Target location and see whether the additional location has been recognized?
Comment 33 Jerome Lanneluc CLA 2007-03-28 07:29:32 EDT
I already did :-) And yes, the additional location was there.
Comment 34 Dani Megert CLA 2007-03-28 08:51:16 EDT
Jerome, do you have a T60 (I have a T43)? It might be a timing issue: Markus can also not reproduce on his T60. Brian can you give it a shot?
Comment 35 Jerome Lanneluc CLA 2007-03-28 08:53:46 EDT
I have a T41p.
Comment 36 Brian Bauman CLA 2007-03-29 09:24:26 EDT
Sorry about not replying earlier.  I tried it yesterday with the linked content provided by Dani and unfortunately was not able to reproduce the problem.  For the record I have a T42p and was running with the IBM 1.5 JVM.
Comment 37 Dani Megert CLA 2007-03-29 09:28:23 EDT
Can you advise on where to set breakpoints to track down the problem on my side?
Comment 38 Dani Megert CLA 2007-04-02 08:27:05 EDT
I can now also reproduce on a different machine but with the complete contents of my custom location. I've sent Brian, Jerome and Wassim that via e-mail as it contains information that I cannot share here. Please try to reproduce.
Comment 39 Wassim Melhem CLA 2007-04-02 08:59:20 EDT
Dani,

I was unable to reproduce, but I am a bit confused about this scenario. 

If the entire workspace is made up of binary plug-ins, what does a full build do? 
Comment 40 Dani Megert CLA 2007-04-02 09:01:42 EDT
This was just to find the small scenario. The full build also happens on my dev workspace with 100s of source plug-ins - which is very bad.
Comment 41 Jerome Lanneluc CLA 2007-04-02 09:29:15 EDT
I was able to reproduce with the custom location that Dani sent.
Comment 42 Dani Megert CLA 2007-04-02 09:48:27 EDT
I'm so happy :-)
Comment 43 Wassim Melhem CLA 2007-04-02 10:21:32 EDT
ok, I think I may have been able to reproduce in a runtime workbench, which is good.

Jerome, what are some minimal jdt/core debug flags that would show me the before-after classpaths that would cause a rebuild?
Comment 44 Jerome Lanneluc CLA 2007-04-02 10:26:04 EDT
Running with the classpath resolution tracing on (org.eclipse.jdt.core/debug/cpresolution=true), I see several instances of "missbehaving container" where the access rules on restart are in a different order than the access rules given during the first session.

E.g. while initializing "org.eclipse.pde.core.requiredPlugins" for "org.eclipse.ltk.ui.refactoring", the classpath entry for "/org.eclipse.core.resources" has the following rules in different order:

- pattern=org/eclipse/core/internal/resources/refresh/win32/* (DISCOURAGED | IGNORE IF BETTER)
- pattern=org/eclipse/core/internal/indexing/* (DISCOURAGED | IGNORE IF BETTER)

Is it because org/eclipse/core/internal/resources/refresh/win32 is a fragment ?
Comment 45 Jerome Lanneluc CLA 2007-04-02 10:27:10 EDT
(In reply to comment #43)
> ok, I think I may have been able to reproduce in a runtime workbench, which is
> good.
> 
> Jerome, what are some minimal jdt/core debug flags that would show me the
> before-after classpaths that would cause a rebuild?
> 
Enabling this flag should show you the before-after classpaths: org.eclipse.jdt.core/debug/cpresolution=true
Comment 46 Wassim Melhem CLA 2007-04-02 13:16:23 EDT
An update:

State#getVisiblePackages() is returning a different order of packages coming from the two fragments of core.resources.

We need to figure out now if PDE is adding/removing fragments from the state to cause the runtime to return a different order, or if the order coming out of the runtime state is inconsistent.
Comment 47 Thomas Watson CLA 2007-04-02 15:24:45 EDT
Created attachment 62699 [details]
patch

This patch does an insertion sort for the unresolved bundles when in devmode.  As long as the bundle ids stay the same for the BundleDescriptions in PDE this should help give a consistent ordering for fragment resolution.

The theory is that fragments are getting resolved in different orders by PDE.  This is causing getVisiblePackages to return the packages exported by two or more fragments attached to the same host in different orders.

Performance:  This is probably not the fastest way to sort the unresolved bundles list, but it was the easiest to implement to test out the possible fix.  Need to keep an eye on performance with large target platform sets.
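
A minimal sketch of the idea, not the attached patch itself: each unresolved bundle is inserted into a list that is kept ordered by bundle id, so the bundles are visited in the same order no matter in which order they were added. The Bundle class and insertSorted method are hypothetical stand-ins for BundleDescription and the resolver's internal list.

```java
import java.util.ArrayList;
import java.util.List;

class UnresolvedBundleList {
    // Hypothetical stand-in for BundleDescription: only the fields needed here.
    static final class Bundle {
        final long id;
        final String symbolicName;

        Bundle(long id, String symbolicName) {
            this.id = id;
            this.symbolicName = symbolicName;
        }
    }

    private final List<Bundle> bundles = new ArrayList<>();

    // Insertion step of an insertion sort: find the first position whose
    // bundle id is greater than the new one and insert there.
    void insertSorted(Bundle b) {
        int i = 0;
        while (i < bundles.size() && bundles.get(i).id <= b.id) {
            i++;
        }
        bundles.add(i, b);
    }

    // Symbolic names in the current (id-sorted) order.
    List<String> names() {
        List<String> result = new ArrayList<>();
        for (Bundle b : bundles) {
            result.add(b.symbolicName);
        }
        return result;
    }
}
```

Each insert is linear in the size of the list, which is the performance concern noted above for large target platforms.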
Comment 48 Wassim Melhem CLA 2007-04-02 15:37:40 EDT
Tom, I verified that a full build occurs without the patch, and no false build occurs with the patch.
Comment 49 Thomas Watson CLA 2007-04-02 17:32:17 EDT
Moving this to Equinox->Framework.  I'm not sure there is a consistent way for PDE to fix this.  I think it needs to be done in the resolver.
Comment 50 Thomas Watson CLA 2007-04-03 09:46:30 EDT
Created attachment 62774 [details]
patch

New patch that optimizes the case where bundles are added with incrementing bundle ids (the typical scenario).
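
One way such an optimization could work, shown as a sketch under assumptions (the patch's actual approach is not described in this comment): scan for the insertion point from the end of the list, so a bundle whose id is larger than all existing ones is simply appended. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class TailInsert {
    // Insert id into a list sorted in ascending order, scanning from the end.
    // When ids arrive in increasing order the loop body never runs and the
    // call is a plain append.
    static void insertSorted(List<Long> sortedIds, long id) {
        int i = sortedIds.size();
        while (i > 0 && sortedIds.get(i - 1) > id) {
            i--;
        }
        sortedIds.add(i, id);
    }

    public static void main(String[] args) {
        List<Long> ids = new ArrayList<>();
        insertSorted(ids, 1L);
        insertSorted(ids, 2L);
        insertSorted(ids, 5L);
        insertSorted(ids, 3L); // out of order: walks back one position
        System.out.println(ids);
    }
}
```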
Comment 51 Wassim Melhem CLA 2007-04-03 14:42:16 EDT
Tom, would you like me to work with the patch before you release it?
Comment 52 Thomas Watson CLA 2007-04-03 14:59:08 EDT
Yes, please.  I want to release the patch in the next 2 days.  I have been running with the patch myself.  It would be useful if others on the bug report could try it out in their environments as well.
Comment 53 Dani Megert CLA 2007-04-04 02:18:46 EDT
>It would be useful if others on the bug report
>could try it out in their environments as well.
Just send me the patched plug-in(s).
Comment 54 Dani Megert CLA 2007-04-04 10:54:43 EDT
I verified with the patched plug-in that the fix does NOT work. See attached debug log.
Comment 55 Dani Megert CLA 2007-04-04 10:55:51 EDT
Created attachment 62934 [details]
Debug info from using I20070403-1110 + patched osgi plug-in from Tom
Comment 56 Thomas Watson CLA 2007-04-04 15:20:16 EDT
I still see the error when using Dani's large test case.  What I see is that the BundleDescription objects that PDE creates for the initial load use different Bundle IDs (long) than on the subsequent restarts.  In the resolver we sort the BundleDescriptions according to their Bundle ID values.  On a restart it seems that the IDs use a different order than on the initial workspace load.

For example, we have two fragments to org.eclipse.core.resources:

org.eclipse.core.resources.compatibility
org.eclipse.core.resources.win32

On the initial workspace load I see these fragments get the following IDs

org.eclipse.core.resources.compatibility -> 181
org.eclipse.core.resources.win32 -> 182

On every restart after that I see these fragments get the following IDs

org.eclipse.core.resources.compatibility -> 289
org.eclipse.core.resources.win32 -> 268

Notice that the id order is different from the initial load to subsequent restarts.  This change in id orders makes the patch in the resolver worthless because it still changes the order in which fragments will be attached to their hosts which will lead to exports from the fragments being in a different order.
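
The effect can be reproduced with the IDs listed above. This is only an illustrative sketch; the class and method names are hypothetical and the real resolver works on BundleDescription objects rather than a map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class IdOrderDemo {
    // Return the symbolic names ordered by their assigned bundle id.
    static List<String> orderById(Map<String, Long> idsByName) {
        List<String> names = new ArrayList<>(idsByName.keySet());
        names.sort((a, b) -> Long.compare(idsByName.get(a), idsByName.get(b)));
        return names;
    }

    public static void main(String[] args) {
        Map<String, Long> initialLoad = new LinkedHashMap<>();
        initialLoad.put("org.eclipse.core.resources.compatibility", 181L);
        initialLoad.put("org.eclipse.core.resources.win32", 182L);

        Map<String, Long> restart = new LinkedHashMap<>();
        restart.put("org.eclipse.core.resources.compatibility", 289L);
        restart.put("org.eclipse.core.resources.win32", 268L);

        System.out.println(orderById(initialLoad)); // compatibility first
        System.out.println(orderById(restart));     // win32 first
    }
}
```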
Comment 57 Thomas Watson CLA 2007-04-04 17:34:35 EDT
Created attachment 62986 [details]
patch

Discussed this with Wassim.  Seems that sorting by bundle id is not good enough because PDE-UI may use different long IDs.  This patch sorts by Bundle-SymbolicName instead.

This patch works for me on Dani's large testcase.
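
A rough sketch of why ordering by symbolic name is stable across restarts, not the patch itself: the name of a bundle does not change when it is assigned a different ID, so the resulting order is the same in both sessions. The Bundle class and orderBySymbolicName method are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SymbolicNameOrderDemo {
    // Hypothetical stand-in for BundleDescription.
    static final class Bundle {
        final long id;
        final String symbolicName;

        Bundle(long id, String symbolicName) {
            this.id = id;
            this.symbolicName = symbolicName;
        }
    }

    // Order bundles by Bundle-SymbolicName; the ids are ignored on purpose.
    static List<String> orderBySymbolicName(List<Bundle> bundles) {
        List<Bundle> sorted = new ArrayList<>(bundles);
        sorted.sort((a, b) -> a.symbolicName.compareTo(b.symbolicName));
        List<String> names = new ArrayList<>();
        for (Bundle b : sorted) {
            names.add(b.symbolicName);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Bundle> initialLoad = Arrays.asList(
            new Bundle(181, "org.eclipse.core.resources.compatibility"),
            new Bundle(182, "org.eclipse.core.resources.win32"));
        List<Bundle> restart = Arrays.asList(
            new Bundle(268, "org.eclipse.core.resources.win32"),
            new Bundle(289, "org.eclipse.core.resources.compatibility"));

        // Both sessions yield the same order despite different ids.
        System.out.println(orderBySymbolicName(initialLoad).equals(orderBySymbolicName(restart)));
    }
}
```

Bundles that share a symbolic name (for example, two versions of the same bundle) would need a tie-breaker; how the patch handles that case is not stated in this comment.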
Comment 58 Wassim Melhem CLA 2007-04-04 20:57:19 EDT
I verified that the latest patch does not cause a rebuild on Dani's workspace.

Now I have to try it without the patch.  This scenario is so long.
Comment 59 Wassim Melhem CLA 2007-04-05 08:10:27 EDT
ok, the patch is good.

You should go ahead and release it.
Comment 60 Thomas Watson CLA 2007-04-05 09:06:28 EDT
Fix released to HEAD.
Comment 61 Dani Megert CLA 2007-04-11 04:45:16 EDT
Verified in I20070410-1043.
Thanks guys!