174930 – [build] full build after starting with a newly installed build


Summary: [build] full build after starting with a newly installed build

Status:	VERIFIED FIXED

Alias:	None

Product:	Equinox
Classification:	Eclipse Project
Component:	Framework
Version:	3.3
Hardware:	PC Windows XP

Importance:	P3 major
Target Milestone:	3.3 M7
Assignee:	Thomas Watson
QA Contact:

URL:
Whiteboard:
Keywords:	polish

Depends on:
Blocks:

Reported:	2007-02-21 04:33 EST by Dani Megert
Modified:	2007-04-11 04:45 EDT
CC List:	6 users

See Also:

Attachments
Full debug log (55.95 KB, text/plain, text/file) 2007-02-21 04:36 EST, Dani Megert	no flags
Debug info for the session where I imported and exited (280.39 KB, application/x-zip-compressed) 2007-03-16 11:51 EDT, Dani Megert	no flags
Debug info from the restart (625.83 KB, application/x-zip-compressed) 2007-03-16 11:52 EDT, Dani Megert	no flags
Another trace with a full build (535.01 KB, application/x-zip-compressed) 2007-03-16 13:40 EDT, Dani Megert	no flags
ZIP over M6 (173.33 KB, application/x-zip-compressed) 2007-03-27 11:02 EDT, Dani Megert	no flags
patch (2.33 KB, patch) 2007-04-02 15:24 EDT, Thomas Watson	no flags
patch (2.43 KB, patch) 2007-04-03 09:46 EDT, Thomas Watson	no flags
Debug info from using I20070403-1110 + patched osgi plug-in from Tom (11.74 KB, application/x-zip-compressed) 2007-04-04 10:55 EDT, Dani Megert	no flags
patch (2.55 KB, patch) 2007-04-04 17:34 EDT, Thomas Watson	no flags


Description Dani Megert

2007-02-21 04:33:43 EST

I20070220-1330

When I start my dev workspace (all binary plug-ins imported, no PDE projects selected) with a newly installed build, a full build happens.

Switched from I20070213-0907 to I20070220-1330.

Comment 1 Dani Megert

2007-02-21 04:36:09 EST

Created attachment 59458 [details]
Full debug log

The debug log shows many INCREMENTAL but also some FULL builds.

Comment 2 Eric Jodet

2007-02-21 05:00:50 EST

This might be related to the correction of bug
https://bugs.eclipse.org/bugs/show_bug.cgi?id=172444

Comment 3 Dani Megert

2007-02-21 05:11:49 EST

Note that the FULL build:

Successfully read state for org.apache.lucene.analysis
Clearing last state : State for org.apache.lucene.analysis (#0 @ Tue Feb 20 18:07:05 CET 2007)
FULL build
Recording new state : State for org.apache.lucene.analysis (#0 @ Wed Feb 21 10:28:22 CET 2007)
Finished build of org.apache.lucene.analysis @ Wed Feb 21 10:28:22 CET 2007

happens on an imported BINARY plug-in.

Comment 4 Eric Jodet

2007-02-21 06:08:58 EST

(In reply to comment #2)
after further investigation, bug 172444 seems unrelated

Comment 5 Dani Megert

2007-03-16 11:49:53 EDT

Using I20070313-1051 it got harder to reproduce. In the normal exit/start scenario I couldn't make it happen anymore BUT: when I did these steps:

1. start workspace
2. import binary plug-ins
3. wait until building done
4. exit
5. start
==> full build

Comment 6 Dani Megert

2007-03-16 11:51:25 EDT

Created attachment 61116 [details]
Debug info for the session where I imported and exited

Comment 7 Dani Megert

2007-03-16 11:52:10 EDT

Created attachment 61117 [details]
Debug info from the restart

Comment 8 Dani Megert

2007-03-16 13:40:01 EDT

Created attachment 61141 [details]
Another trace with a full build

OK, this really seems to be the pattern. Here's another debug trace. And yes - I waited until the workbench was calm before I exited and restarted ;-)

Comment 9 Jerome Lanneluc

2007-03-19 05:56:36 EDT

Thanks Dani. It looks like the order of access rules is being changed between exit and restart, thus causing the full build.

before exit:
        [...]
	pattern=org/eclipse/core/resources/team/* (ACCESSIBLE)
	pattern=org/eclipse/core/internal/resources/refresh/win32/* (DISCOURAGED | IGNORE IF BETTER)
	pattern=org/eclipse/core/internal/indexing/* (DISCOURAGED | IGNORE IF BETTER)
        [...]

after restart:
        [...]
	pattern=org/eclipse/core/resources/team/* (ACCESSIBLE)
	pattern=org/eclipse/core/internal/indexing/* (DISCOURAGED | IGNORE IF BETTER)
	pattern=org/eclipse/core/internal/resources/refresh/win32/* (DISCOURAGED | IGNORE IF BETTER)
        [...]

Investigating who is changing this order...
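To make this failure mode concrete, here is a minimal Java sketch, assuming (as the trace above suggests) that the builder compares a classpath entry's access rules as an ordered list. `AccessRuleOrderSketch`, `Rule` and `classpathChanged` are hypothetical illustration names, not JDT API, and the rule kinds are simplified to plain strings.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

// Hypothetical model of the access rules on one classpath entry; not the JDT API.
public class AccessRuleOrderSketch {

    // One access rule: a path pattern plus a simplified kind label.
    public static final class Rule {
        final String pattern;
        final String kind;

        public Rule(String pattern, String kind) {
            this.pattern = pattern;
            this.kind = kind;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Rule)) {
                return false;
            }
            Rule other = (Rule) o;
            return pattern.equals(other.pattern) && kind.equals(other.kind);
        }

        @Override
        public int hashCode() {
            return Objects.hash(pattern, kind);
        }
    }

    // List.equals compares element by element, so a pure reordering of the
    // same rules is reported as a change (assumed here to force a full build).
    public static boolean classpathChanged(List<Rule> before, List<Rule> after) {
        return !before.equals(after);
    }

    public static void main(String[] args) {
        Rule team = new Rule("org/eclipse/core/resources/team/*", "ACCESSIBLE");
        Rule win32 = new Rule("org/eclipse/core/internal/resources/refresh/win32/*", "DISCOURAGED");
        Rule indexing = new Rule("org/eclipse/core/internal/indexing/*", "DISCOURAGED");

        List<Rule> beforeExit = Arrays.asList(team, win32, indexing);
        List<Rule> afterRestart = Arrays.asList(team, indexing, win32);

        System.out.println(classpathChanged(beforeExit, afterRestart)); // prints "true"
    }
}
```

Simply ignoring the order is not an obvious way out, because access rules are generally evaluated first-match-wins, so their order carries meaning; the more robust direction is to make the order deterministic at its source.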

Comment 10 Jerome Lanneluc

2007-03-20 10:24:22 EDT

Dani is using -Dosgi.clean=true to launch his workspace. Wassim, could that be the problem (i.e., are you ensuring the same order of access rules when this option is used)?

Comment 11 Wassim Melhem

2007-03-20 10:31:45 EDT

-Dosgi.clean=true does not matter since we store our state in the workspace metadata.  The order of access rules is computed based on the order that the runtime gives us.

In this case though, I suspect the fact that many new plug-ins were added to the sdk between the two builds may have something to do with the difference in the classpaths from one week to the next.

Comment 12 Jerome Lanneluc

2007-03-20 10:39:07 EDT

The title might say otherwise, but I believe that Dani is restarting the workbench on the same SDK, so no new plug-ins are added. Am I right, Dani?

Comment 13 Dani Megert

2007-03-20 10:42:18 EDT

>In this case though, I suspect the fact that many new plug-ins were added to
>the sdk between the two builds may have something to do with the difference in
>the classpaths from one week to the next.
NOTE: the problem happens after I started with the new build (see comment 5 for the scenario). After exiting I would expect my workspace to be in a stable state again.

Comment 14 Wassim Melhem

2007-03-20 10:46:38 EDT

oh wait.  this bug was reported on a relatively ancient (Feb) build.

As for the present, what plug-ins can we import as binary as per comment 5 to reproduce the issue?

Comment 15 Dani Megert

2007-03-20 10:48:54 EDT

>oh wait.  this bug was reported on a relatively ancient (Feb) build.
Nope. Latest logs are from I20070313-1051

I have almost all plug-ins imported as binary (all that are required by JDT UI minus the platform-text cvs module).

Comment 16 Jerome Lanneluc

2007-03-20 11:34:46 EDT

Moving to PDE/UI as it seems that access rules are changing between shutdown and restart.

Comment 17 Wassim Melhem

2007-03-20 11:36:41 EDT

yes, we need to investigate on the pde side.

Comment 18 Philipe Mulet

2007-03-21 08:27:40 EDT

Wassim ? Are you fixing it for M6 ?
I'd like M6 to include all our fixes for classpath issues (either JDT or PDE). On JDT front, all our available fixes are in.

Comment 19 Wassim Melhem

2007-03-21 10:51:07 EDT

I plan to take a look/fix for M6.  It came into our bucket a bit late though.

Comment 20 Wassim Melhem

2007-03-21 15:49:55 EDT

Brian, upon shutting down and restarting, the access rules and their order returned should be exactly the same since the workspace/target has not changed.

can you verify that the rules are being unnecessarily reordered?  Thanks.

Comment 21 Dani Megert

2007-03-22 06:40:41 EDT

Hi Brian,

here are the easy reproducible steps:
1. start fresh workspace using I20070321-1800
2. import ALL plug-ins and fragments as binary projects
3. wait until workspace has been built
4. exit
5. restart
==> full build

Here's how I launch my workspace from the command line in case that matters:
C:\JavaSDKs\jdk1.5.0_10\bin\java -showversion -Xms50M -Xmx350M -Dosgi.clean=true -jar plugins\org.eclipse.equinox.launcher_*.jar -update -debug -keyring c:\eclipse\.keyring -application org.eclipse.ui.ide.workbench -showlocation -data c:\eclipse\workspaces\tmpx   1>log.txt 2>&1

Comment 22 Philipe Mulet

2007-03-26 12:48:19 EDT

I saw this bug again today, self-hosting on 3.3M6.
Looks like the target of the bug should be moved to M7.

Comment 23 Jerome Lanneluc

2007-03-27 04:56:55 EDT

After several attempts following steps in comment #21, I was not able to reproduce the full build problem. Dani, when you follow steps in comment #21 (assuming you are running on a plain 3.3M6), do you see the full build problem on every attempt ?

Comment 24 Dani Megert

2007-03-27 05:00:19 EDT

The problem is apparently still there as you can see from comment 22.

I will try again later today. Did you try with the command line args I provided? NOTE: I do not start eclipse.exe.

Comment 25 Jerome Lanneluc

2007-03-27 05:14:12 EDT

I agree that the problem is still there. I'm just trying to find reliable steps to reproduce the problem so that Brian can debug his code.

Yes, I used the same command line as you provided.

Comment 26 Dani Megert

2007-03-27 11:00:50 EDT

I can reproduce it with fresh 3.3 M6.

HOWEVER, I found out something important: the problem goes away if I disable my other (linked) locations that come in via 'links' folder.

So, try this:
1. install fresh M6 into c:\eclipse\drops ==> c:\eclipse\drops\eclipse
2. rename c:\eclipse\drops\eclipse to c:\eclipse\drops\3.3_M6
3. download and unzip the attached bug.zip to c:\
   (verify that your install gets a 'links' directory)
4. continue with comment 21

Philippe, are you also using additional locations / 'links' folder?

Comment 27 Dani Megert

2007-03-27 11:02:15 EDT

Created attachment 62107 [details]
ZIP over M6

Comment 28 Philipe Mulet

2007-03-27 11:35:02 EDT

Re: comment 26.

I wasn't using other locations/links folder.

Comment 29 Philipe Mulet

2007-03-27 11:36:18 EDT

My impression was that the behavior was VM specific (I often switched between VMs), but this could be a red herring.

Comment 30 Dani Megert

2007-03-28 04:49:26 EDT

Could anyone now reproduce the problem with my latest steps?

Comment 31 Jerome Lanneluc

2007-03-28 07:00:42 EDT

(In reply to comment #30)
> Could anyone now reproduce the problem with my latest steps?
I just tried it with your links folder but I was not able to reproduce the problem.

Comment 32 Dani Megert

2007-03-28 07:19:36 EDT

That's really strange. Can you verify the setup by checking the PDE Target location and see whether the additional location has been recognized?

Comment 33 Jerome Lanneluc

2007-03-28 07:29:32 EDT

I already did :-) And yes, the additional location was there.

Comment 34 Dani Megert

2007-03-28 08:51:16 EDT

Jerome, do you have a T60 (I have a T43)? It might be a timing issue: Markus also cannot reproduce it on his T60. Brian, can you give it a shot?

Comment 35 Jerome Lanneluc

2007-03-28 08:53:46 EDT

I have a T41p.

Comment 36 Brian Bauman

2007-03-29 09:24:26 EDT

Sorry about not replying earlier.  I tried it yesterday with the linked content provided by Dani and unfortunately was not able to reproduce the problem.  For the record I have a T42p and was running with the IBM 1.5 JVM.

Comment 37 Dani Megert

2007-03-29 09:28:23 EDT

Can you advise on where to set breakpoints to track down the problem on my side?

Comment 38 Dani Megert

2007-04-02 08:27:05 EDT

I can now also reproduce it on a different machine, using the complete contents of my custom location. I've sent it to Brian, Jerome and Wassim via e-mail because it contains information that I cannot share here. Please try to reproduce.

Comment 39 Wassim Melhem

2007-04-02 08:59:20 EDT

Dani,

I was unable to reproduce, but I am a bit confused about this scenario. 

If the entire workspace is made up of binary plug-ins, what does a full build do?

Comment 40 Dani Megert

2007-04-02 09:01:42 EDT

This was just to find a small scenario. The full build also happens on my dev workspace with hundreds of source plug-ins, which is very bad.

Comment 41 Jerome Lanneluc

2007-04-02 09:29:15 EDT

I was able to reproduce with the custom location that Dani sent.

Comment 42 Dani Megert

2007-04-02 09:48:27 EDT

I'm so happy :-)

Comment 43 Wassim Melhem

2007-04-02 10:21:32 EDT

ok, I think I may have been able to reproduce in a runtime workbench, which is good.

Jerome, what are some minimal jdt/core debug flags that would show me the before-after classpaths that would cause a rebuild?

Comment 44 Jerome Lanneluc

2007-04-02 10:26:04 EDT

Running with the classpath resolution tracing on (org.eclipse.jdt.core/debug/cpresolution=true), I see several instances of "missbehaving container" where the access rules on restart are in a different order than the access rules given during the first session.

E.g. while initializing "org.eclipse.pde.core.requiredPlugins" for "org.eclipse.ltk.ui.refactoring", the classpath entry for "/org.eclipse.core.resources" has the following rules in different order:

- pattern=org/eclipse/core/internal/resources/refresh/win32/* (DISCOURAGED | IGNORE IF BETTER)
- pattern=org/eclipse/core/internal/indexing/* (DISCOURAGED | IGNORE IF BETTER)

Is it because org/eclipse/core/internal/resources/refresh/win32 is a fragment ?

Comment 45 Jerome Lanneluc

2007-04-02 10:27:10 EDT

(In reply to comment #43)
> ok, I think I may have been able to reproduce in a runtime workbench, which is
> good.
> 
> Jerome, what are some minimal jdt/core debug flags that would show me the
> before-after classpaths that would cause a rebuild?
> 
Enabling this flag should show you the before-after classpaths: org.eclipse.jdt.core/debug/cpresolution=true

Comment 46 Wassim Melhem

2007-04-02 13:16:23 EDT

An update:

State#getVisiblePackages() is returning a different order of packages coming from the two fragments of core.resources.

We need to figure out now if PDE is adding/removing fragments from the state to cause the runtime to return a different order, or if the order coming out of the runtime state is inconsistent.
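The hypothesis above can be illustrated with a toy model in which a host's visible packages are its own exports followed by each attached fragment's exports, in attachment order. This is a hypothetical sketch, not the Equinox `State` API. Assigning `org.eclipse.core.internal.indexing` to the compatibility fragment and the `refresh.win32` package to the win32 fragment is an assumption chosen to match the access rules quoted in comment 9.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of the visible packages contributed by one host; not the Equinox State API.
public class FragmentAttachOrderSketch {

    // Host exports first, then each fragment's exports in the order the
    // fragments were attached to the host.
    public static List<String> visiblePackages(List<String> hostExports,
                                               Map<String, List<String>> fragmentExports,
                                               List<String> attachOrder) {
        List<String> result = new ArrayList<>(hostExports);
        for (String fragment : attachOrder) {
            result.addAll(fragmentExports.get(fragment));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> host = Arrays.asList("org.eclipse.core.resources.team");
        Map<String, List<String>> fragments = new HashMap<>();
        fragments.put("org.eclipse.core.resources.compatibility",
                Arrays.asList("org.eclipse.core.internal.indexing"));
        fragments.put("org.eclipse.core.resources.win32",
                Arrays.asList("org.eclipse.core.internal.resources.refresh.win32"));

        // Same host and fragments, attached in two different orders.
        List<String> beforeExit = visiblePackages(host, fragments, Arrays.asList(
                "org.eclipse.core.resources.win32", "org.eclipse.core.resources.compatibility"));
        List<String> afterRestart = visiblePackages(host, fragments, Arrays.asList(
                "org.eclipse.core.resources.compatibility", "org.eclipse.core.resources.win32"));

        System.out.println(beforeExit.equals(afterRestart)); // prints "false"
    }
}
```

If PDE turns this package order into access rules one-to-one, any nondeterminism in fragment attachment surfaces directly as a reordered classpath.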

Comment 47 Thomas Watson

2007-04-02 15:24:45 EDT

Created attachment 62699 [details]
patch

This patch does an insertion sort for the unresolved bundles when in devmode.  As long as the bundle ids stay the same for the BundleDescriptions in PDE this should help give a consistent ordering for fragment resolution.

The theory is that fragments are getting resolved in different orders by PDE.  This is causing getVisiblePackages to return the packages exported by two or more fragments attached to the same host in different orders.

Performance:  This is probably not the fastest way to sort the unresolved bundles list, but it was the easiest to implement to test out the possible fix.  Need to keep an eye on performance with large target platform sets.
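A minimal sketch of the idea behind this first patch, assuming a simplified stand-in for `BundleDescription`: each newly added unresolved bundle is inserted at its sorted position by bundle id, so the resolver later visits fragments in a deterministic order. `BundleDesc` and `insertSorted` are hypothetical names; this is not the code in attachment 62699.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of keeping unresolved bundles ordered by bundle id.
public class InsertionByIdSketch {

    // Simplified stand-in for a BundleDescription (illustration only).
    public static final class BundleDesc {
        public final String symbolicName;
        public final long bundleId;

        public BundleDesc(String symbolicName, long bundleId) {
            this.symbolicName = symbolicName;
            this.bundleId = bundleId;
        }
    }

    // Linear scan from the front: insert before the first element with a larger
    // id, which keeps the list sorted by id (equal ids keep arrival order).
    public static void insertSorted(List<BundleDesc> unresolved, BundleDesc desc) {
        int i = 0;
        while (i < unresolved.size() && unresolved.get(i).bundleId <= desc.bundleId) {
            i++;
        }
        unresolved.add(i, desc);
    }

    public static void main(String[] args) {
        List<BundleDesc> unresolved = new ArrayList<>();
        // Added out of id order; the list still ends up sorted by id.
        insertSorted(unresolved, new BundleDesc("org.eclipse.core.resources.win32", 182));
        insertSorted(unresolved, new BundleDesc("org.eclipse.core.resources.compatibility", 181));
        for (BundleDesc d : unresolved) {
            System.out.println(d.bundleId + " " + d.symbolicName);
        }
    }
}
```

Each insertion scans up to the whole list, so adding n bundles is O(n²) in the worst case, which is the performance concern raised for large target platforms.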

Comment 48 Wassim Melhem

2007-04-02 15:37:40 EDT

Tom, I verified that a full build occurs without the patch, and no false build occurs with the patch.

Comment 49 Thomas Watson

2007-04-02 17:32:17 EDT

Moving this to Equinox->Framework.  I'm not sure there is a consistent way for PDE to fix this.  I think it needs to be done in the resolver.

Comment 50 Thomas Watson

2007-04-03 09:46:30 EDT

Created attachment 62774 [details]
patch

New patch that optimizes the case where bundles are added with incrementing bundle ids (the typical scenario).
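One plausible shape for that optimization, sketched under the same hypothetical `BundleDesc` stand-in (not the code in attachment 62774): check the last element first and append in O(1) when ids arrive in increasing order, otherwise walk backwards to the insertion point.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: sorted insertion by bundle id with a fast path for the
// common case where ids arrive in increasing order.
public class AppendFastPathSketch {

    // Simplified stand-in for a BundleDescription (illustration only).
    public static final class BundleDesc {
        public final String symbolicName;
        public final long bundleId;

        public BundleDesc(String symbolicName, long bundleId) {
            this.symbolicName = symbolicName;
            this.bundleId = bundleId;
        }
    }

    public static void insertSorted(List<BundleDesc> unresolved, BundleDesc desc) {
        int size = unresolved.size();
        // Fast path: the new id is not smaller than the last id, so just append.
        if (size == 0 || unresolved.get(size - 1).bundleId <= desc.bundleId) {
            unresolved.add(desc);
            return;
        }
        // Slow path: the last id is larger; walk backwards while the previous
        // element's id is still larger than the new one.
        int i = size - 1;
        while (i > 0 && unresolved.get(i - 1).bundleId > desc.bundleId) {
            i--;
        }
        unresolved.add(i, desc);
    }

    public static void main(String[] args) {
        List<BundleDesc> unresolved = new ArrayList<>();
        long[] ids = {10, 11, 12, 5};
        for (long id : ids) {
            insertSorted(unresolved, new BundleDesc("bundle" + id, id));
        }
        for (BundleDesc d : unresolved) {
            System.out.println(d.bundleId);
        }
    }
}
```

Scanning from the end also keeps out-of-order arrivals cheap when they land near the tail, while equal ids still keep their arrival order.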

Comment 51 Wassim Melhem

2007-04-03 14:42:16 EDT

Tom, would you like me to work with the patch before you release it?

Comment 52 Thomas Watson

2007-04-03 14:59:08 EDT

Yes, please. I want to release the patch in the next 2 days. I have been running with the patch myself. It would be useful if others on the bug report could try it out in their environments as well.

Comment 53 Dani Megert

2007-04-04 02:18:46 EDT

>It would be useful if others on the bug report
>could try it out in their environments as well.
Just send me the patched plug-in(s).

Comment 54 Dani Megert

2007-04-04 10:54:43 EDT

I verified with the patched plug-in that the fix does NOT work. See attached debug log.

Comment 55 Dani Megert

2007-04-04 10:55:51 EDT

Created attachment 62934 [details]
Debug info from using I20070403-1110 + patched osgi plug-in from Tom

Comment 56 Thomas Watson

2007-04-04 15:20:16 EDT

I still see the error when using Dani's large test case. What I see is that the BundleDescription objects that PDE creates for the initial load use different bundle IDs (long) than on subsequent restarts. In the resolver we sort the BundleDescriptions according to their bundle ID values. On a restart it seems that the IDs use a different order than on the initial workspace load.

For example, we have two fragments to org.eclipse.core.resources:

org.eclipse.core.resources.compatibility
org.eclipse.core.resources.win32

On the initial workspace load I see these fragments get the following IDs

org.eclipse.core.resources.compatibility -> 181
org.eclipse.core.resources.win32 -> 182

On every restart after that, I see these fragments get the following IDs

org.eclipse.core.resources.compatibility -> 289
org.eclipse.core.resources.win32 -> 268

Notice that the id order is different from the initial load to subsequent restarts.  This change in id orders makes the patch in the resolver worthless because it still changes the order in which fragments will be attached to their hosts which will lead to exports from the fragments being in a different order.
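The flip can be reproduced with the exact IDs reported above. The sketch below sorts the two fragments by bundle id for each session; `IdOrderFlipSketch`, `BundleDesc` and `namesSortedById` are hypothetical illustration names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: sorting by bundle id is only as stable as the ids themselves.
public class IdOrderFlipSketch {

    // Simplified stand-in for a BundleDescription (illustration only).
    public static final class BundleDesc {
        public final String symbolicName;
        public final long bundleId;

        public BundleDesc(String symbolicName, long bundleId) {
            this.symbolicName = symbolicName;
            this.bundleId = bundleId;
        }
    }

    // Returns the symbolic names ordered by ascending bundle id.
    public static List<String> namesSortedById(List<BundleDesc> bundles) {
        List<BundleDesc> copy = new ArrayList<>(bundles);
        copy.sort((a, b) -> Long.compare(a.bundleId, b.bundleId));
        List<String> names = new ArrayList<>();
        for (BundleDesc d : copy) {
            names.add(d.symbolicName);
        }
        return names;
    }

    public static void main(String[] args) {
        // IDs as observed on the initial workspace load.
        List<BundleDesc> initialLoad = Arrays.asList(
                new BundleDesc("org.eclipse.core.resources.compatibility", 181),
                new BundleDesc("org.eclipse.core.resources.win32", 182));
        // IDs as observed on every restart afterwards.
        List<BundleDesc> restart = Arrays.asList(
                new BundleDesc("org.eclipse.core.resources.compatibility", 289),
                new BundleDesc("org.eclipse.core.resources.win32", 268));

        System.out.println(namesSortedById(initialLoad)); // prints "[org.eclipse.core.resources.compatibility, org.eclipse.core.resources.win32]"
        System.out.println(namesSortedById(restart));     // prints "[org.eclipse.core.resources.win32, org.eclipse.core.resources.compatibility]"
    }
}
```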

Comment 57 Thomas Watson

2007-04-04 17:34:35 EDT

Created attachment 62986 [details]
patch

Discussed this with Wassim.  Seems that sorting by bundle id is not good enough because PDE-UI may use different long IDs.  This patch sorts by Bundle-SymbolicName instead.

This patch works for me on Dani's large testcase.
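A minimal sketch of ordering by Bundle-SymbolicName instead, reusing the IDs from comment 56 to show that the result no longer depends on them. The names are hypothetical and this is not the code in attachment 62986; how the real patch breaks ties between bundles sharing a symbolic name is not stated here, so the sketch simply relies on `List.sort` being stable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: ordering by symbolic name is independent of the ids PDE assigns.
public class SymbolicNameOrderSketch {

    // Simplified stand-in for a BundleDescription (illustration only).
    public static final class BundleDesc {
        public final String symbolicName;
        public final long bundleId;

        public BundleDesc(String symbolicName, long bundleId) {
            this.symbolicName = symbolicName;
            this.bundleId = bundleId;
        }
    }

    // Sorts by symbolic name only. List.sort is stable, so bundles sharing a
    // symbolic name (e.g. several versions) keep their relative input order.
    public static List<String> namesSortedBySymbolicName(List<BundleDesc> bundles) {
        List<BundleDesc> copy = new ArrayList<>(bundles);
        copy.sort((a, b) -> a.symbolicName.compareTo(b.symbolicName));
        List<String> names = new ArrayList<>();
        for (BundleDesc d : copy) {
            names.add(d.symbolicName);
        }
        return names;
    }

    public static void main(String[] args) {
        List<BundleDesc> initialLoad = Arrays.asList(
                new BundleDesc("org.eclipse.core.resources.compatibility", 181),
                new BundleDesc("org.eclipse.core.resources.win32", 182));
        List<BundleDesc> restart = Arrays.asList(
                new BundleDesc("org.eclipse.core.resources.compatibility", 289),
                new BundleDesc("org.eclipse.core.resources.win32", 268));

        // Both sessions now yield the same fragment order.
        System.out.println(namesSortedBySymbolicName(initialLoad)
                .equals(namesSortedBySymbolicName(restart))); // prints "true"
    }
}
```

The trade-off is that symbolic names are stable across sessions while generated ids are not; ties between same-named bundles still fall back on input order in this sketch.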

Comment 58 Wassim Melhem

2007-04-04 20:57:19 EDT

I verified that the latest patch does not cause a rebuild on Dani's workspace.

Now I have to try it without the patch.  This scenario is so long.

Comment 59 Wassim Melhem

2007-04-05 08:10:27 EDT

ok, the patch is good.

You should go ahead and release it.

Comment 60 Thomas Watson

2007-04-05 09:06:28 EDT

Fix released to HEAD.

Comment 61 Dani Megert

2007-04-11 04:45:16 EDT

Verified in I20070410-1043.
Thanks guys!