Community
Participate
Working Groups
This recurring error should be fixed by RC1. When an application failed to deploy due to errors in publish operation, for example: - wrong buildpack The module cannot be deleted because in DeleteModulesOperation, module deletion is stopped due to this check: if (appModule == null || !appModule.isDeployed()) { continue; } This check is in place for another issue: Bug 485228 We need to better distinguish preventing deleting of existing app on CF when cancelling deployment, which the above fixes, compared to a user explicitly deleting a module. Cleaning the server does not fix the problem
To duplicate this error, add/remove or drag/drop a new application onto a CF server, and in the deployment wizard enter an incorrect buildpack. Once CF shows an error, try deleting the module. The module appears to be deleted in the Servers view, but when deploying the application again, the deployment wizard does not open and the application is auto-deployed with the wrong buildpack. The user does not have the option to change the buildpack on any subsequent publish even if they deleted the module from the view (but clearly not from the server).
A possible cause for this error is that if an error occurs in a publish operation, and the application is successfully created in CF before the error occurs, the publish state of the module is still UNKNOWN even though the application exists in CF. To find the cause of the error, trace where module publish state is set to UNKNOWN when an error occurs during any publish operation. If the application already exists in CF when publish error occurs, the publish state should reflect the state of the app in CF (e.g. STOP, START, etc..).
I take a look at the problem. Based on my investigation, the problem is indeed coming from apppModule.isDeployed() return false. However, the problem is not coming from the module status is unknown but the module does not exist at all. In org.eclipse.cft.server.core.internal.client.CloudFoundryApplicationModule.isDeployed(), it has: exists() && getState() != IServer.STATE_UNKNOWN exists() is false and I also check the actual CF server and the application itself does not exist in the system. Therefore, the cause of the problem is the cloud server in Eclipse still keeps the cloud module around on the server that is out of sync with the actual server itself. Therefore, the fix is to remove the cloud module application from the server instance to match that the actual CF server does not really have that application exist.
Code released to master
Thanks for looking at this. I also took a look, but I came up with a different solution I believe generally the problem is what you mentioned, that locally we are holding on to the module even if app does not exist, so Server is out of synch with CF. But I believe the code that does this is this internal "cache" of modules that are being deployed. See ModuleCache, which tags and untags a module during publish to prevent a module that is being added from also being deleted by a parallel process. The problem is that if error occurs during publish, this cache does not "untag" the module that was being added, resulting in the cache holding on to the module, whether the app was created in CF or not. So next time a user tries to delete the module, when addRemoveModules(..) is called on CloudFoundryServer, the logic that checks ModuleCache will prevent the module from being deleted from the cache, as it is still being tagged as "in deployment" I believe a suitable fix is: 1. Rename the "tag/untag" methods in ModuleCache to better reflect what they really are doing, which is prevent a module that is being added from accidentally being deleted by a parallel process in: org.eclipse.cft.server.core.internal.CloudFoundryServer.addAndDeleteModules() A better rename for these methods I think would be: tagAsUndeployed(IModule module) -> tagAddModuleInProgress(IModule) tagAsDeployed(IModule module) -> tagAddModuleCompleted(IModule) isUndeployed(IModule) -> isAddingModule(IModule) 2. When CoreException is handled in AbstractPublishApplicationOperation: Here we do an update of modules, as expected, but we also forget to untag the module from the cache. The module should be untagged any time a publish operation terminates, whether it successfully published the module, or it failed. I believe here is the actual error. I was able to verify that if we do a proper untag of module in step two above (so in the error handling code in AbstractPublishApplicationOperation) , the Server stays correctly in synch with Cloud Foundry: A. If publish failed, but app was created, the "update modules" step in error handler correctly creates the module in Server B. If publish failed, but app was not created, the "update modules" step in error handler correctly removes the module in Server
My solution was in error handling in AbstractPublishApplicationOperation: catch (CoreException e) { // Untag module from cache getBehaviour().getCloudFoundryServer().tagAsCompleted(getModule()); // Proceed with regular update module getBehaviour().operations().updateModule(getModule()).run(monitor); ... }
My original fix was based on the fact that the cloud application has been created in the PushApplication operation so it should be cleaned up before leaving the method if the application has not been created yet. Based on the latest findings on comment #5, there is a better way to fix the problem that will cover the another failing scenario as described in the comment. Therefore, I am reopening this and I am going to roll back my change. On a offline discussion with Nieraj, we also need to similar module update operation on the operation cancel case as well since I notice that cancelling the publish operation may cause the same out of sync problem as well.
Pushed the fix to master. Now modules are being cleaned from the internal cache both when: 1. Operation is cancelled 2. Any errors handled during publish operation Also added a test case to test the fix for condition 2 above. Testing cancellation reliably is a bit more challenging than testing for errors during publish, and requires additional testing harness support for asynch execution so it will be omitted.