Bug 521695 - Initial project sync generates sync job for each file
Status: NEW
Alias: None
Product: PTP
Classification: Tools
Component: RDT.sync
Version: 9.1.3
Hardware: PC Linux
Importance: P3 normal
Target Milestone: ---
Assignee: Project Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks: 521693
Reported: 2017-08-31 10:23 EDT by Kaloyan Raev CLA
Modified: 2017-09-01 16:03 EDT

Description Kaloyan Raev CLA 2017-08-31 10:23:49 EDT
We already have Synchronized Projects integrated for PHP projects. Now I am trying to do some post-processing in the Synchronized PHP Project wizard after the initial sync. See bug 521693.

I found that I can implement the ISyncListener interface. It would have worked great if the handleSyncEvent() method was not called thousands of times - once for each synched file!

My expectation is that the handleSyncEvent() method is called only once after the initial sync is complete.

I did some debugging and found that the org.eclipse.ptp.internal.rdt.sync.ui.ResourceChangeListener class triggers a new sync job for each metadata file created by the wizard and each file that is synched from the remote location.

This not only blocks my improvement, but is also quite wasteful and hurts IDE performance.

How about having SynchronizedJob extend WorkspaceJob instead of Job? I guess this would avoid notifying the platform of resource changes during the initial sync.
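
To illustrate the idea, here is a minimal plain-Java sketch of why batching matters. It is a toy model, not Eclipse or PTP code: `ToyWorkspace`, `fileChanged()` and `runBatched()` are hypothetical names invented for this example. It only assumes that a batched operation delivers its changes to listeners once, at the end, instead of once per file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Toy model of a workspace that can batch change notifications (not Eclipse API). */
class ToyWorkspace {
    private final List<Consumer<List<String>>> listeners = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private int batchDepth = 0;

    void addListener(Consumer<List<String>> listener) {
        listeners.add(listener);
    }

    /** Records a file change; notifies immediately unless a batch is open. */
    void fileChanged(String path) {
        pending.add(path);
        if (batchDepth == 0) {
            flush();
        }
    }

    /** Runs an operation and delivers all of its changes as one notification. */
    void runBatched(Runnable operation) {
        batchDepth++;
        try {
            operation.run();
        } finally {
            batchDepth--;
            if (batchDepth == 0) {
                flush();
            }
        }
    }

    /** Hands the accumulated changes to every listener, then clears them. */
    private void flush() {
        if (pending.isEmpty()) {
            return;
        }
        List<String> delta = new ArrayList<>(pending);
        pending.clear();
        for (Consumer<List<String>> listener : listeners) {
            listener.accept(delta);
        }
    }
}
```

In this model, a listener that starts a sync on every notification is called once per file when the files are written directly, but only once when the same writes happen inside `runBatched()`.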
Comment 1 John Eblen CLA 2017-08-31 11:53:13 EDT
Yes, I remember dealing with this problem in the core sync code. IIRC, I added quite a bit of logic that checks the status of the git repo and syncs only when necessary.

Would this approach work in your case? Could you add some logic so that most calls to your handler do nothing or very little?

If the initial sync is downloading many files that aren't actually needed, file filtering can help, especially to avoid large files, which can greatly increase the time for the initial sync.

Finally, a solution that avoids notifying the platform for certain changes might work but would need to be careful not to mask changes that sync needs to know about.
Comment 2 Kaloyan Raev CLA 2017-09-01 02:39:49 EDT
(In reply to John Eblen from comment #1)
> Would this approach work in your case? Could you add some logic so that most
> calls to your handler do nothing or very little?

I tried, but could not find a reliable way. I need to check if a file with name 'composer.json' is present after the initial sync. I can raise a flag if I get a notification and 'composer.json' is already present. But I also need to use a synchronized block to ensure that my logic is executed only once. Also, in the scenario where 'composer.json' is not present in the project, I cannot determine which is the last notification and won't be able to unregister my listener. All of the above amplify the existing performance issue, and I want to avoid it.
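
For reference, the run-once part of such a workaround could look roughly like the sketch below. `RunOnceOnFile` and `onSyncEvent()` are hypothetical names, not PTP API; the assumption is that `onSyncEvent()` would be called from `handleSyncEvent()`. It uses an `AtomicBoolean` in place of a synchronized block. It does not solve the case where 'composer.json' never appears: the guard then never fires and nothing tells the listener when to unregister.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Hypothetical helper: runs an action at most once, the first time a file is seen. */
class RunOnceOnFile {
    private final AtomicBoolean done = new AtomicBoolean(false);
    private final BooleanSupplier filePresent; // e.g. checks for composer.json
    private final Runnable action;             // the post-processing step

    RunOnceOnFile(BooleanSupplier filePresent, Runnable action) {
        this.filePresent = filePresent;
        this.action = action;
    }

    /** Returns true only for the single call that actually ran the action. */
    boolean onSyncEvent() {
        // Cheap exit for the thousands of calls that have nothing to do.
        if (done.get() || !filePresent.getAsBoolean()) {
            return false;
        }
        // Only one thread can win this transition, so the action runs once.
        if (!done.compareAndSet(false, true)) {
            return false;
        }
        action.run();
        return true;
    }
}
```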

> If the initial sync is downloading many files that aren't actually needed,
> file filtering can help, especially to avoid large files, which can greatly
> increase the time for the initial sync.

PHP projects (and web projects in general) can easily have thousands or tens of thousands of files. The time for the initial sync is not a problem: ~10-15 seconds for a few thousand files from a very distant remote location, which is an astounding speed compared to the typical FTP/SFTP approach, which may take hours for this use case.

The performance issue is in the IDE responsiveness due to the flood of extra sync jobs.

> Finally, a solution that avoids notifying the platform for certain changes
> might work but would need to be careful not to mask changes that sync needs
> to know about.

Could you elaborate on this? I want to understand more about what resource change notifications are actually required by the initial sync.
Comment 3 John Eblen CLA 2017-09-01 15:39:54 EDT
(In reply to Kaloyan Raev from comment #2)
> I tried, but could not find a reliable way. I need to check if a file with
> name 'composer.json' is present after the initial sync. I can raise a flag
> if I get a notification and 'composer.json' is already present. But I need
> to also use a synchronized block to ensure that my logic is executed only
> once. Also, in the scenario where 'composer.json' is not present in the
> project, I cannot determine which is the last notification and won't be able
> to unregister my listener. All of the above amplify the existing performance
> issue and I want to avoid it.

This might be more of a conceptual/design problem. The system doesn't really define an "initial sync" (a construct that groups all of these events under one umbrella). Thus, it would be difficult in any case to know when it is complete.

Rather, it sounds like you're trying to recognize some state transition that depends on the presence of the "composer.json" file. My hunch is that there is another solution not involving the handler, such as checking for the file periodically or in response to a user request.

Currently, the SyncEvent passed to the handler is empty. So another solution might be to populate it with information such as the delta of file changes.
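
A rough sketch of what a populated event could carry, assuming a set of changed project-relative paths. `DeltaSyncEvent`, `getChangedPaths()` and `touches()` are hypothetical names for illustration only and do not exist in PTP:

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical event type that carries the paths changed by a sync. */
final class DeltaSyncEvent {
    private final Set<String> changedPaths;

    DeltaSyncEvent(Set<String> changedPaths) {
        // Defensive copy so later changes by the caller do not leak into the event.
        this.changedPaths = Collections.unmodifiableSet(new LinkedHashSet<>(changedPaths));
    }

    Set<String> getChangedPaths() {
        return changedPaths;
    }

    /** True if any changed path is the given file name or ends in "/" plus that name. */
    boolean touches(String fileName) {
        for (String path : changedPaths) {
            if (path.equals(fileName) || path.endsWith("/" + fileName)) {
                return true;
            }
        }
        return false;
    }
}
```

A handler could then return immediately unless `touches("composer.json")` is true, so most calls would do very little.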

> > Finally, a solution that avoids notifying the platform for certain changes
> > might work but would need to be careful not to mask changes that sync needs
> > to know about.
> 
> Could you elaborate on this? I want to understand more about what resource
> change notifications are actually required by the initial sync.

I was just responding to your idea of suppressing notifications by changing the parent of SynchronizedJob. I simply wanted to emphasize that such a solution could be difficult. I haven't looked into it, though.
Comment 4 Kaloyan Raev CLA 2017-09-01 16:03:36 EDT
Let me explain again the issue I opened this bug for.

I create a new synchronized project to a remote location with a large number of files. The single sync job that transfers all files from the remote location to the local project triggers (via the ResourceChangeListener) a new, unnecessary sync job for each of the copied files to sync them back from the local project to the remote location.

If this issue did not exist, I would have been able to resolve bug 521693 by using ISyncListener. There are other alternatives to resolve 521693, but they are not as elegant. I prefer to have this issue fixed in PTP rather than implement workarounds.