Bug 234597 - Deleting lots of files takes forever
Summary: Deleting lots of files takes forever
Status: RESOLVED WONTFIX
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Resources
Version: 3.4
Hardware: All All
Importance: P3 major
Target Milestone: ---
Assignee: Platform-Resources-Inbox CLA
QA Contact:
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2008-05-29 08:17 EDT by Dani Megert CLA
Modified: 2019-04-14 11:34 EDT
CC List: 7 users

See Also:


Attachments
YourKit 7.0.12 snapshot (1.29 MB, application/zip)
2008-05-29 08:57 EDT, Martin Aeschlimann CLA
profiler screenshot (132.04 KB, image/png)
2019-04-14 07:32 EDT, Michael Keppler CLA

Description Dani Megert CLA 2008-05-29 08:17:15 EDT
3.4 RC2

I wanted to delete a folder with many files. The UI froze completely (not able to cancel) and I had to kill Eclipse after 10 minutes.

Deleting the same folder via Explorer takes less than 5 s.


There are two problems here:
1. I cannot cancel and have to kill Eclipse (critical)
2. The time it takes (major)
Comment 1 Dani Megert CLA 2008-05-29 08:19:02 EDT
Steps:

1. Check out org.eclipse.jdt.ui.tests.refactoring
2. Delete the folder 'resources' via the Package Explorer
Comment 2 Martin Aeschlimann CLA 2008-05-29 08:56:27 EDT
Problem 1 is bug 175733, which we planned to fix for 3.4 but weren't able to because we ran out of time.

When I profiled the scenario, it took over 10 minutes to finish.

All the time was spent in IResource.delete(..).
A hot spot seems to be AbstractDataTreeNode.assembleWith(...).

Comment 3 Martin Aeschlimann CLA 2008-05-29 08:57:42 EDT
Created attachment 102625
YourKit 7.0.12 snapshot
Comment 4 John Arthorne CLA 2008-05-29 09:55:11 EDT
Szymon, I suggest looking at the profiler trace. Most of the time seems to be spent in the CVS EclipseSynchronizer#prepareForDeletion. In particular, it seems to call IWorkspaceRoot#getProjects(...) a very large number of times. Some simple caching here may give a big performance win.
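A minimal sketch of the kind of caching suggested above, assuming the set of projects does not change while a single deletion is being prepared. `ProjectLookupCache` is a hypothetical name, and the `Supplier` merely stands in for the repeated `IWorkspaceRoot#getProjects(...)` call; this is not the actual CVS `EclipseSynchronizer` code and does not use the Eclipse API.

```java
import java.util.List;
import java.util.function.Supplier;

/**
 * Hypothetical helper: memoizes an expensive lookup (standing in for
 * IWorkspaceRoot#getProjects(...)) for the duration of one batch operation.
 */
public final class ProjectLookupCache<T> {
    private final Supplier<List<T>> lookup; // the expensive call being cached
    private List<T> cached;                 // null until first use or after invalidate()
    private int misses;                     // how many times the lookup actually ran

    public ProjectLookupCache(Supplier<List<T>> lookup) {
        this.lookup = lookup;
    }

    /** Returns the cached result, running the underlying lookup only on a miss. */
    public List<T> get() {
        if (cached == null) {
            cached = List.copyOf(lookup.get()); // immutable snapshot of the result
            misses++;
        }
        return cached;
    }

    /** Drops the cached value, e.g. when the batch finishes or projects change. */
    public void invalidate() {
        cached = null;
    }

    public int misses() {
        return misses;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        ProjectLookupCache<String> cache = new ProjectLookupCache<>(() -> {
            calls[0]++;
            return List.of("projectA", "projectB");
        });
        // Simulate preparing 10,000 files for deletion, each needing the project list.
        for (int i = 0; i < 10_000; i++) {
            cache.get();
        }
        System.out.println("underlying lookups: " + calls[0]); // prints "underlying lookups: 1"
    }
}
```

Scoping the cache to one batch and calling `invalidate()` afterwards keeps the win for the per-file calls while avoiding stale results once projects are added or removed.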
Comment 5 Szymon Brandys CLA 2015-04-01 09:44:07 EDT
I am no longer involved in Platform Team/Compare development.
Comment 6 Eclipse Genie CLA 2019-03-27 02:21:25 EDT
This bug hasn't had any activity in quite some time. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet.

If you have further information on the current state of the bug, please add it. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

--
The automated Eclipse Genie.
Comment 7 Dani Megert CLA 2019-03-27 11:32:30 EDT
I could not test the steps from comment 1 with CVS, as the repository is no longer available. I tested with Git instead:

8 s to check preconditions
2 min 50 s to delete it

Deleting it directly on the Windows 7 file system takes 40 s.

So it's not as bad as with CVS, but still not good. Most of the time seems to be spent updating the internal history.
Comment 8 Michael Keppler CLA 2019-04-14 07:32:52 EDT
Created attachment 278274
profiler screenshot

I've run the same use case through a profiler (deleting the resources directory from a workspace backed by Git), and there is not much that can be done. In my case, 57 seconds are spent on the deletion (after checking preconditions), of which 50 seconds go to local history. Unfortunately, local history requires moving the files to be deleted into new directories, and Windows is not particularly fast at those operations.

All the typical optimizations seem to be in place, e.g.
* I see no duplicate calls to file system methods
* streams are buffered

Since this seems to be bound by the IO throughput of the hardware, the only optimization I can come up with would be to trade IO operations for memory, e.g. store all the history blob changes in memory and only write them back to disk in a background job after the operation has finished. For the above case, that would allow deferring 14 of the 50 seconds until after the delete operation. However, this optimization would be somewhat dangerous regarding file consistency, it might not scale to huge delete operations, and it would require synchronizing any other concurrent access to the history. I think we should not even try that.
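For illustration only, a sketch of the IO-for-memory trade-off described above, which this comment recommends against implementing. `DeferredHistoryWriter`, its methods, and the one-file-per-blob layout are hypothetical and do not reflect how Eclipse's local history store is actually organized. The code comments mark where the risks named above show up: blobs exist only in memory until the background write finishes, memory use grows with the size of the delete, and any other reader of the history would have to coordinate with the pending buffer.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch: history blobs are buffered in memory during a delete
 * and written to disk later by a single background thread.
 */
public final class DeferredHistoryWriter implements AutoCloseable {
    private final Path historyDir;
    // Grows with the number of deleted files: the scaling risk for huge deletes.
    private final Map<String, byte[]> pending = new LinkedHashMap<>();
    private final ExecutorService background = Executors.newSingleThreadExecutor();

    public DeferredHistoryWriter(Path historyDir) {
        this.historyDir = historyDir;
    }

    /** Called once per deleted file: copies the bytes into memory, no disk IO. */
    public synchronized void record(String blobName, byte[] contents) {
        pending.put(blobName, contents.clone());
    }

    /** Blobs held only in memory; they are lost if the process dies now. */
    public synchronized int pendingCount() {
        return pending.size();
    }

    /** Starts writing the buffered blobs; the delete can return before this completes. */
    public Future<?> flushInBackground() {
        Map<String, byte[]> batch;
        // Snapshot and clear under the lock so record() can continue concurrently.
        // Other readers of the history would not see these blobs until written.
        synchronized (this) {
            batch = new LinkedHashMap<>(pending);
            pending.clear();
        }
        return background.submit(() -> {
            for (Map.Entry<String, byte[]> e : batch.entrySet()) {
                try {
                    Files.write(historyDir.resolve(e.getKey()), e.getValue());
                } catch (IOException ex) {
                    // The batch was already cleared, so a failure here loses history.
                    throw new UncheckedIOException(ex);
                }
            }
        });
    }

    @Override
    public void close() throws InterruptedException {
        background.shutdown();
        background.awaitTermination(1, TimeUnit.MINUTES);
    }
}
```

Even in this reduced form, correctness depends on the flush completing and on every history reader knowing about `pending`, which is the complexity the comment weighs against deferring part of the 50 seconds.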

I suggest closing WONTFIX.
Comment 9 Dani Megert CLA 2019-04-14 11:34:51 EDT
(In reply to Michael Keppler from comment #8)
Thanks Michael. I agree.