Bug 257211 - [dstore][performance] Get content of large directories in groups
Status: NEW
Alias: None
Product: Target Management
Classification: Tools
Component: RSE
Version: 3.0.2
Hardware: PC Windows XP
Importance: P3 enhancement
Target Milestone: ---
Assignee: dsdp.tm.rse-inbox
QA Contact: Martin Oberhuber
URL:
Whiteboard:
Keywords: performance
Depends on:
Blocks:
 
Reported: 2008-12-02 10:17 EST by Samuel Wu
Modified: 2008-12-02 10:51 EST
CC List: 1 user

See Also:


Description Samuel Wu 2008-12-02 10:17:56 EST
Getting the content of large remote directories (2000+ files/folders) can be
extremely slow, especially when the network is slow or the distance is large --
for example, some shops have developers in India connecting to servers
somewhere in North America.

One way to help with this is to get the content in "groups". For example, if a
directory contains more than a certain (configurable) number of children, then
we will divide the children into different buckets. The number of children per
bucket should also be configurable (e.g. 500 children per bucket, or 1000 per
bucket).

Folder1
   +-- 1..1000
   +-- 1001..2000
   +-- 2001+

Expanding a range will get the children within that range (bucket) only.

In the initial implementation, no actions will be allowed on the range nodes,
and the content of a range/bucket does NOT need to be auto-refreshed when the
content changes (i.e. files are deleted, added, etc.). For example, if a file
is deleted in the range 1..1000, then that range will hold only 999
files/folders; if a file is added instead, range 1..1000 will really hold 1001
files. A refresh will have to be issued on the parent directory in order to
re-fetch the buckets/ranges.
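The proposed grouping could be sketched as follows. This is a minimal illustration, not RSE code: `partition` and `label` are hypothetical helpers, and the bucket size of 1000 stands in for the configurable setting described above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Sketch: split a directory's child names into fixed-size buckets and
// compute the "start..end" label shown on each range node.
public class BucketSketch {

    // Sort the children (as the remote side would have to) and slice
    // them into buckets of at most bucketSize entries each.
    static List<List<String>> partition(List<String> children, int bucketSize) {
        List<String> sorted = new ArrayList<>(children);
        Collections.sort(sorted);
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i += bucketSize) {
            buckets.add(sorted.subList(i, Math.min(i + bucketSize, sorted.size())));
        }
        return buckets;
    }

    // Label a bucket with 1-based positions, e.g. "1..1000" or "2001..2500".
    static String label(int bucketIndex, int bucketSize, int total) {
        int start = bucketIndex * bucketSize + 1;
        int end = Math.min((bucketIndex + 1) * bucketSize, total);
        return start + ".." + end;
    }
}
```

Only the names in the expanded bucket would then be transferred; the other range nodes stay collapsed placeholders until the user opens them.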
Comment 1 Martin Oberhuber 2008-12-02 10:46:10 EST
I find the "bucket" idea problematic in terms of usability, and likely hard to implement. Before *any* further action on this, we need more exact data about how long it takes, and with what communication protocol. Samuel, I assume you are talking about dstore; can you try the same folders with SSH for comparison?

Note that in order to put items into buckets, you'll need to sort them on the remote side anyway. You can only do that with an agent-based solution like dstore (ssh or ftp won't guarantee the output to be sorted -- in which bucket should the user look?). So you'll need to query all names in the remote folder anyway.

Assuming 2000 names of 20 characters each, that's 40 KBytes to transfer. On a slow modem (14 kbps), transferring this data takes 30 seconds, which I personally find acceptable if it doesn't happen too often. I see 3 useful possibilities for improvement:

  1. "Stream" like transfer of directory contents, such that results 1-n can
     already be displayed in the viewer while the transfer is still ongoing.
     The proposed Java7 FileSystem APIs go that way for this kind of problem,
     and I'm in favor of adopting this for EFS as well.
  2. On "Refresh", don't throw away the entire contents while refresh is still
     ongoing, but allow user to continue working with old data. Replace data
     with new contents as it is received. There is an existing bug for this.
  3. On initial transfer, transfer the file names only but not the stat data
     (modtime, permissions). We *will* need to know isFile vs isFolder though, 
     so I'm not sure how much we gain from this.

Thoughts?