[smila-dev] RE: Problems with BinStorage

hi marius,

could you take this into account? I fully agree with Daniel on this subject.

it also ties into the discussion we had earlier about whether or not to mimic a file system.
one train of thought was that the bin storage should create folders on its own, so that the user/admin does not need to take care of this.

I support this idea as long as it applies to this performance problem. at the same time I maintain that the bin storage also needs to give a folder view to the client if the client wants to take care of this or has advanced partitioning needs. however, it should not be possible for a client to traverse the internal folder structure that the bin storage owns and needs in order to meet the performance requirements.
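
To make the separation concrete, here is a minimal sketch of a client-facing folder view that never reveals the internal layout. It is only an illustration, not how BinStorage works today: the class `BinStorageView`, its methods, and the in-memory dictionary standing in for a real index are all hypothetical, and the SHA-1 based internal path is just one assumed way to place files.

```python
import hashlib
from pathlib import PurePosixPath
from typing import Dict, List


class BinStorageView:
    """Hypothetical facade: clients see logical folders, never the internal layout."""

    def __init__(self) -> None:
        # Logical path -> internal relative path (assumption: kept in memory here).
        self._index: Dict[str, str] = {}

    def put(self, logical_path: str) -> None:
        """Register a blob under a client-chosen logical path."""
        key = str(PurePosixPath("/") / logical_path.lstrip("/"))
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        # Internal placement is owned by the storage and never returned to clients.
        self._index[key] = f"{digest[:2]}/{digest[2:4]}/{digest}"

    def list_folder(self, folder: str) -> List[str]:
        """Return the direct children of a logical folder, sorted by name."""
        prefix = str(PurePosixPath("/") / folder.lstrip("/")).rstrip("/") + "/"
        children = set()
        for key in self._index:
            if key.startswith(prefix):
                children.add(key[len(prefix):].split("/", 1)[0])
        return sorted(children)
```

A client can partition its data however it likes (for example `customers/a/doc1`), and listing a folder only ever returns logical names; the hashed directories stay an implementation detail of the storage.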

also keep in mind that this only applies to bin storages backed by the local file system and might not be needed by other underlying storages.

Kind regards
Thomas Menzel @ brox IT-Solutions GmbH


-----Original Message-----
From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of Daniel.Stucky@xxxxxxxxxxx
Sent: Tuesday, 7 October 2008 13:53
To: smila-dev@xxxxxxxxxxx
Subject: [smila-dev] Problems with BinStorage

Hi all,

we did some tests with a larger amount of data than in the usual
development cases to create some index dump files. The system performed
ok for about 2 hours, where 20 index dump files (each about 10 MB) were
created. The creation of the 21st file took about 30 min, and the 22nd
took 4 hours.

I assume that one of the causes of the decreasing performance is the
BinStorage. For every record attachment, a folder containing one file is
created in
workspace\.metadata\.plugins\org.eclipse.eilf.binstorage\storage\default.
After 7 hours it contained 109295 files (754 MB) and 109298 folders.
NTFS (like most Linux filesystems) is not optimized for such a huge
number of folders (or files) in ONE directory.

Remember that the goal is to index millions of documents! So we have to
change the behavior of BinStorage; it is a NO GO to store all documents
in one folder. I guess that the whole logic of BinStorage was programmed
by ourselves. Why did we do that? Aren't there any implementations
already available in the open source community? We should take a look
at how, for example, distributed filesystems like Hadoop, or Lucene,
store their data. Or at least create a tree-like structure beneath
org.eclipse.eilf.binstorage\storage\default.
Of course this is all up for discussion.
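
As a starting point for that discussion, a tree-like structure could be derived from a hash of the record id, roughly as sketched below. This is an assumption about one possible layout, not the existing BinStorage code: the function names `sharded_path` and `store`, the use of SHA-1, and the two levels of two hex characters are all chosen for illustration only.

```python
import hashlib
from pathlib import Path


def sharded_path(root: Path, record_id: str, levels: int = 2, width: int = 2) -> Path:
    """Map a record id to root/ab/cd/<digest> so files spread over a directory tree."""
    digest = hashlib.sha1(record_id.encode("utf-8")).hexdigest()
    # Use the leading hex characters of the digest as nested directory names.
    parts = [digest[i * width:(i + 1) * width] for i in range(levels)]
    return root.joinpath(*parts, digest)


def store(root: Path, record_id: str, data: bytes) -> Path:
    """Write data under the sharded path, creating the directories on demand."""
    path = sharded_path(root, record_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)
    return path
```

With two levels of two hex characters, no directory holds more than 256 subdirectories, and there are 256 * 256 = 65536 leaf directories. If the hash distributes ids evenly, one million attachments would come to roughly 15 files per leaf directory instead of one million entries in a single folder.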

BTW: there is currently no documentation for BinStorage available in the
eclipse wiki. This should be added by the responsible developers.

Bye,
Daniel
_______________________________________________
smila-dev mailing list
smila-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/smila-dev

