Re: [smila-dev] Re: Problems with BinStorage

Hi Marius,

Could you please also implement a method that allows direct access to the File object, something like File getPhysicalFile(Id)? The blackboard requires it in order to provide direct attachment access.
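
A minimal sketch of such an accessor, assuming a flat file system layout with one file per id. The class name FlatFileBinStorage, the store method, and the use of String for the id are hypothetical and only serve the illustration; this is not the actual BinStorage API.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;

/** Hypothetical flat file store; assumes every id is a valid file name. */
final class FlatFileBinStorage {
    private final File root;

    FlatFileBinStorage(File root) {
        this.root = root;
    }

    /** Writes the attachment bytes to a file named after the id. */
    void store(String id, byte[] data) throws IOException {
        Files.createDirectories(root.toPath());
        Files.write(new File(root, id).toPath(), data);
    }

    /** Returns the file backing the id so that callers can read it directly. */
    File getPhysicalFile(String id) throws IOException {
        File file = new File(root, id);
        if (!file.isFile()) {
            throw new FileNotFoundException("No attachment stored for id " + id);
        }
        return file;
    }
}
```

Handing out the File ties callers to a file system backend, so a storage that is not backed by local files would need a different answer (for example a temporary copy).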


Thanks,
Dmitry

Marius Cimpean wrote:
Hi all

1. I completely restructured the binstorage bundles (not in SVN yet) from the design and architecture point of view; the persistence logic is still the same (flat file system). My intention was first to restructure the binstorage bundles (done) and then to improve the persistence mechanism (the back-end persistence logic), based on the discussion on this list. The test case reported by Daniel (a huge amount of data) makes changing the persistence mechanism a must; as Daniel described, it is currently implemented as a flat file system layout.

2. The wiki page for binstorage: http://wiki.eclipse.org/SMILA/Project_Concepts/Binary_Storage

Best regards,
Marius

----- Original Message ----- From: <smila-dev-request@xxxxxxxxxxx>
To: <smila-dev@xxxxxxxxxxx>
Sent: Tuesday, October 07, 2008 4:46 PM
Subject: smila-dev Digest, Vol 4, Issue 13


Today's Topics:

  1. Problems with BinStorage (Daniel.Stucky@xxxxxxxxxxx)
  2. Re: Problems with BinStorage (Dmitry Hazin)
  3. Re: Problems with BinStorage (Ivan Churkin)
  4. RE: Problems with BinStorage (Thomas Menzel)
  5. [Fatal Error] :1:1: Content is not allowed in prolog & search
     test (Marius Cimpean)
  6. Re: [Fatal Error] :1:1: Content is not allowed in prolog &
     search test (Ivan Churkin)
  7. Re: [Fatal Error] :1:1: Content is not allowed in prolog &
     search test (Ivan Churkin)
  8. AW: [smila-dev] RE: Problems with BinStorage
     (Daniel.Stucky@xxxxxxxxxxx)
  9. Oct. 22 Webinar: Ensuring Clean IP (Thomas Menzel)


----------------------------------------------------------------------

Message: 1
Date: Tue, 7 Oct 2008 13:53:09 +0200
From: <Daniel.Stucky@xxxxxxxxxxx>
Subject: [smila-dev] Problems with BinStorage
To: <smila-dev@xxxxxxxxxxx>
Message-ID:
<69D276452CD2904980D5B6AC33C9BE170D5FA118@xxxxxxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain; charset="us-ascii"

Hi all,

we ran some tests with a larger amount of data than in the usual
development cases to create some index dump files. The system performed
OK for about 2 hours, during which 20 index dump files (each about 10 MB)
were created. Creating the 21st file took about 30 minutes, and the 22nd
took 4 hours.

I assume that one cause of the decreasing performance is the
BinStorage. For every record attachment a folder in
workspace\.metadata\.plugins\org.eclipse.eilf.binstorage\storage\default
with one file is created. After 7 hours it contained 109295 files (754
MB) and 109298 folders. NTFS (and also most Linux file systems) is not
optimized for such a huge number of folders (or files) in ONE directory.

Remember that the goal is to index millions of documents! So we have to
change the behavior of BinStorage; it is a NO GO to store all documents
in one folder. I guess that the whole logic of BinStorage was programmed
by ourselves. Why did we do that? Aren't there any implementations
already available in the open source community? We should take a look
at how, for example, distributed file systems like Hadoop, or Lucene,
store their data. Or at least create a tree-like structure beneath
org.eclipse.eilf.binstorage\storage\default.
Of course this is all up for discussion.
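
One common way to get such a tree-like structure is to hash the id and use the leading hex digits of the hash as directory names. The sketch below assumes SHA-1 and two directory levels; the class name ShardedPaths and both method names are hypothetical, and nothing here is taken from the existing BinStorage code.

```java
import java.io.File;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper that spreads ids over a two-level directory tree. */
final class ShardedPaths {
    private ShardedPaths() {
    }

    /** Maps an id to root/ab/cd/<sha1-hex>, where "abcd" starts the hash. */
    static File resolve(File root, String id) {
        String hex = sha1Hex(id);
        File level1 = new File(root, hex.substring(0, 2));
        File level2 = new File(level1, hex.substring(2, 4));
        return new File(level2, hex);
    }

    /** Returns the SHA-1 digest of the id as 40 lowercase hex characters. */
    static String sha1Hex(String id) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(id.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
```

Two levels of 256 directories give 65,536 leaf folders, so one million evenly distributed ids would average about 15 files per folder, and no single directory holds more than 256 subfolders.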

BTW: there is currently no documentation for BinStorage available in the
Eclipse wiki. This should be added by the responsible developers.

Bye,
Daniel


------------------------------

Message: 2
Date: Tue, 07 Oct 2008 19:04:35 +0700
From: Dmitry Hazin <dhazin@xxxxxxxxxxxx>
Subject: Re: [smila-dev] Problems with BinStorage
To: Smila project developer mailing list <smila-dev@xxxxxxxxxxx>
Message-ID: <48EB5053.5010301@xxxxxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi,

There was a discussion about a BinStorage redesign some time ago, in which
this problem was discussed too.
The discussion started here:
http://dev.eclipse.org/mhonarc/lists/smila-dev/msg00084.html
So I think BinStorage should already be in the process of being redesigned.
Thanks,
Dmitry



------------------------------

Message: 3
Date: Tue, 07 Oct 2008 19:06:20 +0700
From: Ivan Churkin <ivan@xxxxxxxxxxxx>
Subject: Re: [smila-dev] Problems with BinStorage
To: Smila project developer mailing list <smila-dev@xxxxxxxxxxx>
Message-ID: <48EB50BC.7090006@xxxxxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Daniel

I have voted many times for rewriting the binstorage components (with a
clearer and simpler API).
See the "[smila-dev] binstorage redesign" thread.
IMHO, there are too many bundles and services for such an ordinary
component. And I don't like the content of those bundles...
--
Ivan


------------------------------

Message: 4
Date: Tue, 7 Oct 2008 14:05:48 +0200
From: Thomas Menzel <tmenzel@xxxxxxx>
Subject: [smila-dev] RE: Problems with BinStorage
To: Smila project developer mailing list <smila-dev@xxxxxxxxxxx>
Message-ID:
<6CDC32AFFBA5AA4B8BEA6397594F76BD1FA018F74D@xxxxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain; charset="us-ascii"

Hi Marius,

Can you take this into account? I totally agree with Daniel on this subject.

It also reflects on the discussion we had earlier about whether or not to mimic a file system. One train of thought was that the bin storage should create folders on its own and that the user/admin should not need to take care of this.

I support this idea as long as it applies to this performance problem. At the same time, I maintain that the bin storage also needs to give the client a folder view if the client wants to take care of this or has advanced partitioning needs. However, it should not be possible for a client to traverse the internal folder structure that the bin storage owns and needs in order to meet the performance requirements.

Also keep in mind that this only applies to bin storages backed by the local file system and might not be needed by other underlying storages.
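
The separation described above could be sketched as a client-facing interface that only knows logical paths, with each backend free to choose its own internal layout. The names BinStore and InMemoryBinStore and all method signatures are hypothetical; the in-memory backend stands in for any storage that is not backed by the local file system and therefore needs no internal folders at all.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical client contract: callers see logical paths only. */
interface BinStore {
    void put(String logicalPath, byte[] data);

    /** Returns a copy of the stored bytes, or null if nothing is stored. */
    byte[] get(String logicalPath);

    /** Lists all logical paths below the given logical folder. */
    List<String> list(String logicalFolder);
}

/** Backend without a file system; it has no internal folder structure. */
final class InMemoryBinStore implements BinStore {
    private final Map<String, byte[]> entries = new TreeMap<>();

    @Override
    public void put(String logicalPath, byte[] data) {
        entries.put(logicalPath, data.clone());
    }

    @Override
    public byte[] get(String logicalPath) {
        byte[] data = entries.get(logicalPath);
        return data == null ? null : data.clone();
    }

    @Override
    public List<String> list(String logicalFolder) {
        String prefix = logicalFolder.endsWith("/") ? logicalFolder : logicalFolder + "/";
        List<String> result = new ArrayList<>();
        for (String path : entries.keySet()) {
            if (path.startsWith(prefix)) {
                result.add(path);
            }
        }
        return result;
    }
}
```

A file system backend could implement the same interface and map each logical path to an internal location of its choosing, without ever exposing that location through list.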

Kind regards
Thomas Menzel @ brox IT-Solutions GmbH




------------------------------

Message: 5
Date: Tue, 7 Oct 2008 15:09:42 +0300
From: "Marius Cimpean" <marius.cimpean@xxxxxxxxxxx>
Subject: [smila-dev] [Fatal Error] :1:1: Content is not allowed in
prolog & search test
To: <smila-dev@xxxxxxxxxxx>
Message-ID: <019B1208876346FF9186F18D95447B60@MariusNUMERICA>
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
reply-type=original

Hi

I just did an SVN update and ran the local builds, then started the tests
(ran the EILF and search tests).

There are two unexpected behaviors:
1. the search page does not return any results

2. the EILF console displays the following message when closing the app:
"[Fatal Error] :1:1: Content is not allowed in prolog."

I guess that "preparing the bundles for checking-in" causes this error
message ("[Fatal Error] :1:1: Content is not allowed in prolog"). It may be
that some files (xml, xsd ...) were changed in some special text editor
(UTF-8 BOM issue), so that we finally end up with a parser error.
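
If the BOM guess is right, one way to check or work around it is to strip a leading UTF-8 BOM (bytes EF BB BF) before the content reaches the parser. This is only a sketch under that assumption; the class name BomSkipper and the method skipUtf8Bom are hypothetical, and whether a BOM actually triggers the error depends on how the file is read and decoded.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

/** Hypothetical helper that removes a leading UTF-8 byte order mark. */
final class BomSkipper {
    private BomSkipper() {
    }

    /** Returns a stream positioned after the UTF-8 BOM, if one is present. */
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pushback = new PushbackInputStream(in, 3);
        byte[] head = new byte[3];
        int read = 0;
        // Read up to three bytes; a single read call may return fewer.
        while (read < 3) {
            int n = pushback.read(head, read, 3 - read);
            if (n < 0) {
                break;
            }
            read += n;
        }
        boolean hasBom = read == 3
                && (head[0] & 0xff) == 0xEF
                && (head[1] & 0xff) == 0xBB
                && (head[2] & 0xff) == 0xBF;
        if (!hasBom && read > 0) {
            // No BOM: put the bytes back so that the caller sees them again.
            pushback.unread(head, 0, read);
        }
        return pushback;
    }
}
```

The returned stream can be passed to the XML parser in place of the original one.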

Does anyone else see these two behaviors?

Best Regards,
Marius




------------------------------

Message: 6
Date: Tue, 07 Oct 2008 19:21:07 +0700
From: Ivan Churkin <ivan@xxxxxxxxxxxx>
Subject: Re: [smila-dev] [Fatal Error] :1:1: Content is not allowed in
prolog & search test
To: Smila project developer mailing list <smila-dev@xxxxxxxxxxx>
Message-ID: <48EB5433.5060300@xxxxxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi Marius,

I guess it's my fault :(.
I am making massive changes to generated code. Will fix it soon.
--
Ivan


------------------------------

Message: 7
Date: Tue, 07 Oct 2008 19:37:22 +0700
From: Ivan Churkin <ivan@xxxxxxxxxxxx>
Subject: Re: [smila-dev] [Fatal Error] :1:1: Content is not allowed in
prolog & search test
To: Smila project developer mailing list <smila-dev@xxxxxxxxxxx>
Message-ID: <48EB5802.5050909@xxxxxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

> There are two unexpected behaviors:
> 1. the search page does not return any results

That was because a new record filter, "workflow-object", became required,
but this was not reflected in the configuration.
It has been fixed.

> 2. the EILF console displays following message : "[Fatal Error] :1:1:
> Content is not allowed in prolog."
> when closing the app.

That was because I made changes to generated code and one commit was wrong
:(. It was fixed recently.

--
Regards, Ivan






------------------------------

Message: 8
Date: Tue, 7 Oct 2008 15:02:15 +0200
From: <Daniel.Stucky@xxxxxxxxxxx>
Subject: AW: [smila-dev] RE: Problems with BinStorage
To: <smila-dev@xxxxxxxxxxx>
Message-ID:
<69D276452CD2904980D5B6AC33C9BE170D5FA1AA@xxxxxxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"

Hi Marius,

Could you please add your (updated) concept for BinStorage to http://wiki.eclipse.org/SMILA/Project_Concepts so that we have a common base for further discussion?

Thanks.
Daniel




------------------------------

Message: 9
Date: Tue, 7 Oct 2008 15:42:53 +0200
From: Thomas Menzel <tmenzel@xxxxxxx>
Subject: [smila-dev] Oct. 22 Webinar: Ensuring Clean IP
To: Smila project developer mailing list <smila-dev@xxxxxxxxxxx>
Message-ID:
<6CDC32AFFBA5AA4B8BEA6397594F76BD1FA018F76A@xxxxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain; charset="us-ascii"

http://www.eclipse.org/newsportal/article.php?id=1834&group=eclipse.foundation

Kind regards
Thomas Menzel @ brox IT-Solutions GmbH


------------------------------

_______________________________________________
smila-dev mailing list
smila-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/smila-dev


End of smila-dev Digest, Vol 4, Issue 13
****************************************



