Send smila-dev mailing list submissions to
smila-dev@xxxxxxxxxxx
To subscribe or unsubscribe via the World Wide Web, visit
https://dev.eclipse.org/mailman/listinfo/smila-dev
or, via email, send a message with subject or body 'help' to
smila-dev-request@xxxxxxxxxxx
You can reach the person managing the list at
smila-dev-owner@xxxxxxxxxxx
When replying, please edit your Subject line so it is more specific
than "Re: Contents of smila-dev digest..."
Today's Topics:
1. Problems with BinStorage (Daniel.Stucky@xxxxxxxxxxx)
2. Re: Problems with BinStorage (Dmitry Hazin)
3. Re: Problems with BinStorage (Ivan Churkin)
4. RE: Problems with BinStorage (Thomas Menzel)
5. [Fatal Error] :1:1: Content is not allowed in prolog & search
test (Marius Cimpean)
6. Re: [Fatal Error] :1:1: Content is not allowed in prolog &
search test (Ivan Churkin)
7. Re: [Fatal Error] :1:1: Content is not allowed in prolog &
search test (Ivan Churkin)
8. AW: [smila-dev] RE: Problems with BinStorage
(Daniel.Stucky@xxxxxxxxxxx)
9. Oct. 22 Webinar: Ensuring Clean IP (Thomas Menzel)
----------------------------------------------------------------------
Message: 1
Date: Tue, 7 Oct 2008 13:53:09 +0200
From: <Daniel.Stucky@xxxxxxxxxxx>
Subject: [smila-dev] Problems with BinStorage
To: <smila-dev@xxxxxxxxxxx>
Message-ID:
<69D276452CD2904980D5B6AC33C9BE170D5FA118@xxxxxxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain; charset="us-ascii"
Hi all,
We ran some tests with a larger amount of data than in the usual
development cases to create some index dump files. The system performed
OK for about 2 hours, during which 20 index dump files (each about
10 MB) were created. Creating the 21st file took about 30 minutes, the
22nd 4 hours.
I assume that one cause of the decreasing performance is the
BinStorage. For every record attachment, a folder containing one file is
created in
workspace\.metadata\.plugins\org.eclipse.eilf.binstorage\storage\default.
After 7 hours it contained 109295 files (754 MB) and 109298 folders.
NTFS (like most Linux filesystems) is not optimized for such a huge
number of folders (or files) in ONE directory. Remember that the goal
is to index millions of documents! So we have to change the behavior of
BinStorage; it is a NO GO to store all documents in one folder. I guess
that the whole logic of BinStorage was programmed by ourselves. Why did
we do that? Aren't there any implementations already available in the
open source community? We should take a look at how, for example,
distributed filesystems like Hadoop's, or Lucene, store their data. Or
at least create a tree-like structure beneath
org.eclipse.eilf.binstorage\storage\default.
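One way to picture such a tree-like structure is to derive a few nested
subfolders from a hash of the attachment id, so that no single directory
grows without bound. The following is only a minimal sketch, not the
BinStorage implementation: the class name ShardedPathSketch, the method
pathFor, the two-level layout and the choice of SHA-256 are all
assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: maps an attachment id to a two-level folder tree. */
final class ShardedPathSketch {

    private ShardedPathSketch() {
    }

    /**
     * Returns root/aa/bb/<hex digest>, where aa and bb are the first two
     * bytes of the SHA-256 digest of the id, written as lowercase hex.
     */
    static Path pathFor(Path root, String attachmentId) {
        String hex = sha256Hex(attachmentId);
        return root.resolve(hex.substring(0, 2))
                   .resolve(hex.substring(2, 4))
                   .resolve(hex);
    }

    private static String sha256Hex(String value) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] bytes = digest.digest(value.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(bytes.length * 2);
            for (byte b : bytes) {
                // Mask to 0..255 so that every byte becomes exactly two hex digits.
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 must be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The id format below is made up for the demo.
        Path root = Paths.get("storage", "default");
        System.out.println(pathFor(root, "record-42/attachment-1"));
    }
}
```

With two hex levels there are 65536 leaf folders, so one million
attachments would average roughly 15 files per leaf folder, assuming the
hash spreads the ids evenly.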
Of course this is all up for discussion.
BTW: there is currently no documentation for BinStorage available in the
Eclipse wiki. This should be added by the responsible developers.
Bye,
Daniel
------------------------------
Message: 2
Date: Tue, 07 Oct 2008 19:04:35 +0700
From: Dmitry Hazin <dhazin@xxxxxxxxxxxx>
Subject: Re: [smila-dev] Problems with BinStorage
To: Smila project developer mailing list <smila-dev@xxxxxxxxxxx>
Message-ID: <48EB5053.5010301@xxxxxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi,
There was a discussion about a BinStorage redesign some time ago, and
this problem came up there too.
The discussion started here:
http://dev.eclipse.org/mhonarc/lists/smila-dev/msg00084.html
So I think the BinStorage redesign should already be in progress.
Thanks,
Dmitry
_______________________________________________
smila-dev mailing list
smila-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/smila-dev
------------------------------
Message: 3
Date: Tue, 07 Oct 2008 19:06:20 +0700
From: Ivan Churkin <ivan@xxxxxxxxxxxx>
Subject: Re: [smila-dev] Problems with BinStorage
To: Smila project developer mailing list <smila-dev@xxxxxxxxxxx>
Message-ID: <48EB50BC.7090006@xxxxxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Daniel
I have voted many times for rewriting the binstorage components (with a
clearer and simpler API).
See the "[smila-dev] binstorage redesign" thread.
IMHO, there are too many bundles and services for such an ordinary
component. And I don't like the content of those bundles...
--
Ivan
------------------------------
Message: 4
Date: Tue, 7 Oct 2008 14:05:48 +0200
From: Thomas Menzel <tmenzel@xxxxxxx>
Subject: [smila-dev] RE: Problems with BinStorage
To: Smila project developer mailing list <smila-dev@xxxxxxxxxxx>
Message-ID:
<6CDC32AFFBA5AA4B8BEA6397594F76BD1FA018F74D@xxxxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain; charset="us-ascii"
Hi Marius,
can you take this into account? I totally agree with Daniel on this
subject. It also ties in with the discussion we had earlier about
whether or not to mimic a file system.
One train of thought was that the bin storage should create folders on
its own and that the user/admin should not need to take care of this.
I support this idea as long as it applies to this performance problem.
At the same time I maintain that the bin storage also needs to give a
folder view to the client if the client wants to take care of this or
has advanced partitioning needs. However, it should not be possible for
a client to traverse the internal folder structure that the bin storage
owns and needs in order to meet the performance requirements.
Also keep in mind that this only applies to bin storages backed by the
local file system and might not be needed by other underlying storages.
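The separation described above, a client-visible folder view on top of
an internal layout that only the storage knows about, could be sketched
roughly as follows. This is an illustration under assumptions, not the
SMILA API: the interface BinStoreSketch, both implementing classes and
the UUID-based bucket scheme are hypothetical, and the logical index is
kept in memory only to keep the example short.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical client view: logical folders and names, no physical paths. */
interface BinStoreSketch {
    void store(String folder, String name, byte[] data);

    /** Returns the stored bytes, or null if nothing is stored under that name. */
    byte[] fetch(String folder, String name);

    SortedSet<String> list(String folder);
}

/** File-system backend: spreads files over internal bucket directories. */
final class FileBinStoreSketch implements BinStoreSketch {
    private final Path root;
    // folder -> (name -> internal id); in memory here, so not persistent.
    private final Map<String, Map<String, String>> index = new ConcurrentHashMap<>();

    FileBinStoreSketch(Path root) {
        this.root = root;
    }

    // Internal layout root/<first two id chars>/<id>; private on purpose,
    // so clients cannot traverse it.
    private Path physical(String id) {
        return root.resolve(id.substring(0, 2)).resolve(id);
    }

    @Override
    public void store(String folder, String name, byte[] data) {
        String id = index.computeIfAbsent(folder, f -> new ConcurrentHashMap<>())
                         .computeIfAbsent(name, n -> UUID.randomUUID().toString());
        try {
            Path file = physical(id);
            Files.createDirectories(file.getParent());
            Files.write(file, data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public byte[] fetch(String folder, String name) {
        String id = index.getOrDefault(folder, Collections.emptyMap()).get(name);
        if (id == null) {
            return null;
        }
        try {
            return Files.readAllBytes(physical(id));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    @Override
    public SortedSet<String> list(String folder) {
        return new TreeSet<>(index.getOrDefault(folder, Collections.emptyMap()).keySet());
    }
}

/** Non-file backend: the same client view, but no bucket layout is needed. */
final class MemoryBinStoreSketch implements BinStoreSketch {
    private final Map<String, Map<String, byte[]>> blobs = new ConcurrentHashMap<>();

    @Override
    public void store(String folder, String name, byte[] data) {
        blobs.computeIfAbsent(folder, f -> new ConcurrentHashMap<>()).put(name, data.clone());
    }

    @Override
    public byte[] fetch(String folder, String name) {
        byte[] found = blobs.getOrDefault(folder, Collections.emptyMap()).get(name);
        return found == null ? null : found.clone();
    }

    @Override
    public SortedSet<String> list(String folder) {
        return new TreeSet<>(blobs.getOrDefault(folder, Collections.emptyMap()).keySet());
    }
}
```

In this sketch a client only ever sees the logical folder and name it
passed in. The bucket directories exist solely in the file-system
backend, and the in-memory backend shows that another underlying storage
can do without them.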
Kind regards
Thomas Menzel @ brox IT-Solutions GmbH
------------------------------
Message: 5
Date: Tue, 7 Oct 2008 15:09:42 +0300
From: "Marius Cimpean" <marius.cimpean@xxxxxxxxxxx>
Subject: [smila-dev] [Fatal Error] :1:1: Content is not allowed in
prolog & search test
To: <smila-dev@xxxxxxxxxxx>
Message-ID: <019B1208876346FF9186F18D95447B60@MariusNUMERICA>
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
reply-type=original
Hi
I just did an SVN update and ran the local builds, then started the
tests (ran the EILF and search tests).
There are two unexpected behaviors:
1. the search page does not return any results
2. the EILF console displays the following message when closing the app:
"[Fatal Error] :1:1: Content is not allowed in prolog."
I guess that "preparing the bundles for checking-in" causes this error
message ("[Fatal Error] :1:1: Content is not allowed in prolog"). It may
be that some files (xml, xsd ...) got changed in some special text
editor (UTF-8 BOM issue), so that we finally end up with a parser error.
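A leading UTF-8 byte order mark (the bytes EF BB BF) is a well-known
trigger for "Content is not allowed in prolog" when the XML reaches the
parser as already decoded text. One quick way to test this hypothesis
would be to scan the changed files for that prefix. The sketch below is
only an illustration: the class BomCheckSketch and its method names are
hypothetical, and it checks for the UTF-8 BOM only, not for other
encodings.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: reports whether content starts with a UTF-8 BOM. */
final class BomCheckSketch {
    private static final int[] UTF8_BOM = {0xEF, 0xBB, 0xBF};

    private BomCheckSketch() {
    }

    static boolean startsWithUtf8Bom(byte[] content) {
        if (content.length < UTF8_BOM.length) {
            return false;
        }
        for (int i = 0; i < UTF8_BOM.length; i++) {
            if ((content[i] & 0xFF) != UTF8_BOM[i]) {
                return false;
            }
        }
        return true;
    }

    static boolean startsWithUtf8Bom(Path file) throws IOException {
        byte[] head = new byte[UTF8_BOM.length];
        try (InputStream in = Files.newInputStream(file)) {
            int read = 0;
            // read() may return fewer bytes than requested, so loop.
            while (read < head.length) {
                int n = in.read(head, read, head.length - read);
                if (n < 0) {
                    return false; // file is shorter than a BOM
                }
                read += n;
            }
        }
        return startsWithUtf8Bom(head);
    }

    public static void main(String[] args) throws IOException {
        // Usage: java BomCheckSketch file1.xml file2.xsd ...
        for (String arg : args) {
            Path file = Paths.get(arg);
            System.out.println(file + ": "
                    + (startsWithUtf8Bom(file) ? "UTF-8 BOM found" : "no UTF-8 BOM"));
        }
    }
}
```

A file that reports a BOM would be a candidate for re-saving without
one. A clean result would point to some other cause of the parser error.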
Does anyone else see these two behaviors?
Best Regards,
Marius
------------------------------
Message: 6
Date: Tue, 07 Oct 2008 19:21:07 +0700
From: Ivan Churkin <ivan@xxxxxxxxxxxx>
Subject: Re: [smila-dev] [Fatal Error] :1:1: Content is not allowed in
prolog & search test
To: Smila project developer mailing list <smila-dev@xxxxxxxxxxx>
Message-ID: <48EB5433.5060300@xxxxxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Hi Marius,
I guess it's my fault :(.
I am making massive changes to generated code. Will fix it soon.
--
Ivan
------------------------------
Message: 7
Date: Tue, 07 Oct 2008 19:37:22 +0700
From: Ivan Churkin <ivan@xxxxxxxxxxxx>
Subject: Re: [smila-dev] [Fatal Error] :1:1: Content is not allowed in
prolog & search test
To: Smila project developer mailing list <smila-dev@xxxxxxxxxxx>
Message-ID: <48EB5802.5050909@xxxxxxxxxxxx>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
> There are two unexpected behaviors:
> 1. the search page does not return any results
This was because a new record filter, "workflow-object", became required
but this was not reflected in the configuration.
It has been fixed.
> 2. the EILF console displays following message : "[Fatal Error] :1:1:
> Content is not allowed in prolog."
> when closing the app.
This is because I made changes to generated code and one commit was
wrong :(. It was fixed recently.
--
Regards, Ivan
------------------------------
Message: 8
Date: Tue, 7 Oct 2008 15:02:15 +0200
From: <Daniel.Stucky@xxxxxxxxxxx>
Subject: AW: [smila-dev] RE: Problems with BinStorage
To: <smila-dev@xxxxxxxxxxx>
Message-ID:
<69D276452CD2904980D5B6AC33C9BE170D5FA1AA@xxxxxxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain; charset="iso-8859-1"
Hi Marius,
could you please add your (updated) concept for BinStorage to
http://wiki.eclipse.org/SMILA/Project_Concepts so that we have a common
base for further discussion.
Thanks.
Daniel
------------------------------
Message: 9
Date: Tue, 7 Oct 2008 15:42:53 +0200
From: Thomas Menzel <tmenzel@xxxxxxx>
Subject: [smila-dev] Oct. 22 Webinar: Ensuring Clean IP
To: Smila project developer mailing list <smila-dev@xxxxxxxxxxx>
Message-ID:
<6CDC32AFFBA5AA4B8BEA6397594F76BD1FA018F76A@xxxxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain; charset="us-ascii"
http://www.eclipse.org/newsportal/article.php?id=1834&group=eclipse.foundation
Kind regards
Thomas Menzel @ brox IT-Solutions GmbH
------------------------------
End of smila-dev Digest, Vol 4, Issue 13
****************************************