RE: [smila-dev] Lucene indexing performance
FYI, here's a short update on the Lucene indexing test with SMILA, which is still running:
In the first hour, 85,000 docs were indexed.
In the second hour, approx. 65,000 were indexed, which makes 150,000 in total.
Now, after 7 hours, 380,000 docs are indexed, an average of roughly 55,000/hour.
Not sure how this will go on, but I think we have to do something...
BTW, in a test scenario without SMILA, it took 175 h to index the 25 million docs with Lucene.
(That's roughly 140,000 docs/hour.)
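The rates behind these figures can be re-derived with a few lines of Java. This is only a sanity check on the numbers quoted in this message; the class and method names are made up for illustration.

```java
class ThroughputCheck {
    // Average documents per hour for a given number of docs and hours.
    static long docsPerHour(long docs, double hours) {
        return Math.round(docs / hours);
    }

    public static void main(String[] args) {
        // All inputs are the figures quoted in this message.
        System.out.println("hour 1:        " + docsPerHour(85_000, 1));
        System.out.println("hour 2:        " + docsPerHour(150_000 - 85_000, 1));
        System.out.println("hours 3-7 avg: " + docsPerHour(380_000 - 150_000, 5));
        System.out.println("7 h avg:       " + docsPerHour(380_000, 7));
        System.out.println("plain Lucene:  " + docsPerHour(25_000_000, 175));
    }
}
```

Note that the 55,000/hour figure is the average over all 7 hours (about 54,300). Hours 3 to 7 alone averaged 46,000/hour, so the current rate is already below that average.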
Best regards,
Andreas
> -----Original Message-----
> From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of
> Daniel.Stucky@xxxxxxxxxxx
> Sent: Wednesday, 13 May 2009 13:31
> To: smila-dev@xxxxxxxxxxx
> Subject: [smila-dev] Lucene indexing performance
>
> Hi all,
>
> during an index build (over 150,000 documents) we noticed that indexing
> speed drops as the index grows. Compared to the first hour of
> execution, the second hour managed only 80% of the load that was
> indexed in the first hour.
>
> I took a look at the Lucene integration code (by brox) and found that
> for each index update (add or delete) a new IndexWriter is created and
> closed. This ensures that the document is committed for IndexReaders and
> that the index is flushed, but I guess it's bad for performance.
>
> What were the reasons for implementing it that way? Wouldn't it be
> possible to reuse an IndexWriter, flushing the index based either on
> memory usage or on the number of documents added/deleted?
>
> Bye,
> Daniel
> _______________________________________________
> smila-dev mailing list
> smila-dev@xxxxxxxxxxx
> https://dev.eclipse.org/mailman/listinfo/smila-dev
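
The idea Daniel raises (keep one writer open and commit after a number of updates rather than after each one) could look roughly like the sketch below. It is not the SMILA or brox code. `IndexSink`, `BatchingIndexer` and `CountingSink` are hypothetical names, and `IndexSink` is only a stand-in for a long-lived Lucene `IndexWriter`, so that the example runs without Lucene on the classpath. The batch size of 1000 is an arbitrary assumption.

```java
import java.util.ArrayList;
import java.util.List;

class BatchingDemo {
    public static void main(String[] args) {
        CountingSink sink = new CountingSink();
        BatchingIndexer indexer = new BatchingIndexer(sink, 1000);
        for (int i = 0; i < 2500; i++) {
            indexer.add("doc-" + i);
        }
        indexer.flush(); // commit the last partial batch
        System.out.println(sink.commits + " commits for " + sink.docs.size() + " docs");
    }
}

/** Hypothetical stand-in for a long-lived index writer (not a Lucene type). */
interface IndexSink {
    void add(String doc);
    void delete(String id);
    void commit(); // make pending changes visible to readers
}

/** Reuses a single sink and commits once every batchSize updates. */
class BatchingIndexer {
    private final IndexSink sink;
    private final int batchSize;
    private int pending = 0;

    BatchingIndexer(IndexSink sink, int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.sink = sink;
        this.batchSize = batchSize;
    }

    void add(String doc) {
        sink.add(doc);
        afterUpdate();
    }

    void delete(String id) {
        sink.delete(id);
        afterUpdate();
    }

    /** Commits whatever is still pending, e.g. at the end of a crawl. */
    void flush() {
        if (pending > 0) {
            sink.commit();
            pending = 0;
        }
    }

    private void afterUpdate() {
        pending++;
        if (pending >= batchSize) {
            sink.commit();
            pending = 0;
        }
    }
}

/** In-memory sink that only counts commits, for demonstration purposes. */
class CountingSink implements IndexSink {
    final List<String> docs = new ArrayList<>();
    int commits = 0;

    public void add(String doc) { docs.add(doc); }
    public void delete(String id) { docs.remove(id); }
    public void commit() { commits++; }
}
```

With these numbers the demo performs 3 commits for 2500 documents instead of 2500. The trade-off is that readers see new documents only after the next commit, and updates not yet committed can be lost if the process dies, so the batch size would have to be chosen with that in mind.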