RE: [smila-dev] Lucene indexing performance

FYI, here's a short update on the still-running Lucene indexing test with SMILA:

In the first hour, 85,000 docs were indexed.
In the second hour, approx. 65,000 were indexed, which makes 150,000 in total.
Now, after 7 hours, 380,000 docs are indexed, an average of roughly 55,000/hour.

Not sure how this will go on, but I think we have to do something...

BTW, in a test scenario without SMILA, it took 175 h to index the 25 million docs with Lucene.
(That's about 140,000 docs/hour.)
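
For reference, the rates above can be recomputed from the quoted figures. This is only a small arithmetic sketch: the variable names are made up for illustration, and the per-hour split for hours 3 to 7 is derived from the totals, not measured separately.

```python
# Figures taken from the message above; nothing here is newly measured.
first_hour = 85_000          # docs indexed in hour 1
second_hour = 65_000         # docs indexed in hour 2 (approximate)
total_after_7h = 380_000     # docs indexed after 7 hours

# Average rate over the whole run so far.
avg_rate = total_after_7h / 7

# Implied average rate for hours 3 to 7 (derived, not measured).
later_rate = (total_after_7h - first_hour - second_hour) / 5

# Standalone Lucene test without SMILA: 25 million docs in 175 hours.
standalone_rate = 25_000_000 / 175

print(f"average over 7 h:        {avg_rate:,.0f} docs/hour")
print(f"implied for hours 3-7:   {later_rate:,.0f} docs/hour")
print(f"standalone Lucene:       {standalone_rate:,.0f} docs/hour")
print(f"standalone vs. average:  {standalone_rate / avg_rate:.1f}x")
```

So the implied rate for hours 3 to 7 is lower than the 7-hour average, which fits the slowdown seen between hours 1 and 2.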

Best regards,
 Andreas

> -----Original Message-----
> From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of
> Daniel.Stucky@xxxxxxxxxxx
> Sent: Wednesday, 13 May 2009 13:31
> To: smila-dev@xxxxxxxxxxx
> Subject: [smila-dev] Lucene indexing performance
> 
> Hi all,
> 
> during an index build (over 150,000 documents) we noticed that indexing
> speed gets slower as the index increases in size. In the 2nd hour of
> execution, only 80% of the load that was indexed in the first hour
> could be indexed.
> 
> I took a look at the Lucene integration code (by brox) and found that
> for each index update (add or delete) a new IndexWriter is created and
> closed. This ensures that the document is committed for IndexReaders and
> that the index is flushed, but I guess that it's bad for performance.
> 
> What were the reasons for implementing it that way? Wouldn't it be
> possible to reuse an IndexWriter, flushing the index based on either
> memory usage or the number of documents added/deleted?
> 
> Bye,
> Daniel
> _______________________________________________
> smila-dev mailing list
> smila-dev@xxxxxxxxxxx
> https://dev.eclipse.org/mailman/listinfo/smila-dev
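
The idea raised in the quoted message (keep one writer open and flush after a number of adds/deletes instead of opening and closing a writer per update) could look roughly like the sketch below. This is not the SMILA or brox integration code and not the Lucene API: `CountingWriter`, `BatchingIndexer`, `max_pending` and all method names are hypothetical stand-ins, and the memory-based trigger mentioned in the message is left out.

```python
class CountingWriter:
    """Hypothetical stand-in for a long-lived index writer; it only counts calls."""

    def __init__(self):
        self.added = 0
        self.deleted = 0
        self.commits = 0

    def add_document(self, doc):
        self.added += 1

    def delete_document(self, doc_id):
        self.deleted += 1

    def commit(self):
        self.commits += 1


class BatchingIndexer:
    """Reuses one writer and commits only after max_pending updates."""

    def __init__(self, writer, max_pending=100):
        self.writer = writer
        self.max_pending = max_pending  # assumed threshold, would need tuning
        self.pending = 0

    def add(self, doc):
        self.writer.add_document(doc)
        self._count_update()

    def delete(self, doc_id):
        self.writer.delete_document(doc_id)
        self._count_update()

    def _count_update(self):
        self.pending += 1
        if self.pending >= self.max_pending:
            self.flush()

    def flush(self):
        # Commit only if there is something uncommitted.
        if self.pending:
            self.writer.commit()
            self.pending = 0

    def close(self):
        # Make sure the last partial batch becomes visible too.
        self.flush()


writer = CountingWriter()
indexer = BatchingIndexer(writer, max_pending=100)
for i in range(250):
    indexer.add({"id": i})
commits_before_close = writer.commits
indexer.close()
print(writer.added, commits_before_close, writer.commits)
```

The trade-off is that documents in a not-yet-flushed batch are not visible to readers until the next commit, which the current one-writer-per-update approach avoids.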

