[rdf4j-dev] possible bug/performance bottleneck in nativestore bulk upload

I've been running some tests with uploading a fairly large (~10M triples) Turtle file to a new Native Store in a single transaction.
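For context, the test is basically just the following (paths made up, import paths per RDF4J 3.x):

  import java.io.File;

  import org.eclipse.rdf4j.IsolationLevels;
  import org.eclipse.rdf4j.repository.sail.SailRepository;
  import org.eclipse.rdf4j.repository.sail.SailRepositoryConnection;
  import org.eclipse.rdf4j.rio.RDFFormat;
  import org.eclipse.rdf4j.sail.nativerdf.NativeStore;

  public class BulkUploadTest {
      public static void main(String[] args) throws Exception {
          // fresh native store in a new data directory
          SailRepository repo = new SailRepository(new NativeStore(new File("/tmp/native-store")));
          try (SailRepositoryConnection conn = repo.getConnection()) {
              conn.begin(IsolationLevels.NONE); // bulk transaction, no isolation
              conn.add(new File("/data/large-dataset.ttl"), null, RDFFormat.TURTLE);
              conn.commit();
          } finally {
              repo.shutDown();
          }
      }
  }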

One of the things I've noticed is that fairly quickly it starts creating a temporary native store in my system's temp directory. This appears to be caused by the MemoryOverflowModel reaching its threshold and starting to sync to disk.
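For anyone not familiar with it, the pattern is roughly the following - a toy paraphrase of my own, with an assumed threshold and helper names, not the actual MemoryOverflowModel code:

  import java.io.File;
  import java.nio.file.Files;

  import org.eclipse.rdf4j.model.Model;
  import org.eclipse.rdf4j.model.Statement;
  import org.eclipse.rdf4j.model.impl.LinkedHashModel;
  import org.eclipse.rdf4j.repository.sail.SailRepository;
  import org.eclipse.rdf4j.repository.sail.SailRepositoryConnection;
  import org.eclipse.rdf4j.sail.nativerdf.NativeStore;

  // Statements are buffered in memory until free heap drops below a
  // threshold, then everything spills into a temporary NativeStore in the
  // system temp dir, and all further adds go to disk.
  public class OverflowSketch {

      private static final long MIN_FREE_BYTES = 64 * 1024 * 1024; // assumed threshold

      private final Model buffer = new LinkedHashModel();

      private SailRepository overflow;
      private SailRepositoryConnection overflowConn;

      public void add(Statement st) throws Exception {
          if (overflowConn == null && freeHeap() < MIN_FREE_BYTES) {
              // threshold hit: create the temp store and move the buffer into it
              File tempDir = Files.createTempDirectory("memory-overflow").toFile();
              overflow = new SailRepository(new NativeStore(tempDir));
              overflowConn = overflow.getConnection();
              overflowConn.begin();
              overflowConn.add(buffer);
              buffer.clear();
          }
          if (overflowConn != null) {
              overflowConn.add(st); // from here on, every add is a disk operation
          } else {
              buffer.add(st);
          }
      }

      public void close() {
          if (overflowConn != null) {
              overflowConn.commit();
              overflowConn.close();
              overflow.shutDown();
          }
      }

      private static long freeHeap() {
          Runtime rt = Runtime.getRuntime();
          return rt.maxMemory() - (rt.totalMemory() - rt.freeMemory());
      }
  }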

However, this immediately slows the upload to a crawl.

I'm kind of missing the point of overflowing to a temp native store: I'm trying to upload a lot of data into a native store. Surely temporarily syncing my upload to _another_ native store, only to then copy it over from that temporary store to the actual store at the end, is _never_ more performant than just syncing directly to the actual store?

Am I overlooking something, or do we have a bit of a design flaw here? Wouldn't it make more sense to have the native store's memory overflow model sync directly to the main store (_especially_ if the transaction is a bulk transaction, i.e. IsolationLevel.NONE)?

If not, shouldn't we at least make sure that the temporary native store is configured to be as load-performant as possible (a single index, etc.)? Or perhaps use a different kind of memory overflow altogether (e.g. just a raw data file)?
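By "load-performant" I mean something like this (hypothetical settings, directory path made up):

  import java.io.File;

  import org.eclipse.rdf4j.sail.nativerdf.NativeStore;

  public class TempOverflowStoreConfig {
      public static void main(String[] args) {
          // a single "spoc" index instead of the default "spoc,posc": the temp
          // store only ever gets written sequentially and dumped in full at
          // commit time, so every extra index is pure overhead
          NativeStore tempStore = new NativeStore(new File("/tmp/overflow"), "spoc");
          // and presumably we can skip forced disk syncs for a throwaway store
          tempStore.setForceSync(false);
      }
  }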

Thoughts?

Jeen

