[rdf4j-dev] Question about parsing large files

Hi everyone!

I'm currently working on a project that involves bulk loading larger files (right now limited to N3, Turtle, and the associated family of formats). I was trying to parse about 100 million triples (~13 GB), and the parser ran out of memory even with the JVM heap set to 32 GB.
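For reference, a simplified sketch of the kind of load I mean (illustrative only, not my actual code): everything is parsed into an in-memory Model via Rio.parse before being handed to the store, which I suspect is where the heap gets exhausted.

    import java.io.BufferedInputStream;
    import java.io.FileInputStream;
    import java.io.InputStream;

    import org.eclipse.rdf4j.model.Model;
    import org.eclipse.rdf4j.rio.RDFFormat;
    import org.eclipse.rdf4j.rio.Rio;

    public class WholeFileLoad {
        public static void main(String[] args) throws Exception {
            // Parse the entire file into an in-memory Model before adding it to the
            // store -- with ~100M statements this is where a 32 GB heap runs out.
            try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
                Model model = Rio.parse(in, "", RDFFormat.TURTLE);
                System.out.println("parsed " + model.size() + " statements");
                // ... adding the model to the repository would go here ...
            }
        }
    }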

I took a quick look at the parser implementation, and it doesn't seem possible to configure the parser to iterate statement by statement. So what I'm doing now is chunking the larger triple files into a series of smaller ones and loading each individually (rough sketch below), but this is proving error prone and unmaintainable in the long term across different formats.
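To make the workaround concrete, here is roughly the shape of the chunking for the simplest case only (line-oriented N-Triples, one statement per line); Turtle and N3 need prefix and multi-line handling, which is exactly where it gets fragile.

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    public class ChunkNTriples {
        public static void main(String[] args) throws Exception {
            Path input = Paths.get(args[0]);
            long maxLines = 1_000_000;   // statements per chunk
            long lineNo = 0;
            int chunk = 0;
            BufferedWriter out = null;
            try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
                String line;
                while ((line = in.readLine()) != null) {
                    // start a new chunk file every maxLines statements
                    if (out == null || lineNo % maxLines == 0) {
                        if (out != null) out.close();
                        out = Files.newBufferedWriter(
                                Paths.get(input + ".part" + chunk++), StandardCharsets.UTF_8);
                    }
                    out.write(line);
                    out.newLine();
                    lineNo++;
                }
            } finally {
                if (out != null) out.close();
            }
        }
    }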

Does anyone have insights into a better approach, or a way to get streaming parsing to work? Also, I'm newer to the codebase, so any pointers in case I've missed something would be appreciated!

Thank you!
- Benjamin Herber



