Re: [smila-dev] OutOfMemoryException during Crawl

Hi Marius,
some comments inline.

> 1) Why does BinStore use so much memory and never free any of it?
> 
> [Marius]: The current binstorage implementation does not release the file
> system manager (Commons VFS). The Commons VFS manager is used to manage
> file persistence. Closing it closes all files created by this manager
> (although, as far as I've seen, each file is currently closed right after
> it is created) and cleans up any temporary files. This "release" should
> only be called once the Commons VFS manager is no longer needed. I'm not
> very optimistic that applying it will solve the OOM at all. Anyway, I'll
> do some local tests and come back with an answer.
> There is one more setting related to the OOM issue (binstorage and
> Commons VFS): the cache strategy, which is currently set to refresh data
> every time the application requests a file. That is fine. The other two
> options are refreshing manually (better in the OOM case, but response
> times would increase) and refreshing data every time an instance is
> referenced, which is not an appropriate solution.
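A minimal sketch of the release lifecycle described above, in plain Java without Commons VFS: the hypothetical `TempFileManager` owns the temporary files it creates and deletes all of them in `close()`, which should only be called once the manager is no longer needed. It does not model the real Commons VFS manager or its cache strategies.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a file system manager that owns temporary files.
// Not the Commons VFS API; it only models the "release on close" lifecycle.
class TempFileManager implements AutoCloseable {
    private final List<Path> tempFiles = new ArrayList<>();
    private boolean closed = false;

    // Creates a temporary file that this manager is responsible for deleting.
    Path createTempFile(String prefix) {
        if (closed) {
            throw new IllegalStateException("manager already closed");
        }
        try {
            Path p = Files.createTempFile(prefix, ".tmp");
            tempFiles.add(p);
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int trackedFileCount() {
        return tempFiles.size();
    }

    boolean isClosed() {
        return closed;
    }

    // Releases everything the manager created; call only when it is no longer needed.
    @Override
    public void close() {
        if (closed) {
            return;
        }
        closed = true;
        for (Path p : tempFiles) {
            try {
                Files.deleteIfExists(p);
            } catch (IOException e) {
                // Best effort: keep cleaning up the remaining files.
            }
        }
        tempFiles.clear();
    }
}
```

With try-with-resources the release happens automatically when the block exits, e.g. `try (TempFileManager m = new TempFileManager()) { ... }`. In Commons VFS itself the comparable call is `DefaultFileSystemManager.close()`.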

I think the cache strategy ON_RESOLVE is reasonable. The cache
implementation used is also relevant. By default it's SoftRefFilesCache.
Quote from the documentation: "This cache will return the same instance
for a file as long as it is "strongly reachable" e.g. you hold a
reference to this object. If the FileObject is no longer reachable, and
the jvm needs some memory, it will be released."
In our latest test case, the processing ends with BinStore writing
attachments to the filesystem. I would assume that afterwards no
references remain and that all memory used could be freed. This is not
the case. We modified the implementation to use the NullFilesCache. To
our surprise, the memory consumption was the same and the OOM occurred
after just 15 minutes! Perhaps the VFS API is not used as intended (just
a guess)? Again, I don't think BinStore is responsible for the OOM, but
it makes one more likely.
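To make the difference between the two cache implementations concrete, here is a simplified sketch built on `java.lang.ref.SoftReference`. The classes `SimpleFilesCache`, `SoftRefCache` and `NullCache` are hypothetical stand-ins, not the Commons VFS `SoftRefFilesCache`/`NullFilesCache`: the soft cache returns the same instance while the caller still holds it, and the null cache creates a new instance on every lookup.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache contract for illustration; not the Commons VFS FilesCache interface.
interface SimpleFilesCache<K, V> {
    V getOrCreate(K key, Function<K, V> factory);
}

// Holds values only through soft references: the same instance comes back while the
// caller still holds it strongly; otherwise the GC may clear it under memory pressure.
class SoftRefCache<K, V> implements SimpleFilesCache<K, V> {
    // A soft reference that remembers its key so cleared entries can be purged.
    private static final class KeyedRef<A, B> extends SoftReference<B> {
        final A key;

        KeyedRef(A key, B value, ReferenceQueue<B> queue) {
            super(value, queue);
            this.key = key;
        }
    }

    private final Map<K, KeyedRef<K, V>> entries = new HashMap<>();
    private final ReferenceQueue<V> queue = new ReferenceQueue<>();

    @Override
    public synchronized V getOrCreate(K key, Function<K, V> factory) {
        purgeClearedEntries();
        KeyedRef<K, V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = factory.apply(key);
            entries.put(key, new KeyedRef<>(key, value, queue));
        }
        return value;
    }

    // Drops map entries whose referents the GC has already cleared.
    private void purgeClearedEntries() {
        Reference<?> cleared;
        while ((cleared = queue.poll()) != null) {
            KeyedRef<?, ?> stale = (KeyedRef<?, ?>) cleared;
            entries.remove(stale.key, stale); // only if the key still maps to this ref
        }
    }
}

// Caches nothing: every lookup creates a fresh instance.
class NullCache<K, V> implements SimpleFilesCache<K, V> {
    @Override
    public V getOrCreate(K key, Function<K, V> factory) {
        return factory.apply(key);
    }
}
```

The JVM guarantees that softly reachable objects are cleared before it throws an `OutOfMemoryError`, so a soft-reference cache by itself should not cause an OOM; the `ReferenceQueue` purge also drops the map entries left behind by cleared references. This fits the observation that switching to a null cache did not change the memory consumption.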



> 3) What causes this slow but linearly increasing memory consumption?
> 
> [Marius]: The XML storage (which uses Oracle Berkeley DB XML) accounts
> for a significant share of memory consumption; during my tests I often
> ended up with an OOM. The org.eclipse.smila.xmlstorage bundle takes care
> of releasing resources while processing XML data.
> The catch (I would call this a disadvantage of BDB XML) is that
> users/developers have to determine or estimate the volume of data that
> will be processed (parsed/stored/fetched) in the BDB XML container(s)
> from the very beginning, before opening (starting up) BDB XML. In many
> cases, reconfiguring the BDB XML environment has no effect until it is
> restarted. When dealing with huge amounts of data (especially with many
> concurrent users), situations like an OOM or "unable to allocate memory
> for mutex" (the error just reported by Ralf) can occur.
> In conclusion, the xmlstorage uses memory depending on the processed
> data, but resource-releasing techniques are applied, so the "linearly
> increasing memory consumption" shouldn't be caused by xmlstorage.
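A minimal sketch of the "configure before opening" constraint, assuming a hypothetical environment that takes a snapshot of its cache size when it is opened. `EnvSettings` and `StorageEnvironment` are illustrative names only and are not part of the BDB XML API: changing the settings afterwards has no effect until the environment is closed and reopened.

```java
// Hypothetical mutable settings object, analogous in spirit to an environment config.
final class EnvSettings {
    private long cacheSizeBytes;

    EnvSettings(long cacheSizeBytes) {
        this.cacheSizeBytes = cacheSizeBytes;
    }

    long getCacheSizeBytes() {
        return cacheSizeBytes;
    }

    void setCacheSizeBytes(long cacheSizeBytes) {
        this.cacheSizeBytes = cacheSizeBytes;
    }
}

// Hypothetical environment: the cache size is read once in open() and never re-read.
final class StorageEnvironment implements AutoCloseable {
    private final long effectiveCacheSizeBytes; // snapshot taken at open time
    private boolean closed = false;

    private StorageEnvironment(long cacheSizeBytes) {
        this.effectiveCacheSizeBytes = cacheSizeBytes;
    }

    static StorageEnvironment open(EnvSettings settings) {
        if (settings.getCacheSizeBytes() <= 0) {
            throw new IllegalArgumentException("cache size must be positive");
        }
        return new StorageEnvironment(settings.getCacheSizeBytes());
    }

    long effectiveCacheSizeBytes() {
        if (closed) {
            throw new IllegalStateException("environment is closed");
        }
        return effectiveCacheSizeBytes;
    }

    @Override
    public void close() {
        closed = true;
    }
}
```

The point of the sketch is only the ordering: the data volume has to be estimated and the settings fixed before `open()`, because a running environment keeps whatever it read at startup.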

We did not see any problems with the XML store. Remember that this
linear memory increase exists even without executing any actions in
SMILA. Just starting SMILA is enough.
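One way to check the "just starting SMILA is enough" observation is to sample the used heap at fixed intervals and look at the trend. The sketch below uses the standard `java.lang.management` API; `HeapSampler` and its least-squares `slope` helper are hypothetical names, and the sampling interval is an arbitrary choice.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: samples used heap at a fixed interval to expose a slow upward trend.
final class HeapSampler {
    private final MemoryMXBean memory = ManagementFactory.getMemoryMXBean();

    // Takes up to `count` samples of used heap (in bytes), `intervalMillis` apart.
    List<Long> sample(int count, long intervalMillis) {
        List<Long> usedBytes = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            if (i > 0) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and stop early
                    break;
                }
            }
            usedBytes.add(memory.getHeapMemoryUsage().getUsed());
        }
        return usedBytes;
    }

    // Least-squares slope of the samples, in bytes per sampling interval.
    static double slope(List<Long> samples) {
        int n = samples.size();
        if (n < 2) {
            return 0.0;
        }
        double meanX = (n - 1) / 2.0;
        double meanY = 0.0;
        for (long y : samples) {
            meanY += y;
        }
        meanY /= n;
        double num = 0.0;
        double den = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = i - meanX;
            num += dx * (samples.get(i) - meanY);
            den += dx * dx;
        }
        return num / den;
    }
}
```

Used heap rises and falls with every collection, so a single reading says little; a slope that stays positive over many samples is the signal worth following up with a heap dump to see what is being retained.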

Bye,
Daniel

