Re: [rdf4j-dev] Newer Lucene Version

Hi Bart, all,

the issue is now tracked in https://github.com/eclipse/rdf4j-storage/issues/71

I also opened an initial pull request with the current status at https://github.com/eclipse/rdf4j-storage/pull/72

The updates of the Lucene Sail and Solr (Lucene 6.6.4 and Solr 6.6.4) look solid already and tests are passing.

Updating Elasticsearch is proving more difficult. The code change should be roughly fine (though I left some preliminary comments and TODO markers), but I am still facing issues with the integration test. It fails with errors complaining about a "JarHell", i.e. different versions of transitive dependencies being loaded. I am currently looking into the Maven descriptors to see if I can resolve this.
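To illustrate what such a "JarHell" complaint is about, here is a minimal sketch of a duplicate-class check over a classpath. It is a simplified illustration and not the actual Elasticsearch check; the class name `DuplicateClassCheck`, the method `findDuplicates`, and the jar and class names in `main` are made up for the example.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of RDF4J or Elasticsearch.
final class DuplicateClassCheck {

    // Maps each class name to the jars that contain it and keeps only
    // the classes that show up in more than one jar.
    static Map<String, Set<String>> findDuplicates(Map<String, List<String>> classesByJar) {
        Map<String, Set<String>> jarsByClass = new TreeMap<>();
        for (Map.Entry<String, List<String>> jar : classesByJar.entrySet()) {
            for (String className : jar.getValue()) {
                jarsByClass.computeIfAbsent(className, k -> new TreeSet<>()).add(jar.getKey());
            }
        }
        jarsByClass.values().removeIf(jars -> jars.size() < 2);
        return jarsByClass;
    }

    public static void main(String[] args) {
        // Illustrative classpath: two versions of the same library pulled in
        // through different transitive dependencies (assumed, not taken from the PR).
        Map<String, List<String>> classpath = new LinkedHashMap<>();
        classpath.put("lucene-core-5.5.5.jar", Arrays.asList("org.apache.lucene.util.Version"));
        classpath.put("lucene-core-6.6.4.jar", Arrays.asList("org.apache.lucene.util.Version"));
        classpath.put("some-other.jar", Arrays.asList("com.example.Other"));

        findDuplicates(classpath).forEach((cls, jars) ->
                System.out.println(cls + " found in " + jars));
    }
}
```

On the Maven side, `mvn dependency:tree` shows which dependency pulls in each conflicting version, which helps decide where an exclusion or a managed version belongs.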

Could one of you already check the current PR and give feedback? Especially the Elasticsearch part, as I have never worked with it before and do not have a testing environment. I will test the Lucene Sail in our test installation, so that part I can definitely cover.

One more question: the Lucene-based modules still use "URI" instead of "IRI". Should I do that refactoring here as well?

Best,
 Andreas


2018-05-22 8:18 GMT+02:00 Bart Hanssens (BOSA) <bart.hanssens@xxxxxxxxxxxx>:

Hello Andreas,

Mark Hale could probably provide more info on the Elasticsearch part.

There are unit tests for Elasticsearch and Lucene, but I haven’t checked how much of the feature set is actually covered (James? Jacek?)

Meanwhile, could you perhaps create a new ticket in https://github.com/eclipse/rdf4j-storage/ ?

Hoping some other Lucene/Elasticsearch users will notice it and would be willing/able to help…

Best regards

Bart

 

From: rdf4j-dev-bounces@xxxxxxxxxxx [mailto:rdf4j-dev-bounces@eclipse.org] On Behalf Of Andreas Schwarte
Sent: Thursday, 17 May 2018 19:22
To: rdf4j developer discussions <rdf4j-dev@xxxxxxxxxxx>
Subject: Re: [rdf4j-dev] Newer Lucene Version

 

Hi Bart,

thanks for the feedback.

I will go for the intermediate update then and try to provide a change.

The Lucene upgrade is pretty much straightforward (Lucene 6 and Lucene 7 are mostly compatible on the API level), i.e. I could almost reuse my change from the evaluation. Once the code change is complete I will test the upgrade in one of our installations and check the compatibility of the Lucene indices.

Elasticsearch unfortunately is a different story, with quite a lot of API changes and no clear migration guide. I will try to work through the required changes, and then someone with good knowledge of Elasticsearch should do a careful review. So far I have not worked with Elasticsearch at all.

By the way: is there good coverage for these features in the unit tests?

Best,

 Andreas

 

 

2018-05-16 23:26 GMT+02:00 Bart Hanssens (BOSA) <bart.hanssens@xxxxxxxxxxxx>:

Hi Andreas,

Thanks for the research, IMHO an upgrade would indeed be useful.

 

And indeed, Elasticsearch 2.4 is EOL as well (https://www.elastic.co/support/eol)

There was at least one similar issue on elasticsearch a few months ago (https://github.com/eclipse/rdf4j-storage/issues/18),

and some discussion on this list on Lucene and Elasticsearch versions (https://dev.eclipse.org/mhonarc/lists//rdf4j-dev/msg00400.html)

 

 

Does anyone have experience upgrading existing Lucene 5 installations to 7.x?

If I read this correctly, one should reindex the data, and skipping a major version does not seem advisable:

https://lucene.apache.org/solr/guide/7_0/major-changes-in-solr-7.html

“Re-indexing your data is considered the best practice and you should try to do so if possible. However, if re-indexing is not feasible, keep in mind you can only upgrade one major version at a time. Thus, Solr 6.x indexes will be compatible with Solr 7 but Solr 5.x indexes will not be.”

 

 

Maybe a two-step approach would be better?

I.e. first upgrade to Lucene 6.6 / Elasticsearch 5.6 (supported until March 2019), then to 7.x in another RDF4J release?
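The "one major version at a time" rule from the Solr guide quoted above can be sketched as a small check. This is only an illustration of the rule as quoted, not Lucene or Solr code; the class `IndexUpgradePath` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating the "one major version at a time" rule.
final class IndexUpgradePath {

    // Assumption taken from the quoted guide: an index is readable only by
    // the major version that wrote it or by the immediately following one.
    static boolean canRead(int indexMajor, int runtimeMajor) {
        return indexMajor == runtimeMajor || indexMajor == runtimeMajor - 1;
    }

    // Lists the major versions to step through when re-indexing is not feasible.
    static List<Integer> steps(int fromMajor, int toMajor) {
        if (fromMajor > toMajor) {
            throw new IllegalArgumentException("downgrades are not covered by this sketch");
        }
        List<Integer> path = new ArrayList<>();
        for (int major = fromMajor + 1; major <= toMajor; major++) {
            path.add(major);
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(canRead(6, 7)); // true: 6.x index, 7.x runtime
        System.out.println(canRead(5, 7)); // false: skips a major version
        System.out.println(steps(5, 7));   // [6, 7]: the two-step approach
    }
}
```

Under that rule, going from 5.x to 7.x without re-indexing needs the intermediate 6.x step, which is what the two-step release plan would provide.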

 

 

Best regards

 

Bart

 

From: rdf4j-dev-bounces@xxxxxxxxxxx [mailto:rdf4j-dev-bounces@eclipse.org] On Behalf Of Andreas Schwarte
Sent: Wednesday, 16 May 2018 11:36
To: rdf4j developer discussions <rdf4j-dev@xxxxxxxxxxx>
Subject: [rdf4j-dev] Newer Lucene Version

 

Hi,

in our new company we have stricter security guidelines w.r.t. open source software.

The Lucene version that is currently bundled with RDF4J (i.e. Lucene 5.x) is EOL, and the same goes for Solr (and probably Elasticsearch). The current stable version is 7.3.0 (or 7.3.1).

What is the policy on upgrading these third-party components? Could this be done in a 2.4.0 release?

I have done an evaluation of the update. Quite a few Lucene API replacements are required, but it looks pretty safe. The only thing that I could not solve so far is the update of "elasticsearch", which fails in Maven with a "bytecode enforce check" on log4j:

[INFO] Restricted to JDK 1.8 yet org.apache.logging.log4j:log4j-api:jar:2.9.1:compile contains META-INF/versions/9/org/apache/logging/log4j/util/ProcessIdUtil.class targeted to JDK 1.9

Any ideas on how to fix this?
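For context, the message points at a class under META-INF/versions/9, which is where a multi-release jar keeps its Java 9 variants; a Java 8 runtime ignores that directory. The sketch below shows how such a check can read the class file version and how entries under META-INF/versions/ could be skipped. It is an illustration under these assumptions, not the enforcer rule's implementation; `ClassVersionCheck` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, not the Maven enforcer rule itself.
final class ClassVersionCheck {

    static final int JAVA_8_MAJOR = 52; // class file major version for Java 8
    static final int JAVA_9_MAJOR = 53; // class file major version for Java 9

    // Reads the major version from the class file header:
    // 4 bytes magic (0xCAFEBABE), 2 bytes minor, 2 bytes major, big-endian.
    static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("too short for a class file header");
        }
        ByteBuffer header = ByteBuffer.wrap(classBytes); // big-endian by default
        if (header.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        header.getShort(); // minor version, not needed here
        return header.getShort() & 0xFFFF;
    }

    // Versioned entries of a multi-release jar live under this prefix.
    static boolean isMultiReleaseEntry(String entryName) {
        return entryName.startsWith("META-INF/versions/");
    }

    // True if the entry is compiled for a newer JDK than allowed,
    // optionally skipping multi-release entries.
    static boolean violates(String entryName, byte[] classBytes,
                            int maxMajor, boolean skipMultiRelease) {
        if (skipMultiRelease && isMultiReleaseEntry(entryName)) {
            return false;
        }
        return majorVersion(classBytes) > maxMajor;
    }

    public static void main(String[] args) {
        // Fake header standing in for a Java 9 class file.
        byte[] java9Class = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 53};
        String entry = "META-INF/versions/9/org/apache/logging/log4j/util/ProcessIdUtil.class";
        System.out.println(violates(entry, java9Class, JAVA_8_MAJOR, false));
        System.out.println(violates(entry, java9Class, JAVA_8_MAJOR, true));
    }
}
```

If the check cannot be told to skip such entries, excluding the affected class or dependency from the check would be the likely workaround; the exact option depends on the enforcer rule version in use.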

If we are ok to get the work on an upgrade started I could upload my change as a pull request.

Opinions?

Thanks,

 Andreas


_______________________________________________
rdf4j-dev mailing list
rdf4j-dev@xxxxxxxxxxx
To change your delivery options, retrieve your password, or unsubscribe from this list, visit
https://dev.eclipse.org/mailman/listinfo/rdf4j-dev

 

