Re: [smila-dev] AW: [Aperture-devel] Aperture bundlization for SMILA

2009/1/26  <Daniel.Stucky@xxxxxxxxxxx>:
> Hi all,
>
> with the fixes provided by Antoni I managed to get the "bundelized"
> aperture to run in Smila.
>
> In Smila we should refactor our two existing aperture integration
> bundles into just one, clean up the code, and implement a
> ProcessingService instead of a pipelet (Aperture OSGi services are now
> used, which "cries out" for DS).
>
> Here is a list of all the bundles (and their licenses) required to run
> "bundelized" aperture in Smila:
>
> com.drew.metadata_2.4.0.jar (Public Domain)
> javax.activation_1.1.1.jar (CDDL)
> javax.mail_1.4.1.jar (CDDL)
> jcl104-over-slf4j-1.5.0.jar (MIT)
> openrdf-sesame-2.2.1-onejar-osgi.jar (BSD)
> org.apache.poi_3.2.0.jar (Apache License 2.0)
> org.bouncycastle.bcmail_1.32.0.jar (MIT)
> org.bouncycastle.bcprovider_1.32.0.jar (MIT)
> org.fontbox_0.2.0.jar (BSD)
> org.htmlparser_1.6.0.jar (CPL 1.0)
> org.jempbox.xmp_0.2.0.jar (BSD)
> org.pdfbox_0.7.4.jar (BSD)
> org.semanticdesktop.aperture.safe_1.2.0.jar (BSD)
> org.semanticdesktop.aperture_1.2.0.jar (BSD)
> rdf2go.api-4.7.0.jar (BSD)
> rdf2go.impl.sesame22-4.7.0.jar (BSD)
> slf4j-api-1.5.0.jar (MIT)
> slf4j-jdk14-1.5.0.jar (MIT)
> com.sun.media.jai (Sun Binary Code License Agreement), required by
> PDFBox. I have not published this bundle yet, as we can't use it in
> Smila.
>
> License-wise, the bundles are all EPL compatible except for
> com.sun.media.jai.
>
> 1) bundle org.semanticdesktop.aperture.safe_1.2.0.jar imports packages
> from org.pdfbox_0.7.4.jar, which in turn imports packages from
> com.sun.media.jai. As the latter can't be provided by Smila (because of
> its license), the other two bundles cannot be started while these
> packages are missing! So we should separate the Extractors relying on
> PDFBox from the other Extractors (putting them in their own bundle).

PDFBox moved to Apache, they already removed the classes that depend
on JAI and will make a release soon:

> It seems to be a good approach in general to provide the Extractors not
> in one bundle but on a "bundle per extractor" basis. Even though the
> licenses of the other 3rd party bundles are OK, this does NOT mean that
> the bundles will pass the Eclipse legal process! One common problem is
> code provenance. So if all Extractors remain in one bundle,
> org.semanticdesktop.aperture.safe_1.2.0.jar, and just one 3rd party
> bundle used by one Extractor does not pass its CQ, Aperture can't be
> used in Smila until this CQ is resolved or the dependencies are removed.
> Finer grained bundles will allow us to use Aperture with a subset of
> the available Extractors, adding further extractors when their CQs are
> completed.

My idea is to have one module enabled for ESF and another one not. We
could juggle components between those two, but having a separate
bundle for each extractor will be difficult.
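
To make the resolution problem from point 1 concrete, here is a minimal
sketch of how a missing import cascades, and how a two-bundle split
contains it. It is a toy model, not the OSGi resolver: versions,
optional imports and Require-Bundle are ignored. The package names
(aperture.api, pdfbox.api, jai.packages) and the bundle names
extractors.pdf and extractors.other are placeholders invented for the
illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class BundleResolutionSketch {

    /** A bundle reduced to its name plus exported and imported packages. */
    public static final class Bundle {
        final String name;
        final Set<String> exports;
        final Set<String> imports;

        Bundle(String name, List<String> exports, List<String> imports) {
            this.name = name;
            this.exports = new HashSet<>(exports);
            this.imports = new HashSet<>(imports);
        }
    }

    private static List<String> pkgs(String... names) {
        return Arrays.asList(names);
    }

    /** Drops bundles with unsatisfied imports until nothing changes. */
    public static Set<String> resolve(List<Bundle> installed) {
        List<Bundle> candidates = new ArrayList<>(installed);
        boolean changed = true;
        while (changed) {
            // Packages offered by the bundles that are still candidates.
            Set<String> available = new HashSet<>();
            for (Bundle b : candidates) {
                available.addAll(b.exports);
            }
            changed = candidates.removeIf(b -> !available.containsAll(b.imports));
        }
        Set<String> names = new TreeSet<>();
        for (Bundle b : candidates) {
            names.add(b.name);
        }
        return names;
    }

    /** All extractors in one bundle; no bundle exports the JAI packages. */
    public static List<Bundle> monolithic() {
        return Arrays.asList(
            new Bundle("org.semanticdesktop.aperture", pkgs("aperture.api"), pkgs()),
            new Bundle("org.pdfbox", pkgs("pdfbox.api"), pkgs("jai.packages")),
            new Bundle("org.semanticdesktop.aperture.safe",
                       pkgs(), pkgs("aperture.api", "pdfbox.api")));
    }

    /** PDF extractors split off into their own (hypothetical) bundle. */
    public static List<Bundle> split() {
        return Arrays.asList(
            new Bundle("org.semanticdesktop.aperture", pkgs("aperture.api"), pkgs()),
            new Bundle("org.pdfbox", pkgs("pdfbox.api"), pkgs("jai.packages")),
            new Bundle("extractors.pdf", pkgs(), pkgs("aperture.api", "pdfbox.api")),
            new Bundle("extractors.other", pkgs(), pkgs("aperture.api")));
    }

    public static void main(String[] args) {
        System.out.println("one extractor bundle:  " + resolve(monolithic()));
        System.out.println("two extractor bundles: " + resolve(split()));
    }
}
```

In the first layout the whole extractor bundle drops out together with
PDFBox; in the second only the PDF part does, which is the effect a
coarse two-module split would be aiming for.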

> 2) bundle org.semanticdesktop.aperture_1.2.0.jar contains 2 jar files
>        + aduna-commons-xml.2.0.jar
>        + applewrapper-0.2.jar
>  We need to create CQs for both jars, and according to
> http://aperture.wiki.sourceforge.net/Dependencies applewrapper-0.2.jar
> is LGPL!? Are there any alternatives?

applewrapper-0.2.jar is BSD, according to its author, Gunnar Grimnes, on
this list on 17.09.2008. All the source files in the source repository
have BSD headers. The dependencies page had a mistake, which I fixed.
The applewrapper-license.txt file was correct.

> 3) do we need all those bundles for just mimetype detection and
> extractors? (e.g. sesame?) Or could some dependencies be removed,
> perhaps also by finer grained bundles?

For Aperture to work, it must have an RDF store. You can use an
in-memory implementation to get the full text and then discard the RDF,
but an RDF store is necessary. Theoretically you could use any RDF store
implementation for which an RDF2Go adapter is available, but in
practice, AFAIK, only the Sesame adapter is actively maintained and
works with the latest version of Sesame.

-- 
Antoni Myłka
antoni.mylka@xxxxxxxxx

