Re: [smila-dev] Use-Cases for Aperture subsets

Hi Aperture-Team,

First of all, I (and the whole SMILA team) would like to thank you for all the effort you put into the modularization of Aperture. I found some time this week to check your results, and I have to say that it really looks great.

I downloaded the release and cherry-picked the components we want to use in SMILA (at the moment the mimetype identifier and the extractors for PDF, Office and OpenOffice, plus of course all required dependencies). With a few adaptations our Aperture Pipelet now runs again.

So, concerning the use cases you described, I think ours would be number 4. We would prefer to be able to cherry-pick selected functionality only. For us it is perfectly OK to do this manually; there is no need for an extra Maven target (which I suppose would have to be created to support the use cases).

So your work has enabled us to start the Eclipse IP process for the selected jars (17 at the moment). We expect some issues in the IP process (e.g. missing releases for jempbox and pdfbox), but that is not your concern. We now have to wait and see which jars the Eclipse IP process approves, and then we can integrate those approved components. After that it would be nice if this approved "feature set" could be created automatically using Maven, but as I said before, it's OK to do this manually. No need to put extra work on you guys.


So, thanks again for the great work !!!

Bye,
Daniel

-----Original Message-----
From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of Antoni Mylka
Sent: Wednesday, 26 August 2009 19:17
To: Aperture Devel; Smila project developer mailing list
Subject: [smila-dev] Use-Cases for Aperture subsets

Aperturians, SMILA people,

What are/would be your use cases for a limited subset of Aperture?

Update for the SMILA mailing list: we've finished the mavenization and
released the 1.3.0 release which is fully OSGI-friendly and comes split
into 73 little modules you can cherry-pick at will. Please have a look
at the use cases below, they were mostly inspired by you.

Now a question to you all. Do you think it makes sense to ship all 73
little jars separately? Back in 2008 it seemed like a good idea. Now I'd
like to get some feedback.

The use cases made possible by the whole mavenization:
1. only magic mimetype identifier + its api (2 jars)
2. only the extractors - just add a maven dependency on
   default-extractor, in total 28 aperture jars + 18 required
   dependencies
3. only stuff that works without LGPL dependencies,
   66 aperture jars + 29 non-LGPL deps = 95 jars to deploy
   everything without
   - html-helper (depends on LGPL htmlparser)
   - mime-extractor (on html-helper)
   - extractor-audio-mp3 (on LGPL jaudiotagger)
   - outlook-crawler (on LGPL jacob)
   - default-extractor (on mp3 and mime extractors)
   - default-crawler (on outlook-crawler)
   - default-runtime (on default extractors and crawlers)
  (tm-extractors are LGPL but the word-extractor can work without them)
4. only stuff that works with dependencies already approved for my
   project
   - remove non-approved dependencies
   - and all aperture modules depending on them (recursively)
5. in osgi start only those aperture services you need, and don't start
   any single service you don't need (deploy all 73 + 32 = 105 jars and
   then cherry-pick the aperture jars you don't need, together with
   their dependencies).
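
Use cases 3 and 4 both come down to the same graph operation: drop a set of
unwanted dependencies, then drop every module that depends on them, directly
or transitively. Here is a minimal sketch of that pruning in Java. The class
name ModulePruner, the hand-written dependency map and the module name
plaintext-extractor are assumptions made for illustration only; this is not
part of Aperture or of its Maven build, and a real setup would read the
dependencies from the module POMs.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of Aperture.
public class ModulePruner {

    /**
     * Returns the modules that survive once every banned artifact, and every
     * module depending on one directly or transitively, has been removed.
     *
     * @param deps   module name -> names of the artifacts it depends on
     * @param banned artifacts that are not approved (e.g. LGPL libraries)
     */
    public static Set<String> prune(Map<String, Set<String>> deps, Set<String> banned) {
        // Invert the graph: artifact -> modules that depend on it directly.
        Map<String, Set<String>> dependents = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : deps.entrySet()) {
            for (String dep : e.getValue()) {
                dependents.computeIfAbsent(dep, k -> new HashSet<>()).add(e.getKey());
            }
        }
        // Breadth-first walk from the banned artifacts to everything built on them.
        Set<String> removed = new HashSet<>(banned);
        Deque<String> queue = new ArrayDeque<>(banned);
        while (!queue.isEmpty()) {
            String current = queue.poll();
            for (String module : dependents.getOrDefault(current, Set.<String>of())) {
                if (removed.add(module)) {
                    queue.add(module);
                }
            }
        }
        // Whatever was never reached is safe to ship; sorted for stable output.
        Set<String> kept = new TreeSet<>(deps.keySet());
        kept.removeAll(removed);
        return kept;
    }

    public static void main(String[] args) {
        // Tiny excerpt of the list above; "plaintext-extractor" is a made-up name.
        Map<String, Set<String>> deps = Map.of(
            "html-helper", Set.of("htmlparser"),
            "mime-extractor", Set.of("html-helper"),
            "extractor-audio-mp3", Set.of("jaudiotagger"),
            "default-extractor", Set.of("mime-extractor", "extractor-audio-mp3"),
            "plaintext-extractor", Set.<String>of());
        Set<String> banned = Set.of("htmlparser", "jaudiotagger");
        System.out.println(prune(deps, banned)); // prints [plaintext-extractor]
    }
}
```

For use case 4 the banned set would simply be the dependencies that did not
get approved for your project.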

I'd like to know whether you actually find these use cases useful for
your projects. Does this make sense to you, or is it more like "WTF? 108
jars for a single freakin' library? Let's take the onejar and forget
this whole nonsense"?

I'm asking because we could reduce the current module count by 35
without sacrificing the use-cases 1-4. The use case 5 requires exactly
73 modules but it's definitely the weakest one. For instance, a merge of
all POI extractors into one module would mean that either all of them
start at once or none at all. Is there anyone who needs Word documents
but doesn't want to process ones from Visio? Even if there is, this can
be done at the application level.

If there is a strong need for a specific mix of aperture components, we
can prepare another runtime-whatever module and ship the mix as a single
jar. (SMILA?)

All kinds of comments welcome

Antoni Mylka
antoni.mylka@xxxxxxxxx

P.S. Ideas for merging modules:
- merge all 'core', util and vocabulary modules -13
- merge all POI extractor modules -6
  - microsoft-util
  - microsoft-office
  - microsoft-visio
  - microsoft-word
  - microsoft-quattro
  - corel-office
  - corel-util
- merge all no-dependencies extractors into a single module -4
  - plaintext
  - xml
  - html
  - opendocument (but what happens when a proper OO Java library emerges?)
  - openxml (may later be moved to POI, when we update to POI 3.5)
- merge datasource, accessor and crawler into a single module for
   - file (ds,cr,acc,detector) -3
   - ical (ds,cr) -1
   - web (ds,cr,acc) -2
   - mbox (ds,cr) -1
- merge all plain javamail modules -3
   - crawler-mail
   - crawler-imap
   - datasource-imap
   - sub-mime
- merge no-dependencies security modules -1
   - security-standard
   - security-swing
- mime-extractor is deprecated anyway, but it still uses the html-helper,
which in turn introduces a dependency on htmlparser. Removing the
mime-extractor would relieve us of two modules and one ugly LGPL dependency.
_______________________________________________
smila-dev mailing list
smila-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/smila-dev

