[smila-dev] Use-Cases for Aperture subsets

Aperturians, SMILA people,

What are (or would be) your use cases for a limited subset of Aperture?

Update for the SMILA mailing list: we've finished the mavenization and
released version 1.3.0, which is fully OSGi-friendly and comes split
into 73 small modules you can cherry-pick at will. Please have a look
at the use cases below; they were mostly inspired by you.

Now a question to you all: do you think it makes sense to ship all 73
little jars separately? Back in 2008 it seemed like a good idea. Now I'd
like to get some feedback.

The use cases made possible by the whole mavenization:
1. only the magic MIME type identifier and its API (2 jars)
2. only the extractors: just add a Maven dependency on
   default-extractor, which pulls in 28 aperture jars + 18 required
   dependencies in total
3. only stuff that works without LGPL dependencies:
   66 aperture jars + 29 non-LGPL deps = 95 jars to deploy,
   i.e. everything except
   - html-helper (depends on LGPL htmlparser)
   - mime-extractor (on html-helper)
   - extractor-audio-mp3 (on LGPL jaudiotagger)
   - outlook-crawler (on LGPL jacob)
   - default-extractor (on mp3 and mime extractors)
   - default-crawler (on outlook-crawler)
   - default-runtime (on default extractors and crawlers)
  (tm-extractors are LGPL but the word-extractor can work without them)
4. only stuff that works with dependencies already approved for my
   project
   - remove non-approved dependencies
   - and all aperture modules depending on them (recursively)
5. in OSGi, start only those aperture services you need and don't start
   any single service you don't need (deploy all 73 + 32 = 105 jars and
   then pick out the aperture jars you don't need, together with
   their dependencies).
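The recursive removal behind use cases 3 and 4 is a transitive-closure
computation over the module dependency graph. A minimal Java sketch,
assuming the graph is available as a plain map from module name to its
hard dependencies. Only the edges listed under use case 3 are modeled,
and the class and method names are made up for illustration; they are
not part of Aperture or Maven:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class LgplExclusion {

    /**
     * Returns every module that depends, directly or transitively, on one of
     * the banned artifacts. Fixed-point iteration: keep sweeping the graph
     * until a full pass excludes nothing new.
     */
    static Set<String> excluded(Map<String, List<String>> deps, Set<String> banned) {
        Set<String> out = new TreeSet<>();
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<String, List<String>> e : deps.entrySet()) {
                if (out.contains(e.getKey())) {
                    continue; // already excluded in an earlier pass
                }
                for (String dep : e.getValue()) {
                    if (banned.contains(dep) || out.contains(dep)) {
                        out.add(e.getKey());
                        changed = true;
                        break;
                    }
                }
            }
        }
        return out;
    }

    /** Hypothetical partial graph built from use case 3 (module -> hard deps). */
    static Map<String, List<String>> demoGraph() {
        Map<String, List<String>> deps = new LinkedHashMap<>();
        deps.put("html-helper", List.of("htmlparser"));
        deps.put("mime-extractor", List.of("html-helper"));
        deps.put("extractor-audio-mp3", List.of("jaudiotagger"));
        deps.put("outlook-crawler", List.of("jacob"));
        deps.put("default-extractor", List.of("extractor-audio-mp3", "mime-extractor"));
        deps.put("default-crawler", List.of("outlook-crawler"));
        deps.put("default-runtime", List.of("default-extractor", "default-crawler"));
        // tm-extractors is optional for word-extractor, so no hard edge is modeled.
        deps.put("word-extractor", List.of());
        return deps;
    }

    public static void main(String[] args) {
        Set<String> lgpl = Set.of("htmlparser", "jaudiotagger", "jacob");
        // prints [default-crawler, default-extractor, default-runtime,
        //         extractor-audio-mp3, html-helper, mime-extractor, outlook-crawler]
        System.out.println(excluded(demoGraph(), lgpl));
    }
}
```

For use case 4 the same function applies; only the banned set changes
from the LGPL artifacts to whatever dependencies your project has not
approved.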

I'd like to know whether you actually find these use cases useful for
your projects. Do they make sense to you, or is it more like "WTF? 108
jars for a single freakin' library? Let's take the onejar and forget
this whole nonsense"?

I'm asking because we could reduce the current module count by 35
without sacrificing use cases 1-4. Use case 5 requires exactly
73 modules, but it's definitely the weakest one. For instance, merging
all POI extractors into one module would mean that either all of them
start at once or none at all. Is there anyone who needs Word documents
but doesn't want to process Visio ones? Even if there is, this can
be done at the application level.
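A hypothetical sketch of what that application-level selection could
look like: the merged POI module starts all of its extractors, and the
application keeps only those whose MIME types it cares about. The
Factory class and select method are illustrative stand-ins, not
Aperture's actual extractor API, and the MIME strings are examples:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class AppLevelSelection {

    /** Hypothetical descriptor for one extractor inside a merged module. */
    static final class Factory {
        final String name;
        final Set<String> mimeTypes;

        Factory(String name, Set<String> mimeTypes) {
            this.name = name;
            this.mimeTypes = mimeTypes;
        }
    }

    /** Keeps only the factories that handle at least one wanted MIME type. */
    static List<String> select(List<Factory> available, Set<String> wanted) {
        List<String> chosen = new ArrayList<>();
        for (Factory f : available) {
            if (!Collections.disjoint(f.mimeTypes, wanted)) {
                chosen.add(f.name);
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Everything in the merged module is started together...
        List<Factory> merged = List.of(
                new Factory("microsoft-word", Set.of("application/msword")),
                new Factory("microsoft-visio", Set.of("application/vnd.visio")));
        // ...but the application only uses what it asks for.
        System.out.println(select(merged, Set.of("application/msword"))); // [microsoft-word]
    }
}
```

The cost of the coarser module is that the unused Visio extractor is
still loaded; the benefit is one jar instead of several.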

If there is a strong need for a specific mix of aperture components, we
can prepare another runtime-whatever module and ship the mix as a single
jar. (SMILA?)

All kinds of comments welcome

Antoni Mylka
antoni.mylka@xxxxxxxxx

P.S. Ideas for merging modules:
- merge all 'core', util and vocabulary modules -13
- merge all POI extractor modules -6
  - microsoft-util
  - microsoft-office
  - microsoft-visio
  - microsoft-word
  - microsoft-quattro
  - corel-office
  - corel-util
- merge all no-dependencies extractors into a single module -4
  - plaintext
  - xml
  - html
  - opendocument (but what if a proper OO Java library emerges?)
  - openxml (may be later moved to POI, when we update to POI 3.5)
- merge datasource, accessor and crawler into a single module for
   - file (ds,cr,acc,detector) -3
   - ical (ds,cr) -1
   - web (ds,cr,acc) -2
   - mbox (ds,cr) -1
- merge all plain javamail modules -3
   - crawler-mail
   - crawler-imap
   - datasource-imap
   - sub-mime
- merge no-dependencies security modules -1
   - security-standard
   - security-swing
- mime-extractor is deprecated anyway, but it still uses the html-helper,
which in turn introduces a dependency on htmlparser; removing the
mime-extractor would relieve us of two modules and one ugly LGPL dependency
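The per-item figures above follow the rule that merging n modules into
one removes n - 1 of them. A small Java tally, assuming the
core/util/vocabulary group saves the stated 13 (its members aren't
listed) and using made-up labels such as file-ds for the
ds/cr/acc/detector parts:

```java
import java.util.List;

public class MergeTally {

    /** Merging n modules into a single one removes n - 1 modules. */
    static int saving(List<String> group) {
        return group.size() - 1;
    }

    static int totalMergeSaving() {
        int total = 13; // core, util and vocabulary: members not listed, use the stated -13
        // Labels like "file-ds" are hypothetical shorthand for the ds/cr/acc/detector parts.
        List<List<String>> groups = List.of(
                List.of("microsoft-util", "microsoft-office", "microsoft-visio",
                        "microsoft-word", "microsoft-quattro", "corel-office", "corel-util"),
                List.of("plaintext", "xml", "html", "opendocument", "openxml"),
                List.of("file-ds", "file-cr", "file-acc", "file-detector"),
                List.of("ical-ds", "ical-cr"),
                List.of("web-ds", "web-cr", "web-acc"),
                List.of("mbox-ds", "mbox-cr"),
                List.of("crawler-mail", "crawler-imap", "datasource-imap", "sub-mime"),
                List.of("security-standard", "security-swing"));
        for (List<String> g : groups) {
            total += saving(g);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalMergeSaving());     // 34: merges only
        System.out.println(totalMergeSaving() + 2); // 36: also dropping mime-extractor and html-helper
    }
}
```

Summing the per-item figures gives 34 for the merges alone, plus 2 more
if mime-extractor and html-helper are dropped.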