Re: [smila-dev] search api : result structure

Hi, 

On Thursday, 24.03.2011, at 10:59 +0100, Thomas Menzel wrote:
> hi,
> I have posted a discussion note on the wiki page, to which nobody
> has replied yet. 
> 
> hence, in case you missed it, I want to give a gentle hint via the
> mailing list and suggest to start/continue the discussion by mail of
> these items that I just copied from the wiki page:

Yes, I missed it. Sorry (; 

> Discussion
> TOP 1 
> TM: I suppose that the '_' indicates meta information regarding the
> result item? is that a convention in this context? 

Yes, it is. See the explanation for "Metadata Elements" under
http://wiki.eclipse.org/SMILA/Documentation/2011.Simplification/Data_Model_and_Serialization_Formats#Concepts

> TOP 2 
> TM: previously each result item was its own record. what is the reason
> to change this so completely? 

The main reason was to be able to get rid of the "SearchPipelet" and
"SearchPipeline" distinction. And if we put all search result records on
the blackboard side by side with the query record, it would be awkward
to tell them apart after the retrieval (id[0] is query, rest is
results? not so nice). It's easier this way, I think. It would even
enable you to process multiple searches in one single call (maybe in
different indexes?). This is not supported by the current search pipelet
implementations or the search service API, but they could be extended
quite easily (just add some [] and loops ;-)
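
To illustrate the idea, here is a minimal sketch using plain Python
dictionaries rather than the actual SMILA classes. The key names "query"
and "records" and all sample values are assumptions made for this
example; only "_recordid" and "_weight" are names mentioned in this
thread. The authoritative structure is the one on the wiki page.

```python
def result_items(search_record):
    """Return the result items nested inside one search record."""
    return search_record.get("records", [])


def split_item(item):
    """Separate '_'-prefixed metadata elements from ordinary attributes."""
    meta = {k: v for k, v in item.items() if k.startswith("_")}
    attrs = {k: v for k, v in item.items() if not k.startswith("_")}
    return meta, attrs


# One record carries both the query and its results, so nothing on the
# blackboard has to be told apart by position (no "id[0] is the query").
search_record = {
    "_recordid": "query-1",          # hypothetical id
    "query": "eclipse smila",        # hypothetical key and value
    "records": [                     # hypothetical key for the result items
        {"_recordid": "doc-17", "_weight": 0.92, "title": "SMILA overview"},
        {"_recordid": "doc-03", "_weight": 0.41, "title": "Pipelets"},
    ],
}

for item in result_items(search_record):
    meta, attrs = split_item(item)
    print(meta["_recordid"], meta["_weight"], attrs["title"])
```

Handling several searches in one call would then only need a list of
such records and one more outer loop.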

> TOP 3 
> TM: since score is used to describe and explain the meaning of
> _weight, why not use _score and convey the meaning directly? 

_weight is a name we used internally, so I just reused it. _score would
be fine with me. We are in some clean-up discussions of our internal
APIs anyway, so I'll see if we can align this.

> TOP 4 return binary content
> TM: there is no nice way to return binary content anymore. I came up
> with these 2 solutions: 
> 
>      1. add an attachment to the search record with a name after this
>         pattern: <resultItem-record.Id>.<resultItem.attachmentName> 
>      2. convert the byte[] into a string and return it in the AnyMap
> 
I thought of something like 1., maybe
<result-item-index>.<attachmentName>, e.g. "0.content", "1.content",
which is a bit shorter than using the _recordid. 2. would mean using
something like base64 for real binary data.
On the other hand, I think of attachments as a means to get big, binary
data into the system during crawling and to transform it into "real"
metadata in the indexing process, so that you don't need to have
attachments in the search result. Therefore I did not specify this in
more detail.
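
Both options could look roughly like the following sketch. The helper
names are hypothetical and not part of any SMILA API; only the
"<result-item-index>.<attachmentName>" pattern and the use of base64 come
from the discussion above.

```python
import base64


def attachment_name(item_index, name):
    """Option 1: name the attachment '<result-item-index>.<attachmentName>'."""
    return f"{item_index}.{name}"


def encode_binary(data):
    """Option 2: turn raw bytes into a base64 string usable as a map value."""
    return base64.b64encode(data).decode("ascii")


def decode_binary(text):
    """Reverse of encode_binary, for the client reading the result."""
    return base64.b64decode(text)


payload = bytes([0, 255, 16, 32])  # arbitrary sample bytes
print(attachment_name(0, "content"))
print(encode_binary(payload))
```

Option 1 keeps the result items free of binary data, at the cost of a
naming convention the client has to know. Option 2 is self-contained, but
base64 makes the data roughly a third larger.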

> adding to this list I want this point:
>
> TOP 5 result record vs. result item

> since the results are returned as part of one record now (see TOP 2),
> I suggest using different wording here. I would talk of result *items*
> instead of result *records*, since they are not records anymore (in a
> technical sense) and it gets confusing when classes and documentation
> still talk about result *records* (at least I was confused in the
> beginning). hence, I would rename the ResultRecordAccessor ->
> ResultItemAccessor, which yields IMO a clearer distinction from the
> ResultAccessor.

Basically the same answer as for TOP 3: I just reused an internally used
name, and I'm not in love with it myself (: However, while they may not
be records anymore in the sense of the data object, they are still
metadata from indexed records. We could also use "results". I'll add it
to my list.

Cheers,
Jürgen.



