Re: [smila-dev] search api : result structure

>-----Original Message-----
>From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of Juergen Schumacher
>Sent: Thursday, 24 March 2011 13:27
>To: Smila project developer mailing list
>Subject: Re: [smila-dev] search api : result structure
>
...

>> Discussion
>> Top 1 
>> TM: I suppose that the '_' indicates meta information about the
>> result item? Is that a convention in this context?
>
>Yes, it is. See the explanation for "Metadata Elements" under
>http://wiki.eclipse.org/SMILA/Documentation/2011.Simplification/Data_Model_and_Serialization_Formats#Concepts

ah, thanks. I missed that when reading...

>> Top 2 
>> TM: previously each result item was its own record. What is the reason
>> for changing this so completely?
>
>The main reason was to be able to get rid of the "SearchPipelet" and
>"SearchPipeline" distinction. And if we put all search result records on
>the blackboard side by side with the query record, it would not be nice
>to distinguish them after the retrieval (id[0] is query, rest is
>results? not so nice). It's easier this way, I think. It would even
>enable you to process multiple searches in one single call (maybe in
>different indexes?). It's not supported by the current search pipelet
>implementations and search service API, but that could be extended quite
>easily (just add some [] and loops ;-)

that makes sense. +1 then 
I'm wondering if we should include this as an important hint regarding migration...
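
For such a migration hint, a minimal sketch of the new shape might help. It uses plain Java maps as stand-ins, not the actual SMILA record or AnyMap classes, and the class name, the method names, and the keys "query" and "records" are assumptions made up for illustration. Only "_recordid" and "_weight" are taken from this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in: one map plays the role of the single search record.
final class SearchResultSketch {

    // A result item is plain metadata, not a record of its own.
    static Map<String, Object> resultItem(String recordId, double weight) {
        Map<String, Object> item = new LinkedHashMap<>();
        item.put("_recordid", recordId);
        item.put("_weight", weight);
        return item;
    }

    // The query and its result items live side by side in one record.
    static Map<String, Object> searchRecord(String query, List<Map<String, Object>> items) {
        Map<String, Object> rec = new LinkedHashMap<>();
        rec.put("query", query);    // assumed key name
        rec.put("records", items);  // assumed key name
        return rec;
    }

    // No "id[0] is the query, the rest are results" guessing is needed.
    @SuppressWarnings("unchecked")
    static List<Map<String, Object>> resultItems(Map<String, Object> rec) {
        return (List<Map<String, Object>>) rec.get("records");
    }

    public static void main(String[] args) {
        List<Map<String, Object>> items = new ArrayList<>();
        items.add(resultItem("doc-1", 0.9));
        items.add(resultItem("doc-2", 0.4));
        Map<String, Object> searchRec = searchRecord("smila", items);
        for (Map<String, Object> item : resultItems(searchRec)) {
            System.out.println(item.get("_recordid") + " " + item.get("_weight"));
        }
    }
}
```

Processing several searches in one call would then amount to passing a list of such records and looping over them.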

>> TOP 3 
>> TM: since score is used to describe and explain the meaning of
>> _weight, why not use _score and convey the meaning directly? 
>
>_weight is a name we used internally, so I just reused it. _score would
>be fine with me. We are in some clean-up discussions of our internal
>APIs anyway, so I'll see if we can align this.

+1

>
>> TOP 4 return binary content
>> TM: there is no nice way to return binary content anymore. these 2
>> solutions i came up with: 
>> 
>>      1. add an attachment to the search record with a name after this
>>         pattern: <resultItem-record.Id>.<resultItem.attachmentName> 
>>      2. convert the byte[] into a string and return it in the AnyMap
>> 
>I thought of something like 1., maybe
><result-item-index>.<attachmentName>, e.g. "0.content", "1.content",
>it's a bit shorter than using the _recordid. 2. would mean using
>something like base64 for real binary data.
>On the other hand, I think of attachments as a means to get big, binary
>data into the system during crawling and transform it to "real" metadata
>in the indexing process so that you don't need to have attachments in
>the search result, therefore I did not specify this in more detail.

well, we have a use case where we return the full content of the indexed document to a client. 
to enforce access rights, we need to run a search on the index that checks whether the requesting user has access and, as part of the result, returns the document byte[] in an attachment.

I can live with the solutions above, and just wanted to point out that such a need exists.
we should add that to the wiki as well...
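
For the wiki, the two options could be sketched roughly as follows. This is only an illustration in plain Java: the class and method names are hypothetical and not part of any SMILA API. Only the "<result-item-index>.<attachmentName>" pattern and the base64 idea come from the discussion above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not an existing SMILA class.
final class ResultAttachmentSketch {

    // Option 1: name the attachment on the search record after the
    // index of the result item it belongs to, e.g. "0.content".
    static String attachmentName(int resultItemIndex, String attachmentName) {
        return resultItemIndex + "." + attachmentName;
    }

    // Option 2: carry the bytes as a base64 string inside the result item metadata.
    static String encodeContent(byte[] content) {
        return Base64.getEncoder().encodeToString(content);
    }

    // The client reverses option 2 to get the original bytes back.
    static byte[] decodeContent(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }

    public static void main(String[] args) {
        byte[] content = "hello".getBytes(StandardCharsets.UTF_8);
        System.out.println(attachmentName(0, "content")); // prints 0.content
        System.out.println(encodeContent(content));       // prints aGVsbG8=
    }
}
```

Option 1 keeps the bytes out of the metadata, while option 2 grows the payload by about a third because of the base64 encoding.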

>
>> I want to add this point to the list:
>>
>> top 5 result record vs. result item
>
>> since the results are returned as part of one record now (see top 2),
>> I suggest using different wording here. I would talk of result *items*
>> instead of result *records*, since they are not records anymore (in a
>> technical sense) and it gets confusing when classes and documentation
>> still talk about result *records* (at least I was confused in the
>> beginning). Hence, I would rename ResultRecordAccessor ->
>> ResultItemAccessor, which IMO yields a clearer distinction from the
>> ResultAccessor.
>
>Basically the same answer as for TOP 3: just reused an internally used
>name, I'm not in love with it myself (: However, they may not be records
>anymore in the sense of the data object, but they are metadata from
>indexed records. We could also use "results". I'll add it to my list.

exactly my sentiment! +1

