AW: [smila-user] AW: Funny questions about SMILAs

You are working with the StandardAnalyzer, which has a lot of English-specific features.
 
The StandardAnalyzer does the following things:
* Tokenization (based on a grammar with detection of email addresses, acronyms, CJK, alphanumerics, ...)
* Lowercasing
* Stop word removal
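
The three steps listed above can be sketched roughly as follows. This is only a simplified illustration in Python, not Lucene code: the real tokenizer is grammar-based, the regular expression here is a stand-in, and the stop list and the `analyze` function are hypothetical names chosen for this example.

```python
import re

# Hypothetical stop list for illustration; the analyzer's built-in list differs.
STOP_WORDS = {"a", "an", "and", "the", "of", "is", "to"}

def analyze(text, stop_words=STOP_WORDS):
    """Tokenize on runs of word characters, lowercase, then drop stop words."""
    tokens = re.findall(r"\w+", text)               # 1. tokenization (simplified)
    lowered = [t.lower() for t in tokens]           # 2. lowercasing
    return [t for t in lowered if t not in stop_words]  # 3. stop word removal

print(analyze("The Schüler is reading a Book"))
# prints ['schüler', 'reading', 'book']
```

Passing a different `stop_words` set is similar in spirit to handing a custom stop word list to the analyzer when it is constructed.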
 
BTW: My answer 3 (stemming) was incorrect.
 
With the analyzer you've chosen, ü and ue should not be treated as the same, so this is a bit strange.
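
To illustrate why the two spellings should differ: after lowercasing alone, "über" and "ueber" are still two distinct terms. They only coincide if some extra folding step rewrites the umlauts. The sketch below assumes such a step; `UMLAUT_MAP` and `fold_umlauts` are hypothetical names and are not something the StandardAnalyzer is known to do.

```python
def lowercase_terms(text):
    """Split on whitespace and lowercase; no umlaut handling at all."""
    return [t.lower() for t in text.split()]

# Hypothetical folding table, NOT part of the StandardAnalyzer.
UMLAUT_MAP = {"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"}

def fold_umlauts(term):
    """Rewrite German special characters to their ASCII transliteration."""
    return "".join(UMLAUT_MAP.get(ch, ch) for ch in term)

print(lowercase_terms("Über") == lowercase_terms("Ueber"))  # prints False
print(fold_umlauts("über") == "ueber")                      # prints True
```

If both spellings match in an index built without such a step, it is worth checking what terms were actually written to the index.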
 
Do you have more detailed information, e.g. the search requests?
 
Have you also tried Luke?
 

From: smila-user-bounces@xxxxxxxxxxx [smila-user-bounces@xxxxxxxxxxx] on behalf of Andreas.Schultz@xxxxxxxxxxx [Andreas.Schultz@xxxxxxxxxxx]
Sent: Tuesday, 2 February 2010 21:41
To: smila-user@xxxxxxxxxxx
Cc: Igor.Novakovic@xxxxxxxxxxx
Subject: AW: [smila-user] AW: Funny questions about SMILAs

<?xml version="1.0" encoding="UTF-8"?>

<!--

/***********************************************************************************************************************

 * Copyright (c) 2008 empolis GmbH and brox IT Solutions GmbH. All rights reserved. This program and the accompanying

 * materials are made available under the terms of the Eclipse Public License v1.0 which accompanies this distribution,

 * and is available at http://www.eclipse.org/legal/epl-v10.html

 *

 * Contributors: brox IT-Solutions GmbH - initial creator

 **********************************************************************************************************************/

-->

<AnyFinderDataDictionary xmlns="http://www.anyfinder.de/DataDictionary" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.anyfinder.de/DataDictionary ../xml/AnyFinderDataDictionary.xsd">

  <Index Name="test_index" ForceFlush="true" >

    <Connection xmlns="http://www.anyfinder.de/DataDictionary/Connection" MaxConnections="5"/>

    <IndexStructure xmlns="http://www.anyfinder.de/IndexStructure" Name="test_index">

      <Analyzer ClassName="org.apache.lucene.analysis.standard.StandardAnalyzer"/>

      <IndexField FieldNo="17" IndexValue="true" Name="UserId" StoreText="true" Tokenize="false" Type="Text"/>

      <IndexField FieldNo="16" IndexValue="true" Name="View_4" StoreText="true" Tokenize="false" Type="Text"/>

      <IndexField FieldNo="15" IndexValue="true" Name="View_3" StoreText="true" Tokenize="false" Type="Text"/>

      <IndexField FieldNo="14" IndexValue="true" Name="View_2" StoreText="true" Tokenize="false" Type="Text"/>

      <IndexField FieldNo="13" IndexValue="false" Name="View_1" StoreText="true" Tokenize="false" Type="Text"/>

      <IndexField FieldNo="12" IndexValue="true" Name="Source" StoreText="true" Tokenize="false" Type="Text"/>

      <IndexField FieldNo="11" IndexValue="true" Name="Leading_ID" StoreText="true" Tokenize="false" Type="Text"/>

      <IndexField FieldNo="10" IndexValue="true" Name="Category" StoreText="true" Tokenize="false" Type="Text"/>

      <IndexField FieldNo="9" IndexValue="true" Name="Author" StoreText="true" Tokenize="true" Type="Text"/>

      <IndexField FieldNo="8" IndexValue="true" Name="MimeType" StoreText="true" Tokenize="true" Type="Text"/>

      <IndexField FieldNo="7" IndexValue="true" Name="Size" StoreText="true" Tokenize="true" Type="Number"/>

      <IndexField FieldNo="6" IndexValue="true" Name="Extension" StoreText="true" Tokenize="true" Type="Text"/>

      <IndexField FieldNo="5" IndexValue="true" Name="Title" StoreText="true" Tokenize="true" Type="Text"/>

      <IndexField FieldNo="4" IndexValue="true" Name="Url" StoreText="true" Tokenize="false" Type="Text">

        <Analyzer ClassName="org.apache.lucene.analysis.WhitespaceAnalyzer"/>

      </IndexField>

      <IndexField FieldNo="3" IndexValue="true" Name="LastModifiedDate" StoreText="true" Tokenize="false" Type="Date"/>

      <IndexField FieldNo="2" IndexValue="true" Name="Path" StoreText="true" Tokenize="true" Type="Text"/>

      <IndexField FieldNo="1" IndexValue="true" Name="Filename" StoreText="true" Tokenize="true" Type="Text"/>

      <IndexField FieldNo="0" IndexValue="true" Name="Content" StoreText="true" Tokenize="true" Type="Text"/>

    </IndexStructure>

    <Configuration xmlns="http://www.anyfinder.de/DataDictionary/Configuration" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.anyfinder.de/DataDictionary/Configuration ../xml/DataDictionaryConfiguration.xsd">

      <DefaultConfig>

        <Field FieldNo="17">

          <FieldConfig Constraint="required" Weight="0" xsi:type="FTText">

            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR" Tolerance="exact"/>

          </FieldConfig>

        </Field> 

        <Field FieldNo="16">

          <FieldConfig Constraint="optional" Weight="0" xsi:type="FTText">

            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR" Tolerance="exact"/>

          </FieldConfig>

        </Field>       

        <Field FieldNo="15">

          <FieldConfig Constraint="optional" Weight="0" xsi:type="FTText">

            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR" Tolerance="exact"/>

          </FieldConfig>

        </Field>       

        <Field FieldNo="14">

          <FieldConfig Constraint="optional" Weight="0" xsi:type="FTText">

            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR" Tolerance="exact"/>

          </FieldConfig>

        </Field>       

        <Field FieldNo="13">

          <FieldConfig Constraint="optional" Weight="0" xsi:type="FTText">

            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR" Tolerance="exact"/>

          </FieldConfig>

        </Field>       

        <Field FieldNo="12">

          <FieldConfig Constraint="optional" Weight="0" xsi:type="FTText">

            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR" Tolerance="exact"/>

          </FieldConfig>

        </Field>       

        <Field FieldNo="11">

          <FieldConfig Constraint="optional" Weight="0" xsi:type="FTText">

            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR" Tolerance="exact"/>

          </FieldConfig>

        </Field>       

        <Field FieldNo="10">

          <FieldConfig Constraint="required" Weight="0" xsi:type="FTText">

            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR" Tolerance="exact"/>

          </FieldConfig>

        </Field>       

        <Field FieldNo="9">

          <FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">

            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR" Tolerance="exact"/>

          </FieldConfig>

        </Field>       

        <Field FieldNo="8">

          <FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">

            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR" Tolerance="exact"/>

          </FieldConfig>

        </Field>

        <Field FieldNo="7">

          <FieldConfig Constraint="required" Weight="0" xsi:type="FTNumber">

            <Parameter xmlns="http://www.anyfinder.de/Search/NumberField"/>

          </FieldConfig>

        </Field>

        <Field FieldNo="6">

          <FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">

            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR" Tolerance="exact"/>

          </FieldConfig>

        </Field>       

        <Field FieldNo="5">

          <FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">

            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR" Tolerance="exact"/>

          </FieldConfig>

        </Field>

        <Field FieldNo="4">

          <FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">

            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR" Tolerance="exact"/>

          </FieldConfig>

        </Field>

        <Field FieldNo="3">

          <FieldConfig Constraint="required" Weight="0" xsi:type="FTDate">

            <Parameter xmlns="http://www.anyfinder.de/Search/DateField"/>

          </FieldConfig>

        </Field>

        <Field FieldNo="2">

          <FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">

            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR" Tolerance="exact"/>

          </FieldConfig>

        </Field>

        <Field FieldNo="1">

          <FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">

            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR" Tolerance="exact"/>

          </FieldConfig>

        </Field>

        <Field FieldNo="0">

          <FieldConfig Constraint="optional" Weight="1" xsi:type="FTText">

            <NodeTransformer xmlns="http://www.anyfinder.de/Search/ParameterObjects" Name="urn:ExtendedNodeTransformer">

              <ParameterSet xmlns="http://www.brox.de/ParameterSet"/>

            </NodeTransformer>

            <Parameter xmlns="http://www.anyfinder.de/Search/TextField" Operator="OR" Tolerance="exact"/>

          </FieldConfig>

        </Field>

      </DefaultConfig>

    </Configuration>

  </Index>

</AnyFinderDataDictionary>

 

Andreas Schultz
Senior Software Developer

- - - - Please note my new contact details - - - -


Empolis GmbH  |  Meisenstr. 90 | 33607 Bielefeld  |  Germany
AN ATTENSITY GROUP COMPANY
Phone +49 (0)521 55 785 413|  Fax +49 (0)521 55 785 121
andreas.schultz@xxxxxxxxxxx

 

www.empolis.com
Registered office: Kaiserslautern  |  Kaiserslautern District Court HRB 30711  |  Managing Directors: Dr. Stefan Wess, Dr. Peter Tepassé

 

………………………………………………………………………………………………………………………………………………………………………………………………………..

Know. Right. Now.

That is our philosophy. Empolis, an Attensity Group Company, offers an integrated suite of business applications
that use patented semantic information technologies to analyze, interpret, and automatically process the exponentially
growing volume of unstructured data. Decision makers, experts, employees, and customers thus always receive exactly
the knowledge that is relevant to their work, tailored to their situation and task.

………………………………………………………………………………………………………………………………………………………………………………………………………..

Subscribe to our monthly newsletter: http://www.empolis.de/newsletter.html

 

From: smila-user-bounces@xxxxxxxxxxx [mailto:smila-user-bounces@xxxxxxxxxxx] On behalf of Georg Schmidt
Sent: Tuesday, 2 February 2010 21:40
To: Smila project user mailing list
Cc: Novakovic, Igor, M-E-D
Subject: AW: [smila-user] AW: Funny questions about SMILAs

 

Sorry, the attachment is blocked by my mail provider. Please paste it into the mail body.

 

 


From: smila-user-bounces@xxxxxxxxxxx [smila-user-bounces@xxxxxxxxxxx] on behalf of Andreas.Schultz@xxxxxxxxxxx [Andreas.Schultz@xxxxxxxxxxx]
Sent: Tuesday, 2 February 2010 21:33
To: smila-user@xxxxxxxxxxx
Cc: Igor.Novakovic@xxxxxxxxxxx
Subject: AW: [smila-user] AW: Funny questions about SMILAs

Hi Georg,

 

thanks for your answers.

Attached is the DD (data dictionary) in question.

 

Best

 

Andreas Schultz

 

From: smila-user-bounces@xxxxxxxxxxx [mailto:smila-user-bounces@xxxxxxxxxxx] On behalf of Georg Schmidt
Sent: Tuesday, 2 February 2010 20:27
To: Smila project user mailing list
Subject: [smila-user] AW: Funny questions about SMILAs

 

Hi Andreas,

 

To answer the questions, I need some further information.

 

Please add the data dictionary definition of the index for all questions. In SMILA each field may have its own analyzer, and the search functionality is quite dependent on the analyzer used for a given field.
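
As a rough picture of what "each field may have its own analyzer" means, here is a small Python sketch. It is not SMILA code: the two analyzer functions are simplified stand-ins, and `FIELD_ANALYZERS` and `analyze_field` are hypothetical names. The `Url` override is modelled on a data dictionary in which the index has a default analyzer and one field declares a whitespace analyzer of its own.

```python
import re

def standard_like(text):
    """Simplified default: split on word characters and lowercase."""
    return [t.lower() for t in re.findall(r"\w+", text)]

def whitespace(text):
    """Split on whitespace only; case and punctuation are left untouched."""
    return text.split()

DEFAULT_ANALYZER = standard_like
FIELD_ANALYZERS = {"Url": whitespace}  # per-field override (assumed example)

def analyze_field(field, text):
    """Use the field's own analyzer if it has one, else the index default."""
    return FIELD_ANALYZERS.get(field, DEFAULT_ANALYZER)(text)

print(analyze_field("Title", "Über SMILA"))
# prints ['über', 'smila']
print(analyze_field("Url", "http://Example.org/Über SMILA"))
# prints ['http://Example.org/Über', 'SMILA']
```

The same input text therefore produces different terms depending on the field, which is why the data dictionary is needed to explain a search result.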

 

1) That depends on the analyzer. Keep in mind that stemming (e.g. English stemming) may be applied depending on the analyzer, and German umlauts may just be rubbish to it.

 

2) Yes. Just use the constructor definitions (via the DD; take a look at the schema and parameterize the analyzer the same way its constructors are used in Java).

 

3) Stemming

 

4) Strange... I think they ought to be treated as whitespace.
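
Regarding 4), one possible workaround is to filter such requests on the client side before they reach the search. The sketch below is only an assumption about how that could look: `is_blank_query` is a hypothetical helper, and it assumes the request contains the actual control characters rather than a literal backslash followed by a letter.

```python
def is_blank_query(query):
    """Return True if the query contains nothing but whitespace."""
    return query.strip() == ""

for q in ["\n", "\r", "\t", " \t\n", "Schüler"]:
    # A caller could skip the search and return an empty result when this is True.
    print(repr(q), is_blank_query(q))
```

This does not explain why the index returns hits for such requests; it only avoids sending them.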

 

Kind Regards,

 

Georg

 


From: smila-user-bounces@xxxxxxxxxxx [smila-user-bounces@xxxxxxxxxxx] on behalf of Andreas.Schultz@xxxxxxxxxxx [Andreas.Schultz@xxxxxxxxxxx]
Sent: Monday, 1 February 2010 10:09
To: smila-user@xxxxxxxxxxx; smila-dev@xxxxxxxxxxx
Subject: [smila-user] Funny questions about SMILAs

Hi all,

 

it would be kind of you to help me concerning the following topics:

 

1) How does SMILA work with German special characters like ö, ä, ü, ß?

I tried requests with “Schueler” / “Schüler” and the results were nearly the same.

But when I tried “über” / “ueber”, the second request did not return any results.

So please tell me why Schueler and Schüler seem to be identical as part of a request, but über and ueber are not.

2) Is the Lucene StandardAnalyzer configurable in a way that allows altering, adding, or deleting stop words?

3) Does the Lucene StandardAnalyzer provide normalization?

4) Using “\n”, “\r” or “\t” as a search request leads to a search result which is not empty. Could this be disabled?

 

Best

Andreas Schultz

 

