Re: [smila-dev] SMILA as Search engine

Hi,

On 28.08.2012 13:02, Corinth, Rene wrote:
I work for PT-DLR (http://www.pt-dlr.de/), where we manage a lot of websites. We now want to replace our current search engine with SMILA. By default, SMILA indexes http://wiki.eclipse.org/SMILA/, and it is easy to change the startURL in jobs.json.

Now my problem: I want to give SMILA more than one website (e.g. url1.com + url2.com), and the sites should be indexed independently of each other.

You can add more crawl job definitions, one for each web site.
Either add them to the configuration jobs.json file, or POST them to /smila/jobmanager/jobs.
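
As a rough illustration of the POST variant, the following Python sketch builds one crawl job definition per site and sends it to /smila/jobmanager/jobs. Only the endpoint path comes from this thread. The host and port, the job names, and the fields inside the job definition (workflow, tempStore, dataSource, startUrl, jobToPushTo) are assumptions modeled on a typical web crawl job; check them against the jobs.json shipped with your SMILA version before using this.

```python
import json
import urllib.request

# Assumption: SMILA's HTTP API is reachable at this host and port.
SMILA_JOBS_URL = "http://localhost:8080/smila/jobmanager/jobs"


def build_crawl_job(name, start_url):
    """Return a crawl job definition for one site.

    The field names below are assumptions; copy the real ones from the
    crawl job in your own jobs.json.
    """
    return {
        "name": name,
        "workflow": "webCrawling",
        "parameters": {
            "tempStore": "temp",
            "dataSource": name,
            "startUrl": start_url,
            "jobToPushTo": "indexUpdate",
        },
    }


def post_job(job, url=SMILA_JOBS_URL):
    """POST one job definition as JSON and return the HTTP status code."""
    data = json.dumps(job).encode("utf-8")
    request = urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


# One job per site, so each site can be crawled independently.
jobs = [
    build_crawl_job("crawlUrl1", "http://url1.com/"),
    build_crawl_job("crawlUrl2", "http://url2.com/"),
]

if __name__ == "__main__":
    for job in jobs:
        print(json.dumps(job, indent=2))
        # post_job(job)  # uncomment once a SMILA instance is running
```

Giving each job its own name and data source keeps the crawls separate, so one site can be re-crawled without touching the other.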

Alternatively, a single job can crawl several start URLs, as described at http://wiki.eclipse.org/SMILA/Documentation/Importing/CrawlingMultipleStartURLs.

In addition, if I implement a search form on one of the websites, it should only show content from that site. For example:

If I search for something on url1.com, only results from url1.com should be shown.
For each crawled page you could extract the domain part of the URL into a new attribute and then in the search request add a filter to restrict the result to those pages with the required domain attribute value.
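
The logic behind this suggestion can be sketched in plain Python. This is not SMILA code: in SMILA the extraction would run in the indexing pipeline and the restriction would be a filter in the search request. The record layout and the attribute names "url" and "domain" are hypothetical, and stripping a leading "www." is a design choice made here so that www.url1.com and url1.com count as the same site.

```python
from urllib.parse import urlparse


def extract_domain(url):
    """Return the lowercased host of a URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return host


def add_domain(record, url_attr="url", domain_attr="domain"):
    """Return a copy of the record with the domain stored as a new attribute."""
    enriched = dict(record)
    enriched[domain_attr] = extract_domain(record[url_attr])
    return enriched


def filter_by_domain(records, domain, domain_attr="domain"):
    """Keep only the records whose domain attribute equals the given domain."""
    return [r for r in records if r.get(domain_attr) == domain]


# Hypothetical crawled records from two sites.
crawled = [
    {"url": "http://www.url1.com/news/index.html", "title": "News"},
    {"url": "http://url2.com/about", "title": "About"},
    {"url": "http://url1.com/contact", "title": "Contact"},
]

indexed = [add_domain(r) for r in crawled]
hits = filter_by_domain(indexed, "url1.com")

if __name__ == "__main__":
    for hit in hits:
        print(hit["domain"], hit["title"])
```

A search form on url1.com would then always send "url1.com" as the required domain value along with the user's query.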

On adding attributes to the index, see http://wiki.eclipse.org/SMILA/Documentation/Solr_3.5
On filtering, see http://wiki.eclipse.org/SMILA/Documentation/Search#Query_Parameters

Does anybody know where I could find tutorials for my case, or can anyone give me some hints?

Sorry, there is currently no complete tutorial on this.

Cheers,
Juergen.
