AW: AW: [smila-dev] SSS & Persistence Questions

Hi all,

#1: I tried it both with a keystore containing the required certificate chain
and with no keystore in the configuration XML. Both yielded the same
exception I described earlier.
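
One way to narrow down a "PKIX path building failed" error is to list the CA
certificates that the JVM running the crawler actually trusts, and check
whether the issuer of the target site's certificate is among them. The
following is a minimal sketch only. It assumes the crawler falls back to the
JVM default trust store (cacerts, or whatever javax.net.ssl.trustStore points
to), which this thread does not confirm. The class and method names are
illustrative and not part of SMILA.

```java
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

// Illustrative helper, not part of SMILA.
public class TrustStoreDump {

    /** Returns the subject names of all CA certificates trusted by default. */
    static List<String> trustedIssuers() throws Exception {
        TrustManagerFactory tmf =
            TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        // Passing null selects the JVM default trust store
        // (assumption: the crawler uses the same one).
        tmf.init((KeyStore) null);

        List<String> names = new ArrayList<>();
        for (TrustManager tm : tmf.getTrustManagers()) {
            if (tm instanceof X509TrustManager) {
                for (X509Certificate cert : ((X509TrustManager) tm).getAcceptedIssuers()) {
                    names.add(cert.getSubjectX500Principal().getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        // Print one trusted issuer per line for comparison with the site's chain.
        for (String name : trustedIssuers()) {
            System.out.println(name);
        }
    }
}
```

If the root or intermediate CA of the site is missing from this list, the
handshake is expected to fail in the way shown in the log. Running the JVM
with -Djavax.net.debug=ssl prints the handshake in more detail.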

#2: There is no particular use case at the moment; I just want to be able to
directly access and dump the raw, unprocessed content that has been crawled.
If it's already in the XML store, fine! :) Then, as a first step, I just
need to know how to access it. (Sorry if this is described somewhere in the
wiki; in that case, please just post me a link.)
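
For the file system variant, the dumping step itself could look roughly like
the sketch below. It is only an illustration: it does not use the SMILA
pipelet interface or the XML store, and the class name, the method name, the
".bin" suffix, and the choice of a SHA-1 hash of the record id as file name
are all assumptions made for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative helper, not part of SMILA.
public class RawContentDump {

    /**
     * Writes the raw bytes of one crawled document into dir.
     * The file name is the hex SHA-1 of the record id, so URLs with
     * slashes or other special characters map to safe file names.
     */
    static Path dump(Path dir, String recordId, byte[] content)
            throws IOException, NoSuchAlgorithmException {
        Files.createDirectories(dir);

        byte[] digest = MessageDigest.getInstance("SHA-1")
            .digest(recordId.getBytes(StandardCharsets.UTF_8));
        StringBuilder name = new StringBuilder();
        for (byte b : digest) {
            name.append(String.format("%02x", b));
        }

        // Overwrites an existing file for the same record id.
        return Files.write(dir.resolve(name + ".bin"), content);
    }
}
```

A pipelet or a small export tool would call dump(...) once per record with
the record's id and its unprocessed content.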


Thanks!

Markus




-----Original Message-----
From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On
Behalf Of dhazin@xxxxxxxxxxxx
Sent: Friday, 7 November 2008 12:27
To: Smila project developer mailing list
Subject: Re: AW: [smila-dev] SSS & Persistence Questions

Hi,

Regarding #1, are you sure that you really need SSL authentication to
crawl your HTTPS website? The SSLCertificate configuration is only needed for
client SSL authentication, so you probably don't need it in your case. Try
to crawl without configuring SSLCertificate and just pass your HTTPS URL as a
seed.

Thanks,
Dmitry


> Hi,
>
> #1: at least I think so, but I will recheck this.
>
> #2: Could you send me that code snippet you mentioned?
>
> Thanks!
>
> Markus
>
> -----Original Message-----
> From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx]
> On Behalf Of August Georg Schmidt
> Sent: Tuesday, 4 November 2008 16:51
> To: Smila project developer mailing list
> Subject: RE: [smila-dev] SSS & Persistence Questions
>
> Hi Markus,
>
> thanks for your interest in SMILA.
>
> To your question #1:
>
> Are you sure you have imported the full certificate chain into your cacerts
> file? The message looks like a missing certificate in your store.
>
> #2:
>
> You have to prepare a pipelet that is able to store this information in the
> file system. Another way is simply to export the data from Berkeley DB XML
> via our XQJ interface. I would be able to send you a snippet as a starting
> point... That should be quite easy and fast: about 5 ms per document on the
> file system.
>
> Hope this helps. If not, try to get in touch with us. (Tomorrow I have
> several meetings, but maybe another member will be able to help.)
>
> Kind regards,
>
> Georg
>
>
> -----Original Message-----
> From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx]
> On Behalf Of Markus
> Sent: Tuesday, 4 November 2008 15:24
> To: smila-dev@xxxxxxxxxxx
> Subject: [smila-dev] SSS & Persistence Questions
>
> Hi Smila Developers,
>
> I have two short questions:
>
>
> - I am currently trying to crawl a website via HTTPS, but I'm getting
> the following error message (in the logfile), although I have imported
> the required TrustedCertificate into a keystore and passed it properly
> via the XML Crawler configuration as a parameter:
>
>   2008-11-03 15:20:39,504 [Thread-8] ERROR fetcher.Fetcher - fetch of
> url failed with javax.net.ssl.SSLHandshakeException:
> sun.security.validator.ValidatorException: PKIX path building failed:
> sun.security.provider.certpath.SunCertPathBuilderException: unable to
> find valid certification path to requested target
>
> How could I track down the error?
>
> - What concretely would I have to do to change the "5 minutes to
> success" example in the wiki in order to persist the content of
> crawled files (either to the file system or to a database) instead of
> just indexing them?
>
>
> Thanks a lot!
>
> Best regards
>
>
> Markus
>
>
> _______________________________________________
> smila-dev mailing list
> smila-dev@xxxxxxxxxxx
> https://dev.eclipse.org/mailman/listinfo/smila-dev


_______________________________________________
smila-dev mailing list
smila-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/smila-dev


