Re: [smila-dev] SMILA Crawling mySQL DB

Hi there,

we are running SMILA-incubation-0.5-M3 and I've attached our crawler config as well as the JdbcDataSourceConnectionConfigSchema.xsd.

Thanks for your patience!

Best,
Kerstin


On 19.02.2010 09:15, Thomas Menzel wrote:
Hi,

This all sounds rather strange. Can you tell me which version/revision
of SMILA you are running?

The whole crawler file would also be helpful.

If there isn't a simple solution to this, then I also think that
opening a bug for it and moving the discussion there will be more
fruitful.

If I have the time today, I'd like to resolve the issue posted below
and finally get rid of it, but I can't promise that.

Thomas Menzel @ IT-Solutions GmbH


-----Original Message-----
From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of Kerstin Bach
Sent: Thursday, 18 February 2010 16:03
To: smila-dev@xxxxxxxxxxx
Subject: Re: [smila-dev] SMILA Crawling mySQL DB

Hi Thomas,

thanks for your fast reply!

We're still struggling, because our changes still do not seem to have
been applied. We changed the
...\SMILA\configuration\org.eclipse.smila.connectivity.framework.crawler.jdbc\schemas\JdbcDataSourceConnectionConfigSchema.xsd


file and of course we added a corresponding jdbc.xml and named the
DataSourceID 'jdbc' in
...\SMILA\configuration\org.eclipse.smila.connectivity.framework

Further, we deleted the workspace and restarted SMILA. We still
get the same error, along with the following message in
SMILA.log:

2010-02-18 15:57:07,890 ERROR [RMI TCP Connection(8)-147.172.96.177 ]  framework.CrawlerControllerAgentBase - org.eclipse.smila.connectivity.ConnectivityException: Error loading DataSource with DataSourceId 'jdbc'

Any ideas what might help to get the crawler started?

Thanks, Kerstin



On 17.02.2010 09:15, Thomas Menzel wrote:
... and I was mistaken...

Checking the regex against the connection string, I noticed that the
"/forum" part isn't covered.
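
For illustration, a pattern that also covers the database path might
look like the sketch below. It assumes the connection string has the
form jdbc:mysql://host[:port]/database with no further parameters, and
that database names consist only of letters, digits, underscores and
hyphens. It has not been tested against SMILA.

```xml
<!-- Sketch only: accepts jdbc:mysql://host[:port]/database,
     e.g. jdbc:mysql://localhost:3306/forum -->
<xs:pattern
  value="jdbc:mysql://[\w\.\-]+(:\d+)?/[\w_\-]+" />
```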

Anyhow, as I said, just change the attribute definition in the
XSD like so:

  ...
  <xs:complexType>
    <xs:attribute name="Connection" use="required" type="xs:normalizedString" />
  ...

The patterns were originally intended as a safety measure, using
XML/XSD to validate the format of the connection string -
nothing else. The JDBC crawler just takes the value as is from the
attribute.
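
In context, the Database element would then look roughly like the
sketch below. The other attributes are copied from the
JdbcDataSourceConnectionConfigSchema.xsd discussed in this thread. Note
that an XSD attribute declaration may carry either a type attribute or
a nested simpleType, but not both, so the old simpleType block with the
patterns has to be removed entirely rather than left next to the new
type.

```xml
<!-- Sketch only: Connection accepts any normalized string, no patterns -->
<xs:element name="Database">
  <xs:complexType>
    <xs:attribute name="Connection"
      use="required" type="xs:normalizedString" />
    <xs:attribute name="User"
      type="xs:string" use="required" />
    <xs:attribute name="Password"
      type="xs:string" use="required" />
    <xs:attribute name="FetchSize"
      type="xs:int" use="required" />
    <xs:attribute name="JdbcDriver"
      type="xs:string" use="optional" />
  </xs:complexType>
</xs:element>
```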

What puzzles me, though, is that your pattern isn't included in the
list of patterns in the SAXParseException message!

Which XSD did you change?

And did you do a restart? Note that, as far as I can tell, all crawler
schemas are registered on startup and on install.

Thomas Menzel @ IT-Solutions GmbH

-----Original Message-----
From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of Thomas Menzel
Sent: Wednesday, 17 February 2010 08:41
To: Smila project developer mailing list
Subject: RE: [smila-dev] SMILA Crawling mySQL DB

Hi Kerstin,

If I'm not mistaken, your restriction pattern doesn't include
the port part (:3306) of the connection string you have specified. If
that is the default port for mySQL, then you might be able to just
omit it in the connection string - that is, if the driver supports
that...

I would simply remove the constraint on the attribute and make
it a normalizedString without any restrictions (and thus without
patterns).

See also bug 282116: [crawler] JDBC :: remove all constraints on the
connection string
https://bugs.eclipse.org/bugs/show_bug.cgi?id=282116

Thomas Menzel @ IT-Solutions GmbH

-----Original Message-----
From: smila-dev-bounces@xxxxxxxxxxx [mailto:smila-dev-bounces@xxxxxxxxxxx] On Behalf Of Kerstin Bach
Sent: Tuesday, 16 February 2010 19:44
To: Smila project developer mailing list
Subject: [smila-dev] SMILA Crawling mySQL DB

Dear all,

we are trying to use the mySQL-Crawler that has already been
mentioned on this list. Is there an example somewhere of how to set
up such a crawler?

We've already created a DataSourceConnectionConfig (based on the
kinkon example). Further, we've added the following mySQL connection
string restriction to the
JdbcDataSourceConnectionConfigSchema.xsd:

<xs:pattern
  value="jdbc:mysql://[\w\.\-]+:\d+(;(DatabaseName|HostProcess|NetAddress|Password|PortNumber|ProgramName|SelectMethod|SendStringParametersAsUnicode|ServerName|User)=[\w\i]+)*" />

However, we got the following error:

Error loading DataSource with DataSourceId 'jdbc': javax.xml.bind.UnmarshalException
 - with linked exception:
[org.xml.sax.SAXParseException: cvc-pattern-valid: Value 'jdbc:mysql://localhost:3306/forum' is not facet-valid with respect to pattern 'jdbc:oracle:thin:@[\w\.\-]+:\d+:\w+|jdbc:microsoft:sqlserver://[\w\.\-]+:\d+(;(DatabaseName|HostProcess|NetAddress|Password|PortNumber|ProgramName|SelectMethod|SendStringParametersAsUnicode|ServerName|User)=[\w\i]+)*|jdbc:sqlserver://[\w\.\-]+:\d+(;(DatabaseName|HostProcess|NetAddress|Password|PortNumber|ProgramName|SelectMethod|SendStringParametersAsUnicode|ServerName|User)=[\w\i]+)*|jdbc:odbc:[\w\.\-]+|jdbc:derby:[\w\.\-\\:/]+' for type '#AnonType_ConnectionDatabaseProcess'.]

What did we miss - any suggestions?

Thanks in advance!

Best, Kerstin

_______________________________________________ smila-dev mailing
list smila-dev@xxxxxxxxxxx
https://dev.eclipse.org/mailman/listinfo/smila-dev

<DataSourceConnectionConfig xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="../org.eclipse.smila.connectivity.framework.crawler.jdbc/schemas/JdbcDataSourceConnectionConfigSchema.xsd">
	<DataSourceID>jdbc</DataSourceID>
	<SchemaID>org.eclipse.smila.connectivity.framework.crawler.jdbc</SchemaID>
	<DataConnectionID>
		<Crawler>JdbcCrawler</Crawler>
	</DataConnectionID>
	<!--CompoundHandling>No</CompoundHandling-->
	<!--DeltaIndexing>disabled</DeltaIndexing-->
	<DeltaIndexing>full</DeltaIndexing>
	<Attributes>
		<Attribute Name="thread_id" HashAttribute="false" KeyAttribute="false" Type="String">
			<ColumnName>thread_id</ColumnName>
			<SqlType>string</SqlType>
		</Attribute>
		<Attribute Name="Title" HashAttribute="true" KeyAttribute="true" Type="String">
			<ColumnName>Title</ColumnName>
			<SqlType>string</SqlType>
		</Attribute>
		<Attribute Name="Content" HashAttribute="true" KeyAttribute="true" Type="String">
			<ColumnName>Content</ColumnName>
			<SqlType>string</SqlType>
		</Attribute>
		<Attribute Name="post_date" HashAttribute="true" KeyAttribute="true" Type="String">
			<ColumnName>post_date</ColumnName>
			<SqlType>string</SqlType>
		</Attribute>
	</Attributes>
	<Process>
		<Selections>
			<SQL>
				(SELECT thread_id, subject as Title, post_text as Content, post_date FROM threads) 
			</SQL>
		</Selections>
		<Database Connection="jdbc:mysql://localhost:3306/forum"
			FetchSize="100000"
			User="user"
			Password="sqlpw" 
			JdbcDriver="com.mysql.jdbc.Driver" />		
	</Process>
</DataSourceConnectionConfig>
<?xml version="1.0" encoding="UTF-8"?>
<!--
  /***********************************************************************************************************************
  * Copyright (c) 2008 empolis GmbH and brox IT Solutions GmbH. All rights reserved. This program and the accompanying
  * materials are made available under the terms of the Eclipse Public License v1.0 which accompanies this distribution,
  * and is available at http://www.eclipse.org/legal/epl-v10.html
  *
  * Contributors: Michael Breidenband (brox IT Solutions GmbH) - initial creator
  **********************************************************************************************************************/
-->
<xs:schema elementFormDefault="qualified"
  attributeFormDefault="unqualified"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:redefine schemaLocation="../../org.eclipse.smila.connectivity.framework.schema/schemas/RootDataSourceConnectionConfigSchema.xsd">
    <xs:complexType name="Process">
      <xs:annotation>
        <xs:documentation>
          Process Specification
        </xs:documentation>
      </xs:annotation>
      <xs:complexContent>
        <xs:extension base="Process">
          <xs:sequence>
            <xs:element name="Selections">
              <xs:complexType>
                <xs:sequence>
                  <xs:element name="Grouping"
                    minOccurs="0">
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element
                          name="Stepping" type="xs:positiveInteger" />
                        <xs:element name="SQL"
                          type="xs:string" />
                      </xs:sequence>
                    </xs:complexType>
                  </xs:element>
                  <xs:element name="SQL"
                    type="xs:string" />
                </xs:sequence>
              </xs:complexType>
            </xs:element>
            <xs:element name="Database">
              <xs:annotation>
                <xs:documentation>
                  Database connection information
                </xs:documentation>
              </xs:annotation>
              <xs:complexType>
                <xs:attribute name="Connection"
                  use="required" type="xs:normalizedString">
                  <xs:simpleType>
                    <xs:restriction
                      base="xs:string">
                      <xs:pattern
                        value="jdbc:oracle:thin:@[\w\.\-]+:\d+:\w+" />
                      <xs:pattern
                        value="jdbc:microsoft:sqlserver://[\w\.\-]+:\d+(;(DatabaseName|HostProcess|NetAddress|Password|PortNumber|ProgramName|SelectMethod|SendStringParametersAsUnicode|ServerName|User)=[\w\i]+)*" />
                      <xs:pattern
                        value="jdbc:sqlserver://[\w\.\-]+:\d+(;(DatabaseName|HostProcess|NetAddress|Password|PortNumber|ProgramName|SelectMethod|SendStringParametersAsUnicode|ServerName|User)=[\w\i]+)*" />
                      <xs:pattern
                        value="jdbc:odbc:[\w\.\-]+" />
                      <xs:pattern
                        value="jdbc:derby:[\w\.\-\\:/]+" />
                      <xs:pattern
                        value="jdbc:mysql://[\w\.\-]+:\d+(;(DatabaseName|HostProcess|NetAddress|Password|PortNumber|ProgramName|SelectMethod|SendStringParametersAsUnicode|ServerName|User)=[\w\i]+)*" />
                      <!-- please modify the connection string restriction in case of custom jdbc drivers -->
                    </xs:restriction>
                  </xs:simpleType>
                </xs:attribute>
                <xs:attribute name="User"
                  type="xs:string" use="required" />
                <xs:attribute name="Password"
                  type="xs:string" use="required" />
                <xs:attribute name="FetchSize"
                  type="xs:int" use="required" />
                <xs:attribute name="JdbcDriver"
                  type="xs:string" use="optional" />
              </xs:complexType>
            </xs:element>
          </xs:sequence>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
    <xs:complexType name="Attribute">
      <xs:complexContent>
        <xs:extension base="Attribute">
          <xs:sequence>
            <xs:element name="ColumnName" type="xs:string" />
            <xs:element name="SqlType">
              <xs:simpleType>
                <xs:restriction base="xs:string">
                  <xs:enumeration value="string" />
                  <xs:enumeration value="long" />
                  <xs:enumeration value="date" />
                  <xs:enumeration value="double" />
                  <xs:enumeration value="blob" />
                  <xs:enumeration value="clob" />
                  <xs:enumeration value="boolean" />
                  <xs:enumeration value="byte[]" />
                  <xs:enumeration value="timestamp" />
                </xs:restriction>
              </xs:simpleType>
            </xs:element>
          </xs:sequence>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
  </xs:redefine>
</xs:schema>
