A WebMaster’s view of Eclipse.org

Rants, praise and observations related to the technical and psychological challenges of running servers for a pretty busy site.

Optimizing SQL Part 2: the size of your queries

The size and complexity of a SQL query can greatly influence its execution time. Since MySQL 5.0, subselects are now supported, but I often see SQL novices overuse subselects with disastrous results.

For instance, consider these queries I use in Babel.  For accurate results, I ran all the queries once, then cleared the query cache, then ran them again.

select string_id from strings where file_id = 35;
[snip]
83 rows in set (0.00 sec)
select translation_id from translations where string_id in
    (select string_id from strings where file_id = 35) and language_id = 2;
[snip]
126 rows in set (8.18 sec)

If I replace the above subselect with the exact list of string_id values from the first query, look at the results:

select translation_id from translations where string_id in
    (1072,1073,[snip: 80 others],197676) and language_id = 2;
[snip]
126 rows in set (4.94 sec)

In this case, without any further optimization, it’s worth issuing two simple queries rather than a single subselect.

Of course, in() cannot compete with an inner join for this type of query:

select translation_id from translations
    inner join strings on strings.string_id = translations.string_id
    where translations.language_id = 2 and strings.file_id = 35;
[snip]
126 rows in set (0.02 sec)

Subselects are not always your friend. Using table aliases can then cut down on the amount of bytes you’re sending to your server, reducing the load on your network stack, and making your query easier to read:

select translation_id from translations t
    inner join strings s on s.string_id = t.string_id
    where t.language_id = 2 and s.file_id = 35;

Next tip: maximizing query cache

Posted May 8th, 2009 by Denis Roy in category: Uncategorized
You can skip to the end and leave a response. Pinging is currently not allowed.

4 Responses to “Optimizing SQL Part 2: the size of your queries”


  1. Min Says:

    What about using the “with” syntax for select statement?

    (A little rusty on my sql, but something like this…)

    WITH (select string_id AS myid FROM strings WHERE file_id=35) AS temp
    SELECT translation_id FROM translations WHERE string_id IN temp.myid AND langauge_id=2

    Hmm, not sure if its valid to use “IN” like that, but you could join or do something else against the “temp” table that you’ve created with the first class.


  2. Denis Roy Says:

    Thanks for your feedback. I’m not sure if MySQL supports WITH, but a temp table is not required at all in this case. Read the complete post :-)

    For what it’s worth, yes, you can use IN() that way, but in this case it’s simply not optimal.


  3. Ken Says:

    Oracle (and probably others) optimizes a non-correlated subquery into an inner join. So some of the blame lay on MySQL, not “SQL novices.”


  4. Denis Roy Says:

    Good point, Ken, thanks.

Leave a Reply

You must be logged in using your Eclipse Bugzilla account to post a comment.

Recent Posts

Archives

Categories

Meta