How to Design Indexes, Really

By: Zend

23   0   1900

Uploaded on 11/06/2012

Speaker: Bill Karwin Karwin Software Solutions

MySQL users commonly ask: Here's my table, what indexes do I need? Why aren't my indexes helping me? Don't indexes cause overhead? This talk gives you some practical answers, with a step by step method for finding the queries you need to optimize, and choosing the best indexes for them.

Comments (9):

By anonymous    2017-09-20

You're right, the index on (year, studioName) is the best. It allows a fast lookup when you have conditions on both columns.

If you have a second query that only searches for year, but not for studioName, then the same index would also help.

But if you only had an index on (studioName, year), and you search only for a specific year, the index would not help.

Think of a telephone book. You can search for people by last name because that's how the book is sorted. You can also search for people by last and first name, and the book helps even more.

But if you search only for people with a given first name, the sort order of the book isn't much help.

You might like my presentation, How to Design Indexes, Really, or the video of me presenting it: https://www.youtube.com/watch?v=ELR7-RdU9XU

The presentation is tailored for MySQL, but many of the ideas apply to any database that implements indexes using B-tree structures.

Original Thread

By anonymous    2017-09-20

Change the primary key to currency, unix_date. Since you have an equality condition for currency, and a range condition for unix_date, you should put the column with the equality condition first. Then both the range condition on unix_date and the ORDER BY should use the primary key order.

Apply the condition to unix_date, not create_date, to get it to use the primary key index.

You have to use a derived table subquery, but you don't have to use two nested levels of subqueries.

SELECT row_num, unix_date, price
FROM (
    SELECT @row := @row + 1 AS row_num, unix_date, price
    FROM (SELECT @row := 0) AS _init
    CROSS JOIN price_data
    WHERE currency = 'USD' 
     AND unix_date BETWEEN UNIX_TIMESTAMP('2017-03-26 00:00:00') 
                       AND UNIX_TIMESTAMP('2017-06-26 23:59:59')
    ORDER BY unix_timestamp DESC
) AS t
WHERE MOD(row_num, 288) = 1

You should learn to use EXPLAIN to help you analyze index usage.

You might also like my presentation How to Design Indexes, Really, and the video: https://www.youtube.com/watch?v=ELR7-RdU9XU

MySQL 8.0 should have windowing functions, so look for that next year sometime.

Original Thread

By anonymous    2017-09-20

MySQL 5.6 introduced some optimizer improvements specifically for long lists of ids in an IN() predicate. Read https://dev.mysql.com/doc/refman/5.6/en/range-optimization.html#equality-range-optimization

That optimization is relevant if you are relying on an index for the IN() search. If you narrow down the search first as in your query 2a, the index will be used for the inequality expression, but not for the IN() predicate.

In general, when you have multiple search terms, only one range predicate will be optimized with an index. A range predicate is anything other than =.

So in your 2a example, the BETWEEN predicate will use the index (I believe the pair of inequality expressions you used will be optimized as if it is one BETWEEN predicate), and then the IN() predicate will just search linearly through the result of the first index scan.

In your 2b example, trying to narrow down the search with MONTH(created) cannot use an index at all. Think of looking up people in a phone book—if I ask you to find everyone whose middle initial is "J." then the fact that the phone book is sorted by last name doesn't help you.

You might like to view my presentation How to Design Indexes, Really, or the video: https://www.youtube.com/watch?v=ELR7-RdU9XU

Original Thread

By anonymous    2017-09-23

+1 to answer from @JuanCarlosOropeza, but you can go a little further with the index.

ALTER TABLE transactions ADD INDEX (
  transaction_type,
  transaction_status,
  created_at,
  transaction_currency,
  transaction_total
);

As @RickJames mentioned in comments, the order of columns is important.

  • First, columns in equality comparisons
  • Next, you can index one column that is used for a range comparison (which is anything besides equality), or GROUP BY or ORDER BY. You have both range comparison and GROUP BY, but you can only get the index to help with one of these.
  • Last, other columns needed for the query, if you think you can get a covering index.

I describe more detail about index design in my presentation How to Design Indexes, Really (video: https://www.youtube.com/watch?v=ELR7-RdU9XU).

You're probably stuck with the "using temporary" since you have a range condition and also a GROUP BY referencing different columns. But you can at least eliminate the "using filesort" by this trick:

...
GROUP BY 
    transaction_currency
ORDER BY NULL

Supposing that it's not important to you which order the rows of the query results return in.

Original Thread

By anonymous    2017-10-15

You might like my presentation, [How to Design Indexes, Really](https://www.slideshare.net/billkarwin/how-to-design-indexes-really), which describes the benefit of additional columns in an index. Also the video of me presenting it: https://www.youtube.com/watch?v=ELR7-RdU9XU

Original Thread

By anonymous    2017-10-22

First of all, you have no indexes defined for your transaction table. So any searches are forced to do a table-scan. It's important to use indexes to speed up searches.

You might like my presentation How to Design Indexes, Really. Or the video: https://www.youtube.com/watch?v=ELR7-RdU9XU

This is probably the most important index for you to create:

ALTER TABLE tbl_transaction ADD INDEX (create_date);

But even after you have an index, when you try to search on an expression like the following, it can't use the index.

date(t.date_create) BETWEEN ? AND ?

You have to rewrite it so the indexed column is alone:

t.date_create BETWEEN ? AND ?

But this means you want the first parameter to be "floored" to time 00:00:00 and the second parameter to be "ceilinged" to time 23:59:59, to make sure the date_create with any time component matches the date range you want it to match.

if( strlen($search['date_from']) > 0  && strlen($search['date_to']) > 0  ){
    $from = search['date_from'] . ' 00:00:00'; 
    $to   = search['date_to']   . ' 23:59:59'; 

    $sQuery .= ' AND t.date_create BETWEEN ? AND ?';
    array_push($fill,$from);
    array_push($fill,$to);
}

Likewise, even for the searches for one specific date, you still have to do a range:

else if( strlen($search['date_from']) > 0 ){
    $from = search['date_from'] . ' 00:00:00'; 
    $to   = search['date_from'] . ' 23:59:59'; 

    $sQuery .= ' AND t.date_create BETWEEN ? AND ?';
    array_push($fill,$from);
    array_push($fill,$to);
}

If you read my presentation on designing indexes, you will have noticed that a range condition like the one on your create_date means any subsequent columns in the index won't help the search. You can use multi-column indexes, but all columns involved in equality conditions must be to the left in the index.

Also, you have multiple conditions that may or may not be present, based on the user's search criteria. This is also hard to optimize, because an indexed search only uses the columns of the index left-to-right. If you skip a column because the search doesn't need it, the subsequent columns won't help.

Another type of condition in your search is text pattern matching with LIKE '%'?'', instead of equality conditions. These LIKE searches can't use a conventional index, but they can use a special fulltext search index. I have a presentation for that too: Full Text Search Throwdown. Video: https://www.youtube.com/watch?v=V8yA8C3CZOc

Original Thread

Recommended Books

    Popular Videos 3001

    Submit Your Video

    If you have some great dev videos to share, please fill out this form.