
How MySQL Uses Indexes

Author: unknown   Date: 2004-11-22

Indexes are used to find rows with specific column values quickly. Without an index, MySQL must begin with the first row and then read through the entire table to find the relevant rows. The larger the table, the more this costs. If the table has an index for the columns in question, MySQL can quickly determine the position to seek to in the middle of the data file without having to look at all the data. If a table has 1,000 rows, this is at least 100 times faster than reading sequentially. If you need to access most of the rows, it is faster to read sequentially, because this minimizes disk seeks.
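
The cost difference can be made concrete with a toy Python model. It is a sketch under stated assumptions, not MySQL's storage engine: the table is an in-memory list of keys, the "index" is assumed to be a sorted copy of those keys paired with row positions (standing in for a B-tree), cost is counted simply as entries examined, and the helper names are hypothetical.

```python
import random

def full_scan(rows, value):
    """Read every row in table order; return matching positions and rows examined."""
    matches = [pos for pos, key in enumerate(rows) if key == value]
    return matches, len(rows)

def build_index(rows):
    """Toy index: the keys in sorted order, each paired with its row position."""
    pairs = sorted((key, pos) for pos, key in enumerate(rows))
    return [k for k, _ in pairs], [p for _, p in pairs]

def index_seek(keys, positions, value):
    """Binary-search the sorted keys; return matching positions and entries examined."""
    examined = 0
    lo, hi = 0, len(keys)
    while lo < hi:  # find the first key >= value
        mid = (lo + hi) // 2
        examined += 1
        if keys[mid] < value:
            lo = mid + 1
        else:
            hi = mid
    matches = []
    while lo < len(keys) and keys[lo] == value:  # read the run of equal keys
        matches.append(positions[lo])
        examined += 1
        lo += 1
    return matches, examined

rows = list(range(1000))
random.Random(42).shuffle(rows)  # 1,000 distinct keys in arbitrary table order
keys, positions = build_index(rows)

scan_hits, scan_cost = full_scan(rows, 500)
seek_hits, seek_cost = index_seek(keys, positions, 500)
print(scan_cost, seek_cost)  # 1000 for the scan, at most 11 for the seek
```

Because the keys are kept in order, each probe halves the remaining range, so roughly log2(1000), about 10, probes reach the matching entry.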

Most MySQL indexes (PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT) are stored in B-trees. Exceptions are that indexes on spatial data types use R-trees, and that MEMORY tables also support hash indexes.

Strings are automatically prefix- and end-space compressed. See Section 13.1.4, “CREATE INDEX Syntax”.

In general, indexes are used as described in the following discussion. Characteristics specific to hash indexes (as used in MEMORY tables) are described at the end of this section.

MySQL uses indexes for these operations:

  • To find the rows matching a WHERE clause quickly.

  • To eliminate rows from consideration. If there is a choice between multiple indexes, MySQL normally uses the index that finds the smallest number of rows.

  • To retrieve rows from other tables when performing joins.

  • To find the MIN() or MAX() value for a specific indexed column key_col. This is optimized by a preprocessor that checks whether you are using WHERE key_part_N = constant on all key parts that occur before key_col in the index. In this case, MySQL does a single key lookup for each MIN() or MAX() expression and replaces it with a constant. If all expressions are replaced with constants, the query returns at once. For example:

    SELECT MIN(key_part2),MAX(key_part2)
      FROM tbl_name WHERE key_part1=10;

  • To sort or group a table if the sorting or grouping is done on a leftmost prefix of a usable key (for example, ORDER BY key_part1, key_part2). If all key parts are followed by DESC, the key is read in reverse order. See Section 7.2.12, “ORDER BY Optimization”.

  • In some cases, a query can be optimized to retrieve values without consulting the data rows. If a query uses only columns from a table that are numeric and that form a leftmost prefix for some key, the selected values may be retrieved from the index tree for greater speed:

    SELECT key_part3 FROM tbl_name
      WHERE key_part1=1;
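
The MIN()/MAX() shortcut works because, once key_part1 is fixed to a constant, the matching index entries are contiguous and already ordered by key_part2, so the first and last entries of that run hold the answers. A minimal Python sketch of the idea, assuming the index is modeled as a sorted list of tuples and key_part1 is an integer (the data and the helper name are hypothetical):

```python
import bisect

# Hypothetical composite index on (key_part1, key_part2), modeled as a sorted
# list of tuples; a B-tree keeps its entries in this same order.
index = sorted([(10, 7), (3, 1), (10, 2), (11, 0), (10, 9), (3, 8)])

def min_max_for_prefix(index, key_part1):
    """Return (MIN(key_part2), MAX(key_part2)) for rows with the given key_part1,
    using two binary searches instead of reading every matching entry."""
    lo = bisect.bisect_left(index, (key_part1,))
    # Assumes integer key_part1, so key_part1 + 1 starts the next run.
    hi = bisect.bisect_left(index, (key_part1 + 1,))
    if lo == hi:
        return None, None  # no rows match: MIN() and MAX() would be NULL
    return index[lo][1], index[hi - 1][1]

print(min_max_for_prefix(index, 10))  # (2, 9)
```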

Suppose that you issue the following SELECT statement:

mysql> SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;
  

If a multiple-column index exists on col1 and col2, the appropriate rows can be fetched directly. If separate single-column indexes exist on col1 and col2, the optimizer tries to find the most restrictive index by deciding which index finds fewer rows and using that index to fetch the rows.

If the table has a multiple-column index, any leftmost prefix of the index can be used by the optimizer to find rows. For example, if you have a three-column index on (col1, col2, col3), you have indexed search capabilities on (col1), (col1, col2), and (col1, col2, col3).

MySQL cannot use a partial index if the columns do not form a leftmost prefix of the index. Suppose that you have the SELECT statements shown here:

SELECT * FROM tbl_name WHERE col1=val1;
SELECT * FROM tbl_name WHERE col1=val1 AND col2=val2;

SELECT * FROM tbl_name WHERE col2=val2;
SELECT * FROM tbl_name WHERE col2=val2 AND col3=val3;
  

If an index exists on (col1, col2, col3), only the first two queries use the index. The third and fourth queries do involve indexed columns, but (col2) and (col2, col3) are not leftmost prefixes of (col1, col2, col3).
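
Why only leftmost prefixes help can be seen by modeling the (col1, col2, col3) index as a sorted list of tuples. In this hypothetical Python sketch, fixing col1 (or col1 and col2) selects one contiguous run that binary search can find, whereas a condition on col2 alone matches entries scattered across the whole index.

```python
import bisect

# Hypothetical index on (col1, col2, col3), modeled as entries in sorted order.
index = sorted([(1, 1, 1), (1, 2, 3), (2, 1, 5), (2, 2, 2), (3, 2, 1), (1, 2, 9)])

def prefix_range(index, prefix):
    """Return the entries whose leading columns equal `prefix`.

    Because `prefix` fixes the leftmost columns, all matches form one
    contiguous run: binary search finds its start, and the scan stops at
    the first entry that no longer matches.
    """
    start = bisect.bisect_left(index, prefix)
    end = start
    while end < len(index) and index[end][:len(prefix)] == prefix:
        end += 1
    return index[start:end]

def col2_positions(index, col2):
    """WHERE col2 = value is not a leftmost prefix: every entry must be checked."""
    return [pos for pos, entry in enumerate(index) if entry[1] == col2]

print(prefix_range(index, (1,)))    # [(1, 1, 1), (1, 2, 3), (1, 2, 9)]
print(prefix_range(index, (1, 2)))  # [(1, 2, 3), (1, 2, 9)]
print(col2_positions(index, 2))     # [1, 2, 4, 5]: scattered, not one run
```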

A B-tree index can be used for column comparisons in expressions that use the =, >, >=, <, <=, or BETWEEN operators. The index also can be used for LIKE comparisons if the argument to LIKE is a constant string that does not start with a wildcard character. For example, the following SELECT statements use indexes:

SELECT * FROM tbl_name WHERE key_col LIKE 'Patrick%';
SELECT * FROM tbl_name WHERE key_col LIKE 'Pat%_ck%';

In the first statement, only rows with 'Patrick' <= key_col < 'Patricl' are considered. In the second statement, only rows with 'Pat' <= key_col < 'Pau' are considered.

The following SELECT statements do not use indexes:

SELECT * FROM tbl_name WHERE key_col LIKE '%Patrick%';
SELECT * FROM tbl_name WHERE key_col LIKE other_col;

In the first statement, the LIKE value begins with a wildcard character. In the second statement, the LIKE value is not a constant.
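
The ranges quoted above follow from a simple rule: the constant part of the pattern before the first wildcard is the lower bound, and the same prefix with its last character incremented is the exclusive upper bound. The Python sketch below models that rule under simplifying assumptions (only % and _ as wildcards, no escape characters, code-point ordering rather than a MySQL collation); the function name is hypothetical. The LIKE other_col case has no constant to analyze at all, so it is not modeled.

```python
def like_prefix_range(pattern):
    """Return (low, high) such that low <= value < high for every value that can
    match `pattern`, or None if the pattern starts with a wildcard.

    Assumptions: '%' and '_' are the only wildcards, there are no escape
    characters, and strings compare by code point (MySQL uses the column's
    collation instead).
    """
    cut = len(pattern)
    for i, ch in enumerate(pattern):
        if ch in "%_":
            cut = i
            break
    prefix = pattern[:cut]
    if not prefix:
        return None  # leading wildcard: no usable range, so no index lookup
    # Exclusive upper bound: the prefix with its last character incremented.
    high = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return prefix, high

print(like_prefix_range("Patrick%"))   # ('Patrick', 'Patricl')
print(like_prefix_range("Pat%_ck%"))   # ('Pat', 'Pau')
print(like_prefix_range("%Patrick%"))  # None
```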

If you use ... LIKE '%string%' and string is longer than three characters, MySQL uses the Turbo Boyer-Moore algorithm to initialize the pattern for the string and then uses this pattern to perform the search more quickly.
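
MySQL's Turbo Boyer-Moore code is not reproduced here. As a rough illustration of the idea it builds on, the Python sketch below implements the simpler Boyer-Moore-Horspool variant (a related, swapped-in algorithm): the pattern is preprocessed once into a shift table, and each mismatch lets the search skip ahead by up to the full pattern length instead of one character. The function name is hypothetical.

```python
def horspool_find(text, pattern):
    """Return the index of the first occurrence of pattern in text, or -1.

    Boyer-Moore-Horspool: compare right to left; on a mismatch, shift by the
    distance from the last occurrence (excluding the final position) of the
    text character aligned with the pattern's last character.
    """
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    # Preprocessing: build the "bad character" shift table once per pattern.
    # Later occurrences overwrite earlier ones, keeping the smallest shift.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    pos = 0
    while pos <= n - m:
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos
        pos += shift.get(text[pos + m - 1], m)
    return -1

print(horspool_find("Hello Patrick Smith", "Patrick"))  # 6
```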

A search using col_name IS NULL employs indexes if col_name is indexed.

Any index that does not span all AND levels in the WHERE clause is not used to optimize the query. In other words, to be able to use an index, a prefix of the index must be used in every AND group.

The following WHERE clauses use indexes:

... WHERE index_part1=1 AND index_part2=2 AND other_column=3
    /* index = 1 OR index = 2 */
... WHERE index=1 OR A=10 AND index=2
    /* optimized like "index_part1='hello'" */
... WHERE index_part1='hello' AND index_part3=5
    /* Can use index on index1 but not on index2 or index3 */
... WHERE index1=1 AND index2=2 OR index1=3 AND index3=3;

These WHERE clauses do not use indexes:

    /* index_part1 is not used */
... WHERE index_part2=1 AND index_part3=2

    /*  Index is not used in both parts of the WHERE clause  */
... WHERE index=1 OR A=10

    /* No index spans all rows  */
... WHERE index_part1=1 OR index_part2=10
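
The rule can be modeled by rewriting the WHERE clause as OR-ed AND groups (AND binds more tightly than OR) and checking that every group constrains the index's leftmost column. The Python sketch below is a simplified, hypothetical check that ignores values and operators and encodes only the rule stated in this section; it reproduces the verdicts of the examples above.

```python
def index_usable(index_columns, and_groups):
    """Apply this section's rule to a WHERE clause in OR-of-ANDs form.

    `and_groups` lists, for each AND group, the set of columns it compares.
    The index is usable only if every group constrains the index's leftmost
    column; otherwise rows matching some group could be anywhere in the index.
    """
    leftmost = index_columns[0]
    return all(leftmost in group for group in and_groups)

composite = ("index_part1", "index_part2", "index_part3")

# ... WHERE index_part1=1 AND index_part2=2 AND other_column=3
print(index_usable(composite, [{"index_part1", "index_part2", "other_column"}]))  # True
# ... WHERE index=1 OR A=10 AND index=2   (AND binds tighter than OR)
print(index_usable(("index",), [{"index"}, {"A", "index"}]))  # True
# ... WHERE index_part2=1 AND index_part3=2
print(index_usable(composite, [{"index_part2", "index_part3"}]))  # False
# ... WHERE index=1 OR A=10
print(index_usable(("index",), [{"index"}, {"A"}]))  # False
```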

Sometimes MySQL does not use an index, even if one is available. One circumstance under which this occurs is when the optimizer estimates that using the index would require MySQL to access a very large percentage of the rows in the table. (In this case, a table scan is likely to be much faster because it requires fewer seeks.) However, if such a query uses LIMIT to retrieve only some of the rows, MySQL uses an index anyway, because it can much more quickly find the few rows to return in the result.

Hash indexes have somewhat different characteristics from those just discussed:

  • They are used only for equality comparisons that use the = or <=> operators (but are very fast). They are not used for comparison operators such as < that find a range of values.

  • The optimizer cannot use a hash index to speed up ORDER BY operations. (This type of index cannot be used to search for the next entry in order.)

  • MySQL cannot determine approximately how many rows there are between two values (this is used by the range optimizer to decide which index to use). This may affect some queries if you change a MyISAM table to a hash-indexed MEMORY table.

  • Only whole keys can be used to search for a row. (With a B-tree index, any leftmost prefix of the key can be used to find rows.)
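
A Python dict behaves much like a hash index, which makes these limits easy to see. In this hypothetical sketch (not MySQL's MEMORY engine), the index maps each complete key to its row positions: an equality probe is a single hash lookup, but there is no order to walk for a range or ORDER BY, and a partial key cannot be looked up at all.

```python
from collections import defaultdict

class HashIndex:
    """Toy hash index: maps a complete key tuple to the positions of its rows.

    It answers only whole-key equality lookups; it keeps no ordering, so
    ranges, ORDER BY, and leftmost-prefix searches are not possible.
    """

    def __init__(self, rows, key_columns):
        self.key_columns = key_columns
        self.buckets = defaultdict(list)
        for pos, row in enumerate(rows):
            self.buckets[tuple(row[c] for c in key_columns)].append(pos)

    def lookup(self, *key):
        """Equality lookup on the whole key (one hash probe)."""
        if len(key) != len(self.key_columns):
            raise ValueError("a hash index can only be searched with the whole key")
        return list(self.buckets.get(key, []))

rows = [{"a": 1, "b": 2}, {"a": 1, "b": 3}, {"a": 2, "b": 2}, {"a": 1, "b": 2}]
idx = HashIndex(rows, ("a", "b"))
print(idx.lookup(1, 2))  # [0, 3]
try:
    idx.lookup(1)  # a leftmost prefix is not enough for a hash index
except ValueError as err:
    print(err)
```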


User Comments

Posted by Michael Schröpl on July 23 2002 1:42pm

One case when MySQL 3.23 does not use an index is if it has to implicitly convert types.

Imagine you have a column of a VARCHAR type but query it in the form "SELECT * FROM tablename WHERE columnname = 123"; MySQL implicitly converts the column values to numbers for the comparison and then does a full table scan.
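
A short Python sketch shows why a numeric comparison defeats an index on a string column. The data is hypothetical, and the conversion is simplified to Python's int() rather than MySQL's own string-to-number rules: values that are equal as numbers need not be adjacent in string order, so there is no single range to seek to.

```python
# A hypothetical VARCHAR column, as its index stores it: sorted as strings.
index = sorted(["99", "123", "0123", "1000", "5"])
print(index)  # ['0123', '1000', '123', '5', '99']

# WHERE columnname = 123 compares numerically, so every stored string has to
# be converted before it can be compared (simplified here to int()).
positions = [i for i, value in enumerate(index) if int(value) == 123]
print(positions)  # [0, 2]: the matches are not adjacent in string order

# Checking every entry like this is a full scan; the string ordering of the
# index cannot narrow the search down to one contiguous range.
contiguous = positions == list(range(positions[0], positions[-1] + 1))
print(contiguous)  # False
```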

Posted by Ed Soniat on November 1 2002 7:03am

This section should include information about the cost of indexes, such as the size of an index and the cost of keeping it current. An overview of how to decide when to use an index would be good too.

Posted by [name withheld] on June 4 2003 2:21am

The 'not using indexes when using OR' feature is rather annoying... surely it's quicker to use the index than to scan a large table? In the example below, MemberID is the primary key, MembershipNumber is indexed, and Member is a table with over 450,000 rows.

select MembershipNumber, MemberID from Member where MemberID = 123896920 and MembershipNumber = 1029540;
1 row in set (0.00 sec)

select MembershipNumber, MemberID from Member where MemberID = 123896920 or MembershipNumber = 1029540;
1 row in set (11.49 sec)



Posted by Scott James on June 18 2003 3:13am

With 3.28.56 I was able to use a composite index on two columns, in one special case:

mysql> create table t(a int, b int, c int);
mysql> create index t_idx on t (a,b,c);
mysql> insert into t values (1, 1, 1);
... (load 10 more test rows like this) ...

mysql> explain select * from t where a=1 or b=1;
| table | type  | possible_keys | key   | key_len | ref  |
| t     | index | t_idx         | t_idx | 15      | NULL |


However, notice that the composite index is on ALL columns of the table (rarely practical). So, let's create an index on only the first two columns:

mysql> drop index t_idx from t;
mysql> create index t_idx on t(a,b);
mysql> explain select * from t where a=1 or b=1;
| table | type | possible_keys | key  | key_len | ref  |
| t     | ALL  | t_idx         | NULL | NULL    | NULL |

So, if you absolutely need to use OR to join multiple rows, consider creating a smaller in-memory table (CREATE ... SELECT ... TYPE=HEAP). This could help if you do this kind of join multiple times in a single transaction.

Posted by Donny Simonton on November 4 2003 1:56pm

Instead of using OR, just use IN. For example, you would write your query like this:

select MembershipNumber, MemberID from Member where MemberID IN ('123896920', '1029540');

I'm running 4.1 currently and there is no difference between OR and IN. But on an older 3.x box we have, IN makes a huge difference.

Posted by Martin Mokrejs on November 5 2003 2:06am

Scott James above gave us nice examples. Unfortunately, the syntax to drop the index needs ON instead of FROM:

DROP INDEX index_name ON tbl_name

Posted by [name withheld] on December 15 2003 6:31am

Hi,

Not sure what version of MySQL you are talking about; looking at the docs here: http://www.mysql.com/doc/en/ALTER_TABLE.html

This is the correct pseudo syntax:

alter table test drop index I_TEST;

[edited]
Looks like this link might explain some of the confusion http://www.mysql.com/doc/en/DROP_INDEX.html

I would suggest using the alter table statement.


greetings,

glenn

Posted by Mike Chirico on April 16 2004 2:14pm

An index can be used to surgically remove duplicate entries.

For an updated example, see:
http://osdn.dl.sourceforge.net/sourceforge/souptonuts/README_mysql.txt

Assume the following table and data.

CREATE TABLE IF NOT EXISTS dupTest (
  pkey int(11) NOT NULL auto_increment,
  a int,
  b int,
  c int,
  timeEnter timestamp(14),
  PRIMARY KEY (pkey)
);

insert into dupTest (a,b,c) values (1,2,3),(1,2,3),
  (1,5,4),(1,6,4);



mysql> select * from dupTest;
+------+------+------+------+---------------------+
| pkey | a    | b    | c    | timeEnter           |
+------+------+------+------+---------------------+
|    1 |    1 |    2 |    3 | 2004-04-16 10:55:35 |
|    2 |    1 |    2 |    3 | 2004-04-16 10:55:35 |
|    3 |    1 |    5 |    4 | 2004-04-16 10:55:35 |
|    4 |    1 |    6 |    4 | 2004-04-16 10:55:35 |
+------+------+------+------+---------------------+
4 rows in set (0.00 sec)

mysql>

Note that the first two rows are duplicates in columns a and b. The table contains other duplicates too, but leave those alone.



mysql> ALTER IGNORE TABLE dupTest ADD UNIQUE INDEX(a,b);


mysql> select * from dupTest;
+------+------+------+------+---------------------+
| pkey | a    | b    | c    | timeEnter           |
+------+------+------+------+---------------------+
|    1 |    1 |    2 |    3 | 2004-04-16 11:11:42 |
|    3 |    1 |    5 |    4 | 2004-04-16 11:11:42 |
|    4 |    1 |    6 |    4 | 2004-04-16 11:11:42 |
+------+------+------+------+---------------------+
3 rows in set (0.00 sec)



Regards,

Mike Chirico
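
The result of ALTER IGNORE TABLE ... ADD UNIQUE INDEX(a,b) above, where the first row for each (a, b) pair survives and later duplicates are dropped, can be modeled in a few lines of Python. This is only a sketch of the observed output, using the example's rows without the timeEnter column and a hypothetical helper name; it does not describe how MySQL actually rebuilds the table.

```python
rows = [
    {"pkey": 1, "a": 1, "b": 2, "c": 3},
    {"pkey": 2, "a": 1, "b": 2, "c": 3},
    {"pkey": 3, "a": 1, "b": 5, "c": 4},
    {"pkey": 4, "a": 1, "b": 6, "c": 4},
]

def keep_first_per_key(rows, key_columns):
    """Keep the first row seen for each unique key; drop later duplicates."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[c] for c in key_columns)
        if key in seen:
            continue  # would violate the new unique key, so it is discarded
        seen.add(key)
        kept.append(row)
    return kept

print([row["pkey"] for row in keep_first_per_key(rows, ("a", "b"))])  # [1, 3, 4]
```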

Posted by Peter Brodersen on June 7 2004 11:42am

Selecting data using OR on two different columns with individual indices could be optimized using UNION (thereby creating two separate queries). UNION was introduced in MySQL 4.0.

From the above example (note that two different fields are used, MemberID and MembershipNumber; if the same field were used, IN() or OR would be optimized):

Before:
select MembershipNumber, MemberID from Member where MemberID = 123896920 or MembershipNumber = 1029540;

After:
select MembershipNumber, MemberID from Member where MemberID = 123896920
UNION
select MembershipNumber, MemberID from Member where MembershipNumber = 1029540;


Two fast selects will be performed where indices are used. UNION will remove a possible duplicate (and UNION ALL would leave the duplicate).
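
The UNION rewrite amounts to two independent index lookups whose results are merged with duplicates removed. A hypothetical Python sketch of that plan, with each single-column index modeled as a dict from value to row positions and made-up data. Because MemberID, the primary key, is in the select list, merging row positions here gives the same result as UNION's removal of duplicate result rows.

```python
from collections import defaultdict

members = [
    {"MemberID": 123896920, "MembershipNumber": 1029540},
    {"MemberID": 5, "MembershipNumber": 1029540},
    {"MemberID": 7, "MembershipNumber": 42},
]

def build_index(rows, column):
    """Single-column index: value -> list of row positions."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[column]].append(pos)
    return index

by_id = build_index(members, "MemberID")
by_number = build_index(members, "MembershipNumber")

def or_via_union(member_id, membership_number):
    """WHERE MemberID = x OR MembershipNumber = y, answered as two index lookups.

    The set union drops a row found by both lookups, as UNION (without ALL)
    drops duplicate result rows.
    """
    hits = set(by_id.get(member_id, [])) | set(by_number.get(membership_number, []))
    return sorted(hits)

print(or_via_union(123896920, 1029540))  # [0, 1]
```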