在mysql使用時到超大千萬級數據量時我們大多會考慮到分表查詢,分庫查詢,下面我們來介紹一下關于分表查詢時的優化.
我們的項目中有好多不等于的情況,今天寫這篇文章簡單的分析一下怎么個優化法,這里的分表邏輯是根據t_group表的user_name組的個數來分的.
因為這種情況單獨user_name字段上的索引就屬于爛索引,起不了啥名明顯的效果.
1、試驗PROCEDURE,代碼如下:
- DELIMITER $$
- DROP PROCEDURE `t_girl`.`sp_split_table`$$
- CREATE PROCEDURE `t_girl`.`sp_split_table`()
- BEGIN
- declare done int default 0;
- declare v_user_name varchar(20) default '';
- declare v_table_name varchar(64) default '';
- -- Get all users' name.
- declare cur1 cursor for select user_name from t_group group by user_name;
- -- Deal with error or warnings.
- declare continue handler for 1329 set done = 1;
- -- Open cursor.
- open cur1;
- while done <> 1
- do
- fetch cur1 into v_user_name;
- if not done then
- -- Get table name.
- set v_table_name = concat('t_group_',v_user_name);
- -- Create new extra table.
- set @stmt = concat('create table ',v_table_name,' like t_group');
- prepare s1 from @stmt;
- execute s1;
- drop prepare s1;
- -- Load data into it.
- set @stmt = concat('insert into ',v_table_name,' select * from t_group where user_name = ''',v_user_name,'''');
- prepare s1 from @stmt;
- execute s1;
- drop prepare s1;
- end if;
- end while;
- -- Close cursor.
- close cur1;
- -- Free variable from memory.
- set @stmt = NULL;
- END$$
- DELIMITER ;
2、試驗表,我們用一個有一千萬條記錄的表來做測試,代碼如下:
- mysql> select count(*) from t_group;
- +----------+
- | count(*) |
- +----------+
- | 10388608 |
- +----------+
- 1 row in set (0.00 sec)
表結構,代碼如下:
- mysql> desc t_group;
- +-------------+------------------+------+-----+-------------------+----------------+
- | Field | Type | Null | Key | Default | Extra |
- +-------------+------------------+------+-----+-------------------+----------------+
- | id | int(10) unsigned | NO | PRI | NULL | auto_increment |
- | money | decimal(10,2) | NO | | | |
- | user_name | varchar(20) | NO | MUL | | |
- | create_time | timestamp | NO | | CURRENT_TIMESTAMP | |
- +-------------+------------------+------+-----+-------------------+----------------+
- 4 rows in set (0.00 sec)
索引情況.
- mysql> show index from t_group;
- +---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
- | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
- +---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
- | t_group | 0 | PRIMARY | 1 | id | A | 10388608 | NULL | NULL | | BTREE | |
- | t_group | 1 | idx_user_name | 1 | user_name | A | 8 | NULL | NULL | | BTREE | |
- | t_group | 1 | idx_combination1 | 1 | user_name | A | 8 | NULL | NULL | | BTREE | |
- | t_group | 1 | idx_combination1 | 2 | money | A | 3776 | NULL | NULL | | BTREE | |
- +---------+------------+------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
- 4 rows in set (0.00 sec) --Vevb.com
PS:idx_combination1 這個索引是必須的,因為要對user_name來GROUP BY,此時屬于松散索引掃描,當然完了后你可以干掉她.
idx_user_name 這個索引是為了加快單獨執行constant這種類型的查詢.
我們要根據用戶名來分表,代碼如下:
- mysql> select user_name from t_group where 1 group by user_name;
- +-----------+
- | user_name |
- +-----------+
- | david |
- | leo |
- | livia |
- | lucy |
- | sarah |
- | simon |
- | sony |
- | sunny |
- +-----------+
- 8 rows in set (0.00 sec)
所以結果表應該是這樣的,代碼如下:
- mysql> show tables like 't_group_%';
- +------------------------------+
- | Tables_in_t_girl (t_group_%) |
- +------------------------------+
- | t_group_david |
- | t_group_leo |
- | t_group_livia |
- | t_group_lucy |
- | t_group_sarah |
- | t_group_simon |
- | t_group_sony |
- | t_group_sunny |
- +------------------------------+
- 8 rows in set (0.00 sec)
3、對比結果,代碼如下:
- mysql> select count(*) from t_group where user_name = 'david';
- +----------+
- | count(*) |
- +----------+
- | 1298576 |
- +----------+
- 1 row in set (1.71 sec)
執行了將近2秒.
- mysql> select count(*) from t_group_david;
- +----------+
- | count(*) |
- +----------+
- | 1298576 |
- +----------+
- 1 row in set (0.00 sec)
幾乎是瞬間的.
- mysql> select count(*) from t_group where user_name <> 'david';
- +----------+
- | count(*) |
- +----------+
- | 9090032 |
- +----------+
- 1 row in set (9.26 sec)
執行了將近10秒,可以想象,這個是實際的項目中是不能忍受的.
- mysql> select (select count(*) from t_group) - (select count(*) from t_group_david) as total;
- +---------+
- | total |
- +---------+
- | 9090032 |
- +---------+
- 1 row in set (0.00 sec)
幾乎是瞬間的.
我們來看看聚集函數,對于原表的操作,代碼如下:
- mysql> select min(money),max(money) from t_group where user_name = 'david';
- +------------+------------+
- | min(money) | max(money) |
- +------------+------------+
- | -6.41 | 500.59 |
- +------------+------------+
- 1 row in set (0.00 sec)
最小,最大值都是FULL INDEX SCAN,所以是瞬間的.
- mysql> select sum(money),avg(money) from t_group where user_name = 'david';
- +--------------+------------+
- | sum(money) | avg(money) |
- +--------------+------------+
- | 319992383.84 | 246.417910 |
- +--------------+------------+
- 1 row in set (2.15 sec)
其他聚集函數的結果就不是FULL INDEX SCAN了,耗時2.15秒,對于小表的操作,代碼如下:
- mysql> select min(money),max(money) from t_group_david;
- +------------+------------+
- | min(money) | max(money) |
- +------------+------------+
- | -6.41 | 500.59 |
- +------------+------------+
- 1 row in set (1.50 sec)
最大最小值完全是FULL TABLE SCAN,耗時1.50秒,不劃算,以此看來,代碼如下:
- mysql> select sum(money),avg(money) from t_group_david;
- +--------------+------------+
- | sum(money) | avg(money) |
- +--------------+------------+
- | 319992383.84 | 246.417910 |
- +--------------+------------+
- 1 row in set (1.68 sec)
取得這兩個結果也是花了快2秒,快了一點,我們來看看這個小表的結構,代碼如下:
- mysql> desc t_group_david;
- +-------------+------------------+------+-----+-------------------+----------------+
- | Field | Type | Null | Key | Default | Extra |
- +-------------+------------------+------+-----+-------------------+----------------+
- | id | int(10) unsigned | NO | PRI | NULL | auto_increment |
- | money | decimal(10,2) | NO | | | |
- | user_name | varchar(20) | NO | MUL | | |
- | create_time | timestamp | NO | | CURRENT_TIMESTAMP | |
- +-------------+------------------+------+-----+-------------------+----------------+
- 4 rows in set (0.00 sec)
明顯的user_name屬性是多余的,那么就干掉它,代碼如下:
- mysql> alter table t_group_david drop user_name;
- Query OK, 1298576 rows affected (7.58 sec)
- Records: 1298576 Duplicates: 0 Warnings: 0
現在來重新對小表運行查詢,代碼如下:
- mysql> select min(money),max(money) from t_group_david;
- +------------+------------+
- | min(money) | max(money) |
- +------------+------------+
- | -6.41 | 500.59 |
- +------------+------------+
- 1 row in set (0.00 sec)
此時是瞬間的.
- mysql> select sum(money),avg(money) from t_group_david;
- +--------------+------------+
- | sum(money) | avg(money) |
- +--------------+------------+
- | 319992383.84 | 246.417910 |
- +--------------+------------+
- 1 row in set (0.94 sec)
這次算是控制在一秒以內了.
mysql> Aborted
小總結一下:分出的小表的屬性盡量越少越好,大膽的去干吧.
新聞熱點
疑難解答