In day-to-day development, pagination comes up all the time: the API exposes page and limit parameters, and the backend uses them to fetch the corresponding slice of data from the DB. Say we write an API that lists cards, and a request comes in asking for page 4 (10 cards per page), sorted by newest creation time first. After converting the parameters, the backend sends the following SQL to the DB:
SELECT id, content, createTime FROM card ORDER BY createTime DESC LIMIT 10 OFFSET 30;
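For reference, the parameter conversion is simply offset = (page - 1) * limit, so page 4 with a limit of 10 becomes OFFSET 30. A minimal sketch of how a handler might bind the values, using a server-side prepared statement (the variable names here are my own illustration, not the post's actual code):

-- offset = (page - 1) * limit, e.g. (4 - 1) * 10 = 30
SET @page = 4, @limit = 10;
SET @offset = (@page - 1) * @limit;
PREPARE stmt FROM 'SELECT id, content, createTime FROM card ORDER BY createTime DESC LIMIT ? OFFSET ?';
EXECUTE stmt USING @limit, @offset;
DEALLOCATE PREPARE stmt;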
Here is the schema of the card table we'll be querying:
CREATE TABLE card (
  id INT NOT NULL AUTO_INCREMENT,
  content TEXT,
  createTime TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (id),
  KEY idx_createTime (createTime)
);
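The timings below come from a table holding about one million rows. If you want to reproduce them, a throwaway seeding procedure along these lines will do (seed_cards is a hypothetical name and this is only an illustrative sketch, not part of the original setup):

DELIMITER //
CREATE PROCEDURE seed_cards()
BEGIN
  DECLARE i INT DEFAULT 0;
  START TRANSACTION;  -- one big transaction keeps a million single-row inserts tolerable
  WHILE i < 1000000 DO
    INSERT INTO card (content) VALUES (CONCAT('card #', i));  -- createTime falls back to CURRENT_TIMESTAMP
    SET i = i + 1;
  END WHILE;
  COMMIT;
END //
DELIMITER ;
CALL seed_cards();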
The DB returns:
+--------+--------------+--------------+
| id     | content      | createTime   |
+--------+--------------+--------------+
| 999970 | (omitted)... | (omitted)... |
| 999969 | ...          | ...          |
| 999968 | ...          | ...          |
| 999967 | ...          | ...          |
| 999966 | ...          | ...          |
| 999965 | ...          | ...          |
| 999964 | ...          | ...          |
| 999963 | ...          | ...          |
| 999962 | ...          | ...          |
| 999961 | ...          | ...          |
+--------+--------------+--------------+
10 rows in set (0.01 sec)
The data itself looks fine. But what if there are a lot of cards? The table in this example holds 1,000,000 rows in total, and we happen to want a page somewhere in the middle or toward the end (like flipping a certain kind of video site to page 10,001 at four in the morning):
SELECT id, content, createTime FROM card ORDER BY createTime DESC LIMIT 10 OFFSET 100000;
+--------+--------------+--------------+
| id     | content      | createTime   |
+--------+--------------+--------------+
| 900000 | (omitted)... | (omitted)... |
| 899999 | ...          | ...          |
| 899998 | ...          | ...          |
| 899997 | ...          | ...          |
| 899996 | ...          | ...          |
| 899995 | ...          | ...          |
| 899994 | ...          | ...          |
| 899993 | ...          | ...          |
| 899992 | ...          | ...          |
| 899991 | ...          | ...          |
+--------+--------------+--------------+
10 rows in set (2.09 sec)
Strange, why did the query time suddenly jump from 0.01 s all the way to 2.09 s?! And what happens if we query an even later page, say page 90,001?
SELECT id, content, createTime FROM card ORDER BY createTime DESC LIMIT 10 OFFSET 900000;
+--------+--------------+--------------+
| id     | content      | createTime   |
+--------+--------------+--------------+
| 100000 | (omitted)... | (omitted)... |
|  99999 | ...          | ...          |
|  99998 | ...          | ...          |
|  99997 | ...          | ...          |
|  99996 | ...          | ...          |
|  99995 | ...          | ...          |
|  99994 | ...          | ...          |
|  99993 | ...          | ...          |
|  99992 | ...          | ...          |
|  99991 | ...          | ...          |
+--------+--------------+--------------+
10 rows in set (5.44 sec)
Ouch, 5.44 seconds, what is going on? It looks like the query time grows roughly linearly with the OFFSET?! Yep, you guessed it.
MySQL with OFFSET is not as clever as we might imagine: LIMIT 10 OFFSET 900000 does not simply jump to the spot 900,000 rows in and grab the next 10.
In reality, MySQL starts from the first row, reads its way through until it has counted off 900,000 rows, throws everything it read away, and only then fetches the 10 rows we want and returns them.
Worse still, "reads" here means reading the entire record, not just walking the index and counting. So even if we have built an index and MySQL chooses to use it, every index entry it visits still requires a lookup back into the primary-key B-tree (the clustered index) using that entry's primary-key ID. Once the offset is large enough, MySQL decides a full table scan is actually faster. Let's use EXPLAIN to verify this.
EXPLAIN SELECT id, content, createTime FROM card ORDER BY createTime DESC LIMIT 10 OFFSET 10;
Let's start with page 2. Good, MySQL uses the index just as we expected.
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+---------------------+
| id | select_type | table | partitions | type  | possible_keys | key            | key_len | ref  | rows | filtered | Extra               |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+---------------------+
|  1 | SIMPLE      | card  | NULL       | index | NULL          | idx_createTime | 5       | NULL |   20 |   100.00 | Backward index scan |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+---------------------+
What about page 12?
EXPLAIN SELECT id, content, createTime FROM card ORDER BY createTime DESC LIMIT 10 OFFSET 110;
Still using the index~
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+---------------------+
| id | select_type | table | partitions | type  | possible_keys | key            | key_len | ref  | rows | filtered | Extra               |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+---------------------+
|  1 | SIMPLE      | card  | NULL       | index | NULL          | idx_createTime | 5       | NULL |  120 |   100.00 | Backward index scan |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+---------------------+
Page 102?
EXPLAIN SELECT id, content, createTime FROM card ORDER BY createTime DESC LIMIT 10 OFFSET 1010;
Still there~
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+---------------------+
| id | select_type | table | partitions | type  | possible_keys | key            | key_len | ref  | rows | filtered | Extra               |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+---------------------+
|  1 | SIMPLE      | card  | NULL       | index | NULL          | idx_createTime | 5       | NULL | 1020 |   100.00 | Backward index scan |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+------+----------+---------------------+
And page 1,002?
EXPLAIN SELECT id, content, createTime FROM card ORDER BY createTime DESC LIMIT 10 OFFSET 10010;
This time it's gone. At this point MySQL decides that walking the index and then going back to the primary-key B-tree for every row is too slow, so it opts for a full table scan (type=ALL) plus a sort (Extra=Using filesort), counts its way to the 10,010th row, and returns the next 10. Now you can see why our very first query took so long once we paged all the way to page 90,001.
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows   | filtered | Extra          |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------+
|  1 | SIMPLE      | card  | NULL       | ALL  | NULL          | NULL | NULL    | NULL | 996250 |   100.00 | Using filesort |
+----+-------------+-------+------------+------+---------------+------+---------+------+--------+----------+----------------+
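If you're curious whether the optimizer's call is actually right, you can force the index on the slow query and compare timings yourself. FORCE INDEX is standard MySQL syntax, but this comparison is my own aside and wasn't measured in the post:

-- Force the createTime index even though the optimizer prefers a full scan,
-- then compare the runtime against the plain OFFSET 900000 query above.
SELECT id, content, createTime
FROM card FORCE INDEX (idx_createTime)
ORDER BY createTime DESC
LIMIT 10 OFFSET 900000;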
The reasoning all checks out, but it still feels wrong. We built the index precisely to speed up queries, and instead of putting it to good use MySQL drags us down with a full table scan plus a sort. Is there any way to optimize this?
There is. The index already stores the data in sorted order; the slow part is the back-to-table lookup MySQL performs for every index entry it visits. If we can eliminate that lookup while keeping the benefits of the index, OFFSET-based pagination speeds right back up. Let's modify the earlier query, but this time select only the card IDs for page 1,002. Because the index already contains the ID (in InnoDB, every secondary index implicitly stores the primary key), no back-to-table lookup is triggered. First, EXPLAIN it to see how the index is used.
EXPLAIN SELECT id FROM card ORDER BY createTime DESC LIMIT 10 OFFSET 10010;
Nice, MySQL behaves: type=index means the query uses the index, and Using index at the end of the Extra column means the query was satisfied from the index alone, with no back-to-table lookups. Exactly what we hoped for.
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+-------+----------+----------------------------------+
| id | select_type | table | partitions | type  | possible_keys | key            | key_len | ref  | rows  | filtered | Extra                            |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+-------+----------+----------------------------------+
|  1 | SIMPLE      | card  | NULL       | index | NULL          | idx_createTime | 5       | NULL | 10020 |   100.00 | Backward index scan; Using index |
+----+-------------+-------+------------+-------+---------------+----------------+---------+------+-------+----------+----------------------------------+
Finally, wrap it in a join (or a subquery; a subquery variant is sketched after the EXPLAIN walkthrough below) to fetch the full card details we actually want.
SELECT main.id, content, createTime
FROM card AS main
INNER JOIN (
    SELECT id FROM card ORDER BY createTime DESC LIMIT 10 OFFSET 900000
) AS sub ON main.id = sub.id
ORDER BY createTime DESC;
0.26 seconds versus 5.44 seconds!! Quite a different world from before, isn't it~
+--------+--------------+--------------+
| id     | content      | createTime   |
+--------+--------------+--------------+
| 100000 | (omitted)... | (omitted)... |
|  99999 | ...          | ...          |
|  99998 | ...          | ...          |
|  99997 | ...          | ...          |
|  99996 | ...          | ...          |
|  99995 | ...          | ...          |
|  99994 | ...          | ...          |
|  99993 | ...          | ...          |
|  99992 | ...          | ...          |
|  99991 | ...          | ...          |
+--------+--------------+--------------+
10 rows in set (0.26 sec)
Let's EXPLAIN it and see how this mysterious black magic happens.
+----+-------------+------------+------------+--------+---------------+----------------+---------+--------+--------+----------+----------------------------------+
| id | select_type | table      | partitions | type   | possible_keys | key            | key_len | ref    | rows   | filtered | Extra                            |
+----+-------------+------------+------------+--------+---------------+----------------+---------+--------+--------+----------+----------------------------------+
|  1 | PRIMARY     | <derived2> | NULL       | ALL    | NULL          | NULL           | NULL    | NULL   | 900010 |   100.00 | Using temporary; Using filesort  |
|  1 | PRIMARY     | main       | NULL       | eq_ref | PRIMARY       | PRIMARY        | 4       | sub.id |      1 |   100.00 | NULL                             |
|  2 | DERIVED     | card       | NULL       | index  | NULL          | idx_createTime | 5       | NULL   | 900010 |   100.00 | Backward index scan; Using index |
+----+-------------+------------+------------+--------+---------------+----------------+---------+--------+--------+----------+----------------------------------+
Start with the bottom row, table=card: that's our subquery, and it behaves exactly like the previous EXPLAIN, so we'll skip it.
Moving up to the row with table=main: type=eq_ref together with ref=sub.id means MySQL performs a unique equality join (the join key is unique on both sides) on main.id = sub.id. As soon as a match is found it immediately moves on to the next id from the driving table, which is why this step is fast.
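In effect, for each of the 10 ids produced by the derived table, MySQL does one primary-key lookup, roughly equivalent to running something like this per id (the literal value is just taken from the result set above for illustration):

SELECT content, createTime FROM card WHERE id = 99991;  -- one clustered-index lookup per id from sub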
The top row, table=<derived2>, shows Using temporary and Using filesort in Extra: the joined result is buffered in a temporary table and then sorted into the final order we asked for.
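For completeness, here is one way to write the subquery form mentioned earlier. MySQL doesn't allow LIMIT directly inside an IN (...) subquery, so the LIMIT is wrapped in a derived table; this sketch is mine and wasn't benchmarked in the post:

SELECT id, content, createTime
FROM card
WHERE id IN (
    SELECT id FROM (
        SELECT id FROM card ORDER BY createTime DESC LIMIT 10 OFFSET 900000
    ) AS sub
)
ORDER BY createTime DESC;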
With this optimization in place, we no longer have to worry about queries getting slower and slower the deeper users page into the list~
Thanks for sharing, this is really useful; it's been bothering me for a long time.
Also, it seems this approach doesn't quite apply when a full-text index is involved? For example with MATCH...AGAINST:
SELECT * FROM table_name WHERE MATCH (column1, column2, ...) AGAINST ('word1 word2 ...');
I haven't used full-text indexes myself, so I probably can't give you a good answer 😅
That said, for text search the industry usually reaches for something like Elasticsearch instead.