問題描述:
10月25日上午濱州網(wǎng)通的工程師報告oss應用系統(tǒng)運行緩慢,具體操作是通過oss系統(tǒng)查詢話單,很長時間才能返回結(jié)果,嚴重影響了客戶的正常使用。
問題處理:
1.登陸數(shù)據(jù)庫主機,用sar命令看到idle的值持續(xù)為0,cpu的資源已經(jīng)耗盡:
bz_db1# sar 2 4
sunos kest 5.8 generic_108528-19 sun4u 10/26/04
10:56:46 %usr %sys %wio %idle
10:56:48 1 4 95 0
10:56:50 1 5 94 0
10:56:52 0 6 93 0
10:56:54 1 6 93 0
average 1 5 94 0
2.使用top命令看到有兩個明顯占用cpu利用率過高的進程,以下是top命令的結(jié)果:
bz_db1# top
last pid: 1664;load averages: 3.26, 3.24, 3.69
159 processes: 152 sleeping, 2 running, 2 zombie, 1 stopped, 2 on cpu
cpu states: 1.5% idle, 72.5% user, 17.9% kernel, 8.0% iowait, 0.0% swap
memory: 2.0g real, 233m free, 2.0g swap in use, 3.4g swap free
pid username thr pr nce size res state time flts cpu command
27420 oracle 1 10 0 1.3g 1.2g cpu01 22.9h 2 31.94% oracle
27418 oracle 1 10 0 1.3g 1.2g run 23.0h 6 26.86% oracle
5943 oracle 1 59 0 1.3g 1.2g sleep 25:26 37 4.92% oracle
6295 oracle 1 55 0 1.3g 1.2g run 25:14 74 4.90% oracle
7778 oracle 1 43 0 1.3g 1.2g sleep 11:43 110 4.86% oracle
13270 oracle 1 59 0 1.3g 1.2g sleep 210.6h 0 0.96% oracle
13056 oracle 1 48 0 1.3g 1.2g sleep 303:30 0 0.37% oracle
10653 root 1 58 0 2560k 1624k cpu00 0:00 0 0.32% top
18827 oracle 1 58 0 1.3g 1.2g sleep 18.4h 0 0.31% oracle
12748 oracle 258 58 0 1.3g 1.2g sleep 555:14 0 0.21% oracle
10634 oracle 1 59 0 1.3g 1.2g sleep 0:01 0 0.21% oracle
28458 oracle 1 58 0 1.3g 1.2g sleep 535:02 0 0.18% oracle
13075 oracle 1 59 0 1.3g 1.2g sleep 326:33 0 0.15% oracle
13173 oracle 1 58 0 1.3g 1.2g sleep 593:07 0 0.13% oracle
4927 oracle 1 59 0 1.3g 1.2g sleep 33.4h 0 0.11% oracle
可以看到這兩個進程號分別是27420和27418.
3.捕獲占用cpu利用率過高的sql語句:
以下用到了我總結(jié)的sql語句:
sql>set line 240
sql>set verify off
sql>column sid format 999
sql>column pid format 999
sql>column s_# format 999
sql>column username format a9 heading "ora user"
sql>column program format a29
sql>column sql format a60
sql>column osname format a9 heading "os user"
sql>select p.pid pid,s.sid sid,p.spid spid,s.username username,
s.osuser osname,p.serial# s_#,p.terminal,p.program program,
p.background,s.status,rtrim(substr(a.sql_text, 1, 80)) sql
from v$process p, v$session s,v$sqlarea a where p.addr = s.paddr
and s.sql_address = a.address (+) and p.spid like '%&1%';
enter value for 1: 27420(注意這里應輸入占用cpu最高的進程對應的pid)
得到以下sql語句:
select nvl(sum(localcharge),0),nvl(sum(usage),0) from localusage where to_char(endtime,'yyyymmdd')=20041016
and localcharge>0 and caller like '0543886%';
27418進程對應的sql語句如下:
select nvl(sum(localcharge),0) from localusage where to_char(endtime,'yyyymmdd')=20041016 and caller like '0543888%';
4.使用相關用戶連接到數(shù)據(jù)庫,檢查其執(zhí)行計劃:
sql>connect wacos/oss
connected.
sql>@?/rdbms/admin/utlxplan.sql
table created.
sql>set autotrace on
sql>set timing on
sql>select nvl(sum(localcharge),0),nvl(sum(usage),0) from localusage where to_char(endtime,'yyyymmdd')=20041016
and localcharge>0 and caller like '0543886%';
nvl(sum(localcharge),0) nvl(sum(usage),0)
----------------------- -----------------
0 0
elapsed: 00:02:56.37
execution plan
----------------------------------------------------------
0 select statement optimizer=choose (cost=13435 card=1 bytes=5
3)
1 0 sort (aggregate)
2 1 partition range (all)
3 2 table access (full) of 'localusage' (cost=13435 card=1
81 bytes=9593)
statistics
----------------------------------------------------------
258 recursive calls
0 db block gets
88739 consistent gets
15705 physical reads
0 redo size
580 bytes sent via sql*net to client
651 bytes received via sql*net from client
2 sql*net roundtrips to/from client
8 sorts (memory)
0 sorts (disk)
1 rows processed
發(fā)現(xiàn)對localusage表做了全表掃描,什么記錄也沒有返回居然用了2分多鐘。
sql> select nvl(sum(localcharge),0) from localusage where to_char(endtime,'yyyymmdd')=20040816 and caller like '0543888%';
nvl(sum(localcharge),0)
-----------------------
27.6
elapsed: 00:03:56.46
execution plan
----------------------------------------------------------
0 select statement optimizer=choose (cost=13435 card=1 bytes=4
0)
1 0 sort (aggregate)
2 1 partition range (all)
3 2 table access (full) of 'localusage' (cost=13435 card=3
615 bytes=144600)
statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
88588 consistent gets
15615 physical reads
0 redo size
507 bytes sent via sql*net to client
651 bytes received via sql*net from client
2 sql*net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
這個sql語句有結(jié)果返回,發(fā)現(xiàn)也是對localusage表做了全表掃描,但速度也很慢,用了3分多鐘。
sql> select count(*) from localusage;
count(*)
----------
5793776
該表有579萬多條記錄,數(shù)據(jù)量很大,全表掃描已經(jīng)不再適合。
5.檢查該表的類型:
sql> select index_name, table_name, status, partitioned from user_indexes where table_name='localusage';
index_name table_name status par
------------------------------ ------------------------------ -------- ---
i_localusage_sid localusage n/a yes
ui_localusage_st_seq localusage n/a yes
sql> select index_name,table_name,locality from user_part_indexes where table_name='localusage';
index_name table_name locali
------------------------------ ------------------------------ ------
i_localusage_sid localusage local
ui_localusage_st_seq localusage local
發(fā)現(xiàn)該表是分區(qū)表,并在serviceid,startime和cdrsequence列上建立了分區(qū)索引,索引類型是local索引。
6.查看分區(qū)索引的索引鍵值:
sql> select index_name,column_name,index_owner from dba_ind_columns where table_name='localusage';
index_name column_name index_owner
-------------------- -------------------- ------------------------------
i_localusage_sid serviceid wacos
ui_localusage_st_seq starttime wacos
ui_localusage_st_seq cdrsequence wacos
發(fā)現(xiàn)在endtime和caller列上都沒有建立索引,這也是導致sql語句做全表掃描的最終原因。
7.決定創(chuàng)建新的分區(qū)索引以消除全表掃描:
(1).首先查看localusage表分區(qū)情況:
sql> select partition_name,tablespace_name from user_tab_partitions where table_name='localusage';
partition_name tablespace_name
------------------------------ ------------------------------
localusage_200312 wacos
localusage_200401 wacos
localusage_200402 wacos
localusage_200404 wacos
localusage_200405 wacos
localusage_200406 wacos
localusage_200407 wacos
localusage_200409 wacos
localusage_200410 wacos
localusage_200411 wacos
localusage_200403 wacos
localusage_200408 wacos
localusage_200412 wacos
13 rows selected.
(2).在caller列上創(chuàng)建local分區(qū)索引:
sql>set timing on
sql>create index i_localusage_caller on localusage(caller)
local
(
partition localusage_200312,
partition localusage_200401,
partition localusage_200402,
partition localusage_200404,
partition localusage_200405,
partition localusage_200406,
partition localusage_200407,
partition localusage_200409,
partition localusage_200410,
partition localusage_200411,
partition localusage_200403,
partition localusage_200408,
partition localusage_200412
)
tablespace wacos
storage(
initial 6553600
next 6553600
maxextents unlimited
pctincrease 0)
pctfree 5
nologging;
index created.
elapsed: 00:06:27.90 (由于數(shù)據(jù)量比較大,耗時6分鐘)
8.再次查看執(zhí)行計劃:
sql>select nvl(sum(localcharge),0),nvl(sum(usage),0) from localusage where to_char(endtime,'yyyymmdd')=20041016
and localcharge>0 and caller like '0543886%';
nvl(sum(localcharge),0) nvl(sum(usage),0)
----------------------- -----------------
0 0
elapsed: 00:00:03.00
execution plan
----------------------------------------------------------
0 select statement optimizer=choose (cost=22 card=1 bytes=53)
1 0 sort (aggregate)
2 1 partition range (all)
3 2 table access (by local index rowid) of 'localusage' (cost=22 card=181 bytes=9593)
4 3 index (range scan) of 'i_localusage_caller' (non-unique) (cost=14 card=65063)
statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
16813 consistent gets
569 physical reads
0 redo size
580 bytes sent via sql*net to client
651 bytes received via sql*net from client
2 sql*net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
這次走了索引后速度明顯快多了,用了3秒鐘就返回了結(jié)果。
sql>select nvl(sum(localcharge),0) from localusage where to_char(endtime,'yyyymmdd')=20040816 and caller like '0543888%';
nvl(sum(localcharge),0)
-----------------------
27.6
elapsed: 00:00:24.73
execution plan
----------------------------------------------------------
0 select statement optimizer=choose (cost=22 card=1 bytes=40)
1 0 sort (aggregate)
2 1 partition range (all)
3 2 table access (by local index rowid) of 'localusage' (cost=22 card=3615 bytes=144600)
4 3 index (range scan) of 'i_localusage_caller' (non-unique) (cost=14 card=65063)
statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
129336 consistent gets
7241 physical reads
0 redo size
507 bytes sent via sql*net to client
651 bytes received via sql*net from client
2 sql*net roundtrips to/from client
0 sorts (memory)
0 sorts (disk)
1 rows processed
這個sql語句走了索引,用了24秒鐘返回結(jié)果,性能明顯提高了很多。