[oracle] dba工作備忘錄之三:rman備份,未使用catalog,控制文件丟失的解決辦法 作者:fenng 日期:july 19 2004
情況描述客戶報告數據庫故障,新來的系統管理員誤操作。刪掉了一些文件。詢問:刪掉了那些文件?答曰:所有重要數據文件,所有控制文件。數據庫原來是歸檔模式,用rman備份數據,rman 使用控制文件。幸運的是,最后一次rman full 備份是包括了控制文件在內。系統沒有設定自動備份控制文件.現在狀況是數據庫無法啟動.
不用說,客戶的備份方案不夠完善,但是這時候再去說這些話責備用戶有事后諸葛亮之嫌,用戶是上帝,不要去得罪他。還有,客戶有full備份(雖然不是自動備份控制文件,這樣無法用常規的恢復步驟來進行恢復)。這對我們來說是個絕對的好消息。
下面我們通過一次模擬操作來演示這個問題的解決辦法。
解決過程首先,用控制文件作數據庫系統的全備份:
代碼:------------------------藍色部分是輸入內容,黑色部分是敏感信息,須加以注意----------------------------------------------------c:wutemp>rman target /
recovery manager: release 9.2.0.1.0 - production.copyright (c) 1995, 2002, oracle corporation. all rights reserved.connected to target database: demo (dbid=3272375326)
rman> run {2> allocate channel c1 type disk;3> backup full tag 'fullbackup' format 'd:/kde/%d_%u_%s_%p.dbf' database include current controlfile;4> sql ' alter system archive log current';5> release channel c1;6> }
using target database controlfile instead of recovery catalogallocated channel: c1channel c1: sid=15 devtype=disk
starting backup at 18-jul-04channel c1: starting full datafile backupsetchannel c1: specifying datafile(s) in backupsetincluding current spfile in backupsetincluding current controlfile in backupsetinput datafile fno=00001 name=d:/oracle/oradata/demo ystem01.dbfinput datafile fno=00002 name=d:/oracle/oradata/demo/undotbs01.dbfinput datafile fno=00004 name=d:/oracle/oradata/demo/example01.dbfinput datafile fno=00009 name=d:/oracle/oradata/demo/xdb01.dbfinput datafile fno=00005 name=d:/oracle/oradata/demo/indx01.dbfinput datafile fno=00008 name=d:/oracle/oradata/demo/users01.dbfinput datafile fno=00003 name=d:/oracle/oradata/demo/drsys01.dbfinput datafile fno=00006 name=d:/oracle/oradata/demo/odm01.dbfinput datafile fno=00007 name=d:/oracle/oradata/demo/tools01.dbfchannel c1: starting piece 1 at 18-jul-04channel c1: finished piece 1 at 18-jul-04piece handle=d:/kde/demo_01fr79ot_1_1.dbf comment=nonechannel c1: backup set complete, elapsed time: 00:01:17finished backup at 18-jul-04
sql statement: alter system archive log current
released channel: c1--如上所示,我們做了一次數據庫的full備份.備份片中包括控制文件.注意上面輸出內容的黑體部分.我們在后面的恢復操作中會用到. 模擬錯誤,關掉實例,刪掉所有的控制文件和所有的.dbf文件。然后starup會看到如下的出錯信息:
sql> startuporacle instance started.
total system global area 152115804 bytesfixed size 453212 bytesvariable size 100663296 bytesdatabase buffers 50331648 bytesredo buffers 667648 bytesora-00205: error in identifying controlfile, check alert log for more info
查看alert log,應該是系統找不到控制文件.現在情形和客戶問題一致.不過在繼續講述之前,我們還需要介紹一點背景知識.
背景知識: 在oracle 816 以后的版本中,oracle提供了一個包:dbms_backup_restore.dbms_backup_restore 包是由dbmsbkrs.sql 和 prvtbkrs.plb 這兩個腳本創建的.catproc.sql 腳本運行后會調用這兩個包.所以是每個數據庫都有的這個包是oracle服務器和操作系統之間io操作的接口.由恢復管理器直接調用。而且據說這兩個腳本的功能是內建到oracle的一些庫文件中的.
由此可見,我們可以在數據庫 nomount 情況下調用這些package ,來達到我們的恢復目的。在dbmsbkrs.sql 和prvtbkrs.plb 這兩個腳本中有詳細的說明文檔,出于篇幅問題,就不一一加以翻譯了,但在下面會直接引用一些原文說明。
關鍵的內容有:
function deviceallocate( type in varchar2 default null ,name in varchar2 default null ,ident in varchar2 default null ,noio in boolean default false ,params in varchar2 default null )return varchar2;
-- describe the device to be used for sequential i/o. for device types where-- only one process at a time can use a device, this call allocates a device-- for exclusive use by this session. the device remains allocated until-- devicedeallocate is called or session termination. the device can be used-- both for creating and restoring backups.---- specifying a device allocates a context that exists until the session-- terminates or devicedeallocate is called. only one device can be specified-- at a time for a particular session. thus devicedeallocate must be called-- before a different device can be specified. this is not a limitation since-- a session can only read or write one backup at a time.---- the other major effect of allocating a device is to specify the name space-- for the backup handles (file names). the handle for a sequential file does-- not necessarily define the type of device used to write the file. thus it-- is necessary to specify the device type in order to interpret the file-- handle. the null device type is defined for all systems. it is the file-- system supplied by the operating system. the sequential file handles are-- thus normal file names.---- a device can be specified either by name or by type.-- if the type is specified but not the name, the system picks an-- available device of that type.-- if the name is specified but not the type, the type is determined-- from the device.-- if neither the type or the name is given, the backups are files in-- the operating system file system.
-- note that some types of devices, optical disks for example, can be shared-- by many processes, and thus do not really require allocation of the device-- itself. however we do need to allocate the context for accessing the-- device, and we do need to know the device type for proper interpretation-- of the file handle. thus it is always necessary to make the device-- allocation call before making most other calls in this package.---- input parameters:-- type-- if specified, this gives the type of device to use for sequential-- i/o. the allowed types are port specific. for example a port may-- support the type "tape" which is implemented via the oracle tape-- api. if no type is specified, it may be implied by specifying a-- particular device name to allocate. the type should be allowed to-- default to null if operating system files are to be used.---- name-- if specified, this names a particular piece of hardware to use for-- accessing sequential files. if not specified, any available-- device of the correct type will be allocated. if the device cannot-- be shared, it is allocated to this session for exclusive use.-- the name should be allowed to default to null if operating system-- files are to be used.---- ident-- this is the users identifier that he uses to name this device. it-- is only used to report the status of this session via-- dbms_application_info. this value will be placed in the client_info-- column of the v$session table, in the row corresponding to the-- session in which the device was allocated. this value can also-- be queried with the dbms_application_info.read_client_info procedure.---- noio-- if true, the device will not be used for doing any i/o. this allows-- the specification of a device type for deleting sequential files-- without actually allocating a piece of hardware. an allocation for-- noio can also be used for issuing device commands. note that some-- commands may actually require a physical device and thus will get-- an error if the allocate was done with noio set to true.---- params-- this string is simply passed to the device allocate osd. it is-- completely port and device specific.---- returns:-- it returns a valid device type. this is the type that should be-- allocated to access the same sequential files at a later date. note-- that this might not be exactly the same value as the input string.-- the allocate osd may do some translation of the type passed in. the-- return value is null when using operating system files.
procedure restorecontrolfileto(cfname in varchar2);
-- this copies the controlfile from the backup set to an operating system-- file. if the database is mounted, the name must not match any of the-- current controlfiles.---- input parameters:-- cfname-- name of file to create or overwrite with the controlfile from the-- backup set.procedure restoredatafileto( dfnumber in binary_integer,toname in varchar2 default null);---- restoredatafileto creates the output file from a complete backup in the-- backup set.
如果您有興趣可以去閱讀一下這兩個文件的注釋說明.
我們首先嘗試恢復控制文件:sql>startup force nomount;
sql> declare2 devtype varchar2(256);3 done boolean;4 begin5 devtype:=sys.dbms_backup_restore.deviceallocate(type=>'',ident=>'t1');6 sys.dbms_backup_restore.restoresetdatafile;7 sys.dbms_backup_restore.restorecontrolfileto(cfname=>'d:/oracle/control01.ctl');8 sys.dbms_backup_restore.restorebackuppiece(done=>done,handle=>'d:/kde/demo_01fr79ot_1_1.dbf', params=>null);9 sys.dbms_backup_restore.devicedeallocate;10 end;11 /
pl/sql procedure successfully completed.ok,控制文件恢復完成.對以上內容的解釋:
第五行 分配一個device channel,因為使用的操作系統文件,所以這里為空,如果是從磁帶上恢復要用 "sbt_tape";第六行 指明開始restore ; 第七行 指出待恢復文件目標存儲位置;第八行 從哪個備份片中恢復;第九行 釋放設備通道.不妨對以上操作的結果驗證一下:
sql> host dir d:/oraclevolume in drive d is datavolume serial number is dc79-57f8directory of d:/oracle
07/18/2004 09:08 pm <dir> .07/18/2004 09:08 pm <dir> ..06/08/2004 03:21 pm <dir> admin07/18/2004 09:08 pm 1,871,872 control01.ctl 07/16/2004 11:27 am <dir> ora9207/18/2004 09:02 pm <dir> oradata這樣,我們成功的restore了控制文件 .如果控制文件在full備份之后單獨做的,接下來關掉實例,拷貝控制文件到具體位置,然后rman 執行restore database;即可。
可是,我們這里的情況有些不同.
視丟失文件的情況而定,繼續進行如下的恢復操作:代碼:--------------------------------------------------------------------------------sql> declare2 devtype varchar2(256);3 done boolean;4 begin5 devtype:=sys.dbms_backup_restore.deviceallocate (type=>'',ident=>'t1');6 sys.dbms_backup_restore.restoresetdatafile;7 sys.dbms_backup_restore.restoredatafileto(dfnumber=>01,toname=>'d:/oracle/oradata/demo ystem01.dbf');8 sys.dbms_backup_restore.restoredatafileto(dfnumber=>02,toname=>'d:/oracle/oradata/demo/undotbs01.dbf');9 sys.dbms_backup_restore.restoredatafileto(dfnumber=>03,toname=>'d:/oracle/oradata/demo/drsys01.dbf');10 sys.dbms_backup_restore.restoredatafileto(dfnumber=>04,toname=>'d:/oracle/oradata/demo/example01.dbf');11 sys.dbms_backup_restore.restoredatafileto(dfnumber=>05,toname=>'d:/oracle/oradata/demo/indx01.dbf');12 sys.dbms_backup_restore.restoredatafileto(dfnumber=>06,toname=>'d:/oracle/oradata/demo/odm01.dbf');13 sys.dbms_backup_restore.restoredatafileto(dfnumber=>07,toname=>'d:/oracle/oradata/demo/tools01.dbf');14 sys.dbms_backup_restore.restoredatafileto(dfnumber=>08,toname=>'d:/oracle/oradata/demo/users01.dbf');15 sys.dbms_backup_restore.restoredatafileto(dfnumber=>09,toname=>'d:/oracle/oradata/demo/xdb01.dbf');16 sys.dbms_backup_restore.restorebackuppiece(done=>done,handle=>'d:/kde/demo_01fr79ot_1_1.dbf', params=>null);17 sys.dbms_backup_restore.devicedeallocate;18 end;19 /
pl/sql procedure successfully completed.--我們的情形是所有的數據文件都丟失了,那就如法炮制 ...........--文件對應編號來自前面全備份時候的屏幕輸出內容.所以,在備份的時候保留操作log是個很好的習慣.
sql> startup force mount;oracle instance started.
total system global area 152115804 bytesfixed size 453212 bytesvariable size 100663296 bytesdatabase buffers 50331648 bytesredo buffers 667648 bytesdatabase mounted.sql> recover database using backup controlfile until cancel ;ora-00279: change 243854 generated at 07/18/2004 20:57:03 needed for thread 1ora-00289: suggestion : d:/kde/arc00002.001ora-00280: change 243854 for thread 1 is in sequence #2specify log: {<ret>=suggested | filename | auto | cancel}d:/kde/arc00002.001ora-00279: change 244089 generated at 07/18/2004 20:58:18 needed for thread 1ora-00289: suggestion : d:/kde/arc00003.001ora-00280: change 244089 for thread 1 is in sequence #3ora-00278: log file 'd:/kde/arc00002.001' no longer needed for this recoveryspecify log: {<ret>=suggested | filename | auto | cancel}cancelmedia recovery cancelled.sql> alter database open resetlogs;
database altered.最后,不得不resetlogs .
然后,打掃戰場,馬上進行數據庫的全備份。如果您是dba的話,應該進一步制定并完善備份計劃.亡羊補牢,為時未晚。
總結一下
1 控制文件在備份中意義重大,建議每次對其單獨備份,如果數據庫版本允許的話,應該設置為控制文件自動備 份。同時應該盡可能地增大control_file_record_keep_time這個初始化參數的值。以便備份信息能更長時間的保留
2 應該制定比較完善的備份計劃,否則備份計劃一旦出現缺口,將可能給系統帶來災難.記住, "可能出錯的地方一定會出錯".
3 熟悉rman內部備份機制,對dbms_backup_restore的用法有一定的掌握在關鍵時侯很有幫助.
4 備份腳本應該對log重定向并保存.以便在出錯的查找有用信息.
參考資料:
rman recovery without recovery catalog or controlfiles by bonnie bizzarodbmsbkrs.sql 和 prvtbkrs.plb 文件說明注釋(可在你的系統 $oracle_home/rdbms/admin/中找到.)
相關鏈接:
本文的更多討論,請參考這里:http://www.itpub.net/244345.htmldba工作備忘錄之二: exp出錯的一個案例http://www.itpub.net/showthread.php?s=&threadid=238819dba工作備忘錄之一:用events 跟蹤解決不能創建物化試圖一例 http://www.dbanotes.net/oracle/oracle-case-of-10046_i.htm原文出處<a >http://www.dbanotes.net/rman_nocatalog_lost_controlfile_howto.htm</a>
本文作者: fenng,現任某美資公司dba,業余時間混跡于各數據庫相關的技術論壇。目前關注如何利用oracle數據庫有效的構建企業應用。對oracle tuning、troubleshooting有一點研究。個人技術站點: http://www.dbanotes.net/ 。可以通過電子郵件 [email protected] 聯系到他。 本文為dbanotes.net版權所有,轉載請注明出處、作者并盡量保留本文所有超鏈接。