字符集問題的初步探討(五)

2024-07-21 02:07:29

字體：大中小

供稿：網(wǎng)友

原文鏈接:

http://www.eygle.com/special/nls_character_set_05.htm

原文發(fā)表于itpub技術(shù)叢書《oracle數(shù)據(jù)庫dba專題技術(shù)精粹》,未經(jīng)許可,嚴禁轉(zhuǎn)載本文.

我們知道在導出文件中，記錄著導出使用的字符集id,通過查看導出文件頭的第2、3個字節(jié)，我們可以找到16進制表示的字符集id,在windows上，
我們可以使用ultraedit等工具打開dmp文件，查看其導出字符集::

在unix上我們可以通過以下命令來查看:

cat expdat.dmp | od -x | head

oracle提供標準函數(shù)，對字符集名稱及id進行轉(zhuǎn)換:

sql> select nls_charset_id('zhs16gbk') from dual;nls_charset_id('zhs16gbk')-------------------------- 8521 row selected.sql> select nls_charset_name(852) from dual;nls_char--------zhs16gbk1 row selected.十進制轉(zhuǎn)換十六進制:sql> select to_char('852','xxxx') from dual;to_ch----- 3541 row selected.

對應上面的圖中第2、3字節(jié)，我們知道該導出文件字符集為zhs16gbk.

查詢數(shù)據(jù)庫中有效的字符集可以使用以下腳本:

col nls_charset_id for 9999col nls_charset_name for a30col hex_id for a20select nls_charset_id(value) nls_charset_id, value nls_charset_name,to_char(nls_charset_id(value),'xxxx') hex_idfrom v$nls_valid_valueswhere parameter = 'characterset'order by nls_charset_id(value)/

輸出樣例如下:

nls_charset_id nls_charset_name hex_id
-------------- ------------------------------ -------------
1 us7ascii 1
2 we8dec 2
3 we8hp 3
4 us8pc437 4
5 we8ebcdic37 5
6 we8ebcdic500 6
7 we8ebcdic1140 7
8 we8ebcdic285 8
...................
850 zhs16cgb231280 352
851 zhs16maccgb231280 353
852 zhs16gbk 354
853 zhs16dbcs 355
860 zht32euc 35c
861 zht32sops 35d
862 zht16dbt 35e
863 zht32tris 35f
864 zht16dbcs 360
865 zht16big5 361
866 zht16ccdc 362
867 zht16mswin950 363
868 zht16hkscs 364
870 al24utffss 366
871 utf8 367
872 utfe 368

..................................

在很多時候，當我們進行導入操作的時候，已經(jīng)離開了源數(shù)據(jù)庫，這時如果目標數(shù)據(jù)庫的字符集和導出文件不一致，很多時候就需要進行特殊處理，
以下介紹幾種方法，主要以us7ascii和zhs16gbk為例

1．源數(shù)據(jù)庫字符集為us7ascii，導出文件字符集為us7ascii或zhs16gbk，目標數(shù)據(jù)庫字符集為zhs16gbk
在oracle92中，我們發(fā)現(xiàn)對于這種情況，不論怎樣處理，這個導出文件都無法正確導入到oracle9i數(shù)據(jù)庫中，這可能是因為oracle9i的編碼方案發(fā)生了較大改變。

以下是我們所做的簡單測試，其中導出文件命名規(guī)則為:

s-server ，后跟server字符集
c-client , 后跟導出操作時客戶端字符集

導入時客戶端字符集設(shè)置在命令行完成，限于篇幅，我們省略了部分測試過程。
對于oracle9ir2,我們的測試結(jié)果是us7ascii字符集，不管怎樣轉(zhuǎn)換，都無法正確導入zhs16gbk字符集的數(shù)據(jù)庫中。

在進行導入操作時，如果字符不能正常轉(zhuǎn)換，oracle數(shù)據(jù)庫會自動用一個”?”代替，也就是編碼63。

e:/nls2>set nls_lang=american_america.us7asciie:/nls2>imp eygle/eygle file=sus7ascii-cus7ascii.dmp fromuser=eygle touser=eygle tables=testimport: release 9.2.0.4.0 - production on mon nov 3 17:14:39 2003copyright (c) 1982, 2002, oracle corporation. all rights reserved.connected to: oracle9i enterprise edition release 9.2.0.4.0 - productionwith the partitioning, oracle label security, olap and oracle data mining optionsjserver release 9.2.0.4.0 - productionexport file created by export:v09.02.00 via conventional pathimport done in us7ascii character set and al16utf16 nchar character setimport server uses zhs16gbk character set (possible charset conversion). . importing table "test" 2 rows importedimport terminated successfully without warnings.e:/nls2>sqlplus eygle/eyglesql*plus: release 9.2.0.4.0 - production on mon nov 3 17:14:50 2003copyright (c) 1982, 2002, oracle corporation. all rights reserved.connected to:oracle9i enterprise edition release 9.2.0.4.0 - productionwith the partitioning, oracle label security, olap and oracle data mining optionsjserver release 9.2.0.4.0 - productionsql> select name,dump(name) from test;namedump(name)-----------------------------????typ=1 len=4: 63,63,63,63testtyp=1 len=4: 116,101,115,1162 rows selected.sql> exitdisconnected from oracle9i enterprise edition release 9.2.0.4.0 - productionwith the partitioning, oracle label security, olap and oracle data mining optionsjserver release 9.2.0.4.0 - productione:/nls2>set nls_lang=american_america.zhs16gbke:/nls2>imp eygle/eygle file=sus7ascii-cus7ascii.dmp fromuser=eygle touser=eygle tables=test ignore=yimport: release 9.2.0.4.0 - production on mon nov 3 17:15:28 2003copyright (c) 1982, 2002, oracle corporation. all rights reserved.connected to: oracle9i enterprise edition release 9.2.0.4.0 - productionwith the partitioning, oracle label security, olap and oracle data mining optionsjserver release 9.2.0.4.0 - productionexport file created by export:v09.02.00 via conventional pathimport done in zhs16gbk character set and al16utf16 nchar character setexport client uses us7ascii character set (possible charset conversion). . importing table "test" 2 rows importedimport terminated successfully without warnings.e:/nls2>sqlplus eygle/eyglesql*plus: release 9.2.0.4.0 - production on mon nov 3 17:15:34 2003copyright (c) 1982, 2002, oracle corporation. all rights reserved.connected to:oracle9i enterprise edition release 9.2.0.4.0 - productionwith the partitioning, oracle label security, olap and oracle data mining optionsjserver release 9.2.0.4.0 - productionsql> select name,dump(name) from test;namedump(name)--------------------------------------------------------------------------------????typ=1 len=4: 63,63,63,63testtyp=1 len=4: 116,101,115,116????typ=1 len=4: 63,63,63,63testtyp=1 len=4: 116,101,115,1164 rows selected.sql> drop table test;table dropped.sql> exitdisconnected from oracle9i enterprise edition release 9.2.0.4.0 - productionwith the partitioning, oracle label security, olap and oracle data mining optionsjserver release 9.2.0.4.0 - productione:/nls2>set nls_lang=american_america.zhs16gbke:/nls2>imp eygle/eygle file=sus7ascii-czhs16gbk.dmp fromuser=eygle touser=eygle tables=test ignore=yimport: release 9.2.0.4.0 - production on mon nov 3 17:17:21 2003copyright (c) 1982, 2002, oracle corporation. all rights reserved.connected to: oracle9i enterprise edition release 9.2.0.4.0 - productionwith the partitioning, oracle label security, olap and oracle data mining optionsjserver release 9.2.0.4.0 - productionexport file created by export:v09.02.00 via conventional pathimport done in zhs16gbk character set and al16utf16 nchar character set. . importing table "test" 2 rows importedimport terminated successfully without warnings.e:/nls2>sqlplus eygle/eyglesql*plus: release 9.2.0.4.0 - production on mon nov 3 17:17:30 2003copyright (c) 1982, 2002, oracle corporation. all rights reserved.connected to:oracle9i enterprise edition release 9.2.0.4.0 - productionwith the partitioning, oracle label security, olap and oracle data mining optionsjserver release 9.2.0.4.0 - productionsql> select name,dump(name) from test;namedump(name)----------------------------------------------????typ=1 len=4: 63,63,63,63testtyp=1 len=4: 116,101,115,1162 rows selected.sql> exitdisconnected from oracle9i enterprise edition release 9.2.0.4.0 - productionwith the partitioning, oracle label security, olap and oracle data mining optionsjserver release 9.2.0.4.0 - productione:/nls2>set nls_lang=american_america.us7asciie:/nls2>imp eygle/eygle file=sus7ascii-czhs16gbk.dmp fromuser=eygle touser=eygle tables=test ignore=yimport: release 9.2.0.4.0 - production on mon nov 3 17:18:00 2003copyright (c) 1982, 2002, oracle corporation. all rights reserved.connected to: oracle9i enterprise edition release 9.2.0.4.0 - productionwith the partitioning, oracle label security, olap and oracle data mining optionsjserver release 9.2.0.4.0 - productionexport file created by export:v09.02.00 via conventional pathimport done in us7ascii character set and al16utf16 nchar character setimport server uses zhs16gbk character set (possible charset conversion)export client uses zhs16gbk character set (possible charset conversion). . importing table "test" 2 rows importedimport terminated successfully without warnings.e:/nls2>sqlplus eygle/eyglesql*plus: release 9.2.0.4.0 - production on mon nov 3 17:18:08 2003copyright (c) 1982, 2002, oracle corporation. all rights reserved.connected to:oracle9i enterprise edition release 9.2.0.4.0 - productionwith the partitioning, oracle label security, olap and oracle data mining optionsjserver release 9.2.0.4.0 - productionsql> select name,dump(name) from test;namedump(name)----------------------------------------????typ=1 len=4: 63,63,63,63testtyp=1 len=4: 116,101,115,116????typ=1 len=4: 63,63,63,63testtyp=1 len=4: 116,101,115,1164 rows selected.sql>

對于這種情況，我們可以通過使用oracle8i的導出工具，設(shè)置導出字符集為us7ascii，導出后修改第二、三字符，修改 0001 為
0354,這樣就可以將us7ascii字符集的數(shù)據(jù)正確導入到zhs16gbk的數(shù)據(jù)庫中。

修改導出文件:

導入修改后的導出文件:

e:/nls2>set nls_lang=american_america.zhs16gbke:/nls2>imp eygle/eygle file=sus7ascii-cus7ascii-exp817.dmp fromuser=eygle touser=eygle tables=testimport: release 9.2.0.4.0 - production on mon nov 3 17:37:17 2003copyright (c) 1982, 2002, oracle corporation. all rights reserved.connected to: oracle9i enterprise edition release 9.2.0.4.0 - productionwith the partitioning, oracle label security, olap and oracle data mining optionsjserver release 9.2.0.4.0 - productionexport file created by export:v08.01.07 via conventional pathimport done in zhs16gbk character set and al16utf16 nchar character setexport server uses utf8 nchar character set (possible ncharset conversion). . importing table "test" 2 rows importedimport terminated successfully without warnings.e:/nls2>sqlplus eygle/eyglesql*plus: release 9.2.0.4.0 - production on mon nov 3 17:37:23 2003copyright (c) 1982, 2002, oracle corporation. all rights reserved.connected to:oracle9i enterprise edition release 9.2.0.4.0 - productionwith the partitioning, oracle label security, olap and oracle data mining optionsjserver release 9.2.0.4.0 - productionsql> select name,dump(name) from test;name dump(name)--------------------------------------------------------------------------------測試 typ=1 len=4: 178,226,202,212test typ=1 len=4: 116,101,115,1162 rows selected.sql>

2．使用create database的方法
如果導出文件使用的字符集是us7ascii，目標數(shù)據(jù)庫的字符集是zhs16gbk,我們可以使用create database的方法來修改，具體如下:

sql> col parameter for a30sql> col value for a30sql> select * from v$nls_parameters;parameter value------------------------------ ------------------------------nls_language americannls_territory americanls_currency $nls_iso_currency americanls_numeric_characters .,nls_calendar gregoriannls_date_format dd-mon-rrnls_date_language americannls_characterset zhs16gbknls_sort binary……………….19 rows selected.sql> create database character set us7ascii;create database character set us7ascii*error at line 1:ora-01031: insufficient privilegessql> select * from v$nls_parameters;parameter value------------------------------ ------------------------------nls_language americannls_territory americanls_currency $nls_iso_currency americanls_numeric_characters .,nls_calendar gregoriannls_date_format dd-mon-rrnls_date_language americannls_characterset us7asciinls_sort binary…………..19 rows selected.sql> exitdisconnected from oracle9i enterprise edition release 9.2.0.4.0 - productionwith the partitioning, oracle label security, olap and oracle data mining optionsjserver release 9.2.0.4.0 - productione:/nls2>set nls_lang=american_america.us7asciie:/nls2>imp eygle/eygle file=sus7ascii-cus7ascii.dmp fromuser=eygle touser=eygleimport: release 9.2.0.4.0 - production on sun nov 2 14:53:26 2003copyright (c) 1982, 2002, oracle corporation. all rights reserved.connected to: oracle9i enterprise edition release 9.2.0.4.0 - productionwith the partitioning, oracle label security, olap and oracle data mining optionsjserver release 9.2.0.4.0 - productionexport file created by export:v09.02.00 via conventional pathimport done in us7ascii character set and al16utf16 nchar character setimport server uses zhs16gbk character set (possible charset conversion). . importing table "test" 2 rows importedimport terminated successfully without warnings.e:/nls2>sqlplus eygle/eyglesql*plus: release 9.2.0.4.0 - production on sun nov 2 14:53:35 2003copyright (c) 1982, 2002, oracle corporation. all rights reserved.connected to:oracle9i enterprise edition release 9.2.0.4.0 - productionwith the partitioning, oracle label security, olap and oracle data mining optionsjserver release 9.2.0.4.0 - productionsql> select * from test;name----------測試test2 rows selected.

我們看到，當發(fā)出create database character set us7ascii；命令時，數(shù)據(jù)庫v$nls_parameters中的字符集設(shè)置隨之更改，該參數(shù)影響導入進程，
更改后可以正確導入數(shù)據(jù)，重起數(shù)據(jù)庫后，該設(shè)置恢復。

提示:v$nls_paraemters來源于x$nls_parameters,該動態(tài)性能視圖影響導入操作；而nls_database_parameters來源于props$數(shù)據(jù)表，影響數(shù)據(jù)存儲。

3． oracle提供的字符掃描工具csscan

我們說以上的方法只是應該在不得已的情況下使用，其本質(zhì)是欺騙數(shù)據(jù)庫，強制導入數(shù)據(jù)，可能損失元數(shù)據(jù)。
如果要確保數(shù)據(jù)的完整性，應該使用csscan掃描數(shù)據(jù)庫，找出所有不兼容的字符，然后通過編寫相應的腳本及代碼，在轉(zhuǎn)換之后進行更新，確保數(shù)據(jù)的正確性。
我們簡單看一下csscan的使用。

要使用csscan之前，需要以sys用戶身份創(chuàng)建相應數(shù)據(jù)字典對象:

e:/nls2>sqlplus "/ as sysdba"sql*plus: release 9.2.0.4.0 - production on sun nov 2 19:42:07 2003copyright (c) 1982, 2002, oracle corporation. all rights reserved.connected to:oracle9i enterprise edition release 9.2.0.4.0 - productionwith the partitioning, oracle label security, olap and oracle data mining optionsjserver release 9.2.0.4.0 - productionsql> select instance_name from v$intance;select instance_name from v$intance *error at line 1:ora-00942: table or view does not existsql> select instance_name from v$instance;instance_name----------------penny1 row selected.sql> @?/rdbms/admin/csminst.sqluser created.grant succeeded.………..

這個腳本創(chuàng)建相應用戶(csmig)及數(shù)據(jù)字典對象，掃描信息會記錄在相應的數(shù)據(jù)字典表里。

我們可以在命令行調(diào)用這個工具對數(shù)據(jù)庫進行掃描:

e:/nls2>csscan full=y fromchar=zhs16gbk tochar=us7ascii log=us7check.log capture=y array=1000000 process=2character set scanner v1.1 : release 9.2.0.1.0 - production on sun nov 2 20:24:45 2003copyright (c) 1982, 2002, oracle corporation. all rights reserved.username: eygle/eygleconnected to:oracle9i enterprise edition release 9.2.0.4.0 - productionwith the partitioning, oracle label security, olap and oracle data mining optionsjserver release 9.2.0.4.0 - productionenumerating tables to scan.... process 1 scanning sys.source$[aaaabhaabaaaairaaa]. process 2 scanning sys.attribute$[aaaaeoaabaaaahzaaa]. process 2 scanning sys.parameter$[aaaaeoaabaaaahzaaa]. process 2 scanning sys.method$[aaaaeoaabaaaahzaaa]……... process 2 scanning system.def$_aqerror[aaaa8faabaaacwjaaa]. process 1 scanning wmsys.wm$env_vars[aaabewaabaaafmzaaa]………………….. process 2 scanning sys.ugroup$[aaaaa5aabaaaagpaaa]. process 2 scanning sys.con$[aaaaacaabaaaacpaaa]. process 1 scanning sys.file$[aaaaaraabaaaabxaaa]creating database scan summary report...creating individual exception report...scanner terminated successfully.

然后我們可以檢查輸出的日志來查看數(shù)據(jù)庫掃描情況:

database scan individual exception report[database scan parameters]parameter value ------------------------------ ------------------------------------------------scan type full database scan char data? yes current database character set zhs16gbk new database character set us7ascii scan nchar data? no array fetch buffer size 1000000 number of processes 2 capture convertible data? yes ------------------------------ ------------------------------------------------[data dictionary individual exceptions][application data individual exceptions]user : eygletable : testcolumn: nametype : varchar2(10)number of exceptions : 1 max post conversion data size: 4 rowid exception type size cell data(first 30 bytes) ------------------ ------------------ ----- ------------------------------aaabpiaadaaaaamaaa lossy conversion 測試 ------------------ ------------------ ----- ------------------------------

不能轉(zhuǎn)換的數(shù)據(jù)將會被記錄下來，我們可以根據(jù)這些信息在轉(zhuǎn)換之后，對數(shù)據(jù)進行相應的更新，確保轉(zhuǎn)換無誤。

最大的網(wǎng)站源碼資源下載站，

上一篇：初學觸發(fā)器

下一篇：物料編碼的原則