Abstract: This article explains the principles behind name mangling in C++ and how GCC implements it, and verifies those principles with program code and tools such as nm and c++filt. It should help readers who want to understand the linking process in more detail.

Overview of Name Mangling

Large programs are built from multiple modules, and the relationships between the modules are described by a makefile. Large programs written in C++ follow the same rule. The build usually proceeds as follows: each source file is compiled separately into an object file, and the linker then combines the object files into the final executable. To some extent, then, the compiler's output is the linker's input, and the linker processes what the compiler produced a second time. Seen as a communication problem, the two programs need a protocol that fixes how symbols are represented. This is the fundamental reason name mangling exists.

C++ has far richer language features than C, and function overloading is the most direct example of a feature that needs name mangling. Overloaded functions cannot be told apart by name alone, because C++ treats two functions as distinct when:

    the function names differ || the number of parameters differs || the type of some parameter differs

So the number of parameters and the parameter types must also be taken into account when distinguishing functions; only then is there enough information to tell them apart. Many other parts of C++ need name mangling as well, such as namespace, class, template, and so on.

In short, name mangling is a protocol that fixes how symbols are represented in the symbol table the compiler and the linker use to communicate. Its purpose is to give each symbol enough semantic information, according to the rules of the language, for linking to proceed correctly.

A Simple Experiment

Name mangling has one very common side effect: calling C++ code from a C program becomes awkward, because C decorates names very simply while C++ does not. The following code demonstrates the difference:

    /*
     * simple_test.c
     * a demo to show the different name mangling techniques in C++ and C
     * Author: Chaos Lee
     */
    #include <stdio.h>

    int rect_area(int x1, int x2, int y1, int y2)
    {
        return (x2 - x1) * (y2 - y1);
    }

    int elipse_area(int a, int b)
    {
        return 3.14 * a * b;   /* implicitly truncated to int */
    }

    int main(int argc, char *argv[])
    {
        int x1 = 10, x2 = 20, y1 = 30, y2 = 40;
        int a = 3, b = 4;
        int result1 = rect_area(x1, x2, y1, y2);
        int result2 = elipse_area(a, b);
        return 0;
    }

    [lichao@sg01 name_mangling]$ gcc -c simple_test.c
    [lichao@sg01 name_mangling]$ nm simple_test.o
    0000000000000027 T elipse_area
    0000000000000051 T main
    0000000000000000 T rect_area

The output shows that after compiling with gcc the function names in the symbol table carry almost no decoration. Next, compile the same file with g++ and list the symbols again:

    [lichao@sg01 name_mangling]$ nm simple_test.o
    0000000000000028 T _Z11elipse_areaii
    0000000000000000 T _Z9rect_areaiiii
                     U __gxx_personality_v0
    0000000000000052 T main

Clearly g++ rewrites the symbols in a more elaborate way. So if an object file compiled from C calls a function implemented in C++, linking will fail because the symbols do not match. A brief explanation of _Z9rect_areaiiii:

- C++ reserves identifiers that begin with an underscore followed by an uppercase letter, as well as identifiers that contain two consecutive underscores (C99 has a similar rule). _Z9rect_areaiiii is therefore a reserved identifier, and every mangled symbol in an object file produced by g++ starts with _Z.
- The next part resembles a network protocol. The 9 is the length of the string that follows (which is also one reason it is convenient that identifiers cannot begin with a digit), so the nine characters rect_area are recognized as the function name.
- In this example each lowercase letter that follows denotes one parameter type, and i stands for int. The number of letters therefore equals the number of parameters.
- The symbol thus carries enough semantic information to distinguish overloaded functions.

How, then, can a C++ function be called from C? This is what the C++ linkage specification extern "C" is for. The corresponding code is:

    /*
     * simple_test.c
     * a demo to show the different name mangling techniques in C++ and C
     * Author: Chaos Lee
     */
    #include <stdio.h>

    #ifdef __cplusplus
    extern "C" {
    #endif

    int rect_area(int x1, int x2, int y1, int y2)
    {
        return (x2 - x1) * (y2 - y1);
    }

    int elipse_area(int a, int b)
    {
        return (int)(3.14 * a * b);
    }

    #ifdef __cplusplus
    }
    #endif

    int main(int argc, char *argv[])
    {
        int x1 = 10, x2 = 20, y1 = 30, y2 = 40;
        int a = 3, b = 4;
        int result1 = rect_area(x1, x2, y1, y2);
        int result2 = elipse_area(a, b);
        return 0;
    }

Here is the result of compiling with gcc:

    [lichao@sg01 name_mangling]$ gcc -c simple_test.c
    [lichao@sg01 name_mangling]$ nm simple_test.o
    0000000000000027 T elipse_area
    0000000000000051 T main
    0000000000000000 T rect_area

Compile once more with g++:

    [lichao@sg01 name_mangling]$ g++ -c simple_test.c
    [lichao@sg01 name_mangling]$ nm simple_test.o
                     U __gxx_personality_v0
    0000000000000028 T elipse_area
    0000000000000052 T main
    0000000000000000 T rect_area

With extern "C" in place, the symbols follow the C format. In fact the C standard library uses extern "C" extensively: its headers can also be processed by a C++ compiler, and the functions they declare must keep their C interface rather than a C++ one (it is the C standard library, after all). Here is a simple example:

    /*
     * libc_test.c
     * a demo program to show how the standard C library
     * is handled when it meets a C++ compiler
     */
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        puts("hello world.\n");
        return 0;
    }

Searching the preprocessed output for puts does not show any extern "C". Strange?

    [lichao@sg01 name_mangling]$ g++ -E libc_test.c | grep 'puts'
    extern int fputs (__const char *__restrict __s, FILE *__restrict __stream);
    extern int puts (__const char *__s);
    extern int fputs_unlocked (__const char *__restrict __s,
     puts("hello world.\n");

Now search for extern "C" itself:

    [lichao@sg01 name_mangling]$ g++ -E libc_test.c | grep 'extern "C"'
    extern "C" {
    extern "C" {

The explanation is that extern "C" can be written with braces, in which case every function declared inside the braces gets a C-callable interface.

Standards

Different compilers mangle names in different ways. You may ask why C++ name mangling is not standardized so that compilers can interoperate. The C++ FAQ answers this question:

"Compilers differ as to how objects are laid out, how multiple inheritance is implemented, how virtual function calls are handled, and so on, so if the name mangling were made the same, your programs would link against libraries provided from other compilers but then crash when run. For this reason, the ARM (Annotated C++ Reference Manual) encourages compiler writers to make their name mangling different from that of other compilers for the same platform.
Incompatible libraries are then detected at link time, rather than at run time."

Clearly, this reasoning rests on the principle that a crash at run time costs more than a failure at link time.

GCC's Name Mangling

GCC uses the IA-64 name mangling scheme, which is defined in the Itanium (Intel IA-64) C++ ABI. The g++ FAQ contains the following passage:

"GNU C++ does not do name mangling in the same way as other C++ compilers. This means that object files compiled with one compiler cannot be used with another."

The encodings of the builtin types are:

    Builtin types encoding

    <builtin-type> ::= v    # void
                   ::= w    # wchar_t
                   ::= b    # bool
                   ::= c    # char
                   ::= a    # signed char
                   ::= h    # unsigned char
                   ::= s    # short
                   ::= t    # unsigned short
                   ::= i    # int
                   ::= j    # unsigned int
                   ::= l    # long
                   ::= m    # unsigned long
                   ::= x    # long long, __int64
                   ::= y    # unsigned long long, __int64
                   ::= n    # __int128
                   ::= o    # unsigned __int128
                   ::= f    # float
                   ::= d    # double
                   ::= e    # long double, __float80
                   ::= g    # __float128
                   ::= z    # ellipsis
                   ::= u <source-name>    # vendor extended type

Operator encoding:

    <operator-name> ::= nw    # new
                    ::= na    # new[]
                    ::= dl    # delete
                    ::= da    # delete[]
                    ::= ps    # + (unary)
                    ::= ng    # - (unary)
                    ::= ad    # & (unary)
                    ::= de    # * (unary)
                    ::= co    # ~
                    ::= pl    # +
                    ::= mi    # -
                    ::= ml    # *
                    ::= dv    # /
                    ::= rm    # %
                    ::= an    # &
                    ::= or    # |
                    ::= eo    # ^
                    ::= aS    # =
                    ::= pL    # +=
                    ::= mI    # -=
                    ::= mL    # *=
                    ::= dV    # /=
                    ::= rM    # %=
                    ::= aN    # &=
                    ::= oR    # |=
                    ::= eO    # ^=
                    ::= ls    # <<
                    ::= rs    # >>
                    ::= lS    # <<=
                    ::= rS    # >>=
                    ::= eq    # ==
                    ::= ne    # !=
                    ::= lt    # <
                    ::= gt    # >
                    ::= le    # <=
                    ::= ge    # >=
                    ::= nt    # !
                    ::= aa    # &&
                    ::= oo    # ||
                    ::= pp    # ++
                    ::= mm    # --
                    ::= cm    # ,
                    ::= pm    # ->*
                    ::= pt    # ->
                    ::= cl    # ()
                    ::= ix    # []
                    ::= qu    # ?
                    ::= st    # sizeof (a type)
                    ::= sz    # sizeof (an expression)
                    ::= cv <type>    # (cast)
                    ::= v <digit> <source-name>    # vendor extended operator

Type encoding:

    <type> ::= <CV-qualifiers> <type>
           ::= P <type>    # pointer-to
           ::= R <type>    # reference-to
           ::= O <type>    # rvalue reference-to (C++0x)
           ::= C <type>    # complex pair (C 2000)
           ::= G <type>    # imaginary (C 2000)
           ::= U <source-name> <type>    # vendor extended type qualifier

Here is a simple piece of code:

    /*
     * Author: Chaos Lee
     * Description: A simple demo to show how the rules used to mangle functions' names work
     * Date: 2012/05/06
     */
    #include <iostream>
    #include <string>
    using namespace std;

    int test_func(int &tmpInt, const char *ptr, double dou, string str, float f)
    {
        return 0;
    }

    int main(int argc, char *argv[])
    {
        const char *test = "test";   // string literals are const in C++
        int intNum = 10;
        double dou = 10.012;
        string str = "str";
        float f = 1.2f;
        test_func(intNum, test, dou, str, f);
        return 0;
    }

    [lichao@sg01 name_mangling]$ g++ -c func.cpp
    [lichao@sg01 name_mangling]$ nm func.cpp
    nm: func.cpp: File format not recognized
    [lichao@sg01 name_mangling]$ nm func.o
    0000000000000060 t _GLOBAL__I__Z9test_funcRiPKcdSsf
                     U _Unwind_Resume
    0000000000000022 t _Z41__static_initialization_and_destruction_0ii
    0000000000000000 T _Z9test_funcRiPKcdSsf
                     U _ZNSaIcEC1Ev
                     U _ZNSaIcED1Ev
                     U _ZNSsC1EPKcRKSaIcE
                     U _ZNSsC1ERKSs
                     U _ZNSsD1Ev
                     U _ZNSt8ios_base4InitC1Ev
                     U _ZNSt8ios_base4InitD1Ev
    0000000000000000 b _ZSt8__ioinit
                     U __cxa_atexit
                     U __dso_handle
                     U __gxx_personality_v0
    0000000000000076 t __tcf_0
    000000000000008e T main

The line 0000000000000000 T _Z9test_funcRiPKcdSsf is the result of mangling test_func, where:

- Ri: a reference to int
- PKc: a pointer to const char, that is, const char *
- d: double
- Ss: the ABI's predefined abbreviation (a standard substitution) for std::basic_string<char, std::char_traits<char>, std::allocator<char> >, that is, std::string
- f: float

Name Demangling

Name mangling usually makes C++ function names unrecognizable. In many cases, though, we are not interested in the mangled form when inspecting symbols; we only want to know whether a certain function is defined or referenced, which is very helpful when debugging. We therefore need a way to turn a mangled symbol back into the original name, a process called name demangling. Many tools offer this; the most commonly used is the c++filt command, which takes a mangled symbol as input and prints the demangled symbol. For example:

    [lichao@sg01 name_mangling]$ c++filt _Z9test_funcRiPKcdSsf
    test_func(int&, char const*, double, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, float)

A more common way to use it is:

    [lichao@sg01 name_mangling]$ nm func.o | c++filt
    0000000000000060 t global constructors keyed to _Z9test_funcRiPKcdSsf
                     U _Unwind_Resume
    0000000000000022 t __static_initialization_and_destruction_0(int, int)
    0000000000000000 T test_func(int&, char const*, double, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, float)
                     U std::allocator<char>::allocator()
                     U std::allocator<char>::~allocator()
                     U std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)
                     U std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
                     U std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()
                     U std::ios_base::Init::Init()
                     U std::ios_base::Init::~Init()
    0000000000000000 b std::__ioinit
                     U __cxa_atexit
                     U __dso_handle
                     U __gxx_personality_v0
    0000000000000076 t __tcf_0
    000000000000008e T main

The nm command can also demangle symbols by itself with the -C option, for example:

    [lichao@sg01 name_mangling]$ nm -C func.o
    0000000000000060 t global constructors keyed to _Z9test_funcRiPKcdSsf
                     U _Unwind_Resume
    0000000000000022 t __static_initialization_and_destruction_0(int, int)
    0000000000000000 T test_func(int&, char const*, double, std::string, float)
                     U std::allocator<char>::allocator()
                     U std::allocator<char>::~allocator()
                     U std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, std::allocator<char> const&)
                     U std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::string const&)
                     U std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string()
                     U std::ios_base::Init::Init()
                     U std::ios_base::Init::~Init()
    0000000000000000 b std::__ioinit
                     U __cxa_atexit
                     U __dso_handle
                     U __gxx_personality_v0
    0000000000000076 t __tcf_0
    000000000000008e T main

Last but not least, there is one more particularly important interface function, __cxa_demangle(), whose prototype is:

    namespace abi {
        extern "C" char* __cxa_demangle (const char* mangled_name,
                                         char* buf,
                                         size_t* n,
                                         int* status);
    }

It demangles the name that mangled_name points to and returns a pointer to the result. buf is either NULL or a buffer of *n bytes allocated with malloc(); the function may enlarge it with realloc(), so a stack array must not be passed. When buf is NULL the result is allocated for the caller, who releases it with free(). status receives the outcome of the call, and a value of 0 means success. Here is an example that uses this function to demangle a name:

    /*
     * Author: Chaos Lee
     * Description: Employ __cxa_demangle to demangle a mangled function name.
     * Date: 2012/05/06
     */
    #include <iostream>
    #include <cstdlib>    // free
    #include <cxxabi.h>
    using namespace std;
    using namespace abi;

    int main(int argc, char *argv[])
    {
        const char *mangled_string = "_Z9test_funcRiPKcdSsf";
        int status;
        // Pass NULL for buf and n so that __cxa_demangle allocates the
        // result with malloc(); the caller must free() it afterwards.
        char *demangled = __cxa_demangle(mangled_string, NULL, NULL, &status);
        if (demangled != NULL)
            cout << demangled << endl;
        cout << status << endl;
        free(demangled);
        return 0;
    }

Test result:

    [lichao@sg01 name_mangling]$ g++ cxa_demangle.cpp -o cxa_demangle
    [lichao@sg01 name_mangling]$ ./cxa_demangle
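    test_func(int&, char const*, double, std::string, float)
    0

Name Mangling and Hackers

- Demangling can be used to uncover unpublished APIs in a dynamic link library.
- By writing a function whose name is the mangled name of an interface function, and turning on the build switch that allows duplicate symbols, one can change which function the original code ends up linked to and so alter the program's behavior.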