CS107 - Lecture 3 - Note

2019-11-06 06:17:46

Source: reposted

Contributed by: a site user

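The third lecture continues with memory management, covering double, arrays, structs, the & and * operators, and big-endian versus little-endian byte order. Jerry draws memory as intuitive box diagrams. Higher-level languages ship many generic templates, such as binary search, linear search, merge sort, and quicksort; once you understand how memory is really laid out, you can implement these generic routines yourself.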

double

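“Whatever bit pattern happened to reside there before is now pretending to be a character for the lifetime of this statement right here.” Jerry keeps repeating this mantra for cast expressions. It boils down to: take the address, reinterpret that address as a pointer to another type, then dereference it, regardless of what the variable's bit pattern originally represented.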

e.g.1. Reinterpreting a double as a char

double d = 3.1416;
char ch = *(char *)&d;   // reinterpret the lowest-addressed byte of d as a char
cout << ch << endl;

Fig. 1. Reinterpreting a double as a char
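
The same idea can be sketched with the bytes made visible. This is only an illustration: the helper names first_byte, bytes_of, and print_bytes are hypothetical, and unsigned char is used instead of char so that a byte prints as a number. Which byte of the double sits at the lowest address depends on the machine's byte order.

```cpp
#include <cstdio>
#include <cstring>

// Returns the byte stored at the lowest address of d, using the same
// "take the address, cast, dereference" pattern as e.g.1.
unsigned char first_byte(double d) {
    return *(unsigned char *)&d;
}

// Copies every byte of d into out (which must hold sizeof(double) bytes)
// so the whole bit pattern can be inspected.
void bytes_of(double d, unsigned char *out) {
    std::memcpy(out, &d, sizeof d);
}

// Prints the bytes of d in hex, from lowest to highest address.
void print_bytes(double d) {
    unsigned char b[sizeof(double)];
    bytes_of(d, b);
    for (unsigned i = 0; i < sizeof b; ++i) std::printf("%02x ", (unsigned)b[i]);
    std::printf("\n");
}
```

Calling print_bytes(3.1416) shows all the bytes that make up d; first_byte(3.1416) is the one that e.g.1 hands to ch.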

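e.g.2 is also a cast, but once &s is reinterpreted as a double *, the dereference reads sizeof(double) bytes starting at s. Only 2 of those bytes belong to s; the other 6 are whatever happens to follow it in memory, and the program may crash if that memory is not accessible.

e.g.2. Reinterpreting a short as a double

short s = 45;
double d = *(double *)&s;   // reads 8 bytes starting at s; only the first 2 belong to s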
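
To make the hazard concrete, here is a minimal sketch that performs the same reinterpretation inside a buffer that really is sizeof(double) bytes long, so nothing beyond the short is read. The names kBytesPastShort and widen_in_buffer are hypothetical, and zero-filling the extra bytes is an assumption made for the illustration; in e.g.2 those bytes are whatever happens to be in memory.

```cpp
#include <cstring>

// How many bytes *(double*)&s would read past the end of s.
constexpr unsigned kBytesPastShort = sizeof(double) - sizeof(short);

// Same reinterpretation as e.g.2, but the short is first placed at the
// start of a zero-filled buffer that is sizeof(double) bytes long.
double widen_in_buffer(short s) {
    unsigned char buf[sizeof(double)] = {0};
    std::memcpy(buf, &s, sizeof s);   // bytes of s at the lowest addresses
    double d;
    std::memcpy(&d, buf, sizeof d);   // reinterpret the whole buffer as a double
    return d;
}
```

The result is not 45.0: the bit pattern of the short is simply read as if it were the start of a double.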

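Q&A: Copying data between big-endian and little-endian machines

A student asked about copying data between big-endian and little-endian machines. As far as copying the bits goes, nothing changes: & always yields the address of the lowest byte, and the bytes are copied from there in address order. Endianness only comes into play when a value occupies several bytes, because it determines how those bytes are placed and interpreted relative to one another; a multi-byte value moved byte for byte to a machine of the opposite endianness therefore appears with its high and low bytes reversed.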

Student: Are these examples gonna behave differently on little-endian? Jerry: Certainly, yeah. As far as bit copying is concerned, no. This ampersand(&), right here, is always the address of the lowest byte. But as far as how the endianness has to do more with interpretation and placement of bytes relevant to another. There was these phrase in the handout that I kind of de-emphasized, but they’re there.
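
The point about byte placement can be checked directly. A minimal sketch follows; the helper names is_little_endian and swap_bytes32 are hypothetical and do not come from the lecture.

```cpp
#include <cstdint>
#include <cstring>

// True when the least significant byte of a multi-byte value is stored
// at the lowest address (little-endian), false for big-endian.
bool is_little_endian() {
    std::uint32_t probe = 0x01020304u;
    unsigned char lowest;
    std::memcpy(&lowest, &probe, 1);   // &probe is the address of the lowest byte
    return lowest == 0x04;
}

// Reverses the byte order of a 32-bit value, which is what moving raw
// bytes between machines of opposite endianness calls for.
std::uint32_t swap_bytes32(std::uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}
```

For example, swap_bytes32(0x01020304u) yields 0x04030201u, and applying it twice restores the original value.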

struct

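e.g.3 and Fig. 2 show how the memory of a struct can be manipulated:

e.g.3. Manipulating struct memory

// Define the struct type fraction in the C++ style
struct fraction {
    int num;
    int denom;
};

fraction pi;
pi.num = 22;
pi.denom = 7;
((fraction *)&pi.denom)->num = 12;     // overwrites pi.denom
((fraction *)&pi.denom)->denom = 33;   // writes the int just past pi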

Fig. 2. Memory diagram
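
The effect of e.g.3 can be replayed inside an array of three ints, so that the write "past the struct" still lands on memory the program owns. This is a sketch under stated assumptions: the function name overlay_shifted and the three-int array are hypothetical stand-ins, and, like the lecture's code, it relies on how compilers lay out memory in practice rather than on guarantees of the C++ standard.

```cpp
struct fraction {
    int num;
    int denom;
};

static_assert(sizeof(fraction) == 2 * sizeof(int),
              "the sketch assumes fraction is two ints with no padding");

// cells[0] and cells[1] stand in for pi.num and pi.denom; cells[2] is
// the int that follows pi in memory. Overlaying a fraction on &cells[1]
// mirrors ((fraction*)&pi.denom) from e.g.3 without leaving the array.
void overlay_shifted(int cells[3]) {
    fraction *shifted = (fraction *)&cells[1];
    shifted->num = 12;     // lands on cells[1], the "denom" slot
    shifted->denom = 33;   // lands on cells[2], one int past the struct
}
```

Starting from {22, 7, 0}, the array ends up as {22, 12, 33}: the numerator is untouched, the denominator has become 12, and 33 sits in the int after the struct.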

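Structs and arrays are introduced together, which leads to e.g.4, where struct memory is addressed as if it were an array:

e.g.4. Addressing struct memory as an array

/* Treat the memory at &pi as an array whose elements are 8 bytes each */
((fraction *)&pi.denom)[0].denom;
(&pi)[1].num;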
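
The 8-byte stride can be measured rather than taken on faith. A small sketch; the helper name byte_offset_of_element is hypothetical.

```cpp
#include <cstddef>

struct fraction {
    int num;
    int denom;
};

// Byte distance between element 0 and element k when base is indexed
// as an array of fraction: the compiler scales k by sizeof(fraction).
std::size_t byte_offset_of_element(const fraction *base, std::size_t k) {
    const char *start = (const char *)base;
    const char *elem = (const char *)(base + k);
    return (std::size_t)(elem - start);
}
```

For a real array such as fraction fs[3], byte_offset_of_element(fs, 1) equals sizeof(fraction), which is 8 when int is 4 bytes. That is exactly how far (&pi)[1] reaches beyond pi.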

Q&A: How does the system come to treat a struct as an array?

Student: So you are use the address of pi and the X sets it as an array there. Is it gonna know that the size of each element is a fraction? Jerry: Yep. That’s it uses the data typing of whatever pi is right there, and because ampersand of pi is an int star…

array

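C and C++ do no bounds checking on arrays, so several out-of-bounds examples follow to show the side effects of manipulating memory directly. Java does perform such checks, but CS107 does not cover that.

e.g.4. Array access

int array[10];
array[0] = 44;
array[10] = 1;     // one past the end
array[25] = 25;    // compiles and runs, but is out of bounds
array[-4] = 77;    // compiles and runs, but is out of bounds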

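e.g.4 is only an illustration, not good code. In C and C++ the length in the declaration only tells the compiler how many ints to set aside space for; the array itself does not record its length, so the length cannot be recovered later. That is why you always pass the length around together with a raw array, which keeps the memory picture clear. Here are Jerry's own words, a passage I listened to many times and still found hard to follow:

Jerry: It doesn’t in C and C++, not at all. All it does is instruction for that one declaration as to how many ints to set aside space for. But once you do that, like, there’s no length of the array is that. But the length of the memory figure, it’s not exposed to you. So there’s no way to recover it. That’s why you always pass around the length with a raw array in C and C++.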

Jerry: You use vectors more than you did raw arrays in C in 106B, but we’re gonna be more C programmers than C++.

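Jerry stresses that managing memory with raw arrays in C is different from managing it with the C++ vector template used in 106B. A vector offers higher-level memory management, while a raw array deals with the underlying memory directly.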
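
The contrast can be sketched as follows. Note the assumptions: std::vector from the C++ standard library stands in for whichever vector class 106B actually used, and the function names sum and index_is_valid are hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// A raw array does not carry its length, so the caller passes it along.
int sum(const int *values, std::size_t length) {
    int total = 0;
    for (std::size_t i = 0; i < length; ++i) total += values[i];
    return total;
}

// A vector knows its own size; at() checks the index and throws
// std::out_of_range instead of silently touching neighbouring memory.
bool index_is_valid(const std::vector<int> &v, std::size_t i) {
    try {
        (void)v.at(i);
        return true;
    } catch (const std::out_of_range &) {
        return false;
    }
}
```

With a vector of 10 elements, index 9 is accepted while indices 10 and 25 are rejected, whereas the raw array in e.g.4 accepts all of them without complaint.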

Q&A: Why give the array a bound when declaring it?

Student: Say an array 10, like, you initialize it 10 by spaces, but like, what’s the point of initializing it if it’s just going to do, what’s basically do what you want when you inside of it. Jerry: That is true, actually. This right here is really just documentation for how much space is being allocated. And then you’re supposed to write code (I’m not saying this is good code. I’m just saying it’s code. Okay?). You’re supposed to write code that’s consistent with the amount of space that you legally have. But this, this and this, just work because there’s no bounds checking. It doesn’t look arbitrarily far backwards to figure out whether or not it’s an in-range index. So when you get away with this, and it compiles, and it runs, it’s just gonna put a one where it assumes that the eleventh entry would be, or the twenty-sixth entry, or the negative fourth entry. Okay?

e.g.4. Syntax for working with array memory:

array is equivalent to &array[0]
array + k is equivalent to &array[k]
*array is equivalent to array[0]
*(array + k) is equivalent to array[k]
*(array - 4) is equivalent to array[-4]

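The scaling is applied first: with k = 4 and 4-byte ints, array + k moves the address by 16 bytes rather than 4, lands on one box, and that is where the value goes. The type system knows that array decays to an int *, so the offset from the array's base address is k * sizeof(int) bytes. e.g.5 shows casts applied to arrays: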
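
The scaling rule is easy to verify. A minimal sketch; the helper names bytes_moved and same_element are hypothetical.

```cpp
#include <cstddef>

// Number of bytes between array and array + k for an int array.
std::size_t bytes_moved(const int *array, std::size_t k) {
    return (std::size_t)((const char *)(array + k) - (const char *)array);
}

// &array[k] and array + k are the same address.
bool same_element(int *array, std::size_t k) {
    return &array[k] == array + k;
}
```

For int a[10], bytes_moved(a, 4) equals 4 * sizeof(int), which is 16 when int is 4 bytes, and same_element(a, 4) is true.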

e.g.5 Casting an array

int arr[5];
arr[3] = 128;
((short *)arr)[6] = 2;
cout << arr[3] << endl;

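Casting the array to short * lets the code address the same memory in units of a different size. The memory operations of e.g.5 are shown in Figure 4.

The lecture gives the output as 512+128, but on my own computer the program prints 2, and working it out on paper also gives 2. I first took this for an endianness issue, and a junior labmate then suggested that it comes from type sizes instead: the video dates from 2008, and he assumed int was 2 bytes, long int 4 bytes, and short (that is, short int) 1 byte at the time. The result in fact depends on both type sizes and byte order. With a 4-byte int and a 2-byte short on a little-endian machine, ((short *)arr)[6] overwrites the two low-order bytes of arr[3], so the 128 is replaced and the output is 2. On a big-endian machine the same store lands in the two high-order bytes, the 128 survives, and the output is 2*65536 + 128. The value 512+128 corresponds to a big-endian layout in which the stored unit is half of a 2-byte int, but int was already 4 bytes and short 2 bytes on the 32-bit machines the course is based on, so the sizes alone do not explain the difference. Either way, the lesson that many classic C books stress applies: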

Stephen G. Kochan, Programming in C (4th Edition)

“You should never write programs that make any assumptions about the size of your data types. You are, however, guaranteed that a minimum amount of storage will be set aside for each basic type. For example, it’s guaranteed that an integer value will be stored in a minimum of 32 bits of storage, which is the size of a “Word” on many computers.”
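
One way to see how byte order affects e.g.5 without relying on the sizes of int and short is to replay it with fixed-width types. This is a sketch under stated assumptions: std::int32_t and std::int16_t stand in for a 4-byte int and a 2-byte short, memcpy performs the 2-byte store, and the function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// True when the lowest-addressed byte holds the least significant bits.
bool little_endian() {
    std::uint16_t probe = 1;
    unsigned char lowest;
    std::memcpy(&lowest, &probe, 1);
    return lowest == 1;
}

// Replays e.g.5: a 4-byte int array and a 2-byte store at short index 6.
// memcpy writes the same bytes that ((short*)arr)[6] = 2 would write.
std::int32_t arr3_after_short_store() {
    std::int32_t arr[5] = {0};
    arr[3] = 128;
    std::int16_t two = 2;
    unsigned char *bytes = (unsigned char *)arr;
    std::memcpy(bytes + 6 * sizeof(std::int16_t), &two, sizeof two);  // byte offset 12
    return arr[3];
}
```

Byte offset 12 is the start of arr[3]. On a little-endian machine the store replaces the low-order half that held 128, so the function returns 2; on a big-endian machine it fills the high-order half and the function returns 2*65536 + 128 = 131200.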

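In addition, out-of-bounds writes to an array affect whatever is stored immediately before or after it in memory. Take the local variables of a function, which live in a block of memory carved out of the stack while the function runs: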

function: int a; int array[5]; double d;

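Overrunning array may modify int a or double d. This model is how the machine really works: all of a function's local variables are packed into one small block called an "activation record" (a memory block for all local variables in a function).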
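
The neighbours of array can be located with offsetof. This is only a model: a real compiler chooses the order and padding of locals itself, so the struct activation_record below is a hypothetical stand-in that fixes the order "a, then array, then d" for the sake of the illustration.

```cpp
#include <cstddef>

// A stand-in for the activation record sketched above, modelling the
// three locals as one contiguous block in declaration order.
struct activation_record {
    int a;
    int array[5];
    double d;
};

// Offset of the first byte of array within the record.
constexpr std::size_t kStartOfArray = offsetof(activation_record, array);

// Offset of the first byte past the end of array within the record.
constexpr std::size_t kEndOfArray = kStartOfArray + sizeof(int) * 5;

// Offset of d, the variable that follows array.
constexpr std::size_t kStartOfD = offsetof(activation_record, d);
```

In this model a occupies the int-sized slot right before array, where array[-1] would land, and d begins at or shortly after kEndOfArray, which is the region array[5] and array[6] would reach.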

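Mixing structs and arrays

Declare a 16-byte struct in the C++ style (16 bytes assumes 4-byte pointers and 4-byte ints, as on a 32-bit machine). Pure C has no string type, so a name is represented as a character array that ends with a 0 character, also called the null character.

struct student {
    char *name;      // dynamically allocated character array (a char star)
    char suid[8];    // static character array of length 8
    int numUnits;
};

student pupils[4];
pupils[0].numUnits = 21;
pupils[2].name = strdup("Adam");
strcpy(pupils[1].suid, "2014110xxx");   // 11 bytes with the terminating 0: overruns suid[8]
// Evaluate the pointer expression; the + 6 is pointer arithmetic on a char *
pupils[3].name = pupils[0].suid + 6;
strcpy(pupils[3].name, "123456");       // writes 7 bytes starting 6 bytes into suid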

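Here strdup is shorthand for "string duplicate": it dynamically allocates just enough space to store the string. You do not give strdup a destination address; it allocates a block on the heap, writes "Adam" into it, and returns the address of that block. strcpy allocates nothing; it writes to the destination address the caller passes in. Inside strcpy a loop copies characters one after another until it reaches the 0 (the 0 is copied too). strdup is not an earlier form of malloc: it is typically built on top of malloc, so the block it returns should eventually be released with free.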
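
The difference between the two functions is easiest to see in code. The versions below are sketches of the behaviour described above, not the library's actual implementations; the names copy_string and duplicate_string are hypothetical.

```cpp
#include <cstdlib>
#include <cstring>

// Sketch of what strcpy does: copy characters one after another into
// memory the caller already owns, including the terminating 0.
char *copy_string(char *dest, const char *src) {
    char *out = dest;
    while ((*out++ = *src++) != '\0') {
        // each iteration copies one character; the loop ends after the 0
    }
    return dest;
}

// Sketch of what strdup does: allocate just enough heap space, copy the
// string into it, and return the address of the new block.
// The caller is responsible for calling free() on the result.
char *duplicate_string(const char *src) {
    char *block = (char *)std::malloc(std::strlen(src) + 1);
    if (block == nullptr) return nullptr;
    return copy_string(block, src);
}
```

duplicate_string("Adam") returns a fresh 5-byte heap block holding the four letters and the 0, while copy_string only fills a buffer that must already be large enough.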

Q&A: How is a struct stored in memory?

Jerry:“In 106B and 106X, and maybe 106A as well, you drew them as the somewhat loose rectangles around two boxes. Okay, I want to be a little bit more structured than that. I want to recognize that the amount of memory that’s set aside for the struct fraction, not surprisingly, is eight bytes. Okay? It’s obviously the sum of some of its parts, and it actually packs all of those bytes as tightly as possible.”

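How a struct is stored in memory depends on the order of its fields in the address space and on the byte alignment between them. The figure illustrates the struct's layout in memory:

In the pointer expression of the struct example (pupils[3].name = pupils[0].suid + 6), the address pupils[0].suid + 6 is stored in pupils[3].name, so pupils[3].name now holds an address value such as 0x1000 or 0x1001. strcpy then treats that address as the base of a run of character slots and, starting from the base, fills the boxes one after another; printing works the same way, reading characters from the base address onward.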

Fig. 5. Memory diagram
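
The layout in the diagram can be cross-checked with offsetof. A minimal sketch; the constant names and the helper spills_past_suid are hypothetical. The struct is 16 bytes only when pointers are 4 bytes; with 8-byte pointers it is typically 24.

```cpp
#include <cstddef>

struct student {
    char *name;
    char suid[8];
    int numUnits;
};

// Byte offsets of the fields from the start of the struct.
constexpr std::size_t kNameOffset = offsetof(student, name);
constexpr std::size_t kSuidOffset = offsetof(student, suid);
constexpr std::size_t kUnitsOffset = offsetof(student, numUnits);

// True when writing byte_count bytes starting at suid[start] runs past
// the end of suid and into the fields that follow it.
constexpr bool spills_past_suid(std::size_t start, std::size_t byte_count) {
    return start + byte_count > sizeof(student::suid);
}
```

The strcpy through pupils[3].name starts at suid[6] and writes 7 bytes ("123456" plus the 0), so spills_past_suid(6, 7) is true: only the first 2 bytes stay inside suid, and the rest overwrite what comes next.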

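"Generics" in C

Generics in C are essentially a simple function + advanced memory terminology. The most common example is the swap function:

void swap(int *ap, int *bp)
{
    int temp = *ap;
    *ap = *bp;    // evaluate what bp addresses and store it where ap points
    *bp = temp;
}

int x = 7;
int y = 117;
swap(&x, &y);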

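This rotate-through-a-temporary exchange has nothing to do with int as such; the values could just as well be double, char, struct, student, and so on. Writing generic code in C is rarely strictly necessary, but it is certainly possible, especially once you understand how memory works. The next lecture introduces this kind of generic code and builds a new understanding in terms of generic pointers and generic byte swapping.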
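
As a preview of that idea, here is one way a byte-level swap might look. This is a guess at the technique rather than the version from the next lecture: the name swap_generic, the explicit size parameter, and the use of std::vector for the temporary buffer are all assumptions.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Swaps the size bytes at ap with the size bytes at bp. The function
// never needs to know what type the bytes represent. The two regions
// must either be identical or not overlap at all.
void swap_generic(void *ap, void *bp, std::size_t size) {
    if (ap == bp || size == 0) return;
    std::vector<unsigned char> temp(size);
    std::memcpy(temp.data(), ap, size);   // temp = *ap
    std::memcpy(ap, bp, size);            // *ap = *bp
    std::memcpy(bp, temp.data(), size);   // *bp = temp
}
```

The same function swaps two ints with swap_generic(&x, &y, sizeof(int)) and two doubles with swap_generic(&a, &b, sizeof(double)); the caller supplies the size because a void * carries no type information.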
