

A Brief Analysis of the Linux Hugetlbfs Kernel Source (Part 1): Hugetlbfs Initialization

2024-06-28 13:23:30
Source: reposted
Contributed by: a reader

I. Introduction

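  To implement virtual memory, operating systems manage memory in pages. On mainstream architectures such as x86, the default page size has long been 4096 bytes (4KB). The page size is configurable in principle, but the vast majority of operating system implementations still use 4KB pages. When an application needs several gigabytes, or even tens of gigabytes, of memory, 4KB pages can severely limit its performance.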

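  The CPU caches virtual-to-physical translations in the TLB (Translation Lookaside Buffer), a small dedicated cache of limited capacity. With the default 4KB page size, a large working set needs far more page-table entries than the TLB can hold. This produces many TLB misses and page faults, which significantly hurts application performance.

  Using 2MB or larger pages as the paging unit greatly reduces the number of TLB misses and page faults, and can markedly improve performance. This is the direct reason the Linux kernel added huge page support.

  The benefit is easy to see. Suppose an application needs 2MB of memory:
  - With 4KB pages, this takes 512 pages. That means 512 TLB entries and 512 page-table entries. The operating system must go through at least 512 TLB misses and 512 page faults before the whole 2MB is mapped to physical memory.
  - With 2MB as the paging unit, a single TLB miss and a single page fault are enough to map the whole 2MB. No further TLB misses or page faults occur while the program runs, assuming no TLB entry is evicted and no swapping takes place.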
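The 512-versus-1 arithmetic above can be checked with a few lines of C. This is only an illustrative sketch, and pages_needed() is a hypothetical helper. The count it returns is the number of pages (and page-table entries) needed to map a region. It is also the number of TLB entries needed to cover that region without misses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of pages (and page-table entries) needed
 * to map `len` bytes with pages of `page_size` bytes.
 * page_size must be a power of two. */
uint64_t pages_needed(uint64_t len, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (len + page_size - 1) / page_size;   /* round up to whole pages */
}
```

For example, pages_needed(2 MB, 4 KB) is 512 while pages_needed(2 MB, 2 MB) is 1, matching the figures in the text.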

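  To support huge pages at minimal cost, Linux provides 2MB huge pages through a special file system, hugetlbfs. Exposing huge pages through a special file system lets applications choose the page size of each mapping as needed. They are not forced to use 2MB pages everywhere.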

II. Using HugePages

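  The examples in this article are taken from the documentation shipped with the Linux kernel source (Documentation/vm/hugetlbpage.txt). Before hugetlbfs can be used, the kernel must be built (make menuconfig) with the CONFIG_HUGETLB_PAGE and CONFIG_HUGETLBFS options enabled. Both can be found under the File systems menu of the kernel configuration.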

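  After the kernel is built and booted, mount the hugetlbfs special file system on a directory of the root file system so that it can be accessed. The mount point, here /mnt/huge, must already exist. The command is: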

  mount none /mnt/huge -t hugetlbfs

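  From then on, any file created under /mnt/huge/ uses 2MB as the basic paging unit when it is mapped into memory. Note that hugetlbfs files are not accessed through the ordinary write() system call; the kernel documentation states that read() is supported but write() is not. They are normally accessed through memory mapping.

  To illustrate huge page usage, here is an example. It is also taken from the kernel documentation mentioned above, slightly simplified.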

Listing 1. Example of using huge pages on Linux

    #include <fcntl.h>
    #include <stdio.h>      /* perror() */
    #include <sys/mman.h>
    #include <unistd.h>     /* close(), unlink() */

    #define MAP_LENGTH      (10*1024*1024)   /* 10MB: five 2MB huge pages */

    int main(void)
    {
        int fd;
        void *addr;

        /* create a file in hugetlbfs; O_CREAT requires a mode argument */
        fd = open("/mnt/huge/test", O_CREAT | O_RDWR, 0755);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* map the file into the address space of the current process */
        addr = mmap(0, MAP_LENGTH, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (addr == MAP_FAILED) {
            perror("mmap");
            close(fd);
            unlink("/mnt/huge/test");
            return 1;
        }

        /* from now on, application data can be stored in huge pages via addr */

        munmap(addr, MAP_LENGTH);
        close(fd);
        unlink("/mnt/huge/test");
        return 0;
    }
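In Listing 1, MAP_LENGTH is 10MB, which is exactly five 2MB huge pages. A length that is not a multiple of the huge page size may be rejected by mmap() on a hugetlbfs file with EINVAL on kernels of this era. Rounding up first is therefore a common precaution.

The sketch below assumes a power-of-two huge page size. It uses 2MB here, but the real value should be taken from Hugepagesize in /proc/meminfo. huge_align() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round `len` up to the next multiple of
 * `huge_page_size`, which must be a power of two (e.g. 2MB). */
size_t huge_align(size_t len, size_t huge_page_size)
{
    assert(huge_page_size != 0 && (huge_page_size & (huge_page_size - 1)) == 0);
    return (len + huge_page_size - 1) & ~(huge_page_size - 1);
}
```

For example, a 3MB request becomes 4MB, while the 10MB of Listing 1 is left unchanged.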

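  Statistics about huge pages in the system can be found in the proc special file system (/proc). For example, /proc/sys/vm/nr_hugepages shows the number of huge pages currently configured in the kernel. Writing to the same file changes that number, for example: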

  echo 20 > /proc/sys/vm/nr_hugepages
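Besides /proc/sys/vm/nr_hugepages, /proc/meminfo reports the pool state in fields such as HugePages_Total, HugePages_Free, HugePages_Rsvd, HugePages_Surp and Hugepagesize. Below is a minimal sketch of extracting one of these fields from text in that format. meminfo_value() is a hypothetical helper. To use it on a live system, first read the whole of /proc/meminfo into a buffer.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: look up `key` (e.g. "HugePages_Total") in text laid
 * out like /proc/meminfo ("Key:   value [kB]" per line). Stores the number
 * in *out and returns 0, or returns -1 if the key is missing or malformed. */
int meminfo_value(const char *text, const char *key, unsigned long *out)
{
    size_t klen = strlen(key);
    const char *line = text;

    assert(text != NULL && key != NULL && out != NULL);
    while (line && *line) {
        /* require an exact key match followed by ':' */
        if (strncmp(line, key, klen) == 0 && line[klen] == ':')
            return sscanf(line + klen + 1, "%lu", out) == 1 ? 0 : -1;
        line = strchr(line, '\n');
        if (line)
            line++;                 /* move to the start of the next line */
    }
    return -1;
}
```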

III. Hugetlbfs initialization (based on Linux-3.4.51)

1. hugetlb initialization

  hugetlb is initialized by hugetlb_init(). It mainly initializes the global hstates[HUGE_MAX_HSTATE] array and creates the related sysfs directories and files.

    static int __init hugetlb_init(void)
    {
        /* Some platform decide whether they support huge pages at boot
         * time. On these, such as powerpc, HPAGE_SHIFT is set to 0 when
         * there is no such support
         */
        if (HPAGE_SHIFT == 0)
            return 0;

        if (!size_to_hstate(default_hstate_size)) {
            default_hstate_size = HPAGE_SIZE;  /* default size: 2MB */
            if (!size_to_hstate(default_hstate_size))
                /* Register an hstate in hstates[HUGE_MAX_HSTATE]; if no
                 * hugepagesz= was given, it is the only member.
                 * On x86, HUGETLB_PAGE_ORDER = 9, i.e. h->order = 9.
                 */
                hugetlb_add_hstate(HUGETLB_PAGE_ORDER);
        }
        /* With a single hstate, default_hstate_idx = 0 */
        default_hstate_idx = size_to_hstate(default_hstate_size) - hstates;
        /* default_hstate_max_huge_pages is 0 unless set with hugepages= */
        if (default_hstate_max_huge_pages)
            default_hstate.max_huge_pages = default_hstate_max_huge_pages;

        /* Allocate max_huge_pages pages for every hstate with order < MAX_ORDER;
         * with the default max_huge_pages of 0, nothing is allocated here */
        hugetlb_init_hstates();
        /* Move huge pages preallocated from bootmem in early boot
         * (gigantic pages, order >= MAX_ORDER) into their hstates */
        gather_bootmem_prealloc();
        /* Print information about the initialized hstates */
        report_hugepages();
        /* Create the /sys/kernel/mm/hugepages directories and files */
        hugetlb_sysfs_init();
        /* Create the per-node /sys/devices/system/node/nodeN/hugepages files */
        hugetlb_register_all_nodes();
        return 0;
    }
    module_init(hugetlb_init);
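To make the default-size logic of hugetlb_init() easier to follow, here is a small user-space model of it. Everything in it is hypothetical and simplified:
- The model_* names stand in for hstates[], size_to_hstate(), hugetlb_add_hstate() and max_hstate.
- It assumes 4KB base pages with HUGETLB_PAGE_ORDER = 9, as on x86.

It reproduces the fallback in which an unregistered default_hstate_size (for example 4MB) is replaced by HPAGE_SIZE.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_PAGE_SHIFT   12   /* assumption: 4KB base pages, as on x86 */
#define MODEL_HPAGE_ORDER  9    /* assumption: HUGETLB_PAGE_ORDER = 9 (2MB) */
#define MODEL_MAX_HSTATE   2    /* stands in for HUGE_MAX_HSTATE */

struct model_hstate {
    unsigned int order;             /* huge page = 2^order base pages */
    unsigned long max_huge_pages;
};

struct model_hstate model_hstates[MODEL_MAX_HSTATE];
int model_max_hstate;               /* number of registered hstates */

unsigned long model_huge_page_size(const struct model_hstate *h)
{
    return 1UL << (h->order + MODEL_PAGE_SHIFT);
}

/* Linear search by page size, in the spirit of size_to_hstate(). */
struct model_hstate *model_size_to_hstate(unsigned long size)
{
    for (int i = 0; i < model_max_hstate; i++)
        if (model_huge_page_size(&model_hstates[i]) == size)
            return &model_hstates[i];
    return NULL;
}

/* Register a page size once, in the spirit of hugetlb_add_hstate(). */
void model_add_hstate(unsigned int order)
{
    if (model_size_to_hstate(1UL << (order + MODEL_PAGE_SHIFT)))
        return;                     /* already registered: ignore */
    assert(model_max_hstate < MODEL_MAX_HSTATE);
    model_hstates[model_max_hstate].order = order;
    model_hstates[model_max_hstate].max_huge_pages = 0;
    model_max_hstate++;
}

/* Mirrors the default-hstate part of hugetlb_init(): returns the default
 * hstate index and may replace *default_size with the 2MB fallback. */
int model_hugetlb_init(unsigned long *default_size, unsigned long default_max)
{
    struct model_hstate *h;

    if (!model_size_to_hstate(*default_size)) {
        *default_size = 1UL << (MODEL_HPAGE_ORDER + MODEL_PAGE_SHIFT);
        if (!model_size_to_hstate(*default_size))
            model_add_hstate(MODEL_HPAGE_ORDER);
    }
    h = model_size_to_hstate(*default_size);
    if (default_max)
        h->max_huge_pages = default_max;
    return (int)(h - model_hstates);
}
```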

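The default huge page size can also be set with the default_hugepagesz kernel boot parameter. For example, default_hugepagesz=4M sets default_hstate_size to 4MB. If no hstate of that size has been registered, hugetlb_init() falls back to HPAGE_SIZE, as the code above shows. The kernel implementation is: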
    static int __init hugetlb_default_setup(char *s)
    {
        default_hstate_size = memparse(s, &s);
        return 1;
    }
    __setup("default_hugepagesz=", hugetlb_default_setup);
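memparse() turns a string such as 4M into a byte count. The following is a simplified user-space stand-in for what memparse() does, not the kernel function itself. parse_size() is a hypothetical name. It handles only the K, M and G suffixes, in either case. It parses the number with base 0 (so a 0x prefix is accepted) and shifts by 10 bits per suffix step.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for memparse(): parse a number with an optional
 * K/M/G suffix and return the value in bytes; *retptr (if non-NULL) is
 * set to the first character after the parsed text. */
unsigned long long parse_size(const char *s, char **retptr)
{
    char *end;
    unsigned long long v;

    assert(s != NULL);
    v = strtoull(s, &end, 0);
    switch (*end) {
    case 'G': case 'g':
        v <<= 10;
        /* fall through */
    case 'M': case 'm':
        v <<= 10;
        /* fall through */
    case 'K': case 'k':
        v <<= 10;
        end++;                      /* consume the suffix */
        break;
    default:
        break;                      /* no suffix: plain bytes */
    }
    if (retptr)
        *retptr = end;
    return v;
}
```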
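A huge page is implemented by treating N physically contiguous 4KB pages as one compound page, where N = 2^order (512 for a 2MB page). The number of huge pages can also be set with the hugepages kernel boot parameter, for example hugepages=1024. The kernel implementation is: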
    static int __init hugetlb_nrpages_setup(char *s)
    {
        unsigned long *mhp;
        static unsigned long *last_mhp;

        /*
         * !max_hstate means we haven't parsed a hugepagesz= parameter yet,
         * so this hugepages= parameter goes to the "default hstate".
         */
        if (!max_hstate)
            mhp = &default_hstate_max_huge_pages;
        else
            mhp = &parsed_hstate->max_huge_pages;

        if (mhp == last_mhp) {
            printk(KERN_WARNING "hugepages= specified twice without "
                "interleaving hugepagesz=, ignoring\n");
            return 1;
        }

        if (sscanf(s, "%lu", mhp) <= 0)
            *mhp = 0;

        /*
         * Global state is always initialized later in hugetlb_init.
         * But we need to allocate >= MAX_ORDER hstates here early to still
         * use the bootmem allocator.
         */
        /* With 2MB pages, parsed_hstate->order = 9 < MAX_ORDER = 11, so
         * hugetlb_hstate_alloc_pages() is not called here; those pages are
         * allocated later by hugetlb_init_hstates() in hugetlb_init().
         */
        if (max_hstate && parsed_hstate->order >= MAX_ORDER)
            hugetlb_hstate_alloc_pages(parsed_hstate);

        last_mhp = mhp;

        return 1;
    }
    __setup("hugepages=", hugetlb_nrpages_setup);
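hugetlb_nrpages_setup() follows an ordering rule:
- A hugepages= value applies to the most recent hugepagesz=, or to the default hstate if none has been seen yet.
- A second hugepages= without an intervening hugepagesz= is ignored.

This rule can be modeled in user space as follows. This is a hypothetical sketch. boot_params_model and its functions are illustrative names, huge page sizes are not modeled, and at most two hugepagesz= parameters are accepted.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model of where hugepages=N values are stored. */
struct boot_params_model {
    unsigned long default_max;      /* default_hstate_max_huge_pages */
    unsigned long parsed_max[2];    /* max_huge_pages of hstates from hugepagesz= */
    int nr_parsed;                  /* plays the role of max_hstate */
    unsigned long *last;            /* plays the role of last_mhp */
};

/* hugepagesz=: registers one more hstate (its size is not modeled). */
void model_hugepagesz(struct boot_params_model *m)
{
    assert(m->nr_parsed < 2);
    m->parsed_max[m->nr_parsed++] = 0;
}

/* hugepages=: returns 0 if the value was stored, -1 if it was ignored. */
int model_hugepages(struct boot_params_model *m, const char *s)
{
    unsigned long *mhp = m->nr_parsed ? &m->parsed_max[m->nr_parsed - 1]
                                      : &m->default_max;

    if (mhp == m->last)
        return -1;                  /* "specified twice without interleaving hugepagesz=" */
    if (sscanf(s, "%lu", mhp) <= 0)
        *mhp = 0;
    m->last = mhp;
    return 0;
}
```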

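The number of huge pages can also be changed at run time, with echo 20 > /proc/sys/vm/nr_hugepages. In that case the write is handled by the sysctl handler for nr_hugepages. The kernel implementation is: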

    int hugetlb_sysctl_handler(struct ctl_table *table, int write,
                               void __user *buffer, size_t *length, loff_t *ppos)
    {
        return hugetlb_sysctl_handler_common(false, table, write,
                                             buffer, length, ppos);
    }
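From user space, changing the pool is just a write to /proc/sys/vm/nr_hugepages. The value can be fewer than requested when alloc_fresh_huge_page() fails, because set_max_huge_pages() returns the number of persistent pages it actually reached. It is therefore worth reading the value back after writing it.

The helper below is a hypothetical sketch. Writing the real file requires root, so the path is a parameter.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write `count` to an nr_hugepages-style file and
 * read back the value that was actually applied. Returns 0 on success,
 * -1 on any error. */
int set_and_read_nr_hugepages(const char *path, unsigned long count,
                              unsigned long *actual)
{
    FILE *f;
    int ok;

    assert(path != NULL && actual != NULL);
    f = fopen(path, "w");
    if (!f)
        return -1;
    ok = fprintf(f, "%lu\n", count) > 0;
    /* stdio buffers the text; errors from the write show up at fclose() */
    if (fclose(f) != 0 || !ok)
        return -1;

    f = fopen(path, "r");
    if (!f)
        return -1;
    ok = fscanf(f, "%lu", actual) == 1;
    fclose(f);
    return ok ? 0 : -1;
}
```

For example, set_and_read_nr_hugepages("/proc/sys/vm/nr_hugepages", 20, &n) leaves in n the pool size the kernel actually settled on.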
    static int hugetlb_sysctl_handler_common(bool obey_mempolicy,
                 struct ctl_table *table, int write,
                 void __user *buffer, size_t *length, loff_t *ppos)
    {
        struct hstate *h = &default_hstate;
        unsigned long tmp;
        int ret;

        tmp = h->max_huge_pages;

        if (write && h->order >= MAX_ORDER)
            return -EINVAL;

        table->data = &tmp;
        table->maxlen = sizeof(unsigned long);
        /* Copy the value from user space into table->data (i.e. tmp), with range checks */
        ret = proc_doulongvec_minmax(table, write, buffer, length, ppos);
        if (ret)
            goto out;

        if (write) {
            NODEMASK_ALLOC(nodemask_t, nodes_allowed,
                            GFP_KERNEL | __GFP_NORETRY);
            if (!(obey_mempolicy &&
                   init_nodemask_of_mempolicy(nodes_allowed))) {
                NODEMASK_FREE(nodes_allowed);
                nodes_allowed = &node_states[N_HIGH_MEMORY];
            }
            /* Set the maximum number of pages and allocate or free the actual pages */
            h->max_huge_pages = set_max_huge_pages(h, tmp, nodes_allowed);

            if (nodes_allowed != &node_states[N_HIGH_MEMORY])
                NODEMASK_FREE(nodes_allowed);
        }
    out:
        return ret;
    }
    static unsigned long set_max_huge_pages(struct hstate *h, unsigned long count,
                            nodemask_t *nodes_allowed)
    {
        unsigned long min_count, ret;

        if (h->order >= MAX_ORDER)
            return h->max_huge_pages;

        /*
         * Increase the pool size
         * First take pages out of surplus state.  Then make up the
         * remaining difference by allocating fresh huge pages.
         *
         * We might race with alloc_buddy_huge_page() here and be unable
         * to convert a surplus huge page to a normal huge page. That is
         * not critical, though, it just means the overall size of the
         * pool might be one hugepage larger than it needs to be, but
         * within all the constraints specified by the sysctls.
         */
        spin_lock(&hugetlb_lock);
        while (h->surplus_huge_pages && count > persistent_huge_pages(h)) {
            if (!adjust_pool_surplus(h, nodes_allowed, -1))
                break;
        }

        while (count > persistent_huge_pages(h)) {
            /*
             * If this allocation races such that we no longer need the
             * page, free_huge_page will handle it by freeing the page
             * and reducing the surplus.
             */
            spin_unlock(&hugetlb_lock);
            /* Allocate a fresh huge page */
            ret = alloc_fresh_huge_page(h, nodes_allowed);
            spin_lock(&hugetlb_lock);
            if (!ret)
                goto out;

            /* Bail for signals. Probably ctrl-c from user */
            if (signal_pending(current))
                goto out;
        }

        /*
         * Decrease the pool size
         * First return free pages to the buddy allocator (being careful
         * to keep enough around to satisfy reservations).  Then place
         * pages into surplus state as needed so the pool will shrink
         * to the desired size as pages become free.
         *
         * By placing pages into the surplus state independent of the
         * overcommit value, we are allowing the surplus pool size to
         * exceed overcommit. There are few sane options here. Since
         * alloc_buddy_huge_page() is checking the global counter,
         * though, we'll note that we're not allowed to exceed surplus
         * and won't grow the pool anywhere else. Not until one of the
         * sysctls are changed, or the surplus pages go out of use.
         */
        min_count = h->resv_huge_pages + h->nr_huge_pages - h->free_huge_pages;
        min_count = max(count, min_count);
        try_to_free_low(h, min_count, nodes_allowed);
        while (min_count < persistent_huge_pages(h)) {
            if (!free_pool_huge_page(h, nodes_allowed, 0))
                break;
        }
        while (count < persistent_huge_pages(h)) {
            if (!adjust_pool_surplus(h, nodes_allowed, 1))
                break;
        }
    out:
        ret = persistent_huge_pages(h);
        spin_unlock(&hugetlb_lock);
        return ret;
    }
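The accounting in set_max_huge_pages() is easier to follow in a single-threaded model. The sketch below is hypothetical and deliberately simplified:
- one node only;
- no hugetlb_lock and no signal handling;
- no try_to_free_low();
- an alloc_budget field that stands in for how many times alloc_fresh_huge_page() would succeed.

It keeps the kernel's order of steps: convert surplus pages, allocate fresh ones, free unused pages down to min_count, then mark any remaining excess as surplus.

```c
#include <assert.h>

/* Hypothetical single-node model of the hstate counters. */
struct pool_model {
    unsigned long nr_huge_pages;      /* all huge pages, including surplus */
    unsigned long free_huge_pages;    /* pages on the free list */
    unsigned long resv_huge_pages;    /* reserved by mappings, not yet faulted */
    unsigned long surplus_huge_pages; /* pages above the persistent pool size */
    unsigned long alloc_budget;       /* assumption: fresh allocations that succeed */
};

static unsigned long persistent(const struct pool_model *p)
{
    return p->nr_huge_pages - p->surplus_huge_pages;
}

unsigned long model_set_max_huge_pages(struct pool_model *p, unsigned long count)
{
    unsigned long min_count;

    assert(p->surplus_huge_pages <= p->nr_huge_pages);
    /* Grow: first turn surplus pages into persistent ones ... */
    while (p->surplus_huge_pages && count > persistent(p))
        p->surplus_huge_pages--;
    /* ... then allocate fresh pages; stop early if allocation fails. */
    while (count > persistent(p)) {
        if (p->alloc_budget == 0)
            return persistent(p);
        p->alloc_budget--;
        p->nr_huge_pages++;
        p->free_huge_pages++;
    }

    /* Shrink: never go below what is in use or reserved. */
    min_count = p->resv_huge_pages + p->nr_huge_pages - p->free_huge_pages;
    if (min_count < count)
        min_count = count;
    /* Like free_pool_huge_page(): give free pages back to the buddy allocator. */
    while (min_count < persistent(p) && p->free_huge_pages > 0) {
        p->nr_huge_pages--;
        p->free_huge_pages--;
    }
    /* Like adjust_pool_surplus(+1): mark the remaining excess pages as surplus. */
    while (count < persistent(p) && p->surplus_huge_pages < p->nr_huge_pages)
        p->surplus_huge_pages++;
    return persistent(p);
}
```

Consider a pool of 20 pages of which 5 are in use. Setting the count to 0 frees the 15 free pages and turns the 5 busy ones into surplus pages. The kernel returns those surplus pages to the buddy allocator once their users release them.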

    static int alloc_fresh_huge_page(struct hstate *h, nodemask_t *nodes_allowed)
    {
        struct page *page;
        int start_nid;
        int next_nid;
        int ret = 0;

        start_nid = hstate_next_node_to_alloc(h, nodes_allowed);
        next_nid = start_nid;

        do {
            /* Allocate 2^h->order contiguous 4KB pages from the zonelist of
             * this memory node and return the first page; if that fails,
             * try the next node.
             */
            page = alloc_fresh_huge_page_node(h, next_nid);
            if (page) {
                ret = 1;
                break;
            }
            next_nid = hstate_next_node_to_alloc(h, nodes_allowed);
        } while (next_nid != start_nid);

        if (ret)
            count_vm_event(HTLB_BUDDY_PGALLOC);
        else
            count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);

        return ret;
    }
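alloc_fresh_huge_page() spreads allocations across nodes. It starts from the hstate's next-node cursor and tries each allowed node once. A user-space model of that round-robin loop follows. It is a hypothetical sketch: rr_model and its fields are illustrative, and free_blocks simply stands in for whether alloc_fresh_huge_page_node() would succeed on a node.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the per-hstate allocation cursor. */
struct rr_model {
    int next_nid;              /* cursor, like h->next_nid_to_alloc */
    int nr_nodes;
    const bool *allowed;       /* nodes_allowed mask; must not be empty */
    unsigned *free_blocks;     /* huge-page-sized blocks each node can still supply */
};

/* Next allowed node after `nid`, wrapping around. */
static int next_allowed(const struct rr_model *m, int nid)
{
    for (int i = 0; i < m->nr_nodes; i++) {
        nid = (nid + 1) % m->nr_nodes;
        if (m->allowed[nid])
            return nid;
    }
    assert(!"nodes_allowed is empty");
    return -1;
}

/* Like hstate_next_node_to_alloc(): return the current node, advance the cursor. */
static int next_node_to_alloc(struct rr_model *m)
{
    int nid = m->next_nid;

    if (!m->allowed[nid])
        nid = next_allowed(m, nid);
    m->next_nid = next_allowed(m, nid);
    return nid;
}

/* Returns the node that supplied a huge page, or -1 if every allowed node failed. */
int model_alloc_fresh(struct rr_model *m)
{
    int start = next_node_to_alloc(m);
    int nid = start;

    do {
        if (m->free_blocks[nid] > 0) {
            m->free_blocks[nid]--;
            return nid;
        }
        nid = next_node_to_alloc(m);
    } while (nid != start);
    return -1;
}
```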

    static struct page *alloc_fresh_huge_page_node(struct hstate *h, int nid)
    {
        struct page *page;

        if (h->order >= MAX_ORDER)
            return NULL;

        /* __GFP_COMP: allocate 2^h->order contiguous 4KB pages as a compound
         * page and return the first (head) page */
        page = alloc_pages_exact_node(nid,
            htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE|
                            __GFP_REPEAT|__GFP_NOWARN,
            huge_page_order(h));
        if (page) {
            if (arch_prepare_hugepage(page)) {
                __free_pages(page, huge_page_order(h));
                return NULL;
            }
            /* 1. prep_new_huge_page() stores free_huge_page() as the compound
             *    page destructor, kept in lru.next of the second page;
             * 2. it then calls put_page(), which ends up in free_huge_page() ->
             *    enqueue_huge_page(), adding the page to h->hugepage_freelists[nid].
             */
            prep_new_huge_page(h, page, nid);
        }

        return page;
    }
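The buddy allocator returns blocks that are naturally aligned to their size. An order-9 allocation (512 pages of 4KB, i.e. 2MB) therefore always starts at a page frame number that is a multiple of 512. The arithmetic below illustrates this under that assumption; index 1 is the "second page" mentioned in the comment above.

The helper names are hypothetical. The kernel itself locates the head page of a compound page through the page structures, not through this computation.

```c
#include <assert.h>

/* Hypothetical helper: first pfn of the naturally aligned 2^order block
 * that contains `pfn`. */
unsigned long compound_head_pfn(unsigned long pfn, unsigned int order)
{
    assert(order < sizeof(unsigned long) * 8);
    return pfn & ~((1UL << order) - 1);
}

/* Hypothetical helper: index of `pfn` within its block (0 = head page). */
unsigned long compound_index(unsigned long pfn, unsigned int order)
{
    return pfn - compound_head_pfn(pfn, order);
}
```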

2. hugetlbfs initialization

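Creating hugetlbfs mainly means setting up the relationships among the VFS-layer super_block, dentry and inode objects. This also ties hugetlbfs to the hstates[] array initialized in hugetlb_init(), and thereby to the huge pages allocated for it. The relationships are shown in the figure below (somewhat messy):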

    static int __init init_hugetlbfs_fs(void)
    {
        int error;
        struct vfsmount *vfsmount;

        /* Initialize the backing_dev_info used for hugetlbfs writeback */
        error = bdi_init(&hugetlbfs_backing_dev_info);
        if (error)
            return error;

        error = -ENOMEM;
        /* Create the slab cache hugetlbfs_inode_cachep; hugetlbfs inodes are allocated from it */
        hugetlbfs_inode_cachep = kmem_cache_create("hugetlbfs_inode_cache",
                        sizeof(struct hugetlbfs_inode_info),
                        0, 0, init_once);
        if (hugetlbfs_inode_cachep == NULL)
            goto out2;

        /* Add hugetlbfs_fs_type to the global file_systems list */
        error = register_filesystem(&hugetlbfs_fs_type);
        if (error)
            goto out;

        /* Create the hugetlbfs super_block, root dentry and inode, link them to
         * one another, and to hugetlbfs_fs_type, default_hstate and
         * hugetlbfs_inode_cachep
         */
        vfsmount = kern_mount(&hugetlbfs_fs_type);

        if (!IS_ERR(vfsmount)) {
            hugetlbfs_vfsmount = vfsmount;
            return 0;
        }

        error = PTR_ERR(vfsmount);

     out:
        kmem_cache_destroy(hugetlbfs_inode_cachep);
     out2:
        bdi_destroy(&hugetlbfs_backing_dev_info);
        return error;
    }
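register_filesystem() adds hugetlbfs_fs_type to the global file_systems list. That list is what /proc/filesystems shows: one line per type, with nodev in front of types that need no block device. The sketch below checks text in that format for a given type name. fs_registered() is a hypothetical helper; on a live system the text would come from reading /proc/filesystems.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: return 1 if `name` is listed as a file system type
 * in text formatted like /proc/filesystems ("[nodev]\t<type>" per line),
 * else 0. */
int fs_registered(const char *text, const char *name)
{
    size_t nlen = strlen(name);
    const char *line = text;

    assert(text != NULL && name != NULL);
    while (line && *line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);
        const char *tab = memchr(line, '\t', len);
        const char *type = tab ? tab + 1 : line;   /* skip the "nodev" column */
        size_t tlen = len - (size_t)(type - line);

        if (tlen == nlen && memcmp(type, name, nlen) == 0)
            return 1;
        line = end ? end + 1 : NULL;
    }
    return 0;
}
```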

Corrections and suggestions are welcome.

Reference:

http://www.ibm.com/developerworks/cn/linux/l-cn-hugetlb/

