List和Dictionary泛型類查找效率淺析

2019-11-17 02:42:58

字體：大中小

供稿：網(wǎng)友

List和Dictionary泛型類查找效率淺析

List和Dictionary泛型類查找效率存在巨大差異，前段時(shí)間親歷了一次。事情的背景是開發(fā)一個(gè)匹配程序，將書籍（BookID）推薦給網(wǎng)友（UserID），生成今日推薦數(shù)據(jù)時(shí)，有條規(guī)則是同一書籍七日內(nèi)不能推薦給同一網(wǎng)友。

同一書籍七日內(nèi)不能推薦給同一網(wǎng)友規(guī)則的實(shí)現(xiàn)是程序不斷優(yōu)化的過程，第一版程序是直接取數(shù)據(jù)庫(kù)，根據(jù)BookID+UserID查詢七日內(nèi)有無記錄，有的話不進(jìn)行分配。但隨著數(shù)據(jù)量的增大，程序運(yùn)行時(shí)間越來越長(zhǎng)，于是開始優(yōu)化。第一次優(yōu)化是把所有七日內(nèi)的數(shù)據(jù)取出來，放到List<T>中，然后再內(nèi)存中進(jìn)行查找，發(fā)現(xiàn)這樣效率只是稍有提高，但不明顯。第二次優(yōu)化采用了Dictionary<TKey, TValue>，意外的發(fā)現(xiàn)效果不是一般的好，程序效率提高了幾倍。

下面是偽代碼，簡(jiǎn)化了程序代碼，只是為說明List和Dictionary效率的差別，并不具備實(shí)際意義。

    /// <summary>    /// 集合類效率測(cè)試    /// </summary>    public class SetEfficiencyTest    {        static List<TestModel> todayList = InitTodayData();        static List<TestModel> historyList = InitHisoryData();        public static void Run()        {            CodeTimer.Time("ListTest", 1, ListTest);            CodeTimer.Time("DictionaryTest", 1, DictionaryTest);        }        public static void ListTest()        {            List<TestModel> resultList = todayList.FindAll(re =>             {                 if (historyList.Exists(m => m.UserID == re.UserID && m.BookID == re.BookID))                 {                     return false;                 }                 return true;             });        }        public static void DictionaryTest()        {            Dictionary<int, List<string>> bDic = new Dictionary<int, List<string>>();            foreach (TestModel obj in historyList)            {                if (!bDic.ContainsKey(obj.UserID))                {                    bDic.Add(obj.UserID, new List<string>());                }                bDic[obj.UserID].Add(obj.BookID);            }            List<TestModel> resultList = todayList.FindAll(re =>            {                if (bDic.ContainsKey(re.UserID) && bDic[re.UserID].Contains(re.BookID))                {                    return false;                }                return true;            });        }        /// <summary>        /// 初始化數(shù)據(jù)（今日）        /// </summary>        /// <returns></returns>        public static List<TestModel> InitTodayData()        {            List<TestModel> list = new List<TestModel>();            for (int i = 0; i < 10000; i++)            {                list.Add(new TestModel() { UserID = i, BookID = i.ToString() });            }            return list;        }        /// <summary>        /// 初始化數(shù)據(jù)（歷史）        /// </summary>        /// <returns></returns>        public static List<TestModel> InitHisoryData()        {            List<TestModel> list = new List<TestModel>();            Random r = new Random();            int loopTimes = 60000;            for (int i = 0; i < loopTimes; i++)            {                list.Add(new TestModel() { UserID = r.Next(0, loopTimes), BookID = i.ToString() });            }            return list;        }        /// <summary>        /// 測(cè)試實(shí)體        /// </summary>        public class TestModel        {            /// <summary>            /// 用戶ID            /// </summary>            public int UserID { get; set; }            /// <summary>            /// 書ID            /// </summary>            public string BookID { get; set; }        }    }

View Code

輸出如下：

真是想不到，兩者效率相差這么多。接下來研究下兩者差異巨大的原因。

List<T>.Exists()函數(shù)的實(shí)現(xiàn)：

        public bool Exists(PRedicate<T> match)        {            return this.FindIndex(match) != -1;        }          public int FindIndex(Predicate<T> match)        {            return this.FindIndex(0, this._size, match);        }        public int FindIndex(int startIndex, int count, Predicate<T> match)        {            if (startIndex > this._size)            {                ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.startIndex, ExceptionResource.ArgumentOutOfRange_Index);            }            if (count < 0 || startIndex > this._size - count)            {                ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.count, ExceptionResource.ArgumentOutOfRange_Count);            }            if (match == null)            {                ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);            }            int num = startIndex + count;            for (int i = startIndex; i < num; i++)            {                if (match(this._items[i]))                {                    return i;                }            }            return -1;        }

View Code

List<T>.Exists 本質(zhì)是通過循環(huán)查找出該條數(shù)據(jù)，每一次的調(diào)用都會(huì)重頭循環(huán)，所以效率很低。顯然，這是不可取的。

Dictionary<TKey, TValue>.ContainsKey()函數(shù)的實(shí)現(xiàn)：

        public bool ContainsKey(TKey key)        {            return this.FindEntry(key) >= 0;        }        // System.Collections.Generic.Dictionary<TKey, TValue>        private int FindEntry(TKey key)        {            if (key == null)            {                ThrowHelper.ThrowArgumentNullException(ExceptionArgument.key);            }            if (this.buckets != null)            {                int num = this.comparer.GetHashCode(key) & 2147483647;                for (int i = this.buckets[num % this.buckets.Length]; i >= 0; i = this.entries[i].next)                {                    if (this.entries[i].hashCode == num && this.comparer.Equals(this.entries[i].key, key))                    {                        return i;                    }                }            }            return -1;        }

View Code

Dictionary<TKey, TValue>.ContainsKey()內(nèi)部是通過Hash查找實(shí)現(xiàn)的，所以效率比List高出很多。

最后，給出MSDN上的建議：

1.如果需要非常快地添加、刪除和查找項(xiàng)目，而且不關(guān)心集合中項(xiàng)目的順序，那么首先應(yīng)該考慮使用 System.Collections.Generic.Dictionary<TKey, TValue>（或者您正在使用 .NET Framework 1.x，可以考慮 Hashtable）。三個(gè)基本操作（添加、刪除和包含）都可快速操作，即使集合包含上百萬(wàn)的項(xiàng)目。

2.如果您的使用模式很少需要?jiǎng)h除和大量添加，而重要的是保持集合的順序，那么您仍然可以選擇 List<T>。雖然查找速度可能比較慢（因?yàn)樵谒阉髂繕?biāo)項(xiàng)目時(shí)需要遍歷基礎(chǔ)數(shù)組），但可以保證集合會(huì)保持特定的順序。

3.您可以選擇 Queue<T> 實(shí)現(xiàn)先進(jìn)先出 (FIFO) 順序或 Stack<T> 實(shí)現(xiàn)后進(jìn)先出 (LIFO) 順序。雖然 Queue<T> 和 Stack<T> 都支持枚舉集合中的所有項(xiàng)目，但前者只支持在末尾插入和從開頭刪除，而后者只支持從開頭插入和刪除。

4.如果需要在實(shí)現(xiàn)快速插入的同時(shí)保持順序，那么使用新的 LinkedList<T> 集合可幫助您提高性能。與 List<T> 不同，LinkedList<T> 是作為動(dòng)態(tài)分配的對(duì)象鏈實(shí)現(xiàn)。與 List<T> 相比，在集合中間插入對(duì)象只需要更新兩個(gè)連接和添加新項(xiàng)目。從性能的角度來看，鏈接列表的缺點(diǎn)是垃圾收集器會(huì)增加其活動(dòng)，因?yàn)樗仨毐闅v整個(gè)列表以確保沒有對(duì)象沒有被釋放。另外，由于每個(gè)節(jié)點(diǎn)相關(guān)的開銷以及每個(gè)節(jié)點(diǎn)在內(nèi)存中的位置等原因，大的鏈接列表可能會(huì)出現(xiàn)性能問題。雖然將項(xiàng)目插入到 LinkedList<T> 的實(shí)際操作比在 List<T> 中插入要快得多，但是找到要插入新值的特定位置仍需遍歷列表并找到正確的位置。

參考資料：CLR 完全介紹: 最佳實(shí)踐集合，List和hashtable之查找效率

上一篇：[C#]火星坐標(biāo)系 (GCJ-02)與百度坐標(biāo)系 (BD-09) 轉(zhuǎn)換幫助類

下一篇：一個(gè)執(zhí)行Dos命令的窗口程序，與各位分享。