Learning Theory: Error Theory

2019-11-06 06:34:39

Contributed by: a site user

Error = Bias^2 + Variance (for squared-error loss, plus an irreducible noise term when the targets themselves are noisy)

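Error reflects the accuracy of the model as a whole. Bias reflects the gap between the model's output on the samples and the true values, that is, how accurate the model itself is. Variance reflects the gap between each individual output of the model and the expected value of its output, that is, how stable the model is.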
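The split into bias and variance can be checked numerically. The sketch below is only an illustration under assumptions that are not in the original notes: the "model" is a shrunken sample mean (`shrunk_mean`), the true value is `TRUE_MEAN`, and the data are Gaussian. It re-draws the training sample many times and measures how far the average estimate is from the truth (bias) and how much the estimates scatter (variance). For squared error the bias term enters squared.

```python
import random
import statistics

def bias_variance(estimator, true_value, sample_fn, n_trials=2000, seed=0):
    """Empirically split an estimator's squared error into bias^2 and variance."""
    rng = random.Random(seed)
    # Draw a fresh training sample on every trial and record the estimate.
    estimates = [estimator(sample_fn(rng)) for _ in range(n_trials)]
    mean_estimate = statistics.fmean(estimates)
    bias_sq = (mean_estimate - true_value) ** 2
    variance = statistics.pvariance(estimates, mu=mean_estimate)
    mse = statistics.fmean([(e - true_value) ** 2 for e in estimates])
    return bias_sq, variance, mse

TRUE_MEAN = 5.0  # hypothetical ground truth chosen for the illustration

def draw_sample(rng, n=10):
    """Ten noisy observations of the true value."""
    return [rng.gauss(TRUE_MEAN, 2.0) for _ in range(n)]

def shrunk_mean(sample, shrink=0.8):
    """A deliberately biased estimator: the sample mean scaled toward zero."""
    return shrink * statistics.fmean(sample)

bias_sq, variance, mse = bias_variance(shrunk_mean, TRUE_MEAN, draw_sample)
print(f"bias^2={bias_sq:.4f}  variance={variance:.4f}  mse={mse:.4f}")
```

The printed mean squared error equals bias^2 + variance up to rounding, because the error here is measured against the noise-free true value.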

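Bias means deviation: the difference between the expected value of an estimate and the true value. It is a parameter commonly used in signal detection and estimation theory.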

Variance is the mean of the squared deviations of the data points from their arithmetic mean.
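As a small worked example of that definition, the function below computes the population variance directly. The name `population_variance` and the sample data are made up for illustration.

```python
def population_variance(data):
    """Mean of the squared deviations from the arithmetic mean."""
    mean = sum(data) / len(data)
    return sum((x - mean) ** 2 for x in data) / len(data)

values = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5
print(population_variance(values))  # 4.0
```

Python's standard library offers the same computation as `statistics.pvariance`.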

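Generalization error is the model's error under real-world conditions: its expected error over the whole population, not just its error on the training set. After a model is trained, evaluating it on a sample yields an error, call it E(in). Does this number truly reflect the model's predictive accuracy? Not necessarily. If the sample does not represent the real situation (the population) well, E(in) says little about how the model performs in practice. The error the model shows on the real situation (the population) is called the generalization error, and only this error truly reflects the model's predictive accuracy.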
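The gap between a sample error and the generalization error can be made concrete with a toy simulation. Everything in it is assumed for illustration: inputs are uniform on [0, 1), the true label flips at 0.6, and the hypothetical `model` flips at 0.5, so it is wrong on about 10% of the population. A very large random sample stands in for the population.

```python
import random

def true_label(x):
    return 1 if x > 0.6 else 0

def model(x):
    # Hypothetical trained classifier with a slightly misplaced threshold.
    return 1 if x > 0.5 else 0

def error_rate(xs):
    """Fraction of points on which the model disagrees with the truth."""
    return sum(model(x) != true_label(x) for x in xs) / len(xs)

rng = random.Random(0)
small_sample = [rng.random() for _ in range(20)]
large_sample = [rng.random() for _ in range(200_000)]
# A non-representative sample: only inputs below 0.5 were collected.
biased_sample = [0.5 * rng.random() for _ in range(1000)]

sample_error = error_rate(small_sample)
approx_generalization_error = error_rate(large_sample)
biased_error = error_rate(biased_sample)
print(sample_error, approx_generalization_error, biased_error)
```

The biased sample reports zero error even though the model is wrong on roughly one input in ten, which is the situation the paragraph above warns about.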

http://blog.csdn.net/linkin1005/article/details/42563229

Chernoff bound

Markov's inequality:

Let X be a non-negative random variable whose expectation E[X] exists. Then for any t > 0,

$$\Pr[X \ge t] \le \frac{E[X]}{t}$$
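Markov's inequality is easy to check on data. In the sketch below the exponential distribution and the helper names are arbitrary choices for illustration. Because the bound is computed from the sample mean, it holds exactly for the empirical distribution of any non-negative sample.

```python
import random

def markov_bound(mean, t):
    """Upper bound E[X] / t on Pr[X >= t] for a non-negative X."""
    if t <= 0:
        raise ValueError("t must be positive")
    return mean / t

def empirical_tail(samples, t):
    """Fraction of samples that are at least t."""
    return sum(x >= t for x in samples) / len(samples)

rng = random.Random(0)
samples = [rng.expovariate(1.0) for _ in range(10_000)]  # non-negative draws
mean = sum(samples) / len(samples)

for t in (1.0, 2.0, 5.0):
    print(f"t={t}: tail={empirical_tail(samples, t):.4f}  bound={markov_bound(mean, t):.4f}")
```

The bound is loose for large t, which is why sharper tools such as the Chernoff bound are used when more is known about X.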

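Chernoff bound:

Let X_1, X_2, ..., X_n be independent Poisson trials with $\Pr[X_i = 1] = p_i$, let $X = \sum_{i=1}^{n} X_i$ and $\mu = E[X]$. Then for any $0 < \delta < 1$:

Lower tail: $$\Pr[X < (1-\delta)\mu] \le \left(\frac{e^{-\delta}}{(1-\delta)^{(1-\delta)}}\right)^{\mu} \le e^{-\mu\delta^{2}/2}$$

Upper tail: $$\Pr[X > (1+\delta)\mu] \le \left(\frac{e^{\delta}}{(1+\delta)^{(1+\delta)}}\right)^{\mu}$$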
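The two tail bounds can be compared with exact probabilities in a simple special case. The sketch below assumes all trials share the same success probability p, so X is binomial; that is one instance of independent Poisson trials, chosen only because its tails can be summed exactly. Here `mu` stands for E[X] and `delta` for the relative deviation, and the function names are illustrative.

```python
import math

def binomial_pmf(n, p, k):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def upper_tail(n, p, threshold):
    """Exact Pr[X > threshold] for X ~ Binomial(n, p)."""
    return sum(binomial_pmf(n, p, k) for k in range(n + 1) if k > threshold)

def lower_tail(n, p, threshold):
    """Exact Pr[X < threshold] for X ~ Binomial(n, p)."""
    return sum(binomial_pmf(n, p, k) for k in range(n + 1) if k < threshold)

def chernoff_upper(mu, delta):
    return (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu

def chernoff_lower(mu, delta):
    return (math.exp(-delta) / (1 - delta) ** (1 - delta)) ** mu

n, p, delta = 100, 0.5, 0.2
mu = n * p
print("upper:", upper_tail(n, p, (1 + delta) * mu), "<=", chernoff_upper(mu, delta))
print("lower:", lower_tail(n, p, (1 - delta) * mu), "<=", chernoff_lower(mu, delta),
      "<=", math.exp(-mu * delta**2 / 2))
```

The bounds are far from tight for these parameters. Their value is that they decay exponentially in mu without requiring the exact distribution.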

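Let X_1, X_2, ..., X_n be independent discrete random variables with $E[X_i] = 0$ and $|X_i| \le 1$ for $i = 1, 2, \ldots, n$; let $X = \sum_{i=1}^{n} X_i$ and $\mathrm{Var}[X] = \sigma^{2}$. Then for $0 \le t \le 2\sigma$:

$$\Pr[|X| \ge t\sigma] \le 2e^{-t^{2}/4}$$

$$\Pr[X \ge t\sigma] \le e^{-t^{2}/4}$$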
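The bound for zero-mean bounded variables can be checked exactly on sums of fair ±1 coin flips. The sketch reads the statement as Pr[X ≥ tσ] ≤ e^{-t²/4} and Pr[|X| ≥ tσ] ≤ 2e^{-t²/4} for 0 ≤ t ≤ 2σ, with σ the standard deviation of X. That reading is an interpretation of the notes, and the coin-flip variables are an assumption made for the example.

```python
import math

def rademacher_sum_tail(n, threshold):
    """Exact Pr[X >= threshold], where X is a sum of n independent fair +/-1 flips."""
    # X = 2*B - n with B ~ Binomial(n, 1/2); count the outcomes that reach the threshold.
    favourable = sum(math.comb(n, b) for b in range(n + 1) if 2 * b - n >= threshold)
    return favourable / 2**n

def check(n, t):
    """Return (one-sided tail, one-sided bound, two-sided tail, two-sided bound)."""
    sigma = math.sqrt(n)  # Var[X] = n because every term has variance 1
    one_sided = rademacher_sum_tail(n, t * sigma)
    two_sided = 2 * one_sided  # X is symmetric about 0 and t > 0
    bound = math.exp(-t**2 / 4)
    return one_sided, bound, two_sided, 2 * bound

for t in (0.5, 1.0, 2.0, 3.0):
    print(t, check(100, t))
```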

Training error is the average loss over the training samples.
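In code, that average looks like the sketch below. The loss functions, the toy data, and the identity `predict` model are hypothetical choices made only to have something to evaluate.

```python
def training_error(model, samples, loss):
    """Average loss of `model` over (input, target) training pairs."""
    return sum(loss(model(x), y) for x, y in samples) / len(samples)

def squared_loss(prediction, target):
    return (prediction - target) ** 2

def zero_one_loss(prediction, target):
    return 0 if prediction == target else 1

train = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8)]

def predict(x):
    # Hypothetical fitted model: predicts the input unchanged.
    return x

print(training_error(predict, train, squared_loss))
```

Passing the loss as a parameter keeps the same averaging code usable for regression (`squared_loss`) and classification (`zero_one_loss`).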

The risk function measures how good a model's predictions are on average, that is, its expected loss.
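Written out, the risk and its sample counterpart take the following standard form. The symbols are assumptions, not notation from the notes: L is the loss function, P the joint distribution of inputs and targets, f the model, and N the number of training samples.

```latex
R(f) = \mathbb{E}_{(X,Y)\sim P}\big[\, L(Y, f(X)) \,\big],
\qquad
R_{\mathrm{emp}}(f) = \frac{1}{N} \sum_{i=1}^{N} L\big(y_i, f(x_i)\big)
```

The empirical risk on the right is the training error. The risk on the left is the quantity that generalization error refers to.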

Probably approximately correct (PAC): computability theory studies when a problem can be computed, while PAC learning theory, or computational learning theory, mainly studies when a problem can be learned. Computability already has a definition in the theory of computation, and learnability is what we are about to define. In addition, a large part of the theory of computation is devoted to the complexity of problems that are computable. By analogy, computational learning theory also studies the complexity of learnable problems, chiefly their sample complexity. Finally, when a problem is computable, finding a concrete algorithm that carries out the computation is an important part of the theory of computation. Learning theory (or, more often, work under the heading of "machine learning") likewise studies concrete learning algorithms for learnable problems.

In computational learning theory, probably approximately correct learning (PAC learning) is a framework for mathematical analysis of machine learning. It was proposed in 1984 by Leslie Valiant.[1] In this framework, the learner receives samples and must select a generalization function (called the hypothesis) from a certain class of possible functions. The goal is that, with high probability (the "probably" part), the selected function will have low generalization error (the "approximately correct" part). The learner must be able to learn the concept given any arbitrary approximation ratio, probability of success, or distribution of the samples. The model was later extended to treat noise (misclassified samples). An important innovation of the PAC framework is the introduction of computational complexity theory concepts to machine learning. In particular, the learner is expected to find efficient functions (time and space requirements bounded to a polynomial of the example size), and the learner itself must implement an efficient procedure (requiring an example count bounded to a polynomial of the concept size, modified by the approximation and likelihood bounds).
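Sample complexity has a simple closed form in one textbook case, which the text above does not state. For a finite hypothesis class H that contains the target concept, a learner that returns any hypothesis consistent with m ≥ (ln|H| + ln(1/δ)) / ε examples has error at most ε with probability at least 1 − δ. The sketch below evaluates that bound, and the function name and parameters are illustrative.

```python
import math

def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Examples sufficient for a consistent learner over a finite class (realizable case)."""
    if hypothesis_count < 1:
        raise ValueError("hypothesis_count must be at least 1")
    if not (0 < epsilon < 1 and 0 < delta < 1):
        raise ValueError("epsilon and delta must lie strictly between 0 and 1")
    return math.ceil((math.log(hypothesis_count) + math.log(1 / delta)) / epsilon)

# 1024 hypotheses, error at most 10%, failure probability at most 5%.
print(pac_sample_bound(1024, epsilon=0.1, delta=0.05))  # 100
```

The count grows only logarithmically in |H| and 1/δ but linearly in 1/ε, which is the kind of polynomial dependence the PAC definition asks for.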

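The VC dimension (Vapnik-Chervonenkis dimension) was introduced to study the rate of uniform convergence of the learning process and its generalization ability. It is an important measure, defined in statistical learning theory, of the learning capacity of a set of functions.

The traditional definition is as follows. For a set of indicator functions, if there exist h samples that the functions in the set can separate in all 2^h possible ways, the set is said to shatter those h samples. The VC dimension of the set is the largest number h of samples it can shatter. If, for any number of samples, there are functions in the set that shatter them, the VC dimension is infinite. The VC dimension of bounded real-valued functions can be defined by converting them into indicator functions with a threshold. The VC dimension reflects the learning capacity of the function set: the larger it is, the more complex the learning machine (the greater its capacity). Unfortunately, there is as yet no general theory for computing the VC dimension of an arbitrary function set, and it is known only for some special function sets. For example, linear classifiers and linear real-valued functions in N-dimensional space have VC dimension N+1.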
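Shattering can be verified by brute force in the smallest case, N = 1, where a linear classifier is sign(w·x + b) on the real line and the VC dimension should be N + 1 = 2. The sketch below is a minimal illustration that assumes distinct input points; the function names are made up. It enumerates every labeling such a classifier can produce and compares the count with 2^h.

```python
def linear_labelings_1d(points):
    """All labelings of distinct 1-D points realisable by sign(w*x + b)."""
    xs = sorted(set(points))
    # One candidate cut below all points, one between each pair of neighbours, one above all.
    cuts = [xs[0] - 1.0] + [(a + b) / 2 for a, b in zip(xs, xs[1:])] + [xs[-1] + 1.0]
    labelings = set()
    for cut in cuts:
        for direction in (1, -1):  # positive side to the right (w > 0) or left (w < 0)
            labelings.add(tuple(1 if direction * (x - cut) > 0 else 0 for x in points))
    return labelings

def is_shattered(points):
    """True if every one of the 2^h labelings is realisable."""
    return len(linear_labelings_1d(points)) == 2 ** len(points)

print(is_shattered([0.0, 1.0]))       # two points: all 4 labelings are reachable
print(is_shattered([0.0, 1.0, 2.0]))  # three points: only 6 of the 8 labelings
```

Only the ordering of the points matters on a line, so the failure on one triple carries over to every triple of distinct points. Two points can be shattered and three cannot, which gives VC dimension 2.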

http://blog.pluskid.org/?p=821

http://www.jianshu.com/p/695a2dac26b6