Tikhonov regularization (吉洪诺夫正则化)
Published: 2019-06-25


        This topic is important, but I don't understand it yet.

        Question 1: Why do we need regularization?

In mathematics, statistics, and computer science, particularly in the fields of machine learning and inverse problems, regularization is a process of introducing additional information in order to solve an ill-posed problem or to prevent overfitting.

And what is an ill-posed problem? A problem is well-posed (in Hadamard's sense) if a solution exists, is unique, and depends continuously on the data; a problem that violates any of these conditions is ill-posed.

And, what is overfitting? In statistics, overfitting is "the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably", as the next figure shows. 

Figure 1.  The green curve represents an overfitted model and the black line represents a regularized model. While the green line best follows the training data, it is too dependent on that data and it is likely to have a higher error rate on new unseen data, compared to the black line.
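The effect in Figure 1 can be sketched in a few lines of numpy. This is only an illustration, not the figure's actual data: the degree-9 polynomial, the noise level, and the penalty weight `alpha` are arbitrary choices. The unregularized fit interpolates the noisy points and needs very large coefficients to do so; the L2-regularized fit trades a little data fidelity for a much smaller coefficient vector.

```python
import numpy as np

rng = np.random.default_rng(0)

# A few noisy samples of a smooth underlying function
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.size)

# Design matrix for a degree-9 polynomial: columns are x^0 .. x^9
A = np.vander(x, 10, increasing=True)

# Unregularized least squares (the "green curve": follows the data too closely)
w_ols, *_ = np.linalg.lstsq(A, y, rcond=None)

# L2-regularized least squares (the "black line"): penalize large coefficients
alpha = 1e-3
w_ridge = np.linalg.solve(A.T @ A + alpha * np.eye(10), A.T @ y)

# Regularization shrinks the coefficient vector
print(np.linalg.norm(w_ridge) < np.linalg.norm(w_ols))  # True
```

With 10 points and 10 coefficients the unregularized system interpolates exactly, which is precisely the overfitting scenario the figure describes.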

        Question 2: What are the commonly used regularization methods?

        Question 3: The advantages of Tikhonov regularization

        Question 4: What is Tikhonov regularization?

        Tikhonov regularization, named for Andrey Tikhonov, is the most commonly used method of regularization of ill-posed problems. In statistics, the method is known as ridge regression, in machine learning it is known as weight decay, and with multiple independent discoveries, it is also variously known as the Tikhonov–Miller method, the Phillips–Twomey method, the constrained linear inversion method, and the method of linear regularization. It is related to the Levenberg–Marquardt algorithm for non-linear least-squares problems.

        Suppose that for a known matrix A and vector b, we wish to find a vector x such that:

        $A\mathbf{x} = \mathbf{b}$

        The standard approach is linear regression. However, if no x satisfies the equation or more than one x does (that is, the solution is not unique), the problem is said to be ill-posed. In such cases, ordinary least squares estimation leads to an overdetermined (over-fitted), or more often an underdetermined (under-fitted), system of equations. Most real-world phenomena have the effect of low-pass filters in the forward direction where A maps x to b. Therefore, in solving the inverse problem, the inverse mapping operates as a high-pass filter that has the undesirable tendency of amplifying noise (eigenvalues/singular values are largest in the reverse mapping where they were smallest in the forward mapping). In addition, ordinary least squares implicitly nullifies every element of the reconstructed version of x that is in the null-space of A, rather than allowing for a model to be used as a prior for $\mathbf{x}$. Ordinary least squares seeks to minimize the sum of squared residuals, which can be compactly written as:

        $\|A\mathbf{x} - \mathbf{b}\|_{2}^{2}$, where $\|\cdot\|_{2}$ is the Euclidean norm.

                                             

         In order to give preference to a particular solution with desirable properties, a regularization term can be included in this minimization:

        $\|A\mathbf{x} - \mathbf{b}\|_{2}^{2} + \|\Gamma \mathbf{x}\|_{2}^{2}$ for some suitably chosen Tikhonov matrix $\Gamma$. In many cases, this matrix is chosen as a multiple of the identity matrix ($\Gamma = \alpha I$), giving preference to solutions with smaller norms; this is known as L2 regularization. In other cases, high-pass operators (e.g., a difference operator or a weighted Fourier operator) may be used to enforce smoothness if the underlying vector is believed to be mostly continuous. This regularization improves the conditioning of the problem, thus enabling a direct numerical solution. An explicit solution, denoted by $\hat{x}$, is given by:

$\hat{x} = (A^{\top}A + \Gamma^{\top}\Gamma)^{-1} A^{\top}\mathbf{b}$; a derivation can be found at https://blog.csdn.net/nomadlx53/article/details/50849941.
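The closed-form expression above translates directly into numpy. This is a minimal sketch: the matrix sizes are arbitrary, Γ = αI is just one common choice, and in practice one solves the linear system with `np.linalg.solve` rather than forming an explicit inverse.

```python
import numpy as np

def tikhonov_solve(A, b, Gamma):
    """Minimize ||Ax - b||^2 + ||Gamma x||^2 via the normal equations:
    x_hat = (A^T A + Gamma^T Gamma)^{-1} A^T b,
    solved as a linear system instead of an explicit inverse."""
    return np.linalg.solve(A.T @ A + Gamma.T @ Gamma, A.T @ b)

rng = np.random.default_rng(1)
A = rng.standard_normal((20, 5))   # overdetermined: more equations than unknowns
b = rng.standard_normal(20)

# Gamma = alpha * I gives standard L2 (ridge) regularization
x_ridge = tikhonov_solve(A, b, 0.1 * np.eye(5))

# With Gamma = 0 this reduces to the ordinary least-squares solution
x_ols = tikhonov_solve(A, b, np.zeros((5, 5)))
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)
print(np.allclose(x_ols, x_lstsq))  # True
```

The Γ = 0 case recovering the unregularized least-squares solution matches the statement below about the scale of Γ.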

The effect of regularization may be varied via the scale of the matrix $\Gamma$. For $\Gamma = 0$ this reduces to the unregularized least-squares solution, provided that $(A^{\top}A)^{-1}$ exists.

L2 regularization is used in many contexts aside from linear regression, such as classification with logistic regression or support vector machines, and matrix factorization.

     

For y = Xw, if the system has no solution or has multiple solutions, the problem is called ill-posed. In the ill-posed case, solving by ordinary least squares leads to overfitting or underfitting, and regularization is used to address this.

Let X be an m × n matrix:

  • Overfitted model: m ≪ n; the system is underdetermined, so multiple solutions are likely;
  • Underfitted model: m ≫ n; the system is overdetermined and may have no solution, or a solution with low accuracy.
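The two cases above can be sketched with numpy. Here is only the underdetermined case (m ≪ n), with arbitrary small sizes and a tiny illustrative penalty: the system has infinitely many exact solutions, and as the penalty shrinks, the ridge solution approaches the minimum-norm solution picked by the Moore-Penrose pseudoinverse.

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 8                       # far fewer equations than unknowns
X = rng.standard_normal((m, n))
y = rng.standard_normal(m)

# Minimum-norm exact solution via the Moore-Penrose pseudoinverse
w_min_norm = np.linalg.pinv(X) @ y

# Ridge solution with a tiny penalty: well-defined even though X^T X is singular
alpha = 1e-8
w_ridge = np.linalg.solve(X.T @ X + alpha * np.eye(n), X.T @ y)

print(np.allclose(w_ridge, w_min_norm, atol=1e-5))  # True
```

Without the `alpha * np.eye(n)` term, `np.linalg.solve` would fail here, since the rank-3 matrix $X^{\top}X$ is not invertible; this is exactly the conditioning improvement regularization provides.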

 

 

REF:

https://blog.csdn.net/darknightt/article/details/70179848

Reposted from: https://www.cnblogs.com/shyang09/p/9120007.html
