#AI[SuperTopic]# [Latest AI paper highlights, October 2020 picks] Are wider nets better given the same number of parameters?
Anna Golubeva, Behnam Neyshabur, Guy Gur-Ari https://t.cn/A6bFTtmZ
Empirical studies demonstrate that the performance of neural networks improves with increasing number of parameters. In most of these studies, the number of parameters is increased by increasing the network width. This begs the question: Is the observed improvement due to the larger number of parameters, or is it due to the larger width itself? We compare different ways of increasing model width while keeping the number of parameters constant. We show that for models initialized with a random, static sparsity pattern in the weight tensors, network width is the determining factor for good performance, while the number of weights is secondary, as long as trainability is ensured. As a step towards understanding this effect, we analyze these models in the framework of Gaussian Process kernels. We find that the distance between the sparse finite-width model kernel and the infinite-width kernel at initialization is indicative of model performance.
https://t.cn/A6bc8TOL
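As a quick illustration of the setup the abstract describes (not the authors' code), here is a minimal PyTorch sketch of a linear layer with a random, static sparsity mask, used to grow width while holding the number of active weights fixed; the layer sizes and weight budget are illustrative assumptions.

```python
import torch
import torch.nn as nn

class StaticSparseLinear(nn.Module):
    """Linear layer whose weights are masked by a fixed random sparsity pattern."""

    def __init__(self, in_features, out_features, n_weights):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) / in_features**0.5)
        # Random, static mask fixed at initialization: exactly n_weights
        # entries stay trainable, the rest are permanently zero.
        flat = torch.zeros(out_features * in_features)
        flat[torch.randperm(flat.numel())[:n_weights]] = 1.0
        self.register_buffer("mask", flat.view(out_features, in_features))

    def forward(self, x):
        # Masking in the forward pass zeroes the dead weights and,
        # via the chain rule, their gradients as well.
        return x @ (self.weight * self.mask).t()

# Same weight budget, different widths: a dense narrow layer vs. a wider
# layer sparsified down to the same number of active weights.
budget = 784 * 128
narrow_dense = StaticSparseLinear(784, 128, n_weights=budget)  # fully dense
wide_sparse = StaticSparseLinear(784, 512, n_weights=budget)   # 25% dense

x = torch.randn(32, 784)
print(narrow_dense(x).shape, wide_sparse(x).shape)  # (32, 128), (32, 512)
```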
#Timo's Notes#
Intuitively, if the number of participants is too small, statistical power may well be insufficient; but I've only ever had a fuzzy notion of power, and I didn't know whether it could actually be computed.
In short, power = 1 − β, where β (the Type II error probability) is the probability of accepting the null hypothesis given that it is in fact false.
The "4" in the figure is the sample size. From this formula, when the sample size is small, power will be low; conversely, to increase power you must raise the sample size. So at the experimental design stage, you should also consider carefully whether the sample size is large enough to support the research question.
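To make the point concrete, power can indeed be computed; here is a minimal sketch using statsmodels' TTestIndPower. The effect size, alpha, and target power are illustrative assumptions, not values from the post.

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of a two-sample t-test with only 10 subjects per group
# (Cohen's d = 0.5, alpha = 0.05): a small sample gives low power.
small_n_power = analysis.solve_power(effect_size=0.5, nobs1=10, alpha=0.05)
print(f"power with n=10 per group: {small_n_power:.2f}")  # ~0.18

# Conversely, the sample size per group needed to reach power = 0.8:
needed_n = analysis.solve_power(effect_size=0.5, power=0.8, alpha=0.05)
print(f"n per group for 80% power: {needed_n:.1f}")       # ~63.8
```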
"It's so beautiful over there." "That's the foreign concession."
"That side is heaven; this side is hell."
"If all our countrymen were like this, how would the Japanese invaders dare?"
"The Eight Hundred Heroes"
Back from the late show, Grandma, asleep since a little past nine, had left the light on for the little rascal who got home after ten.
Only by growing up well
can you see the future.
The weight of history
is not something a little rascal can carry with a few tears in the cinema.
The little rascal wanna call your cellphone on the way home
but no phone number
Don't ask the bad child why
cause she is a little rascal
Go see #电影八佰0821# (The Eight Hundred, in theaters Aug 21)
I like older men like Wang Qianyuan.