... 3. From Markov Decision Processes to Reinforcement Learning 4. Value Function Approximation 5. Policy Search ...
An appropriate selection of basis functions directly influences the learning performance of a policy iteration method during value function approximation.
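The algorithm first performs a multiple sequence alignment with a progressive method, then applies an iterative strategy: the alignment from the previous round is used to revise the guide tree, which produces a new round of alignment.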
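During value function approximation in policy-iteration reinforcement learning methods, a sound choice of basis functions directly affects the method's performance.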
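The effect of the basis choice can be illustrated with a small sketch. Everything in it is an assumption made for illustration: a hypothetical 6-state chain (always move right, reward -1 per step, state 5 terminal, discount 0.9) and three hand-picked polynomial bases. It uses plain least-squares regression onto the known state values rather than a full policy iteration method, so it only shows how the fit quality of a linear approximation V(s) ≈ φ(s)·w depends on the basis φ.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_weights(features, targets):
    """Least-squares weights w minimising sum over s of (phi(s).w - V(s))^2."""
    k = len(features[0])
    gram = [[sum(f[i] * f[j] for f in features) for j in range(k)] for i in range(k)]
    rhs = [sum(f[i] * v for f, v in zip(features, targets)) for i in range(k)]
    return solve(gram, rhs)


def mean_squared_error(basis, states, targets):
    """Fit the linear approximator for one basis and return its mean squared error."""
    features = [basis(s) for s in states]
    w = fit_weights(features, targets)
    preds = [sum(wi * fi for wi, fi in zip(w, f)) for f in features]
    return sum((p - v) ** 2 for p, v in zip(preds, targets)) / len(states)


# Hypothetical chain: V(5) = 0 and V(s) = -1 + GAMMA * V(s + 1) for s < 5.
GAMMA = 0.9
states = list(range(6))
true_values = [0.0] * 6
for s in range(4, -1, -1):
    true_values[s] = -1.0 + GAMMA * true_values[s + 1]


def constant_basis(s):
    return [1.0]


def linear_basis(s):
    return [1.0, float(s)]


def quadratic_basis(s):
    return [1.0, float(s), float(s * s)]


mse_constant = mean_squared_error(constant_basis, states, true_values)
mse_linear = mean_squared_error(linear_basis, states, true_values)
mse_quadratic = mean_squared_error(quadratic_basis, states, true_values)
print(mse_constant, mse_linear, mse_quadratic)
```

Because the three bases are nested, the fitting error cannot increase as terms are added; on this curved value function it drops at each step, which is the sense in which the basis choice drives approximation quality.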
Under the paraxial approximation, the maximum of the matching function increases with the divergence angle. For different pump-light brightness values, the optimum value of the angle is obtained.
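Under the paraxial approximation, the maximum of the matching function is found to increase with the divergence angle of the pump light; once the effect of astigmatism is taken into account, a reference value for the pump-light divergence angle is obtained.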