Tag - Trust Region
2023
Hands on Reinforcement Learning Advanced Chapter
Announcement
A bewildered graduate student; research area: inverse reinforcement learning