No. | Paper | Keywords | Summary |
---|---|---|---|
61 | Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space | General Utilities, PG | Presents a gradient method for general utilities, i.e., objectives that are nonlinear functions of the state-action pair distribution |
62 | Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning | off-policy, Importance sampling | Presents RBIS, an importance-sampling algorithm for off-policy learning that expresses the trace as a function of the timestep t rather than as a product of IS ratios |
63 | Semi-Offline Reinforcement Learning for Optimized Text Generation | semi-offline, LLM | Presents a semi-offline RL method for training language models that reuses the model's training data and requires only a single model inference step |
64 | StriderNet: A Graph Reinforcement Learning Approach to Optimize Atomic Structures on Rough Energy Landscapes | GNN, Atomic structures, RL application | Uses reinforcement learning to optimize atomic structures, with a GNN extracting features of the atomic structure |
65 | Reinforcement Learning Can Be More Efficient with Multiple Rewards | - | |
66 | LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning via an Option Framework | - | |
67 | Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement Learning | Exploration | Uses a network to estimate, for each step of a trajectory, whether the state has been visited, and derives a corresponding exploration bonus from the estimate |
68 | Interactive Object Placement with Reinforcement Learning | - | |
69 | Oracles and Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning | Stackelberg Equilibria, game theory, RL application | Presents a framework for reaching Stackelberg equilibria among multiple agents |
70 | Non-stationary Reinforcement Learning under General Function Approximation | Non-stationary | Presents SW-OPEA, an RL algorithm for non-stationary environments that filters the policy function class with a sliding-window, confidence-based condition |
71 | Multi-task Hierarchical Adversarial Inverse Reinforcement Learning | IL, IRL, Multi-task | Presents MH-AIRL, a hierarchical imitation-learning algorithm that improves on AIRL and extends it to the multi-task setting |
72 | Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning | Multi-Agent | Proposes a PPO algorithm for multi-agent tasks that updates the agents in a fixed order, conditioning each update on the actions of the agents updated before it |
73 | Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning | Multi-Agent, LLM | Presents EnDi, a multi-agent RL framework that partitions the entities each agent interacts with, avoiding conflicting sub-goals and improving generalization |
74 | Parallel Q-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation | Parallel | Presents PQL, a parallel RL algorithm that extends DDQN and updates the Q-function and policy in parallel |
75 | Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning | - | |
76 | Language Instructed Reinforcement Learning for Human-AI Coordination | LLM | Presents instructRL, an RL framework that uses human instructions to adjust the Q-function, modifying Q-learning and PPO to improve human-AI coordination |
77 | Representation-Driven Reinforcement Learning | Exploration | Presents ReRL, an RL framework that represents policy parameters as values used for exploration, turning the exploration problem into a representation problem |
78 | Efficient Online Reinforcement Learning with Offline Data | offline | Presents RLPD, a new framework that exploits offline data by mixing it with online data, adding LayerNorm, and other techniques |
79 | Reinforcement Learning with History Dependent Dynamic Contexts | Non-stationary | Presents DCMDP, a dynamic Markov decision process that uses a feature mapping to obtain a history vector; LDC-UCB, a maximum-likelihood method for solving the feature mapping; and DCZero, a model-based method |
80 | Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation | Exploration, adversarial cost | Presents PO-LSBE, a least-squares-based RL algorithm that encourages exploration in environments with adversarial (time-varying) losses |
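Row 62 contrasts trajectory-aware traces with the usual product of importance-sampling ratios. As a point of reference, here is a minimal sketch of the classical cumulative IS-ratio trace that RBIS moves away from (RBIS itself replaces this product with a time-indexed trace, whose exact form is not reproduced here):

```python
import numpy as np

def is_ratio_trace(pi_probs, mu_probs):
    # Per-step ratios pi(a_t|s_t) / mu(a_t|s_t), accumulated as a
    # running product -- the classical off-policy trace whose variance
    # grows with trajectory length, motivating trajectory-aware traces.
    ratios = np.asarray(pi_probs, dtype=float) / np.asarray(mu_probs, dtype=float)
    return np.cumprod(ratios)

# Toy trajectory: target vs. behaviour action probabilities per step.
print(is_ratio_trace([0.9, 0.8, 0.5], [0.5, 0.5, 0.5]))  # -> 1.8, 2.88, 2.88
```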
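Row 67 derives an exploration bonus from estimated pseudocounts. The paper estimates visit counts with a network rather than a table; the tabular sketch below (with a hypothetical `beta` scale) only illustrates the bonus shape such a pseudocount typically feeds into:

```python
from collections import defaultdict
import math

class CountBonus:
    # Tabular count-based exploration bonus beta / sqrt(N(s)); the
    # coin-flipping approach would replace the explicit table with a
    # network-based pseudocount estimate.
    def __init__(self, beta=1.0):
        self.beta = beta
        self.counts = defaultdict(int)

    def bonus(self, state):
        self.counts[state] += 1
        return self.beta / math.sqrt(self.counts[state])

b = CountBonus(beta=0.5)
# The bonus decays as the same state is visited repeatedly.
print([round(b.bonus("s0"), 3) for _ in range(4)])  # -> [0.5, 0.354, 0.289, 0.25]
```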
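Row 72 describes updating agents in a fixed order, with each update conditioned on the actions of previously updated agents. A minimal sketch of that sequential-update loop, with an illustrative toy `update_one` callback standing in for the actual PPO update:

```python
def sequential_update(agents, update_one):
    # Update agents one at a time; each agent's update sees the actions
    # already produced by the agents updated before it.
    preceding = []
    for agent in agents:
        preceding.append(update_one(agent, tuple(preceding)))
    return preceding

# Toy "update": each agent records how many predecessors it conditioned on.
actions = sequential_update(["a", "b", "c"], lambda ag, prev: (ag, len(prev)))
print(actions)  # -> [('a', 0), ('b', 1), ('c', 2)]
```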
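Row 78 mixes offline and online data during online training. One common way to do this is to draw each training batch half from the offline dataset and half from the online replay buffer; the sketch below shows that 50/50 mixing idea only and is not the actual RLPD implementation:

```python
import random

def mixed_batch(offline_data, online_buffer, batch_size, rng=random):
    # Half of every batch comes from offline data, half from the online
    # buffer, so early training can lean on offline experience without
    # a separate pretraining phase.
    half = batch_size // 2
    return rng.sample(offline_data, half) + rng.sample(online_buffer, batch_size - half)

batch = mixed_batch(list(range(100)), list(range(100, 200)), 8)
print(len(batch))  # -> 8
```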
ICML Reinforcement Learning Paper Classification
Reposted from blog.csdn.net/HGGshiwo/article/details/131149707