Introduction

Bimonthly, founded in 1957
Administrator: Shanxi Provincial Education Department
Sponsor: Taiyuan University of Technology
Publisher: Editorial Office of Journal of TYUT
Editor-in-Chief: SUN Hongbin
ISSN: 1007-9432
CN: 14-1220/N
Deep Reinforcement Learning with Phasic Policy Gradient with Sample Reuse

DOI: 10.16355/j.tyut.1007-9432.20230300
Received: 2023-04-22
Accepted: 2023-05-25
Corresponding author: WANG Li, College of Data Science, Taiyuan University of Technology
Abstract:
【Purposes】 The phasic policy gradient with sample reuse (SR-PPG) algorithm is proposed to address the problems of non-reuse of samples and low sample utilization in policy-based deep reinforcement learning algorithms. 【Methods】 The proposed algorithm introduces offline data on the basis of the phasic policy gradient (PPG), reducing the time cost of training and enabling the model to converge quickly. SR-PPG combines the stability advantages of theoretically supported on-policy algorithms with the sample efficiency of off-policy algorithms, develops policy improvement guarantees applicable to the off-policy setting, and links these bounds to the clipping mechanism used by PPG. 【Findings】 Theoretical analysis and experiments show that the algorithm achieves better performance by effectively balancing the competing goals of stability and sample efficiency.
Keywords: deep reinforcement learning; phasic policy gradient; sample reuse
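The clipping mechanism the abstract refers to is the PPO-style clipped surrogate objective that PPG inherits. As a minimal illustrative sketch (not the paper's exact SR-PPG objective, which is not reproduced here), the function below computes the clipped surrogate from importance-sampling ratios and advantage estimates; when old samples are reused, the behavior policy is an older policy, so the ratios drift further from 1 and the clip bound matters more:

```python
import numpy as np

def clipped_surrogate(ratios, advantages, eps=0.2):
    """PPO/PPG-style clipped surrogate objective (illustrative sketch).

    ratios:     pi_new(a|s) / pi_behavior(a|s) importance weights; for
                reused off-policy samples, pi_behavior is an older policy.
    advantages: advantage estimates for the sampled actions.
    eps:        clip range; bounds how far a single update can push the
                policy away from the behavior policy.
    """
    ratios = np.asarray(ratios, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - eps, 1.0 + eps) * advantages
    # Pessimistic minimum: the objective never rewards moving the ratio
    # outside [1 - eps, 1 + eps], which is the stability guarantee the
    # paper extends to the off-policy (sample-reuse) setting.
    return np.minimum(unclipped, clipped).mean()
```

For example, with `eps=0.2` a ratio of 2.0 on a positive advantage is credited only as 1.2, while on a negative advantage the full unclipped penalty of −2.0 is kept, so large policy shifts are discouraged in both directions.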
