Introduction

Bimonthly, founded in 1957
Administrator: Shanxi Provincial Education Department
Sponsor: Taiyuan University of Technology
Publisher: Editorial Office of Journal of TYUT
Editor-in-Chief: SUN Hongbin
ISSN: 1007-9432
CN: 14-1220/N
Deep Reinforcement Learning with Phasic Policy Gradient with Sample Reuse

DOI: 10.16355/j.tyut.1007-9432.20230300
Received: 2023-04-22
Accepted: 2023-05-25
Corresponding author: WANG Li, College of Data Science, Taiyuan University of Technology
Abstract:
【Purposes】 The phasic policy gradient with sample reuse (SR-PPG) algorithm is proposed to address the problems of non-reuse of samples and low sample utilization in policy-based deep reinforcement learning algorithms. 【Methods】 The proposed algorithm introduces offline data on the basis of the phasic policy gradient (PPG), reducing the time cost of training and enabling the model to converge quickly. SR-PPG combines the stability advantages of theoretically supported on-policy algorithms with the sample efficiency of off-policy algorithms, develops policy improvement guarantees applicable to the off-policy setting, and links these bounds to the clipping mechanism used by PPG. 【Findings】 Theoretical analysis and experiments show that the algorithm achieves better performance by effectively balancing the competing goals of stability and sample efficiency.
Keywords: deep reinforcement learning; phasic policy gradient; sample reuse
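The clipping mechanism the abstract refers to is the PPO-style clipped surrogate objective that PPG inherits. As a minimal illustrative sketch (not the paper's exact SR-PPG objective, which is not reproduced here), the function below computes the clipped surrogate from importance-sampling ratios and advantage estimates; when old samples are reused, the behavior policy is an older policy, so the ratios drift further from 1 and the clip bound matters more:

```python
import numpy as np

def clipped_surrogate(ratios, advantages, eps=0.2):
    """PPO/PPG-style clipped surrogate objective (illustrative sketch).

    ratios:     pi_new(a|s) / pi_behavior(a|s) importance weights; for
                reused off-policy samples, pi_behavior is an older policy.
    advantages: advantage estimates for the sampled actions.
    eps:        clip range; bounds how far a single update can push the
                policy away from the behavior policy.
    """
    ratios = np.asarray(ratios, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    unclipped = ratios * advantages
    clipped = np.clip(ratios, 1.0 - eps, 1.0 + eps) * advantages
    # Pessimistic minimum: the objective never rewards moving the ratio
    # outside [1 - eps, 1 + eps], which is the stability guarantee the
    # paper extends to the off-policy (sample-reuse) setting.
    return np.minimum(unclipped, clipped).mean()
```

For example, with `eps=0.2` a ratio of 2.0 on a positive advantage is credited only as 1.2, while on a negative advantage the full unclipped penalty of −2.0 is kept, so large policy shifts are discouraged in both directions.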
