【MATLAB Reinforcement Learning Toolbox】Learning Notes -- Training an Agent in a Simulink Environment (Create Simulink Environment and Train Agent): creating the interface, building the DDPG agent, and training the network

Overview

Simulink makes it easy to build all kinds of dynamics and control models. By replacing the existing controller with an AI controller, you can reuse an existing model and improve it incrementally.

The focus of this section is how to bring in a Simulink model as the environment (env); the other steps were covered in previous posts.

Take the built-in water tank model, watertank, as the example, shown in the figure below:

With a PI controller, the control performance is as shown below:

After replacing the PI controller with a neural-network controller, the system architecture looks like this:

The replacement procedure is as follows:

(1) Delete the PID controller;

(2) Add an RL Agent block;

(3) Create the observation block:

The observation vector is [∫e dt, e, h]ᵀ, where h is the measured tank water level, e = r − h is the tracking error, and r is the reference (set-point) water level.

(4) Define the reward function;

(5) Define the termination condition. (A hedged sketch of the reward and termination logic follows this list.)
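The reward and termination details are given as figures in the original post. As a hedged sketch, the shipped watertank example rewards the agent with 10 when the tracking error is small and −1 otherwise, subtracts 100 when the episode is stopped early, and terminates the episode when the water level leaves the range [0, 20]. The logic in the Simulink reward/stop subsystems is roughly equivalent to the following MATLAB function (the function name is hypothetical; the thresholds follow the MathWorks example):

% Hedged sketch of the reward and termination logic in the watertank example.
% Function name is hypothetical; thresholds follow the MathWorks example.
function [reward,isDone] = watertankRewardAndStop(e,h)
% e: tracking error r - h; h: measured water level
isDone = (h <= 0) || (h >= 20);        % stop if the tank under- or overflows
reward = 10*(abs(e) < 0.1) ...         % bonus while tracking within 0.1
         - 1*(abs(e) >= 0.1) ...       % small penalty otherwise
         - 100*isDone;                 % large penalty on early termination
end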

Create the Interface

Observation specification

obsInfo = rlNumericSpec([3 1],...
    'LowerLimit',[-inf -inf 0]',...
    'UpperLimit',[inf inf inf]');
obsInfo.Name = 'observations';
obsInfo.Description = 'integrated error, error, and measured height';
numObservations = obsInfo.Dimension(1);

Action specification

actInfo = rlNumericSpec([1 1]);
actInfo.Name = 'flow';
numActions = actInfo.Dimension(1);

Build the environment interface [the key step]

env = rlSimulinkEnv('rlwatertank','rlwatertank/RL Agent',...
obsInfo,actInfo);

Define the reset function

env.ResetFcn = @(in)localResetFcn(in);
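The body of localResetFcn is not listed in these notes. A minimal sketch, assuming the block paths of the shipped rlwatertank model (the 'Desired Water Level' constant block and the integrator H inside the Water-Tank System subsystem are assumptions here), randomizes the set point and the initial water level at the start of each episode:

function in = localResetFcn(in)
% Randomize the reference (set-point) water level, kept inside (0,20).
blk = sprintf('rlwatertank/Desired \nWater Level');   % block path is an assumption
h = 3*randn + 10;
while h <= 0 || h >= 20
    h = 3*randn + 10;
end
in = setBlockParameter(in,blk,'Value',num2str(h));

% Randomize the initial water level in the same range.
h = 3*randn + 10;
while h <= 0 || h >= 20
    h = 3*randn + 10;
end
blk = 'rlwatertank/Water-Tank System/H';              % block path is an assumption
in = setBlockParameter(in,blk,'InitialCondition',num2str(h));
end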

Set the total simulation time and the sample time

Ts = 1.0;
Tf = 200;

Reset the random number seed (for reproducibility)

rng(0)

Create the DDPG Agent

See the previous post for details.

State path (critic)

statePath = [
featureInputLayer(numObservations,'Normalization','none','Name','State')
fullyConnectedLayer(50,'Name','CriticStateFC1')
reluLayer('Name','CriticRelu1')
fullyConnectedLayer(25,'Name','CriticStateFC2')];

Action path (critic)

actionPath = [
featureInputLayer(numActions,'Normalization','none','Name','Action')
fullyConnectedLayer(25,'Name','CriticActionFC1')];

Common path (merges the state and action paths)

commonPath = [
additionLayer(2,'Name','add')
reluLayer('Name','CriticCommonRelu')
fullyConnectedLayer(1,'Name','CriticOutput')];

Assemble the critic network

criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
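To double-check that the two input paths really meet at the addition layer, you can plot the assembled layer graph (an optional check, not in the original notes):

% Visualize the critic layer graph to verify the add/in1 and add/in2 connections.
figure
plot(criticNetwork)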

Critic representation options

criticOpts = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation',{'State'},'Action',{'Action'},criticOpts);
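As an optional sanity check (not in the original notes), the untrained critic can be evaluated on a random observation/action pair; getValue returns the scalar Q-value estimate:

% Evaluate the untrained critic on random inputs (values are meaningless before training).
obs = {rand(obsInfo.Dimension)};   % 3x1 observation sample
act = {rand(actInfo.Dimension)};   % 1x1 action sample (flow)
q0  = getValue(critic,obs,act)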

Actor network

actorNetwork = [
featureInputLayer(numObservations,'Normalization','none','Name','State')
fullyConnectedLayer(3, 'Name','actorFC')
tanhLayer('Name','actorTanh')
fullyConnectedLayer(numActions,'Name','Action')
];

Actor representation options

actorOptions = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,'Observation',{'State'},'Action',{'Action'},actorOptions);

Construct the final DDPG agent

agentOpts = rlDDPGAgentOptions(...
'SampleTime',Ts,...
'TargetSmoothFactor',1e-3,...
'DiscountFactor',1.0, ...
'MiniBatchSize',64, ...
'ExperienceBufferLength',1e6);
agentOpts.NoiseOptions.StandardDeviation = 0.3;
agentOpts.NoiseOptions.StandardDeviationDecayRate = 1e-5;
agent = rlDDPGAgent(actor,critic,agentOpts);
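Before training, it is worth confirming that the assembled agent returns an action of the expected size from a random observation; this check also appears in the MathWorks example:

% The untrained agent should return a single scalar flow command.
getAction(agent,{rand(obsInfo.Dimension)})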

Train the Network

maxepisodes = 5000;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes, ...
'MaxStepsPerEpisode',maxsteps, ...
'ScoreAveragingWindowLength',20, ...
'Verbose',false, ...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',800);

Start training


trainingStats = train(agent,env,trainOpts);

During training, you can see that the main program keeps calling the Simulink model.
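Training can take a long time. As in the shipped example, the call to train can be guarded by a flag so that a pretrained agent is loaded instead; the MAT-file name below follows that example and is otherwise an assumption:

doTraining = false;   % set to true to retrain from scratch
if doTraining
    trainingStats = train(agent,env,trainOpts);
else
    load('WaterTankDDPG.mat','agent')   % pretrained agent shipped with the example
end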

 

Finally, validate the trained agent

simOpts = rlSimulationOptions('MaxSteps',maxsteps,'StopOnError','on');
experiences = sim(env,agent,simOpts);
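The experiences structure returned by sim holds time series for the observations, actions, and rewards. A hedged sketch for plotting the measured water level (assuming the channel name follows obsInfo.Name = 'observations' and the third observation channel is the height, as defined above):

% Extract the logged observation time series and plot the water level.
obsTS  = experiences.Observation.observations;   % timeseries with 3 channels
height = squeeze(obsTS.Data(3,1,:));             % third channel: measured height h
plot(obsTS.Time,height)
xlabel('Time (s)')
ylabel('Water level')
title('Closed-loop response of the trained DDPG agent')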

This yields the closed-loop response of the trained agent (the original result screenshots are not reproduced here).
