Bayesian strategy networks based soft actor-critic learning