An Agent AI that Trades Stocks

Intro

I was looking for ideas and environments for training an agent when I found this stock trading env.
With this training environment, you can teach your AI to trade stocks as an RL task.

It is as easy as importing a module:

import gym
from gym_anytrading.envs import TradingEnv, ForexEnv, StocksEnv, Actions, Positions
from gym_anytrading.datasets import FOREX_EURUSD_1H_ASK, STOCKS_GOOGL
env = gym.make('stocks-v0', frame_bound=(50, 100), window_size=10)
print("env information:")
print("> shape:", env.shape)
print("> df.shape:", env.df.shape)
print("> prices.shape:", env.prices.shape)
print("> signal_features.shape:", env.signal_features.shape)
print("> max_possible_profit:", env.max_possible_profit())

env information:

shape: (10, 2)
df.shape: (2335, 6)
prices.shape: (60,)
signal_features.shape: (60, 2)
max_possible_profit: 1.351611419906775

The Approach

I went for a simple linear model, but with a twist: I again used Gates. For the context, I used the total profit (the multiplier on the base amount of money) squared, to accentuate extremes. I think this is how it figured out the trick it pulled later on.
I fed the predictions from several past timesteps into the model, like a Many-to-One architecture.
Finally, for the DQN, I used a deque for the memory and stored the CUDA tensors directly, for lower RAM usage.
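A sketch of that replay memory — a `deque` with a fixed capacity, holding transitions as tensors already on the target device so that sampling a batch needs no host-to-device copies (class and method names here are illustrative, not the author's code):

```python
import random
from collections import deque

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

class ReplayMemory:
    """Fixed-size buffer; the oldest transitions fall off the left end."""
    def __init__(self, capacity=2000):
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Store tensors on the device up front, trading host RAM
        # for device memory, as described above.
        self.memory.append((
            torch.as_tensor(state, dtype=torch.float32, device=device),
            torch.as_tensor([action], device=device),
            torch.as_tensor([reward], dtype=torch.float32, device=device),
            torch.as_tensor(next_state, dtype=torch.float32, device=device),
            torch.as_tensor([float(done)], device=device),
        ))

    def sample(self, batch_size):
        batch = random.sample(self.memory, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return (torch.stack(states), torch.cat(actions),
                torch.cat(rewards), torch.stack(next_states), torch.cat(dones))

    def __len__(self):
        return len(self.memory)
```

With `maxlen` set, appending past capacity silently evicts the oldest entry, which is exactly the behaviour a DQN replay buffer wants.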

The Framework

Here I used PyTorch, which I find better for prototyping, and I've since switched to it.

The Model

Here’s a simple model I used.

self.grunit = nn.GRU(observation.shape[1], r_hidden)  # temporal layer
self.gate = Gate([
    nn.Linear(r_hidden, g_hidden)
], nn.Linear(1, 4))                                   # custom Gate module
self.proba = nn.Linear(g_hidden, 2)                   # one output per action
dqn = DQN(64, 32)

In total, this is about 15,000 parameters.
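As a sanity check on that figure, the parameters can be counted with stand-in layers of the same sizes (the custom Gate is approximated here by its inner Linear layers, so the total is only indicative):

```python
import torch.nn as nn

r_hidden, g_hidden, n_features = 64, 32, 2  # matching DQN(64, 32) above

# Stand-ins for the layers of the model; the custom Gate is
# approximated by plain Linear layers.
layers = nn.ModuleList([
    nn.GRU(n_features, r_hidden),   # temporal layer
    nn.Linear(r_hidden, g_hidden),  # gate's inner linear
    nn.Linear(1, 4),                # gate's context linear
    nn.Linear(g_hidden, 2),         # output layer
])

n_params = sum(p.numel() for p in layers.parameters())
print(n_params)  # 15210 — about 15k, the GRU dominates the count
```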

And the hyperparameters I chose for the DQN:

EPISODES = 300
BATCH_SIZE = 64           # training batch size
LR = 2e-4                 # learning rate
EPSILON = 1               # greedy policy (initial exploration rate)
EPSILON_DECAY = .980      # decay of the exploration
EPSILON_MIN = .01
GAMMA = 1.3               # reward discount
TARGET_REPLACE_ITER = 100 # target network update frequency
MEMORY_CAPACITY = 2000
N_ACTIONS = env.action_space.n
N_STATES = env.observation_space.shape[0]
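With `EPSILON_DECAY = .980`, exploration hits the `EPSILON_MIN` floor well before the 300 episodes are over — a quick sketch of the usual epsilon-greedy schedule, assuming the decay is applied once per episode (the listing above doesn't state this):

```python
import random

EPSILON, EPSILON_DECAY, EPSILON_MIN = 1.0, 0.980, 0.01
N_ACTIONS = 2  # Buy / Sell

def select_action(q_values, epsilon):
    """Epsilon-greedy: random action with probability epsilon, else argmax Q."""
    if random.random() < epsilon:
        return random.randrange(N_ACTIONS)
    return max(range(N_ACTIONS), key=lambda a: q_values[a])

# Decaying once per episode, epsilon reaches the 0.01 floor after
# about log(0.01) / log(0.98) ≈ 228 episodes.
eps = EPSILON
for episode in range(300):
    eps = max(EPSILON_MIN, eps * EPSILON_DECAY)
print(eps)  # 0.01
```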

For the activation functions, I used GELU for the temporal layer, sigmoid for the gate, and softmax for the probabilistic output.
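A minimal forward pass with those three activations in place (the shapes and layer names here are assumptions, not the author's exact code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

window, batch, n_features = 10, 1, 2
gru = nn.GRU(n_features, 64)
gate_lin = nn.Linear(64, 32)
out_lin = nn.Linear(32, 2)

x = torch.randn(window, batch, n_features)  # one window of signal features
h, _ = gru(x)
h = F.gelu(h[-1])                  # GELU on the temporal layer's last step
g = torch.sigmoid(gate_lin(h))     # sigmoid for the gate
q = F.softmax(out_lin(g), dim=-1)  # softmax over the two actions

print(q.shape)  # torch.Size([1, 2])
```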

Results

Here I found something interesting: the agent BROKE the environment. Take a look at the profit over time, as a multiplication factor (1 being no profit):

broke-the-env

Well, actually, it found a way to consistently earn 15% of whatever it invested.
Strangely enough, its average reward was 0.000.

Possible Improvement

  • Again, Gates

    Shallow networks don't benefit from Gates as much as deeper networks do.