LoL AI Model Part 2: Redesign MDP with Gold Diff

AI in Video Games: Improving Decision Making in League of Legends using Real Match Statistics and Personal Preferences

Part 2: Redesigning Markov Decision Process with Gold Difference and Improving Model

Motivations and Objectives

League of Legends is a team oriented video game where on two team teams (with 5 players in each) compete for objectives and kills. Gaining an advantage enables the players to become stronger (obtain better items and level up faster) than their opponents and, as their advantage increases, the likelihood of winning the game also increases. We therefore have a sequence of events dependent on previous events that lead to one team destroying the other’s base and winning the game.

Sequences like this being modelled statistically is nothing new; for years now researchers have considered how this is applied in sports, such as basketball (https://arxiv.org/pdf/1507.01816.pdf), where a sequence of passing, dribbling and foul plays lead to a team obtaining or losing points. The aim of research such as this one mentioned is to provide more detailed insight beyond a simple box score (number of points or kill gained by player in basketball or video games respectively) and consider how teams perform when modelled as a sequence of events connected in time.

Modelling the events in this way is even more important in games such as League of Legends as taking objectives and kills lead towards both an item and level advantage. For example, a player obtaining the first kill of the game nets them gold that can be used to purchase more powerful items. With this item they are then strong enough to obtain more kills and so on until they can lead their team to a win. Facilitating a lead like this is often referred to as ‘snowballing’ as the players cumulatively gain advantages but often games are not this one sided and objects and team plays are more important.

The aim of this is project is simple; can we calculate the next best event given what has occurred previously in the game so that the likelihood of eventually leading to a win increases based on real match statistics?

However, there are many factors that lead to a player’s decision making in a game that cannot be easily measured. No how matter how much data collected, the amount of information a player can capture is beyond any that a computer can detect (at least for now!). For example, players may be over or underperforming in this game or may simply have a preference for the way they play (often defined by the types of characters they play). Some players will naturally be more aggressive and look for kills while others will play passively and push for objectives instead. Therefore, we further develop our model to allow the player to adjust the recommended play on their preferences.

Import Packages and Data

In [1]:

import pandas as pd
import numpy as np
import datetime
import matplotlib.pyplot as plt
import seaborn as sns
from IPython.display import Image
import math
from scipy.stats import kendalltau

from IPython.display import clear_output
import timeit

import warnings
warnings.filterwarnings('ignore')

In [2]:

#kills = pd.read_csv('C:\\Users\\Phil\\Documents\\LoL Model\\kills.csv')
#matchinfo = pd.read_csv('C:\\Users\\Phil\\Documents\\LoL Model\\matchinfo.csv')
#monsters = pd.read_csv('C:\\Users\\Phil\\Documents\\LoL Model\\monsters.csv')
#structures = pd.read_csv('C:\\Users\\Phil\\Documents\\LoL Model\\structures.csv')

kills = pd.read_csv('../input/kills.csv')
matchinfo = pd.read_csv('../input/matchinfo.csv')
monsters = pd.read_csv('../input/monsters.csv')
structures = pd.read_csv('../input/structures.csv')

Introducing Gold Difference to Redesign the Markov Decision Process

In [3]:

#gold = pd.read_csv('C:\\Users\\Phil\\Documents\\LoL Model\\gold.csv')
gold = pd.read_csv('../input/gold.csv')

In [4]:

gold = gold[gold['Type']=="golddiff"]
gold.head()

Out[4]:

  Address Type min_1 min_2 min_3 min_4 min_5 min_6 min_7 min_8 min_9 min_10 min_11 min_12 min_13 min_14 min_15 min_16 min_17 min_18 min_19 min_20 min_21 min_22 min_23 min_24 min_25 min_26 min_27 min_28 min_29 min_30 min_31 min_32 min_33 min_34 min_35 min_36 min_37 min_38 ... min_56 min_57 min_58 min_59 min_60 min_61 min_62 min_63 min_64 min_65 min_66 min_67 min_68 min_69 min_70 min_71 min_72 min_73 min_74 min_75 min_76 min_77 min_78 min_79 min_80 min_81 min_82 min_83 min_84 min_85 min_86 min_87 min_88 min_89 min_90 min_91 min_92 min_93 min_94 min_95
0 http://matchhistory.na.leagueoflegends.com/en/... golddiff 0 0 -14 -65 -268 -431 -488 -789 -494 -625 -1044 -313 -760 -697 -790 -611 240 845.0 797.0 1422.0 987.0 169.0 432.0 491.0 1205.0 1527.0 1647.0 1847.0 3750.0 4719.0 3561.0 3367.0 2886.0 2906.0 4411.0 4473.0 4639.0 4762.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 http://matchhistory.na.leagueoflegends.com/en/... golddiff 0 0 -26 -18 147 237 -152 18 88 -242 102 117 802 1420 1394 1301 1489 1563.0 1368.0 1105.0 205.0 192.0 587.0 377.0 667.0 415.0 1876.0 1244.0 2130.0 2431.0 680.0 1520.0 949.0 1894.0 2644.0 3394.0 3726.0 1165.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 http://matchhistory.na.leagueoflegends.com/en/... golddiff 0 0 10 -60 34 37 589 1064 1258 913 1233 1597 1575 3046 2922 3074 3626 3466.0 5634.0 5293.0 4597.0 4360.0 4616.0 4489.0 4880.0 5865.0 6993.0 7049.0 7029.0 7047.0 7160.0 7081.0 7582.0 9917.0 10337.0 9823.0 12307.0 13201.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 http://matchhistory.na.leagueoflegends.com/en/... golddiff 0 0 -15 25 228 -6 -243 175 -346 16 -258 -57 -190 -111 -335 -8 324 428.0 -124.0 768.0 2712.0 1813.0 198.0 1242.0 1245.0 1278.0 1240.0 -664.0 -1195.0 -1157.0 -2161.0 -2504.0 -3873.0 -3688.0 -3801.0 -3668.0 -3612.0 -5071.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 http://matchhistory.na.leagueoflegends.com/en/... golddiff 40 40 44 -36 113 158 -121 -191 23 205 156 272 -271 -896 -574 177 -425 -730.0 -318.0 478.0 926.0 761.0 -286.0 473.0 490.0 1265.0 2526.0 3890.0 4319.0 5121.0 5140.0 5141.0 6866.0 9517.0 11322.0 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

In [5]:

# Add ID column based on last 16 digits in match address for simpler matching

matchinfo['id'] = matchinfo['Address'].astype(str).str[-16:]
kills['id'] = kills['Address'].astype(str).str[-16:]
monsters['id'] = monsters['Address'].astype(str).str[-16:]
structures['id'] = structures['Address'].astype(str).str[-16:]
gold['id'] = gold['Address'].astype(str).str[-16:]

In [6]:

# Dragon became multiple types in patch v6.9 (http://leagueoflegends.wikia.com/wiki/V6.9) 
# so we remove and games before this change occured and only use games with the new dragon system

old_dragon_id = monsters[ monsters['Type']=="DRAGON"]['id'].unique()
old_dragon_id

monsters = monsters[ ~monsters['id'].isin(old_dragon_id)]
monsters = monsters.reset_index()

matchinfo = matchinfo[ ~matchinfo['id'].isin(old_dragon_id)]
matchinfo = matchinfo.reset_index()

kills = kills[ ~kills['id'].isin(old_dragon_id)]
kills = kills.reset_index()

structures = structures[ ~structures['id'].isin(old_dragon_id)]
structures = structures.reset_index()

gold = gold[ ~gold['id'].isin(old_dragon_id)]
gold = gold.reset_index()

gold.head(3)

Out[6]:

  index Address Type min_1 min_2 min_3 min_4 min_5 min_6 min_7 min_8 min_9 min_10 min_11 min_12 min_13 min_14 min_15 min_16 min_17 min_18 min_19 min_20 min_21 min_22 min_23 min_24 min_25 min_26 min_27 min_28 min_29 min_30 min_31 min_32 min_33 min_34 min_35 min_36 min_37 ... min_57 min_58 min_59 min_60 min_61 min_62 min_63 min_64 min_65 min_66 min_67 min_68 min_69 min_70 min_71 min_72 min_73 min_74 min_75 min_76 min_77 min_78 min_79 min_80 min_81 min_82 min_83 min_84 min_85 min_86 min_87 min_88 min_89 min_90 min_91 min_92 min_93 min_94 min_95 id
0 335 http://matchhistory.na.leagueoflegends.com/en/... golddiff 0 0 -28 67 355 453 311 1250 1321 1470 2005 1735 2064 2985 2911 4018 4308 4344.0 4687.0 5719.0 6525.0 5506.0 6198.0 7494.0 7284.0 7191.0 8089.0 8006.0 7922.0 8548.0 9077.0 12737.0 12064.0 NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN 55109b5a7a91ae87
1 336 http://matchhistory.na.leagueoflegends.com/en/... golddiff 0 0 36 -133 -58 -393 -374 -115 -194 -497 -1131 -1204 -1656 -1222 -221 -1613 -824 -644.0 -851.0 -955.0 774.0 658.0 812.0 1134.0 153.0 363.0 322.0 243.0 74.0 -987.0 -2022.0 -254.0 -629.0 -672.0 -3757.0 -3858.0 -4105.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN e147296c928da5b4
2 337 http://matchhistory.na.leagueoflegends.com/en/... golddiff 0 17 58 574 224 501 743 603 1018 1167 1592 1288 1140 2014 2698 3263 3822 4199.0 5032.0 4730.0 4859.0 5129.0 6546.0 6428.0 7093.0 7080.0 6786.0 7047.0 9240.0 10510.0 12205.0 15039.0 NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ea0611c44259f062

In [7]:

#Transpose Gold table, columns become matches and rows become minutes

gold_T = gold.iloc[:,3:-1].transpose()
gold_T.head(3)

Out[7]:

  0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 ... 4875 4876 4877 4878 4879 4880 4881 4882 4883 4884 4885 4886 4887 4888 4889 4890 4891 4892 4893 4894 4895 4896 4897 4898 4899 4900 4901 4902 4903 4904 4905 4906 4907 4908 4909 4910 4911 4912 4913 4914
min_1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
min_2 0.0 0.0 17.0 8.0 -1.0 0.0 0.0 0.0 -3.0 0.0 -8.0 10.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 -510.0 0.0 0.0 0.0 -8.0 0.0 0.0 -8.0 0.0 0.0 -18.0 -8.0 0.0 0.0 8.0 8.0 -10.0 0.0 0.0 -8.0 -8.0 ... -11.0 0.0 0.0 0.0 0.0 0.0 3.0 0.0 -16.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 5.0 0.0 0.0 -5.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 13.0 0.0 0.0 0.0 0.0 -8.0 0.0 0.0 -8.0 0.0 0.0
min_3 -28.0 36.0 58.0 8.0 83.0 20.0 6.0 -17.0 -61.0 34.0 -33.0 50.0 50.0 -73.0 40.0 -71.0 -74.0 -35.0 -54.0 -465.0 -39.0 -27.0 -14.0 6.0 -18.0 -71.0 -48.0 20.0 51.0 465.0 23.0 -24.0 -74.0 -8.0 56.0 52.0 -4.0 -407.0 -64.0 6.0 ... -97.0 88.0 82.0 108.0 -47.0 -43.0 -117.0 2.0 -83.0 -75.0 0.0 -7.0 -1.0 204.0 -90.0 -74.0 161.0 14.0 -145.0 49.0 -2.0 -26.0 -22.0 47.0 114.0 -38.0 -32.0 29.0 150.0 45.0 -14.0 70.0 34.0 37.0 -187.0 -18.0 -86.0 -6.0 -97.0 -8.0

In [8]:

gold2 = pd.DataFrame()

start = timeit.default_timer()
for r in range(0,len(gold)):
    clear_output(wait=True)
    
    # Select each match column, drop any na rows and find the match id from original gold table
    gold_row = gold_T.iloc[:,r]
    gold_row = gold_row.dropna()
    gold_row_id = gold['id'][r]
    
    # Append into table so that each match and event is stacked on top of one another    
    gold2 = gold2.append(pd.DataFrame({'id':gold_row_id,'GoldDiff':gold_row}))
    
    
    stop = timeit.default_timer()
   
    if (r/len(gold)*100) < 5  :
        expected_time = "Calculating..."
        
    else:
        time_perc = timeit.default_timer()
        expected_time = np.round( ( (time_perc-start)/60 / (r/len(gold)) ),2)
        
  
        
    print("Current progress:",np.round(r/len(gold) *100, 2),"%")        
    print("Current run time:",np.round((stop - start)/60,2),"minutes")
    print("Expected Run Time:",expected_time,"minutes")
    
Current progress: 74.95 %

In [9]:

gold3 = gold2[['id','GoldDiff']]
gold3.head(3)

Out[9]:

  id GoldDiff
min_1 55109b5a7a91ae87 0.0
min_2 55109b5a7a91ae87 0.0
min_3 55109b5a7a91ae87 -28.0

In [10]:

### Create minute column with index, convert from 'min_1' to just the number
gold3['Minute'] = gold3.index.to_series()
gold3['Minute'] = np.where(gold3['Minute'].str[-2]=="_", gold3['Minute'].str[-1],gold3['Minute'].str[-2:])
gold3['Minute'] = gold3['Minute'].astype(int)
gold3 = gold3.reset_index()
gold3 = gold3.sort_values(by=['id','Minute'])

gold3.head(3)

Out[10]:

  index id GoldDiff Minute
22608 min_1 0001f4374a03c133 0.0 1
22609 min_2 0001f4374a03c133 0.0 2
22610 min_3 0001f4374a03c133 51.0 3

In [11]:

# Gold difference from data is relative to blue team's perspective,
# therefore we reverse this by simply multiplying amount by -1
gold3['GoldDiff'] = gold3['GoldDiff']*-1

gold3.head(10)

Out[11]:

  index id GoldDiff Minute
22608 min_1 0001f4374a03c133 -0.0 1
22609 min_2 0001f4374a03c133 -0.0 2
22610 min_3 0001f4374a03c133 -51.0 3
22611 min_4 0001f4374a03c133 132.0 4
22612 min_5 0001f4374a03c133 35.0 5
22613 min_6 0001f4374a03c133 940.0 6
22614 min_7 0001f4374a03c133 589.0 7
22615 min_8 0001f4374a03c133 1391.0 8
22616 min_9 0001f4374a03c133 1151.0 9
22617 min_10 0001f4374a03c133 563.0 10

In [12]:

matchinfo.head(3)

Out[12]:

  index League Year Season Type blueTeamTag bResult rResult redTeamTag gamelength blueTop blueTopChamp blueJungle blueJungleChamp blueMiddle blueMiddleChamp blueADC blueADCChamp blueSupport blueSupportChamp redTop redTopChamp redJungle redJungleChamp redMiddle redMiddleChamp redADC redADCChamp redSupport redSupportChamp Address id
0 335 NALCS 2016 Summer Season TSM 1 0 CLG 33 Hauntzer Maokai Svenskeren Graves Bjergsen Zilean Doublelift Lucian Biofrost Karma Darshan Trundle Xmithie RekSai Huhi Viktor Stixxay Ezreal aphromoo Nami http://matchhistory.na.leagueoflegends.com/en/... 55109b5a7a91ae87
1 336 NALCS 2016 Summer Season CLG 0 1 TSM 39 Darshan Fiora Xmithie RekSai Huhi Vladimir Stixxay Ezreal aphromoo Bard Hauntzer Ekko Svenskeren Elise Bjergsen Viktor Doublelift Lucian Biofrost Braum http://matchhistory.na.leagueoflegends.com/en/... e147296c928da5b4
2 337 NALCS 2016 Summer Season NV 1 0 NRG 32 Seraph Trundle Procxin Nidalee Ninja Lissandra LOD Ezreal Hakuho Karma Quas Maokai Santorin Kindred GBM Ziggs ohq Sivir KiWiKiD Alistar http://matchhistory.na.leagueoflegends.com/en/... ea0611c44259f062

In [13]:

gold4 = gold3

matchinfo2 = matchinfo[['id','rResult','gamelength']]
matchinfo2['gamlength'] = matchinfo2['gamelength'] + 1
matchinfo2['index'] = 'min_'+matchinfo2['gamelength'].astype(str)
matchinfo2['rResult2'] =  np.where(matchinfo2['rResult']==1,999999,-999999)
matchinfo2 = matchinfo2[['index','id','rResult2','gamelength']]
matchinfo2.columns = ['index','id','GoldDiff','Minute']


gold4 = gold4.append(matchinfo2)
gold4.tail()

Out[13]:

  index id GoldDiff Minute
4910 min_34 377c00c79e80e193 999999.0 34
4911 min_39 22cad427dfd10959 999999.0 39
4912 min_24 671b2487ca72bfab 999999.0 24
4913 min_35 7cdb33f56fe49084 -999999.0 35
4914 min_42 13203adbaa0c1fa5 999999.0 42

In [14]:

kills = kills[ kills['Time']>0]

kills['Minute'] = kills['Time'].astype(int)

kills['Team'] = np.where( kills['Team']=="rKills","Red","Blue")
kills.head(3)

Out[14]:

  index Address Team Time Victim Killer Assist_1 Assist_2 Assist_3 Assist_4 x_pos y_pos id Minute
0 4462 http://matchhistory.na.leagueoflegends.com/en/... Blue 6.032 CLG Huhi TSM Svenskeren TSM Bjergsen NaN NaN NaN 7825 8666 55109b5a7a91ae87 6
1 4463 http://matchhistory.na.leagueoflegends.com/en/... Blue 9.428 CLG Huhi TSM Biofrost TSM Bjergsen TSM Doublelift NaN NaN 8728 8751 55109b5a7a91ae87 9
2 4464 http://matchhistory.na.leagueoflegends.com/en/... Blue 9.780 CLG Xmithie TSM Bjergsen TSM Hauntzer TSM Svenskeren NaN NaN 8655 1172 55109b5a7a91ae87 9

In [15]:

# For the Kills table, we need decided to group by the minute in which the kills took place and averaged 
# the time of the kills which we use later for the order of events

f = {'Time':['mean','count']}

killsGrouped = kills.groupby( ['id','Team','Minute'] ).agg(f).reset_index()
killsGrouped.columns = ['id','Team','Minute','Time Avg','Count']
killsGrouped = killsGrouped.sort_values(by=['id','Minute'])
killsGrouped.head(3)

Out[15]:

  id Team Minute Time Avg Count
4 0001f4374a03c133 Red 4 4.635 1
5 0001f4374a03c133 Red 6 6.064 1
0 0001f4374a03c133 Blue 8 8.194 1

In [16]:

structures = structures[ structures['Time']>0]

structures['Minute'] = structures['Time'].astype(int)
structures['Team'] = np.where(structures['Team']=="bTowers","Blue",
                        np.where(structures['Team']=="binhibs","Blue","Red"))
structures2 = structures.sort_values(by=['id','Minute'])

structures2 = structures2[['id','Team','Time','Minute','Type']]
structures2.head(3)

Out[16]:

  id Team Time Minute Type
4247 0001f4374a03c133 Blue 11.182 11 OUTER_TURRET
37055 0001f4374a03c133 Red 11.006 11 OUTER_TURRET
4248 0001f4374a03c133 Blue 16.556 16 OUTER_TURRET

In [17]:

monsters['Type2'] = np.where( monsters['Type']=="FIRE_DRAGON", "DRAGON",
                    np.where( monsters['Type']=="EARTH_DRAGON","DRAGON",
                    np.where( monsters['Type']=="WATER_DRAGON","DRAGON",       
                    np.where( monsters['Type']=="AIR_DRAGON","DRAGON",   
                             monsters['Type']))))

monsters = monsters[ monsters['Time']>0]

monsters['Minute'] = monsters['Time'].astype(int)

monsters['Team'] = np.where( monsters['Team']=="bDragons","Blue",
                   np.where( monsters['Team']=="bHeralds","Blue",
                   np.where( monsters['Team']=="bBarons", "Blue", 
                           "Red")))

monsters = monsters[['id','Team','Time','Minute','Type2']]
monsters.columns = ['id','Team','Time','Minute','Type']
monsters.head(3)

Out[17]:

  id Team Time Minute Type
0 55109b5a7a91ae87 Blue 23.444 23 DRAGON
1 55109b5a7a91ae87 Blue 31.069 31 DRAGON
2 55109b5a7a91ae87 Blue 16.419 16 DRAGON

In [18]:

GoldstackedData = gold4.merge(killsGrouped, how='left',on=['id','Minute'])
 
monsters_structures_stacked = structures2.append(monsters[['id','Team','Minute','Time','Type']])

GoldstackedData2 = GoldstackedData.merge(monsters_structures_stacked, how='left',on=['id','Minute'])

GoldstackedData2 = GoldstackedData2.sort_values(by=['id','Minute'])
GoldstackedData2.head(30)

Out[18]:

  index id GoldDiff Minute Team_x Time Avg Count Team_y Time Type
0 min_1 0001f4374a03c133 -0.0 1 NaN NaN NaN NaN NaN NaN
1 min_2 0001f4374a03c133 -0.0 2 NaN NaN NaN NaN NaN NaN
2 min_3 0001f4374a03c133 -51.0 3 NaN NaN NaN NaN NaN NaN
3 min_4 0001f4374a03c133 132.0 4 Red 4.6350 1.0 NaN NaN NaN
4 min_5 0001f4374a03c133 35.0 5 NaN NaN NaN NaN NaN NaN
5 min_6 0001f4374a03c133 940.0 6 Red 6.0640 1.0 NaN NaN NaN
6 min_7 0001f4374a03c133 589.0 7 NaN NaN NaN NaN NaN NaN
7 min_8 0001f4374a03c133 1391.0 8 Blue 8.1940 1.0 NaN NaN NaN
8 min_9 0001f4374a03c133 1151.0 9 NaN NaN NaN NaN NaN NaN
9 min_10 0001f4374a03c133 563.0 10 Red 10.4780 1.0 NaN NaN NaN
10 min_11 0001f4374a03c133 915.0 11 NaN NaN NaN Blue 11.182 OUTER_TURRET
11 min_11 0001f4374a03c133 915.0 11 NaN NaN NaN Red 11.006 OUTER_TURRET
12 min_11 0001f4374a03c133 915.0 11 NaN NaN NaN Red 11.261 DRAGON
13 min_12 0001f4374a03c133 2444.0 12 NaN NaN NaN NaN NaN NaN
14 min_13 0001f4374a03c133 1509.0 13 NaN NaN NaN NaN NaN NaN
15 min_14 0001f4374a03c133 1859.0 14 NaN NaN NaN NaN NaN NaN
16 min_15 0001f4374a03c133 2133.0 15 NaN NaN NaN Red 15.777 RIFT_HERALD
17 min_16 0001f4374a03c133 2162.0 16 NaN NaN NaN Blue 16.556 OUTER_TURRET
18 min_16 0001f4374a03c133 2162.0 16 NaN NaN NaN Red 16.145 OUTER_TURRET
19 min_17 0001f4374a03c133 2243.0 17 Red 17.3870 2.0 Red 17.570 DRAGON
20 min_18 0001f4374a03c133 1793.0 18 Blue 18.0980 1.0 Red 18.378 OUTER_TURRET
21 min_18 0001f4374a03c133 1793.0 18 Red 18.0670 2.0 Red 18.378 OUTER_TURRET
22 min_19 0001f4374a03c133 3151.0 19 NaN NaN NaN NaN NaN NaN
23 min_20 0001f4374a03c133 4194.0 20 NaN NaN NaN NaN NaN NaN
24 min_21 0001f4374a03c133 4792.0 21 Red 21.7900 1.0 NaN NaN NaN
25 min_22 0001f4374a03c133 4494.0 22 Blue 22.5125 2.0 Red 22.915 BARON_NASHOR
26 min_22 0001f4374a03c133 4494.0 22 Red 22.4578 5.0 Red 22.915 BARON_NASHOR
27 min_23 0001f4374a03c133 4833.0 23 NaN NaN NaN Red 23.885 DRAGON
28 min_24 0001f4374a03c133 7907.0 24 NaN NaN NaN Red 24.943 INNER_TURRET
29 min_25 0001f4374a03c133 7681.0 25 NaN NaN NaN Red 25.463 INNER_TURRET

In [19]:

GoldstackedData3 = GoldstackedData2
GoldstackedData3['Time2'] = GoldstackedData3['Time'].fillna(GoldstackedData3['Time Avg']).fillna(GoldstackedData3['Minute'])
GoldstackedData3['Team'] = GoldstackedData3['Team_x'].fillna(GoldstackedData3['Team_y'])
GoldstackedData3 = GoldstackedData3.sort_values(by=['id','Time2'])

GoldstackedData3['EventNum'] = GoldstackedData3.groupby('id').cumcount()+1

GoldstackedData3 = GoldstackedData3[['id','EventNum','Team','Minute','Time2','GoldDiff','Count','Type']]

GoldstackedData3.columns = ['id','EventNum','Team','Minute','Time','GoldDiff','KillCount','Struct/Monster']

GoldstackedData3.head(30)

Out[19]:

  id EventNum Team Minute Time GoldDiff KillCount Struct/Monster
0 0001f4374a03c133 1 NaN 1 1.000 -0.0 NaN NaN
1 0001f4374a03c133 2 NaN 2 2.000 -0.0 NaN NaN
2 0001f4374a03c133 3 NaN 3 3.000 -51.0 NaN NaN
3 0001f4374a03c133 4 Red 4 4.635 132.0 1.0 NaN
4 0001f4374a03c133 5 NaN 5 5.000 35.0 NaN NaN
5 0001f4374a03c133 6 Red 6 6.064 940.0 1.0 NaN
6 0001f4374a03c133 7 NaN 7 7.000 589.0 NaN NaN
7 0001f4374a03c133 8 Blue 8 8.194 1391.0 1.0 NaN
8 0001f4374a03c133 9 NaN 9 9.000 1151.0 NaN NaN
9 0001f4374a03c133 10 Red 10 10.478 563.0 1.0 NaN
11 0001f4374a03c133 11 Red 11 11.006 915.0 NaN OUTER_TURRET
10 0001f4374a03c133 12 Blue 11 11.182 915.0 NaN OUTER_TURRET
12 0001f4374a03c133 13 Red 11 11.261 915.0 NaN DRAGON
13 0001f4374a03c133 14 NaN 12 12.000 2444.0 NaN NaN
14 0001f4374a03c133 15 NaN 13 13.000 1509.0 NaN NaN
15 0001f4374a03c133 16 NaN 14 14.000 1859.0 NaN NaN
16 0001f4374a03c133 17 Red 15 15.777 2133.0 NaN RIFT_HERALD
18 0001f4374a03c133 18 Red 16 16.145 2162.0 NaN OUTER_TURRET
17 0001f4374a03c133 19 Blue 16 16.556 2162.0 NaN OUTER_TURRET
19 0001f4374a03c133 20 Red 17 17.570 2243.0 2.0 DRAGON
20 0001f4374a03c133 21 Blue 18 18.378 1793.0 1.0 OUTER_TURRET
21 0001f4374a03c133 22 Red 18 18.378 1793.0 2.0 OUTER_TURRET
22 0001f4374a03c133 23 NaN 19 19.000 3151.0 NaN NaN
23 0001f4374a03c133 24 NaN 20 20.000 4194.0 NaN NaN
24 0001f4374a03c133 25 Red 21 21.790 4792.0 1.0 NaN
25 0001f4374a03c133 26 Blue 22 22.915 4494.0 2.0 BARON_NASHOR
26 0001f4374a03c133 27 Red 22 22.915 4494.0 5.0 BARON_NASHOR
27 0001f4374a03c133 28 Red 23 23.885 4833.0 NaN DRAGON
28 0001f4374a03c133 29 Red 24 24.943 7907.0 NaN INNER_TURRET
29 0001f4374a03c133 30 Red 25 25.463 7681.0 NaN INNER_TURRET

In [20]:

GoldstackedData3[GoldstackedData3['GoldDiff']==999999].head(3)

Out[20]:

  id EventNum Team Minute Time GoldDiff KillCount Struct/Monster
238225 0001f4374a03c133 44 NaN 34 34.0 999999.0 NaN NaN
240809 0016710a48fdd46d 55 NaN 40 40.0 999999.0 NaN NaN
241084 0016c9df37278448 55 NaN 40 40.0 999999.0 NaN NaN

In [21]:

# We then add an 'Event' column to merge the columns into one, where kills are now
# simple labelled as 'KILLS'

GoldstackedData3['Event'] = np.where(GoldstackedData3['KillCount']>0,"KILLS",None)
GoldstackedData3['Event'] = GoldstackedData3['Event'].fillna(GoldstackedData3['Struct/Monster'])

GoldstackedData3['Event'] = GoldstackedData3['Event'].fillna("NONE")

GoldstackedData3['GoldDiff2'] = np.where( GoldstackedData3['GoldDiff']== 999999,"WIN",
                                np.where( GoldstackedData3['GoldDiff']==-999999, 'LOSS',
                                         
    
                                np.where((GoldstackedData3['GoldDiff']<1000) & (GoldstackedData3['GoldDiff']>-1000),
                                        "EVEN",
                                np.where( (GoldstackedData3['GoldDiff']>=1000) & (GoldstackedData3['GoldDiff']<2500),
                                         "SLIGHTLY_AHEAD",
                                np.where( (GoldstackedData3['GoldDiff']>=2500) & (GoldstackedData3['GoldDiff']<5000),
                                         "AHEAD",
                                np.where( (GoldstackedData3['GoldDiff']>=5000),
                                         "VERY_AHEAD",
                                         
                                np.where( (GoldstackedData3['GoldDiff']<=-1000) & (GoldstackedData3['GoldDiff']>-2500),
                                         "SLIGHTLY_BEHIND",
                                np.where( (GoldstackedData3['GoldDiff']<=-2500) & (GoldstackedData3['GoldDiff']>-5000),
                                         "BEHIND",
                                np.where( (GoldstackedData3['GoldDiff']<=-5000),
                                         "VERY_BEHIND","ERROR"
                                        
                                        )))))))))

GoldstackedData3.head(3)

Out[21]:

  id EventNum Team Minute Time GoldDiff KillCount Struct/Monster Event GoldDiff2
0 0001f4374a03c133 1 NaN 1 1.0 -0.0 NaN NaN NONE EVEN
1 0001f4374a03c133 2 NaN 2 2.0 -0.0 NaN NaN NONE EVEN
2 0001f4374a03c133 3 NaN 3 3.0 -51.0 NaN NaN NONE EVEN

In [22]:

GoldstackedData3[GoldstackedData3['GoldDiff2']=="ERROR"]

Out[22]:

In [23]:

GoldstackedData3[GoldstackedData3['Team']=='Blue'].head(10)

Out[23]:

  id EventNum Team Minute Time GoldDiff KillCount Struct/Monster Event GoldDiff2
7 0001f4374a03c133 8 Blue 8 8.1940 1391.0 1.0 NaN KILLS SLIGHTLY_AHEAD
10 0001f4374a03c133 12 Blue 11 11.1820 915.0 NaN OUTER_TURRET OUTER_TURRET EVEN
17 0001f4374a03c133 19 Blue 16 16.5560 2162.0 NaN OUTER_TURRET OUTER_TURRET SLIGHTLY_AHEAD
20 0001f4374a03c133 21 Blue 18 18.3780 1793.0 1.0 OUTER_TURRET KILLS SLIGHTLY_AHEAD
25 0001f4374a03c133 26 Blue 22 22.9150 4494.0 2.0 BARON_NASHOR KILLS AHEAD
35 0001f4374a03c133 36 Blue 31 31.1685 12117.0 2.0 NaN KILLS VERY_AHEAD
47 0016710a48fdd46d 5 Blue 5 5.9710 -341.0 1.0 NaN KILLS EVEN
53 0016710a48fdd46d 12 Blue 11 11.9720 -283.0 1.0 NaN KILLS EVEN
55 0016710a48fdd46d 13 Blue 12 12.6030 -234.0 NaN OUTER_TURRET OUTER_TURRET EVEN
58 0016710a48fdd46d 16 Blue 15 15.4360 -1190.0 NaN DRAGON DRAGON SLIGHTLY_BEHIND

In [24]:

GoldstackedData3['Next_Min'] = GoldstackedData3['Minute']+1


GoldstackedData4 = GoldstackedData3.merge(gold4[['id','Minute','GoldDiff']],how='left',left_on=['id','Next_Min'],
                                         right_on=['id','Minute'])

GoldstackedData4.head(10)

Out[24]:

  id EventNum Team Minute_x Time GoldDiff_x KillCount Struct/Monster Event GoldDiff2 Next_Min Minute_y GoldDiff_y
0 0001f4374a03c133 1 NaN 1 1.000 -0.0 NaN NaN NONE EVEN 2 2.0 -0.0
1 0001f4374a03c133 2 NaN 2 2.000 -0.0 NaN NaN NONE EVEN 3 3.0 -51.0
2 0001f4374a03c133 3 NaN 3 3.000 -51.0 NaN NaN NONE EVEN 4 4.0 132.0
3 0001f4374a03c133 4 Red 4 4.635 132.0 1.0 NaN KILLS EVEN 5 5.0 35.0
4 0001f4374a03c133 5 NaN 5 5.000 35.0 NaN NaN NONE EVEN 6 6.0 940.0
5 0001f4374a03c133 6 Red 6 6.064 940.0 1.0 NaN KILLS EVEN 7 7.0 589.0
6 0001f4374a03c133 7 NaN 7 7.000 589.0 NaN NaN NONE EVEN 8 8.0 1391.0
7 0001f4374a03c133 8 Blue 8 8.194 1391.0 1.0 NaN KILLS SLIGHTLY_AHEAD 9 9.0 1151.0
8 0001f4374a03c133 9 NaN 9 9.000 1151.0 NaN NaN NONE SLIGHTLY_AHEAD 10 10.0 563.0
9 0001f4374a03c133 10 Red 10 10.478 563.0 1.0 NaN KILLS EVEN 11 11.0 915.0

In [25]:

GoldstackedData4[ GoldstackedData4['GoldDiff_y']== -999999].head(3)

Out[25]:

  id EventNum Team Minute_x Time GoldDiff_x KillCount Struct/Monster Event GoldDiff2 Next_Min Minute_y GoldDiff_y
409 0091705b03924485 30 NaN 26 26.000 -10450.0 NaN NaN NONE VERY_BEHIND 27 27.0 -999999.0
491 00986b51908a63c3 79 NaN 68 68.000 -3283.0 NaN NaN NONE BEHIND 69 69.0 -999999.0
538 00b13dbf1bd7aff0 44 Blue 35 35.129 -2880.0 1.0 INHIBITOR KILLS BEHIND 36 36.0 -999999.0

In [26]:

GoldstackedData4['GoldDiff2_Next'] =  np.where( GoldstackedData4['GoldDiff_y']== 999999,"WIN",
                                np.where( GoldstackedData4['GoldDiff_y']==-999999, 'LOSS',
                                         
    
                                np.where((GoldstackedData4['GoldDiff_y']<1000) & (GoldstackedData4['GoldDiff_y']>-1000),
                                        "EVEN",
                                np.where( (GoldstackedData4['GoldDiff_y']>=1000) & (GoldstackedData4['GoldDiff_y']<2500),
                                         "SLIGHTLY_AHEAD",
                                np.where( (GoldstackedData4['GoldDiff_y']>=2500) & (GoldstackedData4['GoldDiff_y']<5000),
                                         "AHEAD",
                                np.where( (GoldstackedData4['GoldDiff_y']>=5000),
                                         "VERY_AHEAD",
                                         
                                np.where( (GoldstackedData4['GoldDiff_y']<=-1000) & (GoldstackedData4['GoldDiff_y']>-2500),
                                         "SLIGHTLY_BEHIND",
                                np.where( (GoldstackedData4['GoldDiff_y']<=-2500) & (GoldstackedData4['GoldDiff_y']>-5000),
                                         "BEHIND",
                                np.where( (GoldstackedData4['GoldDiff_y']<=-5000),
                                         "VERY_BEHIND","ERROR"
                                        
                                        )))))))))
GoldstackedData4 = GoldstackedData4[['id','EventNum','Team','Minute_x','Time','Event','GoldDiff2','GoldDiff2_Next']]
GoldstackedData4.columns = ['id','EventNum','Team','Minute','Time','Event','GoldDiff2','GoldDiff2_Next']

GoldstackedData4['Event'] = np.where( GoldstackedData4['Team']=="Red", "+"+GoldstackedData4['Event'],
                                np.where(GoldstackedData4['Team']=="Blue", "-"+GoldstackedData4['Event'], 
                                         GoldstackedData4['Event']))

#GoldstackedData4.head(10)

In [27]:

# Errors are caused due to game ending in minute and then there is no 'next_min' info for this game but our method expects there to be
GoldstackedData4 = GoldstackedData4[GoldstackedData4['GoldDiff2_Next']!="ERROR"]
GoldstackedData4[GoldstackedData4['GoldDiff2_Next']=="ERROR"]

Out[27]:

In [28]:

GoldstackedDataFINAL = GoldstackedData4
GoldstackedDataFINAL['Min_State_Action_End'] = ((GoldstackedDataFINAL['Minute'].astype(str)) + "_"
                                       + (GoldstackedDataFINAL['GoldDiff2'].astype(str)) + "_"
                                       + (GoldstackedDataFINAL['Event'].astype(str)) + "_"  
                                       + (GoldstackedDataFINAL['GoldDiff2_Next'].astype(str))
                                      )

GoldstackedDataFINAL['MSAE'] = ((GoldstackedDataFINAL['Minute'].astype(str)) + "_"
                                       + (GoldstackedDataFINAL['GoldDiff2'].astype(str)) + "_"
                                       + (GoldstackedDataFINAL['Event'].astype(str)) + "_"  
                                       + (GoldstackedDataFINAL['GoldDiff2_Next'].astype(str))
                                      )

GoldstackedDataFINAL.head()

Out[28]:

  id EventNum Team Minute Time Event GoldDiff2 GoldDiff2_Next Min_State_Action_End MSAE
0 0001f4374a03c133 1 NaN 1 1.000 NONE EVEN EVEN 1_EVEN_NONE_EVEN 1_EVEN_NONE_EVEN
1 0001f4374a03c133 2 NaN 2 2.000 NONE EVEN EVEN 2_EVEN_NONE_EVEN 2_EVEN_NONE_EVEN
2 0001f4374a03c133 3 NaN 3 3.000 NONE EVEN EVEN 3_EVEN_NONE_EVEN 3_EVEN_NONE_EVEN
3 0001f4374a03c133 4 Red 4 4.635 +KILLS EVEN EVEN 4_EVEN_+KILLS_EVEN 4_EVEN_+KILLS_EVEN
4 0001f4374a03c133 5 NaN 5 5.000 NONE EVEN EVEN 5_EVEN_NONE_EVEN 5_EVEN_NONE_EVEN

In [29]:

goldMDP = GoldstackedDataFINAL[['Minute','GoldDiff2','Event','GoldDiff2_Next']]
goldMDP.columns = ['Minute','State','Action','End']
goldMDP['Counter'] = 1
goldMDP.head()

Out[29]:

  Minute State Action End Counter
0 1 EVEN NONE EVEN 1
1 2 EVEN NONE EVEN 1
2 3 EVEN NONE EVEN 1
3 4 EVEN +KILLS EVEN 1
4 5 EVEN NONE EVEN 1

In [30]:

goldMDP[goldMDP['End']=='ERROR'].head(3)

Out[30]:

In [31]:

goldMDP2 = goldMDP.groupby(['Minute','State','Action','End']).count().reset_index()
goldMDP2['Prob'] = goldMDP2['Counter']/(goldMDP2['Counter'].sum())
goldMDP2.head()

Out[31]:

  Minute State Action End Counter Prob
0 1 EVEN +KILLS EVEN 70 0.000285
1 1 EVEN -KILLS EVEN 80 0.000325
2 1 EVEN NONE EVEN 4770 0.019408
3 1 EVEN NONE SLIGHTLY_AHEAD 1 0.000004
4 2 EVEN +KILLS EVEN 228 0.000928

In [32]:

goldMDP3 = goldMDP.groupby(['Minute','State','Action']).count().reset_index()
goldMDP3['Prob'] = goldMDP3['Counter']/(goldMDP3['Counter'].sum())
goldMDP3.head()

Out[32]:

  Minute State Action End Counter Prob
0 1 EVEN +KILLS 70 70 0.000285
1 1 EVEN -KILLS 80 80 0.000325
2 1 EVEN NONE 4771 4771 0.019412
3 2 EVEN +KILLS 228 228 0.000928
4 2 EVEN -KILLS 269 269 0.001094

In [33]:

goldMDP4 = goldMDP2.merge(goldMDP3[['Minute','State','Action','Prob']], how='left',on=['Minute','State','Action'] )
goldMDP4.head(20)

Out[33]:

  Minute State Action End Counter Prob_x Prob_y
0 1 EVEN +KILLS EVEN 70 0.000285 0.000285
1 1 EVEN -KILLS EVEN 80 0.000325 0.000325
2 1 EVEN NONE EVEN 4770 0.019408 0.019412
3 1 EVEN NONE SLIGHTLY_AHEAD 1 0.000004 0.019412
4 2 EVEN +KILLS EVEN 228 0.000928 0.000928
5 2 EVEN -KILLS EVEN 269 0.001094 0.001094
6 2 EVEN NONE EVEN 4457 0.018134 0.018151
7 2 EVEN NONE SLIGHTLY_AHEAD 2 0.000008 0.018151
8 2 EVEN NONE SLIGHTLY_BEHIND 2 0.000008 0.018151
9 2 SLIGHTLY_AHEAD NONE SLIGHTLY_AHEAD 1 0.000004 0.000004
10 3 EVEN +DRAGON EVEN 5 0.000020 0.000020
11 3 EVEN +KILLS EVEN 592 0.002409 0.002421
12 3 EVEN +KILLS SLIGHTLY_BEHIND 3 0.000012 0.002421
13 3 EVEN +OUTER_TURRET EVEN 437 0.001778 0.001778
14 3 EVEN -DRAGON EVEN 6 0.000024 0.000024
15 3 EVEN -KILLS EVEN 632 0.002571 0.002592
16 3 EVEN -KILLS SLIGHTLY_BEHIND 5 0.000020 0.002592
17 3 EVEN -OUTER_TURRET EVEN 422 0.001717 0.001717
18 3 EVEN NONE EVEN 3389 0.013789 0.013895
19 3 EVEN NONE SLIGHTLY_AHEAD 12 0.000049 0.013895

In [34]:

goldMDP4['GivenProb'] = goldMDP4['Prob_x']/goldMDP4['Prob_y']
goldMDP4 = goldMDP4.sort_values('GivenProb',ascending=False)
goldMDP4['Next_Minute'] = goldMDP4['Minute']+1
goldMDP4[(goldMDP4['State']!=goldMDP4['End'])&(goldMDP4['Counter']>10)&(goldMDP4['State']!="WIN")&(goldMDP4['State']!="LOSS")].head(10)

Out[34]:

  Minute State Action End Counter Prob_x Prob_y GivenProb Next_Minute
5397 31 BEHIND -BASE_TURRET VERY_BEHIND 20 0.000081 0.000102 0.800000 32
5378 31 BEHIND +INHIBITOR VERY_BEHIND 17 0.000069 0.000094 0.739130 32
4483 28 BEHIND -BASE_TURRET VERY_BEHIND 14 0.000057 0.000081 0.700000 29
5642 32 AHEAD +INHIBITOR VERY_AHEAD 17 0.000069 0.000102 0.680000 33
6726 35 SLIGHTLY_AHEAD +INNER_TURRET AHEAD 12 0.000049 0.000073 0.666667 36
4117 27 AHEAD +BASE_TURRET VERY_AHEAD 13 0.000053 0.000081 0.650000 28
4124 27 AHEAD +INHIBITOR VERY_AHEAD 11 0.000045 0.000069 0.647059 28
4771 29 BEHIND +INHIBITOR VERY_BEHIND 18 0.000073 0.000114 0.642857 30
3553 25 AHEAD +BASE_TURRET VERY_AHEAD 12 0.000049 0.000077 0.631579 26
5800 32 SLIGHTLY_AHEAD +INNER_TURRET AHEAD 12 0.000049 0.000077 0.631579 33

Reinforcement Learning AI Model

Now that we have our data modelled as an MDP, we can apply Reinforcement Learning. In short, this applied a model that simulates thousands of games and learns how good or bad each decision is towards reaching a win given the team’s current position.

What makes this AI is its ability to learn from its own trial and error experience. It starts with zero knowledge about the game but, as it is rewarded for reaching a win and punished for reaching a loss, it begins to recognise and remember which decisions are better than others. Our first models start with no knowledge but I later demonstrate the impact initial information about decisions can be fed into the model to represent a person’s preferences.

So how is the model learning? In short, we use Monte Carlo learning whereby each episode is a simulation of a game based on our MDP probabilities and depending on the outcome for the team, our return will vary (+1 terminal reward for win and -1 terminal reward for loss). The value of each action taken in this episode is then updated accordingly based on whether the outcome was a win or loss.

In Monte Carlo learning, we have a parameter 'gamma' that discounts the rewards and will give a higher value to immediate steps than later one. In our model, this can be understood by the fact that as we reach later stages of the games, the decisions we make will have a much larger impact on the final outcome than those made in the first few minutes. For example, losing a team fight in minute 50 is much more likely to lead to a loss than losing a team fight in the first 5 minutes.

We can re-apply our model from part 1 with some minor adjustments for our new MDP.

In [35]:

def MCModelv4(data, alpha, gamma, epsilon, reward, StartState, StartMin, StartAction, num_episodes, Max_Mins):
    
    # Initiatise variables appropiately
    
    data['V'] = 0
    data_output = data
    
    outcomes = pd.DataFrame()
    episode_return = pd.DataFrame()
    actions_output = pd.DataFrame()
    V_output = pd.DataFrame()
    
    
    Actionist = [
       'NONE',
       'KILLS', 'OUTER_TURRET', 'DRAGON', 'RIFT_HERALD', 'BARON_NASHOR',
       'INNER_TURRET', 'BASE_TURRET', 'INHIBITOR', 'NEXUS_TURRET',
       'ELDER_DRAGON']
    
    
    for e in range(0,num_episodes):
        action = []
        
        current_min = StartMin
        current_state = StartState
        
        
        
        data_e1 = data
    
    
        actions = pd.DataFrame()

        for a in range(0,100):
            
            action_table = pd.DataFrame()
       
            # Break condition if game ends or gets to a large number of mins 
            if (current_state=="WIN") | (current_state=="LOSS") | (current_min==Max_Mins):
                continue
            else:
                if a==0:
                    data_e1=data_e1
                   
                elif (len(individual_actions_count[individual_actions_count['Action']=="+RIFT_HERALD"])==1):
                    data_e1_e1 = data_e1[(data_e1['Action']!='+RIFT_HERALD')|(data_e1['Action']!='-RIFT_HERALD')]
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="-RIFT_HERALD"])==1):
                    data_e1 = data_e1[(data_e1['Action']!='+RIFT_HERALD')|(data_e1['Action']!='-RIFT_HERALD')]
                
                elif (len(individual_actions_count[individual_actions_count['Action']=="+OUTER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+OUTER_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-OUTER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-OUTER_TURRET']
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="+INNER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+INNER_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-INNER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-INNER_TURRET']
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="+BASE_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+BASE_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-BASE_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-BASE_TURRET']
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="+INHIBITOR"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+INHIBITOR']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-INHIBITOR"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-INHIBITOR']
                elif (len(individual_actions_count[individual_actions_count['Action']=="+NEXUS_TURRET"])==2):
                    data_e1 = data_e1[data_e1['Action']!='+NEXUS_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-NEXUS_TURRET"])==2):
                    data_e1 = data_e1[data_e1['Action']!='-NEXUS_TURRET']
                
                       
                else:
                    data_e1 = data_e1
                    
                # Break condition if we do not have enough data    
                if len(data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)])==0:
                    continue
                else:             

                    if (a>0) | (StartAction is None):
                        random_action = data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)].sample()
                        random_action = random_action.reset_index()
                        current_action = random_action['Action'][0]
                    else:
                        current_action =  StartAction


                    data_e = data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)&(data_e1['Action']==current_action)]

                    data_e = data_e[data_e['GivenProb']>0]





                    data_e = data_e.sort_values('GivenProb')
                    data_e['CumProb'] = data_e['GivenProb'].cumsum()
                    data_e['CumProb'] = np.round(data_e['CumProb'],4)


                    rng = np.round(np.random.random()*data_e['CumProb'].max(),4)
                    action_table = data_e[ data_e['CumProb'] >= rng]
                    action_table = action_table[ action_table['CumProb'] == action_table['CumProb'].min()]
                    action_table = action_table.reset_index()


                    action = current_action
                    next_state = action_table['End'][0]
                    next_min = current_min+1


                    if next_state == "WIN":
                        step_reward = 10*(gamma**a)
                    elif next_state == "LOSS":
                        step_reward = -10*(gamma**a)
                    else:
                        step_reward = -0.005*(gamma**a)

                    action_table['StepReward'] = step_reward


                    action_table['Episode'] = e
                    action_table['Action_Num'] = a

                    current_action = action
                    current_min = next_min
                    current_state = next_state


                    actions = actions.append(action_table)

                    individual_actions_count = actions


        actions_output = actions_output.append(actions)
                
        episode_return = actions['StepReward'].sum()

                
        actions['Return']= episode_return
                
        data_output = data_output.merge(actions[['Minute','State','Action','End','Return']], how='left',on =['Minute','State','Action','End'])
        data_output['Return'] = data_output['Return'].fillna(0)    
             
        data_output['V'] = data_output['V'] + alpha*(data_output['Return']-data_output['V'])
        data_output = data_output.drop('Return', 1)
        
        V_outputs = pd.DataFrame({'Episode':[e],'V_total':[data_output['V'].sum()]})
        V_output = V_output.append(V_outputs)
        
                
        if current_state=="WIN":
            outcome = "WIN"
        elif current_state=="LOSS":
            outcome = "LOSS"
        else:
            outcome = "INCOMPLETE"
        outcome = pd.DataFrame({'Epsiode':[e],'Outcome':[outcome]})
        outcomes = outcomes.append(outcome)

        
   


    return(outcomes,actions_output,data_output,V_output)
    

In [36]:

alpha = 0.1
gamma = 0.9
num_episodes = 100
epsilon = 0.1


goldMDP4['Reward'] = 0
reward = goldMDP4['Reward']

StartMin = 15
StartState = 'EVEN'
StartAction = None
data = goldMDP4

Max_Mins = 50
start_time = timeit.default_timer()


Mdl4 = MCModelv4(data=data, alpha = alpha, gamma=gamma, epsilon = epsilon, reward = reward,
                StartMin = StartMin, StartState=StartState,StartAction=StartAction, 
                num_episodes = num_episodes, Max_Mins = Max_Mins)

elapsed = timeit.default_timer() - start_time

print("Time taken to run model:",np.round(elapsed/60,2),"mins")
print("Avg Time taken per episode:", np.round(elapsed/num_episodes,2),"secs")
Time taken to run model: 0.64 mins
Avg Time taken per episode: 0.38 secs

In [37]:

Mdl4[1].head(10)

Out[37]:

  index Minute State Action End Counter Prob_x Prob_y GivenProb Next_Minute Reward V CumProb StepReward Episode Action_Num
0 1319 15 EVEN +RIFT_HERALD EVEN 26 0.000106 0.000130 0.812500 16 0 0 1.0000 -0.005000 0 0
0 1530 16 EVEN -OUTER_TURRET EVEN 119 0.000484 0.000679 0.712575 17 0 0 1.0000 -0.004500 0 1
0 1731 17 EVEN -DRAGON SLIGHTLY_AHEAD 4 0.000016 0.000220 0.074074 18 0 0 0.0926 -0.004050 0 2
0 1978 18 SLIGHTLY_AHEAD +DRAGON SLIGHTLY_AHEAD 35 0.000142 0.000236 0.603448 19 0 0 1.0000 -0.003645 0 3
0 2232 19 SLIGHTLY_AHEAD NONE SLIGHTLY_AHEAD 232 0.000944 0.001261 0.748387 20 0 0 1.0000 -0.003280 0 4
0 2420 20 SLIGHTLY_AHEAD +BASE_TURRET AHEAD 2 0.000008 0.000012 0.666667 21 0 0 1.0000 -0.002952 0 5
0 2555 21 AHEAD -INNER_TURRET SLIGHTLY_AHEAD 1 0.000004 0.000016 0.250000 22 0 0 0.2500 -0.002657 0 6
0 2901 22 SLIGHTLY_AHEAD -BASE_TURRET AHEAD 1 0.000004 0.000004 1.000000 23 0 0 1.0000 -0.002391 0 7
0 3038 23 AHEAD -INNER_TURRET AHEAD 5 0.000020 0.000020 1.000000 24 0 0 1.0000 -0.002152 0 8
0 3309 24 AHEAD -KILLS AHEAD 107 0.000435 0.000635 0.685897 25 0 0 1.0000 -0.001937 0 9

So we now have the values of each state action pair, we can use this to select the single next best step as being the action with the highest value given the current state.

In [38]:

final_output = Mdl4[2]

final_output2 = final_output[(final_output['Minute']==StartMin)&(final_output['State']==StartState)].sort_values('V',ascending=False).head(10)
final_output2

Out[38]:

  Minute State Action End Counter Prob_x Prob_y GivenProb Next_Minute Reward V
1853 15 EVEN -DRAGON EVEN 51 0.000208 0.000252 0.822581 16 0 0.200944
2066 15 EVEN -INNER_TURRET EVEN 14 0.000057 0.000073 0.777778 16 0 0.145974
3034 15 EVEN +DRAGON EVEN 49 0.000199 0.000305 0.653333 16 0 0.103227
2569 15 EVEN -OUTER_TURRET EVEN 133 0.000541 0.000769 0.703704 16 0 0.085860
7494 15 EVEN +OUTER_TURRET SLIGHTLY_AHEAD 27 0.000110 0.000655 0.167702 16 0 0.056043
6950 15 EVEN +INNER_TURRET SLIGHTLY_AHEAD 3 0.000012 0.000061 0.200000 16 0 0.013789
7679 15 EVEN -INNER_TURRET SLIGHTLY_BEHIND 3 0.000012 0.000073 0.166667 16 0 0.011424
7669 15 EVEN -RIFT_HERALD SLIGHTLY_BEHIND 6 0.000024 0.000146 0.166667 16 0 0.003259
2109 15 EVEN +KILLS EVEN 382 0.001554 0.002018 0.770161 16 0 0.001017
8552 15 EVEN +KILLS SLIGHTLY_BEHIND 60 0.000244 0.002018 0.120968 16 0 0.000975

In theory, we are now ready to run the model for as many episodes as possible so that the results will eventually converge to an optimal suggestion. However, it seems that we will need a lot of epsiodes with this current method to get anything close to convergence.

In [39]:

sum_V_episodes = Mdl4[3]

plt.plot(sum_V_episodes['Episode'],sum_V_episodes['V_total'])
plt.xlabel("Epsiode")
plt.ylabel("Sum of V")
plt.title("Cumulative V by Episode")
plt.show()

Model Improvements

There are many ways we can improve this, one solution would be to greatly simplify the problem by breaking the game into segments. Often these segments are referred to as early, mid and late game and would be we would only need to consider fewer steps to reach an end goal. In this case it would not be to lead to a win but rather aiming to be at an advantage and the end of, say, 10 minute intervals.

Another solution would be to use a model that learns quicker instead of the Monte Carlo method used. This often includes Q-learning or SARSA.

We will not consider doing this here as they would require a lot of work to re-adjust the code. Instead, we will improve the rate the model learns by the way it selects its actions. Currently, we can either define the first action or it will choose randomly which means it is. However, after this first action all subsequent actions are also chosen randomly which is causing the number of episodes needed to exponentially increase for convergence to occur.

Therefore, we will introduce a basic action selection method for these known as greedy selection. This means we select the best action the majority of the time therefore continually testing the success rate of the actions it thinks are best. It will also randomly select actions some of the time so that it can still explore states and doesnt get caught in a local maximum and not the optimal sulution.

Also parameters play a key part in how quickly the output will converge, none moreso than our alpha parameter. Although a small alpha will be more accurate, a larger alpha value will learn and therefore subsequently converge faster.

We will adjust our code to utilise both greedy selection and a reasonably large alpha as well as running for as many episodes as possible. More episodes means the runtime will be longer but because, at least at this stage, we are not attemping to have this running in real time there is no issue with it running for a while to find an optimal suggestion.

It takes approximately 10 mins to run the model for 1,000 episodes, we have also included a tracking system for the run progress which shows the current percentage (based on which episode) the loop is in. This is particuarly useful if running for anything more than a few minutes.

It was at this stage, I also notice that my method for applying the update rule was overriding previous knowledge if the action wasnt used in the current episode and so all results were converging back to zero after each episode. I fixed this by making it so if the state + action wasnt used in the episode, then it simply remians the same and will show in our results as being the flat parts of the lines.

Lastly, we output the value of each of our available actions given our start state in each episode so we can track the learning process for these and how, over many episodes, our optimal action is decided.

In [40]:

from IPython.display import clear_output

In [41]:

def MCModelv5(data, alpha, gamma, epsilon, reward, StartState, StartMin, StartAction, num_episodes, Max_Mins):
    
    # Initiatise variables appropiately
    
    data['V'] = 0
    data_output = data
    
    outcomes = pd.DataFrame()
    episode_return = pd.DataFrame()
    actions_output = pd.DataFrame()
    V_output = pd.DataFrame()
    
    
    Actionist = [
       'NONE',
       'KILLS', 'OUTER_TURRET', 'DRAGON', 'RIFT_HERALD', 'BARON_NASHOR',
       'INNER_TURRET', 'BASE_TURRET', 'INHIBITOR', 'NEXUS_TURRET',
       'ELDER_DRAGON']
        
    for e in range(0,num_episodes):
        clear_output(wait=True)
        
        action = []
        
        current_min = StartMin
        current_state = StartState
        
        
        
        data_e1 = data
    
    
        actions = pd.DataFrame()

        for a in range(0,100):
            
            action_table = pd.DataFrame()
       
            # Break condition if game ends or gets to a large number of mins 
            if (current_state=="WIN") | (current_state=="LOSS") | (current_min==Max_Mins):
                continue
            else:
                if a==0:
                    data_e1=data_e1
                   
                elif (len(individual_actions_count[individual_actions_count['Action']=="+RIFT_HERALD"])==1):
                    data_e1_e1 = data_e1[(data_e1['Action']!='+RIFT_HERALD')|(data_e1['Action']!='-RIFT_HERALD')]
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="-RIFT_HERALD"])==1):
                    data_e1 = data_e1[(data_e1['Action']!='+RIFT_HERALD')|(data_e1['Action']!='-RIFT_HERALD')]
                
                elif (len(individual_actions_count[individual_actions_count['Action']=="+OUTER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+OUTER_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-OUTER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-OUTER_TURRET']
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="+INNER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+INNER_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-INNER_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-INNER_TURRET']
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="+BASE_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+BASE_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-BASE_TURRET"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-BASE_TURRET']
                    
                elif (len(individual_actions_count[individual_actions_count['Action']=="+INHIBITOR"])==3):
                    data_e1 = data_e1[data_e1['Action']!='+INHIBITOR']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-INHIBITOR"])==3):
                    data_e1 = data_e1[data_e1['Action']!='-INHIBITOR']
                elif (len(individual_actions_count[individual_actions_count['Action']=="+NEXUS_TURRET"])==2):
                    data_e1 = data_e1[data_e1['Action']!='+NEXUS_TURRET']
                elif (len(individual_actions_count[individual_actions_count['Action']=="-NEXUS_TURRET"])==2):
                    data_e1 = data_e1[data_e1['Action']!='-NEXUS_TURRET']
                
                       
                else:
                    data_e1 = data_e1
                    
                # Break condition if we do not have enough data    
                if len(data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)])==0:
                    continue
                else:             

                    
                    # Greedy Selection:
                    # If this is our first action and start action is non, select greedily. 
                    # Else, if first actions is given in our input then we use this as our start action. 
                    # Else for other actions, if it is the first episode then we have no knowledge so randomly select actions
                    # Else for other actions, we randomly select actions a percentage of the time based on our epsilon and greedily (max V) for the rest 
                    
                    
                    if   (a==0) & (StartAction is None):
                        random_action = data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)].sample()
                        random_action = random_action.reset_index()
                        current_action = random_action['Action'][0]
                    elif (a==0):
                        current_action =  StartAction
                    
                    elif (e==0) & (a>0):
                        random_action = data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)].sample()
                        random_action = random_action.reset_index()
                        current_action = random_action['Action'][0]
                    
                    elif (e>0) & (a>0):
                        epsilon = epsilon
                        greedy_rng = np.round(np.random.random(),2)
                        if (greedy_rng<=epsilon):
                            random_action = data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)].sample()
                            random_action = random_action.reset_index()
                            current_action = random_action['Action'][0]
                        else:
                            greedy_action = (
                            
                                data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)][
                                    
                                    data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)]['V']==data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)]['V'].max()
                                
                                ])
                                
                            greedy_action = greedy_action.reset_index()
                            current_action = greedy_action['Action'][0]
                            
                  
                    
                        
                   
                            
                        
                        
                        
                    


                    data_e = data_e1[(data_e1['Minute']==current_min)&(data_e1['State']==current_state)&(data_e1['Action']==current_action)]

                    data_e = data_e[data_e['GivenProb']>0]





                    data_e = data_e.sort_values('GivenProb')
                    data_e['CumProb'] = data_e['GivenProb'].cumsum()
                    data_e['CumProb'] = np.round(data_e['CumProb'],4)


                    rng = np.round(np.random.random()*data_e['CumProb'].max(),4)
                    action_table = data_e[ data_e['CumProb'] >= rng]
                    action_table = action_table[ action_table['CumProb'] == action_table['CumProb'].min()]
                    action_table = action_table.reset_index()


                    action = current_action
                    next_state = action_table['End'][0]
                    next_min = current_min+1


                    if next_state == "WIN":
                        step_reward = 10*(gamma**a)
                    elif next_state == "LOSS":
                        step_reward = -10*(gamma**a)
                    else:
                        step_reward = -0.005*(gamma**a)

                    action_table['StepReward'] = step_reward


                    action_table['Episode'] = e
                    action_table['Action_Num'] = a

                    current_action = action
                    current_min = next_min
                    current_state = next_state


                    actions = actions.append(action_table)

                    individual_actions_count = actions
                    
        print("Current progress:", np.round((e/num_episodes)*100,2),"%")

        actions_output = actions_output.append(actions)
                
        episode_return = actions['StepReward'].sum()

                
        actions['Return']= episode_return
                
        data_output = data_output.merge(actions[['Minute','State','Action','End','Return']], how='left',on =['Minute','State','Action','End'])
        data_output['Return'] = data_output['Return'].fillna(0)    
             
            
        data_output['V'] = np.where(data_output['Return']==0,data_output['V'],data_output['V'] + alpha*(data_output['Return']-data_output['V']))
        
        data_output = data_output.drop('Return', 1)

        
        for actions in data_output[(data_output['Minute']==StartMin)&(data_output['State']==StartState)]['Action'].unique():
            V_outputs = pd.DataFrame({'Index':[str(e)+'_'+str(actions)],'Episode':e,'StartMin':StartMin,'StartState':StartState,'Action':actions,
                                      'V':data_output[(data_output['Minute']==StartMin)&(data_output['State']==StartState)&(data_output['Action']==actions)]['V'].sum()
                                     })
            V_output = V_output.append(V_outputs)
        
        if current_state=="WIN":
            outcome = "WIN"
        elif current_state=="LOSS":
            outcome = "LOSS"
        else:
            outcome = "INCOMPLETE"
        outcome = pd.DataFrame({'Epsiode':[e],'Outcome':[outcome]})
        outcomes = outcomes.append(outcome)

        
   


    return(outcomes,actions_output,data_output,V_output)
    

In [42]:

alpha = 0.3
gamma = 0.9
num_episodes = 1000
epsilon = 0.2


goldMDP4['Reward'] = 0
reward = goldMDP4['Reward']

StartMin = 15
StartState = 'EVEN'
StartAction = None
data = goldMDP4

Max_Mins = 50
start_time = timeit.default_timer()


Mdl5 = MCModelv5(data=data, alpha = alpha, gamma=gamma, epsilon = epsilon, reward = reward,
                StartMin = StartMin, StartState=StartState,StartAction=StartAction, 
                num_episodes = num_episodes, Max_Mins = Max_Mins)

elapsed = timeit.default_timer() - start_time

print("Time taken to run model:",np.round(elapsed/60,2),"mins")
print("Avg Time taken per episode:", np.round(elapsed/num_episodes,2),"secs")
Current progress: 99.9 %
Time taken to run model: 11.53 mins
Avg Time taken per episode: 0.69 secs

In [43]:

Mdl5[3].sort_values(['Episode','Action']).head(10)

Out[43]:

  Index Episode StartMin StartState Action V
0 0_+BASE_TURRET 0 15 EVEN +BASE_TURRET 0.000000
0 0_+DRAGON 0 15 EVEN +DRAGON 0.000000
0 0_+INNER_TURRET 0 15 EVEN +INNER_TURRET 0.000000
0 0_+KILLS 0 15 EVEN +KILLS 0.000000
0 0_+OUTER_TURRET 0 15 EVEN +OUTER_TURRET 0.000000
0 0_+RIFT_HERALD 0 15 EVEN +RIFT_HERALD 0.000000
0 0_-DRAGON 0 15 EVEN -DRAGON 0.000000
0 0_-INNER_TURRET 0 15 EVEN -INNER_TURRET 0.000000
0 0_-KILLS 0 15 EVEN -KILLS 0.000000
0 0_-OUTER_TURRET 0 15 EVEN -OUTER_TURRET -0.418229

We have also changed our sum of V output to only be for the possible actions from our start state as these are the only ones we are currently concerned with.

In [44]:

V_episodes = Mdl5[3]

plt.figure(figsize=(20,10))

for actions in V_episodes['Action'].unique():
    plot_data = V_episodes[V_episodes['Action']==actions]
    plt.plot(plot_data['Episode'],plot_data['V'])
plt.xlabel("Epsiode")
plt.ylabel("V")
plt.title("V for each Action by Episode")
#plt.show()

Out[44]:

Text(0.5,1,'V for each Action by Episode')

In [45]:

final_output = Mdl5[2]


final_output2 = final_output[(final_output['Minute']==StartMin)&(final_output['State']==StartState)]
final_output3 = final_output2.groupby(['Minute','State','Action']).sum().sort_values('V',ascending=False).reset_index()
final_output3[['Minute','State','Action','V']]

Out[45]:

  Minute State Action V
0 15 EVEN +DRAGON 2.166430
1 15 EVEN -KILLS 1.403234
2 15 EVEN -INNER_TURRET 1.101869
3 15 EVEN -DRAGON 0.555246
4 15 EVEN +RIFT_HERALD 0.529776
5 15 EVEN +OUTER_TURRET 0.218571
6 15 EVEN +BASE_TURRET 0.091745
7 15 EVEN +INNER_TURRET 0.058387
8 15 EVEN +KILLS -0.118718
9 15 EVEN -RIFT_HERALD -0.294434
10 15 EVEN NONE -0.589646
11 15 EVEN -OUTER_TURRET -1.191101

In [46]:

single_action1 = final_output3['Action'][0]
single_action2 = final_output3['Action'][len(final_output3)-1]

plot_data1 = V_episodes[(V_episodes['Action']==single_action1)]
plot_data2 = V_episodes[(V_episodes['Action']==single_action2)]

plt.plot(plot_data1['Episode'],plot_data1['V'], label = single_action1)
plt.plot(plot_data2['Episode'],plot_data2['V'], label = single_action2)
plt.xlabel("Epsiode")
plt.ylabel("V")
plt.legend()
plt.title("V by Episode for the Best/Worst Actions given the Current State")
plt.show()

Part 2 Conclusion

We have now fixed our issue highlighted in part one and have a MDP that takes into account cumulative success/failures in a match by defining our current/next states by a gold advtantage/disadvantage.

We have also made a number of improvements to our model but there are many aspects that could be further improved. These will be discussed further in our next part where we will introduce personal preferences to influence the model output.

猜你喜欢

转载自blog.csdn.net/XYYxyy55/article/details/81166352