The code below is adapted from this implementation.
from math import floor
basehash = hash
class IHT:
"Structure to handle collisions"
def __init__(self, sizeval):
self.size = sizeval
self.overfullCount = 0
self.dictionary = {}
def count (self):
return len(self.dictionary)
def getindex (self, obj, readonly=False):
d = self.dictionary
if obj in d: return d[obj]
elif readonly: return None
size = self.size
count = self.count()
if count >= size:
if self.overfullCount==0: print('IHT full, starting to allow collisions')
self.overfullCount += 1
return basehash(obj) % self.size
else:
d[obj] = count
return count
def hashcoords(coordinates, m, readonly=False):
if type(m)==IHT: return m.getindex(tuple(coordinates), readonly)
if type(m)==int: return basehash(tuple(coordinates)) % m
if m==None: return coordinates
def tiles(ihtORsize, numtilings, floats, ints=[], readonly=False):
"""returns num-tilings tile indices corresponding to the floats and ints"""
qfloats = [floor(f*numtilings) for f in floats]
Tiles = []
for tiling in range(numtilings):
tilingX2 = tiling*2
coords = [tiling]
b = tiling
for q in qfloats:
coords.append( (q + b) // numtilings )
b += tilingX2
coords.extend(ints)
Tiles.append(hashcoords(coords, ihtORsize, readonly))
return Tiles
if __name__ == '__main__':
tc=IHT(4096)
tiles = tiles(tc, 8, [0, 0.5], ints=[], readonly=False)
print(tiles)
I'm trying to figure out how the function tiles()
works. It implements tile coding, which is explained in "Reinforcement Learning: An Introduction" (2020) Sutton and Barto on page 217.
So far I've figured out:
qfloats
rescales the floating numbers to be the largest integer less than or equal to the original floating number, these are then re-scaled by the number of tilings.Then, for each tiling, a list is created, the first element of the list is the tiling number, followed by the coordinates of the floats for that tiling, i.e. each of the re-scaled numbers is offset by the tiling number,
b
and integer divided bynumtilings
.Finally,
hashcoords
first checks the dictionaryd
to see if this list has appeared before, if it has it returns the number relating to that list. It not it either creates a new entry with that list at the key and the count as the value or if the count is more than or equal to the size it adds one tooverfullCount
and returnsbasehash(obj) % self.size
.
I'm struggling to understand two parts:
What is the
tilingX2
part doing?Why is
tilingX2
added tob
after the first coordinate has been calculated? It seems to me that each coordinate should be treated separatelyAnd why by a factor of 2?
What is the expression
basehash(obj) % self.size
doing? I'm quite new to the concept of hashing. I know that, generally, they create a unique number for a given input (up to a limit), but I'm really struggling to understand what is going on in the line above.