Post 12
Published:
Corrections to dict
(#2261)
There were issues with dict
collision resolution, insertion, and popping; both in Linear Probing and Separate Chaining implementations. For example,
- Linear Probing
from lpython import i32
def test():
d: dict[str, i32] = {'a': 1, 'a': 1}
print(len(d)) # 2
d.pop('a')
print(len(d)) # 1
d.pop('a')
print(len(d)) # 0
d.pop('a')
print(len(d)) # -1
test()
- Separate Chaining
from lpython import i32
def test():
d: dict[i32, i32] = {2: 1, 2: 1}
print(len(d)) # 1
d.pop(2)
print(len(d)) # 0
d.pop(2)
print(len(d)) # -1
d.pop(2)
print(len(d)) # -2
test()
I fixed these and several related issues, and would document the dict
implementation here. The set
implementation is very similar.
In LPython, we have implemented hash tables similar to CPython. CPython uses linear probing for collision resolution, and we provide both that and separate chaining options.
Hash computation: Similar to both CPython and C++, for integers it is the integer itself (modulo dictionary capacity). For strings, we use Polynomial Rolling Hash.
- Linear Probing
- Two lists are maintained - one for keys and one for values. Further, there is a list storing keymasks, that indicates if an element is present or not. The following mask values are used
- 0: no key set
- 1: key set without probing (i.e., hash value and position are same)
- 2: key set with probing
- 3: tombstone (i.e., key was deleted, and now this position is empty)
- Insertion and deletion occur by linear probing scheme.
- Two lists are maintained - one for keys and one for values. Further, there is a list storing keymasks, that indicates if an element is present or not. The following mask values are used
- Separate Chaining
- Here, a chain (linked list) of key-value pairs is maintained at each hash location. For example, to access the
i
th linked listllvm::Value* key_value_pairs = LLVM::CreateLoad(*builder, get_pointer_to_key_value_pairs(dict)); llvm::Value* dict_i = llvm_utils->create_ptr_gep(key_value_pairs, idx);
- This linked list has elements of the form
key, value, next
, wherenext
is a pointer to the next element in this LL (or null if none exists) - Dictionary insertion and deletion occur via these lists.
- Here, a chain (linked list) of key-value pairs is maintained at each hash location. For example, to access the
- Benchmarks
- Dict: Gist by Gagandeep Singh
- Set: Gist by me
This marks the end of my official GSoC contribution period. I had a lot of fun in the entire period. I learnt a lot of cool things: getting introduced to LLVM and implementing data structures for an actual compiler! Hope to keep contributing here and to open-source in general.
I would like to thank Ondřej Čertík for the conception of this wonderful compiler. My mentors Gagandeep Singh and Thirumalai Shaktivel were immensely helpful before and during this entire project. Thank you! Finally, thanks to all LPython contributors on whose work I could build on.