Post 9

2 minute read

Published: July 23, 2023

In the past couple of weeks, an implementation of sets was completed. This week, I worked on benchmarking this. I also started work on another implementation, and progressed on some earlier dictionary issues.

Set Benchmark (GitHub Gist)

This benchmarks the recent hash-set implementation that uses linear-probing for collision-resolution. We compare performance of this LPython data-structure with equivalent structures in Python (set) and C++ (unordered_set).
Performance of functions set.add and set.remove is tested.
- One way to do this is to simply insert and remove elements in order (say, from 1 to 1e7). However, this would not lead to collisions, as the set would be rehashed after reaching a size threshold (in our implementation, this is 0.6 of the capacity at any point).
- To add some more complexity, we use random numbers. To generate pseudo-random numbers without adding major performance overheads, Lehmer random number generator is used. In particular, the tests use Schrage’s method to avoid 64-bit division. This is a straightforward algorithm that uses linear modular arithmetic to generate random numbers.
As can be seen in the results, LPython outperforms both Python and C++. Surprisingly, C++ was slower than the equivalent Python code.
- When dealing with integers, LPython is more than 4.5 times faster than C++, and more than 3.6 times faster than Python.
- With strings, these gains are about 1.2 and 1.02, respectively.
For C++, we also tested with custom hash functions, to mimic those used in our LPython implementation. This did not affect the results much.

Set (Separate Chaining) (#2198)

- This is towards adding the separate-chaining collision-resolution technique. In contrast to linear probing, when collisions happen, we extend the pre-existing linked list of elements at that hash value. We already have it in dictionaries, but there are some issues in that implementation.

I created all the basic functions, and will fix remaining bugs in the coming week.
Dictionary (Keys and Values, continued) (#2023)
I had implemented this in a previous week to work with the linear-probing implementation of dictionaries. Now, I extended these functions to also work with the separate-chaining implementation.
We iterate over all the hash values, and if there is a corresponding linked list, we iterate over that, putting keys/values into a list, the latter returned at the end.

Share on

Twitter Facebook LinkedIn

Virendra Kabra

Post 9

Set Benchmark (GitHub Gist)

Set (Separate Chaining) (#2198)

Dictionary (Keys and Values, continued) (#2023)

Share on

You May Also Enjoy

GSoC 2023 Report

Post 12

Post 11

Post 10

Virendra Kabra

Set Benchmark (GitHub Gist)

​​​​​​​Set (Separate Chaining) (#2198)

​​​​​​Dictionary (Keys and Values, continued) (#2023)

Share on

You May Also Enjoy

GSoC 2023 Report

Post 12

Post 11

Post 10

Set (Separate Chaining) (#2198)

Dictionary (Keys and Values, continued) (#2023)