In my SLOG post for week 5, I put forward a theory that you could add additional parameters to a linked list node that would allow for much quicker index lookup. If in addition to self.next_, you added self.hundred_next which would store the node 100 nodes deeper, you could essentially leap over the list rather than walking it. I’m not sure what the optimal jump size would be, but I theorized that 100, 10,000, and 1,000,000 (adding 2 zeroes each time) would be a good start. Here, the largest number of leaps/steps a 100 million linked list would require would be 99 + 99 + 99 + 99. And a leap is no more taxing than walking a single node once the architecture is in place. 396 traversals feels like it would be a heck of a lot faster than 100 million (and of course it is).
This week, I decided to test this out, so I dusted off my lab 4 node.py file and implemented the additional code necessary into LinkedListNode.__init__, LinkedList.append, and LinkedList.__getitem__. An important note is that only nodes at index 0, 100, 200, etc need their self.hundred_next parameter to store a value since the index lookup of a number greater than 1000 would never see a node that isn’t a multiple of 100 until it’s within 100 of the index it’s looking for.
The results were interesting. I implemented the 100, 10,000, and 1,000,000 leaps and then built an 11 million node linked list. The lookup time of 10 large indices would round to 0.0 seconds. By comparison, without these additional parameters, it would take about 4 seconds to lookup these same indices. This is a fantastic increase in performance and if you expected to have to do a lot of index lookup actions, then implementing these parameters would certainly be worth it. But there’s certainly a downside.
Having to initialize 3 additional parameters for every node is not negligible when it comes to building a large linked list. That in addition to having to check and build connections each time you build a node that’s a multiple of a hundred caused the build time of the 11 million node linked list to more than double. Additionally, the nodes storing more data means that you use up your available memory that much quicker.
I haven’t yet implemented the insert and delete methods, but these operations would be significantly more taxing now. If we add or delete a node near the beginning of the linked list, we would essentially have to completely rebuild the node connections. A node insertion which would have previously been near instant could now take almost as long as it would take to create an entire linked list from scratch. Perhaps there are ways to intelligently write this code to make it quicker, but every hundredth node would need to have its parameters wiped, and every node one before it would need to have a new parameter added.
So did I stumble upon a breakthrough? Probably not. If you expect that you would modify your linked list very rarely in relation to how often you would be doing index lookups, then it may be worth it to implement these additional parameters, however, if you expect to make many modifications to your list, or even just don’t expect to be using __getitem__ often (perhaps you’re more interested in using __contains__ for your application), then it’s best to stick with the current simpler version… or just use a binary tree.