Reading around in CS-related forums made me realize that linked lists are not just an exercise we’ve been doing in class but a concept with real applications in the programming world. And if part of their appeal is that they don’t have to be stored in one contiguous block of memory the way an array-backed list does, then I’d assume that appeal only grows as the list itself grows.
So I’m imagining a linked list with at least a million nodes. The problem I’m wondering about is __getitem__ (or any iteration) becoming slow with a list that large. At least the way I set it up, we essentially “walk” through the linked list one node at a time to find the item at a given index, return or pass on the value, and (I presume) let our temporary reference into the list get garbage-collected. So for the memory we save by using a linked list, aren’t we paying in processor time every time we retrieve an item from the middle or end of such a gargantuan list, especially if we make many index lookups?
I’m thinking there may be a way to alleviate this problem by adding more parameters to our nodes. They currently store their value along with a next_node reference and possibly a prev_node reference, but what if they also stored the node 100 positions ahead, and 10,000 ahead, and 1,000,000 ahead? Then when we iterate, instead of “walking” along the list, we could jump, leap, or even fly over large chunks of it and reach the index we’re seeking dramatically faster. I will call these faster-than-walking parameters “super_nexts”.
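Just to make that concrete, here is a minimal sketch of what one of these nodes might look like (the class name Node is just my placeholder; the pointer names are the ones I use below, and I haven’t run this):

    class Node:
        """A doubly linked node with optional long-range "super_next" pointers."""
        def __init__(self, value):
            self.value = value
            self.prev_ = None          # node one position back
            self.next_ = None          # node one position ahead
            self.hundred_next = None   # node 100 positions ahead, if this node carries one
            self.ten_k_next = None     # node 10,000 positions ahead
            self.million_next = None   # node 1,000,000 positions ahead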
A __getitem__ equivalent could look something like this (just brainstorming the general idea, I did not try this code in Python):
def super_iterate(self, index):
    # Walk from the front, always taking the biggest jump that doesn't overshoot.
    target_index = index
    current_index = 0
    current_node = self.front
    while current_index < target_index:
        if current_node.million_next and target_index - current_index >= 1000000:
            current_node = current_node.million_next
            current_index += 1000000
        elif current_node.ten_k_next and target_index - current_index >= 10000:
            current_node = current_node.ten_k_next
            current_index += 10000
        elif current_node.hundred_next and target_index - current_index >= 100:
            current_node = current_node.hundred_next
            current_index += 100
        elif current_node.next_:
            current_node = current_node.next_
            current_index += 1
        else:
            raise IndexError("index out of range")
    return current_node
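Assuming the list object actually has a self.front attribute and the super_next pointers are in place (my_list here is hypothetical), using it would just look like:

    # Jump to a node deep in the list instead of walking a million steps.
    node = my_list.super_iterate(1234567)
    print(node.value)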
Now if we want to find a particular index, we can get there much, much faster than by visiting every node… think of a calculator adding up thousands at a time instead of counting 1 + 1 + 1 to reach a large number. Clearly this would make accessing nodes deep in the list easier… but what about storing them? Only the first node and the nodes sitting at exact multiples of our jump sizes actually need to carry these extra super_next parameters, since our count always starts at the front anyway. But then what if we remove a node? Or add one in the middle somewhere? Although searching has become much faster, modifying our linked list has become more complicated: not only do we need to ensure the pointers around that node are rewired appropriately, we may have to go through the entire linked list to ensure that our super_next jumps still land where they should.
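The brute-force version of that full pass might look something like this: re-walk the list once and re-point every jump from scratch (rebuild_super_nexts and self.front are names I’m assuming, and I haven’t tried this):

    def rebuild_super_nexts(self):
        # Gather every node in order with a plain one-step walk.
        nodes = []
        current = self.front
        while current:
            nodes.append(current)
            current = current.next_
        # Clear any stale jumps, then re-point them. Only nodes sitting at exact
        # multiples of a jump size need that jump, since super_iterate always
        # counts up from the front.
        for node in nodes:
            node.hundred_next = node.ten_k_next = node.million_next = None
        for jump, attr in ((100, "hundred_next"), (10000, "ten_k_next"), (1000000, "million_next")):
            for i in range(0, len(nodes), jump):
                if i + jump < len(nodes):
                    setattr(nodes[i], attr, nodes[i + jump])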
I think it’s doable, though. All the add/remove-type methods just need some additional code to accommodate this new requirement, and we can use the jump parameters themselves in that code so we don’t have to iterate one node at a time to re-derive every super_next. If we add a new node at index 1004, for example, we can iterate through the million-jumps, and for each stop beyond the new index:
# Shift every million_next target back by one node, since everything after
# the insertion point has slid forward one index.
current_node = self.front
while current_node.million_next:
    current_node.million_next = current_node.million_next.prev_
    # The old target now sits one past the repointed jump; continue from there.
    current_node = current_node.million_next.next_
And then the same again for the 10,000s and the 100s. I know this isn’t completely correct; for example, it doesn’t account for the case where the node we just introduced pushes our list size past a new threshold like 50,000. But given some fine-tuning, I can see this working, provided we include a .prev_ parameter in our nodes. I don’t imagine that having the majority of nodes hold million_next = None and so on takes up so much extra space that it isn’t worth it (though maybe I’m wrong when the numbers get huge). I do wonder whether the trade-off between the extra space per node and the speed of indexing is worth it, but I imagine it is, and now I’m kind of excited to attempt to implement this super_next indexing method for two-way linked lists.
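For what it’s worth, here is a rough sketch of how an insert could splice a node in using .prev_ and .next_, falling back on the full rebuild from above until the smarter pointer-shifting is fully worked out (insert_after, Node, and rebuild_super_nexts are all my assumed names, untested):

    def insert_after(self, node, value):
        # Splice a new node in between node and node.next_ using both pointers.
        new_node = Node(value)
        new_node.prev_ = node
        new_node.next_ = node.next_
        if node.next_:
            node.next_.prev_ = new_node
        node.next_ = new_node
        # Simplest (slow) way to keep the super_next jumps honest for now.
        self.rebuild_super_nexts()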
Please comment and let me know if you think this will work, if it’s worth it, or if something like this has already been done.