10 Comments
Apr 9, 2023·edited Apr 9, 2023

Isn't the case with packing multiple elements in a node essentially a different data structure? Yes, nominally it is a linked list in that you link nodes and content can be whatever, but packing elements makes it a hybrid, if you consider the element the basic data unit of the whole structure. Semantically it is to linked lists what a B-tree is to balanced trees. (And surely not a coincidence that in practice B-trees often exhibit massive performance benefits compared to "classic" balanced trees (e.g. BST, AVL, red-black) in that domain.)

It certainly is in tune with the composable data structures theme, but also likely not what people readily associate with the "linked list" moniker.

Expand full comment

Casey in his refterm lecture said the "array is faster than linked list" statement is "fake optimization". It's at best neutral or worse counter-productive https://youtu.be/pgoetgxecw8?t=722

What do you think about having two separate arrays: one for the nodes (which just hold a pointer to the next node and a pointer to the value) and one for the actual values?

Expand full comment

Thanks for the great post. It made me realize that I am also dogmatically leaning on contiguous arrays far too much "because cache", creating unnecessarily complex allocation code just to manage them.

I'm particularly interested in the idea of storing a number of entities per linked list node, to be able to handle a large number of total entities and still iterate over them with relative speed. However, I'm wondering how one would handle updating entity references when the block is updated with additions/deletions/insertions with such an arrangement?

My guess is each entity would require a pointer to the block, and an index to the entity within that block. To delete an entity, we might need to do something like marking it, and then freeing the block when all entities are marked? Maybe it's not the best approach for my particular situation: I also need to sort entities, which would ruin any references if using anything other than a single entity per node linked list.

Expand full comment

I had a really interesting scenario recently where I had a tree data structure implemented in left-child right-sibling (LCRS) fashion, where I stored the rank of each node in a separate, corresponding array to the array that held the Linked nodes. I realized I could actually keep the buffer containing the Linked nodes sorted by their rank (and the corresponding list of ranks would be sorted too, of course). This made sense in context because during an update I had to update the relevant pointers (offset into buffer in this case) anyway, so as long as I was expecting ranks to be incremented rather than jump around randomly I could perform the important query without actually having to look at the ranks, just by comparing the position in memory. It was a major win for memory locality! And because of the Linked nodes implementation you can do updates on existing data without any allocations at all. Of course using a non-LCRS, array-based version also has good cache locality in the typical way we expect, however updates require new arrays of different sizes to be allocated, data to be copied, and this pessimizes updates, which in the LCRS version do not need any heap allocations at all. It's a trade-off of course and not applicable in every scenario but I just wanted to throw it out there that there are cases where a Linked list implementation IMPROVES cache locality AND improves update speed.

Another technique that can be used is building a Linked list version of your structure first and then copying the data over to an array-based version, if the array-based version is indeed the better implementation for your structure but you don't want a million allocations and deallocations and copying each data element a dozen times each.

Expand full comment
Nov 9, 2022·edited Nov 9, 2022

I think that I will try to utilize linked lists more after reading this article. I feel a little embarrassed because I was definitely one of the people talking about the cache locality, but I would like to ask about one use case of linked lists that I have seen. If you had an isolated function that needed to return a list (like a string split, for example), could you use the way that arenas allocate memory to dynamically grow a traditional list, or are linked lists still better in the case? I have not tried to implement something like this, but I have a gut feeling that it might fall apart at some level. Nevertheless, this was an interesting read and I will look for places where linked lists would be applicable in my code in the future.

Expand full comment