Could you explain to someone who doesn’t know computers (or link to an explanation) why garbage collected languages are so much less efficient than languages with manual memory management? What is a human programmer doing with all those let or malloc and free statements that a compiler can’t do? (Or maybe does way more than necessary?)
this is a really complex question that has led to a lot of programmer arguments over the past few decades.
first it is necessary to distinguish between memory management strategies and the languages that favour them: many of the early languages that used automatic garbage collection did so for convenience and safety rather than speed, and since they provided many other features like dynamic dispatch and dynamic typing that are difficult to optimise they became known as “slow languages”.
on the other hand you can bolt automatic garbage collection onto C (the Boehm collector is the classic example), but raw pointers and pointer arithmetic force the collector to be conservative and stop it from moving objects around, so paradoxically the GC in a “fast language” not originally designed for it can be slower than the GC in a slow language that was.
if we ignore the rest of the language then the fastest memory management strategy is the “null garbage collector” that never attempts to recover unused memory. this sounds like a joke, but for short-lived processes (old-school inetd servers, the famous cruise missile example) you really can’t beat it, since malloc can be implemented with a single pointer increment and free is a no-op!
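here is a minimal sketch of that idea in Rust (the names BumpArena, alloc and free_all are made up for illustration): allocation is just a bounds check and an offset bump into a preallocated buffer, and nothing is ever freed individually.

```rust
// toy "null garbage collector" / bump allocator: alloc bumps an offset,
// individual frees don't exist, the whole buffer is reclaimed at once.
struct BumpArena {
    buf: Vec<u8>, // fixed backing storage
    next: usize,  // the single offset that gets incremented
}

impl BumpArena {
    fn with_capacity(cap: usize) -> Self {
        BumpArena { buf: vec![0; cap], next: 0 }
    }

    // allocation is a bounds check plus one increment
    fn alloc(&mut self, size: usize) -> Option<&mut [u8]> {
        if self.next + size > self.buf.len() {
            return None; // this arena is exhausted
        }
        let start = self.next;
        self.next += size;
        Some(&mut self.buf[start..start + size])
    }

    // there is no per-object free; the closest thing is throwing
    // the whole arena away (or just letting the process exit)
    fn free_all(&mut self) {
        self.next = 0;
    }
}

fn main() {
    let mut arena = BumpArena::with_capacity(1024);
    if let Some(slot) = arena.alloc(16) {
        slot.copy_from_slice(b"sixteen bytes!!!");
    }
    arena.free_all(); // everything gone in one go
}
```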
for longer-lived processes this shades into region-based memory management, where you tag your allocations and then throw them all away in one go at the end. for example, a compiler might do lots of little allocations while compiling a particular module but not need any of them once that module is finished, in which case it can drop the entire region at once instead of freeing each object individually, which reduces overhead.
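a toy version of a region in Rust (again, invented names, not a real library): every allocation is appended to the region, nothing is freed one at a time, and dropping the region at the end releases the whole batch at once.

```rust
// toy region: per-object allocations accumulate, and the whole region is
// released in one shot when it goes out of scope.
struct Region<T> {
    items: Vec<Box<T>>,
}

impl<T> Region<T> {
    fn new() -> Self {
        Region { items: Vec::new() }
    }

    // hand back an index instead of a reference to keep the example simple
    fn alloc(&mut self, value: T) -> usize {
        self.items.push(Box::new(value));
        self.items.len() - 1
    }

    fn get(&self, idx: usize) -> &T {
        &self.items[idx]
    }
}

fn main() {
    let mut region = Region::new();
    let a = region.alloc(String::from("temporary from compiling module X"));
    let b = region.alloc(String::from("another temporary"));
    println!("{} / {}", region.get(a), region.get(b));
} // `region` is dropped here: every allocation in it goes away together
```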
if you don’t know anything about regions then you have to use a more generic strategy, like a generational garbage collector that assumes most allocated objects are short-lived and keeps them in a “nursery” region until they live long enough to get copied out, etc. this starts to take more work, though, as you are tracing through objects at runtime to determine lifetime relationships that ideally would be known at compile time.
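a deliberately oversimplified sketch of the nursery idea (invented types, no pointers between objects, no write barriers): a minor collection traces the root set, promotes the survivors into the tenured space, and throws the rest of the nursery away wholesale.

```rust
// toy generational heap: young objects live in a nursery that is traced and
// emptied often; survivors get promoted to the tenured space.
#[derive(Clone, Debug)]
struct Obj {
    payload: String,
}

#[derive(Clone, Copy)]
enum Handle {
    Young(usize), // index into the nursery
    Old(usize),   // index into the tenured space
}

struct Heap {
    nursery: Vec<Obj>,
    tenured: Vec<Obj>,
}

impl Heap {
    fn new() -> Self {
        Heap { nursery: Vec::new(), tenured: Vec::new() }
    }

    fn alloc(&mut self, payload: &str) -> Handle {
        self.nursery.push(Obj { payload: payload.to_string() });
        Handle::Young(self.nursery.len() - 1)
    }

    // minor collection: anything in the nursery still reachable from a root
    // is copied out (promoted); everything else dies when the nursery is cleared.
    fn minor_gc(&mut self, roots: &mut [Handle]) {
        for root in roots.iter_mut() {
            if let Handle::Young(i) = *root {
                self.tenured.push(self.nursery[i].clone());
                *root = Handle::Old(self.tenured.len() - 1);
            }
        }
        self.nursery.clear(); // the short-lived majority vanishes in one go
    }
}

fn main() {
    let mut heap = Heap::new();
    let mut roots = vec![heap.alloc("long-lived config")];
    heap.alloc("short-lived scratch value"); // nothing keeps this alive
    heap.minor_gc(&mut roots);
    println!("tenured: {}, nursery: {}", heap.tenured.len(), heap.nursery.len()); // 1, 0
}
```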
many programs that ostensibly use “manual” memory management are actually using reference counting, in which case the programmer has already thrown up their hands and admitted they don’t know how long these objects are going to live so they need the program to check them at runtime; reference counting that traces the dead objects ends up being the dual of garbage collection that traces the live objects (see the paper A Unified Theory of Garbage Collection) and which one ends up faster can depend on the workload.
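in Rust you can watch that runtime bookkeeping happen with the standard library's Rc: the "is this the last owner?" question is answered by a counter at run time rather than by the programmer or the compiler.

```rust
use std::rc::Rc;

fn main() {
    // neither clone statically knows whether it is the last owner, so the
    // check is deferred to a runtime count.
    let shared = Rc::new(String::from("shared buffer"));
    let other = Rc::clone(&shared);
    println!("count = {}", Rc::strong_count(&shared)); // 2
    drop(other);
    println!("count = {}", Rc::strong_count(&shared)); // 1
} // `shared` is dropped here, the count hits zero, and the String is freed
```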
automatic garbage collection also gets better when you have more RAM: you can afford to waste more of it by running the collector less frequently, which reduces the total amount of tracing you have to do, and RAM is much cheaper than programmer time for most applications these days.
anyway I’m kind of rambling. for a fast language you want to be able to express as much as possible about your object lifetimes statically at compile time (see Rust for some simplistic examples of this) and use a garbage collection strategy suited to your workload for the dynamic objects.
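a tiny example of what "statically" buys you in Rust: the compiler knows exactly where the vector's lifetime ends, so it is deallocated at a fixed point in the code with no tracing at run time.

```rust
fn main() {
    let sum;
    {
        let data = vec![1, 2, 3]; // lifetime is known at compile time
        sum = data.iter().sum::<i32>(); // only the computed value escapes the block
        // a reference to `data` outliving this block would be rejected by the
        // borrow checker at compile time
    } // `data` is deallocated right here, deterministically
    println!("sum = {}", sum);
}
```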
Programmers can look at a piece of data and say “I’m certain that this won’t be needed again by this point in the program, so I can throw it away now”, whereas most languages are not structured in a way that allows compilers to do the same, so they have to make those decisions dynamically at run time, which comes with a performance cost.
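for example, in Rust that human knowledge can be written down as an explicit drop partway through a function (the buffer size here is just for illustration):

```rust
fn main() {
    let big = vec![0u8; 64 * 1024 * 1024]; // a large temporary buffer
    let checksum: u64 = big.iter().map(|&b| b as u64).sum();
    drop(big); // the programmer knows `big` won't be needed past this point
    // ...the rest of the program runs without holding the 64 MiB alive...
    println!("checksum = {}", checksum);
}
```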
right, although to “throw it away now” on a principled basis requires proof that you are holding the last live reference to that object, which most languages are not capable of actually proving, which is why relying on the programmer introduces so many pernicious memory-safety bugs.
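Rust makes that proof obligation visible: trying to destroy a value while another live reference to it still exists is rejected at compile time instead of turning into a use-after-free at run time.

```rust
fn main() {
    let data = vec![1, 2, 3];
    let alias = &data; // a second live reference to the same allocation
    // drop(data);     // rejected at compile time: `data` is still borrowed by `alias`
    println!("{:?}", alias);
}
```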
Rust solves a limited case of this with unique references, but the general case remains, in industry parlance, a bitch.
I mean, technically yes, but that limited case is pretty damn comprehensive. I can’t think of many situations where I’ve actually needed to use reference counting aside from concurrent shit, in which case you’d probably need some kind of runtime locking system even if you were writing in C. Or, to put it another way, I don’t think there are many situations where I would be comfortable declaring that it’s safe to free but trusty ol’ borrowck wouldn’t, even if such situations do in theory exist.
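a sketch of that concurrent case: the threads' lifetimes don't nest statically, so the "who frees it" decision falls to Arc's runtime count, with a Mutex playing the role of the runtime locking.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

fn main() {
    // no thread statically outlives the others, so ownership is shared via a
    // reference count and access is serialised by a lock at run time.
    let shared = Arc::new(Mutex::new(Vec::new()));
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || shared.lock().unwrap().push(i))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    println!("{:?}", shared.lock().unwrap()); // the last Arc standing frees the Vec
}
```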
the obvious case is any tree structure with references that go up instead of down, which is incredibly common and just a generalisation of the circular linked list case, which Rust cannot describe without reference counting, unsafe code, or indirection through integer indices (but I repeat myself).
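for reference, this is what the upward-reference case usually ends up looking like in safe Rust, reaching for Rc/Weak plus RefCell precisely because the borrow checker can't express the lifetime relationship directly (node names invented):

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// a node that can be reached from its parent and can also find its parent;
// Weak breaks what would otherwise be a reference-counting cycle.
struct Node {
    name: String,
    parent: RefCell<Weak<Node>>,
    children: RefCell<Vec<Rc<Node>>>,
}

fn main() {
    let root = Rc::new(Node {
        name: "root".to_string(),
        parent: RefCell::new(Weak::new()),
        children: RefCell::new(Vec::new()),
    });
    let child = Rc::new(Node {
        name: "child".to_string(),
        parent: RefCell::new(Rc::downgrade(&root)),
        children: RefCell::new(Vec::new()),
    });
    root.children.borrow_mut().push(Rc::clone(&child));

    // walking back up requires a runtime upgrade of the weak reference
    let parent_name = child.parent.borrow().upgrade().unwrap().name.clone();
    println!("{} -> parent {}", child.name, parent_name);
}
```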