Chad Perrin: SOB

15 August 2009

Learning Memory Management

Filed under: Geek — apotheon @ 05:05

A lot of people talk about the benefits of garbage collection in a programming language. It’s true — the benefits are significant, and worth having available. On the other hand, there’s something to be said for memory management the “hard way”, particularly if you ever want to be a competent programmer.

The problem with only learning a single language, and selecting one with garbage collection, is that your software will probably become more and more prone to memory management issues. The more of the memory management nitty gritty your language implementation automates, the easier it is to write code without significant memory management problems, but the harder it is to use memory management well. Good memory management requires a lot more knowledge in a garbage collected language than in one where you need to allocate and deallocate “manually”.

With no automated memory management, you explicitly see everything your code is doing with memory right there in front of you, and it’s a relatively simple matter to allocate and deallocate. With a few rules of thumb and half a brain, you can turn that into well managed memory. Generally speaking, the keys to good memory management when you’re doing it all yourself are making good decisions about how much to allocate at a time and not forgetting to deallocate.

With a simplified, but effective, automated memory management system — such as reference counting — you have to think more abstractly to make sure you’re using memory management effectively. Scoping, circular references, and other challenges arise that are not as direct and conceptually simple to deal with. Sure, you can just ignore the memory management and assume it’ll be done for you, but you’re likely to end up with less efficient code if you don’t know what the reference counting system is doing for you. In certain edge cases you can still manage to create memory leaks and other problems.
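To make the circular reference problem concrete, here is a minimal sketch in Python. It assumes CPython, which uses reference counting backed by a separate cycle collector; the Node class and both helper functions are hypothetical names invented for this illustration. Switching the cycle collector off leaves only reference counting in play, so the effect of a cycle becomes visible.

```python
import gc
import weakref


class Node:
    """Hypothetical tree node, used only for this illustration."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []


def build_cycle():
    parent = Node("parent")
    child = Node("child")
    parent.children.append(child)
    child.parent = parent  # strong back-reference closes the cycle
    return weakref.ref(parent)  # lets us observe the object without keeping it alive


def build_without_cycle():
    parent = Node("parent")
    child = Node("child")
    parent.children.append(child)
    child.parent = weakref.proxy(parent)  # weak back-reference, so no cycle
    return weakref.ref(parent)


gc.disable()  # assumption: CPython; this leaves only reference counting active
try:
    cyclic = build_cycle()
    acyclic = build_without_cycle()

    # The cycle keeps both reference counts above zero after the function returns.
    cycle_survived = cyclic() is not None
    # Without the cycle, the count hits zero and the object is freed immediately.
    acyclic_freed = acyclic() is None

    # An explicit collection pass finds and frees the unreachable cycle.
    gc.collect()
    cycle_freed_by_gc = cyclic() is None
finally:
    gc.enable()
```

The weak back-reference is the sort of thing you only think to write if you know what the reference counter is doing on your behalf, which is rather the point.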

Finally, with an automated, full-featured garbage collector, you then have to think even more abstractly, to the point where nobody is likely to get it right all the time. Have you ever seen a Java application suddenly slow down in mid-operation for no apparent reason? Yeah, I see that semi-regularly too. If you don’t know enough about how the garbage collector works (and plan around that painstakingly), you could very well see your application slow to a crawl whenever the garbage collector runs. That can be deadly for certain types of software, of course, which is why garbage collected languages are not typically used for those types of software. There are also edge cases, even rarer than with reference counting but typically more difficult to reason through, where things may not be deallocated properly, which may lead to bugs. Getting memory management right with any certainty is more difficult with a garbage collector than with manual memory management and reference counting put together. You can get it “good enough” to think you got it right pretty easily, though (if you don’t know anything about memory management other than “the garbage collector does it for me”).
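One commonly cited example of things not being deallocated under automatic memory management is unintended retention: an object stays reachable through some long-lived structure, so no collector will ever reclaim it. The sketch below is in Python and assumes CPython; the Session class and both registries are hypothetical names made up for this illustration, and it is only one possible instance of the kind of edge case described above.

```python
import gc
import weakref


class Session:
    """Hypothetical object standing in for something expensive to keep around."""

    def __init__(self, user):
        self.user = user


# A long-lived dict holds strong references: its entries stay reachable forever.
strong_registry = {}
# A WeakValueDictionary drops an entry once nothing else refers to the value.
weak_registry = weakref.WeakValueDictionary()


def open_sessions():
    alice = Session("alice")
    bob = Session("bob")
    strong_registry[alice.user] = alice
    weak_registry[bob.user] = bob
    # Weak references let us check later whether each object is still alive.
    return weakref.ref(alice), weakref.ref(bob)


ref_alice, ref_bob = open_sessions()
gc.collect()  # a full collection pass does not help: alice is still reachable

retained = ref_alice() is not None  # kept alive by strong_registry
released = ref_bob() is None  # reclaimed once open_sessions() returned
bob_entry_gone = "bob" not in weak_registry
```

Nothing here is a bug in the collector. It is doing exactly what it was told, which is why knowing what it considers “still in use” matters.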

This is why learning to work without an automated memory management safety net is important, in my opinion. This is also why, sometimes, reference counting is the best way to go: it’s almost as automated as a garbage collector, and almost as conceptually simple as doing the memory management yourself.

In short, sometimes garbage collection is the best way to go, but it is not an excuse to just ignore memory management and figure it’s being done for you. It is being done for you, of course, but it’s probably not being done right unless you pay attention to how it is being done. You need to know about memory management to really get the full benefits of automated memory management.

This, of course, is why I so lament the fact that I don’t know more about memory management than I do. I know enough to deal effectively with Perl’s reference counting, and use it to best effect. I don’t, however, know it well enough to manage Ruby’s garbage collection as well as I’d like. I’m sure I’ll feel a lot better about the whole thing when I get around to (re)learning C to a level of some competency some time in the next couple of years.

I think it was Joel Spolsky who called the relevant principle the Law of Leaky Abstractions.

It was Sean Reifschneider who said “If Java had real garbage-collection, it would delete most programs before it executed them.”

All original content Copyright Chad Perrin: Distributed under the terms of the Open Works License