The many-core future
Jul. 2nd, 2008 12:56 pm
If you're not already following Ars Technica, I recommend today's post there about the upcoming changes to hardware. It's not precisely new, but it does underline what I've been saying for the past couple of years: the time of Massively Multicore is upon us.
Everybody's getting used to having two or maybe even eight cores in a computer, and you can almost kind of ignore that in most cases -- after all, you're just writing one process among several on the machine, so if you're limited to one core it's not a big deal. You might even tweak the program to use a few threads. But Intel is now talking seriously about architectures that range from dozens to *thousands* of little cores working together. You can't ignore that if you're going to be doing any kind of serious programming.
There's a message here, and it's an important one if you're in the field: if you're not already good at multi-threaded programming, you need to *get* good at it. There are probably deep changes to programming coming soon -- they're hard to predict at this point, but the root message is that you're going to need to understand threading pretty well. Or you're going to need to learn the languages that are inherently threadsafe, like Erlang. Or both. If not, you risk confining yourself to the limited subset of the field where threading can be ignored. (E.g., the simpler forms of web-server programming, but not the really interesting bits.)
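To make that concrete, here's a minimal Java sketch of the kind of bug that bites you if you haven't internalized threading -- eight threads bumping a shared counter, with and without synchronization. (The class and names are just for illustration; nothing here comes from the Ars piece.)

```java
import java.util.concurrent.atomic.AtomicLong;

// A minimal sketch of the classic "lost update" race: eight threads bump a
// shared counter. The plain long loses increments; the AtomicLong does not.
public class LostUpdateDemo {
    static long plainCounter = 0;                             // unsynchronized shared state
    static final AtomicLong safeCounter = new AtomicLong(0);  // atomic read-modify-write

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(new Runnable() {
                public void run() {
                    for (int j = 0; j < 100000; j++) {
                        plainCounter++;                // race: two threads can read the same
                                                       // old value, so one update gets lost
                        safeCounter.incrementAndGet(); // atomic: never loses an update
                    }
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // plainCounter usually comes out well short of 800000; safeCounter never does.
        System.out.println("plain:  " + plainCounter);
        System.out.println("atomic: " + safeCounter.get());
    }
}
```

Nothing exotic -- but the broken version compiles, runs, and usually looks fine on a single core, which is exactly the problem.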
It's exciting stuff -- we may be looking at the most dramatic changes to the *art* of programming (rather than just its uses) in decades. But if you are seriously in this field, you need to be paying attention to it, and making sure your skills are up-to-date, or you risk dead-ending your career...
(no subject)
Date: 2008-07-02 05:27 pm (UTC)
In the graphics, scientific, and engineering fields, I get it, but what about the business applications side?
(no subject)
Date: 2008-07-02 11:11 pm (UTC)
Consider: most business apps nowadays are *still* pushing pretty hard at the CPU. (As is often pointed out, Word today runs no faster on today's hardware than Wordstar did on my Z80 25 years ago.) Not every moment, and not for every function, but all sorts of functionality is winding up needing heavy resources.
Will there be *aspects* of business apps that can be done single-threaded? Probably. But frankly, I think that even that will be going away -- most of those sorts of applications will be replaced with higher-level views, rather than today's simple languages. For instance, while I judge Windows Workflow Foundation a fairly mediocre first pass, I suspect that it is a sign of where business programming is going in the medium-term: to high-level event-driven systems that don't quite look like modern programs, so that they can scale...
(no subject)
Date: 2008-07-03 12:55 pm (UTC)
The only reason office automation tools (such as word processors) run no faster than they did 20+ years ago is code bloat. 8^) It is like "stuff". It expands to fill all available space....
However, I was actually thinking more along the lines of decision support systems and other back office applications. I can see a lot of potential for supporting e-Commerce systems and for speeding up queries into data warehousing systems, but how to apply this to specifying requirements for application design is escaping me at the moment. Of course, we'd have to make a good case for the business to upgrade to the new hardware in the first place. There are still plenty of linear programs running the world.
(no subject)
Date: 2008-07-03 02:58 pm (UTC)
So on the one hand, I do think that linear programming is going to become ever more problematic even in that space: people will continue to be more demanding in what those systems do, and current indications are that linear programs are never likely to run much faster than they do now. (Indeed, on the coming chips they may well run slower.) But linear programming may not be the best way to tackle decision-support problems *anyway*, and newer rule-based languages, which *will* scale well, are likely to become a more natural fit to the problem space as they mature.
(And yes, it'll probably take many years for the transition to happen. But we shouldn't forget the lesson of Y2K: when the changes come, they sometimes come with overwhelming speed, and sweep a lot of old code away rather suddenly. So rather than the old COBOL programmers losing their jobs gradually over the course of decades, many of them were put out of work almost overnight as their codebases went away...)
(no subject)
Date: 2008-07-02 06:39 pm (UTC)
Except in domains that are naturally at that level, the programmer shouldn't have to be thinking of thread management and where and when to thread; instead, they should be able to lay down the logic of a program without imposing order on its operations except where necessary, and let the compiler/interpreter do its own optimizing.
Something of a hybrid here are languages with a lot of matrix calculations and lazy evaluation (as perl 6 has, at least semantically). As long as you don't have a contract saying what order a matrix operator has to process stuff in, and can run lazy evaluations in parallel rather than sequence, you can get a lot of threading in under the hood without having to bother the programmer about it.
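As a rough sketch of what "under the hood" could look like in a conventional language (Java here, with made-up names -- nothing from Perl 6 itself): hand each row of a matrix operation to a thread pool, and since no ordering was ever promised, the pool is free to run the rows in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class RowwiseScale {
    // Scale every element of a matrix in place. The rows are independent and no
    // ordering is promised, which is exactly what lets the pool run them in parallel.
    static void scale(final double[][] m, final double k) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(
                Runtime.getRuntime().availableProcessors());
        List<Callable<Void>> rowJobs = new ArrayList<Callable<Void>>();
        for (final double[] row : m) {
            rowJobs.add(new Callable<Void>() {
                public Void call() {
                    for (int j = 0; j < row.length; j++) {
                        row[j] *= k;          // rows are disjoint, so no locking is needed
                    }
                    return null;
                }
            });
        }
        pool.invokeAll(rowJobs);              // blocks until every row has been processed
        pool.shutdown();
    }

    public static void main(String[] args) throws InterruptedException {
        double[][] m = { { 1, 2, 3 }, { 4, 5, 6 } };
        scale(m, 10);
        System.out.println(m[1][2]);          // prints 60.0
    }
}
```

The caller just calls scale(); the threading never shows up in its code.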
(no subject)
Date: 2008-07-02 11:15 pm (UTC)
Could be -- I've been making that particular point for a decade now. That said, if you're correct it means that programmers will need to rewire their brains: I'd be surprised if 5% have the slightest clue about how to work in such languages. (Even I can't say I have enough experience with them to be comfortable in that space: I still need to do something serious in Haskell one of these years.)
Something of a hybrid here are languages with a lot of matrix calculations and lazy evaluation
Sure -- that's one of the benefits of Fortress, still one of my favorite up-and-coming languages. And even fairly traditional languages like C# are starting to get in on this particular act, with library-level parallelism that can be slipped in without *too* much difficulty -- those may be stepping-stones towards languages that have such features truly built in...
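For a taste of what that library-level approach looks like in a traditional language (a Java stand-in of my own for the sort of thing the C# libraries do; the names here are invented): the program stays sequential, and just the one hot loop gets handed to a pool.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
    public static void main(String[] args) throws Exception {
        final double[] data = new double[1000000];
        for (int i = 0; i < data.length; i++) data[i] = i * 0.5;

        int nChunks = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(nChunks);
        List<Future<Double>> parts = new ArrayList<Future<Double>>();

        // The surrounding program stays sequential; only this loop is farmed out.
        int chunk = data.length / nChunks;
        for (int c = 0; c < nChunks; c++) {
            final int from = c * chunk;
            final int to = (c == nChunks - 1) ? data.length : from + chunk;
            parts.add(pool.submit(new Callable<Double>() {
                public Double call() {
                    double s = 0;
                    for (int i = from; i < to; i++) s += data[i];
                    return s;                       // one partial sum per chunk
                }
            }));
        }

        double total = 0;
        for (Future<Double> f : parts) total += f.get();  // collect the partial sums
        pool.shutdown();
        System.out.println("sum = " + total);
    }
}
```

That's the stepping-stone flavor: explicit chunking and explicit collection, with the language itself none the wiser.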
That was my first thought....
Date: 2008-07-02 09:50 pm (UTC)
Or time to have compilers that do this automatically... I'm still not clear on why multithreading is left in the hands of the programmer when it's *so* easy to get wrong in horrible, unfindable ways. It's like garbage collection -- one of the places where the abstraction bleed is still inescapable, because the state of the art is still crummy at handling this basic element of application infrastructure.
(True, GC is much better than it was 15-20 years ago; but it still can occupy a huge amount of processing space, in some languages (cough Java cough) seems to work at about 50% efficiency, and can routinely produce programs with memory leaks...)
Google's demonstrated that the MapReduce paradigm is a good example of a shift that leads to inherently parallelizable programs. I think it should be taught in Programming 201, just after recursion...
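For anyone who hasn't seen the shape of it, here's a toy single-machine sketch (made-up names, none of Google's actual machinery, and no distribution, shuffle, or fault tolerance): map each document to words, group by word, then reduce each group independently -- and it's the grouping that leaves every map and reduce call free to run in parallel.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ToyMapReduce {
    // Map phase: one document in, a list of words out (each implicitly counted as 1).
    static List<String> map(String document) {
        List<String> words = new ArrayList<String>();
        for (String w : document.toLowerCase().split("\\W+")) {
            if (w.length() > 0) words.add(w);
        }
        return words;
    }

    // Reduce phase: all the counts for one word in, a single total out.
    static int reduce(String word, List<Integer> counts) {
        int total = 0;
        for (int c : counts) total += c;
        return total;
    }

    public static void main(String[] args) {
        String[] docs = { "the cat sat", "the cat ran", "a dog sat" };

        // Group: collect every emitted 1 under its word. In a real framework this
        // grouping is what gets distributed across machines, and each map() and
        // each reduce() call can then run in parallel with all the others.
        Map<String, List<Integer>> grouped = new HashMap<String, List<Integer>>();
        for (String doc : docs) {
            for (String word : map(doc)) {
                if (!grouped.containsKey(word)) grouped.put(word, new ArrayList<Integer>());
                grouped.get(word).add(1);
            }
        }

        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            System.out.println(e.getKey() + " -> " + reduce(e.getKey(), e.getValue()));
        }
    }
}
```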
Re: That was my first thought....
Date: 2008-07-03 03:02 am (UTC)
Well, largely because it's difficult/impossible to do automatically in traditional languages, and people are used to traditional languages. To make it automatic, you have to change how people *think* about programs, and that's not easy.
Google's demonstrated that the MapReduce paradigm is a good example of a shift that leads to inherently parallelizable programs.
Oh, absolutely, and it wouldn't surprise me if that proves to be one of the primary shifts -- that that will push down from "architecture" to "libraries" to "language" pretty steadily. But that simply underscores my point that a programmer can't afford to be complacent in how he programs nowadays, because tomorrow it ain't gonna be the same.
(Which reminds me that I really need to get more comfortable with MapReduce myself: I've never worked with that particular architecture, and I owe it to myself to grok it deeply enough to know when to use it...)
Re: That was my first thought....
Date: 2008-07-03 03:15 am (UTC)
This is kind of why I brought up the GC example. When I was first taught programming, "automatic" garbage collection was some sort of weird voodoo that no one quite believed in, and you had to be very careful to make sure you free()d things, and such.
Wind forward some years, and Java's GC (while slow and inefficient) is essentially foolproof, barring a few memory leaks over the years (like Strings). And I'm hoping there's been another 15 years of progress since then that has improved this situation.
Parallelically, I'm hoping that there's something we're all missing about multithreading along the same lines -- that some minor change in programming, possibly involving an extra layer of abstraction (by analogy with the functional -> OOP shift), will mean that we get multithreading for free. And no one will have to write locks or monitors or whatever, ever again, because they're just too easy to get horribly wrong.
And a pony!
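To be concrete about "horribly wrong", here's the canonical failure in a deliberately broken little Java sketch (my own, names invented): two threads take the same two locks in opposite orders, and the program simply hangs -- no exception, no stack trace pointing at the bug.

```java
public class DeadlockDemo {
    static final Object lockA = new Object();
    static final Object lockB = new Object();

    public static void main(String[] args) {
        // Thread 1 takes A, then B...
        new Thread(new Runnable() {
            public void run() {
                synchronized (lockA) {
                    pause(50);                      // widen the window so the hang is reliable
                    synchronized (lockB) {
                        System.out.println("thread 1 got both");
                    }
                }
            }
        }).start();

        // ...while thread 2 takes B, then A. Each ends up waiting on the lock
        // the other already holds: a deadlock, forever.
        new Thread(new Runnable() {
            public void run() {
                synchronized (lockB) {
                    pause(50);
                    synchronized (lockA) {
                        System.out.println("thread 2 got both");
                    }
                }
            }
        }).start();
        // The fix: pick one global lock order (always A before B) and never deviate.
    }

    static void pause(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { }
    }
}
```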
Re: That was my first thought....
Date: 2008-07-03 02:49 pm (UTC)
Honestly, my suspicion is not. Consider: people have been working on the parallelism problem a lot longer than the GC one. When I mentioned the current push towards multi-core architectures to my father a few months back, and the challenges it posed for programmers, he pointed out that sure, he was working on projects for that -- back in the 1960s. Everyone's known for many years that this day was coming: they've just pushed it off for longer than anyone originally thought possible through clever hardware wizardry.
My suspicion is that it *will* become automatic in the not terribly distant future -- but doing so will require a somewhat more major change to programming than that. Specifically, every promising-looking approach I've seen requires you to think about problems a little differently: tackling problems by decomposition, rather than as a sequence of instructions. The nature of those decompositions varies -- sometimes using an ecology of inter-communicating objects (as in Erlang), sometimes using a descriptive approach to programming (as in Haskell). But it's always about breaking the problem down into little pieces, and letting the recomposition happen automatically.
I don't know that that's really any harder in an absolute sense. But I'd bet it will leave some programmers who can't make the leap behind, the same way OOP did...
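To make "ecology of inter-communicating objects" a little less abstract, here's a rough Java approximation of the style (my own toy, not real Erlang, and the names are invented): each worker owns its own state, and the only way the outside world touches it is by dropping a message in its mailbox.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class MailboxDemo {
    // A worker in the message-passing style: private state, a mailbox, nothing shared.
    static class Counter implements Runnable {
        private final BlockingQueue<String> mailbox = new ArrayBlockingQueue<String>(64);
        private int count = 0;   // only ever touched by this worker's own thread

        void send(String msg) throws InterruptedException {
            mailbox.put(msg);    // the sole point of contact with the outside world
        }

        public void run() {
            try {
                while (true) {
                    String msg = mailbox.take();
                    if (msg.equals("stop")) break;
                    count++;
                    System.out.println("saw: " + msg + " (total " + count + ")");
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Counter counter = new Counter();
        new Thread(counter).start();
        counter.send("hello");
        counter.send("world");
        counter.send("stop");    // no locks anywhere: state never leaves its owner
    }
}
```

No locks appear anywhere, because no state is ever shared -- which is the whole trick.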
(no subject)
Date: 2008-07-04 02:35 pm (UTC)
OOP has been around for how long? And yet, you still see very procedural approaches even in Java, even when an OO solution is obviously superior. Indeed, look at any code base that supports OO ideas like classes and you'll often see, perhaps most of the time, those ideas eschewed for concepts that are presumably better understood by the programmer.
Programmers don't usually go multi-threaded unless it solves a problem, and even then not always. Unless some mechanism is presented that forces the concepts, I don't see this changing. Again, Java and objects: Java pushed the paradigm hard at the programmer, and still you see entire classes made of static methods and parallel arrays.
I actually looked into Erlang, mostly because it's time I learned some more declarative languages. I don't like the loose typing; I've really never liked loose typing. I want to like Python, but can't because of that. However, I recently read an interview with Bjarne Stroustrup (http://www.computerworld.com.au/index.php/id;408408016;fp;4194304;fpid;1;pf;1) where he talks a little about the next-gen C++ and concurrent programming. I believe the ability to leverage multiple cores will probably rely on smarter compilers and simple libraries, rather than better-informed programmers.
(no subject)
Date: 2008-07-04 04:25 pm (UTC)
Java pushed the paradigm hard at the programmer, and still you see entire classes made of static methods and parallel arrays.
*Twitch*. True -- but *twitch*.
I actually looked into Erlang, mostly because it's time I learned some more declarative languages. I don't like the loose typing; I've really never liked loose typing.
I dunno. I somewhat agree -- I've developed a fondness for strongly-typed languages over the years. That said, I don't mind *good* loosely-typed languages: Ruby remains a favorite of mine, for example. I'm intrigued by the next-gen Javascript dialects like ActionScript 3, which allow both models side-by-side.
I don't love Erlang, but that's a larger issue: I just find the language rather more idiosyncratic than it needs to be. I suspect that the same ideas could be put into a more mature language that I would appreciate more.
I believe the ability to leverage multiple cores will probably rely on smarter compilers and simple libraries, rather than better-informed programmers.
Perhaps -- but again, it's going to come down to How Many Cores. With a relatively modest number of cores, or special-purpose ones, libraries will do well enough: the programmer thinks mainly in terms of linear programs, and calls out the parallel bits explicitly.
But if they do get up to the terascale thousand-core systems, I really doubt that's going to hack it -- the linear parts of the program will turn into bottlenecks that prevent you from leveraging the system at all well, and bog things down badly. Smart compilers can only buy you so much, if they're being applied to current languages, because those languages just don't have the right *semantics* for automatic parallelization.
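That bottleneck intuition has a standard back-of-the-envelope form, Amdahl's law (my numbers here, not anything from the Intel material): with a fraction p of the work parallelizable across N cores, the best possible speedup is

\[
S(N) = \frac{1}{(1 - p) + p/N},
\qquad p = 0.95,\; N = 1000 \;\Rightarrow\; S \approx 19.6 .
\]

So even a program that is 95% parallel tops out below 20x no matter how many cores you throw at it; the serial 5% is the whole story, which is why tinkering at the library level only gets you so far.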
So at that point, I really suspect we're going to see a shift into newer languages that are more natively parallelizable -- languages that *do* allow the compiler to really make the program hum nicely on a massively parallel system. Those may not be as weird as Erlang, but I suspect that they will be at least like Fortress. (Which *defaults* to parallel processing unless you explicitly prevent it.)
The moral of the hardware story is that smart chips will only get you so far before you have to change paradigms. My strong suspicion is that the same will be true of software -- that smarter compilers can only get you so far before you have to change the language...