Saturday, June 04, 2011

The Concurrency Myth

For nearly a decade now, technology pundits have been talking about the end of Moore's Law. Just this week, The Economist ran an article about how programmers are starting to learn functional programming languages to make use of the multi-core processors that have become the norm. Indeed, inventors of some of these newer languages, like Rich Hickey (Clojure) and Martin Odersky (Scala), love to talk about how their languages give developers a much better chance of dealing with the complexity of the concurrent programming that is needed to take advantage of multi-core CPUs. Earlier this week I was at the Scala Days conference and got to hear Odersky's keynote. Pretty much the entire second half of his keynote was on this topic. The message is being repeated over and over to developers: you have to write concurrent code, and you don't know how to do it very well. Is this really true, or is it just propaganda?

There is no doubt that the computers that we buy are now multi-core. Clock speeds on these computers have stopped going up. I am writing this blog post on a MacBook Air with a dual-core CPU running at 2.13 GHz. Five years ago I had a laptop with a 2.4 GHz processor. I'm not disputing that multi-core CPUs are the norm now, and I'm not going to hold my breath for a 4 GHz CPU. But what about this claim that it is imperative for developers to learn concurrent programming because of this shift in processors? First let's talk about which developers. I am only going to talk about application developers. What I mean are developers who are writing software that is directly used by people. Well, maybe I'll talk about other types of developers later, but I will at least start off with application developers. Why? I think most developers fall into this category, and I think these are the developers that are often the target of the "concurrency now!" arguments. It also allows me to take a top-down approach to this subject.

What kind of software do you use? Since you are reading this blog, I'm going to guess that you use a lot of web software. Indeed a lot of application developers can be more precisely categorized as web developers. Let's start with these guys. Do they need to learn concurrent programming? I think the answer is "no, not really." If you are building a web application, you are not going to do a lot of concurrent programming. It's hard to imagine a web application where one HTTP request comes in and a dozen threads (or processes, whatever) are spawned. Now I do think that event-driven programming like you see in node.js will become more and more common. It certainly breaks the assumption of a 1-1 mapping between request and thread, but it most certainly does not ask/require/suggest that the application developer deal with any kind of concurrency.
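
To make the event-driven point concrete, here is a minimal sketch in Python using the standard asyncio library. Python is just my choice for illustration, and the names handle_request and serve are hypothetical; this is not how node.js or any particular framework is implemented. It only shows that many in-flight requests can be interleaved on a single thread while the handler itself contains no locks or thread management.

```python
import asyncio


async def handle_request(request_id: int, hits: dict) -> str:
    # Simulated non-blocking I/O (say, a database call). While this handler
    # waits, the event loop is free to run other handlers.
    await asyncio.sleep(0.01)
    # No lock here: handlers run one at a time on the single event-loop
    # thread, and there is no await between the read and the write.
    hits["count"] += 1
    return f"response {request_id}"


async def serve(n_requests: int):
    # Hypothetical stand-in for a server accepting n_requests at once.
    hits = {"count": 0}
    responses = await asyncio.gather(
        *(handle_request(i, hits) for i in range(n_requests))
    )
    return responses, hits["count"]


if __name__ == "__main__":
    responses, count = asyncio.run(serve(100))
    print(len(responses), count)
```

The application code reads like ordinary sequential code with a few await points. Whatever concurrency exists lives in the event loop, not in what the application developer writes.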

The advancements in multi-core processors have definitely helped web applications. Commodity app servers can handle more and more simultaneous requests. When it comes to scaling up a web application, Moore's Law has not missed a beat. However, it has not required all of those PHP, Java, Python, and Ruby web developers to learn anything about concurrency. Now I will admit that such apps will occasionally do something that requires a background thread, etc. However, this has always been the case, and it is the exception to the type of programming usually needed by app developers. You may have one little section of code that does something concurrent, and it will be tricky. But this has nothing to do with multi-core CPUs.

Modern web applications are not just server apps though. They have a lot of client-side code as well, and that means JavaScript. The only formal concurrency model in JavaScript is Web Workers. This is a standard that has not yet been implemented by all browsers, so it has not seen much traction yet. It's hard to say if it will become a critical tool for JS development. Of course one of the most essential APIs in JS is XMLHttpRequest. This does indeed involve multiple threads, but again this is not exposed to the application developer.

Now one can argue that in the case of both server side and client side web technologies, there is a lot of concurrency going on but it is managed by infrastructure (web servers and browsers). This is true, but again this has always been the case. It has nothing to do with multi-core CPUs, and the most widely used web servers and browsers are written in languages like C++ and Java.

So is it fair to conclude that if you are building web applications, then you can safely ignore multi-core CPU rants? Can you ignore the Rich Hickeys and Martin Oderskys of the world? Can you just stick to your PHP and JavaScript? Yeah, I think so.

Now web applications are certainly not the only kind of applications out there. There are desktop applications and mobile applications. This kind of client development has always involved concurrency. Client app developers are constantly having to manage multiple threads in order to keep the user interface responsive. Again this is nothing new. This has nothing to do with multi-core CPUs. It's not as if app developers used to do everything in a single thread and, now that multi-core CPUs have arrived, suddenly need to start figuring out how to manage multiple threads (or actors or agents or whatever). Now perhaps functional programming can be used by these kinds of application developers. I think there are a lot of interesting possibilities here. However, I don't think the Hickeys and Oderskys of the world have really been going after developers writing desktop and mobile applications.
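
As a rough illustration of the pattern client developers have long used, here is a small Python sketch with the standard threading and queue modules. The names slow_computation and run_ui are made up, and the loop is only a stand-in for a real toolkit's event loop. Slow work goes to a background thread, and results are handed back to the "UI" thread over a thread-safe queue so that only one thread ever touches UI state.

```python
import queue
import threading


def slow_computation(n: int) -> int:
    # Stand-in for work that would freeze the interface if run on the UI thread.
    return sum(i * i for i in range(n))


def run_ui(job_sizes):
    # Thread-safe channel from the worker back to the "UI" thread.
    results = queue.Queue()

    def worker():
        for n in job_sizes:
            results.put((n, slow_computation(n)))
        results.put(None)  # sentinel: no more results

    threading.Thread(target=worker, daemon=True).start()

    shown = []  # pretend UI state, touched only by this thread
    while True:
        # A real toolkit would poll or receive a callback instead of blocking.
        item = results.get()
        if item is None:
            break
        shown.append(item)
    return shown


if __name__ == "__main__":
    print(run_ui([3, 4]))
```

Nothing in this pattern depends on how many cores the machine has. It works the same way on a single-core device, which is why it predates the multi-core era.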

So if you are a desktop or mobile application developer, should you be thinking about multi-core CPUs and functional programming? I think you should be thinking about it at least a little. Chances are you already deal with this stuff pretty effectively, but that doesn't mean there's no room for improvement. This would be especially true if language/runtime designers started thinking more about your use cases.

I said I was only going to talk about application developers, but I lied. There is another type of computing that is becoming more and more common, and that is distributed computing. Or is it called cloud computing? I can't keep up. The point is that there are a lot of software systems that run across a cluster of computers. Clearly this is concurrent programming, so bust out the functional programming or your head will explode, right? Well, maybe not. Distributed computing does not involve the kind of shared mutable state that functional programming can protect you from. Distributed map/reduce systems like Hadoop manage shared state complexity despite being written in Java. That is not to say that distributed systems cannot benefit from languages like Scala; it's just that the benefit is not necessarily the concurrency/functional programming angle that is often the selling point of these languages. I will say that Erlang/OTP and Scala/Akka do have a lot to offer distributed systems, but these frameworks address different problems than multi-core concurrency.
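
A toy word count shows the shape of a map/reduce job. This is a sketch in plain Python, not Hadoop's API, and map_chunk, merge, and word_count are hypothetical names. Each map step sees only its own chunk and returns a new value, and the reduce step merges values without mutating them, so there is no shared mutable state for workers to fight over. Here everything runs in one process; in a real cluster, the framework would ship chunks to different machines.

```python
from collections import Counter
from functools import reduce


def map_chunk(chunk: str) -> Counter:
    # Map step: depends only on its own input chunk.
    return Counter(chunk.lower().split())


def merge(a: Counter, b: Counter) -> Counter:
    # Reduce step: builds a new Counter and leaves both inputs untouched.
    return a + b


def word_count(chunks):
    # Run sequentially here; the map calls are independent of one another,
    # so a framework could run them on separate machines.
    return reduce(merge, map(map_chunk, chunks), Counter())


if __name__ == "__main__":
    print(word_count(["a b a", "b c"]))
```

The same structure can be written in Java, which is the point: the isolation comes from how the work is partitioned, not from the language.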

It might sound like I am an imperative-programming-loving curmudgeon, but I actually really like Scala and Clojure, as well as other functional languages like Haskell. It's just that I'm not sure that the sales pitch being used for these languages is accurate/honest. I do think the concurrency/functional programming angle could have payoffs in the land of mobile computing (desktop too, but there's not much future there). After all, tablets have already gone multi-core and there are already a handful of multi-core smartphones. But these languages have a lot of work to do there, since there are already framework features and common patterns for dealing with concurrency in mobile. Event-driven programming for web development (or the server in client/server in general) is the other interesting place, but functional languages have more to offer framework writers than application developers in that arena. My friend David Pollak recently wrote about how the current crop of functional languages can hope for no more than to be niche languages like Eiffel. I think that he might be right, but not just because functional programming has a steep learning curve. If all they can offer is to solve the concurrency problem, then that might not be enough of a problem for these languages to matter.

24 comments:

  1. This really hit the nail on the head for me. I'd switch to Scala/Haskell/Clojure if it could give me a multithreaded GUI toolkit, where I'd never have to worry about whether I was on the event thread or not.

    Is it impossible? A "Failed Dream"? http://weblogs.java.net/blog/kgh/archive/2004/10/multithreaded_t.html

  2. Anonymous 1:11 AM

    There's more to programming than retail-level amusement-oriented web apps. Believe it or not, some people need to use computers to crunch tons of data in complicated ways. I suppose they are the ones needing parallel programming.

  3. Anonymous 2:47 AM

    Someone needs to build the foundation for these fancy web developers to safely dance around.

  4. Anonymous 4:18 AM

    From what you describe it appears, and forgive me if it's not true, you are only seeing the concurrency problem in terms of software you are familiar with.

    I don't believe your argument holds up when it comes to other classes of application and system development.

    For example, drawing from my own limited experience in the finance world, where complex systems are constantly dealing with asynchronous events and have complicated rules, all working with information that is changing over time, languages like Clojure offer a generalised solution that can scale not only in terms of performance but in terms of the size of the team working on it. Your argument does not address this, imho. Not all apps fit easily into the domain of 'forced' functionality of the web-app stack.

  5. Anonymous 5:04 AM

    "It's hard to imagine a web application where one HTTP request comes in and a dozen threads (or processes, whatever) are spawned." actually this happens quite a lot today but, as you point out, tends to be abstracted away from the developer by means of infrastructure - a database engine, message/service bus, distributed transaction coordinator, and it usually involves a pool of pre-created threads rather than allocating resources on the fly. But I do see many instances where this is done manually when there is no appropriate infrastructure in place. I have to admit I am lost on relevance of functional programming to your piece, managing concurrency is more about a properly planned resource allocation and locking strategy than which particular language choices you might make. That said, I know that functional approach can be very useful in GUI toolkits (remember DirectAnimation anyone) not to mention the usual suite of web languages Javascript, Ruby, Perl and Python which can all be used with a functional programming style

  6. Concurrency and parallelism really need to be treated as two different things. While traditional web programming may have relatively little opportunity for concurrency beyond shared access to a central data store (more in a second), web programming tends to create huge opportunities for parallelism. Far more than just one thread per request or whatever. Why? Because web apps tend to generate very large data sets and much of the processing of those data sets needs to be done in parallel if you want timely results.

    Even within the concurrency realm I question your premise. As a web user I expect many kinds of web application to at least notify me of concurrent events. Why doesn't blogger.com show me that somebody else has entered a comment on the same article while I'm typing mine, then allow me to load the new comments without losing the one I'm entering? That's concurrency and users are expecting to see it.

    If you don't see opportunities for parallel and concurrent programming in web development then it's because you aren't looking very hard.

    Finally, the authors of Clojure and Scala are not pitching web development languages. They are pitching general purpose languages. So any argument based on moving strings around the web is ignoring all the other domains that those languages might be useful in. Focusing on GUI vs web app isn't even interesting. What about simulations and scientific computing, circuit design and model verification, proof checking and machine learning? All of these are areas where parallelism is huge. Why can't I use Scala or Clojure or whatever to work in those domains...and why can't I put up a web front end to such services?

  7. Anonymous 6:57 AM

    please read computer architecture, the art of multiprocessor programming and http://bit.ly/iN0C8E . thanks.

  8. Anonymous 7:24 AM

    TL;DR: I don't understand concurrency, so there's no reason why you should pay attention to it.

    Needless to say, this article is clueless. Twitter thought the same way and did a half assed rails app. And then tried to scale it. The result is fail whales. This is not something to be proud of.

  9. Simon Marlow 7:29 AM

    +1 to James Iry's comment

  10. Anonymous 8:09 AM

    "It's hard to imagine a web application where one HTTP request comes in and a dozen threads (or processes, whatever) are spawned."

    That's exactly what happens when you use a database from your webapp.

  11. A couple of anonymous commenters point out that there are many other types of programming. Absolutely true. I tried to make it clear that I was going to focus mostly on application development, because I felt that was the kind of programming done by "the masses" or "mainstream programmers" or whatever cliche you like. I also think it's the kind of programmers who language evangelists have been preaching to about the multi-core/concurrency/functional programming trifecta.

    Several others have pointed out that there is also system programming that web application development in particular must rely on. Again absolutely true. I agree that there are cases where system engineering or frameworks can make use of concurrency on a single multi-core machine. However for any successful piece of infrastructure or framework, there is a lot more application code built on top of it. Further, a lot of systems engineering winds up on the distributed side of the fence, which is why I talked about that kind of programming as well.

    The argument for functional programming seems only applicable to concurrent programs that only run on a single machine. I reiterate my opinion that a lot of developers still do not need any explicit concurrency (or parallelism) or if they do need it, then they need it in a distributed environment, not on a single machine.

  12. @James Iry -- You make a great point in distinguishing between concurrency and parallel programming. It was sloppy for me not to make that distinction. However I stand by my statements about ~1 thread/request. An application server may have some other dedicated threads or processes that do not handle inbound HTTP requests, absolutely. But these don't generally increase linearly with HTTP requests, and often they are distributed in nature, not on a single machine. As I said in the post, I don't think the concurrency/functional programming need exists for distributed systems.

    The Blogger notification use case you gave is a great example. I would say that is an excellent use case for event driven web applications. It's an example of where a framework (like Twisted, node.js, whatever) will usually be used and the framework will deal with some concurrency. I still don't see the application developer needing any kind of functional programming to deal with this use case.

    Finally as for other types of programming, again there may absolutely be need for functional programming in places like scientific computing. I chose to focus on programming done by the kind of programmers that I think are being preached to about concurrency and functional programming.

  13. "It's hard to imagine a web application where one HTTP request comes in and a dozen threads (or processes, whatever) are spawned."

    In addition to the comments to date, for any non-trivial web app, this is pretty common. The goal is to minimize the synchronous workload to get a response back to users ASAP, so unless you're just building a site for low traffic, low load, you're probably going to be using some kind of message queue for firing off notifications and complex processes with some kind of AJAX polling to notify users when the magic has been completed without blocking the server (or maybe even an email notification if it's a really long running set of tasks/processes).

    And often it makes sense to make those spawned processes functional to allow for lots of concurrent operations.

    So while you might still write the front end of your web app in a tool that doesn't need to worry about concurrency, for any non-trivial scale you probably want to keep in mind functional programming paradigms for doing whatever magic is required to add value to your users when they come to your site.

  14. Anonymous 11:40 AM

    I'd have to say that you are entirely WRONG about concurrency and web programming. Sure posting a simple form does not need this. But rendering a large web page with multiple different chunks of content can and SHOULD be done concurrently. In fact cutting edge web frameworks such as Lift actually do this behind the scenes. And utilizing this technique is especially important if each of those chunks of content is pulled from a different synchronous database.

    This said, I don't see PHP, or even Ruby, doing this anytime soon. It does require a language and platform which is set up for concurrency from the start.

  15. Anonymous 2:55 PM

    Many commenters criticize this article by writing that web applications have concurrency. But they missed the main point: does that mean that most web developers have to handle this concurrency in their own code?

    Don't forget web is a client/server system over HTTP. Pages are already built up from several chunks, but the developers don't need functional programming for this. Most of the time, the web server (apache, etc) handles the concurrency through AJAX requests. Even when the concurrent code is not in the server app, it will probably be in the framework's code, so few developers will deal with it.

  16. I agree with you, Michael. Scala may be interesting for other features, especially since it lives in the same "world" as Java. In practice as an application developer, multi-core programming is not an urgent problem for me.

  17. I doubt Odersky and Hickey are pitching their respective languages as a means for developers to otherwise not concern themselves with the inherent complexity of concurrent systems. Both are simply asserting that the fundamental problems underlying concurrent programming can be more easily and correctly expressed using better languages.

    Such languages do not absolve developers from being well-versed in concurrency theory, though. The risk is that anyone naive enough to think that simply ignoring concurrency issues because an underlying framework, or a language for that matter, takes care of managing thread pools, or whatever the case may be, is bound to be impacted by concurrency-related bugs. This notion that one can maintain ignorance on the topic is fraught with disaster.

    Despite all the recent attention on emerging languages and frameworks that purport to solve the concurrency problem, the best course of action for any software engineer is to first develop a core understanding of concurrency theory. Once you have this knowledge at your disposal, it is easier to see the correlation between constructs in emerging functional languages, such as Scala and Clojure, and simple solutions to common concurrency problems. And perhaps, more importantly, this knowledge allows you to use existing languages to solve the same set of problems.

  18. Your main argument seems to be: we can safely bypass understanding the underlying concurrency when developing web applications in single-threaded languages such as PHP and Ruby. Fair enough, we can.

    Odersky's and Hickey's pitches are in a different semantic space - they talk with people who _want_ to understand the underlying concurrency.

    A minor nit to pick: you might not want to lump Node.js with PHP and Ruby. They are sufficiently different.

  19. Anonymous 1:38 PM

    "It's hard to imagine a web application where one HTTP request comes in and a dozen threads (or processes, whatever) are spawned."

    Spawned, perhaps -- used, not at all! What's so hard to imagine about a program which can effectively use the CPU power of many cores, and which happens to be delivered over the web? I'm working on such a program right now. (A common example: ever used Google?)

    "can safely ignore multi-core CPU rants? Can you ignore the Rich Hickeys and Martin Oderskys of the world?"

    I don't think Hickey ever claimed that this was the sole benefit of Clojure, and I don't know Odersky but I would guess he doesn't think that about Scala, either. These guys are smart enough to recognize that people do care about this, and these languages are really good at this. Nothing more.

    "The argument for functional programming seems only applicable to concurrent programs that only run on a single machine."

    I find this pretty funny, since the idea that FP is for performance is a rather new one. For most of my life I've been hearing about how FP is better because it makes programs easier to understand -- even though it came at the expense of performance!

  20. Excellent pieces. Keep posting such kind of information on your blog. I really impressed by your blog.

  21. I think the things you covered through the post are quiet impressive, good job and great efforts. I found it very interesting and enjoyed reading all of it...keep it up, lovely job..

  22. You have provided us new features. Use with ease the technology. This is an open source so can you use for the programming. You have provided for acquiring attractive, flexible,easy-to-manage and incredible website for programming.

  23. Rather than argue with some of them. There is set out to test out how valid each method by using experiment.

  24. i like this blog!thank u!!
