The Shortest Path to a Solution

…what is simplicity? Simplicity is the shortest path to a solution. Say somebody does a proof for a mathematical problem in 20 pages. You study those 20 pages, and finally you say, “Oh, I get it.” You get a reward as the result of understanding that proof, because the proof was a solution to an interesting problem, not just a difficulty. Later, somebody else comes up with a 10-page proof for the same problem. Maybe the new proof uses a branch of mathematics that you might have to study to master, but once you master that branch of mathematics you can use it. And a 20-page proof becomes a 10-page proof. You’d have to say it’s simpler, because it’s a shorter path. Maybe it’s longer if you have to do a digression to actually learn a new branch of mathematics, but let’s assume that over time we realize that this branch is important to know in general, so we all become familiar with it.

What we’re really trying to do in software is find a way to make it easy to get value from having solutions to problems. How do we do that? When we work the program, we put in what we think is the shortest path to a solution. When we discover that the problem is different than we thought, we rewrite. And then we rewrite again. We work the program. That process is just like doing the proofs over and over. Sooner or later we discover that instead of doing something in 30 lines of code, we can do it in 15 lines, because now we have another capability that fits in. It really is just the right capability, so the work done there we don’t have to do here. We’ll just invoke that capability from here. That makes our solution easier to follow. Plus the effort you expend today to understand the code will make you a more powerful programmer tomorrow. So that simplification is very valuable.

If you write a lot of programs, and you’re used to squeezing them all the time, you find that it’s easy to write a program that’s simple. A lot of it is having a clear sense of what you want to say: writing the proof by choosing what to prove, and being clear about that. In programming, a lot of simplicity comes from knowing what matters and what doesn’t matter. A lot of times a program is made complicated because it’s attending to details that aren’t needed, or could have been avoided, or could have been relegated to something else.

Someone says, “You should always check your arguments to see if they’re in range.” Someone else says, “Half the statements in this program are checking arguments that are intrinsically in range.” Have they made the program better or worse? No, I think they’ve made it worse. I’m not a fan of checking arguments. On the other hand, there ought to be a fail fast. If you make a mistake, the program ought to stop. So there is an art to knowing where things should be checked and making sure that the program fails fast if you make a mistake. That kind of choosing is part of the art of simplification.

…Coding up the simplest thing that could possibly work is really about this: If you can’t keep five things in your head at one time and make a decision, try keeping three things in your head. Try keeping just one thing in your head, and see if you can make a decision. Then you can think of the next thing. And amazingly, when you write some of this dumb, straight-ahead code, it often turns out that it was all that was required. It works great. When a second programmer comes back later and reads the code she might say, “The people who wrote this are morons. They just wrote a simple linear search here. This thing’s ordered, so they could have done a binary search. They could have used a hash table here. Why are they doing a linear search?” Well, because a linear search worked. And when the other programmer looked at the linear search, she understood it in a minute.

Ward Cunningham, in an interview at Artima
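The linear search versus binary search trade-off at the end of that quote can be sketched in a few lines of Python. This is only an illustration: the function names and sample data are hypothetical and do not come from the interview. Both functions honor the same contract, but the first works on any list, while the second is correct only if the caller keeps the list sorted.

```python
from bisect import bisect_left


def linear_search(items, target):
    """Return the index of the first match in items, or -1 if absent."""
    for index, item in enumerate(items):
        if item == target:
            return index
    return -1


def binary_search(sorted_items, target):
    """Same contract, but sorted_items must already be in ascending order."""
    index = bisect_left(sorted_items, target)
    if index < len(sorted_items) and sorted_items[index] == target:
        return index
    return -1


# Hypothetical sample data, sorted so that both functions apply.
names = ["ada", "grace", "linus", "ward"]
print(linear_search(names, "linus"))
print(binary_search(names, "linus"))
```

The linear version has no precondition to remember and nothing to get wrong, which is arguably why the second programmer understands it in a minute. The binary version only pays off once the list is large enough for the difference to matter.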

PHP Scales

The news that Friendster migrated to PHP from JSP for scalability reasons has triggered much-needed discussion in the Java community.

Chris Shiflett at O’Reilly had this to say (source: rc3.org):

…how does scalability apply to the Web? First, you should ask yourself whether the Web’s fundamental architecture is scalable. The answer is yes. Some people will describe HTTP’s statelessness in a derogatory manner. The more enlightened people, however, understand that this is one of the key characteristics that make HTTP such a scalable protocol. What makes it scalable? With every HTTP transaction being completely independent, the amount of resources necessary grows linearly with the amount of requests received. In a system that does not scale (where “does not scale” means that it scales poorly), the amount of resources necessary would increase at a higher rate than the number of requests. While HTTP has its flaws (the proper spelling of referrer being one), there’s no arguing that it scales, and this is one of the things that made the Web’s early explosive growth less painful than it would have otherwise been.

The present discussion is about developing Web applications that scale well, and whether particular languages, technologies, and platforms are more appropriate than others. My opinion is that some things scale more naturally than others, and Rasmus’s explanation above touches on this. PHP, when compiled as an Apache module (mod_php), fits nicely into the basic Web paradigm. In fact, it might be easier to imagine PHP as a new skill that Apache can learn. HTTP requests are still handled by Apache, and unless your programming logic specifically requires interaction with another source (database, filesystem, network), your application will scale as well as Apache (with a decrease in performance based upon the complexity of your programming logic). This is why PHP naturally scales. The caveat I mention is why your PHP application may not scale.

A common (and somewhat trite) argument being tossed around is that scalability has nothing to do with the programming language. While it is true that language syntax is irrelevant, the environments in which languages typically operate can vary drastically, and this makes a big difference. PHP is much different than ColdFusion or JSP. In terms of scalability, PHP has an advantage, but it loses a few features that some developers miss (which is why there are efforts to create application servers for PHP). The PHP versus JSP argument should focus on environment, otherwise the point gets lost.

I actually disagree with George’s statement, “PHP doesn’t magically scale ‘naturally’”. Of course, I understand and agree with the spirit of what he’s trying to say, which is that using PHP isn’t going to make your applications magically scale well, but I do believe that PHP has a natural advantage, as I just described. Rasmus seems to agree with me, and George might also agree, despite his statement.

I think PHP scales well because Apache scales well because the Web scales well. PHP doesn’t try to reinvent the wheel; it simply tries to fit into the existing paradigm, and this is the beauty of it.

When he quotes Rasmus Lerdorf, I think he gets to the heart of the matter:

A typical Java application will make use of the fact that it is running under a JVM in which you can store session and state data very easily and you can effectively write a web application very much the same way you would write a desktop application. This is very convenient, but it doesn’t scale. To scale this you then have to add other mechanisms to do intra-JVM message passing which adds another level of complexity and performance issues. There are of course ways to avoid this, but the typical first Java implementation of something will fall into this trap.

PHP has no scalability issues of this nature. Each request is completely sandboxed from every other request and there is nothing in the language that leads people towards writing applications that don’t scale.
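The contrast Rasmus draws can be made concrete with a small sketch. It is neither PHP nor a JVM: the class names are hypothetical, and a plain dict stands in for whatever external store (database, session server) a real deployment would use. The point is only that state held in one worker's memory is invisible to the next worker, while a handler that keeps nothing between requests can run anywhere.

```python
class StickyWorker:
    """Keeps session data in its own memory, like state held inside one JVM."""

    def __init__(self):
        self.sessions = {}

    def handle(self, session_id, item=None):
        cart = self.sessions.setdefault(session_id, [])
        if item is not None:
            cart.append(item)
        return list(cart)


class SharedNothingWorker:
    """Keeps nothing between requests; all state lives in an external store."""

    def __init__(self, store):
        self.store = store  # assumption: stand-in for a database or session server

    def handle(self, session_id, item=None):
        cart = list(self.store.get(session_id, []))
        if item is not None:
            cart.append(item)
            self.store[session_id] = cart
        return cart


# Two in-memory workers behind a load balancer: the second never sees the cart.
a, b = StickyWorker(), StickyWorker()
a.handle("s1", "book")
sticky_result = b.handle("s1")

# Two shared-nothing workers reading the same external store.
store = {}
c, d = SharedNothingWorker(store), SharedNothingWorker(store)
c.handle("s1", "book")
shared_result = d.handle("s1")

print(sticky_result)
print(shared_result)
```

Adding a third shared-nothing worker needs no coordination between workers. The cost is that the external store becomes the component that has to scale, which is the caveat Shiflett raises about interaction with a database, filesystem, or network.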

It’s been my experience that because of Java’s abundance of riches when it comes to application design, many using it concentrate too much on tuning the Java code on the application tier, instead of concentrating on all the other areas of opportunity.

Because PHP does not offer so many different options to cache or pass data within applications written with it, there is far less chance for a developer or project manager to think the scalability problem can be solved entirely there. It forces you to look at the other sub-systems across your architecture and make sure you have the resources to do so.

I have a perfect example from work experience that I’ll share with you sometime.

The More The Distance, The Easier The Violence, Or The Flame

Mark Bernstein blamed the confusion and flamefest that occurred in a particular weblog community on comment and trackback usage. He suggested turning off both and relying on weblog front pages for communication. I disagree.

Flamefests, whether in user comments or on weblog front pages, are not in the best interests of one-on-one communication. The person being addressed faces a tremendous threat of being defined by the exchange. Weblog postings get cached, linked to, and syndicated by thousands, so one-on-one communication, which is already hard enough in person, carries the additional weight of thousands of onlookers and potential bandwagoners.

The more personal the contact, the less likely the violence. The more remote the exchange, the easier and more likely it is for spears to be thrown.

I prefer e-mail or voice to weblog postings for one-on-one communication, and I find those who attempt to criticize or help another person from their weblogs without trying at least e-mail first to be suspect. If you mean to have a true one-on-one discussion, then you have to use the most intimate means of communication.

It’s not the tools’ fault. It’s the people who refuse to come a little closer to talk. Just like so many other problems in this world.

Still Looking For An Open Source Project Manager

Cofax, an open source content management system that powered Knight Ridder newspaper’s online properties and is still in use around the world, recently released V2.0 RC2. You can download it here. We are getting close to a full-blown 2.0 release.

We need some help. We need two things right now: 1. Bug testers who can register the bugs they find in SourceForge, 2. An experienced open source development project manager who can help us understand and use CVS and the tools at SourceForge effectively.

If you are interested, or know someone who is, e-mail me at: ().

“Web Logging Is to Teach Us More About Ourselves”

I give credit to Dave Winer of Userland Software for inventing web logging, and I think the idea then was to publish, to share your thoughts with everyone else. But most people’s thoughts aren’t really worth sharing. Most web logs are little more than lists of annotated bookmarks and the value of those bookmarks can probably be best derived through a web aggregator, in which case people would be writing not to be read but to be counted, which isn’t nearly as much fun.

A lot of this comes down to production values, which is a subject those in the web log world tend to ignore because it is to their advantage to do so. There is a lot of bad television, but its packaging is such that we still seem to sit through the shows. Network TV spends perhaps $500,000 on an hour. How much do you spend on each web log entry? No wonder most web logs are so boring.

But Joe Reger wants us to not think so much about the web log publishing model and instead use the technology — preferably HIS technology — as a personal freeform database with analytical tools to take the measure of our own lives. Here we’ve been thinking about web logs as a way of reaching out to the world when they may be as much or even more useful reaching into ourselves.

I think he is onto something. Personal data mining means that I’d be mining my own data, learning about my own little world. If the FBI wanted to do that (they probably do) then I’d be opposed, but personal data mining offers personal payoffs. Imagine if your web log chirped up one day suggesting out of the blue that maybe, just maybe certain trends in the entries were suggesting that you need a vacation or your business is in peril or your kid is abusing drugs or that you probably have cancer. If such knowledge was hidden in your web log data, wouldn’t you rather know than not?

Read the rest in the I, Cringely column.