NoSQL, Relational Database, ETL Link-a-rama for November 25th, 2009

Jon Moore: NoSQL East 2009 Redux

Dare Obasanjo: Building Scalable Databases: Perspectives on the War on Soft Deletes

Explain Extended: What is a relational database?

Explain Extended: What is the entity-relationship model?

Data Doghouse: Data Integration: Hand-coding Using ETL Tools

Data Doghouse: Data Integration: Hand-coding Using ETL Tools Part 2

Smart Data Collective: ETL tools: Don’t Forget About the Little Dogs

Smart Data Collective: Data Integration: Hand-coding Using ETL Tools

Communications of the ACM: Extreme Agility at Facebook

Dare Obasanjo: Facebook Seattle Engineering Road Show: Mike Shroepfer on Engineering at Scale at Facebook

Research: Software development roles and responsibilities

Trying to answer the elusive questions of:

What is a Software engineer?

What is a Lead Programmer?

What is a Tech Lead?

What is a Principal Engineer?

What is a Software Architect?

What is a Technical Project Manager?

What is a Scrum Master?

Links:

Wikipedia: Lead programmer

Wikipedia: Software engineer

Wikipedia: Software architect

IBM developerWorks: Characteristics of a software architect

Magpie Brain: A Tech Lead Manifesto

vanderbilt.edu: Project Roles and Responsibilities (Word .doc!)

it’s a delivery thing: Agile Project Roles and Responsibilities

Stack Overflow: Does a software architect have a role in agile, esp. Scrum?

Wikipedia: Scrum Roles

InfoQ: Mapping Traditional Software Development Roles to Scrum

Code Better: Classic Technical Lead Blunder

Atlassian: Tech Leads Talk

Two comparisons of different programming languages worth reading

Normally these kinds of pieces are worthless, but these two recently stood out to me:

Dennis M. Ritchie: Five Little Languages and How They Grew: Talk at HOPL* March 19, 2002

Michael Tsai: Perl vs. Python vs. Ruby – distinguished for the thoughtful replies in the discussion thread.

Clay Shirky lays out the issues confronting the future of news journalism

Read the whole thing. Nieman Journalism Lab: Clay Shirky at the Shorenstein Center on the Press, Politics and Public Policy:

…in the nightmare scenario that I’ve kind of been spinning at for the last couple years has been: Every town in this country of 500,000 or less just sinks into casual, endemic, civic corruption — that without somebody going down to the city council again today, just in case, that those places will simply revert to self-dealing. Not of epic, catastrophic sorts, but the sort that just takes five percent off the top. Newspapers have been our principal bulwark for that, and as they’re shrinking, that I think is where the threat is.

…So we don’t need another different kind of institution that does 85 percent of accountability journalism. We need a class of institutions or models, whether they’re endowments or crowdsourced or what have you — we need a model that produces five percent of accountability journalism. And we need to get that right 17 times in a row. That’s the issue before us. There will not be anything that replaces newspapers, because if you could write the list of stuff you needed and organizational characteristics and it looked like newspapers, newspapers would be able to fill that role, right?

It is really a shift from one class of institutions to the ecosystem as a whole where I think we have to situate the need of our society for accountability. I also want to distance myself — and I’ll end shortly. But I want to distance myself, with that observation I also want to distance myself from the utopians in my tribe, the web tribe, and even to some degree the optimists.

I think a bad thing is going to happen, right? And it’s amazing to me how much, in a conversation conducted by adults, the possibility that maybe things are just going to get a lot worse for a while does not seem to be something people are taking seriously. But I think this falling into relative corruption of moderate-sized cities and towns — I think that’s baked into the current environment. I don’t think there’s any way we can get out of that kind of thing. So I think we are headed into a long trough of decline in accountability journalism, because the old models are breaking faster than the new models can be put into place.

Again, read the whole thing.

People tend to pick apart Shirky’s writings to find what supports their arguments, which in fact I partially just did. So don’t do that; absorb the nuance, because the opportunities and problems at hand are far more complicated than either the naysayers or the utopians would lead us to believe.

SVN Branch Management Link-a-rama

Coding Horror: Software Branching and Parallel Universes

Perforce: Laura Wingerd & Christopher Seiwald: High-level Best Practices in Software Configuration Management

InfoQ: Version Control for Multiple Agile Teams

BetterExplained: A Visual Guide to Version Control

Branch Maintenance: Chapter 4. Common Branching Patterns

Submerged: CollabNet’s Subversion Blog: Branching Strategy Questioned

CMCrossroads: Robert Cowham: Branching and Merging – An Agile Perspective

CMCrossroads: Steve Berczuk, Robert Cowham, Brad Appleton: An Agile Approach to Release Management

Related Background Links:

Version Control with Subversion: Branch Maintenance: Chapter 4. Branching and Merging

Version Control with Subversion: Strategies for Repository Deployment: Chapter 5. Repository Administration

Version Control with Subversion: Repository Maintenance: Chapter 5. Repository Administration

JavaWorld: Merging and branching in Subversion 1.5

RubyRobot: Subversion With Mac OS X Tutorial

Worth Repeating: Rob Pike “Data dominates.” and Frederick Brooks “Representation is the Essence of Programming”

Rob Pike is a famous name in programming, with a history going back to Bell Labs. He co-authored two often-quoted books and today works at Google.

Back in February 1989 he wrote an essay, “Notes on Programming in C”, which many consider to contain insight into the “Unix Philosophy”.

One of the sections of the essay people focus on lists six rules on complexity. Rule number 5 is:

“Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.”

I’ve seen people summarize that rule (why summarize three sentences?!) into “write stupid code that uses smart objects”, but I believe that misses the point.

To help us understand the context behind the rule, Pike cites Frederick Brooks’ “The Mythical Man-Month” p. 102. Here it is for your edification:

Representation is the Essence of Programming

Beyond craftsmanship lies invention, and it is here that lean, spare, fast programs are born. Almost always these are the result of strategic breakthrough rather than tactical cleverness. Sometimes the strategic breakthrough will be a new algorithm, such as the Cooley-Tukey Fast Fourier Transform or the substitution of an n log n sort for an n² set of comparisons.

Much more often, strategic breakthrough will come from redoing the representation of the data or tables. This is where the heart of your program lies. Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.

It is easy to multiply examples of the power of representations. I recall a young man undertaking to build an elaborate console interpreter for an IBM 650. He ended up packing it onto an incredibly small amount of space by building an interpreter for the interpreter, recognizing that human interactions are slow and infrequent, but space was dear. Digitek’s elegant little Fortran compiler uses a very dense, specialized representation for the compiler code itself, so that external storage is not needed. That time lost in decoding this representation is gained back tenfold by avoiding input-output. (The exercises at the end of Chapter 6 in Brooks and Iverson, “Automatic Data Processing” include a collection of such examples, as do many of Knuth’s exercises.)

The programmer at wit’s end for lack of space can often do best by disentangling himself from his code, rearing back, and contemplating his data. Representation is the essence of programming.
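To make the "show me your tables" idea concrete, here is a small sketch of my own (it comes from neither Pike nor Brooks). The names ROMAN_TABLE and to_roman are made up for illustration. Once the value-to-symbol table is written down in descending order, including the subtractive pairs like 900 and 4, the conversion loop has almost nothing left to decide:

```python
# Hypothetical illustration: the table carries the logic, the loop just walks it.
# Entries are ordered from largest to smallest and include the subtractive
# forms (CM, CD, XC, XL, IX, IV), so no special cases are needed in code.
ROMAN_TABLE = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(number: int) -> str:
    """Convert a positive integer to Roman numerals by walking the table."""
    if number <= 0:
        raise ValueError("number must be positive")
    parts = []
    for value, symbol in ROMAN_TABLE:
        # How many times does this entry fit, and what is left over?
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)
```

Calling to_roman(1989) walks the table once and returns "MCMLXXXIX". Try writing the same thing without the table, as a chain of if statements, and the point of rule 5 becomes fairly plain.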

References:

Wikipedia: Unix philosophy

Eric Steven Raymond: The Art of Unix Programming: Basics of the Unix Philosophy

Internet life links for October 31, 2009

Alex Hillman recently tweeted: “Twitter lists illustrate the most important shift in the internet: your bio is now written by others, and what they say about you.” He follows up with a longer piece on his blog.

Google Wave: we came, we saw, we played D&D: It’s easy to see why many people who use it for the first time wonder what the big deal is–as I said above, you really need to try to accomplish something with it as part of a group before you understand what it’s good for.

Rafe shares the frustration he has trying to correct the misinformation friends and family are consuming off the Web and from cable news media.

I had my Twitter updates streaming to Facebook, but recently discontinued that. danah boyd shares some of the reasons in her blog post: Some thoughts on Twitter vs. Facebook Status Updates:

One way to really see this is when people on Twitter auto-update their Facebook (guilty as charged). The experiences and feedback on Twitter feel very different than the experiences and feedback on Facebook. On Twitter, I feel like I’m part of an ocean of people, catching certain waves and creating my own. Things whirl past and I add stuff to the mix. When I post the same messages to Facebook, I’m consistently shocked by the people who take the time to leave comments about them, to favorite them, to ask questions in response, to start a conversation. (Note: I’m terrible about using social media for conversation and so I’m a terrible respondent on Facebook.) Many of the people following me are the same, but the entire experience is different.

Seth Godin comments on the penalty you face for exceeding the Dunbar Number.

And finally, this is brilliant.

Unit Testing Python Links

OnLamp.com: Jason Diamond: Test-Driven Development in Python

OnLamp.com: Jason Diamond: More Test-Driven Development in Python

AgileTesting: Python Unit Testing Part 1

AgileTesting: Python Unit Testing Part 2

StackOverflow: Python Doctest vs Unittest

Ian Bicking: Behavior Driven Programming
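For anyone who hasn't seen the two standard-library styles side by side, here is a minimal sketch of my own, not taken from any of the linked articles. The function slugify, the class SlugifyTest, and the helper run_checks are made-up names. The doctest lives in the docstring and doubles as documentation, while the unittest case holds the edge cases you would not want cluttering the docs:

```python
import doctest
import io
import unittest


def slugify(title):
    """Lowercase a title and join its words with hyphens.

    >>> slugify("Unit Testing Python Links")
    'unit-testing-python-links'
    """
    return "-".join(title.lower().split())


class SlugifyTest(unittest.TestCase):
    """Edge cases kept out of the docstring."""

    def test_collapses_whitespace(self):
        self.assertEqual(slugify("  Data   dominates  "), "data-dominates")

    def test_empty_title(self):
        self.assertEqual(slugify(""), "")


def run_checks():
    """Run the doctest and the unittest case; return True if both pass."""
    # Collect and run the examples found in slugify's docstring.
    runner = doctest.DocTestRunner(verbose=False)
    finder = doctest.DocTestFinder()
    for test in finder.find(slugify, name="slugify", globs={"slugify": slugify}):
        runner.run(test)
    doctest_ok = runner.summarize(verbose=False).failed == 0

    # Run the TestCase, sending the report to a buffer instead of stderr.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SlugifyTest)
    result = unittest.TextTestRunner(stream=io.StringIO()).run(suite)
    return doctest_ok and result.wasSuccessful()
```

In a real module you would more likely call doctest.testmod() and unittest.main() from the command line; run_checks() is only there so the sketch can be exercised from another script or an interactive session.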