Using your Google Spreadsheet as your Twitter Client?

Posted on March 7, 2010 by Karl

Vivek Haldar explains the significance of Google Apps Script.

Wow!

Shirky confirms Shenk

Posted on February 20, 2010 by Karl

Clay Shirky, in a recent talk at Web 2.0 Expo New York, challenged us to stop talking about information overload as an excuse, to recognize it as a fact (one that’s existed for a long time and will not diminish in the future), and to work on building better filters.

Watch Clay Shirky on information overload versus filter failure:

Titles like the Boing Boing one are kinda unfortunate because they frame Shirky’s view as one in opposition to, let’s say, David Shenk’s from his book “Data Smog”.

Far from it.

Way back in 1997, David Shenk attempted to describe the information landscape we are living in now. In a 2007 piece in Slate he took a critical look back.

As with any look forward, the book wildly missed the mark with some of its more grim predictions, but in many ways still has much to offer and think about.

In particular, towards the end of the book Shenk proposed a personal call to action for building better filters (learning to be our own, for example) and to be better information-producing citizens (being our own editors). Big foreshadowing of Shirky’s talk there.

Most reviews of the book focussed on Shenk’s definition of the problem and pooh-poohed his suggestions. So here we are, many years down the line, and most of the focus is *still* grousing about ‘information overload’.

Clay Shirky’s point is that it’s high time to stop doing that and get busy building the tools, protocols, customs and businesses that will help us not only deal with it, but thrive on it.

Database related reads (and videos) for January 25, 2010

Posted on January 25, 2010 by Karl

Lambda the Ultimate: Why Normalization Failed to Become the Ultimate Guide for Database Designers?

Generation 5:
Putting Freebase in a Star Schema

no:sql(east): video: Justin Sheehy is the CTO of Basho Technologies on Riak and more

ShopTalk Blog: Death to filesystems

More from Daniel Jacobson on NPR’s content management ecosystem

Posted on January 18, 2010 by Karl

Programmable Web: Daniel Jacobson: “Content Portability: Building an API is Not Enough”

Previous entries in the series:

Programmable Web: Daniel Jacobson: Content Modularity: More Than Just Data Normalization

Programmable Web: Daniel Jacobson: COPE: Create Once, Publish Everywhere

You can read much more from the NPR team on their blog at Inside NPR.org. A recent post on the blog from Jason Grosman that caught my attention was “What Happens When Stuff Breaks On NPR.org”.

Justin Cormack has some thoughts on the above series, in particular on content portability, that are worth reading.

Also related to content portability (I think – okay – maybe a stretch – but worth thinking about) is “Dive into history, 2009 edition”: “HTML is not an output format. HTML is The Format. Not The Format Of Forever, but damn if it isn’t The Format Of The Now.”

Also Related:

AIGA: Callie Neylan: Case Study: NPR.org

Watching: Alfresco Webcast – Getting Started with CMIS

Posted on January 18, 2010 by Karl

I’m halfway through its 1.5 hours and it is worth it for becoming familiar with the technology and the concepts behind it.

More on CMIS and Alfresco at cmis.alfresco.com.

Think you have statistical chops? Help predict homicides in Philadelphia

Posted on January 17, 2010 by Karl

The Analytics X Prize is “to use statistical techniques and any data sets you can find to predict where crime, specifically homicides, will occur in the city”.

Drew Conway at Zero Intelligence Agents has posted some of his progress so far using spatial regression.

The rise of the journalist-programmer

Posted on January 16, 2010 by Karl

I’d call it some long-awaited recognition for many. Gawker: Hack to Hacker: Rise of the Journalist-Programmer.

Hmm… have I qualified as a Programmer-Journalist in the past?

NoSQL, Relational Database, ETL Link-a-rama for November 25th, 2009

Posted on November 25, 2009 by Karl

Jon Moore: NoSQL East 2009 Redux

Dare Obasanjo: Building Scalable Databases: Perspectives on the War on Soft Deletes

Explain Extended: What is a relational database?

Explain Extended: What is the entity-relationship model?

Data Doghouse: Data Integration: Hand-coding Using ETL Tools

Data Doghouse: Data Integration: Hand-coding Using ETL Tools Part 2

Smart Data Collective: ETL tools: Don’t Forget About the Little Dogs

Smart Data Collective: Data Integration: Hand-coding Using ETL Tools

Communications of the ACM: Extreme Agility at Facebook

Dare Obasanjo: Facebook Seattle Engineering Road Show: Mike Shroepfer on Engineering at Scale at Facebook

Two links on simple map visualizations with Python

Posted on November 22, 2009 by Karl

Simon Wilson: Exploring Python: Stack Overflow Dev Days Amsterdam November 2009

FlowingData: How to Make a US County Thematic Map Using Free Tools

Worth Repeating: Rob Pike “Data dominates.” and Frederick Brooks “Representation is the Essence of Programming”

Posted on November 14, 2009 by Karl

Rob Pike is a famous name in programming, with a history going back to Bell Labs. He is co-author of two often quoted books and today works at Google.

Back in February 1989 he wrote an essay, “Notes on Programming in C”, which many consider to contain insight into the “Unix Philosophy”.

One of the sections of the essay people focus on is the six rules he listed on complexity. Rule number 5 is:

“Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self evident. Data structures, not algorithms, are central to programming.”

I’ve seen people summarize that rule (why summarize three sentences?!) into “write stupid code that uses smart objects”, but I believe that misses the point.

To help us understand the context behind the rule, Pike cites Frederick Brooks’ “The Mythical Man-Month” p. 102. Here it is for your edification:

Representation is the Essence of Programming

Beyond craftsmanship lies invention, and it is here that lean, spare, fast programs are born. Almost always these are the result of strategic breakthrough rather than tactical cleverness. Sometimes the strategic breakthrough will be a new algorithm, such as the Cooley-Tukey Fast Fourier Transform or the substitution of an n log n sort for an n² set of comparisons.

Much more often, strategic breakthrough will come from redoing the representation of the data or tables. This is where the heart of your program lies. Show me your flowcharts and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.

It is easy to multiply examples of the power of representations. I recall a young man undertaking to build an elaborate console interpreter for an IBM 650. He ended up packing it into an incredibly small amount of space by building an interpreter for the interpreter, recognizing that human interactions are slow and infrequent, but space was dear. Digitek’s elegant little Fortran compiler uses a very dense, specialized representation for the compiler code itself, so that external storage is not needed. That time lost in decoding this representation is gained back tenfold by avoiding input-output. (The exercises at the end of Chapter 6 in Brooks and Iverson, “Automatic Data Processing” include a collection of such examples, as do many of Knuth’s exercises.)

The programmer at wit’s end for lack of space can often do best by disentangling himself from his code, rearing back, and contemplating his data. Representation is the essence of programming.
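
To make “show me your tables” a bit more concrete, here is a small sketch of my own in Python. It is not from Pike’s essay or Brooks’ book; the Roman numeral task and the names ROMAN_TABLE and to_roman are just assumptions I picked for illustration. The point is that once the table is right, the code that walks it is close to self-evident.

```python
# Hypothetical illustration of "data dominates": the table carries the
# knowledge, and the algorithm is a plain loop over it.
# (value, symbol) pairs, largest first, including the subtractive forms.
ROMAN_TABLE = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(number: int) -> str:
    """Convert a positive integer to a Roman numeral by walking the table."""
    if number <= 0:
        raise ValueError("Roman numerals need a positive integer")
    pieces = []
    for value, symbol in ROMAN_TABLE:
        # How many times does this entry fit, and what is left over?
        count, number = divmod(number, value)
        pieces.append(symbol * count)
    return "".join(pieces)


if __name__ == "__main__":
    print(to_roman(1989))  # the year Pike's essay was written
```

Try writing the same thing as nested if/else branches for the subtractive cases (IV, IX, XL and so on) and the contrast shows up quickly: with the table, a new rule is a new row, not a new branch.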

References:

Wikipedia: Unix_philosophy

Eric Steven Raymond: The Art of Unix Programming: Basics of the Unix Philosophy

paradox1x.org

Karl Martino, Philadelphia, PA, USA
