NPR’s Daniel Jacobson shares details on their CMS ecosystem

Posted on October 29, 2009 by Karl

Programmable Web: COPE: Create Once, Publish Everywhere

Programmable Web: Content Modularity: More Than Just Data Normalization

What strikes me is the focus on data storage and the emphasis on normalizing it to a modular form that enables re-use.

Over the years I’ve seen CMSes try to deny the value of this approach: they store content as blobs and force app developers to maintain the knowledge of how to access and manipulate it in the app layer. The reasoning is that you can never know what content you will need to store down the line, so why attempt to build a normalized store where data is maintained and re-used long term?

In the end, many of these CMSes embrace the Anemic Domain Model anti-pattern that Martin Fowler wrote about. More and more behavior that is related to your domain is pushed into your app space or into a services layer.

NPR.org confirms my past experience: the investment in building a modular data store not only establishes a strong foundation, it is one that gains in value over time. It takes research. You need to dig deep into your business’s problem domain and determine what the core product(s) of your business are (note: I didn’t say CMS). For NPR it’s the Story. What is it for yours?

As Martin Fowler said, “In general, the more behavior you find in the services, the more likely you are to be robbing yourself of the benefits of a domain model. If all your logic is in services, you’ve robbed yourself blind.”
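To make the contrast concrete, here is a minimal sketch of a modular Story that keeps its own behavior, rather than a blob that every app has to pick apart. The class, field names, and methods are all hypothetical and chosen for illustration; this is not NPR's actual schema or API.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Story:
    """Hypothetical core domain object; every name here is illustrative."""
    title: str
    teaser: str
    paragraphs: list[str] = field(default_factory=list)
    topics: list[str] = field(default_factory=list)

    def word_count(self) -> int:
        # Domain behavior lives on the model, not in each consuming app.
        return sum(len(p.split()) for p in self.paragraphs)

    def to_html(self) -> str:
        # One stored form, rendered for the web.
        body = "".join(f"<p>{escape(p)}</p>" for p in self.paragraphs)
        return f"<h1>{escape(self.title)}</h1>{body}"

    def to_plain_text(self) -> str:
        # The same stored form, rendered for a plain-text channel.
        return "\n\n".join([self.title, *self.paragraphs])

    def to_feed_summary(self) -> dict:
        # A partial view for syndication, built from the modular fields.
        return {"title": self.title, "summary": self.teaser, "topics": list(self.topics)}
```

Because the title, teaser, and paragraphs are stored separately, each output is a small method on the Story instead of a parsing job repeated in every app.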

Invest the time.

BTW, this isn’t a NoSQL versus RDBMS issue; there are data management solutions in both camps that can satisfy this.

NPR’s development team has been sharing more regularly on their blog “Inside NPR.org”.

CJR: NPR Builds a Brain Trust: Thought leaders convene for a digital “Think In”

digitalthinkin.ning.com

Beware the fallacies of distributed computing…

Posted on October 18, 2009 by Karl

Peter Deutsch: The Eight Fallacies of Distributed Computing:

The network is reliable.
Latency is zero.
Bandwidth is infinite.
The network is secure.
Topology doesn’t change.
There is one administrator.
Transport cost is zero.
The network is homogeneous.
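
As a small illustration of taking the first two fallacies seriously, the sketch below wraps a remote call in a bounded retry with exponential backoff. The helper name call_with_retries and its parameters are assumptions made up for this example, not part of any library, and real code would also want timeouts on the call itself.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on network-style errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of retries: surface the failure to the caller
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The sleep function is injectable so the backoff schedule can be checked without actually waiting.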

Pros and cons for NoSQL

Posted on October 4, 2009 by Karl

Pros:

a tornado of razorblades: SQL Databases Don’t Scale (Hacker News thread)

Cons:

Code Monkeyism: The dark side of NoSQL (Hacker News thread)

Archives of the Caml mailing list: Message from Brian Hurt

Chris Williams, Co-Curator of NoSQL East, NoSQL: A Modest Proposal

Carsonified: Should you go Beyond Relational Databases?

cURLing with Alfresco’s and Google’s Data APIs

Posted on August 29, 2009 by Karl

Jeff Potts: Curl up with a good web script (interacting with Alfresco’s Document Manager via CMIS and Atom)

Google Data APIs: Using cURL to interact with Google Data services

Bonus: commandlinefu.com: Update twitter via curl as Function

Useful Wget and cURL links

Posted on July 7, 2009 by Karl

Using cURL to interact with Google data services

insanesecurity: Wget all the way

cURL: Tutorial

Hive, Hadoop at Facebook, Yahoo

Posted on June 11, 2009 by Karl

Engineering@Facebook: Hive – A Petabyte Scale Data Warehouse using Hadoop

Yahoo! Developer Blog: Announcing the Yahoo! Distribution of Hadoop

Reading up on ETL (Extract, Transform, Load) processing

Posted on June 7, 2009 by Karl

Wikipedia: Extract, transform, load

Wikipedia: Talend Open Studio

Talend Open Studio: Tutorials

Manageability: Open Source ETL (Extraction, Transform, Load) Written in Java

richard.gluga.com: Data Migration Done Right

kJube: Vendors and tools – ETL

AlfrescoForge: ETL Connector

Talend job for Job Scheduler implement

High Scalability: How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data

NYTimes: Announcing the Map/Reduce Toolkit

core-user@hadoop.apache.org: Andreas Kostyrka: Re: hadoop in the ETL process
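
For anyone new to the term, the three ETL stages can be sketched in a few lines of Python using only the standard library. The column names (name, count), the table name, and the validation rule are assumptions invented for this example; the tools linked above do far more.

```python
import csv
import io
import sqlite3


def extract(csv_text: str) -> list[dict]:
    # Extract: read raw rows from the source (a CSV string here).
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows: list[dict]) -> list[tuple[str, int]]:
    # Transform: normalize names, convert counts, drop rows that fail validation.
    cleaned = []
    for row in rows:
        name = (row.get("name") or "").strip().title()
        count = (row.get("count") or "").strip()
        if not name or not count.isdigit():
            continue
        cleaned.append((name, int(count)))
    return cleaned


def load(conn: sqlite3.Connection, records: list[tuple[str, int]]) -> int:
    # Load: write the cleaned records into the target store.
    conn.execute("CREATE TABLE IF NOT EXISTS results (name TEXT, count INTEGER)")
    conn.executemany("INSERT INTO results VALUES (?, ?)", records)
    conn.commit()
    return len(records)
```

A run would chain the three: load(conn, transform(extract(text))), with conn opened via sqlite3.connect(":memory:") for a quick trial.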

Election Result Maps

Posted on November 9, 2008 by Karl

Data visualizations can sometimes spur us into contemplative directions. Sometimes they can put us to sleep. These are some of the more interesting election visualizations I’ve come across:

Mark Newman, Department of Physics and Center for the Study of Complex Systems, University of Michigan: Election Maps

Robert J. Vanderbei, Professor and Chair, Operations Research and Financial Engineering, Princeton: Purple America

What the electoral map would look like if decided by 18-29 year olds

NYTimes: Election Results 2008

Interesting Analysis

David Kuhn: Politico: That huge voter turnout? Didn’t happen: “Between 60.7 percent and 61.7 percent of the 208.3 million eligible voters cast ballots this year, compared with 60.6 percent of those eligible in 2004”

Andrew Sullivan: He Saw It Coming: McCain/Palin ran a post-modern campaign (unlike Sullivan, I think it almost worked).

CNN: Number of votes cast set record, but voter turnout percentage didn’t

Associated Press: No hidden white bias seen in presidential race

CSMonitor: Obama made inroads with religious vote

NYTimes: This American Moment – The Surprises: Guess who Joe the Plumber voted for?

Salon.com: How Obama won, by the numbers: “The 18-to-29-year-old cohort supported Obama by a 2-to-1 margin (66-32), and while it is too soon to gauge precise turnout measures, their numbers clearly grew.”

Salon.com: Obama and the dawn of the Fourth Republic

NYTimes: Dissecting the Changing Electorate

Vote swings in rich and poor countries

Red State, Blue State, Rich State, Poor State: Election 2008: what really happened

Interesting Tools:

A Beautiful WWW: 20 Useful Visualization Libraries

igraph Python library

physorg.com: Visualizing election polls

IBM’s Many Eyes

Full feeds versus partial feeds

Posted on March 23, 2006 by Karl

Lots of folks out there take a hard line when it comes to publishing either full feeds (the entire contents of each post being published in RSS/Atom) or partial feeds.
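
In publishing terms the difference is small. A rough sketch, assuming a hypothetical helper and a word-based cutoff (real blog software may truncate differently or use a hand-written excerpt):

```python
def feed_description(post_body: str, full: bool = True, max_words: int = 50) -> str:
    """Return the text to publish in a feed entry for one post."""
    if full:
        # Full feed: the entire post goes into the entry.
        return post_body
    words = post_body.split()
    if len(words) <= max_words:
        # Short posts are published whole even in a partial feed.
        return post_body
    # Partial feed: only the first max_words words, with a truncation marker.
    return " ".join(words[:max_words]) + " [...]"
```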

Scoble, for example, is famous for declaring he won’t subscribe to anyone’s partial feed.

Shelley and Rafe have posted thoughtful takes on this, from either side of the fence.

My take? Well, I publish a full feed, though for the longest time I didn’t. It hasn’t made a difference to my readership one way or another, because this is such a personal space for me.

‘There is more than one way to do it’ should not only be the motto of Perl, but the motto of the web. There is room for both approaches – and many more. We’ve mostly gotten each other speaking the same language (hey I know that’s arguable), but to argue that there is only ‘one true way’ to publish the sentences misses the beauty of the web.

paradox1x.org

Karl Martino, Philadelphia, PA, USA

Tag Archives: data

