Tag Archives: feeds

The UNIX Way

Kas Thomas of CMS Watch riffs on “The UNIX Way”, a set of principles summarized by Mike Gancarz:

1. Small is beautiful.
2. Make each program do one thing well.
3. Build a prototype as soon as possible.
4. Choose portability over efficiency.
5. Store data in flat text files.
6. Use software leverage to your advantage.
7. Use shell scripts to increase leverage and portability.
8. Avoid captive user interfaces.
9. Make every program a filter.
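
To make rules 2 and 9 concrete, here is a minimal sketch of a filter: a program that does one thing (keeps lines containing a word), reads standard input, and writes standard output so it can sit in the middle of a pipeline. The script name `keepword.py` and the function names are my own placeholders, not anything from Gancarz or from Kas Thomas's piece.

```python
import sys


def keep_matching(lines, word):
    """Yield only the lines that contain `word`, ignoring case."""
    needle = word.lower()
    for line in lines:
        if needle in line.lower():
            yield line


def run_filter(word, source, sink):
    """Copy matching lines from one text stream to another."""
    for line in keep_matching(source, word):
        sink.write(line)


if __name__ == "__main__":
    # Hypothetical usage: cat posts.txt | python keepword.py feed | sort
    if len(sys.argv) == 2:
        run_filter(sys.argv[1], sys.stdin, sys.stdout)
    else:
        print("usage: keepword.py WORD < input.txt", file=sys.stderr)
```

Because it has no prompts or menus (rule 8) and only touches plain text streams (rule 5), it can be chained with any other tool that reads and writes lines.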

Read the whole piece.

Reading up on ETL (Extract, Transform, Load) processing

Wikipedia: Extract, transform, load

Wikipedia: Talend Open Studio

Talend Open Studio: Tutorials

Manageability: Open Source ETL (Extraction, Transform, Load) Written in Java

richard.gluga.com: Data Migration Done Right

kJube: Vendors and tools – ETL

AlfrescoForge: ETL Connector

Talend job for Job Scheduler implement

High Scalability: How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data

NYTimes: Announcing the Map/Reduce Toolkit

core-user@hadoop.apache.org: Andreas Kostyrka: Re: hadoop in the ETL process
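
For anyone new to the term, the three stages in the heading can be sketched in a few lines using only the Python standard library. This is a toy illustration and not how Talend, Hadoop, or any tool linked above works; the sample data, the `visits` table, and the function names are all made up for the example.

```python
import csv
import io
import sqlite3

# Made-up flat-text input; the second data row is deliberately malformed.
SAMPLE_CSV = "name,visits\n alice ,3\nBob,not-a-number\ncarol,7\n"


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names and drop rows whose count is not an integer."""
    clean = []
    for row in rows:
        try:
            visits = int(row["visits"])
        except (TypeError, ValueError):
            continue  # skip malformed rows instead of failing the whole run
        clean.append((row["name"].strip().lower(), visits))
    return clean


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS visits (name TEXT, visits INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Run the three stages in order and return what ended up in the table."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT name, visits FROM visits ORDER BY name").fetchall()
```

Calling `run_etl(SAMPLE_CSV, sqlite3.connect(":memory:"))` returns `[('alice', 3), ('carol', 7)]`: the malformed row is dropped in the transform stage.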

Smart aggregation and API use in NPRbackstory

NPRbackstory is an automated Twitter feed that attempts to add context to the stories trending today on Google’s Hot Trends. It draws on NPR’s archives (very smart: as Joshua Benton notes, archives are underused assets) and uses Yahoo! Pipes to produce an RSS feed that is fed into the NPRbackstory account. It was developed by Keith Hopper of NPR’s Public Interactive group.
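
The general shape of that pipeline (trending terms in, matching archive stories out, wrapped as RSS) can be sketched in plain Python. To be clear, this is not Keith Hopper's implementation, which was built in Yahoo! Pipes; the trend list, the archive entries, the example.org links, and the function names below are all invented stand-ins, and no real Google or NPR API is called.

```python
import xml.etree.ElementTree as ET

# Stand-in data: a real version would fetch trends and search an archive API.
TRENDS = ["solar eclipse", "tax deadline"]
ARCHIVE = [
    {"title": "How to Watch a Solar Eclipse Safely", "link": "https://example.org/eclipse"},
    {"title": "A Short History of the Income Tax", "link": "https://example.org/tax"},
    {"title": "Tax Deadline Day at the Post Office", "link": "https://example.org/deadline"},
]


def find_backstory(trend, archive):
    """Return the first archive story whose title contains the trend phrase."""
    needle = trend.lower()
    for story in archive:
        if needle in story["title"].lower():
            return story
    return None


def build_rss(trends, archive):
    """Build an RSS 2.0 document with one item per trend that has a match."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "Backstory sketch"
    ET.SubElement(channel, "link").text = "https://example.org/"
    ET.SubElement(channel, "description").text = "Archive stories matched to trending terms"
    for trend in trends:
        story = find_backstory(trend, archive)
        if story is None:
            continue  # no archive match, so nothing to post for this trend
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"{trend}: {story['title']}"
        ET.SubElement(item, "link").text = story["link"]
    return ET.tostring(rss, encoding="unicode")
```

A separate step would then post each new feed item to Twitter, which keeps the matching logic and the publishing step independent of each other.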

Read Joshua Benton’s piece at Nieman Journalism Lab

Read more about it at Keith Hopper’s blog.

Check out his other Twitter-related project, Twitterstars, a tool for finding local Twitter power tweeters.