This presentation was great to get a peek at what Twitter’s Storm was about: YouTube: PyCon US 2012: Gabriel Grant:
Related:
Twitter Engineering: “A Storm is coming: more details and plans for release”
GitHub: Storm
Guardian: Paul Bradshaw: “How to be a data journalist”
ProPublica: Jeff Larson: “The Rainbow Connection: How We Made Our CDO Connections Graphic” (tools mentioned: google-refine (formerly Gridworks), Raphaël, JSON)
You’re a programmer with a task to retrieve information from some source, manipulate and massage it, and deploy it somewhere.
Like all things in programming, there is an acronym for that: “ETL”.
ETL stands for Extract, Transform, and Load. The Wikipedia page is pretty thorough in its summary of the topic and reviews many of the typical functions an ETL process needs to perform to accomplish its task.
The problem is ETL doesn’t roll off the tongue so easily. The acronym provides a weak set of metaphors for programmers to map familiar concepts to.
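To make the three stages concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the sample CSV data, the `visits` table, and the function names are hypothetical, and a real job would pull from a file, API, or database instead of an inline string.

```python
import csv
import io
import sqlite3

# Hypothetical sample input, invented for this sketch.
RAW_CSV = "name,visits\n alice ,3\nBob,not-a-number\ncarol,7\n"


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: tidy up names and drop rows whose visit count is not an integer."""
    cleaned = []
    for row in rows:
        try:
            visits = int(row["visits"])
        except (TypeError, ValueError):
            continue  # skip rows that cannot be parsed
        cleaned.append((row["name"].strip().title(), visits))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a SQLite table."""
    conn.execute("CREATE TABLE IF NOT EXISTS visits (name TEXT, visits INTEGER)")
    conn.executemany("INSERT INTO visits VALUES (?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Run the three stages in order and return what ended up in the table."""
    load(transform(extract(text)), conn)
    return conn.execute("SELECT name, visits FROM visits ORDER BY name").fetchall()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print(run_etl(RAW_CSV, connection))
```

Keeping each stage as its own function means the source or the destination can be swapped out without touching the transformation logic in the middle.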
Rafe Colburn provides a great mental model to apply when developing ETL scripts and applications. It’s one I follow, but have lacked the words to describe. Go read his post.
Here’s a thought to challenge you if you are a CMS developer, now that you have read the above: are the forms you build to let people contribute and manage content in a CMS a kind of ETL process? Does the Wikipedia description of “Extract, Transform, and Load” include functions you would expect a CMS to encompass?
And speaking of CMS, Gadgetopia has a terrific article on what a CMS is. It is difficult to be clear in a world where hype and acronyms get thrown about so much (like this very post!), but the Gadgetopia piece certainly is. It helps outline the functionality you should expect from a CMS implementation.
Jon Moore: NoSQL East 2009 Redux
Dare Obasanjo: Building Scalable Databases: Perspectives on the War on Soft Deletes
Explain Extended: What is a relational database?
Explain Extended: What is the entity-relationship model?
Data Doghouse: Data Integration: Hand-coding Using ETL Tools
Data Doghouse: Data Integration: Hand-coding Using ETL Tools Part 2
Smart Data Collective: ETL tools: Don’t Forget About the Little Dogs
Smart Data Collective: Data Integration: Hand-coding Using ETL Tools
Communications of the ACM: Extreme Agility at Facebook
Dare Obasanjo: Facebook Seattle Engineering Road Show: Mike Shroepfer on Engineering at Scale at Facebook
Engineering@Facebook: Hive – A Petabyte Scale Data Warehouse using Hadoop
Yahoo! Developer Blog: Announcing the Yahoo! Distribution of Hadoop
Wikipedia: Extract, transform, load
Wikipedia: Talend Open Studio
Talend Open Studio: Tutorials
Manageability: Open Source ETL (Extraction, Transform, Load) Written in Java
richard.gluga.com: Data Migration Done Right
kJube: Vendors and tools – ETL
AlfrescoForge: ETL Connector
Talend job for Job Scheduler implement
High Scalability: How Rackspace Now Uses MapReduce and Hadoop to Query Terabytes of Data
NYTimes: Announcing the Map/Reduce Toolkit
core-user@hadoop.apache.org: Andreas Kostyrka: Re: hadoop in the ETL process
Re: hadoop in the ETL process