State of Data Last Week – Dec 06
December 4, 2010
#Analysis – Logistic regression is a ‘categorical tool’: it predicts a yes/no outcome, e.g., fraud vs. not fraud. Here is a great starter with a worked-out case analysis you can run in minutes using R on your laptop.
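To make the fraud/not-fraud idea concrete, here is a minimal sketch in plain Python (not the R walkthrough from the linked starter). It fits a one-feature logistic regression by gradient descent. The transaction amounts and fraud labels are made-up toy data, and the function names `fit_logistic` and `predict_proba` are just illustrative choices.

```python
import math


def sigmoid(z):
    # Numerically safe logistic function: maps any real number into (0, 1).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of log loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


def predict_proba(x, w, b):
    # Estimated probability that a transaction of size x is fraudulent.
    return sigmoid(w * x + b)


# Made-up toy data: transaction amount (in thousands), label 1 = fraud.
amounts = [0.2, 0.5, 0.8, 1.0, 1.5, 3.0, 3.5, 4.0, 4.5, 5.0]
is_fraud = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

w, b = fit_logistic(amounts, is_fraud)
for amount in (0.2, 2.0, 5.0):
    print(amount, round(predict_proba(amount, w, b), 3))
```

The model outputs a probability, so the ‘categorical’ answer comes from picking a cutoff (0.5 is a common default, but for fraud you may want a lower one).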
#Architecture – Top 5 Free, Open Source Data Mining Software
#Big Data – A single race car streams 27GB of telemetry data from 200 sensors during a race weekend.
#DBMS – Build your own ‘Circular Log’ / ‘Log Rotation Routine’ with MySQL (or any data store). In later Oracle releases, ‘interval partitioning’ is built in for this.
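One way to picture a circular log is a table with a fixed number of slots that get overwritten in a ring. The sketch below is not the linked article's MySQL routine: it uses SQLite through Python's standard `sqlite3` module as a stand-in data store, and the table name `circular_log` and the capacity `MAX_ROWS` are hypothetical. MySQL has a comparable `REPLACE` statement, so the same idea should carry over.

```python
import sqlite3

MAX_ROWS = 5  # hypothetical capacity: the log never holds more rows than this


def open_log(conn):
    # slot is the ring position (0 .. MAX_ROWS-1); seq is the ever-increasing entry number.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS circular_log (
            slot    INTEGER PRIMARY KEY,
            seq     INTEGER NOT NULL,
            message TEXT    NOT NULL
        )
        """
    )


def append(conn, message):
    # The next sequence number is one past the highest stored so far (0 for an empty log).
    (last_seq,) = conn.execute("SELECT MAX(seq) FROM circular_log").fetchone()
    seq = 0 if last_seq is None else last_seq + 1
    # INSERT OR REPLACE overwrites whichever older entry currently occupies the slot.
    conn.execute(
        "INSERT OR REPLACE INTO circular_log (slot, seq, message) VALUES (?, ?, ?)",
        (seq % MAX_ROWS, seq, message),
    )
    conn.commit()
    return seq


def read_log(conn):
    # Return the surviving entries, oldest first.
    return conn.execute(
        "SELECT seq, message FROM circular_log ORDER BY seq"
    ).fetchall()


conn = sqlite3.connect(":memory:")
open_log(conn)
for i in range(8):
    append(conn, f"event {i}")

entries = read_log(conn)
print(entries)
```

After eight appends into five slots, only the five most recent events remain, with no separate purge job needed.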
#Learning – Want to learn how to write efficient SQL from one of the masters of relational databases? An excellent 16-hr ‘SQL Master Class’ video course from Chris Date shows how to avoid common traps and pitfalls. Best of all, it’s completely FREE for Intuit employees using Safari Online.
#Visualization – Logstalgia displays your web access logs as a ‘Pong-like battle between the web server and a never-ending torrent of requests’. Requests appear as colored balls!
glTail is a similar FREE, real-time log-visualization tool – ‘each circle is a hit on the website, and the size of the circle indicates the size of the request’.
#etc
- Massively Parallel Bacterial Data Storage – 90GB of data stored in 1g of bacteria
- 88% of tax returns filed by prisoners not screened for potential fraud.
- Great one-a-day calendar of articles by leading performance experts from the usual big web-scale companies (http://calendar.perfplanet.com/2010/). E.g., the Dec 2 entry analyzes why you need a minimum of 60 servers to build a movies-on-demand service.
- Slate invited ideas for “Data for a Better Planet”. E.g., Asthmapolis distributes a GPS-equipped attachment for asthma inhalers and uses the data to help understand what sets off patients’ attacks.