State of Data #44
April 15, 2011
#analysis – ICWSM 2011 published the list of accepted papers. ‘4chan..Analysis of Anonymity..in Large Online Community’ (PDF) is interesting for large-data practitioners. Folks from MIT analyzed over 5M posts to ‘quantify ephemerality’. The median life of a thread is ~4 min; the longest-lived thread in the sample was alive for only 6.2 hrs (compare that with identity-enabled sites). The authors found ‘anonymity promotes disinhibition, mob-behavior’ but the disinhibition worked better in ‘advice and discussion threads’. NSFW language warning: the paper reports content verbatim from /b/ on 4chan.
#architecture – Take a sneak peek inside the world’s 10 largest data centers
#big_data – Visualizing News Data for Defense Research & Intelligence Analysis – ‘take terabytes of data from 5000 sources and make it actionable’ (using this editor’s favorite viz tool, Spotfire) – nice 46 min presentation with Q&A later
#DBMS – Running Red Hat, Oracle, and new Xeon processors? You may get about 10% better performance by enabling Turbo Boost
#learning – Automated processing of WikiLeaks cables showing friends (green dots), foes (red), and passersby (teal and blue) – original Stanford class project here (PDF). They found Spain to be the US’s most important ally
#visualization – The Beauty of Maps – the entire BBC series is now available to watch
- Cartoon guide to Statistical Significance
- Netflix Effect – When Software Suggests Student’s Course
- Got a Machine Learning vacancy? Watson is looking for work
- A tool to help deal with very large (think $T) numbers – scale them down to a ‘normal level’. Philip Greenspun scales the US budget down to a ‘family that is spending $38,200 per year. The family’s income is $21,700 per year. The family adds $16,500 in credit card debt every year’.
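
The scale-down trick in the last item is plain division, and a minimal Python sketch makes the arithmetic easy to check. The household figures come from the quote above. The scale factor of 100 million is an assumption (the item does not state the factor used), and the trillion-scale inputs are simply the household figures multiplied back up by that assumed factor, not official budget numbers. The helper name `scale_down` is hypothetical.

```python
# ASSUMPTION: scale factor of 100 million; the text does not state the factor used.
SCALE = 100_000_000


def scale_down(amount: int, factor: int = SCALE) -> int:
    """Divide a very large dollar amount by `factor`, rounded to the nearest dollar."""
    return round(amount / factor)


# Household-level figures quoted in the text.
household_spending = 38_200
household_income = 21_700

# The yearly shortfall is spending minus income.
household_new_debt = household_spending - household_income

# ASSUMPTION: illustrative large inputs, obtained by scaling the household
# figures back up by SCALE. They are not taken from an official source.
budget_spending = household_spending * SCALE
budget_income = household_income * SCALE
budget_deficit = budget_spending - budget_income

print(f"Spending: ${scale_down(budget_spending):,} per year")
print(f"Income:   ${scale_down(budget_income):,} per year")
print(f"New debt: ${scale_down(budget_deficit):,} per year")
```

The gap between spending and income works out to the $16,500 of new debt quoted above, so the three household figures are internally consistent whatever factor was actually used.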