State of Data #67
September 25, 2011
#analysis – Three Secrets of Business Analytics (from 37Signals)
How Lloyd’s of London uses R for Insurance
#architecture – How and why a Portland startup went from Postgres to MongoDB and came back (PDF)
This might make some people cringe. Mongo has a single global read/write lock for the entire server. The effect this has is that if a write ever takes a non-trivial amount of time (page fault combined with slow disk, perhaps), everything backs up. We had high lock % when disk %util was only ~30-40%.
#big_data – Convert .csv file to MySQL Database
Yelp has opened up reviews for 7,000 businesses and is calling on talented data miners from universities to solve problems, e.g., “Top 10 Positive and Negative words ranked”
#Data_Science – Building Data Science Teams
All the top data scientists share an innate sense of curiosity. Their curiosity is broad, and extends well beyond their day-to-day activities. They are interested in understanding many different areas of the company, business, industry, and technology. As a result, they are often able to bring disparate areas together in a novel way… I’ve seen data scientists apply novel DNA sequencing techniques to find patterns of fraud.
#DBMS – Is Database Design a dying art or a dead art already? (interesting comments too)
#idea – Story behind Opera’s $84M big data funding
#learning – “Is the average number of fair coin tosses required to get a HTH (Head-Tails-Head) pattern greater than, less than, or the same as, the number of tosses required to get a HTT pattern?” Peter Donnelly (TED talk) shows how stats fool juries
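For anyone who wants to check the coin-toss puzzle without watching the talk first, here is a minimal Python sketch. It is not taken from the talk. The helper names `expected_tosses` and `simulate` are made up for illustration. The first uses the standard overlap rule for a fair coin: add 2^k for every length k at which the pattern's first k symbols equal its last k symbols. The second is a seeded simulation as a rough cross-check.

```python
import random


def expected_tosses(pattern: str) -> int:
    """Exact expected number of fair-coin tosses until `pattern` first appears.

    Overlap rule: every length k where the prefix of length k equals the
    suffix of length k contributes 2**k to the expected waiting time.
    """
    n = len(pattern)
    total = 0
    for k in range(1, n + 1):
        if pattern[:k] == pattern[n - k:]:
            total += 2 ** k
    return total


def simulate(pattern: str, trials: int = 20000, seed: int = 0) -> float:
    """Average tosses until `pattern` appears, estimated by simulation."""
    rng = random.Random(seed)  # seeded so repeated runs agree
    total = 0
    for _ in range(trials):
        window = ""
        tosses = 0
        while window != pattern:
            # keep only the most recent len(pattern) tosses
            window = (window + rng.choice("HT"))[-len(pattern):]
            tosses += 1
        total += tosses
    return total / trials


if __name__ == "__main__":
    for p in ("HTH", "HTT"):
        print(p, expected_tosses(p), round(simulate(p), 2))
```

The exact values are 10 tosses for HTH and 8 for HTT, so HTH takes longer on average. The intuition: a failed attempt at HTT (ending in HTH) already leaves you with an H to restart from, whereas a failed attempt at HTH (ending in HTT) sends you back to the beginning.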
#visualization – Meta-visualization – what are the most popular types
#etc
- Cookies’ data could reveal SO much about users
- Understanding ‘Opportunity Cost’
- Why Speed Matters in Data Transfer – $300M to save 6ms
- Humans Strike Back – Trump machines in Facial Recognition