It feels like I’ve made an art form out of only updating my blog once per year. Except that I put all my articles on Medium.com now. Either way, here’s my review of experimentation in 2016.
This is my annual review of everything I tweeted about AB-testing in 2015. It’s a couple of years old, but mostly holds up very well!
This is simply the 3rd instalment of my annual summary of everything I tweeted about shaping behavior over the year. The difference this time, however, is that the article itself is on Medium. It’s just a much easier platform!
Some of the most interesting things happening in 2013 were around recommender engines. Amazon.com won an Emmy for their video recommender and Netflix algorithms got mainstream coverage with every mention of the movie Sharknado. Also, Arjan Haring presented some interesting thoughts about social proof in the framing of recommendations.
Another interesting trend is more insight into the effects of user-generated content like reviews and online comments. Sinan Aral demonstrated how reviewers are shaped by social contagion, and the trend of shutting off comments is (thankfully) growing.
On the less exciting side, last year I hoped for more insights into online loyalty and stickiness this year, but little surfaced in that regard. The only thing I noticed was an article by Arie Goldshlager on predicting repeat customers, but even that was just referencing research from 2008. Maybe people just keep their cards too close to their chest on this.
Here’s the top material I found and tweeted in 2013 on everything related to online behavior, from conversion optimization and psychology to recommender systems, data science and AB-testing.
Online behavior change
“When you want to motivate someone to exercise regularly, a first push up is a great start! The same goes when you want to sell products.”
Maximizing conversion with micro persuasion (Arjan Haring, Econsultancy.com)
Why We Overestimate Technology and Underestimate the Power of Words (Arjan Haring, Copyblogger)
7 Principles From 7 Years Of Landing Pages (Scott Brinker, Search Engine Land)
5 Dangerous Conversion Optimization Myths (Linda Bustos, GetElastic)
The One (Really Easy) Persuasion Technique Everyone Should Know (Jeremy Dean, PsyBlog)
The Recipe for a Perfect Landing Page (Amy Hardingson, Yahoo)
How to Know When You’ve Done Too Much Conversion Optimization (Chris Goward, Wider Funnel)
How to Use Personalized Content and Behavioral Targeting For Improved Conversions (Ott Niggulis, ConversionXL)
Nine conversion techniques from the 1920s to try today (Dave Gowans, Econsultancy.com)
Persuasive Psychology for Interactive Design (Brian Cugelman)
URLs are for People, not Computers (Andreas Bonini, Not Implemented)
5 Principles of Persuasive Web Design (Peep Laja, ConversionXL)
Use of Pricing Tables in Web Design – Starkly Comparison (Nataly Birch, designmodo)
32 UX Myths (Zoltán Gócza and Zoltán Kollin)
“we have mapped the brain regions associated with ideas that are likely to be contagious”
How the brain creates the ‘buzz’ that helps ideas spread (Stuart Wolpert, UCLA Newsroom)
Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach (Schwartz et al., PLOS ONE)
Your casual acquaintances on Twitter are better than your friends on Facebook (Clive Thompson)
How To Get People To Do Stuff #5: What makes things go viral? (Susan Weinschenk, The Brain Lady Blog)
LinkedIn Endorsements: Reputation, Virality, and Social Tagging (Sam Shah, LinkedIn)
So you think you can go viral? Three reasons you may be kidding yourself! (Sangeet Paul Choudary, Platform Thinking)
Do you fear you are missing out? (Phys.org)
Measuring International Mobility Through Where People Log In to Commonly Used Websites (David McKenzie, blogs.worldbank.org)
Reviews and comments
“someone invented ‘reader comments’ and paradise was lost.”
This Story Stinks (Dominique Brossard and Dietram A. Scheufele, New York Times)
The real reason for rotten online reviews on TripAdvisor (Rory Sutherland, The Spectator)
“Positive comments tended to attract birds of a feather” (Tim Harford, the undercover economist)
The pitfalls of crowdsourcing: online ratings vulnerable to bias (Carolyn Y. Johnson, Boston.com)
The Problem With Online Ratings (Sinan Aral, MIT)
“Is Starbucks missing out on millions of dollars in revenue because its coffee prices are too low?”
Is Starbucks coffee too cheap? (Roger Dooley, Forbes.com)
You looking at me? Making eye contact might get you punched in the face (John Ericson, Newsweek)
Top 10 bargaining tricks in China (“judaicaman”, eBay buying guides)
10 Dirty Negotiation Tactics and How to Beat Them (Barry Moltz, Open Forum)
Drinking with your eyes: How wine labels trick us into buying (Michaleen Doucleff, The Salt/NPR)
Slot machines: a lose lose situation (Tom Vanderbilt, The Guardian)
How Memories of Experience Influence Behavior (Peter Noel Murray, PsychologyToday)
No windows, one exit, free drinks: Building a crowdsourcing project with casino-driven design (Al Shaw, Nieman Journalism Lab)
The Psychology of Effective Workout Music (Ferris Jabr, Scientific American)
Restaurant menu psychology: tricks to make us order more (Amy Fleming, The Guardian)
There’s an entire industry of exploitation that relies on fear and shame as motivators for business.
What Fear-Based Business Models Teach Us About User Motivation (Max Ogles, FastCompany)
What happens when you actually click on one of those “One Weird Trick” ads? (Alex Kaufman, Slate)
How to Instill False Memories (Steven Ross Pomeroy, Scientific American)
The psychology experiment that involved real beheadings (Esther Inglis-Arkell, io9)
If you text a lot, you are probably also racist and shallow (Annalee Newitz, io9)
Are your recommendations any good? (Mark Levy, Data Science in Action)
The Science Behind the Netflix Algorithms That Decide What You’ll Watch Next (Tom Vanderbilt, Wired)
Why There Are So Many Terrible Movies on Netflix (Meghan Neal, Vice)
Shit Recommendation Engines Say (Lukas Vermeer)
Why You Should Not Build a Recommendation Engine (Valerie Coffman, Data Community DC)
Online Controlled Experiments at Large Scale (Kohavi et al., KDD 2013)
Recommender systems: from algorithms to user experience (Joseph A. Konstan and John Riedl)
“Robert McNamara epitomizes the hyper-rational executive led astray by numbers.”
The Dictatorship of Data (Kenneth Cukier and Viktor Mayer-Schönberger, MIT Technology Review)
WTF Visualizations: data science
What Does It Really Matter If Companies Are Tracking Us Online? (Rebecca J. Rosen, The Atlantic)
16 useless infographics (Mona Chalabi, The Guardian)
Why you should never trust a data scientist (Pete Warden)
Statistics Done Wrong (Alex Reinhart)
The Potential and the Risks of Data Science (Steve Lohr, New York Times)
Data Science: For Fun and Profit (Lukas Vermeer)
Seven dirty secrets of data visualisation (Nate Agrin and Nick Rabinowitz, Creative Bloq)
5 ways big data is going to blow your mind and change your world (Derrick Harris, Gigaom)
‘Neuromarketing’: can science predict what we’ll buy? (Alex Hannaford, The Telegraph)
Most data isn’t “big,” and businesses are wasting money pretending it is (Christopher Mims, Quartz)
DARPA envisions the future of machine learning (Phys.org)
Obama Campaign Misjudged Mac Users Based On Orbitz’s Experience, Says Chief Data Scientist (Kashmir Hill, Forbes)
“When running online experiments, getting numbers is easy; getting numbers you can trust is hard.”
Online Experiments: Practical Lessons (Ron Kohavi, Roger Longbotham, and Toby Walker, Microsoft)
The do’s and don’ts in A/B testing (Floor Drees, Usersnap)
Research Practices That Can Prevent an Inflation of False-Positive Rates. (Murayama K, et al.)
Effective Web Experimentation as a Homo Narrans (Dan McKinley)
Theory-testing in psychology and physics: a methodological paradox (Paul E. Meehl, Philosophy of Science)
The Nuremberg Code for human experimentation
Is your A/B testing effort just chasing statistical ghosts? (Mats Stafseng Einarsen, Booking.com)
Split-testing 101: A quick-start guide to conversion rate optimization (Conversion Rate Experts)
False positives and false negatives in predicting customer lifetime value (Arie Goldshlager)
If you want to achieve anything bigger than yourself, you need others to play along. Even if people management isn’t your calling, knowing how to lead people is a hugely useful skill. Having tried, failed and succeeded in leading development teams in various ways, I want to share three concepts that have been very useful to me.
I’ve always been on the lookout for literature on leadership that fits 3 simple criteria: 1) Not based in ideology. 2) Some backing in data and research. 3) Directly applicable, not grand ideas or personal development plans. So far I’ve found three great concepts worth mastering: the SCARF model, Level 5 leadership and High Performance Teams. There’s a lot of questionable material around these concepts online, so I’ve tried to pick a few articles as close to the original sources as possible.
The SCARF model: Before you can really work well together with anyone, you have to make them want to approach work rather than avoid it. It’s really that simple. The SCARF model uses very basic psychological principles to describe what people need in order to want to participate in something. The name stands for Status, Certainty, Autonomy, Relatedness and Fairness. This is a very good article that I recommend to everyone:
Level 5 Leadership basically teaches humility with resolve. I’m not sure where it’s best to start, but here are a few links. If you want to learn more, read Jim Collins’ book “Good to Great”.
High Performance Teams is a very simple concept. It’s just a question of helping your team find the right way of working together, and gives you some specific elements to look for in your team:
There is so much more, of course. What these articles can help you with is to start thinking along the 3 most important lines in people management: motivation and engagement, how to lead while listening and how to create a system in which your team can succeed. Enjoy.
I’m just going to leave this here. When moving to Amsterdam, I put together a quick scraping and spidering script to get a list of the most frequently used words in Dutch, so I could practice Dutch and build my vocabulary. The thinking was that by starting with the highest-frequency words, I would learn the language in a demand-driven fashion and learn what matters first! It’s got a few bells and whistles like Google translation links and context examples. I figured I’d share it, since it comes up once in a while.
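For anyone who just wants the gist, here is a minimal sketch of the counting part. It is not the script itself: it skips the scraping and assumes you already have some Dutch text as a string. The function names and the Google Translate URL pattern are illustrative assumptions.

```python
import re
from collections import Counter
from urllib.parse import quote


def translate_link(word):
    # Assumed URL pattern for a Dutch -> English Google Translate lookup.
    return "https://translate.google.com/?sl=nl&tl=en&text=" + quote(word)


def top_words(text, limit=100):
    """Return (word, count, example sentence, translate link) rows,
    most frequent words first."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    counts = Counter()
    examples = {}
    for sentence in sentences:
        # [^\W\d_]+ matches runs of letters, including accented ones.
        for word in re.findall(r"[^\W\d_]+", sentence.lower()):
            counts[word] += 1
            # Keep the first sentence a word appears in as its context example.
            examples.setdefault(word, sentence.strip())
    return [
        (word, count, examples[word], translate_link(word))
        for word, count in counts.most_common(limit)
    ]
```

Feed it any large chunk of Dutch text and the first rows it returns are the words to practice first.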
Here’s the top 100 words in Dutch. Run the script to get a larger sample.
I want to develop it further by adding a flash-card generator and auto-generated quizzes, since both those methods are known to be good for language learning. Also it would be nice to have it remember the words you’ve seen and learnt.
Also, disclaimer: I’ve lived over 3 years in the Netherlands and my Dutch is awful, so this approach to language learning does not have a good track record so far.
Mind hacks, recommendations and behavioral heuristics: 2012’s top articles on online consumer behavior
Understanding consumer psychology and online behavior has become essential and mainstream knowledge for e-commerce development in 2012. While some of us might regret that the cat is out of the bag, it also means a lot of smart people are figuring out a lot of smart things. Below are what I found to be the most insightful and actionable articles in 2012.
There are two things mostly missing: Recommender systems are still exclusively the domain of data crunching and algorithms, while I’d like to see more on inspiration and getting people out of the filter bubble. The other thing I haven’t seen much of in 2012 is any interesting work on stickiness and loyalty.
9 Things to Know About Influencing Purchasing Decisions (ConversionXL)
Persuade with Pictures (Neuromarketing)
Secrets from The Science of Persuasion (by Robert Cialdini & Steve Martin)
A – Z of persuasion (Richard Sedley, Loopstatic)
“Self-Efficacy” = a highly competent persuasion technique (+5 conversion tips!) (Bart Schutz, Online Persuasion)
“Autonomy”: a Super Persuasive Technique (+ 5 conversion tips!) (Bart Schutz, Online Persuasion)
Nine valuable techniques to persuade visitors to buy in 2012 (Paul Rouke, Econsultancy)
Lings Cars and the art of persuading visitors to buy (Paul Rouke, Econsultancy)
The Hard Sell
‘Fair and square’ pricing? That’ll never work, JC Penney. We like being shafted (Bob Sullivan, NBC News)
Pricing Experiments You Might Not Know, But Can Learn From (ConversionXL)
Using Behavioral Economics, Psychology, and Neuroeconomics to Maximize Sales (Mark Hayes, Shopify Blog)
12 brands increasing conversions by understanding human psychology (Kelvin Newman, Econsultancy)
Evolving E-commerce Checkout (Luke Wroblewski)
Win the Pitch: Tips from Mastercard’s “Priceless” Pitchman (Kevin Allen, HBR)
From AB tests to MAB tests (talk by John Myles White)
Three reasons to stop A/B testing (Maurits Kaptein on Econsultancy)
Experimenting at Scale (Josh Wills, Google)
Trustworthy Online Controlled Experiments: Five Puzzling Outcomes Explained (Kohavi, Deng, Frasca, Longbotham, Walker & Xu, Microsoft)
Spotify solves discovery by discovering music ain’t so social after all (Robert Andrews on paidContent)
Evaluating the effectiveness of explanations for recommender systems (Tintarev & Masthoff, User Modeling and User-Adapted Interaction)
Building Large-scale Real-world Recommender Systems (Amatriain, Netflix)
Dark Social: We Have the Whole History of the Web Wrong (The Atlantic)
5 Design Tricks Facebook Uses To Affect Your Privacy Decisions (Avi Charkham, TechCrunch)
Creating Effective Loyalty Programs Knowing What (Wo-)Men Want (Valentina Melnyk, UVT)
Some Offline Learnings
Buy Design: Meet Paco Underhill, retail anthropologist (Metafilter post)
Bizarre Insights From Big Data (New York Times)
The Touch-point Collective: Crowd Contouring on the Casino Floor (Natasha Dow Schüll, Limn)
Design & Other Mind Games
Hacking the brain for fun and profit (Mind Hacks)
61 Behavioral Biases That Screw Up How You Think (Aimee Groth, Gus Lubin & Shlomo Sprung, Business Insider)
If… (Introducing behavioural heuristics) (Dan, Design with Intent)
How top beauty brands seduce you with emotional design: A UX study (Harrison Weber, The Next Web)
How to choose the right UX metrics for your product (Kerry Rodden, YouTube/Google)
Tripline lets you create trips on maps. I just had to test how embedding their maps works, so this blog has a little Tripline map of the trip I did through Morocco, Spain and Portugal with @angelarhodes in April/May. Check it out – it’s a cool little thing to play with.
Angela’s blogpost: Morocco – Assault on the Senses.
I’ve always felt that the idea of repeated significance testing error and false positive rates is a bit of a pedantic academic exercise. And I’m not the only one: some A/B frameworks let you automatically stop or conclude at the moment of significance, and there is blessed little discussion of false positive rates online. For anyone running A/B tests there’s also little incentive to control false positives. Why make it harder for yourself to show successful changes, just to meet some standard no one cares about anyway?
It’s not that easy. It actually matters, and matters a lot, if you care about your A/B experiments and, not least, about what you learn from them. Evan Miller has written a thorough article on the subject, How Not To Run An A/B Test, but it’s too advanced to illustrate the effect very well. To demonstrate how much it matters, I’ve run a simulation of how much impact you should expect repeat testing errors to have on your success rate.
Here’s how the simulation works:
- It runs 1.000 experiments, each with 200.000 fake participants divided randomly into two experiment variants.
- The conversion rate is 3% in both variants.
- Each individual “participant” gets randomly assigned to a variant and either the “hit” or “miss” group based on the conversion rate.
- After each participant, a g-test type significance test is run, testing if the distribution is different between the two variants.
- I then count every occasion where an experiment did hit significance at 90% and 95% probability, then count every experiment that did reach significance at any point.
- As the g-test doesn’t like low numbers, I didn’t check significance during the first 1.000 participants in each experiment.
- You can download the script and alter the variables to fit your metrics.
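The steps above can be sketched in a few lines of plain Python. This is a minimal re-implementation under stated assumptions, not the downloadable script: the function names are made up for illustration, the default sizes are far smaller than the real run so it finishes quickly, and significance is decided by comparing the G statistic against the chi-square critical values for one degree of freedom (2.706 for 90%, 3.841 for 95%).

```python
import math
import random

# Chi-square critical values, 1 degree of freedom.
CRITICAL = {0.90: 2.706, 0.95: 3.841}


def g_statistic(hits_a, n_a, hits_b, n_b):
    """G-test statistic for the 2x2 table of variant x hit/miss."""
    total = n_a + n_b
    hits = hits_a + hits_b
    misses = total - hits
    g = 0.0
    for observed, row_total, col_total in (
        (hits_a, n_a, hits),
        (n_a - hits_a, n_a, misses),
        (hits_b, n_b, hits),
        (n_b - hits_b, n_b, misses),
    ):
        if observed > 0:  # empty cells contribute nothing
            expected = row_total * col_total / total
            g += observed * math.log(observed / expected)
    return 2.0 * g


def run_experiment(rng, participants, rate, burn_in):
    """Return (peeked, final): the confidence levels reached at any check
    after the burn-in, and the levels reached at the very last check."""
    n, hits = [0, 0], [0, 0]
    peeked, g = set(), 0.0
    for i in range(1, participants + 1):
        variant = rng.randrange(2)      # random assignment to A or B
        n[variant] += 1
        if rng.random() < rate:         # same conversion rate in both variants
            hits[variant] += 1
        if i > burn_in:                 # skip the low-count phase
            g = g_statistic(hits[0], n[0], hits[1], n[1])
            peeked.update(level for level, c in CRITICAL.items() if g > c)
    final = {level for level, c in CRITICAL.items() if g > c}
    return peeked, final


def simulate(experiments=100, participants=5000, rate=0.03, burn_in=1000, seed=1):
    """Count experiments that were significant at any point vs. at the end."""
    rng = random.Random(seed)
    peeked = {level: 0 for level in CRITICAL}
    final = {level: 0 for level in CRITICAL}
    for _ in range(experiments):
        p, f = run_experiment(rng, participants, rate, burn_in)
        for level in p:
            peeked[level] += 1
        for level in f:
            final[level] += 1
    return peeked, final
```

Calling `simulate()` returns two dictionaries keyed by confidence level: how many experiments crossed the threshold at any check, and how many were past it at the last participant only. With these small defaults the counts will not match the numbers from the full-size run; raise `experiments` and `participants` (and wait a while) to get closer.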
So what’s the outcome? Keep in mind that these are 1.000 controlled experiments where it’s known that there is no difference between the variants.
- 771 experiments out of 1.000 reached 90% significance at some point
- 531 experiments out of 1.000 reached 95% significance at some point
This means that if you’ve run 1.000 experiments and didn’t control for repeat testing error in any way, a rate of successful positive experiments of up to 25% might be explained by false positives alone. But you’ll see a temporary significant effect in around half of your experiments!
Fortunately, there’s an easy fix. Select your sample size or decision point in advance, and make your decision then. These are the false positive rates when making the decision only at the end of the experiment:
- 100 experiments out of 1.000 were significant at 90%
- 51 experiments out of 1.000 were significant at 95%
So you still get a false positive rate you should not ignore, but nowhere near as serious as when you don’t control correctly. And this is what you should expect when running with significance levels like these: roughly 10% false positives at 90% significance and 5% at 95%. At this point you can talk about real hypothesis testing.
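If you want a number for that decision point, one common way to pick it in advance is the normal-approximation sample size formula for comparing two proportions. The sketch below is an illustration, not part of the simulation: the function name, the 80% power default and the idea of specifying a relative lift you care about are all assumptions you should adjust to your own situation.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(base_rate, relative_lift, confidence=0.95, power=0.80):
    """Approximate participants needed in EACH variant to detect a given
    relative lift over base_rate with a two-sided test."""
    p1 = base_rate
    p2 = base_rate * (1 + relative_lift)
    # z-scores for the two-sided significance level and the desired power.
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)
```

For example, `sample_size_per_variant(0.03, 0.10)` gives the number of participants per variant needed to detect a 10% relative lift on a 3% conversion rate. Decide on that number before the experiment starts, and only look at significance once you have reached it.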