Absolutely the best guide to visiting Lisbon

From Twitter: @matseinarsen January 27th, 2018 § 0 comments § permalink

Every so often I send around this write-up I originally did for a friend about visiting Lisbon. I’m posting it here to make this humble guide easier to share, so more people can make the most of eating their way through this great city.

» Read the rest of this entry «

Advent calendar time!

From Twitter: @matseinarsen December 6th, 2017 § 0 comments § permalink

I’m currently working on a side project to make research on creating prosocial behavior easily digestible for people in tech, #ux, #product, etc. As a taster, I’ve pulled 24 concepts I’ll be tweeting daily up to Xmas under the #Convert2Good tag.

You can follow the tweets starting in this thread.

2016 in experimentation

From Twitter: @matseinarsen January 19th, 2017 § 0 comments § permalink

It feels like I’ve made an art form out of only updating my blog once per year now – except that I put all my articles on Medium.com these days. Either way, here’s my review of experimentation in 2016.

2015 in experimentation

From Twitter: @matseinarsen January 19th, 2017 § 0 comments § permalink

This is my annual review of everything I tweeted about AB-testing in 2015. It’s a couple of years old, but mostly holds up very well!

Read 2015 in experimentation at Medium.com!

2014 was an amazing year for shaping human behavior

From Twitter: @matseinarsen January 19th, 2017 § 0 comments § permalink

This is simply the 3rd instalment of my annual summary of everything I tweeted about shaping behavior over the year. The difference this time, however, is that the article itself is on Medium. It’s just a much easier platform!

2014 in human behaviour

A Tripline map: Morocco, Spain, Portugal

From Twitter: @matseinarsen July 2nd, 2012 § 1 comment § permalink

Tripline lets you create trips on maps. I just had to test how embedding their maps works, so this blog now has a little Tripline map of the trip I did through Morocco, Spain and Portugal with @angelarhodes in April/May. Check it out – it’s a cool little thing to play with.

Angela’s blogpost: Morocco – Assault on the Senses.



Is your A/B testing effort just chasing statistical ghosts?

From Twitter: @matseinarsen June 17th, 2012 § 18 comments § permalink

I’ve always felt that the idea of repeated significance testing errors and false positive rates is a bit of a pedantic academic exercise. And I’m not the only one: some A/B frameworks let you automatically stop or conclude at the moment of significance, and there is blessed little discussion of false positive rates online. For anyone running A/B tests, there’s also little incentive to control your false positives. Why make it harder for yourself to show successful changes, just to meet some standard no one cares about anyway?

It’s not that easy. It actually matters, and matters a lot, if you care about your A/B experiments – not least about what you learn from them. Evan Miller has written a thorough article on the subject in How Not To Run An A/B Test, but it’s a bit too advanced to illustrate the size of the effect. To demonstrate how much it matters, I’ve run a simulation of how much impact you should expect repeated testing errors to have on your success rate.

Here’s how the simulation works:

  • It runs 1,000 experiments, each with 200,000 fake participants divided randomly between two experiment variants.
  • The conversion rate is 3% in both variants.
  • Each individual “participant” is randomly assigned to a variant and to either the “hit” or “miss” group, based on the conversion rate.
  • After each participant, a G-test of significance is run, testing whether the conversion distribution differs between the two variants.
  • I then count every experiment that reached significance at the 90% and 95% levels at any point during its run.
  • As the G-test doesn’t behave well with low counts, I didn’t check significance during the first 1,000 participants of each experiment.
  • You can download the script and alter the variables to fit your metrics.
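The procedure above can be sketched in a few lines of Python. This is a scaled-down illustration, not the original script: it runs fewer, smaller experiments so it finishes in seconds, peeks every 500 participants rather than after every single one, and uses SciPy’s chi-square test of independence in place of the g-test (the two are asymptotically equivalent here). All the constants are adjustable stand-ins for the article’s numbers.

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(42)

N_EXPERIMENTS = 200      # scaled down from 1,000
N_PARTICIPANTS = 20_000  # scaled down from 200,000
CONVERSION_RATE = 0.03   # identical in both variants: any "winner" is a false positive
CHECK_EVERY = 500        # peek interval (the original peeked after every participant)
BURN_IN = 1_000          # skip significance checks on very low counts, as in the original

peeked_hits = 0  # experiments significant (p < 0.05) at ANY peek
final_hits = 0   # experiments significant only at the pre-set end point

for _ in range(N_EXPERIMENTS):
    variant = rng.integers(0, 2, N_PARTICIPANTS)              # random A/B split
    converted = rng.random(N_PARTICIPANTS) < CONVERSION_RATE  # hit/miss per participant
    ever_significant = False
    for n in range(BURN_IN, N_PARTICIPANTS + 1, CHECK_EVERY):
        v, c = variant[:n], converted[:n]
        table = [[np.sum((v == 0) & c), np.sum((v == 0) & ~c)],
                 [np.sum((v == 1) & c), np.sum((v == 1) & ~c)]]
        chi2, p, dof, expected = chi2_contingency(table)
        if p < 0.05:
            ever_significant = True
    peeked_hits += ever_significant
    final_hits += p < 0.05  # p from the last loop iteration = the full sample

print(f"significant at some point: {peeked_hits}/{N_EXPERIMENTS}")
print(f"significant at the end:    {final_hits}/{N_EXPERIMENTS}")
```

Even with only 39 peeks per experiment instead of one per participant, the “significant at some point” count comes out far above the nominal 5%, while the fixed-horizon count stays near it.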

So what’s the outcome? Keep in mind that these are 1,000 controlled experiments where it is known that there is no difference between the variants.

  • 771 experiments out of 1,000 reached 90% significance at some point
  • 531 experiments out of 1,000 reached 95% significance at some point

This means that if you’ve run 1,000 experiments without controlling for repeated testing error in any way, a rate of successful positive experiments of up to 25% might be explained by false positives alone – roughly half of the 53% of experiments that falsely reach 95% significance at some point will happen to point in the positive direction. And you’ll see a temporary significant effect in around half of your experiments!

Fortunately, there’s an easy fix: select your sample size or decision point in advance, and only make your decision then. These are the false positive rates when the decision is made only at the end of each experiment:

  • 100 experiments out of 1,000 were significant at 90%
  • 51 experiments out of 1,000 were significant at 95%

So you still get a false positive rate you shouldn’t ignore, but nowhere near as serious as when you don’t control for repeated testing. And this is exactly what you should expect when running at significance levels like these – a 95% significance level implies a 5% false positive rate when there is no real difference – and at this point you can talk about real hypothesis testing.
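Picking the sample size in advance raises the obvious question of how big it should be. A minimal sketch of the standard two-proportion power calculation, using only the Python standard library – the baseline of 3% and the hypothetical lift to 3.3% are illustrative numbers, not figures from the article:

```python
from math import sqrt
from statistics import NormalDist

def sample_size_per_variant(p1, p2, alpha=0.05, power=0.8):
    """Participants needed per variant to detect a change from rate p1 to p2
    with a two-sided test at significance level alpha and the given power."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_b = NormalDist().inv_cdf(power)          # critical value for power
    p_bar = (p1 + p2) / 2
    numerator = (z_a * sqrt(2 * p_bar * (1 - p_bar))
                 + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    return int(numerator / (p2 - p1) ** 2) + 1

# Baseline conversion of 3%, hoping to detect a relative lift of 10% (to 3.3%):
print(sample_size_per_variant(0.03, 0.033))
```

Run it before the experiment, commit to the number it gives, and only evaluate significance once that many participants have passed through each variant.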

