Finale of the Tip Jar Experiment

Well, I’m done with the tip jar experiment, and the changes I made in Round 2 didn’t make much difference.  The results were mixed.  In the first round, tip jar size and time of day dominated the effects.  Making each full day an experimental run controlled for time of day to some degree, but the only statistically significant factor was size, and only after I dropped the insignificant terms from the model.  Worse, the result was the opposite of the first round’s: smaller was now better.
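
For anyone wondering what "dropping terms" buys you: in an unreplicated 2³ design, the eight runs are used up by the overall mean and seven effects, so nothing is left to estimate error and no significance test is possible. Dropping terms pools them into an error estimate. Below is a minimal Python sketch of that arithmetic. The tip totals are made up for illustration (they are not my data), and the choice to pool all four interactions is an assumption.

```python
from itertools import product

# Hypothetical daily tip totals (dollars) for an unreplicated 2^3 design.
# Factors are coded -1/+1: S = jar size, O = opacity, E = seeding.
# Runs follow itertools.product order, so S changes slowest.
# These numbers are made up for illustration; they are not the experiment's data.
RUNS = list(product((-1, 1), repeat=3))
TIPS = [29, 27, 28, 29, 23, 24, 25, 23]

# Each term lists the factor columns whose product gives its +/- signs.
TERMS = {
    "S": (0,), "O": (1,), "E": (2,),
    "SO": (0, 1), "SE": (0, 2), "OE": (1, 2), "SOE": (0, 1, 2),
}


def sign(run, idx):
    """Product of the coded levels in `run` for the factors in `idx`."""
    s = 1
    for i in idx:
        s *= run[i]
    return s


def sums_of_squares(runs, y):
    """SS for each term: contrast**2 / N, one degree of freedom each."""
    n = len(y)
    ss = {}
    for name, idx in TERMS.items():
        contrast = sum(sign(r, idx) * yi for r, yi in zip(runs, y))
        ss[name] = contrast ** 2 / n
    return ss


def pooled_f(ss, keep, pooled):
    """F ratios for the kept terms, using the pooled (dropped) terms as error."""
    ms_error = sum(ss[t] for t in pooled) / len(pooled)  # 1 df per pooled term
    return {t: ss[t] / ms_error for t in keep}


ss = sums_of_squares(RUNS, TIPS)
f = pooled_f(ss, keep=["S", "O", "E"], pooled=["SO", "SE", "OE", "SOE"])
print(f)
```

With these made-up numbers, size gets F = 36 on (1, 4) degrees of freedom, well past the 5% critical value of about 7.71. With real data, which terms to pool is a judgment call, usually guided by a normal probability plot of the effects.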

Neither seeding nor opacity was statistically significant, which means my main hypothesis is thoroughly shattered.  Oh well, that’s why you run experiments!  Finding no significance for either one across 16 experimental runs tells me that neither matters much.

On the first or second day, one of the baristas told me she didn’t think I’d get good results, given that “some baristas just get more tips than others.”  Even that didn’t necessarily hold true: one barista got three times the tips today that she got on a day earlier this week, and she always works the opening shift at that store.

What this tells me is that there are nuisance factors at play: things you can’t hold fixed.  The key nuisance factor is the amount of traffic the store does.  I could control for it by indexing the tips to that day’s sales.  Does that give me a perfect answer?  No.  Does it give me a better answer?  Yes.  However, I decided that was out of my hands: daily sales figures were more than I could expect the store to share.
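
Indexing is just a ratio: divide each day's tips by that day's sales, so a busy day and a slow day are compared on the same footing. Here is a minimal Python sketch. The function name and the per-$100 scaling are illustrative choices, and the numbers are hypothetical, since sales figures weren't available.

```python
def tips_per_100_sales(tips, sales):
    """Tips per $100 of that day's sales (illustrative normalization)."""
    if sales <= 0:
        raise ValueError("sales must be positive")
    return 100.0 * tips / sales


# Two hypothetical days: the busy day brings in more raw tips,
# but the slow day does better relative to its traffic.
busy_day = tips_per_100_sales(tips=45.0, sales=1500.0)
slow_day = tips_per_100_sales(tips=24.0, sales=600.0)
print(busy_day, slow_day)  # 3.0 4.0
```

The response for each run would then be this ratio instead of raw dollars.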

Another nuisance factor is indeed what my friend suggested: some baristas just get more tips than others, for whatever reason.  You could do some observation to see why that is, but you can also block on operator and say, “We’ll have Geof run all of these variations on his shifts, run the analysis, and see what turns up.”  Frankly, what works for one barista may not work for another, but isolating each barista in a block could also reduce that noise.  Also, baristas tend to work the same shifts over time, so blocking on barista would partly reintroduce the time-of-day factor that this second round was designed to control for.
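
Blocking on barista means giving each barista a complete set of the factor combinations and randomizing the order within that barista's shifts. A minimal Python sketch follows. The level names for each factor are assumptions, "Geof" comes from the example above, and "Barista B" is a hypothetical placeholder.

```python
import random
from itertools import product

# Assumed level names for the three round-2 factors.
LEVELS = {
    "size": ("small", "large"),
    "opacity": ("clear", "opaque"),
    "seeding": ("unseeded", "seeded"),
}


def blocked_run_order(baristas, seed=None):
    """One full 2^3 replicate per barista (block), randomized within each block."""
    rng = random.Random(seed)
    plan = {}
    for barista in baristas:
        runs = [dict(zip(LEVELS, combo)) for combo in product(*LEVELS.values())]
        rng.shuffle(runs)  # random order within this barista's block only
        plan[barista] = runs
    return plan


# "Geof" is from the post; "Barista B" is a placeholder.
plan = blocked_run_order(["Geof", "Barista B"], seed=1)
for barista, runs in plan.items():
    print(barista)
    for i, run in enumerate(runs, start=1):
        print(f"  shift {i}: {run}")
```

In the analysis, barista would then enter as a block term, so differences between baristas are separated out instead of inflating the error.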

I really did think that I’d get conclusive results on this, but I estimated the main effects yesterday and realized that the small-jar days had three of the four highest responses; when I got a low response for the larger jar yesterday, I knew how this would come out.  It’s a little disappointing, because I’d like to be able to go back to the women and say, “Here’s your answer!”  I can’t.
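
A main effect in a two-level design is the average response at a factor's high level minus the average at its low level, so a pattern like small jars holding three of the four best days pushes the size effect negative before any formal test. A minimal Python sketch, with hypothetical tip totals chosen to show that pattern (not the experiment's data):

```python
from itertools import product


def main_effects(y):
    """Main effects for an unreplicated 2^3 design.

    y must follow the run order of itertools.product((-1, 1), repeat=3).
    Each effect is mean(y at +1) - mean(y at -1) for that factor.
    """
    runs = list(product((-1, 1), repeat=3))
    effects = []
    for j in range(3):
        high = [yi for r, yi in zip(runs, y) if r[j] == 1]
        low = [yi for r, yi in zip(runs, y) if r[j] == -1]
        effects.append(sum(high) / len(high) - sum(low) / len(low))
    return effects


# Hypothetical tips; factor order is (size, opacity, seeding), with -1 = small jar.
# The small-jar days (first four) hold three of the four highest totals.
tips = [29, 28, 25, 27, 24, 23, 26, 22]
size, opacity, seeding = main_effects(tips)
print(size, opacity, seeding)  # -3.5 -1.0 -1.0
```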

My Tipping Jar Experiment, Round 2

So the first round of the tipping jar experiment went well.  There were two dominant factors: time of day and size of the jar.  I was not surprised that morning tipping was better than evening tipping, as the store is busier early than late.  I was surprised by the degree to which it dominated the results.  Fractional factorial experiments are for screening: finding out which factors matter.  Normally, I would reduce this to a 2³ experiment¹ by dropping an insignificant factor; instead, I have changed how I collect my data and am re-running with the other three factors.  Instead of treating morning and night shifts as separate runs, which let me collect the data in 5-6 days, I’m using each full day as an experimental run, which will take eight days.
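
For reference, a common way to build the eight-run half fraction of four two-level factors is to take the full 2³ design in three factors and set the fourth column to their product (generator D = ABC). Which generator round one actually used isn't stated, so this textbook default is an assumption for illustration, and the letters A-D are generic rather than mapped to particular factors.

```python
from itertools import product


def full_factorial(k):
    """All 2^k runs of k two-level factors, coded -1/+1."""
    return list(product((-1, 1), repeat=k))


def half_fraction_4():
    """2^(4-1) half fraction: full 2^3 in A, B, C, with D = A*B*C (assumed generator)."""
    return [(a, b, c, a * b * c) for a, b, c in full_factorial(3)]


design = half_fraction_4()
print(len(full_factorial(4)), "runs for the full 2^4")
print(len(design), "runs for the 2^(4-1) half fraction")

# The price of halving the runs: with D = ABC, the AB column is identical
# to the CD column, so those two interactions cannot be told apart (aliased).
assert all(a * b == c * d for a, b, c, d in design)

# Dropping any one factor leaves a full 2^3 in the remaining three,
# which is what makes "drop an insignificant factor" work after screening.
for drop in range(4):
    kept = {tuple(x for i, x in enumerate(run) if i != drop) for run in design}
    assert len(kept) == 8
```

This is why dropping one factor after a screening run normally leaves a full factorial in the other three.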

I fully expect tip jar size (larger was better, which surprised me) to remain a key player, but I want to know whether opacity and seeding have main effects, and whether any interactions occur.²  With a full factorial, even a single replicate, I’ll be able to build a good reduced model once I see which main effects and interactions have any meaning.  It may be that tip jar size is the only factor that matters, but I won’t know until I take data.
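
A single replicate of the 2³ design gives eight runs, which support exactly eight estimates: the overall mean plus seven effects (three main effects, three two-factor interactions such as size × seeding, and one three-factor interaction). Nothing is left over to estimate error, which is why a reduced model comes next. A minimal Python sketch computing all seven; the tip values are hypothetical.

```python
from itertools import combinations, product

NAMES = ("size", "opacity", "seeding")


def all_effects(y):
    """All seven factorial effects of an unreplicated 2^3 design.

    y must follow the run order of itertools.product((-1, 1), repeat=3).
    Effect = (sum of y at sign +1 minus sum at sign -1) / (N / 2).
    """
    runs = list(product((-1, 1), repeat=3))
    effects = {}
    for order in (1, 2, 3):
        for idx in combinations(range(3), order):
            contrast = 0
            for run, yi in zip(runs, y):
                sign = 1
                for i in idx:
                    sign *= run[i]  # interaction sign = product of factor signs
                contrast += sign * yi
            effects[":".join(NAMES[i] for i in idx)] = contrast / (len(y) / 2)
    return effects


# Hypothetical tips; not real data.
tips = [29, 28, 25, 27, 24, 23, 26, 22]
effects = all_effects(tips)
for name, value in effects.items():
    print(f"{name:>22}: {value:+.2f}")
```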

I start in the morning and finish next Friday, which gives me 11 days to pull the data and plots together to write a paper.  It’s going to be a furious finish to the semester.  The big thing is that I now have to be there at least some of the time every single day for the next week-ish.  I’m there most every day, but now I have to make a concerted effort.

For future work, we’re going to refine the testing a bit.  Weekdays and weekends have different clienteles.  My tentative plan is to take data M-Th and F-Su, using those as blocks.  That really slows down my time to get results, but I won’t be on a schedule.  The baristas seem really interested in the results of this, which probably doesn’t surprise you.  They have ideas, too, and I’m the man that knows how to make the data happen.  It may take us all summer, but I bet we’ll be getting a good result at the end of it.  I’m already making plans!
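
Tagging each day of data with its block is mechanical. A small Python sketch using the standard library's `date.weekday()` (Monday is 0, Sunday is 6); the function name and the example week are illustrative.

```python
from datetime import date, timedelta


def block_for(day):
    """Weekday/weekend block: date.weekday() is 0 for Monday through 6 for Sunday."""
    return "M-Th" if day.weekday() <= 3 else "F-Su"


# Label one week of dates, starting on a Monday (1 Jan 2024).
start = date(2024, 1, 1)
for offset in range(7):
    d = start + timedelta(days=offset)
    print(d.isoformat(), d.strftime("%a"), block_for(d))
```

Block would then go into the model as its own term, so weekday and weekend differences in clientele are kept separate from the factor effects.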


  1. Three factors, each with two possible states. 

  2. E.g., how jar size and seeding interact