Well, I’m done with the tip jar experiment, and the changes that I made in Round 2 really didn’t matter very much. I got a very mixed result. In the first round, tip jar size and the time of day dominated the effects. Making each day an experimental run controlled for time of day somewhat, and this time size did come out statistically significant, but only after dropping insignificant terms from the model. Worse, the result was the opposite of the first round: smaller was now better.
There was no statistical significance to seeding or opacity, meaning that my main hypothesis is thoroughly shattered. Oh well, that’s why you run experiments! That there was no significance for either in 16 experimental runs tells me that it doesn’t matter.
One of the baristas told me on the first or second day that she didn’t think that I’d get good results, given that “some baristas just get more tips than others”. Even that didn’t necessarily hold true: one barista took in three times the tips today that she did on a day earlier this week, and she always works opening shifts at that store.
What this tells me is that there are nuisance factors at play: things that you can’t control for. The key nuisance factor is the amount of traffic the store does. I could control for this by indexing the tips to that day’s sales. Does that give me a perfect answer? No. Does it give me a better answer? Yes. However, I decided that was out of my control: daily sales figures were information beyond what I could expect to be given.
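Indexing tips to sales is just a normalization. A minimal sketch of the idea, with entirely invented numbers (the store never shared real sales data):

```python
# Hypothetical sketch: index each day's tips to that day's sales so that
# busy and slow days are compared on the same footing. Numbers invented.
daily = [
    {"day": "Mon", "tips": 12.50, "sales": 900.0},
    {"day": "Tue", "tips": 18.75, "sales": 1500.0},
    {"day": "Wed", "tips": 11.00, "sales": 800.0},
]

for d in daily:
    # Tips per dollar of sales: the indexed response removes the
    # traffic effect that dominates a raw-tip comparison.
    d["tips_per_dollar"] = d["tips"] / d["sales"]
    print(d["day"], d["tips_per_dollar"])
```

On these made-up numbers, Tuesday has the most raw tips but the lowest indexed response, which is exactly the distortion the index is meant to catch.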
Another nuisance factor is indeed what my friend suggested: some baristas just get more tips than others, for whatever reasons. Now you could do some observations and see why that is, but you can also block for operators and say, “We’ll have Geof do all of these variations on his shifts, run the analysis, and see what turns up.” Frankly, what may work for one barista may not work for another, but it could also be that isolating by blocking would help to reduce that noise factor. Also, baristas tend to work the same shifts over time, so you’d be going back toward the time-of-day factor that I controlled for with this second run.
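The point of blocking is that each barista becomes her own control: you compare jar sizes within a barista, so person-to-person differences cancel out. A toy sketch with invented numbers:

```python
# Hypothetical sketch of blocking by barista. Each barista's average tips
# with the small and large jar are compared *within* that barista, so a
# barista who simply gets more tips overall (like "B") doesn't skew things.
tips = {
    "A": {"small": 14.0, "large": 11.0},
    "B": {"small": 30.0, "large": 26.0},  # B just gets more tips, period
    "C": {"small": 9.0,  "large": 8.0},
}

# Within-block differences, then their average as the blocked effect.
diffs = [t["small"] - t["large"] for t in tips.values()]
effect = sum(diffs) / len(diffs)
print(f"small-vs-large effect, blocked by barista: {effect:+.2f}")  # +2.67
```

Even though B's raw tips dwarf everyone else's, her *difference* is on the same scale as the others, which is the noise reduction blocking buys you.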
I really did think that I’d get conclusive results on this, but I estimated the main effects yesterday and realized that the small-jar days had three of the four highest responses; when I got a low response for the larger jar yesterday, I knew how this would come out. It’s a little disappointing, because I’d like to be able to go back to the women and say, “Here’s your answer!” I can’t.
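For anyone curious what “estimated the main effects” means mechanically: in a two-level design, a factor’s main effect is just the mean response at its high level minus the mean at its low level. A sketch with invented responses (not my actual data), coding large jar as +1 and small as -1:

```python
# Hypothetical sketch: main effect of one factor in a two-level design.
# Coded levels: +1 = large jar, -1 = small jar. Responses are invented.
runs = [
    (+1, 10.0), (-1, 15.0), (+1, 9.0), (-1, 14.0),
    (+1, 11.0), (-1, 16.0), (+1, 8.0), (-1, 13.0),
]

# Main effect = mean(response at +1) - mean(response at -1).
high = [y for lvl, y in runs if lvl == +1]
low = [y for lvl, y in runs if lvl == -1]
effect = sum(high) / len(high) - sum(low) / len(low)
print(f"jar-size main effect: {effect:+.2f}")  # -5.00: negative favors small
```

A negative effect like this invented -5.00 is the shape of result I saw: the small-jar runs sat at the top of the response list, so the sign of the size effect was already clear before the last run came in.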