Sample size calculation in Mystery Shopping Programs

27th April 2009

Co Authors: Arcadio Roselli and Marcelo Tarica

Some years ago, we couldn’t avoid feeling embarrassed every time a prospect or client asked us how many shops per period would be necessary to achieve reliable results.
Number of stores, type of program (diagnosis or ongoing), frequency of waves and budget limitations were some of the factors to consider in the setup phase of a Mystery Shopping Program. But since those results would quickly turn into actionable data, it was crucial to find a more concrete answer to this question.


Although Mystery Shopping is defined as a qualitative research technique, we can resort to some basic statistical concepts usually applicable in quantitative studies, for instance:
Population Size: the number of people under study. In MS programs, the population size represents the number of people, customers or not, who enter the company’s stores. When it is very large or unknown it can be ignored, since it then has little effect on the sample size calculation.
Confidence Level: how sure we want to be that the true value lies within the margin of error; it reflects the amount of uncertainty we are willing to tolerate. Typical choices in this kind of study are 90% or 95%. A higher confidence level requires a larger sample size.
Confidence Interval: often expressed through its “margin of error”, it is the amount of error we are willing to accept. This plus-or-minus figure defines the range within which an obtained result may vary. A lower margin of error requires a larger sample size.
For example, if we have defined a sample size using a Confidence Level of 95% and a Confidence Interval of 5% and “Greeting” obtains 58% positive answers, we can be 95% sure that between 53% and 63% of the company’s customers were greeted when they entered the store.
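
The article does not state which formula lies behind these figures. As a minimal sketch, assuming the commonly used formula for estimating a proportion (often attributed to Cochran), the worst-case proportion p = 0.5, and an optional finite population correction, the three concepts above combine as follows. The function names `z_score`, `sample_size` and `margin_of_error` are illustrative and not taken from the article.

```python
import math
from statistics import NormalDist
from typing import Optional


def z_score(confidence_level: float) -> float:
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)


def sample_size(confidence_level: float,
                margin_of_error: float,
                population: Optional[int] = None,
                p: float = 0.5) -> int:
    """Number of shops needed to estimate a proportion.

    Assumption: n0 = z^2 * p * (1 - p) / e^2, with p = 0.5 as the
    most conservative (largest) case when nothing is known in advance.
    """
    z = z_score(confidence_level)
    n0 = (z ** 2) * p * (1 - p) / margin_of_error ** 2
    if population is not None:
        # Finite population correction: matters only for small populations.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


def margin_of_error(n: int, p: float, confidence_level: float) -> float:
    """Plus-or-minus figure actually achieved with n shops and observed share p."""
    return z_score(confidence_level) * math.sqrt(p * (1 - p) / n)


print(sample_size(0.95, 0.05))                      # 385
print(sample_size(0.90, 0.05))                      # 271
print(sample_size(0.95, 0.05, population=1000))     # 278
print(round(margin_of_error(385, 0.58, 0.95), 3))   # 0.049
```

Under these assumptions, a 95% Confidence Level with a 5% margin of error calls for roughly 385 shops when the population is large, and lowering the Confidence Level to 90% reduces that to about 271. With 385 shops and a “Greeting” score of 58%, the achieved margin is about 4.9 points, in line with the 53% to 63% range in the example.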
Moreover, Sample Size takes on a different meaning depending on the intended use of the data collected. Simply stated, in Mystery Shopping there are different purposes for carrying out a mystery shop evaluation:

  1. Measuring individual performance
  2. Measuring unit performance (e.g. Store Manager)
  3. Measuring Team performance in a department, category of merchandise, etc.
  4. Measuring compliance
  5. Measuring integrity

In each case, the end in mind is a key element in determining Sample Size, Confidence Level and Confidence Interval. For example, if we are making general statements about the population, the sample can be less inclusive; however, if we want to select the best employees on a certain Key Performance Indicator, then every employee must be given at least three opportunities to be evaluated. This in turn means covering 100% of the population multiple times over some predefined period, depending upon the behaviors and/or tasks to be assessed.
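
When the goal is to rank individual employees, the number of shops follows from coverage rather than from a statistical formula. The sketch below shows the arithmetic, assuming one shop evaluates exactly one employee; the store names and headcounts are hypothetical, and `shops_for_full_coverage` is an illustrative name.

```python
def shops_for_full_coverage(employees_per_store, min_evaluations=3):
    """Shops needed so that every employee is evaluated min_evaluations times.

    Assumption: each shop evaluates exactly one employee.
    """
    per_store = {
        store: headcount * min_evaluations
        for store, headcount in employees_per_store.items()
    }
    return per_store, sum(per_store.values())


# Hypothetical chain with three stores.
staff = {"Store A": 12, "Store B": 8, "Store C": 5}
per_store, total = shops_for_full_coverage(staff)

print(per_store)  # {'Store A': 36, 'Store B': 24, 'Store C': 15}
print(total)      # 75
```

The total is for one evaluation period; it would be repeated in each predefined period in which employees are to be compared.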

Then, of course, we are at times forced to create the right theatre: in order to test key behaviors, we must ask the Mystery Shopper to interact with a specific person in line with a predefined scenario, and to do so across as many locations and with as many specific “targets” as there are in the chain. This creates a “level playing field”, ensuring that the quantitative results are indeed comparable as you move from Buenos Aires to New York to San Francisco to Milan and beyond.

In summary, the intended use of the metrics helps us set the statistical variables of a Mystery Shopping Program and, consequently, determine how accurate and reliable the performance measurement of any unit (chain, cluster, store, department, employee, key, etc.) will be.

The above article is written by:
Marcelo Tarica
Title: Founder & Director
Specialization: MS programs & fieldwork in Latin America
Location: Buenos Aires, Argentina