The use of randomized controlled trials (RCTs) is becoming increasingly common in technology companies for the development of new products and features. However, despite fine control over engineering systems and data instrumentation, these RCTs can still be imperfectly executed. Similar to biomedical RCTs, online experimentation in technology companies is prone to biases such as opt-in and user activity bias, selection bias, non-compliance with treatment, and challenges in testing the question of interest. These imperfections can result in biased estimates of causal effects, reduced statistical power, attenuation of effects or a need to reframe the research question. This paper aims to raise awareness among practitioners of experimentation about these imperfections that may be hidden throughout the engineering stack or design process. The authors recommend that experiment designers collaborate closely with product and user experience designers to balance learning goals with minimizing burden on end consumers. They provide practical guidance on designing and scoping experiments, instrumenting the experimentation funnel, proactively monitoring measurement imperfections and adjusting statistical analysis to mitigate imperfections. The concepts are illustrated using a running example that assumes on-device treatment assignment. The challenges discussed in this example are applicable to server-side experiments as well. Experimenters need to carefully consider randomization methods for users' experiences, how users trigger randomized experiences, the target population and how users enter the experiment subset and any mechanisms that may create unequal randomization in treatment assignment. Overall this paper highlights the importance of thoughtful experiment design and provides strategies for addressing imperfections in technology-industry RCTs. By following these guidelines practitioners can improve the validity and reliability of their experimental findings.
- Use of randomized controlled trials (RCTs) in technology companies for product development is increasing
- RCTs in technology companies can be imperfectly executed
- Biases such as opt-in and user activity bias, selection bias, non-compliance with treatment, and challenges in testing the question of interest can affect RCT results
- Collaboration between experiment designers, product designers, and user experience designers is recommended to balance learning goals and minimize burden on end consumers
- Practical guidance provided on designing and scoping experiments, instrumenting the experimentation funnel, monitoring measurement imperfections, and adjusting statistical analysis
- Challenges discussed are applicable to both on-device and server-side experiments
- Consideration needed for randomization methods, how users trigger randomized experiences, target population, entry into experiment subset, and mechanisms that may create unequal randomization in treatment assignment
- Importance of thoughtful experiment design highlighted for improving validity and reliability of experimental findings
Technology companies use a method called randomized controlled trials (RCTs) to make their products better. But sometimes these trials are not done perfectly. Some problems can affect the results, like when only certain people choose to participate or when people don't end up using what they were assigned. The paper recommends that different designers work together to make sure the trials are fair and not too hard on customers. It also gives advice on how to plan and run the trials, how to measure things accurately, and how to analyze the results correctly. These challenges apply to experiments done both on devices and on servers. It's important to think carefully about how to design the trials so that we can trust what they tell us.
Definitions
- Randomized Controlled Trials (RCTs): A special way of testing things where some people are chosen randomly to try something new while others don't.
- Biases: Systematic errors that push the results of a test away from the true answer.
- Opt-in: Choosing or deciding to do something voluntarily.
- User activity bias: When very active users are more likely to end up in a test than less active ones, so the results reflect them more than typical users.
- Selection bias: When certain types of people are more likely to be chosen for a test than others.
- Non-compliance with treatment: When people don't follow the instructions given during a test.
- Collaboration: Working together with other people as a team.
- Experiment designers: People who plan and create
The Use of Randomized Controlled Trials in Technology Companies: Addressing Imperfections
Randomized controlled trials (RCTs) have long been considered the gold standard for evaluating the effectiveness of medical treatments. In recent years, RCTs have also gained popularity in technology companies for testing new products and features. With fine control over engineering systems and data instrumentation, these experiments seem to offer a perfect solution for evaluating causal effects. As with any research method, though, RCTs can still be imperfectly executed.
In their research paper titled "Addressing Imperfections in Technology-Industry Randomized Controlled Trials", authors Ron Kohavi and Stefan Thomke discuss the challenges that technology companies face when conducting RCTs. They highlight biases that may affect experimental results and provide practical guidance on designing and scoping experiments to mitigate these imperfections.
Biases Affecting Online Experiments
One of the main challenges faced by technology companies is opt-in bias. This occurs when users self-select into an experiment based on their preferences or behavior, leading to a non-representative sample. User activity bias is another common issue where active users are more likely to participate in an experiment than passive ones, resulting in biased estimates of treatment effects.
Selection bias is another concern where certain groups of users are more likely to be assigned to one treatment group than others due to factors such as demographics or device type. Non-compliance with treatment is also a potential problem as some users may not adhere to the assigned treatment or may drop out of the experiment altogether.
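To make the attenuation caused by non-compliance concrete, the following is a minimal sketch and not a method taken from the paper. It compares the intention-to-treat (ITT) estimate, which analyzes users by assigned group, with the standard Wald (instrumental-variable) estimate of the effect among compliers. The function name `itt_and_cace` and its inputs are hypothetical, and the complier estimate relies on the usual assumptions (assignment affects the outcome only through the treatment actually received, and no user does the opposite of their assignment).

```python
from statistics import mean


def itt_and_cace(outcomes_treat, outcomes_ctrl, received_treat, received_ctrl):
    """Return (ITT estimate, complier average causal effect estimate).

    outcomes_*: per-user outcome values, grouped by ASSIGNED arm.
    received_*: per-user 0/1 flags, 1 if the user actually received the
                treatment experience, grouped by ASSIGNED arm.
    All names here are illustrative assumptions, not from the paper.
    """
    # ITT: difference in mean outcome between assigned groups.
    itt = mean(outcomes_treat) - mean(outcomes_ctrl)
    # Difference in the share of users who actually received treatment.
    compliance_gap = mean(received_treat) - mean(received_ctrl)
    if compliance_gap == 0:
        raise ValueError("assignment did not change treatment receipt")
    # Wald estimator: rescale the ITT by the compliance gap.
    return itt, itt / compliance_gap


if __name__ == "__main__":
    itt, cace = itt_and_cace(
        outcomes_treat=[1, 1, 0, 0],
        outcomes_ctrl=[0, 1, 0, 0],
        received_treat=[1, 1, 0, 0],  # only half of the assigned users comply
        received_ctrl=[0, 0, 0, 0],
    )
    print(itt, cace)
```

With only half of the assigned users receiving the treatment, the ITT estimate is half the size of the complier estimate, which is the attenuation the text describes.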
Additionally, there are challenges specific to online experimentation such as testing the right question of interest and dealing with measurement imperfections caused by technical issues or user behavior.
Strategies for Addressing Imperfections
To address these imperfections, Kohavi and Thomke recommend close collaboration between experiment designers and product/user experience designers. This ensures that learning goals are balanced with minimizing burden on end consumers.
They also provide practical guidance on experiment design, instrumentation, and statistical analysis. For example, they suggest using stratified randomization to balance the distribution of user characteristics across treatment groups and monitoring measurement imperfections proactively to identify potential biases.
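Stratified randomization can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' implementation: it assumes the full list of users and a stratum label for each (for example, device type) are known before assignment, which is often not true for experiments where users arrive over time. The function `stratified_assign` and its parameters are hypothetical names.

```python
import random
from collections import defaultdict


def stratified_assign(users, stratum_of, arms=("control", "treatment"), seed=0):
    """Assign users to arms so each stratum is split as evenly as possible.

    users:      iterable of hashable user identifiers.
    stratum_of: function mapping a user to a sortable stratum label.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    by_stratum = defaultdict(list)
    for user in users:
        by_stratum[stratum_of(user)].append(user)

    assignment = {}
    for stratum in sorted(by_stratum):
        members = by_stratum[stratum]
        rng.shuffle(members)  # random order within the stratum
        for i, user in enumerate(members):
            # Deal users out round-robin so arm sizes differ by at most one.
            assignment[user] = arms[i % len(arms)]
    return assignment


if __name__ == "__main__":
    users = [(f"u{i}", "ios" if i % 2 == 0 else "android") for i in range(8)]
    result = stratified_assign(users, stratum_of=lambda u: u[1])
    for user, arm in sorted(result.items()):
        print(user, arm)
```

Because users are dealt out within each stratum, the arms end up with the same mix of strata by construction rather than only on average.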
The authors also emphasize the importance of carefully considering the target population and how users enter the experiment subset. They recommend using multiple methods for assigning treatments to ensure equal representation of users in each group.
Illustrative Example
To illustrate their recommendations, Kohavi and Thomke use a running example that assumes on-device treatment assignment. This means that treatments are assigned directly on the user's device rather than through a server-side process. However, they note that the challenges discussed in this example are applicable to server-side experiments as well.
In this example, they discuss various factors such as randomization methods for users' experiences, how users trigger randomized experiences, and mechanisms that may create unequal randomization in treatment assignment. By addressing these factors during experiment design and implementation, researchers can improve the validity and reliability of their findings.
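Two of these factors lend themselves to a short sketch: a deterministic assignment rule that could run on a device or a server, and a check for unequal randomization (often called a sample ratio mismatch check). Both are common industry techniques shown here as assumptions, not as the paper's own method. The names `assign_arm`, `srm_p_value`, and `BUCKETS`, and the choice of SHA-256 hashing, are hypothetical.

```python
import hashlib
import math

BUCKETS = 10_000  # assumed bucket granularity for this sketch


def assign_arm(unit_id, experiment_id, treatment_share=0.5):
    """Deterministically map a unit (user or device) to an arm by hashing."""
    key = f"{experiment_id}:{unit_id}".encode("utf-8")
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") % BUCKETS
    return "treatment" if bucket < round(treatment_share * BUCKETS) else "control"


def srm_p_value(n_control, n_treatment, treatment_share=0.5):
    """P-value of a chi-square goodness-of-fit test (1 degree of freedom)
    comparing observed arm counts with the intended split."""
    total = n_control + n_treatment
    if total <= 0 or not 0 < treatment_share < 1:
        raise ValueError("need positive counts and a share strictly in (0, 1)")
    expected_treatment = total * treatment_share
    expected_control = total - expected_treatment
    chi2 = ((n_treatment - expected_treatment) ** 2 / expected_treatment
            + (n_control - expected_control) ** 2 / expected_control)
    # For 1 degree of freedom, the chi-square survival function is erfc(sqrt(x/2)).
    return math.erfc(math.sqrt(chi2 / 2))


if __name__ == "__main__":
    arms = [assign_arm(f"device-{i}", "new-onboarding") for i in range(10_000)]
    n_t = arms.count("treatment")
    n_c = arms.count("control")
    print(n_c, n_t, srm_p_value(n_c, n_t))
```

A very small p-value on logged counts suggests that something between assignment and logging is treating the arms differently, for example one arm failing to trigger or to report data more often than the other. The threshold to act on is a judgment call for the experimenter.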
Conclusion
In conclusion, Kohavi and Thomke's research paper highlights the importance of thoughtful experiment design in technology-industry RCTs. By acknowledging potential imperfections and implementing strategies to address them, practitioners can improve the validity of their experimental findings.
This paper serves as a valuable resource for anyone involved in conducting RCTs in technology companies. It provides practical guidance on designing experiments while also raising awareness about common biases that may affect results. By following these guidelines, researchers can ensure more accurate evaluation of new products and features in technology companies.