Bringing QA to data science

Despite the existence of best practices in software testing for operational software applications, there is a remarkable lack of established Quality Assurance practices for advanced analytics and data science. For decades, the advanced analytics community, rooted in academia and research, has tolerated the lack of best practices for solution deployment. Today, as the practice of data science proliferates across businesses, conducted by a broadening variety of analytics specialists and data scientists, the number of insufficiently tested solutions is growing rapidly.

Challenges of testing

Many advanced analytics practitioners and data scientists rely on code reviews by team members, because typical software testing methodologies cannot accommodate the special needs of their models and applications. For example, simple changes in data can adversely affect the performance of analytics models. The uniqueness and size of an advanced analytics software solution can make it very challenging to test scalability and prepare for successful implementation. Regular testing of production analytics is required, because models may go unexamined for many years while business processes and software environments evolve.

An advanced analytics QA methodology

By blending best practices of software testing and analytics, we can successfully execute and institutionalise the review and validation of mathematical optimisation and predictive models. This approach uncovers new ideas for improvement, enables benchmarking of team practices, gives business leaders more confidence in solutions, and helps specialists improve development skills. In several engagements, my colleagues and I have verified the robustness and reliability of mathematical optimisation-based software systems, while enabling ongoing improvements to the underlying models.

At Remsoft, a global leader in optimised planning and scheduling software for land-based assets, performance issues were affecting both in-house and client-side users of the software. Developers found that their mixed integer program (MIP) formulations took too long to solve, especially for larger problems with many assets and time periods.

Remsoft leadership retained Princeton Consultants to conduct a third-party model review. First, the team interviewed key company personnel to understand the business problem and context, and to determine the current structure of the different models. Next, the team reviewed documentation to understand the modelling platform and the data sources, and studied data sets to understand how the data and modelling platform mapped to a model’s implementation, looking for differences between the intended mathematical model and the actual implementation. Finally, the team analysed and experimented with several key optimisation instances.

“Through methodical interviews of our leaders, the Princeton Consultants team promptly understood our models and practices, got to the heart of performance problems, and presented the necessary fixes,” said Remsoft Co-Founder and Chief Executive Officer Andrea Feunekes. “They further recommended changes that have helped us advance our development and services for our clients around the world.”

In another example, leaders of a U.S. government agency required an external review of an innovative operational control system to manage a national mobile workforce. Our team reviewed documentation and interviewed personnel about the business problems and the current solution methodology. After testing for scalability and deployment, the team recommended techniques to address a variety of performance issues. As a result, the agency’s leadership better understood the risks of its algorithmic approach and chosen solution methodology. The team also identified additional improvements to the implementation to minimise the risk of failure when the system is deployed.

The benefit of a third-party review was clear in the case of a transportation company, with longstanding and robust analytics capabilities, that retained Princeton Consultants to evaluate forecasting and optimisation models used in operational decision making. The outputs of the forecasting models were used as inputs to a sequence of optimisation models. The review and validation uncovered that the forecasts used small amounts of historical data and simplistic techniques for outlier removal, and were not tuned to account for the variability of the business in different geographies. In one optimisation model, the review found potential for wide variability in the results that would drive future decisions. In a second optimisation model, the review found that the model allowed answers that were not feasible in the business, and that the data supplied to the model misrepresented the business conditions.
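To illustrate why simplistic outlier removal deserves scrutiny: a rule based on the mean and standard deviation can be distorted by the very outliers it is meant to catch. The sketch below flags outliers with a median-based "modified z-score" instead. It is an illustration only, not the technique used in the review described above; the function name, the sample data, and the conventional 3.5 threshold are all assumptions.

```python
import statistics


def mad_outlier_mask(series, threshold=3.5):
    """Flag points whose modified z-score exceeds the threshold.

    Hypothetical helper. The score is 0.6745 * |x - median| / MAD, where
    MAD is the median absolute deviation from the median.
    """
    median = statistics.median(series)
    abs_deviations = [abs(x - median) for x in series]
    mad = statistics.median(abs_deviations)
    if mad == 0:
        # No spread around the median: nothing can be scored reliably.
        return [False] * len(series)
    return [0.6745 * dev / mad > threshold for dev in abs_deviations]


# Invented weekly volumes with one suspicious spike at the end.
weekly_volume = [10, 11, 9, 10, 12, 10, 95]
mask = mad_outlier_mask(weekly_volume)
cleaned = [x for x, is_outlier in zip(weekly_volume, mask) if not is_outlier]
```

A reviewer might run the forecast on both the raw and the cleaned history and compare the results, since a large difference suggests that the outlier rule itself is a material modelling choice.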

Based on these examples and others, we can recommend the following steps for advanced analytics QA:

  1. Interview stakeholders from business and analytics development to understand the business problem and context.
  2. Review existing models and procedures.
  3. Review data sources.
  4. Implement models in alternative technologies (languages, solvers, analytics engines) and compare the results.
  5. Experiment with models and a variety of test data sets to uncover issues and stress the model implementation.
  6. Suggest improvements and recommend further investigation where warranted.
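Steps 4 and 5 can be sketched in miniature. The toy example below solves the same small knapsack problem with two independently written methods and reports any test instance on which they disagree. It is a minimal sketch under stated assumptions: the knapsack problem stands in for a real optimisation model, the two functions stand in for different solvers or languages, and all names and data are hypothetical.

```python
from itertools import combinations


def knapsack_brute_force(values, weights, capacity):
    """Reference implementation: enumerate every subset of items."""
    best = 0
    n = len(values)
    for size in range(n + 1):
        for subset in combinations(range(n), size):
            if sum(weights[i] for i in subset) <= capacity:
                best = max(best, sum(values[i] for i in subset))
    return best


def knapsack_dynamic_programming(values, weights, capacity):
    """Independent implementation: 0/1 knapsack recurrence (integer weights)."""
    best_at = [0] * (capacity + 1)
    for value, weight in zip(values, weights):
        # Iterate capacities downwards so each item is used at most once.
        for cap in range(capacity, weight - 1, -1):
            best_at[cap] = max(best_at[cap], best_at[cap - weight] + value)
    return best_at[capacity]


def compare_implementations(instances, tolerance=1e-9):
    """Return the instances on which the two implementations disagree."""
    mismatches = []
    for values, weights, capacity in instances:
        a = knapsack_brute_force(values, weights, capacity)
        b = knapsack_dynamic_programming(values, weights, capacity)
        if abs(a - b) > tolerance:
            mismatches.append((values, weights, capacity, a, b))
    return mismatches


# A few invented instances, including edge cases meant to stress the model.
test_instances = [
    ([60, 100, 120], [10, 20, 30], 50),
    ([5, 4, 3], [4, 3, 2], 0),   # zero capacity
    ([7], [10], 5),              # single item that does not fit
]
mismatches = compare_implementations(test_instances)
```

An empty list of mismatches does not prove that either implementation is correct, but a disagreement gives the review a concrete instance to investigate.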

The right testing team

An advanced analytics QA team requires expertise in modelling, advanced analytics algorithms, numerical computing, commercial and open source packages for analytics and data science, and deployment of systems embedding advanced analytics. Conducting a review entails vital questions about the correctness of the model, data sourcing and integration, publishing and use of solutions in the business, sensitivity of the answers to the inputs, and other issues. These questions often can’t be answered internally for a variety of reasons. An independent testing team may need to be supplemented by third-party experts.
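The question of how sensitive the answers are to the inputs can be probed with a simple one-at-a-time perturbation. The sketch below nudges each input by a small percentage and records the relative change in the output. It assumes the model can be called as a function of a dictionary of numeric inputs; the cost model, input names, and 5% step are hypothetical stand-ins.

```python
def sensitivity_report(model, baseline_inputs, relative_step=0.05):
    """Relative change in model output when each input is increased in turn."""
    base_output = model(baseline_inputs)
    if base_output == 0:
        raise ValueError("baseline output is zero; relative change is undefined")
    report = {}
    for name, value in baseline_inputs.items():
        perturbed = dict(baseline_inputs)          # copy, leave baseline intact
        perturbed[name] = value * (1 + relative_step)
        report[name] = (model(perturbed) - base_output) / base_output
    return report


def toy_cost_model(inputs):
    """Hypothetical stand-in for a deployed analytics model."""
    return inputs["demand"] * inputs["unit_cost"] + inputs["fixed_cost"]


baseline = {"demand": 100, "unit_cost": 2.0, "fixed_cost": 50}
report = sensitivity_report(toy_cost_model, baseline)
```

Inputs with a disproportionately large effect are natural candidates for closer review of data sourcing and quality. One-at-a-time perturbation does not capture interactions between inputs, so it is a starting point, not a complete analysis.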

Any organisation that relies on advanced analytics for core processes and key decisions must determine if suitable Quality Assurance has been conducted. A formal process should be established for testing advanced analytics, in line with testing of other operational software. The failure to do so could reduce the potential impact of advanced analytics and data science in the business environment.

Written by Dr. Irv Lustig, Optimisation Principal, Princeton Consultants