Bringing QA to data science

Despite the existence of best practices in software testing for operational software applications, there is a remarkable lack of established Quality Assurance practices for advanced analytics and data science. For decades, the advanced analytics community, rooted in academia and research, has tolerated the lack of best practices for solution deployment. Today, as the practice of data science proliferates across businesses, conducted by a broadening variety of analytics specialists and data scientists, the number of insufficiently tested solutions is growing rapidly.

Challenges of testing

Many advanced analytics practitioners and data scientists rely on code reviews by team members, because typical software testing methodologies cannot accommodate the special needs of their models and applications. For example, seemingly simple changes in data can degrade the performance of analytics models. The uniqueness and size of an advanced analytics software solution can make it very challenging to test scalability and prepare for successful implementation. Regular testing of production analytics is also required: models may go unexamined for years while the business processes and software environments around them evolve.

An advanced analytics QA methodology

By blending best practices of software testing and analytics, we can successfully execute and institutionalise the review and validation of mathematical optimisation and predictive models. This approach uncovers new ideas for improvement, enables benchmarking of team practices, gives business leaders more confidence in solutions, and helps specialists improve their development skills. In the engagements described below, my colleagues and I verified the robustness and reliability of mathematical optimisation-based software systems while enabling ongoing improvements to the underlying models.

At Remsoft, a global leader in optimised planning and scheduling software for land-based assets, performance issues were impacting in-house and client-side users of the software. Developers encountered inadequate solution times for their formulations of mixed integer programs (MIPs), especially for larger problems with many assets and time periods.

Remsoft leadership retained Princeton Consultants to conduct a third-party model review. First, the team interviewed key company personnel to understand the business problem and context, and to determine the current structure of the different models. Next, the team reviewed documentation to understand the modelling platform and the data sources, and studied data sets to understand how the data and modelling platform mapped to a model’s implementation, looking for differences between the understanding of the mathematical model and the actual implementation. Finally, the team analysed and experimented with several key optimisation instances.

“Through methodical interviews of our leaders, the Princeton Consultants team promptly understood our models and practices, got to the heart of performance problems, and presented the necessary fixes,” said Remsoft Co-Founder, Chief Executive Officer Andrea Feunekes. “They further recommended changes that have helped us advance our development and services for our clients around the world.”

In another example, leaders of a U.S. government agency required an external review of an innovative operational control system to manage a national mobile workforce. Our team reviewed documentation and interviewed personnel about the business problems and the current solution methodology. After testing for scalability and deployment, the team recommended techniques to address a variety of performance issues. The agency’s leadership gained a better understanding of the risks in its algorithmic approach and chosen solution methodology, and the team identified additional improvements to the implementation to minimise the risk of failure when the system is deployed.

The benefit of a third-party review was clear in the case of a transportation company with longstanding and robust analytics capabilities that retained Princeton Consultants to evaluate forecasting and optimisation models used in operational decision making. The outputs of the forecasting models were used as inputs to a sequence of optimisation models. The review and validation uncovered that the forecasts relied on small amounts of historical data, used simplistic techniques for outlier removal, and were not tuned to account for the variability of the business in different geographies. In one optimisation model, the review found the potential for wide variability in the results that would drive future decisions. In a second, it uncovered that the model allowed answers that were not feasible in the business, and that the data supplied to the model misrepresented the business conditions.

Based on these examples and others, we can recommend the following steps for advanced analytics QA:

  1. Interview stakeholders from business and analytics development to understand the business problem and context.
  2. Review existing models and procedures.
  3. Review data sources.
  4. Implement models in alternative technologies (languages, solvers, analytics engines) to compare results.
  5. Experiment with models and a variety of test data sets to uncover issues and stress the model implementation.
  6. Suggest improvements and recommend further investigation where warranted.
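
Step 5 lends itself to automation. Below is a minimal sketch in Python of one such experiment: stress-testing a model implementation by perturbing its inputs and checking how far the outputs move. The model, noise level, and tolerance here are illustrative stand-ins, not any client's actual system.

```python
import random

def forecast(history):
    """Stand-in 'model': a 3-period moving-average forecast."""
    return sum(history[-3:]) / 3

def stress_test(model, history, n_trials=1000, noise=0.05, tolerance=0.25):
    """Perturb each input by up to +/-noise (relative) and report the
    worst-case relative change in the model's output across trials."""
    baseline = model(history)
    worst = 0.0
    rng = random.Random(42)  # fixed seed so the experiment is repeatable
    for _ in range(n_trials):
        perturbed = [x * (1 + rng.uniform(-noise, noise)) for x in history]
        change = abs(model(perturbed) - baseline) / abs(baseline)
        worst = max(worst, change)
    return worst <= tolerance, worst

ok, worst = stress_test(forecast, [100, 104, 98, 101, 103])
print(f"within tolerance: {ok}, worst-case change: {worst:.3f}")
```

A model whose outputs swing far more than its inputs under this kind of test is exactly the sort of implementation issue a QA review should flag.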

The right testing team

An advanced analytics QA team requires expertise in modelling, advanced analytics algorithms, numerical computing, commercial and open source packages for analytics and data science, and deployment of systems embedding advanced analytics. Conducting a review entails vital questions about the correctness of the model, data sourcing and integration, publishing and use of solutions in the business, sensitivity of the answers to the inputs, and other issues. These questions often can’t be answered internally for a variety of reasons. An independent testing team may need to be supplemented by third-party experts.

Any organisation that relies on advanced analytics for core processes and key decisions must determine if suitable Quality Assurance has been conducted. A formal process should be established for testing advanced analytics, in line with testing of other operational software. The failure to do so could reduce the potential impact of advanced analytics and data science in the business environment.

Written by Dr. Irv Lustig, Optimisation Principal, Princeton Consultants

Artificial Intelligence-Based Data Monitoring Helps ETS Clients Control Energy Costs

The AI data analysis capability can ensure that a building is taking full advantage of utility tariff classes for the building’s usage characteristics, ETS’ Jeff Hendler says.

Jeff Hendler, CEO of Energy Technology Savings, with The Highlands at Hilltop, 100 White Rock Rd, Verona, NJ, one of the properties using Hendler’s AI-based building energy management software (Composite photo)

LIVINGSTON, NJ—Energy Technology Savings, an energy technology, behavior management and smart building service provider, is rolling out artificial intelligence-based technology to help building owners turn their building operations data into valuable revenue streams.


Data Drives Digital Marketing, Guides Creative

By Josh Medore

YOUNGSTOWN, Ohio — Every page you visit on the internet, every item you add to your online shopping cart and every email from a company you patronize is logged. And when it comes to digital marketing, that information is invaluable.

“We acquire all this data not just to understand a consumer’s propensity to buy a product, but also [to understand] how to communicate with them,” says Jason Wood, president of Actionable Insights, a marketing firm based in Sharon, Pennsylvania.

In a digital environment where consumers are bombarded by messages from scores of sources, the use of data in marketing campaigns helps companies precisely target customers.

“If you’re a 28-year-old that likes craft beer, works 9 to 6 and enjoys organic meals in downtown Youngstown, that’s how clear-cut we can make the target,” says Palo Creative’s president and CEO, Rob Palowitz. “Can you hit that target by traditional means? Yeah, you’ll hit them somewhere along the line, but you get a lot of waste.”

Beyond being able to reach specific audiences, the use of data can inform the creation of marketing campaigns.

“Taste and instincts have prevailed for decades. A lot of classic marketers were instinctual and knew how to get people in a call to action,” says Jeff Herrmann, owner of Madison, Michigan and Market, Youngstown. “Data can be an awesome reinforcement mechanism or be the way you optimize to sell your product. Data-driven marketing is about knowing the habits and interests of your audience because it’s quantifiable.”

Such information can be acquired a few ways. First-party data, which the marketers interviewed for this story prefer using, includes information the company has cultivated, things like website traffic and email contacts. Third-party information, meanwhile, is information acquired from an outside source, such as a list purchased from a market research firm.

While marketers prefer to use first-party data, outside information can be useful, says Chris Askew, director of digital marketing strategies for The Prodigal Co., Boardman. He points to a client who wanted to market a product to farmers, but didn’t have the internal information to do so.

“We purchased data from a credit bureau, asking for data on people who spent more than $40,000 in a year on farming equipment. That tells us this person is probably a farmer,” Askew says. “Something like that is a smart buy. You really have to think about the purpose.”

First-party data is generally used to optimize targeting for a marketing campaign. It can relate how people arrived at your website, where they visited, how long they stayed and what they clicked on before they left.

In some cases, such information can be used to fine tune aspects of sales, says Palo’s digital media director, Jim Komara. Through Google Analytics, he can look at how customers got to the cart page, how many entered a shipping address and how many made it to the checkout page.

“We can look at how many people made it there and bailed out,” he explains. “That could tell you if your shipping is too high because a lot of people got all the way there and then decided not to finish the process.”
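
The drop-off analysis Komara describes reduces to simple arithmetic over stage counts. A sketch with invented funnel numbers (the stage names and counts are assumptions for illustration, not Google Analytics output):

```python
# Visits at each stage of a checkout funnel (illustrative numbers).
funnel = [
    ("cart", 1000),
    ("shipping_address", 620),
    ("checkout", 480),
    ("purchase", 410),
]

def drop_off_rates(stages):
    """Fraction of users lost between each pair of consecutive stages."""
    rates = {}
    for (name_a, count_a), (name_b, count_b) in zip(stages, stages[1:]):
        rates[f"{name_a} -> {name_b}"] = 1 - count_b / count_a
    return rates

rates = drop_off_rates(funnel)
for step, rate in rates.items():
    print(f"{step}: {rate:.0%} drop-off")
```

A spike at one step, say between entering a shipping address and reaching checkout, is the signal Komara describes: shipping costs may be scaring buyers off.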

Information gleaned from your website can then be combined with automated marketing systems such as HubSpot or SharpSpring to create hyper-targeted campaigns.

“When you have the data to know what certain segments are and what they’re doing in certain places, you can create messages specifically for them,” says Prodigal’s Jessica Thompson, director of business development. “You can get a lot more personalized messages because you know who you’re talking to and what they want to hear.”

Jeff Ryznar, owner of 898 Marketing in Canfield, adds that while digital information is easily accessible, it isn’t the only information that factors into the creation of a digital marketing plan. When creating a strategy with a client, he says that information drawn from employees can be just as valuable.

“You also use sales data and talk with partners to understand their closing rates and behaviors and sales process internally,” Ryznar says. “It’s not all data you can get from data or digital channels. Sometimes you have to go to their store or office and learn how they measure things.”

That information is vital, he continues, because the standard measure of success for a digital marketing campaign is the financial return, whether determined by return on investment, the cost per action or improved efficiency.

“We know with absolute specificity when we’ve converted somebody. We can create a list of human beings that we know we drove [to the site] and that gets crosschecked with sales and revenue numbers,” says Actionable Insights’ Wood. “Because we start with data and we know who we’ve sent, we can figure out the specific ROI.”

When creating a strategy, Palo’s Komara says, it’s best to assign a dollar value to goals.

“If we know an average purchase is $219,” he says, “then we can back into how many people made a purchase, how many people backed out and how many people visited the site and assign a value to each. That way, we know that every time someone visits your website, it’s worth, say, 20 cents.”
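
Komara's back-of-the-envelope calculation can be written out directly. This sketch uses his $219 average purchase; the visit and purchase counts are hypothetical numbers chosen only to illustrate the arithmetic:

```python
def value_per_visit(avg_purchase, purchases, visits):
    """Back into the average dollar value of a single site visit."""
    return avg_purchase * purchases / visits

# $219 average purchase (Komara's figure); counts below are assumptions.
per_visit = value_per_visit(219.0, purchases=100, visits=109_500)
print(f"each visit is worth about ${per_visit:.2f}")
```

With these assumed counts, each visit comes out to roughly the 20 cents Komara mentions.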

It’s a similar process for determining just how much efficiency has improved. Herrmann offers the example of aiming to reach $1 million sales, with an average sales value of $100,000. It takes 10 sales to hit that goal and 50 emails to make those sales. By using data pulled from his website, a client can opt to get in touch only with people who visited a specific page on that website and close the sale rather than contact everyone in the general sales list.

In doing so, the conversion rate could drop to 10 sales for every 25 sales emails.
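
Herrmann's efficiency arithmetic works out as follows; the figures are from his example, and the code is only a sketch of the calculation:

```python
def conversion_rate(sales, emails):
    """Sales closed per email sent."""
    return sales / emails

# $1 million goal at $100,000 per sale means 10 sales are needed.
sales_needed = 1_000_000 / 100_000

broad = conversion_rate(sales_needed, 50)     # email the whole list
targeted = conversion_rate(sales_needed, 25)  # email page visitors only
improvement = targeted / broad

print(f"broad: {broad:.0%}, targeted: {targeted:.0%}, "
      f"{improvement:.0f}x more efficient")
```

Halving the emails while holding sales constant doubles the conversion rate, which is the efficiency gain Herrmann describes.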

“By automating some of your marketing, you can take some cost out,” Herrmann says. “You can either have a bunch of people on the phone playing the volume game or you can have a well-defined strategy that gets into the inbox of dedicated customers, and you know that the ones who open it are ready to go. It’s all about qualification.”

On top of that, some marketing platforms allow companies to create “look-alike personas,” where the information from one user profile is examined and generates leads for similar online profiles.

“It’s really dialing in the targeting aspect of marketing where you don’t just have a general message, but you have a specific message for each persona,” Palowitz says. “Each one looks, feels and thinks about a thing in different ways.”

Wood notes, however, that email shouldn’t be the primary conversion tool for companies. It’s good for reaching a targeted audience, but the moneymaker is still a company website.

“The website should be the center of the communication strategy. It’s open 24/7/365 so you can treat it as the focal point,” Herrmann agrees.

And marketing campaigns don’t necessarily have to rely entirely on digital promotion. Collecting data is also useful for informing decisions about everything a company should be doing to promote itself.

Prodigal’s senior account executive Tony Marr points to a campaign during which Prodigal’s analysts saw spikes in website visits during the first and final weeks of the month for part of the campaign.

“We looked at it and saw they were doing a TV campaign two weeks out of the month,” he explains. “There was a rise in website activity [during the TV campaign] and then it dropped back off. It helps us, too, because we could show them just how well TV was working for them.”

Copyright 2018 The Business Journal, Youngstown, Ohio.


A Genetic Study Using 23andMe Data Finds Link Between Schizophrenia and Cannabis Use

A marijuana plant.
Photo: Justin Sullivan (Getty Images)

There’s evidence of a connection between cannabis use and schizophrenia, but it’s unclear whether the drug leads to the disorder, or vice versa. A new study published Monday, which relies partly on genetic data from 23andMe volunteers, might offer a little clarity on that link. It found that people genetically at risk of schizophrenia are also more likely to start smoking pot, suggesting the disorder itself might cause cannabis use in some people.

The current study, published in Nature Neuroscience, is a continuation of previous efforts to sketch out the genetic variations that make people more likely to start using cannabis, a project known as the International Cannabis Consortium. The study authors, which include some researchers from DNA test company 23andMe, studied anonymized genetic data taken from previous or ongoing studies, such as the UK Biobank, as well as from people who have permitted their DNA to be used for research, such as those who signed up for genetic testing from 23andMe. Overall, they looked at more than 180,000 people, making this the largest study of its kind, according to the authors.

A person’s genetic code can differ slightly from someone else’s in lots of ways, but the most common variation is called a single-nucleotide polymorphism, or SNP. A SNP is a minute change in the building blocks that make up DNA (and RNA), known as nucleotides. So in one specific section of DNA, for example, most people might have adenine (A), one of the four nucleobases that make up a nucleotide, but others might have cytosine (C) instead.
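
To make the A-versus-C example concrete: a SNP is simply a position at which two aligned sequences differ by a single base. A toy comparison (the sequences here are invented, not from the study):

```python
def find_snps(seq_a, seq_b):
    """Return (position, base_a, base_b) for each single-nucleotide
    difference between two aligned DNA sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(seq_a, seq_b))
            if a != b]

# Two invented aligned fragments differing at one position:
print(find_snps("GATTACA", "GATCACA"))  # position 3: T in one, C in the other
```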

In the study, the researchers found eight of these SNPs that were associated with lifetime cannabis use. Taken as a whole, they calculated, these variations accounted for 11 percent of the difference in whether someone reported smoking pot or not.

Using different tests, they also found 35 genes in 16 different sections across the genome that were associated with cannabis use. Many of these genes seemed to be associated with other habits, personality traits, and mental health conditions, particularly the gene CADM2. Variations in CADM2, the authors noted, have already been linked to taking more risks, greater alcohol use, and personality traits such as extraversion. They also found a genetic overlap with schizophrenia.

“That is not a big surprise, because previous studies have often shown that cannabis use and schizophrenia are associated with each other,” lead author Jacqueline Vink, a researcher at Radboud University in the Netherlands, said in a statement. “However, we also studied whether this association is causal.”

They attempted to find a possible cause-and-effect relationship using a method called Mendelian randomization. This technique lets geneticists ask whether having the known genes for one thing (schizophrenia, in this study) directly predisposes you to another thing (using marijuana). In this case, they found evidence that being genetically vulnerable to schizophrenia made people more likely to use pot, possibly as a way to cope with their condition, according to the authors.
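
The idea behind Mendelian randomization can be illustrated with a toy simulation: because genotype is fixed at conception, independently of later life circumstances, it can serve as an instrument for estimating a causal effect even in the presence of a hidden confounder. The sketch below uses a simple Wald-ratio estimator on fully simulated data; it illustrates the general technique only, not the study's actual method (which used summary statistics from genome-wide analyses), and every number in it is invented:

```python
import random

def wald_ratio(g, exposure, outcome):
    """Toy Mendelian-randomization estimate of the exposure->outcome
    effect, using genotype g as an instrument: cov(g, outcome) /
    cov(g, exposure)."""
    def cov(x, y):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov(g, outcome) / cov(g, exposure)

rng = random.Random(0)
n = 20_000
g = [rng.choice((0, 1, 2)) for _ in range(n)]  # risk-allele count
u = [rng.gauss(0, 1) for _ in range(n)]        # hidden confounder
# Simulated liability to the disorder, raised by genotype and confounder:
liability = [0.5 * gi + ui + rng.gauss(0, 1) for gi, ui in zip(g, u)]
# Simulated cannabis use, driven by liability (true effect = 0.8) and by
# the confounder directly:
cannabis = [0.8 * li + 0.5 * ui + rng.gauss(0, 1)
            for li, ui in zip(liability, u)]

estimate = wald_ratio(g, liability, cannabis)
print(f"estimated causal effect: {estimate:.2f} (true value 0.8)")
```

Because the genotype is independent of the confounder, the ratio recovers something close to the true effect, whereas a naive correlation between liability and cannabis use would be inflated by the confounding.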

This finding in particular is important because we still don’t really understand how cannabis and schizophrenia are tied to one another. Other research has found that pot use itself raises the risk of schizophrenia, especially if begun at an early age by people already at risk of mental illness. The authors are careful to point out their single study doesn’t disprove that theory, but it does suggest, as other genetic studies have, that the relationship is complicated.

The researchers next plan to study if there are specific genes that can predict more frequent or heavier use of cannabis.

[Nature Neuroscience]

Google sells the future, powered by your personal data

“We may analyze [email] content to customize search results, better detect spam and malware,” he added.

It doesn’t stop there, though. Google says it also leverages some of its datasets to “help build the next generation of ground-breaking artificial intelligence solutions.” On Tuesday, Google rolled out “Smart Replies,” in which artificial intelligence helps users finish sentences.

The extent of the information Google has can be eyebrow-raising even for technology professionals. Dylan Curran, an information technology consultant, recently downloaded everything Facebook had on him and got a 600-megabyte file. When he downloaded the same kind of file from Google, it was 5.5 gigabytes, about nine times as large. His tweets highlighting each kind of information Google had on him, and therefore other users, got nearly 170,000 retweets.

“This is one of the craziest things about the modern age, we would never let the government or a corporation put cameras/microphones in our homes or location trackers on us, but we just went ahead and did it ourselves because … I want to watch cute dog videos,” Curran wrote.

Want to freak yourself out? I’m gonna show just how much of your information the likes of Facebook and Google store about you without you even realising it

— Dylan Curran (@iamdylancurran) March 24, 2018

What does Google guarantee?

The company has installed various guardrails against this data being misused. It says it doesn’t sell your personal information, makes user data anonymous after 18 months, and offers tools for users to delete their recorded data piece by piece or in its (almost) entirety, and to limit how they’re being tracked and targeted for advertising. And it doesn’t allow marketers to target users based on sensitive categories like beliefs, sexual interests or personal hardships.

However, that doesn’t prevent the company from selling advertising slots that can be narrowed to a user’s ZIP code. Combined with enough other categories of interest and behavior, Google advertisers can create a fairly tight Venn diagram of potential viewers of a marketing message, with a minimum of 100 people.
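
That “Venn diagram” targeting with a 100-person floor can be sketched as set intersection. The segments and user IDs below are invented; only the 100-person minimum comes from the article:

```python
# Illustrative audience segments as sets of hypothetical user IDs.
zip_code_10001 = set(range(0, 500))
craft_beer_fans = set(range(300, 900))
recent_movers = set(range(450, 1200))

MIN_AUDIENCE = 100  # the reported floor for ad targeting

def target_audience(*segments, floor=MIN_AUDIENCE):
    """Intersect interest/behavior segments; refuse audiences below the
    minimum size, as the ad platform would."""
    audience = set.intersection(*segments)
    return audience if len(audience) >= floor else None

audience = target_audience(zip_code_10001, craft_beer_fans, recent_movers)
print(len(audience) if audience else "audience too small to target")
```

Each added segment tightens the intersection; once it shrinks below the floor, the audience can no longer be targeted.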

“They collect everything they can, as a culture,” Scott Cleland, chairman of NetCompetition, an advocacy group that counts Comcast and other cable companies among its members, told NBC News. “They know they’ll find some use for it.”

What can you do about it?

“We give users controls to delete individual items, services or their entire account,” said Google’s Stein. “When a user decides to delete data, we go through a process over time to safely and completely remove it from our systems, including backups. We keep some data with a user’s Google Account, like when and how they use certain features, until the account is deleted.”

New European data privacy rules known as GDPR are set to go into effect on May 25. Those new regulations are supposed to limit what data can be collected on users and give them the ability to completely delete their data from systems, as well as move their data from one service to another. Companies like Google will be forced to more clearly spell out to customers what kind of data is being collected, will no longer be able to bury the details in fine print, and will face fines for violations of up to 4 percent of revenue.

What might Google do in the future?

All that data is already valuable to Google, but it could yield an even greater return once paired with advanced artificial intelligence systems that offer highly personalized services, like a souped-up version of Google Assistant.

“On your way to a friend’s house and say ‘find wine’ and you’ll get recommendations for a store that is still open and also not out of the route,” said Oren Etzioni, CEO of the Allen Institute for Artificial Intelligence, a research group founded by Microsoft co-founder Paul Allen.

But Etzioni recommended caution before we unleash swarms of digital agents.

Already we’ve seen some unpleasant effects. Palantir, a security and data-mining firm, sells software that hoovers up data and allows law enforcement to engage in “predictive policing,” guesstimating who might commit crimes. Uber’s self-driving car experiment resulted in a pedestrian being killed after the software was tuned too far in the direction of ignoring stray objects, like plastic bags.

“We need to think hard about how AI gathers and extrapolates data,” Etzioni said. “It has deep implications.”
