3 items tagged "value"

  • Getting real value out of BI: closing the gap between analytics and potential

    “The output of BI is used in organizational decision-making, since it is not the process or technologies that are used by decision-makers but rather their output.” (Arisa Sholo, Copenhagen Business School, 2012)

    Traditional Business Intelligence (BI) is not working. BI is supposed to help businesses make data-informed decisions to improve outcomes, but the reality is that most are falling back on gut instinct to drive their actions. Thanks to a serious mismatch between existing BI infrastructure, tools and end-users, there’s a big gap between analytics and potential. While BI seems to have taken several steps forward since its inception, it has also managed to jump backward with every technological advance.  

    To get a clearer picture of what we’re talking about here, we need to step back a little. 

    BI grew within the IT-centric system of record in the 1980s, where those of us with business questions would go to the specialists who ran our databases, ask for reports, and through the painful, iterative dance of “that wasn’t exactly what I was looking for, I actually need *this*” eventually end up with… something, at least.

    By the early 2000s, BI had evolved into visual-based data discovery, which offered a simple proposition: what if, instead of waiting for IT to figure out what the business was looking for, users could make their *own* damn charts, using simple drag-and-drop interfaces?

    Tremendous success followed. Business Intelligence tools that turned dimensions and measures into charts and graphs flourished, and new names like Tableau, Qlik, and Spotfire started to eclipse the venerable Cognos, Business Objects, and Microstrategy. And as the technical folks responsible for implementing this new breed of tool got asked for more and more sophisticated analyses, eager product leaders converted requests into shiny new features and shipped them at a breakneck pace.

    Visual-based data discovery has a simple premise: Most business questions aren’t hard to answer if you know what you’re looking for, and if a business user understands her question, she should be able to drag and drop dimensions and measures until a chart reveals the answer. Dashboards and self-service BI tools are meant to make analytics quick and easy, correct?

    And there’s the rub.

    Industry analyst firms are circulating a disappointing statistic: between 80 and 90% of knowledge workers lack the technical skills, data literacy, or access needed to make effective use of BI tooling.

    This is because today’s self-service BI tools are shaped by the needs of data specialists (trained business analysts, data engineers, DBAs, and data scientists) who require more advanced features and capabilities to support their use cases. That’s why Tableau has as many control surfaces as a 737, why Qlik has an expression editor, and why Sisense allows filter values to be expressed in code. All the vendors in the BI space convert feature requests into software and race to anticipate their buyers’ next request, which leads to tremendously powerful tools with features 90% of users don’t know how to use.

    Essentially, self-service BI has become too complex and too impenetrable for its target user. We’ve put ourselves right back into the IT-centric system of record. Not only are we back where we started, but we’re doing it with tools the data pros don’t like (they don’t need a visual tool; they can write SQL and Python) and that business users don’t like or don’t know how to use.

    So what? Throw it all out? Go back to stone tablets? What if, instead of siloing analytics within the analyst community, you could incorporate data and insights into every workflow your team uses, to drive every decision? 

    This is infused analytics, the much-needed, seamless evolution of data analytics that BI has been sorely missing. 

    Here’s how we can do it: let’s hook our BI infrastructure – with its powerful cloud data warehouses, its beautiful data models, and its ability to mash up metrics from many sources – directly to the tools we *do* know how to use to analyze data. Plug a set of views and a good NLQ engine into Google Sheets, or Excel, or MS Teams, and let’s actually answer our next questions. At least for the 80% of our questions that are easily predictable (does anyone really need to guess what the VP of Sales is going to ask about?) – and for the other 20%, well, that’s why we have statisticians and data analysts on the team.

    Author: Scott Castle

    Source: Sisense

  • How valuable is your data science project really? An evaluation guide

    Performance metrics can’t tell you what you want to know: how valuable a project actually is

    There is a big focus in data science on performance metrics, and data scientists will spend months trying to improve them for a project. The issue is that it isn’t clear all of this effort actually provides value. If you’re only looking at performance metrics, you can’t know whether you’re increasing the value your model provides.

    Performance metrics don’t capture how valuable your predictions are. To take one example, the F1 score explicitly places equal weight on precision and recall. In practice, false positives and false negatives usually carry different business costs.
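
    For reference, the F1 score is the harmonic mean of the two: F1 = 2 · precision · recall / (precision + recall). Swapping precision and recall leaves the score unchanged, so asymmetric error costs are invisible to it.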

    A million different metrics can tell you how well your model is doing. But none of them tell you what stakeholders actually want to know: what business value does this have?

    Money Talks

    What is business value? At the end of the day, for a for-profit business, business value is monetary value.

    This is great news for data scientists: we love numbers. Money is quantitative.

    Unfortunately, the connection between what we’re building and the monetary value isn’t always straightforward.

    One of the most concrete ways to connect a data science project to business value is to calculate what implementing its model would mean for the company’s bottom line. This isn’t always possible, but it’s a useful exercise. By looking at the situations where we can calculate an explicit value, we can clarify the areas where the connection is less clear.

    A concrete example of calculating the business value of a model

    Let’s take a straightforward example where we are building a model for a business problem: detecting manufacturing defects at a widget factory.

    We know that if we detect a defective widget, we throw it out, losing its manufacturing cost. The cost of replacing a defective widget is $100.

    If we fail to detect a defective widget, we ship the widget to a customer, and then have to replace their widget and pay for shipping on the new widget. Let’s say the shipping cost is $50, on top of the $100 loss from replacing the widget.

    If we have a model for predicting defective widgets, we can then write out the costs for different scenarios:

    True positives: -$100 for being down a widget

    False positives: -$100 to replace widget we thought was defective

    True negative: 0 (we’re considering “no defect” to be the default scenario the costs are compared against)

    False negative: -$150 to ship replacement widget

    [Figure: the cost matrix for the widget example, as listed above. Credit: Tommy Blanchard]

    Note that this is equivalent to saying the cost of a false positive is $100 (the difference between false positive and true negative) and the cost of a false negative is $50 (the difference between a false negative and a true positive).
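
    As a minimal sketch in plain Python (using the example’s dollar values), the cost matrix fits in a small dictionary, and the note’s equivalence falls out as two differences:

    ```python
    # Dollar cost of each prediction outcome, relative to shipping a good widget (TN = 0).
    COSTS = {"TP": -100, "FP": -100, "TN": 0, "FN": -150}

    # Marginal cost of each error type, as in the note above:
    marginal_fp = COSTS["TN"] - COSTS["FP"]  # 100: a false positive vs. a true negative
    marginal_fn = COSTS["TP"] - COSTS["FN"]  # 50: a false negative vs. a true positive
    ```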

    We can now build a classifier and calculate what the cost from defects would be if we used it as our screening process. Evaluating the business value of the model is straightforward: we just need to produce a confusion matrix on the test set. Let’s take an example:

    [Figure: confusion matrix for the example classifier, as proportions of all widgets: TP 0.20, FP 0.05, FN 0.05, TN 0.70. Credit: Tommy Blanchard]

    Then we multiply the cost of each outcome by the proportion of times that outcome occurs:

    (-100 * 0.20) + (-100 * 0.05) + (-150 * 0.05) + (0 * 0.70) = -32.50

    In other words, we will lose an average of $32.50 per widget due to defects if we use this model.

    We can compare this to the cost of the current policy. Let’s assume that currently there is no screening for defects and all widgets are shipped. Based on the confusion matrix above, 25% of widgets are defective. So to calculate the cost of this policy, we just multiply 0.25 by the cost of shipping a defective product:

    -150 * 0.25 = -37.50

    This policy costs an average of $37.50 per widget. Therefore, implementing our model to screen out widgets will save the company an average of $5 per widget.
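
    Continuing the sketch above, a few lines compute the per-widget cost of the model and of the ship-everything baseline (the rates are the example’s proportions):

    ```python
    COSTS = {"TP": -100, "FP": -100, "TN": 0, "FN": -150}  # as defined earlier

    def expected_cost_per_widget(rates, costs):
        """Weight each outcome's cost by how often that outcome occurs."""
        return sum(costs[k] * rates[k] for k in rates)

    model_rates = {"TP": 0.20, "FP": 0.05, "FN": 0.05, "TN": 0.70}   # from the confusion matrix
    baseline_rates = {"TP": 0.0, "FP": 0.0, "FN": 0.25, "TN": 0.75}  # ship everything, no screening

    print(expected_cost_per_widget(model_rates, COSTS))     # -32.5
    print(expected_cost_per_widget(baseline_rates, COSTS))  # -37.5
    ```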

    That’s it! We’ve calculated the business value of our model. Of course, in the real world there may be costs to implementing a new policy, and those costs would have to be compared against the calculated gains of implementing the model.

    Asymmetric costs shift the optimal decision threshold

    With an explicitly defined cost matrix, we have opportunities to fine-tune our model to minimize costs further.

    Most classifiers by default use a probability decision threshold of 0.5 to determine what to label positive or negative, but with asymmetric costs that’s not necessarily the best threshold to use.

    For example, in this scenario, false positives cost more than false negatives ($100 for false positives vs $50 for false negatives). This pushes the optimal decision threshold higher: false negatives are the cheaper mistake, so we should be more willing to take them on and more reluctant to flag widgets as defective.

    Here is a simple simulation of what the cost curve could look like for this cost matrix (note that the minimum on the cost curve is around 0.7, so that would be our optimal decision threshold):

    [Figure: an example cost curve across probability thresholds, with its minimum near 0.7. Costs have been normalized so that 1 is the most costly scenario. Credit: Tommy Blanchard]

    Therefore, we should label anything with a prediction probability above ~0.7 as defective, and everything else as not defective.
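
    Here is a rough version of that threshold sweep in Python. The score distributions are made up for illustration (an assumption, not the article’s data), so the exact minimum will differ:

    ```python
    import numpy as np

    COSTS = {"TP": -100, "FP": -100, "TN": 0, "FN": -150}

    def avg_cost_at_threshold(probs, labels, threshold, costs):
        """Average per-widget cost when widgets scoring >= threshold are flagged as defective."""
        pred = probs >= threshold
        pos = labels == 1
        return (costs["TP"] * np.mean(pred & pos) + costs["FP"] * np.mean(pred & ~pos)
                + costs["FN"] * np.mean(~pred & pos) + costs["TN"] * np.mean(~pred & ~pos))

    # Hypothetical scores: 25% of widgets defective, with overlapping score distributions.
    rng = np.random.default_rng(0)
    labels = (rng.random(10_000) < 0.25).astype(int)
    probs = np.clip(rng.normal(0.25 + 0.5 * labels, 0.15), 0.0, 1.0)

    thresholds = np.linspace(0.01, 0.99, 99)
    curve = [avg_cost_at_threshold(probs, labels, t, COSTS) for t in thresholds]
    print(thresholds[int(np.argmax(curve))])  # costs are negative, so argmax is the cheapest threshold
    ```

    For a perfectly calibrated model the threshold can also be derived directly: flagging a widget costs $100 whichever outcome follows, while shipping it costs $150 · p in expectation, so flagging pays exactly when p > 100/150 ≈ 0.67, consistent with the ~0.7 minimum.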

    Changing the cost matrix

    It’s important to realize that a change to the cost matrix can change not only the business value of the model, but also the optimal decision threshold.

    For example, let’s say someone at the company has developed a new test for defective widgets. It’s costly, so we don’t want to use it on every widget, but it definitively tells us if a widget is defective. If that test costs $20, we get a big change in the cost matrix for our model:

    True positives: -$120 ($100 for the cost of production, and an additional $20 for a definitive test to make sure it is defective)

    False positives: -$20 for the cost of the definitive test, which will exonerate good widgets

    True negative: 0 (we’re again considering “no defect” to be the default scenario the costs are compared against)

    False negative: -$150 to ship replacement widget

    [Figure: the updated cost matrix including the $20 definitive test, as listed above. Credit: Tommy Blanchard]

    Because the cost of a false positive is now lower, this shifts the payoff curve. We should now be more willing to have false positives since they are not as costly, while false negatives remain just as costly:

    [Figure: with the new cost matrix, the cost curve shifts and its minimum moves to around 0.3. Costs have been normalized so that 1 is the most costly scenario. Credit: Tommy Blanchard]

    The optimal decision threshold that minimizes costs has shifted to around 0.3. We’ll label many more widgets as potentially defective, but that’s fine since now we’ll be submitting them to further testing instead of throwing them out. We can calculate how costly this policy will be overall, and compare it to other policies (for example, doing the $20 test on every widget).
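
    Extending the sketch above (same probs, labels, thresholds, and avg_cost_at_threshold), only the cost dictionary changes. The break-even arithmetic now balances $20 + $100 · p (test the widget) against $150 · p (ship it), so for a calibrated model the threshold drops sharply:

    ```python
    # Same sweep, with the $20 definitive test changing two cells of the cost matrix.
    NEW_COSTS = {"TP": -120, "FP": -20, "TN": 0, "FN": -150}
    new_curve = [avg_cost_at_threshold(probs, labels, t, NEW_COSTS) for t in thresholds]
    print(thresholds[int(np.argmax(new_curve))])  # cheapest threshold under the new costs
    ```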

    Generalizing in more ambiguous situations

    In the real world, it’s rare that we have such a well-defined problem. Defining a business problem in this way is what I’ve referred to as the hard part of data science.

    In the real world, costs aren’t well known and it’s rare to have a straightforward classification problem that completely captures the essence of the business problem. However, by looking at these simplified cases, we can approach the more complicated problems with greater clarity. Recognizing what is ambiguous or missing in a project definition is the first step towards clarifying the problem and connecting it to a technical solution that brings the most business value.

    Author: Tommy Blanchard

    Source: Towards Data Science

  • The key challenges in translating high quality data to value

    Most organizations consider their data quality to be either 'good' or 'very good', but there’s a disconnect around understanding and trust in the data and how it informs business decisions, according to new research from software company Syncsort.

    The company surveyed 175 data management professionals earlier this year, and found that 38% rated their data quality as good while 27% said it was very good.

    A majority of the respondents (69%) said their leadership trusts data insights enough to inform business decisions. Yet they also said only 14% of stakeholders had a very good understanding of the data. Of the 27% who reported sub-optimal data quality, 72% said it negatively affected business decisions.

    The top three challenges companies face when ensuring high quality data are multiple sources of data (70%), applying data governance processes (50%) and volume of data (48%).

    Approximately three quarters (78%) face challenges profiling large data sets or applying data quality processes to them, and 29% said they have only a partial understanding of the data that exists across their organization. About half (48%) said they have a good understanding.

    Fewer than 50% of the respondents said they take advantage of data profiling tools or data catalogs. Instead, they rely on other methods to gain an understanding of data. More than half use SQL queries and about 40% use business intelligence tools.

    Author: Bob Violino

    Source: Information-management
