3 items tagged "data analysis"

  • Solutions to help you deal with heterogeneous data sources

    Solutions to help you deal with heterogeneous data sources

    With enterprise data pouring in from different sources; CRM systems, web applications, databases, files, etc., streamlining data processes is a significant challenge as it requires integrating heterogeneous data streams. In such a scenario, standardizing data becomes a pre-requisite for effective and accurate data analysis. The absence of the right integration strategy will give rise to application-specific and intradepartmental data silos, which can hinder productivity and delay results.

    Consolidating data from disparate structured, unstructured, and semi-structured sources can be complex. A survey conducted by Gartner revealed that one-third of respondents consider 'integrating multiple data sources' as one of the top four integration challenges.

    Understanding the common issues faced during this process can help enterprises successfully counteract them. Here are three challenges generally faced by organizations when integrating heterogeneous data sources, as well as ways to resolve them:

    Data extraction

    Challenge: Pulling source data is the first step in the integration process. But it can be complicated and time-consuming if data sources have different formats, structures, and types. Moreover, once the data is extracted, it needs to be transformed to make it compatible with the destination system before integration.

    Solution: The best way to go about this is to create a list of sources that your organization deals with regularly. Look for an integration tool that supports extraction from all these sources. Preferably, go with a tool that supports structured, unstructured, and semi-structured sources to simplify and streamline the extraction process.

    Data integrity

    Challenge: Data Quality is a primary concern in every data integration strategy. Poor data quality can be a compounding problem that can affect the entire integration cycle. Processing invalid or incorrect data can lead to faulty analytics, which if passed downstream, can corrupt results.

    Solution: To ensure that correct and accurate data goes into the data pipeline, create a data quality management plan before starting the project. Outlining these steps guarantees that bad data is kept out of every step of the data pipeline, from development to processing.


    Challenge: Data heterogeneity leads to the inflow of data from diverse sources into a unified system, which can ultimately lead to exponential growth in data volume. To tackle this challenge, organizations need to employ a robust integration solution that has the features to handle high volume and disparity in data without compromising on performance.

    Solution: Anticipating the extent of growth in enterprise data can help organizations select the right integration solution that meets their scalability and diversity requirements. Integrating one data point at a time is beneficial in this scenario. Evaluating the value of each data point with respect to the overall integration strategy can help prioritize and plan. Say that an enterprise wants to consolidate data from three different sources: Salesforce, SQL Server, and Excel files. The data within each system can be categorized into unique datasets, such as sales, customer information, and financial data. Prioritizing and integrating these datasets one at a time can help organizations gradually scale data processes.

    Author: Ibrahim Surani

    Source: Dataversity

  • Two modern-day shifts in market research

    Two modern-day shifts in Market Research

    In an industry that is changing tremendously, traditional ways of doing things will no longer suffice. Timelines are shortening, as demands for faster and faster insights increase, and, in addition, we are seeking these insights in such a vast sea of data. The only way to address the combination of these two issues is with technology.

    The human-machine relationship

    One good example of this shift is the whole arena surrounding computational text analysis. Smarter, artificial intelligence (AI)-based approaches are completely changing the way we approach this task. In the past, the human-based analysis only allowed us to skim the text, use a small sample and analyze it with subjective bias. This kind of generalized approach is being replaced by a computational methodology that incorporates all the text while throwing away what the computer views as non-essential information. Sometimes, without the right program, much of the meaning can be lost. However, this machine-based approach can work with large amounts of data quickly.

    When we start to dive deeper into AI-based solutions, we see that technology can shoulder much of the hard work to free up humans to do what we can do better. What the machine does really well is finding the data points that can help us tell a better, richer story. It can run algorithms and find patterns in natural language, taking care of the heavy lifting. Then the human can come in, add color and apply sensible intelligence to the data. This human-machine tension is something I predict that we’ll continue to see as we accommodate our new reality. The end goal is to make the machine as smart as possible to really leverage our own limited time in the best ways possible.

    Advanced statistical analysis

    Another big change taking place surrounds the statistical underpinnings we use for analysis. Traditionally we have found things out by using the humble crosstab tool. But if we truly want to understand what’s driving, for example, differences between groups, it is simply not efficient to go through crosstab after crosstab. It is much better to have the machine do it for you and reveal just the differences that matter. When you do that, though, classical statistics break down because false positives become statistically inevitable.

    Bayesian statistics do not suffer this same problem when a high volume of tests are required. In short, a Bayesian approach allows researchers to test a hypothesis and see if it holds given the data, rather than the more commonly used tests for significance which test that the data is right in the face of a given hypothesis.

    There are a host of other models that are changing the way we approach our daily jobs in market research. New tools, some of them based in a completely different set of underlying principles (like Bayesian statistics), are giving us new opportunities. With all these opportunities, we are challenged to work in a new set of circumstances and learn to navigate a new reality.

    We can’t afford to wait any longer to change the way we are doing things. The industry and our clients’ industries are moving too quickly for us to hesitate. I encourage researchers to embrace this new paradigm so that they will have the skill advantage. Try new tools, even if you don’t understand how they work, many of them can help you do what you do (better). Doing things in new ways can lead to better, faster insights. Go for it!

    Author: Geoff Lowe

    Source: Greenbook Blog

  • Using Hierarchical Clustering in data analysis

    Using Hierarchical Clustering in data analysis

    This article discusses the analytical method of Hierarchical Clustering and how it can be used within an organization for analytical purposes.

    What is Hierarchical Clustering?

    Hierarchical Clustering is a process by which objects are classified into a number of groups so that they are as much dissimilar as possible from one group to another group and as much similar as possible within each group.

    For example, if you want to create four groups of items, these items  should be as similar as possible in terms of attributes of the items in each group, and items in group 1 and group 2 should be as dissimilar as possible. All items start in one cluster, and are then divided into two clusters. The data points within one cluster are as similar as possible, and the data points in other clusters are dissimilar from the other clusters being analyzed. For each cluster, we repeat the process until the specified number of clusters is reached (four in this case).

    This type of analysis can be applied to segment customers by purchase history, segment users by the types of activities they perform on websites or applications, to develop personalized consumer profiles based on activities or interests, and to recognize market segments, etc.

    How does an organization use Hierarchical Clustering to analyze data?

    In order to understand the application of Hierarchical Clustering for organizational analysis, let us consider two use cases.

    Use case one

    Business problem: A bank wants to group loan applicants into high/medium/low risk based on attributes such as loan amount, monthly installments, employment tenure, the number of times the applicant has been delinquent in other payments, annual income, debt to income ratio etc.

    Business benefit: Once the segments are identified, the bank will have a loan applicants’ dataset with each applicant labeled as high/medium/low risk. Based on these labels, the bank can easily make a decision on whether to give loan to an applicant and how much credit to extend, as well as the interest rate the applicant will be given, based on the amount of risk involved.

    Use case two

    Business problem: The enterprise wishes to organize customers into groups/segments based on similar traits, product preferences and expectations. Segments are constructed based on customer demographic characteristics, psychographics, past behavior and product use behavior.

    Business benefit: Once the segments are identified, marketing messages and products can be customized for each segment. The better the segment(s) chosen for targeting by a particular organization, the more successful the business will be in the market.

    Hierarchical Clustering can help an enterprise organize data into groups to identify similarities and, equally important, dissimilar groups and characteristics, so that the business can target pricing, products, services, marketing messages and more.

    Author: Kartik Patel

    Source: Dataversity

EasyTagCloud v2.8