2 items tagged "business intelligence"

  • Strengthening Analytics with Data Documentation

    Data documentation is a relatively new term that describes the capture and use of information about your data. It is used mainly in the context of data transformation, where data engineers and analysts can better describe the data models created in data transformation workflows.

    Data documentation is critical to your analytics processes. It helps every persona involved in data modeling and transformation share knowledge, assist one another, and participate in the data and analytics engineering process.

    Let’s take a deeper dive into data documentation, explore what makes for good data documentation, and see how a deep set of data documentation adds greater value to your analytics processes.

    What is Data Documentation?

    At the simplest level, data documentation is information about your data. This information ranges from raw schema information to system-generated information to user-supplied information.

    While many people associate information about your data with data catalogs, data catalogs are a more general-purpose solution that spans all of your data and tends to be in the domain of IT. If an organization uses an enterprise data catalog, data documentation should complement and enrich the information that catalog holds.

    Data documentation refers to information captured about your data in the data modeling and transformation process. It is highly specific to the data engineering and analytics processes and sits in the domain of data engineering and analytics teams.

    How is Data Documentation Used?

    Data documentation is used throughout your analytics processes, including data engineering, analytics generation, and analytics consumption by the business. Each persona in the process will contribute and use data documentation based on their knowledge about the data and how they participate in the process:

    • Data engineers – This persona tends to know more about the data itself – where it resides, how it is structured and formatted, and how to get it – and less about how the business uses the data. They will document the core information about the data and how it was transformed, and will use this information when vetting and troubleshooting models and datasets.
    • Data analysts and scientists – These personas tend to know less about the core data itself but completely understand how the data is incorporated into analytics and how the business would use it. They will document the data with information such as what the data is good for, how it is used, whether it can be trusted, and what analytics are generated from it.
    • Business analysts and teams – These teams interpret the analytics produced by the analytics teams to make decisions and take the resulting actions. The business side needs to understand where the data came from and how it was brought together to best interpret the analytics results. They will consume information captured by the data engineering and analytics teams, but will also add information about how they use the data and the business results it produces.

    What Should You Expect for Data Documentation?

    The data documentation in many data transformation tools focuses on the data engineering side of the analytics process to ensure that data workflows are defined and executed properly. This basic form of data documentation is one way these tools help facilitate software development best practices within data engineering.

    These tools capture only basic information about the data, such as schema details. Any additional information is embedded by data engineers as comments in their data modeling and transformation code (typically SQL), describing how the data was manipulated so that other data engineers can determine how best to reuse the models (see the sketch below).
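    To make this concrete, here is a minimal Python sketch of the pattern; the model and helper function are hypothetical illustrations, not any particular tool’s behavior. The only documentation is whatever comments an engineer left in the SQL, and surfacing it means scraping it back out:

      import re

      # A typical transformation model: the only "documentation" lives in SQL comments.
      MODEL_SQL = """
      -- revenue_by_region: aggregates raw orders into regional revenue
      -- grain: one row per region per day
      SELECT region,
             order_date,
             SUM(amount) AS revenue  -- amounts are in USD after FX conversion
      FROM raw_orders
      GROUP BY region, order_date
      """

      def extract_comment_docs(sql: str) -> list[str]:
          """Pull single-line SQL comments out of a model definition."""
          return [match.strip() for match in re.findall(r"--\s*(.*)", sql)]

      print(extract_comment_docs(MODEL_SQL))
      # ['revenue_by_region: aggregates raw orders into regional revenue',
      #  'grain: one row per region per day',
      #  'amounts are in USD after FX conversion']

    Anything not written down as a comment, such as why a filter was applied, is simply lost to the rest of the team.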

    The basic information captured and used in most data transformation tools limits the spread of information, knowledge capture, and knowledge sharing across the broader data, analytics, and business teams. This hinders the overall analytics process, makes analytics teams hesitant to trust the data, and can lead analytics and business teams to misinterpret it.

    As you evaluate data transformation tools, you should look for much broader and deeper data documentation facilities that your extended data, analytics, and business teams can use to participate in the process. The information that can be captured, supplied, and used should include what is described below.

    Auto-generated documentation and information

    • The technical schema information about the data,
    • The transformations performed both within each model and across the entire data workflow,
    • Deep data profiles at each stage in the data workflow as well as in the final data model delivered to analytics teams,
    • System-defined properties such as owner, create date, created by, last modified date, last modified by, and more,
    • The end-to-end data lineage for any data workflow, from raw data to the final consumed data model, and
    • Auditing and status information, such as when data workflows were run and data models were generated.
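    As a concrete illustration, the auto-generated side of this documentation could be represented by a record like the following minimal Python sketch. The field names are illustrative assumptions, not any specific product’s schema:

      from dataclasses import dataclass, field
      from datetime import datetime

      @dataclass
      class AutoGeneratedDocs:
          """System-captured documentation for one data model (illustrative fields only)."""
          schema: dict[str, str]       # column name -> data type
          transformations: list[str]   # steps applied within the model and workflow
          profile: dict[str, dict]     # per-column stats: null %, min/max, distinct count
          owner: str
          created_at: datetime
          created_by: str
          last_modified_at: datetime
          last_modified_by: str
          lineage: list[str]           # upstream datasets, raw source -> final model
          run_history: list[dict] = field(default_factory=list)  # audit: run time, status

      docs = AutoGeneratedDocs(
          schema={"region": "string", "order_date": "date", "revenue": "decimal"},
          transformations=["filter cancelled orders", "aggregate by region and day"],
          profile={"revenue": {"null_pct": 0.0, "min": 0.0, "max": 98450.25}},
          owner="data-eng",
          created_at=datetime(2022, 3, 1),
          created_by="jsmith",
          last_modified_at=datetime(2022, 4, 12),
          last_modified_by="jsmith",
          lineage=["raw_orders", "stg_orders", "revenue_by_region"],
      )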

    User-supplied information

    • Descriptions that can be applied at the field level, data model level, and entire data workflow level,
    • Tags that provide a standardized means of labeling datasets, from what the data contains to how it is used,
    • Custom properties that allow analytics and business users to add business-level properties to the data,
    • Status and certification fields whose specific purpose is to signal trust levels in the data, such as a status (live or in development) or a certified flag,
    • Business metadata that allows analytics and business teams to describe data in their terms, and
    • Comments that allow the entire team to add ad-hoc information about the data and communicate effectively in the data engineering process.
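    Continuing the sketch above, the user-supplied layer could be modeled as a companion record; again, all field names are illustrative assumptions:

      from dataclasses import dataclass, field

      @dataclass
      class UserSuppliedDocs:
          """Human-contributed documentation layered onto a data model (illustrative)."""
          description: str                                  # model-level description
          field_descriptions: dict[str, str] = field(default_factory=dict)
          tags: set[str] = field(default_factory=set)       # standardized labels
          custom_properties: dict[str, str] = field(default_factory=dict)  # business-level
          status: str = "in-dev"                            # e.g. "live" or "in-dev"
          certified: bool = False
          business_glossary: dict[str, str] = field(default_factory=dict)  # team's own terms
          comments: list[str] = field(default_factory=list) # ad-hoc team communication

      user_docs = UserSuppliedDocs(
          description="Daily revenue by sales region, used in the exec dashboard.",
          field_descriptions={"revenue": "USD, net of refunds"},
          tags={"finance", "daily"},
          custom_properties={"cost_center": "FIN-204"},
          status="live",
          certified=True,
      )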

    Let’s explore how this broader and deeper set of data documentation positively impacts your analytics processes.

    Collaboration and Knowledge-sharing

    The broader and deeper data documentation described above helps the extended team involved in the analytics process to better collaborate and share the knowledge each has with the rest of the team. This level of collaboration allows the broader, diverse team to:

    • Efficiently hand off models or components between members at various phases,
    • Contribute and use their skills in the most effective manner,
    • Provide and share knowledge for more effective reuse of models and proper use of the data in analytics, and
    • Crowdsource tasks such as testing, auditing, and governance.

    Beyond making the process efficient and increasing team productivity, a collaborative data transformation workflow eliminates manual handoffs and misinterpreted requirements. This adds one more valuable benefit: it prevents errors in the data transformations and ensures models get done right the first time.

    Discovery

    The analytics team members waiting on data engineering to complete a project and deliver analytics-ready datasets are typically involved in the process and receive a handoff of those datasets. But what about the rest of the analytics team? Perhaps they can use these new datasets as well.

    Your data modeling and transformation tool should have a rich, Google-like faceted search capability that allows any team member to search for datasets across ALL the information in the broad and deep data documentation (a minimal sketch of such a search follows the list below). This allows:

    • Analysts to easily discover what datasets are out there and how to use them, and to quickly determine whether a dataset applies to the analytics problem they are currently trying to solve,
    • Data engineers to easily find data workflows and data models created by other data engineers, to determine whether these already solve the problem they are tasked with or can be reused in their current project, and
    • Business teams to discover the datasets used in the analytics they are consuming for complete transparency and to best interpret the results.
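    Building on the hypothetical records sketched earlier, a minimal faceted search might filter on free text plus metadata facets such as tags, status, and certification; a real tool would index far more:

      def faceted_search(catalog, text=None, tags=None, status=None, certified=None):
          """Filter documented datasets by free text plus metadata facets."""
          results = []
          for name, (auto_docs, user) in catalog.items():
              searchable = (name + " " + user.description).lower()
              if text and text.lower() not in searchable:
                  continue
              if tags and not tags <= user.tags:  # require every requested tag
                  continue
              if status and user.status != status:
                  continue
              if certified is not None and user.certified != certified:
                  continue
              results.append(name)
          return results

      # Reusing the `docs` and `user_docs` records from the sketches above:
      catalog = {"revenue_by_region": (docs, user_docs)}
      print(faceted_search(catalog, text="revenue", tags={"finance"}, certified=True))
      # ['revenue_by_region']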

    Facilitating Data Literacy and Strong Analytics

    The broader and deeper data documentation we have described here can be used as a lynchpin for facilitating greater data literacy. This happens across all four personas:

    • Data engineers – the data documentation provided by downstream consumers of the data workflows gives data engineering teams greater knowledge of how the data is used and more context for their future projects,
    • Analysts – the information provided by data engineers, other analysts, and business teams allows analysts to gain a better understanding of how to use the data and to produce faster, more meaningful analytics,
    • Data scientists – they can use the information provided about the data to understand its best form and fit for their AI and ML projects, enabling faster project execution and highly accurate models, and
    • Business teams – these teams can use the information to deepen their understanding of the datasets behind the analytics, increasing their trust in the results and enabling fast, decisive actions based on those analytics.

    Wrap Up

    Your data documentation should go beyond basic schema information and the comments data engineers leave in their SQL code. Everyone involved in the analytics process – data engineers, analytics producers, and analytics consumers – has knowledge and information about the data that should be captured and shared so the entire team benefits.

    Using a data transformation tool that provides the richer data documentation we’ve described here delivers a faster analytics process, fosters collaboration and easy discoverability, and promotes greater data literacy. This leads to broader and better use of your data, strong and accurate analytics and data science, highly trusted results, and more decisive actions by the business.

    Author: John Morrell

    Source: Datameer

  • Why flexibility is key to keep up with developments in BI software

    In modern business, the proliferation of enterprise digital applications—analytics included—is accelerating organizations’ ongoing journeys toward cloud infrastructure adoption. The change is driven by companies’ growing need for a greater number and variety of users to access those applications. But repeatedly introducing new applications to meet business and user needs may not be the best bet.

    On a given day, a person will use more than 20 software applications—be they cloud-based, enterprise, or desktop—O’Reilly reports. Although analytics is increasingly critical to decision-making at all levels of the organization, business leaders risk further dividing users’ attention with new analytics applications.

    Still, employees expect all the capabilities they need at their fingertips. If senior leaders ask them to make a decision as part of their responsibilities, those same leaders must provide a streamlined way for those employees to access the insights they need to make that decision a good one.

    Analytics designed with this flexibility in mind makes this possible. These applications integrate analytics access into existing employee applications and workflows. As Forrester describes, “In the future, BI [i.e., analytics] will enable business users to turn insights into actions without having to leave whatever business or productivity application they have open.” Adaptive analytics environments of this kind allow classic analytics experiences to blend seamlessly with relevant tasks and processes, enhancing corresponding decision making for their users.

    The Problems With Existing Data Ecosystems

    Many workers still lack access to powerful analytics, and many more struggle to leverage analytics within their existing, business-critical workflows. Organizations’ existing data ecosystems are instead separated into different environments, where individual teams each have their own silos to which they have grown accustomed. This makes it difficult to standardize data access, let alone have individual team members prioritize and use approved analytics resources.

    In lieu of data literacy training and self-service analytics capabilities, workers have had to rely on IT or data wranglers to retrieve the right data for them. Requesting data in this way is often a tedious, drawn-out process that many workers simply avoid. “If it takes four months to get data to support a decision, then the opportunity is lost,” Forbes describes in their article on 2021 analytics trends. “For the business to drive critical outcomes and opportunities quickly, data needs to be available quickly.”

    Make Analytics—Not Users—Adapt

    The future of successful analytics is an adaptive environment that can adjust to a constantly evolving and improving business decision lifecycle. Adaptive analytics of this kind is platform-agnostic and scalable; it can be deployed in any scenario and across on-premises, cloud, or hybrid environments. 

    Since adaptive analytics is cloud-based and flexible, integrating and evolving with a wide variety of digital tools, it future-proofs organizations against missing these opportunities. And while many companies cannot give up their on-premises data sources, advanced analytics of this kind allows them to harness the power of their legacy data stores and provides a bridge to them from the cloud.

    Most importantly, employees won’t have to change their best practices and existing workflows to leverage the latest, greatest analytics capabilities as they arrive. With embedded analytics, users can “‘make analytics calls’ [to databases] on-demand on a massively big scale and uncover previously hidden patterns and correlations,” as Forbes explains in another article. This solves the core problem many business leaders overlook: Workers don’t want new digital tools, per se; they want answers to their questions and support for their everyday responsibilities. 

    Hyperconvergence Is Key

    Simply put, business leaders must remove the barrier between business users and analytics accessibility—not by swinging open the doors to analytics tools but by strategically integrating analytics into business users’ existing workflows through hyperconvergence:

    “Hyperconverged data analytics is still big data analytics, but it is highly scalable, increasingly intelligent data analytics that has been unified with other core data tools and data functions, while it is also dovetailed with other business tools and business functions.”

    Forbes, “How Data Analytics Became Hyperconverged,” May 27, 2021

    Instead of forcing business users to engage in the tedious process of requesting data from data scientists, those data scientists can connect analytics resources to other business-critical application libraries, APIs, or workflows. In this way, data teams can “operationalize” analytics for business users without forcing them to learn entirely new applications. Analytics finds its place at each company’s operational center, within the same applications workers use every day. 

    As advanced analytics tools evolve, data scientists can enhance capabilities within those business-critical applications as well. For example, natural language processing (NLP) within those applications allows users to seamlessly retrieve insights from data—without technical analytics knowledge and without leaving their preferred application environment. Analytics may also use AI to anticipate user behaviors within those application environments and then make recommendations based on a variety of available data.
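    As a purely illustrative sketch, embedded NLP-driven retrieval could look something like the following; the translation step and function names are hypothetical stand-ins, not a specific vendor’s API:

      def translate_question(question: str) -> str:
          """Stand-in for an NLP model that maps a question to a query (hypothetical)."""
          # A real implementation would call an NLP or LLM service here.
          if "revenue" in question.lower() and "region" in question.lower():
              return "SELECT region, SUM(revenue) FROM revenue_by_region GROUP BY region"
          raise ValueError("question not understood")

      def answer_in_app(question: str, run_query) -> dict:
          """Run the translated query and hand the results back to the host application."""
          sql = translate_question(question)
          return {"question": question, "sql": sql, "rows": run_query(sql)}

      # Example usage with a stubbed query runner standing in for the database:
      fake_runner = lambda sql: [("EMEA", 1_200_000), ("AMER", 2_400_000)]
      print(answer_in_app("What is revenue by region this quarter?", fake_runner))

    The point is the shape of the flow: the question, the query, and the answer all stay inside the application the user already has open.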

    Start Turning Analytics Into People Power

    According to Forbes’ aforementioned article on 2021 analytics trends, “[2021] will be the year we see the influence of the business user have a major impact on data and analytics, driven by the need to adapt quickly to the next round of changes caused by the pandemic and the economy.” Indeed, flexibility will be critical to the longevity of analytics investments, the ubiquity of adoption, the success of business decision-making, and the realization of ROI.

    Flexibility in analytics also means the technology adapts to—and drives value within—the cultural norms and decision workflows of the organization, helping employees improve rather than dramatically change how they work and collaborate. It’s the organizations that prioritize the data needs of their business users that will be most successful in the years to come.

    Author: Omri Kohl

    Source: Pyramid Analytics
