4 items tagged "platform"

  • Five factors to help select the right data warehouse product

    How big is your company, and what resources does it have? What are your performance needs? Answering these questions and others can help you select the right data warehouse platform.

    Once you've decided to implement a new data warehouse, or expand an existing one, you'll want to ensure that you choose the technology that's right for your organization. This can be challenging, as there are many data warehouse platforms and vendors to consider.

    Long-time data warehouse users generally have a relational database management system (RDBMS) such as IBM DB2, Oracle or SQL Server. It makes sense for these companies to expand their data warehouses by continuing to use their existing platforms. Each of these platforms offers updated features and add-on functionality (see the sidebar, "What if you already have a data warehouse?").

    But the decision is more complicated for first-time users, as all data warehousing platform options are available to them. They can opt to use a traditional DBMS, an analytic DBMS, a data warehouse appliance or a cloud data warehouse. The following factors may help make the decision process easier.

    1. How large is your company?

    Larger companies looking to deploy data warehouse systems generally have more resources, including financial and staffing, which translates to more technology options. It can make sense for these companies to implement multiple data warehouse platforms, such as an RDBMS coupled with an analytical DBMS such as Hewlett Packard Enterprise (HPE) Vertica or SAP IQ. Traditional queries can be processed by the RDBMS, while online analytical processing (OLAP) and nontraditional queries can be processed by the analytical DBMS. Nontraditional queries aren't usually found in transactional applications typified by quick lookups. This could be a document-based query or a free-form search, such as those done on Web search sites like Google and Bing.

    For example, HPE Vertica offers Machine Data Log Text Search, which helps users collect and index large log file data sets. The product's enhanced SQL analytics functions deliver in-depth capabilities for OLAP, geospatial and sentiment analysis. An organization might also consider SAP IQ for in-depth OLAP as a near-real-time service to SAP HANA data.

    Teradata Corp.'s Active Enterprise Data Warehouse (EDW) platform is another viable option for large enterprises. Active EDW is a database appliance designed to support data warehousing that's built on a massively parallel processing architecture. The platform combines relational and columnar capabilities, along with limited NoSQL capabilities. Teradata Active EDW can be deployed on-premises or in the cloud, either directly from Teradata or through Amazon Web Services.

    For midsize organizations, where a mixture of flexibility and simplicity is important, reducing the number of vendors is a good idea. That means looking for suppliers that offer compatible technology across different platforms. For example, Microsoft, IBM and Oracle all have significant software portfolios that can help minimize the number of other vendors an organization might need. Hybrid transaction/analytical processing (HTAP) capabilities that enable a single DBMS to run both transaction processing and analytics applications should also appeal to midsize organizations.

    Smaller organizations and those with minimal IT support should consider a data warehouse appliance or a cloud-based data warehouse as a service (DWaaS) offering. Both options make it easier to get up and running, and minimize the administration work needed to keep a data warehouse functional. In the cloud, for example, Amazon Redshift and IBM dashDB offer fully managed data warehousing services that can lower up-front implementation costs and ongoing management expenses.

    Regardless of company size, it can make sense for an organization to work with a vendor or product that it has experience using. For example, companies using Oracle Database might consider the Oracle Exadata Database Machine, Oracle's data warehouse appliance. Exadata runs Oracle Database 12c, so Oracle developers and DBAs should immediately be able to use the appliance. Also, the up-front system planning and integration required for data warehousing projects is eliminated with Exadata because it bundles the DBMS with compute, storage and networking technologies.

    A similar option for organizations that use IBM DB2 is the IBM PureData System for Analytics, which is powered by Netezza technology. Keep in mind, however, that data warehouse appliances can be costly, at times pricing themselves out of the market for smaller organizations.

    Microsoft customers should consider the preview release of Microsoft Azure SQL Data Warehouse. It's a fully managed data warehouse service that's compatible and integrated with the Microsoft SQL Server ecosystem.

    2. What are your availability and performance needs?

    Other factors to consider include high availability and rapid response. Most organizations that decide to deploy a data warehouse will likely want both, but not every data warehouse actually requires them.

    When availability and performance are the most important criteria, DWaaS should be at the bottom of your list because of the performance penalty imposed by network latency when accessing the cloud. Instead, on-premises deployment can be tuned and optimized by IT technicians to deliver increased system availability and faster performance at the high end. This can mean using the latest features of an RDBMS, including the HTAP capabilities of Oracle Database, or IBM's DB2 with either the IBM DB2 Analytics Accelerator add-on product for DB2 for z/OS or BLU Acceleration capabilities for DB2 for LUW. Most RDBMS vendors offer capabilities such as materialized views, bitmap indexes, zone maps, and high-end compression for data and indexes. For most users, however, satisfactory performance and availability can be achieved with data warehouse appliances such as IBM PureData, Teradata Active EDW and Oracle Exadata. These platforms are engineered for data warehousing workloads, yet require minimal tuning and administration.

    Another appliance to consider is the Actian Analytics Platform, which is designed to support high-speed data warehouse implementation and management. The platform combines relational and columnar capabilities, but also includes high-end features for data integration, analytics and performance. It can be a good choice for organizations requiring both traditional and nontraditional data warehouse queries. The Actian Analytics Platform includes Actian Vector, a symmetric multiprocessing DBMS designed for high-performance analytics that exploits newer, performance-oriented features such as single instruction, multiple data (SIMD), which applies a single operation to a set of data at once, and the use of CPU cache as execution memory.

    Pivotal Greenplum is an open source, massively parallel data warehouse platform capable of delivering high-speed analytics on large volumes of data. The platform combines relational and columnar capabilities and can be deployed on-premises as software or an appliance, or as a service in the cloud. Given its open source orientation, Pivotal Greenplum may be viewed favorably by organizations basing their infrastructure on an open source computing stack.

    3. Are you already in the cloud?

    DWaaS is probably the best option for companies that already conduct cloud-based operations. The other data warehouse platform options would require your business to move data from the cloud to an on-premises data warehouse. Keep in mind, though, that in addition to cloud-only options like Amazon Redshift, IBM dashDB and Microsoft Azure SQL Data Warehouse, many data warehouse platform providers offer cloud-based deployments.

    4. What are your data volume and latency requirements?

    Although many large data warehouses contain petabytes of raw data, every data warehouse implementation has different data storage needs. The largest data warehouses are usually customized combinations of RDBMS and analytic DBMS or HTAP implementations. As data volume requirements diminish, more varied options can be utilized, including data warehouse appliances.

    5. Is a data warehouse part of your big data strategy?

    Big data requirements have begun to impact the data warehouse, and many organizations are integrating unstructured and multimedia data into their data warehouse to combine analytics with business intelligence requirements -- aka polyglot data warehousing. If your project could benefit from integrated polyglot data warehousing, you need a platform that can manage and utilize this type of data. For example, the big RDBMS vendors -- IBM, Oracle and Microsoft -- are integrating support for nontraditional data and Hadoop in each of their respective products.

    You may also wish to consider IBM dashDB, which can process unstructured data via its direct integration with IBM Cloudant, enabling you to store and access JSON and NoSQL data. The Teradata Active EDW supports Teradata's Unified Data Architecture, which enables organizations to seamlessly access and analyze relational and nonrelational data. The Actian Analytics Platform delivers a data science workbench, simplifying analytics, as well as a scaled-out version of Actian Vector for processing data in Hadoop. Last, the Microsoft Azure SQL Data Warehouse enables analysis across many kinds of data, including relational data and semi-structured data stored in Hadoop, using its T-SQL language.
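    As a concrete illustration of what polyglot data warehousing means in practice, the sketch below uses Python's built-in sqlite3 module as a stand-in for a warehouse table that carries both relational columns and a semi-structured JSON payload. The table and field names are hypothetical, not any vendor's schema.

```python
import json
import sqlite3

# Hypothetical polyglot table: relational columns plus a JSON payload column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL, details TEXT)")
conn.execute(
    "INSERT INTO orders VALUES (1, 99.5, ?)",
    (json.dumps({"channel": "web", "tags": ["promo", "new-customer"]}),),
)

# A relational predicate and JSON payload inspection handled in one pass.
for order_id, amount, details in conn.execute(
    "SELECT id, amount, details FROM orders WHERE amount > 50"
):
    payload = json.loads(details)  # the semi-structured part of the row
    print(order_id, amount, payload["channel"], payload["tags"])
```

    A platform with genuine polyglot support pushes the JSON navigation into the query engine itself rather than leaving it to application code, but the data shape is the same.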

    Although organizations have been building data warehouses since the 1980s, the manner in which they are being implemented has changed considerably. After reading this four-part series, you should have a better idea of how modern data warehouses are built and what each of the leading vendors provides. Armed with this knowledge, you can make a more informed choice when purchasing data warehouse products.

    Source: TechTarget

  • Gartner Magic Quadrant: Tableau once again a Leader in Analytics and BI Platforms

    Tableau has once again been named a Leader by Gartner in the Magic Quadrant for Analytics and Business Intelligence Platforms. This is the eighth consecutive year that Tableau has been recognized as a Leader in this Magic Quadrant.

    In a period of strong growth, market evolution and rising customer expectations, Tableau has scaled up with customer-focused innovation and platform expansion to meet the needs of data-driven enterprises.

    In the Magic Quadrant, Tableau is Microsoft's closest competitor and has a strong track record in interactive data exploration and visualization. Tableau, now owned by Salesforce, strengthened its offering in 2019 with data preparation, a natural language interface for asking questions, the addition of AI components, and support for server environments.

    The Tableau report can be found here.

    Source: BI-platform

  • Recognizing the mismatch between your analytics platform and your business

    It’s no secret that analytics confers a significant competitive advantage on companies that successfully implement BI platforms and drive key decision making with data. Yet, many organizations struggle in this endeavor. So, why aren’t more analytics and BI implementations delivering results? No one believes that you can simply install analytics and BI software and magic will occur. It is understood that a successful implementation requires two other ingredients: people (end users) and processes (collaboration). The magic only happens when you have alignment on all three elements: the right people, the right processes, and the right tools.

    But what if you knew you had the best and brightest on your staff? And what if they were hungry to solve the organization’s most pressing challenges with data? What if the reason the BI implementation was failing was not the users or their willingness to work together, but that they were using the wrong analytics platform? What if the solution chosen as the centerpiece of an analytics strategy was not fit for duty?

    Watch for the signs

    Consider the following scenario: You finally chose the analytics platform that you hoped would propel your organization to success. At first, everything seemed fine. You went through dozens of stakeholder reviews and witnessed countless vendor demos. You spoke to your executive team, IT leaders, and line-of-business managers. You eliminated the platforms that seemed too complicated for the task and the ones that didn’t quite have the horsepower for your enterprise needs. Plus, the CEO loved the attractive visualizations and report templates included out-of-the-box.

    But now you are halfway through the implementation, and you are starting to see the signs that things are not going entirely to plan. You have the feeling that nothing has really changed in the way people go about their work and that the business has not made any significant progress. You look around and begin to feel that the BI application you selected may not have been the best choice. The following are four signs that you may have chosen the wrong platform:

    1. The content tells you answers everyone already knows

    Everybody loves pie charts. And column charts. And scatter plots. Any visualization is fantastic. However, visualizations are simply representations of data, and they often tell you what you already know. For example, say you have a pie chart on a dashboard that shows your top 10 customers by geography. It will wow you at first, but the novelty wears thin when you realize you already knew your top accounts. What you’d like to do is ask the next questions: What’s the year-over-year change in customers? Why am I losing them or keeping them? Can I take my highest performing salespeople and see why they are successful compared to the unsuccessful ones? If your platform gives you attractive charts, but only offers a modicum of analytic depth, you’ll be left hungry for more.
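    The kind of follow-up analysis described above can be sketched in a few lines. The customer names and revenue figures below are purely illustrative; the point is that year-over-year change, churn and new accounts are questions a platform should let you ask directly:

```python
# Hypothetical revenue per customer for two consecutive years.
revenue_2015 = {"Acme": 120_000, "Globex": 80_000, "Initech": 50_000}
revenue_2016 = {"Acme": 150_000, "Globex": 60_000, "Hooli": 40_000}

def year_over_year(prev, curr):
    """Return per-customer change, flagging churned and new accounts."""
    changes = {}
    for name in set(prev) | set(curr):
        if name not in curr:
            changes[name] = "churned"
        elif name not in prev:
            changes[name] = "new"
        else:
            changes[name] = (curr[name] - prev[name]) / prev[name]
    return changes

print(year_over_year(revenue_2015, revenue_2016))
```

    A chart of the top 10 accounts cannot surface the churned or new entries; a query like this can, and a capable platform makes that the next click rather than an export.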

    2. People are not using it

    Imagine that an analyst has a beautiful chart based on data from your accounting system showing product sales over the trailing three quarters. But the chart doesn’t tell her about profitability in the next three months, or the reasons for profitability. It only gives her the obvious answers.

    So, she reviews a separate profit and loss report (usually a grid of figures), cuts and pastes the data into Excel, applies a forecast algorithm, and then plops that into a PowerPoint to share with the VP of sales. Worse yet, she extracts it out of the accounting tool as raw data because the data in the BI platform was both stale and slightly incorrect. In short, she uses anything other than your company’s expensive analytics tool to produce the real insights. If your employees are not using the platform to make decisions, it risks becoming shelfware.
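    The analyst's side-channel forecast step could be as simple as the sketch below (illustrative figures, with a naive moving average standing in for her Excel algorithm). The lesson is that this is exactly the kind of capability users will route around the platform to get:

```python
# Hypothetical quarterly profit figures (illustrative only).
quarterly_profit = [210.0, 225.0, 198.0, 240.0]

def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` points."""
    recent = series[-window:]
    return sum(recent) / len(recent)

next_quarter = moving_average_forecast(quarterly_profit)
print(round(next_quarter, 1))
```

    If producing a forecast this simple requires leaving the platform for Excel and PowerPoint, the platform is not carrying its weight.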

    A provider of a well-known BI platform likes to promote its high number of 'active touches'. What’s alarming is that the vendor considers an active touch to be once-a-month use. So, here are a few questions: Is a person actively communicating if they’re only checking their email once per month? Are you considered worldly if you only check the news once per month? Similarly, are your employees 'data-driven' if they’re only checking their analytics once per month? A successful implementation requires active use of data, and people should have a natural need to use it.

    3. Your tool is too simplistic to answer complex business questions; or, it’s too complicated for people to actually use

    You purchased the platform to accelerate speed-to-insight, not slow it down. However, if you find that your platform merely generates visualizations that don’t trigger meaningful action, then your analytics tool lacks sophistication. Data visualizations cannot make decisions for you; they simply provide representations of data. However, if the visualization is inherently unsophisticated, or simply restates the obvious, it’s just a pretty picture. And if the analytics tool doesn’t give you the ability to interrogate the data beyond the static (or lightly interactive) visualizations, or you need expert help to answer the question, that’s a problem. Your users require something more sophisticated if they’re going to use it. Difficult business questions require sophisticated tools.

    Many analytics platforms are rudimentary by design in an attempt to cater to the lowest common denominator (the casual user who only lightly consumes information). Yet they alienate the users that want more than just attractive visualizations. Some platforms cater to the 5% of users who demand high-powered analytics, the data scientists among the userbase. However, this yet again alienates the majority of users because the tool is too difficult or time-consuming to learn. Analytics is a continually evolving exercise. You need to be constantly thinking about the next question and the next question after that. And the next question cannot come at a tremendous cost; it cannot be a development project that constrains decisions.

    For an analytics implementation to truly work, it needs to cater to the 80% in the middle group of users. The ideal platform finds that middle ground. It provides a friendly UI that the average user can appreciate, but builds in sophisticated analytics so advanced users can dig deeper and answer the tough business questions. The art is activating the 80%, those who need more than nothing, but less than everything.

    4. The confidence in your insights and analysis is low

    Now, more than ever, users need data to inform their decisions, and they need to be able to trust the data. Desktop-based tools allow users to build their own content entirely untethered from the organization, regardless of whether the underlying data or analytics is accurate or not. This causes downstream problems and sows distrust in the integrity of the data. No one can act on information without confidence in the people, processes, and tools. Analytic platforms should provide governance capabilities to manage data from a centrally administered repository so that analysis can be reproducible and defensible. It should provide the means to trace the origins of the data, the techniques used to examine it, and the individuals who prepared the analysis.
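    As a rough sketch of the traceability described above, a governed platform might keep a provenance record like the following alongside each analysis. All dataset and field names here are hypothetical, not a real platform's schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class LineageRecord:
    """Minimal provenance metadata a governed platform might keep per analysis."""
    dataset: str
    source_system: str
    transformations: list
    prepared_by: str
    prepared_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

# Illustrative record: where the data came from, what was done to it, by whom.
record = LineageRecord(
    dataset="q3_sales_summary",
    source_system="erp.sales_orders",
    transformations=["dedupe", "currency_normalize", "aggregate_by_region"],
    prepared_by="j.smith",
)
print(record.dataset, record.transformations)
```

    With records like this in a central repository, an analysis can be reproduced and defended: the origin, the techniques and the preparer are all on file.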

    The dangers of picking the wrong analytics platform

    Often, data visualization platforms are purchased when 'analytics' is merely a check box. The platforms may provide the ability to build and show data representations, but they seldom go deep enough. A serious analytics platform lets you and your business users ask the next big question, and the next one after that. And the questions are never simple. If the answer is obvious, the question usually doesn’t need to be asked.

    If you made a purchasing decision with analytics as an afterthought, you will see the signs with time. It could mean that your efforts won’t deliver meaningful value or, worse yet, that your efforts will utterly fail. So, if you are serious about your analytics, then get a serious analytics platform.

    Author: Avi Perez

    Source: Pyramid Analytics

  • Zooming In On The Data Science Pipeline

    Finding the right data science tools is paramount if your team is to discover business insight. Here are five things to look for when you search for your next data science platform.

    If you are old enough to have grown up with the Looney Tunes cartoons, you probably remember watching clips of Wile E. Coyote chasing the Road Runner hoping to one day catch him. In each episode, the coyote would use increasingly outrageous tools to try to outwit his nemesis, only to fail disastrously each time. Without the right tools, he was forever doomed to failure.

    As a data scientist, do you constantly feel like you are bringing the wrong tool to the job as you strive to find and capture one of the most valuable, yet elusive, targets around -- business insight?

    As data science tools and platforms mature, organizations are constantly looking to find what their analysts need to be most effective in their jobs. The right tool could mean the difference between success and failure when put in the hands of capable data scientists.

    As you are trying to find the right data science tools for your team, here are five areas to consider in your evaluation.

    Algorithms

    The first thing to evaluate when looking at a potential data science platform is which algorithms it supports. In your assessment, you must understand what your business needs and which algorithms your data science organization will actually use.

    There are many algorithms available. Some are generic in nature and can be used in a broad set of scenarios. Others are very specific to unique problem sets. In the hands of the right data scientist, both types of algorithms can be extremely advantageous and valuable. The challenge is that the more algorithms available, the harder it is for the team to select the correct one to meet the current business problem. In your evaluation, ensure that the algorithms known to your team are available and are not crowded out by algorithms they will not use.

    In addition to the algorithms that are already pre-packaged as part of the data science platform, one area to look at is the extensibility of the platform. Can new algorithms be added? Are there marketplaces of new algorithms available for the platform? Can the team evolve the algorithms to meet their needs? Such extensibility will provide your team access to new and valuable algorithms as they become available and can become a critical success factor for your data science team.
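    Extensibility often boils down to some form of algorithm registry. The minimal sketch below (illustrative names only, not any vendor's API) shows the pattern: new algorithms can be registered and dispatched by name without changing the platform core:

```python
# A minimal algorithm registry: the "platform" dispatches by name, and teams
# can plug in new entries without touching its core.
ALGORITHMS = {}

def register(name):
    """Decorator that adds a function to the registry under `name`."""
    def wrap(fn):
        ALGORITHMS[name] = fn
        return fn
    return wrap

@register("mean_predictor")
def mean_predictor(values):
    # Toy "algorithm": predict the mean of the training values.
    return sum(values) / len(values)

@register("last_value")
def last_value(values):
    # Toy "algorithm": predict the most recent value.
    return values[-1]

print(sorted(ALGORITHMS))
print(ALGORITHMS["mean_predictor"]([2, 4, 6]))
```

    A platform with a genuine marketplace generalizes this idea: the registry is shared, versioned and curated, but the extension point is the same.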

    Data Preprocessing

    One of the main tasks your team will be performing is preparing the data. This involves cleaning the data, transforming it, splitting composite data into its component parts, and normalizing it. Different types of algorithms have limitations on what data they can consume and use. Your data science platform must be able to take available data and prepare it for input into your process.
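    A minimal sketch of these preparation steps, using illustrative values: remove missing entries, then min-max normalize so every value lands in [0, 1], a common requirement for algorithms that are sensitive to feature scale:

```python
# Illustrative raw feature column with missing entries.
raw = [42.0, None, 7.0, 19.0, None, 88.0]

def clean(values):
    """Drop missing entries."""
    return [v for v in values if v is not None]

def min_max_normalize(values):
    """Rescale values linearly into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

cleaned = clean(raw)
normalized = min_max_normalize(cleaned)
print(normalized)
```

    A capable platform performs steps like these declaratively and at scale, but these are the operations it must cover.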

    If you have text data in your environment, text processing can be a vital component to your data science platform. This can be as simple as parsing the text into individual words or it can involve more complex data, such as the meaning of these words, the topics associated with the text, or the sentiment of the text. If this is important to your data science program, make sure your platform has the right support for your use cases.
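    The simplest form of text processing is tokenizing text into words and counting term frequency, as in the illustrative sketch below; real platforms layer topic and sentiment analysis on top of a step like this:

```python
import re
from collections import Counter

# Illustrative input text.
text = "The platform parses the text, and the text yields terms."

# Tokenize: lowercase, then extract alphabetic word runs.
tokens = re.findall(r"[a-z']+", text.lower())
term_counts = Counter(tokens)
print(term_counts.most_common(3))
```

    If your use cases go beyond word counts to meaning, topics or sentiment, verify that the platform supports those steps natively rather than assuming they can be bolted on.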

    Model Training and Testing

    Once you have the right data in the right format and you have chosen the right algorithm or set of algorithms, the next step is to use these to define a model. When evaluating data science tools, understand what this process of model training and testing looks like and how it functions.

    In your evaluation, understand if this process is accomplished through a graphical user interface or through coding. With the training process, understand what parameters are available to measure the progress on the model creation and how to define stopping points. As an automated iterative process, you will want your team to define when that process is completed and when the results are good enough to move to the next step.
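    The notion of a stopping point can be made concrete with a toy training loop: the sketch below fits y = w * x by gradient descent and stops once the loss improvement falls below a tolerance. All values are illustrative; it is the shape of the "good enough" decision, not any platform's API:

```python
# Illustrative data, roughly y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

def train(xs, ys, lr=0.01, tol=1e-9, max_iter=10_000):
    """Fit y = w * x by gradient descent with an explicit stopping criterion."""
    w, prev_loss = 0.0, float("inf")
    for step in range(max_iter):
        loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
        if prev_loss - loss < tol:  # stopping point: improvement is negligible
            return w, step
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
        prev_loss = loss
    return w, max_iter

w, steps = train(xs, ys)
print(round(w, 3), steps)
```

    In a real platform these knobs appear as training parameters: the learning rate, the tolerance and the iteration cap are exactly the stopping controls your team should be able to see and set.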

    Look at the documentation output of the model development process. Does it give you enough traceability about what the resulting model is, how it works, and why it chose that model over other variations? These can be critical in selling your results to the business and are becoming a requirement from governments if the model has an impact on decisions where bias could be detrimental to people.

    Collaboration

    You might have a small team of data scientists or a large team with many different roles. Either way, it is important that your team members have an effective ecosystem where they can collaborate. This can involve collaboration on the cleaning of data, the development and testing of models, or on the deployment of these models into production.

    With the shortage of data science resources in the market, some companies are starting to look outside the walls of their organizations for citizen data scientists -- individuals outside of the organization who can collaborate with your teams to perform analysis of data and create models. As the extent of your team boundaries grows, your requirements for a platform that enables that collaboration increase as well. Ensure that the platform you select can be used across those boundaries.

    MLOps and Operationalization

    Data science in the laboratory is important, but for the results of their work to be beneficial to your business in a sustainable and repeatable way, the data preprocessing and model deployment have to be operationalized. Creating models and deploying models to a production environment require different skills. Sometimes you will have resources who span both disciplines, but as your team grows and becomes more complex, these roles will often be filled by different people.

    It is important to assess the platform’s ability to facilitate collaboration among data scientists, as well as between the data scientists and the MLOps engineers who are responsible for deployment and the ongoing sustainability of these models.

    Evaluate what mechanisms your platform provides to promote models from development to production, and what gates exist along the way to maintain system integrity.
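    Such a gate can be sketched as an explicit checklist a model must clear before promotion. Every threshold and field name below is hypothetical; the point is that the checks are named, visible and enforced:

```python
# Sketch of a promotion gate: a model moves from development to production
# only if it clears explicit quality and traceability checks.
def can_promote(model):
    checks = {
        "accuracy_ok": model.get("test_accuracy", 0.0) >= 0.85,
        "documented": bool(model.get("training_notes")),
        "reviewed": model.get("approved_by") is not None,
    }
    return all(checks.values()), checks

# Illustrative candidate model record.
candidate = {
    "name": "churn_model_v3",
    "test_accuracy": 0.91,
    "training_notes": "trained on 2023 Q1-Q3 snapshots",
    "approved_by": "mlops_lead",
}
ok, detail = can_promote(candidate)
print(ok, detail)
```

    When a platform exposes its gates this explicitly, a failed promotion tells you exactly which check blocked it rather than failing silently.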

    Evaluate Your Platform

    As you meet with potential vendors, make sure you know what your team needs to be successful and then use those criteria to evaluate the fit of the tool to the situation at hand. Using these five key areas of evaluation will provide you the basis for an effective set of conversations with your vendor. If you have the right tools on hand for your data scientists, hopefully you won’t find yourself like Wile E. Coyote -- getting burned in the end -- but rather capturing that elusive target: business value.

    Author: Troy Hiltbrand

    Source: TDWI
