14 items tagged "Data warehousing"

  • 'Progress in BI, but keep an eye on ROI'

    Business intelligence (BI) has already been named by Gartner as the top priority for the CIO in 2016. The Computable experts also predict that many major steps will be taken within BI. At the same time, managers must look back as well and think about their business model when deploying big data: how do you justify the investments in big data?

    Kurt de Koning, founder of Dutch Offshore ICT Management
    Gartner has put business intelligence/analytics at number one on the CIO priority list for 2016. In 2016, users will increasingly base their decisions on management information drawn from multiple sources, and part of those sources will consist of unstructured data. BI tools will therefore not only have to present information in a visually attractive way and offer a good user interface; when it comes to unlocking the data, the tools that stand out will be those able to create order and overview out of the many forms in which data appears.

    Laurent Koelink, senior interim BI professional at Insight BI
    Big data solutions alongside traditional BI
    The growth in the number of smart devices means organisations have ever more data to process. Because insight (in the broadest sense) will be one of the most important success factors of the future for many organisations that want to respond flexibly to market demand, they will also have to be able to analyse all these new forms of information. I do not see big data as a replacement for traditional BI solutions, but rather as a complement where the analytical processing of large volumes of (mostly unstructured) data is concerned.

    In-memory solutions
    Organisations increasingly run into the performance limitations of traditional database systems when large volumes of data have to be analysed ad hoc. Dedicated hybrid database/hardware solutions such as those from IBM, SAP and Teradata have always offered answers here, and they are now increasingly joined by in-memory solutions: partly because these are becoming more affordable and therefore more accessible, and partly because they are becoming available in the cloud, which keeps their costs firmly under control.

    Virtual data integration
    Where data is now often still physically consolidated in separate databases (data warehouses), this will, where possible, be replaced by smart metadata solutions that make time-consuming data extraction and integration processes unnecessary (with or without temporary physical, sometimes in-memory, storage).
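
    A minimal sketch of that idea, using Python's built-in sqlite3 as a stand-in for a data virtualisation layer (the table and column names are invented for illustration): instead of an ETL process copying data from two source systems into a separate warehouse table, a view exposes the combined result, and consumers query the view without knowing where the rows physically live.

        import sqlite3

        # In-memory database standing in for a virtual integration layer (illustrative only).
        con = sqlite3.connect(":memory:")
        con.executescript("""
            CREATE TABLE crm_customers (customer_id INTEGER, name TEXT);       -- 'source system' 1
            CREATE TABLE billing_invoices (customer_id INTEGER, amount REAL);  -- 'source system' 2
            INSERT INTO crm_customers VALUES (1, 'Acme'), (2, 'Globex');
            INSERT INTO billing_invoices VALUES (1, 120.0), (1, 80.0), (2, 45.5);

            -- The 'metadata solution': a view that integrates both sources on the fly,
            -- so no extraction/load step materialises the combined data.
            CREATE VIEW v_customer_revenue AS
            SELECT c.customer_id, c.name, SUM(i.amount) AS total_revenue
            FROM crm_customers c
            JOIN billing_invoices i ON i.customer_id = c.customer_id
            GROUP BY c.customer_id, c.name;
        """)

        # A report queries the view; it never references the underlying source tables.
        for row in con.execute("SELECT * FROM v_customer_revenue ORDER BY total_revenue DESC"):
            print(row)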

    Agile BI development
    Organisations are increasingly forced to move flexibly within and along with the chain in which they operate. This means that the insights used to steer the business (the BI solutions) must move flexibly with them, which demands a different way of working from the BI development teams. More and more, you therefore see methods such as Scrum being applied to BI development as well.

    BI for everyone
    Where BI has always been primarily the domain of organisations, you now see consumers making ever more frequent use of BI solutions. Well-known examples are insight into personal finances and energy consumption: the analysis of income and expenses in your bank's web portal or app, and the analysis of smart energy meter data, are telling cases. This will only increase, and become more integrated, in the coming years.

    Rein Mertens, head of analytical platform at SAS
    An important trend I see reaching maturity in 2016 is 'streaming analytics'. Today, big data is an inseparable part of our daily practice, and the amount of data generated per second keeps growing, in both the personal and the business sphere. Just look at your daily use of the internet, e-mails, tweets, blog posts and other social networks; and on the business side: customer interactions, purchases, customer service calls, promotions via SMS and social networks, et cetera.

    An increase in volume, variety and velocity of five exabytes every two days worldwide, and that figure even excludes data from sensors and other IoT devices. There is bound to be interesting information hidden in all this data, but how do you analyse it? One way is to make the data accessible and store it in a cost-effective big data platform. A technology such as Hadoop inevitably comes into play, after which you use data visualisation and advanced analytics to extract relationships and insights from that mountain of data. In effect you send the complex logic to the data, without having to pull all the data out of the Hadoop cluster.

    But what if you want to make smart decisions in real time based on these large volumes of data? You then have no time to store the data first and analyse it afterwards. Instead, you want to be able to assess, aggregate, track and analyse the data directly in-stream: detecting unusual transaction patterns, analysing sentiment in text, and acting on it immediately. In effect you send the data past the logic! Logic that sits in memory and has been built to do this very quickly and very smartly, and to store only the final results. Volumes of more than a hundred thousand transactions are no exception here, per second, mind you. Stream it, score it, store it. That is streaming analytics!
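
    The 'stream it, score it, store it' pattern can be sketched in a few lines of plain Python. This is only an illustration of the idea, not of any particular streaming engine, and the window size and alert threshold below are arbitrary: events are scored in memory as they arrive, suspicious ones trigger an immediate action, and only the outcomes are persisted.

        from collections import deque
        from statistics import mean, stdev

        WINDOW = 50          # number of recent transactions kept in memory
        recent = deque(maxlen=WINDOW)
        stored_scores = []   # stand-in for the "store it" step

        def score(amount):
            """Score a transaction against the recent in-memory window (z-score)."""
            if len(recent) < 10 or stdev(recent) == 0:
                return 0.0
            return abs(amount - mean(recent)) / stdev(recent)

        def handle(amount):
            s = score(amount)                  # score it, in-stream, before anything is written
            if s > 3.0:                        # unusual pattern: act immediately
                print(f"ALERT: transaction {amount:.2f} looks anomalous (score {s:.1f})")
            stored_scores.append((amount, s))  # store only the outcome
            recent.append(amount)

        # Simulated stream: mostly small purchases, one outlier.
        for amount in [12.5, 9.9, 14.2, 11.0, 13.3, 10.8, 12.1, 9.5, 11.7, 10.2, 950.0, 12.0]:
            handle(amount)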

    Minne Sluis, founder of Sluis Results
    From IoT (internet of things) to IoE (internet of everything)
    Everything is becoming digital and connected, even more so than we could imagine only a short while ago. The application of big data methods and techniques will therefore take an even greater flight.

    The call for adequate data governance will grow
    Although the new world revolves around letting go, giving trust and freedom, and co-creation, the call for manageability will nevertheless increase. Provided it is approached primarily from a facilitating role and ensures more consistency and reliability, that is by no means a bad thing.

    The business impact of big data & data science keeps growing
    The impact of big data & data science in reinventing business processes, services and products, digitising them far more deeply (and making them more intelligent), or in some cases eliminating them altogether, will continue.

    The consumerisation of analytics continues
    Strongly improved and truly intuitive visualisations, underpinned by good meta-models and thus by data governance, drive this development. Democratisation and independence from third parties (other than services deliberately taken from the cloud) are thereby increasingly becoming reality.

    Big data & data science will break through fully in the non-profit sector
    The subtler objectives of the non-profit sector, such as improving quality, (patient/client/citizen) safety, punctuality and accessibility, call for big data applications. For that subtlety you need more good information, and therefore data, delivered faster and with more detail and nuance than what typically still comes out of the more traditional BI environments. If the non-profit sector manages to translate the profit sector's necessary focus on 'profit' and 'revenue improvement' to its own situation, successful big data initiatives are just around the corner! Mind you, this prediction naturally applies in full to healthcare as well.

    Hans Geurtsen, business intelligence architect data solutions at Info Support
    From big data to polyglot persistence
    In 2016 we will no longer talk about big data, but simply about data: data of all kinds and in all volumes, demanding different kinds of storage: polyglot persistence. Programmers have known the term polyglot for a long time; an application anno 2015 is often already written in multiple languages. But on the storage side of an application, too, relational is no longer the only game in town. We will apply more and more other kinds of databases in our data solutions, such as graph databases, document databases, and so on. Alongside specialists who know everything about one kind of database, you then also need generalists who know exactly which database is suited to what.
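
    As a toy illustration of polyglot persistence (the routing rule and store names below are invented): one application keeps well-structured rows in a relational store and irregular documents in a document-style store, with a thin router deciding which engine each record goes to.

        import json
        import sqlite3

        # Relational store for well-structured rows.
        relational = sqlite3.connect(":memory:")
        relational.execute("CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)")

        # Minimal document store: keyed JSON blobs, standing in for a document database.
        documents = {}

        def save(record):
            """Route a record to the store that fits its shape."""
            if record.keys() == {"order_id", "customer", "amount"}:
                relational.execute("INSERT INTO orders VALUES (?, ?, ?)",
                                   (record["order_id"], record["customer"], record["amount"]))
            else:
                # Irregular or nested data goes to the document side, schema-on-read.
                documents[f"doc:{len(documents)}"] = json.dumps(record)

        save({"order_id": 1, "customer": "Acme", "amount": 99.0})
        save({"customer": "Globex", "review": {"stars": 4, "text": "great service"}})

        print(relational.execute("SELECT COUNT(*) FROM orders").fetchone())
        print(documents)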

    The breakthrough of the modern data warehouse
    'A polyglot is someone with a high degree of proficiency in several languages', according to Wikipedia. That refers to spoken languages, but you come across the term more and more in the IT field as well: an application coded in multiple programming languages that stores its data in multiple kinds of databases. On the business intelligence side, too, a single language and a single environment no longer suffice. The days of the traditional data warehouse with an ETL pipeline, a central data warehouse and one or two BI tools are numbered. We will see new kinds of data platforms in which all sorts of data from all sorts of sources become accessible to information workers and data scientists using all sorts of tools.

    Business intelligence in de cloud
    While Dutch companies in particular are still reluctant when it comes to the cloud, you can see the move towards the cloud slowly but surely getting underway. More and more companies realise that security, in particular, is often better arranged in the cloud than they could arrange it themselves. Cloud vendors are also doing more and more to attract European companies to their cloud; Microsoft's new data centres in Germany, where not Microsoft but Deutsche Telekom controls access to customer data, are an example. 2016 may well become the year in which the cloud truly breaks through and in which we will also see more and more complete BI solutions in the cloud in the Netherlands.

    Huub Hillege, principal data(base) management consultant at Info-Shunt
    Big data
    The big data hype will certainly continue into 2016, but success at companies is not guaranteed in advance. Companies and recent graduates keep winding each other up about its application. It is hard to understand that everyone wants to start unlocking Facebook, Twitter and similar data while the data in those systems is highly unreliable. At every conference I ask where the business case is, including costs and benefits, that justifies all the investments around big data. Even BI managers at companies encourage people to simply get started; in effect, to look back at the data you have or can obtain and investigate whether you can find something you might be able to use. To me this is the biggest pitfall, just as it was with the start of data warehouses in 1992. In the current circumstances companies have limited money; frugality is called for.

    The analysis of big data must be aimed at the future, based on a clear business strategy and a cost/benefit analysis: which data do I need to support the future? Determine:

    • Where do I want to go?
    • Which customer segments do I want to add?
    • Are we going to do more cross-selling (more products) with our current customers?
    • Are we going to take steps to retain our customers (churn)?

    Once these questions have been recorded and prioritised, an analysis must follow:

    • Which data/sources do we need for this?
    • Do we have the data ourselves, are there 'gaps', or do we need to buy external data?

    Database management system
    More and more database management system (DBMS) vendors are adding support for big data solutions, for example the Oracle/Sun Big Data Appliance and Teradata/Teradata Aster with support for Hadoop. In the long term the DBMS solutions will dominate the field; big data software solutions without a DBMS will ultimately lose out.

    Fewer and fewer people, including today's DBAs, understand how things work technically deep inside a database/DBMS. Increasingly, physical databases are generated from logical data modelling tools, and formal physical database design steps and reports are skipped. Developers who use ETL tools such as Informatica, AbInitio, Infosphere, Pentaho et cetera likewise end up generating SQL scripts that move data from sources to operational data stores and/or the data warehouse.

    BI tools such as Microstrategy, Business Objects, Tableau et cetera also generate SQL statements.
    Such tools are usually developed initially for one particular DBMS, and people quickly assume they are therefore applicable to every DBMS. Too little use is then made of the specific physical characteristics of each DBMS.

    The absence of real expertise then causes performance problems that are discovered at far too late a stage. In recent years, by changing database designs/indexes and restructuring complex generated SQL scripts, I have been able to bring ETL processes down from six to eight hours to one minute, and queries that ran for 45 to 48 hours down to 35 to 40 minutes.
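
    Results of that order obviously depend on the workload, but the mechanism is easy to demonstrate. A small, hedged sketch with Python's sqlite3 and invented table names: the same lookup query is timed against a full table scan and then against an index, showing how much a single physical design change can move.

        import sqlite3
        import time

        con = sqlite3.connect(":memory:")
        con.execute("CREATE TABLE sales (sale_id INTEGER, customer_id INTEGER, amount REAL)")
        con.executemany("INSERT INTO sales VALUES (?, ?, ?)",
                        ((i, i % 5000, float(i % 97)) for i in range(500_000)))
        con.commit()

        def timed(label):
            start = time.perf_counter()
            for cid in range(0, 5000, 50):   # 100 point lookups
                con.execute("SELECT SUM(amount) FROM sales WHERE customer_id = ?", (cid,)).fetchone()
            print(label, round(time.perf_counter() - start, 3), "seconds")

        timed("full scans (no index):")
        con.execute("CREATE INDEX ix_sales_customer ON sales(customer_id)")  # the physical design change
        timed("index lookups:")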

    Advice
    The data you need will keep on growing. Forget buying all kinds of hyped software packages. Make sure you bring very strong technical database/DBMS expertise in house to build the foundation properly from the bottom up, on the strength of the DBMS you already have. That frees up time and money (you can manage with smaller systems because the foundation is sound) to select the right tools after a solid business case and proofs of concept.

  • Be careful when implementing data warehouse automation

    Automation can be a huge help, but automating concepts before you understand them is a recipe for disaster.

    The concept of devops has taken root in the world of business intelligence and analytics.

    The overall concept of devops has been around for a while in traditional IT departments as they sought to expand and refine the way that they implemented software and applications. The core of devops in the world of analytics is called DWA (data warehouse automation), which links together the design and implementation of analytical environments into repeatable processes and should lead to increased data warehouse and data mart quality, as well as decreased time to implement those environments.

    Unfortunately, for several reasons the concept of data warehouse automation is not a silver bullet when it comes to the implementation of analytical environments.

    One reason is that you really shouldn't automate concepts before you fully understand them. As the saying goes, don't put your problems on roller skates. Automating a broken process only means that you make mistakes faster. Now, while I often advocate the concept of failing faster to find the best solution to an analytical problem, I don't really agree with the concept of provisioning flawed database structures very quickly only to rebuild them later.

    Another issue with applying devops to analytical practices is that the software development community has a 10-15 year head start on the analytical community when it comes to productizing elements of their craft.

    Software developers have spent years learning how to best encapsulate their designs into object-oriented design, package that knowledge, and put it in libraries for use by other parts of the organization, or even by other organizations. Unfortunately, the design, architecture, and implementation of analytical components, such as data models, dashboard design, and database administration, are viewed as an art and still experience cultural resistance to the concept that a process can repeat the artistry of a data model or a dashboard design.

    Finally, there is the myth that data warehouse automation or any devops practice can replace the true thought processes that go into the design of an analytical environment.

    With the right processes and cultural buy-in, DWA will provide an organization with the ability to leverage their technical teams and improve the implementation time of changes in analytical environments. However, without that level of discipline to standardize the right components and embrace artistry on the tricky bits, organizations will take the concept of data warehouse automation and fail miserably in their efforts to automate.

    The following is good advice for any DWA practice:

    • Use the right design process and engage the analytical implementation teams. Without this level of forethought and cultural buy-in, the process becomes more of an issue than a benefit and actually takes longer to implement than a traditional approach.
    • Find the right technologies to use. There are DWA platforms available to use, but there are also toolsets such as scripting and development environments that can provide much of the implementation value of a data warehouse automation solution. The right environment for your team's skills and budget will go a long way to either validating a DWA practice or showing its limitations.
    • Iterate and improve. Just as DWA is designed to iterate the development of analytical environments, data warehouse automation practices should have the same level of iteration. Start small. Perfect the implementation. Expand the scope. Repeat.

    Source: Infoworld

  • Business Intelligence in 3PL: Mining the Value of Data

    In today’s business world, “information” is a renewable resource and virtually a product in itself. Business intelligence technology enables businesses to capture historical, current and predictive views of their operations, incorporating such functions as reporting, real-time analytics, data and process mining, performance management, predictive analytics, and more. Thus, information in its various forms and locations possesses genuine inherent value.
     
    In the real world of warehousing, the availability of detailed, up-to-the-minute information on virtually every item in the operators’ custody, from inbound dock to delivery site, leads to greater efficiency in every area it touches. Logic suggests that greater profitability ensues.
     
    Three areas of 3PL operations seem to benefit most from savings opportunities identified through business intelligence solutions: labor, inventory, and analytics.
    In the first case, business intelligence tools can help determine the best use of the workforce, monitoring its activity in order to assure maximum effective deployment. The result: potentially major jumps in efficiency, dramatic reductions in downtime, and healthy increases in productivity and billable labor.
     
    In terms of inventory management, the metrics obtainable through business intelligence can stem inventory inaccuracies that would have resulted in thousands of dollars in annual losses, while also reducing write-offs.
     
    Analytics through business intelligence tools can also accelerate the availability of information, as well as provide the optimal means of presentation relative to the type of user. One such example is the tracking of real-time status of work load by room or warehouse areas; supervisors can leverage real-time data to re-assign resources to where they are needed in order to balance workloads and meet shipping times. A well-conceived business intelligence tool can locate and report on a single item within seconds and a couple of clicks.
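
    A small, assumed example of the kind of query that sits behind such a workload view (the schema is invented): open pick tasks grouped by warehouse area, so a supervisor can see at a glance where resources are needed before shipping deadlines slip.

        import sqlite3

        con = sqlite3.connect(":memory:")
        con.executescript("""
            CREATE TABLE pick_tasks (task_id INTEGER, area TEXT, status TEXT, ship_by TEXT);
            INSERT INTO pick_tasks VALUES
                (1, 'Cold room', 'open', '2016-12-06 14:00'),
                (2, 'Cold room', 'open', '2016-12-06 15:00'),
                (3, 'Dry goods', 'done', '2016-12-06 14:00'),
                (4, 'Dry goods', 'open', '2016-12-06 16:00'),
                (5, 'Dock A',    'open', '2016-12-06 13:30');
        """)

        # Open workload per area, most urgent deadline first: the numbers a supervisor
        # would use to re-assign people and protect shipping times.
        rows = con.execute("""
            SELECT area, COUNT(*) AS open_tasks, MIN(ship_by) AS next_deadline
            FROM pick_tasks
            WHERE status = 'open'
            GROUP BY area
            ORDER BY next_deadline
        """).fetchall()
        for area, open_tasks, next_deadline in rows:
            print(f"{area:10s} open: {open_tasks}  next deadline: {next_deadline}")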
     
    Extending the Value
    The value of business intelligence tools is definitely not confined to the product storage areas.
     
    With automatically analyzed information available in a dashboard presentation, users – whether in the office or on the warehouse floor – can view the results of their queries/searches in a variety of selectable formats, choosing the presentation based on its usefulness for a given purpose. Examples:
    • Status checks can help identify operational choke points, such as if/when/where an order has been held up too long; if carrier wait-times are too long; and/or if certain employees have been inactive for too long.
    • Order fulfillment dashboards can monitor orders as they progress through the picking, staging and loading processes, while also identifying problem areas in case of stalled processes.
    • Supervisors walking the floor with handheld devices can both encourage team performance and, at the same time, help assure efficient dock-side activity. Office and operations management are able to monitor key metrics in real-time, as well as track budget projections against actual performance data.
    • Customer service personnel can call up business intelligence information to assure that service levels are being maintained or, if not, institute measures to restore them.
    • And beyond the warehouse walls, sales representatives in the field can access mined and interpreted data via mobile devices in order to provide their customers with detailed information on such matters as order fill rates, on-time shipments, sales and order volumes, inventory turnover, and more.
    Thus, well-designed business intelligence tools not only can assemble and process both structured and unstructured information from sources across the logistics enterprise, but can deliver it “intelligently” – that is, optimized for the person(s) consuming it. These might include frontline operators (warehouse and clerical personnel), front line management (supervisors and managers), and executives.
     
    The Power of Necessity
    Chris Brennan, Director of Innovation at Halls Warehouse Corp., South Plainfield N.J., deals with all of these issues as he helps manage the information environment for the company’s eight facilities. Moreover, as president of the HighJump 3PL User Group, he strives to foster collective industry efforts to cope with the trends and issues of the information age as it applies to warehousing and distribution.
     
    “Even as little as 25 years ago, business intelligence was a completely different art,” Brennan has noted. “The tools of the trade were essentially networks of relationships through which members kept each other apprised of trends and happenings. Still today, the power of mutual benefit drives information flow, but now the enormous volume of data available to provide intelligence and drive decision making forces the question: Where do I begin?”
     
    Brennan has taken a leading role in answering his own question, drawing on the experience and insights of peers as well as the support of HighJump’s Enterprise 3PL division to bring Big Data down to size:
     
    “Business intelligence isn’t just about gathering the data,” he noted, “it’s about getting a group of people with varying levels of background and comfort to understand the data and act upon it. Some managers can glance at a dashboard and glean everything they need to know, but others may recoil at a large amount of data. An ideal BI solution has to relay information to a diverse group of people and present challenges for them to think through.”
     
    Source: logisticviewpoints.com, December 6, 2016
  • Data governance: using factual data to form subjective judgments

    Data warehouses were born of the finance and regulatory age. When you peel away the buzz words, the principal goal of this initial phase of business intelligence was the certification of truth. Warehouses helped to close the books and analyze results. Regulations like Dodd-Frank wanted to make sure that you took special care to certify the accuracy of financial results, Basel wanted certainty around capital liquidity, and so on. Companies would spend months or years developing common metrics, KPIs, and descriptions so that a warehouse would accurately represent this truth.

    In our professional lives, many items still require this certainty. There can only be one reported quarterly earnings figure. There can only be one number of beds in a hospital or factories available for manufacturing. However, an increasing number of questions do not have this kind of tidy right and wrong answer. Consider the following:

    • Who are our best customers?
    • Is that loan risky?
    • Who are our most effective employees?
    • Should I be concerned about the latest interest rate hike?

    Words like best, risky, and effective are subjective by their very natures. Jordon Morrow (Qlik) writes and speaks extensively about the importance of data literacy and uses a phrase that has always felt intriguing: data literacy requires the ability to argue with data. This is key when the very nature of what we are evaluating does not have neat, tidy truths.

    Let’s take an example. A retail company is trying to liquidate its winter inventory and has asked three people to identify the best target list for an e-mail campaign.

    • John downloads last year’s campaign results and collects the names and e-mail addresses of the 2% that responded to the campaign last year with an order (a minimal sketch of this filter appears after the list).
    • Jennifer thinks about the problem differently. She looks through sales records of anyone who has bought winter merchandise in the past 5 years during the month of March who had more than a 25% discount on the merchandise. She notices that these people often come to the web site to learn about sales before purchasing. Her reasoning is that a certain type of person who likes discounts and winter clothes is the target.
    • Juan takes yet another approach. He looks at social media feeds of brand influencers. He notices that there are 100 people with 1 million or more followers and that social media posts by these people about product sales traditionally cause a 1% spike in sales for the day as their followers flock to the stores. This is his target list.
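
    None of the three lists requires exotic tooling; what differs is the reasoning behind them. As a small, hypothetical sketch of the first approach (the records and field names are invented), John's list is a simple filter over last year's campaign results:

        # Last year's campaign results (hypothetical records).
        campaign_results = [
            {"email": "a@example.com", "responded": True,  "ordered": True},
            {"email": "b@example.com", "responded": True,  "ordered": False},
            {"email": "c@example.com", "responded": False, "ordered": False},
            {"email": "d@example.com", "responded": True,  "ordered": True},
        ]

        # John's target list: the small share of recipients who responded with an order.
        johns_list = [r["email"] for r in campaign_results if r["responded"] and r["ordered"]]
        print(johns_list)   # ['a@example.com', 'd@example.com']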

    So who has the right approach? This is where the ability to argue with data becomes critical. In theory, each of these people should feel confident developing a sales forecast on his or her model. They should understand the metric that they are trying to drive and they should be able to experiment with different ideas to drive a better outcome and confidently state their case.

    While this feels intuitive, enterprise processes and technologies are rarely set up to support this kind of vibrant analytics effort. This kind of analytics often starts with the phrase “I wonder if…”, while conventional IT and data governance frameworks are generally not able to deal with questions that a person did not know they had six months before. And yet, “I wonder if” relies upon data that may have been unforeseen. In fact, it usually requires a connection of data sets that have often never been connected before to drive break-out thinking. Data science is about identifying those variables and metrics that might be better predictors of performance. This relies on the analysis of new, potentially unexpected data sets like social media followers, campaign results, web clicks, sales behavior, etc. Each of these items might be important for an analysis, but in a world in which it is unclear what is and is not important, how can a governance organization anticipate and apply the same dimensions of quality to all of the hundreds of data sets that people might use? And how can it apply the same kind of rigor to data quality standards for the hundreds of thousands of data elements available, as opposed to the 100-300 critical data elements?

    They can’t. And that’s why we need to re-evaluate the nature of data governance for different kinds of analytics.

    Author: Joe Dos Santos

    Source: Qlik

  • Data warehouse automation: what you need to know

    In the dark about data warehousing? You’re not alone

    You would be forgiven for not knowing data warehousing exists, let alone that it’s been automated. It’s not a topic that gets a lot of coverage in the UK, unlike in the USA and Europe. It might be that Business Intelligence and Big Data Analytics are topics that have more ‘curb’ appeal. But, without data warehousing, data analytics would not generate the quality of business intelligence that organisations rely on. So what is a data warehouse and why did it need to be automated?

    Here’s what you need to know about data warehouse automation.

    In its most basic form a data warehouse is a repository where all your data is put so that it can be analysed for business insight, and most businesses have one. Your customers will most likely have one because they need the kind of insight data analysis provides. Business Insight or Intelligence (BI) helps the business make accurate decisions, stay competitive and ultimately profitable.

    In retail, for example, the accurate and timely reporting of sales, inventory, discounts and profit is critical to getting a consolidated view of the business at all levels and at all locations. In addition, analysing customer data can inform businesses which promotions work, which products sell, which locations work best, what loyalty vouchers and schemes are working, and which are not. Knowing customer demographics can help retailers to cross or upsell items. By analysing customer data companies can tailor products to the right specification, at the right time thereby improving customer relations and ultimately increasing customer retention.

    Analysing all the data

    But this is only part of the picture. The best intelligence will come from an analysis of all the data the company has. There are several places where companies get data. They usually have their own internal systems that hold finance data, HR data, sales data, and other data specific to their business. In addition, most of your customers will now also collect data from the internet and social media (Big Data), with new data coming in from sensors, GPS and smart devices (IoT data). The data warehouse can pull any kind of data from any source into one single place for analysis. A lack of cross-pollination across the business can lead to missed opportunities and a limited corporate view.

    Previously, getting the data from its source (internal or external) into the data warehouse involved writing code by hand. This was monotonous, slow and laborious. It meant that the data warehouse took months to build, and was then rigidly stuck to the coding (and therefore design) it had been built with. Any changes that needed to be made were equally slow and time-consuming, creating frustration for both IT and the business. For the business, the data often took so long to produce that it was out of date by the time they had it.

    Automation

    Things have moved on since the days of the traditional data warehouse, and now the design and build of a data warehouse is automated, optimised and wizard-driven. It means that the coding is generated automatically. With automation, data is available at the push of a button. Your customers don’t have to be IT experts to create reports, and employees don’t need to ask head office if they want information on a particular product line. Even more importantly, when you automate the data warehouse lifecycle you make it agile, so as your business grows and changes the warehouse can adapt. As we all know, it’s a false economy to invest in a short-term solution which, in a few years, will not be fit for purpose. Equally, it’s no good paying for excellent business intelligence tools and fancy reporting dashboards if the data underneath is not fully accessible, accurate and flexible.
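
    The essence of that automation is code generation: the design is captured as metadata, and the repetitive DDL and load code is produced from it. A deliberately tiny, hypothetical sketch of the idea (the metadata format and naming conventions are invented, not those of any particular DWA product):

        # Hypothetical table design captured as metadata rather than hand-written DDL.
        design = {
            "name": "dim_customer",
            "columns": [("customer_key", "INTEGER"), ("name", "TEXT"), ("segment", "TEXT")],
        }

        def generate_ddl(table):
            """Generate CREATE TABLE and staging-load statements from the metadata."""
            cols = ",\n    ".join(f"{name} {dtype}" for name, dtype in table["columns"])
            create = f"CREATE TABLE {table['name']} (\n    {cols}\n);"
            col_list = ", ".join(name for name, _ in table["columns"])
            load = (f"INSERT INTO {table['name']} ({col_list})\n"
                    f"SELECT {col_list} FROM stg_{table['name']};")
            return create + "\n\n" + load

        print(generate_ddl(design))

    Change the metadata and regenerate, and the warehouse structure follows; that repeatability, rather than any single wizard screen, is what makes the lifecycle agile.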

    What does this mean for the channel?

    So now you know the importance of a data warehouse for data analytics, and how automation has brought data warehousing into the 21st century. So, what next? What does this mean for the channel?

    Not everyone in the channel will be interested in automation. Faster, more efficient projects might not look like they will generate the immediate profit margins or revenue of longer, slower ones. But innovative channel partners will be able to see that there are two clear advantages for them. One is that the projects, whilst shorter, never really end, which means there is a consistent stream of income. Secondly, by knowing about and offering your clients data warehouse automation, the channel partner shows their expertise and consultancy abilities.

    The simple fact is that most companies have a data warehouse of some kind, from the giant supermarkets such as Tesco and Sainsbury to smaller businesses like David Lloyd or Jersey Electricity. You don’t want to be the channel partner who didn’t know about or didn’t recommend the best, most efficient solution for your client. This could impact more than just the immediate sale. By educating your customers about the benefits of data warehouse automation you will bring a wealth of efficiencies to their company, and most likely a wealth of future recommendations to yours.

    Source: ChannelPro

  • Decision making by smart technology

    That is the name of the conference on Business Intelligence & Data Warehousing that Heliview is organising on Tuesday 27 January 2015 in ’s-Hertogenbosch. According to many sources, business intelligence remains on the priority list of Dutch organisations. The amount of structured and unstructured data is growing at record speed, and this data is of inestimable value to organisations. Business intelligence enables organisations to process data intelligently into the right information, and thereby to save time and money and stay ahead of the competition. Smart organisations are increasingly also successful organisations.

    The Heliview conference (chaired by BI-kring initiator Egbert Philips) centres on the classic BI triangle. Speakers such as Rick van der Lans, Arent van ‘t Spijker and many others discuss how organisations can make better decisions by deploying current technological possibilities for data and information processing in a smart, tailored way. On the technology side, 27 January focuses on social BI, mobile BI, business analytics and data warehousing in the cloud.

    Read more about the conference here

     

  • Three components of Agile BI at Alliander

    The utility company Alliander manages energy networks that distribute gas and electricity across a large part of the Netherlands and has around 3.3 million customers. Alliander wants to respond to the unpredictability of both the energy market and technological developments and to be a ‘data-driven’ network operator.

    Having state-of-the-art BI & analytics solutions at your disposal and still having to deliver data dumps for Excel reports? This is not something an organisation wants, but it is often the reality, and let’s be honest: in your organisation too. People find that BI projects take too long, so ‘Excelerados’ get cobbled together. At Alliander we face this as well, and this manual alternative is obviously undesirable. The presence within Alliander of expensive BI solutions in a rigid, inflexible architecture with long development times is equally unwelcome. We have therefore applied three components when developing BI projects to gain more agility: Scrum, a data provisioning layer equivalent to the logical data warehouse, and data profiling.

    Within Alliander we recognise at least four problem areas in the old way of working with an outdated architecture: ‘Excelerados’, expensive BI solutions, inflexibility and long development times. An agile product design has therefore proved essential for Alliander to realise our ambition and take on these challenges. The agile product came about using three techniques: Alliander’s Data Provisioning Layer (DPL) as a logical data warehouse; responding flexibly to changing information needs with Agile Data Modeling (the so-called account-based model); and direct feedback from the end user with the help of data profiling and Scrum. We want to explain this in a series of three blogs; this is the first.

    Data Provisioning Layer as a logical data warehouse
    The heart of agile product development is the architecture you work with. At Alliander, the agility of the architecture takes the shape of a logical data warehouse. The Data Provisioning Layer (DPL) is the part of the architecture that is deployed as that logical data warehouse.

    The Data Provisioning Layer makes data available from the various traditional sources (including existing data warehouses) that we know within Alliander, but also data from outside Alliander, for example data from the Centrale Aansluit Register (CAR) or from the Kamer van Koophandel (KvK, the Dutch Chamber of Commerce). The DPL also makes real-time data available, for example from the electricity or gas network, so that it can be combined with other data, such as measurements from a substation (telemetry).

    Because the DPL works with views, with virtual information models underneath, it makes no difference to data consumers where the data comes from. This allows us, for example, to combine transaction data from our ERP system with geographical data from our GIS systems, or with real-time data from the networks, very quickly.

    A dashboard or other application is based on a view from the DPL, where the data may come directly from an operational data store or from the source system. If, for example, a dimensional data model is needed for performance reasons, the existing information model remains intact, and with it the application built for the user.
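
    A minimal, hedged sketch of that decoupling, again with Python's sqlite3 and invented names: the consumer-facing view keeps its name and columns while the implementation underneath is swapped from a direct query on the source to a pre-built dimensional table.

        import sqlite3

        con = sqlite3.connect(":memory:")
        con.executescript("""
            CREATE TABLE erp_transactions (asset_id INTEGER, cost REAL);   -- source system
            INSERT INTO erp_transactions VALUES (1, 100.0), (1, 50.0), (2, 75.0);

            -- Version 1 of the information model: a view straight onto the source.
            CREATE VIEW v_asset_cost AS
            SELECT asset_id, SUM(cost) AS total_cost FROM erp_transactions GROUP BY asset_id;
        """)
        print(con.execute("SELECT * FROM v_asset_cost").fetchall())

        # Later, a dimensional table is introduced for performance; the view is redefined,
        # but its name and columns, and thus every dashboard built on it, stay the same.
        con.executescript("""
            CREATE TABLE dm_asset_cost (asset_id INTEGER, total_cost REAL);
            INSERT INTO dm_asset_cost SELECT asset_id, SUM(cost) FROM erp_transactions GROUP BY asset_id;
            DROP VIEW v_asset_cost;
            CREATE VIEW v_asset_cost AS SELECT asset_id, total_cost FROM dm_asset_cost;
        """)
        print(con.execute("SELECT * FROM v_asset_cost").fetchall())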

    Conclusions
    By deploying the virtualised DPL according to the concept of the logical data warehouse, we have been able to achieve the following benefits:

    • short delivery times for reports, by decoupling data sources from data consumers;
    • combining external and internal sources;
    • always having up-to-date data at our disposal;
    • access to big data sources.

    The virtualised layer, indicated as DPL in the diagram above, enables faster integration of the different sources, which leads to better results.

    In the second part of this blog we will discuss the next technique applied: Agile Data Modeling.


    Hüseyin Kara is Senior BI Consultant at Alliander.
    Sam Geurts is Scrum Master Data & Inzicht at Alliander.

  • Five factors to help select the right data warehouse product

    How big is your company, and what resources does it have? What are your performance needs? Answering these questions and others can help you select the right data warehouse platform.

    Once you've decided to implement a new data warehouse, or expand an existing one, you'll want to ensure that you choose the technology that's right for your organization. This can be challenging, as there are many data warehouse platforms and vendors to consider.

    Long-time data warehouse users generally have a relational database management system (RDBMS) such as IBM DB2, Oracle or SQL Server. It makes sense for these companies to expand their data warehouses by continuing to use their existing platforms. Each of these platforms offers updated features and add-on functionality (see the sidebar, "What if you already have a data warehouse?").

    But the decision is more complicated for first-time users, as all data warehousing platform options are available to them. They can opt to use a traditional DBMS, an analytic DBMS, a data warehouse appliance or a cloud data warehouse. The following factors may help make the decision process easier.

    1. How large is your company?

    Larger companies looking to deploy data warehouse systems generally have more resources, including financial and staffing, which translates to more technology options. It can make sense for these companies to implement multiple data warehouse platforms, such as an RDBMS coupled with an analytical DBMS such as Hewlett Packard Enterprise (HPE) Vertica or SAP IQ. Traditional queries can be processed by the RDBMS, while online analytical processing (OLAP) and nontraditional queries can be processed by the analytical DBMS. Nontraditional queries aren't usually found in transactional applications typified by quick lookups. This could be a document-based query or a free-form search, such as those done on Web search sites like Google and Bing.

    For example, HPE Vertica offers Machine Data Log Text Search, which helps users collect and index large log file data sets. The product's enhanced SQL analytics functions deliver in-depth capabilities for OLAP, geospatial and sentiment analysis. An organization might also consider SAP IQ for in-depth OLAP as a near-real-time service to SAP HANA data.
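
    To make 'in-depth SQL analytics' concrete, here is a small window-function query; the schema is invented, and sqlite3 merely stands in for a columnar analytical engine (window functions require SQLite 3.25 or newer, bundled with recent Python releases).

        import sqlite3

        con = sqlite3.connect(":memory:")
        con.executescript("""
            CREATE TABLE sales (region TEXT, month TEXT, revenue REAL);
            INSERT INTO sales VALUES
                ('North', '2016-01', 100), ('North', '2016-02', 120), ('North', '2016-03', 90),
                ('South', '2016-01', 80),  ('South', '2016-02', 85),  ('South', '2016-03', 140);
        """)

        # OLAP-style query: per-region running revenue and share of the regional total,
        # computed in one pass with window functions.
        query = """
            SELECT region, month, revenue,
                   SUM(revenue) OVER (PARTITION BY region ORDER BY month)  AS running_total,
                   ROUND(revenue * 100.0 /
                         SUM(revenue) OVER (PARTITION BY region), 1)       AS pct_of_region
            FROM sales
            ORDER BY region, month
        """
        for row in con.execute(query):
            print(row)

    An analytical DBMS would run the same shape of statement in parallel over billions of rows; the query form, not the engine, is the point here.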

    Teradata Corp.'s Active Enterprise Data Warehouse (EDW) platform is another viable option for large enterprises. Active EDW is a database appliance designed to support data warehousing that's built on a massively parallel processing architecture. The platform combines relational and columnar capabilities, along with limited NoSQL capabilities. Teradata Active EDW can be deployed on-premises or in the cloud, either directly from Teradata or through Amazon Web Services.

    For midsize organizations, where a mixture of flexibility and simplicity is important, reducing the number of vendors is a good idea. That means looking for suppliers that offer compatible technology across different platforms. For example, Microsoft, IBM and Oracle all have significant software portfolios that can help minimize the number of other vendors an organization might need. Hybrid transaction/analytical processing (HTAP) capabilities that enable a single DBMS to run both transaction processing and analytics applications should also appeal to midsize organizations.

    Smaller organizations and those with minimal IT support should consider a data warehouse appliance or a cloud-based data warehouse as a service (DWaaS) offering. Both options make it easier to get up and running, and minimize the administration work needed to keep a data warehouse functional. In the cloud, for example, Amazon Redshift and IBM dashDB offer fully managed data warehousing services that can lower up-front implementation costs and ongoing management expenses.

    Regardless of company size, it can make sense for an organization to work with a vendor or product that it has experience using. For example, companies using Oracle Database might consider the Oracle Exadata Database Machine, Oracle's data warehouse appliance. Exadata runs Oracle Database 12c, so Oracle developers and DBAs should immediately be able to use the appliance. Also, the up-front system planning and integration required for data warehousing projects is eliminated with Exadata because it bundles the DBMS with compute, storage and networking technologies.

    A similar option for organizations that use IBM DB2 is the IBM PureData System for Analytics, which is based on DB2 for LUW. Keep in mind, however, that data warehouse appliances can be costly, at times pricing themselves out of the market for smaller organizations.

    Microsoft customers should consider the preview release of Microsoft Azure SQL Data Warehouse. It's a fully managed data warehouse service that's compatible and integrated with the Microsoft SQL Server ecosystem.

    2. What are your availability and performance needs?

    Other factors to consider include high availability and rapid response. Most organizations that decide to deploy a data warehouse will likely want both, but not every data warehouse actually requires them.

    When availability and performance are the most important criteria, DWaaS should be at the bottom of your list because of the lower speed imposed by network latency with cloud access. Instead, on-premises deployment can be tuned and optimized by IT technicians to deliver increased system availability and faster performance at the high end. This can mean using the latest features of an RDBMS, including the HTAP capabilities of Oracle Database, or IBM's DB2 with either the IBM DB2 Analytics Accelerator add-on product for DB2 for z/OS or BLU Acceleration capabilities for DB2 for LUW. Most RDBMS vendors offer capabilities such as materialized views, bitmap indexes, zone maps, and high-end compression for data and indexes. For most users, however, satisfactory performance and availability can be achieved with data warehouse appliances such as IBM PureData, Teradata Active EDW and Oracle Exadata. These platforms are engineered for data warehousing workloads, but require minimal tuning and administration.

    Another appliance to consider is the Actian Analytics Platform, which is designed to support high-speed data warehouse implementation and management. The platform combines relational and columnar capabilities, but also includes high-end features for data integration, analytics and performance. It can be a good choice for organizations requiring both traditional and nontraditional data warehouse queries. The Actian Analytics Platform includes Actian Vector, a Symmetric Multiprocessor DBMS designed for high-performance analytics, which exploits many newer, performance-oriented features such as single instruction multiple data. This enables a single operation to be applied on a set of data at once and CPU cache to be utilized as execution memory.

    Pivotal Greenplum is an open source, massively parallel data warehouse platform capable of delivering high-speed analytics on large volumes of data. The platform combines relational and columnar capabilities and can be deployed on-premises as software or an appliance, or as a service in the cloud. Given its open source orientation, Pivotal Greenplum may be viewed favorably by organizations basing their infrastructure on an open source computing stack.

    3. Are you already in the cloud?

    DWaaS is probably the best option for companies that already conduct cloud-based operations. The other data warehouse platform options would require your business to move data from the cloud to an on-premises data warehouse. Keep in mind, though, that in addition to cloud-only options like Amazon Redshift, IBM dashDB and Microsoft Azure SQL Data Warehouse, many data warehouse platform providers offer cloud-based deployments.

    4. What are your data volume and latency requirements?

    Although many large data warehouses contain petabytes of raw data, every data warehouse implementation has different data storage needs. The largest data warehouses are usually customized combinations of RDBMS and analytic DBMS or HTAP implementations. As data volume requirements diminish, more varied options can be utilized, including data warehouse appliances.

    5. Is a data warehouse part of your big data strategy?

    Big data requirements have begun to impact the data warehouse, and many organizations are integrating unstructured and multimedia data into their data warehouse to combine analytics with business intelligence requirements -- aka polyglot data warehousing. If your project could benefit from integrated polyglot data warehousing, you need a platform that can manage and utilize this type of data. For example, the big RDBMS vendors -- IBM, Oracle and Microsoft -- are integrating support for nontraditional data and Hadoop in each of their respective products.

    You may also wish to consider IBM dashDB, which can process unstructured data via its direct integration with IBM Cloudant, enabling you to store and access JSON and NoSQL data. The Teradata Active EDW supports Teradata's Unified Data Architecture, which enables organizations to seamlessly access and analyze relational and nonrelational data. The Actian Analytics Platform delivers a data science workbench, simplifying analytics, as well as a scaled-out version of Actian Vector for processing data in Hadoop. Last, the Microsoft Azure SQL Data Warehouse enables analysis across many kinds of data, including relational data and semi-structured data stored in Hadoop, using its T-SQL language.
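
    A rough, product-neutral illustration of the polyglot idea behind these offerings: semi-structured JSON is flattened on ingest so that a single SQL statement can combine it with relational rows. The table and field names are invented.

        import json
        import sqlite3

        con = sqlite3.connect(":memory:")
        con.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT)")
        con.execute("INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex')")

        # Semi-structured events, e.g. exported from a document store or log pipeline.
        events_json = """
        [
          {"customer_id": 1, "event": "click",    "meta": {"page": "home"}},
          {"customer_id": 1, "event": "purchase", "meta": {"amount": 42.0}},
          {"customer_id": 2, "event": "click",    "meta": {"page": "pricing"}}
        ]
        """
        con.execute("CREATE TABLE events (customer_id INTEGER, event TEXT, detail TEXT)")
        for e in json.loads(events_json):
            con.execute("INSERT INTO events VALUES (?, ?, ?)",
                        (e["customer_id"], e["event"], json.dumps(e["meta"])))

        # One query across relational and (flattened) semi-structured data.
        query = """
            SELECT c.name, COUNT(*) AS events
            FROM customers c JOIN events e ON e.customer_id = c.customer_id
            GROUP BY c.name
        """
        print(con.execute(query).fetchall())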

    Although organizations have been building data warehouses since the 1980s, the manner in which they are being implemented has changed considerably. After reading this four-part series, you should have a better idea of how modern data warehouses are built and what each of the leading vendors provides. Armed with this knowledge, you can make a more informed choice when purchasing data warehouse products.

    Source: TechTarget

  • Forrester names SAS a leader in Enterprise Insight Platforms

    SAS has been named a leader in The Forrester Wave: Enterprise Insight Platforms, Q1 2019. The report notes that "SAS Viya is a modern architecture with a single powerful analytical engine. The SAS platform also offers the tightest integration we have seen between different analytics capabilities, data preparation and governance."

    Today’s successful organisations are driven by analytical insights, not by intuition. SAS Viya on the SAS Platform offers companies leading analytics capabilities to support business decisions with both speed and scale. Within a solid, coherent environment, organisations can manipulate and explore their data at scale. SAS Viya also provides access to advanced analytics and artificial intelligence (AI), with an additional layer of transparency and interpretability for AI-generated decisions, opening up the ‘black box’ of AI for data scientists and business users alike.

    The full analytics life cycle

    According to Sarah Gates, Product Marketing Manager for the SAS Platform, companies want to rely on a comprehensive platform that orchestrates data management, analytics and development tools to generate insights that support their decisions. "Producing results that are both fast and reliable is crucial. SAS Viya provides this support across the full analytics life cycle, from data to discovery and deployment."

    The Forrester report states that SAS has a first-class toolset for analytics, forecasting and streaming. "Support for notebooks, multi-language programming and more cloud options round out SAS Viya and make it a good choice for companies with business-critical analytics needs." SAS scores highest in the analytics tools category and achieved the highest possible score for market presence.

     

    Source: BI-Platform

     

  • Gartner positions Microsoft as a leader in the Magic Quadrant for Operational Database Management Systems

    Microsoft is placed furthest in vision and highest for ability to execute within the Leaders Quadrant.

    With the release of SQL Server 2014, the cornerstone of Microsoft’s data platform, we have continued to add more value to what customers are already buying. Innovations like workload-optimized in-memory technology, advanced security, and high availability for mission-critical workloads are built in instead of requiring expensive add-ons. We have long maintained that customers need choice and flexibility to navigate this mobile-first, cloud-first world and that Microsoft is uniquely equipped to deliver on that vision in both trusted environments on-premises and in the cloud.

    Industry analysts have taken note of our efforts and we are excited to share Gartner has positioned Microsoft as a Leader, for the third year in a row, in the Magic Quadrant for Operational Database Management Systems. Microsoft is placed furthest in vision and highest for ability to execute within the Leaders Quadrant.

    Given customers are trying to do more with data than ever before across a variety of data types, at large volumes, the complexity of managing and gaining meaningful insights from the data continues to grow. One of the key design points in Microsoft’s data strategy is ensuring ease of use in addition to solving complex customer problems. For example, you can now manage both structured and unstructured data through the simplicity of T-SQL rather than requiring mastery of Hadoop and MapReduce technologies. This is just one of many examples of how Microsoft values ease of use as a design point.

    Gartner also recognizes Microsoft as a leader in the Magic Quadrant for Business Intelligence and Analytics Platforms and placed Microsoft as a leader in the Magic Quadrant for Data Warehouse Database Management Systems – recognizing Microsoft’s completeness of vision and ability to execute in the data warehouse market.

    Offering only one piece of the data puzzle isn’t enough to satisfy all the different scenarios in today’s environments and workloads. Our commitment is to make it easy for customers to capture and manage data and to transform and analyze that data for new insights.

    Being named a leader in Operational DBMS, BI & Analytics Platforms, and DW DBMS Magic Quadrants is incredibly important to us: We believe it validates Microsoft is delivering a comprehensive platform that ensures every organization, every team and every individual is empowered to do more and achieve more because of the data at their fingertips.

     

  • Hybrid data and the modern operational data warehouse

    New requirements for data, software, and business practices are driving a new wave of modernization for the operational data warehouse.

    Modern enterprises looking for growth in revenue and profitability know that data is critically important to gaining competitive advantage. A high return on investment comes from digitally transforming business operations by capturing and analyzing a greater variety and volume of data to inform better business insights. When data sets with extremely diverse structures and characteristics are integrated for multiple use cases in operations and analytics, we call the resulting data set hybrid data.

    Valuable hybrid data comes from an increasing number of different sources, both old and new, internal and external. It is inevitable that hybrid data will arrive in many structures, schemas, and formats with variable characteristics for volume, latency (from batch to streams), concurrency, requirements for storage and in situ processing, and the emerging characteristics of machine data (IoT standards, geocoding, events, images, audio, training data for machine learning, etc.).

    From a technology viewpoint, it is challenging to integrate data of such diverse characteristics. From a business viewpoint, however, integration is well worth the effort because it provides deeper visibility into business processes and richer analytics insights than were possible before hybrid data's greater variety emerged.

    Hybrid data architectures

    Hybrid data usually drives users to deploy many types of database management systems and other data platforms (such as Hadoop and cloud) to capture, store, process, and analyze hybrid data. After all, it's difficult or impossible to optimize a single instance of a single data platform type to satisfy the eclectic requirements of hybrid data's multiple structures, latencies, storage paradigms, and analytics processing methods.

    The diversification of data and the quickening adoption of advanced analytics are some of the strongest drivers toward hybrid data architectures, so called because hybrid data is increasingly distributed across multiple platforms, both on premises and on one or more clouds. For some use cases, the right tool for a particular data type might be sufficient. However, there is more value in integrating access and analysis in a hybrid data architecture that can deliver the scale and performance needed to produce actionable insights.

    The modern operational data warehouse (ODW)

    Hybrid data and hybrid data architectures are already here. To get full business value from them, you need an appropriate data management platform, and that's where the modern operational data warehouse comes in. The modern ODW delivers insights from a hybrid data architecture quickly enough to impact operational business decisions.

    The operational data warehouse continues to focus on speed 

    Note that the operational data warehouse has been with us for decades, sometimes under synonyms such as the real-time, active, or dynamic data warehouse. No matter what you call it, the operational data warehouse has always involved high-performance data ingestion and query so that data travels as fast as possible into and out of the warehouse.

    Through analysis, an ODW provides timely insights for time-sensitive decisions such as real-time offers in e-commerce, network optimization, fraud detection, and investment decisions in trading environments. However, an ODW also supports time-sensitive operational processes such as just-in-time inventory, business monitoring, and operational reporting.

    Performance and real-time requirements continue to apply to the ODW. However, a modern ODW must also handle a broader range of data types and sources at unprecedented scale as well as new forms of analytics. The modern ODW satisfies requirements old and new largely by leveraging the speed and scale of new data platforms and analytics tools.

    The modern ODW is a hybrid data management solution and is hybrid in multiple ways 

    It integrates hybrid data from multiple operational systems and other sources. The modern ODW is built to handle modern data, which trends toward hybrid combinations. Furthermore, an implementation of a modern ODW may itself be hybrid when it spans both on-premises and cloud systems. In addition, a modern ODW tends to have substantial data integration capabilities that integrate data among the source and target systems of a hybrid data architecture.

    The best ODWs operate with very low latency

    A modern ODW is built for today's hybrid data and business use cases that demand real-time or near-real-time performance. Low-latency use cases supported by modern ODWs include real-time analytics, operational reporting, management dashboards, business activity monitoring, catching fraud before cash leaves the ATM, and making an offer before the potential customer leaves the store or website.

    A modern ODW is strong where other approaches are weak 

    For example, the traditional enterprise data warehouse is great as a corporate 'single source of truth' but inflexible and expensive. Operational data stores are fast in a limited domain but not extensible to larger enterprise needs. Data lakes are great for storing big and varied data economically but poor at data governance and predictable performance.

    By comparison, a modern ODW is built on the latest technology for superior speed, scale, maintenance, functionality, and cost containment. In addition, a modern ODW assumes that leveraging hybrid data is its raison d'etre, so it is built to handle an extremely broad range of data types at massive scale with extremely high performance.

    Given this daunting list of system requirements, it is unlikely that a user organization can satisfy even half of them with a homegrown system that was built by IT groups or consultants. Therefore, users should seek vendor-built systems designed and optimized for modern operational data warehousing.

    A successful ODW leverages recent advancements in data platforms and tools 

    These include parallel execution, columnar databases, in-memory execution, high-speed storage, distributed file systems, scalable clusters, elastic clouds, cloud-based databases, and managed services for cloud data solutions. Because of the extreme diversity of hybrid data, a successful ODW will interoperate via many access methods (such as R, Scala, SQL, or GUI), accommodate a wide variety of user skills (from data scientist to business user), and flexibly support new deployment models (data center, public cloud, private cloud, managed service, multicloud, and hybrid cloud, alone or in any combination).

    Author: Philip Russom

    Source: TDWI

  • Master Data Management and the role of (un)structured data

    Traditional conversations about master data management’s utility have centered on determining what actually constitutes MDM, how to implement data governance with it, and the balance between IT and business involvement in the continuity of MDM efforts.

    Although these concerns will always remain apposite, MDM’s overarching value is projected to significantly expand in 2018 to directly create optimal user experiences—for customers and business end users. The crux of doing so is to globalize its use across traditional domains and business units for more comprehensive value.

    “The big revelation that customers are having is how do we tie the data across domains, because that reference of what it means from one domain to another is really important,” Stibo Systems Chief Marketing Officer Prashant Bhatia observed.

    The interconnectivity of MDM domains is invaluable not only for monetization opportunities via customer interactions, but also for streamlining internal processes across the entire organization. Oftentimes the latter facilitates the former, especially when leveraged in conjunction with contemporary opportunities related to the Internet of Things and Artificial Intelligence.

    Structured and Unstructured Data

    One of the most eminent challenges facing MDM related to its expanding utility is the incorporation of both structured and unstructured data. Fueled in part by the abundance of external data besieging the enterprise from social, mobile, and cloud sources, unstructured and semi-structured data can pose difficulties to MDM schema.

    After attending the recent National Retail Federation conference with over 30,000 attendees, Bhatia noted that one of the primary themes was, “Machine learning, blockchain, or IoT is not as important as how does a company deal with unstructured data in conjunction with structured data, and understand how they’re going to process that data for their enterprise. That’s the thing that companies—retailers, manufacturers, etc.—have to figure out.”

    Organizations can integrate these varying data types into a single MDM platform by leveraging emerging options for schemas and taxonomies with global implementations, naturally aligning the different formats. The competitive advantage generated from doing so is virtually limitless.

    Original equipment manufacturers and equipment asset management companies can capture real-time, semi-structured or unstructured data about failing equipment and use it to enrich their product domain with attributes describing, for example, the condition of a specific consumer’s tire. The aggregation of that semi-structured data with structured data in an enterprise-spanning MDM system can influence several domains.

    Organizations can reference it with customer data for either preventive maintenance or discounted purchase offers. The location domain can use it to provide these services close to the customer; integrations with lifecycle management capabilities can determine what went wrong and how to correct it. “That IoT sensor provides so much data that can tie back to various domains,” Bhatia said. “The power of the MDM platform is to tie the data for domains together. The more domains that you can reference with one another, you get exponential benefits.”
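
    The cross-domain idea can be sketched in a few lines of Python. The example below is a simplification, not a real MDM implementation: a semi-structured IoT reading is tied back to product, customer, and location master data so a preventive-maintenance offer can be generated. All identifiers and thresholds are hypothetical.

    ```python
    # One semi-structured sensor reading arriving from the IoT source.
    sensor_event = {"sensor_id": "S-1042", "metric": "tread_depth_mm", "value": 2.1}

    # Master data the MDM hub ties together across domains (hypothetical names).
    asset_master = {  # which product and customer a sensor belongs to
        "S-1042": {"product_id": "TIRE-AS-17", "customer_id": "C-778"},
    }
    product_master = {"TIRE-AS-17": {"min_safe_tread_mm": 3.0}}
    customer_master = {"C-778": {"name": "J. Jansen", "nearest_location": "Store 12"}}

    # Cross-reference the reading against the product and customer domains.
    asset = asset_master[sensor_event["sensor_id"]]
    product = product_master[asset["product_id"]]
    if sensor_event["value"] < product["min_safe_tread_mm"]:
        customer = customer_master[asset["customer_id"]]
        print(f"Offer preventive maintenance to {customer['name']} "
              f"at {customer['nearest_location']}")
    ```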

    Universal Schema

    Although the preceding example pertained to the IoT, it’s worth noting that it’s applicable to virtually any data source or type. MDM’s capability to create these benefits is based on its ability to integrate different data formats on the back end. A uniformity of schema, taxonomies, and data models is desirable for doing so, especially when using MDM across the enterprise. 

    According to Franz CEO Jans Aasman, traditionally “Master Data Management just perpetuates the difficulty of talking to databases. In general, even if you make a master data schema, you still have the problem that all the data about a customer, or a patient, or a person of interest is still spread out over thousands of tables.” 

    Varying approaches can address this issue; there is growing credence around leveraging machine learning to obtain master data from various stores. Another approach is to considerably decrease the complexity of MDM schema so it’s more accessible to data designated as master data. By creating schema predicated on an exhaustive list of business-driven events, organizations can reduce the complexity of myriad database schemas (or even of conventional MDM schemas) so that their “master data schema is incredibly simple and elegant, but does not lose any data,” Aasman noted.
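
    As a minimal sketch of what an event-driven master data schema might look like, the following Python example stores every business-driven event against the master entity and folds the history into a single current view. The class and field names are assumptions for illustration, not a prescribed design.

    ```python
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class MasterEvent:
        entity_id: str    # the master entity (customer, patient, person of interest)
        event_type: str   # business-driven event, e.g. "address_changed"
        payload: dict     # whatever attributes the event carries
        occurred_at: str

    @dataclass
    class MasterRecord:
        entity_id: str
        events: List[MasterEvent] = field(default_factory=list)

        def current_view(self) -> dict:
            """Fold the event history into the latest single view of the entity."""
            view = {}
            for e in sorted(self.events, key=lambda e: e.occurred_at):
                view.update(e.payload)
            return view

    customer = MasterRecord("C-001")
    customer.events.append(MasterEvent("C-001", "customer_registered",
                                       {"name": "Acme BV"}, "2018-01-02"))
    customer.events.append(MasterEvent("C-001", "address_changed",
                                       {"city": "Utrecht"}, "2018-03-15"))
    print(customer.current_view())   # {'name': 'Acme BV', 'city': 'Utrecht'}
    ```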

    Global Taxonomies

    Whether simplifying schema based on organizational events and a list of their outcomes or using AI to retrieve master data from multiple locations, the net worth of MDM is based on the business’s ability to inform the master data’s meaning and use. The foundation of what Forrester terms “business-defined views of data” is oftentimes the taxonomies predicated on business use as opposed to that of IT. Implementing taxonomies enterprise-wide is vital for the utility of multi-domain MDM (which compounds its value) since frequently, as Aasman indicated, “the same terms can have many different meanings” based on use case and department.

    The hierarchies implicit in taxonomies are extremely useful in this regard, since they enable consistency across the enterprise yet have subsets for various business domains. According to Aasman, the Financial Industry Business Ontology (FIBO) can also function as a taxonomy in which, “The higher level taxonomy is global to the entire bank, but the deeper you go in a particular business you get more specific terms, but they’re all bank specific to the entire company.”

    The ability of global taxonomies to link together meaning in different business domains is crucial to extracting value from cross-referencing the same master data for different applications or use cases. In many instances, taxonomies provide the basis for search and queries that are important for determining appropriate master data.
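
    The shape of such a global taxonomy can be illustrated with a small Python sketch: the upper levels are shared across the enterprise, while the leaves are specific to a business unit, and any term can be resolved back to the shared concept above it. The terms and structure below are illustrative assumptions, not FIBO itself.

    ```python
    # Hypothetical global taxonomy: upper levels are enterprise-wide, leaves are
    # specific to individual business units.
    taxonomy = {
        "Financial Instrument": {                  # global to the whole bank
            "Debt Instrument": {
                "Retail Mortgage": {},             # retail-banking specific
                "Corporate Bond": {},              # investment-banking specific
            },
            "Equity Instrument": {"Common Share": {}},
        }
    }

    def path_to(term, tree, trail=()):
        """Return the hierarchy path for a term, so a business-unit-specific term
        can be traced back to the shared enterprise-level concept."""
        for node, children in tree.items():
            current = trail + (node,)
            if node == term:
                return current
            found = path_to(term, children, current)
            if found:
                return found
        return None

    print(path_to("Retail Mortgage", taxonomy))
    # ('Financial Instrument', 'Debt Instrument', 'Retail Mortgage')
    ```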

    Timely Action

    By expanding the scope of MDM beyond traditional domain limitations, organizations can redouble the value of master data for customers and employees. By simplifying MDM schema and broadening taxonomies across the enterprise, they increase their ability to integrate unstructured and structured data for timely action. “MDM users in a B2B or B2C market can provide a better experience for their customers if they, the retailer and manufacturer, are more aware and educated about how to help their end customers,” Bhatia said.


    Author: Jelani Harper

    Source: Information Management

  • The most important BI trends to watch: perspective of data pipeline producer Fivetran


    There’s nothing more critical to successful BI than centralized data. TDWI spoke to George Fraser, CEO and technical cofounder of Fivetran, who explains why this is so important, what technologies are important now, and what emerging technology you should be paying attention to.

    What technology or methodology must be part of an enterprise’s data strategy if it wants to be competitive nowadays?

    George Fraser: You need to have a tool that centralizes your data into a single location. You’re not going to be able to do all the things that you want to do with your data unless you have it all centralized.

    What one emerging technology are you most excited about and think has the greatest potential? What’s so special about this technology?

    GF: Data warehouses that separate compute from storage, such as Snowflake and BigQuery. These warehouses leverage the cloud in a fundamentally different way than earlier data warehouses. When you’re able to scale your compute up and down on demand, many of the problems that people had in the past with data warehouses just disappear. Professionals that spent 90% of their time working on data warehouse issues in the past can now spend their time on projects that drive value for the business. Everyone knows this technology is special, but it's even more special than people realize.
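
    To make the elastic-compute pattern Fraser describes concrete, here is a minimal Python sketch: compute is scaled up only for a heavy job and suspended afterwards, while storage keeps the data. The statements are Snowflake-style SQL, the connection object is assumed to come from the vendor's Python DB-API driver, and the table names are hypothetical.

    ```python
    def run_heavy_job(conn, warehouse="ANALYTICS_WH"):
        """Resize compute for one heavy transformation, then release it again."""
        cur = conn.cursor()
        cur.execute(f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = 'XLARGE'")
        try:
            cur.execute("""
                CREATE OR REPLACE TABLE daily_revenue AS
                SELECT order_date, SUM(amount) AS revenue
                FROM raw_orders
                GROUP BY order_date
            """)
        finally:
            # Scale back down and suspend: storage keeps the data,
            # but compute stops accruing cost.
            cur.execute(f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = 'XSMALL'")
            cur.execute(f"ALTER WAREHOUSE {warehouse} SUSPEND")
    ```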

    What is the single biggest challenge enterprises face today? How do most enterprises respond (and is it working)?

    GF: I think the single biggest challenge businesses, not just enterprises, face is hiring great talent. We’ve been in a ten-year economic boom and the job market is about as tight as it can get.

    One thing that makes an organization a good place to work is transparency. A lot of people talk about using data to make better business decisions, which is important. But there is a second benefit that relates directly to finding talent and hiring, which is that you can create transparency with data.

    As an employer, you can ensure that everyone in your company across all departments and levels has access to the data being used to drive decisions. You can use data to create transparency so that everyone can understand what is going on in the business and why decisions are being made the way they are.

    Your employees should be able to see whether the things they’re doing are working or contributing to the overall goals of the business. This can be achieved through data and data transparency.

    Is there a new technology in data and analytics that is creating more challenges than most people realize? How should enterprises adjust their approach to it?

    GF: There are great new BI tools out there, such as Looker and Sigma Computing, as well as great new data warehouses, such as BigQuery and Snowflake. But I think what people sometimes don't realize is that these tools can't do anything about bringing the data into themselves. They give you a great environment to analyze your data, but they won't put the data in that environment. Although a lot of their marketing materials talk about getting all of your data in one place, they don’t actually solve that problem.

    Getting your data together is difficult. Oftentimes companies end up deciding to put together their data warehouse and do the ETL (extract, transform, load) with their own engineering team. They don't necessarily realize that they're signing up for a mammoth effort to centralize their data that is going to metastasize across the organization, consuming unthinkable amounts of time and resources.

    Enterprises should adjust their approach by using ELT tools (which are different from ETL) that specifically solve the data centralization problem.
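
    The ELT pattern itself is simple to sketch: land the raw extract in the warehouse first, then transform it there with SQL, where compute can be scaled cheaply. In the minimal Python example below, SQLite stands in for the warehouse and the staging and target table names are assumptions.

    ```python
    import sqlite3

    conn = sqlite3.connect(":memory:")   # stand-in for the cloud data warehouse

    # E + L: land the raw extract untouched, as text, in a staging table.
    conn.execute("CREATE TABLE stg_orders (order_id TEXT, user_id TEXT, amount TEXT)")
    rows = [("1", "u1", "12.50"), ("2", "u2", "7.00"), ("3", "u1", "3.25")]
    conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", rows)

    # T: transform inside the warehouse with SQL (typing, aggregation).
    conn.execute("""
        CREATE TABLE revenue_by_user AS
        SELECT user_id, SUM(CAST(amount AS REAL)) AS revenue
        FROM stg_orders
        GROUP BY user_id
    """)
    print(conn.execute("SELECT * FROM revenue_by_user").fetchall())
    ```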

    Where do you see analytics and data management headed in 2019 and beyond? What’s just over the horizon that we haven’t heard much about yet?

    GF: I think we’re going to see a lot of the existing trends continue, including increased adoption of cloud data warehouses, which I think will become one of the dominant use cases of cloud computing. The main job of IT at a lot of traditional companies is data warehousing. As more and more traditional companies move to the cloud, we’re going to see data warehousing representing a growing percentage of what is being done in the cloud.

    Something just over the horizon I find very interesting is Azure Data Explorer. It is basically a complete reconceptualization of the data warehouse, including a totally new query language that is fascinating. Although it has just recently become publicly available, I am told it has been widely adopted within Microsoft. I’m interested to see how it plays out because it is so ambitious.

    Tell us about your product/solution and the problem it solves for enterprises.

    GF: Fivetran is a zero-configuration data pipeline that centralizes all your business data in your data warehouse. Centralizing your data is the hardest part of building an enterprise data warehouse, and we’ve built the only truly turnkey solution.

    Author: James E. Powell

    Source: TDWI

  • Three trends that urge the modernization of data warehouses


    In the last couple of years, we’ve seen the rapid adoption of machine learning into the analytics environment, moving from science experiment to table stakes. In fact, at this point, I’m hard pressed to think of an enterprise that doesn’t have at least some sort of predictive or machine learning strategy already in place.

    Meanwhile, data warehouses have long been the foundation of analytics and business intelligence, but they’ve also traditionally been complex and expensive to operate. With the widespread adoption of machine learning and the increasing need to broaden access to data beyond just data science teams, we are seeing a fundamental shift in the way organizations should approach data warehousing.

    With this in mind, here are three broad data management trends I expect will accelerate this year:

    Operationalize insights with analytical databases

    I’m seeing a lot of convergence between machine learning and analytics. As a result, people are doing their machine learning with tools and frameworks such as R, Python, and Spark.

    They then do their best to make those results accessible to the rest of the business, not just to data scientists. These talented data scientists are hacking away in their own tools, but those tools are simply not ones business analysts are going to use.

    The way to get the best of both worlds is to let data scientists use their tools of choice to produce their predictions, and then publish those results to an analytical database that is more open to business users. The business user is already familiar with tools like Tableau, so by using an analytical database they can easily operationalize insights from the predictive model outcomes.
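
    A minimal Python sketch of this publish-the-predictions pattern is shown below. It assumes pandas and scikit-learn are available, uses SQLite as a stand-in for the analytical database, and all table, column, and model choices are hypothetical.

    ```python
    import sqlite3
    import pandas as pd
    from sklearn.linear_model import LogisticRegression

    # Train and score with the data scientist's tool of choice.
    train = pd.DataFrame({"tenure": [1, 24, 36, 2], "churned": [1, 0, 0, 1]})
    model = LogisticRegression().fit(train[["tenure"]], train["churned"])

    customers = pd.DataFrame({"customer_id": [101, 102], "tenure": [3, 30]})
    customers["churn_probability"] = model.predict_proba(customers[["tenure"]])[:, 1]

    # Publish the scored results to a table that BI tools can query directly.
    conn = sqlite3.connect("analytics.db")
    customers.to_sql("churn_predictions", conn, if_exists="replace", index=False)
    ```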

    Growth in streaming data sources

    Similar to the convergence of machine learning and analytics, I’m also seeing much greater interest in how to support streaming use cases or streaming data sources. 

    There are a number of technologies, among them Kafka, that provide a way to capture and propagate streams and do stream-based processing. Many systems, from web analytics stacks to a single microservice in someone’s application stack, are pushing out interesting events to a Kafka topic. But how do you consume that?

    Specialized streaming databases, for example, allow you to consume this in real time. In some cases that works well, but in others it is less natural, especially when trending across larger date ranges. In those cases it is easier to push the streaming data into an analytics database.
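
    As a minimal sketch of that consumption path, the Python example below reads events from a Kafka topic and pushes them into an analytics database in micro-batches. It assumes the kafka-python client, a reachable broker, and a hypothetical 'web_events' topic; SQLite stands in for the analytics database.

    ```python
    import json
    import sqlite3
    from kafka import KafkaConsumer

    # Stand-in for the analytics database.
    conn = sqlite3.connect("analytics.db")
    conn.execute("CREATE TABLE IF NOT EXISTS web_events (event_type TEXT, url TEXT, ts TEXT)")

    # Hypothetical topic and broker address.
    consumer = KafkaConsumer("web_events",
                             bootstrap_servers="localhost:9092",
                             value_deserializer=lambda v: json.loads(v.decode("utf-8")))

    batch = []
    for message in consumer:
        e = message.value
        batch.append((e.get("event_type"), e.get("url"), e.get("ts")))
        if len(batch) >= 500:   # micro-batch the inserts for throughput
            conn.executemany("INSERT INTO web_events VALUES (?, ?, ?)", batch)
            conn.commit()
            batch.clear()
    ```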

    The ephemeral data mart

    The third trend I’m seeing more of, and I expect to accelerate in 2018, is what I would call the ephemeral data mart. 

    What I mean by that is quickly bringing together a data set, running some queries, and then throwing the data away. As such, data resiliency and high availability become less important than data ingestion and computation speed. I’m seeing this in some of our customers and expect to see more.

    One customer in particular is using an analytics database to do processing of very large test results. By creating an ephemeral data mart for each test run, they can perform post-test analysis and trending, then just store the results for the longer term. 
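
    The ephemeral data mart pattern can be sketched in a few lines of Python: load one test run's results, compute the aggregates worth keeping, and discard the detail. The table, column, and function names below are hypothetical, with SQLite standing in for the analytics database.

    ```python
    import sqlite3

    def analyze_test_run(run_id, measurements):
        conn = sqlite3.connect(":memory:")          # nothing here needs to survive
        conn.execute("CREATE TABLE results (sensor TEXT, value REAL)")
        conn.executemany("INSERT INTO results VALUES (?, ?)", measurements)

        summary = conn.execute("""
            SELECT sensor, COUNT(*) AS n, AVG(value) AS mean, MAX(value) AS peak
            FROM results GROUP BY sensor
        """).fetchall()

        conn.close()                                # the detailed data is thrown away
        return [(run_id, *row) for row in summary]  # only the summary is kept long term

    print(analyze_test_run("run-42", [("temp", 71.2), ("temp", 73.9), ("vibration", 0.4)]))
    ```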

    As organizations need better and more timely analytics that fit within their hardware and cost budgets, the ways data is accessed and stored are changing. The trends I’ve outlined above are ones that I expect to gather steam this year, and they can serve as guideposts for enterprises that recognize the need to modernize their approach to data warehouses.

    Author: Dave Thompson

    Source: Information Management
