20 items tagged "Data warehousing"

  • ‘Progress in BI, but keep an eye on ROI’

    Business intelligence (BI) has already been named by Gartner as the top priority for the CIO in 2016. The Computable experts also predict that many large steps will be taken within BI. At the same time, managers must also look back and think about their business model when deploying big data: how do you justify the investments in big data?

    Kurt de Koning, founder of Dutch Offshore ICT Management
    Gartner has put business intelligence/analytics at number one on the 2016 priority list for the CIO. In 2016, users will increasingly base their decisions on management information drawn from multiple sources, some of which will consist of unstructured data. BI tools will therefore have to do more than present information in a visually attractive way and offer a good user interface. When it comes to unlocking the data, the tools that stand out will be those able to create order and overview out of the many forms in which data appears.

    Laurent Koelink, senior interim BI professional at Insight BI
    Big data solutions alongside traditional BI
    Due to the growth in the number of smart devices, organisations have ever more data to process. Because insight (in the broadest sense) will be one of the most important success factors of the future for many organisations that want to respond flexibly to market demand, they will also have to be able to analyse all these new forms of information. I do not see big data as a replacement for traditional BI solutions, but rather as a complement when it comes to the analytical processing of large volumes of (mostly unstructured) data.

    In-memory solutions
    Organisations increasingly run into the performance limitations of traditional database systems when large volumes of data have to be analysed ad hoc. Dedicated hybrid database/hardware solutions such as those from IBM, SAP and Teradata have always offered answers here. These are now more and more often joined by in-memory solutions, partly because they are becoming more affordable and therefore more accessible, and partly because such solutions are becoming available in the cloud, which keeps their costs well under control.

    Virtual data integration
    Where data is now often still physically consolidated in separate databases (data warehouses), this will, where possible, be replaced by smart metadata solutions that (with or without temporary physical, sometimes in-memory, storage) make time-consuming data extraction and integration processes unnecessary.

    Agile BI development
    Organisations are increasingly forced to move flexibly in and with the supply chain they operate in. This means that the insights used to steer the business (the BI solutions) must move flexibly along with it, which requires a different way of working from BI development teams. More and more, methods such as Scrum are therefore also being applied to BI development.

    BI for everyone
    Whereas BI has traditionally been the domain of organisations, consumers are now also making ever more frequent use of BI solutions. Well-known examples are insight into personal finances and energy consumption: the analysis of income and expenditure in your bank's web portal or app, and the analysis of data from smart energy meters, are telling cases. This will only increase and become more integrated in the years ahead.

    Rein Mertens, head of analytical platform at SAS
    An important trend that I see maturing in 2016 is ‘streaming analytics’. Today, big data is an inescapable part of daily practice, and the amount of data generated per second keeps growing, both in the personal and in the business sphere. Just look at your daily use of the internet, e-mails, tweets, blog posts and other social networks; and on the business side: customer interactions, purchases, customer service calls, promotion via SMS and social networks, et cetera.

    An increase in volume, variety and velocity of five exabytes every two days worldwide, and that figure even excludes data from sensors and other IoT devices. There is bound to be interesting information hidden in all this data, but how do you analyse it? One way is to make the data accessible and store it in a cost-effective big data platform. A technology such as Hadoop inevitably comes into play, after which data visualisation and advanced analytics are used to extract relationships and insights from that mountain of data. In effect, you send the complex logic to the data, without having to pull all the data out of the Hadoop cluster.

    But what if you want to make smart decisions in real time based on these large volumes of data? Then there is no time to store the data first and analyse it afterwards. Instead, you want to assess, aggregate, track and analyse the data in-stream: detecting unusual transaction patterns, analysing sentiment in text, and acting on it immediately. In effect, you send the data past the logic! Logic that sits in memory and has been built to do this very fast and very intelligently, storing only the final results. Examples of more than a hundred thousand transactions are no exception here. Per second, that is. Stream it, score it, store it. That is streaming analytics!
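
    To make ‘stream it, score it, store it’ concrete, here is a minimal, purely illustrative Python sketch (not SAS's streaming engine) that scores transactions in-stream against a rolling average held in memory and stores only the flagged results; the record fields 'id' and 'amount' are invented for the example.

```python
from collections import deque
from statistics import mean

def score_stream(transactions, window=100, threshold=3.0):
    """Flag transactions whose amount deviates strongly from a rolling mean.

    `transactions` is any iterable of dicts with the (invented) keys
    'id' and 'amount'; only the flagged results are kept ("store it").
    """
    recent = deque(maxlen=window)   # rolling window held in memory
    flagged = []
    for tx in transactions:
        if recent:
            avg = mean(recent)
            if avg and tx["amount"] > threshold * avg:
                flagged.append({"id": tx["id"], "amount": tx["amount"], "window_avg": avg})
        recent.append(tx["amount"])
    return flagged

# Example usage with synthetic data: one anomalous transaction gets flagged.
sample = [{"id": i, "amount": 10.0} for i in range(200)]
sample.append({"id": 999, "amount": 500.0})
print(score_stream(sample))
```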

    Minne Sluis, founder of Sluis Results
    From IoT (internet of things) to IoE (internet of everything)
    Everything is becoming digital and connected, even more so than we could imagine only a short while ago. The application of big data methods and techniques will therefore accelerate even further.

    The call for adequate data governance will grow
    Although the new world is all about letting go, giving trust and freedom, and co-creation, the call for manageability will nevertheless grow. Provided it is approached primarily from a facilitating role, ensuring greater consistency and reliability, that is by no means a bad thing.

    The business impact of big data & data science keeps growing
    The impact of big data & data science in reinventing business processes, services and products, digitising them extensively (and making them more intelligent), or in some cases eliminating them altogether, will continue.

    The consumerisation of analytics continues
    Greatly improved and truly intuitive visualisations, underpinned by good meta-models and therefore data governance, are driving this development. Democratisation and independence from third parties (other than services deliberately taken from the cloud) are thus increasingly becoming reality.

    Big data & data science gaan helemaal doorbreken in de non-profit
    De subtiele doelstellingen van de non-profit, zoals verbetering van kwaliteit, (patiënt/cliënt/burger) veiligheid, punctualiteit en toegankelijkheid, vragen om big data toepassingen. Immers, voor die subtiliteit heb je meer goede informatie en dus data, sneller, met meer detail en schakering nodig, dan wat er nu veelal nog uit de traditionelere bi-omgevingen komt. Als de non-profit de broodnodige focus van de profit sector, op ‘winst’ en ‘omzetverbetering’, weet te vertalen naar haar eigen situatie, dan staan succesvolle big data initiatieven om de hoek! Mind you, deze voorspelling geldt uiteraard ook onverkort voor de zorg.

    Hans Geurtsen, business intelligence architect data solutions at Info Support
    From big data to polyglot persistence
    In 2016 we will no longer talk about big data, but simply about data: data of all kinds and in all volumes that call for different kinds of storage, in other words polyglot persistence. Programmers have known the term polyglot for a long time; an application anno 2015 is often already written in several languages. But on the storage side of an application, relational is no longer the only game in town either. We will increasingly apply other kinds of databases in our data solutions, such as graph databases, document databases, and so on. Alongside specialists who know everything about one kind of database, you will then also need generalists who know exactly which database is suited to what.

    The breakthrough of the modern data warehouse
    ‘A polyglot is someone with a high degree of proficiency in several languages’, according to Wikipedia. That refers to spoken languages, but you also come across the term more and more in the IT field: an application that is coded in multiple programming languages and stores data in multiple kinds of databases. On the business intelligence side, too, a single language and a single environment no longer suffice. The days of the traditional data warehouse with an ETL pipeline, one central data warehouse and one or two BI tools are numbered. We will see new kinds of data platforms in which all sorts of data from all sorts of sources become accessible to information workers and data scientists using all sorts of tools.

    Business intelligence in de cloud
    While Dutch companies in particular are still reluctant when it comes to the cloud, the movement towards the cloud is slowly but surely getting underway. More and more companies realise that security, in particular, is often better arranged in the cloud than they could arrange it themselves. Cloud providers are also doing more and more to attract European companies to their clouds; Microsoft's new data centres in Germany, where not Microsoft but Deutsche Telekom controls access to customer data, are one example. 2016 could well be the year in which the cloud really breaks through and in which we will also see more and more complete BI solutions in the cloud in the Netherlands.

    Huub Hillege, principal data(base) management consultant at Info-Shunt
    Big data
    The big data hype will certainly continue in 2016, but success within companies is by no means guaranteed in advance. Companies and recent graduates keep winding each other up about its application. It is incomprehensible that everyone wants to unlock Facebook, Twitter and similar data while the data in these systems is highly unreliable. At every conference I ask where the business case is, including costs and benefits, that justifies all the investments around big data. Even BI managers at companies encourage people to simply get started. So essentially: look back at the data you have or can obtain and investigate whether you find something you might be able to use. To me this is the biggest pitfall, just as it was at the start of data warehouses in 1992. In the current circumstances companies have limited money; frugality is called for.

    The analysis of big data should be focused on the future, based on a clear business strategy and a cost/benefit analysis: which data do I need to support the future? Determine:

    • Where do I want to go?
    • Which customer segments do I want to add?
    • Are we going to do more cross-selling (more products) to existing customers?
    • Are we going to take steps to retain our customers (churn)?

    Once these questions have been prioritised and recorded, an analysis must be carried out:

    • Which data and sources do we need for this?
    • Do we have the data ourselves, are there ‘gaps’, or do we need to buy external data?

    Database management system
    More and more database management system (DBMS) vendors are adding support for big data solutions, for example the Oracle/Sun Big Data Appliance and Teradata/Teradata Aster with support for Hadoop. In the long term, the DBMS solutions will dominate the field; big data software solutions without a DBMS will eventually lose out.

    Fewer and fewer people, including today's DBAs, understand how a database/DBMS really works deep down at the technical level. More and more often, physical databases are generated from logical data modelling tools, and formal physical database design steps and reports are skipped. Developers who use ETL tools such as Informatica, AbInitio, Infosphere, Pentaho et cetera also ultimately generate SQL scripts that move data from sources to operational data stores and/or the data warehouse.

    BI tools such as Microstrategy, Business Objects, Tableau et cetera also generate SQL statements.
    Such tools are usually developed initially for a particular DBMS, and people quickly assume they are then applicable to every DBMS. As a result, too little use is made of the specific physical characteristics of each DBMS.

    The absence of real in-depth knowledge then causes performance problems that are discovered at too late a stage. In recent years, by changing database designs and indexes and restructuring complex or generated SQL scripts, I have been able to bring ETL processes down from six to eight hours to one minute, and queries that ran for 45 to 48 hours down to 35 to 40 minutes.
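
    A minimal sketch of the kind of tuning meant here, assuming invented table and column names and using SQLite only because it ships with Python: the gain often comes from adding an index that matches the real access path and rewriting a generated predicate so the optimizer can actually use it.

```python
import sqlite3

# Illustrative only: the index/rewrite tuning described above, shown on SQLite.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE sales (customer_id INTEGER, order_date TEXT, amount REAL);
    -- Generated tools often emit filters on expressions, which defeat indexes:
    --   SELECT SUM(amount) FROM sales WHERE CAST(customer_id AS TEXT) = '42';
""")

# 1. Add an index that matches the actual access path.
con.execute("CREATE INDEX idx_sales_customer ON sales(customer_id)")

# 2. Rewrite the generated predicate so the index can be used.
cur = con.execute("SELECT SUM(amount) FROM sales WHERE customer_id = ?", (42,))
print(cur.fetchone())

# EXPLAIN QUERY PLAN shows whether the index is actually picked up.
for row in con.execute(
        "EXPLAIN QUERY PLAN SELECT SUM(amount) FROM sales WHERE customer_id = ?", (42,)):
    print(row)
```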

    Advice
    The data you need will keep on growing. Forget buying all kinds of hyped software packages. Make sure you bring in very strong technical database/DBMS expertise to build the foundation properly from the bottom up, on the strength of the DBMS you already have. That frees up time and money (you can get by with smaller systems because the foundation is solid) to select the right tools after a sound business case and proofs of concept.

  • Be careful when implementing data warehouse automation

    Automation can be a huge help, but automating concepts before you understand them is a recipe for disaster.

    The concept of devops has taken root in the world of business intelligence and analytics.

    The overall concept of devops has been around for a while in traditional IT departments as they sought to expand and refine the way that they implemented software and applications. The core of devops in the world of analytics is called DWA (data warehouse automation), which links together the design and implementation of analytical environments into repeatable processes and should lead to increased data warehouse and data mart quality, as well as decreased time to implement those environments.
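
    As a toy illustration of the DWA idea (not any specific DWA product), the sketch below generates repeatable CREATE TABLE statements from a metadata description instead of hand-coding each table; the metadata format and table definitions are invented.

```python
# Generate DDL for an analytical environment from a metadata dictionary,
# so the same definitions can be rebuilt repeatably instead of hand-coded.
TABLES = {
    "dim_customer": {"customer_id": "INTEGER PRIMARY KEY", "name": "TEXT", "segment": "TEXT"},
    "fact_sales":   {"sale_id": "INTEGER PRIMARY KEY", "customer_id": "INTEGER",
                     "sale_date": "TEXT", "amount": "REAL"},
}

def generate_ddl(tables: dict) -> str:
    """Turn the metadata dictionary into CREATE TABLE statements."""
    statements = []
    for name, columns in tables.items():
        cols = ",\n    ".join(f"{col} {ctype}" for col, ctype in columns.items())
        statements.append(f"CREATE TABLE {name} (\n    {cols}\n);")
    return "\n\n".join(statements)

print(generate_ddl(TABLES))
```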

    Unfortunately, for several reasons the concept of data warehouse automation is not a silver bullet when it comes to the implementation of analytical environments.

    One reason is that you really shouldn't automate concepts before you fully understand them. As the saying goes, don't put your problems on roller skates. Automating a broken process only means that you make mistakes faster. Now, while I often advocate the concept of failing faster to find the best solution to an analytical problem, I don't really agree with the concept of provisioning flawed database structures very quickly only to rebuild them later.

    Another issue with applying devops to analytical practices is that the software development community has a 10-15 year head start on the analytical community when it comes to productizing elements of their craft.

    Software developers have spent years learning how to best encapsulate their designs into object-oriented design, package that knowledge, and put it in libraries for use by other parts of the organization, or even by other organizations. Unfortunately, the design, architecture, and implementation of analytical components, such as data models, dashboard design, and database administration, are viewed as an art and still experience cultural resistance to the concept that a process can repeat the artistry of a data model or a dashboard design.

    Finally, there is the myth that data warehouse automation or any devops practice can replace the true thought processes that go into the design of an analytical environment.

    With the right processes and cultural buy-in, DWA will provide an organization with the ability to leverage their technical teams and improve the implementation time of changes in analytical environments. However, without that level of discipline to standardize the right components and embrace artistry on the tricky bits, organizations will take the concept of data warehouse automation and fail miserably in their efforts to automate.

    The following is good advice for any DWA practice:

    • Use the right design process and engage the analytical implementation teams. Without this level of forethought and cultural buy-in, the process becomes more of an issue than it does a benefit and actually takes longer to implement than a traditional approach.
    • Find the right technologies to use. There are DWA platforms available to use, but there are also toolsets such as scripting and development environments that can provide much of the implementation value of a data warehouse automation solution. The right environment for your team's skills and budget will go a long way to either validating a DWA practice or showing its limitations.
    • Iterate and improve. Just as DWA is designed to iterate the development of analytical environments, data warehouse automation practices should have the same level of iteration. Start small. Perfect the implementation. Expand the scope. Repeat.

    Source: Infoworld

  • Business Intelligence in 3PL: Mining the Value of Data

    In today’s business world, “information” is a renewable resource and virtually a product in itself. Business intelligence technology enables businesses to capture historical, current and predictive views of their operations, incorporating such functions as reporting, real-time analytics, data and process mining, performance management, predictive analytics, and more. Thus, information in its various forms and locations possesses genuine inherent value.
     
    In the real world of warehousing, the availability of detailed, up-to-the-minute information on virtually every item in the operators’ custody, from inbound dock to delivery site, leads to greater efficiency in every area it touches. It follows that greater profitability ensues.
     
    Three areas of 3PL operations seem to benefit most from the savings opportunities identified through business intelligence solutions: labor, inventory, and analytics.
    In the first case, business intelligence tools can help determine the best use of the workforce, monitoring its activity in order to assure maximum effective deployment. The result: potentially major jumps in efficiency, dramatic reductions in downtime, and healthy increases in productivity and billable labor.
     
    In terms of inventory management, the metrics obtainable through business intelligence can stem inventory inaccuracies that would have resulted in thousands of dollars in annual losses, while also reducing write-offs.
     
    Analytics through business intelligence tools can also accelerate the availability of information, as well as provide the optimal means of presentation relative to the type of user. One such example is tracking the real-time status of workload by room or warehouse area; supervisors can leverage real-time data to re-assign resources to where they are needed in order to balance workloads and meet shipping times. A well-conceived business intelligence tool can locate and report on a single item within seconds and a couple of clicks.
     
    Extending the Value
    The value of business intelligence tools is definitely not confined to the product storage areas.
     
    With automatically analyzed information available in a dashboard presentation, users – whether in the office or on the warehouse floor – can view the results of their queries/searches in a variety of selectable formats, choosing the presentation based on its usefulness for a given purpose. Examples:
    • Status checks can help identify operational choke points, such as if/when/where an order has been held up too long; if carrier wait-times are too long; and/or if certain employees have been inactive for too long.
    • Order fulfillment dashboards can monitor orders as they progress through the picking, staging and loading processes, while also identifying problem areas in case of stalled processes.
    • Supervisors walking the floor with handheld devices can both encourage team performance and, at the same time, help assure efficient dock-side activity. Office and operations management are able to monitor key metrics in real-time, as well as track budget projections against actual performance data.
    • Customer service personnel can call up business intelligence information to assure that service levels are being maintained or, if not, institute measures to restore them.
    • And beyond the warehouse walls, sales representatives in the field can access mined and interpreted data via mobile devices in order to provide their customers with detailed information on such matters as order fill rates, on-time shipments, sales and order volumes, inventory turnover, and more.
    Thus, well-designed business intelligence tools not only can assemble and process both structured and unstructured information from sources across the logistics enterprise, but can deliver it “intelligently” – that is, optimized for the person(s) consuming it. These might include frontline operators (warehouse and clerical personnel), front line management (supervisors and managers), and executives.
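
    By way of illustration only (the article does not describe a specific implementation), the first status check above might reduce to something like the following sketch, with invented order fields and dwell-time thresholds.

```python
from datetime import datetime, timedelta

# Invented example data: order id, current stage, and when it entered that stage.
orders = [
    {"order": "A-1001", "stage": "picking", "entered": datetime(2016, 12, 6, 8, 0)},
    {"order": "A-1002", "stage": "staging", "entered": datetime(2016, 12, 6, 10, 30)},
    {"order": "A-1003", "stage": "loading", "entered": datetime(2016, 12, 6, 11, 45)},
]

# Hypothetical dwell-time limits per stage before an order counts as "held up".
THRESHOLDS = {"picking": timedelta(hours=2), "staging": timedelta(hours=1),
              "loading": timedelta(minutes=30)}

def choke_points(orders, now):
    """Return orders that have sat in their current stage longer than allowed."""
    flagged = []
    for o in orders:
        dwell = now - o["entered"]
        if dwell > THRESHOLDS[o["stage"]]:
            flagged.append((o["order"], o["stage"], dwell))
    return flagged

print(choke_points(orders, now=datetime(2016, 12, 6, 12, 0)))
```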
     
    The Power of Necessity
    Chris Brennan, Director of Innovation at Halls Warehouse Corp., South Plainfield N.J., deals with all of these issues as he helps manage the information environment for the company’s eight facilities. Moreover, as president of the HighJump 3PL User Group, he strives to foster collective industry efforts to cope with the trends and issues of the information age as it applies to warehousing and distribution.
     
    “Even as little as 25 years ago, business intelligence was a completely different art,” Brennan has noted. “The tools of the trade were essentially networks of relationships through which members kept each other apprised of trends and happenings. Still today, the power of mutual benefit drives information flow, but now the enormous volume of data available to provide intelligence and drive decision making forces the question: Where do I begin?”
     
    Brennan has taken a leading role in answering his own question, drawing on the experience and insights of peers as well as the support of HighJump’s Enterprise 3PL division to bring Big Data down to size:
     
    “Business intelligence isn’t just about gathering the data,” he noted, “it’s about getting a group of people with varying levels of background and comfort to understand the data and act upon it. Some managers can glance at a dashboard and glean everything they need to know, but others may recoil at a large amount of data. An ideal BI solution has to relay information to a diverse group of people and present challenges for them to think through.”
     
    Source: logisticviewpoints.com, December 6, 2016
  • Data governance: using factual data to form subjective judgments


    Data warehouses were born of the finance and regulatory age. When you peel away the buzzwords, the principal goal of this initial phase of business intelligence was the certification of truth. Warehouses helped to close the books and analyze results. Regulations like Dodd-Frank wanted to make sure that you took special care to certify the accuracy of financial results and Basel wanted certainty around capital liquidity and on and on. Companies would spend months or years developing common metrics, KPIs, and descriptions so that a warehouse would accurately represent this truth.

    In our professional lives, many items still require this certainty. There can only be one reported quarterly earnings figure. There can only be one number of beds in a hospital or factories available for manufacturing. However, an increasing number of questions do not have this kind of tidy right and wrong answer. Consider the following:

    • Who are our best customers?
    • Is that loan risky?
    • Who are our most effective employees?
    • Should I be concerned about the latest interest rate hike?

    Words like best, risky, and effective are subjective by their very nature. Jordan Morrow (Qlik) writes and speaks extensively about the importance of data literacy and uses a phrase that has always felt intriguing: data literacy requires the ability to argue with data. This is key when the very nature of what we are evaluating does not have neat, tidy truths.

    Let’s give an example. A retail company trying to liquidate its winter inventory has asked three people to evaluate the best target list for an e-mail campaign (their three approaches are sketched in code after the list below).

    • John downloads last year’s campaign results and collects the names and e-mail addresses of the 2% that responded to the campaign last year with an order.
    • Jennifer thinks about the problem differently. She looks through sales records of anyone who has bought winter merchandise in the past 5 years during the month of March who had more than a 25% discount on the merchandise. She notices that these people often come to the web site to learn about sales before purchasing. Her reasoning is that a certain type of person who likes discounts and winter clothes is the target.
    • Juan takes yet another approach. He looks at social media feeds of brand influencers. He notices that there are 100 people with 1 million or more followers and that social media posts by these people about product sales traditionally cause a 1% spike in sales for the day as their followers flock to the stores. This is his target list.
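
    Purely as a toy illustration, the three approaches can be expressed as filters over invented in-memory record sets; in practice each analyst would of course query the real campaign, sales, and social media systems.

```python
# Invented record sets standing in for the real source systems.
campaign_2015 = [{"email": "a@x.com", "responded": True},
                 {"email": "b@x.com", "responded": False}]
sales_history = [{"email": "c@x.com", "month": 3, "category": "winter", "discount": 0.30},
                 {"email": "d@x.com", "month": 7, "category": "winter", "discount": 0.10}]
influencers   = [{"handle": "@fashionista", "followers": 2_500_000},
                 {"handle": "@smalltimer", "followers": 40_000}]

# Three different, equally defensible target lists built from different data.
john     = {r["email"] for r in campaign_2015 if r["responded"]}
jennifer = {r["email"] for r in sales_history
            if r["category"] == "winter" and r["month"] == 3 and r["discount"] > 0.25}
juan     = {r["handle"] for r in influencers if r["followers"] >= 1_000_000}

print(john, jennifer, juan, sep="\n")
```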

    So who has the right approach? This is where the ability to argue with data becomes critical. In theory, each of these people should feel confident developing a sales forecast on his or her model. They should understand the metric that they are trying to drive and they should be able to experiment with different ideas to drive a better outcome and confidently state their case.

    While this feels intuitive, enterprise processes and technologies are rarely set up to support this kind of vibrant analytics effort. This kind of analytics often starts with the phrase “I wonder if…”, while conventional IT and data governance frameworks are generally not able to deal with questions that a person did not know they had six months before. And yet, “I wonder if” relies upon data that may have been unforeseen. In fact, it usually requires connecting data sets that have often never been connected before to drive break-out thinking. Data science is about identifying the variables and metrics that might be better predictors of performance. This relies on the analysis of new, potentially unexpected data sets such as social media followers, campaign results, web clicks, sales behavior, and so on. Each of these items might be important for an analysis, but in a world in which it is unclear what is and is not important, how can a governance organization anticipate and apply the same dimensions of quality to all of the hundreds of data sets that people might use? And how can it apply the same kind of rigor to data quality standards for the hundreds of thousands of data elements available, as opposed to the 100-300 critical data elements?

    They can’t. And that’s why we need to re-evaluate the nature of data governance for different kinds of analytics.

    Author: Joe Dos Santos

    Source: Qlik

  • Data warehouse automation: what you need to know

    In the dark about data warehousing? You’re not alone

    You would be forgiven for not knowing data warehousing exists, let alone that it’s been automated. It’s not a topic that gets a lot of coverage in the UK, unlike in the USA and Europe. It might be that Business Intelligence and Big Data Analytics are topics that have more ‘curb’ appeal. But, without data warehousing, data analytics would not generate the quality of business intelligence that organisations rely on. So what is a data warehouse and why did it need to be automated?

    Here’s what you need to know about data warehouse automation.

    In its most basic form a data warehouse is a repository where all your data is put, so that it can be analysed for business insight, and most businesses have one. Your customers will most likely have one because they need the kind of insight data analysis provides. Business Insight or Intelligence (BI) helps the business make accurate decisions, stay competitive and ultimately remain profitable.

    In retail, for example, the accurate and timely reporting of sales, inventory, discounts and profit is critical to getting a consolidated view of the business at all levels and at all locations. In addition, analysing customer data can inform businesses which promotions work, which products sell, which locations work best, what loyalty vouchers and schemes are working, and which are not. Knowing customer demographics can help retailers to cross or upsell items. By analysing customer data companies can tailor products to the right specification, at the right time thereby improving customer relations and ultimately increasing customer retention.

    Analysing all the data

    But, this is only part of the picture. The best intelligence will come from an analysis of all the data the company has. There are several places where companies get data. They usually have their own internal systems that have finance data, HR data, sales data, and other data specific to its business. In addition, most of your customers will now also collect data from the internet and social media (Big Data), with new data coming in from sensors, GPS and smart devices (IoT data). The data warehouse can pull any kind of data from any source into one single place for analysis. A lack of cross-pollination across the business can lead to missed opportunities and a limited corporate view.

    Previously, getting the data from its source (internal or external) into the data warehouse involved writing code by hand. This was monotonous, slow and laborious. It meant that the data warehouse took months to build and was then rigidly tied to the coding (and therefore the design) it had been built with. Any changes that needed to be made were equally slow and time-consuming, creating frustration for both IT and the business. For the business, the data often took so long to be produced that it was out of date by the time they had it.

    Automation

    Things have moved on since the days of the traditional data warehouse, and now the design and build of a data warehouse is automated, optimised and wizard-driven. It means that the coding is generated automatically. With automation, data is available at the push of a button. Your customers don’t have to be IT experts to create reports, and employees don’t need to ask head office if they want information on a particular product line. Even more importantly, when you automate the data warehouse lifecycle you make it agile, so as your business grows and changes the warehouse can adapt. As we all know, it’s a false economy to invest in a short-term solution which, in a few years, will not be fit for purpose. Equally, it’s no good paying for excellent business intelligence tools and fancy reporting dashboards if the data underneath is not fully accessible, accurate and flexible.

    What does this mean for the channel?

    So now you know the importance of a data warehouse for data analytics, and how automation has brought data warehousing into the 21st century. So, what next? What does this mean for the channel?

    Not everyone in the channel will be interested in automation. Faster, more efficient projects might not look like they will generate the immediate profit margins or revenue of longer, slower ones. But innovative channel partners will be able to see that there are two clear advantages for them. One is that the projects, whilst shorter, never really end, which means there is a consistent stream of income. Secondly, by knowing about and offering clients data warehouse automation, the channel partner demonstrates their expertise and consultancy abilities.

    The simple fact is that most companies have a data warehouse of some kind, from the giant supermarkets such as Tesco and Sainsbury to smaller businesses like David Lloyd or Jersey Electricity. You don’t want to be the channel partner who didn’t know about, or didn’t recommend, the best and most efficient solution for your client. This could impact more than just the immediate sales. By educating your customers about the benefits of data warehouse automation you will bring a wealth of efficiencies to their company, and most likely a wealth of future recommendations to yours.

    Source: ChannelPro

  • Data warehousing: ETL, ELT, and the use of big data


    If your company keeps up with the trends in data management, you likely have encountered the concepts and definitions of data warehouse and big data. When your data professionals try to implement data extraction for your business, they need a data repository. For this purpose, they can use a data warehouse and a data lake.

    Roughly speaking, a data lake is mainly used to gather and preserve unstructured data, while a data warehouse is intended for structured and semi-structured data.

    Data warehouse modeling concepts

    All data in a data warehouse is well organized, archived, and arranged in a particular way. Not all data that can be gathered from multiple sources reaches a data warehouse. The source of data is crucial since it impacts the quality of data-driven insights and hence business decisions.

    During the data warehouse development phase, a lot of time and effort is needed to analyze data sources and select useful ones. Whether a data source has value or not depends on the business processes; data only gets into the warehouse once its value is confirmed.

    On top of that, the way data is represented in your database plays a critical role. Concepts of data modeling in a data warehouse are a powerful expression of business requirements specific to a company. A data model determines how data scientists and software engineers will design, create, and implement a database.

    There are three basic types of modeling. A conceptual data model describes all entities a business needs information about. It provides facts about real-world things, customers, and other business-related objects and relations.
    The goal of creating this data model is to synthesize and store all the data needed to gain an understanding of the whole business. This model is designed for the business audience.

    A logical data model goes into more depth. It describes the structure of data elements, their attributes, and the ways these elements interrelate. For instance, this model can be used to identify relationships between customers and the products of interest to them. This model is characterized by a high level of clarity and accuracy.

    A physical data model describes the specific data and relationships needed for a particular case, as well as the way the data model is used in the database implementation. It provides a wealth of metadata and makes it easier to visualize the structure of a database. This metadata can cover access paths, constraints, indexes, and other features.
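
    As a small, hypothetical illustration of how a logical description ends up as a physical model, a simple ‘customer orders product’ relationship might be implemented as follows; the names are invented and SQLite is used only because it ships with Python.

```python
import sqlite3

# The logical model names the entities and their relationships; the physical
# model adds concrete types, keys, and indexes for the target database.
ddl = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE product (
    product_id  INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE sales_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    product_id  INTEGER NOT NULL REFERENCES product(product_id),
    order_date  TEXT NOT NULL
);
CREATE INDEX idx_order_customer ON sales_order(customer_id);  -- physical-level detail
"""

con = sqlite3.connect(":memory:")
con.executescript(ddl)
print([row[0] for row in con.execute("SELECT name FROM sqlite_master WHERE type='table'")])
```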

    ELT and ETL data warehouse concepts

    Large amounts of data sorted for warehousing and analytics require a special approach. Businesses need to gather and process data to retrieve meaningful insights. Thus, data should be manageable, clean, and suitable for molding and transformation.

    ETL (extract, transform, load) and ELT (extract, load, transform) are the two approaches that have technological differences but serve the same purpose – to manage and analyze data.

    ETL is the paradigm of extracting data from multiple sources and pulling it into a single database to serve the business.

    In the first stage of the ETL process, engineers extract data from different databases and gather it in a single place. The collected data is then transformed to take the form required by the target repository. Finally, the data arrives in a data warehouse or another target database.

    If you switch the letters 'T' and 'L', you get the ELT process. After retrieval, the data can be loaded straight into the target database. Cloud technology enables large, scalable storage, so massive datasets can be loaded first and then transformed according to business requirements and needs.

    The ELT paradigm is a newer alternative to the well-established ETL process. It is flexible and allows raw data to be processed at speed. On the one hand, ELT requires special tools and frameworks; on the other, it gives BI and data analytics experts unrestricted access to business data, saving them a great deal of time.
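
    To make the difference concrete, here is a deliberately tiny Python sketch (invented source rows and a SQLite stand-in for the warehouse) that runs the same job both ways: transforming before loading (ETL) versus loading the raw rows and transforming inside the target (ELT).

```python
import sqlite3

source_rows = [("2016-01-05", "100.50"), ("2016-01-06", "87.25")]  # invented raw extract

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales_etl (sale_date TEXT, amount REAL)")
con.execute("CREATE TABLE sales_raw (sale_date TEXT, amount TEXT)")
con.execute("CREATE TABLE sales_elt (sale_date TEXT, amount REAL)")

# ETL: transform in the pipeline, then load the cleaned rows.
cleaned = [(d, float(a)) for d, a in source_rows]
con.executemany("INSERT INTO sales_etl VALUES (?, ?)", cleaned)

# ELT: load the raw rows first, then transform inside the target database.
con.executemany("INSERT INTO sales_raw VALUES (?, ?)", source_rows)
con.execute("INSERT INTO sales_elt SELECT sale_date, CAST(amount AS REAL) FROM sales_raw")

print(con.execute("SELECT * FROM sales_etl").fetchall())
print(con.execute("SELECT * FROM sales_elt").fetchall())
```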

    ETL testing concepts are also essential to ensure that data is loaded into a data warehouse correctly and accurately. This testing involves data verification at transitional phases: before data reaches its destination, its quality and usefulness have already been verified.

    Types of data warehouse for your company

    Different data warehouse concepts presuppose the use of particular techniques and tools to work with data. Basic data warehouse concepts also differ depending on a company’s size and the purposes for which it uses data.

    An enterprise data warehouse enables a unique approach to organizing, visualizing, and representing all the data across a company. Data can be classified by subject and accessed based on that attribute.

    A data mart is a subcategory of a data warehouse designed for specific tasks in business areas such as retail, finance, and so forth. Data comes into a data mart straight from the sources.

    An operational data store satisfies the reporting needs within a company. It is updated in real time, which makes this solution well suited for keeping all business records.

    Big data and data warehouse ambiguity

    A data warehouse is an architecture that has proved its value for data storage over the years. It involves data that has a defined value and can be used from the start to solve particular business needs. Everyone can access this data, and the datasets are reliable and accurate.

    Big data is a hyped field these days. It is the technology that allows retrieving data from heterogeneous sources. The key features of big data are volume, velocity or data streams, and a variety of data formats. Unlike a data warehouse, big data is a repository that can hold unstructured data as well.

    Companies seek to adopt custom big data solutions to unlock useful information that can help improve decision-making. These solutions help drive revenue, increase profitability, and cut customer churn thanks to the comprehensive information collected and available in one place.

    Data warehouse implementation brings advantages in terms of making informed decisions. It provides comprehensive insight into what is going on within a company, while big data can take the shape of massive but disorganized datasets. However, big data can later be used for data warehousing.

    Running a data-driven business means dealing with billions of data points on in-house and external operations, consumers, and regulations.

    Author: Katrine Spirina

    Source: In Data Labs

  • Dealing with the challenges of data migration


    Once you have decided to migrate your data warehouse to a cloud-based database, the hard and risky work of data migration begins.

    Organizations of all sizes and maturities already have data warehouses deployed and in operation. Modernizing, upgrading, or otherwise improving an incumbent warehouse regularly involves migrating data from platform to platform, and migrations today increasingly move data from on-premises to cloud systems. This is because replatforming is a common data warehouse modernization strategy, whether you will rip-and-replace the warehouse's primary platform or augment it with additional data platforms.

    Even when using an augmentation strategy for data warehouse modernization, 'data balancing' is an inevitable migration task as you redistribute data across the new combination of old and new platforms.

    In a related direction, some data warehouse modernization strategies simplify bloated and redundant portfolios of databases (or take control of rogue data marts and analytics sandboxes) by consolidating them onto fewer platforms, with cloud-based databases increasingly serving as a consolidation platform.

    In all these modernization strategies, the cloud plays an important role. For example, many organizations have a cloud-first mandate because they know that cloud computing is the future of data center infrastructure. In addition, the cloud is a common target for data warehouse modernization because cloud-based data platforms are the most modern ones available for warehouses today.

    Finally, a cloud is an easily centralized and globally available platform, which makes it an ideal target for data consolidation, as well as popular use cases such as analytics, self-service data practices, and data sharing across organizational boundaries.

    Users who modernize a data warehouse need to plan carefully for the complexity, time, business disruption, risks, and costs of migrating and/or consolidating data onto cloud-based platforms suitable for data warehousing, as follows.

    Avoid a big bang project

    That kind of plan attempts to modernize and migrate too much too fast. The large size and complexity of deliverables raises the probability of failure. By comparison, a project plan with multiple phases will be a less risky way to achieve your goals for modernization and cloud migration. A multiphase project plan segments work into multiple manageable pieces, each with a realistic technical goal that adds discernable business value.

    The first deliverable should be easy but useful 

    For example, successful data migration or replatforming projects should focus the first phase on a data subset or use case that is both easy to construct and in high demand by the business. Prioritize early phases so they give everyone confidence by demonstrating technical prowess and business value. Save problematic phases for later.

    Cloud migration is not just for data 

    You are also migrating (or simply redirecting the access of) business processes, groups of warehouse end users, reports, applications, analysts, developers, and data management solutions. Your plan should explain when and how each entity will be migrated or redirected to cloud. Managers and users should be involved in planning to ensure their needs are addressed with minimal disruption to business operations.

    Manage risk with contingency plans

    Expect to fail, but know that segmenting work into phases has the added benefit of limiting the scope of failure. Be ready to recover from failed phases via roll back to a prior phase state. Don't be too eager to unplug the old platforms because you may need them for roll back. It is inevitable that old and new data warehouse platforms (both on premises and on clouds) will operate simultaneously for months or years depending on the size and complexity of the data, user groups, and business processes you are migrating.

    Beware lift-and-shift projects

    Sometimes you can 'lift and shift' data from one system to another with minimal work, but usually you cannot. Even when lift and shift works, developers need to tweak data models and interfaces for maximum performance on the new platform. A replatforming project can easily turn into a development project when data being migrated or consolidated requires considerable work.

    In particular, organizations facing migrations of older applications and data to cloud platforms should assume that lift and shift will be inadequate because old and new platforms (especially when strewn across on-premises and cloud systems) will differ in terms of interfaces, tool or platform functionality, and performance characteristics. When the new platform offers little or no backward compatibility with the old one, development may be needed for platform-specific components, such as stored procedures, user-defined functions, and hand-coded routines.

    Improve data, don't just move it

    Problems with data quality, data modeling, and metadata should be remediated before or during migration. Otherwise you're just bringing your old problems into the new platform. In all data management work, when you move data you should also endeavor to improve data.
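
    As a minimal, hypothetical sketch of what ‘improving while moving’ can look like, the snippet below validates and repairs rows in flight and sets aside rejects for remediation at the source; the field names and rules are invented.

```python
import re

# Illustrative only: a migration step that cleans rows instead of copying
# problems to the new platform.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def remediate(rows):
    """Split rows into clean ones (to load) and rejects (to fix at the source)."""
    clean, rejects = [], []
    for row in rows:
        row = dict(row)
        row["country"] = row.get("country", "").strip().upper() or "UNKNOWN"
        if EMAIL_RE.match(row.get("email", "")):
            clean.append(row)
        else:
            rejects.append(row)
    return clean, rejects

legacy_rows = [{"email": "jane@example.com", "country": " nl "},
               {"email": "not-an-email", "country": ""}]
clean, rejects = remediate(legacy_rows)
print(clean)
print(rejects)
```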

    Assemble a diverse team for modernizing and replatforming a data warehouse 

    Obviously, data management professionals are required. Data warehouse modernization and replatforming usually need specialists in warehousing, integration, analytics, and reporting. When tweaks and new development are required, experts in data modeling, architecture, and data languages may be needed. Don't overlook the maintenance work required of database administrators (DBAs), systems analysts, and IT staff. Before migrating to a cloud-based data warehouse platform, consider hiring consultants or new employees who have cloud experience, not just data management experience. Finally, do not overlook the need for training employees on the new cloud platform.

    Data migrations affect many types of people 

    Your plan should accommodate them all. A mature data warehouse will serve a long list of end users who consume reports, dashboards, metrics, analyses, and other products of data warehousing and business intelligence. These people report to a line-of-business manager and other middle managers. Affected parties (i.e., managers and sometimes end users, too) should be involved in planning a data warehouse modernization and migration to cloud. First, their input should affect the whole project from the beginning so they get what they need to be successful with the new cloud data warehouse. Second, the new platform roll-out should take into consideration the productivity and process needs of all affected parties.

    Coordinate with external parties when appropriate

    In some scenarios, such as those for supply chain, e-commerce, and business-to-business relationships, the plan for migration to cloud should also stipulate dates and actions for partners, suppliers, clients, customers, and other external entities. Light technical work may be required of external parties, as when customers or suppliers have online access to reports or analytics supported by a cloud data warehouse platform.

    Author: Philip Russom

    Source: TDWI

  • Decision making by smart technology

    That is the name of the congress on Business Intelligence & Data Warehousing that Heliview is organising on Tuesday 27 January 2015 in ‘s Hertogenbosch. According to many sources, business intelligence remains on the priority list of Dutch organisations. The amount of structured and unstructured data is growing at record speed, and this data is of inestimable value to organisations. Business intelligence enables organisations to process data intelligently into the right information, thereby saving time and money and staying ahead of the competition. Smart organisations are increasingly also successful organisations.

    The Heliview congress (chaired by BI-kring initiator Egbert Philips) puts the classic BI triangle centre stage. Speakers such as Rick van der Lans, Arent van ‘t Spijker and many others will discuss how organisations can make better decisions by deploying current technological possibilities for data and information processing in a smart, tailored way. On the technology side, 27 January will focus on social BI, mobile BI, business analytics and data warehousing in the cloud.

    Read more about the congress here

     

  • Three components of Agile BI at Alliander


    The utility company Alliander manages energy networks that distribute gas and electricity in a large part of the Netherlands and has around 3.3 million customers. Alliander wants to respond to the unpredictability of both the energy market and technological developments, and to be a ‘data-driven’ network operator.

    Having state-of-the-art BI & Analytics solutions at your disposal and yet having to deliver data dumps for Excel reports? This is not something an organisation wants, but it is often the reality, and let's be honest: in your organisation too. People find that BI projects take too long, so ‘Excelerados’ get built instead. At Alliander we face this as well, and this manual alternative is obviously undesirable. The presence within Alliander of expensive BI solutions in a rigid, inflexible architecture with long development times is not desirable either. We have therefore applied three components in developing BI projects to gain more agility: Scrum, a data provisioning layer equivalent to the logical data warehouse, and data profiling.

    Within Alliander we recognise at least four problem areas in this way of working with an outdated architecture: ‘Excelerados’, expensive BI solutions, inflexibility and long development times. An Agile product design has therefore proved essential for Alliander to realise our ambition and take on these challenges. The Agile product came about using three techniques: Alliander's Data Provisioning Layer (DPL) as a Logical Data Warehouse; responding flexibly to changing information needs with Agile Data Modeling (the so-called account-based model); and direct feedback from the end user using Data Profiling and Scrum. We want to explain these in a series of three blogs. This is the first.

    Data Provisioning Layer as Logical Data Warehouse
    The heart of agile product development is the architecture you work with. At Alliander, the agility of the architecture has been shaped as a Logical Data Warehouse. The Data Provisioning Layer (DPL) is the part of the architecture that is deployed as the Logical Data Warehouse.

    The Data Provisioning Layer makes data available from the various traditional sources (including, for example, existing data warehouses) that we know within Alliander, but also data from outside Alliander, for example from the Central Connection Register (CAR) or the Chamber of Commerce (KvK). The DPL also makes real-time data available, for example from the electricity or gas grid, so that it can be combined with other data, such as measurements from a substation (telemetry).

    By working with views in the DPL, with virtual information models underneath, it makes no difference to data consumers where the data comes from. This enables us, for example, to combine transaction data from our ERP system very quickly with geographical data from our GIS systems, or with real-time data from the grids.

    A dashboard or other application is based on a view from the DPL, with the data coming, for example, directly from an operational data store or from the source system. If a dimensional data model is needed, for instance for performance reasons, the existing information model stays intact and so does the application built on it for the user.
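
    As a minimal sketch of this decoupling idea (not Alliander's actual DPL; the table and column names are invented, and SQLite stands in for the virtualisation layer), consumers query a view while the sources behind it can change without touching the consuming query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Two invented "sources": ERP transactions and GIS asset locations.
con.executescript("""
    CREATE TABLE erp_transactions (asset_id INTEGER, amount REAL);
    CREATE TABLE gis_assets       (asset_id INTEGER, region TEXT);
    INSERT INTO erp_transactions VALUES (1, 120.0), (2, 80.0);
    INSERT INTO gis_assets       VALUES (1, 'Gelderland'), (2, 'Noord-Holland');

    -- The "DPL" view: consumers see one information model, not the sources.
    CREATE VIEW v_asset_costs AS
        SELECT g.region, SUM(t.amount) AS total_amount
        FROM erp_transactions t JOIN gis_assets g USING (asset_id)
        GROUP BY g.region;
""")

# A dashboard query depends only on the view; the sources behind it can change.
print(con.execute("SELECT * FROM v_asset_costs ORDER BY region").fetchall())
```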

    Conclusions
    By deploying the virtualised DPL according to the Logical Data Warehouse concept, we have been able to achieve the following benefits:

    • short delivery times for reports, by decoupling data sources from data consumers;
    • combination of external and internal sources;
    • access to up-to-date data at all times;
    • access to big data sources.

    [Diagram: the virtualised DPL layer in the Alliander architecture]

    The virtualised layer, labelled DPL in the diagram above, enables faster integration of the various sources, which leads to better results.

    In the second part of this blog we will look at the next technique applied: Agile Data Modeling.


    Hüseyin Kara is Senior BI Consultant at Alliander.
    Sam Geurts is Scrum Master Data & Inzicht at Alliander.

  • Five factors to help select the right data warehouse product

    How big is your company, and what resources does it have? What are your performance needs? Answering these questions and others can help you select the right data warehouse platform.

    Once you've decided to implement a new data warehouse, or expand an existing one, you'll want to ensure that you choose the technology that's right for your organization. This can be challenging, as there are many data warehouse platforms and vendors to consider.

    Long-time data warehouse users generally have a relational database management system (RDBMS) such as IBM DB2, Oracle or SQL Server. It makes sense for these companies to expand their data warehouses by continuing to use their existing platforms. Each of these platforms offers updated features and add-on functionality (see the sidebar, "What if you already have a data warehouse?").

    But the decision is more complicated for first-time users, as all data warehousing platform options are available to them. They can opt to use a traditional DBMS, an analytic DBMS, a data warehouse appliance or a cloud data warehouse. The following factors may help make the decision process easier.

    1. How large is your company?

    Larger companies looking to deploy data warehouse systems generally have more resources, including financial and staffing, which translates to more technology options. It can make sense for these companies to implement multiple data warehouse platforms, such as an RDBMS coupled with an analytical DBMS such as Hewlett Packard Enterprise (HPE) Vertica or SAP IQ. Traditional queries can be processed by the RDBMS, while online analytical processing (OLAP) and nontraditional queries can be processed by the analytical DBMS. Nontraditional queries aren't usually found in transactional applications typified by quick lookups. This could be a document-based query or a free-form search, such as those done on Web search sites like Google and Bing.

    For example, HPE Vertica offers Machine Data Log Text Search, which helps users collect and index large log file data sets. The product's enhanced SQL analytics functions deliver in-depth capabilities for OLAP, geospatial and sentiment analysis. An organization might also consider SAP IQ for in-depth OLAP as a near-real-time service to SAP HANA data.

    Teradata Corp.'s Active Enterprise Data Warehouse (EDW) platform is another viable option for large enterprises. Active EDW is a database appliance designed to support data warehousing that's built on a massively parallel processing architecture. The platform combines relational and columnar capabilities, along with limited NoSQL capabilities. Teradata Active EDW can be deployed on-premises or in the cloud, either directly from Teradata or through Amazon Web Services.

    For midsize organizations, where a mixture of flexibility and simplicity is important, reducing the number of vendors is a good idea. That means looking for suppliers that offer compatible technology across different platforms. For example, Microsoft, IBM and Oracle all have significant software portfolios that can help minimize the number of other vendors an organization might need. Hybrid transaction/analytical processing (HTAP) capabilities that enable a single DBMS to run both transaction processing and analytics applications should also appeal to midsize organizations.

    Smaller organizations and those with minimal IT support should consider a data warehouse appliance or a cloud-based data warehouse as a service (DWaaS) offering. Both options make it easier to get up and running, and minimize the administration work needed to keep a data warehouse functional. In the cloud, for example, Amazon Redshift and IBM dashDB offer fully managed data warehousing services that can lower up-front implementation costs and ongoing management expenses.

    Regardless of company size, it can make sense for an organization to work with a vendor or product that it has experience using. For example, companies using Oracle Database might consider the Oracle Exadata Database Machine, Oracle's data warehouse appliance. Exadata runs Oracle Database 12c, so Oracle developers and DBAs should immediately be able to use the appliance. Also, the up-front system planning and integration required for data warehousing projects is eliminated with Exadata because it bundles the DBMS with compute, storage and networking technologies.

    A similar option for organizations that use IBM DB2 is the IBM PureData System for Analytics, which is based on DB2 for LUW. Keep in mind, however, that data warehouse appliances can be costly, at times pricing themselves out of the market for smaller organizations.

    Microsoft customers should consider the preview release of Microsoft Azure SQL Data Warehouse. It's a fully managed data warehouse service that's compatible and integrated with the Microsoft SQL Server ecosystem.

    2. What are your availability and performance needs?

    Other factors to consider include high availability and rapid response. Most organizations that decide to deploy a data warehouse will likely want both, but not every data warehouse actually requires them.

    When availability and performance are the most important criteria, DWaaS should be at the bottom of your list because of the lower speed imposed by network latency with cloud access. Instead, on-premises deployment can be tuned and optimized by IT technicians to deliver increased system availability and faster performance at the high end. This can mean using the latest features of an RDBMS, including the HTAP capabilities of Oracle Database, or IBM's DB2 with either the IBM DB2 Analytics Accelerator add-on product for DB2 for z/OS or BLU Acceleration capabilities for DB2 for LUW. Most RDBMS vendors offer capabilities such as materialized views, bitmap indexes, zone maps, and high-end compression for data and indexes. For most users, however, satisfactory performance and availability can be achieved with data warehouse appliances such as IBM PureData, Teradata Active EDW and Oracle Exadata. These platforms are engineered for data warehousing workloads, but require minimal tuning and administration.

    Another appliance to consider is the Actian Analytics Platform, which is designed to support high-speed data warehouse implementation and management. The platform combines relational and columnar capabilities, and also includes high-end features for data integration, analytics and performance. It can be a good choice for organizations requiring both traditional and nontraditional data warehouse queries. The Actian Analytics Platform includes Actian Vector, a symmetric multiprocessing (SMP) DBMS designed for high-performance analytics. It exploits newer, performance-oriented hardware features such as single instruction, multiple data (SIMD), which applies a single operation to a set of data values at once, and the use of CPU cache as execution memory.

    Pivotal Greenplum is an open source, massively parallel data warehouse platform capable of delivering high-speed analytics on large volumes of data. The platform combines relational and columnar capabilities and can be deployed on-premises as software or an appliance, or as a service in the cloud. Given its open source orientation, Pivotal Greenplum may be viewed favorably by organizations basing their infrastructure on an open source computing stack.

    3. Are you already in the cloud?

    DWaaS is probably the best option for companies that already conduct cloud-based operations. The other data warehouse platform options would require your business to move data from the cloud to an on-premises data warehouse. Keep in mind, though, that in addition to cloud-only options like Amazon Redshift, IBM dashDB and Microsoft Azure SQL Data Warehouse, many data warehouse platform providers offer cloud-based deployments.

    4. What are your data volume and latency requirements?

    Although many large data warehouses contain petabytes of raw data, every data warehouse implementation has different data storage needs. The largest data warehouses are usually customized combinations of RDBMS and analytic DBMS or HTAP implementations. As data volume requirements diminish, more varied options can be utilized, including data warehouse appliances.

    5. Is a data warehouse part of your big data strategy?

    Big data requirements have begun to impact the data warehouse, and many organizations are integrating unstructured and multimedia data into their data warehouse to combine analytics with business intelligence requirements -- aka polyglot data warehousing. If your project could benefit from integrated polyglot data warehousing, you need a platform that can manage and utilize this type of data. For example, the big RDBMS vendors -- IBM, Oracle and Microsoft -- are integrating support for nontraditional data and Hadoop in each of their respective products.

    You may also wish to consider IBM dashDB, which can process unstructured data via its direct integration with IBM Cloudant, enabling you to store and access JSON and NoSQL data. The Teradata Active EDW supports Teradata's Unified Data Architecture, which enables organizations to seamlessly access and analyze relational and nonrelational data. The Actian Analytics Platform delivers a data science workbench, simplifying analytics, as well as a scaled-out version of Actian Vector for processing data in Hadoop. Last, the Microsoft Azure SQL Data Warehouse enables analysis across many kinds of data, including relational data and semi-structured data stored in Hadoop, using its T-SQL language.
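
    As a hedged illustration of that last point, the sketch below submits a T-SQL query from Python via pyodbc, joining a relational table with an external, Hadoop-backed table; the server, credentials and table names are placeholders, and the external table is assumed to have been defined beforehand (for example, with PolyBase).

```python
# Minimal sketch: one T-SQL query spanning relational and Hadoop-backed data.
# All connection details and table names below are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=yourserver.database.windows.net;DATABASE=yourdw;"
    "UID=youruser;PWD=yourpassword"
)

query = """
SELECT s.region, COUNT(*) AS clicks
FROM   dbo.Sales AS s                 -- relational fact table
JOIN   ext.WebClickLogs AS c          -- external table over files in Hadoop/Blob storage
  ON   s.customer_id = c.customer_id
GROUP BY s.region;
"""

for region, clicks in conn.cursor().execute(query):
    print(region, clicks)

conn.close()
```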

    Although organizations have been building data warehouses since the 1980s, the manner in which they are being implemented has changed considerably. After reading this four-part series, you should have a better idea of how modern data warehouses are built and what each of the leading vendors provides. Armed with this knowledge, you can make a more informed choice when purchasing data warehouse products.

    Source: TechTarget

  • Forrester names SAS a Leader for Enterprise Insight Platforms

    SAS has been named a Leader in The Forrester Wave: Enterprise Insight Platforms, Q1 2019. The report notes that "SAS Viya is a modern architecture with a single, powerful analytical engine. The SAS platform also offers the tightest integration we have seen between different analytics capabilities, data preparation and governance."

    Today’s successful organizations are driven by analytical insights, not intuition. SAS Viya on the SAS Platform gives companies industry-leading analytics capabilities to support business decisions with both speed and scalability. Using a solid, coherent environment, organizations can manipulate and explore their data at scale. SAS Viya also provides access to advanced analytics and artificial intelligence (AI), with an additional layer of transparency and interpretability for AI-generated decisions. This opens up the ‘black box’ of AI for both data scientists and business users.

    The full analytics life cycle

    According to Sarah Gates, Product Marketing Manager for the SAS Platform, companies want to rely on a comprehensive platform that orchestrates data management, analytics and development tools to generate insights that support their decisions. “Producing results that are both fast and reliable is critical. SAS Viya provides this support across the full analytics life cycle - from data to discovery to deployment.”

    The Forrester report notes that SAS has a first-class toolset for analytics, forecasting and streaming. “Support for notebooks, multi-language programming and more cloud options round out SAS Viya and make it a good choice for enterprises with business-critical analytics needs.” SAS scored highest in the analytics tools category and received the highest possible score in the market presence category.

     

    Source: BI-Platform

     

  • From Patterns to Predictions: Harnessing the Potential of Data Mining in Business  


    Data mining techniques can be applied across various business domains such as operations, finance, sales, marketing, and supply chain management, among others. When executed effectively, data mining provides a trove of valuable information, empowering you to gain a competitive advantage through enhanced strategic decision-making.

    At its core, data mining is a method employed for the analysis of data, delving into large datasets to unearth meaningful and data-driven insights. Key components of successful data mining encompass tasks like data cleaning, data transformation, and data integration.

    Data Cleaning and Preparation

    Data cleaning and preparation stand as crucial stages within the data mining process, playing a pivotal role in ensuring the effectiveness of subsequent analytical methods. The raw data necessitates purification and formatting to render it suitable for diverse analytic approaches. Encompassing elements such as data modeling, transformation, migration, ETL (Extract, Transform, Load), ELT (Extract, Load, Transform), data integration, and aggregation, this phase is indispensable for comprehending the fundamental features and attributes of data, ultimately determining its optimal utilization.

    The business implications of data cleaning and preparation are inherently clear. Without this initial step, data holds either no meaning for an organization or is compromised in terms of reliability due to quality issues. For companies, establishing trust in their data is paramount, ensuring confidence not only in the data itself but also in the analytical outcomes and subsequent actions derived from those results.

    Pattern and Classification

    The essence of data mining lies in the fundamental technique of tracking patterns, a process integral to discerning and monitoring trends within data. This method enables the extraction of intelligent insights into potential business outcomes. For instance, upon identifying a sales trend, organizations gain a foundation for taking strategic actions to leverage this newfound insight. When it’s revealed that a specific product outperforms others within a particular demographic, this knowledge becomes a valuable asset. Organizations can then capitalize on this information by developing similar products or services tailored to the demographic or by optimizing the stocking strategy for the original product to cater to the identified consumer group.

    In the realm of data mining, classification techniques play a pivotal role by scrutinizing the diverse attributes linked to various types of data. By discerning the key characteristics inherent in these data types, organizations gain the ability to systematically categorize or classify related data. This process proves crucial in the identification of sensitive information, such as personally identifiable data, prompting organizations to take measures to protect or redact this information from documents.
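
    As a hedged, product-agnostic illustration, the short Python sketch below trains a simple classifier to flag records that look like they contain personally identifiable data; the features and labels are invented for the example.

```python
# Minimal sketch of classification: label records by category from their attributes.
# The feature values and labels here are made-up illustrative data.
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Each row: [contains_email_pattern, contains_16_digit_number, avg_word_length]
X = [[1, 0, 4.2], [0, 1, 3.9], [0, 0, 5.1], [1, 1, 4.0],
     [0, 0, 4.8], [1, 0, 3.7], [0, 1, 4.4], [0, 0, 5.0]]
y = [1, 1, 0, 1, 0, 1, 1, 0]  # 1 = likely contains personally identifiable data

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression().fit(X_train, y_train)

print(model.predict(X_test))        # predicted classes for unseen records
print(model.score(X_test, y_test))  # simple accuracy check
```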

    Connections

    The concept of association in data mining, closely tied to statistics, unveils connections among different sets of data or events within a dataset. This technique highlights the interdependence of specific data points or events, akin to the idea of co-occurrence in machine learning. In this context, the presence of one data-driven event serves as an indicator of the likelihood of another, shedding light on the intricate relationships embedded within the data.
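
    A minimal sketch of the idea, counting how often pairs of items co-occur in the same (made-up) transactions:

```python
# Minimal sketch of association analysis: counting how often pairs of items
# co-occur in the same transaction. The transactions are made-up examples.
from itertools import combinations
from collections import Counter

transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"coffee", "bread", "butter"},
    {"coffee", "milk"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Pairs seen together most often hint at an association rule worth investigating.
for pair, count in pair_counts.most_common(3):
    print(pair, count)
```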

    Outlier Detection

    Outlier detection serves as a critical process in identifying anomalies within datasets. When organizations pinpoint irregularities in their data, it facilitates a deeper understanding of the underlying causes and enables proactive preparation for potential future occurrences, aligning with strategic business objectives. To illustrate, if there’s a notable surge in credit card transactions within specific time frames, organizations can leverage this information to investigate the root cause. Understanding why this surge happens allows them to optimize sales strategies for the remainder of the day, showcasing the practical application of outlier detection in refining business operations.
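
    A minimal sketch of the idea, flagging hours whose (invented) transaction volume deviates by more than two standard deviations:

```python
# Minimal sketch of outlier detection using a simple deviation threshold on
# hourly transaction counts; the figures are illustrative only.
import statistics

hourly_tx = [120, 131, 118, 125, 122, 410, 119, 128]  # one suspicious surge

mean = statistics.mean(hourly_tx)
stdev = statistics.stdev(hourly_tx)

outliers = [(hour, value) for hour, value in enumerate(hourly_tx)
            if abs(value - mean) > 2 * stdev]
print(outliers)  # hours whose volume deviates more than two standard deviations
```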

    Clustering

    Clustering, a pivotal analytics technique, employs visual approaches to comprehend data distributions. Utilizing graphics, clustering mechanisms illustrate how data aligns with various metrics, employing different colors to highlight these distributions. Graphs, particularly in conjunction with clustering, offer a visual representation of data distribution, allowing users to discern trends relevant to their business objectives.
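
    A hedged sketch using scikit-learn's k-means on made-up two-dimensional points (say, purchase frequency versus average basket value):

```python
# Minimal sketch of clustering with k-means; the two-dimensional points are made up.
from sklearn.cluster import KMeans

points = [[1, 2], [1, 4], [1, 0],
          [10, 2], [10, 4], [10, 0]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
print(kmeans.labels_)           # cluster assignment per point
print(kmeans.cluster_centers_)  # centroids, useful for visual inspection
```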

    Regression

    Regression techniques prove invaluable in identifying the nature of relationships between variables in a dataset. Whether causal or correlational, regression, as a transparent white box technique, elucidates the precise connections between variables. Widely applied in forecasting and data modeling, regression provides a clear understanding of how variables interrelate.
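
    A minimal sketch, fitting a straight line with NumPy to invented advertising-spend and sales figures:

```python
# Minimal sketch of linear regression: quantifying how advertising spend
# relates to sales. The numbers are illustrative.
import numpy as np

ad_spend = np.array([10, 20, 30, 40, 50], dtype=float)
sales    = np.array([25, 44, 68, 81, 105], dtype=float)

slope, intercept = np.polyfit(ad_spend, sales, deg=1)
print(f"sales ~ {slope:.2f} * ad_spend + {intercept:.2f}")

# The fitted line can then be used for simple forecasting:
print(slope * 60 + intercept)  # expected sales at a spend of 60
```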

    Prediction

    Prediction stands as a potent facet of data mining, constituting one of the four branches of analytics. Predictive analytics leverage patterns in current or historical data to extrapolate insights into future trends. While some advanced approaches incorporate machine learning and artificial intelligence, predictive analytics can also be facilitated through more straightforward algorithms. This predictive capability offers organizations a foresight into upcoming data trends, irrespective of the complexity of the underlying techniques.

    Sequential Data

    Sequential patterns, a specialized data mining technique, focus on unveiling events that occur in a sequence, which is particularly advantageous for analyzing transactional data. This method can reveal customer preferences, such as the type of clothing a customer is likely to purchase after acquiring a specific item. Understanding these sequential patterns empowers organizations to make targeted recommendations, stimulating sales while still preserving the confidentiality of transactional data and the privacy of customers.
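
    A minimal sketch of the idea, counting which item most often follows another across made-up purchase sequences:

```python
# Minimal sketch of sequential-pattern counting: which item most often follows
# another in a customer's purchase history. Sequences are made-up examples.
from collections import Counter

purchase_sequences = [
    ["running shoes", "running socks", "water bottle"],
    ["running shoes", "running socks"],
    ["jeans", "belt"],
    ["running shoes", "water bottle"],
]

follow_ups = Counter()
for seq in purchase_sequences:
    for current_item, next_item in zip(seq, seq[1:]):
        follow_ups[(current_item, next_item)] += 1

# The most frequent (item, next item) pairs can drive targeted recommendations.
print(follow_ups.most_common(2))
```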

    Decision Trees

    Decision trees, a subset of machine learning, serve as transparent predictive models that make it easy to understand how data inputs influence outputs. When combined into a random forest, decision trees form powerful, albeit more complex, predictive analytics models. While a random forest may be considered a black box technique, aggregating many decision trees generally improves accuracy compared with a standalone decision tree model.
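
    A hedged sketch comparing a single decision tree with a random forest on scikit-learn's bundled iris data:

```python
# Minimal sketch contrasting a single decision tree with a random forest on the
# same toy data; in practice the forest usually generalizes better.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)

print(cross_val_score(tree, X, y, cv=5).mean())    # single, interpretable model
print(cross_val_score(forest, X, y, cv=5).mean())  # ensemble of trees
```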

    Data Mining Analytics

    At the heart of data mining analytics lie statistical techniques, forming the foundation for various analytical models. These models produce numerical outputs tailored to specific business objectives. From neural networks to machine learning, statistical concepts drive these techniques, contributing to the dynamic field of artificial intelligence.

    Data Visualizations

    Data visualizations play a crucial role in data mining, offering users insights based on sensory perceptions. Today’s dynamic visualizations, characterized by vibrant colors, are adept at handling real-time streaming data. Dashboards, built upon different metrics and visualizations, become powerful tools to uncover data mining insights, moving beyond numerical outputs to visually highlight trends and patterns.

    Deep Learning

    Neural networks, a subset of machine learning, draw inspiration from the human brain’s neuron structure. While potent for data mining, their complexity necessitates caution. Despite the intricacy, neural networks stand out as accurate models in contemporary machine learning applications, particularly in AI and deep learning scenarios.

    Data Warehousing

    Data warehousing, a pivotal component of data mining, has evolved beyond traditional relational databases. Modern approaches, including cloud data warehouses and those accommodating semi-structured and unstructured data in platforms like Hadoop, enable comprehensive, real-time data analysis, extending beyond historical data usage.

    Analyzing Insights

    Long-term memory processing involves the analysis of data over extended periods. Utilizing historical data, organizations can identify subtle patterns that might evade detection otherwise. This method proves particularly useful for tasks such as analyzing attrition trends over several years, providing insights that contribute to reducing churn in sectors like finance.

    ML and AI

    Machine learning and artificial intelligence represent cutting-edge advancements in data mining. Advanced forms like deep learning excel in accurate predictions at scale, making them invaluable for AI deployments such as computer vision, speech recognition, and sophisticated text analytics using natural language processing. These techniques shine in extracting value from semi-structured and unstructured data.

    Conclusion

    In data mining, each technique serves as a distinct tool for uncovering valuable insights. From the discernment of sequential patterns to the transparent predictability of decision trees, the foundational role of statistical techniques, and the dynamic clarity of visualizations, the array of methods presents a holistic approach. These techniques empower organizations to not only analyze data effectively but also to innovate strategically in an ever-evolving data landscape, ensuring they harness the full potential of their data for informed decision-making and transformative outcomes.

    Date: December 5, 2023

    Author: Anas Baig

    Source: Dataversity

  • Gartner positions Microsoft as a leader in the Magic Quadrant for Operational Database Management Systems

    Microsoft is placed furthest in vision and highest for ability to execute within the Leaders Quadrant.

    With the release of SQL Server 2014, the cornerstone of Microsoft’s data platform, we have continued to add more value to what customers are already buying. Innovations like workload-optimized in-memory technology, advanced security, and high availability for mission-critical workloads are built in instead of requiring expensive add-ons. We have long maintained that customers need choice and flexibility to navigate this mobile-first, cloud-first world, and that Microsoft is uniquely equipped to deliver on that vision both in trusted on-premises environments and in the cloud.

    Industry analysts have taken note of our efforts and we are excited to share Gartner has positioned Microsoft as a Leader, for the third year in a row, in the Magic Quadrant for Operational Database Management Systems. Microsoft is placed furthest in vision and highest for ability to execute within the Leaders Quadrant.

    Given customers are trying to do more with data than ever before across a variety of data types, at large volumes, the complexity of managing and gaining meaningful insights from the data continues to grow.  One of the key design points in Microsoft data strategy is ensuring ease of use in addition to solving complex customer problems. For example, you can now manage both structured and unstructured data through the simplicity of T-SQL rather than requiring a mastery in Hadoop and MapReduce technologies. This is just one of many examples of how Microsoft values ease of use as a design point. 

    Gartner also recognizes Microsoft as a leader in the Magic Quadrant for Business Intelligence and Analytics Platforms and placed Microsoft as a leader in the Magic Quadrant for Data Warehouse Database Management Systems – recognizing Microsoft’s completeness of vision and ability to execute in the data warehouse market.

    Offering only one piece of the data puzzle isn’t enough to satisfy all the different scenarios in today’s environments and workloads. Our commitment is to make it easy for customers to capture and manage data and to transform and analyze that data for new insights.

    Being named a leader in Operational DBMS, BI & Analytics Platforms, and DW DBMS Magic Quadrants is incredibly important to us: We believe it validates Microsoft is delivering a comprehensive platform that ensures every organization, every team and every individual is empowered to do more and achieve more because of the data at their fingertips.

     

  • Hybrid data and the modern operational data warehouse


    New requirements for data, software, and business practices are driving a new wave of modernization for the operational data warehouse.

    Modern enterprises looking for growth in revenue and profitability know that data is critically important to gaining competitive advantage. A high return on investment comes from digitally transforming business operations by capturing and analyzing a greater variety and volume of data to inform better business insights. When data sets with extremely diverse structures and characteristics are integrated for multiple use cases in operations and analytics, we call the resulting data set hybrid data.

    Valuable hybrid data comes from an increasing number of different sources, both old and new, internal and external. It is inevitable that hybrid data will arrive in many structures, schemas, and formats with variable characteristics for volume, latency (from batch to streams), concurrency, requirements for storage and in situ processing, and the emerging characteristics of machine data (IoT standards, geocoding, events, images, audio, training data for machine learning, etc.).

    From a technology viewpoint, it is challenging to integrate data of such diverse characteristics. From a business viewpoint, however, integration is well worth the effort because it provides deeper visibility into business processes and richer analytics insights than were possible before hybrid data's greater variety emerged.

    Hybrid data architectures

    Hybrid data usually drives users to deploy many types of database management systems and other data platforms (such as Hadoop and cloud) to capture, store, process, and analyze hybrid data. After all, it's difficult or impossible to optimize a single instance of a single data platform type to satisfy the eclectic requirements of hybrid data's multiple structures, latencies, storage paradigms, and analytics processing methods.

    The diversification of data and the quickening adoption of advanced analytics are some of the strongest drivers toward hybrid data architectures, so called because hybrid data is increasingly distributed across multiple platforms, both on premises and on one or more clouds. For some use cases, the right tool for a particular data type might be sufficient. However, there is more value in integrating access and analysis in a hybrid data architecture that can deliver the scale and performance needed to produce actionable insights.

    The modern operational data warehouse (ODW)

    Hybrid data and hybrid data architectures are already here. To get full business value from them, you need an appropriate data management platform, and that's where the modern operational data warehouse comes in. The modern ODW delivers insights from a hybrid data architecture quickly enough to impact operational business decisions.

    The operational data warehouse continues to focus on speed 

    Note that the operational data warehouse has been with us for decades, sometimes under synonyms such as the real-time, active, or dynamic data warehouse. No matter what you call it, the operational data warehouse has always involved high-performance data ingestion and query so that data travels as fast as possible into and out of the warehouse.

    Through analysis, an ODW provides timely insights for time-sensitive decisions such as real-time offers in e-commerce, network optimization, fraud detection, and investment decisions in trading environments. However, an ODW also supports time-sensitive operational processes such as just-in-time inventory, business monitoring, and operational reporting.

    Performance and real-time requirements continue to apply to the ODW. However, a modern ODW must also handle a broader range of data types and sources at unprecedented scale as well as new forms of analytics. The modern ODW satisfies requirements old and new largely by leveraging the speed and scale of new data platforms and analytics tools.

    The modern ODW is a hybrid data management solution and is hybrid in multiple ways 

    It integrates hybrid data from multiple operational systems and other sources. The modern ODW is built to handle modern data, which trends toward hybrid combinations. Furthermore, an implementation of a modern ODW may itself be hybrid when it spans both on-premises and cloud systems. In addition, a modern ODW tends to have substantial data integration capabilities that integrate data among the source and target systems of a hybrid data architecture.

    The best ODWs operate with very low latency

    A modern ODW is built for today's hybrid data and business use cases that demand real-time or near-real-time performance. Low-latency use cases supported by modern ODWs include real-time analytics, operational reporting, management dashboards, business activity monitoring, catching fraud before cash leaves the ATM, and making an offer before the potential customer leaves the store or website.

    A modern ODW is strong where other approaches are weak 

    For example, the traditional enterprise data warehouse is great as a corporate 'single source of truth' but inflexible and expensive. Operational data stores are fast in a limited domain but not extensible to larger enterprise needs. Data lakes are great for storing big and varied data economically but poor at data governance and predictable performance.

    By comparison, a modern ODW is built on the latest technology for superior speed, scale, maintenance, functionality, and cost containment. In addition, a modern ODW assumes that leveraging hybrid data is its raison d'etre, so it is built to handle an extremely broad range of data types at massive scale with extremely high performance.

    Given this daunting list of system requirements, it is unlikely that a user organization can satisfy even half of them with a homegrown system that was built by IT groups or consultants. Therefore, users should seek vendor-built systems designed and optimized for modern operational data warehousing.

    A successful ODW leverages recent advancements in data platforms and tools 

    These include parallel execution, columnar databases, in-memory execution, high-speed storage, distributed file systems, scalable clusters, elastic clouds, cloud-based databases, and managed services for cloud data solutions. Because of the extreme diversity of hybrid data, a successful ODW will interoperate via many access methods (such as R, Scala, SQL, or GUI), accommodate a wide variety of user skills (from data scientist to business user), and flexibly support new deployment models (data center, public cloud, private cloud, managed service, multicloud, and hybrid cloud, alone or in any combination).

    Author: Philip Russom

    Source: TDWI

  • Master Data Management and the role of (un)structured data

    Traditional conversations about master data management’s utility have centered on determining what actually constitutes MDM, how to implement data governance with it, and the balance between IT and business involvement in the continuity of MDM efforts.

    Although these concerns will always remain apposite, MDM’s overarching value is projected to significantly expand in 2018 to directly create optimal user experiences—for customers and business end users. The crux of doing so is to globalize its use across traditional domains and business units for more comprehensive value.

    “The big revelation that customers are having is how do we tie the data across domains, because that reference of what it means from one domain to another is really important,” Stibo Systems Chief Marketing Officer Prashant Bhatia observed.

    The interconnectivity of MDM domains is invaluable not only for monetization opportunities via customer interactions, but also for streamlining internal processes across the entire organization. Oftentimes the latter facilitates the former, especially when leveraged in conjunction with contemporary opportunities related to the Internet of Things and Artificial Intelligence.

    Structured and Unstructured Data

    One of the most eminent challenges facing MDM related to its expanding utility is the incorporation of both structured and unstructured data. Fueled in part by the abundance of external data besieging the enterprise from social, mobile, and cloud sources, unstructured and semi-structured data can pose difficulties to MDM schema.

    After attending the recent National Retail Federation conference with over 30,000 attendees, Bhatia noted that one of the primary themes was, “Machine learning, blockchain, or IoT is not as important as how does a company deal with unstructured data in conjunction with structured data, and understand how they’re going to process that data for their enterprise. That’s the thing that companies—retailers, manufacturers, etc.—have to figure out.”

    Organizations can integrate these varying data types into a single MDM platform by leveraging emerging options for schema and taxonomies with global implementations, naturally aligning these varying formats together. The competitive advantage generated from doing so is virtually illimitable. 

    Original equipment manufacturers and equipment asset management companies can attain real-time, semi-structured or unstructured data about failing equipment and use that to influence their product domain with attributes informing the consequences of a specific consumer’s tire, for example. The aggregation of that semi-structured data with structured data in an enterprise-spanning MDM system can influence several domains. 

    Organizations can reference it with customer data for either preventive maintenance or discounted purchase offers. The location domain can use it to provide these services close to the customer; integrations with lifecycle management capabilities can determine what went wrong and how to correct it. “That IoT sensor provides so much data that can tie back to various domains,” Bhatia said. “The power of the MDM platform is to tie the data for domains together. The more domains that you can reference with one another, you get exponential benefits.”

    Universal Schema

    Although the preceding example pertained to the IoT, it’s worth noting that it’s applicable to virtually any data source or type. MDM’s capability to create these benefits is based on its ability to integrate different data formats on the back end. A uniformity of schema, taxonomies, and data models is desirable for doing so, especially when using MDM across the enterprise. 

    According to Franz CEO Jans Aasman, traditionally “Master Data Management just perpetuates the difficulty of talking to databases. In general, even if you make a master data schema, you still have the problem that all the data about a customer, or a patient, or a person of interest is still spread out over thousands of tables.” 

    Varying approaches can address this issue; there is growing credence around leveraging machine learning to obtain master data from various stores. Another approach is to considerably decrease the complexity of MDM schema so it’s more accessible to data designated as master data. By creating schema predicated on an exhaustive list of business-driven events, organizations can reduce the complexity of myriad database schemas (or even of conventional MDM schemas) so that their “master data schema is incredibly simple and elegant, but does not lose any data,” Aasman noted.

    Global Taxonomies

    Whether simplifying schema based on organizational events and a list of their outcomes or using AI to retrieve master data from multiple locations, the net worth of MDM is based on the business’s ability to inform the master data’s meaning and use. The foundation of what Forrester terms “business-defined views of data” is oftentimes the taxonomies predicated on business use as opposed to that of IT. Implementing taxonomies enterprise-wide is vital for the utility of multi-domain MDM (which compounds its value) since frequently, as Aasman indicated, “the same terms can have many different meanings” based on use case and department.

    The hierarchies implicit in taxonomies are extremely useful in this regard, since they enable consistency across the enterprise yet allow subsets for various business domains. According to Aasman, the Financial Industry Business Ontology (FIBO) can also function as a taxonomy in which, “The higher level taxonomy is global to the entire bank, but the deeper you go in a particular business you get more specific terms, but they’re all bank specific to the entire company.” 

    The ability of global taxonomies to link together meaning in different business domains is crucial to extracting value from cross-referencing the same master data for different applications or use cases. In many instances, taxonomies provide the basis for search and queries that are important for determining appropriate master data.

    Timely Action

    By expanding the scope of MDM beyond traditional domain limitations, organizations can redouble the value of master data for customers and employees. By simplifying MDM schema and broadening taxonomies across the enterprise, they increase their ability to integrate unstructured and structured data for timely action. “MDM users in a B2B or B2C market can provide a better experience for their customers if they, the retailer and manufacturer, are more aware and educated about how to help their end customers,” Bhatia said.

     

    Author: Jelani Harper

    Source: Information Management

  • The differences between data lakes and data warehouses: a brief explanation


    When comparing data lake vs. data warehouse, it's important to know that these two things actually serve quite different roles. They manage data differently and serve their own types of functions.

    The market for data warehouses is booming. One study forecasts that the market will be worth $23.8 billion by 2030. Demand is growing at an annual pace of 29%.

    While there is a lot of discussion about the merits of data warehouses, not enough discussion centers around data lakes. 

    Both data warehouses and data lakes are used to store big data, but they are not the same. A data warehouse is a repository for filtered, structured data that has already been processed for a particular purpose, while a data lake is a massive pool of raw data whose purpose has not yet been defined.

    Many people confuse the two, but their only real similarity is the high-level principle of storing data. It is vital to know the difference, because they serve different purposes and require different sets of eyes to be adequately optimized. A data lake may work well for one company, while a data warehouse is a better fit for another.

    This blog lays out how the data warehouse and the data lake differ. Their most notable differences are listed below.

    Data Lake

    • Data type: Structured and unstructured data from many different sources
    • Purpose: Cost-efficient big data storage
    • Users: Data engineers and data scientists
    • Tasks: Storing data as well as big data analytics, such as real-time analytics and deep learning
    • Size: Stores all data that might be used, regardless of whether it is needed yet

    Data Warehouse

    • Data type: Historical data that has been structured to fit a relational database schema
    • Purpose: Analytics for business decisions
    • Users: Business analysts and data analysts
    • Tasks: Read-only queries for summarizing and aggregating data
    • Size: Stores only data relevant to the analysis

    Data Type

    Data cleaning is a vital skill, because data arrives in imperfect and messy forms. Raw data that has not been cleaned is known as unstructured data; this includes chat logs, pictures and PDF files. Data that has been cleaned to fit a schema, sorted into tables, and defined by relationships and types is known as structured data. This is a vital distinction between data warehouses and data lakes.

    Data warehouses contain historical information that has been cleaned to fit a relational schema. Data lakes, on the other hand, store data from an extensive array of sources such as real-time social media streams, Internet of Things devices, web app transactions and user data. Some of this data is structured, but much of it is messy, arriving exactly as it was ingested from the source.

    Purpose

    When it comes to purpose, a data lake is used for cost-efficient storage of significant amounts of data from various sources. Accepting data of any structure keeps costs down, because the lake is flexible and scalable and the data does not have to fit a particular schema. Structured data, on the other hand, is easier to analyze because it is cleaner and conforms to a single schema that can be queried. By limiting data to a schema, a data warehouse is very useful for analyzing historical data to support specific business decisions.

    You might notice that the two complement each other in a typical data workflow. Ingested data is stored immediately in the data lake. Once a particular business question arises, the relevant portion of the data is taken out of the lake, cleaned and exported to the data warehouse.

    Users

    Each has different applications, but both are valuable to different users. Business analysts and data analysts often work in a data warehouse that contains clearly relevant data that has already been processed for the job. A data warehouse requires a lower level of data science and programming skill to use.

    Engineers set up and maintain data lakes and incorporate them into the data pipeline. Data scientists also work closely with data lakes because they contain data of a broader and more current scope.

    Tasks

    Engineers use data lakes to store incoming data, but data lakes are not restricted to storage. Unstructured data is scalable and flexible, which makes it well suited to big data analytics. Big data analytics can run directly on data lakes using tools such as Apache Spark and Hadoop, as the sketch below illustrates. This is especially true for deep learning, which needs to scale with a growing volume of training data.
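
    As a hedged illustration of that kind of lake-side processing, the PySpark sketch below reads raw JSON event files straight from a hypothetical data lake path and aggregates them; the path and field names are assumptions made for the example.

```python
# Minimal sketch: analytics directly on raw files in a data lake with PySpark.
# The lake path and the "event_type" field are hypothetical placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lake-analytics-sketch").getOrCreate()

# Read semi-structured JSON event files exactly as they were landed in the lake
events = spark.read.json("s3a://company-data-lake/raw/web-events/*.json")

# Aggregate without having forced the data into a warehouse schema first
events.groupBy("event_type").count().orderBy("count", ascending=False).show()

spark.stop()
```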

    Data warehouses are usually set to read-only for users, especially those who primarily read and aggregate data for insights. Because the information is already clean and archival, there is usually no need to update or insert data.

    Size

    When it comes to size, a data lake is much bigger than a data warehouse, because it retains all information that may be pertinent to a business or organization. Data lakes frequently hold petabytes of data (a petabyte is 1,000 terabytes). The data warehouse, by contrast, is far more selective about what it stores.

    Understand the Significance of Data Warehouses and Data Lakes

    If you are choosing between a data warehouse and a data lake, review the categories above to determine which one meets your needs and fits your case. If you are interested in a deeper dive into the differences, or in learning how to build data warehouses, there are courses available online.

    Keep in mind that you will sometimes want a combination of these two storage solutions, especially when developing data pipelines.

    Author: Liraz Postan

    Source: Smart Data Collective

  • The most important BI trends to watch: perspective of data pipeline producer Fivetran


    There’s nothing more critical to successful BI than centralized data. TDWI spoke to George Fraser, CEO and technical cofounder of Fivetran, who explains why this is so important, what technologies are important now, and what emerging technology you should be paying attention to.

    What technology or methodology must be part of an enterprise’s data strategy if it wants to be competitive nowadays?

    George Fraser: You need to have a tool that centralizes your data into a single location. You’re not going to be able to do all the things that you want to do with your data unless you have it all centralized.

    What one emerging technology are you most excited about and think has the greatest potential? What’s so special about this technology?

    GF: Data warehouses that separate compute from storage, such as Snowflake and BigQuery. These warehouses leverage the cloud in a fundamentally different way than earlier data warehouses. When you’re able to scale your compute up and down on demand, many of the problems that people had in the past with data warehouses just disappear. Professionals that spent 90% of their time working on data warehouse issues in the past can now spend their time on projects that drive value for the business. Everyone knows this technology is special, but it's even more special than people realize.

    What is the single biggest challenge enterprises face today? How do most enterprises respond (and is it working)?

    GF: I think the single biggest challenge businesses, not just enterprises, face is hiring great talent. We’ve been in a ten-year economic boom and the job market is about as tight as it can get.

    One thing that makes an organization a good place to work is transparency. A lot of people talk about using data to make better business decisions, which is important. But there is a second benefit that relates directly to finding talent and hiring, which is that you can create transparency with data.

    As an employer, you can ensure that everyone in your company across all departments and levels has access to the data being used to drive decisions. You can use data to create transparency so that everyone can understand what is going on in the business and why decisions are being made the way they are.

    Your employees should be able to see whether the things they’re doing are working or contributing to the overall goals of the business. This can be achieved through data and data transparency.

    Is there a new technology in data and analytics that is creating more challenges than most people realize? How should enterprises adjust their approach to it?

    GF: There are great new BI tools out there, such as Looker and Sigma Computing, as well as great new data warehouses, such as BigQuery and Snowflake. But I think what people sometimes don't realize is that these tools can't do anything about bringing the data into themselves. They give you a great environment to analyze your data, but they won't put the data in that environment. Although a lot of their marketing materials talk about getting all of your data in one place, they don’t actually solve that problem.

    Getting your data together is difficult. Oftentimes companies end up deciding to put together their data warehouse and do the ETL (extract, transform, load) with their own engineering team. They don't necessarily realize that they're signing up for a mammoth effort to centralize their data that is going to metastasize across the organization, consuming unthinkable amounts of time and resources.

    Enterprises should adjust their approach by using ELT tools (which are different than ETL) that specifically solve for the data centralization problem.

    Where do you see analytics and data management headed in 2019 and beyond? What’s just over the horizon that we haven’t heard much about yet?

    GF: I think we’re going to see a lot of the existing trends continue, including increased adoption of cloud data warehouses, which I think will become one of the dominant use cases of cloud computing. The main job of IT at a lot of traditional companies is data warehousing. As more and more traditional companies move to the cloud, we’re going to see data warehousing representing a growing percentage of what is being done in the cloud.

    Something just over the horizon I find very interesting is Azure Data Explorer. It is basically a complete reconceptualization of the data warehouse, including a totally new query language that is fascinating. Although it has just recently become publicly available, I am told it has been widely adopted within Microsoft. I’m interested to see how it plays out because it is so ambitious.

    Tell us about your product/solution and the problem it solves for enterprises.

    GF: Fivetran is a zero-configuration data pipeline that centralizes all your business data in your data warehouse. Centralizing your data is the hardest part of building an enterprise data warehouse, and we’ve built the only truly turnkey solution.

    Author: James E. Powell

    Source: TDWI

  • The Power of Real-Time Data Warehousing  


    Modern data management techniques, including real-time data warehousing, are transforming how businesses use information. By enabling the consistent integration of streaming data into conventional data warehouse solutions, they give businesses the tools they need to stay on the cutting edge of data-driven decision-making. This fusion makes it possible to continually ingest and analyze data as it flows in, ensuring that organizations always have access to the most recent information.

    Beyond improving operational efficiency, this transformation also makes it possible to take corrective action, monitor the business in real time, and react quickly to shifting market conditions. Real-time data warehousing has become essential for surviving in this age of data-driven competition.

    Making timely, well-informed decisions is necessary for staying ahead of the competition in today’s fast-paced corporate environment. Traditional data warehousing solutions have proved invaluable for maintaining and analyzing historical data, but they frequently fall short when it comes to offering real-time insights. To close this gap, organizations are increasingly adopting real-time data warehousing, which integrates streaming data for real-time intelligence. This post discusses real-time data warehousing and how it is changing the way businesses manage data.

    Data Warehousing’s Development

    Since its inception, data warehousing has advanced significantly. Data warehouses were initially created largely for the purpose of organizing and preserving historical data for use in reporting and analysis. They were distinguished by batch processing, in which data was periodically extracted, transformed, and loaded (ETL) into the warehouse, typically on a nightly basis. This method had some drawbacks, particularly when quick answers to important questions were needed.

    The Demand for Instantaneous Insights

    In the modern digital world, data is produced at an unparalleled rate: customers interact with enterprises online, IoT devices generate data continuously, and social media platforms produce constant streams of information. Organizations need real-time insights to make use of this wealth of data. Consider how an online retailer might adjust marketing strategies on the fly by watching website traffic and sales in real time, or how a bank could spot fraudulent transactions as they take place. Real-time data warehousing makes these scenarios possible.

    Real-time Data Warehousing: An Overview

    An architectural strategy called real-time data warehousing enables businesses to acquire, process, and analyze streaming data in real-time alongside their conventional historical data. This is achieved by combining streaming data platforms with established data warehousing techniques. Let’s examine some basic elements and tenets of real-time data warehousing.

    • Streaming Data Ingestion: Organizations utilize streaming data platforms like Apache Kafka or AWS Kinesis to ingest data in real time. These technologies enable the continual absorption of data in manageable pieces (a minimal ingestion sketch follows this list).
    • Stream Processing: After streaming data has been ingested, it is processed in real time. This may involve data aggregation, transformation and enrichment, using contemporary tools such as Spark Streaming and Apache Flink.
    • Integration with the Data Warehouse: The processed streaming data is then integrated with the traditional data warehouse, or with the closely related “lakehouse” concept, blending the advantages of real-time analytics with data warehousing.
    • Analytics and Querying: Business users can run real-time queries on both historical and streaming data. SQL-like query languages and robust analytical tools facilitate this, offering quick insights into shifting data trends.
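
    To make the ingestion step concrete, here is a minimal sketch that uses the kafka-python client; the topic name, broker address and event format are placeholder assumptions rather than details from any particular deployment.

```python
# Minimal sketch of the ingestion step with the kafka-python client.
# Topic name, broker address and event structure are hypothetical placeholders.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "clickstream-events",                      # hypothetical topic name
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="latest",
)

for message in consumer:
    event = message.value                      # a single real-time event
    # ...transform/enrich here, then append to a micro-batch for the warehouse
    print(event)
```

    In practice, each consumed event would be handed to a stream processor such as Spark Streaming or Flink, or appended to the warehouse in small micro-batches.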

    Real-time Data Warehousing Benefits

    Real-time data warehousing adoption benefits businesses in a number of ways.

    • Faster Decision-Making: With the help of real-time information, organizations can act quickly in response to rapidly changing market conditions and consumer behavior.
    • Better Customer Experiences: Personalized customer interactions based on real-time data allow businesses to improve customer satisfaction and loyalty.
    • Operational Efficiency: Operations such as supply chain management can be optimized with real-time data, becoming more cost-effective and efficient.
    • Competitive Advantage: Organisations that can make use of real-time data have an advantage over rivals in terms of innovation and reactivity.
    • Data Integrity: Real-time processing helps businesses spot and resolve data integrity problems as they arise, resulting in accurate and trustworthy insights.

    Challenges and Things to Think About

    While real-time data warehousing has many advantages, there are drawbacks as well:

    • Complexity: Setting up and maintaining real-time data warehousing may be challenging and need a high level of technical competence.
    • Cost: Real-time data warehousing solutions can be expensive to build and operate, especially when dealing with large amounts of data.
    • Data Security: Sensitive data must be safeguarded throughout transmission and storage, which raises security issues with real-time data streaming.
    • Scalability: For on-premises solutions in particular, ensuring scalability and performance as data quantities increase can be a challenging task.

    Conclusion

    The management and analysis of data in enterprises is changing as a result of real-time data warehousing. Organizations may make educated decisions in real time by integrating streaming data with conventional Data Warehouse Solutions, which improves customer experiences, operational effectiveness, and competitive advantages. Despite its obstacles, real-time data warehousing is becoming increasingly popular across industries and is a vital part of contemporary data management methods. Businesses that adopt real-time data warehousing will be better positioned to prosper in the digital era as the data landscape continues to change.

    For organizations looking for up-to-date insights, streaming data must be integrated into real-time data warehousing. The rapidly increasing amount of real-time data coming from sources like IoT devices and social media is too much for traditional data warehousing solutions, which are built for batch processing. 

    Businesses may obtain timely information, enable quicker decision-making, improve customer experiences, and maintain competitiveness in today’s fast-paced environment by embracing streaming data. Continuous data ingestion, real-time analytics, and quick reaction to shifting trends are all made possible by this change. In summary, the incorporation of streaming data into data warehousing enables businesses to fully utilize their data, spurring innovation and expansion.

    Date: October 9, 2023

    Author: James Warner 

    Source: Datafloq

  • Three trends that urge to modernization of data warehouses


    In the last couple of years, we’ve seen the rapid adoption of machine learning into the analytics environment, moving from science experiment to table stakes. In fact, at this point, I’m hard pressed to think of an enterprise that doesn’t have at least some sort of predictive or machine learning strategy already in place.

    Meanwhile, data warehouses have long been the foundation of analytics and business intelligence––but they’ve also traditionally been complex and expensive to operate. With the widespread adoption of machine learning and the increasing need to broaden access to data beyond just data science teams, we are seeing a fundamental shift in the way organizations should approach data warehousing.

    With this in mind, here are three broad data management trends I expect will accelerate this year:

    Operationalize insights with analytical databases

    I’m seeing a lot of convergence between machine learning and analytics. As a result, people are using machine learning frameworks such as R, Python, and Spark to do their machine learning.

    They then do their best to make those results available in ways that are accessible to the rest of the business, beyond just the data science team. These talented data scientists are hacking away with their own tools, but those tools are simply not going to be used by business analysts.

    How you get the best of both worlds is to allow data scientists to use their tools of choice to produce their predictions, but then publish those results to an analytical database, which is more open to business users. The business user is already familiar with tools like Tableau, so by using an analytical database they can easily operationalize insights from the predictive model outcomes.
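
    As a hedged illustration of that hand-off, the sketch below trains a toy model and publishes its scores to an analytical database table that a tool such as Tableau could then query; the connection string, table name and features are placeholder assumptions, not a prescription for any particular product.

```python
# Minimal sketch: score records with a model, then publish the results to an
# analytical database table that BI tools can read. Connection details and
# column names are hypothetical placeholders.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sqlalchemy import create_engine

# Toy training data standing in for the data scientist's feature set
train = pd.DataFrame({"recency": [5, 40, 2, 60], "frequency": [9, 1, 12, 2],
                      "churned": [0, 1, 0, 1]})
model = LogisticRegression().fit(train[["recency", "frequency"]], train["churned"])

scoring = pd.DataFrame({"customer_id": [101, 102],
                        "recency": [3, 55], "frequency": [10, 1]})
scoring["churn_probability"] = model.predict_proba(scoring[["recency", "frequency"]])[:, 1]

# Publish the scored rows where business analysts can query them
engine = create_engine("postgresql://user:password@analytics-db/warehouse")
scoring.to_sql("customer_churn_scores", engine, if_exists="replace", index=False)
```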

    Growth in streaming data sources

    Similar to the convergence of machine learning and analytics, I’m also seeing much greater interest in how to support streaming use cases or streaming data sources. 

    There are a number of technologies, among them Kafka, that provide a way to capture and propagate streams and do stream-based processing. Many systems from web analytics stacks to a single microservice in someone’s application stack are pushing out interesting events to a Kafka topic. But how do you consume that? 

    There are specialized streaming databases, for example, that allow you to consume this in real time. In some cases that works well but in others it's not as natural, especially when trending across larger data ranges. Accomplishing this is easier by pushing that streaming data into an analytics database.
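
    A minimal sketch of that pattern, using sqlite3 purely as a stand-in for an analytics database: events arriving from a stream are appended in micro-batches, and trending across a wider date range becomes a plain SQL aggregation.

```python
# Minimal sketch: streamed events accumulate in one table, and trends across a
# larger range are then a simple SQL aggregation. sqlite3 is only a stand-in.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (ts TEXT, event_type TEXT, value REAL)")

# A micro-batch of events as they might arrive from a Kafka topic
batch = [("2018-03-01 10:01", "page_view", 1), ("2018-03-01 10:03", "page_view", 1),
         ("2018-03-01 11:15", "purchase", 49.0), ("2018-03-02 09:40", "page_view", 1)]
db.executemany("INSERT INTO events VALUES (?, ?, ?)", batch)

# Trending across days is just a query against the accumulated data
for row in db.execute(
        "SELECT substr(ts, 1, 10) AS day, event_type, COUNT(*) "
        "FROM events GROUP BY day, event_type ORDER BY day"):
    print(row)
```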

    The ephemeral data mart

    The third trend I’m seeing more of, and I expect to accelerate in 2018, is what I would call the ephemeral data mart. 

    What I mean by that is to quickly bring together a data set, perform some queries, and then the data can be thrown away. As such, data resiliency and high availability become less important than data ingestion and computation speed. I’m seeing this in some of our customers and expect to see more.

    One customer in particular is using an analytics database to do processing of very large test results. By creating an ephemeral data mart for each test run, they can perform post-test analysis and trending, then just store the results for the longer term. 
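
    Here is a minimal sketch of the ephemeral pattern, again with an in-memory sqlite3 database standing in for an analytics database; the test-result schema is invented for illustration.

```python
# Minimal sketch of an ephemeral data mart: load one test run's results,
# compute the post-test summary, keep only the summary, and let the detailed
# data disappear when the connection closes.
import sqlite3

mart = sqlite3.connect(":memory:")          # exists only for this test run
mart.execute("CREATE TABLE test_results (test_case TEXT, latency_ms REAL, passed INTEGER)")
mart.executemany(
    "INSERT INTO test_results VALUES (?, ?, ?)",
    [("login", 120.5, 1), ("search", 340.2, 1), ("checkout", 910.8, 0), ("login", 118.0, 1)],
)

summary = mart.execute(
    "SELECT test_case, AVG(latency_ms), SUM(passed), COUNT(*) "
    "FROM test_results GROUP BY test_case"
).fetchall()

mart.close()                                 # raw results are thrown away
print(summary)                               # only the trend-worthy summary is kept
```

    Because nothing in the mart needs to survive the run, ingestion and computation speed matter far more than resiliency or high availability, which is exactly the trade-off described above.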

    As organizations need better and more timely analytics that fit within their hardware and cost budgets, it’s changing the ways data is accessed and stored. The trends I’ve outlined above are ones that I expect to gather steam this year, and can serve as guideposts for enterprises that recognize the need to modernize their approach to data warehouses.

    Author: Dave Thompson

    Source: Information Management

  • Why data lakes are the future of data storage


    The term big data has been around since 2005, but what does it actually mean? Exactly how big is big? We are creating data every second. It’s generated across all industries and by a myriad of devices, from computers to industrial sensors to weather balloons and countless other sources. According to a recent study conducted by Data Never Sleeps, there are a quintillion bytes of data generated each minute, and the forecast is that our data will only keep growing at an unprecedented rate.

    We have also come to realize just how important data really is. Some liken its value to something as precious to our existence as water or oil, although those aren’t really valid comparisons. Water supplies can fall and petroleum stores can be depleted, but data isn’t going anywhere. It only continues to grow, not just in volume but also in variety and velocity. Thankfully, over the past decade, data storage has become cheaper, faster and more easily available, and as a result, where to store all this information isn’t the biggest concern anymore. Industries working in the IoT and faster-payments space are now starting to push data through at very high speed, and that data is constantly changing shape.

    In essence, all this gives rise to a 'data demon'. Our data has become so complex that normal techniques for harnessing it often fail, keeping us from realizing data’s full potential.

    Most organizations currently treat data as a cost center. Each time a data project is spun off, there is an 'expense' attached to it. That’s contradictory: on the one hand we proclaim that data is our most valuable asset, but on the other we treat it as a liability. It’s time to change that perception, especially when it comes to banks. The volumes of data financial institutions hold can be used to create tremendous value. Note that I’m not talking about 'selling the data', but about leveraging it more effectively to provide crisp analytics that deliver knowledge and drive better business decisions.

    What’s stopping people from converting data from an expense to an asset, then? The technology and talent exist, but the thought process is lacking.

    Data warehouses have been around for a long time and traditionally were the only way to store large amounts of data that’s used for analytical and reporting purposes. However, a warehouse, as the name suggests, immediately makes one think of a rigid structure that’s limited. In a physical warehouse, you can store products in three dimensions: length, breadth and height. These dimensions, though, are limited by your warehouse’s architecture. If you want to add more products, you must go through a massive upgrade process. Technically, it’s doable, but not ideal. Similarly, data warehouses present a bit of rigidity when handling constantly changing data elements.

    Data lakes are a modern take on big data. When you think of a lake, you cannot define its shape and size, nor can you define what lives in it and how. Lakes just form, even if they are man-made. There is still an element of randomness to them and it’s this randomness that helps us in situations where the future is, well, sort of unpredictable. Lakes expand and contract, they change over periods of time, and they have an ecosystem that’s home to various types of animals and organisms. This lake can be a source of food (such as fish) or fresh water and can even be the locale for water-based adventures. Similarly, a data lake contains a vast body of data and is able to handle that data’s volume, velocity and variety.

    When mammoth data organizations like Yahoo, Google, Facebook and LinkedIn started to realize that their data and data usage were drastically different, and that it was almost impossible to analyze that data with traditional methods, they had to innovate. This in turn gave rise to technologies like document-based databases and big data engines such as Hadoop, Spark, HPCC Systems and others. These technologies were designed to allow the flexibility needed when handling unpredictable data inputs.
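    To illustrate the kind of flexibility this means in practice, here is a small, hypothetical PySpark sketch (the path and field names are made up) that reads raw, semi-structured event files straight out of a data lake and works out their structure at read time, rather than forcing them into a predefined warehouse schema first:

        from pyspark.sql import SparkSession
        from pyspark.sql import functions as F

        spark = SparkSession.builder.appName("lake-exploration").getOrCreate()

        # Schema-on-read: the JSON files in the lake can change shape over time;
        # Spark infers their structure when the data is read, not when it is stored.
        events = spark.read.json("s3a://example-data-lake/raw/events/*/*.json")

        # Ad-hoc analysis over whatever fields happen to be present.
        daily_counts = (
            events
            .withColumn("day", F.to_date("event_timestamp"))
            .groupBy("day", "event_type")
            .count()
        )

        daily_counts.show()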

    Jeff Lewis is SVP of Payments at Sutton Bank, a small community bank that’s challenging the status quo for other banks in the payments space. 'Banks have to learn to move on from data warehouses to data lakes. The speed, accuracy and flexibility of information coming out of a data lake is crucial to the increased operational efficiency of employees and to provide a better regulatory oversight', said Lewis. 'Bankers are no longer old school and are ready to innovate with the FinTechs of the world. A data centric thought process and approach is crucial for success'.

    Data lakes are a natural choice for handling the complexity of such data, and the application of machine learning and AI on top of them is becoming more common as well. From using AI to clean and augment incoming data to running complex algorithms that correlate different sources of information to detect sophisticated fraud, there is an algorithm for just about everything. And now, with the help of distributed processing, these algorithms can run on multiple clusters, with the workload spread across nodes.

    One thing to remember is that you should be building a data lake and not a data swamp. It’s hard to control a swamp. You cannot drink from it, nor can you navigate it easily. So, when you look at creating a data lake, think about what the ecosystem looks like and who your consumers are. Then, embark on a journey to build a lake on your own.

    Source: Insidebigdata
