169 items tagged "big data"

  • 'SMEs still have much to gain in the field of big data'

    Working with big data, large and unstructured data sets, is still a remote prospect for many Dutch SMEs. Entrepreneurs lack the time and money to dig into data, but also the knowledge and skills.

    This is shown by the study 'Ondernemen met (big) data door het mkb' ('Doing business with (big) data in SMEs') by the Chamber of Commerce (Kamer van Koophandel) and the Jheronimus Academy of Data Science (JADS), conducted among 1,710 members of the KvK Entrepreneurs Panel. JADS is an initiative of the universities of Eindhoven and Tilburg, the province of Noord-Brabant and the municipality of Den Bosch.

    Almost half of the respondents (44 percent) see an opportunity in big data, while the other half (49 percent) see no relevance for their own business. Four out of ten entrepreneurs regard working with data as a cost item rather than a strategic asset.

    Of all entrepreneurs, 37 percent pay hardly any attention to data management and 25 percent pay no structural attention to it. For 40 percent, data management is structurally on the radar. Using data as a daily, strategic activity hardly happens at all.

    Trailing sectors

    The ICT and manufacturing sectors are the most advanced, followed by logistics and finance. Agriculture, culture and sports, construction and education bring up the rear, even though big data is playing an ever larger role in precisely those sectors.

    A company's revenue also influences how important big data is considered to be. Companies with an annual turnover below 250,000 euros often do not see the relevance of big data. Above a turnover of one million euros, that relevance is beyond any doubt.

    Almost half of the SME owners say they would welcome help with doing business with big data, for instance advice on identifying concrete business opportunities.

    Ten percent of SMEs are 'front runners' in big data. These front runners grow by more than 5 percent per year on average and have an average turnover of almost two million euros. They are also more innovative than other SMEs and export more often.

    Source: nu.nl, 10 January 2017

  • ‘If you want to survive as a company, you must make room for Big Data’

    The future of companies depends on Big Data, according to Emile Aarts, rector magnificus of Tilburg University. In an interview with Management Team, the rector magnificus discusses the types of applications he considers promising.
     
    Just as Copernicus, Darwin and Freud tilted our view of the world, Big Data is going to do the same. The influence of Big Data will be large, very large, says Aarts in the interview.
     
    Process Mining
    One of the examples he mentions is process mining. It makes it possible to use 'event logs' and other data in the company to map the processes in an organization. This shows how things actually run and to what extent practice deviates from the processes on paper. As a result, process improvements can be implemented more precisely and more quickly.
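    To make this concrete, here is a minimal, hypothetical Python sketch (not taken from the interview) that derives a directly-follows graph from a small event log; this is the kind of structure process-mining tools build before comparing actual behaviour with the process on paper. The case IDs and activities are invented.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case_id, activity), already ordered by timestamp per case.
event_log = [
    ("order-1", "register"), ("order-1", "check credit"), ("order-1", "ship"),
    ("order-2", "register"), ("order-2", "ship"),            # skips the credit check
    ("order-3", "register"), ("order-3", "check credit"), ("order-3", "reject"),
]

# Group activities per case, preserving order.
cases = defaultdict(list)
for case_id, activity in event_log:
    cases[case_id].append(activity)

# Count directly-follows relations: how often activity A is immediately followed by B.
dfg = Counter()
for trace in cases.values():
    for a, b in zip(trace, trace[1:]):
        dfg[(a, b)] += 1

for (a, b), count in sorted(dfg.items(), key=lambda kv: -kv[1]):
    print(f"{a} -> {b}: {count}x")
```

    Comparing such a mined graph with the documented process immediately exposes deviations, such as the order that was shipped without a credit check in this toy log.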
     
    Scenarios
    Asked whether managers can actually lead data specialists, Aarts answers that entrepreneurs and managers do need some understanding of Big Data and artificial intelligence. But understanding Big Data alone is not enough. “The information on which they have to base their decisions will increasingly be visualized in scenarios. They will have to be able to handle that as well, which can still be difficult for the manager who prefers information in a spreadsheet,” says Aarts.
     
    Soft skills
    Soft skills are also becoming increasingly important for managers. “People can distinguish themselves through their capacity for empathy, their communication skills and the way they can inspire others. Especially if they want to lead younger people, who are averse to hierarchical structures and prefer to work in companies where decisions are largely taken collectively,” says Aarts.
     
    Source: hrpraktijk.nl, 14 November 2016
  • ‘Benefiting from big data is only possible once major concerns have been addressed’

    Big data can create thousands of jobs and generate billions of euros in revenue. However, the opportunities that big data offers cannot be seized as long as major concerns about privacy and security remain unresolved.

    This is reported by the Science and Technology Committee of the British House of Commons in the report ‘The Big Data Dilemma’ (pdf), to which technology experts, researchers, privacy specialists and open data experts, among others, contributed. Big data is expected to generate 200 billion dollars in revenue in the United Kingdom alone over the next five years. Big data therefore offers enormous opportunities.

    Misuse of personal information
    The report warns that these opportunities cannot be seized if companies misuse personal information. It is therefore essential that sufficient measures are taken to safeguard users' privacy and the security of collected data. The Science and Technology Committee advises, for example, making the misuse of big data a criminal offence. The British government should not wait for the introduction of European legislation in this area, but get ahead of it by introducing such legislation now. This could take away citizens' concerns about privacy and the security of their data.

    To address the growing legal and ethical challenges around big data, the House of Commons therefore wants to set up a Council of Data Ethics. This council should become part of the Alan Turing Institute, the UK's national institute for data science.

    Companies analyze only 12% of their data
    When it comes to big data, companies in general still have plenty of room to develop. Companies that use big data are said to be 10% more productive than companies that do not analyze their large data sets. Nevertheless, most companies estimate that they actually analyze only 12% of their data. Big data is estimated to create roughly 58,000 jobs in the UK over the next five years.

    Report: The Big Data Dilemma

    Source: Executive People

  • ‘Progress in BI, but keep an eye on ROI’

    Business intelligence (BI) has already been named by Gartner as the top priority for the CIO in 2016. The Computable experts also predict that many large steps will be taken within BI. At the same time, managers must also look back and think about their business model when deploying big data: how do you justify the investments in big data?

    Kurt de Koning, founder of Dutch Offshore ICT Management
    Gartner has put business intelligence/analytics at number one on the CIO priority list for 2016. In 2016, users will increasingly base their decisions on management information that comes from multiple sources. These sources will partly consist of unstructured data. BI tools will therefore not only have to present information in a visually attractive way and offer a good user interface. When it comes to unlocking the data, the tools that stand out will be those able to create order and overview out of the many forms in which data appears.

    Laurent Koelink, senior interim BI professional at Insight BI
    Big data solutions alongside traditional BI
    Due to the growth in the number of smart devices, organizations have ever more data to process. Because insight (in the broadest sense) will be one of the most important success factors of the future for many organizations that want to respond flexibly to market demand, they will also have to be able to analyze all these new kinds of information. I do not see big data as a replacement for traditional BI solutions, but rather as an addition when it comes to the analytical processing of large volumes of (mostly unstructured) data.

    In-memory solutions
    Organizations increasingly run into the performance limitations of traditional database systems when large volumes of data have to be analyzed on an ad hoc basis. Specific hybrid database/hardware solutions such as those from IBM, SAP and Teradata have always offered answers to this. These are now increasingly joined by in-memory solutions, partly because they are becoming more affordable and therefore more accessible, and partly because such solutions are becoming available in the cloud, which keeps their costs manageable.

    Virtual data integration
    Where data is now often still physically consolidated in separate databases (data warehouses), this will, where possible, be replaced by smart metadata solutions that (whether or not with temporary physical, sometimes in-memory, storage) make time-consuming data extraction and integration processes unnecessary.

    Agile BI development
    Organizations are increasingly forced to move flexibly in and with the chain they are part of. This means that the insights used to steer the business (the BI solutions) must move flexibly along with them. This requires a different way of working from the BI development teams. More and more, you therefore see methods such as Scrum also being applied to BI development.

    BI for everyone
    Where BI has traditionally been the domain of organizations, you now see consumers making ever more use of BI solutions as well. Well-known examples are insight into personal finances and energy consumption. The analysis of income and expenses in your bank's web portal or app, and the analysis of data from smart energy meters, are telling examples. This will only increase and become more integrated in the coming years.

    Rein Mertens, head of analytical platform at SAS
    An important trend that I see maturing in 2016 is 'streaming analytics'. Today, big data is an inseparable part of our daily practice. The amount of data generated per second keeps increasing, both in the personal and in the business sphere. Just look at your daily use of the internet, e-mails, tweets, blog posts and other social networks. And from the business side: customer interactions, purchases, customer service calls, promotion via SMS/social networks, and so on.

    An increase in volume, variety and velocity of five exabytes every two days worldwide. And that figure even excludes data from sensors and other IoT devices. There is bound to be interesting information hidden in all this data, but how do you analyze it? One way is to make this data accessible and store it in a cost-effective big data platform. A technology such as Hadoop then inevitably comes into play, after which you use data visualization and advanced analytics to extract relationships and insights from that mountain of data. In effect, you send the complex logic to the data, without having to pull all the data out of the Hadoop cluster.

    But what if you wanted to make smart decisions in real time on the basis of these large volumes of data? You then have no time to store the data first and analyze it afterwards. Instead, you want to be able to assess, aggregate, track and analyze the data directly in-stream, for example to detect unusual transaction patterns or analyze sentiment in text and act on it immediately. In effect, you send the data past the logic: logic that sits in memory and has been built to do this very quickly and very smartly, and to store the final results. Examples of more than a hundred thousand transactions are no exception here. Per second, that is. Stream it, score it, store it. That is streaming analytics!
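    As a toy illustration of 'stream it, score it, store it' (a hypothetical Python sketch, not SAS software), the code below scores simulated transactions as they arrive in-stream and stores only the flagged results; a production setup would run this logic in-memory on a streaming engine.

```python
import random

def transaction_stream(n=100_000):
    """Simulated source of incoming transactions (stand-in for a real event stream)."""
    for i in range(n):
        yield {
            "id": i,
            "amount": round(random.expovariate(1 / 50), 2),
            "country": random.choice(["NL", "DE", "US"]),
        }

def score(tx):
    """Toy in-memory scoring logic: flag unusually large foreign transactions."""
    risk = 0.0
    if tx["amount"] > 500:
        risk += 0.7
    if tx["country"] != "NL":
        risk += 0.3
    return risk

flagged = []                        # only the results are stored, not the raw stream
for tx in transaction_stream():     # stream it ...
    if score(tx) >= 0.7:            # ... score it ...
        flagged.append(tx)          # ... store it

print(f"Flagged {len(flagged)} suspicious transactions")
```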

    Minne Sluis, founder of Sluis Results
    From IoT (internet of things) to IoE (internet of everything)
    Everything is becoming digital and connected, even more so than we could imagine only a short while ago. The application of big data methods and techniques will therefore take an even bigger flight.

    The call for adequate data governance will grow
    Although the new world revolves around letting go, giving trust and freedom, and co-creation, the call for manageability will nevertheless increase. Provided it is approached primarily from a facilitating role and ensures more consistency and reliability, that is by no means a bad thing.

    The business impact of big data & data science is increasing
    The impact of big data & data science in reinventing business processes, services and products, digitalizing them far-reachingly (and making them more intelligent), or in some cases eliminating them, will continue.

    The consumerization of analytics continues
    Strongly improved and truly intuitive visualizations, underpinned by good meta-models, i.e. data governance, are driving this development. Democratization and independence from third parties (other than services deliberately taken from the cloud) are thus increasingly becoming reality.

    Big data & data science will fully break through in the non-profit sector
    The subtle objectives of the non-profit sector, such as improving quality, (patient/client/citizen) safety, punctuality and accessibility, call for big data applications. After all, that subtlety requires more good information, and thus data, delivered faster, in more detail and with more nuance than what typically comes out of the more traditional BI environments today. If the non-profit sector manages to translate the for-profit sector's much-needed focus on 'profit' and 'revenue improvement' to its own situation, successful big data initiatives are just around the corner! Mind you, this prediction naturally applies in full to healthcare as well.

    Hans Geurtsen, business intelligence architect data solutions at Info Support
    From big data to polyglot persistence
    In 2016 we will no longer talk about big data, but simply about data: data of all kinds and in all volumes that call for different kinds of storage, in other words polyglot persistence. Programmers have known the term polyglot for a long time. An application anno 2015 is often already written in several languages. But on the storage side of an application, relational is no longer the only game in town either. We will increasingly use other kinds of databases in our data solutions, such as graph databases, document databases, and so on. Alongside specialists who know everything about one type of database, you then also need generalists who know exactly which database is suited to what.
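    A minimal sketch of the polyglot-persistence idea, with invented class names and toy in-memory stores standing in for real engines: each kind of data is routed to the store that suits it, behind one repository interface.

```python
from dataclasses import dataclass, field

# Toy stand-ins for different storage engines; in practice these would be a
# relational database, a document store, a graph database, and so on.
@dataclass
class RelationalStore:
    rows: list = field(default_factory=list)
    def save(self, record): self.rows.append(record)    # fixed schema, tabular data

@dataclass
class DocumentStore:
    docs: list = field(default_factory=list)
    def save(self, record): self.docs.append(record)    # flexible, nested JSON-like data

class PolyglotRepository:
    """Routes each kind of data to the store that suits it best."""
    def __init__(self):
        self.orders = RelationalStore()   # structured transactions
        self.events = DocumentStore()     # semi-structured clickstream events

    def save_order(self, order): self.orders.save(order)
    def save_event(self, event): self.events.save(event)

repo = PolyglotRepository()
repo.save_order({"order_id": 1, "amount": 99.95})
repo.save_event({"user": "u42", "page": "/home", "meta": {"ref": "ad"}})
print(len(repo.orders.rows), "orders,", len(repo.events.docs), "events stored")
```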

    The breakthrough of the modern data warehouse
    'A polyglot is someone with a high degree of proficiency in several languages', according to Wikipedia. That refers to spoken languages, but you come across the term more and more often in the IT field as well: an application coded in several programming languages that stores data in several kinds of databases. On the business intelligence side, too, one language and one environment no longer suffice. The days of the traditional data warehouse with an ETL pipeline, a central data warehouse and one or two BI tools are numbered. We will see new kinds of data platforms in which all sorts of data from all sorts of sources become accessible to information workers and data scientists using all sorts of tools.

    Business intelligence in the cloud
    Where Dutch companies in particular are still reluctant when it comes to the cloud, you can see the move towards the cloud slowly but surely getting under way. More and more companies realize that security, in particular, is often better arranged in the cloud than they could arrange it themselves. Cloud providers are also doing more and more to bring European companies to their cloud. Microsoft's new data centers in Germany, where not Microsoft but Deutsche Telekom controls access to customer data, are an example of this. 2016 could well be the year in which the cloud really breaks through and in which we will also see more and more complete BI solutions in the cloud in the Netherlands.

    Huub Hillege, principal data(base) management consultant at Info-Shunt
    Big data
    The big data hype will certainly continue in 2016, but success at companies is by no means guaranteed in advance. Companies and recent graduates keep winding each other up about its application. It is incomprehensible that everyone wants to unlock Facebook, Twitter and similar data while the data in these systems is highly unreliable. At every conference I ask where the business case is, including costs and benefits, that justifies all the investments around big data. Even BI managers at companies encourage people to simply get started. So in effect: looking back at the data you have or can obtain and investigating whether you find something you might be able to use. To me this is the biggest pitfall, just as it was with the start of data warehouses in 1992. In the current circumstances companies have limited money. Frugality is called for.

    The analysis of big data must be focused on the future, based on a clear business strategy and a cost/benefit analysis: which data do I need to support the future? In doing so, determine:

    • Where do we want to go?
    • Which customer segments do we want to add?
    • Are we going to do more cross-selling (more products) to existing customers?
    • Are we going to take steps to retain our customers (churn)?

    Once these questions have been prioritized and recorded, an analysis must be carried out:

    • Which data/sources do we need for this?
    • Do we have the data ourselves, are there 'gaps', or do we have to buy external data?

    Database management systems
    More and more database management system (DBMS) vendors are adding support for big data solutions, for example Oracle/Sun Big Data Appliance and Teradata/Teradata Aster with support for Hadoop. In the long run the DBMS solutions will dominate the field; big data software solutions without a DBMS will eventually lose out.

    Fewer and fewer people, including today's DBAs, still understand how things work deep down inside a database/DBMS. More and more, physical databases are generated from logical data modeling tools, and formal physical database design steps and reports are skipped. Developers who use ETL tools such as Informatica, Ab Initio, InfoSphere, Pentaho and so on also ultimately generate SQL scripts that move data from sources to operational data stores and/or the data warehouse.

    BI tools such as MicroStrategy, Business Objects, Tableau and so on generate SQL statements as well.
    Such tools are usually developed initially for a particular DBMS, and people soon assume they can be applied to every DBMS. Too little use is then made of the specific physical characteristics of each DBMS.

    The absence of real expertise then causes performance problems that are discovered at too late a stage. In recent years, by changing database designs and indexes and restructuring complex or generated SQL scripts, I have been able to bring ETL processes down from six to eight hours to one minute, and queries that ran for 45 to 48 hours down to 35 to 40 minutes.
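    As a small, generic illustration of this kind of physical tuning (not the author's actual cases), the sketch below uses Python's built-in sqlite3 module to show how adding an index on the filter column changes the query plan from a full table scan to an index search.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales (customer_id, amount) VALUES (?, ?)",
    [(i % 1000, i * 0.1) for i in range(100_000)],
)

query = "SELECT SUM(amount) FROM sales WHERE customer_id = 42"

# Without an index: the planner has to scan the whole table.
print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())

# Physical design change: add an index on the column used in the WHERE clause.
conn.execute("CREATE INDEX idx_sales_customer ON sales (customer_id)")

# With the index: the planner switches to an index search.
print(conn.execute("EXPLAIN QUERY PLAN " + query).fetchall())
```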

    Advice
    The amount of data needed will keep growing. Forget about buying all kinds of hyped software packages. Make sure you bring in very strong technical database/DBMS expertise to build the foundation properly from the bottom up, using the strength of your existing DBMS. That frees up time and money (you can get by with smaller systems because the foundation is solid) to select the right tools after a proper business case and proofs of concept.

  • 10 Big Data Trends for 2017

    Infogix, a leader in helping companies provide end-to-end data analysis across the enterprise, today highlighted the top 10 data trends they foresee will be strategic for most organizations in 2017.
     
    “This year’s trends examine the evolving ways enterprises can realize better business value with big data and how improving business intelligence can help transform organization processes and the customer experience (CX),” said Sumit Nijhawan, CEO and President of Infogix. “Business executives are demanding better data management for compliance and increased confidence to steer the business, more rapid adoption of big data and innovative and transformative data analytic technologies.”
     
    The top 10 data trends for 2017 are assembled by a panel of Infogix senior executives. The key trends include:
     
    1.    The Proliferation of Big Data
        Proliferation of big data has made it crucial to analyze data quickly to gain valuable insight.
        Organizations must turn the terabytes of big data that are not being used, classified as dark data, into usable data.
        Big data has not yet yielded the substantial results that organizations require to develop new insights for new, innovative offerings to derive a competitive advantage.
     
    2.    The Use of Big Data to Improve CX
        Using big data to improve CX by moving from legacy to vendor systems, during M&A, and with core system upgrades.
        Analyzing data with self-service flexibility to quickly harness insights about leading trends, along with competitive insight into new customer acquisition growth opportunities.
        Using big data to better understand customers in order to improve top line revenue through cross-sell/upsell or remove risk of lost revenue by reducing churn.
     
    3.    Wider Adoption of Hadoop
        More and more organizations will be adopting Hadoop and other big data stores; in turn, vendors will rapidly introduce new, innovative Hadoop solutions.
        With Hadoop in place, organizations will be able to crunch large amounts of data using advanced analytics to find nuggets of valuable information for making profitable decisions.
     
    4.    Hello to Predictive Analytics
        Precisely predict future behaviors and events to improve profitability.
        Make a leap in improving fraud detection rapidly to minimize revenue risk exposure and improve operational excellence.
     
    5.    More Focus on Cloud-Based Data Analytics
        Moving data analytics to the cloud accelerates adoption of the latest capabilities to turn data into action.
        Cut costs in ongoing maintenance and operations by moving data analytics to the cloud.
     
    6.    The Move toward Informatics and the Ability to Identify the Value of Data
        Use informatics to help integrate the collection, analysis and visualization of complex data to derive revenue and efficiency value from that data
        Tap an underused resource – data – to increase business performance
     
    7.    Achieving Maximum Business Intelligence with Data Virtualization
        Data virtualization unlocks what is hidden within large data sets.
        Graphic data virtualization allows organizations to retrieve and manipulate data on the fly regardless of how the data is formatted or where it is located.
     
    8.    Convergence of IoT, the Cloud, Big Data, and Cybersecurity
        The convergence of data management technologies such as data quality, data preparation, data analytics, data integration and more.
        As we continue to become more reliant on smart devices, inter-connectivity and machine learning will become even more important to protect these assets from cyber security threats.
     
    9.    Improving Digital Channel Optimization and the Omnichannel Experience
        Delivering the balance of traditional channels with digital channels to connect with the customer in their preferred channel.
        Continuously looking for innovative ways to enhance CX across channels to achieve a competitive advantage.
     
    10.    Self-Service Data Preparation and Analytics to Improve Efficiency
        Self-service data preparation tools boost time to value by enabling organizations to prepare data regardless of the type of data, whether structured, semi-structured or unstructured.
        Decreased reliance on development teams to massage the data by introducing more self-service capabilities to give power to the user and, in turn, improve operational efficiency.
     
    “Every year we see more data being generated than ever before and organizations across all industries struggle with its trustworthiness and quality. We believe the technology trends of cloud, predictive analysis and big data will not only help organizations deal with the vast amount of data, but help enterprises address today’s business challenges,” said Nijhawan. “However, before these trends lead to the next wave of business, it’s critical that organizations understand that the success is predicated upon data integrity.”
     
    Source: dzone.com, November 20, 2016
  • 2016 will be the year of artificial intelligence

    December is traditionally the time of year to look back, and New Year's Eve is of course the best day for that. At Numrush, however, we prefer to look ahead. We already did so at the beginning of December with our RUSH Magazine. In that Gift Guide we gave gift tips based on a number of themes we will be hearing a lot about in the coming year. One subject was deliberately left somewhat underexposed in our Gift Guide, partly because it is not something you give as a present, but also because it actually transcends the various themes. I am talking about artificial intelligence. That is nothing new of course, and a great deal has already happened in this field, but in the coming year its application will accelerate even further.

  • 2017 Investment Management Outlook

    2017 investment management outlook infographic

    Several major trends will likely impact the investment management industry in the coming year. These include shifts in buyer behavior as the Millennial generation becomes a greater force in the investing marketplace; increased regulation from the Securities and Exchange Commission (SEC); and the transformative effect that blockchain, robotic process automation, and other emerging technologies will have on the industry.

    Economic outlook: Is a major stimulus package in the offing?

    President-elect Donald Trump may have to depend heavily on private-sector funding to proceed with his $1 trillion infrastructure spending program, considering Congress's ongoing reluctance to increase spending. The US economy may be nearing full employment as younger cohorts enter the labor market and more Baby Boomers retire. In addition, the prospects for a fiscal stimulus seem greater now than they were before the 2016 presidential election.

    Steady improvement and stability is the most likely scenario for 2017. Although weak foreign demand may continue to weigh on growth, domestic demand should be strong enough to provide employment for workers returning to the labor force, as the unemployment rate is expected to remain at approximately 5 percent. GDP annual growth is likely to hit a maximum of 2.5 percent. In the medium term, low productivity growth will likely put a ceiling on the economy, and by 2019, US GDP growth may be below 2 percent, despite the fact that the labor market might be at full employment. Inflation is expected to remain subdued. Interest rates are likely to rise in 2017, but should remain at historically low levels throughout the year. If the forecast holds, asset allocation shifts among cash, commodities, and fixed income may begin by the end of 2017.

    Investment industry outlook: Building upon last year’s performance
    Mutual funds and exchange-traded funds (ETFs) have experienced positive growth. Worldwide regulated funds grew at a 9.1 percent CAGR, versus 8.6 percent for US mutual funds and ETFs. Non-US investments grew at a slightly faster pace due to global demand. Both worldwide and US investments seem to show declining demand in 2016 as returns remained low.

    Hedge fund assets have experienced steady growth over the past five years, even through performance swings.

    Private equity investments continued a track record of strong asset appreciation. Private equity has continued to attract investment even with current high valuations. Fundraising increased incrementally over the past five years as investors increased allocations in the sector.

    Shifts in investor buying behavior: Here come the Millennials
    Both institutional and retail customers are expected to continue to drive change in the investment management industry. The two customer segments are voicing concerns about fee sensitivity and transparency. Firms that enhance the customer experience and position advice, insight, and expertise as components of value should have a strong chance to set themselves apart from their competitors.

    Leading firms may get out in front of these issues in 2017 by developing efficient data structures to facilitate accounting and reporting and by making client engagement a key priority. On the retail front, the SEC is acting on retail investors' behalf with reporting modernization rule changes for mutual funds. This focus on engagement, transparency, and relationship over product sales is integral to creating a strong brand as a fiduciary, and it may prove to differentiate some firms in 2017.

    Growth in index funds and other passive investments should continue as customers react to market volatility. Investors favor the passive approach in all environments, as shown by net flows. They are using passive investments alongside active investments, rather than replacing the latter with the former. Managers will likely continue to add index share classes and index-tracking ETFs in 2017, even if profitability is challenged. In addition, the Department of Labor’s new fiduciary rule is expected to promote passive investments as firms alter their product offerings for retirement accounts.

    Members of the Millennial generation—which comprises individuals born between 1980 and 2000—often approach investing differently due to their open use of social media and interactions with people and institutions. This market segment faces different challenges than earlier generations, which influences their use of financial services.

    Millennials may be less prosperous than their parents and may need to own less in order to fully fund retirement. Many start their careers burdened by student debt. They may have a negative memory of recent stock market volatility, distrust financial institutions, favor socially conscious investments, and rely on recommendations from their friends when seeking financial advice.

    Investment managers likely need to consider several steps when targeting Millennials. These include revisiting product lines, offering socially conscious “impact investments,” assigning Millennial advisers to client service teams, and employing digital and mobile channels to reach and serve this market segment.

    Regulatory developments: Seeking greater transparency, incentive alignment, and risk control
    Even with a change in leadership in the White House and at the SEC, outgoing Chair Mary Jo White’s major initiatives are expected to endure in 2017 as they seek to enhance transparency, incentive alignment, and risk control, all of which build confidence in the markets. These changes include the following:

    Reporting modernization. Passed in October 2016, this new requirement of forms, rules, and amendments for information disclosure and standardization will require development by registered investment companies (RICs). Advisers will need technology solutions that can capture data that may not currently exist from multiple sources; perform high-frequency calculations; and file requisite forms with the SEC.

    Liquidity risk management (LRM). Passed in October 2016, this rule requires the establishment of LRM programs by open-end funds (except money market) and ETFs to reduce the risk of inability to meet redemption requirements without dilution of the interests of remaining shareholders.

    Swing pricing. Also passed in October 2016, this regulation provides an option for open-end funds (except money market and ETFs) to adjust net asset values to pass the costs stemming from purchase and redemption activity to shareholders.

    Use of derivatives. Proposed in December 2015, this requires RICs and business development companies to limit the use of derivatives and put risk management measures in place.

    Business continuity and transition plans. Proposed in June 2016, this measure requires registered investment advisers to implement written business continuity and transition plans to address operational risk arising from disruptions.

    The Dodd-Frank Act, Section 956. Reproposed in May 2016, this rule prohibits compensation structures that encourage individuals to take inappropriate risks that may result in either excessive compensation or material loss.

    The DOL’s Conflict-of-Interest Rule. In 2017, firms must comply with this major expansion of the “investment advice fiduciary” definition under the Employee Retirement Income Security Act of 1974. There are two phases to compliance:

    Phase one requires compliance with investment advice standards by April 10, 2017. Distribution firms and advisers must adhere to the impartial conduct standards and provide a notice to retirement investors that acknowledges their fiduciary status and describes their material conflicts of interest. Firms must also designate a person responsible for addressing material conflicts of interest and for monitoring advisers' adherence to the impartial conduct standards.

    Phase two requires compliance with exemption requirements by January 1, 2018. Distribution firms must be in full compliance with exemptions, including contracts, disclosures, policies and procedures, and documentation showing compliance.

    Investment managers may need to create new, customized share classes driven by distributor requirements; drop distribution of certain share classes after the rule takes effect; and offer more fee reductions for mutual funds.

    Financial advisers may need to take another look at fee-based models, if they are not already using them; evolve their viewpoint on share classes; consider moving to zero-revenue share lineups; and contemplate higher use of ETFs, including active ETFs with a low-cost structure and 22(b) exemption (which enables broker-dealers to set commission levels on their own).

    Retirement plan advisers may need to look for low-cost share classes (R1-R6) to be included in plan options and potentially new low-cost structures.

    Key technologies: Transforming the enterprise

    Investment management is poised to become even more driven by advances in technology in 2017, as digital innovations play a greater role than ever before.

    Blockchain. A secure and effective technology for tracking transactions, blockchain should move closer to commercial implementation in 2017. Already, many blockchain-based use cases and prototypes can be found across the investment management landscape. With testing and regulatory approvals, it might take one to two years before commercial rollout becomes more widespread.

    Big data, artificial intelligence, and machine learning. Leading asset management firms are combining big data analytics along with artificial intelligence (AI) and machine learning to achieve two objectives: (1) provide insights and analysis for investment selection to generate alpha, and (2) improve cost effectiveness by leveraging expensive human analyst resources with scalable technology. Expect this trend to gain momentum in 2017.

    Robo-advisers. Fiduciary standards and regulations should drive the adoption of robo-advisers, online investment management services that provide automated, portfolio management advice. Improvements in computing power are making robo-advisers more viable for both retail and institutional investors. In addition, some cutting-edge robo-adviser firms could emerge with AI-supported investment decision and asset allocation algorithms in 2017.
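    Purely as an illustration of the rule-based core that such services automate (a hypothetical sketch, not any firm's actual algorithm), the code below maps a client's risk score and horizon to a target asset allocation and computes the trades needed to rebalance toward it.

```python
# Toy robo-adviser logic: risk score and horizon in, target allocation and rebalancing trades out.
def target_allocation(risk_score: int, years_to_goal: int) -> dict:
    """risk_score: 1 (very conservative) .. 10 (very aggressive)."""
    equity = min(0.9, 0.2 + 0.06 * risk_score + 0.01 * min(years_to_goal, 30))
    bonds = max(0.05, 0.8 - equity)
    cash = round(1.0 - equity - bonds, 2)
    return {"equities": round(equity, 2), "bonds": round(bonds, 2), "cash": cash}

def rebalance_orders(current: dict, target: dict, portfolio_value: float) -> dict:
    """Trade amounts (positive = buy, negative = sell) to move back to the target weights."""
    return {asset: round((target[asset] - current.get(asset, 0.0)) * portfolio_value, 2)
            for asset in target}

target = target_allocation(risk_score=6, years_to_goal=20)
print(target)
print(rebalance_orders({"equities": 0.70, "bonds": 0.25, "cash": 0.05}, target, 100_000))
```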

    Robotic process automation. Look for more investment management firms to employ sophisticated robotic process automation (RPA) tools to streamline both front- and back-office functions in 2017. RPA can automate critical tasks that require manual intervention, are performed frequently, and consume a significant amount of time, such as client onboarding and regulatory compliance.


    Change, development, and opportunity
    The outlook for the investment management industry in 2017 is one of change, development, and opportunity. Investment management firms that execute plans that help them anticipate demographic shifts, improve efficiency and decision making with technology, and keep pace with regulatory changes will likely find themselves ahead of the competition.


    Download 2017 Investment management industry outlook

    Source: Deloitte.com

     

  • 4 benefits of predictive analytics improving healthcare


    There are so many wonderful ways predictive analytics will improve healthcare. Here are some of the potential benefits to consider.

    Medical care has relied on the education and expertise of doctors. Human error is common and 250,000 people per year die from medical errors. As this is the third-leading cause of death in the United States, limiting errors is a key focus in the healthcare industry.

    Big data and predictive analytics will lead to healthcare improvement.

    But how? Health IT Analytics previously published an excellent paper on some of the best use cases of predictive analytics in healthcare. We reviewed other papers on the topic and condensed the best benefits into this article.

    1. Diagnostic accuracy will improve

    Diagnostic accuracy will improve with the help of predictive algorithms. Surveys will be incorporated that ask the person who enters the emergency room with chest pain an array of questions.

    Algorithms could, potentially, use this information to determine if the patient should be sent home or if the patient is having a heart attack.

    Patients will still have insight from doctors who will use the information to assist in a diagnosis. The predictive analytics are not designed to replace a doctor’s advice.
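    A minimal, purely illustrative sketch of such a predictive algorithm, trained on invented triage survey answers with scikit-learn; real clinical models require validated data, far more features, and regulatory approval.

```python
# Hypothetical example: predicting high vs. low cardiac risk from emergency-room survey answers.
# The features, values, and labels below are invented for illustration only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Columns: age, smoker (0/1), pain intensity (0-10), shortness of breath (0/1)
X = np.array([
    [64, 1, 8, 1],
    [35, 0, 3, 0],
    [58, 1, 7, 1],
    [42, 0, 2, 0],
    [71, 0, 9, 1],
    [29, 0, 4, 0],
])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = suspected heart attack, 0 = sent home (toy labels)

model = LogisticRegression().fit(X, y)

new_patient = np.array([[55, 1, 6, 1]])
risk = model.predict_proba(new_patient)[0, 1]
print(f"Estimated probability of a cardiac event: {risk:.2f}")  # supports, not replaces, the doctor
```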

    2. Early diagnoses and treatment options

    Big data will lead to earlier diagnoses, especially in deadly forms of cancer and disease. Annually, mesothelioma affects 2,000 to 3,000 people, but there’s a latency period that’s rarely less than 15 years and could be as long as 70 years.

    Predictive analysis will allow for doctors to put all of a person’s history into an algorithm to better determine the patient’s risk of certain diseases.

    And when a disease is found early on, treatment options are expanded. There are a variety of treatment options often available when a person is in good health. If doctors can predict a patient’s risk of cancer or certain illnesses, they can offer preventative care which may be able to slow the progression of the disease.

    Babylon Health has already raised $60 million to create an AI chatbot that will help with patient diagnoses.

    3. Improve patient outcomes

    One study suggests that patient outcomes will improve by 30% to 40%, while the cost of treatment will be reduced by 50%. Medical imaging diagnosis will also improve with enhancements in care delivery. The introduction of predictive analytics will allow patients to live longer and have a better medical outlook as a result.

    Consumers will work with physicians in a collaborative manner to provide better overall health histories.

    Doctors will be able to create models that help predict health risks using genome analysis and family history to help.

    4. Changes for hospitals and insurance providers

    Hospitals and insurance providers will also see changes, initially bad changes. Through predictive analysis, patients will be able to seek diagnoses without going to the hospital. Wearables may be able to predict health issues that a person is likely to face.

    Hospitals, insurance companies and pharmacies will initially lose revenue as they see fewer patients and fewer errors that send patients to their facilities.

    Hospitals and insurance companies will need to adapt to these changes or face losing profit and revenue in the process. Government funding may also increase in an effort to increase innovation in the market.

    Predictive analytics has the potential to help people live longer with better treatment options and predictive preventative care.

    Predictive analytics is the key solution to healthcare challenges

    Many healthcare challenges are still plaguing patients and healthcare providers around the United States. The good news is that new advances in predictive analytics are making it easier for healthcare providers to administer excellent care. Big data solutions will help healthcare providers lower healthcare costs and give patients excellent service that they expect and deserve.

    Author: Andrej Kovacevic

    Source: SmartDataCollective

  • 4 tips to keep big data projects from fizzling out


    Investing in big data makes the difference between attracting and repelling customers, between profit and loss. Yet many retailers see their data and analytics initiatives fizzle out. How do you actually create value from data and avoid a clearance sale? Four tips.

    You invest a lot of time and money in big data, exactly as the retail gurus have been preaching for several years. A team of data scientists develops complex data models that indeed produce interesting insights. With small 'proofs of value' you establish that those insights can actually be monetized. And yet that does not happen. What is going on?

    Tip 1: Adjust the targets

    The fact that valuable insights are not put into practice often has to do with the targets your employees have been given. Take sending mailings to customers as an example. On the basis of existing data and customer profiles, we can predict quite well how often and with which message each customer should be e-mailed. And secretly every marketer knows perfectly well that not every customer is waiting for a daily e-mail.

    Yet many fall into the trap and keep sending mailings to the entire customer base. The result: the customer's interest quickly ebbs away and the message no longer lands. Why do marketers do this? Because they are judged solely on the revenue they generate, not on the customer satisfaction they achieve. That invites them to e-mail everyone as often as possible: in the short term, every extra e-mail increases the chance of a sale.
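    As a minimal, hypothetical illustration of the kind of prediction referred to in this tip (not EY's approach), the sketch below derives a recommended mailing frequency per customer from recent engagement instead of mailing everyone daily.

```python
# Toy engagement data: e-mails sent and opened per customer over the last month.
customers = [
    {"id": "c1", "sent": 30, "opened": 25},   # highly engaged
    {"id": "c2", "sent": 30, "opened": 6},
    {"id": "c3", "sent": 30, "opened": 1},    # barely engaged
]

def recommended_mails_per_week(sent: int, opened: int) -> int:
    """Scale frequency with observed engagement instead of blasting everyone daily."""
    open_rate = opened / sent if sent else 0.0
    if open_rate > 0.5:
        return 5
    if open_rate > 0.2:
        return 2
    return 1  # keep a minimal touchpoint for barely engaged customers

for c in customers:
    print(c["id"], recommended_mails_per_week(c["sent"], c["opened"]), "mails/week")
```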

    Tip 2: Put the analysts in the business

    Time and again, retailers put the team of analysts together in one room, sometimes even as part of the IT department. The distance to the people in the business who have to put the insights into practice is large, and too often that distance proves unbridgeable. This leads to misunderstandings, misunderstood analysts and valuable insights that remain unused.

    It is better to put the analysts together with the people from the business in multidisciplinary teams that work with scrum-like techniques. Organizations that are successful realize that they have to be in constant change and work in such teams. This means that business managers are involved at an early stage in building the data models, so that analysts and the business can learn from each other. After all, customer knowledge resides in data as well as in people.

    Tip 3: Hire a business analyst

    Data analysts derive their job satisfaction mainly from making elegant analyses and building good, perhaps even over-engineered, data models. For their satisfaction it is often not even necessary to put the insights from those models into practice. Many analysts are therefore not particularly good at interpreting data and translating it into concrete impact for the retailer.

    It can therefore be wise to bring in a business analyst: someone who has sufficient affinity with analytics and broadly understands how data models come about, but who also knows what the business managers' challenges are. He or she can bridge the gap between analytics and the business by making questions from the business concrete and by translating insights from data models into opportunities for the retailer.

    Tip 4: Analytics is a process, not a project

    Too many retailers still look at all their data and analytics efforts as if they were a project with a beginning and an end, a project that must make clear in advance what it will deliver. That is especially the case at retail organizations led by managers of the 'old generation' who have insufficient feel for and affinity with the new world. The commitment of these managers quickly fades if investments in data and analytics do not pay off fast enough.

    Analytics, however, is not a project but a process in which retailers, through trial and error, become ever more adept and smarter. A process whose outcome is unclear in advance, but which must be started in order to move forward. Because all developments in the retail market make one thing clear: standing still is falling behind.

    Author: EY, Simon van Ulden, 5 October 2016

  • 50% of Dutch people willing to share private data for free online services

    Dutch people are well aware of how companies and institutions collect and use personal data. Of all Europeans, the Dutch are also the most willing to provide personal data in exchange for free online services. This is shown by a survey conducted by TNS on behalf of the Vodafone Institute.

    This Berlin-based Vodafone think tank had 8,000 Europeans in eight countries surveyed about their knowledge of data use. It shows that half of the Dutch are willing to provide personal data in exchange for free use of online services. Almost as many French respondents (48%) are willing to do so.

    At the other end of the spectrum are the Italians and the British, 66 percent of whom would rather pay for online services than provide personal data for free use of them. According to the Vodafone Institute study, consumers in these two European countries are also the least well informed about the collection and use of personal data by companies and institutions.

    The Vodafone Institute says the study provides insight into the challenges that government and business still face in using customer data for big data projects. According to the Institute, Europeans are willing to share their data as long as they see a clear personal or societal benefit. But when organizations fall short in explaining how and why they want to analyze data, people become far less likely to take part in big data initiatives.

    Distrust of companies is high, less so of governments

    35 percent of Europeans consider existing privacy laws and regulations appropriate and proportionate; the Netherlands scores average here. 26 percent of Europeans think companies respect their privacy, and the Dutch score even lower (22%). The Dutch do have more trust in government than other Europeans: 49% think the government respects their privacy (versus an average of 36% in Europe).

    Clarity about data use helps

    More people turn out to be willing to share data if it is clear how this helps them or society. For instance, 70 percent of the Dutch would support the collection of large amounts of anonymous data by health organizations, versus an average of 65 percent in Europe. 67 percent of the Dutch would even give these organizations access to their own data (versus 62% on average in Europe).

    Furthermore, 50 percent of the Dutch have no objection to receiving traffic advice based on data collected by navigation companies (versus 55% on average in Europe). 41 percent of the Dutch are also fine with this data being shared with local authorities to improve the road network and traffic flow (45% on average in Europe).

    66 percent of the Dutch (68% on average in Europe) are positive about energy companies' 'smart meters', because such meters can help analyze and reduce energy consumption. 45 percent of the Dutch are also fine with online shops using collected data to improve products or services, an average score. The Dutch are somewhat less open to personalized offers based on shopping behavior (39%) than other Europeans (44%).

    Reselling data meets with resistance

    Reselling personal data to third parties meets with a great deal of resistance. Only 11 percent of the Dutch are fine with online shops reselling their data for marketing and advertising purposes (versus 10% on average in Europe). 9 percent are fine with data obtained from navigation equipment, or about their car and/or driving behavior, being resold in anonymized and aggregated form (11% on average in Europe). Likewise, only 9 percent see no problem in energy companies reselling anonymized and aggregated data (13% on average in Europe).

    Asked what organizations can do to increase users' trust in how their data is managed and protected, the Dutch say:

    • Be transparent about what is collected and what it is used for (73%);
    • Use understandable language and short terms and conditions (61%);
    • Offer the option to adjust personal privacy settings (55%);
    • Provide certification by an independent testing institute (48%).

    Source: Telecompaper

  • 7 trends that will emerge in the 2021 big data industry


    “The best-laid plans of mice and men often go amiss” – a line adapted from the poet Robert Burns.

    In January 2020, most businesses laid out ambitious plans, covering a complete roadmap to steer their organizations through the months to follow. But to our dismay, COVID-19 impacted the world in ways we could never have imagined, rendering many of these best-laid plans pointless.

    And to avert the crisis, organizations had to become more adaptable seemingly overnight.

    As the pandemic continues to disrupt lives, markets, and societies at large, organizations are seeking mindful ways to pivot and weather all types of disruptions.

    Big data trends in 2021

    Big data has been and will continue to be a crucial resource for both private and public enterprises.

    A report by Statista estimated the global big data market to reach USD 103 billion by 2027.

    Despite the benefits big data promised over these past years, it is only now that those promises are coming to fruition. Here are seven top big data trends organizations will need to watch to better reinforce and secure disrupted businesses. Have a look at the summary of those trends:

    1. Cloud automation

    Capturing big data is easy. What’s difficult is to corral, tag, govern, and utilize it.

    NetApp, a hybrid cloud provider, describes cloud automation as a practice that enables IT teams and developers to create, modify, and tear down cloud resources automatically.

    Cloud computing provides services on demand, but someone still has to provision these resources, test them, and take them down once they are no longer needed. Doing all of this by hand takes a lot of manual effort and time; this is where cloud automation steps in.

    Cloud automation mitigates the burden of cloud systems – public and private.

    Artificial intelligence (AI), machine learning, and artificial intelligence for IT operations (AIOps) also help cloud automation to review swaths of data, spot trends, and analyze results.

    Cloud automation, along with AI, is revolutionizing the future of work by offering:

    • Security
    • Centralized governance
    • Lower total cost of ownership (TCO)
    • Scalability
    • Continued innovation with the latest version of cloud platform
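    As a small sketch of what creating and tearing down resources automatically can look like in code (assuming AWS credentials and the boto3 library; the AMI ID below is a placeholder, and this is not NetApp's tooling):

```python
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-1")

def provision_test_instance():
    """Spin up a short-lived instance for testing, tagged so automation can find it later."""
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",   # placeholder AMI
        InstanceType="t3.micro",
        MinCount=1,
        MaxCount=1,
        TagSpecifications=[{
            "ResourceType": "instance",
            "Tags": [{"Key": "purpose", "Value": "automated-test"}],
        }],
    )
    return response["Instances"][0]["InstanceId"]

def tear_down(instance_id: str):
    """Remove the resource as soon as it is no longer needed, with no manual clean-up."""
    ec2.terminate_instances(InstanceIds=[instance_id])

instance_id = provision_test_instance()
# ... run tests against the instance here ...
tear_down(instance_id)
```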

    2. Hybrid cloud

    Hybrid cloud is paramount to improve business continuity.

    Most organizations are skeptical about sharing data on the cloud for multiple reasons: poor latency, security, privacy, and the like. But with a hybrid cloud, components and applications from multiple cloud services can interoperate across boundaries and architectures, for instance cloud versus on-premises and traditional integration versus modern digital integration. The big data industry is currently converging around hybrid clouds, making them an intermediate point where enterprise data can be deployed to public clouds in a structured way.

    One of the major benefits the hybrid cloud offers is agility. The ability to adapt quickly is key to success for today's businesses. Your organization might need to combine private and public clouds with on-premises resources to become agile.

    Hybrid clouds can:

    • Build efficient infrastructure
    • Optimize performance
    • Improve security
    • Strengthen regulatory compliance system

    3. Hyperautomation

    Listed as one of Gartner’s Top 10 Strategic Technology Trends for 2020, the term ‘hyperautomation’ will continue to be significant in 2021.

    “Hyperautomation is irreversible and inevitable. Everything that can and should be automated will be automated,” says Brian Burke, Research Vice President, Gartner.

    Automation, when combined with technologies such as AI, machine learning, and intelligent business processes, achieves a new level of digital transformation. Moreover, it helps businesses automate countless IT and decision-making processes.

    The core components of hyperautomation are robotic process automation (RPA), AI, machine learning, and intelligent business processes.

    RPA is also referred to as the foundation stone of hyperautomation, and the technology is anticipated to grow to USD 25.56 billion by 2027, according to Grand View Research.

    With remote work on the rise, organizations have been pushed to adopt a digital-first approach almost overnight. This instilled fear among employees, since it started changing the way they work, and it led to a spike in security concerns.

    Further use of hyperautomation can easily resolve 80% of threats even before any user can report them, says Security Boulevard.

    4. Actionable data

    There is no reward for an organization owning large amounts of data that are not useful. You need to transform raw data into actionable insight to help businesses make informed decisions. This can be possible through ‘actionable data.’

    “What big data represents is an opportunity; an opportunity for actionable insight, an opportunity to create value, an opportunity to effect relevant and profitable organizational change. The opportunity lies in which information is integrated, how it is visualized and where actionable insight is extracted.” – CIS Wired

    The need to glean accurate data and information that further establishes relevant insights for decision-makers is critical for business impact.

    Big data will continue its rise in 2021. This might be the first year where we will experience the potential of actionable data.

    5. Immersive experience

    The immersive web is already undergoing a sudden change we believe will shape 2021.

    “Everything that is on a smartphone will soon be possible in XR, and in addition, a range of new applications will be invented that are only possible using VR/AR,” says Ferhan Ozkan, co-founder of VR First and XR Bootcamp.

    The future of the immersive web is set to take flight by virtual reality (VR) and augmented reality (AR), also called immersive experience.

    2020 had a drastic impact on digital entertainment, with apps like Discord, TikTok, and Roblox surging in use. Although these are only early iterations of the immersive web, the trend will be further driven by Gen Z.

    The lockdown measures implemented in 2020 have accentuated this shift, and they have also created an opportunity for businesses to take charge of the changing interests of society.

    6. Data marketplace and exchanges

    Gartner predicts that by 2022, 35 percent of large organizations will be either sellers or buyers of data via formal online data marketplaces. Top companies like Acxiom, White Pages, and ZoomInfo have already been selling data for decades, but with emerging data exchanges you can now easily find platforms that integrate data offerings, including those from third parties, e.g. SingularityNET.

    This trend will definitely accelerate the rise of technologies like data science, machine learning, deep learning, and the cloud.

    7. Edge computing

    Edge computing will go mainstream in 2021, predict Gartner and Forrester.

    “Edge computing is entering the mainstream as organizations look to extend cloud to on-premises and to take advantage of IoT and transformational digital business applications. I&O leaders must incorporate edge computing into their cloud computing plans as a foundation for new application types over the long term.” – Gartner 2021 Strategic Roadmap for Edge Computing

    Many organizations are pushing toward implementing edge computing, to gain benefits like greater reliability, increased scalability, improved performance, and better regulatory compliance options.

    The continued rise in utilizing data by technologies like VR, AR, and 5G networks will further drive the growing demand for edge computing.

    With organizations switching to remote work globally, many have shifted from traditional servers to cloud computing services to boost security, while some have started turning to edge computing to reduce latency, increase internet speed, and boost network performance.

    Stay certified and get ready for the big data change in 2021!

    Source: Dasca

  • 7 predictions about IT in 2045

    There is a good chance that the world will soon be populated not only by billions of people, but also by billions of robots. The IT industry will become the playing field for companies that develop programs for these robots. Just like the apps now built for human users, these 'robo-apps' will be available to download and install.

    The boundaries between robots and humans will blur. Transplants will make use of electronically controlled artificial organs and prostheses. Nanorobots will penetrate deep into the body to deliver medication to diseased cells or to perform microsurgery. Specially installed devices will monitor people's health.

    People will live in smart homes, where most comforts are fully automated. The software running the house manages the consumption and replenishment of energy, water, food, and other supplies.

    Our digital alter egos will finally come fully into their own within a single, global infrastructure that is capable of self-regulation and is involved in managing life on the planet. The system will work a bit like today's TOR; the most active and effective users will earn moderator rights.

    Not only will dull chores be a thing of the past; producing certain goods will no longer be necessary either. Instead, 3D printers will enable us to design and make everything we need.

    The PC may have been at the root of the entire IT revolution, but by 2045 we will probably only see it in museums. The things around us will process their own information.

    Not everyone will be equally enthusiastic about that brave new robot world. Technophobes will probably rise up to resist the development of intelligent homes, automated lifestyles, and robots.

    Source: Automatiseringsgids, 22 January 2014

  • 8 out of 10 companies store sensitive data in the cloud

    85% of companies store sensitive data in the cloud, a significant increase compared to the 54% that said they did so last year. 70% of companies are concerned about the security of this data.

    This is shown by research conducted by 451 Research on behalf of Vormetric, a provider of data security for physical, big data, public, private, and hybrid cloud environments. Sensitive data is, of course, not only stored in the cloud: 50% of companies report having sensitive data in big data systems (compared to 31% last year), and 33% have stored such data in Internet of Things (IoT) environments.

    Concerns about the cloud
    451 Research also asked respondents about their concerns regarding the security of their sensitive data stored in the cloud. The main concerns are:

    • Cyber attacks and breaches at a service provider (70%)
    • The vulnerability of a shared infrastructure (66%)
    • A lack of control over the location where data is stored (66%)
    • The lack of a data privacy policy or privacy SLA (65%)

    Respondents were also asked which changes would increase their willingness to place data in the cloud. The main changes respondents would like to see are:

    • Encryption of data, with the encryption key managed on the company's own infrastructure (48%)
    • Detailed information about physical and IT security (36%)
    • The option to choose encryption for data stored on a service provider's infrastructure (35%)

    Concerns about big data systems
    The storage of sensitive data in big data systems also worries respondents. The main concerns are:

    • The security of reports created with big data systems, since these may contain sensitive data (42%)
    • The fact that data can reside in any location within these environments (41%)
    • Privacy violations involving data originating from different countries (40%)
    • Access by users with 'super user' rights to protected data (37%)
    • The lack of a security framework and management capabilities within the environment (33%)

    451 Research also notes that big data systems often run in the cloud. Concerns about storing sensitive data in the cloud therefore also apply to data stored in big data environments.

    Data in IoT environments also raises concerns
    Finally, 451 Research looks at the concerns companies have about storing data in IoT environments. The main concerns in this area are:

    • Protecting the data created by IoT (35%)
    • Privacy violations (30%)
    • Identifying which data is sensitive (29%)
    • Access by users with 'super user' rights to IoT data and devices (28%)
    • Attacks on IoT devices that could impact critical business operations (27%)

    The full study can be read HERE

    Source: Executive People

  • A look at the major trends driving next generation datacenters

    Data centers have become a core component of modern living, by containing and distributing the information required to participate in everything from social life to economy. In 2017, data centers consumed 3 percent of the world’s electricity, and new technologies are only increasing their energy demand. The growth of high-performance computing — as well as answers to growing cyber-security threats and efficiency concerns — are dictating the development of the next generation of data centers.

    But what will these new data centers need in order to overcome the challenges the industry faces? Here is a look at 5 major trends that will impact data center design in the future.

    1. Hyperscale functionality

    The largest companies in the world are increasingly consolidating computing power in massive, highly efficient hyperscale data centers that can keep up with the increasing demands of enterprise applications. These powerful data centers are mostly owned by tech giants like Amazon or Facebook, and there are currently around 490 of them in existence with more than 100 more in development. It’s estimated that these behemoths will contain more than 50 percent of all data that passes through data centers by 2021, as companies take advantage of their immense capabilities to implement modern business intelligence solutions and grapple with the computing requirements of the Internet of Things (IoT).

    2. Liquid efficiency

    The efficiency of data centers is both an environmental concern and a large-scale economic issue for operators. Enterprises in diverse industries from automotive design to financial forecasting are implementing and relying on machine-learning in their applications, which results in more expensive and high-temperature data center infrastructure. It’s widely known that power and cooling represent the biggest costs that data center owners have to contend with, but new technologies are emerging to combat this threat. Liquid cooling is swiftly becoming more popular for those building new data centers, because of its incredible efficiency and its ability to future-proof data centers against the increasing heat being generated by demand for high-performance computing. The market is expected to grow to $2.5 billion by 2025 as a result.

    3. AI monitoring

    Monitoring software that implements the critical advances made in machine learning and artificial intelligence is one of the most successful technologies that data center operators have put into practice to improve efficiency. Machines are much more capable of reading and predicting the needs of data centers second to second than their human counterparts, and with their assistance operators can manipulate cooling solutions and power usage in order to dramatically increase energy efficiency.
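
    As a toy illustration of predictive monitoring, and not any vendor's actual controller, the sketch below forecasts a rack's inlet temperature from recent readings and adjusts cooling on the forecast rather than the current value. The window size, thresholds, and readings are invented for the example.

        from collections import deque

        def forecast_next(temps, window=5):
            """Naive forecast: mean of the last `window` readings plus the recent trend."""
            recent = list(temps)[-window:]
            trend = (recent[-1] - recent[0]) / max(len(recent) - 1, 1)
            return sum(recent) / len(recent) + trend

        def adjust_cooling(temps, threshold_c=27.0):
            """Return a cooling action based on the forecast, not the current reading."""
            predicted = forecast_next(temps)
            if predicted > threshold_c:
                return "increase cooling"
            if predicted < threshold_c - 3:
                return "reduce cooling"   # save energy when there is plenty of headroom
            return "hold"

        readings = deque([25.0, 25.8, 26.5, 27.1, 27.6], maxlen=60)
        print(adjust_cooling(readings))   # -> "increase cooling" for this rising trend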

    4. DNA storage

    In the two-year span between 2015 and 2017, more data was created than in all of preceding history. As this exponential growth continues, we may soon see the sheer quantity of data outstrip the ability of hard drives to capture it. But researchers are exploring the possibility of storing this immense amount of data within DNA, as it is said that a single gram of DNA is capable of storing 215 million gigabytes of information. DNA storage could provide a viable solution to the limitations of encoding on silicon storage devices, and meet the requirements of an ever-increasing number of data centers despite land constraints near urban areas. But it comes with its own drawbacks. Although it has improved considerably, it is still expensive and extremely slow to write data to DNA. Furthermore, getting data back from DNA involves sequencing it, and decoding files and finding / retrieving specific files stored on DNA is a major challenge. However, according to Microsoft research, algorithms currently being developed may make the cost of sequencing and synthesizing DNA plunge to levels that make it feasible in the future.

    5. Dynamic security

    The average cost of a cyber-attack to the impacted businesses will be more than $150 million by 2020, and data centers are at the center of the modern data security fight. Colocation facilities have to contend with the security protocols of multiple customers, and the march of data into the cloud means that hackers can gain access to it through multiple devices or applications. New physical and cloud security features are going to be critical for the evolution of the data center industry, including biometric security measures on-site to prevent physical access by even the most committed thieves or hackers. More strict security guidelines for cloud applications and on-site data storage will be a major competitive advantage for the most effective data center operators going forward as cyber-attacks grow more costly and more frequent.

    The digital economy is growing more dense and complex every single day, and data center builders and operators need to upgrade and build with the rising demand for artificial intelligence and machine learning in mind. This will make it necessary for greener, more automated, more efficient and more secure data centers to be able to safely host the services of the next generation of digital companies.

    Author: Gavin Flynn

    Source: Information-management

  • A new quantum approach to big data

    From gene mapping to space exploration, humanity continues to generate ever-larger sets of data — far more information than people can actually process, manage, or understand.
    Machine learning systems can help researchers deal with this ever-growing flood of information. Some of the most powerful of these analytical tools are based on a strange branch of geometry called topology, which deals with properties that stay the same even when something is bent and stretched every which way.


    Such topological systems are especially useful for analyzing the connections in complex networks, such as the internal wiring of the brain, the U.S. power grid, or the global interconnections of the Internet. But even with the most powerful modern supercomputers, such problems remain daunting and impractical to solve. Now, a new approach that would use quantum computers to streamline these problems has been developed by researchers at MIT, the University of Waterloo, and the University of Southern California.
    The team describes their theoretical proposal this week in the journal Nature Communications. Seth Lloyd, the paper’s lead author and the Nam P. Suh Professor of Mechanical Engineering, explains that algebraic topology is key to the new method. This approach, he says, helps to reduce the impact of the inevitable distortions that arise every time someone collects data about the real world.


    In a topological description, basic features of the data (How many holes does it have? How are the different parts connected?) are considered the same no matter how much they are stretched, compressed, or distorted. Lloyd explains that it is often these fundamental topological attributes “that are important in trying to reconstruct the underlying patterns in the real world that the data are supposed to represent.”


    It doesn’t matter what kind of dataset is being analyzed, he says. The topological approach to looking for connections and holes “works whether it’s an actual physical hole, or the data represents a logical argument and there’s a hole in the argument. This will find both kinds of holes.”
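
    The simplest of those features, how many separate pieces a dataset falls into (its connected components), can be computed classically with a union-find pass over points that lie within a chosen distance of each other. The sketch below is a minimal classical illustration of that idea, with made-up points; it is not the quantum algorithm the researchers propose.

        from itertools import combinations

        def connected_components(points, scale):
            """Count connected components of the graph linking points closer than `scale`."""
            parent = list(range(len(points)))

            def find(i):
                while parent[i] != i:
                    parent[i] = parent[parent[i]]   # path halving
                    i = parent[i]
                return i

            def union(i, j):
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

            for (i, p), (j, q) in combinations(enumerate(points), 2):
                if sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5 < scale:
                    union(i, j)

            return len({find(i) for i in range(len(points))})

        # Two well-separated clusters -> 2 components at a small scale, 1 at a large scale.
        pts = [(0, 0), (0.5, 0.2), (0.3, 0.4), (5, 5), (5.2, 4.8)]
        print(connected_components(pts, scale=1.0))   # 2
        print(connected_components(pts, scale=10.0))  # 1
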
    Using conventional computers, that approach is too demanding for all but the simplest situations. Topological analysis “represents a crucial way of getting at the significant features of the data, but it’s computationally very expensive,” Lloyd says. “This is where quantum mechanics kicks in.” The new quantum-based approach, he says, could exponentially speed up such calculations.


    Lloyd offers an example to illustrate that potential speedup: If you have a dataset with 300 points, a conventional approach to analyzing all the topological features in that system would require “a computer the size of the universe,” he says. That is, it would take 2^300 (two to the 300th power) processing units — approximately the number of all the particles in the universe. In other words, the problem is simply not solvable in that way.
    “That’s where our algorithm kicks in,” he says. Solving the same problem with the new system, using a quantum computer, would require just 300 quantum bits — and a device this size may be achieved in the next few years, according to Lloyd.


    “Our algorithm shows that you don’t need a big quantum computer to kick some serious topological butt,” he says.
    There are many important kinds of huge datasets where the quantum-topological approach could be useful, Lloyd says, for example understanding interconnections in the brain. “By applying topological analysis to datasets gleaned by electroencephalography or functional MRI, you can reveal the complex connectivity and topology of the sequences of firing neurons that underlie our thought processes,” he says.


    The same approach could be used for analyzing many other kinds of information. “You could apply it to the world’s economy, or to social networks, or almost any system that involves long-range transport of goods or information,” says Lloyd, who holds a joint appointment as a professor of physics. But the limits of classical computation have prevented such approaches from being applied before.


    While this work is theoretical, “experimentalists have already contacted us about trying prototypes,” he says. “You could find the topology of simple structures on a very simple quantum computer. People are trying proof-of-concept experiments.”


    Ignacio Cirac, a professor at the Max Planck Institute of Quantum Optics in Munich, Germany, who was not involved in this research, calls it “a very original idea, and I think that it has a great potential.” He adds “I guess that it has to be further developed and adapted to particular problems. In any case, I think that this is top-quality research.”
    The team also included Silvano Garnerone of the University of Waterloo in Ontario, Canada, and Paolo Zanardi of the Center for Quantum Information Science and Technology at the University of Southern California. The work was supported by the Army Research Office, Air Force Office of Scientific Research, Defense Advanced Research Projects Agency, Multidisciplinary University Research Initiative of the Office of Naval Research, and the National Science Foundation.

    Source:MIT news

  • A Shortcut Guide to Machine Learning and AI in The Enterprise


    Predictive analytics / machine learning / artificial intelligence is a hot topic – what’s it about?

    Using algorithms to help make better decisions has been the “next big thing in analytics” for over 25 years. It has been used in key areas such as fraud detection the entire time. But it’s now become a full-throated mainstream business meme that features in every enterprise software keynote — although the industry is battling with what to call it.

    It appears that terms like Data Mining, Predictive Analytics, and Advanced Analytics are considered too geeky or old for industry marketers and headline writers. The term Cognitive Computing seemed to be poised to win, but IBM’s strong association with the term may have backfired — journalists and analysts want to use language that is independent of any particular company. Currently, the growing consensus seems to be to use Machine Learning when talking about the technology and Artificial Intelligence when talking about the business uses.

    Whatever we call it, it’s generally proposed in two different forms: either as an extension to existing platforms for data analysts; or as new embedded functionality in diverse business applications such as sales lead scoring, marketing optimization, sorting HR resumes, or financial invoice matching.

    Why is it taking off now, and what’s changing?

    Artificial intelligence is now taking off because there’s a lot more data available, and affordable, powerful systems to crunch through it all. It’s also much easier to get access to powerful algorithm-based software in the form of open-source products or embedded as a service in enterprise platforms.

    Organizations today have also become more comfortable with manipulating business data, with a new generation of business analysts aspiring to become “citizen data scientists.” Enterprises can take their traditional analytics to the next level using these new tools.

    However, we’re now at the “Peak of Inflated Expectations” for these technologies according to Gartner’s Hype Cycle — we will soon see articles pushing back on the more exaggerated claims. Over the next few years, we will find out the limitations of these technologies even as they start bringing real-world benefits.

    What are the longer-term implications?

    First, easier-to-use predictive analytics engines are narrowing the gap between “everyday analytics” and the data science team. A “factory” approach to creating, deploying, and maintaining predictive models means data scientists can have greater impact. And sophisticated business users can now access some of the power of these algorithms without having to become data scientists themselves.

    Second, every business application will include some predictive functionality, automating any areas where there are “repeatable decisions.” It is hard to think of a business process that could not be improved in this way, with big implications in terms of both efficiency and white-collar employment.

    Third, applications will use these algorithms on themselves to create “self-improving” platforms that get easier to use and more powerful over time (akin to how each new semi-autonomous-driving Tesla car can learn something new and pass it onto the rest of the fleet).

    Fourth, over time, business processes, applications, and workflows may have to be rethought. If algorithms are available as a core part of business platforms, we can provide people with new paths through typical business questions such as “What’s happening now? What do I need to know? What do you recommend? What should I always do? What can I expect to happen? What can I avoid? What do I need to do right now?”

    Fifth, implementing all the above will involve deep and worrying moral questions in terms of data privacy and allowing algorithms to make decisions that affect people and society. There will undoubtedly be many scandals and missteps before the right rules and practices are in place.

    What first steps should companies be taking in this area?

    As usual, the barriers to business benefit are more likely to be cultural than technical.

    Above all, organizations need to make sure they have the right technical expertise to be able to navigate the confusing array of new vendor offerings, the right business knowledge to know where best to apply them, and the awareness that their technology choices may have unforeseen moral implications.

    Source: timoelliot.com, October 24, 2016

     

  • About how Uber and Netflix turn Big Data into real business value


    From the way we go about our daily lives to the way we treat cancer and protect our society from threats, big data will transform every industry, every aspect of our lives. We can say this with authority because it is already happening.

    Some believe big data is a fad, but they could not be more wrong. The hype will fade, and even the name may disappear, but the implications will resonate and the phenomenon will only gather momentum. What we currently call big data today will simply be the norm in just a few years’ time.

    Big data refers generally to the collection and utilization of large or diverse volumes of data. In my work as a consultant, I work every day with companies and government organizations on big data projects that allow them to collect, store, and analyze the ever-increasing volumes of data to help improve what they do.

    In the course of that work, I’ve seen many companies doing things wrong — and a few getting big data very right, including Netflix and Uber.

    Netflix: Changing the way we watch TV and movies

    The streaming movie and TV service Netflix is said to account for one-third of peak-time Internet traffic in the US, and the service now has 65 million members in over 50 countries enjoying more than 100 million hours of TV shows and movies a day. Data from these millions of subscribers is collected and monitored in an attempt to understand our viewing habits. But Netflix’s data isn’t just “big” in the literal sense. It is the combination of this data with cutting-edge analytical techniques that makes Netflix a true Big Data company.

    Although Big Data is used across every aspect of the Netflix business, their holy grail has always been to predict what customers will enjoy watching. Big Data analytics is the fuel that fires the “recommendation engines” designed to serve this purpose.

    At first, analysts were limited by the lack of information they had on their customers. As soon as streaming became the primary delivery method, many new data points on their customers became accessible. This new data enabled Netflix to build models to predict the perfect storm situation of customers consistently being served with movies they would enjoy.

    Happy customers, after all, are far more likely to continue their subscriptions.

    Another central element to Netflix’s attempt to give us films we will enjoy is tagging. The company pay people to watch movies and then tag them with elements the movies contain. They will then suggest you watch other productions that were tagged similarly to those you enjoyed. 
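
    A minimal sketch of how such tag-based suggestions can work, assuming a hand-made tag catalogue: rank titles by the Jaccard overlap between their tag sets and the tags of something the viewer enjoyed. The titles and tags below are invented for illustration and are not Netflix data.

        def jaccard(a, b):
            """Overlap of two tag sets: |intersection| / |union|."""
            a, b = set(a), set(b)
            return len(a & b) / len(a | b) if a | b else 0.0

        catalogue = {
            "Space Saga":   {"sci-fi", "ensemble-cast", "cliffhanger"},
            "Quiet Harbor": {"drama", "slow-burn", "family"},
            "Nebula Heist": {"sci-fi", "heist", "cliffhanger"},
            "Kitchen Wars": {"reality", "competition"},
        }

        def recommend(liked_title, top_n=2):
            """Rank all other titles by tag similarity to the one the viewer liked."""
            liked_tags = catalogue[liked_title]
            scored = [(jaccard(liked_tags, tags), title)
                      for title, tags in catalogue.items() if title != liked_title]
            return [title for score, title in sorted(scored, reverse=True)[:top_n]]

        print(recommend("Space Saga"))   # 'Nebula Heist' ranks first (shared tags)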

    Netflix’s letter to shareholders in April 2015 shows their Big Data strategy was paying off. They added 4.9 million new subscribers in Q1 2015, compared to four million in the same period in 2014. In Q1 2015 alone, Netflix members streamed 10 billion hours of content. If Netflix’s Big Data strategy continues to evolve, that number is set to increase.

    Uber: Disrupting car services in the sharing economy

    Uber is a smartphone app-based taxi booking service which connects users who need to get somewhere with drivers willing to give them a ride. 

    Uber’s entire business model is based on the very Big Data principle of crowdsourcing: anyone with a car who is willing to help someone get to where they want to go can offer to help get them there. This gives greater choice for those who live in areas where there is little public transport, and helps to cut the number of cars on our busy streets by pooling journeys.

    Uber stores and monitors data on every journey their users take, and use it to determine demand, allocate resources and set fares. The company also carry out in-depth analysis of public transport networks in the cities they serve, so they can focus coverage in poorly served areas and provide links to buses and trains.

    Uber holds a vast database of drivers in all of the cities they cover, so when a passenger asks for a ride, they can instantly match you with the most suitable drivers. The company have developed algorithms to monitor traffic conditions and journey times in real time, meaning prices can be adjusted as demand for rides changes, and traffic conditions mean journeys are likely to take longer. This encourages more drivers to get behind the wheel when they are needed – and stay at home when demand is low. 

    The company have applied for a patent on this method of Big Data-informed pricing, which they call “surge pricing”. This is an implementation of “dynamic pricing” – similar to that used by hotel chains and airlines to adjust price to meet demand – although rather than simply increasing prices at weekends or during public holidays it uses predictive modelling to estimate demand in real time.
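
    As a rough illustration of the dynamic-pricing idea, and not Uber's actual patented method, a surge multiplier can be derived from the ratio of open ride requests to available drivers in an area, capped to keep prices within bounds. The scaling factor and cap below are invented for the example.

        def surge_multiplier(open_requests, available_drivers, cap=3.0):
            """Price multiplier from a simple demand/supply ratio, capped at `cap`."""
            if available_drivers == 0:
                return cap
            ratio = open_requests / available_drivers
            if ratio <= 1.0:            # supply covers demand: no surge
                return 1.0
            return min(round(1.0 + 0.5 * (ratio - 1.0), 2), cap)

        print(surge_multiplier(40, 50))   # 1.0  (quiet period)
        print(surge_multiplier(90, 30))   # 2.0  (demand outstrips supply)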

    Data also drives (pardon the pun) the company’s UberPool service. According to Uber’s blog, introducing this service became a no-brainer when their data told them the “vast majority of [Uber trips in New York] have a look-a-like trip – a trip that starts near, ends near and is happening around the same time as another trip”. 

    Other initiatives either trialed or due to launch in the future include UberChopper, offering helicopter rides to the wealthy, Uber-Fresh for grocery deliveries and Uber Rush, a package courier service.

    These are just two companies using Big Data to generate a very real advantage and disrupt their markets in incredible ways. I’ve compiled dozens more examples of Big Data in practice in my new book of the same name, in the hope that it will inspire and motivate more companies to similarly innovate and take their fields into the future. 

    Thank you for reading my post. Here at LinkedIn and at Forbes I regularly write about management, technology and Big Data. If you would like to read my future posts then please click 'Follow' and feel free to also connect via Twitter, Facebook, SlideShare, and The Advanced Performance Institute.

    You might also be interested in my new and free ebook on Big Data in Practice, which includes 3 Amazing use cases from NASA, Dominos Pizza and the NFL. You can download the ebook from here: Big Data in Practice eBook.

    Author: Bernard Marr

    Source: Linkedin Blog

  • AI-Powered Data Integration: A New Era of Efficiency and Intelligence

    AI-Powered Data Integration: A New Era of Efficiency and Intelligence

    Enterprises are creating and collecting more data than ever, around 2.5 quintillion bytes per day, which will likely continue in the coming years. Businesses are thus constantly looking for solutions that can efficiently collect and combine this data.  

    One of the best solutions these days to solve data integration woes is Artificial Intelligence (AI). Many businesses are increasingly adopting AI to rapidly evolve their data processes as they strive to streamline operations, improve decision-making, and gain a competitive edge.  

    AI is helping companies improve productivity and cut costs while allowing employees to deliver more value. AI is not just a short-term trend that is going to fade away. In fact, it will become prominent as technology improves and business requirements become more intricate.  

    Let’s look at the benefits of using AI to power data integration efforts and what the future holds.  

    Intelligent Data Mapping and Transformation 

    Data mapping, a critical component of data integration, defines relationships between objects in different databases. AI has completely changed data mapping by making it smarter and more efficient. AI-powered data mapping can easily overcome the complexities of diverse data formats and systems, ensuring seamless data flow and harmonization.

    Machine learning algorithms can analyze data patterns, learn from past integration patterns, and suggest mappings and transformations, reducing manual effort and accelerating integration projects and, consequently, time-to-insight.  

    AI can also automatically suggest relevant transformations based on the nature of the data and past inputs, speeding up data processing. Perhaps the best part about using AI is that it can automatically build ingestion pipelines from multiple sources within an enterprise, enabling a business to create a single source of truth.
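
    A bare-bones sketch of the mapping-suggestion idea: score candidate column pairs by name similarity and propose the best matches for human review. Real AI-assisted mappers also learn from data profiles and past mappings; here the standard library's difflib stands in as the matcher, and the column names are hypothetical.

        from difflib import SequenceMatcher

        source_cols = ["cust_id", "cust_name", "order_dt", "amt_usd"]
        target_cols = ["customer_id", "customer_name", "order_date", "amount"]

        def suggest_mappings(source, target, min_score=0.5):
            """Propose source->target column mappings above a similarity threshold."""
            suggestions = {}
            for s in source:
                best = max(target, key=lambda t: SequenceMatcher(None, s, t).ratio())
                score = SequenceMatcher(None, s, best).ratio()
                if score >= min_score:
                    suggestions[s] = (best, round(score, 2))
            return suggestions

        # Anything below the threshold is left unmapped for a human to review.
        for src, (tgt, score) in suggest_mappings(source_cols, target_cols).items():
            print(f"{src} -> {tgt}  (confidence {score})")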

    Boosting Data Quality 

    It is cheaper to solve data quality issues proactively than reactively, not to mention quicker. AI plays a crucial role in accelerating data quality management during integration. AI tools allow businesses to identify and resolve data inconsistencies during run-time, as opposed to after the data is loaded and processed, thus ensuring the integrity and accuracy of integrated data for analysis. 

    These tools can automatically detect and rectify errors a human analyst might have missed (especially for vast datasets). For example, they can capture and remove outliers in a sales dataset to give a realistic average of monthly sales. In fraud detection, real-time integration with AI algorithms can flag suspicious activities, trigger alerts, and facilitate proactive measures to mitigate fraud risks. Basically, AI allows teams to scale their data initiatives while ensuring accuracy and completeness.  
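
    The sales-outlier example can be made concrete with a standard interquartile-range filter: values far outside the middle half of the distribution are dropped before computing the average. The figures below are invented.

        import statistics

        def drop_iqr_outliers(values, k=1.5):
            """Remove points more than k * IQR outside the first/third quartile."""
            q1, _, q3 = statistics.quantiles(values, n=4)
            low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
            return [v for v in values if low <= v <= high]

        monthly_sales = [10_200, 9_800, 10_500, 9_900, 10_100, 98_000]  # last entry is a data-entry error
        cleaned = drop_iqr_outliers(monthly_sales)
        print(round(statistics.mean(monthly_sales)))  # skewed by the outlier
        print(round(statistics.mean(cleaned)))        # realistic monthly average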

    Real-time Integration and Workflow Automation 

    With AI, data integration transcends traditional processing. AI algorithms enable real-time data integration by continuously monitoring data streams and integrating data as it becomes available. This approach allows organizations to react swiftly to critical events like market fluctuations, customer behaviors, or operational changes. For example, real-time integration enables an e-commerce business to instantly update inventory levels across multiple channels, ensuring accurate stock availability and minimizing the risk of overselling.  
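
    A stripped-down sketch of the inventory example: each sale event is applied to a shared stock table as soon as it arrives, so every channel sees the same availability figure instead of waiting for a nightly batch. The event shape and channel names are hypothetical.

        import queue
        import threading

        stock = {"sku-123": 8}
        lock = threading.Lock()
        events = queue.Queue()

        def apply_events():
            """Consume sale events as they arrive and update stock for all channels."""
            while True:
                event = events.get()
                if event is None:            # sentinel to stop the consumer
                    break
                with lock:
                    stock[event["sku"]] -= event["qty"]
                    print(f'{event["channel"]}: sold {event["qty"]}, '
                          f'{stock[event["sku"]]} left for every channel')

        consumer = threading.Thread(target=apply_events)
        consumer.start()
        events.put({"sku": "sku-123", "qty": 1, "channel": "web-shop"})
        events.put({"sku": "sku-123", "qty": 2, "channel": "mobile-app"})
        events.put(None)
        consumer.join()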

    Real-time integration is also helpful in situations with multiple connected devices and sources, such as an Internet of Things (IoT) ecosystem. It enables immediate detection and prompt fixing in case of device failures in home systems, for instance.  

    AI-driven solutions automate complex integration processes by automatically identifying data relationships, validating data integrity, and transforming data into the desired format. This automation is necessary in this fast-paced business environment as it minimizes errors, accelerates integration timelines, and frees up resources for more strategic tasks. 

    Future Outlook 

    The use of AI to power various data management processes, including data integration, will become more common. With time, AI solutions will become more adept at detecting and solving anomalies, further reducing the need for manual intervention. The demand for dedicated ETL and ELT developers will gradually decrease as AI empowers non-technical users to oversee the integration process.  

    Currently, many DI tools are limited by the number of connectors they support. As AI tech becomes more robust, it will allow data management providers to build solutions that support a more comprehensive range of sources.  

    Cognitive automation, driven by AI, will lead to more intelligent and autonomous data integration workflows. AI algorithms will optimize integration tasks, prioritize data processing based on relevance and urgency, and proactively identify data quality issues. This level of automation will result in more efficient data integration processes. 

    Lastly, the future holds great promise for specialized AI and ML engineers. The rise of AI will require trained professionals to implement and monitor advanced machine learning algorithms. Consequently, there will be a surge in demand for relevant training and certifications.

    Final Thoughts 

    There is no denying the fact that AI is the future. AI adoption has become necessary, given the speed at which the world is moving today. It is rapidly reshaping how organizations handle their processes, and data integration is no different. AI’s ability to automate tasks and improve data quality is the key to gaining real-time insights, which in turn are the key to competitive advantage.

    Date: August 2, 2023

    Author: Tehreem Naeem

    Source: Datafloq

  • Artificial intelligence: Can Watson save IBM?

    The history of artificial intelligence has been marked by seemingly revolutionary moments — breakthroughs that promised to bring what had until then been regarded as human-like capabilities to machines. The AI highlights reel includes the “expert systems” of the 1980s and Deep Blue, IBM’s world champion-defeating chess computer of the 1990s, as well as more recent feats like the Google system that taught itself what cats look like by watching YouTube videos.

    But turning these clever party tricks into practical systems has never been easy. Most were developed to showcase a new computing technique by tackling only a very narrow set of problems, says Oren Etzioni, head of the AI lab set up by Microsoft co-founder Paul Allen. Putting them to work on a broader set of issues presents a much deeper set of challenges.
    Few technologies have attracted the sort of claims that IBM has made for Watson, the computer system on which it has pinned its hopes for carrying AI into the general business world. Named after Thomas Watson Sr, the chief executive who built the modern IBM, the system first saw the light of day five years ago, when it beat two human champions on an American question-and-answer TV game show, Jeopardy!
    But turning Watson into a practical tool in business has not been straightforward. After setting out to use it to solve hard problems beyond the scope of other computers, IBM in 2014 adapted its approach.
    Rather than just selling Watson as a single system, its capabilities were broken down into different components: each of these can now be rented to solve a particular business problem, a set of 40 different products such as language-recognition services that amount to a less ambitious but more pragmatic application of an expanding set of technologies.
    Though it does not disclose the performance of Watson separately, IBM says the idea has caught fire. John Kelly, an IBM senior vice-president and head of research, says the system has become “the biggest, most important thing I’ve seen in my career” and is IBM’s fastest growing new business in terms of revenues.
    But critics say that what IBM now sells under the Watson name has little to do with the original Jeopardy!-playing computer, and that the brand is being used to create a halo effect for a set of technologies that are not as revolutionary as claimed.

    “Their approach is bound to backfire,” says Mr Etzioni. “A more responsible approach is to be upfront about what a system can and can’t do, rather than surround it with a cloud of hype.”
    Nothing that IBM has done in the past five years shows it has succeeded in using the core technology behind the original Watson demonstration to crack real-world problems, he says.

    Watson’s case
    The debate over Watson’s capabilities is more than just an academic exercise. With much of IBM’s traditional IT business shrinking as customers move to newer cloud technologies, Watson has come to play an outsized role in the company’s efforts to prove that it is still relevant in the modern business world. That has made it key to the survival of Ginni Rometty, the chief executive who, four years after taking over, is struggling to turn round the company.
    Watson’s renown is still closely tied to its success on Jeopardy! “It’s something everybody thought was ridiculously impossible,” says Kris Hammond, a computer science professor at Northwestern University. “What it’s doing is counter to what we think of as machines. It’s doing something that’s remarkably human.”

    By divining the meaning of cryptically worded questions and finding answers in its general knowledge database, Watson showed an ability to understand natural language, one of the hardest problems for a computer to crack. The demonstration seemed to point to a time when computers would “understand” complex information and converse with people about it, replicating and eventually surpassing most forms of human expertise.
    The biggest challenge for IBM has been to apply this ability to complex bodies of information beyond the narrow confines of the game show and come up with meaningful answers. For some customers, this has turned out to be much harder than expected.
    The University of Texas’s MD Anderson Cancer Center began trying to train the system three years ago to discern patients’ symptoms so that doctors could make better diagnoses and plan treatments.
    “It’s not where I thought it would go. We’re nowhere near the end,” says Lynda Chin, head of innovation at the University of Texas’ medical system. “This is very, very difficult.” Turning a word game-playing computer into an expert on oncology overnight is as unlikely as it sounds, she says.

    Part of the problem lies in digesting real-world information: reading and understanding reams of doctors’ notes that are hard for a computer to ingest and organise. But there is also a deeper epistemological problem. “On Jeopardy! there’s a right answer to the question,” says Ms Chin, but in the medical world there are often just well-informed opinions.
    Mr Kelly denies IBM underestimated how hard challenges like this would be and says a number of medical organisations are on the brink of bringing similar diagnostic systems online.


    Applying the technology
    IBM’s initial plan was to apply Watson to extremely hard problems, announcing in early press releases “moonshot” projects to “end cancer” and accelerate the development of Africa. Some of the promises evaporated almost as soon as the ink on the press releases had dried. For instance, a far-reaching partnership with Citibank to explore using Watson across a wide range of the bank’s activities quickly came to nothing.
    Since adapting in 2014, IBM now sells some services under the Watson brand. Available through APIs, or programming “hooks” that make them available as individual computing components, they include sentiment analysis — trawling information like a collection of tweets to assess mood — and personality tracking, which measures a person’s online output using 52 different characteristics to come up with a verdict.
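
    To give a feel for the sentiment-analysis component, the toy sketch below scores the mood of a few tweets with a small word lexicon. It is a stand-in for illustration only, not the Watson API, and the word lists and tweets are made up.

        POSITIVE = {"great", "love", "fast", "helpful", "happy"}
        NEGATIVE = {"slow", "broken", "hate", "refund", "angry"}

        def sentiment(text):
            """Return a score in [-1, 1] from counts of positive and negative words."""
            words = [w.strip(".,!?").lower() for w in text.split()]
            pos = sum(w in POSITIVE for w in words)
            neg = sum(w in NEGATIVE for w in words)
            return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

        tweets = [
            "Love the new app, support was fast and helpful!",
            "Checkout is broken again, so slow. I want a refund.",
        ]
        for t in tweets:
            print(round(sentiment(t), 2), t)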

    At the back of their minds, most customers still have some ambitious “moonshot” project they hope that the full power of Watson will one day be able to solve, says Mr Kelly; but they are motivated in the short term by making improvements to their business, which he says can still be significant.
    This more pragmatic formula, which puts off solving the really big problems to another day, is starting to pay dividends for IBM. Companies like Australian energy group Woodside are using Watson’s language capabilities as a form of advanced search engine to trawl their internal “knowledge bases”. After feeding more than 20,000 documents from 30 years of projects into the system, the company’s engineers can now use it to draw on past expertise, like calculating the maximum pressure that can be used in a particular pipeline.
    To critics in the AI world, the new, componentised Watson has little to do with the original breakthrough and waters down the technology. “It feels like they’re putting a lot of things under the Watson brand name — but it isn’t Watson,” says Mr Hammond.
    Mr Etzioni goes further, claiming that IBM has done nothing to show that its original Jeopardy!-playing breakthrough can yield results in the real world. “We have no evidence that IBM is able to take that narrow success and replicate it in broader settings,” he says. Of the box of tricks that is now sold under the Watson name, he adds: “I’m not aware of a single, super-exciting app.”

    To IBM, though, such complaints are beside the point. “Everything we brand Watson analytics is very high-end AI,” says Mr Kelly, involving “machine learning and high-speed unstructured data”. Five years after Jeopardy! the system has evolved far beyond its original set of tricks, adding capabilities such as image recognition to expand greatly the range of real-world information it can consume and process.


    Adopting the system
    This argument may not matter much if the Watson brand lives up to its promise. It could be self-fulfilling if a number of early customers adopt the technology and put in the work to train the system to work in their industries, something that would progressively extend its capabilities.

    Another challenge for early users of Watson has been knowing how much trust to put in the answers the system produces. Its probabilistic approach makes it very human-like, says Ms Chin at MD Anderson. Having been trained by experts, it tends to make the kind of judgments that a human would, with the biases that implies.
    In the business world, a brilliant machine that throws out an answer to a problem but cannot explain itself will be of little use, says Mr Hammond. “If you walk into a CEO’s office and say we need to shut down three factories and sack people, the first thing the CEO will say is: ‘Why?’” He adds: “Just producing a result isn’t enough.”
    IBM’s attempts to make the system more transparent, for instance by using a visualisation tool called WatsonPaths to give a sense of how it reached a conclusion, have not gone far enough, he adds.
    Mr Kelly says a full audit trail of Watson’s decision-making is embedded in the system, even if it takes a sophisticated user to understand it. “We can go back and figure out what data points Watson connected” to reach its answer, he says.

    He also contrasts IBM with other technology companies like Google and Facebook, which are using AI to enhance their own services or make their advertising systems more effective. IBM is alone in trying to make the technology more transparent to the business world, he argues: “We’re probably the only ones to open up the black box.”
    Even after the frustrations of wrestling with Watson, customers like MD Anderson still believe it is better to be in at the beginning of a new technology.
    “I am still convinced that the capability can be developed to what we thought,” says Ms Chin. Using the technology to put the reasoning capabilities of the world’s oncology experts into the hands of other doctors could be far-reaching: “The way Amazon did for retail and shopping, it will change what care delivery looks like.”
    Ms Chin adds that Watson will not be the only reasoning engine that is deployed in the transformation of healthcare information. Other technologies will be needed to complement it, she says.
    Five years after Watson’s game show gimmick, IBM has finally succeeded in stirring up hopes of an AI revolution in business. Now, it just has to live up to the promises.

    Source: Financial Times

  • Companies expect a lot from Big Data

    Research by Forrester, commissioned by Xerox, shows that almost three quarters of European companies expect high returns from Big Data and analytics.

    For the study, interviews were held with 330 senior business (CEO, HR, Finance and Marketing) and IT decision-makers at retail, high-tech, industrial and financial services organizations in Belgium, France, Germany, the Netherlands and the United Kingdom. Forrester concludes that 74 percent of Western European companies expect to realize a return on investment (ROI) from insights gained with big data within 12 months of implementation. More than half (56 percent) are already experiencing the benefits of big data.

    Not all smooth sailing
    Simply purchasing an analytics package for large volumes of data is not enough, however. Poor data quality and a lack of expertise hamper the transformation organizations could achieve by working with big data. Sufficient qualified staff will be needed to ensure that the right data is used in the right way.

    Gut feeling
    Big data is essential for decision-making in 2015: 61 percent of organizations say they increasingly base decisions on data-driven intelligence rather than on factors such as gut feeling, opinion or experience.

    Incorrect data
    Incorrect data proves costly: 70 percent of organizations still have incorrect data in their systems, and 46 percent of respondents believe this actually has a negative impact on their business operations.

    Security
    Of the respondents, 37 percent rate data security and privacy as the biggest challenges when implementing big data strategies. Dutch organizations see the lack of access to internal data due to technical bottlenecks as the biggest challenge in implementing big data (36 percent).

    Source: Automatiseringsgids, 1 May 2015

     

  • BI and Big Data: Same or Different?

    BI and Big Data: Same or Different?

    Webster dictionary defines a synonym as "a word having the same or nearly the same meaning" or as "a word or expression accepted as another name for something." This is so true for popular definitions of BI and big data. Forrester defines BI as:

    A set of methodologies, processes, architectures, and technologies that transform raw data into meaningful and useful information used to enable more effective strategic, tactical, and operational insights and decision-making.

    While BI has been a thriving market for decades and will continue to flourish for the foreseeable future, the world doesn't stand still and:

    • Recognizes a need for more innovation. Some of the approaches in earlier generation BI applications and platforms started to hit a ceiling a few years ago. For example, SQL and SQL-based database management systems (DBMS), while mature, scalable, and robust, are not agile and flexible enough in the modern world where change is the only constant.
    • Needs to address some of the limitations of earlier generation BI. In order to address some of the limitations of more traditional and established BI technologies, big data offers more agile and flexible alternatives to democratize all data, such as NoSQL, among many others.

    Forrester defines big data as:

     The practices and technologies that close the gap between the data available and the ability to turn that data into business insight.

    But at the end of the day, while new terms are important to emphasize the need to evolve, change, and innovate, what's infinitely more imperative is that both strive to achieve the same goal: transform data into information and insight. Alas, while many developers are beginning to recognize the synergies and overlaps between BI and big data, quite a few still consider and run both in individual silos.

    Contrary to some of the market hype, data democratization and big data do not eliminate the need for the "BI 101" basics, such as data governance, data quality, master data management, data modeling, well thought out data architecture, and many others. If anything, big data makes these tasks and processes more challenging because more data is available to more people, which in turn may cause new mistakes and drive wrong conclusions. All of the typical end-to-end steps necessary to transform raw data into insights still have to happen; now they just happen in different places and at different times in the process.

    To address this challenge in a "let's have the cake and eat it too" approach, Forrester suggests integrating the worlds of BI and big data in a flexible hub-and-spoke data platform. Our hub-and-spoke BI/Big Data architecture defines such components as:

    • Hadoop based data hubs/lakes to store and process majority of the enterprise data
    • Data discovery accelerators to help profile and discover definitions and meanings in data sources
    • Data governance that differentiates the processes you need to perform at the ingest, move, use, and monitor stages
    • BI that becomes one of many spokes of the Hadoop based data hub
    • A knowledge management portal to front end multiple BI spokes
    • Integrated metadata for data lineage and impact analysis

    Our research also recommends considering architecting the hub-and-spoke environment around the three following key areas:

    • A "cold" layer based on Hadoop, where processes may run slower than in a DBMS but the total cost of ownership is much lower. This is where the majority of your enterprise data should end up.
    • A "warm" area based on DBMS, where queries run faster, but at a price. Forrester typically sees <30% of enterprise data stored and processed in data warehouses and data marts.
    • A "hot" area based on in-memory technology for real-time, low-latency interactive data exploration. While this area requires the most expensive software/hardware investments, real-time data interactivity produces tangible business benefits.
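
    One way to read this layering is as a routing decision for each dataset or query. The sketch below picks a tier from a workload's latency needs and size; the labels and thresholds are placeholders for illustration, not Forrester figures.

        def choose_tier(access_latency: str, size_tb: float) -> str:
            """Map a workload onto the cold/warm/hot layers of a hub-and-spoke platform."""
            if access_latency == "interactive":
                return "hot: in-memory engine"            # low-latency exploration, highest cost
            if access_latency == "reporting" and size_tb < 50:
                return "warm: data warehouse / data mart"  # faster queries, at a price
            return "cold: Hadoop-based data hub"           # bulk of enterprise data, lowest TCO

        print(choose_tier("interactive", 0.2))
        print(choose_tier("reporting", 10))
        print(choose_tier("batch", 400))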

    Author: Boris Evelson

    Source: Information Management

  • Big Data Analytics in Banking

    Big Data Analytics in Banking

    Banking institutions need to use big data to remodel customer segmentation into a solution that works better for the industry and its customers. Basic customer segmentation generalizes customer wants and needs without addressing any of their pain points. Big data allows the banking industry to create individualized customer profiles that help decrease the pains and gaps between bankers and their clients. Big data analytics allows banks to examine large sets of data to find patterns in customer behavior and preferences. Some of this data includes:

    • Social media behavior.
    • Demographic information.
    • Customer spending.
    • Product and service usage — including offers that customers have declined.
    • Impactful life events.
    • Relationships between bank customers.
    • Service preferences and attitudes toward the banking industry as a whole.

    Providing a Personalized Customer Experience with Big Data Analytics

    Banking isn’t known for being an industry that provides tailor-made customer service experiences. Now, with the combination of service history and customer profiles made available by big data analytics, bank culture is changing. 

    Profiling has an invasive ring to it, but it’s really just an online version of what bankers are already doing. Online banking has made it possible for customers to transfer money, deposit checks and pay bills all from their mobile devices. The human interaction that has been traditionally used to analyze customer behavior and create solutions for pain points has gone digital. 

    Banks can increase customer satisfaction and retention due to profiling. Big data analytics allows banks to create a more complete picture of what each of their customers is like, not just a generic view of them. It tracks their actual online banking behaviors and tailors its services to their preferences, like a friendly teller would with the same customer at their local branch. 
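
    A minimal sketch of the segmentation idea behind such profiles: cluster customers on a few behavioural features so service and offers can be tailored per group. The features and numbers are made up, and scikit-learn's KMeans is used purely as an example algorithm.

        import numpy as np
        from sklearn.cluster import KMeans

        # Columns: monthly logins, average balance (thousands), products held
        customers = np.array([
            [25, 1.2, 1],
            [30, 0.9, 2],
            [2, 45.0, 4],
            [3, 60.0, 5],
            [12, 8.0, 2],
            [14, 7.5, 3],
        ])

        segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(customers)
        for row, seg in zip(customers, segments):
            print(f"logins={row[0]:>4.0f} balance={row[1]:>5.1f}k products={row[2]:.0f} -> segment {seg}")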

    Artificial Intelligence’s Role in Banking

    Nothing will ever beat the customer service you can receive in a conversation with a real human being. But human resources are limited by many physical factors that artificial intelligence (AI) can make up for. Where customer service agents may not be able to respond in a timely manner to customer inquiries depending on demand, AI can step in. 

    Chatbots enable customers to receive immediate answers to their questions. Their AI technology uses customer profile information and behavioral patterns to give personalized responses to inquiries. They can even recognize emotions to respond sensitively depending on the customers’ needs. 

    Another improvement we owe to AI is simplified online banking. Advanced machine learning accurately pulls information from documents uploaded online and on mobile apps. This technology is the reason why people can conveniently deposit checks from their smartphones. 

    Effective Fraud Prevention

    Identity fraud is one of the fastest growing forms of theft. With more than 16 million identity theft cases in 2017, fraud protection is becoming increasingly important in the banking industry. Big data analytics can help banks in securing customer account information.

    Business intelligence (BI) tools are used in banking to evaluate risk and prevent fraud. The big data retrieved from these tools determines interest rates for individuals, finds credit scores and pinpoints fraudulent behavior. Big data that’s analyzed to find market trends can help inform personal and industry-wide financial decisions, such as increasing debt monitoring rates.

    Similarly, using big data for predictive purposes can also help financial institutions avoid financial crises before they happen by collecting information on things like cross-border debt and debt-service ratios.
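
    As a toy version of the fraud-flagging idea, the sketch below scores each new transaction against a customer's own history and flags amounts that sit far outside it. A production system would combine many more signals; the z-score rule and figures here are illustrative only.

        import statistics

        def is_suspicious(history, amount, z_threshold=3.0):
            """Flag a transaction whose amount is far outside the customer's usual spending."""
            if len(history) < 5:
                return False                      # not enough history to judge
            mean = statistics.mean(history)
            stdev = statistics.pstdev(history) or 1.0
            return abs(amount - mean) / stdev > z_threshold

        past = [42.0, 18.5, 63.0, 25.0, 30.0, 55.0, 12.0]
        print(is_suspicious(past, 47.0))    # False: within the customer's normal range
        print(is_suspicious(past, 2400.0))  # True: flag for review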

    The Future of Big Data Analytics

    The banking industry can say goodbye to their outdated system of customer guesswork. Big data analytics have made it possible to monitor the financial health and needs of customers, including small business clients. 

    Banks can now leverage big data analytics to detect fraud and assess risks, personalize banking services and create AI-driven customer resources. Data volume will only continue to increase with time as more people create and use this information. The mass of information will grow, but so will its profitability as more industries adopt big data analytic tools. 

    Big data will continue to aid researchers in discovering market trends and making timely decisions. The internet has changed the way people think and interact, which is why the banking industry must utilize big data to keep up with customer needs. As technology continues to improve at a rapid pace, any business that falls behind may be left there.

    Author: Shannon Flynn

    Source: Open Data Science

  • Big Data Analytics: hype?


    Hardly a day goes by without a news item or discussion about data appearing in the media. Whether it concerns privacy questions, the new opportunities and threats of Big Data, or new services based on cleverly combining and exchanging data: there is no escaping the fact that information is 'hot'. 

    Is Big Data Analytics - the analysis of large volumes of data, much of it unstructured - a hype? When the term suddenly popped up everywhere a few years ago, many sceptics said it was a trick by software vendors to re-market something that already existed; data analysis has been applied for a long time. By now, experts agree that Big Data Analytics, in the form in which it can be applied today, will have an enormous impact on the world as we know it. Yes, it is a hype, but a justified one.

    Big Data Analytics – what is it, exactly?

    Big Data has been hyped for years and will remain so for a while. When exactly does data become 'Big' Data; at how many tera-, peta- or yottabytes (10^24 bytes) does the line between 'normal' and 'Big' Data lie? The answer: there is no clear boundary. You speak of Big Data when the data becomes too much for your people and resources. Big Data Analytics focuses on exploring data with statistical methods to gain new insights that can be used to improve future performance. 

    Big Data Analytics as a means of steering performance is already widely used by companies. Think of a sports club that uses it to decide which players to buy. Or a bank that stopped recruiting talent exclusively from top universities because candidates from less prestigious universities turned out to perform better. Or an insurance company that uses it to detect fraud. And so on. 

    What makes Big Data Analytics possible? 

    At least three developments are pushing Big Data Analytics into an entirely new phase. 

    1. Computing power 

    The increasing computing power of computers enables analysts to work with enormous datasets and to include a large number of variables in their analyses. Thanks to this increased computing power it is no longer necessary to take a sample, as in the past; all the data can be used in an analysis. The analysis is done with specific tools and often requires specific knowledge and skills from the user, a data analyst or data scientist. 

    2. Data creation 

    The internet and social media are making the amount of data we create grow exponentially. This data can be used for countless data analysis applications, most of which have yet to be invented. 

    To get a sense of this data growth, consider these statistics: 

    - More than a billion tweets are sent every 48 hours.

    - A million Twitter accounts are added every day.

    - Every 60 seconds, 293,000 status updates are posted on Facebook.

    - The average Facebook user creates 90 pieces of content per month, including links, news, stories, photos and videos. 

    - Every minute, 500 Facebook accounts are added. 

    - Every day, 350 million photos are uploaded to Facebook, which comes down to about 4,000 photos per second.

    - If Wikipedia were a book, it would span more than two billion pages. 

    Source: http://www.iacpsocialmedia.org

    3. Data storage 

    The cost of storing data has dropped sharply in recent years, which has expanded the possibilities for applying analytics. Video footage is one example. Security cameras in a supermarket used to record everything on tape; if nothing had happened after three days, the tape was rewound and recorded over.  

    That is no longer necessary. A supermarket can now send digital footage covering the entire store to the cloud, where it is kept. Analytics can then be applied to that footage: which promotions work well? Which shelves do people linger in front of? What are the blind spots in the store? Or predictive analytics: if we were to put this product on this shelf, what would the result be? Management can use these analyses to arrive at an optimal store layout and get the maximum return from promotions.  

    The significance of Big Data Analytics

    Big Data - or Smart Data, as Bernard Marr, author of the practical book 'Big Data: Using SMART Big Data Analytics To Make Better Decisions and Improve Performance', prefers to call it - is changing the world. The amount of data is currently growing exponentially, but for most decision makers the volume itself is largely irrelevant. What matters is how it is used to arrive at valuable insights.  

    Big Data 

    Opinions differ on what big data actually is. Gartner defines big data in terms of the three V's: Volume, Velocity and Variety. In other words, it is about the amount of data, the speed at which the data can be processed, and the diversity of the data. The last of these means that, in addition to structured sources, data can also be drawn from all kinds of unstructured sources, such as the internet and social media, including text, speech and images.

    Analytics

    Who wouldn't want to predict the future? With enough data, the right technology and a dose of mathematics, that comes within reach. This is called business analytics, but many other terms are in circulation, such as data science, machine learning and, indeed, big data. Even though the underlying mathematics has been around for quite some time, it is still a relatively young field that until recently was accessible only to specialized companies with deep pockets.

    Yet, without realizing it, we all use it already. Speech recognition on your phone, virus scanners on your PC and spam filters for email are based on concepts that fall within the domain of business analytics. The development of self-driving cars and all the steps towards them (adaptive cruise control, lane departure systems, et cetera) is also only possible thanks to machine learning. 

    In short, analytics is the discovery and communication of meaningful patterns in data. Companies can apply analytics to business data to describe, predict and improve their performance. There are different kinds of analytics, such as text analytics, speech analytics and video analytics. 

    An example of text analytics is a law firm that uses it to search through thousands of documents to quickly find the information needed to prepare a new case. Speech analytics is used in call centers, for instance, to determine the caller's mood so that the agent can anticipate it as well as possible. Video analytics can be used to monitor security cameras: unusual patterns are picked out automatically, so security staff can respond and no longer have to stare at a screen for hours while nothing happens.  

    The process can be approached both top-down and bottom-up. The most common approaches are: 

    • Data mining: examining data on the basis of a targeted question, looking for a specific answer.
    • Trend analysis and predictive analytics: deliberately searching for cause-and-effect relationships in order to explain certain events or to predict future behavior (see the sketch after this list).
    • Data discovery: exploring data for unexpected relationships or other striking findings.
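
    As a minimal sketch of the trend-analysis approach mentioned above: fit a straight line to a series of monthly sales figures and extrapolate it a few months ahead. The numbers are invented for illustration.

        # Trend-analysis sketch: fit a linear trend to monthly sales and
        # extrapolate three months ahead. The figures are invented.
        import numpy as np

        monthly_sales = np.array([120, 132, 128, 141, 150, 149, 163, 171, 168, 180, 188, 195])
        months = np.arange(len(monthly_sales))

        slope, intercept = np.polyfit(months, monthly_sales, deg=1)   # least-squares line
        future = np.arange(len(monthly_sales), len(monthly_sales) + 3)
        forecast = slope * future + intercept

        print(f"trend: +{slope:.1f} per month, next 3 months: {forecast.round(1)}")

    Real predictive analytics uses richer models and many more variables, but the principle is the same: learn a pattern from historical data and project it forward.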

    Facts and dimensions

    The data that help to gain insights or make decisions are facts: EBITDA, revenue or number of customers, for example. These facts get their value from dimensions, such as the revenue for the year 2014 for the baby food product line in the East region. By analyzing along dimensions you can discover relationships, identify trends and make predictions about the future.
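
    In code, the distinction is just a filter and a group-by: the fact (revenue) is summed along one or more dimensions. A minimal sketch with invented records:

        # Facts vs. dimensions: revenue is the fact; year, product_line and region
        # are dimensions used to slice it. The records are invented.
        import pandas as pd

        sales = pd.DataFrame([
            {"year": 2014, "product_line": "baby food", "region": "East", "revenue": 1.2e6},
            {"year": 2014, "product_line": "baby food", "region": "West", "revenue": 0.9e6},
            {"year": 2014, "product_line": "snacks",    "region": "East", "revenue": 2.1e6},
            {"year": 2013, "product_line": "baby food", "region": "East", "revenue": 1.0e6},
        ])

        # The fact for one combination of dimensions: 2014, baby food, East.
        subset = sales.query("year == 2014 and product_line == 'baby food' and region == 'East'")
        print(subset["revenue"].sum())

        # Aggregating the same fact over a single dimension reveals a trend.
        print(sales.groupby("year")["revenue"].sum())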

    Analytics versus Business Intelligence

    How does analytics differ from business intelligence (BI)? In essence, analytics is data-based decision support. BI shows what has happened, based on historical data presented in predefined reports. Where BI provides insight into the past, analytics focuses on the future. Analytics tells you what may happen by using the daily changing stream of data to run 'what if' scenarios, estimate outcomes and predict risks and trends.

    Examples of Big Data Analytics

    The world is getting smarter and smarter. Everything is measurable, from our heart rate during a jog to walking patterns in stores. By using that data, we can build impressive analyses to, for example, prevent traffic jams, suppress epidemics early and offer personalized medicine.

    This evolution is visible even in the most traditional industries, such as fishing. Instead of relying purely on a compass and 'insider knowledge' passed down through generations of fishing families, today's fisherman attaches sensors to fish and tracks schools with the most advanced GPS systems. Big Data Analytics is now applied in every industry and sector, and cities use it too. Below is an overview of possible applications:

    Understanding your target group better

    The American mega-retailer Target can tell from a combination of 25 purchases when a woman is pregnant. That is one of the few periods in a person's life when buying behavior deviates from routine, and Target plays into it cleverly with baby-related offers. Amazon has become so good at predictive analytics that it can ship products to you before you have even bought them. If it is up to them, you will soon have your order delivered by drone within 30 minutes.

    Improving processes 

    Processes are also changing because of Big Data. Take purchasing. Walmart knows that more Pop Tarts are sold when there is a storm warning. They don't know why, but they do make sure they have enough stock and give the snacks a prominent spot in the store. Another process where data offers major opportunities for optimization is the supply chain. Which routes do you have your drivers take, and in what order do they deliver orders? Real-time weather and traffic data allows for adjustments along the way. 

    Business optimization

    At Q-Park, customers pay per minute for parking, but it is also possible to take out a subscription. The price per minute is many times cheaper with a subscription. When the garage starts to fill up, it is unfortunate if a subscription customer happens to drive in, because that costs revenue. The analytics system therefore periodically calculates the optimal mix of subscription and non-subscription spots based on historical data. This way the garage operator gets the most out of what there is to be had. 
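
    A toy version of that calculation is sketched below: grid-search the number of subscription spots that maximizes expected revenue, given an assumed capacity, assumed prices and an assumed level of pay-per-minute demand. None of the numbers are Q-Park figures.

        # Toy sketch: choose how many of 100 spots to reserve for subscribers.
        # Capacity, prices and demand are invented assumptions, not Q-Park data.
        CAPACITY = 100
        SUB_REVENUE_PER_SPOT = 150.0    # assumed monthly revenue per subscription spot
        PAYG_REVENUE_PER_SPOT = 260.0   # assumed monthly revenue per fully used pay-per-minute spot
        EXPECTED_PAYG_DEMAND = 70       # assumed average pay-per-minute demand (spots)

        def expected_revenue(sub_spots: int) -> float:
            payg_spots = CAPACITY - sub_spots
            used_payg = min(payg_spots, EXPECTED_PAYG_DEMAND)   # demand capped by remaining capacity
            return sub_spots * SUB_REVENUE_PER_SPOT + used_payg * PAYG_REVENUE_PER_SPOT

        best_mix = max(range(CAPACITY + 1), key=expected_revenue)
        print(best_mix, round(expected_revenue(best_mix), 2))

    A production system would work from the full historical arrival data rather than a single average, but the optimization step itself is no more complicated than this.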

    Optimizing machines 

    General Electric (GE) is an enthusiastic user of big data. The conglomerate already uses a lot of data in its data-intensive sectors, such as healthcare and financial services, but it also sees industrial applications, for example in GE's locomotive, jet engine and gas turbine businesses. GE characterizes the equipment in industries like these as 'things that spin' and expects that most, if not all, of those things will soon be able to record and communicate data about that 'spinning'. 

    One of those spinning things is the gas turbine that GE's customers use to generate power. GE already monitors more than 1,500 turbines from a central facility, so a large part of the infrastructure for using big data to improve performance is already in place. GE estimates that it can improve the efficiency of the monitored turbines by at least 1 percent through software and network optimization, more effective maintenance handling and better harmonization of the gas-power system. That may not sound like much, but it would amount to fuel savings of 66 billion dollars over the next 15 years.
    (source: 'Big Data at Work' by Thomas Davenport)

    Customer service and commerce

    A major gain of big data's new possibilities for companies is that they can connect everything to everything else: silos, systems, products, customers, and so on. In telecom, for example, the cost-to-serve concept has been introduced. It allows operators to look, from actual operations, at the contact points they have with a customer: how often he calls customer service; what his payment behavior is; how he uses his subscription; how he became a customer; how long he has been a customer; where he lives and works; which phone he uses; et cetera. 

    When the telecom company brings the data from all those angles together, a completely different view of the costs and revenue of that customer suddenly emerges. In that multitude of viewpoints lie opportunities. Simply by integrating data and viewing it in context, surprising new insights are guaranteed to emerge. What companies typically look at now is the top 10 customers who contribute the most and the least to revenue, and then draw a line between them. That is a very limited use of the available data. By sketching the context, the company may be able to come up with actions to encourage the bottom 10 to do a bit more, or to part ways with them after all, but then as a deliberate decision.

    Smart cities

    New York City now uses a 'soundscape' of the entire city. A disturbance in the typical city noise, such as a gunshot, is immediately passed on to the police, who can respond. Criminals are facing a difficult century thanks to the application of this kind of Big Data Analytics. 

    Smart hospitals

    Whether it concerns the information collected during a patient's hospital stay or information from general annual reports: Big Data is becoming increasingly important for hospitals, for improved patient care, better scientific research and business information. Medical data doubles in volume every five years, and this data can be of great value in delivering the right care.

    HR Analytics

    Data can be used to monitor and assess employee performance. This applies not only to a company's employees, but will also increasingly be used to assess the top layer of managers and leaders objectively. 

    One company that has reaped the benefits of HR analytics is Google. The internet and tech giant never really believed that managers had much impact, so its analytics team set to work on the question: 'Do managers actually have a positive impact at Google?' Their analysis showed that managers do indeed make a difference and can have a positive impact at Google. The next question was: 'What makes a great manager at Google?' This resulted in 8 behaviors of the best managers and the 3 biggest pitfalls, which led to a highly effective training and feedback program for managers that has had a very positive influence on Google's performance.  

    Big Data Analytics in SMEs

    A common misconception about Big Data is that it is only for large companies. Wrong: every company, large or small, can put data to work. In his book, Bernard Marr gives an example of a small fashion retailer he worked with. 

    The company in question wanted to increase its sales, but it had no data to work with other than traditional sales data. So they first came up with a number of questions:

    - How many people pass our stores?

    - How many stop to look at the window display, and for how long?

    - How many then come inside?

    - How many end up buying something? 

    They then placed a small, discreet device behind the window that measured the number of passing mobile phones (and thus people). The device also recorded how many people stopped in front of the window and for how long, and how many came inside. Sales data then captured how many people bought what. The retailer could then experiment with different window displays to test which were the most successful. The project led to significantly higher revenue, and to the closure of one struggling branch that turned out not to attract enough passers-by.  
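
    The underlying analysis is a simple conversion funnel per window display: passers-by, stoppers, visitors, buyers. A minimal sketch with invented counts:

        # Conversion funnel per window display; the counts are invented.
        displays = {
            "display_A": {"passed": 4800, "stopped": 610, "entered": 240, "bought": 90},
            "display_B": {"passed": 4700, "stopped": 540, "entered": 150, "bought": 40},
        }

        for name, f in displays.items():
            stop_rate  = f["stopped"] / f["passed"]
            enter_rate = f["entered"] / f["stopped"]
            buy_rate   = f["bought"] / f["entered"]
            overall    = f["bought"] / f["passed"]
            print(f"{name}: stop {stop_rate:.1%}, enter {enter_rate:.1%}, "
                  f"buy {buy_rate:.1%}, overall {overall:.1%}")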

    Conclusion

    The Big Data revolution is making the world smarter at a rapid pace. For companies, the challenge is that this revolution is happening alongside business as usual. There is still much to be done before most organizations are able to truly profit from Big Data Analytics; most are already happy if they can report and analyze properly. Many companies still have to start experimenting, which may require overcoming their cold feet. What is certain is that a great many opportunities will now arise quickly. The race that has begun will show who runs off with the new insights. 

    Author: Jeppe Kleyngeld

    Source: FMI

                

  • Big data and the future of the self-driving car


    Each year, car manufacturers get closer to successfully developing a fully autonomous vehicle. Over the last several years, major tech companies have paired up with car manufacturers to develop the advanced technology that will one day allow the majority of vehicles on the road to be autonomous. Of the five levels of automation, companies like Ford and Tesla are hovering around level three, which offers several autonomous driving functions but still requires a person to be attentive behind the wheel.

    However, car manufacturers are expected to release fully autonomous vehicles to the public within the next decade. These vehicles are expected to bring a large number of safety and environmental benefits. Self-driving technology has come a long way over the last few years, as the growth of big data in the technology industry has helped provide car manufacturers with the programming data needed to get closer to fully automating cars. Big data helps supply autonomous cars with enough information and deep-learning capability to make them safer for all drivers.

    History of self-driving cars

    The first major automation in cars was cruise control, which was patented in 1950 and is now used by most drivers to keep their speed steady during long drives. Most modern cars already have several automated functions, like proximity warnings and steering adjustment, which have been tried, tested and proven to be valuable features for safe driving. These technologies use sensors to alert drivers when they are coming too close to something that may be out of view or that they simply have not noticed.

    The fewer functions drivers have to worry about and pay attention to, the more they’re able to focus on the road in front of them and stay alert to dangerous circumstances that could occur at any moment. Human error causes 90 percent of all crashes on the roads, which is one of the main reasons so many industries support the development of autonomous vehicles. However, even when a driver is completely attentive, circumstances that are out of their control could cause them to go off the road or crash into other vehicles. Car manufacturers are still working on the programming for autonomous driving in weather that is less than ideal.

    Big data’s role in autonomous vehicle development

    Although these technologies provided small steps toward automation, they remained milestones away from a fully automated vehicle. However, over the last decade, with the large range of advancements that have been made in technology and the newfound use of big data, tech companies have discovered the necessary programming for fully automating vehicles. Autonomous vehicles rely entirely on the data they receive through GPS, radar and sensor technology, and the information they process through cameras.

    The information cars receive through these sources provides them with the data needed to make safe driving decisions. Although car manufacturers are still using stores of big data to work out the kinks of the thousands of scenarios an autonomous car could find itself in, it’s only a matter of time before self-driving cars transform the automotive industry by making up the majority of cars on the road. As the price of the advanced radars for these vehicles goes down, self-driving cars should become more accessible to the public, which will increase the safety of roads around the world.

    Big data is changing industries worldwide, and deep learning is contributing to the progress towards fully autonomous vehicles. Although it will still be several decades before the mass adoption of self-driving cars, the change will slowly but surely come. In only a few decades, we’ll likely be living in a time where cars are a safer form of transportation, and accidents are tragedies that are few and far between.

    Source: Insidebigdata

  • Big data can’t bring objectivity to a subjective world

    It seems everyone is interested in big data these days. From social scientists to advertisers, professionals from all walks of life are singing the praises of 21st-century data science.
     
    In the social sciences, many scholars apparently believe it will lend their subject a previously elusive objectivity and clarity. Sociology books like An End to the Crisis of Empirical Sociology? and work from bestselling authors are now talking about the superiority of “Dataism” over other ways of understanding humanity. Professionals are stumbling over themselves to line up and proclaim that big data analytics will enable people to finally see themselves clearly through their own fog.
     
    However, when it comes to the social sciences, big data is a false idol. In contrast to its use in the hard sciences, the application of big data to the social, political and economic realms won’t make these areas much clearer or more certain.
     
    Yes, it might allow for the processing of a greater volume of raw information, but it will do little or nothing to alter the inherent subjectivity of the concepts used to divide this information into objects and relations. That’s because these concepts — be they the idea of a “war” or even that of an “adult” — are essentially constructs, contrivances liable to change their definitions with every change to the societies and groups who propagate them.
     
    This might not be news to those already familiar with the social sciences, yet there are nonetheless some people who seem to believe that the simple injection of big data into these “sciences” should somehow make them less subjective, if not objective. This was made plain by a recent article published in the September 30 issue of Science.
     
    Authored by researchers from the likes of Virginia Tech and Harvard, “Growing pains for global monitoring of societal events” showed just how off the mark is the assumption that big data will bring exactitude to the large-scale study of civilization.
     
    More precisely, it reported on the workings of four systems used to build supposedly comprehensive databases of significant events: Lockheed Martin’s International Crisis Early Warning System (ICEWS), Georgetown University’s Global Data on Events Language and Tone (GDELT), the University of Illinois’ Social, Political, and Economic Event Database (SPEED) and the Gold Standard Report (GSR) maintained by the not-for-profit MITRE Corporation.
     
    Its authors tested the “reliability” of these systems by measuring the extent to which they registered the same protests in Latin America. If they or anyone else were hoping for a high degree of duplication, they were sorely disappointed, because they found that the records of ICEWS and SPEED, for example, overlapped on only 10.3 percent of these protests. Similarly, GDELT and ICEWS hardly ever agreed on the same events, suggesting that, far from offering a complete and authoritative representation of the world, these systems are as partial and fallible as the humans who designed them.
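
    The reliability test described here boils down to comparing two event lists on a shared key. A minimal sketch, matching events on (date, city), is shown below; the example records are invented and the matching rule is a simplification of what the paper's authors actually did.

        # Sketch of a reliability check: how much do two event databases overlap?
        # Matching on (date, city) is a simplification; the records are invented.
        def overlap(events_a, events_b):
            keys_a = {(e["date"], e["city"]) for e in events_a}
            keys_b = {(e["date"], e["city"]) for e in events_b}
            return len(keys_a & keys_b) / len(keys_a | keys_b)   # shared fraction

        system_a = [{"date": "2013-06-20", "city": "Sao Paulo"},
                    {"date": "2013-06-21", "city": "Rio de Janeiro"}]
        system_b = [{"date": "2013-06-20", "city": "Sao Paulo"},
                    {"date": "2013-06-22", "city": "Brasilia"}]

        print(f"overlap: {overlap(system_a, system_b):.1%}")   # 33.3% for these toy records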
     
    Even more discouraging was the paper’s examination of the “validity” of the four systems. For this test, its authors simply checked whether the reported protests actually occurred. Here, they discovered that 79 percent of GDELT’s recorded events had never happened, and that ICEWS had gone so far as entering the same protests more than once. In both cases, the respective systems had essentially identified occurrences that had never, in fact, occurred.
     
    They had mined troves and troves of news articles with the aim of creating a definitive record of what had happened in Latin America protest-wise, but in the process they’d attributed the concept “protest” to things that — as far as the researchers could tell — weren’t protests.
     
    For the most part, the researchers in question put this unreliability and inaccuracy down to how “Automated systems can misclassify words.” They concluded that the examined systems had an inability to notice when a word they associated with protests was being used in a secondary sense unrelated to political demonstrations. As such, they classified as protests events in which someone “protested” to her neighbor about an overgrown hedge, or in which someone “demonstrated” the latest gadget. They operated according to a set of rules that were much too rigid, and as a result they failed to make the kinds of distinctions we take for granted.
     
    As plausible as this explanation is, it misses the more fundamental reason as to why the systems failed on both the reliability and validity fronts. That is, it misses the fact that definitions of what constitutes a “protest” or any other social event are necessarily fluid and vague. They change from person to person and from society to society. Hence, the systems failed so abjectly to agree on the same protests, since their parameters on what is or isn’t a political demonstration were set differently from each other by their operators.
     
    Make no mistake, the basic reason as to why they were set differently from each other was not because there were various technical flaws in their coding, but because people often differ on social categories. To take a blunt example, what may be the systematic genocide of Armenians for some can be unsystematic wartime killings for others. This is why no amount of fine-tuning would ever make such databases as GDELT and ICEWS significantly less fallible, at least not without going to the extreme step of enforcing a single worldview on the people who engineer them.
     
    Much the same could be said for the systems’ shortcomings in the validity department. While the paper’s authors stated that the fabrication of nonexistent protests was the result of the misclassification of words, and that what’s needed is “more reliable event data,” the deeper issue is the inevitable variation in how people classify these words themselves.
     
    It’s because of this variation that, even if big data researchers make their systems better able to recognize subtleties of meaning, these systems will still produce results with which other researchers find issue. Once again, this is because a system might perform a very good job of classifying newspaper stories according to how one group of people might classify them, but not according to how another would classify them.
     
    In other words, the systematic recording of masses of data alone won’t be enough to ensure the reproducibility and objectivity of social studies, because these studies need to use often controversial social concepts to make their data significant. They use them to organize “raw” data into objects, categories and events, and in doing so they infect even the most “reliable event data” with their partiality and subjectivity.
     
    What’s more, the implications of this weakness extend far beyond the social sciences. There are some, for instance, who think that big data will “revolutionize” advertising and marketing, allowing these two interlinked fields to reach their “ultimate goal: targeting personalized ads to the right person at the right time.” According to figures in the advertising industry “[t]here is a spectacular change occurring,” as masses of data enable firms to profile people and know who they are, down to the smallest preference.
     
    Yet even if big data might enable advertisers to collect more info on any given customer, this won’t remove the need for such info to be interpreted by models, concepts and theories on what people want and why they want it. And because these things are still necessary, and because they’re ultimately informed by the societies and interests out of which they emerge, they maintain the scope for error and disagreement.
     
    Advertisers aren’t the only ones who’ll see certain things (e.g. people, demographics, tastes) that aren’t seen by their peers.
     
    If you ask the likes of Professor Sandy Pentland from MIT, big data will be applied to everything social, and as such will “end up reinventing what it means to have a human society.” Because it provides “information about people’s behavior instead of information about their beliefs,” it will allow us to “really understand the systems that make our technological society” and allow us to “make our future social systems stable and safe.”
     
    That’s a fairly grandiose ambition, yet the possibility of these realizations will be undermined by the inescapable need to conceptualize information about behavior using the very beliefs Pentland hopes to remove from the equation. When it comes to determining what kinds of objects and events his collected data are meant to represent, there will always be the need for us to employ our subjective, biased and partial social constructs.
     
    Consequently, it’s unlikely that big data will bring about a fundamental change to the study of people and society. It will admittedly improve the relative reliability of sociological, political and economic models, yet since these models rest on socially and politically interested theories, this improvement will be a matter of degree rather than kind. The potential for divergence between separate models won’t be erased, and so, no matter how accurate one model becomes relative to the preconceptions that birthed it, there will always remain the likelihood that it will clash with others.
     
    So there’s little chance of a big data revolution in the humanities, only the continued evolution of the field.
  • Big Data changes CI fast!

    Few professions are left unchanged by technology. From healthcare to retail to professional sports, professionals of every stripe use it to do what they do better through the application of its most prolific output: data. So it makes sense that an entire industry based on the analysis of data and information is also undergoing a revision.

    Although it’s one of those industries few people think about, the competitive intelligence (CI) field has been gaining ground steadily since it was officially recognized as a discipline soon after World War II. The industry really did not get its official CI designation until the early 1980s, when its trade association, the Strategic and Competitive Intelligence Professionals (SCIP), was established.

    There are a few variations on the CI theme, customer intelligence and market intelligence being the most widely recognized outside of the profession. Because of the explosion of data sources, cheap processing power, and many analytics vendors today, “integrated intelligence” is taking over as the umbrella term under which all data collection, analysis, and interpretation and dissemination activities take place.

    “CI has expanded far beyond human intelligence and primary data collection at this point,” said SCIP Executive Director Nannette Bulger. “CI professionals are handling market sizing, segmentation, strategic analysis to support mergers and acquisitions, and so on.”

    As one would expect based on its name, CI is all about providing businesses with insights so they can best the competition. But with the rise of the Internet and the widespread dissemination of information and data, what was once an ivory tower type of pursuit involving highly trained specialists is now, in many respects, the job of everyone in an organization. Data are everywhere. The ability to make sense of it all is still a rare skill, but its collection and organization are no longer the primary job of CI professionals.

    And this is challenging many in the business to take a fresh look at what they do, said Bulger. Where once there were just a handful of players claiming CI leadership, now the industry is under siege by a growing host of data-focused startups as well as business intelligence software vendors, marketing automation players, and anyone else who analyzes data to understand business better.

    Bulger trots out example after example of companies that have figured out new business-focused uses for analytics tools developed for other purposes.

    “Before, it was just a lot of data dissemination,” she said. “Now, you have people coming out of MIT’s Media Lab working with cosmetics vendors, for example. There’s a tidal wave coming that the established vendors are trying to ignore.”

    The crest of that wave is companies coming into the CI market that may not be seen as threats by current vendors. There is a supply chain mapping company, for example, that is now doing analytics on their data to help companies avoid disruptions to their operations if a source of raw materials suddenly goes away. Providing a company with the knowledge it needs to switch suppliers quickly and continue operations is a serious competitive advantage in times of scarcity.

    While not CI directly, it is a great example of unlooked-for uses of technology: the Formula One racing team McLaren is helping doctors monitor infants in intensive care units, said Bulger. Auto racing teams use real-time monitoring from sensors all over their racecars, and as it turns out, this same technology is ideal for monitoring people’s vital signs.

    It is this type of disruption that has the profession in a state of flux, making even deciding who is an “intelligence” vendor and who is not problematic. Tech industry research house IDC has gone so far as to rebrand the entire industry, calling any business that provides other businesses with intelligence “value-added content” (VAC) providers:

    “Big Data and analytics (BDA) solutions are fueling a demand for more and wider varieties of data every day,” IDC wrote in a 2014 opinion brief about HGdata, a big data analytics company. “A raft of new companies that provide a range of data types – from wind speed data to data about what people are watching, reading, or listening to – are emerging to coexist with and sometimes replace more traditional data vendors in the information industry. What’s more is that organizations in many industries are curating and adding value to that content, in some cases transforming it completely and finding new ways of deriving economic value from the data. Value-added content (VAC) is an emerging market. Social media, blog posts, Web transactions, industrial data, and many other types of data are being aggregated, curated, enhanced, and sold to organizations hungry to understand their customers and products as well as the markets in which they exist.”

    At the end of the day, the big data revolution is not about data. It’s about doing what we do better. Whether improving a process, finding a cure for a rare disease, taking over market share from a competitor, or just understanding how things really work, all of these things can be done better through the analysis of data – the more data, the better. Like the world when viewed through the lens of an ultra-slow motion camera or at the tip of an electron microscope, big data gives people the ability to see things they otherwise would not be able to see.

    “I believe this is a huge growth area,” said Heather Cole, president of business intelligence solutions company Lodestar Solutions. “Companies are beginning to feel the effects of ‘digital disruption.’ They must be innovative to thrive. Customer intelligence is a valuable part of innovation. Companies that identify why their clients buy from them find new clients to serve or a new product to serve their existing clients, and will find it is much easier to hold margins and market share even in a highly competitive market.”

  • Big data defeats dengue

    Numbers have always intrigued Wilson Chua, a big data analyst hailing from Dagupan, Pangasinan and currently residing in Singapore. An accountant by training, he crunches numbers for a living, practically eats them for breakfast, and scans through rows and rows of Excel files like a madman.
     
    About 30 years ago, just when computer science was beginning to take off, Wilson stumbled upon the idea of big data. And then he swiftly fell in love. He came across the story of John Snow, the English physician who solved the cholera outbreak in London in 1854, which fascinated him with the idea even further. “You can say he’s one of the first to use data analysis to come out with insight,” he says.
     
    In 1850s-London, everybody thought cholera was airborne. Nobody had any inkling, not one entertained the possibility that the sickness was spread through water. “And so what John Snow did was, he went door to door and made a survey. He plotted the survey scores and out came a cluster that centered around Broad Street in the Soho District of London.
     
    “In the middle of Broad Street was a water pump. Some of you already know the story, but to summarize it even further, he took the lever of the water pump so nobody could extract water from that anymore. The next day,” he pauses for effect, “no cholera.”
     
    The story had stuck with him ever since, but never did he think he could do something similar. For Wilson, it was just amazing how making sense of numbers saved lives.
     
    A litany of data
     
    In 2015 the province of Pangasinan, from where Wilson hails, struggled with rising cases of dengue fever. There were enough dengue infections in the province—2,940 cases were reported in the first nine months of 2015 alone—for it to be considered an epidemic, had Pangasinan chosen to declare it.
     
    Wilson sat comfortably away in Singapore while all this was happening. But when two of his employees caught the bug—he had business interests in Dagupan—the dengue outbreak suddenly became a personal concern. It became his problem to solve.
     
    “I don’t know if Pangasinan had the highest number of dengue cases in the Philippines,” he begins, “but it was my home province so my interests lay there,” he says. He learned from the initial data released by the government that Dagupan had the highest incident of all of Pangasinan. Wilson, remembering John Snow, wanted to dig deeper.
     
    Using his credentials as a technology writer for Manila Bulletin, he wrote to the Philippine Integrated Diseases Surveillance and Response (PIDSR) team of the Department of Health, requesting three years' worth of data on Pangasinan.
     
    The DOH acquiesced and sent him back a litany of data on an Excel sheet: 81,000 rows of numbers or around 27,000 rows of data per year. It’s an intimidating number but one “that can fit in a hard disk,” Wilson says.
     
    He then set out to work. Using tools that converted massive data into understandable patterns—graphs, charts, the like—he looked for two things: When dengue infections spiked and where those spikes happened.
     
    “We first determined that dengue was highly related to the rainy season. It struck Pangasinan between August and November,” Wilson narrates. “And then we drilled down the data to uncover the locations, which specific barangays were hardest hit.”
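
    In analytical terms, the "when" and "where" questions come down to two group-bys over the case records. A minimal sketch is shown below; the file name and column names are assumptions about the PIDSR export, not the actual schema.

        # Sketch of the 'when' and 'where' drill-down; file and column names are assumed.
        import pandas as pd

        cases = pd.read_excel("pidsr_pangasinan.xlsx")   # ~81,000 rows over three years
        cases["onset_date"] = pd.to_datetime(cases["onset_date"])

        # When do infections spike? Count cases per month.
        per_month = cases.groupby(cases["onset_date"].dt.to_period("M")).size()

        # Where are they concentrated? Count cases per barangay, largest first.
        per_barangay = cases.groupby("barangay").size().sort_values(ascending=False)

        print(per_month.idxmax(), per_barangay.head(10))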
     
    The Bonuan district of the city of Dagupan, which covers the barangays of Bonuan Gueset, Bonuan Boquig, and Bonuan Binloc, accounted for a whopping 29.55 percent—nearly a third of all the cases in Dagupan for the year 2015.
     
    The charts showed that among the 30 barangays, Bonuan Gueset was number 1 in all three years. “It means to me that Bonuan Gueset was the ground zero, the focus of infection.”
     
    But here’s the cool thing: After running the data through analytics, Wilson learned that the PIDSR had sent more than he had hoped for. They also included the age of those affected. According to the data, dengue in Bonuan was prevalent among school children aged 5 to 15 years old.
     
    “Now given the background of Aedes aegypti, the dengue-carrying mosquito—they bite after sunrise and a few hours before sunset. So it’s easy to surmise that the kids were bitten while in school.”
     
    It excited him so much he fired up Google Maps and switched it to satellite image. Starting with Barangay Bonuan Boquig, he looked for places that had schools that had stagnant pools of water nearby. “Lo and behold, we found it,” he says.
     
    Sitting smack in the middle of Lomboy Elementary School and Bonuan Boquig National High School were large pools of stagnant water.
    Like hitting the jackpot, Wilson quickly posted his findings on Facebook, hoping someone would take up the information and make something out of it. Two people hit him up immediately: Professor Nicanor Melecio, the project director of the e-Smart Operation Center of Dagupan City Government, and Wesley Rosario, director at the Bureau of Fisheries and Aquatic Resources, a fellow Dagupeño.
     
    A social network
     
    Unbeknownst to Wilson, back in Dagupan the good professor had been busy conducting studies of his own. The e-Smart Center, tasked with crisis, flooding and other disaster-type situations, had been looking into the district’s topography vis-a-vis rainfall in the Bonuan district. “We wanted to detect the catch basins of the rainfall,” he says, “the elevation of the area, the landscape. Basically, we wanted to know the deeper areas where rainfall could possibly stagnate.”
     
    Like teenage boys, the two excitedly messaged each other on Facebook. “Professor Nick had lidar maps of Dagupan, and when he showed me those, it confirmed that these areas, where we see the stagnant water during rainfall, are those very areas that would accumulate rainfall without exit points,” Wilson says. With no sewage system, the water just sat there and accumulated.
     
    With Wilson still operating remotely in Singapore, Professor Melecio took it upon himself to do the necessary fieldwork. He went to the sites, scooped up water from the stagnant pools, and confirmed they were infested with kiti-kiti or wriggling mosquito larvae.
     
    Professor Melecio quickly coordinated with Bonuan Boquig Barangay Captain Joseph Maramba to involve the local government of Bonuan Boquig on their plan to conduct vector control measures.
     
    A one-two punch
     
    Back in Singapore, Wilson found inspiration in the Lion City’s solution to its own mosquito problem. “They used mosquito dunks that contained BTI, the bacteria that infects mosquitoes and kills its eggs,” he says.
     
    He used his own money to buy a few of those dunks, imported them to Dagupan, and on Oct. 6, had his team scatter them around the stagnant pools of Bonuan Boquig. The solution was great, dream-like even, except it had a validity period. Beyond 30 days, the bacteria is useless.
     
    Before he even had a chance to even worry about the solution’s sustainability, BFAR director Wesley Rosario pinged him on Facebook saying the department had 500 mosquito fish for disposal. “Would we want to send somebody to his office, get the fish, and release them into the pools?”
     
    The Gambezi earned its nickname because it eats, among other things, mosquito larvae. In Wilson’s and Wesley’s mind, the mosquito fish can easily make a home out of the stagnant pools and feast on the very many eggs present. When the dry season comes, the fish will be left to die. Except, here’s the catch: mosquito fish is edible.
     
    “The mosquito fish solution was met with a few detractors,” Wilson admits. “There are those who say every time you introduce a new species, it might become invasive. But it’s not really new as it is already endemic to the Philippines. Besides, we are releasing them in a landlocked area, so wala namang ibang ma-a-apektuhan [nothing else will be affected].”
     
    The critics, however, were silenced quickly. Four days after deploying the fish, the mosquito larvae were either eaten or dead. Twenty days into the experiment, with the one-two punch of the dunks and the fish, Barangay Boquig reported no new infections of dengue.
     
    “You know, we were really only expecting the infections to drop 50 percent,” Wilson says, rather pleased. More than 30 days into the study and Barangay Bonuan Boquig still has no reports of new cases. “We’re floored,” he added.
     
    At the moment, nearby barangays are already replicating what Wilson, Professor Melecio, and Wesley Rosario have done with Bonuan Boquig. Michelle Lioanag of the non-profit Inner Wheel Club of Dagupan has already taken up the cause to do the same for Bonuan Gueset, the ground zero for dengue in Dagupan.
     
    According to Wilson, what they did in Bonuan Boquig is just a proof of concept, a cheap demonstration of what big data can do. “It was so easy to do,” he said. “Everything went smoothly,” adding all it needed was cooperative and open-minded community leaders who had nothing more than sincere public service in their agenda.
     
    “You know, big data is multi-domain and multi-functional. We can use it for a lot of industries, like traffic for example. I was talking with the country manager of Waze…” he fires off rapidly, excited at what else his big data can solve next.
     
    Source: news.mb.com, November 21, 2016
  • Big Data and privacy can certainly go together

    Big Data is seen as the future of research and service delivery, and its economic promise is great. From that perspective, consultancy firms regularly proclaim that companies must jump on the Big Data train now or be out of business in five years. Not joining in is therefore not an option. At the same time, many companies are wary, because Big Data can be problematic from a privacy standpoint. With new, stricter privacy regulation on the way, joining the Big Data rush could instead mean being out of business in two years. It is time to make privacy innovation an explicit part of Big Data developments. That way the economic fruits of Big Data can be harvested while users' privacy is respected.

    According to Forbes, the economic size of Big Data is expected to grow into a worldwide market of $122 billion by 2025. Naturally this is an interesting area for the EU, which was after all originally an economic partnership. In December, that same EU produced the final text of the General Data Protection Regulation, which will replace the current rules on privacy and the protection of personal data. If the European Parliament approves the text, this month or in February, the new legislation will take effect in 2017. Those who fail to comply after that risk sky-high fines (4% of worldwide annual turnover). Companies will then have to shut down activities if they do not follow the rules of the Regulation.

    Processing large volumes of personal data without a clearly defined purpose established in advance will not simply be permitted. But wasn't the whole idea of Big Data precisely that you process enormous amounts of data? And isn't analyzing data with algorithms, where you cannot predict the outcome (and the possible purpose) in advance, one of the crown jewels of Big Data? After all, how convenient is it when a supermarket chain is the first to know, based on your purchasing behavior, that you are pregnant, or when an insurer can predict from Big Data whether you are a health or driving risk, so that you may have to pay a higher premium?

    It is clear that privacy protection is not a given in Big Data applications, and also, of course, that Big Data delivers real benefits. Fortunately, the Regulation also offers a way forward. The principle of Data Protection by Design will play an important role: organizations must embed the requirements for protecting personal data in the development of new services. If you do that well in Big Data applications, the Regulation does not have to mean you are out of business, certainly not if your organization genuinely puts its customers first. Facilitating privacy can itself be a niche product, and a promising one at that. Think of applications in which analyses are first performed on aggregated and anonymized data.

    Based on the knowledge that results, you can then offer targeted products or services by making the link at the individual level, with the user's consent of course, because the user then sees the added value and knows what product or service he is getting. Innovative approaches can facilitate Big Data, possibly improve it, and, with privacy taken into account, unlock an important service segment. And if you do it well, you also avoid becoming the target of consumer criticism and false perceptions in the media, which have already sunk several initiatives in the recent past.

    So organizations should certainly jump on the Big Data train. But if they want to ride that train for more than a few years, they will have to travel first class, in the Data Protection by Design carriages. Above all, they should discover what is possible under the new Regulation and what they need to do for it, instead of seeing only obstacles. Because Big Data and privacy can go together just fine, as long as you think about it when designing your services.

    Arnold Roosendaal (LLM, PhD) is a researcher in Strategy and Policy for the Information Society at TNO and is also affiliated with the PI.lab.

    Source: Financieel Dagblad

  • Big Data Facts: How Many Companies Are Really Making Money From Their Data?

    More and more businesses are waking up to the importance of data as a strategic resource. Yesterday, research released by the Economist Intelligence Unit reported that 60% of the professionals they quizzed feel that data is generating revenue within their organizations and 83% say it is making existing services and products more profitable.

    After surveying 476 executives from around the world, it found that those based in Asia are leading the way – where 63% said they are routinely generating value from data. In the US, the figure was 58%, and in Europe, 56%.

    This makes it clear that businesses are finding more and more ways to turn data into value, but at the same time, the report found, many are hitting stumbling blocks which are frustrating those efforts. Just 34% of respondents said that they feel their organizations are “very effective” at being transparent with customers about how data is used. And 9% say they feel that they are “totally ineffective” in this area, which can be very detrimental to building the all-important customer trust.

    For businesses that are built around customer data (or those repurposing themselves to be so), customer trust is absolutely essential. We have seen that people are becoming increasingly willing to hand over personal data in return for products and services that make their lives easier. However, that goodwill can evaporate in an instant if customers feel their data is being used improperly or not effectively protected.

    The report states that “Big Data analysis, or the mining of extremely large data sets to identify trends and patterns, is fast becoming standard business practice.

    “Global technology infrastructure, too, has matured to an extent that reliability, speed and security are all typically robust enough to support the seamless flow of massive volumes of data, and consequently encourage adoption.”

    It also goes on to suggest that more and more businesses, taking cues from online giants such as Facebook and Google, are positioning themselves as data-first operations, built entirely around data collection, analysis and redistribution – as opposed to simply using it as a source of business intelligence.

    59% of respondents said that they consider data and analytics to be “vital” to the running of their organizations, with a further 29% deeming it “very important”.

    The increasing availability of cloud processing, analytics and storage services has undoubtedly opened the floodgates in terms of making Big Data driven analytics accessible to businesses of all sizes across many industries. But I feel this survey also backs up warnings that I, and others, have been voicing for some time. Data, particularly Big Data, is an almost infinitely empowering asset – but its use can be limited, or it can even become a liability if it isn’t backed up by a robust (and regulator-compliant) strategy.

    Interestingly, just under half of those surveyed (47%) say that their data analytics is limited to data they have collected themselves – through internal processes, commercial activity and customer services. I would expect this number to shrink in coming years, as more and more organizations become accustomed to adding data provided by third parties such as data wholesalers and governments into the mix.

    Another statistic which stood out to me was that 69% feel there is a business case with their companies to set up a dedicated internal data and analytics unit, with the purpose of exploring new ways to add value through data projects. This is probably driven by the fact that 48% feel that their organizations have, in the past, failed to take advantage of opportunities to capitalize on their data. I fully expect to see dedicated data teams and working groups become an increasingly vital piece of corporate infrastructure over the next few years, well beyond industries such as tech and finance where they are already commonplace.

    Overall, it seems businesses are fairly confident about their ability to keep our data safe – with 82% saying that their data protection procedures are “very” or “somewhat” effective. However, we know that large scale theft of customer data from corporations is an ever-growing problem. Executives at organizations hit by this type of crime recently – such as Anthem, Talk Talk and the US Government – were presumably fairly confident that their systems were safe too – until they discovered that they weren’t. The report also makes it clear that data breaches are certainly not limited to the high profile incidents that receive coverage in the media. In fact, fairly shockingly, 34% of respondents said that their businesses had suffered “significant” data breaches within the past 12 months.

    The EIU report, which can be read in full here, makes it clear that adoption of Big Data driven strategies has come on in leaps and bounds during the last year. However it is also equally clear that there is still a long way to go until every business is secure enough in its infrastructure to transition to a fully data driven business model.

    Source: Forbes, January 14th, 2016

     

     

  • Big Data is going to improve our healthcare

    He is a man with a mission, and no small one: together with patients, care providers and insurers he wants to bring about a transformation in healthcare, shifting the focus from managing illness to managing health. Jeroen Tas, CEO of Philips Connected Care & Health Informatics, on the future of care.

    big-data-healthcare-2Wat is er mis met het huidige systeem?

    “In de ontwikkelde wereld wordt gemiddeld 80 procent van het budget voor zorg besteed aan het behandelen van chronische ziektes, zoals hart- en vaatziektes, longziektes, diabetes en verschillende vormen van kanker. Slechts 3 procent van dat budget wordt besteed aan preventie, aan het voorkomen van die ziektes. Terwijl we weten dat 80 procent van hart- en vaatziekten, 90 procent van diabetes type 2 en 50 procent van kanker te voorkomen zijn. Daarbij spelen sociaaleconomische factoren mee, maar ook voeding, wel of niet roken en drinken, hoeveel beweging je dagelijks krijgt en of je medicatie goed gebruikt. We sturen dus met het huidige systeem lang niet altijd op op de juiste drivers om de gezondheid van mensen te bevorderen en hun leven daarmee beter te maken. 50 procent van de patiënten neemt hun medicatie niet of niet op tijd in. Daar liggen mogelijkheden voor verbetering.”

    Dat systeem bestaat al jaren - waarom is het juist nu een probleem?
    “De redenen zijn denk ik alom bekend. In veel landen, waaronder Nederland, vergrijst de bevolking en neemt daarmee het aantal chronisch zieken toe, en dus ook de druk op de zorg. Daarbij verandert ook de houding van de burger ten aanzien van zorg: beter toegankelijk, geïntegreerd en 24/7, dat zijn de grote wensen. Tot slot nemen de technologische mogelijkheden sterk toe. Mensen kunnen en willen steeds vaker zelf actieve rol spelen in hun gezondheid: zelfmeting, persoonlijke informatie en terugkoppeling over voortgang. Met Big Data zijn we nu voor het eerst in staat om grote hoeveelheden data snel te analyseren, om daarin patronen te ontdekken en meer te weten te komen over ziektes voorspellen en voorkomen. Kortom, we leven in een tijd waarin er binnen korte tijd heel veel kan en gaat veranderen. Dan is het belangrijk om op de juiste koers te sturen.”

    Wat moet er volgens jou veranderen?
    “De zorg is nog steeds ingericht rond (acute) gebeurtenissen. Gezondheid is echter een continu proces en begint met gezond leven en preventie. Als mensen toch ziek worden, volgt er diagnose en behandeling. Vervolgens worden mensen beter, maar hebben ze misschien nog wel thuis ondersteuning nodig. En hoop je dat ze weer verder gaan met gezond leven. Als verslechtering optreedt is tijdige interventie wenselijk. De focus van ons huidige systeem ligt vrijwel volledig op diagnose en behandeling. Daarop is ook het vergoedingssysteem gericht: een radioloog wordt niet afgerekend op zijn bijdrage aan de behandeling van een patiënt maar op de hoeveelheid beelden die hij maakt en beoordeelt. Terwijl we weten dat er heel veel winst in termen van tijd, welzijn en geld te behalen valt als we juist meer op gezond leven en preventie focussen. 

    Er moeten ook veel meer verbanden komen tussen de verschillende pijlers in het systeem en terugkoppeling over de effectiviteit van diagnose en behandeling. Dat kan bijvoorbeeld door het delen van informatie te stimuleren. Als een cardioloog meer gegevens heeft over de thuissituatie van een patiënt, bijvoorbeeld over hoe hij zijn medicatie inneemt, eet en beweegt, dan kan hij een veel beter behandelplan opstellen, toegesneden op de specifieke situatie van de patiënt. Als de thuiszorg na behandeling van die patiënt ook de beschikking heeft over zijn data, weet men waarop er extra gelet moet worden voor optimaal herstel. En last maar zeker not least, de patiënt moet ook over die data beschikken, om zo gezond mogelijk te blijven. Zo ontstaat een patiëntgericht systeem gericht op een optimale gezondheid.”

    Dat klinkt heel logisch. Waarom gebeurt het dan nog niet?
    “Alle verandering is lastig – en zeker verandering in een sector als de zorg, die om begrijpelijke redenen conservatief is en waarin er complexe processen spelen. Het is geen kwestie van technologie: alle technologie die we nodig hebben om de omslag tot stand te brengen, is er. We hebben sensoren om data automatisch te generen, die in de omgeving van de patiënt kunnen worden geïnstalleerd, die hij kan dragen – denk aan een Smarthorloge – en die zelfs in zijn lichaam kunnen zitten, in het geval van slimme geneesmiddelen. Daarmee komt de mens centraal te staan in het systeem, en dat is waar we naartoe willen.
    Er moet een zorgnetwork om ieder persoon komen, waarin onderling data wordt gedeeld ten behoeve van de persoonlijke gezondheid. Dankzij de technologie kunnen veel behandelingen ook op afstand gebeuren, via eHealth oplossingen. Dat is veelal sneller en vooral efficiënter dan mensen standaard doorsturen naar het ziekenhuis. Denk aan thuismonitoring, een draagbaar echo apparaat bij de huisarts of beeldbellen met een zorgverlener. We kunnen overigens al hartslag, ademhaling en SPo2 meten van een videobeeld. 

    De technologie is er. We moeten het alleen nog combineren, integreren en vooral: implementeren. Implementatie hangt af van de bereidheid van alle betrokkenen om het juiste vergoedingsstelsel en samenwerkingsverband te vinden: overheid, zorgverzekeraars, ziekenhuis, artsen, zorgverleners en de patiënt zelf. Daarover ben ik overigens wel positief gestemd: ik zie de houding langzaam maar zeker veranderen. Er is steeds meer bereidheid om te veranderen.”

    Is die bereidheid de enige beperkende factor?
    “We moeten ook een aantal zaken regelen op het gebied van data. Data moet zonder belemmeringen kunnen worden uitgewisseld, zodat alle gegevens van een patiënt altijd en overal beschikbaar zijn. Dat betekent uiteraard ook dat we ervoor moeten zorgen dat die gegevens goed beveiligd zijn. We moeten ervoor zorgen dat we dat blijvend kunnen garanderen. En tot slot moeten we werken aan het vertrouwen dat nodig is om gegevens te standaardiseren en te delen, bij zorgverleners en vooral bij de patiënt.Dat klinkt heel zwaar en ingewikkeld maar we hebben het eerder gedaan. Als iemand je twintig jaar geleden had verteld dat je via internet al je bankzaken zou regelen, zou je hem voor gek hebben versleten: veel te onveilig. Inmiddels doen we vrijwel niet anders.
    De shift in de zorg nu vraagt net als de shift in de financiële wereld toen om een andere mindset. De urgentie is er, de technologie is er, de bereidheid ook steeds meer – daarom zie ik de toekomst van de zorg heel positief in.”

     Bron: NRC
  • Big Data in 2016: Cloudy, with a Chance of Disappointment, Disillusionment, and Disruption

    Like last year, I thought that I’d wrap up my writing calendar with some prognostications on Big Data in 2016. I doubt any of these six will come as a surprise to most readers, but what may be a surprise is how emphatically our worlds will have changed twelve months from now, when I take a crack at predicting the world of 2017. Happy New Year!

    1. Welcome to the Trough
    As Big Data moves through the Gartner hype cycle for technology adoption, we will naturally progress into the “trough of Disillusionment.” Organizations have been whipped into a frenzied pitch by the promise of Big Data, and nearly all organizations have been attempting to use Big Data to transform their business, or at least the results that they produce.

    Because Big Data has been the latest “Big Thing” and “Shiny, New Object” in the business world, it has been ever so slightly over-sold; particularly over the last year or so. Organizations have been told that all they need to do is buy and implement this or that new technology and magically, they’ll have amazing new results from their businesses. Unfortunately, like every technology innovation that preceded it, Big Data is merely an enabler, enhancer and amplifier. If your business processes or management approaches are garbage, Big Data will make them much more so.
    Expect to see many organizations become deeply disillusioned by Big Data in 2016 because they had hoped to get different results from their business, without using Big Data to actually change how they operated. Those who used Big Data to make substantive changes to how they operate will dramatically out-compete those who used Big Data to produce merely-more-detailed reports, but little actual change.

    2. The Cloudy Future of Analytics
    For years, Big Data has been too big, too expensive and too complicated for anyone outside of the Fortune 500. After all, the technologies were new, unproven and not even close to ready for prime time, and “real” data scientists were tied up in Universities, large companies, government agencies or any number of tiny, disruptive startup companies. Hence, many small- and mid-sized companies were left on the sidelines of this revolution.
    This year, you will see an explosion of cloud-based analytics solutions designed to embrace the mid-market. Some may merely provide storage and compute capacity while others will provide full-blown analytics platforms, complete with DIY training. The best will also provide on-demand expertise from data-Jedi-for-hire, which will explain why such a large number of big company data scientists will change jobs in the next 12 months.

    3. Open Warfare Online
    Unfortunately, issues related to information security will escalate beyond data breaches, hacking attacks, and identity theft. In 2016, we will see open warfare on the internet between digital have’s and have-not’s. Whether it is nations attacking one another for state secrets and political leverage, Anonymous escalating their fight with ISIS, or cyber criminals holding people and organizations hostage for millions of dollars in ransom, you can expect an ever-increasing amount of online conflict in the coming year.
    Not only will the volume of attacks grow, the techniques, the numbers of victims and the consequences to all of us will also grow; probably dramatically. Last year’s attacks against Ashley Madison, Sony, United Airlines, Excellus BCBS, Experian and the IRS will seem trivial compared to those that will likely come in 2016. Don’t be surprised by attacks against the power grid, the global financial infrastructure, the military, mass-media and other “pillars of our society.”
    This may sound rather dystopian, but the trends are all pointing in this direction. While their techniques, technologies, and approaches will become increasingly sophisticated, the goals of the attackers will be rather simple: social disruption, political change, and good old fashioned profit motive. In an increasingly interconnected and automated world, brought on by Big Data, you’re as likely to have your power or water cut off for a week as you are to have your credit card number stolen.

    4. Persuasive Analytics Becomes Normal and Expected
    If, in 2015, you haven’t had a creepy experience with persuasive analytics, you either live in a cave or you likely weren’t paying attention. Whether it’s instant coupons delivered one click after shopping for something online, getting an invite to a “flash sale” on a favored app, or having a friend or family member receive a notice of your browsing history or physical location, persuasive analytics is the big news in Big Data.
    No doubt your organization played around with predictive analytics over the last couple of years; nearly everyone has. But, you probably also came to the same conclusion as everyone else: predictive analytics is a waste of time and money. Knowing what MIGHT happen in the future has no value if you don’t benefit from the insight. CHANGING the future, so that you CAN benefit from it is how you monetize Big Data. This is the distinction between predictive and persuasive analytics. In the former you spend money, in the latter you make money.
    This revolution in persuasive analytics is driven by the Digital Trinity of mobility, social media, and data analytics. Leverage this trio correctly and your business will thrive. Do so incorrectly, and you’ll wonder why your business is dying before your eyes.

    5. Privacy Comes to the Fore
    While personal privacy has been all but surrendered in the United States, there has been a growing trend towards personal privacy and commercial restraint in other countries. The last two years have seen major moves in the privacy arena by the European Union, including the judgment against Google in the right to be forgotten and the nullification of the Safe Harbor provision between the EU and the US.
    Similar actions in jurisdictions around the globe demonstrate a growing awareness of just how valuable our individual information has become and how important it is that we take an active role in managing our data.
    You should expect to see greater governmental action against the unfair, undisclosed, uncontrolled collection and use of end-user data, even as the use of such information becomes a commercial and governmental imperative. As both consumers and citizens, we will expect organizations to meet our needs predictively, while at the same time we will want to be able to control the unfettered access that these organizations have to our most intimate details. This is a huge privacy paradox, and all organizations pursuing a Big Data strategy should have information governance and privacy as central themes in all of their efforts.

    6. Introducing the iPresident
    While many people may not realize it, the last two presidential elections in the United States were heavily influenced by the Digital Trinity. In next year’s election, the White House will be won by whoever uses the Digital Trinity most effectively. In the past, the use of the Trinity to sway voters was fairly rudimentary, and not obvious to the public at large.
    Next year, the impact of the Trinity won’t be nearly so subtle, or passive. Persuasive analytics will be used to drive new voters to the polls, push very targeted and specific political agendas to the fore and drive the mass media at least as much as the mass media tries to drive society. As events in Syria, Libya, France, Greece, Ferguson Missouri, Baltimore and Hong Kong have shown us, the Digital Trinity is an enabler of dramatic social change. Many of these changes will be positive, others will be decidedly less so. Either way, expect significant disruption to the same old same old in our society.
    American politics is about to be fundamentally, comprehensively and permanently changed by the full application of Big Data and many of those who have held power in our country for a very long time will no longer have a seat at the table. This process will be front and center in 2016 as the presidential election unfolds before our eyes. If you’ve paid any attention to the run up to the election in the second-half of 2015 you’ve noted the degree to which things seem different this time. Trust me you haven’t seen anything yet! Next year, pop some popcorn, tune into the election coverage, and settle in for some great entertainment, because this will be the year that the real power of the Digital Trinity will take center-stage.

    Source: Inside Bigdata

  • Big Data loses its Zing

    9 July 2015

    Big data isn’t what it used to be. Not because firms are disillusioned with the technology, but rather because the term is no longer helpful. With nearly two-thirds of firms having implemented or planning to implement some big data capability by the end of 2015, the wave has definitely hit. People have bought in.

    But that doesn’t mean we find many firms extolling the benefits they should be seeing by now; even early adopters still have problems across the customer lifecycle. Can your firm understand customers as individuals, not segments? Are analytics driving consistent, insightful experiences across channels? Does all that customer insight developed by marketing make a bit of difference to your contact center agents? If you are like most firms the answer is, “Not yet but we are working on it.”

  • Big Data still rarely used for real-time applications or predictions

    Operating data-driven? At most companies, data applications are still relatively simple and mainly geared towards analysis rather than real-time use and predictions. A missed opportunity, and a risk for an organisation's long-term course.

    Already 22 percent of companies say they are lagging behind the competition, while more than 81 percent of respondents indicate that the opportunities of Big Data for their own organisation are substantial.

    This emerges from the Big Data Survey 2015 by data consultancy GoDataDriven and trade fair Big Data Expo. Nearly 200 companies were surveyed to provide insight into the current role of big data, the degree of adoption, intentions and possible pitfalls.

    Data from obvious sources
    What does it show? The data being used is generally numerical and often comes from obvious sources, such as CRM and customer databases (18 percent), website statistics (18 percent), external sources (14 percent), marketing data from email statistics (14 percent) and transactional data (13 percent). Applications using data from richer sources such as text, images and audio are still very scarce, even though large gains can be made there.


    More budget for data-driven applications
    Most companies will free up more budget for data-driven applications in the coming year and plan to invest in developing knowledge within the team. A small proportion of companies are already applying artificial intelligence, machine learning, predictive models and deep learning.

    But that is changing rapidly. Within three years, 50 percent of respondents expect to have developed their first applications using advanced technology.

    Vision most important for successful implementation
    What are the most important factors for successfully implementing a Big Data strategy? Vision, according to 28 percent of respondents, and support from the board (19 percent). But supporting systems and processes (18 percent), budget (14 percent), talent (11 percent) and training (10 percent) also play an important role.


    Data as a strategic pillar
    At the same time, a strikingly large share of respondents indicate that, within their own company, the strategic role of data is in good shape. 37 percent say that the board sees data as a strategic pillar, while 27 percent partly agree with this. At almost a quarter of companies (23 percent), there is still considerable ground to be gained at board level in this respect.

    More than 67 percent of companies accordingly say that the opportunities of big data for their own organisation are substantial. Another 14.5 percent partly agree. Only 9 percent disagree with this statement to a greater or lesser extent.

    More highlights:

    • Hadoop is the most popular data platform: 21 percent have some form of Hadoop implementation (Hadoop, Hortonworks, Cloudera).
    • Among licensed software, SAP (8 percent), SPSS (7 percent) and SAS (6 percent) score best.
    • Data applications are used most often within marketing (19 percent).
    • Information technology is an application area for 13 percent, while fraud detection (6 percent) and risk management (6 percent) are also regularly carried out with the help of data.
  • Big Data on the cloud makes economic sense

    With Big Data analytics solutions increasingly being made available to enterprises in the cloud, more and more companies will be able to afford and use them for agility, efficiency and competitiveness

    For almost 10 years, only the biggest of technology firms such as Alphabet Inc.’s Google and Amazon.com Inc. used data analytics on a scale that justified the idea of ‘big’ in Big Data. Now more and more firms are warming up to the concept. Photo: Bloomberg

    On 27 September, enterprise software company SAP SE completed the acquisition of Altiscale Inc.—a provider of Big Data as-a-Service (BDaaS). The news came close on the heels of data management and analytics company Cloudera Inc. and data and communication services provider CenturyLink Inc. jointly announcing BDaaS services. Another BDaaS vendor, Qubole Inc., said it would offer a big data service solution for the Oracle Cloud Platform.

    These are cases in point of the growing trend to offer big data analytics using a cloud model. Cloud computing allows enterprises to pay for software modules or services used over a network, typically the Internet, on a monthly or periodical basis. It helps firms save relatively larger upfront costs for licences and infrastructure. Big Data analytics solutions enable companies to analyse multiple data sources, especially large data sets, to take more informed decisions.

    According to research firm International Data Corporation (IDC), the global big data technology and services market is expected to grow at a compound annual growth rate (CAGR) of 23.1% over 2014-2019, and annual spending is estimated to reach $48.6 billion in 2019.

    With Big Data analytics solutions increasingly being made available to enterprises in the cloud, more and more companies will be able to afford and use them for agility, efficiency and competitiveness.

    MarketsandMarkets, a research firm, estimates the BDaaS segment will grow from $1.8 billion in 2015 to $7 billion in 2020. There are other, even more optimistic estimates: research firm Technavio, for instance, forecasts this segment to grow at a CAGR of 60% from 2016 to 2020.

    Where does this optimism stem from?

    For almost 10 years, it was only the biggest of technology firms such as Alphabet Inc.’s Google and Amazon.com Inc., that used data analytics on a scale that justified the idea of ‘big’ in Big Data. In industry parlance, three key attributes are often used to understand the concept of Big Data. These are volume, velocity and variety of data—collectively called the 3Vs.

    Increasingly, not just Google and its rivals, but a much wider swathe of enterprises are storing, accessing and analysing a mountain of structured and unstructured data. The trend is necessitated by growing connectivity, falling cost of storage, proliferation of smartphones and huge popularity of social media platforms—enabling data-intensive interactions not only among ‘social friends’ but also among employers and employees, manufacturers and suppliers, retailers and consumers—virtually all sorts of connected communities of people.

    A November 2015 IDC report predicts that by 2020, organisations that are able to analyse all relevant data and deliver actionable information will achieve an extra $430 billion in productivity benefits over their less analytically oriented peers.

    The nascent nature of BDaaS, however, is causing some confusion in the market. In a 6 September article on Nextplatform.com, Prat Moghe, founder and chief executive of Cazena—a services vendor—wrote that there is confusion regarding the availability of “canned analytics or reports”. According to him, vendors (solutions providers) should be carefully evaluated and aspects such as moving data sets between different cloud and on-premises systems, ease of configuration of the platform, etc., need to be kept in mind before making a purchase decision.

    “Some BDaaS providers make it easy to move datasets between different engines; others require building your own integrations. Some BDaaS vendors have their own analytics interfaces; others support industry-standard visualization tools (Tableau, Spotfire, etc.) or programming languages like R and Python. BDaaS vendors have different approaches, which should be carefully evaluated,” he wrote.
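
    To make the "programming languages like R and Python" point concrete, here is a minimal sketch, assuming a hypothetical BDaaS platform that can export query results as a Parquet file over HTTPS; the URL and column names are illustrative, not any specific vendor's API.

```python
# Minimal sketch: pulling an extract exported by a (hypothetical) BDaaS
# platform into Python for ad-hoc analysis. The URL, column names and
# metric below are illustrative assumptions, not a specific vendor's API.
import pandas as pd
import matplotlib.pyplot as plt

# Many BDaaS platforms can export query results as Parquet/CSV to object
# storage; here we assume such an extract is reachable over HTTPS.
EXTRACT_URL = "https://example-bdaas.invalid/exports/daily_sales.parquet"

df = pd.read_parquet(EXTRACT_URL)  # columns assumed: date, region, revenue

# Aggregate revenue per region - the kind of "canned report" a vendor
# might otherwise provide out of the box.
summary = (
    df.groupby("region", as_index=False)["revenue"]
      .sum()
      .sort_values("revenue", ascending=False)
)
print(summary)

summary.plot.bar(x="region", y="revenue", legend=False, title="Revenue by region")
plt.tight_layout()
plt.show()
```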

    Nevertheless, the teething troubles are likely to be far outweighed by the benefits that BDaaS brings to the table. The key drivers, according to the IDC report cited above, include digital transformation initiatives being undertaken by a lot of enterprises; the merging of real life with digital identity as all forms of personal data becomes available in the cloud; availability of multiple payment and usage options for BDaaS; and the ability of BDaaS to put more analytics power in the hands of business users.

    Another factor that will ensure growth of BDaaS is the scarcity of skills in cloud as well as analytics technologies. Compared to individual enterprises, cloud service providers such as Google, Microsoft Corp., Amazon Web Services and International Business Machines Corp. (IBM) can attract and retain talent more easily and for longer durations.

    Manish Mittal, managing principal and head of global delivery at Axtria, a medium-sized Big Data analytics solutions provider, says the adoption of BDaaS in India is often driven by business users. While the need is felt by both chief information officers and business leaders, he believes that the latter often drive adoption as they feel more empowered in the organisation.

    The potential for BDaaS in India can be gauged from Axtria’s year-on-year business growth of 60% for the past few years—and there are several niche big data analytics vendors currently operating in the country (besides large software companies).

    Mittal says that the growth of BDaaS adoption will depend on how quickly companies tackle the issue of improving data quality.

    Source: livemint.com, October 10, 2016

  • Big data privacy must be fixed before the revolution can begin

    There won't be a 'big data revolution' until the public can be reassured that their data won't be misused.

    Big data is an asset which can create tens of thousands of jobs and generate hundreds of billions for the economy, but the opportunity can't be taken until concerns about privacy and security have been overcome.

    That's according to the newly released The Big Data Dilemma report which is based on evidence from technologists, open data enthusiasts, medical research organisations and privacy campaigners.

    It warns that a big data revolution is coming - something it's suggested will generate over £200bn for the UK economy alone over the next five years - but personal data must not be exploited by corporations and that "well-founded" concerns surrounding privacy must be addressed.

    The answer to this, the report suggests, is the formation of a 'Council of Data Ethics' which will be tasked with explicitly addressing concerns about consent and trust in the area of data collection and retention. It's only then, the report argues, that analysis of big data will truly be able to make a positive impact to society as a whole.

    The report recommends that in order to address the growing legal and ethical challenges associated with balancing privacy, anonymisation, security and public benefit, the Council of Data Ethics should be established within the Alan Turing Institute, the UK's national institute for data science.

    "There is often well-founded distrust about this and about privacy which must be resolved by industry and Government," said Nicola Blackwood MP, chair of the House of Commons Science and Technology Committee, which published the report.

    "A 'Council of Data Ethics' should be created to explicitly address these consent and trust issues head on. And the government must signal that it is serious about protecting people's privacy by making the identifying of individuals by de-anonymising data a criminal offence," she added.

    Nonetheless, the report cites high-technology science projects like the Large Hadron Collider at CERN and the Square Kilometre Array - the world's largest radio telescope, set to be run from the UK's Jodrell Bank Observatory - as examples of how benefits can be gained from analysis of vast datasets.

    "Properly exploited, this data should be transformative, increasing efficiency, unlocking new avenues in life-saving research and creating as yet unimagined opportunities for innovation," the report says.

    However, it also warns that existing big data is nowhere near being fully taken advantage of, with figures suggesting that companies are analysing just 12% of the data available to them.

    Making use of this, the committee claims, "could create 58,000 new jobs over five years, and contribute £216bn to the UK economy" and could be especially effective at boosting efficiency in the public sector.

    The committee also suggests that in order for government to address public concerns around big data, it shouldn't wait for European Union regulations to take effect, but rather address the issue head on by introducing criminal penalties for misuse of data.

    "We do not share the government's view that current UK data protections can simply be left until the Data Protection Act will have to be revised to take account of the new EU Regulation. Some areas need to be addressed straightaway -- introducing the Information Commissioner's kitemark and introducing criminal penalties," the report says.

    "Such clarity is needed to give big data users the confidence they need to drive forward an increasingly big data economy, and individuals that their personal data will be respected," it adds, and the document's conclusion puts a strong emphasis on the need for data protection.

    "Given the scale and place of data gathering and sharing, district arising from concerns about privacy and security is often well founded and must be resolved by industry and government is full value of big data is to be realised," it argues.

    Privacy advocates have praised the report, but have also warned that the government still needs to do more on data protection issues.

    "It's admirable that the Committee called out the government for dragging its feet waiting for the new EU Data Protection Regulation. Now the government must take the Regulation and make it true and real to protect our data," says Matthew Rice, advocacy officer at Privacy International.

    "The recommendations in the report provide some practical, small steps that the government should take to better prepare not only for future regulation but for the future understanding of the issue of personal data protection," he adds.

    Source: ZDnet

  • Big data vendors see the internet of things (IoT) opportunity, pivot tech and message to compete

    Open source big data technologies like Hadoop have done much to begin the transformation of analytics. We're moving from expensive and specialist analytics teams towards an environment in which processes, workflows, and decision-making throughout an organisation can - in theory at least - become usefully data-driven. Established providers of analytics, BI and data warehouse technologies liberally sprinkle Hadoop, Spark and other cool project names throughout their products, delivering real advantages and real cost-savings, as well as grabbing some of the Hadoop glow for themselves. Startups, often closely associated with shepherding one of the newer open source projects, also compete for mindshare and custom.

    And the opportunity is big. Hortonworks, for example, has described the global big data market as a $50 billion opportunity. But that pales into insignificance next to what Hortonworks (again) describes as a $1.7 trillion opportunity. Other companies and analysts have their own numbers, which do differ, but the step-change is clear and significant. Hadoop, and the vendors gravitating to that community, mostly address 'data at rest'; data that has already been collected from some process or interaction or query. The bigger opportunity relates to 'data in motion,' and to the internet of things that will be responsible for generating so much of this.

    My latest report, Streaming Data From The Internet Of Things Will Be The Big Data World’s Bigger Second Act, explores some of the ways that big data vendors are acquiring new skills and new stories with which to chase this new opportunity.

    For CIOs embarking on their IoT journey, it may be time to take a fresh look at companies previously so easily dismissed as just 'doing the Hadoop thing.' 

    Source: Forrester.com

  • Big data: key trends in analytics, technologies and services

    Big data: key trends in analytics, technologies and services

    There is no doubt that we produce more data in a day than we did in decades of history. Most of us don’t even realize how much data we generate simply by browsing the Internet, so the figures may surprise you. Keep an eye on the future trends in Big data analytics and you won’t be caught off guard by future technologies.

    Over the past decade, global data has been growing exponentially, and it continues to do so today. It is mainly aggregated via the internet, including social networks, web search requests, text messages, and media files. IoT devices and sensors also contribute huge amounts of data propelling Big data analytics trends.

    Throughout various industries, Big data has evolved significantly since it first entered the technical scene in the early 2000s. As Big data has become more prevalent, companies must hire experts in data analytics, capable of handling complex data processing to keep up with the latest trends in Big data analytics.

    Data fabric

    On-premises and cloud environments are supported by data fabrics, which provide consistent functionality across a variety of endpoints. Using Data Fabric, organizations can simplify and integrate data storage across cloud and on-premises environments, providing access to and sharing of data in a distributed environment to drive digital transformation & new trends in Big data analytics.

    Through a data fabric architecture, organizations are able to store and retrieve information across distributed on-premises, cloud, and hybrid infrastructures. Enterprises can utilize data fabrics in an ever-changing regulatory environment, while ensuring the right data is securely provided in an environment where data and analytics technology is constantly evolving.

    Synthetic data

    As opposed to being generated by real-world events, synthetic data is information created artificially. Synthetic data is produced algorithmically, and it can be used as a substitute for production or operational data as well as to validate mathematical models and, more often than not, to train machine learning algorithms.

    As of 2022, more attention is being paid to training machine learning algorithms using synthetic data sets, which are simulations generated by computers that provide a wide variety of different and anonymous training data for machine learning algorithms. In order to ensure a close resemblance to the genuine data, various techniques are used to create the anonymized data, such as generative adversarial networks and simulators.

    Although synthetic data concepts have been around for decades, they did not gain serious commercial adoption until the mid-2000s, in the autonomous vehicle industry. It is no surprise that commercial use of synthetic data began there: the sector attracts more machine learning talent and investment dollars than any other commercial application of AI, which makes it a catalyst for foundational technologies like synthetic data, further accelerating Big data analytics and the future of marketing and sales.

    AI developers can improve their models’ performance and robustness by using synthetic data sets. In order to train and develop machine learning and artificial intelligence (AI), data scientists have developed efficient methods for producing high-quality synthetic data that would be helpful to companies that need large quantities of data.
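
    As a minimal sketch of the idea, assuming scikit-learn is available: an algorithmically generated, fully anonymous data set stands in for sensitive production data when training a model. The feature counts and model choice below are arbitrary illustrations.

```python
# Minimal sketch: generating an anonymous synthetic data set and using it
# to train a model, as a stand-in for sensitive production data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Algorithmically produced data: no real customers involved.
X, y = make_classification(
    n_samples=10_000, n_features=20, n_informative=8,
    class_sep=1.0, random_state=42,
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1_000).fit(X_train, y_train)
print(f"Held-out accuracy on synthetic data: {model.score(X_test, y_test):.3f}")
```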

    Data as a service

    Data was traditionally stored in data stores designed for particular applications to access. When SaaS (software as a service) gained popularity, DaaS emerged as a relatively new concept. As with Software-as-a-Service applications, Data as a Service uses cloud technology to provide users and applications with on-demand access to information, regardless of where the users or applications are located.

    In spite of the popularity of SaaS for more than a decade, DaaS has only recently begun to gain broad acceptance. The reason for this is that generic cloud computing services were not originally built to handle massive data workloads; instead, they were intended to host applications and store data (instead of integrating, analyzing, and processing data).

    Earlier in the life of cloud computing, when bandwidth was often limited, processing large data sets via the network was also challenging. Nonetheless, DaaS is just as practical and beneficial as SaaS today, thanks to the availability of low-cost cloud storage and bandwidth, combined with cloud-based platforms designed specifically for managing and processing large amounts of data quickly and efficiently.

    Active Metadata

    The key to maximizing a modern data stack lies in the enrichment of active metadata by machine learning, human interaction, and process output. In modern data science procedures, there are several different classifications of data, and metadata is the one that informs users about the data. To ensure that Big data is properly interpreted and can be effectively leveraged to deliver results, a metadata management strategy is essential.

    A good data management strategy for Big data requires good metadata management from collection to archiving to processing to cleaning. As technologies like IoT, cloud computing, etc., advance, this will be useful in formulating digital strategies, monitoring the purposeful use of data, and identifying the sources of information used in analyses, accelerating the Big data analytics future scope. Data governance would be enhanced by the use of active metadata, which is available in a variety of forms.

    Edge Computing

    Edge computing means running a process on a local system, such as a user's device, an IoT device or a nearby server, rather than in a distant data centre. It allows data to be processed at the edge of a network, reducing the number of long-distance round trips between a server and a customer, making it a major trend in Big data analytics.

    This enhances Data Streaming, such as real-time data streaming and processing without causing latency; devices respond immediately as a result. Computing at the edge is efficient because it consumes less bandwidth and reduces an organization’s development costs. It also enables remote software to run more efficiently.

    Many companies use edge computing for the cost savings alone, so cost is often the driving force for deployment. In organizations that initially embraced the cloud, bandwidth costs may have been higher than anticipated, and if they are looking for a less expensive alternative, edge computing might be a good fit.

    In recent years, edge computing has become increasingly popular as a way to process and store data faster, which allows companies to create more efficient real-time applications. Before edge computing, a smartphone scanning a person's face for facial recognition would have had to run the algorithm through a cloud-based service, which takes considerable time and bandwidth.
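
    A minimal sketch of the edge idea described above, in plain Python: raw sensor readings are aggregated on the device and only a compact summary travels upstream. The payload format and alert threshold are illustrative assumptions.

```python
# Minimal sketch: process raw sensor readings locally and ship only a
# compact summary upstream, instead of streaming every reading to the
# cloud. Payload fields and thresholds are illustrative assumptions.
import json
import random
import statistics
from datetime import datetime, timezone

def read_sensor(n: int = 600) -> list[float]:
    """Stand-in for a minute of 10 Hz temperature readings on the device."""
    return [20.0 + random.gauss(0, 0.5) for _ in range(n)]

def summarize(readings: list[float]) -> dict:
    """Reduce raw readings to the few numbers the cloud actually needs."""
    return {
        "ts": datetime.now(timezone.utc).isoformat(),
        "count": len(readings),
        "mean": round(statistics.fmean(readings), 3),
        "max": round(max(readings), 3),
        "alert": max(readings) > 25.0,  # decided locally, no round trip
    }

if __name__ == "__main__":
    raw = read_sensor()
    payload = json.dumps(summarize(raw))
    # In a real deployment this payload would be POSTed or published via
    # MQTT; here we just show how little data leaves the device.
    print(f"{len(raw)} readings reduced to {len(payload)} bytes: {payload}")
```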

    Hybrid clouds

    A hybrid cloud combines an on-premises private cloud with a public cloud from a third party, orchestrated across the two environments. With hybrid cloud deployment, workloads move between private and public clouds, which allows for great flexibility and more data deployment options. To take full advantage of the public cloud it aspires to, an organization also needs a private cloud.

    This requires building a data center, which includes servers, storage, a LAN, and load balancers. VMs and containers must be supported by a virtualization layer or hypervisor. A private cloud software layer must also be installed, enabling instances to transfer data between the public and private clouds through the implementation of software.

    A hybrid cloud setup uses traditional systems as well as the latest cloud technology, without a full commitment to a specific vendor. Businesses work with a variety of types of data in disparate environments and adjust their infrastructure accordingly. The organization can migrate workloads between its traditional infrastructure and the public cloud at any time.

    Data center infrastructure is owned and operated by an organization with a private cloud, which is associated with significant capital expenditures and fixed costs. In contrast, public cloud resources and services are considered variable and operational expenses. Hybrid cloud users can choose to run workloads in the most cost-effective environment.

    Data service layer

    An organization’s data service layer is critical to delivering data to consumers within and across organizations. Real-time service levels enable end users to interact with data in real time or near real time, changing the Big data analytics future scope.

    In addition to providing low-cost storage for large quantities of raw data, the data lakehouse implements a metadata layer above the store in order to structure data and improve data management capabilities, similar to a data warehouse. A single system lets multiple teams access all company data for a variety of projects, such as machine learning, data science, and business intelligence.

    Data mesh

    An enterprise data fabric is a holistic approach for connecting all data within an organization, regardless of its location, and making it accessible on demand. A data mesh, on the other hand, is an architectural approach similar to and supportive of that approach. With a data mesh, information about creating, storing, and sharing data is domain-specific and applicable across multiple domains on a distributed architecture.

    Using a data mesh approach, businesses can democratize both data access and data management by treating data as a product, organized and governed by domain experts, while also increasing the scalability of the data warehouse model.

    Natural language processing

    Among the many applications of artificial intelligence, Natural Language Processing (NLP) enables computers and humans to communicate effectively. It is a type of artificial intelligence that aims to read and decode human language and create meanings. The majority of the software developed for natural language processing is based on machine learning.

    By applying grammar rules, Natural Language Processing algorithms can recognize and extract the necessary data from each sentence. The main techniques used are syntactic and semantic analysis: syntactic analysis deals with sentence structure and grammar, whereas semantic analysis works out the meaning of the text or data.
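
    A minimal, self-contained sketch of these two steps: a crude syntactic pass (tokenisation) followed by a crude semantic pass (lexicon-based sentiment). A real system would use a library such as spaCy or NLTK; the tiny lexicon here is purely illustrative.

```python
# Minimal sketch of the two NLP steps mentioned above: a simplified
# syntactic pass (tokenisation) and a simplified semantic pass
# (lexicon-based sentiment). The lexicon is an illustrative assumption.
import re

POSITIVE = {"great", "good", "love", "excellent", "helpful"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}

def tokenize(text: str) -> list[str]:
    """Syntactic analysis (very simplified): split text into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def sentiment(tokens: list[str]) -> str:
    """Semantic analysis (very simplified): score meaning with a lexicon."""
    score = sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

review = "The support team was helpful and the product is excellent, but delivery was slow."
tokens = tokenize(review)
print(tokens)
print(sentiment(tokens))
```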

    XOps

    A key objective of XOps (data, machine learning, model, platform) is to optimize efficiency and achieve economies of scale. XOps is achieved by adopting DevOps best practices, reducing duplication of technology and processes and increasing automation, ensuring efficiency, reusability, and repeatability. These innovations allow prototypes to be scaled, with flexible design and agile orchestration of governed systems.

    A growing number of algorithms for solving specific business problems is being deployed as AI continues to increase, so organizations will need multiple algorithms for attacking new challenges. By removing organizational silos to facilitate greater collaboration between software engineers, data scientists and IT staff, companies can effectively implement ModelOps and ensure it becomes an integral part of AI development and deployment.

    Summary

    As the name implies, Big data refers to a large amount of information that needs to be processed in an innovative way to improve insight and decision-making. With the use of Big data technologies, organizations can gain insight and make better decisions, leading to greater ROI for their investments. It is critical to understand the prospects of Big data technology, however, to decide which solution is right for an organization given so many advancements.

    Organizations that use data-driven strategies are the ones that succeed in today’s digital age, and they are looking to invest in data analytics. As a result of digital assets and processes, more data is being gathered than ever before, and data analytics is helping businesses shape themselves. The sections above cover the latest trends in Big Data Analytics for 2022 and beyond.

    Data analytics: questions answered

    What are the future trends in data analytics?

    AI and machine learning are being embraced heavily by businesses as a means of analyzing Big data about different components of their operations and strategizing accordingly. This is especially the case when it comes to improving customer service and providing a seamless customer experience.

    What will be the future of Big data industry?

    The future of Big data may see organizations using business analytics to create real-world solutions by combining analyses from the digital world with the analyses from the physical world.

    What is the next big thing in data analytics?

    Using artificial intelligence, machine learning, and natural language processing technologies, augmented analytics automates the analysis of large amounts of data for real-time insights.

    What is the next big thing after Big data?

    Several sources claim that Artificial Intelligence (AI) will be the next big thing in technology, and we believe that Big Data will be as well.

    What are the top trends of data analytics in 2023?

    • AR and VR
    • Driverless cars
    • Blockchain
    • AI
    • Drones

    What are the key data trends for 2023?

    • Using Big data for climate change research
    • Gaining traction for real-time analytics
    • Launching Big Data into the real world

    What is the scope of Big data analytics?

    In today’s world, there is no doubt that Big data analytics is in high demand due to its numerous benefits. This enormous progress can be attributed to the wide variety of industries that use Big data analytics.

    Is Big Data Analytics in demand?

    The wide range of industries that are using Big data analytics is undoubtedly a major reason for the growth of the technology.

    What are the critical success factors for Big data analytics?

    • Establishing your mission, values, and strategy
    • Identifying your strategic objectives and “candidate” CSFs
    • Evaluating and prioritizing them
    • Communicating them to key stakeholders
    • Monitoring and measuring their implementation

    Author: Zharovskikh Anastasiya

    Source: InData Labs

  • Big data: let's not fall back into old mistakes

    If we are not careful, things will go completely wrong again. Vendors often develop big data products as an extra step in the process, which only makes that process more complex, whereas products should actually be making the process smoother.

    Fifteen years ago, the revolution in decision support was nipped in the bud because the data management industry built inflexible data warehouse systems and developed unusable business intelligence tools. Organisations were forced to adapt their business processes to our own product agendas. Today, as the big data era emerges, we are heading in exactly the same direction again.

    There has to be a change of mindset. We should no longer focus on the product, but on the process!

    Follow the processes

    The ideal flow diagram for a process has as few intermediate steps as possible. In practice, however, we almost never see this, for countless reasons. The most important one is that software vendors focus on the product and not on the process. They pursue a strategy of inserting a product as one of the many intermediate steps in the process flow. In effect, they design themselves into the process.

    The data management industry's response to big data has so far been cut from the same cloth. In most cases this means a hotchpotch of proprietary, stack-centric big data "solutions", technological or architectural prescriptions that serve only self-interest, and front-end tools that are not really finished yet.

    But big data is different, because it is unmistakably multidisciplinary: it implies interconnectedness, interoperability and exchange between different domains. Big data means connecting everything to everything, and in that respect big data is the exact opposite of data management.

    From a product perspective, a big data-aware tool must function in a context in which problems, practices and processes are multidisciplinary. No product is entirely independent or works in complete isolation. That does not mean, by the way, that there cannot be big data-oriented products aimed at highly specific applications, or more generalist big data-oriented products intended for particular process, domain or functional activities. Nor does it automatically mean that an entire cohort of existing products suddenly becomes "pre-big data".

    More of the same is the wrong approach

    Yet most vendors develop and sell "big data-in-a-platform" products. And these "solutions" have one thing in common, their product-centric model: they are entirely geared towards nesting themselves - as an intermediate step - into a process. Yet every intermediate step adds delay and increases complexity and fragility.

    Or worse still: every intermediate step brings its own infrastructure. For each individual vendor, that means dedicated support staff with their own internal knowledge base. In the best case, this means recruiting armies of Java or Pig Latin programmers, or teaching DBAs and SQL programmers the finer points of HQL. In the worst case, it means investing considerable amounts of time and money in developing platform-specific knowledge bases.

    Automation is the solution

    The way to address this imbalance is to focus on automating the processes of a data warehouse environment, such as scoping, setting up warehouses, ongoing management and periodic refactoring. You could even automate the creation and management of documentation, schemas and lineage information for warehouses, eliminating manual programming in SQL or in internal, tool-specific languages altogether.
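
    As a minimal sketch of that kind of automation, assuming SQLAlchemy and access to the warehouse's catalogue, schema documentation can be generated from the database itself rather than maintained by hand; the connection string below is illustrative.

```python
# Minimal sketch: generating schema documentation for a warehouse directly
# from its catalogue instead of maintaining it by hand, using SQLAlchemy
# reflection. The connection string is an illustrative assumption.
from sqlalchemy import create_engine, inspect

# Swap in the real warehouse URL (Postgres, Snowflake, etc.).
engine = create_engine("sqlite:///warehouse.db")
inspector = inspect(engine)

lines = ["# Warehouse schema (auto-generated)"]
for table in inspector.get_table_names():
    lines.append(f"\n## {table}")
    for col in inspector.get_columns(table):
        nullable = "NULL" if col["nullable"] else "NOT NULL"
        lines.append(f"- {col['name']}: {col['type']} {nullable}")

with open("warehouse_schema.md", "w", encoding="utf-8") as f:
    f.write("\n".join(lines) + "\n")

print(f"Documented {len(inspector.get_table_names())} tables.")
```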

    Big data products do not need infrastructure of their own at all. They should speak the language of, and provide support for, the specific components of OLTP systems, warehouse platforms, analytical databases, NoSQL or big data stores, BI tools and all the other 'steps' that together form an information ecosystem.

    Products should focus on the points in the process between isolated systems, where a process flow gets blocked. This type of blockage is the inevitable consequence of a product-centric development and sales strategy. And as things stand, we are going to see a lot of these blockages in the big data field.

    We should start seeing big data as a kind of free trade zone in which 'trade' equals 'process': data moves from one intermediate step to the next with minimal restriction or obstruction, and without platform-specific embargoes imposed by unnecessary intermediate steps.

    In my view, the answer lies in automation. Not automation for its own sake, but automation as an integral part of the process flow, preventing blockages, increasing responsiveness, lowering costs and giving IT the opportunity to focus on creating value.

    Let's all make sure things don't go wrong again this time!

    Source: CIO

  • Big data's big problem? Most companies don't realize they're already using it

    Companies are already actively using big data. They just don't call it that. While the phrase has problems, the technology is becoming more intrinsic to business.

    It turns out that no one knows what the heck big data is, and about the same number of companies are actually doing anything meaningful with it, according to a new study from Dresner Advisory Services. Surprised?

    You shouldn't be. After all, despite years of big data prognostication, most companies still struggle to even put little data to use.

    This isn't to suggest that big data isn't a big deal, or that companies aren't deriving massive value today from their data. It is and they are. But, to get value from big data, companies first need to get real.

    Who needs it?

    As Datamation's James Maguire captures, Dresner Advisory Services doesn't see much adoption of big data.

    Just 17% of companies acknowledge using big data today, with another 47% putting it off into an indeterminate future. No wonder, then, that the report's authors conclude, "Despite an extended period of awareness building and hype, actual deployment of big data analytics is not broadly applicable to most organizations at the present time."

    Big data, big nothing?

    Well, no. After all, 59% of the report's respondents also claim big data is "critically important," despite not doing anything with it (apparently). Something is clearly going on here....

    That "something," I suspect, is just definitional.

    You keep using that word...

    Way back in the prehistoric world of 2012, NewVantage Partners upended the prevailing wisdom of what the "big" in big data actually meant. Despite tons of hype around petabyte-scale data problems, largely fueled by Hadoop and its ecosystem vendors, the reality was (and is) that most companies don't have petabyte-scale problems.

    The primary problems most companies struggle with involve variety and velocity of data, as the survey uncovered.

    The market is finally starting to grok this, investing increasing amounts of money in technologies that more easily manage diverse data types (e.g. NoSQL databases like MongoDB and DataStax-sponsored Cassandra), and handle streaming data (e.g. Apache Spark).
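
    As a minimal sketch of what "handling streaming data" with Apache Spark looks like in practice, assuming PySpark is installed: the built-in rate source stands in for a real event stream such as Kafka, and the window size is an arbitrary illustration.

```python
# Minimal sketch of Spark Structured Streaming: count events per
# 5-second window over a synthetic stream. The rate source stands in
# for a real feed such as Kafka; rate and window size are illustrative.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

# Synthetic stream: one row per event with a timestamp and a value column.
events = (
    spark.readStream.format("rate")
    .option("rowsPerSecond", 10)
    .load()
)

# Rolling aggregate - the kind of "data in motion" workload that
# batch-oriented tooling handles poorly.
counts = (
    events.withWatermark("timestamp", "10 seconds")
    .groupBy(F.window("timestamp", "5 seconds"))
    .count()
)

query = (
    counts.writeStream.outputMode("update")
    .format("console")
    .option("truncate", "false")
    .start()
)
query.awaitTermination(30)  # run for ~30 seconds
query.stop()
spark.stop()
```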

    At the same time, enterprises continue to turn to more traditional data infrastructure like Oracle. As DB-Engines found in its 2015 year-end review, Oracle was the biggest gainer in terms of overall popularity last year (measured in terms of job postings, tech forum mentions, Google searches, etc.).

    More than sexy-cool NoSQL. More than cloud-first Amazon. More than anything.

    Of course, some of this increased Oracle usage has nothing to do with big data, and everything to do with managing neat-and-tidy rows-and-column data. But, based on NewVantage Partners' survey data, this comparatively "small" data is still where most of the big data analytics action resides.

    Moving beyond this structured data, too, I suspect many companies still don't think of varied, high-velocity data as "big data." This may be one reason so few companies claim to be doing much of anything with big data. As MySQL database engineer Justin Swanhart put it, "Big data is meaningless. You might as well ask people what color database they want."

    In short, big data is alive and well, but companies don't necessarily think of it as "big."

    So what?

    For enterprises wondering if they're being left behind by big data, it's time to stop worrying. As Gartner analyst Nick Heudecker posits, "big data" has migrated into more familiar categories:

    • Advanced analytics and data science
    • Business intelligence and analytics
    • Enterprise information management
    • In-memory computing technology
    • Information infrastructure

    Most enterprises are already engaged in projects that put big data to use. They just don't call it that. Even so, there's still a lot of work to do. As Michael Schrage, a research fellow at MIT Sloan School's Center for Digital Business, puts it:

    "[The] most enduring impact of predictive analytics...comes less from quantitatively improving the quality of prediction than from dramatically changing how organizations think about problems and opportunities."

    In other words, companies may already own the requisite technologies to put big data to work. What they lack is a desire to fundamentally change how they put that data to work. It's one thing to have a group of analysts decipher data, and quite another to use that data analysis to fuel real-time changes in one's business.

    That's not the sort of thing you can buy from a vendor. It's something that has to change within the DNA of an enterprise. Between a more accurate understanding of big data, and actually doing something with it, enterprises have their work cut out for them.

    Source: TechRepublic

  • Big Tech: the battle for our data


    User privacy has become the most important battleground in tech, and with it comes a war fought not in the skies or trenches but in congressional hearings and slanderous advertisements. This battle, fought in the shadows for your data and attention, is now coming to light.

    The ever-growing reliance we have on technology has boomed since the advent of social media, especially on phones. Just 15 years ago, the favoured way of accessing services like Facebook was through a computer, but this changed at a radical pace following the introduction of the iPhone in 2007 and the opening of the iOS App Store in 2008.

    Since then, the app economy, now in its teens, has become a multi-billion-dollar industry built on technologies founded in behavioural change and habit-forming psychology.

    If you don’t have the iPhone’s ‘Screen Time’ feature set up, you’ll want to do that after hearing this:

    According to various studies, a typical person spends over four hours a day on their phone, with almost half of that time taken up by social media platforms like Facebook, Instagram, and Twitter. These studies were conducted before the pandemic, so it wouldn't be a stretch to assume these figures have gone up.

    So what happens with all this time spent on these platforms?

    Your time is your attention, your attention is their data

    Where the advertising of old relied on creativity and market research on platforms like television and newspapers, modern advertising takes advantage of your online behaviour and interests to accurately target tailored advertisements to users.

    User data collected by Facebook is used to create targeted advertisements for all kinds of products, businesses and services. They use information like your search history, previous purchases, location data and even collect identifying information across apps and websites owned by other companies to build a profile that’s used to advertise things to you. In a recent update to iOS, Apple’s App Store now requires developers to outline to users what data is tracked and collected in what they are calling ‘privacy nutrition labels’.

    In response, on Facebook's most recent quarterly earnings call, Mark Zuckerberg stated: “We have a lot of competitors who make claims about privacy that are often misleading,” and “Now Apple recently released so-called (privacy) nutrition labels, which focused largely on metadata that apps collect rather than the privacy and security of people’s actual messages.”

    Facebook uses this metadata to sell highly targeted ad space.

    This is how you pay for ‘free’ services, with your data and attention

    The harvesting of user data on platforms like Facebook has not only benefited ‘Big Tech’ corporations and smaller businesses, but has also been grossly abused by politicians to manipulate the outcomes of major political events.

    In 2018, the Cambridge Analytica scandal emerged at the forefront of mainstream media after a whistleblower for the company, Christopher Wylie, came forward with information outlining the unethical use of Facebook user data to create highly targeted advertisements with the goal of swaying political agendas. Most notably, illicitly obtained data was used in former US President Donald Trump’s 2016 presidential campaign in the United States, as well as in the Leave.EU and UK Independence Party campaigns in support of Brexit in the United Kingdom, and this is just the tip of the iceberg.

    This is the level of gross manipulation of data Apple is taking a stand against.

    “The fact is that an interconnected eco-system of companies and data-brokers; of purveyors of fake news and peddlers of division; of trackers and hucksters just trying to make a quick buck, is more present in our lives than it has ever been.” — Tim Cook on Privacy, 2021

    What we have here are two titans of industry with massive amounts of influence and responsibility at war.

    On one hand, you have Facebook, which has time and time again been grilled in public forums for harvesting the data of its 2.6 billion monthly active users, for shadow profiles (data collected on non-Facebook users), and for social media bias. On the other hand, you have Apple, with 1.5 billion active devices running iOS across iPhone and iPad, all of which are ‘tools’ that demand attention with constant notifications and habit-forming user experience design.

    Apple has been scrutinised in the past for its App Store policy and is currently fighting an antitrust lawsuit filed by Epic Games over the removal of Fortnite from the App Store for violating its policies on in-app purchases. Facebook stated in December 2020 that it would support Epic Games’ case and is also now reportedly readying an antitrust lawsuit of its own against Apple for forcing third-party developers to follow rules that first-party apps don’t have to follow.

    Zuckerberg stated in the earnings call that “Apple has every incentive to use their dominant platform position to interfere with how our apps and other apps work, which they regularly do to preference their own. And this impacts the growth of millions of businesses around the world.” and “we believe Apple is behaving anti-competitively by using their control of the App Store to benefit their bottom line at the expense of app developers and small businesses”. This is an attempt by Zuckerberg to show that Apple is using its control of the App Store to stifle the growth of small businesses, but our right to know how our own data is being used should stand paramount, even if it’s at the expense of business growth.

    Apple’s position on privacy protection ‘for the people’ and its introduction of privacy ‘nutrition labelling’ is not one that just benefits users, but one that also builds and upholds trust in the company and its products. The choices the company makes tend to shape and dictate how and where the market will go. You only have to look at its previous trends in product and packaging design to see the argument I’m trying to make.

    With growing concern and mainstream awareness of data use, privacy is now at the forefront of consumer trends. Just look at the emergence of VPN companies in the last couple of years. Apple’s stance on giving privacy back to the user could set a new trend into motion across the industry and usher in an age of privacy-first design.

    Author: Morgan Fox

    Source: Medium

  • Bol.com: machine learning to better match supply and demand

    An online marketplace is a concept that e-commerce continues to adopt at an increasing rate. Besides consumer-to-consumer marketplaces such as Marktplaats.nl, there are of course also business-to-consumer marketplaces in which an online platform brings together consumer demand and supplier offerings.

    Some marketplaces have no assortment of their own: their offering consists entirely of affiliated suppliers, Alibaba being one example. At Amazon, own products account for 50 percent of the offering. Bol.com also has its own marketplace, ‘Verkopen via bol.com’ (Selling via bol.com), which adds millions of extra articles to bol.com’s assortment.

    Safeguarding content quality

    Managing such a marketplace involves a lot. The goal is clear: make sure that demand and supply come together as quickly as possible, so that the customer is immediately offered a set of products that are relevant to them. With millions of customers on one side and millions of products from thousands of suppliers on the other, that is quite a job.

    Jens explains: “It starts with the standardisation of information on both the demand and the supply side. For example, if you as a supplier want to offer a Tchaikovsky CD or a pair of Dolce & Gabbana glasses on bol.com, there are many possible spellings. For a sales platform like ‘Verkopen via bol.com’, the quality of the data is crucial. Maintaining the quality of the content is therefore one of the challenges.”

    On the other side of the transaction there are, of course, bol.com customers who also type all kinds of variations of terms, such as brand names, into the search field. In addition, people increasingly search on generic terms such as ‘wedding gift’ or ‘party supplies’.

    Bringing supply and demand together

    As the assortment grows, which it does, and customers search in increasingly ‘generic’ terms, making a match and keeping relevance high becomes ever more challenging. Because of the volume of this unstructured data and the fact that it has to be analysed in real time, that match cannot be made by hand. You have to be able to use the data in a smart way. That is one of the activities the customer intelligence team at bol.com, part of the customer centric selling department, is working on.

    Jens: “The trick is to translate customer behaviour on the website into content improvements. By analysing and matching the words (and word combinations) customers use to search for articles with the products that are ultimately bought, synonyms can be created for the products in question. Thanks to these synonyms, the relevance of the search results goes up and you help the customer find the product faster. Moreover, it cuts both ways, because the quality of the product catalogue is improved at the same time. Think of refining the various colour descriptions (WIT, Wit, witte, white, etc.).”
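    As an illustration of the idea Jens describes, and not bol.com’s actual pipeline, the following Python sketch counts which search terms lead to purchases of the same product and treats terms that frequently convert to the same product as candidate synonyms; the query and product data are invented.

    from collections import Counter, defaultdict

    # (search term, purchased product) pairs, e.g. harvested from clickstream logs.
    query_purchases = [
        ("tsjaikovski", "Tchaikovsky - Symphony No. 5 (CD)"),
        ("tchaikovsky", "Tchaikovsky - Symphony No. 5 (CD)"),
        ("dolce gabana bril", "Dolce & Gabbana DG4268 sunglasses"),
        ("dolce & gabbana zonnebril", "Dolce & Gabbana DG4268 sunglasses"),
    ]

    terms_per_product = defaultdict(Counter)
    for term, product in query_purchases:
        terms_per_product[product][term] += 1

    # Terms that converted to the same product are candidate synonyms; in practice
    # you would require a minimum count before proposing them to a specialist.
    for product, counts in terms_per_product.items():
        print(product, "->", [term for term, n in counts.most_common()])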

    Algorithms keep getting smarter

    The process above still runs semi-automatically (retroactively), but the ambition is to have it take place fully automatically in the future. To get there, machine learning techniques are currently being implemented step by step. The first investments were in technologies to process large volumes of unstructured data very quickly. Bol.com owns two of its own data centres with dozens of clusters.

    “We are now experimenting extensively with using these clusters to improve the search algorithm, enrich the content and standardise it,” Jens says. “And that brings challenges. After all, if you go too far with standardisation, you end up in a self-fulfilling prophecy. Fortunately, the algorithms are taking over bit by bit and keep getting smarter. The algorithm now tries to link the search term to a product itself and presents this to various internal specialists. In concrete terms: the specialists see that ‘there is a 75 percent probability that this is what the customer means’. That link is then validated manually. The feedback these specialists give on a proposed improvement provides important input for the algorithms to process information even better. You can see the algorithms doing their job better and better.”

    Still, this raises a next question for Jens and his team: where do you draw the line above which the algorithm can make the decision itself? Is that at 75 percent? Or should everything below 95 percent be validated by human judgement?
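    The threshold question can be pictured as a simple human-in-the-loop routing rule. The sketch below is a hypothetical illustration rather than bol.com’s system: proposed matches above a configurable confidence are applied automatically, everything else is queued for a specialist.

    review_queue = []

    def apply_match(match):
        print(f"auto-applied: {match}")

    def queue_for_review(match, confidence):
        review_queue.append((match, confidence))
        print(f"sent to specialist ({confidence:.0%} confident): {match}")

    def route_match(match, confidence, auto_threshold=0.95):
        # Apply the proposed search-term-to-product link automatically above the
        # threshold, otherwise hand it to a human specialist for validation.
        if confidence >= auto_threshold:
            apply_match(match)
        else:
            queue_for_review(match, confidence)

    route_match(("tsjaikovski", "Tchaikovsky CD"), 0.75)  # goes to a specialist
    route_match(("witte", "colour: white"), 0.97)         # applied automatically

    Raising or lowering auto_threshold is exactly the trade-off described above: more automation versus more manual validation.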

    Making a better store for our customers with big data

    Three years ago, big data was a topic mainly talked about in PowerPoint slides. Today, many (larger) e-commerce companies have their own Hadoop cluster. The next step is to use big data to make the store genuinely better for customers, and bol.com is working hard on that. In 2010, the company switched from ‘mass media’ to ‘personally relevant’ campaigning, increasingly attempting to offer the customer a personal message, in real time, based on various ‘triggers’.

    Those triggers (such as visited pages or viewed products) increasingly outweigh historical data (who the customer is and what they have bought in the past).

    “If you gain insight into relevant triggers and leave out the irrelevant ones,” Jens says, “you can serve the consumer better, for example by showing the most relevant review, making an offer or putting together a selection of comparable products. In this way you align better with the customer journey, and the chance keeps growing that the customer finds what they are looking for with you.”

    Bol.com does this by first searching for the relevant triggers, based on behaviour on the website as well as on the customer’s known preferences. After these are linked to the content, bol.com runs A/B tests to analyse conversion and decide whether or not to roll the change out permanently. After all, every change must result in higher relevance.
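    For readers who want to see what such an A/B evaluation can look like, here is a minimal sketch of a two-proportion z-test on conversion rates; the figures are invented and this is not a description of bol.com’s actual test setup.

    from math import sqrt
    from statistics import NormalDist

    def ab_conversion_test(conv_a, n_a, conv_b, n_b):
        # Two-sided z-test comparing the conversion rates of variants A and B.
        p_a, p_b = conv_a / n_a, conv_b / n_b
        p_pool = (conv_a + conv_b) / (n_a + n_b)
        se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
        z = (p_b - p_a) / se
        p_value = 2 * (1 - NormalDist().cdf(abs(z)))
        return p_a, p_b, p_value

    # Hypothetical experiment: 10,000 visitors per variant.
    p_a, p_b, p = ab_conversion_test(conv_a=480, n_a=10_000, conv_b=540, n_b=10_000)
    print(f"A: {p_a:.2%}, B: {p_b:.2%}, p-value: {p:.3f}")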

    Analysing unstructured data naturally involves various techniques, and it requires both smart algorithms and human judgement. Jens: “Fortunately, at our company not only the algorithms are self-learning but the business is too, so the process keeps getting faster and better.”

    Data scientists

    Outsourcing or doing everything in-house is a strategic decision. Bol.com chose the latter. Of course, market expertise is still used on an ad hoc basis when it helps to speed up processes. Data analysts and data scientists are an important part of the growing customer centric selling team.

    The difference speaks for itself: data analysts are trained in ‘traditional’ tools such as SPSS and SQL and do analysis work. Data scientists have greater conceptual flexibility and can also program in Java, Python and Hive, among others. There are of course growth opportunities for ambitious data analysts, but it is nevertheless becoming increasingly difficult to find data scientists.

    Although the market is working hard to expand the supply, for now this is still a small, select group of professionals. Bol.com does everything it can to recruit and train the right people. First, an employee with the right profile is brought in; think of someone who has just graduated in artificial intelligence, applied physics or another exact science. This brand-new data scientist is then taken under the wing of one of the experienced experts from bol.com’s training team. Training in programming languages is an important part of this, and beyond that it is mostly learning by doing.

    Man versus machine

    As the algorithms get ever smarter and artificial intelligence technologies ever more advanced, you would think the shortage of data scientists is temporary: the computers are taking over.

    According to Jens, that is not the case: “You will always need human judgement. But because the machines are taking over more and more of the routine and standardised analysis work, you can do more and more. For example, not processing the top 10,000 search terms, but all of them. In effect you can go much deeper and much broader, and so the impact of your work on the organisation is many times greater. The result? The customer is helped better and saves time because the information they get is ever more relevant, which makes them more engaged. And it takes us further in our ambition to offer our customers the best store there is.”

    Click here for the full report.

    Source: Marketingfacts

  • Business Data Scientist 2.0

    More than three years ago we ran the first Business Data Scientist programme. Triggered by the many sexy job adverts, we as lecturers asked ourselves what exactly makes a data scientist a data scientist. Besides an enormous variety, the job adverts also showed a laundry list of required competences. The association with the (usually) imaginary sheep with five legs was quickly made. Beyond that, the job adverts of 2014 mainly expressed hope and ambition: companies with high expectations looking for skilled staff to refine the ever-growing stream of data into value for the business. What does that all involve?

    A few years and seven programmes later, a lot has changed. But in a sense, very little has. Companies’ expectations are still sky-high. The data scientist comes in all shapes and sizes; that seems accepted. But the core question, how to turn data into value and what that involves, remains underexposed. The relevance of a Business Data Scientist programme is therefore unchanged, and has actually grown. Many companies have made their data science investments. It is time to harvest.

    Data scientist 2.0

    To turn data into value, ‘connection’ is essential: connection between the hard-core data scientists who can drill up data like oil, refine it into information and deliver it to specification on the one hand, and the business people with their challenges on the other. In our programmes we have heard many stories of fine data projects that turned out to be pearls before swine because of insufficient connection. However important the work, without that connection the data scientist does not survive. The relevance of a Business Data Scientist programme is therefore unchanged. Should every data scientist take it? Is there such a job as business data scientist? Both questions can be answered with a wholehearted no. But if you want to operate at the interface of application and data science, this programme is exactly right for you. And that interface will become more and more central in data-intensive organisations.

    The business data scientist is someone who knows better than anyone that the value of data lies in its ultimate use. From that simple principle, he or she defines, guides and steers data projects in organisations. They think along about structurally anchoring the use of data science in the organisation’s operational and policy processes and come up with proposals for how to organise it. The business data scientist knows the data science toolbox inside out without necessarily being able to use every instrument in it themselves. He or she does know which piece of technology to deploy for which type of problem. And conversely, they are able to characterise and classify business problems so that the right technologies and expertise can be selected. The business data scientist understands information processes, knows the data science toolbox and knows how to navigate the interests that always surround projects.

    The BDS programme is relevant for product managers and marketeers who want to work more data-intensively, for hard-core data scientists who want to connect with the application side of their organisation, and for (project) managers responsible for the performance of data scientists.

    The BDS 2.0 programme is characterised by an action-oriented way of learning. Cases are central, based on a theoretical framework intended to look at the data science toolbox from the perspective of business value. The cases cover all phases of turning data into value: from project definition through data analysis and business analytics to actual use. And for every relevant phase, specialists provide a deep dive. Interested in the programme? Download the brochure here: http://www.ru.nl/rma/leergangen/bds/

    Egbert Philips  

    Lecturer, BDS programme, Radboud Management Academy

    Director Hammer, market intelligence   www.Hammer-intel.com

     

  • Business Data Scientist programme now also in Belgium

     

    The Radboud Management Academy has now brought its Business Data Scientist programme, so successful in the Netherlands, to the Belgian market as well. In cooperation with Business & Decision, a condensed programme was given last week at the Axa office in Brussels to people from Belgian businesses. Representatives of ministries and other government institutions were also present.

    The programme responds to companies’ need to get more value out of the data available to them. It focuses not only on developing individual competences but also on organisational structures and instruments that help organisations work in a more data-driven way.

    The 3D model that is central to the programme is seen by participants as an important addition to the technical competences they often already have. It is increasingly recognised that the qualities that improve the interfacing with ‘the business’ are ultimately decisive for turning the insights generated with data into value. The programme significantly extends the data scientist’s toolbox with functional, social as well as technical skills.

    Want to know more? Go to http://www.ru.nl/rma/leergangen/bds/

  • Business Intelligence in 3PL: Mining the Value of Data

    In today’s business world, “information” is a renewable resource and virtually a product in itself. Business intelligence technology enables businesses to capture historical, current and predictive views of their operations, incorporating such functions as reporting, real-time analytics, data and process mining, performance management, predictive analytics, and more. Thus, information in its various forms and locations possesses genuine inherent value.
     
    In the real world of warehousing, the availability of detailed, up-to-the-minute information on virtually every item in the operators’ custody, from inbound dock to delivery site, leads to greater efficiency in every area it touches. Logic would suggest that greater profitability ensues.
     
    Three areas of 3PL operations seem to benefit most from the savings opportunities identified through business intelligence solutions: labor, inventory, and analytics.
    In the first case, business intelligence tools can help determine the best use of the workforce, monitoring its activity in order to assure maximum effective deployment. The result: potentially major jumps in efficiency, dramatic reductions in downtime, and healthy increases in productivity and billable labor.
     
    In terms of inventory management, the metrics obtainable through business intelligence can stem inventory inaccuracies that would have resulted in thousands of dollars in annual losses, while also reducing write-offs.
     
    Analytics through business intelligence tools can also accelerate the availability of information, as well as provide the optimal means of presentation relative to the type of user. One such example is the tracking of real-time status of work load by room or warehouse areas; supervisors can leverage real-time data to re-assign resources to where they are needed in order to balance workloads and meet shipping times. A well-conceived business intelligence tool can locate and report on a single item within seconds and a couple of clicks.
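    As a rough illustration of such a workload view, and not of any specific 3PL system, the following Python sketch aggregates open order lines per warehouse area so a supervisor can see where to re-assign pickers; the column names and data are assumptions.

    import pandas as pd

    # Illustrative order lines with the area they are picked from and their status.
    orders = pd.DataFrame({
        "area":   ["cooler", "cooler", "dry", "dry", "dry", "dock"],
        "status": ["open", "open", "open", "open", "staged", "open"],
    })

    # Real-time workload per area: lines still waiting to be picked.
    workload = (orders[orders["status"] == "open"]
                .groupby("area").size()
                .sort_values(ascending=False))
    print(workload)  # cooler 2, dry 2, dock 1 -> re-assign resources accordingly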
     
    Extending the Value
    The value of business intelligence tools is definitely not confined to the product storage areas.
     
    With automatically analyzed information available in a dashboard presentation, users – whether in the office or on the warehouse floor – can view the results of their queries/searches in a variety of selectable formats, choosing the presentation based on its usefulness for a given purpose. Examples:
    • Status checks can help identify operational choke points, such as if/when/where an order has been held up too long; if carrier wait-times are too long; and/or if certain employees have been inactive for too long.
    • Order fulfillment dashboards can monitor orders as they progress through the picking, staging and loading processes, while also identifying problem areas in case of stalled processes.
    • Supervisors walking the floor with handheld devices can both encourage team performance and, at the same time, help assure efficient dock-side activity. Office and operations management are able to monitor key metrics in real-time, as well as track budget projections against actual performance data.
    • Customer service personnel can call up business intelligence information to assure that service levels are being maintained or, if not, institute measures to restore them.
    • And beyond the warehouse walls, sales representatives in the field can access mined and interpreted data via mobile devices in order to provide their customers with detailed information on such matters as order fill rates, on-time shipments, sales and order volumes, inventory turnover, and more.
    Thus, well-designed business intelligence tools not only can assemble and process both structured and unstructured information from sources across the logistics enterprise, but can deliver it “intelligently” – that is, optimized for the person(s) consuming it. These might include frontline operators (warehouse and clerical personnel), front line management (supervisors and managers), and executives.
     
    The Power of Necessity
    Chris Brennan, Director of Innovation at Halls Warehouse Corp., South Plainfield N.J., deals with all of these issues as he helps manage the information environment for the company’s eight facilities. Moreover, as president of the HighJump 3PL User Group, he strives to foster collective industry efforts to cope with the trends and issues of the information age as it applies to warehousing and distribution.
     
    “Even as little as 25 years ago, business intelligence was a completely different art,” Brennan has noted. “The tools of the trade were essentially networks of relationships through which members kept each other apprised of trends and happenings. Still today, the power of mutual benefit drives information flow, but now the enormous volume of data available to provide intelligence and drive decision making forces the question: Where do I begin?”
     
    Brennan has taken a leading role in answering his own question, drawing on the experience and insights of peers as well as the support of HighJump’s Enterprise 3PL division to bring Big Data down to size:
     
    “Business intelligence isn’t just about gathering the data,” he noted, “it’s about getting a group of people with varying levels of background and comfort to understand the data and act upon it. Some managers can glance at a dashboard and glean everything they need to know, but others may recoil at a large amount of data. An ideal BI solution has to relay information to a diverse group of people and present challenges for them to think through.”
     
    Source: logisticviewpoints.com, December 6, 2016
  • Business Intelligence still hot….

    Business Intelligence outdated? Nothing could be further from the truth, as the Heliview conference ‘Decision making by smart technologies’, held last Tuesday in the Brabanthallen in Den Bosch, demonstrated.

    Two hundred client organisations listened to presentations by, among others, Rick van der Lans, Peter Jager, Frank de Nijs and Arent van ‘t Spijker. Alongside the familiar message, there was also plenty of news to hear in Den Bosch.

    New technologies make much more possible. Social media and modern big data technology enable organisations to extract far more value from data. How organisations should do that is often still a challenge. Applying the technology is not a goal in itself; the point is to produce added value for organisations, whether by optimising processes or by serving the customer better through product development. Taken to the extreme, data can even be the engine behind new business concepts or models. The precondition is a clear business vision (whether or not produced with intelligent use of data and information), which matters if we want to avoid burning millions on new technology without direction.

    For the attendees, the message was sometimes familiar, but sometimes also a confrontation with themselves. One thing is certain: the role of data and information in doing business intelligently has not yet played out. Business Intelligence is alive.

    30 January 2015

  • Chatbots, big data and the future of customer service


    The rise and development of big data has paved the way for an incredible array of chatbots in customer service. Here's what to know.

    Big data is changing the direction of customer service. Machine learning tools have led to the development of chatbots. They rely on big data to better serve customers.

    How are chatbots changing the future of the customer service industry and what role does big data play in managing them?

    Big data leads to the deployment of more sophisticated chatbots

    BI-kring published an article about the use of chatbots in HR about a month ago. This article goes deeper into the role of big data when discussing chatbots.

    The following terms are more popular than ever: 'chatbot', 'automated customer service', 'virtual advisor'. Some know more about process automation, others less. One thing is for sure: if you want to sell more on the internet, handle more customers and save on personnel costs, you certainly need a chatbot. A chatbot is a conversational system created to simulate intelligent conversation between a human and an automaton.

    Chatbots rely on machine learning and other sophisticated data technology. They are constantly collecting new data from their interactions with customers to offer a better experience.

    But how commonly used are chatbots? An estimated 67% of consumers around the world have communicated with one, and that figure is set to rise sharply: by 2020, over 85% of all customer service interactions were expected to involve chatbots.

    A chatbot makes it possible to automate customer service in various communication channels, for example on a website, chat, in social media or via SMS. In practice, a customer does not have to wait for hours to receive a reply from the customer service department, a bot will provide an answer within a few seconds.

    Depending on requirements, a chatbot may assume the role of a virtual advisor or assistant. For questions where a real person has to become involved, bots can analyze the received enquiries and not only identify what issue the given customer is addressing but also automatically route the enquiry to the correct person or department. Machine learning tools make it easier to determine when a human advisor is needed.

    Bots supported by associative memory algorithms understand the entire content even if the interlocutor made a mistake or a typo. Machine learning makes it easier for them to decipher contextual meanings by interpreting these mistakes.
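    As a toy illustration of typo-tolerant matching, and not of any specific vendor’s algorithm, the sketch below uses Python’s standard difflib to map a misspelled message to a known intent; the intents are hypothetical.

    import difflib

    # Hypothetical intents a support bot can handle.
    INTENTS = ["track my order", "return a product", "change delivery address"]

    def match_intent(message, cutoff=0.6):
        # Fuzzy-match the message against known intents; None means hand over to a human.
        hits = difflib.get_close_matches(message.lower(), INTENTS, n=1, cutoff=cutoff)
        return hits[0] if hits else None

    print(match_intent("trak my ordr"))      # -> "track my order", despite the typos
    print(match_intent("speak to a human"))  # -> None: route to a human advisor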

    Response speed and 24/7 assistance are very important when it comes to customer service, as late afternoons and evenings are times of day when online shops experience increased traffic. If a customer cannot obtain information about a given product right there and then, it is possible that they will just abandon their basket and never shop at that store again. Any business wants to prevent a customer journey towards its product from taking a turn the other way, especially when that is due to a lack of appropriate support.

    Online store operators, trying to stay a step ahead of the competition, often decide to implement a state-of-the-art solution, which makes the store significantly more attractive and provides a number of new opportunities delivered by chatbots. Often, following the application of such a solution, website visits increase significantly. This translates into more sales of products or services.

    We are not only seeing increased interest in the e-commerce industry; chatbots are successfully used in the banking industry as well. Bank Handlowy and Credit Agricole use bots to handle loyalty programmes or as assistants when paying bills.

    What else can a chatbot do?

    Big data has made it easier for chatbots to function. Here are some of the benefits that they offer:

    • Send reminders of upcoming payment deadlines.
    • Send account balance information.
    • Pass on important information and announcements from the bank.
    • Offer personalised products and services.
    • Bots are also increasingly more often used to interact with customers wishing to order meals, taxis, book tickets, accommodation, select holiday packages at travel agents, etc.

    The insurance industry is yet another area where chatbots are very useful. Since insurance companies are already investing heavily in big data and machine learning to handle actuarial analyses, it is easy for them to extend their knowledge of data technology to chatbots.

    The use of Facebook Messenger chatbots during staff recruitment may be surprising for many people.

    Chatbots are frequently used in the health service as well, helping to find the right facilities, arrange a visit, select the correct doctor and also find opinions about them or simply provide information on given drugs or supplements.

    As today every young person uses a smartphone, social media and messaging platforms for a whole range of everyday tasks like shopping, acquiring information, sorting out official matters, paying bills etc., the use of chatbots is slowly becoming synonymous with contemporary and professional customer service. A service available 24/7, often geared to satisfy given needs and preferences.

    Have you always dreamed of employees who do not get sick, do not take vacations and do not sleep? Try using a chatbot.

    Big data has led to fantastic developments with chatbots

    Big data is continually changing the direction of customer service. Chatbots rely heavily on the technology behind big data. New advances in machine learning and other data technology should lead to even more useful chatbots in the future.

    Author: Ryan Kh

    Source: SmartDataCollective

  • CIOs Adjust BI Strategy for Big Data

     
    The CIO focus on business intelligence (BI) and analytics will likely continue through 2017, according to Gartner Inc. The research firm says the benefits of fact-based decision-making are clear to business managers in a broad range of disciplines, including marketing, sales, supply chain management, manufacturing, engineering, risk management, finance and human resources.

    "Major changes are imminent to the world of BI and analytics, including the dominance of data discovery techniques, wider use of real-time streaming event data and the eventual acceleration in BI and analytics spending when big data finally matures," Roy Schulte, vice president and distinguished analyst at Gartner, said in a statement. "As the cost of acquiring, storing and managing data continues to fall, companies are finding it practical to apply BI and analytics in a far wider range of situations."

    Gartner outlined four key predictions for BI and analytics:

    • By 2015, the majority of BI vendors will make data discovery their prime BI platform offering, shifting BI emphasis from reporting-centric to analysis-centric.
    • By 2017, more than 50% of analytics implementations will make use of event data streams generated from instrumented machines, applications and/or individuals.
    • By 2017, analytic applications offered by software vendors will be indistinguishable from analytic applications offered by service providers.
    • Until 2016, big data confusion will constrain spending on BI and analytics software to single-digit growth.

    Recent Gartner surveys show that only 30% of organizations have invested in big data, of which only a quarter (or 8% of the total) have made it into production. This leaves room for substantial future growth in big data initiatives, the firm says.

    By: Bob Violino

  • Cracking the Code of Big Data: Key Challenges and Solutions


    As organizations become increasingly data-driven, and with the development of better computer performance and larger data storage, the groundwork for the next evolutionary step was laid. Big data took off and started to expand into new domains.

    In this post, we explain critical Big data problems and solutions that organizations should be aware of.

    Understanding Big data issues: What is Big data?

    The IT world is full of definitions that can be challenging to understand, even for industry professionals. To avoid confusion, let’s define what Big data is before proceeding with the topic.

    Simply put, Big data means an enormous amount of digital information that an organization can analyze to discover patterns, for example, in clients’ behavior. Those revealed patterns then become foundations for profit-oriented decisions and further business development plans.

    Problems Big data can solve vary from improving one’s experience with the Windows UI to the colonization of Mars, building the fastest tourism route and understanding the influence of cultural factors on the customer’s behavior.

    Therefore, the core idea of the Big data concept is to help with grounded decision-making to optimize existing workflows and introduce new ideas.

    The 6 V’s of Big data:

    • Volume: The total size of data (nowadays, it’s measured in peta- or even exabytes of data in one data set)
    • Velocity: The data’s flow speed
    • Veracity: The data’s validity
    • Variety: The data’s nature (structured and unstructured formats)
    • Value: The opportunity to receive profitable conclusions from analyzing the data
    • Variability: The scale and speed of data transformation, obsolescence and refreshment.

    Big data problem examples

    The idea of Big data sounds excellent and can work well in perfect conditions. However, despite all the benefits that Big data can bring, a question arises: What are the problems associated with Big data?

    The number of Big data problems to solve is great and increasing, but the top 6 data science challenges can and should be highlighted. Further in this post, we review six critical problems with Big data that organizations need to address regardless of their size and industry.

    Wrong treatment of Big data

    Organizations treating Big data wrongly risk failing on different levels. The typical examples of Big data problems include an employee who does not know about the data itself, its sources, value and the related workflows. That employee might create a risk of losing the entire data set by, for instance, not backing up data on time. And until such employees have a clear view of the organization’s data, that risk is relevant no matter the number of other qualified IT specialists and data analysts at the organization’s disposal.

    Growing data volumes

    The total volume of data an organization might store can reach petabytes and even more. Organizing proper storage of those vast data sets is among the key problems associated with Big data already.

    The overall data generation tempo, which is going to steadily increase with time, makes handling the Big data volumes a relevant and urgent issue that requires further investments and tech progress. The additional complicating factor for Big data storage problems is the nature of data sets, which mainly come from documents, audio, text files, videos and other data without a common structure.

    Too many Big data tools

    The Big data analytics problems for organizations in Big data app development begin even before they start analyzing the data sets. The variety of Big data tools available to integrate and use can be confusing. Which technology will be the best for data storage? What app to pick for the most efficient data analysis?

    The wide choice of tools, along with such questions, can put pressure on an organization's leaders, while finding straightforward answers is not always possible. Without suitable solutions and technologies supporting their Big data initiatives, organizations get wrong outcomes and make grave decisions, additionally wasting funds, resources and effort.

    Expertise deficiency

    Operating modern IT technologies and Big data tools properly is impossible without qualified employees. Organizations need data engineers, analysts and scientists to collect, operate, store and analyze giant data sets, and bring efficient results. The problem of Big data here is that the growth of the industry is more rapid than the pace of the desired professionals’ education.

    Big data security

    The world has gone online long ago, and today the variety of cyberthreats awaiting an opportunity to strike individuals and organizations is enormous. Big data means valuable information stored in one place. Such storage is an attractive target for the attacks of both lone cybercriminals and organized corporate espionage groups, meaning that Big data security issues are unavoidable.

    Moreover, organizations can get overwhelmed with storing, analyzing, understanding and using those data sets. As a result, they choose to postpone solving Big data security problems. For instance, organizations that run VMware-based virtual environments but don’t integrate a VMware backup solution leave their valuable data assets vulnerable to ransomware and other malicious attacks.

    Big data Integration

    Organizations find data in different sources, from targeted reports and quizzes to social media pages and customer emails. Integrating the data of such various types and sources into one system that can assist leaders in decision-making and avoiding Big data analytics issues is another challenge. Even the most thorough analysis can fail when an analyst misses important data that specialists couldn’t integrate properly.

    Special case: Big data problems in healthcare

    As Big data is a part of everyday life, the problems already mentioned can be relevant to any organization. Still, particular industries can have special requirements for data and, consequently, highlight specific problems. Viewing Big data problems from a healthcare perspective can be a spectacular example in this case.

    Big data failure

    Back in the late 2000s, Google Flu Trends was considered an efficient Big data project for its proven ability to predict influenza-related doctor visits accurately, even more so than the CDC (Centers for Disease Control and Prevention), which used traditional approaches and statistics. Still, in 2013, Google Flu Trends predicted a demand for influenza visits twice as high as the actual numbers. Google then stopped the project in August 2015.

    This failure can be explained by one of the data sources that Google Flu Trends used: Google searches related to influenza, including such general requests as, for example, “cough” or “fever”. Google Flu Trends could only measure the number of searches but not the purpose behind the search requests.

    Summing up the above, counting on Big Data too much can be among the most urgent problems with Big data in healthcare. Failures similar to that of Google Flu Trends can and will cause negative effects on the quality of healthcare initiatives.

    Understanding context

    Big data is about numbers, images and texts gathered from multiple sources. However, a computer can highlight a minor aspect of the patient’s problem while being unable to have a broader look at the entire case. Understanding the context is important while analyzing data related to healthcare in particular, and while solving Big data research problems in general.

    Big data privacy issues

    Again, personal data protection is a common challenge. However, when speaking of the healthcare industry, Big data and privacy issues go hand-in-hand. The type of personal information that healthcare organizations collect and store causes extreme risks for individuals’ privacy.

    Healthcare Big data breaches can, for instance, make someone’s disease conditions or genetic info public and lead to violations of other personal rights. If not published, the covert misuse of such data can still threaten a person’s comfort, health or even life.

    Medication mistakes threatening patients

    The goal of technology in analyzing disease cases and prescribing medicines is to increase the effectiveness of medical care. However, in medical treatment, one mistake can have life-threatening consequences. Checking and confirming the medical data analysis results and the relevance of recommendations can be challenging due to the overall data volumes that a computer might process.

    Healthcare data quality and intercompatibility

    Although the overall data quality can be questioned within the industry, contemporary medical services are impossible without Big data. Therefore, data quality issues in Big data require special attention in healthcare.

    The additional point that is usually neglected is the critical need to capture and monitor every interaction of the patient with the healthcare system, ensuring data intercompatibility between different institutions and doctors.

    Research and evidence reliability

    The observational nature of Big data techniques is another problem: it does not allow drawing clear cause-and-effect conclusions, since confounding variables cannot be ruled out. Big data includes terabytes of diverse data pieces and is challenging to control even when originating from the same healthcare facility or research institution. Consequently, analysts and decision-makers must be very careful when generalizing Big data outcomes.

    Big data ethical issues

    Just like with the World Wide Web in the past, or with neural networks today, organizations are also facing ethical issues with Big data besides practical challenges. Misuse of data sets in an ethical sense can lead an organization to probably the most unwanted result: reputational loss. Highlighting the crucial ethical issues beforehand can help organizations figure out how to solve Big data problems later.

    Data ethics application

    Executives don’t always have ethical questions of data usage on top of their minds. That disregard is not intentional. The organization’s leader normally prefers paying more attention to “visible” and urgent things (tools, technologies, data management KPIs, etc.) rather than to the ethical issues of incorrect data usage.
    Nevertheless, any case of data use should be a consideration point for an organization.

    Big data features make the need to think over the usage results even more relevant because the price of a mistake can be enormously high. One should pay special attention to the data utilization while “feeding” Big data sets to machine learning and AI applications. The consequences of careless attitude to data sets are unpredictable in that case.

    Data ethics responsibility

    The organization’s leaders may think that hiring sector experts and delegating data management responsibilities to them is enough to fulfill Big-data–related ethical obligations.

    However, data ethics should be every employee’s concern, not only that of data scientists and compliance officers. Front-line workers should be aware of the need to think over ethical points of data, and of the possibility to raise and discuss issues they note. Executives, in turn, must ensure that their data use strategies and commercial goals meet customer expectations and legal demands.

    It’s important to consider the potential Big data impact on society, as the use of data analytics can have far-reaching implications on issues such as privacy, security, and social inequality, requiring organizations to balance their business interests with ethical considerations and social responsibility.

    Aim for quick profits

    When using Big data to solve social problems and offer certain improvements for clients, organizations still count on profits. Economic instability, along with the aggressive expansion of neural networks and other innovations throughout different industries, make organizations cut expenses and optimize investments.

    In such conditions, executives and hired employees may face a temptation to violate ethical rules (for example, by sharing useful personal info inappropriately) in exchange for quicker and higher ROI.

    Data matters, sources don’t

    Ethical issues with Big data may also arise if the leaders prefer to notice the value and reliability of particular data sets without analyzing the entire data supply channel. What is the source of data? Is there a guarantee that the data subjects’ consent for third-party usage can be verified? Can an organization use the entire data set legally and without unpredictable consequences? Answers to such questions before collecting and utilizing Big data are required to preserve the organization’s functioning and reputation among clients, partners and industries.

    Solutions to Big data problems

    After a particular Big data problem statement, the time comes to solve issues. The solutions may require additional investments, effort and time. However, the result is more efficient, secure and ethical.

    Hire and empower experts

    To solve the mentioned issues with Big data, an organization can, in the first place, increase investments in hiring qualified data scientists, analysts and managers. Another efficient step would be the additional funding of training and education for the existing employees.

    With qualified professionals, an organization can then get more from advanced machine learning or AI-driven solutions for data analytics, including Big data analytics in the supply chain, to gain insights and improve operational efficiency. Getting up-to-date solutions for employees who are less qualified in data science can be a working price/quality alternative here and now, additionally enabling organizations to boost staff members’ knowledge of Big data with time.

    The data science impact on business can be significant, as organizations that invest in data science talent and technology can gain a competitive edge by leveraging advanced analytics to drive insights and inform strategic decision-making.

    Invest more in data security

    Cybersecurity staff qualification is crucial to protect an organization’s IT infrastructure. Leaders can also consider the following practices to improve the security of their Big data sets:

    • Data encryption (a minimal sketch follows after this list)
    • Data partitioning
    • Data back-up and disaster recovery
    • Identity and access control
    • Endpoint security
    • Real-time monitoring of IT infrastructures
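    As an example of the first item on this list, here is a minimal sketch of encrypting a record at rest with AES-GCM via the Python cryptography package; the payload is illustrative and key management (storing the key in a KMS, rotating it, etc.) is deliberately left out.

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=256)  # in practice, keep this in a key management system
    aesgcm = AESGCM(key)

    record = b'{"patient_id": 4711, "diagnosis": "..."}'  # illustrative payload
    nonce = os.urandom(12)                                # must be unique per encryption
    ciphertext = aesgcm.encrypt(nonce, record, None)

    # Decrypting requires the same key and nonce; tampering raises an exception.
    assert aesgcm.decrypt(nonce, ciphertext, None) == record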

    On average, data breaches cost organizations $4.35 million in 2022. Investing in data backup, disaster recovery and security is a more affordable alternative.

    Use specialized data integration tools

    Another thing that is important for solving a Big Data problem is finding the right tools. Hiring an experienced data professional to create and run the environment according to the organization’s needs is the first way.

    Alternatively, an organization might want to ask for professional consulting. It can be beneficial for an organization to seek guidance from a data architecture consultant who can provide expert advice on data management, processing, and storage, ensuring that the organization is equipped with the best tools and practices for their specific needs.

    A company can choose among the suggested data tools and either integrate them in the existing workflows or reorganize the infrastructure to optimize the use of new tools. In addition to traditional Big data tools, organizations can also consider leveraging IoT Big data solutions to collect and analyze data from connected devices and sensors, providing valuable insights for optimizing processes and improving operational efficiency.

    Apply data storage improvements

    Organizations can handle the enormous (and growing) size of Big data sets by applying contemporary storage improvements and technologies, such as:

    • Data tiering: distributing data across different storage tiers, such as public and private clouds, flash drives and tapes
    • Deduplication: removing duplicate copies of data from Big data sets (see the sketch after this list)
    • Compression: reducing the number of bits that data items occupy in storage
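    To illustrate the deduplication item above, here is a minimal Python sketch that fingerprints data chunks with SHA-256 and stores only the first copy of identical content; the chunks are invented, and a real system would work on file or block contents.

    import hashlib

    def fingerprint(chunk: bytes) -> str:
        return hashlib.sha256(chunk).hexdigest()

    chunks = [b"daily sales export", b"sensor batch 42", b"daily sales export"]

    seen, unique = set(), []
    for chunk in chunks:
        digest = fingerprint(chunk)
        if digest not in seen:      # keep only the first copy of identical content
            seen.add(digest)
            unique.append(chunk)

    print(f"{len(chunks)} chunks in, {len(unique)} stored after deduplication")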

    Boost Big data knowledge

    Organizations should hold Big data knowledge transfers, topical seminars and courses for employees. Additional training sessions and educational opportunities are a must-have for every team member involved in Big data projects. All levels of the organization must ensure that employees have a basic understanding of data concepts to reduce human-related risks.

    Conclusion

    The impact of Big data on business is immense, as organizations benefit from using Big data to solve economic and social problems, improve client experience and predict future trends in their industries. However, executives and employees are bound to face problems when collecting and analyzing enormous data sets.

    The scale, complexity and angle of Big data issues depend on particular industries and vary between organizations. For instance, Big data problems in healthcare mainly impact patient privacy and treatment, which, in some cases, can lead to a life-threatening situation. Review the examples of Big data problems and solutions provided in this article to develop a suitable approach towards data retrieval, storage, usage and security.

    Work on practical, ethical and legal Big data issues with equal thoroughness: this can help you save time, cut costs and maintain your reputation, as well as ensure stable operations and profits for your organization. 

    Author: Alex Tray

    Source: InData Labs

    Date: May 24, 2023

  • Data as an ingredient on the road to digital maturity

    Stéphane Hamel visited the High Tech Campus in Eindhoven on 21 January: a great opportunity for a hefty dose of inspiration from one of the world’s most prominent thinkers in digital analytics. At digital maturity day 2016 (#DMD2016), Hamel explained the Digital Analytics Maturity model.

    Imperfect data

    According to Stéphane Hamel, the difference between a good and an excellent analyst is this: the excellent analyst knows how to arrive at decisions or meaningful advice even with imperfect data. “Data will never be perfect, know how bad the data is is essential. If you know 5 or 10% is bad, there is no problem,” says Hamel.

    Analytics = Context + Data + Creativity

    Analytics sounds like a field for data geeks and nerds. That image is wrong: beyond the data itself, recognising the context in which the data were collected and creativity in interpreting them are essential. To understand data you have to come out from behind your laptop or PC. Only by bringing the world ‘out there’ into your analysis can you, as a data analyst, arrive at meaningful insights and recommendations.

    Hamel gives an example from the lecture hall: when a group of students was shown the 2010 dataset of Save the Children, some thought the tenfold increase in website traffic was due to a campaign or coincidence. The real cause was the earthquake in Haiti.

    Digital Maturity Assessment

    The Digital Maturity Assessment model was developed on the basis of the digital transformation of hundreds of companies worldwide. Based on this experience, Stéphane knows which challenges companies have to overcome on the road to digital leadership.

    [Figure: Digital Analytics Maturity model, Stéphane Hamel]

    You can of course use this model to benchmark your own organisation against other companies. According to Hamel, however, the added value lies in ‘benchmarking yourself against yourself’. In short, it helps to start the conversation internally. If you are switching tooling for the third time, you are the problem yourself, not the technology.

    Hamel prefers a consistent score on the five criteria of this Digital Maturity Assessment model: better a two across the board than outliers up or down. The factor that usually scores weakest is ‘process’.

    This criterion stands for the way of working used to arrive at data collection, analysis and interpretation. Often the process itself is not put together that badly, but data analysts struggle to explain to colleagues or the management team which steps they have taken. Hamel therefore stresses: “you need a digital culture, not a digital strategy”.

    Omhels de jongens van IT

    Geef IT de kans om jou echt te helpen. Niet door te zeggen ‘voer dit uit of fix dat’. Wel door IT te vragen om samen met jullie een probleem op te lossen. Hamel ziet digitale analisten daarom vooral als change-agents, niet als stoffige dataprofessionals. Juist die shift in benadering en rol betekent dat we binnenkort niet meer spreken over digital analytics, maar over ‘analytics’.

    Data is the raw material of my craft

    Hamel’s favoriete motto “data is the raw material of my craft” verwijst naar het vakmanschap en de passie die Stéphane Hamel graag aan het vakgebied digital analytics toevoegt. Stéphane’s honger om het verschil te maken in digital analytics werd ooit tijdens een directievergadering aangewakkerd. Hamel zat in die vergadering erbij als de ‘IT guy’ en werd niet serieus genomen toen hij met data de business problemen en kansen wilde benoemen.

    Dit prikkelde Hamel om, met steun van zijn baas, een MBA te gaan doen. En met resultaat: hij rondde de MBA af behorende tot de top 5 procent van alle studenten. Sindsdien opereert hij op het snijvlak van data en bedrijfsprocessen, ondermeer in het beurswezen en in de verzekeringsbranche.

    Digital is de grote afwezige in het onderwijs

    Hamel’s zeer indrukwekkende loopbaan tonen ondermeer een erkenning als een van ’s werelds weinige Certified Web Analysts, ‘Most Influential Industry Contributor’ door de Digital Analytics Association en mede-beheerder van de grootste community op Google+ over Google Analytics. Toch vindt Hamel zijn allergrootste prestatie het afwerpen van het stempel ‘IT’er’.

    Zijn grootste ambitie voor de nabije toekomst is het schrijven van een tekstboek over digital analytics. Er is veel informatie digitaal beschikbaar, maar er mist nog veel content in offline formaat. Juist omdat ook andere sprekers op #DMD16 wezen naar het achterblijvend niveau van onze HBO- en WO-opleidingen in digitale vaardigheden vroeg ik Hamel welke tips hij heeft voor het Nederlands onderwijs.

    In de basis dient volgens Hamel de component ‘digital’ veel meer als rode draad in het curriculum te worden opgenomen. Studenten dienen daarbij gestimuleerd te worden om de content zelf te verrijken met eigen voorbeelden. Zo komt er in cocreatie tussen docenten, auteurs en studenten steeds betere content tot stand.

    De belofte van big data en marketingautomatisering

    Hamel ziet zeker in B2B de toegevoegde waarde van marketing automation. Je relatie met klant en prospect is immers meer persoonlijk. Marketingautomatisering wordt echter soms foutief ingezet waarbij email wordt ingezet om de indruk te wekken van een persoonlijke, menselijke dialoog. Hamel: “I still believe in genuine, human interaction. There is a limit to how you can leverage marketingautomation.”


    The biggest problem in successfully introducing marketing automation is therefore the maturity of the organization. As long as that maturity is insufficient, a software package will always be mainly a cost item. A cultural shift has to take place so that the organization regards the software as a necessary precondition for executing its strategy.

    Hamel uses the same sober words about the promise of big data. All too often he hears in companies: "We need Big Data!" His answer then is: "No, you don't need big data, you need solutions. As long as it does the job, I'm happy."

    Source: Marketingfacts

  • Data as a universal language

    Data as a universal language

    You don’t have to look very far to recognize the importance of data analytics in our world; from the weather channel using historical weather patterns to predict the summer, to a professional baseball team using on-base plus slugging percentage to determine who is more deserving of playing time, to Disney using films’ historical box office data to nail down the release date of its next Star Wars film.

    Data shapes our daily interactions with everything, from the restaurants we eat at, to the media we watch and the things that we buy. Data defines how businesses engage with their customers, using website visits, store visits, mobile check-ins and more to create a customer profile that allows them to tailor their future interactions with you. Data enhances how we watch sports, such as the world cup where broadcasters share data about players’ top speed and how many miles they run during the match. Data is also captured to remind us how much time we are wasting on our mobile devices, playing online games or mindlessly scrolling through Instagram.

    The demand for data and the ability to analyze it has also created an entirely new course of study at universities around the world, as well as a career path that is currently among the fastest growing and most sought-after skillsets. While data scientists are fairly common and chief data officer is one of the newest executive positions focused on data-related roles and responsibilities, data analytics no longer has to be exclusive to specialty roles or the overburdened IT department.

    And really, what professional can’t benefit from actionable intelligence?

    Businesses with operations across the country or around the world benefit from the ability to access and analyze a common language that drives better decision making. An increasing number of these businesses recognize that they are creating volumes of data that have value and, perhaps even more important, that they need a centralized collection system for that information so they can use the data to be more efficient and improve their chances of success.

    Sales teams, regardless of their location, can use centrally aggregated customer data to track purchasing behavior, develop pricing strategies to increase loyalty, and identify what products are purchased most frequently in order to offer complementary solutions to displace competitors.

    Marketing teams can use the same sales data to develop focused campaigns that are based on real experiences with their customers, while monitoring their effectiveness in order to make needed adjustments or improve future engagement.

    Inventory and purchasing can use the sales data to improve purchasing decisions, ensure inventory is at appropriate levels and better manage slow moving and dead stock to reduce the financial impact on the bottom line.

    Branch managers can use the same data to focus on their own piece of the business, growing loyalty among their core customers and tracking their salespeople's performance.

    Accounts receivables can use the data to focus their efforts on the customers that need the most attention in terms of collecting outstanding invoices. And integrating the financial data with operational data paints a more complete picture of performance for financial teams and executives responsible for reporting and keeping track of the bottom line.

    Data ties all of the disciplines and departments together regardless of their locations. While some may care more about product SKUs than P&L statements or on-time-in-full deliveries, they can all benefit from a single source of truth that turns raw data into visual, easy-to-read charts, graphs and tables.

    The pace, competition and globalization of business make it critical for your company to use data to your advantage, which means moving away from gut feel or legacy habits to basing key decisions on the facts found in your ERP, CRM, HR, marketing and accounting systems. With the right translator, or data analytics software, the ability to use your data based on roles and responsibilities to improve sales and marketing strategies, customer relationships, stock and inventory management, financial planning and your corporate performance can be available to all within your organization, making data a true universal language.

    Source: Phocas Software

  • Data Lakes and the Need for Data Version Control  

    Data Lakes and the Need for Data Version Control

    In the ever-evolving world of big data, managing vast amounts of information efficiently has become a critical challenge for businesses across the globe. As data lakes gain prominence as a preferred solution for storing and processing enormous datasets, the need for effective data version control mechanisms becomes increasingly evident. 

    In this article, we will delve into the concept of data lakes, explore their differences from data warehouses and relational databases, and discuss the significance of data version control in the context of large-scale data management.

    Understanding Data Lakes

    A data lake is a centralized repository that stores structured, semi-structured, and unstructured data in its raw format. Unlike traditional data warehouses or relational databases, data lakes accept data from a variety of sources, without the need for prior data transformation or schema definition. As a result, data lakes can accommodate vast volumes of data from different sources, providing a cost-effective and scalable solution for handling big data.

    Before we address the questions, ‘What is data version control?’ and ‘Why is it important for data lakes?’, we will discuss the key characteristics of data lakes.

    Schema-on-Read vs. Schema-on-Write

    Data lakes follow the ‘Schema-on-Read’ approach, which means data is stored in its raw form, and schemas are applied at the time of data consumption. In contrast, data warehouses and relational databases adhere to the ‘Schema-on-Write’ model, where data must be structured and conform to predefined schemas before being loaded into the database.
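
    As a rough illustration of the contrast, here is a minimal Python sketch using only the standard library; the table, fields, and records are invented for the example. The relational table requires a fixed set of columns at insert time (schema-on-write), while the 'lake' keeps raw records of varying shape and only imposes structure when they are read (schema-on-read).

    import json
    import sqlite3

    # Schema-on-write (warehouse/relational style): the table schema is fixed
    # before any data is loaded, and records must conform to it at insert time.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id TEXT, action TEXT, amount REAL)")
    conn.execute("INSERT INTO events VALUES (?, ?, ?)", ("u1", "purchase", 19.99))

    # Schema-on-read (data lake style): raw records are stored as they arrive,
    # in whatever shape they have; structure is only imposed at consumption time.
    raw_lake = [
        '{"user_id": "u1", "action": "purchase", "amount": 19.99}',
        '{"user_id": "u2", "action": "click", "page": "/home"}',  # different shape, still accepted
    ]

    # Apply a "schema" at read time: parse and project only the fields needed.
    parsed = [json.loads(line) for line in raw_lake]
    purchases = [r for r in parsed if r.get("action") == "purchase"]
    print(purchases)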

    Flexibility and Agility

    Data lakes provide flexibility, enabling organizations to store diverse data types without worrying about immediate data modeling. This allows data scientists, analysts, and other stakeholders to perform exploratory analyses and derive insights without prior knowledge of the data structure.

    Cost-Efficiency

    By leveraging cost-effective storage solutions like the Hadoop Distributed File System (HDFS) or cloud-based storage, data lakes can handle large-scale data without incurring prohibitive costs. This is particularly advantageous when dealing with exponentially growing data volumes.

    Data Lakes vs. Data Warehouses and Relational Databases

    It is essential to distinguish data lakes from data warehouses and relational databases, as each serves different purposes and has distinct characteristics.

    Data Warehouses

    Some key characteristics of data warehouses are as follows:

    • Data Type: Data warehouses primarily store structured data that has undergone ETL (Extract, Transform, Load) processing to conform to a specific schema.
    • Schema Enforcement: Data warehouses use a “schema-on-write” approach. Data must be transformed and structured before loading, ensuring data consistency and quality.
    • Processing: Data warehouses employ massively parallel processing (MPP) for quick query performance. They are optimized for complex analytical queries and reporting.
    • Storage Optimization: Data warehouses use columnar storage formats and indexing to enhance query performance and data compression.
    • Use Cases: Data warehouses are tailored for business analysts, decision-makers, and executives who require fast, reliable access to structured data for reporting, business intelligence, and strategic decision-making.

    In summary, data lakes prioritize data variety and exploration, making them suitable for scenarios where the data landscape is evolving rapidly and the initial data structure might not be well-defined. Data lakes are more suitable for storing diverse and raw data for exploratory analysis, while data warehouses focus on structured data, ensuring data quality and enabling efficient querying for business-critical operations like business intelligence and reporting.

    Relational Databases

    Some key characteristics of relational databases are as follows:

    • Data Structure: Relational databases store structured data in rows and columns, where data types and relationships are defined by a schema before data is inserted.
    • Schema Enforcement: Relational databases use a “schema-on-write” approach, where data must adhere to a predefined schema before it can be inserted. This ensures data consistency and integrity.
    • Processing: Relational databases are optimized for transactional processing and structured queries using SQL. They excel at managing structured data and supporting ACID (Atomicity, Consistency, Isolation, Durability) transactions.
    • Scalability: Relational databases can scale vertically by upgrading hardware, but horizontal scaling can be more challenging due to the need to maintain data integrity and relationships.
    • Use Cases: Relational databases are commonly used for applications requiring structured data management, such as customer relationship management (CRM), enterprise resource planning (ERP), and online transaction processing (OLTP) systems.

    Data lakes are designed for storing and processing diverse and raw data, making them suitable for exploratory analysis and big data processing. Relational databases are optimized for structured data with well-defined schemas, making them suitable for transactional applications and structured querying.

    The Importance of Data Version Control in Data Lakes

    As data lakes become the backbone of modern data infrastructures, the management of data changes and version control becomes a critical challenge. Data version control refers to the ability to track, manage, and audit changes made to datasets over time. This is particularly vital in data lakes for the following reasons.

    Data Volume and Diversity

    Data lakes often contain vast and diverse datasets from various sources, with updates and additions occurring continuously. Managing these changes efficiently is crucial for maintaining data consistency and accuracy.

    Collaborative Data Exploration

    In data lakes, multiple teams and stakeholders collaboratively explore data to derive insights. Without proper version control, different users may inadvertently overwrite or modify data, leading to potential data integrity issues and confusion.

    Auditing and Compliance

    In regulated industries or environments with strict data governance requirements, data version control is essential for tracking changes, understanding data lineage, and ensuring compliance with regulations.

    Handling Changes at Scale with Data Version Control

    To effectively handle changes at scale in data lakes, robust data version control mechanisms must be implemented. Here are some essential strategies:

    • Time-Stamped Snapshots: Maintaining time-stamped snapshots of the data allows for a historical view of changes made over time. These snapshots can be used to roll back to a previous state or track data lineage (a minimal sketch of this strategy follows the list).
    • Metadata Management: Tracking metadata, such as data schema, data sources, and data transformation processes, aids in understanding the evolution of datasets and the context of changes.
    • Access Controls and Permissions: Implementing fine-grained access controls and permissions ensures that only authorized users can make changes to specific datasets, reducing the risk of unauthorized modifications.
    • Change Tracking and Notifications: Setting up change tracking mechanisms and notifications alerts stakeholders about data modifications, ensuring transparency and awareness.
    • Automated Testing and Validation: Automated testing and validation procedures help detect and rectify any anomalies or inconsistencies resulting from data changes.
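
    As a rough illustration of the first two strategies above (time-stamped snapshots and metadata tracking), here is a minimal, in-memory Python sketch. The class and field names are invented for the example; this is not the API of any particular data versioning tool.

    import copy
    import datetime

    # Illustrative in-memory version store: each "commit" records a timestamped,
    # immutable snapshot of a dataset plus descriptive metadata about the change.
    class DatasetVersions:
        def __init__(self):
            self.snapshots = []  # list of (timestamp, metadata, data) tuples

        def commit(self, data, source, author):
            snapshot = (
                datetime.datetime.now(datetime.timezone.utc).isoformat(),
                {"source": source, "author": author, "rows": len(data)},
                copy.deepcopy(data),  # freeze the state so later edits don't leak in
            )
            self.snapshots.append(snapshot)
            return len(self.snapshots) - 1  # version number

        def rollback(self, version):
            # Return the data exactly as it looked at the given version.
            return copy.deepcopy(self.snapshots[version][2])

        def lineage(self):
            # Audit trail: when each version was created, by whom, and from where.
            return [(ts, meta) for ts, meta, _ in self.snapshots]

    versions = DatasetVersions()
    v0 = versions.commit([{"id": 1, "value": 10}], source="sensor_feed", author="etl_job")
    versions.commit([{"id": 1, "value": 12}], source="sensor_feed", author="etl_job")
    print(versions.lineage())
    print(versions.rollback(v0))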

    Conclusion

    Data lakes have revolutionized the way organizations manage and analyze large-scale data. Their ability to store diverse data types without predefined schemas makes them highly flexible and cost-efficient. However, managing changes in data lakes requires careful attention to ensure data consistency, accuracy, and compliance. 

    Data version control plays a crucial role in addressing these challenges, enabling organizations to handle changes at scale and derive valuable insights from their data lakes with confidence and reliability. By implementing robust version control mechanisms and following best practices, businesses can leverage data lakes to their full potential, driving innovation and informed decision-making.

    Date: September 21, 2023

    Author: Kruti Chapaneri

    Source: ODSC

  • Data lakes, don't confuse them with data warehouses, warns Gartner

    In mid-2014, a pair of Gartner analysts levied some trenchant criticisms at the increasingly hyped concept of data lakes.

    "The fundamental issue with the data lake is that it makes certain assumptions about the users of information," said Gartner research director Nick Heudecker.

    "It assumes that users recognize or understand the contextual bias of how data is captured, that they know how to merge and reconcile different data sources without 'a priori knowledge' and that they understand the incomplete nature of datasets, regardless of structure."

    A year and a half later, Gartner's concerns do not appear to have eased. While there are successful projects, there are also failures -- and the key success factor appears to be a strong understanding of the different roles of a data lake and a data warehouse.

    Heudecker said a data lake, often marketed as a means of tackling big data challenges, is a great place to figure out new questions to ask of your data, "provided you have the skills".

    "If that's what you want to do, I'm less concerned about a data lake implementation. However, a higher risk scenario is if your intent is to reimplement your data warehousing service level agreements (SLAs) on the data lake."

    Heudecker said a data lake is typically optimised for different use cases, levels of concurrency and multi-tenancy.

    "In other words, don't use a data lake for data warehousing in anger."

    It's perfectly reasonable to need both, he said, because each is optimised for different SLAs, users and skills.

    Data lakes are, broadly, enterprise-wide platforms for analysing disparate data sources in native format to eliminate the cost and data transformation complexity of data ingestion. And herein lies the challenge: data lakes lack semantic consistency and governed metadata, putting a great deal of the analytical onus on skilled users.

    Heudecker said there is some developing maturity in understanding, but the data lake hype is still rampant.

    The maturity of the technology is harder to get a handle on because the technology options to implement data lakes continue to change rapidly.

    "For example, Spark is a popular data processing framework and it averages a new release every 43 days," Heudecker said.

    The success factors for data lake projects, he said, come down to metadata management, the availability of skills and enforcing the right levels of governance.

    "I've spoken with companies that built a data lake, put a bunch of data into it and simply couldn't find anything. Others have no idea which datasets are inaccurate and which are high quality. Like everything else in IT, there is no silver bullet."

    Data lakes are an architectural concept, not a specific implementation, he said.

    "Like any new concept, or technology for that matter, there will be accompanying hype followed by a period of disillusionment before becoming an understood practice.

    "Data lakes will continue to be a reflection of the data scientists that use them.

    "The technology may change and improve, perhaps taking advantage of things like GPUs or FPGAs, but the overall intent will be to uncover new uses and opportunities in data. Taking those insights to production will likely occur elsewhere."

  • Data redundancy: avoid the negative and use the positive to your advantage

    Data redundancy: avoid the negative and use the positive to your advantage

    Data redundancy means keeping data in two or more locations within a database or storage infrastructure. Data redundancy can occur either intentionally or accidentally within an organization. If redundancy is provided deliberately, the organization can continue operations or services in case of data corruption or loss. Accidental redundancy, on the other hand, wastes database space with duplicate data and causes information inconsistencies throughout the organization.

    Types of data redundancy

    There are two types of data redundancy. Positive data redundancy is provided intentionally within the organization. It ensures that the same data, kept and protected in different places, can be used for redundancy and business sustainability in case of a disaster.

    Wasteful data redundancy, which occurs with unintentional data duplication and is an indicator of failed database management, may cause information inconsistencies throughout an organization. When data is stored in numerous places, it takes up valuable storage space and makes it difficult for an organization to figure out which data should be accessed or updated.

    What is the difference between data redundancy, data duplicity, and backup?

    The main difference between redundancy and duplicity, which is often confused, lies in the reason for adding a new copy of the data. From a database point of view, data duplicity refers to data added back to the system by users. In contrast, redundancy requires synchronization between databases to ensure positive redundancy without any problems. While data duplicity inevitably causes inconsistency in databases, database synchronizations and data normalization prevent this issue in data redundancy.

    The distinction between data backup and redundancy may be subtle, but it is crucial. Backing up data creates compressed and encrypted versions of data stored locally or in the cloud. In contrast, data redundancy adds an extra layer of protection to the backup. Local backups are necessary for business continuity; however, it’s also essential to have another protective layer for data. You can reduce the risks by including data redundancy in your disaster recovery plan.

    What is the relationship between data redundancy and data inconsistency?

    Simply put, data redundancy leads to data inconsistency. Data inconsistency occurs when the same data exists in different formats in multiple tables, meaning that different files contain different information about a particular object, situation, event, or person. This inconsistency can result in unreliable and meaningless information.

    Benefits of positive data redundancy

    Data must be stored in two or more locations to be considered redundant. If the initial data is damaged or the hard drive on which it is stored fails, the backup data can help save the organization money.

    The redundant data may be either a complete copy of the original information or particular elements of it. Keeping only certain pieces of data allows organizations to reassemble lost or destroyed data without pushing their resource limitations. Backups and RAID systems are used to protect data in case of failure. In a RAID array, for example, data is spread across multiple hard drives so that if one drive fails, the array keeps running with minimal downtime.

    There are distinct advantages to data redundancy, which depend on its implementation. The following are some of the potential benefits:

      • Data redundancy helps to guarantee data security. Organizations can use redundant data to replace or recompile missing information when data is unavailable. 
      • Multiple data servers enable data management systems to examine any variances, assuring data consistency. 
      • Data may be easier to access in some areas than others for an organization that covers several physical locations. Accessing information from various sources might allow individuals in a company to access the same data more quickly.
      • Data redundancy is a must in business continuity management. Backup technology ensures data security, while disaster recovery services minimize downtime by prioritizing mission-critical information. Data redundancy serves as an add-on to both of these processes for increased recoverability.

    How to avoid wasteful data redundancy?

    As wasteful data redundancy grows, it takes up significant server storage space over time. The fewer storage slots there are, the longer it will take to retrieve data, eventually harming business results. On the other hand, inconsistent data is likely to corrupt reports or analytics, which can cost organizations dearly.

    Data redundancy is popular among organizations as a data security or backup method. It appears to be an excellent solution when you have all the resources needed to store and manage your data. But if you don't have enough resources, positive redundancy can turn wasteful quickly. Here are some valuable tips to avoid wasteful redundancy:

      • Master Data provides more consistency and accuracy in data. It’s the sum of all your vital business information stored in various systems throughout your company. The use of master data does not eliminate data redundancy; instead, it helps organizations work around a certain degree of redundancy. The main advantage of master data is that it allows companies to work on a single changed data element instead of the overall data.
      • Another source of data redundancy is keeping information that isn’t relevant any longer. Suppose you migrate your data to a new database but forget to delete it from the old one. In that case, you’ll have the same information in two locations, wasting space. Make sure databases that aren’t required anymore are deleted.
      • Data normalization is a technique that involves organizing data in a database to minimize duplication. This approach ensures that the data from all records are comparable and may be interpreted similarly. Standardizing data fields, including customer names, contact information, and addresses, is easy with data normalization and allows you to quickly delete, update, or add any information (a minimal sketch follows this list).
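
    As a small illustration of the normalization tip above, the following Python sketch (with invented field names and records) standardizes customer fields so that redundant copies of the same customer can be detected and merged.

    # Illustrative normalization before de-duplication: standardize the fields
    # that commonly drift (case, whitespace, phone formatting) so that redundant
    # copies of the same customer can be detected and merged.
    def normalize(record):
        return {
            "name": " ".join(record["name"].split()).title(),
            "email": record["email"].strip().lower(),
            "phone": "".join(ch for ch in record["phone"] if ch.isdigit()),
        }

    customers = [
        {"name": "jan  de vries", "email": "Jan@Example.com ", "phone": "+31 6 1234 5678"},
        {"name": "Jan de Vries", "email": "jan@example.com", "phone": "06 1234 5678"},
    ]

    deduplicated = {}
    for record in customers:
        clean = normalize(record)
        deduplicated[clean["email"]] = clean  # the email acts as the matching key here

    print(list(deduplicated.values()))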

    Author: Hasan Selman

    Source: Dataconomy

  • Data science and the growth towards maturity

    In 2015, together with the Radboud Management Academy and Business & Decision, I started the Business Data Scientist program. In this blog I discuss one of the program's key objectives: helping companies develop and follow a growth path in big data use that fits them.

    Data science is a bit like the neighbor showing you his new car with the very latest gadget. Almost immediately you get the feeling that you need it too. Many companies experience the same feeling when it comes to data science.

    Thanks to numerous technical and social developments (connected economy, mobility, internet of things, willingness to share data), data is available in abundance. Companies also recognize that data is more than a by-product of operational processes. They are therefore looking, more or less driven by the neighbor's successes, for ways to improve their own operations, using data as a primary resource or asset.

    Many companies, however, ask themselves: (where) should I start? What should I aim for? Who do I need to organize this? Is it enough to hire a bunch of hard-core data scientists? Naturally there is no one-size-fits-all answer to these questions. They can only be answered when the company has a clear view of a data science strategy, and an accompanying growth path, that suit it. Where these are missing, failure looms. And frustration! Not least among the newly hired data scientists who were supposed to be the solution to almost everything. After all, it is hard to meet undefined and unlimited expectations.

    Companies can make different choices in their data science strategy, depending on their own position and business strategy. The starting points for aligning the data science strategy with the business strategy can differ. In one group of data science strategies ('execution strategy', 'transformation strategy' and 'service strategy'), the business strategy itself is not up for debate and the goal of data science is to support and optimize operations. In the other kind of data science strategy, the goal is precisely to change the company strategy: data becomes an enabler of fundamental change to the business.

    No one data science strategy is better than another. Moreover, hybrid forms are possible, and one strategy can follow another at a later stage. What matters more is that organizations make an explicit choice and draw up a fitting roadmap towards maturity. The development of the data science department is then aligned with that choice, because one data science competence is not the same as another: an execution strategy, for example, requires different people and technologies than an enabler strategy.

    In this way, organizations consciously choose their own data science strategy and growth path towards maturity. Within that framework, technological competences and tools can be assessed for their necessity and usefulness, and the feeling of 'having to have the neighbor's newest gadget too' gives way to a deliberate trade-off based on need, usefulness and stage of development.

    Looking for more information? If you are interested, you can request a brochure here.

    Egbert Philips, https://www.linkedin.com/in/egbert-philips-9548bb2?trk=nav_responsive_tab_profile

  • Data for apps, apps for data

    Data and apps have nothing to do with each other. And everything. But, of course, exactly the other way around from what is commonly thought.

    Confused men. Nowhere do you encounter them as often as in IT. At least, that is what I think when I hear intelligent people proclaim views that would not have existed with less hype sensitivity and more common sense. I had such an experience four years ago. I was asked to advise on the reorganization of an organization with a lot of IT. With a lot of IT came many data islands, a lot of data logistics, and the familiar problems of high costs and mediocre data quality. Several of those involved appeared to believe that these problems would largely disappear by turning legacy applications into apps. Even allowing for the fact that apps had only just begun their rise, that was a remarkable view.

    It is not hard to de-hype confused professionals. What do apps do? Does bad data become good? Does corrupt data become sound? Does inconsistent data become coherent? Does outdated data become current? No, no, no and no. What does happen is that unlocking data for the user becomes easier than with a browser, and that you can install and maintain software and local data on a mobile client. Of course, an app can open up half the data world, wherever you are, but the credit for that goes to open data(bases), Java, JDBC, 4G, Android/iOS, et cetera; not to the phenomenon of the 'app' as such.

    Meanwhile, the combination of all these things has indeed led to a totally different relationship between people and data: structured as well as unstructured data, classic text-and-number data as well as graphic, audio and video data. But the app is merely the visible gateway to all that richness, the entrance to that world, as low-tech and high-concept as the hyperlink.

    The rise of the app has meanwhile opened up a world of data. Not so much because of what the app itself can do, but because of the staggering range of apps, each with its own data sources. In practice, the app leads to fragmentation of functionality rather than integration of data sources, although that may have more to do with the more fleeting use of mobile devices than with the nature of apps. But however long you talk about it, apps as such have no effect on how an organization's data management should be set up or how business data should be managed.

    Up to this point I have looked at the app as a means of unlocking data, the conventional view. There is, however, another view, found mainly among privacy advocates and database marketers. For them, the app is a data collection device, a super-powerful means of gaining access to a world of behavioral data. Location, contacts, photos, potentially everything a mobile device can collect; the app as the primary provider of big data.

    As soon as we reverse the relationship between apps and data, we do see an impact on data management and an enormous challenge for data specialists. What to do with all that raw sensor data that is not collected and filtered by data entry staff supported by electronic forms with handy classification aids such as drop-down list boxes, radio buttons and input checks? Apps extend an organization's data landscape with mountains of behavioral data of a totally different nature than classic administrative data.

    For app users, especially consumers using free apps, the biggest challenge is preserving an acceptable minimum of privacy. For organizations, the challenge lies in integrating classic administrative data with the new behavioral data. Seen this way, apps do indeed have an enormous effect on the data landscape of organizations, but an additive one. The data that apps add mostly form new information islands, complementary to and relatively incompatible with the archipelago of data belonging to (legacy) systems and data warehouses. The challenge of integrating administrative data and app-collected behavioral data into one coherent data landscape is even greater than integrating islands of administrative data, precisely because the nature of these data is so different and complementary. Unfortunately, the history of data integration does not bode well. We are probably at the start of a large number of complete and partial failures, and in any case of large IT investments. Once again we live in interesting times.

    Is that a sad conclusion? Yes and no. Yes, because extra complexity and costs are ultimately in nobody's interest. And no, because IT and the way organizations handle it will more than ever become a vital competitive weapon. Sure, we thought that in the eighties too and it turned out differently, but this time it is different for many organizations, though certainly not all.

    Ask not what your data can do for your apps, but what your apps can do for your data. That is the question.

    Source: Computable

  • Data Warehousing Lessons for A Data Lake World

    Over the past two decades, we have spent considerable time and effort trying to perfect the world of data warehousing. We took the technology that we were given and the data that would fit into that technology, and tried to provide our business constituents with the reports and dashboards necessary to run their businesses.

    It was a lot of hard work, and we had to do many "unnatural" acts to get these OLTP (Online Transaction Processing)-centric technologies to work: aggregated tables, a plethora of indexes, user-defined functions (UDF) in PL/SQL, and materialized views, just to name a few. Kudos to us!!

    Now as we get ready for the full onslaught of the data lake, what lessons can we take away from our data warehousing experiences? I don’t have all the insights, but I offer this blog in hopes that others will comment and contribute. In the end, we want to learn from our data warehousing mistakes, but we don’t want to throw out those valuable learnings.

    Special thanks to Joe DosSantos (@JoeDosSantos) for his help on this blog.

    Why Did Data Warehousing Fail?

    Below is the list of areas where data warehousing struggled or outright failed. Again, this list is not comprehensive, and I encourage your contributions.

    • Adding New Data Takes Too Long. It took too long to load new data into the data warehouse. The general rule to add new data to a data warehouse was 3 months and $1 million. Because of the need to pre-build a schema before loading data into the data warehouse, the addition of new data sources to the data warehouse was a major effort. We had to conduct weeks of interviews with every potential user to capture every question they might ever want to ask in order to build a schema that handled all of their query and reporting requirements. This greatly hindered our ability to quickly explore new data sources, so organizations resorted to other options, which leads to…
    • Data Silos. Because it took so long to add new data sources to the data warehouse, organizations found it more expedient to build their own data marts, spreadmarts[1] or Access databases. Very quickly there was a wide-spread proliferation of these purpose built data stores across the organization. The result: no single version of the truth and lots of executive meetings wasting time debating whose version of the data was most accurate, which leads to…
    • Lack of Business Confidence. Because there was this proliferation of data across the organization and the resulting executive debates around whose data was most accurate, business leaders’ confidence in the data (and the data warehouse) quickly faded. This became especially true when the data being used to run a business unit was redefined for corporate use in such a way that it was not useful to the business. Take, for instance, a sales manager looking to assign a quota to his rep that manages the GE account and wants a report of historical sales. For him, sales might be Gross and GE might include Synchrony, whereas the corporate division might look at sales as Net or Adjusted and GE as its legal entities. It’s not so much a question of right and wrong as much as it is the enterprise introducing definitions that undermines confidence, which leads to…
    • Underinvestment In Metadata. No business leader had the time to verify the accuracy of the data, and no IT person knew the business well enough to make those data accuracy decisions. Plus, spending the money to hire consultants to do our job for us was always a hard internal sell, which leads to the metadata management denial cycle:
      • IT: “You business users need to own the data.”
      • Business: “We don’t have time to do that.”
      • IT: “Okay, let’s hire consultants.”
      • Business: “Shouldn’t we know our data better than consultants?”
      • IT: “Okay, you business users need to own the data”
      • And so forth…
    • Inability to Easily Share Data. The data warehouse lacked the ability to quickly ingest and consequently easily share data across different business functions and use cases. The data warehouse failed to become that single repository for the storage of the organization’s data assets because of the complexity, difficulty and slowness to add new data to the data warehouse, which leads to…
    • Shadow IT Spend. Nothing confirms the failure of the data warehouse more than shadow IT spend. Business users did not have confidence in how the data warehouse could help them address urgent business needs. Consequently, many line of business leaders pursued their own one-off IT initiatives (call center operations, sales force automation, campaign marketing, logistics planning, financial planning, etc.), which also further contributed to the unmanageable proliferation of data across the organizational data silos.
    • Inability to Handle Unstructured Data. Data warehouses cannot handle unstructured data. Unfortunately, the bulk of the world's data is now found in semi-structured data (log files, sensors, beacons, routers, MAC addresses) and unstructured data (text files, social media postings, audio files, photos, video files). Organizations that wanted a holistic view of the business had to make do with only 10 to 20% of the available organizational data. It is hard to provide a holistic view with an 80% to 90% hole in that view.
    • No Predictive Analytic Capabilities. Business Intelligence solutions provide the summarized data necessary to support the organization’s operational and management reporting needs (descriptive analytics). However, most data warehouses lacked the detailed data across a wide variety of structured and unstructured data sources to support the organization’s predictive and prescriptive analytic needs.
    • Too Damned Expensive. Data science is about creating behavioral analytics at the individual levels (e.g., customers, employees, jet engine, train engine, truck, wind turbine, etc.). To uncover these behavioral analytics at the individual level, data scientists need the complete history of detailed transactional, operational and engagement data. The data scientists don’t want 13 months of aggregated data; they want 17 years of detailed transactions, even if that data is now located on mag tape. Trying to gather all of the voluminous data on a data warehouse is a recipe for organizational bankruptcy.
    • Inadequate Processing Power. Let's face it; data warehouses lacked the economical processing power necessary to analyze petabytes of customer and machine data to uncover behavioral patterns and propensities. Data lakes, built on modern big data scale-out environments that run open source software on commodity servers, are game changers, allowing organizations to store and analyze data volumes magnitudes bigger than one could ever economically fit into a data warehouse.

    What Did Data Warehousing Get Right?

    Okay, I was pretty harsh on the data warehouse world in which I grew up. But again, it was amazing what we were able to do with technology designed to deal with single records (insert, update, delete). I have never constructed analytics that uses only a single record. Analytics requires a massive number of records in order to uncover individual behaviors, propensities, tendencies, patterns, etc.

    So what did we get right, and what should we preserve as we move into the modern data lake world?

    • Data Governance. Data governance, into which I also group things like data accuracy, data lineage and data traceability, is as important now as it was in the data warehouse world. Having a process that allows the data science team to quickly ingest and explore the data unencumbered by data governance is a good practice. However you will need data governance rules, policies and procedures once you have determined that there is value in that data to support key decisions. If the business users do not have confidence in the data, then all is lost.
    • Metadata Management. The importance of metadata only becomes clearer as we begin to integrate data and analytics into the organization's key business processes. The more metadata that we have about the data, the easier it is to get value from that data. Investing in the associated metadata carries the same economic value as investing in the data itself, IMHO. We want to enrich the data as much as possible, and a solid metadata management strategy is key for making that happen.
    • Conformed Dimensions. Having a single master file – or conformed dimension – for key business entities (e.g., products, customers, employees, physicians, teachers, stores, jet engines, locomotives, delivery trucks, etc.) is critical. It is these conformed dimensions that allow the data science team to tie together the wide variety of data sources to create the detailed analytic and behavioral profiles. Maintaining these conformed dimensions is hard work, but without them, there is no way to turn all this valuable data (and metadata) into actionable insights.
    • Single Version of The Truth. While I have always hated the term “single version of the truth,” operationally it is important to have all the data about your key business entities in a single (physical or logical) location. Also, in the Big Data world, the notion of data that is fit for purpose becomes critical. There may not be one truth, but there should be clarity as to how numbers are produced to provide transparency and trust.
    • Analytics Self-service. The idea of creating a self-service environment around analytics is very powerful. How do I pull IT out of the middle of the analytics request and provisioning process? If I truly want to create an environment where the analysts can quickly spin up an analytics sandbox and populate with data, I can’t have heavy manual processes in the middle of that process.
    • Reports Starting Point. The many reports and dashboards that have been built upon your data warehouse are a great starting point for your data lake journey. Business users have requested those reports for a reason. Instead of focusing time and effort to create yet more reports, first try to understand what questions and decisions the business users hoped to address with those reports, and what additional predictive and prescriptive insights do they need from those reports.
    • SQL Accessibility. Yeah, SQL is still the query language of choice, and we need to embrace how we help SQL-trained analysts use that tool on the data lake. Open-source tools like Hive, HBase, and HAWQ are all designed to give that army of SQL-trained business users and analysts access to the wealth of data in the data lake (the sketch after this list illustrates the idea with plain SQL on a conformed dimension).
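
    To make this concrete, below is a minimal, illustrative Python sketch that uses the standard library's sqlite3 module as a stand-in for a SQL-on-the-lake engine such as Hive or HAWQ; all table and column names are invented for the example. A conformed customer dimension gives two fact sources one shared customer key, so a plain SQL join ties them together.

    import sqlite3

    # Stand-in for a SQL engine on the lake: a conformed customer dimension
    # gives sales and support facts one shared customer key, so the two
    # sources can be tied together in a single query.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE fact_sales   (customer_key INTEGER, amount REAL);
        CREATE TABLE fact_support (customer_key INTEGER, tickets INTEGER);

        INSERT INTO dim_customer VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO fact_sales   VALUES (1, 1200.0), (2, 300.0);
        INSERT INTO fact_support VALUES (1, 4), (2, 1);
    """)

    query = """
        SELECT d.name, s.amount, t.tickets
        FROM dim_customer d
        JOIN fact_sales   s ON s.customer_key = d.customer_key
        JOIN fact_support t ON t.customer_key = d.customer_key
    """
    for row in conn.execute(query):
        print(row)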

    Summary

    There is much that can be learned from our data warehousing experiences. The key is to understand what to keep and what to throw out. That means a single data lake (not data lakes). That means data governance. That means metadata management, and much more of what we learned from our data warehousing experiences. We must learn from our experiences, otherwise…

    “Those who do not learn history are doomed to repeat it.”

    [1] Spreadmart (short for “spreadsheet data mart”) is a business intelligence term that refers to the propensity of some organizations or departments within organizations to use individual, desktop-based databases like spreadsheets as a primary means of data organization, storage, and dissemination.

  • Data warehousing: ETL, ELT, and the use of big data

    Data warehousing: ETL, ELT, and the use of big data

    If your company keeps up with the trends in data management, you likely have encountered the concepts and definitions of data warehouse and big data. When your data professionals try to implement data extraction for your business, they need a data repository. For this purpose, they can use a data warehouse and a data lake.

    Roughly speaking, a data lake is mainly used to gather and preserve unstructured data, while a data warehouse is intended for structured and semi-structured data.

    Data warehouse modeling concepts

    All data in a data warehouse is well-organized, archived, and arranged in a particular way. Not all data that can be gathered from multiple sources reach a data warehouse. The source of data is crucial since it impacts the quality of data-driven insights and hence, business decisions.

    During the phase of data warehouse development, a lot of time and effort is needed to analyze data sources and select useful ones. Whether a data source has value or not depends on the business processes. Data only gets into the warehouse when its value is confirmed.

    On top of that, the way data is represented in your database has a critical role. Concepts of data modeling in a data warehouse are a powerful expression of business requirements specific to a company. A data model determines how data scientists and software engineers will design, create, and implement a database.

    There are three basic types of modeling. A conceptual data model describes all the entities a business needs information about. It provides facts about real-world things, customers, and other business-related objects and relations.
    The goal of creating this data model is to synthesize and store all the data needed to gain an understanding of the whole business. This model is designed for the business audience.

    A logical data model captures more detailed data. It describes the structure of data elements, their attributes, and the ways these elements interrelate. For instance, this model can be used to identify relationships between customers and the products of interest to them. This model is characterized by a high level of clarity and accuracy.

    A physical data model describes the specific data and relationships needed for a particular case, as well as the way the data model is implemented in the database. It provides a wealth of metadata and facilitates visualizing the structure of a database. The metadata can involve access rules, constraints, indexes, and other features.
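
    To make the three levels concrete, here is a minimal, illustrative sketch using Python's standard sqlite3 module; the entities and columns are invented, and the physical choices (types, keys, index) are just one possible implementation.

    import sqlite3

    # Conceptual level: the business cares about Customers and Orders, and the
    # fact that a customer places orders.
    #
    # Logical level: Customer(customer_id, name); Order(order_id, customer_id,
    # order_date, total); one customer relates to many orders.
    #
    # Physical level: concrete tables, data types, keys, and an index chosen
    # for the target database (SQLite here, purely for illustration).
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
            order_date  TEXT,
            total       REAL
        );
        CREATE INDEX idx_orders_customer ON orders(customer_id);
    """)

    # List the physical objects that now implement the logical model.
    for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type IN ('table', 'index')"):
        print(name)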

    ELT and ETL data warehouse concepts

    Large amounts of data sorted for warehousing and analytics require a special approach. Businesses need to gather and process data to retrieve meaningful insights. Thus, data should be manageable, clean, and suitable for molding and transformation.

    ETL (extract, transform, load) and ELT (extract, load, transform) are the two approaches that have technological differences but serve the same purpose: to manage and analyze data.

    ETL is the paradigm that enables data extraction from multiple sources and pulling data into a single database to serve a business.

    At the first stage of the ETL process, engineers extract data from different databases and gather it in a single place. The collected data undergo transformation to take the form required by the target repository. Then the data are loaded into a data warehouse or target database.

    If you switch the letters 'T' and 'L', you get the ELT process. After retrieval, the data can be loaded straight into the target database. Cloud technology enables large and scalable storage, so massive datasets can be loaded first and then transformed according to the business requirements and needs.

    The ELT paradigm is a newer alternative to the well-established ETL process. It is flexible and allows fast processing of raw data. On the one hand, ELT requires special tools and frameworks, but on the other, it enables unlimited access to business data, saving BI and data analytics experts a great deal of time.

    ETL testing concepts are also essential to ensure that data is loaded into a data warehouse correctly and accurately. This testing involves data verification at transitional phases, so that the quality and usefulness of the data are confirmed before it reaches its destination.
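
    Below is a minimal, illustrative ETL-style sketch in Python (standard library only; the source systems, field names, and target table are invented). It extracts rows from two sources, transforms them into a common shape, loads them into a target table, and ends with a simple verification step in the spirit of ETL testing. In an ELT flow, the same raw rows would be loaded first and transformed afterwards inside the target platform.

    import sqlite3

    # Extract: two "source systems" with slightly different formats.
    crm_rows = [{"customer": "Acme", "revenue_eur": "1200.50"}]
    shop_rows = [{"cust_name": "Globex", "revenue_cents": 30000}]

    # Transform: unify field names, types, and units before loading.
    transformed = (
        [{"customer": r["customer"], "revenue": float(r["revenue_eur"])} for r in crm_rows]
        + [{"customer": r["cust_name"], "revenue": r["revenue_cents"] / 100} for r in shop_rows]
    )

    # Load: write the conformed records into the target table.
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute("CREATE TABLE sales (customer TEXT, revenue REAL)")
    warehouse.executemany("INSERT INTO sales VALUES (:customer, :revenue)", transformed)

    # Verify (ETL testing): row count and totals must match the source data.
    count, total = warehouse.execute("SELECT COUNT(*), SUM(revenue) FROM sales").fetchone()
    assert count == len(crm_rows) + len(shop_rows)
    print(count, total)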

    Types of data warehouse for your company

    Different data warehouse concepts presuppose the use of particular techniques and tools to work with data. Basic data warehouse concepts also differ depending on a company’s size and purposes of using data.

    Enterprise data warehouse enables a unique approach to organizing, visualizing, and representing all the data across a company. Data can be classified by a subject and can be accessed based on this attribute.

    Data mart is a subcategory of a data warehouse designed for specific tasks in business areas such as retail, finance, and so forth. Data comes into a data mart straight from the sources.

    Operational data store satisfies the reporting needs within a company. It is updated in real time, which makes this solution well suited for keeping all business records current.

    Big data and data warehouse ambiguity

    A data warehouse is an architecture that has proved to be valuable for data storing over the years. It involves data that has a defined value and can be used from the start to solve some business needs. Everyone can access this data, and the features of datasets are reliability and accuracy.

    Big data is a hyped field these days. It is the technology that allows retrieving data from heterogeneous sources. The key features of big data are volume, velocity or data streams, and a variety of data formats. Unlike a data warehouse, big data is a repository that can hold unstructured data as well.

    Companies seek to adopt custom big data solutions to unlock useful information that can help improve decision-making. These solutions help drive revenue, increase profitability, and cut customer churn thanks to the comprehensive information collected and available in one place.

    Data warehouse implementation entails advantages in terms of making informed decisions. It provides comprehensive insights into what is going on within a company, while big data can be in the shape of massive but disorganized datasets. However, big data can be later used for data warehousing.

    Running a data-driven business means dealing with billions of data points on in-house and external operations, consumers, and regulations.

    Author: Katrine Spirina

    Source: In Data Labs

  • Data, Analytics Fuel Innovation at Celgene

    CIO Richard Williams leads a global IT organization that’s harnessing digital, data, and analytics to support R&D innovation, drive operational excellence, and help Celgene achieve first-mover advantage in the shift to value-based, personalized health care intended to help patients live longer and healthier lives.
     
     
    An explosion of electronic health information is rocking the entire health care ecosystem, threatening to transform or disrupt every aspect of the industry. In the biopharmaceutical sector, that includes everything from the way breakthrough scientific innovations and insights occur to clinical development, regulatory approvals, and reimbursement for innovations. Celgene, the $11 billion integrated global biopharmaceutical company, is no exception.
     
    Indeed, Celgene, whose mission is to discover, develop, and commercialize innovative therapies for the treatment of cancer, immune-inflammatory, and other diseases, is aggressively working to leverage the information being generated across the health care system, applying advanced analytics to derive insights that power its core business and the functions that surround and support it. Long known for its commitment to external scientific collaboration as a source of innovation, Celgene is investing to harness not only the data it generates across the enterprise, but also the real-world health care data generated by its expanding network of partners. Combined, this network of networks is powering tremendous value.
     
    CIO Richard Williams sees his mission—and that of the IT organization he leads—as providing the platforms, data management, and analytics capabilities to support Celgene through the broader industry transition to value-based, personalized health care. At Celgene, this transformation is enabled by a focus on the seamless integration of information and technology. A cloud-first platform strategy, coupled with enterprise information management, serves as the foundation for leveraging the data generated and the corresponding insights from internal and external health care data.
     
    Williams recently shared his perspective on the changes wrought by enormous data volumes in health care, the role of IT at Celgene, and the ways IT supports life sciences innovation.
     
    Can you describe the environment in which Celgene is currently operating?
     
    Williams: We are living in an exciting era of scientific breakthroughs coupled with technology convergence. This creates both disruption and opportunity. The explosion and availability of data, the cloud, analytics, mobility, artificial intelligence, cognitive computing, and other technologies are accelerating data collection and insight generation, opening new pathways for collaboration and innovation. At Celgene, we’re able to apply technology as never before—in protein homeostasis, epigenetics, immuno-oncology, immuno-inflammation, informatics, and other fields of study—to better understand disease and develop targeted therapies and treatments for people who desperately need them.
     
    How does IT support scientific and business innovation at Celgene?
     
    At its core, Celgene IT is business aligned and value focused. Rather than looking at technology for technology’s sake, we view information and technology as essential to achieving our mission and business objectives. As an integrated function, we have end-to-end visibility across the value chain. This enables us to identify opportunities to leverage technology investments to connect processes and platforms across all functions. As a result, we’re able to support improvements in R&D productivity, product launch effectiveness, and overall operational excellence.
     
    This joint emphasis on business alignment and business value, which informs everything we do, is manifest in three important ways:
     
    First is our emphasis on a core set of enterprise platforms, which enable us to provide end-to-end visibility rather than a narrower functional view. We established a dual information- and cloud-first strategy to provide more comprehensive platforms of capabilities that can be shared across Celgene’s businesses. The cloud—especially with recent advances in security and analytics—provides tremendous scale, agility, and value because it allows us to standardize and create both consistency and agility across the entire organization regardless of device or access method. It’s our first choice for applications, compute power, and storage.
     
    Second is our focus on digital and the proliferation of patient, consumer, and scientific data it is creating. Health care data is growing exponentially—from something like 500 petabytes (PB) of data in 2013 to 25,000 PB by 2020, according to one study.
     
    To address this opportunity, we’ve initiated an enterprise information management (EIM) strategy through which we are targeting important data domains across our business and applying definitions, standards, taxonomies, and governance to data we capture internally and from our external partners. Establishing that consistency is critically important. It drives not only innovation, but also insight into our science, operations, and, ultimately, patient outcomes. Celgene is at the forefront in leveraging technologies that offer on-demand compute and analytic services. By establishing data consistency and influencing and setting standards, we will support our own objectives while also benefiting the broader industry.
     
    Third is our support for collaboration—the network of networks—and the appropriate sharing of information across organizational boundaries. We want to harness the capabilities and data assets of our partners to generate insights that improve our science and our ability to get better therapies to patients faster. Celgene is well-known in the industry for external innovation—how we partner scientifically—and we are now extending this approach to data and technology collaboration. One recent example is our alliance with Medidata Solutions, whose Clinical Cloud will serve as our enterprise technology and data platform for Celgene clinical trials worldwide. Celgene is also a founding commercial member of the Oncology Research Information Exchange Network, a collaboration of cancer centers spearheaded by M2Gen, a health informatics solution company. And we have teamed with ConvergeHEALTH by Deloitte and several other organizations for advanced analytics around real-world evidence and knowledge management, which will also be integrated into our data platform.
     
    You’re building this network-enabled, data-rich environment. But are your users prepared to take advantage of it?
     
    That’s an important aspect of the transformation and disruption taking place across multiple industries. Sure, IT can make information, technology, and insights available for improved decision-making, but the growing complexity of the data—whether it’s molecular structures, genomics, electronic medical records, or payment information—demands different skill sets.
     
    Data scientists are in high demand. We need to embed individuals with those specialized skills in functions from R&D to supply chain and commercial. At the same time, many more roles will require analytics acumen as part of the basic job description.
     
    As you build out your platform and data strategies, are you likely to extend those to your external alliances and partners?
     
    External collaboration enabled by shared data and analytics platforms is absolutely part of our collaboration strategy. If our informatics platforms can help our academic or commercial biotech collaborators advance the pace of their scientific evaluations, clinical studies, and commercialization, or they can help us with ours, that’s a win-win situation—and a differentiator for Celgene. We are already collaborating with Sage Bionetworks, leveraging Apple ResearchKit to develop an app that engages patients directly in innovation aimed at improving treatments for their diseases. We’re also working with IBM Watson to increase patient safety using cognitive computing to improve drug monitoring. As the power of collaborative innovation continues, collaboration will become more commonplace and lead to some amazing results.
     
    As you look out 12 to 18 months, what technologies might you want to bolt onto this platform or embed in your EIM strategy?
     
    The importance of cognitive computing, including machine learning and artificial intelligence, will continue to grow, helping us to make sense of the increasing volumes of data. The continued convergence of these technologies with the internet of things and analytics is another area to watch. It will result in operational insights as well as new, more intelligent ways to improve treatments for disease.
     
    What advice do you have for CIOs in health care or other industries who may not be as far along in their cloud, data, and analytics journeys?
    A digital enterprise is a knowledge- and information-driven enterprise, so CIOs should first focus on providing technologies and platforms that support seamless information sharing. In the process, CIOs should constantly be looking at information flows through an enterprise lens—real value is created when information is connected across all functions. Next, it’s increasingly important for CIOs to help build a technology ecosystem that allows the seamless exchange of information internally and externally because transformation and insight will occur in both places. Last, CIOs need to recognize that every job description will include data and information skills. This is an especially exciting time to be in IT because the digital capabilities we provide increasingly affect every function and role. We need to help people develop the skills they need to take advantage of what we can offer now and in the future.
    Source: deloitte.wsj.com, November 14, 2016
  • Database possibilities in an era of big data

    Database possibilities in an era of big data

    We live in an era of big data. The sheer volume of data currently in existence is huge enough without also grappling with the amount of new information that’s generated every day. Think about it: financial transactions, social media posts, web traffic, IoT sensor data, and much more, being ceaselessly pulled into databases the world over. Outdated technology simply can’t keep up.

    The modern types of databases that have arisen to tackle the challenges of big data take a variety of forms, each suited for different kinds of data and tasks. Whatever your company does, choosing the right database to build your product or service on top of is a vital decision. In this article, we’ll dig into the different types of database options you could be considering for your unique challenges, as well as the underlying database technologies you should be familiar with. We’ll be focusing on relational database management systems (RDBMS), NoSQL DBMS, columnar stores, and cloud solutions.

    RDBMS

    First up, the reliable relational database management system. This widespread variety is renowned for its focus on the core database attributes of atomicity (keeping tasks indivisible and irreducible), consistency (actions taken by the database obey certain constraints), isolation (a transaction’s intermediate state is invisible to other transactions), and durability (data changes reliably persist). Data in an RDBMS is stored in tables, and an RDBMS can tackle large volumes of data and complex queries, as opposed to flat files, which tend to take up more memory and are less efficient. An RDBMS is usually made up of a collection of tables, each with columns (fields) and records (rows). Popular examples of RDBMSs include Microsoft SQL Server, Oracle, MySQL, and Postgres.
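
    To make the table-and-transaction model concrete, here is a minimal sketch using Python's built-in sqlite3 module; the table, columns, and values are invented for illustration.

    import sqlite3

    # An in-memory relational database: data lives in tables with typed columns.
    conn = sqlite3.connect(":memory:")
    conn.execute("""
        CREATE TABLE orders (
            id       INTEGER PRIMARY KEY,
            customer TEXT NOT NULL,
            amount   REAL NOT NULL
        )
    """)

    # Atomicity and durability: both inserts commit together or not at all.
    with conn:
        conn.execute("INSERT INTO orders (customer, amount) VALUES (?, ?)", ("Alice", 120.50))
        conn.execute("INSERT INTO orders (customer, amount) VALUES (?, ?)", ("Bob", 75.00))

    # A declarative query over the table.
    total = conn.execute("SELECT SUM(amount) FROM orders").fetchone()[0]
    print(f"Total order value: {total}")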

    Some of the strengths of an RDBMS include flexibility and scalability. Given the huge amounts of information that modern businesses need to handle, these are important factors to consider when surveying different types of databases. Ease of management is another strength since each of the constituent tables can be changed without impacting the others. Additionally, administrators can choose to share different tables with certain users and not others (ideal if working with confidential information you might not want shared with all users). It’s easy to update data and expand your database, and since each piece of data is stored at a single point, it’s easy to keep your system free from errors as well.

    No system is perfect, however. Each RDBMS is built on a single server, so once you hit the limits of the machine you’ve got, you need to buy a new one. Rapidly changing data can also challenge these systems, as increased volume, variety, velocity, and complexity create complicated relationships that the RDBMS can have trouble keeping up with. Lastly, despite having 'relation' in the name, relational database management systems don’t store the relationships between elements, meaning that the system doesn’t actually understand the connections between data as pertains to various joins you may be using. 

    NoSQL DBMS

    NoSQL (originally, 'non relational' or 'not SQL') DBMS emerged as web applications were becoming more complex. These types of databases are designed to handle heterogeneous data that’s difficult to stick in a normalization schema. While they can take a wide array of forms, the most important difference between NoSQL and RDBMS is that while relational databases rigidly define how all the data contained within must be arranged, NoSQL databases can be schema agnostic. This means that if you’ve got unstructured and semi-structured data, you can store and manipulate it easily, whereas an RDBMS might not be able to handle it at all. 

    Considering this, it’s no wonder that NoSQL databases are seeing a lot of use in big data and real-time web apps. Examples of these database technologies include MongoDB, Riak, Amazon S3, Cassandra, and HBase. One drawback of NoSQL databases is that many offer only 'eventual consistency', meaning that all nodes will eventually hold the same data. Since there’s a lag while all the nodes update, it’s possible to get out-of-sync data depending on which node you end up querying during the update window. Data consistency is a challenge with NoSQL in general, since most of these systems do not perform ACID transactions.
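
    As an illustration of schema-agnostic storage, here is a minimal sketch assuming a local MongoDB server and the pymongo driver; the database, collection, and field names are invented.

    from pymongo import MongoClient

    # Assumes MongoDB is running locally on the default port.
    client = MongoClient("mongodb://localhost:27017")
    events = client["demo_db"]["events"]

    # No fixed schema: documents in the same collection can carry different fields.
    events.insert_one({"type": "pageview", "url": "/home", "user": "u1"})
    events.insert_one({"type": "purchase", "user": "u2", "amount": 49.95,
                       "items": ["sku-123", "sku-456"]})

    # Query by whatever fields a document happens to have.
    for doc in events.find({"type": "purchase"}):
        print(doc["user"], doc.get("amount"))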

    Columnar storage database

    A columnar storage database’s defining characteristic is that it stores data tables by column rather than by row. The main benefit of this configuration is that it accelerates analyses, because the system only has to read the locations your query is interested in, all within a single column. These systems also compress repeating values very effectively, since the data within a single column is homogeneous (every value in a column has the same type: integers, strings, and so on), which lends itself to better compression.
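
    The difference between the two layouts can be sketched in a few lines of plain Python; the table contents are invented for illustration.

    # Row-oriented layout: each record is stored together.
    rows = [
        {"id": 1, "customer": "Alice", "amount": 120.50},
        {"id": 2, "customer": "Bob", "amount": 75.00},
        {"id": 3, "customer": "Carol", "amount": 19.99},
    ]

    # Column-oriented layout: each column is stored together.
    columns = {
        "id": [1, 2, 3],
        "customer": ["Alice", "Bob", "Carol"],
        "amount": [120.50, 75.00, 19.99],
    }

    # An aggregate over one column only touches that column's values...
    total_from_columns = sum(columns["amount"])

    # ...whereas the row layout forces a scan over every full record.
    total_from_rows = sum(r["amount"] for r in rows)
    print(total_from_columns, total_from_rows)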

    However, due to this design, columnar storage databases are not typically used to build transactional databases. One of the drawbacks of these types of databases is that inserts and updates on an entire row (necessary for apps like ERPs and CRMs, for example) can be expensive, and reads of full records are slower too. For example, when opening an account’s page in a CRM, the app needs to read the entire row (name, address, email, account id, etc.) to populate the page and write all of it back as well. In this example, a relational database would be more efficient.

    Cloud solutions

    While not technically a type of database themselves, no discussion of modern types of database solutions would be complete without discussing the cloud. In this age of big data and fast-moving data sources, data engineers are increasingly turning to cloud solutions (AWS, Snowflake, etc.) to store, access, and analyze their data. One of the biggest advantages of cloud options is that you don’t have to pay for the physical space or the physical machine associated with your database (or its upkeep, emergency backups, etc.). Additionally, you only pay for what you use: as your memory and processing power needs scale up, you pay for the level of service you need, but you don’t have to pre-purchase these capabilities.

    There are some drawbacks to using a cloud solution, however. First off, since you’re connecting to a remote resource, bandwidth limitations can be a factor. Additionally, even though the cloud does offer cost savings, especially when starting a company from scratch, the lifetime costs of paying your server fees could exceed what you would have paid buying your own equipment. Lastly, depending on the type of data you’re dealing with, compliance and security can be issues, because responsibility for managing the data and its security is no longer handled solely by you, the data owner, but shared with the third-party provider. Unsecured APIs and interfaces that can be more readily exploited, data breaches, elevated risks of data loss or leakage, and unauthorized access through improperly configured firewalls are some of the ways in which cloud databases can be compromised.

    Decision time

    The era of Big Data is changing the way companies deal with their data. This means choosing new database models and finding the right analytics and BI tools to help your team get the most out of your data and build the apps, products, and services that will shape the world. Whatever you’re creating, pick the right database type for your needs, and build boldly.

    Author: Jack Cieslak

    Source: Sisense

  • The 5 promises of big data

    Big data is a phenomenon that is hard to pin down in a definition. Many will have heard of the 3 V’s: volume, velocity and variety. In short, big data is about large volumes, high speed (realtime) and varied/unstructured data. Depending on the organization, however, big data has many faces.

    To analyse how big data can best be integrated into a business, it is important to first have a clear picture of what big data actually offers. This is best summarized in the following five promises:


    1. Predictive: big data generates predictive results that say something about the future of your organization or the outcome of a concrete action;
    2. Actionable results: big data makes it possible to act directly on the results it finds, without human intervention;
    3. Realtime: the new speed standards allow you to respond immediately to new situations;
    4. Adaptive: a well-designed model constantly and automatically adjusts itself as situations and relationships change;
    5. Scalable: processing and storage capacity scale linearly, so you can respond flexibly to new requirements.

    These five big data promises can only be realized by deploying three big data disciplines/roles: the big data scientist, the big data engineer and the big data infrastructure specialist.

    Predictive

    In a classic business intelligence environment, reports are generated about the current status of the business. With big data, however, the conversation is not about the past or the present situation, but about predictive analytics.

    Predictive reporting becomes possible because the data scientist applies pattern-recognition techniques to historical data and works the patterns that are found into a model. The model can then load the history and, based on current events/transactions, extend those patterns into the future. In this way a manager can switch from reactive management to anticipatory management.
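
    A minimal sketch of this fit-on-history, predict-forward pattern, assuming scikit-learn is installed; the features and numbers are invented for illustration.

    from sklearn.linear_model import LinearRegression

    # Historical data: weekly marketing spend and the resulting sales (invented).
    history_spend = [[1000], [1500], [2000], [2500], [3000]]
    history_sales = [11000, 15500, 21000, 24800, 30500]

    # The data scientist captures the pattern in a model...
    model = LinearRegression()
    model.fit(history_spend, history_sales)

    # ...which then extends that pattern to situations that have not happened yet.
    planned_spend = [[3500], [4000]]
    for spend, forecast in zip(planned_spend, model.predict(planned_spend)):
        print(f"Planned spend {spend[0]} -> forecast sales {forecast:.0f}")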

    Actionable results

    Actionable results arise when the findings from the data scientist’s models are translated directly into decisions in business processes. The data engineer builds the connection, and the data scientist makes sure the model delivers its output in the right format. The promise of actionable results is therefore partly fulfilled by the big data specialists, but the largest part depends on the attitude of the management team.

    Management has the task of adopting a new way of steering. Steering no longer happens on the micro-processes themselves, but on the models that automate those processes. For example, the question is no longer when which machine should be maintained, but which risk margins the deciding model may use to optimize maintenance costs.
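
    A minimal sketch of turning model output directly into an action; the scores, the risk margin and the schedule_maintenance function are invented for illustration.

    # Invented model output: failure-risk scores per machine (0 to 1).
    risk_scores = {"press-01": 0.12, "press-02": 0.81, "mill-07": 0.64}

    # Management no longer plans each maintenance job by hand; it only sets the
    # risk margin the deciding model is allowed to use.
    RISK_MARGIN = 0.75

    def schedule_maintenance(machine: str) -> None:
        # Illustrative stand-in for a call into a work-order or planning system.
        print(f"Maintenance order created for {machine}")

    # Found results are translated into decisions without human intervention.
    for machine, score in risk_scores.items():
        if score >= RISK_MARGIN:
            schedule_maintenance(machine)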

    Realtime

    Big data is often associated with large volumes, terabytes of data that have to be processed. The 'big' in big data, however, depends entirely on the dimension of speed. Processing 10 TB of data in an hour is big data, but processing 500 MB is also big data if the requirement is that it must happen within two hundred milliseconds. Realtime processing sits in that latter high-speed domain. There is no golden rule, but one usually speaks of realtime when the response time is within five hundred milliseconds. Achieving these speeds requires a combination of all three big data disciplines.
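
    A minimal sketch of checking events against such a latency budget; the 500 ms budget mirrors the figure above, and the process function is a trivial stand-in for real model or enrichment logic.

    import time

    LATENCY_BUDGET_MS = 500  # the "realtime" threshold mentioned above

    def process(event: dict) -> None:
        # Stand-in for the actual per-event processing.
        time.sleep(0.05)

    for event in ({"id": i} for i in range(3)):
        start = time.perf_counter()
        process(event)
        elapsed_ms = (time.perf_counter() - start) * 1000
        status = "OK" if elapsed_ms <= LATENCY_BUDGET_MS else "BUDGET EXCEEDED"
        print(f"event {event['id']}: {elapsed_ms:.1f} ms ({status})")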

    The big data infrastructure specialist has the task of optimizing how data is stored and read. Speed is gained by structuring the data exactly in the way the model reads it. In doing so, all flexibility in the data is given up so that it can be consumed as quickly as possible from that single perspective.

    The big data engineer contributes by optimizing the speed of the connections between the data sources and the consumers, offering those connections in a distributed format. In theory, an unlimited number of resources can then be switched on to distribute the data, and every doubling of resources doubles the capacity. It is also up to the big data engineer to convert the models developed by the data scientist into a format that isolates every sub-analysis of the model and distributes it as much as possible over the available resources. Data scientists often work in programming languages such as R and Matlab, which are ideal for exploring the data and the various candidate models. These languages do not lend themselves well to distributed processing, however, so the big data engineer, often together with the data scientist, has to translate the prototype model into a production-grade programming language such as Java or Scala.

    The data scientist, as discussed, provides the models and thus the logic of the data processing. To be able to operate in realtime, it is this person’s task to keep the complexity of the processing below exponential. A collaboration of the three disciplines is therefore required to reach an optimal result.

    Adaptive

    We can speak of an adaptive environment - also called machine learning or artificial intelligence - when the intelligence of that environment autonomously adapts to new developments within the domain being modelled. To make this possible, it is important that the model has gained enough experience to learn on its own. The more information is available about the domain throughout its history, the broader the experience we can build on.
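
    A minimal sketch of a model that keeps adapting as new observations arrive, assuming scikit-learn is installed; the data stream and the drifting rule are invented.

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    rng = np.random.default_rng(0)
    model = SGDClassifier(loss="log_loss")
    classes = np.array([0, 1])

    for batch in range(5):
        # Each batch comes from a slightly shifted situation.
        X = rng.normal(loc=batch, scale=1.0, size=(50, 1))
        y = (X[:, 0] > batch).astype(int)          # the underlying "rule" drifts
        model.partial_fit(X, y, classes=classes)   # the model adapts incrementally
        print(f"batch {batch}: accuracy on its own data = {model.score(X, y):.2f}")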

    Scalable

    Scalability is achieved when there is, in theory, unlimited processing capacity as long as you keep adding computers. It means that when you need four times as much capacity, you add four times as many computers - and when you need a thousand times more, you add a thousand computers. This sounds simple, but until recently this kind of cooperation between computers was a very complex task.

    Each discipline has a role in making and keeping big data solutions scalable. The big data infrastructure specialist takes care of the scalability of reading, writing and storing data. The big data engineer takes care of the scalability of consuming and producing data, and the big data scientist takes care of the scalability of the intelligent processing of the data.
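
    A minimal sketch of adding workers to gain capacity, using Python's standard multiprocessing module; the workload is invented, and real-world scaling also depends on data movement and coordination overhead.

    import time
    from multiprocessing import Pool

    def crunch(chunk):
        # Stand-in for an isolated sub-analysis on one chunk of data.
        return sum(i * i for i in chunk)

    if __name__ == "__main__":
        chunks = [range(n * 200_000, (n + 1) * 200_000) for n in range(8)]
        for workers in (1, 2, 4):
            start = time.perf_counter()
            with Pool(workers) as pool:
                results = pool.map(crunch, chunks)
            print(f"{workers} worker(s): {time.perf_counter() - start:.2f} s")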

    Big data, big deal?

    To exploit the full possibilities of big data, it is therefore very important to bring in a multidisciplinary team. This may sound as if large investments are needed right away, but big data also offers the possibility to start small. A data scientist can run the various analyses on a laptop or a local server and, with a minimal investment, create a number of ‘short-term wins’ for your organization. Once the added value of big data has been made visible, it is a relatively small step to put a big data environment into production and steer your own organization in a data-driven way as well.

    Source: Computable

  • Looking ahead: digital transformation with Big Data

    For many people, the ‘Big’ in Big Data stands only for the sheer volume of data that organizations have in house. In many practical examples, that data seems above all to feed the sharpening of revenue models, as in the advertising industry, or total disruption, such as the upheaval that eHealth is currently causing.

    A one-sided picture perhaps, because data just as readily delivers insights for internal process improvement or a better customer approach with digital solutions. The conversation about Big Data should above all be about its enormous potential: about the impact the data can have on existing products and organizational questions, and how it helps with a first step towards digital transformation.

    Big Data in practice today
    Practice shows that every company has four ways to put data to work. Back to the realism in Big Data: the possibilities, the examples and the natural limits.

    Excellence: research shows that a manager who is guided purely by feeling or intuition makes poorer decisions and thus steers towards a lesser service or experience. By using the hard facts from combined data, bottlenecks in processes can be recognized and even predicted.

    Since the end of last year, for example, American police forces have been able to estimate on the basis of realtime data where crimes will be committed. Although it has been customary for decades to patrol the city at random, the criminal then does not know where officers are and is supposedly deterred, that way of working is now being abandoned.

    The algorithm of the analytics solution PredPol promises to predict, based on place, time and type of crime, where the presence of officers can prevent an offence. In the film Minority Report this was still presented as a vision of the year 2054; now it turns out that excelling with predictive data was already perfectly normal in 2015.

    Product leadership: not every company needs to become the next Spotify or Airbnb. But the disruptive power of these players and their business models does have to be reckoned with.

    Netflix owes its leading position partly to the smart use of Big Data. The enormous amount of content available online makes it possible to browse through films and series. Netflix translates that surfing and viewing behaviour into data profiles that visibly improve its own product. The data result in recommendations that, unexpectedly for the viewer, match someone’s taste and that bind you further to the company as a customer.

    The video-on-demand service is now known among users for knowing what captivates the viewer and for being able to make you watch more than you had planned. With the data that continuously improve the platform, Netflix has shaken up the entertainment industry and the growing video-on-demand market.

    The company has even given rise to the new phenomenon of binge watching, in which recommendations pull the viewer from one episode into the next for hours on end. The algorithms behind this are so important that Netflix has promised a million dollars to whoever comes up with a better alternative.

    Intimacy: getting to know the customer better is perhaps the best-known opportunity of Big Data. Social media, online tracking via cookies and open data sources mean that every organization can offer tailor-made services: generic homepages make way for portals personalized with user data, and service improves now that the right places in the organization have a more complete picture of the customer.

    Amazon shows how far that intimacy can go. The company says it can deliver products before they have even been ordered. Based on location, previous orders, search queries, saved wish lists and other online behaviour, Amazon can predict what a customer needs. The data are said to be so accurate that the retailer knows which products have to be ordered before the customer does. That is how precise data can sometimes be.

    Risk control: once data are properly mined and combined, they make clear in real time which risks companies are running. Thanks to Big Data, audits can be carried out faster and in a more targeted way.

    Financial institutions are the textbook example. The tens of thousands of financial transactions these companies process every second generate so much data that fraud can quickly be detected through pattern recognition. Order an item online with an unusual price tag attached, for example, and the phone rings in no time to verify the transaction.

    And although the cost-saving aspect needs no further explanation, a large share of companies is insufficiently prepared. According to worldwide research by EY, 72 percent of companies acknowledge that Big Data is important for risk control, but 60 percent also say they still have important steps to take in that area.

    The insurer: Big Data as a means to transform
    From statements on social media to socio-demographic data, from a company’s own data on buying behaviour to public data such as temperature fluctuations: the richer the data, the sharper the insight. When an organizational question is on the table, chances are that data can be a means to achieve the required transformation. The approach below is an important step in that direction.

    • Detection – A Big Data analysis can reveal possible needs within the target group. For example: if a health insurer is going through a digital transformation, it makes sense to refine the profiles of customers from a specific generation with research data. How do they move through the customer journey and which digital solutions do they expect at each contact moment?
    • Goal and question – Create potential scenarios based on the data. For example: the insurer’s youngest target group is growing up in an urban environment. Which (mobile) behaviour is specific to this group and how does it influence the digital solutions these young people are asking for? Determine where the data sources are: which internal and external sources are needed to answer the questions? Think of internal customer profiles, but also of government open data projects. Social media, both posts and connections, combined with demographic characteristics and postcode areas, enrich the profiles. The data tell more about preferences and about the influence of the immediate environment and online media.
    • Check the data – Do not forget to look at laws and regulations, and in particular at what privacy rules prohibit. The Dutch Personal Data Protection Act (Wet Bescherming Persoonsgegevens) goes quite far: it even applies to data you bring in temporarily for processing.
    • Analysis – The data are interpreted by analysts. This makes clear, for instance, that there is a demonstrable relationship between age, living environment and the use of digital solutions. For example: young city dwellers are digital natives and want an online portal with an eHealth solution, in which they can link their own Big Data from apps for a better picture of their health.
    • Anchoring – By continuing to monitor the customer profiles, future deviations quickly become visible. If necessary, the transformation can be adjusted.

    Limits to Big Data
    Above all, it is important to keep the thinking about Big Data realistic. Data answer many questions, but there is a limit to the possibilities. As soon as data sources run alongside each other, analyses can influence one another: outcomes that correlate may afterwards turn out to rest purely on coincidence.

    People will remain an important link in the opportunities Big Data offers. Crucial knowledge must be retained at the level of individuals, because only people can supply the right interpretation. What to do with the insights is up to them: an unexpected cross-connection may show that a group of consumers represents an increased business risk. What do you do with such insights if they stigmatize a group of people? The ethical boundary must always be guarded.

    In many organizations the arrival of data amounts to a genuine culture change. Managers not only learn that something is changing; the data also tell them how and in which direction it will develop in the future. With Big Data, the organization can look ahead again.

    Source: Emerce

  • Computer analysis will soon determine whether you have a future at a company

    Should the HR department also keep an eye on what employees say and record on social media such as LinkedIn, Facebook and Twitter? Should that be stored for eternity? The artificially intelligent software program Crunchr stays well away from all of that for now, founder Dirk Jonker of Focus Orange, the owner of Crunchr, stresses repeatedly.

    In Jonker’s experience, companies are also very conservative about combing through social media. ‘They are not doing it yet, but that discussion is coming. After all, it looks like public data. People give away a great deal of data. Look at the AH loyalty card: in exchange for a small discount they give the company a great deal of information.’

    Just as with AH, Jonker says the employee should also be better off when, for example, making data about their employment history available. ‘In return, we should be able to offer the employee a service. Besides that, transparency about what you do, and why, is the most important thing. But for now, companies should first start using the data they already have.’

    Anonymous surveys

    An intermediate step that already yields a wealth of qualitative information at several clients is an annual anonymous survey among staff. 'Ask forty questions every year about pay, atmosphere, management, supervisors and training,' says Jonker. 'You get a treasure trove of information out of that. And consistently hold exit interviews when people leave.’

    Focus Orange also regularly advises on the design of collective labour agreements. ‘We use Crunchr there as well. It turns out that companies often do not know well enough what their staff really want. If you measure that, you can propose a package that is both effective and attractive.'

    Consortium

    In May, Jonker founded the People Analytics Consortium to develop the field further. TU Delft, the Centrum voor Wiskunde en Informatie (VU/UvA), Randstad, Wolters Kluwer and ASML take part. Within the consortium, techniques are being developed, emphatically not with data from the participating companies, to answer new questions. Which questions does a company have to ask in order to recognize patterns, and what are the best techniques to get those questions answered?

    ‘For example, 40,000 chunks of text are fed into the system. They then have to be parsed technically. We train our algorithm, for instance, on the entire correspondence around the Enron scandal. That is all public.’

    Besides this fairly complex technique, Crunchr can also be used to find inconsistencies in company data, for example whether someone earns too much for their role. At large companies operating in all parts of the world, such an ‘outlier’ is no exception. ‘Things sometimes come out of that which you cannot see with the naked eye,’ says Jonker. ‘We can plough through 40,000 records in seven seconds. And with the outliers we find, we can recalibrate our system.’
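
    A minimal sketch of this kind of outlier check, flagging salaries that deviate strongly from the average for a job grade; the records, column meanings and threshold are invented for illustration.

    from statistics import mean, stdev

    # Invented records: (employee, job grade, salary).
    records = [
        ("e1", "grade-7", 52000), ("e2", "grade-7", 54500), ("e3", "grade-7", 51000),
        ("e4", "grade-7", 98000),  # potential outlier
        ("e5", "grade-7", 53200), ("e6", "grade-7", 55100),
    ]

    salaries = [salary for _, _, salary in records]
    mu, sigma = mean(salaries), stdev(salaries)

    # Flag anything more than two standard deviations from the grade average.
    for emp, grade, salary in records:
        if abs(salary - mu) > 2 * sigma:
            print(f"{emp} ({grade}): salary {salary} deviates from average {mu:.0f}")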

    Succession

    In succession planning, HR draws up a plan for senior management and critical positions, describing who could succeed these people when the position becomes vacant. Pre-selection is sensitive, it takes a lot of time and the plans have a limited shelf life. By the time a key person leaves, half of the candidates are already in another position or have since proved unsuitable.

    Crunchr uses the succession plans as input for an algorithm that has to learn for itself what makes someone a good successor. When a position opens up in the future, the network looks for successors across the whole company. The company becomes flexible and no longer depends on outdated plans. Everyone comes into view, and if a company wants more women at the top, it can steer towards that. By practising this systematically, over and over, Crunchr has been trained.

    A large international company easily draws up a few thousand plans at the local level, discusses them regionally and steers them from corporate HR. A trained algorithm gives every plan a reality score within seconds. Making a recently graduated Dutch employee the designated successor for a senior management position in the US, for example, is unlikely.

    Mitigating risks

    Using the well-known ‘graph theory’ from mathematics, Crunchr shows whether everyone within management believes they have a number of successors while those are in fact largely the same top potentials, in which case the risk has not been reduced at all. In addition, the company can use the succession plans to predict ‘global mobility’: how future leaders move around the world, where leaders are now and where they relocate to. Both insights, according to Jonker, are particularly valuable for the top of the company.
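
    A minimal sketch of the underlying idea, counting how often the same people back multiple positions in the succession plans; the plans and names are invented for illustration.

    from collections import Counter

    # Invented succession plans: position -> named successors.
    plans = {
        "CFO": ["dana", "lee"],
        "COO": ["dana", "kim"],
        "VP Sales": ["dana", "lee"],
        "VP Product": ["noor", "kim"],
    }

    # On paper every position has successors...
    coverage = {position: len(names) for position, names in plans.items()}

    # ...but if the same few people back several positions, the risk remains.
    load = Counter(name for names in plans.values() for name in names)
    overloaded = {name: count for name, count in load.items() if count > 1}

    print("successors per position:", coverage)
    print("people backing multiple positions:", overloaded)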

    FD, 3 October 2016

  • DeepMind will use algorithms to predict blindness

    DeepMind, one of the subsidiaries of search giant Google that does research into self-learning computers, is going to help with research into blindness. DeepMind will work together with the British health organization NHS to teach its technology to detect the first signs of blindness.

    To that end, DeepMind will receive 1 million anonymized eye scans. The software will scan them and, based on the accompanying information, should learn which scans show an eye disease and which do not. The intention is that the software will eventually learn to recognize the first signs of eye diseases on its own.

    For now it concerns two relatively common causes of blindness: age-related macular degeneration and diabetic retinopathy. People with diabetes, for example, are 25 times more likely to go blind than people without diabetes. Recognizing such cases early could help prevent blindness.

    The head of the hospital’s eye department, Professor Peng Tee Khaw, says it can help to detect eye diseases in patients quickly. "These scans are incredibly detailed, more detailed even than any other scan we have of the body. We see images at the cellular level. But the problem is precisely that they provide so much data."

    That is also where the idea of using DeepMind comes from. "I need all the experience of my whole life to be able to follow a patient’s history. But patients rely on my experience to predict their future. If we can use self-learning technology, we could do this much better, because then I would have the experience of 10,000 lives."

    Source: Techzine.nl

  • Digital technologies to deliver European businesses 545 billion euros over the next two years

    Thanks to the use of digital tools and technologies, European companies can achieve a revenue increase of 545 billion euros over the next two years. For Dutch companies this amounts to 23.5 billion euros. This is shown by a study by Cognizant, in cooperation with Roubini Global Economics, among more than 800 European companies.
     
    The study The Work Ahead – Europe’s Digital Imperative is part of a global study into the changing nature of work in the digital age. The results show that organizations that are the most proactive in bringing the physical and virtual worlds closer together have the greatest chance of increasing their revenue.
     
    Exploiting revenue potential
    Executives indicate that technologies such as Artificial Intelligence (AI), Big Data and blockchain can be a source of new business models and revenue streams, changing customer relationships and lower costs. In fact, respondents expect digital technologies to have a positive effect of 8.4 percent on revenue between now and 2018.
     
    Digitization can deliver both cost efficiency and revenue growth. By applying intelligent process automation (IPA), for example, in which software robots take over routine tasks, companies can save costs in the middle and back office. The analysis shows that the impact of digital transformation on revenue and cost savings in the industries studied (retail, financial services, insurance, manufacturing and life sciences) comes to 876 million euros in 2018.
     
    Still laggards when it comes to digital
    European executives expect a digital economy to be driven by a combination of data, algorithms, software robots and connected devices. Asked which technology will have the greatest influence on work in 2020, Big Data comes out as the winner: no less than 99 percent of respondents name this technology. Strikingly, AI ends up in second place just behind it at 97 percent; respondents regard AI as more than hype. In fact, the expectation is that AI will take a central place in the future of work in Europe.
     
    On the other hand, late adopters can expect a combined loss of 761 billion euros in 2018, according to the study.
    A third of the managers surveyed indicate that, in their view, their employer does not have the knowledge and capabilities to implement the right digital strategy, or even has no idea of what needs to be done. Thirty percent of respondents believe their executives invest too little in new technologies, while 29 percent encounter reluctance to adopt new ways of working.
     
    The main obstacles for companies in making the move to digital are fear of security issues (24%), budget constraints (21%) and a lack of talent (14%).
     
    Euan Davis, European Head of the Centre for the Future of Work at Cognizant, explains: “To make the necessary move to digital, management must be proactive and prepare their organization for the future of work. Slow innovation cycles and an unwillingness to experiment are the kiss of death for organizations that want to properly exploit digital opportunities. Managing the digital economy is an absolute necessity. Companies that do not give priority to deepening, broadening, strengthening or improving their digital footprint are playing a losing game from the start.”
     
    About the study
    The findings are based on a global survey of 2,000 executives in various industries, 250 middle managers responsible for other employees, 150 MBA students from major universities worldwide and 50 futurists (journalists, academics and authors). The survey of executives and managers was conducted in 18 countries in English, Arabic, French, German, Japanese and Chinese. Executives were interviewed by telephone, managers via an online questionnaire. The MBA students and futurists were surveyed in English through telephone interviews (MBA students in 15 countries, futurists in 10 countries). The Work Ahead – Europe’s Digital Imperative contains the 800 responses from the European survey of executives and managers. More details can be found in Work Ahead: Insights to Master the Digital Economy.
     
    Source: emerce.nl, 28 November 2016
  • Disruptive models that create the data centric enterprise

    In the digital age, companies are striving for radical reinvention in order to create new, significant and sustainable sources of revenue. Imperfect market conditions such as inefficient matching, information asymmetries or human biases and errors open the door to disruption.

    Data is the secret weapon to change the dynamics of competition and spur digital growth. Digital-savvy organizations are conquering markets at a rapid pace by employing data-centric strategies to outpace the incumbents.

    For best-in-class organizations, data has meanwhile become a critical corporate asset—similar to land, labor or capital—not just to improve their operations but to launch entirely new business models. 

    The advent of artificial intelligence, data analytics and machine learning enable organizations to solve an unprecedented array of business problems—and the emergence of technology is continuously pushing the boundaries even further. 

    To help organizations jump from the middle of the pack to the front line, McKinsey has identified the following six distinct patterns for how organizations can apply data-centric models to turn strategic insights into a competitive advantage, as published in its "The Age of Analytics" report:

    Leveraging orthogonal data can be a game-changer

    Across most verticals, incumbents are used to relying on a standardized set of certain data. Bringing new data all of a sudden to the table to enrich the data already employed can change the dynamics of competition. New entrants utilizing privileged access to these “orthogonal” data sets can cause a disruption in their respective field of business. Rather than replacing existing data silos, orthogonal data typically complements the data in use, enabling correlation as well as tapping into new territory to gain additional insights.

    Matching supply and demand in real-time through digital platforms

    Digital platforms are matchmakers that connect sellers and buyers for products or services. They typically provide a transaction-based framework and act as an intermediate to facilitate the sales process. Thanks to data and analytics, platform operators can now do this in real-time and on an unparalleled order of magnitude in markets where matching supply and demand has been inefficient.

    Personal transportation is one example where platforms such as Uber, Lyft and Chinese ride-sharing giant Didi Chuxing have expanded rapidly by allowing third parties to put their underutilized assets to work, rather than owning large fleets themselves. By 2030, shared mobility services could account for more than 15 to 20 percent of total passenger vehicle miles globally. This growth—and the resulting disruption to the taxi industry—may be only a hint of what is to come. 

    Data and analytics allow “radical personalization”

    Data and analytics can discover more granular levels of distinctions, with micro-segmenting a population based on the characteristics of individuals being a powerful use case. Using the resulting insights to radically personalize products and services is changing the fundamentals of competition across many industries, including advertising, education, travel and leisure, media and retail.

    This capability could also heavily affect the way health care is provided when incorporating the behavioral, genetic, and molecular data connected with many individual patients. The advent of proteomics, the declining costs of genome sequencing and the growth of real-time monitoring technologies allow generating this kind of new, ultra-fine data. Experts estimate the economic impact could range from $2 trillion to $10 trillion.

    Massive data integration capabilities can break down organizational silos

    Combining and integrating large-scale data sets from a variety of sources, and breaking silos within an organization to correlate data, has enormous potential to gain insights. However, many organizations are struggling with creating the right structure for that synthesis to take place. 

    Retail banking, for instance, is an industry possessing lots of data on customers’ transactions, financial status and demographics. Massive data integration could enable better cross-selling, the development of personalized products, yield management, better risk assessment and more effective marketing campaigns—and ultimately help the institutions become more competitive. In fact, McKinsey estimates the total impact of $110 billion to $170 billion in the retail banking industry in developed markets and approximately $60 billion to $90 billion in emerging markets. 

    Data and analytics can fuel discovery and innovation

    Innovation can serve as a booster to differentiate and leapfrog competition. Throughout human history, people have explored new ideas in an effort to make progress. With the emergence of artificial intelligence, data mining and machine learning, however, human ingenuity is now being supported, enhanced or even replaced in some instances. 

    For example, data and analytics are helping organizations determine how to set up teams, resources and workflows to optimize their outcome. High-performing teams can be many times more productive than low-performing teams. Understanding this variance and how to accomplish more effective collaboration presents a huge opportunity for organizations. Data and analytics can also test hypotheses and find new patterns that may not have even been recognized otherwise. In product innovation, data and analytics can transform research and development in areas such as materials science, synthetic biology and life sciences. 

    Algorithms can support and enhance human decision-making

    Human decision-making is often muddy, biased and limited. Analytics can help overcome this by taking far more data points into account across multiple sources, breaking down information asymmetries, and adding automated algorithms to make the process instantaneous. 

    As the sources of data grow in complexity and diversity, there are many ways to use the resulting insights to make decisions faster, more accurate, more consistent, and more transparent. Besides medical decision support systems to preclude human error when it comes to treatments, smart cities are one of the other prevailing settings for applying the ability of machines and algorithms to scrutinize huge data sets in a blink of an eye. Utilizing sensors to smoothly route traffic flows and IoT-enabled utilities to reduce waste and keep infrastructure systems working at top efficiency are just two of the many smart city scenarios.

    Source: Information Management 

    Author: Marc Wilczek

  • Do data scientists have the right stuff for the C-suite?

    What distinguishes strong from weak leaders? This raises the question of whether leaders are born or made. It is the classic “nature versus nurture” debate. What matters more? Genes or your environment?

    This question got me to thinking about whether data scientists and business analysts within an organization can be more than just a support to others. Can they become leaders, similar to C-level executives? 

    Three primary success factors for effective leaders

    Having knowledge means nothing without having the right types of people. One person can make a big difference. They can be someone who somehow gets it altogether and changes the fabric of an organization’s culture not through mandating change but by engaging and motivating others.

    For weak and ineffective leaders, irritating people is not only a sport but also their personal entertainment. They are rarely successful. 

    One way to view successful leadership is to consider that there are three primary success factors for effective leaders. They are (1) technical competence, (2) critical thinking skills, and (3) communication skills. 

    You know there is a problem when a leader says, “I don’t do that; I have people who do that.” Good leaders do not necessarily have high intelligence, good memories, deep experience, or innate abilities that they are born with. They have problem solving skills. 

    As an example, the Ford Motor Company’s CEO Alan Mulally came to the automotive business from Boeing in the aerospace industry. He was without deep automotive industry experience. He has been successful at Ford. Why? Because he is an analytical type of leader.

    Effective managers are analytical leaders who are adaptable and possess systematic and methodical ways to achieve results. It may sound corny, but they apply the “scientific method”, formulating hypotheses and testing to prove or disprove them. We are back to basics.

    A major contributor to the “scientific method” was the German mathematician and astronomer Johannes Kepler. In the early 1600s Kepler’s three laws of planetary motion led to the Scientific Revolution. His three laws made the complex simple and understandable, suggesting that the seemingly inexplicable universe is ultimately lawful and within the grasp of the human mind. 

    Kepler did what analytical leaders do. They rely on searching for root causes and understanding cause-and-effect logic chains. Ultimately a well-formulated strategy, talented people, and the ability to execute the executive team’s strategy through robust communications are the key to performance improvement. 

    Key characteristics of the data scientist or analyst as leader

    The popular Moneyball book and subsequent movie about baseball in the US demonstrated that traditional baseball scouting methods (e.g., “He’s got a good swing.”) gave way to fact-based evidence and statistical analysis. Commonly accepted traits of a leader, such as being charismatic or strong, may also be misleading.

    My belief is that the most scarce resource in an organization is human ability and competence. That is why organizations should desire that every employee be developed for growth in their skills. But having sound competencies is not enough. Key personal qualities complete the package of an effective leader. 

    For a data scientist or analyst to evolve into an effective leader, three personal qualities are needed: curiosity, imagination, and creativity. The three are sequentially linked. Curious people constantly ask “Why are things the way they are?” and “Is there a better way of doing things?” Without these personal qualities, innovation will be stifled. The emergence of analytics is creating opportunities for analysts as leaders. 

    Weak leaders are prone to a diagnostic bias. They can be blind to evidence and somehow believe their intuition, instincts, and gut-feel are acceptable masquerades for having fact-based information. In contrast, a curious person always asks questions. They typically love what they do. If they are also a good leader they infect others with enthusiasm. Their curiosity leads to imagination. Imagination considers alternative possibilities and solutions. Imagination in turn sparks creativity.

    Creativity is the implementation of imagination

    Good data scientists and analysts have a primary mission: to gain insights relying on quantitative techniques to result in better decisions and actions. Their imagination that leads to creativity can also result in vision. Vision is a mark of a good leader. In my mind, an executive leader has one job (aside from hiring good employees and growing them). That job is to answer the question, “Where do we want to go?” 

    After that question is answered, managers and analysts, ideally supported by the CFO’s accounting and finance team, can answer the follow-up question, “How are we going to get there?” That is where analytics are applied with the various enterprise and corporate performance management (EPM/CPM) methods that I regularly write about. EPM/CPM methods include a strategy map and its associated balanced scorecard with KPIs; customer profitability analysis; enterprise risk management (ERM); and capacity-sensitive, driver-based rolling financial forecasts and plans. Collectively they assure that the executive team’s strategy can be fully executed.

    My belief is that other perceived characteristics of a good leader are over-rated. These include ambition, team spirit, collegiality, integrity, courage, tenacity, discipline, and confidence. They are nice-to-have characteristics, but they pale compared to the technical competency, critical thinking, and communication skills that I described earlier. 

    Be analytical and you can be a leader. You can eventually serve in a C-suite role.

    Author: Gary Cokins 

    Source: Information Management

  • Do you know what a data driven company is?

    Most companies today claim to be fluent in data, but as with most trends, these claims tend to be exaggerated. Companies are high on data, but what does it mean to be a data-driven company? I went ahead and asked a number of business leaders.

    According to Amir Orad, CEO of Sisense, a business intelligence software provider, true data-driven companies understand that data should be omnipresent and accessible.

    "A data-driven company is an organization where every person who can use data to make better decisions, has access to the data they need when they need it. being data-driven is not about seeing a few canned reports at the beginning of every day or week; it's about giving the business decision makers the power to explore data independently, even if they're working with big or disparate data sources."

    Asaf Yigal, the co-Founder of Logz.io, ELK as a service cloud platform, agrees, but emphasized the importance of measurability.

    "Data-driven complains are companies that relentlessly measure and monitor the pulse of the business and are doing so in a continuous and often automated manner."

    Companies often proudly talk about data-driven marketing, but forget that the company itself should be driven by data, internally and externally. It's also important to remember that internal data might help produce information that can be used for marketing and sales purposes.

    "There's a lot of talk about data-driven marketing and sales, etc., but not a lot about a company as a whole becoming data-driven," said Manish Sood, the founder and CEO of Reltio.

    Bryan Harwood from Outsell says a company needs to meet the following three objectives to qualify; a brief sketch of what this can look like in practice follows the list.

     

    1. It should be able to not only draw data from a variety of internal and external sources, but also be able to blend that data in an analytics engine and distill it down to actionable insights.

    2. These insights should drive real-time decision making that infuses every level of the organization.

    3. The data should yield quantifiable results downstream that, in turn, inform the organization about which data sources are yielding results.
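
    To make the first two objectives concrete, here is a minimal sketch (in Python with pandas) of blending an internal and an external source and distilling the result into a single actionable metric. The file names and column names are hypothetical placeholders, not a reference to any specific analytics engine.

    ```python
    import pandas as pd

    # Internal source: CRM export with one row per order (hypothetical file).
    orders = pd.read_csv("crm_orders.csv")             # columns: customer_id, region, revenue
    # External source: third-party market data keyed by region (hypothetical file).
    market = pd.read_csv("market_size_by_region.csv")  # columns: region, market_size

    # Objective 1: blend internal and external data on a shared key.
    blended = orders.merge(market, on="region", how="left")

    # Objective 2: distill the blend into one actionable insight:
    # revenue share of the addressable market per region, worst first.
    insight = (
        blended.groupby("region")
        .agg(revenue=("revenue", "sum"), market_size=("market_size", "first"))
        .assign(market_share=lambda df: df["revenue"] / df["market_size"])
        .sort_values("market_share")
    )
    print(insight)
    ```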

    Considering the increasing complexity of data, which is growing in size, changing rapidly, and spread across many disparate sources, accessibility alone is not enough.

    "Being data-driven is not about seeing a few canned reports at the beginning of every day or week; it's about giving the business decision makers the power to explore data independently, even if they're working with big or disparate data sources. They need to be able to ask questions and receive answers that are based on data before the decision is actually made -- today in many places the executive makes a 'gut-instinct' decision and then looks for the data to justify it. But if data is readily available and easy to analyze and to present in visual form, it becomes an inherent part of the decision-making process -- and that's what makes an organization truly data-driven," said Orad.

    The surge of a data-driven culture has also had a significant impact on how companies are structured. The complexity of data forces companies to merge different departments to harness their individual strengths and make the most of data. Being data-driven means making use of massive quantities of unstructured data – text, video, voice. In the past this belonged to the IT department, which had a tough time extracting insights from it.

     

    From darkness to light: how to become data-driven

    According to most experts, the road to data fluency is not easy or glamorous.

    "To become a data-driven company the belief in the importance of the integrity and quality of information needs to perme

    ate the culture of the company at all levels. It is not enough to start a formal data governance program, becoming data-driven requires a disciplined shift in the mindset of all employees towards maintaining the integrity and quality of their data," said Chris Jennings, vice president of technology services at Collaborative Consulting.

    Yigal from Logz.io asks companies to treat data as religion.

    "Companies need to be religious with demanding to see the data before and after changes are being made. Especially in fast moving start-up companies where changes are easily made it's prudent to track the impact of every change."

    To make sure data is not only in the hands of IT and other data enthusiasts, organizations need to embrace a switch in culture. 

    Most experts agree that business intelligence needs to be in the hands of every decision maker in the company to make sure the entire staff is aligned and fighting the same battles.

    "This way, there are no 'different versions of the truth' floating around in disparate spreadsheets, and every user has a consistent experience across platforms," said Ulrik Pederson, CTO of TARGIT.

     

    Once the organization is prepared for the switch, there are three key components of becoming a data-driven organization; a minimal sketch of how they fit together follows the list.

    • Build a data warehouse
    • Connect the most critical data sources
    • Share data through dashboards, analyses, mobile business intelligence, storyboards, and reports
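
    As a rough illustration of how these three components fit together, the sketch below uses SQLite as a stand-in warehouse, loads two hypothetical source files, and runs the query a shared dashboard tile might display. The file, table, and column names are invented for the example.

    ```python
    import sqlite3
    import pandas as pd

    # 1. Build a (stand-in) data warehouse.
    warehouse = sqlite3.connect("warehouse.db")

    # 2. Connect the most critical data sources (hypothetical CSV exports).
    pd.read_csv("crm_contacts.csv").to_sql("contacts", warehouse, if_exists="replace", index=False)
    pd.read_csv("web_orders.csv").to_sql("orders", warehouse, if_exists="replace", index=False)

    # 3. Share data: the query behind a dashboard tile showing revenue per region.
    dashboard_tile = pd.read_sql_query(
        """
        SELECT c.region, SUM(o.amount) AS revenue
        FROM orders AS o
        JOIN contacts AS c ON o.customer_id = c.customer_id
        GROUP BY c.region
        ORDER BY revenue DESC
        """,
        warehouse,
    )
    print(dashboard_tile)
    ```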

    As data volume, variety, and velocity increase, so does the organization's ability to make use of it, especially if the cultural and technical elements are in place. Analytics, resources, and skills should not be limited to a few departments, and everyone, from sales to marketing and IT to finance, should leverage the benefits of data.

    Source: InfoWorld. This article is published as part of the IDG Contributor Network.

  • E-commerce and the growing importance of data

    E-commerce and the growing importance of data

    E-commerce is claiming a bigger role in global retail. In the US for example, e-commerce currently accounts for approximately 10% of all retail sales, a number that is projected to increase to nearly 18% by 2021. To a large extent, the e-commerce of the present exists in the shadow of the industry’s early entrant and top player, Amazon. Financial analysts predict that the retail giant will control 50% of the US’ online retail sales by 2021, leaving other e-commerce stores frantically trying to take a page out of the company’s incredibly successful online retail playbook.

    While it seems unlikely that another mega-retailer will rise to challenge Amazon’s e-commerce business in the near future, at least 50% of the online retail market is wide open. Smaller and niche e-commerce stores have a large opportunity to reach specialized audiences, create return customers, and cultivate persistent brand loyalty. Amazon may have had a first-mover advantage, but the rise in big data and the ease of access to analytics means that smaller companies can find areas in which to compete and improve margins. As e-retailers look for ways to expand revenues while remaining lean, data offers a way forward for smart scalability.

    Upend your back-end

    While data can improve e-commerce’s customer-facing interactions, it can have just as major an impact on the customer experience factors that take place off camera. Designing products that customers want, having products in stock, making sure that products ship on schedule: all these kinds of back-end operations play a part in shaping customer experience and satisfaction. In order to shift e-commerce from a product-centric to a consumer-centric model, e-commerce companies need to invest in unifying customer data to inform internal processes and provide faster, smarter services.

    The field of drop shipping, for instance, is coming into its own thanks to smart data applications. Platforms like Oberlo are leveraging prescriptive analytics to enable intelligent product selection for Shopify stores, helping them curate trending inventory that sells, allowing almost anyone to create their own e-store. Just as every customer touchpoint can be enhanced with big data, e-commerce companies that apply unified big data solutions to their behind-the-scenes operations benefit from streamlined processes and workflows.

    Moreover, e-commerce companies that harmonize data across departments can identify purchasing trends and act on real-time data to optimize inventory processes. Using centralized data warehouse software like Snowflake empowers companies to create a single version of customer truth to automate reordering points and determine what items they should be stocking in the future. Other factors, such as pricing decisions, can also be finessed using big data to generate specific prices per product that match customer expectations and subsequently sell better.
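
    As an illustration of what automating a reordering point can look like, here is a minimal sketch that derives one from historical sales data using the classic lead-time-demand-plus-safety-stock formula. The input file, lead time, and service factor are assumptions for the example, not functionality specific to any warehouse product.

    ```python
    import pandas as pd

    # Historical sales pulled from the central warehouse (hypothetical export).
    daily_sales = pd.read_csv("daily_unit_sales.csv")  # columns: sku, date, units_sold

    LEAD_TIME_DAYS = 7     # assumed supplier lead time
    SERVICE_FACTOR = 1.65  # roughly a 95% service level for normally distributed demand

    stats = daily_sales.groupby("sku")["units_sold"].agg(["mean", "std"])

    # Reorder point = expected demand during the lead time + safety stock
    # to absorb day-to-day demand variability.
    stats["reorder_point"] = (
        stats["mean"] * LEAD_TIME_DAYS
        + SERVICE_FACTOR * stats["std"] * (LEAD_TIME_DAYS ** 0.5)
    )
    print(stats["reorder_point"].round().sort_values(ascending=False))
    ```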

    Data transforms the customer experience

    When it comes to how data can impact the overall customer experience, e-commerce companies don’t have to reinvent the wheel. There’s a plethora of research that novice and veteran data explorers can draw on when it comes to optimizing customer experiences on their websites. General findings on the time it takes for customers to form an opinion of a website, customers’ mobile experience expectations, best times to send promotional emails and many more metrics can guide designers and developers tasked with improving e-commerce site traffic.

    However, e-commerce sites that are interested in more benefits will need to invest in more specific data tools that provide a 360-degree view of their customers. Prescriptive analytic tools like Tableau empower teams to connect the customer dots by synthesizing data across devices and platforms. Data becomes valuable as it provides insights that allow companies to make smarter decisions based on each consumer, identify inbound marketing opportunities, and automate recommendations and discounts based on the customer’s previous behavior.

    Data can also inspire changes in a field that has always dominated the customer experience: customer support. The digital revolution has brought substantial changes in the once sleepy field of customer service, pioneering new ways of direct communication with agents via social media and introducing the now ubiquitous AI chatbots. In order to provide the highest levels of customer satisfaction throughout these new initiatives, customer support can utilize data to anticipate when they might need more human agents staffing social media channels or the type of AI persona that their customers want to deal with. By improving customer service with data, e-commerce companies can improve the entire customer experience.

    Grow with your data

    As more and more data services migrate to the cloud, e-commerce companies have ever-expanding access to flexible data solutions that both fuel growth and scale alongside the businesses they’re helping. Without physical stores to facilitate face-to-face relationships, e-commerce companies are tasked with transforming their digital stores into online spaces that customers connect with and ultimately want to purchase from again and again.

    Data holds the key to this revolution. Instead of trying to force their agenda upon customers or engage in wild speculations about customer desires, e-commerce stores can use data to craft narratives that engage customers, create a loyal brand following, and drive increasing profits. With only about 2.5% of e-commerce and web visits converting to sales on average, e-commerce companies that want to stay competitive must open themselves up to big data and the growth opportunities it offers.

    Author: Ralph Tkatchuck

    Source: Dataconomy

  • A data-driven organizational culture: where to begin?

    Companies with a data-driven organizational culture reap the rewards. Most companies are well aware of the benefits of using data, yet implementation often lags behind. That is hardly surprising: the transition at different organizational levels is a real challenge, and cultural change takes time.

    Concepts such as big data, business intelligence, analytics, and data science are still fairly abstract for many organizations. With all the data available you effectively hold gold in your hands, but how do you handle data intelligently, how much value does it contain, and which approach is the key to success? Organizations that possess large amounts of data do not automatically gain a competitive advantage. It is the organizations that use data as the basis for their decisions that benefit the most.


    Mindset
    For companies that want to use (big) data effectively, it is essential to have a culture in which decision-making is based on data. Without a data-driven culture, employees are not encouraged to actually use the technologies. Changing the mindset is therefore crucial. But how do you achieve a cultural shift? And how do you ensure that, ultimately, everyone in your organization embraces data-driven decisions? Below you will find a number of directly applicable tips to influence your organizational culture yourself.


    Start by defining and measuring KPIs
    What you do not measure, you cannot optimize. The right KPIs are the foundation of data-driven decisions. Make sure you have a clear picture of which data relate to the success of your organization. Which data points matter, and how do they relate to your business objectives? Measure what can be quantified. If quantification is not possible, collect qualitative data instead. Once your KPIs are sharply defined, you can make sound, well-founded decisions and recommendations, and those around you will see the added value as well.


    Be an ambassador yourself
    Make your own decisions data-driven and encourage others to do the same. It is important to use as much of your organization's data as possible to make data-driven decisions. When data reinforces gut feelings, or better still, contradicts them, the value of a data-driven approach will naturally come to the fore.


    Start today
    The road to a data-driven organization is long, but you can take the first steps today. Start, for example, by identifying and executing a few data levers that deliver data-driven results, and grow from there. Make the message clear to your colleagues and show them the importance of data-driven decisions. Facilitate the use of data. And show that it works.

    Source: Twinkle

  • A first impression of the merger between Cloudera and Hortonworks

    A first impression of the merger between Cloudera and Hortonworks

    A few months ago it was announced that big data companies Cloudera and Hortonworks would merge. The deal has since been approved, and Cloudera and Hortonworks are continuing as one company. Techzine spoke with Wim Stoop, senior product marketing manager at Cloudera. Stoop knows all the ins and outs of the vision behind this merger and what it means for companies and data analysts who work with the products of the two companies.

    Stoop explains that this merger is more or less the perfect marriage. Both companies focus on big data based on Hadoop and have specialized in it over the past years. Hortonworks, for example, is very strong in Hadoop Data Flow (HDF): working with streaming data that needs to be added to the Hadoop platform quickly.

    Cloudera data science workbench

    With its data science workbench, Cloudera has a strong solution for data analysts. The workbench lets them combine and analyze data quickly and easily, without immediately needing vast amounts of computing power. Cloudera's workbench allows you to experiment and test to see what outcomes it produces before applying it at scale. Its main advantage is that it supports a large number of programming languages, so every data analyst can work in their favorite language. The workbench also records exactly which steps were taken to arrive at a result. The outcome matters, but the algorithm and methods that lead to the final result are at least as important.

    The route to a single solution

    Looking deeper, there are of course many more areas where either Hortonworks or Cloudera excels, or where one technology is just a bit better or more efficient than the other. That will force the new company to make hard choices, but according to Stoop it will all work out. The need for a good data platform is enormous, so having to make choices is inevitable. Ultimately, the company is responding to the criticism leveled at Hadoop. Hadoop itself forms the basis of the database, but on top of it there are so many different modules that can ingest, read, or process data that the overview is sometimes hard to find. The fact that there are so many solutions is due to the open source character and the support of companies like Cloudera and Hortonworks, which are the largest contributors to many projects. That too will change with this merger. A new platform called the Cloudera Data Platform will arrive later this year, combining the best components of Hortonworks and Cloudera. It also means that where projects or modules conflict, the outcome will be good news for one and bad news for the other. For processing metadata, the two companies currently use different solutions; in the Cloudera Data Platform we will see only one of them. That means the number of modules shrinks somewhat and everything becomes more manageable, which is positive for everyone involved.

    Cloudera Data Platform

    The name of the new company had not yet come up. The companies opted for a merger, but ultimately the Hortonworks name will simply disappear. The company continues as Cloudera, hence the name Cloudera Data Platform. The intention is for the Cloudera Data Platform to become available later this year so customers can start testing it. Once the platform is stable and mature enough, customers will be advised to migrate to it. All existing Cloudera and Hortonworks products will eventually disappear, but they will remain fully supported until the end of 2022. After that, everyone will have to move to the Cloudera Data Platform. Cloudera has already prepared for a migration path in the most recent versions of its current products, and Hortonworks will now do the same. The company will take steps so that existing products and the new Data Platform can work together during the migration to the new platform.

    Shared data experience

    Another innovation that Stoop expects to become increasingly important is the shared data experience. When customers use Cloudera products, these Hadoop environments can easily be linked together so that resources (CPU, GPU, memory) can also be combined when analyzing data. Suppose a company has Cloudera environments for data analysis in its own data centers as well as on cloud platforms, and suddenly has to analyze a very large project. In that case it could combine all those environments and deploy them jointly. It is also possible to combine data from local offices or branches, for example.

    The merger enables more innovation

    According to Stoop, a huge advantage of this merger is the development capacity that becomes available to build new, innovative solutions. The companies were often working on similar projects separately. Both contributed to a different project for handling metadata in Hadoop, for example, which meant one of the two was reinventing the wheel; that is no longer necessary. Given the current labor market, finding developers with the right passion and knowledge for data analysis is extremely difficult. With this merger, work can be done far more efficiently, and a considerable number of teams can be assigned to developing new, innovative solutions. This week the Hortonworks Datasummit takes place in Barcelona, where more will undoubtedly be announced about the merger, the products, and the status of the new Cloudera Data Platform.

    Author: Coen van Eenbergen

    Source: Techzine

     

  • First Big Data Hub in the Netherlands opened

    This afternoon the Amsterdam ArenA opens its doors to the first so-called Big Data Hub in the Netherlands. There, entrepreneurs can share data with governments and scientists and develop data-driven innovations in all kinds of areas, from entertainment to mobility.

    The Data Hub focuses largely on applications in and around the ArenA, such as steering visitor flows, optimizing the visitor experience, improving safety, making better use of water and electricity, and informing emergency services faster and more effectively.

    The Big Data Value Center is part of the previously launched Amsterdam Innovation ArenA. One of the facilities will be a so-called integrated control room in which all data from mobility flows, utilities, and security cameras are combined for crime prevention and for guiding optimal evacuation routes via LED lighting in the floor.

    Transport companies can better streamline mobility during peak periods. Energy, waste, and water companies use the data to carry out preventive maintenance or make their processes more sustainable.

    During events, the data can be used to enhance the experience via social media and smartphones, by following players on the pitch or by letting visitors create and share their own footage. Participants in the Hub can test these and their own concepts and datasets in this operational environment.

    The main partners are KPMG, the Kamer van Koophandel, TNO, and Commit2Data. These initiators hope that the collaborating parties will carry the solutions over to (other) host cities of the European Championship, or to comparable locations such as Schiphol or the Floriade.

    The Big Data Hub Metropoolregio Amsterdam is the first of four hubs and has a strong focus on the creative industry. Later this year it will be followed by the Big Data Hub for, among other things, Security and Logistics in the The Hague/Rotterdam metropolitan region. By the end of 2017, hubs in the northern Netherlands (Energy) and Noord-Brabant (Manufacturing) should also be up and running.

    Source: Emerce, 12 September 2016

     

     

  • Effective data analysis methods in 10 steps

    Effective data analysis methods in 10 steps

    In this data-rich age, understanding how to analyze and extract true meaning from the digital insights available to our business is one of the primary drivers of success.

    Despite the colossal volume of data we create every day, a mere 0.5% is actually analyzed and used for data discovery, improvement, and intelligence. While that may not seem like much, considering the amount of digital information we have at our fingertips, half a percent still accounts for a huge amount of data.

    With so much data and so little time, knowing how to collect, curate, organize, and make sense of all of this potentially business-boosting information can be a minefield, but online data analysis is the solution.

    To help you understand the potential of analysis and how you can use it to enhance your business practices, we will answer a host of important analytical questions. Not only will we explore data analysis methods and techniques, but we’ll also look at different types of data analysis while demonstrating how to do data analysis in the real world with a 10-step blueprint for success.

    What is a data analysis method?

    Data analysis methods focus on strategic approaches to taking raw data, mining for insights that are relevant to a business’s primary goals, and drilling down into this information to transform metrics, facts, and figures into initiatives that drive improvement.

    There are various methods for data analysis, largely based on two core areas: quantitative data analysis methods and data analysis methods in qualitative research.

    Gaining a better understanding of different data analysis techniques and methods, in quantitative research as well as qualitative insights, will give your information analyzing efforts a more clearly defined direction, so it’s worth taking the time to allow this particular knowledge to sink in.

    Now that we’ve answered the question ‘what is data analysis?’ and considered the different types of data analysis methods, it’s time to dig deeper into how to do data analysis by working through these 10 essential elements.

    1. Collaborate your needs

    Before you begin to analyze your data or drill down into any analysis techniques, it’s crucial to sit down collaboratively with all key stakeholders within your organization, decide on your primary campaign or strategic goals, and gain a fundamental understanding of the types of insights that will best benefit your progress or provide you with the level of vision you need to evolve your organization.

    2. Establish your questions

    Once you’ve outlined your core objectives, you should consider which questions will need answering to help you achieve your mission. This is one of the most important steps in data analytics as it will shape the very foundations of your success.

    To help you ask the right things and ensure your data works for you, you have to ask the right data analysis questions.

    3. Harvest your data

    After giving your data analytics methodology real direction and knowing which questions need answering to extract optimum value from the information available to your organization, you should decide on your most valuable data sources and start collecting your insights, the most fundamental of all data analysis techniques.

    4. Set your KPIs

    Once you’ve set your data sources, started to gather the raw data you consider to potentially offer value, and established clearcut questions you want your insights to answer, you need to set a host of key performance indicators (KPIs) that will help you track, measure, and shape your progress in a number of key areas.

    KPIs are critical to both data analysis methods in qualitative research and data analysis methods in quantitative research. This is one of the primary methods of analyzing data you certainly shouldn’t overlook.

    To help you set the best possible KPIs for your initiatives and activities, explore our collection of key performance indicator examples.

    5. Omit useless data

    Having defined your mission and bestowed your data analysis techniques and methods with true purpose, you should explore the raw data you’ve collected from all sources and use your KPIs as a reference for chopping out any information you deem to be useless.

    Trimming the informational fat is one of the most crucial steps of data analysis as it will allow you to focus your analytical efforts and squeeze every drop of value from the remaining ‘lean’ information.

    Any stats, facts, figures, or metrics that don’t align with your business goals or fit with your KPI management strategies should be eliminated from the equation.
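
    A minimal sketch of this trimming step, assuming a pandas workflow: map each KPI to the raw columns that feed it and drop everything else. The KPI names, column names, and input file are invented for the example.

    ```python
    import pandas as pd

    raw = pd.read_csv("raw_export.csv")  # hypothetical multi-source export

    # Which raw columns actually feed the KPIs agreed in the earlier steps?
    kpi_columns = {
        "conversion_rate": ["sessions", "orders"],
        "average_order_value": ["orders", "revenue"],
        "churn_rate": ["customers_start", "customers_lost"],
    }

    # Keep only the fields mapped to a KPI; drop the rest.
    keep = sorted({col for cols in kpi_columns.values() for col in cols})
    lean = raw[[col for col in keep if col in raw.columns]]

    print(f"Kept {lean.shape[1]} of {raw.shape[1]} columns for analysis")
    ```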

    6. Conduct statistical analysis

    One of the most pivotal steps of data analysis methods is statistical analysis.

    This analysis method focuses on aspects including cluster, cohort, regression, factor, and neural networks and will ultimately give your data analysis methodology a more logical direction.

    Here is a quick glossary of these vital statistical analysis terms for your reference, followed by a brief illustrative sketch:

    • Cluster: The action of grouping a set of elements in a way that said elements are more similar (in a particular sense) to each other than to those in other groups, hence the term ‘cluster’.
    • Cohort: A subset of behavioral analytics that takes insights from a given data set (e.g. a web application or CMS) and instead of looking at everything as one wider unit, each element is broken down into related groups.
    • Regression: A definitive set of statistical processes centered on estimating the relationships among particular variables to gain a deeper understanding of particular trends or patterns.
    • Factor: A statistical practice utilized to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called ‘factors’. The aim here is to uncover independent latent variables.
    • Neural networks: A neural network is a form of machine learning that is too broad to summarize in a single entry, but it is typically used to model complex, non-linear relationships in data.
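
    The sketch below illustrates two of the terms above, cluster and regression, on synthetic data using scikit-learn; it is an illustration of the concepts rather than a recipe for any particular dataset.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(42)

    # Cluster: group 300 customers by two behavioral features.
    customers = rng.normal(size=(300, 2))
    segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(customers)

    # Regression: estimate how ad spend relates to revenue on synthetic data.
    ad_spend = rng.uniform(0, 100, size=(200, 1))
    revenue = 3.2 * ad_spend[:, 0] + rng.normal(scale=10, size=200)
    model = LinearRegression().fit(ad_spend, revenue)

    print("cluster sizes:", np.bincount(segments))
    print("estimated revenue per unit of ad spend:", round(model.coef_[0], 2))
    ```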

    7. Build a data management roadmap

    While (at this point) this particular step is optional (you will have already gained a wealth of insight and formed a fairly sound strategy by now), creating a data governance roadmap will help your data analysis methods and techniques become successful on a more sustainable basis. These roadmaps, if developed properly, are also built so they can be tweaked and scaled over time.

    Invest ample time in developing a roadmap that will help you store, manage, and handle your data internally, and you will make your analysis techniques all the more fluid and functional.

    8. Integrate technology

    There are many ways to analyze data, but one of the most vital aspects of analytical success in a business context is integrating the right decision support software and technology.

    Robust analysis platforms will not only allow you to pull critical data from your most valuable sources while working with dynamic KPIs that will offer you actionable insights; they will also present the information in a digestible, visual, interactive format from one central, live dashboard: a data analytics methodology you can count on.

    By integrating the right technology for your statistical method data analysis and core data analytics methodology, you’ll avoid fragmenting your insights, saving you time and effort while allowing you to enjoy the maximum value from your business’s most valuable insights.

    9. Answer your questions

    By considering each of the above efforts, working with the right technology, and fostering a cohesive internal culture where everyone buys into the different ways to analyze data as well as the power of digital intelligence, you will swiftly start to answer your most important, burning business questions.

    10. Visualize your data

    Arguably, the best way to make your data analysis concepts accessible across the organization is through data visualization. Online data visualization is a powerful tool because it lets you tell a story with your metrics, allowing users across the business to extract meaningful insights that aid business evolution. It also covers all the different ways to analyze data.

    The purpose of data analysis is to make your entire organization more informed and intelligent, and with the right platform or dashboard, this can be simpler than you think.

    Data analysis in the big data environment

    Big data is invaluable to today’s businesses, and by using different methods for data analysis, it’s possible to view your data in a way that can help you turn insight into positive action.

    To inspire your efforts and put the importance of big data into context, here are some insights that could prove helpful: facts that will help shape your big data analysis techniques.

    • By 2020, around 7 megabytes of new information will be generated every second for every single person on the planet.
    • A 10% boost in data accessibility will result in more than $65 million extra net income for your average Fortune 1000 company.
    • 90% of the world’s big data was created in the past three years.
    • According to Accenture, 79% of notable business executives agree that companies that fail to embrace big data will lose their competitive position and could face extinction. Moreover, 83% of business execs have implemented big data projects to gain a competitive edge.

    Data analysis concepts may come in many forms, but fundamentally, any solid data analysis methodology will help to make your business more streamlined, cohesive, insightful and successful than ever before.

    Author: Sandra Durcevic

    Source: Datapine

  • European Union to Scrutinize Usage of Big Data by Large Internet Companies

    The European Union is considering whether the way large Internet companies, such as Alphabet Inc.’s Google or Facebook Inc., collect vast quantities of data is in breach of antitrust rules, the bloc’s competition chief said Sunday.

    “If a company’s use of data is so bad for competition that it outweighs the benefits, we may have to step in to restore a level playing field,” said Margrethe Vestager, European Commissioner for Competition, according to a text of her speech delivered at the Digital Life Design conference in Munich, Germany.

    “We continue to look carefully at this issue,” she said, adding that while no competition problems have yet been found in this area, “this certainly doesn’t mean we never will” find them in the future.

    Her comments highlight the increased focus that regulators give to the use of so-called big data—large sets of personal information that are increasingly important for digital businesses, even though people generally hand over the information voluntarily when they use free services.

    The data can help firms target ways to make business operations more efficient. Companies increasingly are also collecting more data as a greater range of devices—from fitness trackers and smoke detectors to home-heating meters—are being connected to the Web, a phenomenon known as the “Internet of Things.”

    “But if just a few companies control the data you need to satisfy customers and cut costs, that could give them the power to drive their rivals out of the market,” Ms. Vestager said.

    The concern is that huge data sets compiled by large Internet firms could give these companies an unfair advantage by essentially erecting barriers to new competition, some experts say. Incumbent firms would amass detailed profiles of their consumers that would allow them to target advertising with precision, while new rivals could find themselves too far behind to compete.

    This isn’t the first time Ms. Vestager has expressed interest in how companies use big data. On Sunday, she laid out some details about how the European Commission is looking into the issue.

    Ms. Vestager said the commission would be careful to differentiate between different types of data, since some forms of information can become obsolete quickly, making concerns of market dominance moot.

    She also said the EU would look into why some companies can’t acquire information that is as useful as the data that other competing firms have.

    “What’s to stop them [companies] from collecting the same data from their customers, or buying it from a data-analytics company?” she said.

    Lawyers representing U.S. tech firms have said previously that competition concerns over data are misguided. They said data isn’t exclusive since many different companies can hold the same information on people’s names, addresses and credit-card details, for example. It is also easy for consumers to switch between platforms, they said.

    As for how companies protect their consumers’ data, Ms. Vestager said that was beyond her scope and pointed to the new EU-wide data-privacy rules agreed late last year.

    Ms. Vestager also said she would publish a preliminary report in the middle of the year, as the next formal step in an investigation into whether Internet commerce companies, such as Amazon.com Inc., are violating antitrust rules by restricting cross-border trade.

    “With so much at stake, we need to act quickly when we discover problems,” she said, in reference to business contracts that aim to keep national markets separate.

    To start that debate, the commissioner said she would publish a paper before Easter outlining the views of relevant parties affected or involved in geo-blocking, a practice to discriminate via price or the range of goods a company offers based on a customer’s location.

    The commission in September launched a public questionnaire to gather more information about the practice of geo-blocking.

    Source: The Wall Street Journal

  • Facing the major challenges that come with big data

    Facing the major challenges that come with big data

    Worldwide, over 2.5 quintillion bytes of data are created every day. And with the expansion of the Internet of Things (IoT), that pace is increasing. 90% of the current data in the world was generated in the last two years alone. When it comes to business, if you’re a forward-thinking, digitally transforming organization, you’re going to be dealing with data. A lot of data. Big data.

    Challenges faced by businesses

    While simply collecting lots of data presents comparatively few problems, most businesses run into two significant roadblocks in its use: extracting value and ensuring responsible handling of data to the standard required by data privacy legislation like GDPR. What most people don’t appreciate is the sheer size and complexity of the data sets that organizations have to store and the related IT effort, requiring teams of people working on processes to ensure that others can access the right data in the right way, when they need it, to drive essential business functions. All while ensuring personal information is treated appropriately.

    The problem comes when you’ve got multiple teams around the world, all running to different beats, without synchronizing. It’s a bit like different teams of home builders, starting work independently, from different corners of a new house. If they have all got their own methods and bricks, then by the time they meet in the middle, their efforts won’t match up. It’s the same in the world of IT. If one team is successful, then all teams should be able to learn those lessons of best practice. Meanwhile, siloed behavior can become “free form development” where developers write code to suit a specific problem that their department is facing, without reference to similar or diverse problems that other departments may be experiencing.

    In addition, there often simply aren’t enough builders to go around to get these data projects turned around quickly, which can be a problem in the face of heightening business demand. In the scramble to get things done at the pace of modern business, at the very least there will be some duplication of effort, but there’s also a high chance of confusion, and the foundations for future data storage and analysis won’t be firm. Creating a unified, standard approach to data processing is critical – as is finding a way to implement it with the lowest possible level of resource, at the fastest possible speeds.

    Data Vault automation

    One of the ways businesses can organize data to meet both the needs for standardization and flexibility is in a Data Vault environment. This data warehousing methodology is designed to bring together information from multiple different teams and systems into a centralized repository, providing a bedrock of information that teams can use to make decisions – it includes all of the data, all of the time, ensuring that no information is missed out of the process.

    However, while a Data Vault design is a good architect’s drawing, it won’t get the whole house built on its own. Developers can still code and build it manually over time but given its complexity they certainly cannot do this quickly, and potentially may not be able to do it in a way that can stand up to the scrutiny of data protection regulations like GDPR. Building a Data Vault environment by hand, even using standard templates, can be incredibly laborious and potentially error prone.

    This is where Data Vault automation comes in, taking care of the 90% or so of an organization’s data infrastructure that fits standardized templates and the stringent requirements that the Data Vault 2.0 methodology demands. Data vault automation can lay out the core landscape of a Data Vault, as well as make use of reliable, consistent metadata to ensure information, including personal information, can be monitored both at its source and over time as records are changed.
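
    To illustrate the general idea of metadata-driven automation (not the Data Vault 2.0 toolchain itself), the sketch below generates hub-table DDL from a small metadata dictionary instead of hand-coding each table. The table and column names follow a simplified, assumed convention.

    ```python
    # Hub definitions captured as metadata rather than hand-written SQL.
    HUBS = {
        "customer": {"business_key": "customer_number"},
        "product": {"business_key": "product_code"},
    }

    def hub_ddl(name: str, business_key: str) -> str:
        """Render a CREATE TABLE statement for one hub from its metadata."""
        return (
            f"CREATE TABLE hub_{name} (\n"
            f"    {name}_hash_key CHAR(32) PRIMARY KEY,\n"
            f"    {business_key} VARCHAR(100) NOT NULL,\n"
            f"    load_date TIMESTAMP NOT NULL,\n"
            f"    record_source VARCHAR(50) NOT NULL\n"
            f");"
        )

    for hub_name, meta in HUBS.items():
        print(hub_ddl(hub_name, meta["business_key"]))
    ```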

    Author: Dan Linstedt

    Source: Insidebigdata

  • Forrester names SAS a leader in Enterprise Insight Platforms

    SAS has been named a leader in The Forrester Wave: Enterprise Insight Platforms, Q1 2019. The report notes that "SAS Viya is a modern architecture with a single powerful analytical engine. The SAS platform also offers the tightest integration we have seen between different analytics capabilities, data preparation, and governance."

    Today's successful organizations are driven by analytical insights, not intuition. SAS Viya on the SAS Platform gives companies leading analytics capabilities to support business decisions with both speed and scalability. Using a solid and coherent environment, organizations can manipulate and explore their data at scale. SAS Viya also provides access to advanced analytics and artificial intelligence (AI), with an additional layer of transparency and interpretability for AI-generated decisions. This opens up the 'black box' of AI for data scientists and business users alike.

    The full analytics life cycle

    According to Sarah Gates, Product Marketing Manager for the SAS Platform, companies want to rely on a comprehensive platform that orchestrates data management, analytics, and development tools to generate insights that support their decisions. "Producing results that are both fast and reliable is critical. SAS Viya provides this support across the full analytics life cycle, from data to discovery to deployment."

    Forrester's report states that SAS has a first-class toolset for analytics, prediction, and streaming. "Support for notebooks, multi-language programming, and more cloud options round out SAS Viya and make it a good choice for companies with business-critical analytics needs." SAS scores highest in the analytics tools category and achieved the highest possible score in the market presence category.

     

    Source: BI-Platform

     

  • Forrester’s Top Trends For Customer Service In 2016

    It’s a no-brainer that good customer service experiences boost satisfaction, loyalty, and can influence top line revenue. Good service, whether it’s to answer a customer’s question prior to purchase or to help a customer resolve an issue post-purchase, should be easy, effective, and strive to create an emotional bond between the customer and the company. Here are 5 top trends – out of a total of 10 – that I am keeping my eye on. A full report highlighting all trends can be found here.

    Trend 1: Companies Will Make Self Service Easier. In 2015, we found that web and mobile self-service interactions exceeded interactions over live-assist channels, which are increasingly used by customers as escalation paths to answer harder questions whose answers they can’t find online. In 2016, customer service organizations will make self-service easier for customers to use by shoring up its foundations and solidifying their knowledge-management strategy. They will start to explore virtual agents and communities to extend the reach of curated content. They will start embedding knowledge into devices — like Xerox does with its printers — or delivering it via wearables to a remote service technician.

    Trend 2: Field Service Will Empower Customers To Control Their Time. 73% of consumers say that valuing their time is the most important thing a company can do to provide them with good service — whether on a call, in a chat, or while waiting for a service technician to troubleshoot and fix their product. In 2016, customer service organizations will better support customer journeys that start with an agent-assisted service interaction and end with a service call. They will explore lighter-weight field service management capabilities, which give customers self-service appointment management capabilities and allow agents to efficiently dispatch technicians and optimize their schedules.

    Trend 3: Prescriptive Advice Will Power Offers, Decisions, And Connections. Decisioning — automatically deciding a customer’s or system’s next action — is starting to be heavily leveraged in customer service. In 2016, organizations will use analytics in a much more prescriptive manner – for example to prescribe the right set of steps for customers or agents to more effectively service customers; to correlate online behavior with requests for service and prescribe changes to agent schedules and forecasts. Analytics will be used to better route a customer to an agent who can most effectively answer a question based on skills and behavior data, or to better understand customer call patterns and preempt future calls.
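
    As a toy illustration of this kind of decisioning (not Forrester's or any vendor's actual implementation), the sketch below scores available agents against the inferred topic of a request using assumed skill profiles and weights, and routes the customer to the best match.

    ```python
    # Assumed agent skill profiles and satisfaction history.
    AGENTS = {
        "agent_a": {"skills": {"billing": 0.9, "returns": 0.4}, "csat": 4.6},
        "agent_b": {"skills": {"billing": 0.3, "returns": 0.8}, "csat": 4.2},
    }

    def route(topic: str) -> str:
        """Pick the agent with the best blend of topic skill and past CSAT."""
        def score(profile: dict) -> float:
            # Weights are illustrative: favor skill match, temper with past CSAT.
            return 0.7 * profile["skills"].get(topic, 0.0) + 0.3 * (profile["csat"] / 5)
        return max(AGENTS, key=lambda name: score(AGENTS[name]))

    print(route("billing"))  # -> agent_a
    print(route("returns"))  # -> agent_b
    ```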

    Trend 4: Insights From Connected Devices Will Trigger Preemptive Service and Turn Companies Into Services-Based Ones. Companies use support automation to preemptively diagnose and fix issues for connected devices. For example, Tesla Motors pushes software patches to connected cars. Nintendo monitors devices to understand customer actions right before the point of failure. In 2016, the Internet of Things (IoT) will continue to transform companies from being products-based to services-based. Examples abound where companies are starting to monitor the state of equipment via IoT, and realizing new streams of revenue because of their customer-centric focus. To make the business model of IoT work, companies must keep a close eye on emerging interoperability standards: device-to-network connectivity, data messaging formats that work under constrained network conditions, and data models to aggregate, connect with contact center solutions, and act on the data via triggers, alerts to service personnel or automated actions.

    Trend 5: The Customer Service Technology Ecosystem Will Consolidate. The customer service process involves complex software that falls into three main categories: queuing and routing technologies, customer relationship management (CRM) customer service technologies, and workforce optimization technologies. You need to use solutions from each of these three software categories, which you must integrate to deliver quality customer service. We believe that the combination of: 1) mature software categories in which vendors are struggling with growth opportunities; 2) the rise of robust software-as-a-service (SaaS) solutions in each category; 3) rising buyer frustration; and 4) the increasing importance of delivering simpler and smarter customer service makes for ripe conditions for further consolidation to happen in the marketplace. This consolidation will make it easier for buyers to support the end-to-end customer service experience with a single set of vendor solutions.

    Source: customer think

  • From data-driven to information-driven?

    Over the last several years, data analytics has become a driving force for organizations wanting to make informed decisions about their businesses and their customers. 

    With further advancements in open source analytic tools, faster storage and database performance, and the advent of sensors and IoT, IDC predicts the big data analytics market is on track to become a $200 billion industry by the end of this decade.

    Many organizations now understand the value of extracting relevant information from their enterprise data and using it for better decision-making, superior customer service and more efficient management. But to realize their highest potential in this space, organizations will have to evolve from being "data-driven” to being “information-driven.” While these two categories might sound similar, they’re actually quite different.

    In order to make a data-driven decision, a user must somehow find the data relevant to a query and then interpret it to resolve that query. The problem with this approach is there is no way to know the completeness and accuracy of the data found in any reliable way. 

    Being information-driven means having all of the relevant content and data from across the enterprise intelligently and securely processed into information that is contextual to the task at hand and aligned with the user’s goals.

    An information-driven approach is ideal for organizations in knowledge-intensive industries such as life sciences and finance where the number and volume of data sets are increasing and arriving from diverse sources. The approach has repeatedly proven to help research and development organizations within large pharmaceutical companies connect experts with other experts and knowledge across the organization to accelerate research, lab tests and clinical trials to be first to market with new drugs.

    Or think of maintenance engineers working at an airline manufacturer trying to address questions over an unexpected test procedure result. For this, they need to know immediately the particular equipment configuration, the relevant maintenance procedures for that aircraft and whether other cases with the same anomaly are known and how they were treated. They don’t have time to “go hunting” for information. The information-driven approach draws data from multiple locations, formats and languages for a complete picture of the issue at hand. 

    In the recent report, “Insights-Driven Businesses Set the Pace for Global Growth,” Forrester Research notes organizations that use better data to gain business insights will create a competitive advantage for future success. They are expected to grow at an average of more than 30 percent each year, and by 2020 are predicted to take $1.8 trillion annually from their less-informed peers.

    To achieve this level of insight, here are several ways to evolve into an information-driven organization. 

    Understand the meaning of multi-sourced data

    To be information-driven, organizations must have a comprehensive view of information and understand its meaning. If it were only about fielding queries and matching on keywords, a simple indexing approach would suffice. 

    The best results are obtained when multiple indexes are combined, each contributing a different perspective or emphasis. Indexes are designed to work in concert to provide the best results such as a full-text index for key terms and descriptions, a structured index for metadata and a semantic index that focuses on the meaning of the information. 
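
    A minimal sketch of the idea, assuming three scoring functions that stand in for a full-text, a structured (metadata), and a semantic index: each scores a document from its own perspective, and the final ranking blends the scores with weights chosen by the search team.

    ```python
    # Hypothetical stand-ins for three real indexes, each scoring a document
    # for a query from its own perspective.
    def fulltext_score(doc: dict, query: str) -> float:
        terms = query.lower().split()
        words = doc["text"].lower().split()
        return sum(words.count(t) for t in terms) / max(len(words), 1)

    def metadata_score(doc: dict, query: str) -> float:
        return 1.0 if query.lower() in (tag.lower() for tag in doc["tags"]) else 0.0

    def semantic_score(doc: dict, query: str) -> float:
        # Placeholder for an embedding-similarity lookup.
        return doc.get("embedding_similarity", {}).get(query, 0.0)

    WEIGHTS = (0.4, 0.2, 0.4)  # assumed relative trust in each index

    def combined_score(doc: dict, query: str) -> float:
        scores = (
            fulltext_score(doc, query),
            metadata_score(doc, query),
            semantic_score(doc, query),
        )
        return sum(w * s for w, s in zip(WEIGHTS, scores))

    doc = {"text": "clinical trial results for the new compound",
           "tags": ["pharma", "clinical"],
           "embedding_similarity": {"drug study": 0.8}}
    print(combined_score(doc, "clinical trial"))
    print(combined_score(doc, "drug study"))
    ```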

    Maintain strong security controls and develop contextual abilities

    Being information-driven also requires a tool that is enterprise-grade with strong security controls to support the complexities and multiple security layers, and contextual enrichment to learn an organization’s vernacular and language. 

    Capture and leverage relevant feedback from searches

    As queries are performed, information is captured about the system that interacts with the end user and leveraged in all subsequent searches. This approach ensures the quality of information improves as the system learns what documents are most used and valued the most. 

    Connect information along topical lines

    Connecting information along topical lines across all repositories allows information-driven organizations to expose and leverage their collective expertise. This is especially valuable in large organizations that are geographically distributed.

    As more people are connected, the overall organization becomes more responsive in including research and development, service and support and marketing and sales as needed. Everyone has the potential to be proficient in less time as new and existing employees learn new skills and have access to the expertise to take their work to the next level.

    By connecting related information across dispersed applications and repositories, employees can leverage 360-degree views and have more confidence they are getting holistic information about the topic they are interested in, whether it be a specific customer, a service that is provided, a sales opportunity or any other business entity critical to driving the business. 

    Leverage natural language processing

    A key to connecting information is natural language processing (NLP), which performs essential functions, including automated language detection and lexical analysis for speech tagging and compound word detection.

    NLP also provides the ability to automatically extract dozens of entity types, including concepts and named entities such as people, places and companies. It also enables text-mining agents integrated into the indexing engine that detect regular expressions and complex "shapes" that describe the likely meaning of specific terms and phrases, and then normalize them for use across the enterprise.
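
    As a small, concrete example of entity extraction, the snippet below uses the open-source spaCy library (one possible tool; the article does not prescribe a specific one) to pull people, places, and organizations out of a sentence. It assumes the en_core_web_sm model has been downloaded.

    ```python
    # Requires: pip install spacy && python -m spacy download en_core_web_sm
    import spacy

    nlp = spacy.load("en_core_web_sm")
    doc = nlp("Acme Corp opened a research lab in Boston, led by Dr. Jane Smith.")

    for ent in doc.ents:
        print(ent.text, "->", ent.label_)  # e.g. "Boston -> GPE", "Jane Smith -> PERSON"
    ```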

    Put Machine Learning to work

    Machine learning (ML) is becoming increasingly critical to enhancing and improving search results and relevancy. This is done during ingestion but also constantly in the background as humans interact with the system. The reason ML has become essential in recent years is that it can handle complexity beyond what’s possible with rules. 

    ML helps organizations become information-driven by analyzing and structuring content to both enrich and extract concepts such as entities and relationships. It can modify results through usage, incorporating human behavior into the calculation of relevance. And it can provide recommendations based on what is in the content (content-based) and by examining users’ interactions (collaborative filtering).
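
    Here is a minimal sketch of the content-based side of that idea: represent documents as TF-IDF vectors with scikit-learn and recommend the items most similar to the one a user just read. The toy corpus is invented; collaborative filtering would add user-interaction data on top of this.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # Toy corpus standing in for an enterprise content repository.
    docs = [
        "guide to clinical trial data management",
        "aircraft maintenance procedures for engineers",
        "clinical research insights for pharmaceutical teams",
        "quarterly finance report and market analysis",
    ]

    tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
    similarity = cosine_similarity(tfidf)

    just_read = 0  # the user just read the clinical trial guide
    ranked = similarity[just_read].argsort()[::-1]
    recommendations = [i for i in ranked if i != just_read][:2]
    print("recommend documents:", recommendations)
    ```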

    Taking these steps will help organizations become information-driven by connecting people with the relevant information, knowledge, expertise and insights necessary to ensure positive business outcomes. 

    Author: Alexandre Bilger

    Source: Information Management

  • From traditional Business to Smart Big Data leader

    In this post I outline how US agricultural manufacturer John Deere has transformed itself from a traditional manufacturing company to a big data leader. The post was first published in my column for Data Science Central


    John Deere has always been a pioneering company. Its eponymous founder personally designed, built and sold some of the first commercial steel ploughs. These made the lives of settlers moving into the Midwest during the middle of the 19th century much easier and established the company as an American legend.

    Often at the forefront of innovation, it is no surprise that it has embraced Big Data enthusiastically – assisting pioneers with the taming of the virtual wild frontier just as it did with the real one.

    In recent years, it has focused efforts on providing Big Data and Internet of Things solutions to let farmers (and in the case of their industrial division with the black and yellow logo, builders) make informed decisions based on real-time analysis of captured data.

    So in this post I want to take a look at some of John Deere’s innovations in the virtual realm, and how they are leading to change which is said to be “revolutionizing” the world of farming.

    Smart farms

    The world’s population is growing rapidly, which means there is always going to be an increasing demand for more food. With the idea of genetically modified food still not appealing to public appetites, increasing the efficiency of production of standard crops is key to this. To this end, John Deere has launched several Big Data-enabled services which let farmers benefit from crowdsourced, real-time monitoring of data collected from its thousands of users.

    They are designed by the company’s Intelligent Solutions Group, and the vision is that one day even large farms will be manageable by a small team of humans working alongside a fleet of robotic tools, all connected and communicating with each other.

    To this end, they are working on a suite of services to allow everything from land preparation to seeding, fertilizing and harvesting to be controlled from a central hub.

    The total land available can be split into sections and “Prescriptions” issued with precise instructions for seed density, depth and fertilization. These decisions are informed by Big Data – aggregated data from thousands of users feeding their own data back to the service for analysis.

    Crowd sourced agriculture

    Myjohndeere.com is an online portal which allows farmers to access data gathered from sensors attached to their own machinery as they work the fields, as well as aggregated data from other users around the world. It is also connected to external datasets including weather and financial data.

    These services allow farmers to make better informed decisions about how to use their equipment, where they will get the best results, and what return on investment their equipment is providing.

    For example, fuel usage of different combines can be monitored and correlated with their productivity levels. By analyzing the data from thousands of farms, working with many different crops in many different conditions, it is possible to fine-tune operations for optimum levels of production.

    The system also helps to minimize downtime by predicting, based on crowdsourced data, when and where equipment is likely to fail. This data can be shared with engineers who will stand ready to supply new parts and service machinery as and when it is needed – cutting down on waste caused by expensive machinery sitting idle.

    Another service is Farmsight, launched in 2011. It allows farmers to make proactive decisions about what crops to plant where, based on information gathered in their own fields and those of other users. This is where the “prescriptions” can be assigned to individual fields, or sections of fields, and machinery remotely reprogrammed to alter its behavior according to the “best practice” suggested by the analytics.

    As well as increasing farmers’ profits and hopefully creating cheaper, more abundant food for the world, there are potential environmental gains, too.

    Pesticides and fertilizer can often cause pollution of air and waterways, so having more information on the precise levels needed for optimum production means that no more than is necessary will be used.

    Who owns your agricultural data?

    Of course, with all of this data being generated and shared – there is one question which needs answering – who owns it?

    Deere offers what it calls its Deere Open Data Platform, which lets farmers share data with each other (or choose not to, if they wish) and also with third-party application developers, who can use the APIs to connect equipment from other manufacturers or to offer their own data analysis services.
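
    The article does not document the platform's actual endpoints, so the snippet below is only an illustrative sketch of what third-party access to such an API might look like; the base URL, token handling and field names are hypothetical placeholders, not the real MyJohnDeere API.

    # Illustrative only: endpoint, token and JSON fields are hypothetical placeholders,
    # not the real MyJohnDeere / Deere Open Data Platform API.
    import requests

    API_BASE = "https://api.example-equipment-platform.com/v1"  # hypothetical base URL
    TOKEN = "YOUR_OAUTH_TOKEN"                                   # obtained out of band

    def fetch_machine_telemetry(machine_id: str) -> dict:
        """Fetch engine-hour and fuel telemetry for one machine (hypothetical schema)."""
        resp = requests.get(
            f"{API_BASE}/machines/{machine_id}/telemetry",
            headers={"Authorization": f"Bearer {TOKEN}"},
            timeout=10,
        )
        resp.raise_for_status()
        return resp.json()

    if __name__ == "__main__":
        data = fetch_machine_telemetry("combine-001")
        print(data.get("engine_hours"), data.get("fuel_used_litres"))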

    But this has not stopped many farmers asking why they should effectively pay for their own data, and why John Deere and other companies providing similar services shouldn’t be paying them instead – according to American Farm Bureau Federation director Mary Kay Thatcher.

    Talks are currently ongoing between the AFBF and companies including John Deere, Monsanto and DuPont over how these concerns should be addressed. As well as privacy worries, there are concerns that having too much information could allow traders in financial markets to manipulate prices.

    Farming is one of the fundamental activities which makes us human and distinguishes us from animals. Once we developed farms, we no longer needed to constantly be on the move in the pursuit of food and fertile foraging spots, leading to the development of towns, cities and civilization.

    The future of farming?

    With the development of automation and Big Data, we are starting to delegate those responsibilities to robots – not because farmers are lazy (they really aren’t, as anyone who lives in an area where agricultural activity goes on will tell you!) but because they can often do it better.

    Sure, John Deere’s vision of vast areas of farmland managed by a man sitting at a computer terminal with a small team of helpers will lead to fewer employment opportunities for humans working the land, but that has been the trend for at least the last century, regardless.

    And the potential for huge positive change in a world facing overpopulation and insufficient food production, particularly in developing nations, is something that could benefit everyone on the planet.

    I hope you found this post interesting. I am always keen to hear your views on the topic and invite you to comment with any thoughts you might have.

     

    Author: Bernard Marr

     

  • Gaining control of big data with the help of NVMe


    Every day an unfathomable amount of data, nearly 2.5 quintillion bytes, is generated all around us. Some of it we see every day, such as pictures and videos on our phones, social media posts, banking and other apps.

    In addition, data is being generated behind the scenes by ubiquitous sensors and algorithms, whether to process transactions more quickly, gain real-time insights, crunch big data sets or simply meet customer expectations. Traditional storage architectures are struggling to keep up with all this data creation, leading IT teams to investigate new solutions to keep ahead and take advantage of the data boom.

    Some of the main challenges are understanding performance, removing data throughput bottlenecks and being able to plan for future capacity. Architecture can often lock businesses into legacy solutions, and performance needs can vary and change as data sets grow.

    Architectures designed and built around NVMe (non-volatile memory express) can provide the perfect balance, particularly for data-intensive applications that demand fast performance. This is extremely important for organizations that depend on speed, accuracy and real-time data insights.

    Industries such as healthcare, autonomous vehicles, artificial intelligence (AI)/machine learning (ML) and genomics are at the forefront of the transition to high-performance NVMe storage solutions that deliver fast data access for the high-performance computing systems driving new research and innovation.

    Genomics

    With traditional storage architectures, detailed genome analysis can take upwards of five days to complete, which makes sense considering an initial analysis of one person’s genome produces approximately 300 GB to 1 TB of data, and a single round of secondary analysis on just one person’s genome can require upwards of 500 TB of storage capacity. However, with an NVMe solution implemented it’s possible to get results in just one day.
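
    As a rough back-of-the-envelope illustration (the throughput figures below are indicative assumptions, not benchmark results), the time needed just to stream a 500 TB secondary-analysis working set off storage changes dramatically with the interface:

    # Back-of-the-envelope: hours needed just to stream 500 TB off storage.
    # Throughput figures are rough, indicative assumptions, not benchmarks.
    DATASET_TB = 500
    BYTES = DATASET_TB * 10**12

    throughputs_gb_s = {
        "SATA SSD (~0.55 GB/s)": 0.55,
        "Single NVMe SSD (~3.5 GB/s)": 3.5,
        "Aggregated NVMe array (~20 GB/s)": 20.0,
    }

    for label, gb_per_s in throughputs_gb_s.items():
        hours = BYTES / (gb_per_s * 10**9) / 3600
        print(f"{label:32s} -> {hours:7.1f} hours")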

    In a typical study, genome research and life sciences companies need to process, compare and analyze the genomes of between 1,000 and 5,000 people per study. This is a huge amount of data to store, but it’s imperative that it’s done. These studies are working toward revolutionary scientific and medical advances, looking to personalize medicine and provide advanced cancer treatments. This is only now becoming possible thanks to the speed that NVMe enables researchers to explore and analyze the human genome.

    Autonomous vehicles

    A growing trend in the tech industry is autonomous vehicles. Self-driving cars are the next big thing, and various companies are working tirelessly to perfect the idea. In order to function properly, these vehicles need very fast storage to accelerate the applications and data that ‘drive’ autonomous vehicle development. Core requirements for autonomous vehicle storage include:

    • Must have a high capacity in a small form factor
    • Must be able to accept input data from cameras and sensors at “line rate” – AKA have extremely high throughput and low latency
    • Must be robust and survive media or hardware failures
    • Must be “green” and have minimal power footprint
    • Must be easily removable and reusable
    • Must use simple but robust networking

    What kind of storage meets all these requirements? That’s right – NVMe.

    Artificial Intelligence

    Artificial intelligence (AI) is gaining a lot of traction in industries ranging from finance to manufacturing, and beyond. In finance, AI does things like predicting investment trends. In manufacturing, AI-based image recognition software checks for defects during product assembly. Wherever it’s used, AI needs a high level of computing power, coupled with a high-performance, low-latency architecture, to enable parallel processing of data in real time.

    Once again, NVMe steps up to the plate, providing the speed and processing power that are critical during training and inference. Without NVMe to prevent bottlenecks and latency issues, these stages can take much, much longer, which in turn can lead to the temptation to take shortcuts, causing software to malfunction or make incorrect decisions down the line.
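
    A crude way to see where the storage bottleneck bites during training is to compare the read throughput a GPU cluster demands with what different interfaces can sustain (all figures are illustrative assumptions, not measurements):

    # Crude illustration: can storage keep a GPU cluster fed during training?
    # All figures are illustrative assumptions, not measurements.
    gpus = 8
    images_per_sec_per_gpu = 2500       # assumed consumption rate per GPU
    avg_image_mb = 0.15                 # assumed average compressed image size

    required_gb_s = gpus * images_per_sec_per_gpu * avg_image_mb / 1000
    print(f"Required read throughput: {required_gb_s:.1f} GB/s")

    for label, gb_s in {"SATA SSD": 0.55, "NVMe SSD": 3.5, "NVMe array": 20.0}.items():
        verdict = "keeps up" if gb_s >= required_gb_s else "bottleneck"
        print(f"{label:12s} {gb_s:5.2f} GB/s -> {verdict}")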

    The rapid increase in data creation has put traditional storage architectures under high pressure due to their lack of scalability and flexibility, both of which are required to fulfill future capacity and performance requirements. This is where NVMe comes in, breaking the barriers of existing designs by offering unprecedented density and performance. The breakthroughs that NVMe offers provide what is needed to manage and maintain the data boom.

    Author: Ron Herrmann

    Source: Dataversity

     

  • Geopolitical tensions threaten the digital security of the Netherlands

    Geopolitical developments, such as international conflicts or political sensitivities, have a major impact on digital security in the Netherlands. That is what State Secretary Klaas Dijkhoff states in a report he sent to the Dutch House of Representatives (Tweede Kamer) yesterday.

    The report 'Cyber Securitybeeld Nederland' (CSBN) shows that previously identified trends continued in 2015. An approach in which public and private parties cooperate nationally and internationally to improve cybersecurity is therefore considered essential. State Secretary Dijkhoff says he wants to raise this with other member states during the upcoming Dutch EU presidency: "Only by working together can we protect our digital lives against crime and espionage."

    Work programme

    Along with the CSBN, a progress report on the work programme of the second National Cyber Security Strategy (NCSS 2) was sent to parliament. The NCSS 2, launched in 2013, aims to improve the digital resilience of the Netherlands. The work programme is 'broadly' on schedule, Dijkhoff reports to the House.

    In his policy response, the State Secretary emphasizes that public-private cooperation is crucial in tackling cybercrime and digital espionage. The rapid evolution of cyber threats, combined with an increasingly unstable geopolitical environment, calls for 'continuous attention'. Dijkhoff therefore wants to involve all relevant public and private parties in further developing the 'cybersecurity vision'. The guiding principle will be that cybersecurity is a balance between freedom, security and economic growth.

    Alert Online

    Awareness of online security is an important part of the government's digital security policy. That is why the 'Alert Online' campaign is being held again this year. The campaign is a joint initiative of government, industry and academia and runs from 26 October to 6 November 2015. It focuses on cybercrime that affects people and businesses, such as phishing and cryptoware.

     

    Source: Automatiseringsgids, 15 October 2015

     

  • Growth Stories: Change Everything

    Interview by Alastair Dryburgh

    What do you do with a small technology company which has an interesting product but is stuck in a crowded, noisy market where larger competitors have locked up many of the distribution channels? You could keep struggling on, or you could make a bold move: re-engineer the product to serve a different purpose for a different market. That's what Pentaho did, leading to sixfold growth over five years and a successful sale to a large organisation.

    In this interview their CEO Quentin Gallivan describes how they did it.

    Alastair Dryburgh: Quentin, welcome. This series is about that period of a company's evolution when it has to go through the rather dangerous territory that lies between being an exciting new start-up and being an established, profitable business. I'm told that you've got a very, very interesting story to tell about that with Pentaho. I'm looking forward to hearing it.

    Quentin Gallivan: Okay, great.

    Dryburgh: What would be useful would be if you could give us a very quick background sketch of Pentaho. What it does and how it's evolved in the last few years.

    Gallivan: So Pentaho, the company is approximately 12 years old. There were five founders, and they all came from a business intelligence technology background. What they were looking for was a different way to innovate around the business intelligence market place.

    One of the things I saw going on with that company was that the biggest challenge in companies doing data mining or predictive analytics on unstructured data or big data, was how do you get all this unstructured data, and unstructured data being clickstream data from websites, or weather data, or now what's very popular is machine data from Internet of Things devices.

    I wondered, is there a company out there that can actually make it easier to get all this different data into these big data analytical platforms? Because that was the biggest problem we had.

    When I looked at Pentaho, at the time it was not that company. It was not the new, sexy, next generation company, but I knew the venture capitalist behind Pentaho. We spent about a month just talking about what could the company be. Version one of the company was really a business analytics software product sold to the mid-market. They got some initial traction there, but that was a very cluttered market - very busy, a lot of noise, lots of large incumbents with channel dominance and then lots of small companies. It was hard to get above the din. I was not interested in Pentaho as the company was, right? I didn't see that as very interesting, very compelling.

    What interested me though, was when you dug deeper on the technology I thought it could be repurposed to address the big data problem. That was a big leap of faith, right? Because at the time, Pentaho wasn't doing any big data, didn't have any big data capabilities. The customers were all mid-market, small companies and it was known as a business intelligence company.

    Dryburgh: Pretty substantial change of vision really, isn't it?

    Gallivan: Massive, massive change, and I looked at it and I spoke to the VC's and said, "I would be interested in taking the CEO role, but not for the company that you've invested in, but for a very, very different company and I think we can do it. I don't know if we're going to do it. It's a long shot, but if you're willing to bankroll me, and allow me to build a team and support the vision, I'll give it a go."

    Dryburgh: Could I stop you there a moment to see if I could put a little bit of a frame around this? You've got a pretty fundamental change here. There's probably, very crudely, three different elements you've got to look after. First is obviously the technology, and I guess that must have needed to evolve and develop. Then you've got what you might call the harder side of the organisational change: the strategy, the definition of who the customer is, the organisation, the roles, the people you need – that's the second one. Then the third element, which I think is particularly interesting, is the softer side, which is the culture. I'd be really interested to hear which of those was the biggest issue for you?

    Gallivan: That's a great question. I like the way you framed it. I would add a fourth dimension, which is the market's perception of you. How do you get people to stop thinking about you as an open source BI company for small and medium-sized businesses and start thinking about you as a leading big data analytics platform for large companies, for the large enterprise? Those are the four vectors we needed to cross that chasm.

    The hardest one was not the culture because at the time, the company was very small. It had 75 employees and we are going to be over 500 employees this year, right? At the time it was really an open book from a culture... The founders were very open to a change in the business. For most startups, less than 100 employees, the culture is generally driven by the founder or founders and so there was no resistance.

    Dryburgh: Okay, good. So what were the biggest things you had to do to make the transformation work?

    Gallivan: If you look at those, just think about the transformation in those four key areas, you look at the metrics. Five years ago we were known as a commercial open source BI company selling to midsize companies. What we wanted to do was to be known as a big data analytics company selling to large enterprises, because for big data that's where the dollars are being spent right now. What we did was we set down the mission, we set down the strategy, and then the other piece that we employed – and this is sort of from my GE days when it comes to strategic execution – was that you've got to have metrics that drive milestones in the journey.

    What we started to do was track what percentage of our business came from mid-sized and small companies versus large ones. Five years ago 0% came from large. Last quarter it was 75%. Then over this journey we would track the percentage of our business that came from these larger enterprises. The other thing we would track was that fourth vector, the brand. How do you change the brand from being known as an open source BI company to being known as a big data analytics company? There, again, we had the best marketing organisation I've ever worked with, which had a share-of-voice metric. Not a feel-good "hey, we had so many press releases", but a quantifiable metric about our brand that we started tracking four years ago: what position do we play and what share of voice do we have when people talk about big data versus non big data.

    That was where our marketing team was very aggressive and had these metrics. When we first started out, since we launched ourselves as a big data analytics company we had a pretty good penetration in terms of the brand, but over the last couple years we've been tracking, we've been number one or two versus our competitors as the most identifiable brand in big data. That's a metric we track every month. Very, very quantifiable, but it's part of the journey. It took us a while to get there.

    Then the other piece, the other key metric for us, is really the R and D investment. We basically had to transform or re-engineer the product to really meet the needs of the large enterprise from a security standpoint and a scalability standpoint, making sure that we integrate with all the key technologies that large enterprises have. So when we did prioritization around our R and D, we would prioritize and we'd have metrics around the large enterprise, and then we would sacrifice the needs of the small/medium segment in the product road map. That again was an evolution.

    Five years ago 10% of our R and D investment went into large enterprise features. Now it's the majority. It's something that didn't happen overnight, but we tracked it, shared it with the company and sort of made it work.

  • Harnessing the value of Big Data

    To stay competitive and grow in today’s market, it has become necessary for organizations to closely correlate both internal and external data and draw meaningful insights out of it.

    During the last decade a tremendous amount of data has been produced by internal and external sources in the form of structured, semi-structured and unstructured data. These are large quantities of human or machine generated data produced by heterogeneous sources like social media, field devices, call centers, enterprise applications, point of sale etc., in the form of text, image, video, PDF and more.

    The “Volume”, “Variety” and “Velocity” of data have posed a big challenge to the enterprise. The evolution of “Big Data” technology has been a boon to the enterprise for the effective management of large volumes of structured and unstructured data. Big data analytics is expected to correlate this data and draw meaningful insights out of it.

    However, it has been seen that siloed big data initiatives have failed to provide ROI to the enterprise. A large volume of unstructured data can be more a burden than a benefit. That is the reason several organizations struggle to turn data into dollars.

    On the other hand, an immature master data management (MDM) program limits an organization’s ability to extract meaningful insights from big data. It is therefore of utmost importance for the organization to improve the maturity of its MDM program to harness the value of big data.

    MDM helps with the effective management of master information coming from big data sources by standardizing it and storing it in a central repository that is accessible to business units.

    MDM and big data are closely coupled applications that complement each other. There are many ways in which MDM can enhance big data applications, and vice versa: big data offers context, and master data provides trust.

    MDM and big data – A matched pair

    At first glance, MDM and big data appear to be two mutually exclusive systems with a degree of mismatch. An enterprise MDM initiative is all about solving business issues and improving data trustworthiness through the effective and seamless integration of master information with business processes. Its intent is to create a central, trusted repository of structured master information accessible to enterprise applications.

    The big data system deals with large volumes of data coming in unstructured or semi-structured format from heterogeneous sources like social media, field devices, log files and machine-generated data. The big data initiative is intended to support specific analytics tasks within a given span of time, after which it is taken down. Figure 1 summarizes the characteristics of MDM and big data.

     

    Figure 1: Characteristics of MDM and big data

    Business Objective
      MDM: Provides a single trusted version of master and reference information; acts as a system of record / system of reference for the enterprise.
      Big Data: Provides cutting-edge analytics and offers a competitive advantage.

    Volume of Data and Growth
      MDM: Deals with master data sets that are smaller in volume and grow at a relatively slow rate.
      Big Data: Deals with enormous volumes of data, so large that current databases struggle to handle them; growth is very fast.

    Nature of Data
      MDM: Permanent and long-lasting.
      Big Data: Ephemeral in nature; disposable if not useful.

    Types of Data (Structure and Data Model)
      MDM: Mostly structured data in a definite format with a pre-defined data model.
      Big Data: Mostly semi-structured or unstructured, lacking a fixed data model.

    Source of Data
      MDM: Oriented around internal, enterprise-centric data.
      Big Data: A platform to integrate data coming from multiple internal and external sources, including social media, cloud, mobile, machine-generated data etc.

    Orientation
      MDM: Supports both analytical and operational environments.
      Big Data: Fully analytics-oriented.

    Despite apparent differences there are many ways in which MDM and big data complement each other.

    Big data offers context to MDM

    Big data can act as an external source of master information for the MDM hub and can help enrich internal master data in the context of the external world. MDM can help aggregate the required and useful information coming from big data sources with internal master records.

    An aggregated view and profile of master information can help link customers correctly and, in turn, support effective analytics and campaigns. MDM can act as a hub between systems of record and systems of engagement.

    However, not all data coming from big data sources will be relevant for MDM. There should be a mechanism to process the unstructured data and distinguish the relevant master information and its associated context. NoSQL offerings, natural language processing and other semantic technologies can be leveraged to distill the relevant master information from a pool of unstructured and semi-structured data.
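
    As a toy illustration of that distillation step (a sketch only; a real implementation would use proper NLP and entity-resolution tooling rather than regular expressions), candidate master attributes such as e-mail addresses and social handles can be pulled out of unstructured text before they are matched against the hub:

    # Toy sketch: extracting candidate master attributes (e-mails, social handles)
    # from unstructured text before matching them against the MDM hub.
    # Real systems would use NLP / entity-resolution tooling, not just regexes.
    import re

    EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
    HANDLE = re.compile(r"@\w{2,30}")

    def extract_candidates(text: str) -> dict:
        emails = EMAIL.findall(text)
        remaining = EMAIL.sub(" ", text)   # avoid matching e-mail domains as handles
        handles = HANDLE.findall(remaining)
        return {"emails": emails, "handles": handles}

    post = "Loving the new app! Ping me at jane.doe@example.com or @jane_d for feedback."
    print(extract_candidates(post))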

    MDM offers trust to big data

    MDM brings a single integrated view of master and reference information with unique representations for an enterprise. An organization can leverage the MDM system to gauge the trustworthiness of data coming from big data sources.

    Dimensional data residing in the MDM system can be leveraged towards linking the facts of big data. Another way is to leverage the MDM data model backbone (optimized for entity resolution) and governance processes to bind big data facts.

    The other MDM processes like data cleansing, standardization, matching and duplicate suspect processing can be additionally leveraged towards increasing the uniqueness and trustworthiness of big data.

    An MDM system can support big data by (a minimal data-structure sketch follows this list):

    • Holding the “attribute level” data coming from big data sources e.g. social media Ids, alias, device Id, IP address etc.
    • Maintaining the code and mapping of reference information. 
    • Extracting and maintaining the context of transactional data like comments, remarks, conversations, social profile and status etc. 
    • Facilitating entity resolution.
    • Maintaining unique, cleansed golden master records
    • Managing the hierarchies and structure of the information along with linkages and traceability, e.g. linking an existing customer with his or her Facebook ID, LinkedIn ID, blog alias etc.
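
    A minimal sketch of what such an extended "golden" master record might look like as a data structure (the field names are illustrative, not a reference MDM schema):

    # Minimal sketch of an extended "golden" master record that also holds
    # big-data-sourced attributes. Field names are illustrative only.
    from dataclasses import dataclass, field
    from typing import Dict, List, Optional

    @dataclass
    class GoldenCustomerRecord:
        master_id: str                                # internal MDM identifier
        full_name: str
        email: str
        source_system_keys: Dict[str, str] = field(default_factory=dict)  # e.g. {"CRM": "C-123"}
        social_ids: Dict[str, str] = field(default_factory=dict)          # e.g. {"facebook": "...", "linkedin": "..."}
        device_ids: List[str] = field(default_factory=list)               # IoT / mobile device identifiers
        ip_addresses: List[str] = field(default_factory=list)
        hierarchy_parent_id: Optional[str] = None     # household / organisation linkage

    record = GoldenCustomerRecord(
        master_id="MDM-000042",
        full_name="Jane Doe",
        email="jane.doe@example.com",
        source_system_keys={"CRM": "C-123", "ERP": "E-77"},
        social_ids={"twitter": "@jane_d"},
        device_ids=["device-9f3a"],
    )
    print(record.master_id, record.social_ids)
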
    MDM for big data analytics – Key considerations

    Traditional MDM implementation, in many cases, is not sufficient to accommodate big data sources. There is a need for the next generation MDM system to incorporate master information coming from big data systems. An organization needs to take the following points into consideration while defining Next Gen MDM for big data:

    Redefine information strategy and topology

    The overall information strategy needs to be reviewed and redefined in the context of big data and MDM. The impact of changes in topology needs to be assessed thoroughly. It is necessary to define the linkages between these two systems (MDM and big data) and how they operate with internal and external data. For example, data coming from social media needs to be linked with internal customer and prospect data to provide an integrated view at the enterprise level.

    The information strategy should address the following:

    • Integration points between MDM and big data – how the big data and MDM systems are going to interact with each other.
    • Management of master data from different sources – how master data from internal and external sources is going to be managed.
    • Definition and classification of master data – how master data coming from big data sources gets defined and classified.
    • Processing of unstructured and semi-structured master data – how master data arriving from big data sources in unstructured and semi-structured form is going to be processed.
    • Usage of master data – how the MDM environment is going to support big data analytics and other enterprise applications.

    Revise data architecture and strategy

    The overall data architecture and strategy need to be revised to accommodate changes with respect to big data. The MDM data model needs to be enhanced to accommodate big data-specific master attributes. For example, the data model should accommodate social media and/or IoT-specific attributes such as social media IDs, aliases, contacts, preferences, hierarchies, device IDs, device locations, on/off periods etc. The data strategy should be defined for the effective storage and management of internal and external master data.

    The revised data architecture strategy should ensure that:

    • The MDM data model accommodates all big data-specific master attributes
    • Local and global master data attributes are classified and managed as per business needs
    • The data model has the necessary provisions to interlink external (big data-specific) and internal master data elements, and to accommodate code tables and reference data

     Define advanced data governance and stewardship

    Significant challenges are associated with governing master data coming from big data sources, because of its unstructured nature and the data flowing in from various external sources. The organization needs to define advanced policies, processes and a stewardship structure that enable big data-specific governance.

    The data governance process for MDM should ensure that:

    • The right level of data security, privacy and confidentiality is maintained for customer and other confidential master data.
    • The right level of data integrity is maintained between internal master data and master data from big data sources.
    • The right level of linkage exists between reference data and master data.
    • Policies and processes are redefined or enhanced to support big data, including business transformation rules, controlled access for data sharing and distribution, ongoing monitoring and measurement mechanisms, and change management.
    • A dedicated group of big data stewards is available for master data review, monitoring and conflict management.

    Enhance integration architecture

    The data integration architecture needs to be enhanced to accommodate master data coming from big data sources. The MDM hub should have the right level of integration capabilities to integrate with big data using IDs, reference keys and other unique identifiers.

    Unstructured, semi-structured and multi-structured data is parsed by a big data parser into logical data objects. This data is then processed further, matched, merged and loaded, along with the appropriate master information, into the MDM hub.
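
    As a toy illustration of that parsing step (the JSON shape and field names are invented; a real pipeline would rely on the platform's parser rather than hand-rolled code):

    # Toy illustration: parsing semi-structured JSON events into flat candidate
    # master records before they are matched and merged into the MDM hub.
    # The JSON shape and field names are invented for this sketch.
    import json

    raw_events = [
        '{"user": {"name": "Jane Doe", "email": "jane.doe@example.com"}, "device": "device-9f3a"}',
        '{"user": {"name": "J. Doe"}, "device": "device-1c22", "ip": "203.0.113.7"}',
    ]

    def to_candidate_record(line: str) -> dict:
        event = json.loads(line)
        user = event.get("user", {})
        return {
            "name": user.get("name"),
            "email": user.get("email"),
            "device_id": event.get("device"),
            "ip_address": event.get("ip"),
        }

    candidates = [to_candidate_record(line) for line in raw_events]
    print(candidates)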

    The enhanced integration architecture should ensure that:

    • The MDM environment has the ability to parse, transform and integrate data coming from the big data platform.
    • The MDM environment has built-in intelligence to analyze the relevance of master data coming from the big data environment and accept or reject it accordingly.

    Enhance match and merge engine

    The MDM system should enhance its “match & merge” engine so that master information coming from big data sources can be correctly identified and integrated into the MDM hub. A blend of probabilistic and deterministic matching algorithms can be adopted.

    For example, successfully identifying the social profiles of existing customers and interlinking them with existing data in the MDM hub. In this context, data quality is judged more by the utility of the information to its consumer than by objective “quality”.
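
    A highly simplified sketch of how such a blended deterministic-plus-probabilistic match might look (the thresholds and fields are illustrative; production MDM engines use far richer scoring and survivorship rules):

    # Simplified sketch of blended matching: deterministic on exact e-mail,
    # probabilistic (fuzzy) on name similarity otherwise. Thresholds and fields
    # are illustrative; production match engines are far more sophisticated.
    from difflib import SequenceMatcher

    def match_decision(hub_record: dict, candidate: dict,
                       auto_merge: float = 0.92, suspect: float = 0.75) -> str:
        # Deterministic rule: identical normalized e-mail is an immediate match
        if hub_record["email"].lower() == candidate["email"].lower():
            return "merge"
        # Probabilistic rule: fuzzy similarity on the full name
        score = SequenceMatcher(None,
                                hub_record["name"].lower(),
                                candidate["name"].lower()).ratio()
        if score >= auto_merge:
            return "merge"
        if score >= suspect:
            return "duplicate suspect (steward review)"
        return "reject"

    hub = {"name": "Jane Doe", "email": "jane.doe@example.com"}
    social = {"name": "Jane M. Doe", "email": "jdoe@social.example"}
    print(match_decision(hub, social))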

    The enhanced match and merge engine should ensure that:

    • The master data coming from big data sources get effectively matched with internal data residing in the MDM Hub.
    • The “Duplicate Suspect” master records get identified and processed effectively.
    • The engine should recommend whether to accept, reject, merge or split master records coming from big data sources.

     

    In this competitive era, organizations are striving hard to retain their customers.  It is of utmost importance for an enterprise to keep a global view of customers and understand their needs, preferences and expectations.

    Big data analytics coupled with an MDM backbone offers the enterprise a cutting-edge advantage in managing customer-centric functions and increasing profitability. However, the pairing of MDM and big data is not free of complications. The enterprise needs to work diligently on the interface points to best harness these two technologies.

    Traditional MDM systems need to be enhanced to accommodate the information coming from big data sources and to draw meaningful context from it. The big data system, in turn, should leverage the MDM backbone to interlink data and draw meaningful insights.

    Source: Information Management, 2017, Sunjay Kumar

  • The Dutch cabinet is going to commit to Big Data

    This is evident from the cabinet's response to the report "Big Data in een vrije en veilige samenleving" (Big Data in a free and secure society), published earlier this year by the Netherlands Scientific Council for Government Policy (WRR). In May 2014 the government submitted a request for advice on the theme 'Big Data, privacy and security' to the WRR. According to the WRR, the emphasis in current regulation lies too heavily on regulating the collection and sharing of data. The WRR therefore advises supplementing this regulation with oversight of the analysis and use phases of Big Data.
     
    The report discusses Big Data analyses within the 'security domain' (police, the justice system, intelligence and security services, and organizations and partnerships involved in fraud prevention). This sector offers many opportunities, but those opportunities call for corresponding safeguards for citizens' civil liberties. Some of the Big Data applications the report mentions are reconstructing attacks, mapping terrorist networks, following developments in crisis situations in real time, and crowd control at events. The report also notes the blurring of the line between data from public and private sources, as colleague Micha Schimmel previously described in a blog post.
     
    The cabinet intends to investigate whether the legal basis for data analysis needs strengthening. It is also examining which safeguards should apply, with increasing transparency as the focal point. By 'transparency' the cabinet means giving citizens insight into which data sets are used, the acceptable margins of error, the logic applied and the purposes of the analyses. The aim is to guarantee sound decision-making wherever Big Data underpins it.
     
    At present there is a ban on decision-making without human intervention where it has a significant impact on citizens (Article 42 of the Dutch Data Protection Act, Wbp). A situation in which a computer program analyses personal data and rejects a data subject's request on that basis is therefore not permitted, for example.
     
    The cabinet will investigate whether situations in which a human merely approves the outcome of Big Data analyses fall outside this ban. It will also examine how sufficient insight into analysis methods can be provided so that judges can weigh disputes better.
     
    "Analysis and use form the heart of Big Data processes and must not proceed unchecked: there must be independent oversight of algorithms and profiles, and that requires powers, resources and expertise for regulators," the WRR concludes.
     
    In his speech on receiving the report, the Minister of Security and Justice mentioned an upcoming possibility in the US state of Pennsylvania of basing offenders' sentences on offences they are expected to commit in the future. Although he dismissed this as dystopian practice, he made it clear that Big Data is the future and that we should not dwell on the drawbacks but focus on the benefits. Time will tell what impact Big Data applications will have on our society.
     
    Source: solv.nl, 5 December 2016
  • How big is 'the next big thing'?

    What if IoT were simply an umbrella term for ways of making something useful out of machine-generated data? For example, a bus tells my phone how far away my bus stop is, and my bike-share service tells me how many bikes are available.

    In 2014, IDC asked 400 C-suite professionals what they thought IoT was. The answers ranged from types of devices (thermostats, cars, home security systems) to challenges (security, data management, connectivity). The same analyst firm also expects the global market for IoT solutions to grow from 1.9 trillion dollars in 2013 to 7.1 trillion dollars in 2020. This optimism is backed by Gartner's estimate: 4.9 billion connected 'things' will be in use in 2016, rising to 25 billion by 2020.
    In other words: IoT is highly diverse and its potential is enormous. The value does not lie solely in the cost of the sensors; it is much more than that.

    When IoT starts to tell a story
    The IoT does not stand on its own; it is maturing alongside big data. Equipping billions of objects with sensors is of limited value if it is not possible to generate, transmit, store and analyse billions of data streams.
    The data scientist is the human choreographer of this IoT. They are essential for identifying the value of the enormous amount of data all these devices generate, and that is why connectivity and storage are so important. Small, isolated devices with no storage and little computing power tell us little. Only by looking at large collections of data can we discover correlations, recognize trends and make predictions.
    In every business environment the scenario is identical: the CxO will compare today's information with information from the past to gain predictive insight into what is going to happen in the future.

    Faster insight leads to competitive advantage
    CxOs today want a different kind of company. They want it to operate and respond to the market at speed, but they also want to make decisions based on intelligence gathered through big data. And they want to build the best products, based on customer insight. Companies are looking for a disruptive business model that lets them respond ever faster to market trends and so stay ahead of the competition.

    Start-up behaviour
    The answer lies in the following question to companies: "Why can't established enterprises behave more like start-ups?" This is not about making rash decisions with little or no oversight. It is about adopting a lean business model that tolerates uncertainty and stretched budgets. More importantly, it is about how the company's management establishes a culture of decisiveness.
    The organisations that will win the big data game are not the ones with the most or the best access to it. The winners clearly define their goals, set the necessary operational boundaries and determine what equipment is needed to get the job done.

    A leading role for CIOs
    CxOs have recognized the business value of IT and want CIOs to take more of a leading role and map out the company's future. IT can play a huge part in building that future by working with the business and providing the tools needed to be productive. Technology can facilitate continuous innovation at every level, allowing the company not just to survive but to flourish.
    Achieving this ambition is no small feat, but working hand in hand with technology makes it far more attainable, because it enables companies to reach an agile, innovative, data-driven future.

    Source: ManagersOnline

  • How do data-driven organizations really distinguish themselves?


    You often hear it in boardrooms: we want to be a data-driven organisation. We want to get started with IoT, (predictive) analytics or location-based services. And yes, those are sexy applications. But what are the real business drivers? They often remain underexposed. Research shows in which areas organisations with high 'data maturity' lead the way.

    SAS surveyed nearly 600 decision-makers and, based on the answers, was able to divide the respondents into three groups: the front-runners, a middle group and the laggards. This gives a clear picture of how the front-runners distinguish themselves from the laggards.

    The first thing that stands out is the proactive attitude. Front-runners free up budget to replace old processes and systems and invest in the challenge of data integration. There is also a culture of continuous improvement: these companies are constantly and actively looking for opportunities to improve. This contrasts with the laggards, who only want to invest in improvements once they know exactly what the ROI will be.

    The front-runners most often replace their old systems with open source data platforms, with Hadoop by far the most popular. Besides technology, these companies also invest more in cleaning up data. They have set up sound processes to ensure data is up to date and of the right quality for its intended use, and the governance of these processes is also better than at the companies that lag behind (read here about increasing the ROI on data and IT).

    Front-runners also invest more in talent. 73 percent of these companies have a dedicated data team staffed with their own people. The laggards more often have either no data team at all or a team filled by external staff. Front-runners also invest more in recruiting and selecting qualified personnel. As a result, 'only' 38 percent of front-runners experience a shortage of in-house skills, compared with 62 percent of the laggards.

    All this means the front-runners are better prepared for the GDPR regulation, which takes effect in 2018.

    They are better able to identify the risks attached to a data-driven strategy and have taken measures to cover or reduce those risks.

    The arrival of the GDPR is prompting many organisations to invest in a sound data strategy, but it is not the only reason. Companies with high data maturity can:

    • answer complex questions faster
    • make decisions faster
    • innovate and grow faster
    • improve the customer experience
    • grow revenue and market share
    • achieve a shorter time-to-market for new products and services
    • optimize business processes
    • produce better strategic plans and reports

    Every reason, then, to genuinely invest in data governance and data management rather than simply proclaiming that your organisation is data-driven. After all, 90 percent of respondents consider themselves data-driven, but the reality is unfortunately less rosy.

    Interested in the full research results?
    Download the report 'How data-driven organisations are winning' here.

     

    Source: Rein Mertens (SAS)

    In: www.Analyticstoday.nl

  • How to create value with predictive analytics and data mining

    The growing amount of data brings a flood of questions with it. The main question is what we can do with that data: can better services be offered and risks avoided? Unfortunately, at most companies that question goes unanswered. How can companies add value to data and move on to predictive analytics, machine learning and decision management?

    Predictive analytics: a crystal ball for the business

    Data mining reveals hidden patterns in data, making it possible to predict the future. Companies, scientists and governments have been using these kinds of methods for decades to derive insights about future situations from data. Modern companies use data mining and predictive analytics to, among other things, detect fraud, prevent cyber attacks and optimize inventory management. Through an iterative analytical process they bring together data, exploration of that data and the deployment of the new insights gained from it.

    Data mining: the business in the lead

    Decision management ensures these insights are translated into actions in the operational process. The question is how to shape this process within a company. It always starts with a question from the business and ends with an evaluation of the actions taken. What this Analytical Life Cycle looks like, and which questions are relevant per industry, can be found in Data Mining From A to Z: How to Discover Insights and Drive Better Opportunities.

     

    In addition to this model, which shows how your company can deploy the process, the report digs deeper into the role of data mining in the exploration stage. Working through the step-by-step plan below extracts even more value from data (a minimal code sketch of these four steps follows the list).

    1. Transform the business question into an analytical hypothesis

    2. Prepare the data for data mining

    3. Explore the data

    4. Fit the data to a model
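
    Here is a minimal sketch of these four steps in Python with scikit-learn, using an invented churn example (the file name and columns are hypothetical; the whitepaper itself is tool-agnostic):

    # Minimal sketch of the four data mining steps with scikit-learn, using an
    # invented churn example. File name and columns are hypothetical.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    # 1. Business question -> analytical hypothesis:
    #    "Which customers are likely to churn next quarter?" -> predict `churned`.
    df = pd.read_csv("customers.csv")          # hypothetical extract

    # 2. Prepare the data for data mining: select features, drop incomplete rows.
    features = ["tenure_months", "monthly_spend", "support_tickets"]
    data = df.dropna(subset=features + ["churned"])
    X, y = data[features], data["churned"]

    # 3. Explore the data: quick profile of the target and feature distributions.
    print(y.value_counts(normalize=True))
    print(X.describe())

    # 4. Fit the data to a model and evaluate it.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))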

    Want to know how your company can also use data to answer tomorrow's questions and provide better service? Then download "Data Mining From A to Z: How to Discover Insights and Drive Better Opportunities."

  • How a Video Game Helped People Make Better Decisions

     

    Researchers in recent years have exhaustively catalogued and chronicled the biases that affect our decisions. We all know the havoc that biased decisions can wreak. From misguided beliefs about the side effects of vaccinating our children, to failures in analysis by our intelligence community, biases in decision making contribute to problems in business, public policy, medicine, law, education, and private life.

    Researchers have also long searched for ways to train people to reduce bias and improve their general decision making ability – with little success. Traditional training, designed to debias and improve decision-making, is effective in specific domains such as firefighting, chess, or weather forecasting. But even experts in such areas fail to apply what they’ve learned to new areas. Weather forecasters, for instance, are highly accurate when predicting the chance of rain, but they are just as likely as untrained novices to show bias when making other kinds of probability estimates, such as estimating how many of their answers to basic trivia questions are correct.

    Because training designed to improve general decision making abilities has not previously been effective, most efforts to debias people have focused on two techniques. The first is changing the incentives that influence a decision – taxing soda, for example, in the hopes that the increased cost will dissuade people from buying it. The second approach involves changing the way information for various choices is presented or choices are made, such as adding calorie information to fast-food menus or offering salad as the default side order to entrées instead of French fries. However, these methods are not always effective, and when they are, they only affect specific decisions, not decision-makers’ ability to make less biased decisions in other situations.

    My research collaborators and I wondered if an interactive training exercise might effectively debias decision-makers. (The team included Boston University’s Haewon Yoon, City University London’s Irene Scopelliti, Leidos’ Carl W. Symborski, Creative Technologies, Inc.’s James H. Korris and Karim Kassam, a former assistant professor at Carnegie Mellon University.) So we spent the past four years developing two interactive, “serious” computer games to see if they might substantially reduce game players’ susceptibility to cognitive bias.

    There was scant evidence that this kind of one-shot training intervention could be effective, and we thought our chances of success were slim. But, as we report in a paper just published in Policy Insights from the Behavioral and Brain Sciences, the interactive games not only reduced game players’ susceptibility to biases immediately, those reductions persisted for several weeks. Participants who played one of our games, each of which took about 60 minutes to complete, showed a large immediate reduction in their commission of the biases (by more than 31%), and showed a large reduction (by more than 23%) at least two months later.

    The games target six well-known cognitive biases. Though these biases were chosen for their relevance to intelligence analysis, they affect all kinds of decisions made by professionals in business, policy, medicine, and education as well. They include:

    • Bias blind spot – seeing yourself as less susceptible to biases than other people
    • Confirmation bias – collecting and evaluating evidence that confirms the theory you are testing
    • Fundamental attribution error – unduly attributing someone’s behavior to enduring aspects of that person’s disposition rather than to the circumstance in which the person was placed
    • Anchoring – relying too heavily on the first piece of information considered when making a judgment
    • Projection – assuming that other people think the same way we do
    • Representativeness – relying on some simple and often misleading rules when estimating the probability of uncertain events

    We ran two experiments. In the first experiment, involving 243 adult participants, one group watched a 30-minute video, “Unbiasing Your Biases,” commissioned by the program sponsor, the Intelligence Advanced Research Projects Activity (IARPA), a U.S. research agency under the Director of National Intelligence. The video first defined heuristics – information-processing shortcuts that produce fast and efficient, though not necessarily accurate, decisions. The video then explained how heuristics can sometimes lead to incorrect inferences. Then, bias blind spot, confirmation bias, and fundamental attribution error were described and strategies to mitigate them were presented.

    Another group played a computer game, “Missing: The Pursuit of Terry Hughes,” designed by our research team to elicit and mitigate the same three cognitive biases. Game players make decisions and judgments throughout the game as they search for Terry Hughes – their missing neighbor. At the end of each level of the game, participants received personalized feedback about how biased they were during game play. They were given a chance to practice and they were taught strategies to reduce their propensity to commit each of the biases.

    We measured how much each participant committed the three biases before and after the game or the video. In the first experiment, both the game and the video were effective, but the game was more effective than the video. Playing the game reduced the three biases by about 46% immediately and 35% over the long term. Watching the video reduced the three biases by about 19% immediately and 20% over the long term.

    In a second experiment, involving 238 adult participants, one group watched the video “Unbiasing Your Biases 2” to address anchoring, projection, and representativeness. Another group played the computer detective game “Missing: The Final Secret,” in which they were to exonerate their employer of a criminal charge and uncover criminal activity by her accusers. Along the way, players made decisions that tested their propensity to commit anchoring, projection, and representativeness. After each level of the game, their commission of those biases was measured and players were provided with personalized feedback, practice, and mitigation strategies.

    Again, the game was more effective than the video. Playing the game reduced the three biases by about 32% immediately and 24% over the long term. Watching the video reduced the three biases by about 25% immediately and 19% over the long term.

    The games, which were specifically designed to debias intelligence analysts, are being deployed in training academies in the U.S. intelligence services. But because this approach affects the decision maker rather than specific decisions, such games can be effective in many contexts and decisions – and with lasting effect. (A commercial version of the games is in production.)

    Games are also attractive because once such approaches are developed, the marginal costs of debiasing many additional people are minimal. As this and other recent work suggests, such interactive training is a promising addition to the growing suite of techniques that improve judgment and reduce the costly mistakes that result from biased decision making.

    Source: http://www.scoop.it/t/strategy-and-competitive-intelligencebig

     

  • How big data can help your business design its letterhead


    Big data can pave the way for major improvements in the quality of company letterhead designs. Here's how that can happen.

    Big data has been at the forefront of the design industry for years. A number of companies have written detailed articles on the utilization of data visualization with graphics. However, big data can be effective in more rudimentary designs as well.

    There are a lot of effective ways to use big data to make better designs. Many modern design tools rely on sophisticated machine learning algorithms. Companies producing company letterhead use big data technology to create the best possible designs.

    Elements of big data in company letterhead design

    Big data is redefining the way that companies design letterheads. However, they still need to employ the right design principles. Machine learning tools can make higher quality designs much more quickly, but won’t do much good if these basic design guidelines are ignored.

    Patrick Hebron wrote a great article on design practices in the age of machine learning. He pointed out that machine learning aids in a number of ways, such as finding emergent feature sets and the best starting template for a design.

    Company letterhead shouldn’t be seen as a larger version of your business card. It is a template on which you write or print a variety of messages. Here are a few design tips for your company letterhead with big data design tools.

    Focus on readability

    A business card must be readable. You can’t let graphics, photos, and decorative elements distract from the critical information on the card. Yet the business card has to stand out from the crowded stack of business cards stuffed into someone’s wallet. On the flip side, a business letter is meant to be read. That is its entire purpose. You shouldn’t put product images or a list of services provided on a business letterhead. Big data design tools may help with choosing better templates, but you still need to manually review the final design for readability.

    The letterhead for your company should contain only the essential elements. This includes your business name, mailing address, and contact information. Whether you put your email address or phone number first is your decision. It isn’t necessary to include your business website on the letterhead unless you’re sending bills that could be paid online. In those cases, you could include the URL for the payment site, although it is more economical to leave that in the body of the message. Then you can use the letterhead for everything from bills to apology letters.

    When creating letterhead with a big data design tool, use neutral white or off-white paper for the letterhead. This ensures that the text is readable no matter what color ink you use. A side benefit of this is that the letter will remain readable if the recipient runs it through a copying machine or fax machine.

    Only include what adds value

    Machine learning tools help you find design elements that are popular with other graphic artists. However, you may find that what they consider popular with designers doesn’t work with customers. You should instead use data analytics to find what adds value to your end viewer.

    Only include elements on the letterhead that add value. For example, include your business contact information on every printed letter, since that is appropriate no matter what the letter says. Leave off pre-printed signatures, since the names of those in the role may change. You don’t want to have to print new letterhead because someone changed their name upon marriage or divorce. There are several other reasons to leave signature lines off every letterhead. One is that a packaged sign-off like 'Thanks' or 'Hear from you soon' could conflict with the tone of the message printed or written on the letter, which undermines your image. Another is that pre-printed signatures make it easier for someone to generate a fake message, especially if the business owner’s signature is literally printed on the page. Put the business name and address at the bottom of the page if you can’t stand leaving the footer empty.

    Use images strategically

    Be strategic with any images on the letterhead. As a rule, include your logo when creating letterhead with a big data design tool. It can be beneficial to include a thumbnail of your head shot when you’re working closely with a client. For example, your face on the photo could win their trust if you want to buy their house or you want them to recognize you when you come to their door. However, you shouldn’t be adding pictures of your products to the letterhead. You could add the logo to another corner of the page or embed a watermark into the paper. Yet this must be done carefully so that someone scanning the image doesn’t end up with unreadable sections of text.

    Big data is essential for designing great company letterhead

    Big data design tools are great for many purposes, including designing company letterheads. However, you still need to follow basic design guidelines that big data technology can’t account for.

    Author: Diana Hope

    Source: Smart Data Collective

  • How Big Data is changing the business landscape

    Big Data is increasingly being used by prominent companies to outpace the competition. Established companies and start-ups alike are embracing data-focussed strategies to get ahead.

    In healthcare, clinicians can review treatment decisions with big data algorithms that work on aggregated individual data sets to detect nuances in subpopulations so rare that they are not readily apparent in small samples.

    Banking and retail have been early adopters of Big Data-based strategies. Increasingly, other industries are utilizing Big Data like that from sensors embedded in their products to determine how they are actually used in the real world.

    Big Data is useful not just for its scale but also for its real-time and high-frequency nature that enables real-time testing of business strategies. While creating new growth opportunities for existing companies, it is also creating entirely new categories of companies that capture and analyse industry data about products and services, buyers and suppliers, consumer preferences and intent.

     

    What can Big Data analytics do for you?

    * Optimise operations

    The advent of advanced analytics, coupled with high-end computing hardware, has made it possible for organizations to analyse data more comprehensively and frequently.

    Analytics can help organisations answer new questions about business operations and advance decision-making, mitigate risks and uncover insights that may prove to be valuable to the organisation. Most organisations are sitting upon heaps of transactional data. Increasingly, they are discovering and developing the capability to collect and utilise this mass of data to conduct controlled experiments to make better management decisions.

    * React faster

    Big Data analytics allows organisations to make and execute better business decisions in very little time. Big Data and analytics tools allow users to work with data without going through complicated technical steps. This kind of abstraction allows data to be mined for specific purposes.

    * Improve the quality of services

    Big Data analytics leads to generation of real business value by combining analysis, data and processing. The ability to include more data, run deeper analysis on it and deliver faster answers has the potential to improve services. Big Data allows ever-narrower segmentation of customers and, therefore, much more precisely tailored products or services. Big Data analytics helps organizations capitalize on a wider array of new data sources, capture data in flight, analyse all the data instead of sample subsets, apply more sophisticated analytics to it and get answers in minutes that formerly took hours or days.

    * Deliver relevant, focussed customer communications

    Mobile technologies can now track where customers are at any point in time, whether they’re surfing mobile websites, and what they’re looking at or buying. Marketers can now serve customised messaging to their customers. They can also inform just a sample of people who responded to an ad in the past or run test strategies on a small sample.

    Where is the gap?

    Data is more than merely figures in a database. Data in the form of text, audio and video files can deliver valuable insights when analysed with the right tools. Much of this happens using natural language processing tools, which are vital to text mining, sentiment analysis, clinical language processing and named entity recognition efforts. As Big Data analytics tools continue to mature, more and more organisations are realizing the competitive advantage of being a data-driven enterprise.

    Social media sites have identified opportunities to generate revenue from the data they collect by selling ads based on an individual user's interests. This lets companies target specific sets of individuals that fit an ideal client or prospect profile. The breakthrough technology of our time is undeniably Big Data and building a data science and analytics capability is imperative for every enterprise.

    A successful Big Data initiative, then, can require a significant cultural transformation in an organisation. In addition to building the right infrastructure, recruiting the right talent ranks among the most important investments an organization can make in its Big Data initiative. Having the right people in place will ensure that the right questions are asked - and that the right insights are extracted from the data that's available. Data professionals are in short supply and are being quickly snapped up by top firms.

    Source: The Economic Times

  • How big data is having a 'mind-blowing' impact on medicine

    Dell Services chief medical officer Dr. Nick van Terheyden explains the 'mind-blowing' impact big data is having on the healthcare sector in both developing and developed countries.

    For a long time, doctors have been able to diagnose people with diabetes—one of the world's fastest growing chronic diseases—by testing a patient's insulin levels and looking at other common symptoms, as well as laboratory results.

    While there has been great accuracy in their diagnoses in the past, the real opportunity in healthcare at the moment, according to Dell Services chief medical officer Dr. Nick van Terheyden, is the role big data can play in taking the accuracy of that diagnosis a step further by examining a person's microbiome, which changes as people develop diabetes.

    "We can come up with a definitive diagnosis and say you have it based on these criteria. But now, interestingly, that starts to open up opportunities to say 'could you treat that?'" Terheyden said.

    He described these new advancements as "mind-blowing."

    "So, there is now the potential to say 'I happen to know you're developing diabetes, but I'm going to give you therapy that changes your biome and reverses that process, and to me that's just mind-blowing as I continue to see these examples," Terheyden said.

    He pinned much of the "explosion" of data on genomics, saying additional data will increase the opportunity for clinicians to identify correlations that have previously been poorly understood or gone unnoticed, and improve the development and understanding of causation.

    "When the first human was sequenced back in the early 2000s, it was billions of dollars, and many years and multiple peoples' work and effort. We're now down to sequencing people in under 24 hours and for essentially less than US$1,000. That creates this enormous block of data that we can now look at," he said.

    Increasingly, Terheyden believes the healthcare sector will see the entry of data experts, who will be there to help and support clinicians with the growing need to analyse data.

    When asked about the impact technology has had on healthcare in developing countries, Terheyden said he believes medical advances will overtake the pace of developed countries, much like how the uptake of telephonic communication has "leapfrogged" in those countries.

    He said despite the lack of resources in Africa, for instance, the uptake of mobile devices is strong and networks are everywhere, which he says is having a knock-on effect on the medical sector as it is helping those living in remote areas gain access to clinicians through telehealth.

    Research by Ericsson predicted that, while currently only 27% of the population in Africa has access to the internet, data traffic will increase 20-fold by 2019—double the growth rate of the rest of the world.

    Terheyden explained that while infrastructure may be rather basic in places such as Africa, and some improvements still need to be made around issues such as bandwidth, telehealth has already begun to open up new opportunities, so much so that, by comparison, the way medicine is practiced in developed countries can appear archaic.

    "I know there are still some challenges with bandwidth...but that to me is a very short term problem," he said. "I think we've started to see some of the infrastructure that people are advocating that would completely blow that out of the water.

    "So, now you remove that barrier and suddenly instead of saying, 'hey you need go to a hospital and see a doctor to have a test', we're saying, 'why would you?'"

    Despite the benefits, Terheyden expects clinicians, particularly in the western world, will be faced with the challenge of coping with how their roles are changing. He pointed out that they are increasingly becoming more of a "guide, an orchestrator, and conductor," versus the person that previously "played all the instruments, as well as being the conductor."

    He highlighted that, with medical information believed to double every 18-24 months, clinicians would need to be reading 80-90 hours per week to keep up to date.

    "There's this change in behaviour to longer be the expert," he said. "You're not the Wizard of Oz. People don't come to you and you dispense knowledge; you're there as the guide."

    Source: Techrepublic.com

     

  • How Big Data leaves its mark on the banking industry

    Did you know that big data can impact your bank account, and in more ways than one? Here's what to know about the role big data is playing in finance and within your local bank.

    Nowadays, terms like ‘Data Analytics,’ ‘Data Visualization,’ and ‘Big Data’ have become quite popular. These terms are fundamentally tied to digital transformation and growth in companies. In this modern age, every business entity is driven by data, and data analytics has become crucial to decision-making.

    Through analytics, gaining better insight has become much easier. It doesn’t matter whether the decision being considered has a huge or minimal impact; businesses have to ensure they can access the right data to move forward. This approach is especially essential for the banking and finance sector in today’s world.

    The role of Big Data

    Financial institutions such as banks have to adhere to such a practice, especially when laying the foundation for back-testing trading strategies. They have to utilize Big Data to its full potential to stay in line with their specific security protocols and requirements. Banking institutions actively use the data within their reach in a bid to keep their customers happy. By doing so, these institutions can limit fraud cases and prevent any complications in the future.

    Some prominent banking institutions have gone the extra mile and introduced software to analyze every document while recording any crucial information that these documents may carry. Right now, Big Data tools are continuously being incorporated in the finance and banking sector. 

    Through this development, numerous significant strides are being made, especially in the realm of banking. Big Data is taking a crucial role, especially in streamlining financial services everywhere in the world today. The value that Big Data brings with it is unrivaled, and, in this article, we will see how this brings forth positive results in the banking and finance world.

    The underlying concept 

    A 2013 survey conducted by the IBM Institute for Business Value and the University of Oxford showed that 71% of financial services firms had already adopted analytics and big data. Financial and banking industries worldwide are now exploring new and intriguing techniques through which they can smoothly incorporate big data analytics in their systems for optimal results.

    Big data has numerous perks relating to the financial and banking industries. With the ever-changing nature of digital tech, information has become crucial, and these sectors are working diligently to take up and adjust to this transformation. There is significant competition in the industry, and emerging tactics and strategies must be accepted to survive the market competition. Using big data, firms can boost the quality and standards of their services.

    Perks associated with Big Data

    Analytics and big data play a critical role when it comes to the financial industry. Firms are currently developing efficient strategies that can woo and retain clients. Financial and banking corporations are learning how to balance Big Data with their services to boost profits and sales. Banks have improved their current data trends and automated routine tasks. Here are a few of the advantages of Big Data in the banking and financial industry:

    Improvement in risk management operations

    Big Data can efficiently enhance the ways firms utilize predictive models in the risk management discipline. It improves the response timeline in the system and consequently boosts efficiency. Big Data provides financial and banking organizations with better risk coverage. Thanks to automation, the process has become more efficient. Through Big Data, groups concerned with risk management can offer more accurate intelligence insights.

    Engaging the workforce

    Among the most significant perks of Big Data in banking firms is worker engagement. The working experience in the organization is considerably better. Nonetheless, companies and banks that handle financial services need to realize that Big Data must be appropriately implemented. It can come in handy when tracking, analyzing, and sharing metrics connected with employee performance. Big Data aids financial and banking service firms in identifying the top performers in the corporation.

    Client data accessibility

    Companies can find out more about their clients through Big Data. Excellent customer service implies outstanding employee performance. Aside from designing numerous tech solutions, data professionals will help the firm set performance indicators in a project and inject analytic expertise into multiple organizational areas. Better processes mean more streamlined workflows. Banking and financial firms can leverage improved insights and knowledge of customer service and operational needs.

    Author: Matt Bertram

    Source: Smart Data Collective

  • How data analytics changes marketing strategies in the near future

    Over the course of last year, we saw the marketing industry monitor a number of emerging trends including wearables and facial/voice recognition, and experiment with new tools and techniques such as VR and augmented reality.

    For example, in October, we saw a campaign from New Zealand health insurance company Sovereign that won an International ECHO Award for integrating a wide range of datasets into a campaign which drove customer signup, lead generation and sales. They integrated new data streams from activity trackers, gym networks and grocery stores to reward customers for healthy behavior. This new data also powered timely, tailored notifications across platforms. Notwithstanding the large undertaking, Sovereign was able to improve health outcomes and increase policy renewals, reversing a negative trend for the company.

    In 2018, I expect that these features will evolve in ways that will help marketers better understand businesses, consumers, and competitors. Here are a few predictions for what we can expect to see this year: 

    It’s all about relationships based on Truth, Results and Trust – 1:1 Relationships at scale

    Data is a horizontal that cuts across all of marketing, yet to date many organizations (some very large organizations) are not data-driven. They are realizing that today’s technology and processing power enables organizations to use data-informed techniques to enhance customer experience. They’re realizing that to be competitive they must pivot toward data-driven marketing techniques including data-informed design and messaging to personalize offers that resonate with individual customers based on their individual needs and interests. Look for deep-pocketed advertisers like P&G to play catchup with a vengeance in the data-driven marketing space.

    Data Quality, Brand Safety, Transaction Transparency and Transaction Verification

    We all know that massive amounts of data can be overwhelming. And of course, transforming data into actionable insight is the key to maximizing marketing ROI and enhancing the customer experience. Yet there is too much spurious data that is dangerous and costly. While it is a cliché, “garbage in equals garbage out” still rings true. This has been particularly evident in the digital advertising space with bad actors using bots to mimic human behavior. 

    Additionally, some algorithms have gone awry in the digital ad space causing potential harm to brands by placing ads in undesirable spaces. Client-side marketers cannot tolerate fraud or waste. Consequently, the supply-side has been injured as client-side marketers began reducing their digital ad buys. Look for supply-side solution providers to increase their efforts to attack such problems utilizing tools and techniques like massive processing engines, blockchain technology, better machine learning and collective concentrations of power like trade associations that bring organizations together to collectively identify and address issues that organizations struggle to solve on their own. 

    Timing and the Propensity to Buy

    While algorithms may be able to predict the next site at which a potential customer will land, they haven’t yet fully incorporated the age-old data-driven marketing technique of correctly timing a compelling offer. Look for leading solution providers to utilize more machine learning and AI to better incorporate timing into their ‘propensity to buy’ calculations.

    Third Party Data and the Burgeoning Duopoly 

    There is a balance of power issue developing in the digital ad space as Google and Facebook continue to gain dominant market share in the digital ad spend space (presently estimated at a combined 84%!). Look for “rest of the world” market forces to develop innovative solutions to ensure that competition and innovation thrives in this space.

    Responsibility 

    The data and marketing industry thrives on innovation and the technological advancement that allows us to build connections with our customers based on truth, results and trust. Acting responsibly is paramount to building brand loyalty. As more hacks and breaches occur, this large problem will attract entrepreneurs seeking opportunities to solve it. While it is very disturbing to see a large organization like Equifax fall victim to a data breach, our data and marketing industry is stocked with brilliant minds. Look for highly encrypted cloud-based security vaults to surface. And I suspect that while many organizations may feel reluctant to house their data in the cloud, look for them to realize that it is far more secure than keeping it “in house.”

    Education will Evolve

    While a bachelor’s degree is a critical requirement for many marketing jobs, the marketing degree hanging on the wall can’t keep marketers up to speed with the ever-increasing rate of change in our data-driven marketing industry. IoT, big data, attribution woes, integrating online and offline touchpoints, identity across platforms, channels and devices, and emerging technology and techniques are all examples of daunting challenges.

    In 2018, expect to see a surge in continuous talent-development programs, not just from academics, but from practitioners and commercial solution providers that address new challenges every day. Look for powerful video-centric platforms like DMA360, a crowdsourced platform for solution providers to bring their solutions to the market which incorporates social media techniques to curate the content through user upvotes. We all know that knowledge drives the competitive edge!

    Author: Tom Benton

    (Chief executive officer at the Data & Marketing Association)

  • How data analytics is affecting the insurance industry

    Data analytics in the insurance industry is transforming the way insurance businesses operate. Here's why that is important.

    Technology has had a profound impact on the insurance industry recently. Insurers are relying heavily on big data as the number of insurance policyholders grows. Big data analytics can help to solve a lot of data issues that insurance companies face, but the process is a bit daunting. It can be challenging for insurance companies that have not yet adjusted to this.

    Effect of big data analytics on customer loyalty

    One of the reasons why some insurance companies get more customers than others is because they can provide the things that their customers need. The more they can give customers what they expect, the more loyalty customers reciprocate.

    Instead of managing one policy from their insurer at a time, customers may get all of their insurance policies in a single, centralized dashboard. Even if people solicit an anonymous car insurance quote from a different company that is lower, they will often stick with a company they are fiercely loyal to. Insurers will also need to consider other factors, such as whether they have been unfairly prejudicing customers based on characteristics like gender or race. Big data may be able to help address this.

    Big data analytics can be very useful in acquiring all of the necessary data in a short amount of time. This means that insurance companies will know what their customers want and will offer these wants immediately. Insurance companies will also have the ability to provide personalized plans depending on their customer’s needs.

    Big data analytics in fraud cases

    One of the biggest issues that insurance companies are facing nowadays is fraud. According to industry findings, 1 out of 10 claims is fraudulently filed. This is an alarming rate, especially given the number of policyholders that an insurance company may have. Some consumers file fraudulent claims sloppily, which makes it easier for the company to seek restitution and prosecute the offenders before they can drive premiums up for other drivers. Others are done meticulously, and the perpetrators get away with it.

    With big data analytics, a large amount of data can be checked in a short amount of time. It includes a variety of big data solutions, such as social network analysis and telematics. This is the biggest weapon insurers have against insurance fraud.

    Subrogation

    A large amount of data is needed and received for subrogation cases. The data can come from police records, medical records, and even notes regarding cases. Through big data analytics, it is possible to pick out phrases that indicate whether a case under investigation qualifies for subrogation.
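
    As a rough illustration of that phrase-spotting idea, the sketch below scans free-text claim notes for subrogation indicators. The phrase list and the notes are invented for illustration; a production system would learn such indicators from labelled claim files rather than rely on a hand-picked list.

        import re

        # Hypothetical phrases that often signal subrogation potential (illustrative only).
        SUBROGATION_PATTERNS = [
            r"other driver at fault",
            r"rear[- ]ended",
            r"third[- ]party liability",
            r"defective (product|part)",
            r"contractor error",
        ]

        def flag_subrogation(note):
            """Return the indicator phrases found in a free-text claim note."""
            note = note.lower()
            return [p for p in SUBROGATION_PATTERNS if re.search(p, note)]

        notes = [
            "Insured states the other driver at fault, police report attached.",
            "Water damage from a burst pipe, no third party involved.",
        ]
        for n in notes:
            hits = flag_subrogation(n)
            print("possible subrogation" if hits else "no indicators", "->", hits)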

    Settlement cases

    There are a lot of customers who complain that lawsuit settlements often take a long time, because there is a lot of analysis that needs to be done. With the use of big data analytics, claims can be settled much faster. It also becomes possible to check and analyze the history of the claims and the claims history of each customer. This can help reduce labor costs, as employees do not have to put all of their time into checking and finalizing every piece of data regarding a claim. It can also get payouts to the customer faster, which means that customer satisfaction will also greatly increase.

    Checking more complex cases

    Some people acquire an anonymous car insurance quote and take out insurance purely in order to file claims and collect money from the insurance company. Obvious frauds and authentic claims can be immediately identified with the use of big data analytics. Yet some cases are so complex that it takes a lot of checking to see whether the data received coincides with what the customer claims. Big data analytics uses data mining techniques that allow the various claims to be categorized and scored depending on their importance, and in some cases even settled accordingly.
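
    To make that categorize-and-score idea concrete, here is a minimal sketch that trains a simple scoring model on invented claim features. The feature names, figures, and model choice are assumptions for illustration only, not how any particular insurer actually scores claims.

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler

        # Hypothetical training data: [claim_amount, days_since_policy_start, prior_claims],
        # with label 1 meaning the claim was later confirmed as fraudulent.
        X = np.array([
            [12000,  20, 3],
            [  800, 400, 0],
            [ 9500,  35, 2],
            [ 1500, 720, 1],
            [20000,  10, 4],
            [  600, 900, 0],
        ])
        y = np.array([1, 0, 1, 0, 1, 0])

        model = make_pipeline(StandardScaler(), LogisticRegression()).fit(X, y)

        # Score new claims: higher scores get routed to a human investigator first.
        new_claims = np.array([[15000, 15, 2], [700, 650, 0]])
        for claim, score in zip(new_claims, model.predict_proba(new_claims)[:, 1]):
            print(claim, "fraud score:", round(score, 2))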

    Some common issues in using big data analytics

    It is always important for insurance companies to consider both the good and the bad details about using analytics. Some of the good things have been tackled above. These are just some concerns that you need to be familiar with:

    • You still need to use multiple tools to process the data, which can be problematic as data may get lost along the way.
    • Hiring too many data analysts when a few would be enough.
    • Failing to unify the gathered information.

    Take note of these issues so that they can be avoided.

    With all of the things that big data analytics can do, it is not surprising that a lot of insurance companies will need to start using it soon. It can be integrated little by little so that it will not be too overwhelming for everyone who is involved. The sooner this can be done, the better; not only for the customers but for the insurance company as a whole.

    Big data will address countless insurance industry challenges

    The insurance industry is more dependent on big data than many other sectors. Their entire business model is built around actuarial analyses. As a result, they will need to rely on big data to solve many of the challenges that have plagued them for years. Big data will also help them fight fraud and process lawsuit settlements more quickly.

    Author: Diana Hope

     Source: Smart Data Collective

  • How Data Platform Modernization Leads to Enhanced Data Insights  

    Today, business enterprises are operating in a highly competitive landscape with multiple touchpoints, channels, and operating and regulatory environments. For such business enterprises, data has become their most important asset, which is being continuously acquired from all types of sources. These may include IoT networks, social media, websites, customers, employees, the cloud, and many more. Data is no longer defined only as highly structured information; it constitutes a wide variety of data types and structures emanating from a multitude of sources. With this high volume of information, the question arises: does the data deliver true value to the enterprise? If enterprises cannot extract timely data insights from business data, then the data is not adding any value.

    The challenge before businesses today is to leverage data in alignment with technology, security, and governance into a cohesive modernization framework to deliver tangible benefits. Although using data from multiple sources to pursue new business opportunities, streamline operations, predict customer behavior, identify risks, attract customers, and more has become critical, it is only half the battle. The other half involves the need for businesses to update their legacy infrastructure and create a robust IT infrastructure, including large data repositories. For instance, they may seek to develop solutions for on-premise, public, and private clouds by incorporating AI.

    To modernize their existing data platforms and gain better data insights, businesses ought to move legacy data to the cloud while making it available in a streamlined and structured way without risking privacy and security. Besides, businesses do not want to be dependent on vendors for technologies and incur recurring costs. They need technologies that are fast and versatile enough to adapt to their needs. This is where a data modernization platform can prove to be a pathway to optimizing the storage, security, processing, and analysis of data.

    Data has indeed become the lifeblood of businesses across industries and geographies. From sales and marketing to operations and resource management, every aspect of a business relies on data acquisition, processing, and analysis for better decision-making. However, with the vast amount of data being generated every day from various channels, platforms, and touchpoints, it’s becoming increasingly challenging for businesses to keep up. This is where data modernization comes in. Let us understand the five benefits of modernizing a data platform for better data insights.

    5 Benefits of Modernizing Data Platforms to Generate Better Data Insights

    To remain one step ahead of the competition, businesses need to draw better insights from their data in real-time by modernizing their data platforms. The five benefits of doing the same are as follows:

    1. Improved Data Quality

    Modernizing the data platform involves leveraging the latest technologies to upgrade data storage, data processing, and data management systems. This, in addition to enhancing the speed and efficiency of data processing, also improves the quality of the data. Thus, with improved data quality, businesses can make more accurate decisions and gain better insights into their operations.

    2. Increased Data Accessibility

    Not being able to access the right type of data in the right quantity when needed has been a bane for businesses. However, a modernized data platform facilitates data accessibility in real-time. Thus, team members can access the data they need at the time and place of their choosing. This, however, can only be possible through the use of cloud-based platforms. These data insights platforms allow remote data access, enabling teams to collaborate and share data in real time. With increased data accessibility, businesses can promote a more data-driven culture, leading to better decision-making at all levels.

    3. Real-time Data Insights

    With a modernized data platform, businesses can gain real-time insights into their operations, allowing them to make informed decisions quickly. This is particularly useful in industries where timing is critical, such as finance and healthcare. Real-time data insights can also help them identify trends and patterns in the data that might have gone unnoticed otherwise, enabling them to make proactive decisions rather than reactive ones.

    4. Scalability and Flexibility

    Scalability and flexibility are the twin requirements that businesses often need to address as they grow. With a modern data platform, they can achieve both, besides optimizing their data acquisition, processing, and storage needs. In other words, they can scale up or down their data infrastructure without worrying about losing data or facing downtime. A flexible data platform also enables the seamless integration of new data sources or technologies, allowing businesses to stay ahead of the competition.

    5. Cost Savings

    In the final analysis, modernizing the data platform can offer significant cost savings. For instance, by optimizing data storage, processing, and management systems, businesses (or the data modernization services they engage) can reduce the amount of time and resources spent on data processing and analysis. This can lead to more efficient operations and reduced costs. Additionally, cloud-based platforms can offer cost savings by reducing the need for setting up and maintaining on-premises infrastructure.

    Conclusion

    With data becoming the most important asset for businesses operating in the digital landscape, it needs to be leveraged using data platforms to gain big data insights and make informed decisions. However, modernizing the data platforms is essential to optimizing activities related to data acquisition, storage, processing, and analysis. Unless businesses can extract the right kind of data in real-time, they will not be able to draw the right insights or inferences on market trends, customer preferences, and other tangibles. So, among the benefits that modernized data platforms are likely to offer are improved quality of data, better access to data, real-time data insights, scalability and flexibility, and cost savings. By investing in modernizing data platforms, businesses can stay ahead of the competition and drive growth.

    Date: June 6, 2023

    Author: Hermanth Kumar

    Source: Datafloq

     

     

  • How Data Science is Changing the Entertainment Industry

    Beyond how much and when, to what we think and how we feel

    Like countless other industries, the entertainment industry is being transformed by data. There’s no doubt data has always played a role in guiding show-biz decision-making, for example, in the form of movie tracking surveys and Nielsen data. But with the ever-rising prominence of streaming and the seamless consumption measurement it enables, data has never been more central to understanding, predicting, and influencing TV and movie consumption.

    With experience as both a data scientist in the entertainment space and a researcher of media preferences, I’ve had the fortune of being in the trenches of industry analyzing TV/movie consumption data and being able to keep up with media preferences research from institutions around the world. As made evident by the various citations to come, the component concepts presented here themselves aren’t anything new, but I wanted to apply my background to bring together these ideas in laying out a structured roadmap for what I believe to be the next frontiers in enhancing our ability to understand, predict, and influence video content consumption around the world. While data can play a role at many earlier phases of the content lifecycle — e.g. in greenlighting processes or production — and what I am about to say can be relevant in various phases, I write mainly from a more downstream perspective, nearer to and after release as content is consumed, as cultivated during my industry and academic work.

    Beyond Viewing and Metadata

    When you work in the entertainment space, you end up working a lot with title consumption data and metadata. To a large extent, this is unavoidable — all “metadata” and “viewing data” really mean is data on what’s being watched and how much — but it’s hard to not start sensing that models based on such data, as commonly seen in content similarity analyses, output results that fall into familiar patterns. For example, these days when I see “similar shows/movies” recommendations, a voice in my head goes, “That’s probably a metadata-based recommendation,” or, “Those look like viewership-based recommendations,” based on what I’ve seen during my work with such models. I can’t be 100% sure, of course, and the voice is more confident with smaller services that likely use more off-the-shelf approaches; on larger platforms, recommendations are often seamless enough that I’m not thinking about flaws, but who knows what magic sauce is going into them?

    I’m not saying viewing data and metadata will ever stop being important, nor do models using such data fail to explain ample variance in consumption. What I am saying is that there is a limit to how far these elements alone will get us when it comes to analyzing and predicting viewership — we need new ways to enhance understanding of viewers and their relationship with content. We want to understand and foresee title X’s popularity at time point A beyond, “It was popular at A-1, so it will be popular at A,” or, “Title Y, which is similar to X, was popular, so X will be popular,” especially since data at A-1, or on the similarity between X and Y, may often not be available. Let’s talk about one type of data that I think will prove critical in enhancing understanding of, and predictive capacity concerning, viewership moving forward.

    Psychometrics: Who is Watching and Why

    People love to talk demographics when it comes to media consumption. Indeed, anyone who’s taken a movie business class is likely to be familiar with the “four quadrant movie”, a movie that can appeal to men and women both over and under the age of 25. But demographics are limited in their explanatory and predictive utility in that they generally go as far as telling us the who but not necessarily the why.

    That’s where psychometrics (a.k.a. psychographics) can provide a boost. Individuals in the same demographic can easily have different tendencies, values, preferences; an example would be the ability to divide men or women into people who tend to be DIYers, early adopters, environmentalists, etc. based on their measured characteristics across various dimensions. Similarly, people of different demographics can easily have similar characteristics, such as being high in thrill-seeking, being open to new experiences, or identifying as left/right politically. Such psychometric variables have indeed been shown to influence media preference — for example, agreeable people like talk shows and soaps more, higher sensation seeking individuals like violent content more — and improve the capacity of recommendation models. My own research has shown that even abbreviated psychometric measures can produce an improvement in model fit to genre preference data compared to demographic data alone. Consumer data companies have already begun to recognize the importance of psychometric data, with many of them incorporating them in some form into their services.

    Psychometric data can be useful at the individual level at which they are often collected, or aggregated to provide group-level — audience, userbase, country, so on — psychometric features of various kinds. Some such data might come ‘pre-aggregated’ at the source, as is the case with, for example, Hofstede’s cultural dimensions. In terms of collection, when direct collection for all viewers in an audience isn’t feasible (e.g. when you can’t survey millions of users), a “seed” set of self-report survey data from responding viewers could be used to impute the values to similar non-respondents using nearest neighbor methods. Psychometric data can also be beneficial in cold-start problem scenarios — if you don’t have direct data about what a particular audience watches or how much they would watch particular titles, wouldn’t data about their characteristics, which point to the types of content they are likely to want, be useful?
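
    As a minimal sketch of that seed-and-impute approach, the example below fits a nearest-neighbour model on a handful of hypothetical survey respondents and uses it to estimate a psychometric score for non-respondents whose behavioural features are known. The feature definitions and all numbers are invented for illustration.

        import numpy as np
        from sklearn.neighbors import KNeighborsRegressor

        # Hypothetical "seed" respondents: behavioural features known for everyone
        # (shares of viewing by genre), plus a surveyed sensation-seeking score (1-5).
        seed_features = np.array([
            [0.70, 0.10, 0.20],   # action, documentary, comedy viewing shares
            [0.10, 0.60, 0.30],
            [0.50, 0.20, 0.30],
            [0.05, 0.80, 0.15],
        ])
        seed_scores = np.array([4.2, 2.1, 3.6, 1.8])

        # Fit on the seed, then impute scores for non-respondents with similar behaviour.
        knn = KNeighborsRegressor(n_neighbors=2).fit(seed_features, seed_scores)
        non_respondents = np.array([[0.65, 0.15, 0.20], [0.08, 0.70, 0.22]])
        print(knn.predict(non_respondents))   # imputed psychometric scores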

    Consumption as Viewer-Content Trait Interaction

    The above section discusses psychometrics in particular, but zooming out a bit, what it is more broadly pushing for is an expansion of the viewer/audience feature space beyond the demographic and behavioral. This is because all consumption is inherently an interaction between the traits of a viewer and the traits of a piece of content. This concept is simpler and better-trodden than it may sound; all it really means is that some element of a viewer (viewer trait) means they are more (or less) drawn to some element of a piece of content (content trait). Even familiar stereotypes about genre preferences — children are more into animation, men are more into action, etc. — inherently concern viewer-content trait interactions (viewer age-content genre, viewer sex-content genre in the above examples), and the aforementioned research on the effects of viewer psychometrics on content preferences also falls under this paradigm.

    The larger the array of viewer traits we have, the more things we can consider might interact with some kind of content trait to impact their interest in consuming the title. Conversely, this also means that it is beneficial to have new forms of data title-side as well. It can seem like people more readily ‘get deep’ with title-side data, in the form of metadata (genre, cast, crew, studio, awards, average reviews, etc.), than they do with viewer-side data, but there’s still room for expansion title-side, especially if one is expanding viewer-side data as suggested above through collection of psychometrics and the like. Tags and tagging are a good place to start in this regard. Human tagging can particularly be beneficial by capturing latent information still difficult for machines to detect on their own — e.g. humor, irony, sarcasm, etc. — but automated processes can provide useful baseline content tags of a consistent nature. However, these days, tags are just the start when it comes to generating additional title-side data. It’s possible to engineer all sorts of features from the audio and video of titles, as well as to extract the emotional arc of a story from text.

    Once you consider consumption from the viewer-content interaction lens and expand data collection on both the viewer and title sides, the possibilities really open up. You could, for example, code the race/ethnicity and gender of characters in a title and see how demographic similarity between the title cast/crew and the typical users of a streaming platform can impact the title’s success. Or maybe you want to code titles for their message sensation value to see how that’s associated with the title’s appeal to a particular high sensation-seeking group. Or perhaps you want to use data from OpenSubtitles or the like to determine the narrative arc type of all the titles in your system and see if any patterns arise as to the appeal of certain arcs to individuals of certain psychographics.
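
    A small sketch of what such viewer-content trait interactions could look like in practice follows. The trait names and values are hypothetical; in a real pipeline, features like these would be fed into a recommendation or prediction model alongside the raw viewer and title traits.

        import pandas as pd

        # Hypothetical viewer-side and title-side traits (column names are illustrative only).
        viewers = pd.DataFrame({
            "viewer_id": [1, 2],
            "sensation_seeking": [4.5, 1.9],        # psychometric score
        })
        titles = pd.DataFrame({
            "title_id": ["A", "B"],
            "message_sensation_value": [0.9, 0.2],  # coded intensity of the content
        })

        # Build every viewer-title pair with an explicit interaction feature.
        pairs = viewers.merge(titles, how="cross")
        pairs["sensation_match"] = pairs["sensation_seeking"] * pairs["message_sensation_value"]
        print(pairs[["viewer_id", "title_id", "sensation_match"]])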

    Parsing the Pipeline: Perception, Interest, Response

    Lastly, there needs to be a more granular consideration of the consumption pipeline, from interest to response. Though easily lumped together as “good” signals of a consumer’s feelings about a title, being interested in, watching, and liking a piece of content are entirely different things. Here’s how the full viewing process should be parsed out when possible, separated broadly into pre-consumption and post-consumption phases.

    Perception (Pre-consumption): Individuals of different demographics, and presumably of different psychographics, can perceive the same media product differently. These perceptions can be shaped by the elements of a product’s brand design, font, colors, and advertisements. Perception arguably has important effects on the next phase in the pipeline.

    Interest and Selection (Pre-consumption): First off, though the two are related and the former certainly increases the likelihood of the latter, it is important to note that interest (a.k.a. preference) is not the same as selection (a.k.a. choice). Though analyses regarding one may often be relevant to the other, we cannot always assume that an individual who expresses interest in something, or has a high likelihood of being interested in something, will always choose to consume it. This is well exemplified by models like the Reasoned Action Model, within which framework an individual who feels favorably about watching a movie may still not watch it due to perceived unfavorable norms about watching said movie. Examining factors driving interest-selection conversion may be beneficial.

    Response (Post-consumption): Lastly, there is how individuals feel after watching a piece of content. This could be as simple as whether they liked it or not; and though it can be tempting to equate high viewership with “wow, people really like that movie” when looking at a dataset, it’s critical to remember that how much people watch something and whether they like it are related but ultimately separate things, as anyone who was stoked for a movie then crushed by its mediocrity can attest; my own research has shown that the effects at play with interest in unseen content can differ from, even be the opposite of, the effects at play with liking of seen content. Beyond liking, responses can also include elements such as how viewers felt about the content emotionally, how much they related to the characters, to what degree they were immersed in the storyline, and more.

    Media preference and consumption need not be considered a singular, stationary process but, separated out this way, a fluid, modular process in which strategic management of upstream phases can impact the likelihood of desired outcomes, whatever they may be, down the line. How can we selectively optimize perception of a media product across different demographic and psychographic groups to get maximum interest in a title — or perhaps, optimize the desired downstream outcome? How can we optimally convert interest into selection? Can certain upstream perceptions or overly high levels of interest interact adversely with the content of a certain title such that the ultimate response to the title is more negative than it would have been had perceptions been different or interest less extreme? In addition, though I provide potential key mechanisms of relevance to each step of the pipeline, certain mechanisms may be of relevance at multiple phases or across different phases of the pipeline — for example, (potential) viewer-character similarity may impact perception of and interest in a title after exposure to advertising, while social network effects may mean the post-consumption responses of certain individuals heavily influence pre-consumption interest among other individuals.

    Conclusion

    As an industry, we’ve only begun to scratch the surface of how data can help us understand, predict, and influence content consumption, and these are just some of my thoughts on what I believe will be important considerations as data science becomes ever more prevalent and critical in the entertainment space. Audience psychometrics will help enhance understanding of audiences beyond what demographics can do alone; considering interactions between new audience and content features will provide superior strategic insights and predictive capacity; and a nuanced consideration of the full consumption pipeline from interest to response will help optimize desired outcomes.

    Author: Danny Kim

    Source: Towards Data Science

  • How Nike And Under Armour Became Big Data Businesses

    Like the Yankees vs the Mets, Arsenal vs Tottenham, or Michigan vs Ohio State, Nike and Under Armour are some of the biggest rivals in sports.
     
    But the ways in which they compete — and will ultimately win or lose — are changing.
     
    Nike and Under Armour are both companies selling physical sports apparel and accessories products, yet both are investing heavily in apps, wearables, and big data.  Both are looking to go beyond physical products and create lifestyle brands athletes don’t want to run without.
     
    Nike
     
    Nike is the world leader in multiple athletic shoe categories and holds an overall leadership position in the global sports apparel market. It also boasts a strong commitment to technology, in design, manufacturing, marketing, and retailing.
     
    It has 13 different lines, in more than 180 countries, but how it segments and serves those markets is its real differentiator. Nike calls it “category offense,” and divides the world into sporting endeavors rather than just geography. The theory is that people who play golf, for example, have more in common than people who simply happen to live near one another.
     
    And that philosophy has worked, with sales reportedly rising more than 70% since the company shifted to this strategy in 2008. This retail and marketing strategy is largely driven by big data.
     
    Another place the company has invested big in data is with wearables and technology. Although it discontinued its own FuelBand fitness wearable in 2014, Nike continues to integrate with many other brands of wearables including Apple, which has recently announced the Apple Watch Nike+.
     
    But the company clearly has big plans for its big data as well. In a 2015 call with investors about Nike’s partnership with the NBA, Nike CEO Mark Parker said, “I’ve talked with commissioner Adam Silver about our role enriching the fan experience. What can we do to digitally connect the fan to the action they see on the court? How can we learn more about the athlete, real-time?”
     
    Under Armour
     
    Upstart Under Armour is betting heavily that big data will help it overtake Nike. The company has recently invested $710 million in acquiring three fitness app companies, including MyFitnessPal, and their combined community of more than 120 million athletes — and their data.
     
    While it’s clear that both Under Armour and Nike see themselves as lifestyle brands more than simply apparel brands, the question is how this shift will play out.
     
    Under Armour CEO Kevin Plank has explained that, along with a partnership with a wearables company, these acquisitions will drive a strategy that puts Under Armour directly in the path of where big data is headed: wearable tech that goes way beyond watches.
     
    In the not-too-distant future, wearables won’t just refer to bracelets or sensors you clip on your shoes, but rather apparel with sensors built in that can report more data more accurately about your movements, your performance, your route and location, and more.
     
    “At the end of the day we kept coming back to the same thing. This will help drive our core business,” Plank said in a call with investors. “Brands that do not evolve and offer the consumer something more than a product will be hard-pressed to compete in 2015 and beyond.”
     
    The company plans to provide a full suite of activity and nutritional tracking and expertise in order to help athletes improve, with the assumption that athletes who are improving buy more gear.
     
    If it has any chance of unseating Nike, Under Armour has to innovate, and that seems to be exactly where this company is planning to go. But it will have to connect its data to its innovations lab and ultimately to the products it sells for this investment to pay off.
     
     
    Source: forbes.com, November 15, 2016
  • How the data-based gig economy affects all markets

    Data is infinite. Any organization that wants to grow at a meaningful pace would be wise to learn how to leverage the vast amount of data available to drive growth. Just ask the top five companies in the world today: Apple, Amazon, Google, Facebook, and Microsoft. All these technology giants either process or produce data.

    Companies like these with massive stockpiles of data often find themselves surrounded by other businesses that use that data to operate. Salesforce is a great example: Each year at its Dreamforce conference in San Francisco, hundreds of thousands of attendees and millions of viewers worldwide prove just how many jobs the platform has created.

    Other companies are using vast amounts of information from associated companies to enhance their own data or to provide solutions for their clients to do so. When Microsoft acquired LinkedIn, for instance, it acquired 500 million user profiles and all of the data that each profile has generated on the platform. All ripe for analysis.

    With so much growth evolving from a seemingly infinite ocean of data, tomorrow’s leading companies will be those that understand how to capture, connect, and leverage information into actionable insight. Unless they’re already on the top 10 list of the largest organizations, the problem most companies face is a shortage of highly skilled talent that can do it for them. Enter the data scientist.

    More data, more analysts

    The sheer amount of data at our fingertips isn’t the only thing that’s growing. According to an Evans Data report, more than 6 million developers across the world are officially involved in analyzing big data. Even traditionally brick-and-mortar retail giant Walmart plans to hire 2,000 tech experts, including data scientists, for that specific purpose.

    Companies old and new learned long ago that data analysis is vital to understanding customers’ behavior. Sophisticated data analytics can reveal when customers are likely to buy certain products and what marketing methods would be effective in certain subgroups of their customer base.

    Outside of traditional corporations, companies in the gig economy are relying even more on data to utilize their resources and workforce more efficiently. For example, Uber deploys real-time user data to determine how many drivers are on the road at any given time, where more drivers are needed, and when to enact a surge charge to attract more drivers.
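
    As a toy illustration of that kind of real-time supply-and-demand calculation (not Uber's actual method, and with made-up numbers), a surge multiplier could be derived from the ratio of open ride requests to available drivers:

        def surge_multiplier(ride_requests, available_drivers, base=1.0, cap=3.0):
            """Toy supply/demand multiplier: grows as requests outstrip drivers."""
            if available_drivers == 0:
                return cap
            ratio = ride_requests / available_drivers
            return round(min(cap, max(base, base * ratio)), 2)

        # 120 open requests against 40 nearby drivers -> demand pressure hits the cap.
        print(surge_multiplier(120, 40))   # 3.0
        print(surge_multiplier(50, 45))    # 1.11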

    Data scientists are in demand and being hired by the thousands. Some of the most skilled data scientists are going the freelance route because their expertise allows them to choose more flexible work styles. But how can data scientists who aren’t interested in becoming full-time, in-house hires ensure that the companies for which they freelance are ready for their help?

    The data-based gig economy

    Gartner reports that the number of freelance data scientists will grow five times faster than that of traditionally employed ones by next year. The data-based gig economy can offer access to top talent on flexible schedules. But before data scientists sign on for a project, they should check to see that companies are prepared in the following areas:

    • Companies need to understand their data before they decide what to do with it. That data could include inventory, peak store hours, customer data, or other health metrics.
    • Next, businesses should have streamlined the way they collect and store their data to make it easy to analyze. Use of a CRM platform is a good indicator of preparedness at this stage.
    • Finally, companies need to be able to act on the insights they glean. After freelancers are able to use organizations’ collected and organized data to find valuable connections and actionable insights, those organizations should have a process for implementing the discoveries.

    Today’s organizations need data in order to be successful, and they need data scientists to make use of that data. In order for both parties to thrive in this era, companies need to have the right strategies in place before they invest in freelance talent. When they do, freelance data scientists will have the opportunity to gather critical knowledge from the data and use their talents to drive innovation and success.

    Author: Marcus Sawyerr

    Source: Insidebigdata

  • How to act now to be successful in the future? Digital business models

    Digital business models created around data are producing a winner-take-all market, not a digital divide. That’s why leaders need to “stop doing analytics for analytics’ sake, focus on the business problem, and define and ask the big questions of your data,” warns disrupting digital business author Ray Wang in 10 Enterprise Analytics Trends to Watch.

    The Constellation Research founder and principal analyst notes that digital leaders are now grabbing 70% of overall market share, and more than 75% of profits. A Harvard Business Review Analytic Services report that features insights from Wang warns brands of “an existential imperative; those companies that do not evolve into data-driven organizations will be supplanted by those that do.”

    For most, a long way to go and a short time to get there

    The inflection point for the data-driven enterprise report, based on a survey of 729 business leaders conducted by Harvard Business Review Analytic Services, shows that while 90% of respondents say they’re confident that their organizations will achieve their vision of a data-driven enterprise, most have an alarmingly long way to go:

    • While 86% say the ability to extract new value and insights from existing data and analytics applications is very important, only 30% say their organization is currently very effective at doing so.
    • While 78% say accessing and combining data from a variety of external data sources is very important, just 23% say their organization is currently very effective at doing so.

    And those new digital business models that are, according to Ray Wang, creating a winner-take-all market? Only 28% of respondents say that introducing new business models is a key goal of their evolution into a data-driven organization. For leaders this is key to digital transformation, says Wang. For the remaining 72% that don’t have new business model creation or business model evolution as a goal, there’s simply no time to wait.

    “This is a top-down strategic business model decision that boards have to address,” says Wang. “Boards aren’t doing their jobs because they don’t understand the problem: they’re in a data war, and data is the weapon.”

    Leaders are moving further ahead, faster

    In 10 Enterprise Analytics Trends to Watch, Wang notes that you’ll also see analytics leaders applying artificial intelligence for business agility and scale. This automation and augmentation of data and insights is set to move leaders and fast followers even further ahead in their digital transformation.

    “The situation in almost every market is that executives realize that they need to transform. They want to start using artificial intelligence, for example,” says Wang. “But they don’t realize that these changes happen along a continuum. It’s an intensive, multi-year process.”

    As the next decade looms, the race is on to make the most of data, and to make more of it than your competitors do. Is your 2020 vision for data and analytics clear?

    “Every board member and CEO needs to understand that data assets have to be managed the same way they manage any other asset. If they don’t, they will be disrupted.” - Ray Wang, Constellation Research

    Source: Microstrategy

  • How to Benchmark Your Marketing Performance Against Your Competition's

    In today's digital marketing world, competitive intelligence often takes a back seat to all the key performance indicators (KPIs) on which marketers are focused—open rates, social engagement metrics, lead-to-sales opportunity conversion rates, etc.

    That inward focus on how well you are doing with your revenue-driving marketing tactics is critical. But it can lead you to celebrate the wrong things. Don't let your KPIs overshadow the importance of knowing exactly how your digital marketing strategies are performing in relation to your peers who are competing against you in the market.

    If you forget to look at the bigger picture, you'll miss a perspective that, well, separates the best marketers from the mediocre ones.

    You can easily keep tabs on how your campaigns measure up against others in your industry without hiring an expensive third-party research firm. Of course, there may be times when you do need customer research and use a fancy detailed matrix of your competitors for in-depth analysis for identifying new products or for market sizing.

    But I'm talking about a quick and easy dashboard that measures you, the marketer, against your competitors.

    Why Spy?

    Competitive intelligence helps you...

    • Increase your chances of winning in the marketplace
    • Shape the development of your digital marketing strategy
    • Create a strategy for new product launches
    • Uncover threats and opportunities
    • Establish benchmarking for your analytics

    Most businesses do not have the luxury of having a dedicated employee, let alone a dedicated team, to gather and analyze gobs of data. However, you can easily track basic KPIs to inform decision-making at your company.

    Having analyzed the digital marketing strategies of numerous companies of various sizes across industries, including e-commerce, SaaS, and travel—and their competitors—I suggest the following for benchmarking.

    Website Performance Metrics

    To track the performance of a website, gather data from sites such as SEMRush, Pingdom, Similarweb, and Alexa. That data is not always accurate, but when you compare three or four competitors at once you can spot trends.

    Important metrics to monitor include the following:

    • Website visits: The average number of visitors per month lets you quickly size up how popular you and your competitors are.
    • Bounce rate and site speed: Correlate these two metrics. That's how you can determine whether you need to make changes to your own website. For example, if your website has a high page-load time compared with your competitors, that will impact your page rankings, bounce rate, and overall customer satisfaction.
    • Geographic sources of traffic: Look at what percentage of visitors comes from what regions. That's critical if your company plans to expand beyond its current geographical presence. It will also allow you to spot global opportunities by finding gaps in distribution when looking at all competitors.
    • Website traffic by channel: See where your competitors choose to spend their time and money. For example, a company that has a higher percentage of visitors from email probably has a large prospect database. If you look at their website, you can examine how they collect data for their email marketing programs. Are they getting website visitors to sign up for newsletters or special offers? If not, they may be purchasing prospect data from a data provider. You can adjust your own strategy to ramp up marketing campaigns in areas where your competitors are not actively engaging prospects, or to increase spending in areas where they are outperforming you.

    Benchmarking data from industry research reports is also helpful for tracking average open, click-through, and conversion rates.

    By combining your newfound competitor insights with your own metrics, including your past performance, you can establish your own benchmarks.
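
    For illustration, here is a minimal benchmarking sketch in Python. The company names, metric values, and the index_against_peers helper are all hypothetical; in practice the figures would be exported from tools such as SEMRush or Similarweb rather than hard-coded.

        # Minimal competitor-benchmarking sketch; every figure below is a made-up placeholder.
        # In practice the numbers would come from exports out of SEMRush, Similarweb, etc.
        from statistics import mean

        # Monthly metrics per company: visits, bounce rate (%), average page-load time (s),
        # and the share of traffic arriving via email (%).
        metrics = {
            "you":          {"visits": 120_000, "bounce_rate": 58.0, "load_time_s": 4.1, "email_share": 6.0},
            "competitor_a": {"visits": 310_000, "bounce_rate": 41.0, "load_time_s": 2.3, "email_share": 18.0},
            "competitor_b": {"visits":  95_000, "bounce_rate": 63.0, "load_time_s": 4.8, "email_share": 4.0},
        }

        def index_against_peers(company: str, metric: str) -> float:
            """Return the company's value as an index, where 100 = the peer average."""
            peers = [values[metric] for name, values in metrics.items() if name != company]
            return 100.0 * metrics[company][metric] / mean(peers)

        for metric in ("visits", "bounce_rate", "load_time_s", "email_share"):
            print(f"{metric:12s} index vs. peers: {index_against_peers('you', metric):6.1f}")

        # Quick look at the bounce-rate / page-speed relationship across all companies:
        # sort by load time and check whether bounce rate rises with it.
        for name, m in sorted(metrics.items(), key=lambda kv: kv[1]["load_time_s"]):
            print(f"{name:14s} load {m['load_time_s']:.1f}s -> bounce {m['bounce_rate']:.0f}%")

    An index above 100 on visits is good news; above 100 on bounce rate or load time is a warning sign.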

    Mining for More Data

    Where are your competitors spending their advertising budgets? How are they using social media and PR? What jobs are they posting? Those answers are not hard to find, and they provide powerful insights.

    • SEO/PPC research: Tools are available to help you determine what ads your competitors are running and how they rank for particular keywords. Check out SEMRush, SpyFu, and WhatRunsWhere. You can also look at their overall spending for PPC campaigns. Depending on the source, however, the accuracy of this data can be as low as 50%. So use it for gauging overall direction, but don't rely on it entirely.
    • Social media: This is probably the hottest area of marketing and the hardest to assess. Mining data on social channels is especially tough when tracking consumer brands. It's best to monitor your competitors' activities monthly, and make sure to look at the posts and promotions that companies generate. When updating or changing your strategy, you should have a solid understanding of what social media channels your competitors are using, the types of posts they are making, how frequently they post, and how successful they are (including number of users and levels of engagement); a rough engagement-rate sketch follows this list.
    • PR: Press releases, financial reports, and thought-leadership blog posts distributed by your competitors provide great insight into their partnerships, possible marketing spending, and other initiatives.
    • Job postings: From time to time, take a look at LinkedIn or other job sites and you can get a good idea of where and how the company plans to expand.
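
    As a rough illustration of the social media point above, the sketch below compares engagement rates from manually collected monthly figures. All account names and numbers are invented placeholders.

        # Rough engagement-rate comparison from manually collected monthly social media figures.
        # All names and numbers are invented placeholders, not real accounts or exports.
        posts = [
            # (company, followers, posts this month, total likes, total comments, total shares)
            ("you",          12_000, 18,  950, 120,  60),
            ("competitor_a", 48_000, 40, 6200, 840, 510),
            ("competitor_b",  9_500,  6,  280,  35,  12),
        ]

        print(f"{'company':<14s}{'posts/mo':>10s}{'eng./post':>12s}{'eng. rate %':>13s}")
        for company, followers, n_posts, likes, comments, shares in posts:
            interactions = likes + comments + shares
            per_post = interactions / n_posts
            # Engagement rate: average interactions per post as a share of the follower base.
            rate = 100.0 * per_post / followers
            print(f"{company:<14s}{n_posts:>10d}{per_post:>12.1f}{rate:>13.2f}")

    Tracked monthly, a table like this shows at a glance who is posting more, and whose posts actually resonate.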

    Frequency of Competitive Analysis

    How frequently you should analyze your competition depends on the type of business you have and the competitive landscape.

    For example, if you are selling a product in the SaaS Cloud space where you have 10 competitors, most of which are leading innovators, it makes sense to track their every move. However, if you are a B2B company and you have only one or two competitors in the manufacturing sector, you probably can get away with doing some basic benchmarking once every quarter.

    It is advisable to do a competitive analysis prior to changing strategy, launching a new product, or making tactical plans for the next quarter or year.

    Don't Be Afraid: Know Where You Stand

    Here's the bottom line: Don't get too excited about your 5% jump in email open rates, or passing a "likes" milestone on Facebook. Have the courage to see whether you are really a marketing rock star by benchmarking yourself against your competitors. Your business needs to know what your competition is doing. And I don't mean just knowing your competitors' products and pricing.

    With the insights you'll get from these tips and tools, you will be able to create a solid strategy, spot-on tactical plans, and (at the very least) a fantastic presentation to your executives or board.

    Source: MarketingProfs

  • How to Do Big Data on a Budget?

    To really make the most of big data, most businesses need to invest in some tools or services - software, hardware, maybe even new staff - and there's no doubt that the costs can add up. The good news is that big data doesn't have to cost the Earth and a small budget needn't prevent companies from stepping into the world of big data. Here are some tips and ideas to help keep costs down:

    Think about your business objectives
    Too many businesses focus on collecting as much data as possible which, in my view, misses the whole point of big data. The objective should be to focus on the data that helps you achieve your strategic objectives. The whole point of big data should be to learn something from your data, take action based on what you've learned and grow your business as a result. Limiting the scope of your data projects so they tightly match your business goals should help keep costs down, as you can focus only on the data you really need.

    Make use of the resources you already have
    Before you splash out on any new technology, it's worth looking at what you're already using in your business. Some of your existing infrastructure could have a role to play. Go through each of the four key infrastructure elements (data sources, data storage, data analysis and data output) and note what related technology or skills you already have in-house that could prove useful. For example, you may already be collecting useful customer data through your website or customer service department. Or you very likely have a wealth of financial and sales data that could provide insights. Just be aware that you may already have some very useful data that could help you achieve your business objectives, saving you time and money.

    Look for savings on software
    Open source (free) software, like Hadoop, exists for most of the essential big data tasks. And distributed storage systems are designed to run on cheap, off-the-shelf hardware. The popularity of Hadoop has really opened big data up to the masses - it allows anyone to use cheap off-the-shelf hardware and open source software to analyse data, providing they invest time in learning how. That's the trade-off: it will take some time and technical skill to get free software set up and working the way you want. So unless you have the expertise (or are willing to spend time developing it) it might be worth paying for professional technical help, or 'enterprise' versions of the software. These are generally customised versions of the free packages, designed to be easier to use, or specifically targeted at various industries.

    Take advantage of big data as a service (BDaaS)
    In the last few years many businesses have sprung up offering cloud-based big data services to help other companies and organisations solve their data dilemmas. This makes big data a possibility for even the smallest company, allowing them to harness external resources and skills very easily. At the moment, BDaaS is a somewhat vague term often used to describe a wide variety of outsourcing of various big data functions to the cloud. This can range from the supply of data, to the supply of analytical tools which interrogate the data (often through a web dashboard or control panel) to carrying out the actual analysis and providing reports. Some BDaaS providers also include consulting and advisory services within their BDaaS packages.

    BDaaS removes many of the hurdles associated with implementing a big data strategy and vastly lowers the barrier to entry. When you use BDaaS, all of the techy 'nuts and bolts' are, in theory, out of sight and out of mind, leaving you free to concentrate on business issues. BDaaS providers generally take this on for the customer - they have everything set up and ready to go - and you simply rent the use of their cloud-based storage and analytics engines and pay either for the time you use them or the amount of data crunched. Another great advantage is that BDaaS providers often take on the cost of compliance and data protection - something which can be a real burden for small businesses. When the data is stored on the BDaaS provider's servers, they are (generally) responsible for it.
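
    As a back-of-the-envelope illustration of that pay-as-you-go model, the sketch below compares a usage-priced BDaaS bill with a fixed in-house setup. Every price and volume is an invented placeholder and would need to be replaced with real vendor quotes.

        # Back-of-the-envelope cost comparison: usage-priced BDaaS vs. a fixed in-house setup.
        # Every figure below is an invented placeholder, not a real vendor price.

        def bdaas_monthly_cost(tb_processed: float,
                               price_per_tb: float = 90.0,
                               subscription_fee: float = 250.0) -> float:
            """Pay-per-use model: a flat subscription plus a rate per terabyte analysed."""
            return subscription_fee + price_per_tb * tb_processed

        def inhouse_monthly_cost(hardware_capex: float = 30_000.0,
                                 amortisation_months: int = 36,
                                 staff_and_power: float = 1_800.0) -> float:
            """Owned cluster: hardware spread over its lifetime plus monthly running costs."""
            return hardware_capex / amortisation_months + staff_and_power

        for tb in (0.5, 2, 10, 40):
            print(f"{tb:5.1f} TB/month   BDaaS ~ {bdaas_monthly_cost(tb):8,.0f}   "
                  f"in-house ~ {inhouse_monthly_cost():8,.0f}")

    With these made-up numbers the crossover sits somewhere in the tens of terabytes per month; the point is simply that the comparison is easy to run before committing to either route.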

    It's not just new BDaaS companies which are getting in on the act; some of the big corporations like IBM and HP are also offering their own versions of BDaaS. HP have made their big data analytics platform, Haven, available entirely through the cloud. This means that everything from storage to analytics and reporting is handled on HP systems which are leased to the customer via a monthly subscription - entirely eliminating infrastructure costs. And IBM's Analytics for Twitter service provides businesses with access to data and analytics on Twitter's 500 million tweets per day and 280 million monthly active users. The service provides analytical tools and applications for making sense of that messy, unstructured data and has trained 4,000 consultants to help businesses put plans into action to profit from them.

    As more and more companies realise the value of big data, more services will emerge to support them. And competition between suppliers should help keep subscription prices low, which is another advantage for those on a tight budget. I've already seen that BDaaS is making big data projects viable for many businesses that previously would have considered them out of reach - and I think it's something we'll see and hear a lot more about in the near future.

    Source: HuffPost
