144 items tagged "data science"

  • ‘Progress in BI, but keep an eye on ROI’

    Business intelligence (BI) has already been named by Gartner as the top priority for the CIO in 2016. The Computable experts also predict that many large steps will be taken in BI. At the same time, managers must look back and reconsider their business model when deploying big data: how do you justify the investments in big data?

    Kurt de Koning, founder of Dutch Offshore ICT Management
    Business intelligence/analytics has been put at number one by Gartner on the 2016 priority list for the CIO. In 2016, users will increasingly base their decisions on management information drawn from multiple sources, part of which will consist of unstructured data. BI tools will therefore not only have to present information in a visually attractive way and offer a good user interface. When it comes to unlocking the data, the tools that stand out will be those able to create order and overview out of the many forms in which data appears.

    Laurent Koelink, senior interim BI professional at Insight BI
    Big data solutions alongside traditional BI
    Due to the growth in the number of smart devices, organisations have ever more data to process. Because insight (in the broadest sense) will be one of the most important success factors of the future for many organisations that want to respond flexibly to market demand, they will also have to be able to analyse all these new forms of information. I do not see big data as a replacement for traditional BI solutions, but rather as a complement when it comes to the analytical processing of large volumes of (mainly unstructured) data.

    In-memory solutions
    Organisations increasingly run into the performance limitations of traditional database systems when large volumes of data have to be analysed ad hoc. Specific hybrid database/hardware solutions, such as those from IBM, SAP and Teradata, have always offered answers here. These are now increasingly joined by in-memory solutions, partly because they are becoming more affordable and therefore more accessible, and partly because such solutions are becoming available in the cloud, which keeps their costs well under control.

    Virtual data integration
    Where data is now often still physically consolidated in separate databases (data warehouses), this will, where possible, be replaced by smart metadata solutions that (whether or not with temporary physical, sometimes in-memory, storage) make time-consuming data extraction and integration processes unnecessary.

    Agile BI development
    Organisations are increasingly forced to move flexibly within and along with the chain they operate in. This means that the insights used to steer the business (the BI solutions) must move flexibly along as well. This requires a different way of working from BI development teams. More and more, you therefore see methods such as Scrum being applied to BI development too.

    BI for everyone
    Where BI has traditionally been the domain of organisations, you now see consumers making more and more use of BI solutions as well. Well-known examples are insight into personal finances and energy consumption. The analysis of income and expenses in your bank's web portal or app, as well as the analysis of data from smart energy meters, are telling examples. This will only increase and become more integrated in the coming years.

    Rein Mertens, head of analytical platform at SAS
    An important trend that I expect to reach maturity in 2016 is ‘streaming analytics’. Today, big data is an inseparable part of our daily practice. The amount of data generated per second keeps increasing, both in the personal and the business sphere. Just look at your daily use of the internet, e-mails, tweets, blog posts and other social networks. And on the business side: customer interactions, purchases, customer service calls, promotion via SMS/social networks, et cetera.

    An increase in volume, variety and velocity of five exabytes every two days worldwide. That figure even excludes data from sensors and other IoT devices. There is bound to be interesting information hidden in all this data, but how do you analyse it? One way is to make the data accessible and store it in a cost-effective big data platform. A technology such as Hadoop inevitably comes into play, after which you use data visualisation and advanced analytics to extract relationships and insights from that mountain of data. In effect, you send the complex logic to the data, without having to pull all the data out of the Hadoop cluster.

    But what if you wanted to make smart decisions in real time based on these large volumes of data? You then have no time to store the data first and analyse it afterwards. Instead, you want to be able to assess, aggregate, track and analyse the data directly in-stream, for example to detect unusual transaction patterns or analyse sentiment in text and act on it immediately. In effect, you send the data past the logic! Logic that sits in memory and has been developed to do this very quickly and very smartly, and to store only the final results. Examples of more than one hundred thousand transactions are no exception here. Per second, that is. Stream it, score it, store it. That is streaming analytics!
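    To make the ‘stream it, score it, store it’ idea concrete, here is a minimal Python sketch of in-stream scoring: each incoming transaction is scored against running per-account statistics held in memory, and only the scored outcomes are stored. The transaction stream, account field and threshold are invented for illustration and stand in for whatever event source and rules a real deployment would use.

      from collections import defaultdict
      import math

      # In-memory running state per account (Welford's online algorithm for mean/variance).
      state = defaultdict(lambda: {"n": 0, "mean": 0.0, "m2": 0.0})
      scored_results = []  # "store it": only outcomes are persisted, not the raw stream

      def score(txn, z_threshold=3.0):
          """Score a single transaction as it streams past the in-memory logic."""
          s = state[txn["account"]]
          amount = txn["amount"]

          # Flag the amount if it deviates strongly from this account's running profile.
          std = math.sqrt(s["m2"] / s["n"]) if s["n"] > 1 else 0.0
          is_suspicious = s["n"] > 5 and std > 0 and abs(amount - s["mean"]) / std > z_threshold

          # Update the running statistics with the new observation.
          s["n"] += 1
          delta = amount - s["mean"]
          s["mean"] += delta / s["n"]
          s["m2"] += delta * (amount - s["mean"])

          scored_results.append({"id": txn["id"], "suspicious": is_suspicious})
          return is_suspicious

      # Toy stream: mostly small purchases on one account, then one large outlier.
      stream = [{"id": i, "account": "A", "amount": 20 + (i % 5)} for i in range(50)]
      stream.append({"id": 999, "account": "A", "amount": 5000})

      for txn in stream:
          if score(txn):
              print("suspicious transaction:", txn)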

    Minne Sluis, founder of Sluis Results
    From IoT (internet of things) to IoE (internet of everything)
    Everything is becoming digital and connected, even more so than we could imagine only a short while ago. The application of big data methods and techniques will therefore take an even greater flight.

    The call for adequate data governance will grow
    Although the new world revolves around letting go, giving trust and freedom, and co-creation, the call for manageability will nevertheless increase. Provided it is approached primarily from a facilitating role and ensures more consistency and reliability, that is by no means a bad thing.

    The business impact of big data & data science keeps growing
    The impact of big data & data science on reinventing business processes, services and products, digitalising them further (and making them more intelligent), or in some cases eliminating them altogether, will continue.

    Consumerisation of analytics continues
    Strongly improved and truly intuitive visualisations, underpinned by good meta-models and thus data governance, drive this development. Democratisation and independence from third parties (other than services deliberately consumed from the cloud) are thereby becoming ever more of a reality.

    Big data & data science will fully break through in the non-profit sector
    The subtle objectives of the non-profit sector, such as improving quality, (patient/client/citizen) safety, punctuality and accessibility, call for big data applications. After all, that subtlety requires more good information, and therefore data, faster and with more detail and nuance than what typically comes out of the more traditional BI environments today. If the non-profit sector manages to translate the much-needed focus of the profit sector on ‘profit’ and ‘revenue growth’ to its own situation, successful big data initiatives are just around the corner! Mind you, this prediction naturally applies in full to healthcare as well.

    Hans Geurtsen, business intelligence architect data solutions at Info Support
    From big data to polyglot persistence
    In 2016 we will no longer talk about big data, but simply about data. Data of all kinds and in all volumes that call for different kinds of storage: polyglot persistence. Programmers have known the term polyglot for a long time. An application anno 2015 is often already written in multiple languages. But on the storage side of an application, too, relational is no longer the only game in town. We will increasingly apply other kinds of databases in our data solutions, such as graph databases, document databases, et cetera. Alongside specialists who know everything about one type of database, you will then also need generalists who know exactly which database is suited to what.

    The breakthrough of the modern data warehouse
    ‘A polyglot is someone with a high degree of language proficiency in several languages’, according to Wikipedia. That refers to spoken languages, but you come across the term ever more often in the IT field as well: an application that is coded in multiple programming languages and stores its data in multiple kinds of databases. On the business intelligence side, too, one language and one environment no longer suffice. The days of the traditional data warehouse with an ETL street, a central data warehouse and one or two BI tools are numbered. We will see new kinds of data platforms in which all sorts of data from all sorts of sources become accessible to information workers and data scientists using a wide range of tools.

    Business intelligence in the cloud
    Where Dutch companies in particular are still hesitant when it comes to the cloud, you can see the move towards the cloud slowly but surely getting under way. More and more companies realise that security, in particular, is often better arranged in the cloud than they could arrange it themselves. Cloud providers are also doing more and more to attract European companies to their cloud. Microsoft's new data centres in Germany, where not Microsoft but Deutsche Telekom controls access to customer data, are an example of this. 2016 may well become the year in which the cloud truly breaks through and in which we will also see more and more complete BI solutions in the cloud in the Netherlands.

    Huub Hillege, principal data(base) management consultant at Info-Shunt
    Big data
    The big data hype will certainly continue in 2016, but success at companies is by no means guaranteed in advance. Companies and recent graduates keep winding each other up about its application. It is hard to understand why everyone wants to unlock data from Facebook, Twitter and the like while the data in these systems is highly unreliable. At every conference I ask where the business case is, including costs and benefits, that justifies all the investments around big data. Even BI managers at companies encourage people to simply get started. In effect: look back at the data you have or can obtain and investigate whether you can find something you might be able to use. To me this is the biggest pitfall, just as it was with the start of data warehouses in 1992. Companies have limited money in the current circumstances. Frugality is called for.

    The analysis of big data must be focused on the future, based on a clear business strategy and a cost/benefit analysis: which data do I need to support the future? Determine the following:

    • Where do I want to go?
    • Which customer segments do I want to add?
    • Are we going to do more cross-selling (more products) with our current customers?
    • Are we going to take steps to retain our customers (churn)?

    Once these questions have been prioritised and recorded, an analysis must be carried out:

    • Which data/sources do we need for this?
    • Do we have the data ourselves, are there 'gaps', or do we have to buy external data?

    Database management systems
    More and more database management system (DBMS) vendors are adding support for big data solutions, such as the Oracle/Sun Big Data Appliance and Teradata/Teradata Aster with support for Hadoop. In the long run, the DBMS solutions will dominate the field; big data software solutions without a DBMS will ultimately lose out.

    Fewer and fewer people, including today's DBAs, still understand how things work deep down inside a database/DBMS. Increasingly, physical databases are generated from logical data modelling tools, and formal physical database design steps and reports are skipped. Developers who use ETL tools such as Informatica, AbInitio, Infosphere, Pentaho et cetera also ultimately generate SQL scripts that move data from sources to operational data stores and/or the data warehouse.

    BI tools such as Microstrategy, Business Objects, Tableau et cetera also generate SQL statements.
    Such tools are usually initially developed for a particular DBMS, and people quickly assume that they are then applicable to every DBMS. Too little use is then made of the specific physical characteristics of each DBMS.

    This lack of real knowledge then causes performance problems that are discovered at too late a stage. In recent years, by changing database designs and indexes and restructuring complex or generated SQL scripts, I have been able to bring ETL processes down from six to eight hours to one minute, and queries that ran for 45 to 48 hours down to 35 to 40 minutes.
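    As a small, self-contained illustration of the kind of tuning described above, the sketch below uses Python's built-in sqlite3 module: the same query is planned as a full table scan without an index and as an index search once one is created. The table, column names and query are invented for the example; real gains depend on the DBMS and the data.

      import sqlite3

      conn = sqlite3.connect(":memory:")
      cur = conn.cursor()

      # A toy "operational data store" table.
      cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)")
      cur.executemany(
          "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
          [(i % 1000, i * 0.5) for i in range(100_000)],
      )

      query = "SELECT SUM(amount) FROM orders WHERE customer_id = ?"

      # Without an index, SQLite has to scan the whole table.
      print(cur.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())

      # With an index on the filter column, the planner switches to an index search.
      cur.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
      print(cur.execute("EXPLAIN QUERY PLAN " + query, (42,)).fetchall())

      conn.close()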

    Advice
    The amount of data needed will keep growing. Forget about buying all kinds of hyped software packages. Make sure you bring in very strong technical database/DBMS expertise to build the foundation properly from the bottom up, using the power of the DBMS you already have. That frees up time and money (you can get by with smaller systems because the foundation is solid) to select the right tools, after a sound business case and proofs of concept.

  • 3 AI and data science applications that can help in dealing with COVID-19

    3 AI and data science applications that can help in dealing with COVID-19

    All industries already feel the impact of the current COVID-19 pandemic on the economy. As many businesses had to shut down and either switch to telework or let go of their entire staff, there is no doubt that it will take a long time for the world to recover from this crisis.

    Current prospects on the growth of the global economy, shared by different sources, support the idea of the long and painful recovery of the global economy from the COVID-19 crisis.
    Statista, for example, compares the initial GDP growth forecast for 2020 with a forecast based on the impact of the novel coronavirus on GDP growth, estimating a difference of as much as 0.5%.

    The last time that global GDP experienced such a decline was back in 2008 when the global economic crisis affected every industry with no exceptions.

    In the situation with the current pandemic, we also see that different industries change their growth prognoses.
    In the IT industry, for instance, expected spending growth in 2020 falls short of even the pessimistic scenario related to the coronavirus pandemic, and spending is now expected to shrink.

    It would be foolish to claim that the negative effect of the COVID-19 crisis can be reversed. It is already our reality that many businesses and industries around the world will suffer during the current global economic crisis.
    Governments around the world have responded to this crisis with state financial support to keep businesses from going bankrupt. However, this support is only expected to have a short-term effect and will hardly mitigate the final impact of the global economic crisis on businesses around the world.

    So, in search of ways to reduce the damage to a sinking global economy, the world will likely turn, among other things, to technology, just as it did when it was forced to work from home.

    In this article, we offer our stance on how AI and data scientists, in particular, can help respond to the COVID-19 crisis and help relieve its negative effect.

    1. Data science and healthcare system

    The biggest negative effect on the global economy can come from failing healthcare systems. It was the reason why governments around the world ordered citizens to stay at home and self-isolate, as, in many cases, the course of the COVID-19 disease can be asymptomatic.

    Is increasing investment in the healthcare system a bad thing altogether?

    No, if we are talking about healthcare systems at a local level, like a state or a province. “At a local level, increasing investments in the healthcare system increases the demand for related products and equipment in direct ratio,” says Dorian Martin, a researcher at WowGrade.

    However, in case local governments run out of money in their emergency budgets, they might have to ask the state government for financial support.

    This scenario could become our reality if the number of infected people rapidly increases, with hospitals potentially running out of equipment, beds, and, most critically, staff.

    What can data science do to help manage this crisis?

    UK’s NHS healthcare data storage

    Some countries are already preparing for the scenario described above with the help of data scientists.
    For instance, the UK government ordered NHS England to develop a data store that would combine multiple data sources and make them deliver information to one secure cloud storage.
    What will this data include?

    This cloud storage will help NHS healthcare workers access information on the movement of critical staff and the availability of hospital beds and equipment.

    Apart from that, this data storage will help the government to get a comprehensive and accurate view of the current situation to detect anomalies, and make timely decisions based on real data received from hospitals and NHS partner organizations.

    Thus, the UK government and NHS are looking into data science to create a system that will help the country tackle the crisis consistently, and manage the supply and demand for critical hospital equipment needed to fight the pandemic.

    2. AI’s part in creating the COVID-19 vaccine

    Another critical factor affecting the current global economic crisis is the COVID-19 vaccine. It has already become clear that the world is in standby mode until scientists develop a vaccine that will return people to their normal lives.

    It’s a simple cause-and-effect relationship: both the global economy and local economies depend on consistent production, production depends on open and functioning production facilities, which depend on workers, who in turn depend on the vaccine to be able to return to work.

    And while we still have over a year before the COVID-19 vaccine becomes available to the general public, scientists are turning to AI to speed up the process.

    How can AI help develop the COVID-19 vaccine?

    • With the help of AI, scientists can analyze the structure of the virus and how it attaches itself to human cells, i.e., its behavior. This data helps researchers build the foundation for vaccine development.
    • AI and data science become part of the vaccine development process, as they help scientists analyze thousands of research papers on the matter to make their approach to the vaccine more precise.

    An important part of developing a vaccine is analyzing and understanding the virus’s proteins and its genetic sequence. In January 2020, Google DeepMind launched AlphaFold, a system that predicts the virus’s protein structures in 3D. This has already helped U.S. scientists study the virus well enough to create a trial vaccine and launch clinical trials this week.

    However, scientists are also exploring how AI can be involved not only in gathering information, but in the very process of creating a vaccine.

    There have already been cases of drugs successfully created with AI. The British startup Exscientia created its first drug with the help of artificial intelligence algorithms, and the drug is currently undergoing clinical trials. It took this drug only 12 months to be ready for trials, compared to the 5 years such development usually takes.

    Thus, AI gives the world hope that the long-awaited COVID-19 vaccine will be available faster than currently predicted. Yet there are still a few problems with applying artificial intelligence in this process, mainly because AI itself is still underdeveloped.

    3. Data science and the fight against misinformation

    Another factor, which is mostly related to how people respond to the current crisis, and yet has the most negative effect on the global economy, is panic.

    We’ve already seen the effects of the rising panic during the Ebola virus crisis in Africa when local economies suffered from plummeting sectors like tourism and commerce.

    In economics, the period between the boom (the rising demand for the product) and the bust (a drop in product availability) is very short. During the current pandemic, we’ve seen quite a few examples of how panic buying led to low supply, which damaged local economies.

    How can data scientists tackle the threat of panic?

    The answer is already in the question: with data.

    One of the reasons why people panic is misinformation. “Our online poll has shown that only 12% of respondents read authoritative COVID-19-related resources, while others mostly relied on word-of-mouth approach,” says Martin Harris, a researcher at Studicus.

    Misinformation, unfortunately, occurs not only among people but at the government level as well. One of the clearest examples is U.S. officials promoting an anti-malaria drug as an effective treatment for COVID-19 patients when, in fact, the drug's effectiveness has not been proven.

    The best remedy for the virus of panic and misinformation is to bring together all the information from authoritative sources on the COVID-19 pandemic, so that people can follow it not only at the local level but at the global level as well.

    Data scientists and developers at Boston Children’s Hospital have created such a system, called HealthMap, to help people track the COVID-19 pandemic, as well as other disease outbreaks around the world.

    Conclusion

    While there are already quite a few applications of AI and data science that help us respond to the COVID-19 crisis, this crisis is still in its early stages of development.

    While we can already use data science to accumulate important information regarding critical hospital staff and equipment, fight misinformation, and use AI to develop the vaccine, we may still discover new ways of applying AI and data science to help the world respond to the COVID-19 crisis.

    Yet, today, we can already say that AI and data science have been of enormous help in fighting the pandemic, giving us hope that we will return to our normal lives as soon as possible.

    Author: Estelle Liotard

    Source: In Data Labs

  • 3 Predicted trends in data analytics for 2021

    3 Predicted trends in data analytics for 2021

    It’s that time of year again for prognosticating trends and making annual technology predictions. As we move into 2021, there are three trends data analytics professionals should keep their eyes on: OpenAI, optimized big data storage layers, and data exchanges. What ties these three technologies together is the maturation of the data, AI and ML landscapes. Because there already is a lot of conversation surrounding these topics, it is easy to forget that these technologies and capabilities are fairly recent evolutions. Each technology is moving in the same direction -- going from the concept (is something possible?) to putting it into practice in a way that is effective and scalable, offering value to the organization.

    I predict that in 2021 we will see these technologies fulfilling the promise they set out to deliver when they were first conceived.

    #1: OpenAI and AI’s Ability to Write

    OpenAI is a research and deployment company that last year released what they call GPT3 -- artificial intelligence that generates text that mimics text produced by humans. This AI offering can write prose for blog posts, answer questions as a chatbot, or write software code. It’s risen to a level of sophistication where it is getting more difficult to discern if what it generated was written by a human or a robot. Where this type of AI is familiar to people is in writing email messages; Gmail anticipates what the user will write next and offers words or sentence prompts. GPT3 goes further: the user can create a title or designate a topic and GPT3 will write a thousand-word blog post.

    This is an inflection point for AI, which, frankly, hasn’t been all that intelligent up to now. Right now, GPT3 is on a slow rollout and is being used primarily by game developers enabling video gamers to play, for example, Dungeons and Dragons without other humans.

    Who would benefit from this technology? Anyone who needs content. It will write code. It can design websites. It can produce articles and content. Will it totally replace humans who currently handle these duties? Not yet, but it can offer production value when an organization is short-staffed. As this technology advances, it will cease to feel artificial and will eventually be truly intelligent. It will be everywhere and we’ll be oblivious to it.

    #2: Optimized Big Data Storage Layers

    Historically, massive amounts of data have been stored in the cloud, on hard drives, or wherever your company holds information for future use. The problem with these systems has been finding the right data when needed. It hasn’t been well optimized, and the adage “like looking for a needle in the haystack” has been an accurate portrayal of the associated difficulties. The bigger the data got, the bigger the haystack got, and the harder it became to find the needle.

    In the past year, a number of technologies have emerged, including Iceberg, Hudi, and Delta Lake, that are optimizing the storage of large analytics data sets and making it easier to find that needle. They organize the hay in such a way that you only have to look at a small, segmented area, not the entire data haystack, making the search much more precise.

    This is valuable not only because you can access the right data more efficiently, but because it makes the data retrieval process more approachable, allowing for widespread adoption in companies. Traditionally, you had to be a data scientist or engineer and had to know a lot about underlying systems, but these optimized big data storage layers make it more accessible for the average person. This should decrease the time and cost of accessing and using the data.

    For example, Iceberg came out of an R&D project at Netflix and is now open source. Netflix generates a lot of data, and if an executive wanted to use that data to predict what the next big hit will be in its programming, it could take three engineers upwards of four weeks to come up with an answer. With these optimized storage layers, you can now get answers faster, and that leads to more specific questions with more efficient answers.
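    The “smaller haystack” effect comes largely from partitioning and table metadata that let query engines skip irrelevant files. As a hedged illustration of that idea, the sketch below uses plain PySpark with partitioned Parquet rather than the Iceberg, Hudi or Delta Lake table formats themselves, and the column names and path are invented: a filter on the partition column only reads the matching directory instead of scanning the whole data set.

      from pyspark.sql import SparkSession

      spark = SparkSession.builder.appName("partition-pruning-demo").getOrCreate()

      # Write viewing events partitioned by date, so each day lands in its own directory.
      events = spark.createDataFrame(
          [("2020-12-01", "user1", "show_a"), ("2020-12-02", "user2", "show_b")],
          ["event_date", "user_id", "title"],
      )
      events.write.mode("overwrite").partitionBy("event_date").parquet("/tmp/events")

      # A query that filters on the partition column only touches the matching files,
      # instead of scanning the entire data set.
      one_day = spark.read.parquet("/tmp/events").filter("event_date = '2020-12-02'")
      one_day.show()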

    #3: Data Exchanges

    Traditionally, data has stayed siloed within an organization and never leaves. It has become clear that another company may have valuable data in their silo that can help your organization offer a better service to your customers. That’s where data exchanges come in. However, to be effective, a data exchange needs a platform that offers transparency, quality, security, and high-level integration.

    Going into 2021 data exchanges are emerging as an important component of the data economy, according to research from Eckerson Group. According to this recent report, “A host of companies are launching data marketplaces to facilitate data sharing among data suppliers and consumers. Some are global in nature, hosting a diverse range of data sets, suppliers, and consumers. Others focus on a single industry, functional area (e.g., sales and marketing), or type of data. Still, others sell data exchange platforms to people or companies who want to run their own data marketplace. Cloud data platform providers have the upper hand since they’ve already captured the lion’s share of data consumers who might be interested in sharing data.”

    Data exchanges are very much related to the first two focal points we already mentioned, so much so that data exchanges are emerging as a must-have component of any data strategy. Once you can store data more efficiently, you don’t have to worry about adding greater amounts of data, and when you have AI that works intelligently, you want to be able to use the data you have on hand to fill your needs.

    We might reach a point where Netflix isn’t just asking the technology what kind of content to produce but the technology starts producing the content. It uses the data it collects through the data exchanges to find out what kind of shows will be in demand in 2022, and then the AI takes care of the rest. It’s the type of data flow that today might seem far-fetched, but that’s the direction we’re headed.

    A Final Thought

    One technology is about getting access to data, one is about understanding new data, and one is about acting on the data. As these three technologies begin to mature, we can expect to see a linear growth pattern and see them all intersect at just the right time.

    Author: Nick Jordan

    Source: TDWI

  • 4 Tips to keep big data projects from fizzling out


    Investing in big data makes the difference between attracting and driving away customers, between profit and loss. Yet many retailers see their data and analytics initiatives fizzle out. How do you actually create value from data and avoid a clearance sale? Four tips.

    You invest a great deal of time and money in big data, exactly in line with the message retail gurus have been preaching for several years. A team of data scientists develops complex data models that indeed yield interesting insights. With small ‘proofs of value’ you establish that those insights can actually be monetised. And yet that doesn't happen. What is going on?

    Tip 1: Adjust the targets

    The fact that valuable insights are not put into practice often has to do with the targets your employees have been given. Take the sending of mailings to customers as an example. Based on existing data and customer profiles, we can predict quite well how often, and with which message, each customer should be mailed. And secretly every marketer knows perfectly well that not every customer is waiting for a daily e-mail.

    Yet many still fall into the trap and keep sending mailing after mailing to the entire customer base. The result: the customer's interest quickly ebbs away and the message no longer lands. Why do marketers do this? Because they are judged solely on the revenue they generate, not on the customer satisfaction they achieve. That invites mailing everyone as often as possible. In the short term, after all, every extra e-mail increases the chance of a sale.

    Tip 2: Embed the analysts in the business

    Time and again, retailers put the team of analysts together in a single room, sometimes even as part of the IT department. The distance to the people in the business who have to put the insights into practice is large, and all too often that distance proves unbridgeable. This leads to misunderstandings, misunderstood analysts and valuable insights that remain unused.

    It is better to put the analysts together with people from the business in multidisciplinary teams that work with Scrum-like techniques. Organisations that are successful realise that they must be in constant change and work in exactly such teams. This means that business managers are involved at an early stage in building the data models, so that analysts and the business can learn from each other. After all, customer knowledge resides in data as well as in people.

    Tip 3: Hire a business analyst

    Data analysts derive their job satisfaction mainly from producing elegant analyses and building good, perhaps even over-engineered, data models. For their satisfaction it is often not even necessary to put the insights from those models into practice. Many analysts are therefore not very good at interpreting data and translating it into the concrete impact on the retailer.

    It can therefore be wise to bring in a business analyst: someone who has sufficient affinity with analytics and a reasonable understanding of how data models come about, but who also knows what the business managers' challenges are. They can bridge the gap between analytics and the business by making questions from the business concrete and by translating insights from data models into opportunities for the retailer.

    Tip 4: Analytics is a process, not a project

    Too many retailers still look at all their data and analytics efforts as if they were a project with a clear beginning and end, a project whose outcome must be known in advance. This is especially the case at retail organisations led by managers of the ‘old generation’ who have insufficient feeling for and affinity with the new world. These managers' commitment quickly fades when investments in data and analytics do not pay off fast enough.

    Analytics, however, is not a project but a process in which retailers become ever more skilled and smarter through trial and error. A process whose outcome is unclear in advance, but which must be started in order to move forward. Because all developments in the retail market make one thing clear: standing still is falling behind.

    Author: Simon van Ulden, EY, 5 October 2016

  • 5 Astonishing IoT examples in civil engineering

    5 Astonishing IoT examples in civil engineering

    The internet of things is making a major impact on the field of civil engineering, and these five examples of IoT applications in civil engineering are fascinating.

    As the Internet of Things (IoT) becomes smarter and more advanced, we’ve started to see its usage grow across various industries. From retail and commerce to manufacturing, the technology continues to do some pretty amazing things in nearly every sector. The civil engineering field is no exception.

    An estimated 20 billion internet-connected devices will be active around the world by 2020. Adoption is certainly ramping up, and the technologies that support IoT, including big data, cloud computing and machine learning, are also growing more sophisticated.

    As a whole, civil engineering projects have a lot to gain from the integration of IoT technologies and devices. The technology significantly improves automation and remote monitoring for many tasks, allowing operators to remain hands-off more than ever before. The data that IoT devices collect can inform and enable action throughout the scope of a project and even beyond.

    For example, IoT sensors can monitor soil consolidation and degradation, as well as a development project’s environmental impact. Alternatively, IoT can measure and identify public roadways that need servicing. These two basic examples provide a glimpse into what IoT can do in the civil engineering sector.

    IoT, alongside many other innovative construction technologies, will completely transform the industry. That said, what role is it currently playing in the field? What are some other applications that are either planned or now in use? How can the civil engineering industry genuinely use IoT?

    1. Allows a transformation from reactionary to preventative maintenance

    Most maintenance programs are corrective or reactionary. When something breaks down or fails, a team acts to fix the problem. In reality, this practice is nothing more than slapping a bandage on a gaping wound.

    With development projects, once things start to break down, they generally continue on that path. Problems grow much more prominent, no matter what fixes you apply. It makes more sense, then, to monitor a subject’s performance and status and apply fixes long before things break down. In other words, using a preventative maintenance routine is much more practical, efficient and reliable.

    IoT devices and sensors deliver all the necessary data to make such a process possible. They collect information about a subject in real-time and then report it to an external system or analytics program. That program then identifies potential errors and communicates the necessary information to a maintenance crew.

    In any field of construction, preventative maintenance considerably improves the project in question as well as the entire management process. Maintenance management typically comprises about 40% to 50% of a business’s operational budget. Companies spend much of their time reacting to maintenance issues rather than preventing them. IoT can turn that around.

    2. Presents a real-time construction management solution

    A proper construction management strategy is necessary for any civil engineering project. Many nuanced tasks need to be completed, whether they involve tracking and measuring building supplies or tagging field equipment and dividing it up properly.

    IoT technology can ease this burden by collecting relevant information in real time and delivering it to the necessary parties. Real-time solutions also provide faster time-to-action. Management and decision-makers can see almost immediately how situations are playing out and take action to either improve or correct a project’s course.

    For example, imagine the following scenario. During a project that’s underway, workers hit a snag that forces them to use more supplies than expected. Rather than waiting until supplies run out, the technology has already ordered more: IoT measures the amount of supplies on hand and reports it to a remote system, which then places the necessary purchase order. That way, the new supplies arrive at the project site before the existing stock is exhausted, and the operation keeps moving forward seamlessly despite any hiccups.
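    A minimal Python sketch of the reorder logic in this scenario is shown below; the sensor reading, threshold values and ordering call are hypothetical placeholders for whatever inventory and procurement systems a real site would use.

      REORDER_THRESHOLD = 200   # units of a given supply (hypothetical)
      REORDER_QUANTITY = 1000

      def place_purchase_order(item: str, quantity: int) -> None:
          # Placeholder for an integration with a real procurement system.
          print(f"Ordering {quantity} units of {item}")

      def on_sensor_reading(item: str, units_remaining: int, order_pending: bool) -> bool:
          """Called whenever an IoT sensor reports the remaining stock for an item.

          Returns True if a purchase order is (or remains) pending."""
          if units_remaining < REORDER_THRESHOLD and not order_pending:
              place_purchase_order(item, REORDER_QUANTITY)
              return True
          return order_pending

      # Example: stock drops below the threshold, triggering exactly one order.
      pending = False
      for reading in (450, 310, 190, 150):
          pending = on_sensor_reading("rebar ties", reading, pending)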

    3. Creates automated and reliable documentation

    One of the more routine responsibilities on development and civil engineering projects is paperwork. Documentation records a great deal about a project before, during and after it wraps up.

    IoT technologies can improve the entire process, if not completely automate many of the tedious elements. Reports are especially useful to have during inspections, insurance and liability events, and much more. The data that IoT collects can be parsed and added to any report to fill out much-needed details. Because the process happens automatically, the reports can generate with little to no external input.

    4. Provides a seamless project safety platform

    Worksites can be dangerous, which is why supervisors and project managers must remain informed about their workers at all times. If an accident occurs, they must be able to locate and evacuate any nearby personnel. IoT can provide real-time tracking for all workers on a site, and even those off-site.

    More importantly, IoT technology can connect all those disparate parties, allowing for direct communication with near-instant delivery. The result is a much safer operation for all involved, especially the workers who spend most of their time in the trenches.

    5. Enhances operational intelligence support

    By putting IoT and data collection devices in place with no clear guidance, an operation can suffer from data overload: an overabundance and complete saturation of intelligence with no clear way to analyze the data and use it.

    Instead, once IoT technology is implemented, organizations are forced to focus on an improved operational intelligence program to make sure the data coming in is adequately vetted, categorized and put to use. It’s cyclical because IoT empowers the intelligence program by offering real-time collection and analysis opportunities. So, even though more data is coming in and the process of extracting insights is more complex, the reaction times are much faster and more accurate as a result.

    Here’s a quick example. With bridge and tunnel construction, it’s necessary to monitor the surrounding area for environmental changes. Soil and ground movement, earthquakes, changes in water levels and similar events can impact the project. Sensors embedded within the surrounding area can collect pertinent information, which passes to a remote analytics tool. During a seismic event, the entire system would instantly discern if work must be postponed or if it can continue safely. A support program can distribute alerts to all necessary parties automatically, helping to ensure everyone knows the current status of the project, especially those in the field.

    Identifying new opportunities with IoT

    Most civil engineering and development teams have no shortage of projects in today’s landscape. Yet, it’s still crucial to remain informed about the goings-on to help pinpoint more practical opportunities.

    When IoT is installed during new projects, the resulting data reports may reveal additional challenges or problems that would have otherwise gone unnoticed. A new two-lane road, for instance, may see more traffic and congestion than initially expected. Or, perhaps a recently developed water pipeline is seeing unpredictable pressure spikes.

    With the correct solutions in place, IoT can introduce many new opportunities that might significantly improve the value and practicality of a project.

    Author: Megan Ray Nichols

    Source: SmartDataCollective

  • 7 Personality assets required to be successful in data and tech

    7 Personality assets required to be successful in data and tech

    If you look at many of the best-known visionaries, such as Richard Branson, Elon Musk, and Steve Jobs, there are certain traits that they all have which are necessary for being successful. So this got me thinking, what are the characteristics necessary for success in the tech industry? In this blog, I’m going to explain the seven personality traits that I decided are necessary for success, starting with:

    1. Analytical capabilities

    Technology is extremely complex. If you want to be successful, you should be able to cope with complexity. Complexity not only from technical questions, but also when it comes to applying technology in an efficient and productive way.

    2. Educational foundation

    Part of the point above is your educational foundation. I am not talking so much about specific technical expertise learned at school or university, but more the general basis for understanding certain theories and relations. The ability to learn and process new information very quickly is also important. We all know that we have to learn new things continuously.

    3. Passion

    One of the most important things in the secret sauce for success is being passionate about what you do. Passion is the key driver of human activity, and if you love what you’re doing, you’ll be able to move mountains and conquer the world. If you are not passionate about what you are doing, you are doing the wrong thing.

    4. Creativity

    People often believe that if you are just analytical and smart, you’ll automatically find a good solution. But in the world of technology, there is no one single, optimal, rational solution in most cases. Creating technology is a type of art, where you have to look for creative solutions, rather than having a genius idea. History teaches us that the best inventions are born out of creativity.

    5. Curiosity

    The best technology leaders never stop being curious like children. Preserving an open mind, challenging everything and keeping your curiosity for new stuff will facilitate your personal success in a constantly changing world.

    6. Persistence

    If you are passionate, smart and creative and find yourself digging deeply into a technological problem, then you’ll definitively need persistence. Keep being persistent to analyze your problem appropriately, to find your solution, and eventually to convince others to use it.

    7. Being a networker and team player

    If you have all the other skills, you might already be successful. But the most important booster of your success is your personal skillset. Being a good networker and team player, and having the right people in your network to turn to for support, will make the whole journey considerably easier. There might be successful mavericks, but the most successful people in technology have a great set of soft skills.

    As you’ll notice, these characteristics aren’t traits that you are necessarily born with. For those who find that these characteristics don’t come naturally to them, you’ll be pleased to hear that all can be learned and adopted through hard work and practice. Anyone can be successful in tech, and by keeping these traits in mind in future, you too can ensure a long and successful career in tech.

    Author: Mathias Golombek

    Source: Dataversity

     

  • 9 Data issues to deal with in order to optimize AI projects

    9 Data issues to deal with in order to optimize AI projects

    The quality of your data affects how well your AI and machine learning models will operate. Getting ahead of these nine data issues will poise organizations for successful AI models.

    At the core of modern AI projects are machine-learning-based systems which depend on data to derive their predictive power. Because of this, all artificial intelligence projects are dependent on high data quality.

    However, obtaining and maintaining high quality data is not always easy. There are numerous data quality issues that threaten to derail your AI and machine learning projects. In particular, these nine data quality issues need to be considered and prevented before issues arise.

    1. Inaccurate, incomplete and improperly labeled data

    Inaccurate, incomplete or improperly labeled data is typically the cause of AI project failure. These data issues can range from bad data at the source to data that has not been cleaned or prepared properly. Data might be in the incorrect fields or have the wrong labels applied.

    Data cleanliness is such an issue that an entire industry of data preparation has emerged to address it. While it might seem an easy task to clean gigabytes of data, imagine having petabytes or zettabytes of data to clean. Traditional approaches simply don't scale, which has resulted in new AI-powered tools to help spot and clean data issues.

    2. Having too much data

    Since data is important to AI projects, it's a common thought that the more data you have, the better. However, when using machine learning sometimes throwing too much data at a model doesn't actually help. Therefore, a counterintuitive issue around data quality is actually having too much data.

    While it might seem like too much data can never be a bad thing, more often than not, a good portion of the data is not usable or relevant. Having to go through to separate useful data from this large data set wastes organizational resources. In addition, all that extra data might result in data "noise" that can result in machine learning systems learning from the nuances and variances in the data rather than the more significant overall trend.

    3. Having too little data

    On the flip side, having too little data presents its own problems. While training a model on a small data set may produce acceptable results in a test environment, bringing this model from proof of concept or pilot stage into production typically requires more data. In general, small data sets produce results that have low complexity, are biased or overfitted, and will not be accurate when working with new data.

    4. Biased data

    In addition to incorrect data, another issue is that the data might be biased. The data might be selected from larger data sets in ways that don't appropriately represent the wider data set. In other cases, data might be derived from older information that was shaped by human bias. Or perhaps there are issues with the way the data is collected or generated that result in a biased final outcome.

    5. Unbalanced data

    While everyone wants to try to minimize or eliminate bias from their data sets, this is much easier said than done. There are several factors that can come into play when addressing biased data. One factor can be unbalanced data. Unbalanced data sets can significantly hinder the performance of machine learning models. Unbalanced data has an overrepresentation of data from one community or group while unnecessarily reducing the representation of another group.

    An example of an unbalanced data set can be found in some approaches to fraud detection. In general, most transactions are not fraudulent, which means that only a very small portion of your data set will consist of fraudulent transactions. Since a model trained on this data receives significantly more examples from one class than from the other, the results will be biased towards the class with more examples. That's why it's essential to conduct thorough exploratory data analysis to discover such issues early and to consider solutions that can help balance data sets, as in the sketch below.
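    As a small, hedged illustration of one common remedy, the sketch below uses class weighting in scikit-learn on a synthetic, heavily imbalanced data set; the roughly 1% "fraud" rate and all parameters are invented for the example.

      from sklearn.datasets import make_classification
      from sklearn.linear_model import LogisticRegression
      from sklearn.metrics import classification_report
      from sklearn.model_selection import train_test_split

      # Synthetic transactions where only ~1% belong to the minority ("fraud") class.
      X, y = make_classification(n_samples=20_000, n_features=20, weights=[0.99, 0.01], random_state=0)
      X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

      # class_weight="balanced" re-weights examples inversely to class frequency,
      # so the minority class is not drowned out during training.
      model = LogisticRegression(max_iter=1000, class_weight="balanced")
      model.fit(X_train, y_train)

      print(classification_report(y_test, model.predict(X_test), digits=3))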

    6. Data silos

    Related to the issue of unbalanced data is the issue of data silos. A data silo is where only a certain group or limited number of individuals at an organization have access to a data set. Data silos can result from several factors, including technical challenges or restrictions in integrating data sets as well as issues with proprietary or security access control of data.

    They are also the product of structural breakdowns at organizations where only certain groups have access to certain data as well as cultural issues where lack of collaboration between departments prevents data sharing. Regardless of the reason, data silos can limit the ability of those at a company working on artificial intelligence projects to gain access to comprehensive data sets, possibly lowering quality results.

    7. Inconsistent data

    Not all data is created the same. Just because you're collecting information, that doesn't mean that it can or should always be used. Related to the collection of too much data is the challenge of collecting irrelevant data to be used for training. Training the model on clean, but irrelevant data results in the same issues as training systems on poor quality data.

    In conjunction with the concept of data irrelevancy is inconsistent data. In many circumstances, the same records might exist multiple times in different data sets but with different values, resulting in inconsistencies. Duplicate data is one of the biggest problems for data-driven businesses. When dealing with multiple data sources, inconsistency is a big indicator of a data quality problem.

    8. Data sparsity

    Another issue is data sparsity. Data sparsity is when there is missing data or when there is an insufficient quantity of specific expected values in a data set. Data sparsity can change the performance of machine learning algorithms and their ability to calculate accurate predictions. If data sparsity is not identified, it can result in models being trained on noisy or insufficient data, reducing the effectiveness or accuracy of results.

    9. Data labeling issues

    Supervised machine learning models, one of the fundamental types of machine learning, require data to be labeled with correct metadata for machines to be able to derive insights. Data labeling is a hard task, often requiring human resources to put metadata on a wide range of data types. This can be both complex and expensive. One of the biggest data quality issues currently challenging in-house AI projects is the lack of proper labeling of machine learning training data. Accurately labeled data ensures that machine learning systems establish reliable models for pattern recognition, forming the foundations of every AI project. Good quality labeled data is paramount to accurately training the AI system on what data it is being fed.

    Organizations looking to implement successful AI projects need to pay attention to the quality of their data. While reasons for data quality issues are many, a common theme that companies need to remember is that in order to have data in the best condition possible, proper management is key. It's important to keep a watchful eye on the data that is being collected, run regular checks on this data, keep the data as accurate as possible, and get the data in the right format before having machine learning models learn on this data. If companies are able to stay on top of their data, quality issues are less likely to arise.

    Author: Kathleen Walch

    Source: TechTarget

  • 9 Tips to become a better data scientist

    9 Tips to become a better data scientist

    Over the years I have worked on many data science projects. I remember how easy it was to get lost and waste a lot of energy in the wrong direction. In time, I learned what works for me to be more effective. This list is my best attempt to sum it up:

    1. Build a working pipeline first

    While it’s tempting to start with the cool stuff first, you want to get the small technical things, like loading the data and feature extraction, out of the way early. I like to start with a very basic pipeline, but one that works, i.e., I can run it end to end and get results. Later I expand every part while keeping the pipeline working.
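    A minimal sketch of what such an end-to-end skeleton might look like in scikit-learn, using a built-in toy data set so it runs immediately; every stage is deliberately basic so it can be swapped out later without breaking the pipeline.

      from sklearn.datasets import load_breast_cancer
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import train_test_split
      from sklearn.pipeline import Pipeline
      from sklearn.preprocessing import StandardScaler

      # 1. Load data (stand-in for your real loading code).
      X, y = load_breast_cancer(return_X_y=True)
      X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

      # 2. A deliberately simple pipeline: scaling plus a baseline model.
      pipeline = Pipeline([
          ("scale", StandardScaler()),
          ("model", LogisticRegression(max_iter=1000)),
      ])

      # 3. Train and get an end-to-end result, so every later change can be compared to this baseline.
      pipeline.fit(X_train, y_train)
      print("baseline accuracy:", pipeline.score(X_test, y_test))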

    2. Start simple and complicate one thing at a time

    Once you have a working pipeline, start expanding and improving it. You have to take it step by step. It is very important to understand what caused what. If you introduce too many changes at once, it will be hard to tell how each change affected the whole model. Keep the updates as simple and clean as possible. Not only will it be easier to understand their effect, it will also be easier to refactor once you come up with another idea.

    3. Question everything

    Now you have a lot on your hands, you have a working pipeline and you already did some changes that improved your results. It’s important to understand why. If you added a new feature and it helped the model to generalize better, why? If it didn't, why not? Maybe your model is slower than before, why’s that? Are you sure each of your features/modules does what you think it does? If not, what happened?

    These kinds of questions should pop in your head while you’re working. To end up with a really great result, you must understand everything that happens in your model.

    4. Experiment a lot and experiment fast

    After you questioned everything, you are left with… well, a lot of questions. The best way to answer them is to experiment. If you have followed along this far, you already have a working pipeline and nicely written code, so conducting an experiment shouldn't take much of your time. Ideally, you’ll be able to run more than one experiment at a time; this will help you answer your questions and improve your intuition about what works and what doesn't.

    Things to experiment: Adding/removing features, changing hyperparameters, changing architectures, adding/removing data and so on.

    5. Prioritize and Focus

    At this point, you did a lot of work, you have a lot of questions, some answers, some other tasks and probably some new ideas to improve your model (or even working on something entirely different).

    But not all of these are equally important. You have to understand what the most beneficial direction is for you. Maybe you came up with a brilliant idea that slightly improved your model but also made it much more complicated and slow; should you continue in this direction? It depends on your goal. If your goal is to publish a state-of-the-art solution, maybe it is. But if your goal is to deploy a fast and decent model to production, then you can probably invest your time in something else. Remember your final goal while working, and try to understand which tasks and experiments will get you closer to it.

    6. Believe in your metrics

    As discussed, understanding what’s working and what is not is very important. But how do you know when something works? You evaluate your results against some validation/test data and get some metric, and you have to believe that metric! There may be reasons not to believe in your metric. It could be the wrong one, for example: your data may be unbalanced, so accuracy can be the wrong metric for you, or your final solution must be very precise, so maybe you’re more interested in precision than recall. Your metric must reflect the goal you’re trying to achieve. Another reason not to believe in your metric is when your test data is dirty or noisy. Maybe you got data somewhere from the web and you don’t know exactly what’s in there?

    A reliable metric is important to advance fast, but also, it’s important that the metric reflects your goals. In data science, it may be easy to convince ourselves that our model is good, while in reality, it does very little.
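
    A tiny sketch of the imbalanced-data trap mentioned above (scikit-learn assumed): a model that never predicts the rare class still looks excellent if you only check accuracy.

    ```python
    # On imbalanced data, a model that never predicts the rare class still scores 95% accuracy.
    from sklearn.metrics import accuracy_score, precision_score, recall_score

    y_true = [0] * 95 + [1] * 5      # 5% positive class
    y_pred = [0] * 100               # "always predict the majority class"

    print(accuracy_score(y_true, y_pred))                     # 0.95 -- looks great
    print(precision_score(y_true, y_pred, zero_division=0))   # 0.0
    print(recall_score(y_true, y_pred))                       # 0.0 -- the number that actually matters here
    ```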

    7. Work to publish/deploy

    Feedback is an essential part of any work, and data science is not an exception. When you work knowing that your code will be reviewed by someone else, you’ll write much better code. When you work knowing that you’ll need to explain it to someone else, you’ll understand it much better. It doesn't have to be a fancy journal or conference or company production code. If you’re working on a personal project, make it open source, write a post about it, send it to your friends, show it to the world!

    Not all feedback will be positive, but you’ll be able to learn from it and improve over time.

    8. Read a lot and keep updated

    I’m probably not the first to suggest keeping up with recent advancements to stay effective, so instead of talking about it, I’ll just tell you how I do it: good old newsletters! I find them very useful, as each one is essentially someone who keeps up with the most recent literature, picks the best stuff and sends it to you.

    9. Be curious

    While reading about the newest and coolest, don’t limit yourself to the one area you’re interested in; try to explore other (but related) areas as well. It can be beneficial in a few ways: you may find that a technique from one domain is very useful in yours, you’ll improve your ability to understand complex ideas, and you may discover another domain that interests you, letting you expand your data skills and knowledge.

    Conclusion

    You’ll get much better results and enjoy the process more if you’re effective. While all of the topics above are important, if I had to choose one, it would be 'Prioritize and Focus'. For me, all the other topics eventually lead to this one. The key to success is to work on the right thing.

    Author: Dima Shulga

    Source: towards data science

  • A guide to Business Process Automation

    A guide to Business Process Automation

    Are you spending hours repeating the same tasks? Office workers spend 69 days a year on administrative tasks. You might be wishing for a simpler way to get those jobs done.

    An increasing number of businesses are relying on automation tools to take those repetitive tasks off their plate. In fact, 67% of businesses said that software solutions would be important to remain competitive.

    So, how will our workforce change with business process automation? And how will your business develop as the digital transformation era makes things happen faster?

    In this complete guide, we’ll cover:

    • What is Business Process Automation?
    • 5 Business Process Automation examples
      • Accounting
      • Customer service
      • Employee onboarding
      • HR onboarding
      • Sales and marketing
    • The benefits of Business Process Automation
    • What business processes can be automated?
    • Best practices with Business Process Automation

    What is Business Process Automation?

    Here’s a simple definition: Business Process Automation is the act of using software to make complex things simpler.

    (It’s also known as BPA or BPM. The latter means Business Process Management.)

    You can use BPA to cut the time you spend on everyday tasks. For example, you can use chatbots to handle customer support queries. This uses robotic process automation (RPA). Or, you can use contract management software to get clients to put pen to paper on your deal.

    How else can you use business process automation?

    Business Process Automation examples

    Accounting

    Research has found that cloud computing reduces labor costs by 50%, which is probably why 67% of accountants prefer cloud accounting.

    So, how can you use an accounting automation solution in your business?

    • Generate purchase orders: Purchase orders have long paper trails that can be difficult to keep track of. Prevent that from becoming a problem by automating your purchase orders. Your software creates a PO and sends it automatically for approval.
    • Handle accounts payable: Automating your 'accounts payable' department can take tedious payment-related jobs off your hands. The software scans your incoming invoices, records how much you need to pay, and pays it with the click of a button.
    • Send invoices: Do you send the same invoices every week or month? Use automated invoicing systems to create business rules. For example, you can invoice your client on the 1st working day of each month without having to set a reminder to do it manually.

    Customer service

    Customer service is crucial for your business to get right. But it can take lots of human time, unless you’re taking advantage of these business process automations:

    • E-mail and push notifications: Use machine learning software, like chatbots, to handle incoming messages. The technology will understand your customer inquiry, and respond within seconds. Your customers or business users don’t need to wait for a response from a human agent.
    • Helpdesk support: Do you have an overwhelming log of support tickets? By automating your helpdesk, you can route tickets to different team members (see the sketch after this list). For example, if someone says their query is about a billing issue, you could automatically send their ticket to a finance agent.
    • Call center processes: Think about the tasks your call center team repeats. Chances are, they’ll send emails once they hang up the phone. Or, they’ll set reminders to contact their lead in a few days. You can automate those repetitive tasks so they can focus on money-making calls with new leads.
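
    As a purely illustrative sketch of the keyword-based routing idea from the helpdesk example above (team names, keywords and the fallback queue are invented, not tied to any specific helpdesk product):

    ```python
    # Toy keyword-based ticket routing; every name here is invented for illustration.
    ROUTING_RULES = {
        "finance": ["invoice", "refund", "charge", "billing", "payment"],
        "technical": ["error", "crash", "bug", "login", "password"],
    }

    def route_ticket(message: str) -> str:
        text = message.lower()
        for team, keywords in ROUTING_RULES.items():
            if any(word in text for word in keywords):
                return team
        return "general"  # fallback queue for anything the rules don't match

    print(route_ticket("I was charged twice on my last invoice"))  # finance
    print(route_ticket("The app crashes when I open it"))          # technical
    ```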

    Employee onboarding

    Lots of paperwork and decision-making is involved with bringing on a new team member. However, you can automate most of the onboarding process with automation software. Here are some use cases.

    • Verify employment history: You don’t have to call a candidate’s references to verify they’ve worked there. You can automate this process using tools like PreCheck. This software scans data to find links between your candidates’ names and their past employers.
    • Source candidates: Find the best candidates by automating your recruitment process. For example, you can post a job description to one profile and syndicate it to other listing websites.
    • Manage contracts: Long gone are the days of posting an employment contract and waiting for your new team member to post it back. You can automate this business workflow with document signing software. It sends the document via email and automatically reminds your new team member to sign and return it. It simplifies the entire lifecycle of bringing a new team member on board.

    (Some fear that automation will destroy jobs in this process. Forrester data goes against this: 10% of jobs will be lost, but 3% will be created.)

    HR onboarding

    Your Human Resources team works with people. But that doesn’t mean they have to do those people-related tasks manually themselves. You can use HR process automation for things like:

    • Time tracking: Figure out how much money you’re making per customer (or client) by tracking time. However, you can’t always rely on team members to record their time. It’s tricky to remember! You can automate their time-tracking, and use software to break down the time you’ve spent on each activity.
    • Employee leave requests: Do your staff need to send an email to submit a PTO request? Those emails can get lost. Instead, use a leave management system. This software will accept or decline requests and manage shifts based on absences.
    • Monitoring attendance: Keep an eye on your staff by using an automated attendance management system. You can track their clock-in (and out) times, breaks, and time off: without spying on them yourself.

    Sales and marketing

    Artificial Intelligence (AI) is the top growth area for sales teams; its adoption is expected to grow by 139% over the next three years. Your sales and marketing team can use business process automation for these sales and marketing activities:

    • Lead nurturing: Don’t rely on sticky notes to remind you of the leads you’re nurturing. You can add them to a CRM. Then, use automation to follow up with your leads using a premade template or social media message.
    • Creating customer case studies: You can automate surveys to collect customer experience feedback. Add data processing software to pull sentiments from individual feedback submissions. From there, you can find customers likely to make the best case studies.
    • A/B testing: You’re probably running A/B tests on your website to determine which elements work best. Automate that process using tools like Intellimize. They’ll automatically show variations to your visitors, and collect the real-time data to analyze. Pick the one with the best user experience metrics.

    Still not convinced? This could give your business a competitive advantage. Just 28% of marketers use marketing automation software.

    Benefits of Business Process Automation

    The use cases we’ve shared work for any business. But they’re not just 'nice to have'. There are several ways you’ll benefit from business process automation, such as:

    Increased efficiency and productivity: Your automation tools store information in the cloud. This means you can access your systems from anywhere. It’s great for remote or mobile workers who use multiple devices.

    Faster turnaround: You don’t have to complete your day-to-day tasks manually. Sure, you’ll need to spend a few hours creating your automations. But you’ll save time when your software does them faster.

    Cost savings: You might not think that the hours you spend doing certain tasks cost a lot in comparison to the software. But, those hours are salaried; you’re still paying each team member their hourly rate. McKinsey found that 45% of paid activities can be automated by technology. (That’s an equivalent of $2 trillion in total annual wages.)

    Fewer errors: Some studies argue that computers are smarter than the human brain. In fact,  Google found that customers that use custom document classification have achieved up to 96% accuracy. You’re less prone to human errors using business process automation.

    Better team collaboration: With automation software, your entire team can view the processes you’re creating from their own accounts. They won’t need to wait for a suitable time to talk about strategy; they can check the automation processes to see for themselves. Again, this is great for distributed teams who don’t have in-office communication.

    Best Practices with Business Process Automation

    Ready to start using business automation software?

    Avoid diving in feet-first with the first application you find. Refer to these best practices to get the most value out of process workflow automations:

    • Know your business’ needs, and prioritize automation software that helps with them. For example, if your focus is improving customer wait times, look at chatbot-style automations.
    • Write a list of the repetitive tasks, such as data entry, you’ll be able to automate. Do this by asking your team: the people who work in a specific department day in, day out. Or, ask your project management team for their advice. Can you find a single process or tool to streamline most of their tasks?
    • Start training your entire team on how to use the process automations. Some applications offer this type of support as part of your purchase. IBM, for example, has a Skills Gateway.

    The final thing to note? Don’t rush into business process automations.

    Start small and get used to how software is used. Then, ask your team for feedback. It’s better to be safe than sorry with this type of business decision, especially when your business is at stake!

    Author: Matt Shealy 

    Source: SAP

  • A look at the major trends driving next generation datacenters

    Data centers have become a core component of modern living, containing and distributing the information required to participate in everything from social life to the economy. In 2017, data centers consumed 3 percent of the world’s electricity, and new technologies are only increasing their energy demand. The growth of high-performance computing — as well as answers to growing cyber-security threats and efficiency concerns — is dictating the development of the next generation of data centers.

    But what will these new data centers need in order to overcome the challenges the industry faces? Here is a look at 5 major trends that will impact data center design in the future.

    1. Hyperscale functionality

    The largest companies in the world are increasingly consolidating computing power in massive, highly efficient hyperscale data centers that can keep up with the increasing demands of enterprise applications. These powerful data centers are mostly owned by tech giants like Amazon or Facebook, and there are currently around 490 of them in existence with more than 100 more in development. It’s estimated that these behemoths will contain more than 50 percent of all data that passes through data centers by 2021, as companies take advantage of their immense capabilities to implement modern business intelligence solutions and grapple with the computing requirements of the Internet of Things (IoT).

    2. Liquid efficiency

    The efficiency of data centers is both an environmental concern and a large-scale economic issue for operators. Enterprises in diverse industries from automotive design to financial forecasting are implementing and relying on machine-learning in their applications, which results in more expensive and high-temperature data center infrastructure. It’s widely known that power and cooling represent the biggest costs that data center owners have to contend with, but new technologies are emerging to combat this threat. Liquid cooling is swiftly becoming more popular for those building new data centers, because of its incredible efficiency and its ability to future-proof data centers against the increasing heat being generated by demand for high-performance computing. The market is expected to grow to $2.5 billion by 2025 as a result.

    3. AI monitoring

    Monitoring software that implements the critical advances made in machine learning and artificial intelligence is one of the most successful technologies that data center operators have put into practice to improve efficiency. Machines are much more capable of reading and predicting the needs of data centers second to second than their human counterparts, and with their assistance operators can manipulate cooling solutions and power usage in order to dramatically increase energy efficiency.

    4. DNA storage

    In the two-year span between 2015 and 2017, more data was created than in all of preceding history. As this exponential growth continues, we may soon see the sheer quantity of data outstrip the ability of hard drives to capture it. But researchers are exploring the possibility of storing this immense amount of data within DNA, as a single gram of DNA is said to be capable of storing 215 million gigabytes of information. DNA storage could provide a viable solution to the limitations of encoding on silicon storage devices, and meet the requirements of an ever-increasing number of data centers despite land constraints near urban areas. But it comes with its own drawbacks. Although it has improved considerably, it is still expensive and extremely slow to write data to DNA. Furthermore, getting data back from DNA involves sequencing it, and decoding files and finding/retrieving specific files stored on DNA is a major challenge. However, according to Microsoft research data, algorithms currently being developed may make the cost of sequencing and synthesizing DNA plunge to levels that make it feasible in the future.

    5. Dynamic security

    The average cost of a cyber-attack to the impacted businesses will be more than $150 million by 2020, and data centers are at the center of the modern data security fight. Colocation facilities have to contend with the security protocols of multiple customers, and the march of data into the cloud means that hackers can gain access to it through multiple devices or applications. New physical and cloud security features are going to be critical for the evolution of the data center industry, including biometric security measures on-site to prevent physical access by even the most committed thieves or hackers. Stricter security guidelines for cloud applications and on-site data storage will be a major competitive advantage for the most effective data center operators going forward as cyber-attacks grow more costly and more frequent.

    The digital economy is growing more dense and complex every single day, and data center builders and operators need to upgrade and build with the rising demand for artificial intelligence and machine learning in mind. Greener, more automated, more efficient and more secure data centers will be necessary to safely host the services of the next generation of digital companies.

    Author: Gavin Flynn

    Source: Information-management

  • A new quantum approach to big data

    From gene mapping to space exploration, humanity continues to generate ever-larger sets of data — far more information than people can actually process, manage, or understand.
    Machine learning systems can help researchers deal with this ever-growing flood of information. Some of the most powerful of these analytical tools are based on a strange branch of geometry called topology, which deals with properties that stay the same even when something is bent and stretched every which way.

    Such topological systems are especially useful for analyzing the connections in complex networks, such as the internal wiring of the brain, the U.S. power grid, or the global interconnections of the Internet. But even with the most powerful modern supercomputers, such problems remain daunting and impractical to solve. Now, a new approach that would use quantum computers to streamline these problems has been developed by researchers at MIT, the University of Waterloo, and the University of Southern California.
    The team describes their theoretical proposal this week in the journal Nature Communications. Seth Lloyd, the paper’s lead author and the Nam P. Suh Professor of Mechanical Engineering, explains that algebraic topology is key to the new method. This approach, he says, helps to reduce the impact of the inevitable distortions that arise every time someone collects data about the real world.

    In a topological description, basic features of the data (How many holes does it have? How are the different parts connected?) are considered the same no matter how much they are stretched, compressed, or distorted. Lloyd explains that it is often these fundamental topological attributes “that are important in trying to reconstruct the underlying patterns in the real world that the data are supposed to represent.”

    It doesn’t matter what kind of dataset is being analyzed, he says. The topological approach to looking for connections and holes “works whether it’s an actual physical hole, or the data represents a logical argument and there’s a hole in the argument. This will find both kinds of holes.”
    Using conventional computers, that approach is too demanding for all but the simplest situations. Topological analysis “represents a crucial way of getting at the significant features of the data, but it’s computationally very expensive,” Lloyd says. “This is where quantum mechanics kicks in.” The new quantum-based approach, he says, could exponentially speed up such calculations.

    Lloyd offers an example to illustrate that potential speedup: If you have a dataset with 300 points, a conventional approach to analyzing all the topological features in that system would require “a computer the size of the universe,” he says. That is, it would take 2^300 (two to the 300th power) processing units — approximately the number of all the particles in the universe. In other words, the problem is simply not solvable in that way.
    “That’s where our algorithm kicks in,” he says. Solving the same problem with the new system, using a quantum computer, would require just 300 quantum bits — and a device this size may be achieved in the next few years, according to Lloyd.
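
    The scale gap is easy to check with plain integer arithmetic: 2^300 is a 91-digit number, versus the 300 qubits quoted for the quantum approach.

    ```python
    # Classical brute force on 300 data points, per the article: 2^300 processing units.
    classical_units = 2 ** 300
    quantum_bits = 300

    print(classical_units)            # roughly 2.04e90, written out in full
    print(len(str(classical_units)))  # 91 digits
    print(quantum_bits)               # the quantum resource estimate quoted by Lloyd
    ```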

    “Our algorithm shows that you don’t need a big quantum computer to kick some serious topological butt,” he says.
    There are many important kinds of huge datasets where the quantum-topological approach could be useful, Lloyd says, for example understanding interconnections in the brain. “By applying topological analysis to datasets gleaned by electroencephalography or functional MRI, you can reveal the complex connectivity and topology of the sequences of firing neurons that underlie our thought processes,” he says.

    The same approach could be used for analyzing many other kinds of information. “You could apply it to the world’s economy, or to social networks, or almost any system that involves long-range transport of goods or information,” says Lloyd, who holds a joint appointment as a professor of physics. But the limits of classical computation have prevented such approaches from being applied before.

    While this work is theoretical, “experimentalists have already contacted us about trying prototypes,” he says. “You could find the topology of simple structures on a very simple quantum computer. People are trying proof-of-concept experiments.”

    Ignacio Cirac, a professor at the Max Planck Institute of Quantum Optics in Munich, Germany, who was not involved in this research, calls it “a very original idea, and I think that it has a great potential.” He adds “I guess that it has to be further developed and adapted to particular problems. In any case, I think that this is top-quality research.”
    The team also included Silvano Garnerone of the University of Waterloo in Ontario, Canada, and Paolo Zanardi of the Center for Quantum Information Science and Technology at the University of Southern California. The work was supported by the Army Research Office, Air Force Office of Scientific Research, Defense Advanced Research Projects Agency, Multidisciplinary University Research Initiative of the Office of Naval Research, and the National Science Foundation.

    Source: MIT News

  • A Shortcut Guide to Machine Learning and AI in The Enterprise

    Predictive analytics / machine learning / artificial intelligence is a hot topic – what’s it about?

    Using algorithms to help make better decisions has been the “next big thing in analytics” for over 25 years. It has been used in key areas such as fraud detection the entire time. But it’s now become a full-throated mainstream business meme that features in every enterprise software keynote, although the industry is battling over what to call it.

    It appears that terms like Data Mining, Predictive Analytics, and Advanced Analytics are considered too geeky or old for industry marketers and headline writers. The term Cognitive Computing seemed to be poised to win, but IBM’s strong association with the term may have backfired — journalists and analysts want to use language that is independent of any particular company. Currently, the growing consensus seems to be to use Machine Learning when talking about the technology and Artificial Intelligence when talking about the business uses.

    Whatever we call it, it’s generally proposed in two different forms: either as an extension to existing platforms for data analysts; or as new embedded functionality in diverse business applications such as sales lead scoring, marketing optimization, sorting HR resumes, or financial invoice matching.

    Why is it taking off now, and what’s changing?

    Artificial intelligence is now taking off because there’s a lot more data available, and affordable, powerful systems to crunch through it all. It’s also much easier to get access to powerful algorithm-based software, in the form of open-source products or embedded as a service in enterprise platforms.

    Organizations today are also more comfortable with manipulating business data, with a new generation of business analysts aspiring to become “citizen data scientists.” Enterprises can take their traditional analytics to the next level using these new tools.

    However, we’re now at the “Peak of Inflated Expectations” for these technologies according to Gartner’s Hype Cycle — we will soon see articles pushing back on the more exaggerated claims. Over the next few years, we will find out the limitations of these technologies even as they start bringing real-world benefits.

    What are the longer-term implications?

    First, easier-to-use predictive analytics engines are blurring the gap between “everyday analytics” and the data science team. A “factory” approach to creating, deploying, and maintaining predictive models means data scientists can have greater impact. And sophisticated business users can now access some of the power of these algorithms without having to become data scientists themselves.

    Second, every business application will include some predictive functionality, automating any areas where there are “repeatable decisions.” It is hard to think of a business process that could not be improved in this way, with big implications in terms of both efficiency and white-collar employment.

    Third, applications will use these algorithms on themselves to create “self-improving” platforms that get easier to use and more powerful over time (akin to how each new semi-autonomous-driving Tesla car can learn something new and pass it onto the rest of the fleet).

    Fourth, over time, business processes, applications, and workflows may have to be rethought. If algorithms are available as a core part of business platforms, we can provide people with new paths through typical business questions such as “What’s happening now? What do I need to know? What do you recommend? What should I always do? What can I expect to happen? What can I avoid? What do I need to do right now?”

    Fifth, implementing all the above will involve deep and worrying moral questions in terms of data privacy and allowing algorithms to make decisions that affect people and society. There will undoubtedly be many scandals and missteps before the right rules and practices are in place.

    What first steps should companies be taking in this area?

    As usual, the barriers to business benefit are more likely to be cultural than technical.

    Above all, organizations need to make sure they have the right technical expertise to navigate the confusion of new vendor offerings, the right business knowledge to know where best to apply them, and the awareness that their technology choices may have unforeseen moral implications.

    Source: timoelliot.com, October 24, 2016

  • A word of advice to help you get your first data science job

    A word of advice to help you get your first data science job

    Creativity, grit, and perseverance will become the three words you live by

    Whether you’re a new graduate, someone looking for a career change, or a cat similar to the one above, the data science field is full of jobs that tick nearly every box on the modern worker’s checklist. Working in data science gives you the opportunity to have job security, a high-paying salary with room for advancement, and the ability to work from anywhere in the world. Basically, working in data science is a no-brainer for those interested.

    However, during the dreaded job search, many of us run into a situation where experience is required to get hired, while in order to gain experience you need to be hired first...

    Pretty familiar, right?

    Having run into many situations myself where companies are often looking for candidates with 20 years of work experience before the age of 22, I understand the aggravation that comes with trying to look for a job when you’re a new graduate, someone looking for a career change, or even a cat, with no relevant work experience.

    However, this is no reason to become discouraged. While many data science jobs require work experience, there are plenty of ways to create your own work experience that will make you an eligible candidate for these careers.

    All you need is a little creativity, grit, and perseverance.

    It’s not about what you know. It’s about who you know and who knows you.

    In countries like Canada, where having some form of university qualification is becoming the norm (in 2016, 54% of Canadians aged 25 to 64 had a college or university certification), it’s no longer about what you know. Instead, it’s about who you know and who knows you.

    Google “the importance of networking”, and you will be flooded with articles from all the major players (Forbes, Huffington Post, Indeed, etc.) on why networking is one of the most important things you can do for your career. Forbes says it best:

    “Networking is not only about trading information, but also serves as an avenue to create long-term relationships with mutual benefits.” — Bianca Miller Cole, Forbes

    While networking is a phenomenal way to get insider knowledge on how to become successful in a particular career, it can also serve as a mutually beneficial relationship later on down the road.

    I got my first job in tech by maintaining a relationship with a university colleague. We met as a result of being teamed up for our final four-month-long practicum. After graduation, we kept in touch. Almost two years later, I got a message saying that the company they work for was interested in hiring me to do some work for them. By maintaining that relationship, I managed to score my first job after graduation with no work experience, thanks to my colleague putting my name forward.

    In other words, it’s important to make a few acquaintances while you’re going through university, to attend networking events and actually talk to people there, and to put yourself out there so recruiters begin to know your name.

  • About how Uber and Netflix turn Big Data into real business value

    From the way we go about our daily lives to the way we treat cancer and protect our society from threats, big data will transform every industry, every aspect of our lives. We can say this with authority because it is already happening.

    Some believe big data is a fad, but they could not be more wrong. The hype will fade, and even the name may disappear, but the implications will resonate and the phenomenon will only gather momentum. What we currently call big data today will simply be the norm in just a few years’ time.

    Big data refers generally to the collection and utilization of large or diverse volumes of data. In my work as a consultant, I work every day with companies and government organizations on big data projects that allow them to collect, store, and analyze the ever-increasing volumes of data to help improve what they do.

    In the course of that work, I’ve seen many companies doing things wrong — and a few getting big data very right, including Netflix and Uber.

    Netflix: Changing the way we watch TV and movies

    The streaming movie and TV service Netflix are said to account for one-third of peak-time Internet traffic in the US, and the service now has 65 million members in over 50 countries enjoying more than 100 million hours of TV shows and movies a day. Data from these millions of subscribers is collected and monitored in an attempt to understand our viewing habits. But Netflix’s data isn’t just “big” in the literal sense. It is the combination of this data with cutting-edge analytical techniques that makes Netflix a true Big Data company.

    Although Big Data is used across every aspect of the Netflix business, their holy grail has always been to predict what customers will enjoy watching. Big Data analytics is the fuel that fires the “recommendation engines” designed to serve this purpose.

    At first, analysts were limited by the lack of information they had on their customers. As soon as streaming became the primary delivery method, many new data points on their customers became accessible. This new data enabled Netflix to build models to predict the perfect storm situation of customers consistently being served with movies they would enjoy.

    Happy customers, after all, are far more likely to continue their subscriptions.

    Another central element to Netflix’s attempt to give us films we will enjoy is tagging. The company pay people to watch movies and then tag them with elements the movies contain. They will then suggest you watch other productions that were tagged similarly to those you enjoyed. 

    Netflix’s letter to shareholders in April 2015 shows their Big Data strategy was paying off. They added 4.9 million new subscribers in Q1 2015, compared to four million in the same period in 2014. In Q1 2015 alone, Netflix members streamed 10 billion hours of content. If Netflix’s Big Data strategy continues to evolve, that number is set to increase.

    Uber: Disrupting car services in the sharing economy

    Uber is a smartphone app-based taxi booking service which connects users who need to get somewhere with drivers willing to give them a ride. 

    Uber’s entire business model is based on the very Big Data principle of crowdsourcing: anyone with a car who is willing to help someone get to where they want to go can offer to help get them there. This gives greater choice for those who live in areas where there is little public transport, and helps to cut the number of cars on our busy streets by pooling journeys.

    Uber stores and monitors data on every journey their users take, and use it to determine demand, allocate resources and set fares. The company also carry out in-depth analysis of public transport networks in the cities they serve, so they can focus coverage in poorly served areas and provide links to buses and trains.

    Uber holds a vast database of drivers in all of the cities they cover, so when a passenger asks for a ride, they can instantly match you with the most suitable drivers. The company have developed algorithms to monitor traffic conditions and journey times in real time, meaning prices can be adjusted as demand for rides changes, and traffic conditions mean journeys are likely to take longer. This encourages more drivers to get behind the wheel when they are needed – and stay at home when demand is low. 

    The company have applied for a patent on this method of Big Data-informed pricing, which they call “surge pricing”. This is an implementation of “dynamic pricing” – similar to that used by hotel chains and airlines to adjust price to meet demand – although rather than simply increasing prices at weekends or during public holidays it uses predictive modelling to estimate demand in real time.
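
    As a purely hypothetical illustration of the dynamic-pricing idea (this is not Uber’s actual algorithm; the ratio-based multiplier and the cap are invented for the example):

    ```python
    # Toy dynamic pricing: raise the fare multiplier when ride requests outstrip available drivers.
    def surge_multiplier(requests: int, available_drivers: int, cap: float = 3.0) -> float:
        if available_drivers == 0:
            return cap
        ratio = requests / available_drivers
        return min(cap, max(1.0, round(ratio, 2)))   # never below 1.0, never above the cap

    print(surge_multiplier(requests=120, available_drivers=60))   # 2.0 -- demand is double the supply
    print(surge_multiplier(requests=40, available_drivers=80))    # 1.0 -- plenty of drivers, normal fare
    ```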

    Data also drives (pardon the pun) the company’s UberPool service. According to Uber’s blog, introducing this service became a no-brainer when their data told them the “vast majority of [Uber trips in New York] have a look-a-like trip – a trip that starts near, ends near and is happening around the same time as another trip”. 

    Other initiatives either trialed or due to launch in the future include UberChopper, offering helicopter rides to the wealthy, Uber-Fresh for grocery deliveries and Uber Rush, a package courier service.

    These are just two companies using Big Data to generate a very real advantage and disrupt their markets in incredible ways. I’ve compiled dozens more examples of Big Data in practice in my new book of the same name, in the hope that it will inspire and motivate more companies to similarly innovate and take their fields into the future. 

    Thank you for reading my post. Here at LinkedIn and at Forbes I regularly write about management, technology and Big Data. If you would like to read my future posts then please click 'Follow' and feel free to also connect via Twitter, Facebook, Slideshare, and The Advanced Performance Institute.

    You might also be interested in my new and free ebook on Big Data in Practice, which includes 3 Amazing use cases from NASA, Dominos Pizza and the NFL. You can download the ebook from here: Big Data in Practice eBook.

    Author: Bernard Marr

    Source: Linkedin Blog

  • An overview of Morgan Stanley's surge toward data quality

    An overview of Morgan Stanley's surge toward data quality

    Jeff McMillan, chief analytics and data officer at Morgan Stanley, has long worried about the risks of relying solely on data. If the data put into an institution's system is inaccurate or out of date, it will give customers the wrong advice. At a firm like Morgan Stanley, that just isn't an option.

    As a result, Morgan Stanley has been overhauling its approach to data. Chief among its goals is improving data quality in core business processing.

    “The acceleration of data volume and the opportunity this data presents for efficiency and product innovation is expanding dramatically,” said Gerard Hester, head of the bank’s data center of excellence. “We want to be sure we are ahead of the game.”

    The data center of excellence was established in 2018. Hester describes it as a hub with spokes out to all parts of the organization, including equities, fixed income, research, banking, investment management, wealth management, legal, compliance, risk, finance and operations. Each division has its own data requirements.

    “Being able to pull all this data together across the firm we think will help Morgan Stanley’s franchise internally as well as the product we can offer to our clients,” Hester said.

    The firm hopes that improved data quality will let the bank build higher quality artificial intelligence and machine learning tools to deliver insights and guide business decisions. One product expected to benefit from this is the 'next best action' the bank developed for its financial advisers.

    This next best action uses machine learning and predictive analytics to analyze research reports and market data, identify investment possibilities, and match them to individual clients’ preferences. Financial advisers can choose to use the next best action’s suggestions or not.

    Another tool that could benefit from better data is an internal virtual assistant called 'ask research'. Ask research provides quick answers to routine questions like, “What’s Google’s earnings per share?” or “Send me your latest model for Google.” This technology is currently being tested in several departments, including wealth management.

    New data strategy

    Better data quality is just one of the goals of the revamp. Another is to have tighter control and oversight over where and how data is being used, and to ensure the right data is being used to deliver new products to clients.

    To make this happen, the bank recently created a new data strategy with three pillars. The first is working with each business area to understand their data issues and begin to address them.

    “We have made significant progress in the last nine months working with a number of our businesses, specifically our equities business,” Hester said.

    The second pillar is tools and innovation that improve data access and security. The third pillar is an identity framework.

    At the end of February, the bank hired Liezel McCord to oversee data policy within the new strategy. Until recently, McCord was an external consultant helping Morgan Stanley with its Brexit strategy. One of McCord’s responsibilities will be to improve data ownership, to hold data owners accountable when the data they create is wrong and to give them credit when it’s right.

    “It’s incredibly important that we have clear ownership of the data,” Hester said. “Imagine you’re joining lots of pieces of data. If the quality isn’t high for one of those sources of data, that could undermine the work you’re trying to do.”

    Data owners will be held accountable for the accuracy, security and quality of the data they contribute and make sure that any issues are addressed.

    Trend of data quality projects

    Arindam Choudhury, the banking and capital markets leader at Capgemini, said many banks are refocusing on data as it gets distributed in new applications.

    Some are driven by regulatory concerns, he said. For example, the Basel Committee on Banking Supervision's standard number 239 (principles for effective risk data aggregation and risk reporting) is pushing some institutions to make data management changes.

    “In the first go-round, people complied with it, but as point-to-point interfaces and applications, which was not very cost effective,” Choudhury said. “So now people are looking at moving to the cloud or a data lake, they’re looking at a more rationalized way and a more cost-effective way of implementing those principles.”

    Another trend pushing banks to get their data house in order is competition from fintechs.

    “One challenge that almost every financial services organization has today is they’re being disintermediated by a lot of the fintechs, so they’re looking at assets that can be used to either partner with these fintechs or protect or even grow their business,” Choudhury said. “So they’re taking a closer look at the data access they have. Organizations are starting to look at data as a strategic asset and try to find ways to monetize it.”

    A third driver is the desire for better analytics and reports.

    "There’s a strong trend toward centralizing and figuring out, where does this data come from, what is the provenance of this data, who touched it, what kinds of rules did we apply to it?” Choudhury said. That, he said, could lead to explainable, valid and trustworthy AI.

    Author: Penny Crosman

    Source: Information-management

  • Applying data science to battle childhood cancer

    Applying data science to battle childhood cancer

    Acute myeloid leukaemia in children has a poor prognosis, and treatment options have remained unchanged for decades. One collaboration is using data analytics to bring a fresh approach to tackling the disease.

    Acute myeloid leukaemia (AML) kills hundreds of children a year. It's the type of cancer that causes the most deaths in children under two, and in teenagers. It has a poor prognosis, and its treatments can be severely toxic.

    Research initiative Target Paediatric AML (tpAML) was set up to change the way that the disease is diagnosed, monitored and treated, through greater use of personalised medicine. Rather than the current one-size-fits-all approach for many diseases, personalised medicine aims to tailor an individual's treatment by looking at their unique circumstance, needs, health, and genetics.

    AML is caused by many different types of genetic mutation, alone and together. Those differences can affect how the cancer should be treated and its prognosis. To understand better how to find, track and treat the condition, tpAML researchers began building the largest dataset ever compiled around the disease. By sequencing the genomes of over 2,000 people, both alive and deceased, who had the disease, tpAML's researchers hoped to find previously unknown links between certain mutations and how a cancer could be tackled.

    Genomic data is notoriously sizeable, and tpAML's sequencing had generated over a petabyte of it. As well as difficulties thrown up by the sheer bulk of data to be analysed, tpAML's data was also hugely complex: each patient's data had 48,000 linked RNA transcripts to analyse.

    Earlier this year, Joe Depa, a father who had lost a daughter to the disease and was working with tpAML, joined with his coworkers at Accenture to work on a project to build a system that could analyse the imposing dataset.

    Linking up with tpAML's affiliated data scientists and computational working group, Depa along with data-scientist and genomic-expert colleagues hoped to help turn the data into information that researchers and clinicians could use in the fight against paediatric AML, by allowing them to correlate what was happening at a genetic level with outcomes in the disease.

    In order to turn the raw data into something that could generate insights into paediatric AML, Accenture staff created a tool that ingested the raw clinical and genomic data and cleaned it up, so analytics tools could process it more effectively. Using Alteryx and Python, the data was merged into a single file, and any incomplete or duplicate data removed. Python was used to profile the data and develop statistical summaries for the analysis – which could be used to flag genes that could be of interest to researchers, Depa says. The harmonised DataFrame was exported as a flat file for more analysis.
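
    The article doesn’t publish the project’s code, but the steps it describes (merging sources, removing incomplete or duplicate records, profiling the result) map onto a familiar pandas pattern; a minimal sketch with hypothetical file and column names:

    ```python
    # Hypothetical file and column names; the real project combined clinical and genomic data
    # using Alteryx and Python before exporting a harmonised flat file.
    import pandas as pd

    clinical = pd.read_csv("clinical.csv")
    genomic = pd.read_csv("genomic.csv")

    merged = clinical.merge(genomic, on="patient_id", how="inner")  # one table across both sources
    merged = merged.drop_duplicates().dropna()                      # remove duplicate / incomplete records

    print(merged.describe(include="all"))         # statistical summary used to flag genes of interest
    merged.to_csv("harmonised.csv", index=False)  # flat file for downstream analysis
    ```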

    "The whole idea was 'let's reduce the time for data preparation', which is a consistent issue in any area around data, but particularly in the clinical space. There's been a tonne of work already put into play for this, and now we hope we've got it in a position where hopefully the doctors can spend more time analysing the data versus having to clean up the data," says Depa, managing director at Accenture Applied Intelligence.

    Built using R, the code base that was created for the project is open source, allowing researchers and doctors with similar challenges, but working on different conditions, to reuse the group's work for their own research. While users may need a degree of technical expertise to properly manipulate the information at present, the group is working on a UI that should make it as accessible as possible for those who don't have a similar background.

    "We wanted to make sure that at the end of this analysis, any doctor in the world can access this data, leverage this data and perform their analysis on it to hopefully drive to more precision-type medicine," says Depa.

    But clinical researchers and doctors aren't always gifted data scientists, so the group has been working on ways to visualise the information, using Unity. The tools they've created allow researchers to manipulate the data in 3D, and zoom in and out on anomalies in the data to find data points that may be worthy of further exploration. One enterprising researcher has even been able to explore those datasets in virtual reality using an Oculus.

    Historically, paediatric and adult AML were treated as largely the same disease. However, according to Dr Soheil Meshinchi, professor in the Fred Hutchinson Cancer Research Center's clinical research division and lead for tpAML's computational working group, the two groups stem from different causes. In adults, the disease arises from changes to the smallest links in the DNA chain, known as single base pairs, while in children it's driven by alterations to larger chunks of their chromosomes.

    The tpAML has allowed researchers to find previously unknown alterations that cause the disease in children. "We've used the data that tpAML generated to probably make the most robust diagnostic platform that there is. We've identified genetic alterations which was not possible by conventional methods," says Meshinchi.

    Once those mutations are found, the data analysis platform can begin identifying drugs that could potentially target them. Protocols for how to treat paediatric AML have remained largely unchanged for decades, and new, more individualised treatment options are sorely needed.

    "We've tried it for 40 years of treating all AML the same and hoping for the best. That hasn't worked – you really need to take a step back and to treat each subset more appropriately based on the target that's expressed," says Meshinchi.

    The data could help by identifying drugs that have already been developed to treat other conditions but may have a role in fighting paediatric AML, and by giving the pharmaceutical companies that make those drugs hard evidence that starting the expensive and risky process of clinical trials could be worthwhile.

    Using the analytics platform to find drugs that can be repurposed in this way, rather than created from scratch, could cut the time it takes for a new paediatric AML treatment to be approved by years. One drug identified as a result has already been tested in clinical trials.

    The results generated by the team’s work have begun to have an impact for paediatric AML patients. When the data was used to identify a subset of children with the disease who had a particular genetic marker and were considered particularly high risk, the treatment pathway for those children was altered.

    "This data will not only have an impact ongoing but is already having an impact right now," says Julie Guillot, co-founder of tpAML.

    "One cure for leukaemia or one cure for AML is very much unlikely. But we are searching for tailored treatments for specific groups of kids… when [Meshinchi] and his peers are able to find that Achilles heel for a specific cluster of patients, the results are dramatic. These kids go from a very low percentage of cure to, for example, a group that went to 95%. This approach can actually work."

    Author: Jo Best

    Source: ZDNet

  • BERT-SQuAD: Interviewing AI about AI

    BERT-SQuAD: Interviewing AI about AI

    If you’re looking for a data science job, you’ve probably noticed that the field is hyper-competitive. AI can now even generate code in any language. Below, we’ll explore how AI can extract information from paragraphs to answer questions.

    One day you might be competing against AI, if AutoML isn’t that competitor already.

    What is BERT-SQuAD?

    BERT-SQuAD combines Google BERT with the Stanford Question Answering Dataset (SQuAD).

    BERT is a cutting-edge Natural Language Processing algorithm that can be used for tasks like question answering (which we’ll go into here), sentiment analysis, spam filtering, document clustering, and more. It’s all language!

    “Bidirectionality” refers to the fact that many words change meaning depending on their context, like “let’s hit the club” versus “an idea hit him”, so the model considers words on both sides of the keyword.

    “Encoding” just means assigning numbers to characters, or turning an input like “let’s hit the club” into a machine-workable format.

    “Representations” are the general understanding of words you get by looking at many of their encodings in a corpus of text.

    “Transformers” are what you use to get from encodings to representations. This is the most complex part.

    As mentioned, BERT can be trained to work on basically any kind of language task, so SQuAD refers to the dataset we’re using to train it on a specific language task: Question answering.

    SQuAD is a reading comprehension dataset, containing questions asked by crowdworkers on Wikipedia articles, where the answer to every question is a segment of text from the corresponding passage.

    BERT-SQuAD, then, allows us to answer general questions by fishing out the answer from a body of text. It’s not cooking up answers from scratch, but rather, it understands the context of the text enough to find the specific area of an answer.

    For example, here’s a context paragraph about lasso and ridge regression:

    “You can quote ISLR’s authors Hastie, Tibshirani who asserted that, in presence of few variables with medium / large sized effect, use lasso regression. In presence of many variables with small / medium sized effect, use ridge regression.

    Conceptually, we can say, lasso regression (L1) does both variable selection and parameter shrinkage, whereas Ridge regression only does parameter shrinkage and end up including all the coefficients in the model. In presence of correlated variables, ridge regression might be the preferred choice. Also, ridge regression works best in situations where the least square estimates have higher variance. Therefore, it depends on our model objective.”

    Now, we could ask BERT-SQuAD:

    “When is Ridge regression favorable over Lasso regression?”

    And it’ll answer:

    “In presence of correlated variables”

    While I show around 100 words of context here, you could input far more context into BERT-SQuAD, like whole documents, and quickly retrieve answers. An intelligent Ctrl-F, if you will.
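
    A minimal sketch of the same extractive question-answering setup, using the Hugging Face transformers pipeline with its default SQuAD-fine-tuned model (the author’s exact Colab model and setup may differ; a BERT checkpoint can be passed via the model argument):

    ```python
    # Extractive QA: the model returns a span from the supplied context, not a free-form answer.
    from transformers import pipeline

    qa = pipeline("question-answering")  # default pipeline model is fine-tuned on SQuAD

    context = (
        "In presence of correlated variables, ridge regression might be the preferred choice. "
        "Also, ridge regression works best in situations where the least square estimates have "
        "higher variance."
    )
    result = qa(question="When is Ridge regression favorable over Lasso regression?", context=context)
    print(result["answer"], round(result["score"], 3))
    ```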

    To test the following 7 questions, I used Gradio, a library that lets developers build interfaces on top of models. In this case, I used a BERT-SQuAD interface created in Google Colab.

    I used the contexts from a Kaggle thread as inputs, and modified the questions for simplicity’s sake.
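
    For reference, wiring such a function into a Gradio interface takes only a few lines; a sketch that reuses the qa pipeline from the previous snippet (the author’s actual interface may be configured differently):

    ```python
    # Two text inputs (context and question), one text output (the extracted answer span).
    import gradio as gr

    def answer(context: str, question: str) -> str:
        return qa(question=question, context=context)["answer"]  # reuses the pipeline defined above

    gr.Interface(fn=answer, inputs=["text", "text"], outputs="text").launch()
    ```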

    Q1: What will happen if you don’t rotate PCA components?

    The effect of PCA will diminish

    Q2. How do you reduce the dimensions of data to reduce computation time?

    We can separate the numerical and categorical variables and remove the correlated variables

    Q3: Why is Naive Bayes “naive” ?

    It assumes that all of the features in a data set are equally important and independent

    Q4: Which algorithm should you use to tackle low bias and high variance?

    Bagging

    Q5: How are kNN and kmeans clustering different?

    kmeans is unsupervised in nature and kNN is supervised in nature

    Q6: When is Ridge regression favorable over Lasso regression?

    In presence of correlated variables

    Q7: What is convex hull?

    Represents the outer boundaries of the two group of data points

    Author: Frederik Bussler

    Source: Towards Data Science

  • BI topics to tackle when migrating to the cloud

    BI topics to tackle when migrating to the cloud

    When your organization decides to pull the trigger on a cloud migration, a lot of stuff will start happening all at once. Regardless of how long the planning process has been, once data starts being relocated, a variety of competing factors that had all been theoretical earlier become devastatingly real: frontline business users still want to be able to run analyses while the migration is happening, your data engineers are concerned with the switch from whatever database you were using before, and the development org has its own data needs. With a comprehensive, BI-focused data strategy, you and your stakeholders will know what your ideal data model should look like once all your data is moved over. This way, as you’re managing the process and trying to keep everyone happy, you end up in a stronger place when your migration is over than you were at the start, and isn’t that the goal?

    BI focus and your data infrastructure

    “What does all this have to do with my data model?” you might be wondering. “And for that matter, my BI solution?”

    I’m glad you asked, internet stranger. The answer is everything. Your data infrastructure underpins your data model and powers all of your business-critical IT systems. The form it takes can have immense ramifications for your organization, your product, and the new things you want to do with it. Your data infrastructure is hooked into your BI solution via connectors, so it’ll work no matter where the data is stored. Picking the right data model, once all your data is in its new home, is the final piece that will allow you to get the most out of it with your BI solution. If you don’t have a BI solution, the perfect time to implement one is once all your data is moved over and your model is built. This should all be part of your organization’s holistic cloud strategy, with buy-in from major partners who are handling the migration.

    Picking the right database model for you

    So you’re giving your data a new home and maybe implementing a BI solution when it’s all done. Now, what database model is right for your company and your use case? There are a wide array of ways to organize data, depending on what you want to do with it.

    One of the broadest is a conceptual model, which focuses on representing the objects that matter most to the business and the relationships between them. This database model is designed principally for business users. Compare this to a physical model, which is all about the structure of the data. In this model, you’ll be dealing with tables, columns, relationships, graphs, etc. And foreign keys, which distinguish the connections between the tables.

    Now, let’s say you’re only focused on representing your data organization and architecture graphically, putting aside the physical usage or database management framework. In cases like these, a logical model could be the way to go. Examples of these types of databases include relational (dealing with data as tables or relations), network (putting data in the form of records), and hierarchical (which is a progressive tree-type structure, with each branch of the tree showing related records). These models all feature a high degree of standardization and cover all entities in the dataset and the relationships between them.

    Got a wide array of different objects and types of data to deal with? Consider an object-oriented database model, sometimes called a “hybrid model.” These models look at their contained data as a collection of reusable software pieces, all with related features. They also consolidate tables but aren’t limited to the tables, giving you freedom when dealing with lots of varied data. You can use this kind of model for multimedia items you can’t put in a relational database or to create a hypertext database to connect to another object and sort out divergent information.

    Lastly, we can’t help but mention the star schema here, which has elements arranged around a central core and looks like an asterisk. This model is great for querying informational indexes as part of a larger data pool. It’s used to dig up insights for business users, OLAP cubes, analytics apps, and ad-hoc analyses. It’s a simple yet powerful structure that sees a lot of use.
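
    For illustration, here is a hedged sketch of a tiny star schema with hypothetical table and column names: a central fact table joined out to its dimension tables for an ad-hoc rollup. A real implementation would live in your warehouse’s SQL engine rather than in pandas.

    ```python
    # A hedged sketch with hypothetical table and column names, using pandas.
    import pandas as pd

    fact_sales = pd.DataFrame({
        "date_key": [1, 1, 2, 2],
        "product_key": [10, 11, 10, 11],
        "revenue": [100.0, 80.0, 120.0, 60.0],
    })
    dim_date = pd.DataFrame({"date_key": [1, 2], "month": ["2024-01", "2024-02"]})
    dim_product = pd.DataFrame({"product_key": [10, 11], "category": ["shoes", "hats"]})

    # The "star": the fact table in the middle, joined out to each dimension.
    report = (
        fact_sales
        .merge(dim_date, on="date_key")
        .merge(dim_product, on="product_key")
        .groupby(["month", "category"], as_index=False)["revenue"].sum()
    )
    print(report)
    ```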

    Now what?

    Whether you’re building awesome analytics into your app or empowering in-house users to get more out of your data, knowing what you’re doing with your data is key to maintaining the right models. Once you’ve picked your database, it’s time to pick your data model, with an eye towards what you want to do with it once it’s hooked into your BI solution.

    Worried about losing customers? A predictive churn model can help you get ahead of the curve by putting time and attention into relationships that are at risk of going sour. On the other side of the coin, predictive up- and cross-sell models can show you where you can get more money out of a customer and which ones are ripe to deepen your financial relationship.
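
    As a rough illustration (not a production recipe), a predictive churn model can start as a simple classifier over behavioural features; the feature names and numbers below are invented, and scikit-learn is assumed.

    ```python
    # A rough, hypothetical sketch; features and labels are invented, scikit-learn assumed.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    customers = pd.DataFrame({
        "months_active":   [3, 24, 1, 36, 6, 18, 2, 48],
        "support_tickets": [5, 0, 7, 1, 4, 2, 6, 0],
        "monthly_spend":   [20, 90, 15, 120, 30, 70, 10, 150],
        "churned":         [1, 0, 1, 0, 1, 0, 1, 0],
    })

    X = customers.drop(columns="churned")
    y = customers["churned"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    # Churn probability per held-out customer: the accounts worth a retention call.
    print(model.predict_proba(X_test)[:, 1])
    ```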

    What about your marketing efforts? A customer segmentation data model can help you understand the buying behaviors of your current customers and target groups and which marketing plays are having the desired effect. Or go beyond marketing with “next-best-action models” that take into account life events, purchasing behaviors, social media, and anything else you can get your hands on, so that you can figure out which next action (email, ads, phone call, etc.) will have the greatest impact on a given target. And predictive analyses aren’t just for human-centric activities: manufacturing and logistics companies can take advantage of maintenance models that let you circumvent machine breakdowns based on historical data. Don’t get caught without a vital piece of equipment again.
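
    Likewise, a customer segmentation model often begins as little more than clustering on behavioural features. The sketch below assumes scikit-learn and uses made-up purchase-frequency and basket-size data.

    ```python
    # A rough sketch with made-up behavioural features, assuming scikit-learn.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    frequency = rng.poisson(lam=[2] * 50 + [10] * 50)                         # orders per quarter
    basket = np.concatenate([rng.normal(20, 5, 50), rng.normal(80, 15, 50)])  # avg basket size
    X = StandardScaler().fit_transform(np.column_stack([frequency, basket]))

    segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
    print(np.bincount(segments))  # size of each behavioural segment
    ```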

    Bringing it all together with BI

    Staying focused on your long-term goals is an important key to success. Whether you’re building a game-changing product or rebuilding your data model, having a well-defined goal makes all the difference in the world when it comes to the success of your enterprise. If you’re already migrating your data to the cloud, then you’re at the perfect juncture to pick the right database and data models for your eventual use cases. Once these are set up, they’ll integrate seamlessly with your BI tool (and if you don’t have one yet, it’ll be the perfect time to implement one). Big moves like this represent big challenges, but also big opportunities to lay the foundation for whatever you’re planning on building. Then you just have to build it!

    Author: Jack Cieslak

    Source: Sisense

  • Big Data Analytics: hype?

    Big Data explosion

    Hardly a day goes by without a news item or discussion about data appearing in the media. Whether it concerns privacy issues, the new opportunities and threats of Big Data, or new services based on cleverly combining and exchanging data: there is no escaping the fact that information is ‘hot’.

    Is Big Data Analytics - the analysis of large volumes of data, much of it unstructured - a hype? When the term suddenly popped up everywhere a few years ago, many sceptics said it was a trick by software vendors to re-market something that already existed - data analysis has been applied for a long time. By now, experts agree that Big Data Analytics, in the form in which it can be applied today, is going to have an enormous impact on the world as we know it. Yes, it is a hype, but a justified one.

    Big Data Analytics – what is it, exactly?

    Big Data has been a hype for years, and will remain one for a while. When exactly does data become ‘Big’: at how many tera-, peta- or yottabytes (10^24 bytes) does the boundary between ‘normal’ and ‘Big’ Data lie? The answer is that there is no clear boundary. You speak of Big Data when it becomes too much for your people and resources. Big Data Analytics focuses on exploring data with statistical methods in order to gain new insights that can be used to improve future performance.

    Big Data Analytics as a tool for steering performance is already widely used by companies. Think of a sports club that uses it to decide which players to buy. Or a bank that stopped recruiting talent exclusively from top universities because candidates from less prestigious universities turned out to perform better. Or an insurance company that uses it to detect fraud. And so on.

    What makes Big Data Analytics possible?

    At least three developments are pushing Big Data Analytics into a whole new phase.

    1. Computing power

    The increasing computing power of computers enables analysts to work with enormous datasets and to include a large number of variables in their analyses. Thanks to this increased computing power, it is no longer necessary to take a sample as in the past: all the data can be used for an analysis. The analysis is done with specific tools and often requires specific knowledge and skills from the user, a data analyst or data scientist.

    2. Data creation

    The internet and social media are causing the amount of data we create to grow exponentially. This data can be used for countless data analysis applications, most of which have yet to be invented.

    To get a sense of this data growth, consider these statistics:

    - More than a billion tweets are sent every 48 hours.

    - A million Twitter accounts are added every day.

    - Every 60 seconds, 293,000 status updates are posted on Facebook.

    - The average Facebook user creates 90 pieces of content per month, including links, news stories, photos and videos.

    - Every minute, 500 Facebook accounts are added.

    - Every day, 350 million photos are uploaded to Facebook, which comes down to 4,000 photos per second.

    - If Wikipedia were a book, it would comprise more than two billion pages.

    Source: http://www.iacpsocialmedia.org

    3. Data storage

    The cost of storing data has fallen sharply in recent years, which has expanded the possibilities for applying analytics. One example is the storage of video footage. Security cameras in a supermarket used to record everything on tape; if nothing had happened after three days, the tape was rewound and recorded over.

    That is no longer necessary. A supermarket can now send digital footage - covering the entire store - to the cloud, where it remains stored. It then becomes possible to apply analytics to these images: which promotions work well? Which shelves do people linger in front of? What are the blind spots in the store? Or predictive analytics: if we were to put this product on this shelf, what would the result be? Management can use these analyses to arrive at an optimal store layout and get maximum return from promotions.

    The significance of Big Data Analytics

    Big Data - or Smart Data, as Bernard Marr, author of the practical book ‘Big Data: Using SMART Big Data Analytics To Make Better Decisions and Improve Performance’, prefers to call it - is changing the world. The amount of data is currently growing exponentially, but for most decision-makers the volume itself is largely irrelevant. What matters is how you use it to arrive at valuable insights.

    Big Data

    Opinions differ on what big data actually is. Gartner defines big data in terms of the three V’s: Volume, Velocity and Variety. It is about the amount of data, the speed at which the data can be processed and the diversity of the data. The latter means that, in addition to structured sources, data can also be drawn from all kinds of unstructured sources, such as the internet and social media, including text, speech and images.

    Analytics

    Who wouldn’t want to predict the future? With enough data, the right technology and a dose of mathematics, that comes within reach. This is called business analytics, but many other terms are in circulation, such as data science, machine learning and, indeed, big data. Even though the underlying mathematics has existed for quite some time, it is still a relatively young field that until recently was only within reach of specialised companies with deep pockets.

    Yet we all use it already without realising it. Speech recognition on your phone, virus scanners on your PC and spam filters for email are based on concepts that fall within the domain of business analytics. The development of self-driving cars and all the steps towards them (adaptive cruise control, lane departure systems, et cetera) is also only possible thanks to machine learning.

    In short, analytics is the discovery and communication of meaningful patterns in data. Companies can apply analytics to business data to describe, predict and improve their business performance. There are different kinds of analytics, such as text analytics, speech analytics and video analytics.

    An example of text analytics is a law firm that uses it to search through thousands of documents to quickly find the information needed to prepare a new case. Speech analytics is used in call centres, for example, to determine the caller’s mood so that the agent can anticipate it as well as possible. Video analytics can be used to monitor security cameras: unusual patterns are picked out automatically, so that security staff can take action and no longer have to stare at a screen for hours while nothing happens.

    The process can be approached both top-down and bottom-up. The most commonly used approaches are:

    • Data mining: examining data on the basis of a targeted question, looking for a specific answer.
    • Trend analysis and predictive analytics: deliberately searching for cause-and-effect relationships in order to explain certain events or to predict future behaviour.
    • Data discovery: exploring data for unexpected relationships or other striking findings.

    Facts and dimensions

    The data that help you gain insights or make decisions are facts: for example EBITDA, revenue or the number of customers. These facts get their value from dimensions, such as the revenue for the year 2014 for the baby food product line in the East region. By analysing along dimensions you can discover relationships, identify trends and make predictions about the future, as sketched below.
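
    As a small illustration (with invented numbers, assuming pandas), slicing one fact, revenue, along the dimensions year and region is a one-line pivot:

    ```python
    # Invented numbers: one fact (revenue) sliced along two dimensions (year, region).
    import pandas as pd

    sales = pd.DataFrame({
        "year": [2013, 2013, 2014, 2014],
        "region": ["East", "West", "East", "West"],
        "product_line": ["baby food"] * 4,
        "revenue": [1.2, 0.9, 1.5, 1.0],   # in millions
    })

    print(sales.pivot_table(index="year", columns="region", values="revenue", aggfunc="sum"))
    ```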

    Analytics versus Business Intelligence

    So how does analytics differ from business intelligence (BI)? In essence, analytics is data-based support for decision-making. BI shows what has happened, based on historical data presented in predefined reports. Where BI provides insight into the past, analytics focuses on the future. Analytics tells you what may happen by using the daily changing stream of data to run ‘what if’ scenarios, make estimates and predict risks and trends.

    Examples of Big Data Analytics

    The world is getting smarter and smarter. Everything can be measured, from our heart rate during a jog to walking patterns in shops. By using that data, we can build impressive analyses to, for example, prevent traffic jams, suppress epidemics early and offer personalised medication.

    This evolution is even visible in the most traditional industries, such as fishing. Instead of relying purely on a compass and ‘insider knowledge’ passed down through generations of fishing families, today’s fisherman attaches sensors to fish and tracks schools with the most advanced GPS systems. Big Data Analytics is now applied in every industry and sector, and cities use it too. Below is an overview of possible applications:

    Understanding your target audience better

    The American mega-retailer Target can tell from a combination of 25 purchases when a woman is pregnant. That is one of the few periods in a person’s life when buying behaviour deviates from routine, and Target cleverly responds with baby-related offers. Amazon has become so good at predictive analytics that it can ship products to you before you have even bought them. If it is up to them, you will soon be able to have your order delivered by drone within 30 minutes.

    Improving processes

    Processes are also changing because of Big Data. Take purchasing: Walmart knows that more Pop-Tarts are sold when a storm warning is issued. They don’t know why that is, but they do make sure they have enough stock and give the snacks a prominent spot in the store. Another process where data offers great opportunities for optimisation is the supply chain. Which routes do you have your drivers take, and in what order do they deliver orders? Real-time weather and traffic data makes it possible to adjust on the fly.

    Business optimization

    At Q-Park, customers pay per minute for parking, but it is also possible to take out a subscription. The price per minute is many times cheaper with a subscription. When the garage starts to fill up, it is unfortunate if a subscription customer happens to drive in, because that costs revenue. The analytics system therefore periodically calculates the optimal mix of subscription and non-subscription spaces based on historical data. That way the garage operator gets the maximum out of every space.

    Machine optimization

    General Electric (GE) is an enthusiastic user of big data. The conglomerate already uses a lot of data in its data-intensive sectors, such as healthcare and financial services, but the company also sees industrial applications, for instance in GE’s businesses for locomotives, jet engines and gas turbines. GE describes the equipment in industries like these as ‘things that spin’ and expects that most of those things, if not all, will soon be able to record and communicate data about that ‘spinning’.

    One of those spinning things is the gas turbine that GE’s customers use for power generation. GE already monitors more than 1,500 turbines from a central facility, so a large part of the infrastructure for using big data to improve performance is already in place. GE estimates that it can improve the efficiency of the monitored turbines by at least 1 percent through software and network optimisation, more effective maintenance handling and better harmonisation of the gas power system. That may not sound like much, but it would amount to fuel savings of 66 billion dollars over the next 15 years.
    (source: ‘Big Data aan het werk’ by Thomas Davenport)

    Customer service and commerce

    A major gain from the new possibilities of big data is that companies can connect everything to everything else: silos, systems, products, customers, and so on. In telecoms, for example, the cost-to-serve concept has been introduced. It allows them to look from the actual operation at what touchpoints they have with a customer: how often he calls customer service; what his payment behaviour is; how he uses his subscription; how he came in as a customer; how long he has been a customer; where he lives and works; which phone he uses; et cetera.

    When the telecom company brings the data from all those angles together, a completely different view of that customer’s costs and revenue suddenly emerges. In that multitude of viewpoints lie opportunities. Simply by integrating data and looking at it in context, surprising new insights are guaranteed to emerge. What companies typically look at now is the top 10 customers who contribute the most and the least to revenue, and then they draw a line between them. That is a very limited use of the available data. By sketching the context, the company may be able to come up with actions to encourage the bottom 10 to do a bit more, or to part ways with them after all, but then as a well-considered decision.

    Smart cities

    New York City nowadays uses a ‘soundscape’ of the entire city. A disturbance in the typical city sound, such as a gunshot, is immediately passed on to the police, who can respond. Criminals are facing a difficult century thanks to the application of this kind of Big Data Analytics.

    Smart hospitals

    Whether it concerns the information collected during a patient’s hospital stay or information from general annual reports: Big Data is becoming increasingly important for hospitals, for improved patient care, better scientific research and business insight. Medical data doubles in volume every five years, and this data can be of great value in delivering the right care.

    HR Analytics

    Data can be used to monitor and assess employee performance. This applies not only to a company’s rank-and-file employees, but will also increasingly be used to objectively assess the top layer of managers and leaders.

    One company that has reaped the benefits of HR Analytics is Google. The internet and tech giant never really believed that managers had much impact, so its analytics team set to work on the question: ‘Do managers actually have a positive impact at Google?’ Their analysis showed that managers do indeed make a difference and can have a positive impact at Google. The next question was: ‘What makes a great manager at Google?’ This resulted in eight behaviours of the best managers and the three biggest pitfalls, which in turn led to a highly effective training and feedback programme for managers that has had a very positive influence on Google’s performance.

    Big Data Analytics in SMEs

    A common misconception about Big Data is that it is only for large companies. Wrong: every company, large or small, can put data to work. In his book, Bernard Marr gives an example of a small fashion retailer he worked with.

    The company in question wanted to increase its sales, but it had no data to reach that goal apart from traditional sales data. So it first came up with a number of questions:

    - How many people pass our stores?

    - How many stop to look at the window display, and for how long?

    - How many then come inside?

    - How many end up buying something?

    They then placed a small, discreet device behind the window that measured the number of passing mobile phones (and therefore people). The device also records how many people stop in front of the window and for how long, and how many come inside. Sales data then records how many people buy what. The retail chain could then experiment with different window displays to test which were most successful. The project led to considerably higher revenue, and to the closure of one struggling store that turned out to attract too little foot traffic.
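
    As a rough sketch of the funnel this retailer measured (the counts below are invented, pandas assumed): from passers-by to window stops to store visits to purchases, per window display.

    ```python
    # Invented counts for two window displays, pandas assumed.
    import pandas as pd

    funnel = pd.DataFrame({
        "display":    ["A", "B"],
        "passers_by": [12000, 11500],
        "stopped":    [900, 1400],
        "entered":    [300, 620],
        "bought":     [75, 200],
    })

    funnel["stop_rate"] = funnel["stopped"] / funnel["passers_by"]
    funnel["conversion"] = funnel["bought"] / funnel["entered"]
    print(funnel[["display", "stop_rate", "conversion"]])
    ```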

    Conclusion

    The Big Data revolution is rapidly making the world smarter. For companies, the challenge is that this revolution is taking place alongside business as usual. There is still a lot to do before most companies are able to truly profit from Big Data Analytics. The majority of organisations are already happy if they can report and analyse properly. Many companies still have to start experimenting, which may mean overcoming their cold feet. What is certain is that a great many opportunities will arise very soon. The race that has now begun will show who runs off with the new insights.

    Author: Jeppe Kleyngeld

    Source: FMI

                

  • Big data and the future of the self-driving car

    Big data and the future of the self-driving car

    Each year, car manufacturers get closer to successfully developing a fully autonomous vehicle. Over the last several years, major tech companies have paired up with car manufacturers to develop the advanced technology that will one day allow the majority of vehicles on the road to be autonomous. Of the five levels of automation, companies like Ford and Tesla are hovering around level three, which offers several autonomous driving functions but still requires a person to be attentive behind the wheel.

    However, car manufacturers are expected to release fully automatic vehicles to the public within the next decade. These vehicles are expected to have a large number of safety and environmental benefits. Self-driving technology has come a long way over the last few years, as the growth of big data in technology industries has helped provide car manufacturers with the programming data needed to get closer to fully automating cars. Big data is helping to build enough information and deep-learning capability into autonomous cars to make them safer for all drivers.

    History of self-driving cars

    The first major automation in cars was cruise control, which was patented in 1950 and is used by most drivers to keep their speed steady during long drives nowadays. Most modern cars already have several automated functions, like proximity warnings and steering adjustment, which have been tried and tested, and proven to be valuable features for safe driving. These technologies use sensors to alert the driver when they are coming too close to something that may be out of the driver’s view or something that the driver may simply not have noticed.

    The fewer functions drivers have to worry about and pay attention to, the more they’re able to focus on the road in front of them and stay alert to dangerous circumstances that could occur at any moment. Human error causes 90 percent of all crashes on the roads, which is one of the main reasons so many industries support the development of autonomous vehicles. However, even when a driver is completely attentive, circumstances that are out of their control could cause them to go off the road or crash into other vehicles. Car manufacturers are still working on the programming for autonomous driving in weather that is less than ideal.

    Big data’s role in autonomous vehicle development

    Although these technologies provided small steps toward automation, they remained milestones away from a fully automated vehicle. However, over the last decade, with the large range of advancements that have been made in technology and the newfound use of big data, tech companies have discovered the necessary programming for fully automating vehicles. Autonomous vehicles rely entirely on the data they receive through GPS, radar and sensor technology, and the information they process through cameras.

    The information cars receive through these sources provides them with the data needed to make safe driving decisions. Although car manufacturers are still using stores of big data to work out the kinks of the thousands of scenarios an autonomous car could find itself in, it’s only a matter of time before self-driving cars transform the automotive industry by making up the majority of cars on the road. As the price of the advanced radars for these vehicles goes down, self-driving cars should become more accessible to the public, which will increase the safety of roads around the world.

    Big data is changing industries worldwide, and deep learning is contributing to the progress towards fully autonomous vehicles. Although it will still be several decades before the mass adoption of self-driving cars, the change will slowly but surely come. In only a few decades, we’ll likely be living in a time where cars are a safer form of transportation, and accidents are tragedies that are few and far between.

    Source: Insidebigdata

  • Big data can’t bring objectivity to a subjective world

    It seems everyone is interested in big data these days. From social scientists to advertisers, professionals from all walks of life are singing the praises of 21st-century data science.
     
    In the social sciences, many scholars apparently believe it will lend their subject a previously elusive objectivity and clarity. Sociology books like An End to the Crisis of Empirical Sociology? and work from bestselling authors are now talking about the superiority of “Dataism” over other ways of understanding humanity. Professionals are stumbling over themselves to line up and proclaim that big data analytics will enable people to finally see themselves clearly through their own fog.
     
    However, when it comes to the social sciences, big data is a false idol. In contrast to its use in the hard sciences, the application of big data to the social, political and economic realms won’t make these areas much clearer or more certain.
     
    Yes, it might allow for the processing of a greater volume of raw information, but it will do little or nothing to alter the inherent subjectivity of the concepts used to divide this information into objects and relations. That’s because these concepts — be they the idea of a “war” or even that of an “adult” — are essentially constructs, contrivances liable to change their definitions with every change to the societies and groups who propagate them.
     
    This might not be news to those already familiar with the social sciences, yet there are nonetheless some people who seem to believe that the simple injection of big data into these “sciences” should somehow make them less subjective, if not objective. This was made plain by a recent article published in the September 30 issue of Science.
     
    Authored by researchers from the likes of Virginia Tech and Harvard, “Growing pains for global monitoring of societal events” showed just how off the mark the assumption is that big data will bring exactitude to the large-scale study of civilization.
     
    More precisely, it reported on the workings of four systems used to build supposedly comprehensive databases of significant events: Lockheed Martin’s International Crisis Early Warning System (ICEWS), Georgetown University’s Global Data on Events Language and Tone (GDELT), the University of Illinois’ Social, Political, and Economic Event Database (SPEED) and the Gold Standard Report (GSR) maintained by the not-for-profit MITRE Corporation.
     
    Its authors tested the “reliability” of these systems by measuring the extent to which they registered the same protests in Latin America. If they or anyone else were hoping for a high degree of duplication, they were sorely disappointed, because they found that the records of ICEWS and SPEED, for example, overlapped on only 10.3 percent of these protests. Similarly, GDELT and ICEWS hardly ever agreed on the same events, suggesting that, far from offering a complete and authoritative representation of the world, these systems are as partial and fallible as the humans who designed them.
     
    Even more discouraging was the paper’s examination of the “validity” of the four systems. For this test, its authors simply checked whether the reported protests actually occurred. Here, they discovered that 79 percent of GDELT’s recorded events had never happened, and that ICEWS had gone so far as entering the same protests more than once. In both cases, the respective systems had essentially identified occurrences that had never, in fact, occurred.
     
    They had mined troves and troves of news articles with the aim of creating a definitive record of what had happened in Latin America protest-wise, but in the process they’d attributed the concept “protest” to things that — as far as the researchers could tell — weren’t protests.
     
    For the most part, the researchers in question put this unreliability and inaccuracy down to how “Automated systems can misclassify words.” They concluded that the examined systems had an inability to notice when a word they associated with protests was being used in a secondary sense unrelated to political demonstrations. As such, they classified as protests events in which someone “protested” to her neighbor about an overgrown hedge, or in which someone “demonstrated” the latest gadget. They operated according to a set of rules that were much too rigid, and as a result they failed to make the kinds of distinctions we take for granted.
     
    As plausible as this explanation is, it misses the more fundamental reason as to why the systems failed on both the reliability and validity fronts. That is, it misses the fact that definitions of what constitutes a “protest” or any other social event are necessarily fluid and vague. They change from person to person and from society to society. Hence, the systems failed so abjectly to agree on the same protests, since their parameters on what is or isn’t a political demonstration were set differently from each other by their operators.
     
    Make no mistake, the basic reason as to why they were set differently from each other was not because there were various technical flaws in their coding, but because people often differ on social categories. To take a blunt example, what may be the systematic genocide of Armenians for some can be unsystematic wartime killings for others. This is why no amount of fine-tuning would ever make such databases as GDELT and ICEWS significantly less fallible, at least not without going to the extreme step of enforcing a single worldview on the people who engineer them.
     
    Much the same could be said for the systems’ shortcomings in the validity department. While the paper’s authors stated that the fabrication of nonexistent protests was the result of the misclassification of words, and that what’s needed is “more reliable event data,” the deeper issue is the inevitable variation in how people classify these words themselves.
     
    It’s because of this variation that, even if big data researchers make their systems better able to recognize subtleties of meaning, these systems will still produce results with which other researchers find issue. Once again, this is because a system might perform a very good job of classifying newspaper stories according to how one group of people might classify them, but not according to how another would classify them.
     
    In other words, the systematic recording of masses of data alone won’t be enough to ensure the reproducibility and objectivity of social studies, because these studies need to use often controversial social concepts to make their data significant. They use them to organize “raw” data into objects, categories and events, and in doing so they infect even the most “reliable event data” with their partiality and subjectivity.
     
    What’s more, the implications of this weakness extend far beyond the social sciences. There are some, for instance, who think that big data will “revolutionize” advertising and marketing, allowing these two interlinked fields to reach their “ultimate goal: targeting personalized ads to the right person at the right time.” According to figures in the advertising industry “[t]here is a spectacular change occurring,” as masses of data enable firms to profile people and know who they are, down to the smallest preference.
     
    Yet even if big data might enable advertisers to collect more info on any given customer, this won’t remove the need for such info to be interpreted by models, concepts and theories on what people want and why they want it. And because these things are still necessary, and because they’re ultimately informed by the societies and interests out of which they emerge, they maintain the scope for error and disagreement.
     
    Advertisers aren’t the only ones who’ll see certain things (e.g. people, demographics, tastes) that aren’t seen by their peers.
     
    If you ask the likes of Professor Sandy Pentland from MIT, big data will be applied to everything social, and as such will “end up reinventing what it means to have a human society.” Because it provides “information about people’s behavior instead of information about their beliefs,” it will allow us to “really understand the systems that make our technological society” and allow us to “make our future social systems stable and safe.”
     
    That’s a fairly grandiose ambition, yet the possibility of these realizations will be undermined by the inescapable need to conceptualize information about behavior using the very beliefs Pentland hopes to remove from the equation. When it comes to determining what kinds of objects and events his collected data are meant to represent, there will always be the need for us to employ our subjective, biased and partial social constructs.
     
    Consequently, it’s unlikely that big data will bring about a fundamental change to the study of people and society. It will admittedly improve the relative reliability of sociological, political and economic models, yet since these models rest on socially and politically interested theories, this improvement will be a matter of degree rather than kind. The potential for divergence between separate models won’t be erased, and so, no matter how accurate one model becomes relative to the preconceptions that birthed it, there will always remain the likelihood that it will clash with others.
     
    So there’s little chance of a big data revolution in the humanities, only the continued evolution of the field.
  • Big data defeats dengue

    Numbers have always intrigued Wilson Chua, a big data analyst hailing from Dagupan, Pangasinan and currently residing in Singapore. An accountant by training, he crunches numbers for a living, practically eats them for breakfast, and scans through rows and rows of excel files like a madman.
     
    About 30 years ago, just when computer science was beginning to take off, Wilson stumbled upon the idea of big data. And then he swiftly fell in love. He came across the story of John Snow, the English physician who solved the cholera outbreak in London in 1854, which fascinated him with the idea even further. “You can say he’s one of the first to use data analysis to come out with insight,” he says.
     
    In 1850s-London, everybody thought cholera was airborne. Nobody had any inkling, not one entertained the possibility that the sickness was spread through water. “And so what John Snow did was, he went door to door and made a survey. He plotted the survey scores and out came a cluster that centered around Broad Street in the Soho District of London.
     
    “In the middle of Broad Street was a water pump. Some of you already know the story, but to summarize it even further, he took the lever of the water pump so nobody could extract water from that anymore. The next day,” he pauses for effect, “no cholera.”
     
    The story had stuck with him ever since, but never did he think he could do something similar. For Wilson, it was just amazing how making sense of numbers saved lives.
     
    A litany of data
     
    In 2015 the province of Pangasinan, from where Wilson hails, struggled with rising cases of dengue fever. There were enough dengue infections in the province—2,940 cases were reported in the first nine months of 2015 alone—for it to be considered an epidemic, had Pangasinan chosen to declare it.
     
    Wilson sat comfortably away in Singapore while all this was happening. But when two of his employees caught the bug—he had business interests in Dagupan—the dengue outbreak suddenly became a personal concern. It became his problem to solve.
     
    “I don’t know if Pangasinan had the highest number of dengue cases in the Philippines,” he begins, “but it was my home province so my interests lay there.” He learned from the initial data released by the government that Dagupan had the highest incidence in all of Pangasinan. Wilson, remembering John Snow, wanted to dig deeper.
     
    Using his credentials as a technology writer for Manila Bulletin, he wrote to the Philippine Integrated Diseases Surveillance and Response (PIDSR) team of the Department of Health, requesting three years’ worth of data on Pangasinan.
     
    The DOH acquiesced and sent him back a litany of data on an Excel sheet: 81,000 rows of numbers or around 27,000 rows of data per year. It’s an intimidating number but one “that can fit in a hard disk,” Wilson says.
     
    He then set out to work. Using tools that converted massive data into understandable patterns—graphs, charts, the like—he looked for two things: When dengue infections spiked and where those spikes happened.
     
    “We first determined that dengue was highly related to the rainy season. It struck Pangasinan between August and November,” Wilson narrates. “And then we drilled down the data to uncover the locations, which specific barangays were hardest hit.”
     
    The Bonuan district of the city of Dagupan, which covers the barangays of Bonuan Gueset, Bonuan Boquig, and Bonuan Binloc, accounted for a whopping 29.55 percent—a third of all the cases in Dagupan for the year 2015.
     
    The charts showed that among the 30 barangays, Bonuan Gueset was number 1 in all three years. “It means to me that Bonuan Gueset was the ground zero, the focus of infection.”
     
    But here’s the cool thing: after running the data through analytics, Wilson learned that the PIDSR had sent more than he had hoped for. They also included the ages of those affected. According to the data, dengue in Bonuan was prevalent among school children aged 5-15 years old.
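
    A hedged sketch of that kind of drill-down, assuming pandas and using a few invented rows in place of the 81,000-row PIDSR extract: count cases per month, per barangay and per age band.

    ```python
    # Hypothetical sample rows standing in for the 81,000-row PIDSR extract.
    import pandas as pd

    cases = pd.DataFrame({
        "date_reported": ["2015-08-03", "2015-09-14", "2015-09-20", "2015-10-02"],
        "barangay": ["Bonuan Gueset", "Bonuan Boquig", "Bonuan Gueset", "Bonuan Binloc"],
        "age": [7, 12, 9, 34],
    })

    # When do infections spike? Cases per month.
    cases["month"] = pd.to_datetime(cases["date_reported"]).dt.to_period("M")
    print(cases.groupby("month").size())

    # Where are they concentrated? Cases per barangay, largest first.
    print(cases["barangay"].value_counts())

    # Who is affected? Cases per age band.
    print(cases.groupby(pd.cut(cases["age"], bins=[0, 5, 15, 30, 60])).size())
    ```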
     
    “Now given the background of Aedes aegypti, the dengue-carrying mosquito—they bite after sunrise and a few hours before sunset. So it’s easy to surmise that the kids were bitten while in school.”
     
    It excited him so much he fired up Google Maps and switched it to satellite image. Starting with Barangay Bonuan Boquig, he looked for places that had schools that had stagnant pools of water nearby. “Lo and behold, we found it,” he says.
     
    Sitting smack in the middle of Lomboy Elementary School and Bonuan Boquig National High School were large pools of stagnant water.
     
    Like hitting the jackpot, Wilson quickly posted his findings on Facebook, hoping someone would take up the information and make something out of it. Two people hit him up immediately: Professor Nicanor Melecio, the project director of the e-Smart Operation Center of Dagupan City Government, and Wesley Rosario, director at the Bureau of Fisheries and Aquatic Resources, a fellow Dagupeño.
     
    A social network
     
    Unbeknownst to Wilson, back in Dagupan, the good professor had been busy, conducting studies on his own. The e-Smart Center, tasked with crisis, flooding, disaster-type of situation, had been looking into the district’s topography vis-a-vis rainfall in Bonuan district. “We wanted to detect the catch basins of the rainfall,” he says, “the elevation of the area, the landscape. Basically, we wanted to know the deeper areas where rainfall could possibly stagnate.”
     
    Like teenage boys, the two excitedly messaged each other on Facebook. “Professor Nick had lidar maps of Dagupan, and when he showed me those, it confirmed that these areas, where we see the stagnant water during rainfall, are those very areas that would accumulate rainfall without exit points,” Wilson says. With no sewage system, the water just sat there and accumulated.
     
    With Wilson still operating remotely in Singapore, Professor Melecio took it upon himself to do the necessary fieldwork. He went to the sites, scooped up water from the stagnant pools, and confirmed they were infested with kiti-kiti or wriggling mosquito larvae.
     
    Professor Melecio quickly coordinated with Bonuan Boquig Barangay Captain Joseph Maramba to involve the local government of Bonuan Boquig on their plan to conduct vector control measures.
     
    A one-two punch
     
    Back in Singapore, Wilson found inspiration from the Tiger City’s solution to its own mosquito problem. “They used mosquito dunks that contained BTI, the bacteria that infects mosquitoes and kills its eggs,” he says.
     
    He used his own money to buy a few of those dunks, imported them to Dagupan, and on Oct. 6, had his team scatter them around the stagnant pools of Bonuan Boquig. The solution was great, dream-like even, except it had a validity period. Beyond 30 days, the bacteria is useless.
     
    Before he even had a chance to worry about the solution’s sustainability, BFAR director Wesley Rosario pinged him on Facebook saying the department had 500 mosquito fish for disposal. “Would we want to send somebody to his office, get the fish, and release them into the pools?”
     
    The Gambezi earned its nickname because it eats, among other things, mosquito larvae. In Wilson’s and Wesley’s mind, the mosquito fish can easily make a home out of the stagnant pools and feast on the very many eggs present. When the dry season comes, the fish will be left to die. Except, here’s the catch: mosquito fish is edible.
     
    “The mosquito fish solution was met with a few detractors,” Wilson admits. “There are those who say every time you introduce a new species, it might become invasive. But it’s not really new as it is already endemic to the Philippines. Besides, we are releasing them in a landlocked area, so nothing else will be affected.”
     
    The critics, however, were silenced quickly. Four days after deploying the fish, the mosquito larvae were either eaten or dead. Twenty days into the experiment, with the one-two punch of the dunks and the fish, Barangay Boquig reported no new infections of dengue.
     
    “You know, we were really only expecting the infections to drop 50 percent,” Wilson says, rather pleased. More than 30 days into the study and Barangay Bonuan Boquig still has no reports of new cases. “We’re floored,” he added.
     
    At the moment, nearby barangays are already replicating what Wilson, Professor Melecio, and Wesley Rosario have done with Bonuan Boquig. Michelle Lioanag of the non-profit Inner Wheel Club of Dagupan has already taken up the cause to do the same for Bonuan Gueset, the ground zero for dengue in Dagupan.
     
    According to Wilson, what they did in Bonuan Boquig is just a proof of concept, a cheap demonstration of what big data can do. “It was so easy to do,” he said. “Everything went smoothly,” adding all it needed was cooperative and open-minded community leaders who had nothing more than sincere public service in their agenda.
     
    “You know, big data is multi-domain and multi-functional. We can use it for a lot of industries, like traffic for example. I was talking with the country manager of Waze…” he fires off rapidly, excited at what else his big data can solve next.
     
    Source: news.mb.com, November 21, 2016
  • Big Data Experiment Tests Central Banking Assumptions

    (Bloomberg) -- Central bankers may do well to pay less attention to the bond market and their own forecasts than they do to newspaper articles. That’s the somewhat heretical finding of a new algorithm-based index being tested at Norway’s central bank in Oslo. Researchers fed 26 years of news (or 459,745 news articles) from local business daily Dagens Naringsliv into a macroeconomic model to create a “newsy coincident index of business cycles” to help it gauge the state of the economy.

    Leif-Anders Thorsrud, a senior researcher at the bank who started the project while getting his Ph.D. at the Norwegian Business School, says the “hypothesis is quite simple: the more that is written on a subject at a time, the more important the subject could be.”

    He’s already working on a new paper (yet to be published) showing it’s possible to make trades on the information. According to Thorsrud, the work is part of a broader “big data revolution.”

    Big data and algorithms have become buzzwords for hedge funds and researchers looking for an analytical edge when reading economic and political trends. For central bankers, the research could provide precious input to help them steer policy through an unprecedented era of monetary stimulus, with history potentially serving as a poor guide in predicting outcomes.

    At Norway’s central bank, researchers have found a close correlation between news and economic developments. Their index also gives a day-to-day picture of how the economy is performing, and does so earlier than lagging macroeconomic data.

    But even more importantly, big data can be used to predict where the economy is heading, beating the central bank’s own forecasts by about 10 percent, according to Thorsrud. The index also showed it was a better predictor of the recession in the early 2000s than market indicators such as stocks or bonds.

    The central bank has hired machines, which pore daily through articles from Dagens Naringsliv and divide current affairs into topics and into words with either positive or negative connotations. The data is then fed into a macroeconomic model employed by the central bank, which spits out a proxy of GDP.
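
    As a highly simplified illustration of the idea, not Norges Bank’s actual model: score each article by counting positive and negative words, then average the scores per day into a rough news-based index. The word lists and articles below are invented.

    ```python
    # Invented word lists and articles; a real system would also classify topics.
    import pandas as pd

    POSITIVE = {"growth", "record", "profit", "optimism"}
    NEGATIVE = {"layoffs", "losses", "downturn", "bankruptcy"}

    articles = pd.DataFrame({
        "date": ["2016-01-04", "2016-01-04", "2016-01-05"],
        "text": [
            "Oil firm reports record profit and growth",
            "Shipyard announces layoffs after heavy losses",
            "Retailers see growth despite downturn fears",
        ],
    })

    def tone(text: str) -> int:
        words = set(text.lower().split())
        return len(words & POSITIVE) - len(words & NEGATIVE)

    articles["tone"] = articles["text"].apply(tone)
    print(articles.groupby("date")["tone"].mean())  # a daily proxy that could feed a macro model
    ```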

    Thorsrud says the results of the index are definitely “policy relevant,” though it’s up to the operative policy makers whether they will start using the information. Other central banks such as the Bank of England are looking at similar tools, he said.

    While still in an experimental stage, the bank has set aside more resources to continue the research, Thorsrud said. “In time this could be useful in the operative part of the bank.”

    Source: Informatie Management
  • Big Data will improve our healthcare

    He is a man with a mission, and not a small one: together with patients, care providers and insurers he wants to bring about a shift in healthcare, moving the focus from managing illness to managing health. Jeroen Tas, CEO Philips Connected Care & Health Informatics, on the future of care.

    What is wrong with the current system?

    “In the developed world, on average 80 percent of the healthcare budget is spent on treating chronic diseases, such as cardiovascular disease, lung disease, diabetes and various forms of cancer. Only 3 percent of that budget is spent on prevention, on avoiding those diseases. And yet we know that 80 percent of cardiovascular disease, 90 percent of type 2 diabetes and 50 percent of cancer is preventable. Socio-economic factors play a role, but so do nutrition, smoking and drinking, how much exercise you get each day and whether you take your medication properly. So with the current system we are often not steering on the right drivers to promote people’s health and thereby improve their lives. Fifty percent of patients do not take their medication, or do not take it on time. That is where the opportunities for improvement lie.”

    That system has existed for years - why is it a problem right now?
    “I think the reasons are well known. In many countries, including the Netherlands, the population is ageing, and with it the number of chronically ill people is rising, and thus the pressure on care. At the same time, citizens’ attitudes towards care are changing: better accessibility, integration and 24/7 availability are the big wishes. Finally, the technological possibilities are growing rapidly. People can and want to play an active role in their own health more and more often: self-measurement, personal information and feedback on progress. With Big Data we are now, for the first time, able to analyse large amounts of data quickly, to discover patterns in it and to learn more about predicting and preventing diseases. In short, we live in a time in which a great deal can and will change within a short period. So it is important to steer the right course.”

    What do you think needs to change?
    “Care is still organised around (acute) events. Health, however, is a continuous process and starts with healthy living and prevention. If people do fall ill, diagnosis and treatment follow. Then people get better, but may still need support at home. And you hope they go back to living healthily. If their condition deteriorates, timely intervention is desirable. The focus of our current system is almost entirely on diagnosis and treatment. The reimbursement system is geared to that as well: a radiologist is not judged on his contribution to a patient’s treatment but on the number of images he produces and assesses. Yet we know that there is a great deal to be gained in terms of time, wellbeing and money if we focus more on healthy living and prevention.

    There also need to be many more connections between the different pillars of the system, and feedback on the effectiveness of diagnosis and treatment. That can be done, for example, by encouraging the sharing of information. If a cardiologist has more data about a patient’s home situation, for instance about how he takes his medication, eats and exercises, he can draw up a much better treatment plan, tailored to the patient’s specific situation. If, after that patient’s treatment, home care also has access to his data, they know what to pay extra attention to for optimal recovery. And last but certainly not least, the patient must also have access to that data, in order to stay as healthy as possible. That is how a patient-centred system aimed at optimal health comes about.”

    That sounds very logical. So why isn’t it happening yet?
    “All change is difficult - certainly change in a sector like healthcare, which is conservative for understandable reasons and in which complex processes are at play. It is not a matter of technology: all the technology we need to bring about this shift already exists. We have sensors that generate data automatically, which can be installed in the patient’s environment, which he can wear - think of a smartwatch - and which can even sit inside his body, in the case of smart medicines. That puts the human being at the centre of the system, and that is where we want to go.
    There needs to be a care network around every person, in which data is shared for the benefit of that person’s health. Thanks to technology, many treatments can also take place remotely, via eHealth solutions. That is usually faster and above all more efficient than sending people to the hospital by default. Think of home monitoring, a portable ultrasound device at the GP’s practice or video calls with a care provider. Incidentally, we can already measure heart rate, respiration and SpO2 from a video image.

    The technology is there. We just need to combine it, integrate it and above all: implement it. Implementation depends on the willingness of everyone involved to find the right reimbursement model and form of collaboration: government, health insurers, hospitals, doctors, care providers and the patient himself. I am optimistic about that, by the way: I see attitudes slowly but surely changing. There is a growing willingness to change.”

    Is that willingness the only limiting factor?
    “We also need to sort out a number of things around data. Data must be able to be exchanged without barriers, so that all of a patient’s information is available anytime and anywhere. That obviously also means we have to make sure the data is properly secured, and that we can keep guaranteeing that. And finally, we have to work on the trust needed to standardise and share data, among care providers and above all among patients. That sounds very heavy and complicated, but we have done it before. If someone had told you twenty years ago that you would handle all your banking online, you would have called them crazy: far too unsafe. By now we hardly do anything else.
    The shift in healthcare today, just like the shift in the financial world back then, requires a different mindset. The urgency is there, the technology is there, and increasingly the willingness is too - that is why I am very positive about the future of care.”

    Source: NRC
  • Big Tech: the battle for our data

    Big Tech: the battle for our data

    The most important sector of tech is user privacy and with it comes a war not fought in the skies or trenches but in congressional hearings and slanderous advertisements, this battle fought in the shadows for your data and attention is now coming to light.

    The ever-growing reliance we have on technology has boomed since the advent of social media, especially and specifically with phones. Just 15 years ago, the favoured way of accessing services like Facebook was through a computer but this changed at a radical pace following the introduction of the iPhone in 2007 and the opening of the iOS App Store in 2008.

    Since then, the app economy, now in its teens, has become a multi-billion dollar industry built on technologies founded in behavioural change and habit-forming psychology.

    If you don’t have the iPhone’s ‘Screen Time’ feature set up, you’ll want to do that after hearing this:

    According to various studies, a typical person spends over four hours a day on their phone, with almost half of that time taken up by social media platforms like Facebook, Instagram, and Twitter. These studies were conducted before the pandemic, so it wouldn’t be a stretch to assume these figures have gone up.

    So what happens with all this time spent on these platforms?

    Your time is your attention, your attention is their data

    Where advertisements of old for businesses and products relied on creativity and market research on platforms like television and newspapers, modern advertising takes advantage of your online behaviour and interests to accurately target tailored advertisements to users.

    User data collected by Facebook is used to create targeted advertisements for all kinds of products, businesses and services. They use information like your search history, previous purchases, location data and even collect identifying information across apps and websites owned by other companies to build a profile that’s used to advertise things to you. In a recent update to iOS, Apple’s App Store now requires developers to outline to users what data is tracked and collected in what they are calling ‘privacy nutrition labels’.

    In response to this, in Facebook’s most recent quarterly earnings call, Mark Zuckerberg stated “We have a lot of competitors who make claims about privacy that are often misleading” and “Now Apple recently released so-called (privacy) nutrition labels, which focused largely on metadata that apps collect rather than the privacy and security of people’s actual messages.”

    Facebook uses this meta-data to sell highly targeted ad space.

    This is how you pay for ‘free’ services, with your data and attention

The harvesting of user data on platforms like Facebook has not only benefited corporations in ‘Big Tech’ and smaller businesses, but has even been grossly abused by politicians to manipulate the outcomes of major political events.

In 2018, the Cambridge Analytica scandal came to the forefront of mainstream media after a whistleblower for the company, Christopher Wylie, came forward with information outlining the unethical use of Facebook user data to create highly targeted advertisements with the goal of swaying political agendas. Most notably, illicitly obtained data was used in Donald Trump’s 2016 presidential campaign in the United States, as well as by the Leave.EU and UK Independence campaigns in support of Brexit in the United Kingdom, and this is just the tip of the iceberg.

    This is the level of gross manipulation of data Apple is taking a stand against.

    “The fact is that an interconnected eco-system of companies and data-brokers; of purveyors of fake news and peddlers of division; of trackers and hucksters just trying to make a quick buck, is more present in our lives than it has ever been.” — Tim Cook on Privacy, 2021

    What we have here are two titans of industry with massive amounts of influence and responsibility at war.

On one hand, you have Facebook, which has time and time again been grilled in public forums for data harvesting of its 2.6 billion monthly active users, shadow profiles (data collected on non-Facebook users), and social media bias. On the other hand, you have Apple, with 1.5 billion active devices running iOS across iPhone and iPad, all of which are ‘tools’ that demand attention with constant notifications and habit-forming user experience design.

Apple has been scrutinised in the past for its App Store policy and is currently fighting an antitrust lawsuit filed by Epic Games over the removal of Fortnite from the App Store for violating its policies on in-app purchases. Facebook stated in December 2020 that the company will support Epic Games’ case and is also reportedly readying an antitrust lawsuit of its own against Apple for forcing third-party developers to follow rules that first-party apps don’t have to follow.

Zuckerberg stated in the earnings call that “Apple has every incentive to use their dominant platform position to interfere with how our apps and other apps work, which they regularly do to preference their own. And this impacts the growth of millions of businesses around the world.” and “we believe Apple is behaving anti-competitively by using their control of the App Store to benefit their bottom line at the expense of app developers and small businesses”. This is an attempt by Zuckerberg to show that Apple is using its control of the App Store to stifle the growth of small businesses, but our right to know how our own data is being used should stand paramount, even if it’s at the expense of business growth.

Apple’s position on privacy protection ‘for the people’ and its introduction of privacy ‘nutrition labelling’ is not one that just benefits users; it is one that builds and upholds trust in the company and its products. The choices the company makes in its industries tend to shape how and where the market will go. You only have to look at its previous trends in product and packaging design to see the argument I’m trying to make.

    With growing concern and mainstream awareness of data use, privacy is now at the forefront of consumer trends. Just look at the emergence of VPN companies in the last couple of years. Apple’s stance on giving privacy back to the user could set a new trend into motion across the industry and usher in an age of privacy-first design.

    Author: Morgan Fox

    Source: Medium

  • Business Data Scientist 2.0

More than three years ago we ran the first Business Data Scientist programme. Triggered by the many sexy job adverts, we as lecturers asked ourselves what exactly makes a data scientist a data scientist. What struck us in those adverts, besides an enormous variety, was the laundry list of required competencies; the association with the proverbial sheep with five legs (the candidate who can do everything) was quickly made. Above all, the 2014 job adverts radiated hope and ambition: companies with high expectations looking for skilled staff to refine the ever-growing stream of data into value for the business. What does that actually involve?

A number of years and seven programmes later, much has changed. And yet, in a way, very little has. Companies' expectations are still sky-high. The data scientist comes in all shapes and sizes, and that seems to be accepted. But the core question of how to turn data into value, and what that takes, remains underexposed. The relevance of a Business Data Scientist programme is therefore unchanged; if anything, it has grown. Many companies have already made their investments in data science. Now it is time to harvest.

Data scientist 2.0

Turning data into value requires 'connection': connection between, on the one hand, the hard-core data scientists who can drill up data like oil, refine it into information and deliver it to specification, and, on the other hand, the business people with their challenges. In our programmes we have heard many stories of fine data projects that turned out to be pearls before swine because that connection was missing. However important the work, without that connection the data scientist does not survive. The relevance of a Business Data Scientist programme is therefore unchanged. Should every data scientist take it? Is there a job title of business data scientist? Both questions can be answered with a resounding no. But if you want to operate at the interface of application and data science, this programme is exactly right for you. And that interface will increasingly take centre stage in data-intensive organisations.

The business data scientist is someone who knows better than anyone that the value of data lies in its eventual use. From that simple premise he or she defines, guides and steers data projects within organisations. He or she thinks along about structurally embedding the use of data science in the organisation's operational and policy processes and proposes ways to set this up. The business data scientist knows the data science toolbox inside out without necessarily being able to operate every instrument in it personally. He or she does know which piece of technology to deploy for which type of problem and, conversely, can characterise and classify business problems so that the right technologies and expertise can be selected. The business data scientist understands information processes, knows the data science toolbox and knows how to navigate the interests that always surround projects.

The BDS programme is relevant for product managers and marketers who want to work in a more data-intensive way, for hard-core data scientists who want to connect with the application side of their organisation, and for (project) managers responsible for the performance of data scientists.

The BDS 2.0 programme is characterised by an action-oriented way of learning. Cases take centre stage, built on a theoretical framework that looks at the data science toolbox from the perspective of business value. The cases cover every phase of turning data into value: from project definition, through data analysis and business analytics, to actual use. For each relevant phase, specialists provide a deep dive. Interested in the programme? Download the brochure here: http://www.ru.nl/rma/leergangen/bds/

    Egbert Philips  

Lecturer, BDS programme, Radboud Management Academy

    Director Hammer, market intelligence   www.Hammer-intel.com

     

  • Business Intelligence Trends for 2017

Analyst and consulting firm Business Application Research Centre (BARC) has come out with the top BI trends, based on a survey of 2,800 BI professionals. Compared to last year, there were no significant changes in the ranking of the importance of BI trends, indicating that no major market shifts or disruptions are expected to impact this sector.
     
With the growing advancement of and disruption in IT, the eight meta trends that influence and affect the strategies, investments and operations of enterprises worldwide are Digitalization, Consumerization, Agility, Security, Analytics, Cloud, Mobile and Artificial Intelligence. All these meta trends are major drivers of the growing demand for data management, business intelligence and analytics (BI), and their growth will also shape the direction of this industry.

The top three of the 21 trends for 2017 were:
    • Data discovery and visualization,
    • Self-service BI and
    • Data quality and master data management
Data labs and data science, cloud BI and data as a product were the least important trends for 2017.
    Data discovery and visualization, along with predictive analytics, are some of the most desired BI functions that users want in a self-service mode. But the report suggested that organizations should also have an underlying tool and data governance framework to ensure control over data.
     
In 2016, BI was used mainly in the finance department, followed by management and sales, and usage rates in these departments varied only slightly over the last three years. There was, however, a surge in BI usage in production and operations departments, which grew from 20% in 2008 to 53% in 2016.
     
    "While BI has always been strong in sales and finance, production and operations departments have traditionally been more cautious about adopting it,” says Carsten Bange, CEO of BARC. “But with the general trend for using data to support decision-making, this has all changed. Technology for areas such as event processing and real-time data integration and visualization has become more widely available in recent years. Also, the wave of big data from the Internet of Things and the Industrial Internet has increased awareness and demand for analytics, and will likely continue to drive further BI usage in production and operations."
     
Customer analysis was the #1 investment area for new BI projects, with 40% of respondents investing their BI budgets in customer behavior analysis and 32% in developing a unified view of customers.
    • “With areas such as accounting and finance more or less under control, companies are moving to other areas of the enterprise, in particular to gain a better understanding of customer, market and competitive dynamics,” said Carsten Bange.
• Many past BI trends have become critical BI components in the present.
• Many organizations were also considering trends like collaboration and sensor data analysis as critical BI components. About 20% of respondents were already using BI trends like collaboration and spatial/location analysis.
• About 12% were using cloud BI and more were planning to employ it in the future. IBM's Watson and Salesforce's Einstein are gearing up to meet this growth.
    • Only 10% of the respondents used social media analysis.
    • Sensor data analysis is also growing driven by the huge volumes of data generated by the millions of IoT devices being used by telecom, utilities and transportation industries. According to the survey, in 2017, the transport and telecoms industries would lead the leveraging of sensor data.
    The biggest new investments in BI are planned in the manufacturing and utilities industries in 2017.
     
    Source: readitquick.com, November 14, 2016
  • Chatbots, big data and the future of customer service

    Chatbots, big data and the future of customer service

    The rise and development of big data has paved the way for an incredible array of chatbots in customer service. Here's what to know.

    Big data is changing the direction of customer service. Machine learning tools have led to the development of chatbots. They rely on big data to better serve customers.

    How are chatbots changing the future of the customer service industry and what role does big data play in managing them?

Big data leads to the deployment of more sophisticated chatbots

    BI-kring published an article about the use of chatbots in HR about a month ago. This article goes deeper into the role of big data when discussing chatbots.

The following terms are more popular than ever: 'chatbot', 'automated customer service', 'virtual advisor'. Some people know more about process automation than others, but one thing is for sure: if you want to sell more on the internet, handle more customers and save on personnel costs, you need a chatbot. A chatbot is a conversational system created to simulate intelligent conversation between a human and a machine.

    Chatbots rely on machine learning and other sophisticated data technology. They are constantly collecting new data from their interactions with customers to offer a better experience.

    But how commonly used are chatbots? An estimated 67% of consumers around the world have communicated with one. That figure is going to rise sharply in the near future. In 2020, over 85% of all customer service interactions will involve chatbots.

    A chatbot makes it possible to automate customer service in various communication channels, for example on a website, chat, in social media or via SMS. In practice, a customer does not have to wait for hours to receive a reply from the customer service department, a bot will provide an answer within a few seconds.

Depending on requirements, a chatbot may assume the role of a virtual advisor or assistant. For questions where a real person has to get involved, bots that analyse the incoming enquiries can not only identify the issue the customer is raising but also automatically route it to the right person or department. Machine learning tools make it easier to determine when a human advisor is needed.

    Bots supported by associative memory algorithms understand the entire content even if the interlocutor made a mistake or a typo. Machine learning makes it easier for them to decipher contextual meanings by interpreting these mistakes.
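To make the routing and typo-handling described above concrete, here is a minimal sketch of how such a classifier could look. It is not any vendor's actual system: it assumes scikit-learn, a handful of invented example enquiries and a hypothetical confidence threshold for handing the conversation over to a human advisor. Character n-grams give the model some robustness to typos.

```python
# Minimal sketch: route customer enquiries to a department, with a hand-off
# to a human advisor when the model is not confident enough.
# All example data and the threshold are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

training_enquiries = [
    "where is my order",             # shipping
    "my parcel has not arrived",     # shipping
    "i want my money back",          # refunds
    "the product arrived damaged",   # refunds
    "how do i reset my password",    # account
    "i cannot log in to my acount",  # account (typo on purpose)
]
departments = ["shipping", "shipping", "refunds", "refunds", "account", "account"]

router = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5)),  # sub-word features
    LogisticRegression(max_iter=1000),
)
router.fit(training_enquiries, departments)

def route(enquiry: str, threshold: float = 0.4) -> str:
    """Return the most likely department, or escalate to a human advisor."""
    probabilities = router.predict_proba([enquiry])[0]
    best = probabilities.argmax()
    if probabilities[best] < threshold:
        return "human advisor"
    return router.classes_[best]

if __name__ == "__main__":
    print(route("my parcle is late"))        # typo; intended department: shipping
    print(route("something else entirely"))  # off-topic; likely escalated
```

In a real deployment the training set would be far larger and the threshold tuned against how costly a wrong routing is compared with an unnecessary escalation.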

Response speed and 24/7 assistance are very important when it comes to customer service, as late afternoons and evenings are times of day when online shops experience increased traffic. If a customer cannot obtain information about a given product right there and then, it is possible that they will just abandon their basket and not shop at that store again. Any business would want to prevent the customer journey towards their product from taking a turn the other way, especially if it is due to a lack of appropriate support.

    Online store operators, trying to stay a step ahead of the competition, often decide to implement a state-of-the-art solution, which makes the store significantly more attractive and provides a number of new opportunities delivered by chatbots. Often, following the application of such a solution, website visits increase significantly. This translates into more sales of products or services.

We are not only seeing increased interest in the e-commerce industry; chatbots are successfully used in the banking industry as well. Bank Handlowy and Credit Agricole use bots to handle loyalty programmes or as assistants when paying bills.

    What else can a chatbot do?

    Big data has made it easier for chatbots to function. Here are some of the benefits that they offer:

    • Send reminders of upcoming payment deadlines.
    • Send account balance information.
    • Pass on important information and announcements from the bank.
    • Offer personalised products and services.
    • Bots are also increasingly more often used to interact with customers wishing to order meals, taxis, book tickets, accommodation, select holiday packages at travel agents, etc.

    The insurance industry is yet another area where chatbots are very useful. Since insurance companies are already investing heavily in big data and machine learning to handle actuarial analyses, it is easy for them to extend their knowledge of data technology to chatbots.

    The use of Facebook Messenger chatbots during staff recruitment may be surprising for many people.

    Chatbots are frequently used in the health service as well, helping to find the right facilities, arrange a visit, select the correct doctor and also find opinions about them or simply provide information on given drugs or supplements.

    As today every young person uses a smartphone, social media and messaging platforms for a whole range of everyday tasks like shopping, acquiring information, sorting out official matters, paying bills etc., the use of chatbots is slowly becoming synonymous with contemporary and professional customer service. A service available 24/7, often geared to satisfy given needs and preferences.

    Have you always dreamed of employees who do not get sick, do not take vacations and do not sleep? Try using a chatbot.

    Big data has led to fantastic developments with chatbots

    Big data is continually changing the direction of customer service. Chatbots rely heavily on the technology behind big data. New advances in machine learning and other data technology should lead to even more useful chatbots in the future.

    Author: Ryan Kh

    Source: SmartDataCollective

  • Cognitive diversity to strengthen your team

    Cognitive diversity to strengthen your team

Many miles of copy have been written on the advantages of diverse teams. But all too often this thinking is only skin deep; that is, it focuses on racial, gender and sexual orientation diversity.

There can be a lot more benefit in having team members who actually think differently. This is what is called cognitive diversity. I’ve seen that in both the teams I’ve led and at my clients’ offices. So, when blogger Harry Powell approached me with his latest book review, I was sold.

Harry is Director of Data Analytics at Jaguar Land Rover. He has blogged previously on the Productivity Puzzle and an Alan Turing lecture, amongst other topics. So, over to Harry to share what he has learnt about the importance of this type of diversity.

    Reading about Rebel Ideas

    I have just finished reading 'Rebel Ideas' by Matthew Syed. It’s not a long book, and hardly highbrow (anecdotes about 9/11 and climbing Everest, you know the kind of thing) but it made me think a lot about my team and my company.

    It’s a book about cognitive diversity in teams. To be clear that’s not the same thing as demographic diversity, which is about making sure that your team is representative of the population from which it is drawn. It’s about how the people in your team think.

    Syed’s basic point is that if you build a team of people who share similar perspectives and approaches the best possible result will be limited by the capability of the brightest person. This is because any diversity of thought that exists will essentially overlap. Everyone will think the same way.

But if your team comprises people who approach problems differently, there is a good chance that your final result will incorporate the best bits of everyone’s ideas. The worst possible result will be that of the brightest person, and it will normally end up being a lot better. This is because the ideas will overlap less, and so complement each other (see note below).

    Reflections on why this is a good idea

    In theory, I agree with this idea. Here are a few reflections:

    • The implication is that it might be better to recruit people with diverse perspectives and social skills than to simply look for the best and brightest. Obviously bright, diverse and social is the ideal.
    • Often a lack of diversity will not manifest itself so much in the solutions to the questions posed, but in the selection or framing of the problems themselves.
• Committees of like-minded people not only water down ideas, they create the illusion of a limited feasible set of problems and solutions, which is likely to reduce the confidence of lateral thinkers to speak up.
    • Strong hierarchies and imperious personalities can be very effective in driving efficient responses to simple situations. But when problems are complex and multi-dimensional, these personalities can force through simplistic solutions with disastrous results.
    • Often innovation is driven not simply by the lone genius who comes up with a whole new idea, but by combining existing technologies in new ways. These new 'recombinant' ideas come together when teams are connected to disparate sets of ideas.

All this points towards the benefits of having teams made up of people who think differently about the world. But it poses other questions.

    Context guides the diversity you need

    What kinds of diversity are pertinent to a given situation?

    For example, if you are designing consumer goods, say mobile phones, you probably want a cross-section of ages and gender, given that different ages and genders may use those phones differently: My kids want to use games apps, but I just want email. My wife has smaller hands than me, etc.

    But what about other dimensions like race, or sexual preference? Are those dimensions important when designing a phone? You would have thought that the dimension of diversity you need may relate to the problem you are trying to solve.

    On the other hand, it seems that the most important point of cognitive diversity is that it makes the whole team aware of their own bounded perspectives, that there may be questions that remain to be asked, even if the demographic makeup of your team does not necessarily span wide enough to both pose and solve issues (that’s what market research is for).

    So, perhaps it doesn’t strictly matter if your team’s diversity is related to the problem space. Just a mixture of approaches can be valuable in itself.

    How can you identify cognitive diversity?

    Thinking differently is harder to observe than demographic diversity. Is it possible to select for the former without resorting to selecting on the latter?

    Often processes to ensure demographic diversity, such as standardised tests and scorecards in recruitment processes, promote conformity of thought and work against cognitive diversity. And processes to measure cognitive diversity directly (such as aptitude tests) are more contextual than are commonly admitted and may stifle a broader equality agenda.

    In other words, is it possible to advance both cognitive and demographic diversity with the same process?

Even if you could identify different thinkers, what proportion of cognitive diversity can you tolerate in an organisation that needs to get things done?

    I guess the answer is the proportion of your business that is complex and uncertain, although a key trait of non-diverse businesses is that their self-assessment of their need for new ideas will be limited by their own lack of perspective. And how can you reward divergent thinkers?

    Much of what they do may be seen as disruptive and unproductive. Your most obviously productive people may be your least original, but they get things done.

    What do I do in my team?

    For data scientists, you need to test a number of skills at interview. They need to be able to think about a business problem, they need to understand mathematical methodologies, and they need to be able to code. There’s not a lot of time left for assessing originality or diversity of thought.

    So what I do is make the questions slightly open-ended, maybe a bit unconventional, certainly without an obviously correct answer.

    I expect them to get the questions a bit wrong. And then I see how they respond to interventions. Whether they take those ideas and play with them, see if they can use them to solve the problem. It’s not quite the same as seeking out diversity, but it does identify people who can co-exist with different thinkers: people who are open to new ways of thinking and try to respond positively.

    And then try to keep a quota for oddballs. You can only have a few of them, and they’ll drive you nuts, but you’ll never regret it.

    EndNote: the statistical appeal of Rebel Ideas

    Note: This idea appeals to me because it has a nice machine learning analogue to it. In a regression you want your information sets to be different, ideally orthogonal. If your data is collinear, you may as well have just one regressor.

    Equally, ensembles of low performing but different models often give better results than a single high-performing model.
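As a rough illustration of that last point (my own sketch, not Harry's; it assumes scikit-learn and a synthetic dataset), an ensemble that votes across deliberately different model families will often match or beat any single member, precisely because their errors are only partially correlated:

```python
# Illustrative sketch: an ensemble of different ("cognitively diverse") models
# versus its individual members, on a synthetic classification problem.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

members = [
    ("linear", LogisticRegression(max_iter=1000)),  # linear view of the problem
    ("tree", DecisionTreeClassifier(max_depth=5)),  # axis-aligned splits
    ("neighbours", KNeighborsClassifier()),         # local, instance-based view
]

for name, model in members:
    print(name, model.fit(X_train, y_train).score(X_test, y_test))

ensemble = VotingClassifier(estimators=members, voting="soft")
print("ensemble", ensemble.fit(X_train, y_train).score(X_test, y_test))
# The ensemble typically scores at least as well as its best member because
# the members' mistakes only partially overlap: the 'orthogonality' Harry
# describes above.
```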

    Author: Paul Laughlin

    Source: Datafloq

  • Connection between human and artificial intelligence moving closer to realization

    Connection between human and artificial intelligence moving closer to realization

    What was once the stuff of science fiction is now science fact, and that’s a good thing. It is heartening to hear how personal augmentation with robotics is changing people’s lives.

    A paralyzed man in France is now using a brain-controlled robotic suit to walk. The connection between brain and machine is now possible through ultra-high-speed computing power combined with deep engineering to enable a highly connected device.

    Artificial Intelligence is in everyday life

    We are seeing the rise of artificial intelligence in every walk of life, moving beyond the black box and being part of human life in its everyday settings.

Another example is the advent of the digital umpire in major league baseball. The angry disputes of players and fans challenging the umpire, and the held breaths while waiting for the replay, may become a thing of the past thanks to precise, instant decisions from an unbiased, dispassionate and non-human ump.

    Augmented reality and virtual reality are also becoming a must for business. They have moved into every aspect from medicine to mining across design, manufacturing, logistics and service and are a familiar business tool delivering multi-dimensional immediate insights that were previously hard to find.

    For example, people are using digital twin technology to see deeply into equipment, wherever it is, and diagnose and fix problems, or to take a global view of business operations through a digital board room.

    What’s changed? Fail fast, succeed sooner!

Every technology takes time to find its groove as early adopters experiment and find mass uses for it. There is a usual cycle of experimentation, and fast failure is a necessary part of discovering the best applications for a technology. We all saw Google Glass fail to find market traction, but in its current generation it is an invaluable tool for bringing information and remote expertise to people in the field repairing equipment on site.

    Speed, Intelligence, and Connection make it happen

The Six Million Dollar Man for business should be able to connect to the brain, providing instant feedback to the operations of the business based on actual experience in the field. It has to operate at the speed of a heartbeat and use predictive technologies. (Nerd Alert: Speaking of the Six Million Dollar Man, it should come as no surprise that the titular character has been upgraded to 'The Six Billion Dollar Man' in the upcoming movie starring Mark Wahlberg.)

Think of all the stuff our brain is doing even as we walk: balancing our bodies as we are in motion, making adjustments as we turn our head or land our feet, and predicting where our body will be so that the weight of our limbs can be adjusted. The brain needs instant feedback from all our senses to make decisions in real time that appear 'natural'.

Business, too, needs systems that are deeply connected, predictive and high speed, balancing the desire for movement with optimizing the operations to make it happen. That requires a new architecture that is lightning fast, using in-memory rather than disk processing, using artificial intelligence to optimize decisions that are too fast to make on our own, to keep a pulse on the business and to predict with machine learning.

The fundamental architecture is different. It has to work together and be complete; it is no good having leg movements from one vendor and head movements from another. In a world where speed and sensing have to cover the whole body, everything needs to work in unison.

    We can’t wait to see how these new architectures will change the world.

    Author: David Sweetman

    Source: Dataversity

  • Context & Uncertainty in Web Analytics

    Context & Uncertainty in Web Analytics

    Trying to make decisions with data

    “If a measurement matters at all, it is because it must have some conceivable effect on decisions and behaviour. If we can’t identify a decision that could be affected by a proposed measurement and how it could change those decisions, then the measurement simply has no value” - Douglas W. Hubbard, How to Measure Anything: Finding the Value of Intangibles in Business, 2007

    Like many digital businesses we use web analytics tools that measure how visitors interact with our websites and apps. These tools provide dozens of simple metrics, but in our experience their value for informing a decision is close to zero without first applying a significant amount of time, effort and experience to interpret them.

    Ideally we would like to use web analytics data to make inferences about what stories our readers value and care about. We can then use this to inform a range of decisions: what stories to commission, how many articles to publish, how to spot clickbait, which headlines to change, which articles to reposition on the page, and so on.

    Finding what is newsworthy can not and should not be as mechanistic as analysing an e-commerce store, where the connection between the metrics and what you are interested in measuring (visitors and purchases) is more direct. We know that — at best — this type of data can only weakly approximate what readers really think, and too much reliance on data for making decisions will have predictable negative consequences. However, if there is something of value the data has to say, we would like to hear it.

Unfortunately, simple web analytics metrics fail to account for key bits of context that are vital if we want to understand whether their values are higher or lower than what we should expect (and therefore interesting).

Moreover, there is inherent uncertainty in the data we are using, and even if we can tell whether a value is higher or lower than expected, it is difficult to tell whether this is just down to chance.

Good analysts, familiar with their domain, often get good at doing the mental gymnastics required to account for context and uncertainty, so they can derive the insights that support good decisions. But doing this systematically when presented with a sea of metrics is rarely possible, nor is it the best use of an analyst’s valuable sense-making skills. Rather than spending all their time trying to identify what is unusual, it would be better if their skills could be applied to learning why something is unusual or deciding how we might improve things. But if all of our attention is focused on the lower-level 'what' questions, we never get to the 'why' or 'how' questions, which is where we stand a chance of getting some value from the data.

    Context

    “The value of a fact shrinks enormously without context” - Howard Wainer, Visual Revelations: Graphical Tales of Fate and Deception from Napoleon Bonaparte to Ross Perot, 1997

Take two metrics that we would expect to be useful: how many people start reading an article (we call this readers), and how long they spend on it (we call this the average dwell time). If the metrics worked as intended, they could help us identify the stories our readers care about, but in their raw form, they tell us very little about this.

• Readers: If an article is in a more prominent position on the website or app, more people will see it and click on it.
• Dwell time: If an article is longer, on average, people will tend to spend more time reading it.

    Counting the number of readers tells us more about where an article was placed, and dwell time more about the length of the article than anything meaningful.

It’s not just length and position that matter. Other context such as the section, the day of the week, how long since it was published, and whether people are reading it on our website or apps all systematically influence these numbers. So much so, that we can do a reasonable job of predicting how many readers an article will get and how long they will spend on it from this context alone, completely ignoring the content of the article.

    From this perspective, articles are a victim of circumstance, and the raw metrics we see in so many dashboards tell us more about their circumstances than anything more meaningful — it’s all noise and very little signal.

Knowing this, what we really want to understand is how much better or worse an article did than we would expect, given that context. In our newsroom, we do this by turning each metric (readers, dwell time and some others) into an index that compares the actual metric for an article to its expected value. We score it on a scale from 1 to 5, where 3 is expected, 4 or 5 is better than expected and 1 or 2 is worse than expected.

    Article A: a longer article in a more prominent position. Neither the number of readers nor the time they spent reading it was different from what we would expect (both indices = 3).

    Article B: a shorter article in a less prominent position. Whilst it had the expected number of readers (index = 3), they spent longer reading it than we would expect (index = 4).

    The figures above show how we present this information when looking at individual articles. Article A had 7,129 readers, more than four thousand more readers than article B, and people spent 2m 44s reading article A, almost a minute longer than article B. A simple web analytics display would pick article A as the winner on both counts by a large margin. And completely mislead us.

Once we take into account the context, and calculate the indices, we find that both articles had about as many readers as we would expect, no more or less. Even though article B had four thousand fewer, it was in a less prominent position, and so we wouldn’t expect as many. However, people did spend longer reading article B than we would expect, given factors such as its length (it was shorter than article A).

The indices are the output of a predictive model, which predicts a certain value (e.g. number of readers) based on the context (the features in the model). The difference between the actual value and the predicted value (the residuals in the model) then forms the basis of the index, which we rescale into the 1 to 5 score. An additional benefit is that we also have a common scale for different measures, and a common language for discussing these metrics across the newsroom.
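As a very rough sketch of that idea (not the newsroom's actual model; it assumes scikit-learn and pandas, with invented context features such as position, word count and hours since publication), you could predict a metric from context alone and then bucket the residuals into a 1 to 5 index:

```python
# Rough sketch: predict readers from context only, then turn the residuals
# into a 1-5 index. All features and data here are invented for illustration.
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 500
articles = pd.DataFrame({
    "position": rng.integers(1, 30, n),           # slot on the page, 1 = top
    "word_count": rng.integers(200, 2500, n),
    "hours_since_publish": rng.integers(1, 72, n),
    "section": rng.integers(0, 8, n),             # encoded section id
})
# Synthetic 'actual readers': driven mostly by context, plus noise.
articles["readers"] = (5000 / articles["position"]
                       + 0.5 * articles["word_count"]
                       + rng.normal(0, 300, n)).round()

context = ["position", "word_count", "hours_since_publish", "section"]
model = GradientBoostingRegressor().fit(articles[context], articles["readers"])

expected = model.predict(articles[context])
residuals = articles["readers"] - expected        # better/worse than expected

# Rescale residuals into a 1-5 index: 3 = roughly as expected, 5 = far better.
bins = np.quantile(residuals, [0.10, 0.35, 0.65, 0.90])
articles["readers_index"] = np.digitize(residuals, bins) + 1

print(articles[["readers", "readers_index"]].head())
```

In practice the cut-off points would reflect the model's uncertainty rather than fixed quantiles, so that most articles land on a 3, as the next section describes.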

Unless we account for context, we can only really use data for inspection: ‘Just tell me which article got me the most readers, I don’t care why’. If the article only had more readers because it was at the top of the edition, we’re not learning anything useful from the data, and at worst it creates a self-fulfilling feedback loop (more prominent articles get more readers, similar to the popularity bias that can occur in recommendation engines).

In his excellent book Upstream, Dan Heath talks about moving from using data for inspection to using data for learning. Data for learning is fundamental if we want to make better decisions. If we want to use data for learning in the newsroom, it’s incredibly useful to be able to identify which articles are performing better or worse than we would expect, but that is only ever the start. The real learning comes from what we do with that information, trying something different, and seeing if it has a positive effect on our readers’ experience.

    “Using data for inspection is so common that leaders are sometimes oblivious to any other model.” - Dan Heath, Upstream: The Quest to Solve Problems Before They Happen, 2020

    Uncertainty

    “What is not surrounded by uncertainty cannot be truth” - Richard Feynman (probably)

    The metrics presented in web analytics tools are incredibly precise. 7,129 people read the article we looked at earlier. How do we compare that to an article with 7,130 readers? What about one with 8,000? When presented with numbers, we can’t help making comparisons, even if we have no idea whether the difference matters.

We developed our indices to avoid meaningless comparisons that didn’t take into account context, but earlier versions of our indices were displayed in a way that suggested more precision than they provided: we used a scale from 0 to 200 (with 100* as expected).

    *Originally we had 0 as our expected value, but quickly learnt that nobody likes having a negative score for their article, but something below 100 is more palatable.

Predictably, people started worrying about small differences in the index values between articles. ‘This article scored 92, but that one scored 103; that second article did better, let’s look at what we can learn from it’. Sadly, the model we use to generate the index is not that accurate, and models, like data, have uncertainty associated with them. Just as people agonise over small, meaningless differences in raw numbers, the same was happening with the indices, and so we moved to a simple 5-point scale.

    Most articles get a 3, which can be interpreted as ‘we don’t think there is anything to see here, the article is doing as well as we’d expect on this measure’. An index of 2 or 1 means it is doing a bit worse or a lot worse than expected, and a 4 or a 5 means it is doing a bit better or a lot better than expected.

In this format, the indices provide just enough information for us to know, at a glance, how an article is doing. We use this alongside other data visualisations of indices or raw metrics where more precision is helpful, but in all cases our aim is to help focus attention on what matters, and free up time to validate these insights and decide what to do with them.

    Why are context and uncertainty so often ignored?

These problems are not new, and they are covered in many great books on data sense-making, some decades old, by authors such as Howard Wainer, Stephen Few and R J Andrews.

Practical guidance on dealing with uncertainty is easier to come by, but in our experience, thinking about context is trickier. From some perspectives this is odd. Predictive models, the bread and butter of data scientists, inherently deal with context as well as uncertainty, as do many of the tools for analysing time series data and detecting anomalies (such as statistical process control). But we are also taught to be cautious when making comparisons where there are fundamental differences between the things we are measuring. Since there are so many differences between the articles we publish, from length, position, who wrote them, what they are about, to the section and day of week on which they appear, we are left wondering whether we can or should use data to compare any of them. Perhaps the guidance on piecing all of this together to build better measurement metrics is less common, because how you deal with context is so contextual.

Even if you set out on this path, there are many mundane reasons to fail. Often the valuable context is not where the metrics are: it took us months to bring basic metadata about our articles, such as length and the position in which they appear, into the same system as the web analytics data. An even bigger obstacle is how much time it takes just to maintain such a metrics system (digital products are constantly changing, and this often breaks the web analytics data, including ours as I wrote this). Ideas for improving metrics often stay as ideas or proofs of concept that are not fully rolled out as you deal with these issues.

If you do get started, there are myriad choices to make to account for context and uncertainty, from technical to ethical, all involving value judgements. If you stick with a simple metric you can avoid these choices. Bad choices can derail you, but even if you make good ones, if you can’t adequately explain what you have done, you can’t expect the people who use the metrics to trust them. By accounting for context and uncertainty you may replace a simple (but not very useful) metric with something that is in theory more useful, but whose opaqueness causes more problems than it solves. Even worse, people may place too much trust in the metric and use it without questioning it.

As for using data to make decisions: we will leave that for another post. But if the data is all noise and no signal, how do you present it in a clear way so the people using it understand what decisions it can help them make? The short answer is you can’t. But if the pressure is on to present some data, it is easier to passively display it in a big dashboard filled with metrics and leave it to others to work out what to do, in the same way passive language can shield you if you have nothing interesting to say (or bullshit, as Carl T. Bergstrom would call it). This is something else we have battled with, and we have tried to avoid replacing big dashboards filled with metrics with big dashboards filled with indices.

Adding an R for reliable and an E for explainable, we end up with a checklist to help us avoid bad, or CRUDE, metrics (Context, Reliability, Uncertainty, Decision orientated, Explainability). Checklists are always useful, as it’s easy to forget what matters along the way.

    Anybody promising a quick and easy path to metrics that solve all your problems is probably trying to sell you something. In our experience, it takes time and a significant commitment by everybody involved to build something better. If you don’t have this, it’s tough to even get started.

    Non-human metrics

    Part of the joy and pain of applying these principles to metrics used for analytics — that is, numbers that are put in front of people who then use them to help them make decisions — is that it provides a visceral feedback loop when you get it wrong. If the metrics cannot be easily understood, if they don’t convey enough information (or too much), if they are biased, or if they are unreliable or if they just look plain wrong vs. everything the person using them knows, you’re in trouble. Whatever the reason, you hear about it pretty quickly, and this is a good motivator for addressing problems head on if you want to maintain trust in the system you have built.

    Many metrics are not designed to be consumed by humans. The metrics that live inside automated decision systems are subject to many of the same considerations, biases and value judgements. It is sobering to consider the number of changes and improvements we have made based on the positive feedback loop from people using our metrics in the newsroom on a daily basis. This is not the case with many automated decision systems.

    Author: Dan Gilbert

    Source: Medium

  • Creating a single view of the customer with the help of data

Data has key contributions for creating a single view of the customer that can be used to improve your business and better understand those involved in the market you serve. In order to thrive in the current economic environment, businesses need to know their customers very well in order to provide exceptional customer service. To do so, they must be able to rapidly understand and react to customer shopping behaviors.

To properly interpret and react to customer behaviors, businesses need a complete, single view of their customers. What does that mean? A Single View of the Customer (SVC) allows a business to analyze and visualize all the relevant information surrounding their customers, such as transactions, products, demographics, marketing, discounting, etc.

Unfortunately, the IT systems that are typically found in mid- to large-scale companies have separated much of the relevant customer data into individual systems. Marketing information, transaction data, website analytics, customer profiles, shipping information, product information, etc. are often kept in different data repositories. This makes the implementation of SVC potentially challenging.

First of all, let’s examine two scenarios where a company’s ability to fuse all these sources into an SVC can provide tremendous value. Afterwards, we will pay attention to strategies for implementing an SVC.

    Call center

Customer satisfaction has a major impact on the bottom line of any business. Studies found that 78 percent of customers have failed to complete a transaction because of poor customer service. A company’s call center is a key part of maintaining customer satisfaction. Customers interacting with the call center typically do so because there is already a problem – a missing order, a damaged or incorrect product. It is critical that the call center employee can resolve the problem as quickly as possible. Unfortunately, due to the typical IT infrastructure discussed above, the customer service representative often faces a significant challenge. For example, in order to locate the customer’s order, access shipping information and potentially find a replacement product to ship, the customer service representative may have to log in to three or more different systems (often, the number is significantly higher). Every login to a new system increases the time the customer has to wait, decreasing their satisfaction, and every additional system adds to the probability of disturbances or even failure.

    Sales performance

    Naturally, in order to maximize revenue, it is critical to understand numerous key metrics including, but not limited to:

    • What products are/are not doing well?
    • What products are being purchased by your key customer demographic groups?
    • What items are being purchased together (basket analysis)?
    • Are there stores with inventory problems (overstocked/understocked)?

Once again, the plethora of data storage systems poses a challenge. To gather the data required to perform the necessary analytics, numerous systems will have to be queried. As with the call center scenario, this is often performed manually, via “swivel-chair integration.” This means that an analyst will have to manually log in to each system, execute a query to get the necessary data, store that data in a temporary location (often in Microsoft Excel™), and then repeat that process for each independent data store. Only once the data is gathered can the actual analysis begin. The process of gathering the necessary data often takes longer than the analysis itself. Even in medium-sized companies this process can involve numerous people to get the analysis done as quickly as possible. Still, the manual nature of this process means that it is not only expensive to perform (in terms of resources), but it also occurs at a much slower pace than is ideal for making business decisions rapidly.

The fact that performing even the most basic and critical analytics is so expensive and time consuming often prevents companies from taking the next steps, even though those steps could turn out to be critical to the business' sales. One of those potential next steps is moving the full picture of the customer directly into the stores. When sales associates can immediately access customer information, they are able to provide a more personalized customer experience, which is likely to increase customer satisfaction and average revenue per sale.

Another opportunity where the company can see tremendous sales impact is in moving from reactive analytics to predictive analytics. When a company runs traditional retail metrics, as previously described, they are typically either done as part of a regular reporting cycle or in response to an event, such as a sale, in order to understand the impact of that event on the business. While no one is likely to dispute the value of those analytics, the fact is that the company is merely reacting to events that have already happened. We can try to use advanced analytic methods to predict how our customers may behave in the future based on their past actions, but as we so often hear from financial analysts, past performance is not indicative of future results. However, if we can take our SVC, which links together all of their past actions, and tie in information about what the customer intends to do in the future (like that from Prosper Insight Analytics), we now have a roadmap of customer intent that we can use to make key business decisions.
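As a toy illustration of what pulling such scattered data into one view can look like (hypothetical tables and column names, sketched with pandas; a real implementation would sit on the data fabric described in the next section), the joins and roll-up might be:

```python
# Toy sketch: assembling a single view of the customer from separate systems.
# All table and column names are hypothetical.
import pandas as pd

customers = pd.DataFrame({
    "customer_id": [1, 2],
    "segment": ["loyal", "new"],
})
orders = pd.DataFrame({
    "order_id": [10, 11, 12],
    "customer_id": [1, 1, 2],
    "amount": [120.0, 35.5, 60.0],
})
shipping = pd.DataFrame({
    "order_id": [10, 11, 12],
    "status": ["delivered", "in transit", "delayed"],
})
web_visits = pd.DataFrame({
    "customer_id": [1, 2, 2],
    "page": ["/shoes", "/returns", "/shoes"],
})

# One row per order, enriched with customer and shipping information.
order_view = (orders
              .merge(customers, on="customer_id", how="left")
              .merge(shipping, on="order_id", how="left"))

# Roll everything up to one row per customer: the 'single view'.
single_view = (order_view.groupby("customer_id")
               .agg(total_spend=("amount", "sum"),
                    open_orders=("status", lambda s: (s != "delivered").sum()),
                    segment=("segment", "first"))
               .join(web_visits.groupby("customer_id").size().rename("web_visits")))

print(single_view)
```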

    Creating and implementing a Single View of the Customer

    To be effective, an implementation of SVC must be driven from a unified data fabric. Attempting to create an SVC by directly connecting the application to each of the necessary data sources will be extremely time consuming and will prove highly challenging to implement. A data fabric can connect the necessary data to provide an operational analytic environment upon which to base the SVC. The data fabric driving the SVC should meet the following requirements:

    • It must connect all the relevant data from the existing data sources in a central data store. This provides the ability to perform the complex analytics across all the linked information.
    • It needs to be easily modified to support new data. As the business grows and evolves, learning to leverage its new SVC, additional data sources will typically be identified as being helpful. These should be easy to integrate into the system.
    • The initial implementation must be rapid. Ideally, a no-code solution should be implemented. Companies rarely have the resources to support expensive, multi-month IT efforts.
    • It should not disrupt existing systems. The data fabric should provide an augmented data layer that supports the complex analytic queries required by the SVC without disrupting the day-to-day functionality of the existing data stores or applications.

    Conclusion

    A well-built SVC can have a significant, positive impact on a company’s bottom line. Customer satisfaction can be increased and analysis can move from being reactive to being predictive of customer behavior. This effort will likely require that a data fabric be developed to support the application, but new technologies now make it possible to rapidly create that fabric using no-code solutions, thereby making feasible the deployment of a Single View of the Customer.

    Author: Clark Richey

    Source: Smart Data Collective

  • Dashboard storytelling: The perfect presentation (part 1)

    Dashboard storytelling: The perfect presentation (part 1)

    Plato famously said that “those who tell stories rule society.” This statement is as true today as it was in ancient Greece, perhaps even more so in modern times.

    In the contemporary world of business, the age-old art of storytelling is far from forgotten: rather than speeches on the Senate floor, businesses rely on striking data visualizations to convey information, drive engagement, and persuade audiences.

    By combining the art of storytelling with the technological capabilities of dashboard software, it’s possible to develop powerful, meaningful, data-backed presentations that not only move people but also inspire them to take action or make informed, data-driven decisions that will benefit your business.

As far back as anyone can remember, narratives have helped us make sense of the sometimes complicated world around us. Rather than just listing facts, figures, and statistics, people used gripping, imaginative timelines, endowing raw data with real context and interpretation. In turn, this got the attention of listeners, immersing them in the narrative, thereby offering a platform to absorb a series of events in their mind’s eye precisely the way they unfolded.

    Here we explore data-driven, live dashboard storytelling in depth, looking at storytelling with KPIs and the dynamics of a data storytelling presentation while offering real-world storytelling presentation examples.

    First, we’ll delve into the power of data storytelling as well as the general dynamics of a storytelling dashboard and what you can do with your data to deliver a great story to your audience. Moreover, we will offer dashboard storytelling tips and tricks that will help you make your data-driven narrative-building efforts as potent as possible, driving your business into exciting new dimensions. But let’s start with a simple definition.

    “You’re never going to kill storytelling, because it’s built in the human plan. We come with it.” – Margaret Atwood

    What is dashboard storytelling?

    Dashboard storytelling is the process of presenting data in effective visualizations that depict the whole narrative of key performance indicators, business strategies and processes in the form of an interactive dashboard on a single screen, and in real-time. Storytelling is indeed a powerful force, and in the age of information, it’s possible to use the wealth of insights available at your fingertips to communicate your message in a way that is more powerful than you could ever have imagined. So, let's take a look at the top tips and tricks to be able to successfully create your own story with a few clicks.

    4 Tricks to get started with dashboard storytelling

    Big data commands big stories.

    Forward-thinking business people turn to online data analysis and data visualizations to display colossal volumes of content in a few well-designed charts. But these condensed business insights may remain hidden if they aren’t communicated with words in a way that is effective and rewarding to follow. Without language, business people often fail to push their message through to their audience, and as such, fail to make any real impact.

    Marketers, salespeople, and entrepreneurs are today’s storytellers. They are wholly responsible for their data story. People in these roles are often the bridge between their data and the forum of decision-makers they’re looking to encourage to take the desired action.

    Effective dashboard storytelling with data in a business context must be focused on tailoring the timeline to the audience and choosing one of the right data visualization types to complement or even enhance the narrative.

    To demonstrate this notion, let’s look at some practical tips on how to prepare the best story to accompany your data.

    1. Start with data visualization

    This may sound repetitive, but when it comes to a dashboard presentation, or dashboard storytelling presentation, it will form the foundation of your success: you must choose your visualization carefully.

    Different views answer different questions, so it’s vital to take care when choosing how to visualize your story. To help you in this regard, you will need a robust data visualization tool. These intuitive aids in dashboard storytelling are now ubiquitous and provide a wide array of options to choose from, including line charts, bar charts, maps, scatter plots, spider webs, and many more. Such interactive tools are rightly recognized as a more comprehensive option than PowerPoint presentations or endless Excel files.

    These tools help both in exploring the data and visualizing it, enabling you to communicate key insights in a persuasive fashion that results in buy-in from your audience.
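As a tiny illustration of how different views answer different questions, the sketch below draws the same made-up revenue figures first as a line chart (a trend question) and then as a bar chart (a comparison question); it assumes matplotlib rather than any particular BI tool.

```python
# Same made-up data, two different questions: trend over time vs. comparison.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
revenue = [120, 135, 128, 150, 162, 171]                 # e.g. revenue in k EUR
by_channel = {"Web": 310, "Stores": 280, "Partners": 276}

fig, (ax_trend, ax_compare) = plt.subplots(1, 2, figsize=(10, 4))

ax_trend.plot(months, revenue, marker="o")               # "How is revenue developing?"
ax_trend.set_title("Trend: monthly revenue")

ax_compare.bar(list(by_channel.keys()), list(by_channel.values()))  # "Which channel sells most?"
ax_compare.set_title("Comparison: revenue by channel")

plt.tight_layout()
plt.show()
```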

But for optimum effectiveness, we still need more than a computer algorithm. Here we need a human to present the data in a way that will make it meaningful and valuable. Moreover, this person doesn’t need to be a typical presenter or a teacher-like figure. According to research carried out by Stanford University, there are two types of storytelling: author-driven and reader-driven storytelling.

An author-driven narrative is static and authoritative because it dictates the analysis process to the reader or listener. It’s like analyzing a chart printed in a newspaper. On the other hand, reader-driven storytelling allows the audience to structure the analysis on their own. Here, the audience can choose the data visualizations that they deem meaningful and interact with them on their own by drilling down to more details or choosing from various KPI examples they want to see visualized. They can reach out for insights that are crucial to them and make sense of the data independently. A different story may need a different type of storytelling.

    2. Put your audience first

    Storytelling for a dashboard presentation should always begin with stating your purpose. What is the main takeaway from your data story? It should be clear that your purpose is to motivate the audience to take a certain action.

    Instead of thinking about your business goals, try to envision what your listeners are seeking. Each member of your audience, be that a potential customer, future business partner, or stakeholder, has come to listen to your data storytelling presentation to gain something for him- or herself. To better meet your audience’s expectations and gain their trust (and money), put their goals first when determining the line of your story.

    Needless to say, before your dashboard presentation, try to learn as much as you can about your listeners. Put yourself in their shoes: Who are they? What do they do on a daily basis? What are their needs? What value can they draw from your data for themselves?

    The better you understand your audience, the more they will trust you and follow your idea.

    3. Don’t fill up your data storytelling with empty words

    Storytelling with data, rather than just presenting data visualizations, brings the best results. That said, there are certain enemies of your story that make it more complicated than enlightening and turn your efforts into a waste of time.

    The first things that could cause some trouble are the various technology buzzwords that are devoid of any defined meaning. These words don’t create a clear picture in your listeners’ heads and are useless as a storytelling aid. In addition to under-informing your audience, buzzwords are a sign of lazy thinking and a signal that you don’t have anything unique or meaningful to say. Try to add clarity to your story by using more precise and descriptive language that truly communicates your purpose.

    Another trap can be the use of your industry jargon to sound more professional. The problem is that if it isn’t the jargon of your listeners’ industry, they may not comprehend your narrative. Moreover, some jargon phrases have different meanings depending on the context in which they are used: they mean one thing in the business field and something else in everyday life. Generally, they reduce clarity and can even convey the opposite of what you intend to communicate in your data storytelling.

    Don’t make your story too long; focus on explaining the meaning of the data rather than on the ornateness of your language or the humor of your anecdotes. Avoid overusing buzzwords or industry jargon, and try to figure out what insights your listeners want to draw from the data you show them.

    4. Utilize the power of storytelling

    Before we continue our journey into data-powered storytelling, we’d like to further illustrate the unrivaled power of offering your audience, staff, or partners inspiring narratives by sharing these must-know insights:

    • Recent studies suggest that 80% of today’s consumers want brands to tell stories about their business or products.
    • The average person processes around 100,500 digital words every day. By taking your data and transforming it into a focused, value-driven narrative, you stand a far better chance of your message resonating with your audience and yielding the results you desire.
    • Human beings absorb information up to 60,000 times faster with visuals than with linear text-based content alone. By harnessing the power of data visualization to form a narrative, you’re likely to earn an exponentially greater level of success from your internal or external presentations.

    Please also take a look at part 2 of this interesting read, including presentation tips and examples of dashboard storytelling.

    Author: Sandra Durcevic

    Source: Datapine

  • Dashboard storytelling: The perfect presentation (part 2)

    Dashboard storytelling: The perfect presentation (part 2)

    In the first part of this article, we introduced the phenomenon of dashboard storytelling and some tips and tricks to get started with it. If you haven’t read part 1 of this article yet, make sure you do so first. You can find part 1 here.

    How to present a dashboard – 6 Tips for the perfect dashboard storytelling presentation

    Now that we’ve covered the data-driven storytelling essentials, it’s time to dig deeper into ways that you can make maximum impact with your storytelling dashboard presentations.

    Business dashboards are now driving forces for visualization in the field of business intelligence. Unlike their predecessors, a state-of-the-art dashboard builder gives presenters the ability to engage audiences with real-time data and offers a more dynamic approach to presenting data than the rigid, linear nature of, say, PowerPoint.

    With the extra creative freedom data dashboards offer, the art of storytelling is making a reemergence in the boardroom. The question now is: What determines great dashboarding?

    Without further ado, here are six tips that will help you to transform your presentation into a story and rule your own company through dashboard storytelling.

    1. Set up your plan

    Start at square one on how to present a dashboard: outline your presentation. Like all good stories, the plot should be clear, problems should be presented, and an outcome foreshadowed. You have to ask yourself the right data analysis questions when it comes to exploring the data to get insights, but you also need to ask yourself the right questions when it comes to presenting such data to a certain audience. Which information do they need to know or want to see? Make sure you have a concise storyboard when you present so you can take the audience along with you as you show off your data. Try to be purpose-driven to get the best dashboarding outcomes, but don’t entangle yourself in a rigid format that is unchangeable.

    2. Don’t be afraid to show some emotion

    Stephen Few, a leading design consultant, explains on his blog that “when we appeal to people’s emotions strictly to help them personally connect with information and care about it, and do so in a way that draws them into reasoned consideration of the information, not just feeling, we create a path to a brighter, saner future”. Emotions stick around much longer in a person’s psyche than facts and charts. Even the most analytical thinkers out there will be more likely to remember your presentation if you can weave in elements of human life and emotion. How to present a dashboard with emotion? By adding some anecdotes, personal life experiences that everyone can relate to, or culturally shared moments and jokes.

    However, do not rely just on emotions to make your point. Your conclusions and ideas need to be backed by data, science, and facts. Otherwise, and especially in business contexts, you might not be taken seriously. You’d also miss an opportunity to help people learn to make better decisions by using reason and would only tap into a “lesser-evolved” part of humanity. Instead, emotionally appeal to your audience to drive home your point.

    3. Make your story accessible to people outside your sector

    Combining complicated jargon, millions of data points, advanced math concepts, and making a story that people can understand is not an easy task. Opt for simplicity and clear visualizations to increase the level of audience engagement.

    Your entire audience should be able to understand the points that you are driving home. Jeff Bladt, the director of Data Products Analytics at DoSomething.org, offered a pioneering case study on accessibility through data. When commenting on how he goes from 350 million data points to organizational change, he shared: “By presenting the data visually, the entire staff was able to quickly grasp and contribute to the conversation. Everyone was able to see areas of high and low engagement. That led to a big insight: Someone outside the analytics team noticed that members in Texas border towns were much more engaged than members in Northwest coastal cities.”

    Making your presentation accessible to laypeople opens up more opportunities for your findings to be put to good use.

    4. Create an interactive dialogue

    No one likes being told what to do. Instead of preaching to your audience, enable them to be a part of the presentation through interactive dashboard features. By using real-time data, manipulating data points in front of the audience, and encouraging questions during the presentation, you will ensure your audiences are more engaged as you empower them to explore the data on their own. At the same time, you will also provide a deeper context. The interactivity is especially interesting in dashboarding when you have a broad target audience: it onboards newcomers easily while letting the ‘experts’ dig deeper into the data for more insights.

    5. Experiment

    Don’t be afraid to experiment with different approaches to storytelling with data. Create a dashboard storytelling plan that allows you to experiment and test different options, learn what builds engagement among your listeners, and make sure you fortify your data storytelling with KPIs (key performance indicators). Even when you fail and your audience falls asleep or checks their email, you will learn from it and gather information on how to improve your dashboarding and data storytelling techniques, presentation after presentation.

    6. Balance your words and visuals wisely

    Last but certainly not least is a tip that encompasses all of the above advice while also offering a means of keeping it consistent, accessible, and impactful from start to finish: balance your words and visuals wisely.

    What we mean here is that in data-driven storytelling, consistency is key if you want to grip your audience and drive your message home. Our eyes and brains focus on what stands out. The best data storytellers leverage this principle by building charts and graphs with a single message that can be effortlessly understood, highlighting both visually and with words the strings of information that they want their audience to remember the most.

    With this in mind, you should keep your language clear, concise, and simple from start to finish. While doing this, use the best possible visualizations to enhance each segment of your story, placing a real emphasis on any graph, chart, or sentence that you want your audience to take away with them.

    Every single element of your dashboard design is essential, but by emphasizing the areas that really count, you’ll make your narrative all the more memorable, giving yourself the best possible chance of enjoying the results you deserve.

    The best dashboard storytelling examples

    Now that we’ve explored the ways in which you can improve your data-centric storytelling and make the most of your presentations, it’s time for some inspiring storytelling presentation examples. Let’s start with a storytelling dashboard that relates to the retail sector.

    1. A retailer’s store dashboard with KPIs

    The retail industry is an interesting one, as it has been particularly disrupted by the advent of online retailing. Collecting and analyzing data is extremely important for this sector, which, thanks to its data-driven nature, can take excellent advantage of analytics. As such, data storytelling with KPIs is a particularly effective method to communicate trends, discoveries, and results.

    The first of our storytelling presentation examples serves up information related to customers’ behavior and helps in identifying patterns in the collected data. The specific retail KPIs tracked here focus on sales: by division, by item, by city, and on out-of-stock items. It lets us know the current trends in customers’ purchasing habits and allows us to break this data down by city or by gender and age for enhanced analysis. We can also anticipate any stock-out to avoid losing money and visualize stock-out tendencies over time to spot problems in the supply chain.
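
    To make this more concrete, here is a minimal sketch (in Python with pandas) of how the KPIs behind such a retail dashboard could be computed before they are visualized. The table and column names (division, city, item, units_sold, in_stock) are assumptions made for this illustration, not fields from the dashboard described above.

    ```python
    import pandas as pd

    # Hypothetical sales records; table layout and column names are assumptions.
    sales = pd.DataFrame({
        "division":   ["Menswear", "Menswear", "Womenswear", "Womenswear", "Kids"],
        "city":       ["Berlin", "Munich", "Berlin", "Hamburg", "Berlin"],
        "item":       ["Coat", "Boots", "Scarf", "Coat", "Gloves"],
        "units_sold": [120, 80, 200, 95, 60],
        "in_stock":   [30, 0, 12, 0, 45],
    })

    # Sales broken down by division and by city, as in the dashboard narrative.
    sales_by_division = sales.groupby("division")["units_sold"].sum()
    sales_by_city = sales.groupby("city")["units_sold"].sum()

    # Out-of-stock items, used to anticipate stock-outs in the supply chain.
    out_of_stock = sales.loc[sales["in_stock"] == 0, ["division", "city", "item"]]

    print(sales_by_division, sales_by_city, out_of_stock, sep="\n\n")
    ```

    A dashboard tool would typically refresh such aggregates continuously; the point of the sketch is only that each widget on the dashboard maps to a simple, explainable aggregation of the underlying data.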

    2. A hospital’s management dashboard with KPIs

    This second of our data storytelling examples tells the tale of a busy working hospital. That might sound a little fancier than it is, but it’s of paramount importance, all the more so in public healthcare, a sector relatively new to data collection and analytics that stands to gain a great deal from it.

    For a hospital, a centralized dashboard is a great ally in the everyday management of the facility. The one we have here gives us the big picture of a complex establishment, tracking several healthcare KPIs.

    From total admissions and total patients treated to the average waiting time in the ER, overall or broken down per division, the story told by the healthcare dashboard is essential. The top management of this facility has a holistic view to run operations more easily and efficiently and can implement appropriate measures if they see abnormal figures. For instance, an average waiting time for a certain division that is far higher than the others can shed light on problems that division might be facing: lack of staff training, lack of equipment, an understaffed unit, etc.

    All this is vital for the patient’s satisfaction as well as the safety and wellness of the hospital staff that deals with life and death every day.

    3. A human resources (HR) recruitment dashboard with KPIs

    The third of our data storytelling examples relates to human resources. This particular storytelling dashboard focuses on one of the most essential responsibilities of any modern HR department: the recruitment of new talent.

    In today’s world, digital natives are looking to work with a company that not only shares their beliefs and values but offers opportunities to learn, progress, and grow as an individual. Finding the right fit for your organization is essential if you want to improve internal engagement and reduce employee turnover.

    The HR KPIs related to this storytelling dashboard are designed to enhance every aspect of the recruitment journey, helping to drive down costs and significantly improve the quality of hires.

    Here, the art of storytelling with KPIs is made easy. This HR dashboard offers a clear snapshot into important aspects of HR recruitment, including the cost per hire, recruiting conversion or success rates, and the time to fill a vacancy from initial contact to official offer.

    With this most intuitive of data storytelling examples, building a valuable narrative that resonates with your audience is made easy, and as such, it’s possible to share your recruitment insights in a way that fosters real change and business growth.
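
    To make the KPI definitions above concrete, here is a minimal sketch of how cost per hire, the recruiting conversion rate, and the time to fill a vacancy could be derived from raw recruitment figures. All names and numbers are assumptions for illustration only, not the data model of the dashboard described.

    ```python
    from datetime import date

    # Hypothetical recruitment figures for one quarter; every value is an assumption.
    recruiting_spend = 45_000        # total internal + external recruitment costs
    hires = 9                        # offers accepted in the period
    applicants = 300                 # candidates who entered the funnel
    vacancies = [                    # (first contact, official offer) per filled role
        (date(2020, 1, 6), date(2020, 2, 14)),
        (date(2020, 1, 20), date(2020, 3, 2)),
        (date(2020, 2, 3), date(2020, 3, 9)),
    ]

    cost_per_hire = recruiting_spend / hires          # 5000.0
    conversion_rate = hires / applicants              # 0.03, i.e. 3%
    avg_time_to_fill = sum(
        (offer - contact).days for contact, offer in vacancies
    ) / len(vacancies)                                # average days from contact to offer

    print(f"Cost per hire: {cost_per_hire:,.0f}")
    print(f"Recruiting conversion rate: {conversion_rate:.1%}")
    print(f"Average time to fill: {avg_time_to_fill:.0f} days")
    ```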

    Final words of advice

    One of the major advantages of working with dashboards is the improvement they have made to data visualization. Don’t let this feature go to waste with your own presentations. Place emphasis on making visuals clear and appealing to get the most from your dashboarding efforts.

    Transform your presentations from static, lifeless work products into compelling stories by weaving an interesting and interactive plot line into them.

    If you haven't read part 1 of this article yet, you can find it here.

    Author: Sandra Durcevic

    Source: Datapine

  • Data access: the key to better decision making

    Data access: the key to better decision making

    When employees have better access to data, they end up making better decisions.

    Companies across sectors are already well in the habit of collecting relevant historical and business data to make projections and forecast the unknown future. They’re collecting this data at such a scale that 'big data' has become a buzzword technology. They want lots of it because they want an edge wherever they can get it. Who wouldn’t?

    But it’s not only the quantity and quality of the data a company collects that play a pivotal role in how that company moves forward, it’s also a question of access. When businesses democratize access to that data such that it’s accessible to workers throughout a hierarchy (and those workers end up actually interacting with it), it increases the quality of decisions made on lower rungs of the ladder. Those decisions end up being more often data-informed, and data is power.

    But lately, that’s easier said than done. Businesses have no issue collecting data nowadays, but they do tend to keep it cordoned off.

    Data sticks to the top of a business hierarchy

    A business’s C-suite (often with help from a technical data science team) makes the big-picture decisions that guide the company’s overall development. This means the employees using data to inform a chosen course of action (like last year’s revenue versus this year’s revenue, or a certain client’s most common order) are either highly ranked within the company, or are wonky data specialists. Data lives behind a velvet rope, so to speak.

    But this data would be eminently useful to people throughout an organization, regardless of their rank or tenure. Such a level of access would make it more likely that data guides every decision, and that would lead to more desirable business outcomes over time. It might even overtly motivate employees by subtly reinforcing the idea that results are tracked and measured.

    Data tends not to trickle down to the appropriate sources

    Who better to have a clear view of the business landscape than the employees who work the front lines every day? What would change if disparate employees scattered throughout an organization suddenly had access to actionable data points? These are the people positioned to actually make a tweak or optimization from the get-go. Whoever comes up with the data-informed strategy for a strong way forward, these are the people who actually implement it. But an organization-level awareness of an actionable data point doesn’t necessarily translate into action.

    As previously established, data has a high center of gravity. It is managerial food for thought on the way to designing and executing longer-term business strategies.

    But when companies change their culture around access to data and make it easy for everyone to interact with data, they make every worker think like such a strategist.

    By the time a piece of data reaches the appropriate person, it’s not necessarily in a form he or she can interact with or understand

    As much as managers might like to think otherwise, there are people in their organization thinking in less than granular terms. They aren’t necessarily thinking about the costs their actions may or may not impose on the company, and they don’t think about the overall bottom line. That’s why it’s important that data be in a form people can use and understand, because it doesn’t always reach them that way.

    Getting data into a usable, understandable form happens by preserving connections between departments and avoiding disconnects.

    There seems to be a big data disconnect at the intersection of engineering and product development

    This intersection is where a business’s technical prowess meets its ability to design a great product. While the two pursuits are clearly related on the way to great product design, it’s rare for one person to excel at both.

    The people who design groundbreaking machine learning algorithms aren’t necessarily the people who design a groundbreaking consumer product, and vice versa. They need each other’s help to understand each other.

    But data is the shared language that makes understanding possible. Not everyone has years of data science training, not everyone has business leadership experience, but even people doing menial things can still benefit from great access to data. Coming across the year’s growth goal, for example, might trigger a needle-moving idea from someone on how to actually get there. Great things happen when employees build a shared understanding of the raw numbers that drive everything they do.

    Businesses already collect enormous amounts of data in the course of their day-to-day operations. But they could start using that data more effectively by bringing it out from behind the curtain and giving employees across the board easy ways to access and interact with it. The motivation for doing so should be clear: when more people think about the same problem in the same terms, that problem is more likely to be solved.

    All they need is access to the data that makes it possible.

    Author: Simone Di Somma

    Source: Insidebigdata

  • Data alone is not enough, storytelling matters - part 1

    Data alone is not enough, storytelling matters - part 1

    Crafting meaningful narratives from data is a critical skill for all types of decision making, in business, and in our public discourse

    As companies connect decision-makers with advanced analytics at all levels of their organizations, they need both professional and citizen data scientists who can extract value from that data and share it. These experts help develop process-driven data workflows, ensuring employees can make predictive decisions and get the greatest possible value from their analytics technologies.

    But understanding data and communicating its value to others are two different skill sets. Your team members’ ability to do the latter impacts the true value you get from your analytics investment. This can work for or against your long-term decision-making and will shape future business success.

    Stories have a remarkable ability to guide people’s decisions, even in professional settings. Sharing data in a way that adds value to decision-making processes still requires a human touch. This is true even when that data comes in the form of insights from advanced analytics.

    That’s why data storytelling is such a necessary activity. Storytellers convert complex datasets into full and meaningful narratives, rich with visualizations that help guide all types of business decisions. This can happen at all levels of the organization with the right tools, skill sets, and workflows in place. This article highlights the importance of data storytelling in enterprise organizations and illustrates the value of the narrative in decision-making processes.

    What is data storytelling?

    Data storytelling is an acquired skill. Employees who have mastered it can make sense out of a body of data and analytics insights, then convey their wisdom via narratives that make sense to other team members. This wisdom helps guide decision making in an honest, accurate, and valuable way.

    Reporting that provides deep, data-driven context beyond the static data views and visualizations is a structured part of a successful analytic lifecycle. There are three structural elements of data storytelling that contribute to its success:

    • Data: Data represents the raw material of any narrative. Storytellers must connect the dots using insights from data to create a meaningful, compelling story for decision-makers.
    • Visualization: Visualization is a way to accurately share data in the context of a narrative. Charts, graphs, and other tools “can enlighten the audience to insights that they wouldn’t see without [them],” Forbes observes, where insights might otherwise remain hidden to the untrained eye.
    • Narrative: A narrative enables the audience to understand the business and emotional importance of the storyteller’s findings. A compelling narrative helps boost decision-making and instills confidence in decision-makers.

    In the best cases, storytellers can craft and automate engaging, dynamic narrative reports using the very same platform they use to prepare data models and conduct advanced analytics inquiries. Processes may be automated so that storytellers can prepare data models and conduct inquiries easily as they shape their narrative. But whether the storyteller has access to a legacy or modern business intelligence (BI) platform, it’s the storyteller and his or her capabilities that matter most.

    Who are your storytellers?


    "The ability to take data - to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it - that’s going to be a hugely important skill in the next decades."

    Hal R. Varian, Chief Economist, Google, 2009


    The history of analytics has been shaped by technical experts, where companies prioritized data scientists who can identify and understand raw information and process insights themselves. But as business became more data-driven, the need for insights spread across the organization. Business success called for more nuanced approaches to analysis and required broader access to analytics capabilities.

    Now, organizations more often lack the storytelling skill set - the ability to bridge the gap between analytics and business value. Successful storytellers embody this 'bridge' as a result of their ability to close the gap between analytics and business decision-makers at all levels of the organization.

    Today, a person doesn’t need to be a professional data scientist to master data storytelling. 'Citizen data scientists' can master data storytelling in the context of their or their team’s decision-making roles. In fact, the best storytellers have functional roles that equip them with the right vocabulary to communicate with their peers. It’s this “last mile” skill that makes the difference between information and results.

    Fortunately, leading BI platforms provide more self-service capabilities than ever, enabling even nontechnical users to access in-depth insights appropriate to their roles and skill levels. More than ever, employees across business functions can explore analytics data and hone their abilities in communicating its value to others. The question is whether or not you can trust your organization to facilitate their development.

    This is the end of part 1 of this article. To continue reading, you can find part 2 here.

    Author: Omri Kohl 

    Source: Pyramid Analytics

  • Data alone is not enough, storytelling matters - part 2

    Data alone is not enough, storytelling matters - part 2

    This article comprises the second half of a two-part piece. Be sure to read part 1 before reading this article.

    Three common mistakes in data storytelling

    Of course, there are both opportunities and risks when using narratives and emotions to guide decision-making. Using a narrative to communicate important data and its context means listeners are one step removed from the insights analytics provide.

    These risks became realities in the public discourse surrounding the 2020 global COVID-19 pandemic. Even as scientists recommended isolation and social distancing to 'flatten the curve' - slow the spread of infection - fears of an economic recession grew rampant. Public figures often overlooked inconvenient medical data in favor of narratives that might reactivate economic activity, putting lives at risk.

    Fortunately, some simple insights into human behavior can help prevent large-scale mistakes. Here are three common ways storytellers make mistakes when they employ a narrative, along with a simple use case to illustrate each example:

    • 'Objective' thinking: In this case, the storyteller focuses on an organizational objective instead of the real story behind the data. This might also be called cognitive bias. It’s characterized by the storyteller approaching data with an existing assumption rather than a question. The analyst therefore runs the risk of picking data that appears to validate that assumption and overlooking data that does not.

      Imagine a retailer who wants to beat its competitor’s customer service record. Business leaders task their customer experience professionals with proving this is the case. Resolute on meeting expectations, those analysts might omit certain data that doesn’t tip the results in favor of the desired outcome.

    • 'Presentative' thinking: In this case, the storyteller focuses on the means by which he or she presents the findings - such as a data visualization method - at risk of misleading, omitting, or watering down the data. The storyteller may favor a visualization that is appealing to his or her audience at the expense of communicating real value and insights.

      Consider an example from manufacturing. Imagine a storyteller preparing a narrative about productivity for an audience that prefers quantitative data visualization. That storyteller might show, accurately, that production and sales have increased but omit qualitative data analysis featuring important customer feedback.

    • 'Narrative' thinking: In this case, the storyteller creates a narrative for the narrative’s sake, even when it does not align well with the data. This often occurs when internal attitudes have codified a specific narrative about, say, customer satisfaction or performance.

      During the early days of testing for COVID-19, the ratio of critical cases to mild ones appeared high because not everyone infected had been tested. Despite the lack of data, this quickly solidified a specific media narrative about the lethality of the disease.

    Business leaders must therefore focus on maximizing their 'insight-to-value conversion rate', as Forbes describes it, where data storytelling is both compelling enough to generate action and valuable enough for that action to yield positive business results. Much of this depends on business leaders providing storytellers with the right tools, but it also requires encouragement that sharing genuine and actionable insights is their top priority.

    Ensuring storytelling success


    “Numbers have an important story to tell. They rely on you to give them a clear and convincing voice.”

    Stephen Few, Founder & Principal, Perceptual Edge®


    So how can your practical data scientists succeed in their mission: driving positive decision-making with narratives that accurately reflect the story behind the data your analytics provide? Here are some key tips to relay to your experts:

    • Involve stakeholders in the narrative’s creation. Storytellers must not operate in a vacuum. Ensure stakeholders understand and value the narrative before its official delivery.

    • Ensure the narrative ties directly to analytics data. Remember, listeners are a step removed from the insights your storytellers access. Ensure all their observations and visualizations have their foundations in the data.

    • Provide deep context with dynamic visualizations and content. Visualizations are building blocks for your narrative. With a firm foundation in your data, each visualization should contribute honestly and purposefully to the narrative itself.

    • Deliver contextualized insights. 'Know your audience' is a key tenet of professional writing, and it’s equally valuable here. Ensure your storytellers understand how listeners will interpret certain insights and findings, and be ready to clarify for those who might not understand.

    • Guide team members to better decisions. Ensure your storytellers understand their core objective - to contribute honestly and purposefully to better decision-making among their audience members.

    As citizen data science becomes more common, storytellers and their audience of decision-makers are often already on the same team. That’s why self-service capabilities, contextual dashboards, and access to optimized insights have never been so critical to empowering all levels of the organization.

    Getting started: creating a culture of successful storytelling

    Insights are only valuable when shared - and they’re only as good as your team’s ability to drive decisions with them in a positive way. It’s data storytellers who bridge the gap from pure analytics insights to the cognitive and emotional capacities that regularly guide decision-making among stakeholders. As you might have gleaned from our two COVID-19 scenarios, outcomes are better when real data, accurate storytelling, and our collective capacities are aligned.

    But storytellers still need access to the right tools and contextual elements to bridge that gap successfully. Increasing business users’ access to powerful analytics tools is your first step towards data storytelling success. That means providing your teams with an analytics platform that adds meaning and value to business decisions, no matter their level in your organization.

    If you haven’t read part 1 of this article yet, you can find it here.

    Author: Omri Kohl

    Source: Pyramid Analytics

  • Data as a universal language

    Data as a universal language

    You don’t have to look very far to recognize the importance of data analytics in our world; from the weather channel using historical weather patterns to predict the summer, to a professional baseball team using on-base plus slugging percentage to determine who is more deserving of playing time, to Disney using films’ historical box office data to nail down the release date of its next Star Wars film.

    Data shapes our daily interactions with everything, from the restaurants we eat at, to the media we watch and the things that we buy. Data defines how businesses engage with their customers, using website visits, store visits, mobile check-ins and more to create a customer profile that allows them to tailor their future interactions with you. Data enhances how we watch sports, such as the world cup where broadcasters share data about players’ top speed and how many miles they run during the match. Data is also captured to remind us how much time we are wasting on our mobile devices, playing online games or mindlessly scrolling through Instagram.

    The demand for data and the ability to analyze it has also created an entire new course of study at universities around the world, as well as a career path that is currently among the fastest growing and most sought-after skillsets. While data scientists are fairly common and chief data officer is one of the newest executive roles focused on data-related roles and responsibilities, data analytics no longer has to be exclusive to specialty roles or the overburdened IT department. 

    And really, what professional can’t benefit from actionable intelligence?

    Businesses with operations across the country or around the world benefit from the ability to access and analyze a common language that drives better decision making. An increasing number of these businesses recognize that they are creating volumes of data that have value and, perhaps even more important, that they need a centralized collection system for that information so they can use the data to be more efficient and improve their chances of success.

    Sales teams, regardless of their location, can use centrally aggregated customer data to track purchasing behavior, develop pricing strategies to increase loyalty, and identify what products are purchased most frequently in order to offer complementary solutions to displace competitors.

    Marketing teams can use the same sales data to develop focused campaigns that are based on real experiences with their customers, while monitoring their effectiveness in order to make needed adjustments and or improve future engagement.

    Inventory and purchasing can use the sales data to improve purchasing decisions, ensure inventory is at appropriate levels and better manage slow moving and dead stock to reduce the financial impact on the bottom line.

    Branch managers can use the same data to focus on their own piece of the business, growing loyalty among their core customers and tracking their sales peoples’ performance.

    Accounts receivables can use the data to focus their efforts on the customers that need the most attention in terms of collecting outstanding invoices. And integrating the financial data with operational data paints a more complete picture of performance for financial teams and executives responsible for reporting and keeping track of the bottom line.

    Data ties all of the disciplines and departments together regardless of their locations. While some may care more about product SKUs than P&L statements or on-time-in-full deliveries, they can all benefit from a single source of truth that turns raw data into visual, easy-to-read charts, graphs and tables.

    The pace, competition and globalization of business make it critical for your company to use data to your advantage, which means moving away from gut feel or legacy habits to basing key decisions on the facts found in your ERP, CRM, HR, marketing and accounting systems. With the right translator, or data analytics software, the ability to use your data based on roles and responsibilities to improve sales and marketing strategies, customer relationships, stock and inventory management, financial planning and your corporate performance, can be available to all within your organization, making data a true universal language.

    Source: Phocas Software

  • Data governance: using factual data to form subjective judgments

    Data governance: using factual data to form subjective judgments

    Data warehouses were born of the finance and regulatory age. When you peel away the buzzwords, the principal goal of this initial phase of business intelligence was the certification of truth. Warehouses helped to close the books and analyze results. Regulations like Dodd-Frank wanted to make sure that you took special care to certify the accuracy of financial results, Basel wanted certainty around capital liquidity, and so on. Companies would spend months or years developing common metrics, KPIs, and descriptions so that a warehouse would accurately represent this truth.

    In our professional lives, many items still require this certainty. There can only be one reported quarterly earnings figure. There can only be one number of beds in a hospital or factories available for manufacturing. However, an increasing number of questions do not have this kind of tidy right and wrong answer. Consider the following:

    • Who are our best customers?
    • Is that loan risky?
    • Who are our most effective employees?
    • Should I be concerned about the latest interest rate hike?

    Words like best, risky, and effective are subjective by their very natures. Jordan Morrow (Qlik) writes and speaks extensively about the importance of data literacy and uses a phrase that has always felt intriguing: data literacy requires the ability to argue with data. This is key when the very nature of what we are evaluating does not have neat, tidy truths.

    Let’s give an example. A retail company is trying to liquidate its winter inventory and has asked three people to evaluate the best target list for an e-mail campaign (a minimal code sketch of the three approaches follows the list below).

    • John downloads last year’s campaign results and collects the names and e-mail addresses of the 2% that responded to the campaign last year with an order.
    • Jennifer thinks about the problem differently. She looks through sales records of anyone who has bought winter merchandise in the past 5 years during the month of March who had more than a 25% discount on the merchandise. She notices that these people often come to the web site to learn about sales before purchasing. Her reasoning is that a certain type of person who likes discounts and winter clothes is the target.
    • Juan takes yet another approach. He looks at social media feeds of brand influencers. He notices that there are 100 people with 1 million or more followers and that social media posts by these people about product sales traditionally cause a 1% spike in sales for the day as their followers flock to the stores. This is his target list.
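
    As announced above, here is a minimal pandas sketch of how John’s, Jennifer’s, and Juan’s target lists might be selected. The tables and column names (responded, category, purchase_month, discount_pct, followers) are assumptions made for this illustration; they are not part of the original example.

    ```python
    import pandas as pd

    # Hypothetical, empty placeholder tables; the column names are assumptions.
    campaign = pd.DataFrame(columns=["email", "responded"])        # last year's campaign results
    sales = pd.DataFrame(columns=["email", "category",
                                  "purchase_month", "discount_pct"])  # five years of sales records
    influencers = pd.DataFrame(columns=["handle", "followers"])    # social media accounts

    # John: everyone who responded to last year's campaign with an order.
    john_list = campaign.loc[campaign["responded"] == 1, "email"]

    # Jennifer: winter-merchandise buyers in March who had more than a 25% discount.
    jennifer_list = sales.loc[
        (sales["category"] == "winter")
        & (sales["purchase_month"] == 3)
        & (sales["discount_pct"] > 25),
        "email",
    ].drop_duplicates()

    # Juan: brand influencers with 1 million or more followers.
    juan_list = influencers.loc[influencers["followers"] >= 1_000_000, "handle"]
    ```

    Each list is a defensible, data-backed answer to the same question, which is exactly why the ability to argue with data matters.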

    So who has the right approach? This is where the ability to argue with data becomes critical. In theory, each of these people should feel confident developing a sales forecast based on his or her model. They should understand the metric that they are trying to drive, and they should be able to experiment with different ideas to drive a better outcome and confidently state their case.

    While this feels intuitive, enterprise processes and technologies are rarely set up to support this kind of vibrant analytics effort. This kind of analytics often starts with the phrase “I wonder if…”, while conventional IT and data governance frameworks are generally not able to deal with questions that a person did not know they had six months before. And yet, “I wonder if” relies upon data that may have been unforeseen. In fact, it usually requires connecting data sets that have never been connected before to drive break-out thinking. Data science is about identifying the variables and metrics that might be better predictors of performance. This relies on the analysis of new, potentially unexpected data sets such as social media followers, campaign results, web clicks, sales behavior, etc. Each of these items might be important for an analysis, but in a world in which it is unclear what is and is not important, how can a governance organization anticipate and apply the same dimensions of quality to all of the hundreds of data sets that people might use? And how can it apply the same kind of rigor to data quality standards for the hundreds of thousands of data elements available, as opposed to the 100-300 critical data elements?

    They can’t. And that’s why we need to re-evaluate the nature of data governance for different kinds of analytics.

    Author: Joe Dos Santos

    Source: Qlik

  • Data management: building the bridge between IT and business

    Data management: building the bridge between IT and business

    We all know businesses are trying to do more with their data, but inaccuracy and general data management issues are getting in the way. For most businesses, the status quo for managing data is not always working. However, new research shows that data is moving from a knee-jerk, “must be IT’s issue” conversation, to a “how can the business better leverage this rich (and expensive) data resource we have at our fingertips” conversation.

    The emphasis is on “conversation”: business and IT need to communicate in the new age of Artificial Intelligence, Machine Learning, and Interactive Analytics. Roles and responsibilities are blurring, and it is expected that a company’s data will quickly turn from a cost center of IT infrastructure into a revenue generator for the business. In order to address the issues of control and poor data quality, there needs to be an ever-stronger bridge between IT and the business. This bridge has two components. The first is technology that is sophisticated enough to handle complex data issues yet easy enough to provide a quick time-to-value. The second is people who are able to bridge the gap between IT concerns (systems, storage, access) and business users’ need for value and results (enter data analysts and data engineers).

    This bridge needs to be built with three key components in mind:

    • Customer experience:

    For any B2C company, customer experience is the number one hot topic of the day and a primary way they are leveraging data. A new 2019 data management benchmark report found that 98% of companies use data to improve customer experience. And for good reason: between social media, digital streaming services, online retailers, and others, companies are looking to show the consumer that they aren’t just a corporation, but that they are the corporation most worthy of building a relationship with. This invariably involves creating a single view of the customer (SVC), and that view needs to be built around context and based on the needs of the specific department within the business (accounts payable, marketing, customer service, etc.).
    • Trust in data:

      Possessing data and trusting data are two completely different things. Lots of companies have lots of data, but that doesn’t mean they automatically trust it enough to make business-critical decisions with it. Research finds that on average, organizations suspect 29% of current customer/prospect data is inaccurate in some way. In addition, 95% of organizations see impacts in their organization from poor quality data. A lack of trust in the data available to business users paralyzes decisions, and even worse, impacts the ability to make the right decisions based on faulty assumptions. How often have you received a report and questioned the results? More than you’d like to admit, probably. To get around this hurdle, organizations need to drive culture change around data quality strategies and methodologies. Only by completing a full assessment of data, developing a strategy to address the existing and ongoing issues, and implementing a methodology to execute on that strategy, will companies be able to turn the corner from data suspicion to data trust.
    • Changing data ownership:

    The responsibilities between IT and the business are blurring. 70% of businesses say that not having direct control over data impacts their ability to meet strategic objectives. The reality is that differing definitions of control are throwing people off. IT thinks of control in terms of storage, systems, and security. The business thinks of control in terms of access, actionability, and accuracy. The role of the CDO is helping to bridge this gap, bringing the nuts and bolts of IT in line with the visions and aspirations of the business.

    The bottom line is that for most companies data is still a shifting sea of storage, software stacks, and stakeholders. The stakeholders are key, both from IT and the business, and in how the two can combine to provide the oxygen the business needs to survive: better customer experience, more personalization, and an ongoing trust in the data they administrate to make the best decisions to grow their companies and delight their customers.

    Author: Kevin McCarthy

    Source: Dataversity

  • Data science community to battle COVID-19 via Kaggle

    Data science community to battle COVID-19 via Kaggle

    A challenge on the data science community site Kaggle is asking great minds to apply machine learning to battle the COVID-19 coronavirus pandemic.

    As COVID-19 continues to spread uncontrolled around the world, shops and restaurants have closed their doors, information workers have moved home, other businesses have shut down entirely, and people are social distancing and self-isolating to 'flatten the curve'. It's only been a few weeks, but it feels like forever. If you listen to the scientists, we have a way to go still before we can consider reopening and reconnecting. The worst crisis is yet to come for many areas. Yet, there are glimmers of hope, too.

    Among them are the efforts of so many smart minds working on different parts of the problem to track hospital beds, map the spread, research the survivors, develop treatments, create a vaccine, and many other innovations. To help spur the development, researchers from several organizations at the request of the White House Office of Science and Technology Policy have released a dataset of machine-readable coronavirus literature for data and text mining, which includes more than 29,000 articles of which more than 13,000 have full text.

    The dataset is available to researchers around the world via Google's Kaggle machine learning and data science community, the White House office announced earlier this month, and was made available from researchers and leaders from the Allen Institute for AI, Chan Zuckerberg Initiative, Georgetown University's Center for Security and Emerging Technology, Microsoft, and the National Library of Medicine at the National Institutes of Health.

    Together, the White House and the organizations have issued a call to action to the nation's AI experts 'to develop new text and data mining techniques that can help the science community answer high-priority scientific questions related to COVID-19'.

    Among those answering the call is data science and AI platform company DataRobot, which announced that it would provide its platform for free to those who want to use it to help with the COVID-19 virus response effort. In collaboration with its cloud partner, AWS (which has also waived its fees), the program offers free access to DataRobot's automated machine learning and Paxata data preparation technology for those participating in the Kaggle challenge.

    DataRobot has brought those 13,000 data sets into the DataRobot platform and performed some initial data preparation, Phil Gurbacki, senior VP of product and customer experience, told InformationWeek. Some of the initial projects are looking at risk factors, seasonal factors, and how to identify the origin of transmission, he said. Gurbacki said the time series forecasting capabilities of the DataRobot platform could be particularly useful to data scientists looking to model the impacts of the virus.

    'Innovation starts with an understanding', Gurbacki said. 'We want to make sure we maximize the amount of time that researchers are spending on innovation rather than wasting time doing something that could be automated for them'.

    DataRobot joins many other companies that are offering their platforms for free for a limited period as the world responds to the challenges of the novel coronavirus. GIS and mapping software company Esri is also offering its platform free of charge to those working on fighting the pandemic, particularly governments around the world. It has also built templates and a hub that spotlights notable projects.

    Plus, there are several vendors that are offering free trial versions of collaboration software for organizations that are now operating with a remote workforce. Those companies include Microsoft with its Teams collaboration software, Atlassian, Cisco's Webex, Facebook for Workplace, Google Hangouts, Slack, Stack Overflow Teams, Zoho Remotely, and Zoom, among many others.

    Author: Jessica Davis

    Source: Informationweek

  • Implementing data science is not 'tinkering and fiddling without onlookers'

    Belastingdienst

    We can learn a lot from the mistakes made at the Belastingdienst (the Dutch Tax Administration)

    The Belastingdienst is once again in rough weather. Following the negative press coverage in 2016, the television programme Zembla showed how the tax authority had given shape to data analytics. The 'incubator' in which this happened was known internally as a place to 'tinker and fiddle without onlookers' ('prutsen en pielen zonder pottenkijkers').

    Legislation trampled underfoot

    A government agency that tramples privacy and procurement legislation is of course guaranteed to generate uproar and viewing figures, and rightly so. From a cause-and-effect perspective, however, the question is whether those legal violations are really the most interesting part. How could it happen that a group of data technology whizzkids, guided by an external consultancy (Accenture), was placed in a 'nursery' and, separated from the rest of the organization, given carte blanche for... yes, for what exactly?

    Under the leadership of Belastingdienst director Hans Blokpoel, a large data and analytics team was set up. Its mission: to combine all data known to the tax authority in order to work more efficiently, detect fraud, and generate more tax revenue, and in doing so create value for the Belastingdienst. This looks like a data science strategy. But did the Belastingdienst really know what it was doing? The job postings used to recruit data scientists spoke of 'tinkering and fiddling without onlookers'.

    Zembla's complaint is that the team in fact never rose above the level of 'tinkering and fiddling'. Physical security, authentication, and authorization were inadequate. It was impossible to see who had accessed the financial data of 11 million citizens and 2 million companies, or whether those data had been downloaded or hacked. The law was quite literally not complied with.

    Problems with data science

    What is going wrong at the Belastingdienst happens at a great many companies and organizations. A director, manager, or executive deploys data and analytics in order to be (literally?) smarter than the rest. Isolated from the rest of the organization, bright young people are set to work on the data without restrictions. In time, all those experiments and trials yield decent results, results that are supposed to make good on the promise of the 'data-driven organization'.

    Unfortunately, the Belastingdienst case makes clear once more that a 'data-driven organization' requires far more than the ability to collect and analyze data. Turning data into value requires vision (a data science strategy), a way of organizing that matches it (one data scientist is not the same as another), but also knowledge of the restrictions. It therefore calls for a culture in which privacy and security are safeguarded. To give these elements adequate shape, you need a large part of the 'old' organization as well as a proper embedding of the new unit or function.

    Strategy and expectations

    Data science creates expectations: more tax revenue at lower cost, higher turnover or less fraud, efficiency in operations as well as effectiveness in customer satisfaction, insight into (future) market developments. These are high expectations. Implementing data science, however, also requires investment: substantial investment in technology and in highly educated people, people who are scarce and who need knowledge of IT, statistics, research methodology, and so on. High expectations combined with substantial investments quickly lead to disappointment. Disappointment leads to pressure. Pressure quite often leads to pushing the boundaries. And pushing the boundaries leads to problems. The function of a strategy is to prevent this dynamic.

    Managing the balance between expectations and investments starts with a data science strategy: an answer to the question of what we want to achieve with the implementation of data science, in what order, and within what time frame. Are we going to optimize the current processes (a business execution strategy) or transform them (a business transformation strategy)? Or should the data science team facilitate new ways of working (an enabling strategy)? An organization should ask itself these questions before starting with data science. A clear answer to the strategy question steers the governance (what do we need to watch out for? what can go wrong?) as well as the expectations. Moreover, it then becomes clear who should be involved in the new function and who definitely should not.

    Governance en excessen

    Want naast een data science strategie vraag adequate governance om een organisatie die in staat is om domeinkennis en expertise uit het veld te kunnen combineren met data. Dat vereist het in kunnen schatten van 'wat kan' en 'wat niet'. En daarvoor heb je een groot deel van de 'oude' organisatie nodig. Lukt dat, dan is de 'data driven organisatie' een feit. Lukt het niet dan kun je wachten op brokken. In dit geval dus een mogelijke blootstelling van alle financiele data van alle 11 miljoen belastingplichtige burgers en 2 miljoen bedrijven. Een branchevreemde data scientist is als een kernfysicus die in experimenten exotische (en daarmee ook potentieel gevaarlijke) toepassingen verzint. Wanneer een organisatie niet stuurt op de doelstellingen en dus data science strategie dan neemt de kans op excessen toe.

     

    Data science is veelmeer dan technologie

    Ervaringsdeskundigen weten al lang dat data science veelmeer is dat het toepassen van moderne technologie op grote hoeveelheden data. Er zijn een aantal belangrijke voorwaarden voor succes. In de eerste plaats gaat het om een visie op hoe data en data technologie tot waarde kunnen worden gebracht. Vervolgens gaat het om de vraag hoe je deze visie organisatorisch wilt realiseren. Pas dan ontstaat een kader waarin data en technologie gericht kunnen worden ingezet. Zo kunnen excessen worden voorkomen en wordt waarde gecreëerd voor de organisatie. Precies deze stappen lijken bij de Belastingdienst te zijn overgeslagen.

     

    Zembla

    De door Zembla belichtte overtreding van wetgeving is natuurlijk een stuk spannender. Vanuit het credo ‘voorkomen is beter dan genezen’ blijft het jammer dat het goed toepassen van data science in organisaties in de uitzending is onderbelicht.

     

    Bron: Business Data Science Leergang Radboud Management Academy http://www.ru.nl/rma/leergangen/bds/

    Authors: Alex Aalberts / Egbert Philips

  • Data science plays key role in COVID-19 research through supercomputers

    Data science plays key role in COVID-19 research through supercomputers

    Supercomputers, AI and high-end analytic tools are each playing a key role in the race to find answers, treatments and a cure for the widespread COVID-19.

    In the race to flatten the curve of COVID-19, high-profile tech companies are banking on supercomputers. IBM has teamed up with other firms, universities and federal agencies to launch the COVID-19 High Performance Computing Consortium.

    This consortium has brought together massive computing power in order to assist researchers working on COVID-19 treatments and potential cures. In total, the 16 systems in the consortium will offer researchers over 330 petaflops, 775,000 CPU cores and 34,000 GPUs and counting.

    COVID-19 High Performance Computing Consortium

    The consortium aims to give supercomputer access to scientists, medical researchers and government agencies working on the coronavirus crisis. IBM said its powerful Summit supercomputer has already helped researchers at the Oak Ridge National Laboratory and the University of Tennessee screen 8,000 compounds to find those most likely to bind to the main "spike" protein of the coronavirus, rendering it unable to infect host cells.

    "They were able to recommend the 77 promising small-molecule drug compounds that could now be experimentally tested," Dario Gil, director of IBM Research, said in a post. "This is the power of accelerating discovery through computation."

    In conjunction with IBM, the White House Office of Science and Technology Policy, the U.S. Department of Energy, the National Science Foundation, NASA, nearly a dozen universities, and several other tech companies and laboratories are all involved.

    The work of the consortium offers an unprecedented back end of supercomputer performance that researchers can leverage while using AI to parse through massive databases to get at the precise information they're after, Tim Bajarin, analyst and president of Creative Strategies, said.

    Supercomputing powered by sharing big databases

    Bajarin said that research is fundamentally done in pockets, which creates a lot of insulated, personalized and proprietary big databases.

    "It will take incredible cooperation for Big Pharma to share their research data with other companies in an effort to create a cure or a vaccine," Bajarin added.

    Gil said IBM is working with consortium partners to evaluate proposals from researchers around the world and will provide access to supercomputing capacity for the projects that can have the most immediate impact.

    Many enterprises are coming together to share big data and individual databases with researchers.  

    Signals Analytics released a COVID-19 Playbook that offers access to critical market intelligence and trends surrounding potential treatments for COVID-19. The COVID-19 Playbook is available at no cost to researchers looking to monitor vaccines that are in development for the disease and other strains of coronavirus, monitor drugs that are being tested for COVID-19 and as a tool to assess which drugs are being repurposed to help people infected with the virus.

    "We've added a very specific COVID-19 offering so researchers don't have to build their own taxonomy or data sources and can use it off the shelf," said Frances Zelazny, chief marketing officer at Signals Analytics.

    Eschewing raw computing power for targeted, critical insights

    With the rapid spread of the virus and the death count rising, treatment options can't come soon enough. Raw compute power is important, but perhaps equally as crucial is being able to know what to ask and quickly analyze results.

    "AI can be a valuable tool for analyzing the behavior and spread of the coronavirus, as well as current research projects and papers that might provide insights into how best to battle COVID-19," Charles King, president of the Pund-IT analysis firm, said.

    The COVID-19 consortium includes research requiring complex calculations in epidemiology and bioinformatics. While the high computing power allows for rapid model testing and large data processing, the predictive analytics have to be proactively applied to health IT.

    Dealing with COVID-19 is about predicting for the immediate, imminent future - from beds necessary in ICUs to social distancing timelines. In the long term, Bajarin would like to see analytic and predictive AI used as soon as possible to head off future pandemics.

    "We've known about this for quite a while - COVID-19 is a mutation of SARS. Proper trend analysis of medical results going forward could help head off the next great pandemic," Bajarin said.

    Author: David Needle

    Source: TechTarget

  • The impact of 5G on developments in modern technology

    The impact of 5G on developments in modern technology

    From foldable 5G phones to surgery performed on a patient kilometres away, and from early ultra-fast network roll-outs in the US to the tactile internet, the Internet of Things and, of course, robots... it seems as if 5G is the topic of conversation everywhere, all the time.

    For telecom operators, or communication service providers (CSPs), this is a time full of opportunity: from upgrading capacity to delivering new services, content and interaction in ways that simply were not possible before. By using 5G to enable the Internet of Things (IoT) and edge computing, they now have a tremendous opportunity that was completely unthinkable just a few years ago.

    But that means 5G is at a major inflection point: from beating competitors with new services that genuinely add value, to performance and operational efficiency gains that significantly improve their networks. The decisions taken today will have far-reaching financial and operational implications for CSPs.

    Achieving business goals with 5G

    But if you see 5G merely as the next opportunity for telecom, you are missing the boat. Every organisation in every sector that uses networks (in short, everyone) should plan for 5G implementation and think about how to use it to achieve its business goals.

    It does not matter whether they operate in retail, logistics, a city, a rural environment, or the public or private sector. 5G promises enormous opportunities to deliver new services and applications, to expand automation and the possibilities it brings, and to help companies engage with customers in ways never imagined before.

    That is not to say it will always be easy, as Åsa Tamsons, head of new businesses at Ericsson, noted in an interview with CN. Many still regard 5G as simply 'another network', an attitude that may initially make it harder for some organisations to secure the upfront investment required. Enterprises will have to work to convince both internal and external stakeholders that the costs are justified.

    One network, a world of opportunity

    Back to the initial question: isn't 5G just another G? Simply put, 5G delivers much higher speeds, much lower latency and considerably higher density than 4G. But what does this actually mean?

    Speed is relatively straightforward. The example often used is that 5G will make it possible to download an HD film in ten seconds, compared with the (at best) roughly twenty minutes it currently takes, depending on local broadband services.

    Latency, the time it takes for data to travel between two points, is less than a millisecond with 5G. This can matter for surgery, for example, but combined with speed it is also a factor for the many gamers willing to pay for this type of fast, low-latency service.

    We already live in a world where more than 23 billion devices are connected to networks, and that density keeps growing thanks to greater mobility and IoT use cases. We have all experienced speeds dropping sharply when everyone logs on, and as we become ever more connected we need networks that can handle considerably more devices than ever before.

    Telco-cloud

    However, delivering all of this requires significant investment in network infrastructure. For CSPs this is a major undertaking. That is why, rather than a pure 5G network, most people are likely to see a mixed approach, with 4G available for basic services and 5G introduced for specific tasks. This makes the so-called telco cloud crucial: a software-defined technology that supports today's 4G while laying the groundwork for 5G, something highly valued by operators such as Vodafone.

    'The ability to be flexible and agile while we continue to automate our network operations and management could only be achieved with a software-defined infrastructure', says Johan Wibergh, chief technology officer at Vodafone. 'We are pleased with the accelerated time-to-market and the associated economic benefits of our transition to NFV and, increasingly, a telco cloud infrastructure.'

    With 5G, companies have access to the levels and speeds of connectivity they need to benefit from the game-changing technologies, such as IoT, edge computing and artificial intelligence (AI), that will shape the next phase of the digital revolution. Combined with this software-defined infrastructure, and in line with its specifications and ambitions, 5G has the power to transform the business models of established organisations to an unprecedented degree.

    Capitalise to thrive

    We do not yet realise what 5G is already capable of. Much still has to happen before we see full adoption, but companies should start thinking now about how to harness the power of these new networks for their own competitiveness. Thinking of it as 'just another G' risks leaving them unprepared and missing the enormous opportunities on offer.

    5G is the network and the foundation that will deliver on the promises of many other new technologies. Any organisation that fails to take advantage of it will have to work very hard to survive in the digital world.

    Author: Jean Pierre Brulard

    Source: CIO

  • The challenge of bringing structure to unstructured data

    The challenge of bringing structure to unstructured data

    The world is collecting ever more data, at an alarmingly accelerating pace. From the beginning of civilisation until roughly 2003, humanity produced about 5 exabytes of data. We now produce that amount every two days, and 90 percent of all data has been generated in the past two years.

    There is nothing wrong with data as such, but the problem is that a large part of it is unstructured. This 'dark data' already accounts for roughly four fifths of the total data mountain. And that is where the real problems begin.

    Privacy

    Unstructured data is unusable. You do not know what it contains, what its structure is or how much of the information might be important. How can you comply with the requirements of the new privacy legislation if you do not even know what information is in your data? It may contain sensitive information, so you could be breaking the law without being aware of it, until a leak occurs and all the data ends up on the street. And how can you comply with the Dutch freedom of information act (Wet openbaarheid van bestuur) and, soon, the open government act (Wet open overheid) if you do not know where to find the information? The GDPR obliges you to destroy personal data when the person concerned requests it. But if you do not know where to find that data, you are left empty-handed.

    The data iceberg

    Picture your data as an iceberg. The largest part lies under water: you do not see it. What sticks out above the water is the critical information you use every day and that your organisation needs in order to function. Directly below the surface lies a large portion that was once critical: it was used, then stored, and never touched again. Redundant, obsolete and trivial, in short ROT.

    The largest part of the iceberg lies below that again: the 'dark data', collected by people, machines and all kinds of work processes. You have no idea what is hiding in that dark portion. It consists of data collected by sensors, footage from security cameras, and many, many documents from long, long ago.

    New insights

    You can of course ignore it; after all, you do not need it for your daily workflow. But that dark data may just as well contain valuable information that can be used to make the organisation's processes run better, or to enable new applications. By laying data from the iceberg alongside other data, for example, you can suddenly gain new insights on which policy can be built: information-driven policy.

    A digital tamer

    If every plan and every policy measure could be underpinned with hard data from the data mountain, we would have found the holy grail. The quality of government services would improve by leaps and bounds, and there would be new impulses for safety, enforcement, maintenance and debt assistance, to name just a few policy areas.

    That will probably remain an unattainable ideal, but we can take substantial steps in the right direction. Working digitally means constantly adapting, reordering and migrating. Taming digital information requires a digital tamer: a management environment that brings structure and responds to the continuous changes that digitisation entails.

    Source: Managementbase

  • Determining the feature set complexity

    Determining the feature set complexity

    Thoughtful predictor selection is essential for model fairness

    One common AI-related fear I’ve often heard is that machine learning models will leverage oddball facts buried in vast databases of personal information to make decisions impacting lives. For example, the fact that you used Arial font in your resume, plus your cat ownership and fondness for pierogi, will prevent you from getting a job. Associated with such concerns is fear of discrimination based on sex or race due to this kind of inference. Are such fears silly or realistic? Machine learning models are based on correlation, and any feature associated with an outcome can be used as a decision basis; there is reason for concern. However, the risks of such a scenario occurring depend on the information available to the model and on the specific algorithm used. Here, I will use sample data to illustrate differences in incorporation of incidental information in random forest vs. XGBoost models, and discuss the importance of considering missing information, appropriateness and causality in assessing model fairness.

    Feature choice (examining what might be missing as well as what’s included) is very important for model fairness. Often feature inclusion is thought of only in terms of keeping or omitting “sensitive” features such as race or sex, or obvious proxies for these. However, a model may leverage any feature associated with the outcome, and common measures of model performance and fairness will be essentially unaffected. Incidental correlated features may not be appropriate decision bases, or they may represent unfairness risks. Incidental feature risks are highest when appropriate predictors are not included in the model. Therefore, careful consideration of what might be missing is crucial.

    Dataset

    This article builds on results from a previous blog post and uses the same dataset and code base to illustrate the effects of missing and incidental features [1, 2]. In brief, I use a publicly-available loans dataset, in which the outcome is loan default status (binary), and predictors include income, employment length, debt load, etc. I preferentially (but randomly) sort lower-income cases into a made-up “female” category, and for simplicity consider only two gender categories (“males” and “females”). The result is that “females” on average have a lower income, but male and female incomes overlap; some females are high-income, and some males low-income. Examining common fairness and performance metrics, I found similar results whether the model relied on income or on gender to predict defaults, illustrating risks of relying only on metrics to detect bias.
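
    As a rough illustration of that setup (this is not the author's dataset or code; the column names and numbers are invented), comparable data with an income-correlated "female" indicator can be constructed and a model fitted as follows:

        import numpy as np
        import pandas as pd
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(42)
        n = 20_000

        # Income drives default risk in this toy stand-in for the loans data.
        income = rng.lognormal(mean=10.5, sigma=0.5, size=n)
        p_default = 1 / (1 + np.exp((income - np.median(income)) / income.std()))
        default = rng.binomial(1, p_default)

        # Preferentially (but randomly) assign lower-income cases to a made-up
        # "female" category, so the flag correlates with income but adds no new info.
        income_rank = pd.Series(income).rank(pct=True).to_numpy()
        female = rng.binomial(1, 0.8 * (1 - income_rank) + 0.1)

        X = pd.DataFrame({"income": income, "female": female})
        X_train, X_test, y_train, y_test = train_test_split(X, default, random_state=0)

        rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

    An XGBoost classifier could be fitted on the same frame to reproduce the kind of comparison the article describes.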

    My previous blog post showed what happens when an incidental feature substitutes for an appropriate feature. Here, I will discuss what happens when both the appropriate predictor and the incidental feature are included in the data. I test two model types, and show that, as might be expected, the female status contributes to predictions despite the fact it contains no additional information. However, the incidental feature contributes much more to the random forest model than to the XGBoost model, suggesting that model selection may help reduce unfairness risk, although tradeoffs should be considered.

    Fairness metrics and global importances

    In my example, the female feature adds no information to a model that already contains income. Any reliance on female status is unnecessary and represents “direct discrimination” risk. Ideally, a machine learning algorithm would ignore such a feature in favor of the stronger predictor.

    When the incidental feature, female status, is added to either a random forest or XGBoost model, I see little change in overall performance characteristics or performance metrics (data not shown). ROC scores barely budge (as should be expected). False positive rates show very slight changes.

    Demographic parity, or the difference in loan default rates for females vs. males, remains essentially unchanged for XGBoost (5.2% vs. 5.3%) when the female indicator is included, but for random forest, this metric does change from 4.3% to 5.0%; I discuss this observation in detail below.
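
    For concreteness, a demographic parity difference of this kind can be computed as the gap in predicted default rates between the two groups. A minimal sketch, reusing the illustrative rf model and X_test frame from the earlier snippet (not the author's code):

        def demographic_parity_diff(model, X, group_col="female"):
            preds = model.predict(X)                  # hard 0/1 default predictions
            rate_female = preds[X[group_col] == 1].mean()
            rate_male = preds[X[group_col] == 0].mean()
            return rate_female - rate_male

        print(demographic_parity_diff(rf, X_test))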

    Global permutation importances show weak influences from the female feature for both model types. This feature ranks 12/14 for the random forest model, and 22/26 for XGBoost (when female=1). The fact that female status is of relatively low importance may seem reassuring, but any influence from this feature is a fairness risk.
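
    Global permutation importances like those reported here can be obtained with scikit-learn; a small sketch, again on the illustrative model and data rather than the article's actual ones:

        from sklearn.inspection import permutation_importance

        # How much does shuffling each feature degrade held-out performance?
        result = permutation_importance(rf, X_test, y_test, n_repeats=10,
                                        random_state=0, scoring="roc_auc")
        ranked = sorted(zip(X_test.columns, result.importances_mean),
                        key=lambda pair: -pair[1])
        for name, importance in ranked:
            print(f"{name}: {importance:.4f}")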

    There are no clear red flags in global metrics when female status is included in the data — but this is expected as fairness metrics are similar whether decisions are based on an incidental or causal factor [1]. The key question is: does incorporation of female status increase disparities in outcome?

    Aggregated shapley values

    We can measure the degree to which a feature contributes to differences in group predictions using aggregated Shapley values [3]. This technique distributes differences in predicted outcome rates across features so that we can determine what drives differences for females vs. males. Calculation involves constructing a reference dataset consisting of randomly selected males, calculating Shapley feature importances for randomly-selected females using this “foil”, and then aggregating the female Shapley values (also called “phi” values).
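
    The shap library offers one way to implement this kind of aggregation. The sketch below follows the recipe described above but is only an assumed setup (reusing rf, X_test and the pandas import from the earlier snippet), not the author's exact procedure: randomly sampled males form the background "foil", Shapley (phi) values are computed for randomly sampled females, and the per-feature values are averaged:

        import shap

        males = X_test[X_test["female"] == 0].sample(200, random_state=0)
        females = X_test[X_test["female"] == 1].sample(200, random_state=0)

        # Interventional TreeExplainer with males as the reference distribution.
        explainer = shap.TreeExplainer(rf, data=males)
        phi = explainer.shap_values(females, check_additivity=False)

        # Depending on the shap version, output may be a per-class list or array;
        # keep the values for the positive (default) class.
        if isinstance(phi, list):
            phi = phi[1]
        elif phi.ndim == 3:
            phi = phi[..., 1]

        aggregated = pd.Series(phi.mean(axis=0), index=females.columns)
        print(aggregated.sort_values())   # per-feature share of the female/male gap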

    Results are shown below for both model types, with and without the “female” feature. The top 5 features for the model not including female are plotted along with female status for the model that includes that feature. All other features are summed into “other”.

    [Figure: aggregated Shapley values per feature for the random forest and XGBoost models, with and without the female feature. Image by author]

    First, note that the blue bar for female (present for the model including female status only) is much larger for random forest than for XGBoost. The bar magnitudes indicate the amount of probability difference for women vs. men that is attributed to a feature. For random forest, the female status feature increases the probability of default for females relative to males by 1.6%, compared to 0.3% for XGBoost, an ~5x difference.

    For random forest, female status ranks in the top 3 influential features in determining the difference in prediction for males vs. females, even though the feature was the 12th most important globally. The global importance does not capture this feature’s impact on fairness.

    As mentioned in the section above, the random forest model shows decreased demographic parity when female status is included in the model. This effect is also apparent in the Shapley plots: the increase due to the female bar is not compensated for by any decrease in the other bars. For XGBoost, the small contribution from female status appears to be offset by tiny decreases in contributions from other features.

    The reduced impact of the incidental feature for XGBoost compared to random forest makes sense when we think about how the algorithms work. Random forests create trees using random subsets of features, which are examined for optimal splits. Some of these initial feature sets will include the incidental feature but not the appropriate predictor, in which case incidental features may be chosen for splits. For XGBoost models, split criteria are based on improvements to a previous model. An incidental feature can’t improve a model based on a stronger predictor; therefore, after several rounds, we expect trees to include the appropriate predictor only.

    Demographic parity decreases for random forest can also be understood considering model building mechanisms. When a subset of features to be considered for a split is generated in the random forest, we essentially have two “income” features, and so it’s more likely that (direct or indirect) income information will be selected.

    The random forest model effectively uses a larger feature set than XGBoost. Although numerous features are likely to appear in both model types to some degree, XGBoost solutions will be weighted towards a smaller set of more predictive features. This reduces, but does not eliminate, risks related to incidental features for XGBoost.

    Is XGBoost fairer than Random Forest?

    In a previous blog post [4], I showed that incorporation of interactions to mitigate feature bias was more effective for XGBoost than for random forest (for one test scenario). Here, I observe that the XGBoost model is also less influenced by incidental information. Does this mean that we should prefer XGBoost for fairness reasons?

    XGBoost has advantages when both an incidental and appropriate feature are included in the data but doesn’t reduce risk when only the incidental feature is included. A random forest model’s reliance on a larger set of features may be a benefit, especially when additional features are correlated with the missing predictor.

    Furthermore, the fact that XGBoost doesn’t rely much on the incidental feature does not mean that it doesn’t contribute at all. It may be that only a smaller number of decisions are based on inappropriate information.

    Leaving fairness aside, the fact that the random forest samples a larger portion of what you might think of as the “solution space”, and relies on more predictors, may have some advantages for model robustness. When a model is deployed and faces unexpected errors in data, the random forest model may be somewhat more able to compensate. (On the other hand, if random forest incorporates a correlated feature that is affected by errors, it might be compromised while an XGBoost model remains unaffected.)

    XGBoost may have some fairness advantages, but the “fairest” model type is context-dependent, and robustness and accuracy must also be considered. I feel that fairness testing and explainability, as well as thoughtful feature choices, are probably more valuable than model type in promoting fairness.

    What am I missing?

    Fairness considerations are crucial in feature selection for models that might affect lives. There are numerous existing feature selection methods, which generally optimize accuracy or predictive power, but do not consider fairness. One question that these don’t address is “what feature am I missing?”

    A model that relies on an incidental feature that happens to be correlated with a strong predictor may appear to behave in a reasonable manner, despite making unfair decisions [1]. Therefore, it’s very important to ask yourself, “what’s missing?” when building a model. The answer to this question may involve subject matter expertise or additional research. Missing predictors thought to have causal effects may be especially important to consider [5, 6].

    Obviously, the best solution for a missing predictor is to incorporate it. Sometimes, this may be impossible. Some effects can’t be measured or are unobtainable. But you and I both know that simple unavailability seldom determines the final feature set. Instead, it’s often, “that information is in a different database and I don’t know how to access it”, or “that source is owned by a different group and they are tough to work with”, or “we could get it, but there’s a license fee”. Feature choice generally reflects time and effort — which is often fine. Expediency is great when it’s possible. But when fairness is compromised by convenience, something does need to give. This is when fairness testing, aggregated Shapley plots, and subject matter expertise may be needed to make the case to do extra work or delay timelines in order to ensure appropriate decisions.

    What am I including?

    Another key question is “what am I including?”, which can often be restated as “for what could this be a proxy?” This question can be superficially applied to every feature in the dataset but should be very carefully considered for features identified as contributing to group differences; such features can be identified using aggregated Shapley plots or individual explanations. It may be useful to investigate whether such features contribute additional information above what’s available from other predictors.

    Who am I like, and what have they done before?

    A binary classification model predicting something like loan defaults, likelihood to purchase a product, or success at a job, is essentially asking the question, “Who am I like, and what have they done before?” The word “like” here means similar values of the features included in the data, weighted according to their predictive contribution to the model. We then model (or approximate) what this cohort has done in the past to generate a probability score, which we believe is indicative of future results for people in that group.

    The “who am I like?” question gets to the heart of worries that people will be judged if they eat too many pierogis, own too many cats, or just happen to be a certain race, sex, or ethnicity. The concern is that it is just not fair to evaluate individual people due to their membership in such groups, regardless of the average outcome for overall populations. What is appropriate depends heavily on context — perhaps pierogis are fine to consider in a heart attack model, but would be worrisome in a criminal justice setting.

    Our models assign people to groups — even if models are continuous, we can think of that as the limit of very little buckets — and then we estimate risks for these populations. This isn’t much different than old-school actuarial tables, except that we may be using a very large feature set to determine group boundaries, and we may not be fully aware of the meaning of information we use in the process.

    Final thoughts

    Feature choice is more than a mathematical exercise, and likely requires the judgment of subject matter experts, compliance analysts, or even the public. A data scientist’s contribution to this process should involve using explainability techniques to examine populations and discover the features driving group differences. We can also identify at-risk populations and ask questions about features known to have causal relationships with outcomes.

    Legal and compliance departments often focus on included features, and their concerns may be primarily related to specific types of sensitive information. Considering what’s missing from a model is not very common. However, the question, “what’s missing?” is at least as important as, “what’s there?” in confirming that models make fair and appropriate decisions.

    Data scientists can be scrappy and adept at producing models with limited or noisy data. There is something satisfying about getting a model that “works” from less than ideal information. It can be hard to admit that something can’t be done, but sometimes fairness dictates that what we have right now really isn’t enough — or isn’t enough yet.

    Author: Valerie Carey

    Source: Towards Data Science

     
  • Digital transformation strategies and tech investments often at odds


    While decision makers are well aware that digital transformation is essential to their organizations’ future, many are jumping into new technologies that don’t align with their current digital transformation pain points, according to a new report from PointSource, a division of Globant that provides IT solutions.

    All too often decision makers invest in technologies without taking a step back to assess how those technologies fit into their larger digital strategy and business goals, the study said. While the majority of such companies perceive these investments as a fast track to the next level of digital maturity, they are actually taking an avoidable detour. 

    PointSource surveyed more than 600 senior-level decision makers and found that a majority are investing in technology that they don’t feel confident using. In fact, at least a quarter plan to invest more than 25 percent of their 2018 budgets in artificial intelligence (AI), blockchain, voice-activated technologies or facial-recognition technologies.

    However, more than half (53 percent) of companies do not feel prepared to effectively use AI, blockchain or facial-recognition technologies.


    Companies are actively focusing on digital transformation, the survey showed. Ninety-four percent have increased focus on digital growth within the last year, and 90 percent said digital plays a central role in their overarching business goals.

    Fifty-seven percent of senior managers are unsatisfied with one or more of the technologies their organizations’ employees rely on.

    Many companies feel digitally outdated, with 45 percent of decision makers considering their company’s digital infrastructure to be outdated compared with that of their competitors.

    Author: Bob Violino

    Source: Information Management

  • Do data scientists have the right stuff for the C-suite?

    What distinguishes strong from weak leaders? This raises the question of whether leaders are born or made. It is the classic “nature versus nurture” debate. What matters more: genes or environment?

    This question got me thinking about whether data scientists and business analysts within an organization can be more than just a support to others. Can they become leaders similar to C-level executives?

    Three primary success factors for effective leaders

    Having knowledge means nothing without having the right types of people. One person can make a big difference: someone who somehow gets it all together and changes the fabric of an organization’s culture, not by mandating change but by engaging and motivating others.

    For weak and ineffective leaders, irritating people is not only a sport but also their personal entertainment. They are rarely successful.

    One way to view successful leadership is to consider that there are three primary success factors for effective leaders. They are (1) technical competence, (2) critical thinking skills, and (3) communication skills. 

    You know there is a problem when a leader says, “I don’t do that; I have people who do that.” Good leaders do not necessarily have high intelligence, good memories, deep experience, or innate abilities that they are born with. They have problem solving skills. 

    As an example, the Ford Motor Company’s CEO Alan Mulally came to the automotive business from Boeing in the aerospace industry. He was without deep automotive industry experience. He has been successful at Ford. Why? Because he is an analytical type of leader.

    Effective managers are analytical leaders who are adaptable and possess systematic and methodical ways to achieve results. It may sound corny, but they apply the “scientific method”: formulating hypotheses and testing them to prove or disprove them. We are back to basics.

    A major contributor to the “scientific method” was the German mathematician and astronomer Johannes Kepler. In the early 1600s Kepler’s three laws of planetary motion led to the Scientific Revolution. His three laws made the complex simple and understandable, suggesting that the seemingly inexplicable universe is ultimately lawful and within the grasp of the human mind. 

    Kepler did what analytical leaders do. They rely on searching for root causes and understanding cause-and-effect logic chains. Ultimately a well-formulated strategy, talented people, and the ability to execute the executive team’s strategy through robust communications are the key to performance improvement. 

    Key characteristics of the data scientist or analyst as leader

    The popular Moneyball book, and the subsequent movie about baseball in the US, demonstrated that traditional baseball scouting methods (e.g., “He’s got a good swing.”) gave way to fact-based evidence and statistical analysis. Commonly accepted traits of a leader, such as being charismatic or strong, may also be misleading.

    My belief is that the most scarce resource in an organization is human ability and competence. That is why organizations should desire that every employee be developed for growth in their skills. But having sound competencies is not enough. Key personal qualities complete the package of an effective leader. 

    For a data scientist or analyst to evolve into an effective leader, three personal qualities are needed: curiosity, imagination, and creativity. The three are sequentially linked. Curious people constantly ask “Why are things the way they are?” and “Is there a better way of doing things?” Without these personal qualities, innovation will be stifled. The emergence of analytics is creating opportunities for analysts as leaders.

    Weak leaders are prone to a diagnostic bias. They can be blind to evidence and somehow believe their intuition, instincts, and gut-feel are acceptable masquerades for having fact-based information. In contrast, a curious person always asks questions. They typically love what they do. If they are also a good leader they infect others with enthusiasm. Their curiosity leads to imagination. Imagination considers alternative possibilities and solutions. Imagination in turn sparks creativity.

    Creativity is the implementation of imagination

    Good data scientists and analysts have a primary mission: to gain insights relying on quantitative techniques to result in better decisions and actions. Their imagination that leads to creativity can also result in vision. Vision is a mark of a good leader. In my mind, an executive leader has one job (aside from hiring good employees and growing them). That job is to answer the question, “Where do we want to go?” 

    After that question is answered, managers and analysts, ideally supported by the CFO’s accounting and finance team, can answer the follow-up question, “How are we going to get there?” That is where analytics are applied with the various enterprise and corporate performance management (EPM/CPM) methods that I regularly write about. EPM/CPM methods include a strategy map and its associated balanced scorecard with KPIs; customer profitability analysis; enterprise risk management (ERM); and capacity-sensitive, driver-based rolling financial forecasts and plans. Collectively they assure that the executive team’s strategy can be fully executed.

    My belief is that other perceived characteristics of a good leader are over-rated. These include ambition, team spirit, collegiality, integrity, courage, tenacity, discipline, and confidence. They are nice-to-have characteristics, but they pale compared to the technical competency, critical thinking, and communication skills that I described earlier.

    Be analytical and you can be a leader. You can eventually serve in a C-suite role.

    Author: Gary Cokins 

    Source: Information Management

  • E-commerce and the growing importance of data

    E-commerce and the growing importance of data

    E-commerce is claiming a bigger role in global retail. In the US for example, e-commerce currently accounts for approximately 10% of all retail sales, a number that is projected to increase to nearly 18% by 2021. To a large extent, the e-commerce of the present exists in the shadow of the industry’s early entrant and top player, Amazon. Financial analysts predict that the retail giant will control 50% of the US’ online retail sales by 2021, leaving other e-commerce stores frantically trying to take a page out of the company’s incredibly successful online retail playbook.

    While it seems unlikely that another mega-retailer will rise to challenge Amazon’s e-commerce business in the near future, at least 50% of the online retail market is wide open. Smaller and niche e-commerce stores have a large opportunity to reach specialized audiences, create return customers, and cultivate persistent brand loyalty. Amazon may have had a first-mover advantage, but the rise in big data and the ease of access to analytics means that smaller companies can find areas in which to compete and improve margins. As e-retailers look for ways to expand revenues while remaining lean, data offers a way forward for smart scalability.

    Upend your back-end

    While data can improve e-commerce’s customer-facing interactions, it can have just as major an impact on the customer experience factors that take place off camera. Designing products that customers want, having products in stock, making sure that products ship on schedule: all these kinds of back-end operations play a part in shaping customer experience and satisfaction. In order to shift e-commerce from a product-centric to a consumer-centric model, e-commerce companies need to invest in unifying customer data to inform internal processes and provide faster, smarter services.

    The field of drop shipping, for instance, is coming into its own thanks to smart data applications. Platforms like Oberlo are leveraging prescriptive analytics to enable intelligent product selection for Shopify stores, helping them curate trending inventory that sells and allowing almost anyone to create their own e-store. Just as every customer touchpoint can be enhanced with big data, e-commerce companies that apply unified big data solutions to their behind-the-scenes operations benefit from streamlined processes and workflows.

    Moreover, e-commerce companies that harmonize data across departments can identify purchasing trends and act on real-time data to optimize inventory processes. Using centralized data warehouse software like Snowflake empowers companies to create a single version of customer truth to automate reordering points and determine what items they should be stocking in the future. Other factors, such as pricing decisions, can also be finessed using big data to generate specific prices per product that match customer expectations and subsequently sell better.
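
    As a simple illustration of what automating a reorder point can look like (the formula is a generic inventory rule and the numbers are invented, not tied to any particular warehouse or tool), see the sketch below:

        import pandas as pd

        sales = pd.DataFrame({
            "sku": ["A1", "A1", "A1", "B2", "B2", "B2"],
            "units_sold_per_day": [12, 15, 13, 4, 5, 3],
        })

        lead_time_days = 7      # assumed supplier lead time
        safety_factor = 1.65    # ~95% service level under a normal-demand assumption

        stats = sales.groupby("sku")["units_sold_per_day"].agg(["mean", "std"])
        stats["reorder_point"] = (stats["mean"] * lead_time_days
                                  + safety_factor * stats["std"] * lead_time_days ** 0.5)
        print(stats)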

    Data transforms the customer experience

    When it comes to how data can impact the overall customer experience, e-commerce companies don’t have to reinvent the wheel. There’s a plethora of research that novice and veteran data explorers can draw on when it comes to optimizing customer experiences on their websites. General findings on the time it takes for customers to form an opinion of a website, customers’ mobile experience expectations, the best times to send promotional emails, and many more metrics can guide designers and developers tasked with improving e-commerce site traffic.

    However, e-commerce sites that are interested in more benefits will need to invest in more specific data tools that provide a 360-degree view of their customers. Prescriptive analytic tools like Tableau empower teams to connect the customer dots by synthesizing data across devices and platforms. Data becomes valuable as it provides insights that allow companies to make smarter decisions based on each consumer, identify inbound marketing opportunities, and automate recommendations and discounts based on the customer’s previous behavior.

    Data can also inspire changes in a field that has always dominated the customer experience: customer support. The digital revolution has brought substantial changes in the once sleepy field of customer service, pioneering new ways of direct communication with agents via social media and introducing the now ubiquitous AI chatbots. In order to provide the highest levels of customer satisfaction throughout these new initiatives, customer support can utilize data to anticipate when they might need more human agents staffing social media channels or the type of AI persona that their customers want to deal with. By improving customer service with data, e-commerce companies can improve the entire customer experience.

    Grow with your data

    As more and more data services migrate to the cloud, e-commerce companies have ever-expanding access to flexible data solutions that both fuel growth and scale alongside the businesses they’re helping. Without physical stores to facilitate face-to-face relationships, e-commerce companies are tasked with transforming their digital stores into online spaces that customers connect with and ultimately want to purchase from again and again.

    Data holds the key to this revolution. Instead of trying to force their agenda upon customers or engage in wild speculations about customer desires, e-commerce stores can use data to craft narratives that engage customers, create a loyal brand following, and drive increasing profits. With only about 2.5% of e-commerce and web visits converting to sales on average, e-commerce companies that want to stay competitive must open themselves up to big data and the growth opportunities it offers.

    Author: Ralph Tkatchuck

    Source: Dataconomy

  • A first impression of the merger between Cloudera and Hortonworks

    A first impression of the merger between Cloudera and Hortonworks

    A few months ago it was announced that big data companies Cloudera and Hortonworks would merge. The deal has since been approved and Cloudera and Hortonworks are continuing as one company. Techzine spoke with Wim Stoop, senior product marketing manager at Cloudera. Stoop knows all the ins and outs of the vision behind this merger and what it means for the companies and data analysts who work with the two firms' products.

    Stoop explains that this merger is more or less the perfect marriage. Both companies focus on big data based on Hadoop and have specialised in it over recent years. Hortonworks, for example, is very good at Hortonworks DataFlow (HDF): working with streaming data that has to be added to the Hadoop platform quickly.

    Cloudera Data Science Workbench

    With its Data Science Workbench, Cloudera has a strong solution for data analysts. It lets them combine and analyse data quickly and easily, without immediately needing extreme amounts of compute power. With Cloudera's workbench you can experiment and test to see what the outcomes look like before applying them at scale. The main advantage is that the workbench supports a great many programming languages, so every data analyst can work in his or her favourite language. The workbench also records exactly which steps were taken to reach a result. The outcome matters, but the algorithm and methods that lead to the end result are at least as important.

    The route to a single solution

    If you look deeper, there are of course many more areas where either Hortonworks or Cloudera excels, or where one technology is just a bit better or more efficient than the other. That will force the new company to make hard choices, but according to Stoop this will work out fine. The need for a good data platform is enormous, so making choices is inevitable. Ultimately the company is also responding to the criticism levelled at Hadoop: Hadoop itself forms the basis of the database, but on top of it there are so many different modules for ingesting, reading or processing data that the overview is sometimes hard to find. The fact that there are so many solutions stems from the open source character and from the support of companies like Cloudera and Hortonworks, which are the largest contributors to many projects. That too will change with this merger. Later this year a new platform called Cloudera Data Platform will be released, combining the best components of Hortonworks and Cloudera. It also means that where projects or modules conflict, the outcome will be good news for one and bad news for the other. For processing metadata, for example, the two companies currently use different solutions; in the Cloudera Data Platform we will see only one. As a result the number of modules shrinks and everything becomes easier to oversee, which is positive for everyone involved.

    Cloudera Data Platform

    The name of the new company had not yet come up. The companies opted for a merger, but the Hortonworks name will eventually disappear: the company continues as Cloudera, hence the name Cloudera Data Platform. The intention is for the Cloudera Data Platform to become available later this year, so customers can start testing it. As soon as the platform is stable and mature enough, customers will be advised to migrate to it. All existing Cloudera and Hortonworks products will eventually disappear, but they will remain fully supported until the end of 2022. After that, everyone will have to move to the Cloudera Data Platform. Cloudera has already taken a migration path into account in the most recent versions of its current products, and Hortonworks will now do the same. The company will take steps to ensure that existing products and the new Data Platform can work together during the migration to the new platform.

    Shared data experience

    Another innovation that Stoop expects to become increasingly important is the shared data experience. When customers use Cloudera products, their Hadoop environments can easily be linked, so that resources (CPU, GPU, memory) can be combined when analysing data. Suppose a company has Cloudera environments for data analysis in its own data centres as well as on cloud platforms, and suddenly has to analyse a very large project. In that case it could combine all those environments and deploy them jointly. It also becomes possible to combine data from local offices and branches, for example.

    The merger makes more innovation possible

    According to Stoop, a huge advantage of this merger is the development capacity that becomes available for building new, innovative solutions. The two companies were often working separately on comparable projects; both contributed, for example, to different projects for handling metadata in Hadoop. In the end one of the two was reinventing the wheel, and that is no longer necessary. Given the current labour market, finding developers with the right passion and knowledge for data analysis is extremely difficult. With this merger the work can be done far more efficiently, and a good number of teams can be dedicated to developing new, innovative solutions. This week the Hortonworks Datasummit takes place in Barcelona, where more will undoubtedly be announced about the merger, the products, and the status of the new Cloudera Data Platform.

    Author: Coen van Eenbergen

    Source: Techzine

     

  • Effective data analysis methods in 10 steps

    Effective data analysis methods in 10 steps

    In this data-rich age, understanding how to analyze and extract true meaning from the digital insights available to our business is one of the primary drivers of success.

    Despite the colossal volume of data we create every day, a mere 0.5% is actually analyzed and used for data discovery, improvement, and intelligence. While that may not seem like much, considering the amount of digital information we have at our fingertips, half a percent still accounts for a huge amount of data.

    With so much data and so little time, knowing how to collect, curate, organize, and make sense of all of this potentially business-boosting information can be a minefield, but online data analysis is the solution.

    To help you understand the potential of analysis and how you can use it to enhance your business practices, we will answer a host of important analytical questions. Not only will we explore data analysis methods and techniques, but we’ll also look at different types of data analysis while demonstrating how to do data analysis in the real world with a 10-step blueprint for success.

    What is a data analysis method?

    Data analysis methods focus on strategic approaches to taking raw data, mining for insights that are relevant to a business’s primary goals, and drilling down into this information to transform metrics, facts, and figures into initiatives that drive improvement.

    There are various methods for data analysis, largely based on two core areas: quantitative data analysis methods and data analysis methods in qualitative research.

    Gaining a better understanding of different data analysis techniques and methods, in quantitative research as well as qualitative insights, will give your information analyzing efforts a more clearly defined direction, so it’s worth taking the time to allow this particular knowledge to sink in.

    Now that we’ve answered the question ‘what is data analysis?’ and considered the different types of data analysis methods, it’s time to dig deeper into how to do data analysis by working through these 10 essential elements.

    1. Collaborate your needs

    Before you begin to analyze your data or drill down into any analysis techniques, it’s crucial to sit down collaboratively with all key stakeholders within your organization, decide on your primary campaign or strategic goals, and gain a fundamental understanding of the types of insights that will best benefit your progress or provide you with the level of vision you need to evolve your organization.

    2. Establish your questions

    Once you’ve outlined your core objectives, you should consider which questions will need answering to help you achieve your mission. This is one of the most important steps in data analytics as it will shape the very foundations of your success.

    To help you ask the right things and ensure your data works for you, you have to ask the right data analysis questions.

    3. Harvest your data

    After giving your data analytics methodology real direction and knowing which questions need answering to extract optimum value from the information available to your organization, you should decide on your most valuable data sources and start collecting your insights, the most fundamental of all data analysis techniques.

    4. Set your KPIs

    Once you’ve set your data sources, started to gather the raw data you consider to potentially offer value, and established clearcut questions you want your insights to answer, you need to set a host of key performance indicators (KPIs) that will help you track, measure, and shape your progress in a number of key areas.

    KPIs are critical to both data analysis methods in qualitative research and data analysis methods in quantitative research. This is one of the primary methods of analyzing data you certainly shouldn’t overlook.

    To help you set the best possible KPIs for your initiatives and activities, explore our collection of key performance indicator examples.

    5. Omit useless data

    Having defined your mission and bestowed your data analysis techniques and methods with true purpose, you should explore the raw data you’ve collected from all sources and use your KPIs as a reference for chopping out any information you deem to be useless.

    Trimming the informational fat is one of the most crucial steps of data analysis as it will allow you to focus your analytical efforts and squeeze every drop of value from the remaining ‘lean’ information.

    Any stats, facts, figures, or metrics that don’t align with your business goals or fit with your KPI management strategies should be eliminated from the equation.

    6. Conduct statistical analysis

    One of the most pivotal steps of data analysis methods is statistical analysis.

    This analysis method focuses on aspects including cluster, cohort, regression, factor, and neural networks and will ultimately give your data analysis methodology a more logical direction.

    Here is a quick glossary of these vital statistical analysis terms for your reference; a short illustrative sketch follows the list:

    • Cluster: The action of grouping a set of elements in a way that said elements are more similar (in a particular sense) to each other than to those in other groups, hence the term ‘cluster’.
    • Cohort: A subset of behavioral analytics that takes insights from a given data set (e.g. a web application or CMS) and instead of looking at everything as one wider unit, each element is broken down into related groups.
    • Regression: A definitive set of statistical processes centered on estimating the relationships among particular variables to gain a deeper understanding of particular trends or patterns.
    • Factor: A statistical practice utilized to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called ‘factors’. The aim here is to uncover independent latent variables.
    • Neural networks: A neural network is a form of machine learning which is far too comprehensive to summarize, but this explanation will help paint you a fairly comprehensive picture.
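
    As a toy illustration of the first and third items (clustering and regression), here is a brief scikit-learn sketch; the data and variable names are invented for the example:

        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.linear_model import LinearRegression

        rng = np.random.default_rng(0)
        spend = rng.gamma(shape=2.0, scale=50.0, size=500)   # e.g. monthly spend
        visits = 0.1 * spend + rng.normal(0, 3, size=500)    # a correlated metric

        X = np.column_stack([spend, visits])

        # Cluster: group similar customers together.
        segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

        # Regression: estimate the relationship between spend and visits.
        trend = LinearRegression().fit(spend.reshape(-1, 1), visits)

        print("segment sizes:", np.bincount(segments))
        print("estimated visits per unit of spend:", trend.coef_[0])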

    7. Build a data management roadmap

    While (at this point) this particular step is optional (you will have already gained a wealth of insight and formed a fairly sound strategy by now), creating a data governance roadmap will help your data analysis methods and techniques become successful on a more sustainable basis. These roadmaps, if developed properly, are also built so they can be tweaked and scaled over time.

    Invest ample time in developing a roadmap that will help you store, manage, and handle your data internally, and you will make your analysis techniques all the more fluid and functional.

    8. Integrate technology

    There are many ways to analyze data, but one of the most vital aspects of analytical success in a business context is integrating the right  decision support software and technology.

    Robust analysis platforms will not only allow you to pull critical data from your most valuable sources while working with dynamic KPIs that offer you actionable insights; they will also present the information in a digestible, visual, interactive format from one central, live dashboard. A data analytics methodology you can count on.

    By integrating the right technology for your statistical method data analysis and core data analytics methodology, you’ll avoid fragmenting your insights, saving you time and effort while allowing you to enjoy the maximum value from your business’s most valuable insights.

    9. Answer your questions

    By considering each of the above efforts, working with the right technology, and fostering a cohesive internal culture where everyone buys into the different ways to analyze data as well as the power of digital intelligence, you will swiftly start to answer your most important, burning business questions.

    10. Visualize your data

    Arguably, the best way to make your data analysis concepts accessible across the organization is through data visualization. An online data visualization is a powerful tool as it lets you tell a story with your metrics, allowing users across the business to extract meaningful insights that aid business evolution. It also covers all the different ways to analyze data.

    The purpose of data analysis is to make your entire organization more informed and intelligent, and with the right platform or dashboard, this can be simpler than you think.
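
    As a minimal, purely illustrative sketch (the figures are made up), a handful of matplotlib lines already turn a metric into something shareable; dedicated dashboard tools take this much further.

    import matplotlib.pyplot as plt

    months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
    conversion_rate = [2.1, 2.4, 2.2, 2.9, 3.1, 3.4]  # hypothetical values, in %

    fig, ax = plt.subplots(figsize=(6, 3))
    ax.plot(months, conversion_rate, marker="o")
    ax.set_title("Conversion rate by month")
    ax.set_ylabel("%")
    fig.tight_layout()
    fig.savefig("conversion_rate.png")  # ready to drop into a report or dashboard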

    Data analysis in the big data environment

    Big data is invaluable to today’s businesses, and by using different methods for data analysis, it’s possible to view your data in a way that can help you turn insight into positive action.

    To inspire your efforts and put the importance of big data into context, here are some insights and facts that could prove helpful in shaping your big data analysis techniques.

    • By 2020, around 7 megabytes of new information will be generated every second for every single person on the planet.
    • A 10% boost in data accessibility will result in more than $65 million extra net income for your average Fortune 1000 company.
    • 90% of the world’s big data was created in the past three years.
    • According to Accenture, 79% of notable business executives agree that companies that fail to embrace big data will lose their competitive position and could face extinction. Moreover, 83% of business execs have implemented big data projects to gain a competitive edge.

    Data analysis concepts may come in many forms, but fundamentally, any solid data analysis methodology will help to make your business more streamlined, cohesive, insightful and successful than ever before.

    Author: Sandra Durcevic

    Source: Datapine

  • Exploring the risks of artificial intelligence

    “Science has not yet mastered prophecy. We predict too much for the next year and yet far too little for the next ten.”

    These words, articulated by Neil Armstrong at a speech to a joint session of Congress in 1969, fit squarely into most every decade since the turn of the century, and it seems safe to posit that the rate of change in technology has accelerated to an exponential degree in the last two decades, especially in the areas of artificial intelligence and machine learning.

    Artificial intelligence is making an extreme entrance into almost every facet of society in predicted and unforeseen ways, causing both excitement and trepidation. This reaction alone is predictable, but can we really predict the associated risks involved?

    It seems we’re all trying to get a grip on potential reality, but information overload (yet another side effect that we’re struggling to deal with in our digital world) can ironically make constructing an informed opinion more challenging than ever. In the search for some semblance of truth, it can help to turn to those in the trenches.

    In my continued interviews with over 30 artificial intelligence researchers, I asked what they considered to be the most likely risk of artificial intelligence in the next 20 years.

    Some results from the survey, shown in the graphic below, included 33 responses from different AI/cognitive science researchers. (For the complete collection of interviews, and more information on all of our 40+ respondents, visit the original interactive infographic here on TechEmergence).

    Two “greatest” risks bubbled to the top of the response pool (and the majority are not in the autonomous robots’ camp, though a few do fall into it). According to this particular set of minds, the most pressing short- and long-term risks are the financial and economic harm that may be wrought, along with the mismanagement of AI by human beings.

    Dr. Joscha Bach of the MIT Media Lab and Harvard Program for Evolutionary Dynamics summed up the larger picture this way:

    “The risks brought about by near-term AI may turn out to be the same risks that are already inherent in our society. Automation through AI will increase productivity, but won’t improve our living conditions if we don’t move away from a labor/wage based economy. It may also speed up pollution and resource exhaustion, if we don’t manage to install meaningful regulations. Even in the long run, making AI safe for humanity may turn out to be the same as making our society safe for humanity.”

    Essentially, the introduction of AI may act as a catalyst that exposes and speeds up the imperfections already present in our society. Without a conscious and collaborative plan to move forward, we expose society to a range of risks, from bigger gaps in wealth distribution to negative environmental effects.

    Leaps in AI are already being made in the area of workplace automation and machine learning capabilities are quickly extending to our energy and other enterprise applications, including mobile and automotive. The next industrial revolution may be the last one that humans usher in by their own direct doing, with AI as a future collaborator and – dare we say – a potential leader.

    Some researchers believe it’s a matter of when and not if. In the words of Dr. Nils Nilsson, professor emeritus at Stanford University, “Machines will be singing the song, ‘Anything you can do, I can do better; I can do anything better than you’.”

    In respect to the drastic changes that lie ahead for the employment market due to increasingly autonomous systems, Dr. Helgi Helgason says, “it’s more of a certainty than a risk and we should already be factoring this into education policies.”

    Talks at the World Economic Forum Annual Meeting in Switzerland this past January, where the topic of the economic disruption brought about by AI was clearly a main course, indicate that global leaders are starting to plan how to integrate these technologies and adapt our world economies accordingly – but this is a tall order with many cooks in the kitchen.

    Another commonly expressed risk over the next two decades is the general mismanagement of AI. It’s no secret that those in the business of AI have concerns, as evidenced by the $1 billion investment made by some of Silicon Valley’s top tech gurus to support OpenAI, a non-profit research group with a focus on exploring the positive human impact of AI technologies.

    “It’s hard to fathom how much human-level AI could benefit society, and it’s equally hard to imagine how much it could damage society if built or used incorrectly,” is the parallel message posted on OpenAI’s launch page from December 2015. How we approach the development and management of AI has far-reaching consequences, and shapes future society’s moral and ethical paradigm.

    Philippe Pasquier, an associate professor at Simon Fraser University, said “As we deploy more and give more responsibilities to artificial agents, risks of malfunction that have negative consequences are increasing,” though he likewise states that he does not believe AI poses a high risk to society on its own.

    With great responsibility comes great power, and how we monitor this power is of major concern.

    Dr. Pei Wang of Temple University sees major risk in “neglecting the limitations and restrictions of hot techniques like deep learning and reinforcement learning. It can happen in many domains.” Dr. Peter Voss, founder of SmartAction, expressed similar sentiments, stating that he most fears “ignorant humans subverting the power and intelligence of AI.”

    Thinking about the risks associated with emerging AI technology is hard work, engineering potential solutions and safeguards is harder work, and collaborating globally on implementation and monitoring of initiatives is the hardest work of all. But considering all that’s at stake, I would place all my bets on the table and argue that the effort is worth the risk many times over.

    Source: Tech Crunch

  • Facing the major challenges that come with big data

    Facing the major challenges that come with big data

    Worldwide, over 2.5 quintillion bytes of data are created every day. And with the expansion of the Internet of Things (IoT), that pace is increasing. 90% of the current data in the world was generated in the last two years alone. For a forward-thinking, digitally transforming organization, that means dealing with data. A lot of data. Big data.

    Challenges faced by businesses

    While simply collecting lots of data presents comparatively few problems, most businesses run into two significant roadblocks in its use: extracting value and ensuring responsible handling of data to the standard required by data privacy legislation like GDPR. What most people don’t appreciate is the sheer size and complexity of the data sets that organizations have to store and the related IT effort, requiring teams of people working on processes to ensure that others can access the right data in the right way, when they need it, to drive essential business functions. All while ensuring personal information is treated appropriately.

    The problem comes when you’ve got multiple teams around the world, all running to different beats, without synchronizing. It’s a bit like different teams of home builders, starting work independently, from different corners of a new house. If they have all got their own methods and bricks, then by the time they meet in the middle, their efforts won’t match up. It’s the same in the world of IT. If one team is successful, then all teams should be able to learn those lessons of best practice. Meanwhile, siloed behavior can become “free form development” where developers write code to suit a specific problem that their department is facing, without reference to similar or diverse problems that other departments may be experiencing.

    In addition, there often simply aren’t enough builders to go around to get these data projects turned around quickly, which can be a problem in the face of heightening business demand. In the scramble to get things done at the pace of modern business, at the very least there will be some duplication of effort, but there’s also a high chance of confusion, and the foundations for future data storage and analysis won’t be firm. Creating a unified, standard approach to data processing is critical – as is finding a way to implement it with the lowest possible level of resource, at the fastest possible speeds.

    Data Vault automation

    One of the ways businesses can organize data to meet both the needs for standardization and flexibility is in a Data Vault environment. This data warehousing methodology is designed to bring together information from multiple different teams and systems into a centralized repository, providing a bedrock of information that teams can use to make decisions – it includes all of the data, all of the time, ensuring that no information is missed out of the process.

    However, while a Data Vault design is a good architect’s drawing, it won’t get the whole house built on its own. Developers can still code and build it manually over time but given its complexity they certainly cannot do this quickly, and potentially may not be able to do it in a way that can stand up to the scrutiny of data protection regulations like GDPR. Building a Data Vault environment by hand, even using standard templates, can be incredibly laborious and potentially error prone.

    This is where Data Vault automation comes in, taking care of the 90% or so of an organization’s data infrastructure that fits standardized templates and the stringent requirements that the Data Vault 2.0 methodology demands. Data vault automation can lay out the core landscape of a Data Vault, as well as make use of reliable, consistent metadata to ensure information, including personal information, can be monitored both at its source and over time as records are changed.
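
    To make the idea of metadata-driven automation concrete, here is a minimal sketch (not actual Data Vault 2.0 tooling; the entity and columns are hypothetical) that generates hub and satellite DDL from a small metadata description.

    # Generate simple hub/satellite CREATE TABLE statements from metadata.
    metadata = {
        "entity": "customer",
        "business_key": "customer_number",
        "attributes": ["name", "email", "country"],  # hypothetical source columns
    }

    def hub_ddl(meta):
        return (
            f"CREATE TABLE hub_{meta['entity']} (\n"
            f"  hub_{meta['entity']}_key CHAR(32) PRIMARY KEY,\n"
            f"  {meta['business_key']} VARCHAR(100) NOT NULL,\n"
            "  load_date TIMESTAMP NOT NULL,\n"
            "  record_source VARCHAR(50) NOT NULL\n"
            ");"
        )

    def satellite_ddl(meta):
        cols = ",\n".join(f"  {attr} VARCHAR(255)" for attr in meta["attributes"])
        return (
            f"CREATE TABLE sat_{meta['entity']} (\n"
            f"  hub_{meta['entity']}_key CHAR(32) NOT NULL,\n"
            "  load_date TIMESTAMP NOT NULL,\n"
            "  hash_diff CHAR(32) NOT NULL,\n"
            f"{cols},\n"
            f"  PRIMARY KEY (hub_{meta['entity']}_key, load_date)\n"
            ");"
        )

    print(hub_ddl(metadata))
    print(satellite_ddl(metadata))

    The point of the sketch is simply that one consistent metadata description, rather than hand-written DDL, drives the repetitive parts of the build.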

    Author: Dan Linstedt

    Source: Insidebigdata

  • Five Mistakes That Can Kill Analytics Projects

    Launching an effective digital analytics strategy is a must-do to understand your customers. But many organizations are still trying to figure out how to get business values from expensive analytics programs. Here are 5 common analytics mistakes that can kill any predictive analytics effort.

    Why predictive analytics projects fail


    Predictive Analytics is becoming the next big buzzword in the industry. But according to Mike Le, co-founder and chief operating officer at CB/I Digital in New York, implementing an effective digital analytics strategy has proven to be very challenging for many organizations. “First, the knowledge and expertise required to set up and analyze digital analytics programs is complicated,” Le notes. “Second, the investment for the tools and such required expertise could be high. Third, many clients see unclear returns from such analytics programs. Learning to avoid common analytics mistakes will help you save a lot of resources to focus on core metrics and factors that can drive your business ahead.” Here are 5 common mistakes that Le says cause many predictive analytics projects to fail.

    Mistake 1: Starting digital analytics without a goal

    “The first challenge of digital analytics is knowing what metrics to track, and what value to get out of them,” Le says. “As a result, we see too many web businesses that don’t have basic conversion tracking setup, or can’t link the business results with the factors that drive those results. This problem happens because these companies don’t set a specific goal for their analytics. When you do not know what to ask, you cannot know what you'll get. The purpose of analytics is to understand and to optimize. Every analytics program should answer specific business questions and concerns. If your goal is to maximize online sales, naturally you’ll want to track the order volume, cost-per-order, conversion rate and average order value. If you want to optimize your digital product, you’ll want to track how users interact with your product, the usage frequency and the churn rate of people leaving the site. When you know your goal, the path becomes clear.”

    Mistake 2: Ignoring core metrics to chase noise

    “When you have advanced analytics tools and strong computational power, it’s tempting to capture every data point possible to ‘get a better understanding’ and ‘make the most of the tool,’” Le explains. “However, following too many metrics may dilute your focus on the core metrics that reveal the pressing needs of the business. I've seen digital campaigns that fail to convert new users, but the managers still set up advanced tracking programs to understand user behaviors in order to serve them better. When you cannot acquire new users, your targeting could be wrong, your messaging could be wrong or there is even no market for your product - those problems are much bigger to solve than trying to understand your user engagement. Therefore, it would be a waste of time and resources to chase fancy data and insights while the fundamental metrics are overlooked. Make sure you always stay focused on the most important business metrics before looking broader.”

    Mistake 3: Choosing overkill analytics tools

    “When selecting analytics tools, many clients tend to believe that more advanced and expensive tools can give deeper insights and solve their problems better,” Le says. “Advanced analytics tools may offer more sophisticated analytic capabilities over some fundamental tracking tools. But whether your business needs all those capabilities is a different story. That's why the decision to select an analytics tool should be based on your analytics goals and business needs, not by how advanced the tools are. There’s no need to invest a lot of money on big analytics tools and a team of experts for an analytics program while some advanced features of free tools like Google Analytics can already give you the answers you need.”

    Mistake 4: Creating beautiful reports with little business value

    “Many times you see reports that simply present a bunch of numbers exported from tools, or state some ‘insights’ that have little relevance to the business goal,” Le notes. “This problem is so common in the analytics world, because a lot of people create reports for the sake of reporting. They don’t think about why those reports should exist, what questions they answer and how those reports can add value to the business. Any report must be created to answer a business concern. Any metrics that do not help answer business questions should be left out. Making sense of data is hard. Asking the right questions early will help.”

    Mistake 5: Failing to detect tracking errors

    “Tracking errors can be devastating to businesses, because they produce unreliable data and misleading analysis,” Le cautions. “But many companies do not have the skills to set up tracking properly, and worse, to detect tracking issues when they happen. There are many things that can go wrong, such as a developer mistakenly removing the tracking pixels, transferring incorrect values, the tracking code firing unstably or multiple times, wrong tracking rule logic, etc. The difference could be so subtle that the reports look normal, or are only wrong in certain scenarios. Tracking errors easily go undetected because detecting them takes a mix of marketing and tech skills. Marketing teams usually don’t understand how tracking works, and development teams often don’t know what ‘correct’ means. To tackle this problem, you should frequently check your data accuracy and look for unusual signs in reports. Analysts should take an extra step to learn the technical aspect of tracking, so they can better sense the problems and raise smart questions for the technical team when the data looks suspicious.”
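
    As a small, hypothetical illustration of the kind of data-accuracy check described above (the file and column names are made up), the snippet flags days whose event volume swings sharply against the trailing week, a common symptom of pixels firing twice or not at all.

    import pandas as pd

    events = pd.read_csv("exports/daily_events.csv", parse_dates=["date"])  # hypothetical export
    daily = events.groupby("date")["event_count"].sum().sort_index()

    # Compare each day against the average of the previous seven days.
    baseline = daily.rolling(window=7, min_periods=7).mean().shift(1)
    deviation = (daily - baseline) / baseline

    suspicious = deviation[deviation.abs() > 0.5]  # more than a 50% swing
    for day, pct in suspicious.items():
        print(f"{day.date()}: volume off by {pct:+.0%} vs. 7-day baseline - check the tracking setup")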

    Author: Mike Le

    Source: Information Management

  • Four important drivers of data science developments

    Four important drivers of data science developments

    According to the Gartner Group, digital business reached a tipping point last year, with 49% of CIOs reporting that their enterprises have already changed their business models or are in the process of doing so. When Gartner asked CIOs and IT leaders which technologies they expect to be most disruptive, artificial intelligence (AI) was the top-mentioned technology.

    AI and ML are having a profound impact on enterprise digital transformation, becoming crucial as a competitive advantage and even for survival. As the field grows, four trends are emerging that will shape data science in the next five years:

    Accelerate the full data science life-cycle

    The pressure to grow ROI from AI and ML initiatives has pushed demand for new innovative solutions that accelerate AI and data science. Although data science processes are iterative and highly manual, more than 40% of data science tasks are expected to be automated by 2020, according to Gartner, resulting in increased productivity and broader usage of data across the enterprise.

    Recently, automated machine learning (AutoML) has become one of the fastest-growing technologies in data science. Machine learning, however, typically accounts for only 10-20% of the entire data science process; real pain points exist before the machine learning stage, in data and feature engineering. The new concept of data science automation goes beyond machine learning automation to include data preparation, feature engineering, machine learning, and the production of full data science pipelines. With data science automation, enterprises can genuinely accelerate AI and ML initiatives.
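
    For a rough feel of the idea, here is a sketch using plain scikit-learn rather than a dedicated automation product: a single pipeline covers preparation, a stand-in for feature engineering, and model search, instead of hand-run steps.

    from sklearn.datasets import make_classification
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    pipeline = Pipeline([
        ("impute", SimpleImputer(strategy="median")),   # data preparation
        ("scale", StandardScaler()),                    # stand-in for feature engineering
        ("model", LogisticRegression(max_iter=1000)),   # machine learning
    ])

    search = GridSearchCV(pipeline, {"model__C": [0.1, 1.0, 10.0]}, cv=5)
    search.fit(X, y)
    print("best C:", search.best_params_["model__C"], "- cv accuracy:", round(search.best_score_, 3))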

    Leverage existing resources for democratization

    Despite substantial investments in data science across many industries, the scarcity of data science skills and resources often limits the advancement of AI and ML projects in organizations. The shortage of data scientists has created a challenge for anyone implementing AI and ML initiatives, forcing a closer look at how to build and leverage data science resources.

    Beyond highly specialized technical skills and mathematical aptitude, data scientists must also couple these skills with domain/industry knowledge that is relevant to a specific business area. Domain knowledge is required for problem definition and result validation and is a crucial enabler to deliver business value from data science. Relying on 'data science unicorns' that have all these skill sets is neither realistic nor scalable.

    Enterprises are focusing on repurposing existing resources as 'citizen' data scientists. The rise of AutoML and data science automation can unlock data science to a broader user base and allow the practice to scale. By empowering citizen data scientists to execute standard use cases, skilled data scientists can focus on high-impact, technically challenging projects that produce higher value.

    Augment insights for greater transparency

    As more organizations are adopting data science in their business process, relying on AI-derived recommendations that lack transparency is becoming problematic. Increased regulatory oversight like the GDPR has exacerbated the problem. Transparent insights make AI models more 'oversight' friendly and have the added benefit of being far more actionable.

    White-box AI models help organizations maintain accountability in data-driven decisions and allow them to live within the boundaries of regulations. The challenge is the need for high-quality and transparent inputs (aka 'features'), often requiring multiple manual iterations to achieve the needed transparency. Data science automation allows data scientists to explore millions of hypotheses and augments their ability to discover transparent and predictive features as business insights.
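
    One simple flavour of a 'white-box' model, shown only as an illustration: a regularised linear classifier whose standardised coefficients can be read as feature-level evidence, in contrast to an opaque score.

    import numpy as np
    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    data = load_breast_cancer()
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(data.data, data.target)

    # The coefficients of the standardised linear model double as transparent feature weights.
    coefs = model.named_steps["logisticregression"].coef_[0]
    for i in np.argsort(np.abs(coefs))[::-1][:5]:
        print(f"{data.feature_names[i]}: weight {coefs[i]:+.2f}")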

    Operationalize data science in business

    Although ML models are often tiny pieces of code, when models are finally deemed ready for production, deploying them can be complicated and problematic. For example, since data scientists are not software engineers, the quality of their code may not be production-ready. Data scientists often validate models with down-sampled datasets in lab environments, and those models may not scale to production-size datasets. Also, the performance of deployed models decreases as data invariably changes, making model maintenance pivotal to continuously extracting business value from AI and ML models. Data and feature pipelines are much bigger and more complex than the ML models themselves, and operationalizing them is even more complicated. One of the promising approaches is to leverage concepts from continuous deployment through APIs. Data science automation can generate APIs to execute the full data science pipeline, accelerating deployments while also providing an ongoing connection to development systems to accelerate the optimization and maintenance of models.
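
    A minimal sketch of the API idea, assuming Flask and a pipeline persisted with joblib (both the file path and the payload shape are hypothetical); production deployments add validation, authentication, logging and monitoring on top.

    import joblib
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    pipeline = joblib.load("models/churn_pipeline.joblib")  # hypothetical persisted pipeline

    @app.route("/predict", methods=["POST"])
    def predict():
        payload = request.get_json()  # e.g. {"features": [[0.4, 12, 3.1]]}
        scores = pipeline.predict_proba(payload["features"])[:, 1]
        return jsonify({"churn_probability": scores.tolist()})

    if __name__ == "__main__":
        app.run(port=8080)  # the full prepare-featurize-predict chain runs on every request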

    Data science is at the heart of AI and ML. While the promise of AI is real, the problems associated with data science are also real. Through better planning, closer cooperation with line of business and by automating the more tedious and repetitive parts of the process, data scientists can finally begin to focus on what to solve, rather than how to solve.

    Author: Daniel Gutierrez

    Source: Insidebigdata

  • Gaining advantages with the IoT through 'Thing Management'

    Gaining advantages with the IoT through 'Thing Management'

    Some are calling the industrial Internet of Things the next industrial revolution, bringing dramatic changes and improvements to almost every sector. But to be sure it’s successful, there is one big question: how can organizations manage all the new things that are part of their organizations’ landscapes?

    Most organizations see asset management as the practice of tracking and managing IT devices such as routers, switches, laptops and smartphones. But that’s only part of the equation nowadays. With the advent of the IoT, enterprise things now include robotic bricklayers, agitators, compressors, drug infusion pumps, track loaders, scissor lifts and the list goes on and on, while all these things are becoming smarter and more connected.

    These are some examples for specific industries:

    ● Transportation is an asset-intensive industry that relies on efficient operations to achieve maximum profitability. To help customers manage these important assets, GE Transportation is equipping its locomotives with devices that manage hundreds of data elements per second. The devices decipher locomotive data and uncover use patterns that keep trains on track and running smoothly.

    ● The IoT’s promise for manufacturing is substantial. The IoT can build bridges that help solve the frustrating disconnects among suppliers, employees, customers, and others. In doing so, the IoT can create a cohesive environment where every participant is invested in and contributing to product quality and every customer’s feedback is learned from. Smart sensors, for instance, can ensure that every item, from articles of clothing to top-secret defense weapons, can have the same quality as the one before. The only problem with this is that the many pieces of the manufacturing puzzle and devices in the IoT are moving so quickly that spreadsheets and human analysis alone are not enough to manage the devices.

    ● IoT in healthcare will help connect a multitude of people, things with smart sensors (such as wearables and medical devices), and environments. Sensors in IoT devices and connected “smart” assets can capture patient vitals and other data in real time. Then data analytics technologies, including machine learning and artificial intelligence (AI), can be used to realize the promise of value-based care. There’s significant value to be gained, including operational efficiencies that boost the quality of care while reducing costs, clinical improvements that enable more accurate diagnoses, and more.

    ● In the oil and gas industry, IoT sensors have transformed efficiencies around the complex process of natural resource extraction by monitoring the health and efficiency of hard-to-access equipment installations in remote areas with limited connectivity.

    ● Fuelled by greater access to cheap hardware, the IoT is being used with notable success in logistics and fleet management by enabling cost-effective GPS tracking and automated loading/unloading.

    All of these industries will benefit from the IoT. However, as the IoT world expands, these industries and others are looking for ways to track the barrage of new things that are now pivotal to their success. Thing Management pioneers such as Oomnitza help organizations manage devices as diverse as phones, forklifts, drug infusion pumps, drones and VR headsets, providing an essential service as the industrial IoT flourishes.

    Think IoT, not IoP

    To successfully manage these Things, enterprises are not only looking for Thing Management. They are also rethinking the Internet, not as the Internet of People (IoP), but as the Internet of Things (IoT). Things aren’t people, and there are three fundamental differences.

    Many more things are connected to the Internet than people

    John Chambers, former CEO of Cisco, recently declared there will be 500 billion things connected by 2024. That’s more than 60 times the number of people on the planet.

    Things have more to say than people

    A typical cell phone has nearly 14 sensors, including an accelerometer, GPS, and even a radiation detector. Industrial things such as wind turbines, gene sequencers, and high-speed inserters can easily have over 100 sensors.

    Things can speak much more frequently

    People enter data at a snail’s pace when compared to the barrage of data coming from the IoT. A utility grid power sensor, for instance, can send data 60 times per second, a construction forklift once per minute, and a high-speed inserter once every two seconds.
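
    Taking the quoted rates at face value, a few lines of arithmetic show why things dwarf people as data producers.

    SECONDS_PER_DAY = 24 * 60 * 60

    readings_per_day = {
        "grid power sensor (60/sec)": 60 * SECONDS_PER_DAY,
        "high-speed inserter (1 per 2 sec)": SECONDS_PER_DAY // 2,
        "construction forklift (1/min)": SECONDS_PER_DAY // 60,
    }
    for thing, count in readings_per_day.items():
        print(f"{thing}: {count:,} readings per day")
    # The power sensor alone produces over five million readings a day.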

    Technologists and business people both need to learn how to collect and put all of the data coming from the industrial IoT to use and manage every connected thing. They will have to learn how to build enterprise software for things versus people.

    How the industrial IoT will shape the future

    The industrial IoT is all about value creation: increased profitability, revenue, efficiency, and reliability. It starts with the target of safe, stable operations and meeting environmental regulations, translating to greater financial results and profitability.

    But there’s more to the big picture of the IoT than that. Building the next generation of software for things is a worthy goal, with potential results such as continually improving enterprise efficiency and public safety, driving down costs, decreasing environmental impacts, boosting educational outcomes and more. Companies like GE, Oomnitza and Bosch are investing significant amounts of money in the ability to connect, collect data, and learn from their machines.

    The IoT and the next generation of enterprise software will have big economic impacts as well. The cost savings and productivity gains generated through “smart” thing monitoring and adaptation are projected to create $1.1 trillion to $2.5 trillion in value in the health care sector, $2.3 trillion to $11.6 trillion in global manufacturing, and $500 billion to $757 billion in municipal energy and service provision over the next decade. The total global impact of IoT technologies could generate anywhere from $2.7 trillion to $14.4 trillion in value by 2025.

    Author: Timothy Chou

    Source: Information-management

  • Gaining control of big data with the help of NVMe

    Gaining control of big data with the help of NVMe

    Every day an unfathomable amount of data, nearly 2.5 quintillion bytes, is generated all around us. Part of this data we see every day: pictures and videos on our phones, social media posts, banking and other apps.

    In addition to this, there is data being generated behind the scenes by ubiquitous sensors and algorithms, whether that’s to process quicker transactions, gain real-time insights, crunch big data sets or to simply meet customer expectations. Traditional storage architectures are struggling to keep up with all this data creation, leading IT teams to investigate new solutions to keep ahead and take advantage of the data boom.

    Some of the main challenges are understanding performance, removing data throughput bottlenecks and being able to plan for future capacity. Architecture can often lock businesses in to legacy solutions, and performance needs can vary and change as data sets grow.

    Architectures designed and built around NVMe (non-volatile memory express) can provide the perfect balance, particularly for data-intensive applications that demand fast performance. This is extremely important for organizations that are dependent on speed, accuracy and real-time data insights.

    Industries such as healthcare, autonomous vehicles, artificial intelligence (AI)/machine learning (ML) and genomics are at the forefront of the transition to high-performance NVMe storage solutions that deliver fast data access for the high performance computing systems that drive new research and innovations.

    Genomics

    With traditional storage architectures, detailed genome analysis can take upwards of five days to complete, which makes sense considering an initial analysis of one person’s genome produces approximately 300GB - 1TB of data, and a single round of secondary analysis on just one person’s genome can require upwards of 500TB storage capacity. However, with an NVMe solution implemented it’s possible to get results in just one day.

    In a typical study, genome research and life sciences companies need to process, compare and analyze the genomes of between 1,000 and 5,000 people per study. This is a huge amount of data to store, but it’s imperative that it’s done. These studies are working toward revolutionary scientific and medical advances, looking to personalize medicine and provide advanced cancer treatments. This is only now becoming possible thanks to the speed that NVMe enables researchers to explore and analyze the human genome.

    Autonomous vehicles

    A growing trend in the tech industry is that of autonomous vehicles. Self-driving cars are the next big thing, and various companies are working tirelessly to perfect the idea. In order to function properly, these vehicles need very fast storage to accelerate the applications and data that ‘drive’ autonomous vehicle development. Core requirements for autonomous vehicle storage include:

    • Must have a high capacity in a small form factor
    • Must be able to accept input data from cameras and sensors at “line rate” – AKA have extremely high throughput and low latency
    • Must be robust and survive media or hardware failures
    • Must be “green” and have minimal power footprint
    • Must be easily removable and reusable
    • Must use simple but robust networking

    What kind of storage meets all these requirements? That’s right – NVMe.

    Artificial Intelligence

    Artificial Intelligence (AI) is gaining a lot of traction in a variety of industries, ranging from finance to manufacturing and beyond. In finance, AI does things like predict investment trends. In manufacturing, AI-based image recognition software checks for defects during product assembly. Wherever it’s used, AI needs a high level of computing power, coupled with a high-performance, low-latency architecture, in order to enable parallel processing of data in real time.

    Once again, NVMe steps up to the plate, providing the speed and processing power that is critical during training and inference. Without NVMe to prevent bottlenecks and latency issues, these stages can take much, much longer. Which, in turn, can lead to the temptation to take shortcuts, causing software to malfunction or make incorrect decisions down the line.

    The rapid increase in data creation has put traditional storage architectures under high pressure due to their lack of scalability and flexibility, both of which are required to fulfill future capacity and performance requirements. This is where NVMe comes in, breaking the barriers of existing designs by offering previously unattainable density and performance. These breakthroughs give organizations what they need to manage and keep pace with the data boom.

    Author: Ron Herrmann

    Source: Dataversity

     

  • Gartner: 5 cool vendors in data science and machine learning

    Research firm Gartner has identified five "cool vendors" in the data science and machine learning space, identifying the features that make their products especially unique or useful. The report, "5 Cool Vendors in Data Science and Machine Learning", was written by analysts Peter Krensky, Svetlana Sicular, Jim Hare, Erick Brethenoux and Austin Kronz. Here are the highlights of what they had to say about each vendor.

    DimensionalMechanics

    Bellevue, Washington
    www.dimensionalmechanics.com
    “DimensionalMechanics has built a data science platform that breaks from market traditions; where more conventional vendors have developed work flow-based or notebook-based data science environments, DimensionalMechanics has opted for a “data-science metalanguage,” Erick Brethenoux writes. “In effect, given the existing use cases the company has handled so far, its NeoPulse Framework 2.0 acts as an “AutoDL” (Auto-Deep Learning) platform. This makes new algorithms and approaches to unusual types of data (such as images, videos and sounds) more accessible and deployable.”

    Immuta

    College Park, Maryland
    www.immuta.com
    “Immuta offers a dedicated data access and management platform for the development of machine learning and other advanced analytics, and the automation of policy enforcement,” Peter Krensky and Jim Hare write. “The product serves as a control layer to rapidly connect and control access between myriad data sources and the heterogeneous array of data science tools without the need to move or copy data. This approach addresses the market expectation that platforms supporting data science will be highly flexible and extensible to the data portfolio and toolkit of a user’s choosing.”

    Indico

    Boston, Massachusetts
    www.indico.io
    “Indico offers a group of products with a highly accessible set of functionality for exploring and modeling unstructured data and automating processes,” according to Peter Krensky and Austin Kronz. “The offering can be described as a citizen data science toolkit for applying deep learning to text, images and document-based data. Indico’s approach makes deep learning a practical solution for subject matter experts (SMEs) facing unstructured content challenges. This is ambitious and exciting, as both deep learning and unstructured content analytics are areas where even expert data scientists are still climbing the learning curve.”

     

    Octopai

    Rosh HaAyin, Israel & New York, New York
    www.octopai.com
    “Octopai solves a foundational problem for data-driven organizations — enabling data science teams and citizen data scientists to quickly find the data, establish trust in data sources and achieve transparency of data lineage through automation,” explains Svetlana Sicular. “It connects the dots of complex data pipelines by using machine learning and pattern analysis to determine the relationships among different data elements, the context in which the data was created, and the data’s prior uses and transformations. Such access to more diverse, transparent and trustworthy data leads to better quality analytics and machine learning.”

     

    ParallelM

    Tel Aviv, Israel & Sunnyvale, California
    www.parallelm.com
    “ParallelM is one of the first software platforms principally focused on the data science operationalization process,” Erick Brethenoux writes. “The focus of data science teams has traditionally been on developing analytical assets, while dealing with the operationalization of these assets has been an afterthought. Deploying analytical assets within operational processes in a repeatable, manageable, secure and traceable manner requires more than a set of APIs and a cloud service; a model that has been scored (executed) has not necessarily been managed. ParallelM’s success and the general development of operationalization functionality within platforms will be an indicator of the success of an entire generation of data scientists.”

     Source: Information Management

     

  • Gartner: US government agencies falling behind digital businesses in other industries

    Gartner: US government agencies falling behind digital businesses in other industries

    A Gartner survey of more than 500 government CIOs shows that government agencies are falling behind other industries when it comes to planned investments in digital business initiatives. Just 17% of government CIOs say they’ll be increasing their investments, compared to 34% of CIOs in other industries.

    What’s holding government agencies back? While Gartner notes that their CIOs demonstrate a clear vision for the potential of digital government and emerging technologies, almost half of those surveyed (45%) say they lack the IT and business resources required to execute. Other common barriers include lack of funding (39%), as well as a challenge organizations across all industries struggle with: culture and resistance to change (37%).

    Another key challenge is the ability to scale digital initiatives, where government agencies lag by 5% against all other industries. To catch up, government CIOs see automation as a potential tool. This aligns with respondents’ views on 'game-changing' technologies for government. The top five in order are:

    • Artificial intelligence (AI) and machine learning (27%)
    • Data analytics, including predictive analytics (22%)
    • Cloud (19%)
    • Internet of Things (7%)
    • Mobile, including 5G (6%)

    Of the more than 500 government respondents in Gartner’s survey, 10% have already deployed an AI solution, 39% say they plan to deploy one within the next one to two years, and 36% intend to use AI to enable automation, scale of digital initiatives, and reallocation of human resources within the next two to three years.

    Investing today for tomorrow's success

    When it comes to increased investment this year (2019), BI and data analytics (43%), cyber and information security (43%), and cloud services and solutions (39%) top the tech funding list.

    As previous and current digital government initiatives start to take hold, CIOs are seeing moderate improvements in their ability to meet the increasing demands and expectations of citizens. 65% of CIOs say that their current digital government investments are already paying off. A great example of this is the U.S. Department of Housing and Urban Development’s use of BI and data analytics to modernize its Grants Dashboard.

    Despite budget and cultural change challenges typically associated with digital government initiatives, make no mistake: many agencies are making great strides and are now competing or leading compared to other organizations and industries.

    There’s never been a better time to invest in game changing technologies to both quickly catch up, and potentially take the lead.

    Author: Rick Nelson

    Source: Microstrategy

  • Hadoop: what is it for?

    Hadoop

    Flexible and scalable management of big data

    Data infrastructure is the most important organ for creating and delivering good business insights. To take advantage of the diversity of available data and to modernize the data architecture, many organizations deploy Hadoop. A Hadoop-based environment is flexible and scalable in managing big data. So what is the impact of Hadoop? The Aberdeen Group investigated the impact of Hadoop on data, people and business performance.

    New data from a variety of sources

    A lot of data has to be captured, moved, stored and archived. But companies are now gaining insights from hidden data beyond the traditional structured transaction data: think of e-mails, social data, multimedia, GPS information and sensor information. Alongside new data sources, a wealth of new technologies has emerged to manage and exploit all this data. Together, this information and these technologies are shifting big data from problem to opportunity.

    What are the benefits of this yellow elephant (Hadoop)?

    A major frontrunner in this big data opportunity is the Hadoop data architecture. The research shows that companies using Hadoop are more driven to make use of unstructured and semi-structured data. Another important trend is a shift in companies' mindset: they see data as a strategic asset and as an important part of the organization.

    The need for user empowerment and user satisfaction is one reason why companies choose Hadoop. In addition, a Hadoop-based architecture offers two advantages for end users:

    1. Data flexibility – All data under one roof, which results in higher quality and usability.
    2. Data elasticity – The architecture is significantly more flexible when it comes to adding new data sources.

    What is the impact of Hadoop on your organization?

    What else can you do with Hadoop, and how can you best deploy this data architecture across your data sources? Read in this report how you can save even more time analyzing data and ultimately achieve more profit by deploying Hadoop.

    Source: Analyticstoday

  • Harnessing the value of Big Data

    To stay competitive and grow in today’s market, it becomes necessary for organizations to closely correlate both internal and external data, and draw meaningful insights out of it.

    During the last decade a tremendous amount of data has been produced by internal and external sources in the form of structured, semi-structured and unstructured data. These are large quantities of human or machine generated data produced by heterogeneous sources like social media, field devices, call centers, enterprise applications, point of sale etc., in the form of text, image, video, PDF and more.

    The “Volume”, “Variety” and “Velocity” of data have posed a big challenge to the enterprise. The evolution of “Big Data” technology has been a boon to the enterprise towards effective management of large volumes of structured and unstructured data. Big data analytics is expected to correlate this data and draw meaningful insights out of it.

    However, it has been seen that siloed big data initiatives fail to provide ROI to the enterprise. A large volume of unstructured data can be more a burden than a benefit. That is why several organizations struggle to turn data into dollars.

    On the other hand, an immature MDM program limits an organization’s ability to extract meaningful insights from big data. It is therefore of utmost importance for the organization to improve the maturity of the MDM program to harness the value of big data.

    MDM helps towards the effective management of master information coming from big data sources by standardizing it and storing it in a central repository that is accessible to business units.

    MDM and Big Data are closely coupled applications complementing each other. There are many ways in which MDM can enhance big data applications, and vice versa. The relationship rests on the context offered by big data and the trust provided by master data.

    MDM and big data – A matched pair

    At first glance, it appears that MDM and big data are two mutually exclusive systems with a degree of mismatch. An enterprise MDM initiative is all about solving business issues and improving data trustworthiness through the effective and seamless integration of master information with business processes. Its intent is to create a central trusted repository of structured master information accessible by enterprise applications.

    The big data system deals with large volumes of data coming in unstructured or semi-structured format from heterogeneous sources like social media, field devices, log files and machine-generated data. The big data initiative is intended to support specific analytics tasks within a given span of time, after which it is taken down. The comparison below (Figure 1) summarizes the characteristics of MDM and big data.

     

    Business Objective
      MDM: Provides a single version of trust for master and reference information; acts as a system of record / system of reference for the enterprise.
      Big Data: Provides cutting-edge analytics and offers a competitive advantage.

    Volume of Data and Growth
      MDM: Deals with master data sets that are smaller in volume and grow at a relatively slow rate.
      Big Data: Deals with enormously large volumes of data, so large that current databases struggle to handle them; growth is very fast.

    Nature of Data
      MDM: Permanent and long lasting.
      Big Data: Ephemeral in nature; disposable if not useful.

    Types of Data (Structure and Data Model)
      MDM: Mostly structured data in a definite format with a pre-defined data model.
      Big Data: Mostly semi-structured or unstructured, lacking a fixed data model.

    Source of Data
      MDM: Oriented around internal, enterprise-centric data.
      Big Data: A platform to integrate data coming from multiple internal and external sources, including social media, cloud, mobile, machine-generated data, etc.

    Orientation
      MDM: Supports both analytical and operational environments.
      Big Data: Fully analytically oriented.

    Despite apparent differences there are many ways in which MDM and big data complement each other.

    Big data offers context to MDM

    Big data can act as an external source of master information for the MDM hub and can help enrich internal master data in the context of the external world. MDM can help aggregate the required and useful information coming from big data sources with internal master records.

    An aggregated view and profile of master information can help link the customer correctly and in turn help perform effective analytics and campaigns. MDM can act as a hub between systems of record and systems of engagement.

    However, not all data coming from big data sources will be relevant for MDM. There should be a mechanism to process the unstructured data and distinguish the relevant master information and its associated context. NoSQL offerings, natural language processing and other semantic technologies can be leveraged to distill the relevant master information from a pool of unstructured/semi-structured data.
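
    As a deliberately simplified sketch (plain regular expressions rather than NLP or semantic technology, with made-up posts), this is the kind of distilling step that pulls candidate master attributes, here e-mail addresses and social handles, out of unstructured text before any matching happens.

    import re

    posts = [
        "Loving the new app! Reach me at jane.doe@example.com",
        "Support was slow today @acme_support, DM me - @jane_d",
    ]

    email_re = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")
    handle_re = re.compile(r"(?<!\w)@\w{2,}")  # an '@handle' not preceded by a word character

    for text in posts:
        print({"emails": email_re.findall(text), "handles": handle_re.findall(text)})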

    MDM offers trust to big data

    MDM brings a single integrated view of master and reference information with unique representations for an enterprise. An organization can leverage the MDM system to gauge the trustworthiness of data coming from big data sources.

    Dimensional data residing in the MDM system can be leveraged towards linking the facts of big data. Another way is to leverage the MDM data model backbone (optimized for entity resolution) and governance processes to bind big data facts.

    The other MDM processes like data cleansing, standardization, matching and duplicate suspect processing can be additionally leveraged towards increasing the uniqueness and trustworthiness of big data.

    MDM system can support big data by:

    • Holding the “attribute level” data coming from big data sources e.g. social media Ids, alias, device Id, IP address etc.
    • Maintaining the code and mapping of reference information. 
    • Extracting and maintaining the context of transactional data like comments, remarks, conversations, social profile and status etc. 
    • Facilitating entity resolution.
    • Maintaining unique, cleansed golden master records
    • Managing the hierarchies and structure of the information along with linkages and traceability, e.g. linking an existing customer with his/her Facebook ID, LinkedIn ID, blog alias, etc.

    MDM for big data analytics – Key considerations

    Traditional MDM implementation, in many cases, is not sufficient to accommodate big data sources. There is a need for the next generation MDM system to incorporate master information coming from big data systems. An organization needs to take the following points into consideration while defining Next Gen MDM for big data:

    Redefine information strategy and topology

    The overall information strategy needs to be reviewed and redefined in the context of big data and MDM. The impact of changes in topology needs to be assessed thoroughly. It is necessary to define the linkages between these two systems (MDM and big data) and how they operate with internal and external data. For example, the data coming from social media needs to be linked with internal customer and prospect data to provide an integrated view at the enterprise level.

    The information strategy should address the following:

    • Integration points between MDM and big data - how the big data and MDM systems are going to interact with each other.
    • Management of master data from different sources - how master data from internal and external sources is going to be managed.
    • Definition and classification of master data - how master data coming from big data sources gets defined and classified.
    • Processing of unstructured and semi-structured master data - how master data from big data sources in the form of unstructured and semi-structured data is going to be processed.
    • Usage of master data - how the MDM environment is going to support big data analytics and other enterprise applications.

    Revise data architecture and strategy

    The overall data architecture and strategy need to be revised to accommodate changes with respect to big data. The MDM data model needs to be enhanced to accommodate big data-specific master attributes. For example, the data model should accommodate social media and/or IoT-specific attributes such as social media IDs, aliases, contacts, preferences, hierarchies, device IDs, device locations, on-off periods, etc. The data strategy should be defined for effective storage and management of internal and external master data.

    The revised data architecture strategy should ensure that:

    • The MDM data model accommodates all big data specific master attributes
    • The local and global master data attributes should get classified and managed as per the business needs
    • The data model should have necessary provision to interlink the external (big data specifics) and internal master data elements. The necessary provisions should be made to accommodate code tables and reference data.

     Define advanced data governance and stewardship

     Significant challenges are associated with governing master data coming from big data sources, because of its unstructured nature and the variety of external sources it flows from. The organization needs to define advanced policies, processes and a stewardship structure that enable big data-specific governance.

    The data governance process for MDM should ensure that:

    • The right level of data security, privacy and confidentiality is maintained for customer and other confidential master data.
    • The right level of data integrity is maintained between internal master data and master data from big data sources.
    • The right level of linkage between reference data and master data exists.
    • Policies and processes are redefined or enhanced to support big data, including business transformation rules, controlled access for data sharing and distribution, and ongoing monitoring, measurement and change mechanisms.
    • A dedicated group of big data stewards is available for master data review, monitoring and conflict management.

    Enhance integration architecture

     The data integration architecture needs to be enhanced to accommodate the master data coming from big data sources. The MDM hub should have the right level of integration capabilities to integrate with big data using IDs, reference keys and other unique identifiers.

    Unstructured, semi-structured and multi-structured data will be parsed by a big data parser into logical data objects. This data is then further processed, matched, merged and loaded into the MDM hub as the appropriate master information.

    The enhanced integration architecture should ensure that:

    • The MDM environment has the ability to parse, transform and integrate the data coming from the big data platform.
    • The MDM environment has built-in intelligence to analyze the relevance of master data coming from the big data environment, and accept or reject it accordingly.

    Enhance match and merge engine

     The MDM system should enhance the “Match & Merge” engine so that master information coming from big data sources can be correctly identified and integrated into the MDM hub. A blend of probabilistic and deterministic matching algorithms can be adopted.

    For example, successfully identifying the social profiles of existing customers and interlinking them with existing data in the MDM hub. In this context, data quality is less about objective “quality” and more about the utility of the information for the consumer of the data.
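
    A toy illustration of blending the two matching styles (the records and threshold below are made up): a deterministic rule on e-mail decides first, then a probabilistic name-similarity score handles the remaining cases.

    from difflib import SequenceMatcher

    hub_record = {"name": "Jonathan Smith", "email": "j.smith@example.com"}
    incoming = [
        {"name": "Jon Smith", "email": "j.smith@example.com"},      # hypothetical social profile
        {"name": "Joanna Smythe", "email": "jo.smythe@mail.com"},
    ]

    def match(candidate, master, threshold=0.85):
        # Deterministic rule: identical e-mail addresses are treated as the same party.
        if candidate["email"].lower() == master["email"].lower():
            return "merge"
        # Probabilistic rule: fuzzy name similarity decides the remaining cases.
        score = SequenceMatcher(None, candidate["name"].lower(), master["name"].lower()).ratio()
        if score >= threshold:
            return "merge"
        return "duplicate suspect" if score >= 0.6 else "reject"

    for record in incoming:
        print(record["name"], "->", match(record, hub_record))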

    The enhanced match and merge engine should ensure that:

    • Master data coming from big data sources is effectively matched with internal data residing in the MDM hub.
    • “Duplicate suspect” master records are identified and processed effectively.
    • The engine recommends whether to “Accept”, “Reject”, “Merge” or “Split” master records coming from big data sources.

     

    In this competitive era, organizations are striving hard to retain their customers.  It is of utmost importance for an enterprise to keep a global view of customers and understand their needs, preferences and expectations.

    Big data analytics coupled with an MDM backbone will offer enterprises a cutting-edge advantage in managing customer-centric functions and increasing profitability. However, the pairing of MDM and big data is not free of complications. The enterprise needs to work diligently on the interface points to best harness these two technologies.

    Traditional MDM systems need to be enhanced to accommodate the information coming from big data sources and to draw meaningful context from it. The big data system, in turn, should leverage the MDM backbone to interlink data and draw meaningful insights.

    Source: Information Management, 2017, Sunjay Kumar

  • Hé Data Scientist! Are you a geek, nerd or suit?

    Data scientists are known for their unique skill sets. While thousands of compelling articles have been written about what a data scientist does, most of these articles fall short in examining what happens after you’ve hired a new data scientist to your team.

    The onboarding process for your data scientist should be based on the skills and areas of improvement you’ve identified for the tasks you want them to complete. Here’s how we do it at Elicit.

    We’ve all seen the data scientist Venn diagrams over the past few years, which include three high-level types of skills: programming, statistics/modeling, and domain expertise. Some even feature the ever-elusive “unicorn” at the center.

    While these diagrams provide us with a broad understanding of the skillset required for the role in general, they don’t have enough detail to differentiate data scientists and their roles inside a specific organization. This can lead to poor hires and poor onboarding experiences.

    If the root of what a data scientist does and is capable of is not well understood, then both parties are in for a bad experience. Near the end of 2016, Anand Ramanathan wrote a post that really stuck with me called The Data Science Delusion (https://medium.com/@anandr42/the-data-science-delusion-7759f4eaac8e). In it, Ramanathan talks about how within each layer of the data science Venn diagram there are degrees of understanding and capability.

    For example, Ramanathan breaks down the modeling aspect into four quadrants based on modeling difficulty and system complexity, explaining that not every data scientist has to be capable in all four quadrants—that different problems call for different solutions and different skillsets. 

    For example, if I want to understand customer churn, I probably don’t need a deep learning solution. Conversely, if I’m trying to recognize images, a logistic regression probably isn’t going to help me much.

    In short, you want your data scientist to be skilled in the specific areas that role will be responsible for within the context of your business.

    Ramanathan’s article also made me reflect on our data science team here at Elicit. Anytime we want to solve a problem internally or with a client we use our "Geek Nerd Suit" framework to help us organize our thoughts.

    Basically, it states that for any organization to run at optimal speed, the technology (Geek), analytics (Nerd), and business (Suit) functions must be collaborating and making decisions in lockstep. Upon closer inspection, the data science Venn diagram is actually comprised of Geek (programming), Nerd (statistics/modeling), and Suit (domain expertise) skills.

    But those themes are too broad; they still lack the detail needed to differentiate the roles of a data scientist. And we’d heard this from our team internally: in a recent employee survey, the issue of career advancement, and more importantly, skills differentiation, cropped up from our data science team.

    As a leadership team, we always knew the strengths and weaknesses of our team members, but for their own sense of career progression they were asking us to be more specific and transparent about them. This pushed us to go through the exercise of taking a closer look at our own evaluation techniques, and resulted in a list of specific competencies within the Geek, Nerd, and Suit themes. We now use these competencies both to assess new hires and to help them develop in their careers once they’ve joined us.

    For example, under the Suit responsibilities we define a variety of competencies that, amongst other things, include adaptability, business acumen, and communication. Each competency then has an explicit set of criteria associated with it, illustrating the different levels of mastery within that competency.

    We’ve established four levels of differentiation: “entry level,” “intermediate,” “advanced” and “senior.” To illustrate, here’s the distinction between “entry level” and “intermediate” for the Suit: Adaptability competency:

    Entry Level:

    • Analyzes both success and failures for clues to improvement.
    • Maintains composure during client meetings, remaining cool under pressure and not becoming defensive, even when under criticism.

    Intermediate:

    • Experiments and perseveres to find solutions.
    • Reads situations quickly.
    • Swiftly learns new concepts, skills, and abilities when facing new problems.

    And there are other specific criteria for the “advanced” and “senior” levels as well. 

    This led us to four unique data science titles—Data Scientist I, II, and III, as well as Senior Data Scientist, with the latter title still being explored for further differentiation. 

    The Geek Nerd Suit framework, and the definitions of the competencies within them, gives us clear, explicit criteria for assessing a new hire’s skillset in the three critical dimensions that are required for a data scientist to be successful.

    In Part 2, I’ll discuss what we specifically do within the Geek Nerd Suit framework to onboard a new hire once they’ve joined us—how we begin to groom the elusive unicorn. 

    Source: Information Management

    Author: Liam Hanham

  • How does augmented intelligence work?

    Computers and devices that think along with us have long ceased to be science fiction. Artificial intelligence (AI) can be found in washing machines that adjust their program to the size of the load and in computer games that adapt to the level of the players. How can computers help people make smarter decisions? This extensive whitepaper describes which models are applied in the HPE IDOL analytics platform.

    Mathematical models provide the human touch

    Processors can complete in the blink of an eye a calculation that would take people weeks or months. That is why computers are better at chess than humans, but worse at poker, where the human dimension plays a larger role. How does a search and analytics platform ensure that more of the ‘human’ ends up in the analysis? This is achieved by using a range of mathematical models.

    Analyses for text, audio, images and faces

    The art is to extract actionable information from data. This is done by applying pattern recognition to different datasets. In addition, classification, clustering and analysis play a major role in obtaining the right insights. Not only text is analyzed; increasingly, audio files and images, objects and faces are analyzed as well.

    Artificial intelligence helps people

    The whitepaper describes in detail how patterns are found in text, audio and images. How does a computer understand that the video it is analyzing is about a person? How is a geometric 3D image created from flat images, and how does a computer decide what it is seeing? Think, for example, of an automated alert to the control room when a stand gets too crowded or a traffic jam starts to form. How do theoretical models help computers perceive the way humans do and support our decisions? You can read about that and more in the whitepaper Augmented Intelligence: Helping humans make smarter decisions. See AnalyticsToday for this.

    Analyticstoday.nl, 12 October 2016

  • How artificial intelligence will shape the future of business

    From the boardroom at the office to your living room at home, artificial intelligence (AI) is nearly everywhere nowadays. Tipped as the most disruptive technology of all time, it has already transformed industries across the globe. And companies are racing to understand how to integrate it into their own business processes.

    AI is not a new concept. The technology has been with us for a long time, but in the past, there were too many barriers to its use and applicability in our everyday lives. Now improvements in computing power and storage, increased data volumes and more advanced algorithms mean that AI is going mainstream. Businesses are harnessing its power to reinvent themselves and stay relevant in the digital age.

    The technology makes it possible for machines to learn from experience, adjust to new inputs and perform human-like tasks. It does this by processing large amounts of data and recognising patterns. AI analyses much more data than humans at a much deeper level, and faster.

    Most organisations can’t cope with the data they already have, let alone the data that is around the corner. So there’s a huge opportunity for organisations to use AI to turn all that data into knowledge to make faster and more accurate decisions.

    Customer experience

    Customer experience is becoming the new competitive battleground for all organisations. Over the next decade, businesses that dominate in this area will be the ones that survive and thrive. Analysing and interpreting the mountains of customer data within the organisation in real time and turning it into valuable insights and actions will be crucial.

    Today most organisations are using data only to report on what their customers did in the past. SAS research reveals that 93% of businesses currently cannot use analytics to predict individual customer needs.

    Over the next decade, we will see more organisations using machine learning to predict future customer behaviours and needs. Just as an AI machine can teach itself chess, organizations can use their existing massive volumes of customer data to teach AI what the next-best action for an individual customer should be. This could include what product to recommend next or which marketing activity is most likely to result in a positive response.
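
    As a purely illustrative sketch of this idea (synthetic data; scikit-learn is assumed to be available, and this is not SAS’s actual implementation), a next-best-action model can be framed as a classifier that predicts which offer an individual customer is most likely to respond to:

        from sklearn.ensemble import RandomForestClassifier

        # Synthetic customer features: [recency_days, frequency, monetary_value]
        X = [[5, 12, 900], [40, 2, 50], [10, 8, 400], [60, 1, 20], [3, 15, 1200], [30, 3, 80]]
        # The action each customer historically responded to best
        y = ["upsell", "discount", "upsell", "reactivation", "upsell", "discount"]

        model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

        new_customer = [[7, 10, 650]]
        print("Next-best action:", model.predict(new_customer)[0])
        print("Class probabilities:", dict(zip(model.classes_, model.predict_proba(new_customer)[0].round(2))))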

    Automating decisions

    In addition to improving insights and making accurate predictions, AI offers the potential to go one step further and automate business decision making entirely.

    Front-line workers or dependent applications make thousands of operational decisions every day that AI can make faster, more accurately and more consistently. Ultimately this automation means improving KPIs for customer satisfaction, revenue growth, return on assets, production uptime, operational costs, meeting targets and more.

    Take Shop Direct for example, which owns the Littlewoods and Very brands. It uses AI from SAS to analyse customer data in real time and automate decisions to drive groundbreaking personalisation at an individual customer level. This approach saw Shop Direct’s profits surge by 40%, driven by a 15.9% increase in sales from Very.co.uk.

    AI is here. It’s already being adopted faster than the arrival of the internet. And it’s delivering business results across almost every industry today. In the next decade, every successful company will have AI. And the effects on skills, culture and structure will deliver superior customer experiences.

    Author: Tiffany Carpenter

    Source: SAS

  • How automated data analytics can improve performance

    Data, data, data. Something very valuable to brands. They need it in order to make informed decisions and in the long term, make their brand grow. That part is probably common knowledge, right? What you are probably wondering is how big brands are choosing and using the right data analytics that will bring results. Find out the answer to that question here.

    Data analytics to learn more about brand performance

    More and more companies are investing in brand. The problem is that they don’t know if their investment is bringing results or not. Of course they can work off their gut feeling or some numbers here and there from Google Analytics or the like, but what does that really tell them about the impact of their brand campaigns? Not much. That’s why big brands are using MRP-based data analytics coming from brand tracking. They are using the precise and reliable data that advanced data science can bring them in order to make sure the decisions they make are indeed based on fact.

    Data analytics for risk management

    Following on from the last point of big brands needing precise data to make informed decisions, they also need such data for risk management. Being able to grow as a brand is not just about knowing who their customers are, their intention to buy their product, etc., it is also about being able to foresee any potential risks and knocking them out of the park before they can cause any damage. Take for instance UOB bank in Singapore, who have devised a risk management system based on big data.

    Data analytics to predict consumer behavior

    As much as big brands need to look into the future, they also need to look to the past. Historical data can do wonders for future growth. Data analytics can be used to pinpoint patterns in consumer behavior. Using the data, they can potentially predict when a certain market may take a nosedive, as well as markets on an upward trend that are worth investing money into right now.
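
    As a simple illustration of spotting such trends (synthetic monthly sales figures; NumPy is assumed available, and the method is a generic least-squares fit rather than any brand’s actual model), the slope of a fitted line is one way to flag markets that are climbing or taking a nosedive:

        import numpy as np

        def trend(monthly_sales):
            """Return the least-squares slope of a monthly sales series."""
            months = np.arange(len(monthly_sales))
            slope, _intercept = np.polyfit(months, monthly_sales, 1)
            return slope

        markets = {
            "market_a": [100, 104, 110, 118, 125, 131],   # upward trend
            "market_b": [200, 190, 185, 170, 160, 150],   # nosedive
        }
        for name, sales in markets.items():
            s = trend(sales)
            print(name, "up" if s > 0 else "down", round(s, 1))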

    Data analytics for better marketing

    A combination of data analytics looking at the past, present, and future of a big brand can make for better marketing, and in turn, more profit. By using data analytics to identify consumer needs and purchasing patterns, big brands can target with more personalized marketing, refine the overall consumer experience, and develop better products. Pay attention in your everyday life and you can already see examples of such data being used to market a product at you. A product you Googled once now appearing in your Facebook feed? Retargeting. Emails sounding like they are speaking directly to your needs? That’s because they are, since there are more than a few email marketing approaches. Data analytics was used to figure out exactly what you need.

    There is one important trend occurring across the different ways that big brands are using data analytics to bring results. They all aim to understand consumers, in particular, the brands’ target audience. Whether that be what consumers think of their brand now, how they reacted toward them in the past, and how brands think consumers will act in the future because of detected patterns.

    So, how are big brands using data analytics that will bring results? They are using them in a way that will help them better understand the consumer. 

    Author: Steve Habazin

    Source: Insidebigdata

  • How autonomous vehicles are driven by data

    Understanding how to capture, process, activate, and store the staggering amount of data each vehicle is generating is central to realizing the future of  autonomous vehicles (AVs).

    Autonomous vehicles have long been spoken about as one of the next major transformations for humanity. And AVs are already a reality in delivery, freight services, and shipping, but the day when a car is driving along the leafy suburbs with no one behind the wheel, or level five autonomy as it’s also known, is still far off in the future.

    While we are a long way off from having AVs on our roads, IHS Markit reported last year that there will be more than 33 million autonomous vehicles sold globally in 2040. So, the revolution is coming. And it’s time to be prepared.

    Putting some data in the tank

    As with so many technological advancements today, data is critical to making AVs move intelligently. Automakers, from incumbents to Silicon Valley startups, are running tests and racking up thousands of miles in a race to be the leader in this field. Combining a variety of sensors to recognize their surroundings, each autonomous vehicle uses radar, lidar, sonar and GPS, to name just a few technologies, to navigate the streets and process what is around them to drive safely and efficiently. As a result, every vehicle is generating a staggering amount of data.

    According to a report by Accenture, AVs today generate between 4 and 6 terabytes (TBs) of data per day, with some producing as much as 8 to 10 TBs depending on the number of mounted devices on the vehicle. The report says that on the low end, the data generated by one test car in one day is roughly equivalent to that of nearly 6,200 internet users.

    While it can seem a little overwhelming, this data contains valuable insights and ultimately holds the key in getting AVs on the road. This data provides insights into how an AV identifies navigation paths, avoids obstacles, and distinguishes between a human crossing the road or a trash can that has fallen over in the wind. In order to take advantage of what this data can teach us though, it must be collected, downloaded, stored, and activated to enhance the decision-making capabilities of each vehicle. By properly storing and managing this data, you are providing the foundation for progress to be made securely and speedily.

    Out of the car, into the ecosystem

    The biggest challenge facing AV manufacturers right now is testing. Getting miles on the clock and learning faster than competitors to eliminate errors, reach deadlines, and get one step closer to hitting the road. Stepping outside of the car, there is a plethora of other elements to be considered from a data perspective that are critical to enabling AVs.

    Not only does data need to be stored and processed in the vehicle, but also elsewhere on the edge and some of it at least, in the data center. Test miles are one thing, but once AVs hit the road for real, they will need to interact in real-time with the streets they are driving on. Hypothetically speaking, you might imagine that one day gas stations will be replaced by mini data centers on the edge, ensuring the AVs can engage with their surroundings and carry out the processing required to drive efficiently.

    Making the roads safer

    While it might seem that AVs are merely another technology humans want to use to make their lives easier, it’s worth remembering some of the bigger benefits. The U.S. National Highway Traffic Safety Administration has stated that with human error being the major factor in 94% of all fatal accidents, AVs have the potential to significantly reduce highway fatalities by addressing the root cause of crashes.

    That’s not to say humans won’t be behind the wheel at all in 20 years, but as artificial intelligence (AI) and deep learning (DL) have done in other sectors, they will augment our driving experience and look to put a serious dent in the number of fatal road accidents every year, which currently stands at nearly 1.3 million.

    Companies in the AV field understand the potential that AI and DL technology represents. Waymo, for example, shared one of its datasets in August 2019 with the broader research community to enable innovation. With data containing test miles in a wide variety of environments, from day and night, to sunshine and rain, data like this can play a pivotal role in preparing cars for all conditions and maintaining safety as the No. 1 priority.

    Laying the road ahead

    Any company manufacturing AVs or playing a significant role in the ecosystem, from edge to core, needs to understand the data requirements and implement a solid data strategy. By getting the right infrastructure in place ahead of time, AVs truly can become a reality and bring with them all the anticipated benefits, from efficiency of travel to the safety of pedestrians.

    Most of the hardware needed is already there: radars, cameras, lidar, chips and, of course, storage. But understanding how to capture, process, activate, and store the data created is central to realizing the future of AVs. Data is the gas in the proverbial tank, and by managing this abundant resource properly, you might just see that fully automated car in your neighborhood sooner than expected.

    Author: Jeff Fochtman

    Source: Informationweek

  • How Big Data leaves its mark on the banking industry

    Did you know that big data can impact your bank account, and in more ways than one? Here's what to know about the role big data is playing in finance and within your local bank.

    Nowadays, terms like ‘Data Analytics,’ ‘Data Visualization,’ and ‘Big Data’ have become quite popular. These terms are fundamentally tied to digital transformation and company growth. In this modern age, every business is driven by data, and data analytics is crucial whenever a decision-making process is involved.

    Through this tool, gaining better insight has become much easier now. It doesn’t matter whether the decision being considered has huge or minimal impact; businesses have to ensure they can access the right data to move forward. Typically, this approach is essential, especially for the banking and finance sector in today’s world.

    The role of Big Data

    Financial institutions such as banks have to adhere to such a practice, especially when laying the foundation for back-test trading strategies. They have to utilize Big Data to its full potential to stay in line with their specific security protocols and requirements. Banking institutions actively use the data within their reach in a bid to keep their customers happy. By doing so, these institutions can limit fraud cases and prevent any complications in the future.

    Some prominent banking institutions have gone the extra mile and introduced software to analyze every document while recording any crucial information that these documents may carry. Right now, Big Data tools are continuously being incorporated in the finance and banking sector. 

    Through this development, numerous significant strides are being made, especially in the realm of banking. Big Data is taking a crucial role, especially in streamlining financial services everywhere in the world today. The value that Big Data brings with it is unrivaled, and, in this article, we will see how this brings forth positive results in the banking and finance world.

    The underlying concept 

    A 2013 survey conducted by IBM’s Institute for Business Value and the University of Oxford showed that 71% of financial services firms had already adopted analytics and big data. Financial and banking industries worldwide are now exploring new and intriguing techniques through which they can smoothly incorporate big data analytics in their systems for optimal results.

    Big data has numerous perks relating to the financial and banking industries. With the ever-changing nature of digital tech, information has become crucial, and these sectors are working diligently to take up and adjust to this transformation. There is significant competition in the industry, and emerging tactics and strategies must be accepted to survive the market competition. Using big data, firms can boost the quality and standards of their services.

    Perks associated with Big Data

    Analytics and big data play a critical role when it comes to the financial industry. Firms are currently developing efficient strategies that can woo and retain clients. Financial and banking corporations are learning how to balance Big Data with their services to boost profits and sales. Banks have improved their current data trends and automated routine tasks. Here are a few of the advantages of Big Data in the banking and financial industry:

    Improvement in risk management operations

    Big Data can efficiently enhance the ways firms utilize predictive models in the risk management discipline. It improves the response timeline in the system and consequently boosts efficiency. Big Data provides financial and banking organizations with better risk coverage. Thanks to automation, the process has become more efficient. Through Big Data, risk management teams can offer accurate intelligence insights linked to risk management.

    Engaging the workforce

    Among the most significant perks of Big Data in banking firms is worker engagement. The working experience in the organization is considerably better. Nonetheless, companies and banks that handle financial services need to realize that Big Data must be appropriately implemented. It can come in handy when tracking, analyzing, and sharing metrics connected with employee performance. Big Data aids financial and banking service firms in identifying the top performers in the corporation.

    Client data accessibility

    Companies can find out more about their clients through Big Data. Excellent customer service implies outstanding employee performance. Aside from designing numerous tech solutions, data professionals will assist the firm in setting performance indicators for a project, helping to inject analytic expertise into multiple organizational areas. Whenever there is a better process, the workflows are streamlined. Banking and financial firms can leverage improved insights and knowledge of customer service and operational needs.

    Author: Matt Bertram

    Source: Smart Data Collective

  • How data can aid young homeless people

    What comes to mind when you think of a “homeless person”? Chances are, you’ll picture an adult, probably male, dirty, likely with some health conditions, including a mental illness. Few of us would immediately recall homeless individuals as family members, neighbors, co-workers and other loved ones. Fewer still are likely aware of how many youths (both minors and young adults) experience homelessness annually.

    Homeless youth is a population who can become invisible to us in many ways. These youth may still be in school, may go to work, and interact with many of our public and private systems, yet not have a reliable and safe place to sleep, eat, do homework and even build relationships.

    Youth experiencing homelessness is, in fact, far more prevalent than many people realize, as the Voices of Youth Count research briefs have illustrated. Approximately 1 in 10 young adults (18-25) and 1 in 30 youth (13-17) experience homelessness over the course of a year. That’s over 4 million individuals.

    When I worked for the San Bernardino County Department of Behavioral Health, we ran programs specifically targeting homeless youth. The stories of lives changed by supportive care are still motivating! My role at the County focused primarily on data. At SAS, I have continued to explore ways data can support whole person care, which includes the effects of homelessness on health.

    I see three primary ways data can be powerful in helping homeless youth: 

    1. Data raises awareness

    Without good data, it’s hard to make interventions. Health inequities are a good example of this: if we don’t know where the problem is, we can’t change our policies and programs.

    The National Conference of State Legislatures has compiled a range of data points about youth homelessness in the United States and information on related policy efforts. This is wonderful information, and I appreciate how they connect basic data with policy.

    At the same time, this kind of data can be complicated to compile. Information about youth experiencing homelessness can be siloed, which inhibits a larger perspective, like a regional, statewide, or even national view. We also know there are many intersections with other public and private systems, including education, foster care, criminal justice, social services, workforce support and healthcare. Each system has a distinct perspective and data point.

    What would happen if we were able to have a continuous whole person perspective of youth experiencing homelessness? How might that affect public awareness and, by extension, public policy to help homeless youth?

    2. Data informs context and strengths

    While chronic health conditions are often present among homeless youth, this is also an issue for family members, leading to family homelessness. First off, this is an important example of not looking at people as just individuals, but as part of a bigger system. That fundamentally requires a more integrative perspective.

    Further, homeless youth experience higher rates of other social factors, such as interactions with foster care, criminal justice, and educational discipline (e.g., suspensions). Add on top of that other socio-economic contexts, including racial disparities and more youth from the LGBTQ+ communities.

    Just as I talked about the evaluation of suffering in general, having a more whole person perspective on homelessness is critical in understanding the true context of what may be contributing to homelessness… as well as what will help with it.

    It is easy to focus on all the negative outcomes and risk factors of homelessness in youth. What happens when we can start seeing folks experiencing homelessness as loved and meaningful members of our communities? Data that provides more holistic perspectives, including strengths, could help shift that narrative and even combat stigma and discrimination.

    In my role at San Bernardino County, I helped oversee and design program evaluation, including using tools like the Child and Adolescent Needs and Strengths (CANS) to assess more holistic impacts of acute programs serving homeless youth. Broadening our assessment beyond basic negative outcomes to include metrics like resilience, optimism, and social support not only reinforces good interventions, but also helps us to see youth experiencing homelessness as youth worthy of investment.

    That’s invaluable.

    3. Data empowers prevention and early intervention 

    Finally, homelessness is rarely a sudden event. In most cases, youth and their families experiencing homelessness have encountered one or more of our community systems before becoming homeless. I’ve talked before about using more whole person data to proactively identify high-risk people across public (especially health) systems.

    This approach can lead to early identification of people at risk of homelessness. If we can identify youth and family in an early encounter with health, social services, foster care or even the criminal justice system, could we better prevent homelessness in the first place? Some people will still experience homelessness, but could this same approach also help us better identify what kinds of interventions could reduce the duration of homelessness and prevent it from recurring?

    With whole person data, we can continue to refine our interventions and raise more awareness of what best helps youth experiencing homelessness. For instance, research has recognized the value of trauma-informed care with this population. The National Child Traumatic Stress Network has a variety of information that can empower anyone to better help homeless youth.

    In honor of National Homeless Youth Awareness Month and recognizing the importance of homelessness in general, I encourage you to explore some of these resources and read at least one to become more aware of the reality of the experience of homeless youth. That’s the first step in moving us forward.

    Author: Josh Morgan

    Source: SAS

  • How Greece set an example using online volunteering to battle COVID-19

    Assistant Volunteer, a project of Nable Solutions, was born during the HackCoronaGreece online hackathon to better coordinate the efforts of volunteers. Today, Assistant Volunteer’s platform is part of the Greek Ministry of Health’s official response to eradicating the pandemic. 

    This year, online hackathons have proven to be a great source of ideation for easily scalable solutions during crises. From a shortage of medical equipment to caring for patients remotely, solutions to better manage the COVID-19 outbreak flourished globally. However, it remained to be seen whether these solutions could be developed into mature products able to be integrated into official response programs.

    On April 7th-13th, in Berlin and Athens, the global tech community tackled the most pressing problems Greece faced due to the COVID-19 outbreak during the HackCoronaGreece online hackathon organized by Data Natives, eHealth Forum and GFOSS with the support of GreeceVsVirus (an initiative by the Greek Ministry of Digital Governance, Ministry of Health, and Ministry of Research & Innovation). Just two months later, Assistant Volunteer matured its solution to the final stages of development and was selected by the Greek Ministry of Health to officially contribute to managing the COVID-19 pandemic in Greece.

    The era of volunteering

    COVID-19 paved the way to a new era of volunteerism in response to the crisis. Even though isolated from each other, volunteer movements across the globe found ways to dedicate their time and efforts to help the ones in need and introduce innovative and effective ways of helping humanity.

    According to the United Nations, in Europe and Central Asia the volunteer movement has been officially recognized by some governments for the services provided by volunteers during the COVID-19 pandemic. That’s exactly the case with HackCoronaGreece and the solutions that have been created by diverse communities.

    One such solution, Assistant Volunteer, recognizes the problem of coordination – when thousands of people are gathering for a good cause, their efforts deserve outstanding management to maximize positive effects. 

    What is Assistant Volunteer?

    Assistant Volunteer was developed as part of the HackCoronaGreece hackathon by Nable Solutions, an award-winning startup providing software solutions with a social cause. Assistant Volunteer is an easy-to-use volunteer management software platform for organizations and government agencies. It can be configured to support organizations of all types and sizes in modernizing and upgrading their operations, seamlessly within their workflow. Through its modular architecture design, organisations can coordinate volunteers through the web app and mobile app.

    Any organization can register, create a profile, come up with actions needed, engage with the database of volunteers, track performance & measure impact.

    Assistant Volunteer competed with 14 other teams to be selected in the finale of the HackCoronaGreece hackathon and continue the development of their idea. The solution was recognized by the Greek Ministry of Health and selected for assistance in further development. 

    Multinational pharma giant MSD supports the project

    Another influential supporter of the project is MSD, a pharmaceutical multinational that contributed an award to Assistant Volunteer: a monetary prize of 7,000 euros.

    Previously, MSD Greece donated 100,000 euros to the Ministry of Health “to strengthen the national greek health system and to protect its citizens”. 

    MSD also donated 800,000 masks to New York and New Jersey. Working with the Bill and Melinda Gates Foundation and other healthcare companies, MSD is contributing to pushing forward the development of vaccines, diagnostic tools, and treatments for COVID-19 as soon as possible.

    The Greek Ministry of Health included Assistant Volunteer in its official efforts to fight the pandemic and facilitated the population of the platform with 10,000 volunteer profiles. Now, organizations can take the next steps in coordinating the volunteer movement in Greece and, potentially, beyond.

    Author: Evgeniya Panova

    Source: Dataconomy

     

  • How Nike And Under Armour Became Big Data Businesses

    Like the Yankees vs the Mets, Arsenal vs Tottenham, or Michigan vs Ohio State, Nike and Under Armour are some of the biggest rivals in sports.
     
    But the ways in which they compete — and will ultimately win or lose — are changing.
     
    Nike and Under Armour are both companies selling physical sports apparel and accessories products, yet both are investing heavily in apps, wearables, and big data.  Both are looking to go beyond physical products and create lifestyle brands athletes don’t want to run without.
     
    Nike
     
    Nike is the world leader in multiple athletic shoe categories and holds an overall leadership position in the global sports apparel market. It also boasts a strong commitment to technology, in design, manufacturing, marketing, and retailing.
     
    It has 13 different lines, in more than 180 countries, but how it segments and serves those markets is its real differentiator. Nike calls it “category offense,” and divides the world into sporting endeavors rather than just geography. The theory is that people who play golf, for example, have more in common than people who simply happen to live near one another.
     
    And that philosophy has worked, with sales reportedly rising more than 70% since the company shifted to this strategy in 2008. This retail and marketing strategy is largely driven by big data.
     
    Another place the company has invested big in data is with wearables and technology.  Although it discontinued its own FuelBand fitness wearable in 2014, Nike continues to integrate with many other brands of wearables, including Apple, which has recently announced the Apple Watch Nike+.
     
    But the company clearly has big plans for its big data as well. In a 2015 call with investors about Nike’s partnership with the NBA, Nike CEO Mark Parker said, “I’ve talked with commissioner Adam Silver about our role enriching the fan experience. What can we do to digitally connect the fan to the action they see on the court? How can we learn more about the athlete, real-time?”
     
    Under Armour
     
    Upstart Under Armour is betting heavily that big data will help it overtake Nike. The company has recently invested $710 million in acquiring three fitness app companies, including MyFitnessPal, and their combined community of more than 120 million athletes — and their data.
     
    While it’s clear that both Under Armour and Nike see themselves as lifestyle brands more than simply apparel brands, the question is how this shift will play out.
     
    Under Armour CEO Kevin Plank has explained that, along with a partnership with a wearables company, these acquisitions will drive a strategy that puts Under Armour directly in the path of where big data is headed: wearable tech that goes way beyond watches.
     
    In the not-too-distant future, wearables won’t just refer to bracelets or sensors you clip on your shoes, but rather apparel with sensors built in that can report more data more accurately about your movements, your performance, your route and location, and more.
     
    “At the end of the day we kept coming back to the same thing. This will help drive our core business,” Plank said in a call with investors. “Brands that do not evolve and offer the consumer something more than a product will be hard-pressed to compete in 2015 and beyond.”
     
    The company plans to provide a full suite of activity and nutritional tracking and expertise in order to help athletes improve, with the assumption that athletes who are improving buy more gear.
     
    If it has any chance of unseating Nike, Under Armour has to innovate, and that seems to be exactly where this company is planning to go. But it will have to connect its data to its innovations lab and ultimately to the products it sells for this investment to pay off.
     
     
    Source: forbes.com, November 15, 2016
  • How patent data can provide intelligence for other markets

    Patents are an interesting phenomenon. Believe it or not, the number one reason why patent systems exist is to promote innovation and the sharing of ideas. Simply put, a patent is really a trade. A government has the ability to give a limited monopoly to an inventor. In exchange for this exclusivity, the inventor provides a detailed description of their invention. The application for a patent needs to include enough detail about the technology that a peer in the field could pick it up and understand how to make or practice that invention. Next, this description gets published to the world so others can read and learn from it. That's the exchange. Disclose how your invention is made, in enough detail to replicate, and you can get a patent.

    It gets really interesting when you consider that the patent carries additional metadata with it. This additional data is above and beyond the technical description of the invention. Included in this data are the inventor names, addresses, the companies they work for (the patent owner), the date of the patent filing, a list of related patents/applications, and more. This metadata and the technical description of the invention make up an amazing set of data identifying research and development activity across the world. Also, since patents are issued by governments, they are inherently geographic. This means that an inventor has to apply for a patent in every country where they want protection. Add the fact that patents are quite expensive, and we are left with a set of ideas that have at least passed some minimal value threshold. That willingness to spend money signals a value in the technology, specifically a value of that technology in the country/market where each patent is filed. In many ways, if you want to analyze a technology space, patent data can be better than analyzing products. The technology is described in substantial detail and, in many cases, identifies tech that has not hit the market yet.
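
    To make the shape of this metadata concrete, here is a small, hypothetical record structure (the field names are illustrative only, not any patent office's or vendor's actual schema):

        from dataclasses import dataclass, field
        from typing import List

        @dataclass
        class PatentRecord:
            publication_number: str
            title: str
            assignee: str            # the patent owner
            inventors: List[str]
            filing_date: str         # ISO date, e.g. "2019-08-21"
            jurisdiction: str        # country/office where protection is sought, e.g. "US", "CN"
            related: List[str] = field(default_factory=list)

        example = PatentRecord(
            publication_number="US1234567B2",
            title="Method for training a neural network",
            assignee="Example AI Corp",
            inventors=["A. Inventor", "B. Inventor"],
            filing_date="2019-08-21",
            jurisdiction="US",
        )
        print(example.assignee, example.jurisdiction)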

    Breaking down a patent dataset

    Black Hills IP has amassed a dataset of over 100 million patent and patent application records from across the world. We not only use data published by patent offices, but we also run proprietary algorithms on that data to create additional patent records and metadata. This means we have billions of data points to use in analysis, and likely have the largest consolidated patent dataset in the world. In the artificial intelligence (AI) space alone, we have an identified set of between one hundred thousand and two hundred thousand patent records. This has been fertile ground for analysis and insight.

    Breaking down this dataset, we can see ownership and trends around foundational and implementational technologies. For example, several of the large US players are covering their bases with patent filings in multiple jurisdictions, including China. Interestingly enough, the inverse is not necessarily shown. Many of the inventions from Chinese companies have their patent filings (and thus protection) limited to just China. While many large US companies in the field tend to have their patent portfolios split roughly 50/50 between US and international patent filings, the top players in China have a combined distribution with well over 75% domestic and only the remainder in international jurisdictions. This means that there is a plethora of technology protected only within the borders of China, and the implications could be significant given the push for AI technology development in China and the wealth of resources available to fuel that development.
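
    As a hedged sketch of the kind of portfolio analysis described above (entirely synthetic filings and invented company names, not Black Hills IP's actual method), the domestic-versus-international split per assignee is a simple aggregation over such records:

        from collections import defaultdict

        # (assignee, home_country, filing_jurisdiction) triples - synthetic examples
        filings = [
            ("US AI Corp", "US", "US"), ("US AI Corp", "US", "CN"), ("US AI Corp", "US", "EP"),
            ("CN AI Ltd", "CN", "CN"), ("CN AI Ltd", "CN", "CN"), ("CN AI Ltd", "CN", "CN"),
            ("CN AI Ltd", "CN", "US"),
        ]

        counts = defaultdict(lambda: {"domestic": 0, "international": 0})
        for assignee, home, jurisdiction in filings:
            key = "domestic" if jurisdiction == home else "international"
            counts[assignee][key] += 1

        for assignee, c in counts.items():
            total = c["domestic"] + c["international"]
            print(assignee, f"domestic share: {c['domestic'] / total:.0%}")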

    So what?

    Why does all this matter? When patents are filed in a single jurisdiction only, they are visible to the world and open the door to free use outside the country of filing. In years past, we have seen Chinese companies repurpose Silicon Valley technologies for the China domestic market. With more of a historical patent thicket in the US than in China, this strategy made sense. When development and patent protection have been strong in the US, repurposing that technology in a less protected Chinese market is not only possible, but a viable business model. What we’re seeing now in the emerging field of AI technology, specifically the implementation of such technologies, is the pendulum starting to swing back.

    In an interesting reversal of roles, the publication of Chinese patents on technologies not concurrently protected in the US has the potential to drive a copying of Chinese-originated AI tech in the US market. We may see some rapid growth of implementational AI technologies in the US or other western countries, fueled by Chinese development and domestic-focused IP strategy. Of course, there are many other insights to glean out of this wealth of patent data. The use of these patent analytics in the technology space will only increase as the patent offices across the world improve their data reporting and availability. Thanks to advances by some of the major patent offices, visibility into new developments is getting easier and easier. Technology and business intelligence programs stand to gain substantially from the insights hidden in IP data.

    Author: Tom Marlow

    Source: Oracle

  • How the data-based gig economy affects all markets

    Data is infinite. Any organization that wants to grow at a meaningful pace would be wise to learn how to leverage the vast amount of data available to drive growth. Just ask the top five companies in the world today: Apple, Amazon, Google, Facebook, and Microsoft. All these technology giants either process or produce data.

    Companies like these with massive stockpiles of data often find themselves surrounded by other businesses that use that data to operate. Salesforce is a great example: each year at its Dreamforce conference in San Francisco, hundreds of thousands of attendees and millions of viewers worldwide prove just how many jobs the platform has created.

    Other companies are using vast amounts of information from associated companies to enhance their own data or to provide solutions for their clients to do so. When Microsoft acquired LinkedIn, for instance, it acquired 500 million user profiles and all of the data that each profile has generated on the platform. All ripe for analysis.

    With so much growth evolving from a seemingly infinite ocean of data, tomorrow’s leading companies will be those that understand how to capture, connect, and leverage information into actionable insight. Unless they’re already on the top 10 list of the largest organizations, the problem most companies face is a shortage of highly skilled talent that can do it for them. Enter the data scientist.

    More data, more analysts

    The sheer amount of data at our fingertips isn’t the only thing that’s growing. According to an Evans Data report, more than 6 million developers across the world are officially involved in analyzing big data. Even traditionally brick-and-mortar retail giant Walmart plans to hire 2,000 tech experts, including data scientists, for that specific purpose.

    Companies old and new learned long ago that data analysis is vital to understanding customers’ behavior. Sophisticated data analytics can reveal when customers are likely to buy certain products and what marketing methods would be effective in certain subgroups of their customer base.

    Outside of traditional corporations, companies in the gig economy are relying even more on data to utilize their resources and workforce more efficiently. For example, Uber deploys real-time user data to determine how many drivers are on the road at any given time, where more drivers are needed, and when to enact a surge charge to attract more drivers.
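
    As a purely illustrative sketch (not Uber's actual algorithm; the threshold and cap are invented), the core of such real-time balancing can be reduced to a demand-to-supply ratio that triggers a surge multiplier:

        def surge_multiplier(ride_requests: int, available_drivers: int,
                             threshold: float = 1.5, cap: float = 3.0) -> float:
            """Return a surge multiplier based on the demand-to-supply ratio."""
            if available_drivers == 0:
                return cap
            ratio = ride_requests / available_drivers
            if ratio <= threshold:
                return 1.0                      # enough drivers: no surge
            return min(cap, round(ratio / threshold, 2))

        print(surge_multiplier(ride_requests=120, available_drivers=100))  # 1.0
        print(surge_multiplier(ride_requests=300, available_drivers=100))  # 2.0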

    Data scientists are in demand and being hired by the thousands. Some of the most skilled data scientists are going the freelance route because their expertise allows them to choose more flexible work styles. But how can data scientists who aren’t interested in becoming full-time, in-house hires ensure that the companies for which they freelance are ready for their help?

    The data-based gig economy

    Gartner reports that the number of freelance data scientists will grow five times faster than that of traditionally employed ones by next year. The data-based gig economy can offer access to top talent on flexible schedules. But before data scientists sign on for a project, they should check to see that companies are prepared in the following areas:

    • Companies need to understand their data before they decide what to do with it. That data could include inventory, peak store hours, customer data, or other health metrics.
    • Next, businesses should have streamlined the way they collect and store their data to make it easy to analyze. Use of a CRM platform is a good indicator of preparedness at this stage.
    • Finally, companies need to be able to act on the insights they glean. After freelancers are able to use organizations’ collected and organized data to find valuable connections and actionable insights, those organizations should have a process for implementing the discoveries.

    Today’s organizations need data in order to be successful, and they need data scientists to make use of that data. In order for both parties to thrive in this era, companies need to have the right strategies in place before they invest in freelance talent. When they do, freelance data scientists will have the opportunity to gather critical knowledge from the data and use their talents to drive innovation and success.

    Author: Marcus Sawyerr

    Source: Insidebigdata

  • How to create a trusted data environment in 3 essential steps

    We are in the era of the information economy. Nowadays, more than ever, companies have the capability to optimize their processes through the use of data and analytics. While there are endless possibilities when it comes to data analysis, there are still challenges with maintaining, integrating, and cleaning data to ensure that it empowers people to make decisions.

    Bottom up or top down? Which is best?

    As IT teams begin to tackle the data deluge, a question often asked is: should this problem be approached from the bottom up or the top down? There is no “one-size-fits-all” answer here, but every data team needs a high-level view that gives a quick overview of its data subject areas. Think of this high-level view as a map you create to define priorities and identify problem areas for your business within the modern data-based economy. This map will allow you to set up a phased approach to optimizing the data assets that contribute the most value.

    The high-level view unfortunately is not enough to turn your data into valuable assets. You also need to know the details of your data.

    Getting the details from your data is where a data profile comes into play. This profile tells you what your data is from the technical perspective, while the high-level view (the enterprise information model) gives you the view from the business perspective. Real business value comes from the combination of both: a transversal, holistic view of your data assets that allows you to zoom in or out. The high-level view with technical details (even without the profiling) lets you start with the most important phase in the digital transformation: discovery of your data assets.

    Not only data integration, but data integrity

    With all the data travelling around in different types and sizes, integrating the data streams across various partners, apps and sources has become critical. But it’s more complex than ever.

    Due to the sizes and variety of data being generated, not to mention the ever-increasing speed in go to market scenarios, companies should look for technology partners that can help them achieve this integration and integrity, either on premise or in the cloud.

    Your 3 step plan to trusted data

    Step 1: Discover and cleanse your data

    A recent IDC study found that only 19% of a data professional’s time is spent analyzing information and delivering valuable business outcomes. They spend 37% of their time preparing data and 24% of their time goes to protecting data. The challenge is to overcome these obstacles by bringing clarity, transparency, and accessibility to your data assets.

    This discovery platform, which at the same time allows you to profile your data, understand its quality and build a confidence score that creates trust with the business users of the data assets, takes the form of an auto-profiling data catalog.

    Thanks to the application of Artificial Intelligence (AI) and Machine Learning (ML) in data catalogs, data profiling can be provided as self-service to power users.

    Bringing transparency, understanding, and trust to the business brings out the value of the data assets.
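
    A minimal sketch of the kind of column-level profile such a catalog produces (pandas is assumed to be available; the confidence score formula below is an invented illustration, not a feature of any specific product):

        import pandas as pd

        df = pd.DataFrame({
            "customer_id": [1, 2, 3, 4, 4],
            "email": ["a@x.com", None, "c@x.com", "d@x.com", "d@x.com"],
        })

        def profile(frame: pd.DataFrame) -> pd.DataFrame:
            """Compute simple per-column quality metrics and a naive confidence score."""
            rows = len(frame)
            stats = pd.DataFrame({
                "completeness": 1 - frame.isna().mean(),          # share of non-null values
                "uniqueness": frame.nunique() / rows,             # share of distinct values
            })
            stats["confidence"] = (stats["completeness"] + stats["uniqueness"]) / 2
            return stats.round(2)

        print(profile(df))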

    Step 2: Organize data you can trust and empower people

    According to the Gartner Magic Quadrant for Business Intelligence and Analytics Platforms, 2017: “By 2020, organizations that offer users access to a curated catalog of internal and external data will realize twice the business value from analytics investments than those that do not.”

    An important phase in a successful data governance framework is establishing a single point of trust. From the technical perspective, this translates into collecting all the data sets in a single point of control. The governance aspect is the capability to assign roles and responsibilities directly in that central point of control, which allows you to instantly operationalize your governance from the place the data originates.

    The organization of your data assets goes along with the business understanding of the data, transparency and provenance. The end to end view of your data lineage ensures compliance and risk mitigation.

    With the central compass in place and the roles and responsibilities assigned, it’s time to empower people to curate and remediate data, and ongoing communication is of vital importance for the adoption of a data-driven strategy.

    Step 3: Automate your data pipelines & enable data access

    Different layers and technologies make our lives more complex. It is important to keep our data flows and streams aligned and to adapt quickly to changes in business needs.

    The needed transitions, data quality profiling and reporting can be extensively automated.

    Start small and scale big. Part of this intelligence can nowadays be achieved by applying AI and ML. These algorithms take cumbersome work out of the hands of analysts and can be scaled more easily. This automation gives analysts a faster understanding of the data and lets them produce better and more insights in a given amount of time.
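
    A small illustrative sketch (hypothetical rules and thresholds) of automating a data quality gate inside a pipeline, so that bad loads are flagged without an analyst inspecting them manually:

        def quality_gate(rows, required_fields=("id", "email"), max_null_rate=0.05):
            """Return (passed, report) for a batch of records."""
            report = {}
            for field in required_fields:
                nulls = sum(1 for r in rows if not r.get(field))
                report[field] = nulls / len(rows)
            passed = all(rate <= max_null_rate for rate in report.values())
            return passed, report

        batch = [{"id": 1, "email": "a@x.com"}, {"id": 2, "email": None}, {"id": 3, "email": "c@x.com"}]
        ok, report = quality_gate(batch)
        print("load batch" if ok else "quarantine batch", report)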

    Putting data at the center of everything, implementing automation and provisioning data through one single platform are key success factors in your digital transformation and in becoming a real data-driven organization.

    Source: Talend

  • How to prevent and deal with big data breaches

    On average, every data breach affects about 25,000 records and costs the affected organization almost $4 million. This cost comes in the form of brand damage, loss of customer trust, and regulatory fines. As data increases, so does your liability should a breach occur. More data also means that your systems become more valuable and appealing to potential attackers.

    Since no system is 100% secure, you should prepare yourself for the inevitable attempted or successful breach. In this article, you’ll learn how big data is vulnerable, which should help you keep your data safe. You’ll also learn some best practices for handling any breach that does occur and for minimizing the damage caused.

    How is big data vulnerable?

    Generally, big data is as vulnerable as the system it’s stored in. It is also vulnerable because of the ways it is collected, stored, and accessed, and because of the personal information it often contains.

    Poor data validation

    Big data is collected from many sources, some of which may be insecure. The speed and quantity of data ingestion present many opportunities for attackers to tamper with data or introduce malicious data or files. When collecting data, you open yourself to risk if you do not verify where your data is coming from or ensure that it is safe and reliable. This includes verifying that it is transferred securely.
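
    To illustrate the kind of validation meant here, the sketch below checks both the integrity of an incoming payload (with an HMAC signature, assuming a shared secret agreed with the data supplier) and the structure of each record; the field names and plausibility range are assumptions.

    import hmac, hashlib, json

    SHARED_SECRET = b"example-secret"   # assumed to be agreed with the trusted data supplier

    def verify_signature(payload: bytes, signature: str) -> bool:
        """Check that the payload comes from the trusted source and was not tampered with in transit."""
        expected = hmac.new(SHARED_SECRET, payload, hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, signature)

    def validate_record(record: dict) -> bool:
        """Minimal structural and plausibility checks before the record enters the data lake."""
        return (
            isinstance(record.get("sensor_id"), str)
            and isinstance(record.get("value"), (int, float))
            and -1000 < record["value"] < 1000          # crude plausibility range
        )

    payload = json.dumps({"sensor_id": "s-42", "value": 21.5}).encode()
    signature = hmac.new(SHARED_SECRET, payload, hashlib.sha256).hexdigest()  # computed by the supplier

    if verify_signature(payload, signature) and validate_record(json.loads(payload)):
        print("record accepted")
    else:
        print("record rejected")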

    Insufficient protection

    Big data tools, particularly open-source tools, often don’t have native or comprehensive security features. So, you must extend security from your existing tools and services. This 'bolted on' security may not interface well with your tooling and can leave gaps that you’re unaware of.

    Lack of data masking or encryption

    You often need to manipulate big data to use it in analyses. Data masking is when you obscure identifying details from users and interfaces. The access required for this manipulation creates windows during which data may not be masked or encrypted, and during those windows data is vulnerable to breach, tampering, or corruption.
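
    A minimal sketch of what masking could look like before data is handed to analysts is shown below; the field names are hypothetical, and in practice this would be combined with encryption at rest and in transit.

    import hashlib

    def mask_email(email: str) -> str:
        """Keep the domain for analysis, obscure the identifying local part."""
        local, _, domain = email.partition("@")
        return f"{local[0]}***@{domain}"

    def pseudonymize(value: str, salt: str = "rotate-me") -> str:
        """One-way pseudonym so records can still be joined without exposing the raw value."""
        return hashlib.sha256((salt + value).encode()).hexdigest()[:12]

    record = {"customer_id": "C-1001", "email": "jane.doe@example.com", "basket_total": 42.10}
    masked = {
        "customer_id": pseudonymize(record["customer_id"]),
        "email": mask_email(record["email"]),
        "basket_total": record["basket_total"],   # non-identifying fields stay usable for analysis
    }
    print(masked)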

    Insecure interfaces

    Big data may be accessed from a variety of interfaces, including web consoles, cloud portals, and third-party integrations. These interfaces enable potential attackers to view, manipulate, and manage data. Vulnerabilities in these interfaces can provide direct access to your data and your systems.

    Distributed storage

    Big data is often stored in multiple locations, such as across distributed databases. While this creates redundancy and availability, storing data in multiple locations also makes it difficult to monitor and secure. Multiple storage locations provide a broader attack surface and increase the chance that attackers can access data through other parts of your system.

    Best practices for dealing with a big data breach

    Big data breaches often involve both data loss and compromised privacy. Both present a significant risk to you and your customers. The following best practices can help you deal with breaches appropriately and, hopefully, reduce these harms.

    Be transparent and notify all relevant parties

    When you discover a breach, it is important to be transparent and timely with your disclosure. This includes informing stakeholders, authorities, regulatory boards, and customers. You should also keep in mind that many regulatory agencies require notification within a specific period. In general, try to notify within 24 to 48 hours. 

    In your notifications, you should include the known facts about the breach and the steps you are currently taking. It is better to prepare your shareholders and customers for the worst case than to understate the situation. This will improve your trustworthiness. On top of that, if you discover that the breach is less serious than feared, it will be a relief for stakeholders and customers.

    After the breach is contained and recovered from, you should share what steps were taken and what will be changed to prevent future breaches. You should not provide the specifics of actions taken throughout the response and recovery processes. Doing so can undermine your efforts by sharing information with attackers. Rather, provide clear, general statements about what is known and how you are taking action.

    Follow your Incident Response Plan

    You should already have an Incident Response Plan (IRP) in place. This plan outlines the responsibilities of your responders and how procedures should be followed and provides information on response priorities. An IRP ensures that your security team can carry out an efficient and effective response.

    Make sure to follow this plan and the procedures it outlines. If you deviate from the plan you are likely to overlook steps or contaminate evidence. Following the processes that you have already created and practiced can help reduce stress on responders and prevent them from making mistakes. Following your IRP also can ensure that responses are comprehensive and that actions are documented appropriately.

    Maintain privileged documentation

    Maintaining consistent documentation of your response measures is often necessary for regulatory compliance, and auditing after a breach. Document all actions you take, including who is performing the action and the tools and methods they are using. Include any approval of processes and the time and date of all related communications.

    As part of this documentation, make sure to keep a secure chain of custody of any breach evidence found. A chain of custody helps ensure that you can prosecute the responsible parties if they’re found. If you fail to document evidence or who has handled it, you risk losing valuable threat information and proof of the attacker’s actions.

    Learn from your mistakes

    While you cannot undo a breach, you can learn from your mistakes. It is vital to analyze data from the breach itself as well as your response to the breach. Refine your IRP and security policies and procedures based on your evaluation.

    Your first priority should be addressing vulnerabilities that were uncovered in the breach. This includes vulnerabilities that an attacker discovered but did not successfully exploit. Often attackers will return and attempt to infiltrate systems again and there is no excuse for their being able to reuse the same exploits.

    If you uncovered vulnerabilities during your response that were not associated with the breach, you should address these as well. Likewise, you should use the breach as an opportunity to discuss security with your teams, shareholders, and customers. Reinforce proper security measures and practices with training and information that they all can apply.

    Conclusion

    Despite your best efforts, at some point or another an attacker is likely to infiltrate your systems and data. When this happens, you need to respond quickly and efficiently. The sooner you can detect and contain an attack, the less data an attacker can steal.

    Hopefully, this article helped you understand how big data is vulnerable and the steps you can take to ensure an effective response. To reduce your chances of having to deal with a breach in the first place, take the time to properly secure your system. You can start by performing a vulnerability assessment to identify where your weaknesses are.

    Author: Gilad David Maayan

    Source: Dataversity

  • How to use AI image recognition responsibly?

    How to use AI image recognition responsibly?

    The use of artificial intelligence (AI) for image recognition offers great potential for business transformation and problem-solving. But numerous responsibilities are interwoven with that potential. Predominant among them is the need to understand how the underlying technologies work, and the safety and ethical considerations required to guide their use.

    Regulations Coming for image, face, and voice recognition?

    Today, governance regulations have sprung up worldwide that dictate how an individual’s personal information is held and used, and who owns it. The General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) are examples of regulations designed to address data and security challenges faced by consumers and the businesses that possess their associated data. If laws now apply to personal data, can regulations governing image and facial recognition (technology that can identify a person’s face and voice, the most personal 'information' we possess) be far behind? Further regulations are likely coming, but organizations shouldn’t wait to plan and direct their use of these technologies. Businesses need to follow how this technology is being both used and misused, and then proactively apply guidelines that govern how to use it effectively, safely, and ethically.

    The use and misuse of technology

    Many organizations use recognition capabilities in helpful and transformative ways. Medical imaging is a prime example. Through machine learning, predictive algorithms come to recognize tumors more accurately and faster than human doctors can. Autonomous vehicles use image recognition to detect road signs, traffic signals, other traffic, and pedestrians. For industrial manufacturers and utilities, machines have learned how to recognize defects in things like power lines, wind turbines, and offshore oil rigs through the use of drones. This ability removes humans from what can sometimes be dangerous environments, improving safety, enabling preventive maintenance, and increasing frequency and thoroughness of inspections. In the insurance field, machine learning helps process claims for auto and property damage after catastrophic events, which improves accuracy and limits the need for humans to put themselves in potentially unsafe conditions.

    Just as most technologies can be used for good, there are always those who seek to use them intentionally for ignoble or even criminal reasons. The most obvious example of the misuse of image recognition is deepfake video or audio. Deepfake video and audio use AI to create misleading content or alter existing content to try to pass off something as genuine that never occurred. An example is inserting a celebrity’s face onto another person’s body to create a pornographic video. Another example is using a politician’s voice to create a fake audio recording that seems to have the politician saying something they never actually said.

    In-between intentional beneficial use and intentional harmful use, there are gray areas and unintended consequences. If an autonomous vehicle company used only one country’s road signs as the data to teach the vehicle what to look for, the results might be disastrous if the technology is used in another country where the signs are different. Also, governments use cameras to capture on-street activity. Ostensibly, the goal is to improve citizen safety by building a database of people and identities. What are the implications for a free society that now seems to be under public surveillance? How does that change expectations of privacy? What happens if that data is hacked?

    Why take proactive measures?

    Governments and corporate governance bodies will likely create guidelines and laws that apply to these types of tools. There are a number of reasons why businesses should proactively plan for how they create and use these tools now, before such laws come into effect.

    Physical safety is a prime concern. If an organization creates or uses these tools in an unsafe way, people could be harmed. Setting up safety standards and guidelines protects people and also protects the business from legal action that may result from carelessness.

    Customers demand accountability from companies that use these technologies. They expect their personal data to be protected, and that expectation will extend to their image and voice information as well. Transparency helps create trust and that trust will be necessary for any business to succeed in the field of image recognition.

    Putting safety and ethics guidelines in place now, including establishing best practices such as model audits and model interpretability, may also give a business a competitive advantage by the time laws governing these tools are passed. Other organizations will be playing catch-up while those who have planned ahead gain market share over their competitors.

    Author: Bethann Noble

    Source: Cloudera

  • How to use data science to get the most useful insights out of your data

    How to use data science to get the most useful insights out of your data

    Big data has been touted as the answer to many of the questions and problems businesses have encountered for years. Granular touch-points should simplify making predictions, solving problems, and anticipating the big picture down the road. The theory behind data science rests on the law of large numbers; much like in quantum physics, any conclusion we draw from predicting or analyzing a data lake can only be a probability. Data cannot simply be read; it is like a code that needs to be cracked.

    There’s an incredible amount of insight that can be gleaned from this type of information, including using consumer data to better inform strategies and bottom lines. But the number of businesses that actually derive actionable steps from their data is minimal. So, how can companies ensure that they’re effectively managing the data they collect in order to improve business practices?

    Identify what you’re looking to learn

    Too many companies invest heavily in software and people in a quest for big data and analytics without truly defining the problems they’re looking to solve. Business leaders expect to instantly cast a wide net over all datasets, but they won’t necessarily get something useful in return.

    Take, for example, a doctor that spent over a year and a half implementing a new system that was supposed to give his colleagues meaningful medical insights.

    After collecting the data without truly defining the problem they wanted to solve, they ended up with the following insight: “Those who have had cancer have had a cancer test.” This, obviously, is a true statement culled from the data. The problem is it’s useless information.

    The theory behind data science was never meant for small data sets, and trying to apply it to them comes with a host of issues and irregularities; at the same time, more data doesn’t necessarily mean better insights. Knowing what questions to ask is as important for a company as having the best tools for thorough data analysis.

    Prepare your data to be functional

    They say practice makes perfect, but with data science, practice makes permanent if you’re doing it the wrong way.

    The systems that companies use to keep track of data don’t have a lot of validation. Once you start diving into big data for insights, you realize there’s a whole layer of “sanitization” and transformation that needs to happen before you can start running reports and gleaning useful information.

    We’ve seen major companies doing data migration with an accuracy rate of 53%. Imagine if you went to the doctor mentioned in the previous section and he admitted his recommendations were only 53% correct. It’s a safe bet you’re not going to that doctor anymore.
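
    As a rough sketch of how such a migration accuracy rate could be measured, the snippet below compares migrated records against the source system by key; the record ids and field values are made up.

    def migration_accuracy(source: dict, migrated: dict) -> float:
        """Share of source records that arrived with identical values (keyed by record id)."""
        matches = sum(1 for key, row in source.items() if migrated.get(key) == row)
        return matches / len(source)

    source = {
        1: {"name": "Acme", "country": "US"},
        2: {"name": "Globex", "country": "NL"},
        3: {"name": "Initech", "country": "DE"},
    }
    migrated = {
        1: {"name": "Acme", "country": "US"},
        2: {"name": "Globex", "country": "Netherlands"},   # value was transformed, so it no longer matches
        # record 3 was dropped during migration
    }
    print(f"accuracy: {migration_accuracy(source, migrated):.0%}")   # 33%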

    To get quality data, you have to understand what quality data looks like. The human element and the machine have to work together; there needs to be an actionable balance. Data sources are constantly in flux, drawing in new inputs from the outside world, so ensuring a useful level of quality on the data coming in is critical, or you’ll get questionable results.

    Depend on a reliable tech solution

    Once you have a clear path of checks and balances to ensure you’re on the right track, establishing a minimum viable product — potentially with a more efficient outsourced team — is what will truly drive actionable results. It makes sure the assumptions and projections derived from the insights are continually up to date, and looks from different angles to anticipate major trend changes.

    It’s important to see the big picture, but also be able to change a model’s behavior if it’s not delivering the most valuable insights. Whatever solution you settle on might not necessarily be the most sophisticated, but as long as it’s providing the answers to the right questions, it will be more impactful than something complex and obscure.

    When companies employ tools to untangle their stores of data without having a deep understanding of the limitations of data science, they risk making decisions based on faulty predictions, resulting in detriment to their organization. That means higher costs, incorrect success metrics and errors across marketing initiatives.

    Data science is still evolving very quickly. Although we will never get to the point that we can predict everything accurately, we will get a better understanding of problems to provide even more useful insights from data.

    Author: Luming Wang

    Source: Insidebigdata

  • How valuable is your data science project really? An evaluation guide

    How valuable is your data science project really? An evaluation guide

    Performance metrics can’t tell you what you want to know: how valuable a project actually is

    There is a big focus in data science on various performance metrics. Data scientists will spend months trying to improve various performance metrics for a project. The issue is, it isn’t clear that all of this effort actually provides value. If you’re only looking at performance metrics, it’s not possible to know if you’re increasing the value your model is providing.

    Performance metrics don’t know how valuable your predictions are. To take one example, F1 score explicitly places equal weighting on precision and recall. In practice, there is usually a different business cost to false positives and false negatives.

    You can tell how well your model is doing using a million different metrics. But none of these tell you what stakeholders actually want to know: what business value does this have?

    Money Talks

    What is business value? At the end of the day, for a for-profit business, business value is monetary value.

    This is great news for data scientists: we love numbers. Money is quantitative.

    Unfortunately, the connection between what we’re building and the monetary value isn’t always straightforward.

    One of the most concrete ways to connect a data science project to business value is to calculate what implementing that model would mean for the company’s bottom line. This isn’t always possible, but it’s a useful exercise. By looking at the situations where we can calculate an explicit value, we can clarify the areas where the connection is less clear.

    A concrete example of calculating the business value of a model

    Let’s take a straightforward example where we are building a model for a business problem: detecting manufacturing defects at a widget factory.

    We know that if we detect a defective widget, we throw it out, leading to a loss of the manufacturing cost. The cost of replacing a defective widget is $100.

    If we fail to detect a defective widget, we ship the widget to a customer, and then have to replace their widget and pay for shipping on the new widget. Let’s say the shipping cost is $50, on top of the $100 loss from replacing the widget.

    If we have a model for predicting defective widgets, we can then write out the costs for different scenarios:

    True positives: -$100 for being down a widget

    False positives: -$100 to replace widget we thought was defective

    True negative: 0 (we’re considering “no defect” to be the default scenario the costs are compared against)

    False negative: -$150 to ship replacement widget

    [Figure: cost matrix for the widget example. Credit: Tommy Blanchard]

    Note that this is equivalent to saying the cost of a false positive is $100 (the difference between false positive and true negative) and the cost of a false negative is $50 (the difference between a false negative and a true positive).

    We can now build a classifier and calculate what the cost from defects would be if we used that classifier as our screening process. Evaluating the business value of the model is straightforward: we just need to produce a confusion matrix on the test set. Let’s take an example:

    [Figure: example confusion matrix. Credit: Tommy Blanchard]

    Then we multiply the cost of each outcome with the proportion of times that outcome occurs:

    (-100 * 0.2) + (-100 * 0.05) + (-150 * 0.05) + (0 * 0.7) = -32.50.

    In other words, if we use this model, we will lose an average of $32.50 per widget due to defects.

    We can compare this to the cost of the current policy. Let’s assume that currently there is no screening for defects and all widgets are shipped. Based on the confusion matrix above, 25% of widgets are defective. So to calculate the cost of this policy, we just multiply 0.25 by the cost of shipping a defective product:

    -150 * 0.25 = -37.50

    This policy costs an average of $37.50 per widget. Therefore, implementing our model to screen out widgets will save the company an average of $5 per widget.
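
    Reproducing this calculation in code is straightforward; the sketch below simply restates the article’s numbers, with the confusion-matrix proportions (20% true positives, 5% false positives, 5% false negatives, 70% true negatives) written out explicitly.

    # Cost of each outcome, per widget (negative = money lost)
    costs = {"tp": -100, "fp": -100, "tn": 0, "fn": -150}

    # Proportions from the example confusion matrix
    proportions = {"tp": 0.20, "fp": 0.05, "tn": 0.70, "fn": 0.05}

    cost_with_model = sum(costs[k] * proportions[k] for k in costs)        # -32.50
    defect_rate = proportions["tp"] + proportions["fn"]                    # 0.25
    cost_without_model = costs["fn"] * defect_rate                         # -37.50 (every defect gets shipped)

    print(f"with model:    ${cost_with_model:.2f} per widget")
    print(f"without model: ${cost_without_model:.2f} per widget")
    print(f"savings:       ${cost_with_model - cost_without_model:.2f} per widget")  # $5.00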

    That’s it! We’ve calculated the business value of our model. Of course, in the real world there may be costs to implementing a new policy, and those costs would have to be compared against the calculated gains of implementing the model.

    Asymmetric costs shift the optimal decision threshold

    With an explicitly defined cost matrix, we have opportunities to finetune our model to minimize costs further.

    Most classifiers by default use a probability decision threshold of 0.5 to determine what to label positive or negative, but with asymmetric costs that’s not necessarily the best threshold to use.

    For example, in this scenario, false positives cost more than false negatives ($100 for false positives vs $50 for false negatives). This pushes the optimal decision threshold higher: since false negatives are less costly than false positives, we should require more certainty before labeling a widget defective, and be more willing to take on false negatives.

    Here is a simple simulation of what the cost curve could look like for this cost matrix (note that the minimum on the cost curve is around 0.7, so that would be our optimal decision threshold):

    [Figure: an example cost curve for different probability thresholds; costs have been normalized so that 1 is the most costly scenario. Credit: Tommy Blanchard]

    Therefore, we should label anything with a prediction probability above ~0.7 as defective, and everything else as not defective.
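
    In practice, one way to find that threshold is to sweep candidate thresholds over held-out predicted probabilities and pick the one with the lowest expected cost. The sketch below does this with numpy on synthetic, calibrated scores, so the exact minimum will not match the article’s simulated curve, but it should land near the value implied by the 100:50 cost ratio.

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic hold-out set with calibrated scores: labels are drawn from the predicted probabilities
    y_prob = rng.random(5000)
    y_true = rng.binomial(1, y_prob)

    COST_FP, COST_FN = 100, 50   # extra cost of a false positive / false negative, from the example

    def expected_cost(threshold: float) -> float:
        """Average extra cost per widget if everything above the threshold is labeled defective."""
        y_pred = y_prob >= threshold
        fp_rate = np.mean(y_pred & (y_true == 0))
        fn_rate = np.mean(~y_pred & (y_true == 1))
        return COST_FP * fp_rate + COST_FN * fn_rate

    thresholds = np.linspace(0.05, 0.95, 91)
    best = min(thresholds, key=expected_cost)
    # With calibrated probabilities the minimum should land near COST_FP / (COST_FP + COST_FN) ≈ 0.67
    print(f"cost-minimizing threshold ≈ {best:.2f}")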

    Changing the costs matrix

    It’s important to realize that a change to the cost matrix can change not only the business value of the model, but also the optimal decision threshold.

    For example, let’s say someone at the company has developed a new test for defective widgets. It’s costly, so we don’t want to use it on every widget, but it definitively tells us if a widget is defective. If that test costs $20, we get a big change in the cost matrix for our model:

    True positives: -$120 ($100 for the cost of production, and an additional $20 for a definitive test to make sure it is defective)

    False positives: -$20 for the cost of the definitive test, which will exonerate good widgets

    True negative: 0 (we’re again considering “no defect” to be the default scenario the costs are compared against)

    False negative: -$150 to ship replacement widget

    [Figure: updated cost matrix after introducing the definitive test. Credit: Tommy Blanchard]

    Because the cost of a false positive is now lower, this shifts the payoff curve. We should now be more willing to have false positives since they are not as costly, while false negatives remain just as costly:

    [Figure: with the change in the cost matrix, the cost curve shifts; costs have been normalized so that 1 is the most costly scenario. Credit: Tommy Blanchard]

    The optimal decision threshold that minimizes costs has shifted to around 0.3. We’ll label many more widgets as potentially defective, but that’s fine since now we’ll be submitting them to further testing instead of throwing them out. We can calculate how costly this policy will be overall, and compare it to other policies (for example, doing the $20 test on every widget).
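
    Continuing the threshold sweep from the previous sketch (reusing y_prob, y_true, and thresholds), the full new cost matrix can be plugged in directly; where exactly the optimum lands depends on how well calibrated the model’s probabilities are, which is why the article’s simulated curve puts it near 0.3.

    # Full cost matrix after introducing the definitive $20 test (see the list above)
    NEW_COSTS = {"tp": -120, "fp": -20, "tn": 0, "fn": -150}

    def expected_total_cost(threshold):
        """Average cost per widget under the new cost matrix, for a given decision threshold."""
        y_pred = y_prob >= threshold
        rates = {
            "tp": np.mean(y_pred & (y_true == 1)),
            "fp": np.mean(y_pred & (y_true == 0)),
            "tn": np.mean(~y_pred & (y_true == 0)),
            "fn": np.mean(~y_pred & (y_true == 1)),
        }
        return sum(NEW_COSTS[k] * rates[k] for k in NEW_COSTS)

    # Costs are negative (losses), so the best threshold maximizes the expected value
    best_new = max(thresholds, key=expected_total_cost)
    print(f"new cost-minimizing threshold ≈ {best_new:.2f}")   # shifts well below the earlier ~0.67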

    Generalizing in more ambiguous situations

    In the real world, it’s rare that we have such a well-defined problem. Defining a business problem in this way is what I’ve referred to as the hard part of data science.

    In the real world, costs aren’t well known and it’s rare to have a straightforward classification problem that completely captures the essence of the business problem. However, by looking at these simplified cases, we can approach the more complicated problems with greater clarity. Recognizing what is ambiguous or missing in a project definition is the first step towards clarifying the problem and connecting it to a technical solution that brings the most business value.

    Author: Tommy Blanchard

    Source: Towards Data Science

  • ING and TU Delft join forces in new AI lab

    ING and TU Delft join forces in new AI lab

    ING and TU Delft are pooling their knowledge and expertise in the field of artificial intelligence (AI) for the financial sector in the new AI for FinTech Lab (AFL). The goal of the collaboration within the AFL is to use artificial intelligence technology to improve the effectiveness and efficiency of data and software analysis.

    ING and TU Delft have already been collaborating on software research and development for some time. Within the new AFL, researchers and students of TU Delft will conduct research into the development of software for the financial sector, including autonomous software and systems for data analysis and data integration. Within this collaboration, ING contributes an indispensable IT infrastructure, an ambitious organizational structure for software development, and a leading position in data fluency and analytics delivery.

    Validated solutions

    According to Arie van Deursen, professor of software engineering at TU Delft and scientific director of the AFL, the AFL is a logical next step for TU Delft in its collaboration with ING. ''It offers the opportunity to develop new theories, methods, and tools in the field of artificial intelligence, and to attract new talent. We expect the collaboration within the AFL to lead not only to groundbreaking theories, but also to validated solutions that can be widely adopted.''

    Görkem Köseoğlu, Chief Analytics Officer at ING: ‘The use of customer data offers great opportunities to develop better services, but must at the same time be shaped with care. Their data matters greatly to customers, and ING attaches great value to customers trusting us. The collaboration with TU Delft is therefore of great importance for achieving both goals.’

    The AFL is located at two sites: the ING campus in Amsterdam and the TU Delft campus in Delft. In this way it brings together students, software and data specialists, researchers, and entrepreneurs from both organizations.

    AI for FinTech Lab and ICAI

    The AFL is part of ICAI, the Innovation Center for Artificial Intelligence: a national network focused on technology and talent development in the field of artificial intelligence between knowledge institutions, industry, and government. ICAI's innovation strategy is organized around industry labs, research labs built on multi-year strategic collaborations with the business community. This allows the AFL to exchange knowledge and expertise even more effectively with other ICAI partners, such as Elsevier, Qualcomm, Bosch, Ahold Delhaize, and the Dutch National Police.

    Source: BI platform

  • Intelligence, automation, or intelligent automation?

    Intelligence, automation, or intelligent automation?

    There is a lot of excitement about artificial intelligence (AI), and also a lot of fear. Let’s set aside the potential for robots to take over the world for the moment and focus on more realistic fears. There is a growing acceptance that AI will change the way we work. There is also agreement that it is likely to result in a number of jobs disappearing or being replaced by AI systems, and others appearing.

    This has fueled the discussion on the ethics around intelligence, especially AI. Thoughtful commentators note that it is unwise to separate the two. Some have suggested frameworks for the ethical development of AI. Underpinning ethical discussion, however, is a question of what AI will be used for exactly. It is hard to develop an ethics framework out of the blue. In this blog, this issue will be unpicked a little, sharing thoughts about where and how AI is used and how this will affect the value that businesses obtain from AI.

    Defining intelligence

    Artificial Intelligence has been defined as the ability of a system to interpret data, learn from it, and then use what it has learnt to adapt and thereby achieve particular tasks. There are therefore three elements to AI:

    1. The system has to correctly interpret data and draw the right conclusions.

    2. It must be able to learn from its interpretation.

    3. It must then be able to use what it has learnt to achieve a task. Simply being able to learn or, indeed, to interpret data or perform a task is not enough to make a system AI-based.

    As consumers, most of our contact with AI is with systems like Alexa and Siri. These are definitely "intelligent," in that they take in what we say, interpret it, learn from experience and perform tasks correctly as a result. However, in business, there is general acceptance that much of the real value from AI will come from automation. In other words, AI will be used to mimic or replace human actions. This is now becoming known as 'intelligent automation'.

    Where does intelligence start and automation stop, though? There are plenty of tasks that can be automated simply and easily, without any need for an intelligent system. A lot of the time, the ability to automate tasks overshadows the need for intelligence to drive the automation. This typically results in very well-integrated systems, which often have decision-making capabilities. However, the quality of those decisions is often ignored.

    Good AI algorithms can suggest extremely good options for decisions. Ignoring this limits the value that companies can get out of their investments in AI. Equally, failing to consider whether the quality of a decision is good enough can lead to poor decisions being made. This undermines trust in the algorithm, which in turn means it is used less for decisions, again reducing the value. But how can you assess and ensure the quality of the decisions made or recommended by the algorithm?

    Balancing automation and intelligence

    An ideal AI deployment strikes a balance between automation and intelligence. If you lean too much towards the automation side and rely on simple rules-based automation, all you will be able to do is collect the low-hanging fruit. You will miss out on the potential to use the AI system to support more sophisticated decision making. Lean too much in the other direction, and you get intelligence without automation: systems like Alexa and Siri. Useful for consumers, but not so much for businesses.

    In business, analytics needs to be at the heart of an AI system. The true measure of a successful AI deployment lies in being able to mimic both human action and human decision making.

    An AI deployment has a huge range of components; it would not be unreasonable to describe it as an ecosystem. This ecosystem might contain audio-visual interpretation functions, multisystem and/or multichannel integration, and human-computer interface components. However, none of those would mean anything without the analytical brain at the centre. Without that, the rest of the ecosystem is simply a lifeless body. It needs the analytics component to provide direction and interpretation of the world around it.

    Author: Yigit Karabag

    Source: SAS

  • Keeping your data safe in an era of cloud computing

    Keeping your data safe in an era of cloud computing

    These cloud security practices for 2020 are absolutely essential to keep your data safe and secure in this new decade. 

    In recent years, cloud computing has gained increasing popularity and proved its effectiveness. There is no doubt that cloud services are changing the business environment. Small companies value the ability to store documents in the cloud and conveniently manage them. Large business players appreciate the opportunity to save money on the acquisition and maintenance of their own data storage infrastructure. The movement towards cloud technologies is perceived as an undoubtedly positive trend that facilitates all aspects of human interaction with information systems.

    Despite the obvious benefits of cloud technologies, there is a set of problematic issues that pose a significant threat to cloud users, such as:

    • The degree of trust in the cloud service provider;
    • Ensuring confidentiality, integrity, relevance, and incontrovertibility of information at all levels;
    • Loss of data control and data leaks;
    • Protection against unauthorized access;
    • Malicious insiders;
    • Saving personal data of users transmitted and processed in the cloud.

    Although cloud computing is no longer a new technology, ensuring data security remains a relevant issue for users worldwide. Security concerns remain the main obstacle to the widespread adoption of cloud technologies. What are the main threats to cloud security today? How will they affect the industry? What measures are essential to keep your sensitive data confidential? Read on to figure it out!

    Risks associated with cloud computing

    As you can guess, cloud computing servers have become a very attractive target for hackers. The virtual threat is the possibility of remote penetration through vulnerable infrastructure. Cybercriminal groups often steal users’ data for the purposes of blackmail and various kinds of fraud. As a rule, cybercriminals focus on small business networks because they are easier to breach. At the same time, data leaks at large corporations still take place; fraudsters often go after larger companies because of the allure of larger payouts.

    In November 2018, Marriott International announced that cyber thieves had stolen data on 500 million customers. The attackers’ targets included contact info, passport numbers, Starwood Preferred Guest numbers, travel information, and the credit card numbers and expiration dates of more than 100 million customers. Moreover, police officials noted that the 'human factor' was directly related to the problem: employees did not follow all security rules, which made the system vulnerable to hacker attacks.

    Security threats

    When a cloud service vendor supplies your business and stores your corporate data, you place your business in the partner’s hands. According to Risk Based Security research published in the 2019 MidYear QuickView Data Breach Report, during the first six months of 2019, there were more than 3,800 publicly disclosed breaches exposing 4.1 billion compromised records.

    In case you entrust your data to the cloud provider, you should be confident about the reliability of the cloud server. Thus, it is essential to be aware of the existing risk to prevent disclosure of your sensitive information.

    The cloud computing system can be exposed to several types of security threats, which can be divided into the following groups:

    • Threats to the integrity;
    • Threats to confidentiality;
    • Accessibility risks;
    • Authorization risks;
    • Browser vulnerabilities.

    Data security

    Nobody wants their personal information to be disclosed to a broad audience. However, according to Forbes research, unsecured Facebook database leaks affected more than 419 million users. The principles of virtual technology pose potential threats to the information security of cloud computing associated with the use of shared data warehouses. When data is transmitted from one VM to another, there is a risk of disclosure by a third party.

    Threats related to the functioning of virtual machines

    Virtual machines are dynamic: they are cloned and can move between physical servers. This variability makes it harder to maintain the integrity of the security system. Moreover, vulnerabilities of the OS or applications in a virtual environment spread unchecked and often manifest only after an arbitrary period of time (for example, when restoring from a backup). In a cloud computing environment, it is important to securely record the security status of the system, regardless of its location.

    Vulnerability of virtual environment

    Another major risk you may face is vulnerability within the virtual environment. Cloud computing servers and on-premises servers use the same OS and applications, and for cloud systems the risk of remote hacking or malware infection is high. Intrusion detection and prevention systems are therefore installed to detect malicious activity at the virtual machine level, regardless of the machines’ location in the cloud.

    Blurring of network perimeter

    When you move into the cloud, the network perimeter is blurred or disappears. This means that the protection of the least secure part of the network determines the overall level of security. To distinguish between segments with different levels of trust in the cloud, virtual machines must be provided with protection by moving the network perimeter to the virtual machine itself. A corporate firewall is the main component for implementing IT security policies and delimiting network segments, and it will protect your business from undesired disclosure.

    Attacks on hypervisor

    The hypervisor is one of the key elements of a virtual system. Its main function is the sharing of resources across virtual machines. An attack on a hypervisor can allow one virtual machine (usually one controlled by the fraudsters) to gain access to the memory and resources of another. To secure your data, it is recommended to use specialized products for virtual environments, integrate host servers with the Active Directory service, use strong password complexity and expiration policies, standardize procedures for accessing host server management tools, and use the built-in virtualization host firewall. It is also possible to disable rarely used services, such as web access to the virtualization server.

    Solutions to decrease cloud computing risks

    Encryption

    As you already know, most of the problems related to cloud technologies can be solved with the help of cryptographic information protection. Encryption is one of the most effective ways to protect data. The provider must encrypt the client’s information stored in the data center and also permanently delete it after it is removed from the server. Encryption makes users’ data useless for any person who does not have the keys to decrypt it. The owner of the encryption keys maintains data security and decides to whom, and to what degree their access should be provided.

    Encrypted data is available only after authentication. It cannot be read or changed, even when accessed through untrusted nodes. These technologies are well known, and reliable algorithms and protocols such as AES, TLS, and IPsec have long been used by providers.
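
    As a minimal sketch of client-side encryption before data ever reaches the provider, the example below uses the Fernet recipe from the Python cryptography package (AES under the hood); key management, key rotation, and transport security (TLS) are deliberately left out of scope, and the record contents are made up.

    from cryptography.fernet import Fernet

    # The key stays with the data owner; whoever holds it controls access to the plaintext
    key = Fernet.generate_key()
    cipher = Fernet(key)

    record = b'{"customer": "Jane Doe", "iban": "NL91ABNA0417164300"}'
    token = cipher.encrypt(record)          # this ciphertext is what gets stored in the cloud
    print("stored ciphertext:", token[:40], "...")

    # Only a party holding the key can restore the original record
    print("decrypted:", cipher.decrypt(token).decode())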

    Authentication

    Authentication is another approach to ensuring data security. In simple terms, it can be described as reliable password protection. Certificates and tokens can also be used to gain a higher level of reliability. For instance, protocols such as LDAP (Lightweight Directory Access Protocol) and SAML (Security Assertion Markup Language) can help ensure that only authenticated users access your sensitive data on the cloud server.
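
    To illustrate the password side of this, here is a minimal sketch using salted, slow hashing from Python's standard library; directory and federation protocols such as LDAP or SAML would sit on top of a credential store like this and are not shown.

    import hashlib, hmac, os

    def hash_password(password, salt=None):
        """Derive a slow, salted hash so stolen password databases are hard to reverse."""
        salt = salt or os.urandom(16)
        digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 200_000)
        return salt, digest

    def verify_password(password, salt, expected):
        """Recompute the hash with the stored salt and compare in constant time."""
        _, digest = hash_password(password, salt)
        return hmac.compare_digest(digest, expected)

    salt, stored = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, stored))  # True
    print(verify_password("guess123", salt, stored))                      # False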

    Conclusion

    In these times, data security is more important than ever. Be sure to enact key cloud security measures as we head into 2020.

    Author: Ryan Kh

    Source: Smart Data Collective

  • Artificial intelligence learns to drive with GTA


    Anyone who has ever played Grand Theft Auto (GTA) knows that the game was not made for following the rules. Yet, according to researchers at the Technical University of Darmstadt, GTA can help an artificial intelligence learn to drive through traffic. That is what MIT's university magazine, Technology Review, writes.

    Researchers therefore also use the game to teach algorithms how to behave in traffic. According to the university, the realistic world of video games such as GTA is very well suited to understanding the real world better. Virtual worlds are already being used to feed data to algorithms, but by using games those worlds do not have to be created specifically for that purpose.

    Learning to drive in Grand Theft Auto works roughly the same way as in the real world. For self-driving cars, objects and people, such as pedestrians, are labeled. Those labels can be fed to the algorithm, enabling it to distinguish between different objects and other road users in both the real world and the video game.

    It is not the first time that artificial intelligence has been used to play video games. Researchers have already worked on a smart Mario, and Minecraft is being used for the same purpose as GTA: Microsoft uses that virtual world to teach characters how to maneuver through their environment. The knowledge gained can later be used to help robots overcome similar obstacles in the real world.

    Source: numrush.nl, 12 September 2016

     

  • Less is More: Confusion in AI Describing Terms

    Less is More: Confusion in AI Describing Terms

    An overview of the competing definitions for addressing AI’s challenges

    There are a lot of people out there working to make artificial intelligence and machine learning suck less. In fact, earlier this year I joined a startup that wants to help people build a deeper connection with artificial intelligence by giving them a more direct way to control what algorithms can do for them. We’re calling it an “Empathetic AI”. As we attempt to give meaning to this new term, I’ve become curious about what other groups are calling their own proposed solutions for algorithms that work for us rather than against us. Here’s an overview of what I found.

    Empathetic AI

    For us at Waverly, empathy refers to giving users control over their algorithms and helping them connect with their aspirations. I found only one other instance of a company using the same term, but in a different way. In 2019, Pega used the term Empathetic AI to sell its Customer Empathy Advisor™ solution, which helps businesses gather customer input before providing a sales offer. This is in contrast to the conventional approach of e-commerce sites that make recommendations based on a user’s behaviour.

    Though both Waverly and Pega view empathy as listening to people rather than proactively recommending results based on large datasets, the key difference in their approaches is who interacts with the AI. At Waverly, we’re creating tools meant to be used by users directly, whereas Pega provides tools for businesses to create and adjust recommendations for users.

    N.B. Empathetic AI shouldn’t be confused with Artificial Empathy (AE), which is a technology designed to detect and respond to human emotions, most commonly used in systems like robots and virtual assistants. There aren’t many practical examples of this today, but some notable attempts are robot pets with a limited simulated emotional range, like Pleo, Aibo, and Cozmo. In software, attempts are being made to deduce human emotions from signals like your typing behaviour or tone of voice.

    Responsible AI

    This is the most commonly used term by large organizations that are heavily invested in improving AI technology. Accenture, Microsoft, Google, and PwC all have some kind of framework or principles for what they define as Responsible AI.

    Here’s an overview of how each of these companies interprets the concept of Responsible AI:

    • Accenture: A framework for building trust in AI solutions. This is intended to help guard against the use of biased data and algorithms, ensure that automated decisions are justified and explainable, and help maintain user trust and individual privacy.
    • Microsoft: Ethical principles that put people first, including fairness, reliability & safety, privacy & security, inclusiveness, transparency, and accountability.
    • Google: An ethical charter that guides the development and use of artificial intelligence in research and products under the principles of fairness, interpretability, privacy, and security.
    • PwC: A tool kit that addresses five dimensions of responsibility (governance, interpretability & explainability, bias & fairness, robustness & security, ethics & regulation).

    Though it’s hard to extract a concise definition from each company, combining the different terms they use to talk about “responsibility” in AI gives us some insight into what these companies care about — or at least what they consider sellable to their clients.

    AI Fairness

    You might have noticed that fairness comes up repeatedly as a subset of Responsible AI, but IBM has the biggest resource dedicated solely to this concept with their AI Fairness 360 open source toolkit. The definition of fairness generally refers to avoiding unwanted bias in systems and datasets.

    Given the increasing public attention toward systemic problems related to bias and inclusivity, it’s no surprise that fairness is one of the most relevant concepts for creating better AI. Despite the seemingly widespread understanding of the term, there are still much needed conversations happening around the impacts of fairness. A recent article on HBR tried to make the case that fairness is not only ethical; it would also make companies more profitable and productive. To get a better sense of how the tiniest decision about an AI’s programming can cause massive ripples in society, check out Parable of Polygons, a brilliant interactive demo by Nicky Case.

    Trustworthy AI

    In 2018, The EU put together a high-level expert group on AI to provide advice on its AI strategy through four deliverables. In April 2019, the EU published the first deliverable, a set of ethics guidelines for Trustworthy AI, which claims that this technology should be:

    1. Lawful — respecting all applicable laws and regulations
    2. Ethical — respecting ethical principles and values
    3. Robust — both from a technical perspective while taking into account its social environment

    The guidelines are further broken down into 7 key requirements, covering topics like agency, transparency, and privacy, among others.

    Almost exactly a year later, Deloitte released a trademarked Trustworthy AI™ Framework. It’s disappointing that they don’t even allude to the extensive work done by the EU before claiming ownership over the term. And then they repurposed it to create their own six dimensions that look a lot like what everyone else is calling Responsible AI. To them, Trustworthy AI™ is fair and impartial, transparent and explainable, responsible and accountable, robust and reliable, respectful of privacy, safe and secure. The framework even comes complete with a chart that can be easily added to any executive’s PowerPoint presentation.

    Finally, in late 2020, Mozilla released their whitepaper on Trustworthy AI with their own definition.

    Mozilla defines Trustworthy AI as AI that is demonstrably worthy of trust, tech that considers accountability, agency, and individual and collective well-being.

    Though they did acknowledge that it’s an extension of the EU’s work on trustworthiness, the deviation from the EU-established understanding of Trustworthy AI perpetuates the trend of companies not aligning on communication.

    Explainable AI (XAI) and Interpretable AI

    All of these different frameworks and principles won’t mean anything if the technology is ultimately hidden in a black box and impossible to understand. This is why many of the frameworks discussed above refer to explainable and interpretable AI.

    These terms refer to how much an algorithm’s code can be understood and what tools can be used to understand it. They’re often used interchangeably, like on this Wikipedia page where interpretability is listed as a subset of explainability. Others have a different perspective, like the author of this article, who discusses the differences between the two and posits the terms on a spectrum.

    Due to the technical nature of these terms, my understanding of their differences is limited. However, it seems there’s a distinction needed between the term “Explainable AI” (XAI) and “explainable model”. The chart above depicts the different models that algorithms can be based on, whereas the Wikipedia page talks about the broader concept of XAI. At this point, it feels like splitting hairs rather than providing clarification for most people, so I’ll leave this debate to the experts.

    Competing definitions will cost us

    As I take stock of all these terms, I find myself more confused than reassured. The industry is using words that carry quite a bit of heft in everyday language, but redefining them in relatively arbitrary ways in the context of AI. Though there are some concerted efforts to create shared understanding, most notably around the EU guidelines, the scope and focus of each company’s definitions are different enough that it’s likely to cause problems in communication and public understanding.

    As a society, we seem to agree that we need AI systems that work in humanity’s best interest, yet we’ve still found a way to make it a race to see who gets credit for the idea rather than the solution. In fact, an analysis by OpenAI — the AI research and deployment company whose mission it is to ensure that AI benefits all of humanity — shows that competitive pressure could actually push companies to under-invest in safety and cause a collective action problem.

    Though alignment would be ideal, diversity at this early stage is a natural step toward collective understanding. What’s imperative is that we don’t get caught up trying to find terms that make our companies sound good and actually take the necessary steps to create AI systems that provide favourable outcomes for all of us.

    Author: Charlie Gedeon

    Source: Towards Data Science

  • Lessons From The U.S. Election On Big Data And Algorithms

    The failure to accurately predict the outcome of the elections has caused some backlash against big data and algorithms. This is misguided. The real issue is the failure to build unbiased models that can identify trends that do not fit neatly into our present understanding. This is one of the most urgent challenges for big data, advanced analytics, and algorithms. When speaking with retailers on this subject, I focus on two important considerations. The first is that the overlap between what we believe to be true and what is actually true is getting smaller.


    This is because people, as consumers, have more personal control than ever before. They source opinions from the web, social media, and groups and associations that in the past were not available to them. For retailers this is critical, because the view that the merchandising or marketing group holds about consumers is likely growing increasingly out of date. Yet well-meaning business people performing these tasks continue to disregard indicators and repeat the same actions. Before consumers had so many options this was not a huge problem, since change happened more slowly. Today, if you fail to catch a trend, there are tens or hundreds of other companies ready to capitalize on the opportunity. While it is difficult to accept, business people must learn a new skill: leveraging analytics to improve their instincts.

    The second is closely related to the first, but with an important distinction: go where the data leads. I describe this as the KISS that connects big data to decisions.
    The KISS is about extracting knowledge, testing innovations, developing strategies, and doing all of this at high speed. The KISS is what allows the organization to safely travel down the path of discovery, going where the data leads, without falling down a rabbit hole.
    Getting back to the election prognosticators: there were a few who did identify the trend. They were repeatedly laughed at and disregarded. This is the foundation of the problem. Organizations must foster environments where new ideas are embraced and safely explored. This is how we will grow the convergence of the things we know.
     
    Source: Gartner, November 10, 2016
  • Localization uses Big Data to Drive Big Business


    There’s growing interest in using big data for business localization now, although the use of customer data for optimal orientation of business locations and promotions has been around for at least a decade.

    In 2006, the Harvard Business Review declared the end of big-box retail standardization in favor of catering to customers’ local and regional tastes, fostering innovation, and – not incidentally – making it harder for competitors to copy store formats by changing up the one-size-fits-all approach. A decade later, analytics are affordable for businesses of all sizes, giving smaller players in a variety of industries the ability to localize as well.

    An example of early localization of items sold comes from Macy’s. Executive search firm Caldwell Partners describes the department-store chain’s vast localization project, which began in the mid-2000s to differentiate store inventories for customer preferences, beginning in markets such as Miami, Columbus, and Atlanta. This strategy has helped Macy’s remain profitable despite ongoing major declines in department-store sales in recent years.

    Localization for stronger consumer appeal, better product offerings

    In hospitality, hotel chains now use localization strategies to compete with locally owned boutique hotels and with Airbnb rentals that promise a “live like a local” experience.

    Visual News reports that Millennials’ tastes and preferences are driving this trend. These younger travel enthusiasts want a unique experience at each destination, even if they’re staying in properties owned by the same hotel brand.

    Hospitality Technology notes that today’s customer profile data gives hotel chains a “360 degree view of customer spending behavior across industries, channels, and over time,” for more precise location orientation and targeted marketing.

    In fact, any consumer-facing business can benefit from using local-market data. GIS firm ESRI has described how individual bank branches can orient their loan offerings to match the needs and risk profiles of customers in the immediate area. Other elements that can be localized to suit area customers’ tastes and spending power include product prices, menu items, location hours, staffing levels, décor, and product displays.

    Localization for more effective marketing

    Outside the store itself, localization is a powerful tool for improving the return on marketing. By using detailed data about local customer behavior, retailers, restaurants and other businesses can move from overly broad promotions to segmented offers that closely align with each segment’s preferences.

    In some cases, this type of marketing localization can reduce expenses (for example, by lowering the total number of direct-mail pieces required for a campaign) while generating higher redemption rates.

    Localization of marketing efforts goes beyond cost savings to the establishment of customer loyalty and competitive advantage. Study after study shows that consumers expect and respond well to offers based on their preferences, but companies have been slow to provide what customers want.

    An international study reported by Retailing Today in June found that 78% of consumers make repeat purchases when they receive a personalized promotion, and 74% buy something new. Despite this, the study found that less than 30% of the companies surveyed were investing heavily in personalization.

    A similar 2015 study focusing on North American consumers, described by eMarketer, found that more than half of the consumers surveyed wanted promotions tailored to their product preferences, age range, personal style, and geographic location. That study found that although 71% of the regional retailers in the survey say they localize and personalize promotional emails, half the consumers said they got promotional emails that didn’t align with their preferences.

    Clearly, there’s room for improvement in the execution of localized marketing, and businesses that get it right will have an advantage with customers whose expectations are going unmet right now.

    Smart localization and orientation involve understanding the available data and knowing how to use it in cost-effective ways to give customers the information they want. It also involves rethinking the way businesses and consumers interact, and the role geography plays in business.

    Localization and careful audience targeting may be the keys to business survival. A 2013 Forrester report proclaimed that in the digital age, “the only sustainable competitive advantage is knowledge of and engagement with customers.”

    With so much power of choice in the hands of consumers, it’s up to retailers, restaurants and other businesses to earn their loyalty by delivering what they want in real time, no matter where they’re located.

    Author: Charles Hogan

    Charles Hogan is co-founder and CEO at Tranzlogic. He has over 20 years of experience in the fintech, data analytics, retail services and payment processing industries. Follow him on Twitter: @Tranzlogic.

  • Machine learning, AI, and the increasing attention for data quality

    Machine learning, AI, and the increasing attention for data quality

    Data quality has been going through a renaissance recently.

    As a growing number of organizations increase efforts to transition computing infrastructure to the cloud and invest in cutting-edge machine learning and AI initiatives, they are finding that the main barrier to success is the quality of their data.

    The old saying “garbage in, garbage out” has never been more relevant. With the speed and scale of today’s analytics workloads and the businesses that they support, the costs associated with poor data quality are also higher than ever.

    This is reflected in a massive uptick in media coverage on the topic. Over the past few months, data quality has been the focus of feature articles in The Wall Street Journal, Forbes, Harvard Business Review, MIT Sloan Management Review and others. The common theme is that the success of machine learning and AI is completely dependent on data quality. A quote by Thomas Redman summarizes this dependency very well: “If your data is bad, your machine learning tools are useless.”

    The development of new approaches towards data quality

    The need to accelerate data quality assessment, remediation and monitoring has never been more critical for organizations, and they are finding that traditional approaches to data quality don’t provide the speed, scale and agility required by today’s businesses.

    For this reason, data preparation vendor Trifacta recently announced an expansion into data quality and unveiled two major new platform capabilities: active profiling and smart cleaning. This is the first time Trifacta has expanded its focus beyond data preparation. By adding new data quality functionality, the company aims to handle a wider set of data management tasks as part of a modern DataOps platform.

    Legacy approaches to data quality involve many manual, disparate activities as part of a broader process. Dedicated data quality teams, often disconnected from the business context of the data they are working with, manage the process of profiling, fixing and continually monitoring data quality in operational workflows. Each step must be managed in a completely separate interface. It’s hard to iteratively move back-and-forth between steps such as profiling and remediation. Worst of all, the individuals doing the work of managing data quality often don’t have the appropriate context for the data to make informed decisions when business rules change or new situations arise.

    Trifacta uses interactive visualizations and machine intelligence to guide users, highlighting data quality issues and providing intelligent suggestions on how to address them. Profiling, user interaction, intelligent suggestions, and guided decision-making are all interconnected, and each drives the others. Users can seamlessly transition back and forth between steps to ensure their work is correct. This guided approach lowers the barrier to entry and helps to democratize the work beyond siloed data quality teams, allowing those with the business context to own and deliver quality outputs more efficiently to downstream analytics initiatives.
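    As a rough illustration of what automated profiling surfaces, the short pandas sketch below (hypothetical file name, not Trifacta's implementation) computes per-column missing values, distinct counts and duplicate rows, the kind of signals a data quality tool raises before remediation:

    ```python
    import pandas as pd

    def profile(df: pd.DataFrame) -> pd.DataFrame:
        """Return a simple per-column data quality profile."""
        return pd.DataFrame({
            "dtype": df.dtypes.astype(str),
            "missing": df.isna().sum(),
            "missing_pct": (df.isna().mean() * 100).round(1),
            "distinct": df.nunique(),
        })

    # 'customers.csv' is a made-up input file, used purely for illustration.
    df = pd.read_csv("customers.csv")
    print(profile(df))
    print("duplicate rows:", df.duplicated().sum())
    ```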

    New data platform capabilities like this are only a first (albeit significant) step into data quality. Keep your eyes open and expect more developments towards data quality in the near future!

    Author: Will Davis

    Source: Trifacta

  • Magic Quadrant: 17 top data science and machine learning platforms

    RapidMiner, TIBCO Software, SAS and KNIME are among the leading providers of data science and machine learning products, according to the latest Gartner Magic Quadrant report.

    About this Magic Quadrant report

    Gartner Inc. has released its "Magic Quadrant for Data Science and Machine Learning Platforms," which looks at software products that enable expert data scientists, citizen data scientists and application developers to create, deploy and manage their own advanced analytic models. According to Gartner analysts and report authors Carlie Idoine, Peter Krensky, Erick Brethenoux and Alexander Linden, "We define a data science platform as: A cohesive software application that offers a mixture of basic building blocks essential for creating all kinds of data science solutions, and for incorporating those solutions into business processes, surrounding infrastructure and products." Here are the top performers, categorized as Leaders, Challengers, Visionaries or Niche Players.

    Leaders

    According to the Gartner analysts, “Leaders have a strong presence and significant mind share in the data science and ML market. They demonstrate strength in depth and breadth across the full data exploration, model development and operationalization process. While providing outstanding service and support, Leaders are also nimble in responding to rapidly changing market conditions. The number of expert and citizen data scientists using Leaders’ platforms is significant and growing. Leaders are in the strongest position to influence the market’s growth and direction. They address the majority of industries, geographies, data domains and use cases, and therefore have a solid understanding of, and strategy for, this market.” 

    RapidMiner

    RapidMiner is based in Boston, MA. Its platform includes RapidMiner Studio, RapidMiner Server, RapidMiner Cloud, RapidMiner Real-Time Scoring and RapidMiner Radoop. “RapidMiner remains a Leader by striking a good balance between ease of use and data science sophistication,” the Gartner analysts say. “Its platform’s approachability is praised by citizen data scientists, while the richness of its core data science functionality, including its openness to open-source code and functionality, make it appealing to experienced data scientists, too.”

    Tomorrow: number 3 in this series of data science platform suppliers.

    Source: Information Management

    Author: David Weldon

  • Moving your projects to the cloud, but why?

    Moving your projects to the cloud, but why?

    Understanding the cloud main advantages and disadvantages

    In this article, we are going to change the context slightly. In previous articles, we talked about data management, the importance of data quality, and business analytics. This time, I am very excited to announce that, over the next few weeks, we are going to explore a trend that will affect all companies in the decade we find ourselves in: the cloud. The cloud is a very broad topic that covers a lot of concepts, so we’ll focus on data in the cloud.

    I think by now we have all heard about the cloud and its capabilities, but do you know all the benefits and implications it has? In this first post, I would like to explore the basic concepts of the cloud with you, and in the next few weeks accompany you on a trip through how we can find relevant insights using cloud resources.

    First of all, I want you to understand why this post is for you. So, if you are…

    an individual, whether you’re in business or tech, you need to understand these concepts and how the cloud is changing the game.

    a company, you must have a cloud strategy. We are not talking about having your workload 100% migrated to the cloud tomorrow, but you should have a roadmap for the next few years.

    What is cloud computing?

    At this point, I would like to define what cloud computing is. Since 2017, countless statements have circulated on social networks saying:

    “Cloud computing is just someone else’s computer”

    This false idea has spread over the Internet. I must admit that I had a sticker on my laptop with that slogan a few years ago. But the truth is that if you say that, you don’t fully understand what cloud computing is. While it is true that, reduced to its essence, cloud computing is about renting compute power from others for your own purposes, a whole world of possibilities has been built on top of this idea, with implications at every organizational level of a company.

    Let’s talk about the advantages

    The economy of scale

    As you surely know, today everything is done at a large scale, especially when we talk about the world of data. For this reason, we must be able to operate less expensively and more efficiently when we do things at scale. The cloud takes advantage of economies of scale, allowing our businesses to become more profitable as they grow.

    Pay-as-you-go

    Another of the many advantages of cloud computing is financial, because it changes the spending model. To see why this is an advantage of the cloud, you should understand the two spending models:

    • Capital expenditure (CapEx): an upfront investment in a fixed asset, whose cost is deducted from your tax bill over time. Examples of assets in this category are buildings, equipment, or, more specifically, buying a server or a data center (on-premise).
    • Operational expenditure (OpEx): the expenses necessary for the operation of the business, deducted from your tax bill in the same year. There are no upfront costs; you pay for what you use.

    Operational expenses enable a pay-as-you-go pricing model, which allows your company to reduce costs and gain flexibility.
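    To make the difference concrete, here is a minimal sketch with purely hypothetical figures; the point is not the amounts but that CapEx is paid upfront and written off over several years, while OpEx is paid as you consume:

    ```python
    # Illustrative comparison with hypothetical figures (not real vendor pricing).
    capex_server = 20_000        # upfront purchase of an on-premise server (CapEx)
    amortization_years = 5       # the asset is written off over this period
    opex_cloud_per_month = 450   # assumed pay-as-you-go cloud bill (OpEx)

    capex_per_year = capex_server / amortization_years
    opex_per_year = opex_cloud_per_month * 12

    print(f"CapEx, amortized:    {capex_per_year:,.0f} per year")
    print(f"OpEx, pay-as-you-go: {opex_per_year:,.0f} per year")
    ```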

    Reduced time-to-market

    Thanks to the cloud, the time-to-market for new products or the growth of the existing ones is reduced.

    If your company, regardless of its size, wants to try a new product, with the cloud you will be able to do so much more agilely, since it allows you to allocate resources in a much faster and more precise way.

    On the other hand, if you already have a product running and want to make it grow to other countries, the cloud will allow you to do it much more efficiently.

    Scalability, elasticity and reliability

    Another advantage of the cloud is closely related to the pay-as-you-go model. In this case, we are talking about scalability and elasticity, which allow your business to constantly adapt to demand. This has two aspects: on the one hand, it prevents you from incurring extra costs on idle infrastructure and, on the other, it allows your business to grow as demand grows, guaranteeing the quality of the service.

    Also, the cloud allows you to increase the reliability of your technology through disaster recovery policies, data replication, or backups.

    Focus on the business

    With the shared responsibility models of cloud providers, you can free yourself from certain responsibilities and put a greater focus on growing your business. There are different cloud models, which we will see below, but I anticipate that depending on the model you choose, the distribution of responsibility will vary.

    It’s not about being carefree. Using technology always carries great responsibility, and many aspects must be borne in mind, especially when we talk about data. However, in any cloud model there will always be a part delegated to the provider, which frees you to a greater or lesser extent from recurring and costly tasks for your business.

    Security

    I believe that security always deserves its own section. Closely related to economies of scale, security solutions are cheaper when deployed at scale, and the cloud takes advantage of this. Security is a key element today and a differentiator for many clients like you. This demand pushes cloud providers to put a special focus on security.

    Finally, and related to the shared responsibility model, depending on the solutions implemented, the cloud provider usually acquires certain maintenance responsibilities such as the installation of updates, application of security patches, or security implementations at the infrastructure level so you don’t have to worry about these tasks.

    But why does nobody talk about the risks?

    There are always two sides. We have talked about the advantages, and I am sure you already knew many, if not all, of them. I hope that at this point you have gathered that the cloud offers great opportunities whether you are a small, medium, or large company.

    Now, why do so few people talk about the risks? There is no perfect solution, so I think it is just as important to know the risks as it is to know the benefits. When you have all the information on the table, you can make a decision with much more objective criteria than if you only see part of the picture.

    Provider dependency

    When you use any technology, a link is established between your business and that technology. A dependency is created that can be higher or lower depending on the technology and the function it has in your business. This dependency gives cloud providers greater bargaining power, as switching costs arise that were not so present before. For example, if we use accounting software or a CRM in the cloud, the switching costs are very high, because they perform core functions in your business.

    The same happens with the infrastructure, if all your technology infrastructure relies on a specific cloud provider, it gives that provider greater control. For example, if the cloud provider decides to change prices and you have all your infrastructure hosted with that provider, you have two options: either you accept the changes or you incur the cost of infrastructure migration.

    Not all services are available everywhere

    Not all cloud providers offer the same services, and even a single provider's services are not available worldwide. You may need a service offered by a certain provider that is available in a geographic region that interests you; but if you later need to scale to other geographic regions, that service may not be available there and your ability to act will be limited.

    On the other hand, and related to the previous point, the fact that you use a specific service with a certain provider does not mean that, should you ever need to change providers, you will be able to do so, since not all providers have the same catalog of services.

    As you have seen, the cloud has great potential for your business: it allows you to gain agility, reduce time to market and optimize costs, all of which is much more difficult with on-premise solutions. However, in addition to the advantages, you must always keep the main disadvantages in mind, since dependencies and switching costs that were not so present before may well appear.

     
  • Multi-factor authentication and the importance of big data

    Multi-factor authentication and the importance of big data

    Big data is making a very big impact on multi-factor authentication solutions. Here's how and why this is so important to consider.

    Big data is already playing an essential role in authentication, and as security risks mount, this concern will become greater than ever.

    Kaushik Pal wrote an insightful article on Techopedia a couple of years ago about user authentication and big data. The importance of user authentication has only risen since that article was first published. Experts are now discussing the role of multi-factor authentication (MFA) solutions, and big data is proving to be a vital component of that.

    How does big data influence multi-factor user authentication?

    In this day and age, cybersecurity issues are ever-growing, and simple passwords are no longer a safe security measure. According to some sources, 86% of passwords are notoriously insecure. But even passwords that seem secure are vulnerable if they aren’t managed well and protected with additional authentication options.

    As soon as your password has been exposed, malicious parties can access your account and do whatever they want with it. Fortunately, multi-factor authentication solutions exist. But what exactly is big data-based multi-factor authentication? And how can these solutions help you? If you’re interested in learning more, keep on reading.

    What is multi-factor authentication?

    Multi-factor authentication, or MFA, recognizes online users by carefully validating two or more claims offered by the user, drawn from different categories of validation. And this would not be possible without contributions from big data. The basic types of validation used include:

    1. Something you have, like a trusted and known device.
    2. Something you know, like a PIN or password.

    The theory behind multi-factor authentication is that the combined validation factors are stronger than any single factor on its own.

    To put the definition more simply: MFA adds a second, often physical, method to verify a person’s real identity. MFA is rapidly becoming the standard for more secure logins, and big data is playing an important role in addressing the shortcomings of passwords.
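    As a rough sketch of how two such factors combine (not a production implementation, and not tied to any particular MFA product), the code below checks a stored password hash ("something you know") together with a time-based one-time code generated from a shared secret ("something you have"), following the standard TOTP construction:

    ```python
    import hashlib
    import hmac
    import struct
    import time

    def totp(secret: bytes, interval: int = 30, digits: int = 6) -> str:
        """Time-based one-time password (RFC 6238): HMAC-SHA1 over the current time step."""
        counter = int(time.time()) // interval
        digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
        offset = digest[-1] & 0x0F
        code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
        return str(code % 10 ** digits).zfill(digits)

    def authenticate(password: str, stored_hash: str, submitted_code: str, secret: bytes) -> bool:
        """Factor 1: something you know (password). Factor 2: something you have (TOTP device)."""
        # A real system would use a salted, slow hash (bcrypt/argon2); sha256 keeps the sketch short.
        knows = hashlib.sha256(password.encode()).hexdigest() == stored_hash
        has = hmac.compare_digest(submitted_code, totp(secret))
        return knows and has
    ```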

    Reasons why you should use multi-factor authentication solutions

    Big data has made multi-factor user authentication possible. But what are the core benefits? Here are five of the many reasons to use MFA:

    1. Enhance security

    Multi-factor authentication is one of the best solutions that you may want to take advantage of, especially if your main goal is to improve security. Big data has made this easier than ever.

    The main benefit of MFA is that it offers additional protection by adding new layers of security. The more factors or layers in place, the smaller the risk of intruders gaining access to critical systems and data.

    2. Increase productivity and flexibility

    Another benefit of this type of solution is that it removes the burden of passwords by replacing them with alternatives that can improve productivity. Predictive analytics and other big data technologies have made this possible.

    In addition, multi-factor authentication solutions can also bring an improved user experience because of the greater flexibility in factor types.

    3. Achieve compliance

    Big data is vital for ensuring compliance. With multi-factor authentication, you can meet the compliance requirements specific to your organization, which in turn reduces audit findings and helps you avoid possible penalties.

    4. Simplify the login process

    As we all know, a difficult password does not excel in user-friendliness. Although multi-factor authentication adds extra steps, it can actually make the login process easier overall.

    Single sign-on, for instance, is one way multi-factor authentication accelerates this process. For example, a person using an office suite only needs to sign in through multi-factor authentication the first time they use the app on a device.

    5. Location restrictions

    You can use multi-factor authentication to allow or limit login access depending on the current location of the user. If you frequently work outside your office or use a personal device, you are putting company and personal data at risk of physical theft. Multi-factor authentication can also be used to recognize when a user is requesting access from an unknown location.

    Big data is essential for ensuring multi-factor authentication

    Big data is very important for making sure user security is adequate. It has helped by introducing multi-factor authentication. Multi-factor authentication solutions can be of great help, especially if you have been compromised or an unknown person tries to use your password as well as username. Hopefully, you have learned a lot from this post.

    Author: Sean Mallon

    Source: Smart Data Collective

  • New API Manager released by InterSystems

    New API Manager released by InterSystems

    InterSystems has introduced the new InterSystems API Manager feature in InterSystems IRIS Data Platform 2019.2. With this tool, users can monitor and manage traffic to and from web-based APIs within their IT infrastructure.

    The number of API implementations is rising across sectors as more companies build service-oriented application layers. As this number grows, software environments become more distributed, making it crucial that API traffic is properly managed and monitored. InterSystems API Manager offers ease of use by enabling developers to route all traffic through a centralized gateway and forward requests to the appropriate target nodes. Developers who use InterSystems API Manager can also:

    • monitor all API traffic in a central location, so users can identify and resolve problems;
    • control API traffic by throttling throughput, configuring allowed payload sizes, and whitelisting or blacklisting IP addresses;
    • provide internal and external developers with interactive API documentation through a dedicated, customizable developer portal; and
    • secure APIs in one central place.

    'Developers are using and deploying new APIs at breakneck speed to meet the demands of digital transformation, and they need an intuitive management solution to enable their success', says Scott Gnau, head of InterSystems Data Platforms. 'We are excited to offer customers of the InterSystems IRIS data platform API management capabilities that ensure seamless integration and rapid deployment'.

    InterSystems API Manager is easy to configure through an intuitive, web-based user interface or via API calls, making it a straightforward tool for external deployments. InterSystems API Manager is released as a container to enable easy adoption. Developers can also configure an InterSystems API Manager cluster consisting of multiple nodes to scale throughput capacity and keep latency low.

    Source: BI-platform

  • NLP: the booming technology for data-driven businesses

    NLP: the booming technology for data-driven businesses

    Becoming data-driven is the new mantra in business. The consensus is that the most value can be achieved by getting the data in the hands of decision makers. However, this is only true if data consumers know how to handle data. Getting managers to think like data scientists is one way to approach this challenge, another is to make data more approachable and more human. As such, it’s no surprise that natural language processing (NLP) is the talk of the data-driven town.

    Emergence of a new type of language

    When considering the next generation of digital user interaction, NLP might not be the first thing that comes to mind, but it’s by no means a new concept or technology. As referenced on Wikipedia, NLP is a subfield of computer science, information engineering, and artificial intelligence (AI) concerned with the interactions between computers and human (natural) languages. In particular: how to program computers to process and analyze large amounts of natural language data.

    Developments in the related disciplines of machine learning (ML) and AI are propelling the use of NLP forward. Industry leaders like Gartner have identified conversational analytics as an emerging paradigm. This shift enables business professionals to explore their data, generate queries, and receive and act on insights using natural language. This can be through voice or text, through mobile devices, or through personal assistants, for example.

    Becoming fluent in NLP

    When facing strategic obstacles that can hinder innovation and muddy the decision-making process, such as organizational silos, Deloitte found that businesses with leaders who embody the characteristics of the Industry 4.0 persona “the Data-Driven Decisive” are overcoming these roadblocks through a methodical, data-driven approach and are often bolder in their decisions.

    In order to effectively apply data throughout an organization, companies need to provide employees with a base-level understanding of the importance and role of data within their business. Looking at the overwhelming demand to be met here, this challenge needs to be approached from both ends through education and tools.

    Teaching employees how they can use data and which questions to ask will go some distance to establishing a group of data-capable individuals within the workforce. Giving them the effective media through which they can consume data exponentially increases the number of people that can manipulate, analyze, and visualize data in a way that allows them to make better decisions.

    The aim is not to convert everyone into a data scientist. Data specialists will still be needed to do more forward-looking number crunching, and both groups might yield different solutions. Natural language processing as used in Tableau’s Ask Data solution mainly aims to lower the bar for all the non-data experts to use data to improve the results of their day-to-day jobs.

    Deciphering ambiguity

    Inference remains an area where things can get a bit complicated. NLP is good at interpreting language and spotting ambiguity in elements when there isn’t enough clarity in data sets.

    A business user enters a search term in Ask Data and sees the answer being presented in the most insightful way. But pulling out the right elements from the right tables and variables, the actual SQL query under the hood, is hidden from the user’s view.
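    A toy sketch of that translation step is shown below. It is emphatically not how Ask Data works internally; the field names and the keyword matching are invented just to illustrate mapping a question onto a query:

    ```python
    # Hypothetical catalogue of measures and filter values the interface "knows" about.
    FIELDS = {"sales": "SUM(sales)", "profit": "SUM(profit)"}
    REGIONS = {"east", "west", "north", "south"}

    def question_to_sql(question: str, table: str = "orders") -> str:
        """Map a natural-language question onto a SQL query via simple keyword matching."""
        q = question.lower()
        measure = next((expr for name, expr in FIELDS.items() if name in q), "COUNT(*)")
        region = next((r for r in REGIONS if r in q), None)
        where = f" WHERE region = '{region}'" if region else ""
        return f"SELECT {measure} FROM {table}{where};"

    print(question_to_sql("What were sales in the east region?"))
    # -> SELECT SUM(sales) FROM orders WHERE region = 'east';
    ```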

    NLP is good at leaving no stone unturned when it comes to solving problems. But NLP alone is not the best interface when the user doesn’t know enough about what they’re looking for, can’t articulate a question, or would rather choose from a list of options. For example, a user might not know what the name of a particular product is, but if they click to view a menu with a list of products to filter through, they’ll be able to make an easier choice. This is where mixed-modality systems like Ask Data shine.

    NLP is still not the most effective at resolving a query when there’s lots of room for interpretation—especially when it hasn’t seen the specific query before. For example, if a colleague were to ask to “email Frank,” then we as humans tend to know to look for the Franks we know professionally, not the Franks in our family or circle of friends. As humans, we have the advantage of tapping our memory to inform the context of a request based on who is making the request. NLP still has some catching up to do in this department.

    Enabling a culture of data

    For companies looking to start talking with their data, the most important first step is to enable a culture of data. It is also important to pay attention to the needs and wants of the people that are required to handle data.

    As with a lot of other implementations, starting with a small team and then expanding tends to be a successful approach. By equipping your team with the tools needed to explore data and ask questions, the team will then get exposed to the new ways data can be accessed. It’s also vital to make them aware of the growing global community resource of data explorers that function as a sharing economy of tips and tricks.

    Lastly, as functionality is still very much developing, providing insight to vendors to inform product updates and new capabilities is invaluable. Endless chatter will get you nowhere. Meaningful conversations, with data, are the ones that count.

    Author: Ryan Atallah

    Source: Tableau

  • Noord-Nederland joins forces in a unique Data Science programme

    On 7 March, the Data Science programme starts in the Northern Netherlands. To manage the ever-growing amount of data, IT Academy Noord-Nederland is training professionals from the North to become data scientists. With accredited courses from Hanzehogeschool Groningen and the University of Groningen, the programme bridges applied and academic education. The programme was developed in collaboration with the business community.

    There are more and more opportunities for companies and institutions to use enormous amounts of data in innovative ways to offer new products and services. How can companies handle this data, and what about privacy and data ownership? Collecting data is step one, but being able to organize and analyze it is what creates value. A well-known example is Uber, which used big data to create a completely new (disruptive) business model for the transport sector.


    The demand for data scientists is increasing. The Data Science programme is the first of its kind in the Northern Netherlands. With its data-intensive operations and its call for a programme in the field of big data, the RDW played a crucial role in the programme's development phase. To fill the programme with the right elements, the IT Academy combined the strengths of the Hanzehogeschool and the RUG. Professors and lecturers from both institutions will teach parts of the programme. In addition, guest speakers from other knowledge institutions and the business community will contribute practical cases so that the knowledge gained can be applied right away.

    IT Academy Noord-Nederland
    IT Academy Noord-Nederland offers state-of-the-art education and conducts research through open collaboration between companies, knowledge institutions and organizations, in order to strengthen innovative capacity in the Northern Netherlands, stimulate ICT employment and be an attractive landing place for talent. IT Academy Noord-Nederland is an initiative of Hanzehogeschool Groningen, the University of Groningen, Samenwerking Noord and the IBM Client Innovation Center.

    Source: Groninger krant

  • On-premise or cloud-based? A guide to appropriate data governance

    On-premise or cloud-based? A guide to appropriate data governance

    Data governance involves developing strategies and practices to ensure high-quality data throughout its lifecycle.

    However, besides deciding how to manage data governance, you must choose whether to apply the respective principles in an on-premise setting or the cloud.

    Here are four pointers to help:

    1. Choose on-premise when third-party misconduct is a prevalent concern

    One of the goals of data governance is to determine the best ways to keep data safe. That's why data safety comes into the picture when people choose cloud-based or on-premise solutions. If your company holds sensitive data like health information and you're worried about a third-party not abiding by your data governance policies, an on-premise solution could be right for you.

    Third-party cloud providers must abide by regulations for storing health data, but they still make mistakes. Some companies offer tools that let you determine a cloud company's level of risk and see the safeguards it has in place to prevent data breaches. You may consider using one of those to assess whether third-party misconduct is a valid concern as you strive to maintain data governance best practices.

    One thing to keep in mind is that the shortcomings of third-party companies could cause long-term damage to your company's reputation. For example, if a cloud provider's misconfigured server allows a data breach to happen, they're to blame. But the headlines about the incident will likely feature your brand and may mention the outside company only in a passing sentence.

    If you opt for on-premise data governance, your company alone is in the spotlight if something goes wrong, but it's also possible to exert more control over all facets of data governance to promote consistency. When you need scalability, cloud-based technology typically allows you to ramp up faster, but you shouldn't pursue that at the risk of a possible third-party blunder.

    2. Select cloud-based data governance if you lack data governance maturity

    Implementing a data governance program is a time-consuming but worthwhile process. A data governance maturity assessment model can be useful for seeing how your company's approach to data governance stacks up to industry-wide best practices. It can also identify gaps to illuminate what has to happen for ongoing progress to occur.

    Using a data governance maturity assessment model can also signal to stakeholders that data governance is a priority within your organization. However, if your assessments show the company has a long way to go before it can adhere to best practices, cloud-based data governance could be the right choice.

    That's because the leading cloud providers have their own in-house data governance strategies in place. They shouldn't replace the ones used in-house at your company, but they could help you fill in the known gaps while improving company-wide data governance.

    3. Go with on-premise if you want ownership

    One of the things that companies often don't like about using a cloud provider for data governance is that they don't have ownership of the software. Instead, they usually enter into a leasing agreement, similar to leasing an automobile. So, if you want complete control over the software used to manage your data, on-premise is the only option that allows that ownership.

    One thing to keep in mind about on-premise data governance is that you are responsible for data security. As such, you must have protocols in place to keep your software updated against the latest security threats.

    Cloud providers usually update their software more frequently than you might in an on-premise scenario. That means you have to be especially proactive about dealing with known security flaws in outdated software. Indeed, on-premise data governance has the benefit of ownership, but your organization has to be ready to accept all the responsibility that option brings.

    4. Know that specialized data governance tools are advantageous in both cases

    You've already learned a few of the pros and cons of on-premise versus cloud-based solutions to meet your data governance requirements. Don't forget that no matter which of those you choose, specialty software can help you get a handle on data access, storage, usage and more. For example, software exists to help companies manage their data lakes whether they are on the premises or in the cloud.

    Those tools can sync with third-party sources of data to allow monitoring of all the data from a single interface. Moreover, they can track metadata changes, allowing users to become more aware of data categorization strategies.

    Regardless of whether you ultimately decide it's best to manage data governance through an on-premise solution or in the cloud, take the necessary time to investigate data governance tools. They could give your company insights that are particularly useful during compliance audits or as your company starts using data in new ways.

    Evaluate the tradeoffs

    As you figure out if it's better to entrust data governance to a cloud company or handle it on-site, don't forget that each option has pros and cons.

    Cloud companies offer convenience, but only if their data governance principles align with your needs. And, if customization is one of your top concerns, on-premise data governance gives you the most flexibility to make tweaks as your company evolves.

    Studying the advantages and disadvantages of these options carefully before making a decision should leave you well informed about how to accommodate your company's present and future needs.

    Author: Kayla Matthews

    Source: Information-management

  • Training programmes in Data Science & Business Intelligence


    Training programmes in Data Science & Business Intelligence

    To help you keep developing in the field of Data Science & Business Intelligence, BI Kring now offers, in collaboration with IMF Academy, a number of interesting training programmes. For more than 25 years, IMF Academy has been developing and organizing up-to-date (post-Bachelor) programmes, (certifying) trainings and distance learning courses on subjects such as the Internet of Things (IoT), Data Science, Big Data and Artificial Intelligence (AI).

    Internet of Things (IoT) certification

    This 3-day training prepares you for the globally recognized, independent Internet of Things (IoT) Foundation certificate.

    Data Science & Business Analytics

    In this 10-day programme (with academic certificate) you will learn in particular to interpret data for its business relevance and to translate it into innovative products and services.

    Data Science for Business Professionals

    In this 5-day training you will learn, among other things, the methods behind data science and how to work with tools for data mining, machine learning and visualization.

    Certified Data Science Professional

    This unique Dutch-language programme trains you in 10 days (plus 2 exam days) to become a Certified Data Science Professional.

    Data Foundation

    This 3-day Big Data programme (incl. exam) prepares you, among other things, for the official international Big Data Foundation certificate.

    Artificial Intelligence (AI) in practice

    A 4-day programme covering all the knowledge you need to get started with Artificial Intelligence (AI) in practice: the technology, practical applications and the implementation of AI.

    Are you interested in one of the above programmes, would you like more information, or do you want to register right away? You can do so here!

  • Organizing Big Data by means of using AI

    No matter what your professional goals are, the road to success is paved with small gestures. Often framed via KPIs (key performance indicators), these transitional steps form the core categories contextualizing business data. But what data matters?

    In the age of big data, businesses are producing larger amounts of information than ever before, and there need to be efficient ways to categorize and interpret that data. That’s where AI comes in.

    Building Data Categories

    One of the longstanding challenges with KPI development is that there are countless categories any given business can use. Some focus on website traffic while others are concerned with social media engagement, but the most important thing is to focus on real actions and not vanity metrics. Even if it’s just the first step toward a sale, your KPIs should reflect value for your bottom line.

     

    Small But Powerful

    KPIs typically cover a variety of similar actions – all Facebook behaviors or all inbound traffic, for example. The alternative, though, is to break down KPI-type behaviors into something known as micro conversions. 

    Micro conversions are simple behaviors that signal movement toward an ultimate goal like completing a sale, but carefully gathering data from micro conversions and tracking them can also help identify friction points and other barriers to conversion. This is especially true any time your business undergoes a redesign or institutes a new strategy. Comparing micro data points from the different phases, then, is a high value means of assessment.
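    A minimal sketch (with made-up numbers) of the kind of phase-by-phase comparison described above: each row is a visitor, each column a micro conversion, and the grouped averages show where friction appears or disappears after a redesign.

    ```python
    import pandas as pd

    # Hypothetical event log: one row per visitor, with micro-conversion flags per phase.
    events = pd.DataFrame({
        "phase": ["before_redesign"] * 4 + ["after_redesign"] * 4,
        "added_to_cart":    [1, 0, 1, 0, 1, 1, 1, 0],
        "started_checkout": [0, 0, 1, 0, 1, 1, 0, 0],
        "completed_sale":   [0, 0, 1, 0, 1, 0, 0, 0],
    })

    # Micro-conversion rates per phase highlight where the funnel improves or leaks.
    print(events.groupby("phase").mean())
    ```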

    AI Interpretation

    Without AI, this micro data would be burdensome to manage – there’s just so much of it – but AI tools are able both to collect data and to interpret it for application, particularly within comparative frameworks. All AI needs is well-developed KPIs.

    Business KPIs direct AI data collection, allow the system to identify shortfalls, and highlight performance goals that are being met, but it’s important to remember that AI tools can’t fix broader strategic or design problems. With the rise of machine learning, some businesses have come to believe that AI can solve any problem, but what it really does is clarify the data at every level, allowing your business to jump into action.

    Micro Mapping

    Perhaps the easiest way to describe what AI does in the age of big data is with a comparison. Your business is a continent and AI is the cartographer that offers you a map of everything within your business’s boundaries. Every topographical detail and landmark is noted. But the cartographer isn’t planning a trip or analyzing the political situation of your country. That’s up to someone else. In your business, that translates to the marketing department, your UI/UX experts, or C-suite executives. They solve problems by drawing on the map.

    Unprocessed big data is overwhelming – think millions of grains of sand that don’t mean anything on their own. AI processes that data into something useful, something with strategic value. Depending on your KPI, AI can even draw a path through the data, highlighting common routes from entry to conversion, where customers get lost – what you might consider friction points, and where they engage. When you begin to see data in this way, it becomes clear that it’s a world unto itself and one that has been fundamentally incomprehensible to users. 

    Even older CRM and analytics programs fall short when it comes to seeing the big picture, and that’s why data management has changed so much in recent years. Suddenly, we have the technology to identify more than click-through rates or page likes. AI fueled by big data marks a new organizational era with an emphasis on action. If you’re willing to follow the data, AI will draw you the map.

     

    Author: Larry Alton

    Source: Information Management

  • Pattern matching: The fuel that makes AI work

    Pattern matching: The fuel that makes AI work

    Much of the power of machine learning rests in its ability to detect patterns. Much of the basis of this power is the ability of machine learning algorithms to be trained on example data such that, when future data is presented, the trained model can recognize that pattern for a particular application. If you can train a system on a pattern, then you can detect that pattern in the future. Indeed, pattern matching in machine learning (and its counterpart in anomaly detection) is what makes many applications of artificial intelligence (AI) work, from image recognition to conversational applications.

    As you can imagine, there are a wide range of use cases for AI-enabled pattern and anomaly detection systems. Pattern recognition, one of the seven core patterns of AI applications, is being applied to fraud detection and analysis, finding outliers and anomalies in big stacks of data; recommendation systems, providing deep insight into large pools of data; and other applications that depend on identification of patterns through training.

    Fraud detection and risk analysis

    One of the challenges with existing fraud detection systems is that they are primarily rules-based, using predefined notions of what constitutes fraudulent or suspicious behavior. The problem is that humans are particularly creative at skirting rules and finding ways to fool systems. Companies looking to reduce fraud, suspicious behavior or other risk are finding solutions in machine learning systems that can either be trained to recognize patterns of fraudulent behavior or, conversely, find outliers and anomalies to learned acceptable behavior.

    Financial systems, especially banking and credit card processing institutions, are early adopters in using machine learning to enable real-time identification of potentially fraudulent transactions. AI-based systems are able to handle millions of transactions per minute and use trained models to make millisecond decisions as to whether a particular transaction is legitimate. These models can identify which purchases don't fit usual spending patterns or look at interactions between paying parties to decide if something should be flagged for further inspection.
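    A hedged sketch of the underlying idea, using scikit-learn's IsolationForest on made-up transaction features rather than any bank's actual model: train on (mostly) legitimate history, then flag transactions that don't fit the learned pattern.

    ```python
    import numpy as np
    from sklearn.ensemble import IsolationForest

    # Hypothetical transaction features: [amount, hour of day].
    rng = np.random.default_rng(42)
    normal = np.column_stack([rng.normal(60, 20, 1000), rng.integers(8, 22, 1000)])
    suspicious = np.array([[4_500, 3], [3_900, 4]])  # very large purchases in the middle of the night
    transactions = np.vstack([normal, suspicious])

    model = IsolationForest(contamination=0.01, random_state=0).fit(normal)
    flags = model.predict(transactions)              # -1 = anomaly, 1 = looks normal
    print(np.where(flags == -1)[0])                  # indices to send for human review
    ```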

    Cybersecurity firms are also finding significant value in the application of machine learning-based pattern and anomaly systems to bolster their capabilities. Rather than depending on signature-based systems, which are primarily oriented toward responding to attacks that have already been reported and analyzed, machine learning-based systems are able to detect anomalous system behavior and block those behaviors from causing problems to the systems or networks.

    These AI-based systems are able to adapt to continuously changing threats and can more easily handle new and unseen attacks. The pattern and anomaly systems can also help to improve overall security by categorizing attacks and improving spam and phishing detection. Rather than requiring users to manually flag suspicious messages, these systems can automatically detect messages that don't fit the usual pattern and quarantine them for future inspection or automatic deletion. These intelligent systems can also autonomously monitor software systems and automatically apply software patches when certain patterns are discovered.

    Uncovering insights in data

    Machine learning-based pattern recognition systems are also being applied to extract greater value from existing data. Machines can look at data to find insights, patterns and groupings and use the power of AI systems to find patterns and anomalies humans aren't always able to see. This has broad applicability to both back-office and front-office operations and systems. Whereas, before, data visualization was the primary way in which users could extract value from large data sets, machine learning is now being used to find the groupings, clusters and outliers that might indicate some deeper connection or insight.

    In one interesting example, through machine learning pattern analysis, Walmart discovered consumers buy strawberry pop-tarts before hurricanes. Using unsupervised learning approaches, Walmart identified the pattern of products that customers usually buy when stocking up ahead of time for hurricanes. In addition to the usual batteries, tarps and bottled water, it discovered that the rate of purchase of strawberry pop-tarts also increased. No doubt, Walmart and other retailers are using the power of machine learning to find equally unexpected, high-value insights from their data.
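    A tiny sketch of the co-occurrence counting behind such findings (toy basket data, not Walmart's pipeline): one row per transaction and product, pivoted into a basket matrix whose product-by-product cross-product counts how often pairs are bought together.

    ```python
    import pandas as pd

    # Made-up basket data, echoing the hurricane-prep example above.
    baskets = pd.DataFrame({
        "transaction_id": [1, 1, 2, 2, 2, 3, 3],
        "product": ["batteries", "pop-tarts", "water", "pop-tarts", "tarp",
                    "batteries", "pop-tarts"],
    })

    # One row per transaction, one column per product (0/1), then count pairwise co-occurrences.
    onehot = pd.crosstab(baskets["transaction_id"], baskets["product"]).clip(upper=1)
    co_occurrence = onehot.T @ onehot
    print(co_occurrence)  # off-diagonal cells: how often two products appear in the same basket
    ```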

    Automatically correcting errors

    Pattern matching in machine learning can also be used to automatically detect and correct errors. Data is rarely clean and often incomplete. AI systems can spot routine mistakes or errors and make adjustments as needed, fixing data, typos and process issues. Machines can learn what normal patterns and behavior look like, quickly spot and identify errors, automatically fix issues on their own and provide feedback when needed.

    For example, algorithms can detect outliers in medical prescription behavior, flag these records in real time and send a notification to healthcare providers when the prescription contains mistakes. Other automated error correction systems are assisting with document-oriented processes, fixing mistakes made by users when entering data into forms by detecting when data such as names are placed into the wrong fields or when other information is incomplete or inappropriately entered.

    Similarly, AI-based systems are able to automatically augment data by using patterns learned from previous data collection and integration activities. Using unsupervised learning, these systems can find and group information that might be relevant, connecting all the data sources together. In this way, a request for some piece of data might also retrieve additional, related information, even if not explicitly requested by the query. This enables the system to fill in the gaps when information is missing from the original source, correct errors and resolve inconsistencies.

    Industry applications of pattern matching systems

    In addition to the applications above, there are many use cases for AI systems that implement pattern matching in machine learning capabilities. One use case gaining steam is the application of AI for HR and staffing. AI systems are being tasked to find the best match between job candidates and open positions. While traditional HR systems are dependent on humans to make the connection or use rules-based matching systems, increasingly, HR applications are making use of machine learning to learn what characteristics of employees make the best hires. The systems learn from these patterns of good hires to identify which candidates should float to the surface of the resume pile, resulting in more optimal matches.

    Since the human is eliminated in this situation, AI systems can be used to screen candidates and select the best person, while reducing the risk of bias and discrimination. Machine learning systems can sort through thousands of potential candidates and reach out in a personalized way to start a conversation. The systems can even augment the data in the job applicant's resume with information it gleans from additional online sources, providing additional value.

    In the back office, companies are applying pattern recognition systems to detect transactions that run afoul of company rules and regulations. AI startup AppZen uses machine learning to automatically check all invoices and receipts against expense reports and purchase orders. Any items that don't match acceptable transactional patterns are sent for human review, while the rest are expedited through the process. Occupational fraud, on average, costs a company 5% of its revenues each year, with the annual median loss at $140,000, and over 20% of companies reporting losses of $1 million or more.

    The key to solving this problem is to put processes and controls in place that automatically audit, monitor, and accept or reject transactions that don't fit a recognized pattern. AI-based systems are definitely helping in this way, and we'll increasingly see them being used by more organizations as a result.

    Author: Ronald Schmelzer

    Source: TechTarget

  • Predictive modelling in Market Intelligence is hot


    Market intelligence is still an underexposed function within companies. How often do companies have an accurate and up-to-date picture of exactly how large their market is, and whether it is growing or shrinking?

    B2C companies can still buy expensive reports from the information brokers of this world for considerable sums. And if they are lucky enough that the segmentations used are relevant to them, this can indeed pay off. B2B companies face a much bigger challenge: market data is usually not commercially available and has to be produced (with or without the help of B2C data), which makes market data for these companies even more expensive.

    Moreover, the discussion above only concerns data on market size and market value: the basics, you could say. Data on competitors, market shares, product developments and market-defining trends is at least as relevant, both for setting the right course and for making tactical decisions (purchasing, pricing, distribution).

    Yet there are possibilities! Even with scarce data, it is possible to reconstruct market data. The starting point: if we look for predictive variables in the markets where we do have data, other market data can often be 'approximated' or 'estimated'. This form of statistical reconstruction of market data often proves more reliable than surveys or expert panels. The technique is being applied more and more in market intelligence, so data science is making its entrance in this field as well.

    Once this becomes commonplace, the step towards forecasting markets is of course not far away, and that question is being asked more and more: can we also map out what the market will look like in 5 or perhaps even 10 years? We can! And the quality of those forecasts is increasing, and with it their use. Market intelligence is only becoming more fun, and the game, of course, only more interesting.

    Source: Hammer, market intelligence

    http://www.hammer-intel.com

     

     

  • Preserving privacy within a population: differential privacy

    Preserving privacy within a population: differential privacy

    In this article, I will present the definition of differential privacy and discuss how to preserve the privacy and personal data of users while using their data to train machine learning models or derive insights with data science technologies.

    What is differential privacy?

    Differential privacy describes a promise, made by a data holder, or curator, to a data subject:

    “You will not be affected, adversely or otherwise, by allowing your data to be used in any study or analysis, no matter what other studies, data sets, or information sources are available.”

    At their best, differentially private database mechanisms can make confidential data widely available for accurate data analysis, without resorting to data clean rooms, data usage agreements, data protection plans, or restricted views.

    Nonetheless, data utility will eventually be consumed: the Fundamental Law of Information Recovery states that overly accurate answers to too many questions will destroy privacy in a spectacular way.

    Differential privacy addresses the paradox of learning nothing about an individual while learning useful information about a population.

    A medical database may teach us that smoking causes cancer, affecting an insurance company’s view of a smoker’s long-term medical costs. 

    Has the smoker been harmed by the analysis?

    Perhaps — his insurance premiums may rise, if the insurer knows he smokes. He may also be helped — learning of his health risks, he enters a smoking cessation program.

    Has the smoker’s privacy been compromised?

    It is certainly the case that more is known about him after the study than was known before, but was his information “leaked”?

    Differential privacy will take the view that it was not, with the rationale that the impact on the smoker is the same independent of whether or not he was in the study. It is the conclusions reached in the study that affect the smoker, not his presence or absence in the data set.

    Differential privacy ensures that the same conclusions, for example, smoking causes cancer, will be reached, independent of whether any individual opts into or opts out of the data set.
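    One standard way to realize this promise for numeric queries is the Laplace mechanism. The sketch below (with an invented count of 412 smokers) adds noise calibrated to the query's sensitivity and the privacy budget epsilon, so that any single individual's presence or absence barely changes the released answer:

    ```python
    import numpy as np

    def laplace_mechanism(true_answer: float, sensitivity: float, epsilon: float) -> float:
        """Release a noisy answer; the noise scale grows with sensitivity and shrinks with epsilon."""
        return true_answer + np.random.laplace(loc=0.0, scale=sensitivity / epsilon)

    # A counting query ("how many people in the dataset smoke?") has sensitivity 1,
    # because adding or removing one person changes the true count by at most 1.
    noisy_count = laplace_mechanism(true_answer=412, sensitivity=1.0, epsilon=0.5)
    print(round(noisy_count))
    ```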

    Artificial Intelligence and the privacy paradox

    Consider an institution, e.g. the National Institutes of Health, the Census Bureau, or a social networking company, in possession of a dataset containing sensitive information about individuals. For example, the dataset may consist of medical records, socioeconomic attributes, or geolocation data. The institution faces an important tradeoff when deciding how to make this dataset available for statistical analysis.

    On one hand, if the institution releases the dataset (or at least statistical information about it), it can enable important research and eventually inform policy decisions.

    On the other hand, for a number of ethical and legal reasons it is important to protect the individual-level privacy of the data subjects. The field of privacy-preserving data analysis aims to reconcile these two objectives. That is, it seeks to enable rich statistical analyses on sensitive datasets while protecting the privacy of the individuals who contributed to them.

    Differential privacy and Machine Learning

    One of the most useful tasks in data analysis is machine learning: the problem of automatically finding a simple rule to accurately predict certain unknown characteristics of never-before-seen data.

    Many machine learning tasks can be performed under the constraint of differential privacy. In fact, the constraint of privacy is not necessarily at odds with the goals of machine learning, both of which aim to extract information from the distribution from which the data was drawn, rather than from individual data points.

    The goal in machine learning is very often similar to the goal in private data analysis. The learner typically wishes to learn some simple rule that explains a data set. However, she wishes this rule to generalize: the rule she learns should not only correctly describe the data she has on hand, but also correctly describe new data drawn from the same distribution.

    Generally, this means that she wants to learn a rule that captures distributional information about the data set on hand, in a way that does not depend too specifically on any single data point.

    Of course, this is exactly the goal of private data analysis: to reveal distributional information about the private data set without revealing too much about any single individual in it (remember the overfitting phenomenon?).

    It should come as no surprise, then, that machine learning and private data analysis are closely linked. In fact, we are often able to perform private machine learning nearly as accurately, and with nearly the same number of examples, as non-private machine learning.
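
    As an informal sketch of what private machine learning can look like in practice (in the spirit of DP-SGD, but with hypothetical data and with noise that is not calibrated to a formal (epsilon, delta) guarantee), each example's gradient contribution is clipped and noise is added before the model is updated:

      import numpy as np

      def dp_sgd_logistic(X, y, epochs=50, lr=0.1, clip=1.0, noise_mult=1.0, seed=0):
          # Toy differentially-private-style training of a logistic regression:
          # clip each example's gradient, then add Gaussian noise to the sum,
          # so no single data point dominates the learned rule.
          rng = np.random.default_rng(seed)
          n, d = X.shape
          w = np.zeros(d)
          for _ in range(epochs):
              preds = 1.0 / (1.0 + np.exp(-X @ w))
              per_example = (preds - y)[:, None] * X                 # per-example gradients
              norms = np.linalg.norm(per_example, axis=1, keepdims=True)
              clipped = per_example / np.maximum(1.0, norms / clip)  # bound each contribution
              noise = rng.normal(0.0, noise_mult * clip, size=d)     # mask any single example
              w -= lr * (clipped.sum(axis=0) + noise) / n
          return w

      # Hypothetical toy data set.
      rng = np.random.default_rng(1)
      X = rng.normal(size=(200, 3))
      y = (X @ np.array([1.5, -2.0, 0.5]) > 0).astype(float)
      print(dp_sgd_logistic(X, y))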

    Cryptography and privacy

    Some recent work has focused on machine learning or general computation over encrypted data.

    Recently, Google deployed a new system for assembling a deep learning model from thousands of locally learned models while preserving privacy, which they call Federated Learning.
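
    The core idea can be sketched as federated averaging: each client trains on its own data, and only the model parameters, never the raw data, are sent back and averaged. The sketch below is a simplified, hypothetical illustration of that pattern, not Google's implementation.

      import numpy as np

      def local_update(weights, X, y, lr=0.1, steps=20):
          # One client's local training on its private data (plain linear regression).
          w = weights.copy()
          for _ in range(steps):
              grad = 2 * X.T @ (X @ w - y) / len(y)
              w -= lr * grad
          return w

      def federated_average(global_w, clients, rounds=10):
          # Average the locally learned weights; raw data never leaves the clients.
          for _ in range(rounds):
              local_ws = [local_update(global_w, X, y) for X, y in clients]
              global_w = np.mean(local_ws, axis=0)
          return global_w

      # Hypothetical clients, each holding its own private data set.
      rng = np.random.default_rng(0)
      true_w = np.array([2.0, -1.0])
      clients = []
      for _ in range(5):
          X = rng.normal(size=(50, 2))
          clients.append((X, X @ true_w + rng.normal(scale=0.1, size=50)))
      print(federated_average(np.zeros(2), clients))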

    Conclusion

    Differential privacy should not be seen as a limitation. Rather, we should look at it as a watchdog that guards our compliance with the standards for handling sensitive data. We generate more data than we think and leave a digital footprint everywhere; as researchers in machine learning and data science, we should therefore pay more attention to this topic and find a fair trade-off between privacy and accurate models.

  • Pyramid Analytics: Main lessons learned from the data-driven drilling and production conference

    Pyramid Analytics: Main lessons learned from the data-driven drilling and production conference

    It was great to be at the data-driven drilling and production conference in Houston on June 11 and 12. The conference was well attended by hundreds of oil and gas (O&G) professionals looking to use technology to minimize downtime, enhance safety, and deliver digital transformation throughout their businesses.

    We talked to dozens of attendees looking to educate themselves about modern data collection and ingestion methods, better information management and integration processes, E&P automation & control systems, more efficient change management and drilling optimization techniques, and advanced and predictive analytics.

    As an analytics and BI vendor, we were there to learn more about how practitioners are using advanced analytics, particularly AI and machine learning, to extract more value out of their data.

    Three key themes

    In our conversations with attendees and other vendors, three key themes emerged:

    • The persistence of data silos

      No surprise here: data silos aren’t going anywhere. The upstream organizations we spoke to struggle with data sharing across departments. It’s a common scenario for users to have limited access to distributed data. It is also common for upstream organizations to perform analytics using numerous tools (many of the individuals we spoke to freely admitted to using three or four different BI tools). This perpetuates the cliché: there is no single version of the truth. The result is duplicate data, duplicate efforts for reporting, duplicate logic and business rules, and more. As a result, collaboration and efficiency suffer.
    • AI and ML operationalization remains elusive

      Many of the professionals we talked to lack effective systems for putting advanced analytics into production. Here’s a common scenario. A line-of-business user will throw data scientists a set of data and say, 'here’s the data, do your magic'. The data isn’t always optimized, so data scientists often spend time prepping the data before they can even analyze it. Then they analyze the data using standalone ML software applications before outputting a flat file and sending it to a business analyst to reload into one of several desktop-based BI applications. This results in a perpetual cycle of extracting, importing, analyzing, exporting, re-importing, and re-analyzing data. The whole process is cumbersome and inefficient; meaningful insights derived from AI and ML initiatives remain limited.

    • It’s hard to move beyond legacy analytics systems 

      For many O&G companies, there is a strong desire to adopt new data and analytics technologies; they acknowledge legacy tools simply aren’t equipped to quickly accommodate newer sources of data and perform advanced and prescriptive analytics. However, the difficulty of migrating from legacy systems often holds people back, no matter how siloed their data environment is. Many organizations have had their current desktop-based analytics solutions in place for years, and in some cases decades, and the huge store of analytic models, dashboards, and reports they have created over the years cannot be easily migrated or re-created.

    The three challenges identified above are tough. But that doesn’t make trying to solve them any less urgent. And from our perspective, this doesn’t make them any less solvable. The price of inaction is too high. No one can stand on the sidelines while the technology environment changes.

    Author: Brigette Casillas

    Source: Pyramid Analytics

  • Recommending with SUCCES as a data scientist

    Recommending with SUCCES as a data scientist

    Have you ever walked an audience through your recommendations only to have them go nowhere? If you’re like most data scientists, chances are that you’ve been in this situation before.

    Part of the work of a data scientist is being able to translate your work into actionable recommendations and insights for stakeholders. This means making your ideas memorable, easy to understand and impactful.

    In this article, we’ll explore the principles behind the book 'Made To Stick' by Chip Heath and Dan Heath, and apply it within the context of data science. This book suggests that the best ideas follow six main principles: Simplicity, Unexpectedness, Concreteness, Credibility, Emotions, and Stories (SUCCES). After reading this article, you’ll be able to integrate these principles into your work and increase the impact of your recommendations and insights.

    Simple

    Making an idea simple is all about stripping the idea to its core. It’s not about dumbing down, but about creating something elegant. This means that you should avoid overwhelming your audience with ideas. When you try to say too many things, you don’t say anything at all. Another key component of making ideas simple is to avoid burying the lead. If during your analysis you find that 10% of customers contribute 80% of revenues, lead with that key insight! You should follow an inverted pyramid approach, where the first few minutes convey the most information and the nuance comes later. Analogies and metaphors are also a great way to get your ideas across simply and succinctly. Using schemas that your audience can understand and relate to will make your message a lot more digestible. For example, a one-sentence analogy like 'Uber for X' can capture the core of what you’re trying to convey.

    Unexpected

    An unexpected idea is one that violates people’s expectations and takes advantage of surprise. You can do this in several ways, one of which is making people commit to an answer and then falsifying it. For example, ask your audience to guess how much time employees spend on a task you’re looking to automate before revealing the real answer. Another way to generate interest and leverage the unexpected principle is to use mysteries, since they lead to aha moments. This might take the form of starting your presentation with a short story that you don’t resolve until the end.

    Concrete

    Abstractness is the enemy of understanding for non-experts. It’s your job as the data scientist to make your recommendations and insights more concrete. A key to understanding is using concrete images and explaining ideas in terms of human actions and senses. The natural enemy of concreteness is the curse of knowledge. As data scientists, we need to fight the urge to overwhelm our audiences with unnecessary technical information. For example, reporting on the Root Mean Squared Error of a model may not be as helpful as translating the result into more concrete terms that anyone can understand.

    Credible

    Adding credibility to your recommendations can take three forms. The first, and the most common when we think of credibility, is leveraging experts to back up claims or assertions. Another is using anti-authorities: real people with powerful stories. For example, if you’re talking about the dangers of smoking, the story of someone who suffers from lung cancer will be a lot more impactful than a sterile statistic. The third way is to outsource the credibility of your point to your audience by making a testable claim they can try out, for example that customers from region X take up 80% more customer support time than any other region. Once the audience can confirm such a claim for themselves, it becomes much easier to lead them to your recommendation.

    Emotions

    Weaving an emotional component into your ideas is all about getting people to care. Humans are naturally wired to feel for humans, not for abstractions. As a result, one individual’s story often trumps a composite statistic. Another component of emotions is tapping into the group identities that your audience conforms to. By keeping those identities in mind, you can tie in the relevant associations and evoke the schemas your audience will be most receptive to. For example, if you know one of your audience members is a stickler for numbers and wants to see a detailed breakdown of how you arrived at certain conclusions, adding an appendix may be helpful.

    Stories

    Humans have been telling stories for centuries, and stories have proven to be one of the most effective teaching methods. If you reflect on the books you’ve read in the past five years, you’re more likely to remember the interesting stories than the objective facts. When weaving stories into your recommendations, make sure to build tension and don’t give everything away at once. Another useful tactic is telling stories that act as springboards to other ideas. Creating open-ended stories that your audience can build on is a great way to give them a sense of ownership.

    Next time you’re tasked with distilling your insights or pitching recommendations, keep these six principles in mind and you’ll be creating simple, unexpected, concrete, credentialed, emotional stories in no time!

    Author: Andrei Lyskov

    Source: Towards Data Science

  • Reducing CO2 with the help of Data Science

    Reducing CO2 with the help of Data Science

    Optimize operations by shifting loads in time and space

    Data scientists have much to contribute to climate change solutions. Even if their full-time job isn’t working on climate, data scientists who work on a company’s operations can have a big impact within their current role. By finding previously untapped sources of flexibility in operations, data scientists can help shift loads to times and places where the electricity grids have a higher share of carbon-free energy, such as wind and solar. This load shifting allows the grid to transition faster to higher shares of carbon-free energy, and it can also reduce operating costs.
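
    As a rough illustration of the idea (the forecast values and the scheduling rule below are hypothetical, not from the article), a flexible batch job can be scheduled into the hours of the day with the lowest forecast grid carbon intensity:

      # Hypothetical hourly carbon-intensity forecast (gCO2/kWh) for the local grid.
      forecast = {
          0: 420, 1: 410, 2: 400, 3: 390, 4: 380, 5: 370,
          6: 360, 7: 340, 8: 300, 9: 250, 10: 210, 11: 180,
          12: 170, 13: 175, 14: 190, 15: 230, 16: 280, 17: 330,
          18: 380, 19: 400, 20: 410, 21: 415, 22: 420, 23: 425,
      }

      def schedule_flexible_load(hours_needed):
          # Pick the hours with the lowest forecast carbon intensity.
          return sorted(sorted(forecast, key=forecast.get)[:hours_needed])

      # A 4-hour batch job would run from 10:00 to 14:00, when solar output peaks.
      print(schedule_flexible_load(4))  # [10, 11, 12, 13]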

    Data Science contributions to climate solutions

    Before getting into specific opportunities in optimizing operations or infrastructure, I would like to acknowledge the broad stage that data scientists have in working on climate. Much of the current excitement about applying data science to climate has been around applications of ML and big data, and rightly so. An excellent starting point is climatechange.ai, an organization of volunteers that has built an extensive community of people who work at the intersection of climate change and AI. Their website includes summaries of the dozens of “climate change solution domains” described in the 2019 paper Tackling Climate Change with Machine Learning [1]. While the solution domains are intended as a guide to high-impact ML climate applications, many also lend themselves to more “classic” data science methods from statistics and operations research. The list of possibilities is vast, and it can be difficult to know where and how to get started. For data scientists looking to get more engaged on climate problems, either on 20% projects or in pivoting their career trajectory, the Terra.do bootcamp and the workonclimate.org Slack community are good places to meet others and find resources.
