3 items tagged "GDPR"

  • How data management can learn from basketball

    A company's data management plan cannot be implemented in isolation by a single department or team. It is a collective effort, much like the way different players work together on a basketball court.

    From the smallest schoolyard to the biggest pro venue, from the simplest pickup game to the NBA finals, players, coaches, and even fans will tell you that having a game plan and sticking to it is crucial to winning. It makes sense; while all players bring their own talents to the contest, those talents have to be coordinated and utilized for the greater good. When players have real teamwork, they can accomplish things far beyond what they could achieve individually, even if they are nominally part of the squad. When team players aren’t displaying teamwork, they’re easy targets for competitors who know how to read their weaknesses and take advantage of them.

    Basketball has been used as an analogy for many aspects of business, from coordination to strategy, but the business activity it most resembles is, believe it or not, data management. Perhaps more than anything, companies need to stick to their game plan when it comes to handling data: storing it, labeling it, and classifying it.

    A good data management plan could mean a winning season

    Without a plan followed by everyone in the organization, companies will soon find that their extensive collections of data are useless, just like the top talent a team manages to amass is useless without everyone on a team knowing what their role is. Failure to develop a data management plan could cost a company in time, and even in money. If data is not classified or labeled properly, search queries are likely to miss a great deal of it, skewing reports, profit and loss statements, and much more. 

    Even more pressing for companies is the need to produce data when regulators come calling. With the implementation of the European Union’s General Data Protection Regulation (GDPR), companies can no longer afford to be without a tight game plan for data management. According to GDPR rules, all EU residents have 'the right to be forgotten', which requires companies to know what data they hold about an individual and to demonstrate to EU inspectors, on demand, that they can delete it. Those rules apply not just to companies in Europe, but to all companies that do business with EU residents as well. GDPR violators could be fined as much as €20 million, or 4% of annual global turnover, whichever is greater.

    Even companies that have no EU clients or customers need to improve their data management game, because GDPR-style rules are moving stateside as well. California recently passed its own digital privacy law (set to go into effect in January), which gives state residents the right to be forgotten, and other states are considering similar laws. And with heads of large tech firms calling for privacy legislation in the U.S., it’s likely that federal legislation on the matter will be passed sooner rather than later.

    Data Management Teamwork, When and Where it Counts

    In basketball, players need to be molded to work together as a unit. A rogue player who decides that they want to be a 'shooting star' instead of following the playbook and passing when appropriate may make a name for themselves, but the team they are playing for is unlikely to benefit much from that kind of approach. Only when all the players work together, with each move complementing the other as prescribed by the game plan, can a team succeed.

    In data management, teams generate information that the organization can use to further its business goals. Data on sales, marketing, engagement with customers, praise and complaints, how long it takes team members to complete tasks, and a million other metrics all go into the databases and data storage systems of organizations for eventual analysis.

    With that data, companies can accomplish a great deal: Improve sales, make operations more efficient, open new markets, research new products and improve existing ones, and much more. That, of course, can only happen if all departments are able to access the data collected by everyone.

    Metadata management - A star 'player'

    Especially important is the data about data: the metadata, used to refer to data structures, labels, and types. When different departments, and even individual employees, are responsible for entering data into a repository, they need to follow the metadata 'game plan': the one where all data is labeled according to a single standard, using common dictionaries, glossaries, and catalogs. Without that plan, data could easily get 'lost', and putting together search queries could be very difficult.

    Another problem is the fact that different departments will use different systems and products to process their data. Each data system comes with its own rules, and of course each set of rules is different. That there is no single system for labeling between the different products just contributes to the confusion, making resolution of metadata issues all the more difficult.

    Unfortunately, not everyone is always a team player when it comes to metadata. Due to pressure of time or other issues, different departments tend to use different terminology for data. For example, a department that works with Europe may label its dates in the form of year/month/day, while one that deals with American companies will use the month/day/year label. In a search form, the fields for 'years' and 'days' will not match across all data repositories, creating confusion. The department 'wins', but what about everyone else? And even in situations where the same terminology is used, the fact that different data systems are in use could impact metadata.
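
    The date mismatch described above can be illustrated with a minimal sketch. It assumes each record carries the name of the department that entered it, and that each department's date format is known in advance. The department names, the `DEPARTMENT_DATE_FORMATS` mapping, and the `normalize_date` helper are all hypothetical and serve only to show how a shared standard (here ISO 8601) makes the two conventions searchable together.

```python
from datetime import datetime

# Hypothetical mapping from department to the date convention it uses.
DEPARTMENT_DATE_FORMATS = {
    "europe_sales": "%Y/%m/%d",  # year/month/day
    "us_sales": "%m/%d/%Y",      # month/day/year
}


def normalize_date(raw: str, department: str) -> str:
    """Parse a date using the department's convention and return ISO 8601."""
    fmt = DEPARTMENT_DATE_FORMATS[department]
    return datetime.strptime(raw, fmt).date().isoformat()


# The same calendar day, entered under two different conventions.
records = [
    {"department": "europe_sales", "order_date": "2019/03/04"},
    {"department": "us_sales", "order_date": "03/04/2019"},
]

normalized = [normalize_date(r["order_date"], r["department"]) for r in records]
print(normalized)
```

    Once both values are stored in one format, a single query on the date field finds both records, which is the point of agreeing on a common metadata standard.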

    Different departments have different objectives and goals, but team members cannot forget the overall objective: helping the 'team', the whole company, to win. The data they contribute is needed for those victories, those advancements. Without it, important opportunities could be lost. When data management isn’t done properly, teams may accomplish their own objectives, but the overall advancement of the company will suffer.


    'Superstars', whose objective is to aggrandize themselves, have no place on a basketball team; they should be playing one-on-one hoops with others of their type. Teams in companies should learn the lesson: if you want to succeed in basketball, or in data management, you need to work together with others, following the data plan that will ensure success for everyone.

    Author: Amnon Drori

    Source: Dataconomy

  • Staying on the right track in the era of big data

    Volume dominates the multidimensional big data world. The challenge many organizations face today is harnessing the potential of the data and applying all of the usual methods and technologies at scale. After all, data volumes keep growing: data is currently being produced at a rate of 2.5 quintillion bytes per day. Unfortunately, a large portion of this data is unstructured, making it even harder to categorize.

    Compounding the problem, most businesses expect that decisions made based on data will be more effective and successful in the long run. However, with big data often comes big noise. After all, the more information you have, the more chance that some of that information might be incorrect, duplicated, outdated, or otherwise flawed. This is a challenge that most data analysts are prepared for, but one that IT teams need to consider and factor into their downstream processing and decision making to ensure that any bad data does not skew the resulting insights.

    This is why overarching big data analytics solutions alone are not enough to ensure data integrity in the era of big data. In addition, while new technologies like AI and machine learning can help make sense of the data en masse, these often rely on a certain amount of cleaning and condensing behind the scenes to be effective and to run at scale. While accounting for some errors in the data is fine, being able to find and eliminate mistakes where possible is a valuable capability, because undetected errors can have a catastrophic effect, derailing effective analysis and delaying the time to value. This is particularly true if a configuration error or a problem with a single data source creates a stream of bad data. Without the right tools, these kinds of errors can create unexpected results and leave data professionals with an unwieldy mass of data to sort through to try and find the culprit.

    This problem is compounded when data is ingested from multiple different sources and systems, each of which may have treated the data in a different way. The sheer complexity of big data architecture can turn the challenge from finding a single needle in a haystack to one more akin to finding a single needle in a whole barn.

    Meanwhile, this problem no longer affects only the IT function and business decision making; solving it is becoming a legal requirement. Legislation like the European Union’s General Data Protection Regulation (GDPR) mandates that businesses find a way to manage and track all of the personal data they hold, no matter how complicated the infrastructure or how unstructured the information. In addition, upon receiving a valid request, organizations need to be able to delete information pertaining to an individual or collect and share it as part of an individual’s right to data portability.

    So, what’s the solution? One of the best ways to manage the beast of big data is also one that builds in data integrity: automating data ingestion so that it produces a full data lineage. This creates a clear path showing where data originated and how it has been used over time. Because the process is automatic, it is also much easier and more reliable than tracking data by hand. However, it is important to ensure that lineage is captured at a fine level of detail.

    With the right data lineage tools, ensuring data integrity in a big data environment becomes far easier. The right tracking means that data scientists can track data back through the process to explain what data was used, from where, and why. Meanwhile, businesses can track down the data of a single individual, sorting through all the noise to fulfill subject access requests without disrupting the big data pipeline as a whole, or diverting significant business resources. As a result, analysis of big data can deliver more insight, and thus more value, faster, despite its multidimensional complexity.
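
    The idea of lineage captured automatically at ingestion can be sketched as follows. This is a minimal in-memory illustration, not a description of any particular lineage product: the `Record` class, the `ingest`, `transform`, `records_for_subject`, and `erase_subject` helpers, and the source names are all hypothetical, and lineage is tracked per record rather than per field to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    subject_id: str  # the individual the data is about
    payload: dict
    lineage: list = field(default_factory=list)  # ordered history of the record


def ingest(source: str, subject_id: str, payload: dict) -> Record:
    """Create a record and log its origin as the first lineage entry."""
    return Record(subject_id, dict(payload), [f"ingested from {source}"])


def transform(rec: Record, step_name: str, fn) -> Record:
    """Apply fn to the payload and append the step to the lineage."""
    return Record(
        rec.subject_id,
        fn(rec.payload),
        rec.lineage + [f"transformed by {step_name}"],
    )


def records_for_subject(store: list, subject_id: str) -> list:
    """Find everything held about one individual (subject access request)."""
    return [r for r in store if r.subject_id == subject_id]


def erase_subject(store: list, subject_id: str) -> list:
    """Return the store without any record about the individual (erasure)."""
    return [r for r in store if r.subject_id != subject_id]


store = [
    ingest("crm_export", "user-1", {"email": "A@Example.com"}),
    ingest("web_logs", "user-2", {"email": "b@example.com"}),
]
store[0] = transform(
    store[0], "lowercase_email", lambda p: {**p, "email": p["email"].lower()}
)

for rec in records_for_subject(store, "user-1"):
    print(rec.payload, rec.lineage)

store = erase_subject(store, "user-1")
```

    Because every record carries its own history, tracing a suspicious value back to its source, or locating one person's data, is a lookup rather than a search through the whole pipeline.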

    Author: Neil Barton

    Source: Dataversity

  • What is dark data? And how to deal with it

    It’s easier than ever to collect data without a specific purpose, under the assumption that it may be useful later. Often, though, that data ends up unused and even forgotten because of several simple factors: The fact that the data is being collected isn’t effectively communicated to potential users within an organization. The repositories that hold the data aren’t widely known. Or perhaps there simply isn’t enough analysis capacity within the company to process it. This data that is collected but not used is often termed 'dark data'. 

    Dark data presents an organization with tremendous opportunities, as well as liabilities. If it is harnessed effectively, it can be used to produce insights that wouldn’t otherwise be available. With that in mind, it’s important to make this dark data accessible so it can power those innovative use cases.

    On the other hand, lack of visibility into all the data being collected within an organization can make it difficult to accurately manage costs, and easy to accidentally run afoul of retention policies. It can also hamper efforts to ensure compliance with regulations like the EU’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA).

    So what can be done to maximize the benefits of dark data and avoid these problems?

    Some best practices

    When dealing with dark data, the foremost best practice is to shine a spotlight on it by communicating to potential users within the organization what data is being collected.

    Secondly, organizations need to evaluate whether and for how long it makes sense to retain the data. This is crucial to avoid incurring potentially substantial costs collecting and storing data that isn’t being used and won’t be used in the future, and even more importantly to ensure that the data is being handled and secured properly.
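
    A retention review of this kind can be sketched in a few lines, assuming the organization keeps a simple catalog of its datasets. The `Dataset` fields, the `review` function, the 180-day "unused" threshold, and the sample entries are all hypothetical; real retention periods would come from company policy and applicable regulation.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Dataset:
    name: str
    owner: str
    created: date
    last_accessed: date
    retention_days: int  # how long policy allows the data to be kept


def review(datasets: list, today: date, unused_after_days: int = 180) -> dict:
    """Flag datasets that look unused or that exceed their retention period."""
    report = {}
    for ds in datasets:
        flags = []
        if (today - ds.last_accessed).days > unused_after_days:
            flags.append("unused")
        if (today - ds.created).days > ds.retention_days:
            flags.append("past_retention")
        report[ds.name] = flags
    return report


catalog = [
    Dataset("clickstream_raw", "web team", date(2018, 1, 1), date(2018, 6, 1), 365),
    Dataset("orders", "sales", date(2019, 10, 1), date(2019, 12, 20), 730),
]

report = review(catalog, today=date(2020, 1, 1))
print(report)
```

    Datasets that come back with flags are candidates for either deletion or a conversation with their owner about whether anyone could put them to use.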

    Perhaps the biggest challenge when working with dark data is simply getting access to it, as it’s often stored in siloed repositories close to where the data is being collected. Additionally, it may be stored in systems and formats that are difficult to query or have limited analytics capabilities.

    So the next step is to ensure that the data that is collected can actually be used effectively. The two main approaches are: (1) investing in tooling that can query the data where it is currently stored, and (2) moving the data into centralized data storage platforms. 

    I recommend combining these two approaches. First, adopt tools that provide the ability to discover, analyze, and visualize data from multiple platforms and locations via a single interface, which will increase data visibility and reduce the tendency to store the same data multiple times. Second, leverage storage platforms that can efficiently aggregate and store data that would otherwise be inaccessible, in order to reduce the number of data stores that must be tracked and managed.

    Considering the potential power and pitfalls that come with having dark data in your organization, it’s definitely worth the effort to bring it out of the dark.

    Author: Dan Cech

    Source: Insidebigdata
