3 items tagged "bias"

  • How AI reinforces stereotypes through biased data

    Artificial intelligence (AI) software systems have come under fire for years for systematizing bias against minorities. Banks using AI algorithms for mortgage approvals have been found to reject minority applicants at disproportionate rates. Many AI-based recruiting tools are biased against minority applicants. In health care, African Americans have been subjected to racial bias by AI-based hospital diagnostic tools. If you understand how data science research produces the models that power AI systems, you will understand why those systems perpetuate bias, and also how to fix it.

    Merriam-Webster defines stereotype as “something conforming to a fixed or general pattern.” Machine learning algorithms, which build the models that power AI software systems, are simply pattern-matching machines. Deep learning models, which are most commonly used in the latest wave of AI-powered systems, discover patterns in data and perpetuate those patterns in future data. This behavior is great if the goal is to replicate the world represented by the training data. However, for a variety of reasons, that isn’t always the best outcome.

    The root cause of bias in data-driven AI systems comes down to how the data used to train those systems is collected, and whether the decision-making reflected in that data represents the corporate or societal goals of the system being deployed. Data for AI systems can be derived from many sources, including the real world, dedicated data collection efforts, and synthetic data generation.

    The real world contains a lot of data. For instance, the mortgage applications from a period in the past constitute a training set for a mortgage approval system. Job applications, along with the hiring decisions and the outcomes of those hires, provide data to a human resources hiring system. Similarly, medical data from patients over a time period, including their symptoms, histories, diagnoses, and treatments, might be a useful data set for a medical treatment system.

    Absent valid, useful, or available real-world data, machine learning researchers and application developers can collect data “artificially”: deciding what data they want to collect, deploying a team to design a data collection process, and going out into the world to gather the data proactively. This data set might be better targeted to the needs of the system builder, but the decisions made during collection can skew the models built from that data, introducing a form of bias rooted in the design of the data collection process itself.

    There is a third possibility: synthetic data. If the appropriate data isn’t available, research teams can deploy synthetic data generators to create artificial data that they believe represents the real-world data the proposed application will see, along with the labels or decisions the developer wants the system to produce for that data.
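
    To make this concrete, below is a minimal sketch of a synthetic data generator for mortgage-style records. Everything in it is an assumption for illustration: the feature names, the distributions, and the approval rule are invented, not taken from any real lender or from this article.

    ```python
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(seed=0)
    n = 1_000

    # Hypothetical applicant features; the distributions are illustrative assumptions.
    income = rng.lognormal(mean=11.0, sigma=0.5, size=n)        # annual income
    debt_ratio = rng.beta(2, 5, size=n)                         # debt-to-income ratio
    credit_score = rng.normal(690, 60, size=n).clip(300, 850)

    # The "desired analysis": the label the developer wants the system to learn.
    approved = (credit_score > 640) & (debt_ratio < 0.4)

    synthetic = pd.DataFrame({
        "income": income,
        "debt_ratio": debt_ratio,
        "credit_score": credit_score,
        "approved": approved.astype(int),
    })
    print(synthetic.head())
    ```

    Note that the generator encodes the developer’s own assumptions about who should be approved, which is precisely where bias can enter.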

    In all of these cases, there is a presumption that the AI system should model the world as it is and perpetuate the behavior it sees in the training data, regardless of its source. And, in a world historically influenced by systemic racism and bias against broad classes of minority groups, or in the case of women, even majority groups, it is not clear at all that the best outcome is a replication of the decision-making of the past.

    If we know that qualified African American mortgage applicants have been denied mortgages in the past, or that the economic system has been biased against African Americans in the past, so that they are more likely to default on mortgages than white applicants, then training AI systems on historical mortgage data is only going to perpetuate the bias encoded historically in the financial system. Similarly, if qualified minorities and women have been underrepresented in the job market in the past, training on historical hiring data will likely reinforce that bias. If the real world has made the wrong decision in the past, due to systemic bias in societal infrastructure, training systems on historical data is going to reinforce that systemic bias, but in an algorithmic rather than an anecdotal way, perhaps even making the problem worse.

    An effective way to rectify this problem is to create targets for the desired behavior of data-driven systems, and then engineer or curate training data sets that represent those desired outcomes. This process allows machine learning algorithms to learn patterns for making accurate predictions on new data while ensuring that the models reflect the intended relationship between inputs and outputs rather than the historical one.

    How does this work in reality? Let’s say you want to build a mortgage approval model that will make good decisions about loan risk, but which will treat minority applicants on a par with non-minority applicants. However, it turns out the historical data you have is biased against minority applicants. One simple way to rectify this problem is to filter the data so that the approval rate for minority applicants matches the approval rate for non-minority applicants. By skewing the training data to represent the outcome you want to see, as opposed to the way the data reflects historical biases, you can push the machine learning algorithms to create models that treat minority applicants more fairly than they have been treated in the past.
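
    As a rough illustration of that filtering step, here is a sketch that down-samples denied applications in each group until every group’s approval rate matches a reference group’s rate. The column names (group, approved), the reference group label, and the down-sampling approach are assumptions made for the example, not the author’s actual procedure.

    ```python
    import pandas as pd

    def equalize_approval_rates(df: pd.DataFrame,
                                group_col: str = "group",
                                label_col: str = "approved",
                                reference_group: str = "non_minority",
                                seed: int = 0) -> pd.DataFrame:
        """Down-sample denials per group so each group's approval rate
        matches the reference group's approval rate."""
        target_rate = df.loc[df[group_col] == reference_group, label_col].mean()
        parts = []
        for _, grp in df.groupby(group_col):
            approved = grp[grp[label_col] == 1]
            denied = grp[grp[label_col] == 0]
            # rate = len(approved) / (len(approved) + n_denied)  =>  solve for n_denied
            n_denied = int(round(len(approved) * (1 - target_rate) / target_rate))
            parts.append(pd.concat([approved,
                                    denied.sample(min(n_denied, len(denied)),
                                                  random_state=seed)]))
        # Shuffle so the curated training set is not ordered by group.
        return pd.concat(parts).sample(frac=1, random_state=seed)
    ```

    Over-sampling approved minority applications, or reweighting examples during training, would achieve a similar effect without discarding data.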

    Some people may want to view machine learning models simply as tools used to capture and represent the world as it has been, flaws and all. Given the reality of systemic racism pervading the systems that run our society, this view has led to AI-driven software encoding the racism of the past and present and amplifying it through the power of technology. We can do better, however, by using the pattern-matching ability of machine learning algorithms, and by curating the data sets they are trained on, to make the world what we want it to be, not the flawed version of itself it has been until now.

    Author: David Magerman

    Source: Insidebigdata

  • The Growing Influence of Ethical AI in Data Science

    Industries such as insurance that handle personal information are paying more attention to customers’ desire for responsible, transparent AI.

    AI (artificial intelligence) is a tremendous asset to companies that use predictive modeling and automate tasks. However, AI still faces problems with data bias. After all, AI gets its marching orders from human-generated data -- which by its nature is prone to bias, no matter how evolved we humans like to think we are.

    With the wide adoption of AI, many industries are starting to pay attention to a new form of governance called responsible or ethical AI. These are governance practices associated with regulated data. For most organizations, this involves removing any unintentional bias or discrimination from their customer data and cross-checking any unexpected algorithmic activity once the data moves into production mode.

    This is an especially important transformation for the insurance industry because consumers today are becoming far more attuned to their end-to-end personal experience in any industry that relies on personal data. By advancing responsible, ethical AI, insurers can confidently align with the way consumers want to search for and find insurance policies, and with the values and ethics that govern this kind of personal search.

    What Does Inherent Bias Look Like in AI Algorithms Today?

    One of the more noticeable examples of human-learned, albeit unintentional, data bias today is around gender. This happens when the AI system does not behave the same way for a man versus a woman, even when the data provided to the system is identical except for the gender information. One example outcome is that individuals who should be in the same insurance risk category are offered unequal policy advice.
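
    One way to test for exactly this failure mode is a counterfactual check: score each application twice, changing only the gender field, and flag any rows where the decision flips. The sketch below assumes a fitted model with a scikit-learn-style predict method and a hypothetical gender column holding "M" or "F"; both are assumptions, not details from the article.

    ```python
    import pandas as pd

    def counterfactual_gender_check(model, applicants: pd.DataFrame,
                                    gender_col: str = "gender") -> pd.Series:
        """Return a boolean Series marking applicants whose predicted decision
        changes when only the gender field is flipped."""
        flipped = applicants.copy()
        flipped[gender_col] = flipped[gender_col].map({"M": "F", "F": "M"})
        original = model.predict(applicants)
        altered = model.predict(flipped)
        return pd.Series(original != altered, index=applicants.index)

    # Any True entries are cases where otherwise-identical applicants would
    # receive different treatment based on gender alone.
    ```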

    Another example is survivorship bias: optimizing an AI model using only available, visible data -- i.e., “surviving” data. This approach inadvertently overlooks information that is simply not visible, and the results are skewed toward one vantage point. To move past this weakness, for example in the insurance industry, AI must be trained not to favor known customer data over prospective customer data that is not yet known.

    More enterprises are becoming aware of how these data determinants can expose them to unnecessary risk. A case in point: in its State of AI in 2021 report, McKinsey reviewed industry regulatory compliance through the filter of a company’s allegiance to equity and fairness data practices -- and reported that two of companies’ top three global concerns are the ability to establish ethical AI and to explain their practices clearly to customers.

    How Can Companies Proactively Eliminate Data Bias Company-wide?

    Most companies should already have a diversity, equity, and inclusion (DEI) program to set a strong foundation before exploring practices in technology, processes, and people. At a minimum, companies can set a goal to remove ingrained data biases. Fortunately, there are a host of best-practice options to do this.

    • Adopt an open source strategy. First, enterprises need to know that biases are not necessarily where they imagine them to be. There can be bias in the sales training data, in the data seen later at inference (prediction) time, or both. At Zelros, for example, we recommend that companies use an open source strategy to be more open and transparent in their AI initiatives. This is becoming an essential baseline anti-bias step practiced at companies of all sizes.

    • Utilize vendor partnerships. Companies that want to put a bigger stake in the ground when it comes to regulatory compliance and ethical AI standards can collaborate with organizations such as isahit, which is dedicated to helping organizations across industries become competent in their use and implementation of ethical AI. As a best practice, we recommend that companies work toward adopting responsible AI at every level, not just within their technical R&D or research teams, and then communicate this governance commitment to their customers and partners.

    • Initiate bias bounties. Another method for eliminating data bias was identified by Forrester as a significant trend in their North American “Predictions 2022” guide. It is an initiative called bias bounties. Forrester stated that, “At least five large companies will introduce bias bounties in 2022.”
      Bias bounties are like bug bounties, but instead of rewarding users for the issues they detect in software, users are rewarded for identifying bias in AI systems. The bias arises from incomplete data, or from existing data patterns that can lead to discriminatory outcomes from AI systems. According to Forrester, in 2022, major tech companies such as Google and Microsoft will implement bias bounties, and so will non-technology organizations such as banks and healthcare companies. With trust high on stakeholders’ agenda, basing decisions on accountability and integrity is more critical than ever.

    • Get certified. Finally, another method for establishing an ethical AI approach -- one that is gaining momentum -- is getting AI system certification. Being able to provide proof of the built-in governance through an external audit goes a long way. In Europe, the AI Act is a resource for institutions to assess their AI systems from a process or operational standpoint. In the U.S., the NAIC is a reference organization providing guiding principles for insurers to follow. Another option is for companies to align to a third-party organization for best practices.

    Can an AI System Be Self-criticizing and Self-sustaining?

    Creating an AI system that is both self-criticizing and self-sustaining is the goal. Through the design itself, the AI must adapt and learn, with the support of human common sense, which the machine cannot emulate.

    Companies that want fair prediction outcomes may analyze different metrics at various subgroup levels within a specific model feature (for example, gender), because that can help identify and prevent biases before consumer-facing capabilities go to market. With any AI, making sure that it doesn’t fall into a trap called Simpson’s Paradox is key. Simpson’s Paradox, which also goes by several other names, is a phenomenon in probability and statistics where a trend appears in several groups of data but disappears or reverses when the groups are combined. Successfully preventing this from happening ensures that personal data does not penalize the client or consumer whom it is supposed to benefit.
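
    The toy numbers below, invented purely for illustration, show what the paradox can look like in an insurance-style approval table: women have the higher approval rate within each product segment, yet the pooled rate favors men, so a check that only looks at aggregate figures would miss the subgroup pattern.

    ```python
    import pandas as pd

    # Hypothetical application counts chosen only to produce the reversal.
    data = pd.DataFrame({
        "segment":  ["A", "A", "B", "B"],
        "gender":   ["M", "F", "M", "F"],
        "applied":  [100,  20, 100, 200],
        "approved": [ 80,  18,   5,  20],
    })
    data["rate"] = data["approved"] / data["applied"]
    print(data)    # within each segment, F has the higher approval rate

    pooled = data.groupby("gender")[["applied", "approved"]].sum()
    pooled["rate"] = pooled["approved"] / pooled["applied"]
    print(pooled)  # pooled across segments, M has the higher rate
    ```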

    Responsible Use of AI Can Be a Powerful Advantage

    Companies are starting to pay attention to how responsible AI has the power to nurture a virtuous, profitable circle of customer retention through more reliable and robust data collection. There will be challenges in the ongoing refinement of ethical AI for many applications, but the strategic advantages and opportunities are clear. In insurance, the ability to monitor, control, and balance human bias can keep policy recommendations meant for certain races and genders fairly focused on the needs of those intended audiences. Responsible AI leads to stronger customer attraction and retention, and ultimately increased profitability.

    Conclusion

    Companies globally are revving up their focus on data equity and fairness as a relevant risk to mitigate. Fortunately, they have options to choose from to protect themselves. AI offers an opportunity to accelerate more diverse, equitable interactions between humans and machines. Solutions can help large enterprises globally provide hyper-personalized, unbiased recommendations across channels. Respected trend analysts have called out data bias as a top business concern of 2022. Simultaneously, they identify responsible, ethical AI as a forward-thinking solution companies can deploy to increase customer and partner trust and boost profitability.

    How are you moving toward an ethical use of AI today?

    Author: Damien Philippon

    Source: TDWI

  • Why we should be aware of AI bias in lending

    It seems that, beyond all the hype, AI (artificial intelligence) applications in lending really do speed up and automate decision-making.

    Indeed, a couple of months ago Upstart, an AI-leveraging fintech startup, announced that it had raised a total of $160 million since inception. It also inked deals with the First National Bank of Omaha and the First Federal Bank of Kansas City.

    Upstart won recognition due to its innovative approach toward lending. The platform identifies who should get a loan, and of what amount, using AI trained on so-called ‘alternative data’. Such alternative data can include information on an applicant’s purchases, type of phone, favorite games, and the average credit score of their social media friends.

    However, the use of alternative data in lending is still far from making the process faster, fairer, and wholly GDPR-compliant. Besides, it's not an absolute novelty.

    Early credit agencies hired specialists to dig into local gossip about their customers, while back in 1935 neighborhoods in the U.S. were classified according to their collective creditworthiness. In a more recent case, from 2002, a Canadian Tire executive analyzed the previous year’s transactional data and discovered that customers buying roof-cleaning tools were more financially reliable than those purchasing cheap motor oil.

    There's one significant difference between the past and the present, however. Earlier, it was a human who collected and processed both alternative and traditional data, including debt-to-income ratio, loan-to-value ratio, and individual credit history. Now, the algorithm is stepping in, as many believe it to be more objective as well as faster.

    What gives cause for concern, though, is that AI can turn out to be no less biased than humans. Heads up: if we don’t control how the algorithm self-learns, AI can become even more one-sided.

    Where AI bias creeps in

    Generally, AI bias doesn’t happen by accident. People who train the algorithm make it subjective. Influenced by some personal, cultural, educational, and location-specific factors, even the best algorithm trainers might use inherently prejudiced input data.

    If not detected in time, this can result in biased decisions, which will only get worse with time. That's because the algorithm bases its new decisions on the previous ones. Evolving on its own, it ends up being much more complex than at the beginning of its operation (the classic snowball effect). In plain words, it continuously learns by itself, whether the educational material is correct or not.

    Now, let’s look at how exactly AI might discriminate in the lending decisions it makes. Looking at the examples below, you'll easily follow the key idea: AI bias often goes back to human prejudice.

    AI can discriminate based on gender

    While there are traditionally more men in senior and higher-paid positions, women continue facing the so-called ‘glass ceiling’ and pay gap problems. As a result, even though women on average tend to be better savers and payers, female entrepreneurs continue receiving fewer and smaller business loans compared to men.

    The use of AI might only worsen the tendency, since sexist input data can lead to a spate of loan denials among women. Relying on unrepresentative statistics, AI algorithms might favor a male applicant over a female one even when all other parameters are relatively similar.
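
    A simple guardrail here is to compare approval rates across gender groups before a model goes live. The sketch below computes each group’s approval rate relative to the best-treated group; the column names are hypothetical, and the 0.8 threshold mentioned in the docstring is the common “four-fifths” rule of thumb rather than anything prescribed in this article.

    ```python
    import pandas as pd

    def approval_rate_ratios(decisions: pd.DataFrame,
                             group_col: str = "gender",
                             label_col: str = "approved") -> pd.Series:
        """Approval rate of each group divided by the best-treated group's rate.
        Ratios well below 1.0 (e.g. under 0.8, a common rule of thumb) are a
        signal to audit the model and its training data."""
        rates = decisions.groupby(group_col)[label_col].mean()
        return rates / rates.max()
    ```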

    AI can discriminate based on race

    This sounds harsh, but black applicants are twice as likely to be refused a mortgage as white ones. If the input data used to train the algorithm reflects such a racial disparity, the model can put it into practice pretty fast and start causing more and more denials.

    Alternative data can also become the source of ‘AI racism’. Consider an algorithm using seemingly neutral information on an applicant’s prior fines and arrests. The truth is, such information is not neutral. According to The Washington Post, African Americans become policing targets much more frequently than the white population, and in many cases baselessly.

    The same goes for some other types of data. Racial minorities face inequality in occupation and in the neighborhoods they live in. All of these metrics might become solid reasons for AI to say ‘no’ to a non-white applicant.

    AI can discriminate based on age

    The longer a credit history, the more we know about a particular person’s creditworthiness. Older people typically have longer credit histories, as there are more financial transactions behind them.

    The younger generation, by contrast, has less data about their transactions, which can become an unfair reason for a credit denial.

    AI can discriminate based on education

    Consider an AI lending algorithm that analyzes an applicant’s grammar and spelling while making credit decisions. An algorithm might ‘learn’ that bad spelling habits or constant typos point to poor education and, consequently, bad creditworthiness.

    In the long run, the algorithm can start disqualifying individuals with writing difficulties or disorders even if those have nothing to do with such people’s ability to pay their bills.

    Tackling prejudice in lending

    Overall, in order to make AI-run loan processes free of bias, it’s crucial to keep the input data clean of any possible human prejudice, from misogyny and racism to ageism.

    To make training data more neutral, organizations should form more diverse AI development teams of both lenders and data scientists, where the former can inform engineers about the specifics of their job. What's more, such financial organizations should train everyone involved in making decisions with AI to adhere to and enforce fair, non-discriminatory practices in their work. Otherwise, without taking measures to ensure diversity and inclusivity, lending businesses risk generating AI algorithms that severely violate anti-discrimination and fair-lending laws.

    Another step toward fairer AI is to make sure that no lending decisions are made solely by the algorithm; a human supervisor should assess these decisions before they have a real-life impact. Article 22 of the GDPR supports this, stating that people should not be subject to purely automated decision-making, especially when it can have a legal effect.
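
    A minimal sketch of what such a human-in-the-loop gate might look like is shown below; the score threshold, the review band, and the field names are all assumptions made for illustration, not requirements of the GDPR or of any particular lender.

    ```python
    from dataclasses import dataclass

    @dataclass
    class LoanDecision:
        applicant_id: str
        score: float               # model's estimated repayment probability
        approved: bool
        needs_human_review: bool

    def gate_decision(applicant_id: str, score: float,
                      approve_above: float = 0.8,
                      review_band: float = 0.1) -> LoanDecision:
        """Auto-approve only clear-cut cases; route denials and borderline
        approvals to a human underwriter before they take effect."""
        approved = score >= approve_above
        borderline = abs(score - approve_above) < review_band
        return LoanDecision(applicant_id, score, approved,
                            needs_human_review=(not approved) or borderline)
    ```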

    The truth is, this is easier said than done. However, if not addressed, the problem of unintentional AI bias might put lending businesses in a tough spot no less than any intentional act of bias would, and only through the collective effort of data scientists and lending professionals can we avert these risks.

    Author: Yaroslav Kuflinski

    Source: Information-management
