




SUPPORT POOL OF EXPERTS PROGRAMME
AI - Complex Algorithms and effective Data Protection Supervision
Bias evaluation
by Dr. Kris SHRISHAK
As part of the SPE programme, the EDPB may commission contractors to provide reports and tools on specific topics.

The views expressed in the deliverables are those of their authors and they do not necessarily reflect the official position of the EDPB. The EDPB does not guarantee the accuracy of the information included in the deliverables. Neither the EDPB nor any person acting on the EDPB's behalf may be held responsible for any use that may be made of the information contained in the deliverables.

Some excerpts may be redacted or removed from the deliverables as their publication would undermine the protection of legitimate interests, including, inter alia, the privacy and integrity of an individual regarding the protection of personal data in accordance with Regulation (EU) 2018/1725 and/or the commercial interests of a natural or legal person.
TABLE OF CONTENTS
1 State of the art for bias evaluation
1.1 Sources of bias
1.1.1 Bias from data
1.1.2 Algorithm bias
1.1.3 Evaluation bias
1.1.4 Sources of bias in facial recognition technology
1.1.5 Sources of bias in generative AI
1.2 Methods to address bias
1.2.1 Pre-processing
1.2.2 In-processing
1.2.3 Post-processing
1.2.4 Methods for generative AI
2 Tools for bias evaluation
2.1 IBM AIF360
2.2 Fairlearn
2.3 Holistic AI
2.4 Aequitas
2.5 What-If Tool
2.6 Other tools considered
Conclusion
Bibliography
Document submitted in March 2024
1 STATE OF THE ART FOR BIAS EVALUATION
Artificial intelligence (AI) systems are socio-technical systems whose behaviour and outputs can harm people. Bias in AI systems can harm people in various ways, and it can result from interconnected factors that together amplify harms such as discrimination (European Union Agency for Fundamental Rights, 2022; Weerts et al., 2023). Mitigating bias in AI systems is important, and identifying the sources of bias is the first step in any bias mitigation strategy.
1.1 Sources of bias
The AI pipeline involves many choices and practices that contribute to biased AI systems. Biased data is just one of the sources of biased AI systems, and understanding its various forms can help to detect and to mitigate the bias. In one application, the lack of representative data might be the source of bias, e.g., medical AI where data from women with heart attacks is less represented in the dataset than data from men. In another, proxy variables that embed gender bias might be the problem, e.g., in résumé screening. Increasing the dataset size for women could help in the former case, but not in the latter.

In addition to bias from data, AI systems can also be biased due to the algorithm and the evaluation. These three sources of bias are discussed next.
1.1.1 Bias from data
1. Historical bias: When AI systems are trained on historical data, they often reflect societal biases that are embedded in the dataset. Out-of-date datasets with sensitive attributes and related proxy variables contribute to historical bias. This can be attributed to a combination of factors: how and what data were collected, and the labelling of the data, which involves subjectivity and the bias of the labeller. An example of historical bias in AI systems has been shown with word embeddings (Garg et al., 2018), which are numerical representations of words and are used in developing text generation AI systems.
2. Representation bias: Representation bias is introduced when defining and sampling from the target population during the data collection process. Representation bias can take the form of availability bias and sampling bias.
a. Availability bias: Datasets used in developing AI systems should represent the chosen target population. However, datasets are sometimes chosen by virtue of their availability rather than their suitability to the task at hand. Available datasets often underrepresent women and people with disabilities. Furthermore, available datasets are often used out of context, for purposes different from their intended purpose (Paullada et al., 2021). This contributes to biased AI systems.
b. Sampling bias: It is usually not possible to collect data about the entire target population. Instead, a subset of data points related to the target population is collected, selected and used. This subset or sample should be representative of the target population for it to be relevant and of high quality. For instance, data collected by scraping Reddit or other social media sites are not randomised and are not representative of people who do not use these sites. Such data are not generalisable to the wider population beyond these sites. And yet, the data are used in AI models deployed in other contexts.
When defining the target population, the subgroups with sensitive characteristics should be considered. An AI system built using a dataset collected from a city will only have a small percentage of certain minority groups, say 5%. If the dataset is used as-is, then the outputs of this AI system will be biased against this minority group because they only make up 5% of the dataset and the AI system has relatively less data to learn from about them.
3. Measurement bias: Datasets can be the result of measurement bias. Often, the data that is collected is a proxy for the desired data. This proxy data is an oversimplification of the reality. Sometimes the proxy variable itself is wrong. Furthermore, the method of measurement, and consequently the collection of the data, may vary across groups. This variation could be due to easier access to the data from certain groups over others.
4. Aggregation bias: False conclusions may be drawn about individuals or small groups when conclusions are based on data aggregated over the entire population. The most common form of this bias is Simpson's paradox (Blyth, 1972), where patterns observed in the data for small groups disappear when only the aggregate data over the entire population is considered. The most well-known example comes from the UC Berkeley admissions in 1973 (Bickel et al., 1975). Based on the aggregate data, it seemed that women applicants were rejected significantly more often than men. However, analysis of the data at the department level revealed that the rejection rates were higher for men in most departments. The aggregate failed to reveal this because a higher proportion of women applied to departments with a low overall acceptance rate than to departments with a high acceptance rate. A small numerical illustration follows.
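To make the aggregation effect concrete, the following minimal sketch uses made-up admission figures (not the actual 1973 Berkeley data; the department names and all numbers are illustrative). In each department women are admitted at a higher rate than men, yet the aggregate suggests the opposite, because most women apply to the more selective department.

```python
# Illustrative admission figures: {department: {group: (applied, admitted)}}
admissions = {
    "A": {"men": (800, 600), "women": (100, 80)},   # men 75%, women 80%
    "B": {"men": (100, 20),  "women": (800, 200)},  # men 20%, women 25%
}

# Per-department rates: women do better in BOTH departments.
for dept, groups in admissions.items():
    for group, (applied, admitted) in groups.items():
        print(f"Dept {dept}, {group}: {admitted / applied:.0%} admitted")

# Aggregate rates: the pattern reverses (men ~69%, women ~31%),
# because 800 of the 900 women applied to the selective department B.
for group in ("men", "women"):
    applied = sum(admissions[d][group][0] for d in admissions)
    admitted = sum(admissions[d][group][1] for d in admissions)
    print(f"Aggregate, {group}: {admitted / applied:.0%} admitted")
```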
1.1.2 Algorithm bias
Although much of the discussion around bias focusses on the bias from data, other sources of bias that contribute to discriminatory decisions should not be overlooked. In fact, AI models produce biased outputs not only due to the datasets but also due to the model itself (Hooker, 2021). Even when the datasets are not biased and are properly sampled, algorithmic choices can contribute to biased decisions. This includes the choice of objective functions, regularisations, how long the model is trained, and even the choice of statistically biased estimators (Danks & London, 2017).
The various trade-offs made during the design and development process could result in discriminatory outputs. Such trade-offs can include model size and the choice of privacy protection mechanisms (Ferry et al., 2023; Fioretto et al., 2022; Kulynych et al., 2022). Even with the Diversity in Faces (DiF) dataset, which has broad coverage of facial images, an AI model trained with certain differential privacy techniques disproportionately degrades performance for darker-skinned faces (Bagdasaryan et al., 2019). Furthermore, techniques to compress AI models can disproportionately affect the performance of AI models for people with underrepresented sensitive attributes (Hooker et al., 2020).
1.1.3 Evaluation bias
The performance of AI systems is evaluated based on many metrics, from accuracy to "fairness". Such assessments are usually performed against a benchmark, or a test dataset. Evaluation bias arises at this stage because the benchmark itself could contribute to bias.
AI systems can perform extremely well against a specific test dataset, and this test performance may fail to translate into real-world performance due to "overfitting" to the test dataset. This is especially a problem if the test dataset carries over historical, representation or measurement bias. For instance, if the test dataset was collected in the USA, it is unlikely to be representative of the population in Germany; or the dataset may have been collected in 2020 during COVID-19 but used in a medical setting in a non-COVID-19 year. This means that even if the bias in the training dataset is mitigated, bias might creep in at the evaluation stage.
1.1.4 Sources of bias in facial recognition technology
Historical, representation and evaluation bias are the main causes of bias in facial recognition technology (FRT) and, more broadly, image recognition. This is because the training and benchmark datasets are constructed from publicly available image datasets, often through web scraping, that are not representative of different groups and different geographies (Raji & Buolamwini, 2019).
Databases such as Open Images and ImageNet mostly contain images from the USA and the UK (Shankar et al., 2017). IJB-A and Adience have been shown to mostly contain images of people with light skin, underrepresenting people with dark skin (Buolamwini & Gebru, 2018). Furthermore, racial slurs and derogatory phrases get embedded during the labelling process of images (Birhane & Prabhu, 2021; Crawford & Paglen, 2021). And despite datasets being flagged for removal, some of these datasets are still being used (Peng, 2020). If these are used for training and/or testing FRT, then, by design, the resulting systems will be biased.
Even datasets that attempt to address the problem can fail in the process. IBM's "Diversity in Faces" dataset was introduced to address the lack of diversity in image datasets (Merler et al., 2019). However, it raised more concerns (Crawford & Paglen, 2021). First, the images were scraped from the website Flickr without the consent of the site's users (Salon, 2019). Second, it uses skull shapes as an additional measure, which has historically been used to claim racial superiority of white people and, hence, embeds historical bias (Gould, 1996). Finally, the dataset was annotated by three Amazon Mechanical Turk workers who guessed the age and gender of the people in the scraped images.
1.1.5 Sources of bias in generative AI
Generative AI allows for the generation of content including text, images, audio and video. The sources of bias discussed in the previous sections (bias from data, algorithm bias and evaluation bias) get carried over to AI that generates content. In addition, generative AI systems are developed with large amounts of uncurated data scraped from the web. This adds an additional layer of risk, as the developers lack adequate knowledge about the data and its statistical properties, making it harder to assess the sources of bias.
Furthermore, many of the generative AI models are developed without an intended purpose. A pre-trained model is built, and then applications are developed on top of this pre-trained model by other organisations. Thus, the source of bias can be in the pre-trained model and in the context of the downstream application. When bias is embedded in the pre-trained model, the bias will propagate downstream to all the applications.
Generative AI datasets can reflect historical bias, representation bias and evaluation bias (Bender et al., 2021). Bias can also arise due to data labelling, especially when fine-tuning a pre-trained model for a specific application. Labels or annotations are often added to the data by underpaid crowd workers, such as Amazon Mechanical Turk workers. They may choose the wrong labels because they are distracted or, worse, because they embed their own bias by not being from the representative population where the AI system will be deployed. This is especially the case when more than one label could potentially apply to the data (Plank et al., 2014).
Although the datasets used for pre-trained models are currently neither curated nor labelled by humans (which organisations claim would be costly), the process of reinforcement learning from human feedback used by companies developing generative AI introduces the same biases, albeit at a later stage in the development process.
Even when the text datasets are well-labelled, they can contain societal bias that arises due to spurious correlations, which are statistical correlations between features and outcomes. In the case of text generative AI, such spurious correlations can be observed with word embeddings, which underlie text generative AI (Garg et al., 2018): e.g., 'man' being associated with 'programming' and 'woman' being associated with 'homemaker'. Furthermore, as these are mathematical objects, the contextual information about the words gets lost, and they have been observed to output "doctor" - "father" + "mother" as "nurse". Pre-trained language models such as GPT that rely on uncurated datasets are also susceptible to this issue (Tan & Celis, 2019), and merely increasing the size of the model does not address the problem (Sagawa et al., 2020). A toy illustration of such embedding arithmetic follows.
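The sketch below uses toy 4-dimensional vectors made up for illustration (real embeddings such as word2vec or GloVe have hundreds of dimensions and are learnt from corpora). It shows the mechanism only: analogy queries like "doctor" - "father" + "mother" are answered by nearest-neighbour search over cosine similarity, which is how gender directions absorbed from the training text surface as biased answers.

```python
import numpy as np

# Toy "embeddings"; the third coordinate plays the role of a gender
# direction that the model absorbed from the training data.
vocab = {
    "doctor":    np.array([0.9, 0.8,  0.1, 0.0]),
    "nurse":     np.array([0.9, 0.8, -0.1, 0.0]),
    "father":    np.array([0.1, 0.0,  0.9, 0.0]),
    "mother":    np.array([0.1, 0.0, -0.9, 0.0]),
    "homemaker": np.array([0.0, 0.2, -0.3, 0.9]),
}

def nearest(query, exclude):
    # cosine similarity against every word not in the query itself
    sims = {
        word: float(vec @ query / (np.linalg.norm(vec) * np.linalg.norm(query)))
        for word, vec in vocab.items() if word not in exclude
    }
    return max(sims, key=sims.get)

query = vocab["doctor"] - vocab["father"] + vocab["mother"]
print(nearest(query, exclude={"doctor", "father", "mother"}))  # -> "nurse"
```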
1.2 Methods to address bias
No automated mechanism can fully detect and mitigate bias (Wachter et al., 2020). There are inherent limitations with technical approaches to address bias (Buyl & De Bie, 2024). These approaches are necessary, but not sufficient, for AI systems, which are socio-technical systems (Schwartz et al., 2022). The most appropriate approaches depend on the specific context for which the AI system is developed and used. Moreover, contextual and socio-cultural knowledge should complement these technical approaches.
Based on when the intervention is made in the AI lifecycle to mitigate bias, the technical methods and techniques to address bias can be classified into three types (d'Alessandro et al., 2017):
1. Pre-processing: These techniques modify the training data before it is used to train an AI model, to obscure the associations between sensitive variables and the output. Pre-processing can help identify historical, measurement and representational bias in data.
2. In-processing: These techniques change the way the AI training process is performed to mitigate bias, through changes in the objective function or with an additional optimisation constraint.
3. Post-processing: These techniques treat the AI model as opaque and attempt to mitigate bias after the completion of the training process. The assumption behind these techniques is that it is not possible to modify the training data or the training/learning process to address the bias. Thus, these techniques should be treated as a last-resort intervention.
Merely removing sensitive variables from the dataset is not an effective approach to mitigate bias, due to the existence of proxy variables (Dwork et al., 2012; Kamiran & Calders, 2012).
Pre-processing approaches are agnostic to the AI type as they focus on the dataset. This is an important advantage. Furthermore, many of the approaches have been developed and tested over the past decade and are more mature than in-processing techniques. Pre-processing approaches are early-stage interventions and can assist with changing the design and development process. However, if these techniques are the only intervention used, they might give the illusion that all the bias has been resolved, which is not the case (Obermeyer et al., 2019). They are only a starting point.

For regulators, pre-processing techniques are useful only if they have access to the datasets that were used to train the model. Furthermore, the regulator needs to consider whether other in-processing and post-processing techniques were used by the developer and deployers of the AI system.
1.2.1 Pre-processing
1. Data provenance (Cheney et al., 2009; Gebru et al., 2018): Data provenance is an essential step before other methods to mitigate bias from data can be used. It attempts to answer where, how and why the dataset came to be, who created it, what it contains, how it will be used, and by whom. In the area of machine learning, the term 'datasheet' is more commonly used. Data provenance can, in the context of data protection, include the listing of personal data and non-personal data.
2. Causal analysis (Glymour & Herington, 2019; Salimi et al., 2019): Datasets used to train AI models often include relationships and dependencies between sensitive and non-sensitive variables. Thus, any attempt to mitigate bias in the dataset requires understanding the relationships between these variables. Otherwise, non-sensitive variables could act as proxies for the sensitive variables. Causal analysis helps with identifying these proxies, often by visualising the links between the variables in the dataset as a graph.

Causal analysis can be extended to "repair" the dataset by removing the dependencies based on pre-defined "fairness" criteria.[1] However, this approach relies on prior contextual knowledge about the AI model and its deployment, in addition to being computationally intensive for large datasets.

[1] The technical literature uses the term "fairness" and there are numerous definitions and metrics of "fairness" (Hutchinson & Mitchell, 2019). Many of these have been developed in the context of the USA, some based on the "four-fifths rule" from US federal employment regulation, which are not valid in other contexts and countries (Watkins et al., 2022). Furthermore, these metrics are incompatible with each other (Kleinberg et al., 2016).
3. Transformation (Calmon et al., 2017; Feldman et al., 2015; Zemel et al., 2013): These approaches involve transforming the data into a less biased representation. The transformations could involve editing the labels such that they become independent of specific protected groupings, or based on specific "fairness" objectives.

Transformations are not without limitations. First, transformations usually affect the performance of the AI model, and there is an inherent trade-off between bias mitigation and performance when using this approach. Second, transformations are limited to numerical data and cannot be used for other kinds of datasets. Third, this approach is susceptible to bias persisting due to the existence of proxy variables. For this reason, the use of this approach should be preceded by causal analysis to understand the links between the special category data and the proxy variables in the starting dataset. Even then, there is no guarantee that the transformations have eliminated the relationship between the special category data and proxy variables. Finally, transformations could make the AI model less interpretable (Lepri et al., 2018).
4. Massaging or relabeling (Kamiran & Calders, 2012): Relabeling is a specific type of transformation that strategically modifies the labels in the training data such that the distribution of positive instances is equal for all classes. For example, if a dataset contains data about men and women, the proportion of the dataset that is labelled '+' for women should be the same as that for men. If the proportion is lower for women, then some of the data points for women that were close to being classified as '+' but were initially labelled '-' will be changed, and the reverse will be done for data points for men. This approach is not restricted to the training dataset and can also be used for validation and test datasets. A sketch of this procedure follows.
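The following is a minimal sketch of the relabeling step, assuming a pandas DataFrame with illustrative columns "sex", "label" (1 for '+') and "score" (a ranking produced by any classifier, e.g. logistic regression probabilities). The flip count m below equalises the two groups' positive rates while keeping the total number of '+' labels unchanged, a simplification of Kamiran & Calders' procedure.

```python
import pandas as pd

def massage(df, group_col="sex", label_col="label", score_col="score",
            disadvantaged="female", advantaged="male"):
    df = df.copy()
    dis = df[group_col] == disadvantaged
    adv = df[group_col] == advantaged
    n_d, n_a = int(dis.sum()), int(adv.sum())
    p_d = int(df.loc[dis, label_col].sum())   # '+' count, disadvantaged
    p_a = int(df.loc[adv, label_col].sum())   # '+' count, advantaged

    # m flips equalise the positive rates: (p_d + m)/n_d == (p_a - m)/n_a
    m = round((n_d * p_a - n_a * p_d) / (n_d + n_a))

    # promote the m highest-scored '-' points of the disadvantaged group ...
    promote = df[dis & (df[label_col] == 0)].nlargest(m, score_col).index
    # ... and demote the m lowest-scored '+' points of the advantaged group
    demote = df[adv & (df[label_col] == 1)].nsmallest(m, score_col).index

    df.loc[promote, label_col] = 1
    df.loc[demote, label_col] = 0
    return df
```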
5. Reweighing (Calders et al., 2009; Jiang & Nachum, 2020; Krasanakis et al., 2018): Instead of changing the labels in the dataset, this approach attaches a specific 'weight' to each data point to adjust for the bias in the training dataset. The weights can be chosen based on three factors: (1) the special categories of personal data, along with the probability of this sensitive attribute in the population, (2) the probability of a specific outcome [+/-], and (3) the observed probability of this outcome for a sensitive attribute.

For instance, women constitute 50% of all humans, and if the label '+' is assigned to 60% of all data in the dataset, then 30% of the dataset should contain women with a '+' label. However, if it is observed that only 20% of the dataset has women with a '+' label, then a weight of 1.5 is attached to women with a '+' label, 0.75 is attached to men with a '+' label, and so on, to adjust for the bias. A sketch of this computation follows this item.

Alternatively, a more dynamic approach can be taken by training an unweighted classifier to learn the weights and then retraining the classifier using those weights.[2] Reweighing is more suitable for small models where retraining is not too expensive in terms of cost and resources.

[2] This process of training an unweighted model first makes this approach of reweighing a mix of in-processing and pre-processing.
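A minimal sketch of the weight computation described above, assuming a pandas DataFrame with illustrative columns "sex" and "label": the weight for each (group, label) combination is the expected probability (if group and label were statistically independent) divided by the observed probability.

```python
import pandas as pd

def reweighing_weights(df, group_col="sex", label_col="label"):
    weights = {}
    for g in df[group_col].unique():
        for y in df[label_col].unique():
            # expected probability if group and label were independent
            expected = (df[group_col] == g).mean() * (df[label_col] == y).mean()
            # observed probability of this (group, label) combination
            observed = ((df[group_col] == g) & (df[label_col] == y)).mean()
            weights[(g, y)] = expected / observed  # assumes no empty cell
    return weights

# The worked example from the text: women are 50% of the dataset, 60% of all
# labels are '+', but only 20% of the dataset are women with a '+' label.
df = pd.DataFrame({
    "sex":   ["f"] * 50 + ["m"] * 50,
    "label": [1] * 20 + [0] * 30 + [1] * 40 + [0] * 10,
})
print(reweighing_weights(df))
# {('f', 1): 1.5, ('f', 0): 0.67, ('m', 1): 0.75, ('m', 0): 2.0}
```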
6. Resampling (Kamiran & Calders, 2012): In contrast to the previous methods, the resampling method does not involve adding weights to the sample, nor does it involve changing labels in the training dataset. Instead, this approach focusses on how samples from the dataset are chosen for training, such that a balanced set of samples is used. Data from the minority class can be duplicated, or "oversampled", while data from the majority class can be skipped, or "under-sampled". The choice usually depends on the size of the entire dataset and the overall impact on the performance of the AI model. For instance, under-sampling requires datasets with sufficiently large amounts of data from the different classes. A minimal oversampling sketch follows.
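The sketch below shows the oversampling variant, assuming a pandas DataFrame with an illustrative "label" column: every minority class is duplicated (sampled with replacement) until it matches the size of the majority class. Under-sampling would instead draw a smaller sample from the majority class without replacement.

```python
import pandas as pd

def oversample(df, label_col="label", seed=0):
    counts = df[label_col].value_counts()
    target = counts.max()  # size of the majority class
    parts = [
        # duplicate minority-class rows with replacement up to the target size
        df[df[label_col] == cls].sample(target, replace=True, random_state=seed)
        if n < target else df[df[label_col] == cls]
        for cls, n in counts.items()
    ]
    # concatenate and shuffle so classes are not grouped together
    return pd.concat(parts).sample(frac=1, random_state=seed).reset_index(drop=True)
```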
7. Generating artificial training data (Sattigeri et al., 2019): When the quantity of available data is limited, especially for unstructured data such as images, a generative process can be used to develop the dataset. The use of generative adversarial networks (GANs) that include specific bias considerations can contribute to generating and using less biased datasets for training. This approach assumes that an appropriate fairness criterion is available, which is a strong assumption, and it requires significant computing power.
1.2.2 In-processing
1. Regularisation (Kamishima et al., 2012): Regularisation is used in machine learning to penalise undesired characteristics. It was primarily used to reduce over-fitting but has been extended to address bias by penalising classifiers with discriminatory behaviour. It is a data-driven approach that relies on balancing fairness (as defined by a chosen fairness metric) against a performance metric such as accuracy or the ratio between the true positive rate and false positive rate for minority groups (Bechavod & Ligett, 2017).

While this approach is generic and flexible, it relies on the developer choosing the most suitable metric, which leaves room for gaming the evaluation. In addition, there are concerns that not all fairness measures are equally affected by regularisation parameters (Stefano et al., 2020). Furthermore, this approach could result in reduced accuracy and robustness. A sketch of a fairness-regularised loss follows.
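A minimal sketch of a fairness-regularised objective, assuming numpy arrays X (features), y (labels) and g (a binary sensitive attribute), all illustrative: a standard logistic loss is penalised by the squared gap between the two groups' mean predicted scores. This is a demographic-parity style penalty for illustration; Kamishima et al. (2012) use a mutual-information regulariser, but the balancing role of the lam parameter is the same.

```python
import numpy as np

def fairness_regularised_loss(w, X, y, g, lam=1.0):
    p = 1.0 / (1.0 + np.exp(-X @ w))              # predicted probabilities
    # standard binary cross-entropy (the performance term)
    bce = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))
    # fairness term: gap between the groups' mean predicted scores
    gap = p[g == 1].mean() - p[g == 0].mean()
    return bce + lam * gap ** 2                   # lam trades accuracy for parity
```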
2. Constrained optimisation (Agarwal et al., 2018; Zafar et al., 2017): Constrained optimisation, as the name suggests, constrains the optimisation function by incorporating a fairness metric during model training, either by adapting an existing learning paradigm or through wrapper methods. In essence, this approach changes the algorithm of the AI model. In addition to fairness metrics, other constraints that capture disparities in population frequencies can be included, resulting in trade-offs between the metrics.

The chosen fairness metric can result in vastly different models. Hence, this approach is heavily reliant on the choice of the fairness metric, which makes it difficult to balance the constraints and can make training unstable. A sketch using a wrapper method follows.
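A minimal sketch using Fairlearn's reductions API (Agarwal et al., 2018), a wrapper method of the kind described above: a scikit-learn classifier is retrained under a demographic parity constraint. The synthetic data is illustrative.

```python
import numpy as np
from fairlearn.reductions import ExponentiatedGradient, DemographicParity
from sklearn.linear_model import LogisticRegression

# Illustrative synthetic data with a sensitive feature correlated with y.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
sex = rng.integers(0, 2, size=500)
y = (X[:, 0] + 0.5 * sex + rng.normal(scale=0.5, size=500) > 0).astype(int)

mitigator = ExponentiatedGradient(
    estimator=LogisticRegression(),
    constraints=DemographicParity(),   # the chosen fairness constraint
)
mitigator.fit(X, y, sensitive_features=sex)
y_pred = mitigator.predict(X)
```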
3. Adversarial approach (Celis & Keswani, 2019; Zhang et al., 2018): While adversarial learning is primarily an approach to determine the robustness of machine learning models, it can also be used as a method to determine fairness. An adversary can attack the model to determine the protected attribute from the outputs. The adversary's feedback can then be used to penalise and update the model to prevent discriminatory outputs. The most common way of incorporating this feedback is as an additional constraint in the optimisation process, that is, through constrained optimisation. A minimal sketch follows.
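The following is a minimal adversarial sketch in PyTorch, in the spirit of Zhang et al. (2018), with illustrative synthetic data and deliberately tiny linear models: the adversary tries to recover the protected attribute g from the predictor's output, and the predictor is additionally penalised whenever the adversary succeeds.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(500, 3)
g = (torch.rand(500) < 0.5).float()           # protected attribute
y = ((X[:, 0] + 0.5 * g) > 0).float()         # labels correlated with g

predictor = nn.Linear(3, 1)
adversary = nn.Linear(1, 1)                   # sees only the prediction
opt_p = torch.optim.Adam(predictor.parameters(), lr=0.01)
opt_a = torch.optim.Adam(adversary.parameters(), lr=0.01)
bce = nn.BCEWithLogitsLoss()

for _ in range(200):
    y_hat = predictor(X).squeeze(1)

    # 1) the adversary learns to recover g from the predictions
    loss_a = bce(adversary(y_hat.detach().unsqueeze(1)).squeeze(1), g)
    opt_a.zero_grad(); loss_a.backward(); opt_a.step()

    # 2) the predictor fits the labels AND is penalised when the adversary succeeds
    loss_p = bce(y_hat, y) - bce(adversary(y_hat.unsqueeze(1)).squeeze(1), g)
    opt_p.zero_grad(); loss_p.backward(); opt_p.step()
```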
1.2.3 Post-processing
1. Calibration (Pleiss et al., 2017): Calibration is the process of ensuring that predicted scores mean the same thing for every subgroup (protected or otherwise): among the data points that receive a given score, the proportion of positive outcomes should be the same across subgroups. This approach does not directly address the biases but tackles them indirectly, by ensuring that the scores are equally reliable across social groups.

However, calibration is limited in flexibility and in accommodating multiple fairness criteria. In fact, the latter has been shown to be impossible (Kleinberg et al., 2016). Although many approaches, such as randomisation during post-processing, have been suggested, this is an ongoing area of research without a clear consensus on the best approach. A sketch of per-group calibration follows.
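A minimal sketch of per-group calibration with scikit-learn, on illustrative synthetic data: a separate sigmoid (Platt) calibrator is fitted for each subgroup, so that a given score corresponds to the same observed positive rate in both groups. This is one simple recipe, not the method of Pleiss et al. (2017), who analyse calibration jointly with other fairness criteria.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 3))
g = rng.integers(0, 2, size=600)                       # subgroup indicator
y = (X[:, 0] + 0.4 * g + rng.normal(scale=0.5, size=600) > 0).astype(int)

# Fit one calibrated classifier per subgroup.
calibrated = {
    group: CalibratedClassifierCV(LogisticRegression(), method="sigmoid", cv=3)
           .fit(X[g == group], y[g == group])
    for group in (0, 1)
}

# Each point is scored by its own group's calibrator.
scores = np.where(g == 0,
                  calibrated[0].predict_proba(X)[:, 1],
                  calibrated[1].predict_proba(X)[:, 1])
```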
2. Thre