克服难题推进人工智能实践+Overcoming+the+Hard+Problems+to+Advance+AI+Practice_第1页
克服难题推进人工智能实践+Overcoming+the+Hard+Problems+to+Advance+AI+Practice_第2页
克服难题推进人工智能实践+Overcoming+the+Hard+Problems+to+Advance+AI+Practice_第3页
克服难题推进人工智能实践+Overcoming+the+Hard+Problems+to+Advance+AI+Practice_第4页
克服难题推进人工智能实践+Overcoming+the+Hard+Problems+to+Advance+AI+Practice_第5页
已阅读5页,还剩48页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

AIPRACTICE

OvercomingtheHardProblemstoAdvance

AIPractice

TakingdataanalyticstoamoreadvancedlevelwithAItools

meansconfrontingtherisksandpitfallsofmachinelearning

algorithms.

Sponsoredby:

Reallearning

Realimpact

SUMMER2024

SPECIALREPORT

[SpecialReport]

OvercomingtheHardProblemstoAdvanceAIPractice

A

sexcitementaroundlargelanguage

models(LLMs)spursspendingonAI,thesalientquestionforbusinessleaders

remains,Whatisthereturnonourdatascienceinvestments?Inthenearterm,advancedanalyticsandmachinelearn-ingaretheworkhorsetechnologiesfor

creatingsignificantvaluefromdataassets.Notthatdoingsoiseasy;companiesfacenumerouschal-

lengesalongtheway.

MuchAIriskbecomesapparentwhensystems

areinproduction,sotrulyresponsibleAIisn’tjustaconcernatthefrontendofthedevelopmentpro-

cess.CathyO’Neil,whoposedhardquestionsabouttheunintendedconsequencesofalgorithmicdeci-

sion-makinginher2016book,WeaponsofMath

Destruction,haspioneeredthepracticeofalgo-

rithmicauditing.O’NeilandcoauthorsJakeAppel

andSamTyner-Monroewalkreadersthroughtheirapproachanddiscusshowitcanbeappliedtogener-ativeAItoolsaswell.

Thetrade-offbetweenusingdataforinsights

andprotectingcustomers’personaldatagrowsonlymoredifficultasbadactorsimprovetheirtechniquesforre-identifyinganonymizeddatasets.Gregory

Vial,JulienCrowe,andPatrickMesanaexplainwhydealingwiththischallengewillrequiredatascientiststogainamoresophisticatedunderstandingofdata

protectionandcompelcybersecuritystaffstolearnawiderrangeofprotectiontechniques.TheydrawlessonsfromemergingpracticesatNationalBank

ofCanada,wheredatascientists,dataowners,andcybersecurityteamsarecollaboratingtoapplydataprotectionpracticesthatdon’trenderdataunusableforanalytics.

Whenmachinelearningprojectsdogetthe

go-ahead,however,toomanyinitiativesfailupon

adoptionbecausedatascientistsdidn’tthoroughly

understandtheoriginalbusinessproblem.Tofindoutwheresucheffortsaregoingwrong,DusanPopovic,ShreyasLakhtakia,WillLandecker,andMelissa

Valentinestudieddatascienceprojectsthatwere

shelved.Theyfoundthatconvincingdatascientiststodroptheirassumptionsandstartaskingmorefun-damentalquestionsoftheirbusinesscounterpartsiskeytoavoidingmachinelearningprojectfailures.

Finally,justascorporationsareexperimenting

withLLMstofigureoutwheretheycanaddvalue

atrelativelylowrisk,advancedanalyticsteamscan

belookingathowtheymightincorporategenera-

tiveAIintopractice.PedroAmorimandJoãoAlves

seepromiseforLLMstotakeonsomedatasciencedrudgery,andfortheirnaturallanguageinterfacestomakeiteasierforbusinessmanagerstocollaborateinthedevelopmentprocessandunderstandresults.

—TheMITSMREditors

1

Auditing

AlgorithmicRisk

9

AvoidMLFailures

byAskingtheRight

Questions

13

HowGenerativeAI

CanSupportAdvanced

AnalyticsPractice

18

ManagingDataPrivacy

RiskinAdvanced

Analytics

23

Sponsor’sViewpoint

FromNumbersto

Narratives:UsingLanguagetoEnhanceGenerativeAI

PaulGarlandsummer202429

AIPRACTICE

[ResponsibleAI]

AuditingAlgorithmicRisk

Howdoweknowwhetheralgorithmicsystemsareworkingasintended?AsetofsimpleframeworkscanhelpevennontechnicalorganizationscheckthefunctioningoftheirAItools.

ByCathyO’Neil,JakeAppel,andSamTyner-Monroe

A

RTIFICIALINTELLIGENCE,LARGELANGUAGEMODELS

(LLMs),andotheralgorithmsareincreasinglytakingoverbureaucratic

processestraditionallyperformedbyhumans,whetherit’sdecidingwho

isworthyofcredit,ajob,oradmissiontocollege,orcompilingayear-end

revieworhospitaladmissionnotes.

Buthowdoweknowthatthesesystemsareworkingasintended?And

whomighttheybeunintentionallyharming?

Giventhehighlysophisticatedandstochasticnatureofthesenewtechnologies,wemightthrowupourhandsatsuchquestions.Afterall,noteventheengineerswhobuildthesesystemsclaimtounderstandthementirelyortoknowhowtopredictorcontrolthem.Butgiventheirubiquityandthehighstakesinmanyusecases,itisimportantthat

PAULGARLANDSPECIALREPORT•“OVERCOMINGTHEHARDPROBLEMSTOADVANCEAIPRACTICE”•MITSLOANMANAGEMENTREVIEW1

wefindwaystoanswerquestionsabouttheunin-tendedharmstheymaycause.Inthisarticle,weofferasetoftoolsforauditingandimprovingthesafetyofanyalgorithmorAItool,regardlessofwhetherthosedeployingitunderstanditsinnerworkings.

Algorithmicauditingisbasedonasimpleidea:Identifyfailurescenariosforpeoplewhomightgethurtbyanalgorithmicsystem,andfigureouthowtomonitorforthem.Thisapproachreliesonknowing thecompleteusecase:howthetechnologyisbeingused,byandforwhom,andforwhatpurpose.Inotherwords,eachalgorithmineachusecaserequiresseparateconsiderationofthewaysitcanbeusedfor—oragainst—someoneinthatscenario.

ThisappliestoLLMsaswell,whichrequireanapplication-specificapproachtoharmmeasurementandmitigation.LLMsarecomplex,butit’snottheirtechnicalcomplexitythatmakesauditingthemachallenge;rather,it’sthemyriadusecasestowhichtheyareapplied.Thewayforwardistoaudithowtheyareapplied,oneusecaseatatime,startingwiththoseinwhichthestakesarehighest.

Theauditingframeworkswepresentbelowrequireinputfromdiversestakeholders,including

ASimplifiedEthicalMatrix

Eachcellofthematrixrepresentshowacertainconcernappliestoaparticularstakeholdergroup.Cellsthatindicatewherea

stakeholdercouldbegravelyharmedorthealgorithmviolatesahardconstraintareshadedred.Cellsthatraisesomeethicalworriesforthestakeholderarehighlightedyellow,andcells

thatsatisfythestakeholder’sobjectivesandraisenoworriesarehighlightedgreen.

CONCERNS

Falsepositive(transactiongetsflaggedbutisn’ttrulyfraud)

Falsenegative(transactionistrulyfraudbutdoesnotgetflagged)

STAKEHOLDERS

Company

Nonfraudulentcustomers

Fraudsters

TSERIOUSCONCERNTMODERATECONCERNQMINIMAL/NOCONCERNTBENEFIT

affectedcommunitiesanddomainexperts,throughinclusive,nontechnicaldiscussionstoaddressthecriticalquestionsofwhocouldbeharmedandhow.Ourapproachworksforanyrule-basedsystemthataffectsstakeholders,includinggenerativeAI,bigdatariskscores,orbureaucraticprocessesdescribedinaflowchart.Thiskindofflexibilityisimportant,givenhowquicklynewtechnologiesarebeingdevel-opedandapplied.

Finally,whileournotionofauditsisbroadinthatrespect,itisnarrowinscope:Analgorithmicauditraisesalertsonlytoproblems.Itthenfallstoexpertstoattempttosolvethoseproblemsoncethey’vebeenidentified,althoughitmaynotbepossibletofullyresolvethemall.Addressingtheproblemshigh-lightedbyalgorithmicauditingwillspurinnovationaswellassafeguardsocietyfromunintendedharms.

EthicalMatrix:IdentifyingtheWorst-CaseScenarios

Inagivenusecase,howcouldanalgorithmfail,andforwhom?AtO’NeilRiskConsulting&AlgorithmicAuditing(ORCAA),wedevelopedtheEthicalMatrixframeworktoanswerthisquestion.¹

TheEthicalMatrixidentifiesthestakeholdersofthealgorithminthecontextofitsintendeduseandhowtheyarelikelytobeaffectedbyit.Here,wetakeabroadapproach:Anybodyaffectedbythealgorithm,includingitsbuildersanddeployers,users,andothercommunitiespotentiallyimpactedbyitsadoption,arestakeholders.Whensubgroupshavedistinctcon-cerns,theycanbeconsideredseparately;forexample,iflighter-anddarker-skinnedpeoplehavedifferentconcernsaboutafacialrecognitionalgorithm,theywillhaveseparaterowsintheEthicalMatrix.

Next,weaskrepresentativesofeachstakeholdergroupwhattheirconcernsare,bothpositiveandneg-ative,abouttheintendeduseofthealgorithm.It’sanontechnicalconversation:Wedescribethesys-temassimplyaspossibleandask,“Howcouldthissystemfailforyou,andhowwouldyoubeharmedifthishappened?Ontheotherhand,howcoulditsucceedforyou,andhowwouldyoubenefit?”TheiranswersbecomethecolumnsoftheEthicalMatrix.Toillustrate,imaginethatapaymentscompanyhasafrauddetectionalgorithmreviewingalltransactionsandflaggingthosemostlikelytobefraudulent.Ifatransactionisflagged,itgetsblocked,andthatcus-tomer’saccountgetsfrozen.Falseflagsarethere-foreamajorheadacheforcustomers,andthelostbusinessfromblocksandfreezes(andcomplaintsfromannoyedcustomers)isamoderateworryfor

SPECIALREPORT•“OVERCOMINGTHEHARDPROBLEMSTOADVANCEAIPRACTICE”•MITSLOANMANAGEMENTREVIEW2

AIPRACTICE

ResponsibleAI

thecompany.Conversely,ifafraudulenttransactiongoesundetected,thecompanyisharmedbutnon-fraudulentcustomersareindifferent.Belowisasim-plifiedEthicalMatrixforthisscenario.

EachcelloftheEthicalMatrixrepresentshowaparticularconcernappliestoaparticularstakeholdergroup.

Tojudgetheseverityofagivenrisk,weconsiderthelikelihoodthatitwillberealized,howmanypeo-plewouldbeharmed,andhowbadly.Wherepossible,

weuseexistingdatatodeveloptheseestimates.Wealsoconsiderlegalorproceduralconstraints—forinstance,whetherthereisalawprohibitingdiscrimi-nationonthebasisofcertaincharacteristics.Wethencolor-codethecellstohighlightthebiggest,mostpressingrisks.Cellsthatconstitute“existentialrisks,”whereastakeholdercouldbegravelyharmedorthealgorithmviolatesahardconstraint,areshadedred.Cellsthatraisesomeethicalworriesforthestake-holderarehighlightedyellow,andcellsthatsatisfythestakeholder’sobjectivesandraisenoworriesarehighlightedgreen.

Finally,zoomingoutonthewholeEthicalMatrix,weconsiderhowtobalancethecompetingconcernsofthealgorithm’sstakeholders,usuallyintheformofbalancingthedifferentkindsandconsequencesoferrorsthatfallondifferentstakeholdergroups.

TheEthicalMatrixshouldbealivingdocumentthattracksanongoingconversationamongstake-holders.Ideally,itisfirstdraftedduringthedesignanddevelopmentphaseofanalgorithmicapplica-tionor,atminimum,asthealgorithmisdeployed,anditshouldcontinuetoberevisedthereafter.Itisnotalwaysobviousattheoutsetwhoallofthestakeholdergroupsare,norisitfeasibletofindrep-resentativesforeveryperspective;additionally,newconcernsemergeovertime.Wemighthearfrompeo-pleexperiencingindirecteffectsfromthealgorithm,orasubgroupwithanewworry,andneedtorevisetheEthicalMatrix.

ExplainableFairness:Metricsand

Thresholds

ManyofthestakeholderconcernsidentifiedintheEthicalMatrixrefertosomecontextualnotionoffairness.

AtORCAA,wedevelopedaframeworkcalledExplainableFairnesstomeasurehowgroupsaretreatedbyalgorithmicsystems.²Itisanapproachtounderstandingexactlywhatismeantby“fairness”inagivennarrowcontext.

Forexample,femalecandidatesmightworrythat

Benchmarkingand

redteamingaretwo

approachestoauditing

LLMsindiverseusecases.

anAI-basedresume-screeningtoolgavelowerscores

forwomenthanmen.It’snotassimpleascompar-

ingscoresbetweenmenandwomen.Afterall,ifthe

malecandidatesforagivenjobhavemoreexperience

andqualificationsthanthefemalecandidates,their

higherscoresmightbejustified.Thiswouldbecon-

sideredlegitimatediscrimination.

Therealworryisthat,amongequallyquali-

fiedcandidates,menarereceivinghigherscores

thanwomen.Thedefinitionof“equallyqualified”

dependsonthecontextofthejob.Inacademia,rel-

evantqualificationsmightincludedegreesandpub-

lications;inaloggingoperation,theymightinvolve

physicalstrengthandagility.Theyarefactorsone

wouldlegitimatelytakeintoaccountwhenassess-

ingacandidateforaspecificrole.Twocandidatesfor

ajobareconsideredequallyqualifiediftheylookthe

sameaccordingtotheselegitimatefactors.

ExplainableFairnesscontrolsforlegitimatefac-

torswhenweexaminetheoutcomeinquestion.For

anAIresume-screeningtool,thiscouldmeancom-

paringaveragescoresbygenderwhilecontrollingfor

yearsofexperienceandlevelofeducation.Acriti-

calpartofExplainableFairnessisthediscussionof

legitimacy.

Thisapproachisalreadyusedimplicitlyinother

domains,includingcredit.InaFederalReserveBoard

analysisofmortgagedenialratesacrossraceandeth-

nicity,theresearchersranregressionsthatincluded

controlsfortheloanamount,theapplicant’sFICO

score,theirdebt-to-incomeratio,andtheloan-to-

valueratio.³Inotherwords,totheextentthatdif-

ferencesinmortgagedenialratescanbeexplainedby

thesefactors,it’snotracediscrimination.Inthelan-

guageofExplainableFairness,theseareacceptedas

legitimatefactorsformortgageunderwriting.What

ismissingistheexplicitconversationaboutwhythe

legitimatefactorsare,infact,legitimate.

Whatwouldsuchaconversationlooklike?Inthe

U.S.,mortgagelendersconsiderapplicants’FICO

creditscoresintheirdecision-making.FICOscores

arelower,onaverage,forBlackandHispanicpeople

SPECIALREPORT•“OVERCOMINGTHEHARDPROBLEMSTOADVANCEAIPRACTICE”•MITSLOANMANAGEMENTREVIEW3

thanforWhiteandAsianpeople,soit’snosurprisethatmortgageapplicationsfromBlackandHispanicapplicantsaredeniedmoreoften.⁴LenderswouldlikelyarguethatFICOscoreisalegitimatefactorbecauseitmeasuresanapplicant’screditworthiness,whichisexactlywhatalendershouldcareabout.YetFICOscoresencodeunfairnessinimportantways.Forinstance,mortgagepaymentshavelongcountedtowardFICOscores,whilerentpaymentsstartedbeingcountedonlyin2014,andonlyinsomeversionsofthescores.⁵Thispracticefavorshome-ownersoverrenters,anditisknownthatdecadesofracistredliningpracticescontributedtotoday’sracedisparitiesinhomeownershiprates.ShouldFICOscoresthatreflectthevestigesofthesepracticesbeusedtoexplainawaydifferencesinmortgagedenialratestoday?

Wewillnotsettlethisdebatehere;thepointisthatit’saquestionofethicsandpolicy,notamathproblem.ExplainableFairnesssurfacesdifficultques-tionsliketheseandassignsthemtotherightpartiesforconsideration.

Whenlookingatdisparateoutcomesthatarenotexplainedbylegitimatefactors,wemustdefinethresholdvaluesorlimitsthattriggeraresponseorintervention.

Theselimitscouldbefixedvalues,suchasthefour-fifthsruleusedtomeasureadverseimpactinhiring.⁶Ortheycouldberelative:Imaginearegu-lationrequiringcompanieswithagenderpaygapabovetheindustryaveragetotakeactiontoreducethegap.ExplainableFairnessdoesnotinsistonacer-taintypeoflimitbutpromptsthealgorithmicriskmanagertodefineeachoneforeachpotentialstake-holderharm.

JudgingFairnessinInsurers’Algorithms

Let’sconsiderarealexamplewheretheEthicalMatrixandExplainableFairnesswereusedtoaudittheuseofanalgorithm.In2021,ColoradopassedSenateBill(SB)21-169,whichprotectsColoradoconsumersfromunfairdiscriminationininsurance,particularlyfrominsurers’useofalgorithms,pre-dictivemodels,andbigdata.⁷Aspartofthelaw’s

AnLLMred-teaming

exerciseisdesignedtoelicit

unwantedresponses.

implementation,whichORCAAassistedwith,theColoradoDivisionofInsurance(DOI)releasedaninitialdraftregulationforinformalcommentthatdescribedquantitativetestingrequirementsandlaidouthowinsurerscoulddemonstratethattheiralgo-rithmsandmodelswerenotunfairlydiscriminating.Althoughthelawappliestoalllinesofinsurance,thedivisionchosetostartwithlifeinsurance.

TheEthicalMatrixisstraightforwardherebecausethestakeholdergroupsandconcernsaredefinedexplicitlybythelaw.Itsprohibitionofdis-criminationonthebasisof“race,color,nationalorethnicorigin,religion,sex,sexualorientation,disa-bility,genderidentity,orgenderexpression”meanseachgroupwithineachofthoseclassesgotarowinthematrix.Asforconcerns,algorithmscouldcauseconsumerstobetreatedunfairlyatvariousstagesoftheinsurancelifecycle,includingmarketing,under-writing,pricing,utilizationmanagement,reimburse-mentmethodologies,andclaimsmanagement.TheDOIchosetostartwithunderwriting—thatis,whichapplicantsareofferedcoverage,andatwhatprice—andfocusinitiallyonraceandethnicity.

Insubsequentconversationswithstakeholders,however,theDOIgrappledwithissuesrelatedtotheExplainableFairnessframework:Aresimilarappli-cantsofdifferentracesdeniedatdifferentrates,orchargeddifferentpricesforsimilarcoverage?Whatmakestwolifeinsuranceapplicants“similar,”andwhatfactorscouldlegitimatelyexplaindifferencesindenialsorprices?Thisisthedomainoflifeinsur-anceexperts,notdatascientists.

TheDOIultimatelysuggestedconsideringfac-torsbroadlyconsideredrelevanttoestimatingthepriceofagivenlifeinsurancepolicy:thepolicytype(suchastermversuspermanent);thedollaramountofthedeathbenefit;andtheapplicant’sage,gender,andtobaccouse.

Thedivision’sdraftquantitativetestingregula-tionforSB21-169instructsinsurerstodoregressionanalysesofapproval/denialandpriceacrossraces,anditexplicitlypermitsthemtoincludethosefactors(suchaspolicytypeanddeathbenefitamount)ascontrolvariables.⁸Moreover,theregulationdefineslimitsthattriggeraresponse:Iftheregressionsfindstatisticallysignificantandsubstantialdifferencesindenialratesorprices,theinsurermustdofurthertestingtoinvestigatethedisparityand,pendingtheresults,mayhavetoremediatethedifferences.⁹

Havinglookedathowwewouldauditsimpleralgorithms,letusnowturntohowwewouldeval-uateLLMs.

SPECIALREPORT•“OVERCOMINGTHEHARDPROBLEMSTOADVANCEAIPRACTICE”•MITSLOANMANAGEMENTREVIEW4

AIPRACTICE

ResponsibleAI

EvaluatingLargeLanguageModels

LLMshavetakentheworldbystorm,largelyduetotheirwideappealandapplicability.Butitisexactlythediversityofusesofthesemodelsthatmakesthemhardtoaudit.TwoapproachestoevaluatingLLMs,namelybenchmarkingandredteaming,pres-entawayforward.

TheBenchmarkingApproachtoLLMEvaluation.Benchmarkingmeasurestheperfor-manceofanLLMacrossoneormorepredefined,quantifiabletasksinordertocompareitsperfor-mancewiththatofothermodels.Inthesimplestterms,abenchmarkisadatasetconsistingofinputsandcorrespondingdesiredoutputs.ToevaluateanLLMforaparticularbenchmark,simplyprovidetheinputsettotheLLMandrecorditsoutputs.ThenchooseametricsettoquantitativelycomparetheoutputsfromtheLLMtothedesiredsetofout-putsfromthebenchmarkdataset.Possiblemetricsincludeaccuracy,calibration,robustness,counter-factualfairness,andbias.¹⁰

ConsidertheinputanddesiredoutputshownbelowfromabenchmarkdatasetdesignedtotestLLMcapabilities:¹¹

Input:

Thefollowingisamultiplechoice

questionaboutmicroeconomics.

Oneofthereasonsthatthegovernmentdiscouragesandregulatesmonopoliesisthat

(A)producersurplusislostandconsumersurplusisgained.

(B)monopolypricesensureproductiveefficiencybutcostsocietyallocativeefficiency.

(C)monopolyfirmsdonotengagein

significantresearchanddevelopment.

(D)consumersurplusislostwithhigherpricesandlowerlevelsofoutput.

Answer:

DesiredOutput:

(d)consumersurplusislostwithhigherpricesandlowerlevelsofoutput.

Inthisexample,theaccuracyofthemodelismeasuredbycomputingtheproportionofcorrectlyansweredmultiple-choicequestionsinthebench-markdataset.InbenchmarkingLLMevaluations,metricsaredefinedaccordingtothetypeofresponseelicitedfromthemodel.Forexample,accuracyisverysimpletocalculatewhenallofthequestionsaremultiplechoiceandthemodelsimplyhastochoose

thecorrectresponse,whereasdeterminingtheaccu-

racyofasummarizationtaskinvolvescountingup

matchingn-gramsbetweenthedesiredandmodel

outputs.¹²Therearedozensofbenchmarkdatasets

andcorrespondingmetricsavailableforLLMevalu-

ation,anditisimportanttochoosethemostappro-

priateevaluations,metrics,andthresholdsforagiven

usecase.

Creatingacustombenchmarkisalabor-inten-

siveprocess,butanorganizationmayfindthatitis

worththeeffortinordertoevaluateLLMsinexactly

therightwayforitsusecases.

Benchmarkingdoeshavesomedrawbacks.Ifthe

benchmarkdatahappenedtobeinthemodel’strain-

ingdata,itwouldhave“memorized”theresponsesin

itsparameters.Thefrequencyofthisouroboros-like

outcomewillonlyincreaseasmorebenchmarkdata

setsarepublished.LLMbenchmarkingisalsonot

immunetoGoodhart’slaw,thatis,“whenameasure

becomesatarget,itceasestobeagoodmeasure.”In

otherwords,ifaspecificbenchmarkbecomesthepri-

maryfocusofmodeloptimization,themodelwillbe

over-fittedattheexpenseofitsoverallperformance

andusefulness.

Inaddition,thereisevidencethatasmodels

advance,theybecomeabletodetectwhenthey

arebeingevaluated,whichalsothreatenstomake

benchmarkingobsolete.ConsiderAnthropic’s

Claude3seriesofmodels,releasedinMarch2024,

whichstated,“Isuspectthis...‘fact’mayhavebeen

insertedasajokeortotestifIwaspayingatten-

tion,sinceitdoesnotfitwiththeothertopicsat

all,”inresponsetoaneedle-in-a-haystackevalua-

tionprompt.¹³Asmodelsincreaseincomplexityand

ability,thebenchmarksusedtoevaluatethemmust

alsoevolve.Itisunlikelythatthebenchmarksused

todaytoevaluateLLMswillbethesameonesinuse

justtwoyearsfromnow.

ItisthereforenotenoughtoevaluateLLMswith

benchmarkingalone.

TheRed-TeamingApproachtoLLMEvalu-

ation.Redteamingistheexerciseoftestingasys-

temforrobustnessbyusinganadversarialapproach.

AnLLMred-teamingexerciseisdesignedtoelicit

unwantedresponsesfromthemodel.

LLMs’flexibilityinthegenerationofcontent

presentsawidevarietyofpotentialrisks.LLMred

teamsmaytrytomakethemodelproduceviolentor

dangerouscontent,revealitstrainingdata,infringe

oncopyrightedmaterials,orhackintothemodelpro-

vider’snetworktostealcustomerdata.Redteaming

cantakeahighlytechnicalpath,where,forexample,

SPECIALREPORT•“OVERCOMINGTHEHARDPROBLEMSTOADVANCEAIPRACTICE”•MITSLOANMANAGEMENTREVIEW5

STAKEHOLDERS

nonsensicalcharactersaresystematicallyinjectedintothepromptstoinduceproblematicbehavior;orasocialengineeringpath,wherebyredteamerstryto“trick”themodelusingnaturallanguagetoproduceunwantedoutput.¹⁴

Robustredteamingrequiresamultidisciplinaryapproach,diverseperspectives,andtheengagementofallstakeholders,fromdeveloperstoendusers.TheredteamshouldbedesignedtoassesstherisksassociatedwithatleasteachredcellintheEthicalMatrix.Thisresultsinacollaborative,sociotechnicalapproachthatensuresamorecomprehensiveeval-uationofthemodel,thusenhancingtherigoroftheevaluationandthesafetyofthemodel.OtherLLMscanalsobeusedtogeneratered-teamingprompts.

RedteaminghelpsLLMdevelopersbetterprotectmodelsagainstpotentialmisuse,therebyenhancingtheoverallsafetyandefficacyofthemodel.Itcanalsouncoverissuesthatmightnotbevisibleundernormaloperatingconditionsorduringstandardtestingprocedures.Acollaborativeapproachtored

teamingbuiltontheEthicalMatrixensuresathor-oughandrigorousevaluation,bolsteringtherobust-nessofthemodelandthevalidityofitsoutcomes.

Asignificantlimitationofredteamingisitsinherentsubjectivity:Thevalueandeffectivenessofared-teamingexercisecanvarygreatlydepend-ingonthecreativityandriskappetiteoftheindivid-ualstakeholdersinvolved.Andbecausetherearenoestablishedstandardsorthresholdsforred-teamingLLMs,itcanbedifficulttodeterminewhenenoughredteaminghasbeendoneorwhethertheevalua-tionhasbeencomprehensiveenough.Thiscanleavesomevulnerabilitiesundetected.

Anotherobviouslimitationofredteamingisitsinabilitytoevaluateforrisksthathavenotbeenanticipatedorimagined.Risksthatareunfore-seenwillnotbeincludedinredteaming,makingthemodeluniquelyvulnerabletounanticipatedscenarios.

Therefore,whileredteamingplaysavitalroleinthetestinganddevelopmentofLLMs,itshould

SketchoftheEthicalMatrixforTessainOurThoughtExperiment

TheNationalEatingDisordersAssocation(NEDA)releasedachatbotnamedTessathatwastakendownafteritgaveoutharmfuladvice.Herewevisualizetheexercisethatmayhaveanticipatedsuchoutcomes.

CONCERNS

Negative:

WhatifTessa…

givestoxicinformationoradviceinchats?

Negative:

WhatifTessa…

misfiresanderodes

communitytrustinNEDA?

Positive:

WhatifTessa…

givesaccurate,evidence-basedadvice?

Positive:

WhatifTessa…

easestheresource

demandsoftheold

helpline?

“Chatbotuserswitheatingdisorders”

“Chatbotusers,other”

NEDA

X2AI

Psychologistsandotherpractitioners

TSERIOUSCONCERNTMODERATECONCERNQMINIMAL/NOCONCERNTBENEFIT

SPECIALREPORT•“OVERCOMINGTHEHARDPROBLEMSTOADVANCEAIPRACTICE”•MITSLOANMANAGEMENTREVIEW6

AIPRACTICE

ResponsibleAI

becomplementedwithotherevaluationstrategiesandcontinuousmonitoringtoensurethesafetyandrobustnessofthemodel.

HowWouldWeAuditTessa,theEatingDisorderChatbot?

ThenonprofitNationalEatingDisordersAssociation(NEDA)isoneofthelargestorganizationsintheU.S.dedicatedtosupportingp

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论