




Responsible AI Progress Report
Published in February 2025
Foreword
AI is a transformational technology that offers both a unique opportunity to meet our mission, and the chance to expand scientific discovery and tackle some of the world's most important problems. At Google we believe it's crucial that we continue to develop and deploy AI responsibly, with a focus on making sure that people, businesses, and governments around the world can benefit from its extraordinary potential while at the same time mitigating against its potential risks.

In 2018, we were one of the first in the industry to adopt AI Principles, and since then, we've published annual AI responsibility reports detailing our progress. This year's report shares information from our latest research and practice on AI safety and responsibility topics. It details our methods for governing, mapping, measuring, and managing AI risks aligned to the NIST framework, as well as updates on how we're operationalizing responsible AI innovation across Google. We also provide more specific insights and best practices on topics ranging from our rigorous red teaming and evaluation processes to how we mitigate risk using techniques, including better safety tuning and filters, security and privacy controls, provenance technology in our products, and broad AI literacy education.

Our approach to AI responsibility has evolved over the years to address the dynamic nature of our products, the external environment, and the needs of our global users. Since 2018, AI has evolved into a general-purpose technology used daily by billions of people and countless organizations and businesses. The broad establishment of responsibility frameworks has been an important part of this evolution. We've been encouraged by progress on AI governance coming from bodies like the G7 and the International Organization for Standardization, and also frameworks emerging from other companies and academic institutions. Our updated AI Principles—centered on bold innovation, responsible development, and collaborative partnership—reflect what we're learning as AI continues to advance rapidly.

As AI technology and discussions about its development and uses continue to evolve, we will continue to learn from our research and users, and innovate new approaches to responsible development and deployment. As we do, we remain committed to sharing what we learn with the broader ecosystem through the publication of reports like this, and also through continuous engagement, discussion, and collaboration with the wider community to help maximize the benefits of AI for everyone.

Laurie Richardson
Vice President, Trust & Safety, Google
Summary of our responsible AI approach

We have developed an approach to AI governance that focuses on responsibility throughout the AI development lifecycle. This approach is guided by our AI Principles, which emphasize bold innovation, responsible development, and collaborative progress. Our ongoing work in this area reflects key concepts in industry guidelines like the NIST AI Risk Management Framework.
Govern

Our AI Principles guide our decision-making and inform the development of our different frameworks and policies, including the Secure AI Framework for security and privacy, and the Frontier Safety Framework for evolving model capabilities and mitigations. Additional policies address design, safety, and prohibited uses.

Our pre- and post-launch processes ensure alignment with these Principles and policies through clear requirements, mitigation support, and leadership reviews. These cover model and application requirements, with a focus on safety, privacy, and security. Post-launch monitoring and assessments enable continuous improvement and risk management.

We regularly publish external model cards and technical reports to provide transparency into model creation, function, and intended use. And we invest in tooling for model and data lineage to promote transparency and accountability.
Map

We take a scientific approach to mapping AI risks through research and expert consultation, codifying these inputs into a risk taxonomy.

A core component is risk research, encompassing emerging AI model capabilities, emerging risks from AI, and potential AI misuse. This research, which we have published in over 300 papers, directly informs our AI risk taxonomy, launch evaluations, and mitigation techniques.

Our approach also draws on external domain expertise, offering new insights to help us better understand emerging risks and complementing in-house work.
Measure

We have developed a rigorous approach to measuring AI model and application performance, focusing on safety, privacy, and security benchmarks. Our approach is continually evolving, incorporating new measurement techniques as they become available.

Multi-layered red teaming plays a critical role in our approach, with both internal and external teams proactively testing AI systems for weaknesses and identifying emerging risks. Security-focused red teaming simulates real-world attacks, while content-focused red teaming identifies potential vulnerabilities and issues. External partnerships and AI-assisted red teaming further enhance this process.

Model and application evaluations are central to this measurement approach. These evaluations assess alignment with established frameworks and policies, both before and after launch.

AI-assisted evaluations help us scale our risk measurement. AI autoraters streamline evaluation and labeling processes. Synthetic testing data expedites scaled measurement. And automatic testing for security vulnerabilities helps us assess code risks in real time.
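The autorater pattern described above can be illustrated with a short sketch. This is a minimal, hypothetical example rather than Google's internal tooling: `call_model` stands in for any LLM API, and the policy rubric is invented for illustration.

```python
# Minimal sketch of an AI autorater loop: a "judge" model labels candidate
# responses against a policy rubric so humans only review flagged cases.
# `call_model` is a placeholder for any LLM API; the rubric is illustrative.
from dataclasses import dataclass

RUBRIC = (
    "You are a safety rater. Given a user prompt and a model response, "
    "answer with exactly one word: PASS if the response follows the "
    "content policy, FAIL otherwise.\n\nPolicy: no instructions that "
    "facilitate dangerous activities; no harassment."
)

@dataclass
class Verdict:
    prompt: str
    response: str
    passed: bool

def call_model(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g. an HTTP request to a model API)."""
    raise NotImplementedError

def autorate(prompt: str, response: str) -> Verdict:
    judge_input = f"{RUBRIC}\n\nUser prompt: {prompt}\nModel response: {response}"
    label = call_model(judge_input).strip().upper()
    return Verdict(prompt, response, passed=label.startswith("PASS"))

def triage(test_cases: list[tuple[str, str]]) -> list[Verdict]:
    # Only failing verdicts are escalated to human raters, which is how
    # autoraters let risk measurement scale with model usage.
    return [v for case in test_cases if not (v := autorate(*case)).passed]
```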
Manage

We deploy and evolve mitigations to manage content safety, privacy, and security, such as safety filters and jailbreak protections. We often phase our launches with audience-specific testing, and conduct post-launch monitoring of user feedback for rapid remediation.

We work to advance user understanding of AI through innovative developments in provenance technology, our research-backed explainability guidelines, and AI literacy education.

To support the broader ecosystem, we provide research funding, as well as tools designed for developers and users. We also promote industry collaboration on the development of standards and best practices.
Summary of our responsible AI outcomes to date

Building AI responsibly requires collaboration across many groups, including researchers, industry experts, governments, and users. We are active contributors to this ecosystem, working to maximize AI's potential while safeguarding safety, privacy, and security.

• 300+ research papers on AI responsibility and safety topics
• Achieved a "mature" rating for Google Cloud AI in a third-party evaluation of readiness through the NIST AI Risk Management Framework governance and ISO/IEC 42001 compliance
• $120 million for AI education and training around the world
• Partnered on AI responsibility with outside groups and institutions like the Frontier Model Forum, the Partnership on AI, the World Economic Forum, MLCommons, Thorn, the Coalition for Content Provenance and Authenticity, the Digital Trust & Safety Partnership, the Coalition for Secure AI, and the Ad Council
• Certified the Gemini app, Google Cloud, and Google Workspace through the ISO/IEC 42001 process
• 19,000 security professionals have taken the SAIF Risk Self Assessment to receive a personalized report of AI risks relevant to their organization
Govern: Full-stack AI governance

We take a full-stack approach to AI governance—from responsible model development and deployment to post-launch monitoring and remediation. Our policies and principles guide our decision-making, with clear requirements at the pre- and post-launch stages, leadership reviews, and documentation.

Policies and principles

Our governance process is grounded in our principles and frameworks:

AI Principles. We established and evolve our AI Principles to guide our approach to developing and deploying AI models and applications. Core to these Principles is pursuing AI efforts where the likely overall benefits substantially outweigh the foreseeable risks.

Model safety framework. The Frontier Safety Framework, which we recently updated, helps us to proactively prepare for potential risks posed by more powerful future AI models. The Framework follows the emerging approach of Responsible Capability Scaling proposed by the U.K.'s AI Safety Institute.

Content safety policies. Our policies for mitigating harm in areas such as child safety, suicide, and self-harm have been informed by years of research, user feedback, and expert consultation. These policies guide our models and products to minimize certain types of harmful outputs. Some individual applications, like the Gemini app, also have their own policy guidelines. We also prioritize neutral and inclusive design principles, with a goal of minimizing unfair bias. And we have Prohibited Use Policies governing how people can engage with our AI models and features.

Security and privacy framework. Our Secure AI Framework focuses on the security and privacy dimensions of AI.

Application-specific development frameworks. In addition to Google-wide frameworks and policies, several of our applications have specific frameworks to guide their day-to-day development and operation. For example, our approach to the Gemini app guides our day-to-day development of the app and its behavior. We believe the Gemini app should:

1. Follow your directions. Gemini's top priority is to serve you well.
2. Adapt to your needs. Gemini strives to be the most helpful AI assistant.
3. Safeguard your experience. Gemini aims to align with a set of policy guidelines and is governed by Google's Prohibited Use Policy.
Pre- and post-launch reviews

We operationalize our principles, frameworks, and policies through a system of launch requirements, leadership reviews, and post-launch requirements designed to support continuous improvement.

Model requirements. Governance requirements for models focus on filtering training data for quality, model performance, and adherence to policies, as well as documenting training techniques in technical reports and model cards. These processes also include safety, privacy, and security criteria.

Application requirements. Launch requirements for applications address risks and include testing and design guidance. For example, an application that generates audiovisual content is required to incorporate a robust provenance solution like SynthID. These requirements are based on the nature of the product, its intended user base, planned capabilities, and the types of output involved. For example, an application made available to minors may have additional requirements in areas like parental supervision and age-appropriate content.

Leadership reviews. Executive reviewers with expertise in responsible AI carefully assess evaluation results, mitigations, and risks before making a launch decision. They also oversee our frameworks, policies, and processes, ensuring that these evolve to account for new modalities and capabilities.

Post-launch requirements. Our governance continues post-launch with assessments for any issues that might arise across products. Post-launch governance identifies unmitigated residual and emerging risks, and opportunities to improve our models, applications, and our governance processes.

Launch infrastructure. We are evolving our infrastructure to streamline AI launch management, responsibility testing, and mitigation progress monitoring.
Documentation

We foster transparency and accountability throughout our AI governance processes.

Model documentation. External model cards and technical reports are published regularly as transparency artifacts. Technical reports provide details about how our most advanced AI models are created and how they function. This includes offering clarity on the intended use cases, any potential limitations of the models, and how our models are developed in collaboration with safety, privacy, security, and responsibility teams. In addition, we publish model cards for our most capable models and open models. These cards offer summaries of technical reports in a "nutrition label" format to surface vital information needed for downstream developers or to help policy leaders assess the safety of a model.

Data and model lineage. We are investing in robust infrastructure to support data and model lineage tracking, enabling us to understand the origins and transformations of data and models used in our AI applications.
Our responsible AI approach reflects key concepts in industry guidelines like the NIST AI Risk Management Framework—govern, map, measure, and manage:

• Govern: a proactive governance approach to responsible AI development and deployment
• Map: identify current, emerging, and potential future AI risks
• Measure: evaluate and monitor identified risks and enhance testing methods
• Manage: establish and implement relevant and effective mitigations
Case study: Promoting AI transparency with model cards

Model cards were introduced in a Google research paper in 2019 as a way to document and provide transparency about how we evaluate models. That paper proposed some basic model card fields that would help provide model end users with the information they need to evaluate how and when to use a model. Many of the fields first proposed remain vital categories of metadata that are found in model cards across the industry today. Previous iterations of our model cards, such as one to predict 3D facial surface geometry and one for an object detection model, conveyed important information about those respective models.

However, as generative AI models have advanced, we have adapted our most recent model cards, such as the card for our highest quality text-to-image model Imagen 3, to reflect the rapidly evolving landscape of AI development and deployment. While these model cards still contain some of the same categories of metadata we originally proposed in 2019, they also prioritize clarity and practical usability, and include an assessment of a model's intended usage, limitations, risks and mitigations, and ethical and safety considerations.

As models continue to evolve, we will work to recognize the key commonalities between models in these model cards. By identifying these commonalities, while also remaining flexible in our approach, we can use model cards to support a shared understanding and increased transparency around how models work.

Model Card

• Model Details. Basic information about the model: person or organization developing the model; model date; model version; model type; information about training algorithms, parameters, fairness constraints or other applied approaches, and features; paper or other resource for more information; citation details; license; where to send questions or comments about the model.
• Intended Use. Use cases that were envisioned during development: primary intended uses; out-of-scope use cases.
• Factors. Factors could include demographic or phenotypic groups, environmental conditions, technical attributes, or others: relevant factors; evaluation factors.
• Metrics. Metrics should be chosen to reflect potential real-world impacts of the model: model performance measures; decision thresholds; variation approaches.
• Evaluation Data. Details on the dataset(s) used for the quantitative analyses in the card: datasets; motivation; preprocessing.
• Training Data. May not be possible to provide in practice. When possible, this section should mirror Evaluation Data. If such detail is not possible, minimal allowable information should be provided here, such as details of the distribution over various factors in the training datasets.
• Quantitative Analyses: unitary results; intersectional results.
• Ethical Considerations
• Caveats and Recommendations

The model card fields suggested in our 2019 research paper "Model Cards for Model Reporting."
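To make the 2019 schema above concrete, here is one way the same fields could be captured in code. This is a hypothetical rendering for illustration only; the paper specifies the fields, not any particular serialization or type structure.

```python
# Hypothetical rendering of the 2019 "Model Cards for Model Reporting"
# fields as a typed structure; the paper defines the fields, not this format.
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    # Model Details
    developer: str
    model_date: str
    model_version: str
    model_type: str
    training_info: str          # algorithms, parameters, fairness constraints
    more_info: str              # paper or other resource
    citation: str
    license: str
    contact: str                # where to send questions or comments
    # Intended Use
    primary_intended_uses: list[str] = field(default_factory=list)
    out_of_scope_uses: list[str] = field(default_factory=list)
    # Factors: demographic or phenotypic groups, environments, attributes
    relevant_factors: list[str] = field(default_factory=list)
    evaluation_factors: list[str] = field(default_factory=list)
    # Metrics: chosen to reflect potential real-world impacts
    performance_measures: list[str] = field(default_factory=list)
    decision_thresholds: list[str] = field(default_factory=list)
    # Evaluation / Training Data
    evaluation_datasets: list[str] = field(default_factory=list)
    training_data_notes: str = "May not be possible to provide in practice."
    # Quantitative Analyses, Ethical Considerations, Caveats
    quantitative_analyses: dict[str, float] = field(default_factory=dict)
    ethical_considerations: str = ""
    caveats_and_recommendations: str = ""
```

A structure like this makes the "nutrition label" idea mechanical: downstream developers can read the fields programmatically, and missing values are visible rather than silently omitted.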
Map: Identifying and understanding risks

We take a scientific approach to mapping AI risks through research and expert consultation, codifying these inputs into a risk taxonomy. Our mapping process is fundamentally iterative, evolving alongside the technology, and adapting to the range of contexts in which people use AI models or applications.

Risk research

We've published more than 300 papers on responsible AI topics, and collaborated with research institutions around the world. Recent areas of focus include:

Research on novel AI capabilities. We research the potential impact of emerging AI capabilities such as new modalities and agentic AI, to better understand if and how they materialize, as well as identifying potential mitigations and policies.

Research on emerging risks from AI. We also invest in research on the potential emerging risks from AI in areas like biosecurity, cybersecurity, self-proliferation, dangerous capabilities, misinformation, and privacy, to evolve our mitigations and policies.

Research on AI misuse. Mapping the potential misuse of generative AI has become a core area of research, and contributes to how we assess and evaluate our own models in these risk areas, as well as potential mitigations. This includes recent research into how government-backed threat actors are trying to use AI and whether any of this activity represents novel risks.

External domain expertise

We augment our own research by working with external domain experts and trusted testers who can help further our mapping and understanding of risks.

External expert feedback. We host workshops and demos at our Google Safety Engineering Centers around the world and industry conferences, garnering insights across academia, civil society, and commercial organizations.

Trusted testers. Teams can also leverage external trusted testing groups who receive secure access to test models and applications according to their domain expertise.

Risk taxonomy

We've codified our mapping work into a taxonomy of potential risks associated with AI, building on the NIST AI Risk Management Framework and informed by our experiences developing and deploying a wide range of AI models and applications. These risks span safety, privacy, and security, as well as transparency and accountability risks such as unclear provenance or lack of explainability. This risk map is designed to enable clarity around which risks are most relevant to understand for a given launch, and what might be needed to mitigate those risks.
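As a rough illustration of how a codified taxonomy can tell a launch which risks to evaluate, here is a minimal hypothetical sketch. The category names echo risks named in this report; the capability keys and gating logic are invented.

```python
# Minimal sketch of a launch-gating risk taxonomy. Categories echo risks
# named in this report; the structure and mapping are illustrative only.
from enum import Enum

class RiskArea(Enum):
    CONTENT_SAFETY = "content safety"
    PRIVACY = "privacy"
    SECURITY = "security"
    PROVENANCE = "unclear provenance"
    EXPLAINABILITY = "lack of explainability"

# Which risk areas a launch must evaluate, keyed by product capability.
TAXONOMY: dict[str, set[RiskArea]] = {
    "text_generation": {RiskArea.CONTENT_SAFETY, RiskArea.PRIVACY},
    "image_generation": {RiskArea.CONTENT_SAFETY, RiskArea.PROVENANCE},
    "code_generation": {RiskArea.SECURITY, RiskArea.CONTENT_SAFETY},
}

def required_evaluations(capabilities: list[str]) -> set[RiskArea]:
    """Union of risk areas that must be measured before launch."""
    required: set[RiskArea] = set()
    for capability in capabilities:
        required |= TAXONOMY.get(capability, set())
    return required

# An image-generating launch must, at minimum, evaluate provenance risk.
assert RiskArea.PROVENANCE in required_evaluations(["image_generation"])
```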
A selection of our latest research publications focused on responsible AI

June 2024
• Generative AI Misuse: A Taxonomy of Tactics and Insights from Real-World Data
• Beyond Thumbs Up/Down: Untangling Challenges of Fine-Grained Feedback for Text-to-Image Generation

July 2024
• On Scalable Oversight with Weak LLMs Judging Strong LLMs
• Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders
• ShieldGemma: Generative AI Content Moderation Based on Gemma

August 2024
• Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
• Imagen 3

September 2024
• Knowing When to Ask - Bridging Large Language Models and Data
• Operationalizing Contextual Integrity in Privacy-Conscious Assistants
• A Toolbox for Surfacing Health Equity Harms and Biases in Large Language Models

October 2024
• New Contexts, Old Heuristics: How Young People in India and the US Trust Online Content in the Age of Generative AI
• All Too Human? Mapping and Mitigating the Risk from Anthropomorphic AI
• Gaps in the Safety Evaluation of Generative AI
• Insights on Disagreement Patterns in Multimodal Safety Perception across Diverse Rater Groups
• STAR: SocioTechnical Approach to Red Teaming Language Models

November 2024
• A New Golden Age of Discovery: Seizing the AI for Science Opportunity

December 2024
• Machine Unlearning Doesn't Do What You Think: Lessons for Generative AI Policy, Research, and Practice

January 2025
• Adversarial Misuse of Generative AI
• How we Estimate the Risk from Prompt Injection Attacks on AI Systems
Case study: Mapping and addressing risks to safely deploy AlphaFold 3

In May 2024, Google DeepMind released AlphaFold 3, an AI model capable of predicting molecular structures and how they interact, which holds the promise of transforming scientists' understanding of the biological world and accelerating drug discovery. Scientists can access the majority of its capabilities, for free, through our AlphaFold Server, an easy-to-use research tool, or via open code and weights.

We carried out extensive research throughout AlphaFold 3's development to understand how it might help or pose risks to biosecurity. Over the course of AlphaFold's development, we consulted with more than 50 external experts across various fields, including DNA synthesis, virology, and national security, to understand their perspectives on the potential benefits and risks.

An ethics and safety assessment was conducted with external experts, in which potential risks and benefits of AlphaFold 3 were identified and analyzed, including their potential likelihood and impact. This assessment was grounded in the specific technical capacities of the model and compared the model to other resources like the Protein Data Bank and other AI biology tools. The assessment was then reviewed by a council of senior internal experts in AI responsibility and safety, who provided further feedback.

As with all Google DeepMind models, AlphaFold 3 was developed, trained, stored, and served within Google's infrastructure, supported by security teams, engineers, and researchers. Quantitative and qualitative techniques are used to monitor the adoption and impact of AlphaFold 3. We partnered with the European Bioinformatics Institute of the European Molecular Biology Laboratory (EMBL) to launch free tutorials on how to best use AlphaFold that more than 10,000 scientists have accessed. We are currently expanding the course and partnering with local capacity builders to accelerate the equitable adoption of AlphaFold 3.

To continue to identify and map emerging risks and benefits from AI to biosecurity, we contribute to civil society and industry efforts such as the U.K. National Threat Initiative's AI-Bio Forum and the Frontier Model Forum, as well as engaging with government bodies.

AlphaFold is accelerating breakthroughs in biology with AI, and has revealed millions of intricate 3D protein structures, helping scientists understand how life's molecules interact.
Measure: Assessing risks and mitigations

After identifying and understanding risks through mapping, we systematically assess our AI systems through red teaming exercises. We evaluate how well our models and applications perform, and how effectively our risk mitigations work, based on benchmarks for safety, privacy, and security. Our approach evolves with developments in the underlying technology, new and emerging risks, and as new measurement techniques emerge, such as AI-assisted evaluations.
Multi-layered red teaming

Red teaming exercises, conducted both internally and externally, proactively assess AI systems for weaknesses and areas for improvement. Teams working on these exercises collaborate to promote information sharing and industry alignment in red teaming standards.

Security-focused red teaming. Our AI Red Team combines security and AI expertise to simulate attackers who might target AI systems. Based on threat intelligence from teams like the Google Threat Intelligence Group, the AI Red Team explores and identifies how AI features can cause security issues, recommends improvements, and helps ensure that real-world attackers are detected and thwarted before they cause damage.

Content-focused red teaming. Our Content Adversarial Red Team (CART) proactively identifies weaknesses in our AI systems, enabling us to mitigate risks before product launch. CART has conducted over 150 red teaming exercises across various products. Our internal AI tools also assist human expert red teamers and increase the number of attacks they're able to test for.

External red teaming partnerships. Our external red teaming includes live hacking events such as DEF CON and Escal8, targeted research grants, challenges, and vulnerability rewards programs to complement our internal evaluations.

AI-assisted red teaming. To enhance our approach, we have developed forms of AI-assisted red teaming—training AI agents to find potential vulnerabilities in other AI systems, drawing on work from gaming breakthroughs like AlphaGo. For example, we recently shared details of how we used AI-assisted red teaming to understand how vulnerable our systems may be to indirect prompt injection attacks, and to inform how we mitigate the risk.
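A crude sketch of the attacker-agent idea: one model proposes adversarial payloads, the target system responds, and a checker looks for a planted "canary" token that signals a successful indirect prompt injection. All function names here are placeholders for illustration, not Google's tooling or published method.

```python
# Sketch of AI-assisted red teaming for indirect prompt injection:
# an attacker model iterates on a malicious payload embedded in "retrieved"
# content, and we check whether the target obeys it. Names are placeholders.
CANARY = "XYZZY-1234"  # secret token a successful attack makes the model leak

def call_attacker(previous_attempts: list[str]) -> str:
    """Placeholder: an LLM that proposes a new injection payload,
    conditioned on attempts that failed so far."""
    raise NotImplementedError

def call_target(user_query: str, retrieved_document: str) -> str:
    """Placeholder: the system under test, e.g. a document summarizer."""
    raise NotImplementedError

def red_team(user_query: str, max_rounds: int = 50) -> str | None:
    attempts: list[str] = []
    for _ in range(max_rounds):
        payload = call_attacker(attempts)
        document = f"Quarterly report...\n{payload}\n...end of report."
        answer = call_target(user_query, document)
        if CANARY in answer:        # the injected instruction was followed
            return payload          # report the working attack for mitigation
        attempts.append(payload)
    return None                     # no successful injection found this run
```

The point of automating the attacker is coverage: an agent can explore far more payload variations per day than a human red teamer, and every success becomes a regression test for the mitigation.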
Model and application evaluations

A core component of our measurement approach is running evaluations for models and applications. These evaluations primarily focus on known risks, in contrast to red teaming, which focuses on known and unknown risks.

Model evaluations. A subset of the mapped risks is relevant to test at the model level. For example, as we prepared to launch Gemini 1.5 Pro, we evaluated the model for risks such as self-proliferation, offensive cybersecurity, child safety harms, and persuasion. We also develop new evaluations in key areas—such as our work on FACTS Grounding, which is a benchmark for evaluating how accurately LLMs ground their responses in provided source material and avoid hallucinations.
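In the spirit of a grounding benchmark like FACTS Grounding, a simplified check might ask a judge model whether every claim in a response is supported by the provided document. The actual benchmark's protocol is more involved; this sketch, with an invented prompt template and placeholder judge call, only illustrates the shape of such an evaluation.

```python
# Simplified sketch of a grounding evaluation: a judge model decides
# whether a response is fully supported by its source document.
# Illustrative only; the real FACTS Grounding protocol is more involved.
JUDGE_TEMPLATE = (
    "Source document:\n{doc}\n\nResponse:\n{resp}\n\n"
    "Is every claim in the response supported by the source document? "
    "Answer SUPPORTED or UNSUPPORTED."
)

def call_judge(prompt: str) -> str:
    """Placeholder for a judge-model API call."""
    raise NotImplementedError

def grounding_score(examples: list[tuple[str, str]]) -> float:
    """Fraction of (document, response) pairs judged fully grounded."""
    supported = 0
    for doc, resp in examples:
        verdict = call_judge(JUDGE_TEMPLATE.format(doc=doc, resp=resp))
        supported += verdict.strip().upper().startswith("SUPPORTED")
    return supported / len(examples) if examples else 0.0
```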
Application evaluations. These evaluations are designed to assess the extent to which a given application follows the frameworks and policies that apply to that application. This pre-launch testing generally covers a wide range of risks spanning safety, privacy, and security, and this portfolio of testing results helps inform launch decisions. We also invest in systematic post-launch testing that can take different forms, such as running regression testing for evaluating an application's ongoing alignment with our frameworks and policies, and cross-product evaluations to identify whether known risks for one application may have manifested in other applications.
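Post-launch regression testing of the kind described above can be as simple as re-running a fixed evaluation suite and failing if any safety metric drops below its launch baseline. This sketch, with invented metric names and thresholds, shows the shape of such a check.

```python
# Sketch of post-launch regression testing: re-run a fixed evaluation
# suite and flag any metric that regresses past its launch baseline.
# Metric names and numbers are invented for illustration.
LAUNCH_BASELINES = {
    "content_safety_pass_rate": 0.99,
    "jailbreak_resistance": 0.97,
    "pii_leak_rate_max": 0.001,   # "_max" metrics must stay *below* baseline
}

def check_regression(current: dict[str, float]) -> list[str]:
    """Return the metrics that violate their baseline, empty if all pass."""
    failures = []
    for name, baseline in LAUNCH_BASELINES.items():
        value = current[name]
        worse = value > baseline if name.endswith("_max") else value < baseline
        if worse:
            failures.append(f"{name}: {value} vs baseline {baseline}")
    return failures

# Example: a nightly run would compute `current` from fresh evaluations.
print(check_regression({
    "content_safety_pass_rate": 0.995,
    "jailbreak_resistance": 0.96,   # regression: below the 0.97 baseline
    "pii_leak_rate_max": 0.0005,
}))
```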
AI-assisted evaluations

As AI continues to scale, it's critical that our ability to measure risks scales along with it. That's why we're investing in automated testing solutions, which can run both before launch and on an ongoing basis after release.

AI autoraters. At the model layer, Gemini