FLI AI Safety Index 2024
Independent experts evaluate safety practices of leading AI companies across critical domains.
11th December 2024
Available online at: /index
Contact us: policy@
Contents
Introduction
Scorecard
Key Findings
Independent Review Panel
Index Design
Evidence Base
Grading Process
Results
Conclusions
Appendix A - Grading Sheets
Appendix B - Company Survey
Appendix C - Company Responses
About the Organization: The Future of Life Institute (FLI) is an independent nonprofit organization with the goal of reducing large-scale risks and steering transformative technologies to benefit humanity, with a particular focus on artificial intelligence (AI).
Introduction
Rapidly improving AI capabilities have increased interest in how companies report, assess and attempt to mitigate associated risks. The Future of Life Institute (FLI) therefore facilitated the AI Safety Index, a tool designed to evaluate and compare safety practices among leading AI companies. At the heart of the Index is an independent review panel, including some of the world's foremost AI experts. Reviewers were tasked with grading companies' safety policies on the basis of a comprehensive evidence base collected by FLI. The Index aims to incentivize responsible AI development by promoting transparency, highlighting commendable efforts, and identifying areas of concern.
Scorecard
| Firm | Overall Grade | Score | Risk Assessment | Current Harms | Safety Frameworks | Existential Safety Strategy | Governance & Accountability | Transparency & Communication |
|---|---|---|---|---|---|---|---|---|
| Anthropic | C | 2.13 | C+ | B- | D+ | D+ | C+ | D+ |
| DeepMind | D+ | 1.55 | C | C+ | D- | D | D+ | D |
| OpenAI | D+ | 1.32 | C | D+ | D- | D- | D+ | D- |
| Zhipu AI | D | 1.11 | D+ | D+ | F | F | D | C |
| x.AI | D- | 0.75 | F | D | F | F | F | C |
| Meta | F | 0.65 | D+ | D | F | F | D- | F |
Grading: Uses the US GPA system for grade boundaries: A+, A, A-, B+, [...], F, with letter values corresponding to numerical values 4.3, 4.0, 3.7, 3.3, [...], 0.
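To make the grading arithmetic concrete, here is a minimal sketch in Python (illustrative only, not FLI's actual tooling) of how letter grades convert to GPA values, how reviewer grades average into a domain score, and how a numeric score maps back to the nearest letter grade. The example reviewer grades are hypothetical, and treating the overall score as a plain average of the six domain scores is an assumption consistent with, but not stated in, the report.

```python
# Letter-grade values on the US GPA scale used by the Index
# (A+ = 4.3, A = 4.0, A- = 3.7, ..., F = 0).
GPA = {
    "A+": 4.3, "A": 4.0, "A-": 3.7,
    "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7,
    "D+": 1.3, "D": 1.0, "D-": 0.7,
    "F": 0.0,
}

def domain_score(reviewer_grades):
    """Average the panel's letter grades for one domain; each section
    was graded by four or more reviewers."""
    return sum(GPA[g] for g in reviewer_grades) / len(reviewer_grades)

def overall_score(domain_scores):
    """Overall score as a plain average of the six domain scores
    (an assumption, not stated explicitly in the report)."""
    return sum(domain_scores) / len(domain_scores)

def nearest_letter(score):
    """Map a numeric score back to the closest letter grade."""
    return min(GPA, key=lambda g: abs(GPA[g] - score))

# Hypothetical reviewer grades for one company in one domain:
print(round(domain_score(["C+", "C", "B-", "C+"]), 2))  # 2.32
print(nearest_letter(2.13))  # 'C', matching Anthropic's overall grade
```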
Key Findings
• Large risk management disparities: While some companies have established initial safety frameworks or conducted some serious risk assessment efforts, others have yet to take even the most basic precautions.
• Jailbreaks: All the flagship models were found to be vulnerable to adversarial attacks.
• Control problem: Despite their explicit ambitions to develop artificial general intelligence (AGI), capable of rivaling or exceeding human intelligence, the review panel deemed the current strategies of all companies inadequate for ensuring that these systems remain safe and under human control.
• External oversight: Reviewers consistently highlighted how companies were unable to resist profit-driven incentives to cut corners on safety in the absence of independent oversight. While Anthropic's current and OpenAI's initial governance structures were highlighted as promising, experts called for third-party validation of risk assessment and safety framework compliance across all companies.
Independent Review Panel
The 2024 AI Safety Index was graded by an independent panel of world-renowned AI experts invited by FLI's president, MIT Professor Max Tegmark. The panel was carefully selected to ensure impartiality and a diverse range of expertise, covering both technical and governance aspects of AI. Panel selection prioritized distinguished academics and leaders from the non-profit sector to minimize potential conflicts of interest.
The panel assigned grades based on the gathered evidence base, considering both public and company-submitted information. Their evaluations, combined with actionable recommendations, aim to incentivize safer AI practices within the industry. See the "Grading Process" section for more details.
Atoosa Kasirzadeh
Atoosa Kasirzadeh is a philosopher and AI researcher, serving as an Assistant Professor at Carnegie Mellon University. Previously, she was a visiting faculty researcher at Google, a Chancellor's Fellow and Director of Research at the Centre for Technomoral Futures at the University of Edinburgh, a Research Lead at the Alan Turing Institute, an intern at DeepMind, and a Governance of AI Fellow at Oxford. Her interdisciplinary research addresses questions about the societal impacts, governance, and future of AI.
Tegan Maharaj
Tegan Maharaj is an Assistant Professor in the Department of Decision Sciences at HEC Montréal, where she leads the ERRATA lab on Ecological Risk and Responsible AI. She is also a core academic member at Mila. Her research focuses on advancing the science and techniques of responsible AI development. Previously, she served as an Assistant Professor of Machine Learning at the University of Toronto.
Yoshua Bengio
Yoshua Bengio is a Full Professor in the Department of Computer Science and Operations Research at Université de Montréal, as well as the Founder and Scientific Director of Mila and the Scientific Director of IVADO. He is the recipient of the 2018 A.M. Turing Award, a CIFAR AI Chair, a Fellow of both the Royal Society of London and Canada, an Officer of the Order of Canada, Knight of the Legion of Honor of France, Member of the UN's Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology, and Chair of the International Scientific Report on the Safety of Advanced AI.
Jessica Newman
Jessica Newman is the Director of the AI Security Initiative (AISI), housed at the UC Berkeley Center for Long-Term Cybersecurity. She is also a Co-Director of the UC Berkeley AI Policy Hub. Newman's research focuses on the governance, policy, and politics of AI, with particular attention on comparative analysis of national AI strategies and policies, and on mechanisms for the evaluation and accountability of organizational development and deployment of AI systems.
David Krueger
David Krueger is an Assistant Professor in Robust, Reasoning and Responsible AI in the Department of Computer Science and Operations Research (DIRO) at the University of Montreal, and a Core Academic Member at Mila, UC Berkeley's Center for Human-Compatible AI, and the Center for the Study of Existential Risk. His work focuses on reducing the risk of human extinction from artificial intelligence through technical research as well as education, outreach, governance and advocacy.
Sneha Revanur
Sneha Revanur is the founder and president of Encode Justice, a global youth-led organization advocating for the ethical regulation of AI. Under her leadership, Encode Justice has mobilized thousands of young people to address challenges like algorithmic bias and AI accountability. She was featured on TIME's inaugural list of the 100 most influential people in AI.
Stuart Russell
Stuart Russell is a Professor of Computer Science at the University of California at Berkeley, holder of the Smith-Zadeh Chair in Engineering, and Director of the Center for Human-Compatible AI and the Kavli Center for Ethics, Science, and the Public. He is a recipient of the IJCAI Computers and Thought Award, the IJCAI Research Excellence Award, and the ACM Allen Newell Award. In 2021 he received the OBE from Her Majesty Queen Elizabeth and gave the BBC Reith Lectures. He co-authored the standard textbook for AI, which is used in over 1500 universities in 135 countries.
Method
Index Design
The AI Safety Index evaluates safety practices across six leading general-purpose AI developers: Anthropic, OpenAI, Google DeepMind, Meta, x.AI, and Zhipu AI. The Index provides a comprehensive assessment by focusing on six critical domains, with 42 indicators spread across these domains:
1. Risk Assessment
2. Current Harms
3. Safety Frameworks
4. Existential Safety Strategy
5. Governance & Accountability
6. Transparency & Communication
Indicators range from corporate governance policies to external model evaluation practices and empirical results on AI benchmarks focused on safety, fairness and robustness. The full set of indicators can be found in the grading sheets in Appendix A. A quick overview is given in Table 1 below. The key inclusion criteria for these indicators were:
1. Relevance: The list emphasizes aspects of AI safety and responsible conduct that are widely recognized by academic and policy communities. Many indicators were directly incorporated from related projects conducted by leading research organizations, such as Stanford's Center for Research on Foundation Models.
2. Comparability: We selected indicators that highlight meaningful differences in safety practices, which can be identified based on the available evidence. As a result, safety precautions for which conclusive differential evidence was unavailable were omitted.
Companies were selected based on their anticipated capability to build the most powerful models by 2025. Additionally, the inclusion of the Chinese firm Zhipu AI reflects our intention to make the Index representative of leading companies globally. Future iterations may focus on different companies as the competitive landscape evolves.
We acknowledge that the Index, while comprehensive, does not capture every aspect of responsible AI development and exclusively focuses on general-purpose AI. We welcome feedback on our indicator selection and strive to incorporate suitable suggestions into the next iteration of the Index.
Table 1: Full overview of indicators

Risk Assessment: Dangerous capability evaluations; Uplift trials; Pre-deployment external safety testing; Post-deployment external researcher access; Bug bounties for model vulnerabilities; Pre-development risk assessments.

Current Harms: AIR-Bench 2024; TrustLLM Benchmark; SEAL Leaderboard for adversarial robustness; Gray Swan Jailbreaking Arena Leaderboard; Fine-tuning protections; Carbon offsets; Watermarking; Privacy of user inputs; Data crawling; Terms of Service analysis.

Safety Frameworks: Risk domains; Risk thresholds; Model evaluations; Decision making; Risk mitigations; Conditional pauses; Adherence; Assurance.

Existential Safety Strategy: Control/Alignment strategy; Capability goals; Safety research; Supporting external safety research.

Governance & Accountability: Company structure; Board of directors; Leadership; Partnerships; Internal review; Mission statement; Whistle-blower Protection & Non-disparagement Agreements; Compliance to public commitments; Military, warfare & intelligence applications.

Transparency & Communication: Lobbying on safety regulations; Testimonies to policymakers; Leadership communications on catastrophic risks; Stanford's 2024 Foundation Model Transparency Index 1.1; Safety evaluation transparency.
Evidence Base
The AI Safety Index is underpinned by a comprehensive evidence base to ensure evaluations are well-informed and transparent. This evidence was compiled into detailed grading sheets, which presented company-specific data across all 42 indicators to the review panel. These sheets included hyperlinks to original sources and can be accessed in full in Appendix A. Evidence collection relied on two primary pathways:
• Publicly Available Information: Most data was sourced from publicly accessible materials, including research papers, policy documents, news articles, and industry reports. This approach enhanced transparency and enabled stakeholders to verify the information by tracing it back to its original sources.
• Company Survey: To supplement publicly available data, a targeted questionnaire was distributed to the evaluated companies. The survey aimed to gather additional insights on safety-relevant structures, processes, and strategies, including information not yet publicly disclosed.
Evidence collection spanned from May 14 to November 27, 2024. For empirical results from AI benchmarks, we noted data extraction dates to account for model updates. In line with our commitment to transparency and accountability, all collected evidence, whether public or company-provided, has been documented and made available for scrutiny in the appendix.
Incorporated Research and Related Work
The AI Safety Index is built on a foundation of extensive research and draws inspiration from several notable projects that have advanced transparency and accountability in the field of general-purpose AI.
Two of the most comprehensive related projects are the Risk Management Ratings produced by SaferAI, a non-profit organization with deep expertise in risk management, and AI Lab Watch, a research initiative identifying strategies for mitigating extreme risks from advanced AI and reporting on company implementation of those strategies.
The Safety Index directly integrates findings from Stanford's Center for Research on Foundation Models (CRFM), particularly their Foundation Model Transparency Index, as well as empirical results from AIR-Bench 2024, a state-of-the-art safety benchmark for GPAI systems. Additional empirical data cited includes scores from the 2024 TrustLLM Benchmark, Scale's Adversarial Robustness evaluation, and the Gray Swan Jailbreaking Arena. These sources offer invaluable insights into the trustworthiness, fairness, and robustness of GPAI systems.
To evaluate existential safety strategies, the Index leveraged findings from a detailed mapping of technical safety research at leading AI companies by the Institute for AI Policy and Strategy. Indicators on external evaluations were informed by research led by Shayne Longpre at MIT, and the structure of the 'Safety Frameworks' section drew from relevant publications from the Center for the Governance of AI and the research non-profit METR. Additionally, we express gratitude to the journalists working to keep companies accountable, whose reports are referenced in the grading sheets.
Company Survey
To complement publicly available data, the AI Safety Index incorporated insights from a targeted company survey. This questionnaire was designed to gather detailed information on safety-related structures, processes, and plans, including aspects not disclosed in public domains.
The survey consisted of 85 questions spanning seven categories: Cybersecurity, Governance, Transparency, Risk Assessment, Risk Mitigation, Current Harms, and Existential Safety. Questions included binary, multiple-choice, and open-ended formats, allowing companies to provide nuanced responses. The full survey is attached in Appendix B.
Survey responses were shared with the reviewers, and relevant information for the indicators was also directly integrated into the grading sheets. Information provided by companies was explicitly identified in the grading sheets. While x.AI and Zhipu AI chose to engage with the targeted questions in the survey, Anthropic, Google DeepMind and Meta only referred us to relevant sources of already publicly shared information. OpenAI decided not to support this project.
Participation incentive
While less than half of the companies provided substantial answers, engagement with the survey was recognized in the 'Transparency & Communication' section. Companies that chose not to engage with the survey received a penalty of one grade step. This adjustment incentivizes participation and acknowledges the value of transparency about safety practices. This penalty was communicated to the review panel within the grading sheet, and reviewers were advised not to additionally take survey participation into account when grading the relevant section. FLI remains committed to encouraging higher participation in future iterations to ensure evaluations are as robust and representative as possible.
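The penalty can be read as simple ladder arithmetic. The sketch below assumes one "grade step" means a single notch on the +/- scale (e.g. C to C-), which the report does not spell out; the function name and grade ladder are illustrative, not FLI's actual tooling.

```python
# Letter grades ordered from lowest to highest; one "step" is assumed to be
# one notch on this ladder (an interpretation, not confirmed by the report).
LADDER = ["F", "D-", "D", "D+", "C-", "C", "C+",
          "B-", "B", "B+", "A-", "A", "A+"]

def apply_survey_penalty(grade):
    """Lower a Transparency & Communication grade by one step;
    F is already the floor and stays F."""
    return LADDER[max(LADDER.index(grade) - 1, 0)]

print(apply_survey_penalty("C"))  # C-
print(apply_survey_penalty("F"))  # F
```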
Grading Process
The grading process was designed to ensure a rigorous and impartial evaluation of safety practices across the assessed companies. Following the conclusion of the evidence-gathering phase on November 27, 2024, grading sheets summarizing company-specific data were shared with an independent panel of leading AI scientists and governance experts. The grading sheets included all indicator-relevant information and instructions for scoring.
Panellists were instructed to assign grades on an absolute scale rather than just scoring companies relative to each other. FLI included a rough grading rubric for each domain to ensure consistency in evaluations. Besides the letter grades, reviewers were encouraged to support their grades with short justifications and to provide key recommendations for improvement. Experts were encouraged to incorporate additional insights and weigh indicators according to their judgment, ensuring that their evaluations reflected both the evidence base and their specialized expertise. To account for the difference in expertise among the reviewers, FLI selected one subset to score the "Existential Safety Strategy" section and another to evaluate the section on "Current Harms." Otherwise, all experts were invited to score every section, although some preferred to grade only the domains they are most familiar with. In the end, every section was graded by four or more reviewers. Grades were aggregated into average scores for each domain, which are presented in the scorecard.
By adopting this structured yet flexible approach, the grading process not only highlights current safety practices but also identifies actionable areas for improvement, encouraging companies to strive for higher standards in future evaluations.
One can argue that large companies on the frontier should be held to the highest safety standards. Initially, we therefore considered giving 1/3 of an extra point to companies with far fewer staff or significantly lower model scores. In the end, we decided against this for the sake of simplicity. This choice did not change the resulting ranking of companies.
Results
This section presents average grades for each domain and summarizes the justifications and improvement recommendations provided by the review panel experts.
Risk Assessment

| | Anthropic | DeepMind | OpenAI | Zhipu AI | x.AI | Meta |
|---|---|---|---|---|---|---|
| Grade | C+ | C | C | D+ | F | D+ |
| Score | 2.67 | 2.10 | 2.10 | 1.55 | 0 | 1.50 |
OpenAI, Google DeepMind, and Anthropic were commended for implementing more rigorous tests for identifying potential dangerous capabilities, such as misuse in cyber-attacks or biological weapon creation, compared to their competitors. Yet even these efforts were found to feature notable limitations, leaving the risks associated with GPAI poorly understood. OpenAI's uplift studies and evaluations for deception were notable to reviewers. Anthropic has done the most impressive work in collaborating with national AI Safety Institutes. Meta evaluated its models for dangerous capabilities before deployment, but critical threat models, such as those related to autonomy, scheming, and persuasion, remain unaddressed. Zhipu AI's risk assessment efforts were noted as less comprehensive, while x.AI failed to publish any substantive pre-deployment evaluations, falling significantly below industry standards. A reviewer suggested that the scope and size of human-participant uplift studies should be increased and that standards for acceptable risk thresholds need to be established. Reviewers noted that only Google DeepMind and Anthropic maintain targeted bug-bounty programs for model vulnerabilities, with Meta's initiative focusing narrowly on privacy-related attacks.
Current Harms

| | Anthropic | DeepMind | OpenAI | Zhipu AI | x.AI | Meta |
|---|---|---|---|---|---|---|
| Grade | B- | C+ | D+ | D+ | D | D |
| Score | 2.83 | 2.50 | 1.68 | 1.50 | 1.00 | 1.18 |
Anthropic's AI systems received the highest scores on leading empirical safety and trustworthiness benchmarks, with Google DeepMind ranking second. Reviewers noted that other companies' systems attained notably lower scores, raising concerns about the adequacy of implemented safety mitigations. Reviewers criticized Meta's policy of publishing the weights of their frontier models, as this enables malicious actors to easily remove the safeguards of their models and use them in harmful ways. Google DeepMind's SynthID watermark system was recognized as a leading practice for mitigating the risks of AI-generated content misuse. In contrast, most other companies lack robust watermarking measures. Zhipu AI reported using watermarks in the survey but seems not to document this practice on their website.
Additionally, environmental sustainability remains an area of divergence. While some companies, such as Meta, actively offset their carbon footprints, others only partially achieve this or even fail to report on their practices publicly. x.AI's reported use of gas turbines to power data centers is particularly concerning from a sustainability standpoint.
Further, reviewers strongly advise companies to ensure their systems are better prepared to withstand adversarial attacks. Empirical results show that models are still vulnerable to jailbreaking, with OpenAI's models being particularly vulnerable (no data for x.AI or Zhipu AI are available). DeepMind's model defences were the most robust in the included benchmarks.
The panel also criticized companies for using user-interaction data to train their AI systems. Only Anthropic and Zhipu AI use default settings which prevent the model from being trained on user interactions (except those flagged for safety review).
Safety Frameworks

| | Anthropic | DeepMind | OpenAI | Zhipu AI | x.AI | Meta |
|---|---|---|---|---|---|---|
| Grade | D+ | D- | D- | F | F | F |
| Score | 1.67 | 0.80 | 0.90 | 0.35 | 0.35 | 0.35 |
All six companies signed the Seoul Frontier AI Safety Commitments and pledged to develop safety frameworks with thresholds for unacceptable risks, advanced safeguards for high-risk levels, and conditions for pausing development if risks cannot be managed. As of the publication of this Index, only OpenAI, Anthropic and Google DeepMind have published their frameworks. As such, the reviewers could only assess the frameworks of those three companies.
While these frameworks were judged insufficient to protect the public from unacceptable levels of risk, experts still considered the frameworks to be effective to some degree. Anthropic's framework stood out to reviewers as the most comprehensive because it detailed additional implementation guidance. One expert noted the need for a more precise characterization of catastrophic events and clearer thresholds. Other comments noted that the frameworks from OpenAI and Google DeepMind were not detailed enough for their effectiveness to be determined externally. Additionally, no framework sufficiently defined specifics around conditional pauses, and a reviewer suggested trigger conditions should factor in external events and expert opinion. Multiple experts stressed that safety frameworks need to be supported by robust external reviews and oversight mechanisms or they cannot be trusted to accurately report risk levels. Anthropic's efforts toward external oversight were deemed best, if still insufficient.
Existential Safety Strategy

| | Anthropic | DeepMind | OpenAI | Zhipu AI | x.AI | Meta |
|---|---|---|---|---|---|---|
| Grade | D+ | D | D- | F | F | F |
| Score | 1.57 | 1.10 | 0.93 | 0 | 0.35 | 0.17 |
While all assessed companies have declared their intention to build artificial general intelligence or superintelligence, and most have acknowledged the existential risks potentially posed by such systems, only Google DeepMind, OpenAI and Anthropic are seriously researching how humans can remain in control and avoid catastrophic outcomes. The technical reviewers assessing this section underlined that none of the companies have put forth an official strategy for ensuring advanced AI systems remain controllable and aligned with human values. The current state of technical research on control, alignment and interpretability for advanced AI systems was judged to be immature and inadequate.
Anthropic attained the highest scores, but their approach was deemed unlikely to prevent the significant risks of superintelligent AI. Anthropic's "Core Views on AI Safety" blog post articulates a fairly detailed portrait of their strategy for ensuring safety as systems become more powerful. Experts noted that their strategy indicates a substantial depth of awareness of relevant technical issues, like deception and situational awareness. One reviewer emphasized the need to move toward logical or quantitative guarantees of safety.
OpenAI's blog post on "Planning for AGI and beyond" shares high-level principles, which reviewers consider reasonable but which cannot be considered a plan. Experts think that OpenAI's work on scalable oversight might work but is underdeveloped and cannot be relied on.
Research updates shared by Google DeepMind's Alignment Team were judged useful but immature and inadequate to ensure safety. Reviewers also stressed that relevant blog posts cannot be taken as a meaningful representation of the strategy, plans, or principles of the organization as a whole.
Neither Meta, x.AI, nor Zhipu AI has put forth plans or technical research addressing the risks posed by artificial general intelligence. Reviewers noted that Meta's open-source approach and x.AI's vision of democratized access to truth-seeking AI may help mitigate some risks from concentration of power and value lock-in.
Governance & Accountability

| | Anthropic | DeepMind | OpenAI | Zhipu AI | x.AI | Meta |
|---|---|---|---|---|---|---|
| Grade | C+ | D+ | D+ | D | F | D- |
| Score | 2.42 | 1.68 | 1.43 | 1.18 | 0.57 | 0.80 |
Reviewers noted the considerable care Anthropic's founders have invested in building a responsible governance structure, which makes it more likely to prioritize safety. Anthropic's other proactive efforts, like their responsible scaling policy, were also noted positively.
OpenAI was similarly commended for its initial non-profit structure, but recent changes, including the disbandment of safety teams and its shift to a for-profit model, raised concerns about a reduced emphasis on safety.