
FLI AI Safety Index 2024

Independent experts evaluate safety practices of leading AI companies across critical domains.

11 December 2024

Available online at: /index

Contact us: policy@


Contents

Introduction
Scorecard
Key Findings
Independent Review Panel
Index Design
Evidence Base
Grading Process
Results
Conclusions
Appendix A - Grading Sheets
Appendix B - Company Survey
Appendix C - Company Responses

About the Organization: The Future of Life Institute (FLI) is an independent nonprofit organization with the goal of reducing large-scale risks and steering transformative technologies to benefit humanity, with a particular focus on artificial intelligence (AI). Learn more at .


Introduction

Rapidly improving AI capabilities have increased interest in how companies report, assess and attempt to mitigate associated risks. The Future of Life Institute (FLI) therefore facilitated the AI Safety Index, a tool designed to evaluate and compare safety practices among leading AI companies. At the heart of the Index is an independent review panel, including some of the world's foremost AI experts. Reviewers were tasked with grading companies' safety policies on the basis of a comprehensive evidence base collected by FLI. The Index aims to incentivize responsible AI development by promoting transparency, highlighting commendable efforts, and identifying areas of concern.

Scorecard

Firm | Overall Grade | Score | Risk Assessment | Current Harms | Safety Frameworks | Existential Safety Strategy | Governance & Accountability | Transparency & Communication
Anthropic | C | 2.13 | C+ | B- | D+ | D+ | C+ | D+
Google DeepMind | D+ | 1.55 | C | C+ | D- | D | D+ | D
OpenAI | D+ | 1.32 | C | D+ | D- | D- | D+ | D-
Zhipu AI | D | 1.11 | D+ | D+ | F | F | D | C
x.AI | D- | 0.75 | F | D | F | F | F | C
Meta | F | 0.65 | D+ | D | F | F | D- | F

Grading: Uses the US GPA system for grade boundaries: A+, A, A-, B+, [...], F, with letter grades corresponding to the numerical values 4.3, 4.0, 3.7, 3.3, [...], 0.
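For readers who want to reproduce the arithmetic, the letter-to-number mapping implied by the US GPA convention above can be written out explicitly. The sketch below is illustrative Python rather than FLI tooling, and the nearest-value rule for converting an averaged score back into a letter is an assumption, since the report does not state its exact rounding rule.

```python
# US GPA letter grades and their numerical values, as cited in the grading note.
GPA = {
    "A+": 4.3, "A": 4.0, "A-": 3.7,
    "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7,
    "D+": 1.3, "D": 1.0, "D-": 0.7,
    "F": 0.0,
}

def letter_to_score(letter: str) -> float:
    """Return the numerical value of a letter grade."""
    return GPA[letter]

def score_to_letter(score: float) -> str:
    """Map a numerical score back to the closest letter grade (assumed rule)."""
    return min(GPA, key=lambda letter: abs(GPA[letter] - score))

# Example: an averaged score of 2.13 sits closest to C (2.0), matching
# Anthropic's overall grade of C in the scorecard above.
assert score_to_letter(2.13) == "C"
```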

Key Findings

• Large risk management disparities: While some companies have established initial safety frameworks or conducted some serious risk assessment efforts, others have yet to take even the most basic precautions.

• Jailbreaks: All the flagship models were found to be vulnerable to adversarial attacks.

• Control problem: Despite their explicit ambitions to develop artificial general intelligence (AGI) capable of rivaling or exceeding human intelligence, the review panel deemed the current strategies of all companies inadequate for ensuring that these systems remain safe and under human control.

• External oversight: Reviewers consistently highlighted how companies were unable to resist profit-driven incentives to cut corners on safety in the absence of independent oversight. While Anthropic's current and OpenAI's initial governance structures were highlighted as promising, experts called for third-party validation of risk assessment and safety framework compliance across all companies.


Independent Review Panel

The 2024 AI Safety Index was graded by an independent panel of world-renowned AI experts invited by FLI's president, MIT Professor Max Tegmark. The panel was carefully selected to ensure impartiality and a diverse range of expertise, covering both technical and governance aspects of AI. Panel selection prioritized distinguished academics and leaders from the non-profit sector to minimize potential conflicts of interest.

The panel assigned grades based on the gathered evidence base, considering both public and company-submitted information. Their evaluations, combined with actionable recommendations, aim to incentivize safer AI practices within the industry. See the "Grading Process" section for more details.

Atoosa Kasirzadeh

Atoosa Kasirzadeh is a philosopher and AI researcher, serving as an Assistant Professor at Carnegie Mellon University. Previously, she was a visiting faculty researcher at Google, a Chancellor's Fellow and Director of Research at the Centre for Technomoral Futures at the University of Edinburgh, a Research Lead at the Alan Turing Institute, an intern at DeepMind, and a Governance of AI Fellow at Oxford. Her interdisciplinary research addresses questions about the societal impacts, governance, and future of AI.

Tegan Maharaj

Tegan Maharaj is an Assistant Professor in the Department of Decision Sciences at HEC Montréal, where she leads the ERRATA lab on Ecological Risk and Responsible AI. She is also a core academic member at Mila. Her research focuses on advancing the science and techniques of responsible AI development. Previously, she served as an Assistant Professor of Machine Learning at the University of Toronto.

Yoshua Bengio

Yoshua Bengio is a Full Professor in the Department of Computer Science and Operations Research at Université de Montréal, as well as the Founder and Scientific Director of Mila and the Scientific Director of IVADO. He is the recipient of the 2018 A.M. Turing Award, a CIFAR AI Chair, a Fellow of both the Royal Society of London and Canada, an Officer of the Order of Canada, Knight of the Legion of Honor of France, Member of the UN's Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology, and Chair of the International Scientific Report on the Safety of Advanced AI.

Jessica Newman

Jessica Newman is the Director of the AI Security Initiative (AISI), housed at the UC Berkeley Center for Long-Term Cybersecurity. She is also a Co-Director of the UC Berkeley AI Policy Hub. Newman's research focuses on the governance, policy, and politics of AI, with particular attention on comparative analysis of national AI strategies and policies, and on mechanisms for the evaluation and accountability of organizational development and deployment of AI systems.

David Krueger

David Krueger is an Assistant Professor in Robust, Reasoning and Responsible AI in the Department of Computer Science and Operations Research (DIRO) at the University of Montreal, and a Core Academic Member at Mila, UC Berkeley's Center for Human-Compatible AI, and the Center for the Study of Existential Risk. His work focuses on reducing the risk of human extinction from artificial intelligence through technical research as well as education, outreach, governance and advocacy.

Sneha Revanur

Sneha Revanur is the founder and president of Encode Justice, a global youth-led organization advocating for the ethical regulation of AI. Under her leadership, Encode Justice has mobilized thousands of young people to address challenges like algorithmic bias and AI accountability. She was featured on TIME's inaugural list of the 100 most influential people in AI.

Stuart Russell

Stuart Russell is a Professor of Computer Science at the University of California at Berkeley, holder of the Smith-Zadeh Chair in Engineering, and Director of the Center for Human-Compatible AI and the Kavli Center for Ethics, Science, and the Public. He is a recipient of the IJCAI Computers and Thought Award, the IJCAI Research Excellence Award, and the ACM Allen Newell Award. In 2021 he received the OBE from Her Majesty Queen Elizabeth and gave the BBC Reith Lectures. He co-authored the standard textbook for AI, which is used in over 1500 universities in 135 countries.


Method

Index Design

The AI Safety Index evaluates safety practices across six leading general-purpose AI developers: Anthropic, OpenAI, Google DeepMind, Meta, x.AI, and Zhipu AI. The Index provides a comprehensive assessment by focusing on six critical domains, with 42 indicators spread across these domains:

1. Risk Assessment
2. Current Harms
3. Safety Frameworks
4. Existential Safety Strategy
5. Governance & Accountability
6. Transparency & Communication

Indicators range from corporate governance policies to external model evaluation practices and empirical results on AI benchmarks focused on safety, fairness and robustness. The full set of indicators can be found in the grading sheets in Appendix A. A quick overview is given in Table 1 below. The key inclusion criteria for these indicators were:

1. Relevance: The list emphasizes aspects of AI safety and responsible conduct that are widely recognized by academic and policy communities. Many indicators were directly incorporated from related projects conducted by leading research organizations, such as Stanford's Center for Research on Foundation Models.

2. Comparability: We selected indicators that highlight meaningful differences in safety practices, which can be identified based on the available evidence. As a result, safety precautions for which conclusive differential evidence was unavailable were omitted.

Companies were selected based on their anticipated capability to build the most powerful models by 2025. Additionally, the inclusion of the Chinese firm Zhipu AI reflects our intention to make the Index representative of leading companies globally. Future iterations may focus on different companies as the competitive landscape evolves.

We acknowledge that the Index, while comprehensive, does not capture every aspect of responsible AI development and exclusively focuses on general-purpose AI. We welcome feedback on our indicator selection and strive to incorporate suitable suggestions into the next iteration of the Index.


Table 1: Full overview of indicators (grouped by domain)

Risk Assessment: Dangerous capability evaluations; Uplift trials; Pre-deployment external safety testing; Post-deployment external researcher access; Bug bounties for model vulnerabilities; Pre-development risk assessments.

Current Harms: AIR-Bench 2024; TrustLLM Benchmark; SEAL Leaderboard for adversarial robustness; Gray Swan Jailbreaking Arena leaderboard; Fine-tuning protections; Carbon offsets; Watermarking; Privacy of user inputs; Data crawling; Military, warfare & intelligence applications; Terms of Service analysis.

Safety Frameworks: Risk domains; Risk thresholds; Model evaluations; Decision making; Risk mitigations; Conditional pauses; Adherence; Assurance.

Existential Safety Strategy: Control/Alignment strategy; Capability goals; Safety research; Supporting external safety research.

Governance & Accountability: Company structure; Board of directors; Leadership; Partnerships; Internal review; Mission statement; Whistle-blower protection & non-disparagement agreements; Compliance to public commitments.

Transparency & Communication: Lobbying on safety regulations; Testimonies to policymakers; Leadership communications on catastrophic risks; Stanford's 2024 Foundation Model Transparency Index 1.1; Safety evaluation transparency.

Evidence Base

The AI Safety Index is underpinned by a comprehensive evidence base to ensure evaluations are well-informed and transparent. This evidence was compiled into detailed grading sheets, which presented company-specific data across all 42 indicators to the review panel. These sheets included hyperlinks to original sources and can be accessed in full in Appendix A. Evidence collection relied on two primary pathways:

• Publicly Available Information: Most data was sourced from publicly accessible materials, including research papers, policy documents, news articles, and industry reports. This approach enhanced transparency and enabled stakeholders to verify the information by tracing it back to its original sources.

• Company Survey: To supplement publicly available data, a targeted questionnaire was distributed to the evaluated companies. The survey aimed to gather additional insights on safety-relevant structures, processes, and strategies, including information not yet publicly disclosed.

Evidence collection spanned from May 14 to November 27, 2024. For empirical results from AI benchmarks, we noted data extraction dates to account for model updates. In line with our commitment to transparency and accountability, all collected evidence, whether public or company-provided, has been documented and made available for scrutiny in the appendix.
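To picture what one grading-sheet entry might look like as data, here is a hypothetical Python sketch: a company-indicator pair with its evidence summary, source links, and a flag for whether the information came from public material or the company survey. The field names and the example record are invented for illustration; the actual sheets are the documents reproduced in Appendix A.

```python
from dataclasses import dataclass, field

@dataclass
class GradingSheetEntry:
    """One company-indicator cell of a grading sheet (hypothetical structure)."""
    company: str                  # e.g. "Anthropic"
    domain: str                   # one of the six domains, e.g. "Risk Assessment"
    indicator: str                # e.g. "Bug bounties for model vulnerabilities"
    evidence: str                 # summary of the collected evidence
    sources: list[str] = field(default_factory=list)  # hyperlinks to original sources
    from_company_survey: bool = False                  # True if company-provided

# Illustrative record only; not taken from the report's data.
entry = GradingSheetEntry(
    company="ExampleCo",
    domain="Risk Assessment",
    indicator="Dangerous capability evaluations",
    evidence="Published a pre-deployment evaluation report.",
    sources=["https://example.com/eval-report"],
)
print(entry.company, entry.indicator)
```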


Incorporated Research and Related Work

The AI Safety Index is built on a foundation of extensive research and draws inspiration from several notable projects that have advanced transparency and accountability in the field of general-purpose AI.

Two of the most comprehensive related projects are the Risk Management Ratings produced by SaferAI, a non-profit organization with deep expertise in risk management, and AI Lab Watch, a research initiative identifying strategies for mitigating extreme risks from advanced AI and reporting on company implementation of those strategies.

The Safety Index directly integrates findings from Stanford's Center for Research on Foundation Models (CRFM), particularly their Foundation Model Transparency Index, as well as empirical results from AIR-Bench 2024, a state-of-the-art safety benchmark for GPAI systems. Additional empirical data cited includes scores from the 2024 TrustLLM Benchmark, Scale's Adversarial Robustness evaluation, and the Gray Swan Jailbreaking Arena leaderboard. These sources offer invaluable insights into the trustworthiness, fairness, and robustness of GPAI systems.

To evaluate existential safety strategies, the Index leveraged findings from a detailed mapping of technical safety research at leading AI companies by the Institute for AI Policy and Strategy. Indicators on external evaluations were informed by research led by Shayne Longpre at MIT, and the structure of the 'Safety Frameworks' section drew from relevant publications from the Centre for the Governance of AI and the research non-profit METR. Additionally, we express gratitude to the journalists working to keep companies accountable, whose reports are referenced in the grading sheets.

Company Survey

To complement publicly available data, the AI Safety Index incorporated insights from a targeted company survey. This questionnaire was designed to gather detailed information on safety-related structures, processes, and plans, including aspects not disclosed in public domains.

The survey consisted of 85 questions spanning seven categories: Cybersecurity, Governance, Transparency, Risk Assessment, Risk Mitigation, Current Harms, and Existential Safety. Questions included binary, multiple-choice, and open-ended formats, allowing companies to provide nuanced responses. The full survey is attached in Appendix B.

Survey responses were shared with the reviewers, and relevant information for the indicators was also directly integrated into the grading sheets. Information provided by companies was explicitly identified in the grading sheets. While x.AI and Zhipu AI chose to engage with the targeted questions in the survey, Anthropic, Google DeepMind and Meta only referred us to relevant sources of already publicly shared information. OpenAI decided not to support this project.

Participation incentive

While fewer than half of the companies provided substantial answers, engagement with the survey was recognized in the 'Transparency & Communication' section. Companies that chose not to engage with the survey received a penalty of one grade step. This adjustment incentivizes participation and acknowledges the value of transparency about safety practices. The penalty was communicated to the review panel within the grading sheets, and reviewers were advised not to additionally take survey participation into account when grading the relevant section. FLI remains committed to encouraging higher participation in future iterations to ensure evaluations are as robust and representative as possible.
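As a rough illustration of what a one-grade-step penalty means on this scale, the sketch below lowers a letter grade by one step (for example, C to C-). This is hypothetical Python written for this summary; whether the step was applied to letter grades or to the underlying numerical scores is not specified here.

```python
# Letter grades ordered from lowest to highest on the US GPA ladder.
GRADE_LADDER = ["F", "D-", "D", "D+", "C-", "C", "C+",
                "B-", "B", "B+", "A-", "A", "A+"]

def apply_survey_penalty(letter: str) -> str:
    """Lower a grade by one step for survey non-participation (e.g. C -> C-)."""
    position = GRADE_LADDER.index(letter)
    return GRADE_LADDER[max(position - 1, 0)]  # an F cannot drop any further

print(apply_survey_penalty("C"))  # C-
```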


Grading Process

The grading process was designed to ensure a rigorous and impartial evaluation of safety practices across the assessed companies. Following the conclusion of the evidence-gathering phase on November 27, 2024, grading sheets summarizing company-specific data were shared with an independent panel of leading AI scientists and governance experts. The grading sheets included all indicator-relevant information and instructions for scoring.

Panellists were instructed to assign grades based on an absolute scale rather than just scoring companies relative to each other. FLI included a rough grading rubric for each domain to ensure consistency in evaluations. Besides the letter grades, reviewers were encouraged to support their grades with short justifications and to provide key recommendations for improvement. Experts were encouraged to incorporate additional insights and weigh indicators according to their judgment, ensuring that their evaluations reflected both the evidence base and their specialized expertise. To account for the difference in expertise among the reviewers, FLI selected one subset to score the "Existential Safety Strategy" section and another to evaluate the section on "Current Harms." Otherwise, all experts were invited to score every section, although some preferred to grade only the domains they were most familiar with. In the end, every section was graded by four or more reviewers. Grades were aggregated into average scores for each domain, which are presented in the scorecard.

By adopting this structured yet flexible approach, the grading process not only highlights current safety practices but also identifies actionable areas for improvement, encouraging companies to strive for higher standards in future evaluations.

One can argue that large companies on the frontier should be held to the highest safety standards. We therefore initially considered giving an extra one-third of a grade point to companies with far fewer staff or significantly lower model scores. In the end, we decided against this for the sake of simplicity. This choice did not change the resulting ranking of companies.
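To make the aggregation described above concrete, here is a minimal Python sketch of one plausible implementation: each domain score is the mean of the reviewers' numerical grades for that domain, and an overall score is taken as an equal-weight mean over the domain scores. The function names, the example grades, and the equal weighting are assumptions for illustration; the report does not publish its exact aggregation code.

```python
from statistics import mean

# Repeats the US GPA mapping from the grading note in the Scorecard section.
GPA = {"A+": 4.3, "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7,
       "C+": 2.3, "C": 2.0, "C-": 1.7, "D+": 1.3, "D": 1.0, "D-": 0.7, "F": 0.0}

def domain_score(reviewer_grades: list[str]) -> float:
    """Average the numerical values of all reviewer grades for one domain."""
    return mean(GPA[grade] for grade in reviewer_grades)

def overall_score(domain_scores: list[float]) -> float:
    """Assumed equal-weight mean across the six domain scores (hypothetical)."""
    return mean(domain_scores)

# Hypothetical example: four reviewers grade one domain C+, C, B-, C.
print(round(domain_score(["C+", "C", "B-", "C"]), 2))  # 2.25
```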

Results

This section presents average grades for each domain and summarizes the justifications and improvement recommendations provided by the review panel experts.

Risk Assessment

Firm | Anthropic | Google DeepMind | OpenAI | Zhipu AI | x.AI | Meta
Grade | C+ | C | C | D+ | F | D+
Score | 2.67 | 2.10 | 2.10 | 1.55 | 0 | 1.50

OpenAI, Google DeepMind, and Anthropic were commended for implementing more rigorous tests for identifying potential dangerous capabilities, such as misuse in cyber-attacks or biological weapon creation, compared to their competitors. Yet even these efforts were found to feature notable limitations, leaving the risks associated with GPAI poorly understood. OpenAI's uplift studies and evaluations for deception were notable to reviewers. Anthropic has done the most impressive work in collaborating with national AI Safety Institutes. Meta evaluated its models for dangerous capabilities before deployment, but critical threat models, such as those related to autonomy, scheming, and persuasion, remain unaddressed. Zhipu AI's Risk Assessment efforts were noted as less comprehensive, while x.AI failed to publish any substantive pre-deployment evaluations, falling significantly below industry standards. A reviewer suggested that the scope and size of human participant uplift studies should be increased and that standards for acceptable risk thresholds need to be established. Reviewers noted that only Google DeepMind and Anthropic maintain targeted bug-bounty programs for model vulnerabilities, with Meta's initiative narrowly focusing on privacy-related attacks.

Current Harms

Firm | Anthropic | Google DeepMind | OpenAI | Zhipu AI | x.AI | Meta
Grade | B- | C+ | D+ | D+ | D | D
Score | 2.83 | 2.50 | 1.68 | 1.50 | 1.00 | 1.18

Anthropic's AI systems received the highest scores on leading empirical safety and trustworthiness benchmarks, with Google DeepMind ranking second. Reviewers noted that other companies' systems attained notably lower scores, raising concerns about the adequacy of implemented safety mitigations. Reviewers criticized Meta's policy of publishing the weights of their frontier models, as this enables malicious actors to easily remove the safeguards of their models and use them in harmful ways. Google DeepMind's SynthID watermark system was recognized as a leading practice for mitigating the risks of AI-generated content misuse. In contrast, most other companies lack robust watermarking measures. Zhipu AI reported using watermarks in the survey but does not appear to document this practice on its website.

Additionally, environmental sustainability remains an area of divergence. While some companies, including Meta, actively offset their carbon footprints, others only partially achieve this or fail to report on their practices publicly. x.AI's reported use of gas turbines to power data centers is particularly concerning from a sustainability standpoint.

Further, reviewers strongly advise companies to ensure their systems are better prepared to withstand adversarial attacks. Empirical results show that models are still vulnerable to jailbreaking, with OpenAI's models being particularly vulnerable (no data for x.AI or Zhipu AI are available). DeepMind's model defences were the most robust in the included benchmarks.

The panel also criticized companies for using user-interaction data to train their AI systems. Only Anthropic and Zhipu AI use default settings which prevent the model from being trained on user interactions (except those flagged for safety review).

Safety Frameworks

Firm | Anthropic | Google DeepMind | OpenAI | Zhipu AI | x.AI | Meta
Grade | D+ | D- | D- | F | F | F
Score | 1.67 | 0.80 | 0.90 | 0.35 | 0.35 | 0.35

All six companies signed the Seoul Frontier AI Safety Commitments and pledged to develop safety frameworks with thresholds for unacceptable risks, advanced safeguards for high-risk levels, and conditions for pausing development if risks cannot be managed. As of the publication of this index, only OpenAI, Anthropic and Google DeepMind have published their frameworks. As such, the reviewers could only assess the frameworks of those three companies.


While these frameworks were judged insufficient to protect the public from unacceptable levels of risk, experts still considered the frameworks to be effective to some degree. Anthropic's framework stood out to reviewers as the most comprehensive because it detailed additional implementation guidance. One expert noted the need for a more precise characterization of catastrophic events and clearer thresholds. Other comments noted that the frameworks from OpenAI and Google DeepMind were not detailed enough for their effectiveness to be determined externally. Additionally, no framework sufficiently defined specifics around conditional pauses, and a reviewer suggested trigger conditions should factor in external events and expert opinion. Multiple experts stressed that safety frameworks need to be supported by robust external reviews and oversight mechanisms or they cannot be trusted to accurately report risk levels. Anthropic's efforts toward external oversight were deemed best, if still insufficient.

Existential Safety Strategy

Firm | Anthropic | Google DeepMind | OpenAI | Zhipu AI | x.AI | Meta
Grade | D+ | D | D- | F | F | F
Score | 1.57 | 1.10 | 0.93 | 0 | 0.35 | 0.17

While all assessed companies have declared their intention to build artificial general intelligence or superintelligence, and most have acknowledged the existential risks potentially posed by such systems, only Google DeepMind, OpenAI and Anthropic are seriously researching how humans can remain in control and avoid catastrophic outcomes. The technical reviewers assessing this section underlined that none of the companies have put forth an official strategy for ensuring advanced AI systems remain controllable and aligned with human values. The current state of technical research on control, alignment and interpretability for advanced AI systems was judged to be immature and inadequate.

Anthropic attained the highest scores, but their approach was deemed unlikely to prevent the significant risks of superintelligent AI. Anthropic's "Core Views on AI Safety" blog post articulates a fairly detailed portrait of their strategy for ensuring safety as systems become more powerful. Experts noted that their strategy indicates a substantial depth of awareness of relevant technical issues, like deception and situational awareness. One reviewer emphasized the need to move toward logical or quantitative guarantees of safety.

OpenAI's blog post on "Planning for AGI and beyond" shares high-level principles, which reviewers consider reasonable but which cannot be considered a plan. Experts think that OpenAI's work on scalable oversight might work but is underdeveloped and cannot be relied on.

Research updates shared by Google DeepMind's Alignment Team were judged useful but immature and inadequate to ensure safety. Reviewers also stressed that relevant blog posts cannot be taken as a meaningful representation of the strategy, plans, or principles of the organization as a whole.

Neither Meta, x.AI, nor Zhipu AI has put forth plans or technical research addressing the risks posed by artificial general intelligence. Reviewers noted that Meta's open-source approach and x.AI's vision of democratized access to truth-seeking AI may help mitigate some risks from concentration of power and value lock-in.


Governance & Accountability

Firm | Anthropic | Google DeepMind | OpenAI | Zhipu AI | x.AI | Meta
Grade | C+ | D+ | D+ | D | F | D-
Score | 2.42 | 1.68 | 1.43 | 1.18 | 0.57 | 0.80

Reviewers noted the considerable care Anthropic's founders have invested in building a responsible governance structure, which makes it more likely to prioritize safety. Anthropic's other proactive efforts, like their responsible scaling policy, were also noted positively.

OpenAI was similarly commended for its initial non-profit structure, but recent changes, including the disbandment of safety teams and its shift to a for-profit model, raised concerns about a reduced emphasis on safety.
