
FLI AI Safety Index 2024

Independent experts evaluate safety practices of leading AI companies across critical domains.

11 December 2024

Available online at: /index

Contact us: policy@


Contents

Introduction
Scorecard
Key Findings
Independent Review Panel
Index Design
Evidence Base
Grading Process
Results
Conclusions
Appendix A - Grading Sheets
Appendix B - Company Survey
Appendix C - Company Responses

About the Organization: The Future of Life Institute (FLI) is an independent nonprofit organization with the goal of reducing large-scale risks and steering transformative technologies to benefit humanity, with a particular focus on artificial intelligence (AI). Learn more at .


Introduction

Rapidly improving AI capabilities have increased interest in how companies report, assess and attempt to mitigate associated risks. The Future of Life Institute (FLI) therefore facilitated the AI Safety Index, a tool designed to evaluate and compare safety practices among leading AI companies. At the heart of the Index is an independent review panel, including some of the world's foremost AI experts. Reviewers were tasked with grading companies' safety policies on the basis of a comprehensive evidence base collected by FLI. The Index aims to incentivize responsible AI development by promoting transparency, highlighting commendable efforts, and identifying areas of concern.

Scorecard

Firm | Overall Grade | Score | Risk Assessment | Current Harms | Safety Frameworks | Existential Safety Strategy | Governance & Accountability | Transparency & Communication
Anthropic | C | 2.13 | C+ | B- | D+ | D+ | C+ | D+
Google DeepMind | D+ | 1.55 | C | C+ | D- | D | D+ | D
OpenAI | D+ | 1.32 | C | D+ | D- | D- | D+ | D-
Zhipu AI | D | 1.11 | D+ | D+ | F | F | D | C
x.AI | D- | 0.75 | F | D | F | F | F | C
Meta | F | 0.65 | D+ | D | F | F | D- | F

Grading: Uses the US GPA system for grade boundaries: A+, A, A-, B+, [...], F, with letter grades corresponding to the numerical values 4.3, 4.0, 3.7, 3.3, [...], 0.
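For readers who want to reproduce the arithmetic, the letter-to-number mapping implied by the US GPA convention above can be written out explicitly. The sketch below is illustrative Python rather than FLI tooling, and the nearest-value rule for converting an averaged score back into a letter is an assumption, since the report does not state its exact rounding rule.

```python
# US GPA letter grades and their numerical values, as cited in the grading note.
GPA = {
    "A+": 4.3, "A": 4.0, "A-": 3.7,
    "B+": 3.3, "B": 3.0, "B-": 2.7,
    "C+": 2.3, "C": 2.0, "C-": 1.7,
    "D+": 1.3, "D": 1.0, "D-": 0.7,
    "F": 0.0,
}

def letter_to_score(letter: str) -> float:
    """Return the numerical value of a letter grade."""
    return GPA[letter]

def score_to_letter(score: float) -> str:
    """Map a numerical score back to the closest letter grade (assumed rule)."""
    return min(GPA, key=lambda letter: abs(GPA[letter] - score))

# Example: an averaged score of 2.13 sits closest to C (2.0), matching
# Anthropic's overall grade of C in the scorecard above.
assert score_to_letter(2.13) == "C"
```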

Key Findings

• Large risk management disparities: While some companies have established initial safety frameworks or conducted some serious risk assessment efforts, others have yet to take even the most basic precautions.

• Jailbreaks: All the flagship models were found to be vulnerable to adversarial attacks.

• Control problem: Despite their explicit ambitions to develop artificial general intelligence (AGI) capable of rivaling or exceeding human intelligence, the review panel deemed the current strategies of all companies inadequate for ensuring that these systems remain safe and under human control.

• External oversight: Reviewers consistently highlighted how companies were unable to resist profit-driven incentives to cut corners on safety in the absence of independent oversight. While Anthropic's current and OpenAI's initial governance structures were highlighted as promising, experts called for third-party validation of risk assessment and safety framework compliance across all companies.


Independent Review Panel

The 2024 AI Safety Index was graded by an independent panel of world-renowned AI experts invited by FLI's president, MIT Professor Max Tegmark. The panel was carefully selected to ensure impartiality and a diverse range of expertise, covering both technical and governance aspects of AI. Panel selection prioritized distinguished academics and leaders from the non-profit sector to minimize potential conflicts of interest.

The panel assigned grades based on the gathered evidence base, considering both public and company-submitted information. Their evaluations, combined with actionable recommendations, aim to incentivize safer AI practices within the industry. See the "Grading Process" section for more details.

Atoosa Kasirzadeh

Atoosa Kasirzadeh is a philosopher and AI researcher, serving as an Assistant Professor at Carnegie Mellon University. Previously, she was a visiting faculty researcher at Google, a Chancellor's Fellow and Director of Research at the Centre for Technomoral Futures at the University of Edinburgh, a Research Lead at the Alan Turing Institute, an intern at DeepMind, and a Governance of AI Fellow at Oxford. Her interdisciplinary research addresses questions about the societal impacts, governance, and future of AI.

Tegan Maharaj

Tegan Maharaj is an Assistant Professor in the Department of Decision Sciences at HEC Montréal, where she leads the ERRATA lab on Ecological Risk and Responsible AI. She is also a core academic member at Mila. Her research focuses on advancing the science and techniques of responsible AI development. Previously, she served as an Assistant Professor of Machine Learning at the University of Toronto.

Yoshua Bengio

Yoshua Bengio is a Full Professor in the Department of Computer Science and Operations Research at Université de Montréal, as well as the Founder and Scientific Director of Mila and the Scientific Director of IVADO. He is the recipient of the 2018 A.M. Turing Award, a CIFAR AI Chair, a Fellow of both the Royal Society of London and Canada, an Officer of the Order of Canada, Knight of the Legion of Honor of France, Member of the UN's Scientific Advisory Board for Independent Advice on Breakthroughs in Science and Technology, and Chair of the International Scientific Report on the Safety of Advanced AI.

Jessica Newman

Jessica Newman is the Director of the AI Security Initiative (AISI), housed at the UC Berkeley Center for Long-Term Cybersecurity. She is also a Co-Director of the UC Berkeley AI Policy Hub. Newman's research focuses on the governance, policy, and politics of AI, with particular attention on comparative analysis of national AI strategies and policies, and on mechanisms for the evaluation and accountability of organizational development and deployment of AI systems.

David Krueger

David Krueger is an Assistant Professor in Robust, Reasoning and Responsible AI in the Department of Computer Science and Operations Research (DIRO) at the University of Montreal, and a Core Academic Member at Mila, UC Berkeley's Center for Human-Compatible AI, and the Center for the Study of Existential Risk. His work focuses on reducing the risk of human extinction from artificial intelligence through technical research as well as education, outreach, governance and advocacy.

Sneha Revanur

Sneha Revanur is the founder and president of Encode Justice, a global youth-led organization advocating for the ethical regulation of AI. Under her leadership, Encode Justice has mobilized thousands of young people to address challenges like algorithmic bias and AI accountability. She was featured on TIME's inaugural list of the 100 most influential people in AI.

Stuart Russell

Stuart Russell is a Professor of Computer Science at the University of California at Berkeley, holder of the Smith-Zadeh Chair in Engineering, and Director of the Center for Human-Compatible AI and the Kavli Center for Ethics, Science, and the Public. He is a recipient of the IJCAI Computers and Thought Award, the IJCAI Research Excellence Award, and the ACM Allen Newell Award. In 2021 he received the OBE from Her Majesty Queen Elizabeth and gave the BBC Reith Lectures. He co-authored the standard textbook for AI, which is used in over 1500 universities in 135 countries.


Method

Index Design

The AI Safety Index evaluates safety practices across six leading general-purpose AI developers: Anthropic, OpenAI, Google DeepMind, Meta, x.AI, and Zhipu AI. The Index provides a comprehensive assessment by focusing on six critical domains, with 42 indicators spread across these domains:

1. Risk Assessment
2. Current Harms
3. Safety Frameworks
4. Existential Safety Strategy
5. Governance & Accountability
6. Transparency & Communication

Indicators range from corporate governance policies to external model evaluation practices and empirical results on AI benchmarks focused on safety, fairness and robustness. The full set of indicators can be found in the grading sheets in Appendix A. A quick overview is given in Table 1 below. The key inclusion criteria for these indicators were:

1. Relevance: The list emphasizes aspects of AI safety and responsible conduct that are widely recognized by academic and policy communities. Many indicators were directly incorporated from related projects conducted by leading research organizations, such as Stanford's Center for Research on Foundation Models.

2. Comparability: We selected indicators that highlight meaningful differences in safety practices, which can be identified based on the available evidence. As a result, safety precautions for which conclusive differential evidence was unavailable were omitted.

Companies were selected based on their anticipated capability to build the most powerful models by 2025. Additionally, the inclusion of the Chinese firm Zhipu AI reflects our intention to make the Index representative of leading companies globally. Future iterations may focus on different companies as the competitive landscape evolves.

We acknowledge that the Index, while comprehensive, does not capture every aspect of responsible AI development and exclusively focuses on general-purpose AI. We welcome feedback on our indicator selection and strive to incorporate suitable suggestions into the next iteration of the Index.


Table 1: Full overview of indicators (grouped by domain)

Risk Assessment: Dangerous capability evaluations; Uplift trials; Pre-deployment external safety testing; Post-deployment external researcher access; Bug bounties for model vulnerabilities; Pre-development risk assessments.

Current Harms: AIR-Bench 2024; TrustLLM Benchmark; SEAL Leaderboard for adversarial robustness; Gray Swan Jailbreaking Arena leaderboard; Fine-tuning protections; Carbon offsets; Watermarking; Privacy of user inputs; Data crawling; Military, warfare & intelligence applications; Terms of Service analysis.

Safety Frameworks: Risk domains; Risk thresholds; Model evaluations; Decision making; Risk mitigations; Conditional pauses; Adherence; Assurance.

Existential Safety Strategy: Control/Alignment strategy; Capability goals; Safety research; Supporting external safety research.

Governance & Accountability: Company structure; Board of directors; Leadership; Partnerships; Internal review; Mission statement; Whistle-blower protection & non-disparagement agreements; Compliance to public commitments.

Transparency & Communication: Lobbying on safety regulations; Testimonies to policymakers; Leadership communications on catastrophic risks; Stanford's 2024 Foundation Model Transparency Index 1.1; Safety evaluation transparency.

Evidence Base

The AI Safety Index is underpinned by a comprehensive evidence base to ensure evaluations are well-informed and transparent. This evidence was compiled into detailed grading sheets, which presented company-specific data across all 42 indicators to the review panel. These sheets included hyperlinks to original sources and can be accessed in full in Appendix A. Evidence collection relied on two primary pathways:

• Publicly Available Information: Most data was sourced from publicly accessible materials, including research papers, policy documents, news articles, and industry reports. This approach enhanced transparency and enabled stakeholders to verify the information by tracing it back to its original sources.

• Company Survey: To supplement publicly available data, a targeted questionnaire was distributed to the evaluated companies. The survey aimed to gather additional insights on safety-relevant structures, processes, and strategies, including information not yet publicly disclosed.

Evidence collection spanned from May 14 to November 27, 2024. For empirical results from AI benchmarks, we noted data extraction dates to account for model updates. In line with our commitment to transparency and accountability, all collected evidence, whether public or company-provided, has been documented and made available for scrutiny in the appendix.
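To picture what one grading-sheet entry might look like as data, here is a hypothetical Python sketch: a company-indicator pair with its evidence summary, source links, and a flag for whether the information came from public material or the company survey. The field names and the example record are invented for illustration; the actual sheets are the documents reproduced in Appendix A.

```python
from dataclasses import dataclass, field

@dataclass
class GradingSheetEntry:
    """One company-indicator cell of a grading sheet (hypothetical structure)."""
    company: str                  # e.g. "Anthropic"
    domain: str                   # one of the six domains, e.g. "Risk Assessment"
    indicator: str                # e.g. "Bug bounties for model vulnerabilities"
    evidence: str                 # summary of the collected evidence
    sources: list[str] = field(default_factory=list)  # hyperlinks to original sources
    from_company_survey: bool = False                  # True if company-provided

# Illustrative record only; not taken from the report's data.
entry = GradingSheetEntry(
    company="ExampleCo",
    domain="Risk Assessment",
    indicator="Dangerous capability evaluations",
    evidence="Published a pre-deployment evaluation report.",
    sources=["https://example.com/eval-report"],
)
print(entry.company, entry.indicator)
```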


Incorporated Research and Related Work

The AI Safety Index is built on a foundation of extensive research and draws inspiration from several notable projects that have advanced transparency and accountability in the field of general-purpose AI.

Two of the most comprehensive related projects are the Risk Management Ratings produced by SaferAI, a non-profit organization with deep expertise in risk management, and AI Lab Watch, a research initiative identifying strategies for mitigating extreme risks from advanced AI and reporting on company implementation of those strategies.

The Safety Index directly integrates findings from Stanford's Center for Research on Foundation Models (CRFM), particularly their Foundation Model Transparency Index, as well as empirical results from AIR-Bench 2024, a state-of-the-art safety benchmark for GPAI systems. Additional empirical data cited includes scores from the 2024 TrustLLM Benchmark, Scale's Adversarial Robustness evaluation, and the Gray Swan Jailbreaking Arena leaderboard. These sources offer invaluable insights into the trustworthiness, fairness, and robustness of GPAI systems.

To evaluate existential safety strategies, the Index leveraged findings from a detailed mapping of technical safety research at leading AI companies by the Institute for AI Policy and Strategy. Indicators on external evaluations were informed by research led by Shayne Longpre at MIT, and the structure of the 'Safety Frameworks' section drew from relevant publications from the Centre for the Governance of AI and the research non-profit METR. Additionally, we express gratitude to the journalists working to keep companies accountable, whose reports are referenced in the grading sheets.

Company Survey

To complement publicly available data, the AI Safety Index incorporated insights from a targeted company survey. This questionnaire was designed to gather detailed information on safety-related structures, processes, and plans, including aspects not disclosed in public domains.

The survey consisted of 85 questions spanning seven categories: Cybersecurity, Governance, Transparency, Risk Assessment, Risk Mitigation, Current Harms, and Existential Safety. Questions included binary, multiple-choice, and open-ended formats, allowing companies to provide nuanced responses. The full survey is attached in Appendix B.

Survey responses were shared with the reviewers, and relevant information for the indicators was also directly integrated into the grading sheets. Information provided by companies was explicitly identified in the grading sheets. While x.AI and Zhipu AI chose to engage with the targeted questions in the survey, Anthropic, Google DeepMind and Meta only referred us to relevant sources of already publicly shared information. OpenAI decided not to support this project.

Participation incentive

While fewer than half of the companies provided substantial answers, engagement with the survey was recognized in the 'Transparency & Communication' section. Companies that chose not to engage with the survey received a penalty of one grade step. This adjustment incentivizes participation and acknowledges the value of transparency about safety practices. The penalty was communicated to the review panel within the grading sheets, and reviewers were advised not to additionally take survey participation into account when grading the relevant section. FLI remains committed to encouraging higher participation in future iterations to ensure evaluations are as robust and representative as possible.
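As a rough illustration of what a one-grade-step penalty means on this scale, the sketch below lowers a letter grade by one step (for example, C to C-). This is hypothetical Python written for this summary; whether the step was applied to letter grades or to the underlying numerical scores is not specified here.

```python
# Letter grades ordered from lowest to highest on the US GPA ladder.
GRADE_LADDER = ["F", "D-", "D", "D+", "C-", "C", "C+",
                "B-", "B", "B+", "A-", "A", "A+"]

def apply_survey_penalty(letter: str) -> str:
    """Lower a grade by one step for survey non-participation (e.g. C -> C-)."""
    position = GRADE_LADDER.index(letter)
    return GRADE_LADDER[max(position - 1, 0)]  # an F cannot drop any further

print(apply_survey_penalty("C"))  # C-
```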


Grading Process

The grading process was designed to ensure a rigorous and impartial evaluation of safety practices across the assessed companies. Following the conclusion of the evidence-gathering phase on November 27, 2024, grading sheets summarizing company-specific data were shared with an independent panel of leading AI scientists and governance experts. The grading sheets included all indicator-relevant information and instructions for scoring.

Panellists were instructed to assign grades based on an absolute scale rather than just scoring companies relative to each other. FLI included a rough grading rubric for each domain to ensure consistency in evaluations. Besides the letter grades, reviewers were encouraged to support their grades with short justifications and to provide key recommendations for improvement. Experts were encouraged to incorporate additional insights and weigh indicators according to their judgment, ensuring that their evaluations reflected both the evidence base and their specialized expertise. To account for the difference in expertise among the reviewers, FLI selected one subset to score the "Existential Safety Strategy" section and another to evaluate the section on "Current Harms." Otherwise, all experts were invited to score every section, although some preferred to grade only the domains they were most familiar with. In the end, every section was graded by four or more reviewers. Grades were aggregated into average scores for each domain, which are presented in the scorecard.

By adopting this structured yet flexible approach, the grading process not only highlights current safety practices but also identifies actionable areas for improvement, encouraging companies to strive for higher standards in future evaluations.

One can argue that large companies on the frontier should be held to the highest safety standards. We therefore initially considered giving an extra one-third of a grade point to companies with far fewer staff or significantly lower model scores. In the end, we decided against this for the sake of simplicity. This choice did not change the resulting ranking of companies.
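To make the aggregation described above concrete, here is a minimal Python sketch of one plausible implementation: each domain score is the mean of the reviewers' numerical grades for that domain, and an overall score is taken as an equal-weight mean over the domain scores. The function names, the example grades, and the equal weighting are assumptions for illustration; the report does not publish its exact aggregation code.

```python
from statistics import mean

# Repeats the US GPA mapping from the grading note in the Scorecard section.
GPA = {"A+": 4.3, "A": 4.0, "A-": 3.7, "B+": 3.3, "B": 3.0, "B-": 2.7,
       "C+": 2.3, "C": 2.0, "C-": 1.7, "D+": 1.3, "D": 1.0, "D-": 0.7, "F": 0.0}

def domain_score(reviewer_grades: list[str]) -> float:
    """Average the numerical values of all reviewer grades for one domain."""
    return mean(GPA[grade] for grade in reviewer_grades)

def overall_score(domain_scores: list[float]) -> float:
    """Assumed equal-weight mean across the six domain scores (hypothetical)."""
    return mean(domain_scores)

# Hypothetical example: four reviewers grade one domain C+, C, B-, C.
print(round(domain_score(["C+", "C", "B-", "C"]), 2))  # 2.25
```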

Results

This section presents average grades for each domain and summarizes the justifications and improvement recommendations provided by the review panel experts.

Risk Assessment

Firm | Anthropic | Google DeepMind | OpenAI | Zhipu AI | x.AI | Meta
Grade | C+ | C | C | D+ | F | D+
Score | 2.67 | 2.10 | 2.10 | 1.55 | 0 | 1.50

OpenAI, Google DeepMind, and Anthropic were commended for implementing more rigorous tests for identifying potential dangerous capabilities, such as misuse in cyber-attacks or biological weapon creation, compared to their competitors. Yet even these efforts were found to feature notable limitations, leaving the risks associated with GPAI poorly understood. OpenAI's uplift studies and evaluations for deception were notable to reviewers. Anthropic has done the most impressive work in collaborating with national AI Safety Institutes. Meta evaluated its models for dangerous capabilities before deployment, but critical threat models, such as those related to autonomy, scheming, and persuasion, remain unaddressed. Zhipu AI's Risk Assessment efforts were noted as less comprehensive, while x.AI failed to publish any substantive pre-deployment evaluations, falling significantly below industry standards. A reviewer suggested that the scope and size of human participant uplift studies should be increased and that standards for acceptable risk thresholds need to be established. Reviewers noted that only Google DeepMind and Anthropic maintain targeted bug-bounty programs for model vulnerabilities, with Meta's initiative narrowly focusing on privacy-related attacks.

Current Harms

Firm | Anthropic | Google DeepMind | OpenAI | Zhipu AI | x.AI | Meta
Grade | B- | C+ | D+ | D+ | D | D
Score | 2.83 | 2.50 | 1.68 | 1.50 | 1.00 | 1.18

Anthropic's AI systems received the highest scores on leading empirical safety and trustworthiness benchmarks, with Google DeepMind ranking second. Reviewers noted that other companies' systems attained notably lower scores, raising concerns about the adequacy of implemented safety mitigations. Reviewers criticized Meta's policy of publishing the weights of their frontier models, as this enables malicious actors to easily remove the safeguards of their models and use them in harmful ways. Google DeepMind's SynthID watermark system was recognized as a leading practice for mitigating the risks of AI-generated content misuse. In contrast, most other companies lack robust watermarking measures. Zhipu AI reported using watermarks in the survey but does not appear to document this practice on its website.

Additionally, environmental sustainability remains an area of divergence. While some companies, including Meta, actively offset their carbon footprints, others only partially achieve this or fail to report on their practices publicly. x.AI's reported use of gas turbines to power data centers is particularly concerning from a sustainability standpoint.

Further, reviewers strongly advise companies to ensure their systems are better prepared to withstand adversarial attacks. Empirical results show that models are still vulnerable to jailbreaking, with OpenAI's models being particularly vulnerable (no data for x.AI or Zhipu AI are available). DeepMind's model defences were the most robust in the included benchmarks.

The panel also criticized companies for using user-interaction data to train their AI systems. Only Anthropic and Zhipu AI use default settings which prevent the model from being trained on user interactions (except those flagged for safety review).

Safety Frameworks

Firm | Anthropic | Google DeepMind | OpenAI | Zhipu AI | x.AI | Meta
Grade | D+ | D- | D- | F | F | F
Score | 1.67 | 0.80 | 0.90 | 0.35 | 0.35 | 0.35

All six companies signed the Seoul Frontier AI Safety Commitments and pledged to develop safety frameworks with thresholds for unacceptable risks, advanced safeguards for high-risk levels, and conditions for pausing development if risks cannot be managed. As of the publication of this index, only OpenAI, Anthropic and Google DeepMind have published their frameworks. As such, the reviewers could only assess the frameworks of those three companies.


While these frameworks were judged insufficient to protect the public from unacceptable levels of risk, experts still considered the frameworks to be effective to some degree. Anthropic's framework stood out to reviewers as the most comprehensive because it detailed additional implementation guidance. One expert noted the need for a more precise characterization of catastrophic events and clearer thresholds. Other comments noted that the frameworks from OpenAI and Google DeepMind were not detailed enough for their effectiveness to be determined externally. Additionally, no framework sufficiently defined specifics around conditional pauses, and a reviewer suggested trigger conditions should factor in external events and expert opinion. Multiple experts stressed that safety frameworks need to be supported by robust external reviews and oversight mechanisms or they cannot be trusted to accurately report risk levels. Anthropic's efforts toward external oversight were deemed best, if still insufficient.

Existential Safety Strategy

Firm | Anthropic | Google DeepMind | OpenAI | Zhipu AI | x.AI | Meta
Grade | D+ | D | D- | F | F | F
Score | 1.57 | 1.10 | 0.93 | 0 | 0.35 | 0.17

While all assessed companies have declared their intention to build artificial general intelligence or superintelligence, and most have acknowledged the existential risks potentially posed by such systems, only Google DeepMind, OpenAI and Anthropic are seriously researching how humans can remain in control and avoid catastrophic outcomes. The technical reviewers assessing this section underlined that none of the companies have put forth an official strategy for ensuring advanced AI systems remain controllable and aligned with human values. The current state of technical research on control, alignment and interpretability for advanced AI systems was judged to be immature and inadequate.

Anthropic attained the highest scores, but their approach was deemed unlikely to prevent the significant risks of superintelligent AI. Anthropic's "Core Views on AI Safety" blog post articulates a fairly detailed portrait of their strategy for ensuring safety as systems become more powerful. Experts noted that their strategy indicates a substantial depth of awareness of relevant technical issues, like deception and situational awareness. One reviewer emphasized the need to move toward logical or quantitative guarantees of safety.

OpenAI's blog post on "Planning for AGI and beyond" shares high-level principles, which reviewers consider reasonable but which cannot be considered a plan. Experts think that OpenAI's work on scalable oversight might work but is underdeveloped and cannot be relied on.

Research updates shared by Google DeepMind's Alignment Team were judged useful but immature and inadequate to ensure safety. Reviewers also stressed that relevant blog posts cannot be taken as a meaningful representation of the strategy, plans, or principles of the organization as a whole.

Neither Meta, x.AI, nor Zhipu AI has put forth plans or technical research addressing the risks posed by artificial general intelligence. Reviewers noted that Meta's open-source approach and x.AI's vision of democratized access to truth-seeking AI may help mitigate some risks from concentration of power and value lock-in.


Governance & Accountability

Firm | Anthropic | Google DeepMind | OpenAI | Zhipu AI | x.AI | Meta
Grade | C+ | D+ | D+ | D | F | D-
Score | 2.42 | 1.68 | 1.43 | 1.18 | 0.57 | 0.80

Reviewers noted the considerable care Anthropic's founders have invested in building a responsible governance structure, which makes it more likely to prioritize safety. Anthropic's other proactive efforts, like their responsible scaling policy, were also noted positively.

OpenAI was similarly commended for its initial non-profit structure, but recent changes, including the disbandment of safety teams and its shift to a for-profit model, raised concerns about a reduced emphasis on safety.
