Professional English for Software Engineering: Big Data

SOFTWARE ENGINEERING ESSENTIALS

COMPETENCIES

After you have read this chapter, you should be able to:

  • Explain what big data is.
  • Explain the 4V properties of big data: volume, velocity, variety, and veracity.
  • Distinguish the categories of big data.
  • Understand the big value of big data.
  • Describe Jim Gray’s Fourth Paradigm.
  • Understand the evolution of big data.
  • Discuss the four kinds of challenges for big data.

We are experiencing unprecedented growth in the amount of data available in nearly every area, ranging from the physical world (biology, astronomy, remote sensing, etc.) to human activities (social networks, the Internet, health, finance, economics, transportation, etc.). These data are commonly called big data and are believed to contain great value and to imply new opportunities. This chapter presents an overview of big data, including its definition, properties, and categories, as well as the challenges it brings.

CHAPTER 6 BIG DATA

6.1 BIG DATA AND ITS PROPERTIES
6.2 CATEGORIES OF BIG DATA
6.3 BIG VALUE FOR BIG DATA
6.4 JIM GRAY’S FOURTH PARADIGM
6.5 EVOLUTION OF DATA MANAGEMENT
6.6 BIG DATA CHALLENGES
SUMMARY

6.1 BIG DATA AND ITS PROPERTIES

Big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Most professionals in the industry consider multiple terabytes or petabytes to be the current big data benchmark. Others, however, are hesitant to commit to a specific quantity, as the rapid pace of technological development may turn today’s concept of big into tomorrow’s norm. In the meantime, a consensus has been reached on the properties of big data.

1. WHAT IS BIG DATA?
2. FOUR DIMENSIONS OF BIG DATA

WHAT IS BIG DATA?

In a dynamic world, organizations have begun to rely more heavily on insights derived from their data to uncover new facts and opportunities for scientific discovery or revenue growth. In the process of discovering and determining these insights, large, complex data sets are generated that must then be managed, analyzed, and manipulated by skilled professionals. This large collection of data is collectively known as big data. The definition of big data from Wikipedia follows.

“Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable elapsed time. Big data ‘size’ is a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data. Big data is a set of techniques and technologies that require new forms of integration to uncover large hidden values from large datasets that are diverse, complex, and of a massive scale.”

FOUR DIMENSIONS OF BIG DATA

In a 2001 research report and related lectures, META Group (now Gartner) analyst Doug Laney defined data growth challenges and opportunities as being three-dimensional, i.e., increasing volume (amount of data), velocity (speed of data in and out), and variety (range of data types and sources). In 2012, Gartner updated its definition as follows: “Big data is high volume, high velocity, and/or high variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization.” [4] Additionally, some organizations add a fourth V, “Veracity”.

Volume
The size of the data determines the value and potential of the data under consideration, and whether it can actually be considered big data or not. Big data doesn’t sample. It just observes and tracks what happens.

Velocity
Velocity is an indication of how quickly the data can be made available for analysis. Big data is often available in real time.

Variety
Variety refers to the different types of structured and unstructured data that organizations can collect, such as transaction data, video, audio, text, and log files. Big data draws from all types of data.

Veracity
Veracity is an indication of data integrity and of an organization’s ability to trust the data and use it confidently to make crucial decisions.

Different kinds of data show different properties. For example, social network data places heavier demands on volume and velocity than on variety and veracity.

6.2 CATEGORIES OF BIG DATA

The amount of data in our world has been exploding as our ability to acquire data improves. Big data is ubiquitous and can be partitioned into two categories:

1. DATA FROM THE PHYSICAL WORLD
2. DATA FROM HUMAN ACTIVITIES

DATA FROM THE PHYSICAL WORLD

Scientific experiments and observations produce massive scientific data sets about the physical world.

  • Sensor networks consisting of large numbers of cheap sensors have been widely used to obtain natural data.
  • The Ocean Observatories Initiative uses electro-optically cabled observing systems to measure ocean activities in the northeast Pacific Ocean.
  • The Panoramic Survey Telescope and Rapid Response System (Pan-STARRS) produces 2.5 PB of data each year.
  • Over 5000 genome projects in 2010 produced several EB of genomic data.
  • Remote sensing produces even larger natural data sets.

DATA FROM HUMAN ACTIVITIES

Big data sets produced by human activities, such as social networks, the Internet, health, finance, economics, and transportation, have attracted much attention in recent years. To gain a better understanding of how much data is being generated, consider the following noteworthy facts:

  • Facebook currently holds more than 45 billion photos in its user database, a number that is growing daily.
  • According to IBM, users create 2.5 quintillion bytes of data every day. In practical terms, this means that 90% of the data in the world today has been created in the last two years alone.
  • According to FICO, the credit card fraud detection system currently in place helps protect over two billion accounts all over the globe.
  • Walmart handles more than 1 million customer transactions every hour, which are then transferred into a database holding over 2.5 petabytes of information.

6.3 BIG VALUE FOR BIG DATA

Ginni Rometty, President and Chief Executive Officer of IBM, has said that “Data is the world’s great new natural resource. What steam power was to the 18th century, electromagnetism to the 19th and fossil fuels to the 20th … data will be to the 21st.” As with other essential factors of production such as hard assets and human capital, it is increasingly the case that much of modern economic activity, innovation, and growth simply couldn’t take place without data.

While digital data might once have concerned only a few data geeks, big data is now relevant for leaders across every sector, and consumers of products and services stand to benefit from its application.

The combination of deepening investments in big data and managerial innovation to create competitive advantage and boost productivity is very similar to the way IT developed from the 1970s onward. The experience of IT strongly suggests that we could be on the cusp of a new wave of productivity growth enabled by the use of big data.

There are many ways that big data can be used to create value. Large companies across the globe have scored early successes in their use of big data, and several are well known for their extensive and effective use of data. For instance, Tesco’s loyalty program generates a tremendous amount of customer data that the company mines to inform decisions ranging from promotions to strategic segmentation of customers. Amazon uses customer data to power its “you may also like…” recommendation engine, which is based on a predictive modeling technique called collaborative filtering.

By making supply and demand signals visible between retail stores and suppliers, Wal-Mart was an early adopter of vendor-managed inventory to optimize the supply chain. Harrah’s, the US hotels and casinos group, compiles detailed holistic profiles of its customers and uses them to tailor marketing in a way that has increased customer loyalty. Progressive Insurance and Capital One are both known for conducting experiments to segment their customers systematically and effectively and to tailor product offers accordingly. Smart, a leading wireless player in the Philippines, analyzes its penetration, retailer coverage, and average revenue per user at the city or town level in order to focus on the micromarkets with the most potential.

McKinsey & Company observed how big data creates value after in-depth research on U.S. healthcare, EU public sector administration, U.S. retail, global manufacturing, and global personal location data. Through research on these five core industries that represent the global economy, the McKinsey report pointed out that big data may bring its economic function into full play, improve the productivity and competitiveness of enterprises and public sectors, and create huge benefits for consumers.

McKinsey summarized the value that big data could create. If big data could be creatively and effectively utilized to improve efficiency and quality, the potential value gained by the U.S. medical industry through data may surpass USD 300 billion, reducing U.S. healthcare expenditure by over 8%. Retailers that fully utilize big data may improve their profit by more than 60%. Big data may also be utilized to improve the efficiency of government operations, such that the developed economies in Europe could save over EUR 100 billion (which excludes the effect of reduced fraud, errors, and tax differences).
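As a side note on the Amazon example earlier in this section, the idea behind collaborative filtering can be illustrated with a small sketch. The code below is not Amazon's system; it is a minimal item-based variant under stated assumptions: the purchase data, the function names `item_buyers`, `cosine`, and `recommend`, and the choice of cosine similarity over sets of buyers are all made up for illustration. Items are scored for a customer by how often they were bought by the same people who bought the items that customer already owns.

```python
from math import sqrt

# Hypothetical purchase history: customer -> set of items bought (illustrative data only).
purchases = {
    "ann": {"book", "lamp", "pen"},
    "bob": {"book", "pen"},
    "cat": {"lamp", "mug"},
    "dan": {"book", "pen", "lamp"},
}


def item_buyers(purchases):
    """Invert customer -> items into item -> set of customers who bought it."""
    buyers = {}
    for customer, items in purchases.items():
        for item in items:
            buyers.setdefault(item, set()).add(customer)
    return buyers


def cosine(a, b):
    """Cosine similarity between two buyer sets, treated as binary vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / sqrt(len(a) * len(b))


def recommend(customer, purchases, top_n=2):
    """Rank items the customer has not bought by similarity to items they own."""
    buyers = item_buyers(purchases)
    owned = purchases[customer]
    scores = {}
    for candidate in buyers:
        if candidate in owned:
            continue
        # Sum the similarity between the candidate and every item already owned.
        scores[candidate] = sum(cosine(buyers[candidate], buyers[o]) for o in owned)
    # Highest score first; ties are broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:top_n]]


print(recommend("bob", purchases))
```

In this toy data, the lamp ranks first for "bob" because two of the three customers who bought his items also bought a lamp. A production recommender would work on far larger, sparser data and would typically weight ratings or purchase counts rather than simple set membership.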

The McKinsey report is regarded as prospective and predictive, while the following facts may validate the value of big data.

During the 2009 flu pandemic, Google obtained timely information by analyzing big data, and this information was even more valuable than that provided by disease prevention centers. Nearly all countries required hospitals to inform agencies such as disease prevention centers of cases of the new type of influenza. However, patients usually did not see doctors immediately after becoming infected. It also took some time to send information from hospitals to disease prevention centers, and for the centers to analyze and summarize it. Therefore, by the time the public became aware of the pandemic, the disease might already have been spreading for one to two weeks, a serious time lag.

Google found that during the spread of influenza, the entries most frequently searched for on its search engine differed from those at ordinary times, and the usage frequencies of these entries were correlated with the spread of influenza in both time and location. Google found 45 search entry groups that were closely relevant to the outbreak of influenza and incorporated them into specific mathematical models to forecast the spread of influenza and even to predict the places from which it would spread. The related research results have been published in Nature.

In 2008, Microsoft purchased Farecast, a U.S. technology start-up. Farecast has an airline ticket forecasting system that predicts the trends and the rising or dropping ranges of airline ticket prices. The system has been incorporated into Microsoft’s Bing search engine. By 2012, the system had saved nearly USD 50 per ticket per passenger, with forecast accuracy as high as 75%.

6.4 JIM GRAY’S FOURTH PARADIGM

First: Experimental Science
Originally, there was just experimental science describing natural phenomena.

Second: Theoretical Science
Then, in the last few hundred years, there was theoretical science, with Kepler’s Laws, Newton’s Laws of Motion, Maxwell’s equations, and so on.

Third: Computational Science
In the last few decades, for many problems, the theoretical models grew too complicated to solve analytically, and people had to start simulating. These simulations have carried us through much of the last half of the last millennium. At this point, these simulations are generating a whole lot of data, along with a huge increase in data from the experimental sciences. People now do not actually look through telescopes. Instead, they are “looking” through large-scale, complex instruments which relay data to data centers, and only then do they look at the information on their computers.

Fourth: Data-intensive Science
The world of science has changed, and there is no question about this. The new model is for the data to be captured by instruments or generated by simulations before being processed by software, and for the resulting information or knowledge to be stored in computers. Scientists only get to look at their data fairly late in this pipeline. The techniques and technologies for such data-intensive science are so different that it is worth distinguishing data-intensive science from computational science as a new, fourth paradigm for scientific exploration.

Jim Gray believes that scientific discovery has experienced four paradigms.

6.5 EVOLUTION OF DATA MANAGEMENT

In the late 1970s
The concept of the “database machine” emerged, a technology specially used for storing and analyzing data. With the increase of data volume, the storage and processing capacity of a single mainframe computer system became inadequate.

In the 1980s
People proposed “share nothing”, a parallel database system, to meet the demand of increasing data volume. The share-nothing system architecture is based on the use of a cluster in which every machine has its own processor, storage, and disk. The Teradata system was the first successful commercial parallel database system. Such databases have since become very popular.

On June 2, 1986
A milestone event occurred when Teradata delivered the first parallel database system, with a storage capacity of 1 TB, to Kmart to help the large-scale North American retail company expand its data warehouse.

In the late 1990s
The advantages of the parallel database were widely recognized in the database field.

In January 2007
However, many challenges of big data arose. Content generated by users, sensors, and other ubiquitous data sources drove overwhelming data flows, which required a fundamental change in computing architecture and large-scale data processing mechanisms. Jim Gray, a pioneer of database software, called this transformation “The Fourth Paradigm”. He thought the only way to cope with such a paradigm was to develop a new generation of computing tools to manage, visualize, and analyze massive data.

In June 2011
Another milestone event occurred when EMC/IDC published a research report titled Extracting Values from Chaos, which introduced the concept and potential of big data for the first time. This research report aroused great interest in big data in both industry and academia.

Industry
Over the past few years, nearly all major companies, including EMC, Oracle, IBM, Microsoft, Google, Amazon, and Facebook, have started their big data projects. Taking IBM as an example, since 2005 IBM has invested USD 16 billion in 30 acquisitions related to big data.

Academia
In academia, big data was also under the spotlight. In 2008, Nature published a special issue on big data. In 2011, Science also launched a special issue on the key technologies of “data processing” in big data. In 2012, European Research Consortium for Informatics and Mathematics (ERCIM) News published a special issue on big data. At the beginning of 2012, a report titled Big Data, Big Impact, presented at the Davos Forum in Switzerland, announced that big data has become a new kind of economic asset, just like currency or gold. Gartner, an international research agency, issued Hype Cycles from 2012 to 2013, which classified big data computing, social analysis, and stored data analysis among the 48 emerging technologies that deserve the most attention.

Government
Many national governments, such as that of the U.S., also paid great attention to big data. In March 2012, the Obama Administration announced a USD 200 million investment to launch the Big Data Research and Development Initiative, the second major scientific and technological development initiative after the Information Highway Initiative in 1993. In July 2012, Japan’s ICT project, issued by the Ministry of Internal Affairs and Communications, indicated that big data development should be a national strategy and that application technologies should be the focus. In July 2012, the United Nations issued the Big Data for Development report, which summarized how governments utilized big data to better serve and protect their people.

6.6 BIG DATA CHALLENGES

Volume
Data volume poses the most noticeable challenge because of limited hardware capacity and limited software efficiency and effectiveness. Hardware capacity includes the size and speed of storage systems, CPU frequency and parallelism, and communication bandwidth. Software efficiency and effectiveness have to do with architectures and algorithms. These issues should be revisited to address the challenge of big data volume.

Velocity
The challenge of big data velocity comes from the need to generate results in a timely manner and to process data as it arrives. Big data velocity brings challenges to every layer of a data management platform. Both the storage layer and the query
