Lesson 10 Data Warehouse Overview
Vocabulary, Important Sentences, Questions and Answers, Problems
The term data warehouse was first coined by Bill Inmon in the early 1990s. He referred to it as being an integrated collection of information that could help companies and organizations make better decisions.
To be effective, a data warehouse had to be integrated, subject oriented, non-volatile, and time variant. In this article, I will go over all these factors in detail. If you are building a data warehouse, it is important for you to understand why they are important.
Being subject oriented means that the data will provide information about a specific subject rather than information about the functions of a company. Because a data warehouse is subject oriented, it will allow you to analyze information that is connected to a specific subject. Being integrated means that the data that is collected within the data warehouse can come from different sources, but can be combined into one unit that is relevant and logical. Being time variant means that all the information within the data warehouse can be found within a given period of time.[1]
It is important that the information contained within a data warehouse is stable. While data can be added, it should never be deleted. This property is referred to as being non-volatile. When a company uses a data warehouse that is stable, this will allow them to get a better understanding of the operations within their company. Despite the fact that these terms were first coined in the 1990s, they are still highly accurate today. However, it should be noted that some data warehouses are volatile. The reason for this is that many modern data warehouses deal with terabytes of data. Because they must store terabytes of data, many companies are forced to delete some of their information after a certain period of time. For instance, some companies will systematically delete data that has reached three years of age.
Before a data warehouse can be built, the correct data must be located. Generally, the information that will be added to the warehouse will come from daily information or historical information. The historical information may be stored in a legacy system, and is challenging to extract.
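The retention practice mentioned above, where rows are only ever appended and a scheduled job removes rows that have reached three years of age, could be sketched roughly as follows. The names `cutoff_date`, `purge_expired`, and `loaded_on` are hypothetical and chosen only for illustration. A real warehouse would run the equivalent logic inside the database.

```python
from datetime import date

RETENTION_YEARS = 3  # assumption: the three-year policy used as the example in the text


def cutoff_date(today: date, years: int = RETENTION_YEARS) -> date:
    """Return the date exactly `years` years before `today`."""
    try:
        return today.replace(year=today.year - years)
    except ValueError:
        # `today` is 29 February and the target year is not a leap year.
        return today.replace(year=today.year - years, day=28)


def purge_expired(rows: list, today: date) -> list:
    """Keep only rows that have not yet reached the retention age."""
    cutoff = cutoff_date(today)
    return [row for row in rows if row["loaded_on"] > cutoff]


warehouse_rows = [
    {"id": 1, "loaded_on": date(2021, 5, 31)},
    {"id": 2, "loaded_on": date(2021, 6, 1)},
    {"id": 3, "loaded_on": date(2021, 6, 2)},
    {"id": 4, "loaded_on": date(2024, 1, 10)},
]

remaining = purge_expired(warehouse_rows, today=date(2024, 6, 1))
print([row["id"] for row in remaining])
```

With the sample rows, only the rows loaded after the cutoff of 1 June 2021 survive the purge.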
The design of the data warehouse is important as well. It is important for designers to make sure the design is consistent with the queries that will be conducted within the warehouse. To do this successfully, it is important for designers to understand the database schema. It is crucial to make sure the data warehouse is designed correctly, as it is difficult to recreate some forms of data.
Another important aspect of data warehouses is data acquisition. Data acquisition can be defined as transferring data from a source to the warehouse. Data acquisition is one of the most expensive parts of building a data warehouse. This process will often be conducted with an ETL (Extract, Transform, Load) tool.
As of this time, there are just over 50 ETL tools being sold. It may cost a company millions of dollars to transfer data from sources to the warehouse. Once the initial data has been transferred to the data warehouse, the process must be repeated consistently. Data acquisition is a continuous process, and the goal of a company is to make sure the warehouse is updated on a regular basis. When the warehouse is updated, it is often hard to determine which information in the source has changed since the previous update. The process of dealing with this issue is called changed data capture. This process has become a separate field, and there are a number of products currently being sold to deal with it.
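One simple way to picture changed data capture is to compare a snapshot of the source taken at the previous load with a snapshot taken now. The sketch below assumes each source row has a primary key and fits in memory. Commercial products typically read database logs or use triggers instead, so this is an illustration of the idea and not how any particular tool works. The function names are hypothetical.

```python
import hashlib


def row_fingerprint(row: dict) -> str:
    """Hash a row's values in a stable column order."""
    payload = "|".join(f"{key}={row[key]}" for key in sorted(row))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def capture_changes(previous: dict, current: dict) -> dict:
    """Compare two snapshots keyed by primary key.

    Returns the keys inserted, updated, or deleted since the previous load.
    """
    prev_hashes = {key: row_fingerprint(row) for key, row in previous.items()}
    curr_hashes = {key: row_fingerprint(row) for key, row in current.items()}
    inserted = sorted(k for k in curr_hashes if k not in prev_hashes)
    deleted = sorted(k for k in prev_hashes if k not in curr_hashes)
    updated = sorted(
        k for k in curr_hashes
        if k in prev_hashes and curr_hashes[k] != prev_hashes[k]
    )
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


previous_snapshot = {
    1: {"name": "Ann", "city": "Oslo"},
    2: {"name": "Bob", "city": "Rome"},
    3: {"name": "Cid", "city": "Lima"},
}
current_snapshot = {
    1: {"name": "Ann", "city": "Oslo"},
    2: {"name": "Bob", "city": "Paris"},  # changed city
    4: {"name": "Dee", "city": "Cairo"},  # new row
}

changes = capture_changes(previous_snapshot, current_snapshot)
print(changes)
```

Only the rows listed in `changes` would need to be sent to the warehouse on this update, which is the point of capturing changes instead of reloading everything.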
It is important for data to be cleaned before it is placed in the warehouse. The data cleansing process is usually done during the data acquisition phase. Any data that is placed in a warehouse before being cleaned will pose a danger to the system, and it cannot be used. The reason for this is that the data may not be correct if it is not cleaned, and a company may make incorrect decisions based on it. This could lead to a number of problems. For example, all the information within a data warehouse that means the same thing must be stored in the same form. If there is information that reads “MS” and “Microsoft”, even though they mean the same thing, only one of them can be used to identify the element within the data warehouse.
1 Data Warehouse Tools
There are a number of important tools which are connected to data warehouses, and one of these is data aggregation. A data warehouse can be designed to store information based on a certain level of detail. For example, you can store data based on each transaction, or you can store it based on a summary. These are examples of data aggregation. When data is summarized, the queries will run at a much faster rate. However, some of the information may be lost in the summary, and this information may be important for solving a certain problem.
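The trade-off between storing each transaction and storing a summary can be seen in a small sketch. The sample data and the function name `summarize_by_day` are made up for illustration, and amounts are whole numbers to keep the arithmetic exact.

```python
from collections import defaultdict

# Transaction-level detail: one row per sale.
transactions = [
    {"day": "2024-01-01", "product": "widget", "amount": 20},
    {"day": "2024-01-01", "product": "gadget", "amount": 35},
    {"day": "2024-01-02", "product": "widget", "amount": 20},
    {"day": "2024-01-02", "product": "widget", "amount": 20},
]


def summarize_by_day(rows: list) -> dict:
    """Aggregate detail rows into one total per day."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["day"]] += row["amount"]
    return dict(totals)


daily_summary = summarize_by_day(transactions)
print(daily_summary)

# A per-product question can still be answered from the detail rows...
widget_total = sum(r["amount"] for r in transactions if r["product"] == "widget")
print(widget_total)

# ...but not from the summary, which no longer records the product.
summary_keeps_product = any("product" in str(key) for key in daily_summary)
print(summary_keeps_product)
```

The summary has two rows instead of four and answers "how much was sold per day" directly, but the product dimension is gone once only the summary is stored.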
Before you decide which one you will use, it is important to weigh your options carefully. Once you have carried out an operation, you will need to rebuild the warehouse in order to undo it. The best way to handle this situation is to make sure the data warehouse is constructed with a large amount of detail. However, the cost for this can be huge depending on the storage options you choose.
Once you have filled your data warehouse with important information, you will want to use this data to help you make smart investment decisions. The tools that can allow you to do this will fall under a topic that is called business intelligence.
Business intelligence is a field which is very diverse. It is comprised of things such as Executive Information Systems and Decision Support Systems. Business intelligence can further be broken down into a field that is called multi-dimensional analysis tools. These are tools that will allow a user to view data from a wide variety of angles. A query tool will allow a user to send SQL queries to a warehouse to look for results. Data mining is also a field that falls under business intelligence, and will allow you to look for patterns and relationships within a data warehouse.
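To make "viewing data from a wide variety of angles" concrete, the sketch below pivots the same rows along two different pairs of dimensions. It is a toy stand-in for what a multi-dimensional analysis tool does at much larger scale. The `pivot` helper and the sample rows are hypothetical.

```python
sales = [
    {"region": "North", "quarter": "Q1", "product": "widget", "revenue": 100},
    {"region": "North", "quarter": "Q2", "product": "widget", "revenue": 150},
    {"region": "South", "quarter": "Q1", "product": "gadget", "revenue": 80},
    {"region": "South", "quarter": "Q2", "product": "widget", "revenue": 120},
]


def pivot(rows: list, row_dim: str, col_dim: str, measure: str = "revenue") -> dict:
    """Build a cross-tab: one entry per row_dim value, holding totals per col_dim value."""
    table = {}
    for row in rows:
        cells = table.setdefault(row[row_dim], {})
        cells[row[col_dim]] = cells.get(row[col_dim], 0) + row[measure]
    return table


# The same facts seen from two angles.
region_by_quarter = pivot(sales, "region", "quarter")
product_by_region = pivot(sales, "product", "region")

print(region_by_quarter)
print(product_by_region)
```

Choosing different dimensions for the rows and columns changes the question being asked without changing the underlying data.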
Another tool that is connected to data warehouses is data visualization. The tools that are used for data visualization will present visual models of data. This data could come in the form of intricate 3D images. The goal of data visualization is to allow the user to view trends in a way which is easier to understand than complicated models that are based on statistics. One tool that is allowing this field to advance is VRML, or Virtual Reality Modeling Language.
In order for data warehouses to function properly, it is also important to place an emphasis on metadata management. Metadata can be described as being “information about information”.
Metadata must be managed when data is acquired or analyzed. Metadata will be held in a repository, and can give you important information about many of the data warehouse tools. The process of properly managing metadata has become a science in itself. If it is done properly, the company can greatly benefit. The reason why it is important is that it can allow organizations to analyze the changes that occur within database tables. This is a tool that plays an important part in the construction of a data warehouse.
Data warehousing is a field which is somewhat complicated. There are many vendors who are attempting to advertise the tools, but the cost and complexity involved with the products have not allowed them to be used by a large number of companies. Any company that is thinking of using data warehouses must make sure they have taken the time to review and understand the technology. It can only be useful if you know how to use it. Once you understand and acquire the technology, it is possible for you to gain a powerful advantage over your competitors. This has made data warehouses attractive to many companies.
One of the biggest advantages of data warehouses is that they allow you to store information that you can use to improve the marketing strategies of your company. Not only can you improve the marketing strategies, but you will also be able to make strategic decisions based on the information you have compiled and organized. With techniques such as data mining and data visualization, you will be able to discover important patterns that you didn’t know existed. The patterns that you discover can allow your company to earn large profits.
2 Data Warehousing Methods
Most organizations agree that data warehouses are a useful tool. They benefit from the ability to store and analyze data, and this can allow them to make sound business decisions. It is also important for them to make sure the correct information is published, and it should be easy to access by the people who are responsible for making decisions.
There are two elements that make up the data warehouse environment, and these are presentation and staging. The staging area is also known as the acquisition area. It is composed of ETL operations, and once the data has been prepared, it will be sent to the presentation area.
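The flow from source systems through the staging area to the presentation area can be sketched as three small steps. This is a minimal illustration under stated assumptions: the sources are plain lists of dictionaries, the cleaning rule is the “MS” versus “Microsoft” example from the text, and the function names `extract`, `transform`, and `load` are hypothetical and do not belong to any specific ETL product.

```python
# Assumption: a hand-maintained mapping of known aliases to one canonical form.
CANONICAL_NAMES = {"ms": "Microsoft", "microsoft": "Microsoft"}


def extract(source_rows: list) -> list:
    """Extract: copy raw rows from a source system into the staging area."""
    return [dict(row) for row in source_rows]


def transform(staged_rows: list) -> list:
    """Transform: clean staged rows so equivalent values share one form."""
    cleaned = []
    for row in staged_rows:
        vendor = row["vendor"].strip()
        cleaned.append({
            "vendor": CANONICAL_NAMES.get(vendor.lower(), vendor),
            "amount": int(row["amount"]),
        })
    return cleaned


def load(cleaned_rows: list, presentation_area: list) -> list:
    """Load: append prepared rows to the presentation area."""
    presentation_area.extend(cleaned_rows)
    return presentation_area


source = [
    {"vendor": "MS", "amount": "100"},
    {"vendor": " Microsoft ", "amount": "250"},
    {"vendor": "Oracle", "amount": "75"},
]

presentation = load(transform(extract(source)), [])
print(presentation)
```

After the run, both spellings of the vendor appear as `Microsoft` in the presentation area, and the amounts have been converted from text to integers.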
When the data is placed within the presentation area, a number of programs will analyze and review it. While many organizations agree on the overall goal of data warehouses, the approaches to building them may differ. Attempting to use data marts alone is not a good approach, because they are geared towards departments. In addition to this, attempting to use data marts alone will be inefficient, and you will run into a number of long term problems. There are two techniques for building data warehouses that have become very popular. These are the Kimball Bus Architecture and the Corporate Information Factory.
With the Kimball technique, the rough data will be transformed and refined within the staging area. It is important to make sure the data is properly handled during this step. During the staging process, the rough data will be pulled from the source systems. While some of the staging processes may be centralized, others will be distributed. The presentation area will have a dimensional structure, and this model will hold the same information as a standard model. However, it will be easier to use, and it will display information that is summarized.
Each dimensional model will be created for a business process. Departments within the organization do not play a role in this. The data will be populated once it is placed within the dimensional warehouse, and is not dependent on the various departments that may compose an organization. When business processes have been developed within the warehouse, the system will become highly efficient.
The next popular data warehouse approach that you will want to become familiar with is the Corporate Information Factory (CIF). Another name for this technique is the EDW (enterprise data warehouse) approach. The data that is extracted from the source will be coordinated.
Within the CIF, a standard data warehouse is used to hold data repositories, and it may also have specific data warehouses which are designed for data mining. The data marts may be designed for specific departments, and they may have summary data which is in the form of a dimensional structure. The atomic data may be obtained from the standard data warehouse. While there are some similarities between these two techniques, there are some notable differences as well.
One of the primary differences between these two techniques is the normalized data foundation. With the Kimball approach, the data structures that must be obtained before the dimensional presentation will be dependent on the source data and transformation. In most cases, the duplicate storage of data is not required in both dimensional and normalized foundations. Many of the people who choose to use a normalized data structure believe that it is faster than the dimensional structure, but they often fail to take ETL into consideration.
Another thing that separates the two data warehouse approaches is the management of atomic data. With the CIF, atomic data will be stored within a normalized data warehouse. In contrast, the Kimball method states that the atomic data should be placed within a dimensional structure. When the data is placed within a dimensional structure, it can be summarized in a wide variety of different ways.
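A small star schema shows how atomic data held in a dimensional structure can be summarized in several ways. The sketch uses Python's built-in `sqlite3` module with an in-memory database. The table and column names (`fact_sales`, `dim_product`, `dim_date`) are hypothetical, and the layout is a simplified illustration, not a schema prescribed by either approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT, month TEXT);
-- One fact row per atomic sale, pointing at its dimensions.
CREATE TABLE fact_sales (product_id INTEGER, date_id INTEGER, quantity INTEGER, amount INTEGER);
""")
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "widget", "hardware"), (2, "gadget", "hardware"), (3, "manual", "books")],
)
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(1, "2024-01-15", "2024-01"), (2, "2024-01-20", "2024-01"), (3, "2024-02-03", "2024-02")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
    [(1, 1, 2, 40), (2, 1, 1, 30), (3, 2, 1, 15), (1, 3, 3, 60)],
)

# Summarize the same atomic facts by product category...
by_category = dict(conn.execute("""
    SELECT p.category, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
""").fetchall())

# ...and by calendar month.
by_month = dict(conn.execute("""
    SELECT d.month, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY d.month
""").fetchall())

print(by_category)
print(by_month)
conn.close()
```

Because the fact table keeps every atomic sale, new summaries only require a different join and `GROUP BY`, with no change to the stored data.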
It is important to make sure the information you have is detailed so that users will be able to ask relevant questions. While most users will not place an emphasis on the details of one atomic transaction, they may want a summary of a large number of transactions. It is important for them to have the details so that they will be able to answer important questions. The approach that you choose should be the one which best serves the needs of your company.
3 Data Warehouse Design Strategies
To build an effective data warehouse, it is important for you to understand data warehouse design principles. If your data warehouse is not built correctly, you can run into a number of different problems.
The proper methods for building a powerful data warehouse are based on information technology tactics. First off, it is important that you and your organization understand the importance of having a data warehouse. If workers feel that a data warehouse is unnecessary, they may not use it, and this could cause conflicts. Everyone in your organization should understand the importance of using the system.
After you have got your colleagues behind the concept of using a data warehouse, you will next want to focus on data integrity. You will want to avoid designing a data warehouse that will load data that is not consistent. It is also important to avoid creating a database that will replicate data. The goal of your organization should be to integrate data and create standards that will be used and followed.
After data integrity, you will next want to look at implementation efficiency. This basically means that you will want to design a system that is simple to use. It doesn’t matter how well designed your data warehouse is if your workers have a hard time using it.
If your workers have a hard time using the data warehouse, it will slow down the speed and productivity of your operation. When it comes to creating a data warehouse, you will want to make it as simple as possible. All of your workers should be able to use it without problems. Implementation efficiency is a principle that naturally leads to the next topic you will want to focus on, and this is user friendliness. This is a concept that is an important part of your business. The reason for this is that end users will not utilize a program that is too difficult to use. It is important for you to keep them in mind. Use a design which is friendly and easy to learn.
Once you have designed a data warehouse that is user friendly, you will next want to look at operational efficiency. Once the data warehouse has been created, it should be able to carry out operations quickly. In addition to this, it should not have errors or other technical problems. When errors or technical problems do occur, they should be simple to fix. Another thing you will want to look at is the cost involved with supporting the system. You will want to keep these costs as low as possible.
The design principles that have been discussed in this article so far are more related to business than information technology. However, there are a number of IT design principles that you will want to follow. One of these is scalability. This is a problem that many data warehouse designers run into. The best way to deal with this issue is to create a data warehouse that is scalable from the beginning. Design it in a way which will allow it to support expansions or upgrades. You should be able to adapt it to a number of different business situations. The best data warehouses are those which are scalable.
The data warehouse that you design should fall under the guidelines of information technology standards. Every tool that you use to build your data warehouse should work well with IT standards. You will want to make sure it is designed in a way that makes it easier for your workers to use. While following the guidelines in this article won’t allow you to always be successful, it will greatly tip the odds in your favor. You should be wary of companies that promise you perfect results if you use their design methods.[2] No matter how well designed your data warehouse is, you will always run into problems. However, following the right principles will make the problems easier to recognize and solve.
When it comes to using a data warehouse, it is not a matter of “if” you will run into problems. It is a matter of “how” and “when”. When your data warehouse is well designed, you will be better equipped to solve any problems you encounter.
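Vocabulary
1. warehouse n. storehouse, depot.
2. go over: to be well received, to gain acceptance; to examine, to review.
3. orient vt., vi. to familiarize, to adapt; to point toward; to determine a position or direction. n. the East, Asia.
4. variant n. variation; variety; variant form. adj. different; differing; varying; diverse.
5. specific adj. definite, precise, detailed; concrete, particular, peculiar to; limited to.
6. volatile adj. flying, evaporating readily, changeable, unstable, lively, explosive. n. a winged creature, a volatile substance.
7. schema n. outline, plan, diagram, pattern.
8. acquisition n. the act of acquiring; something acquired; a person gained; a purchase.
9. aggregation n. collection, clustering, integration, gathering; an aggregate, a group.
10. strategy n. strategy, tactics, stratagem, plan of campaign; resourcefulness, artifice. strategy and tactics.
11. intricate adj. complicated, entangled, hard to understand.
12. mart n. market; trading place.
13. repository n. storehouse, storage place; container, museum; a person of great learning; a trusted person, confidant.
14. staging n. holding, conducting; arrangement, staging, stage, group of stages, transport in stages; grading.
15. populate vt. to inhabit, to settle people in; to migrate to; to colonize. a densely (sparsely) populated city.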
Important Sentences
[1] Being subject oriented means that the data will provide information about a specific subject rather than information about the functions of a company. Because a data warehouse is subject oriented, it will allow you to analyze information that is connected to a specific subject. Being integrated means that the data that is collected within the data warehouse can come from different sources, but can be combined into one unit that is relevant and logical. Being time variant means that all the information within the data warehouse can be found within a given period of time.
In other words, “subject oriented” means that the data provides information about one specific subject rather than about how the company operates. Because a data warehouse is subject oriented, it allows you to analyze the information related to a specific subject.