版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
Extraction,Transformation,
andLoading(ETL)ExtractionandTransportationObjectivesAftercompletingthislesson,youshouldbeabletodothefollowing:DescribethecoreETLframeworkinsidethedatabaseanditsintegrationadvantageExplaindatawarehousingextractionmethodsIdentifytransportationmethods:FlatfileDistributesoperationsTransportabletablespacesDescribetransformationflowOverviewLesson4:Extraction/TransportationLesson5:LoadingLesson6:TransformationWhatIsETL?ETLisanacronymforExtraction,Transformation,andLoading.ThefollowinghappenduringtheETLprocess:Thedesireddataisidentifiedandextractedfrommanydifferentsources.Sometransformationsmaytakeplaceduringthisextractionprocess.Afterextraction,thedatamustbetransportedtoatargetsystemoranintermediatesystemforfurtherprocessing.Dependingonthemethodoftransportation,sometransformationscanbedonesimultaneously.ETLreferstoabroadprocess.ExtractionMethodsExtractioncanbethoughtofintwoparts:ExtractionTransportationTherearetwoextractionmethods:LogicalPhysicalYourlogicalchoiceinfluencesthewaythedataisphysicallyextracted.Somecriteriaforchoosingacombination:BusinessneedsLocationofthesourceandtargetsystemsAvailabilityofthesourcesystemTimerequiredtoextractdataLogicalExtractionMethodsTherearetwokindsoflogicalextraction:FullextractionAlldataispulledLessinformationtotrackMoretimerequiredtopullthedataIncrementalextractionAsubsetofdataispulledMusttrackwhatdataneedstobepulledLesstimerequiredtopullthedataPhysicalExtractionMethodsTherearetwotypesofphysicalextraction.Onlineextraction:PullsdatafromthesourcesystemOfflineextraction:PullsdatafromastagingareaStagingareasincludeflatfiles,dumpfiles,andtransportabletablespaces.OfflineExtractionStagingareas:FlatfilesRequiresdatainapredefined,genericformatDumpfilesMustbeinanOracle-specificformatRedoandarchivelogsDatalocatedinspecialdumpfilesTransportabletablespacesPowerful,fastmethodformovinglargevolumesofdataImplementingMethodsofExtractionExtractingtoafile:SpoolingfromSQL*PlusUsingOCIorPro*CtodumptoafileUsingDataPumptoexporttoanOracledumpfileUsingexternaltablesExtractingthroughdistributedoperationsIncrementalExtractionUsingCDCCDCcancaptureandpublishcommittedchangedataineitherofthefollowingmodes:SynchronousTriggersonthesourcedatabaseallowchangedatatobecapturedimmediately.Changedataiscapturedaspartofthetransactionmodifyingthesourcetable.AsynchronousChangedataiscapturedafteraSQLstatementperformingDMLiscommittedusingtheredologs.AsynchronousChangeDataCaptureisbuiltonOracleStreams.PublishandSubscribeModelThepublisherperformsthefollowingtasks:IdentifiessourcetablesfromwhichthedatawarehouseisinterestedincapturingchangedataUsestheDBMS_CDC_PUBLISHpackageto:SetupthecaptureofdatafromthesourcetablesDetermineandadvancethechangesetsPublishthechangedataAllowscontrolledaccesstosubscribersusingtheSQLGRANT
andREVOKE
statementsPublishandSubscribeModelThesubscriberusestheDBMS_CDC_SUBSCRIBEpackageto:SubscribetosourcetablesExtendthewindowandcreatechangeviewPreparethesubscriberviewsViewdatastoredinchangetablesPurgethesubscriberviewRemovethesubscriberviewsSynchronousCDCSourcetablesSourcedatabasetransactionsSYNC_SOURCEChangesourceChangesetChangetablesSubscriber
viewsTriggerexecutionSourcedatabaseAsynchronousCDCAsynchronousCDC:CaptureschangedatafromredologfilesafterchangeshavebeencommittedtothesourcedatabaseModesaredependentonthelevelofsupplementalloggingusedonthesourcedatabaseUsesOracleStreamstocapturechangedatafromredologfilesHasthreesourcemodes:AsynchronousAutoLogmodeAsynchronousHotLogmodeAsynchronousDistributedHotLogmodeAsynchronousAutoLogModeSourcetablesSourcedatabasetransactionsLGWROnlineredologsDistributedAutoLogchangesetChangesetChange
tablesSubscriber
viewsSourcedatabaseStagingdatabaseDistributedAutoLogchangesourceRFSStandbyredologsStreams
captureLOG_ARCHIVE_DEST_2AsynchronousHotLogConfigurationSourcetablesSourcedatabasetransactionsHOTLOG_SOURCEChangeSourceChangesetChangetablesSubscriber
viewsStreamslocalcaptureSourcedatabaseLGWROnlineredologsAsynchronousDistributedHotLogModeSourcetablesSourcedatabasetransactionsLGWROnlineredologsDistributedHotLogchangesetChangesetChangetablesSubscriber
viewsSourcedatabaseStagingdatabaseDBlinkDistributedHotLogchangesourceDBlinkStreamspropagationPreparingtoPublishChangeData1. Gatherrequirementsfromthesubscribers.2. Determinewhichsourcedatabasecontainstherelevantsourcetables.3. Choosethecapturemode:SynchronousAsynchronousHotLogAsynchronousDistributedHotLogAsynchronousAutoLogEnsurethatthesourceandstagingdatabaseshaveappropriatedatabaseinitializationparametersset.Setupdatabaselinksbetweenthesourcedatabaseandthestagingdatabase.CreatingaPublisherUserThestagingdatabasepublishermustbegrantedthefollowingprivilegesandroles:EXECUTE_CATALOG_ROLEprivilegeSELECT_CATALOG_ROLEprivilegeCREATETABLEand
CREATESESSIONprivilegesEXECUTEonthe
DBMS_CDC_PUBLISHpackageCreateadefaulttablespacefor
thepublisher.SynchronousPublishing1. Createachangeset.2. Createachangetable.3. Grantaccesstosubscribers.BEGINDBMS_CDC_PUBLISH.CREATE_CHANGE_SET(change_set_name=>'CHICAGO_DAILY',description=>'Changesetforsaleshistoryinfo',change_source_name=>'SYNC_SOURCE');END;GRANTSELECTONducts_ctTOsubscriber1;AsynchronousDistributed
HotLogPublishingPreparethesourceandstagingdatabases:1. ConfigureOracleNetsothatthesourcedatabasecancommunicatewiththestagingdatabase.2. Setinitializationparametersonthesourcedatabase.
3. Spatible=10.2.0global_names=truejob_queue_processes=<currentvalue>+2open_links=4parallel_max_servers=<currentvalue>+3processes=<currentvalue>+4sessions=<currentvalue>+1streams_pool_size=<currentvalue>+20MBundo_retention=3600AsynchronousDistributed
HotLogPublishingPreparethestagingdatabase:Spatible=10.2.0global_names=truejava_pool_size=50000000open_links=4job_queue_processes=2parallel_max_servers=<current_value>+2processes=<current_value>+3sessions=<currentvalue>+1streams_pool_size=<current_value>+11MBundo_retention=3600AsynchronousDistributed
HotLogPublishingAlterthesourcedatabase:1. PlacethedatabaseintoFORCELOGGINGloggingmodetoprotectagainstunloggeddirectwrites.2. Enablesupplementallogging.3. Createanunconditionalloggrouponallcolumnstobecapturedinthesourcetable.ALTERDATABASEFORCELOGGING;ALTERDATABASEADDSUPPLEMENTALLOGDATA;ALTERTABLESH.PRODUCTSADDSUPPLEMENTALLOGGROUPlog_group_products(PROD_ID,PROD_NAME,PROD_LIST_PRICE)ALWAYS;AsynchronousDistributed
HotLogPublishingPublisherprivilegesonsourceandstagingdatabases:1. Createandgrantprivilegestothesourcedatabasepublisher.2. Createandgrantprivilegestothestagingdatabasepublisher.CREATEUSERsource_cdcpubIDENTIFIEDBYsource_cdcpubQUOTAUNLIMITEDONSYSTEMQUOTAUNLIMITEDONSYSAUX;GRANTCREATESESSIONTOsource_cdcpub;GRANTDBATOsource_cdcpub;GRANTCREATEDATABASELINKTOsource_cdcpub;GRANTEXECUTEonDBMS_CDC_PUBLISHTOsource_cdcpub;GRANTEXECUTE_CATALOG_ROLETOsource_cdcpub;GRANTSELECT_CATALOG_ROLETOsource_cdcpub;EXECUTEDBMS_STREAMS_AUTH.GRANT_ADMIN_PRIVILEGE(GRANTEE=>'source_cdcpub');AsynchronousDistributed
HotLogPublishingCreatesourceandstagingdatabaselinks:1. Createthesourcedatabaselink.2. Createthestagingdatabaselink.CREATEDATABASELINKstaging_dbCONNECTTOstaging_cdcpubIDENTIFIEDBYstaging_cdcpubUSING'staging_db';CREATEDATABASELINKsource_dbCONNECTTOsource_cdcpubIDENTIFIEDBYsource_cdcpubUSING'source_db';AsynchronousDistributed
HotLogPublishingCreatechangesourcesandchangesets:1. Createthechangesources.2. Createthechangesets.BEGINDBMS_CDC_PUBLISH.CREATE_HOTLOG_CHANGE_SOURCE(change_source_name=>'CHICAGO',description=>'testsource',source_database=>'source_db');END;DBMS_CDC_PUBLISH.CREATE_CHANGE_SET(change_set_name=>'CHICAGO_DAILY',description=>'changesetforproductinfo',change_source_name=>'CHICAGO',stop_on_ddl=>'y');END;AsynchronousDistributed
HotLogPublishingCreatethechangetablesonthestagingdatabase:BEGINDBMS_CDC_PUBLISH.CREATE_CHANGE_TABLE(owner=>'staging_cdcpub',change_table_name=>'products_ct',change_set_name=>'CHICAGO_DAILY',source_schema=>'SH',source_table=>'PRODUCTS',column_type_list=>'PROD_IDNUMBER(6),PROD_NAMEVARCHAR2(50),PROD_LIST_PRICENUMBER(8,2),JOB_IDVARCHAR2(10),DEPARTMENT_IDNUMBER(4)',capture_values=>'both',rs_id=>'y',row_id=>'n',...options_string=>'TABLESPACETS_CHICAGO_DAILY');END;AsynchronousDistributed
HotLogPublishingEnablethechangesourceandchangeset:1. Enablethechangesource.2. Enablethechangeset.3. Grantaccesstosubscribers.BEGINDBMS_CDC_PUBLISH.ALTER_HOTLOG_CHANGE_SOURCE(change_source_name=>'CHICAGO',enable_source=>'Y');END;BEGINDBMS_CDC_PUBLISH.ALTER_CHANGE_SET(change_set_name=>'CHICAGO_DAILY',enable_capture=>'y');END;SubscribingtoChangeData1. Findthesourcetablesforwhichthesubscriberhasaccessprivileges.2. Findthechangesetnamesandcolumnsforwhichthesubscriberhasaccessprivileges.SQL>SELECT*FROMALL_SOURCE_TABLES;SOURCE_SCHEMA_NAMESOURCE_TABLE_NAME------------------------------------SHPRODUCTSSQL>SELECTUNIQUECHANGE_SET_NAME,COLUMN_NAME,PUB_IDFROM2ALL_PUBLISHED_COLUMNSWHERESOURCE_SCHEMA_NAME='SH'AND3SOURCE_TABLE_NAME='PRODUCTS';CHANGE_SET_NAMECOLUMN_NAMEPUB_ID----------------------------------------------CHICAGO_DAILYPROD_ID41494CHICAGO_DAILYPROD_LIST_PRICE41494CHICAGO_DAILYPROD_NAME41494SubscribingtoChangeData3. Createasubscription.4. Subscribetoasourcetableandcolumns.BEGINDBMS_CDC_SUBSCRIBE.CREATE_SUBSCRIPTION(change_set_name=>'CHICAGO_DAILY',description=>'ChangedataforPRODUCTS',subscription_name=>'SALES_SUB');END;BEGINDBMS_CDC_SUBSCRIBE.SUBSCRIBE(subscription_name=>'SALES_SUB',source_schema=>'SH',source_table=>'PRODUCTS',column_list=>'PROD_ID,PROD_NAME,PROD_LIST_PRICE',subscriber_view=>'SALES_VIEW');END;SubscribingtoChangeData5. Activatethesubscription.6. Getthenextsetofchangedata.BEGINDBMS_CDC_SUBSCRIBE.ACTIVATE_SUBSCRIPTION(subscription_name=>'SALES_SUB');END;BEGINDBMS_CDC_SUBSCRIBE.EXTEND_WINDOW(subscription_name=>'SALES_SUB');END;SubscribingtoChangeData7. Querythesubscriberviews.8. Indicatethatthechangedataisnolongerneeded.9. Endthesubscription.SELECTPROD_ID,PROD_NAME,PROD_LIST_PRICEFROMSALES_VIEW;PROD_IDPROD_NAMEPROD_LIST_PRICE------------------------------------------------------30And2CrosscourtTeeKids14.9930And2CrosscourtTeeKids17.6610Gurfield&MurksPleatedTrousers17.9910Gurfield&MurksPleatedTrousers21.99BEGINDBMS_CDC_SUBSCRIBE.PURGE_WINDOW(subscription_name=>'SALES_SUB');END;AsynchronousDistributedHotLog
SourceDatabaseInitializationParameters
ForallOracleDatabase10greleases:3600UNDO_RETENTIONThecurrentvalue+(thenumberofchangesourcesplanned)SESSIONSThecurrentvalue+(4timesthenumberofchangesourcesplanned)PROCESSESThecurrentvalue+(3timesthenumberofchangesourcesplanned)PARALLEL_MAX_SERVERSShouldbeequaltothenumberofDistributedHotLogchangesourcesplannedOPEN_LINKSMaximumnumberofDBMS_JOBjobsthatcanrunsimultaneouslyplus2JOB_QUEUE_PROCESSESTRUEGLOBAL_NAMES10.2.0or10.0.0COMPATIBLE
ValueParameterAsynchronousDistributedHotLog
SourceDatabaseInitializationParameters
ForOracle9.2databases:Thecurrentvalue+(thenumberofchangesourcesplanned)PROCESSESThecurrentvalue+(3timesthenumberofchangesourcesplanned)PARALLEL_MAX_SERVERSThenumberofDistributedHotLogchangesourcesplannedOPEN_LINKSThenumberofchangesourcesplannedLOGMNR_MAX_PERSISTENT_SESSIONS1LOG_PARALLELISMMaximumnumberofDBMS_JOBjobsthatcanrunsimultaneouslyplus2JOB_QUEUE_PROCESSESTRUEGLOBAL_NAMES9.2.0COMPATIBLE
ValueParameterAsynchronousDistributedHotLog
StagingDatabaseInitializationParametersForOracleDatabase10gRelease2:Settothecurrentvalue+((thenumberofchangesourcesplanned)*(11MB))STREAMS_POOL_SIZEThecurrentvalue+(thenumberofchangesourcesplanned)SESSIONSThecurrentvalue+(3timesthenumberofchangesourcesplanned)PROCESSESThecurrentvalue+(2timesthenumberofchangesourcesplanned)PARALLEL_MAX_SERVERSEqualtothenumberofDistributedHotLogchangesourcesplanned,butnolessthan4OPEN_LINKS50000000JAVA_POOL_SIZETRUEGLOBAL_NAMES10.2.0COMPATIBLE
ValueParameterDataDictionaryViewsSupportingCDCCHANGE_SOURCESlistsexistingchangesources.CHANGE_SETSlistsexistingchangesets.CHANGE_PROPAGATIONSdescribesthestreamspropagationassociatedwithagivendistributedHotLogchangesourceonthesourcedatabase.CHANGE_TABLESlistsexistingchangetables.DBA_SOURCE_TABLESlistspublishedsourcetables.DBA_PUBLISHED_COLUMNSlistspublishedsourcetablecolumns.DBA_SUBSCRIPTIONSlistsallregisteredsubscriptions.DBA_SUBSCRIBED_TABLESlistspublishedtablestowhichsubscribershavesubscribed.DBA_SUBSCRIBED_COLUMNSliststhecolumnsoftablestowhichsubscribershavesubscribed.TransportationinaDataWareh
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- java课程设计形编辑系统
- 24屋盖钢结构课程设计
- ux课程设计培训
- flash关于疫情的课程设计
- 合同退款协议书范本
- 酒店预订代理合作协议
- 水电工程安全监测服务合同
- 外贸采购合同的法律效力
- 遗产继承合同补充协议
- 快速购销合同
- 安全检查记录表-等保制度模板
- 2021五四制新青岛版五年级科学上册20《食物链》课件
- 地理高三一轮复习试卷讲评公开课课件
- 高考地理热点问题-光伏治沙-课件
- 2023年云上贵州大数据(集团)有限公司招聘笔试模拟试题及答案解析
- 七年级英语上册Unit3IsthisyourpencilSectionA11a-2d教案新版人教新目标版
- DB31 506-2020 集成电路晶圆制造单位产品能源消耗限额
- PR-13 纠正与预防措施管理程序
- 《论语》十二章-多角度分析问题议论微写作 课件-统编版高中语文选择性必修上册
- DB15T 1641-2019 煤化工企业输煤栈桥施工技术规范
- 《计算机组装与维护》技能考核方案
评论
0/150
提交评论