DataStage常见的开发工作_第1页
DataStage常见的开发工作_第2页
DataStage常见的开发工作_第3页
DataStage常见的开发工作_第4页
DataStage常见的开发工作_第5页
已阅读5页,还剩10页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、本文档涉及到的处理方式有:数据类型转换,表的连接,计算。工作需求:从给定的文本中读取数据,经过数据类型的转化,并计算后写入给 定文本文件或者数据库中。例1 :数据源:Staff.txt要求:求不同的工作种类的总工资和平均工资。1.源数据分析,数据如下:所要分析的内容有:分隔符(delimiter),是空格,单引号,双引- 号等等。是否存在空值,如下图中YEARS和COMM字段中存在下划线。字段中记录的最大长度,如下图中共有七个字段,要记录每个字段的最大长度是多少。-记事本立件楚)编辑 榕式 查看 帮助IDHAI1EDEPTJOBYEARS SALARYCOMM10Sanders2& Mgr71

2、8357.50-Pernal20 Sales818171.25612.4530Marnghi3fl Mgr517506.75-nc0Brien38 Sales618006.008M.5550Hanes15 Mgr1Q20659.&B60Quigley38 Sales二16806.30450.25Rothnan15 Sales16502.831152.0080James20 Clerk13504.6612S.2090Koonitz42 Sales618001.751386.70100Plotz蚯Mgr718352.SO110Ngan15 Clerk51250S.2B206.60126Naught

3、on38 Clerk一1295U.75180.00130Yanaguchi1*2 Clerk610505.9075.6010 Fraye51 Mgr621150.00150Willians51 Sales619456.50637.65160Mollnare1Q Mgr722959.20-170Kernisch15 Clerk片12258.50110.10180AbrahamsClerk312009.75236.50190Sneider20 Clerk814252.75126.50缈Scoutten蚯 Clerk一11S08.60&20210Lu1 0 Mgr1020010.00Smith51

4、Sales717654.&0992.80Lundquist51 Clerk313969.80189.65Daniels1 0 Ngr519260.25二250Wheeler51 Clerk614460.00513.30260Joris1 0 Mgr1221234.0Q-270Lea66 Mgr918555.50一280Wilson66 Sales918671*. 50911.50Quill84 Mgr1019S1S.0Q-3卵Dauls84 Sales515454.50806.102.需求分析:最终的需求是不同工作种类的平均工资和总工资。从源数据可以看出,共有3种JOB: Sales Cler

5、k、Mgr。另外需要的字段就是SALARY,其他字段都是多余的。由此可以确定使用 Sequential File Stage读取数据后,即可使用 Copy Stage过 滤掉多余的字段。因为源数据在文本形式下都是字符型,我们用Varchar型来读取数据,需要转换成实际的类型。如SALARY的类型,应该是 Decimal类型。或者是Int型,double型的。故接下来的转换会用到Transformer Stage进行数据类型的转换。假设已经转换成功,接下来要做的是分组和计算了。故需要使用Aggegator Stage在使用Aggegator Stage时就能够分组并计算出总的工资,但是要算平均工

6、资的时候,需要分组计算出每种 JOB共有人,即分组计算不同JOB下的记录数。同样是使用AggegatorStage,但是没有办法在同一个里面实现,故采用两个 Aggegator Stage之后使用LookupStage做连接。后再使用Transformer Stage,求平均工资。最后写入文本或者数据库中。详细设计如下图:顺便赘述一下,源数据的文本文件是预先存放在DS的服务器上的,所以我们才能读取。详细开发步骤如下:1.编辑(l)Output Properties中有一点要注意,当源数据的第一行是列名时选择:TrueSequential_File_0 Sequential FileStage

7、Q山 |Output ft:DSLink6TlColumns.Gerrad Properties | Format Columns | AdvancedLi门已is Lqlum门皿和厌:Truertformation;Type: ListII true., the first line d a file contains coliwm iwnes on writing and 忌 ignored on reading.三 O Source曲 File = /home/dsiidm/staff.txt Read Method = Specific File() = OptionsFirst Li

8、ne is Cotumn Hames TrueQ Ketp File Paili(*on$ False Missing File Mode = Depends Fl eject M ode = Continue Fl eport Progress = YesOutput Format要注意选择Sage Output |Output namec DSLinkGColumns.Delimit 酊-i_ Record laydQ Final defcriiter = end- Field defaultsDelmnitei = wihitespace:金 Quote nore -_ Type dea

9、ut$ Gertera String Decimal Numeric Date Time TinwstampwhitespaceT raikig delimiter on all fields of the lecord.whilespace Import skips all standard while-space characters (space, tab, and newline trailing a field.end - The last field in the(ecord 锂 composed of all remaining byte? unHI the end of Hie

10、 record.none = Fields have no delimiter.null The delimiter is tht ASCII niJI character. = Asine ASCII character.Available properties to bcW:勺General Piopeitiss Focmal | Columns Adyanctd Properties:Output Colu mn这里的字段及类型长度都是自己填进去的,因为表结构是不存在的,无法通过DataStageManger加载进来。通过这个例子我们可以明白,无论是通过Load,还是手写,都只是给一个表

11、结构给源数据。Shge Q呷出Columns Advanced |Output name: |DSLinkSColumn nameSQL typeExtendedLength | ScaleNullableGereral | Properties Format1IDVarChar3Na2 1NAMEVarChar9NoDEPTVarChar2No斗JOBVai Char5Na5YEARSVarChar1No 1SALARYVaChar02No_7 |COMMVaCharR2NaView 一下源数据:可见数据已成功加载进来。tt DSBrovser*. Sequent ial_File_0vDS

12、Link6 Data BrowserIDNAMEDEPTJOBYEARSSALARVCOMMEEBSanders20Mgr118357.50一20Perna120Sales818171,25612.4530Marnghi38Mgr5175 06,75-40OBrien38Sales61B006.0081*6-55S0Hanes15Mgr12 06S9.8U-60Quigley38Sales一16808.30650.2S70Rothman15Salas116502.8311S2.BO80Janes20Clerk一13504.60128玄990Koonitz42Sales618001.751386

13、.70100PlotzK2.Mgr118352.RO一110Ngan15Clerk呂12508.21206,60120Naughton38Clerk-12954.75180.00190Vjmaguchi42Clerk610505-9075.60140Fraye51Mgr621150-00-150Williams51Sales619456.50637.65160Molinare10Mgr122959-20170Kermisch1&Clerku12258-50110佃186Attrahans38Clerk312009.752360190Sneider20Clerk81*1252.75126.502

14、0QScoutten也2Clerk-11508,60820210Lu10Mgr120010.00 -220Smith51Salps717654.51992.6023 0Lundquist51Clerk313369-8B1B9.65240Daniels10Mgr519260.25一2&0Wheeler51Clerk6266Jones10Mgr121234.00一27BLea66Mgr918555*50一28 0Wilson66Sales918674.50811.5029 eQuill84Mgr119818,00-300Dauls84Sales515454.51B06.10310Graham66S

15、ales121000.00200.303. copy stage作用:过滤掉多余的字段。4.Transformer Stage ,我们需要的Salary的类型应该是数值型的故要把String型转换成Decimal 型。简单多拽后,右击选择Fun ctio n- Type Con version - Stri ngToDecimal.JOBTolianskiini|ian?Motj5a TrsCourtT an st JeCulumini N-dvriPt?SALARYMSlnngToDecimallTglTflnsFwni 5AZRYITrantCohmt匚口 nxhdint:IJnvahon

16、LoluBin Njww*T rtiansluirnJOB15ALAAY1Cdwm name | K刖S Q L lypeE KicndcdLcnh Scale NUIaUt_1JJO6 VwChai5No2 SALARY DMimal82 HqColumn MmeVai&iflrEintended LwigliiN呻at5Mo2 SAUWT VwChai52 Na13Siwe | Input | Ouipu |Staq&nwift pObSJarfGenerd 日甲蛹嵯 Atwced NLS LociieH O Grouping KqisO Group = JOB一 Agcpegations

17、 Aggraiion Type = CalculationF 宓 Colurnn for Calculation - SALARYSum OiJtput Colunin = SALWAYj-Dplioriks Alki Ndl Output FalseQ IMhod = HaslhSLtfTi OilpiJl Cdurfin: | SALARYType: Output: ColumnName oi column lo hold the sum ol data m liie aggregate column.4.上面的那个 Aggregator Stage (JobSalary)功能:分组后,对

18、Salary进行求和。Stage-Properties 的设置如下:AvailablejDioperties lo add.Decimai OutputOutput o MappingJobSalsiiEy arSlage | Input Qi*putOutput name jobSdloLookCdumns.fienetal tJappins olwins Advanced |JobStoLookExpressionColumn NameTransfJobSa.JOBJOBSum(TrsfJohSa.SALARY| SALARYCci lumnxDerivationTransfJobSa.J

19、OBSum(l ransfJobSa. SALARY)JOBSALARYStase | Input | flutput |Coun(RwsGeneral Properties | Athanced NLS LocaJa jt L_- Gidqpi林g KfijJS 題 Group-JOB-J AggregaliorE Aggregation T= Counl Rows Count Output Column = CcunlRows -OptiCMHS Allow Null Dutput Fake Q Method 工 Hash:ounl CJuB匚dlum门;CauntRow:rJmrri 已

20、 tion:Tjppe: Output ColumnColumn cn which a caint of lhe numbei of ieco(d$ wilhin each group is output.5下面的 Aggregator Stage (CountRows)功能:分组并计算 JOB的记录数。Stage-Properties 的设置如下:Count Output Column中的值是自己写入的。命名是自己起的。* CountRowE AggregatorOutput - Mapping6. Lookup Stage功能:将两个表做连接。此处我们在求平均工资的时候,需要用到按照JOB

21、 分组后的总工资/总的记录数。详细设置如下:LouklT rdriitforni| DiiYn1 CuluBin Ndirwf 1JobSatoLookJOBJOBJabSatoLookSALARYSALARY| 匚OUrifligwiL口Ccurtfl口HgWpynHRgwJ苗SMoLock | CourffigwLook.up |匚刁111的门的匕K.別SL 也puEmended Lcrgih Soaic HdRHu血匚l_ _ W机仙I丨 5| I Nd2 SAURY DouHeYesColumn nflmpK刪SQL typeEtflcnGtedl Lcngfri S胡u Nulati

22、e r丄2JDB口 VaiChdi5丽SAURY DouHeYk3CoortRomLI DouUeYb?LDakTramloim11 l.onlEupLookup St Mgr 10gS Jt电F轉M-7.后一个Transformer Stage功能:将经过 Lookup Stage连接好的表,计算平均工资,并 给字段改名字。再赘述一下:字段名 SumSalary和Average,是自己改写的。Laak.T ransfaimCokimn nameK即SQL 如eErfended Lentfh Scaie NiJIabJejJjDBVaiCliai5No2SALARYDoubteiYe*3Ccu

23、rtflowsYesiDlargarlColumn name JSQLtpeErtendad LengthScsie MUlat1 RMBinnFVaiCh2 StfnSalary8DoLbls伽3 AviageDoubleYasIFStageIncut name:|ToTaigeftColunwis,.,VieiVM Oata.8.Sequential File Stage (TargetS)功能:将最后的结果写入文本文件。此处也可以写入数据库。作为目标文件只要做如下设置即可:指定文件的写入路径。TarpetS - Sequential File旦ensdl Rfoptifrs Rartit

24、ionir Format Colunrts Advartctd-二 TargetFile = /home/djadinAaTgrfstafroSFile:l/hfirne/d&adm/largeflstaffOG Filo Update Mode verwrite-O Optere Cleanup On Faiure =Tnue First Line is Cdlumn Naniss * False R eject M cde = Continue一岂Type:- PathnanfteName of a file ihal the incoming 也诂 冈ill be written to.

25、关于In put -Format 一栏是没有必要像读取源数据时那样的设置,默认时,每行记录的最后文件的分隔符为:end每个字段之间分隔符为逗号,每个字段的值带有双引号。在运行后可以打开生成的文件:1U i i I i 1 1 i i 丄|Uj j . j I j, j j 2U. i i 1 i w 34 j w . 1 j . j 4lj j . j 1 . j j 3U . j j I j j . 0LI|1rClerk, f,l 5135.35口口口口口口口口:3 5*口5”, ”12 612 6125口口口口口口口29+口42 . 1443 233 OOOOOOOOISE+OS, f,

26、l. 78693 6OS333 333 359E+04F,3 .17863799999999988+05 ”1.980S7999999999993E+04,|b当然如果最后的需求要求我们更改,那就按照要求改动。StageInput nsnefiereral Properties | PrtitioninflPropeftiesc- Record level0 Final delimitef - end-i|_i Field def Auh0 Delimiter comma Quote double-Type default 目 General 亡 String U Decimal i Nume

27、ricAvailable properbes lo add: Fi char鸟 FbiaI rlRliiriilrif sfinnIri Datelr i TineT m&starftD到此一个完整的 JOB已经开发完成,接下来我们在DataStage Desinger中编译并执行这个JOB慑 oCcW鶴6 raws:.1 35f iows/secT3HITdransfcrm:35 iWfi. 35 -,-TiansroniiarilansfJobSa;3&3Sliows/secCouniRowsTransCount:35LookupfunlR uMookup:3 rcnws, 3 rows/

28、sacLocCrerwIicw3 rows, 3ldw$/5ttTeTaigefi 、3 ram3 4-row$/secTafgelS编译通过并运行成功,看一下目标数据:I t est 01 _ Targets. ToTargert Dat a Brovser具体的运行日志可以在DataStage Director当中看到,我们可以查看 JOB的运行情况,看到有警告报出,根据警告的信息进行调试。10:21:42Control10:21:4220(B-11Inlo10:21:422009-6-11Info10:21.422D0M11Inlo10:21:422009-B-11Inlo10:21:4

29、22om-nInlo10:21:42009-6-11InloCD 10:21:432009-6-11Warning010:21.4330OM11W-arning 10:21:432009-6-11Warning10:21420m-11Warning10:21:442009-B-11Warning(5 10:21:442009-6-11Warning010:21:442009-6-11Into10:21:442009-6-11Info10:21 442tXB-11Inlo2009-B-11Inlo10:21:442om-nInfo10:21:442009-6-11Control其中一个警告的详细

30、信息为:Stanhrig Job 曲*01Envtfonment vaiable settings: (.)Parallel job initiatedPardtel job default NLS map ASCLGB2312, default bcale OFFmain_program: Ascential DataStage(tmj Enterprise Edition 7.5.2 (.) main_program; orchgenerl: loaded (r.)main_program: APT configuratior file: /home/dsadmZAsceritial/Da

31、taStag已/C. Lookup: When checking operater: perator of typeKAPTLLITCreateOpwil. TargetS: When checking operMor: Wlhen alidai.mg e-Kporl schema: Al field I, TargetS: When checking operator: When validating 酹pmt schana: At field . TargetS: Wheh checking operator: A sequential operator cannol premve (he

32、. StalfjD: Missing irecoid defaniter ,rn saw EOF insteadStaff JO: Import warning at record 34,StalljO: Import complete; 36 records imported successfully, 0 rejected. TargetsExport complete; 3 records exported! successfulljii. 0 rejected, mairvprogiam: Step execiHion finished with stalus OK main_prog

33、ram: Startup time. 0:01; production run time, 0:00.Pardlel job reports succssful completionFinished Job testDI.if Event Detail回冈ProjectJob name:test (19216S 37.100)testOIClo&eE vent itT imestamp:Next2009S-11 10:45:03Event type:UserBMotn I|WorningdsadmMessage:T argetS: When checking operator: When va

34、lidating export schema At fieldSumS alaryEporting nullable field without null handling properties根据信息可以看出,由于字段可以为空,但是没有进行为空时候的处理。做如下处理:Ccillurnn 门酗总K刖SQL type Extended Length Scale NullableDeamipticin1 JOB口 VaiChar5 NoSumSelary DdmbleYes3 Average一 DoubleYes111:00:284911:00:28 11:00:29*11:00:29011:00

35、:29 11:OO:2S$11:00:29011:00:3111:00:32 11:00;320)11:00:3211:00:32*11:00:3211:00:32*11:00:32II 11:00:32I 念 nrnjra20O9-6ni2009-6-11 2DO9-M1 2009-6-11 2009-6-11 2009-6-11 2009-6-11 2009-6-11 200911 2009-6-11 2009-6-11 2009-6J1 2009 6-11 2009-6-11 2009压 11 2009-6-11 2009-6-11在此编译并运行JOB,并在 DataStage Dire

36、ctor 查看日志ControlS toting J ab le$tO1.InfoErwironment variable settings: (.)InfoPaallel iob inrtiatedIrifoPnllel iob default NLS map ASCL_G02312t defau* locale OFFInfomain_piograiic Ascential Data6tage(tm) Enterpnise E dition 7.5l2 (.Infomam_piogrant orchgeneral: loaded(.Infomain_piogramr APT configu

37、ration ffe; /hKwne/dsadm/Ascential/DataStase/C.Warning Lookup: When checking opefator: Operator of type APT_LUT CreateOpvjil. Warning TagetS. When checking oper-ator: A sequenlidl operatoi cannot preserve the. Warning S;Missing racvrd delimiter n* saw EOF insteadWaning StafFO: linpoit warring at rec

38、ord 34.InloSlLO: Impoilcomplete: 35 recordslimpated succtssfullP 0 rejected.InfoT vgatSJO; Export complete; 3 records exported successful, 0 rejected.Infomain_piogramc Step execution finished 曲h slatu = OK.Infomain_piogram. Stalup time, 0.03: production run time, 0:00.I rtfoParallel job reports succ

39、essful completionControlFinished J ob testOl.刚才的警告消失了。并可以根据其他的警告信息,处理。一般来说,出现的警告并不影响大局。当然警告的数目不能太多。工作需求:从两个文本文件中读取数据,做关联后保留以下字段:name(firstname+lastname),department name, sex, age并输入到数据文件中源数据: employeetxt 和 departmenttxt1.源数据分析: employeetxt2:件洞辑 稱式 互君 帘助吵EMPHOF1RSTHHEHIDIN1TLASTNAIIEWORKDEPTPHOHEHDHI

40、REDATE0BEDLEUELSEXBIRTHDATE008810CHRISTINEIHAASA00397801/81/1P65PBES18FQ6/24/1933Q0002BMICHAELLTHOMPSONB01347610/18/1973HA HA GIER18H02/92/1948SALI VAKmNC01H738cw/A5/197NAHAGER2FCQQ05BJOHNDGEYEREO167S9MANAGER161105/15/1925008860IRUIHGFSTERND116i2309/14/1973MA HA GIER1fi1107/07/191150S 8 870lEUflDPUL

41、flSKJ021783109/38/1980NRNRGER16FOS/Za/19530OBOQt ILLLHWHEhDEflSUHEllB8/1S/19/OHflHflEIFHFGS/15/19H1800100THEODOREQSPENSERE21097206/19/19S0MANAGERHiII12/18/195600Q119UIHCENZOGLUCCHESSIA 90349005/16/1958SftLESREP19II11/85/1929C0012D一OCONNELLft 90216712/ 05/1 M3CLERK1耳Hie/is/i9i2肌 instjDOLUHEE11IflJlHl

42、AHAC01l|?7907/28/1971AHALVSl卜G00149HEATHERtlHECHOLLSCQ1179312/157197*flHALm1SF0QO15BBRUCE-ADiftMSOHD11W1002/12/1972DESIGNER16H05/17/194700010 0ELIZABETHRFEANKRDll378210/11/1977DESIGNER17F0/12/1955Q0017BMASATOSHIJVDSH1NURAD112840W/15/1O7SDESIGNER16M01/85/195160Q18BMARILYNSSCOUTTEHD11168287/07/1973DES

43、IGNER17F02/21/19W00O19BJflHESHWALKERD11299607/26/1971(DESIGIHERH0ft/25/19520呻RRDAU1D一RRQUblD11U50103/0/1966PESIEhFR16HQ5/29/19M000210D1LLIAHTJONESt)11Wfii294711/1979DiESIGHER1?110G/23/19&3008220JENNIFERKLUTZD11&67208/29/1968DESIGNER18F03/19/1911866Q230JftMESJJEFFERSONQ21209411/21/1966CLERK14M05/3 D/

44、1935QI3H2aE1SALUHTDRE-nklHRtblD12137SH12/OF/1979CLERK17M03/31/195*4000250DANIELsSMITHB21096110/38/1969CLERK15H11/12/19390082613SYBILpJOHNSOHD21895389/11/1975CLERK1fiF10/05/1936C0027DHAmnLPEREZD219001H9/30/1980CLERK15FB5/26/1953department txtE department, tit -记事本回百文件U)MB(K)格式辺 査看 帮助pEPTNODEPTNAMEftD

45、MRDEPT LOCATIONA00SPIFFV COMPUTER SERVICE DIUA00-B01PLANNINGA00-C01INFORMATION CENTERADO-DQ1DEUELOPMENT CENTERA0O-D11MANUFACTURING SVSTEHSD01-D21ADMINISTRATION SVSTEMSDM-E01SUPPORT SERUICESAQO-E15OPERATIONSEQ1-E22SOFTUARE SUPPORTEO1-2.用 DataStage Designer 设计如下的 JOBdaparlmentDeplTcCcRejects3详细的流程,在此不

46、再重复赘述,在此重点介绍Transformer Stage中的设置。在这个需求中要求将first name和last name合并成Name输出。合并的处理比较简单,只要用英文的冒号将两个字段连接就行了 :ToTransform.FIRSTNME:ToT ransform.LASTNAME还要计算出age这里的思路是:先将String型,转化成date型,再用函数YearFromDate(CurrentDate()-Y earFromDate(变量)计算年龄。首先打开 Transformer Stage的Stage Properties点击 Variables,在里面写入变量名,并指定类型和长

47、度。ansformerTransformer Stage PropertiesStage | Inputs | utput$| T ransformeiNameInitial ValueSQLExtendedPrecisionScaleDesciiption1BrthdayDate100Generd IAdvanced Link Ordering| T riggers NL5 LomI& 点击OK后就能在Stage Variables看到写入的变量名 Birthday 了。再将 BIRTHDATE 拖到 Stage Variables 中,双击打开 Derivation 属性, 右击选择 Fu

48、nction Type Conversion StringToDate,写成如下形式: StringToDate(ToTransform.BIRTHDATE,%mm-%dd-%yyyy),这里年月曰的格式 取决于源数据中的年月日格式。第三,在EmpLookup的Column name中写入AGE,则在Mapping表中会出现 AGE 一栏,双击 AGE 左边后,右击选择 FunctionDate&TimeYearFromDate,具体内容为:YearFromDate(CurrentDate()- Y earFromDate(Birthday)TrarLsforMer 一 TrartsforBer Stage會护删口I I S B I MI BiLASTNAMEWORKDEPTSEXBIRTHDATESlage VariableSlringT o&aleT oT ransloim.BI RTlH &AT EBirthday / /EmpLookup V

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论