版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
1.Whenattemptingtominimizememory
usage,
themostefficientwaytodogroupprocessingwhenusingtheMEANSprocedureisto
use:
A.theBYstatement.
B.GROUPBYwiththeNOTSORTEDspecification.
C.theCLASSstatement.
D.multipleWHEREstatements.
ShapeoftheDataMostEfficientTools
arelativelysmallnumberofdistinctcombinationsofvalues・PROCMEANS(orPROC
oftheclassvariable-thesummarizeddatacontains10%SUMMARY)
orfeweroftheobservationsintheoriginaldataset•PROCREPORT
•PROCTABULATE
arelativelylargenumberofdistinctcombinationsofvalues•PROCSQL
oftheclassvariable-thesummarizeddatacontainsmore•DATAstepandPROCSORT
than10%oftheobservationsintheoriginaldataset
由UsingaGROUPBYstatementin
PROCSQLissimilarinCPUusageto
aPROCSORTst叩followedbya
DATAstep.
ComparingResourceUsageacrossThreeTechniquesforUsingPROCMEANS
Thethreetechniquesforsummarizingdataforspecificcombinationsofclass
variables(allbutthebasicPROCMEANSstep)differinresourceusageasfollows:
•TheTYPESstatementinaPROCMEANSstepusesthefewestresources.
•AprogramthatcontainstheNWAYoptioninmultiplePROCMEANSstepsuses
themostresourcesbecauseSASmustreadthedatasetseparatelyforeachPROC
MEANSstep.
•TheWHERE=datasetoptioninaPROCMEANSstepusesmoreresourcesthanthe
TYPESstatementinPROCMEANSbecauseSASmustcalculateallpossible
combinationsofclassvariablesbeforesubsetting.However,theWHERE=data
setoptioninPROCMEANSusesfewerresourcesthantheNWAYoptioninmultiple
PROCMEANSsteps.
2.TheSASdatasetWORK.CHECKhasa
variablenamedId_Codeinit.
WhichSQLstatementwouldcreateanindexonthisvariable?
A.createindexIdCodeonWORK.CHECK;
B.createindex(ld_Code)onWORK.CHECK;
C.makeindex=IdCodefromWORK.CHECK;
1).defineindex(Id_Code)inWORK.CHECK;
关于在SQL语句中创建INDEX的语句格式如下:
Generalform,CREATEINDEXstatement:
CREATE<UN1QUE>INDEXindex-nameONtable-name(column-name-l<,,,,column-name-n>);
示例语句:
/*creatingasimpleindex*/
datacrackman;
inputid_codename$@;
datalines;
1java
2c++
3dephi
4asp
5c#.net
r
procsql;
createindexid_codeonwork.crackman;
quit;
log提示简单索引id_code已经建立
或者使用:
procsql;
createindexidcodeonwork.crackman(id_code);
quit;
之所以括号内的column-name是可选项,是应为在建立Simpleindex时,
index-name必须与column-name是同一个名字。
datacrackman;
inputid_codename
datalines;
1java
2c++
3dephi
4asp
4c#.net
r
procsql;
createuniqueindexid_codeonwork.crackman;
quit;
LOG:提示ERROR:文件CRACKMAN的索弓Iid_code不允许使用重复的值。
/*dropingindexse*/
procsql;
dropindexdaily
fromwork.marchflights;
quit;
3.GiventheSASdatasets:
WORK.EMPLOYEEWORK.NEWEMPLOYEE
NameDeptNamesSalary
AlanSalesMichelle50000
MichelleSalesParesh60000
ASASprogramissubmittedand
thefollowingiswrittentotheSASlog:
101procsql;
102selectdept,name
103fromWORK.EMPLOYEE
104wherename=(selectnamesfromnewemployeewheresalary>40000)
ERROR:Subqueryevaluatedtomorethanonerow.
105;
106quit;
Whatwouldallowtheprogramto
successfullyexecutewithouterrors?
A.Replacethewhereclausewith:
whereEMPLOYEE.Name=(selectNamesdelimitedwith
fromWORK.NEWEMPLOYEE
whereSalary>40000);
B.Replaceline104with:
whereEMPLOYEE.Name=ANY(selectNamesseparatedwith
fromWORK.NEWEMPLOYEE
whereSalary>40000);
C.ReplacetheequalsignwiththeINoperator.
D.Qualifythecolumnnameswiththetablenames.
首先分析这个原程序第104行,中wherename=(selectnamesfromnewemployeewhere
salsry>40000)
原程序在执行过程,应先是执行子查询,selectnamesfromnewemployeewhere
salsry>40000,将返回的结果作为下一个查询的筛选条件。
在这里返回的结果NAMES这个变量包括了两个值Michelle和Paresh,所以在执行过程出现
错误。那么应该改成IN或者ANYoINANY指定的是一个匹配范围,而等号必须是•个具体
匹配的值。
这里可以思考SEPARATEDWITH这个语句,SAS认为是错误。
4.GiventheSASdataset
SASUSER.HIGHWAY:
SteeringSeatbeltSpeedStatusCount
absentNo0-29serious31
absentNo0-29not1419
absentNo30-49serious191
absentno30-49not2004
absentno50+serious216
ThefollowingSASprogramissubmitted:
procsqlnoprint;
selectdistinct
Speed[insertSQLclause]
fromSASUSER.HIGHWAY;
quit;
titlel〃Speedvaluesrepresentedare:&GROUPS”;
procprintdata=SASUSER.HIGHWAY;
run;
WhichSQLclausestoresthetext0-29,30-49,50+in
themacrovariableGROUPS?
A.into&GROUPS
B.into:GROUPS
C.into:GROUPSseparatedby','
D.into&GROUPSseparatedby','
此题考察的是SQL与宏的结合。
用SQL语句产生宏变量,语法结构如下:
PROCSQLNOPRINT;
SELECTcolumnl<,column2,...>
INTO:macro-variable-1<,:macro-variable-2,...>
FROMtable-1|view-1
<WHEREexpression>
<otherclauses>;
QUIT;
下面这个语法结构是将一变量的多个值组合在一个宏变量里面,例如:
Generalform,SELECTstatementwithINTOclausefbrcombiningvaluesintoone
macrovariable:
PROCSQLNOPRINT;
SELECTcolumn!
INTO:macro-variable-1
SEPARATEDBY'delimited'
FROMtable-1|view-1
<WHEREexpression>
<otherclauses>;
QUIT;
/*示例程序:*/
datacrackman;
inputSteering$Seatbelt$Speed$Status$Count@;
datalines;
absentNo0-29serious31
absentNo0-29not1419
absentNo30-49serious191
absentno30-49not2004
absentno50+serious216
run;
procsql;
11
selectdistinctspeedinto:groupsseparatedbyzfrom
crackman;
quit;
这里创建的宏变量是全局宏变量
/*Supposeyouwanttocreaterangesofmacrovariablesthat
containthecoursecode,location,andstartingdateofall
coursesthatarescheduledin2002.*/
procsqlnoprint;
selectcount(*)into:numrows
fromsasuser.schedule
whereyear(begindate)=2002;
%letnumrows=&numrows;
%putThereare&numrowscoursesin2002;
selectcourse_codezlocation,begindateformat=mmddyylO.
into:crsidl-:crsid&numrows,
splacel-:place&numrowsz
:datel-:date&numrows
fromsasuser.schedule
whereyear(begindate)=2002
orderbybegindate;
%put_user__;
quit;
5.TheSASdatasetWORK.CHECKhasan
indexonthevariableCodeandthefollowing
SASprogramissubmitted.
procsortdata=W0RK.CHECK;
byCode;
run;
Whichdescribestheresultofsubmitting
theSASprogram?
A.TheindexonCodeisdeleted.
B.TheindexonCodeisupdated.
C.TheindexonCodeisuneffected.
D.Thesortdoesnotexecute.
SORT过程是不能对已经建立索引INDEX的数据集进行SORT或者REPLACE,因为那样会破坏数
据集的INDEX文件。
当然如果非要排序,破坏INDEX文件,可以再后面加入一个参数force。
6.ThetableWORK.PILOTScontainsthe
followingdata:
WORK.PILOTS
IdNameJobcodeSalary
001AlbertPT150000
002BrendaPT170000
003CarlPT160000
004DonnaPT280000
005EdwardPT290000
006FloraPT3100000
Thedatasetwassummarizedtoincludeaverage
salarybasedonjobcode:
JobcodeSalaryAvg
P/p>
P/p>
PT16000060000
PT28000085000
PT29000085000
PT3100000100000
WhichSQLstatementcould\0Tgenerate
thisresult?
A.select
Jobcode,
Salary,
avg(Salary)label=,Avg'
fromWORK.PILOTS
groupbyJobcode
orderbyId
B.select
Jobcode,
Salary,
(selectavg(Salary)
fromWORK.PILOTSasPl
wherePl.Jobcode=P2.Jobcode)asAvg
fromWORK.PILOTSasP2
orderbyId
C.select
Jobcode,
Salary,
(selectavg(Salary)
fromWORK.PILOTS
groupbyJobcode)asAvg
fromWORK.PILOTS
orderbyId
D.select
Jobcode,
Salary,
Avg
from
WORK.PILOTS,
(select
JobcodeasJc,
avg(Salary)asAvg
fromWORK.PILOTS
groupby1)
whereJobcode=Jc
orderbyId
DATAPILOTS;
inputIdName$Jobcode$Salary@@;
datalines;
001AlbertPT150000
002BrendaPT170000
003CarlPT160000
004DonnaPT280000
005EdwardPT290000
006FloraPT3100000
run;
/*选项A*/
procsql;
select
Jobcode,
Salaryz
avg(Salary)label='Avg1
fromWORK.PILOTS
groupbyJobcode
orderbyId;
quit;
/*NOTE:指定的查询涉及按其SELECT子句中没出现的一项排序。*
//NOTE:查询要求将汇总统计量与原始的数据重新合并。可以运行出结果,sas没有报
错★/
/*选项B*/
procsql;
select
Jobcode,
Salaryz
(selectavg(Salary)
fromWORK.PILOTSasPl
wherePl.Jobcode=P2.Jobcode)asAvg
fromWORK.PILOTSasP2
orderbyId;
quit;
*这是一个correctedsubqueriesz
Correlatedsubqueriescannotbeevaluatedindependentlyzbut
dependonthevalues
passedtothembytheouterqueryfortheirresults.;
/*选项c*/
procsql;
select
Jobcodez
Salary,
(selectavg(Salary)
fromWORK.PILOTS
groupbyJobcode)asAvg
fromWORK.PILOTS
orderbyId;
quit;
/*运行不出结果,因为在括号中是multiple-valuenoncorrectedsuqueries,
Somesubqueriesaremultiple-valuesubqueries:theyreturnmorethan
onevalue(row)totheouterquery.
Ifyournoncorrelatedsubquerymightreturnavalueformorethanone
row,buttheWHEREorHAVINGclauseintheouterquery
containsanoperatorotherthanoneoftheoperatorsthatarespecified
above,thequerywillfail.Anerrormessage
isdisplayedintheSASlog,whichindicatesthatthesubqueryevaluated
tomorethanonerow.*/
/*选项d*/
procsql;
select
Jobcodez
Salary,
Avg
from
WORK.PILOTS,
(select
JobcodeasJcz
avg(Salary)asAvg
fromWORK.PILOTS
groupby1)
whereJobcode=Jc
orderbyId;
quit;
/*括号内的子查询已经得出一个各个jobcode的均值,形成一个view,通过where语
句进行match,最后可以得出结果//
7.Aquickruleofthumbforthespace
requiredtorunPROCSORTis:
A.twotimesthesizeoftheSASdatasetbeingsorted.
B.threetimesthesizeoftheSASdatasetbeingsorted.
C.fourtimesthesizeoftheSASdatasetbeingsorted.
D.fivetimesthesizeoftheSASdatasetbeingsorted.
这道题有争议,sasprepguide里面的原话是这样的:
TheSORTprocedureisonetechniquethatcanbeusedtogroupororderdata.
However,theSORTprocedurecanuseahighvolumeofresources.Whenanuncompressed
datafileissortedusingtheSORTprocedure,SASrequiresenoughspaceinthedata
libraryfortwocopiesofthedatafile,plusaworkspacethatisapproximatelytwo
tofourtimesthesizeofthedatafile.
However,Themultiplier4appliesonlytoutilityfilesusedinreleasesprior
toSAS9,whenPROCSORTneedstousediskspaceinordertosortthedata.For
in-memorysortingandsortingwithSAS9andlater,themultiplieris2orless.
ThreadedprocessingtakesadvantageofmultipleCPUsbyexecutingmultiple
threadsinparallel(parallelprocessing).Threadedproceduresarecompletedinless
realtimethanifeachtaskwerehandledsequentially,althoughtheCPUtimeis
generallyincreased.
Whenathreadedsortisused,theobservationsintheinputdatasetaredivided
intoequaltemporarysubsets,basedonhowmanyprocessorsareallocatedtotheSORT
procedure.Eachsubsetisthensortedonadifferentprocessor.Thesortedsubsets
aretheninterleavedtorecreatethesortedversionoftheinputdataset.
所以最后选A两倍。
8.Multi-threadedprocessingforPROC
SORTwilleffectwhichofthesesystem
resources?
A.CPUtimewilldecrease,wallclocktimewilldecrease
B.CPUtimewillincrease,wallclocktimewilldecrease
C.CPUtimewilldecrease,wallclocktimewillincrease
1).CPUtimewillincrease,wallclocktimewillincrease
Thread-enabledproceduresarecompletedinsubstantiallylessrealtimethanifeach
taskishandledsequentially,buttheCPUtimeisgenerallyincreased.
ThreadedsortingisenabledordisabledbyusingtheSASsystemoptionTHREADS|
NOTHREADSortheTHREADS|NOTHREADSprocedureoption.
Generalform,SORTprocedurewiththeTHREADS|NOTHREADSoption:
PROCSORTSAS-data-set-nameTHREADS|NOTHREADS;
9.GiventheSASdataset
WORK.TRANSACT:
RepCostShip
SMITH20050
SMITH40020
JONES10010
SMITH600100
JONES1005
Thefollowingoutputisdesired:
Rep
JONES105
SMITH250
WhichSQLstatementwasused?
A.select
rep,
min(Cost+Ship)
fromWORK.TRANSACT
orderbyRep
B.select
Rep,
min(Cost,Ship)asMin
fromWORK.TRANSACT
summarybyRep
orderbyRep
C.select
Rep,
min(Cost,Ship)
fromWORK.TRANSACT
groupbyRep
orderbyRep
D.select
Rep,
min(Cost+Ship)
fromWORK.TRANSACT
groupbyRep
orderbyRep
/*示例程序:*/
dataTRANSACT;
inputRep$CostShip@;
datalines;
SMITH20050
SMITH40020
JONES10010
SMITH600100
JONES1005
run;
/★选项A★/
procsql;
selectrepfmin(Cost+Ship)
fromWORK.TRANSACTorderbyRep;
quit;
/*
JONES105
JONES105
SMITH105
SMITH105
SMITH105
得出的结果应该是没有分组比较的结果,就是自身比较,所以会得到5个结果,没有分组,
而且M1N的值都为105*/
/*选项B*/
procsql;
selectRep,min(CostrShip)asMin
fromWORK.TRANSACT
summarybyReporderbyRep;
quit;
/*ERROR22-322:语法错误,期望下列之一:,3,GROUP.*/
/*选项C*/
procsql;
selectRepzmin(CostzShip)
fromWORK.TRANSACT
groupbyReporderbyRep;quit;
/*Rep
JONES10
JONES5
SMITH50
SMITH20
SMITH100
得到是在一个REP上的这两个变量COSTSHIP之间的小值。*
/*选项D*/
procsql;
selectRep,min(Cost+Ship)
fromWORK.TRANSACT
groupbyReporderbyRep
r
quit;
/*因为在要得到的数据集中有一个是没有命名的,所以在
中MIN(COST+SH工P)是没有命名的变量的值。*/
变形题:
GiventheSASdatasetWORK.TRANSACT:
RepCostShip
SMITH20050
SMITH40020
JONES10010
SMITH600100
JONES1005
Thefollowingoutputisdesired:
Rep
JONES105
JONES105
SMITH105
SMITH105
SMITH105
WhichSQLstatementwasused?
A.
selectRep,min(Cost+Ship)
fromWORK.TRANSACT
orderbyRep
B.
selectRep,min(Cost,Ship)asMin
fromWORK.TRANSACT
summarybyRep
orderbyRep
C.
selectRep,min(Cost,Ship)
fromWORK.TRANSACT
groupbyRep
orderbyRep
D.
selectRep,min(Cost+Ship)
fromWORK.TRANSACT
groupbyRep
orderbyRep
Answer:A
10.ThefollowingSASprogramis
submitted:
%letValue=9;
%letAdd=5;
%letNewval=%eval(&Value/&Add);
%put&Newval;
Whatisthevalueofthemacrovariable
Newvalwhenthe%PUTstatementexecutes?
A.0.555
B.2
C.1.8
D.1
%PUTstatementshowsthat%EVALdiscardsthefractionalpart(小数部分),whenit
performsdivisiononintegersthatwouldresultinafraction:
如果要获得准确的结果应该选择函数%sysevalf替换%eval.
the%EVALfunctiongeneratesERRORmessagesinthelogwhenitencountersan
expressionthatcontainsnon-integervalues.InordertoavoidtheseERRORmessages,
youcanusethe%SYSEVALFfunction.The%SYSEVALFfunctionevaluatesarithmeticand
logicalexpressionsusingfloating-pointarithmetic.
11.ThefollowingSAScodeissubmitted:
dataWORK.TEMPWORK.ERRORS/view=W0RK.TEMP;
infileRAWDATA;
inputXaXbXc;
ifXa=.thenoutputWORK.ERRORS;
elseoutputWORK.TEMP;
run;
WhichofthefollowingistrueoftheWORK.ERRORSdataset?
A.ThedatasetiscreatedwhentheDATAstepissubmitted.
B.ThedatasetiscreatedwhentheviewTEMPisusedinanotherSASstep.
C.ThedatasetisnotcreatedbecausetheDATAstatementcontainsasyntaxerror.
I).ThedescriptorportionofWORK.ERRORSiscreatedwhentheDATAstepissubmitted.
UsingSASDATAStepViewstoConserveDataStorageSpace(continued)
DATAStepViews
ADATAstepviewcontainsapartiallycompiledDATAstepprogramthatcanreaddatafromavarietyofsources,including
•rawdatafiles
•SASdatafiles
・PROCSQLviews
•SAS/ACCESSviews
•DB2,ORACLE,orotherDBMSdata.
ADATAstepviewcanbecreatedonlyinaDATAstepADATAstepviewcannotcontainglobalstatements,host-specificdatasetoptions,ormosthost
specificFILEandINF1LEstatements.Also,aDATAstepviewcannotbeindexedorcompressed.
YoucanuseDATAstepviewsto
•alwaysaccessthemostcurrentdatainchangingfiles
•avoidstoringacopyofalargedatafile
•combinedatafrommultiplesources
Thecompiledcodedoesnottakeupmuchroomforstorage,soyoucancreateDATAstepviewstoconsen/ediskspace.Ontheotherhand,useofDATA
stepviewscanincreaseCPUusagebecauseSASmustexecutethestoredDATAstepprogrameachtimeyouusetheview
Generalform,DATAsteptocreateaDATAstepview:
DATASAS-data-view<SAS-data-file-1...SASdata-file-n>I
VIEW-SAS-data-v/etv;
<SASstatements〉
RUN;
where
•SAS-data-viewnamesthedataviewtobecreated
•SAS-data-file-1SAS-data-file-nisanoptionallistthatnamesanydatafilestobecreated
•SASstatementsincludesotherDATAstepsyntaxtocreatethedataviewandanydatafilesthatarelistedintheDATAstatement
TheVIEW=optiontellsSAStocompile,butnottoexecute,thesourceprogramandtostorethecompiledcodeintheinputDATAstepviewthatisnamedinthe
option.
回IfyouspecifyadditionaldatafilesintheDATAstatement,SAScreatesthesedatafileswhentheviewisprocessedinasubsequentDATAorPROCstep.
Therefore,youneedtoreferencethedataviewbeforeyouattempttoreferencethedatafileinlatersteps.
Example
ThefollowingprogramcreatesaDATAstepviewnamedCompany.NewdatathatreadsfromthefilereferencedbythefilerefintheINFILEstatement
datacoiqsany.newdata/view=company.newdata;
infile<fileref>;
<DATAszepscatezencs>
run;
所以这里sasstatement并没有语法错误,更为准确的答案是B
12.Whichtitlestatementwouldalwaysdisplaythecurrentdate?
A.titleTodayis:&sysdate.〃;
B.title"Todayis:&sysdate9.〃;
C.title"Todayis:&today.〃;
D.titleTodayis:%sysfunc(today(),worddate.);
A选项和B选项输出的是sassessionstarted的时间或者日期
SYSDATE9representsthedateonwhichtheSASsessionstarted,andSYSTIMErepresents
thetimeatwhichtheSASsessionstarted.Supposeyouwouldratherseethedate
insomeotherformat,orsupposeyouwouldratherseethecurrentdateortime.You
canusethe%SYSFUNCfunctiontoexecuteotherSASfunctionsaspartofthemacro
facility.
Generalform,%SYSFUNCfunction:
%SYSFUNC(function(argument(s))<.formats)
where
•functionisthenameoftheSASfunctiontoexecute.
•argument(s)isoneormoreargumentsthatareusedbyfunction.Usecommastoseparateallarguments.Anargumentcanbea
macrovariablereferenceoratextexpressionthatproducesargumentsforafunction.
•formatisanoptionalformattoapplytotheresultoffunction.Bydefault,numericresultsareconvertedtoacharacterstring
usingtheBEST12.format,andcharacterresultsareusedastheyare.withoutformattingortranslation.
举个例子:
Supposeyouwanttocreateareporttitlethatincludesthecurrentdatein
WORDDATE,format.Youcouldusethisstatement:
title"ReportProducedon%sysfunc(today(),worddate.)";
However,thatwouldresultinthefollowingtitle:
ReportProducedonJune7,2002
Youcanmaskthecommabyusingthe%QSYSFUNCfunctioninstead,asfollows:
title"ReportProducedon
%sysfunc(left(%qsysfunc(today(),worddate.)))";
Themodifiedstatementgeneratesthefollowingtitle:
ReportProducedonJune7,2002(去空)
13.GiventheSASdatasets:
WORK.ONEWORK.TWO
IdNameIdSalary
112Smith243150000
243Wei35545000
457Jones52375000
ThefollowingSASprogramissubmitted:
dataWORK.COMBINE;
mergeWORK.ONEWORK.TWO;
byId;
run;
WhichSQLprocedurestatementproducesthesameresults?
A.createtableWORK.COMBINEas
select
Id,
Name,
Salary
from
WORK.ONE
fulljoin
WORK.TWO
onONE.Id=TWO.Id
B.createtableWORK.COMBINEas
select
coalesce(ONE.Id,TWO.Id)asId,
Name,
Salary
from
WORK.ONE,
WORK.TWO
whereONE.ld=TWO.Id
C.createtableWORK.COMBINEas
select
coalesce(ONE.Id,TWO.Id)asId,
Name,
Salary
from
WORK.ONE
fulljoin
WORK.TWO
onONE.Id=TWO.Id
orderbyId
D.createtableWORK.COMBINEas
select
coalesce(ONE.Id,TWO.Id)asId,
Name,
Salary
from
WORK.ONE,
WORK.TWO
whereONE.Id=TWO.Id
orderbyONE.Id
下面对四个答案进行解析。
A:
犯了一个明显的错误,就是ID这个变量引用不明确。所以运行不出结果来;
B:
procsql;
createtableWORK.COMBINEas
select
coalesce(ONE.IdzTWO.Id)asIdf
Name,
Salary
from
WORK.ONE,
WORK.TWO
whereONE.Id=TWO.Id
r
quit;
procprint;run;
*WhenyouaddtheCOALESCEfunctiontotheSELECTclauseofthePROC
SQLouterjoin,
thePROCSQLouterjoincanproducethesameresultasaDATAstep
match-merge.;
/*返回只有一个观测,因为ONE和TWO的连接按照where筛选条件进行了筛选,结果只有一
条符合条件。*/
C
procsql;
createtableWORK.COMBINEas
select
coalesce(ONE.Id,TWO.Id)asIdf
Name,
Salary
from
WORK.ONE
fulljoin
WORK.TWO
onONE.Id=TWO.Id
orderbyId
r
quit;
/*首先执行的是按照ON条件执行FULLJOIN链接,此时如果不用COALESCE,数据集会
有两个ID,一个是ONE•个是TWO的,
如果ONE和TWO的工D不匹配,那么想要的工D就为缺失值。
那么COALESCE函数就是合并这两个工D,然后重复的合并在一起,剔除缺失值,所以结
果就和MERGE一致。*/
D;
错误和B一样,连接方式不是FULLJOIN,而是innerjoin。
看mergeSQL在横向水平上合并数据集的区别
datacrackmanl(index=(x));
inputxy;
datalines;
12
34
56
/
run;
datacrackman2(index=(x));
inputxz;
datalines;
18
39
57
/
run;
datacrackman;
mergecrackmanlcrackman2;
byx;
run;
procsql;
createtablecrackmanasselecta.xra.yfb.z
fromcrackmanla,crackman2bwherea.x=b.x;
quit;
两种方法都可以实现既定的目标
但是各有什么优缺点呢?
MERGE:
优点:
1.除非内存限制,否则对要合并的数据集的观测总数是没有限制的
2.可以通过DATASTEP里面的DOLOOP和ARRAY以及其他一些MERGE的功能来实现比较复杂
的逻辑合并
3.可以通过对要合并数据集的多个变量的比较进行合并
缺点:
1.要合并的数据集必须先排序或者建立索引
2.根据多个变量或者一个变量的比较进行数据集合并,那么前提是要合并的数据集必须包括
要比较的变量。
3.按照指定的变量进行严格比较,获得所需要的结果,而且必须至少有一个观测是匹配成功
的。
因为MERGE在DATASTEP里面,在匹配的过程中,如果匹配成功之后,不会再重复读取已经
匹配成功的数据观测,例如:如果crackman1的第一个观测和crackman2的第一个观测匹配
成功了,那么在进行第二个观测匹配时,就不会再去读取第一个观测的数据,所以效率会高
些。但是SQL不是这样的,SQL是多对多的扫描匹配,产生一个cartesianproduct,笛卡
尔乘积,所以相对MERGE显得速度和效率要慢一些
SQL:
优点:
L数据集不需要排序或者建立索引,当然如果建立索引更好
2.多个数据集可以
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 演讲稿小短文合集【五篇】
- 2025年清洁漂白项目申请报告
- 2024-2025学年西藏拉萨市城关区数学三上期末检测试题含解析
- 2025年工业废水处理项目提案报告模范
- 2025年无玷污采水器项目规划申请报告模范
- 2025年声学海流计项目提案报告模板
- 2025年沼气设备项目申请报告模稿
- 募捐倡议书模板锦集八篇
- 青铜葵花读后感汇编15篇
- 关于会计专业自我鉴定集锦9篇
- 北京市朝阳区2023-2024学年九年级上学期期末物理试卷
- 2024-2025学年高一上学期期末数学试卷(基础篇)(含答案)
- 统编版(2024版)道德与法治七年级上册期末质量监测试卷 3套(含答案)
- 智慧康复医院智能化总体规划方案
- 申能集团在线测评题目
- 十四五规划药剂科展望
- 四川政采评审专家入库考试基础题复习试题
- 一年级上册语文拼音前后鼻韵母和平翘专练
- 2025年产科护理工作计划
- 【MOOC】概率统计和随机过程-南京邮电大学 中国大学慕课MOOC答案
- 【2024】苏教版科学一年级上册每课教学反思(带目录)
评论
0/150
提交评论