




版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领
文档简介
Copyright©2019McGraw-HillEducation.Allrightsreserved.NoreproductionordistributionwithoutthepriorwrittenconsentofMcGraw-HillEducation.
CHAPTER2LABS-KEY
(Level1Header)Lab2-1Createarequestfordataextraction
Q1.GiventhatyouarenewandtryingtogetagrasponSláinte’soperations,listthreequestionsrelatedtosalesthatwouldhelpyoubeginyouranalysis.Forexample,howmanyproductsweresoldineachstate?
Open-ended–nokeyprovided.
Possibleanswers:
Whatisthehighestsellingproduct?
Howdoquantitiessoldperproductdifferacrossstates?
Whatisaveragequantityofeachproductsoldperday/perstate/permonth?
Whatisthetotalquantityofeachproductsoldperday/perstate/permonth?
Q2.Nowhypothesizetheanswerstoeachofthequestions.Remember,youranswersdon’thavetobecorrectatthispoint.Theywillhelpyouunderstandwhattypeofdatayouarelookingfor.Forexample:500inMissouri,6,000inPennsylvania,4,000inNewYork,etc.
Open-ended–nokeyprovided.
Q3.Finally,foreachquestion,identifythespecifictablesandattributesthatareneededtoansweryourquestions.Forexample,toanswerthequestionaboutstatesales,youwouldneedthe[State]attributewhichismostlikelylocatedinthe[Customer]mastertableaswellasa[QuantitySold]attributeina[Sales]table.Ifyouhadaccesstostoreordistributioncenterlocationdata,youmayalsolookfora[State]fieldthereaswell.
Open-ended–nokeyprovided.
(Level2Header)Part2:Generatearequestfordata
Nowthatyou’veidentifiedthedatayouneedforyouranalysis,completeaDataRequestForm.
OpentheDataRequestForm
Enteryourcontactinformation.
Inthedescriptionfield,identifythetablesthatyou’dliketoanalyze,alongwiththetimeperiods(e.g.pastmonth,pastyear,etc.)
Open-ended–nokeyprovided.
Selectafrequency.Inthiscasethisisa“One-offrequest”.
Enterarequestdate(today)andarequireddate(oneweekfromtoday)
Chooseaformat(spreadsheet).
FinallycompletetheTobeusedinbox(internalanalysis).
TAKEASCREENSHOT(2-1)ofyourcompletedform.
(Level2Header)Part3:Performananalysisofthedata
Q4.Takeamomentandidentifyanyattributesthatyouaremissingfromyouroriginalrequestthatwouldbenecessarytoansweryouroriginalquestionof“Howmanyproductsweresoldineachstate?”.
Open-ended–nokeyprovided.
Q5.Evaluateyouroriginalquestionsandresponses.Canyoustillanswertheoriginalquestion?
Open-ended–nokeyprovided.
Q6.IsthereanotherquestionyoucouldanswerfromthedataRachelprovided?
Possibleanswers:
Howmanysalesordershaseachemployeecreated?
HowmanysaleswerecreatedinthemonthofOctober?
Howmuchmoneywasgeneratedthroughsalesfortheentireperiod?
HowmuchmoneywasgeneratedthroughsalesforthemonthofOctober?
ENDOFLAB
(Level1Header)Lab2-2UsePivotTablestode-normalizeandanalyzethedata
(Level2Header)Part1:IdentifytheQuestions
Q1.GivenSláinte’srequest,identifythedataattributesandtablesneededtoanswerthequestion.
Sales_Subset:Product_Code,Sales_Order_Quantity_Sold
WouldalsobehelpfultohaveFGI_Product:Product_Description
(Level2Header)Part2:Masterthedata:PreparedataforanalysisinExcel
Q2.Whenwoulditbeagoodideatouseasingletable?
Anytimeallofthedatayouneedareinasingletable,thereisnoneedtoextractmorethanonetable.
Alternative2:UsetheExcelInternalDataModel
TAKEASCREENSHOT(2-2a)oftheManageRelationshipswindowwithbothrelationshipscreated.
Q3.Howcomfortableareyouwithidentifyingprimarykey-foreignkeyrelationships?
KEY:open-endedquestion,nokeyprovided
Alternative3:MergingthedataintoasingletableusingExcelQueryEditor
MaximizetheQueryEditorwindow,andTAKEASCREENSHOT(2-2b).
KEYScreenshot:
Q4.HaveyouusedtheQueryEditorinExcelbefore?Double-clickthe[Sales_Subset]queryandclickthroughthetabsontheribbon.Whichoptionsdoyouthinkwillbeusefulinthefuture?
KEY:Open-endedquestion,nokeyprovided.
Alternative4:UseSQLqueriesinAccess
TAKEASCREENSHOT(2-2c).
KEY:Screenshot
(Level2Header)Part3:PerformananalysisusingPivotTablesandQueries
TAKEASCREENSHOT(2-2d)
KEYSREENSHOT:
TAKEASCREENSHOT(2-2e)
Keyscreenshot:
TAKEASCREENSHOT(2-2f)
Keyscreenshot:
SaveyourqueryasTotal_Sales_By_Productandcloseyourdatabase.
(Level2Header)Part4:Addressandrefineyourresults
Q5.IftheownerofSláintewishestoidentifywhichproductsoldthemost,howwouldyoumakethisreportmoreuseful?
Severalpossibleanswers.Someoptionsinclude:sortingthedataorfilteringthedatatoviewonlytheproductassociatedwithhighesttotal_sales.
Q6.Ifyouwantedtoprovidemoredetail,whatotherattributeswouldbeusefultoaddasadditionalrowsorcolumnstoyourreport,orwhatotherreportswouldyoucreate?
Manypossibleanswers.AgoodoptionwouldbetoincludeDatedatafromtheSales_Subsettabletodoanalysisonwhichproductsellsmorebasedonmonthsorseasons.
(Level2Header)Part5:Communicateyourfindings
Let'smakethiseasyforotherstounderstandusingvisualizationandexplanations.
Q7.WriteabriefparagraphabouthowyouwouldinterprettheresultsofyouranalysisinplainEnglish?Forexample,whichdatapointsstandout?
Open-endedquestion,nosolutionprovided.
Q8.InChapter4we’lldiscusssomevisualizationtechniques.Describeawayyoucouldpresentthisdataasachartorgraph.
Open-endedquestion,nosolutionprovided.PossibleanswersincludePivotChartvisualizedasabarchart,usingfilters,slicers,ortimelinestomakethedatamoreinteractive.
Endoflab
(Level1Header)Lab2-3ResolvecommondataproblemsinExcelandAccess
Q1.WhatdoyouexpectwillbemajordataqualityissueswithLendingClub’sdata?
Open-endedquestion,nokeyprovided.StudentsshouldrelyonwhattheylearnedinChapter2regardingdataqualityissuestomakeassumptionsonwhatcouldcauseproblemsinthisfile.
(Level2Header)Part2:MastertheData
Q2.Giventhislistofattributes,whatconcernsdoyouhavewiththedata’sabilitytopredictanswerstothequestionsyouidentifiedinChapter1?
Open-endedquestion,nokeyprovided.
Q3.Isthereanythinginthedatathatyouthinkwillmakeanalysisdifficult?Forexample,arethereanyspecialsymbols,non-standarddata,ornumbersthatlookoutofplace?
Open-endedquestion,nokeyprovided.Thesheersizeofthedatamaystrikesomestudentsasbeingdifficulttoanalyze,aswellastheamountofblank/nullvalues.
Q4.Whatwouldyoudotocleanthedatainthisfile?
Open-endedquestion,nokeyprovided.Thenextsectionofthelab,“Let’sidentifysomeissueswiththedata…”introducesseveraloftheitemsthatneedtobecleaned(ortransformed).
Let’sidentifysomeissueswiththedata.
Therearemanyattributeswithoutanydata,andthatmaynotbenecessary.
The[int_rate]valuesarewrittenin##.##%,butanalysiswillrequire#.####
The[term]valuesincludetheword“months”,whichshouldberemovedfornumericalanalysis.
The[emp_length]valuesinclude“n/a”,“<”,“+”,“year”,and“years”,whichshouldberemovedfornumericalanalysis
Dates,including[issue_d],canbemoreusefulifweexpandthemtoshowtheday,month,andyearasseparateattributes.Datescauseissuesingeneralbecausedifferentsystemsusedifferentdateformats(e.g.1/9/2009,Jan-2009,9/1/2009forEuropeandates,etc.),sotypicallysomeconversionisnecessary.
First,removetheunwanteddata:
Saveyourfileas“Loans2007-2011.xlsx”totakeadvantageofsomeofExcel’sfeatures.
Deletethefirstrowthatsays“Notesofferedbyprospectus…”
Deletethelastfourrowsthatinclude“Totalamountfunded…”
Deletecolumnsthathavenovalues,including[id],[member_id],[url]
Repeatforanyotherblankcolumnsorunwantedattributes.
Thecolumnswiththeheadersrevol_bal_joint,sec_app_earliest_cr_line,sec_app_inq_last_6mths,sec_app_mort_acc,sec_app_open_acc,sec_app_revol_util,sec_app_open_act_il,sec_app_num_rev_accts,sec_app_chargeoff_within_12_mths,sec_app_collections_12_mths_ex_med,andsec_app_mths_since_last_major_derogcanalsobedeleted.
Next,fixyournumbers:
Selectthe[int_rate]column.
IntheHometab,gototheNumbersectionandchangethenumbertypefromPercentagetoGeneralusingthedrop-downmenu.
Repeatforanyotherattributeswithpercentages.
TAKEASCREENSHOT(2-3a)ofyourpartiallycleaneddatafile.
KeyScreenshot:
Then,removeanywordsfromnumericalvalues:
Selectthe[term]column.
UseFind&Replace(Ctrl+HorHome>Editing>Find&Select>Find&Replace)tofindthewords“months”and“month”andreplacethemwithanull/blankvalue“”.Important:Besuretoincludethespacebeforethewordsandgofromthelongestvariationofthewordtotheshortest.Inthiscaseifyoureplaced“month”first,youwouldendupwithalotofvaluesthatstillhadtheletter“s”From“months”.
Nowselectthe[emp_length]columnandfindandreplacethefollowingvalues:
Originalvalue
Newvalue
naorn/a
0
<1year
0
1year
1
2years
2
3years
3
4years
4
5years
5
6years
6
7years
7
8years
8
9years
9
10+years
10
,(comma)
(blank)
ThiscanbedoneeitherwithFindandReplaceorwithaFalseVLookup.Then/acellshavenonprintablecharactersinthem,sothe=CLEANfunctionwillbeusefulforensuringthen/avaluesarefoundintheircells.
TAKEASCREENSHOT(2-3b)ofyourpartiallycleaneddatafile,showingthe[term]column.
Q5.Whydoyouthinkitisusefultoreformatandextractpartsofthedatesbeforeyouconductyouranalysis?Whatdoyouthinkwouldhappenifyoudidn’t?
Open-endedquestion,nokeyprovided.Possibleanswersincludethatifyouintendtodoanalysisonanyofthevaluesthathaven’tfirstbeencleaned,youranalysismaynotworkormaynotrunonthecompletedataset.
Q6.Didyourunintoanymajorissueswhenyouattemptedtocleanthedata?Howwouldyouresolvethose?
Open-endedquestion,nokeyprovided.
ENDOFLAB
(Level1Header)Lab2-4GeneratesummarystatisticsinExcel
Becauseeveryquestioninthislabisopen-ended,thereisnokeyprovided.
ENDOFLAB
(Level1Header)Lab2-5–CollegeScorecardExtractandDataPreparation
(Level2Header)Part2:MastertheData
Takeascreenshot(1)
ScreenshotKey:
Q1.Bylookingthroughthedatainthetextfile,whatdoyouthinkthedelimiteris?
Comma
Takeascreenshot(2)
ScreenshotKey:
Toensurethatyoucapturedallofthedatathroughtheextractionfromthe.txtfile,weneedtovalidateit.Validatethefollowingchecksums:
Youshouldhave7,704records(rows).
Comparetheattributenames(columnheaders)totheattributeslistedinthedatadictionary.Areyoumissingany,ordoyouhaveanyextras?
TheaverageSATscoreshouldbe1,059.07(thisisleavingNULLvaluesasNULL).
Q2.Inthechecksums,youvalidatedthattheaverageSATscoreforalloftherecordsis1,059.07.Whenweworkwiththedatamorerigorously,severaltestswillrequireustotransformNULLvalues.IfyouweretotransformtheNULLSATvaluesinto0,whatwouldhappentotheaverage(woulditstaythesame,decrease,orincrease)?
Theaveragewoulddecrease
Howwouldthatchangetotheaverageimpactthewayyouwouldinterpretthedata?
ItwouldinaccuratelyrepresentaverylowSATaverageacrossallschools(CorrectAnswer)
Doyouthinkit’sagoodideatoreplaceNULLvalueswith0sinthiscase?
No
ToavoidtheissueswithNULL,blanks,and0s,wewillremovealloftherecordsthatcontainNULLvaluesineitherSAT_AVGorC150_4.Doso.
Performa=COUNT()toverifytheamountofrecordsthatremainafterremovingallrecordsassociatedwithNULLvaluesinSAT_AVGorC150_4.1,271recordsshouldremain.
Takeascreenshot(3)
KeyScreenshot:
Yourdataisnowreadyforthetestplan.Thislabwillcontinueinchapter3.
ENDOFLAB
(Level1Header)Lab2-6ComprehensiveCase:Dillard’sStoreData:HowtoCreateanE-RDiagram
QuestionsforParts1-3areallopen-ended,nokeyprovided.
(Level2Header)Part4:AddressandRefineResults
Q3. WhatistheprimarykeyfortheTRANSACTtable?WhatistheprimarykeyfortheSKUtable?
ITEM_ID–CorrectAnswerforSKUtable
TRANSACTION_ID–CorrectAnswerforTRANSACTtable
Q4. HowdoweconnecttheSKUdatabasetotheTRANSACTtable?Howdowejointablesfromtwodifferentrelatedtables?
Tablesarejoinedbyrelatingtheforeignandprimarykeys.TheTRANSACTtablehasaforeignkeyfromtheSKUtable,sotherelationshipbetweenthetwocharactersisthejoiningofTRANSACT.ITEM_IDandSKU.ITEM_ID.
ENDOFLAB
(Level1Header)Lab2-7ComprehensiveCase:Dillard’sStoreData:HowtoPreviewDataFromTablesinaQuery
(Level2Header)Part1:IdentifytheQuestions
Q1. Howwouldaviewoftheentiredatabaseorcertaintablesoutofthatdatabaseallowustogetafeelforthedata?
Open-endedquestion,possibleanswersincludethatitisnecessarytoviewhowadatabaseisstructuredinordertoknowwhatdataisavailabletoanalyze.Viewingtheactualdatastoredintablescanhelpexplainwhatcertainattributesrepresent,especiallyifyoudon’thaveadatadictionaryorifthedatadictionaryisn’tverydescriptive.
Q2. WhattypesofdatawouldyouguessthatDillard’s,aretailstore,gatherthatmightbeuseful?HowcouldDillard’ssuppliersusethisdatatopredictfuturepurchases?
Open-endedquestion,nokeyprovided.Possibleanswersinclude:salesdata(salesorders,salesorderdates,itemssold),customerdata(whateachcustomerpurchases,wheretheylive),inventory(retailprice,cost,category),etc.
TAKEASCREENSHOTOFYOURRESULTS(2)
KEYScreenshot
Q3. Whatdoyouthink‘P’and‘R’representintheTRAN_TYPEtable?Howmighttransactionsdifferiftheyarerepresentedby‘P’or‘R’.
Answerswillvary,butPrepresentsPurchaseandRrepresentsReturn.
Q4. Whatbenefitcanyougainfromselectingonlythetopfewrowsofyourdata,particularlyfromalargedataset?
Answerswillvary,butsomepossiblesolutionsincludegettingaquickglanceatthedatawithouthavingtowaitforthequerytorunifit’salargedataset.
(Level1Header)Lab2-8ComprehensiveCase:Dillard’sStoreData:ConnectingExceltoaSQLDatabase
Q1. WhatcanyoudoinExcelthatismuchmoredifficulttodoinotherdatamanagementprograms?
Open-endedquestion,nokeyprovided.Possibleanswersmightbebasedaroundstudents’generalfamiliaritywithExcel–it’seasierforthemtoworkwiththanPython,R,orSQL,forexample,becauseofitsfriendlierinterface.
Q2. BecausemostaccountantsarefamiliarwithExcel,namethreedatamanagementfunctionsyoucandoeasierinExcelthananyotherprogram?Howdoesthatfamiliarityhelpyouwithyouranalysis?
Open-endedquestion,nokeyprovided.PossibleanswersincludePivotTables,Tables,andvisualizingdata(it’sarguablewhetherthesefunctionsaretrulyeasierinExcel,butmostofourstudentswillprobablythinkit’seasierinExcelatthispointduetohavinglessexposuretoothertools.Theiranswerstothisquestionisverydependentontheirpreviousexperience).
TakeascreenshotofthePivotTable.
Q3. ReferenceyourPivotTableandfindwhichstatehasthehighestnumberofDillard’sstores.Whichstateshavethefewest?Howmanystoresarethereacrossthecountry?
Texashasthehighestnumberofstores,NewYorkandWyominghavethelowest.Thereare313storesacrossthecountry.
Q4. CountingthenumberofstoresperstateisoneexampleofhowthedatathathasbeenloadedfromSQLServerintoExcelcanbecomeusefulinformationthroughaPivotTable.WhatareotherwaysthatyoucouldorganizetheSTOREdatainaPivotTabletocomeupwithmeaningfulinformation?
Open-endedquestion,possibleanswersincludedrillingdownintoDivisionandCity.Tomakethedatafarmoreinteresting,joininginothertablescouldprovidemeaningfulanalysis(salesperstore,salesperstate,etc.)
Q5.JoinsaremadebasedontheirPrimaryKey–ForeignKeyrelationship.LookingattheERDiagramorthedataset,whichtwocolumnsformtherelationshipbetweentheTRANSACTandSTOREtables?
Transact.Store=Store.Store
Q6.Lookingatthefirstseveralrowsofdata,comparetheamountsinORIG_PRICE,SALE_PRICE,TRAN_AMT.Whatdoyouthinktran_amtrepresents?
Thetotaltransactionamount,takingintoaccountdiscounts.
Q7.Whatarethemeansforeachoftheattributes?
ORIG_PRICE:53.99
SALE_PRICE:35.46
TRAN_AMT:27.84
Q8.ThemeanfromTRAN_AMTislowerthanthemeansforbothORIG_PRICEandSALE_PRICE,whydoyouthinkthatis?(Hint:itisnotanerror).
TheTRAN_AMTnotonlytakesintoaccountdiscounts,butalsoisnegativewhenthetransactionisareturn.
(Level2Header)Part5:AddressandRefineResults
Q9.HowdoesdoingaquerywithinExcelallowquickerandmoreefficientaccessandanalysisofthedata?
Open-endedquestion,nokeyprovided.Possibleresponsesincludenothavingtoexportthequeryresultsfromthedatabase.
Q10.Is15daysofdata
温馨提示
- 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
- 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
- 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
- 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
- 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
- 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
- 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。
最新文档
- 链接你我系统分析师试题及答案
- 多媒体设计师通过率提升方案及试题及答案
- 用心备考2025年网络规划设计师考试试题及答案
- 2025年网络设计师关键知识点及试题及答案
- 煤矿工人考试题库及答案
- 关于再障的试题及答案
- 社会工作发展史初级考试试题及答案
- 多媒体应用设计师考试提升方案试题及答案
- 加拿大空乘管理制度
- 体检指引管理制度
- 安徽省合肥市2025届高三下学期5月教学质量检测(三模)英语试卷(含音频)
- 贵州国企招聘2025贵州乌江煤层气勘探开发有限公司招聘16人笔试参考题库附带答案详解
- 浙能镇海联合发电公司燃机异地迁建改造项目环评报告
- 办公大楼保安试题及答案
- 新一代大型机场行李处理系统关键技术与应用
- 铁路电务设备培训课件
- 2024年云南曲靖公开招聘社区工作者考试试题答案解析
- 工业设计接单合同协议
- 湖南省长沙市雅礼集团2024-2025学年高二下学期3月月考物理试卷(原卷版+解析版)
- 营房维修考试题及答案
- 研发实验室试题及答案
评论
0/150
提交评论