3 October 2013
Hadoop Tutorial: Part 1 - What is Hadoop? (an Overview)
Hadoop is an open source software framework that supports data-intensive distributed applications and is licensed under the Apache v2 license.
At least, this is what you are going to find as the first line of the definition of Hadoop on Wikipedia. So what are data-intensive distributed applications?
Well, data-intensive is nothing but Big Data (data that has grown too large in size), and distributed applications are applications that work over a network by communicating and coordinating with each other by passing messages (say, using RPC inter-process communication or a message queue).
Hence Hadoop works in a distributed environment and is built to store, handle and process large amounts of data (petabytes, exabytes and more). Now, just because I am saying that Hadoop stores petabytes of data, this doesn't mean that Hadoop is a database. Again, remember it's a framework that handles large amounts of data for processing. You will get to know the difference between Hadoop and databases (or NoSQL databases, which is what we call Big Data's databases) as you go down the line in the coming tutorials.
Hadoop was derived from the research papers published by Google on the Google File System (GFS) and Google's MapReduce. So there are two integral parts of Hadoop: the Hadoop Distributed File System (HDFS) and Hadoop MapReduce.
Hadoop Distributed File System (HDFS)
HDFS is a filesystem designed for storing very large files with streaming data access patterns, running on clusters of commodity hardware.
Let's get into the details of the statement above:
Very large files:
When we say very large files, we mean files whose size is in the range of gigabytes, terabytes, petabytes or maybe more.
Streaming data access:
HDFS is built around the idea that the most efficient data processing pattern is a write-once, read-many-times pattern. A dataset is typically generated or copied from a source, and then various analyses are performed on that dataset over time. Each analysis will involve a large proportion, if not all, of the dataset, so the time to read the whole dataset is more important than the latency in reading the first record.
Commodity hardware:
Hadoop doesn't require expensive, highly reliable hardware. It's designed to run on clusters of commodity hardware (commonly available hardware that can be obtained from multiple vendors), for which the chance of node failure across the cluster is high, at least for large clusters. HDFS is designed to carry on working without a noticeable interruption to the user in the face of such failure.
Now, here we are talking about a filesystem, the Hadoop Distributed File System. And we all know a few other filesystems, like the Linux filesystem and the Windows filesystem. So the next question is...
What is the difference between a normal filesystem and the Hadoop Distributed File System?
The two major differences between HDFS and other filesystems are:
Block Size:
Every disk has a block size, and this is the minimum amount of data that can be written to or read from the disk. A filesystem also consists of blocks, which are built out of these disk blocks. Normally disk blocks are 512 bytes and filesystem blocks are a few kilobytes.
In HDFS we also have the concept of blocks. But here one block is 64 MB by default (the Hadoop 1.x default; Hadoop 2.x and later default to 128 MB), and it can be configured to a larger value such as 128 MB, 256 MB, 512 MB or even more, into gigabytes. It all depends on the requirements and use cases.
So why are these blocks so large in HDFS? Keep on reading and you will get it in the next few tutorials :)
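To get a feel for the difference in scale, here is a small sketch that counts how many blocks one file occupies under each block size. The 1 GB file and the 4 KB filesystem block are assumed example figures, not values from a particular system, and `blocks_needed` is a hypothetical helper name.

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def blocks_needed(file_size: int, block_size: int) -> int:
    """Return how many blocks of block_size bytes are needed for file_size bytes."""
    # Integer ceiling division: a partly filled last block still counts as a block.
    return (file_size + block_size - 1) // block_size


file_size = 1 * GB  # assumed example file size

# Assumed 4 KB block for an ordinary filesystem ("a few kilobytes" in the text).
local_fs_blocks = blocks_needed(file_size, 4 * KB)

# 64 MB block, the HDFS default quoted in this tutorial.
hdfs_blocks = blocks_needed(file_size, 64 * MB)

print(local_fs_blocks, hdfs_blocks)  # prints: 262144 16
```

The same file is tracked as 16 blocks instead of hundreds of thousands, which gives a first hint of what a large block size changes.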
Metadata Storage:
In a normal filesystem, metadata is stored hierarchically. Let's say there is a folder ABC, inside that folder there is another folder DEF, and inside that there is a file hello.txt. The information about hello.txt (i.e. its metadata) is kept with DEF, and the metadata of DEF is kept with ABC. This forms a hierarchy, and the hierarchy is maintained all the way up to the root of the filesystem. In HDFS, the metadata is not spread out along the hierarchy like this. All the metadata resides on a single machine in the cluster known as the Namenode (or master node). This node holds the information about all the files and folders, and lots of other information too, which we will learn about in the next few tutorials. :)
Well, this was just an overview of Hadoop and the Hadoop Distributed File System. In the next part I will go into the depth of HDFS, and thereafter MapReduce, and will continue from here...
Let me know in the comment section if you have any doubts in understanding anything, and I will be really glad to answer them :)
Comments
Romain Rigaux said...
Nice summary!
October 03, 2013
pragya khare said...
I know I'm a beginner and this question might be a silly one, but can you please explain to me how PARALLELISM is achieved via map-reduce at the processor level? If I have a dual-core processor, is it that only 2 jobs will run at a time in parallel?
October 05, 2013
Anonymous said...
Hi, I am from a Mainframe background and have little knowledge of core Java... Do you think Java is needed for learning Hadoop in addition to Hive/Pig? I even want to learn Java for MapReduce but couldn't find what will actually be used in real-world work.. and the definitive guide books seem tough for learning MapReduce with Java.. is there any option where I can learn it step by step?
Sorry for the long comment.. but it would be helpful if you can guide me..
October 05, 2013
Deepak Kumar said...
@Pragya Khare...
First thing, always remember the popular saying: no questions are foolish :) And by the way, it is a very good question.
Actually there are two things:
One is what the best practice is, and the other is what happens by default...
Well, by default the number of mappers and reducers is set to 2 for any TaskTracker, hence one sees a maximum of 2 maps and 2 reduces at a given instant on a TaskTracker (which is configurable).. This doesn't only depend on the processor but on lots of other factors as well, like RAM, CPU, power, disk and others
/blog/best-practices-for-selecting-apache-hadoop-hardware/
And for the other factor, i.e. best practices, it depends on your use case. You can go through the 3rd point of the link below to understand it more conceptually
/blog/2009/12/7-tips-for-improving-mapreduce-performance/
Well, I will explain all of this when I reach the advanced MapReduce tutorials.. Till then keep reading!! :)
October 05, 2013
Deepak Kumar said...
@Anonymous
As Hadoop is written in Java, most of its APIs are written in core Java... Well, to know about the Hadoop architecture you don't need Java... But to go to its API level and start programming in MapReduce you need to know core Java.
And as for the Java requirement you asked about... you just need simple core Java concepts and programming for Hadoop and MapReduce.. And Hive/Pig are SQL-like data flow languages that are really easy to learn... And since you are from a programming background it won't be very difficult to learn Java :) You can also go through the link below for further details :)
/2013/09/What-are-the-Pre-requsites-for-getting-started-with-Big-Data-Technologies.html
October 05, 2013
About the author
Deepak Kumar
Big Data / Hadoop Developer, Software Engineer, Thinker, Learner, Geek, Blogger, Coder
I love to play around with data. Big Data!
© 2013 All Rights Reserved, BigData Planet.
All articles on this website by Deepak Kumar are licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.
6 October 2013
Hadoop Tutorial: Part 2 - Hadoop Distributed File System (HDFS)
In the last tutorial, What is Hadoop?, I gave you a brief idea about Hadoop. The two integral parts of Hadoop are Hadoop HDFS and Hadoop MapReduce.
Let's go deeper inside HDFS.
Hadoop Distributed File System (HDFS) Concepts:
First, take a look at the following two terms that will be used while describing HDFS.
Cluster: A Hadoop cluster is made up of many machines in a network. Each machine is termed a node, and these nodes talk to each other over the network.
Block Size: This is the minimum size of one block in a filesystem, in which data can be kept contiguously. The default size of a single block in HDFS is 64 MB.
In HDFS, data is kept by splitting it into chunks. Let's say you have a text file of 200 MB and you want to keep this file in a Hadoop cluster. The file is split into chunks, where each chunk is equal to the block size set for the HDFS cluster (64 MB by default); only the last chunk may be smaller.
Hence a 200 MB file gets split into 4 parts, 3 parts of 64 MB and 1 part of 8 MB, and the parts are spread across different machines. Which machine holds which part is decided by the Namenode, which we will discuss in detail below.
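The arithmetic of the 200 MB example can be sketched in a few lines. This is only an illustration of the sizes involved, not how HDFS itself is implemented; `split_into_blocks` is a hypothetical helper and sizes are given in whole megabytes for simplicity.

```python
def split_into_blocks(file_size_mb: int, block_size_mb: int = 64) -> list[int]:
    """Return the sizes (in MB) of the blocks a file of file_size_mb is split into."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        # The last block only holds whatever is left over.
        sizes.append(remainder)
    return sizes


print(split_into_blocks(200))  # prints: [64, 64, 64, 8]
```

With a block size of 128 MB, `split_into_blocks(200, 128)` gives two parts, one of 128 MB and one of 72 MB.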
Now, in a Hadoop Distributed File System (HDFS) cluster there are two kinds of nodes, a master node and many worker nodes. These are known as the Namenode (master node) and the Datanodes (worker nodes).
Namenode:
The namenode manages the filesystem namespace. It maintains the filesystem tree and the metadata for all the files and directories in the tree. So it contains the information about all the files, directories and their hierarchy in the cluster in the form of a namespace image and edit logs. Along with the filesystem information, it also knows the datanodes on which the blocks of each file are kept.
A client accesses the filesystem on behalf of the user by communicating with the namenode and datanodes. The client presents a filesystem interface similar to the Portable Operating System Interface (POSIX), so the user code does not need to know about the namenode and datanodes to function.
Datanode:
These are the workers that do the real work, and by real work we mean that the actual data is stored by the datanodes. They store and retrieve blocks when they are told to (by clients or the namenode), and they report back to the namenode periodically with lists of the blocks that they are storing.
Here is one important thing to note: in one cluster there will be only one Namenode, and there can be N datanodes.
The Namenode contains the metadata of all the files and directories and also knows the datanode on which each split of a file is stored. So let's say the Namenode goes down. What do you think will happen?
Yes, if the Namenode is down we cannot access any of the files and directories in the cluster.
We will not even be able to connect to any of the datanodes to get any of the files.
Now think about it: we have kept our files by splitting them into chunks, and we have kept those chunks on different datanodes. It is the Namenode that keeps track of all the file metadata, so only the Namenode knows how to reconstruct a file from all its splits. This is the reason that if the Namenode is down in a Hadoop cluster, everything is down.
This is also why the Namenode is known as a single point of failure in Hadoop.
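The reasoning above can be illustrated with a toy in-memory model. This is a minimal sketch and not the HDFS API: the class names, the round-robin placement and the tiny 4-byte block size are all assumptions made up for the example.

```python
class ToyDatanode:
    """Stores raw blocks, but knows nothing about which file they belong to."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes


class ToyNamenode:
    """Holds only metadata: which blocks make up a file and where each block lives."""

    def __init__(self):
        self.file_to_blocks = {}  # path -> ordered list of block ids
        self.block_to_node = {}   # block id -> ToyDatanode


def write_file(namenode, datanodes, path, data, block_size):
    block_ids = []
    for offset in range(0, len(data), block_size):
        index = offset // block_size
        block_id = f"{path}#{index}"
        # Round-robin placement is an assumption of this sketch only.
        node = datanodes[index % len(datanodes)]
        node.blocks[block_id] = data[offset:offset + block_size]
        namenode.block_to_node[block_id] = node
        block_ids.append(block_id)
    namenode.file_to_blocks[path] = block_ids


def read_file(namenode, path):
    if namenode is None:
        # The datanodes still hold the bytes, but nobody knows the order or location.
        raise RuntimeError("namenode is down: block locations are unknown")
    block_ids = namenode.file_to_blocks[path]
    return b"".join(namenode.block_to_node[b].blocks[b] for b in block_ids)


namenode = ToyNamenode()
datanodes = [ToyDatanode("dn1"), ToyDatanode("dn2")]
write_file(namenode, datanodes, "/ABC/DEF/hello.txt", b"abcdefghij", block_size=4)
print(read_file(namenode, "/ABC/DEF/hello.txt"))  # prints: b'abcdefghij'
```

Calling `read_file(None, "/ABC/DEF/hello.txt")` raises the error even though every block is still sitting on a datanode, which is the single point of failure in miniature.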
Now, since the Namenode is so important, we have to make it resilient to failure. For that, Hadoop provides us with two mechanisms.
The first way is to back up the files that make up the persistent state of the filesystem metadata. Hadoop can be configured so that the namenode writes its persistent state to multiple filesystems. These writes are synchronous and atomic. The usual configuration choice is to write to local disk as well as a remote NFS mount.
The second way is running a Secondary Namenode.
Despite what the name suggests, it does not act like a Namenode. So if it doesn't act like a namenode, how does it protect against failure?
Well, the Secondary Namenode also holds a namespace image and edit logs, like the namenode. After every fixed interval of time (one hour by default) it copies the namespace image and edit log from the namenode, merges the namespace image with the edit log, and copies the result back to the namenode, so that the namenode has a fresh copy of the namespace image. Now suppose that at some instant the namenode goes down and its data becomes corrupt. Then we can start some other machine with the namespace image and edit log that we have on the Secondary Namenode, and hence a total failure can be prevented.
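The merge step described above can be sketched with a toy model. Here the namespace image is just a set of paths and the edit log is a list of `(operation, path)` pairs; both representations and the function names are assumptions for illustration, not Hadoop's real on-disk formats.

```python
def apply_edit(image, edit):
    """Apply one logged change to the namespace image (a set of paths)."""
    operation, path = edit
    if operation == "create":
        image.add(path)
    elif operation == "delete":
        image.discard(path)
    else:
        raise ValueError(f"unknown operation: {operation}")


def checkpoint(namespace_image, edit_log):
    """Merge the edit log into a copy of the image; return (new image, empty log)."""
    merged = set(namespace_image)  # work on a copy, as the secondary does
    for edit in edit_log:
        apply_edit(merged, edit)
    return merged, []


image = {"/ABC/DEF/hello.txt"}
edits = [("create", "/ABC/new.txt"), ("delete", "/ABC/DEF/hello.txt")]

fresh_image, fresh_log = checkpoint(image, edits)
print(sorted(fresh_image), fresh_log)  # prints: ['/ABC/new.txt'] []
```

After a checkpoint the fresh image already contains every change, so the edit log can start again from empty.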
The Secondary Namenode needs almost the same amount of memory and CPU for its work as the Namenode, so it is also kept on a separate machine, just like the namenode. Hence we see that in a single cluster we have one Namenode, one Secondary Namenode and many Datanodes, and HDFS consists of these three elements.
This was again an overview of the Hadoop Distributed File System (HDFS). In the next part of the tutorial we will look at the working of the Namenode and Datanodes in more detail, and at how reads and writes happen in HDFS.
Let me know in the comment section if you have any doubts in understanding anything, and I will be really glad to answer your questions :)
Comments
vishwash said...
very informative...
October 07, 2013
Tushar Karande said...
Thanks for such informative tutorials :)
Please keep posting.. waiting for more... :)
October 08, 2013
Anonymous said...
Nice information. But I have one doubt: what is the advantage of keeping the file in chunks on different datanodes? What kind of benefit are we getting here?
October 08, 2013
Deepak Kumar said...
@Anonymous: Well, there are lots of reasons... I will explain that in great detail in the next few articles...
But for now let us understand this... since we have split the file into two, we can now use the power of two processors (parallel processing) on two different nodes to do our analysis (like search, calculation, prediction and lots more).. Again, let's say my file size is in the petabytes... You won't find one hard disk that big.. and even if there is one... how do you think we are going to read and write on that hard disk (the latency will be really high for reads and writes)... it will take lots of time... Again, there are more reasons for the same... I will explain this in more technical ways in the coming tutorials... Till then keep reading :)
October 08, 2013