分布式,可扩展性系统架构及设计_第1页
分布式,可扩展性系统架构及设计_第2页
分布式,可扩展性系统架构及设计_第3页
分布式,可扩展性系统架构及设计_第4页
分布式,可扩展性系统架构及设计_第5页
已阅读5页,还剩65页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

1、LiveJournals BackendA history of scalingApril 2005Brad FMark S / / This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License. To view a copy of this license, visit /licenses/by-nc-sa/1.0/ or send a letter to Creative Commons, 559 Nathan Ab

2、bott Way, Stanford, California 94305, USA.LiveJournal Overviewcollege hobby project, Apr 1999“blogging”, forumssocial-networking (friends)aggregator: “friends page”April 20042.8 million accountsApril 20056.8 million accounts thousands of hits/secondwhy its interesting to you.100+ serverslots of MySQ

3、LLiveJournal Backend: TodayRoughly.User DB Cluster 1uc1auc1bUser DB Cluster 2uc2auc2bUser DB Cluster 3uc3auc3bUser DB Cluster 4uc4auc4bUser DB Cluster 5uc5auc5bMemcachedmc4mc3mc2mc12.mc1mod_perlweb4web3web2web50.web1BIG-IPbigip2bigip1perlbal (httpd/proxy)proxy4proxy3proxy2proxy5proxy1Global Database

4、slave1master_a master_bslave2.slave5MogileFS Databasemog_amog_bMogile Trackerstracker2tracker1Mogile Storage Nodes.sto2sto8sto1net.LiveJournal Backend: TodayRoughly.User DB Cluster 1uc1auc1bUser DB Cluster 2uc2auc2bUser DB Cluster 3uc3auc3bUser DB Cluster 4uc4auc4bUser DB Cluster 5uc5auc5bMemcachedm

5、c4mc3mc2mc12.mc1mod_perlweb4web3web2web50.web1BIG-IPbigip2bigip1perlbal (httpd/proxy)proxy4proxy3proxy2proxy5proxy1Global Databaseslave1master_a master_bslave2.slave5MogileFS Databasemog_amog_bMogile Trackerstracker2tracker1Mogile Storage Nodes.sto2sto8sto1net.The plan.Backend evolutionwork up to pr

6、evious diagramMyISAM vs. InnoDB(rare situations to use MyISAM)Four ways to do MySQL clustersfor high-availability and load balancingCachingmemcachedWeb load balancingPerlbal, MogileFSThings to look out for.MySQL wishlistBackend EvolutionFrom 1 server to 100+.where it hurtshow to fixLearn from this!d

7、ont repeat my mistakescan implement our design on a single serverOne Servershared serverdedicated server (still rented)still hurting, but could tune itlearn Unix pretty quickly (first root)CGI to FastCGISimpleOne Server - ProblemsSite gets slow eventually.reach point where tuning doesnt helpNeed ser

8、versstart “paid accounts”SPOF (Single Point of Failure):the box itselfTwo ServersPaid account revenue buys:Kenny: 6U Dell web serverCartman: 6U Dell database serverbigger / extra disksNetwork simple2 NICs eachCartman runs MySQL on internal networkTwo Servers - ProblemsTwo single points of failureNo

9、hot or cold sparesSite gets slow again.CPU-bound on web nodeneed more web nodes.Four ServersBuy two more web nodes (1U this time)Kyle, StanOverview: 3 webs, 1 dbNow we need to load-balance!Kept Kenny as gateway to outside worldmod_backhand amongst em allFour Servers - ProblemsPoints of failure:datab

10、asekenny (but could switch to another gateway easily when needed, or used heartbeat, but we didnt)nowadays: WhackamoleSite gets slow.IO-boundneed another database server . how to use another database?Five Serversintroducing MySQL replicationWe buy a new database serverMySQL replicationWrites to Cart

11、man (master)Reads from bothReplication Implementationget_db_handle() : $dbhexistingget_db_reader() : $dbrtransition to thisweighted selectionpermissions: slaves select-onlymysql option for this nowbe prepared for replication lageasy to detect in MySQL 4.xuser actions from $dbh, not $dbrMore ServersS

12、ites fast for a while,Then slowMore web servers,More database slaves,.IO vs CPU fightBIG-IP load balancerscheap from usenettwo, but not automatic fail-over (no support contract)LVS would work tooChaos!Where were at.mod_perlweb4web3web2web12.web1BIG-IPbigip2bigip1mod_proxyproxy3proxy2proxy1Global Dat

13、abaseslave1 slave2.slave6masternet.Problems with Architectureor,“This dont scale.”DB master is SPOFSlaves upon slaves doesnt scale well.only spreads reads200 writes/s200 write/s500 reads/s250 reads/s200 write/s250 reads/sw/ 1 serverw/ 2 serversEventually.databases eventual consumed by writing400 wri

14、te/s3 reads/s400write/s3 r/s400 write/s3 reads/s400write/s3 r/s400 write/s3 reads/s400write/s3 r/s400 write/s3 reads/s400write/s3 r/s400 write/s3 reads/s400write/s3 r/s400 write/s3 reads/s400write/s3 r/s400 write/s3 reads/s400write/s3 r/sSpreading WritesOur database machines already did RAIDWe did b

15、ackupsSo why put user data on 6+ slave machines? (12+ disks)overkill redundancywasting time writing everywhereIntroducing User ClustersAlready had get_db_handle() vs get_db_reader()Specialized handles:Partition datasetcant join. dont care. never join user data w/ other user dataEach user assigned to

16、 a cluster numberEach cluster has multiple machineswrites self-contained in cluster (writing to 2-3 machines, not 6)User Clustersalmost resembles todays architectureSELECT userid,clusterid FROM user WHERE user=bobuserid: 839clusterid: 2SELECT . FROM .WHERE userid=839 .OMG i like totally hate my pare

17、nts they just dont understand me and i h8 the world omg lol rofl *! :-;add me as a friend!User Cluster Implementationper-user numberspacescant use AUTO_INCREMENTuser A has id 5 on cluster 1.user B has id 5 on cluster 2. cant move to cluster 1PRIMARY KEY (userid, users_postid)InnoDB clusters this. us

18、er moves fast. most space freed in B-Tree when deleting from source.moving users around clustershave a read-only flag on userscareful user mover tooluser-moving harnessjob server that coordinates, distributed long-lived user-mover clients who ask for tasksbalancing disk I/O, disk spaceUser Cluster I

19、mplementation$u = LJ:load_user(“brad”)hits global cluster$u object contains its clusterid$dbcm = LJ:get_cluster_master($u)writesdefinitive reads$dbcr = LJ:get_cluster_reader($u)readsDBI:Role DB Load BalancingOur little library to give us DBI handlesGPL; not packaged anywhere but our cvsReturns handl

20、es given a role namemaster (writes), slave (reads)cluster,slave,a,bCan cache connections within a request or foreverVerifies connections from previous requestRealtime balancing of DB nodes within a roleweb / CLI interfaces (not part of library)dynamic reweighting when node downWhere were at.mod_perl

21、web4web3web2web25.web1BIG-IPbigip2bigip1mod_proxyproxy4proxy3proxy2proxy5proxy1net.User DB Cluster 1slave1slave2masterUser DB Cluster2slave1slave2masterGlobal Databaseslave1 slave2.slave6masterPoints of Failure1 x Global masterlamen x User cluster mastersn x lame.Slave relianceone dies, others readi

22、ng too muchSolution? .User DB Cluster 1slave1slave2masterUser DB Cluster2slave1slave2masterGlobal Databaseslave1 slave2.slave6masterMaster-Master Clusters!two identical machines per clusterboth “good” machinesdo all reads/writes to one at a time, both replicate from each otherintentionally only use

23、half our DB hardware at a time to be prepared for crasheseasy maintenance by flipping the active in pairno points of failureUser DB Cluster 1uc1auc1bUser DB Cluster 2uc2auc2bappMaster-Master Prereqsfailover shouldnt break replication, be it:automatic (be prepared for flapping)by hand (probably have

24、other problems)fun/tricky part is number allocationsame number allocated on both pairscross-replicate, explode.strategiesodd/even numbering (a=odd, b=even)if numbering is public, users suspicious3rd party: global database (our solution).Cold Co-Masterinactive machine in pair isnt getting readsStrate

25、giesswitch at night, orsniff reads on active pair, replay to inactive guyignore itnot a big deal with InnoDB7A7BClientsHot cache,happy.Cold cache,sad.Where were at.mod_perlweb4web3web2web25.web1BIG-IPbigip2bigip1mod_proxyproxy4proxy3proxy2proxy5proxy1net.User DB Cluster 1slave1slave2masterGlobal Dat

26、abaseslave1 slave2.slave6masterUser DB Cluster 2uc2auc2bMyISAM vs. InnoDBMyISAM vs. InnoDBUse InnoDB.Really.Little bit more config work, but worth it:wont lose data(unless your disks are lying, see later.)fast as hellMyISAM for:loggingwe do our web access logs to itread-only static dataplenty fast f

27、or readsLogging to MySQLmod_perl logging handlerINSERT DELAYED to mysqlMyISAM: appends to table w/o holes dont blockApaches access logging disableddiskless web nodeserror logs through syslog-ngProblems:too many connections to MySQL, too many connects/second (local port exhaustion)had to switch to sp

28、ecialized daemondaemons keeps persistent conn to MySQLother solutions werent fast enoughFour Clustering Strategies.Master / Slavedoesnt always scalereduces reads, not writescluster eventually writing full timegood uses:read-centric applicationssnapshot machine for backupscan be underpoweredbox for “

29、slow queries”when specialized non-production query requiredtable scannon-optimal index available200 writes/s500 reads/sw/ 1 server200 write/s250 reads/s200 write/s250 reads/sw/ 2 serversDownsidesDatabase master is SPOFReparenting slaves on master failure is trickyhang new master as slave off old mas

30、terwhile in production, loop:slave stop all slavescompare replication positionsif unequal, slave start, repeat.eventually itll matchif equal, change all slaves to be slaves of new master, stop old master, change config of whos the masterGlobal Databaseslave1 slave2new mastermasterGlobal Databaseslav

31、e1 slave2new mastermasterGlobal Databaseslave1slave2new mastermasterMaster / Mastergreat for maintenanceflipping active side for maintenance / backupsgreat for peace of mindtwo separate copiesCon: requires careful schemaeasiest to design for from beginningharder to tack on laterUser DB Cluster 1uc1a

32、uc1bMySQL Cluster“MySQL Cluster”: the productin-memory onlygood for small datasetsneed 2-4x RAM as your datasetperhaps your userid,username - user row (w/ clusterid) table?new set of table quirks, restrictionswas in developmentperhaps better now?Likely to kick ass in future:when not restricted to in

33、-memory dataset.planned development, last I heard?DRBDDistributed Replicated Block DeviceTurn pair of InnoDB machines into a clusterlooks like 1 box to outside world. floating IP.Linux block device driversits atop another block devicesyncs w/ another machines block devicecross-over gigabit cable ide

34、al. network is faster than random writes on your disks usually.One machine at a time running fs / MySQLHeartbeat does:failure detection, moves virtual IP, mounts filesystem, starts MySQL, InnoDB recoversMySQL 4.1 w/ binlog sync/flush options: goodThe cluster can be a master or slave as well.CachingC

35、achingcachings key to performancecant hit the DB all the timeMyISAM: r/w concurrency problemsInnoDB: better; not perfectMySQL has to parse your queries all the timebetter with new MySQL binary protocolWhere to cache?mod_perl caching (address space per apache child)shared memory (limited to single ma

36、chine, same with Java/C#/Mono)MySQL query cache: flushed per update, small max sizeHEAP tables: fixed length rows, small max sizememcachedhttp:/ Open Source, distributed caching systemrun instances wherever theres free memoryrequests hashed out amongst them allno “master node”protocol simple and XML

37、-free; clients for:perl, java, php, python, ruby, .In use by:LiveJournal, Slashdot, Wikipedia, SourceForge, HowardS, (hundreds).People speeding up their:websites, mail servers, .very fast.LiveJournal and memcached12 unique hostsnone dedicated28 instances30 GB of cached data90-93% hit rateWhat to Cac

38、heEverything?Start with stuff thats hotLook at your logsquery logupdate logslow logControl MySQL logging at runtimecanthelp me bug them.sniff the queries!mysniff.pl (uses Net:Pcap and decodes mysql stuff)canonicalize and countor, name queries: SELECT /* name=foo */Caching Disadvantagesextra codeupda

39、ting your cacheperhaps you can hide it all?clean object setting/accessor API?but dont cache (DB query) - (result set)want finer granularitymore stuff to adminbut only one real option: memory to useWeb Load BalancingWeb Load BalancingBIG-IP mostly packet-leveldoesnt buffer HTTP responsesneed to spoon

40、-feed clientsBIG-IP and others cant adjust server weighting quick enoughDB apps have widly varying response times: few ms to multiple secondsTried a dozen reverse proxiesnone did what we wanted or were fast enoughWrote Perlbalfast, smart, manageable HTTP web server/proxycan do internal redirectsPerl

41、balPerlbalPerluses epoll, kqueuesingle threaded, async event-basedconsole / HTTP remote managementlive config changeshandles dead nodes, balancingmultiple modesstatic webserverreverse proxyplug-ins (Javascript message bus.).plug-insGIF/PNG altering, .Perlbal: Persistent Connectionspersistent connect

42、ionsperlbal to backends (mod_perls)know exactly when a connection is ready for a new requestno complex load balancing logic: just use whatevers free. beats managing “weighted round robin” hell.clients persistent; not tied to backendverifies new connectionsconnects often fast, but talking to kernel,

43、not apache (listen queue)send OPTIONs request to see if apache is theremultiple queuesfree vs. paid user queuesPerlbal: cooperative large file servinglarge file serving w/ mod_perl bad.mod_perl has better things to do than spoon-feed clients bytesinternal redirectsmod_perl can pass off serving a big

44、 file to Perlbaleither from disk, or from other URL(s)client sees no HTTP redirect“Friends-only” imagesone, clean URLmod_perl does auth, and is done.perlbal serves.Internal redirect pictureMogileFSMogileFS: distributed filesystemalternatives at time were either:closed, expensive, in development, com

45、plicated, scary/impossible when it came to data recoveryMogileFS main ideas:files belong to classesclasses: minimum replica countstracks what disks files are onset disks state (up, temp_down, dead) and hostkeep replicas on devices on different hostsScrew RAID! (for this, for databases its good.)mult

46、iple tracker databasesall share same MySQL database clusterbig, cheap disksdumb storage nodes w/ 12, 16 disks, no RAIDMogileFS componentsclientstrackersmysql database clusterstorage nodesMogileFS: Clientstiny text-based protocolcurrently only Perlporting to $LANG would be trivialdoesnt do database a

47、ccessMogileFS: Trackerinterface between client protocol and cluster of MySQL machinesalso does automatic file replication, deleting, etc.MySQL databasemaster-slave or, recommended: MySQL on DRBDStorage nodesNFS or HTTP transportLinux NFS incredibly problematicHTTP transport is Perlbal with PUT &

48、 DELETE enabledStores blobs on filesystem, not in database:otherwise cant sendfile() on themwould require lots of user/kernel copiesLarge file GET requestLarge file GET requestAuth: complex, but quickSpoonfeeding: slow, but event-basedThings to watch out for.MyISAMsucks at concurrencyreads and writes at same time: cantexcept appendsloses data in unclean shutdown / powerlossrequires slow myisamchk / REPAIR TABLEindex corruption more often than Id likeInnoDB: checksums itselfSolution:use InnoDB tablesLying St

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论