LiveJournal's Backend
A history of scaling

April 2005
Brad F, Mark S

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License. To view a copy of this license, visit /licenses/by-nc-sa/1.0/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA.

LiveJournal Overview
- college hobby project, Apr 1999
- “blogging”, forums
- social-networking (friends)
- aggregator: “friends page”
- April 2004: 2.8 million accounts
- April 2005: 6.8 million accounts
- thousands of hits/second
- why it's interesting to you...
  - 100+ servers
  - lots of MySQL

LiveJournal Backend: Today
Roughly.
[Diagram: net; BIG-IP (bigip1, bigip2); perlbal httpd/proxy (proxy1 to proxy5); mod_perl (web1 to web50); Memcached (mc1 to mc12); Global Database (master_a, master_b, slave1 to slave5); User DB Clusters 1 to 5 (uc1a/uc1b through uc5a/uc5b); MogileFS Database (mog_a, mog_b); Mogile Trackers (tracker1, tracker2); Mogile Storage Nodes (sto1 to sto8)]

The plan...
- Backend evolution
  - work up to previous diagram
- MyISAM vs. InnoDB
  - (rare situations to use MyISAM)
- Four ways to do MySQL clusters
  - for high-availability and load balancing
- Caching
  - memcached
- Web load balancing
- Perlbal, MogileFS
- Things to look out for...
- MySQL wishlist

Backend Evolution
- From 1 server to 100+...
  - where it hurts
  - how to fix
- Learn from this!
  - don't repeat my mistakes
  - can implement our design on a single server

One Server
- shared server
- dedicated server (still rented)
  - still hurting, but could tune it
  - learn Unix pretty quickly (first root)
  - CGI to FastCGI
- Simple

One Server - Problems
- Site gets slow eventually.
  - reach point where tuning doesn't help
- Need servers
  - start “paid accounts”
- SPOF (Single Point of Failure):
  - the box itself

Two Servers
- Paid account revenue buys:
  - Kenny: 6U Dell web server
  - Cartman: 6U Dell database server
    - bigger / extra disks
- Network simple
  - 2 NICs each
- Cartman runs MySQL on internal network

Two Servers - Problems
- Two single points of failure
- No hot or cold spares
- Site gets slow again.
  - CPU-bound on web node
  - need more web nodes...

Four Servers
- Buy two more web nodes (1U this time)
  - Kyle, Stan
- Overview: 3 webs, 1 db
- Now we need to load-balance!
  - Kept Kenny as gateway to outside world
  - mod_backhand amongst 'em all

Four Servers - Problems
- Points of failure:
  - database
  - kenny (but could switch to another gateway easily when needed, or used heartbeat, but we didn't)
    - nowadays: Wackamole
- Site gets slow...
  - IO-bound
  - need another database server...
  - ... how to use another database?

Five Servers
introducing MySQL replication
- We buy a new database server
- MySQL replication
- Writes to Cartman (master)
- Reads from both

Replication Implementation
- get_db_handle() : $dbh
  - existing
- get_db_reader() : $dbr
  - transition to this
  - weighted selection
- permissions: slaves select-only
  - mysql option for this now
- be prepared for replication lag
  - easy to detect in MySQL 4.x
  - user actions from $dbh, not $dbr

More Servers
- Site's fast for a while,
- Then slow
- More web servers,
- More database slaves,
- ...
- IO vs CPU fight
- BIG-IP load balancers
  - cheap from usenet
  - two, but not automatic fail-over (no support contract)
  - LVS would work too
- Chaos!

Where we're at...
[Diagram: net; BIG-IP (bigip1, bigip2); mod_proxy (proxy1 to proxy3); mod_perl (web1 to web12); Global Database (master, slave1 to slave6)]
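
The get_db_handle() / get_db_reader() split from the Replication Implementation slide might look roughly like the sketch below. It is written in Python rather than LiveJournal's Perl, the host name `newdb` and both weights are made up for illustration, and `handle_for` is only one possible reading of "user actions from $dbh, not $dbr".

```python
import random

# Hypothetical hosts and weights; the slides only say "weighted selection".
MASTER = "cartman"
READERS = {"cartman": 1, "newdb": 3}


def get_db_handle():
    """Return the master: the only host that accepts writes."""
    return MASTER


def get_db_reader(rng=random):
    """Pick a host for reads, in proportion to its weight."""
    hosts = list(READERS)
    weights = [READERS[h] for h in hosts]
    return rng.choices(hosts, weights=weights, k=1)[0]


def handle_for(user_action):
    """Reads triggered by a user's own action go to the master, so the user
    never sees a slave that has not yet replayed their write."""
    return get_db_handle() if user_action else get_db_reader()
```

Keeping the master in the reader pool with a low weight matches "Reads from both" while leaving most of its capacity for writes.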

Problems with Architecture
or, “This don't scale...”
- DB master is SPOF
- Slaves upon slaves doesn't scale well...
  - only spreads reads
[Figure: w/ 1 server: 200 writes/s, 500 reads/s. w/ 2 servers: each server does 200 writes/s, 250 reads/s.]

Eventually...
- databases eventually consumed by writing
[Figure: many servers, each doing 400 writes/s and only 3 reads/s]
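
The arithmetic behind these two figures can be sketched as follows. This is a simplification under stated assumptions: a read is taken to cost the same as a write, and the per-machine capacities (700 and 403 operations per second) are inferred from the figures' numbers rather than stated on the slides.

```python
def read_headroom(capacity, writes, servers):
    """Reads/s the whole replicated set can still serve.

    capacity: ops/s one machine can do (assumption: read cost == write cost)
    writes:   site-wide writes/s; replication repeats each one on every machine
    servers:  master plus slaves
    """
    per_server = max(0, capacity - writes)
    return per_server * servers


print(read_headroom(700, 200, 1))   # 500
print(read_headroom(700, 200, 2))   # 1000
print(read_headroom(403, 400, 14))  # 42
```

Adding slaves multiplies only what is left after writes, so once writes approach a machine's capacity, fourteen machines together serve fewer reads than one machine did at the start.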

Spreading Writes
- Our database machines already did RAID
- We did backups
- So why put user data on 6+ slave machines? (12+ disks)
  - overkill redundancy
  - wasting time writing everywhere

Introducing User Clusters
- Already had get_db_handle() vs get_db_reader()
- Specialized handles:
- Partition dataset
  - can't join. don't care. never join user data w/ other user data
- Each user assigned to a cluster number
- Each cluster has multiple machines
  - writes self-contained in cluster (writing to 2-3 machines, not 6)

User Clusters
- almost resembles today's architecture
SELECT userid, clusterid FROM user WHERE user='bob'
  userid: 839
  clusterid: 2
SELECT ... FROM ... WHERE userid=839 ...
  OMG i like totally hate my parents they just dont understand me and i h8 the world omg lol rofl *! :-; add me as a friend!

User Cluster Implementation
- per-user numberspaces
  - can't use AUTO_INCREMENT
    - user A has id 5 on cluster 1.
    - user B has id 5 on cluster 2... can't move to cluster 1
  - PRIMARY KEY (userid, users_postid)
    - InnoDB clusters this. user moves fast. most space freed in B-Tree when deleting from source.
- moving users around clusters
  - have a read-only flag on users
  - careful user mover tool
  - user-moving harness
    - job server that coordinates, distributed long-lived user-mover clients who ask for tasks
  - balancing disk I/O, disk space

User Cluster Implementation
- $u = LJ::load_user(“brad”)
  - hits global cluster
  - $u object contains its clusterid
- $dbcm = LJ::get_cluster_master($u)
  - writes
  - definitive reads
- $dbcr = LJ::get_cluster_reader($u)
  - reads

DBI::Role - DB Load Balancing
- Our little library to give us DBI handles
  - GPL; not packaged anywhere but our cvs
- Returns handles given a role name
  - master (writes), slave (reads)
  - cluster,slave,a,b
  - Can cache connections within a request or forever
  - Verifies connections from previous request
- Realtime balancing of DB nodes within a role
  - web / CLI interfaces (not part of library)
  - dynamic reweighting when node down

Where we're at...
[Diagram: net; BIG-IP (bigip1, bigip2); mod_proxy (proxy1 to proxy5); mod_perl (web1 to web25); User DB Cluster 1 (master, slave1, slave2); User DB Cluster 2 (master, slave1, slave2); Global Database (master, slave1 to slave6)]
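
The two-step lookup and the per-user numberspaces from the User Clusters slides can be sketched like this. It is in Python with in-memory SQLite standing in for the MySQL clusters; the `post` table, the user `alice` and the MAX()+1 allocation are illustrative assumptions, not LiveJournal's actual schema or allocator.

```python
import sqlite3

# Hypothetical stand-ins: one in-memory database plays the global cluster
# and one plays each user cluster.
global_db = sqlite3.connect(":memory:")
global_db.execute(
    "CREATE TABLE user (userid INTEGER PRIMARY KEY, user TEXT UNIQUE, clusterid INTEGER)"
)
global_db.executemany(
    "INSERT INTO user VALUES (?, ?, ?)", [(839, "bob", 2), (840, "alice", 1)]
)

clusters = {1: sqlite3.connect(":memory:"), 2: sqlite3.connect(":memory:")}
for db in clusters.values():
    db.execute(
        "CREATE TABLE post (userid INTEGER, users_postid INTEGER, body TEXT, "
        "PRIMARY KEY (userid, users_postid))"
    )


def load_user(name):
    """Step 1: ask the global database who the user is and where they live."""
    row = global_db.execute(
        "SELECT userid, clusterid FROM user WHERE user = ?", (name,)
    ).fetchone()
    if row is None:
        raise KeyError(name)
    return {"userid": row[0], "clusterid": row[1]}


def add_post(name, body):
    """Step 2: write only to the user's own cluster, numbering posts per user."""
    u = load_user(name)
    db = clusters[u["clusterid"]]
    # Per-user numberspace: the next id is scoped to this userid, not the table.
    # MAX()+1 is a simplification here, not a concurrency-safe allocator.
    (next_id,) = db.execute(
        "SELECT COALESCE(MAX(users_postid), 0) + 1 FROM post WHERE userid = ?",
        (u["userid"],),
    ).fetchone()
    db.execute("INSERT INTO post VALUES (?, ?, ?)", (u["userid"], next_id, body))
    db.commit()
    return u["clusterid"], next_id
```

Because ids are unique only within (userid, users_postid), both users can hold post 1 at once, and either user's rows could be copied to another cluster without colliding with anyone else's.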

Points of Failure
- 1 x Global master
  - lame
- n x User cluster masters
  - n x lame.
- Slave reliance
  - one dies, others reading too much
Solution? ...
[Diagram: User DB Cluster 1 (master, slave1, slave2); User DB Cluster 2 (master, slave1, slave2); Global Database (master, slave1 to slave6)]

Master-Master Clusters!
- two identical machines per cluster
  - both “good” machines
- do all reads/writes to one at a time, both replicate from each other
- intentionally only use half our DB hardware at a time to be prepared for crashes
- easy maintenance by flipping the active in pair
- no points of failure
[Diagram: app; User DB Cluster 1 (uc1a, uc1b); User DB Cluster 2 (uc2a, uc2b)]

Master-Master Prereqs
- failover shouldn't break replication, be it:
  - automatic (be prepared for flapping)
  - by hand (probably have other problems)
- fun/tricky part is number allocation
  - same number allocated on both pairs
  - cross-replicate, explode.
- strategies
  - odd/even numbering (a=odd, b=even)
    - if numbering is public, users suspicious
  - 3rd party: global database (our solution)
  - ...

Cold Co-Master
- inactive machine in pair isn't getting reads
- Strategies
  - switch at night, or
  - sniff reads on active pair, replay to inactive guy
  - ignore it
    - not a big deal with InnoDB
[Figure: clients read from 7A (hot cache, happy); 7B sits idle (cold cache, sad)]

Where we're at...
[Diagram: net; BIG-IP (bigip1, bigip2); mod_proxy (proxy1 to proxy5); mod_perl (web1 to web25); User DB Cluster 1 (master, slave1, slave2); User DB Cluster 2 (uc2a, uc2b); Global Database (master, slave1 to slave6)]
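
The two number-allocation strategies named on the Master-Master Prereqs slide can be contrasted in a few lines. This is a Python sketch only: `GlobalCounter` keeps its state in a dict as a stand-in for the global database, and neither function reflects LiveJournal's real code.

```python
import itertools


def odd_even_ids(side):
    """Strategy 1: side 'a' hands out odd ids, side 'b' even ids."""
    start = {"a": 1, "b": 2}[side]
    return itertools.count(start, 2)


class GlobalCounter:
    """Strategy 2: a third party owns the counter, so it does not matter
    which machine of the pair is active when the id is requested."""

    def __init__(self):
        self._last = {}

    def alloc(self, userid):
        self._last[userid] = self._last.get(userid, 0) + 1
        return self._last[userid]
```

With odd/even numbering the two sides can never hand out the same id, but a user who only ever sees ids 1, 3, 5 may wonder where the rest went. The third-party counter yields a gapless per-user sequence at the cost of one more dependency on the global database.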

MyISAM vs. InnoDB

MyISAM vs. InnoDB
- Use InnoDB.
  - Really.
  - Little bit more config work, but worth it:
    - won't lose data (unless your disks are lying, see later...)
    - fast as hell
- MyISAM for:
  - logging
    - we do our web access logs to it
  - read-only static data
    - plenty fast for reads

Logging to MySQL
- mod_perl logging handler
  - INSERT DELAYED to mysql
  - MyISAM: appends to table w/o holes don't block
- Apache's access logging disabled
  - diskless web nodes
  - error logs through syslog-ng
- Problems:
  - too many connections to MySQL, too many connects/second (local port exhaustion)
  - had to switch to specialized daemon
    - daemon keeps persistent conn to MySQL
    - other solutions weren't fast enough

Four Clustering Strategies...

Master / Slave
- doesn't always scale
  - reduces reads, not writes
  - cluster eventually writing full time
- good uses:
  - read-centric applications
  - snapshot machine for backups
    - can be underpowered
  - box for “slow queries”
    - when specialized non-production query required
      - table scan
      - non-optimal index available
[Figure: w/ 1 server: 200 writes/s, 500 reads/s. w/ 2 servers: each server does 200 writes/s, 250 reads/s.]

Downsides
- Database master is SPOF
- Reparenting slaves on master failure is tricky
  - hang new master as slave off old master
  - while in production, loop:
    - slave stop all slaves
    - compare replication positions
    - if unequal, slave start, repeat.
      - eventually it'll match
    - if equal, change all slaves to be slaves of new master, stop old master, change config of who's the master
[Diagram, three steps: Global Database with master, new master, slave1 and slave2, as the slaves are moved from the old master to the new master]

Master / Master
- great for maintenance
  - flipping active side for maintenance / backups
- great for peace of mind
  - two separate copies
- Con: requires careful schema
  - easiest to design for from beginning
  - harder to tack on later
[Diagram: User DB Cluster 1 (uc1a, uc1b)]
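
The reparenting loop from the Master / Slave "Downsides" slide can be simulated to show why it converges. This Python toy makes strong assumptions: `ToySlave`, its `rate`, and a frozen `master_position` are all invented for the illustration, whereas a production master keeps taking writes and the real steps are MySQL commands.

```python
class ToySlave:
    """Toy replica: `position` mimics a replication position."""

    def __init__(self, name, position, rate):
        self.name = name
        self.position = position
        self.rate = rate            # events applied per round while running
        self.running = True
        self.master = "old_master"

    def catch_up(self, master_position):
        if self.running:
            self.position = min(master_position, self.position + self.rate)


def reparent(slaves, new_master, master_position, max_rounds=100):
    """Loop until every replica is stopped at the same position, then repoint."""
    for round_no in range(1, max_rounds + 1):
        for s in slaves:
            s.running = False                       # "slave stop" on all
        if len({s.position for s in slaves}) == 1:  # positions match
            for s in slaves:
                if s.name == new_master:
                    s.master = None                 # promoted, replicates from nobody
                else:
                    s.master = new_master
                    s.running = True
            return round_no
        for s in slaves:                            # unequal: "slave start", retry
            s.running = True
            s.catch_up(master_position)
    raise RuntimeError("replicas never reached a common position")
```

The new master is passed in as one of the replicas because the slide first hangs it off the old master. The loop repoints only when all replicas sit at one common position, which is what makes the switch safe.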

MySQL Cluster
- “MySQL Cluster”: the product
- in-memory only
  - good for small datasets
    - need 2-4x RAM as your dataset
    - perhaps your userid,username -> user row (w/ clusterid) table?
- new set of table quirks, restrictions
- was in development
  - perhaps better now?
- Likely to kick ass in future:
  - when not restricted to in-memory dataset.
    - planned development, last I heard?

DRBD
Distributed Replicated Block Device
- Turn pair of InnoDB machines into a cluster
  - looks like 1 box to outside world. floating IP.
- Linux block device driver
  - sits atop another block device
  - syncs w/ another machine's block device
    - cross-over gigabit cable ideal. network is faster than random writes on your disks usually.
- One machine at a time running fs / MySQL
- Heartbeat does:
  - failure detection, moves virtual IP, mounts filesystem, starts MySQL, InnoDB recovers
  - MySQL 4.1 w/ binlog sync/flush options: good
- The cluster can be a master or slave as well.

Caching

Caching
- caching's key to performance
- can't hit the DB all the time
  - MyISAM: r/w concurrency problems
  - InnoDB: better; not perfect
  - MySQL has to parse your queries all the time
    - better with new MySQL binary protocol
- Where to cache?
  - mod_perl caching (address space per apache child)
  - shared memory (limited to single machine, same with Java/C#/Mono)
  - MySQL query cache: flushed per update, small max size
  - HEAP tables: fixed length rows, small max size

memcached
http:/
- Open Source, distributed caching system
- run instances wherever there's free memory
  - requests hashed out amongst them all
- no “master node”
- protocol simple and XML-free; clients for:
  - perl, java, php, python, ruby, ...
- In use by:
  - LiveJournal, Slashdot, Wikipedia, SourceForge, HowardS, (hundreds)...
- People speeding up their:
  - websites, mail servers, ...
- very fast.

LiveJournal and memcached
- 12 unique hosts
  - none dedicated
- 28 instances
- 30 GB of cached data
- 90-93% hit rate

What to Cache
- Everything?
- Start with stuff that's hot
- Look at your logs
  - query log
  - update log
  - slow log
- Control MySQL logging at runtime
  - can't
    - help me bug them.
  - sniff the queries!
    - mysniff.pl (uses Net::Pcap and decodes mysql stuff)
  - canonicalize and count
    - or, name queries: SELECT /* name=foo */

Caching Disadvantages
- extra code
  - updating your cache
  - perhaps you can hide it all?
    - clean object setting/accessor API?
    - but don't cache (DB query) -> (result set)
      - want finer granularity
- more stuff to admin
  - but only one real option: memory to use

Web Load Balancing

Web Load Balancing
- BIG-IP mostly packet-level
  - doesn't buffer HTTP responses
    - need to spoon-feed clients
- BIG-IP and others can't adjust server weighting quick enough
  - DB apps have widely varying response times: few ms to multiple seconds
- Tried a dozen reverse proxies
  - none did what we wanted or were fast enough
- Wrote Perlbal
  - fast, smart, manageable HTTP web server/proxy
  - can do internal redirects

Perlbal

Perlbal
- Perl
- uses epoll, kqueue
- single threaded, async event-based
- console / HTTP remote management
  - live config changes
- handles dead nodes, balancing
- multiple modes
  - static webserver
  - reverse proxy
  - plug-ins (Javascript message bus...)
  - ...
- plug-ins
  - GIF/PNG altering, ...

Perlbal: Persistent Connections
- persistent connections
  - perlbal to backends (mod_perls)
    - know exactly when a connection is ready for a new request
    - no complex load balancing logic: just use whatever's free. beats managing “weighted round robin” hell.
  - clients persistent; not tied to backend
- verifies new connections
  - connects often fast, but talking to kernel, not apache (listen queue)
  - send OPTIONs request to see if apache is there
- multiple queues
  - free vs. paid user queues

Perlbal: cooperative large file serving
- large file serving w/ mod_perl bad...
  - mod_perl has better things to do than spoon-feed clients bytes
- internal redirects
  - mod_perl can pass off serving a big file to Perlbal
    - either from disk, or from other URL(s)
  - client sees no HTTP redirect
  - “Friends-only” images
    - one, clean URL
    - mod_perl does auth, and is done.
    - perlbal serves.

Internal redirect picture
[Diagram]

MogileFS

MogileFS: distributed filesystem
- alternatives at time were either:
  - closed, expensive, in development, complicated, scary/impossible when it came to data recovery
- MogileFS main ideas:
  - files belong to classes
    - classes: minimum replica counts
  - tracks what disks files are on
    - set disk's state (up, temp_down, dead) and host
  - keep replicas on devices on different hosts
    - Screw RAID! (for this, for databases it's good.)
  - multiple tracker databases
    - all share same MySQL database cluster
  - big, cheap disks
    - dumb storage nodes w/ 12, 16 disks, no RAID

MogileFS components
- clients
- trackers
- mysql database cluster
- storage nodes

MogileFS: Clients
- tiny text-based protocol
- currently only Perl
  - porting to $LANG would be trivial
- doesn't do database access

MogileFS: Tracker
- interface between client protocol and cluster of MySQL machines
- also does automatic file replication, deleting, etc.

MySQL database
- master-slave or, recommended: MySQL on DRBD

Storage nodes
- NFS or HTTP transport
  - Linux NFS incredibly problematic
- HTTP transport is Perlbal with PUT & DELETE enabled
- Stores blobs on filesystem, not in database:
  - otherwise can't sendfile() on them
  - would require lots of user/kernel copies

Large file GET request
[Diagram]
- Auth: complex, but quick
- Spoonfeeding: slow, but event-based

Things to watch out for...

MyISAM
- sucks at concurrency
  - reads and writes at same time: can't
    - except appends
- loses data in unclean shutdown / powerloss
  - requires slow myisamchk / REPAIR TABLE
  - index corruption more often than I'd like
    - InnoDB: checksums itself
- Solution:
  - use InnoDB tables

Lying St