




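Social Network Platforms and Technologies in the Cloud Computing Era
张智威 (Ed Chang)
Deputy Director, Google China Research Institute
Professor, Department of Electrical Engineering, University of California
2/16/2011
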
China Opportunity: China & US in 2006-07
                        China          U.S.
Internet Population     180 million    208 million
Broadband Users         60 million     60 million
Mobile Phones           500 million    180 million
Engineering Graduates   600k           72k
Growth figures also shown on the slide: 125%, 13%, 190%, 129%

Google China
· Size (~700)
  - 200 engineers
  - 400 other employees
  - Almost 100 interns
· Locations
  - Beijing (2005)
  - Taipei (2006)
  - Shanghai (2007)

Organizing the World's Information, Socially
· Social Platform
· Cloud Computing
· Concluding Remarks

[Figure: from Web 1.0 (documents such as .htm, .jpg, .doc, .xls, .msg) to the Web with People (2.0), and then to Social Platforms that host Apps (Gadgets)]

[Screenshots: an orkut profile with installed applications (e.g., iLike, Horoscopes, FunWall, Flixster Movies) and the orkut application directory, listing entries such as a competitive typing game played against friends and Horoscopes ("Get your horoscopes - Updated every other day")]
[Screenshot: Tianya Laiba (天涯来吧) friend map, placing a user's friends on a map of China ("20 users in this region, page 1 of 5")]

Open Social Platform
[Diagram: OpenSocial connecting sites such as LinkedIn, hi5, salesforce.com, and orkut]

Social Platform: Who Am I, My Friends, His Activities, His Stuff
[Screenshots: Tianya Laiba (天涯来吧) pages with embedded gadgets: the user's profile, the "My Friend Circle" friend list, and a friend's activities]

Open Social Platform: My Friends
[Screenshot: Tianya Laiba gift gadget showing gifts received and sent among friends]

Social Graph
[Figure: bubble chart of social-network activities (browse profiles, browse pictures, join or visit groups, browse marketplace, add a friend, join or browse networks, read discussion boards, search for members and groups), annotated with unique visitors and minutes per visit; the largest circle represents the largest audience, and a darker shade represents higher usage intensity]

What Do Users Want?
· People care about other people
  - care about people they know
  - connect to people they do not know
· Discover interesting information based on other people
  - about who other people are
  - about what other people are doing

Information Overflow Challenge
· Too many choices of people, forums, and apps
· Desiring a social network recommendation system

Recommendation System
· Friend Recommendation
· Community Recommendation
· Forum Recommendation
· Application Suggestion
· Ads Matching

Industry trend: the arrival of the cloud computing era
(1) Data lives in the cloud: no fear of losing it, no need for backups
(2) Software lives in the cloud: no downloads needed, automatic upgrades
(3) Ubiquitous cloud computing: on any device, once you log in, it is yours
(4) Unlimited power: unlimited storage, unlimited speed

Internet search: an example of cloud computing
1. The user enters query keywords
2. Data is preprocessed in a distributed fashion to serve the search
   [Figure: Google infrastructure (thousands of commodity servers around the world), with MapReduce for mass data processing and the Google File System]
3. Search results are returned

Collaborative Filtering
Given a matrix that "encodes" data
[Figure: a matrix of ratings on a 1-5 scale]

Collaborative Filtering
Given a matrix that "encodes" data, with some entries unknown
[Figure: Users × Communities matrix with missing entries marked "?"]
Many applications (collaborative filtering):
· User-Community
· User-User
· Ads-User
· Ads-Community
· etc.

Collaborative Filtering (CF) [Breese, Heckerman and Kadie 1998]
· Memory-based
  - Given a user, find similar users
  - Make predictions based on the preferences of these similar users
  - Different similarity measures yield different techniques
· Model-based
  - Build a model of the relationship between subject matters
  - Make predictions based on the constructed model

Memory-Based Model [Goldberg et al. 1992; Resnick et al. 1994; Konstan et al. 1997]
· Pros
  - Simplicity; avoids the model-building stage
· Cons
  - Memory- and time-consuming: uses the entire database every time to make a prediction
  - Cannot make a prediction if the user has no items in common with other users

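As a minimal sketch of the memory-based approach (not the method of the cited papers), the Python below predicts a missing rating from the most similar users, using cosine similarity over co-rated items. The function name `predict_user_based`, the choice of cosine similarity, the top-k weighting, and the convention that 0 marks a missing rating are all assumptions made for illustration. The function scans every user on each call and returns `None` when no neighbor shares items with the target user, which mirrors the two cons listed above.

```python
import numpy as np


def predict_user_based(R, user, item, k=2):
    """Predict R[user, item] from the k most similar users (memory-based CF sketch).

    R is a users x items rating matrix in which 0 marks a missing rating.
    Similarity is the cosine of two users' ratings on their co-rated items.
    Returns None when no usable neighbor exists.
    """
    rated = R > 0
    neighbors = []
    # Scan the entire rating database on every prediction.
    for v in range(R.shape[0]):
        if v == user or not rated[v, item]:
            continue
        common = rated[user] & rated[v]
        if not common.any():
            continue  # no items in common: similarity is undefined
        a, b = R[user, common], R[v, common]
        sim = float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
        neighbors.append((sim, float(R[v, item])))
    if not neighbors:
        return None
    neighbors.sort(reverse=True)  # most similar users first
    top = neighbors[:k]
    return sum(s * r for s, r in top) / sum(abs(s) for s, _ in top)


R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 0, 5, 4],
])
print(predict_user_based(R, user=1, item=1))  # ≈ 2.4, pulled toward user 0's rating of 3
```

The similarity-weighted average is one common choice; other similarity measures (e.g., Pearson correlation) would give a different technique, as the slide notes.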
Model-Based Model [Breese et al. 1998; Hofmann 1999; Blei et al. 2004]
· Pros
  - Scalability: the model is much smaller than the actual dataset
  - Faster prediction: query the model instead of the entire dataset
· Cons
  - Model building takes time

Algorithm Selection Criteria
· Near-real-time recommendation
· Scalable training
· Incremental training is desirable
· Can deal with data scarcity
· Cloud Computing!

Model-based Prior Work
· Latent Semantic Analysis (LSA)
· Probabilistic LSA (PLSA)
· Latent Dirichlet Allocation (LDA)

Latent Semantic Analysis (LSA) [Deerwester et al. 1990]
· Map high-dimensional count vectors to a lower-dimensional representation called the latent semantic space
· By SVD decomposition: $A = U \Sigma V^T$, where $A$ is $W \times D$ (words × docs), $U$ is $W \times T$, $\Sigma$ is $T \times T$, and $V^T$ is $T \times D$
  - $A$ = word-document co-occurrence matrix
  - $U_{ij}$ = how likely word $i$ is to belong to topic $j$
  - $\Sigma_{jj}$ = how significant topic $j$ is
  - $V^T_{ij}$ = how likely topic $i$ is to belong to doc $j$

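The decomposition can be tried on a toy matrix with NumPy's `numpy.linalg.svd`. This sketch (the matrix values and the helper name `lsa_low_rank` are invented for illustration) checks that $U \Sigma V^T$ reproduces $A$, then keeps only the k largest singular values, which is the low-rank step LSA relies on, and forms word-word and doc-doc similarity matrices from the result.

```python
import numpy as np

# Toy word x document co-occurrence matrix A (W = 4 words, D = 3 docs).
A = np.array([
    [2.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],
    [0.0, 3.0, 0.0],
    [0.0, 1.0, 0.0],
])

# Thin SVD: A = U @ diag(S) @ Vt, with singular values S in descending order.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
assert np.allclose(U @ np.diag(S) @ Vt, A)


def lsa_low_rank(A, k):
    """Rank-k approximation of A that keeps the k largest singular values."""
    U, S, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(S[:k]) @ Vt[:k, :]


A_hat = lsa_low_rank(A, k=2)
word_word = A_hat @ A_hat.T  # W x W word-word similarity
doc_doc = A_hat.T @ A_hat    # D x D doc-doc similarity
print(np.round(A_hat, 2))
```

By the Eckart-Young theorem, the rank-k truncation is the best rank-k approximation in Frobenius norm, and its error equals the root sum of squares of the discarded singular values.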
Latent Semantic Analysis (cont.)
· LSA keeps the k largest singular values
  - Low-rank approximation to the original matrix
  - Saves space, de-noises, and reduces sparsity
· Make recommendations using $\hat{A} = U_{W \times k} \Sigma_{k \times k} V^T_{k \times D}$
  - Word-word similarity: $\hat{A}\hat{A}^T$
  - Doc-doc similarity: $\hat{A}^T\hat{A}$
  - Word-doc relationship: $\hat{A}$

Probabilistic Latent Semantic Analysis (PLSA) [Hofmann 1999; Hofmann 2004]
· A document is viewed as a bag of words
· A latent semantic layer is constructed in between documents and words
  - $P(w, d) = P(d)\,P(w \mid d) = P(d) \sum_z P(w \mid z)\,P(z \mid d)$
· Probabilities deliver explicit meaning
· Model learning via the EM algorithm

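The EM updates for the PLSA factorization above can be written compactly with NumPy. This is a minimal sketch under stated assumptions: the function names, random initialization, fixed iteration count, and the small guard against division by zero are choices made here, not taken from the cited papers. The E-step computes $P(z \mid d, w) \propto P(w \mid z) P(z \mid d)$. The M-step re-estimates $P(w \mid z)$ and $P(z \mid d)$ from the expected counts $n(d, w) P(z \mid d, w)$. $P(d)$ is left at its empirical value, proportional to each document's length.

```python
import numpy as np


def _normalize(x, axis):
    """Scale x to sum to 1 along `axis` (slices with zero total stay zero)."""
    total = x.sum(axis=axis, keepdims=True)
    return x / np.maximum(total, 1e-300)


def plsa_em(N, K, iters=100, seed=0):
    """Fit PLSA by EM. N: D x W document-word counts.

    Returns (P(w|z) as K x W, P(z|d) as D x K).
    """
    rng = np.random.default_rng(seed)
    D, W = N.shape
    p_w_z = _normalize(rng.random((K, W)), axis=1)
    p_z_d = _normalize(rng.random((D, K)), axis=1)
    for _ in range(iters):
        # E-step: posterior P(z|d,w) proportional to P(z|d) P(w|z), shape D x W x K.
        post = _normalize(p_z_d[:, None, :] * p_w_z.T[None, :, :], axis=2)
        # M-step: expected counts n(d,w) P(z|d,w).
        expected = N[:, :, None] * post
        p_w_z = _normalize(expected.sum(axis=0).T, axis=1)  # sum over documents
        p_z_d = _normalize(expected.sum(axis=1), axis=1)    # sum over words
    return p_w_z, p_z_d


def log_likelihood(N, p_w_z, p_z_d):
    """sum over (d, w) of n(d,w) log P(w|d), with P(w|d) = sum_z P(w|z) P(z|d)."""
    return float(np.sum(N * np.log(np.maximum(p_z_d @ p_w_z, 1e-300))))


N = np.array([
    [4, 2, 0, 0],
    [3, 3, 0, 1],
    [0, 0, 5, 2],
    [0, 1, 4, 3],
], dtype=float)
p_w_z, p_z_d = plsa_em(N, K=2)
print(np.round(p_z_d, 2))  # per-document topic mixtures
```

EM never decreases the log-likelihood from one iteration to the next, so tracking `log_likelihood` is a simple sanity check on an implementation like this one.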
PLSA extensions
· LDA [Blei et al. 2003]
  - Provides a complete generative model with a Dirichlet prior
· AT [Griffiths & Steyvers 2004]
  - Includes authorship information
  - A document is categorized by authors and topics
· ART [McCallum 2004]
  - Includes the email recipient as additional information
  - An email is categorized by author, recipients, and topics
· PHITS [Cohn & Chang 2000]
  - Models document-citation co-occurrence
· Joint PLSA-PHITS [Cohn & Hofmann 2001]
  - Combination of PLSA and PHITS
  - Models contents (words) and inter-connectivity of documents
· CCF: Combinational Collaborative Filtering
  - Fuses multiple information sources: alleviates the information sparsity problem
  - Hybrid training scheme: Gibbs sampling as initialization for the EM algorithm
  - Parallelization: achieves linear speedup with the number of machines

Notations
· Given a collection of co-occurrence data
  - Community: $C = \{c_1, c_2, \ldots, c_n\}$
  - User: $U = \{u_1, u_2, \ldots, u_m\}$
  - Description: $D = \{d_1, d_2, \ldots, d_v\}$
  - Latent aspect: $Z = \{z_1, z_2, \ldots, z_k\}$
· Models
  - Baseline models
    · Community-User (C-U) model
    · Community-Description (C-D) model
  - CCF: Combinational Collaborative Filtering
    · Combines both baseline models

Community-Description (C-D) model
· A community is viewed as a bag of words
· c and d are rendered conditionally independent by introducing z
· Generative process, for each word d:
  1. A community c is chosen uniformly
  2. A topic z is selected from P(z|c)
  3. A word d is generated from P(d|z)

Community-User (C-U) model
· A community is viewed as a bag of users
· c and u are rendered conditionally independent by introducing z
· Generative process, for each user u:
  1. A community c is chosen uniformly
  2. A topic z is selected from P(z|c)
  3. A user u is generated from P(u|z)

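The three-step generative process shared by the C-D and C-U models can be simulated directly. In this sketch the parameter arrays, the helper name `sample_generative`, and the toy probabilities are assumptions for illustration. The same function serves both models, with the emitted item being a word d for C-D or a user u for C-U. Averaging over topics gives the marginal the model implies, $P(x \mid c) = \sum_z P(x \mid z) P(z \mid c)$.

```python
import numpy as np


def sample_generative(p_z_c, p_x_z, n_samples, seed=0):
    """Draw (c, z, x) triples from the C-D / C-U generative process.

    p_z_c: C x K array, row c is P(z|c).
    p_x_z: K x X array, row z is P(x|z), where x is a word (C-D) or a user (C-U).
    """
    rng = np.random.default_rng(seed)
    n_comm, n_topics = p_z_c.shape
    n_items = p_x_z.shape[1]
    samples = []
    for _ in range(n_samples):
        c = int(rng.integers(n_comm))               # 1. community chosen uniformly
        z = int(rng.choice(n_topics, p=p_z_c[c]))   # 2. topic z drawn from P(z|c)
        x = int(rng.choice(n_items, p=p_x_z[z]))    # 3. word/user x drawn from P(x|z)
        samples.append((c, z, x))
    return samples


p_z_c = np.array([[0.9, 0.1], [0.2, 0.8]])            # 2 communities, 2 topics
p_x_z = np.array([[0.7, 0.3, 0.0], [0.0, 0.2, 0.8]])  # 2 topics, 3 users (or words)
p_x_c = p_z_c @ p_x_z                                 # implied P(x|c)
samples = sample_generative(p_z_c, p_x_z, n_samples=20000)
```

With enough samples, the empirical frequency of each item within a community approaches the corresponding row of `p_x_c`.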
Baseline Models
Community-Description (C-D) model
· Pros
  1. Clusters communities based on community content (description words)
· Cons
  1. No personalized recommendation
  2. Does not consider the overlapping users between communities
Community-User (C-U) model
· Pros
  1. Personalized community suggestion
· Cons
  1. The C-U matrix is sparse, so the model may suffer from the sparsity problem
  2. Cannot take advantage of content similarity between communities

Combinational Collaborative Filtering (CCF) model
· CCF combines both baseline models
  - A community is viewed as a bag of users AND a bag of words
  - By adding C-U, CCF can perform personalized recommendation, which C-D alone cannot
  - By adding C-D, CCF can perform better personalized recommendation than C-U alone, which may suffer from sparsity
· Things CCF can do that C-U and C-D cannot
  - P(d|u): relate users to words
  - Useful for targeting ads to users
[Figure: CCF graphical model: community c, drawn with P(c), generates topic z through P(z|c); z generates both user u through P(u|z) and word d through P(d|z)]

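Because CCF ties communities, users, and words through the shared topic z, user-level quantities follow from Bayes' rule. The sketch below shows one way to compute them, assuming the four CCF distributions have already been estimated and stored as dense arrays; the function name `ccf_user_distributions` and the array layout are hypothetical. It returns $P(c \mid u) \propto P(c) \sum_z P(z \mid c) P(u \mid z)$ for community recommendation, and $P(d \mid u) = \sum_z P(d \mid z) P(z \mid u)$, with $P(z \mid u) \propto P(u \mid z) P(z)$ and $P(z) = \sum_c P(c) P(z \mid c)$, for the user-to-word relation.

```python
import numpy as np


def ccf_user_distributions(p_c, p_z_c, p_u_z, p_d_z):
    """Derive P(c|u) and P(d|u) from CCF parameters via Bayes' rule.

    p_c: (C,) P(c); p_z_c: C x K P(z|c); p_u_z: K x M P(u|z); p_d_z: K x V P(d|z).
    Returns (P(c|u) as M x C, P(d|u) as M x V).
    """
    p_z = p_c @ p_z_c                                  # P(z) = sum_c P(c) P(z|c)
    p_z_u = p_u_z.T * p_z                              # M x K, proportional to P(u|z) P(z)
    p_z_u = p_z_u / p_z_u.sum(axis=1, keepdims=True)   # normalize to P(z|u)
    p_d_u = p_z_u @ p_d_z                              # P(d|u) = sum_z P(z|u) P(d|z)
    joint_cu = (p_u_z.T @ p_z_c.T) * p_c               # M x C, P(u|c) P(c)
    p_c_u = joint_cu / joint_cu.sum(axis=1, keepdims=True)
    return p_c_u, p_d_u


# Toy parameters: 2 communities, 2 topics, 3 users, 4 words.
p_c = np.array([0.5, 0.5])
p_z_c = np.array([[0.8, 0.2], [0.1, 0.9]])
p_u_z = np.array([[0.6, 0.4, 0.0], [0.1, 0.2, 0.7]])
p_d_z = np.array([[0.5, 0.5, 0.0, 0.0], [0.0, 0.0, 0.3, 0.7]])
p_c_u, p_d_u = ccf_user_distributions(p_c, p_z_c, p_u_z, p_d_z)
print(np.argmax(p_c_u, axis=1))  # top recommended community for each user
```

Ranking each row of `p_c_u` gives a community recommendation list; ranking each row of `p_d_u` gives the words most associated with a user.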
Algorithm Requirements
· Near-real-time recommendation
· Scalable training
· Incremental training is desirable

Parallelizing CCF
Details omitted.

Experiments on Orkut Dataset
· Data description
  - Collected on July 26, 2007
  - Two types of data were extracted: community-user and community-description
  - 312,385 users
  - 109,987 communities
  - 191,034 unique English words
· Community recommendation
· Community similarity/clustering
· User similarity
· Speedup

Community Recommendation
· Evaluation method
  - No ground truth and no user clicks available
  - Leave-one-out: randomly delete one community for each user
  - Check whether the deleted community can be recovered
· Evaluation metric
  - Precision and recall

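The leave-one-out protocol can be sketched as follows. The callback `recommend(user, known)` stands in for any ranker (CCF, C-U, or otherwise) and is hypothetical, as is the data layout. Because only one community is held out per user, this sketch counts recall@k as the fraction of users whose deleted community appears in their top-k list, and precision@k as hits divided by the k slots per user. That precision convention is an assumption, not necessarily the one used in the experiments.

```python
import random


def leave_one_out(user_communities, recommend, k, seed=0):
    """Leave-one-out precision@k and recall@k.

    user_communities: dict mapping user -> set of joined communities.
    recommend(user, known): returns community ids ranked best first.
    """
    rng = random.Random(seed)
    hits = evaluated = 0
    for user, joined in user_communities.items():
        if len(joined) < 2:
            continue  # keep at least one community as input for the recommender
        held_out = rng.choice(sorted(joined))   # randomly delete one community
        known = joined - {held_out}
        ranked = [c for c in recommend(user, known) if c not in known][:k]
        hits += int(held_out in ranked)         # was the deleted community recovered?
        evaluated += 1
    if evaluated == 0:
        return 0.0, 0.0
    return hits / (evaluated * k), hits / evaluated


def popular_first(user, known):
    """Toy ranker: the same fixed popularity order for every user."""
    return [1, 2, 3, 4]


data = {"alice": {1, 2}, "bob": {1, 3}, "carol": {2, 3}, "dave": {4}}
print(leave_one_out(data, popular_first, k=2))  # (0.5, 1.0)
```

Sweeping `k` over increasing list lengths produces precision and recall curves against the length of the recommendation list.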
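[Figure: precision and recall (percentage) versus length of the recommendation list, CCF vs. C-U]
Observations:
· CCF outperforms C-U
· For the top 20, the precision and recall of CCF are twice as high as those of C-U
[Figure: prediction performance versus the number of communities a user has joined]
· The more communities a user has joined, the better CCF and C-U can predict
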
Runtime Speedup
· The Orkut dataset enjoys a linear speedup when the number of machines is up to 100
· Reduces the training time from one day to less than 14 minutes

Machines   Time (sec.)   Speedup
10         9,233         10
20         4,326         21.3
50         2,280         40.5
100        1,014         91.1
200        796           116

But what makes the speedup slow down after 100 machines?
[Figure: speedup versus number of machines]

Runtime Speedup (cont.)
· Training time consists of two parts:
  - Computation time (Comp)
  - Communication time (Comm)
[Figure: computation and communication time versus number of machines]

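The speedup column follows from the runtimes. In this sketch (helper names are hypothetical) speedup is normalized so that the 10-machine baseline counts as 10, i.e. $S(p) = 10\,T(10)/T(p)$, taking the 10-machine and 200-machine runtimes as 9,233 s and 796 s, the values consistent with the reported speedups of 10 and 116. Dividing by $p$ gives parallel efficiency, which falls from about 0.91 at 100 machines to about 0.58 at 200, the slowdown the slide asks about.

```python
# Runtimes in seconds from the Runtime Speedup table, keyed by number of machines.
runtimes = {10: 9233, 20: 4326, 50: 2280, 100: 1014, 200: 796}


def speedup(runtimes, base=10):
    """S(p) = base * T(base) / T(p); assumes linear speedup up to the baseline."""
    t_base = runtimes[base]
    return {p: base * t_base / t for p, t in runtimes.items()}


def efficiency(runtimes, base=10):
    """Parallel efficiency S(p) / p (1.0 means perfectly linear)."""
    return {p: s / p for p, s in speedup(runtimes, base).items()}


for p, s in speedup(runtimes).items():
    print(f"{p:>4} machines: speedup {s:6.1f}, efficiency {s / p:.2f}")
```

Under the same linear-to-10 assumption, a single machine would need roughly 10 × 9,233 s ≈ 25.6 hours, in line with the slide's "one day".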
CCF Summary
· Combinational Collaborative Filtering
  - Fuses bags of words and bags of users information
  - Hybrid training provides better initializations for EM than random initialization
  - Parallelized to handle large-scale datasets

China's Contributions to Cloud Computing
· Parallel CCF
· Parallel SVMs (Kernel Machines)
· Parallel SVD
· Parallel Spectral Clustering
· Parallel Expectation Maximization
· Parallel Association Mining
· Parallel LDA

Speeding up SVMs [NIPS 2007]
· Approximate matrix factorization
· Parallelization
· Open source @/p/psvm
· A task that takes 7 days on 1 machine takes 1 hour on 500 machines
· 350+ downloads since December 07

Incomplete Cholesky Factorization (ICF)
· $(n \times n) \approx (n \times p)(p \times n)$ with $p \ll n$, which conserves storage

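Incomplete Cholesky factorization can be sketched as a pivoted Cholesky decomposition that stops after p columns. This version is an assumption-laden sketch, not the PSVM code; the names `incomplete_cholesky`, `H`, and `tol` are illustrative. It approximates an n × n positive semidefinite kernel matrix K as $H H^T$ with $H$ of size n × p, so storage falls from $n^2$ to $np$ numbers. At each step it picks the row with the largest remaining diagonal residual as the pivot.

```python
import numpy as np


def incomplete_cholesky(K, p, tol=1e-10):
    """Pivoted incomplete Cholesky: return H (n x r, r <= p) with K ~= H @ H.T.

    K must be symmetric positive semidefinite. Stops early once every residual
    diagonal entry is at most tol.
    """
    n = K.shape[0]
    H = np.zeros((n, p))
    residual = np.diag(K).astype(float)   # diagonal of K - H @ H.T
    for j in range(p):
        i = int(np.argmax(residual))      # pivot on the largest residual
        if residual[i] <= tol:
            return H[:, :j]
        H[:, j] = (K[:, i] - H[:, :j] @ H[i, :j]) / np.sqrt(residual[i])
        residual -= H[:, j] ** 2
    return H


# Example: an RBF kernel on 200 points in [0, 1], approximated with at most 20 columns.
x = np.linspace(0.0, 1.0, 200)
K = np.exp(-((x[:, None] - x[None, :]) ** 2) / 0.1)
H = incomplete_cholesky(K, p=20)
print(H.shape, np.abs(K - H @ H.T).max())
```

Here storing H takes at most 200 × 20 = 4,000 numbers instead of 200 × 200 = 40,000 for K, and the saving grows with n when p stays fixed.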