Social Networks in the Cloud Computing Era: Platforms and Technologies (云计算时代的社交网络:平台和技术), 张智威 (Ed Chang)

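Ed Chang (张智威)
Deputy Director, Research Institute, Google China
Professor, Department of Electrical Engineering, University of California
2/16/2011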

China Opportunity: China & US in 2006-07

                         China          U.S.
Internet Population      180 million    208 million
Broadband Users          60 million     60 million
Mobile Phones            500 million    180 million
Engineering Graduates    600k           72k

Percentages shown on the slide: (125%), (13%), (190%), (129%)

Google China
· Size (~700)
  - 200 engineers
  - 400 other employees
  - Almost 100 interns
· Locations
  - Beijing (2005)
  - Taipei (2006)
  - Shanghai (2007)

Organizing the World's Information, Socially
· Social Platform
· Cloud Computing
· Concluding Remarks

[Figure: Web 1.0 (linked .htm, .doc, .jpg, .msg and .xls documents); Web with People (2.0); + Social Platforms with Apps (Gadgets)]

[Screenshots: the orkut application directory (apps such as iLike, Horoscopes, FunWall and Flixster Movies, each with an "add application" link), and the friend map of 天涯来吧 (Tianya Laiba) viewed in Mozilla Firefox]

Open Platform: OpenSocial
[Figure: social sites joined through the OpenSocial open platform, including orkut, LinkedIn, hi5 and salesforce.com]
· Who I am

Social Platform
[Screenshots: 天涯来吧 (Tianya Laiba) "My Friends" pages in Mozilla Firefox, annotated with the elements of a social platform: who I am, my friends, their activities, their stuff]

Open Platform
[Screenshots: 天涯来吧 (Tianya Laiba) pages showing a friend's profile, photo albums, and gifts sent and received]

Social Graph
[Figure: social-network activities, including Browse Pictures, Join or Visit Groups, Browse MarketPlace, Add A Friend, Join or Browse Networks, Read Discussion Boards, and Search for Members and Groups, each annotated with unique visitors (in millions) and time per visit (in minutes). Larger circles represent larger audiences; darker shades represent higher usage intensity.]

What Users Want?
· People care about other people
  - care about people they know
  - connect to people they do not know
  - about who other people are
  - about what other people are doing
· Discover interesting information
  - based on other people

Information Overflow Challenge
· Too many people, too many choices of forums and apps
· "I soon need ... full-time to manage my online social networks"
· Desiring a Social Network Recommendation System

Recommendation System
· Friend Recommendation
· Community/Forum Recommendation
· Application Suggestion
· Ads Matching

Organizing the World's Information, Socially
· Social Platform
· Cloud Computing
· Concluding Remarks

Industry Trend: The Arrival of the Cloud Computing Era
(1) Data in the cloud
  · No fear of loss
  · No need to back up
(2) Software in the cloud
  · No need to download
  · Automatic upgrades
(3) Ubiquitous cloud computing
  · Any device
  · Once you log in, it is yours
(4) Infinitely powerful cloud computing
  · Unlimited space
  · Unlimited speed

Internet Search: An Example of Cloud Computing
1. The user enters query keywords
2. Data is preprocessed in a distributed fashion to serve search
   - Google Cloud Computing Infrastructure: thousands of commodity servers around the world
   - Built for mass data, e.g. the Google File System
3. Search results are returned

Collaborative Filtering
Given a matrix that "encodes" data
[Figure: a matrix with integer entries from 1 to 5]

Given a matrix that "encodes" data, with some entries unknown (marked "?")
[Figure: a Users × Communities matrix with missing entries]
Many applications (collaborative filtering):
· User-Community
· User-User
· Ads-User
· Ads-Community
· etc.

Collaborative Filtering (CF) [Breese, Heckerman and Kadie 1998]
· Memory-based
  - Find users "similar" to a given user
  - Different similarity measures yield different techniques
  - Make predictions based on the preferences of these "similar" users
· Model-based
  - Build a model of relationship between subject matters
  - Make predictions based on the constructed model

Memory-Based Model [Goldberg et al. 1992; Resnick et al. 1994; Konstan et al. 1997]
· Pros
  - Simplicity: avoids the model-building stage
· Cons
  - Memory and time consuming: uses the entire database every time to make a prediction
  - Cannot make a prediction if the user has no items in common with other users
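The memory-based approach can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the toy `RATINGS` table, the choice of cosine similarity over co-rated items, and the function names are hypothetical and are not taken from the talk. The sketch shows both listed drawbacks: every prediction scans the whole table, and a user who shares no items with anyone gets no prediction.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating on a 1-5 scale}.
RATINGS = {
    "ann": {"a": 5, "b": 4, "c": 1},
    "bob": {"a": 5, "b": 5, "d": 4},
    "cat": {"a": 1, "c": 5, "d": 2},
}

def cosine(u, v):
    """Cosine similarity restricted to the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0  # no items in common: treat the pair as unrelated
    dot = sum(u[i] * v[i] for i in common)
    norm_u = sqrt(sum(u[i] ** 2 for i in common))
    norm_v = sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)

def predict(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, theirs in ratings.items():  # scans the entire database
        if other == user or item not in theirs:
            continue
        w = cosine(ratings[user], theirs)
        num += w * theirs[item]
        den += w
    return num / den if den else None  # None: no usable neighbour

print(predict(RATINGS, "ann", "d"))
```

Swapping `cosine` for another similarity measure (for example Pearson correlation) yields a different memory-based technique without changing `predict`.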

Model-Based Model [Breese et al. 1998; Hofmann 1999; Blei et al. 2004]
· Pros
  - Scalability: the model is much smaller than the actual dataset
  - Faster prediction: query the model instead of the entire dataset
· Cons
  - Model-building takes time

Algorithm Selection Criteria
· Near-real-time Recommendation
· Scalable Training
· Incremental Training is Desirable
· Can deal with data scarcity
· Cloud Computing!

Model-based Prior Work
· Latent Semantic Analysis (LSA)
· Probabilistic LSA (PLSA)
· Latent Dirichlet Allocation (LDA)

Latent Semantic Analysis (LSA) [Deerwester et al. 1990]
· Map high-dimensional count vectors to a lower-dimensional representation called latent semantic space
· By SVD decomposition: $A = U \Sigma V^T$
  - $A$ ($W \times D$) = word-document co-occurrence matrix
  - $U_{ij}$ ($U$ is $W \times T$) = how likely word $i$ belongs to topic $j$
  - $\Sigma_{jj}$ ($\Sigma$ is $T \times T$) = how significant topic $j$ is
  - $V^T_{ij}$ ($V^T$ is $T \times D$) = how likely topic $i$ belongs to doc $j$
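As a rough illustration of the decomposition, the sketch below extracts the leading singular triplets of a small word-document matrix and rebuilds a low-rank approximation from them. It is only a sketch under stated assumptions: the toy matrix `A` is made up, and power iteration with deflation is used here as a dependency-free stand-in for a real SVD routine, not as the method used in the talk.

```python
from math import sqrt

def matvec(M, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def top_singular_triplets(A, k, iters=200):
    """Top-k (sigma, u, v) of A via power iteration on A^T A with deflation."""
    A = [list(map(float, row)) for row in A]  # private copy, deflated in place
    triplets = []
    for _ in range(k):
        At = [list(col) for col in zip(*A)]
        v = [1.0] * len(A[0])
        for _ in range(iters):
            w = matvec(At, matvec(A, v))      # (A^T A) v
            norm = sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        Av = matvec(A, v)
        sigma = sqrt(sum(x * x for x in Av))
        u = [x / sigma for x in Av]
        triplets.append((sigma, u, v))
        for i, row in enumerate(A):           # deflate: A <- A - sigma u v^T
            for j in range(len(row)):
                row[j] -= sigma * u[i] * v[j]
    return triplets

def low_rank(A, k):
    """Rank-k approximation: the sum of the top-k sigma * u v^T terms."""
    approx = [[0.0] * len(A[0]) for _ in A]
    for sigma, u, v in top_singular_triplets(A, k):
        for i in range(len(A)):
            for j in range(len(A[0])):
                approx[i][j] += sigma * u[i] * v[j]
    return approx

# Hypothetical 4-word x 4-document count matrix with two obvious topics.
A = [[2, 2, 0, 0],
     [1, 1, 0, 0],
     [0, 0, 3, 3],
     [0, 0, 1, 1]]

for sigma, u, v in top_singular_triplets(A, 2):
    print(round(sigma, 3), [round(x, 3) for x in u])
```

Each printed `u` groups the words of one topic, and `low_rank(A, 1)` keeps only the stronger of the two topics.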

Latent Semantic Analysis (cont.)
· LSA keeps the k largest singular values
  - Low-rank approximation to the original matrix
  - Saves space, de-noisifies and reduces sparsity
· Make recommendations using $\hat{A}$
  - Word-word similarity: $\hat{A} \hat{A}^T$
  - Doc-doc similarity: $\hat{A}^T \hat{A}$
  - Word-doc relationship: $\hat{A}$
[Figure: $\hat{A}$ (Words × Docs, $W \times D$) as the product of a $W \times K$ word-topic matrix, a $K \times K$ diagonal matrix and a $K \times D$ topic-doc matrix]

Probabilistic Latent Semantic Analysis (PLSA) [Hofmann 1999; Hofmann 2004]
· Document is viewed as a bag of words
· A latent semantic layer is constructed in between documents and words
· $P(w, d) = P(d) P(w \mid d) = P(d) \sum_z P(w \mid z) P(z \mid d)$
· Probability delivers explicit meaning
  - $P(w \mid w)$, $P(d \mid d)$, $P(d, w)$
· Model learning via the EM algorithm
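The factorization $P(w \mid d) = \sum_z P(w \mid z) P(z \mid d)$ can be fitted with EM as in the following minimal sketch. The toy count matrix, the random initialization, the iteration count and all names are assumptions made for illustration; this is the textbook PLSA update, not code from the talk.

```python
import random

def plsa(counts, n_topics, iters=100, seed=0):
    """Fit PLSA by EM. counts[d][w] = n(d, w). Returns (P(w|z), P(z|d))."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def random_distribution(n):
        xs = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(xs)
        return [x / total for x in xs]

    p_w_z = [random_distribution(n_words) for _ in range(n_topics)]
    p_z_d = [random_distribution(n_topics) for _ in range(n_docs)]

    for _ in range(iters):
        new_wz = [[0.0] * n_words for _ in range(n_topics)]
        new_zd = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior P(z|d,w) proportional to P(w|z) P(z|d)
                post = [p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)]
                total = sum(post)
                for z in range(n_topics):
                    expected = counts[d][w] * post[z] / total
                    new_wz[z][w] += expected
                    new_zd[d][z] += expected
        # M-step: renormalize the expected counts into distributions
        p_w_z = [[x / sum(row) for x in row] for row in new_wz]
        p_z_d = [[x / sum(row) for x in row] for row in new_zd]
    return p_w_z, p_z_d

# Hypothetical counts: documents 0-1 use words 0-1, documents 2-3 use words 2-3.
COUNTS = [[5, 5, 0, 0],
          [5, 5, 0, 0],
          [0, 0, 5, 5],
          [0, 0, 5, 5]]

p_w_z, p_z_d = plsa(COUNTS, n_topics=2)
for d, row in enumerate(p_z_d):
    print(d, [round(x, 3) for x in row])
```

On data this clean, the two document groups end up dominated by different latent topics; on real data EM only finds a local optimum, which is why initialization matters.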

· LDA [Blei et al. 2003]
  - Provides a complete generative model with a Dirichlet prior
· AT [Griffiths & Steyvers 2004]
  - Includes authorship information
  - Document is categorized by authors and topics
· ART [McCallum 2004]
  - Includes email recipient as additional information
  - Email is categorized by author, recipients and topics

PLSA extensions
· PHITS [Cohn & Chang 2000]
  - Models document-citation co-occurrence
· [Cohn & Hofmann 2001]
  - Models contents (words) and inter-connectivity of documents
  - Combination of PLSA and PHITS

Combinational Collaborative Filtering (CCF)
· Fuse multiple information
  - Alleviate the information sparsity problem
· Hybrid training scheme
  - Gibbs sampling as initializations for the EM algorithm
· Parallelization
  - Achieve linear speedup with the number of machines

Notations
· Given a collection of co-occurrence data
  - Community: $C = \{c_1, c_2, \ldots, c_N\}$
  - User: $U = \{u_1, u_2, \ldots, u_M\}$
  - Description: $D = \{d_1, d_2, \ldots, d_V\}$
  - Latent aspect: $Z = \{z_1, z_2, \ldots, z_K\}$
· Models
  - Baseline models
    · Community-User (C-U) model
    · Community-Description (C-D) model
  - CCF: Combinational Collaborative Filtering
    · Combines both baseline models

Baseline Models
Community-User (C-U) model
· Community is viewed as a bag of users
· $c$ and $u$ are rendered conditionally independent by introducing $z$
· Generative process, for each user $u$
  1. A community $c$ is chosen uniformly
  2. A topic $z$ is selected from $P(z \mid c)$
  3. A user $u$ is generated from $P(u \mid z)$
Community-Description (C-D) model
· Community is viewed as a bag of words
· $c$ and $d$ are rendered conditionally independent by introducing $z$
· Generative process, for each word $d$
  1. A community $c$ is chosen uniformly
  2. A topic $z$ is selected from $P(z \mid c)$
  3. A word $d$ is generated from $P(d \mid z)$

Baseline Models (cont.)
Community-User (C-U) model
- Pros
  1. Personalized community suggestion
- Cons
  1. C-U matrix is sparse, may suffer from information sparsity problem
  2. Cannot take advantage of content similarity between communities
Community-Description (C-D) model
- Pros
  1. Cluster communities based on community content (description words)
- Cons
  1. No personalized recommendation
  2. Does not consider the overlapped users between communities

Combinational Collaborative Filtering (CCF) model
· CCF combines both baseline models
· A community is viewed as a bag of users AND a bag of words
· By adding C-U, CCF can perform personalized recommendation, which C-D alone cannot
· By adding C-D, CCF can perform better personalized recommendation than C-U alone, which may suffer from sparsity
· Things CCF can do that C-U and C-D cannot
  - $P(d \mid u)$, relate user to word
  - Useful for user targeting ads
[Figure: CCF graphical model. A community $c$ is drawn with $P(c)$, a topic $z$ with $P(z \mid c)$, and then a user $u$ with $P(u \mid z)$ and a word $d$ with $P(d \mid z)$]

Algorithm Requirements
· Near-real-time Recommendation
· Scalable Training
· Incremental Training is Desirable

Parallelizing CCF
Details omitted
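To make the CCF generative story concrete, here is a small sketch that samples from it and derives a user-to-word relation $P(d \mid u)$. All probability tables are invented toy values, and treating one (community, user, word) triple as a single draw sharing one topic is a simplifying assumption made for illustration; the talk does not spell out how $P(d \mid u)$ is computed.

```python
import random

# Hypothetical toy parameters: 2 communities, 2 topics, 3 users, 2 words.
P_C = [0.5, 0.5]                              # P(c): chosen uniformly
P_Z_C = [[0.9, 0.1], [0.2, 0.8]]              # P(z|c), one row per community
P_U_Z = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]    # P(u|z), one row per topic
P_D_Z = [[0.8, 0.2], [0.3, 0.7]]              # P(d|z), one row per topic

def sample(rng):
    """One pass through the generative process: c, then z, then u and d."""
    c = rng.choices(range(2), weights=P_C)[0]
    z = rng.choices(range(2), weights=P_Z_C[c])[0]
    u = rng.choices(range(3), weights=P_U_Z[z])[0]
    d = rng.choices(range(2), weights=P_D_Z[z])[0]
    return c, z, u, d

def joint(c, u, d):
    """P(c, u, d) = P(c) * sum_z P(z|c) P(u|z) P(d|z)."""
    return P_C[c] * sum(P_Z_C[c][z] * P_U_Z[z][u] * P_D_Z[z][d]
                        for z in range(2))

def p_word_given_user(d, u):
    """P(d|u), obtained by marginalizing communities out of the joint."""
    num = sum(joint(c, u, d) for c in range(2))
    den = sum(joint(c, u, dd) for c in range(2) for dd in range(2))
    return num / den

print(sample(random.Random(0)))
print(round(p_word_given_user(0, 0), 4))
```

User 0 is mostly generated by topic 0, whose favourite word is word 0, so `p_word_given_user(0, 0)` comes out well above one half. That kind of user-to-word link is what the slide points to for ad targeting.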

Experiments on Orkut Dataset
· Data description
  - Collected on July 26, 2007
  - Two types of data were extracted
    · Community-user, community-description
  - 312,385 users
  - 109,987 communities
  - 191,034 unique English words
· Community recommendation
· Community similarity/clustering
· User similarity
· Speedup

Community Recommendation
· Evaluation method
  - No ground truth, no user clicks available
  - Leave-one-out: randomly delete one community for each user
  - Check whether the deleted community can be recovered
· Evaluation metric
  - Precision and Recall
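The leave-one-out protocol can be sketched as follows. The membership table, the popularity-based recommender and the exact precision and recall definitions (one relevant item per user, top-k list) are assumptions made for illustration; the experiments in the talk would plug in CCF or the C-U model instead of the hypothetical baseline used here.

```python
import random
from collections import Counter

def popularity_recommender(train, user, k):
    """Hypothetical baseline: most popular communities the user has not joined."""
    counts = Counter(c for comms in train.values() for c in comms)
    ranked = [c for c, _ in counts.most_common() if c not in train[user]]
    return ranked[:k]

def leave_one_out(memberships, recommend, k, seed=0):
    """Hide one community per user and check whether the top-k list recovers it."""
    rng = random.Random(seed)
    hits = users = 0
    for user, comms in memberships.items():
        if len(comms) < 2:
            continue                      # keep at least one community as evidence
        held_out = rng.choice(sorted(comms))
        train = dict(memberships)
        train[user] = comms - {held_out}
        hits += held_out in recommend(train, user, k)
        users += 1
    precision = hits / (users * k)        # one relevant item among k suggestions
    recall = hits / users
    return precision, recall

# Hypothetical memberships: user -> set of joined communities.
MEMBERSHIPS = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"a", "c"},
}

print(leave_one_out(MEMBERSHIPS, popularity_recommender, k=3))
```

With only three communities and k = 3, every hidden community is necessarily recovered, so recall is 1 and precision is 1/3; shorter lists trade recall for precision.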

[Figure: precision and recall (percentage) versus the length of the recommendation list, and versus the number of communities a user has joined]
Observations:
· CCF outperforms C-U
· For top 20, precision/recall of CCF are twice higher than those of C-U
· The more communities a user has joined, the better CCF/C-U can predict

Runtime Speedup
· The Orkut dataset enjoys a linear speedup when the number of machines is up to 100
· Reduces the training time from one day to less than 14 minutes

Machines    Time (sec.)    Speedup
10          9,233          10
20          4,326          21.3
50          2,280          40.5
100         1,014          91.1
200         706            116

[Figure: speedup versus number of machines]
· But what makes the speedup slow down after 100 machines?

Runtime Speedup (cont.)
· Training time consists of two parts:
  - Computation time (Comp)
  - Communication time (Comm)
[Figures: speedup versus number of machines]
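One way to read the speedup figures is through parallel efficiency, and one way to see why speedup can flatten is a toy cost model with a communication term. In the sketch below the speedup values are the ones reported in the runtime table, while `modeled_speedup` and its parameters are purely hypothetical: assuming communication cost grows linearly with the number of machines is an illustration, not a measurement from the talk.

```python
# (machines, speedup) pairs as reported in the runtime speedup table.
REPORTED = [(10, 10.0), (20, 21.3), (50, 40.5), (100, 91.1), (200, 116.0)]

def efficiency(machines, speedup):
    """Fraction of ideal linear speedup actually achieved."""
    return speedup / machines

def modeled_speedup(p, comp_1, comm_per_machine):
    """Toy model: T(p) = comp_1 / p + comm_per_machine * p, speedup = T(1 machine, no comm) / T(p)."""
    return comp_1 / (comp_1 / p + comm_per_machine * p)

for machines, speedup in REPORTED:
    print(machines, round(efficiency(machines, speedup), 3))

# Hypothetical cost parameters, chosen only to show the qualitative shape.
for p in (10, 100, 200, 400):
    print(p, round(modeled_speedup(p, comp_1=10_000.0, comm_per_machine=0.5), 1))
```

In this model computation time shrinks as 1/p while communication time grows with p, so speedup peaks and then declines once the communication term dominates.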

CCF Summary
· Combinational Collaborative Filtering
  - Fuses bags of words and bags of users information
  - Hybrid training provides better initializations for EM rather than random seeding
  - Parallelized to handle large-scale datasets

China's Contributions on/to Cloud Computing
· Parallel CCF
· Parallel SVMs (Kernel Machines)
· Parallel SVD
· Parallel Spectral Clustering
· Parallel Expectation Maximization
· Parallel Association Mining
· Parallel LDA

Speeding up SVMs [NIPS 2007]
· Approximate Matrix Factorization
· Parallelization
  - Open source @ /p/psvm
  - 350+ downloads since December 07
· A task that takes 7 days on 1 machine takes 1 hour on 500 machines

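Incomplete Cholesky Factorization (ICF)
[Figure: an $n \times n$ matrix approximated by the product of an $n \times p$ matrix and its $p \times n$ transpose]
· $p \ll n$ → Conserve Storage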
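The storage argument can be illustrated with a small pivoted incomplete Cholesky routine that approximates a positive semi-definite kernel matrix $K$ ($n \times n$) by $H H^T$ with $H$ of size $n \times p$. This is a generic textbook variant written for illustration; the toy data, the linear kernel and the function names are assumptions, and the routine is not the parallel implementation referred to in the talk.

```python
from math import sqrt

def icf(K, p, tol=1e-12):
    """Pivoted incomplete Cholesky: returns H (n x p) with K approximately H H^T."""
    n = len(K)
    diag = [K[i][i] for i in range(n)]         # residual diagonal
    H = [[0.0] * p for _ in range(n)]
    for j in range(p):
        i = max(range(n), key=diag.__getitem__)  # pivot: largest residual
        if diag[i] <= tol:
            break                                # K is already well approximated
        pivot = sqrt(diag[i])
        for r in range(n):
            residual = K[r][i] - sum(H[r][t] * H[i][t] for t in range(j))
            H[r][j] = residual / pivot
        for r in range(n):
            diag[r] -= H[r][j] ** 2
    return H

def reconstruct(H):
    """Compute H H^T."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in H]
            for row_i in H]

# Hypothetical data: 4 points in 2 dimensions, linear kernel K = X X^T (rank 2).
X = [[1, 0], [0, 1], [1, 1], [2, 1]]
K = [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

H = icf(K, p=2)
error = max(abs(K[i][j] - reconstruct(H)[i][j]) for i in range(4) for j in range(4))
print(error)
```

Here $H$ stores $n \times p = 8$ numbers instead of the $n \times n = 16$ entries of $K$; the saving becomes significant when $p \ll n$.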
