
Understanding Social Media with Machine Learning

Xiaojin Zhu
jerryzhu@
Department of Computer Sciences
University of Wisconsin–Madison, USA

CCF/ADL Beijing 2013

Outline

1. Spatio-Temporal Signal Recovery from Social Media
2. Machine Learning Basics
   - Probability
   - Statistical Estimation
   - Decision Theory
   - Graphical Models
   - Regularization
   - Stochastic Processes
3. Socioscope: A Probabilistic Model for Social Media
4. Case Study: Roadkill

Part 1: Spatio-Temporal Signal Recovery from Social Media

Spatio-temporal Signal: When, Where, How Much

- Direct instrumental sensing is difficult and expensive.

Humans as Sensors

- Not "hot trend" discovery: we know what event we want to monitor.
- Not natural language processing for social media: we are given a reliable text classifier for "hit".
- Our task: precisely estimating a spatiotemporal intensity function $f_{st}$ of a pre-defined target phenomenon.

Challenges of Using Humans as Sensors

- A keyword doesn't always mean the event:
  "I was just told I look like dead crow."
  "Don't blame me if one day I treat you like a dead crow."
- Human sensors aren't under our control.
- Location stamps may be erroneous or missing:
  - 3% have GPS coordinates: (-98.24, 23.22)
  - 47% have a valid user profile location: Bristol, UK; New York
  - 50% don't have valid location information: "Hogwarts", "In the traffic..blah", "Sitting On A Taco"

Problem Definition

- Input: a list of time and location stamps of the target posts.
- Output: $f_{st}$, the intensity of the target phenomenon at location s (e.g., New York) and time t (e.g., 0-1am).

Why Simple Estimation is Bad

- Naive estimate: $f_{st} = x_{st}$, the count of target posts in bin (s,t).
- Justification: MLE of the model $x \sim \mathrm{Poisson}(f)$.
- However:
  - Population bias: assume $f_{st} = f_{s't'}$; if there are more users in (s,t), then $x_{st} > x_{s't'}$.
  - Imprecise location: posts without a location stamp, noisy user profile locations.
  - Zero/low counts: if we don't see tweets from Antarctica, are there no penguins there?
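A minimal numpy sketch of the population-bias point (an added example, not from the slides; the per-user posting rate and the 100x population gap are made-up numbers): two bins share the same true intensity, yet the raw counts, which are exactly the per-bin Poisson MLE, differ wildly.

```python
import numpy as np

rng = np.random.default_rng(0)

f_true = 3.0                   # same target intensity in both bins
users = np.array([10, 1000])   # but populations differ 100x (made up)
rate_per_user = 0.01 * f_true  # hypothetical per-user posting rate

# Counts in each (s,t) bin; the Poisson MLE from a single count x is x
# itself, so the naive estimate f_st = x_st inherits the population bias.
counts = rng.poisson(rate_per_user * users)
print("raw counts (naive f_st):", counts)
```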

Part 2: Machine Learning Basics

Machine Learning Basics: Probability

Probability

- The probability of a discrete random variable A taking the value a is $P(A=a) \in [0,1]$. Sometimes written as $P(a)$ when there is no danger of confusion.
- Normalization: $\sum_a P(A=a) = 1$.
- Joint probability $P(A=a, B=b) = P(a,b)$: the two events both happen at the same time.
- Marginalization: $P(A=a) = \sum_b P(a,b)$.
- The product rule: $P(a,b) = P(a)P(b|a) = P(b)P(a|b)$.

95MachineLearningBasicsProbabiBayes

rule

P(a|b)

=P(b|a)P(a).In

general,

P(a|b,C)

=P(b|C)Rp(D|✓)p(✓)d✓

the

evidence,Machine

Learning

BasicsProbabilityBayes

RuleP(b)

P(b|a,C)P(a|C)where

C

can

be

one

or

morerandom

variables.Bayesian

approach:

when

is

model

parameter,

D

is

observed

data,we

havep(✓|D)

=p(D|✓)p(✓)

p(D),Rp(D|✓)d✓

6=

1),IIIIp(✓)

is

the

prior,p(D|✓)

the

likelihood

function

(of

✓,

not

normalized:p(D)

=p(✓|D)

the

posterior.Zhu

(U

Wisconsin)Understanding

Social

MediaCCF/ADL

Beijing

201313

/
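As a concrete illustration of $p(\theta|D) \propto p(D|\theta)\,p(\theta)$, a minimal grid-approximation sketch (an added example, not from the deck) for the bias of a coin:

```python
import numpy as np

theta = np.linspace(0.01, 0.99, 99)          # candidate parameter values
prior = np.full_like(theta, 1 / len(theta))  # uniform prior p(theta)

heads, flips = 7, 10
likelihood = theta**heads * (1 - theta)**(flips - heads)  # p(D|theta)

evidence = np.sum(likelihood * prior)      # p(D), the normalizer
posterior = likelihood * prior / evidence  # Bayes rule
print("posterior mean of theta:", np.sum(theta * posterior))
```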

Independence

- The product rule simplifies to $P(a,b) = P(a)P(b)$ iff A and B are independent.
- Equivalently, $P(a|b) = P(a)$ and $P(b|a) = P(b)$.

Probability Density

- A continuous random variable X has a probability density function (pdf) $p(x) \ge 0$.
- $p(x) > 1$ is possible! Integrates to 1: $\int_{-\infty}^{\infty} p(x)\,dx = 1$.
- $P(x_1 < X < x_2) = \int_{x_1}^{x_2} p(x)\,dx$.
- Marginalization: $p(x) = \int_{-\infty}^{\infty} p(x,y)\,dy$.

Expectation and Variance

- The expectation ("mean" or "average") of a function f under the probability distribution P is
  $E_P[f] = \sum_a P(a) f(a)$, or $E_p[f] = \int_x p(x) f(x)\,dx$.
- In particular, if $f(x) = x$, this is the mean of the random variable X.
- The variance of f is $\mathrm{Var}(f) = E[(f(x) - E[f(x)])^2] = E[f(x)^2] - E[f(x)]^2$.
- The standard deviation is $\mathrm{std}(f) = \sqrt{\mathrm{Var}(f)}$.
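A quick Monte Carlo check of the identity $\mathrm{Var}(f) = E[f(x)^2] - E[f(x)]^2$ with $f(x) = x$ (an added example):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=1_000_000)

var_direct = ((x - x.mean()) ** 2).mean()     # E[(x - E[x])^2]
var_identity = (x ** 2).mean() - x.mean() ** 2
print(var_direct, var_identity)               # both close to 9 = 3^2
```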

Multivariate Statistics

- When x, y are vectors, $E[x]$ is the mean vector.
- $\mathrm{Cov}(x,y)$ is the covariance matrix with (i,j)-th entry $\mathrm{Cov}(x_i, y_j)$:
  $\mathrm{Cov}(x,y) = E_{x,y}[(x - E[x])(y - E[y])^\top] = E_{x,y}[xy^\top] - E[x]E[y]^\top$

Some Discrete Distributions

- Point mass: X has a point mass at a if $P(X = a) = 1$.
- Binomial, with parameters n (number of trials) and p (head probability):
  $f(x) = \binom{n}{x} p^x (1-p)^{n-x}$ for $x = 0, 1, \ldots, n$, and 0 otherwise.
- Bernoulli: Binomial with n = 1.
- Multinomial, with $p = (p_1, \ldots, p_d)^\top$ (a d-sided die):
  $f(x) = \binom{n}{x_1, \ldots, x_d} \prod_{k=1}^{d} p_k^{x_k}$ if $\sum_{k=1}^{d} x_k = n$, and 0 otherwise.

More Discrete Distributions

- Poisson: $X \sim \mathrm{Poisson}(\lambda)$ if $f(x) = \frac{\lambda^x e^{-\lambda}}{x!}$ for $x = 0, 1, 2, \ldots$
- $\lambda$ is the rate or intensity parameter; mean: $\lambda$, variance: $\lambda$.
- If $X_1 \sim \mathrm{Poisson}(\lambda_1)$ and $X_2 \sim \mathrm{Poisson}(\lambda_2)$, then $X_1 + X_2 \sim \mathrm{Poisson}(\lambda_1 + \lambda_2)$.
- This is a distribution on unbounded counts, with a probability mass function "hump" (mode at $\lceil \lambda \rceil - 1$).
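The additivity of independent Poissons is easy to check empirically; a sketch (an added example using scipy.stats; the rates 2 and 5 are arbitrary):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lam1, lam2 = 2.0, 5.0

s = rng.poisson(lam1, 100_000) + rng.poisson(lam2, 100_000)
print("empirical mean, var:", s.mean(), s.var())  # both near 7
print("P(S=7), model vs sample:",
      stats.poisson.pmf(7, lam1 + lam2), (s == 7).mean())
```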

Some Continuous Distributions

- Gaussian (Normal): $X \sim N(\mu, \sigma^2)$ with parameters $\mu \in \mathbb{R}$ (the mean) and $\sigma^2$ (the variance):
  $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
- $\sigma$ is the standard deviation.
- If $\mu = 0$ and $\sigma = 1$, X has a standard normal distribution.
- If $X \sim N(\mu, \sigma^2)$, then $Z = (X - \mu)/\sigma$ has a standard normal distribution.
- If $X_i \sim N(\mu_i, \sigma_i^2)$ independently, then $\sum_i X_i \sim N\left(\sum_i \mu_i, \sum_i \sigma_i^2\right)$.

- Multivariate Gaussian: let $x, \mu \in \mathbb{R}^d$ and $\Sigma \in S_{+}^{d}$, a symmetric, positive definite matrix of size $d \times d$. Then $X \sim N(\mu, \Sigma)$ with pdf
  $f(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left(-\frac{1}{2}(x-\mu)^\top \Sigma^{-1} (x-\mu)\right)$,
  where $|\Sigma|$ is the determinant of $\Sigma$ and $\Sigma^{-1}$ its inverse.

Marginal and Conditional of Gaussian

If two (groups of) variables x, y are jointly Gaussian:

$\begin{pmatrix} x \\ y \end{pmatrix} \sim N\left( \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} A & C \\ C^\top & B \end{pmatrix} \right) \qquad (1)$

- (Marginal) $x \sim N(\mu_x, A)$
- (Conditional) $y \mid x \sim N\left(\mu_y + C^\top A^{-1}(x - \mu_x),\; B - C^\top A^{-1} C\right)$
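A minimal sketch of the conditional formula (an added example with made-up scalar blocks, so $C^\top A^{-1}$ reduces to $C/A$):

```python
mu_x, mu_y = 0.0, 1.0
A, C, B = 2.0, 1.2, 1.5   # Var(x), Cov(x,y), Var(y): made-up values

def conditional_y_given_x(x):
    """Mean and variance of y | x, per the slide's formula."""
    mean = mu_y + (C / A) * (x - mu_x)   # mu_y + C^T A^{-1} (x - mu_x)
    var = B - (C / A) * C                # B - C^T A^{-1} C
    return mean, var

print(conditional_y_given_x(1.0))  # (1.6, 0.78)
```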

More Continuous Distributions

- The Gamma function: $\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx$ with $\alpha > 0$.
  Generalizes the factorial: $\Gamma(n) = (n-1)!$ when n is a positive integer; $\Gamma(\alpha + 1) = \alpha\,\Gamma(\alpha)$ for $\alpha > 0$.
- The Gamma distribution, with shape parameter $\alpha > 0$ and scale parameter $\beta > 0$:
  $f(x) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, x^{\alpha-1} e^{-x/\beta}, \quad x > 0$
- Conjugate prior for the Poisson rate.
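Conjugacy here means the posterior stays in the Gamma family. A one-line derivation (a standard result, not spelled out on the slide): with a $\mathrm{Gamma}(\alpha, \beta)$ prior on the Poisson rate $\lambda$ and counts $x_1, \ldots, x_n$,

$p(\lambda \mid x_{1:n}) \propto \left(\lambda^{\sum_i x_i} e^{-n\lambda}\right)\left(\lambda^{\alpha-1} e^{-\lambda/\beta}\right) = \lambda^{\alpha + \sum_i x_i - 1}\, e^{-\lambda\,(n + 1/\beta)}$

so the posterior is $\mathrm{Gamma}\left(\alpha + \sum_i x_i,\ (n + 1/\beta)^{-1}\right)$ in the scale parametrization above.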

Machine Learning Basics: Statistical Estimation

Parametric Models

- A statistical model H is a set of distributions. In machine learning, we call H the hypothesis space.
- A parametric model can be parametrized by a finite number of parameters: $f(x) \equiv f(x;\theta)$ with parameter $\theta \in \Theta \subseteq \mathbb{R}^d$:
  $H = \{ f(x;\theta) : \theta \in \Theta \}$,
  where $\Theta$ is the parameter space.

- We denote the expectation $E_\theta(g) = \int_x g(x) f(x;\theta)\,dx$.
- $E_\theta$ means $E_{x \sim f(x;\theta)}$, not an expectation over different $\theta$'s.
- "All (parametric) models are wrong. Some are more useful than others."

Nonparametric Models

- A nonparametric model cannot be parametrized by a fixed number of parameters; model complexity grows indefinitely with sample size.
- Example: $H = \{P : \mathrm{Var}_P(X) < \infty\}$. Given iid data $x_1, \ldots, x_n$, the optimal estimator of the mean is again $\frac{1}{n}\sum_i x_i$.
- Nonparametric models make weaker assumptions and thus are preferred. But parametric models converge faster and are more practical.

Estimation

- A (point) estimator $\hat\theta_n$ is a function of the data $X_1, \ldots, X_n$ that attempts to estimate a parameter $\theta$. This is the "learning" in machine learning!
- Example: in classification, $X_i = (x_i, y_i)$ and $\hat\theta_n$ is the learned model.
- Consistent estimators learn the correct model with more training data, eventually.

Bias and Standard Error

- The bias of the estimator is $\mathrm{bias}(\hat\theta_n) = E_\theta(\hat\theta_n) - \theta$.
  $E_\theta$ is w.r.t. the joint distribution $f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta)$.
- An estimator is unbiased if $\mathrm{bias}(\hat\theta_n) = 0$.
- The standard error of an estimator is $\mathrm{se}(\hat\theta_n) = \sqrt{\mathrm{Var}_\theta(\hat\theta_n)}$.
- Example: let $\hat\mu = \frac{1}{n}\sum_i x_i$, where $x_i \sim N(0,1)$. The standard deviation of each $x_i$ is 1 regardless of n. In contrast, $\mathrm{se}(\hat\mu) = 1/\sqrt{n} = n^{-\frac{1}{2}}$, which decreases with n.
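The example is easy to verify by simulation; a sketch (an added example):

```python
import numpy as np

rng = np.random.default_rng(0)

# se(mu_hat) = 1/sqrt(n), while each x_i keeps standard deviation 1.
for n in [10, 100, 1000]:
    mu_hats = rng.normal(0, 1, size=(20_000, n)).mean(axis=1)
    print(n, mu_hats.std(), 1 / np.sqrt(n))  # empirical vs. 1/sqrt(n)
```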

MSE

- The mean squared error of an estimator is $\mathrm{mse}(\hat\theta_n) = E_\theta\left((\hat\theta_n - \theta)^2\right)$.
- Bias-variance decomposition:
  $\mathrm{mse}(\hat\theta_n) = \mathrm{bias}^2(\hat\theta_n) + \mathrm{se}^2(\hat\theta_n) = \mathrm{bias}^2(\hat\theta_n) + \mathrm{Var}_\theta(\hat\theta_n)$
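For completeness, the standard derivation (not spelled out on the slide). Writing $m = E_\theta \hat\theta_n$, the cross term $2\,E_\theta[\hat\theta_n - m]\,(m - \theta)$ vanishes because $E_\theta[\hat\theta_n - m] = 0$, so

$E_\theta\left[(\hat\theta_n - \theta)^2\right] = E_\theta\left[(\hat\theta_n - m)^2\right] + (m - \theta)^2 = \mathrm{Var}_\theta(\hat\theta_n) + \mathrm{bias}^2(\hat\theta_n)$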

Maximum Likelihood

- Let $x_1, \ldots, x_n \sim f(x;\theta)$ where $\theta \in \Theta$. The likelihood function is
  $L_n(\theta) = f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^{n} f(x_i; \theta)$
- The log likelihood function is $\ell_n(\theta) = \log L_n(\theta)$.
- The maximum likelihood estimator (MLE) is
  $\hat\theta_n = \mathrm{argmax}_{\theta \in \Theta}\, L_n(\theta) = \mathrm{argmax}_{\theta \in \Theta}\, \ell_n(\theta)$

MLE Examples

- The MLE for p(head) from n coin flips is count(head)/n.
- The MLE for $X_i \sim N(\mu, \sigma^2)$ is $\hat\mu = \frac{1}{n}\sum_i X_i$ and $\hat\sigma^2 = \frac{1}{n}\sum_i (X_i - \hat\mu)^2$.
- The MLE does not always agree with intuition. The MLE for $X_1, \ldots, X_n \sim \mathrm{uniform}(0,\theta)$ is $\hat\theta = \max(X_1, \ldots, X_n)$.
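A sketch for the uniform case (an added example), with a comment on why the maximum is the argmax:

```python
import numpy as np

rng = np.random.default_rng(0)
theta_true = 4.2
x = rng.uniform(0, theta_true, size=50)

# Any theta < max(x) gives likelihood 0, and for theta >= max(x) the
# likelihood (1/theta)^n is decreasing in theta, so argmax = max(x).
theta_mle = x.max()
print(theta_mle)  # slightly below theta_true: this MLE is biased low
```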

Properties of MLE

- When H is identifiable, under certain conditions (see Wasserman) the MLE $\hat\theta_n$ converges in probability to the true parameter $\theta$. That is, the MLE is consistent.
- Asymptotic normality: let $\mathrm{se} = \sqrt{1/I_n(\theta)}$, where $I_n(\theta)$ is the Fisher information; then $(\hat\theta_n - \theta)/\mathrm{se} \rightsquigarrow N(0,1)$.
- The MLE is asymptotically efficient (achieves the Cramér-Rao lower bound): "best" among unbiased estimators.

Frequentist Statistics

- Probability refers to limiting relative frequency. Data are random.
- Estimators are random because they are functions of data.
- Parameters are fixed, unknown constants not subject to probabilistic statements.
- Procedures are subject to probabilistic statements; for example, 95% confidence intervals trap the true parameter value 95% of the time.
- Classifiers, even learned with deterministic procedures, are random because the training set is random. The PAC bound is frequentist.
- Most procedures in machine learning are frequentist methods.

Bayesian Statistics

- Probability refers to degree of belief.
- Inference about a parameter $\theta$ is done by producing a probability distribution on it.
- Start with a prior distribution $p(\theta)$. The likelihood function is $p(x \mid \theta)$, a function of $\theta$, not x.
- After observing data x, apply Bayes rule to obtain the posterior
  $p(\theta \mid x) = \frac{1}{Z}\, p(x \mid \theta)\, p(\theta)$,
  where Z is the evidence.
- Prediction by integrating parameters out:
  $p(x \mid \mathrm{Data}) = \int p(x \mid \theta)\, p(\theta \mid \mathrm{Data})\, d\theta$

Frequentist vs. Bayesian in Machine Learning

- Frequentists produce a point estimate $\hat\theta$ from Data, and predict with $p(x \mid \hat\theta)$.
- Bayesians predict by integrating over $\theta$'s.
- Bayesian integration is often intractable; one needs either "nice" distributions or approximations.
- The maximum a posteriori (MAP) estimate $\theta_{\mathrm{MAP}} = \mathrm{argmax}_\theta\, p(\theta \mid x)$ is a point estimate, and not Bayesian.

Machine Learning Basics: Decision Theory

Comparing Estimators

- Training set $D = (x_1, \ldots, x_n) \sim p(x;\theta)$.
- Learned model: $\hat\theta \equiv \hat\theta(D)$, an estimator of $\theta$ based on data D.
- Loss function $L(\theta, \hat\theta) : \Theta \times \Theta \mapsto \mathbb{R}^+$, e.g.
  - squared loss $L(\theta, \hat\theta) = (\theta - \hat\theta)^2$
  - 0-1 loss: $L(\theta, \hat\theta) = 0$ if $\theta = \hat\theta$, and 1 if $\theta \neq \hat\theta$
  - KL loss $L(\theta, \hat\theta) = \int p(x;\theta) \log \frac{p(x;\theta)}{p(x;\hat\theta)}\, dx$
- Since D is random, both $\hat\theta(D)$ and $L(\theta, \hat\theta)$ are random variables.

Risk

- The risk $R(\theta, \hat\theta)$ is the expected loss
  $R(\theta, \hat\theta) = E_D[L(\theta, \hat\theta(D))]$,
  where $E_D$ averages over training sets D sampled from the true $\theta$.
- The risk is the "average training set" behavior of a learning algorithm when the world is $\theta$. Not computable: we don't know which $\theta$ the world is in.
- Example: a smart learning algorithm $\hat\theta_1 = \frac{1}{n}\sum_i x_i$ and a dumb one $\hat\theta_2 = 3.14$. Assume squared loss. Then $R(\theta, \hat\theta_1) = 1/n$ (hint: variance), while $R(\theta, \hat\theta_2) = E_D(\theta - 3.14)^2 = (\theta - 3.14)^2$.
- However, for tasks with $\theta \in (3.14 - 1/\sqrt{n},\, 3.14 + 1/\sqrt{n})$, the dumb algorithm is better.
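A Monte Carlo sketch of the comparison (an added example; it assumes the $x_i \sim N(\theta, 1)$ reading of the slide's example):

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 10, 20_000

def risks(theta):
    # Risk under squared loss: average over many training sets D.
    D = rng.normal(theta, 1, size=(trials, n))
    smart = D.mean(axis=1)        # theta_hat_1: the sample mean
    dumb = np.full(trials, 3.14)  # theta_hat_2: always answers 3.14
    return ((smart - theta)**2).mean(), ((dumb - theta)**2).mean()

for theta in [0.0, 3.0, 3.14]:
    print(theta, risks(theta))  # the dumb rule wins only near 3.14
```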

Minimax Estimator

- The maximum risk is $R_{\max}(\hat\theta) = \sup_\theta R(\theta, \hat\theta)$.
- The minimax estimator $\hat\theta_{\mathrm{minimax}}$ minimizes the maximum risk:
  $\hat\theta_{\mathrm{minimax}} = \mathrm{arg}\inf_{\hat\theta} \sup_\theta R(\theta, \hat\theta)$,
  where the infimum is over all estimators $\hat\theta$.
- The minimax estimator is the "best" in guarding against the worst possible world.

Machine Learning Basics: Graphical Models

The Envelope Quiz

- $P(E=1) = P(E=0) = 1/2$
- $P(B=r \mid E=1) = 1/2$, $P(B=r \mid E=0) = 0$
- Having drawn a black ball, is $P(E=1 \mid B=b)$ still 1/2?
  $P(E=1 \mid B=b) = \frac{P(B=b \mid E=1)\, P(E=1)}{P(B=b)} = \frac{1/2 \cdot 1/2}{1/2 \cdot 1/2 + 1 \cdot 1/2} = 1/3$
- Switch.
- The graphical model: $E \rightarrow B$
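The posterior can be double-checked by brute-force enumeration of the tiny joint (an added example):

```python
from itertools import product

p_E = {1: 0.5, 0: 0.5}
p_B_given_E = {1: {"r": 0.5, "b": 0.5}, 0: {"r": 0.0, "b": 1.0}}

# Joint p(E, B) from the chain rule p(E) p(B|E).
joint = {(e, b): p_E[e] * p_B_given_E[e][b]
         for e, b in product(p_E, ["r", "b"])}

p_b = sum(p for (e, b), p in joint.items() if b == "b")
print(joint[(1, "b")] / p_b)  # P(E=1 | B=b) = 1/3
```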

Probabilistic Reasoning

- The world is reduced to a set of random variables $x_1, \ldots, x_n$, e.g. features $(x_1, \ldots, x_{n-1})$ and a label $x_n$.
- Inference: given the joint distribution $p(x_1, \ldots, x_n)$, compute
  $p(x_n \mid x_1, \ldots, x_{n-1}) = \frac{p(x_1, \ldots, x_{n-1}, x_n)}{\sum_v p(x_1, \ldots, x_{n-1}, x_n = v)}$
- Learning: estimate $p(x_1, \ldots, x_n)$ from training data $X^{(1)}, \ldots, X^{(N)}$, where $X^{(i)} = (x_1^{(i)}, \ldots, x_n^{(i)})$.

/

95MachineLearningBasicsGraphicMachine

Learning

BasicsGraphical

ModelsIt

is

di

cult

to

reason

with

uncertainty

joint

distribution

p(x1,...,xn)IIexponential

na¨ıve

storage

(2n

for

binary

r.v.)hard

to

interpret

(conditional

independence)

I

Often

can’t

a↵ord

to

do

it

by

brute

forceIf

p(x1,...,xn)

not

given,

estimate

it

from

dataIOften

can’t

a↵ord

to

do

it

by

brute

forceZhu

(U

Wisconsin)Understanding

Social

MediaCCF/ADL

Beijing

201344

/

Graphical Models

- Graphical models: efficient representation, inference, and learning on $p(x_1, \ldots, x_n)$, exactly or approximately.
- Two main "flavors":
  - directed graphical models = Bayesian networks (often frequentist instead of Bayesian)
  - undirected graphical models = Markov random fields
- Key idea: make conditional independence explicit.

Bayesian Networks

- Directed graphical models are also called Bayesian networks.
- A directed graph has nodes $X = (x_1, \ldots, x_n)$, some of them connected by directed edges $x_i \to x_j$.
- A cycle is a directed path $x_1 \to \ldots \to x_k$ where $x_1 = x_k$. A directed acyclic graph (DAG) contains no cycles.
- A Bayesian network on the DAG is the family of distributions satisfying
  $\{\, p \mid p(X) = \prod_i p(x_i \mid \mathrm{Pa}(x_i)) \,\}$,
  where $\mathrm{Pa}(x_i)$ is the set of parents of $x_i$.
- $p(x_i \mid \mathrm{Pa}(x_i))$ is the conditional probability distribution (CPD) at $x_i$. By specifying the CPDs for all i, we specify a particular distribution $p(X)$.

Example: Alarm (binary variables)

- Graph: B and E are parents of A; A is the parent of J and M.
- CPDs:
  - P(B) = 0.001, P(E) = 0.002
  - P(A | B, E) = 0.95, P(A | B, ~E) = 0.94, P(A | ~B, E) = 0.29, P(A | ~B, ~E) = 0.001
  - P(J | A) = 0.9, P(J | ~A) = 0.05
  - P(M | A) = 0.7, P(M | ~A) = 0.01
- Example query:
  P(B, ~E, A, J, ~M) = P(B) P(~E) P(A | B, ~E) P(J | A) P(~M | A)
                     = 0.001 × (1 − 0.002) × 0.94 × 0.9 × (1 − 0.7) ≈ 0.000253
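The factored joint is directly computable; a sketch reproducing the number above (an added example):

```python
# CPDs of the alarm network, keyed by parent values (1 = true, 0 = false).
P_B, P_E = 0.001, 0.002
P_A = {(1, 1): 0.95, (1, 0): 0.94, (0, 1): 0.29, (0, 0): 0.001}
P_J = {1: 0.9, 0: 0.05}   # P(J = 1 | A)
P_M = {1: 0.7, 0: 0.01}   # P(M = 1 | A)

# P(B, ~E, A, J, ~M) = P(B) P(~E) P(A | B,~E) P(J | A) P(~M | A)
p = P_B * (1 - P_E) * P_A[(1, 0)] * P_J[1] * (1 - P_M[1])
print(p)  # ~0.000253
```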

Example: Naive Bayes

- $p(y, x_1, \ldots, x_d) = p(y) \prod_{i=1}^{d} p(x_i \mid y)$
- Used extensively in natural language processing.
- [Figure: the naive Bayes graphical model, with label y pointing to features x_1, ..., x_d; plate representation on the right.]
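A minimal classification sketch with the factored joint (an added example; the CPD numbers are made up):

```python
import numpy as np

p_y = np.array([0.6, 0.4])                 # p(y) for classes 0 and 1
p_x1_given_y = np.array([[0.9, 0.2, 0.3],  # p(x_i = 1 | y = 0)
                         [0.1, 0.7, 0.8]]) # p(x_i = 1 | y = 1)

def posterior(x):
    """p(y | x) from p(y) prod_i p(x_i | y), normalized over y."""
    lik = np.prod(np.where(x, p_x1_given_y, 1 - p_x1_given_y), axis=1)
    joint = p_y * lik
    return joint / joint.sum()

print(posterior(np.array([0, 1, 1])))  # most mass on y = 1
```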
