




Understanding Social Media with Machine Learning

Xiaojin Zhu
jerryzhu@
Department of Computer Sciences
University of Wisconsin–Madison, USA

CCF/ADL Beijing 2013

Outline

1. Spatio-Temporal Signal Recovery from Social Media
2. Machine Learning Basics
   - Probability
   - Statistical Estimation
   - Decision Theory
   - Graphical Models
   - Regularization
   - Stochastic Processes
3. Socioscope: A Probabilistic Model for Social Media
4. Case Study: Roadkill

Part 1: Spatio-Temporal Signal Recovery from Social Media

Spatio-temporal Signal: When, Where, How Much

- Direct instrumental sensing is difficult and expensive.

Humans as Sensors

[Figure slide: social media users as sensors.]

Humans as Sensors

- Not "hot trend" discovery: we know what event we want to monitor.
- Not natural language processing for social media: we are given a reliable text classifier for "hit".
- Our task: precisely estimating a spatio-temporal intensity function $f_{st}$ of a pre-defined target phenomenon.

Challenges of Using Humans as Sensors

- Keyword doesn't always mean event:
  "I was just told I look like a dead crow."
  "Don't blame me if one day I treat you like a dead crow."
- Human sensors aren't under our control.
- Location stamps may be erroneous or missing:
  - 3% have GPS coordinates: (-98.24, 23.22)
  - 47% have a valid user profile location: Bristol, UK; New York
  - 50% don't have valid location information: "Hogwarts", "In the traffic..blah", "Sitting On A Taco"

Problem Definition

- Input: a list of time and location stamps of the target posts.
- Output: $f_{st}$, the intensity of the target phenomenon at location s (e.g., New York) and time t (e.g., 0-1am).

Why Simple Estimation is Bad

- Simple estimate: $\hat{f}_{st} = x_{st}$, the count of target posts in bin (s,t).
- Justification: this is the MLE of the model $x \sim \mathrm{Poisson}(f)$.
- However:
  - Population bias: assume $f_{st} = f_{s't'}$; if there are more users in (s,t), then $x_{st} > x_{s't'}$.
  - Imprecise location: posts without a location stamp, noisy user profile locations.
  - Zero/low counts: if we don't see tweets from Antarctica, are there no penguins there?

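The population-bias point can be made concrete with a small simulation. This is an illustrative sketch, not from the slides: the per-user posting rate (0.01) and the two populations are made-up numbers.

```python
# Two bins share the same true event intensity f, but bin A has 10x
# the active users, so its raw post count x is ~10x higher.
import numpy as np

rng = np.random.default_rng(0)
f_true = 2.0                            # same true event intensity in both bins
users = {"bin_A": 1000, "bin_B": 100}   # hypothetical user populations

for name, n_users in users.items():
    # assume each user independently posts about the event at a small
    # rate proportional to f_true, so counts scale with population
    x = rng.poisson(0.01 * f_true * n_users)
    print(name, "naive estimate f_hat = x =", x)
# The two naive estimates differ by roughly the population ratio (10x),
# even though the underlying signal f_true is identical.
```
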
Part 2: Machine Learning Basics

Machine Learning Basics: Probability

Probability

- The probability of a discrete random variable A taking the value a is $P(A=a) \in [0,1]$, sometimes written as $P(a)$ when there is no danger of confusion.
- Normalization: $\sum_a P(A=a) = 1$.
- Joint probability: $P(A=a, B=b) = P(a,b)$, the probability that the two events both happen at the same time.
- Marginalization: $P(A=a) = \sum_b P(a,b)$, summing over all values b of B.
- The product rule: $P(a,b) = P(a)P(b|a) = P(b)P(a|b)$.

Bayes Rule

- Bayes rule: $P(a|b) = \frac{P(b|a)P(a)}{P(b)}$.
- In general, $P(a|b,C) = \frac{P(b|a,C)\,P(a|C)}{P(b|C)}$, where C can be one or more random variables.
- Bayesian approach: when $\theta$ is the model parameter and D is the observed data, we have
  $p(\theta|D) = \frac{p(D|\theta)\,p(\theta)}{p(D)}$
  where
  - $p(\theta)$ is the prior,
  - $p(D|\theta)$ is the likelihood function (a function of $\theta$, not normalized: $\int p(D|\theta)\,d\theta \neq 1$),
  - $p(D) = \int p(D|\theta)\,p(\theta)\,d\theta$ is the evidence,
  - $p(\theta|D)$ is the posterior.

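As a quick numeric illustration of this posterior computation (a sketch, not from the slides; the coin-flip counts and the flat prior are made up), the Bayes rule can be applied on a parameter grid:

```python
import numpy as np

theta = np.linspace(0.001, 0.999, 999)     # grid over the parameter
prior = np.ones_like(theta) / theta.size   # flat prior p(theta)
heads, tails = 7, 3                        # hypothetical observed data D
likelihood = theta**heads * (1 - theta)**tails   # p(D | theta)

evidence = np.sum(likelihood * prior)      # p(D), the normalizer
posterior = likelihood * prior / evidence  # p(theta | D), sums to 1 on the grid
print(theta[np.argmax(posterior)])         # posterior mode, ~0.7
```
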
Independence

- The product rule can be simplified as $P(a,b) = P(a)P(b)$ iff A and B are independent.
- Equivalently, $P(a|b) = P(a)$ and $P(b|a) = P(b)$.

Probability Density

- A continuous random variable X has a probability density function (pdf) $p(x) \in [0,\infty)$. Note that $p(x) > 1$ is possible!
- $P(x_1 < X < x_2) = \int_{x_1}^{x_2} p(x)\,dx$.
- Integrates to one: $\int_{-\infty}^{\infty} p(x)\,dx = 1$.
- Marginalization: $p(x) = \int_{-\infty}^{\infty} p(x,y)\,dy$.

Expectation and Variance

- The expectation ("mean" or "average") of a function f under the probability distribution P is
  $E_P[f] = \sum_a P(a) f(a)$ (discrete case), or $E_p[f] = \int p(x) f(x)\,dx$ (continuous case).
- In particular, if $f(x) = x$ this is the mean of the random variable X.
- The variance of f is
  $\mathrm{Var}(f) = E[(f(x) - E[f(x)])^2] = E[f(x)^2] - E[f(x)]^2$.
- The standard deviation is $\mathrm{std}(f) = \sqrt{\mathrm{Var}(f)}$.

Multivariate Statistics

- When x, y are vectors, $E[x]$ is the mean vector.
- $\mathrm{Cov}(x,y)$ is the covariance matrix whose (i,j)-th entry is $\mathrm{Cov}(x_i, y_j)$:
  $\mathrm{Cov}(x,y) = E_{x,y}[(x - E[x])(y - E[y])^\top] = E_{x,y}[x y^\top] - E[x]\,E[y]^\top$.

Some Discrete Distributions

- Point mass at a: $P(X = a) = 1$.
- Binomial, with parameters n (number of trials) and p (head probability):
  $f(x) = \binom{n}{x}\, p^x (1-p)^{n-x}$ for $x = 0, 1, \ldots, n$, and 0 otherwise.
- Bernoulli: binomial with $n = 1$.
- Multinomial, with parameter $p = (p_1, \ldots, p_d)^\top$ (a d-sided die):
  $f(x) = \binom{n}{x_1, \ldots, x_d} \prod_{k=1}^{d} p_k^{x_k}$ if $\sum_{k=1}^{d} x_k = n$, and 0 otherwise.

More Discrete Distributions

- Poisson: $X \sim \mathrm{Poisson}(\lambda)$ if $f(x) = \frac{\lambda^x}{x!} e^{-\lambda}$ for $x = 0, 1, 2, \ldots$
- $\lambda$ is the rate or intensity parameter (mean: $\lambda$, variance: $\lambda$).
- If $X_1 \sim \mathrm{Poisson}(\lambda_1)$ and $X_2 \sim \mathrm{Poisson}(\lambda_2)$ are independent, then $X_1 + X_2 \sim \mathrm{Poisson}(\lambda_1 + \lambda_2)$.
- This is a distribution on unbounded counts, with a probability mass function "hump" (mode at $\lceil \lambda \rceil - 1$).

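Both Poisson facts are easy to check by simulation (an illustrative sketch; the rates 3 and 5 are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
lam1, lam2 = 3.0, 5.0
x1 = rng.poisson(lam1, size=100_000)
x2 = rng.poisson(lam2, size=100_000)

print(x1.mean(), x1.var())   # mean and variance both close to 3.0
s = x1 + x2                  # should behave like Poisson(8)
print(s.mean(), s.var())     # both close to 8.0
```
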
Some Continuous Distributions

- Gaussian (normal): $X \sim N(\mu, \sigma^2)$, with parameters $\mu \in \mathbb{R}$ (the mean) and $\sigma^2$ (the variance):
  $f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$.
- $\sigma$ is the standard deviation.
- If $\mu = 0$ and $\sigma = 1$, X has a standard normal distribution.
- If $X \sim N(\mu, \sigma^2)$, then $Z = (X - \mu)/\sigma$ has a standard normal distribution.

Some Continuous Distributions

- Multivariate Gaussian: let $x, \mu \in \mathbb{R}^d$ and $\Sigma \in S_+^d$, a symmetric, positive definite matrix of size $d \times d$. Then $X \sim N(\mu, \Sigma)$ with
  $f(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x-\mu)^\top \Sigma^{-1} (x-\mu) \right)$
  where $|\Sigma|$ is the determinant of $\Sigma$ and $\Sigma^{-1}$ is its inverse.

Marginal and Conditional of Gaussian

If two (groups of) variables x, y are jointly Gaussian,
  $\begin{pmatrix} x \\ y \end{pmatrix} \sim N\!\left( \begin{pmatrix} \mu_x \\ \mu_y \end{pmatrix}, \begin{pmatrix} A & C \\ C^\top & B \end{pmatrix} \right)$,   (1)
then
- (Marginal) $x \sim N(\mu_x, A)$
- (Conditional) $y \mid x \sim N\!\left( \mu_y + C^\top A^{-1} (x - \mu_x),\; B - C^\top A^{-1} C \right)$

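A small numeric check of the conditional formula (a sketch, not from the slides; the numbers are arbitrary), comparing the closed form against a Monte Carlo estimate in the case where A, B, C are 1x1 blocks:

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0])                 # (mu_x, mu_y)
A, B, C = 2.0, 1.5, 0.8                   # scalar blocks of the covariance
Sigma = np.array([[A, C], [C, B]])

x_obs = 1.5
# closed-form conditional y | x from the slide
cond_mean = mu[1] + C / A * (x_obs - mu[0])
cond_var = B - C / A * C
print(cond_mean, cond_var)

# Monte Carlo: sample jointly, keep samples whose x lands near x_obs
xy = rng.multivariate_normal(mu, Sigma, size=500_000)
near = xy[np.abs(xy[:, 0] - x_obs) < 0.02, 1]
print(near.mean(), near.var())            # close to the closed form
```
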
More Continuous Distributions

- The Gamma function: $\Gamma(\alpha) = \int_0^\infty x^{\alpha-1} e^{-x}\,dx$ with $\alpha > 0$. It generalizes the factorial: $\Gamma(n) = (n-1)!$ when n is a positive integer, and $\Gamma(\alpha+1) = \alpha\,\Gamma(\alpha)$ for $\alpha > 0$.
- The Gamma distribution, with shape parameter $\alpha > 0$ and scale parameter $\beta > 0$:
  $f(x) = \frac{1}{\Gamma(\alpha)\,\beta^\alpha}\, x^{\alpha-1} e^{-x/\beta}$, for $x > 0$.
- It is the conjugate prior for the Poisson rate.

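The conjugacy can be sketched in a few lines (illustrative only; the hyperparameters and counts are made up). With a Gamma prior on the Poisson rate, shape a and scale b, and observed counts $x_1, \ldots, x_n$, the posterior is again Gamma with shape $a + \sum_i x_i$ and scale $b/(nb+1)$:

```python
import numpy as np

a, b = 2.0, 1.0                    # hypothetical prior shape and scale
counts = np.array([3, 5, 4, 6])    # hypothetical observed Poisson counts

n = len(counts)
post_shape = a + counts.sum()      # shape a + sum of counts
post_scale = b / (n * b + 1)       # scale shrinks with more data
print("posterior mean rate:", post_shape * post_scale)   # 4.0 here
```
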
Machine Learning Basics: Statistical Estimation

Parametric Models

- A statistical model H is a set of distributions. In machine learning, we call H the hypothesis space.
- A parametric model can be parametrized by a finite number of parameters, $f(x) \equiv f(x;\theta)$ with parameter $\theta \in \mathbb{R}^d$:
  $H = \{ f(x;\theta) : \theta \in \Theta \subset \mathbb{R}^d \}$
  where $\Theta$ is the parameter space.

Parametric Models

- We denote the expectation
  $E_\theta(g) = \int_x g(x)\, f(x;\theta)\,dx$.
- $E_\theta$ means $E_{x \sim f(x;\theta)}$, not an expectation over different $\theta$'s.
- "All (parametric) models are wrong. Some are more useful than others."

Nonparametric Model

- A nonparametric model cannot be parametrized by a fixed number of parameters; model complexity grows indefinitely with sample size.
- Example: $H = \{ P : \mathrm{Var}_P(X) < \infty \}$. Given iid data $x_1, \ldots, x_n$, the optimal estimator of the mean is again $\frac{1}{n}\sum_i x_i$.
- Nonparametric models make weaker assumptions and thus are preferred; but parametric models converge faster and are more practical.

Estimation

- A point estimator $\hat\theta_n$ is a function of the data $X_1, \ldots, X_n$ that attempts to estimate a parameter $\theta$.
- This is the "learning" in machine learning!
- Example: in classification, $X_i = (x_i, y_i)$ and $\hat\theta_n$ is the learned model.
- Consistent estimators learn the correct model with more training data, eventually.

Bias and Standard Error

- $E_\theta$ is w.r.t. the joint distribution $f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^n f(x_i; \theta)$.
- The bias of an estimator is $\mathrm{bias}(\hat\theta_n) = E_\theta(\hat\theta_n) - \theta$. An estimator is unbiased if $\mathrm{bias}(\hat\theta_n) = 0$.
- The standard error of an estimator is $\mathrm{se}(\hat\theta_n) = \sqrt{\mathrm{Var}_\theta(\hat\theta_n)}$.
- Example: let $\hat\mu = \frac{1}{n}\sum_i x_i$, where $x_i \sim N(0,1)$. The standard deviation of $x_i$ is 1 regardless of n. In contrast, $\mathrm{se}(\hat\mu) = 1/\sqrt{n} = n^{-1/2}$, which decreases with n.

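A quick simulation of the example (illustrative, not from the slides):

```python
# The standard error of the sample mean shrinks like 1/sqrt(n),
# while the per-sample standard deviation stays 1.
import numpy as np

rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    # 5000 independent training sets of size n, with x_i ~ N(0, 1)
    means = rng.standard_normal((5000, n)).mean(axis=1)
    print(n, means.std(), 1 / np.sqrt(n))   # empirical se vs 1/sqrt(n)
```
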
MSE

- The mean squared error of an estimator is
  $\mathrm{mse}(\hat\theta_n) = E_\theta\!\left( (\hat\theta_n - \theta)^2 \right)$.
- Bias-variance decomposition:
  $\mathrm{mse}(\hat\theta_n) = \mathrm{bias}^2(\hat\theta_n) + \mathrm{se}^2(\hat\theta_n) = \mathrm{bias}^2(\hat\theta_n) + \mathrm{Var}_\theta(\hat\theta_n)$.

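The decomposition can be verified numerically (a sketch with a deliberately biased estimator; all numbers are made up):

```python
# Estimator theta_hat = 0.9 * sample_mean for true theta = 2:
# its mse should equal bias^2 + variance.
import numpy as np

rng = np.random.default_rng(0)
theta, n = 2.0, 20
est = 0.9 * rng.normal(theta, 1.0, size=(100_000, n)).mean(axis=1)

mse = ((est - theta) ** 2).mean()
bias2 = (est.mean() - theta) ** 2
var = est.var()
print(mse, bias2 + var)    # the two numbers agree
```
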
Maximum Likelihood

- Let $x_1, \ldots, x_n \sim f(x;\theta)$ where $\theta \in \Theta$. The likelihood function is
  $L_n(\theta) = f(x_1, \ldots, x_n; \theta) = \prod_{i=1}^n f(x_i; \theta)$.
- The log likelihood function is $\ell_n(\theta) = \log L_n(\theta)$.
- The maximum likelihood estimator (MLE) is
  $\hat\theta_n = \arg\max_{\theta \in \Theta} L_n(\theta) = \arg\max_{\theta \in \Theta} \ell_n(\theta)$.

MLE Examples

- The MLE for p(head) from n coin flips is count(head)/n.
- For $X_1, \ldots, X_n \sim N(\mu, \sigma^2)$, the MLE is $\hat\mu = \frac{1}{n}\sum_i X_i$ and $\hat\sigma^2 = \frac{1}{n}\sum_i (X_i - \hat\mu)^2$.
- The MLE does not always agree with intuition: the MLE for $X_1, \ldots, X_n \sim \mathrm{uniform}(0, \theta)$ is $\hat\theta = \max(X_1, \ldots, X_n)$.

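Both examples are easy to reproduce (an illustrative sketch; the true parameters are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

# Gaussian MLE: the sample mean and the 1/n (not 1/(n-1)) variance
x = rng.normal(3.0, 2.0, size=1000)
print(x.mean(), x.var())           # near 3.0 and 4.0

# uniform(0, theta) MLE is the sample maximum, which sits
# just below the true theta and never above it
u = rng.uniform(0, 7.0, size=1000)
print(u.max())                     # slightly under 7.0
```
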
Properties of MLE

- When H is identifiable, under certain conditions (see Wasserman), the MLE $\hat\theta_n$ converges in probability to the true parameter $\theta$. That is, the MLE is consistent.
- Asymptotic normality: let $\mathrm{se} = \sqrt{1/I_n(\theta)}$, where $I_n(\theta)$ is the Fisher information. Then $(\hat\theta_n - \theta)/\mathrm{se} \rightsquigarrow N(0,1)$.
- The MLE is asymptotically efficient (achieves the Cramér-Rao lower bound): "best" among unbiased estimators.

Frequentist Statistics

- Probability refers to limiting relative frequency. Data are random.
- Estimators are random because they are functions of data.
- Parameters are fixed, unknown constants not subject to probabilistic statements.
- Procedures are subject to probabilistic statements; for example, 95% confidence intervals trap the true parameter value 95% of the time.
- Classifiers, even ones learned with deterministic procedures, are random because the training set is random. The PAC bound is frequentist.
- Most procedures in machine learning are frequentist methods.

Bayesian Statistics

- Probability refers to degree of belief. Inference about a parameter $\theta$ is done by producing a probability distribution on it.
- Start with a prior distribution $p(\theta)$. The likelihood function is $p(x \mid \theta)$, viewed as a function of $\theta$, not of x.
- After observing data x, apply the Bayes rule to obtain the posterior
  $p(\theta \mid x) = \frac{1}{Z}\, p(x \mid \theta)\, p(\theta)$,
  where Z is the evidence.
- Prediction by integrating the parameters out:
  $p(x \mid \mathrm{Data}) = \int p(x \mid \theta)\, p(\theta \mid \mathrm{Data})\, d\theta$.

Frequentist vs Bayesian in Machine Learning

- Frequentists produce a point estimate $\hat\theta$ from Data, and predict with $p(x \mid \hat\theta)$.
- Bayesians predict by integrating over $\theta$'s.
- Bayesian integration is often intractable; it needs either "nice" distributions or approximations.
- The maximum a posteriori (MAP) estimate $\theta_{\mathrm{MAP}} = \arg\max_\theta\, p(\theta \mid x)$ is a point estimate and not Bayesian.

Machine Learning Basics: Decision Theory

Comparing Estimators

- Training set $D = (x_1, \ldots, x_n) \sim p(x;\theta)$.
- Learned model: $\hat\theta \equiv \hat\theta(D)$, an estimator of $\theta$ based on data D.
- Loss function $L(\theta, \hat\theta) : \Theta \times \Theta \mapsto \mathbb{R}^+$, for example:
  - squared loss $L(\theta, \hat\theta) = (\theta - \hat\theta)^2$
  - 0-1 loss: $L(\theta, \hat\theta) = 0$ if $\theta = \hat\theta$, and 1 if $\theta \neq \hat\theta$
  - KL loss $L(\theta, \hat\theta) = \int p(x;\theta) \log \frac{p(x;\theta)}{p(x;\hat\theta)}\, dx$
- Since D is random, both $\hat\theta(D)$ and $L(\theta, \hat\theta)$ are random variables.

Risk

- The risk $R(\theta, \hat\theta)$ is the expected loss
  $R(\theta, \hat\theta) = E_D[\, L(\theta, \hat\theta(D)) \,]$,
  where $E_D$ averages over training sets D sampled from the true $\theta$.
- The risk is the "average training set" behavior of a learning algorithm when the world is $\theta$.
- Not computable: we don't know which $\theta$ the world is in.
- Example: a smart learning algorithm $\hat\theta_1$ and a dumb one $\hat\theta_2 \equiv 3.14$. Assume squared loss. Then $R(\theta, \hat\theta_1) = 1$ (hint: variance), while $R(\theta, \hat\theta_2) = E_D(\theta - 3.14)^2 = (\theta - 3.14)^2$. For tasks with $\theta \in (3.14 - 1,\, 3.14 + 1)$, the dumb algorithm is better.

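A simulation of this example (a sketch; it assumes, for concreteness, that the smart estimator reports a single sample $x \sim N(\theta, 1)$, which indeed has risk 1 under squared loss):

```python
import numpy as np

rng = np.random.default_rng(0)
for theta in (3.0, 10.0):
    d = rng.normal(theta, 1.0, size=100_000)    # many size-1 training sets
    risk_smart = ((d - theta) ** 2).mean()      # ~1 for every theta
    risk_dumb = (3.14 - theta) ** 2             # no variance, pure bias
    print(theta, risk_smart, risk_dumb)
# Near theta = 3.14 the constant estimator wins; far away it is terrible.
```
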
Minimax Estimator

- The maximum risk is
  $R_{\max}(\hat\theta) = \sup_\theta R(\theta, \hat\theta)$.
- The minimax estimator $\hat\theta_{\mathrm{minimax}}$ minimizes the maximum risk:
  $\hat\theta_{\mathrm{minimax}} = \arg\inf_{\hat\theta} \sup_\theta R(\theta, \hat\theta)$.
- The infimum is over all estimators $\hat\theta$.
- The minimax estimator is the "best" in guarding against the worst possible world.

Machine Learning Basics: Graphical Models

The Envelope Quiz

[Figure slide: the envelope quiz setup.]

The Envelope Quiz

- $P(E=1) = P(E=0) = 1/2$.
- $P(B=r \mid E=1) = 1/2$, $P(B=r \mid E=0) = 0$.
- By the Bayes rule, $P(E=1 \mid B=b) = \frac{P(B=b \mid E=1)\,P(E=1)}{P(B=b)} = 1/3 < 1/2$. Switch.
- The graphical model: $E \to B$.

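The switch answer follows from the Bayes rule on this two-node model. A sketch (it assumes the observed ball has the non-red color b, and that an E = 0 envelope contains only b-colored balls, consistent with $P(B=r \mid E=0) = 0$):

```python
p_e1 = 0.5                          # prior P(E=1)
p_b_given_e1 = 0.5                  # P(B=b | E=1): half its balls are b
p_b_given_e0 = 1.0                  # P(B=b | E=0): all its balls are b

p_b = p_b_given_e1 * p_e1 + p_b_given_e0 * (1 - p_e1)   # marginal P(B=b)
p_e1_given_b = p_b_given_e1 * p_e1 / p_b                # Bayes rule
print(p_e1_given_b)                 # 1/3 < 1/2, so switching is better
```
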
Probabilistic Reasoning

- The world is reduced to a set of random variables $x_1, \ldots, x_n$, e.g., $(x_1, \ldots, x_{n-1})$ the features and $x_n$ the class label.
- Inference: given the joint distribution $p(x_1, \ldots, x_n)$, compute
  $p(x_n \mid x_1, \ldots, x_{n-1}) = \frac{p(x_1, \ldots, x_{n-1}, x_n)}{\sum_v p(x_1, \ldots, x_{n-1}, x_n = v)}$.
- Learning: estimate $p(x_1, \ldots, x_n)$ from training data $X^{(1)}, \ldots, X^{(N)}$, where $X^{(i)} = (x_1^{(i)}, \ldots, x_n^{(i)})$.

It is Difficult to Reason with Uncertainty

- Working with the joint distribution $p(x_1, \ldots, x_n)$ directly is hard:
  - exponential naive storage ($2^n$ for binary random variables)
  - hard to interpret (conditional independence)
  - often can't afford to do inference by brute force
- If $p(x_1, \ldots, x_n)$ is not given, we must estimate it from data:
  - again, often can't afford to do it by brute force

Graphical Models

- Graphical models: efficient representation, inference, and learning on $p(x_1, \ldots, x_n)$, exactly or approximately.
- Two main "flavors":
  - directed graphical models = Bayesian networks (often frequentist instead of Bayesian)
  - undirected graphical models = Markov random fields
- Key idea: make conditional independence explicit.

Bayesian Network

- Directed graphical models are also called Bayesian networks.
- A directed graph has nodes $X = (x_1, \ldots, x_n)$, some of them connected by directed edges $x_i \to x_j$.
- A cycle is a directed path $x_1 \to \ldots \to x_k$ where $x_1 = x_k$. A directed acyclic graph (DAG) contains no cycles.
- A Bayesian network on the DAG is a family of distributions satisfying
  $\{\, p \mid p(X) = \prod_i p(x_i \mid \mathrm{Pa}(x_i)) \,\}$,
  where $\mathrm{Pa}(x_i)$ is the set of parents of $x_i$.
- $p(x_i \mid \mathrm{Pa}(x_i))$ is the conditional probability distribution (CPD) at $x_i$.
- By specifying the CPDs for all i, we specify a particular distribution $p(X)$.

Example: Alarm

Binary variables, with DAG: B → A, E → A, A → J, A → M.

- P(B) = 0.001, P(E) = 0.002
- P(A | B, E) = 0.95, P(A | B, ~E) = 0.94, P(A | ~B, E) = 0.29, P(A | ~B, ~E) = 0.001
- P(J | A) = 0.9, P(J | ~A) = 0.05
- P(M | A) = 0.7, P(M | ~A) = 0.01

P(B, ~E, A, J, ~M) = P(B) P(~E) P(A | B, ~E) P(J | A) P(~M | A)
                   = 0.001 × 0.998 × 0.94 × 0.9 × (1 - 0.7) ≈ 0.000253

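The joint probability computation transcribes directly into code (a sketch using the CPDs above):

```python
# Joint probability of one full assignment in the alarm network,
# via the factorization p(X) = prod_i p(x_i | Pa(x_i)).
def joint(b, e, a, j, m):
    """Probability of one assignment (True/False for each variable)."""
    p_b = 0.001 if b else 0.999
    p_e = 0.002 if e else 0.998
    p_a_true = {(True, True): 0.95, (True, False): 0.94,
                (False, True): 0.29, (False, False): 0.001}[(b, e)]
    p_a = p_a_true if a else 1 - p_a_true
    p_j = (0.9 if a else 0.05) if j else 1 - (0.9 if a else 0.05)
    p_m = (0.7 if a else 0.01) if m else 1 - (0.7 if a else 0.01)
    return p_b * p_e * p_a * p_j * p_m

print(joint(True, False, True, True, False))   # ~0.000253
```
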
Example: Naive Bayes

- $p(y, x_1, \ldots, x_d) = p(y) \prod_{i=1}^d p(x_i \mid y)$
- Used extensively in natural language processing.
- Plate representation on the right. [Figure: y → x_1, ..., x_d, drawn with a plate over the x_i.]

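As a sketch of how this factorization is used (not from the slides; the toy data, the smoothing constant, and the helper names are made up), here is a minimal Bernoulli naive Bayes classifier:

```python
# Classify by argmax_y p(y) * prod_i p(x_i | y), in log space.
import numpy as np

# hypothetical toy data: rows are documents, columns are word presence
X = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])
y = np.array([0, 0, 1, 1])

def fit(X, y, alpha=1.0):
    """Class priors and per-class Bernoulli parameters (add-alpha smoothing)."""
    classes = np.unique(y)
    prior = np.array([(y == c).mean() for c in classes])
    theta = np.array([(X[y == c].sum(0) + alpha) /
                      ((y == c).sum() + 2 * alpha) for c in classes])
    return classes, prior, theta

def predict(x, classes, prior, theta):
    # log p(y) + sum_i log p(x_i | y), maximized over y
    log_joint = (np.log(prior)
                 + (x * np.log(theta) + (1 - x) * np.log(1 - theta)).sum(1))
    return classes[np.argmax(log_joint)]

classes, prior, theta = fit(X, y)
print(predict(np.array([1, 1, 0]), classes, prior, theta))   # -> 0
```
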