




Special Forum: Big Data Courseware

Big Data vs Smart Model: Beauty and the Beast
Prof. Yike Guo
Department of Computing
Imperial College London

Model: Mathematical Representation of a Simplified Physical World
Modelling is an essential and inseparable part of all scientific activity. A scientific model seeks to represent empirical objects, phenomena, and physical processes in a logical and objective way.
To understand the world or an object (called a target T), a model M is a simplified mathematical representation of it.
A model is the result of abstraction from the observations made, and it is used to make predictions.
[Diagram labels: Human / Sensor, Human / Machine, Human / Machine]

No Model Is Perfect:
• Inherent uncertainty: These targets consist of a set of phenomena that are continuous in both time and space, and they typically produce rich signals. Because the target is continuous in both time and space, the signals are in principle infinite. But observations (e.g. sensor readings) are made at discrete points in time and space, so they are incomplete and approximate, which introduces the "uncertainty".
• Overfitting or underfitting: When learning a model from observations, such as a nonlinear regression model, we need to choose parameters such as K. Because the information from observations is partial, it is hard to make a perfect choice of K. This imperfection causes model error, such as underfitting (small K) and overfitting (large K).
• Simplification: From observations, we project a multi-dimensional world onto a simplified model with significantly reduced dimensionality, to focus on the features or properties we are interested in.
Nonlinear regression: K-order polynomial
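
The effect of K on a K-order polynomial can be sketched in a few lines. The example below is only an illustration under stated assumptions: the target function (a sine), the noise values, and the helper names `fit_polynomial` and `mse` are all made up for this sketch and do not come from the slides.

```python
import math

def fit_polynomial(xs, ys, k):
    """Least-squares fit of a degree-k polynomial; returns coefficients c[0..k]."""
    n = k + 1
    # Normal equations (V^T V) c = V^T y, where V is the Vandermonde matrix.
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for j in range(col, n):
                a[row][j] -= factor * a[col][j]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs

def mse(xs, ys, coeffs):
    """Mean squared error of the polynomial on the points (xs, ys)."""
    preds = [sum(c * x ** i for i, c in enumerate(coeffs)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)

# Made-up observations: 8 noisy samples of sin(pi * x) on [-1, 1].
noise = [0.1, -0.15, 0.05, 0.2, -0.1, 0.15, -0.05, -0.2]
xs_train = [-1.0 + 2.0 * i / 7 for i in range(8)]
ys_train = [math.sin(math.pi * x) + e for x, e in zip(xs_train, noise)]
# Held-out points between the samples, without noise.
xs_test = [-1.0 + 2.0 * (i + 0.5) / 7 for i in range(7)]
ys_test = [math.sin(math.pi * x) for x in xs_test]

train_errors, test_errors = {}, {}
for k in (1, 3, 7):
    coeffs = fit_polynomial(xs_train, ys_train, k)
    train_errors[k] = mse(xs_train, ys_train, coeffs)
    test_errors[k] = mse(xs_test, ys_test, coeffs)
    print("K =", k, "train error:", train_errors[k], "held-out error:", test_errors[k])
```

The training error can only shrink as K grows, and with K = 7 the polynomial passes through all 8 samples, noise included. The held-out error is the number to watch when judging whether a given K underfits or overfits.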

George Box (statistician): "All models are wrong, but some are useful."
Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us.
Peter Norvig (Google): "All models are wrong, and increasingly you can succeed without them."
Chris Anderson (Wired): There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot. (The Data Deluge Makes the Scientific Method Obsolete)
[Timeline labels: 1980, 2008, 2012]
So, Why Model?

The Google Argument
At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later.
For instance, Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising; it just assumed that better data, with better analytical tools, would win the day. And Google was right.
Google's founding philosophy is that we don't know why this page is better than that one: if the statistics of incoming links say it is, that's good enough. No semantic or causal analysis is required. That's why Google can translate languages without actually "knowing" them (given equal corpus data, Google can translate Klingon into Farsi as easily as it can translate French into German). And why it can match ads to content without any knowledge or assumptions about the ads or the content.

Model Free Sensor Informatics: Query Driven
Sensor network -> database table "raw-data":
time | id | temp
10am | 1 | 20
10am | 2 | 21
.. | .. | …
10am | 7 | 29
User workflow:
1. Extract all readings into a file
2. Run MATLAB/R/other data processing tools
3. Write output to a file/back to the database
4. Write data processing tools to process/aggregate the output (maybe using the DB)
5. Decide new data to acquire
Repeat
Model-free sensing treats the sensory system as a database, and sensing as querying to fetch data from the physical world. One of the leading vendors [Crossbow] is bundling a query processor with their devices.

Wikisensing: A Model Free Sensor Informatics System Based on Big Data Architecture

Model Free Sensing Is Super Inefficient
• Data is misrepresented without a model
• Latent information is missed without a model
• High demand for computation/storage without a model
• Requires too much interoperability between sensors and analytics

Bayesian: Data Is Not the Enemy of Models, Rather a Great Supporter!
Bayesian probability is a formalism that allows us to reason about beliefs in models under conditions of uncertainty, based on the observations (data).
If we have observed that a particular event has happened, such as Britain coming 10th in the medal table at the 2004 Olympics, then there is no uncertainty about it.
However, suppose a is the statement "Britain sweeps the board at the 2012 London Olympics, winning more than 30 gold medals!" made before the 28th of July.
Since this is a statement about a future event, nobody can state with any certainty whether or not it is true. Different people may have different beliefs in the statement depending on their specific knowledge of factors that might affect its likelihood.
The beliefs in the model were changing daily based on the performance data available each day. By the 10th of August, most people's belief in this model should have been almost 80%.
Thus, in general, a person's subjective belief in a statement a will depend on some body of knowledge K. We write this as P(a|K). Henry's belief in a is different from Marcel's because they are using different K's. However, even if they were using the same K they might still have different beliefs in a.
The expression P(a|K) thus represents a belief measure. Sometimes, for simplicity, when K remains constant we just write P(a), but you must be aware that this is a simplification.

Model and Data Interaction: Bayesian Inference
• Bayes Rule: interaction between data and model
• Learning as a sequence of interactions
P(θ | Y) = p(Y | θ) p(θ) / p(Y)

Big Data Meets Smart Models: A Bayesian Approach towards Sensor Informatics
• We need models: a model is the representation of our knowledge so far
• Data: the observations which may revise our belief in the models we have
• Analysis: assessing our belief and updating our models to make them more believable
• Sensing: acquiring the data needed to update (enrich) models
• Models are learned from data (observations) by scientists (theoretical abstraction) or by machine (machine learning)
• Models are hypotheses (when making new observations)
• Models are knowledge (when belief is established)
Sensor Informatics:
• Sensing management: managing the "neediness": when and where to sense
• Sensing analytics: managing model updating: how to enrich models with observations
• Reasoning: decision making based on integration of trusted models
P(M | D) = P(D | M) P(M) / P(D)
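
The rule P(M | D) = P(D | M) P(M) / P(D), applied once per observation, can be sketched as follows. This is a minimal illustration, not the WikiSensing implementation: the two candidate models, their likelihood values, and the readings are hypothetical.

```python
def bayes_update(prior, likelihoods):
    """One application of Bayes' rule over a finite set of models.

    prior:       dict model -> P(M)
    likelihoods: dict model -> P(D | M) for the newly observed datum D
    returns:     dict model -> P(M | D)
    """
    evidence = sum(prior[m] * likelihoods[m] for m in prior)  # P(D)
    return {m: prior[m] * likelihoods[m] / evidence for m in prior}

# Hypothetical models: each one states the probability that a sensor reads "high".
models = {"M_low": 0.2, "M_high": 0.8}
belief = {"M_low": 0.5, "M_high": 0.5}        # P(M) before any data
observations = [True, True, False, True]      # made-up readings (True = "high")

for obs in observations:
    lik = {m: (p if obs else 1.0 - p) for m, p in models.items()}
    belief = bayes_update(belief, lik)        # the posterior becomes the next prior
    print(obs, belief)
```

Each pass through the loop is one "interaction" between data and model: the posterior from one observation serves as the prior for the next.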

Surprising Event: When an Observation Does Not Fit a Known Model
A large difference between the posterior P(M|D) and the prior P(M) -> surprise!
How large is large? Surprise threshold α
• Kullback-Leibler divergence
• Other methods: significance level, Chebyshev's theorem, …
From the model, we get C(A, B) (e.g. a multivariate Gaussian distribution)
A: 100mm, B: 50mm -> model consistent
A: 100mm, B: 500mm -> surprise!
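
One common way to turn "how far did the posterior move from the prior" into a number is the Kullback-Leibler divergence KL(posterior || prior), compared against the threshold α. The sketch below assumes discrete distributions over a small set of models; the function names, the distributions, and the value of α are illustrative assumptions, not values from the slides.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions over the same keys."""
    return sum(p[k] * math.log(p[k] / q[k]) for k in p if p[k] > 0.0)

def is_surprising(prior, posterior, alpha):
    """Flag an observation whose posterior moved further than alpha from the prior."""
    return kl_divergence(posterior, prior) > alpha

prior = {"M1": 0.5, "M2": 0.5}
mild_posterior = {"M1": 0.55, "M2": 0.45}     # observation roughly fits the model
strong_posterior = {"M1": 0.99, "M2": 0.01}   # observation shifts belief sharply
alpha = 0.1                                   # hypothetical surprise threshold

print(is_surprising(prior, mild_posterior, alpha))
print(is_surprising(prior, strong_posterior, alpha))
```

The direction of the divergence (posterior relative to prior) is a design choice in this sketch; the threshold α would have to be tuned for a real sensing application.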

Using Compressive Sensing Technology to Optimize Observations
Camera example: Image -> Analog Signal -> Digital Data -> Compressed Data -> Information
Why sense so much data and then throw it away? Why not sense information directly?
Compressive sensing: take advantage of sparseness to recover signals from an under-determined system with just a small number of measurements.
Unobserved behavior (behavior not captured by the current model) is typically sparse.
Reconstruction methods: L1-min, Bayesian CS.
The sensed data is enough when we can recover the needed information through compressive sensing.
Ψ: CS matrix built from the model
Φ: placement matrix
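
To make the L1 idea concrete, the sketch below recovers a sparse vector from fewer measurements than unknowns using ISTA (iterative soft thresholding), which minimises 0.5 * ||A x - y||^2 + lam * ||x||_1. This is an L1-regularised variant chosen for brevity, not necessarily the L1-min or Bayesian CS solver used by the author. The matrix `a` stands in for the combined measurement matrix and is random here; all sizes and values are made up.

```python
import random

def matvec(a, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]

def rmatvec(a, r):
    """Compute A^T r."""
    return [sum(a[i][j] * r[i] for i in range(len(a))) for j in range(len(a[0]))]

def objective(a, y, x, lam):
    """0.5 * ||A x - y||^2 + lam * ||x||_1"""
    resid = [p - q for p, q in zip(matvec(a, x), y)]
    return 0.5 * sum(r * r for r in resid) + lam * sum(abs(v) for v in x)

def soft_threshold(v, t):
    """Shrink each entry towards zero by t; entries no larger than t become zero."""
    return [(abs(u) - t) * (1.0 if u > 0 else -1.0) if abs(u) > t else 0.0 for u in v]

def ista(a, y, lam, iters):
    # The step 1/||A||_F^2 is conservative: ||A||_F^2 bounds the largest
    # eigenvalue of A^T A, which keeps the objective from increasing.
    step = 1.0 / sum(v * v for row in a for v in row)
    x = [0.0] * len(a[0])
    for _ in range(iters):
        resid = [p - q for p, q in zip(matvec(a, x), y)]
        grad = rmatvec(a, resid)
        x = soft_threshold([xi - step * g for xi, g in zip(x, grad)], step * lam)
    return x

random.seed(0)
m, n = 10, 20                                  # 10 measurements, 20 unknowns
a = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(m)]
x_true = [0.0] * n
x_true[3], x_true[12] = 1.5, -2.0              # made-up sparse signal
y = matvec(a, x_true)

x_hat = ista(a, y, lam=0.1, iters=2000)
largest = sorted(range(n), key=lambda j: -abs(x_hat[j]))[:2]
print("two largest recovered entries at indices:", sorted(largest))
```

Whether the recovered support matches the true one depends on the measurement matrix, the sparsity level, and `lam`; the sketch only shows the mechanics of the reconstruction step.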

How to Update Model – Parameter Estimation
[Figure: nodal solution of temperature (TEMP, AVG) at STEP=360, SUB=1, TIME=1800; SMN=131.03, SMX=646.41; plot dated DEC 25 2011 21:15:23]
Estimating the parameter θ to maximize the likelihood of the data given the model: θ* = argmax_θ P(D | M, θ)

Model: An Example in Digital City
Modelling city life via causality: C(eA, eB) is used to predict the current value at location A when the value at another location B is given
• Location: physical / logical locations with causality (through sensory cortex) (city areas A, B)
• Relationship: topology (geo topology between A and B: diffusion structure)
• Event: events, which are the dynamics of the observable signal, S = f(E) (heavy rainfall)

Digital City Model: looking into the details
Ontologies are adopted to represent locations L, relationships R, events E, and signals S.
Diffusion: an event e1 ∈ E in n1 causes another event e2 ∈ E in n2, when two nodes n1, n2 in G are linked.
System T = (L, R, E)
Model M(T) = (G, ∅, B)
Training for causality ∅: use a Bayesian network to represent the conditional independencies between cause and target variables:
1. Gaussian Mixture Models (GMMs), estimated via expectation maximization (EM)
2. Gaussian Process with Bayesian inference

Model Driven Sensing: No Surprise!
• When the surprise > surprise threshold: diversity detected
• Identify the incorrect causality C(el, ep), which is sparse: compressive sensing approach
• New observation -> measurement that could revise the model in model space to maximize the likelihood of observations
• Focusing on diversity
[Diagram labels: Placement, Model Updating]
The dynamics of model update: Surprise -> Sensing -> Model Updating
The goal of sensing: capturing surprise
The goal of analysis: revising the model
The model cannot overfit or underfit: when there is diversity, it can be updated -> consistent with the universe (target)

Model Update
It's Bayesian: P(M, ϴ | D) = P(D | M, ϴ) P(M, ϴ) / P(D)
T: target, M: model, ϴ: top-down parameter
* When ϴ is fixed: P(M | D) = P(D | M) P(M) / P(D)
-> The variance between posterior and prior is "surprise"
-> bottom-up attention -> model update (data assimilation): combining observations of the current state of a system with the results from a model (the forecast) to produce an analysis. The model is then advanced in time and its result becomes the forecast in the next analysis cycle
* When ϴ is updated: P(M, ϴ) = P(M | ϴ) P(ϴ)
-> top-down attention (alertness) -> model update
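
The data assimilation cycle described above (forecast, observation, analysis, advance) can be sketched for a single scalar state with Gaussian uncertainty. This is a textbook-style simplification, not the numerical city model from the slides; `advance`, its parameters, and all numbers are hypothetical.

```python
def assimilate(forecast, forecast_var, observation, obs_var):
    """Blend a model forecast with an observation (scalar Gaussian case)."""
    gain = forecast_var / (forecast_var + obs_var)   # weight given to the observation
    analysis = forecast + gain * (observation - forecast)
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var

def advance(state, var, growth=1.0, model_noise=0.5):
    """Hypothetical model step: advance the state and inflate its uncertainty."""
    return growth * state, growth * growth * var + model_noise

state, var = 20.0, 4.0                    # initial forecast, e.g. a temperature
for obs in [21.0, 21.5, 22.0]:            # made-up sensor readings
    state, var = assimilate(state, var, obs, obs_var=1.0)
    print("analysis:", round(state, 3), "variance:", round(var, 3))
    state, var = advance(state, var)      # becomes the forecast of the next cycle
```

The analysis always lies between the forecast and the observation, and its variance is smaller than the forecast variance, which is the sense in which each observation "enriches" the model.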

Adaptive Observation: Sensing and Numerical Modelling
CityGML Ontology -> GIS -> Geometry mesh

Building an Initial Model and Making Predictions by Simulations
Setting up boundary conditions, numerical schemes, model parameters, etc.

Simulation: 24 Building Case (Fine Mesh – 600000 Nodes): 20 Processors

Simulation: Moving Vehicles and Scalar Dispersions in Street Canyons

Using Sensors to Verify the Prediction Results of the Model
Sensing: acquiring data to get the posterior of the model, in order to validate the model (if consistent) or update it.
P(M | D) = P(D | M) P(M) / P(D)
[Diagram labels: Data, sensing, Model, validate, update]

New WikiSensing: Elastic Sensing Environment for Large Scale Sensor Informatics
• Elastic sensing theory based on Bayesian inference
• Big Data architecture for large scale sensory data management
• Ontology for background knowledge management
• Model driven adaptive observation support
• Digital City and digital life applications

The Architecture of the New WikiSensing System

Ontology Used to Organise the Complex Knowledge Management
Using ontology to represent the targets, signals, sensing methods, measurements, etc.
Ontology to support flexible resolution
Upper ontology for unified operation
OntoSensor

Conclusion
• Big data offers a great opportunity for building smart models
• Big data provides a new methodology for model research
• New informatics comes from the closely coupled integration of the data and the model worlds
• Bayesian theory provides a natural foundation for such an integration
• Sensor Informatics is a good example of such a paradigm
• A new uniform framework of sensor informatics can be developed based on Bayesian theory, where the dynamics of data and model capture the essence of building a sensory system
• We are developing the WikiSensing system to realise this paradigm

Thank you

Understanding Big Data
Haixun Wang

Data Explosion
• MB = 10^6 bytes: a typical book in text format
• GB = 10^9 bytes: a one hour video is about 1 GB; data produced by a biology experiment in one day
• TB = 10^12 bytes: astronomy data in one night; the US Library of Congress has 1000 TB of data; the search log of Bing is 20 TB per day (2009)

The Arecibo Telescope
World's largest radio telescope
Diameter: 305 m (1,000 ft)
Area: 18 acres
Location: Arecibo, Puerto Rico
The P-ALFA surveys: 800 Terabytes in 5 years

Software Driven Telescope
• From few, large, expensive, directional dishes to many, small, cheap, omni directional antennae
• A large number of high-speed input streams (2 Gbps per antenna, 25,000 antennae in an area of 340 km in diameter)

Challenge 1: It's the data, stupid!
• Data size
• Data complexity
Key/value store, Column store, Document store, Graph Systems

Big data drives tomorrow's economy.
• The value of big data lies in its degree of connectedness.
• Existing systems cannot handle the rich connectedness of big data.

RDBMS and Rich Relationships
• Performance of multi-way joins is very poor in RDBMS
• Managing data of rich connectedness requires multi-way joins in RDBMS

Trinity
• A general purpose, distributed, in memory graph system
• Online graph query processing
• Offline graph analytics

Trinity Performance Highlight
• Online query processing:
– visiting 2.2 million users (3 hop neighborhood) on Facebook: <= 100ms
– foundation for graph-based service, e.g., entity search
• Offline graph analytics:
– one iteration on a 1 billion node graph: <= 60sec
– foundation for analytics, e.g., social analytics

People Search Demo

Multi-way Join vs. Graph Traversal
[Diagram comparing RDBMS and Trinity: Company, Incident, and Problem records linked through ID columns]
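
The contrast can be illustrated with a toy in-memory graph: instead of joining a Company table to an Incident table to a Problem table, a traversal follows links from node to node. The sketch below is a plain breadth-first k-hop search and is not Trinity's API; the node names and edges are hypothetical.

```python
from collections import deque

def k_hop_neighbors(graph, start, k):
    """Return every node reachable from start in at most k hops (breadth-first)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue                      # do not expand beyond k hops
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    seen.discard(start)
    return seen

# Hypothetical data: a company linked to incidents, incidents linked to problems.
graph = {
    "company:1": ["incident:3", "incident:4"],
    "incident:3": ["problem:7"],
    "incident:4": ["problem:7", "problem:8"],
}
print(sorted(k_hop_neighbors(graph, "company:1", 2)))
```

Each extra hop costs one more pass over adjacency lists, whereas the relational formulation needs one more join per hop, which is the cost the slide attributes to multi-way joins.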

Challenge 2: Interpretation of Big Data
• IBM Watson:
– Runs on 2,880 cores, 15 terabytes of RAM, and 80kW of power
• A human brain:
– Runs on a tuna fish sandwich and a glass of water

The Eternal Quest
[Chart with axes "understanding the question" and "answering the question"; scale labels: domain specific language, unconstrained natural language, simple calculation, inferencing & reasoning; systems shown: calculator, SQL, Google/Bing?, Wolfram Alpha, Watson, SIRI, Human (Turing Test)]

Turning the Web into a Database

What you see when you look at my homepage …
Haixun Wang
Microsoft Research Asia
Email: haixunw@microsoft.com
Tel: +86-10-58963289
Tel: +1-914-902-0749
I joined Microsoft Research Asia in 2009. I was with IBM T. J. Watson Research Center from 2000 to 2009. I received the B.S. and M.S. degrees in Computer Science from Shanghai Jiao Tong University in 1994 and 1996, and the Ph.D. degree in Computer Science from University of California, Los Angeles in June 2000.

What a machine sees when it looks at my homepage …
• A JPEG image (a jpeg file)
• Text in a big bold font
• 4 lines of text
• Another dozen lines of text with two embedded URLs

Semantic Web?
• Number 1 trend in 2008 – Richard MacManus
• The infrastructure to power the Semantic Web is already here. – Tim Berners-Lee
• Unstructured information will give way to structured information, paving the road to intelligent computing. – Alex Iskold

More data beats better algorithms
Banko and Brill 2001

English-Spanish translation quality, Microsoft technical texts
[Chart: mean translation quality (1 = incomprehensible, 4 = perfect) by year, 2001 to 2007. Series: MSRMT, Systran, Logos. Annotations: "Improve algorithms, scale system, and add data!"; "Rule-based system with expensive customizations for Microsoft"; "Off-the-shelf rule-based system"]
From Rick Rashid's talk: It's a data driven world – get over it!

Probase
• isA (concept, entities)
• isPropertyOf (attributes)
• Co-occurrence (isCEOof, LocatedIn, etc.)
Concepts ("Spanish Artists")
Entities ("Pablo Picasso")

Explicit vs. Latent Knowledge
• Abstract representations (such as clusters from latent analysis) that lack linguistic counterparts are hard to learn or validate and tend to lose information.
• Human language has evolved over millennia to have words for the important concepts; let's use them.
Halevy, Norvig, Pereira, "The Unreasonable Effectiveness of Data", IEEE Intelligent Systems, 2009.

What is interpretation?

Add Common Sense to Computing
Pablo Picasso, 25 Oct 1881, Spanish

Which is "kiki" and which is "bouba"?
sound, shape, zigzaggedness

China, India: country
China, India, Brazil: emerging market

body, taste, smell: wine

"The engineer is eating an apple": IT company or fruit?

Multiple Concepts
Obama's real-estate policy
Concepts: president, politician; investment, property, asset; plan, document

Multiple Concepts
[Diagram for the terms "apple" and "adobe"; concept labels: software company, software manufacturer, brand, company, fruit, juice, material]

Multiple Concepts
Obama's real-estate policy
Concepts: president, politician; investment, property, asset; plan, document; thing, issue, term, example
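
A very rough way to see how several terms narrow down a concept is to intersect the concept sets of each term. The sketch below is only that: a set intersection over a tiny hand-written isA table. It does not reproduce how Probase scores concepts, and the table entries and the function name `shared_concepts` are illustrative assumptions.

```python
# Hypothetical isA table (term -> concepts); the entries are illustrative only.
IS_A = {
    "apple": {"fruit", "brand", "company", "software company"},
    "adobe": {"brand", "software company", "material"},
    "pear": {"fruit"},
}

def shared_concepts(terms):
    """Concepts that every term in the list belongs to."""
    sets = [IS_A.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

print(sorted(shared_concepts(["apple", "adobe"])))   # context pulls "apple" towards software
print(sorted(shared_concepts(["apple", "pear"])))    # context pulls "apple" towards fruit
```

The same term ("apple") resolves to different concepts depending on its neighbours, which is the point the "Multiple Concepts" slides make.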

Example: (from B. Dolan)
Who assassinated Abraham Lincoln?

The far reaching implications
Scientific Method

What really counts is understanding, or a mastery of some common vocabulary

How can big data help?
• A much more rapid cycle of hypothesis generation and testing
• General access to knowledge in science
• Autonomous experimentation, with an 'active learning' model

Technological Singularity
If machines could even slightly surpass human intellect, they could improve their own designs in ways unforeseen by their designers, and thus recursively augment themselves into far greater intelligences.

Thanks

Big Data Platform and Internet Application Services

Agenda
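• Current problems and challenges
• Solutions from companies in China and abroad
• Tencent's approach to big data

Agenda Part 1: Current problems and challenges

Big data challenge (1): massive data storage technology?
1. Data is growing from PB scale towards ZB scale: how to reduce storage and computing costs. Data volume: 46 PB. Number of machines: 5,600.
2. The rapid growth of industrial-grade services poses new challenges for the timeliness and reliability of big data computing.

Big data challenge (2): data is hard to put to use

Big data challenge (3): accurate recommendation is hard
1. Information overload for enterprises (across the whole Internet)
2. Low recommendation precision
3. How to evaluate recommendation effectiveness
4. How to effectively collect data on users' active behavior

Agenda Part 2: Solutions from companies in China and abroad

Domestic case: Alipay big data platform
Alipay hadoop-related application services
hadoop open-source products: Hbase, Mahout, Hive/Pig
Dolphin (海豚) technology: Fur Seal (海狗), Octopus (章鱼), Starfish (海星), Swordfish (剑鱼), Blue Whale (蓝鲸), …
• Massive computing: a Hadoop-based massive storage and computing cluster, with one-stop management of computing and storage resources
• Distributed data mining: distributed data mining based on Mahout
• Data distribution center: batch data extraction and loading, plus near-real-time message and log distribution (using client pull)
• Real-time search over massive data: based on the integration of Hbase and Solr, providing real-time query and full-text search over hundreds of billions of records
• Stream computing framework: a streaming framework similar to M/R that lets applications be built quickly and provides online data processing services
• Massive data query: based on hive and Pig, providing a web-based visual query service for massive data

• Online news: Google News reports that recommendations increase articles viewed by 38% (Das et al. 2007).
• Movies: Netflix reports that over 60% of their rentals originate from recommendations (Thompson 2008).
• Amazon, which sells music, books, and movies: 35% of sales are reported to originate from recommendations (Lamere & Green 2008).
• Video, YouTub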