
Special Forum on Big Data: Course Slides

Big Data vs Smart Model: Beauty and the Beast
Prof. Yike Guo
Department of Computing, Imperial College London

Model: Mathematical Representation of a Simplified Physical World
• Modelling is an essential and inseparable part of all scientific activity.
• A scientific model seeks to represent empirical objects, phenomena, and physical processes in a logical and objective way.
• To understand the world or an object (called a target T), a model M is a simplified mathematical representation of it.
• A model is the result of abstraction from the observations made, and it is used to make predictions.
[Figure labels: Human / Sensor, Human / Machine, Human / Machine]

No Model Is Perfect
• Inherent uncertainty: Targets consist of a set of phenomena that are continuous in both time and space, and they typically produce rich signals. Because of this continuity, the signals are in principle infinite. Observations (e.g. sensor readings), however, are made at discrete points in time and space, so they are incomplete and approximate, which introduces the “uncertainty”.
• Overfitting or underfitting: When learning a model from observations, such as a nonlinear regression model, we need to choose parameters such as K. Because the information from observations is partial, it is hard to make a perfect choice of K. This imperfection causes model error, such as underfitting (small K) and overfitting (large K).
• Simplification: From observations, we project a multi-dimensional world onto a simplified model with significantly reduced dimensionality, to focus on the features or properties we are interested in.
[Figure: nonlinear regression with a K-order polynomial]
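The effect of the polynomial order K on the fit can be sketched as follows. This is a minimal illustration, not the slide's own experiment: the noisy sine data, the seed, and the helper name `training_error` are assumptions made for the example. It only shows that training error never increases as K grows, which is why training error alone cannot reveal overfitting.

```python
import numpy as np

def training_error(x, y, k):
    """Mean squared error of a least-squares K-order polynomial fit on its own training data."""
    coeffs = np.polyfit(x, y, deg=k)
    residuals = y - np.polyval(coeffs, x)
    return float(np.mean(residuals ** 2))

# Hypothetical observations: a sine wave sampled at 12 discrete points, plus noise.
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 12)
y = np.sin(np.pi * x) + 0.1 * rng.standard_normal(x.size)

# Small K tends to underfit; large K chases the noise (overfits).
errors = {k: training_error(x, y, k) for k in (1, 3, 7)}
for k, err in errors.items():
    print(f"K={k}: training MSE = {err:.4f}")
```

To see overfitting itself, one would compare these numbers with the error on held-out points, which typically starts to rise once K is too large.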

So, Why Model?
• George Box (statistician): “All models are wrong, but some are useful.” Only models, from cosmological equations to theories of human behavior, seemed to be able to consistently, if imperfectly, explain the world around us.
• Peter Norvig (Google): "All models are wrong, and increasingly you can succeed without them."
• Chris Anderson (Wired): There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot. (The Data Deluge Makes the Scientific Method Obsolete)
[Timeline markers on the slide: 1980, 2008, 2012]

The Google Argument
At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later.

For instance, Google conquered the advertising world with nothing more than applied mathematics. It didn't pretend to know anything about the culture and conventions of advertising; it just assumed that better data, with better analytical tools, would win the day. And Google was right.

Google's founding philosophy is that we don't know why this page is better than that one: if the statistics of incoming links say it is, that's good enough. No semantic or causal analysis is required. That's why Google can translate languages without actually "knowing" them (given equal corpus data, Google can translate Klingon into Farsi as easily as it can translate French into German). And why it can match ads to content without any knowledge or assumptions about the ads or the content.

Model Free Sensor Informatics: Query Driven

Database table raw-data (filled by the sensor network):
time    id    temp
10am    1     20
10am    2     21
..      ..    ..
10am    7     29

User workflow:
1. Extract all readings into a file
2. Run MATLAB/R/other data processing tools
3. Write output to a file/back to the database
4. Write data processing tools to process/aggregate the output (maybe using DB)
5. Decide new data to acquire
Repeat

Model-free sensing treats the sensory system as a database, and sensing as querying to fetch data from the physical world. One of the leading vendors [Crossbow] is bundling a query processor with their devices.
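The "sensing as querying" idea can be sketched with an in-memory SQLite table. This is an illustration only: the table name `raw_data` (the hyphen in raw-data is not valid in an unquoted SQL identifier), the column types, and the three sample rows are assumptions based on the table shown on the slide, not the schema of any real sensor database.

```python
import sqlite3

# In-memory stand-in for the sensor network's raw-data table (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_data (time TEXT, id INTEGER, temp REAL)")
conn.executemany(
    "INSERT INTO raw_data VALUES (?, ?, ?)",
    [("10am", 1, 20.0), ("10am", 2, 21.0), ("10am", 7, 29.0)],
)

# "Sensing" becomes a query: aggregate all readings taken at 10am.
count, avg_temp, max_temp = conn.execute(
    "SELECT COUNT(*), AVG(temp), MAX(temp) FROM raw_data WHERE time = ?",
    ("10am",),
).fetchone()
print(count, avg_temp, max_temp)
conn.close()
```

Note that nothing in the query says which readings are worth acquiring next; that decision stays with the user in step 5 of the workflow.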

WikiSensing: A Model Free Sensor Informatics System Based on Big Data Architecture

Model Free Sensing Is Super Inefficient
• Data misrepresentation without a model
• Latent information missing without a model
• High demand for computation/storage without a model
• Requires too much interoperability between sensors and analytics

Bayesian: Data Is Not the Enemy of Models, Rather a Great Supporter!
• Bayesian probability is a formalism that allows us to reason about beliefs in models under conditions of uncertainty, based on the observations (data).
• If we have observed that a particular event has happened, such as Britain coming 10th in the medal table at the 2004 Olympics, then there is no uncertainty about it.
• However, suppose a is the statement “Britain sweeps the board at the 2012 London Olympics, winning more than 30 gold medals!” made before 28 July. Since this is a statement about a future event, nobody can state with any certainty whether or not it is true. Different people may have different beliefs in the statement depending on their specific knowledge of factors that might affect its likelihood.
• Beliefs in the model changed daily based on the performance data available each day. By 10 August, most people's belief in this model should have been almost 80%.
• Thus, in general, a person's subjective belief in a statement a will depend on some body of knowledge K. We write this as P(a|K). Henry's belief in a is different from Marcel's because they are using different K's. However, even if they were using the same K they might still have different beliefs in a.
• The expression P(a|K) thus represents a belief measure. Sometimes, for simplicity, when K remains constant we just write P(a), but you must be aware that this is a simplification.

Model and Data Interaction: Bayesian Inference
• Bayes' rule: interaction between data and model
• Learning as a sequence of interactions
P(θ | Y) = p(Y | θ) p(θ) / p(Y)
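"Learning as a sequence of interactions" can be sketched as repeated application of Bayes' rule, where each posterior becomes the prior for the next observation. The two coin hypotheses, their head probabilities, and the function name `bayes_update` are assumptions chosen only to keep the arithmetic small.

```python
def bayes_update(prior, likelihoods):
    """One interaction between data and model: posterior = likelihood * prior / evidence."""
    unnormalised = {h: prior[h] * likelihoods[h] for h in prior}
    evidence = sum(unnormalised.values())  # p(Y), the normalising constant
    return {h: v / evidence for h, v in unnormalised.items()}

# Hypothetical models of a coin and their probability of heads.
p_heads = {"fair": 0.5, "biased": 0.8}
belief = {"fair": 0.5, "biased": 0.5}  # prior belief before any data

for outcome in "HHTH":  # observations arrive one at a time
    likelihoods = {h: (p if outcome == "H" else 1.0 - p) for h, p in p_heads.items()}
    belief = bayes_update(belief, likelihoods)
    print(outcome, belief)
```

After three heads and one tail the belief has shifted toward the biased model, without either model ever being declared certain.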

Big Data Meets Smart Models: A Bayesian Approach towards Sensor Informatics
• We need models: a model is the representation of our knowledge so far
• Data: the observations, which may revise our belief in the models we have
• Analysis: assessing our belief and updating our models to make them more believable
• Sensing: acquiring the data needed to update (enrich) models

Models are learned from data (observations) by scientists (theoretical abstraction) or by machines (machine learning):
• Models are hypotheses (when making new observations)
• Models are knowledge (when belief is established)

Sensor Informatics:
• Sensing management: managing the “neediness”, i.e. when and where to sense
• Sensing analytics: managing model updating, i.e. how to enrich models with observations
• Reasoning: decision making based on the integration of trusted models

P(M | D) = P(D | M) P(M) / P(D)

Surprising Event: When an Observation Does Not Fit a Known Model
• When the posterior P(M|D) and the prior P(M) differ greatly -> surprise!
• How great is “great”? Set a surprise threshold α, measured for example with the Kullback-Leibler divergence.
• Other methods: significance level, Chebyshev's theorem.
• From the model, we get C(A, B) (e.g. a multivariate Gaussian distribution).
• A: 100 mm, B: 50 mm -> consistent with the model
• A: 100 mm, B: 500 mm -> surprise!
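A surprise test of this kind can be sketched for discrete distributions. The threshold value 0.1, the example distributions, and the function names are assumptions for illustration; the slide does not give a value for α or say in which direction the divergence is taken (here it is KL(posterior || prior)).

```python
import math

def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in nats; assumes q[i] > 0 wherever p[i] > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def is_surprise(prior, posterior, alpha):
    """Flag an observation as surprising when it moves the belief by more than alpha."""
    return kl_divergence(posterior, prior) > alpha

prior = [0.5, 0.5]      # belief over two candidate models before the observation
mild = [0.55, 0.45]     # posterior after an observation that fits the model
strong = [0.99, 0.01]   # posterior after an observation that contradicts it

print(is_surprise(prior, mild, alpha=0.1))
print(is_surprise(prior, strong, alpha=0.1))
```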

Using Compressive Sensing Technology to Optimize Observations
• Camera example: Image -> Analog Signal -> Digital Data -> Compressed Data -> Information
• Why sense so much data and then throw most of it away? Why not sense the information directly?
• Compressive sensing: take advantage of sparseness to recover a signal from an under-determined system with just a small number of measurements.
• Unobserved behavior (behavior not captured by the current model) is typically sparse.
• Reconstruction methods: L1-min, Bayesian CS.
• The sensed data is enough when we can recover the needed information through compressive sensing.
• Ψ: CS matrix built from the model
• Φ: placement matrix
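Sparse recovery from few measurements can be sketched with iterative soft-thresholding (ISTA). Note the substitution: ISTA minimises an L1-regularised least-squares objective, which is a common relaxation and not the exact L1-min program named on the slide. The random matrix `A` stands in for the product of the placement matrix and the CS matrix; the sizes, sparsity pattern, and `lam` are all assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink every entry of v toward zero by t (proximal operator of the L1 norm)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(A, y, x, lam):
    """0.5 * ||A x - y||^2 + lam * ||x||_1"""
    return 0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x))

def ista(A, y, lam=0.01, n_iter=500):
    """Iterative soft-thresholding with step 1/L, L = squared spectral norm of A."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        gradient = A.T @ (A @ x - y)
        x = soft_threshold(x - step * gradient, step * lam)
    return x

rng = np.random.default_rng(0)
n, m = 50, 20                        # 50 unknowns, only 20 measurements
x_true = np.zeros(n)
x_true[[3, 17, 41]] = [1.0, -2.0, 1.5]   # sparse signal: 3 nonzero entries
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true

x_hat = ista(A, y)
print("largest entries at:", np.argsort(-np.abs(x_hat))[:3])
```

How closely `x_hat` matches `x_true` depends on the sparsity, the number of measurements, and `lam`; the sketch only guarantees that the objective decreases from its starting value.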

How to Update Model – Parameter Estimation
[Figure: nodal solution plot of average temperature (TEMP), STEP=360, SUB=1, TIME=1800, RSYS=0; colour scale from SMN=131.03 to SMX=646.41; dated DEC 25 2011 21:15:23]
• Estimating the parameter θ to maximize the likelihood of the data given the model.
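Maximum-likelihood estimation can be sketched for the simplest case, a Gaussian model of sensor readings, where the estimate has a closed form. The Gaussian assumption, the three readings, and the function names are illustrative choices; the slide's own model is a numerical simulation whose parameters would be fitted by numerical optimisation.

```python
import math

def gaussian_mle(data):
    """Closed-form maximum-likelihood estimates of a Gaussian's mean and variance."""
    n = len(data)
    mu = sum(data) / n
    var = sum((x - mu) ** 2 for x in data) / n  # MLE divides by n, not n - 1
    return mu, var

def log_likelihood(data, mu, var):
    """Log-likelihood of the data under a Gaussian with the given parameters."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * var) - (x - mu) ** 2 / (2.0 * var)
        for x in data
    )

readings = [20.0, 21.0, 29.0]  # hypothetical temperature readings
mu_hat, var_hat = gaussian_mle(readings)
print(mu_hat, var_hat, log_likelihood(readings, mu_hat, var_hat))
```

Any other choice of mean or variance gives a lower log-likelihood on the same readings, which is what "maximize the likelihood of the data given the model" means here.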

Model: An Example in Digital City
Modelling city life via causality:
• C(eA, eB) is used to predict the current value at location A when the value at another location B is given.
• Location: physical / logical locations with causality (through sensory cortex), e.g. city areas A, B
• Relationship: topology (geo-topology between A and B: diffusion structure)
• Event: the dynamics of the observable signal, S = f(E) (e.g. heavy rainfall)
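Predicting the value at A from the value at B can be sketched with a bivariate Gaussian, using the standard conditional mean and variance. Treating C(eA, eB) as a joint Gaussian is an assumption for this sketch, and all numbers (means, variances, covariance, in mm of rainfall) are hypothetical.

```python
def predict_a_given_b(mean_a, mean_b, var_a, var_b, cov_ab, b_value):
    """Conditional mean and variance of A given B = b_value under a joint Gaussian."""
    cond_mean = mean_a + cov_ab / var_b * (b_value - mean_b)
    cond_var = var_a - cov_ab ** 2 / var_b
    return cond_mean, cond_var

# Hypothetical rainfall model for two linked city areas A and B (values in mm).
cond_mean, cond_var = predict_a_given_b(
    mean_a=100.0, mean_b=50.0,
    var_a=400.0, var_b=100.0, cov_ab=150.0,
    b_value=60.0,
)
print(cond_mean, cond_var)
```

An observed value at A that lies many conditional standard deviations from `cond_mean` is what the deck calls a surprise.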

Digital City Model: Looking into the Details
• Ontologies are adopted to represent locations L, relationships R, events E, and signals S.
• Diffusion: an event e1 ∈ E in n1 causes another event e2 ∈ E in n2 when the two nodes n1, n2 in G are linked.
• System T = (L, R, E)
• Model M(T) = (G, ∅, B)
• Training for causality ∅: use a Bayesian network to represent the conditional independencies between cause and target variables:
1. Gaussian Mixture Models (GMMs), estimated via expectation maximization (EM)
2. Gaussian Process with Bayesian inference

Model Driven Sensing: No Surprise!
• When the surprise > surprise threshold: diversity detected. Identify the incorrect causality C(el, ep), which is sparse (compressive sensing approach).
• Placement: new observation -> measurements that could revise the model in model space to maximize the likelihood of the observations, focusing on diversity.
• Model updating
• The dynamics of model update: Surprise -> Sensing -> Model Updating
• The goal of sensing: capturing surprise
• The goal of analysis: revising the model
• A model cannot overfit / underfit: when there is diversity, it can be updated -> consistent with the universe (target)

Model Update
It's Bayesian: P(M, ϴ | D) = P(D | M, ϴ) P(M, ϴ) / P(D)
T: target, M: model, ϴ: top-down parameter
* When ϴ is fixed: P(M | D) = P(D | M) P(M) / P(D)
-> The variance between posterior and prior is “surprise”
-> bottom-up attention -> model update (data assimilation): combining observations of the current state of a system with the results from a model (the forecast) to produce an analysis. The model is then advanced in time and its result becomes the forecast in the next analysis cycle.
* When ϴ is updated: P(M, ϴ) = P(M | ϴ) P(ϴ)
-> top-down attention (alertness) -> model update
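One analysis step of data assimilation can be sketched for a single scalar state with Gaussian errors, where the Bayesian update reduces to a weighted average of forecast and observation. The scalar setting, the variances, and the function name `assimilate` are assumptions for illustration; real assimilation systems work on large state vectors.

```python
def assimilate(forecast, forecast_var, observation, observation_var):
    """Combine a model forecast with an observation into an analysis (scalar Gaussian case)."""
    gain = forecast_var / (forecast_var + observation_var)  # trust in the observation
    analysis = forecast + gain * (observation - forecast)
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var

# Hypothetical cycle: the model forecasts 10.0, a sensor reads 12.0, both equally uncertain.
analysis, analysis_var = assimilate(
    forecast=10.0, forecast_var=4.0, observation=12.0, observation_var=4.0
)
print(analysis, analysis_var)
```

The analysis would then be advanced in time by the model to give the forecast for the next cycle, as the slide describes.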

Adaptive Observation: Sensing and Numerical Modelling
CityGML Ontology -> GIS -> Geometry mesh

Building an Initial Model and Making Predictions by Simulations
Setting up boundary conditions, numerical schemas, model parameters, etc.

Simulation
24 Building Case (Fine Mesh, 600,000 Nodes): 20 Processors

Simulation
Moving Vehicles and Scalar Dispersions in Street Canyons

Using Sensors to Verify the Prediction Results of the Model
• Sensing: acquiring data to get the posterior of the model, in order to validate the model (if consistent) or update it.
P(M | D) = P(D | M) P(M) / P(D)
[Figure labels: Data, sensing, Model, validate, update]

New WikiSensing: Elastic Sensing Environment for Large Scale Sensor Informatics
• Elastic sensing theory based on Bayesian inference
• Big Data architecture for large scale sensory data management
• Ontology for background knowledge management
• Model driven adaptive observation support
• Digital City and digital life applications

The Architecture of the New WikiSensing System

Ontology Used to Organise the Complex Knowledge Management
• Using ontology to represent the targets, signals, sensing methods, measurements, etc.
• Ontology to support flexible resolution
• Upper ontology for unified operation
[Figure label: OntoSensor]

Conclusion
• Big data offers a great opportunity for building smart models
• Big data provides a new methodology for model research
• New informatics comes from the closely coupled integration of the data and the model worlds
• Bayesian theory provides a natural foundation for such an integration
• Sensor Informatics is a good example of such a paradigm
• A new uniform framework of sensor informatics can be developed based on Bayesian theory, where the dynamics of data and model capture the essence of building a sensory system
• We are developing the WikiSensing system to realise this paradigm

Thank you

Understanding Big Data
Haixun Wang

Data Explosion
• MB = 10^6 bytes: a typical book in text format
• GB = 10^9 bytes: a one-hour video is about 1 GB; data produced by a biology experiment in one day
• TB = 10^12 bytes: astronomy data in one night; the US Library of Congress has 1,000 TB of data; the search log of Bing is 20 TB per day (2009)

The Arecibo Telescope
• World's largest radio telescope
• Diameter: 305 m (1,000 ft)
• Area: 18 acres
• Location: Arecibo, Puerto Rico
• The P-ALFA surveys: 800 terabytes in 5 years

Software Driven Telescope
• From few, large, expensive, directional dishes to many, small, cheap, omnidirectional antennae
• A large number of high-speed input streams (2 Gbps per antenna, 25,000 antennae in an area of 340 km in diameter)
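The aggregate input rate implied by these figures can be worked out directly. The sketch assumes decimal units (1 Gbps = 10^9 bits per second, 1 TB = 10^12 bytes, matching the unit definitions earlier in the deck) and that every antenna streams continuously; the function name is hypothetical.

```python
BITS_PER_BYTE = 8
SECONDS_PER_DAY = 24 * 60 * 60

def aggregate_bytes_per_second(gbps_per_antenna, antennas):
    """Total input rate in bytes per second for all antennae combined."""
    bits_per_second = gbps_per_antenna * 10**9 * antennas
    return bits_per_second / BITS_PER_BYTE

rate = aggregate_bytes_per_second(2, 25_000)
tb_per_second = rate / 10**12
pb_per_day = rate * SECONDS_PER_DAY / 10**15
print(f"{tb_per_second} TB/s, {pb_per_day} PB/day")
```

Under these assumptions, 25,000 antennae at 2 Gbps produce 6.25 TB every second, so the 800 TB collected by the P-ALFA surveys in 5 years would arrive in roughly two minutes.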

Challenge 1: It's the data, stupid!
• Data size
• Data complexity: key/value store, column store, document store, graph systems

Big data drives tomorrow's economy.
• The value of big data lies in its degree of connectedness.
• Existing systems cannot handle the rich connectedness of big data.

RDBMS and Rich Relationships
• Performance of multi-way joins is very poor in RDBMS
• Managing data of rich connectedness requires multi-way joins in RDBMS

Trinity
• A general purpose, distributed, in-memory graph system
• Online graph query processing
• Offline graph analytics

Trinity Performance Highlight
• Online query processing:
– visiting 2.2 million users (3 hop neighborhood) on Facebook: <= 100 ms
– foundation for graph-based services, e.g., entity search
• Offline graph analytics:
– one iteration on a 1 billion node graph: <= 60 sec
– foundation for analytics, e.g., social analytics

People Search Demo

Multi-way Join vs. Graph Traversal
[Figure: in an RDBMS, the Company, Incident, and Problem tables are linked through ID columns and need multi-way joins; in Trinity, the same Company, Incident, and Problem links are followed by graph traversal]
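The traversal side of this comparison can be sketched as a breadth-first k-hop search over an adjacency list, where each hop follows stored links instead of evaluating a join. This is a plain in-memory illustration, not Trinity's API; the graph contents and the function name are hypothetical.

```python
from collections import deque

def k_hop_neighbors(graph, start, k):
    """Return every node reachable from start in at most k hops (start excluded)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    seen.discard(start)
    return seen

# Hypothetical Company -> Incident -> Problem links stored as adjacency lists.
graph = {
    "company": ["incident1", "incident2"],
    "incident1": ["problem1"],
    "incident2": ["problem1", "problem2"],
}
print(sorted(k_hop_neighbors(graph, "company", 2)))
```

In a relational schema the same two-hop question would need a join between the Company and Incident tables and another between Incident and Problem.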

Challenge 2: Interpretation of Big Data
• IBM Watson:
– Runs on 2,880 cores, 15 terabytes of RAM, and 80 kW of power
• A human brain:
– Runs on a tuna fish sandwich and a glass of water

The Eternal Quest
[Figure: axes “understanding the question” and “answering the question”; labels: unconstrained natural language, inferencing & reasoning, domain specific language, simple calculation, SQL, calculator; systems shown: Human (Turing Test), SIRI, Watson, Wolfram Alpha, Google/Bing?]

Turning the Web into a Database

What you see when you look at my homepage …
Haixun Wang
Microsoft Research Asia
Email: haixunw@microsoft.com
Tel: +86-10-58963289
Tel: +1-914-902-0749
I joined Microsoft Research Asia in 2009. I was with IBM T. J. Watson Research Center from 2000 to 2009. I received the B.S. and M.S. degrees in Computer Science from Shanghai Jiao Tong University in 1994 and 1996, and the Ph.D. degree in Computer Science from the University of California, Los Angeles in June 2000.

What a machine sees when it looks at my homepage …
• A JPEG image (a jpeg file)
• Text in a big bold font
• 4 lines of text
• Another dozen lines of text with two embedded URLs

Semantic Web?
• Number 1 trend in 2008 – Richard MacManus
• The infrastructure to power the Semantic Web is already here. – Tim Berners-Lee
• Unstructured information will give way to structured information, paving the road to intelligent computing. – Alex Iskold

More data beats better algorithms
Banko and Brill 2001

[Figure: English-Spanish translation quality on Microsoft technical texts, 2001 to 2007. Vertical axis: mean translation quality (1 = incomprehensible, 4 = perfect), ticks from 2 to 3.5. Systems shown: Systran, MSRMT, Logos. Annotations: “Improve algorithms, scale system, and add data!”; “Rule-based system with expensive customizations for Microsoft”; “Off-the-shelf rule-based system”]
From Rick Rashid's talk: It's a data driven world, get over it!

Probase
• isA (concept, entities)
• isPropertyOf (attributes)
• Co-occurrence (isCEOof, LocatedIn, etc.)
• Concepts (“Spanish Artists”)
• Entities (“Pablo Picasso”)

Explicit vs. Latent Knowledge
• Abstract representations (such as clusters from latent analysis) that lack linguistic counterparts are hard to learn or validate and tend to lose information.
• Human language has evolved over millennia to have words for the important concepts; let's use them.
Halevy, Norvig, Pereira, “The Unreasonable Effectiveness of Data”, IEEE Intelligent Systems, 2009.

What is interpretation?

Add Common Sense to Computing
Pablo Picasso, 25 Oct 1881, Spanish

Which is “kiki” and which is “bouba”?
sound, shape, zigzaggedness

• China, India -> country; China, India, Brazil -> emerging market
• body, taste, smell -> wine
• “The engineer is eating an apple”: IT company, fruit

Multiple Concepts
• Obama's real-estate policy: president, politician; investment, property, asset; plan, document
• More general candidates shown on the follow-up slide: thing, issue, term, example

Multiple Concepts
• apple: software company, brand, fruit, juice
• adobe: brand, software company, material
• apple and adobe together: software company, software manufacturer, brand
[Other labels on the slide: juice, material; brand, company, fruit]
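Picking the concepts that a group of terms shares can be sketched as follows. This is a naive illustration and not Probase's actual method: the toy isA table, its scores, and the product-of-scores ranking are all assumptions; only the concept names come from the slides.

```python
# Hypothetical isA table: term -> {concept: made-up typicality score}.
IS_A = {
    "apple": {"software company": 0.4, "fruit": 0.3, "brand": 0.2, "juice": 0.1},
    "adobe": {"software company": 0.6, "brand": 0.3, "material": 0.1},
}

def conceptualize(terms, is_a=IS_A):
    """Rank the concepts shared by all terms, scoring each by the product of its per-term scores."""
    scores = None
    for term in terms:
        concepts = is_a[term]
        if scores is None:
            scores = dict(concepts)
        else:
            # Keep only concepts that every term seen so far belongs to.
            scores = {c: scores[c] * concepts[c] for c in scores if c in concepts}
    return sorted(scores, key=scores.get, reverse=True)

print(conceptualize(["apple"]))
print(conceptualize(["apple", "adobe"]))
```

Seen alone, "apple" keeps its fruit and juice senses; next to "adobe", only the senses the two terms share survive, which is the disambiguation effect the slide illustrates.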

Example (from B. Dolan): Who assassinated Abraham Lincoln?

The far reaching implications
Scientific Method

What really counts is understanding or a mastery of some common vocabulary

How can big data help?
A much more rapid cycle of hypothesis generation and testing
• General access to knowledge in science
• Autonomous experimentation, with an ‘active learning’ model

Technological Singularity
If machines could even slightly surpass human intellect, they could improve their own designs in ways unforeseen by their designers, and thus recursively augment themselves into far greater intelligences.

Thanks

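Big Data Platform and Internet Application Services

Agenda
• Current problems and challenges
• Solutions from companies in China and abroad
• Tencent's approach to big data

Agenda, Part 1: Current problems and challenges

Big data challenge (1): massive data storage technology?
1. As data grows from PB scale toward ZB scale, how can storage and computing costs be reduced? (Data volume: 46 PB; number of machines: 5,600)
2. Rapidly growing industrial-grade services pose new challenges for the timeliness and reliability of big data computing

Big data challenge (2): data is hard to put to use

Big data challenge (3): accurate recommendation is hard
1. Enterprise information overload (across the whole Internet)
2. Low recommendation accuracy
3. Evaluating recommendation effectiveness
4. How to collect users' active behavior data effectively

Agenda, Part 2: Solutions from companies in China and abroad

Case from China: Alipay big data platform (Alipay Hadoop-related application services)
• Open-source Hadoop products: HBase, Mahout, Hive/Pig
• 海豚 (Dolphin) technologies: 海狗 (Fur Seal), 章鱼 (Octopus), 海星 (Starfish), 剑鱼 (Swordfish), 蓝鲸 (Blue Whale), ...
• Massive computing: a Hadoop-based massive storage and computing cluster that also provides one-stop management of computing and storage resources
• Distributed data mining: distributed data mining based on Mahout
• Data distribution center: batch data extraction and loading, plus near-real-time message and log distribution (using client pull)
• Real-time search over massive data: based on HBase and Solr integration, providing real-time queries and full-text search over hundreds of billions of records
• Stream computing framework: a streaming framework similar to M/R that lets applications be built quickly and provides online data processing services
• Massive data query: based on Hive and Pig, providing a web-based visual query service for massive data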

• Online news: Google News reports that recommendations increase articles viewed by 38% (Das et al. 2007).
• Movies: Netflix reports that over 60% of their rentals originate from recommendations (Thompson 2008).
• Amazon, which sells music, books, and movies: 35% of sales are reported to originate from recommendations (Lamere & Green 2008).
• Video, YouTub
