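Hebei University of Engineering, Graduation Thesis (Design)
Thesis title: Design and Implementation of the Honghai Seed Industry (鸿海种业) Warehouse Management System
Author: Shi Chenghua (石成华)
Major and class: Information Management 1001
Student ID:
Advisor: Zhang Guiwei (张贵炜)
Date: 2023.04.10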
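Data Warehouse
Data warehousing provides businesses with architectures and tools for systematically organizing, understanding, and using their data to make decisions. A large number of organizations have found that data warehouses are valuable tools in today's competitive, fast-evolving world. In the last several years, many firms have spent millions of dollars building enterprise-wide data warehouses. Many people feel that, as competition mounts across industries, data warehousing has become the latest must-have marketing weapon: a way to keep customers by learning more about their needs.
"So," you may ask, full of intrigue, "what exactly is a data warehouse?"
Data warehouses have been defined in many ways, which makes a rigorous definition difficult to formulate. Loosely speaking, a data warehouse is a database that is maintained separately from an organization's operational databases. Data warehouse systems allow a variety of application systems to be integrated, and they support information processing by providing a solid platform of consolidated historical data for analysis.
According to W. H. Inmon, a leading architect in the construction of data warehouse systems, "a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process." This short but comprehensive definition presents the major features of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems. Let us take a closer look at each of these key features.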
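(1) Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales. Rather than concentrating on an organization's day-to-day operations and transaction processing, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, a data warehouse excludes data that are not useful for decision support and provides a simple, concise view of particular subjects.
(2) Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.
(3) Time-variant: Data are stored to provide information from a historical perspective (for example, the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.
(4) Nonvolatile: A data warehouse is always a physically separate store of data derived from the application data found in the operational environment. Because of this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two kinds of data access: the initial loading of data and the reading of data.
In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information an enterprise needs to make decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured and ad hoc queries, analytical reporting, and decision making.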
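"OK," you now ask, "what, then, is data warehousing?" Based on the discussion above, we view data warehousing as the process of constructing and using data warehouses. Constructing a data warehouse requires data integration, data cleaning, and data consolidation. Using a data warehouse often calls for a collection of decision support technologies. These allow "knowledge workers" (for example, managers, analysts, and executives) to use the warehouse to obtain an overview of the data quickly and conveniently, and to make sound decisions based on the information in the warehouse. Some authors use the term "data warehousing" to refer only to the process of constructing a data warehouse, and the term "warehouse DBMS" to refer to the management and use of data warehouses. We will not make this distinction here.
"How are organizations using the information from data warehouses?" Many organizations are using this information to support business decision making activities, including:
(1) increasing customer focus, which includes analyzing customer buying patterns (such as buying preferences, buying times, budget cycles, and spending habits);
(2) repositioning products and managing product portfolios by comparing sales performance by quarter, by year, and by region, in order to fine-tune production strategies;
(3) analyzing operations and looking for sources of profit;
(4) managing customer relationships, making environmental corrections, and managing the cost of corporate assets.
Data warehousing is also very useful from the standpoint of heterogeneous database integration. Many organizations collect diverse kinds of data and maintain large databases from multiple heterogeneous, autonomous, and distributed data sources. Integrating such data and providing easy and efficient access to it is highly desirable, yet challenging. The database industry and the research community have devoted much effort to this goal.
The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple heterogeneous databases. Examples include IBM's Data Joiner and Informix's DataBlade. When a query is posed to a client site, a metadata dictionary is first used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer. This query-driven approach requires complex information filtering and integration processes, and it competes for resources with processing at the local sources. It is inefficient and expensive for frequent queries, especially queries that require aggregation.
Data warehousing offers an interesting alternative to this traditional approach. Rather than using a query-driven approach, data warehousing uses an update-driven approach, in which information from multiple heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system, because data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantically consistent data store. Query processing in the data warehouse does not interfere with processing at the local sources. Moreover, data warehouses store and integrate historical information and support complex multidimensional queries. As a result, data warehousing has become very popular in industry.
1. Differences between operational database systems and data warehouses
Because most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing the two kinds of systems. The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. They cover most of an organization's day-to-day operations, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or "knowledge workers" in data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of different users. These systems are known as on-line analytical processing (OLAP) systems. The major differences between OLTP and OLAP are summarized as follows.
(1) Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.
(2) Data contents: An OLTP system manages current data that are typically too detailed to be used easily for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier to use for informed decision making.
(3) Database design: An OLTP system usually adopts an entity-relationship (ER) model and an application-oriented database design. An OLAP system typically adopts a star or snowflake model and a subject-oriented database design.
(4) View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema because of the way an organization evolves. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores. Because of their huge volume, OLAP data are also stored on multiple storage media.
(5) Access patterns: Access to an OLTP system consists mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. Access to OLAP systems, however, is mostly read-only (since most data warehouses store historical rather than current data), although many of the queries may be complex.
Other differences between OLTP and OLAP include database size, frequency of operations, and performance metrics.
2. But why have a separate data warehouse?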
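"Since operational databases store huge amounts of data," you observe, "why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?" A major reason for the separation is to promote the high performance of both systems. An operational database is designed and tuned for known tasks and workloads, such as indexing and hashing on primary keys, searching for particular records, and optimizing "canned" queries. Data warehouse queries, on the other hand, are often complex. They involve computation over large amounts of data at summarized levels and may require special data organization, access methods, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.
Moreover, an operational database supports the concurrent processing of multiple transactions, which requires concurrency control and recovery mechanisms, such as locking and logging, to ensure the consistency and robustness of transactions. An OLAP query usually needs only read-only access to data records for summarization and aggregation. Applying concurrency control and recovery mechanisms to such OLAP operations would jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.
Finally, data warehouses are separated from operational databases because the structure, content, and use of the data differ between the two kinds of systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, are usually far from sufficient for decision making. Decision support requires the consolidation (such as aggregation and summarization) of data from heterogeneous sources, resulting in high-quality, clean, and integrated data. In contrast, operational databases maintain only detailed raw data, such as transactions, which must be consolidated before analysis. Because the two systems provide quite different functionality and require different kinds of data, separate databases need to be maintained.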
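The point that operational databases do not typically maintain historical data, while decision support needs it, can be illustrated with a small sketch. It assumes a hypothetical inventory application: `OperationalInventory`, `WarehouseHistory`, the item name, and the dates are all invented for illustration and do not come from the text.

```python
from datetime import date


class OperationalInventory:
    """Hypothetical operational store: keeps only the current quantity per item."""

    def __init__(self):
        self.current = {}

    def set_quantity(self, item, qty):
        # An update overwrites the previous value, so no history is kept.
        self.current[item] = qty


class WarehouseHistory:
    """Hypothetical warehouse store: dated snapshots are loaded and then only read."""

    def __init__(self):
        self.rows = []  # (day, item, quantity) tuples, appended and never updated

    def load_snapshot(self, day, operational):
        for item, qty in sorted(operational.current.items()):
            self.rows.append((day, item, qty))

    def average_quantity(self, item):
        values = [qty for _, name, qty in self.rows if name == item]
        return sum(values) / len(values) if values else None


ops = OperationalInventory()
history = WarehouseHistory()

ops.set_quantity("corn seed", 100)
history.load_snapshot(date(2023, 4, 1), ops)
ops.set_quantity("corn seed", 60)
history.load_snapshot(date(2023, 4, 2), ops)
```

After these steps the operational store knows only the latest quantity, while the history store can still answer a question about the past, such as the average stock level over the two days.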
Data warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions. A large number of organizations have found that data warehouse systems are valuable tools in today's competitive, fast-evolving world. In the last several years, many firms have spent millions of dollars in building enterprise-wide data warehouses. Many people feel that with competition mounting in every industry, data warehousing is the latest must-have marketing weapon: a way to keep customers by learning more about their needs.
"So," you may ask, full of intrigue, "what exactly is a data warehouse?"
Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization's operational databases. Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated, historical data for analysis.
According to W. H. Inmon, a leading architect in the construction of data warehouse systems, "a data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision making process." This short but comprehensive definition presents the major features of a data warehouse. The four keywords, subject-oriented, integrated, time-variant, and nonvolatile, distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems. Let's take a closer look at each of these key features.
(1) Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
(2) Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.
(3) Time-variant: Data are stored to provide information from a historical perspective (e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.
(4) Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data.
In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information on which an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured and/or ad hoc queries, analytical reporting, and decision making.
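The "integrated" feature above, with its emphasis on consistent naming conventions, encoding structures, and attribute measures, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the two source layouts, the field names, the gender codes, and the currency units are all hypothetical and stand in for whatever real operational sources would provide.

```python
# Hypothetical rows from two operational sources that describe the same kind
# of entity with different field names, code values, and units.
crm_rows = [{"cust_id": 1, "sex": "M", "revenue_usd": 120.0}]
billing_rows = [{"CustomerID": 2, "gender": 0, "revenue_cents": 4550}]

# One agreed encoding for the warehouse (the source codes are assumptions).
GENDER_CODES = {"M": "male", "F": "female", 1: "male", 0: "female"}


def from_crm(row):
    """Map a CRM row onto the warehouse naming convention."""
    return {
        "customer_id": row["cust_id"],
        "gender": GENDER_CODES[row["sex"]],
        "revenue_usd": row["revenue_usd"],
    }


def from_billing(row):
    """Map a billing row onto the same convention, converting cents to dollars."""
    return {
        "customer_id": row["CustomerID"],
        "gender": GENDER_CODES[row["gender"]],
        "revenue_usd": row["revenue_cents"] / 100,
    }


def integrate(crm, billing):
    """Return one list of rows that share names, encodings, and units."""
    return [from_crm(r) for r in crm] + [from_billing(r) for r in billing]


integrated = integrate(crm_rows, billing_rows)
```

Every row in `integrated` uses the same keys, the same gender encoding, and the same unit for revenue, which is the kind of consistency the data cleaning and integration step is meant to guarantee.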
"OK," you now ask, "what, then, is data warehousing?" Based on the above, we view data warehousing as the process of constructing and using data warehouses. The construction of a data warehouse requires data integration, data cleaning, and data consolidation. The utilization of a data warehouse often necessitates a collection of decision support technologies. This allows "knowledge workers" (e.g., managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisions based on information in the warehouse. Some authors use the term "data warehousing" to refer only to the process of data warehouse construction, while the term warehouse DBMS is used to refer to the management and utilization of data warehouses. We will not make this distinction here.
"How are organizations using the information from data warehouses?" Many organizations are using this information to support business decision making activities, including:
(1) increasing customer focus, which includes the analysis of customer buying patterns (such as buying preference, buying time, budget cycles, and appetites for spending),
(2) repositioning products and managing product portfolios by comparing the performance of sales by quarter, by year, and by geographic regions, in order to fine-tune production strategies,
(3) analyzing operations and looking for sources of profit,
(4) managing the customer relationships, making environmental corrections, and managing the cost of corporate assets.
Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations typically collect diverse kinds of data and maintain large databases from multiple, heterogeneous, autonomous, and distributed information sources. To integrate such data and provide easy and efficient access to it is highly desirable, yet challenging. Much effort has been spent in the database industry and research community towards achieving this goal.
The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases. A variety of data joiner and data blade products belong to this category. When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set. This query-driven approach requires complex information filtering and integration processes, and competes for resources with processing at local sources. It is inefficient and potentially expensive for frequent queries, especially for queries requiring aggregations.
Data warehousing provides an interesting alternative to the traditional approach of heterogeneous database integration described above. Rather than using a query-driven approach, data warehousing employs an update-driven approach in which information from multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system since data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantic data store. Furthermore, query processing in data warehouses does not interfere with the processing at local sources. Moreover, data warehouses can store and integrate historical information and support complex multidimensional queries. As a result, data warehousing has become very popular in industry.
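The contrast between the query-driven and update-driven approaches can be made concrete with a short sketch. It uses Python's standard `sqlite3` module, with in-memory databases standing in for autonomous sources. The table layout, function names, and sample figures are assumptions made for illustration, and the "translation" step of a real mediator is reduced to sending the same SQL to every source.

```python
import sqlite3


def make_source(rows):
    """Create an in-memory stand-in for one autonomous operational source."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def query_driven_total(sources, region):
    """Query-driven: visit every source at query time and merge the answers."""
    total = 0.0
    for src in sources:
        (part,) = src.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ?",
            (region,),
        ).fetchone()
        total += part
    return total


def build_warehouse(sources):
    """Update-driven: integrate and summarize in advance into a separate store."""
    totals = {}
    for src in sources:
        for region, amount in src.execute("SELECT region, amount FROM sales"):
            totals[region] = totals.get(region, 0.0) + amount
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE sales_by_region (region TEXT PRIMARY KEY, total REAL)")
    wh.executemany("INSERT INTO sales_by_region VALUES (?, ?)", totals.items())
    return wh


def warehouse_total(wh, region):
    """Answer from the warehouse alone, without touching the sources."""
    row = wh.execute(
        "SELECT total FROM sales_by_region WHERE region = ?", (region,)
    ).fetchone()
    return row[0] if row else 0.0


sources = [
    make_source([("north", 10.0), ("south", 5.0)]),
    make_source([("north", 2.5)]),
]
warehouse = build_warehouse(sources)
```

Both paths give the same answer for data that existed when the warehouse was built. A sale recorded in a source afterwards is visible to `query_driven_total` but not to `warehouse_total` until the next load, which mirrors the remark that data warehouses do not contain the most current information.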
1. Differences between operational database systems and data warehouses
Since most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems. The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. They cover most of the day-to-day operations of an organization, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or "knowledge workers" in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of the different users. These systems are known as on-line analytical processing (OLAP) systems. The major distinguishing features between OLTP and OLAP are summarized as follows.
(1) Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.
(2) Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier for use in informed decision making.
(3) Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application-oriented database design. An OLAP system typically adopts either a star or snowflake model, and a subject-oriented database design.
(4) View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores. Because of their huge volume, OLAP data are stored on multiple storage media.
(5) Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (since most data warehouses store historical rather than up-to-date information), although many could be complex queries.
Other features which distinguish between OLTP and OLAP systems include database size, frequency of operations, performance metrics, and so on.
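The database-design and data-contents items above mention star schemas and storage at different levels of granularity. The sketch below shows what that might look like on a very small scale, again using the standard `sqlite3` module. The table names (`dim_time`, `dim_product`, `fact_sales`), the columns, and the sample rows are hypothetical and chosen only to illustrate the shape of a star schema: one fact table in the middle, with a dimension table for each subject.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, quarter INTEGER);
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_sales (
        time_id INTEGER REFERENCES dim_time(time_id),
        product_id INTEGER REFERENCES dim_product(product_id),
        amount REAL
    );
    """
)
conn.executemany("INSERT INTO dim_time VALUES (?, ?, ?)", [(1, 2023, 1), (2, 2023, 2)])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "seed A"), (2, "seed B")])
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(1, 1, 100.0), (1, 2, 50.0), (2, 1, 70.0)],
)


def sales_by_quarter(db):
    """Read-only aggregation at quarter granularity."""
    return db.execute(
        """
        SELECT t.year, t.quarter, SUM(f.amount)
        FROM fact_sales AS f JOIN dim_time AS t ON f.time_id = t.time_id
        GROUP BY t.year, t.quarter
        ORDER BY t.year, t.quarter
        """
    ).fetchall()


def sales_by_year(db):
    """The same facts rolled up to a coarser granularity."""
    return db.execute(
        """
        SELECT t.year, SUM(f.amount)
        FROM fact_sales AS f JOIN dim_time AS t ON f.time_id = t.time_id
        GROUP BY t.year
        ORDER BY t.year
        """
    ).fetchall()
```

Both functions only read and aggregate, which matches the OLAP access pattern described in item (5). An OLTP workload on the same data would instead consist of many short transactions that insert or update individual rows.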
2. But, why have a separate data warehouse?
"Since operational databases store huge amounts of data," you observe, "why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?" A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned from known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing "canned" queries. On the other hand, data warehouse queries are often complex. They involve the computation of large groups of data