




Data Warehouse

Data warehousing provides architectures and tools for business executives to systematically organize, understand, and use their data to make strategic decisions. A large number of organizations have found that data warehouse systems are valuable tools in today’s competitive, fast-evolving world. In the last several years, many firms have spent millions of dollars in building enterprise-wide data warehouses. Many people feel that with competition mounting in every industry, data warehousing is the latest must-have marketing weapon: a way to keep customers by learning more about their needs.

“So,” you may ask, full of intrigue, “what exactly is a data warehouse?”

Data warehouses have been defined in many ways, making it difficult to formulate a rigorous definition. Loosely speaking, a data warehouse refers to a database that is maintained separately from an organization’s operational databases. Data warehouse systems allow for the integration of a variety of application systems. They support information processing by providing a solid platform of consolidated, historical data for analysis.

According to W. H. Inmon, a leading architect in the construction of data warehouse systems, “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision making process.” This short but comprehensive definition presents the major features of a data warehouse. The four keywords (subject-oriented, integrated, time-variant, and nonvolatile) distinguish data warehouses from other data repository systems, such as relational database systems, transaction processing systems, and file systems. Let’s take a closer look at each of these key features.

(1) Subject-oriented: A data warehouse is organized around major subjects, such as customer, vendor, product, and sales. Rather than concentrating on the day-to-day operations and transaction processing of an organization, a data warehouse focuses on the modeling and analysis of data for decision makers. Hence, data warehouses typically provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.

(2) Integrated: A data warehouse is usually constructed by integrating multiple heterogeneous sources, such as relational databases, flat files, and on-line transaction records. Data cleaning and data integration techniques are applied to ensure consistency in naming conventions, encoding structures, attribute measures, and so on.

(3) Time-variant: Data are stored to provide information from a historical perspective (e.g., the past 5-10 years). Every key structure in the data warehouse contains, either implicitly or explicitly, an element of time.

(4) Nonvolatile: A data warehouse is always a physically separate store of data transformed from the application data found in the operational environment. Due to this separation, a data warehouse does not require transaction processing, recovery, and concurrency control mechanisms. It usually requires only two operations in data accessing: initial loading of data and access of data.

In sum, a data warehouse is a semantically consistent data store that serves as a physical implementation of a decision support data model and stores the information on which an enterprise needs to make strategic decisions. A data warehouse is also often viewed as an architecture, constructed by integrating data from multiple heterogeneous sources to support structured and/or ad hoc queries, analytical reporting, and decision making.

“OK,” you now ask, “what, then, is data warehousing?”

Based on the above, we view data warehousing as the process of constructing and using data warehouses. The construction of a data warehouse requires data integration, data cleaning, and data consolidation.
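The three construction steps named above (integration, cleaning, and consolidation) can be illustrated with a small sketch. It assumes two hypothetical sources. One is an order system that stores amounts in cents with ISO dates. The other is a flat-file export that stores dollars, lower-case customer keys, and day/month/year dates. The field names, the sample values, and the choice of customer and quarter as the summary level are all illustrative assumptions, not part of the text above.

```python
from collections import defaultdict
from datetime import date

# Hypothetical source A: order system, amounts in cents, ISO dates.
source_a = [
    {"cust_id": "C1", "amt_cents": 1250, "ts": "2023-01-15"},
    {"cust_id": "C2", "amt_cents": 800, "ts": "2023-04-02"},
]

# Hypothetical source B: flat-file export, amounts in dollars,
# lower-case customer keys, day/month/year dates.
source_b = [
    {"customer": "c1", "amount": 20.0, "date": "03/02/2023"},
    {"customer": "c3", "amount": 5.5, "date": "10/07/2023"},
]

def clean_a(row):
    # Cleaning: map source A onto one common schema
    # (upper-case customer key, amount in dollars, a real date object).
    return {
        "customer": row["cust_id"].upper(),
        "amount": row["amt_cents"] / 100,
        "date": date.fromisoformat(row["ts"]),
    }

def clean_b(row):
    # Cleaning: same target schema, different source encoding.
    day, month, year = (int(part) for part in row["date"].split("/"))
    return {
        "customer": row["customer"].upper(),
        "amount": row["amount"],
        "date": date(year, month, day),
    }

def build_warehouse(a_rows, b_rows):
    # Integration: records from both sources end up in one consistent
    # format, and each record keeps its time element (the date).
    integrated = [clean_a(r) for r in a_rows] + [clean_b(r) for r in b_rows]
    # Consolidation: summarize sales by subject (customer) and quarter.
    summary = defaultdict(float)
    for r in integrated:
        quarter = (r["date"].month - 1) // 3 + 1
        summary[(r["customer"], r["date"].year, quarter)] += r["amount"]
    return integrated, dict(summary)

integrated, summary = build_warehouse(source_a, source_b)
```

With these sample rows, `summary` is `{('C1', 2023, 1): 32.5, ('C2', 2023, 2): 8.0, ('C3', 2023, 3): 5.5}`. Customer C1 appears in both sources under different encodings and is merged into a single subject.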
The utilization of a data warehouse often necessitates a collection of decision support technologies. This allows “knowledge workers” (e.g., managers, analysts, and executives) to use the warehouse to quickly and conveniently obtain an overview of the data, and to make sound decisions based on information in the warehouse. Some authors use the term “data warehousing” to refer only to the process of data warehouse construction, while the term “warehouse DBMS” is used to refer to the management and utilization of data warehouses. We will not make this distinction here.

“How are organizations using the information from data warehouses?” Many organizations use this information to support business decision-making activities, including:

(1) Increasing customer focus, which includes the analysis of customer buying patterns (such as buying preference, buying time, budget cycles, and appetites for spending).
(2) Repositioning products and managing product portfolios by comparing the performance of sales by quarter, by year, and by geographic region, in order to fine-tune production strategies.
(3) Analyzing operations and looking for sources of profit.
(4) Managing customer relationships, making environmental corrections, and managing the cost of corporate assets.

Data warehousing is also very useful from the point of view of heterogeneous database integration. Many organizations collect diverse kinds of data and maintain large databases from multiple, heterogeneous, autonomous, and distributed information sources. Integrating such data and providing easy and efficient access to it is highly desirable, yet challenging. Much effort has been spent in the database industry and research community toward achieving this goal.

The traditional database approach to heterogeneous database integration is to build wrappers and integrators (or mediators) on top of multiple, heterogeneous databases. A variety of products, such as IBM’s DataJoiner and Informix’s DataBlade, belong to this category. When a query is posed to a client site, a metadata dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved. These queries are then mapped and sent to local query processors. The results returned from the different sites are integrated into a global answer set. This query-driven approach requires complex information filtering and integration processes, and competes for resources with processing at local sources. It is inefficient and potentially expensive for frequent queries, especially queries requiring aggregations.

Data warehousing provides an interesting alternative to this traditional approach to heterogeneous database integration. Rather than using a query-driven approach, data warehousing employs an update-driven approach in which information from multiple, heterogeneous sources is integrated in advance and stored in a warehouse for direct querying and analysis. Unlike on-line transaction processing databases, data warehouses do not contain the most current information. However, a data warehouse brings high performance to the integrated heterogeneous database system, since data are copied, preprocessed, integrated, annotated, summarized, and restructured into one semantic data store. Furthermore, query processing in data warehouses does not interfere with the processing at local sources. Moreover, data warehouses can store and integrate historical information and support complex multidimensional queries. As a result, data warehousing has become very popular in industry.

1. Differences Between Operational Database Systems and Data Warehouses

Since most people are familiar with commercial relational database systems, it is easy to understand what a data warehouse is by comparing these two kinds of systems.

The major task of on-line operational database systems is to perform on-line transaction and query processing. These systems are called on-line transaction processing (OLTP) systems. They cover most of the day-to-day operations of an organization, such as purchasing, inventory, manufacturing, banking, payroll, registration, and accounting. Data warehouse systems, on the other hand, serve users or “knowledge workers” in the role of data analysis and decision making. Such systems can organize and present data in various formats in order to accommodate the diverse needs of different users. These systems are known as on-line analytical processing (OLAP) systems.

The major distinguishing features between OLTP and OLAP are summarized as follows.

(1) Users and system orientation: An OLTP system is customer-oriented and is used for transaction and query processing by clerks, clients, and information technology professionals. An OLAP system is market-oriented and is used for data analysis by knowledge workers, including managers, executives, and analysts.

(2) Data contents: An OLTP system manages current data that, typically, are too detailed to be easily used for decision making. An OLAP system manages large amounts of historical data, provides facilities for summarization and aggregation, and stores and manages information at different levels of granularity. These features make the data easier to use in informed decision making.

(3) Database design: An OLTP system usually adopts an entity-relationship (ER) data model and an application-oriented database design. An OLAP system typically adopts either a star or snowflake model and a subject-oriented database design.

(4) View: An OLTP system focuses mainly on the current data within an enterprise or department, without referring to historical data or data in different organizations. In contrast, an OLAP system often spans multiple versions of a database schema, due to the evolutionary process of an organization. OLAP systems also deal with information that originates from different organizations, integrating information from many data stores. Because of their huge volume, OLAP data are stored on multiple storage media.

(5) Access patterns: The access patterns of an OLTP system consist mainly of short, atomic transactions. Such a system requires concurrency control and recovery mechanisms. However, accesses to OLAP systems are mostly read-only operations (since most data warehouses store historical rather than up-to-date information), although many could be complex queries.

Other features that distinguish OLTP and OLAP systems include database size, frequency of operations, and performance metrics.

2. But, Why Have a Separate Data Warehouse?

“Since operational databases store huge amounts of data,” you observe, “why not perform on-line analytical processing directly on such databases instead of spending additional time and resources to construct a separate data warehouse?”

A major reason for such a separation is to help promote the high performance of both systems. An operational database is designed and tuned for known tasks and workloads, such as indexing and hashing using primary keys, searching for particular records, and optimizing “canned” queries. On the other hand, data warehouse queries are often complex. They involve the computation of large groups of data at summarized levels, and may require the use of special data organization, access, and implementation methods based on multidimensional views. Processing OLAP queries in operational databases would substantially degrade the performance of operational tasks.

Moreover, an operational database supports the concurrent processing of multiple transactions. Concurrency control and recovery mechanisms, such as locking and logging, are required to ensure the consistency and robustness of transactions. An OLAP query often needs read-only access to data records for summarization and aggregation. Concurrency control and recovery mechanisms, if applied to such OLAP operations, may jeopardize the execution of concurrent transactions and thus substantially reduce the throughput of an OLTP system.

Finally, the separation of operational databases from data warehouses is based on the different structures, contents, and uses of the data in these two systems. Decision support requires historical data, whereas operational databases do not typically maintain historical data. In this context, the data in operational databases, though abundant, are usually far from complete for decision making. Decision support requires consolidation (such as aggregation and summarization) of data from heterogeneous sources, resulting in high-quality, cleansed, and integrated data. In contrast, operational databases contain only detailed raw data, such as transactions, which need to be consolidated before analysis. Since the two systems provide quite different functionalities and require different kinds of data, it is necessary to maintain separate databases.
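The following minimal sketch makes the separation between an operational database and a warehouse concrete. It uses Python’s built-in `sqlite3` module, with two in-memory databases standing in for the operational (OLTP) system and the warehouse. The table names, columns, sample orders, and per-quarter summary are hypothetical, chosen only for illustration. The operational side handles short, atomic inserts. The warehouse is loaded in advance (update-driven) with data consolidated to a coarser granularity, and the analytical query reads only from the warehouse.

```python
import sqlite3

# Operational (OLTP) database: current, detailed rows.
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, "
             "order_date TEXT, amount REAL)")

def place_order(conn, order_id, region, order_date, amount):
    # A short, atomic OLTP transaction: the connection's context manager
    # commits on success and rolls back if an exception is raised.
    with conn:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?, ?)",
                     (order_id, region, order_date, amount))

for row in [(1, "East", "2023-01-10", 100.0), (2, "West", "2023-02-11", 40.0),
            (3, "East", "2023-05-02", 60.0), (4, "East", "2023-05-20", 15.0)]:
    place_order(oltp, *row)

# Separate warehouse database holding consolidated data:
# total sales per region per quarter (a hypothetical summary level).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales_by_quarter (region TEXT, year INTEGER, "
                  "quarter INTEGER, total REAL)")

def refresh_warehouse(src, dst):
    # Update-driven load: aggregate once, ahead of any analytical query.
    rows = src.execute(
        "SELECT region, yr, qtr, SUM(amount) FROM ("
        "  SELECT region, CAST(substr(order_date, 1, 4) AS INTEGER) AS yr,"
        "         (CAST(substr(order_date, 6, 2) AS INTEGER) + 2) / 3 AS qtr,"
        "         amount FROM orders)"
        " GROUP BY region, yr, qtr").fetchall()
    with dst:
        dst.executemany("INSERT INTO sales_by_quarter VALUES (?, ?, ?, ?)", rows)

refresh_warehouse(oltp, warehouse)

def quarterly_sales(conn, region):
    # OLAP-style, read-only query answered entirely by the warehouse,
    # so it does not compete with transactions on the operational side.
    return conn.execute(
        "SELECT year, quarter, total FROM sales_by_quarter "
        "WHERE region = ? ORDER BY year, quarter", (region,)).fetchall()
```

With the sample orders, `quarterly_sales(warehouse, "East")` returns `[(2023, 1, 100.0), (2023, 2, 75.0)]`. Because the warehouse is refreshed only when `refresh_warehouse` runs, orders placed afterwards do not appear in it until the next load. This mirrors the point that a warehouse does not contain the most current information.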