Chapter 18: Learning From Observations

Outline
- Introduction
- Forms of learning
- Inductive learning
- Learning decision trees
- Ensemble learning
- Why learning works: computational learning theory

Introduction

The idea: an agent's percepts should be used not only for acting, but also to improve its ability to act in the future. Learning is how this idea is realized. Learning takes place as the agent observes its interactions with the world and its own decision-making processes.

The range of learning runs from trivial memorization of experiences to the creation of entire scientific theories. This chapter describes inductive learning from observations: how to learn simple theories in propositional logic, and a theoretical analysis that explains why inductive learning works.

Forms of learning

A learning agent contains two elements:
- A performance element, which decides what actions to take.
- A learning element, which modifies the performance element so that it makes better decisions.

Machine learning researchers have come up with a large variety of learning elements. Designing a learning element raises three major issues:
- Which components of the performance element are to be learned.
- What feedback is available to learn these components.
- What representation is used for the components.

Many designs are available for the performance element of an agent. Its components may include:
- A direct mapping from conditions on the current state to actions.
- A means to infer relevant properties of the world from the percept sequence.
- Information about the way the world evolves and about the results of the possible actions the agent can take.
- Utility information indicating the desirability of world states.
- Action-value information indicating the desirability of actions.
- Goals that describe classes of states whose achievement maximizes the agent's utility.

Each of these components can be learned from appropriate feedback. Examples:
- Example 1: an agent training to become a taxi driver. Hearing "Brake!", it learns a condition-action rule for when to brake.
- Example 2: an agent learning whether an image contains a bus. After seeing many camera images that it is told contain buses, it learns to recognize them.
- Example 3: an agent trying actions and observing the results. By braking hard on a wet road, it learns the effects of its actions.
- Example 4: an agent learning a utility function. Receiving no tip from passengers who have been thoroughly shaken up during the trip, it learns a useful component of its overall utility function.

The most important factor in determining the nature of the learning problem the agent faces is the type of feedback available for learning. Three cases are distinguished in the field of machine learning:

- Supervised learning: learning a function from examples of its inputs and outputs. In Examples 1 and 2, a teacher provided the correct output value for the examples; in Example 3, the output value was available directly from the agent's percepts. In fully observable environments an agent can observe the effects of its actions and hence can use supervised methods to learn to predict them; in partially observable environments the problem is harder, because the immediate effects might be invisible.
- Unsupervised learning: learning patterns in the input when no specific output values are supplied. For example, a taxi agent might gradually develop a concept of "good traffic days" and "bad traffic days" without ever being given labeled examples of each. A purely unsupervised learning agent cannot learn what to do, because it has no information about what constitutes a correct action or a desirable state.
- Reinforcement learning: more general than the other two categories. The agent learns from reinforcement, without being told what to do by a teacher; for example, the lack of a tip at the end of the journey gives the taxi agent some indication that its behavior is undesirable. Reinforcement learning typically includes the subproblem of learning how the environment works.

The representation of the learned information also plays an important role in determining how the learning algorithm must work. Examples include linear weighted polynomials for utility functions in game-playing programs, propositional and first-order logical sentences for all of the components of a logical agent, and probabilistic descriptions such as Bayesian networks for the inferential components of a decision-theoretic agent. Effective learning algorithms have been devised for all of these representations.

Another major factor in the design of learning systems is the availability of prior knowledge. The majority of learning research in AI, computer science, and psychology has studied the case in which the agent begins with no knowledge at all about what it is trying to learn, whereas most human learning takes place in the context of a good deal of background knowledge. Whatever the truth of the matter, there is no doubt that prior knowledge can help enormously in learning.

Inductive learning

In deterministic supervised learning, the learning algorithm is given the correct value of the unknown function for particular inputs and must try to recover the unknown function or something close to it. More formally, an example is a pair (x, f(x)), where x is the input and f(x) is the output of the function applied to x. Given a collection of examples of f, the task is to return a function h that approximates f; the function h is called a hypothesis.

Choosing h is the fundamental problem of induction. A good hypothesis will generalize well, but it is difficult to tell whether any particular h is a good approximation of f.

Example: fitting a function of a single variable to some data points. Figure 18.1(a) shows some data fit exactly by a straight line (a polynomial of degree 1). Figure 18.1(b) shows a high-degree polynomial that is also consistent with the same data. Figure 18.1(c) shows a second data set for which there is no consistent straight line; an exact fit requires a degree-6 polynomial (with 7 parameters). Figure 18.1(d) shows that the data in (c) can be fit exactly by a simple function of the form ax + b + c sin x.

The example shows the importance of the choice of hypothesis space. A learning problem is realizable if the hypothesis space contains the true function, and unrealizable otherwise. Sometimes the true function is not known, so we cannot tell whether a given learning problem is realizable. Two approaches address this problem: use prior knowledge to derive a hypothesis space in which we know the true function must lie, or use the largest possible hypothesis space.
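The polynomial-fitting example can be reproduced in a few lines. The sketch below is illustrative only: the data points are made up (not taken from Figure 18.1), and it simply uses numpy's polyfit to compare a degree-1 hypothesis with a degree-6 hypothesis that can interpolate the points exactly.

```python
import numpy as np

# Hypothetical data points, invented for illustration (not the data of Figure 18.1).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0.1, 1.1, 1.9, 3.2, 3.9, 5.1, 6.0])

# Hypothesis space 1: polynomials of degree 1 (straight lines).
line = np.polyfit(x, y, deg=1)

# Hypothesis space 2: polynomials of degree 6 (7 parameters), which can pass
# exactly through all 7 points but may generalize poorly to new inputs.
wiggly = np.polyfit(x, y, deg=6)

for name, coeffs in [("degree 1", line), ("degree 6", wiggly)]:
    pred = np.polyval(coeffs, x)
    print(name, "sum of squared errors on the data:", float(np.sum((pred - y) ** 2)))
```

Both hypothesis spaces contain functions that are consistent (or nearly so) with the data; preferring the simpler one when it fits is the bias that the later discussion of Ockham's razor makes explicit.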

Learning decision trees

Decision tree induction is one of the simplest learning algorithms and one of the most successful forms of learning; it is a good introduction to the area of inductive learning and is easy to implement.

As a performance element, a decision tree takes as input an object or situation described by a set of attributes (which may be discrete or continuous) and returns a "decision": the predicted output value for the input. Learning a discrete-valued function is called classification learning; learning a continuous function is called regression. Classification in this chapter is Boolean classification, in which each example is classified as true (positive) or false (negative).

A decision tree reaches its decision by performing a sequence of tests. Each internal node in the tree is a test of the value of one of the properties, and the branches from the node are labeled with the possible values of the test. Each leaf node specifies the value to be returned if that leaf is reached. The decision tree representation seems to be very natural for humans.

A simple example: deciding whether to wait for a table at a restaurant. The aim is to learn a definition for the goal predicate WillWait. The attributes are:
1. Alternate: whether there is a suitable alternative restaurant nearby.
2. Bar: whether the restaurant has a comfortable bar area to wait in.
3. Fri/Sat: true on Fridays and Saturdays.
4. Hungry: whether we are hungry.
5. Patrons: how many people are in the restaurant (values are None, Some, and Full).
6. Price: the restaurant's price range ($, $$, $$$).
7. Raining: whether it is raining outside.
8. Reservation: whether we made a reservation.
9. Type: the kind of restaurant (French, Italian, Thai, or burger).
10. WaitEstimate: the wait estimated by the host (0-10 minutes, 10-30, 30-60, or >60).

Figure 18.2 shows the decision tree usually used by one of us for this domain; note that it does not use the Price and Type attributes. Examples are processed by starting at the root and following the appropriate branch until a leaf is reached: an example with Patrons = Full and WaitEstimate = 0-10 will be classified as positive (yes, we will wait for a table).
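To make the processing of an example concrete, here is a minimal Python sketch of a decision tree as a data structure, together with a simplified fragment of a WillWait-style tree. The branch structure shown is an assumption for illustration, not a faithful copy of Figure 18.2 (the real tree tests further attributes under some of the WaitEstimate branches).

```python
class Leaf:
    def __init__(self, value):
        self.value = value            # the decision returned at this leaf

class Node:
    def __init__(self, attribute, branches):
        self.attribute = attribute    # the attribute tested at this internal node
        self.branches = branches      # maps each attribute value to a subtree

def classify(tree, example):
    """Follow branches from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[example[tree.attribute]]
    return tree.value

# Simplified fragment of a tree like the one in Figure 18.2.
tree = Node("Patrons", {
    "None": Leaf("No"),
    "Some": Leaf("Yes"),
    "Full": Node("WaitEstimate", {
        ">60":   Leaf("No"),
        "30-60": Leaf("No"),   # simplified; the real tree tests more attributes here
        "10-30": Leaf("Yes"),  # simplified
        "0-10":  Leaf("Yes"),
    }),
})

print(classify(tree, {"Patrons": "Full", "WaitEstimate": "0-10"}))  # prints Yes
```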

Expressiveness of decision trees

Any particular decision tree hypothesis for the WillWait goal predicate can be seen as an assertion of the form

    ∀s WillWait(s) ⇔ (P1(s) ∨ P2(s) ∨ ... ∨ Pn(s)),

where each condition Pi(s) is a conjunction of tests corresponding to a path from the root of the tree to a leaf with a positive outcome. The decision tree is really describing a relationship between WillWait and some logical combination of attribute values.

Decision trees cannot represent tests that refer to two or more different objects, such as

    ∃r2 Nearby(r2, r) ∧ Price(r, p) ∧ Price(r2, p2) ∧ Cheaper(p2, p),

which asks whether there is a cheaper restaurant nearby. Obviously we could add another Boolean attribute named CheaperRestaurantNearby, but it is intractable to add all such attributes.

Any Boolean function can be written as a decision tree: trivially, have each row in the truth table for the function correspond to a path in the tree. This yields an exponentially large decision tree representation, because the truth table has exponentially many rows.

Decision trees can represent many functions with much smaller trees, but they are good for some kinds of functions and bad for others. If the function is the parity function, which returns 1 if and only if an even number of its inputs are 1, then an exponentially large decision tree is needed; it is also difficult to use a decision tree to represent a majority function, which returns 1 if more than half of its inputs are 1.

Is there any kind of representation that is efficient for all kinds of functions? The answer is no. Consider the set of all Boolean functions on n attributes. How many different functions are in this set? It is just the number of different truth tables that we can write down, because a function is defined by its truth table. A truth table over n attributes has 2^n rows, so it takes 2^n bits to define a function, and there are 2^(2^n) different functions on n attributes. No matter what representation we use, almost all of these functions will require at least that many bits to represent.

2^(2^n) is a scary number. With just six Boolean attributes there are 2^(2^6) = 18,446,744,073,709,551,616 different functions to choose from. We will need some ingenious algorithms to find consistent hypotheses in such a large space.
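A quick sanity check of this count (plain arithmetic, nothing assumed beyond the reasoning above): a truth table over n Boolean attributes has 2^n rows, and each row can independently be labeled 0 or 1.

```python
# Number of distinct Boolean functions of n Boolean attributes: 2 ** (2 ** n).
for n in range(1, 7):
    rows = 2 ** n                  # rows in the truth table
    functions = 2 ** rows          # each row can be labeled 0 or 1 independently
    print(f"n = {n}: {rows} rows, {functions} functions")
# For n = 6 this prints 18446744073709551616, the number quoted above.
```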

Inducing decision trees from examples

An example for a Boolean decision tree consists of a vector of input attribute values, X, and a single Boolean output value y. A set of examples (X1, y1), ..., (X12, y12) is shown in Figure 18.3. The positive examples are the ones in which the goal WillWait is true (X1, X3, ...); the negative examples are the ones in which it is false (X2, X5, ...). The complete set of examples is called the training set.

There is a trivial way to find a decision tree that agrees with the training set: construct a tree with one path to a leaf for each example, where the path tests each attribute in turn and follows the value for that example, and the leaf holds the classification of the example. When given the same example again, such a tree comes up with the right classification.

The problem is that this tree just memorizes the observations and does not extract any pattern from the examples, so we cannot expect it to extrapolate to examples it has not seen. The solution, applying Ockham's razor, is to find instead the smallest decision tree that is consistent with the examples. Finding the smallest such tree is an intractable problem, so we settle for finding a "smallish" one.

The basic idea behind the DECISION-TREE-LEARNING algorithm is to test the most important attribute first, where the most important attribute is the one that makes the most difference to the classification of an example. If we can get the correct classification with a small number of tests, all paths in the tree will be short and the tree as a whole will be small. Figure 18.4 shows how the algorithm gets started on the 12 training examples.

Figure 18.4(a) shows that Type is a poor attribute: it leaves us with four possible outcomes, each of which has the same number of positive and negative examples. Figure 18.4(b) shows that Patrons is a fairly important attribute: if its value is None or Some, we are left with example sets for which we can answer definitively (No and Yes, respectively); if its value is Full, we are left with a mixed set of examples.

In general, after the first attribute test splits up the examples, each outcome is a new decision tree learning problem in itself, with fewer examples and one fewer attribute. There are four cases to consider for these recursive problems (a code sketch follows the list):
1. If there are some positive and some negative examples, choose the best attribute to split them.
2. If all the remaining examples are positive (or all negative), we are done: we can answer Yes or No.
3. If there are no examples left, it means that no such example has been observed, and we return a default value calculated from the majority classification at the node's parent.
4. If there are no attributes left but there are still both positive and negative examples, we have a problem.
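Here is a minimal recursive sketch of the algorithm in Python, covering the four cases above. The helper names (choose_attribute, majority_value) and the dictionary representation of examples and trees are assumptions made for this sketch, not the book's pseudocode; case 4 is resolved here by simply falling back to the majority classification.

```python
def majority_value(examples, target="WillWait"):
    """Most common classification among the examples (used as a default value)."""
    values = [e[target] for e in examples]
    return max(set(values), key=values.count)

def decision_tree_learning(examples, attributes, default, choose_attribute,
                           target="WillWait"):
    if not examples:                               # case 3: no examples left
        return default
    classes = {e[target] for e in examples}
    if len(classes) == 1:                          # case 2: all positive or all negative
        return classes.pop()
    if not attributes:                             # case 4: mixed examples, no attributes left
        return majority_value(examples, target)
    best = choose_attribute(attributes, examples)  # case 1: split on the best attribute
    tree = {best: {}}
    # Note: branches are created only for values seen in the examples; enumerating
    # all possible values of `best` would make case 3 (empty subset) reachable.
    for value in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == value]
        tree[best][value] = decision_tree_learning(
            subset, [a for a in attributes if a != best],
            majority_value(examples, target), choose_attribute, target)
    return tree
```

choose_attribute is passed in as a function; the information-theoretic way to implement it is described under "Choosing attribute tests" below.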

From the tree induced by the algorithm from these 12 examples (shown in Figure 18.6), we can see that:
- The induced tree is clearly different from the original tree of Figure 18.2, despite the fact that the data were actually generated by an agent using the original tree. The learning algorithm looks at the examples, not at the function, and its hypothesis not only agrees with all the examples but is considerably simpler than the original tree.
- The learning algorithm has no reason to include tests for Raining and Reservation, because it can classify all the examples without them.
- It has also detected an interesting and previously unsuspected pattern: the first author will wait for Thai food on weekends.

Some questions about the tree in Figure 18.6: it is bound to make some mistakes, since it has never seen, for example, a case where the wait is 0-10 minutes but the restaurant is full. For a case where Hungry is false, the tree says not to wait, but I (SR) would certainly wait. This raises an obvious question: if the algorithm induces a consistent but incorrect tree from the examples, how incorrect will the tree be?

Choosing attribute tests

A perfect attribute divides the examples into sets that are all positive or all negative. One suitable measure of an attribute's usefulness is the expected amount of information it provides, where "information" is used in the mathematical sense first defined in Shannon and Weaver (1949). To understand the notion of information, think of it as providing the answer to a question; the amount of information contained in the answer depends on one's prior knowledge: the less you know, the more information is provided.

Information theory measures information content in bits. One bit of information is enough to answer a yes/no question about which one has no idea. In general, if the possible answers vi have probabilities P(vi), then the information content I of the actual answer is

    I(P(v1), ..., P(vn)) = Σi −P(vi) log2 P(vi).

For the tossing of a fair coin we get I(1/2, 1/2) = 1 bit. If the coin is loaded to give 99% heads, we get I(1/100, 99/100) ≈ 0.08 bits, and as the probability of heads goes to 1, the information content of the actual answer goes to 0.
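A short numerical check of this formula (the function name below is my own, not from the slides):

```python
import math

def information_content(probabilities):
    """I(P(v1), ..., P(vn)) = sum over i of -P(vi) * log2 P(vi), in bits."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

print(information_content([0.5, 0.5]))     # fair coin: 1.0 bit
print(information_content([0.01, 0.99]))   # loaded coin: about 0.08 bits
```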

For decision tree learning, the question to be answered is the correct classification of a given example, and the probabilities of the possible answers can be estimated from the proportions of positive and negative examples in the training set. Suppose the training set contains p positive examples and n negative examples; then an estimate of the information contained in a correct answer is

    I(p/(p+n), n/(p+n)).

The restaurant training set in Figure 18.3 has p = n = 6, so we need 1 bit of information.

A test on a single attribute usually gives us only part of this information; we can measure exactly how much by looking at how much information we still need after the attribute test. Any attribute A with v distinct values divides the training set E into subsets E1, ..., Ev according to the examples' values for A. Each subset Ei has pi positive and ni negative examples, so along that branch we need an additional I(pi/(pi+ni), ni/(pi+ni)) bits of information to answer the question. A randomly chosen example from the training set has the ith value of the attribute with probability (pi+ni)/(p+n), so on average, after testing attribute A, the information we still need is

    Remainder(A) = Σi (pi+ni)/(p+n) · I(pi/(pi+ni), ni/(pi+ni)).

The information gain from the attribute test is the difference between the original information requirement and the new requirement:

    Gain(A) = I(p/(p+n), n/(p+n)) − Remainder(A).
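The sketch below applies these formulas to the split counts that can be read off Figure 18.4 for the 12 restaurant examples; the counts used here (Patrons: None 0+/2−, Some 4+/0−, Full 2+/4−; Type: four subsets with equal numbers of positive and negative examples) are assumed from that figure rather than recomputed from the raw table.

```python
import math

def I(p, n):
    """Information content of a set with p positive and n negative examples."""
    total = p + n
    return sum(-x / total * math.log2(x / total) for x in (p, n) if x > 0)

def remainder(subsets, p, n):
    """Expected information still needed after splitting into (pi, ni) subsets."""
    return sum((pi + ni) / (p + n) * I(pi, ni) for pi, ni in subsets)

def gain(subsets, p, n):
    return I(p, n) - remainder(subsets, p, n)

p, n = 6, 6                                   # the 12 restaurant examples
patrons = [(0, 2), (4, 0), (2, 4)]            # None, Some, Full
type_   = [(1, 1), (1, 1), (2, 2), (2, 2)]    # French, Italian, Thai, burger

print(I(p, n))                # 1.0 bit needed before any test
print(gain(patrons, p, n))    # about 0.541 bits
print(gain(type_, p, n))      # 0.0 bits: Type tells us nothing
```

On these counts Patrons has a much higher gain than Type, which is why it makes a better root test, while Type provides no information at all.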

Assessing the performance of the learning algorithm

The methodology (sketched in code below) is:
1. Collect a large set of examples.
2. Divide it into two disjoint sets: the training set and the test set.
3. Apply the learning algorithm to the training set, generating a hypothesis.
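A minimal sketch of this holdout methodology, assuming a generic learn/predict pair of functions rather than any particular algorithm from the slides:

```python
import random

def holdout_accuracy(examples, learn, predict, target="WillWait", train_fraction=0.7):
    """Train on a random subset of the examples and measure accuracy on the rest."""
    shuffled = examples[:]
    random.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    training_set, test_set = shuffled[:cut], shuffled[cut:]

    hypothesis = learn(training_set)            # step 3: induce a hypothesis
    correct = sum(predict(hypothesis, e) == e[target] for e in test_set)
    return correct / len(test_set)              # fraction of test examples classified correctly
```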

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论