Classic heuristic search algorithms can find solutions that take the form of a simple path (A*), a
tree, or an acyclic graph (AO*).
Heuristic search, called LAO*, that can find solutions with loops.
One of the most widely-used frameworks for problem-solving in artificial intelligence is state-space search. A state-space search problem is defined by a set of states, a set of actions (or operators) that map states to successor states, a start state, and a set of goal states. The objective is to find a sequence of actions that transforms the start state into a goal state, and also optimizes some measure of the quality of the solution.
Two well-known heuristic search algorithms for state-space search problems are A* and
AO*.A* finds a solution that takes the form of a sequence of actions leading in a path from the start state to a goal state. AO* finds a solution that has a conditional structure and takes the form of a tree, or more generally, an acyclic graph. But no heuristic search algorithm has been developed that can find a solution that takes the form of a cyclic graph, that is, a solution with loops. For many problems that can be formalized in the state-space search model, it does not make sense for a solution to contain loops. A loop in a solution to a theorem-proving problem represents circular reasoning. A loop in a solution to a problem-reduction task represents a failed reduction to primitive subproblems. However, there is an important class of problems for which it does make sense for a solution to contain loops. These problems can be formalized as Markov decision processes, a framework that is widely used in artificial intelligence for problems of planning and learning under uncertainty.
A Markov decision process (MDP) models problems of sequential decision making that include actions that transform a state into one of several possible successor states, with each possible state transition occurring with some probability. A solution to an MDP takes the form of a mapping from states to actions called a policy. A policy is executed by observing the current state and taking the action prescribed for it. A solution represented in this way implicitly contains both branches and loops. Branching is present because the state that stochastically results from an action determines the next action. Looping is present because the same state may be revisited under a policy. (As an example of a plan with a conditional loop, consider an action that has its desired effect with probability less than one and otherwise has no effect. An appropriate plan might be to repeat the action until it “succeeds”.)
So the question is what is Markov decision process and how it can be represented mathematically.
Markov decision processes (MDPs), named after Andrey Markov, provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying a wide range of optimization problems solved via dynamic programming and reinforcement learning. MDPs were known at least as early as the 1950s (cf. Bellman 1957). A core body of research on Markov decision processes resulted from Ronald A. Howard's book published in 1960, Dynamic Programming and Markov Processes. They are used in a wide area of disciplines, including robotics, automated control, economics, and manufacturing. More precisely, a Markov Decision Process is a discrete time stochastic control process. At each time step, the process is in some state , and the decision maker may choose any action that is available in state . The process responds at the next time step by randomly moving into a new state , and giving the decision maker a corresponding reward .The probability that the process moves into its new state is influenced by the chosen action. Specifically, it is given by the state transition function . Thus, the next state depends on the current state and the decision maker's action . But given and , it is conditionally independent of all previous states and actions; in other words, the state transitions of an MDP process satisfies the Markov property.
Markov decision processes are an extension of Markov chains; the difference is the addition of actions (allowing choice) and rewards (giving motivation). Conversely, if only one action exists for each state and all rewards are the same (e.g., zero), a Markov decision process reduces to a Markov chain An optimal policy can be found using a dynamic programming algorithm such as policy iteration or value iteration. But a disadvantage of dynamic programming is that it evaluates the entire state space. In effect, it finds a policy for every possible starting state. By contrast, heuristic search algorithms solve a problem for a particular starting state and use an admissible heuristic to focus the search, and remove from consideration regions of the state space that can’t be reached from the start state by an optimal solution. For problems with large state spaces, heuristic search has an advantage over dynamic programming because it can find an optimal solution for a start state without evaluating
the entire state space... This advantage is well-known for problems that can be solved by A* or AO*. In fact, an important theorem about the behavior of A* is that (under certain conditions) it evaluates
the minimal number of states among all algorithms that find an optimal solution using the
same heuristic . A related result has also been established for AO* . In this page, we generalize the heuristic search approach to find solutions with loops and show that the resulting algorithm, which we call LAO*, can solve planning problems that are formalized as MDPs.
Dynamic programming algorithms for MDPs and heuristic search algorithms for state-space search problems, and discusses their relationship.
Markov decision processes:
We consider an MDP with a finite set of states, S. For each state i ∈ S, let A(i) denote a finite set of actions available in that state. Let pij (a) denote the probability that taking action a in state i results in a transition to state j. Let ci(a) denote the expected immediate cost of taking action a in state i .We focus on a special class of MDPs called stochastic shortest-path problems because it generalizes traditional shortest-path problems for which artificial intelligence (AI) search algorithms have been developed. (The name “shortest-path” reflects an interpretation of action costs as arc lengths.) A stochastic shortest-path problem is an MDP with a set of terminal states, T ⊆ S. For every terminal state i ∈ T , no action can cause a transition out of this state (i.e., it is an absorbing state) and the immediate cost of any action taken in this state is zero. Formally, pii(a) = 1 and ci(a) = 0 for all i ∈ T and a ∈ A(i).For all other states, immediate costs are assumed to be positive for every action, that is, ci(a) > 0 for all states i /∈ T and actions a ∈ A(i). The objective is to reach a terminal state while incurring minimum expected cost. Thus, we can think of terminal states as goal states and we will use the terms “terminal state” and “goal state” interchangeably from now on. Because the probabilistic outcomes of actions can create a nonzero probability of revisiting the same state, the worst-case number of steps needed to reach a goal state cannot be bounded. Hence, stochastic shortest-path problems are said to have an indefinite horizon. A solution to an indefinite-horizon MDP can be represented as a stationary mapping from states to actions, π : S→A, called a policy. A policy is said to be proper if, for every state, it ensures that a goal state is reached with probability 1.0. For a proper policy, the expected cost for reaching a goal state from each state i ∈ S is finite and can be computed by solving the following system of |S| linear equations in |S| unknowns
We call f π the evaluation function of policy π.We assume that the stochastic shortest-path problems we consider have at least one proper policy, and that every improper policy has infinite expected cost for at least one state. This assumption is made by Bertsekas in developing the theory of stochastic shortest-path problems. It generalizes the assumption for deterministic shortest-path problems that there is a path to the goal state from every other state, and all cycles have positive cost.A policy π is said to dominate a policy π if f π (i) f π(i) for every state i . An optimal policy, π∗, dominates every other policy, and its evaluation function, f∗, satisfies the following system of |S| nonlinear equations in |S| unknowns, called the Bellman optimality equation:
Dynamic programming algorithms for MDPs find the evaluation function that satisfies the Bellman equation by successively improving an estimated evaluation function, f , by performing backups. For each state i , a backup takes the following form
Performing a backup for every state i in the state set is referred to as a dynamic programming update. It is the core step of dynamic-programming algorithms for solving MDPs. The fact that a backup is performed for every state in the state set is characteristic of dynamic programming, which evaluates all problem states.
There are two related dynamic programming algorithms for indefinite-horizon MDPs: policy iteration and value iteration. Policy iteration is summarized in Table 1. It interleaves the dynamic-programming update, used for policy improvement, with policy evaluation. When the current policy π is not optimal, the dynamic-programming update finds a new policy π such that f π(i) f π (i) for every state i and f π(i) < f π (i) for at least one state i . Because the number of possible policies is finite and policy iteration improves the current policy each iteration, it converges to an optimal policy after a finite number of iterations.
Heuristic search in AND/OR graphs
Like an MDP, a state-space search problem is defined by a set of states (including a start state and a set of goal states), a set of actions that cause state transitions, and a cost function that specifies costs for state transitions. The objective is to find a minimal-cost path from the start state to a goal state. It is customary to formalize a state-space search problem as a graph in which each node of the graph represents a problem state and each arc represents a state transition that results from taking an action.We use the same notation for state-space search problems as for MDPs. Let S denote the set of possible states, let s ∈ S denote a start state that corresponds to the root of the graph, and let T ⊂ S denote a set of goal (or terminal) states that occur at the leaves of the graph. Let A denote a finite set of actions and let A(i) denote the set of actions applicable in state i . The most general state-space search problem considered in the AI literature is AND/OR graph search. Following Martelli and Montanari and Nilsson , we define an AND/OR graph as a hypergraph. Instead of arcs that connect pairs of states as in an
Fig. (a) shows an OR node with two arcs leaving it, one for action a1 and one for action a2. Each arc leads to an AND node with two successor OR nodes, one for each possible successor state. (By convention, a square denotes an OR node and a circle denotes an AND node. In the terminology of decision analysis, a square corresponds to a choice node and a circle corresponds to a chance node.) (b) shows a state, indicated by a circle, with two 2-connectors leaving it, one for action a1 and one for action a2. Each 2-connector leads to two possible successor states. The representation on the right, using state nodes and k-connectors, is equivalent to the representation on the left, which uses OR and AND nodes. ordinary graph, a hypergraph has hyperarcs or k-connectors that connect a state to a set
of k successor states. Fig. above relates the concept of OR and AND nodes to the concept of a
hyperarc or k-connector. A k-connector can be interpreted in different ways. In problem-reduction search, it is interpreted as the transformation of a problem into k subproblems.We consider problems
of planning under uncertainty in which it is interpreted as an action with an uncertain outcome. The action transforms a state into one of k possible successor states, with a probability attached to each successor. We let pij (a) denote the probability that taking action a in state i results in a transition to state j . This interpretation of AND/OR graphs is made by Martelli and Montanari [14] and Pattipati and Alexandridis , among others. In AND/OR graph search, a solution takes the form of an acyclic subgraph called a solution graph, which is defined as follows:
• the start state belongs to a solution graph;
• for every nongoal state in a solution graph, exactly one outgoing k-connector (corresponding to an action) is part of the solution graph and each of its successor states also belongs to the solution graph;
• every directed path in the solution graph terminates at a goal state. A cost function assigns a cost to each k-connector. Let ci(a) denote the cost for the kconnector that corresponds to taking action a in state i . We assume each goal state has a cost of zero. A minimal-cost solution graph is found by solving the following system of equations,
where f∗(i) denotes the cost of an optimal solution for state i and f∗ is called the optimal evaluation function. Note that this is identical to the optimality equation defined for MDPs in Eq. (2), although we now assume the solution can be represented as an acyclic graph. This assumption means that a goal state can be reached from the start state after a bounded number of actions (where the bound is equal to the longest path from the start state to a goal state). By contrast, a stochastic shortest-path problem has an unbounded or indefinite horizon. Thus the traditional framework of AND/OR graph search does not directly apply to stochastic shortest-path problems.
Search Engine Optimization
Search engine optimization is often about making small modifications to parts of your website. When viewed individually, these changes might seem like incremental improvements, but when combined with other optimizations, they could have a noticeable impact on your site's user experience and performance in organic search results. You're likely already familiar with many of the topics in this guide, because they're essential ingredients for any web page, but you may not be making the most out of them.
Create unique, accurate page titles
A title tag tells both users and search engines what the topic of a particular page is. The <title> tag should be placed within the <head> tag of the HTML document (1). Ideally, you should create a
unique title for each page on your site.
(2) A user performs the query [baseball cards]. Our homepage shows up as a result, with the title listed on the first line (notice that the query terms the user searched for appear in bold).
(3) A user performs the query [rarest baseball cards]. A relevant, deeper page (its
title is unique to the content of the page) on our site appears as a result.
Page title contents are displayed in search results
If your document appears in a search results page, the contents of the title tag will usually appear in the first line of the results (if you're unfamiliar with the different parts of a Google search result,
you might want to check out the anatomy of a search result video by Google engineer Matt Cutts, and this helpful diagram of a Google search results page). Words in the title are bolded if they appear in the user's search query. This can help users recognize if the page is likely to be relevant to their search (2). The title for your homepage can list the name of your website/ business and could include other bits of important information like the physical location of the business or maybe a few of its main focuses or offerings (3).
AO* For state-space search problems that are formalized as AND/OR graphs, an optimal solution graph can be found using the heuristic search algorithm AO*. Nilsson [16,17] first described a version of AO* for searching AND/OR trees to find a solution in the form of a tree. Martelli and Montanari [13,14] generalized this algorithm for searching AND/OR graphs to find a solution in the form of an acyclic graph. Nilsson [18] also used the name AO* for this more general algorithm. Because any acyclic graph can be unfolded into an equivalent tree, the tree-search and graph-search versions of AO* solve the same class of problems. The graph-search version is more efficient when the same state can be reached along different paths because it avoids performing duplicate searches. Like other heuristic search algorithms, AO* can find an optimal solution without considering every problem state. Therefore, a graph is not usually supplied explicitly to the search algorithm. An implicit graph, G, is specified implicitly by a start state s and a successor function. The search algorithm constructs an explicit graph, G , that initially consists only of the start state. A tip or leaf state of the explicit graph is said to be terminal if it is a goal state; otherwise, it is said to be nonterminal. A nonterminal tip state can be expanded by adding to the explicit graph its outgoing k-connectors (one for each action) and any successor states not already in the explicit graph.
AO* solves a state-space search problem by gradually building a solution graph, beginning from the start state. A partial solution graph is defined similarly to a solution graph, with the difference that the tip states of a partial solution graph may be nonterminal states of the implicit AND/OR graph. A partial solution graph is defined as follows:
• the start state belongs to a partial solution graph;
• for every nontip state in a partial solution graph, exactly one outgoing k-connector (corresponding to an action) is part of the partial solution graph and each of its successor states also belongs to the partial solution graph;
• every directed path in a partial solution graph terminates at a tip state of the explicit graph. As with solution graphs, there are many possible partial solution graphs and an evaluation function can be used to rank them. The cost of a partial solution graph is defined similarly to the cost of a solution graph. The difference is that if a tip state of a partial solution graph is nonterminal, it does not have a cost that can be propagated backwards. Instead, we assume there is an admissible heuristic estimate h(i) of the minimal-cost solution graph for state i . A heuristic evaluation function h is said to be admissible if h(i) f ∗(i) for every state i .We can recursively calculate an admissible heuristic estimate f (i) of the optimal cost of any state i in the explicit graph as follows:
The best partial solution graph can be determined at any time by propagating heuristic estimates from the tip states of the explicit graph to the start state. If we mark the action AO*
1. The explicit graph G initially consists of the start state s.
2. While the best solution graph has some nonterminal tip state: (a) Expand best partial solution: Expand some nonterminal tip state n of the best partial solution graph and add any new successor states to G . For each new state i added to G by expanding n, if i is a goal state then f (i) := 0; else f (i) := h(i). (b) Update state costs and mark best actions: i. Create a set Z that contains the expanded state and all of its ancestors in the explicit graph along marked action arcs. (I.e., only include ancestor states from which the expanded state can be reached by following the current best solution.)
ii. Repeat the following steps until Z is empty. A. Remove from Z a state i such that no descendent of i in G occurs in Z. B. Set f (i) := mina∈A(i) [ci (a) + j pij (a)f (j )] and mark the best action for i. (When determining the best action resolve ties arbitrarily, but give preference to the currently marked
action.) 3. Return an optimal solution graph. that minimizes the value of each state, the best partial solution graph can be determined by starting at the root of the graph and selecting the best action for each reachable state. Table outlines the algorithm AO* for finding a minimal-cost solution graph in an acyclic AND/OR graph. It interleaves forward expansion of the best partial solution with
a cost revision step that updates estimated state costs and the best partial solution. In the simplest version of AO*, the costs of the expanded state and all of its ancestor states in the explicit graph are updated . However, Martelli and Montanari and Nilsson note that the only ancestor states that may have their costs change are those from which the expanded state can be reached by taking marked actions (i.e., by choosing the best action for each state). Thus, the parenthetical remark in step 2(b)i of Table 4 indicates that a parent i of state j is not added to Z unless both the estimated cost of state j has changed, and state j can be reached from state i by choosing the best action for state i . To simplify exposition, we have omitted from our outline of AO* a solve-labeling procedure that is often included in the algorithm to improve efficiency. Briefly, a state is labeled solved if it is a goal state or if all of its successor states are labeled solved. Labeling states as solved can improve the efficiency of the solution expansion step because it is unnecessary to search below a solved state for nonterminal tip states. We also note that the best partial solution graph may have many nonterminal tip states. AO* works correctly no matter which of these tip states is chosen for expansion. However, the efficiency of AO* can be improved by using a good selection function to choose which nonterminal tip state of the best partial solution graph to expand next. Possibilities include selecting the tip state with the least estimated cost, or selecting the tip state with greatest probability of being reached. Although AND/OR graphs are usually assumed to be acyclic, the possibility of searching cyclic AND/OR graphs has been studied. In previous work, however, the solution graph is assumed to be acyclic
tree, or an acyclic graph (AO*).
Heuristic search, called LAO*, that can find solutions with loops.
One of the most widely-used frameworks for problem-solving in artificial intelligence is state-space search. A state-space search problem is defined by a set of states, a set of actions (or operators) that map states to successor states, a start state, and a set of goal states. The objective is to find a sequence of actions that transforms the start state into a goal state, and also optimizes some measure of the quality of the solution.
Two well-known heuristic search algorithms for state-space search problems are A* and
AO*.A* finds a solution that takes the form of a sequence of actions leading in a path from the start state to a goal state. AO* finds a solution that has a conditional structure and takes the form of a tree, or more generally, an acyclic graph. But no heuristic search algorithm has been developed that can find a solution that takes the form of a cyclic graph, that is, a solution with loops. For many problems that can be formalized in the state-space search model, it does not make sense for a solution to contain loops. A loop in a solution to a theorem-proving problem represents circular reasoning. A loop in a solution to a problem-reduction task represents a failed reduction to primitive subproblems. However, there is an important class of problems for which it does make sense for a solution to contain loops. These problems can be formalized as Markov decision processes, a framework that is widely used in artificial intelligence for problems of planning and learning under uncertainty.
A Markov decision process (MDP) models problems of sequential decision making that include actions that transform a state into one of several possible successor states, with each possible state transition occurring with some probability. A solution to an MDP takes the form of a mapping from states to actions called a policy. A policy is executed by observing the current state and taking the action prescribed for it. A solution represented in this way implicitly contains both branches and loops. Branching is present because the state that stochastically results from an action determines the next action. Looping is present because the same state may be revisited under a policy. (As an example of a plan with a conditional loop, consider an action that has its desired effect with probability less than one and otherwise has no effect. An appropriate plan might be to repeat the action until it “succeeds”.)
So the question is what is Markov decision process and how it can be represented mathematically.
Markov decision processes (MDPs), named after Andrey Markov, provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of a decision maker. MDPs are useful for studying a wide range of optimization problems solved via dynamic programming and reinforcement learning. MDPs were known at least as early as the 1950s (cf. Bellman 1957). A core body of research on Markov decision processes resulted from Ronald A. Howard's book published in 1960, Dynamic Programming and Markov Processes. They are used in a wide area of disciplines, including robotics, automated control, economics, and manufacturing. More precisely, a Markov Decision Process is a discrete time stochastic control process. At each time step, the process is in some state , and the decision maker may choose any action that is available in state . The process responds at the next time step by randomly moving into a new state , and giving the decision maker a corresponding reward .The probability that the process moves into its new state is influenced by the chosen action. Specifically, it is given by the state transition function . Thus, the next state depends on the current state and the decision maker's action . But given and , it is conditionally independent of all previous states and actions; in other words, the state transitions of an MDP process satisfies the Markov property.
Markov decision processes are an extension of Markov chains; the difference is the addition of actions (allowing choice) and rewards (giving motivation). Conversely, if only one action exists for each state and all rewards are the same (e.g., zero), a Markov decision process reduces to a Markov chain An optimal policy can be found using a dynamic programming algorithm such as policy iteration or value iteration. But a disadvantage of dynamic programming is that it evaluates the entire state space. In effect, it finds a policy for every possible starting state. By contrast, heuristic search algorithms solve a problem for a particular starting state and use an admissible heuristic to focus the search, and remove from consideration regions of the state space that can’t be reached from the start state by an optimal solution. For problems with large state spaces, heuristic search has an advantage over dynamic programming because it can find an optimal solution for a start state without evaluating
the entire state space... This advantage is well-known for problems that can be solved by A* or AO*. In fact, an important theorem about the behavior of A* is that (under certain conditions) it evaluates
the minimal number of states among all algorithms that find an optimal solution using the
same heuristic . A related result has also been established for AO* . In this page, we generalize the heuristic search approach to find solutions with loops and show that the resulting algorithm, which we call LAO*, can solve planning problems that are formalized as MDPs.
Dynamic programming algorithms for MDPs and heuristic search algorithms for state-space search problems, and discusses their relationship.
Markov decision processes:
We consider an MDP with a finite set of states, S. For each state i ∈ S, let A(i) denote a finite set of actions available in that state. Let pij (a) denote the probability that taking action a in state i results in a transition to state j. Let ci(a) denote the expected immediate cost of taking action a in state i .We focus on a special class of MDPs called stochastic shortest-path problems because it generalizes traditional shortest-path problems for which artificial intelligence (AI) search algorithms have been developed. (The name “shortest-path” reflects an interpretation of action costs as arc lengths.) A stochastic shortest-path problem is an MDP with a set of terminal states, T ⊆ S. For every terminal state i ∈ T , no action can cause a transition out of this state (i.e., it is an absorbing state) and the immediate cost of any action taken in this state is zero. Formally, pii(a) = 1 and ci(a) = 0 for all i ∈ T and a ∈ A(i).For all other states, immediate costs are assumed to be positive for every action, that is, ci(a) > 0 for all states i /∈ T and actions a ∈ A(i). The objective is to reach a terminal state while incurring minimum expected cost. Thus, we can think of terminal states as goal states and we will use the terms “terminal state” and “goal state” interchangeably from now on. Because the probabilistic outcomes of actions can create a nonzero probability of revisiting the same state, the worst-case number of steps needed to reach a goal state cannot be bounded. Hence, stochastic shortest-path problems are said to have an indefinite horizon. A solution to an indefinite-horizon MDP can be represented as a stationary mapping from states to actions, π : S→A, called a policy. A policy is said to be proper if, for every state, it ensures that a goal state is reached with probability 1.0. For a proper policy, the expected cost for reaching a goal state from each state i ∈ S is finite and can be computed by solving the following system of |S| linear equations in |S| unknowns
We call f π the evaluation function of policy π.We assume that the stochastic shortest-path problems we consider have at least one proper policy, and that every improper policy has infinite expected cost for at least one state. This assumption is made by Bertsekas in developing the theory of stochastic shortest-path problems. It generalizes the assumption for deterministic shortest-path problems that there is a path to the goal state from every other state, and all cycles have positive cost.A policy π is said to dominate a policy π if f π (i) f π(i) for every state i . An optimal policy, π∗, dominates every other policy, and its evaluation function, f∗, satisfies the following system of |S| nonlinear equations in |S| unknowns, called the Bellman optimality equation:
Dynamic programming algorithms for MDPs find the evaluation function that satisfies the Bellman equation by successively improving an estimated evaluation function, f , by performing backups. For each state i , a backup takes the following form
Performing a backup for every state i in the state set is referred to as a dynamic programming update. It is the core step of dynamic-programming algorithms for solving MDPs. The fact that a backup is performed for every state in the state set is characteristic of dynamic programming, which evaluates all problem states.
There are two related dynamic programming algorithms for indefinite-horizon MDPs: policy iteration and value iteration. Policy iteration is summarized in Table 1. It interleaves the dynamic-programming update, used for policy improvement, with policy evaluation. When the current policy π is not optimal, the dynamic-programming update finds a new policy π such that f π(i) f π (i) for every state i and f π(i) < f π (i) for at least one state i . Because the number of possible policies is finite and policy iteration improves the current policy each iteration, it converges to an optimal policy after a finite number of iterations.
Heuristic search in AND/OR graphs
Like an MDP, a state-space search problem is defined by a set of states (including a start state and a set of goal states), a set of actions that cause state transitions, and a cost function that specifies costs for state transitions. The objective is to find a minimal-cost path from the start state to a goal state. It is customary to formalize a state-space search problem as a graph in which each node of the graph represents a problem state and each arc represents a state transition that results from taking an action.We use the same notation for state-space search problems as for MDPs. Let S denote the set of possible states, let s ∈ S denote a start state that corresponds to the root of the graph, and let T ⊂ S denote a set of goal (or terminal) states that occur at the leaves of the graph. Let A denote a finite set of actions and let A(i) denote the set of actions applicable in state i . The most general state-space search problem considered in the AI literature is AND/OR graph search. Following Martelli and Montanari and Nilsson , we define an AND/OR graph as a hypergraph. Instead of arcs that connect pairs of states as in an
Fig. (a) shows an OR node with two arcs leaving it, one for action a1 and one for action a2. Each arc leads to an AND node with two successor OR nodes, one for each possible successor state. (By convention, a square denotes an OR node and a circle denotes an AND node. In the terminology of decision analysis, a square corresponds to a choice node and a circle corresponds to a chance node.) (b) shows a state, indicated by a circle, with two 2-connectors leaving it, one for action a1 and one for action a2. Each 2-connector leads to two possible successor states. The representation on the right, using state nodes and k-connectors, is equivalent to the representation on the left, which uses OR and AND nodes. ordinary graph, a hypergraph has hyperarcs or k-connectors that connect a state to a set
of k successor states. Fig. above relates the concept of OR and AND nodes to the concept of a
hyperarc or k-connector. A k-connector can be interpreted in different ways. In problem-reduction search, it is interpreted as the transformation of a problem into k subproblems.We consider problems
of planning under uncertainty in which it is interpreted as an action with an uncertain outcome. The action transforms a state into one of k possible successor states, with a probability attached to each successor. We let pij (a) denote the probability that taking action a in state i results in a transition to state j . This interpretation of AND/OR graphs is made by Martelli and Montanari [14] and Pattipati and Alexandridis , among others. In AND/OR graph search, a solution takes the form of an acyclic subgraph called a solution graph, which is defined as follows:
• the start state belongs to a solution graph;
• for every nongoal state in a solution graph, exactly one outgoing k-connector (corresponding to an action) is part of the solution graph and each of its successor states also belongs to the solution graph;
• every directed path in the solution graph terminates at a goal state. A cost function assigns a cost to each k-connector. Let ci(a) denote the cost for the kconnector that corresponds to taking action a in state i . We assume each goal state has a cost of zero. A minimal-cost solution graph is found by solving the following system of equations,
where f∗(i) denotes the cost of an optimal solution for state i and f∗ is called the optimal evaluation function. Note that this is identical to the optimality equation defined for MDPs in Eq. (2), although we now assume the solution can be represented as an acyclic graph. This assumption means that a goal state can be reached from the start state after a bounded number of actions (where the bound is equal to the longest path from the start state to a goal state). By contrast, a stochastic shortest-path problem has an unbounded or indefinite horizon. Thus the traditional framework of AND/OR graph search does not directly apply to stochastic shortest-path problems.
Search Engine Optimization
Search engine optimization is often about making small modifications to parts of your website. When viewed individually, these changes might seem like incremental improvements, but when combined with other optimizations, they could have a noticeable impact on your site's user experience and performance in organic search results. You're likely already familiar with many of the topics in this guide, because they're essential ingredients for any web page, but you may not be making the most out of them.
Create unique, accurate page titles
A title tag tells both users and search engines what the topic of a particular page is. The <title> tag should be placed within the <head> tag of the HTML document (1). Ideally, you should create a
unique title for each page on your site.
(2) A user performs the query [baseball cards]. Our homepage shows up as a result, with the title listed on the first line (notice that the query terms the user searched for appear in bold).
(3) A user performs the query [rarest baseball cards]. A relevant, deeper page (its
title is unique to the content of the page) on our site appears as a result.
Page title contents are displayed in search results
If your document appears in a search results page, the contents of the title tag will usually appear in the first line of the results (if you're unfamiliar with the different parts of a Google search result,
you might want to check out the anatomy of a search result video by Google engineer Matt Cutts, and this helpful diagram of a Google search results page). Words in the title are bolded if they appear in the user's search query. This can help users recognize if the page is likely to be relevant to their search (2). The title for your homepage can list the name of your website/ business and could include other bits of important information like the physical location of the business or maybe a few of its main focuses or offerings (3).
AO* For state-space search problems that are formalized as AND/OR graphs, an optimal solution graph can be found using the heuristic search algorithm AO*. Nilsson [16,17] first described a version of AO* for searching AND/OR trees to find a solution in the form of a tree. Martelli and Montanari [13,14] generalized this algorithm for searching AND/OR graphs to find a solution in the form of an acyclic graph. Nilsson [18] also used the name AO* for this more general algorithm. Because any acyclic graph can be unfolded into an equivalent tree, the tree-search and graph-search versions of AO* solve the same class of problems. The graph-search version is more efficient when the same state can be reached along different paths because it avoids performing duplicate searches. Like other heuristic search algorithms, AO* can find an optimal solution without considering every problem state. Therefore, a graph is not usually supplied explicitly to the search algorithm. An implicit graph, G, is specified implicitly by a start state s and a successor function. The search algorithm constructs an explicit graph, G , that initially consists only of the start state. A tip or leaf state of the explicit graph is said to be terminal if it is a goal state; otherwise, it is said to be nonterminal. A nonterminal tip state can be expanded by adding to the explicit graph its outgoing k-connectors (one for each action) and any successor states not already in the explicit graph.
AO* solves a state-space search problem by gradually building a solution graph, beginning from the start state. A partial solution graph is defined similarly to a solution graph, with the difference that the tip states of a partial solution graph may be nonterminal states of the implicit AND/OR graph. A partial solution graph is defined as follows:
• the start state belongs to a partial solution graph;
• for every nontip state in a partial solution graph, exactly one outgoing k-connector (corresponding to an action) is part of the partial solution graph and each of its successor states also belongs to the partial solution graph;
• every directed path in a partial solution graph terminates at a tip state of the explicit graph. As with solution graphs, there are many possible partial solution graphs and an evaluation function can be used to rank them. The cost of a partial solution graph is defined similarly to the cost of a solution graph. The difference is that if a tip state of a partial solution graph is nonterminal, it does not have a cost that can be propagated backwards. Instead, we assume there is an admissible heuristic estimate h(i) of the minimal-cost solution graph for state i . A heuristic evaluation function h is said to be admissible if h(i) f ∗(i) for every state i .We can recursively calculate an admissible heuristic estimate f (i) of the optimal cost of any state i in the explicit graph as follows:
The best partial solution graph can be determined at any time by propagating heuristic estimates from the tip states of the explicit graph to the start state. If we mark the action AO*
1. The explicit graph G initially consists of the start state s.
2. While the best solution graph has some nonterminal tip state: (a) Expand best partial solution: Expand some nonterminal tip state n of the best partial solution graph and add any new successor states to G . For each new state i added to G by expanding n, if i is a goal state then f (i) := 0; else f (i) := h(i). (b) Update state costs and mark best actions: i. Create a set Z that contains the expanded state and all of its ancestors in the explicit graph along marked action arcs. (I.e., only include ancestor states from which the expanded state can be reached by following the current best solution.)
ii. Repeat the following steps until Z is empty. A. Remove from Z a state i such that no descendent of i in G occurs in Z. B. Set f (i) := mina∈A(i) [ci (a) + j pij (a)f (j )] and mark the best action for i. (When determining the best action resolve ties arbitrarily, but give preference to the currently marked
action.) 3. Return an optimal solution graph. that minimizes the value of each state, the best partial solution graph can be determined by starting at the root of the graph and selecting the best action for each reachable state. Table outlines the algorithm AO* for finding a minimal-cost solution graph in an acyclic AND/OR graph. It interleaves forward expansion of the best partial solution with
a cost revision step that updates estimated state costs and the best partial solution. In the simplest version of AO*, the costs of the expanded state and all of its ancestor states in the explicit graph are updated . However, Martelli and Montanari and Nilsson note that the only ancestor states that may have their costs change are those from which the expanded state can be reached by taking marked actions (i.e., by choosing the best action for each state). Thus, the parenthetical remark in step 2(b)i of Table 4 indicates that a parent i of state j is not added to Z unless both the estimated cost of state j has changed, and state j can be reached from state i by choosing the best action for state i . To simplify exposition, we have omitted from our outline of AO* a solve-labeling procedure that is often included in the algorithm to improve efficiency. Briefly, a state is labeled solved if it is a goal state or if all of its successor states are labeled solved. Labeling states as solved can improve the efficiency of the solution expansion step because it is unnecessary to search below a solved state for nonterminal tip states. We also note that the best partial solution graph may have many nonterminal tip states. AO* works correctly no matter which of these tip states is chosen for expansion. However, the efficiency of AO* can be improved by using a good selection function to choose which nonterminal tip state of the best partial solution graph to expand next. Possibilities include selecting the tip state with the least estimated cost, or selecting the tip state with greatest probability of being reached. Although AND/OR graphs are usually assumed to be acyclic, the possibility of searching cyclic AND/OR graphs has been studied. In previous work, however, the solution graph is assumed to be acyclic