WikiJournal Preprints/Cognitive architecture for storing domain-specific knowledge in a hierarchical task network

Game-engine which stores Domain specific knowledge
One of the biggest challenges in Artificial Intelligence is to store domain specific knowledge in a machine readable form. In the literature, many concepts are discussed for example ontologies or semantic networks. Sure, they are machine-readable, in a form that the ontology is a owl-file, which can be parsed automatically, the problem is that the practical usage of hierarchical data are not very high. Because, how exactly can a mind map be used for controlling a robot-agent? Right, it is unsolved yet.

A more elaborated form of knowledge storage is to accept only a symbolic game-engine as a knowledge container. A game-engine is per definition a piece of software which can predict the future. For example, if we are sending the command “move-left” to the game-engine, the system will update the internal state of the player, e.g. with “newpos = {x-10,y}”. From a formal perspective a game-engine is programmed in computer code, for example in C. But from a usage scenario a game-engine can be utilized as a prediction machine. That means, for any game-engine the game-tree can be generated, and this is the pathway to Artificial Intelligence. If we can sample a system with a brute-force solver it is possible to solve the game, that means to generate a plan which fulfills certain conditions.

Let us investigate how modern sampling planner are working. RRT and other graph-based sampler are using an existing game-engine (called action model) for testing out random moves. They can do so, because the game-engine contains a list of allowed moves, and the game-engine calculates what the follow state will be. Or to make the point more clear. If we do not have a game-engine, we can not generate the game-tree and we can not generate the optimal path through it.

Machine readable knowledge is equal to construct a game-engine. That is the only option we have. We can discuss in detail how the game engine can be programmed for example in the Game description language, in PDDL, in plain C or in object-oriented C++, but in any case we need some kind of prediction ready game-engine. Such a system stores the knowledge about the system. Knowledge in a form, that we define which motion primitives are possible and what they will change, after executing them.

Grounding
In most papers about cognitive architectures the term grounding is introduced for connecting the agent actions with the real world. The idea behind grounding is, that the agent has a synthetic memory (episodic memory) in which he is able to describe the world. This description is not totally wrong, but it ignores the engineering aspect for generating such an “episodic memory”. From the game-programming / C++ perspective there is no software module available called episodic memory or grounded perception. The only tool what software-engineers can implement is a game-engine. The engine has the same obligation, to map actions, perception and environment. But the term has nothing to do with psychology but with game-development.

Let us go back to the example with the “move-left” command. If we want to ground the action, we need a new C-function in the form “if (input==”move-left”) do something”. Do something means, to change an internal variable for example “newpos = {x-10,y}”. The action is grounded because we have connected a natural language input “move-left” with C-function in the source code. I would guess, that it is enough to program a layered game-engine; a different kind of grounding module is not needed.

The term “game engine” is a bit uncommon for classical Artificial Intelligence research, because it has nothing to do with psychology science or neural networks in real animals. Also, it is not a mathematical model, so there is the suspicion that a “Game engine” is not scientific at all. And yes, the term is used by engineers not by Academia. They want to implement executable code on real computers. They are not interested in philosophical debates about thinking machines or perception in the human brain.

Game Engine
A game engine is usually used for rendering the graphic and evaluate the user input. The same technique can be used in the context of AI. An early example is the GOAP paradigm which is grouped around a game-engine. The game-engine evaluates a plan and gives a score back. A more sophisticated example are HTN-planners which have as a core a pddl file which acts similar to a game engine. But it is even possible to model story based games with an engine. This is used by text adventures and the engine can be used similar to a HTN-planner: for figuring out what the best plan through the game is.

Knowledge engineering with natural language
About AI planning itself, there is a huge amount of literature available. From the early beginning with the STRIPS planning system in the 1970s, over later developed pddl like HTN-planners upto modern game-oriented GOAP planning goes the history. A sub problem in AI planning is until now open: how does the action model look like for a certain domain? For example, we want to play Robocup and need the domain model in a pddl description.

How the path may be look like to develop machine readable knowledge from scratch is discussed under the the term GIVE challenge (“Generating Instructions in Virtual Environments”). That is not directly an algorithm, but a programming challenge which is grouped around ready-to-run domain knowledge. For all, who are not aware with the original challenge, here the idea in short. The aim is not to guide a robot through a maze but a human. The software generates the output in natural language and the human must act. From the AI history the setup is called decision support system, because the computer only give help to the human but not decides for him. The advantage from the programmer perspective is, that such a system is easier to implement then a full-blown robot control system. A second interesting part is that the output is in natural language.

How the challenge can be solved is not very complicated. At first, a pddl domain description is used, then the current situation of the game is read, and a planner is used for producing the next step for the human. The planner can be based on STRIPS, PDDL, HTN or GOAP. He generates the plan, for example “open door”, “walk left”, “push button”. How the human is executing the plan is on his side. The feedback for the AI is only the situation in the game itself. That means, the software has the tasks:


 * track the actions
 * plan future actions
 * with the help of an internal model

The GIVE challenge doesn't only generate natural language, but it is a subgoal generator. The idea is not to control a robot with low-level commands, instead the goal is to planning part-goals, for example “open door” and similar actions.

Agent oriented programming
Object-oriented programming is the classical paradigm: a method contains commands in a linear fashion. The actions are executed from top to bottom. If another program flow is needed, the programmer has to modify the source code. In contrast: agent-oriented programs are using a layer who is controlling the program flow. The layer is called blackboard and is some kind of note-area to rearranging commands. For example, the first draft contains the following commands:

print, if-then, print

After a modification by an agent, the new program flow is maybe:

if-then, print, goto

In early years this technique was called self-modifying programs, because in the software is another software which is created from scratch at run-time, called the agent-plan. Agent-oriented programming is not using a fixed program flow, created by the programmer, instead the Artificial Intelligence creates the program flow for the need of a problem. If the plan is ready, it will be executed by the interpreter. The amount of modules and subroutines to realize such behavior is called an agent-framework. It is based on object-oriented programming but has new features.

Understanding SOAR and ACT-R
It is widely known, that cognitive architectures are simulating the human process of thinking. But this is a bad introduction for explaining the principle to beginners who are not familiar with Artificial Intelligence. The better approach is to understand a cognitive architecture as a "hierarchical task network". This term is used in the gaming industry in the domain of GOAP (Goal oriented action planning) and means not to create a behavior tree, but to implement a symbolic game engine. The game engine is used to testing out plans and scoring them.

SOAR is an advanced form of a HTN-network. It is basically a hierarchical planner, on the lowest level there are actions, and above the actions are tasks. They are planned hierarchical. Additionally, SOAR has some agent-like features, for example a semantic memory and a blackboard. This is used for evaluating plans.

It is true, that SOAR was programmed in LISP, but LISP is not the best programming language for implement an AI system. It makes sense to start with a mainstream language, for example C/C++. It is a mainstream language, because C/C++ can be used for any kind for software development tasks, from hardware drivers, over desktop-application up to Artificial intelligence. Instead, LISP is only used for AI problems and compiler-compiler which makes it's harder to understand. The bottom up approach for teaching cognitive architecture is first to implement a simple HTN-planner in C++, and extends this later into a cognitive architecture which has agents and deep.-learning features.

Let us explain the core idea behind HTN-planning, because this results into Artificial Intelligence. On the lowest level, an AI is a planner. An easy to understand example is a path-planner like A* and RRT. At first, random movements are created, and each of the nodes is evaluated with a score. The node with the highest score is executed by the AI. Such a core-feature can be written down in two simple lines of code:

for trial calculate score for node

The algorithm runs in a for loop and on each iteration he is evaluating the score. Sure, this simple pseudo-code isn't able to control any robot, but it makes it easier to understand what AI is. SOAR, HTN-planners and agent-architecture are about the details of how to implement such a for-loop. The reason, why the term cognitive architecture is used has to do with the importance of natural language for domain modeling. Classical path-planning works not with language, but with numbers. For example, with the distance from point A to B. The problem is, that with numbers alone is not possible to implement complex domain models. And without a domain model the planner can not search for a node.

The bottleneck in HTN-planning as well in SOAR is to create complex domain models for games. A domain model is usually a mini-game which has possible actions and an outcome after executing it. All agent architecture have such mini-games in the sleeve. In the literature there are some examples given of how to store a domain model: PDDL, Game description language or ontologies. The holy grail of AI is to generate a domain model on-the-fly. That means, the agent plays a new game, creates the domain model in GDL, and this is used by SOAR to plan the next actions.

AGI is a testbed for Narrow AI
A look backwards in the history of AI is often done with the explanation what the difference between Artificial General Intelligence and Narrow AI is. But perhaps the difference is only artificial? Perhaps both are the same? Let us investigate how the robocup robotics challenge work. It contains of two types of software. At first a testbed, which is able to evaluate which of the teams has won, and secondly the agent itself, which is able to play the game. This distinction makes also sense for other domains, for example chess playing, Micromouse and a line following robot. In any case, two types of software is used: one which has the function to act like a turing-test, and the second which is the core AI.

Without any doubt, the second part (the AI) is called in the literature a narrow AI. For example, if somebody has programmed in C++ and with ABL a robocup AI who is able to kick the ball, then this agent can be called intelligent, or at least it es equal to Narrow AI. That means, the software is able to solve a certain problem. On the other hand, the problem which is equal to play robocup has to be monitored by a different type of software. It is uncommon in the literature, to call this part of the overall setup an Artificial General Intelligence, but it would make sense.

From a programming point of view, the evaluation software is called a testbed. It is software which was programmed to run other software inside. For example, the Robocode challenge contains the robocode testbed itself (which was programmed in Java) and the agents which can be executed inside Robocode. The same is true for the famous Mario AI challenge. It contains of the Super Mario Game itself, and possible agents who can be run inside the environment.

The hypothesis is, that early so called Artificial General Intelligence projects like SOAR and the General problem solver, were never invented as a Narrow AI. That means, SOAR is not able to control a robot or play a game. Instead, SOAR is a testbed and can be used in a robotics challenge in which the teams have the task to program the narrow AI.

Robo-soar
For the software “robo-soar” in a paper was described, that this tool acts mainly as a testbed. Soar is explained as a integrated AI framework, and on page 5 the general setup is shown. It is mainly a robotarm, which can pick&place objects and also a human-operator is in the loop. The interesting information is, that SOAR is not the agent for controlling the robotarm (a so called weak artificial intelligence), but SOAR is the overall framework. The modern description would be, that it is the software to monitor a robotics challenge. A program which can say, which participant in the competition has written the best agent-software.

But there is in the above cited paper something else which is interesting, in the 17 pages long paper there is no heuristics or sourcecode given how to solve the problem. The problem was to pick&place objects with the robotarm. A possible solution would be to program an agent in C or LISP which is planning some actions. But, this strategy is not part of the paper. The paper is only about the competition itself, that means it describes in detail what the task is, not how to program the ai-agent.

The same approach is visible in a more recent paper, which was published by the same author in year 2000. The first sentence in the abstract explains very well what the aim is:

“We are creating an environment in which to investigate the role of advanced AI in computer games. [...] In this paper we describe our test bed for pursuing research in developing human-level AI characters within computer games.”

The goal is not to program an AI which is able to play the game autonomously, instead the aim is to program a robocode like environment and to delegate the programming of the AI to participants of the competition.

GOAP Planning
If HTN-planning is the academic term, GOAP is the realization in a computer game. Goal and action planning means usually a hierarchical planning system for an Artificial Intelligence. But what exactly is planning? Geometric planning is the search for a path in a maze around the obstacles, while HTN-planning has to do with a trajectory in a symbolic maze. This maze can't be visualized with x/y coordinates but can be described with tasks and subtasks. If the action gets executed the agent is doing something in a computergame, for example he walks to the door and opens it.

What a HTN-planner is doing is to describe the domain with symbolic constraints. They are similar to obstacles in a 2d space, but they are defined with abstract descriptions for example the goal “search for the key” is an obstacle. It is something which can be fulfilled or not and which is used by the planner to generate actions. Because of the abstract nature of HTN-planners i would call these system natural language evaluation systems. Because the action space is defined by English vocabulary and the plan is also a sequence of action-words.

What the programmer of a GOAP system is doing, is to formulate the domain with abstract obstacle. He is building a map in which the planner is searching for a trajectory. A trajectory means usually a plan which is a sequence of actions. A HTN-planner is able to evaluate a plan, he can score a given plan and search for a better one.

Inner working of a sampling based HTN-planner
What a HTN-planning system is doing on a theoretical level is well known. It plans something according to an domain model. But how can we implement this into software? The first step is to define a hierarchy of actions. There are some textual commands which can be send to the layered simulator:


 * layer0: keyboard left, keyboard right, keyboard up, keyboard  down
 * layer1: walk left, jump over box, jump against block
 * layer2: walk to point a, walk to princess, get key

According to the names, this is an example of a side scrolling game in which the player has some problems to solve. On the lowest level the game can be controlled with the arrow keys only, additional there are more layers which are expressing the semantics of a move. The game can be played on all the layers. Either the player can use the arrow keys, or he can enter high-level commands which brings the game-engine directly to the goal state. For example if he enters the command “get the key”, he has without further problems simply the key in his inventory. This bypass all the details which are usually be necessary to get the key, it is some kind of trainer mode.

The planner works in a way, that random actions are send against all layers of the game-engine to see what will happen. If only the lowest layer is selected it will take a long time because the state space is high. If the planner is using all layers, the game will solved much faster. I want to a give a example. According to the random generator the following actions are selected:

“get key”, “jump over box”, “keyboard left”, “walk left”

That means, we have run the random generator 4 times and these are the random commands. How can we execute the commands against the game engine? Execution of the lowlevel command on layer0 “keyboard left” is easy, we are sending simply the command to the game engine and see what will happen. It is a normal search in the game tree. Executing a mid-level action like “walk left” is a bit harder. According to the table, this command is from layer1, that means it can't be executed directly. The original game has no command with such a name. Instead the command is a subgoal and the planner is forced to solve on the level below. That means, he is sending keyboard commands to the engine, until the player “walks left”.

In general the inner working is equal to a HTN-planning system. We have lowlevel and high-level actions and must generate partial order plans to satisfy certain conditions. Only commands on the lowest layer can be executed directly, the other commands are only symbolic goals, which guides the search procedure.

Usually HTN-planning system are implemented in agents. The agent has an abstract plan and a blackboard and he generates for the plan the lowlevel actions to execute them against the game engine. If some plans are failing during runtime, the agent detects the problem and replans.

BDI agent
If a HTN-planner is able to solve a game, what is the purpose of an BDI agent? The idea is to layer the overall system into submodules: the game itself, ontop a hierarchical simulator, then a HTN-planner and on top of all the BDI-agent. The BDI agent is only the upper way to communicate with the overall system. It is implemented in a C++ class, which has commands like “start-agent”, “show plan”. The class also have a solver integrated, which is based on the module defined on a lower levels. The solver for example is using a HTN-planner for generating a new plan. But additionally to planning, the BDI agent has some other features, for example to search for new goals.

Sometimes a BDI-agent framework is associated with a concrete programming language for example Agentspeak. But it is not a piece of software, it is a software-engineering technique like object-oriented programming. It is a certain way to structure the sourcecode. That means, a BDI agent can be programmed in normal C++ without the need of inventing a new programming language for Artificial Intelligence. In the easiest form a BDI-agent is a class inside of a robot-control-system which has high-level-methods like “start”, “Stop”, “ask for a goal”, “plan”. Usually, it is interacting directly with the GUI. That means, the BDI agent is a high-level description of the Artificial Intelligence. The user starts the program code with “./a.out”, and then he sees a GUI, and in the gui he can enter the goals and sees the plans of the agent to reach the goals.

A BDI-agent is doing the same like a HTN-planner or a layered game-engine. The idea is to play a game autonomously. The difference is, that such a piece of software is very complex and one of the submodule is called BDI-agent.

Programming a HTN-planner
The most demanding task in HTN-planning is not the solver itself. This is in most cases a simple brute-force search algorithm which takes the hierarchical domain model and searches in the graph. No, the bottleneck is to create the hierarchical task network, which is equal to a domain model. Let us go into the details to recognize potential pitfalls. Suppose, we have a game like Mario AI. The interaction works in a way, that the user is pressing the left/right arrow and navigates Mario through the level. So every game has to do with pressing keys: left and right.

The sad news is, that this description is correct but there is missing some important detail. Mario AI is more then only pressing left and right, Mario AI has to do with jumping over a wall, finding the princess and avoid the enemies. Describing a game as a hierarchical task network has to do with finding out the inner working of the game, known as the semantic description. This inner working differs from game to game: a car racing game has a different meaning then a real time strategy game. A HTN-network is about the inner working, it divides the overall control task into modules.

The general idea is to transform a monolithic game into smaller subgames which can be solved on an abstract level. The aim is to increase the interaction level into the direction of natural language. What the user of Mario AI want's is no longer pressing left/right keys, he want's to enter “jump over wall”, “avoid enemy”, “jump with trajectory #2”. The only problem is, that these kinds of commands are not available in the original game, they have to be invented from scratch. It is like designing a new game, which wasn't there before.

So the logical question is: how does the new abstract game will look like? This depends heavily from the domain. The subgames of a racing game have to do with car physics, racing and trajectories for curve ride. The subgames in a RTS-game have to do with build order, long term strategy and position of units on the map. The more general term for describing the subgames is “Domain knowledge”. Domain knowledge is something which is important to win a game and this has to be transferred into an abstract subgame inside the original game.

There is an important reason why domain knowledge has to be formulated as a subgame. Because this is the only way to evaluate the knowledge and search for a plan. A newly invented game engine is the perfect choice for predicting future states of the game, and this is the fundamental idea behind a HTN-planner. What a htn-planner is doing, is using one of the actions from a layer and testing out a move, then he scores the result. That means, the HTN-planner is utilizing a subgame for predicting the future.

The usual idea in the literature is to find the subgames automatically by genetic programming or neural networks. This attempts fails. A game engine for real purposes has to be created by hand, that means inside a software-engineering process. Domain modeling is nothing which can be done automatically, it is mainly a task for the programmer. He has to define, how the HTN-network will look like, what possible high-level-commands are there and what the scoring value of an outcome is. The programming task is similar to programming any other game. Usually the programmer has to implement the game rules in many edit-compile-run iterations until the game works. Or to make the point clear: if manual C++ programming in an IDE together with git and some question on stackoverflow are the only option to program a Mario game from scratch, then the more demanding task of programming a subgame inside Mario AI for predicting the jumping behavior is also a manual task. The infrastructure for building a HTN-network can be described with:


 * object oriented programming language like C++
 * a bugtracker for note down the issues
 * an online forum for asking other programmer for help
 * some examples from the past written down in sourcecode and  academic papers
 * a git-like version control system to modify the sourcecode

I want to elaborate the overall process for developing subgames for a HTN-planner. What the programmer usually has, is the monolithic game and a walkthrough tutorial in natural language. What he doesn't have is machine readable domain knowledge. The task is to convert the instruction manual written in English into subgames which can be run by the HTN-planner for solving the game. The result will look like a layered physics engine written from scratch. It works from the outside view like this:

action(layer,command), e.g. action(layer2, “jump over wall”)

How this physics engine works internally is up to the programmer. The general idea behind abstract subgames is to provide subgoals and scoring rewards. These are used by the HTN-solver for figuring out the best move. Or to be more specific: the monolithic game is split into smaller, layered subgames which can each solved in a small amount of time.

Simulating a plan
Hierarchical task networks were invented in the year 1975 under the term “procedural net / NOAH”. According to the original paper, the general idea is to create a hierarchical plan and simulate it with a procedural net. Let us make a short example. At first we need a hierarchical plan, this plan is executed in a simulator. The simulator gives feedback for the plan. Is the plan valid? How long does it take? What is the overall score?

go to room B  go forward (physics engine #1) leg1 down leg2 left leg1 up  take the key (physics engine #2) open door (physics engine #3) go forward (physics engine #1) leg1 up    leg2 up     leg1 down

Running a hierarchical plan is straightforward compared to running a simple plan in a simulator. Because the procedural net is not only a single physics engine, but each node is it's own physics engine. In the figure the plan and the matching physics engine are given as an example. What the overall HTN-planner is doing is to take a subplan, recognizing which physics engine is the right one, and testing the plan against the engine. Finding the right plan is surprising simple. In the easiest implementation, random plans are generated, tested against the procedural net and the aim is to identify the plan with the highest score.

Hierarchical Backtracking: a less known problem solving technique
With the advent of behavior trees for ingame-AI most software developers are aware of how to build an artificial agent. The idea is, that a designer specify the behavior of the bot and after the script is created it will get executed like a normal computer program. But scripting-AI is only the beginning, there are more sophisticated AI techniques available like HTN-planning and especially HTN-backtracking. The difference is, that they are not recognized broadly and most programmers doesn't know how to implement such strategies.

The basic idea behind HTN planning is, not to specify a script but program a symbolic game engine which is able to predict the future. Such a game-engine is equal to a textadventure, it is some kind of software modul which takes a command like “open door” and reacts to the command. One possible option to realize the engine is the PDDL programming language, but it is also possible to use normal C++ or Python code to program a textadventure. And now the things become complicated. A game-engine itself is not enough, what HTN-planning is about is a hierarchy of game-engines. On the lower level the agent moves around, on the second level he gets strategy commands like “move to point A” and on the high-level he gets very general commands like “search for the door”.

Suppose we have implemented a layered textadventure, then the next step is to use a technique called backtracking. Backtracking is usually known for game-tree search in computerchess. It means, to try out different paths in the tree. But, computerchess contains only of one tree, no hierarchy is used. If a multi-layer-gameengine is available it is available to backtrack over many layers, This saves lot of cpu-time. It means, that the solver is used for each game-engine separately. Let us make a simple example.

The plan which was created is:

search for a door --move to point A    --open door

It is not a flat plan, but a hierarchical one. There are three layers available, and each layer is a separate game. Suppose in one of the layer some action is blocked, because of external reasons. Then the planer can search for a bypass. He uses the game-engine to predict future states and searches for a subplan which fulfills the conditions. The most important feature is, that the solver is really fast. Hierarchical planning means to leave huge parts of the overall plan unchanged. It is possible to replan only a subpart of the plan.

In most cases, the lowest layer of the HTN-planner is connected to a physics engine. Running a physics engine is very CPU-intensive. So the goal is to reduce the number of iterations. For example, if it is known, that the command “open door” will result into a certain situation it is not necessary to start the physics engine on each planning step to check if the agent really is able to open the door. Instead the result can be assumed as successful and this situation is used by the planners in a higher hierarchy. Now, let us suppose, the lowest physics engine layer has created a bug;

search for a door --move to point A    --open door -> not working search for a door --move to point B    --open door -> success

The planner can react to the bug with replanning everything. He is trying to reach the same goal, but with different task sequence. How can we imagine the inner working of such a hierarchical planning tool? The easiest way is to see a HTN-planner as a generator for “natural language”. His primary function is to output sentences which are guiding the agent to a goal. It is similar to street-guidance system. Such software is usually separated in different layers, it can plan for the highway and also for small streets. The user have to enter the starting point and the goal and the system will calculate the steps between them. Backtracking search is used for figuring paths even if one street is blocked, so at the end the user can reach the goal.

The difference is only, that this time the game is more complex. It doesn't only contains streets but behaviors of an agent. For example, the robot navigates through the kitchen, opens the shelf and is using tools for making food.

Bottleneck in HTN-planning
Programming the solver itself to search for a plan is simple. In most cases, it will not take more then 100 lines of code because planning means only to testing out random actions until a condition is true. If the PDDL file for a domain is known, it is very easy to prove the theorem, which means to find a plan. The bottleneck in HTN-planning is somewhere else. It is the creation of the domain model, that means a machine readable prediction of how the world will change after a certain command.

Let's take in look into a formal STRIPS and PDDL file. It is mainly a syntax for describing a game-engine. We have a command for example “open door”, a precondition and the result. It is possible to construct with PDDL a game-engine for playing pacman or a textadventure. But what defines, how a pddl for a certain domain will look like? Usually only the low level game-engine is given for example the box2d simulation of a robot-grasping task. If the human-user is playing around with the physics engine he will generate some example, of how the game works. He is able to generate an internal model how to play this game. But, what he doesn't know is how the matching PDDL domainfile will look like. And here is the bottleneck. How much effort is necessary to create for given domain the PDDL file? Can this done automatically, for example with neural networks?

Let us imagine the PDDL file (or a similar specification in an object-oriented language) is given. That means, the possible actions, their outcome and sub-actions are known. The consequence is, that the game can be called solve. Because if the domain model is hierarchical organized and well implemented it is very easy for every planner to bring the system into a goal state. The current game-situation is know, also the goal of the game, and with a bit brute-force searching and backtracking the solver will find the path through it. It is simply not possible that the planner will fail.

That means, the result depends only from the domain model. If we have no domain model, or a flat model without hierarchy, then the planner will fail. If the only available domain model is the pure low-level box2d engine it makes no sense to search for plans, because the state-space is way to huge. The only bottleneck is the question of how to construct the domain model. Answering this question will result into a working robot-control-system.

In general there are two options a programmer has. At first, he can type in the PDDL file from scratch in his texteditor and see it as a software-engineering process. He can formulate requirements to the domain model to express which features are needed. Then he can track the changes with a version control system and test the model in a simulator. Such technique will result into a working domain model, but it is time consuming. The other option is to use some kind of data-driven learning method. Which means to record a gametrace in a logfile and generate the domain model automatically. This method needs very little human effort but it is unclear what the result will be.

A more easier task is not generate a domain model from scratch but testing out a given domain model if his predicted actions are correct. We are taken a PDDL file and running it against a low-level game-engine and at the end both models must be synchronous. According to previous published literature it is very hard to generate or evaluate a domain model automatically. The reason is, that a high-quality domain model not only contains Finite state machines, but also the action names in natural language, for example the “open door” task is called in the PDDL not “task1” but “open door”. It seems that automatically generate the Finite state machine plus the appropriate English-description by an algorithm is very hard and that a manual software-engineering process is the only way to get a functional domain model. Surprisingly the amount of effort isn't very high to do so. Suppose the PDDL file is 200 lines of code long. According to the average productivity such a domain model can be created by hand in 200/10=20 days by a single programmer. If he finds on github the needed PDDL file with a fulltext search it is possible to reduce the time further.

Perhaps it is time to explain how a domain model looks formally. Mainly it is a computerfile in the language STRIPS, PDDL, Prolog, Golog or even normal C++ code. It is equal to a layered game engine to express possible behaviors of an agent. The value of the domain model is equal to the used lines of code, a pddl file with 10 lines of code is inferior to a domain model which needs 1000 lines of code. A working real life domain model is similar to computercode. That means, more complex actions can not express in PDDL but needs a description in C++ object-oriented programming.

Nevertheless, there was some research in the direction of using LSTM neural networks to predict the action model with natural language. The idea is use a corpus of actions vocabulary and match this with the result in the game for predicting the future with a neural network:

“We developed a symbolic simulator based on the domain knowledge (i.e. Planning Domain Definition Language (PDDL) file) in the Tell Me Dave corpus to train the RL agent.”, page 3

The same idea is described by. The figure on page 4 topleft explains the pipeline. At first a plan is presented: “1. move to cup, 2. grasp cup, 3. stop stop”. The plan gets executed in the game and results into some states. The LSTM neural networks learns the mapping between the actions and the consequences, so it can predict future commands:

“Our model can correctly predict complete action sequences with a probability of 92% on average.”

Hierarchical plan simulation
A plan in computerchess contains only one layer. The moves are numbered and are executed from top to bottom:

1. f2f4 e7e5 2. f4xe5 d7d6

Such a notation can be executed against a chess-engine. It is executing the moves and can print out the board situation at the end. A less known way of executing plans is a hierarchy. The original plan contains more then one layer, the example has three layer, because of the maximum depth. Such a plan can also be executed against a simulator. But the simulator must be more sophisticated, because it reacts to an action on layer0 different then to an action on layer1.

task1 subtask1 action1 action2 subtask2 action3 action4 task2 subtask3 action5

The main idea behind HTN-planning is to construct hierarchical plans and testing them in an simulator. Like in the introductionary chess-example the idea is to calculate a score for a plan. It is a measurement what will happen if the plan is executed. Like a chess plan there are many possible outcomes available. The plan can run smoothly and result into the output “player1 has won”. Or the simulator can detect, that a move is illegal, then the execution will stop with an error message. But the most important feature is, that it is possible to generate a plan by a random generator and search the system backwards for a plan. That means, at the beginning we have no idea which actions are needed, and we must testing out some alternatives. This is discussed in the domain of PDDL and other hierarchical planning systems. The question is, how to construct a solver which will generate a high-score plan from scratch.

The PDDL syntax was mainly invented to increase the performance. If a high-level-action has a clear precondition, postcondition statement it is able to calculate a plan in high-speed. The PDDL environment acts like a super-fast game engine. Another technique (partial order planning) was also invented to increase the plan-search-speed. Here is the idea to plan only parts and aggregate this into much broader plans. A not so advanced planner is not using any kind of optimization and simply generates the complete plan from scratch with a random generator for testing it out against the simulator.

Hierarchical reinforcement learning
Under the term Reinforcement learning usually a data-driven is approach is discussed. The algorithm gets a database, learns from the data and is able to reproduce the movement. Because the missing programming effort, reinforcement learning is attractive in beginners of Artificial Intelligence. There are some main problems in implementing such an algorithm.

The main issue what all beginners and experts in machine learning will recognize is the topic of “my algorithm doesn't learn”. That means, they have constructed some kind of learning policy which is adapting to the data, but the error rate didn't fall under a certain level. What is wrong with the algorithm? This is very easy to answer, it is has to do with the reward function. If no reward function is available the best LSTM network will not learn. But what is a reward-function?

A game like “Mario AI” has a time period in which the game is running. For example it takes 5 minutes until a level is solved. In this time-period the agent is doing some actions, which are controlled by a neural network. The problem is, that the algorithm doesn't know if an action was right or wrong, because it is unclear what an action taken on timeframe 0:04 will has as a future consequence. Missing rewards during the game results into a huge state-space and this is equal to a not-learning reinforcement learning algorithm.

Overcome the problem is simple, and it is discussed in the literature under the term hierarchical reinforcement learning. A hierarchy is a set of actions which are grouped in layers. On the lowlevel layer primitive actions like “press button left”, “press button right” are used. On a higher level a command like “move to object” is given. A hierarchy of actions is equal to subgoals and this is equal to a reward function. For example, if the mid-level command “move to object” was entered then the rewards depends on if Mario AI has reached the object or not.

In my opinion, a hierarchical reinforcement learning system is possible to implement. The systems learns from the data all the tasks, subtasks and actions and is able to simulate the tasks in the agent's memory. The problem is, that from a bootstrapping perspective such an agent-architecture is very complex, because it is an HTN-planner and a action model learning tool at the same time. That means, after starting the software no PDDL domain is available, it is has to be learned from scratch. And if the model is ready, it will be used for planning the next action.

It is possible to explain the algorithm a bit in detail. In a normal HTN-planner a given action model is used, the pddl-file. This file is used for running a simulation. That means, the agent knows what will happen if he executes the command “move to object”. The pddl file is created by hand. In reinforcement learning the idea is to not program something but using existing data and generate the algorithm from the data. That means, Hierarchical reinforcement learning needs no initial PDDL file, it extracts the file from the data. Programming such an algorithm is technical possible but it is a very hard task. I would suggest it is easier to implement a normal HTN-planner which is filled with a manual pddl file. The reason is, that Hierarchical planning in general is very complicated and with reinforcement learning the things will get more complicated.

Expand of high-level actions
A hierarchical plan contains of actions, only the lowlevel actions can be executed directly, the mid- and high-level actions are symbolic labels:

task1 action1 action2 lowlevelaction1 lowlevelaction2 action3

But how exactly can we expand a high-level action? In the domain of compiler and interpreter programming a label is resolved with a grammar to a executable action. In the case of HTN-planning this idea will fail, because the high-level actions are resulting in a different sequence of lowlevel actions. It is a problem called “Answer set programming”, that means the high-level-label is equal to a goal which has to solved.

A HTN-plan is not a program which can be executed from start to end, it is more a list of goals. Some of the goals (the lowlevel actions) can be fulfilled easily, but the most goals not. Let us assume an example from Mario AI. If the goal is “jump with Mario over the wall”, then it is maybe clear what the purpose is and how the result will look like, but it is hard to transform this goal into a sequence of keyboard inputs. The good news is, that there is help. A so called brute-force solver can transform a goal into actions. It is some kind of backward search: we have possible random actions, a game engine and now we can search for the exact path in the gametree.

Apropos gametree. In classical game-theory the game tree is a graph of possible movements, it structures the decision making. In the area of HTN-planning there are many sub-game trees and the goal are equal to a desired state in the gametree. That means, the goal “jump with Mario over the wall” is pointing to a node in the gametree. On this node, the goal is fulfilled.

The question from the perspective of implementing such a system is how we formulate this problem. In the PDDL community the answer is to create a pddl domain model and search in the model for a plan. But I'm in doubt if this is the best way, because the PDDL language is limited and is not able to model object-oriented relations.

From the use-case an goal-resolution system should be work that the user can give a goal and a game-engine while the system gives back a plan:

solver(“jump with Mario over the wall”, gameengine2) -> plan=”right, right, jump, right”.

The gametree of the game engine is calculated, and the textual goal is used to identify a node. Then the solver calculates the plan to reach the goal-node. Such a system must work with different game-engines and different goals:

solver(“find the princess”, gameengine4) -> plan=”jump over wall, jump on enemy, walk right”.

I know, from a certain point of view, this workflow is trivial, because it is the description of a HTN-planner. But it seems, that even the first HTN-planner was developed in the mid 1970s the concept is not very often implemented. Let us describe the complete workflow for expanding a plan into actions:

task1 action1 -> solver("action1", gameengine1) action2 -> solver("action2", gameengine2) plan: lowlevelaction1 plan: lowlevelaction2 action3 -> solver("action3", gameengine3)

It is basically the same plan like in the beginning of the chapter, except that the high-level-goals are converted into requests to a solver.

Goal-Oriented Action Planning for realizing neural networks
Most beginners in neural networks will run into trouble because their agent have no reward function. A reward is equal to a goal, for example reaching a place on the map. In most cases the goal of the game itself is clear: win the game. But with this simple reward function it is impossible to train an agent. Suppose, the agent is doing random action ingame, is that an improvement of his situation or not? And because the algorithm can't answer the question, the agent will not learn, that means, he will not reach the goal nor he will improve his policy.

The answer to that problem is called “Goal-Oriented Action Planning”, which is basically the known “Hierarchical task network” principle adapted for computergames. GOAP means, to define a goal hiearchy which is equal to a reward function. A typical agent has a plan like “1. open door”, “2. move forward”, “3. take the key”, “4. open the door” and so forth. He has structured the game in smaller chunks, called symbolic goals. The list of all goals are stored in the goal tree and a planner is adjusting the plans in realtime. For example, if we are taking the agent into the air and placing him to another location, he replans everything.

It is possible to combine a neural network with the GOAP architecture? Yes, and it is a very good idea. The neural network is able to learn from example without programming to much manually while the GOAP principle is used for generating subgoals which guides the policy search of the agent.

The strengths and weakness of GOAP and neural networks are asymmetrical. The GOAP architecture is very strong in finding subgoals and planning. His weakness is, that the domain model has to be programmed manually. In contrast, neural networks have no subgoals and can't plan anything, but they need no manual programming and can adapt to given data. It is possible to combine both ideas into a powerful neural network cognitive architecture?

Conflict resolution in HTN-planning
The assumption for most HTN-planning domains is, that a working domain model is available. That means, the prediction of the model is accurate. The problem is, that this assumption is not realistic. In the figure a blue robotarm should push the object to the left. According to the HTN-domain model a push against an object moves the object. For the planner, the push action against the object will solve the problem. But in reality, a push against the object will be blocked by a second object. This case is not stored in the domain model so we have a gap between prediction and reality.

What will happen, if the HTN planner is trying to create a plan? According to his domain model he will create the plan “1. push object to left”, and the planner believes this will work. After executing the plan in the simulator there is an error. That means the plan is executed but the result is different from the expectation. This is a tragedy, but it is useful information. Now the planner knows this his plan is wrong. And here comes the magic into the game. The new request to the HTN-planner is: “bring the object to the left but don't use the plan 'push object to the left', because this plan doesn't work”. That means, the planner is forced to solve the problem with a different plan. He can push the other object first and then push the original one.

It is acceptable, if the domain model isn't perfect and can't predict the reality. The HTN-planner will produce plans who doesn't work and this plans are marked as fail. In a second step, an alternative plan is generated which bypass the problem. The feature of detecting failed plans and bypass them is called “Backtracking”:

“If the decomposition is not possible (e.g. because of colliding restrictions), the planner backtracks and creates a different decomposition.”, page 4

Lowlevel HTN-planning
Understanding a HTN-planner and a BDI agent is complicated, because the system contains of many submoduls. Perhaps it would help to describe the system from the bottom up? On the low-level the player can press left-key or the right-key for move his character on the screen. And the aim is to solve the game. The HTN-planner has the obligation to provide subgoals, for example to bring Mario first to the wall and then Mario should jump. Every of the subgoals can be fulfilled with pressing the left/right key. And the planner provides some more goals on the high-level-layer for example, to master level 1, or avoid an enemy. What the HTN-planner is doing is pressing right-left key to fulfill subgoals and high-level-goals. The process works in a realtime-fashion, that means the overall BDI architecture monitors the systems, calculates new subgoals and determines lowlevel actions to fulfill the goals.

Perhaps a small example. If the game takes 2 minutes, the normal player will have pressed 100 times on of the buttons, either left or right. Over the timespan he has navigated Mario through the level, pickups objects and jumped on top of enemies. A HTN-planner structures the 100 lowlevel actions together with the BDI-architecture in a formal way..