Ejemplos de Vida. Tic-Tac-Toe
Author of the program: Manuel de la Herrán Gascón
Click Learn to begin the learning process. When you think it has learned enough, click End, and then Play Against the Computer to test the level of learning that has been reached.
Learning processes are slow, and it is not safe to stop the learning at just any point of the execution. For this reason, you will probably have to wait a while after clicking End before the learning process actually stops. Be patient, and if you use a screen saver, configure a sufficiently long wait time.
Tic-tac-toe is a very simple game, and it is not hard to program a computer to play it with some skill. What makes this program interesting is that it requires no prior knowledge of the game. That is, the program is able to play against itself and against us, and to learn from its own experience; we do not need to give the program any knowledge about how to play, much less about how to win. This program applies a variant of Genetic Algorithms to learn to play tic-tac-toe, ranking certain pseudo-randomly generated rules according to their usefulness.
Learning to play tic-tac-toe
In this program, the entities that are born, reproduce and die are "players", and they are made up of "rules" (genes) that define how to act (to play) in given circumstances. The program handles rules of the type "If I observe such-and-such a situation, I perform such-and-such an action". When playing a game, the entity observes the state of the board. If it has a rule it can apply, it applies it; otherwise, it plays at random. For example, a rule might prevent the opponent from placing three pieces in a row.
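To make the rule format concrete, here is a minimal sketch in Python. The encoding is an assumption based on the description later in this article (P = the agent's own piece, C = the opponent's, V = an empty cell, * = "don't care"); the function name and the example rule are hypothetical, not taken from the program's source.

```python
def rule_matches(pattern, board):
    """Return True if the 9-character pattern matches the 9-cell board."""
    return all(p == '*' or p == b for p, b in zip(pattern, board))

# Hypothetical rule blocking the opponent's top row: "if the opponent holds
# the first two cells and the third is free, play the third cell".
pattern, action = "CCV******", 2
board = "CCVVPVVVV"
rule_matches(pattern, board)   # → True
```

The left-hand side of a rule is the board condition; the right-hand side (here `action`) names one of the nine cells to play in.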
Every player has an indicator called its weight, which represents that agent's skill at tic-tac-toe. When a game between two virtual players ends, the winner's weight is increased by one unit, and the loser's is decreased by one. If the game ends in a draw, with no winner, the weights are unaffected.
If at a given point in the game the entity can apply more than one rule, the agent selects the rule that has provided the most victories. To this end, just as with the agents, the rules also have an assigned weight, which is increased each time the rule is used in a won game and reduced in lost ones.
Rules also have "priorities", so that when more than one rule could fire in the same situation, the one with the highest priority fires. The complete procedure for choosing the rule to fire is therefore:
1.- From the set A of all the agent's rules, select the subset B of all the rules that can fire at this moment, that is, those whose left-hand side represents a board state matching the current one.
2.- If subset B is empty, any legal move is executed, chosen by a pseudo-random function. If subset B has a single element, that rule is executed. If subset B has more than one element, select from set B the subset C of rules whose priority is maximal (the highest numerical values).
3.- If subset C has a single element, that rule is executed. If subset C has more than one element, select from set C the subset D of rules whose weight is maximal (the highest numerical values).
4.- If subset D has a single element, that rule is executed. If subset D has more than one element, any element of D is chosen by a pseudo-random function and executed.
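The four steps above can be sketched as one function. This is an illustrative reconstruction, not the program's actual code: each rule is assumed to be a (pattern, action, priority, weight) tuple, with patterns matched as described earlier.

```python
import random

def choose_rule(rules, board, rng=random):
    """Steps 1-4: filter by match, then priority, then weight, then random.
    Returns the chosen rule, or None to signal a random legal move."""
    def matches(pattern):
        return all(p == '*' or p == b for p, b in zip(pattern, board))
    B = [r for r in rules if matches(r[0])]   # step 1: rules that can fire
    if not B:
        return None                           # step 2: play a random legal move
    top_p = max(r[2] for r in B)
    C = [r for r in B if r[2] == top_p]       # step 2: maximal priority
    top_w = max(r[3] for r in C)
    D = [r for r in C if r[3] == top_w]       # step 3: maximal weight
    return rng.choice(D)                      # step 4: pseudo-random tie-break
```

Returning `None` for an empty subset B lets the caller fall back to a random legal move, as the article describes.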
These rules are the genetic material the program works with. When two players reproduce, they generate a new entity made up of rules from both parents, taken alternately, occasionally creating a new random rule instead of taking one from a parent.
Basically, the learning process consists of evaluating the entities, ordering them by usefulness, and reproducing the best individuals, which take the place of the losers. Evaluating the entities in this case means grouping them in pairs and having them play games.
When we consider that the learning process has gone on long enough (and this can take a long time), we click the "End" button and then "Play against the Computer" to test how much has been learned. Clicking "View Agents" shows a list of each agent's rules, with the final weight of each rule and of each agent.
When playing against the program, we are actually playing against the first agent on the list, that is, the one that has won the most times, although the program also lets us play against any of the others that have remained "alive". The game board reports, among other things, the number of games that have ended in a draw and the total number of pieces placed using a rule (rather than at random), for each cycle.
A simplified example of this program's general operation would be:
1- We create 40 agents, each composed of 100 rules; these rules are generated at random. Each rule says which action to perform for a certain board state. Each rule has an associated priority value from 0 to 3, also generated at random.
2- We set the agents to play games in pairs. That is, we take the agents two by two and have each pair play 5 consecutive games of tic-tac-toe.
3- We assign scores to the agents: each time an agent wins a game it gains a point, and if it loses, it loses a point.
4- We order the agents by their points (we call this the agent's "weight").
5- Now we make the best agents go on "living" while the rest die, their places occupied by new agents. We select the 20 best agents and combine them in pairs, creating for each pair two new offspring that carry the rules of both parents interleaved, and we copy them into the memory area of the 20 worst agents.
6- We return to step 2.
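The six steps above can be sketched as a short evolutionary loop. Everything here is illustrative: the names, the dictionary layout, and especially the `play` function, which stands in for a full tic-tac-toe game and simply returns a random outcome so the sketch stays runnable.

```python
import random

N_AGENTS, N_RULES, GAMES_PER_PAIR = 40, 100, 5

def random_rule(rng):
    # A rule: a 9-cell pattern, an action cell, a priority 0-3, a weight of 0.
    return [''.join(rng.choice('PCV*') for _ in range(9)),
            rng.randrange(9), rng.randrange(4), 0]

def make_agent(rng):
    return {'rules': [random_rule(rng) for _ in range(N_RULES)], 'weight': 0}

def play(a, b, rng):
    """Stand-in for a real game: +1 if a wins, -1 if b wins, 0 for a draw."""
    return rng.choice((1, -1, 0))

def cycle(agents, rng):
    # Steps 2-3: pairs play 5 games; the winner gains a point, the loser loses one.
    for a, b in zip(agents[::2], agents[1::2]):
        for _ in range(GAMES_PER_PAIR):
            outcome = play(a, b, rng)
            a['weight'] += outcome
            b['weight'] -= outcome
    # Step 4: order the agents by weight.
    agents.sort(key=lambda ag: ag['weight'], reverse=True)
    # Step 5: the best half breeds in pairs; offspring overwrite the worst half.
    half = N_AGENTS // 2
    for i in range(0, half, 2):
        mother, father = agents[i], agents[i + 1]
        for j in range(2):
            child = agents[half + i + j]
            # Interleave parental rules, offset so the two children differ.
            child['rules'] = [(mother if (k + j) % 2 == 0 else father)['rules'][k]
                              for k in range(N_RULES)]
            child['weight'] = 0

rng = random.Random(0)
agents = [make_agent(rng) for _ in range(N_AGENTS)]
for _ in range(3):          # step 6: repeat the cycle
    cycle(agents, rng)
```

With a real `play` that lets each agent fire its rules, this is the whole learning scheme: the population size stays constant while the bottom half is rewritten each cycle.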
Some aspects to keep in mind
The premise followed throughout the programming has been to include in the program no knowledge about the problem to be solved, since what is implicitly being studied is learning in general, not the representation of human knowledge for one particular problem. However, in some respects it is unavoidable to include certain knowledge, and in others we have "cheated". The unavoidable aspects are, for example, those that concern the game itself, such as the format of the rules. Each rule has on its left-hand side nine values corresponding to the agent's nine senses, and on its right-hand side a value representing one of the nine possible actions. Obviously the entities must be able to see the board and to place pieces, and this too has to be programmed. Another unavoidable aspect is the function that detects which player has won the game, or whether it ended in a draw. That is, the game, the problem to be solved, or the environment the agents inhabit (whatever we choose to call it) has to be programmed: we cannot solve a problem without defining the terms of that problem.
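The win/draw detection mentioned above is simple enough to show in full. This is a generic sketch (the cell encoding follows the P/C/V convention assumed earlier, not the program's actual source):

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
         (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
         (0, 4, 8), (2, 4, 6)]              # diagonals

def result(board):
    """board is a 9-character string of 'P', 'C', 'V' cells. Returns 'P' or
    'C' for the winner, 'draw' for a full board with no winner, or None if
    the game is still in progress."""
    for a, b, c in LINES:
        if board[a] != 'V' and board[a] == board[b] == board[c]:
            return board[a]
    return 'draw' if 'V' not in board else None
```

This is precisely the kind of environment knowledge that cannot be learned: without it, nothing in the system could ever assign a weight to anything.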
On the other hand, the program learns very slowly. Remember that the program does not select good tic-tac-toe players (it does not know what a good player is; that is precisely what we are trying to discover), but rather players that are better than the others. The best individual of all will be very well prepared to play and win in its world. Fine, but a child discovers almost immediately how not to lose at this game. Why, then, is our program so slow? We must not forget that the agents see the board as a mere sequence of data.
It must be admitted that even for us it would be very awkward to play with a board like that. If, moreover, we did not know that this string of cells actually represents a 3x3 grid, we could drive ourselves insane trying to work out how to win a game.
These reasons encourage us to allow some exceptions in rule generation to speed up the learning process. In the first place, only valid rules are ever generated. This would not have to be so; for example, the rule C********1 could be generated, which is not valid since it places a piece in an occupied cell. Agents that applied a rule of this kind would lose the game immediately, so the only effect would be to slow the learning down even further. The program also allows certain types of rules known in advance to be potentially very interesting to be given a greater probability of appearing, specifically rules with two P's or two C's and one V, which in many cases will correspond to: "If I have two pieces in a row, I place the third and win" or "If the opponent has two pieces in a row, I stop him from winning by placing the third".
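The first of these exceptions, generating only valid rules, can be sketched as follows. The encoding and the function name are assumptions; the key idea from the text is that the cell a rule plays in must be empty in the rule's own pattern.

```python
import random

def random_valid_rule(rng=random):
    """Generate only valid rules: the target cell is forced to 'V' (empty)
    in the rule's pattern, so the rule can never try to play on an occupied
    cell (encoding assumed: P/C/V plus the '*' wildcard)."""
    action = rng.randrange(9)
    pattern = [rng.choice('PCV*') for _ in range(9)]
    pattern[action] = 'V'              # the target cell must be free
    return ''.join(pattern), action
```

Biasing generation toward the "two P's or two C's and one V" rules the article mentions would be a further refinement on top of this constraint.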
Program options
The basic operation can be modified with various options. The program lets you choose the following:
- The number of agents that will play tic-tac-toe, which remains constant throughout the learning.
- The number of rules each agent has, also constant.
- The mutation probability, that is, the probability that when a rule of a new agent is created, a rule is generated at random instead of being taken from one of its two parents.
- The type of selection, with four cases allowed: the best 50% of all the entities is selected for reproduction and generates the remaining 50%; the best 10% reproduces over the remaining 90%; the best 20% reproduces over the remaining 80%; or the best 40% and the worst 10% are selected and reproduce over the remaining 50%.
- The frequency of accumulated mutations, so that the probability of a mutation in a rule is not independent for each rule: a mutation is more likely in a rule if one has already occurred in its neighbour.
- The parents can be paired at random or in order of weight.
- The players can likewise be matched at random or in order of weight.
- If two parents are detected to be identical, the children can be made mutations in all their elements.
- An entity's weight can be computed independently, as explained above, from the number of games won, or this value can be adjusted, reducing the weight if another very similar entity exists, thus ensuring variety in the population.
- The entities can communicate their knowledge to the others, so that when an entity wins a game, the weight of the rules used in that game is increased not only in the winning entity but in the whole population, and similarly for the rules used in lost games.
- Rule generation can be done completely at random or with a "rigged" random function that generates with greater probability rules presumed to be useful, thus giving evolution a little push. Logically enough, the algorithms most successful on practical problems are those that, besides being able to learn by themselves, start from an initial state in which the entities already have a great deal of useful knowledge about the problem to be solved. Although the results could be much more spectacular, we do not put much emphasis on this point, since what is being studied is autonomous learning, not the representation of human knowledge in a program.
- By default, reproduction takes a rule from one parent or the other alternately: first from the mother, then from the father, the next from the mother, and so on; but there is an option to always inherit the rule with the greater weight of the two.
- Rules repeated within the same agent can be replaced with mutations.
- Rules with a weight less than or equal to a given value can be replaced with mutations (in every cycle).
- Rules with a weight less than or equal to a given value can be replaced with mutations every given number of cycles.
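The two inheritance modes described above (alternating parents versus always taking the heavier rule) can be sketched together with the mutation option. All names and the tuple layout are illustrative assumptions, not the program's actual code.

```python
import random

def random_rule(rng):
    """A fresh random rule: (pattern, action, priority, weight)."""
    return (''.join(rng.choice('PCV*') for _ in range(9)),
            rng.randrange(9), rng.randrange(4), 0)

def child_rules(mother, father, by_weight=False, p_mut=0.02, rng=random):
    """mother/father are equal-length rule lists. By default rules are taken
    alternately (mother first); with by_weight=True the heavier rule of each
    pair is inherited. With probability p_mut a brand-new random rule
    replaces the inherited one."""
    child = []
    for i, (m, f) in enumerate(zip(mother, father)):
        if rng.random() < p_mut:
            child.append(random_rule(rng))          # mutation: brand-new rule
        elif by_weight:
            child.append(m if m[3] >= f[3] else f)  # heavier rule wins
        else:
            child.append(m if i % 2 == 0 else f)    # alternate: mother first
    return child
```

Inheriting by weight makes crossover greedier, which trades some genetic diversity for faster convergence, one reason the program leaves it as an option rather than the default.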
Program test runs
Keeping example 2 running for more than a day (500 cycles) produced the following results. The program manages to arrive at one (and at certain moments two) of the sought rules in its most optimized form. Once some time has passed since its first appearance, this rule stays in the population. The number of other good but non-optimized rules, that is, rules with values of P or C where there should be a *, stays more or less constant. The number of moves made using rules grows slowly but identifiably.
[Back to Index]
This program and its source files are free of charge and freely distributable. The source code is available and may be modified, distributed, or used in other programs with complete freedom.
Download the source code
To obtain the latest version of the program, to suggest possible extensions, if you detect any error in the programming, or if you wish to let us know that you are going to extend or use part or all of this program, do not hesitate to contact the author at this address: E-mail
Click here to download the source code.
You are invited to take part, criticize, and collaborate with your ideas!
Send your comments, questions, suggestions, or criticisms to the following address:
[ Home Page Spanish | Home Page English ] The Gaia software is free, with sources available. There are no commercial interests, only research and amusement.
You can send your suggestions about the look of the website or its contents to the E-mail