x0x70: My question is, how can it learn? That's the one part I haven't been able to pick up about KLEP. Like, if it were more effective or desirable for it to use more diagonal movements, or if it were better for it to spend 85% of the time in the top half of the screen and 15% in the bottom half, how would it learn that?
All AI is a series of weights and biases. That's the name of the game: how those weights get set, either by the designer or by the box itself through the many means of learning.
using System.Collections.Generic;
using UnityEngine;

public class KLEPAgent : MonoBehaviour {
private KLEPNeuron parentNeuron;
private KLEPExecutableBase currentAction = null;
private int inTandemRecheckLimit = 900; // Limit for rechecking in-tandem actions
private bool recheckInTandemExecutables = true;
private Dictionary<string, float> qValues = new Dictionary<string, float>(); // Store Q-values for actions
public float certaintyThreshold = 0;

private void UpdateQValues(string executableName, float reward, float learningRate)
{
    if (!qValues.ContainsKey(executableName))
    {
        qValues[executableName] = 0.0f;
    }

    qValues[executableName] += learningRate * (reward - qValues[executableName]);
}

private float CalculateActionStickiness(float certainty)
{
    // Action stickiness could be inversely proportional to certainty
    // Higher certainty means less need to switch actions frequently
    return Mathf.Clamp(1.0f - certainty, 0.0f, 1.0f);
}

public void Feed(string executableName)
{
    // Positive feedback, increase Q-value
    UpdateQValues(executableName, 1.0f, 0.1f); // Example values
}

public void Hurt(string executableName)
{
    // Negative feedback, decrease Q-value
    UpdateQValues(executableName, -1.0f, 0.1f); // Example values
}

public void Initialize(KLEPNeuron neuron)
{
    parentNeuron = neuron;
    parentNeuron.bridge.RegisterEvent("KeyAdded", OnKeyAddedEvent);
...
Code example from the agent with the RL portion. You can see that the Q-values are set when a hurt or feed signal is sent along with the executable name (which could be anything, of course; it's just a string). Say I have an exe that hurts the agent every time it is on the left edge of the screen, and the agent has two executables it can choose from, GoLeft and GoRight: that hurt value will eventually override any initial weight those executables have. That is the essence of RL, and why RL is so fucking retarded.

The next question would be: ok, what if I just want one exe that picks a random place and goes there; how do I get RL to influence that selection using only one exe? There are many ways you could design for that. Off the top of my head: when sending a hurt signal, the string you pass contains the x/y position of the agent. Then, when that exe selects its new position, it parses that data to find the positions with the highest positive value and picks a random spot near one of them (rough sketches of both cases are at the bottom of this comment).

The next question would be: ok, but how do I get it to just know that automatically? And then I say: well, now you are asking for it to code itself, and I may be spiffy, but I'm just one guy. I do have a hypothesis, though, called KlepImagination. That would be a component that uses the key-to-lock interactions as a sort of baby-speak description of the world/problem space the agent is navigating. Those key name and lock name combinations would be fed into a DNN or ANN, which would then spit out code from a sampled library along with mutations (like NEAT - NeuroEvolution of Augmenting Topologies). KLEP already uses an autonomous topology when navigating problem space; once it can make its own executables, that's when the freaky sci-fi level shit happens.
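To make the GoLeft/GoRight point concrete, here is a hedged guess at the selection step the snippet above truncates. This is not the actual KLEP selection code: PickByQValue is an invented helper, and exe.ExecutableName and exe.baseWeight are assumptions about what KLEPExecutableBase might expose. It only illustrates how a learned Q-value can swamp a designer-set initial weight.

    // Hypothetical helper inside KLEPAgent. Candidates are the executables that are
    // currently valid; the one with the best combined score wins. A GoLeft that keeps
    // receiving Hurt() on the left edge sinks below GoRight and stops being picked.
    private KLEPExecutableBase PickByQValue(List<KLEPExecutableBase> candidates)
    {
        KLEPExecutableBase best = null;
        float bestScore = float.NegativeInfinity;

        foreach (var exe in candidates)
        {
            float q = qValues.TryGetValue(exe.ExecutableName, out var learned) ? learned : 0f; // unseen exes start neutral
            float score = exe.baseWeight + q; // feedback accumulates until it overrides the initial weight
            if (score > bestScore) { bestScore = score; best = exe; }
        }
        return best;
    }

And here is a minimal sketch of the "one exe that picks a random place" idea, with the x/y position baked into the feedback string. GoToRandomSpot, PositionKey, ReportPain, and PickNextTarget are all illustrative names, and the sketch assumes the exe can read the agent's Q-value dictionary (here it is simply passed in); it does not pretend to know how KLEPExecutableBase actually wires execution.

using System.Collections.Generic;
using UnityEngine;

// Sketch only: an executable that buckets the world into a grid, reports pain or reward
// tagged with the grid cell, and then biases its next random target toward the
// highest-valued cell it has seen so far.
public class GoToRandomSpot : MonoBehaviour
{
    public KLEPAgent agent;        // the agent holding Feed/Hurt and the Q-values
    public float gridSize = 1.0f;  // resolution used to turn positions into string keys
    private Vector2 target;

    // Encode a position into the string passed to Feed/Hurt, e.g. "GoToRandomSpot@3,-2".
    private string PositionKey(Vector2 pos)
    {
        int x = Mathf.RoundToInt(pos.x / gridSize);
        int y = Mathf.RoundToInt(pos.y / gridSize);
        return $"GoToRandomSpot@{x},{y}";
    }

    // Called by whatever detects the bad or good state, e.g. touching the left edge.
    public void ReportPain(Vector2 pos)   => agent.Hurt(PositionKey(pos));
    public void ReportReward(Vector2 pos) => agent.Feed(PositionKey(pos));

    // Pick the next target: find the best-scoring cell and jitter around it.
    public Vector2 PickNextTarget(IReadOnlyDictionary<string, float> qValues)
    {
        Vector2 best = target;
        float bestQ = float.NegativeInfinity;
        const string prefix = "GoToRandomSpot@";

        foreach (var kv in qValues)
        {
            if (!kv.Key.StartsWith(prefix)) continue; // only consider cells this exe wrote

            string[] parts = kv.Key.Substring(prefix.Length).Split(',');
            if (parts.Length != 2) continue;

            var pos = new Vector2(int.Parse(parts[0]) * gridSize,
                                  int.Parse(parts[1]) * gridSize);
            if (kv.Value > bestQ) { bestQ = kv.Value; best = pos; }
        }

        // Random spot near the highest-valued cell, or near the old target if nothing is scored yet.
        target = best + Random.insideUnitCircle * gridSize;
        return target;
    }
}

The loop stays exactly what the agent snippet already does: Hurt/Feed nudge the Q-value for whichever position key is active, and over time the exe's targets drift away from the spots that keep hurting.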