dotnet: exploring how SemanticKernel’s planner works

While using SemanticKernel, I was fascinated by its powerful planning capability. With a plan, the AI can automatically schedule and assemble multiple modules to implement complex features. I was particularly curious about how the planner works and how it is implemented under the hood. Fortunately, SemanticKernel is completely open source, so I was able to learn how it works by reading the source code. In this post I will share what I learned.

Stripped of any mystique, the lowest layer of SemanticKernel exchanges only text with an AI layer such as GPT, yet the planner has to orchestrate and schedule multiple functions to do its job. The easiest way to understand this is that the AI layer is told in advance which functions or capabilities are currently available, and the AI then decides how to schedule them to meet the requirement.

In other words, engineers provide a variety of functions, users describe what they need, and the AI, working from the user’s description, combines the functions provided by the engineers to fulfill the user’s needs.

For example, suppose the requirement is to write a poem in a certain language. The user’s description will say roughly what kind of poem to write and which language to translate it into. The engineer provides a poem-writing function or plugin and a translation function or plugin. The AI layer then performs the orchestration: it first calls the poem-writing function, then passes the resulting poem as the input of the translation function, and finally returns the translation to the user.

This requirement is very simple to implement with the help of SemanticKernel.

Next, let’s try to implement a similar feature without the planner that SemanticKernel provides. Writing our own code in place of the built-in planner is a good way to learn the implementation details of SemanticKernel.

The general implementation steps are as follows:

[Figure: overview of the implementation steps]

First, following the approach in the blog post "dotnet SemanticKernel Getting Started: Importing Skills into the Framework", import two semantic functions into the SemanticKernel framework: one that writes poems and one that translates.

kernel.RegisterSemanticFunction("WriterPlugin", "ShortPoem", new PromptTemplateConfig()
{
    Description = "Turn a scenario into a short and entertaining poem.",
}, new PromptTemplate(
    @"Generate a short funny poem or limerick to explain the given event. Be creative and be funny. Let your imagination run wild.
Event:{{$input}}
", new PromptTemplateConfig()
    {
        Input = new PromptTemplateConfig.InputConfig()
        {
            Parameters = new List<PromptTemplateConfig.InputParameter>()
            {
                new PromptTemplateConfig.InputParameter()
                {
                    Name = "input",
                    Description = "The scenario to turn into a poem.",
                }
            }
        }
    }, kernel));

kernel.CreateSemanticFunction(@"Translate the input below into {{$language}}

MAKE SURE YOU ONLY USE {{$language}}.

{{$input}}

Translation:
", new PromptTemplateConfig()
{
    Input = new PromptTemplateConfig.InputConfig()
    {
        Parameters = new List<PromptTemplateConfig.InputParameter>()
        {
            new PromptTemplateConfig.InputParameter()
            {
                Name = "input",
            },
            new PromptTemplateConfig.InputParameter()
            {
                Name = "language",
                Description = "The language which will translate to",
            }
        }
    },
    Description = "Translate the input into a language of your choice",
}, functionName: "Translate", pluginName: "WriterPlugin");

The prompts of the semantic functions above come from the samples in the official SemanticKernel repository.

The code above registers two functions, WriterPlugin.ShortPoem and WriterPlugin.Translate. Note that the registration also spells out a description of each function, its parameters, and what each parameter means. These descriptions are written specifically for the AI layer to read, so that it can understand what the functions do and how to call them.

I originally wrote the semantic functions above in Chinese. However, my alchemy (prompt-writing) skills were not up to par and I could not produce a good example, so I used the official one. The English prompts in these functions are not the focus of this article. If they are hard to follow, feel free to skip them; all you need to know is that these two functions have been prepared in advance.

With the preparation done, we can start writing the core logic of the planner. The core is itself just something like a semantic function: a prompt, written by a million-dollar alchemist (prompt engineer), that tells the AI layer to create an XML structure describing how the functions should be scheduled and which values should be passed for each parameter. Since I can’t afford a million-dollar alchemist, I simply borrowed, free of charge, the prompt written by Microsoft’s million-dollar alchemists.

var semanticFunction = kernel.CreateSemanticFunction(
    @"Create an XML plan step by step, to satisfy the goal given, with the available functions.

[AVAILABLE FUNCTIONS]

{{$available_functions}}

[END AVAILABLE FUNCTIONS]

To create a plan, follow these steps:
0. The plan should be as short as possible.
1. From a <goal> create a <plan> as a series of <functions>.
2. A plan has 'INPUT' available in context variables by default.
3. Before using any function in a plan, check that it is present in the [AVAILABLE FUNCTIONS] list. If it is not, do not use it.
4. Only use functions that are required for the given goal.
5. Append an ""END"" XML comment at the end of the plan after the final closing </plan> tag.
6. Always output valid XML that can be parsed by an XML parser.
7. If a plan cannot be created with the [AVAILABLE FUNCTIONS], return <plan />.

All plans take the form of:
<plan>
    <!-- ... reason for taking step ... -->
    <function.{FullyQualifiedFunctionName} ... />
    <!-- ... reason for taking step ... -->
    <function.{FullyQualifiedFunctionName} ... />
    <!-- ... reason for taking step ... -->
    <function.{FullyQualifiedFunctionName} ... />
    (...etc...)
</plan>
<!-- END -->

To call a function, follow these steps:
1. A function has one or more named parameters and a single 'output' which are all strings. Parameter values should be xml escaped.
2. To save an 'output' from a <function>, to pass into a future <function>, use <function.{FullyQualifiedFunctionName} ... setContextVariable=""<UNIQUE_VARIABLE_KEY>""/>
3. To save an 'output' from a <function>, to return as part of a plan result, use <function.{FullyQualifiedFunctionName} ... appendToResult=""RESULT__<UNIQUE_RESULT_KEY>""/>
4. Use a '$' to reference a context variable in a parameter, e.g. when `INPUT='world'` the parameter 'Hello $INPUT' will evaluate to `Hello world`.
5. Functions do not have access to the context variables of other functions. Do not attempt to use context variables as arrays or objects. Instead, use available functions to extract specific elements or properties from context variables.

DO NOT DO THIS, THE PARAMETER VALUE IS NOT XML ESCAPED:
<function.Name4 input=""$SOME_PREVIOUS_OUTPUT"" parameter_name=""some value with a <!-- 'comment' in it-->""/>

DO NOT DO THIS, THE PARAMETER VALUE IS ATTEMPTING TO USE A CONTEXT VARIABLE AS AN ARRAY/OBJECT:
<function.CallFunction input=""$OTHER_OUTPUT[1]""/>

Here is a valid example of how to call a function ""_Function_.Name"" with a single input and save its output:
<function._Function_.Name input=""this is my input"" setContextVariable=""SOME_KEY""/>

Here is a valid example of how to call a function ""FunctionName2"" with a single input and return its output as part of the plan result:
<function.FunctionName2 input=""Hello $INPUT"" appendToResult=""RESULT__FINAL_ANSWER""/>

Here is a valid example of how to call a function ""Name3"" with multiple inputs:
<function.Name3 input=""$SOME_PREVIOUS_OUTPUT"" parameter_name=""some value with a &lt;!-- &apos;comment&apos; in it--&gt;""/>

Begin!

<goal>{{$input}}</goal>
");

The prompt above first inserts a variable named available_functions, which will later be replaced with the list of currently available functions. It then tells the AI layer how to draw up a plan and what the output XML format should be, and gives it an example, such as the following:

<plan>
    <!-- ... reason for taking step ... -->
    <function.{FullyQualifiedFunctionName} ... />
    <!-- ... reason for taking step ... -->
    <function.{FullyQualifiedFunctionName} ... />
    <!-- ... reason for taking step ... -->
    <function.{FullyQualifiedFunctionName} ... />
    (...etc...)
</plan>

It also tells the AI layer what it should and should not output. The prompt above appears to have been carefully designed by Microsoft; the few prompts I wrote casually could not achieve the same effect.
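How a placeholder such as available_functions gets filled in can be sketched in a few lines of C#. This is only an illustration of the idea, not SemanticKernel's template engine: the PromptRenderer class and its Render method are hypothetical names, it handles only the {{$name}} variable form, and replacing unknown variables with an empty string is a choice made for this sketch.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of SemanticKernel.
public static class PromptRenderer
{
    // Matches {{$name}} where name consists of letters, digits or underscores.
    private static readonly Regex VariablePattern = new Regex(@"\{\{\$(\w+)\}\}");

    // Replaces every {{$name}} with the matching value from variables.
    // Unknown variables become an empty string (an assumption of this sketch).
    public static string Render(string template, IReadOnlyDictionary<string, string> variables)
    {
        return VariablePattern.Replace(template, match =>
            variables.TryGetValue(match.Groups[1].Value, out var value) ? value : string.Empty);
    }
}
```

Calling PromptRenderer.Render with a template containing {{$input}} and a dictionary that maps input to the user's goal yields the prompt text that would be sent to the AI layer.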

Because I was worried that two consecutive { characters would break the blog engine, I wrote one of them as a full-width ｛ symbol in the original post. In actual use, the standard { character must be used.

With the core prompt written, we have created a semantic function. Next, we call this semantic function to implement the feature.

Before starting, first inject the list of functions that can be used, as in the following code. The GetFunctionsManualAsync method exports every function currently registered in SemanticKernel, whether it is a semantic function or a native function.

var relevantFunctionsManual = await kernel.Functions.GetFunctionsManualAsync(new SequentialPlannerConfig());

The GetFunctionsManualAsync method returns each registered function together with its description, its input parameters, and the parameter descriptions. The output looks roughly like this:

WriterPlugin.ShortPoem:
  description: Turn a scenario into a short and entertaining poem.
  inputs:
    - input: The scenario to turn into a poem.

WriterPlugin.Translate:
  description: Translate the input into a language of your choice
  inputs:
    - input:
    - language: The language which will translate to

From this output, it should be clear why a description is required when defining a SemanticKernel function: these descriptions are read not only by humans but also by the machine.
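To make the idea concrete, here is a minimal sketch of how a manual in the shape shown above could be rendered from function metadata. It does not use SemanticKernel's types: ManualParameter, ManualFunction, and FunctionManual are hypothetical stand-ins invented for this example, and the real GetFunctionsManualAsync may format its output differently.

```csharp
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-ins for the metadata kept for each registered function.
public record ManualParameter(string Name, string Description);
public record ManualFunction(string PluginName, string Name, string Description, IReadOnlyList<ManualParameter> Parameters);

public static class FunctionManual
{
    // Renders one function: qualified name, description, then one line per input.
    public static string Render(ManualFunction function)
    {
        var sb = new StringBuilder();
        sb.Append(function.PluginName).Append('.').Append(function.Name).Append(":\n");
        sb.Append("  description: ").Append(function.Description).Append('\n');
        sb.Append("  inputs:\n");
        foreach (var parameter in function.Parameters)
        {
            sb.Append("    - ").Append(parameter.Name).Append(": ").Append(parameter.Description).Append('\n');
        }
        return sb.ToString();
    }

    // Renders all functions, separated by a blank line.
    public static string RenderAll(IEnumerable<ManualFunction> functions)
    {
        var parts = new List<string>();
        foreach (var function in functions)
        {
            parts.Add(Render(function));
        }
        return string.Join("\n", parts);
    }
}
```

The point of the sketch is that everything the AI layer learns about a function comes from this text, so a missing or vague description directly limits how well the function can be planned with.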

Put the output above into the available_functions variable so that the AI layer knows which functions are currently available.

ContextVariables vars = new(goal)
{
    ["available_functions"] = relevantFunctionsManual
};

The goal variable in the code above is the requirement entered by the user. Here the requirement is to write a poem and then translate it into Chinese. It is defined as follows:

var goal = "Write a poem about John Doe, then translate it into Chinese.";

Or you can state the requirement directly in Chinese (the string below is shown in English translation):

var goal = "Help write a poem about Brother Shui and translate it into Chinese";

With the requirement in place, run the semantic function built on the million-dollar alchemist’s prompt:

ContextVariables vars = new(goal)
{
    ["available_functions"] = relevantFunctionsManual
};

var planResult = await kernel.RunAsync(semanticFunction, vars);
string? planResultString = planResult.GetValue<string>()?.Trim();

The planResultString obtained above is the XML plan produced by the AI layer, describing how the functions are to be scheduled. It looks roughly like this:

<plan>
    <!-- First, we create a short poem about "Brother Shui" -->
    <function.WriterPlugin.ShortPoem input="Brother Shui" setContextVariable="POEM"/>
    <!-- Then, we translate the poem into Chinese -->
    <function.WriterPlugin.Translate input="$POEM" language="Chinese" appendToResult="RESULT__FINAL_ANSWER"/>
</plan>

Next, we need some C# code to convert the XML above into individual Plan steps so that they can be scheduled and executed.

var xmlString = planResultString;
XmlDocument xmlDoc = new();
xmlDoc.LoadXml("<xml>" + xmlString + "</xml>");
XmlNodeList solution = xmlDoc.GetElementsByTagName("plan");

Once the text is loaded as XML, the next step is to set up the scheduling according to the functions and parameters named in the XML. Parsing XML is not difficult, and I expect the code will be obvious once the requirement is clear. After parsing, what remains is the question of how to execute a function in SemanticKernel, which should also be familiar.
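As a rough illustration of the parsing step on its own, the following sketch turns a plan XML string into a flat list of steps using only System.Xml. The ParsedStep record and the PlanXmlParser class are hypothetical names made up for this example, and the sketch treats setContextVariable and appendToResult alike as "where to store the output", which is a simplification.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Xml;

// Hypothetical result type: one function call taken from the plan XML.
public record ParsedStep(string PluginName, string FunctionName, Dictionary<string, string> Parameters, string? OutputKey);

public static class PlanXmlParser
{
    public static List<ParsedStep> Parse(string planXml)
    {
        var doc = new XmlDocument();
        // Wrap the text in a root element, as the article's code does.
        doc.LoadXml("<xml>" + planXml + "</xml>");

        var steps = new List<ParsedStep>();
        foreach (XmlNode planNode in doc.GetElementsByTagName("plan"))
        {
            foreach (XmlNode child in planNode.ChildNodes)
            {
                // Skip comments and text; only <function.Plugin.Name .../> elements are steps.
                if (child.NodeType != XmlNodeType.Element) continue;
                if (!child.Name.StartsWith("function.", StringComparison.OrdinalIgnoreCase)) continue;

                var qualifiedName = child.Name.Substring("function.".Length);
                var dot = qualifiedName.IndexOf('.');
                var pluginName = dot >= 0 ? qualifiedName.Substring(0, dot) : string.Empty;
                var functionName = dot >= 0 ? qualifiedName.Substring(dot + 1) : qualifiedName;

                var parameters = new Dictionary<string, string>();
                string? outputKey = null;
                foreach (XmlAttribute attr in child.Attributes!)
                {
                    if (attr.Name.Equals("setContextVariable", StringComparison.OrdinalIgnoreCase) ||
                        attr.Name.Equals("appendToResult", StringComparison.OrdinalIgnoreCase))
                    {
                        outputKey = attr.Value;
                    }
                    else
                    {
                        parameters[attr.Name] = attr.Value;
                    }
                }

                steps.Add(new ParsedStep(pluginName, functionName, parameters, outputKey));
            }
        }

        return steps;
    }
}
```

Feeding the sample plan shown earlier into PlanXmlParser.Parse gives two steps: ShortPoem with its output stored under POEM, and Translate whose input parameter is the literal text $POEM, still waiting to be resolved at execution time.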

To make the result of our implementation easier to follow, the code below keeps using SemanticKernel’s Plan type so that the parsed steps can be plugged in quickly.

XmlNodeList solution = xmlDoc.GetElementsByTagName("plan");

var plan = new Plan(goal);

foreach (XmlNode solutionNode in solution)
{
    foreach (XmlNode childNode in solutionNode.ChildNodes)
    {
        if (childNode.Name == "#text" || childNode.Name == "#comment")
        {
            // Do not add text or comments as steps.
            // TODO - this could be a way to get Reasoning for a plan step.
            continue;
        }

        if (childNode.Name.StartsWith("function.", StringComparison.OrdinalIgnoreCase))
        {
            var pluginFunctionName = childNode.Name.Split(new string[] { "function." }, StringSplitOptions.None)?[1] ?? string.Empty;
            SplitPluginFunctionName(pluginFunctionName, out var pluginName, out var functionName);

            if (!string.IsNullOrEmpty(functionName))
            {
                var function = kernel.Functions.GetFunction(pluginName,functionName);
                if (function != null)
                {
                    var planStep = new Plan(function);

                    var functionVariables = new ContextVariables();
                    var functionOutputs = new List<string>();
                    var functionResults = new List<string>();

                    var view = function.Describe();
                    foreach (var p in view.Parameters)
                    {
                        functionVariables.Set(p.Name, p.DefaultValue);
                    }

                    if (childNode.Attributes is not null)
                    {
                        foreach (XmlAttribute attr in childNode.Attributes)
                        {
                            if (attr.Name.Equals("setContextVariable", StringComparison.OrdinalIgnoreCase))
                            {
                                functionOutputs.Add(attr.InnerText);
                            }
                            else if (attr.Name.Equals("appendToResult", StringComparison.OrdinalIgnoreCase))
                            {
                                functionOutputs.Add(attr.InnerText);
                                functionResults.Add(attr.InnerText);
                            }
                            else
                            {
                                functionVariables.Set(attr.Name, attr.InnerText);
                            }
                        }
                    }

                    planStep.Outputs = functionOutputs;
                    planStep.Parameters = functionVariables;
                    // Keys marked with appendToResult become outputs of the whole plan.
                    foreach (var result in functionResults)
                    {
                        plan.Outputs.Add(result);
                    }

                    plan.AddSteps(planStep);
                }
            }
        }
    }
}

Console.WriteLine(await kernel.RunAsync(plan));

static void SplitPluginFunctionName(string pluginFunctionName, out string pluginName, out string functionName)
{
    var pluginFunctionNameParts = pluginFunctionName.Split('.');
    pluginName = pluginFunctionNameParts?.Length > 1 ? pluginFunctionNameParts[0] : string.Empty;
    functionName = pluginFunctionNameParts?.Length > 1 ? pluginFunctionNameParts[1] : pluginFunctionName;
}

Since SemanticKernel’s Plan data structure allows Plans to be nested within Plans, it maps directly onto the XML structure: each function call in the XML is wrapped in a Plan step and added to the parent Plan.

In the end, SemanticKernel’s own Plan execution is still used to carry everything out. Executing a Plan in SemanticKernel means recursively executing its steps one by one, and at the bottom of the recursion each step is an ordinary SemanticKernel function.
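The step-by-step execution can be approximated without SemanticKernel by the sketch below. It is deliberately simplified and is not how the Plan type is implemented: the steps form a flat list rather than nested Plans, functions are plain delegates, and MiniStep and MiniPlanRunner are hypothetical names. What it does show is how a $NAME reference in a parameter is resolved from the outputs of earlier steps.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical step: which function to call, its raw parameters, and where to store the output.
public record MiniStep(string FunctionName, Dictionary<string, string> Parameters, string OutputKey);

public static class MiniPlanRunner
{
    // Matches context variable references such as $POEM or $INPUT.
    private static readonly Regex VariableReference = new Regex(@"\$(\w+)");

    // Runs the steps in order and returns the final context variables.
    public static Dictionary<string, string> Run(
        IEnumerable<MiniStep> steps,
        IReadOnlyDictionary<string, Func<IReadOnlyDictionary<string, string>, string>> functions,
        string input)
    {
        var context = new Dictionary<string, string> { ["INPUT"] = input };

        foreach (var step in steps)
        {
            // Resolve $NAME references against what earlier steps have produced.
            var arguments = new Dictionary<string, string>();
            foreach (var parameter in step.Parameters)
            {
                arguments[parameter.Key] = VariableReference.Replace(parameter.Value, match =>
                    context.TryGetValue(match.Groups[1].Value, out var value) ? value : match.Value);
            }

            // Call the function and store its single string output under the step's key.
            context[step.OutputKey] = functions[step.FunctionName](arguments);
        }

        return context;
    }
}
```

With a "poem" step that stores its output under POEM and a "translate" step whose input is $POEM, the second step receives the text produced by the first, which is the same data flow the XML plan describes.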

Having written the code up to here, it should be clear that SemanticKernel’s planner works as follows: a prompt written by million-dollar alchemists first turns the user’s requirement into a plan in XML format; C# code then parses the XML and converts it into a Plan object; finally, the Plan is executed step by step to fulfill the user’s needs.

Running the code above produces output roughly like the following. Feel free to swap in another name and see what comes out.

In a land where Mandarin is spoken,
There lived a man named Shui Ge, who was a fanatical fan.
For clear water,
He will laugh, he will cheer,
Splash in the water all day like only water people can do.

He would jump into the lake and roar loudly,
Swim in the river, from bank to bank,
In the sea, he will jump for joy,
In the rain, he will dance,
Oh, Brother Shui loves water, that’s for sure!

He would bathe in the puddle and be so happy,
Or drink from the stream, so peaceful,
There were splashes and splashes,
And a little hot pot soup,
Brother Shui, this water man, lives so happily! 

The code for this article is hosted on GitHub and Gitee. You are welcome to take a look.

You can obtain the source code of this article as follows. First create an empty folder, then cd into it on the command line and enter the following commands:

git init
git remote add origin https://gitee.com/lindexi/lindexi_gd.git
git pull origin f4448f4507145f1695b7ef81045ae030fc8f1a20

The commands above use the Gitee remote. If Gitee is not accessible, switch to the GitHub remote by entering the following commands:

git remote remove origin
git remote add origin https://github.com/lindexi/lindexi_gd.git
git pull origin f4448f4507145f1695b7ef81045ae030fc8f1a20

After getting the code, go to the SemanticKernelSamples\Example12_Planner folder