Presto Resource Management: Resource Groups and Selectors

Article directory

  • Preface
  • Resource group configuration
  • Selector Rules
  • Global Properties
  • Selector Properties
  • Configuration case
    • Configuration



Preface

Resource groups place limits on resource usage and can enforce queuing policies on the queries that run within them, or divide their resources among subgroups. A query belongs to a single resource group and consumes resources from that group (and its ancestors). Except for the limit on queued queries, a resource group running out of resources does not cause running queries to fail; instead, new queries become queued. A resource group may either have subgroups or accept queries, but not both.

Resource groups and associated selection rules are configured by pluggable managers. To enable the built-in manager to read JSON configuration files, add an etc/resource-groups.properties file with the following content:

resource-groups.configuration-manager=file
resource-groups.config-file=etc/resource_groups.json

Change the value of resource-groups.config-file to point to a JSON configuration file, specified either as an absolute path or as a path relative to the Presto data directory.
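
Before the property reference below, a minimal resource_groups.json could look roughly like the following sketch. The group name adhoc and the limit values are placeholders for illustration, not recommendations:

{
  "rootGroups": [
    {
      "name": "adhoc",
      "softMemoryLimit": "50%",
      "hardConcurrencyLimit": 10,
      "maxQueued": 100
    }
  ],
  "selectors": [
    {
      "group": "adhoc"
    }
  ]
}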

Resource group configuration

  • name (required): The name of the group. Can be a template (see below).

  • maxQueued (required): The maximum number of queued queries. Once this limit is reached, new queries will be rejected.

  • hardConcurrencyLimit (required): Maximum number of concurrently running queries.

  • softMemoryLimit (required): The maximum amount of distributed memory that can be used by this group before a new query is queued. Can be specified as an absolute value (for example, 1GB) or as a percentage of cluster memory (for example, 10%).

  • softCpuLimit (optional): Maximum amount of CPU time this group may use over a period (see cpuQuotaPeriod) before a penalty is applied to the maximum number of running queries. hardCpuLimit must also be specified.

  • hardCpuLimit (optional): The maximum amount of CPU time this group can use over a period of time.

  • schedulingPolicy (optional): Specifies how queries waiting to be run in the queue are selected and determines when subgroups are eligible to launch their queries. Can be one of the following three values:

    • fair (default): Queued queries are processed in first-in, first-out order, and subgroups must take turns starting new queries (if they have queued queries).
    • weighted_fair: Subgroups are selected based on their schedulingWeight and the number of queries they already have running concurrently. The expected share of running queries is computed for all currently eligible subgroups based on their weights, and the subgroup with the smallest concurrency relative to its share is selected to start the next query.
    • weighted: Queued queries are selected stochastically in proportion to their priority, which is specified via the query_priority session property. Subgroups are selected to start new queries in proportion to their schedulingWeight.
    • query_priority: All subgroups must also configure query_priority. Queued queries will be selected strictly based on their priority.
  • schedulingWeight (optional): The weight of this subgroup. See instructions above. Default is 1.

  • jmxExport (optional): If set to true, group statistics will be exported to JMX for monitoring. Default is false.

  • perQueryLimits (optional): Specifies the maximum resources each query in this resource group may use. Queries that exceed any of these limits are terminated. These limits are not inherited from the parent group. Three types of limits can be set:

    • executionTimeLimit (optional): Specifies the absolute maximum time a query can execute (for example, 1h).
    • totalMemoryLimit (optional): Specifies the absolute maximum distributed memory that a query can use (for example, 1GB).
    • cpuTimeLimit (optional): Specifies the absolute maximum CPU time that a query can use (for example, 1h).
  • subGroups (optional): List of subgroups.

Note that the required attributes must be configured for every group in the JSON configuration file; a small sketch combining several of the properties above follows.
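
For example, a single leaf group combining several of these properties could be sketched as follows. This is only an illustrative sketch: the group name reporting and all limit values are made-up placeholders, and perQueryLimits is assumed to be written as a nested object with the three fields described above.

{
  "name": "reporting",
  "softMemoryLimit": "30%",
  "hardConcurrencyLimit": 20,
  "maxQueued": 200,
  "softCpuLimit": "4h",
  "hardCpuLimit": "6h",
  "jmxExport": true,
  "perQueryLimits": {
    "executionTimeLimit": "1h",
    "totalMemoryLimit": "10GB",
    "cpuTimeLimit": "2h"
  }
}

Because softCpuLimit and hardCpuLimit are set here, cpuQuotaPeriod would also have to be configured at the top level of the file (see Global Properties below).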

Selector Rules

  • user (optional): Regular expression used to match usernames.
  • source (optional): Regular expression used to match the source string.
  • queryType (optional): A string used to match the type of query submitted:
    • DATA_DEFINITION: Queries used to modify/create/delete metadata for schemas/tables/views and manage prepared statements, permissions, sessions and transactions.
    • DELETE: DELETE query.
    • DESCRIBE: DESCRIBE, DESCRIBE INPUT, DESCRIBE OUTPUT, and SHOW queries.
    • EXPLAIN: EXPLAIN query.
    • INSERT: INSERT and CREATE TABLE AS queries.
    • SELECT: SELECT query.
  • clientTags (optional): list of tags. To match, each tag in this list must be in the client-supplied tag list associated with the query.
  • group (required): The group in which these queries will run.
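
Putting these fields together, a single selector entry might be sketched like this. The user and source regular expressions, the query type, and the tag are made-up placeholders; the target group reuses the pipeline_${USER} template from the configuration case below:

{
  "user": "alice|bob",
  "source": ".*etl.*",
  "queryType": "INSERT",
  "clientTags": ["nightly"],
  "group": "global.pipeline.pipeline_${USER}"
}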

Selectors are processed in order, and the first matching selector is used.

Global Properties

  • cpuQuotaPeriod (optional): The period over which CPU quotas are enforced.

Selector Properties

The source name can be set as follows:

  • CLI: Use the --source option.
  • JDBC: Set the ApplicationName client information property on the Connection instance.

Client tags can be set as follows:

  • CLI: Use the --client-tags option.
  • JDBC: Set the ClientTags client information property on the Connection instance.
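
For the JDBC case, both client info properties can be set on a standard java.sql.Connection. A minimal sketch, assuming the Presto JDBC driver is on the classpath and a coordinator is reachable at the hypothetical address example-coordinator:8080 with a hive.default schema:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ResourceGroupTaggingExample {
    public static void main(String[] args) throws Exception {
        // Hypothetical coordinator address, catalog, and schema.
        Connection connection = DriverManager.getConnection(
                "jdbc:presto://example-coordinator:8080/hive/default", "alice", null);

        // "ApplicationName" is what the selector's "source" regular expression matches.
        connection.setClientInfo("ApplicationName", "etl-pipeline");

        // "ClientTags" is a comma-separated list matched against "clientTags".
        connection.setClientInfo("ClientTags", "hipri,fast");

        try (Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery("SELECT 1")) {
            while (resultSet.next()) {
                System.out.println(resultSet.getLong(1));
            }
        }
        connection.close();
    }
}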

Configuration case

In the example configuration below, there are several resource groups, some of which are templates. Templates allow administrators to build the resource group tree dynamically. For example, in the pipeline_${USER} group, ${USER} expands to the name of the user who submitted the query. ${SOURCE} is also supported, and expands to the source from which the query was submitted. You can also use custom named variables in the source and user regular expressions.

There are five selectors that define which queries run in which resource groups:

  • The first selector matches queries from bob and places them in the admin group.
  • The second selector matches all data definition (DDL) queries whose source name contains "pipeline" and places them in the global.data_definition group. This helps reduce queuing time for such queries, since they are expected to be fast.
  • The third selector matches queries whose source name contains "pipeline" and places them in a per-user pipeline group created dynamically under the global.pipeline group.
  • The fourth selector matches queries from BI tools whose source matches the regular expression jdbc#(?<tool_name>.*) and whose client-provided tags are a superset of "hipri". These queries are placed in dynamically created subgroups under the global.adhoc.bi-${tool_name} group. The dynamic subgroup is created from the named variable tool_name, which is extracted from the source regular expression. Consider a query with source "jdbc#powerfulbi", user "kayla", and client tags "hipri" and "fast"; it is routed to the global.adhoc.bi-powerfulbi.kayla resource group.
  • The last selector is a catch-all that places all queries not yet matched into a per-user adhoc group.

Together, these selectors implement the following strategies:

  • User "bob" is an administrator and can run up to 50 queries concurrently. Queries are run according to user-provided priority.

For the remaining users:

  • Up to 100 total queries can run simultaneously.
  • Up to 5 concurrent DDL queries with a source of "pipeline" can run. Queries are run in first-in, first-out order.
  • Non-DDL queries run under the global.pipeline group, with a total concurrency of 45 and a per-user concurrency of 5. Queries are run in first-in, first-out order.
  • For BI tools, each tool can run up to 10 queries simultaneously, and each user can run up to 3. If the total demand exceeds the limit of 10, the user with the fewest running queries gets the next concurrency slot. This achieves fairness under contention.
  • All remaining queries are placed in a per-user group under global.adhoc.other, which behaves similarly.

Configuration

{
  "rootGroups": [
    {
      "name": "global",
      "softMemoryLimit": "80%",
      "hardConcurrencyLimit": 100,
      "maxQueued": 1000,
      "schedulingPolicy": "weighted",
      "jmxExport": true,
      "subGroups": [
        {
          "name": "data_definition",
          "softMemoryLimit": "10%",
          "hardConcurrencyLimit": 5,
          "maxQueued": 100,
          "schedulingWeight": 1
        },
        {
          "name": "adhoc",
          "softMemoryLimit": "10%",
          "hardConcurrencyLimit": 50,
          "maxQueued": 1,
          "schedulingWeight": 10,
          "subGroups": [
            {
              "name": "other",
              "softMemoryLimit": "10%",
              "hardConcurrencyLimit": 2,
              "maxQueued": 1,
              "schedulingWeight": 10,
              "schedulingPolicy": "weighted_fair",
              "subGroups": [
                {
                  "name": "${USER}",
                  "softMemoryLimit": "10%",
                  "hardConcurrencyLimit": 1,
                  "maxQueued": 100
                }
              ]
            },
            {
              "name": "bi-${tool_name}",
              "softMemoryLimit": "10%",
              "hardConcurrencyLimit": 10,
              "maxQueued": 100,
              "schedulingWeight": 10,
              "schedulingPolicy": "weighted_fair",
              "subGroups": [
                {
                  "name": "${USER}",
                  "softMemoryLimit": "10%",
                  "hardConcurrencyLimit": 3,
                  "maxQueued": 10
                }
              ]
            }
          ]
        },
        {
          "name": "pipeline",
          "softMemoryLimit": "80%",
          "hardConcurrencyLimit": 45,
          "maxQueued": 100,
          "schedulingWeight": 1,
          "jmxExport": true,
          "subGroups": [
            {
              "name": "pipeline_${USER}",
              "softMemoryLimit": "50%",
              "hardConcurrencyLimit": 5,
              "maxQueued": 100
            }
          ]
        }
      ]
    },
    {
      "name": "admin",
      "softMemoryLimit": "100%",
      "hardConcurrencyLimit": 50,
      "maxQueued": 100,
      "schedulingPolicy": "query_priority",
      "jmxExport": true
    }
  ],
  "selectors": [
    {
      "user": "bob",
      "group": "admin"
    },
    {
      "source": ".*pipeline.*",
      "queryType": "DATA_DEFINITION",
      "group": "global.data_definition"
    },
    {
      "source": ".*pipeline.*",
      "group": "global.pipeline.pipeline_${USER}"
    },
    {
      "source": "jdbc#(?<tool_name>.*)",
      "clientTags": ["hipri"],
      "group": "global.adhoc.bi-${tool_name}.${USER}"
    },
    {
      "group": "global.adhoc.other.${USER}"
    }
  ],
  "cpuQuotaPeriod": "1h"
}