Table of Contents
DISSECT or GROK? Or both?
Use DISSECT to process data
Dissect pattern
Terminology
Example
DISSECT key modifiers
Right padding modifier (->)
Append modifier (+)
Append with order modifiers (+ and /n)
Named skip key (?)
Reference keys (* and &)
Use GROK to process data
Grok pattern
Regular expressions
Custom patterns
Example
Grok Debugger
Limitations
Your data may contain unstructured strings that you want to structure. This makes it easier to analyze the data. For example, a log message might contain IP addresses that you want to extract so that you can find the most active IP addresses.
If you have used Logstash or ingest pipelines, DISSECT and GROK will already be familiar. For background, you can refer to the following articles:
- Elasticsearch: Deep understanding of the Dissect ingest processor
- Elasticsearch: Difference between the Dissect and Grok processors
- Logstash: Use dissect to import documents in CSV format
- Logstash: Grok pattern examples for log parsing
Elasticsearch can structure data at index time or at query time. At index time, you can use the Dissect and Grok ingest processors, or the Logstash dissect and grok filters. At query time, you can use the ES|QL DISSECT and GROK commands.
DISSECT or GROK? Or both?
DISSECT works by breaking up strings using delimiter-based patterns. GROK works similarly but uses regular expressions. This makes GROK more powerful, but generally slower. DISSECT works well when the data is reliably repeated. When you really need the power of regular expressions, such as when the structure of the text varies from line to line, GROK is the better choice.
You can use DISSECT and GROK for mixed use cases. For example, when part of a line repeats reliably, but the entire line does not. DISSECT can deconstruct repeated line sections. GROK can use regular expressions to process the remaining field values.
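As a loose illustration of this mixed approach (in Python, not ES|QL), a line with a reliably repeated prefix and a variable tail can be handled in two steps: delimiter-based splitting for the stable part, then a regular expression for the rest. The log line and field names here are invented for the sketch.

```python
import re

# Hypothetical log line (invented for this sketch): the prefix repeats
# reliably, but the message tail varies from line to line.
line = "1.2.3.4 [2023-01-23T12:15:00.000Z] user=alice action=login"

# DISSECT-style step: plain delimiter-based splitting for the fixed prefix.
clientip, rest = line.split(" ", 1)
timestamp, message = rest.lstrip("[").split("] ", 1)

# GROK-style step: a regular expression for the variable remainder.
match = re.match(r"user=(?P<user>\w+) action=(?P<action>\w+)", message)
fields = {"clientip": clientip, "@timestamp": timestamp, **match.groupdict()}
```

The cheap split handles the repeated structure; the regular expression is only paid for on the part that actually needs it.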
Use DISSECT to process data
The DISSECT processing command matches a string against a delimiter-based pattern and extracts the specified keys into columns.
For example, the following pattern:
%{clientip} [%{@timestamp}] %{status}
Matches log lines of the following format:
1.2.3.4 [2023-01-23T12:15:00.000Z] Connected
and adds the following columns to the input table:
clientip:keyword | @timestamp:keyword | status:keyword |
---|---|---|
1.2.3.4 | 2023-01-23T12:15:00.000Z | Connected |
Dissect pattern
The Dissect pattern is defined by the portion of the string that will be discarded. In the previous example, the first part to be discarded was a single space. Dissect finds this space and assigns the value of clientip to everything before that space. Next, dissect matches [ and ], then assigns @timestamp to everything between [ and ]. Paying special attention to the parts of the string you want to discard will help you build successful dissect patterns.
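The delimiter-driven matching described above can be sketched in ordinary Python. The `dissect` helper below is a simplified toy, not Elasticsearch code: it renames `@timestamp` to `timestamp` (Python group names cannot contain `@`) and ignores key modifiers entirely.

```python
import re

def dissect(pattern, text):
    """Toy dissect-style matcher: everything outside %{key} sections is a
    literal delimiter; each key lazily captures text up to the next one."""
    regex = ""
    for literal, key in re.findall(r"([^%]*)(?:%\{(\w*)\})?", pattern):
        regex += re.escape(literal)
        if key:  # an empty key %{} is simply skipped, like a skip key
            regex += f"(?P<{key}>.*?)"
    m = re.fullmatch(regex, text)
    return m.groupdict() if m else None

row = dissect("%{clientip} [%{timestamp}] %{status}",
              "1.2.3.4 [2023-01-23T12:15:00.000Z] Connected")
```

Note how the delimiters (`" ["`, `"] "`) drive the match: each key simply consumes everything up to the next delimiter, which is exactly why identifying what to discard matters.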
An empty key (%{}) or a named skip key can be used to match a value but exclude it from the output.
All matching values are output as the keyword string data type. Use type conversion functions to convert to another data type.
Dissect also supports key modifiers that can change the default behavior of dissect. For example, you can instruct dissect to ignore certain fields, append fields, skip padding, etc.
Terminology
Name | Description |
---|---|
dissect pattern | Describes the set of fields and delimiters in the text format. Also known as a dissection. A dissection is described using a set of %{} sections: %{a} - %{b} - %{c} |
field | The text from %{ to } inclusive. |
delimiter | The text between a } and the next %{ characters. Any set of characters other than %{, 'not }', or } is a delimiter. |
key | The text between %{ and }, exclusive of the ?, +, and & prefixes and the ordinal suffix. Examples: %{?aaa} (key is aaa), %{+bbb/3} (key is bbb), %{&ccc} (key is ccc) |
Example
The following example parses a string containing a timestamp, some text, and an IP address:
ROW a = "2023-01-23T12:15:00.000Z - some text - 127.0.0.1" | DISSECT a "%{date} - %{msg} - %{ip}" | KEEP date, msg, ip
date:keyword | msg:keyword | ip:keyword |
---|---|---|
2023-01-23T12:15:00.000Z | some text | 127.0.0.1 |
By default, DISSECT outputs keyword string columns. To convert to other types, use type conversion functions:
ROW a = "2023-01-23T12:15:00.000Z - some text - 127.0.0.1" | DISSECT a "%{date} - %{msg} - %{ip}" | KEEP date, msg, ip | EVAL date = TO_DATETIME(date)
msg:keyword | ip:keyword | date:date |
---|---|---|
some text | 127.0.0.1 | 2023-01-23T12:15:00.000Z |
DISSECT key modifiers
Key modifiers can change the default behavior of dissect. Key modifiers may appear to the left or right of the key name and are always inside %{ and }. For example, %{+keyname->} has both the append and the right padding modifiers.
Modifier | Name | Position | Example | Description | Details |
---|---|---|---|---|---|
-> | Skip right padding | (far) right | %{keyname1->} | Skips any repeated characters to the right | link |
+ | Append | left | %{+keyname} %{+keyname} | Appends two or more fields together | link |
+ with /n | Append with order | left and right | %{+keyname/2} %{+keyname/1} | Appends two or more fields together in the order specified | link |
? | Named skip key | left | %{?ignoreme} | Skips the matched value in the output. Same behavior as %{} | link |
* and & | Reference keys | left | %{*r1} %{&r1} | Sets the output key to the * value and the output value to the & value | link |
Right padding modifier (->)
The dissection is exact: by default, every character in the pattern must match the input. The right padding modifier allows a key to skip any repeated characters (here, spaces) to its right.
Right padding modifier example:
Pattern
%{ts->} %{level}
Input
1998-08-10T17:15:42,466            WARN
Result
- ts = 1998-08-10T17:15:42,466
- level = WARN
The right padding modifier can be used with an empty key to help skip unwanted data. For example, the same input string, but wrapped in brackets, requires an empty right padded key to achieve the same result.
Right padding modifier example with empty key:
Pattern
[%{ts}]%{->}[%{level}]
Input
[1998-08-10T17:15:42,466]            [WARN]
Result
- ts = 1998-08-10T17:15:42,466
- level = WARN
Append modifier (+)
Dissect supports appending two or more results together in the output. Values are appended left to right. An append separator can be specified. In this example, the append_separator is defined as a space.
Append modifier example:
Pattern
%{+name} %{+name} %{+name} %{+name}
Input
john jacob jingleheimer schmidt
Result
- name = john jacob jingleheimer schmidt
Append with order modifiers (+ and /n)
Dissect supports appending two or more results together in the output. Values are appended in the order defined by /n. An append separator can be specified. In this example, the append_separator is defined as a comma.
Append with order modifier example:
Pattern
%{+name/2} %{+name/4} %{+name/3} %{+name/1}
Input
john jacob jingleheimer schmidt
Result
- name = schmidt,john,jingleheimer,jacob
Named skip key (?)
Dissect supports ignoring matches in the final result. This can be done with an empty key %{}, but for readability it may be desirable to give the skipped key a name.
Named skip key modifier example:
Pattern
%{clientip} %{?ident} %{?auth} [%{@timestamp}]
Input
1.2.3.4 - - [30/Apr/1998:22:00:52 +0000]
Result
- clientip = 1.2.3.4
- @timestamp = 30/Apr/1998:22:00:52 +0000
Reference keys (* and &)
Dissect supports using parsed values as the key/value pairs for structured content. Imagine a system that only partially logs key/value pairs. Reference keys allow you to preserve that key/value relationship.
Reference key modifier example:
Pattern
[%{ts}] [%{level}] %{*p1}:%{&p1} %{*p2}:%{&p2}
Input
[2018-08-10T17:15:42,466] [ERR] ip:1.2.3.4 error:REFUSED
Result
- ts = 2018-08-10T17:15:42,466
- level = ERR
- ip = 1.2.3.4
- error = REFUSED
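The reference-key idea can be approximated in plain Python. This sketch hard-codes the shape of the example line above and is only an illustration of the concept, not how dissect is implemented.

```python
# Sketch of the reference-key idea: the text matched by a "*" key becomes
# the output column name, and the matching "&" key's text becomes its value.
line = "[2018-08-10T17:15:42,466] [ERR] ip:1.2.3.4 error:REFUSED"

ts_part, level_part, *pairs = line.split(" ")
result = {"ts": ts_part.strip("[]"), "level": level_part.strip("[]")}
for pair in pairs:
    key, value = pair.split(":", 1)  # parsed key/value, as %{*p}:%{&p} would
    result[key] = value
```

The output keys `ip` and `error` were not named anywhere in advance; they came from the data itself, which is exactly what reference keys buy you.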
Use GROK to process data
The GROK processing command matches a string against a regular expression-based pattern and extracts the specified keys into columns.
For example, the following pattern:
%{IP:ip} \[%{TIMESTAMP_ISO8601:@timestamp}\] %{GREEDYDATA:status}
Matches log lines of the following format:
1.2.3.4 [2023-01-23T12:15:00.000Z] Connected
and adds the following columns to the input table:
@timestamp:keyword | ip:keyword | status:keyword |
---|---|---|
2023-01-23T12:15:00.000Z | 1.2.3.4 | Connected |
Grok pattern
The syntax for a grok pattern is %{SYNTAX:SEMANTIC}.
SYNTAX is the name of the pattern that matches your text. For example, 3.44 is matched by the NUMBER pattern, and 55.3.244.1 is matched by the IP pattern. The syntax is how you match.
SEMANTIC is the identifier you give to the piece of text being matched. For example, 3.44 could be the duration of an event, so you might call it duration. Similarly, the string 55.3.244.1 might identify the client making a request, so you could call it client.
By default, matched values are output as the keyword string data type. To convert a semantic's data type, suffix it with the target data type. For example, %{NUMBER:num:int} converts the num semantic from a string to an integer. Currently the only supported conversions are int and float. For other types, use type conversion functions.
For an overview of the available patterns, see GitHub. You can also use the REST API to retrieve a list of all patterns.
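The suffix-driven conversion described above (as in %{NUMBER:num:int}) can be mimicked in Python. This is only an illustration of the idea, that grok always matches a string first and converts afterwards, and not the GROK implementation.

```python
import re

# Sketch of %{NUMBER:num:int}: grok always captures a string first; an ":int"
# or ":float" suffix then selects a conversion to apply to the captured text.
converters = {"int": int, "float": float}

m = re.fullmatch(r"(?P<num>-?\d+(?:\.\d+)?)", "42")
value = converters["int"](m.group("num"))  # the string "42" becomes int 42
```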
Regular expressions
Grok is based on regular expressions. Any regular expression is also valid in grok. Grok uses the Oniguruma regular expression library. For the complete supported regular expression syntax, see the Oniguruma GitHub repository.
Note: Special regular expression characters such as [ and ] need to be escaped with \. For example, in the previous pattern:
%{IP:ip} \[%{TIMESTAMP_ISO8601:@timestamp}\] %{GREEDYDATA:status}
In ES|QL queries, the backslash character itself is a special character and needs to be escaped with another \. For this example, the corresponding ES|QL query becomes:
ROW a = "1.2.3.4 [2023-01-23T12:15:00.000Z] Connected" | GROK a "%{IP:ip} \[%{TIMESTAMP_ISO8601:@timestamp}\] %{GREEDYDATA:status}"
Custom patterns
If grok doesn’t have the pattern you need, you can use the Oniguruma syntax for named capture, which lets you match a piece of text and save it as a column:
(?<field_name>the pattern here)
For example, the queue id for a postfix log is a 10 or 11 character hexadecimal value. This can be captured into a column called queue_id using:
(?<queue_id>[0-9A-F]{10,11})
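The same capture works in Python, with the small caveat that Python's re module spells named groups (?P<name>...) where Oniguruma also accepts (?<name>...); the idea is identical. The sample log line below is invented for the sketch.

```python
import re

# Named capture in Python's re module: (?P<name>...) rather than Oniguruma's
# (?<name>...). The queue id is a 10- or 11-character hexadecimal value.
log = "postfix/smtpd[31499]: BEF25A72965: client=unknown[1.2.3.4]"
m = re.search(r"(?P<queue_id>[0-9A-F]{10,11})", log)
queue_id = m.group("queue_id") if m else None  # "BEF25A72965"
```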
Example
The following example parses a string containing a timestamp, an IP address, an email address, and a number:
ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 [email protected] 42" | GROK a "%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num}" | KEEP date, ip, email, num
date:keyword | ip:keyword | email:keyword | num:keyword |
---|---|---|---|
2023-01-23T12:15:00.000Z | 127.0.0.1 | [email protected] | 42 |
By default, GROK outputs keyword string columns. The int and float types can be converted by appending :type to the semantic in the pattern. For example %{NUMBER:num:int}:
ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 [email protected] 42" | GROK a "%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num:int}" | KEEP date, ip, email, num
date:keyword | ip:keyword | email:keyword | num:integer |
---|---|---|---|
2023-01-23T12:15:00.000Z | 127.0.0.1 | [email protected] | 42 |
For other type conversions, use type conversion functions:
ROW a = "2023-01-23T12:15:00.000Z 127.0.0.1 [email protected] 42" | GROK a "%{TIMESTAMP_ISO8601:date} %{IP:ip} %{EMAILADDRESS:email} %{NUMBER:num:int}" | KEEP date, ip, email, num | EVAL date = TO_DATETIME(date)
ip:keyword | email:keyword | num:integer | date:date |
---|---|---|---|
127.0.0.1 | [email protected] | 42 | 2023-01-23T12:15:00.000Z |
Grok Debugger
To write and debug grok patterns, you can use the Grok Debugger. It provides a UI for testing patterns against sample data. Under the hood, it uses the same engine as the GROK command.
Limitations
The GROK command does not support configuring custom patterns or multiple patterns per command. The GROK command is not subject to the Grok watchdog settings.