Prismo Systems, USA
Web application vulnerabilities are an important entry vector for threat actors. According to the 2019 Verizon Data Breach Incident Report [1], web applications, privilege misuse and miscellaneous errors account for 81 per cent of breaches of retail organizations.
In a paper we presented at VB2019 [2], we detailed query and parameter integrity algorithms used to detect SQL, NoSQL and OS command injection exploitation. In this follow-up paper, we detail algorithms that can be used to detect SQL injection in stored procedures, persistent cross-site scripting (XSS), and server‑side request forgery (SSRF) by instrumenting web applications. Server-side request forgery allows a threat actor to access internal resources by leveraging a vulnerability in Internet-facing web applications – which can be identified by data flow analysis. SQL injection in stored procedures will lead to an additional clause in the executing SQL query, and persistent XSS in the database will lead to HTML elements in the executing query. This additional code, in the form of SQL clauses or HTML elements due to the injection-based exploitation, lays the foundation for the detection algorithms that are discussed in this paper.
Like SQL queries, stored procedures can also be vulnerable to SQL injection. If the stored procedures are using dynamic SQL and the dynamic SQL query is constructed by concatenating the parameters then the stored procedures are prone to SQL injection exploitation. Figure 1 shows an example of a vulnerable stored procedure.
If the @user variable is admin’-- and the password is none, the query becomes:
'SELECT * FROM authtable WHERE UserName = 'admin'--' AND Pass = 'none' '
If there is a username admin, the query will return true and authentication will be successful.
In order to detect SQL injection in stored procedures, honey SQL queries are used. The following are some of examples of honey queries:
SELECT * WHERE id = '[arguments passed to Stored procedure]' AND honey_value = '2' ,
(SELECT * WHERE id = '[arguments passed to Stored procedures]' AND honey_value = '1')
(SELECT * WHERE id = [Arguments passed to Stored procedures] AND honey_value = '1')
SELECT * WHERE id = [Arguments passed to Stored procedure] AND honey_value = '1'
SELECT * WHERE id = '''[arguments passed to Stored procedure]''' AND honey_value = '2' ,
(SELECT * WHERE id = '''[arguments passed to Stored procedures]''' AND honey_value = '1')
'SELECT * WHERE id = "%s" AND honey_value = "2"'
The arguments passed to the stored procedures are extracted and inserted into honey queries. If an argument is inserted in the first honey query, the normalized honey SQL query will always be SELECT * WHERE id = $1 AND honey_value = $2 and the parse tree of the normalized first SQL query will be as shown in Figure 2. The argument of the stored procedure will always be data and not an SQL clause.
In the case of an SQL injection exploit in a stored procedure, the input parameters to the stored procedure will contain data along with the SQL clause. An example of an exploit sent to a stored procedure is: "abhi' ORDER BY 6-- upxr". When the exploit is inserted in the first honey SQL query, the query becomes SELECT * WHERE id = 'abhi' ORDER BY 6-- upxr AND honey_value = '2'. The normalized honey query with exploit becomes SELECT * WHERE id = $1 ORDER BY $2-- upxr AND honey_value = '2'. The parse tree of the normalized query with SQL injection exploit is as shown in Figure 3. SQL injection exploits the stored procedures and inserts additional clauses, leading to changes in the parse tree of the normalized honey query.
The algorithm to detect SQL injection exploitation in stored procedures makes use of application-level hooks to instrument the functions which invoke database stored procedures, setting designated parameters and executing the stored procedures such as prepareCall(), setString() and execute(). The instrumentation helps to reveal the parameters that are passed to the stored procedures. The detection algorithm makes use of a set of honey queries. A parse tree of the normalized honey queries is computed and stored.
During every invocation of the API that invokes the stored procedures, designated parameters are assigned to the stored procedures and the procedures are executed; arguments passed to the stored procedures are extracted. These extracted parameters are then passed to the honey SQL queries, a parse tree of the honey queries is computed and compared with the pre-stored parse tree. If there is an additional or deleted node in the parse tree of the honey queries, an alarm for SQL injection in the stored procedures is raised.
In 2019 the server-side request forgery exploitation technique [3] was used to retrieve AWS (Amazon Web Services) credentials that were subsequently used to steal the personal information of over 100 million Capital One customers.
In any traditional network, local host, web-based services and the internal networks are behind a firewall. SSRF allows a threat actor to exploit a vulnerability in a web application and to make an HTTP request to the local host, web-based services or in the internal networks. Figure 4 shows the vulnerable code of the Google Forms WordPress plug-in [4], which is prone to SSRF.
If a threat actor sends "http://[email protected]" the request will pass the regular expression check in the code shown in Figure 4 and will be sent to the internal IP at the address denoted by "internalip.com". This will prompt a response from the services hosted in the internal network. As per the RFC 3986 [5], the structure of the URI will be as shown in Figure 5.
As per RFC 3986, the authority component is preceded by a double slash (“//”) and is terminated by the next slash (“/”), question mark (“?”) or number sign (“#”) character, or by the end of the URI.
RFC 3986 also specifies the format of authority, as shown in Figure 6.
So if the exploit "http://[email protected]" is sent to a function which parses input, such as urllib.parse, it will be parsed and will give the output value of the host as legitimatewebsite.com while urllib.urlopen() will show the value of the input as internalip.com. This mismatch in the value of the host allows a threat actor to bypass the checks in web applications. Besides the mismatch in the value of the host, RFC 3986 also specifies the option of providing host as IP‑literal, IPv4Address or reg-name.
This means that any check for IP by a web application must ensure that the legitimate IP addresses are checked in every format.
The algorithm to detect SSRF instruments APIs such as urllib.urlopen(), urllib.request.urlopen(), etc., which take a URI as an input parameter and open a network object denoted by the URI to read it. In addition, methods that accept user inputs, such as GET, POST, etc., are also instrumented. A program dependency graph is then used to identify the APIs that make network connections and accept inputs from methods that accept user inputs such as GET() and POST(). For every invocation of an API that opens a URI, a check is made to determine if the IP address of the URL to which the connection is going is either local, a loopback address, or the local link address. If the condition is found to be true, then by using the data flow graph, it can be checked whether the parameters passed to the API which opens a network object denoted by the URI are from a method which accepts external input. If this condition is found to be true, then an alert for SSRF is raised. The internal IP address as per RFC 1918 is shown in Figure 8.
The loopback IP address for most operating systems is 127.0.0.1 ~ 127.255.255.254. If the URI is a file, then the data flow graph is used to check whether the parameters passed to the APIs which open files are from methods which accept external inputs. If the condition is found to be true then an alert for SSRF is raised.
In the case of a persistent XSS vulnerability, methods that accept user inputs, such as POST, etc., are used by a threat actor to inject an XSS exploit. These exploits then get stored in the backend database or in a secure store. This stored data is then sent to the victim without HTML escaping, where the XSS exploit gets executed.
Figure 9 shows exploitable code [6] for persistent XSS in a WordPress plug-in and the changes made to the code to remove the vulnerability. As shown in the code, there is insufficient sanitization of the plug-in version numbers before they get displayed by invoking the printf API on the corresponding plug-in page in the repository. This version number is retrieved from the database. Threat actors can exploit the vulnerability by inserting an XSS exploit with JavaScript code in the version number. This will lead to the execution of the JavaScript code. As shown in Figure 9, in the patched code the esc_html() function has been added to escape the string to ensure it is not passed as HTML.
The algorithm to detect persistent cross-site scripting makes use of application-level hooks to construct a program dependency graph (PDG). The PDG captures the flow of data and control from methods which accept external or user input, such as GET, POST, Cookies, etc., to the functions which execute the SQL query such as mysql_query(), mysql_db_query(), mysql_unbuffered_query(), pg_execute(), pg_query(), pg_query_params(), pg_prepare(), pg_send_query(), and pg_send_query_params(). Once the PDG is generated, it is used to identify the legitimate SQL queries, which are the sink for the value from the methods which accept user or external input such as GET, POST, etc. For every invocation of the query execution function a data flow graph is used to check if the input to the query execution function is from user input. If the condition is found to be true, the input is parsed against the HTML parser [7] to check if the input is an HTML element. If this condition is found to be true, an alert for persistent XSS is raised.
The algorithm to detect SQL injection in stored procedures, persistent XSS, and SSRF makes use of application-level hooks. Injection-based exploitation leads to additional code, resulting in changes to the legitimate code of the application. The computation of changes in the code is carried out for every access to the database. If there is a deviation from the original code, which is identified by changes in the parse tree of honey queries or by executing the external inputs to the executing query against the HTML parser, an alert for injection exploitation is raised.
The algorithm to detect injection-based exploitation has the following inherent advantages:
The parameter and honey query integrity algorithm follows the principle of detect, respond, and remediate. Not only does the algorithm detect exploitation, but responsive measures can be applied to stop exploitation; it also provides remedial action, which will increase the exploitation complexity for a threat actor. In the case of the parameter and query integrity algorithm, remedial action is patching the vulnerable code path, which is automatically identified with each detected exploitation attempt. If the code is patched, detection alerts will decrease, increasing the exploitation complexity for a threat actor.
[1] 2019 Data Breach Investigations Report. https://enterprise.verizon.com/resources/reports/2019-data-breach-investigations-report.pdf.
[2] Singh, A.; Mani, R. Catch me if you can: detection of injection exploitation by validating query and API integrity. https://www.virusbulletin.com/virusbulletin/2020/01/vb2019-paper-catch-me-if-you-can-detection-injection-exploitation-validating-query-and-api-integrity/.
[3] Krebs, B. What we can learn from the capital one hack. https://krebsonsecurity.com/tag/capital-one-breach/.
[4] Google Forms <= 0.91 - Unauthenticated Server-Side Request Forgery (SSRF). https://wpvulndb.com/vulnerabilities/9013.
[5] Uniform Resource Identifier (URI): Generic Syntax RFC 3986. https://tools.ietf.org/html/rfc3986.
[6] Code changes to remove persistent XSS in WordPress. https://meta.trac.wordpress.org/changeset/7195.
[7] HTML parser. https://godoc.org/golang.org/x/net/html.