Validation and SQL Injection
Validating User Input
As many of you might be aware, one of the staples of web application security is making sure that data passed to you by the user won't break your application or otherwise damage your data. Just as you'd check who is at the door of your house before letting them in, you should check user-supplied data to see whether it is acceptable for whatever you intend to use it for.
Another reason to validate input is that it lets you give the user more intuitive responses. They may hit a letter by accident while typing in a number. It is much clearer to the user if you tell them they've made a mistake and let them correct it, rather than have the application break or, worse, continue with an unexpected result.
In PHP, user input arrives as one of two types: an array, or a string. Since the most common form of validation involves checking to see if the user has provided a valid integer representation inside one of the string values, we will focus on this point.
What Could Possibly Happen If I Don't?
Well, it largely depends on what your application is doing. It could be as simple as throwing a big ugly error, but it could also be a lot worse. Take the following highly insecure code example:
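A minimal sketch of the kind of code being described, not a definitive listing. The helper name build_insecure_query(), the sample input, and the commented-out connection calls are assumptions made for illustration; the query text follows the one quoted later in this article.

```php
<?php
// INSECURE: for illustration only. Do not use this pattern.

// Hypothetical helper: builds the query straight from request data,
// with no validation and no escaping.
function build_insecure_query(array $get, int $user_id): string
{
    return "SELECT secret_data FROM mytable"
        . " WHERE string_col = '{$get['string_col']}'"
        . " AND int_col = {$get['int_col']}"
        . " AND user_id={$user_id}";
}

// The surrounding script would connect to 'myserver.com' and run the query,
// for example with the legacy mysql extension (removed in PHP 7.0):
//   mysql_connect('myserver.com', $db_user, $db_password);
//   $result = mysql_query(build_insecure_query($_GET, $user_id));

$user_id = 1; // set in code, not supplied by the user

// Sample input standing in for $_GET; note the non-integer int_col.
echo build_insecure_query(['string_col' => 'abc', 'int_col' => 'oops'], $user_id), "\n";
```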
This piece of code seems simple enough. Open a connection to 'myserver.com', and select a row from 'mytable'. What could possibly go wrong? As it turns out, quite a lot. This particular piece of code has two very distinct problems which we'll look at.
First off, we're not validating the user input at all. This is a problem whenever the program expects a certain kind of input and we don't explicitly check for it. I've purposefully labeled the columns "int_col" and "string_col" to let you know what types they are in the table. Notice that one is specifically an integer. Now, if the user supplies a non-integer value for that column, the query will fail with an error.
What not to do
Many people will be tempted to use one or more of the following when faced with validating integers: casting to INT, ctype_digit(), or is_numeric().
These are all the wrong ways to approach this problem.
Why casting to INT is bad
Casting to INT is often not a good idea because the behaviour is confusing to the user. For example, if they accidentally enter the value "123r5", casting to INT will turn it into 123. This could lead to results that simply don't make sense from the user's perspective. It is a much better idea to reject the input and ask them to enter a whole number.
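The truncation is easy to reproduce. A small sketch, where the variable name $input is only illustrative:

```php
<?php
// What the user typed by accident.
$input = '123r5';

// The cast keeps the leading digits and silently drops the rest.
var_dump((int) $input);   // int(123)

// Input with no leading digits quietly becomes zero.
var_dump((int) 'abc');    // int(0)
```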
Why ctype_digit() is bad
ctype_digit() has two main flaws when dealing with integer validation:
- It is incapable of dealing with signed numbers, so we can't validate negative values
- If it is given an actual integer rather than a string, it treats values from -128 to 255 as the ASCII code of a single character rather than as a number
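Both flaws can be seen in a short sketch. The variable names are illustrative, and the integer-argument behaviour shown is that of PHP 8.x and earlier, where it is deprecated as of PHP 8.1 (hence the @ to silence the notice).

```php
<?php
// Strings of digits pass, but a leading sign is rejected.
$plain  = ctype_digit('123'); // true
$signed = ctype_digit('-5');  // false

// An integer argument between -128 and 255 is read as an ASCII code.
$five        = @ctype_digit(5);  // false: ASCII 5 is a control character
$fifty_three = @ctype_digit(53); // true: ASCII 53 is the character "5"

var_dump($plain, $signed, $five, $fifty_three);
```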
Why is_numeric() is bad
Many people reach for is_numeric() thinking this is a good approach, and it will seem to work just fine. However, the problem with is_numeric() isn't that it can't detect an integer; it is that it detects a lot more than just integers. All of the following are valid numbers to is_numeric():
- 1
- 1.123
- 0xFF (before PHP 7.0 only; later versions no longer treat hexadecimal strings as numeric)
- +0123.45e6
See the problem? If we really want just an integer, this function is too broad.
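A quick way to check the list above against your own PHP version. This is only a sketch; the $candidates array is an illustrative name.

```php
<?php
// Candidate inputs, all passed as strings the way request data arrives.
$candidates = ['1', '1.123', '0xFF', '+0123.45e6'];

foreach ($candidates as $candidate) {
    // Report what is_numeric() makes of each value on this PHP version.
    printf("%-12s %s\n", $candidate, is_numeric($candidate) ? 'numeric' : 'not numeric');
}
```

The decimal and exponent forms are accepted on every version, which is already too permissive when only a whole number will do.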
So, how do I fix it?
The best way to fix this sort of validation issue is to use PHP's filter extension. It is designed to help you check and sanitize user input. For example, we could change our application to use filter_var() and it might look like this:
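A minimal sketch, assuming a hypothetical helper named validate_int_col() and an illustrative error message. Note the strict comparison with false: a plain truthiness check would wrongly reject a valid 0.

```php
<?php
// Hypothetical helper: returns the integer, or null when the input is not
// a valid whole number.
function validate_int_col($raw): ?int
{
    $value = filter_var($raw, FILTER_VALIDATE_INT);

    // filter_var() returns false on failure, so compare strictly.
    return $value === false ? null : $value;
}

$int_col = validate_int_col($_GET['int_col'] ?? null);

if ($int_col === null) {
    // Stop here: re-render the form with an error instead of querying.
    echo "Please enter a whole number.\n";
} else {
    echo "int_col is valid: {$int_col}\n";
}
```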
Now if we try to enter anything but an integer for $_GET['int_col'], the code won't get to the point where it asks MySQL for data. These kinds of validations are perfect for form data: they let you render the form again with errors, telling the user that they have made a mistake and should try again. However, we're not quite finished yet. One problem remains: the dreaded SQL injection.
What Is an SQL Injection Attack?
An SQL injection attack is when a user injects SQL commands into an unprotected SQL query. This can lead to a number of issues, including modified rows you didn't intend the user to modify, dropped tables, deleted rows, and access to possibly sensitive data. It is critical that you learn and understand how these attacks work. SQL injection attacks are arguably the most common way PHP websites get exploited. Their importance cannot be overstated.
In the original code, the part where we build our query looks like this:
$query = "SELECT secret_data FROM mytable WHERE string_col = '{$_GET['string_col']}' AND int_col = {$_GET['int_col']} AND user_id={$user_id}";
If I call: http://mysite.com/my_vulnerable_file.php?string_col=some_data%27+OR+1%3D1+--+&int_col=1 then:
- $_GET['string_col'] has a value of "some_data' OR 1=1 -- "
- $_GET['int_col'] has a value of "1"
- $user_id has a value of "1" (we set this in code, it is not a user supplied value)
When the code gets to the point where it builds the query, it winds up looking something like this:
SELECT secret_data FROM mytable WHERE string_col = 'some_data' OR 1=1 -- ' AND int_col = 1 AND user_id=1
Notice the double dash. Followed by a space, as it is here, this is a MySQL comment token, and it causes everything after it to be ignored. To MySQL, the query now looks like this:
SELECT secret_data FROM mytable WHERE string_col = 'some_data' OR 1=1
This query now ignores both our $user_id variable and the int_col clause in favour of what the attacker has chosen. You can probably imagine how bad this could get if, instead of a SELECT, the query happened to be an UPDATE or DELETE. Depending on the database library, it may also be possible to specify multiple queries in the same string, meaning the innocent SELECT could have an entirely new query piggy-backed onto it.
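The walk-through above can be reproduced without a database. In this sketch, parse_str() stands in for the way PHP fills $_GET from the URL, and the variable names are illustrative.

```php
<?php
// Query string taken from the attack URL.
$query_string = 'string_col=some_data%27+OR+1%3D1+--+&int_col=1';

// Decode it into an array, as PHP does for $_GET.
parse_str($query_string, $get);

$user_id = 1; // set in code, not supplied by the user

// The same unprotected query construction as before.
$query = "SELECT secret_data FROM mytable WHERE string_col = '{$get['string_col']}' AND int_col = {$get['int_col']} AND user_id={$user_id}";

// Shows the attacker's quote closing the string value early.
echo $query, "\n";
```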
How to Deal With Injection Attacks
So, we started off protecting our integer value, but what about the string value? Technically, a string could be any sequence of characters. If your program needs to allow them to be anything, we still have to protect our code. So how do we do this?
As with many things, there are several ways to deal with this problem depending on what database library you're using. To start off, we'll look at the standard mysql library used in the previous examples. The best way to avoid injection attacks when dealing with the standard mysql library is to escape your parameters. We can do this with mysql_real_escape_string(). Be aware that this library was deprecated in PHP 5.5 and removed in PHP 7.0; its successor, mysqli, offers the equivalent mysqli_real_escape_string().
For example:
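A hedged sketch of the idea. Because the legacy mysql extension no longer exists in current PHP, this version swaps in the mysqli equivalent, real_escape_string(). The helper name build_escaped_query() and the credentials in the usage comment are assumptions, not part of the original example.

```php
<?php
// Hypothetical helper. Assumes $db is an open mysqli connection and that
// $int_col has already been validated as an integer.
function build_escaped_query(mysqli $db, string $string_col, int $int_col, int $user_id): string
{
    // Escape characters that would otherwise end the quoted value early.
    $safe_string = $db->real_escape_string($string_col);

    return "SELECT secret_data FROM mytable"
        . " WHERE string_col = '{$safe_string}'"
        . " AND int_col = {$int_col}"
        . " AND user_id={$user_id}";
}

// Usage, with placeholder credentials:
//   $db = new mysqli('myserver.com', 'db_user', 'db_password', 'mydb');
//   $query = build_escaped_query($db, $_GET['string_col'], $int_col, $user_id);
```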
So what's happening here? Consider our previous example where the attacker was sending a malicious string.
- some_data' OR 1=1 --
now becomes
- some_data\' OR 1=1 --
See the subtle difference? The "\" character makes the quote in the string harmless: instead of closing the value prematurely, it will just be a part of the string. This is just one of the many bad characters mysql_real_escape_string() will fix for you.
Using Prepared Statements to Stop Injection Attacks
We've stopped the injection attack! Hurray! But what if there were an easier, better way to do all this? The good news is, there is! PHP comes with another database library to deal with MySQL. It is called PHP Data Objects, or PDO for short. This library can use drivers for many different database types, and supports a very important feature known as prepared statements, sometimes also known as parameterized queries.
So what do these prepared statements do? Well, quite a lot. They allow us to design our query ahead of time and to put placeholders in the areas where our user-supplied data will go. Then, when we ask the library to bind the values, it makes sure they are handled as data, escaping them where necessary, so they cannot change the structure of the query. Consider this example:
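A minimal sketch using PDO's prepare() and execute() with named placeholders. The helper name fetch_secret_data() and the connection details in the usage comment are assumptions for illustration.

```php
<?php
// Hypothetical helper. Assumes $db is an open PDO connection.
function fetch_secret_data(PDO $db, string $string_col, int $int_col, int $user_id): array
{
    // Placeholders mark where the user-supplied values will go.
    $statement = $db->prepare(
        'SELECT secret_data FROM mytable
         WHERE string_col = :string_col AND int_col = :int_col AND user_id = :user_id'
    );

    // PDO binds the values; our code never concatenates them into the SQL.
    $statement->execute([
        ':string_col' => $string_col,
        ':int_col'    => $int_col,
        ':user_id'    => $user_id,
    ]);

    // Return the secret_data column of every matching row.
    return $statement->fetchAll(PDO::FETCH_COLUMN);
}

// Usage, with placeholder credentials:
//   $db = new PDO('mysql:host=myserver.com;dbname=mydb', 'db_user', 'db_password');
//   $rows = fetch_secret_data($db, $_GET['string_col'], $int_col, $user_id);
```

With this in place, the malicious string from earlier is simply compared against string_col as a literal value and matches nothing.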
Not only is this method much cleaner, but we can't forget to escape our parameters because PDO is kind enough to do it for us.
Another advantage of prepared statements is that you can use the prepared statement over and over by simply replacing the parameters and executing it again. Many servers detect that you're doing this, and even make the query faster by performing server-side optimizations.
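Reuse might look like the following sketch, which prepares once and executes in a loop. The helper name fetch_for_users() and the simplified single-column WHERE clause are assumptions for illustration.

```php
<?php
// Hypothetical helper. Assumes $db is an open PDO connection.
function fetch_for_users(PDO $db, array $user_ids): array
{
    // Prepare the statement once...
    $statement = $db->prepare('SELECT secret_data FROM mytable WHERE user_id = :user_id');

    $results = [];
    foreach ($user_ids as $user_id) {
        // ...then execute it as often as needed with new parameter values.
        $statement->execute([':user_id' => $user_id]);
        $results[$user_id] = $statement->fetchAll(PDO::FETCH_COLUMN);
    }

    return $results;
}
```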