Thursday, April 16, 2009

User input and MySQL queries

One thing that comes up quite often when looking at people's code is the proper handling of user input inside SQL queries. Tutorials rarely put enough effort into explaining how input should be handled when entered into a SQL query or why it should be done the way it is done. With this article, I'd like to clarify a bit about few issues surrounding the proper handling or user input, also known as escaping the data.

Why is important to handle user input properly?

"SQL injection" is the keyword here. When user input is entered into SQL query, lack of proper escaping or data validation can allow malicious users to enter their own SQL commands in the SQL query, which can be used to retrieve data from the database such as passwords or admin details. This data can then be used to access the administration functionality of the website.

Often people mistakenly think that user input only needs to be escaped in parts of a system that are available to the public. However, lack of escaping in administration areas can cause problems as well. For example, if the system is widely available, a malicious user could try to craft a special link, which he tries to get a person with proper access rights to click. At first it may sound odd, but it is a legimate tactic. The url could be disguised in number of ways too, like as a remote image (to make browser load it automatically) or using various url shortening services.

In short, there is no reason to ever skip data escaping for any system at all. Any input that is entered into queries must be properly escaped and proper measures should always be taken regardless of any relating conditions, such as additional input validation or even if it's just a simple internal system.

Even if the data doesn't come from a user, you need to escape it properly. For example, if you're performing maintenance on a database and reading values and reinputting them, some data already in the database could be malicious if entered into query without proper preparations.

A bit about magic quotes

First things first. Before we actually get into preparing the data for SQL query, we need to take care of old and annoying feature called "Magic Quotes". If a setting called magic_quotes_gpc is enabled in php.ini, PHP will automatically enter a backslash before any backslash, single or double quote. This is an old feature that was supposed to fight against SQL injections, but it's not perfect, which makes it more annoying than useful. Protection against SQL injection must be perfect and without flaws.

Since whether the feature is enabled depends on a setting in php.ini, you need to get the value of that setting to determine if any action needs to be taken. Normally you could use ini_get() to get value of a php.ini setting. However, since magic quotes will be removed from PHP6, it's no good if you're preparing for the future. Luckily, there is a function called get_magic_quotes_gpc(), which will return value of the setting. That function will exist even in future version, where it will always return false.

The function that removes any additional slashes added by Magic Quotes is stripslashes(). Be careful, though, since another function called stripcslashes() exists which acts similarly, but only to a certain degree. So, in order to remove the additional slashes added by magic quotes, you could use code like this:

if (get_magic_quotes_gpc())
{
 $_POST['myinput'] = stripslashes($_POST['myinput']);
 $_POST['otherinput'] = stripslashes($_POST['otherinput']);
}

Then add all the input fields you have in the code above. However, if you have many input fields, it can be tedious to maintain the list. Thus, it may be easier to just iterate over the entire array. The following code will strip all extra slashes and work even if you have arrays in the form.

function array_stripslashes (array $array)
{
 foreach ($array as $key => $value)
 {
  $array[$key] = is_array($value)
   ? array_stripslashes($value) : stripslashes($value);
 }
 
 return $array;
}

if (get_magic_quotes_gpc())
{
 $_POST = array_stripslashes($_POST);
}

The code calls the function array_stripslashes recursively for each array, so all values in the $_POST array are stripped properly. While this bit was not strictly about escaping the user input, stripping slashes is an important issue that pops up often with input handling as well, restulting in people wondering why it seems like escaping functions cause additional backslashes in their database.

Escaping the user input

When dealing with MySQL, there are three different approaches to doing SQL queries. You can either use the normal MySQL library, MySQL Improved library or the PDO library. Personally I'd recommend using the PDO, since it provides a common interface for different database systems, if you need to work with other database systems. I'll go over the escaping methods for all these there different approaches.

For the examples below, I have a database called test with a table escapetest. It has three columns name, country and age. The fields name and country are TEXT type, while the field age is an INT. I simply want to insert a new row containing data for a single person into the database.

For each example, let's assume the user sent the data some data in name, country and age fields, and that we first performed the proper stripping using the code above.

Note that while there are many ways to access the MySQL database, each way needs their own extensions and libraries to be enabled in PHP. The standard MySQL library requires the mysql extensions, MySQL Improved library requires the mysqli extension and PDO library requires both pdo and pdo_mysql extensions.

Data escaping with MySQL extension

Using the MySQL extension is the most common way of using MySQL. Most tutorials use it due to it's simplicity, even though it may not be the best solution due to it's lack of support for prepared statements.

With the MySQL extension, to escape data for a SQL query, you need to pass it through mysql_real_escape_string() function. This will add backslashes before any character that needs to be escaped inside a string in a MySQL query. So, to properly do this, I could use code like this:

$link = mysql_connect('localhost', 'root', 'root');
mysql_select_db('test', $link);

$name = mysql_real_escape_string($_POST['name'], $link);
$country = mysql_real_escape_string($_POST['country'], $link);
$age = (int) $_POST['age'];

mysql_query(
 "INSERT INTO escapetest (name, country, age) " .
 "VALUES ('$name', '$country', $age)", $link);
 
mysql_close($link);

There are couple worthwhile things to mention about this code. First, it's important to remember to open the SQL connection before using mysql_real_escape_string(). The connection is needed, because the function determines the characters to escape based on the character set used by the connection.

You may also notice that instead of escaping, I simply cast the age into an integer. For floats and integers, casting the variable to it's appropriate type is enough, because the resulting numeric data can not contain anything malicious and can be entered safely into the query with additional escaping.

Data escaping with MySQL Improved extension

With MySQL Improved extension, you have the power of prepared queries, which make data escaping very easy. The data entry could be done using the following piece of code:

$mysqli = new mysqli('localhost', 'root', 'root', 'test');
$stmt = $mysqli->prepare(
 'INSERT INTO escapetest (name, country, age) ' .
 'VALUES (?, ?, ?)');
$stmt->bind_param('ssi', $_POST['name'], $_POST['country'], $_POST['age']);
$stmt->execute();
$stmt->close();
$mysqli->close();

With prepared statements, you enter question marks into the query instead of the actual values, and then use a separate function to bind values to those parameters in the query. All the data escaping is performed automatically and you don't need to worry about taking care of SQL injection yourself. You could manually build the query yourself in the MySQL Improved extension, but it's much better idea to use the prepared statements.

Data escaping with PDO extension

PDO, also known as the PHP Data Objects is the newest addition to PHP's database libraries. It's a common database layer that can access all database system's which have PDO support. Escaping data with PDO is quite like using the MySQL Improved library, as the example demonstrates:

$dbh = new PDO('mysql:dbname=test;host=localhost', 'root', 'root');
$sth = $dbh->prepare(
 'INSERT INTO escapetest (name, country, age) ' .
 'VALUES (?, ?, ?)');
$sth->execute(array($_POST['name'], $_POST['country'], $_POST['age']));

Using the prepared statement, the possibility of SQL injection is completely avoided, because the database layer makes sure that input data can not cause any harm.

Dynamic queries based on user input

While mysql_real_escape_string() and prepared statements are great, they will only work for data which is supposed to be quoted data, such as the values in the INSERT query or the WHERE section of a SELECT query. Sometimes (but hopefully very rarely) you need to use user input in other parts of the query, such as choosing the table name or determining limits.

In such cases, you have no easy option, unless the data is numeric and you can just cast it into integer or float. You simply must do as good validation for the data as possible. Preferrably you should always have some sort of whitelist of all allowed values, which the user input is checked against. How that list is created or maintained is up to you and out of the scope of this article.

Further thinking

This article only really covered information about how to properly enter user input (and other data) into SQL queries safely without the fear of SQL injection. While these methods are enough to keep your database safe, there are still many security and other considerations to make.

For example, you will need to think what kind of data validation will you do to prevent users from using HTML in the fields and what kind of checks to perform to make sure the data follows correct formats (such as emails or zip codes). Perhaps I will return to these subjects another time, but sadly they are not part of this article.

0 comments:

Post a Comment

Note: Only a member of this blog may post a comment.