This article first appeared in the SQLServerGeeks Magazine.
Author: Leonard Lobel
Subscribe to get your copy.
Always Encrypted is the latest of several encryption features available in SQL Server and Azure SQL Database. We’ve had column-level encryption since SQL Server 2005, which uses either certificates or symmetric keys to keep encrypted data hidden from view. SQL Server 2008 (Enterprise Edition) added Transparent Data Encryption (TDE) to encrypt the entire database–again, using a special database encryption key–so that without that key, the entire database (and its backups) remains encrypted and completely inaccessible.
Although these features serve us well, they do suffer from two significant drawbacks.
First, the very certificates and keys used for encryption are themselves stored in the database (or database server), which means that the database engine is always capable of decrypting the data. While this may be acceptable with an on-premise data center that you manage yourself, it’s a major problem if you want to move your data to the cloud. Because by giving up physical ownership of the database, you’re also handing over the encryption keys and certificates to your cloud provider (Microsoft, in the case of Azure), empowering them to access your private data.
Another concern is the fact that these older features only encrypt data “at rest” (on disk), relying on other protocols (for example, SSL and TLS) to encrypt data “in flight” (across the network).
Enter Always Encrypted
Always Encrypted was introduced in SQL Server 2016 to address these very concerns. With this feature, data is encrypted not just at rest, but also in flight. Furthermore, the cryptography keys themselves–which are essential for both encrypting and decrypting–are not stored in the database. Those keys stay with you, the client.
Thus, Always Encrypted effectively separate the clients who own the data, from the cloud providers who host it. Because the data is always encrypted, SQL Server (and the cloud hosting provider) cannot decrypt it. Data can only be served up in its encrypted state, and so, data is inherently encrypted in flight as well. Only when it arrives at the client can it be decrypted, on the client, by the client, who possesses the necessary keys. Likewise, when inserting or updating new data, that data gets encrypted immediately on the client, before it ever leaves the client, and remains encrypted in-flight all the way to the database server, where SQL Server can only store it in that encrypted state–it cannot decrypt it. This is classic GIGO (garbage in, garbage out), as far as SQL Server is concerned.
The initial version (“V1”) of Always Encrypted in SQL Server 2016 was an important first step toward confidential computing. But in SQL Server 2019 (and now, Azure SQL Database), the feature has been greatly enhanced to work with secure enclaves, and this enables rich query processing over encrypted data beyond what was possible with V1. Since secure enclaves builds on the initial Always Encrypted implementation, you need to start by understanding how Always Encrypted worked prior to SQL Server 2019.
Always Encrypted V1 (SQL Server 2016)
Always Encrypted protects sensitive data on the server using cryptography keys available only to the client. Specifically, these include Column Encryption Keys and Column Master Keys.
Column Encryption Key (CEK)
For each column in each table that you want to encrypt, you create one or more Column Encryption Keys (CEKs). These are keys that encrypt your data using the SHA256 cryptography algorithm. You can create one CEK for each column you want to encrypt, or you can use the same CEK to encrypt multiple columns in multiple tables; it’s not necessarily one-to-one.
With a CEK in hand, data can be encrypted and decrypted in column(s) protected by that CEK. Thus, CEKs must be carefully guarded. They can’t be stored in the database, since encryption and decryption occurs exclusively on the client side.
Column Master Key (CMK)
Actual CEKs can’t be stored in the database, but encrypted CEKs can. And that’s where the Column Master Key (CMK) comes in. Every CEK is encrypted by a CMK. So, think of the CMK as a “key encrypting key” that, itself, encrypts the CEKs (a “data encrypting key”) so that they are safe to store in the database. The CMK is only ever available on the client side; for example, in the client machine’s Windows certificate store, or in the cloud using Azure Key Vault (AKV), accessible using client credentials. Then, in addition to the encrypted CEKs, the client-side path to the CMK (not the CMK itself) is stored in the database.
As a result, the database has all the information that the client needs to perform encryption and decryption but is itself powerless to perform these operations on its own. And that’s because the CEK is needed for cryptography operations; but the database only has a CEK that has been encrypted by the CMK (and not the CEK itself). Furthermore, it has only a client-side path to the CMK (and not the CMK itself). Thus, Always Encrypted can be viewed as a hybrid feature that is based on client-side encryption/decryption and driven by server-side metadata.
Encrypting a Table
SQL Server Management Studio (SSMS) provides tooling to generate CMKs and CEKs for Always Encrypted. It also has a wizard that will migrate an existing (non-encrypted) table to a table with one or more encrypted columns.
For each column, you choose a CEK and an encryption type of deterministic or randomized. You must choose deterministic if you want to be able to query (equality only) or join on the column. This works because the same ciphertext (encrypted data) is always generated from the same clear text. Otherwise, you should choose randomized because it’s much more secure than deterministic. For example, deterministically encrypting a Boolean column yields only two distinct ciphertext values, making it easy for a hacker to distinguish true and false values. With random encryption, the same Boolean column will appear to have many different values, but cannot be queried against.
The wizard then creates a new table with matching schema, and with CEK and encryption type designations assigned accordingly to each column you selected. Then it transfers the rows into the new table, encrypting along the way. But remember, it’s the client (SSMS in this case) that’s performing the encryption–not SQL Server, which has no access to the CEK. All the data gets round-tripped through the wizard, which encrypts the selected column(s) using the CEK(s) and encryption type(s) that you selected. Finally, the wizard drops the old table, renames the new table, and the migration is complete.
Querying an Encrypted Table
On the client side, cryptography operations are transparently performed by the client driver, and is currently supported for ADO.NET, ODBC, JDBC, or PHP. Note that the ADO.NET driver supports Always Encrypted across all .NET flavors (.NET Framework, .NET Core, and .NET Standard).
Here’s the workflow:
1) The client includes “Column Encryption Setting=Enabled” in the connection string
2) The server sends the encrypted CEK (yellow key with red lock) and the CMK path (red key in dotted border) back to the client.
3) The client retrieves the CMK (red key) from the path (local computer certificate store or Azure Key Vault) and uses it to decrypt the CEK.
4) The client encrypts the SSN with the CEK (yellow key), so that it can be queried by the server. This implies that deterministic encryption is being used on the SSN column, making it possible to query on it (equality only).
5) The client issues a modified version of the query with ciphertext for the SSN. This is in-flight encryption, so anyone hacking the wire cannot see the SSN in clear text.
6) The Name column returned to the client is also encrypted, but it could (should) be using randomized encryption if there is never a need to query on it. Again, anyone hacking the wire in this direction cannot see the Name in clear text.
7) The client receives the encrypted name column, and decrypts it using the CEK.
With this basic implementation, data is encrypted not just at rest and in-flight, but “in-use” as well. That is, limited operations over encrypted data can be performed by SQL Server without requiring decryption. Notably, only equality comparison is allowed, supporting point lookups (like the SSN example), as well as JOIN, GROUP BY, and DISTINCT, but not much else. Furthermore, these operations only work with deterministically encrypted columns, which is less secure than randomly encrypted columns. Always Encrypted with Secure Enclaves (SQL Server 2019 and Azure SQL Database)
In SQL Server 2019 (and, most recently, Azure SQL Database), Always Encrypted has been greatly enhanced to leverage secure enclaves. With secure enclaves, server-side processing of encrypted data is fully supported, including not just equality comparisons (previously possible only with deterministic encryption), but range queries, pattern matching (LIKE), and sorting–all over randomly encrypted data.
Furthermore, using secure enclaves, cryptography operations can be performed in place on the server. Encrypting existing data therefore does not require round-tripping the network (like the SSMS wizard with V1), which scales poorly with for large amounts of data.
All this may seem like an impossible feat, given that SQL Server still has no access to the keys needed for encryption and decryption. Yet SQL Server 2019 and Azure SQL Database make this possible by leveraging secure enclaves in conjunction with the base Always Encrypted functionality introduced in SQL Server 2016. So data remains protected on the one hand, while at the same time, the ability to perform rich server-side computations over that data is preserved.
What is a Secure Enclave?
To understand how this magic works, you need to understand what an enclave is. Simply put, an enclave is a special region of the normal memory allocated to a process. This region of memory is isolated and protected not only from its containing process, but everything else on the entire machine. No other processes (not even the almighty kernel itself) can access this region of memory. The enclave is essentially a black box that cannot be accessed even by highly privileged administrators.
Like any memory, an enclave can contain both code and data. However, code must be signed in a special way in order to be able to run in an enclave, and then that becomes the only code running on the machine that can access data contained inside the same enclave.
Several technologies are available today to provide the secure isolation of an enclave. This includes hardware-based solutions such as Intel Software Guard Extensions (SGX), which is used by Azure SQL Database. Secure enclave isolation can also be powered by leveraging the machine’s hypervisor, such as virtualization-based security (VBS) in Windows Server 2019 and Windows 10 v1809, which is used by SQL Server 2019.
An attacker attempting to access an enclave can easily open a debugger, connect to the process that contains the enclave, and find the enclave memory. But the memory contents will not be visible to them. For example, attempting to view the contents of a VBS enclave reveals nothing but question marks:
Enclaves will never be exposed in a full memory dump and they are completely impervious to memory scanning attacks. This makes them an extremely attractive technology that can serve as a trusted execution environment for processing sensitive data.
Leveraging Secure Enclaves
For Always Encrypted, the goal with secure enclaves remains the same; protect sensitive data from highly privileged but unauthorized users (like DBAs and machine admins). By using secure enclaves, this level of protection can now be maintained without compromising SQL Server’s ability to perform rich queries, and encryption can be performed in-place on on the server.
When the database engine starts, it loads an enclave. This means that SQL Server is now a hosting process that contains an enclave, but SQL Server itself does not run in the enclave, nor can it access the enclave’s content. Rather, the enclave acts as an extension of the client-side trust boundary on the server machine; a trusted representative of the client within the SQL Server environment. Think of it as a foreign embassy. The embassy is physically located inside a foreign country. Yet within the perimeter of the embassy, only the laws of its native country apply, while the laws of the hosting foreign country do not. At the same time, it’s just a footstep to enter or exit the embassy, compared with the thousands of miles to travel back and forth between the countries.
The way to think of this is in terms of the Always Encrypted philosophy is, cryptography operations are still performed exclusively by the client, but not necessarily on the client machine. Meaning, the enclave on the server machine in essence is the client. Critically, this means that the client and server can communicate without round-tripping the network, because client code is running inside the enclave as an extension of the client machine.
Enclave Attestation
But how does the client machine know that the enclave on the server machine can be trusted? How does it know that there isn’t malicious code running inside the enclave? This supreme level of trust is achieved by both the client and server machines negotiating through a third machine, called the attestation server.
As the name implies, the sole purpose of this server is to attest to the authenticity of the enclave. That is, it certifies to the client that the enclave on the server is running code that can be trusted. Only then does the client authorize the use of the enclave.
Once attestation succeeds, the client driver establishes a secure tunnel connection to client code running inside the enclave on the server machine. The client machine and the client code inside the enclave on the server both exchange a shared secret over this secure tunnel. This secret is then used to encrypt a CEK on the client machine and send it to the enclave on the server machine. Inside the enclave–and only inside the enclave–the shared secret is used to decrypt the CEK.
Enabling Rich Query
Now the CEK is available inside the enclave, but still completely unavailable to SQL Server running inside the process that’s hosting the enclave. At this point, SQL Server can perform rich queries over encrypted data; for example, with support for pattern matching (LIKE), range comparisons, and sorting. And that’s because the client running in the enclave is close at hand, and can be utilized for cryptography operations all on one machine, with no network activity.
When we ask SQL Server to execute a query that includes rich computations, it’s still powerless to process those portions of the query that operate over encrypted columns. So instead, SQL Server delegates these portions of the query over to the enclave, along with the encrypted data that needs to be examined for that particular operation (for example, a range comparison). The query engine injects the encrypted data into the enclave (which is effectively the same as passing it to the client but without a network call) and asks it to perform the operation. columns are similarly resolved by the enclave in-line with query execution. In this manner, encrypted data is decrypted on the fly and processed by the client running in the enclave, as needed, by the query engine on the server.
Conclusion
This article gave you an overview of Always Encrypted with secure enclaves in SQL Server 2019. And now that this exciting feature is finally available in Azure SQL Database as well, you can leverage the technology for greater security in both your on-premise and cloud databases.
This article first appeared in the SQLServerGeeks Magazine.
Author: Leonard Lobel
Subscribe to get your copy.