Adding Emoji support to your blog

Just admit it. We all like emoji. You might have tried adding it to your blog or website. Sometimes this can be a tricky thing, especially when it's database powered. In this short blogpost I'll show you how to enable emoji on your website in just a few minutes!

Understanding the problem

Every solution starts with understanding the problem. So what is really the problem here? Why does your database reject emoji? MySQL is not scared of emoji, it doesn't hate it. It's just not really honest with us. As it turns out MySQL's utf8 charset only partially implements UTF-8 encoding. It can only store UTF-8-encoded symbols that consist of one to three bytes; encoded symbols that take up four bytes aren’t supported. It doesn't offer full unicode support.

The solution

Since MySQL 5.5.3 there was a new encoding released called utf8mb4 which does have full unicode support. Let me guide you by switching the utf8 encoding to utf8mb4 to support emoji on your website or blog.

1. Backups, backups, backups! And even more backups!

Obviously, backup your database(s) first. Doesn't need much of explanation, there's always a chance sometimes goes wrong.

2. MySQL 5.3.3. or higher

Check if your MySQL version is 5.3.3 or higher. Since it was released in early 2010 this should not be a problem, but maybe it's for the best to check anyway. If it's an earlier version you're having some issues, you're using a version that's 5 years old!

Upgrade MySQL to 5.3.3 or later or ask the server administrator to upgrade the mysql installation for you.

3. Upgrade the databases, tables & columns

Next up is updating the database, tables & columns. Repeat the queries below for every database, table & column that need to support emoji (or just full unicode). Speaks for itself that you fill in the database name, table name & column names (and types etc) reflecting your own database structure.

# Database
ALTER DATABASE database CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
# Table
ALTER TABLE table CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;
# Column
ALTER TABLE table CHANGE column column VARCHAR(16) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

Checking max length of colunmns & index keys

After changing this, the next step, and maybe even the most important one, is checking the max length of columns & index keys. When converting from utf8 to utf8mb4, the maximum length of a column or index key is unchanged in terms of bytes. Therefore, it is smaller in terms of characters, because the maximum length of a character is now four bytes instead of three.

Checkout this example:

A TINYTEXT column can hold up to 255 bytes, which correlates to 85 three-byte or 63 four-byte characters. Let’s say you have a TINYTEXT column that uses utf8 but must be able to contain more than 63 characters. Given this requirement, you can’t convert this column to utf8mb4 unless you also change the data type to a longer type such as TEXT — because if you’d try to fill it with four-byte characters, you’d only be able to enter 63 characters, but not more.

So what's left to do here is checking lengths of fieldnames and updating them if necesarry. The same goes for index keys.

Optimizing tables

And then the final step for the database part is to run the following command to optimize your tables.

mysqlcheck -u root -p --auto-repair --optimize --all-databases

Or if you don't have console access run following queries for every affected table.


4. Modify the client

Next thing is to make sure our client knows what kind of encoding it should use. We will have to set the character-set to utf8mb4 and the collation-server to utf8mb4_unicode_ci. I'm using Laravel so I'm going to use my Laravel database.php config file as example, but it's the same principle for other frameworks or when just using MySQL PDO or any kind like that.

Below you can see an excerpt of the Laravel database mysql config.

'mysql' => [
	'charset'   => 'utf8',
	'collation' => 'utf8_unicode_ci',

Just change it like so, so it reflects the changes as below.

'mysql' => [
	'charset'   => 'utf8mb4',
	'collation' => 'utf8mb4_unicode_ci',

5. Rendering the emoji on your website

The final step is actually rendering the emoji on your website. For this I used Twitter twemoji.

Just include the following javascript file like so.

<script src="//twemoji.maxcdn.com/twemoji.min.js"></script>

And add the following script to enable emoji document wide.

twemoji.parse(document.body, {
	folder: 'svg',
	ext: '.svg',
	callback: function(icon, options, variant) {
		switch ( icon ) {
			case 'a9':      // © copyright
			case 'ae':      // ® registered trademark
			case '2122':    // ™ trademark
				return false;
		return ''.concat(options.base, options.size, '/', icon, options.ext);

In the callback you see a switch with 3 cases, this is to exclude emoji for specific characters such as the copyright & trademark icons.

When I originally ran into the problem while trying to cache some data from the Foursquare API I used this blogpost from Mathias Bynens who wrote more in dept about this problem years before me. His article is the base for this one and actually I borrowed almost all my information from him, so creds go to Mathias!

Anyway, let me know in the comments if this has helped you ?! If you have any issues, please let me know as well!

Update 04 March 2015, 21:28: Originally I did not mention checking the maxlength of columns and indexes but this actually a pretty vital thing, added it after the comment of Cleanse.