Working With Emoji in Swift
Emoji aren’t just simple strings

Emoji have become a big part of our lives. iPhones and iPads have a special emoji keyboard (unless it’s turned off). We see them on websites, in mobile and desktop apps, and we enter them when writing texts and filling in forms.
How do we control them? How do we prevent users from entering emoji in UITextField? How do we parse emoji in the JSON response from a server? Let’s discuss it all.
A Little Bit of Theory
Emoji are a part of modern Unicode. Computers work with bits and bytes — not with smiles and other small pictures. Letters, numbers, and special characters in a text are all encoded in one or more bytes each. The same goes for emoji. They’re just symbols.
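To make this concrete, here is a minimal sketch that looks at the code point and the raw UTF-8 bytes behind a single emoji. The constant names are only illustrative.

```swift
// A single emoji, as the user sees it.
let smile = "😀"

// The Unicode code point(s) behind it, formatted as hexadecimal.
let codePoints = smile.unicodeScalars.map {
    String($0.value, radix: 16, uppercase: true)
}
// codePoints == ["1F600"]

// The bytes that actually get stored when the string is encoded as UTF-8.
let utf8Bytes = Array(smile.utf8)
// utf8Bytes == [240, 159, 152, 128]
```

So one visible symbol is one code point (U+1F600), which in turn becomes four bytes in UTF-8.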
Unicode text can be stored in three standard encodings. The Unicode standard itself is constantly evolving as new symbols and languages are added, so there are many versions of it, but for us developers, it’s the three encodings that are important to know:
- UTF-8 (Unicode Transformation Format eight bits): Each symbol in this encoding is represented as one to four bytes. Simple Latin characters, digits, and some other symbols take only one byte (eight bits). If the first bit of a byte is 0, we know that it’s a one-byte symbol. If it’s a Russian, Chinese, or Arabic symbol or an emoji, its first byte starts with a 1 bit and the symbol takes more than one byte.
- UTF-16 (Unicode Transformation Format 16 bits): All symbols are encoded into two or four bytes. Two bytes make 65,536 combinations, which cover most characters in common use. Most emoji fall outside that range and take four bytes (a surrogate pair), although a few older ones, such as ☺ and ❤, fit in two. Emoji can also have modifiers (colour of skin or hair). In that case, they use extra space.
- UTF-32 (Unicode Transformation Format 32 bits): The simplest encoding to understand, but the least memory-efficient. Each symbol (code point) takes exactly four bytes.
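The difference between the three encodings is easy to see in a playground. The sketch below uses a hypothetical helper, `encodedSizes(of:)`, that is not part of any library. It reports how many bytes a string would occupy in each encoding, ignoring any byte-order mark.

```swift
// Hypothetical helper: byte size of a string in each Unicode encoding.
func encodedSizes(of text: String) -> (utf8: Int, utf16: Int, utf32: Int) {
    let utf8Bytes = text.utf8.count                 // one byte per UTF-8 code unit
    let utf16Bytes = text.utf16.count * 2           // two bytes per UTF-16 code unit
    let utf32Bytes = text.unicodeScalars.count * 4  // four bytes per code point
    return (utf8Bytes, utf16Bytes, utf32Bytes)
}

let latin = encodedSizes(of: "a")      // utf8: 1, utf16: 2, utf32: 4
let cyrillic = encodedSizes(of: "я")   // utf8: 2, utf16: 2, utf32: 4
let emoji = encodedSizes(of: "😀")     // utf8: 4, utf16: 4, utf32: 4
let withTone = encodedSizes(of: "👍🏽")  // utf8: 8, utf16: 8, utf32: 8

// In UTF-8, a leading 0 bit marks a one-byte symbol.
let isSingleByte = ("a".utf8.first! & 0b1000_0000) == 0   // true
let emojiIsSingleByte = ("😀".utf8.first! & 0b1000_0000) == 0   // false
```

Note that the thumbs-up with a skin tone is still a single `Character` in Swift (`"👍🏽".count` is 1), even though it consists of two code points: the base emoji and the modifier.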
Emoji have been part of Unicode since version 6.0, released back in 2010. All modern iPhones and Macs support much newer versions, so by adding emoji to your apps, you can be sure that users will see them.
“An NSString object encodes a Unicode-compliant text string, represented as a sequence of UTF-16 code…