How to Hide Data Within an Image

Sometimes the best way to share sensitive information is to hide it in plain sight

Vin Busquet

Published in

Better Programming

7 min read · Mar 15, 2020

This is the purpose of steganography.

My GitHub profile image with encrypted data hidden inside

What’s Steganography?

Steganography is the art and science of embedding and hiding information within a medium in plain sight. It’s related to cryptography and is just about as old. For example, it was used by the Ancient Greeks to hide information about troop movements by tattooing the information on someone’s head and then letting the person grow out their hair.

The word steganography comes from the New Latin steganographia, which combines the Greek words steganós, meaning “covered or concealed,” and -graphia meaning “writing.”

In the context of computer science, the medium is an ordinary file, usually an image. The secret data can then be extracted at its destination later.

Why Steganography?

Unlike conventional encryption, where it’s obvious that a message is being hidden, the purpose of steganography is to hide a message in plain sight without anyone noticing.

The uses of steganography are as varied as the uses of communication itself. It can be useful in situations where sending encrypted messages might raise suspicion, such as in countries where free speech is suppressed. It’s also frequently used as a digital watermark to detect when images or audio files are stolen. And it’s also used just for fun.

Since there are probably tons of images on your computer, why not use them to hide data without affecting the images or arousing suspicion?

How’s Steganography Implemented?

All steganography requires is a covertext, which is the medium where data will be hidden, a message that’s made up of data, an algorithm that decides how to hide and retrieve the data, and, optionally, a key that’ll be used to randomize the placement of the data and, perhaps, even encrypt it.
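As a rough illustration of how a key can randomize placement, the sketch below seeds Python's random module with a shared key to decide which positions in a cover receive data. The function name and its parameters are hypothetical, and random is not a cryptographically secure generator, so treat this as a toy rather than how any particular tool works.

```python
import random

def pick_positions(key, cover_size, count):
    """Choose `count` distinct positions in a cover of `cover_size` slots.

    Seeding with the key means a sender and receiver who share the key
    derive the same positions in the same order.
    """
    rng = random.Random(key)  # not cryptographically secure; illustration only
    return rng.sample(range(cover_size), count)

sender = pick_positions("shared key", cover_size=1000, count=8)
receiver = pick_positions("shared key", cover_size=1000, count=8)
print(sender == receiver)  # the same key gives the same placement
```

Without the key, an observer doesn't know which positions to read, or in what order.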

There are several different techniques for concealing data inside of normal files. The most commonly discussed form of steganography embeds data in images. This is also the form that has the most research investigating it. While there are many types of algorithms, the three most common are the LSB, DCT, and append types.

LSB, which stands for least significant bit, is the most widely used. This technique changes the last few bits in a byte to encode a message, which is especially useful in something like an image, where the red, green, and blue values of each pixel are represented by eight bits (one byte) ranging from 0 to 255 in decimal or 00000000 to 11111111 in binary.

An example of LSB highlighted in an 8-bit number. Credit: Wikimedia Commons.

Changing the last two bits in a completely red pixel from 11111111 to 11111101 only changes the red value from 255 to 253, which, to the naked eye, creates a nearly imperceptible change in color but still allows us to encode data inside of the picture.

The diagram shows two four-pixel images in both color and binary values. Each block of binary represents the value of the corresponding pixel. Credits: Technical Foundation
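To make the idea concrete, here is a minimal sketch of LSB embedding that works on a flat list of channel values instead of a real image file. The function names are hypothetical, and the sketch uses only the single lowest bit of each value, with no key and no encryption.

```python
def embed_lsb(channels, message):
    """Hide `message` (bytes) in the lowest bit of each channel value."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(channels):
        raise ValueError("message does not fit in the cover data")
    stego = bytearray(channels)
    for i, bit in enumerate(bits):
        stego[i] = (stego[i] & 0b11111110) | bit  # clear the LSB, then set it
    return stego

def extract_lsb(channels, length):
    """Read `length` bytes back from the lowest bit of each channel value."""
    out = bytearray()
    for start in range(0, length * 8, 8):
        byte = 0
        for value in channels[start:start + 8]:
            byte = (byte << 1) | (value & 1)
        out.append(byte)
    return bytes(out)

cover = bytearray([255, 0, 0] * 16)  # 16 pure-red pixels as flat R, G, B values
stego = embed_lsb(cover, b"Hi")
print(extract_lsb(stego, 2))                           # recovers the message
print(max(abs(a - b) for a, b in zip(cover, stego)))   # largest per-channel change
```

Because only the lowest bit is touched, no channel value moves by more than 1 out of 255.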

The LSB technique works well for media files, where slightly changing byte values creates only slight, imperceptible changes. Things like ASCII text don’t fare so well, since a single bit out of place will completely change the character.

Because the least significant bit has the smallest effect on the amount of a color, replacing it with a bit from the hidden data has the smallest possible effect on the picture. The more bits replaced, the greater the available bit depth, and the larger the image, the more data can be stored in the photo. However, the more bits that are replaced, the more obvious the alterations will appear to both a statistical inspection and a visual inspection.
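The trade-off between capacity and detectability comes down to simple arithmetic. The helper below is a hypothetical sketch that assumes every channel of every pixel is used and ignores any header or encryption overhead a real tool would add.

```python
def lsb_capacity_bytes(width, height, channels=3, bits_per_channel=1):
    """Upper bound on payload size for an image of the given dimensions."""
    return width * height * channels * bits_per_channel // 8

# Capacity of a 1920x1080 RGB image when replacing 1, 2, or 4 bits per channel.
for bits in (1, 2, 4):
    print(bits, lsb_capacity_bytes(1920, 1080, bits_per_channel=bits))
```

Doubling the number of replaced bits doubles the capacity, but each extra bit also doubles the largest possible change to a channel value.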

For this reason, there are lots of other steganography techniques, each with its own benefits and drawbacks.

Another far less detectable one is called the discrete cosine transform (DCT) coefficient technique, which slightly changes the weights (coefficients) of the cosine waves that are used to reconstruct a JPEG image. It works by calculating the frequencies of the image and then replacing some of them. DCT algorithms are more subtle in the way they manipulate photos and so are harder to detect. Note that larger transformations (due to more embedded data) will make the manipulations more obvious.
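The following is a heavily simplified, one-dimensional sketch of the idea: transform a row of pixel values with a DCT, quantize the coefficients to integers, and store one bit in the parity of one coefficient. The single quantization step, the chosen coefficient index, and the function names are all assumptions for illustration. Real JPEG works on 8×8 blocks with a per-frequency quantization table.

```python
import math

def dct(values):
    """Orthonormal DCT-II of a short list of numbers."""
    n = len(values)
    return [
        math.sqrt((1 if k == 0 else 2) / n)
        * sum(v * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
              for i, v in enumerate(values))
        for k in range(n)
    ]

def idct(coeffs):
    """Inverse of dct() above."""
    n = len(coeffs)
    return [
        sum(math.sqrt((1 if k == 0 else 2) / n) * c
            * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for k, c in enumerate(coeffs))
        for i in range(n)
    ]

STEP = 4  # hypothetical quantization step

def embed_bit(quantized, index, bit):
    """Store one bit in the parity of a quantized coefficient."""
    out = list(quantized)
    out[index] = (out[index] & ~1) | bit
    return out

row = [52, 55, 61, 66, 70, 61, 64, 73]            # one row of pixel values
quantized = [round(c / STEP) for c in dct(row)]   # integers a JPEG-like file would store
stego = embed_bit(quantized, index=2, bit=1)

before = idct([q * STEP for q in quantized])
after = idct([q * STEP for q in stego])
print(stego[2] & 1)                                     # the hidden bit
print(max(abs(a - b) for a, b in zip(before, after)))   # largest pixel change
```

The bit is read back from the stored coefficients, not from the decoded pixels, and the change it causes is spread across the whole row instead of landing on a single pixel.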

Just about the worst of these algorithms is the class of append algorithms. Rather than hiding the data by manipulating the picture, they append the data to the end of the file as padding. In this manner, the data is hidden and never read by any photo-displaying program. The only advantages of these algorithms are that they’re simple to program and that they’re immune to visual inspection of the picture.
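A minimal sketch of the append approach, working on bytes in memory, shows how little there is to it. The trailer layout here (the payload followed by a 4-byte length) is a hypothetical choice for this example, and the cover is a stand-in made of placeholder bytes that end with a PNG IEND chunk.

```python
def append_payload(image_bytes, payload):
    """Return the image bytes with the payload and its 4-byte length appended."""
    return image_bytes + payload + len(payload).to_bytes(4, "big")

def read_payload(stego_bytes):
    """Read the length from the last 4 bytes, then slice out the payload."""
    length = int.from_bytes(stego_bytes[-4:], "big")
    return stego_bytes[-4 - length:-4]

# Stand-in for a real file: placeholder bytes ending in a PNG IEND chunk.
fake_png = b"...image data..." + bytes.fromhex("0000000049454e44ae426082")
stego = append_payload(fake_png, b"secret")
print(read_payload(stego))            # the appended payload
print(len(stego) - len(fake_png))     # extra bytes added to the file
```

The image portion is untouched, but the file grows by exactly the size of the hidden data, which is easy to spot by inspecting the file.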

Using Steganography Effectively

By default, steganography is a type of security through obscurity, since it only hides the data — without encryption. Encrypting data before embedding it adds an extra layer of security.

That's the main goal of Cryptosteganography, a Python steganography module to store messages — or other files protected with encryption — inside an image.

This module enhances the security of the steganography through data encryption. The data concealed is encrypted using AES-256 encryption, a popular algorithm used in symmetric-key cryptography. AES has been adopted by the U.S. government and is now used worldwide.

Brute-forcing AES-256, which has a key length of 256 bits, requires 2¹²⁸ times more computational power than brute-forcing a 128-bit key. Fifty supercomputers that could together check a billion billion (10¹⁸) AES keys per second (if such a device could ever be made) would, in theory, require about 3.7×10⁵¹ years to exhaust the 256-bit key space. This is an unimaginably large amount of time, far longer than the current age of the universe.
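The estimate is easy to check with a few lines of arithmetic. This sketch assumes the 10¹⁸ keys per second is the combined rate of all the machines and that every key in the space has to be tried. The constant and function names are made up for the example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60
KEYS_PER_SECOND = 10**18  # assumed combined rate of all machines

def years_to_exhaust(key_bits, keys_per_second=KEYS_PER_SECOND):
    """Years needed to try every key of the given length at a fixed rate."""
    return 2**key_bits / keys_per_second / SECONDS_PER_YEAR

for bits in (128, 192, 256):
    print(f"{bits}-bit key: about {years_to_exhaust(bits):.1e} years")

print(2**256 // 2**128 == 2**128)  # the 256-bit space is 2^128 times larger
```

On average an attacker would find the key after searching half the space, which halves these figures without changing their order of magnitude.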

The following table shows that possible key combinations exponentially increase with the key size.

Key sizes and corresponding possible combinations to crack by brute force attack. Source: EE Times.

Embed Hidden Data Into an Image

Using Cryptosteganography is very easy. The module can be used as a library inside a Python program or as a shell command-line program.

You need to have pip, the Python Package Installer, installed, which can be done by following these instructions. Note that on most modern Linux systems, Python and pip come installed with the OS by default.

After this, run the command below to install Cryptosteganography from the terminal:

pip3 install cryptosteganography

Once it’s installed, you can check how to use it by passing the -h or --help argument.

$ cryptosteganography -h
usage: cryptosteganography [-h] {save,retrieve} ...

A python steganography script that save/retrieve a text/file (AES 256
encrypted) inside an image.

positional arguments:
  {save,retrieve}  sub-command help
    save           save help
    retrieve       retrieve help

optional arguments:
  -h, --help       show this help message and exit

The two sub-commands are save, which embeds data in an image, and retrieve, which extracts it.

For example, in order to embed data in an image, type the command below.

$ cryptosteganography save -i vin-and-orion.png -m "My secret message..." -o output.png
Enter the key password:

After you enter the password, which will be used to generate the encryption key, the steganographic image is generated.
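For readers curious about how a password can become a 256-bit key, the sketch below shows one common approach, PBKDF2 from Python's standard hashlib module. This is an illustration only and not necessarily the scheme the module uses internally. The function name, salt size, and iteration count are hypothetical choices.

```python
import hashlib
import os

def derive_key(password, salt, iterations=200_000):
    """Stretch a password into 32 bytes, the key size AES-256 expects."""
    return hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, iterations, dklen=32
    )

salt = os.urandom(16)  # random per message; stored with the data, not secret
key = derive_key("correct horse", salt)
print(len(key) * 8)    # key length in bits
```

The same password and salt always produce the same key, which is what lets the receiver decrypt by typing the password again.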

To retrieve the hidden data from the steganographic image, you can run …

$ cryptosteganography retrieve -i output.png
Enter the key password:
My secret message...

… and type the same password.

The library also allows you to store other files inside the images, with the restriction that the file size must be smaller than the original image. You can check out more CLI options and how to use it as a library inside another Python program here.

Side-by-side comparison of the original image and the steganographic image

The Unsolved Puzzle of My GitHub Profile

When I released this Python library a few years ago, I challenged my coworkers and study groups with a small puzzle. The initial clues were hidden within the library’s source code repository and on my GitHub profile: https://github.com/computationalcore.

To this day, the puzzle remains unsolved.

I also added the same puzzle to the "About" section of my Medium profile.

Note: Services like GitHub often compress or transcode images after upload, potentially corrupting hidden information. However, I discovered that if you upload a photo with the exact dimensions and encoding standard that GitHub uses for its original images (those served without any URL query arguments such as ‘s,’ ‘u,’ and ‘v’), the system preserves the image without alterations. Due to that, a message that may or may not be related to the puzzle is also hidden in the GitHub profile image.
