What is Encryption vs Hashing vs Encoding vs Compression?
Encryption, hashing, encoding, and compression are four distinct ways of transforming data. Knowing the difference between them is important in software development, and confusing them is an easy way to look like you don't know the field.
In my experience, managers and even colleagues sometimes misunderstand these terms or use one in place of another, so let’s describe each one and its key differences from the others.
Encryption is the act of converting a series of data into an “unreadable” state while also permitting another method, decryption, to convert that data back into a readable state. As you might expect, encryption is used for privacy.
Two common types of encryption are symmetric encryption and asymmetric encryption. Each of them uses a “key” (which can be thought of as a random series of characters).
The inputs of encryption are then: The original data & one or more keys.
The output is: The encrypted (“unreadable”) data.
Symmetric encryption uses a single key shared between two parties (commonly referred to as Alice and Bob), and the decryption step uses that same key. This enables two-way communication, since either party can both encrypt and decrypt.
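To make the "same key in both directions" idea concrete, here is a minimal sketch using a toy XOR cipher with a random key as long as the message. This is an illustration only, not what production systems use; the names `xor_bytes`, `shared_key`, and the sample message are made up for this example.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR each data byte with the matching key byte.

    The same function both encrypts and decrypts, because
    (d ^ k) ^ k == d for any byte values d and k.
    """
    if len(key) != len(data):
        raise ValueError("key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"meet at noon"

# Assumption: Alice and Bob have already agreed on this key in private.
shared_key = secrets.token_bytes(len(message))

ciphertext = xor_bytes(message, shared_key)    # Alice encrypts
recovered = xor_bytes(ciphertext, shared_key)  # Bob decrypts with the same key

print(ciphertext.hex())
print(recovered)
```

Because both sides hold the same key, Bob could just as easily encrypt a reply for Alice with it.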
Asymmetric encryption uses two keys, public and private, for one-way communication. The public key is used to encrypt a message so that only the holder of the private key can decrypt the message. The public key can be shared with any party that the private key owner wishes to receive messages from.
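The public/private split can be sketched with textbook RSA on deliberately tiny numbers. This is a toy under stated assumptions: the primes 61 and 53 and the exponent 17 are chosen only so the arithmetic is easy to follow, the function names are illustrative, and real systems use vetted libraries with very large keys and padding.

```python
from typing import Tuple

# Toy key generation with tiny primes (far too small to be secure).
p, q = 61, 53
n = p * q                  # modulus, part of both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent, shares no factor with phi
d = pow(e, -1, phi)        # private exponent: modular inverse (Python 3.8+)

public_key = (e, n)        # safe to hand out to anyone
private_key = (d, n)       # kept secret by the receiver

def encrypt(m: int, key: Tuple[int, int]) -> int:
    """Encrypt an integer message (0 <= m < n) with the public key."""
    exponent, modulus = key
    return pow(m, exponent, modulus)

def decrypt(c: int, key: Tuple[int, int]) -> int:
    """Recover the integer message with the private key."""
    exponent, modulus = key
    return pow(c, exponent, modulus)

message_number = 65
ciphertext = encrypt(message_number, public_key)   # anyone can do this
decrypted = decrypt(ciphertext, private_key)       # only the key owner can

print(ciphertext, decrypted)
```

Anyone holding `public_key` can produce a ciphertext, but turning it back into the message requires `private_key`, which is why the communication is one-way.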
Hashing, in comparison to encryption, is a one-way action. It can be thought of as converting a message into a random-looking string of characters using a hashing function. This “hashed” string cannot feasibly be mapped back to its original. In the typical usage of hashing, the same message is always converted to the same hashed string. Multiple different messages can be converted to the same hash string (a collision), but with a good hashing function this is very unlikely.
The usage of hashing is often for verifying that the same input was received without storing the input, such as in password hashing. For example, a (bad) password “abcdefg” might map to a hashed string “xH1D9a”. Then, the database will only store the hashed string, and whenever a raw password is received, it is hashed and compared with the stored hashed string to validate its correctness.
While the password hashing example provides some level of privacy, the one-way nature of hashing is not useful if the original data needs to be recovered.
The inputs of hashing are then: The original data & a hashing function.
The output is: The hashed string.
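The store-the-hash, compare-on-login flow described above can be sketched with Python's standard `hashlib`. The function names `hash_password` and `verify` are illustrative, and plain SHA-256 is used only to keep the example short; real password storage normally uses a salted, deliberately slow hash instead.

```python
import hashlib
import hmac

def hash_password(password: str) -> str:
    """Return the SHA-256 digest of the password as a hex string."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

def verify(candidate: str, stored: str) -> bool:
    """Hash the candidate and compare it with the stored hash."""
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(hash_password(candidate), stored)

# At sign-up: only the hash is written to the database, never the password.
stored_hash = hash_password("abcdefg")

# At login: the raw password is hashed again and the two hashes are compared.
print(verify("abcdefg", stored_hash))   # correct password
print(verify("abcdefh", stored_hash))   # wrong password
print(len(stored_hash))                 # fixed-length output
```

Note that nothing in this code can turn `stored_hash` back into the password; the only operation available is hashing a guess and comparing.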
Encoding is completely separate from encryption and hashing. Its primary use is to enable data to be sent across a channel that only accepts certain characters. For example, base64 encoding is used to convert binary data such as an image into a string consisting ONLY of upper- and lower-case Roman letters (A-Z, a-z), digits (0-9), and the symbols “+” and “/”, with “=” used as padding at the end. That string can be sent safely through a text-based internet request. Then, decoding is used to convert the data back to its original content. Note that encoding does not provide any privacy protection: anyone can decode it, because no key is involved.
The inputs of encoding are then: The original data & an encoding algorithm.
The output is: The encoded string.
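As a quick illustration of the round trip, here is a small sketch using Python's standard `base64` module. The sample bytes stand in for image data and are made up for this example.

```python
import base64

# Arbitrary binary data, standing in for the bytes of an image file.
raw = bytes([0, 255, 16, 128, 7])

# Encode: binary in, plain ASCII text out.
encoded = base64.b64encode(raw).decode("ascii")

# Decode: anyone can reverse it, no key needed.
decoded = base64.b64decode(encoded)

print(encoded)
print(decoded == raw)
print(base64.b64encode(b"hello"))  # b'aGVsbG8='
```

The trailing `=` in the last line is padding, and the fact that `b64decode` needs nothing but the encoded string shows why encoding offers no privacy.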
Compression is more like encoding than encryption or hashing, in that it does not provide any privacy and is usually not a one-way action. The purpose of compression is to reduce the amount of data that is stored or sent over a network. For example, an image is compressed, sent over a network, then decompressed by the receiver to view the original data.
Compression at a very high level uses properties of the data and does not use any additional ‘keys’ or inputs.
The inputs of compression are then: The original data & a compression algorithm.
The output is: The compressed data.
There are two major types of compression: lossless and lossy.
As it is aptly called, lossless compression allows the compressed data to be decompressed to the exact same form as it was before. This is important for data such as text files.
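Here is a minimal sketch of a lossless round trip using Python's standard `zlib` module. The repeated sentence is made-up sample data, chosen because repetitive text compresses well.

```python
import zlib

# Highly repetitive text gives the compressor plenty of redundancy to exploit.
text = ("the quick brown fox jumps over the lazy dog. " * 50).encode("utf-8")

compressed = zlib.compress(text)
restored = zlib.decompress(compressed)

print(len(text), len(compressed))  # original size vs compressed size
print(restored == text)            # byte-for-byte identical after the round trip
```

No key is passed in either direction: the algorithm works only from patterns in the data itself.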
In comparison to lossless compression, lossy compression can minimize the data even more, at the cost of not being decompressed to the exact same original data. This is typically used in images or video, where the reduced quality of data decompressed by the receiver is often acceptable.
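To show what "not the exact same original data" means, here is a toy lossy scheme that keeps only the top 4 bits of each 8-bit sample and packs two samples into one byte. This is an invented illustration, not how real image or video codecs work, and the function names and sample values are hypothetical.

```python
def lossy_compress(samples: bytes) -> bytes:
    """Keep the top 4 bits of each sample and pack two samples per byte."""
    if len(samples) % 2:
        raise ValueError("expects an even number of samples")
    return bytes(
        (samples[i] & 0xF0) | (samples[i + 1] >> 4)
        for i in range(0, len(samples), 2)
    )

def lossy_decompress(packed: bytes) -> bytes:
    """Unpack two approximate samples from each byte (low 4 bits are gone)."""
    out = bytearray()
    for byte in packed:
        out.append(byte & 0xF0)          # first sample, rounded down
        out.append((byte & 0x0F) << 4)   # second sample, rounded down
    return bytes(out)

pixels = bytes([12, 200, 37, 255, 128, 129])  # e.g. grayscale pixel values
packed = lossy_compress(pixels)
approx = lossy_decompress(packed)

print(len(pixels), len(packed))  # 6 3
print(list(approx))              # [0, 192, 32, 240, 128, 128]
```

The data is halved in size, and each value comes back within 15 of the original, which is the kind of small quality loss that is often acceptable for images.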
In conclusion, I have described the usage, properties, and differences of encryption, hashing, encoding, and compression. I hope you have found these high-level descriptions useful.