Finally a good stego problem??
Challenge files: kuwahara-png-problem.zip
I was invited to play Lexington Informatics Tournament CTF 2024 with Liminova. This is my first time playing CTF, and while we did not win, we had two first-solves, one of which is titled Kuwahara
, the problem you're reading about now.
The challenge
We are given a noisy image and an encoded image:


Our goal is to extract the flag from the encoded image, which was transformed from the noisy image using the Kuwahara filter and an encoding scheme.
Kuwahara filter
The Kuwahara filter is a smoothing filter used for adaptive noise reduction on grayscale images. In this image, suppose we take a window divided into 4 overlapping, equal quadrants:
( a a ab b b)
( a a ab b b)
(ac ac abcd bd bd)
( c c cd d d)
( c c cd d d)
In each quadrant, the arithmetic mean and standard deviation is calculated. Then, the center pixel is set to the mean of the quadrant with the smallest variance.
For color images, a copy of the image in grayscale or HSV is made (if HSV, only the brightness channel is kept), and then the filter operates on this copy to select the quadrant with the smallest variance, then the center pixel is set to the RGB mean of that quadrant.
Implementing the Kuwahara filter
Alongside the two images, we are also given the source code for the encoding routines in main.py
, with an instruction to implement the Kuwahara filter ourselves:
def kuwahara(original_image, window_size):
"""
Kuwahara Algorithm. This function applies the Kuwahara algorithm to an image.
More specifically, it runs the average based filter on the image,
going over each pixel, putting 4 windows of window_size x window_size each
having the original pixel as a corner, and then calculating the
standard deviation of these regions (in grayscale). Then, set the color
of the pixel to the color average of the region with the lowest standard deviation.
Args:
original_image (ndarray): The original image to apply the Kuwahara algorithm to.
window_size (int): The size of the window used for calculating the local statistics.
Returns:
tuple: A tuple containing the following elements:
- averages (ndarray): An array of shape (4, H, W, C)
containing the average values for each subregion.
- stddevs (ndarray): An array of shape (4, H, W)
containing the standard deviations for each subregion.
"""
# Your code here
return None, None
The code for this function can be trivially copied from pykuwahara, with the Gaussian parts omitted since we're only interested in calculating means and variances.
Full code
def kuwahara(original_image: MatLike, winsize: int):
image = original_image.astype(np.float32, copy=False)
avgs = np.empty((4, *image.shape), dtype=image.dtype)
stddevs = np.empty((4, *image.shape[:2]), dtype=image.dtype)
image_2d = (
cv2
.cvtColor(original_image, cv2.COLOR_BGR2GRAY)
.astype(image.dtype, copy=False)
)
avgs_2d = np.empty((4, *image.shape[:2]), dtype=image.dtype)
squared_img = image_2d ** 2
radius = (winsize - 1) // 2
kxy = np.ones(radius + 1, dtype=image.dtype) / (radius + 1)
shift = [(0, 0), (0, radius), (radius, 0), (radius, radius)]
for k in range(4):
kx = ky = kxy
_ = cv2.sepFilter2D(image, -1, kx, ky, avgs[k], shift[k])
_ = cv2.sepFilter2D(image_2d, -1, kx, ky, avgs_2d[k], shift[k])
_ = cv2.sepFilter2D(squared_img, -1, kx, ky, stddevs[k], shift[k])
stddevs[k] = stddevs[k] - avgs_2d[k] ** 2
return avgs, stddevs
Encoding information using the Kuwahara filter
Inside main.py
, we are also given the code to encode a message using the filter:
def kuwahara(original_image: MatLike, winsize: int):
# Refer to "Implementing the Kuwahara filter" for code.
...
def kuwahara_encryption(original_image, message, window_size=5):
averages, stddevs = kuwahara(original_image, window_size)
# Might not give them these two, might give them...
indices = np.argmin(stddevs, axis=0)
sorted_indices = np.argsort(stddevs, axis=0)
# Start at (window_size, window_size) to avoid the border, where
# the Kuwahara Algorithm might not get enough pixels.
start_x = start_y = window_size
for m in message:
indices[start_x][start_y] = sorted_indices[m][start_x][start_y]
start_y += 1
if start_y == original_image.shape[1] - window_size:
start_y = window_size
start_x += 1
# Second favorite function when writing this, take_along_axis. Epic.
filtered = (
np.take_along_axis(
averages,
indices[None, ..., None],
0,
)
.reshape(original_image.shape)
.astype(original_image.dtype)
)
return filtered
The encoding procedure can be described as follows:
- Calculate the mean and variance for all quadrants in the image
- Find the quadrants with the smallest variance to set the target pixels to
- Starting from
(x, y) = (window_size, window_size)
, for each digitm
in the message:- Replace the pixel at
(x, y)
with the mean of the quadrant with them + 1
th smallest variance. (e.g. ifm = 1
then the mean of the quadrant with the second least variance is used). - Increment
y
by 1. - If
y
is equal to the width of the image minuswindow_size
, incrementx
by 1 and sety
towindow_size
.
- Replace the pixel at
Since there are only 4 quadrants, the message must be encoded in base-4 or lower. In the script, it is encoded as base-3, in order to distinguish encoded pixels from normal filtered pixels:
flag = [enc + 1 for enc in encode("FLAG", 3)] # FLAG encoded with base 3, plus 1 to ensure offsets are always found.
The base-3 encoding scheme
The implementation of the base-N encoding and decoding routines are not important, and can be read in the provided module encoder.py
. However, there are two important properties to note:
- The
encode
anddecode
functions return/take a list of digits:
>>> encode('A', 2)
[1, 0, 0, 0, 1, 1, 1, 1, 0] # 0b100011110
>>> decode([1, 0, 0, 0, 1, 1, 1, 1, 1], 2)
'B'
- Each character takes up a constant number of digits, determined by the base
and the number of characters in the provided character set :
The provided charset is ASCII letters, digits, space and _{}
(66 characters), so there are 6 digits per character in base 3, the base used by the script.
Decoding the encoded image
So, knowing the encoding scheme, we can decode the flag from the noisy image with a little bit of brute forcing:
- Calculate the mean and variance for all quadrants in the image
- Starting from
(x, y) = (window_size, window_size)
, for each pixel(x, y)
:- Compare it against the averages of the 3 quadrants around the pixel (the one with the smallest variance is excluded).
- If we find a match, the pixel is encoded, and we save the ordinal (1st, 2nd, ...) of the pixel to the message.
- Otherwise, the pixel is normal, and the message has ended. Exit the function.
- Increment
y
by 1. - If
y
is equal to the width of the image minuswindow_size
, incrementx
by 1 and sety
towindow_size
.
A Python implementation of that would look something like this:
def kuwahara(original_image: MatLike, winsize: int):
# Refer to "Implementing the Kuwahara filter" for code.
...
def kuwahara_decryption(original_image: MatLike, enc_image: MatLike, window_size: int = 5):
averages, stddevs = kuwahara(original_image, window_size)
indices = np.argmin(stddevs, axis=0)
sorted_indices = np.argsort(stddevs, axis=0)
message: list[int] = []
# Start at (window_size, window_size) to avoid the border, where
# the Kuwahara Algorithm might not get enough pixels.
x = y = window_size
while start_x < original_image.shape[0]:
actual_pixel = enc_image[x][y]
expected_index = indices[x][y]
expected_pixel = averages[expected_index][x][y].astype(original_image.dtype)
# The encoded pixels are continuous. If we reach a pixel that is the same as
# a normal pixel applied through Kuwahara, we can be reasonably sure that the
# message has ended.
if (expected_pixel == actual_pixel).all():
break
# Brute force the key by comparing the current pixel with the average of the
# 3 remaining quadrants.
for m in range(1, 4):
index = sorted_indices[m][x][y]
pixel = averages[index][x][y].astype(original_image.dtype)
if (pixel == actual_pixel).all():
# We have a key!
message.append(m)
break
# Move to the next pixel
y += 1
if y == original_image.shape[1] - window_size:
x += 1
y = window_size
return message
Yay! We have finished our decoding routine, time to plug the images into it and get our 500 points!! I wonder why no one has solved this challenge yet... Huh?!
$ python main.py
Flag:
There was nothing?!
Problem 1: It's not what you think it is
It was 12pm. I had spent the whole morning on this. I was ready to slam shut my laptop and go get something nice to eat, but then I tried a roundtrip encoding/decoding with my own message as a sanity check. It did:
# input_image = cv2.imread("img/noisy_image.png")
# message = [x + 1 for x in encode("help_im_trapped_in_a_kuwahara_factory", 3)]
# encoded_image = kuwahara_encryption(input_image, message)
# cv2.imsave("img/test.png", encoded_image)
# decoded_message = [x - 1 for x in kuwahara_decryption(input_image, encoded_image)]
# print(f"Flag: {decode(decoded_message, 3)}")
$ python main.py
Flag: help_im_trapped_in_a_kuwahara_factory
Then I noticed that my encoded image had way sharper edges than the provided encoded image:


After experimenting with different options in the pykuwahara
encoder, it became clear that what the provided code referred to as window_size
is actually the size of a quadrant (what pykuwahara
refers to as the radius
), instead of the entire window of 4 quadrants. This was also explained in the docstring of the kuwahara
function in the script:
Kuwahara Algorithm. This function applies the Kuwahara algorithm to an image. More specifically, it runs the average based filter on the image, going over each pixel, putting 4 windows of window_size x window_size each having the original pixel as a corner, and then calculating the standard deviation of these regions (in grayscale)
Learned a lesson in why reading is important. Rewrote the code in kuwahara
to treat the window_size
how it was meant to be. The flag should now appear surely?
$ python main.py
Traceback (most recent call last):
File "...\litctf2024\kuwahara3\kuwahara-png-problem\main.py", line 202, in <module>
final_message = decode(decoded_indices, 3)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "...\litctf2024\kuwahara3\kuwahara-png-problem\encoder.py", line 122, in decode
seq += decoder[conv]
~~~~~~~^^^^^^
KeyError: 1
Nope.
Problem 2: The random off-by-ones
Alternate header: How I autistically brute forced out the flag
It was now 2pm. I had completely forgot about having lunch. I had to get this done, no matter the cost. As any sane developer did, I started print debugging:
$ python main.py
[2, 1, 2, 2, 1, 3, 2, 1, 2, 2, 1, 3, 2, 1, 1, 3, 2, 2, 2, 1, 2, 2, 2, 3, 2, 1, 1, 3, 1, 2, 2]
Traceback (most recent call last):
# ...
That was the decoded message, which was 31 digits in total. Not only was this not divisible by 6 (remember that each character always takes up 6 digits), but the whole characters in there didn't decode to part of the flag wrapper:
>>> dec = [2, 1, 2, 2, 1, 3, 2, 1, 2, 2, 1, 3, 2, 1, 1, 3, 2, 2, 2, 1, 2, 2, 2, 3, 2, 1, 1, 3, 1, 2]
>>> decode([x - 1 for x in dec], 3)
'IIsLp'
>>> [x + 1 for x in encode("LITCT", 3)]
[2, 1, 2, 2, 2, 3, 2, 1, 2, 2, 1, 3, 2, 1, 2, 3, 2, 2, 2, 1, 2, 1, 2, 3, 2, 1, 2, 3, 2, 2]
Notice the off-by-one digits in the decoded message compared to the encoded flag wrapper:
dec = [2, 1, 2, 2, 1, 3, 2, 1, 2, 2, 1, 3, 2, 1, 1, 3, 2, 2, 2, 1, 2, 2, 2, 3, 2, 1, 1, 3, 1, 2]
enc = [2, 1, 2, 2, 2, 3, 2, 1, 2, 2, 1, 3, 2, 1, 2, 3, 2, 2, 2, 1, 2, 1, 2, 3, 2, 1, 2, 3, 2, 2]
^ ^ ^ ^ ^
They were off in seemingly random places, but they were all off by one.
Also, the flag was cut short - signs point to a pixel matching the expected pixel cutting short the message, so we tried again but now testing for all quadrants instead of only the second/third/fourth smallest variance in the decoding function:
- for m in range(1, 4):
+ for m in range(0, 4):
index = sorted_indices[m][x][y]
pixel = averages[index][x][y].astype(original_image.dtype)
if (pixel == actual_pixel).all():
# We have a key!
message.append(m)
break
Trying again:
$ python main.py
[2, 1, 2, 2, 1, 3, 2, 1, 2, 2, 1, 3, 2, 1, 1, 3, 2, 2, 2, 1, 2, 2, 2, 3,
2, 1, 1, 3, 1, 2, 2, 0, 2, 1, 3, 2, 3, 1, 1, 1, 1, 3, 2, 1, 1, 3, 3, 1,
3, 1, 2, 2, 3, 2, 2, 1, 3, 1, 2, 3, 3, 1, 2, 3, 3, 3, 2, 1, 1, 3, 1, 2,
2, 1, 3, 1, 2, 3, 2, 0, 1, 1, 3, 1, 2, 1, 2, 3, 2, 2, 2, 1, 3, 1, 2, 3,
2, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2, 2, 1, 2, 3, 2, 3,
2, 1, 1, 3, 3, 3, 2, 1, 3, 1, 3, 1, 2, 1, 1, 2, 1, 3, 2, 1, 3, 1, 3, 1,
2, 1, 1, 3, 2, 1, 2, 1, 3, 1, 3, 1, 2, 1, 1, 1, 2, 1,
...] # the rest were 0 interspersed with a few 1s, for some reason.
It looks like we have a message here. However, a bunch of the digits are off by one, and there are two 0
s cutting short the decoding process. If we assume that the first 7 characters and the last character make up the flag wrapper LITCTF{}
, all we need to care about is this section:
>>> dec[7 * 6:-6]
[2, 1, 1, 3, 3, 1, 3, 1, 2, 2, 3, 2, 2, 1, 3, 1, 2, 3,
3, 1, 2, 3, 3, 3, 2, 1, 1, 3, 1, 2, 2, 1, 3, 1, 2, 3,
2, 0, 1, 1, 3, 1, 2, 1, 2, 3, 2, 2, 2, 1, 3, 1, 2, 3,
2, 1, 2, 1, 2, 1, 2, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 2,
2, 1, 2, 3, 2, 3, 2, 1, 1, 3, 3, 3, 2, 1, 3, 1, 3, 1,
2, 1, 1, 2, 1, 3, 2, 1, 3, 1, 3, 1, 2, 1, 1, 3, 2, 1,
2, 1, 3, 1, 3, 1]
Decoding all the decodable characters in the message gave us most of the flag (unknown characters replaced with U+FFFD REPLACEMENT CHARACTER
):
u�3�p3�T3A_jUw4h4r4
Notice that all 3 undecodable characters all have a large most significant digit, or had a 0 digit:
>>> msg[6:12]
[3, 1, 2, 2, 3, 2]
>>> msg[18:24]
[3, 1, 2, 3, 3, 3]
>>> msg[36:42]
[2, 0, 1, 1, 3, 1]
By lowering the most significant digit to 2 in the first two cases, and increasing the 0 digit to 1 in the remaining case, we can find out the 3 missing characters:
>>> msg[6] = 2
>>> msg[18] = 2
>>> msg[37] = 1
>>> decode([x - 1 for x in msg[6:12]], 3)
'N'
>>> decode([x - 1 for x in msg[18:24]], 3)
'X'
>>> decode([x - 1 for x in msg[36:42]], 3)
'c'
The message is now uN3Xp3cT3A_jUw4h4r4
. You can probably guess what to fix now, given the spelling mistakes as well as what this challenge is related to. unfortunately i couldn't, my brain was fried. props to rylie for figuring out the final flag.
While this method was tedious, time consuming and psychic damage inducing, we captured the flag and got the first kill, and that was all that seemed to matter.
It's always the damn floating point numbers
After we got the first solve on Kuwahara, I was contacted by the author of the problem, Agniv, who I explained my method to, as well as point out all of the aforementioned problems I hit while trying to solve the challenge.
When he heard about the off-by-ones, he suggested that instead of grayscaling the original image in the kuwahara
function, grayscale the image casted into ndarray[float32]
:
image = original_image.astype(np.float32, copy=False)
- image_2d = cv2.cvtColor(original_image, cv2.COLOR_BGR2GRAY).astype(image.dtype, copy=False)
+ image_2d = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
Trying again this time yields the flag instantly without any messy 0
digits or corrupted bits:
$ python main.py
Flag: LITCTF{uN3Xp3cT3D_kUw4h4r4}
This was not something I had thought of, since the pykuwahara
implementation converted the original image into grayscale instead. But it turns out that in order to get the most accurate RGB to grayscale conversion, a floating point ndarray
is desired, because the conversion algorithm in OpenCV uses floating point coefficients.
Conclusion
Overall, the concept of this problem was quite cool, even if a little simple, however it ended up being one of the last challenges to get solved at LIT CTF 2024 due to all of the problems I mentioned above, making solving this extremely tedious. Maybe if the challenge was worded better, we would have seen more take a crack at it?
Now excuse me, I have to go take my 5pm lunch.