In search of the Pikachu meme

Bálint Juhász
5 min read · Dec 29, 2018

When I first saw the meme I immediately started thinking about ways to find the exact episode and timestamp of said cropped image. Let’s see how I managed to put together the solution without any previous experience.

Prerequisites

It goes without saying that I needed all the episodes of Pokémon. (I cheated a little and downloaded only the one that actually contains the image. This does not alter the behavior of the script so who cares, right?) Next, we need a Pikachu meme with acceptable quality. Done. Time to start planning.

(by the way Brock’s leg looks like an unusually high fez)

High level plan

As you can see, my plan was pretty vague. All I knew was that I'd need a library to export frames from a video and an algorithm that finds an image inside another image.

Recognition

The first thing I looked into was the recognition logic. I knew a perfect match was not possible: image compression makes the video frames and the meme differ at the pixel level, so I needed something a bit smarter than a pixel-perfect comparison. I also knew that OpenCV does stuff to images that is beneficial for programmers.

So far I knew that I wanted to use OpenCV (in whatever language it supports) and that I wanted to find an image in an image. Googling “opencv find image in image” led me to the following tutorial: Template Matching
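For reference, here is roughly what the tutorial’s example boils down to (the file names here are just placeholders, not files from the repo):

import cv2

# needle: the meme, haystack: one frame of the episode
haystack = cv2.imread('frame.jpg', cv2.IMREAD_COLOR)
needle = cv2.imread('pikachu_full.jpg', cv2.IMREAD_COLOR)

# matchTemplate slides the needle over the haystack and scores every position
result = cv2.matchTemplate(haystack, needle, cv2.TM_CCOEFF_NORMED)

# the highest score and its location tell us the best candidate match
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
print(max_val, max_loc)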

Upon reading it (read: looking at the example for 5 seconds and then copy-pasting it to PyCharm), I realized I’d be using Python. Why? Because…

  • I wanted to try out the example they included
  • Python is the best tool for useless scripts that require serious algorithms (bigdata, deep learning, image processing, etc.)
  • I know a little Python

Video stream

Googling “python video frames” led me to this Stack Overflow answer. It surprised me that OpenCV actually supports video playback. Oh well, the VideoCapture class it is then.
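A minimal sketch of reading frames with VideoCapture (using the episode file name that appears later in the script):

import cv2

cap = cv2.VideoCapture('pokemon_s01e10.mp4')
while True:
    ok, frame = cap.read()                        # grab the next frame
    if not ok:
        break                                     # end of the video
    time_ms = cap.get(cv2.CAP_PROP_POS_MSEC)      # current position in milliseconds
    # ... template matching on `frame` goes here ...
cap.release()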

The Problem

After trying out the example script with poor Pikachu, I realized OpenCV’s matchTemplate() function cannot find the template at a different scale in the base image (haystack, whatever you call it). This is a huge problem because I had to start thinking for the first time during this project! I searched for “opencv template matching different size” and found this tutorial, which eventually solved my problem. (It was so trivial that it hurt my pride, so I also looked into feature matching, but that seemed like overkill: I literally only needed to match the same sub-image against the base image with a small amount of error.)

Implementation

TemplateMatcher (abstract-ish class)

You supply the needle (Pikachu), the haystack (a single video frame) and a threshold value in [0…1]. Based on that threshold, the match() method magically decides whether the needle is found in the haystack or not.

Detected locations (threshold=0.9)
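The actual class lives in the repository; a minimal sketch of the idea could look like this (the optional needle parameter and the TM_CCOEFF_NORMED method are my assumptions, not necessarily what the repo uses):

import cv2

class TemplateMatcher:
    def __init__(self, needle, threshold):
        self.needle = needle          # the meme we are looking for
        self.threshold = threshold    # minimum score in [0, 1] to accept a match

    def match(self, haystack, needle=None):
        needle = self.needle if needle is None else needle
        # The needle must fit inside the haystack, otherwise matchTemplate raises an error.
        if needle.shape[0] > haystack.shape[0] or needle.shape[1] > haystack.shape[1]:
            return False
        # Score every position of the needle inside the haystack and keep the best one.
        result = cv2.matchTemplate(haystack, needle, cv2.TM_CCOEFF_NORMED)
        _, max_val, _, _ = cv2.minMaxLoc(result)
        return max_val >= self.threshold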

MultiScaleTemplateMatcher (inherits TemplateMatcher)

As the name says, it implements the so-called Gaussian pyramid that the previously mentioned solution uses. To put it simply, I create an array of differently scaled versions of Pikachu and try to match each of them against the video frame by calling the match() method of the base class (explained earlier).

Detected locations with MultiScale (threshold=0.5, level=13)
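Again, a rough sketch rather than the repository’s exact code; the levels parameter (defaulting to 13, as in the caption above) and the scale range are assumptions:

import cv2

class MultiScaleTemplateMatcher(TemplateMatcher):
    def __init__(self, needle, threshold, levels=13):
        super().__init__(needle, threshold)
        self.levels = levels

    def match(self, haystack):
        h, w = self.needle.shape[:2]
        # Try progressively smaller copies of the needle against the same frame.
        for i in range(self.levels, 0, -1):
            scale = i / self.levels
            resized = cv2.resize(self.needle, (max(1, int(w * scale)), max(1, int(h * scale))))
            if super().match(haystack, resized):
                return True
        return False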

VideoFrameGrabber

The instance needs the path of the video, the width to rescale each frame to, and the number of seconds between frame samples.
Rescaling is needed to make template matching faster later on.
Skipping ahead a few seconds at a time is also about performance: we don’t want to process every single frame of a cartoon.
(Trivia: I could finally use yield while iterating through the frames, since I’d never had a use case for it in my life.)
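A simplified sketch of such a generator, assuming the constructor arguments described above (the real implementation is in the repo):

import cv2

class VideoFrameGrabber:
    def __init__(self, path, width, seconds_between_samples):
        self.path = path
        self.width = width
        self.step_ms = seconds_between_samples * 1000

    def iterate(self):
        cap = cv2.VideoCapture(self.path)
        next_sample_ms = 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break                                   # end of the video
            time_ms = cap.get(cv2.CAP_PROP_POS_MSEC)    # current playback position
            if time_ms < next_sample_ms:
                continue                                # skip frames between samples
            next_sample_ms = time_ms + self.step_ms
            # Shrink the frame so template matching has less work to do later.
            h, w = frame.shape[:2]
            small = cv2.resize(frame, (self.width, int(h * self.width / w)))
            yield time_ms, small
        cap.release()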

Here’s the main function for the script that handles the whole process.

def main():
    pikachu = cv2.imread('pikachu_full.jpg', cv2.IMREAD_COLOR)
    matcher = MultiScaleTemplateMatcher(pikachu, 0.90)
    videoGrabber = VideoFrameGrabber('pokemon_s01e10.mp4', 320, 1)
    for timeMs, frame in videoGrabber.iterate():
        if matcher.match(frame):
            print('Found at: ' + GetTimeFromMs(timeMs))
            break

First, I created a MultiScaleTemplateMatcher by supplying the Pikachu meme image and the global threshold for matching. (0.9 sounded like a good threshold value.)

I initialized VideoFrameGrabber with the actual episode because I was too lazy to write recursive directory scanning too.

Finally, the code iterates over the frames and if a match is found, the execution stops and the timestamp is printed on the console:

Output of the script

Performance

Using 320 pixels as the new width for the video frames and a 13-step scaling for Pikachu resulted in the following stats:

  • The creation of all frames took 17 seconds. (w/o template matching)
  • A single real search through an episode took 137 seconds.

This means that for every unit of time the CPU spent rendering the video frames, it spent about seven looking for that damn yellow hamster (137 − 17 = 120 s of matching ≈ 7 × 17 s). Given that we try 13 different scales, a single scale costs roughly half the rendering time, which is not too bad.

This way we can finish scanning all ~1000 episodes in about 38 hours (1000 × 137 s ≈ 137,000 s ≈ 38 hours).

What can we do to speed it up?

  1. Nothing. 38 hours for scanning through a very long series is an insignificant amount of time if it is a one-time occasion.
  2. Optimize the width of the resized video frame, the number of levels in the pyramid, and the range from the smallest to the biggest needle size. This way the CPU works on just enough data to still get a valid result, only quicker.
  3. Move the matching algorithm to the GPU. Be aware that it may not be faster here: the images are small, so the CPU would spend most of its time copying them from RAM to the GPU and back. I’m sure it would speed up a series in 4K resolution, though.

What to do next?

Well… You could keep looking for surprised Pikachu faces in the whole series. I mean, if there’s one, maybe there are more. I would switch to feature matching (the other algorithm mentioned earlier) in case the match is rotated, scaled or even worse. I would also use a mask to ignore the unimportant parts of the image.

The black and white mask for Pikachu’s face.
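A hedged sketch of how such a mask could be passed to matchTemplate (the mask file name is made up; note that OpenCV only supports masks for certain methods, e.g. TM_SQDIFF and TM_CCORR_NORMED according to the docs):

import cv2

needle = cv2.imread('pikachu_full.jpg', cv2.IMREAD_COLOR)
haystack = cv2.imread('frame.jpg', cv2.IMREAD_COLOR)
# The mask must have the same size as the needle; white pixels are compared,
# black pixels are ignored during matching.
mask = cv2.imread('pikachu_mask.png', cv2.IMREAD_COLOR)

result = cv2.matchTemplate(haystack, needle, cv2.TM_CCORR_NORMED, mask=mask)
_, max_val, _, max_loc = cv2.minMaxLoc(result)
print(max_val, max_loc)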

Pretty much that’s all I can think of as an improvement. Have fun!

The repository can be found on GitHub: https://github.com/jbebe/find-pikachu
