SLML Part 3 - JPEG to SVG to Stroke-3
This post is part 3 of “SLML” - Single-Line Machine Learning.
To read the previous post, check out part 2.
If you want to keep reading, here is part 4.
Most computer vision algorithms represent images as a rectangular grid of pixels on a screen. The model that the Magenta team trained, SketchRNN, instead interprets the drawings as a sequence of movements of a pen. They call this “stroke-3 format”, since each step in the sequence is represented by 3 values:
- delta_x: how much did the pen move left-to-right?
- delta_y: how much did it move up-and-down?
- lift_pen: was the pen down (continuing the current stroke) or was the pen lifted (moving to the start of a new stroke)?
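To make that concrete, here's a small hypothetical example of how a two-stroke drawing could be encoded in stroke-3 (the specific numbers are made up for illustration):

```python
import numpy as np

# Hypothetical stroke-3 encoding of a tiny two-stroke drawing.
# Each row is (delta_x, delta_y, lift_pen).
drawing = np.array([
    [ 5,  0, 0],   # move right 5 while drawing
    [ 0,  5, 0],   # move down 5 while drawing
    [-5,  0, 1],   # move left 5, then lift the pen (end of stroke 1)
    [10, -5, 0],   # travel to the start of stroke 2 (not drawn), pen back down
    [ 0,  5, 1],   # draw stroke 2, then lift the pen
])
```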
First, I had to convert my JPEG scans into the “stroke-3” format. This would involve:
- converting the files from JPEG to SVG
- converting SVG to stroke-3
- simplifying the drawings to reduce the number of points
JPEG to SVG
When I first started converting to SVG, I had trouble finding a tool that would give me a single, clean stroke for each line. Eventually I found a tool called autotrace that was able to correctly do a “centerline trace”.
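Something like the following will run a centerline trace over a folder of scans. This is a rough sketch rather than my exact pipeline: the autotrace flags shown here can vary between versions, and the JPEG-to-PNM step assumes ImageMagick is installed, since some autotrace builds only read bitmap formats.

```python
import subprocess
from pathlib import Path

for jpeg in Path("scans").glob("*.jpg"):
    pnm = jpeg.with_suffix(".pnm")
    svg = jpeg.with_suffix(".svg")
    # Convert the JPEG scan to a bitmap format autotrace can read.
    subprocess.run(["convert", str(jpeg), str(pnm)], check=True)
    # Centerline trace: one stroke down the middle of each drawn line,
    # rather than an outline around it.
    subprocess.run(
        ["autotrace", "-centerline",
         "-output-format", "svg",
         "-output-file", str(svg),
         str(pnm)],
        check=True,
    )
```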
SVG to Points
Then I used a Python library called svgpathtools to take the resulting SVG files and convert each of the paths into a sequence of points. This step is necessary because SVG paths are often represented as Bezier curves.
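Here's a minimal sketch of that step using svgpathtools' svg2paths and Path.point(); the 100-samples-per-path resolution is an arbitrary choice for illustration, not a recommendation.

```python
import numpy as np
from svgpathtools import svg2paths

paths, _attributes = svg2paths("drawing.svg")

strokes = []
for path in paths:
    # path.point(t) walks the whole path (including its Bezier segments)
    # for t in [0, 1] and returns a complex number x + y*1j.
    ts = np.linspace(0, 1, 100)
    points = np.array([[path.point(t).real, path.point(t).imag] for t in ts])
    strokes.append(points)
```

From there, the stroke-3 deltas are just the differences between consecutive points, with lift_pen set to 1 on the last point of each path.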
One problem I noticed was that the drawings were represented as many separate strokes rather than one continuous line. For example, in the image below, each color represents a separate pen stroke.
Line Simplification
Finally, I applied the Ramer-Douglas-Peucker (“RDP”) algorithm to the resulting points, which uses an adjustable “epsilon” parameter to simplify the drawings by reducing the number of points in each line’s path.
This is important because the SketchRNN model has difficulty with sequences longer than a few hundred points, so it’s helpful to simplify the drawings down by removing some of the very fine details while preserving the overall shape.
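To illustrate the effect, here's a sketch using the third-party rdp package, one readily available implementation of the algorithm; the epsilon value here is arbitrary and needs tuning per dataset.

```python
import numpy as np
from rdp import rdp  # pip install rdp

stroke = np.array([
    [0, 0], [1, 0.1], [2, -0.05], [3, 0.1], [4, 0],  # nearly straight run
    [5, 5], [6, 10],                                  # sharp turn upward
])

# Larger epsilon = more aggressive simplification (fewer points kept).
simplified = rdp(stroke, epsilon=0.5)
print(len(stroke), "->", len(simplified))  # the near-collinear points get dropped
```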
Next in my SLML series is part 4, where I experiment with hyperparams and datasets in training SketchRNN.