SLML Part 6 - SketchRNN Experiments: Granular Visual Filtering
This post is part 6 of “SLML” - Single-Line Machine Learning.
To read the previous post, check out part 5.
If you want to keep reading, check out part 7.
New Dataset: 20240104
Though I’d grown my training dataset by separating single pages into multiple drawings via bounding boxes, I was concerned about the tradeoff: filtering drawings out shrinks the dataset, but leaves a more coherent one with similar subject matter.
To make up for more aggressive filtering, I decided to incorporate several additional sketchbooks I scanned and labeled into a new dataset epoch, 20240104.
Differences in dataset 20240104 compared to dataset 20231214:
- More raw input drawings
- Same preprocessing, with a modified “adaptive” RDP simplification¹
RDP and Sequence Length
In previous datasets, I had chosen the same strength of RDP line simplification for the whole dataset. Some drawings had been simplified reasonably, but others had been simple to begin with and ended up as a series of straight lines, much sharper than the original curves.
For the remaining drawings, I ran the RDP algorithm with increasing values of its epsilon parameter until the number of points dipped under 250, then saved the result as a zipped numpy file.
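Here’s a minimal sketch of that adaptive simplification, assuming the `rdp` package and a single polyline of (x, y) points; the function name, starting epsilon, and step size are my own illustration, not the post’s actual code:

```python
import numpy as np
from rdp import rdp  # pip install rdp

def adaptive_simplify(points: np.ndarray, max_points: int = 250,
                      eps_start: float = 0.1, eps_step: float = 0.1) -> np.ndarray:
    """Raise RDP's epsilon until the simplified line dips under max_points."""
    eps = eps_start
    simplified = points
    while len(simplified) >= max_points:
        simplified = rdp(points, epsilon=eps)
        eps += eps_step
    return simplified

# In practice each drawing has multiple strokes, so this would run per stroke
# before saving the whole dataset as a zipped numpy file, e.g.:
# np.savez_compressed("20240104.npz", train=train, valid=valid, test=test)
```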
Training on 20240104
After training on 20240104, the validation losses (gray and blue lines in Figure 2) seemed substantially lower than the validation losses from the models trained on the previous dataset (beige, light green).
Overly Complex Drawings
One failure mode I noticed in the results generated after training on the bounding-box separated dataset 20231214-filtered was that some generated drawings had knotted, gnarled lines, as in Figure 3.
Reviewing the bounding-box separated dataset, I noticed that some drawings were of a single figure, while others were patterns or chains of many faces.
Sometimes I make patterns by chaining repeating sequences of faces into long continuous lines. I wondered whether the presence of this kind of drawing in the training data was occasionally encouraging the model to make long continuous chains rather than drawing a single person.
I wanted to exclude those patterns/chains from my training data, so I could give my model the best chance of learning to draw one person at a time.
Similarity-Filtered Dataset
I decided to make a subset, 20240104-furtherfiltered, of dataset 20240104.
My plan was to compute embeddings for every bounding-box separated drawing in dataset 20240104. Then I could run K-Means clustering on them and decide which clusters I wanted to exclude in bulk.²
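A rough sketch of that clustering step, assuming the embeddings have already been computed and saved; the file name, the number of clusters, and the use of scikit-learn are all my assumptions, since the post doesn’t specify its tooling:

```python
import numpy as np
from sklearn.cluster import KMeans

# One embedding vector per bounding-box separated drawing; the file name
# and the embedding model behind it are assumptions.
embeddings = np.load("20240104_embeddings.npy")

# 16 clusters would match the cluster indices (0-15) discussed below,
# but the exact K is an assumption.
kmeans = KMeans(n_clusters=16, n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)

# Count drawings per cluster so each group can be reviewed visually.
for c in range(kmeans.n_clusters):
    print(f"cluster {c}: {np.sum(labels == c)} drawings")
```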
Right away I spotted the “too complex” chained line drawings in cluster 0 (Figure 6 (a)). There were also several chained line drawings in cluster 3 (Figure 6 (b)) mixed in with some squarish horizontal drawings that I was happy to exclude from my training set, as they looked too different from my more typical standalone drawings of individual faces/people.
I also noticed some clusters with drawings that were “too simple”. It seemed like many of the drawings in cluster 13 (Figure 7 (a)) were stray lines accidentally separated from any context by the bounding-box preprocessing. Cluster 9 (Figure 7 (b)) had many similar nonsensical lines, though they were mixed in with some false positives: valid drawings that I’d prefer to keep in the dataset.
Seeing them from a distance like this, I was excited to notice some distinct categories in my drawings.
Clusters 1, 4, and 11 (in Figure 8 (a), Figure 8 (c), and Figure 8 (i), respectively) all have vertical, narrow, whole-body figures.
Cluster 2, in Figure 8 (b), mostly has rounder compositions of individual faces without a complete body.
Clusters 8 and 15, in Figure 8 (g) and Figure 8 (l), seem to have more complex drawings but mostly still contain drawings of standalone people.
The remaining clusters contain reasonably uniform drawings of standalone people, in vertical compositions that are not too narrow: see Figure 8 (d), Figure 8 (e), Figure 8 (f), Figure 7 (b), Figure 8 (h), Figure 8 (j), and Figure 8 (k).
Training on Filtered Dataset
The remainder of the clusters, in Figure 8, looked “good enough” to include in my training set. I excluded all other clusters and saved a filtered-down dataset as 20240104-furtherfiltered.
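Continuing the clustering sketch above, the bulk exclusion might look like the following; the exact set of excluded cluster IDs beyond the ones called out in this post is an assumption, and the variable names are illustrative:

```python
# Clusters flagged above as too complex (0, 3) or too simple (9, 13);
# the full excluded set is an assumption.
exclude = {0, 3, 9, 13}
keep_idx = np.where(~np.isin(labels, list(exclude)))[0]

# drawings: hypothetical list of per-drawing stroke arrays, aligned with labels
filtered = [drawings[i] for i in keep_idx]

# Save the filtered-down dataset; train/valid/test splits elided in this sketch.
np.savez_compressed("20240104-furtherfiltered.npz",
                    drawings=np.array(filtered, dtype=object))
```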
Compared to dataset 20240104, it’s clear in the top row of Figure 9 that the filtered dataset’s distribution of number of strokes has shifted away from the long tail of many-stroke drawings.
Figure 9: Comparing the unfiltered dataset 20240104 (2100 drawings) to the filtered dataset 20240104-furtherfiltered (1300 drawings).
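For reference, the stroke counts plotted in Figure 9 can be read straight off stroke-3 data, where the third column is the pen-lift flag; this snippet is a sketch under that assumption, reusing the hypothetical lists from above:

```python
import numpy as np

def stroke_count(drawing: np.ndarray) -> int:
    """Count strokes in a stroke-3 array [dx, dy, pen_lift]: each 1 ends a stroke."""
    return int(drawing[:, 2].sum())

counts_unfiltered = [stroke_count(d) for d in drawings]
counts_filtered = [stroke_count(d) for d in filtered]
```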
Comparing the training metrics in Figure 10 for the model trained on filtered dataset 20240104-furtherfiltered (in red) with the previous model runs on unfiltered dataset 20240104 (in gray and blue) is not a perfect comparison. Since the validation set for 20240104-furtherfiltered was also filtered, it’s a smaller (and likely noisier) validation set. Still, the new model’s validation loss was roughly within the bounds of what I expected.
Figure 10: Training metrics for the model trained on dataset 20240104-furtherfiltered, created using visual filtering on the bounding-box separated drawings (red).
Qualitatively, the generated results after visual similarity filtering were significantly improved.
Even the generated results that looked less like people/faces to me still had appealing curves and flowing patterns, which I recognize from my own drawing style.
If you want to keep reading, check out part 7 of my SLML series.
Footnotes
1. based on what I observed when filtering the bounding-box dataset by number of points↩︎
2. similar to what I did with full sketchbook pages in part 1↩︎