SLML Part 6 - SketchRNN Experiments: Granular Visual Filtering

Experiments training SketchRNN on my dataset of single-line drawings.

Andrew Look


January 4, 2024


This post is part 6 of “SLML” - Single-Line Machine Learning.

To read the previous post, check out part 5.

If you want to keep reading, check out part 7.

New Dataset: 20240104

Though I’d grown my training dataset by bounding-box separating single pages into multiple drawings, I was concerned about the tradeoff of filtering drawings out versus having a more coherent dataset with similar subject matter.

To make up for more aggressive filtering, I decided to incorporate several additional sketchbooks I scanned and labeled into a new dataset epoch 20240104.

Differences in dataset 20240104 compared to dataset 20231214:

  1. More raw input drawings
  2. Same preprocessing, with a modified “adaptive” RDP simplification 1.

RDP and Sequence Length

In previous datasets, I had chosen the same strength of RDP line simplification for the whole dataset. Some drawings had been simplified reasonably, but others had been simple to begin with and ended up as a series of straight lines, much sharper than the original curves.

Figure 1: 30 points

For the remaining drawings, I ran the RDP algorithm with varying values for its epsilon parameter, until the number of points dipped under 250. Then I saved the result as a zipped numpy file.
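The adaptive simplification described above can be sketched roughly as follows. This is a minimal illustration, not the exact code used for this dataset: the function names, the starting epsilon of 0.5, and the 1.5x growth factor are all assumptions, and RDP is written out by hand in plain Python so the example is self-contained.

```python
import math

def point_segment_distance(p, a, b):
    """Distance from point p to the segment a-b (2D tuples)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def rdp(points, epsilon):
    """Ramer-Douglas-Peucker simplification of one stroke (iterative form)."""
    n = len(points)
    if n < 3:
        return list(points)
    keep = [False] * n
    keep[0] = keep[-1] = True
    stack = [(0, n - 1)]
    while stack:
        start, end = stack.pop()
        max_dist, index = 0.0, None
        for i in range(start + 1, end):
            d = point_segment_distance(points[i], points[start], points[end])
            if d > max_dist:
                max_dist, index = d, i
        # Keep the farthest point only if it deviates by more than epsilon.
        if index is not None and max_dist > epsilon:
            keep[index] = True
            stack.append((start, index))
            stack.append((index, end))
    return [p for p, k in zip(points, keep) if k]

def adaptive_rdp(strokes, max_points=250, epsilon=0.5, growth=1.5, max_iters=50):
    """Raise epsilon until the drawing's total point count is under max_points.

    The starting epsilon, growth factor, and iteration cap are assumed values.
    Returns the simplified strokes and the epsilon that produced them.
    """
    simplified, used = [list(s) for s in strokes], epsilon
    for _ in range(max_iters):
        used = epsilon
        simplified = [rdp(s, used) for s in strokes]
        if sum(len(s) for s in simplified) < max_points:
            break
        epsilon *= growth
    return simplified, used
```

Because epsilon starts small and only grows when a drawing is still over the limit, drawings that were simple to begin with are barely touched, while dense drawings get simplified more aggressively. The result could then be written out with something like numpy's `savez_compressed`.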

Training on 20240104

Figure 2: Training and validation loss metrics from models trained on 20240104 using visual filtering on the bounding-box separated drawings, with maximum sequence lengths of 200 (gray) and 250 (blue).

After training on 20240104, the validation losses (gray and blue lines in Figure 2) seemed substantially lower than the validation losses from the models trained on the previous dataset (beige, light green).

Overly Complex Drawings

One failure mode I noticed in the results generated after training on the bounding-box separated dataset 20231214-filtered was that some generated drawings had knotted, gnarled lines as in Figure 3.

Figure 3: Generated examples with too much complexity.

Reviewing the bounding-box separated dataset, I noticed that some drawings were of one figure, while others were patterned or chained with many faces.

(a) 1093 points
(b) 1127 points
(c) 329 points
Figure 4: Drawings with over 300 points.

Sometimes I make patterns by chaining repeating sequences of faces into long continuous lines. I wondered whether the presence of this kind of drawing in the training data was occasionally encouraging the model to make long continuous chains rather than drawing a single person.

Figure 5: Example of a “diagonal chain” single-line pattern I draw.

I wanted to exclude those patterns/chains from my training data, so I could give my model the best chance of learning to draw one person at a time.

Similarity-Filtered Dataset

I decided to make a subset 20240104-furtherfiltered of dataset 20240104.

My plan was to compute embeddings for every bounding-box separated drawing in dataset 20240104. Then I could run K-Means clustering on them, and decide which clusters I wanted to exclude in bulk. 2
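As a rough sketch of the clustering step, here is a plain-Python version of K-Means (Lloyd's algorithm). It assumes the embeddings have already been computed and are available as a list of equal-length numeric vectors; how they were computed isn't covered here. The function names are hypothetical, and in practice a library implementation such as scikit-learn's `KMeans` would be the more likely choice. The cluster numbers discussed below run from 0 to 15, which suggests 16 clusters, but that count is an inference.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, max_iters=100, seed=0):
    """Cluster vectors into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids from k distinct randomly chosen input vectors.
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iters):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def group_by_cluster(labels):
    """Map each cluster id to the indices of the drawings assigned to it."""
    clusters = {}
    for index, label in enumerate(labels):
        clusters.setdefault(label, []).append(index)
    return clusters
```

Grouping drawing indices by cluster id makes it easy to render each cluster as a grid of thumbnails and decide, cluster by cluster, what to keep.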

Right away I spotted the “too complex” chained line drawings in cluster 0 (Figure 6 (a)). There were also several chained line drawings in cluster 3 (Figure 6 (b)) mixed in with some squarish horizontal drawings that I was happy to exclude from my training set, as they looked too different from my more typical standalone drawings of individual faces/people.

(a) Cluster 0
(b) Cluster 3
Figure 6: Clusters with drawings that were “too complex”.

I also noticed some clusters with drawings that were “too simple”. Many of the drawings in cluster 13 (Figure 7 (a)) seemed to be stray lines accidentally separated from any context by the bounding-box preprocessing. Cluster 9 (Figure 7 (b)) had many similar nonsensical lines, though they were mixed in with some false positives: valid drawings that I’d prefer to keep in the dataset.

(a) Cluster 13
(b) Cluster 9
Figure 7: Clusters with drawings that were “too simple”.

I was excited to notice some distinct categories in my drawings, seeing them from a distance.

In the future, as I add more drawings, it’d be great to explicitly label these drawing categories and even train separate models on them. For now, given that I don’t have enough drawings scanned yet, I’m choosing to keep them in one dataset.

Clusters 1, 4, and 11 (in Figure 8 (a), Figure 8 (c), and Figure 8 (i), respectively) all have vertical, narrow, whole-body figures.

Cluster 2, in Figure 8 (b), mostly has rounder compositions of individual faces without a complete body.

Clusters 8 and 15, in Figure 8 (g) and Figure 8 (l), seem to have more complex drawings but mostly still contain drawings of standalone people.

The remaining clusters contain reasonably uniform drawings of standalone people, in vertical compositions, that are not too narrow. See Figure 8 (d), Figure 8 (e), Figure 8 (f), Figure 7 (b), Figure 8 (h), Figure 8 (j), and Figure 8 (k).

(a) Cluster 1
(b) Cluster 2
(c) Cluster 4
(d) Cluster 5
(e) Cluster 6
(f) Cluster 7
(g) Cluster 8
(h) Cluster 10
(i) Cluster 11
(j) Cluster 12
(k) Cluster 14
(l) Cluster 15
Figure 8: Clusters with drawings that looked good to me.

Training on Filtered Dataset

The remainder of the clusters, in Figure 8, looked “good enough” for me to include in my training set. I excluded all other clusters, and saved a filtered-down dataset as 20240104-furtherfiltered.
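The bulk exclusion could look something like the sketch below. It assumes each drawing has a cluster label from the K-Means step, and that the excluded clusters are exactly the four shown in Figures 6 and 7 (clusters 0, 3, 9, and 13), since those are the ones missing from Figure 8. The function and variable names are hypothetical.

```python
# Clusters shown in Figure 6 ("too complex") and Figure 7 ("too simple").
# Treating exactly these four as excluded is an assumption based on Figure 8.
EXCLUDED_CLUSTERS = frozenset({0, 3, 9, 13})

def filter_by_cluster(drawings, labels, excluded=EXCLUDED_CLUSTERS):
    """Keep only the drawings whose cluster label is not in `excluded`."""
    if len(drawings) != len(labels):
        raise ValueError("expected one cluster label per drawing")
    return [d for d, label in zip(drawings, labels) if label not in excluded]

# Usage with placeholder drawing ids:
drawing_ids = ["a", "b", "c", "d", "e"]
cluster_labels = [0, 2, 13, 5, 9]
kept = filter_by_cluster(drawing_ids, cluster_labels)
```

Filtering by whole clusters trades precision for speed: the valid drawings mixed into cluster 9 are lost, but there is no need to review every drawing individually.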

Compared to dataset 20240104, it’s clear in the top row of Figure 9 that in the filtered dataset variant, the distribution of number of strokes has shifted away from the long tail of many-stroke drawings.

Figure 9: Comparing the unfiltered dataset 20240104 (2100 drawings) to the filtered dataset 20240104-furtherfiltered (1300 drawings).

Comparing the training metrics in Figure 10 for the model trained on filtered dataset 20240104-furtherfiltered (in red) with the previous model runs on unfiltered dataset 20240104 (in gray and blue) is not a perfect comparison. Since the validation set for 20240104-furtherfiltered was also filtered, it’s a smaller (and likely noisier) validation set. Still, the new model’s validation loss was roughly within the bounds of what I expected.

Figure 10: Training and validation loss metrics from models trained on 20240104-furtherfiltered using visual filtering on the bounding-box separated drawings (red).

Qualitatively, the generated results after visual similarity filtering were significantly improved.

Figure 11: Generated samples after training with visual filtering on bbox-separated dataset.

Even the generated results that looked less like people/faces to me still had appealing curves and flowing patterns, which I recognize from my own drawing style.

Figure 12: Generated samples after training with visual filtering on bbox-separated dataset.

If you want to keep reading, check out part 7 of my SLML series.


  1. based on what I observed when filtering the bounding-box dataset by number of points

  2. similar to what I did with full sketchbook pages in part 1