SLML Part 5 - SketchRNN Experiments: Path Preprocessing

Experiments training SketchRNN on my dataset of single-line drawings.

Andrew Look


December 14, 2023


This post is part 5 of “SLML” - Single-Line Machine Learning.

To read the previous post, check out part 4. If you want to keep reading, check out part 6.

Path Joining

Despite some improvement within the strokes, my latest models were still producing many short strokes. So I figured that improving my dataset would give better results than fiddling with hyper-parameters or my model.

Looking at my training data, I returned to an issue from the stroke3-conversion process: autotrace hadn’t realized that the different segments of my lines were actually connected. It was producing a series of small, centerline-traced segments, as seen in Figure 1 (a).

I’d need some preprocessing to connect the smaller strokes into larger strokes, so that my SketchRNN model would learn to generate long contiguous single-line drawings.

(a) Original: 22 strokes
(b) After path-joining: 4 strokes
(c) After path-splicing: 2 strokes
Figure 1: Comparison of number of strokes before and after preprocessing, with one color per stroke.

My first preprocessing improvement was a simple algorithm that I called “path joining”.

Each of my drawings is represented as a collection of line segments, each with a start and end point. I try to find the shortest connection I can draw to connect two segments. Starting with the longest stroke, I compare it to each of the other strokes and calculate the minimum distance between the start and end points of the line segments. After going through all the strokes in a drawing, I connect the two strokes with the shortest gap between their endpoints. I repeat this process, joining strokes until no two remaining strokes have endpoints within 30 pixels of each other.

I set an upper bound of 30 pixels (in a 200x200 pixel image) on the path-joining distance. That was the largest distance that made sense in the drawings I looked at; beyond it, I started to erroneously join strokes that were too far apart.
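As a rough illustration, path joining could be sketched as follows. This is a simplified sketch under assumptions, not the actual implementation: strokes are taken to be NumPy arrays of absolute (x, y) points, the function names `endpoint_gap` and `join_paths` are hypothetical, and the search compares every pair of strokes rather than starting from the longest one.

```python
import numpy as np

def endpoint_gap(a, b):
    """Return (distance, oriented_a, oriented_b) for the closest endpoint pairing.

    The strokes are oriented so that the last point of oriented_a sits next to
    the first point of oriented_b.
    """
    options = [
        (a, b),              # a's end   -> b's start
        (a, b[::-1]),        # a's end   -> b's end
        (a[::-1], b),        # a's start -> b's start
        (a[::-1], b[::-1]),  # a's start -> b's end
    ]
    scored = ((np.linalg.norm(x[-1] - y[0]), x, y) for x, y in options)
    return min(scored, key=lambda t: t[0])

def join_paths(strokes, max_dist=30.0):
    """Greedily join the two strokes with the smallest endpoint gap.

    Stops once the smallest remaining gap exceeds max_dist (30 px in the post).
    """
    strokes = [np.asarray(s, dtype=float) for s in strokes]
    while len(strokes) > 1:
        best = None  # (distance, i, j, oriented_i, oriented_j)
        for i in range(len(strokes)):
            for j in range(i + 1, len(strokes)):
                dist, a, b = endpoint_gap(strokes[i], strokes[j])
                if best is None or dist < best[0]:
                    best = (dist, i, j, a, b)
        dist, i, j, a, b = best
        if dist > max_dist:
            break
        joined = np.concatenate([a, b])
        strokes = [s for k, s in enumerate(strokes) if k not in (i, j)]
        strokes.append(joined)
    return strokes
```

Reversing a stroke before concatenating it keeps the joined path continuous no matter which pair of endpoints happened to be closest.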

Though path-joining improved the average stroke length as seen in Figure 1 (b), I noticed some drawings had large contiguous strokes that weren’t getting connected. I realized that while the strokes were close together and in some cases touching, their start and end points were far apart from each other.

Path Splicing

My next preprocessing improvement, “path splicing”, would run after “path joining” and attempt to address this problem.

After path joining leaves a smaller number of strokes, I want to find the shortest connections to combine multiple longer strokes. Starting with the longest path, I look for a smaller stroke that I could “splice” into the middle of the larger stroke. For each candidate stroke, I’d step through each point on the larger stroke and compare its distance from the start and end points of the shorter paths. When I found the smallest gap, I would “splice” the shorter line into the longer path at the point with the smallest distance.
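A minimal sketch of the splicing step might look like the code below. It rests on several assumptions that the description above leaves open: strokes are NumPy arrays of absolute (x, y) points, "longest" means the stroke with the most points, the `max_dist` cutoff is a hypothetical parameter, and after the spliced stroke ends the path simply continues to the next point of the longer stroke. The names `best_splice_point` and `splice_paths` are illustrative.

```python
import numpy as np

def best_splice_point(long_path, short_path):
    """Return (gap, index_on_long, oriented_short) for the closest approach."""
    # Distance from every point on the long path to each endpoint of the short path.
    d_start = np.linalg.norm(long_path - short_path[0], axis=1)
    d_end = np.linalg.norm(long_path - short_path[-1], axis=1)
    if d_start.min() <= d_end.min():
        return d_start.min(), int(d_start.argmin()), short_path
    # The short path's end is closer, so traverse it in reverse.
    return d_end.min(), int(d_end.argmin()), short_path[::-1]

def splice_paths(strokes, max_dist=30.0):
    """Splice shorter strokes into the longest stroke at their closest point."""
    ordered = sorted(
        (np.asarray(s, dtype=float) for s in strokes), key=len, reverse=True
    )
    main, rest = ordered[0], ordered[1:]
    while rest:
        candidates = [best_splice_point(main, s) + (i,) for i, s in enumerate(rest)]
        gap, k, short, i = min(candidates, key=lambda c: c[0])
        if gap > max_dist:
            break
        # Insert the short stroke right after point k of the main stroke.
        main = np.concatenate([main[: k + 1], short, main[k + 1 :]])
        rest.pop(i)
    return [main] + rest
```

Unlike path joining, the comparison here runs over every point of the longer stroke, so two strokes that touch in the middle can be combined even when their endpoints are far apart.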

While not every drawing was turned into a single stroke, this was a big improvement, as seen in Figure 1 (c).

Figure 2: Previous dataset look_i16__minn10, left, compared to path-spliced dataset v2-splicedata, right.

Based on my earlier experiment showing the benefits of short-stroke exclusion, I wanted to try training a model on this new dataset.

Training after Path-Splicing

I ran the preprocessing on the whole dataset. Next, I filtered the original 1300 drawings to exclude any drawings with more than 6 strokes, resulting in a new dataset of 1200 drawings that I named v2-splicedata. Then I trained a new set of models, keeping the layernorm and recurrent dropout enabled.
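The stroke-count filter can be sketched like this, assuming each drawing is a stroke-3 array whose rows are (dx, dy, pen_lifted), with pen_lifted set to 1 on the last point of each stroke. The helper names `count_strokes` and `filter_by_stroke_count` are hypothetical.

```python
import numpy as np

def count_strokes(drawing):
    """Count strokes in a stroke-3 array with rows of (dx, dy, pen_lifted)."""
    pen_lifts = int(np.sum(drawing[:, 2] == 1))
    # If the final point does not lift the pen, the trailing stroke still counts.
    return pen_lifts + (0 if drawing[-1, 2] == 1 else 1)

def filter_by_stroke_count(drawings, max_strokes=6):
    """Keep only drawings with at most max_strokes strokes."""
    return [d for d in drawings if count_strokes(d) <= max_strokes]
```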

Figure 3: Training and validation loss metrics of recurrent dropout model (light green) alongside models trained on this joined/spliced dataset (turquoise/dark green).

Since these models were trained on a path-spliced dataset, the training metrics aren’t a perfect comparison: the content of the validation set also changed when I applied path-splicing. Still, I can see from the validation loss graph that the model started to overfit around 1000 steps. The roughly similar shapes of the early train and validation loss curves at least convinced me that the model hadn’t gotten dramatically worse.

Figure 4: Generated samples from a model trained on path-spliced dataset.

Qualitatively, the generated drawings showed a big improvement. The model had learned to generate longer unbroken strokes. I started to notice results that looked more like single-line drawings of people. In some cases they look surreal, but I started to see some more recognizable face elements and in some cases full bodies.

Figure 5: Iterating through full dataset, with 3 frames per drawing: original, path-joined, and path-spliced. I liked the slight jitter in the animation as I watched the drawings go from colorful (many strokes) to fully blue (single stroke).

Bounding Boxes

My last remaining problem: there were some pages where I’d make 4 or 5 separate drawings that had nothing to do with each other, and had a lot of space between them.

I wanted to separate those into separate examples before making a training dataset, for 2 reasons:

  1. To get more training examples from my limited number of scanned drawings.
  2. To avoid confusing the model when some examples are complex drawings with multiple people, and other drawings are just one person.
If the model sometimes learns to make a complete drawing and then start a second unrelated drawing, how does the model know when to finish a drawing vs. when to start a second drawing alongside the first one?

My intuition for my third preprocessing improvement, “bounding box separation”, came when I noticed how unrelated strokes within a drawing didn’t overlap much, and tended to be spaced far apart. For each stroke within a drawing, I’d determine its top/bottom/left/right extremes and consider a box around each stroke as in Figure 6.

Figure 6: Example drawing with multiple unrelated strokes.

Then for each pair of bounding boxes, I’d compute the ratio of the area of their overlap to the area of their union, as in Figure 7.

Figure 7: Intersection over Union (“IOU”) Metric.
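The bounding box and overlap ratio can be sketched as below, using the standard intersection-over-union definition that the Figure 7 caption names. Boxes are assumed to be (left, top, right, bottom) tuples in pixel coordinates, and the function names `stroke_bbox` and `iou` are illustrative.

```python
import numpy as np

def stroke_bbox(points):
    """Return (left, top, right, bottom) for a stroke given as (x, y) points."""
    pts = np.asarray(points, dtype=float)
    left, top = pts.min(axis=0)
    right, bottom = pts.max(axis=0)
    return float(left), float(top), float(right), float(bottom)

def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    # Degenerate boxes (zero area on both sides) are treated as non-overlapping.
    return inter / union if union > 0 else 0.0
```

For example, two 10x10 boxes offset by 5 pixels in each direction overlap in a 5x5 square, giving 25 / 175, or roughly 0.14.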

If the ratio exceeds some threshold, I consider them to be part of the same drawing, and I merge the bounding boxes as in Figure 8 (a). Combining that merged bounding box along with all the remaining bounding boxes, I repeat the process until no bounding-box intersections remain that exceed the threshold.

IOU = 0.03

IOU = 0.12
(a) Merged BBoxes
Figure 8: Comparison of high-IOU vs. low-IOU bounding box intersections.
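The repeated merging could be sketched as follows. The threshold value is not stated in the post, so the default of 0.05 is purely a placeholder assumption, as are the function names and the (left, top, right, bottom) box layout. Each group also tracks which stroke indices it contains, so the strokes can later be split into separate drawings.

```python
def iou(a, b):
    """Intersection over union of two (left, top, right, bottom) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def merge_boxes(boxes, threshold=0.05):
    """Merge boxes whose IoU exceeds threshold until no such pair remains.

    Returns a list of (merged_box, member_stroke_indices) groups.
    """
    groups = [(tuple(b), [i]) for i, b in enumerate(boxes)]
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                (a, members_a), (b, members_b) = groups[i], groups[j]
                if iou(a, b) > threshold:
                    # Replace the pair with the smallest box enclosing both.
                    enclosing = (
                        min(a[0], b[0]), min(a[1], b[1]),
                        max(a[2], b[2]), max(a[3], b[3]),
                    )
                    groups = [g for k, g in enumerate(groups) if k not in (i, j)]
                    groups.append((enclosing, members_a + members_b))
                    merged = True
                    break
            if merged:
                break
    return groups
```

Restarting the scan after every merge matters because a newly merged box is larger and may now overlap boxes that neither of its parts overlapped enough on its own.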

Also, if any bounding boxes have a really small area, I just drop them. It turns out this helps exclude small scribbles of text that were ending up in my training data as separate strokes - for example, the page number at the bottom right of Figure 9 (a).

Original: 4 Strokes
(a) Tiny page number: 1 stroke

Merge: 3 Strokes

Figure 9: Example from training set of a very small stroke being removed.
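Dropping tiny boxes is a one-line filter. In this sketch the `min_area` cutoff of 400 square pixels is a made-up placeholder, since the post does not say what area counted as "really small", and boxes are again assumed to be (left, top, right, bottom) tuples.

```python
def box_area(box):
    """Area of a (left, top, right, bottom) box."""
    return (box[2] - box[0]) * (box[3] - box[1])

def drop_small_boxes(boxes, min_area=400.0):
    """Discard boxes smaller than min_area, such as stray page numbers."""
    return [b for b in boxes if box_area(b) >= min_area]
```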

Dataset 20231214

Once I have all the separated strokes, I save them into a new dataset as separate drawings. While the previous dataset v2-splicedata only has 1200 drawings, the new bounding-box separated dataset 20231214 has 2400 drawings.

Figure 10: Comparison of previous dataset v2-splicedata with 20231214 and 20231214-filtered.

The dataset grew from 1200 to 2400 drawings because pages containing multiple distinct drawings (such as Figure 11 (a)) were divided into separate rows in the new training set (like Figure 11 (b), Figure 11 (c), Figure 11 (d)).

(a) Original
Figure 11: Original drawing (left) was one row in dataset v2-splicedata. The rightmost three drawings are distinct rows in dataset 20231214.

The new separated drawings looked more visually consistent with the average drawing in the training set as a whole. The new dataset contains far more single-character drawings, so I expect that the RNN will benefit from learning on a set of drawings with more similar subject matter.

I hypothesized that the bbox-separated dataset would be a big improvement because of the consistency of the resulting drawings. Before, the model was learning that sometimes drawings end after one person is drawn, but sometimes we move the pen and start a new person.

Filtering by Number of Points

Looking at the distribution of number of points per drawing in the new dataset 20231214 in the bottom-middle chart in Figure 10, I noticed a long tail of drawings with more than 500 points. To explore this, I created a variant dataset 20231214-filtered.

Dataset 20231214-filtered was filtered down to 1800 drawings, keeping only drawings with more than 50 and fewer than 300 points, as you can see in the bottom-right chart in Figure 10.
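The point-count filter amounts to a length check per drawing. This sketch assumes each drawing is a sequence with one entry per point, and the function name `filter_by_point_count` is hypothetical. Both bounds are exclusive, matching "more than 50" and "fewer than 300".

```python
def filter_by_point_count(drawings, min_points=50, max_points=300):
    """Keep drawings with strictly more than min_points and fewer than max_points."""
    return [d for d in drawings if min_points < len(d) < max_points]
```

A hard cutoff like this is simple, but it does exclude borderline cases on both sides of the thresholds.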

Wondering if drawings with many points were less likely to have consistent subject matter (individual people) than the rest of the training set, I sampled some drawings with over 300 points. While drawings such as Figure 12 (a) and Figure 12 (b) were obvious candidates to exclude, there were valid drawings near the margin such as Figure 12 (c) that I would be excluding after I picked a threshold.

Possible Improvement: Filtering by visual embedding might be more reliable to exclude complex drawings
(a) 1093 points
(b) 1127 points
(c) 329 points
Figure 12: Drawings with over 300 points.

I also looked at the low end of the distribution and found drawings with under 50 points. There were nonsensical squiggles such as Figure 13 (a) that I was happy to exclude. There were cases below the 50 point threshold such as Figure 13 (b) and Figure 13 (c) that looked recognizable as my drawings, but had been simplified by RDP too aggressively.

Possible Improvement: Applying RDP after bounding box separation rather than before.
(a) 21 points
(b) 30 points
(c) 41 points
Figure 13: Drawings with under 50 points.

Training after Bounding Boxes

Figure 14: Training and validation loss metrics for a model trained on 20231214 (beige) compared with models trained on dataset 20231214-filtered with and without stroke augmentation (green, burgundy).

After training on the unfiltered dataset 20231214, I noticed that some drawings were devolving into a sequence of repeated face features without forming a cohesive person or face, as in Figure 15.

Figure 15: Generated results after training on unfiltered dataset 20231214.

The model results in Figure 16 after training on filtered dataset 20231214-filtered appear qualitatively better to me. The best results I could find had long coherent strokes capturing part of a face and sometimes a corresponding body.

Figure 16: Generated results after training on filtered dataset 20231214-filtered.

The model results in Figure 17 after training with stroke-augmentation on filtered dataset 20231214-filtered appear to be of roughly similar quality to those in Figure 16. Again, the best results I could find had long coherent strokes capturing part of a face and sometimes a corresponding body.

Figure 17: Generated results after training on filtered dataset 20231214-filtered, with stroke augmentation enabled.

If you want to keep reading, check out part 6 of my SLML series.