SLML Part 5 - SketchRNN Experiments: Path Preprocessing
This post is part 5 of “SLML” - Single-Line Machine Learning.
To read the previous post, check out part 4. If you want to keep reading, check out part 6.
Path Joining
Despite some improvement within individual strokes, my latest models were still producing many short strokes. So I figured that improving my dataset would give better results than fiddling with hyper-parameters or my model.
Looking at my training data, I returned to an issue from the stroke3-conversion process: autotrace hadn’t realized that the different segments of my lines were actually connected. It was producing a series of small, centerline-traced segments, as seen in Figure 1 (a).
I’d need some preprocessing to connect the smaller strokes into larger strokes, so that my SketchRNN model would learn to generate long contiguous single-line drawings.
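For context, the stroke-3 format that SketchRNN trains on encodes a drawing as one row per point of (dx, dy, pen_lift), where a pen lift ends the current stroke. A toy example with made-up coordinates shows why autotrace's fragmented output matters: every extra pen lift is another short stroke the model has to learn.

```python
import numpy as np

# A toy drawing in stroke-3 format: each row is (dx, dy, pen_lift).
# pen_lift = 1 means the pen comes up after this point, ending the stroke.
# The coordinates here are made up purely for illustration.
drawing = np.array([
    [ 0,  0, 0],   # start of the first stroke
    [10,  5, 0],
    [ 8, -3, 1],   # pen lifts: end of the first stroke
    [30, 40, 0],   # pen jumps to the start of the second stroke
    [ 5,  5, 1],   # end of the second stroke (and the drawing)
])

# Joining strokes in this format means removing pen lifts between them,
# so the drawing becomes one long contiguous stroke.
num_strokes = int(drawing[:, 2].sum())  # == 2
```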
My first preprocessing improvement was a simple algorithm that I called “path joining”.
Each of my drawings is represented as a collection of line segments, each with a start and end point. I try to find the shortest connection I can draw to connect two segments. Starting with the longest stroke, I compare it to each of the other strokes and calculate the minimum distance between the start and end points of the line segments. After going through all the strokes in a drawing, I connect the two strokes with the shortest gap between their endpoints. I repeat this process, joining strokes until no two strokes remain whose endpoints are within 30 pixels of each other.
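Here is a minimal sketch of that path-joining pass, assuming each stroke is an array of (x, y) points. The helper names are my own, the greedy anchor-on-the-longest-stroke loop follows the description above, and the 30-pixel cutoff is exposed as a parameter.

```python
import numpy as np

def endpoint_gap(a, b):
    """Smallest distance between an endpoint of stroke a and an endpoint of stroke b.

    Returns (distance, flip_a, flip_b): the flips indicate whether a stroke
    must be reversed so that the two closest endpoints meet when concatenated.
    """
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    candidates = [
        (np.linalg.norm(a[-1] - b[0]), False, False),  # end of a   -> start of b
        (np.linalg.norm(a[-1] - b[-1]), False, True),  # end of a   -> end of b
        (np.linalg.norm(a[0] - b[0]), True, False),    # start of a -> start of b
        (np.linalg.norm(a[0] - b[-1]), True, True),    # start of a -> end of b
    ]
    return min(candidates, key=lambda c: c[0])

def join_paths(strokes, max_gap=30.0):
    """Greedily join the longest stroke to its nearest neighbour, repeating
    until the closest remaining endpoint gap exceeds max_gap pixels."""
    strokes = [np.asarray(s, dtype=float) for s in strokes]
    while len(strokes) > 1:
        strokes.sort(key=len, reverse=True)          # longest stroke is the anchor
        anchor, rest = strokes[0], strokes[1:]
        gaps = [endpoint_gap(anchor, s) for s in rest]
        idx = min(range(len(rest)), key=lambda k: gaps[k][0])
        dist, flip_a, flip_b = gaps[idx]
        if dist > max_gap:
            break                                    # nothing close enough to join
        a = anchor[::-1] if flip_a else anchor
        b = rest[idx][::-1] if flip_b else rest[idx]
        strokes = [np.concatenate([a, b])] + [s for k, s in enumerate(rest) if k != idx]
    return strokes
```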
Though path-joining improved the average stroke length as seen in Figure 1 (b), I noticed some drawings had large contiguous strokes that weren’t getting connected. I realized that while the strokes were close together and in some cases touching, their start and end points were far apart from each other.
Path Splicing
My next preprocessing improvement, “path splicing”, would run after “path joining” and attempt to address this problem.
After path joining leaves a smaller number of strokes, I want to find the shortest connections to combine the remaining longer strokes. Starting with the longest path, I look for a smaller stroke that I could “splice” into the middle of the larger stroke. For each candidate stroke, I step through each point on the larger stroke and compare its distance to the start and end points of the shorter stroke. At the point with the smallest gap, I “splice” the shorter line into the longer path.
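And a sketch of the splicing step under the same assumptions (strokes as arrays of (x, y) points). Here I simply splice every remaining stroke into the longest one, longest first, which is a simplification of the candidate search described above.

```python
import numpy as np

def splice_paths(strokes):
    """Splice shorter strokes into the longest stroke at the point where the
    gap between the long stroke and a short stroke's endpoints is smallest."""
    strokes = sorted((np.asarray(s, dtype=float) for s in strokes), key=len, reverse=True)
    main, rest = strokes[0], strokes[1:]
    for short in rest:
        # Distance from every point of the main stroke to each endpoint of the short stroke.
        d_start = np.linalg.norm(main - short[0], axis=1)
        d_end = np.linalg.norm(main - short[-1], axis=1)
        if d_start.min() <= d_end.min():
            i, insert = int(d_start.argmin()), short        # enter at the short stroke's start
        else:
            i, insert = int(d_end.argmin()), short[::-1]    # enter at the short stroke's end
        # Detour through the short stroke, then continue along the main stroke.
        main = np.concatenate([main[: i + 1], insert, main[i + 1 :]])
    return [main]
```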
While not every drawing was turned into a single stroke, this was a big improvement, as seen in Figure 1 (c).
Dataset look_i16__minn10, left, compared to path-spliced dataset v2-splicedata, right.
Based on my earlier experiment showing the benefits of short-stroke exclusion, I wanted to try training a model on this new dataset.
Training after Path-Splicing
I ran the preprocessing on the whole dataset. Next, I filtered the original 1300 drawings to exclude any drawings with more than 6 strokes, resulting in a new dataset of 1200 drawings that I named v2-splicedata.
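The stroke-count filter itself is just a list filter; a rough sketch, where the drawings variable and print-out are illustrative and the 6-stroke cutoff is the one mentioned above:

```python
# Keep only drawings with at most 6 strokes after joining and splicing.
# `drawings` is assumed to be a list of stroke lists; the name is illustrative.
MAX_STROKES = 6

filtered_drawings = [d for d in drawings if len(d) <= MAX_STROKES]
print(f"kept {len(filtered_drawings)} of {len(drawings)} drawings")
```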
Then I trained a new set of models, keeping the layernorm and recurrent dropout enabled.
The training metrics from the models trained on the path-spliced dataset aren't a perfect comparison with earlier runs, since the content of the validation set also changed when I applied path splicing. Still, I can see from the validation loss graph that the model started to overfit around 1000 steps. The roughly similar shapes of the early train and validation loss curves at least convinced me that the model hadn't gotten dramatically worse.
Qualitatively, the generated drawings showed a big improvement. The model had learned to generate longer unbroken strokes. I started to notice results that looked more like single-line drawings of people. In some cases they look surreal, but I started to see some more recognizable face elements and in some cases full bodies.
Bounding Boxes
My last remaining problem: there were some pages where I’d make 4 or 5 separate drawings that had nothing to do with each other, and had a lot of space between them.
I wanted to separate those into separate examples before making a training dataset, for 2 reasons:
- To get more training examples from my limited number of scanned drawings.
- To avoid confusing the model when some examples are complex drawings with multiple people, and other drawings are just one person.
My intuition for my third preprocessing improvement, “bounding box separation”, came when I noticed that unrelated strokes within a drawing didn't overlap much and tended to be spaced far apart. For each stroke within a drawing, I'd determine its top/bottom/left/right extremes and consider a box around it, as in Figure 6.
Then for each pair of bounding boxes, I'd compute the ratio of the area of their overlap to the area of the non-overlapping parts, as in Figure 7.
If the ratio exceeds some threshold, I consider them to be part of the same drawing and merge their bounding boxes, as in Figure 9 (a). I then repeat the process with the merged bounding box and all the remaining bounding boxes, until no bounding-box intersections remain that exceed the threshold.
Also, if any bounding boxes have a really small area, I just drop them. It turns out this helps exclude small scribbles of text that were ending up in my training data as separate strokes - for example, the page number at the bottom right of Figure 9 (a).
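A sketch of the whole bounding-box separation pass, under a few assumptions: boxes are axis-aligned (left, top, right, bottom) tuples, the overlap ratio is intersection area over non-overlapping area as described above, and the threshold and minimum-area values are illustrative placeholders rather than the exact values from my pipeline.

```python
import numpy as np

def stroke_bbox(stroke):
    """Axis-aligned bounding box (left, top, right, bottom) of a stroke's points."""
    pts = np.asarray(stroke, dtype=float)
    return pts[:, 0].min(), pts[:, 1].min(), pts[:, 0].max(), pts[:, 1].max()

def box_area(box):
    left, top, right, bottom = box
    return max(0.0, right - left) * max(0.0, bottom - top)

def overlap_ratio(a, b):
    """Area of the intersection divided by the area of the non-overlapping parts."""
    left, top = max(a[0], b[0]), max(a[1], b[1])
    right, bottom = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, right - left) * max(0.0, bottom - top)
    non_overlap = box_area(a) + box_area(b) - 2 * inter
    return inter / non_overlap if non_overlap > 0 else float("inf")

def merge_boxes(a, b):
    """Smallest box containing both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def separate_drawings(strokes, ratio_threshold=0.05, min_area=500.0):
    """Group strokes into separate drawings by merging overlapping bounding boxes.

    Thresholds are illustrative placeholders. Returns a list of stroke groups;
    groups whose merged box is tiny (stray text, page numbers) are dropped.
    """
    boxes = [stroke_bbox(s) for s in strokes]
    groups = [[s] for s in strokes]
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                if overlap_ratio(boxes[i], boxes[j]) > ratio_threshold:
                    boxes[i] = merge_boxes(boxes[i], boxes[j])
                    groups[i].extend(groups[j])
                    del boxes[j]
                    del groups[j]
                    merged = True
                    break
            if merged:
                break
    return [g for g, b in zip(groups, boxes) if box_area(b) >= min_area]
```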
Dataset 20231214
Once I have all the separated strokes, I save them into a new dataset as separate drawings. While the previous dataset v2-splicedata only has 1200 drawings, the new bounding-box-separated dataset 20231214 has 2400 drawings.
Comparing dataset v2-splicedata with 20231214 and 20231214-filtered.
The dataset grew from 1200 to 2400 drawings because pages containing multiple distinct drawings (such as Figure 11 (a)) were divided into separate rows in the new training set (like Figure 11 (b), Figure 11 (c), Figure 11 (d)).
A single row in dataset v2-splicedata, left. The rightmost three drawings are distinct rows in dataset 20231214.
The new separated drawings looked more visually consistent with the typical drawing in the training set as a whole. The new dataset contains far more single-character drawings, so I expect that the RNN will benefit from learning on a set of drawings with more similar subject matter.
I hypothesized that the bbox-separated dataset would be a big improvement because of the consistency of the resulting drawings. Before, the model was learning that sometimes drawings end after one person is drawn, but sometimes the pen moves on and starts a new person.
Filtering by Number of Points
Looking at the distribution of the number of points per drawing in the new dataset 20231214 (the bottom-middle chart in Figure 10), I noticed a long tail of drawings with more than 500 points. To explore this, I created a variant dataset, 20231214-filtered, which was filtered down to 1800 drawings by keeping only drawings with more than 50 and fewer than 300 points, as you can see in the bottom-right chart in Figure 10.
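A minimal sketch of that point-count filter, assuming each drawing is a stroke-3 array with one row per point; the 50/300 thresholds are the ones mentioned above, and the variable names are just for illustration.

```python
import numpy as np

MIN_POINTS, MAX_POINTS = 50, 300

def point_count(drawing):
    """Number of points in a stroke-3 drawing (rows of dx, dy, pen_lift)."""
    return np.asarray(drawing).shape[0]

# `dataset` is assumed to be a list of stroke-3 arrays; the name is illustrative.
filtered = [d for d in dataset if MIN_POINTS < point_count(d) < MAX_POINTS]
```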
Wondering if drawings with many points were less likely to have consistent subject matter (individual people) than the rest of the training set, I sampled some drawings with over 300 points. While drawings such as Figure 12 (a) and Figure 12 (b) were obvious candidates to exclude, there were valid drawings near the margin, such as Figure 12 (c), that I would be excluding once I picked a threshold.
I also looked at the low end of the distribution and found drawings with under 50 points. There were nonsensical squiggles such as Figure 13 (a) that I was happy to exclude. There were cases below the 50 point threshold such as Figure 13 (b) and Figure 13 (c) that looked recognizable as my drawings, but had been simplified by RDP too aggressively.
Training after Bounding Boxes
Models trained on dataset 20231214 (beige) compared with models trained on dataset 20231214-filtered with and without stroke augmentation (green, burgundy).
After training on the unfiltered dataset 20231214, I noticed that some drawings were devolving into a sequence of repeated face features without forming a cohesive person or face, as in Figure 15.
Results from the model trained on dataset 20231214.
The model results in Figure 16, after training on the filtered dataset 20231214-filtered, appear qualitatively better to me. The best results I could find had long coherent strokes capturing part of a face and sometimes a corresponding body.
Results from the model trained on dataset 20231214-filtered.
The model results in Figure 17, after training with stroke augmentation on the filtered dataset 20231214-filtered, appear roughly similar in quality to those in Figure 16. Again, the best results I could find had long coherent strokes capturing part of a face and sometimes a corresponding body.
Results from the model trained on dataset 20231214-filtered, with stroke augmentation enabled.
If you want to keep reading, check out part 6 of my SLML series.