Sketchpose: Learning to Segment Cells with Partial Annotations

Clément Cazorla1,2, Nathanaël Munier1, Renaud Morin2, Pierre Weiss1
1: Institut de Recherche en Informatique de Toulouse (IRIT), Institut de Mathématiques de Toulouse (IMT), Centre de Biologie Intégrative (CBI), Laboratoire de Biologie Moléculaire, Cellulaire et du Développement (MCD), Université de Toulouse, CNRS, Université Toulouse III – Paul Sabatier, Toulouse, France, 2: Imactiv-3D, Centre Pierre Potier, 1 place Pierre Potier, 31100 Toulouse, France
Publication date: 2025/08/22
https://doi.org/10.59275/j.melba.2025-f7b3
PDF · Code · Documentation · arXiv

Abstract

The most popular networks used for cell segmentation (e.g. Cellpose, Stardist, HoverNet) rely on the prediction of a distance map. This yields unprecedented accuracy but hinges on fully annotated datasets, which is a serious limitation when generating training sets and performing transfer learning. In this paper, we propose a method that still relies on the distance map while handling partially annotated objects. We evaluate the performance of the proposed approach in the contexts of frugal learning, transfer learning and regular training on large databases. Our experiments show that it can lead to substantial savings in time and resources without sacrificing segmentation quality. The proposed algorithm is embedded in a user-friendly Napari plugin.

Keywords

Cellpose · Deep learning · Distance Map · Frugal learning · Napari · Segmentation




1 Introduction

Image segmentation plays a fundamental role in the analysis of biological images. It enables the extraction of quantitative information on diverse objects ranging from molecules, droplets, membranes, nuclei, cells, vessels or other structures. In modern biological research, accurate segmentation is often pivotal to better understand the mechanisms of life. The increasing availability of high-throughput imaging technologies has led to a surge in the quantity and complexity of image data, raising significant challenges and opportunities. Manual annotation of the resulting images is labor-intensive, time-consuming, and often impractical for large-scale datasets. Automated segmentation is therefore widely accepted as a critical step in biological research.

A simplified history of cell segmentation

Image segmentation has long been dominated by handcrafted algorithms. The processing pipelines typically combine popular tools such as linear filtering, thresholding Otsu (1979), morphological operations Serra and Soille (2012); Legland et al. (2016), active contour models (Snake) Kass et al. (1988) or watershed Vincent and Soille (1991). A significant issue with handcrafted approaches is that they are usually image-specific and rely on the manual tuning of a few complicated hyper-parameters. Although excellent performance can be achieved, it is often the work of a handful of talented people and these techniques are not broadly applicable.

The introduction of machine learning, and especially random forests, made image segmentation accessible to a much larger range of researchers. These techniques automatically combine and tune elementary image processing bricks, and they are driven by a few easily interpretable user annotations. Embedded in well-conceived software such as Ilastik Berg et al. (2019) or Labkit Arzt et al. (2022), these techniques contributed heavily to democratizing image segmentation and classification.

Deep learning and convolutional neural networks played an important role in improving segmentation performance around 2015. For instance, the popular U-Net architecture Ronneberger et al. (2015) increased the accuracy on some cell segmentation challenges by more than 10%, which can be considered a small revolution. This type of neural network architecture seems to be a good prior for segmenting “natural” images, as suggested by the so-called Deep Image Prior principle Lempitsky et al. (2018). However, it is sometimes less effective at separating nearby or touching objects. Many applications in biology involve densely packed objects (e.g. cells, nuclei) and a pixel-classification U-Net is often insufficient to perform a satisfactory analysis. To address this issue, new architectures coming from computer vision, such as Mask R-CNN He et al. (2017), have been developed and continued improving the performance.

Roughly at the same time, a few approaches (Deep watershed transform Bai and Urtasun (2017), deep regression of the distance map Naylor et al. (2018); Kumar et al. (2019), StarDist Schmidt et al. (2018), Hover-Net Graham et al. (2019), Cellpose Stringer et al. (2021), Omnipose Cutler et al. (2022)) were developed and generated results of unprecedented quality. Despite certain differences, they all share a common underlying principle: a regression with respect to some distance function. Given a set of annotated objects, a distance function to the object centers or boundaries is computed. A convolutional neural network is then trained to predict this distance function rather than a binary map of the objects. The gradient of the distance function points in opposite directions on each side of a boundary, which makes it possible to localize boundaries with much greater precision. This principle created a new gap in segmentation accuracy, especially for objects with touching boundaries.

A current trend consists in involving the user in the training procedure. This “human in the loop” principle was incorporated in CellPose 2.0 Pachitariu and Stringer (2022). Users fully annotate patches of the segmented image, to adapt the neural network weights to the image at hand.

It would be hazardous to call these approaches the current “state-of-the-art”, since this field is expanding extremely quickly. However – as of 2025 – we can safely claim that algorithms based on the distance map form the basis of some of the most popular and efficient cell segmentation methods.

Contributions

This work stems from a practical observation: methods that rely on a regression to the distance function currently require exhaustive annotations. As the distance function is a global geometrical property, it is impossible to compute it from just a few sketches. Hence, it is a priori unclear how partial annotations can be used in this framework, see Figure 1. Cellpose 2 gets around this problem by letting the user annotate patches of interest in their entirety. Similarly, Sugawara (2023) recently proposed a simple extension of Stardist and Cellpose by training the networks on a subset of completely annotated objects. This is a time-consuming process that does not allow the expert to focus on local spots (e.g. a part of a boundary) where the network clearly missed the segmentation.

Figure 1: Partial vs. full annotation. (a) Image. (b) Full annotation. (c) Distance map. (d) Partial annotation. In (b), the complete annotation is used to compute the distance map shown in (c). In (d), it is unclear how to compute it from the partial object annotation.

In this paper, we introduce a novel idea that allows us to use the distance function even with partially annotated objects. After drawing just a few regions and boundaries, the user can train a task-aware neural network. This approach capitalizes on the generalization capacity of neural networks, reducing the overall annotation effort without sacrificing accuracy. We explore the performance of the proposed architecture in 3 different settings:

  • Few-shot learning: starting from random weights, we show that just a few partial annotations are already enough to quickly perform complex cell segmentation analyses. This is interesting when faced with a problem for which no close pre-trained model exists.

  • Transfer learning: starting from Omnipose’s optimized weights, we show that just a few clicks at locations where the segmentation is inaccurate lead to improved weights and fast adaptation to out-of-distribution images. This is the traditional setting of transfer learning and domain adaptation. Our contribution here is to show that this can be done with only a few scattered annotations.

  • Large databases: finally, we show that large, but partially annotated sets can also be used to train high performance neural networks. This is important since it can significantly accelerate the design of segmentation databases.

This evaluation on both small and large-scale datasets showcases the advantages of our approach in terms of time and resource savings. We developed a Napari plugin Chiu et al. (2022) named Sketchpose to assess its potential, ensure reproducibility of the results and provide an additional tool to the community. It relies on a modified version of the Omnipose Cutler et al. (2022) algorithm. The plugin is downloaded regularly, with 638 downloads to date.

2 Methodology

2.1 Preliminary definitions and notations

Throughout the paper, $\mathcal{X}$ refers to the image domain, which can be understood as a discrete set of coordinates or as a continuous domain, depending on the context. In the discrete setting, we let $|\mathcal{X}|$ denote the number of pixels of $\mathcal{X}$.

Definition 1

For an arbitrary set $\mathcal{S}\subset\mathcal{X}$, we let $\partial\mathcal{S}$ denote its boundary. We use the 4-connectivity (top, bottom, left, right) in the discrete setting.

Definition 2 (Point to set distance)

The distance from a point $\mathbf{x}\in\mathcal{X}$ to a set $\mathcal{S}\subseteq\mathcal{X}$ is defined by

$$\mathrm{dist}(\mathbf{x},\mathcal{S}) \stackrel{\mathrm{def}}{=} \inf_{\mathbf{x}'\in\mathcal{S}} \|\mathbf{x}-\mathbf{x}'\|_2. \qquad (1)$$
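For readers who prefer code, the discrete version of this point-to-set distance can be obtained with a Euclidean distance transform. The following minimal sketch (the helper name `distance_to_set` is ours) uses SciPy and assumes the set $\mathcal{S}$ is given as a boolean mask.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def distance_to_set(set_mask: np.ndarray) -> np.ndarray:
    """Euclidean distance from every pixel to the set marked True in `set_mask`.

    Discrete version of dist(x, S) = min_{x' in S} ||x - x'||_2 on the pixel grid.
    """
    # distance_transform_edt computes the distance to the nearest zero pixel,
    # so we pass the complement of the set.
    return distance_transform_edt(~set_mask)

# Toy example: a single pixel of S in a 5x5 image.
S = np.zeros((5, 5), dtype=bool)
S[2, 2] = True
print(distance_to_set(S)[0, 0])  # ~2.83 = sqrt(8), distance from the corner to the center
```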

2.2 Omnipose

Our work is based on the Omnipose cell segmentation architecture Cutler et al. (2022). In this section, we justify this choice, explain its founding principles and then demonstrate how they can be adapted to deal with partial annotations.

2.2.1 Why Omnipose

Cellpose Stringer et al. (2021) has now become a standard in cell segmentation. Its excellent performance, processing speed, and ergonomic graphical interface make it a handy tool for everyday cell biology image analysis. However, it occasionally fails in scenarios involving complex and elongated objects. In such cases, it tends to produce over-segmentation, where neighboring objects are split into smaller fragments.

The Omnipose algorithm Cutler et al. (2022) was conceived to address this limitation. The main difference between Omnipose and Cellpose is that the distance map is defined as the distance to the cell boundaries in Omnipose, while it is defined as a distance to a cell “centroid” in Cellpose. A weakness of the latter is that there is no canonical choice to define this center, hence Omnipose’s choice seems more principled. This explains our decision to base our work on its architecture.

2.2.2 The main principles

Refer to caption
Figure 2: A sketch of the Omnipose training procedure

Figure 2 summarizes the main ideas behind the Omnipose architecture and its training. Omnipose is based on a regular convolutional neural network (CNN) with a U-Net-like architecture Ronneberger et al. (2015). Given an input 2D image with $N$ pixels, the CNN can be seen as a mapping $N_w^{\textrm{omni}}$ of the form

$$N_w^{\textrm{omni}} : \mathbb{R}^N \to \mathbb{R}^N \times \mathbb{R}^N \times \mathbb{R}^{2N}, \qquad u \mapsto \big(N_w^{b}(u),\, N_w^{d}(u),\, N_w^{\mathbf{v}}(u)\big). \qquad (2)$$

It depends on weights $w$ that are optimized during a training stage. It returns 3 different outputs (illustrated on the top of Figure 2):

  • $N_w^{b}(u) \equiv$ boundary probability: at every pixel, the value of this image can be interpreted as the probability of being a boundary between the objects to segment.

  • $N_w^{d}(u) \equiv$ distance map: at a given pixel, the value of this map is equal to:

    • the distance of the pixel to the closest object boundary, if the pixel is inside an object;

    • $0$ (or a fixed negative value) elsewhere.

  • $N_w^{\mathbf{v}}(u) \equiv$ flow field: can be interpreted as the gradient of the distance map. It is an essential feature of the Cellpose and Omnipose architectures. Ultimately, the flow is used through a procedure called Euler integration to generate a segmentation mask, as illustrated on the top right of Figure 2 (a minimal sketch of this step is given after this list).
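To make this last step concrete, here is a hedged sketch of how a flow field can be turned into instance masks by explicit Euler integration. It is a simplified illustration of the principle rather than the actual Omnipose post-processing (which relies on a more careful pixel clustering); the array layout (2, H, W) and the grouping rule are our own assumptions.

```python
import numpy as np

def euler_masks(flow: np.ndarray, fg: np.ndarray, n_steps: int = 200, dt: float = 1.0):
    """Group foreground pixels by the fixed point their flow trajectory reaches.

    flow: (2, H, W) vector field (e.g. gradient of the predicted distance map),
          flow[0] is the y-component and flow[1] the x-component.
    fg:   (H, W) boolean foreground mask.
    """
    H, W = fg.shape
    ys, xs = np.nonzero(fg)
    pts = np.stack([ys, xs], axis=1).astype(float)
    for _ in range(n_steps):
        # Sample the flow at the (rounded, clipped) current positions and take an Euler step.
        iy = np.clip(np.round(pts[:, 0]).astype(int), 0, H - 1)
        ix = np.clip(np.round(pts[:, 1]).astype(int), 0, W - 1)
        pts += dt * np.stack([flow[0, iy, ix], flow[1, iy, ix]], axis=1)
    # Pixels whose trajectories end at the same (rounded) location get the same label.
    # Real implementations cluster nearby endpoints instead of requiring exact equality.
    keys = np.round(pts).astype(int)
    _, labels = np.unique(keys, axis=0, return_inverse=True)
    masks = np.zeros((H, W), dtype=int)
    masks[ys, xs] = labels.reshape(-1) + 1  # 0 is background
    return masks
```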

2.2.3 The original loss definition

The original training stage involves a collection of $K\in\mathbb{N}$ images $(u_k)_{1\leq k\leq K}$ together with their exhaustive segmentation masks. For every image $u_k$ in the dataset, an algorithm creates the gold standard boundary probability $b_k^\star$, distance map $d_k^\star$ and flow field $\mathbf{v}_k^\star$. This is illustrated on the bottom of Figure 2.

The weights w𝑤w of the neural network are then optimized so as to minimize a loss function that compares the output of the CNN with the gold standard:

$$\inf_{w}\ \mathrm{loss}^{\mathrm{omni}}(w) \stackrel{\mathrm{def}}{=} \frac{1}{K}\sum_{k=1}^{K} \ell_{\mathcal{B}}(b_k, b_k^\star) + \ell_{\mathcal{D}}(d_k, d_k^\star) + \ell_{\mathcal{V}}(\mathbf{v}_k, \mathbf{v}_k^\star), \qquad (3)$$

where $b_k = N_w^{b}(u_k)$, $d_k = N_w^{d}(u_k)$ and $\mathbf{v}_k = N_w^{\mathbf{v}}(u_k)$.

In the original Omnipose implementation available on GitHub, the different losses were defined as follows:

  • Boundary loss $\ell_{\mathcal{B}}$:

    This term compares the predictions $b$ to $b^\star$ using the following loss:

    $$\ell_{\mathcal{B}}(b, b^\star) \stackrel{\mathrm{def}}{=} \frac{\lambda_{\mathcal{B}}}{|\mathcal{X}|}\sum_{\mathbf{x}\in\mathcal{X}} g(b[\mathbf{x}], b^\star[\mathbf{x}]), \qquad (4)$$

    where $g:\mathbb{R}\times\mathbb{R}\to\mathbb{R}$ combines a sigmoid and a binary cross-entropy loss.

  • Distance loss $\ell_{\mathcal{D}}$:

    This loss calculates a weighted mean squared error between the predicted distance fields and the ground-truth distance fields. It is defined as

    $$\ell_{\mathcal{D}}(d, d^\star) \stackrel{\mathrm{def}}{=} \frac{\lambda_{\mathcal{D}}}{|\mathcal{X}|}\sum_{\mathbf{x}\in\mathcal{X}} (d[\mathbf{x}] - d^\star[\mathbf{x}])^2 \cdot \rho[\mathbf{x}],$$

    where $\rho\in\mathbb{R}^N$ is a weight image with higher values around the gold-standard boundaries.

  • Flow loss $\ell_{\mathcal{V}}$:

    This loss is defined as a weighted sum of three losses $\ell_{\mathcal{V}} = \ell_{\mathcal{V}}^1 + \ell_{\mathcal{V}}^2 + \ell_{\mathcal{V}}^3$. The first one is a mean squared error loss:

    $$\ell_{\mathcal{V}}^1(\mathbf{v}, \mathbf{v}^\star) \stackrel{\mathrm{def}}{=} \frac{\lambda_{\mathcal{V},1}}{|\mathcal{X}|}\sum_{\mathbf{x}\in\mathcal{X}} \|\mathbf{v}[\mathbf{x}] - \mathbf{v}^\star[\mathbf{x}]\|_2^2 \cdot \rho[\mathbf{x}]. \qquad (5)$$

    The second one compares the norms of the vector fields:

    $$\ell_{\mathcal{V}}^2(\mathbf{v}, \mathbf{v}^\star) \stackrel{\mathrm{def}}{=} \frac{\lambda_{\mathcal{V},2}}{|\mathcal{X}|}\sum_{\mathbf{x}\in\mathcal{X}} \big(\|\mathbf{v}[\mathbf{x}]\|_2 - \|\mathbf{v}^\star[\mathbf{x}]\|_2\big)^2 \cdot \rho[\mathbf{x}]. \qquad (6)$$

    The third one aims to minimize the distance between trajectories generated through the ground-truth and predicted flows. Trajectories starting from an initial point $\mathbf{z}$ can be generated by a simple explicit Euler discretization:

    $$\mathbf{x}_0(\mathbf{z}) \stackrel{\mathrm{def}}{=} \mathbf{z}, \qquad \mathbf{x}_{l+1}(\mathbf{z}) \stackrel{\mathrm{def}}{=} \mathbf{x}_l(\mathbf{z}) + \Delta t\cdot \mathbf{v}[\mathbf{x}_l(\mathbf{z})],$$
    $$\mathbf{x}^\star_0(\mathbf{z}) \stackrel{\mathrm{def}}{=} \mathbf{z}, \qquad \mathbf{x}^\star_{l+1}(\mathbf{z}) \stackrel{\mathrm{def}}{=} \mathbf{x}^\star_l(\mathbf{z}) + \Delta t\cdot \mathbf{v}^\star[\mathbf{x}^\star_l(\mathbf{z})],$$

    where $\Delta t$ is a step size. Letting $L\in\mathbb{N}$ denote an integration time, the “Euler” loss then becomes:

    $$\ell_{\mathcal{V}}^3(\mathbf{v}, \mathbf{v}^\star) \stackrel{\mathrm{def}}{=} \frac{\lambda_{\mathcal{V},3}}{|\mathcal{X}|}\sum_{\mathbf{z}\in\mathcal{X}}\sum_{l=1}^{L} \|\mathbf{x}_l(\mathbf{z}) - \mathbf{x}_l^\star(\mathbf{z})\|_2^2. \qquad (7)$$

    It measures how two trajectories generated by Euler integration using the ground truth and predicted vector fields deviate. This loss is implemented in the torchVF library by Peters (2022). For more information, we refer the reader to the related report.

An inspection of the code reveals that the different weights have been set empirically to $\lambda_{\mathcal{B}}=10$, $\lambda_{\mathcal{D}}=2$, $\lambda_{\mathcal{V},1}=2$, $\lambda_{\mathcal{V},2}=2$, $\lambda_{\mathcal{V},3}=1$.
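For concreteness, the combination of the losses (3)-(7) with these empirical weights can be sketched in PyTorch as follows. This is a simplified rendition: the Euler term $\ell_{\mathcal{V}}^3$ is omitted, the weight image $\rho$ is passed explicitly, and the tensor shapes are our own assumptions, so this should not be read as the exact Omnipose implementation.

```python
import torch
import torch.nn.functional as F

def omnipose_style_loss(b, d, v, b_star, d_star, v_star, rho,
                        lam_b=10.0, lam_d=2.0, lam_v1=2.0, lam_v2=2.0):
    """Simplified version of the Omnipose training loss (Euler term omitted).

    b, d:   (H, W) predicted boundary logits and distance map.
    v:      (2, H, W) predicted flow field; the *_star tensors are the gold standards
            (b_star is a float tensor of 0/1 boundary labels).
    rho:    (H, W) weight image emphasizing pixels near boundaries.
    """
    # Boundary term: sigmoid + binary cross-entropy, eq. (4).
    l_b = lam_b * F.binary_cross_entropy_with_logits(b, b_star)
    # Distance term: weighted mean squared error on the distance map.
    l_d = lam_d * ((d - d_star) ** 2 * rho).mean()
    # Flow terms: weighted MSE on the vectors and on their norms, eqs. (5)-(6).
    l_v1 = lam_v1 * (((v - v_star) ** 2).sum(0) * rho).mean()
    l_v2 = lam_v2 * ((v.norm(dim=0) - v_star.norm(dim=0)) ** 2 * rho).mean()
    return l_b + l_d + l_v1 + l_v2
```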

Remark 3

The different losses have probably been combined by trial and error to produce the best possible results. However, there are clear redundancies in their definitions: for instance, $\ell_{\mathcal{V}}^1$, $\ell_{\mathcal{V}}^2$ and $\ell_{\mathcal{V}}^3$ all measure the distance between flows using different metrics. In our implementation, we tried to simplify the losses as much as possible, while still maintaining a good performance.

2.3 Adapting to partial annotations

All the principles described above heavily depend on an exhaustive segmentation of the cells. Indeed, the distance functions and gradient flows – which are instrumental in defining the loss functions – are global properties that change drastically when the object boundaries are incomplete. In this section, we describe the main methodological contribution of this paper, which allows us to handle partial boundaries.

2.3.1 The gold standard

Figure 3: (a) Gold standard. (b) Valid annotation. (c) Image. (d) Admissible. (e) Not admissible. (Left) Ground truth and an admissible annotation set. (Right) If a stroke contains multiple objects, the object boundaries have to be drawn. In this example, two nuclei are present under the blue bow-tie-shaped region, therefore a manual boundary has to be added in the center.
Table 1: Summary of notations.

Notation | Description
$\mathcal{X}=\mathcal{X}_0\cup\mathcal{X}_1$ | Image domain
$\mathcal{X}_0$ | True background
$\mathcal{X}_1$ | True foreground
$\mathcal{S}_0$ | Background strokes
$\mathcal{S}_1$ | Foreground strokes
$\mathcal{E}$ | True boundaries
$\mathcal{B}=\mathcal{B}_{\mathrm{manual}}\cup\mathcal{B}_{0,1}$ | User-defined boundaries
$\mathcal{D}$ | Valid distance set

The notations are summarized in Table 1. We assume that the domain $\mathcal{X}=\mathcal{X}_0\sqcup\mathcal{X}_1$ is partitioned into the background set $\mathcal{X}_0$ and the foreground set $\mathcal{X}_1$. A difficulty in instance segmentation is that multiple objects may exist within the connected components of a region $\mathcal{X}_i$. To differentiate them, we let $(\mathcal{X}_{i,j})_{1\leq j\leq J_i}$ denote a partition of the set $\mathcal{X}_i$ into different objects within a similar class. For instance, in Figure 3(a), the foreground set $\mathcal{X}_1$ is split into 13 components. A connected component of $\mathcal{X}_1$ can be split as $\mathcal{X}_{1,2}\cup\mathcal{X}_{1,3}$. The background set $\mathcal{X}_0$ consists of a single component $\mathcal{X}_{0,1}$.

We let

$$\mathcal{E} \stackrel{\mathrm{def}}{=} \bigcup_{i\in\{0,1\}} \bigcup_{j=1}^{J_i} \partial\mathcal{X}_{i,j} \qquad (8)$$

denote the set of all edges (or object boundaries) within the image. It is depicted in red in Figure 3(a).

2.3.2 The annotation set

The input of our neural network is a set of “sketches” or strokes drawn by the user. We let $\mathcal{S}_0$ and $\mathcal{S}_1$ denote the strokes describing the background and the foreground respectively. They are depicted in brown and blue respectively in Figure 3(b). The intersection of the brown and blue strokes defines natural boundaries. We can indeed construct the touching boundaries $\mathcal{B}_{0,1}$ between different strokes as

$$\mathcal{B}_{0,1} \stackrel{\mathrm{def}}{=} \overline{\mathcal{S}_0} \cap \overline{\mathcal{S}_1},$$

where $\overline{\mathcal{X}}$ is the closure of $\mathcal{X}$ in the continuous setting and the interface between neighboring pixels in the discrete setting.

In addition, the user can delineate other boundaries, denoted $\mathcal{B}_{\textrm{manual}}$, to separate touching objects within a class. We can concatenate all the boundaries to obtain a complete boundary set $\mathcal{B}$ defined as

$$\mathcal{B} = \mathcal{B}_{\textrm{manual}} \cup \mathcal{B}_{0,1}. \qquad (9)$$
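In the discrete setting, the touching boundary $\mathcal{B}_{0,1}$ can be approximated (up to the one-pixel interface convention) as the stroke pixels that have a 4-neighbor in the other stroke. A minimal sketch using SciPy, with helper names of our own choosing:

```python
import numpy as np
from scipy.ndimage import binary_dilation

# 4-connectivity structuring element (top, bottom, left, right).
CROSS = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=bool)

def touching_boundary(s0: np.ndarray, s1: np.ndarray) -> np.ndarray:
    """Pixels where the background strokes s0 and the foreground strokes s1 touch."""
    return (binary_dilation(s0, CROSS) & s1) | (binary_dilation(s1, CROSS) & s0)
```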

For the algorithm to work properly, we require the following set of assumptions.

Assumption 1 (Assumptions on the strokes)
  • The strokes correctly separate the background and the foreground: $\mathcal{S}_0\subseteq\mathcal{X}_0$ and $\mathcal{S}_1\subseteq\mathcal{X}_1$.

  • The strokes do not overlap: $\mathcal{S}_0\cap\mathcal{S}_1=\emptyset$. This is actually enforced by our Napari interface.

  • The boundaries $\mathcal{B}$ are a subset of the exact boundaries $\mathcal{E}$, that is:

    $$\mathcal{B}\subseteq\mathcal{E}. \qquad (10)$$

  • If a stroke $\mathcal{S}_i$ contains multiple objects, then the boundaries between the objects need to be completely drawn with $\mathcal{B}_{\textrm{manual}}$ (see Figure 3(d)). Letting $\mathcal{S}\stackrel{\mathrm{def}}{=}\mathcal{S}_0\cup\mathcal{S}_1$ denote the complete stroke set drawn by the user, this condition reads:

    $$\mathcal{E}\cap\mathcal{S}\subseteq\mathcal{B}. \qquad (11)$$

2.3.3 The main observation

The main result we will use to define and certify our algorithm is summarized in the following theorem.

Theorem 4 (The valid distance set)

Let $\mathcal{CB} \stackrel{\mathrm{def}}{=} \partial\mathcal{S}_0 \cup \partial\mathcal{S}_1 \cup \mathcal{B}$ denote the complete set of annotation boundaries and define the valid distance set $\mathcal{D}$ as

$$\mathcal{D} \stackrel{\mathrm{def}}{=} \left\{x\in\mathcal{S},\ \mathrm{dist}(x,\mathcal{B}) \leq \mathrm{dist}(x,\mathcal{CB})\right\}. \qquad (12)$$

The following relationships hold:

$$\textrm{For all } x\in\mathcal{D}, \quad \mathrm{dist}(x,\mathcal{E}) = \mathrm{dist}(x,\mathcal{B}).$$
$$\textrm{For all } x\in\mathcal{S}_0\cup\mathcal{S}_1, \quad \mathrm{dist}(x,\mathcal{CB}) \leq \mathrm{dist}(x,\mathcal{E}).$$

The proof of this theorem is given in Appendix A. The theorem should be understood as follows. The first identity tells us that the exact distance map $\mathrm{dist}(x,\mathcal{E})$ to the set of exact boundaries $\mathcal{E}$ can be computed on the valid set $\mathcal{D}$, and that this set itself can be computed using only the partial annotations of the boundaries $\mathcal{B}\subseteq\mathcal{E}$ and the semantic regions $\mathcal{S}_i\subseteq\mathcal{X}_i$. The second inequality tells us that we still have some information everywhere on the strokes $\mathcal{S}_0$ and $\mathcal{S}_1$. Moreover, in the case of total annotations, we get $\mathcal{D}=\mathcal{X}$: the proposed idea then leads to a training equivalent to the one in Omnipose, so the proposed setting can be seen as a generalization. Figure 4 schematically summarizes Theorem 4.
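A minimal sketch of how the valid distance set of equation (12) can be computed from discrete masks with Euclidean distance transforms. This is our own illustration under the assumption that the strokes and the boundaries $\mathcal{B}$ are given as boolean images; it is not the plugin code.

```python
import numpy as np
from scipy.ndimage import binary_erosion, distance_transform_edt

def valid_distance_set(s0, s1, boundary):
    """Valid set D = {x in S : dist(x, B) <= dist(x, CB)} of eq. (12).

    s0, s1:   boolean masks of the background / foreground strokes.
    boundary: boolean mask of the annotated boundaries B (assumed non-empty).
    Returns the boolean mask of D and the distance map dist(., B), which
    equals the true distance dist(., E) on D (Theorem 4).
    """
    cross = np.array([[0, 1, 0], [1, 1, 1], [0, 1, 0]], dtype=bool)

    def edge(mask):
        # Discrete stroke boundary: stroke pixels with a 4-neighbor outside the stroke.
        return mask & ~binary_erosion(mask, cross)

    cb = edge(s0) | edge(s1) | boundary           # CB = dS0 ∪ dS1 ∪ B
    dist_b = distance_transform_edt(~boundary)    # dist(., B)
    dist_cb = distance_transform_edt(~cb)         # dist(., CB)
    D = (s0 | s1) & (dist_b <= dist_cb)
    return D, dist_b
```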

Figure 4: Illustration of the valid distance set theorem. All the pixels in $\mathcal{D}$ are closer to $\mathcal{B}$ than to the boundaries of $\mathcal{S}$. The colormap used to represent the distance map transitions progressively from blue to red.

2.3.4 Adapting the training

A simpler architecture

The prediction of the boundaries $N_w^{b}(u)$ is not necessary, since only the distance map $N_w^{d}(u)$ and the flow field $N_w^{\mathbf{v}}(u)$ are needed to compute the final masks. Hence, we keep the same U-Net architecture, but remove the channel associated to the boundaries:

$$N_w^{\textrm{sketch}} : \mathbb{R}^N \to \mathbb{R}^N \times \mathbb{R}^{2N}, \qquad u \mapsto \big(N_w^{d}(u),\, N_w^{\mathbf{v}}(u)\big). \qquad (13)$$
Different summation sets

Equipped with the valid distance set $\mathcal{D}$, we are ready to adapt the losses to cope with partial annotations. In Omnipose, the losses $\ell_{\mathcal{B}}$, $\ell_{\mathcal{D}}$, $\ell_{\mathcal{V}}^1$ and $\ell_{\mathcal{V}}^2$ are defined by summation over the set $\mathcal{X}$ (see Section 2.2.3). With partial annotations, the gold standard is not properly defined on this set and we therefore need to change the summation sets.

Based on Theorem 4, the losses related to the distance set and to the flows become:

$$\ell_{\mathcal{D}}^{\textrm{partial}}(d, d^\star) \stackrel{\mathrm{def}}{=} \frac{\lambda_{\mathcal{D}}}{|\mathcal{D}|}\sum_{\mathbf{x}\in\mathcal{D}} (d[\mathbf{x}] - d^\star[\mathbf{x}])^2,$$
$$\ell_{\mathcal{V}}^{\textrm{partial}}(\mathbf{v}, \mathbf{v}^\star) \stackrel{\mathrm{def}}{=} \frac{\lambda_{\mathcal{V},1}}{|\mathcal{D}|}\sum_{\mathbf{x}\in\mathcal{D}} \|\mathbf{v}[\mathbf{x}] - \mathbf{v}^\star[\mathbf{x}]\|_2^2.$$

We replaced $\sum_{\mathbf{x}\in\mathcal{X}}$ by $\sum_{\mathbf{x}\in\mathcal{D}}$ in the losses of Section 2.2.3 and discarded the weights $\rho$. This means that we compare the ground truth and the prediction only where it makes sense to do so.
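In practice, restricting the sums to $\mathcal{D}$ amounts to masking the per-pixel residuals, as in the following hedged PyTorch sketch (tensor shapes and helper names are our own assumptions).

```python
import torch

def partial_losses(d, v, d_star, v_star, valid, lam_d=2.0, lam_v=2.0):
    """Distance and flow losses restricted to the valid distance set D.

    d, d_star: (H, W) predicted / gold-standard distance maps.
    v, v_star: (2, H, W) predicted / gold-standard flow fields.
    valid:     (H, W) boolean mask of D.
    """
    n = valid.sum().clamp(min=1)
    l_d = lam_d * (((d - d_star) ** 2) * valid).sum() / n
    l_v = lam_v * ((((v - v_star) ** 2).sum(0)) * valid).sum() / n
    return l_d, l_v
```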

Dealing with inequalities

Until now, we only used the first identity in Theorem 4, but the second inequality brings some additional information. We propose to integrate it into the training through the additional asymmetric loss:

$$\ell_{\mathcal{D}}^{\textrm{ineq}}(d, d^\star) \stackrel{\mathrm{def}}{=} \frac{\lambda_{\mathcal{D},\textrm{ineq}}}{|\mathcal{S}_0\cup\mathcal{S}_1|}\sum_{\mathbf{z}\in\mathcal{S}_0\cup\mathcal{S}_1} \textrm{ReLU}^2\big(d[\mathbf{z}] - d^\star[\mathbf{z}]\big). \qquad (14)$$
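The corresponding one-sided penalty can be sketched as follows, assuming `strokes` is the boolean mask of $\mathcal{S}_0\cup\mathcal{S}_1$ and `d_star` holds the gold-standard distance values computed from the partial annotations (the default weight value is illustrative).

```python
import torch

def inequality_loss(d, d_star, strokes, lam_ineq=1.0):
    """Squared-ReLU penalty of eq. (14), averaged over the stroke pixels only."""
    n = strokes.sum().clamp(min=1)
    return lam_ineq * (torch.relu(d - d_star) ** 2 * strokes).sum() / n
```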
Putting it all together

We can now define the total sketchpose loss as:

$$\mathrm{loss}^{\textrm{sketch}}(w) = \ell_{\mathcal{V}}^{\textrm{partial}}(\mathbf{v}, \mathbf{v}^\star) + \ell_{\mathcal{D}}^{\textrm{partial}}(d, d^\star) + \ell_{\mathcal{D}}^{\textrm{ineq}}(d, d^\star), \qquad (15)$$

where $d$ and $\mathbf{v}$ are obtained by the simplified neural network $N_w^{\textrm{sketch}}$.

2.4 The Sketchpose plugin

A significant part of this work lies in the development of a user-friendly graphical interface to train and use the neural network. It is integrated in Napari Chiu et al. (2022), which is well suited to embedding the Python/PyTorch codes at the core of our approach.

Sketchpose can be easily installed through either the pip package manager or Napari’s Chiu et al. (2022) built-in interface. A detailed documentation can be accessed by clicking on this hyperlink. It offers step-by-step instructions illustrated by short videos, to assist users in effectively testing all the capabilities of the plugin.

The user directly draws a few strokes for the background, foreground and boundaries. The brush size can be adjusted similarly to usual paint software. An entire stroke boundary can be added to the boundary set by a double right-click. The training and drawing can be achieved in parallel to target the places where the segmentation is inaccurate.

The networks can be initialized with random weights or existing pre-trained weights. The plugin's multi-threaded architecture makes it possible to annotate, train and observe the current segmentation results simultaneously. The user can prioritize annotating regions where the segmentation is inaccurate, hence reducing the annotation time. The predictions can be restricted to a bounding box at each epoch of the training process to reduce the processing time, which is particularly helpful for large-scale images. Finally, users can work with a single image or a set of images for the inference and training steps.

3 Experiments

In this section, we conduct several experiments to explore three distinct use cases of the method.

  • Learning from a limited set of annotations on a single image with randomly initialized neural network weights.

  • Learning from a limited set of annotations on a single image, starting from a pre-trained neural network.

  • Learning with randomly initialized weights using a large dataset with sparse annotations. We study the impact of the percentage of labeled pixels (10%, 25%, 50% and 100%) on the segmentation quality, when training on thousands of cells.

After describing the metrics used for validation, we turn to the practical results. For all experiments, we used a single Nvidia RTX5000 GPU with 16 GB of memory.

3.1 Evaluation metrics

To quantify the quality of the predictions, we enumerate the true positives (TP), the false positives (FP) and the false negatives (FN). A true positive is an object in the gold standard that can be matched to an object in the prediction with an Intersection over Union (IoU) higher than a threshold $\tau$. We let $TP(\tau)$ denote the total number of true positives. The total number of estimated objects without matches is denoted $FP(\tau)$ (false positives). The total number of gold-standard objects without valid matches is denoted $FN(\tau)$ (false negatives). Using these values, we compute the object detection accuracy metric $OA(\tau)$ Caicedo et al. (2019) for each image using the formula:

$$OA(\tau) = \frac{TP(\tau)}{TP(\tau) + FP(\tau) + FN(\tau)}.$$

The reported object detection accuracy is then computed as the average over all images in the test set.
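As an illustration, $OA(\tau)$ can be computed with a simple greedy IoU matching between label images, as in the sketch below (our own helper; standard benchmark implementations may use an optimal assignment instead of this greedy pairing).

```python
import numpy as np

def object_accuracy(gt: np.ndarray, pred: np.ndarray, tau: float = 0.5) -> float:
    """OA(tau) = TP / (TP + FP + FN) for two integer label images (0 = background)."""
    gt_ids = [i for i in np.unique(gt) if i != 0]
    pred_ids = [j for j in np.unique(pred) if j != 0]
    matched_pred = set()
    tp = 0
    for i in gt_ids:
        gi = gt == i
        best_iou, best_j = 0.0, None
        for j in pred_ids:
            if j in matched_pred:
                continue
            pj = pred == j
            iou = np.logical_and(gi, pj).sum() / np.logical_or(gi, pj).sum()
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_j is not None and best_iou >= tau:
            tp += 1
            matched_pred.add(best_j)
    fn = len(gt_ids) - tp     # gold-standard objects left unmatched
    fp = len(pred_ids) - tp   # predicted objects left unmatched
    return tp / max(tp + fp + fn, 1)
```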

Additionally, we computed the average DICE and the Aggregated Jaccard Index, defined as

$$J_{\text{aggregated}}(A,B) = \frac{1}{N}\sum_{i=1}^{N}\frac{|A_i\cap B_i|}{|A_i\cup B_i|}, \qquad \text{average DICE}(A,B) = \frac{1}{N}\sum_{i=1}^{N}\frac{2\,|A_i\cap B_i|}{|A_i|+|B_i|},$$

where $A$ and $B$ are a dataset and its ground truth.
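The two overlap scores can then be averaged over the matched object pairs, as in this minimal sketch (the pairing itself, e.g. obtained from the matching above, is assumed to be given).

```python
import numpy as np

def average_overlap_scores(pairs):
    """Average DICE and Jaccard over matched object pairs (A_i, B_i) of boolean masks."""
    if not pairs:
        return 0.0, 0.0
    dices, jaccards = [], []
    for a, b in pairs:
        inter = np.logical_and(a, b).sum()
        union = np.logical_or(a, b).sum()
        dices.append(2.0 * inter / (a.sum() + b.sum()))
        jaccards.append(inter / union)
    return float(np.mean(dices)), float(np.mean(jaccards))
```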

3.2 Training from scratch on a single image

In this section, we will showcase several results achieved while training from scratch on a small set of images. The tests are made on a variety of biological structures (dendritic cells, osteoclasts, bacteria, insect eggs, adipose tissue, artistic image of cells).

3.2.1 Training details

For this experiment, the models have been trained for 100 epochs (\approx 2 minutes) with a batch size of 16 and image flips for data augmentation.

3.2.2 Staphylococcus aureus

In the example of Figure 5, we use a microscopy image of methicillin-resistant Staphylococcus aureus (MRSA) infections, from European Commission, Horizon Magazine (2020), “Can we reverse antibiotic resistance?”. It is reused under the European Commission’s reuse policy.

Figure 5: (a) Sparse labels. (b) Omnipose cyto2 model. (c) Sketchpose result, trained from scratch (\approx 2’). (d) Evaluation of the segmentation quality. A training from scratch with sparse labels (in blue on the left image). Image credit: Janice Carr, Jeff Hageman, USCDCP. Omnipose results: DICE = 0.92, Jaccard index = 0.82. Sketchpose: DICE = 0.99, Jaccard index = 0.96.

After drawing for less than one minute and training for 100 epochs (\approx 2’), we achieve a much better result than the pre-trained Omnipose model (see Figure 5). The quality metrics are shown in Figure 5(d).

3.2.3 Eggs on a tree leaf

In this section, we picked an image from the Omnipose dataset, which likely represents insect eggs on a tree leaf. At first sight, the segmentation task looks difficult, since the objects are tightly connected, with identical textures and blurry boundaries. We first annotated a subset of 5 eggs in Figure 6(b) with a minimal amount of background. The segmentation result after training, shown in Figure 6(d), is already surprisingly good, but some objects are not detected and others are merged. We then annotated 2 extra eggs in Figure 6(c). With this extra information, retraining the network produces a near-perfect segmentation mask, with a single error (2 pink eggs on the left). This experiment illustrates a unique feature of Sketchpose: it is possible to interactively annotate while training. This makes it possible to label a minimal amount of regions to reach the desired output. This principle, sometimes called “active learning” or “human-in-the-loop” Budd et al. (2021), is significantly enhanced by using partial annotations and the user-friendly Napari interface.

Figure 6: Progressive training in Sketchpose. (a) Image. (b) Labeled Set 1 (LS1). (c) Labeled Set 2 (LS2). (d) Sketchpose LS1 (\approx 10’). (e) Sketchpose LS2 (\approx 3’). In this example, we show that it is possible to improve the segmentation performance of Sketchpose by progressively annotating at places where the network failed. Here, a quite minimal annotation set is enough to near perfectly separate the eggs on the leaf.

3.3 Transfer learning on a single image

In this section, we explore the feasibility of improving pre-trained weights using transfer learning.

3.3.1 Training details

As for the previous experiment, the models have been trained for 100 epochs (\approx 2 minutes) with a batch size of 16 and image flips for data augmentation.

3.3.2 Bacteria segmentation

Bacteria are often used as biological models (e.g. in DNA studies). A precise segmentation can be difficult to achieve, because they have elongated shapes and can be clustered.

The Omnipose Cutler et al. (2022) model was initially conceived to address the shortcomings of Cellpose for this task.

Figure 7 shows how transfer learning with sparse annotations can improve the Omnipose results by separating touching bacteria. Figure 7(d) shows a quantitative comparison of both methods. As can be seen, Sketchpose’s adapted weights provide much higher performance. A visual inspection indicates that all objects have been correctly separated, apart from the cluster touching the boundary on the bottom left.

Figure 7: Transfer learning experiment. In all cases, just a few strokes are enough to significantly improve the segmentation quality. Panels: (a) sparse labels, (b) Omnipose bact_phase, (c) Sketchpose (\approx 1’), (d) quality evaluation; (e) sparse labels (cropped), (f) Omnipose cyto2, (g) transfer learning (\approx 2’), (h) quality evaluation; (i) sparse labels, (j) Omnipose cyto2, (k) transfer learning (\approx 5’), (l) quality evaluation. (a–d) Bacteria. (a–c) Training with a few sparse labels. (d) Evaluation of segmentation quality. Omnipose: DICE = 0.81, Jaccard = 0.53. Sketchpose: DICE = 0.90, Jaccard = 0.72. (e–h) Adipocytes. (e–g) Transfer learning from the Omnipose bact_phase model. (h) Evaluation of segmentation quality. Omnipose: DICE = 0.89, Jaccard = 0.69. Sketchpose: DICE = 0.89, Jaccard = 0.79. Image adapted from Zhu et al. (2021). (i–l) Osteoclasts. (i–k) Transfer learning from the Omnipose cyto2 model. (l) Evaluation of segmentation quality. Omnipose: DICE = 0.90, Jaccard = 0.68. Sketchpose: DICE = 0.95, Jaccard = 0.80.

3.3.3 Adipocytes segmentation

The image in Figure 7(e) shows a crop of a very large image of a skin explant provided by DIVA Expertise. One can see a part of the dermis (in pink) and, above it, adipose tissue (large white circular cells). Adipose tissue, also known as the hypodermis, is the third skin layer after the epidermis and dermis. Hypodermal cells (adipocytes) secrete specific molecules (e.g. adiponectin, leptin) which have a direct impact on the biology of fibroblasts present in the dermis, and also on keratinocytes present in the epidermis. They are the subject of numerous studies (see Bourdens et al. (2019) and Sadick et al. (2015) for instance). For most studies where skin explants are imaged, we first need to count the number of adipocytes in the image and remove any potential outliers detected in the dermis and epidermis.

While Omnipose cyto2 results in some under-segmentation for this task, the adapted weights provided by Sketchpose yield significantly enhanced results. Annotating 6 cells and training for 100 epochs (less than 1 minute) were sufficient to significantly improve the quality of the segmentation and to remove the outliers from the dermis (see Figure 7(g)). Figure 7(h) shows a quantitative comparison between Omnipose and Sketchpose on this example.

3.3.4 Osteoclasts segmentation

Osteoclasts are responsible for bone resorption, and are widely studied (see Labour et al. (2016) for instance) as being responsible for certain pathologies such as osteoporosis when dysfunctional. Their differentiation goes through several stages, culminating in the activated osteoclast. The latter is generally large and contains numerous nuclei. Atlantic Bone Screen (ABS) company is investigating the effect of different drugs in inducing either proliferation or cell death in these activated osteoclasts, in order to regulate their population. To do so, they extract osteoclasts from biopsies, culture them, apply the drugs and image them under a bright-field microscope.

The studied image is a crop of an image containing around 20,000 cells. We can see touching cells presenting a great variety in size, shape and color. The image is complex to segment and poses a real challenge. What is more, ABS does not want to count pre-osteoclasts (small black nuclei), but only the mature cells (according to specific nuclei criteria). Each study comprises around sixty images, hence the manual counting performed at ABS is costly and laborious.

In Figure 7, we present a qualitative depiction that underscores the enhancement in segmentation accuracy attained through transfer learning with just a few labels. Labeling required approximately 2 minutes, while the training process took about 5 minutes. A quantitative comparison is available in Figure 7(l).

3.4 Training from scratch on large datasets

The aim of this experiment is to highlight the possibility to train our model on large datasets with sparse annotations. We use two different datasets as illustrated in Figure 8.

Figure 8: Top: examples of PanNuke images. Bottom: examples of MicrobeSeg images.
MicrobeSeg dataset

The MicrobeSeg dataset Scherr et al. (2022) contains 826 fluorescence microscopy images of bacteria with about 30,000 manually annotated objects. It contains a mix of datasets from Omnipose and the Cell Tracking Challenge, and is publicly available online.

PanNuke dataset

The PanNuke dataset Gamper et al. (2019) contains 7,904 image tiles of histopathology slides stained with H&E across 19 tissue types, each with nuclear instance segmentations and five-class nuclear type annotations. It is publicly available from Kaggle and is widely used for benchmarking nucleus segmentation and classification algorithms.

3.4.1 Selecting annotation subsets

In this section, we investigate the model's robustness across various annotation levels, each characterized by a different percentage of annotated pixels: 10%, 25%, 50%, and 100%. We generate random binary masks by smoothing white Gaussian noise with a Gaussian filter of variance $\sigma^2$. The resulting Gaussian process is then thresholded to keep only a given proportion of pixels.

While this sampling procedure is stochastic in nature, the masks are generated once and for all, enabling their deterministic reuse across multiple training sessions. Figure 9 shows an image accompanied by four corresponding label masks illustrating decreasing levels of annotation density.
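The random sparse annotation masks can be reproduced with a sketch along the following lines (our own rendition of the described procedure; the exact noise parameters used in the paper may differ).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def sparse_annotation_mask(shape, fraction, sigma=50.0, seed=0):
    """Binary mask keeping roughly `fraction` of the pixels.

    White Gaussian noise is smoothed with a Gaussian filter of parameter `sigma`
    (controlling the domain regularity), then thresholded at the (1 - fraction)
    quantile so that the requested proportion of pixels is retained.
    """
    rng = np.random.default_rng(seed)
    noise = gaussian_filter(rng.standard_normal(shape), sigma)
    threshold = np.quantile(noise, 1.0 - fraction)
    return noise >= threshold

mask = sparse_annotation_mask((512, 512), fraction=0.25, sigma=50.0)
print(mask.mean())  # ~0.25
```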

Figure 9: Annotation examples with four different sparsity levels and a domain regularity $\sigma=50$: (a) a raw image, (b) fully annotated, (c) 50%, (d) 25%, (e) 10%.

3.4.2 Training details

For each dataset, we trained the Sketchpose model for 1000 epochs. This takes about 5 hours for the MicrobeSeg dataset and 30 hours for the PanNuke dataset using our Nvidia RTX5000. Each model was trained with the four percentages of annotated pixels described above. There was no data augmentation, except random cropping of the images to 224×224 pixels, which is the size used in the original Omnipose model.

3.4.3 Results

We evaluated and compared the performance using two alternative models. The first one is the Cellpose 3.0 model Stringer and Pachitariu (2025), which regresses a distance to the objects' centroids. The second one is the LKCell model Cui et al. (2024), a better-performing variant of CellViT Hörst et al. (2024), itself a variant of HoverNet Graham et al. (2019). These models are among the most popular and best performing for histopathology images such as the PanNuke dataset. While they perform both classification and segmentation, we only compare their ability to segment objects, as we are not interested in the classification task here. We trained both LKCell and Cellpose 3.0 from scratch on each of the two datasets with complete annotations for 1000 epochs.

We compare the performance using other standard quality metrics used in instance segmentation. In all the metrics below, an IoU threshold of 50% is used.

  • Precision: measures how many of the predicted positives are actually correct (i.e., the fraction of predicted segments that are true).

  • Recall: how many of the actual positives were correctly predicted (i.e., how complete the prediction is).

  • F1-Score: harmonic mean of precision and recall, balancing both.

  • Detection Quality (DQ): evaluates object-level detection performance, penalizing missed or extra objects. It evaluates the ability to detect object instances correctly, regardless of segmentation quality.

  • Segmentation Quality (SQ): measures how well matched objects are segmented, assuming correct pairing, reflecting the quality of the predicted segment.

The main results are reported in Table 2, and several key observations emerge.

Best-performing methods

On the MicrobeSeg dataset, Cellpose and Sketchpose (100% annotations) deliver the best performance, while LKCell lags behind with a 15% lower F1-score. Conversely, on PanNuke, LKCell outperforms both competitors with a 4–6% F1-score gain, confirming its suitability for this dataset.

Impact of annotation density

For MicrobeSeg, reducing annotation density leads to a significant performance drop for Sketchpose: around 10% at 50–25% annotations, and up to 20% with only 10%. Depending on the application, such degradation may or may not be acceptable.

The situation is more favorable for PanNuke. Sketchpose maintains stable performance, with only a 4% drop when reducing annotations from 100% to 10%. Given that sparse annotations likely reduce annotation time by a factor of ten, this is a promising result—especially since random sampling was used. In practice, targeted annotations by an expert would likely yield even better outcomes.

This contrast may stem from dataset characteristics: PanNuke contains simpler, roughly convex objects, while MicrobeSeg features elongated or irregular shapes, making annotation density more critical.

IoU matching thresholds

To provide a refined view of the performance, we also plot the F1-score as a function of the IoU matching threshold in Figure 10. There, we see that the ranking between the methods is stable up to a matching threshold of 70%, which is usually considered a high-precision segmentation in biological imaging. A surprising phenomenon is that Sketchpose trained with 25% of annotations performs better than the 50% model on the MicrobeSeg dataset. Similarly, the 50% model performs better than the 100% model on the PanNuke dataset. This might indicate that carefully selected annotations can lead to better results than complete annotations, or can help reduce the influence of errors in the gold-standard database.

Sketchpose is the first distance-based method able to take advantage of this observation.

Table 2: Comparison of segmentation methods trained and tested on the MicrobeSeg and PanNuke datasets using various metrics. All values are computed with an IoU threshold of 50%. The columns 100%, 50%, 25% and 10% refer to Sketchpose trained with the corresponding percentage of annotated pixels.

                   | MicrobeSeg dataset                        | PanNuke dataset
Metric             | LKCell Cellpose 100%  50%   25%   10%     | LKCell Cellpose 100%  50%   25%   10%
Precision          | 0.73   0.88     0.77  0.63  0.68  0.55    | 0.82   0.88     0.86  0.84  0.78  0.75
Recall             | 0.65   0.75     0.89  0.86  0.83  0.78    | 0.84   0.71     0.71  0.75  0.77  0.73
F1-Score           | 0.66   0.79     0.81  0.69  0.72  0.60    | 0.83   0.78     0.77  0.79  0.77  0.73
DICE               | 0.82   0.86     0.85  0.85  0.83  0.80    | 0.88   0.87     0.87  0.88  0.87  0.87
Jaccard            | 0.73   0.79     0.75  0.75  0.72  0.69    | 0.81   0.79     0.79  0.80  0.79  0.77
Det. Quality (DQ)  | 0.73   0.88     0.77  0.63  0.68  0.55    | 0.82   0.88     0.86  0.84  0.78  0.75
Seg. Quality (SQ)  | 0.73   0.79     0.75  0.75  0.72  0.69    | 0.81   0.79     0.79  0.80  0.79  0.77
Figure 10: Evaluation of segmentation F1-score over the MicrobeSeg and PanNuke test datasets as a function of the percentage of annotated pixels, compared to Cellpose and LKCell. (a) F1-score over the MicrobeSeg test dataset. (b) F1-score over the PanNuke test dataset.

4 Discussion & conclusion

We introduced Sketchpose, an open-source plugin that extends the applicability of Omnipose to partial annotations. From a methodological standpoint, we developed a theory making it possible to use distance functions despite having access only to partial information on the object boundaries. From a more practical viewpoint, we developed an interactive interface within Napari, which facilitates efficient online learning with real-time visualization of the training progress. The multi-threaded implementation allows users to continue annotating while the neural network trains or infers.

The new training procedure was tested in three different settings: i) training a neural network from scratch with just a few strokes, ii) improving the weights of a pre-trained network (a.k.a. transfer learning or human-in-the-loop), iii) training with massive, but partial, annotations.

For point i), frugal annotation works surprisingly well on a few test cases despite very limited information. A dozen strokes are already enough to provide results on par with – or better than – pre-trained networks.

For point ii), our experiments demonstrated the potential benefits of transfer learning: starting with a pre-trained Omnipose model, we can further refine it using our methodology.

As for point iii), the conclusions are diverse. For datasets containing simple object shapes, such as PanNuke, a limited number of annotations (down to 25%) seems sufficient to achieve results on par with or even better than complete annotations. For more complex objects, complete annotations still seem preferable. These conclusions should be validated on a case-by-case basis, but the ability to annotate while training makes it possible to spend only the minimal amount of annotation time needed for a given task.

The method also has a few limitations. First, it would benefit from faster training times to make it even more interactive. We plan to improve this aspect in forthcoming versions. Second, our formalism is currently restricted to the two-dimensional setting with two labels (background / foreground). Extending the methodology to multiple classes is rather straightforward, and the proposed ideas extend directly to this case. However, the proposed strategy does not extend to 3D directly. It could be used if the user were able to delineate a surface surrounding the objects of interest, but not with mere 2D curves: this would result in an empty valid distance set (see Theorem 4) and unadapted loss functions. This limitation must be put into perspective by the fact that even the Cellpose 3D model is based on 2D predictions only, which are aggregated in post-processing.

In summary, the proposed method demonstrated numerous qualities in 2D for partial annotations. We showed that it is possible to train complex networks with a few sketches, reducing the annotation burden significantly. Further developments are needed to accelerate the training process and for a multi-class extension in 3D.


Acknowledgments

C. Cazorla was a recipient of ANRT (Agence Nationale pour la Recherche et la Technologie) in the context of the CIFRE Ph.D. program (N°2020/0843) with Imactiv-3D and Institut de Mathématiques de Toulouse (IMT). P. Weiss acknowledges a support from ANR-3IA Artificial and Natural Intelligence Toulouse Institute ANR-19-PI3A-0004 and from the ANR Micro-Blind ANR-21-CE48-0008. This work was performed using HPC resources from GENCI-IDRIS (Grant 2021-AD011012210R1).

We are grateful for the information provided by Kevin John Cutler about the original Omnipose implementation. The authors acknowledge Atlantic Bone Screen for providing the osteoclasts image and DIVA Expertise for providing the adipocytes image.


Ethical Standards

The work follows appropriate ethical standards in conducting research and writing the manuscript, following all applicable laws and regulations regarding treatment of animals or human subjects.


Conflicts of Interest

We declare that we have no conflicts of interest.


Data availability

References

  • Arzt et al. (2022) Matthias Arzt, Joran Deschamps, Christopher Schmied, Tobias Pietzsch, Deborah Schmidt, Pavel Tomancak, Robert Haase, and Florian Jug. Labkit: labeling and segmentation toolkit for big image data. Frontiers in computer science, 4:10, 2022.
  • Bai and Urtasun (2017) Min Bai and Raquel Urtasun. Deep watershed transform for instance segmentation. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 5221–5229, 2017.
  • Berg et al. (2019) Stuart Berg, Dominik Kutra, Thorben Kroeger, Christoph N Straehle, Bernhard X Kausler, Carsten Haubold, Martin Schiegg, Janez Ales, Thorsten Beier, Markus Rudy, et al. Ilastik: interactive machine learning for (bio) image analysis. Nature methods, 16(12):1226–1232, 2019.
  • Bourdens et al. (2019) Marion Bourdens, Yannick Jeanson, Marion Taurand, Noémie Juin, Audrey Carrière, Franck Clément, Louis Casteilla, Anne-Laure Bulteau, and Valérie Planat-Bénard. Short exposure to cold atmospheric plasma induces senescence in human skin fibroblasts and adipose mesenchymal stromal cells. Scientific reports, 9(1):8671, 2019.
  • Budd et al. (2021) Samuel Budd, Emma C Robinson, and Bernhard Kainz. A survey on active learning and human-in-the-loop deep learning for medical image analysis. Medical Image Analysis, 71:102062, 2021.
  • Caicedo et al. (2019) Juan C. Caicedo, Jonathan Roth, Allen Goodman, Tim Becker, Kyle W. Karhohs, Matthieu Broisin, Csaba Molnar, Claire McQuin, Shantanu Singh, Fabian J. Theis, and Anne E. Carpenter. Evaluation of deep learning strategies for nucleus segmentation in fluorescence images. Cytometry Part A, 95(9):952–965, 2019. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/cyto.a.23863.
  • Chiu et al. (2022) Chi-Li Chiu, Nathan Clack, et al. napari: a python multi-dimensional image viewer platform for the research community. Microscopy and Microanalysis, 28(S1):1576–1577, 2022.
  • Cui et al. (2024) Ziwei Cui, Jingfeng Yao, Lunbin Zeng, Juan Yang, Wenyu Liu, and Xinggang Wang. Lkcell: Efficient cell nuclei instance segmentation with large convolution kernels. arXiv preprint arXiv:2407.18054, 2024.
  • Cutler et al. (2022) Kevin J Cutler, Carsen Stringer, Teresa W Lo, Luca Rappez, Nicholas Stroustrup, S Brook Peterson, Paul A Wiggins, and Joseph D Mougous. Omnipose: a high-precision morphology-independent solution for bacterial cell segmentation. Nature methods, 19(11):1438–1448, 2022.
  • Gamper et al. (2019) Johanna Gamper, Navid Alemi Koohbanani, Katarzyna Benet, Adnan Khuram, and Nasir Rajpoot. Pannuke: an open pan-cancer histology dataset for nuclei instance segmentation and classification. In European Congress on Digital Pathology (ECDP), pages 11–19. Springer, 2019. URL https://doi.org/10.1007/978-3-030-23937-4_11.
  • Graham et al. (2019) Simon Graham, Quoc Dang Vu, Shan E Ahmed Raza, Ayesha Azam, Yee Wah Tsang, Jin Tae Kwak, and Nasir Rajpoot. Hover-net: Simultaneous segmentation and classification of nuclei in multi-tissue histology images. Medical Image Analysis, 58:101563, 2019.
  • He et al. (2017) Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask r-cnn. In Proceedings of the IEEE international conference on computer vision, pages 2961–2969, 2017.
  • Hörst et al. (2024) Fabian Hörst, Moritz Rempe, Lukas Heine, Constantin Seibold, Julius Keyl, Giulia Baldini, Selma Ugurel, Jens Siveke, Barbara Grünwald, Jan Egger, et al. Cellvit: Vision transformers for precise cell segmentation and classification. Medical Image Analysis, 94:103143, 2024.
  • Kass et al. (1988) Michael Kass, Andrew Witkin, and Demetri Terzopoulos. Snakes: Active contour models. International journal of computer vision, 1(4):321–331, 1988.
  • Kumar et al. (2019) Jitendra Kumar, R. Srivastava, and A. Mehrotra. Deep learning framework for recognition of diabetic retinopathy using monocular color fundus images. Computers in Biology and Medicine, 109:283–293, 2019.
  • Labour et al. (2016) Marie-Noëlle Labour, Mathieu Riffault, Søren T Christensen, and David A Hoey. Tgfβ1–induced recruitment of human bone mesenchymal stem cells is mediated by the primary cilium in a smad3-dependent manner. Scientific reports, 6(1):35542, 2016.
  • Legland et al. (2016) David Legland, Ignacio Arganda-Carreras, and Philippe Andrey. Morpholibj: integrated library and plugins for mathematical morphology with imagej. Bioinformatics, 32(22):3532–3534, 2016.
  • Lempitsky et al. (2018) Victor Lempitsky, Andrea Vedaldi, and Dmitry Ulyanov. Deep image prior. In 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 9446–9454. IEEE, 2018.
  • Naylor et al. (2018) Peter Naylor, Marick Laé, Fabien Reyal, and Thomas Walter. Segmentation of nuclei in histopathology images by deep regression of the distance map. IEEE transactions on medical imaging, 38(2):448–459, 2018.
  • Otsu (1979) Nobuyuki Otsu. A threshold selection method from gray-level histograms. IEEE transactions on systems, man, and cybernetics, 9(1):62–66, 1979.
  • Pachitariu and Stringer (2022) Marius Pachitariu and Carsen Stringer. Cellpose 2.0: how to train your own model. Nature methods, 19(12):1634–1641, 2022.
  • Peters (2022) Ryan Peters. Torchvf: Vector fields for instance segmentation. 2022.
  • Ronneberger et al. (2015) Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Proceedings, Part III 18, pages 234–241. Springer, 2015.
  • Sadick et al. (2015) Neil S Sadick, Andrew S Dorizas, Nils Krueger, and Amer H Nassar. The facial adipose system: its role in facial aging and approaches to volume restoration. Dermatologic Surgery, 41:S333–S339, 2015.
  • Scherr et al. (2022) Tim Scherr, Johannes Seiffarth, Bastian Wollenhaupt, Oliver Neumann, Dietrich Kohlheyer, Hanno Scharr, Katharina Nöh, and Ralf Mikut. microbeseg models, 2022. URL https://zenodo.org/record/7221151.
  • Schmidt et al. (2018) Uwe Schmidt, Martin Weigert, Coleman Broaddus, and Gene Myers. Cell detection with star-convex polygons. In Medical Image Computing and Computer Assisted Intervention–MICCAI 2018: 21st International Conference, Granada, Spain, September 16-20, 2018, Proceedings, Part II 11, pages 265–273. Springer, 2018.
  • Serra and Soille (2012) Jean Serra and Pierre Soille. Mathematical morphology and its applications to image processing, volume 2. Springer Science & Business Media, 2012.
  • Stringer and Pachitariu (2025) Carsen Stringer and Marius Pachitariu. Cellpose3: one-click image restoration for improved cellular segmentation. Nature methods, 22(3):592–599, 2025.
  • Stringer et al. (2021) Carsen Stringer, Tim Wang, Michalis Michaelos, and Marius Pachitariu. Cellpose: a generalist algorithm for cellular segmentation. Nature methods, 18(1):100–106, 2021.
  • Sugawara (2023) Ko Sugawara. Training deep learning models for cell image segmentation with sparse annotations. BioRxiv, pages 2023–06, 2023.
  • Vincent and Soille (1991) Luc Vincent and Pierre Soille. Watersheds in digital spaces: an efficient algorithm based on immersion simulations. IEEE Transactions on Pattern Analysis & Machine Intelligence, 13(06):583–598, 1991.
  • Zhu et al. (2021) Lillian Zhu, Manohary Rajendram, and Kerwyn Casey Huang. Effects of fixation on bacterial cellular dimensions and integrity. iScience, 24(4):102348, 2021. ISSN 2589-0042. URL https://www.sciencedirect.com/science/article/pii/S2589004221003163.

A Proof of the valid distance set theorem

We start with a basic observation.

Proposition 5 (Properties of the distance function)
  • $\mathcal{A}_1 \subset \mathcal{A}_2 \Rightarrow \forall \mathbf{x} \in \mathcal{X},\ \mathrm{dist}(\mathbf{x}, \mathcal{A}_2) \leq \mathrm{dist}(\mathbf{x}, \mathcal{A}_1)$.

  • $\mathcal{A}_1 \subset \mathcal{A}_2$ and $\mathbf{x} \in \mathcal{A}_1$ $\Rightarrow$ $\mathrm{dist}(\mathbf{x}, \partial\mathcal{A}_1) \leq \mathrm{dist}(\mathbf{x}, \partial\mathcal{A}_2)$.

Proof  The first item is direct:

$$\mathrm{dist}(\mathbf{x}, \mathcal{A}_1) = \inf_{\mathbf{x}' \in \mathcal{A}_1} \mathrm{dist}(\mathbf{x}', \mathbf{x}) \geq \inf_{\mathbf{x}' \in \mathcal{A}_2} \mathrm{dist}(\mathbf{x}', \mathbf{x}) = \mathrm{dist}(\mathbf{x}, \mathcal{A}_2).$$
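
As a quick numerical sanity check of this first item (not part of the original proof), the monotonicity can be verified on a discrete grid with SciPy's Euclidean distance transform. The sets, grid size and random seed below are arbitrary choices for illustration only:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

rng = np.random.default_rng(0)

# A random small set A1 and a superset A2 ⊇ A1 on a 2D grid.
A1 = rng.random((128, 128)) < 0.01
A2 = A1 | (rng.random((128, 128)) < 0.01)

# distance_transform_edt returns, for each pixel, the Euclidean distance
# to the nearest zero of its input, i.e. to the nearest pixel of the set.
d1 = distance_transform_edt(~A1)   # dist(x, A1)
d2 = distance_transform_edt(~A2)   # dist(x, A2)

# First item of Proposition 5: enlarging the set can only shrink the distance.
assert np.all(d2 <= d1)
```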

We now prove the second item by separating two cases: either $\mathbf{x} \in \mathring{\mathcal{A}}_1$ or $\mathbf{x} \in \partial\mathcal{A}_1$.

  • Case 1: $\mathbf{x} \in \partial\mathcal{A}_1$. This case is trivial since $\mathrm{dist}(\mathbf{x}, \partial\mathcal{A}_1) = 0 \leq \mathrm{dist}(\mathbf{x}, \partial\mathcal{A}_2)$ by positivity of the distance.

  • Case 2: $\mathbf{x} \in \mathring{\mathcal{A}}_1$. In that case, the key argument is to show that the open ball of radius $\mathrm{dist}(\mathbf{x}, \partial\mathcal{A}_1)$ centered at $\mathbf{x}$ is included in $\mathring{\mathcal{A}}_1$. Precisely,

    $$\tilde{\mathcal{B}} \stackrel{\mathrm{def}}{=} \mathcal{B}\big(\mathbf{x}, \mathrm{dist}(\mathbf{x}, \partial\mathcal{A}_1)\big) \subseteq \mathring{\mathcal{A}}_1 \subseteq \mathring{\mathcal{A}}_2. \qquad (16)$$

    Indeed, once Equation (16) is established, taking complements yields

    $$\partial\mathcal{A}_2 \subseteq \mathring{\mathcal{A}}_2^c \subseteq \tilde{\mathcal{B}}^c \qquad (17)$$

    where the first inclusion follows from $\partial\mathcal{A}_2 \stackrel{\mathrm{def}}{=} \bar{\mathcal{A}}_2 \setminus \mathring{\mathcal{A}}_2 \subseteq \mathring{\mathcal{A}}_2^c$. Taking the infimum over these nested sets then gives the intended inequality:

    $$\mathrm{dist}(\mathbf{x}, \partial\mathcal{A}_2) = \inf_{\mathbf{z} \in \partial\mathcal{A}_2} \|\mathbf{z} - \mathbf{x}\| \geq \inf_{\mathbf{z} \in \mathring{\mathcal{A}}_2^c} \|\mathbf{z} - \mathbf{x}\| \geq \inf_{\mathbf{z} \in \tilde{\mathcal{B}}^c} \|\mathbf{z} - \mathbf{x}\| = \mathrm{dist}(\mathbf{x}, \partial\mathcal{A}_1).$$

    Let us now prove Equation (16). By contradiction, assume that there exists $\mathbf{z} \in \tilde{\mathcal{B}} \cap \mathring{\mathcal{A}}_1^c$. Notice that $[\mathbf{x}, \mathbf{z}] \cap \mathring{\mathcal{A}}_1^c$ is a compact set (here $[\mathbf{x}, \mathbf{z}]$ denotes the closed segment between the points $\mathbf{x}$ and $\mathbf{z}$). Thus

    $$\mathbf{z}^* \stackrel{\mathrm{def}}{=} \operatorname*{arg\,min}_{\mathbf{x}' \in [\mathbf{x}, \mathbf{z}] \cap \mathring{\mathcal{A}}_1^c} \|\mathbf{x} - \mathbf{x}'\|$$

    is well defined and the semi-open segment $[\mathbf{x}, \mathbf{z}^*[$ is included in $\mathring{\mathcal{A}}_1$. This implies that $\mathbf{z}^* \in \partial\mathcal{A}_1$ since the sequence $\mathbf{z}_n \stackrel{\mathrm{def}}{=} \mathbf{x} + (1 - \tfrac{1}{n})(\mathbf{z}^* - \mathbf{x}) \in [\mathbf{x}, \mathbf{z}^*[ \subseteq \mathring{\mathcal{A}}_1$ converges to $\mathbf{z}^*$. The contradiction comes from

    $$\mathrm{dist}(\mathbf{x}, \partial\mathcal{A}_1) \leq \|\mathbf{z}^* - \mathbf{x}\| \leq \|\mathbf{z} - \mathbf{x}\| < \mathrm{dist}(\mathbf{x}, \partial\mathcal{A}_1).$$

    The first inequality holds because $\mathbf{z}^* \in \partial\mathcal{A}_1$, the second because $\mathbf{z}^* \in [\mathbf{x}, \mathbf{z}]$, and the last because $\mathbf{z} \in \tilde{\mathcal{B}}$. This contradiction establishes Equation (16).

In conclusion, the inequality is verified in both cases.
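
For concreteness, here is a small worked instance of the second item (our own example, not taken from the paper), with concentric Euclidean balls:

$$\mathcal{A}_1 = \mathcal{B}(0, 1) \subset \mathcal{A}_2 = \mathcal{B}(0, 2), \qquad \mathbf{x} = 0 \in \mathcal{A}_1,$$
$$\mathrm{dist}(\mathbf{x}, \partial\mathcal{A}_1) = 1 \leq 2 = \mathrm{dist}(\mathbf{x}, \partial\mathcal{A}_2),$$

while the first item goes in the opposite direction for the sets themselves, e.g. for $\mathbf{x}' = (3, 0)$: $\mathrm{dist}(\mathbf{x}', \mathcal{A}_2) = 1 \leq 2 = \mathrm{dist}(\mathbf{x}', \mathcal{A}_1)$.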

Theorem 4 can be proven in two steps. First, notice that the inclusion $\mathcal{B} \subseteq \mathcal{E}$ (Assumption 1) and the first bullet of Proposition 5 imply that $\mathrm{dist}(\mathbf{x}, \mathcal{E}) \leq \mathrm{dist}(\mathbf{x}, \mathcal{B})$ for any $\mathbf{x} \in \mathcal{X}$.

Let us establish the converse inequality. Let $\mathbf{x}$ denote an arbitrary point in $\mathcal{D}$. Aiming for a proof by contradiction, assume that $\mathrm{dist}(\mathbf{x}, \mathcal{E}) < \mathrm{dist}(\mathbf{x}, \mathcal{B})$. We proceed by separating two cases:

  • Case 1: $\mathrm{dist}(\mathbf{x}, \mathcal{E}) = 0$. This implies that $\mathbf{x} \in \mathcal{E}$ since the set $\mathcal{E}$ is closed as a finite union of the closed sets $\partial\mathcal{X}_{i,j}$. Moreover, as $\mathbf{x}$ belongs to $\mathcal{D}$, in particular $\mathbf{x}$ belongs to $\mathcal{S}$. Applying (11) then gives $\mathbf{x} \in \mathcal{E} \cap \mathcal{S} \subseteq \mathcal{B}$, which is inconsistent with $\mathrm{dist}(\mathbf{x}, \mathcal{B}) > 0$.

  • Case 2: $r \stackrel{\mathrm{def}}{=} \mathrm{dist}(\mathbf{x}, \mathcal{E}) > 0$. Since $\mathbf{x} \in \mathcal{D}$, the point $\mathbf{x}$ verifies $\mathrm{dist}(\mathbf{x}, \mathcal{B}) \leq \mathrm{dist}(\mathbf{x}, \mathcal{C}\mathcal{B})$. Let us define

    $$\varepsilon \stackrel{\mathrm{def}}{=} \mathrm{dist}(\mathbf{x}, \mathcal{C}\mathcal{B}) - \mathrm{dist}(\mathbf{x}, \mathcal{E}) \geq \mathrm{dist}(\mathbf{x}, \mathcal{B}) - \mathrm{dist}(\mathbf{x}, \mathcal{E}) > 0$$

    by assumption. Since $\mathbf{x}$ belongs to $\mathcal{S}$, there exists $i_0 \in \{0, 1\}$ such that $\mathbf{x} \in \mathcal{S}_{i_0}$. Because $\mathrm{dist}(\mathbf{x}, \mathcal{E}) = r$, there exists a point $\mathbf{z} \in \mathcal{E}$ such that $r \leq \|\mathbf{x} - \mathbf{z}\|_2 \leq r + \varepsilon/2$.

    • Case 2.a: $\mathbf{z} \in \mathcal{S}_{i_0}$. By assumption (11), the contradiction comes quickly since now

      $$\mathbf{z} \in \mathcal{S}_{i_0} \cap \mathcal{E} \subseteq \mathcal{S} \cap \mathcal{E} \subseteq \mathcal{B} \subseteq \mathcal{C}\mathcal{B} \qquad (18)$$

      which implies the contradictory inequality

      $$r + \varepsilon = \mathrm{dist}(\mathbf{x}, \mathcal{C}\mathcal{B}) \leq \|\mathbf{x} - \mathbf{z}\|_2 \leq r + \varepsilon/2.$$
    • Case 2.b: $\mathbf{z} \notin \mathcal{S}_{i_0}$. In that case, we define the point $\mathbf{y}$ on the segment $[\mathbf{x}, \mathbf{z}]$ which is nearest to $\mathbf{x}$ and belongs to $\partial\mathcal{S}_{i_0}$. More precisely, let $P : t \mapsto t\mathbf{z} + (1 - t)\mathbf{x}$ assign to each scalar $t \in [0, 1]$ a point $P(t)$ of the segment, and set

      $$s \stackrel{\mathrm{def}}{=} \inf_{P(t) \notin \mathcal{S}_{i_0}} t. \qquad (19)$$

      Since $P(0) = \mathbf{x} \in \mathcal{S}_{i_0}$ and $P(1) = \mathbf{z} \notin \mathcal{S}_{i_0}$, the scalar $s$ is well defined. Let us show that $\mathbf{y} \stackrel{\mathrm{def}}{=} P(s)$ belongs to $\partial\mathcal{S}_{i_0}$. By definition of the infimum, there exists a sequence $\eta_n \geq 0$, $\eta_n \to 0$, such that $P(s + \eta_n) \notin \mathcal{S}_{i_0}$; thus $P(s) \notin \mathring{\mathcal{S}}_{i_0}$. Also by definition, $P(\eta) \in \mathcal{S}_{i_0}$ for all $0 \leq \eta < s$; thus $P(s) \in \bar{\mathcal{S}}_{i_0}$. Hence $\mathbf{y} = P(s) \in \bar{\mathcal{S}}_{i_0} \setminus \mathring{\mathcal{S}}_{i_0} = \partial\mathcal{S}_{i_0}$.

      Since $\mathbf{y} \in \partial\mathcal{S}_{i_0} \subseteq \mathcal{C}\mathcal{B}$, this implies the intended contradiction:

      $$r + \varepsilon = \mathrm{dist}(\mathbf{x}, \mathcal{C}\mathcal{B}) \leq \|\mathbf{x} - \mathbf{y}\|_2 = s \|\mathbf{x} - \mathbf{z}\|_2 \leq \|\mathbf{x} - \mathbf{z}\|_2 \leq r + \varepsilon/2.$$

In both cases, the assumption $\mathrm{dist}(\mathbf{x}, \mathcal{E}) < \mathrm{dist}(\mathbf{x}, \mathcal{B})$ leads to a contradiction. We deduce that $\mathrm{dist}(\mathbf{x}, \mathcal{E}) \geq \mathrm{dist}(\mathbf{x}, \mathcal{B})$.

The second inequality in Theorem 4 is a consequence of property (10). Indeed, this property implies that we can separate the strokes $\mathcal{S}_i$ into connected components $\mathcal{S}_{i,j}$. These are subsets of the connected components $\mathcal{X}_{i,j'}$ for some $j'$ depending on $j$. The inequality then follows directly from Proposition 5.
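
To make the statement more tangible, the following toy experiment (our own construction, not the paper's exact setting of strokes and valid distance sets) illustrates numerically that a partial boundary annotation $\mathcal{B} \subset \mathcal{E}$ can reproduce the full distance map on the pixels whose nearest edge point is annotated. All names, sizes and thresholds below are illustrative only:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Toy illustration: the full edge set E is a circle, and the annotated
# boundary B is only the right half of that circle.
n = 201
yy, xx = np.mgrid[:n, :n]
cy = cx = n // 2
R = 80
r = np.hypot(yy - cy, xx - cx)

E = np.abs(r - R) < 0.5      # full boundary: a thin ring
B = E & (xx > cx)            # partial annotation: right half of the ring

dist_E = distance_transform_edt(~E)   # distance to the full boundary
dist_B = distance_transform_edt(~B)   # distance to the annotated part only

# B ⊆ E, so dist_B can only be larger (first item of Proposition 5).
assert np.all(dist_E <= dist_B)

# On interior pixels whose nearest edge point lies on the annotated arc
# (here: pixels strictly in the right half-plane), both distance maps
# coincide, in the spirit of Theorem 4.
region = (r < R - 1) & (xx > cx + 2)
assert np.allclose(dist_E[region], dist_B[region], atol=1.0)
print(f"distance maps agree on {region.sum()} pixels")
```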