Faster, more accurate PII detection in the Krita 6 plugin with Amazon Rekognition

I tried Amazon Rekognition DetectText in the PII detection plugin for Krita 6. It cut processing time from about 30 seconds with the EasyOCR version to about 9 seconds, while also reducing detection misses.
2026.02.11

Introduction

In my previous article, I prototyped a Krita 6 plugin that automatically selects areas containing PII (Personally Identifiable Information) using EasyOCR. However, in my environment processing was slow because OCR ran on the CPU only, and small text was often misread. In this article, I add an implementation that uses Amazon Rekognition's DetectText to improve both speed and accuracy.

As with the previous article, the plugin will focus on these specific roles:

  • Export the current Krita display to a temporary file
  • Execute PII detection in external Python and output rectangles to JSON (see the invocation sketch after this list)
  • Apply the JSON rectangles to the selection area
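
Concretely, the middle step is just a subprocess call. Based on the argument list the plugin builds (see the appendix), the invocation is equivalent to the following; the two paths are placeholders for your environment:

C:/venvs/pii/Scripts/python.exe -E -s C:/tools/pii/pii_detect_aws.py --in input.png --out hits.json --min-conf 0.4 --pad 2 --scale 0 --aws-profile krita-ocr --aws-region ap-northeast-1

The -E and -s flags make the external interpreter ignore PYTHON* environment variables and skip the user site-packages, which keeps Krita's bundled Python from leaking into it.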

Test Environment

  • OS: Windows 11
  • Krita: Krita 6.0.0 beta1
  • GPU: None (OCR is executed on CPU only)
  • External Python: Python 3.11.9
  • boto3: 1.42.46
  • pillow: 12.1.1

Target Audience

  • People who want to mask personal information in screenshots or images using Krita 6
  • Those interested in Krita Python plugin development
  • Those considering OCR implementation using Amazon Rekognition's DetectText API
  • Those interested in speed and accuracy improvements from the previous EasyOCR version
  • Those who can perform basic operations with Python and AWS CLI

Prerequisites

Prepare an IAM policy with minimal permissions to call DetectText.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RekognitionText",
      "Effect": "Allow",
      "Action": [
        "rekognition:DetectText"
      ],
      "Resource": "*"
    }
  ]
}

Next, set up a profile in your local AWS CLI configuration. In this case, we'll use the profile name krita-ocr.

[profile krita-ocr]
region = ap-northeast-1
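
Before wiring the profile into the plugin, it is worth confirming that it resolves to usable credentials. A minimal check from the external Python, assuming boto3 is installed there (sts:GetCallerIdentity needs no extra IAM permissions):

# Sanity check: does the "krita-ocr" profile resolve to usable credentials?
import boto3

session = boto3.session.Session(profile_name="krita-ocr")
print(session.client("sts").get_caller_identity()["Arn"])

# The Rekognition client picks up the profile's default region.
print(session.client("rekognition").meta.region_name)  # ap-northeast-1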

Implementation

As with the previous version, the Krita-side plugin is only responsible for exporting to a temporary file and applying the detection results to the selection area. The PII detection itself is separated into an external Python script that calls Amazon Rekognition DetectText via boto3. It converts the bounding boxes of lines returned by DetectText to pixel coordinates, adds margins, and outputs them to JSON.
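
DetectText returns each LINE with a normalized BoundingBox (Left/Top/Width/Height, all in the 0..1 range), so the conversion is a multiply, a round, and a clamp. A minimal sketch of that step, mirroring the appendix code:

def bbox_to_padded_rect(bbox, img_w, img_h, pad=2):
    """Convert a normalized Rekognition BoundingBox to a padded pixel rect."""
    x = round(bbox["Left"] * img_w)
    y = round(bbox["Top"] * img_h)
    w = round(bbox["Width"] * img_w)
    h = round(bbox["Height"] * img_h)
    # Grow by `pad` pixels on every side, clamped to the image bounds.
    x0, y0 = max(0, x - pad), max(0, y - pad)
    x1, y1 = min(img_w, x + w + pad), min(img_h, y + h + pad)
    return {"x": x0, "y": y0, "w": x1 - x0, "h": y1 - y0}

# A line near the top-left of a 1920x1080 screenshot:
# bbox_to_padded_rect({"Left": 0.1, "Top": 0.1, "Width": 0.2, "Height": 0.02}, 1920, 1080)
# -> {"x": 190, "y": 106, "w": 388, "h": 26}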

The sample code is provided at the end of this article.
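
Note that the plugin reads its settings from a config.json placed next to pii_mask_aws.py. The keys below are the ones the appendix code actually reads; the two paths are placeholders for your environment:

{
  "python": "C:/venvs/pii/Scripts/python.exe",
  "detector": "C:/tools/pii/pii_detect_aws.py",
  "aws_profile": "krita-ocr",
  "aws_region": "ap-northeast-1",
  "min_conf": 0.4,
  "pad": 2,
  "scale": 0
}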

How to Use

  1. Place pii_mask_aws under the pykrita directory and enable it in Krita's Python Plugin Manager (same as the previous article)
  2. Open an image in Krita and run Tools → Scripts → PII (AWS): Detect (email/phone)
  3. After detection completes, the rectangles of lines identified as PII are set as the selection area (see the sample JSON after this list)
    [Image: detection result]
  4. Adjust the selection as needed, then apply Krita's standard pixelation or other effects
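
For reference, the hits.json that the plugin reads back in step 3 has the shape below (the structure matches what the detector in the appendix writes; the numbers are illustrative). The detector stores the literal string "masked" instead of the recognized text, so the JSON on disk never contains the PII itself:

{
  "engine_used": "rekognition.detect_text",
  "image_w": 1920,
  "image_h": 1080,
  "timings": {"total_sec": 9.1, "aws_sec": 8.7, "post_sec": 0.01},
  "hits": [
    {
      "kind": "email",
      "text": "masked",
      "confidence": 99.2,
      "rect": {"x": 190, "y": 106, "w": 388, "h": 26}
    }
  ]
}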

Results

In my environment, the same image took about 30 seconds to process with the EasyOCR version and about 9 seconds with the Amazon Rekognition version (each measurement was taken once per image). The Rekognition version also reduced the detection misses that occurred with the EasyOCR version.

Email Addresses in Zendesk Console

  • EasyOCR
    • Detected 0 out of 5 cases
    • Processing time: 31.25s
      [Image: EasyOCR result, Zendesk console]
  • Amazon Rekognition
    • Detected 5 out of 5 cases
    • Processing time: 8.80s
      [Image: Amazon Rekognition result, Zendesk console]

Phone Numbers in Twilio Console

  • EasyOCR
    • Detected 3 out of 6 cases
    • Processing time: 28.70s
      [Image: EasyOCR result, Twilio console]
  • Amazon Rekognition
    • Detected 5 out of 6 cases
    • Processing time: 9.54s
      [Image: Amazon Rekognition result, Twilio console]

Email Addresses in Momento Console (truncated with "..." at the end)

  • EasyOCR
    • Detected 0 out of 1 case
    • Processing time: 24.42s
      [Image: EasyOCR result, Momento console]
  • Amazon Rekognition
    • Detected 1 out of 1 case
    • Processing time: 9.12s
      [Image: Amazon Rekognition result, Momento console]

Conclusion

I added an implementation using Amazon Rekognition DetectText to the Krita 6 PII detection plugin. In my CPU environment, it processed faster than the EasyOCR version and reduced the number of missed detections of small text. Since the masking process after detection is still handled by Krita's standard features, you can switch detection engines with minimal changes to your workflow.

Appendix: Sample Code

pii_mask_aws/pii_mask_aws.py

# pykrita/pii_mask_aws/pii_mask_aws.py
from __future__ import annotations

import json
import os
import subprocess
import tempfile
import time
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple

from krita import Extension, InfoObject, Krita, Selection  # type: ignore

try:
    from PyQt6.QtWidgets import QMessageBox
except Exception:  # pragma: no cover
    from PyQt5.QtWidgets import QMessageBox  # type: ignore

PLUGIN_TITLE = "PII Detector (Amazon Rekognition)"
PLUGIN_MENU_TEXT = "PII (AWS): Detect (email/phone)"

def _load_config() -> Dict[str, Any]:
    cfg_path = Path(__file__).with_name("config.json")
    if not cfg_path.exists():
        raise RuntimeError(f"config.json not found: {cfg_path}")
    return json.loads(cfg_path.read_text(encoding="utf-8"))

def _msg(title: str, text: str) -> None:
    app = Krita.instance()
    w = app.activeWindow()
    parent = w.qwindow() if w is not None else None
    QMessageBox.information(parent, title, text)

def _err(title: str, text: str) -> None:
    app = Krita.instance()
    w = app.activeWindow()
    parent = w.qwindow() if w is not None else None
    QMessageBox.critical(parent, title, text)

def _export_projection_png(doc, out_path: Path) -> None:
    # Export the projection (the image exactly as displayed)
    doc.exportImage(str(out_path), InfoObject())

def _rects_to_selection(doc_w: int, doc_h: int, rects: List[Dict[str, int]]) -> Selection:
    sel = Selection()
    sel.resize(doc_w, doc_h)
    sel.clear()

    for r in rects:
        x, y, w, h = int(r["x"]), int(r["y"]), int(r["w"]), int(r["h"])
        if w <= 0 or h <= 0:
            continue

        tmp = Selection()
        tmp.resize(doc_w, doc_h)
        tmp.clear()
        tmp.select(x, y, w, h, 255)
        sel.add(tmp)

    return sel

def _clean_env_for_external_python() -> Dict[str, str]:
    """
    Krita 同梱 Python / Qt 周りの環境変数が外部 Python を汚染しがちなので、
    Python 系と Qt 系だけ除去して、AWS_* などは温存する。
    """
    env = os.environ.copy()

    for k in list(env.keys()):
        if k.upper().startswith("PYTHON"):
            env.pop(k, None)

    for k in ["QT_PLUGIN_PATH", "QML2_IMPORT_PATH"]:
        env.pop(k, None)

    return env

def _detector_sanity_check(detector_path: Path) -> Optional[str]:
    try:
        head = detector_path.read_text(encoding="utf-8", errors="ignore")[:2000]
    except Exception:
        return None

    if "from krita import" in head or "import krita" in head:
        return (
            "Detector script looks like a Krita plugin (it imports 'krita').\n"
            "config.json 'detector' may be pointing to the wrong file.\n\n"
            f"detector: {detector_path}"
        )

    return None

def _format_timings(data: Dict[str, Any]) -> Tuple[str, str]:
    """
    Returns:
      - short line for message
      - verbose block for message
    """
    timings = data.get("timings")
    if not isinstance(timings, dict):
        return "", ""

    def _f(key: str) -> Optional[float]:
        v = timings.get(key)
        try:
            return float(v)
        except Exception:
            return None

    total = _f("total_sec")
    aws = _f("aws_sec")
    post = _f("post_sec")

    parts: List[str] = []
    if total is not None:
        parts.append(f"detector(total)={total:.2f}s")
    if aws is not None:
        parts.append(f"aws={aws:.2f}s")
    if post is not None:
        parts.append(f"post={post:.2f}s")

    if not parts:
        return "", ""

    short = " / ".join(parts)
    verbose = "\n".join([f"- {p}" for p in parts])
    return short, verbose

class PIIRekognitionExtension(Extension):
    def __init__(self, parent):
        super().__init__(parent)

    def setup(self) -> None:
        pass

    def createActions(self, window) -> None:
        a1 = window.createAction(
            "pii_detect_aws_email_phone",
            PLUGIN_MENU_TEXT,
            "tools/scripts",
        )
        a1.triggered.connect(self.detect)

    def detect(self) -> None:
        app = Krita.instance()
        doc = app.activeDocument()
        if doc is None:
            _err(PLUGIN_TITLE, "No active document.")
            return

        try:
            cfg = _load_config()
        except Exception as e:
            _err(PLUGIN_TITLE, f"Failed to load config.json:\n{e}")
            return

        py = Path(str(cfg.get("python", "")))
        detector = Path(str(cfg.get("detector", "")))

        aws_profile = str(cfg.get("aws_profile", "")).strip()
        aws_region = str(cfg.get("aws_region", "")).strip()

        min_conf = float(cfg.get("min_conf", 0.40))
        pad = int(cfg.get("pad", 2))
        scale = int(cfg.get("scale", 0))

        if not py.exists():
            _err(PLUGIN_TITLE, f"python not found:\n{py}")
            return
        if not detector.exists():
            _err(PLUGIN_TITLE, f"detector not found:\n{detector}")
            return

        bad = _detector_sanity_check(detector)
        if bad:
            _err(PLUGIN_TITLE, bad)
            return

        tmp_dir = Path(tempfile.gettempdir()) / "krita_pii_detector_aws"
        tmp_dir.mkdir(parents=True, exist_ok=True)
        in_png = tmp_dir / "input.png"
        out_json = tmp_dir / "hits.json"

        try:
            _export_projection_png(doc, in_png)
        except Exception as e:
            _err(PLUGIN_TITLE, f"Failed to export image:\n{e}")
            return

        cmd: List[str] = [
            str(py),
            "-E",
            "-s",
            str(detector),
            "--in",
            str(in_png),
            "--out",
            str(out_json),
            "--min-conf",
            str(min_conf),
            "--pad",
            str(pad),
            "--scale",
            str(scale),
        ]
        if aws_profile:
            cmd += ["--aws-profile", aws_profile]
        if aws_region:
            cmd += ["--aws-region", aws_region]

        t0 = time.perf_counter()
        try:
            p = subprocess.run(
                cmd,
                capture_output=True,
                text=True,
                check=False,
                env=_clean_env_for_external_python(),
            )
        except Exception as e:
            _err(PLUGIN_TITLE, f"Failed to run detector:\n{e}")
            return
        elapsed = time.perf_counter() - t0

        print(f"[{PLUGIN_TITLE}] detector elapsed(subprocess): {elapsed:.2f}s")
        if p.stdout.strip():
            print(f"[{PLUGIN_TITLE}] detector stdout:\n{p.stdout}")
        if p.stderr.strip():
            print(f"[{PLUGIN_TITLE}] detector stderr:\n{p.stderr}")

        if p.returncode != 0:
            _err(
                PLUGIN_TITLE,
                f"Detector failed (code={p.returncode}).\n\n"
                f"Elapsed: {elapsed:.2f}s\n\nSTDERR:\n{p.stderr}\n\nSTDOUT:\n{p.stdout}",
            )
            return

        try:
            data = json.loads(out_json.read_text(encoding="utf-8"))
            hits = data.get("hits", [])
        except Exception as e:
            _err(PLUGIN_TITLE, f"Failed to read JSON:\n{e}")
            return

        rects = [h["rect"] for h in hits if isinstance(h, dict) and "rect" in h]
        sel = _rects_to_selection(doc.width(), doc.height(), rects)
        doc.setSelection(sel)

        timing_short, timing_verbose = _format_timings(data)
        timing_line = f"\n{timing_short}" if timing_short else ""

        _msg(
            PLUGIN_TITLE,
            f"Detection done.\nHits: {len(rects)}\nElapsed: {elapsed:.2f}s{timing_line}\n\n"
            "This plugin sends the rendered image to Amazon Rekognition DetectText.\n"
            "Adjust selection manually, then apply pixelize (or other) filter using Krita built-ins.\n"
            + (f"\nTimings:\n{timing_verbose}\n" if timing_verbose else ""),
        )

Krita.instance().addExtension(PIIRekognitionExtension(Krita.instance()))

detector/pii_detect_aws.py

# detector/pii_detect_aws.py
from __future__ import annotations

import argparse
import json
import re
import sys
import time
import unicodedata
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, List, Optional, Tuple

# assume boto3 is in an external venv
try:
    import boto3  # type: ignore
except Exception as e:  # pragma: no cover
    boto3 = None  # type: ignore
    _BOTO3_IMPORT_ERROR = e
else:
    _BOTO3_IMPORT_ERROR = None

# Regular email
EMAIL_RE = re.compile(r"\b[A-Z0-9._%+\-]+@[A-Z0-9.\-]+\.[A-Z]{2,}\b", re.IGNORECASE)

# Truncated email examples: koshii.takumi@classmet...
# - Captures domains ending with "..."
# - Assumes whitespace/end of line/typical punctuation follows
TRUNC_EMAIL_RE = re.compile(
    r"[A-Z0-9._%+\-]+@[A-Z0-9.\-]{2,}\.{3,}(?=\s|$|[)\]】〉》>、。,.])",
    re.IGNORECASE,
)

def _nfkc(s: str) -> str:
    s = unicodedata.normalize("NFKC", s)

    # Standardize ellipsis to "..." (OCR/font differences)
    s = s.replace("…", "...")   # U+2026
    s = s.replace("⋯", "...")   # U+22EF

    # Also handle Japanese triple dots "・・・" (middle dot sequence)
    s = s.replace("・・・", "...")
    return s

def _looks_like_email(s: str) -> bool:
    t = _nfkc(s)
    if EMAIL_RE.search(t):
        return True
    if TRUNC_EMAIL_RE.search(t):
        return True
    return False

def _read_png_size(path: Path) -> Tuple[int, int]:
    """
    Read PNG width/height from IHDR without Pillow.
    """
    with path.open("rb") as f:
        sig = f.read(8)
        if sig != b"\x89PNG\r\n\x1a\n":
            raise RuntimeError("Input is not a PNG (unexpected signature).")
        # IHDR: length(4) type(4) data(13) crc(4)
        length = int.from_bytes(f.read(4), "big")
        ctype = f.read(4)
        if ctype != b"IHDR":
            raise RuntimeError("PNG IHDR chunk not found where expected.")
        ihdr = f.read(13)
        if len(ihdr) != 13:
            raise RuntimeError("PNG IHDR too short.")
        width = int.from_bytes(ihdr[0:4], "big")
        height = int.from_bytes(ihdr[4:8], "big")
        if width <= 0 or height <= 0:
            raise RuntimeError(f"Invalid PNG size: {width}x{height}")
        return width, height

def _maybe_scale_image(in_path: Path, scale_percent: int) -> Tuple[Path, int, int]:
    """
    scale_percent:
      - 0: no scaling
      - 1..100: scale to that percentage
    Returns: (path_to_send, width, height)
    """
    w, h = _read_png_size(in_path)
    if scale_percent <= 0 or scale_percent == 100:
        return in_path, w, h

    # Optional: use Pillow if available
    try:
        from PIL import Image  # type: ignore
    except Exception:
        raise RuntimeError(
            "scale is requested but Pillow is not available in the external Python.\n"
            "Install pillow, or set scale=0 in config.json."
        )

    new_w = max(1, int(round(w * (scale_percent / 100.0))))
    new_h = max(1, int(round(h * (scale_percent / 100.0))))

    out_path = in_path.with_name(in_path.stem + f"_scaled{scale_percent}.png")
    with Image.open(in_path) as im:
        im = im.convert("RGB")
        im = im.resize((new_w, new_h))
        im.save(out_path, format="PNG")

    return out_path, new_w, new_h

def _clamp(v: int, lo: int, hi: int) -> int:
    return max(lo, min(hi, v))

def _pad_rect(x: int, y: int, w: int, h: int, pad: int, img_w: int, img_h: int) -> Dict[str, int]:
    x2 = x + w
    y2 = y + h
    x = _clamp(x - pad, 0, img_w)
    y = _clamp(y - pad, 0, img_h)
    x2 = _clamp(x2 + pad, 0, img_w)
    y2 = _clamp(y2 + pad, 0, img_h)
    return {"x": x, "y": y, "w": max(0, x2 - x), "h": max(0, y2 - y)}

def _looks_like_phone(s: str) -> bool:
    s0 = _nfkc(s)
    # quick reject if too short
    if len(s0) < 6:
        return False

    digits = re.sub(r"\D", "", s0)
    if not digits:
        return False

    has_plus = "+" in s0
    # NFKC normalization already folds full-width separators to ASCII
    has_sep = any(ch in s0 for ch in ["-", " ", "(", ")", "/"])

    # E.164-ish
    if has_plus and 8 <= len(digits) <= 15:
        return True

    # Domestic-ish: avoid too-short sequences (years, prices, etc.)
    # Typical: 10-11 digits (JP mobile/landline), with separators is more likely a phone.
    if 10 <= len(digits) <= 11 and has_sep:
        return True

    # Some OCR outputs remove separators but keep "tel:" or similar markers
    if 10 <= len(digits) <= 15 and re.search(r"\b(tel|phone|fax)\b", s0, re.IGNORECASE):
        return True

    return False

@dataclass
class DetLine:
    text: str
    confidence: float
    # Rekognition bbox is normalized: left/top/width/height
    left: float
    top: float
    width: float
    height: float

def _extract_lines(rek_resp: Dict[str, Any]) -> List[DetLine]:
    out: List[DetLine] = []
    dets = rek_resp.get("TextDetections", [])
    if not isinstance(dets, list):
        return out

    for d in dets:
        if not isinstance(d, dict):
            continue
        if d.get("Type") != "LINE":
            continue
        text = d.get("DetectedText")
        conf = d.get("Confidence")
        geom = d.get("Geometry", {})
        bbox = geom.get("BoundingBox", {}) if isinstance(geom, dict) else {}

        if not isinstance(text, str):
            continue
        try:
            conf_f = float(conf)
        except Exception:
            conf_f = 0.0

        try:
            left = float(bbox.get("Left", 0.0))
            top = float(bbox.get("Top", 0.0))
            width = float(bbox.get("Width", 0.0))
            height = float(bbox.get("Height", 0.0))
        except Exception:
            continue

        # Sanity clamp normalized coordinates
        left = max(0.0, min(1.0, left))
        top = max(0.0, min(1.0, top))
        width = max(0.0, min(1.0, width))
        height = max(0.0, min(1.0, height))

        out.append(DetLine(text=text, confidence=conf_f, left=left, top=top, width=width, height=height))

    return out

def _bbox_to_rect_px(line: DetLine, img_w: int, img_h: int) -> Tuple[int, int, int, int]:
    x = int(round(line.left * img_w))
    y = int(round(line.top * img_h))
    w = int(round(line.width * img_w))
    h = int(round(line.height * img_h))

    # clamp within image
    x = _clamp(x, 0, img_w)
    y = _clamp(y, 0, img_h)
    w = _clamp(w, 0, img_w - x)
    h = _clamp(h, 0, img_h - y)
    return x, y, w, h

def _build_hits(lines: List[DetLine], img_w: int, img_h: int, min_conf: float, pad: int) -> List[Dict[str, Any]]:
    hits: List[Dict[str, Any]] = []

    for ln in lines:
        if ln.confidence < (min_conf * 100.0):  # Rekognition confidence is 0..100
            continue

        t = _nfkc(ln.text)

        kinds: List[str] = []
        if _looks_like_email(t):
            kinds.append("email")
        if _looks_like_phone(t):
            kinds.append("phone")

        if not kinds:
            continue

        x, y, w, h = _bbox_to_rect_px(ln, img_w, img_h)
        rect = _pad_rect(x, y, w, h, pad, img_w, img_h)

        # There might be cases where email and phone coexist in one line, so separate by kind in hits
        for k in kinds:
            hits.append(
                {
                    "kind": k,
                    "text": "masked",
                    "confidence": float(ln.confidence),
                    "rect": rect,
                }
            )

    return hits

def _make_session(profile: Optional[str], region: Optional[str]):
    if boto3 is None:
        raise RuntimeError(f"boto3 import failed: {_BOTO3_IMPORT_ERROR}")

    kwargs: Dict[str, Any] = {}
    if profile:
        kwargs["profile_name"] = profile

    session = boto3.session.Session(**kwargs)
    client_kwargs: Dict[str, Any] = {}
    if region:
        client_kwargs["region_name"] = region

    return session, session.client("rekognition", **client_kwargs)

def _main(argv: Optional[List[str]] = None) -> int:
    ap = argparse.ArgumentParser(description="PII detector using Amazon Rekognition DetectText (LINE-based).")
    ap.add_argument("--in", dest="in_path", required=True, help="Input image path (PNG recommended).")
    ap.add_argument("--out", dest="out_path", required=True, help="Output JSON path.")
    ap.add_argument("--min-conf", dest="min_conf", type=float, default=0.40, help="Min confidence (0..1).")
    ap.add_argument("--pad", dest="pad", type=int, default=2, help="Padding in pixels.")
    ap.add_argument(
        "--scale",
        dest="scale",
        type=int,
        default=0,
        help="Scale percent for OCR (0=no scaling, 1..100). Recommended 0 or 100.",
    )
    ap.add_argument("--aws-profile", dest="aws_profile", default="", help="AWS profile name (optional).")
    ap.add_argument("--aws-region", dest="aws_region", default="", help="AWS region (optional).")

    args = ap.parse_args(argv)

    in_path = Path(args.in_path)
    out_path = Path(args.out_path)

    if not in_path.exists():
        raise RuntimeError(f"Input not found: {in_path}")

    min_conf = float(args.min_conf)
    if not (0.0 <= min_conf <= 1.0):
        raise RuntimeError("--min-conf must be in range 0..1")

    scale = int(args.scale)
    if scale < 0 or scale > 100:
        raise RuntimeError("--scale must be 0..100")

    aws_profile = args.aws_profile.strip() or None
    aws_region = args.aws_region.strip() or None

    t0 = time.perf_counter()

    # Optional scaling
    scaled_path, img_w, img_h = _maybe_scale_image(in_path, scale)
    img_bytes = scaled_path.read_bytes()

    # AWS call
    t_aws0 = time.perf_counter()
    _, client = _make_session(aws_profile, aws_region)
    rek = client.detect_text(Image={"Bytes": img_bytes})
    t_aws1 = time.perf_counter()

    # Postprocess
    t_post0 = time.perf_counter()
    lines = _extract_lines(rek)
    hits = _build_hits(lines, img_w, img_h, min_conf=min_conf, pad=int(args.pad))
    t_post1 = time.perf_counter()

    t1 = time.perf_counter()

    data: Dict[str, Any] = {
        "engine_used": "rekognition.detect_text",
        "image_w": img_w,
        "image_h": img_h,
        "timings": {
            "total_sec": float(t1 - t0),
            "aws_sec": float(t_aws1 - t_aws0),
            "post_sec": float(t_post1 - t_post0),
        },
        "hits": hits,
    }

    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
    return 0

if __name__ == "__main__":
    try:
        raise SystemExit(_main())
    except Exception as e:
        # Output errors clearly so they can be displayed in Krita
        print(f"[pii_detect_aws] ERROR: {e}", file=sys.stderr)
        raise
