🛡️ Core API (DTO & PII)

This section describes the essential data models and security utilities of the codex-core library.

BaseDTO

BaseDTO is an immutable Pydantic model with built-in PII masking for repr and str.

base_dto

Immutable base DTO with GDPR-safe string representation.

Provides :class:BaseDTO, the canonical superclass for all Data Transfer Objects in the codex_core ecosystem. Every DTO that crosses a service boundary (API layer → domain layer, domain layer → log sink) should extend this class to guarantee immutability and automatic PII redaction in log output.

Classes

BaseDTO

Bases: BaseModel

Immutable Pydantic model with automatic PII masking in __repr__.

Acts as the single inheritance root for all DTOs across codex_core and downstream libraries. Two invariants are enforced at the class level:

  1. Immutability: model_config = ConfigDict(frozen=True) prevents in-place mutation after construction, making instances safe to pass across async boundaries and to cache.

  2. GDPR-safe logging: __repr__ (and therefore __str__) delegates to :func:~codex_core.core.pii.mask_value for every field, replacing sensitive fields with "***". Which fields are masked depends on the active detection mode in :mod:~codex_core.core.pii: exact-match against :class:~codex_core.core.pii.PIIRegistry when a subclass is defined, or heuristic keyword-based matching otherwise. Nested dict values are recursively scrubbed.

Subclasses must not override model_config in a way that removes frozen=True unless the calling code explicitly manages mutation safety.

Performance note

__repr__ iterates over all fields on every call. Avoid logging BaseDTO instances inside tight loops (e.g. backtracking search algorithms). Prefer logging at entry/exit points of public API methods only.
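
One way to follow this advice, sketched with the standard library only: pass the DTO as a logging argument rather than pre-formatting it, so that the repr is built only when a record is actually emitted. CountingDTO below is a hypothetical stand-in (it is not part of codex_core) that simply counts how often its repr is built; the logger name is also an assumption.

```python
import logging


class CountingDTO:
    """Hypothetical stand-in for a BaseDTO subclass; counts repr calls."""

    repr_calls = 0

    def __repr__(self) -> str:
        type(self).repr_calls += 1
        return "CountingDTO(email='***')"

    __str__ = __repr__


logger = logging.getLogger("codex_demo")
logger.addHandler(logging.NullHandler())
logger.setLevel(logging.INFO)  # DEBUG records are dropped at this level

dto = CountingDTO()

# Lazy: the %s argument is only formatted if the record is emitted.
for _ in range(1000):
    logger.debug("visiting %s", dto)
lazy_calls = CountingDTO.repr_calls

# Eager: the f-string builds the repr on every iteration, emitted or not.
for _ in range(1000):
    logger.debug(f"visiting {dto}")
eager_calls = CountingDTO.repr_calls - lazy_calls

print(lazy_calls, eager_calls)
```

With DEBUG disabled, the lazy form never touches the repr, while the f-string form pays the full masking cost on every iteration.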

Example
from codex_core.core.base_dto import BaseDTO
from codex_core.core.pii import PIIRegistry

class UserPII(PIIRegistry):
    email: str
    phone_number: str

class AppointmentDTO(BaseDTO):
    appointment_id: int
    client_name: str
    email: str
    slot_count: int

dto = AppointmentDTO(
    appointment_id=7,
    client_name="Alice Müller",
    email="alice@example.com",
    slot_count=2,
)
print(dto)
# AppointmentDTO(appointment_id=7, client_name='Alice Müller', email='***', slot_count=2)
# Note: client_name is NOT masked because UserPII specified ONLY email/phone_number.
Source code in src/codex_core/core/base_dto.py
class BaseDTO(BaseModel):
    """Immutable Pydantic model with automatic PII masking in ``__repr__``.

    Acts as the single inheritance root for all DTOs across codex_core
    and downstream libraries.  Two invariants are enforced at the class
    level:

    1. **Immutability** — ``model_config = ConfigDict(frozen=True)``
       prevents in-place mutation after construction, making instances
       safe to pass across async boundaries and to cache.

    2. **GDPR-safe logging** — ``__repr__`` (and therefore ``__str__``)
       delegates to :func:`~codex_core.core.pii.mask_value` for every
       field, replacing sensitive fields with ``"***"``.  Which fields
       are masked depends on the active detection mode in
       :mod:`~codex_core.core.pii`: exact-match against
       :class:`~codex_core.core.pii.PIIRegistry` when a subclass is
       defined, or heuristic keyword-based matching otherwise.  Nested
       ``dict`` values are recursively scrubbed.

    Subclasses must **not** override ``model_config`` in a way that
    removes ``frozen=True`` unless the calling code explicitly manages
    mutation safety.

    Performance note:
        ``__repr__`` iterates over all fields on every call.  Avoid
        logging ``BaseDTO`` instances inside tight loops (e.g. backtracking
        search algorithms).  Prefer logging at entry/exit points of
        public API methods only.

    Example:
        ```python
        from codex_core.core.base_dto import BaseDTO
        from codex_core.core.pii import PIIRegistry

        class UserPII(PIIRegistry):
            email: str
            phone_number: str

        class AppointmentDTO(BaseDTO):
            appointment_id: int
            client_name: str
            email: str
            slot_count: int

        dto = AppointmentDTO(
            appointment_id=7,
            client_name="Alice Müller",
            email="alice@example.com",
            slot_count=2,
        )
        print(dto)
        # AppointmentDTO(appointment_id=7, client_name='Alice Müller', email='***', slot_count=2)
        # Note: client_name is NOT masked because UserPII specified ONLY email/phone_number.
        ```
    """

    model_config = ConfigDict(frozen=True)

    def __repr__(self) -> str:
        """Build a PII-safe string representation of this DTO instance.

        Iterates over all instance fields and passes each ``(field_name,
        value)`` pair through :func:`~codex_core.core.pii.mask_value`
        before formatting.  The result follows the standard Pydantic
        repr convention ``ClassName(field=value, ...)``.

        Returns:
            A human-readable string suitable for log output with all
            PII fields replaced by ``"***"``.
        """
        cls_name = type(self).__name__
        pairs: list[str] = [
            f"{field_name}={mask_value(field_name, value)!r}"
            for field_name, value in self.__dict__.items()
        ]
        return f"{cls_name}({', '.join(pairs)})"

    __str__ = __repr__
Functions
__repr__()

Build a PII-safe string representation of this DTO instance.

Iterates over all instance fields and passes each (field_name, value) pair through :func:~codex_core.core.pii.mask_value before formatting. The result follows the standard Pydantic repr convention ClassName(field=value, ...).

Returns:

Type  Description
str   A human-readable string suitable for log output with all PII fields replaced by "***".


PII Protection

Utilities for identifying and masking sensitive data.

pii

PII field registry and masking utilities for GDPR-safe logging.

This module is the single source of truth for Personally Identifiable Information (PII) detection within codex_core. It is consumed by :class:~codex_core.core.base_dto.BaseDTO to scrub sensitive fields before any string serialization reaches a log sink.

The detection strategy supports two modes:

  1. Declarative Registry (Recommended): Subclass :class:PIIRegistry to list exact field names from your database or domain model. This disables heuristic "magic" matching and uses only your specified fields.

  2. Heuristic Fallback: If no registry is defined, the module uses keyword-based substring matching (phone, email, name, etc.).

Example
from codex_core.core.pii import PIIRegistry

class DatabasePII(PIIRegistry):
    # Match these exact column names from your DB
    user_phone: str
    customer_email: str

Attributes

MASK = '***' module-attribute

Redaction placeholder substituted for PII field values in logs.

PII_KEYWORDS = frozenset({'phone', 'email', 'name', 'address', 'note', 'comment', 'phone_number', 'email_address'}) module-attribute

Immutable set of substrings used for heuristic PII detection.
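
Because the fallback is a substring test, it can also flag fields that merely contain a keyword. The sketch below re-implements the documented check in isolation to illustrate this; matches_heuristic is a hypothetical helper written for this example, not a codex_core function, and the field names are made up.

```python
PII_KEYWORDS = frozenset({
    "phone", "email", "name", "address",
    "note", "comment", "phone_number", "email_address",
})


def matches_heuristic(field_name: str) -> bool:
    """Case-insensitive substring check, mirroring the documented fallback."""
    lowered = field_name.lower()
    return any(keyword in lowered for keyword in PII_KEYWORDS)


checks = {
    name: matches_heuristic(name)
    for name in ["Email", "client_name", "filename", "footnote_id", "slot_count"]
}
for name, flagged in checks.items():
    print(f"{name}: {flagged}")
```

Here filename and footnote_id are flagged only because they contain "name" and "note". Over-masking of this kind is the trade-off that the declarative registry is meant to avoid.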

Classes

PIIRegistry

Base class for declaring an explicit PII field registry.

Subclass this and list your sensitive field names as class attributes or type annotations. The registry automatically collects these names.

When any subclass of PIIRegistry is defined, the global masking logic switches from "substring matching" to "exact match" against these fields.

Source code in src/codex_core/core/pii.py
class PIIRegistry:
    """Base class for declaring an explicit PII field registry.

    Subclass this and list your sensitive field names as class attributes
    or type annotations. The registry automatically collects these names.

    When any subclass of PIIRegistry is defined, the global masking logic
    switches from "substring matching" to "exact match" against these fields.
    """
    _registered_fields: ClassVar[frozenset[str]] = frozenset()

    def __init_subclass__(cls, **kwargs: Any) -> None:
        """Collect field names from annotations and class attributes."""
        super().__init_subclass__(**kwargs)

        # Collect from annotations: email: str -> "email" (normalized to lowercase)
        fields = {k.lower() for k in cls.__annotations__.keys()}

        # Collect from class attributes: phone = True -> "phone" (normalized to lowercase)
        fields.update(
            name.lower() for name, value in cls.__dict__.items()
            if not name.startswith("_") and not callable(value)
        )

        # Merge into global singleton registry
        new_fields = set(PIIRegistry._registered_fields)
        new_fields.update(fields)
        PIIRegistry._registered_fields = frozenset(new_fields)

        logger.debug(f"PII Registry updated. Fields: {list(PIIRegistry._registered_fields)}")
Functions
__init_subclass__(**kwargs)

Collect field names from annotations and class attributes.
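
The collection step can be tried in isolation with a small replica. MiniRegistry and the two subclasses below are hypothetical names used only for this sketch; the logic follows the listing above but stores the result in a public attribute to keep the example short.

```python
from typing import Any


class MiniRegistry:
    """Hypothetical stand-in that mirrors the collection logic shown above."""

    registered = frozenset()

    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        # Annotation-only declarations: `email: str` -> "email"
        fields = {name.lower() for name in getattr(cls, "__annotations__", {})}
        # Plain class attributes: `Passport_No = True` -> "passport_no"
        fields.update(
            name.lower()
            for name, value in vars(cls).items()
            if not name.startswith("_") and not callable(value)
        )
        # Merge into the single shared set held on the base class
        MiniRegistry.registered = MiniRegistry.registered | fields


class ContactPII(MiniRegistry):
    email: str
    phone_number: str


class DocumentPII(MiniRegistry):
    Passport_No = True


print(sorted(MiniRegistry.registered))
```

Both subclasses contribute to the same set, and names are lowercased on the way in. As in the listing above, the merge happens at class-definition time, so simply importing a module that defines a registry subclass is enough to change the shared state.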


Functions

is_pii_field(field_name)

Determine whether a field name should be masked.

The check follows this priority:

  1. If a :class:PIIRegistry is defined, it checks for an exact match (case-insensitive) against the registered fields.

  2. Otherwise, it performs a case-insensitive substring search against :data:PII_KEYWORDS.

Parameters:

Name        Type  Description                  Default
field_name  str   The field name to evaluate.  required

Returns:

Type  Description
bool  True if the field matches the PII criteria, False otherwise.

Source code in src/codex_core/core/pii.py
def is_pii_field(field_name: str) -> bool:
    """Determine whether a field name should be masked.

    The check follows this priority:
    1. If a :class:`PIIRegistry` is defined, it checks for an exact match
       (case-insensitive) against the registered fields.
    2. Otherwise, it performs a case-insensitive substring search against
       :data:`PII_KEYWORDS`.

    Args:
        field_name: The field name to evaluate.

    Returns:
        ``True`` if the field matches the PII criteria, ``False`` otherwise.
    """
    name_lower = field_name.lower()

    # Priority 1: Explicit Registry (Exact match)
    if PIIRegistry._registered_fields:
        return name_lower in PIIRegistry._registered_fields

    # Priority 2: Heuristic Fallback (Substring match)
    return any(kw in name_lower for kw in PII_KEYWORDS)
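
A practical consequence of this priority is that a non-empty registry turns the heuristic off entirely, so fields that the fallback would have caught are no longer masked unless they are registered by their exact name. The sketch below shows this with a pure-function variant; is_pii and its registered parameter are assumptions made to avoid global state in the example (the library reads PIIRegistry._registered_fields instead).

```python
from typing import FrozenSet

KEYWORDS = frozenset({"phone", "email", "name", "address", "note", "comment"})


def is_pii(field_name: str, registered: FrozenSet[str]) -> bool:
    """Hypothetical pure-function version of the documented priority."""
    lowered = field_name.lower()
    if registered:
        # Any registry entry switches to exact matching only.
        return lowered in registered
    # Heuristic fallback: substring search.
    return any(keyword in lowered for keyword in KEYWORDS)


no_registry: FrozenSet[str] = frozenset()
with_registry: FrozenSet[str] = frozenset({"email"})

results = {
    "heuristic customer_email": is_pii("customer_email", no_registry),
    "registry customer_email": is_pii("customer_email", with_registry),
    "registry Email": is_pii("Email", with_registry),
    "registry client_name": is_pii("client_name", with_registry),
}
for label, flagged in results.items():
    print(f"{label}: {flagged}")
```

With only email registered, customer_email and client_name pass through unmasked, so a registry should list every sensitive field name that appears in the DTOs being logged.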

mask_value(field_name, value)

Return a redacted representation of value when the field is PII.

Applies :data:MASK as a flat replacement for scalar PII fields. For non-PII fields the function recurses into dict and list containers so that nested structures are also scrubbed.

Parameters:

Name        Type  Description                                   Default
field_name  str   Name of the field being evaluated.            required
value       Any   The value to inspect and potentially redact.  required

Returns:

Type  Description
Any   :data:MASK for PII fields; a recursively masked dict or list for containers; or the original value unchanged.

Source code in src/codex_core/core/pii.py
def mask_value(field_name: str, value: Any) -> Any:
    """Return a redacted representation of *value* when the field is PII.

    Applies :data:`MASK` as a flat replacement for scalar PII fields.
    For non-PII fields the function recurses into ``dict`` and ``list``
    containers so that nested structures are also scrubbed.

    Args:
        field_name: Name of the field being evaluated.
        value: The value to inspect and potentially redact.

    Returns:
        :data:`MASK` for PII fields; a recursively masked ``dict`` or
        ``list`` for containers; or the original *value* unchanged.
    """
    if is_pii_field(field_name):
        return MASK

    if isinstance(value, dict):
        return {k: mask_value(k, v) for k, v in value.items()}

    if isinstance(value, list):
        return [mask_value(field_name, item) for item in value]

    return value
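
To see the recursion at work without installing the library, the following sketch replicates the function together with the heuristic check. The names mask and is_pii and the sample payload are hypothetical; the control flow follows the listing above.

```python
from typing import Any

MASK = "***"
KEYWORDS = frozenset({"phone", "email", "name", "address", "note", "comment"})


def is_pii(field_name: str) -> bool:
    """Heuristic-mode check only; the registry mode is omitted here."""
    lowered = field_name.lower()
    return any(keyword in lowered for keyword in KEYWORDS)


def mask(field_name: str, value: Any) -> Any:
    """Replica of the masking logic: flat mask, then dict/list recursion."""
    if is_pii(field_name):
        return MASK
    if isinstance(value, dict):
        # Each key becomes the field name for its own value.
        return {key: mask(key, item) for key, item in value.items()}
    if isinstance(value, list):
        # List items inherit the field name of the list itself.
        return [mask(field_name, item) for item in value]
    return value


payload = {
    "email": "alice@example.com",
    "items": [{"phone": "555-0100", "qty": 2}],
}
masked = mask("payload", payload)
print(masked)

# Tuples are neither dict nor list, so they are returned unchanged.
pairs = ({"email": "alice@example.com"},)
untouched = mask("pairs", pairs)
print(untouched is pairs)
```

Two details are visible here. Only dict and list are traversed, so PII held in a tuple, set, or nested object is returned as-is. Dictionary keys are passed on as field names, so a non-string key would fail at the .lower() call in the name check.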