Architecture
This document explains how Bag works internally, removing the “magic” and showing the underlying mechanisms.
Core Data Model
A Bag is a container of BagNode objects. Each node has a label, a value, and optional attributes.
classDiagram
class Bag {
-List~BagNode~ _nodes
-Dict _node_index
-BagNode _parent_node
+__getitem__(path)
+__setitem__(path, value)
+set_item(path, value, **attrs)
+subscribe(name, callback)
}
class BagNode {
-str _label
-Any _value
-Dict _attr
-Bag _parent
+label: str
+value: Any
+attr: Dict
+parent: Bag
}
Bag "1" *-- "*" BagNode : contains
BagNode "1" --> "0..1" Bag : value can be
Node Structure
Each BagNode holds:
label: The node’s name (key)
value: Any Python object, including another Bag for nesting
attr: Dictionary of metadata attributes
graph LR
subgraph BagNode
L[label: 'user']
V[value: 'Alice']
A[attr: role='admin', active=True]
end
Path Resolution
When you access bag['a.b.c'], the path is parsed and resolved step by step.
flowchart LR
A["bag['a.b.c']"] --> B[Split path: a, b, c]
B --> C[Get node 'a']
C --> D{a.value is Bag?}
D -->|Yes| E[Get node 'b' from a.value]
D -->|No| F[Return None]
E --> G{b.value is Bag?}
G -->|Yes| H[Get node 'c' from b.value]
G -->|No| I[Return None]
H --> J[Return c.value]
Path Syntax
Syntax |
Meaning |
Example |
|---|---|---|
|
Nested path |
|
|
Index access |
|
|
Attribute access |
|
|
Root attribute |
|
Attribute Access
Attributes are accessed using the ? syntax in paths.
flowchart TB
A["bag['user?role']"] --> B[Parse: key='user', attr='role']
B --> C[Get node 'user']
C --> D[Return node.attr['role']]
E["bag.set_item('user', 'Alice', role='admin')"] --> F[Create/update node]
F --> G[Set node.value = 'Alice']
F --> H[Set node.attr['role'] = 'admin']
Resolvers
Resolvers provide lazy loading. When a node’s value is a Resolver, accessing it triggers resolution.
sequenceDiagram
participant User
participant Bag
participant Node
participant Resolver
participant External as External Source
User->>Bag: bag['data']
Bag->>Node: get value
Node->>Node: value is Resolver?
Note over Node: Yes, it's a Resolver
Node->>Resolver: resolve()
Resolver->>External: fetch data
External-->>Resolver: return data
Resolver-->>Node: resolved value
Node-->>Bag: return value
Bag-->>User: data
Resolver Types
classDiagram
class BagResolver {
<<abstract>>
+load() Any
+async_load() Any
+cache_time: int | float | bool
}
class BagCbResolver {
-Callable _callback
+load() Any
}
class UrlResolver {
-str _url
+load() Any
}
class DirectoryResolver {
-Path _path
+load() Bag
}
BagResolver <|-- BagCbResolver
BagResolver <|-- UrlResolver
BagResolver <|-- DirectoryResolver
Caching
Resolvers support caching to avoid repeated expensive operations.
flowchart TB
A[Access resolver] --> B{Cache valid?}
B -->|Yes| C[Return cached value]
B -->|No| D[Call load/async_load]
D --> E[Store in cache with timestamp]
E --> F[Return value]
Subscriptions
Subscriptions enable reactive programming by notifying callbacks when nodes change.
sequenceDiagram
participant User
participant Bag
participant Subscription
participant Callback
User->>Bag: subscribe('watcher', update=on_change)
Bag->>Subscription: create subscription
User->>Bag: bag['name'] = 'Alice'
Bag->>Bag: create/update node
Bag->>Subscription: notify(node, 'ins')
Subscription->>Callback: on_change(node, 'ins')
Event Types
Event |
Trigger |
|---|---|
|
New node inserted |
|
Node value changed |
|
Node attributes changed |
|
Node deleted |
flowchart LR
subgraph Events
INS[ins: insert]
UPD_V[upd_value: value change]
UPD_A[upd_attrs: attrs change]
DEL[del: delete]
end
subgraph Subscription
INSERT[insert: ins only]
UPDATE[update: upd_value + upd_attrs]
DELETE[delete: del only]
ANY[any: ins + upd_* + del]
end
INS --> INSERT
INS --> ANY
UPD_V --> UPDATE
UPD_V --> ANY
UPD_A --> UPDATE
UPD_A --> ANY
DEL --> DELETE
DEL --> ANY
Serialization
Bag supports multiple serialization formats.
flowchart LR
subgraph Bag
B[Bag object]
end
subgraph Formats
XML[XML]
JSON[TYTX JSON]
MSGPACK[TYTX MsgPack]
end
B -->|to_xml| XML
XML -->|from_xml| B
B -->|to_tytx json| JSON
JSON -->|from_tytx json| B
B -->|to_tytx msgpack| MSGPACK
MSGPACK -->|from_tytx msgpack| B
TYTX Format
TYTX (Type-preserving Transfer) maintains Python types across serialization.
flowchart TB
subgraph "Python Object"
PY[value: 42, type: int]
end
subgraph "TYTX Encoding"
TX["{'_t': 'I', '_v': 42}"]
end
subgraph "After Deserialization"
PY2[value: 42, type: int]
end
PY -->|encode| TX
TX -->|decode| PY2
Memory Layout
Understanding memory usage helps optimize for large Bags.
graph TB
subgraph "Bag (48 bytes shallow)"
BL[_nodes: List]
BI[_node_index: Dict]
BP[_parent_node: ref]
end
subgraph "BagNode (~290 bytes each)"
NL[_label: str]
NV[_value: Any]
NA[_attr: Dict]
NP[_parent: ref]
end
BL --> NL
Memory Scaling
Items |
Dict |
Bag |
Per-item |
|---|---|---|---|
100 |
7.7 KB |
36 KB |
~290 bytes |
10,000 |
975 KB |
3.7 MB |
~289 bytes |
100,000 |
11.4 MB |
38.8 MB |
~288 bytes |
Thread Safety
Bag is not thread-safe by default. For concurrent access:
flowchart TB
subgraph "Thread-Safe Pattern"
L[Lock]
B[Bag]
T1[Thread 1]
T2[Thread 2]
end
T1 -->|acquire| L
L -->|protected| B
T2 -->|wait| L
Recommended approach:
Use a threading.Lock for write operations
Or use separate Bag instances per thread
Or use async with proper await patterns
Complete Data Flow
flowchart TB
subgraph Input
I1[Direct assignment]
I2[From dict]
I3[From XML]
end
subgraph "Bag Core"
B[Bag]
N[BagNodes]
S[Subscriptions]
end
subgraph Processing
R[Resolvers]
V[Validation]
end
subgraph Output
O1[Path access]
O2[To XML]
O3[To TYTX]
O4[Iteration]
end
I1 --> B
I2 --> B
I3 --> B
I4 --> V
V --> B
B --> N
N --> S
N --> R
B --> O1
B --> O2
B --> O3
B --> O4
Performance Characteristics
Operation |
Time Complexity |
Notes |
|---|---|---|
Direct access |
O(1) |
Hash lookup |
Path access |
O(n) |
n = path depth |
Index access |
O(1) |
List index |
Insert |
O(1) amortized |
Creates BagNode |
Delete |
O(n) |
n = number of nodes |
Iteration |
O(n) |
n = number of nodes |
Subscribe |
O(1) |
Register callback |
Event dispatch |
O(s) |
s = number of subscribers |