Data Flow
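How data flows through the CyxWiz platform during job creation, scheduling, training, and completion, along with node heartbeats, metrics collection, caching, and error handling.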
Job Lifecycle Overview
1. Creation: User builds model, submits job
2. Scheduling: Server matches job to node
3. Execution: Node trains model, reports progress
4. Completion: Payment released, model delivered
1. Job Creation
User Action: Creates model in Node Editor
                |
                v
+--------------------------------+
|         CyxWiz Engine          |
|                                |
| 1. Validate graph topology     |
| 2. Generate model definition   |
| 3. Package with dataset URI    |
| 4. Calculate payment estimate  |
+--------------------------------+
                |
                v
SubmitJobRequest
{
  job_type: TRAINING,
  model_definition: {...},
  dataset_uri: "ipfs://...",
  hyperparameters: {...},
  payment_amount: 100.0,
  required_device: GPU
}
2. Job Acceptance & Scheduling
Central Server Processing
+--------------------------------+
|         Central Server         |
|                                |
| 1. Validate request            |
| 2. Check user balance          |
| 3. Create Solana escrow        |
| 4. Insert into job queue       |
| 5. Return job_id               |
+--------------------------------+
                |
                v
SubmitJobResponse
{
  job_id: "uuid-...",
  status: QUEUED,
  estimated_start: 1702000000
}
Node Matching
+--------------------------------+
|          Node Matcher          |
|                                |
| Score = f(                     |
|   device_match,                |
|   memory_available,            |
|   reputation,                  |
|   location,                    |
|   queue_length                 |
| )                              |
+--------------------------------+
Loop every 5 seconds:
1. Get pending jobs
2. Get available nodes
3. Match by requirements
4. Select best node
5. Send assignment
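As a rough illustration of the scoring and selection steps, the sketch below treats device type and available memory as hard requirements and folds reputation, location, and queue length into a weighted score. The source does not define f, so the weights, the `Job` and `Node` types, and fields such as `required_memory_gb` and `region` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Job:
    job_id: str
    required_device: str        # e.g. "GPU", as in SubmitJobRequest
    required_memory_gb: float   # hypothetical field
    region: str                 # hypothetical field


@dataclass(frozen=True)
class Node:
    node_id: str
    device: str
    memory_available_gb: float
    reputation: float           # assumed to lie in [0, 1]
    region: str
    queue_length: int


# Hypothetical weights; the real scoring function is not documented.
W_REPUTATION, W_LOCATION, W_QUEUE = 0.5, 0.2, 0.3


def score(job: Job, node: Node) -> Optional[float]:
    """Return a score for the pairing, or None if the node cannot run the job."""
    if node.device != job.required_device:
        return None
    if node.memory_available_gb < job.required_memory_gb:
        return None
    location = 1.0 if node.region == job.region else 0.0
    queue = 1.0 / (1 + node.queue_length)  # shorter queues score higher
    return W_REPUTATION * node.reputation + W_LOCATION * location + W_QUEUE * queue


def select_best_node(job: Job, nodes: Iterable[Node]) -> Optional[Node]:
    """Pick the highest-scoring eligible node, or None if none qualifies."""
    best_node, best_score = None, None
    for node in nodes:
        s = score(job, node)
        if s is not None and (best_score is None or s > best_score):
            best_node, best_score = node, s
    return best_node
```

A scheduler loop would call `select_best_node` once per pending job on each 5-second tick and leave the job queued when it returns `None`.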
3. Training Execution
+--------------------------------+
|          Job Executor          |
|                                |
| For each epoch:                |
|                                |
| 1. Load batch from DataLoader  |
| 2. Forward pass                |
| 3. Compute loss                |
| 4. Backward pass               |
| 5. Optimizer step              |
| 6. Report progress             |
+--------------------------------+
                |
         Every N batches
                |
                v
ReportProgressRequest
{
  job_id,
  progress: 0.35,
  metrics: {
    loss: 0.342,
    accuracy: 0.876
  },
  current_epoch: 7
}
Real-Time Progress Streaming
Central Server            Engine
|                              |
|   StreamJobUpdates(job_id)   |
|<-----------------------------|
|                              |
|       JobUpdateStream        |
|----------------------------->|
|    { progress, metrics }     |
|                              |
|       JobUpdateStream        |
|----------------------------->|
|    { progress, metrics }     |
|                              |
...                          ...
4. Job Completion & Payment
Completion Report
+--------------------------------+
|          Server Node           |
|                                |
| 1. Save model weights          |
| 2. Upload to storage           |
| 3. Calculate final metrics     |
| 4. Sign completion proof       |
+--------------------------------+
                |
                v
ReportCompletionRequest
{
  job_id,
  final_status: SUCCESS,
  model_weights_uri: "ipfs://...",
  model_weights_hash: "sha256:...",
  final_metrics: {...},
  total_compute_time: 3600000,
  signature: "..."
}
Payment Release
+--------------------------------+
|       Solana Blockchain        |
|                                |
| Transaction 1:                 |
| - From: Escrow Account         |
| - To: Node Wallet              |
| - Amount: 90 CYXWIZ (90%)      |
|                                |
| Transaction 2:                 |
| - From: Escrow Account         |
| - To: Platform Fee Account     |
| - Amount: 10 CYXWIZ (10%)      |
+--------------------------------+
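The 90/10 split shown above can be sketched as a small helper. This is only an arithmetic illustration: `split_payment` and `NODE_SHARE` are hypothetical names, no blockchain call is made, and no rounding is applied because the token's decimal precision is not stated in this document.

```python
from decimal import Decimal
from typing import Tuple

# Share of the escrow paid to the node, per the diagram (90%).
NODE_SHARE = Decimal("0.90")


def split_payment(escrow_amount: Decimal) -> Tuple[Decimal, Decimal]:
    """Return (node_amount, platform_fee) for a released escrow."""
    node_amount = escrow_amount * NODE_SHARE
    # Taking the fee as the remainder keeps the two transfers summing
    # exactly to the escrowed amount.
    platform_fee = escrow_amount - node_amount
    return node_amount, platform_fee
```

For the `payment_amount` of 100.0 used in the `SubmitJobRequest` example, this yields 90 for the node wallet and 10 for the platform fee account, matching the two transactions in the diagram.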
Node Heartbeat
Server Node             Central Server                 Database
|                              |                              |
|       HeartbeatRequest       |                              |
|----------------------------->|                              |
|     { node_id, status }      |                              |
|                              |                              |
|                              |         UPDATE nodes         |
|                              |    SET last_seen = NOW()     |
|                              |----------------------------->|
|                              |                              |
|      HeartbeatResponse       |                              |
|<-----------------------------|                              |
|     { keep_alive: true }     |                              |
|                              |                              |
10s                           10s                           10s
|                              |                              |
...                           ...                           ...
Metrics Collection
Hardware Metrics
Collected every 5 seconds:
- CPU: Usage per core, temperature, frequency
- GPU: Utilization, memory, temperature, power
- Memory: RAM used/total, swap used/total
- Network: Bytes sent/received, connections
Training Metrics
- Each Batch: batch_loss, batch_accuracy, batch_time_ms, learning_rate
- Each Epoch: epoch_loss, validation_loss, validation_accuracy
- Training End: total_time_ms, final_loss, final_accuracy, best_epoch
Redis Cache Layers
| Layer | Key Pattern | TTL | Purpose |
|---|---|---|---|
| Request Cache | job:{job_id}:status | 60s | Reduce DB queries |
| Node Cache | node:{node_id}:info | 30s | Fast node lookups |
| Session Cache | session:{token} | 1h | Auth validation |
| Metrics Buffer | metrics:{node_id}:latest | - | Dashboard updates |
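To make the key patterns and TTLs in the table concrete, here is a minimal in-memory stand-in written in plain Python. It is not a Redis client: `CACHE_LAYERS`, `cache_key`, and `TTLCache` are hypothetical names, and the "-" TTL of the Metrics Buffer is assumed to mean "no expiry".

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

# (key pattern, TTL in seconds) per layer, copied from the table.
# None is an assumed reading of the "-" TTL: the entry never expires.
CACHE_LAYERS: Dict[str, Tuple[str, Optional[int]]] = {
    "request": ("job:{job_id}:status", 60),
    "node": ("node:{node_id}:info", 30),
    "session": ("session:{token}", 3600),
    "metrics": ("metrics:{node_id}:latest", None),
}


def cache_key(layer: str, **ids: str) -> str:
    """Fill in the key pattern for a layer, e.g. job:abc:status."""
    pattern, _ttl = CACHE_LAYERS[layer]
    return pattern.format(**ids)


class TTLCache:
    """Tiny dictionary-backed cache that expires entries lazily on read."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[Any, Optional[float]]] = {}

    def set(self, key: str, value: Any, ttl: Optional[float]) -> None:
        expires_at = None if ttl is None else self._clock() + ttl
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # expired: behave like a cache miss
            return None
        return value
```

On a miss the caller would fall back to the database and repopulate the entry with the layer's TTL.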
Error Handling
Network Retry Logic
- Attempt 1: Wait 1s
- Attempt 2: Wait 2s
- Attempt 3: Wait 4s
- Attempt 4: Wait 8s
- Attempt 5: Wait 16s
If all five attempts fail: log the error, update status, notify the user, and queue the request for later.
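The schedule above is an exponential backoff that doubles from 1s to 16s. A minimal sketch follows, assuming each listed wait precedes a retry after an initial failed call; the source does not say whether the wait comes before or after an attempt. `call_with_retry`, `backoff_delays`, and `RetriesExhausted` are hypothetical names.

```python
import time
from typing import Callable, List, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


class RetriesExhausted(Exception):
    """Raised when the initial call and every retry have failed."""


def backoff_delays(retries: int = 5, base_delay: float = 1.0) -> List[float]:
    """Waits in seconds before each retry: 1, 2, 4, 8, 16 by default."""
    return [base_delay * 2 ** i for i in range(retries)]


def call_with_retry(
    operation: Callable[[], T],
    retries: int = 5,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, retrying with exponential backoff on network errors."""
    last_error: Optional[BaseException] = None
    # A leading delay of 0 represents the initial, undelayed call (assumption).
    for delay in [0.0] + backoff_delays(retries, base_delay):
        if delay:
            sleep(delay)
        try:
            return operation()
        except retry_on as err:
            last_error = err
    # The caller is expected to log, update status, notify, and requeue here.
    raise RetriesExhausted(f"gave up after {retries} retries") from last_error
```

Injecting `sleep` keeps the function testable without real waiting. Adding random jitter to each delay is a common refinement, though the schedule in this document does not mention it.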
Training Errors
- Catch exception in training
- Save current state/checkpoint
- Report failure with error message
- Include last checkpoint URI
- Central server determines partial payment
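The node-side part of this sequence could look roughly like the sketch below. `FailureReport`, `run_with_failure_report`, and the `"FAILED"` status value are hypothetical; the document shows only `SUCCESS` as a `final_status`, and the partial-payment decision stays with the central server.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class FailureReport:
    job_id: str
    final_status: str                   # "FAILED" is an assumed enum value
    error_message: str
    last_checkpoint_uri: Optional[str]  # None if no checkpoint could be saved


def run_with_failure_report(
    job_id: str,
    train: Callable[[], None],
    save_checkpoint: Callable[[], str],
) -> Optional[FailureReport]:
    """Run training; on error, checkpoint and return a report to send upstream.

    Returns None when training finishes without raising.
    """
    try:
        train()
    except Exception as err:
        try:
            checkpoint_uri: Optional[str] = save_checkpoint()
        except Exception:
            # Saving state can fail too (for example, a full disk); still report.
            checkpoint_uri = None
        return FailureReport(
            job_id=job_id,
            final_status="FAILED",
            error_message=f"{type(err).__name__}: {err}",
            last_checkpoint_uri=checkpoint_uri,
        )
    return None
```

Here `train` and `save_checkpoint` are caller-supplied callables, so the sketch stays independent of any particular training framework or storage backend.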