Tensor API Reference
The Tensor class is the fundamental data structure in cyxwiz-backend, providing GPU-accelerated multi-dimensional array operations powered by ArrayFire.
Data Types
| Type | Description |
|---|---|
| Float16 | Half precision |
| Float32 | Single precision (default) |
| Float64 | Double precision |
| Int8/Int16/Int32/Int64 | Integer types |
| Bool | Boolean |
Factory Functions
// Zeros
Tensor Zeros(const std::vector<int>& shape, DataType dtype = DataType::Float32);
// Ones
Tensor Ones(const std::vector<int>& shape, DataType dtype = DataType::Float32);
// Random uniform [0, 1)
Tensor Rand(const std::vector<int>& shape, DataType dtype = DataType::Float32);
// Random normal (mean=0, std=1)
Tensor Randn(const std::vector<int>& shape, DataType dtype = DataType::Float32);
// Range
Tensor Arange(float start, float end, float step = 1.0f);
// Linspace
Tensor Linspace(float start, float end, int num);
// Identity matrix
Tensor Eye(int n, DataType dtype = DataType::Float32);
// Full with value
Tensor Full(const std::vector<int>& shape, float value, DataType dtype = DataType::Float32);
Basic Operations
#include <cyxwiz/tensor.h>
using namespace cyxwiz;
// Create tensors
Tensor a = Randn({100, 50});
Tensor b = Randn({50, 30});
// Matrix multiplication
Tensor c = a.MatMul(b); // Shape: (100, 30)
// Element-wise operations
Tensor d = a * 2.0f + 1.0f;
// Reductions
Tensor sum = a.Sum(); // Scalar
Tensor col_sum = a.Sum(0); // Sum along dim 0, shape: (50,)
Tensor row_sum = a.Sum(1); // Sum along dim 1, shape: (100,)
// Statistics
Tensor mean = a.Mean();
Tensor max_val = a.Max();
Tensor min_val = a.Min();
Shape Manipulation
Tensor t = Randn({2, 3, 4, 5});
// Reshape
Tensor reshaped = t.Reshape({6, 20});
// Flatten
Tensor flat = t.Flatten(); // Shape: (120,)
// Transpose
Tensor transposed = t.Transpose(-2, -1); // Swap last two dims
// Permute
Tensor permuted = t.Permute({0, 2, 1, 3}); // Reorder dimensions
// Squeeze/Unsqueeze
Tensor squeezed = Randn({1, 10, 1}).Squeeze(); // Shape: (10,)
Tensor unsqueezed = Randn({10}).Unsqueeze(0); // Shape: (1, 10)
Indexing and Slicing
Tensor t = Randn({10, 20, 30});
// Single index
Tensor first = t[0]; // Shape: (20, 30)
// Slice
Tensor sliced = t.Slice(0, 2, 5); // t[2:5, :, :], shape: (3, 20, 30)
// Multiple slices
Tensor multi = t.Slice(0, 0, 5).Slice(1, 10, 15); // t[0:5, 10:15, :]
Device Management
// Create on default device
Tensor cpu_tensor = Randn({1000, 1000});
// Move to GPU
Tensor gpu_tensor = cpu_tensor.ToGPU();
// Move back to CPU
Tensor back_to_cpu = gpu_tensor.ToCPU();
// Check device
if (gpu_tensor.IsOnGPU()) {
std::cout << "Tensor is on GPU" << std::endl;
}
// Explicit device
Tensor cuda_tensor = cpu_tensor.ToDevice(DeviceType::CUDA);
Gradient Computation
// Create tensor with gradient tracking
Tensor x = Randn({10, 5});
x.RequiresGrad(true);
// Forward pass
Tensor y = x.MatMul(Randn({5, 3}));
Tensor loss = y.Sum();
// Backward pass
loss.Backward();
// Access gradient
Tensor grad = x.Grad();
// Zero gradients
x.ZeroGrad();
// Detach from computation graph
Tensor detached = y.Detach();
Broadcasting Rules
CyxWiz follows NumPy/PyTorch broadcasting rules:
- If the tensors have different numbers of dimensions, 1s are prepended to the smaller tensor's shape until both ranks match
- Two dimensions are compatible if they are equal or one of them is 1
- The output size along each dimension is the maximum of the two input sizes along that dimension
Tensor a = Randn({3, 4, 5});
Tensor b = Randn({4, 5}); // Broadcasts to (1, 4, 5)
Tensor c = a + b; // Result shape: (3, 4, 5)
Tensor d = Randn({3, 1, 5});
Tensor e = Randn({1, 4, 1});
Tensor f = d * e; // Result shape: (3, 4, 5)
Python Bindings
import pycyxwiz as cyx

# Create tensor
t = cyx.Tensor([1.0, 2.0, 3.0, 4.0], [2, 2])

# Factory functions
zeros = cyx.zeros([10, 10])
ones = cyx.ones([5, 5])
rand = cyx.rand([100, 50])
randn = cyx.randn([64, 128])

# Operations
result = t + t * 2.0
matmul = cyx.matmul(t, t.T)

# Device
gpu_t = t.to_gpu()
cpu_t = gpu_t.to_cpu()

# NumPy conversion
import numpy as np
np_array = t.numpy()
from_np = cyx.from_numpy(np_array)
Performance Tips
- Batch operations: Operate on batches rather than individual samples
- Minimize transfers: Keep data on GPU when possible
- Use in-place: Use +=, -= etc. when the original is no longer needed
- Contiguous memory: Ensure tensors are contiguous before heavy computation
- Appropriate dtype: Use Float32 for most cases, Float16 for large models
API Summary
Shape & Dimensions
- Shape()
- NumDimensions()
- NumElements()
- Dim(int index)
Math Operations
- Abs(), Sqrt(), Exp(), Log()
- Pow(float), Sin(), Cos(), Tanh()
- MatMul(), Dot()
Reductions
- Sum(dim, keepdim)
- Mean(dim, keepdim)
- Max(dim, keepdim)
- Min(dim, keepdim)
- ArgMax(), ArgMin()
Reshape Operations
- Reshape(), Flatten()
- Squeeze(), Unsqueeze()
- Transpose(), Permute()