Tensor API Reference
The Tensor class is the fundamental data structure in cyxwiz-backend, providing GPU-accelerated multi-dimensional array operations powered by ArrayFire.
Data Types
| Type | Description |
|---|---|
| Float16 | Half precision |
| Float32 | Single precision (default) |
| Float64 | Double precision |
| Int8/Int16/Int32/Int64 | Integer types |
| Bool | Boolean |
Factory Functions
// Zeros
Tensor Zeros(const std::vector<int>& shape, DataType dtype = DataType::Float32);
// Ones
Tensor Ones(const std::vector<int>& shape, DataType dtype = DataType::Float32);
// Random uniform [0, 1)
Tensor Rand(const std::vector<int>& shape, DataType dtype = DataType::Float32);
// Random normal (mean=0, std=1)
Tensor Randn(const std::vector<int>& shape, DataType dtype = DataType::Float32);
// Range
Tensor Arange(float start, float end, float step = 1.0f);
// Linspace
Tensor Linspace(float start, float end, int num);
// Identity matrix
Tensor Eye(int n, DataType dtype = DataType::Float32);
// Full with value
Tensor Full(const std::vector<int>& shape, float value, DataType dtype = DataType::Float32);
Basic Operations
#include <cyxwiz/tensor.h>
using namespace cyxwiz;
// Create tensors
Tensor a = Randn({100, 50});
Tensor b = Randn({50, 30});
// Matrix multiplication
Tensor c = a.MatMul(b); // Shape: (100, 30)
// Element-wise operations
Tensor d = a * 2.0f + 1.0f;
// Reductions
Tensor sum = a.Sum(); // Scalar
Tensor col_sum = a.Sum(0); // Sum along dim 0, shape: (50,)
Tensor row_sum = a.Sum(1); // Sum along dim 1, shape: (100,)
// Statistics
Tensor mean = a.Mean();
Tensor max_val = a.Max();
Tensor min_val = a.Min();
Shape Manipulation
Tensor t = Randn({2, 3, 4, 5});
// Reshape
Tensor reshaped = t.Reshape({6, 20});
// Flatten
Tensor flat = t.Flatten(); // Shape: (120,)
// Transpose
Tensor transposed = t.Transpose(-2, -1); // Swap last two dims
// Permute
Tensor permuted = t.Permute({0, 2, 1, 3}); // Reorder dimensions
// Squeeze/Unsqueeze
Tensor squeezed = Randn({1, 10, 1}).Squeeze(); // Shape: (10,)
Tensor unsqueezed = Randn({10}).Unsqueeze(0); // Shape: (1, 10)
Indexing and Slicing
Tensor t = Randn({10, 20, 30});
// Single index
Tensor first = t[0]; // Shape: (20, 30)
// Slice
Tensor sliced = t.Slice(0, 2, 5); // t[2:5, :, :], shape: (3, 20, 30)
// Multiple slices
Tensor multi = t.Slice(0, 0, 5).Slice(1, 10, 15); // t[0:5, 10:15, :]
Device Management
// Create on default device
Tensor cpu_tensor = Randn({1000, 1000});
// Move to GPU
Tensor gpu_tensor = cpu_tensor.ToGPU();
// Move back to CPU
Tensor back_to_cpu = gpu_tensor.ToCPU();
// Check device
if (gpu_tensor.IsOnGPU()) {
std::cout << "Tensor is on GPU" << std::endl;
}
// Explicit device
Tensor cuda_tensor = cpu_tensor.ToDevice(DeviceType::CUDA);
Gradient Computation
// Create tensor with gradient tracking
Tensor x = Randn({10, 5});
x.RequiresGrad(true);
// Forward pass
Tensor y = x.MatMul(Randn({5, 3}));
Tensor loss = y.Sum();
// Backward pass
loss.Backward();
// Access gradient
Tensor grad = x.Grad();
// Zero gradients
x.ZeroGrad();
// Detach from computation graph
Tensor detached = y.Detach();
Broadcasting Rules
CyxWiz follows NumPy/PyTorch broadcasting rules:
- If the tensors have different numbers of dimensions, 1s are prepended to the smaller tensor's shape until both ranks match
- Two dimensions are compatible if they are equal or one of them is 1
- The output size along each dimension is the maximum of the two input sizes along that dimension
Tensor a = Randn({3, 4, 5});
Tensor b = Randn({4, 5}); // Broadcasts to (1, 4, 5)
Tensor c = a + b; // Result shape: (3, 4, 5)
Tensor d = Randn({3, 1, 5});
Tensor e = Randn({1, 4, 1});
Tensor f = d * e; // Result shape: (3, 4, 5)
Python Bindings
import pycyxwiz as cyx

# Create tensor
t = cyx.Tensor([1.0, 2.0, 3.0, 4.0], [2, 2])

# Factory functions
zeros = cyx.zeros([10, 10])
ones = cyx.ones([5, 5])
rand = cyx.rand([100, 50])
randn = cyx.randn([64, 128])

# Operations
result = t + t * 2.0
matmul = cyx.matmul(t, t.T)

# Device
gpu_t = t.to_gpu()
cpu_t = gpu_t.to_cpu()

# NumPy conversion
import numpy as np
np_array = t.numpy()
from_np = cyx.from_numpy(np_array)
Performance Tips
- Batch operations: Operate on batches rather than individual samples
- Minimize transfers: Keep data on GPU when possible
- Use in-place: Use +=, -= etc. when the original is no longer needed
- Contiguous memory: Ensure tensors are contiguous before heavy computation
- Appropriate dtype: Use Float32 for most cases, Float16 for large models
API Summary
Shape & Dimensions
- Shape()
- NumDimensions()
- NumElements()
- Dim(int index)
Math Operations
- Abs(), Sqrt(), Exp(), Log()
- Pow(float), Sin(), Cos(), Tanh()
- MatMul(), Dot()
Reductions
- Sum(dim, keepdim)
- Mean(dim, keepdim)
- Max(dim, keepdim)
- Min(dim, keepdim)
- ArgMax(), ArgMin()
Reshape Operations
- Reshape(), Flatten()
- Squeeze(), Unsqueeze()
- Transpose(), Permute()