
Tensor API Reference

The Tensor class is the fundamental data structure in cyxwiz-backend, providing GPU-accelerated multi-dimensional array operations powered by ArrayFire.

Data Types

Type                      Description
Float16                   Half precision
Float32                   Single precision (default)
Float64                   Double precision
Int8/Int16/Int32/Int64    Integer types
Bool                      Boolean

Factory Functions

// Zeros
Tensor Zeros(const std::vector<int>& shape, DataType dtype = DataType::Float32);

// Ones
Tensor Ones(const std::vector<int>& shape, DataType dtype = DataType::Float32);

// Random uniform [0, 1)
Tensor Rand(const std::vector<int>& shape, DataType dtype = DataType::Float32);

// Random normal (mean=0, std=1)
Tensor Randn(const std::vector<int>& shape, DataType dtype = DataType::Float32);

// Range
Tensor Arange(float start, float end, float step = 1.0f);

// Linspace
Tensor Linspace(float start, float end, int num);

// Identity matrix
Tensor Eye(int n, DataType dtype = DataType::Float32);

// Full with value
Tensor Full(const std::vector<int>& shape, float value, DataType dtype = DataType::Float32);

Basic Operations

#include <cyxwiz/tensor.h>

using namespace cyxwiz;

// Create tensors
Tensor a = Randn({100, 50});
Tensor b = Randn({50, 30});

// Matrix multiplication
Tensor c = a.MatMul(b);  // Shape: (100, 30)

// Element-wise operations
Tensor d = a * 2.0f + 1.0f;

// Reductions
Tensor sum = a.Sum();        // Scalar
Tensor col_sum = a.Sum(0);   // Sum along dim 0, shape: (50,)
Tensor row_sum = a.Sum(1);   // Sum along dim 1, shape: (100,)

// Statistics
Tensor mean = a.Mean();
Tensor max_val = a.Max();
Tensor min_val = a.Min();

Shape Manipulation

Tensor t = Randn({2, 3, 4, 5});

// Reshape
Tensor reshaped = t.Reshape({6, 20});

// Flatten
Tensor flat = t.Flatten();  // Shape: (120,)

// Transpose
Tensor transposed = t.Transpose(-2, -1);  // Swap last two dims

// Permute
Tensor permuted = t.Permute({0, 2, 1, 3});  // Reorder dimensions

// Squeeze/Unsqueeze
Tensor squeezed = Randn({1, 10, 1}).Squeeze();  // Shape: (10,)
Tensor unsqueezed = Randn({10}).Unsqueeze(0);   // Shape: (1, 10)

Indexing and Slicing

Tensor t = Randn({10, 20, 30});

// Single index
Tensor first = t[0];  // Shape: (20, 30)

// Slice
Tensor sliced = t.Slice(0, 2, 5);  // t[2:5, :, :], shape: (3, 20, 30)

// Multiple slices
Tensor multi = t.Slice(0, 0, 5).Slice(1, 10, 15);  // t[0:5, 10:15, :]

Device Management

// Create on default device
Tensor cpu_tensor = Randn({1000, 1000});

// Move to GPU
Tensor gpu_tensor = cpu_tensor.ToGPU();

// Move back to CPU
Tensor back_to_cpu = gpu_tensor.ToCPU();

// Check device
if (gpu_tensor.IsOnGPU()) {
    std::cout << "Tensor is on GPU" << std::endl;
}

// Explicit device
Tensor cuda_tensor = cpu_tensor.ToDevice(DeviceType::CUDA);

Gradient Computation

// Create tensor with gradient tracking
Tensor x = Randn({10, 5});
x.RequiresGrad(true);

// Forward pass
Tensor y = x.MatMul(Randn({5, 3}));
Tensor loss = y.Sum();

// Backward pass
loss.Backward();

// Access gradient
Tensor grad = x.Grad();

// Zero gradients
x.ZeroGrad();

// Detach from computation graph
Tensor detached = y.Detach();

Broadcasting Rules

CyxWiz follows NumPy/PyTorch broadcasting rules:

  1. If the tensors have different numbers of dimensions, the smaller tensor's shape is left-padded with 1s
  2. Two dimensions are compatible if they are equal or one of them is 1
  3. The output size along each dimension is the larger of the two input sizes

Tensor a = Randn({3, 4, 5});
Tensor b = Randn({4, 5});     // Broadcasts to (1, 4, 5)
Tensor c = a + b;             // Result shape: (3, 4, 5)

Tensor d = Randn({3, 1, 5});
Tensor e = Randn({1, 4, 1});
Tensor f = d * e;             // Result shape: (3, 4, 5)
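The three rules above can be sketched as a small shape-inference helper. This is plain Python, independent of CyxWiz, and `broadcast_shape` is an illustrative name rather than part of the API:

```python
def broadcast_shape(shape_a, shape_b):
    """Compute the broadcast output shape per the three rules above."""
    # Rule 1: left-pad the shorter shape with 1s
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    out = []
    for da, db in zip(a, b):
        # Rule 2: sizes must be equal, or one of them must be 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"incompatible dimensions: {da} vs {db}")
        # Rule 3: the output takes the larger size
        out.append(max(da, db))
    return tuple(out)

print(broadcast_shape((3, 4, 5), (4, 5)))     # (3, 4, 5)
print(broadcast_shape((3, 1, 5), (1, 4, 1)))  # (3, 4, 5)
```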

Python Bindings

import pycyxwiz as cyx

# Create tensor
t = cyx.Tensor([1.0, 2.0, 3.0, 4.0], [2, 2])

# Factory functions
zeros = cyx.zeros([10, 10])
ones = cyx.ones([5, 5])
rand = cyx.rand([100, 50])
randn = cyx.randn([64, 128])

# Operations
result = t + t * 2.0
matmul = cyx.matmul(t, t.T)

# Device
gpu_t = t.to_gpu()
cpu_t = gpu_t.to_cpu()

# NumPy conversion
import numpy as np
np_array = t.numpy()
from_np = cyx.from_numpy(np_array)

Performance Tips

  1. Batch operations: Operate on batches rather than individual samples
  2. Minimize transfers: Keep data on GPU when possible
  3. In-place operations: Use +=, -=, etc. when the original tensor is no longer needed
  4. Contiguous memory: Ensure tensors are contiguous before heavy computation
  5. Appropriate dtype: Use Float32 for most cases, Float16 for large models
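Tip 3 can be illustrated with NumPy, which CyxWiz tensors convert to and from; the same idea applies to Tensor's in-place operators:

```python
import numpy as np

a = np.random.randn(1000, 1000)
b = np.random.randn(1000, 1000)

buf = a  # keep a handle to a's buffer

# Out-of-place: allocates a fresh array for the result
c = a + b
assert c is not a

# In-place: writes the result into a's existing buffer,
# avoiding an extra allocation
a += b
assert a is buf
```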

API Summary

Shape & Dimensions
  • Shape()
  • NumDimensions()
  • NumElements()
  • Dim(int index)
Math Operations
  • Abs(), Sqrt(), Exp(), Log()
  • Pow(float), Sin(), Cos(), Tanh()
  • MatMul(), Dot()
Reductions
  • Sum(dim, keepdim)
  • Mean(dim, keepdim)
  • Max(dim, keepdim)
  • Min(dim, keepdim)
  • ArgMax(), ArgMin()
Reshape Operations
  • Reshape(), Flatten()
  • Squeeze(), Unsqueeze()
  • Transpose(), Permute()