MobiusNet
A vision architecture built on continuous topological principles, replacing traditional activations with wave-based interference gating.
Overview
MobiusNet introduces a fundamentally different approach to neural network design:
- MobiusLens: Wave superposition as a gating mechanism, replacing standard activations (ReLU, GELU)
- Thirds Mask: Cantor-inspired fractal channel suppression for regularization
- Continuous Topology: Layers sample a continuous manifold via the t parameter, not discrete units
- Twist Rotations: Smooth rotation through representation space across network depth
- Integrator: An optional Conv → BN → GELU stage before pooling (enabled with use_integrator=True); in current experiments it adds a GELU nonlinearity on top of the wave-based gates.
Performance
| Model | Params | GFLOPs | Tiny ImageNet |
|---|---|---|---|
| MobiusNet-Base | 33.7M | 2.69 | TBD |
Installation
pip install torch torchvision safetensors huggingface_hub tensorboard tqdm
Quick Start
Training
from mobius_trainer_full import train_tiny_imagenet

model, best_acc = train_tiny_imagenet(
    preset='mobius_base',
    epochs=200,
    lr=3e-4,
    batch_size=128,
    use_integrator=True,
    data_dir='./data/tiny-imagenet-200',
    output_dir='./outputs',
    hf_repo='AbstractPhil/mobiusnet',
    save_every_n_epochs=10,
    upload_every_n_epochs=10,
)
Continue from Checkpoint
# From local directory
model, best_acc = train_tiny_imagenet(
    preset='mobius_base',
    epochs=200,
    continue_from="./outputs/checkpoints/mobius_base_tiny_imagenet/20240101_120000",
)

# From HuggingFace (auto-downloads)
model, best_acc = train_tiny_imagenet(
    preset='mobius_base',
    epochs=200,
    continue_from="checkpoints/mobius_base_tiny_imagenet/20240101_120000",
)
Inference
import torch
from safetensors.torch import load_file
from mobius_trainer_full import MobiusNet, PRESETS

# Load model
config = PRESETS['mobius_base']
model = MobiusNet(num_classes=200, use_integrator=True, **config)
state_dict = load_file("best_model.safetensors")
model.load_state_dict(state_dict)
model.eval()

# Inference (image_tensor: a preprocessed image batch, assumed shape (N, 3, H, W))
with torch.no_grad():
    logits = model(image_tensor)
    pred = logits.argmax(1)  # predicted class index per image
Model Presets
| Preset | Channels | Depths | ~Params |
|---|---|---|---|
| mobius_tiny_s | (64, 128, 256) | (2, 2, 2) | 500K |
| mobius_tiny_m | (64, 128, 256, 512, 768) | (2, 2, 4, 2, 2) | 11M |
| mobius_tiny_l | (96, 192, 384, 768) | (3, 3, 3, 3) | 8M |
| mobius_base | (128, 256, 512, 768, 1024) | (2, 2, 2, 2, 2) | 33.7M |
Architecture
              Input
                 │
                 ▼
┌─────────────────────────────────┐
│ Stem (Conv → BN)                │
└─────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────┐
│ Stage 1-N                       │
│ ┌─────────────────────────────┐ │
│ │ MobiusConvBlock (×depth)    │ │
│ │ ├─ Depthwise-Sep Conv       │ │
│ │ ├─ BatchNorm                │ │
│ │ ├─ MobiusLens (wave gate)   │ │
│ │ ├─ Thirds Mask              │ │
│ │ └─ Learned Residual         │ │
│ └─────────────────────────────┘ │
│ Downsample (stride-2 conv)      │
└─────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────┐
│ Integrator (Conv → BN → GELU)   │ ← Task collapse
└─────────────────────────────────┘
                 │
                 ▼
┌─────────────────────────────────┐
│ Pool → Linear → Classes         │
└─────────────────────────────────┘
Core Components
MobiusLens
Wave-based gating mechanism with three interference paths:
L = wave(phase_l, drift_l) # Left path (+1 drift)
M = wave(phase_m, drift_m) # Middle path (0 drift, ghost)
R = wave(phase_r, drift_r) # Right path (-1 drift)
# Interference
xor_comp = |L + R - 2*L*R| # Differentiable XOR
and_comp = L * R # Differentiable AND
# Gating
gate = weighted_sum(L, M, R) * interference_blend
output = input * sigmoid(layernorm(gate))
The middle path (M) acts as a "ghost": present but diminished, it maintains gradient continuity while biasing information flow toward the L/R edges (a Cantor-like structure).
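The two interference terms above can be checked in isolation. The following is a minimal sketch in plain Python, assuming L and R are scalars in [0, 1]; the function names soft_xor and soft_and are illustrative and are not taken from the MobiusNet code. It shows that the terms reproduce Boolean XOR and AND at the corners of the unit square and vary smoothly in between.

```python
def soft_xor(l: float, r: float) -> float:
    """|L + R - 2*L*R|: matches Boolean XOR when l and r are exactly 0 or 1."""
    return abs(l + r - 2.0 * l * r)


def soft_and(l: float, r: float) -> float:
    """L * R: matches Boolean AND when l and r are exactly 0 or 1."""
    return l * r


if __name__ == "__main__":
    # Corner cases reproduce the Boolean truth tables
    for l in (0.0, 1.0):
        for r in (0.0, 1.0):
            print(l, r, soft_xor(l, r), soft_and(l, r))
    # Intermediate inputs give intermediate values
    print(soft_xor(0.5, 0.5), soft_and(0.5, 0.5))
```

How these terms are combined into interference_blend is not specified here, so the sketch stops at the two components.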
Thirds Mask
Rotating channel suppression inspired by Cantor set construction:
Layer 0: suppress channels [0:C/3]
Layer 1: suppress channels [C/3:2C/3]
Layer 2: suppress channels [2C/3:C]
Layer 3: back to [0:C/3]
This forces redundancy and discourages co-adaptation across channel groups.
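The rotation schedule above can be sketched as a per-layer mask. This is an illustrative plain-Python version, assuming a hard 0/1 mask and assuming the last third absorbs any remainder when C is not divisible by 3; the actual model may scale channels rather than zero them, and thirds_mask is a hypothetical name.

```python
def thirds_mask(num_channels: int, layer_idx: int) -> list:
    """Return a 0/1 mask that zeroes one third of the channels, rotating with layer_idx."""
    third = num_channels // 3
    slot = layer_idx % 3  # 0, 1, 2, then back to 0
    start = slot * third
    # Assumption: the final slot takes the leftover channels when C % 3 != 0
    end = num_channels if slot == 2 else start + third
    return [0.0 if start <= c < end else 1.0 for c in range(num_channels)]


if __name__ == "__main__":
    for layer in range(4):
        print(layer, thirds_mask(6, layer))
```

Multiplying a layer's channel activations by this mask suppresses a different third at each depth, and layer 3 repeats the pattern of layer 0.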
Continuous Topology
Each layer samples a continuous manifold:
t = layer_idx / (total_layers - 1) # 0 → 1
twist_in_angle = t * π
twist_out_angle = -t * π
scales = scale_range[0] + t * scale_span
Adding layers = finer sampling of the same underlying structure.
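To make the "finer sampling" point concrete, here is a small sketch that evaluates the formulas above for every layer. It assumes scale_span = scale_range[1] - scale_range[0], treats the scale as a single scalar, and uses a made-up default scale_range of (0.5, 2.0); layer_schedule is a hypothetical helper, not part of the trainer.

```python
import math


def layer_schedule(total_layers: int, scale_range=(0.5, 2.0)):
    """Yield (t, twist_in_angle, twist_out_angle, scale) for each layer index."""
    scale_span = scale_range[1] - scale_range[0]  # assumed definition
    for layer_idx in range(total_layers):
        # Assumption: a single-layer network sits at t = 0 (avoids division by zero)
        t = layer_idx / (total_layers - 1) if total_layers > 1 else 0.0
        yield t, t * math.pi, -t * math.pi, scale_range[0] + t * scale_span


if __name__ == "__main__":
    # 3 layers sample t at 0, 0.5, 1; 5 layers sample the same interval more densely
    print([round(t, 2) for t, *_ in layer_schedule(3)])
    print([round(t, 2) for t, *_ in layer_schedule(5)])
```

Both networks span the same range of t, twist angle, and scale; the deeper one only places more sample points along it.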
Checkpoints
Saved to: checkpoints/{variant}_{dataset}/{timestamp}/
├── config.json
├── best_accuracy.json
├── final_accuracy.json
├── checkpoints/
│   ├── checkpoint_epoch_0010.pt
│   ├── checkpoint_epoch_0010.safetensors
│   ├── best_model.pt
│   ├── best_model.safetensors
│   ├── final_model.pt
│   └── final_model.safetensors
└── tensorboard/
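Given the layout above, a script that needs the most recent periodic checkpoint in a run directory could locate it by epoch number. The sketch below relies only on the checkpoint_epoch_NNNN.safetensors naming shown in the tree; latest_epoch_checkpoint is a hypothetical helper and is not part of mobius_trainer_full.

```python
import re
from pathlib import Path
from typing import Optional

_EPOCH_RE = re.compile(r"checkpoint_epoch_(\d+)\.safetensors$")


def latest_epoch_checkpoint(run_dir: str) -> Optional[Path]:
    """Return the highest-epoch .safetensors file under <run_dir>/checkpoints, or None."""
    best, best_epoch = None, -1
    for path in (Path(run_dir) / "checkpoints").glob("checkpoint_epoch_*.safetensors"):
        match = _EPOCH_RE.search(path.name)
        if match and int(match.group(1)) > best_epoch:
            best, best_epoch = path, int(match.group(1))
    return best
```

For example, latest_epoch_checkpoint("./outputs/checkpoints/mobius_base_tiny_imagenet/20240101_120000") would return the path of the newest periodic checkpoint in that run, ignoring best_model and final_model files.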
TensorBoard
Monitor training:
tensorboard --logdir ./outputs/checkpoints
Tracks:
- Loss, train/val accuracy
- Per-layer lens parameters (omega, alpha, twist angles, L/M/R weights)
- Residual weights
- Weight histograms
Data Setup
Tiny ImageNet
wget http://cs231n.stanford.edu/tiny-imagenet-200.zip
unzip tiny-imagenet-200.zip -d ./data/
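In the unpacked archive, validation labels are stored in val/val_annotations.txt (tab-separated: filename, class ID, then bounding-box values) rather than in per-class folders. Whether mobius_trainer_full reads this file itself is not stated here, so the following is only a sketch of how the labels could be inspected; read_val_annotations is a hypothetical helper.

```python
import csv
from collections import defaultdict
from pathlib import Path


def read_val_annotations(val_dir: str) -> dict:
    """Map each class ID to the list of validation image filenames labeled with it."""
    by_class = defaultdict(list)
    with open(Path(val_dir) / "val_annotations.txt", newline="") as f:
        # Columns: filename, class ID, then four bounding-box integers (unused here)
        for row in csv.reader(f, delimiter="\t"):
            if len(row) >= 2:
                by_class[row[1]].append(row[0])
    return dict(by_class)
```

Calling read_val_annotations("./data/tiny-imagenet-200/val") should yield one entry per class, which is a quick way to confirm the download is complete.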
License
Apache 2.0
Citation
@misc{mobiusnet2026,
title={MobiusNet: Wave-Based Topological Vision Architecture},
author={AbstractPhil},
year={2026},
url={https://huggingface.co/AbstractPhil/mobiusnet}
}