MobiusNet

A vision architecture built on continuous topological principles, replacing traditional activations with wave-based interference gating.

Overview

MobiusNet introduces a fundamentally different approach to neural network design:

MobiusLens: Wave superposition as a gating mechanism, replacing standard activations (ReLU, GELU)
Thirds Mask: Cantor-inspired fractal channel suppression for regularization
Continuous Topology: Layers sample a continuous manifold via the t parameter, not discrete units
Twist Rotations: Smooth rotation through representation space across network depth
Integrator: An optional Conv → BN → GELU block placed before pooling (enabled with use_integrator=True); it is experimental and adds a conventional GELU nonlinearity on top of the wave-based gating

Performance

Model	Params	GFLOPs	Tiny ImageNet accuracy
MobiusNet-Base	33.7M	2.69	TBD

Installation

pip install torch torchvision safetensors huggingface_hub tensorboard tqdm

Quick Start

Training

from mobius_trainer_full import train_tiny_imagenet

model, best_acc = train_tiny_imagenet(
    preset='mobius_base',
    epochs=200,
    lr=3e-4,
    batch_size=128,
    use_integrator=True,
    data_dir='./data/tiny-imagenet-200',
    output_dir='./outputs',
    hf_repo='AbstractPhil/mobiusnet',
    save_every_n_epochs=10,
    upload_every_n_epochs=10,
)

Continue from Checkpoint

# From local directory
model, best_acc = train_tiny_imagenet(
    preset='mobius_base',
    epochs=200,
    continue_from="./outputs/checkpoints/mobius_base_tiny_imagenet/20240101_120000",
)

# From HuggingFace (auto-downloads)
model, best_acc = train_tiny_imagenet(
    preset='mobius_base',
    epochs=200,
    continue_from="checkpoints/mobius_base_tiny_imagenet/20240101_120000",
)

Inference

import torch
from safetensors.torch import load_file
from mobius_trainer_full import MobiusNet, PRESETS

# Load model
config = PRESETS['mobius_base']
model = MobiusNet(num_classes=200, use_integrator=True, **config)
state_dict = load_file("best_model.safetensors")
model.load_state_dict(state_dict)
model.eval()

# Inference: image_tensor is a preprocessed float batch of shape (N, 3, H, W)
with torch.no_grad():
    logits = model(image_tensor)
    pred = logits.argmax(1)  # predicted class index per image

Model Presets

Preset	Channels	Depths	~Params
`mobius_tiny_s`	(64, 128, 256)	(2, 2, 2)	500K
`mobius_tiny_m`	(64, 128, 256, 512, 768)	(2, 2, 4, 2, 2)	11M
`mobius_tiny_l`	(96, 192, 384, 768)	(3, 3, 3, 3)	8M
`mobius_base`	(128, 256, 512, 768, 1024)	(2, 2, 2, 2, 2)	33.7M

Architecture

Input
  │
  ▼
┌─────────────────────────────────┐
│ Stem (Conv → BN)                │
└─────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────┐
│ Stage 1-N                       │
│ ┌─────────────────────────────┐ │
│ │ MobiusConvBlock (×depth)    │ │
│ │  ├─ Depthwise-Sep Conv      │ │
│ │  ├─ BatchNorm               │ │
│ │  ├─ MobiusLens (wave gate)  │ │
│ │  ├─ Thirds Mask             │ │
│ │  └─ Learned Residual        │ │
│ └─────────────────────────────┘ │
│ Downsample (stride-2 conv)      │
└─────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────┐
│ Integrator (Conv → BN → GELU)   │  ← Task collapse
└─────────────────────────────────┘
  │
  ▼
┌─────────────────────────────────┐
│ Pool → Linear → Classes         │
└─────────────────────────────────┘

Core Components

MobiusLens

Wave-based gating mechanism with three interference paths:

L = wave(phase_l, drift_l)   # Left path  (+1 drift)
M = wave(phase_m, drift_m)   # Middle path (0 drift, ghost)
R = wave(phase_r, drift_r)   # Right path (-1 drift)

# Interference
xor_comp = |L + R - 2*L*R|   # Differentiable XOR
and_comp = L * R              # Differentiable AND

# Gating
gate = weighted_sum(L, M, R) * interference_blend
output = input * sigmoid(layernorm(gate))

The middle path (M) acts as a "ghost": it is present but diminished, which maintains gradient continuity while biasing information flow toward the L/R edges (a Cantor-like structure).
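
The two interference terms can be checked numerically. The following is a minimal pure-Python sketch, not the repository's implementation: it assumes L and R have already been squashed into [0, 1], and the helper names soft_xor and soft_and are hypothetical.

```python
def soft_xor(l: float, r: float) -> float:
    """Differentiable XOR from the formula above: |L + R - 2*L*R|."""
    return abs(l + r - 2.0 * l * r)


def soft_and(l: float, r: float) -> float:
    """Differentiable AND from the formula above: L * R."""
    return l * r


# At the corners of the unit square the soft versions match Boolean logic.
for l, r in [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]:
    print(l, r, soft_xor(l, r), soft_and(l, r))
```

For inputs strictly between 0 and 1, both functions vary smoothly between these corner values, which is what allows gradients to pass through the gate.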

Thirds Mask

Rotating channel suppression inspired by Cantor set construction:

Layer 0: suppress channels [0:C/3]
Layer 1: suppress channels [C/3:2C/3]
Layer 2: suppress channels [2C/3:C]
Layer 3: back to [0:C/3]

Because every channel group is periodically suppressed, the rotation forces redundancy and discourages co-adaptation across channel groups.
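
The rotation listed above can be sketched as a per-channel multiplier. This is an illustrative sketch only: the function name thirds_mask is hypothetical, and it assumes hard zeroing of the suppressed third and integer-division group boundaries, neither of which is confirmed by the source.

```python
def thirds_mask(layer_idx: int, channels: int) -> list[float]:
    """Return a per-channel multiplier: 0.0 for the suppressed third, 1.0 elsewhere."""
    third = layer_idx % 3                  # rotate through the three channel groups
    start = third * channels // 3          # first suppressed channel (assumed boundary)
    stop = (third + 1) * channels // 3     # one past the last suppressed channel
    return [0.0 if start <= c < stop else 1.0 for c in range(channels)]


# Layers 0..3 on a toy 6-channel tensor; layer 3 repeats the layer-0 mask.
for layer in range(4):
    print(layer, thirds_mask(layer, 6))
```

With 6 channels the masks zero channels 0-1, then 2-3, then 4-5, and layer 3 returns to channels 0-1.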

Continuous Topology

Each layer samples a continuous manifold:

t = layer_idx / (total_layers - 1)  # 0 → 1

twist_in_angle = t * π
twist_out_angle = -t * π
scales = scale_range[0] + t * scale_span

Adding layers therefore samples the same underlying structure more finely rather than changing it.
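
To make the sampling concrete, the sketch below evaluates the formulas above for networks of different depth. It is a hedged illustration: the function name layer_params is hypothetical, the default scale_range of (0.5, 2.0) is a placeholder, scale_span is assumed to be scale_range[1] - scale_range[0], and the single-layer guard is an addition that the source does not describe.

```python
import math


def layer_params(layer_idx: int, total_layers: int,
                 scale_range: tuple[float, float] = (0.5, 2.0)) -> dict[str, float]:
    """Sample the per-layer parameters at position t on the manifold."""
    # Guard the single-layer case, where total_layers - 1 would be zero.
    t = layer_idx / (total_layers - 1) if total_layers > 1 else 0.0
    scale_span = scale_range[1] - scale_range[0]  # assumed definition
    return {
        "t": t,
        "twist_in_angle": t * math.pi,
        "twist_out_angle": -t * math.pi,
        "scale": scale_range[0] + t * scale_span,
    }


# A deeper network samples the same 0 -> 1 interval at more points.
for depth in (3, 5):
    print(depth, [layer_params(i, depth)["t"] for i in range(depth)])
```

The 3-layer network samples t at 0, 0.5 and 1, while the 5-layer network adds the intermediate points 0.25 and 0.75 on the same interval.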

Checkpoints

Saved under output_dir to: checkpoints/{variant}_{dataset}/{timestamp}/

├── config.json
├── best_accuracy.json
├── final_accuracy.json
├── checkpoints/
│   ├── checkpoint_epoch_0010.pt
│   ├── checkpoint_epoch_0010.safetensors
│   ├── best_model.pt
│   ├── best_model.safetensors
│   ├── final_model.pt
│   └── final_model.safetensors
└── tensorboard/
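
Given this layout, the most recent periodic checkpoint in a run directory can be located by parsing the epoch number from the file name. The helper below is a small sketch using only the standard library; latest_epoch_checkpoint is a hypothetical name and is not part of mobius_trainer_full.

```python
import re
from pathlib import Path
from typing import Optional

# Matches names such as checkpoint_epoch_0010.safetensors
_EPOCH_RE = re.compile(r"checkpoint_epoch_(\d+)\.safetensors$")


def latest_epoch_checkpoint(run_dir: str) -> Optional[Path]:
    """Return the highest-numbered periodic .safetensors checkpoint, or None."""
    ckpt_dir = Path(run_dir) / "checkpoints"
    found = []
    for path in ckpt_dir.glob("checkpoint_epoch_*.safetensors"):
        match = _EPOCH_RE.search(path.name)
        if match:
            found.append((int(match.group(1)), path))
    if not found:
        return None
    # Compare by epoch number only.
    return max(found, key=lambda item: item[0])[1]
```

Here run_dir is the timestamped directory, for example ./outputs/checkpoints/mobius_base_tiny_imagenet/20240101_120000.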

TensorBoard

Monitor training:

tensorboard --logdir ./outputs/checkpoints

Tracks:

Loss, train/val accuracy
Per-layer lens parameters (omega, alpha, twist angles, L/M/R weights)
Residual weights
Weight histograms

Data Setup

Tiny ImageNet

wget http://cs231n.stanford.edu/tiny-imagenet-200.zip
unzip tiny-imagenet-200.zip -d ./data/

License

Apache 2.0

Citation

@misc{mobiusnet2026,
  title={MobiusNet: Wave-Based Topological Vision Architecture},
  author={AbstractPhil},
  year={2026},
  url={https://huggingface.co/AbstractPhil/mobiusnet}
}
