# SimpleTM

Official implementation of SimpleTM for multivariate time series forecasting

## motivation

Most modern time series forecasting models rely on complex attention mechanisms or intricate architectural designs that require significant computational resources and careful hyperparameter tuning. SimpleTM takes a different approach: it demonstrates that a straightforward temporal mixing architecture can achieve competitive results while remaining interpretable and easy to train. The model focuses on learning temporal dependencies directly through simple feed-forward operations, making it accessible for both research and production environments.

## architecture

```mermaid
graph LR
    A[Input Time Series] --> B[Normalization]
    B --> C[Temporal Mixing Block 1]
    C --> D[Temporal Mixing Block 2]
    D --> E[Temporal Mixing Block N]
    E --> F[Linear Projection]
    F --> G[Denormalization]
    G --> H[Forecast Output]
    
    style C fill:#e1f5ff
    style D fill:#e1f5ff
    style E fill:#e1f5ff
```

## getting started

### install

```bash
pip install simpletm
```

### quickstart

```python
import torch
from simpletm import SimpleTM

# Initialize model
model = SimpleTM(
    input_len=96,
    output_len=24,
    n_features=7,
    d_model=512,
    n_blocks=4
)

# Prepare data: (batch_size, input_len, n_features)
x = torch.randn(32, 96, 7)

# Generate forecast: (batch_size, output_len, n_features)
forecast = model(x)

# Training data (random placeholders; substitute your own tensors)
x_train = torch.randn(32, 96, 7)  # (batch_size, input_len, n_features)
y_train = torch.randn(32, 24, 7)  # (batch_size, output_len, n_features)

# Training
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = torch.nn.MSELoss()

model.train()
for epoch in range(100):
    optimizer.zero_grad()
    output = model(x_train)
    loss = criterion(output, y_train)
    loss.backward()
    optimizer.step()
```
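The training loop expects paired input and target tensors. As a minimal sketch of one way to build them from a single long series, the helper below slices sliding windows. `make_windows` is a hypothetical name used for illustration and is not part of the `simpletm` package; the window lengths match the quickstart values.

```python
import torch


def make_windows(series: torch.Tensor, input_len: int, output_len: int):
    """Slice a (n_steps, n_features) tensor into paired input/target windows.

    Hypothetical helper for illustration; not part of the simpletm package.
    """
    n_windows = series.shape[0] - input_len - output_len + 1
    if n_windows <= 0:
        raise ValueError("series is too short for the requested window lengths")
    # Each input window covers steps [i, i + input_len)
    x = torch.stack([series[i : i + input_len] for i in range(n_windows)])
    # Each target window covers the output_len steps that immediately follow
    y = torch.stack(
        [series[i + input_len : i + input_len + output_len] for i in range(n_windows)]
    )
    return x, y


series = torch.randn(500, 7)  # placeholder for a real (n_steps, n_features) series
x_train, y_train = make_windows(series, input_len=96, output_len=24)
print(x_train.shape, y_train.shape)  # torch.Size([381, 96, 7]) torch.Size([381, 24, 7])
```

Neighboring windows overlap, so it is usually safer to split the series into training and validation segments by time before windowing, rather than shuffling windows across the split.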

## how it works

SimpleTM processes multivariate time series through a stack of temporal mixing blocks. Each block applies learned transformations along the time dimension while preserving the feature structure. The model first normalizes input sequences using reversible instance normalization to handle distribution shifts. Then, temporal mixing blocks extract patterns through linear projections and nonlinear activations applied across time steps. Finally, a projection head maps the processed representation to the target forecast horizon. This design avoids the quadratic complexity of attention while maintaining the ability to capture long-range dependencies through stacked operations.
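The pipeline described above can be sketched in a few lines of PyTorch. This is an illustrative approximation under stated assumptions, not the package's actual implementation: the class names `TemporalMixingBlock` and `TemporalMixerSketch`, the residual connection, the `LayerNorm`, and the GELU activation are all choices made here for the example, and the normalization step omits the learnable affine parameters that reversible instance normalization usually carries.

```python
import torch
import torch.nn as nn


class TemporalMixingBlock(nn.Module):
    """Hypothetical block: an MLP applied along the time axis, with a residual."""

    def __init__(self, seq_len: int, d_model: int, dropout: float = 0.1):
        super().__init__()
        self.norm = nn.LayerNorm(seq_len)
        self.mlp = nn.Sequential(
            nn.Linear(seq_len, d_model),  # linear projection across time steps
            nn.GELU(),                    # nonlinear activation (assumed choice)
            nn.Dropout(dropout),
            nn.Linear(d_model, seq_len),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, n_features, seq_len); each feature is mixed independently
        return x + self.mlp(self.norm(x))


class TemporalMixerSketch(nn.Module):
    """Normalize -> stacked temporal mixing -> projection head -> denormalize."""

    def __init__(self, input_len, output_len, d_model=64, n_blocks=2,
                 dropout=0.1, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.blocks = nn.Sequential(
            *[TemporalMixingBlock(input_len, d_model, dropout) for _ in range(n_blocks)]
        )
        self.head = nn.Linear(input_len, output_len)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, input_len, n_features)
        mean = x.mean(dim=1, keepdim=True)
        std = x.std(dim=1, keepdim=True, unbiased=False) + self.eps
        z = (x - mean) / std      # per-instance, per-feature normalization
        z = z.transpose(1, 2)     # (batch, n_features, input_len)
        z = self.blocks(z)        # feature structure is preserved
        z = self.head(z)          # (batch, n_features, output_len)
        z = z.transpose(1, 2)     # (batch, output_len, n_features)
        return z * std + mean     # reverse the normalization


sketch = TemporalMixerSketch(input_len=96, output_len=24)
out = sketch(torch.randn(4, 96, 7))
print(out.shape)  # torch.Size([4, 24, 7])
```

Every layer in this sketch is a dense map over the time axis, so the cost per block grows with `input_len * d_model` rather than with the square of the sequence length.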

## configuration

Key parameters for model initialization:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `input_len` | int | required | Length of input sequence |
| `output_len` | int | required | Length of forecast horizon |
| `n_features` | int | required | Number of input features |
| `d_model` | int | 512 | Hidden dimension size |
| `n_blocks` | int | 4 | Number of temporal mixing blocks |
| `dropout` | float | 0.1 | Dropout rate |
| `norm_type` | str | `'instance'` | Normalization method (`'instance'`, `'batch'`, or `'none'`) |

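Example configuration file (`config.yaml`):

```yaml
model:
  d_model: 512
  n_blocks: 4
  dropout: 0.1
  norm_type: instance

training:
  batch_size: 32
  learning_rate: 0.001
  epochs: 100
  # Written with a decimal point so YAML 1.1 loaders such as PyYAML parse it as a float
  weight_decay: 1.0e-5
```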
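One way to catch typos in these settings before constructing a model is to validate them against the parameter table. The sketch below is an assumption-laden illustration: `ModelConfig` and `ALLOWED_NORMS` are hypothetical names that do not come from the `simpletm` package, and the `model` section of the config is written as the plain dict a YAML loader would return.

```python
from dataclasses import dataclass

# Values listed for norm_type in the parameter table
ALLOWED_NORMS = ("instance", "batch", "none")


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical container mirroring the parameter table; not a simpletm class."""

    input_len: int
    output_len: int
    n_features: int
    d_model: int = 512
    n_blocks: int = 4
    dropout: float = 0.1
    norm_type: str = "instance"

    def __post_init__(self):
        for name in ("input_len", "output_len", "n_features", "d_model", "n_blocks"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be a positive integer")
        if not 0.0 <= self.dropout < 1.0:
            raise ValueError("dropout must be in [0, 1)")
        if self.norm_type not in ALLOWED_NORMS:
            raise ValueError(f"norm_type must be one of {ALLOWED_NORMS}")


# The `model` section of config.yaml, as the dict a YAML loader would produce
model_section = {"d_model": 512, "n_blocks": 4, "dropout": 0.1, "norm_type": "instance"}
cfg = ModelConfig(input_len=96, output_len=24, n_features=7, **model_section)
print(cfg)
```

The three required parameters are passed separately here because the example config file does not list them.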

## faq

Q: How does SimpleTM compare to Transformer-based models?

A: SimpleTM trades the flexibility of attention for simplicity and efficiency. It runs faster and uses less memory while achieving comparable accuracy on many benchmarks. Choose Transformers if you need explicit attention weights or have very long sequences.

Q: Can I use this for univariate forecasting?

A: Yes. Set `n_features=1`; the model works with single-variable series.

Q: What context length should I use?

A: Start with 96 or 192 steps. Longer contexts don't always help; whether they do depends on your data's temporal patterns, so experiment with your specific dataset.

Q: Does this work for irregularly sampled data?

A: No, SimpleTM expects regular intervals. Resample your data first or consider interpolation methods.
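As a rough sketch of the resampling step, the snippet below linearly interpolates irregular observations onto an evenly spaced grid with NumPy. `resample_to_regular` is a hypothetical helper, not part of the package, and it assumes timestamps are numeric and strictly increasing.

```python
import numpy as np


def resample_to_regular(timestamps, values, step):
    """Linearly interpolate irregular samples onto a grid with spacing `step`.

    Hypothetical helper; assumes `timestamps` is strictly increasing and
    `values` has shape (n_samples, n_features).
    """
    timestamps = np.asarray(timestamps, dtype=float)
    values = np.asarray(values, dtype=float)
    # Number of grid points that fit between the first and last observation
    n_points = int(np.floor((timestamps[-1] - timestamps[0]) / step)) + 1
    grid = timestamps[0] + step * np.arange(n_points)
    # np.interp handles one feature at a time, so interpolate column by column
    resampled = np.column_stack(
        [np.interp(grid, timestamps, values[:, j]) for j in range(values.shape[1])]
    )
    return grid, resampled


t = np.array([0.0, 0.9, 2.1, 3.0, 4.2])   # irregular observation times
v = np.column_stack([2.0 * t, t + 1.0])   # two features, both linear in time
grid, regular = resample_to_regular(t, v, step=1.0)
print(grid)           # [0. 1. 2. 3. 4.]
print(regular.shape)  # (5, 2)
```

Linear interpolation is only one option; for series with sharp jumps or long gaps between observations, a different resampling rule may suit the data better.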

Q: How do I handle missing values?

A: Impute before feeding data to the model. Forward fill, linear interpolation, or learned imputation all work depending on your use case.
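Two of the simpler options, forward fill and linear interpolation, can be sketched with NumPy as follows. The function names are hypothetical, missing values are assumed to be encoded as `NaN`, and the interpolation helper assumes each column has at least one observed value.

```python
import numpy as np


def forward_fill(x):
    """Replace each NaN with the most recent observed value in the same column.

    Leading NaNs (with no earlier observation) are left as NaN.
    """
    x = np.array(x, dtype=float)  # work on a copy
    steps = np.arange(x.shape[0])
    for j in range(x.shape[1]):
        col = x[:, j]
        valid = ~np.isnan(col)
        # Index of the last valid observation at or before each step
        idx = np.where(valid, steps, 0)
        np.maximum.accumulate(idx, out=idx)
        x[:, j] = col[idx]
    return x


def linear_interpolate(x):
    """Fill NaNs by linear interpolation; edge gaps take the nearest observed value."""
    x = np.array(x, dtype=float)  # work on a copy
    steps = np.arange(x.shape[0])
    for j in range(x.shape[1]):
        valid = ~np.isnan(x[:, j])
        x[:, j] = np.interp(steps, steps[valid], x[valid, j])
    return x


data = np.array([
    [1.0, 10.0],
    [np.nan, np.nan],
    [3.0, np.nan],
    [np.nan, 40.0],
])
print(forward_fill(data))        # columns become [1, 1, 3, 3] and [10, 10, 10, 40]
print(linear_interpolate(data))  # columns become [1, 2, 3, 3] and [10, 20, 30, 40]
```

Forward fill only uses past values, which avoids leaking future information into a forecasting input; interpolation uses values on both sides of a gap.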

## license

MIT
