PARCtorch.data package

Submodules

PARCtorch.data.dataset module

class PARCtorch.data.dataset.GenericPhysicsDataset(data_dirs, future_steps=1, min_max_path=None, required_channels=None, validate=True)

Bases: Dataset

A generic PyTorch Dataset for loading preprocessed physics data with sliding window sample generation and channel-wise normalization using precomputed min and max values.

The class is not tied to a particular dataset: the data directories, the expected number of channels, and the number of future steps are all supplied as constructor arguments.
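The sliding-window sampling mentioned above can be pictured with a small pure-Python sketch. It assumes that each simulation is a sequence of time steps and that one sample pairs a starting frame with the future_steps frames that follow it. The helper name sliding_window_indices is hypothetical and is not part of PARCtorch; the library's actual windowing may differ.

```python
def sliding_window_indices(num_timesteps, future_steps=1):
    """Return (start, targets) index pairs for one simulation.

    Hypothetical helper: assumes a sample is one initial frame plus the
    `future_steps` frames that immediately follow it.
    """
    if future_steps < 1:
        raise ValueError("future_steps must be at least 1")
    samples = []
    # The last valid start must leave room for `future_steps` target frames.
    for start in range(num_timesteps - future_steps):
        targets = list(range(start + 1, start + 1 + future_steps))
        samples.append((start, targets))
    return samples


windows = sliding_window_indices(num_timesteps=5, future_steps=2)
print(windows)
```

Under these assumptions a simulation with T time steps yields T - future_steps samples, so larger values of future_steps produce fewer samples per simulation.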

normalize_channel(tensor, channel_idx)

Scales one channel of the tensor to the range [0, 1] using that channel's precomputed min and max values.

Parameters:
  • tensor (torch.Tensor) – The tensor to normalize.

  • channel_idx (int) – The index of the channel to normalize.

Returns:

The normalized tensor.

Return type:

torch.Tensor
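As an illustration of channel-wise min-max scaling, the sketch below applies (x - min) / (max - min) to a single channel. It is not the library's implementation: the function name normalize_channel_sketch, the explicit channel_min and channel_max arguments, the channels-first layout, and the handling of a constant channel are all assumptions made for the example.

```python
import torch


def normalize_channel_sketch(tensor, channel_idx, channel_min, channel_max):
    """Min-max scale one channel of a (channels, height, width) tensor.

    Hypothetical stand-in: the real method takes only (tensor, channel_idx)
    and would look up min and max from the dataset's precomputed statistics.
    """
    out = tensor.clone()  # leave the caller's tensor untouched
    span = channel_max - channel_min
    if span == 0:
        # Constant channel: map to zeros to avoid dividing by zero (assumption).
        out[channel_idx] = 0.0
    else:
        out[channel_idx] = (tensor[channel_idx] - channel_min) / span
    return out


x = torch.tensor([[[10.0, 20.0], [30.0, 50.0]]])  # one channel, 2 x 2
y = normalize_channel_sketch(x, channel_idx=0, channel_min=10.0, channel_max=50.0)
print(y)
```

Values outside the precomputed range would map outside [0, 1] in this sketch, which is one reason to compute the min and max over all data that will be loaded.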

class PARCtorch.data.dataset.InitialConditionDataset(data_dirs, future_steps=1, t1=None, min_max_path=None, required_channels=None, validate=True)

Bases: Dataset

A PyTorch Dataset for loading only the initial condition (first time step) from preprocessed physics data.

Each sample consists of:
  • ic: Initial condition tensor of shape (channels, height, width)

  • t0: Scalar tensor (0.0)

  • t1: Tensor indicating future steps (defined by the user or calculated automatically)

  • target: Placeholder (may be None), since the model predicts the entire sequence

normalize_channel(tensor, channel_idx)

Scales one channel of the tensor to the range [0, 1] using that channel's precomputed min and max values.

Parameters:
  • tensor (torch.Tensor) – The tensor to normalize.

  • channel_idx (int) – The index of the channel to normalize.

Returns:

The normalized tensor.

Return type:

torch.Tensor

PARCtorch.data.dataset.custom_collate_fn(batch)

Custom collate function that batches samples and rearranges the target tensor so that the time dimension comes first.

Parameters:

batch – A list of tuples (ic, t0, t1, target)

Returns:

Batched tensors and fixed time indicators:
  • ic: (batch_size, channels, height, width)

  • t0: Scalar tensor (0.0)

  • t1: (future_steps,) tensor

  • target: (future_steps, batch_size, channels, height, width)

Return type:

tuple
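The rearrangement of the target tensor can be sketched as follows. This is a minimal illustration, not the library's code: collate_sketch and make_sample are hypothetical names, and the sketch assumes each per-sample target has shape (future_steps, channels, height, width) and that t0 and t1 are identical across samples.

```python
import torch


def collate_sketch(batch):
    """Stack samples and put the time axis of the targets first.

    Assumes each sample is (ic, t0, t1, target) with ic of shape
    (channels, height, width) and target of shape
    (future_steps, channels, height, width).
    """
    ics, t0s, t1s, targets = zip(*batch)
    ic = torch.stack(ics, dim=0)          # (batch, C, H, W)
    target = torch.stack(targets, dim=1)  # (future_steps, batch, C, H, W)
    # t0 and t1 are shared by every sample, so take them from the first one.
    return ic, t0s[0], t1s[0], target


def make_sample():
    """Build one dummy sample with 3 channels, 8 x 8 grid, 2 future steps."""
    ic = torch.zeros(3, 8, 8)
    t0 = torch.tensor(0.0)
    t1 = torch.arange(1.0, 3.0)
    target = torch.zeros(2, 3, 8, 8)
    return ic, t0, t1, target


batch = [make_sample() for _ in range(4)]
ic, t0, t1, target = collate_sketch(batch)
print(ic.shape, t1.shape, target.shape)
```

A function of this kind is passed to torch.utils.data.DataLoader through its collate_fn argument. Placing the time axis first lets a training loop index target[k] to get the whole batch at step k.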

PARCtorch.data.dataset.initial_condition_collate_fn(batch)

Custom collate function for InitialConditionDataset.

Parameters:

batch – A list of tuples (ic, t0, t1, target)

Returns:

Batched tensors and fixed time indicators:
  • ic: (batch_size, channels, height, width)

  • t0: Scalar tensor (0.0)

  • t1: (future_steps,) tensor

  • target: None (since target is not used)

Return type:

tuple
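To show how a collate function with this return structure plugs into a DataLoader, the sketch below uses a toy dataset in place of InitialConditionDataset. ToyInitialConditions and ic_collate_sketch are hypothetical names introduced for illustration; only Dataset, DataLoader, and its collate_fn argument are real PyTorch API.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class ToyInitialConditions(Dataset):
    """Hypothetical stand-in that yields (ic, t0, t1, target) tuples."""

    def __init__(self, num_samples=6, future_steps=3):
        self.num_samples = num_samples
        self.t1 = torch.arange(1, future_steps + 1, dtype=torch.float32)

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        ic = torch.full((2, 4, 4), float(idx))  # (channels, height, width)
        return ic, torch.tensor(0.0), self.t1, None


def ic_collate_sketch(batch):
    """Batch the initial conditions; t0 and t1 stay unbatched; target is None."""
    ics, t0s, t1s, _ = zip(*batch)
    return torch.stack(ics, dim=0), t0s[0], t1s[0], None


loader = DataLoader(ToyInitialConditions(), batch_size=3, collate_fn=ic_collate_sketch)
ic, t0, t1, target = next(iter(loader))
print(ic.shape, t0, t1, target)
```

A custom collate function is needed here because PyTorch's default collation cannot batch None values.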

PARCtorch.data.dataset.validate_data_format(data_dirs, future_steps=1, min_max_path=None, required_channels=None)

Validates the format of the data directories to ensure they contain properly formatted .npy files and corresponding min_max.json files.

Parameters:
  • data_dirs (list of str) – List of directories containing preprocessed .npy files.

  • future_steps (int) – Number of timesteps in the future the model will predict.

  • min_max_path (str, optional) – Path to the JSON file containing min and max values for each channel. If None, it will look for ‘min_max.json’ in each data directory.

  • required_channels (int, optional) – Number of channels expected in the data.

Raises:

ValueError – If any of the validation checks fail.
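The kind of check described here might look like the simplified sketch below. It is an assumption-laden illustration rather than the library's logic: check_data_dirs_sketch is a hypothetical name, and it only verifies that each directory exists, contains at least one .npy file, and has a min/max JSON file available. The real function may also inspect array shapes, channel counts, and sequence lengths.

```python
import tempfile
from pathlib import Path


def check_data_dirs_sketch(data_dirs, min_max_path=None):
    """Raise ValueError if a directory lacks .npy files or min/max stats.

    Hypothetical, simplified checks; not the library's validation logic.
    """
    for data_dir in map(Path, data_dirs):
        if not data_dir.is_dir():
            raise ValueError(f"{data_dir} is not a directory")
        if not any(data_dir.glob("*.npy")):
            raise ValueError(f"{data_dir} contains no .npy files")
        # Fall back to a per-directory min_max.json when no path is given.
        stats = Path(min_max_path) if min_max_path else data_dir / "min_max.json"
        if not stats.is_file():
            raise ValueError(f"min/max file not found: {stats}")


# An empty directory fails the check because it holds no .npy files.
with tempfile.TemporaryDirectory() as tmp:
    try:
        check_data_dirs_sketch([tmp])
        outcome = "passed"
    except ValueError:
        outcome = "failed"
print(outcome)
```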

PARCtorch.data.normalization module

PARCtorch.data.normalization.compute_min_max(data_dirs, output_file='min_max.json')

Computes the min and max values for each channel across multiple datasets.

Parameters:
  • data_dirs (list of str) – List of directories containing .npy files.

  • output_file (str) – Name of the output JSON file.

Returns:

None
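A per-channel min/max pass over several directories could be sketched as below. Several details are assumptions and not taken from PARCtorch: the function name compute_min_max_sketch, the array layout (timesteps, channels, height, width), and the JSON keys "channel_min" and "channel_max". Unlike the documented function, which returns None, the sketch also returns the statistics for convenience.

```python
import json
import tempfile
from pathlib import Path

import numpy as np


def compute_min_max_sketch(data_dirs, output_file="min_max.json"):
    """Write per-channel min and max over every .npy file in `data_dirs`.

    Assumptions: arrays are laid out as (timesteps, channels, height, width),
    and the JSON keys used here are illustrative only.
    """
    mins, maxs = None, None
    for data_dir in data_dirs:
        for path in sorted(Path(data_dir).glob("*.npy")):
            data = np.load(path)
            file_min = data.min(axis=(0, 2, 3))  # reduce all axes but channels
            file_max = data.max(axis=(0, 2, 3))
            mins = file_min if mins is None else np.minimum(mins, file_min)
            maxs = file_max if maxs is None else np.maximum(maxs, file_max)
    if mins is None:
        raise ValueError("no .npy files found")
    stats = {"channel_min": mins.tolist(), "channel_max": maxs.tolist()}
    with open(output_file, "w") as f:
        json.dump(stats, f)
    return stats


with tempfile.TemporaryDirectory() as tmp:
    frames = np.arange(16, dtype=np.float32).reshape(2, 2, 2, 2)
    np.save(Path(tmp) / "sim_0.npy", frames)
    stats = compute_min_max_sketch([tmp], output_file=str(Path(tmp) / "min_max.json"))
print(stats)
```

Keeping a running elementwise minimum and maximum means only one file is held in memory at a time, which matters when the simulations are large.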

Module contents