PARCtorch.data package
Submodules
PARCtorch.data.dataset module
- class PARCtorch.data.dataset.GenericPhysicsDataset(data_dirs, future_steps=1, min_max_path=None, required_channels=None, validate=True)
Bases:
Dataset
A generic PyTorch Dataset for loading preprocessed physics data with sliding window sample generation and channel-wise normalization using precomputed min and max values.
The class is not tied to one dataset: the caller supplies the data directories, the number of channels, and the other relevant parameters.
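The sliding window idea can be sketched as follows. This is an illustration only, not the PARCtorch implementation: the helper name `sliding_windows` and the assumed array layout `(timesteps, channels, height, width)` are hypothetical, and NumPy arrays stand in for the loaded data.

```python
import numpy as np

def sliding_windows(sequence, future_steps=1):
    """Yield (ic, target) pairs from one simulation.

    Assumes `sequence` has shape (timesteps, channels, height, width);
    this layout is an assumption made for the sketch.
    """
    num_samples = sequence.shape[0] - future_steps
    for start in range(num_samples):
        ic = sequence[start]  # (channels, height, width)
        # The next `future_steps` frames after the initial condition.
        target = sequence[start + 1 : start + 1 + future_steps]
        yield ic, target

# Five timesteps, two channels, 3x3 grid.
seq = np.arange(5 * 2 * 3 * 3, dtype=np.float32).reshape(5, 2, 3, 3)
samples = list(sliding_windows(seq, future_steps=2))
```

Under these assumptions a sequence of T timesteps yields T - future_steps samples, so longer prediction horizons leave fewer training windows per simulation.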
- normalize_channel(tensor, channel_idx)
Normalizes a specific channel of the tensor to the range [0, 1].
- Parameters:
tensor (torch.Tensor) – The tensor to normalize.
channel_idx (int) – The index of the channel to normalize.
- Returns:
The normalized tensor.
- Return type:
torch.Tensor
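Min-max normalization of one channel can be sketched as below. The function name `normalize_channel_sketch`, the explicit `channel_min` and `channel_max` lists, and the handling of a constant channel are assumptions for illustration; the actual method presumably reads the precomputed values from the dataset object.

```python
import torch

def normalize_channel_sketch(tensor, channel_idx, channel_min, channel_max):
    """Map `tensor` to [0, 1] using the min and max stored for `channel_idx`."""
    lo = channel_min[channel_idx]
    hi = channel_max[channel_idx]
    if hi == lo:
        # Constant channel: avoid division by zero (a choice made for this sketch).
        return torch.zeros_like(tensor)
    return (tensor - lo) / (hi - lo)

# Hypothetical precomputed statistics for two channels.
channel_min = [0.0, 300.0]
channel_max = [10.0, 500.0]

temperature = torch.tensor([[300.0, 400.0], [450.0, 500.0]])
normalized = normalize_channel_sketch(temperature, 1, channel_min, channel_max)
```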
- class PARCtorch.data.dataset.InitialConditionDataset(data_dirs, future_steps=1, t1=None, min_max_path=None, required_channels=None, validate=True)
Bases:
Dataset
A PyTorch Dataset for loading only the initial condition (first time step) from preprocessed physics data.
- Each sample consists of:
ic: Initial condition tensor of shape (channels, height, width)
t0: Scalar tensor (0.0)
t1: Tensor indicating future steps (defined by the user or calculated automatically)
target: A placeholder, or None, since the model predicts the entire sequence
- normalize_channel(tensor, channel_idx)
Normalizes a specific channel of the tensor to the range [0, 1].
- Parameters:
tensor (torch.Tensor) – The tensor to normalize.
channel_idx (int) – The index of the channel to normalize.
- Returns:
The normalized tensor.
- Return type:
torch.Tensor
- PARCtorch.data.dataset.custom_collate_fn(batch)
Custom collate function to rearrange the target tensor.
- Parameters:
batch – A list of tuples (ic, t0, t1, target)
- Returns:
- Batched tensors and fixed time indicators:
ic: (batch_size, channels, height, width)
t0: Scalar tensor (0.0)
t1: (future_steps,) tensor
target: (future_steps, batch_size, channels, height, width)
- Return type:
tuple
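The rearrangement of the target tensor can be sketched as follows. This is not the library's code: `collate_sketch` is a hypothetical name, and it assumes each per-sample target has shape (future_steps, channels, height, width) and that t0 and t1 are identical across the batch, so the first sample's values are reused.

```python
import torch

def collate_sketch(batch):
    """Combine a list of (ic, t0, t1, target) tuples into batched tensors."""
    ics, t0s, t1s, targets = zip(*batch)
    ic = torch.stack(ics, dim=0)  # (batch_size, channels, height, width)
    # Stacking along dim=1 puts the time axis first:
    # (future_steps, batch_size, channels, height, width)
    target = torch.stack(targets, dim=1)
    # t0 and t1 are assumed to be the same for every sample.
    return ic, t0s[0], t1s[0], target

future_steps, channels, height, width = 3, 2, 4, 4
batch = [
    (
        torch.rand(channels, height, width),
        torch.tensor(0.0),
        torch.arange(1, future_steps + 1, dtype=torch.float32),
        torch.rand(future_steps, channels, height, width),
    )
    for _ in range(5)
]
ic, t0, t1, target = collate_sketch(batch)
```

A function of this shape would be passed to `torch.utils.data.DataLoader` through its `collate_fn` argument. Putting the time axis first lets a training loop index `target[k]` to get the whole batch at step k.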
- PARCtorch.data.dataset.initial_condition_collate_fn(batch)
Custom collate function for InitialConditionDataset.
- Parameters:
batch – A list of tuples (ic, t0, t1, target)
- Returns:
- Batched tensors and fixed time indicators:
ic: (batch_size, channels, height, width)
t0: Scalar tensor (0.0)
t1: (future_steps,) tensor
target: None (since target is not used)
- Return type:
tuple
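For the initial-condition case, a collate function only needs to stack the `ic` tensors and pass the time indicators through. The sketch below uses the hypothetical name `ic_collate_sketch` and assumes t0 and t1 are shared by all samples in the batch.

```python
import torch

def ic_collate_sketch(batch):
    """Batch (ic, t0, t1, target) tuples, discarding the unused target."""
    ics, t0s, t1s, _ = zip(*batch)
    ic = torch.stack(ics, dim=0)  # (batch_size, channels, height, width)
    return ic, t0s[0], t1s[0], None

t1_shared = torch.arange(1, 4, dtype=torch.float32)  # three future steps
ic_batch = [(torch.rand(2, 4, 4), torch.tensor(0.0), t1_shared, None) for _ in range(3)]
ic_only, t0_ic, t1_ic, target_ic = ic_collate_sketch(ic_batch)
```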
- PARCtorch.data.dataset.validate_data_format(data_dirs, future_steps=1, min_max_path=None, required_channels=None)
Validates the format of the data directories to ensure they contain properly formatted .npy files and corresponding min_max.json files.
- Parameters:
data_dirs (list of str) – List of directories containing preprocessed .npy files.
future_steps (int) – Number of timesteps in the future the model will predict.
min_max_path (str, optional) – Path to the JSON file containing min and max values for each channel. If None, it will look for ‘min_max.json’ in each data directory.
required_channels (int, optional) – Number of channels expected in the data.
- Raises:
ValueError – If any of the validation checks fail.
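The kind of checks described above might look like the sketch below. The specific checks, the error messages, the assumed array layout (timesteps, channels, height, width), and the JSON keys written in the demo are all assumptions for illustration; the real function may check more or different conditions.

```python
import json
import os
import tempfile

import numpy as np

def validate_sketch(data_dirs, future_steps=1, min_max_path=None, required_channels=None):
    """Raise ValueError if a directory does not look like usable data."""
    for data_dir in data_dirs:
        if not os.path.isdir(data_dir):
            raise ValueError(f"Not a directory: {data_dir}")
        npy_files = sorted(f for f in os.listdir(data_dir) if f.endswith(".npy"))
        if not npy_files:
            raise ValueError(f"No .npy files in {data_dir}")
        # Fall back to a per-directory min_max.json when no path is given.
        stats_path = min_max_path or os.path.join(data_dir, "min_max.json")
        if not os.path.isfile(stats_path):
            raise ValueError(f"Missing min/max file: {stats_path}")
        for name in npy_files:
            array = np.load(os.path.join(data_dir, name))
            if array.ndim != 4:  # assumed layout: (timesteps, channels, H, W)
                raise ValueError(f"{name}: expected 4 dimensions, got {array.ndim}")
            if array.shape[0] <= future_steps:
                raise ValueError(f"{name}: too few timesteps for future_steps={future_steps}")
            if required_channels is not None and array.shape[1] != required_channels:
                raise ValueError(f"{name}: expected {required_channels} channels")

with tempfile.TemporaryDirectory() as tmp:
    np.save(os.path.join(tmp, "run_0.npy"), np.zeros((4, 2, 8, 8), dtype=np.float32))
    with open(os.path.join(tmp, "min_max.json"), "w") as fh:
        json.dump({"channel_min": [0.0, 0.0], "channel_max": [1.0, 1.0]}, fh)
    validate_sketch([tmp], future_steps=1, required_channels=2)  # passes silently
    try:
        validate_sketch([tmp], future_steps=1, required_channels=3)
        channel_check_failed = False
    except ValueError:
        channel_check_failed = True
```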
PARCtorch.data.normalization module
- PARCtorch.data.normalization.compute_min_max(data_dirs, output_file='min_max.json')
Computes the min and max values for each channel across multiple datasets.
- Parameters:
data_dirs (list of str) – List of directories containing .npy files.
output_file (str) – Name of the output JSON file.
- Returns:
None
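A per-channel min/max pass over several directories can be sketched as below. The name `compute_min_max_sketch`, the assumed array layout (timesteps, channels, height, width), and the JSON keys `channel_min` and `channel_max` are hypothetical; the file format PARCtorch actually writes is not specified here.

```python
import json
import os
import tempfile

import numpy as np

def compute_min_max_sketch(data_dirs, output_file="min_max.json"):
    """Write per-channel min and max over all .npy files to a JSON file."""
    channel_min = None
    channel_max = None
    for data_dir in data_dirs:
        for name in sorted(os.listdir(data_dir)):
            if not name.endswith(".npy"):
                continue
            array = np.load(os.path.join(data_dir, name))
            # Reduce over time and space, keeping the channel axis.
            file_min = array.min(axis=(0, 2, 3))
            file_max = array.max(axis=(0, 2, 3))
            channel_min = file_min if channel_min is None else np.minimum(channel_min, file_min)
            channel_max = file_max if channel_max is None else np.maximum(channel_max, file_max)
    if channel_min is None:
        raise ValueError("No .npy files found")
    with open(output_file, "w") as fh:
        json.dump({"channel_min": channel_min.tolist(),
                   "channel_max": channel_max.tolist()}, fh, indent=2)

with tempfile.TemporaryDirectory() as dir_a, tempfile.TemporaryDirectory() as dir_b:
    a = np.zeros((2, 2, 2, 2))
    a[:, 0] = -1.0  # channel 0 is -1 everywhere in the first dataset
    b = np.zeros((2, 2, 2, 2))
    b[:, 1] = 5.0   # channel 1 is 5 everywhere in the second dataset
    np.save(os.path.join(dir_a, "a.npy"), a)
    np.save(os.path.join(dir_b, "b.npy"), b)
    out_path = os.path.join(dir_a, "min_max.json")
    compute_min_max_sketch([dir_a, dir_b], output_file=out_path)
    with open(out_path) as fh:
        stats = json.load(fh)
```

Keeping running minima and maxima means only one file is in memory at a time, which matters when the simulations are large.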