Data
Data validation, normalisation, binning, and reshaping.
This module provides dataset-agnostic utilities for preparing motion-capture
data before dynamic mode decomposition (DMD) analysis. Hawk-specific file loading lives in the hawk module.
Functions:
| Name | Description |
|---|---|
| validate_marker_data | Check array shape and flatten to … |
| load_sequence_data | Extract one sequence from a DataFrame by seqID. |
| remove_time_duplicates | Drop duplicate frames from a DataFrame. |
| normalise_data | Centre data by subtracting a mean shape. |
| add_average_shape | Inverse of normalise_data. |
| bin_dataframe_means | Temporal/spatial binning returning per-bin means. |
| spline_interpolation | Cubic-spline interpolation onto evenly spaced time points. |
| expand_time_sequence | Create an extended, evenly spaced time array. |
| expand_marker_sequence | Repeat frames to fill an expanded time array. |
validate_marker_data
Validate marker data shape and flatten.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | ndarray | Either … | required |
Returns:
| Name | Type | Description |
|---|---|---|
| data_flat | ndarray | Shape … |
| n_frames | int | |
| n_markers | int | |
| n_coords | int | |
Raises:
| Type | Description |
|---|---|
| ValueError | If data is not 2-D or 3-D, or the last axis is not 3. |
Source code in src/birddmd/data.py
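The shape check and flattening can be sketched as below. This is a hypothetical re-implementation (validate_marker_data_sketch is an illustrative name), assuming 3-D input is (n_frames, n_markers, 3) and that 2-D input already holds x, y, z triplets per marker; the real function's 2-D rule may differ.

```python
import numpy as np


def validate_marker_data_sketch(data: np.ndarray) -> tuple[np.ndarray, int, int, int]:
    """Hypothetical sketch: return a 2-D (n_frames, n_markers * 3) view plus dimensions."""
    data = np.asarray(data)
    if data.ndim == 3:
        # 3-D input: (n_frames, n_markers, n_coords) with n_coords required to be 3.
        if data.shape[-1] != 3:
            raise ValueError(f"last axis must be 3, got {data.shape[-1]}")
        n_frames, n_markers, n_coords = data.shape
    elif data.ndim == 2:
        # Assumption: 2-D input is already flat, with columns grouped as x, y, z per marker.
        if data.shape[1] % 3 != 0:
            raise ValueError(f"column count {data.shape[1]} is not a multiple of 3")
        n_frames, n_coords = data.shape[0], 3
        n_markers = data.shape[1] // 3
    else:
        raise ValueError(f"expected 2-D or 3-D data, got {data.ndim}-D")
    # Row-major reshape orders columns as x1, y1, z1, x2, y2, z2, ...
    data_flat = data.reshape(n_frames, n_markers * n_coords)
    return data_flat, n_frames, n_markers, n_coords


flat, n_frames, n_markers, n_coords = validate_marker_data_sketch(np.zeros((10, 4, 3)))
```

Returning the dimensions alongside the flattened array lets later steps reshape DMD output back to (n_frames, n_markers, 3).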
remove_time_duplicates
Drop duplicate rows based on column_name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Input data (not modified in place). | required |
| column_name | str | Column used to detect duplicates. | 'frameID' |
Returns:
| Type | Description |
|---|---|
| DataFrame | De-duplicated copy, index reset. |
load_sequence_data
load_sequence_data(df: DataFrame, seqID: str, marker_column_names: ndarray) -> tuple[np.ndarray, np.ndarray]
Extract marker coordinates and timestamps for one seqID.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| df | DataFrame | Must contain a … | required |
| seqID | str | Sequence identifier to filter on. | required |
| marker_column_names | ndarray | Column names for the marker coordinates. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| markers | ndarray | Shape … |
| times | ndarray | Shape … |
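The extraction step can be sketched as a boolean filter followed by column selection. This is illustrative only: load_sequence_data_sketch is a hypothetical name, and the "seqID" and "time" column names are assumptions, because the docs do not name the time column.

```python
import numpy as np
import pandas as pd


def load_sequence_data_sketch(df, seqID, marker_column_names, time_column="time"):
    # Assumption: the sequence label lives in a column called "seqID".
    rows = df[df["seqID"] == seqID]
    # One row per frame, one column per marker coordinate.
    markers = rows[list(marker_column_names)].to_numpy()
    # Assumption: timestamps live in a hypothetical "time" column.
    times = rows[time_column].to_numpy()
    return markers, times


df = pd.DataFrame({
    "seqID": ["a", "a", "b"],
    "time": [0.0, 0.01, 0.0],
    "m1_x": [1, 2, 3],
    "m1_y": [4, 5, 6],
})
markers, times = load_sequence_data_sketch(df, "a", np.array(["m1_x", "m1_y"]))
```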
normalise_data
Centre data by subtracting average_shape.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| markers | ndarray | Shape … | required |
| average_shape | ndarray | Mean shape to subtract (broadcast-compatible). | required |
Returns:
| Type | Description |
|---|---|
| ndarray | Centred data, same shape as markers. |
add_average_shape
Add the mean shape back to centred data (inverse of normalise_data).
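Because normalise_data subtracts a broadcast-compatible mean shape and add_average_shape adds it back, the two form an exact round trip. The sketch below assumes the mean shape is the per-column mean over frames; both function names are hypothetical stand-ins.

```python
import numpy as np


def normalise_data_sketch(markers, average_shape):
    # average_shape (here shape (12,)) broadcasts across the frame axis.
    return markers - average_shape


def add_average_shape_sketch(centred, average_shape):
    # Inverse operation: restore the original coordinates.
    return centred + average_shape


rng = np.random.default_rng(0)
markers = rng.normal(size=(50, 12))   # 50 frames, 4 markers x 3 coords, flattened
average_shape = markers.mean(axis=0)  # assumption: mean over frames
centred = normalise_data_sketch(markers, average_shape)
restored = add_average_shape_sketch(centred, average_shape)
```

Centring first keeps the static mean shape from dominating the leading DMD mode; adding it back lets reconstructions be compared in the original coordinate frame.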
bin_dataframe_means
bin_dataframe_means(dataframe: DataFrame, x_axis: str = 'HorzDistance', bin_size: float = DEFAULT_BIN_SIZE, numeric_cast_columns: list[str] | None = None) -> pd.DataFrame
Bin a DataFrame along x_axis and return per-bin means.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataframe | DataFrame | Input data. | required |
| x_axis | str | Column used for binning. | 'HorzDistance' |
| bin_size | float | Width of each bin. | DEFAULT_BIN_SIZE |
| numeric_cast_columns | list of str | Columns to cast to … | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | One row per bin centre with mean values for numeric columns. |
spline_interpolation
Cubic-spline interpolation onto evenly spaced time points.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| times | ndarray | Original (possibly unevenly spaced) time vector. | required |
| markers | ndarray | Shape … | required |
Returns:
| Name | Type | Description |
|---|---|---|
| new_times | ndarray | |
| new_markers | ndarray | |
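A resampling sketch using SciPy's CubicSpline, which fits every marker column at once when given axis=0. The choice of SciPy, the hypothetical name spline_interpolation_sketch, and keeping the original frame count over [times[0], times[-1]] are all assumptions. CubicSpline requires strictly increasing times, which is why duplicate frames must be removed first.

```python
import numpy as np
from scipy.interpolate import CubicSpline


def spline_interpolation_sketch(times, markers):
    # Assumption: same number of samples, spanning the original time range.
    new_times = np.linspace(times[0], times[-1], len(times))
    # axis=0 treats each column of markers as a separate signal over time.
    spline = CubicSpline(times, markers, axis=0)
    new_markers = spline(new_times)
    return new_times, new_markers


times = np.array([0.0, 0.1, 0.25, 0.3, 0.5])                # uneven sampling
markers = np.column_stack([2.0 * times, 3.0 * times + 1.0])  # two linear signals
new_times, new_markers = spline_interpolation_sketch(times, markers)
```

Evenly spaced samples matter for DMD, whose eigenvalues are interpreted relative to a fixed time step.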
expand_time_sequence
Create an expanded, evenly spaced time sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| times | ndarray | Original time vector. | required |
| expansion_factor | float | Multiply the end time and frame count by this factor. | 3.0 |
Returns:
| Type | Description |
|---|---|
| ndarray | Evenly spaced times from … |
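Following the parameter description (scale both the end time and the frame count), a plausible sketch is a single np.linspace call. The start point times[0] and the rounding of the frame count are assumptions; expand_time_sequence_sketch is a hypothetical name.

```python
import numpy as np


def expand_time_sequence_sketch(times, expansion_factor=3.0):
    # Frame count and end time both scale by the factor (rounding is an assumption).
    n_new = int(round(len(times) * expansion_factor))
    end = times[-1] * expansion_factor
    # Assumption: the expanded sequence starts where the original did.
    return np.linspace(times[0], end, n_new)


times = np.linspace(0.0, 1.0, 11)
expanded = expand_time_sequence_sketch(times, 3.0)
```

With this rule the new step, end / (n_new - 1), is close to but not exactly the original step, so downstream code should read the spacing from the returned array rather than assume it.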
expand_marker_sequence
Repeat marker frames to fill expanded_times.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| times | ndarray | Original time vector. | required |
| markers | ndarray | Original marker data, shape … | required |
| expanded_times | ndarray | Target time vector (typically from expand_time_sequence). | required |
Returns:
| Type | Description |
|---|---|
| ndarray | Expanded markers with the same trailing dimensions as markers. |
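"Repeat marker frames" can be read as cycling through the original frames until the expanded time array is filled. The sketch below takes that reading, which is an assumption (nearest-time mapping would be another); expand_marker_sequence_sketch is a hypothetical name, and here times is used only for its length.

```python
import numpy as np


def expand_marker_sequence_sketch(times, markers, expanded_times):
    # Assumption: cycle frames in order 0, 1, ..., n-1, 0, 1, ...
    n_original = len(times)
    frame_index = np.arange(len(expanded_times)) % n_original
    # Fancy indexing on axis 0 keeps all trailing dimensions of markers.
    return markers[frame_index]


times = np.linspace(0.0, 0.3, 4)
markers = np.arange(12).reshape(4, 3)
expanded_times = np.linspace(0.0, 0.9, 10)
expanded_markers = expand_marker_sequence_sketch(times, markers, expanded_times)
```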