Pandas Groupby TDS Frame

The PandasApiGroupbyTdsFrame class is returned by PandasApiTdsFrame.groupby and provides methods for applying aggregation and window operations within each group. The groupby columns also serve as the PARTITION BY clause for OLAP window functions such as rank.

agg

PandasApiGroupbyTdsFrame.agg(func, axis=0, *args, **kwargs)[source]

Aggregate each group using one or more operations.

Alias for aggregate(). See aggregate for full documentation.

See also

aggregate

Equivalent method (canonical name).

Return type:

PandasApiTdsFrame

aggregate

PandasApiGroupbyTdsFrame.aggregate(func, axis=0, *args, **kwargs)[source]

Aggregate each group using one or more operations.

Apply aggregation function(s) to each group defined by the preceding groupby call. The grouping columns always appear in the result alongside the aggregated values. When sort=True was passed to groupby (the default), the result is sorted by the grouping columns.

Parameters:
  • func (Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc, List[Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc]], Mapping[Hashable, Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc, List[Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc]]]]]) –

    Aggregation specification. Accepted forms:

    • str : A named aggregation (e.g. 'sum') applied to all non-grouping columns (or selected columns if bracket notation was used after groupby).

    • callable : A function that receives a column Series proxy and returns an aggregated value (e.g. lambda x: x.sum()).

    • np.ufunc : A NumPy universal function (e.g. np.sum).

    • list : A list of the above, producing one output column per function per input column.

    • dict : A mapping of column name → aggregation(s). Only the specified columns appear in the result.

  • axis (Union[int, str]) – Only 0 / 'index' is supported.

  • *args (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported.

  • **kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported.

Returns:

A new TDS frame with one row per group and the aggregated columns.

Return type:

PandasApiTdsFrame

Raises:
  • TypeError – If func is not a valid aggregation specification.

  • KeyError – If a column name in a dict-based func does not exist in the frame.

  • NotImplementedError – If axis is not 0 / 'index', or if extra *args / **kwargs are passed.

See also

agg

Alias for aggregate.

PandasApiTdsFrame.aggregate

Frame-level aggregation (no grouping).

Notes

Differences from pandas:

  • The result always contains the grouping columns as regular columns (never as the index), because as_index is always False.

  • Extra *args / **kwargs are not forwarded to the aggregation function.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Dict-based aggregation on groups
frame.groupby("Ship Name").aggregate(
    {"Order Id": "count"}
).head(5).to_pandas()
Ship Name Order Id
0 Alfred's Futterkiste 5
1 Alfreds Futterkiste 1
2 Ana Trujillo Emparedados y helados 4
3 Antonio Moreno Taquería 7
4 Around the Horn 13
# Multiple aggregations per column
frame.groupby("Ship Name").aggregate(
    {"Order Id": ["min", "max"]}
).head(5).to_pandas()
Ship Name min(Order Id) max(Order Id)
0 Alfred's Futterkiste 10692 11011
1 Alfreds Futterkiste 10643 10643
2 Ana Trujillo Emparedados y helados 10308 10926
3 Antonio Moreno Taquería 10365 10856
4 Around the Horn 10355 11016
# Broadcast a single function to all non-grouping columns
frame.groupby("Ship Name", sort=False).aggregate("count").head(5).to_pandas()
Ship Name Order Id Order Date Required Date Shipped Date
0 Alfred's Futterkiste 5 5 5 5
1 Alfreds Futterkiste 1 1 1 1
2 Ana Trujillo Emparedados y helados 4 4 4 4
3 Antonio Moreno Taquería 7 7 7 7
4 Around the Horn 13 13 13 13

count

PandasApiGroupbyTdsFrame.count()[source]

Count non-null values within each group.

Convenience method equivalent to aggregate('count') on the groupby object. Returns the number of non-null values per column for each group.

Returns:

A new TDS frame with one row per group.

Return type:

PandasApiTdsFrame

See also

sum

Compute group sums.

aggregate

General grouped aggregation.

PandasApiTdsFrame.count

Frame-level count (no grouping).

Notes

Differences from the frame-level PandasApiTdsFrame.count():

  • The groupby count takes no parameters (no axis, numeric_only, or **kwargs), matching the pandas DataFrameGroupBy.count signature.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].count().head(5).to_pandas()

cume_dist_legend_ext

PandasApiGroupbyTdsFrame.cume_dist_legend_ext(ascending=True)[source]

PyLegend extension (not present in pandas).

Compute the cumulative distribution within each group, equivalent to SQL CUME_DIST() OVER (PARTITION BY ... ORDER BY col) and Pure cumulativeDistribution.

Return type:

PandasApiTdsFrame

expanding

PandasApiGroupbyTdsFrame.expanding(min_periods=1, method=None, order_by=None, ascending=True)[source]
Return type:

PandasApiWindowTdsFrame

max

PandasApiGroupbyTdsFrame.max(numeric_only=False, min_count=-1, engine=None, engine_kwargs=None)[source]

Compute the maximum value within each group.

Convenience method equivalent to aggregate('max') on the groupby object. For string columns, returns the lexicographically largest value per group.

Parameters:
  • numeric_only (bool) – Must be False. True is not supported.

  • min_count (int) – Must be -1. Other values are not supported.

  • engine (Optional[str]) – Not supported. Must be None.

  • engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.

Returns:

A new TDS frame with one row per group.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If any parameter is set to an unsupported value.

See also

min

Compute group minimums.

aggregate

General grouped aggregation.

PandasApiTdsFrame.max

Frame-level max (no grouping).

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].max().head(5).to_pandas()

mean

PandasApiGroupbyTdsFrame.mean(numeric_only=False, engine=None, engine_kwargs=None)[source]

Compute the mean of values within each group.

Convenience method equivalent to aggregate('mean') on the groupby object.

Parameters:
  • numeric_only (bool) – Must be False. True is not supported.

  • engine (Optional[str]) – Not supported. Must be None.

  • engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.

Returns:

A new TDS frame with one row per group.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If any parameter is set to an unsupported value.

See also

aggregate

General grouped aggregation.

PandasApiTdsFrame.mean

Frame-level mean (no grouping).

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].mean().head(5).to_pandas()

median

PandasApiGroupbyTdsFrame.median()[source]

Compute the median of each numeric column within each group.

Applies PERCENTILE_CONT(0.5) at the SQL level for each numeric column. Non-numeric columns are excluded automatically.

Returns:

A new TDS frame with one row per group.

Return type:

PandasApiTdsFrame

See also

mean

Compute group means.

aggregate

General grouped aggregation.

Notes

Differences from pandas:

  • Only numeric columns are included; non-numeric columns are silently skipped.

  • The pandas numeric_only parameter is not available; the behaviour is always numeric_only=True.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].median().head(5).to_pandas()

min

PandasApiGroupbyTdsFrame.min(numeric_only=False, min_count=-1, engine=None, engine_kwargs=None)[source]

Compute the minimum value within each group.

Convenience method equivalent to aggregate('min') on the groupby object. For string columns, returns the lexicographically smallest value per group.

Parameters:
  • numeric_only (bool) – Must be False. True is not supported.

  • min_count (int) – Must be -1. Other values are not supported.

  • engine (Optional[str]) – Not supported. Must be None.

  • engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.

Returns:

A new TDS frame with one row per group.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If any parameter is set to an unsupported value.

See also

max

Compute group maximums.

aggregate

General grouped aggregation.

PandasApiTdsFrame.min

Frame-level min (no grouping).

Notes

Differences from the frame-level PandasApiTdsFrame.min():

  • The min_count parameter defaults to -1 (matching the pandas DataFrameGroupBy.min default) rather than being absent.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].min().head(5).to_pandas()

ntile_legend_ext

PandasApiGroupbyTdsFrame.ntile_legend_ext(num_buckets, ascending=True)[source]

PyLegend extension (not present in pandas).

Compute the NTILE bucket within each group, equivalent to SQL NTILE(n) OVER (PARTITION BY ... ORDER BY col) and Pure ntile.

Return type:

PandasApiTdsFrame

rank

PandasApiGroupbyTdsFrame.rank(method='min', ascending=True, na_option='bottom', pct=False, axis=0)[source]

Compute the rank of values within each group.

Rank each value within its group defined by the preceding groupby call. The grouping columns act as the PARTITION BY clause in the underlying SQL window function. Only the ranked (non-grouping) columns appear in the result.

Parameters:
  • method (str) –

    How to rank equal values:

    • 'min' : Lowest rank in the group of ties (SQL RANK()).

    • 'first' : Ranks assigned in order of appearance within the group (SQL ROW_NUMBER()).

    • 'dense' : Like 'min' but ranks always increase by 1, no gaps (SQL DENSE_RANK()).

  • ascending (bool) – Whether to rank in ascending order. False ranks in descending order.

  • na_option (str) – How to rank null values. Only 'bottom' is supported. 'keep' and 'top' raise NotImplementedError.

  • pct (bool) – If True, compute percentage ranks (SQL PERCENT_RANK()). Result columns are of float type. Can only be used with method='min'.

  • axis (Union[int, str]) – Only 0 / 'index' is supported.

Returns:

A new TDS frame containing only the ranked columns (the grouping columns are not included in the output). Each column contains integer ranks (or float when pct=True).

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If method is not one of 'min', 'first', 'dense'. If na_option is not 'bottom'. If pct=True with a method other than 'min'. If axis is not 0 or 'index'.

See also

PandasApiTdsFrame.rank

Frame-level rank (no partitioning).

aggregate

Grouped aggregation.

Notes

Differences from pandas:

  • The 'average' and 'max' ranking methods are not supported.

  • na_option only supports 'bottom'.

  • pct=True is only supported with method='min'.

  • The result contains only the ranked columns, not the grouping columns. In pandas, DataFrameGroupBy.rank returns a frame with the same shape as the input, preserving all columns. Here, grouping columns are excluded from the output. To preserve all columns, use bracket assignment with a single-column selection: frame["rank"] = frame.groupby("grp")["col"].rank().

  • numeric_only is not exposed in the groupby rank signature (it is always False).

  • Combining multiple rank calls in a single expression is not supported. Compute them in separate steps.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Rank within groups (only ranked columns in output)
frame.groupby("Ship Name")[["Order Id"]].rank().head(5).to_pandas()
Order Id
0 1
1 1
2 1
3 1
4 1
# Append a grouped rank column to the frame
frame["Order Rank"] = frame.groupby(
    "Ship Name"
)["Order Id"].rank()
frame.head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name Order Rank
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier 1
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten 1
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes 1
3 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock 1
4 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices 1
# Dense rank descending within groups
frame.groupby("Ship Name")[["Order Id"]].rank(
    method="dense", ascending=False
).head(5).to_pandas()
Order Id
0 5
1 6
2 14
3 10
4 12

rolling

PandasApiGroupbyTdsFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, closed=None, step=None, method=None, order_by=None, ascending=True)[source]
Return type:

PandasApiWindowTdsFrame

shift

PandasApiGroupbyTdsFrame.shift(order_by, periods=1, freq=None, axis=0, fill_value=None, suffix=None)[source]

Shift values by desired number of periods within each group.

Replace every column’s values with their shifted values, computing the shift independently for each group. Because the underlying TDS is inherently unordered, this requires an explicit order_by parameter to define the ordering for the window function partitioned by the group keys.

Parameters:
  • order_by (Union[str, Sequence[str]]) – Column name(s) to order the frame by within each group before applying the shift. Unlike pandas, this is required to ensure deterministic output. All specified columns must be present in the base frame.

  • periods (Union[int, Sequence[int]]) – Number of periods to shift. Currently, only 1 (shift down, SQL LAG) and -1 (shift up, SQL LEAD) are supported. If a sequence is provided, it cannot contain duplicate values.

  • freq (Union[str, int, None]) – Not supported. Must be None.

  • axis (Union[int, str]) – Axis to shift along. Only 0 / 'index' is supported.

  • fill_value (Optional[Hashable]) – Not supported. Must be None. Missing values introduced by the shift will always be null.

  • suffix (Optional[str]) – If provided, renames the resulting shifted columns by appending this string to the original column names. This argument can only be used if periods is a sequence (not a single integer).

Returns:

A new TDS frame with the shifted columns computed per group.

Return type:

PandasApiTdsFrame

Raises:
  • NotImplementedError – If periods contains any values other than 1 or -1. If freq is not None. If axis is not 0 or 'index'. If fill_value is not None.

  • ValueError – If any column specified in order_by is not present in the frame. If periods contains duplicate values. If suffix is specified but periods is a single integer.

See also

PandasApiTdsFrame.shift

Shift values for the entire frame.

Notes

Differences from pandas:

  • The order_by parameter is mandatory. In pandas, shift relies on the implicit order of the dataframe’s index. Here, because it translates to SQL, an explicit order must be provided.

  • periods is strictly limited to 1 or -1. Arbitrary integer shifts are not supported.

  • fill_value is not supported and must remain None.

  • The freq parameter is not supported and must be None.

  • axis=1 (shifting horizontally across columns) is not supported.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Shift the entire frame down by 1 row within each 'Ship Name' group,
frame.groupby("Ship Name")[["Order Date", "Shipped Date"]].shift(
    order_by="Order Date",
    periods=1
).head(3).to_pandas()
Order Date Shipped Date
0 NaT NaT
1 NaT NaT
2 NaT NaT

std

PandasApiGroupbyTdsFrame.std(ddof=1, engine=None, engine_kwargs=None, numeric_only=False)[source]

Compute the standard deviation within each group.

Convenience method equivalent to aggregate('std') on the groupby object. Supports both ddof=1 (sample, maps to STDDEV_SAMP) and ddof=0 (population, maps to STDDEV_POP) at the SQL level.

Parameters:
  • ddof (int) – Degrees of freedom. 1 for sample standard deviation (STDDEV_SAMP), 0 for population standard deviation (STDDEV_POP).

  • engine (Optional[str]) – Not supported. Must be None.

  • engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.

  • numeric_only (bool) – Must be False. True is not supported.

Returns:

A new TDS frame with one row per group.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If ddof is not 0 or 1, or if engine, engine_kwargs, or numeric_only are set to unsupported values.

See also

var

Compute group variances.

aggregate

General grouped aggregation.

PandasApiTdsFrame.std

Frame-level std (no grouping).

Notes

Differences from pandas:

  • Only ddof=0 and ddof=1 are supported. Other values raise NotImplementedError.

  • engine, engine_kwargs, and numeric_only are not supported.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].std().head(5).to_pandas()

sum

PandasApiGroupbyTdsFrame.sum(numeric_only=False, min_count=0, engine=None, engine_kwargs=None)[source]

Compute the sum of values within each group.

Convenience method equivalent to aggregate('sum') on the groupby object.

Parameters:
  • numeric_only (bool) – Must be False. True is not supported.

  • min_count (int) – Must be 0. Non-zero values are not supported.

  • engine (Optional[str]) – Not supported. Must be None.

  • engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.

Returns:

A new TDS frame with one row per group.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If any parameter is set to an unsupported value.

See also

aggregate

General grouped aggregation.

PandasApiTdsFrame.sum

Frame-level sum (no grouping).

Notes

Differences from the frame-level PandasApiTdsFrame.sum():

  • No axis, skipna, or **kwargs parameters. The groupby convenience methods follow the pandas DataFrameGroupBy signature, which omits these.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].sum().head(5).to_pandas()

var

PandasApiGroupbyTdsFrame.var(ddof=1, engine=None, engine_kwargs=None, numeric_only=False)[source]

Compute the variance within each group.

Convenience method equivalent to aggregate('var') on the groupby object. Supports both ddof=1 (sample, maps to VAR_SAMP) and ddof=0 (population, maps to VAR_POP) at the SQL level.

Parameters:
  • ddof (int) – Degrees of freedom. 1 for sample variance (VAR_SAMP), 0 for population variance (VAR_POP).

  • engine (Optional[str]) – Not supported. Must be None.

  • engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.

  • numeric_only (bool) – Must be False. True is not supported.

Returns:

A new TDS frame with one row per group.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If ddof is not 0 or 1, or if engine, engine_kwargs, or numeric_only are set to unsupported values.

See also

std

Compute group standard deviations.

aggregate

General grouped aggregation.

PandasApiTdsFrame.var

Frame-level var (no grouping).

Notes

Differences from pandas:

  • Only ddof=0 and ddof=1 are supported. Other values raise NotImplementedError.

  • engine, engine_kwargs, and numeric_only are not supported.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].var().head(5).to_pandas()

window_frame_legend_ext

PandasApiGroupbyTdsFrame.window_frame_legend_ext(frame_spec=<pylegend.core.language.pandas_api.pandas_api_frame_spec.RowsBetween object>, order_by=None, ascending=True)[source]

PyLegend extension (not present in pandas).

Create a custom window specification with explicit control over the window frame. When called on a groupby frame the grouping columns are automatically used as PARTITION BY columns.

Parameters:
  • frame_spec (Optional[FrameSpec]) – A RowsBetween or RangeBetween specification object. None means no frame clause (just PARTITION BY + ORDER BY).

  • order_by (Union[str, Sequence[str], None]) – Column name(s) to use for ORDER BY within the window.

  • ascending (Union[bool, Sequence[bool]]) – Sort direction(s) for the ORDER BY columns.

Return type:

PandasApiWindowTdsFrame