Pandas Groupby TDS Frame

The PandasApiGroupbyTdsFrame class is returned by PandasApiTdsFrame.groupby and provides methods for applying aggregation and window operations within each group. The groupby columns also serve as the PARTITION BY clause for OLAP window functions such as rank.

agg

PandasApiGroupbyTdsFrame.agg(func, axis=0, *args, **kwargs)[source]

Aggregate each group using one or more operations.

Alias for aggregate(). See aggregate for full documentation.

See also

aggregate: Equivalent method (canonical name).

Return type:: PandasApiTdsFrame

aggregate

PandasApiGroupbyTdsFrame.aggregate(func, axis=0, *args, **kwargs)[source]

Aggregate each group using one or more operations.

Apply aggregation function(s) to each group defined by the preceding groupby call. The grouping columns always appear in the result alongside the aggregated values. When sort=True was passed to groupby (the default), the result is sorted by the grouping columns.

Parameters:

func (Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc, List[Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc]], Mapping[Hashable, Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc, List[Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc]]]]]) –
Aggregation specification. Accepted forms:
- str : A named aggregation (e.g. 'sum') applied to all non-grouping columns (or selected columns if bracket notation was used after groupby).
- callable : A function that receives a column Series proxy and returns an aggregated value (e.g. lambda x: x.sum()).
- np.ufunc : A NumPy universal function (e.g. np.sum).
- list : A list of the above, producing one output column per function per input column.
- dict : A mapping of column name → aggregation(s). Only the specified columns appear in the result.
axis (Union[int, str]) – Only 0 / 'index' is supported.
*args (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported.
**kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported.

Returns:

A new TDS frame with one row per group and the aggregated columns.

Return type:

PandasApiTdsFrame

Raises:

TypeError – If func is not a valid aggregation specification.
KeyError – If a column name in a dict-based func does not exist in the frame.
NotImplementedError – If axis is not 0 / 'index', or if extra *args / **kwargs are passed.

See also

agg: Alias for aggregate.
PandasApiTdsFrame.aggregate: Frame-level aggregation (no grouping).

Notes

Differences from pandas:

The result always contains the grouping columns as regular columns (never as the index), because as_index is always False.
Extra *args / **kwargs are not forwarded to the aggregation function.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Dict-based aggregation on groups
frame.groupby("Ship Name").aggregate(
    {"Order Id": "count"}
).head(5).to_pandas()

	Ship Name	Order Id
0	Alfred's Futterkiste	5
1	Alfreds Futterkiste	1
2	Ana Trujillo Emparedados y helados	4
3	Antonio Moreno Taquería	7
4	Around the Horn	13

# Multiple aggregations per column
frame.groupby("Ship Name").aggregate(
    {"Order Id": ["min", "max"]}
).head(5).to_pandas()

	Ship Name	min(Order Id)	max(Order Id)
0	Alfred's Futterkiste	10692	11011
1	Alfreds Futterkiste	10643	10643
2	Ana Trujillo Emparedados y helados	10308	10926
3	Antonio Moreno Taquería	10365	10856
4	Around the Horn	10355	11016

# Broadcast a single function to all non-grouping columns
frame.groupby("Ship Name", sort=False).aggregate("count").head(5).to_pandas()

	Ship Name	Order Id	Order Date	Required Date	Shipped Date
0	Alfred's Futterkiste	5	5	5	5
1	Alfreds Futterkiste	1	1	1	1
2	Ana Trujillo Emparedados y helados	4	4	4	4
3	Antonio Moreno Taquería	7	7	7	7
4	Around the Horn	13	13	13	13

count

PandasApiGroupbyTdsFrame.count()[source]

Count non-null values within each group.

Convenience method equivalent to aggregate('count') on the groupby object. Returns the number of non-null values per column for each group.

Returns:: A new TDS frame with one row per group.
Return type:: PandasApiTdsFrame

See also

sum: Compute group sums.
aggregate: General grouped aggregation.
PandasApiTdsFrame.count: Frame-level count (no grouping).

Notes

Differences from the frame-level PandasApiTdsFrame.count():

The groupby count takes no parameters (no axis, numeric_only, or **kwargs), matching the pandas DataFrameGroupBy.count signature.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.groupby("Ship Name")["Order Id"].count().head(5).to_pandas()

cume_dist_legend_ext

PandasApiGroupbyTdsFrame.cume_dist_legend_ext(ascending=True)[source]

PyLegend extension (not present in pandas).

Compute the cumulative distribution within each group, equivalent to SQL CUME_DIST() OVER (PARTITION BY ... ORDER BY col) and Pure cumulativeDistribution.

Return type:: PandasApiTdsFrame

expanding

PandasApiGroupbyTdsFrame.expanding(min_periods=1, method=None, order_by=None, ascending=True)[source]

Return type:: PandasApiWindowTdsFrame

max

PandasApiGroupbyTdsFrame.max(numeric_only=False, min_count=-1, engine=None, engine_kwargs=None)[source]

Compute the maximum value within each group.

Convenience method equivalent to aggregate('max') on the groupby object. For string columns, returns the lexicographically largest value per group.

Parameters:

numeric_only (bool) – Must be False. True is not supported.
min_count (int) – Must be -1. Other values are not supported.
engine (Optional[str]) – Not supported. Must be None.
engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.

Returns:

A new TDS frame with one row per group.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If any parameter is set to an unsupported value.

See also

min: Compute group minimums.
aggregate: General grouped aggregation.
PandasApiTdsFrame.max: Frame-level max (no grouping).

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.groupby("Ship Name")["Order Id"].max().head(5).to_pandas()

mean

PandasApiGroupbyTdsFrame.mean(numeric_only=False, engine=None, engine_kwargs=None)[source]

Compute the mean of values within each group.

Convenience method equivalent to aggregate('mean') on the groupby object.

Parameters:

numeric_only (bool) – Must be False. True is not supported.
engine (Optional[str]) – Not supported. Must be None.
engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.

Returns:

A new TDS frame with one row per group.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If any parameter is set to an unsupported value.

See also

aggregate: General grouped aggregation.
PandasApiTdsFrame.mean: Frame-level mean (no grouping).

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.groupby("Ship Name")["Order Id"].mean().head(5).to_pandas()

median

PandasApiGroupbyTdsFrame.median()[source]

Compute the median of each numeric column within each group.

Applies PERCENTILE_CONT(0.5) at the SQL level for each numeric column. Non-numeric columns are excluded automatically.

Returns:: A new TDS frame with one row per group.
Return type:: PandasApiTdsFrame

See also

mean: Compute group means.
aggregate: General grouped aggregation.

Notes

Differences from pandas:

Only numeric columns are included; non-numeric columns are silently skipped.
The pandas numeric_only parameter is not available; the behaviour is always numeric_only=True.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.groupby("Ship Name")["Order Id"].median().head(5).to_pandas()

min

PandasApiGroupbyTdsFrame.min(numeric_only=False, min_count=-1, engine=None, engine_kwargs=None)[source]

Compute the minimum value within each group.

Convenience method equivalent to aggregate('min') on the groupby object. For string columns, returns the lexicographically smallest value per group.

Parameters:

numeric_only (bool) – Must be False. True is not supported.
min_count (int) – Must be -1. Other values are not supported.
engine (Optional[str]) – Not supported. Must be None.
engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.

Returns:

A new TDS frame with one row per group.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If any parameter is set to an unsupported value.

See also

max: Compute group maximums.
aggregate: General grouped aggregation.
PandasApiTdsFrame.min: Frame-level min (no grouping).

Notes

Differences from the frame-level PandasApiTdsFrame.min():

The min_count parameter defaults to -1 (matching the pandas DataFrameGroupBy.min default) rather than being absent.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.groupby("Ship Name")["Order Id"].min().head(5).to_pandas()

ntile_legend_ext

PandasApiGroupbyTdsFrame.ntile_legend_ext(num_buckets, ascending=True)[source]

PyLegend extension (not present in pandas).

Compute the NTILE bucket within each group, equivalent to SQL NTILE(n) OVER (PARTITION BY ... ORDER BY col) and Pure ntile.

Return type:: PandasApiTdsFrame

rank

PandasApiGroupbyTdsFrame.rank(method='min', ascending=True, na_option='bottom', pct=False, axis=0)[source]

Compute the rank of values within each group.

Rank each value within its group defined by the preceding groupby call. The grouping columns act as the PARTITION BY clause in the underlying SQL window function. Only the ranked (non-grouping) columns appear in the result.

Parameters:

method (str) –
How to rank equal values:
- 'min' : Lowest rank in the group of ties (SQL RANK()).
- 'first' : Ranks assigned in order of appearance within the group (SQL ROW_NUMBER()).
- 'dense' : Like 'min' but ranks always increase by 1, no gaps (SQL DENSE_RANK()).
ascending (bool) – Whether to rank in ascending order. False ranks in descending order.
na_option (str) – How to rank null values. Only 'bottom' is supported. 'keep' and 'top' raise NotImplementedError.
pct (bool) – If True, compute percentage ranks (SQL PERCENT_RANK()). Result columns are of float type. Can only be used with method='min'.
axis (Union[int, str]) – Only 0 / 'index' is supported.

Returns:

A new TDS frame containing only the ranked columns (the grouping columns are not included in the output). Each column contains integer ranks (or float when pct=True).

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If method is not one of 'min', 'first', 'dense'. If na_option is not 'bottom'. If pct=True with a method other than 'min'. If axis is not 0 or 'index'.

See also

PandasApiTdsFrame.rank: Frame-level rank (no partitioning).
aggregate: Grouped aggregation.

Notes

Differences from pandas:

The 'average' and 'max' ranking methods are not supported.
na_option only supports 'bottom'.
pct=True is only supported with method='min'.
The result contains only the ranked columns, not the grouping columns. In pandas, DataFrameGroupBy.rank returns a frame with the same shape as the input, preserving all columns. Here, grouping columns are excluded from the output. To preserve all columns, use bracket assignment with a single-column selection: frame["rank"] = frame.groupby("grp")["col"].rank().
numeric_only is not exposed in the groupby rank signature (it is always False).
Combining multiple rank calls in a single expression is not supported. Compute them in separate steps.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Rank within groups (only ranked columns in output)
frame.groupby("Ship Name")[["Order Id"]].rank().head(5).to_pandas()

	Order Id
0	1
1	1
2	1
3	1
4	1

# Append a grouped rank column to the frame
frame["Order Rank"] = frame.groupby(
    "Ship Name"
)["Order Id"].rank()
frame.head(5).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name	Order Rank
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier	1
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten	1
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes	1
3	10251	1996-07-08	1996-08-05	1996-07-15	Victuailles en stock	1
4	10252	1996-07-09	1996-08-06	1996-07-11	Suprêmes délices	1

# Dense rank descending within groups
frame.groupby("Ship Name")[["Order Id"]].rank(
    method="dense", ascending=False
).head(5).to_pandas()

	Order Id
0	5
1	6
2	14
3	10
4	12

rolling

PandasApiGroupbyTdsFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, closed=None, step=None, method=None, order_by=None, ascending=True)[source]

Return type:: PandasApiWindowTdsFrame

shift

PandasApiGroupbyTdsFrame.shift(order_by, periods=1, freq=None, axis=0, fill_value=None, suffix=None)[source]

Shift values by desired number of periods within each group.

Replace every column’s values with their shifted values, computing the shift independently for each group. Because the underlying TDS is inherently unordered, this requires an explicit order_by parameter to define the ordering for the window function partitioned by the group keys.

Parameters:

order_by (Union[str, Sequence[str]]) – Column name(s) to order the frame by within each group before applying the shift. Unlike pandas, this is required to ensure deterministic output. All specified columns must be present in the base frame.
periods (Union[int, Sequence[int]]) – Number of periods to shift. Currently, only 1 (shift down, SQL LAG) and -1 (shift up, SQL LEAD) are supported. If a sequence is provided, it cannot contain duplicate values.
freq (Union[str, int, None]) – Not supported. Must be None.
axis (Union[int, str]) – Axis to shift along. Only 0 / 'index' is supported.
fill_value (Optional[Hashable]) – Not supported. Must be None. Missing values introduced by the shift will always be null.
suffix (Optional[str]) – If provided, renames the resulting shifted columns by appending this string to the original column names. This argument can only be used if periods is a sequence (not a single integer).

Returns:

A new TDS frame with the shifted columns computed per group.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If periods contains any values other than 1 or -1. If freq is not None. If axis is not 0 or 'index'. If fill_value is not None.
ValueError – If any column specified in order_by is not present in the frame. If periods contains duplicate values. If suffix is specified but periods is a single integer.

See also

PandasApiTdsFrame.shift: Shift values for the entire frame.

Notes

Differences from pandas:

The order_by parameter is mandatory. In pandas, shift relies on the implicit order of the dataframe’s index. Here, because it translates to SQL, an explicit order must be provided.
periods is strictly limited to 1 or -1. Arbitrary integer shifts are not supported.
fill_value is not supported and must remain None.
The freq parameter is not supported and must be None.
axis=1 (shifting horizontally across columns) is not supported.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Shift the entire frame down by 1 row within each 'Ship Name' group,
frame.groupby("Ship Name")[["Order Date", "Shipped Date"]].shift(
    order_by="Order Date",
    periods=1
).head(3).to_pandas()

	Order Date	Shipped Date
0	NaT	NaT
1	NaT	NaT
2	NaT	NaT

std

PandasApiGroupbyTdsFrame.std(ddof=1, engine=None, engine_kwargs=None, numeric_only=False)[source]

Compute the standard deviation within each group.

Convenience method equivalent to aggregate('std') on the groupby object. Supports both ddof=1 (sample, maps to STDDEV_SAMP) and ddof=0 (population, maps to STDDEV_POP) at the SQL level.

Parameters:

ddof (int) – Degrees of freedom. 1 for sample standard deviation (STDDEV_SAMP), 0 for population standard deviation (STDDEV_POP).
engine (Optional[str]) – Not supported. Must be None.
engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.
numeric_only (bool) – Must be False. True is not supported.

Returns:

A new TDS frame with one row per group.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If ddof is not 0 or 1, or if engine, engine_kwargs, or numeric_only are set to unsupported values.

See also

var: Compute group variances.
aggregate: General grouped aggregation.
PandasApiTdsFrame.std: Frame-level std (no grouping).

Notes

Differences from pandas:

Only ddof=0 and ddof=1 are supported. Other values raise NotImplementedError.
engine, engine_kwargs, and numeric_only are not supported.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.groupby("Ship Name")["Order Id"].std().head(5).to_pandas()

sum

PandasApiGroupbyTdsFrame.sum(numeric_only=False, min_count=0, engine=None, engine_kwargs=None)[source]

Compute the sum of values within each group.

Convenience method equivalent to aggregate('sum') on the groupby object.

Parameters:

numeric_only (bool) – Must be False. True is not supported.
min_count (int) – Must be 0. Non-zero values are not supported.
engine (Optional[str]) – Not supported. Must be None.
engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.

Returns:

A new TDS frame with one row per group.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If any parameter is set to an unsupported value.

See also

aggregate: General grouped aggregation.
PandasApiTdsFrame.sum: Frame-level sum (no grouping).

Notes

Differences from the frame-level PandasApiTdsFrame.sum():

No axis, skipna, or **kwargs parameters. The groupby convenience methods follow the pandas DataFrameGroupBy signature, which omits these.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.groupby("Ship Name")["Order Id"].sum().head(5).to_pandas()

var

PandasApiGroupbyTdsFrame.var(ddof=1, engine=None, engine_kwargs=None, numeric_only=False)[source]

Compute the variance within each group.

Convenience method equivalent to aggregate('var') on the groupby object. Supports both ddof=1 (sample, maps to VAR_SAMP) and ddof=0 (population, maps to VAR_POP) at the SQL level.

Parameters:

ddof (int) – Degrees of freedom. 1 for sample variance (VAR_SAMP), 0 for population variance (VAR_POP).
engine (Optional[str]) – Not supported. Must be None.
engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.
numeric_only (bool) – Must be False. True is not supported.

Returns:

A new TDS frame with one row per group.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If ddof is not 0 or 1, or if engine, engine_kwargs, or numeric_only are set to unsupported values.

See also

std: Compute group standard deviations.
aggregate: General grouped aggregation.
PandasApiTdsFrame.var: Frame-level var (no grouping).

Notes

Differences from pandas:

Only ddof=0 and ddof=1 are supported. Other values raise NotImplementedError.
engine, engine_kwargs, and numeric_only are not supported.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.groupby("Ship Name")["Order Id"].var().head(5).to_pandas()

window_frame_legend_ext

PandasApiGroupbyTdsFrame.window_frame_legend_ext(frame_spec=<pylegend.core.language.pandas_api.pandas_api_frame_spec.RowsBetween object>, order_by=None, ascending=True)[source]

PyLegend extension (not present in pandas).

Create a custom window specification with explicit control over the window frame. When called on a groupby frame the grouping columns are automatically used as PARTITION BY columns.

Parameters:

frame_spec (Optional[FrameSpec]) – A RowsBetween or RangeBetween specification object. None means no frame clause (just PARTITION BY + ORDER BY).
order_by (Union[str, Sequence[str], None]) – Column name(s) to use for ORDER BY within the window.
ascending (Union[bool, Sequence[bool]]) – Sort direction(s) for the ORDER BY columns.

Return type:

PandasApiWindowTdsFrame

PyLegend

Table of Contents

Quick Search

Pandas Groupby TDS Frame

agg

aggregate

count

cume_dist_legend_ext

expanding

max

mean

median

min

ntile_legend_ext

rank

rolling

shift

std

sum

var

window_frame_legend_ext