Pandas Groupby TDS Frame
The PandasApiGroupbyTdsFrame class is returned by
PandasApiTdsFrame.groupby
and provides methods for applying aggregation and window operations within each group.
The groupby columns also serve as the PARTITION BY clause for OLAP window functions such as rank.
agg
- PandasApiGroupbyTdsFrame.agg(func, axis=0, *args, **kwargs)[source]
Aggregate each group using one or more operations.
Alias for
aggregate(). Seeaggregatefor full documentation.See also
aggregateEquivalent method (canonical name).
- Return type:
PandasApiTdsFrame
aggregate
- PandasApiGroupbyTdsFrame.aggregate(func, axis=0, *args, **kwargs)[source]
Aggregate each group using one or more operations.
Apply aggregation function(s) to each group defined by the preceding
groupbycall. The grouping columns always appear in the result alongside the aggregated values. Whensort=Truewas passed togroupby(the default), the result is sorted by the grouping columns.- Parameters:
func (
Union[Callable[...,Union[int,float,str,bool,date,datetime,Decimal,PyLegendPrimitive]],str,ufunc,List[Union[Callable[...,Union[int,float,str,bool,date,datetime,Decimal,PyLegendPrimitive]],str,ufunc]],Mapping[Hashable,Union[Callable[...,Union[int,float,str,bool,date,datetime,Decimal,PyLegendPrimitive]],str,ufunc,List[Union[Callable[...,Union[int,float,str,bool,date,datetime,Decimal,PyLegendPrimitive]],str,ufunc]]]]]) –Aggregation specification. Accepted forms:
str: A named aggregation (e.g.'sum') applied to all non-grouping columns (or selected columns if bracket notation was used aftergroupby).callable: A function that receives a column Series proxy and returns an aggregated value (e.g.lambda x: x.sum()).np.ufunc: A NumPy universal function (e.g.np.sum).list: A list of the above, producing one output column per function per input column.dict: A mapping of column name → aggregation(s). Only the specified columns appear in the result.
axis (
Union[int,str]) – Only0/'index'is supported.*args (
Union[int,float,str,bool,date,datetime,Decimal,PyLegendPrimitive]) – Not supported.**kwargs (
Union[int,float,str,bool,date,datetime,Decimal,PyLegendPrimitive]) – Not supported.
- Returns:
A new TDS frame with one row per group and the aggregated columns.
- Return type:
PandasApiTdsFrame- Raises:
TypeError – If
funcis not a valid aggregation specification.KeyError – If a column name in a dict-based
funcdoes not exist in the frame.NotImplementedError – If
axisis not0/'index', or if extra*args/**kwargsare passed.
See also
aggAlias for aggregate.
PandasApiTdsFrame.aggregateFrame-level aggregation (no grouping).
Notes
Differences from pandas:
The result always contains the grouping columns as regular columns (never as the index), because
as_indexis alwaysFalse.Extra
*args/**kwargsare not forwarded to the aggregation function.
Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Dict-based aggregation on groups frame.groupby("Ship Name").aggregate( {"Order Id": "count"} ).head(5).to_pandas()
Ship Name Order Id 0 Alfred's Futterkiste 5 1 Alfreds Futterkiste 1 2 Ana Trujillo Emparedados y helados 4 3 Antonio Moreno Taquería 7 4 Around the Horn 13 # Multiple aggregations per column frame.groupby("Ship Name").aggregate( {"Order Id": ["min", "max"]} ).head(5).to_pandas()
Ship Name min(Order Id) max(Order Id) 0 Alfred's Futterkiste 10692 11011 1 Alfreds Futterkiste 10643 10643 2 Ana Trujillo Emparedados y helados 10308 10926 3 Antonio Moreno Taquería 10365 10856 4 Around the Horn 10355 11016 # Broadcast a single function to all non-grouping columns frame.groupby("Ship Name", sort=False).aggregate("count").head(5).to_pandas()
Ship Name Order Id Order Date Required Date Shipped Date 0 Alfred's Futterkiste 5 5 5 5 1 Alfreds Futterkiste 1 1 1 1 2 Ana Trujillo Emparedados y helados 4 4 4 4 3 Antonio Moreno Taquería 7 7 7 7 4 Around the Horn 13 13 13 13
count
- PandasApiGroupbyTdsFrame.count()[source]
Count non-null values within each group.
Convenience method equivalent to
aggregate('count')on the groupby object. Returns the number of non-null values per column for each group.- Returns:
A new TDS frame with one row per group.
- Return type:
PandasApiTdsFrame
See also
Notes
Differences from the frame-level
PandasApiTdsFrame.count():The groupby
counttakes no parameters (noaxis,numeric_only, or**kwargs), matching the pandasDataFrameGroupBy.countsignature.
Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].count().head(5).to_pandas()
cume_dist_legend_ext
- PandasApiGroupbyTdsFrame.cume_dist_legend_ext(ascending=True)[source]
PyLegend extension (not present in pandas).
Compute the cumulative distribution within each group, equivalent to SQL
CUME_DIST() OVER (PARTITION BY ... ORDER BY col)and PurecumulativeDistribution.- Return type:
PandasApiTdsFrame
expanding
max
- PandasApiGroupbyTdsFrame.max(numeric_only=False, min_count=-1, engine=None, engine_kwargs=None)[source]
Compute the maximum value within each group.
Convenience method equivalent to
aggregate('max')on the groupby object. For string columns, returns the lexicographically largest value per group.- Parameters:
numeric_only (
bool) – Must beFalse.Trueis not supported.min_count (
int) – Must be-1. Other values are not supported.engine (
Optional[str]) – Not supported. Must beNone.engine_kwargs (
Optional[Dict[str,bool]]) – Not supported. Must beNone.
- Returns:
A new TDS frame with one row per group.
- Return type:
PandasApiTdsFrame- Raises:
NotImplementedError – If any parameter is set to an unsupported value.
See also
Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].max().head(5).to_pandas()
mean
- PandasApiGroupbyTdsFrame.mean(numeric_only=False, engine=None, engine_kwargs=None)[source]
Compute the mean of values within each group.
Convenience method equivalent to
aggregate('mean')on the groupby object.- Parameters:
numeric_only (
bool) – Must beFalse.Trueis not supported.engine (
Optional[str]) – Not supported. Must beNone.engine_kwargs (
Optional[Dict[str,bool]]) – Not supported. Must beNone.
- Returns:
A new TDS frame with one row per group.
- Return type:
PandasApiTdsFrame- Raises:
NotImplementedError – If any parameter is set to an unsupported value.
See also
aggregateGeneral grouped aggregation.
PandasApiTdsFrame.meanFrame-level mean (no grouping).
Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].mean().head(5).to_pandas()
median
- PandasApiGroupbyTdsFrame.median()[source]
Compute the median of each numeric column within each group.
Applies
PERCENTILE_CONT(0.5)at the SQL level for each numeric column. Non-numeric columns are excluded automatically.- Returns:
A new TDS frame with one row per group.
- Return type:
PandasApiTdsFrame
Notes
Differences from pandas:
Only numeric columns are included; non-numeric columns are silently skipped.
The pandas
numeric_onlyparameter is not available; the behaviour is alwaysnumeric_only=True.
Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].median().head(5).to_pandas()
min
- PandasApiGroupbyTdsFrame.min(numeric_only=False, min_count=-1, engine=None, engine_kwargs=None)[source]
Compute the minimum value within each group.
Convenience method equivalent to
aggregate('min')on the groupby object. For string columns, returns the lexicographically smallest value per group.- Parameters:
numeric_only (
bool) – Must beFalse.Trueis not supported.min_count (
int) – Must be-1. Other values are not supported.engine (
Optional[str]) – Not supported. Must beNone.engine_kwargs (
Optional[Dict[str,bool]]) – Not supported. Must beNone.
- Returns:
A new TDS frame with one row per group.
- Return type:
PandasApiTdsFrame- Raises:
NotImplementedError – If any parameter is set to an unsupported value.
See also
Notes
Differences from the frame-level
PandasApiTdsFrame.min():The
min_countparameter defaults to-1(matching the pandasDataFrameGroupBy.mindefault) rather than being absent.
Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].min().head(5).to_pandas()
ntile_legend_ext
rank
- PandasApiGroupbyTdsFrame.rank(method='min', ascending=True, na_option='bottom', pct=False, axis=0)[source]
Compute the rank of values within each group.
Rank each value within its group defined by the preceding
groupbycall. The grouping columns act as thePARTITION BYclause in the underlying SQL window function. Only the ranked (non-grouping) columns appear in the result.- Parameters:
method (
str) –How to rank equal values:
'min': Lowest rank in the group of ties (SQLRANK()).'first': Ranks assigned in order of appearance within the group (SQLROW_NUMBER()).'dense': Like'min'but ranks always increase by 1, no gaps (SQLDENSE_RANK()).
ascending (
bool) – Whether to rank in ascending order.Falseranks in descending order.na_option (
str) – How to rank null values. Only'bottom'is supported.'keep'and'top'raiseNotImplementedError.pct (
bool) – IfTrue, compute percentage ranks (SQLPERCENT_RANK()). Result columns are of float type. Can only be used withmethod='min'.axis (
Union[int,str]) – Only0/'index'is supported.
- Returns:
A new TDS frame containing only the ranked columns (the grouping columns are not included in the output). Each column contains integer ranks (or float when
pct=True).- Return type:
PandasApiTdsFrame- Raises:
NotImplementedError – If
methodis not one of'min','first','dense'. Ifna_optionis not'bottom'. Ifpct=Truewith a method other than'min'. Ifaxisis not0or'index'.
See also
PandasApiTdsFrame.rankFrame-level rank (no partitioning).
aggregateGrouped aggregation.
Notes
Differences from pandas:
The
'average'and'max'ranking methods are not supported.na_optiononly supports'bottom'.pct=Trueis only supported withmethod='min'.The result contains only the ranked columns, not the grouping columns. In pandas,
DataFrameGroupBy.rankreturns a frame with the same shape as the input, preserving all columns. Here, grouping columns are excluded from the output. To preserve all columns, use bracket assignment with a single-column selection:frame["rank"] = frame.groupby("grp")["col"].rank().numeric_onlyis not exposed in the groupbyranksignature (it is alwaysFalse).Combining multiple rank calls in a single expression is not supported. Compute them in separate steps.
Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Rank within groups (only ranked columns in output) frame.groupby("Ship Name")[["Order Id"]].rank().head(5).to_pandas()
Order Id 0 1 1 1 2 1 3 1 4 1 # Append a grouped rank column to the frame frame["Order Rank"] = frame.groupby( "Ship Name" )["Order Id"].rank() frame.head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name Order Rank 0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier 1 1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten 1 2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes 1 3 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock 1 4 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices 1 # Dense rank descending within groups frame.groupby("Ship Name")[["Order Id"]].rank( method="dense", ascending=False ).head(5).to_pandas()
Order Id 0 5 1 6 2 14 3 10 4 12
rolling
shift
- PandasApiGroupbyTdsFrame.shift(order_by, periods=1, freq=None, axis=0, fill_value=None, suffix=None)[source]
Shift values by desired number of periods within each group.
Replace every column’s values with their shifted values, computing the shift independently for each group. Because the underlying TDS is inherently unordered, this requires an explicit
order_byparameter to define the ordering for the window function partitioned by the group keys.- Parameters:
order_by (
Union[str,Sequence[str]]) – Column name(s) to order the frame by within each group before applying the shift. Unlike pandas, this is required to ensure deterministic output. All specified columns must be present in the base frame.periods (
Union[int,Sequence[int]]) – Number of periods to shift. Currently, only1(shift down, SQLLAG) and-1(shift up, SQLLEAD) are supported. If a sequence is provided, it cannot contain duplicate values.freq (
Union[str,int,None]) – Not supported. Must beNone.axis (
Union[int,str]) – Axis to shift along. Only0/'index'is supported.fill_value (
Optional[Hashable]) – Not supported. Must beNone. Missing values introduced by the shift will always be null.suffix (
Optional[str]) – If provided, renames the resulting shifted columns by appending this string to the original column names. This argument can only be used ifperiodsis a sequence (not a single integer).
- Returns:
A new TDS frame with the shifted columns computed per group.
- Return type:
PandasApiTdsFrame- Raises:
NotImplementedError – If
periodscontains any values other than1or-1. Iffreqis notNone. Ifaxisis not0or'index'. Iffill_valueis notNone.ValueError – If any column specified in
order_byis not present in the frame. Ifperiodscontains duplicate values. Ifsuffixis specified butperiodsis a single integer.
See also
PandasApiTdsFrame.shiftShift values for the entire frame.
Notes
Differences from pandas:
The
order_byparameter is mandatory. In pandas,shiftrelies on the implicit order of the dataframe’s index. Here, because it translates to SQL, an explicit order must be provided.periodsis strictly limited to1or-1. Arbitrary integer shifts are not supported.fill_valueis not supported and must remainNone.The
freqparameter is not supported and must beNone.axis=1(shifting horizontally across columns) is not supported.
Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Shift the entire frame down by 1 row within each 'Ship Name' group, frame.groupby("Ship Name")[["Order Date", "Shipped Date"]].shift( order_by="Order Date", periods=1 ).head(3).to_pandas()
Order Date Shipped Date 0 NaT NaT 1 NaT NaT 2 NaT NaT
std
- PandasApiGroupbyTdsFrame.std(ddof=1, engine=None, engine_kwargs=None, numeric_only=False)[source]
Compute the standard deviation within each group.
Convenience method equivalent to
aggregate('std')on the groupby object. Supports bothddof=1(sample, maps toSTDDEV_SAMP) andddof=0(population, maps toSTDDEV_POP) at the SQL level.- Parameters:
ddof (
int) – Degrees of freedom.1for sample standard deviation (STDDEV_SAMP),0for population standard deviation (STDDEV_POP).engine (
Optional[str]) – Not supported. Must beNone.engine_kwargs (
Optional[Dict[str,bool]]) – Not supported. Must beNone.numeric_only (
bool) – Must beFalse.Trueis not supported.
- Returns:
A new TDS frame with one row per group.
- Return type:
PandasApiTdsFrame- Raises:
NotImplementedError – If
ddofis not0or1, or ifengine,engine_kwargs, ornumeric_onlyare set to unsupported values.
See also
Notes
Differences from pandas:
Only
ddof=0andddof=1are supported. Other values raiseNotImplementedError.engine,engine_kwargs, andnumeric_onlyare not supported.
Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].std().head(5).to_pandas()
sum
- PandasApiGroupbyTdsFrame.sum(numeric_only=False, min_count=0, engine=None, engine_kwargs=None)[source]
Compute the sum of values within each group.
Convenience method equivalent to
aggregate('sum')on the groupby object.- Parameters:
numeric_only (
bool) – Must beFalse.Trueis not supported.min_count (
int) – Must be0. Non-zero values are not supported.engine (
Optional[str]) – Not supported. Must beNone.engine_kwargs (
Optional[Dict[str,bool]]) – Not supported. Must beNone.
- Returns:
A new TDS frame with one row per group.
- Return type:
PandasApiTdsFrame- Raises:
NotImplementedError – If any parameter is set to an unsupported value.
See also
aggregateGeneral grouped aggregation.
PandasApiTdsFrame.sumFrame-level sum (no grouping).
Notes
Differences from the frame-level
PandasApiTdsFrame.sum():No
axis,skipna, or**kwargsparameters. The groupby convenience methods follow the pandasDataFrameGroupBysignature, which omits these.
Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].sum().head(5).to_pandas()
var
- PandasApiGroupbyTdsFrame.var(ddof=1, engine=None, engine_kwargs=None, numeric_only=False)[source]
Compute the variance within each group.
Convenience method equivalent to
aggregate('var')on the groupby object. Supports bothddof=1(sample, maps toVAR_SAMP) andddof=0(population, maps toVAR_POP) at the SQL level.- Parameters:
ddof (
int) – Degrees of freedom.1for sample variance (VAR_SAMP),0for population variance (VAR_POP).engine (
Optional[str]) – Not supported. Must beNone.engine_kwargs (
Optional[Dict[str,bool]]) – Not supported. Must beNone.numeric_only (
bool) – Must beFalse.Trueis not supported.
- Returns:
A new TDS frame with one row per group.
- Return type:
PandasApiTdsFrame- Raises:
NotImplementedError – If
ddofis not0or1, or ifengine,engine_kwargs, ornumeric_onlyare set to unsupported values.
See also
Notes
Differences from pandas:
Only
ddof=0andddof=1are supported. Other values raiseNotImplementedError.engine,engine_kwargs, andnumeric_onlyare not supported.
Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].var().head(5).to_pandas()
window_frame_legend_ext
- PandasApiGroupbyTdsFrame.window_frame_legend_ext(frame_spec=<pylegend.core.language.pandas_api.pandas_api_frame_spec.RowsBetween object>, order_by=None, ascending=True)[source]
PyLegend extension (not present in pandas).
Create a custom window specification with explicit control over the window frame. When called on a groupby frame the grouping columns are automatically used as PARTITION BY columns.
- Parameters:
frame_spec (
Optional[FrameSpec]) – ARowsBetweenorRangeBetweenspecification object.Nonemeans no frame clause (just PARTITION BY + ORDER BY).order_by (
Union[str,Sequence[str],None]) – Column name(s) to use for ORDER BY within the window.ascending (
Union[bool,Sequence[bool]]) – Sort direction(s) for the ORDER BY columns.
- Return type:
PandasApiWindowTdsFrame