Pandas Groupby Series
A single-column proxy within a grouped context.
A GroupbySeries is the grouped counterpart of
Series.
It represents one column of a
PandasApiGroupbyTdsFrame
and is obtained by bracket-indexing a groupby object with a
single column name.
Obtaining a GroupbySeries
Use bracket notation on a PandasApiGroupbyTdsFrame:
grouped = frame.groupby("group_col")
gseries = grouped["value_col"] # -> GroupbySeries
Passing a list of column names returns a narrowed
PandasApiGroupbyTdsFrame instead (not a GroupbySeries):
grouped[["col_a", "col_b"]] # -> PandasApiGroupbyTdsFrame
The returned subclass matches the column type, following the
same mapping as Series.
For example, an integer column becomes an IntegerGroupbySeries.
Operations
A GroupbySeries must have an applied function (such as
an aggregation or rank()) before it can be executed or
assigned. Attempting to call to_sql_query() on a bare
GroupbySeries without an applied function raises
RuntimeError.
Typical usage patterns:
Grouped aggregation — call an aggregation method directly:
frame.groupby("grp")["val"].sum() frame.groupby("grp")["val"].aggregate(["sum", "mean"])
Grouped rank — call
rank()to get a window-rankedGroupbySeriesthat can be assigned back:frame["ranked"] = frame.groupby("grp")["val"].rank()
Assigning back to the frame
A GroupbySeries (with an applied function like rank())
can be assigned back to the parent
PandasApiTdsFrame
using bracket assignment:
frame["new_col"] = frame.groupby("grp")["val"].rank()
The assignment must target the same frame that was grouped.
See also
SeriesThe non-grouped single-column proxy.
PandasApiGroupbyTdsFrameThe groupby object that produces this.
PandasApiTdsFrame.groupbyCreate a groupby object.
Notes
Differences from pandas:
A
GroupbySeriesis not iterable and does not support direct data access. It is an expression builder that lazily constructs the query.Applying functions on a computed
GroupbySeriesexpression is not supported. For example,(frame.groupby('grp')['col'] + 5).sum()raisesNotImplementedError. Instead, doframe.groupby('grp')['col'].sum() + 5.Only one function call is allowed per expression. To combine multiple, use separate assignment steps.
A bare
GroupbySeries(without an aggregation or window function) cannot be executed. You must call an operation such assum(),rank(), etc. first.
Examples
Download Interactive Notebook
agg
- GroupbySeries.agg(func, axis=0, *args, **kwargs)[source]
Alias for
aggregate().See
aggregate()for full documentation.- Return type:
Union[PandasApiTdsFrame,GroupbySeries]
aggregate
- GroupbySeries.aggregate(func, axis=0, *args, **kwargs)[source]
Aggregate each group using one or more operations.
Reduce the single column within each group to a scalar value. The result is a
PandasApiTdsFramewith one row per group, containing the grouping columns and the aggregated value(s).- Parameters:
func (
Union[Callable[...,Union[int,float,str,bool,date,datetime,Decimal,PyLegendPrimitive]],str,ufunc,List[Union[Callable[...,Union[int,float,str,bool,date,datetime,Decimal,PyLegendPrimitive]],str,ufunc]],Mapping[Hashable,Union[Callable[...,Union[int,float,str,bool,date,datetime,Decimal,PyLegendPrimitive]],str,ufunc,List[Union[Callable[...,Union[int,float,str,bool,date,datetime,Decimal,PyLegendPrimitive]],str,ufunc]]]]]) –Aggregation specification:
str — a named aggregation (
'sum','mean','min','max','count','std','var', plus aliases'len','size').callable — a lambda receiving the GroupbySeries and calling one of its aggregation methods (e.g.
lambda x: x.sum()).list of str — multiple named aggregations. Result columns are named
"agg(col_name)".dict —
{column_name: agg_spec}. Keys must match the GroupbySeries’ column name.
axis (
Union[int,str]) – Must be0or'index'.
- Returns:
A frame with one row per group and the aggregated column(s), plus the grouping columns.
- Return type:
Union[PandasApiTdsFrame,GroupbySeries]- Raises:
NotImplementedError – If called on a computed GroupbySeries expression (e.g.
(frame.groupby('grp')['col'] + 5).aggregate('sum')).ValueError – If a dict key does not match the GroupbySeries’ column name.
See also
Notes
Differences from pandas:
The result always includes the grouping columns alongside the aggregated values.
Aggregation on a computed GroupbySeries expression is not supported. Call the aggregation directly, then apply arithmetic if needed.
When
funcis a dict, keys must exactly match the GroupbySeries’ column name.
Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Single named aggregation frame.groupby("Ship Name")["Order Id"].aggregate( "sum" ).to_pandas().head(5)
Ship Name Order Id 0 Alfred's Futterkiste 54192 1 Alfreds Futterkiste 10643 2 Ana Trujillo Emparedados y helados 42618 3 Antonio Moreno Taquería 74195 4 Around the Horn 139254 # Multiple aggregations frame.groupby("Ship Name")["Order Id"].aggregate( ["min", "max", "count"] ).head(5).to_pandas()
Ship Name min(Order Id) max(Order Id) count(Order Id) 0 Alfred's Futterkiste 10692 11011 5 1 Alfreds Futterkiste 10643 10643 1 2 Ana Trujillo Emparedados y helados 10308 10926 4 3 Antonio Moreno Taquería 10365 10856 7 4 Around the Horn 10355 11016 13
corr
count
- GroupbySeries.count()[source]
Compute the count of non-null values within each group.
- Returns:
A frame with grouping columns and the count per group.
- Return type:
Union[PandasApiTdsFrame,GroupbySeries]
Notes
Equivalent to
gseries.aggregate("count"). Maps to SQLCOUNT(column).Differences from pandas: the signature takes no parameters (the pandas version accepts
normalizeand other keyword arguments which are not supported here).Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].count().to_pandas().head(5)
Ship Name Order Id 0 Alfred's Futterkiste 5 1 Alfreds Futterkiste 1 2 Ana Trujillo Emparedados y helados 4 3 Antonio Moreno Taquería 7 4 Around the Horn 13
cov
cume_dist_legend_ext
- GroupbySeries.cume_dist_legend_ext(ascending=True)[source]
Compute the cumulative distribution within each group.
PyLegend extension — not present in pandas.
Maps to SQL
CUME_DIST() OVER (PARTITION BY ... ORDER BY col)and PurecumulativeDistribution.- Parameters:
ascending (
bool) – Whether to order in ascending direction.- Returns:
A grouped series containing cumulative distribution values (floats between 0 and 1).
- Return type:
GroupbySeries
See also
rankCompute grouped ranks.
ntile_legend_extAssign rows to numbered buckets.
Notes
Differences from pandas:
This method has no pandas equivalent.
CUME_DISTis exposed as a pylegend extension.
Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame["CumeDist"] = frame.groupby( "Ship Name" )["Order Id"].cume_dist_legend_ext() frame.head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name CumeDist 0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier 0.2 1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten 0.166667 2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes 0.071429 3 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock 0.1 4 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices 0.083333
expanding
- GroupbySeries.expanding(min_periods=1, method=None, order_by=None, ascending=True)[source]
Create an expanding (cumulative) window on a single grouped column.
The grouping columns are automatically used as
PARTITION BY. An expanding window includes all rows from the start of the partition up to the current row.- Parameters:
min_periods (
int) – Minimum number of observations required to produce a value.method (
Optional[str]) – Not supported. Must beNone.order_by (
Union[str,Sequence[str],None]) – Column(s) to order by within the window.ascending (
Union[bool,Sequence[bool]]) – Sort direction(s) fororder_bycolumns.
- Returns:
A window series on which aggregates (
sum,mean, etc.) can be called.- Return type:
WindowSeries- Raises:
NotImplementedError – If
methodis notNone.
See also
rollingFixed-size grouped sliding window.
window_frame_legend_extCustom window specification.
Notes
Differences from pandas:
order_byandascendingare pylegend extensions not present in pandas.methodis not supported.
Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].expanding( order_by="Order Id" ).sum().to_pandas().head(5)
Order Id 0 10248 1 10249 2 10250 3 10251 4 10252
max
- GroupbySeries.max(numeric_only=False, min_count=-1, engine=None, engine_kwargs=None)[source]
Compute the maximum of values within each group.
- Parameters:
numeric_only (
bool) – Must beFalse.Trueis not supported.min_count (
int) – Must be-1. Other values are not supported.engine (
Optional[str]) – Not supported. Must beNone.engine_kwargs (
Optional[Dict[str,bool]]) – Not supported. Must beNone.
- Returns:
A frame with grouping columns and the maximum values.
- Return type:
Union[PandasApiTdsFrame,GroupbySeries]
Notes
Equivalent to
gseries.aggregate("max"). Works on string columns as well (lexicographic maximum).Differences from pandas:
numeric_only,engine,engine_kwargs, and non-defaultmin_countare not supported.Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].max().to_pandas().head(5)
Ship Name Order Id 0 Alfred's Futterkiste 11011 1 Alfreds Futterkiste 10643 2 Ana Trujillo Emparedados y helados 10926 3 Antonio Moreno Taquería 10856 4 Around the Horn 11016
mean
- GroupbySeries.mean(numeric_only=False, engine=None, engine_kwargs=None)[source]
Compute the mean of values within each group.
- Parameters:
numeric_only (
bool) – Must beFalse.Trueis not supported.engine (
Optional[str]) – Not supported. Must beNone.engine_kwargs (
Optional[Dict[str,bool]]) – Not supported. Must beNone.
- Returns:
A frame with grouping columns and the mean values.
- Return type:
Union[PandasApiTdsFrame,GroupbySeries]
Notes
Equivalent to
gseries.aggregate("mean"). Maps to SQLAVG().Differences from pandas:
numeric_only,engine, andengine_kwargsare not supported.Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].mean().to_pandas().head(5)
Ship Name Order Id 0 Alfred's Futterkiste 10838.4 1 Alfreds Futterkiste 10643.0 2 Ana Trujillo Emparedados y helados 10654.5 3 Antonio Moreno Taquería 10599.285714 4 Around the Horn 10711.846154
median
- GroupbySeries.median()[source]
Compute the median within each group.
Maps to
PERCENTILE_CONT(0.5)at the SQL level.- Returns:
Grouped median values.
- Return type:
Union[PandasApiTdsFrame,GroupbySeries]
Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].median().to_pandas().head(5)
Ship Name Order Id 0 Alfred's Futterkiste 10835.0 1 Alfreds Futterkiste 10643.0 2 Ana Trujillo Emparedados y helados 10692.0 3 Antonio Moreno Taquería 10573.0 4 Around the Horn 10743.0
min
- GroupbySeries.min(numeric_only=False, min_count=-1, engine=None, engine_kwargs=None)[source]
Compute the minimum of values within each group.
- Parameters:
numeric_only (
bool) – Must beFalse.Trueis not supported.min_count (
int) – Must be-1. Other values are not supported.engine (
Optional[str]) – Not supported. Must beNone.engine_kwargs (
Optional[Dict[str,bool]]) – Not supported. Must beNone.
- Returns:
A frame with grouping columns and the minimum values.
- Return type:
Union[PandasApiTdsFrame,GroupbySeries]
Notes
Equivalent to
gseries.aggregate("min"). Works on string columns as well (lexicographic minimum).Differences from pandas:
numeric_only,engine,engine_kwargs, and non-defaultmin_countare not supported.Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].min().to_pandas().head(5)
Ship Name Order Id 0 Alfred's Futterkiste 10692 1 Alfreds Futterkiste 10643 2 Ana Trujillo Emparedados y helados 10308 3 Antonio Moreno Taquería 10365 4 Around the Horn 10355
ntile_legend_ext
- GroupbySeries.ntile_legend_ext(num_buckets, ascending=True)[source]
Assign rows to numbered buckets within each group.
PyLegend extension — not present in pandas.
Maps to SQL
NTILE(n) OVER (PARTITION BY ... ORDER BY col)and Purentile.- Parameters:
num_buckets (
int) – Number of buckets to distribute rows into.ascending (
bool) – Whether to order in ascending direction.
- Returns:
A grouped series containing bucket numbers (1-based).
- Return type:
GroupbySeries
See also
rankCompute grouped ranks.
cume_dist_legend_extCumulative distribution within groups.
Notes
Differences from pandas:
This method has no pandas equivalent.
NTILEis exposed as a pylegend extension.
Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame["Quartile"] = frame.groupby( "Ship Name" )["Order Id"].ntile_legend_ext(4) frame.head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name Quartile 0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier 1 1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten 1 2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes 1 3 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock 1 4 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices 1
rank
- GroupbySeries.rank(method='min', ascending=True, na_option='bottom', pct=False, axis=0)[source]
Compute the rank of values within each group.
Return a new
GroupbySeriescontaining the rank of each value within its group. The grouping columns act as thePARTITION BYclause in the underlying SQL window function. The result can be assigned back to the parent frame or executed directly as a standalone single-column query.- Parameters:
method (
str) –How to rank equal values:
'min': Lowest rank in the group of ties (SQLRANK()).'first': Ranks by order of appearance within the group (SQLROW_NUMBER()).'dense': Like'min'but no gaps (SQLDENSE_RANK()).
ascending (
bool) – Whether to rank in ascending order.na_option (
str) – Only'bottom'is supported.pct (
bool) – IfTrue, compute percentage ranks (SQLPERCENT_RANK()). Returns aFloatGroupbySeries. Only supported withmethod='min'.axis (
Union[int,str]) – Must be0or'index'.
- Returns:
An
IntegerGroupbySeries(orFloatGroupbySerieswhenpct=True) containing the ranks within each group.- Return type:
GroupbySeries- Raises:
NotImplementedError – If called on a computed GroupbySeries expression (e.g.
(frame.groupby('grp')['col'] + 5).rank()). Callrank()first, then apply arithmetic. Ifmethodis not'min','first', or'dense'. Ifna_optionis not'bottom'. Ifpct=Truewith a method other than'min'.
See also
Series.rankFrame-level rank (no partitioning).
PandasApiGroupbyTdsFrame.rankRank all non-grouping columns.
Notes
Differences from pandas:
The
'average'and'max'methods are not supported.na_optiononly supports'bottom'.pct=Trueis only supported withmethod='min'.Calling
rank()on a computed GroupbySeries expression is not supported. Callrank()first, then apply arithmetic:frame.groupby('grp')['col'].rank() + 5.Only one window-function call is allowed per expression. To combine multiple, use separate assignments.
Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Execute a grouped ranked series directly frame.groupby("Ship Name")["Order Id"].rank().to_pandas().head()
Order Id 0 1 1 1 2 1 3 1 4 1 # Assign a grouped rank to the parent frame frame["Order Rank"] = frame.groupby( "Ship Name" )["Order Id"].rank() frame.head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name Order Rank 0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier 1 1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten 1 2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes 1 3 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock 1 4 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices 1 frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Dense rank, descending frame["Dense Rank"] = frame.groupby( "Ship Name" )["Order Id"].rank(method="dense", ascending=False) frame.head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name Dense Rank 0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier 5 1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten 6 2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes 14 3 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock 10 4 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices 12
rolling
- GroupbySeries.rolling(window, min_periods=None, center=False, win_type=None, on=None, closed=None, step=None, method=None, order_by=None, ascending=True)[source]
Create a fixed-size sliding window on a single grouped column.
The grouping columns are automatically used as
PARTITION BY. A rolling window includes a fixed number of preceding rows for each row within the partition.- Parameters:
window (
int) – Size of the moving window (number of rows).min_periods (
Optional[int]) – Minimum observations required. Defaults towindow.center (
bool) – Not supported. Must beFalse.win_type (
Optional[str]) – Not supported. Must beNone.on (
Optional[str]) – Not supported. Must beNone.closed (
Optional[str]) – Not supported. Must beNone.step (
Optional[int]) – Not supported. Must beNone.method (
Optional[str]) – Not supported. Must beNone.order_by (
Union[str,Sequence[str],None]) – Column(s) to order by within the window.ascending (
Union[bool,Sequence[bool]]) – Sort direction(s) fororder_bycolumns.
- Returns:
A window series on which aggregates (
sum,mean, etc.) can be called.- Return type:
WindowSeries- Raises:
NotImplementedError – If
center,win_type,on,closed,step, ormethodare set to non-default values.
See also
expandingExpanding (cumulative) grouped window.
window_frame_legend_extCustom window specification.
Notes
Differences from pandas:
order_byandascendingare pylegend extensions not present in pandas.center,win_type,on,closed,step, andmethodare not supported.
Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].rolling( window=3, order_by="Order Id" ).mean().to_pandas().head(5)
Order Id 0 10248.0 1 10249.0 2 10250.0 3 10251.0 4 10252.0
std
- GroupbySeries.std(ddof=1, engine=None, engine_kwargs=None, numeric_only=False)[source]
Compute the standard deviation within each group.
- Parameters:
ddof (
int) – Degrees of freedom.1for sample standard deviation (STDDEV_SAMP),0for population standard deviation (STDDEV_POP).engine (
Optional[str]) – Not supported. Must beNone.engine_kwargs (
Optional[Dict[str,bool]]) – Not supported. Must beNone.numeric_only (
bool) – Must beFalse.Trueis not supported.
- Returns:
A frame with grouping columns and the standard deviation.
- Return type:
Union[PandasApiTdsFrame,GroupbySeries]- Raises:
NotImplementedError – If
ddofis not0or1, or ifengine,engine_kwargs, ornumeric_onlyare set to unsupported values.
Notes
Equivalent to
gseries.aggregate("std"). Maps to SQLSTDDEV_SAMP()(ddof=1) orSTDDEV_POP()(ddof=0).Differences from pandas: only
ddof=0andddof=1are supported.engine,engine_kwargs, andnumeric_onlyare not supported.Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].std().to_pandas().head(5)
Ship Name Order Id 0 Alfred's Futterkiste 143.827327 1 Alfreds Futterkiste <NA> 2 Ana Trujillo Emparedados y helados 261.766435 3 Antonio Moreno Taquería 156.531908 4 Around the Horn 215.034046
sum
- GroupbySeries.sum(numeric_only=False, min_count=0, engine=None, engine_kwargs=None)[source]
Compute the sum of values within each group.
- Parameters:
numeric_only (
bool) – Must beFalse.Trueis not supported.min_count (
int) – Must be0. Non-zero values are not supported.engine (
Optional[str]) – Not supported. Must beNone.engine_kwargs (
Optional[Dict[str,bool]]) – Not supported. Must beNone.
- Returns:
A frame with grouping columns and the summed values.
- Return type:
Union[PandasApiTdsFrame,GroupbySeries]
Notes
Equivalent to
gseries.aggregate("sum").Differences from pandas:
numeric_only,engine, andengine_kwargsare not supported.min_countmust be0.Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].sum().to_pandas().head(5)
Ship Name Order Id 0 Alfred's Futterkiste 54192 1 Alfreds Futterkiste 10643 2 Ana Trujillo Emparedados y helados 42618 3 Antonio Moreno Taquería 74195 4 Around the Horn 139254
transform
- GroupbySeries.transform(func)[source]
Apply a partition-only window aggregate and broadcast back to every row.
Equivalent to pandas
groupby['col'].transform('func'), which computes the aggregate per group and broadcasts the result back to every row.Generates SQL like
FUNC(col) OVER (PARTITION BY ...)and Pure likeextend(over(~[grp]), ~col:{p,w,r | $r.col}:y | $y->func()).- Parameters:
func (
Union[str,Callable[...,object]]) – The aggregation to apply within each partition. Accepts a named aggregation string ('sum','mean','min','max','count','std','var') or a callable that receives aWindowSeriesand returns the result.- Returns:
A grouped series containing the broadcasted aggregate value for each row within its group.
- Return type:
GroupbySeries
See also
Notes
Differences from pandas:
The result keeps every row (same row count as the input), matching pandas
transformsemantics.Only aggregation functions are supported as
func. Arbitrary element-wise transforms (e.g.lambda x: x + 1) are not supported.
Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame["Group Sum"] = frame.groupby( "Ship Name" )["Order Id"].transform("sum") frame.head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name Group Sum 0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier 52293 1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten 63256 2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes 150623 3 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock 105831 4 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices 128292
var
- GroupbySeries.var(ddof=1, engine=None, engine_kwargs=None, numeric_only=False)[source]
Compute the variance within each group.
- Parameters:
ddof (
int) – Degrees of freedom.1for sample variance (VAR_SAMP),0for population variance (VAR_POP).engine (
Optional[str]) – Not supported. Must beNone.engine_kwargs (
Optional[Dict[str,bool]]) – Not supported. Must beNone.numeric_only (
bool) – Must beFalse.Trueis not supported.
- Returns:
A frame with grouping columns and the variance.
- Return type:
Union[PandasApiTdsFrame,GroupbySeries]- Raises:
NotImplementedError – If
ddofis not0or1, or ifengine,engine_kwargs, ornumeric_onlyare set to unsupported values.
Notes
Equivalent to
gseries.aggregate("var"). Maps to SQLVAR_SAMP()(ddof=1) orVAR_POP()(ddof=0).Differences from pandas: only
ddof=0andddof=1are supported.engine,engine_kwargs, andnumeric_onlyare not supported.Examples
import pylegend frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].var().to_pandas().head(5)
Ship Name Order Id 0 Alfred's Futterkiste 20686.3 1 Alfreds Futterkiste <NA> 2 Ana Trujillo Emparedados y helados 68521.666667 3 Antonio Moreno Taquería 24502.238095 4 Around the Horn 46239.641026
wavg_legend_ext
window_frame_legend_ext
- GroupbySeries.window_frame_legend_ext(frame_spec=<pylegend.core.language.pandas_api.pandas_api_frame_spec.RowsBetween object>, order_by=None, ascending=True)[source]
Create a custom window specification on a single grouped column.
PyLegend extension — not present in pandas.
The grouping columns are automatically used as
PARTITION BY. Theframe_specargument controls theROWS BETWEENorRANGE BETWEENclause.- Parameters:
frame_spec (
Optional[FrameSpec]) – A window-frame specification created viarows_between()orrange_between().order_by (
Union[str,Sequence[str],None]) – Column(s) to order by within the window.ascending (
Union[bool,Sequence[bool]]) – Sort direction(s) fororder_bycolumns.
- Returns:
A window series on which aggregates can be called.
- Return type:
WindowSeries- Raises:
TypeError – If
frame_specis not aRowsBetweenorRangeBetween.
Notes
Differences from pandas:
This method has no pandas equivalent. It is a pylegend extension for fine-grained control over the SQL
ROWS BETWEEN/RANGE BETWEENclause.
Examples
import pylegend from pylegend.core.language.pandas_api.pandas_api_frame_spec import ( RowsBetween, ) frame = pylegend.samples.pandas_api.northwind_orders_frame()
spec = RowsBetween(-2, 0) frame.groupby("Ship Name")["Order Id"].window_frame_legend_ext( spec, order_by="Order Id" ).sum().to_pandas().head()
Order Id 0 10248 1 10249 2 10250 3 10251 4 10252