Pandas Groupby Series

A single-column proxy within a grouped context.

A GroupbySeries is the grouped counterpart of Series. It represents one column of a PandasApiGroupbyTdsFrame and is obtained by bracket-indexing a groupby object with a single column name.

Obtaining a GroupbySeries

Use bracket notation on a PandasApiGroupbyTdsFrame:

grouped = frame.groupby("group_col")
gseries = grouped["value_col"]   # -> GroupbySeries

Passing a list of column names returns a narrowed PandasApiGroupbyTdsFrame instead (not a GroupbySeries):

grouped[["col_a", "col_b"]]  # -> PandasApiGroupbyTdsFrame

The returned subclass matches the column type, following the same mapping as Series. For example, an integer column becomes an IntegerGroupbySeries.

Operations

A GroupbySeries must have an applied function (such as an aggregation or rank()) before it can be executed or assigned. Attempting to call to_sql_query() on a bare GroupbySeries without an applied function raises RuntimeError.

Typical usage patterns:

Grouped aggregation — call an aggregation method directly:

frame.groupby("grp")["val"].sum()
frame.groupby("grp")["val"].aggregate(["sum", "mean"])

Grouped rank — call rank() to get a window-ranked GroupbySeries that can be assigned back:
```
frame["ranked"] = frame.groupby("grp")["val"].rank()
```

Assigning back to the frame

A GroupbySeries (with an applied function like rank()) can be assigned back to the parent PandasApiTdsFrame using bracket assignment:

frame["new_col"] = frame.groupby("grp")["val"].rank()

The assignment must target the same frame that was grouped.

See also

Series: The non-grouped single-column proxy.
PandasApiGroupbyTdsFrame: The groupby object that produces this.
PandasApiTdsFrame.groupby: Create a groupby object.

Notes

Differences from pandas:

A GroupbySeries is not iterable and does not support direct data access. It is an expression builder that lazily constructs the query.
Applying functions on a computed GroupbySeries expression is not supported. For example, (frame.groupby('grp')['col'] + 5).sum() raises NotImplementedError. Instead, do frame.groupby('grp')['col'].sum() + 5.
Only one function call is allowed per expression. To combine multiple, use separate assignment steps.
A bare GroupbySeries (without an aggregation or window function) cannot be executed. You must call an operation such as sum(), rank(), etc. first.

agg

GroupbySeries.agg(func, axis=0, *args, **kwargs)

Alias for aggregate().

See aggregate() for full documentation.

Return type: Union[PandasApiTdsFrame, GroupbySeries]

aggregate

GroupbySeries.aggregate(func, axis=0, *args, **kwargs)

Aggregate each group using one or more operations.

Reduce the single column within each group to a scalar value. The result is a PandasApiTdsFrame with one row per group, containing the grouping columns and the aggregated value(s).

Parameters:

func (Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc, List[Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc]], Mapping[Hashable, Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc, List[Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc]]]]]) –
Aggregation specification:
- str — a named aggregation ('sum', 'mean', 'min', 'max', 'count', 'std', 'var', plus aliases 'len', 'size').
- callable — a lambda receiving the GroupbySeries and calling one of its aggregation methods (e.g. lambda x: x.sum()).
- list of str — multiple named aggregations. Result columns are named "agg(col_name)".
- dict — {column_name: agg_spec}. Keys must match the GroupbySeries’ column name.
axis (Union[int, str]) – Must be 0 or 'index'.

Returns:

A frame with one row per group and the aggregated column(s), plus the grouping columns.

Return type:

Union[PandasApiTdsFrame, GroupbySeries]

Raises:

NotImplementedError – If called on a computed GroupbySeries expression (e.g. (frame.groupby('grp')['col'] + 5).aggregate('sum')).
ValueError – If a dict key does not match the GroupbySeries’ column name.

See also

agg: Alias for aggregate.
sum: Grouped sum.
PandasApiGroupbyTdsFrame.aggregate: Aggregate on the full groupby frame.

Notes

Differences from pandas:

The result always includes the grouping columns alongside the aggregated values.
Aggregation on a computed GroupbySeries expression is not supported. Call the aggregation directly, then apply arithmetic if needed.
When func is a dict, keys must exactly match the GroupbySeries’ column name.
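
The list form can be pictured with a plain-Python sketch. This is not pylegend's implementation: grouped_aggregate, REDUCERS, and the sample rows are hypothetical names used only to show one output row per group, the grouping column kept alongside the results, and the "agg(col_name)" naming described above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical mapping from aggregation name to a plain-Python reducer.
REDUCERS = {
    "sum": sum,
    "min": min,
    "max": max,
    "mean": mean,
    # Count only non-null values, like SQL COUNT(column).
    "count": lambda values: sum(v is not None for v in values),
}

def grouped_aggregate(rows, group_col, value_col, funcs):
    """Reduce value_col within each group; one output row per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row[value_col])
    result = []
    for key in sorted(groups):
        out = {group_col: key}  # the grouping column is always kept
        for name in funcs:
            out[f"{name}({value_col})"] = REDUCERS[name](groups[key])
        result.append(out)
    return result

rows = [
    {"grp": "a", "val": 1},
    {"grp": "a", "val": 3},
    {"grp": "b", "val": 5},
]
summary = grouped_aggregate(rows, "grp", "val", ["min", "max", "count"])
```

Here summary holds two dictionaries, one per group, each with the keys grp, min(val), max(val), and count(val).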

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Single named aggregation
frame.groupby("Ship Name")["Order Id"].aggregate(
    "sum"
).to_pandas().head(5)

	Ship Name	Order Id
0	Alfred's Futterkiste	54192
1	Alfreds Futterkiste	10643
2	Ana Trujillo Emparedados y helados	42618
3	Antonio Moreno Taquería	74195
4	Around the Horn	139254

# Multiple aggregations
frame.groupby("Ship Name")["Order Id"].aggregate(
    ["min", "max", "count"]
).head(5).to_pandas()

	Ship Name	min(Order Id)	max(Order Id)	count(Order Id)
0	Alfred's Futterkiste	10692	11011	5
1	Alfreds Futterkiste	10643	10643	1
2	Ana Trujillo Emparedados y helados	10308	10926	4
3	Antonio Moreno Taquería	10365	10856	7
4	Around the Horn	10355	11016	13

corr

count

GroupbySeries.count()

Compute the count of non-null values within each group.

Returns: A frame with grouping columns and the count per group.
Return type: Union[PandasApiTdsFrame, GroupbySeries]

Notes

Equivalent to gseries.aggregate("count"). Maps to SQL COUNT(column).

The signature takes no parameters, the same as pandas SeriesGroupBy.count(). (normalize is a parameter of pandas value_counts(), not of count().)
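
The non-null counting rule of SQL COUNT(column) can be sketched in plain Python. The function count_non_null and the sample rows are hypothetical and only illustrate the semantics. They are not part of pylegend.

```python
from collections import Counter

def count_non_null(rows, group_col, value_col):
    """Mimic SELECT group_col, COUNT(value_col) ... GROUP BY group_col."""
    counts = Counter()
    for row in rows:
        counts[row[group_col]] += 0  # make sure every group appears
        if row[value_col] is not None:
            counts[row[group_col]] += 1
    return dict(counts)

rows = [
    {"grp": "a", "val": 1},
    {"grp": "a", "val": None},  # null: not counted
    {"grp": "b", "val": None},  # group with only nulls counts as 0
]
counts = count_non_null(rows, "grp", "val")
```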

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.groupby("Ship Name")["Order Id"].count().to_pandas().head(5)

	Ship Name	Order Id
0	Alfred's Futterkiste	5
1	Alfreds Futterkiste	1
2	Ana Trujillo Emparedados y helados	4
3	Antonio Moreno Taquería	7
4	Around the Horn	13

cov

cume_dist_legend_ext

GroupbySeries.cume_dist_legend_ext(ascending=True)

Compute the cumulative distribution within each group.

PyLegend extension — not present in pandas.

Maps to SQL CUME_DIST() OVER (PARTITION BY ... ORDER BY col) and Pure cumulativeDistribution.

Parameters: ascending (bool) – Whether to order in ascending direction.
Returns: A grouped series containing cumulative distribution values (floats between 0 and 1).
Return type: GroupbySeries

See also

rank: Compute grouped ranks.
ntile_legend_ext: Assign rows to numbered buckets.

Notes

Differences from pandas:

This method has no pandas equivalent. CUME_DIST is exposed as a pylegend extension.
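
For readers unfamiliar with CUME_DIST, the standard SQL definition is the number of rows in the partition ordered at or before the current row, divided by the partition size. The plain-Python sketch below illustrates that definition only. The function cume_dist and its sample data are hypothetical, not pylegend code.

```python
from collections import defaultdict

def cume_dist(rows, group_col, value_col, ascending=True):
    """Return one CUME_DIST value per input row, in input order."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[group_col]].append(row[value_col])
    result = []
    for row in rows:
        values = partitions[row[group_col]]
        current = row[value_col]
        if ascending:
            at_or_before = sum(1 for v in values if v <= current)
        else:
            at_or_before = sum(1 for v in values if v >= current)
        result.append(at_or_before / len(values))
    return result

rows = [
    {"grp": "a", "val": 10},
    {"grp": "a", "val": 20},
    {"grp": "a", "val": 20},  # ties share the same cumulative distribution
    {"grp": "a", "val": 30},
    {"grp": "a", "val": 40},
    {"grp": "b", "val": 7},   # a single-row partition always yields 1.0
]
dist = cume_dist(rows, "grp", "val")
```

The smallest value in a five-row partition gets 1/5 = 0.2, and the largest value in any partition gets 1.0.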

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame["CumeDist"] = frame.groupby(
    "Ship Name"
)["Order Id"].cume_dist_legend_ext()
frame.head(5).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name	CumeDist
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier	0.2
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten	0.166667
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes	0.071429
3	10251	1996-07-08	1996-08-05	1996-07-15	Victuailles en stock	0.1
4	10252	1996-07-09	1996-08-06	1996-07-11	Suprêmes délices	0.083333

expanding

GroupbySeries.expanding(min_periods=1, method=None, order_by=None, ascending=True)

Create an expanding (cumulative) window on a single grouped column.

The grouping columns are automatically used as PARTITION BY. An expanding window includes all rows from the start of the partition up to the current row.

Parameters:

min_periods (int) – Minimum number of observations required to produce a value.
method (Optional[str]) – Not supported. Must be None.
order_by (Union[str, Sequence[str], None]) – Column(s) to order by within the window.
ascending (Union[bool, Sequence[bool]]) – Sort direction(s) for order_by columns.

Returns:

A window series on which aggregates (sum, mean, etc.) can be called.

Return type:

WindowSeries

Raises:

NotImplementedError – If method is not None.

See also

rolling: Fixed-size grouped sliding window.
window_frame_legend_ext: Custom window specification.

Notes

Differences from pandas:

order_by and ascending are pylegend extensions not present in pandas.
method is not supported.
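
The expanding window can be thought of as a running aggregate that restarts in every partition. The sketch below shows that idea for sum() in plain Python. expanding_sum is a hypothetical helper, it is not how pylegend evaluates the query, and it leaves min_periods at its default of 1.

```python
from collections import defaultdict

def expanding_sum(rows, group_col, value_col, order_by):
    """Running sum per partition, returned in the original row order."""
    partitions = defaultdict(list)
    for index, row in enumerate(rows):
        partitions[row[group_col]].append(index)
    result = [None] * len(rows)
    for indices in partitions.values():
        running = 0
        # Walk each partition in order_by order, accumulating as we go.
        for index in sorted(indices, key=lambda i: rows[i][order_by]):
            running += rows[index][value_col]
            result[index] = running
    return result

rows = [
    {"grp": "a", "val": 3},
    {"grp": "b", "val": 10},
    {"grp": "a", "val": 1},
    {"grp": "a", "val": 2},
]
running_totals = expanding_sum(rows, "grp", "val", order_by="val")
```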

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.groupby("Ship Name")["Order Id"].expanding(
    order_by="Order Id"
).sum().to_pandas().head(5)

	Order Id
0	10248
1	10249
2	10250
3	10251
4	10252

max

GroupbySeries.max(numeric_only=False, min_count=-1, engine=None, engine_kwargs=None)

Compute the maximum of values within each group.

Parameters:

numeric_only (bool) – Must be False. True is not supported.
min_count (int) – Must be -1. Other values are not supported.
engine (Optional[str]) – Not supported. Must be None.
engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.

Returns:

A frame with grouping columns and the maximum values.

Return type:

Union[PandasApiTdsFrame, GroupbySeries]

Notes

Equivalent to gseries.aggregate("max"). Works on string columns as well (lexicographic maximum).

Differences from pandas: numeric_only, engine, engine_kwargs, and non-default min_count are not supported.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.groupby("Ship Name")["Order Id"].max().to_pandas().head(5)

	Ship Name	Order Id
0	Alfred's Futterkiste	11011
1	Alfreds Futterkiste	10643
2	Ana Trujillo Emparedados y helados	10926
3	Antonio Moreno Taquería	10856
4	Around the Horn	11016

mean

GroupbySeries.mean(numeric_only=False, engine=None, engine_kwargs=None)

Compute the mean of values within each group.

Parameters:

numeric_only (bool) – Must be False. True is not supported.
engine (Optional[str]) – Not supported. Must be None.
engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.

Returns:

A frame with grouping columns and the mean values.

Return type:

Union[PandasApiTdsFrame, GroupbySeries]

Notes

Equivalent to gseries.aggregate("mean"). Maps to SQL AVG().

Differences from pandas: numeric_only, engine, and engine_kwargs are not supported.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.groupby("Ship Name")["Order Id"].mean().to_pandas().head(5)

	Ship Name	Order Id
0	Alfred's Futterkiste	10838.4
1	Alfreds Futterkiste	10643.0
2	Ana Trujillo Emparedados y helados	10654.5
3	Antonio Moreno Taquería	10599.285714
4	Around the Horn	10711.846154

median

GroupbySeries.median()

Compute the median within each group.

Maps to PERCENTILE_CONT(0.5) at the SQL level.

Returns: Grouped median values.
Return type: Union[PandasApiTdsFrame, GroupbySeries]

See also

mean: Compute group means.
aggregate: General grouped aggregation.
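
PERCENTILE_CONT interpolates linearly between the two nearest ordered values, which is why a group with an even number of rows can produce a median that is not one of its values. The sketch below illustrates the standard SQL definition in plain Python. percentile_cont is a hypothetical helper that assumes a non-empty list without nulls.

```python
import math

def percentile_cont(values, fraction):
    """Continuous percentile with linear interpolation (SQL PERCENTILE_CONT)."""
    ordered = sorted(values)
    position = fraction * (len(ordered) - 1)  # fractional 0-based row position
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight

even_median = percentile_cont([4, 1, 3, 2], 0.5)  # halfway between 2 and 3
odd_median = percentile_cont([5, 1, 3], 0.5)      # exactly the middle value
```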

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.groupby("Ship Name")["Order Id"].median().to_pandas().head(5)

	Ship Name	Order Id
0	Alfred's Futterkiste	10835.0
1	Alfreds Futterkiste	10643.0
2	Ana Trujillo Emparedados y helados	10692.0
3	Antonio Moreno Taquería	10573.0
4	Around the Horn	10743.0

min

GroupbySeries.min(numeric_only=False, min_count=-1, engine=None, engine_kwargs=None)

Compute the minimum of values within each group.

Parameters:

numeric_only (bool) – Must be False. True is not supported.
min_count (int) – Must be -1. Other values are not supported.
engine (Optional[str]) – Not supported. Must be None.
engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.

Returns:

A frame with grouping columns and the minimum values.

Return type:

Union[PandasApiTdsFrame, GroupbySeries]

Notes

Equivalent to gseries.aggregate("min"). Works on string columns as well (lexicographic minimum).

Differences from pandas: numeric_only, engine, engine_kwargs, and non-default min_count are not supported.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.groupby("Ship Name")["Order Id"].min().to_pandas().head(5)

	Ship Name	Order Id
0	Alfred's Futterkiste	10692
1	Alfreds Futterkiste	10643
2	Ana Trujillo Emparedados y helados	10308
3	Antonio Moreno Taquería	10365
4	Around the Horn	10355

ntile_legend_ext

GroupbySeries.ntile_legend_ext(num_buckets, ascending=True)

Assign rows to numbered buckets within each group.

PyLegend extension — not present in pandas.

Maps to SQL NTILE(n) OVER (PARTITION BY ... ORDER BY col) and Pure ntile.

Parameters:

num_buckets (int) – Number of buckets to distribute rows into.
ascending (bool) – Whether to order in ascending direction.

Returns:

A grouped series containing bucket numbers (1-based).

Return type:

GroupbySeries

See also

rank: Compute grouped ranks.
cume_dist_legend_ext: Cumulative distribution within groups.

Notes

Differences from pandas:

This method has no pandas equivalent. NTILE is exposed as a pylegend extension.
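
In standard SQL, NTILE(n) splits the ordered rows of a partition into n buckets whose sizes differ by at most one, with the larger buckets coming first. The sketch below shows that rule for a single partition in plain Python. The ntile function is a hypothetical illustration, not pylegend code.

```python
def ntile(row_count, num_buckets):
    """Bucket number (1-based) for each of row_count ordered rows."""
    base, extra = divmod(row_count, num_buckets)
    buckets = []
    for bucket in range(1, num_buckets + 1):
        # The first `extra` buckets receive one additional row.
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets

quartiles = ntile(10, 4)    # bucket sizes 3, 3, 2, 2
small_group = ntile(3, 4)   # fewer rows than buckets: one row per bucket
```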

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame["Quartile"] = frame.groupby(
    "Ship Name"
)["Order Id"].ntile_legend_ext(4)
frame.head(5).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name	Quartile
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier	1
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten	1
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes	1
3	10251	1996-07-08	1996-08-05	1996-07-15	Victuailles en stock	1
4	10252	1996-07-09	1996-08-06	1996-07-11	Suprêmes délices	1

rank

GroupbySeries.rank(method='min', ascending=True, na_option='bottom', pct=False, axis=0)

Compute the rank of values within each group.

Return a new GroupbySeries containing the rank of each value within its group. The grouping columns act as the PARTITION BY clause in the underlying SQL window function. The result can be assigned back to the parent frame or executed directly as a standalone single-column query.

Parameters:

method (str) –
How to rank equal values:
- 'min' : Lowest rank in the group of ties (SQL RANK()).
- 'first' : Ranks by order of appearance within the group (SQL ROW_NUMBER()).
- 'dense' : Like 'min' but no gaps (SQL DENSE_RANK()).
ascending (bool) – Whether to rank in ascending order.
na_option (str) – Only 'bottom' is supported.
pct (bool) – If True, compute percentage ranks (SQL PERCENT_RANK()). Returns a FloatGroupbySeries. Only supported with method='min'.
axis (Union[int, str]) – Must be 0 or 'index'.

Returns:

An IntegerGroupbySeries (or FloatGroupbySeries when pct=True) containing the ranks within each group.

Return type:

GroupbySeries

Raises:

NotImplementedError – In any of the following cases:
- Called on a computed GroupbySeries expression (e.g. (frame.groupby('grp')['col'] + 5).rank()). Call rank() first, then apply arithmetic.
- method is not 'min', 'first', or 'dense'.
- na_option is not 'bottom'.
- pct=True with a method other than 'min'.

See also

Series.rank: Frame-level rank (no partitioning).
PandasApiGroupbyTdsFrame.rank: Rank all non-grouping columns.

Notes

Differences from pandas:

The 'average' and 'max' methods are not supported.
na_option only supports 'bottom'.
pct=True is only supported with method='min'.
Calling rank() on a computed GroupbySeries expression is not supported. Call rank() first, then apply arithmetic: frame.groupby('grp')['col'].rank() + 5.
Only one window-function call is allowed per expression. To combine multiple, use separate assignments.
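
The three supported methods differ only in how ties are handled. The plain-Python sketch below ranks a single partition according to the SQL functions named above (RANK, ROW_NUMBER, DENSE_RANK, and PERCENT_RANK, which is (rank - 1) / (rows - 1)). rank_values is a hypothetical helper written for illustration. It is not pylegend's implementation and ignores nulls.

```python
def rank_values(values, method="min", ascending=True, pct=False):
    """Rank one partition; ranks are returned in input order."""
    if pct and method != "min":
        raise NotImplementedError("pct=True is only sketched for method='min'")
    order = sorted(
        range(len(values)), key=lambda i: values[i], reverse=not ascending
    )
    ranks = [0] * len(values)
    dense = 0
    previous = object()  # sentinel that never equals a real value
    for position, index in enumerate(order, start=1):
        if values[index] != previous:
            dense += 1                 # new distinct value
            first_position = position  # position of the first row in the tie
        previous = values[index]
        if method == "first":
            ranks[index] = position        # ROW_NUMBER()
        elif method == "dense":
            ranks[index] = dense           # DENSE_RANK()
        else:
            ranks[index] = first_position  # RANK()
    if pct:
        n = len(values)
        return [(r - 1) / (n - 1) if n > 1 else 0.0 for r in ranks]
    return ranks

values = [10, 20, 20, 30]
min_ranks = rank_values(values)                    # gap after the tie
dense_ranks = rank_values(values, method="dense")  # no gap after the tie
first_ranks = rank_values(values, method="first")  # ties broken by order
```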

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Execute a grouped ranked series directly
frame.groupby("Ship Name")["Order Id"].rank().to_pandas().head()

	Order Id
0	1
1	1
2	1
3	1
4	1

# Assign a grouped rank to the parent frame
frame["Order Rank"] = frame.groupby(
    "Ship Name"
)["Order Id"].rank()
frame.head(5).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name	Order Rank
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier	1
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten	1
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes	1
3	10251	1996-07-08	1996-08-05	1996-07-15	Victuailles en stock	1
4	10252	1996-07-09	1996-08-06	1996-07-11	Suprêmes délices	1

frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Dense rank, descending
frame["Dense Rank"] = frame.groupby(
    "Ship Name"
)["Order Id"].rank(method="dense", ascending=False)
frame.head(5).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name	Dense Rank
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier	5
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten	6
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes	14
3	10251	1996-07-08	1996-08-05	1996-07-15	Victuailles en stock	10
4	10252	1996-07-09	1996-08-06	1996-07-11	Suprêmes délices	12

rolling

GroupbySeries.rolling(window, min_periods=None, center=False, win_type=None, on=None, closed=None, step=None, method=None, order_by=None, ascending=True)

Create a fixed-size sliding window on a single grouped column.

The grouping columns are automatically used as PARTITION BY. A rolling window includes a fixed number of preceding rows for each row within the partition.

Parameters:

window (int) – Size of the moving window (number of rows).
min_periods (Optional[int]) – Minimum observations required. Defaults to window.
center (bool) – Not supported. Must be False.
win_type (Optional[str]) – Not supported. Must be None.
on (Optional[str]) – Not supported. Must be None.
closed (Optional[str]) – Not supported. Must be None.
step (Optional[int]) – Not supported. Must be None.
method (Optional[str]) – Not supported. Must be None.
order_by (Union[str, Sequence[str], None]) – Column(s) to order by within the window.
ascending (Union[bool, Sequence[bool]]) – Sort direction(s) for order_by columns.

Returns:

A window series on which aggregates (sum, mean, etc.) can be called.

Return type:

WindowSeries

Raises:

NotImplementedError – If center, win_type, on, closed, step, or method are set to non-default values.

See also

expanding: Expanding (cumulative) grouped window.
window_frame_legend_ext: Custom window specification.

Notes

Differences from pandas:

order_by and ascending are pylegend extensions not present in pandas.
center, win_type, on, closed, step, and method are not supported.
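
A rolling window of size n over ordered rows corresponds to the current row plus the n - 1 rows before it. The sketch below computes a rolling mean for one already-ordered partition in plain Python. rolling_mean is a hypothetical helper, and it makes one simplifying assumption: min_periods is ignored, so the first rows of a partition are averaged over however many rows are available.

```python
def rolling_mean(ordered_values, window):
    """Mean over the current row and up to window - 1 preceding rows."""
    result = []
    for i in range(len(ordered_values)):
        # Clip the frame at the start of the partition.
        frame = ordered_values[max(0, i - window + 1): i + 1]
        result.append(sum(frame) / len(frame))
    return result

means = rolling_mean([1, 2, 3, 4, 5], window=3)
```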

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.groupby("Ship Name")["Order Id"].rolling(
    window=3, order_by="Order Id"
).mean().to_pandas().head(5)

	Order Id
0	10248.0
1	10249.0
2	10250.0
3	10251.0
4	10252.0

std

GroupbySeries.std(ddof=1, engine=None, engine_kwargs=None, numeric_only=False)

Compute the standard deviation within each group.

Parameters:

ddof (int) – Degrees of freedom. 1 for sample standard deviation (STDDEV_SAMP), 0 for population standard deviation (STDDEV_POP).
engine (Optional[str]) – Not supported. Must be None.
engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.
numeric_only (bool) – Must be False. True is not supported.

Returns:

A frame with grouping columns and the standard deviation.

Return type:

Union[PandasApiTdsFrame, GroupbySeries]

Raises:

NotImplementedError – If ddof is not 0 or 1, or if engine, engine_kwargs, or numeric_only are set to unsupported values.

Notes

Equivalent to gseries.aggregate("std"). Maps to SQL STDDEV_SAMP() (ddof=1) or STDDEV_POP() (ddof=0).

Differences from pandas: only ddof=0 and ddof=1 are supported. engine, engine_kwargs, and numeric_only are not supported.
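
The effect of ddof can be checked with Python's statistics module, where stdev is the sample form (ddof=1) and pstdev is the population form (ddof=0). grouped_std below is a hypothetical helper for a single group, not pylegend code. It returns None where the sample standard deviation is undefined, which corresponds to the missing value shown for single-row groups.

```python
import statistics

def grouped_std(values, ddof=1):
    """Standard deviation of one group for ddof 0 or 1; None if undefined."""
    if ddof == 1:
        # Sample standard deviation needs at least two observations.
        return statistics.stdev(values) if len(values) > 1 else None
    if ddof == 0:
        return statistics.pstdev(values) if values else None
    raise NotImplementedError("only ddof=0 and ddof=1 are supported")

population = grouped_std([2, 4, 4, 4, 5, 5, 7, 9], ddof=0)
sample = grouped_std([1, 2, 3])      # ddof=1 by default
single_row = grouped_std([10643])    # undefined for one observation
```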

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.groupby("Ship Name")["Order Id"].std().to_pandas().head(5)

	Ship Name	Order Id
0	Alfred's Futterkiste	143.827327
1	Alfreds Futterkiste	<NA>
2	Ana Trujillo Emparedados y helados	261.766435
3	Antonio Moreno Taquería	156.531908
4	Around the Horn	215.034046

sum

GroupbySeries.sum(numeric_only=False, min_count=0, engine=None, engine_kwargs=None)

Compute the sum of values within each group.

Parameters:

numeric_only (bool) – Must be False. True is not supported.
min_count (int) – Must be 0. Non-zero values are not supported.
engine (Optional[str]) – Not supported. Must be None.
engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.

Returns:

A frame with grouping columns and the summed values.

Return type:

Union[PandasApiTdsFrame, GroupbySeries]

Notes

Equivalent to gseries.aggregate("sum").

Differences from pandas: numeric_only, engine, and engine_kwargs are not supported. min_count must be 0.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.groupby("Ship Name")["Order Id"].sum().to_pandas().head(5)

	Ship Name	Order Id
0	Alfred's Futterkiste	54192
1	Alfreds Futterkiste	10643
2	Ana Trujillo Emparedados y helados	42618
3	Antonio Moreno Taquería	74195
4	Around the Horn	139254

transform

GroupbySeries.transform(func)

Apply a partition-only window aggregate and broadcast back to every row.

Equivalent to pandas groupby['col'].transform('func'), which computes the aggregate per group and broadcasts the result back to every row.

Generates SQL like FUNC(col) OVER (PARTITION BY ...) and Pure like extend(over(~[grp]), ~col:{p,w,r | $r.col}:y | $y->func()).

Parameters: func (Union[str, Callable[..., object]]) – The aggregation to apply within each partition. Accepts a named aggregation string ('sum', 'mean', 'min', 'max', 'count', 'std', 'var') or a callable that receives a WindowSeries and returns the result.
Returns: A grouped series containing the broadcasted aggregate value for each row within its group.
Return type: GroupbySeries

See also

aggregate: Reduce groups to a single row per group.
expanding: Expanding (cumulative) window on a grouped column.

Notes

Differences from pandas:

The result keeps every row (same row count as the input), matching pandas transform semantics.
Only aggregation functions are supported as func. Arbitrary element-wise transforms (e.g. lambda x: x + 1) are not supported.
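
The difference between aggregate and transform is the shape of the result: transform keeps one value per input row. The plain-Python sketch below shows the broadcast for 'sum'. transform_sum and the sample rows are hypothetical and only illustrate the semantics of SUM(col) OVER (PARTITION BY ...).

```python
from collections import defaultdict

def transform_sum(rows, group_col, value_col):
    """Per-group sum, repeated on every row of the group."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_col]] += row[value_col]
    # One output value per input row, in the original row order.
    return [totals[row[group_col]] for row in rows]

rows = [
    {"grp": "a", "val": 1},
    {"grp": "b", "val": 10},
    {"grp": "a", "val": 2},
]
broadcast = transform_sum(rows, "grp", "val")
```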

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame["Group Sum"] = frame.groupby(
    "Ship Name"
)["Order Id"].transform("sum")
frame.head(5).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name	Group Sum
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier	52293
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten	63256
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes	150623
3	10251	1996-07-08	1996-08-05	1996-07-15	Victuailles en stock	105831
4	10252	1996-07-09	1996-08-06	1996-07-11	Suprêmes délices	128292

var

GroupbySeries.var(ddof=1, engine=None, engine_kwargs=None, numeric_only=False)

Compute the variance within each group.

Parameters:

ddof (int) – Degrees of freedom. 1 for sample variance (VAR_SAMP), 0 for population variance (VAR_POP).
engine (Optional[str]) – Not supported. Must be None.
engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.
numeric_only (bool) – Must be False. True is not supported.

Returns:

A frame with grouping columns and the variance.

Return type:

Union[PandasApiTdsFrame, GroupbySeries]

Raises:

NotImplementedError – If ddof is not 0 or 1, or if engine, engine_kwargs, or numeric_only are set to unsupported values.

Notes

Equivalent to gseries.aggregate("var"). Maps to SQL VAR_SAMP() (ddof=1) or VAR_POP() (ddof=0).

Differences from pandas: only ddof=0 and ddof=1 are supported. engine, engine_kwargs, and numeric_only are not supported.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.groupby("Ship Name")["Order Id"].var().to_pandas().head(5)

	Ship Name	Order Id
0	Alfred's Futterkiste	20686.3
1	Alfreds Futterkiste	<NA>
2	Ana Trujillo Emparedados y helados	68521.666667
3	Antonio Moreno Taquería	24502.238095
4	Around the Horn	46239.641026

wavg_legend_ext

window_frame_legend_ext

GroupbySeries.window_frame_legend_ext(frame_spec=<pylegend.core.language.pandas_api.pandas_api_frame_spec.RowsBetween object>, order_by=None, ascending=True)

Create a custom window specification on a single grouped column.

PyLegend extension — not present in pandas.

The grouping columns are automatically used as PARTITION BY. The frame_spec argument controls the ROWS BETWEEN or RANGE BETWEEN clause.

Parameters:

frame_spec (Optional[FrameSpec]) – A window-frame specification created via rows_between() or range_between().
order_by (Union[str, Sequence[str], None]) – Column(s) to order by within the window.
ascending (Union[bool, Sequence[bool]]) – Sort direction(s) for order_by columns.

Returns:

A window series on which aggregates can be called.

Return type:

WindowSeries

Raises:

TypeError – If frame_spec is not a RowsBetween or RangeBetween.

See also

expanding: Cumulative grouped window.
rolling: Fixed-size grouped sliding window.

Notes

Differences from pandas:

This method has no pandas equivalent. It is a pylegend extension for fine-grained control over the SQL ROWS BETWEEN / RANGE BETWEEN clause.
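
A ROWS BETWEEN frame selects rows by their offset from the current row. The sketch below sums such a frame for one already-ordered partition in plain Python. rows_between_sum is a hypothetical helper, and it assumes that a negative offset means "preceding" and a positive one "following", so (-2, 0) is read as 2 PRECEDING to CURRENT ROW. That reading of RowsBetween arguments is an assumption, not something this page states.

```python
def rows_between_sum(ordered_values, start, end):
    """Sum over rows i + start .. i + end, clipped to the partition bounds."""
    n = len(ordered_values)
    result = []
    for i in range(n):
        lo = max(0, i + start)
        hi = min(n - 1, i + end)
        frame = ordered_values[lo:hi + 1]
        # An empty frame has no sum; SQL would return NULL here.
        result.append(sum(frame) if frame else None)
    return result

trailing = rows_between_sum([1, 2, 3, 4], -2, 0)  # two preceding rows + current
centered = rows_between_sum([1, 2, 3, 4], -1, 1)  # one row on each side
```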

Examples

import pylegend
from pylegend.core.language.pandas_api.pandas_api_frame_spec import (
    RowsBetween,
)
frame = pylegend.samples.pandas_api.northwind_orders_frame()

spec = RowsBetween(-2, 0)
frame.groupby("Ship Name")["Order Id"].window_frame_legend_ext(
    spec, order_by="Order Id"
).sum().to_pandas().head()

	Order Id
0	10248
1	10249
2	10250
3	10251
4	10252
