Pandas Groupby Series

A single-column proxy within a grouped context.

A GroupbySeries is the grouped counterpart of Series. It represents one column of a PandasApiGroupbyTdsFrame and is obtained by bracket-indexing a groupby object with a single column name.

Obtaining a GroupbySeries

Use bracket notation on a PandasApiGroupbyTdsFrame:

grouped = frame.groupby("group_col")
gseries = grouped["value_col"]   # -> GroupbySeries

Passing a list of column names returns a narrowed PandasApiGroupbyTdsFrame instead (not a GroupbySeries):

grouped[["col_a", "col_b"]]  # -> PandasApiGroupbyTdsFrame

The returned subclass matches the column type, following the same mapping as Series. For example, an integer column becomes an IntegerGroupbySeries.

Operations

A GroupbySeries must have an applied function (such as an aggregation or rank()) before it can be executed or assigned. Attempting to call to_sql_query() on a bare GroupbySeries without an applied function raises RuntimeError.

Typical usage patterns:

  • Grouped aggregation — call an aggregation method directly:

    frame.groupby("grp")["val"].sum()
    frame.groupby("grp")["val"].aggregate(["sum", "mean"])
    
  • Grouped rank — call rank() to get a window-ranked GroupbySeries that can be assigned back:

    frame["ranked"] = frame.groupby("grp")["val"].rank()
    

Assigning back to the frame

A GroupbySeries (with an applied function like rank()) can be assigned back to the parent PandasApiTdsFrame using bracket assignment:

frame["new_col"] = frame.groupby("grp")["val"].rank()

The assignment must target the same frame that was grouped.

See also

Series

The non-grouped single-column proxy.

PandasApiGroupbyTdsFrame

The groupby object that produces this.

PandasApiTdsFrame.groupby

Create a groupby object.

Notes

Differences from pandas:

  • A GroupbySeries is not iterable and does not support direct data access. It is an expression builder that lazily constructs the query.

  • Applying functions on a computed GroupbySeries expression is not supported. For example, (frame.groupby('grp')['col'] + 5).sum() raises NotImplementedError. Instead, do frame.groupby('grp')['col'].sum() + 5.

  • Only one function call is allowed per expression. To combine multiple, use separate assignment steps.

  • A bare GroupbySeries (without an aggregation or window function) cannot be executed. You must call an operation such as sum(), rank(), etc. first.

Examples

Download Interactive Notebook

agg

GroupbySeries.agg(func, axis=0, *args, **kwargs)[source]

Alias for aggregate().

See aggregate() for full documentation.

Return type:

Union[PandasApiTdsFrame, GroupbySeries]

aggregate

GroupbySeries.aggregate(func, axis=0, *args, **kwargs)[source]

Aggregate each group using one or more operations.

Reduce the single column within each group to a scalar value. The result is a PandasApiTdsFrame with one row per group, containing the grouping columns and the aggregated value(s).

Parameters:
  • func (Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc, List[Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc]], Mapping[Hashable, Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc, List[Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc]]]]]) –

    Aggregation specification:

    • str — a named aggregation ('sum', 'mean', 'min', 'max', 'count', 'std', 'var', plus aliases 'len', 'size').

    • callable — a lambda receiving the GroupbySeries and calling one of its aggregation methods (e.g. lambda x: x.sum()).

    • list of str — multiple named aggregations. Result columns are named "agg(col_name)".

    • dict{column_name: agg_spec}. Keys must match the GroupbySeries’ column name.

  • axis (Union[int, str]) – Must be 0 or 'index'.

Returns:

A frame with one row per group and the aggregated column(s), plus the grouping columns.

Return type:

Union[PandasApiTdsFrame, GroupbySeries]

Raises:
  • NotImplementedError – If called on a computed GroupbySeries expression (e.g. (frame.groupby('grp')['col'] + 5).aggregate('sum')).

  • ValueError – If a dict key does not match the GroupbySeries’ column name.

See also

agg

Alias for aggregate.

sum

Grouped sum.

PandasApiGroupbyTdsFrame.aggregate

Aggregate on the full groupby frame.

Notes

Differences from pandas:

  • The result always includes the grouping columns alongside the aggregated values.

  • Aggregation on a computed GroupbySeries expression is not supported. Call the aggregation directly, then apply arithmetic if needed.

  • When func is a dict, keys must exactly match the GroupbySeries’ column name.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Single named aggregation
frame.groupby("Ship Name")["Order Id"].aggregate(
    "sum"
).to_pandas().head(5)
Ship Name Order Id
0 Alfred's Futterkiste 54192
1 Alfreds Futterkiste 10643
2 Ana Trujillo Emparedados y helados 42618
3 Antonio Moreno Taquería 74195
4 Around the Horn 139254
# Multiple aggregations
frame.groupby("Ship Name")["Order Id"].aggregate(
    ["min", "max", "count"]
).head(5).to_pandas()
Ship Name min(Order Id) max(Order Id) count(Order Id)
0 Alfred's Futterkiste 10692 11011 5
1 Alfreds Futterkiste 10643 10643 1
2 Ana Trujillo Emparedados y helados 10308 10926 4
3 Antonio Moreno Taquería 10365 10856 7
4 Around the Horn 10355 11016 13

corr

count

GroupbySeries.count()[source]

Compute the count of non-null values within each group.

Returns:

A frame with grouping columns and the count per group.

Return type:

Union[PandasApiTdsFrame, GroupbySeries]

Notes

Equivalent to gseries.aggregate("count"). Maps to SQL COUNT(column).

Differences from pandas: the signature takes no parameters (the pandas version accepts normalize and other keyword arguments which are not supported here).

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].count().to_pandas().head(5)
Ship Name Order Id
0 Alfred's Futterkiste 5
1 Alfreds Futterkiste 1
2 Ana Trujillo Emparedados y helados 4
3 Antonio Moreno Taquería 7
4 Around the Horn 13

cov

cume_dist_legend_ext

GroupbySeries.cume_dist_legend_ext(ascending=True)[source]

Compute the cumulative distribution within each group.

PyLegend extension — not present in pandas.

Maps to SQL CUME_DIST() OVER (PARTITION BY ... ORDER BY col) and Pure cumulativeDistribution.

Parameters:

ascending (bool) – Whether to order in ascending direction.

Returns:

A grouped series containing cumulative distribution values (floats between 0 and 1).

Return type:

GroupbySeries

See also

rank

Compute grouped ranks.

ntile_legend_ext

Assign rows to numbered buckets.

Notes

Differences from pandas:

  • This method has no pandas equivalent. CUME_DIST is exposed as a pylegend extension.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame["CumeDist"] = frame.groupby(
    "Ship Name"
)["Order Id"].cume_dist_legend_ext()
frame.head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name CumeDist
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier 0.2
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten 0.166667
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes 0.071429
3 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock 0.1
4 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices 0.083333

expanding

GroupbySeries.expanding(min_periods=1, method=None, order_by=None, ascending=True)[source]

Create an expanding (cumulative) window on a single grouped column.

The grouping columns are automatically used as PARTITION BY. An expanding window includes all rows from the start of the partition up to the current row.

Parameters:
  • min_periods (int) – Minimum number of observations required to produce a value.

  • method (Optional[str]) – Not supported. Must be None.

  • order_by (Union[str, Sequence[str], None]) – Column(s) to order by within the window.

  • ascending (Union[bool, Sequence[bool]]) – Sort direction(s) for order_by columns.

Returns:

A window series on which aggregates (sum, mean, etc.) can be called.

Return type:

WindowSeries

Raises:

NotImplementedError – If method is not None.

See also

rolling

Fixed-size grouped sliding window.

window_frame_legend_ext

Custom window specification.

Notes

Differences from pandas:

  • order_by and ascending are pylegend extensions not present in pandas.

  • method is not supported.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].expanding(
    order_by="Order Id"
).sum().to_pandas().head(5)
Order Id
0 10248
1 10249
2 10250
3 10251
4 10252

max

GroupbySeries.max(numeric_only=False, min_count=-1, engine=None, engine_kwargs=None)[source]

Compute the maximum of values within each group.

Parameters:
  • numeric_only (bool) – Must be False. True is not supported.

  • min_count (int) – Must be -1. Other values are not supported.

  • engine (Optional[str]) – Not supported. Must be None.

  • engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.

Returns:

A frame with grouping columns and the maximum values.

Return type:

Union[PandasApiTdsFrame, GroupbySeries]

Notes

Equivalent to gseries.aggregate("max"). Works on string columns as well (lexicographic maximum).

Differences from pandas: numeric_only, engine, engine_kwargs, and non-default min_count are not supported.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].max().to_pandas().head(5)
Ship Name Order Id
0 Alfred's Futterkiste 11011
1 Alfreds Futterkiste 10643
2 Ana Trujillo Emparedados y helados 10926
3 Antonio Moreno Taquería 10856
4 Around the Horn 11016

mean

GroupbySeries.mean(numeric_only=False, engine=None, engine_kwargs=None)[source]

Compute the mean of values within each group.

Parameters:
  • numeric_only (bool) – Must be False. True is not supported.

  • engine (Optional[str]) – Not supported. Must be None.

  • engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.

Returns:

A frame with grouping columns and the mean values.

Return type:

Union[PandasApiTdsFrame, GroupbySeries]

Notes

Equivalent to gseries.aggregate("mean"). Maps to SQL AVG().

Differences from pandas: numeric_only, engine, and engine_kwargs are not supported.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].mean().to_pandas().head(5)
Ship Name Order Id
0 Alfred's Futterkiste 10838.4
1 Alfreds Futterkiste 10643.0
2 Ana Trujillo Emparedados y helados 10654.5
3 Antonio Moreno Taquería 10599.285714
4 Around the Horn 10711.846154

median

GroupbySeries.median()[source]

Compute the median within each group.

Maps to PERCENTILE_CONT(0.5) at the SQL level.

Returns:

Grouped median values.

Return type:

Union[PandasApiTdsFrame, GroupbySeries]

See also

mean

Compute group means.

aggregate

General grouped aggregation.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].median().to_pandas().head(5)
Ship Name Order Id
0 Alfred's Futterkiste 10835.0
1 Alfreds Futterkiste 10643.0
2 Ana Trujillo Emparedados y helados 10692.0
3 Antonio Moreno Taquería 10573.0
4 Around the Horn 10743.0

min

GroupbySeries.min(numeric_only=False, min_count=-1, engine=None, engine_kwargs=None)[source]

Compute the minimum of values within each group.

Parameters:
  • numeric_only (bool) – Must be False. True is not supported.

  • min_count (int) – Must be -1. Other values are not supported.

  • engine (Optional[str]) – Not supported. Must be None.

  • engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.

Returns:

A frame with grouping columns and the minimum values.

Return type:

Union[PandasApiTdsFrame, GroupbySeries]

Notes

Equivalent to gseries.aggregate("min"). Works on string columns as well (lexicographic minimum).

Differences from pandas: numeric_only, engine, engine_kwargs, and non-default min_count are not supported.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].min().to_pandas().head(5)
Ship Name Order Id
0 Alfred's Futterkiste 10692
1 Alfreds Futterkiste 10643
2 Ana Trujillo Emparedados y helados 10308
3 Antonio Moreno Taquería 10365
4 Around the Horn 10355

ntile_legend_ext

GroupbySeries.ntile_legend_ext(num_buckets, ascending=True)[source]

Assign rows to numbered buckets within each group.

PyLegend extension — not present in pandas.

Maps to SQL NTILE(n) OVER (PARTITION BY ... ORDER BY col) and Pure ntile.

Parameters:
  • num_buckets (int) – Number of buckets to distribute rows into.

  • ascending (bool) – Whether to order in ascending direction.

Returns:

A grouped series containing bucket numbers (1-based).

Return type:

GroupbySeries

See also

rank

Compute grouped ranks.

cume_dist_legend_ext

Cumulative distribution within groups.

Notes

Differences from pandas:

  • This method has no pandas equivalent. NTILE is exposed as a pylegend extension.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame["Quartile"] = frame.groupby(
    "Ship Name"
)["Order Id"].ntile_legend_ext(4)
frame.head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name Quartile
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier 1
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten 1
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes 1
3 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock 1
4 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices 1

rank

GroupbySeries.rank(method='min', ascending=True, na_option='bottom', pct=False, axis=0)[source]

Compute the rank of values within each group.

Return a new GroupbySeries containing the rank of each value within its group. The grouping columns act as the PARTITION BY clause in the underlying SQL window function. The result can be assigned back to the parent frame or executed directly as a standalone single-column query.

Parameters:
  • method (str) –

    How to rank equal values:

    • 'min' : Lowest rank in the group of ties (SQL RANK()).

    • 'first' : Ranks by order of appearance within the group (SQL ROW_NUMBER()).

    • 'dense' : Like 'min' but no gaps (SQL DENSE_RANK()).

  • ascending (bool) – Whether to rank in ascending order.

  • na_option (str) – Only 'bottom' is supported.

  • pct (bool) – If True, compute percentage ranks (SQL PERCENT_RANK()). Returns a FloatGroupbySeries. Only supported with method='min'.

  • axis (Union[int, str]) – Must be 0 or 'index'.

Returns:

An IntegerGroupbySeries (or FloatGroupbySeries when pct=True) containing the ranks within each group.

Return type:

GroupbySeries

Raises:

NotImplementedError – If called on a computed GroupbySeries expression (e.g. (frame.groupby('grp')['col'] + 5).rank()). Call rank() first, then apply arithmetic. If method is not 'min', 'first', or 'dense'. If na_option is not 'bottom'. If pct=True with a method other than 'min'.

See also

Series.rank

Frame-level rank (no partitioning).

PandasApiGroupbyTdsFrame.rank

Rank all non-grouping columns.

Notes

Differences from pandas:

  • The 'average' and 'max' methods are not supported.

  • na_option only supports 'bottom'.

  • pct=True is only supported with method='min'.

  • Calling rank() on a computed GroupbySeries expression is not supported. Call rank() first, then apply arithmetic: frame.groupby('grp')['col'].rank() + 5.

  • Only one window-function call is allowed per expression. To combine multiple, use separate assignments.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Execute a grouped ranked series directly
frame.groupby("Ship Name")["Order Id"].rank().to_pandas().head()
Order Id
0 1
1 1
2 1
3 1
4 1
# Assign a grouped rank to the parent frame
frame["Order Rank"] = frame.groupby(
    "Ship Name"
)["Order Id"].rank()
frame.head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name Order Rank
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier 1
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten 1
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes 1
3 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock 1
4 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices 1
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Dense rank, descending
frame["Dense Rank"] = frame.groupby(
    "Ship Name"
)["Order Id"].rank(method="dense", ascending=False)
frame.head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name Dense Rank
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier 5
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten 6
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes 14
3 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock 10
4 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices 12

rolling

GroupbySeries.rolling(window, min_periods=None, center=False, win_type=None, on=None, closed=None, step=None, method=None, order_by=None, ascending=True)[source]

Create a fixed-size sliding window on a single grouped column.

The grouping columns are automatically used as PARTITION BY. A rolling window includes a fixed number of preceding rows for each row within the partition.

Parameters:
  • window (int) – Size of the moving window (number of rows).

  • min_periods (Optional[int]) – Minimum observations required. Defaults to window.

  • center (bool) – Not supported. Must be False.

  • win_type (Optional[str]) – Not supported. Must be None.

  • on (Optional[str]) – Not supported. Must be None.

  • closed (Optional[str]) – Not supported. Must be None.

  • step (Optional[int]) – Not supported. Must be None.

  • method (Optional[str]) – Not supported. Must be None.

  • order_by (Union[str, Sequence[str], None]) – Column(s) to order by within the window.

  • ascending (Union[bool, Sequence[bool]]) – Sort direction(s) for order_by columns.

Returns:

A window series on which aggregates (sum, mean, etc.) can be called.

Return type:

WindowSeries

Raises:

NotImplementedError – If center, win_type, on, closed, step, or method are set to non-default values.

See also

expanding

Expanding (cumulative) grouped window.

window_frame_legend_ext

Custom window specification.

Notes

Differences from pandas:

  • order_by and ascending are pylegend extensions not present in pandas.

  • center, win_type, on, closed, step, and method are not supported.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].rolling(
    window=3, order_by="Order Id"
).mean().to_pandas().head(5)
Order Id
0 10248.0
1 10249.0
2 10250.0
3 10251.0
4 10252.0

std

GroupbySeries.std(ddof=1, engine=None, engine_kwargs=None, numeric_only=False)[source]

Compute the standard deviation within each group.

Parameters:
  • ddof (int) – Degrees of freedom. 1 for sample standard deviation (STDDEV_SAMP), 0 for population standard deviation (STDDEV_POP).

  • engine (Optional[str]) – Not supported. Must be None.

  • engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.

  • numeric_only (bool) – Must be False. True is not supported.

Returns:

A frame with grouping columns and the standard deviation.

Return type:

Union[PandasApiTdsFrame, GroupbySeries]

Raises:

NotImplementedError – If ddof is not 0 or 1, or if engine, engine_kwargs, or numeric_only are set to unsupported values.

Notes

Equivalent to gseries.aggregate("std"). Maps to SQL STDDEV_SAMP() (ddof=1) or STDDEV_POP() (ddof=0).

Differences from pandas: only ddof=0 and ddof=1 are supported. engine, engine_kwargs, and numeric_only are not supported.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].std().to_pandas().head(5)
Ship Name Order Id
0 Alfred's Futterkiste 143.827327
1 Alfreds Futterkiste <NA>
2 Ana Trujillo Emparedados y helados 261.766435
3 Antonio Moreno Taquería 156.531908
4 Around the Horn 215.034046

sum

GroupbySeries.sum(numeric_only=False, min_count=0, engine=None, engine_kwargs=None)[source]

Compute the sum of values within each group.

Parameters:
  • numeric_only (bool) – Must be False. True is not supported.

  • min_count (int) – Must be 0. Non-zero values are not supported.

  • engine (Optional[str]) – Not supported. Must be None.

  • engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.

Returns:

A frame with grouping columns and the summed values.

Return type:

Union[PandasApiTdsFrame, GroupbySeries]

Notes

Equivalent to gseries.aggregate("sum").

Differences from pandas: numeric_only, engine, and engine_kwargs are not supported. min_count must be 0.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].sum().to_pandas().head(5)
Ship Name Order Id
0 Alfred's Futterkiste 54192
1 Alfreds Futterkiste 10643
2 Ana Trujillo Emparedados y helados 42618
3 Antonio Moreno Taquería 74195
4 Around the Horn 139254

transform

GroupbySeries.transform(func)[source]

Apply a partition-only window aggregate and broadcast back to every row.

Equivalent to pandas groupby['col'].transform('func'), which computes the aggregate per group and broadcasts the result back to every row.

Generates SQL like FUNC(col) OVER (PARTITION BY ...) and Pure like extend(over(~[grp]), ~col:{p,w,r | $r.col}:y | $y->func()).

Parameters:

func (Union[str, Callable[..., object]]) – The aggregation to apply within each partition. Accepts a named aggregation string ('sum', 'mean', 'min', 'max', 'count', 'std', 'var') or a callable that receives a WindowSeries and returns the result.

Returns:

A grouped series containing the broadcasted aggregate value for each row within its group.

Return type:

GroupbySeries

See also

aggregate

Reduce groups to a single row per group.

expanding

Expanding (cumulative) window on a grouped column.

Notes

Differences from pandas:

  • The result keeps every row (same row count as the input), matching pandas transform semantics.

  • Only aggregation functions are supported as func. Arbitrary element-wise transforms (e.g. lambda x: x + 1) are not supported.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame["Group Sum"] = frame.groupby(
    "Ship Name"
)["Order Id"].transform("sum")
frame.head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name Group Sum
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier 52293
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten 63256
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes 150623
3 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock 105831
4 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices 128292

var

GroupbySeries.var(ddof=1, engine=None, engine_kwargs=None, numeric_only=False)[source]

Compute the variance within each group.

Parameters:
  • ddof (int) – Degrees of freedom. 1 for sample variance (VAR_SAMP), 0 for population variance (VAR_POP).

  • engine (Optional[str]) – Not supported. Must be None.

  • engine_kwargs (Optional[Dict[str, bool]]) – Not supported. Must be None.

  • numeric_only (bool) – Must be False. True is not supported.

Returns:

A frame with grouping columns and the variance.

Return type:

Union[PandasApiTdsFrame, GroupbySeries]

Raises:

NotImplementedError – If ddof is not 0 or 1, or if engine, engine_kwargs, or numeric_only are set to unsupported values.

Notes

Equivalent to gseries.aggregate("var"). Maps to SQL VAR_SAMP() (ddof=1) or VAR_POP() (ddof=0).

Differences from pandas: only ddof=0 and ddof=1 are supported. engine, engine_kwargs, and numeric_only are not supported.

Examples

Download Interactive Notebook

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.groupby("Ship Name")["Order Id"].var().to_pandas().head(5)
Ship Name Order Id
0 Alfred's Futterkiste 20686.3
1 Alfreds Futterkiste <NA>
2 Ana Trujillo Emparedados y helados 68521.666667
3 Antonio Moreno Taquería 24502.238095
4 Around the Horn 46239.641026

wavg_legend_ext

window_frame_legend_ext

GroupbySeries.window_frame_legend_ext(frame_spec=<pylegend.core.language.pandas_api.pandas_api_frame_spec.RowsBetween object>, order_by=None, ascending=True)[source]

Create a custom window specification on a single grouped column.

PyLegend extension — not present in pandas.

The grouping columns are automatically used as PARTITION BY. The frame_spec argument controls the ROWS BETWEEN or RANGE BETWEEN clause.

Parameters:
  • frame_spec (Optional[FrameSpec]) – A window-frame specification created via rows_between() or range_between().

  • order_by (Union[str, Sequence[str], None]) – Column(s) to order by within the window.

  • ascending (Union[bool, Sequence[bool]]) – Sort direction(s) for order_by columns.

Returns:

A window series on which aggregates can be called.

Return type:

WindowSeries

Raises:

TypeError – If frame_spec is not a RowsBetween or RangeBetween.

See also

expanding

Cumulative grouped window.

rolling

Fixed-size grouped sliding window.

Notes

Differences from pandas:

  • This method has no pandas equivalent. It is a pylegend extension for fine-grained control over the SQL ROWS BETWEEN / RANGE BETWEEN clause.

Examples

Download Interactive Notebook

import pylegend
from pylegend.core.language.pandas_api.pandas_api_frame_spec import (
    RowsBetween,
)
frame = pylegend.samples.pandas_api.northwind_orders_frame()
spec = RowsBetween(-2, 0)
frame.groupby("Ship Name")["Order Id"].window_frame_legend_ext(
    spec, order_by="Order Id"
).sum().to_pandas().head()
Order Id
0 10248
1 10249
2 10250
3 10251
4 10252

zscore_legend_ext