Pandas TDS Frame

The PandasApiTdsFrame class provides a pandas-like interface for working with TDS (Tabular Data Set) frames. It offers methods for data manipulation, filtering, aggregation, joins, and window functions.

agg

PandasApiTdsFrame.agg(func, axis=0, *args, **kwargs)

Alias for aggregate(). See aggregate for full documentation.

Return type: PandasApiTdsFrame

aggregate

PandasApiTdsFrame.aggregate(func, axis=0, *args, **kwargs)

Aggregate the TDS frame using one or more operations.

Apply one or more aggregation functions across all columns or specific columns, collapsing the frame into a single-row summary. Supported aggregation strings are 'sum', 'mean', 'min', 'max', 'count', 'std', and 'var', plus the aliases 'len' and 'size' (both map to count) and 'average' and 'avg' (both map to mean). Callables and NumPy universal functions are also supported.

Parameters:

func (Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc, List[Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc]], Mapping[Hashable, Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc, List[Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc]]]]]) –
Aggregation specification. Accepted forms:
- str : A named aggregation (e.g. 'sum') applied to every column.
- callable : A function that receives a column’s Series proxy and returns an aggregated value (e.g. lambda x: x.sum()), applied to every column.
- np.ufunc : A NumPy universal function (e.g. np.sum), applied to every column.
- list : A list containing one of the above, applied to every column. Output column names are prefixed with the function name (e.g. 'sum(col)').
- dict : A mapping of column name → aggregation (str, callable, np.ufunc, or a list of these). Only the specified columns appear in the result.
axis (Union[int, str]) – Axis along which to aggregate. Only 0 / 'index' is supported.
*args (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported. Passing positional arguments raises NotImplementedError.
**kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported. Passing keyword arguments raises NotImplementedError.

Returns:

A new single-row TDS frame with the aggregated values.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If axis is not 0 or 'index'. If extra *args or **kwargs are passed.
TypeError – If func is not a supported type (str, callable, np.ufunc, list, or dict). If dict keys are not strings, or dict/list values contain unsupported types.
ValueError – If a dict key refers to a column that does not exist in the frame.

See also

agg: Alias for aggregate.
groupby: Group rows before aggregating.
sum: Convenience method for sum aggregation.
mean: Convenience method for mean aggregation.

Notes

Differences from pandas:

In pandas, aggregate can return a multi-row result when multiple functions are applied (one row per function). Here, multiple functions per column produce multiple columns in a single-row result (e.g. {'col': ['min', 'max']} yields columns 'min(col)' and 'max(col)').
Extra *args and **kwargs are not forwarded to the aggregation function; passing them raises NotImplementedError.
axis=1 (column-wise aggregation) is not supported.
When func is a list, it must contain exactly one element. Multi-element lists behave identically to a single-element list mapping applied to every column.
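
The result shape described in these notes can be illustrated with a plain-Python sketch. It does not use PyLegend: aggregate_sketch and REDUCERS are hypothetical names, and only three of the supported aggregation strings are modelled. It is meant only to show how a dict specification collapses to one row and how list values yield 'func(col)' column names.

```python
# Plain-Python sketch of the single-row result shape; not PyLegend's implementation.
REDUCERS = {
    "count": lambda values: sum(v is not None for v in values),
    "min": lambda values: min(v for v in values if v is not None),
    "max": lambda values: max(v for v in values if v is not None),
}


def aggregate_sketch(columns, spec):
    """Reduce {column: values} to one row using a {column: func or [funcs]} spec."""
    row = {}
    for col, funcs in spec.items():
        if col not in columns:
            raise ValueError(f"Column {col!r} does not exist in the frame")
        if isinstance(funcs, list):
            # List values: one output column per function, named 'func(col)'.
            for name in funcs:
                row[f"{name}({col})"] = REDUCERS[name](columns[col])
        else:
            # Single function: the output column keeps the original name.
            row[col] = REDUCERS[funcs](columns[col])
    return row


data = {
    "Order Id": [10248, 10249, 10250],
    "Shipped Date": ["1996-07-16", None, "1996-07-12"],
}
print(aggregate_sketch(data, {"Order Id": ["min", "max"], "Shipped Date": "count"}))
```

Only the columns named in the dict appear in the output, which mirrors the dict form of func described above.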

Examples

import pylegend
import numpy as np
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Aggregate a single column with a string function
frame.aggregate({"Order Id": "count"}).to_pandas()

	Order Id
0	830

# Aggregate multiple columns with different functions
frame.aggregate({"Order Id": "min", "Ship Name": "count"}).to_pandas()

	Order Id	Ship Name
0	10248	830

# Broadcast a single function to all columns
frame.aggregate("count").to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	830	830	830	809	830

# Mix a lambda, a NumPy function, and a string aggregation
frame.aggregate({
    "Order Id": lambda x: x.max(),
    "Order Date": np.max,
    "Shipped Date": "min"
}).to_pandas()

	Order Id	Order Date	Shipped Date
0	11077	1998-05-06	1996-07-10

apply

PandasApiTdsFrame.apply(func, axis=0, raw=False, result_type=None, args=(), by_row='compat', engine='python', engine_kwargs=None, **kwargs)

Apply a function to each column of the TDS frame.

The callable receives a Series proxy for each column and must return a transformed value. The function is applied independently to every column, producing a new frame with the same column names but transformed values. Additional positional and keyword arguments can be forwarded to the callable via args and **kwargs.

Parameters:

func (Union[Callable[[Concatenate[Series, ParamSpec(P, bound= None)]], Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str]) – A function that takes a Series (column proxy) as its first argument and returns a primitive value or expression. String-based function names (e.g. 'sum') are not supported; use aggregate() for named aggregations.
axis (Union[int, str]) – Only column-wise application is supported (axis=0 or 'index'). Row-wise application (axis=1) raises ValueError.
raw (bool) – Must be False. True is not supported.
result_type (Optional[str]) – Must be None. Any value raises NotImplementedError.
args (Tuple[Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive], ...]) – Positional arguments to pass to func after the Series argument.
by_row (Union[bool, str]) – Must be False or 'compat'. True raises NotImplementedError.
engine (str) – Must be 'python'. 'numba' is not supported.
engine_kwargs (Optional[Dict[str, Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]]]) – Must be None. Not supported.
**kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Additional keyword arguments forwarded to func.

Returns:

A new TDS frame with the function applied to every column.

Return type:

PandasApiTdsFrame

Raises:

ValueError – If axis is not 0 or 'index'.
NotImplementedError – If raw=True, result_type is set, by_row=True, engine='numba', engine_kwargs is set, or func is a string.
TypeError – If func is not callable.

See also

assign: Add or overwrite specific columns with callables.
aggregate: Aggregate (reduce) columns to a single row.

Notes

Differences from pandas:

In pandas, apply with axis=0 passes each column as a pandas.Series to the function, which can return a scalar (reducing the frame) or a Series (transforming it). Here, func receives a column Series proxy and must return a scalar expression that defines a row-level transformation. This means apply always produces a frame with the same number of rows — it cannot reduce the frame the way pandas apply can.
Row-wise application (axis=1) is not supported.
String function names (e.g. 'sum') are not supported. Use aggregate() instead.
raw=True, result_type, engine='numba', and engine_kwargs are not supported.
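
A minimal plain-Python sketch of the calling convention may help here. ColumnSketch and apply_sketch are hypothetical stand-ins, not PyLegend classes; the sketch assumes the function is called once per column with a column object, and that args and **kwargs follow that first argument.

```python
# Plain-Python stand-in for a column proxy; not PyLegend's Series class.
class ColumnSketch:
    def __init__(self, values):
        self.values = list(values)

    def __mul__(self, other):
        # Element-wise multiplication, i.e. a row-level transformation.
        return ColumnSketch(v * other for v in self.values)

    def __add__(self, other):
        # Element-wise addition.
        return ColumnSketch(v + other for v in self.values)


def apply_sketch(columns, func, args=(), **kwargs):
    """Call func once per column; extra args and kwargs follow the column argument."""
    return {
        name: func(ColumnSketch(values), *args, **kwargs).values
        for name, values in columns.items()
    }


def add_offset(series, offset, *, scale=1):
    return series * scale + offset


result = apply_sketch({"Order Id": [10248, 10249, 10250]}, add_offset, args=(100,), scale=2)
print(result)  # {'Order Id': [20596, 20598, 20600]}
```

The output has as many values as the input column, which reflects the point above that apply transforms rows and cannot reduce the frame.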

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Apply a lambda to every column
frame.filter(items=["Order Id"]).apply(
    lambda x: x * 2
).head(5).to_pandas()

	Order Id
0	20496
1	20498
2	20500
3	20502
4	20504

# Apply a function with extra arguments
def add_offset(series, offset, *, scale=1):
    return series * scale + offset

frame.filter(items=["Order Id"]).apply(
    add_offset, args=(100,), scale=2
).head(5).to_pandas()

	Order Id
0	20596
1	20598
2	20600
3	20602
4	20604

assign

PandasApiTdsFrame.assign(**kwargs)

Add or overwrite columns using keyword arguments.

Return a new TDS frame with new columns added (or existing columns overwritten). Each keyword argument defines a column name and a callable that computes the column’s value from each row.

Parameters: **kwargs (Callable[[PandasApiTdsRow], Union[int, float, bool, str, date, datetime, PyLegendPrimitive]]) – Each keyword argument is a column name mapped to a function that takes a PandasApiTdsRow and returns a scalar value. Supported return types are int, float, bool, str, date, datetime, and PyLegendPrimitive.
Returns: A new TDS frame with the additional (or overwritten) columns.
Return type: PandasApiTdsFrame
Raises: RuntimeError – If the callable returns an unsupported type (e.g. a list).

See also

filter: Select columns by name, substring, or regex.
drop: Remove columns by label.
rename: Rename existing columns.

Notes

Differences from pandas:

In pandas assign, each keyword argument can be a callable or a static value (e.g. frame.assign(col=5)). Here, every value must be a callable that takes a row, even for constants (e.g. frame.assign(col=lambda x: 5)).
Column values are accessed via typed accessor methods such as x.get_integer("col") and x.get_string("col"), or via bracket notation x["col"].
Returning a non-scalar type (e.g. a list) from the callable raises a RuntimeError, unlike pandas which would broadcast or create nested data.
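
The callable-only contract can be sketched in plain Python. This is an illustration under simplifying assumptions, not PyLegend code: rows are modelled as dicts, assign_sketch and SCALAR_TYPES are hypothetical names, and only a few scalar types are checked.

```python
# Plain-Python sketch: rows are dicts, and every new column is computed by a callable.
SCALAR_TYPES = (int, float, bool, str)


def assign_sketch(rows, **column_funcs):
    """Return new rows with each keyword column computed from the input row."""
    out = []
    for row in rows:
        new_row = dict(row)  # never mutate the input
        for name, func in column_funcs.items():
            value = func(row)
            if not isinstance(value, SCALAR_TYPES):
                raise RuntimeError(
                    f"Unsupported return type for {name!r}: {type(value).__name__}"
                )
            new_row[name] = value
        out.append(new_row)
    return out


rows = [{"Order Id": 10248, "Ship Name": "Hanari Carnes"}]
print(assign_sketch(rows, constant=lambda r: 100, ship_upper=lambda r: r["Ship Name"].upper()))
```

Even the constant column is written as a callable, matching the first note above.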

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Add a constant column
frame.assign(constant=lambda x: 100).head(3).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name	constant
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier	100
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten	100
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes	100

# Add a computed column derived from existing columns
frame.assign(
    ship_upper=lambda x: x.get_string("Ship Name").upper()
).head(3).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name	ship_upper
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier	VINS ET ALCOOLS CHEVALIER
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten	TOMS SPEZIALITÄTEN
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes	HANARI CARNES

# Overwrite an existing column
frame.assign(
    **{"Ship Name": lambda x: x.get_string("Ship Name").upper()}
).head(3).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	10248	1996-07-04	1996-08-01	1996-07-16	VINS ET ALCOOLS CHEVALIER
1	10249	1996-07-05	1996-08-16	1996-07-10	TOMS SPEZIALITÄTEN
2	10250	1996-07-08	1996-08-05	1996-07-12	HANARI CARNES

cast

PandasApiTdsFrame.cast(column_type_map)

Change the declared type of one or more columns.

Return a new TDS frame whose column metadata reflects the requested type changes. The underlying data is not transformed in SQL (no CAST expression is emitted); instead a Pure ->cast(...) clause is appended so that the Legend Engine re-interprets the column under the target type.

A cast is allowed only when the source and target types share a subclass relationship in the PyLegend type hierarchy. For example, Integer → BigInt is valid because BigInt is a sub-type of Integer, but String → Integer is not.

Parameters:

column_type_map (Dict[str, Union[PrimitiveType, Tuple[PrimitiveType, ...]]]) – A mapping from column name to the desired target type. Values are produced by the helpers in pylegend.core.language.type_factory — for example tf.bigint(), tf.varchar(200), tf.numeric(10, 2). An empty dict is valid and returns a copy of the frame with unchanged columns.

Returns:

A new TDS frame with the cast column metadata. The original frame is never mutated.

Return type:

PandasApiTdsFrame

Raises:

ValueError – If a column name in column_type_map does not exist in the frame, or if the source-to-target conversion is not allowed (the types do not share a subclass relationship).
TypeError – If the target column is a non-primitive column (e.g. an EnumTdsColumn). Only PrimitiveTdsColumn columns can be cast.

See also

assign: Add or overwrite columns with computed values.
rename: Rename columns without changing their types.

Notes

Differences from pandas:

Pandas DataFrame.astype() converts data values in memory. cast changes only the declared column type in the query metadata; no SQL CAST expression is generated.
The allowed conversions follow the Legend type hierarchy, not Python/NumPy dtype-coercion rules.
Parameterised types such as Varchar(200) and Numeric(10, 2) are supported through the type_factory helpers and are reflected in the generated Pure ->cast(...) clause.

Cross-branch casts (e.g. Integer → Float, String → Boolean) raise ValueError.
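
The subclass rule can be sketched with ordinary Python classes. The classes below are stand-ins invented for illustration and are not PyLegend's type classes; cast_allowed is a hypothetical helper that assumes "share a subclass relationship" means one type is a subclass of the other.

```python
# Stand-in type hierarchy for illustration only; these are not PyLegend's classes.
class Integer:
    pass


class BigInt(Integer):
    pass


class Float:
    pass


class String:
    pass


def cast_allowed(source, target):
    """Allow a cast only when one type is a subclass of the other."""
    return issubclass(source, target) or issubclass(target, source)


print(cast_allowed(Integer, BigInt))  # True: BigInt is a sub-type of Integer
print(cast_allowed(String, Integer))  # False: unrelated types
print(cast_allowed(Integer, Float))   # False: cross-branch cast
```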

Examples

import pylegend
from pylegend.core.language import type_factory as tf

frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Widen an integer column to BigInt
casted = frame.cast({"Order Id": tf.bigint()})
casted.head(3).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes

# Cast multiple columns at once
casted = frame.cast({
    "Order Id": tf.bigint(),
    "Ship Name": tf.varchar(200),
})
casted.head(3).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes

concat_legend_ext

PandasApiTdsFrame.concat_legend_ext(other)

Concatenate this frame with another frame vertically.

PyLegend extension — not present in pandas.

Produces a SQL UNION ALL of the two frames. Both frames must have compatible schemas (same column names and types).

Parameters: other (PandasApiTdsFrame) – The frame to concatenate below this one.
Returns: A new TDS frame whose rows are the rows of self followed by the rows of other.
Return type: PandasApiTdsFrame
Raises: TypeError – If other is not a PandasApiBaseTdsFrame.

See also

merge: SQL join of two frames.

Notes

Differences from pandas:

In pandas, pd.concat is a top-level function that accepts a list of DataFrames. Here, concat_legend_ext is a method on a PandasApiTdsFrame and only supports vertical concatenation (UNION ALL) of two frames with the same schema.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

top = frame.head(3)
bottom = frame.head(3)
top.concat_legend_ext(bottom).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes
3	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier
4	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten
5	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes

count

PandasApiTdsFrame.count(axis=0, numeric_only=False, **kwargs)

Count non-null values in each column.

Convenience method equivalent to aggregate('count'). Returns a single-row TDS frame with the count of non-null values for every column.

Parameters:

axis (Union[int, str]) – Only 0 / 'index' is supported.
numeric_only (bool) – Must be False. True is not supported.
**kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported.

Returns:

A single-row TDS frame with non-null counts per column.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If any parameter is set to an unsupported value.

See also

aggregate: General aggregation method.
sum: Compute column sums.

Notes

Internally delegates to aggregate('count'). The same pandas deviations as sum() apply (axis=1, numeric_only=True not supported). Unlike sum, count does not have a skipna parameter since counting is always of non-null values.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Count non-null values in each column
frame.count().to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	830	830	830	809	830

cume_dist_legend_ext

PandasApiTdsFrame.cume_dist_legend_ext(ascending=True)

Compute the cumulative distribution of each column.

PyLegend extension — not present in pandas.

Maps to SQL CUME_DIST() OVER (ORDER BY col) and Pure cumulativeDistribution.

Parameters: ascending (bool) – Whether to order in ascending direction.
Returns: A new TDS frame with cumulative distribution values (floats between 0 and 1) replacing every column.
Return type: PandasApiTdsFrame

See also

rank: Compute column ranks.
ntile_legend_ext: Assign rows to numbered buckets.

Notes

Differences from pandas:

This method has no pandas equivalent. CUME_DIST is exposed as a PyLegend extension.
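
For readers unfamiliar with CUME_DIST, the following plain-Python sketch computes the same quantity in memory: for each value, the fraction of rows ordered at or before it. cume_dist_sketch is a hypothetical helper and does not reflect how PyLegend generates SQL.

```python
def cume_dist_sketch(values, ascending=True):
    """For each value, the fraction of rows ordered at or before it (ties share a value)."""
    n = len(values)
    if ascending:
        return [sum(other <= v for other in values) / n for v in values]
    return [sum(other >= v for other in values) / n for v in values]


print(cume_dist_sketch([10, 20, 20, 30]))         # [0.25, 0.75, 0.75, 1.0]
print(cume_dist_sketch([10, 20, 20, 30], False))  # [1.0, 0.75, 0.75, 0.25]

# With 830 rows and no ties (an assumption about the sample data),
# the smallest value gets 1/830.
print(round(1 / 830, 6))  # 0.001205
```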

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.filter(
    items=["Order Id"]
).cume_dist_legend_ext().head(5).to_pandas()

	Order Id
0	0.001205
1	0.00241
2	0.003614
3	0.004819
4	0.006024

drop

PandasApiTdsFrame.drop(labels=None, axis=1, index=None, columns=None, level=None, inplace=False, errors='raise')

Remove columns from the TDS frame by label.

Return a new TDS frame with the specified columns removed. Columns can be identified via labels (with axis=1) or via the columns parameter directly. Accepts a single column name, a list, tuple, or set of names.

Parameters:

labels (Union[str, Sequence[str], Set[str], None]) – Column name(s) to drop. Mutually exclusive with columns.
axis (Union[str, int, PyLegendInteger]) – The axis to drop along. Only column-axis (1 / 'columns') is supported.
index (Union[str, Sequence[str], Set[str], None]) – Not supported. Passing any value raises NotImplementedError.
columns (Union[str, Sequence[str], Set[str], None]) – Column name(s) to drop. Mutually exclusive with labels.
level (Union[str, int, PyLegendInteger, None]) – Not supported. Passing any value raises NotImplementedError.
inplace (Union[bool, PyLegendBoolean]) – Must be False. In-place mutation is not supported.
errors (str) – If 'raise', a KeyError is raised when any label is not found. If 'ignore', missing labels are silently skipped.

Returns:

A new TDS frame without the specified columns.

Return type:

PandasApiTdsFrame

Raises:

ValueError – If both labels and columns are provided, or if neither is provided. If axis is an invalid value (not 0, 1, 'index', or 'columns').
NotImplementedError – If axis is 0 / 'index' (row-level drop). If index or level is provided. If inplace is True.
KeyError – If any specified column does not exist in the frame and errors='raise'.
TypeError – If labels or columns is an unsupported type (e.g. a callable).

See also

filter: Select columns by name, substring, or regex.
assign: Add or overwrite columns.
rename: Rename existing columns.

Notes

Differences from pandas:

In pandas, drop can remove rows (axis=0) or columns (axis=1). Here, only column-axis dropping is supported (axis=1). Passing axis=0 raises NotImplementedError.
The axis parameter defaults to 1 (columns), whereas in pandas it defaults to 0 (rows). This means bare frame.drop("col") drops a column here but would attempt to drop a row label in pandas.
The index and level parameters are not supported.
inplace=True is not supported; always returns a new frame.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Drop a single column
frame.drop(columns="Ship Name").head(3).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date
0	10248	1996-07-04	1996-08-01	1996-07-16
1	10249	1996-07-05	1996-08-16	1996-07-10
2	10250	1996-07-08	1996-08-05	1996-07-12

# Drop multiple columns
frame.drop(columns=["Ship Name", "Order Date"]).head(3).to_pandas()

	Order Id	Required Date	Shipped Date
0	10248	1996-08-01	1996-07-16
1	10249	1996-08-16	1996-07-10
2	10250	1996-08-05	1996-07-12

# Using labels parameter
frame.drop(labels=["Ship Name"], axis=1).head(3).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date
0	10248	1996-07-04	1996-08-01	1996-07-16
1	10249	1996-07-05	1996-08-16	1996-07-10
2	10250	1996-07-08	1996-08-05	1996-07-12

# Ignore missing columns instead of raising an error
frame.drop(columns=["Ship Name", "NonExistent"], errors="ignore").head(3).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date
0	10248	1996-07-04	1996-08-01	1996-07-16
1	10249	1996-07-05	1996-08-16	1996-07-10
2	10250	1996-07-08	1996-08-05	1996-07-12

drop_duplicates

PandasApiTdsFrame.drop_duplicates(subset=None, *, keep='first', inplace=False, ignore_index=False)

Remove duplicate rows.

Returns a new TDS frame with duplicate rows removed, optionally considering only a subset of columns for identifying duplicates.

Parameters:

subset (Union[str, List[str], None]) – Column label or list of labels to consider for identifying duplicates. If None, all columns are used.
keep (str) – Must be 'first'. Only keeping the first occurrence is supported.
inplace (bool) – Must be False. In-place modification is not supported.
ignore_index (bool) – Must be False. Not supported.

Returns:

A new TDS frame with duplicates removed.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If keep is not 'first', or inplace / ignore_index are True.

Notes

Differences from pandas:

Only keep='first' is supported. 'last' and False are not supported.
inplace=True and ignore_index=True are not supported.
Generates SQL SELECT DISTINCT ON ... or equivalent.
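
The keep='first' behaviour can be sketched in plain Python, assuming rows are dicts processed in their listed order. drop_duplicates_sketch is a hypothetical helper; in the generated SQL, which row counts as "first" depends on the query's row order rather than on an in-memory list.

```python
def drop_duplicates_sketch(rows, subset=None):
    """Keep the first row seen for each distinct key (keep='first')."""
    if isinstance(subset, str):
        subset = [subset]
    seen = set()
    kept = []
    for row in rows:
        cols = subset if subset is not None else list(row)
        key = tuple(row[c] for c in cols)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


orders = [
    {"Order Id": 10248, "Ship Name": "Hanari Carnes"},
    {"Order Id": 10249, "Ship Name": "Hanari Carnes"},
    {"Order Id": 10250, "Ship Name": "Victuailles en stock"},
]
print(drop_duplicates_sketch(orders, subset="Ship Name"))
```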

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Remove rows with duplicate Ship Name
frame.drop_duplicates(subset=["Ship Name"]).head(5).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes
3	10251	1996-07-08	1996-08-05	1996-07-15	Victuailles en stock
4	10252	1996-07-09	1996-08-06	1996-07-11	Suprêmes délices

dropna

PandasApiTdsFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False, ignore_index=False)

Remove rows with missing values.

Return a new TDS frame with rows containing NA / null values removed. The check can be scoped to specific columns via subset and controlled via how.

Parameters:

axis (Union[int, str]) – Only 0 / 'index' (drop rows) is supported. 1 / 'columns' (drop columns) raises NotImplementedError.
how (str) –
- 'any' : Drop the row if any of the considered columns contain a null value.
- 'all' : Drop the row only if all of the considered columns are null.
thresh (Optional[int]) – Not supported. Passing any value raises NotImplementedError.
subset (Union[str, Sequence[str], None]) – Column names to consider when checking for nulls. If None (default), all columns are considered. An empty list with how='any' keeps all rows; an empty list with how='all' drops all rows.
inplace (bool) – Must be False. True raises NotImplementedError.
ignore_index (bool) – Must be False. True raises NotImplementedError.

Returns:

A new TDS frame with rows containing nulls removed.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If axis=1, thresh is set, inplace=True, or ignore_index=True.
ValueError – If axis is not a recognised value or how is not 'any' or 'all'.
TypeError – If subset is not a list, tuple, or set.
KeyError – If any column in subset does not exist in the frame.

See also

fillna: Fill missing values instead of dropping rows.

Notes

Differences from pandas:

axis=1 (dropping columns with nulls) is not supported.
thresh (minimum number of non-null values to keep a row) is not supported.
inplace=True is not supported; a new frame is always returned.
ignore_index=True is not supported.
Passing an empty subset=[] with how='any' is a no-op (all rows are kept). With how='all', an empty subset=[] drops all rows (the filter becomes false).
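
The empty-subset behaviour follows from how "any" and "all" behave over an empty set of columns. The plain-Python sketch below illustrates this with a hypothetical dropna_sketch helper over dict rows; it is not PyLegend's implementation.

```python
def dropna_sketch(rows, how="any", subset=None):
    """Drop rows whose considered columns are null according to `how`."""
    if how not in ("any", "all"):
        raise ValueError(f"how must be 'any' or 'all', got {how!r}")
    kept = []
    for row in rows:
        cols = list(row) if subset is None else subset
        nulls = [row[c] is None for c in cols]
        # any([]) is False, so an empty subset never drops a row;
        # all([]) is True, so an empty subset drops every row.
        drop = any(nulls) if how == "any" else all(nulls)
        if not drop:
            kept.append(row)
    return kept


rows = [
    {"Order Id": 11059, "Shipped Date": None},
    {"Order Id": 10248, "Shipped Date": "1996-07-16"},
]
print(len(dropna_sketch(rows)))                        # 1
print(len(dropna_sketch(rows, how="all")))             # 2
print(len(dropna_sketch(rows, how="any", subset=[])))  # 2
print(len(dropna_sketch(rows, how="all", subset=[])))  # 0
```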

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Drop rows where any column is null
frame.dropna().head(5).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes
3	10251	1996-07-08	1996-08-05	1996-07-15	Victuailles en stock
4	10252	1996-07-09	1996-08-06	1996-07-11	Suprêmes délices

# Drop rows where all columns are null
frame.dropna(how="all").head(5).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes
3	10251	1996-07-08	1996-08-05	1996-07-15	Victuailles en stock
4	10252	1996-07-09	1996-08-06	1996-07-11	Suprêmes délices

# Only consider specific columns
frame.dropna(subset=["Ship Name"]).head(5).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes
3	10251	1996-07-08	1996-08-05	1996-07-15	Victuailles en stock
4	10252	1996-07-09	1996-08-06	1996-07-11	Suprêmes délices

expanding

PandasApiTdsFrame.expanding(min_periods=1, axis=0, method=None, order_by=None, ascending=True)

Create an expanding window frame for window-aggregate computations.

An expanding window includes all rows from the start of the partition up to the current row. This is useful for running totals, running averages, and similar cumulative calculations.

Parameters:

min_periods (int) – Minimum number of observations in the window required to have a value; otherwise, result is null.
axis (Union[int, str]) – Only 0 / 'index' is supported.
method (Optional[str]) – Must be None or 'python'.
order_by (Union[str, Sequence[str], None]) – Column(s) to order by within the window. Required for deterministic results.
ascending (Union[bool, Sequence[bool]]) – Sort order for the order_by columns.

Returns:

A window frame on which window aggregates (sum, mean, min, max, etc.) can be called.

Return type:

PandasApiWindowTdsFrame

See also

rolling: Fixed-size sliding window.
groupby: Group rows before applying window functions.

Raises: NotImplementedError – If axis is not 0, method is not None or 'python', or min_periods is less than 1.

Notes

Differences from pandas:

order_by and ascending are PyLegend extensions not present in pandas. They control the ORDER BY clause inside the SQL OVER(...) window specification.
axis=1 is not supported.
method='table' is not supported.
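
An expanding sum is a running total over the ordered rows. The sketch below reproduces that idea in plain Python with itertools.accumulate; expanding_sum_sketch is a hypothetical helper, and sorting the values stands in for order_by on the same column in ascending order.

```python
from itertools import accumulate


def expanding_sum_sketch(values, min_periods=1):
    """Running total over sorted values; None until min_periods rows are in the window."""
    ordered = sorted(values)  # stands in for order_by on this column, ascending
    totals = accumulate(ordered)
    return [total if i >= min_periods else None for i, total in enumerate(totals, start=1)]


print(expanding_sum_sketch([10250, 10248, 10249]))                 # [10248, 20497, 30747]
print(expanding_sum_sketch([10250, 10248, 10249], min_periods=2))  # [None, 20497, 30747]
```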

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Running sum of Order Id ordered by Order Id
frame.filter(items=["Order Id"]).expanding(
    order_by="Order Id"
).aggregate("sum").head(5).to_pandas()

	Order Id
0	10248
1	20497
2	30747
3	40998
4	51250

fillna

PandasApiTdsFrame.fillna(value=None, axis=0, inplace=False, limit=None)

Fill missing values with a specified value.

Replace NA / null entries in the TDS frame. A scalar value is applied to every column; a dict maps specific columns to their fill values (columns not present in the dict are left unchanged). Implemented via COALESCE at the SQL level.

Parameters:

value (Union[int, float, str, bool, date, datetime, Dict[str, Union[int, float, str, bool, date, datetime]]]) – Value(s) to replace nulls with. Accepted scalar types are int, float, str, bool, date, and datetime. When a dict is provided, keys must be column name strings and values must be scalars of the above types. Columns in the dict that do not exist in the frame are silently ignored. Omitting value entirely raises ValueError.
axis (Union[str, int, None]) – Only 0 / 'index' is supported. 1 / 'columns' raises NotImplementedError.
inplace (bool) – Must be False. True raises NotImplementedError.
limit (Optional[int]) – Not supported. Passing any value raises NotImplementedError.

Returns:

A new TDS frame with null values replaced.

Return type:

PandasApiTdsFrame

Raises:

ValueError – If value is not provided. If axis is not a recognised value.
TypeError – If value is not a scalar or dict. If dict keys are not strings or dict values are not scalars.
NotImplementedError – If axis=1, inplace=True, or limit is set.

See also

dropna: Remove rows with missing values.

Notes

Differences from pandas:

The method parameter ('ffill', 'bfill') available in older pandas versions is not present.
inplace=True is not supported; a new frame is always returned.
limit (maximum number of consecutive nulls to fill) is not supported.
axis=1 (fill along columns) is not supported.
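
The COALESCE-based behaviour can be sketched in plain Python. coalesce and fillna_sketch are hypothetical helpers over dict rows, written only to show the scalar and dict forms of value; they are not PyLegend's implementation.

```python
def coalesce(value, default):
    """Return default when value is null, mirroring a two-argument SQL COALESCE."""
    return default if value is None else value


def fillna_sketch(rows, value):
    """Fill nulls with a scalar (every column) or a {column: fill} dict (listed columns only)."""
    out = []
    for row in rows:
        if isinstance(value, dict):
            # Keys that are not columns of the row are never looked up, so they are ignored.
            out.append({c: coalesce(v, value[c]) if c in value else v for c, v in row.items()})
        else:
            out.append({c: coalesce(v, value) for c, v in row.items()})
    return out


rows = [{"Order Id": 11059, "Shipped Date": None, "Ship Name": None}]
print(fillna_sketch(rows, {"Shipped Date": "0001-01-01", "Unknown": 0}))
print(fillna_sketch(rows, "n/a"))
```

In the dict form, "Ship Name" stays null because it is not listed, and the "Unknown" key has no effect.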

Examples

import datetime
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame = frame.sort_values("Shipped Date")
frame = frame.head()

# Inspect the rows before filling; "Shipped Date" is null (NaT) in each
frame.to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	11059	1998-04-29	1998-06-10	NaT	Ricardo Adocicados
1	11058	1998-04-29	1998-05-27	NaT	Blauer See Delikatessen
2	11054	1998-04-28	1998-05-26	NaT	Cactus Comidas para llevar
3	11051	1998-04-27	1998-05-25	NaT	La maison d'Asie
4	11045	1998-04-23	1998-05-21	NaT	Bottom-Dollar Markets

# Fill all null values of the "Shipped Date" column with a fixed date
frame = frame.fillna({
    "Shipped Date": datetime.date(1, 1, 1)
})
frame.to_pandas()

filter

PandasApiTdsFrame.filter(items=None, like=None, regex=None, axis=None)

Select columns by label, substring match, or regular expression.

This method selects columns from the TDS frame based on their names. Exactly one of items, like, or regex must be provided; they are mutually exclusive.

Parameters:

items (Optional[List[str]]) – Exact column names to keep. All names must exist in the frame.
like (Optional[str]) – Keep columns whose names contain this substring.
regex (Optional[str]) – Keep columns whose names match this regular expression (uses re.search).
axis (Union[str, int, PyLegendInteger, None]) – The axis to filter on. Only column-axis filtering is supported. Defaults to 1 (columns) when omitted.

Returns:

A new TDS frame containing only the selected columns.

Return type:

PandasApiTdsFrame

Raises:

TypeError – If more than one of items, like, or regex is provided, or if none of them is provided. If items is a string instead of a list, or if like / regex is not a string.
ValueError – If axis is not 1 or 'columns'. If any name in items does not exist in the frame. If no columns match the like substring or regex pattern. If the regex pattern is invalid.

See also

assign: Add or overwrite columns.
drop: Remove columns by label.
rename: Rename columns.

Notes

Differences from pandas:

In pandas, filter supports both row-axis (axis=0) and column-axis (axis=1) filtering. Here, only column-axis filtering is supported (axis=1 or axis='columns'). Passing axis=0 or 'index' raises ValueError.
In pandas, items silently ignores names that do not exist in the frame. Here, all names must exist; unknown names raise a ValueError listing the missing and available columns.
In pandas, like and regex return an empty DataFrame when no columns match. Here, they raise ValueError when no columns match.
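
The stricter selection rules can be sketched over a plain list of column names. filter_columns_sketch is a hypothetical helper, not PyLegend code; returning items in the order given is an assumption of the sketch.

```python
import re


def filter_columns_sketch(columns, items=None, like=None, regex=None):
    """Select column names by exact list, substring, or regex; raise when nothing matches."""
    given = [arg for arg in (items, like, regex) if arg is not None]
    if len(given) != 1:
        raise TypeError("Provide exactly one of items, like, or regex")
    if items is not None:
        missing = [c for c in items if c not in columns]
        if missing:
            raise ValueError(f"Columns not found: {missing}; available: {columns}")
        return list(items)
    if like is not None:
        selected = [c for c in columns if like in c]
    else:
        selected = [c for c in columns if re.search(regex, c)]
    if not selected:
        raise ValueError("No columns match the given pattern")
    return selected


COLUMNS = ["Order Id", "Order Date", "Required Date", "Shipped Date", "Ship Name"]
print(filter_columns_sketch(COLUMNS, like="Ship"))     # ['Shipped Date', 'Ship Name']
print(filter_columns_sketch(COLUMNS, regex="Name$"))   # ['Ship Name']
```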

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Select specific columns by name
frame.filter(items=["Order Id", "Ship Name"]).head(3).to_pandas()

	Order Id	Ship Name
0	10248	Vins et alcools Chevalier
1	10249	Toms Spezialitäten
2	10250	Hanari Carnes

# Select columns whose names contain a substring
frame.filter(like="Ship").head(3).to_pandas()

	Shipped Date	Ship Name
0	1996-07-16	Vins et alcools Chevalier
1	1996-07-10	Toms Spezialitäten
2	1996-07-12	Hanari Carnes

# Select columns matching a regex pattern
frame.filter(regex="^Ship").head(3).to_pandas()

	Shipped Date	Ship Name
0	1996-07-16	Vins et alcools Chevalier
1	1996-07-10	Toms Spezialitäten
2	1996-07-12	Hanari Carnes

# Chain filters to progressively narrow columns
frame.filter(like="Ship").filter(regex="Name$").head(3).to_pandas()

	Ship Name
0	Vins et alcools Chevalier
1	Toms Spezialitäten
2	Hanari Carnes

groupby

PandasApiTdsFrame.groupby(by, level=None, as_index=False, sort=True, group_keys=False, observed=False, dropna=False)[source]

Group the TDS frame by one or more columns.

Return a PandasApiGroupbyTdsFrame object that can be used to apply aggregation functions (sum, mean, min, max, std, var, count, or the general aggregate/agg) and OLAP window functions (rank) to each group. Column selection after grouping is supported via bracket notation (e.g. frame.groupby("A")["B"].sum()).

The groupby columns act as the PARTITION BY clause in the underlying SQL when window functions such as rank are used.

Parameters:

by (Union[str, List[str]]) – Column name or list of column names to group by. All names must exist in the current frame.
level (Union[str, int, List[str], None]) – Not supported. Passing any value raises NotImplementedError. Use by instead.
as_index (bool) – Must be False. Setting to True raises NotImplementedError.
sort (bool) – Whether to sort the result by the grouping columns after aggregation.
group_keys (bool) – Must be False. Setting to True raises NotImplementedError.
observed (bool) – Must be False. Setting to True raises NotImplementedError.
dropna (bool) – Must be False. Setting to True raises NotImplementedError.

Returns:

A groupby object on which aggregation and window methods can be called. See PandasApiGroupbyTdsFrame for the full list of available methods.

Return type:

PandasApiGroupbyTdsFrame

Raises:

NotImplementedError – If level, as_index=True, group_keys=True, observed=True, or dropna=True is provided.
TypeError – If by is not a string or list of strings.
ValueError – If by is an empty list.
KeyError – If any column in by does not exist in the frame.

See also

aggregate: Aggregate without grouping.
sum: Convenience shorthand for sum aggregation.
count: Convenience shorthand for count aggregation.

Notes

Differences from pandas:

as_index defaults to False and must be False. In pandas it defaults to True. This means the grouping columns always appear as regular columns in the result, never as the index.
group_keys, observed, and dropna must all be False; their True variants are not supported.
level (grouping by index level) is not supported.
The groupby object supports column selection via [col] (returns a GroupbySeries) or [[col1, col2]] (returns a narrowed PandasApiGroupbyTdsFrame), matching the pandas pattern frame.groupby(...)["col"].sum().
When sort=True (default), the result is sorted by the grouping columns in ascending order after aggregation.
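
As a rough in-memory analogue of the as_index=False and sort=True behaviour described above, the following sketch groups a list of row dictionaries. The helper group_and_aggregate and the sample rows are illustrative assumptions; the real method builds a query instead of looping over rows.

```python
from typing import Any, Callable, Dict, List


def group_and_aggregate(
    rows: List[Dict[str, Any]],
    by: str,
    column: str,
    func: Callable[[List[Any]], Any],
    sort: bool = True,
) -> List[Dict[str, Any]]:
    """Hypothetical in-memory analogue of frame.groupby(by)[column].<agg>()."""
    groups: Dict[Any, List[Any]] = {}
    for row in rows:
        groups.setdefault(row[by], []).append(row[column])
    # sort=True orders the result by the grouping column, ascending.
    # With sort=False this sketch keeps first-seen order; that is an artifact
    # of the sketch, not a documented guarantee.
    keys = sorted(groups) if sort else list(groups)
    # as_index=False: the grouping key is an ordinary column in each output row.
    return [{by: key, column: func(groups[key])} for key in keys]


# Illustrative rows only; not taken from the sample dataset.
orders = [
    {"Ship Name": "Hanari Carnes", "Order Id": 10250},
    {"Ship Name": "Around the Horn", "Order Id": 10355},
    {"Ship Name": "Hanari Carnes", "Order Id": 10253},
]
counts = group_and_aggregate(orders, "Ship Name", "Order Id", len)
sums = group_and_aggregate(orders, "Ship Name", "Order Id", sum)
print(counts)
print(sums)
```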

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Group by a single column and count
frame.groupby("Ship Name")["Order Id"].count().head(5).to_pandas()

# Group by a column and sum a numeric column
frame.groupby("Ship Name")["Order Id"].sum().head(5).to_pandas()

# Group by a column with dict-based aggregation
frame.groupby("Ship Name").agg({"Order Id": "count"}).head(5).to_pandas()

Note

The returned PandasApiGroupbyTdsFrame object has its own set of aggregation and window methods whose signatures may differ from the frame-level equivalents. See Pandas Groupby TDS Frame for the full API reference.

head

PandasApiTdsFrame.head(n=5)[source]

Return the first n rows of the TDS frame.

This function returns the first n rows from the frame. It is useful for quickly inspecting the data without loading the entire dataset.

Parameters:

n (int) – Number of rows to return. Must be a non-negative integer. Passing a negative value raises NotImplementedError. Passing a non-int type raises TypeError.

Returns:

A new TDS frame containing only the first n rows.

Return type:

PandasApiTdsFrame

Raises:

TypeError – If n is not an int.
NotImplementedError – If n is negative.

See also

drop: Remove rows or columns by label.
truncate: Truncate rows before and/or after some index value.
iloc: Select rows by integer-location based indexing.

Notes

Differences from pandas:

Negative values for n are not supported. In pandas, head(-n) returns all rows except the last n. Here, passing a negative value raises NotImplementedError.
The operation is lazy — it builds a query rather than materialising rows in memory. Call to_pandas() or execute_frame_to_string() to materialise the result.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Get first 5 rows (default)
frame.head().to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes
3	10251	1996-07-08	1996-08-05	1996-07-15	Victuailles en stock
4	10252	1996-07-09	1996-08-06	1996-07-11	Suprêmes délices

# Get first 3 rows
frame.head(3).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes

iloc

property PandasApiTdsFrame.iloc: PandasApiIlocIndexer

Purely integer-location based indexing for selection by position.

Access rows and columns by integer position (0-based). Returns a PandasApiIlocIndexer that supports [] notation.

Allowed inputs:

An integer — selects a single row (e.g. frame.iloc[5]).
A slice with ints — selects a range of rows (e.g. frame.iloc[1:7]). Only step=1 (or None) is supported.
A tuple of (rows, cols) — selects rows and columns simultaneously (e.g. frame.iloc[1:5, 0:2]). Each element can be an int or a slice.

Returns:

An indexer object supporting [] notation that returns a new PandasApiTdsFrame.

Return type:

PandasApiIlocIndexer

Raises:

IndexError – If more than two indexers are provided. If a column integer index is out of bounds.
NotImplementedError – If a slice step other than 1 is used for rows or columns. If a list, boolean array, or callable is used as an indexer.

See also

loc: Label-based indexing (row filtering + column selection).
head: Return the first n rows.
truncate: Select rows by index range.
filter: Select columns by name.

Notes

Differences from pandas:

Only int and slice indexers are supported. Lists of integers, boolean arrays, and callable indexers raise NotImplementedError.
Slice steps other than 1 are not supported.
Negative integer indexing for rows is handled via truncate, so it follows truncate’s limitations.
When a single integer row index exceeds the number of rows, an empty frame is returned (no IndexError is raised, unlike pandas).
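
The indexer rules can be illustrated on plain Python lists. This is a sketch under stated assumptions: iloc_sketch and _to_slice are hypothetical names, rows are lists of values, and negative positions (which the real indexer routes through truncate) are left out.

```python
from typing import Any, List, Sequence, Tuple


def _to_slice(indexer: Any) -> slice:
    """Normalise an int or a step-1 slice to a slice (non-negative positions only)."""
    if isinstance(indexer, bool) or not isinstance(indexer, (int, slice)):
        raise NotImplementedError("Only int and slice indexers are supported")
    if isinstance(indexer, int):
        return slice(indexer, indexer + 1)
    if indexer.step not in (None, 1):
        raise NotImplementedError("Slice steps other than 1 are not supported")
    return indexer


def iloc_sketch(
    rows: List[List[Any]], columns: Sequence[str], key: Any
) -> Tuple[List[str], List[List[Any]]]:
    """Hypothetical list-based analogue of frame.iloc[key]."""
    if not isinstance(key, tuple):
        key = (key,)
    if len(key) > 2:
        raise IndexError("Too many indexers")
    row_key = key[0] if len(key) >= 1 else slice(None)
    col_key = key[1] if len(key) == 2 else slice(None)

    row_slice = _to_slice(row_key)
    col_slice = _to_slice(col_key)
    if isinstance(col_key, int) and not 0 <= col_key < len(columns):
        raise IndexError("Column index out of bounds")

    # A row position past the end yields an empty result instead of an IndexError.
    selected_rows = [list(row[col_slice]) for row in rows[row_slice]]
    return list(columns[col_slice]), selected_rows


columns = ["Order Id", "Order Date", "Ship Name"]
rows = [
    [10248, "1996-07-04", "Vins et alcools Chevalier"],
    [10249, "1996-07-05", "Toms Spezialitäten"],
    [10250, "1996-07-08", "Hanari Carnes"],
    [10251, "1996-07-08", "Victuailles en stock"],
]
print(iloc_sketch(rows, columns, (slice(1, 4), slice(0, 2))))
print(iloc_sketch(rows, columns, 99))
```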

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Select a single row
frame.iloc[0].to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier

# Select a range of rows and columns
frame.iloc[1:4, 0:2].to_pandas()

	Order Id	Order Date
0	10249	1996-07-05
1	10250	1996-07-08
2	10251	1996-07-08

# Select a single row, all columns
frame.iloc[2, :].to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes

info

PandasApiTdsFrame.info(verbose=None, buf=None, max_cols=None, memory_usage=None, show_counts=None)[source]

Print a concise summary of the TDS frame.

Displays the column names and their data types. This is a lightweight alternative to running a query — it uses only the metadata already available on the frame.

Parameters:

verbose (Optional[bool]) – Not supported. Ignored.
buf (Union[IO[str], StringIO, None]) – Not supported. Output always goes to stdout.
max_cols (Optional[int]) – Not supported. Ignored.
memory_usage (Union[bool, str, None]) – Not supported. Ignored.
show_counts (Optional[bool]) – Not supported. Ignored.

Returns:

Prints to stdout; returns nothing.

Return type:

None

Notes

Differences from pandas:

Only column names and types are shown.
memory_usage, verbose, buf, max_cols, and show_counts are accepted for API compatibility but ignored.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.info()

<class 'pylegend.extensions.tds.pandas_api.frames.pandas_api_legend_service_input_frame.PandasApiLegendServiceInputFrame'>
RangeIndex: 830 entries
Data columns (total 5 columns):
#  Column         Non-Null Count  Dtype     
-  -------------  --------------  ----------
0  Order Id       830 non-null    Integer   
1  Order Date     830 non-null    StrictDate
2  Required Date  830 non-null    StrictDate
3  Shipped Date   809 non-null    StrictDate
4  Ship Name      830 non-null    String    
dtypes: Integer(1), StrictDate(3), String(1)

join

PandasApiTdsFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False, validate=None)[source]

Join this TDS frame with another on shared column(s).

Convenience method that delegates to merge(). The lsuffix and rsuffix parameters are mapped to the suffixes parameter of merge, and on is passed directly.

Parameters:

other (PandasApiTdsFrame) – The right TDS frame to join with.
on (Union[str, Sequence[str], None]) – Column name(s) to join on. Must exist in both frames. Unlike pandas join, this parameter specifies column names, not index labels.
how (Optional[str]) – Type of join. See merge() for details.
lsuffix (str) – Suffix to apply to overlapping column names from the left frame.
rsuffix (str) – Suffix to apply to overlapping column names from the right frame.
sort (Optional[bool]) – If True, sort the result by the join keys.
validate (Optional[str]) – Not supported. Passing any value raises NotImplementedError.

Returns:

A new TDS frame containing the joined result.

Return type:

PandasApiTdsFrame

Raises:

ValueError – If overlapping column names exist and lsuffix / rsuffix do not resolve the conflict.
NotImplementedError – If validate is set.

See also

merge: The underlying merge method with full parameter control.

Notes

Differences from pandas:

In pandas, DataFrame.join joins on the index by default, optionally using on to specify a column in the left frame to match against the right frame’s index. Here, join is purely column-on-column and delegates directly to merge(on=on). There is no index-based joining.
The lsuffix and rsuffix parameters correspond to suffixes=(lsuffix, rsuffix) in merge. As in pandas, both default to empty strings, so overlapping column names raise ValueError unless a suffix resolves the conflict.
Because this delegates to merge, all limitations of merge apply: no self-join, no left_index / right_index, no indicator, and no validate.
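
How suffixes shape the output column names can be sketched as follows. The helper merged_column_names is hypothetical, and the layout (a key joined via on appears once, overlapping non-key names receive the suffixes) is inferred from the example output in this section rather than taken from the implementation.

```python
from typing import List, Optional, Sequence, Tuple


def merged_column_names(
    left: Sequence[str],
    right: Sequence[str],
    shared_keys: Sequence[str],
    suffixes: Tuple[Optional[str], Optional[str]] = ("", ""),
) -> List[str]:
    """Hypothetical sketch of column naming after join(on=shared_keys)."""
    lsuffix = suffixes[0] or ""
    rsuffix = suffixes[1] or ""
    # Only non-key names present on both sides need a suffix.
    overlap = (set(left) & set(right)) - set(shared_keys)

    result = [name + lsuffix if name in overlap else name for name in left]
    for name in right:
        if name in shared_keys:
            continue  # assumption: a key joined via `on` appears once
        result.append(name + rsuffix if name in overlap else name)

    duplicates = sorted({name for name in result if result.count(name) > 1})
    if duplicates:
        raise ValueError(f"Suffixes do not resolve overlapping columns: {duplicates}")
    return result


left = ["Order Id", "Order Date", "Required Date", "Shipped Date", "Ship Name"]
right = ["Right Order Id", "Order Date", "Required Date", "Shipped Date", "Ship Name"]
print(merged_column_names(left, right, ["Ship Name"], ("_left", "_right")))
```

With empty suffixes the same call raises ValueError, which is the conflict case listed under Raises.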

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Create a second frame with renamed columns
frame2 = pylegend.samples.pandas_api.northwind_orders_frame()
frame2 = frame2.rename({"Order Id": "Right Order Id"})

# Left join on a common key
frame.head(5).join(
    frame2.head(5),
    on="Ship Name",
    how="left",
    lsuffix="_left",
    rsuffix="_right"
).to_pandas()

	Order Id	Order Date_left	Required Date_left	Shipped Date_left	Ship Name	Right Order Id	Order Date_right	Required Date_right	Shipped Date_right
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier	10248	1996-07-04	1996-08-01	1996-07-16
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten	10249	1996-07-05	1996-08-16	1996-07-10
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes	10250	1996-07-08	1996-08-05	1996-07-12
3	10251	1996-07-08	1996-08-05	1996-07-15	Victuailles en stock	10251	1996-07-08	1996-08-05	1996-07-15
4	10252	1996-07-09	1996-08-06	1996-07-11	Suprêmes délices	10252	1996-07-09	1996-08-06	1996-07-11

loc

property PandasApiTdsFrame.loc: PandasApiLocIndexer

Access rows and columns by label-based indexing or boolean conditions.

Returns a PandasApiLocIndexer that supports [] notation for combined row filtering and column selection.

Row selection (first indexer):

Complete slice (:): Select all rows.
Boolean expression: A PyLegendBoolean expression built from column comparisons (e.g. frame['col'] > 5), used as a WHERE filter.
Callable: A function that receives the frame and returns a PyLegendBoolean expression (e.g. lambda x: x['col'] > 5).

Column selection (second indexer):

str: A single column name (e.g. 'col1').
list of str: Multiple column names.
list of bool: Boolean mask over columns (must match the number of columns exactly).
slice of str: Label-based column slice (e.g. 'col1':'col3'), inclusive on both ends.
Complete slice (:): Select all columns.

Returns:

An indexer object supporting [] notation that returns a new PandasApiTdsFrame.

Return type:

PandasApiLocIndexer

Raises:

IndexError – If more than two indexers are provided. If a boolean column mask has the wrong length.
TypeError – If a label-based slice is used for rows (only : is allowed). If a list of integers, a set, or another unsupported type is used for row or column selection.
KeyError – If a column name in a list does not exist in the frame.

See also

iloc: Integer-position based indexing.
filter: Select columns by name.
head: Return the first n rows.

Notes

Differences from pandas:

For row selection, only :, boolean expressions, and callables are supported. Integer label selection, integer slicing, and list-of-integer selection are not supported.
Label-based row slicing (e.g. frame.loc[2:5]) is not supported — only the complete slice : is allowed.
For column selection, string labels, lists of strings, boolean masks, and label-based slices are supported. Label slices use pandas.Index.slice_indexer internally, so slice semantics are inclusive on both ends (matching pandas loc behaviour).
If a label-based column slice resolves to an empty selection, an empty frame (zero rows) is returned via head(0).
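
The two less obvious column selectors, inclusive label slices and boolean masks, can be sketched on a plain list of column names. Both helpers are hypothetical; raising KeyError for an unknown slice label is an assumption borrowed from pandas, not something this page states.

```python
from typing import List, Optional, Sequence


def slice_labels(
    columns: List[str], start: Optional[str] = None, stop: Optional[str] = None
) -> List[str]:
    """Label slice that includes both endpoints, as loc does."""
    for label in (start, stop):
        if label is not None and label not in columns:
            raise KeyError(label)  # assumption: unknown labels raise KeyError
    first = 0 if start is None else columns.index(start)
    last = len(columns) - 1 if stop is None else columns.index(stop)
    # A reversed slice (start after stop) resolves to an empty selection.
    return columns[first:last + 1]


def apply_mask(columns: List[str], mask: Sequence[bool]) -> List[str]:
    """Boolean mask with exactly one entry per column."""
    if len(mask) != len(columns):
        raise IndexError("Boolean mask length must equal the number of columns")
    return [name for name, keep in zip(columns, mask) if keep]


columns = ["Order Id", "Order Date", "Required Date", "Shipped Date", "Ship Name"]
print(slice_labels(columns, "Order Date", "Shipped Date"))
print(apply_mask(columns, [True, False, False, False, True]))
```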

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Select specific columns
frame.loc[:, "Ship Name"].head(3).to_pandas()

	Ship Name
0	Vins et alcools Chevalier
1	Toms Spezialitäten
2	Hanari Carnes

# Filter rows with a boolean condition and select columns
frame.loc[frame["Order Id"] > 10300, ["Order Id", "Ship Name"]].head(5).to_pandas()

	Order Id	Ship Name
0	10301	Die Wandernde Kuh
1	10302	Suprêmes délices
2	10303	Godos Cocina Típica
3	10304	Tortuga Restaurante
4	10305	Old World Delicatessen

# Filter rows with a callable
frame.loc[
    lambda x: x["Ship Name"].startswith("A"),
    ["Order Id", "Ship Name"]
].head(5).to_pandas()

	Order Id	Ship Name
0	10308	Ana Trujillo Emparedados y helados
1	10355	Around the Horn
2	10365	Antonio Moreno Taquería
3	10383	Around the Horn
4	10453	Around the Horn

# Boolean column mask (one entry per column; the sample frame has five columns)
frame.loc[:, [True, False, False, False, True]].head(3).to_pandas()

max

PandasApiTdsFrame.max(axis=0, skipna=True, numeric_only=False, **kwargs)[source]

Compute the maximum value of each column.

Convenience method equivalent to aggregate('max'). Returns a single-row TDS frame with the maximum value of every column. For string columns, returns the lexicographically largest value.

Parameters:

axis (Union[int, str]) – Only 0 / 'index' is supported.
skipna (bool) – Must be True. False is not supported.
numeric_only (bool) – Must be False. True is not supported.
**kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported.

Returns:

A single-row TDS frame with column maximums.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If any parameter is set to an unsupported value.

See also

aggregate: General aggregation method.
min: Compute column minimums.

Notes

Internally delegates to aggregate('max'). The same pandas deviations as sum() apply (skipna=False, numeric_only=True, axis=1 are not supported).

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Maximum of each column
frame.filter(items=["Order Id"]).max().to_pandas()

	Order Id
0	11077

mean

PandasApiTdsFrame.mean(axis=0, skipna=True, numeric_only=False, **kwargs)[source]

Compute the mean of each column.

Convenience method equivalent to aggregate('mean'). Returns a single-row TDS frame with the arithmetic mean of every column.

Parameters:

axis (Union[int, str]) – Only 0 / 'index' is supported.
skipna (bool) – Must be True. False is not supported.
numeric_only (bool) – Must be False. True is not supported.
**kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported.

Returns:

A single-row TDS frame with column means.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If any parameter is set to an unsupported value.

See also

aggregate: General aggregation method.
sum: Compute column sums.
std: Compute column standard deviations.

Notes

Internally delegates to aggregate('mean'). The same pandas deviations as sum() apply (skipna=False, numeric_only=True, axis=1 are not supported).

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Mean of numeric columns
frame.filter(items=["Order Id"]).mean().to_pandas()

	Order Id
0	10662.5

merge

PandasApiTdsFrame.merge(other, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), indicator=False, validate=None)[source]

Merge this TDS frame with another using a database-style join.

Combine two frames column-wise based on common columns or explicit key specifications. Supports inner, left, right, outer (full), and cross joins.

Parameters:

other (PandasApiTdsFrame) – The right TDS frame to merge with. Must be a different frame instance; merging a frame with itself raises NotImplementedError.
how (Optional[str]) –
Type of merge:
- 'inner' : Only rows with matching keys in both frames.
- 'left' : All rows from the left frame, NaN-filled for non-matching right rows.
- 'right' : All rows from the right frame, NaN-filled for non-matching left rows.
- 'outer' : All rows from both frames (FULL OUTER JOIN).
- 'cross' : Cartesian product of both frames. No join keys may be specified.
on (Union[str, Sequence[str], None]) – Column name(s) to join on. Must exist in both frames. Mutually exclusive with left_on / right_on.
left_on (Union[str, Sequence[str], None]) – Column name(s) from the left frame to join on.
right_on (Union[str, Sequence[str], None]) – Column name(s) from the right frame to join on. Must have the same length as left_on.
left_index (Optional[bool]) – Not supported. Setting to True raises NotImplementedError.
right_index (Optional[bool]) – Not supported. Setting to True raises NotImplementedError.
sort (Optional[bool]) – If True, sort the result by the join keys in ascending order.
suffixes (Union[Tuple[Optional[str], Optional[str]], List[Optional[str]], None]) – Suffixes to apply to overlapping non-key column names from the left and right frames respectively. Use None to indicate that the column name from the respective frame should be left as-is (will raise if this causes duplicates).
indicator (Union[bool, str, None]) – Not supported. Setting to a truthy value raises NotImplementedError.
validate (Optional[str]) – Not supported. Passing any value raises NotImplementedError.

Returns:

A new TDS frame containing the merged result.

Return type:

PandasApiTdsFrame

Raises:

TypeError – If other is not a PandasApiTdsFrame. If how, on, left_on, right_on, suffixes, or sort have invalid types.
ValueError – If both on and left_on/right_on are specified. If left_on and right_on have different lengths. If no merge keys can be resolved and how is not 'cross'. If how='cross' is used with on/left_on/right_on. If how is not a recognised join method. If the resulting columns contain duplicates after suffix application.
KeyError – If a key specified in on, left_on, or right_on does not exist in the corresponding frame.
NotImplementedError – If left_index=True, right_index=True, indicator is truthy, validate is set, or the frame is merged with itself.

See also

join: Convenience wrapper around merge with simpler syntax.

Notes

Differences from pandas:

Self-merge is not supported. Merging a frame with itself raises NotImplementedError.
Index-based merging is not supported. left_index and right_index must be False.
The indicator and validate parameters are not supported.
When no join keys are provided (and how is not 'cross'), the merge infers keys from the intersection of column names between the two frames. If no common columns exist, a ValueError is raised (unlike pandas, which would raise a MergeError).
how='outer' maps to a FULL OUTER JOIN at the SQL level.
how='cross' is implemented as a CROSS JOIN in SQL, but mapped to JoinKind.INNER with a 1==1 condition in the PURE query representation.
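
The key-resolution rules described in Parameters, Raises, and the notes above can be sketched as one function. resolve_merge_keys is a hypothetical helper written from those descriptions; the order of checks in the actual implementation may differ.

```python
from typing import List, Optional, Sequence, Tuple, Union

Keys = Union[str, Sequence[str], None]
SUPPORTED_HOW = {"inner", "left", "right", "outer", "cross"}


def _as_list(keys: Keys) -> Optional[List[str]]:
    if keys is None:
        return None
    return [keys] if isinstance(keys, str) else list(keys)


def resolve_merge_keys(
    left_cols: Sequence[str],
    right_cols: Sequence[str],
    how: str = "inner",
    on: Keys = None,
    left_on: Keys = None,
    right_on: Keys = None,
) -> Tuple[List[str], List[str]]:
    """Hypothetical sketch: return (left keys, right keys) for a merge."""
    if how not in SUPPORTED_HOW:
        raise ValueError(f"Unrecognised join method: {how!r}")
    on_keys, left_keys, right_keys = _as_list(on), _as_list(left_on), _as_list(right_on)
    explicit_pair = left_keys is not None or right_keys is not None

    if how == "cross":
        if on_keys is not None or explicit_pair:
            raise ValueError("how='cross' does not accept join keys")
        return [], []
    if on_keys is not None and explicit_pair:
        raise ValueError("Use either on or left_on/right_on, not both")

    if on_keys is not None:
        left_keys, right_keys = on_keys, on_keys
    elif explicit_pair:
        if left_keys is None or right_keys is None or len(left_keys) != len(right_keys):
            raise ValueError("left_on and right_on must have the same length")
    else:
        # No keys given: infer them from the column names shared by both frames.
        common = [name for name in left_cols if name in right_cols]
        if not common:
            raise ValueError("No common columns to merge on")
        left_keys, right_keys = common, common

    for name in left_keys:
        if name not in left_cols:
            raise KeyError(name)
    for name in right_keys:
        if name not in right_cols:
            raise KeyError(name)
    return left_keys, right_keys


left = ["Order Id", "Order Date", "Ship Name"]
right = ["Right Order Id", "Order Date", "Ship Name"]
print(resolve_merge_keys(left, right, left_on="Order Id", right_on="Right Order Id"))
print(resolve_merge_keys(left, right))
```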

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Create a second frame for joining
frame2 = pylegend.samples.pandas_api.northwind_orders_frame()
frame2 = frame2.rename({"Order Id": "Right Order Id"})

# Inner merge on a common column
frame.head(5).merge(
    frame2.head(5),
    how="inner",
    left_on="Order Id",
    right_on="Right Order Id"
).to_pandas()

	Order Id	Order Date_x	Required Date_x	Shipped Date_x	Ship Name_x	Right Order Id	Order Date_y	Required Date_y	Shipped Date_y	Ship Name_y
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes
3	10251	1996-07-08	1996-08-05	1996-07-15	Victuailles en stock	10251	1996-07-08	1996-08-05	1996-07-15	Victuailles en stock
4	10252	1996-07-09	1996-08-06	1996-07-11	Suprêmes délices	10252	1996-07-09	1996-08-06	1996-07-11	Suprêmes délices

min

PandasApiTdsFrame.min(axis=0, skipna=True, numeric_only=False, **kwargs)[source]

Compute the minimum value of each column.

Convenience method equivalent to aggregate('min'). Returns a single-row TDS frame with the minimum value of every column. For string columns, returns the lexicographically smallest value.

Parameters:

axis (Union[int, str]) – Only 0 / 'index' is supported.
skipna (bool) – Must be True. False is not supported.
numeric_only (bool) – Must be False. True is not supported.
**kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported.

Returns:

A single-row TDS frame with column minimums.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If any parameter is set to an unsupported value.

See also

aggregate: General aggregation method.
max: Compute column maximums.

Notes

Internally delegates to aggregate('min'). The same pandas deviations as sum() apply (skipna=False, numeric_only=True, axis=1 are not supported).

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Minimum of each column
frame.filter(items=["Order Id"]).min().to_pandas()

	Order Id
0	10248

ntile_legend_ext

PandasApiTdsFrame.ntile_legend_ext(num_buckets, ascending=True)[source]

Assign rows to numbered buckets for each column.

PyLegend extension — not present in pandas.

Maps to SQL NTILE(n) OVER (ORDER BY col) and Pure ntile.

Parameters:

num_buckets (int) – Number of buckets to distribute rows into.
ascending (bool) – Whether to order in ascending direction.

Returns:

A new TDS frame with integer bucket numbers (1-based) replacing every column.

Return type:

PandasApiTdsFrame

See also

rank: Compute column ranks.
cume_dist_legend_ext: Cumulative distribution.

Notes

Differences from pandas:

This method has no pandas equivalent. NTILE is exposed as a pylegend extension.
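
For readers unfamiliar with NTILE, here is a minimal pure-Python sketch of the standard SQL semantics: rows are ordered, then split into num_buckets groups whose sizes differ by at most one, with the larger groups first. The function ntile is illustrative only; it ignores nulls, and SQL does not define the order of tied values.

```python
from typing import Any, List, Sequence


def ntile(values: Sequence[Any], num_buckets: int, ascending: bool = True) -> List[int]:
    """Return the 1-based NTILE bucket for each value, in input order."""
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    n = len(values)
    # Positions of the values in ORDER BY order.
    order = sorted(range(n), key=lambda i: values[i], reverse=not ascending)
    base, extra = divmod(n, num_buckets)

    buckets = [0] * n
    position = 0
    for bucket in range(1, num_buckets + 1):
        # The first `extra` buckets each take one additional row.
        size = base + (1 if bucket <= extra else 0)
        for i in order[position:position + size]:
            buckets[i] = bucket
        position += size
    return buckets


print(ntile([10, 20, 30, 40, 50, 60, 70, 80, 90, 100], 4))
# [1, 1, 1, 2, 2, 2, 3, 3, 4, 4]
```

With 830 ordered rows and four buckets the sizes are 208, 208, 207, and 207, so the first rows all land in bucket 1.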

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.filter(
    items=["Order Id"]
).ntile_legend_ext(4).head(5).to_pandas()

	Order Id
0	1
1	1
2	1
3	1
4	1

range_between

PandasApiTdsFrame.range_between(start=None, end=None, *, duration_start=None, duration_start_unit=None, duration_end=None, duration_end_unit=None)[source]

Create a RANGE BETWEEN window-frame specification.

PyLegend extension — not present in pandas.

Supports two calling styles:

Simple numeric bounds (same sign convention as rows_between()):

range_between(start=-100, end=0)
# → RANGE BETWEEN 100 PRECEDING AND CURRENT ROW

Duration-based bounds (for date/time ORDER BY columns):

range_between(
    duration_start=-1, duration_start_unit="DAYS",
    duration_end=1, duration_end_unit="MONTHS",
)

Parameters:

start (Union[int, float, Decimal, None]) – Lower bound of the range. None means unbounded preceding.
end (Union[int, float, Decimal, None]) – Upper bound of the range. None means unbounded following.
duration_start (Union[int, float, Decimal, str, None]) – Duration-based lower bound. Pass "unbounded" for unbounded preceding.
duration_start_unit (Optional[str]) – Time unit for duration_start (e.g. "DAYS", "MONTHS").
duration_end (Union[int, float, Decimal, str, None]) – Duration-based upper bound.
duration_end_unit (Optional[str]) – Time unit for duration_end.

Returns:

A frame specification to pass to window_frame_legend_ext().

Return type:

RangeBetween

Raises:

ValueError – If positional bounds and duration bounds are mixed, or if start is greater than end.

See also

rows_between: Create a ROWS BETWEEN specification.
window_frame_legend_ext: Apply a custom window specification.

Notes

Differences from pandas:

This method has no pandas equivalent. It is a pylegend extension for constructing SQL RANGE BETWEEN clauses.
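
The sign convention for simple numeric bounds (negative means PRECEDING, zero means CURRENT ROW, positive means FOLLOWING, None means unbounded) can be sketched as a small renderer. range_between_sql is a hypothetical helper that produces a SQL fragment as a string; it does not cover the duration-based style and is not how pylegend builds its queries.

```python
from decimal import Decimal
from typing import Optional, Union

Bound = Optional[Union[int, float, Decimal]]


def _bound_sql(value: Bound, is_start: bool) -> str:
    if value is None:
        return "UNBOUNDED PRECEDING" if is_start else "UNBOUNDED FOLLOWING"
    if value == 0:
        return "CURRENT ROW"
    if value < 0:
        return f"{-value} PRECEDING"
    return f"{value} FOLLOWING"


def range_between_sql(start: Bound = None, end: Bound = None) -> str:
    """Hypothetical renderer for the numeric calling style."""
    if start is not None and end is not None and start > end:
        raise ValueError("start must not be greater than end")
    return f"RANGE BETWEEN {_bound_sql(start, True)} AND {_bound_sql(end, False)}"


print(range_between_sql(-100, 0))
# RANGE BETWEEN 100 PRECEDING AND CURRENT ROW
print(range_between_sql(None, 5))
# RANGE BETWEEN UNBOUNDED PRECEDING AND 5 FOLLOWING
```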

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Numeric range: 100 preceding to current row
spec = frame.range_between(-100, 0)

rank

PandasApiTdsFrame.rank(axis=0, method='min', numeric_only=False, na_option='bottom', ascending=True, pct=False)[source]

Compute the rank of values in each column.

Replace every column’s values with their rank within that column. Each column is ranked independently using an SQL window function (RANK, DENSE_RANK, ROW_NUMBER, or PERCENT_RANK).

The result is a new frame with the same column names but all values replaced by their integer (or float when pct=True) rank.

Parameters:

axis (Union[int, str]) – Only 0 / 'index' is supported. 1 raises NotImplementedError.
method (str) –
How to rank equal values:
- 'min' : Lowest rank in the group of ties (SQL RANK()).
- 'first' : Ranks assigned in order of appearance (SQL ROW_NUMBER()).
- 'dense' : Like 'min' but ranks always increase by 1, no gaps (SQL DENSE_RANK()).
numeric_only (bool) – If True, only rank columns of numeric type (Integer, Float, Number). Non-numeric columns are excluded from the result.
na_option (str) – How to rank null values. Only 'bottom' is supported. 'keep' and 'top' raise NotImplementedError.
ascending (bool) – Whether to rank in ascending order. False ranks in descending order.
pct (bool) – If True, compute percentage ranks (SQL PERCENT_RANK()). Result columns are of float type. Can only be used with method='min'.

Returns:

A new TDS frame where every column contains integer ranks (or float when pct=True).

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If axis is not 0 or 'index'. If method is not one of 'min', 'first', 'dense' (e.g. 'average' and 'max' are not supported). If na_option is not 'bottom'. If pct=True with a method other than 'min'.

See also

PandasApiGroupbyTdsFrame.rank: Rank within groups.
sort_values: Sort the frame by column values.

Notes

Differences from pandas:

The 'average' and 'max' ranking methods are not supported. Only 'min', 'first', and 'dense' are available.
na_option only supports 'bottom'. 'keep' and 'top' raise NotImplementedError.
pct=True is only supported with method='min' (maps to PERCENT_RANK()). Combining pct=True with other methods raises NotImplementedError.
When applied to the full frame (not via a Series), all columns are replaced by their ranks. To append a rank column instead, use bracket assignment on a single-column Series: frame["rank_col"] = frame["col"].rank().
Combining multiple rank calls in a single expression is not supported (e.g. frame["col1"].rank() + frame["col2"].rank()). Compute them in separate assignment steps instead.
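
The three supported methods and pct=True can be compared with a small pure-Python sketch of the corresponding SQL window functions. rank_values is a hypothetical helper; it assumes no null values and uses the standard PERCENT_RANK definition, (rank - 1) / (row count - 1).

```python
from typing import Any, List, Sequence, Union


def rank_values(
    values: Sequence[Any],
    method: str = "min",
    ascending: bool = True,
    pct: bool = False,
) -> List[Union[int, float]]:
    """Rank values the way RANK, ROW_NUMBER, DENSE_RANK, and PERCENT_RANK would."""
    if method not in ("min", "first", "dense"):
        raise NotImplementedError(f"method={method!r} is not supported")
    if pct and method != "min":
        raise NotImplementedError("pct=True requires method='min'")

    n = len(values)
    # Stable sort: equal values keep their order of appearance.
    order = sorted(range(n), key=lambda i: values[i], reverse=not ascending)
    ranks: List[Union[int, float]] = [0] * n
    min_rank = 0
    dense_rank = 0
    for position, i in enumerate(order, start=1):
        is_tie = position > 1 and values[i] == values[order[position - 2]]
        if not is_tie:
            min_rank = position   # RANK(): ties share the lowest position
            dense_rank += 1       # DENSE_RANK(): no gaps after ties
        if method == "first":
            ranks[i] = position   # ROW_NUMBER()
        elif method == "dense":
            ranks[i] = dense_rank
        else:
            ranks[i] = min_rank

    if pct:
        # PERCENT_RANK(): (rank - 1) / (rows - 1); 0.0 for a single row.
        return [(r - 1) / (n - 1) if n > 1 else 0.0 for r in ranks]
    return ranks


print(rank_values([10, 20, 20, 30]))                  # [1, 2, 2, 4]
print(rank_values([10, 20, 20, 30], method="dense"))  # [1, 2, 2, 3]
print(rank_values([10, 20, 20, 30], method="first"))  # [1, 2, 3, 4]
```

For 830 distinct values, the second-smallest gets a percentage rank of 1/829, which rounds to 0.001206.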

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Rank all columns (replaces values with ranks)
frame.filter(items=["Order Id"]).rank().head(5).to_pandas()

	Order Id
0	1
1	2
2	3
3	4
4	5

# Append a percentage rank column via Series assignment
frame["Order Rank"] = frame["Order Id"].rank(pct=True)
frame.head(5).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name	Order Rank
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier	0.0
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten	0.001206
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes	0.002413
3	10251	1996-07-08	1996-08-05	1996-07-15	Victuailles en stock	0.003619
4	10252	1996-07-09	1996-08-06	1996-07-11	Suprêmes délices	0.004825

rename

PandasApiTdsFrame.rename(mapper=None, index=None, columns=None, axis=1, inplace=False, copy=True, level=None, errors='ignore')[source]

Rename columns of the TDS frame.

Alter column labels using a mapping (dict) or a callable function applied to each column name.

Parameters:

mapper (Union[Dict[str, str], Callable[[str], str], None]) – Mapping of old column names to new column names, or a callable that transforms each column name (e.g. str.upper). Used when axis=1 (columns). Cannot be specified together with columns.
index (Union[Dict[str, str], Callable[[str], str], None]) – Not supported. Passing any value raises NotImplementedError.
columns (Union[Dict[str, str], Callable[[str], str], None]) – Alternative to mapper for renaming columns. Mutually exclusive with mapper when both are provided alongside axis.
axis (Union[str, int]) – Axis to target. Only 1 / 'columns' is supported. 0 / 'index' raises NotImplementedError.
inplace (bool) – Must be False. True raises NotImplementedError.
copy (bool) – Must be True. False raises NotImplementedError.
level (Union[str, int, None]) – Not supported. Passing any value raises NotImplementedError.
errors (str) – If 'raise', raise a KeyError when a key in the mapping does not exist as a column name. If 'ignore', silently skip non-existent keys.

Returns:

A new TDS frame with renamed columns.

Return type:

PandasApiTdsFrame

Raises:

TypeError – If mapper or columns is not a dict or callable. If copy or inplace is not a bool.
ValueError – If both mapper (with axis) and columns/index are specified simultaneously. If axis is not a supported value. If errors is not 'ignore' or 'raise'. If the rename produces duplicate column names.
KeyError – If errors='raise' and a key in the mapping does not exist in the frame’s columns.
NotImplementedError – If axis=0/'index', index is set, level is set, copy=False, or inplace=True.

See also

filter: Select columns by name.
drop: Remove columns.
assign: Add or overwrite columns.

Notes

Differences from pandas:

Only column renaming is supported (axis=1). Index renaming (axis=0) raises NotImplementedError.
inplace=True is not supported; a new frame is always returned.
copy=False is not supported.
level (multi-level index) is not supported.
The index parameter is not supported.
When using a callable, it is applied to every column name (e.g. str.upper will uppercase all column names).
If errors='ignore' (the default), keys in the mapping that do not match any column are silently ignored, matching pandas behaviour.
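
The renaming rules described above can be sketched in plain Python. This is a minimal illustration, not pylegend's implementation: rename_columns is a hypothetical helper that works on a list of column names and only mirrors the documented handling of dict and callable mappers, errors, and duplicate names.

```python
from typing import Callable, Dict, List, Union

Mapper = Union[Dict[str, str], Callable[[str], str]]


def rename_columns(columns: List[str], mapper: Mapper, errors: str = "ignore") -> List[str]:
    """Hypothetical helper mirroring the documented rename rules."""
    if errors not in ("ignore", "raise"):
        raise ValueError("errors must be 'ignore' or 'raise'")
    if callable(mapper):
        # A callable is applied to every column name.
        renamed = [mapper(name) for name in columns]
    elif isinstance(mapper, dict):
        # Keys that match no column are skipped unless errors='raise'.
        missing = [key for key in mapper if key not in columns]
        if missing and errors == "raise":
            raise KeyError(missing)
        renamed = [mapper.get(name, name) for name in columns]
    else:
        raise TypeError("mapper must be a dict or a callable")
    if len(set(renamed)) != len(renamed):
        raise ValueError("rename would produce duplicate column names")
    return renamed


cols = ["Order Id", "Order Date", "Ship Name"]
print(rename_columns(cols, {"Order Id": "OrderId", "No Such Column": "x"}))
print(rename_columns(cols, str.upper))
```

With errors='raise', the first call would raise KeyError because "No Such Column" is not a column name.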

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Rename with a dict
frame.rename({"Order Id": "OrderId", "Ship Name": "ShipName"}).head(3).to_pandas()

	OrderId	Order Date	Required Date	Shipped Date	ShipName
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes

# Rename with a callable
frame.rename(str.upper).head(3).to_pandas()

	ORDER ID	ORDER DATE	REQUIRED DATE	SHIPPED DATE	SHIP NAME
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes

# Rename via the columns parameter
frame.rename(columns={"Order Id": "order_id"}).head(3).to_pandas()

	order_id	Order Date	Required Date	Shipped Date	Ship Name
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes

rolling

PandasApiTdsFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None, step=None, method=None, order_by=None, ascending=True)[source]

Create a fixed-size sliding window frame for window-aggregate computations.

A rolling window includes a fixed number of preceding rows (and optionally the current row) for each row, enabling moving averages, moving sums, and similar calculations.

Parameters:

window (int) – Size of the moving window (number of rows).
min_periods (Optional[int]) – Minimum number of observations in the window required to have a value. Defaults to window.
center (bool) – Not supported. Must be False.
win_type (Optional[str]) – Not supported. Must be None.
on (Optional[str]) – Not supported. Must be None.
axis (Union[int, str]) – Only 0 / 'index' is supported.
closed (Optional[str]) – Not supported. Must be None.
step (Optional[int]) – Not supported. Must be None.
method (Optional[str]) – Must be None or 'python'.
order_by (Union[str, Sequence[str], None]) – Column(s) to order by within the window. Required for deterministic results.
ascending (Union[bool, Sequence[bool]]) – Sort order for the order_by columns.

Returns:

A window frame on which window aggregates (sum, mean, min, max, etc.) can be called.

Return type:

PandasApiWindowTdsFrame

See also

expanding: Expanding (cumulative) window.
groupby: Group rows before applying window functions.

Raises:: NotImplementedError – If center, win_type, on, closed, or step are set to non-default values. Also raised if axis is not 0 or method is not None / 'python'.

Notes

Differences from pandas:

order_by and ascending are pylegend extensions not present in pandas. They control the ORDER BY clause inside the SQL OVER(...) window specification.
center, win_type, on, closed, step are not supported.
axis=1 is not supported.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# 3-row moving average of Order Id ordered by Order Id
frame.filter(items=["Order Id"]).rolling(
    window=3, order_by="Order Id"
).aggregate("mean").head(5).to_pandas()

	Order Id
0	10248.0
1	10248.5
2	10249.0
3	10250.0
4	10251.0
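
The values above can be reproduced by hand with a trailing mean over the current row and up to two preceding rows. The sketch below is plain Python, not pylegend code. rolling_mean is a hypothetical helper, and min_periods=1 is an assumption chosen because the output shows values for the first two rows, where fewer than three observations exist.

```python
from typing import List, Optional


def rolling_mean(values: List[float], window: int, min_periods: int = 1) -> List[Optional[float]]:
    """Trailing mean over the current row and up to window - 1 preceding rows."""
    result: List[Optional[float]] = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        # Emit None when the window holds fewer than min_periods observations.
        result.append(sum(chunk) / len(chunk) if len(chunk) >= min_periods else None)
    return result


# First five Order Id values from the sample output, ordered as with order_by="Order Id".
order_ids = sorted([10248, 10249, 10250, 10251, 10252])
print(rolling_mean(order_ids, window=3))
# [10248.0, 10248.5, 10249.0, 10250.0, 10251.0]
```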

rows_between

PandasApiTdsFrame.rows_between(start=None, end=None)[source]

Create a ROWS BETWEEN window-frame specification.

PyLegend extension — not present in pandas.

Sign convention (same as legendQL):

None → UNBOUNDED (PRECEDING for start, FOLLOWING for end)
Negative → PRECEDING (e.g. -3 → 3 PRECEDING)
0 → CURRENT ROW
Positive → FOLLOWING (e.g. 2 → 2 FOLLOWING)
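
As an illustration of the sign convention, the sketch below renders a (start, end) pair as a ROWS BETWEEN clause. It is a plain-Python approximation under stated assumptions: bound_to_sql and rows_between_sql are hypothetical helpers, and the exact SQL text that pylegend generates may differ.

```python
from typing import Optional


def bound_to_sql(offset: Optional[int], is_start: bool) -> str:
    """Translate one bound using the documented sign convention."""
    if offset is None:
        return "UNBOUNDED PRECEDING" if is_start else "UNBOUNDED FOLLOWING"
    if offset == 0:
        return "CURRENT ROW"
    return f"{-offset} PRECEDING" if offset < 0 else f"{offset} FOLLOWING"


def rows_between_sql(start: Optional[int] = None, end: Optional[int] = None) -> str:
    # A start bound after the end bound is rejected, as documented for rows_between.
    if start is not None and end is not None and start > end:
        raise ValueError("start must not be greater than end")
    return f"ROWS BETWEEN {bound_to_sql(start, True)} AND {bound_to_sql(end, False)}"


print(rows_between_sql(-2, 0))    # ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
print(rows_between_sql(None, 0))  # ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
print(rows_between_sql(-3, 2))    # ROWS BETWEEN 3 PRECEDING AND 2 FOLLOWING
```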

Parameters:

start (Optional[int]) – Lower bound of the frame. None means unbounded preceding.
end (Optional[int]) – Upper bound of the frame. None means unbounded following.

Returns:

A frame specification to pass to window_frame_legend_ext().

Return type:

RowsBetween

Raises:

ValueError – If start is greater than end.

See also

range_between: Create a RANGE BETWEEN specification.
window_frame_legend_ext: Apply a custom window specification.

Notes

Differences from pandas:

This method has no pandas equivalent. It is a pylegend extension for constructing SQL ROWS BETWEEN clauses.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# 3-row trailing window (current row and 2 preceding)
spec = frame.rows_between(-2, 0)

shape

property PandasApiTdsFrame.shape: Tuple[int, int]

Return the dimensionality of the TDS frame as (rows, columns).

Warning

Unlike pandas.DataFrame.shape, this property executes the frame against the server to determine the row count. It issues a COUNT aggregation query, so every access incurs a round-trip to the database.

Returns:: A tuple (number_of_rows, number_of_columns).
Return type:: tuple of (int, int)

See also

head: Return the first n rows (lazy, no execution).
count: Count non-null values per column (returns a frame).

Notes

Differences from pandas:

In pandas, DataFrame.shape is an O(1) metadata lookup that never triggers computation. Here, shape executes the current frame to obtain the row count via a COUNT aggregation query. It therefore requires a live connection to the database and fails on non-executable frames.
The result type is always (int, int); there is no lazy evaluation.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Get the shape (triggers server execution)
frame.head(5).shape

(np.int64(5), 5)

shift

PandasApiTdsFrame.shift(order_by, periods=1, freq=None, axis=0, fill_value=None, suffix=None)[source]

Shift values by desired number of periods.

Replace every column’s values with their shifted values. Because the underlying TDS is inherently unordered, an explicit order_by parameter is required to define the ordering for the window function (LAG or LEAD).

Parameters:

order_by (Union[str, Sequence[str]]) – Column name(s) to order the frame by before applying the shift. Unlike pandas, this is required to ensure deterministic output. All specified columns must be present in the base frame.
periods (Union[int, Sequence[int]]) – Number of periods to shift. Currently, only 1 (shift down, equivalent to SQL LAG) and -1 (shift up, equivalent to SQL LEAD) are supported. If a sequence is provided, it cannot contain duplicate values.
freq (Union[str, int, None]) – Not supported. Must be None.
axis (Union[int, str]) – Axis to shift along. Only 0 / 'index' is supported.
fill_value (Optional[Hashable]) – Not supported. Must be None. Missing values introduced by the shift will always be null.
suffix (Optional[str]) – If provided, renames the resulting shifted columns by appending this string to the original column names. This argument can only be used if periods is a sequence (not a single integer).

Returns:

A new TDS frame with the shifted columns.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If periods contains any values other than 1 or -1. If freq is not None. If axis is not 0 or 'index'. If fill_value is not None.
ValueError – If any column specified in order_by is not present in the frame. If periods contains duplicate values. If suffix is specified but periods is a single integer.

See also

rank: Rank as ascending or descending.
PandasApiGroupbyTdsFrame.shift: Shift values within groups.

Notes

Differences from pandas:

The order_by parameter is mandatory. In pandas, shift relies on the implicit order of the dataframe’s index. Here, an explicit order must be provided.
periods is strictly limited to 1 or -1. Arbitrary integer shifts are not supported.
fill_value is not supported and must be None.
The freq parameter is not supported and must be None.
axis=1 (shifting horizontally across columns) is not supported.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Shift the entire frame down
frame.head(5).shift(
    order_by="Order Date",
    periods=1
).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	<NA>	NaT	NaT	NaT	NaN
1	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier
2	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten
3	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes
4	10251	1996-07-08	1996-08-05	1996-07-15	Victuailles en stock
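
The LAG / LEAD behaviour can be pictured with a small plain-Python model: order the rows, then give each row the values of its neighbour, with nulls where no neighbour exists. This is only a sketch. shift_rows is a hypothetical helper operating on a list of dicts and does not reflect how pylegend builds its query.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def shift_rows(rows: List[Row], order_by: str, periods: int = 1) -> List[Row]:
    """Model of shift: periods=1 behaves like LAG, periods=-1 like LEAD."""
    if periods not in (1, -1):
        raise NotImplementedError("only periods=1 and periods=-1 are modelled")
    ordered = sorted(rows, key=lambda row: row[order_by])
    shifted: List[Row] = []
    for i, row in enumerate(ordered):
        j = i - periods  # source row: previous for LAG, next for LEAD
        if 0 <= j < len(ordered):
            shifted.append(dict(ordered[j]))
        else:
            # No neighbour on that side: every column becomes null.
            shifted.append({name: None for name in row})
    return shifted


rows = [
    {"Order Id": 10250, "Order Date": "1996-07-08"},
    {"Order Id": 10248, "Order Date": "1996-07-04"},
    {"Order Id": 10249, "Order Date": "1996-07-05"},
]
lag = shift_rows(rows, order_by="Order Date", periods=1)
lead = shift_rows(rows, order_by="Order Date", periods=-1)
print([row["Order Id"] for row in lag])   # [None, 10248, 10249]
print([row["Order Id"] for row in lead])  # [10249, 10250, None]
```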

sort_values

PandasApiTdsFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind=None, na_position='last', ignore_index=True, key=None)[source]

Sort the TDS frame by one or more columns.

Return a new TDS frame sorted by the values in the specified column(s). Supports ascending and descending sort order per column.

Parameters:

by (Union[str, List[str]]) – Column name or list of column names to sort by. All names must exist in the current frame.
axis (Union[str, int]) – Axis along which to sort. Only 0 / 'index' (row-wise sorting) is supported.
ascending (Union[bool, List[bool]]) – Sort order. If a list, must have the same length as by.
inplace (bool) – Must be False. In-place mutation is not supported.
kind (Optional[str]) – Not supported. Must be None; passing any value raises NotImplementedError.
na_position (str) – Position of null values. Accepted but handled at the SQL engine level.
ignore_index (bool) – Must be True. Setting to False raises ValueError.
key (Optional[Callable[[AbstractTdsRow], AbstractTdsRow]]) – Not supported. Must be None; passing a callable raises NotImplementedError.

Returns:

A new TDS frame sorted by the specified columns.

Return type:

PandasApiTdsFrame

Raises:

ValueError – If a column in by does not exist in the frame.
ValueError – If the length of ascending does not match by.
ValueError – If axis is not 0 or 'index'.
ValueError – If inplace is True.
ValueError – If ignore_index is False.
NotImplementedError – If kind or key is provided.

See also

head: Return the first n rows.
truncate: Select a range of rows by position.
filter: Select columns by name, substring, or regex.

Notes

Differences from pandas:

The kind parameter (sort algorithm) is not supported. Sorting is delegated to the underlying Legend Engine.
The key parameter (per-element transform before sorting) is not supported.
inplace=True is not supported; always returns a new frame.
ignore_index must be True; False is not supported because TDS frames do not have an index.
axis=1 (sorting columns) is not supported; only row-wise sorting via axis=0 is available.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Sort by a single column (ascending by default)
frame.sort_values("Ship Name").head(5).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	11011	1998-04-09	1998-05-07	1998-04-13	Alfred's Futterkiste
1	10952	1998-03-16	1998-04-27	1998-03-24	Alfred's Futterkiste
2	10835	1998-01-15	1998-02-12	1998-01-21	Alfred's Futterkiste
3	10702	1997-10-13	1997-11-24	1997-10-21	Alfred's Futterkiste
4	10692	1997-10-03	1997-10-31	1997-10-13	Alfred's Futterkiste

# Sort descending
frame.sort_values("Order Id", ascending=False).head(5).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	11077	1998-05-06	1998-06-03	NaT	Rattlesnake Canyon Grocery
1	11076	1998-05-06	1998-06-03	NaT	Bon app'
2	11075	1998-05-06	1998-06-03	NaT	Richter Supermarkt
3	11074	1998-05-06	1998-06-03	NaT	Simons bistro
4	11073	1998-05-05	1998-06-02	NaT	Pericles Comidas clásicas

# Sort by multiple columns with mixed directions
frame.sort_values(
    by=["Ship Name", "Order Id"],
    ascending=[True, False]
).head(5).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	11011	1998-04-09	1998-05-07	1998-04-13	Alfred's Futterkiste
1	10952	1998-03-16	1998-04-27	1998-03-24	Alfred's Futterkiste
2	10835	1998-01-15	1998-02-12	1998-01-21	Alfred's Futterkiste
3	10702	1997-10-13	1997-11-24	1997-10-21	Alfred's Futterkiste
4	10692	1997-10-03	1997-10-31	1997-10-13	Alfred's Futterkiste
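
To make the mixed-direction case concrete, the following plain-Python sketch sorts a list of row dicts by several columns with a per-column direction. It assumes nothing about pylegend internals (sorting there is delegated to the engine). sort_rows is a hypothetical helper that relies on the stability of Python's sort.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]


def sort_rows(rows: List[Row], by: Sequence[str], ascending: Sequence[bool]) -> List[Row]:
    """Sort by several columns, each with its own direction."""
    if len(by) != len(ascending):
        raise ValueError("ascending must have the same length as by")
    ordered = list(rows)
    # Apply the keys from least to most significant; stable sorting keeps
    # the order established by the less significant keys within ties.
    for column, asc in reversed(list(zip(by, ascending))):
        ordered.sort(key=lambda row: row[column], reverse=not asc)
    return ordered


rows = [
    {"Ship Name": "Hanari Carnes", "Order Id": 10250},
    {"Ship Name": "Alfred's Futterkiste", "Order Id": 10692},
    {"Ship Name": "Hanari Carnes", "Order Id": 10253},
    {"Ship Name": "Alfred's Futterkiste", "Order Id": 11011},
]
result = sort_rows(rows, by=["Ship Name", "Order Id"], ascending=[True, False])
print([row["Order Id"] for row in result])  # [11011, 10692, 10253, 10250]
```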

std

PandasApiTdsFrame.std(axis=0, skipna=True, ddof=1, numeric_only=False, **kwargs)[source]

Compute the standard deviation of each column.

Convenience method equivalent to aggregate('std') (ddof=1) or aggregate('std_dev_population') (ddof=0). Returns a single-row TDS frame with the standard deviation of every column.

Parameters:

axis (Union[int, str]) – Only 0 / 'index' is supported.
skipna (bool) – Must be True. False is not supported.
ddof (int) – Degrees of freedom. 1 for sample standard deviation (STDDEV_SAMP), 0 for population standard deviation (STDDEV_POP).
numeric_only (bool) – Must be False. True is not supported.
**kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported.

Returns:

A single-row TDS frame with column standard deviations.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If ddof is not 0 or 1, or if any other parameter is set to an unsupported value.

See also

aggregate: General aggregation method.
var: Compute column variances.
mean: Compute column means.

Notes

Differences from pandas:

Only ddof=0 and ddof=1 are supported.
Internally delegates to aggregate('std') (ddof=1, maps to STDDEV_SAMP) or aggregate('std_dev_population') (ddof=0, maps to STDDEV_POP).

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Standard deviation of numeric columns
frame.filter(items=["Order Id"]).std().to_pandas()

	Order Id
0	239.744656
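
The difference between ddof=1 and ddof=0 can be checked with Python's statistics module. The sketch assumes that the sample's Order Id values are the 830 consecutive integers from 10248 to 11077. That is an assumption about the sample data, although it does reproduce the figure shown above.

```python
import statistics

# Assumed sample values: consecutive Order Ids 10248..11077.
order_ids = list(range(10248, 11078))

sample_std = statistics.stdev(order_ids)       # ddof=1, sample standard deviation
population_std = statistics.pstdev(order_ids)  # ddof=0, population standard deviation

print(round(sample_std, 6))
print(round(population_std, 6))
```

The population figure is slightly smaller because it divides by n rather than n - 1.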

sum

PandasApiTdsFrame.sum(axis=0, skipna=True, numeric_only=False, min_count=0, **kwargs)[source]

Compute the sum of each column.

Convenience method equivalent to aggregate('sum'). Returns a single-row TDS frame with the sum of every column.

Parameters:

axis (Union[int, str]) – Only 0 / 'index' is supported.
skipna (bool) – Must be True. False is not supported.
numeric_only (bool) – Must be False. True is not supported.
min_count (int) – Must be 0. Non-zero values are not supported.
**kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported. Passing any keyword arguments raises NotImplementedError.

Returns:

A single-row TDS frame with column sums.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If axis, skipna, numeric_only, min_count, or **kwargs are set to unsupported values.

See also

aggregate: General aggregation method.
mean: Compute column means.
count: Count non-null values per column.

Notes

Differences from pandas:

skipna=False, numeric_only=True, and non-zero min_count are not supported.
axis=1 is not supported.
Internally delegates to aggregate('sum').

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Sum of the Order Id column
frame.filter(items=["Order Id"]).sum().to_pandas()

	Order Id
0	8849875

truncate

PandasApiTdsFrame.truncate(before=0, after=None, axis=0, copy=True)[source]

Select rows by positional index range.

Return a new TDS frame containing rows from position before (inclusive) to after (inclusive).

Parameters:

before (Union[date, str, int, None]) – Only int and None are supported. First row index to include (0-based, inclusive). Negative values are silently clamped to 0. None is treated as 0.
after (Union[date, str, int, None]) – Only int and None are supported. Last row index to include (0-based, inclusive). None means no upper bound (all remaining rows are returned). Negative values result in an empty frame.
axis (Union[str, int]) – Axis to truncate along. Only 0 / 'index' is supported.
copy (bool) – Must be True. Setting to False raises NotImplementedError.

Returns:

A new TDS frame containing only the rows in the specified positional range.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If axis is not 0 or 'index'. If copy is False. If before or after is a non-integer type (e.g. a string or date).
ValueError – If before is greater than after (after clamping).

See also

head: Return the first n rows.
sort_values: Sort the frame before truncating.
filter: Select columns by name, substring, or regex.

Notes

Differences from pandas:

In pandas, truncate selects rows by label (index value). Here, it selects rows by positional (integer) index only; it is translated to LIMIT and OFFSET in the underlying SQL. Passing date, str, or other label-based values for before / after raises NotImplementedError.
copy=False is not supported; a new frame is always returned.
axis=1 (truncating columns) is not supported.
Negative before values are silently clamped to 0 rather than raising an error. Negative after values result in an empty frame (zero rows).
The after parameter is inclusive (row at position after is included), matching pandas behaviour.
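
One way to picture the translation from an inclusive positional range to OFFSET and LIMIT is sketched below in plain Python. truncate_to_offset_limit is a hypothetical helper. The order in which the clamping and empty-frame rules are applied is an assumption, since the rules above do not spell it out.

```python
from typing import Optional, Tuple


def truncate_to_offset_limit(
    before: Optional[int] = 0, after: Optional[int] = None
) -> Tuple[int, Optional[int]]:
    """Map an inclusive [before, after] range to (OFFSET, LIMIT)."""
    offset = max(before or 0, 0)  # None and negative values become 0
    if after is None:
        return offset, None       # no upper bound
    if after < 0:
        return offset, 0          # assumed: negative after gives an empty frame
    if offset > after:
        raise ValueError("before must not be greater than after")
    return offset, after - offset + 1  # both ends are inclusive


def apply(rows: list, offset: int, limit: Optional[int]) -> list:
    """Emulate OFFSET / LIMIT on a Python list."""
    return rows[offset:] if limit is None else rows[offset:offset + limit]


positions = list(range(10))
print(truncate_to_offset_limit(2, 6))                     # (2, 5)
print(apply(positions, *truncate_to_offset_limit(2, 6)))  # [2, 3, 4, 5, 6]
print(truncate_to_offset_limit(before=5))                 # (5, None)
```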

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Get rows at positions 0 through 4 (inclusive)
frame.truncate(before=0, after=4).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	10248	1996-07-04	1996-08-01	1996-07-16	Vins et alcools Chevalier
1	10249	1996-07-05	1996-08-16	1996-07-10	Toms Spezialitäten
2	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes
3	10251	1996-07-08	1996-08-05	1996-07-15	Victuailles en stock
4	10252	1996-07-09	1996-08-06	1996-07-11	Suprêmes délices

# Skip first 5 rows, keep the rest
frame.truncate(before=5).head(5).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	10253	1996-07-10	1996-07-24	1996-07-16	Hanari Carnes
1	10254	1996-07-11	1996-08-08	1996-07-23	Chop-suey Chinese
2	10255	1996-07-12	1996-08-09	1996-07-15	Richter Supermarkt
3	10256	1996-07-15	1996-08-12	1996-07-17	Wellington Importadora
4	10257	1996-07-16	1996-08-13	1996-07-22	HILARION-Abastos

# Get rows at positions 2 through 6 (inclusive)
frame.truncate(before=2, after=6).to_pandas()

	Order Id	Order Date	Required Date	Shipped Date	Ship Name
0	10250	1996-07-08	1996-08-05	1996-07-12	Hanari Carnes
1	10251	1996-07-08	1996-08-05	1996-07-15	Victuailles en stock
2	10252	1996-07-09	1996-08-06	1996-07-11	Suprêmes délices
3	10253	1996-07-10	1996-07-24	1996-07-16	Hanari Carnes
4	10254	1996-07-11	1996-08-08	1996-07-23	Chop-suey Chinese

var

PandasApiTdsFrame.var(axis=0, skipna=True, ddof=1, numeric_only=False, **kwargs)[source]

Compute the variance of each column.

Convenience method equivalent to aggregate('var') (ddof=1) or aggregate('variance_population') (ddof=0). Returns a single-row TDS frame with the variance of every column.

Parameters:

axis (Union[int, str]) – Only 0 / 'index' is supported.
skipna (bool) – Must be True. False is not supported.
ddof (int) – Degrees of freedom. 1 for sample variance (VAR_SAMP), 0 for population variance (VAR_POP).
numeric_only (bool) – Must be False. True is not supported.
**kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported.

Returns:

A single-row TDS frame with column variances.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If ddof is not 0 or 1, or if any other parameter is set to an unsupported value.

See also

aggregate: General aggregation method.
std: Compute column standard deviations.
mean: Compute column means.

Notes

Differences from pandas:

Only ddof=0 and ddof=1 are supported.
Internally delegates to aggregate('var') (ddof=1, maps to VAR_SAMP) or aggregate('variance_population') (ddof=0, maps to VAR_POP).

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Variance of numeric columns
frame.filter(items=["Order Id"]).var().to_pandas()

	Order Id
0	57477.5
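
The role of ddof is easiest to see in the formula itself: the sum of squared deviations is divided by n - ddof. The sketch below is plain Python, with variance as a hypothetical helper. It assumes the sample's Order Id values are the consecutive integers 10248 to 11077, which happens to reproduce the figure shown above.

```python
from typing import Sequence


def variance(values: Sequence[float], ddof: int = 1) -> float:
    """Variance with n - ddof in the denominator (ddof=1 sample, ddof=0 population)."""
    if ddof not in (0, 1):
        raise NotImplementedError("only ddof=0 and ddof=1 are modelled")
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return squared_deviations / (n - ddof)


# Assumed sample values: consecutive Order Ids 10248..11077.
order_ids = range(10248, 11078)
print(variance(order_ids))          # 57477.5
print(variance(order_ids, ddof=0))  # 57408.25
```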

window_frame_legend_ext

PandasApiTdsFrame.window_frame_legend_ext(frame_spec, order_by=None, ascending=True)[source]

Create a custom window specification with explicit frame bounds.

PyLegend extension — not present in pandas.

Provides fine-grained control over the ROWS BETWEEN or RANGE BETWEEN clause used by window-aggregate computations.

Parameters:

frame_spec (FrameSpec) – A window-frame specification created via rows_between() or range_between().
order_by (Union[str, Sequence[str], None]) – Column(s) to order by within the window. None means no explicit ordering (a fallback will be chosen automatically).
ascending (Union[bool, Sequence[bool]]) – Sort direction(s) for the order_by columns.

Returns:

A window frame on which window aggregates (sum, mean, min, max, etc.) can be called.

Return type:

PandasApiWindowTdsFrame

Raises:

TypeError – If frame_spec is not a RowsBetween or RangeBetween.

See also

expanding: Expanding (cumulative) window.
rolling: Fixed-size sliding window.
rows_between: Create a ROWS BETWEEN specification.
range_between: Create a RANGE BETWEEN specification.

Notes

Differences from pandas:

This method has no pandas equivalent. It is a pylegend extension for explicit control over the SQL window frame.

Examples

import pylegend
from pylegend.core.language.pandas_api.pandas_api_frame_spec import (
    RowsBetween,
)
frame = pylegend.samples.pandas_api.northwind_orders_frame()

spec = RowsBetween(-2, 0)
frame.filter(items=["Order Id"]).window_frame_legend_ext(
    spec, order_by="Order Id"
).sum().head(5).to_pandas()
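
For intuition about what a ROWS BETWEEN frame computes, the plain-Python sketch below evaluates a windowed sum over an already ordered list, using the same sign convention as rows_between. It follows standard SQL window semantics and is not taken from pylegend. rows_between_sum is a hypothetical helper.

```python
from typing import List, Optional


def rows_between_sum(
    values: List[int], start: Optional[int], end: Optional[int]
) -> List[Optional[int]]:
    """Sum over the frame [i + start, i + end] for each row i (None = unbounded)."""
    n = len(values)
    result: List[Optional[int]] = []
    for i in range(n):
        lo = 0 if start is None else max(0, i + start)
        hi = n - 1 if end is None else min(n - 1, i + end)
        chunk = values[lo:hi + 1]
        # An empty frame yields None, mirroring SQL's NULL for SUM over no rows.
        result.append(sum(chunk) if chunk else None)
    return result


# First five Order Id values of the sample, already ordered by Order Id.
order_ids = [10248, 10249, 10250, 10251, 10252]
print(rows_between_sum(order_ids, -2, 0))    # current row and 2 preceding
print(rows_between_sum(order_ids, None, 0))  # running total
```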