Pandas TDS Frame

The PandasApiTdsFrame class provides a pandas-like interface for working with TDS (Tabular Data Set) frames. It offers methods for data manipulation, filtering, aggregation, joins, and window functions.

agg

PandasApiTdsFrame.agg(func, axis=0, *args, **kwargs)[source]

Alias for aggregate(). See aggregate for full documentation.

Return type:

PandasApiTdsFrame

aggregate

PandasApiTdsFrame.aggregate(func, axis=0, *args, **kwargs)[source]

Aggregate the TDS frame using one or more operations.

Apply one or more aggregation functions across all columns or specific columns, collapsing the frame into a single-row summary. Supported aggregation strings are 'sum', 'mean', 'min', 'max', 'count', 'std', and 'var', plus the aliases 'len' and 'size' (both map to count) and 'average' and 'avg' (both map to mean). Callables and NumPy universal functions are also supported.

Parameters:
  • func (Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc, List[Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc]], Mapping[Hashable, Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc, List[Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc]]]]]) –

    Aggregation specification. Accepted forms:

    • str : A named aggregation (e.g. 'sum') applied to every column.

    • callable : A function that receives a column’s Series proxy and returns an aggregated value (e.g. lambda x: x.sum()), applied to every column.

    • np.ufunc : A NumPy universal function (e.g. np.sum), applied to every column.

    • list : A list containing one of the above, applied to every column. Output column names are prefixed with the function name (e.g. 'sum(col)').

    • dict : A mapping of column name → aggregation (str, callable, np.ufunc, or a list of these). Only the specified columns appear in the result.

  • axis (Union[int, str]) – Axis along which to aggregate. Only 0 / 'index' is supported.

  • *args (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported. Passing positional arguments raises NotImplementedError.

  • **kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported. Passing keyword arguments raises NotImplementedError.

Returns:

A new single-row TDS frame with the aggregated values.

Return type:

PandasApiTdsFrame

Raises:
  • NotImplementedError – If axis is not 0 or 'index'. If extra *args or **kwargs are passed.

  • TypeError – If func is not a supported type (str, callable, np.ufunc, list, or dict). If dict keys are not strings, or dict/list values contain unsupported types.

  • ValueError – If a dict key refers to a column that does not exist in the frame.

See also

agg

Alias for aggregate.

groupby

Group rows before aggregating.

sum

Convenience method for sum aggregation.

mean

Convenience method for mean aggregation.

Notes

Differences from pandas:

  • In pandas, aggregate can return a multi-row result when multiple functions are applied (one row per function). Here, multiple functions per column produce multiple columns in a single-row result (e.g. {'col': ['min', 'max']} yields columns 'min(col)' and 'max(col)').

  • Extra *args and **kwargs are not forwarded to the aggregation function; passing them raises NotImplementedError.

  • axis=1 (column-wise aggregation) is not supported.

  • When func is a list, it must contain exactly one element. Multi-element lists behave identically to a single-element list mapping applied to every column.
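
The single-row result shape and the 'func(col)' naming described in these notes can be modelled in plain Python. This is a minimal sketch rather than pylegend's implementation: aggregate_rows, REDUCERS, and the dict-of-lists input are hypothetical stand-ins for a frame, and only a few of the named aggregations are covered.

```python
# Plain-Python model of the single-row result shape; not pylegend code.
REDUCERS = {
    "sum": sum,
    "min": min,
    "max": max,
    "count": len,  # assumes nulls were already removed from the column
}


def aggregate_rows(columns, spec):
    """Reduce each requested column and return one row as a dict."""
    row = {}
    for name, funcs in spec.items():
        if name not in columns:
            raise ValueError(f"unknown column: {name}")
        if isinstance(funcs, str):
            # A single function keeps the original column name.
            row[name] = REDUCERS[funcs](columns[name])
        else:
            # A list of functions yields one 'func(col)' column per entry.
            for func in funcs:
                row[f"{func}({name})"] = REDUCERS[func](columns[name])
    return row


columns = {
    "Order Id": [10248, 10249, 10250],
    "Ship Name": ["Vins et alcools Chevalier", "Toms Spezialitäten", "Hanari Carnes"],
}
result = aggregate_rows(columns, {"Order Id": ["min", "max"], "Ship Name": "count"})
print(result)
```

The result is a single dict, mirroring how multiple functions on one column widen the row instead of adding rows as pandas would.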

Examples

import pylegend
import numpy as np
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Aggregate a single column with a string function
frame.aggregate({"Order Id": "count"}).to_pandas()
Order Id
0 830
# Aggregate multiple columns with different functions
frame.aggregate({"Order Id": "min", "Ship Name": "count"}).to_pandas()
Order Id Ship Name
0 10248 830
# Broadcast a single function to all columns
frame.aggregate("count").to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name
0 830 830 830 809 830
# Mix a lambda, a NumPy function, and a string in one mapping
frame.aggregate({
    "Order Id": lambda x: x.max(),
    "Order Date": np.max,
    "Shipped Date": "min"
}).to_pandas()
Order Id Order Date Shipped Date
0 11077 1998-05-06 1996-07-10

apply

PandasApiTdsFrame.apply(func, axis=0, raw=False, result_type=None, args=(), by_row='compat', engine='python', engine_kwargs=None, **kwargs)[source]

Apply a function to each column of the TDS frame.

The callable receives a Series proxy for each column and must return a transformed value. The function is applied independently to every column, producing a new frame with the same column names but transformed values. Additional positional and keyword arguments can be forwarded to the callable via args and **kwargs.

Parameters:
  • func (Union[Callable[[Concatenate[Series, ParamSpec(P, bound= None)]], Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str]) – A function that takes a Series (column proxy) as its first argument and returns a primitive value or expression. String-based function names (e.g. 'sum') are not supported; use aggregate() for named aggregations.

  • axis (Union[int, str]) – Only column-wise application is supported (axis=0 or 'index'). Row-wise application (axis=1) raises ValueError.

  • raw (bool) – Must be False. True is not supported.

  • result_type (Optional[str]) – Must be None. Any value raises NotImplementedError.

  • args (Tuple[Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive], ...]) – Positional arguments to pass to func after the Series argument.

  • by_row (Union[bool, str]) – Must be False or 'compat'. True raises NotImplementedError.

  • engine (str) – Must be 'python'. 'numba' is not supported.

  • engine_kwargs (Optional[Dict[str, Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]]]) – Must be None. Not supported.

  • **kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Additional keyword arguments forwarded to func.

Returns:

A new TDS frame with the function applied to every column.

Return type:

PandasApiTdsFrame

Raises:
  • ValueError – If axis is not 0 or 'index'.

  • NotImplementedError – If raw=True, result_type is set, by_row=True, engine='numba', engine_kwargs is set, or func is a string.

  • TypeError – If func is not callable.

See also

assign

Add or overwrite specific columns with callables.

aggregate

Aggregate (reduce) columns to a single row.

Notes

Differences from pandas:

  • In pandas, apply with axis=0 passes each column as a pandas.Series to the function, which can return a scalar (reducing the frame) or a Series (transforming it). Here, func receives a column Series proxy and must return a scalar expression that defines a row-level transformation. This means apply always produces a frame with the same number of rows — it cannot reduce the frame the way pandas apply can.

  • Row-wise application (axis=1) is not supported.

  • String function names (e.g. 'sum') are not supported. Use aggregate() instead.

  • raw=True, result_type, engine='numba', and engine_kwargs are not supported.
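
To illustrate how func receives a column proxy and how args and **kwargs are forwarded, here is a minimal plain-Python sketch. ColumnProxy and apply_columns are hypothetical toy names, not pylegend classes; the real proxy builds query expressions, whereas this one simply evaluates each operator on every row.

```python
class ColumnProxy:
    """Toy stand-in for a column proxy: operators act on every row."""

    def __init__(self, values):
        self.values = list(values)

    def __mul__(self, factor):
        return ColumnProxy(v * factor for v in self.values)

    def __add__(self, offset):
        return ColumnProxy(v + offset for v in self.values)


def apply_columns(columns, func, args=(), **kwargs):
    """Call func(proxy, *args, **kwargs) once per column."""
    return {
        name: func(ColumnProxy(values), *args, **kwargs).values
        for name, values in columns.items()
    }


def add_offset(series, offset, *, scale=1):
    return series * scale + offset


columns = {"Order Id": [10248, 10249, 10250]}
result = apply_columns(columns, add_offset, args=(100,), scale=2)
print(result)  # {'Order Id': [20596, 20598, 20600]}
```

The output keeps one value per input row, which is the row-preserving behaviour the first note describes.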

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Apply a lambda to every column
frame.filter(items=["Order Id"]).apply(
    lambda x: x * 2
).head(5).to_pandas()
Order Id
0 20496
1 20498
2 20500
3 20502
4 20504
# Apply a function with extra arguments
def add_offset(series, offset, *, scale=1):
    return series * scale + offset
frame.filter(items=["Order Id"]).apply(
    add_offset, args=(100,), scale=2
).head(5).to_pandas()
Order Id
0 20596
1 20598
2 20600
3 20602
4 20604

assign

PandasApiTdsFrame.assign(**kwargs)[source]

Add or overwrite columns using keyword arguments.

Return a new TDS frame with new columns added (or existing columns overwritten). Each keyword argument defines a column name and a callable that computes the column’s value from each row.

Parameters:

**kwargs (Callable[[PandasApiTdsRow], Union[int, float, bool, str, date, datetime, PyLegendPrimitive]]) – Each keyword argument is a column name mapped to a function that takes a PandasApiTdsRow and returns a scalar value. Supported return types are int, float, bool, str, date, datetime, and PyLegendPrimitive.

Returns:

A new TDS frame with the additional (or overwritten) columns.

Return type:

PandasApiTdsFrame

Raises:

RuntimeError – If the callable returns an unsupported type (e.g. a list).

See also

filter

Select columns by name, substring, or regex.

drop

Remove columns by label.

rename

Rename existing columns.

Notes

Differences from pandas:

  • In pandas assign, each keyword argument can be a callable or a static value (e.g. frame.assign(col=5)). Here, every value must be a callable that takes a row, even for constants (e.g. frame.assign(col=lambda x: 5)).

  • Column values are accessed via typed accessor methods such as x.get_integer("col") and x.get_string("col"), or via bracket notation x["col"].

  • Returning a non-scalar type (e.g. a list) from the callable raises a RuntimeError, unlike pandas which would broadcast or create nested data.
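
The per-row callable contract can be sketched in plain Python as follows. assign_rows is a hypothetical helper operating on a list of dicts, not pylegend code, and the TypeError raised for a non-callable value is this sketch's own choice, since the error pylegend raises in that case is not stated here.

```python
def assign_rows(rows, **new_columns):
    """Return new rows with each callable evaluated against the row."""
    result = []
    for row in rows:
        updated = dict(row)  # never mutate the input rows
        for name, func in new_columns.items():
            if not callable(func):
                # Sketch-only choice of error type for static values.
                raise TypeError(f"{name!r} must be a callable, even for constants")
            value = func(row)
            if isinstance(value, (list, dict, set, tuple)):
                raise RuntimeError(f"{name!r} returned a non-scalar value")
            updated[name] = value
        result.append(updated)
    return result


rows = [{"Ship Name": "Hanari Carnes"}]
result = assign_rows(
    rows,
    constant=lambda x: 100,
    ship_upper=lambda x: x["Ship Name"].upper(),
)
print(result)
```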

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Add a constant column
frame.assign(constant=lambda x: 100).head(3).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name constant
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier 100
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten 100
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes 100
# Add a computed column derived from existing columns
frame.assign(
    ship_upper=lambda x: x.get_string("Ship Name").upper()
).head(3).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name ship_upper
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier VINS ET ALCOOLS CHEVALIER
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten TOMS SPEZIALITÄTEN
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes HANARI CARNES
# Overwrite an existing column
frame.assign(
    **{"Ship Name": lambda x: x.get_string("Ship Name").upper()}
).head(3).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name
0 10248 1996-07-04 1996-08-01 1996-07-16 VINS ET ALCOOLS CHEVALIER
1 10249 1996-07-05 1996-08-16 1996-07-10 TOMS SPEZIALITÄTEN
2 10250 1996-07-08 1996-08-05 1996-07-12 HANARI CARNES

cast

PandasApiTdsFrame.cast(column_type_map)[source]

Change the declared type of one or more columns.

Return a new TDS frame whose column metadata reflects the requested type changes. The underlying data is not transformed in SQL (no CAST expression is emitted); instead a Pure ->cast(...) clause is appended so that the Legend Engine re-interprets the column under the target type.

A cast is allowed only when the source and target types share a subclass relationship in the PyLegend type hierarchy. For example, Integer → BigInt is valid because BigInt is a sub-type of Integer, but String → Integer is not.

Parameters:

column_type_map (Dict[str, Union[PrimitiveType, Tuple[PrimitiveType, ...]]]) – A mapping from column name to the desired target type. Values are produced by the helpers in pylegend.core.language.type_factory — for example tf.bigint(), tf.varchar(200), tf.numeric(10, 2). An empty dict is valid and returns a copy of the frame with unchanged columns.

Returns:

A new TDS frame with the cast column metadata. The original frame is never mutated.

Return type:

PandasApiTdsFrame

Raises:
  • ValueError – If a column name in column_type_map does not exist in the frame, or if the source-to-target conversion is not allowed (the types do not share a subclass relationship).

  • TypeError – If the target column is a non-primitive column (e.g. an EnumTdsColumn). Only PrimitiveTdsColumn columns can be cast.

See also

assign

Add or overwrite columns with computed values.

rename

Rename columns without changing their types.

Notes

Differences from pandas:

  • Pandas DataFrame.astype() converts data values in memory. cast changes only the declared column type in the query metadata; no SQL CAST expression is generated.

  • The allowed conversions follow the Legend type hierarchy, not Python/NumPy dtype-coercion rules.

  • Parameterised types such as Varchar(200) and Numeric(10, 2) are supported through the type_factory helpers and are reflected in the generated Pure ->cast(...) clause.

  • Cross-branch casts (e.g. Integer → Float, String → Boolean) raise ValueError.
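
The subclass rule can be illustrated with a toy hierarchy. The classes below are hypothetical and only mimic the shape of the relationships described above; they are not the PyLegend type classes, and is_cast_allowed is an assumed helper name.

```python
class Number:
    """Toy root for numeric types."""


class Integer(Number):
    """Toy Integer type."""


class BigInt(Integer):
    """Toy sub-type of Integer."""


class Float(Number):
    """Toy Float type, a sibling of Integer."""


class String:
    """Toy String type."""


class Boolean:
    """Toy Boolean type."""


def is_cast_allowed(source, target):
    """Allow a cast only when one type is a subclass of the other."""
    return issubclass(source, target) or issubclass(target, source)


print(is_cast_allowed(Integer, BigInt))  # True: same branch
print(is_cast_allowed(Integer, Float))   # False: sibling branches
print(is_cast_allowed(String, Boolean))  # False: unrelated types
```

In this model a shared ancestor is not enough: Integer and Float both derive from Number, yet neither is a subclass of the other, so the cast is rejected.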

Examples

import pylegend
from pylegend.core.language import type_factory as tf
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Widen an integer column to BigInt
casted = frame.cast({"Order Id": tf.bigint()})
casted.head(3).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes
# Cast multiple columns at once
casted = frame.cast({
    "Order Id": tf.bigint(),
    "Ship Name": tf.varchar(200),
})
casted.head(3).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes

concat_legend_ext

PandasApiTdsFrame.concat_legend_ext(other)[source]

Concatenate this frame with another frame vertically.

PyLegend extension — not present in pandas.

Produces a SQL UNION ALL of the two frames. Both frames must have compatible schemas (same column names and types).

Parameters:

other (PandasApiTdsFrame) – The frame to concatenate below this one.

Returns:

A new TDS frame whose rows are the rows of self followed by the rows of other.

Return type:

PandasApiTdsFrame

Raises:

TypeError – If other is not a PandasApiBaseTdsFrame.

See also

merge

SQL join of two frames.

Notes

Differences from pandas:

  • In pandas, pd.concat is a top-level function that accepts a list of DataFrames. Here, concat_legend_ext is a method on a PandasApiTdsFrame and only supports vertical concatenation (UNION ALL) of two frames with the same schema.
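
For readers less familiar with UNION ALL, the following sketch uses Python's built-in sqlite3 module to show that it keeps duplicate rows, unlike UNION. The table name and SQL text are illustrative assumptions and are not the query pylegend generates.

```python
import sqlite3

# In-memory database with a hypothetical three-row orders table.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE orders ("Order Id" INTEGER)')
conn.executemany("INSERT INTO orders VALUES (?)", [(10248,), (10249,), (10250,)])

union_all = conn.execute(
    'SELECT "Order Id" FROM orders UNION ALL SELECT "Order Id" FROM orders'
).fetchall()
union = conn.execute(
    'SELECT "Order Id" FROM orders UNION SELECT "Order Id" FROM orders'
).fetchall()
conn.close()

print(len(union_all))  # 6: duplicates are kept
print(len(union))      # 3: duplicates are removed
```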

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
top = frame.head(3)
bottom = frame.head(3)
top.concat_legend_ext(bottom).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes
3 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier
4 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten
5 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes

count

PandasApiTdsFrame.count(axis=0, numeric_only=False, **kwargs)[source]

Count non-null values in each column.

Convenience method equivalent to aggregate('count'). Returns a single-row TDS frame with the count of non-null values for every column.

Parameters:
  • axis (Union[int, str]) – Only 0 / 'index' is supported.

  • numeric_only (bool) – Must be False. True is not supported.

  • **kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported.

Returns:

A single-row TDS frame with non-null counts per column.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If any parameter is set to an unsupported value.

See also

aggregate

General aggregation method.

sum

Compute column sums.

Notes

Internally delegates to aggregate('count'). The same pandas deviations as sum() apply (axis=1 and numeric_only=True are not supported). Unlike sum, count has no skipna parameter because it always counts only non-null values.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Count non-null values in each column
frame.count().to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name
0 830 830 830 809 830

cume_dist_legend_ext

PandasApiTdsFrame.cume_dist_legend_ext(ascending=True)[source]

Compute the cumulative distribution of each column.

PyLegend extension — not present in pandas.

Maps to SQL CUME_DIST() OVER (ORDER BY col) and Pure cumulativeDistribution.

Parameters:

ascending (bool) – Whether to order in ascending direction.

Returns:

A new TDS frame with cumulative distribution values (floats between 0 and 1) replacing every column.

Return type:

PandasApiTdsFrame

See also

rank

Compute column ranks.

ntile_legend_ext

Assign rows to numbered buckets.

Notes

Differences from pandas:

  • This method has no pandas equivalent. CUME_DIST is exposed as a pylegend extension.
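
As a rough guide to what CUME_DIST computes, the sketch below reproduces it in plain Python: for each row, the fraction of rows ordered at or before it, with ties sharing a value. The cume_dist function is a hypothetical helper, and the input assumes the 830 Order Id values are the consecutive integers 10248 to 11077, which is consistent with the count, min, and max shown in the aggregate examples.

```python
from bisect import bisect_left, bisect_right


def cume_dist(values, ascending=True):
    """Fraction of rows ordered at or before each row; ties share a value."""
    n = len(values)
    ordered = sorted(values)
    if ascending:
        # Rows with a value <= v come at or before v.
        return [bisect_right(ordered, v) / n for v in values]
    # Descending: rows with a value >= v come at or before v.
    return [(n - bisect_left(ordered, v)) / n for v in values]


order_ids = list(range(10248, 11078))  # assumed: 830 consecutive ids
first_five = [round(x, 6) for x in cume_dist(order_ids)[:5]]
print(first_five)  # [0.001205, 0.00241, 0.003614, 0.004819, 0.006024]
```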

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.filter(
    items=["Order Id"]
).cume_dist_legend_ext().head(5).to_pandas()
Order Id
0 0.001205
1 0.00241
2 0.003614
3 0.004819
4 0.006024

drop

PandasApiTdsFrame.drop(labels=None, axis=1, index=None, columns=None, level=None, inplace=False, errors='raise')[source]

Remove columns from the TDS frame by label.

Return a new TDS frame with the specified columns removed. Columns can be identified via labels (with axis=1) or via the columns parameter directly. Accepts a single column name, a list, tuple, or set of names.

Parameters:
  • labels (Union[str, Sequence[str], Set[str], None]) – Column name(s) to drop. Mutually exclusive with columns.

  • axis (Union[str, int, PyLegendInteger]) – The axis to drop along. Only column-axis (1 / 'columns') is supported.

  • index (Union[str, Sequence[str], Set[str], None]) – Not supported. Passing any value raises NotImplementedError.

  • columns (Union[str, Sequence[str], Set[str], None]) – Column name(s) to drop. Mutually exclusive with labels.

  • level (Union[str, int, PyLegendInteger, None]) – Not supported. Passing any value raises NotImplementedError.

  • inplace (Union[bool, PyLegendBoolean]) – Must be False. In-place mutation is not supported.

  • errors (str) – If 'raise', a KeyError is raised when any label is not found. If 'ignore', missing labels are silently skipped.

Returns:

A new TDS frame without the specified columns.

Return type:

PandasApiTdsFrame

Raises:
  • ValueError – If both labels and columns are provided, or if neither is provided. If axis is an invalid value (not 0, 1, 'index', or 'columns').

  • NotImplementedError – If axis is 0 / 'index' (row-level drop). If index or level is provided. If inplace is True.

  • KeyError – If any specified column does not exist in the frame and errors='raise'.

  • TypeError – If labels or columns is an unsupported type (e.g. a callable).

See also

filter

Select columns by name, substring, or regex.

assign

Add or overwrite columns.

rename

Rename existing columns.

Notes

Differences from pandas:

  • In pandas, drop can remove rows (axis=0) or columns (axis=1). Here, only column-axis dropping is supported (axis=1). Passing axis=0 raises NotImplementedError.

  • The axis parameter defaults to 1 (columns), whereas in pandas it defaults to 0 (rows). This means a bare frame.drop("col") drops a column here but would attempt to drop a row label in pandas.

  • The index and level parameters are not supported.

  • inplace=True is not supported; always returns a new frame.
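
The interplay between labels, columns, and errors can be sketched over a plain list of column names. drop_columns is a hypothetical helper that models only the argument checks documented above, not the frame itself.

```python
def drop_columns(existing, labels=None, columns=None, errors="raise"):
    """Return the column names that remain after dropping."""
    # Exactly one of labels / columns must be given.
    if (labels is None) == (columns is None):
        raise ValueError("provide exactly one of labels or columns")
    requested = labels if labels is not None else columns
    to_drop = {requested} if isinstance(requested, str) else set(requested)
    missing = to_drop - set(existing)
    if missing and errors == "raise":
        raise KeyError(sorted(missing))
    # With errors='ignore', unknown names are skipped silently.
    return [name for name in existing if name not in to_drop]


existing = ["Order Id", "Order Date", "Required Date", "Shipped Date", "Ship Name"]
remaining = drop_columns(existing, columns=["Ship Name", "NonExistent"], errors="ignore")
print(remaining)
```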

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Drop a single column
frame.drop(columns="Ship Name").head(3).to_pandas()
Order Id Order Date Required Date Shipped Date
0 10248 1996-07-04 1996-08-01 1996-07-16
1 10249 1996-07-05 1996-08-16 1996-07-10
2 10250 1996-07-08 1996-08-05 1996-07-12
# Drop multiple columns
frame.drop(columns=["Ship Name", "Order Date"]).head(3).to_pandas()
Order Id Required Date Shipped Date
0 10248 1996-08-01 1996-07-16
1 10249 1996-08-16 1996-07-10
2 10250 1996-08-05 1996-07-12
# Using labels parameter
frame.drop(labels=["Ship Name"], axis=1).head(3).to_pandas()
Order Id Order Date Required Date Shipped Date
0 10248 1996-07-04 1996-08-01 1996-07-16
1 10249 1996-07-05 1996-08-16 1996-07-10
2 10250 1996-07-08 1996-08-05 1996-07-12
# Ignore missing columns instead of raising an error
frame.drop(columns=["Ship Name", "NonExistent"], errors="ignore").head(3).to_pandas()
Order Id Order Date Required Date Shipped Date
0 10248 1996-07-04 1996-08-01 1996-07-16
1 10249 1996-07-05 1996-08-16 1996-07-10
2 10250 1996-07-08 1996-08-05 1996-07-12

drop_duplicates

PandasApiTdsFrame.drop_duplicates(subset=None, *, keep='first', inplace=False, ignore_index=False)[source]

Remove duplicate rows.

Returns a new TDS frame with duplicate rows removed, optionally considering only a subset of columns for identifying duplicates.

Parameters:
  • subset (Union[str, List[str], None]) – Column label or list of labels to consider for identifying duplicates. If None, all columns are used.

  • keep (str) – Must be 'first'. Only keeping the first occurrence is supported.

  • inplace (bool) – Must be False. In-place modification is not supported.

  • ignore_index (bool) – Must be False. Not supported.

Returns:

A new TDS frame with duplicates removed.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If keep is not 'first', or inplace / ignore_index are True.

Notes

Differences from pandas:

  • Only keep='first' is supported. 'last' and False are not supported.

  • inplace=True and ignore_index=True are not supported.

  • Generates SQL SELECT DISTINCT ON ... or equivalent.
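
The keep='first' behaviour can be pictured with a short plain-Python sketch. drop_duplicates here is a hypothetical function over a list of dicts in a fixed order; it does not reflect how the generated SQL decides which row comes first.

```python
def drop_duplicates(rows, subset=None):
    """Keep the first row seen for each distinct key."""
    if isinstance(subset, str):
        subset = [subset]
    seen = set()
    kept = []
    for row in rows:
        # With no subset, every column takes part in the comparison.
        names = list(row) if subset is None else subset
        key = tuple(row[name] for name in names)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


rows = [
    {"Order Id": 1, "Ship Name": "A"},
    {"Order Id": 2, "Ship Name": "B"},
    {"Order Id": 3, "Ship Name": "A"},
]
unique_by_name = drop_duplicates(rows, subset="Ship Name")
print([row["Order Id"] for row in unique_by_name])  # [1, 2]
```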

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Remove rows with duplicate Ship Name
frame.drop_duplicates(subset=["Ship Name"]).head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes
3 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock
4 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices

dropna

PandasApiTdsFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False, ignore_index=False)[source]

Remove rows with missing values.

Return a new TDS frame with rows containing NA / null values removed. The check can be scoped to specific columns via subset and controlled via how.

Parameters:
  • axis (Union[int, str]) – Only 0 / 'index' (drop rows) is supported. 1 / 'columns' (drop columns) raises NotImplementedError.

  • how (str) –

    • 'any' : Drop the row if any of the considered columns contain a null value.

    • 'all' : Drop the row only if all of the considered columns are null.

  • thresh (Optional[int]) – Not supported. Passing any value raises NotImplementedError.

  • subset (Union[str, Sequence[str], None]) – Column names to consider when checking for nulls. If None (default), all columns are considered. An empty list with how='any' keeps all rows; an empty list with how='all' drops all rows.

  • inplace (bool) – Must be False. True raises NotImplementedError.

  • ignore_index (bool) – Must be False. True raises NotImplementedError.

Returns:

A new TDS frame with rows containing nulls removed.

Return type:

PandasApiTdsFrame

Raises:
  • NotImplementedError – If axis=1, thresh is set, inplace=True, or ignore_index=True.

  • ValueError – If axis is not a recognised value or how is not 'any' or 'all'.

  • TypeError – If subset is not a list, tuple, or set.

  • KeyError – If any column in subset does not exist in the frame.

See also

fillna

Fill missing values instead of dropping rows.

Notes

Differences from pandas:

  • axis=1 (dropping columns with nulls) is not supported.

  • thresh (minimum number of non-null values to keep a row) is not supported.

  • inplace=True is not supported; a new frame is always returned.

  • ignore_index=True is not supported.

  • Passing an empty subset=[] with how='any' is a no-op (all rows are kept). With how='all', an empty subset=[] drops all rows (the filter becomes false).
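
The empty-subset edge case follows directly from how "any" and "all" behave on an empty collection, which a plain-Python sketch makes visible. The dropna function below is a hypothetical model over a list of dicts, with None standing in for null.

```python
def dropna(rows, how="any", subset=None):
    """Drop rows whose considered columns are null per the 'how' rule."""
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")
    test = any if how == "any" else all
    kept = []
    for row in rows:
        names = list(row) if subset is None else subset
        # any([]) is False (keep the row); all([]) is True (drop the row).
        if not test(row[name] is None for name in names):
            kept.append(row)
    return kept


rows = [
    {"a": 1, "b": None},
    {"a": None, "b": None},
    {"a": 2, "b": 3},
]
print(dropna(rows))                         # [{'a': 2, 'b': 3}]
print(len(dropna(rows, how="all")))         # 2
print(len(dropna(rows, subset=[])))         # 3: no-op
print(dropna(rows, how="all", subset=[]))   # []: every row dropped
```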

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Drop rows where any column is null
frame.dropna().head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes
3 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock
4 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices
# Drop rows where all columns are null
frame.dropna(how="all").head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes
3 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock
4 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices
# Only consider specific columns
frame.dropna(subset=["Ship Name"]).head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes
3 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock
4 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices

expanding

PandasApiTdsFrame.expanding(min_periods=1, axis=0, method=None, order_by=None, ascending=True)[source]

Create an expanding window frame for window-aggregate computations.

An expanding window includes all rows from the start of the partition up to the current row. This is useful for running totals, running averages, and similar cumulative calculations.

Parameters:
  • min_periods (int) – Minimum number of observations in the window required to produce a value; otherwise the result is null.

  • axis (Union[int, str]) – Only 0 / 'index' is supported.

  • method (Optional[str]) – Must be None or 'python'.

  • order_by (Union[str, Sequence[str], None]) – Column(s) to order by within the window. Required for deterministic results.

  • ascending (Union[bool, Sequence[bool]]) – Sort order for the order_by columns.

Returns:

A window frame on which window aggregates (sum, mean, min, max, etc.) can be called.

Return type:

PandasApiWindowTdsFrame

Raises:

NotImplementedError – If axis is not 0, method is not None or 'python', or min_periods is less than 1.

See also

rolling

Fixed-size sliding window.

groupby

Group rows before applying window functions.

Notes

Differences from pandas:

  • order_by and ascending are pylegend extensions not present in pandas. They control the ORDER BY clause inside the SQL OVER(...) window specification.

  • axis=1 is not supported.

  • method='table' is not supported.
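
An expanding sum is a running total over rows in order_by order, which can be sketched with itertools.accumulate. expanding_sum is a hypothetical helper; it assumes the values are already sorted by the order_by column and models min_periods by emitting None until enough rows have been seen.

```python
from itertools import accumulate


def expanding_sum(values, min_periods=1):
    """Running sum over rows already sorted by the order_by column."""
    totals = accumulate(values)
    return [
        total if position >= min_periods else None
        for position, total in enumerate(totals, start=1)
    ]


order_ids = [10248, 10249, 10250, 10251, 10252]
print(expanding_sum(order_ids))
# [10248, 20497, 30747, 40998, 51250]
print(expanding_sum(order_ids, min_periods=3))
# [None, None, 30747, 40998, 51250]
```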

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Running sum of Order Id ordered by Order Id
frame.filter(items=["Order Id"]).expanding(
    order_by="Order Id"
).aggregate("sum").head(5).to_pandas()
Order Id
0 10248
1 20497
2 30747
3 40998
4 51250

fillna

PandasApiTdsFrame.fillna(value=None, axis=0, inplace=False, limit=None)[source]

Fill missing values with a specified value.

Replace NA / null entries in the TDS frame. A scalar value is applied to every column; a dict maps specific columns to their fill values (columns not present in the dict are left unchanged). Implemented via COALESCE at the SQL level.

Parameters:
  • value (Union[int, float, str, bool, date, datetime, Dict[str, Union[int, float, str, bool, date, datetime]]]) – Value(s) to replace nulls with. Accepted scalar types are int, float, str, bool, date, and datetime. When a dict is provided, keys must be column name strings and values must be scalars of the above types. Columns in the dict that do not exist in the frame are silently ignored. Omitting value entirely raises ValueError.

  • axis (Union[str, int, None]) – Only 0 / 'index' is supported. 1 / 'columns' raises NotImplementedError.

  • inplace (bool) – Must be False. True raises NotImplementedError.

  • limit (Optional[int]) – Not supported. Passing any value raises NotImplementedError.

Returns:

A new TDS frame with null values replaced.

Return type:

PandasApiTdsFrame

Raises:
  • ValueError – If value is not provided. If axis is not a recognised value.

  • TypeError – If value is not a scalar or dict. If dict keys are not strings or dict values are not scalars.

  • NotImplementedError – If axis=1, inplace=True, or limit is set.

See also

dropna

Remove rows with missing values.

Notes

Differences from pandas:

  • The method parameter ('ffill', 'bfill') available in older pandas versions is not present.

  • inplace=True is not supported; a new frame is always returned.

  • limit (maximum number of consecutive nulls to fill) is not supported.

  • axis=1 (fill along columns) is not supported.
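
The COALESCE-style fill, including the per-column dict form, can be modelled in a few lines of plain Python. The fillna function below is a hypothetical sketch over a list of dicts with None as null; it is not pylegend's implementation.

```python
import datetime


def fillna(rows, value):
    """Replace None with the fill value; leave other values untouched."""
    filled = []
    for row in rows:
        new_row = {}
        for name, current in row.items():
            if isinstance(value, dict):
                # Columns absent from the dict keep their nulls;
                # dict keys naming unknown columns are simply never used.
                replacement = value.get(name)
            else:
                replacement = value
            new_row[name] = replacement if current is None else current
        filled.append(new_row)
    return filled


rows = [
    {"Shipped Date": None, "Ship Name": None},
    {"Shipped Date": datetime.date(1996, 7, 16), "Ship Name": "Hanari Carnes"},
]
result = fillna(rows, {"Shipped Date": datetime.date(1, 1, 1), "Missing": 0})
print(result[0])
```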

Examples

import datetime
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame = frame.sort_values("Shipped Date")
frame = frame.head()
# Inspect the selected rows; "Shipped Date" is null in each of them
frame.to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name
0 11059 1998-04-29 1998-06-10 NaT Ricardo Adocicados
1 11058 1998-04-29 1998-05-27 NaT Blauer See Delikatessen
2 11054 1998-04-28 1998-05-26 NaT Cactus Comidas para llevar
3 11051 1998-04-27 1998-05-25 NaT La maison d'Asie
4 11045 1998-04-23 1998-05-21 NaT Bottom-Dollar Markets
# Fill all null values of the "Shipped Date" column with a fixed date
frame = frame.fillna({
    "Shipped Date": datetime.date(1, 1, 1)
})
frame.to_pandas()

filter

PandasApiTdsFrame.filter(items=None, like=None, regex=None, axis=None)[source]

Select columns by label, substring match, or regular expression.

This method selects columns from the TDS frame based on their names. Exactly one of items, like, or regex must be provided; they are mutually exclusive.

Parameters:
  • items (Optional[List[str]]) – Exact column names to keep. All names must exist in the frame.

  • like (Optional[str]) – Keep columns whose names contain this substring.

  • regex (Optional[str]) – Keep columns whose names match this regular expression (uses re.search).

  • axis (Union[str, int, PyLegendInteger, None]) – The axis to filter on. Only column-axis filtering is supported. Defaults to 1 (columns) when omitted.

Returns:

A new TDS frame containing only the selected columns.

Return type:

PandasApiTdsFrame

Raises:
  • TypeError – If more than one of items, like, or regex is provided, or if none of them is provided. If items is a string instead of a list, or if like / regex is not a string.

  • ValueError – If axis is not 1 or 'columns'. If any name in items does not exist in the frame. If no columns match the like substring or regex pattern. If the regex pattern is invalid.

See also

assign

Add or overwrite columns.

drop

Remove columns by label.

rename

Rename columns.

Notes

Differences from pandas:

  • In pandas, filter supports both row-axis (axis=0) and column-axis (axis=1) filtering. Here, only column-axis filtering is supported (axis=1 or axis='columns'). Passing axis=0 or 'index' raises ValueError.

  • In pandas, items silently ignores names that do not exist in the frame. Here, all names must exist; unknown names raise a ValueError listing the missing and available columns.

  • In pandas, like and regex return an empty DataFrame when no columns match. Here, they raise ValueError when no columns match.
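The selection rules above can be modelled in a few lines of plain Python. This is only a sketch of the documented behaviour, not the library's implementation: select_columns is a hypothetical helper that works on a list of column names rather than a TDS frame, and the order of the result for items is an assumption.

```python
import re
from typing import List, Optional


def select_columns(
    columns: List[str],
    items: Optional[List[str]] = None,
    like: Optional[str] = None,
    regex: Optional[str] = None,
) -> List[str]:
    """Hypothetical stand-in that mimics the documented selection rules."""
    provided = [arg for arg in (items, like, regex) if arg is not None]
    if len(provided) != 1:
        raise TypeError("pass exactly one of items, like, or regex")

    if items is not None:
        if isinstance(items, str):
            raise TypeError("items must be a list, not a string")
        missing = [name for name in items if name not in columns]
        if missing:
            # Unlike pandas, unknown names are an error rather than ignored.
            raise ValueError(f"missing columns {missing}; available: {columns}")
        return list(items)  # assumption: result follows the order of items

    if like is not None:
        selected = [name for name in columns if like in name]
    else:
        try:
            pattern = re.compile(regex)
        except re.error as exc:
            raise ValueError(f"invalid regex: {exc}") from exc
        # re.search matches anywhere in the name, not only at the start.
        selected = [name for name in columns if pattern.search(name)]

    if not selected:
        # Unlike pandas, an empty match is an error rather than an empty frame.
        raise ValueError("no columns matched")
    return selected


columns = ["Order Id", "Order Date", "Required Date", "Shipped Date", "Ship Name"]
print(select_columns(columns, like="Ship"))
print(select_columns(columns, regex="Date$"))
```

Raising on an empty match means a typo in like or regex surfaces immediately instead of producing a frame with no columns.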

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Select specific columns by name
frame.filter(items=["Order Id", "Ship Name"]).head(3).to_pandas()
Order Id Ship Name
0 10248 Vins et alcools Chevalier
1 10249 Toms Spezialitäten
2 10250 Hanari Carnes
# Select columns whose names contain a substring
frame.filter(like="Ship").head(3).to_pandas()
Shipped Date Ship Name
0 1996-07-16 Vins et alcools Chevalier
1 1996-07-10 Toms Spezialitäten
2 1996-07-12 Hanari Carnes
# Select columns matching a regex pattern
frame.filter(regex="^Ship").head(3).to_pandas()
Shipped Date Ship Name
0 1996-07-16 Vins et alcools Chevalier
1 1996-07-10 Toms Spezialitäten
2 1996-07-12 Hanari Carnes
# Chain filters to progressively narrow columns
frame.filter(like="Ship").filter(regex="Name$").head(3).to_pandas()
Ship Name
0 Vins et alcools Chevalier
1 Toms Spezialitäten
2 Hanari Carnes

groupby

PandasApiTdsFrame.groupby(by, level=None, as_index=False, sort=True, group_keys=False, observed=False, dropna=False)[source]

Group the TDS frame by one or more columns.

Return a PandasApiGroupbyTdsFrame object that can be used to apply aggregation functions (sum, mean, min, max, std, var, count, or the general aggregate/agg) and OLAP window functions (rank) to each group. Column selection after grouping is supported via bracket notation (e.g. frame.groupby("A")["B"].sum()).

The groupby columns act as the PARTITION BY clause in the underlying SQL when window functions such as rank are used.

Parameters:
  • by (Union[str, List[str]]) – Column name or list of column names to group by. All names must exist in the current frame.

  • level (Union[str, int, List[str], None]) – Not supported. Passing any value raises NotImplementedError. Use by instead.

  • as_index (bool) – Must be False. Setting to True raises NotImplementedError.

  • sort (bool) – Whether to sort the result by the grouping columns after aggregation.

  • group_keys (bool) – Must be False. Setting to True raises NotImplementedError.

  • observed (bool) – Must be False. Setting to True raises NotImplementedError.

  • dropna (bool) – Must be False. Setting to True raises NotImplementedError.

Returns:

A groupby object on which aggregation and window methods can be called. See PandasApiGroupbyTdsFrame for the full list of available methods.

Return type:

PandasApiGroupbyTdsFrame

Raises:
  • NotImplementedError – If level, as_index=True, group_keys=True, observed=True, or dropna=True is provided.

  • TypeError – If by is not a string or list of strings.

  • ValueError – If by is an empty list.

  • KeyError – If any column in by does not exist in the frame.

See also

aggregate

Aggregate without grouping.

sum

Convenience shorthand for sum aggregation.

count

Convenience shorthand for count aggregation.

Notes

Differences from pandas:

  • as_index defaults to False and must be False. In pandas it defaults to True. This means the grouping columns always appear as regular columns in the result, never as the index.

  • group_keys, observed, and dropna must all be False; their True variants are not supported.

  • level (grouping by index level) is not supported.

  • The groupby object supports column selection via [col] (returns a GroupbySeries) or [[col1, col2]] (returns a narrowed PandasApiGroupbyTdsFrame), matching the pandas pattern frame.groupby(...)["col"].sum().

  • When sort=True (default), the result is sorted by the grouping columns in ascending order after aggregation.
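To make the as_index=False and sort=True points concrete, the following plain-Python sketch shows the shape of a grouped count. group_count is a hypothetical helper operating on a list of row dicts, not part of pylegend, and it leaves null handling out.

```python
from typing import Any, Dict, List


def group_count(rows: List[Dict[str, Any]], by: str, column: str) -> List[Dict[str, Any]]:
    """Hypothetical model of frame.groupby(by)[column].count() with sort=True."""
    counts: Dict[Any, int] = {}
    for row in rows:
        counts[row[by]] = counts.get(row[by], 0) + 1
    # as_index=False: the grouping key stays a regular column in every result row.
    # sort=True: result rows are ordered by the grouping key, ascending.
    return [{by: key, column: counts[key]} for key in sorted(counts)]


rows = [
    {"Ship Name": "Hanari Carnes", "Order Id": 10250},
    {"Ship Name": "Around the Horn", "Order Id": 10355},
    {"Ship Name": "Hanari Carnes", "Order Id": 10253},
]
for result_row in group_count(rows, "Ship Name", "Order Id"):
    print(result_row)
```

Because the key is an ordinary column, the result can be passed straight to further column-based operations without resetting an index.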

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Group by a single column and count
frame.groupby("Ship Name")["Order Id"].count().head(5).to_pandas()
# Group by a column and sum a numeric column
frame.groupby("Ship Name")["Order Id"].sum().head(5).to_pandas()
# Group by a column with dict-based aggregation
frame.groupby("Ship Name").agg({"Order Id": "count"}).head(5).to_pandas()

Note

The returned PandasApiGroupbyTdsFrame object has its own set of aggregation and window methods whose signatures may differ from the frame-level equivalents. See Pandas Groupby TDS Frame for the full API reference.

iloc

property PandasApiTdsFrame.iloc: PandasApiIlocIndexer

Purely integer-location based indexing for selection by position.

Access rows and columns by integer position (0-based). Returns a PandasApiIlocIndexer that supports [] notation.

Allowed inputs:

  • An integer — selects a single row (e.g. frame.iloc[5]).

  • A slice with ints — selects a range of rows (e.g. frame.iloc[1:7]). Only step=1 (or None) is supported.

  • A tuple of (rows, cols) — selects rows and columns simultaneously (e.g. frame.iloc[1:5, 0:2]). Each element can be an int or a slice.

Returns:

An indexer object supporting [] notation that returns a new PandasApiTdsFrame.

Return type:

PandasApiIlocIndexer

Raises:
  • IndexError – If more than two indexers are provided. If a column integer index is out of bounds.

  • NotImplementedError – If a slice step other than 1 is used for rows or columns. If a list, boolean array, or callable is used as an indexer.

See also

loc

Label-based indexing (row filtering + column selection).

head

Return the first n rows.

truncate

Select rows by index range.

filter

Select columns by name.

Notes

Differences from pandas:

  • Only int and slice indexers are supported. Lists of integers, boolean arrays, and callable indexers raise NotImplementedError.

  • Slice steps other than 1 are not supported.

  • Negative integer indexing for rows is handled via truncate, so it follows truncate’s limitations.

  • When a single integer row index exceeds the number of rows, an empty frame is returned (no IndexError is raised, unlike pandas).
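The accepted indexer shapes can be summarised as a small validator. This is a sketch under the rules listed above, written in plain Python: check_iloc_key is a hypothetical name, and bounds checking against a real frame is omitted.

```python
from typing import Any, Tuple


def check_iloc_key(key: Any) -> Tuple[Any, Any]:
    """Hypothetical validator; returns the (rows, cols) pair an indexer describes."""
    parts = key if isinstance(key, tuple) else (key,)
    if len(parts) > 2:
        raise IndexError("too many indexers: at most (rows, cols) is allowed")
    for part in parts:
        if not isinstance(part, (int, slice)):
            # Lists, boolean arrays, and callables are rejected.
            raise NotImplementedError(f"unsupported indexer: {type(part).__name__}")
        if isinstance(part, slice) and part.step not in (None, 1):
            raise NotImplementedError("only slices with step 1 are supported")
    rows = parts[0] if len(parts) >= 1 else slice(None)
    cols = parts[1] if len(parts) == 2 else slice(None)  # default: all columns
    return rows, cols


print(check_iloc_key(5))
print(check_iloc_key((slice(1, 5), slice(0, 2))))
```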

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Select a single row
frame.iloc[0].to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier
# Select a range of rows and columns
frame.iloc[1:4, 0:2].to_pandas()
Order Id Order Date
0 10249 1996-07-05
1 10250 1996-07-08
2 10251 1996-07-08
# Select a single row, all columns
frame.iloc[2, :].to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name
0 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes

info

PandasApiTdsFrame.info(verbose=None, buf=None, max_cols=None, memory_usage=None, show_counts=None)[source]

Print a concise summary of the TDS frame.

Displays the column names and their data types. This is a lightweight alternative to running a query — it uses only the metadata already available on the frame.

Parameters:
  • verbose (Optional[bool]) – Not supported. Ignored.

  • buf (Union[IO[str], StringIO, None]) – Not supported. Output always goes to stdout.

  • max_cols (Optional[int]) – Not supported. Ignored.

  • memory_usage (Union[bool, str, None]) – Not supported. Ignored.

  • show_counts (Optional[bool]) – Not supported. Ignored.

Returns:

Prints to stdout; returns nothing.

Return type:

None

Notes

Differences from pandas:

  • Only column names and types are shown.

  • memory_usage, verbose, buf, max_cols, and show_counts are accepted for API compatibility but ignored.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.info()
<class 'pylegend.extensions.tds.pandas_api.frames.pandas_api_legend_service_input_frame.PandasApiLegendServiceInputFrame'>
RangeIndex: 830 entries
Data columns (total 5 columns):
#  Column         Non-Null Count  Dtype     
-  -------------  --------------  ----------
0  Order Id       830 non-null    Integer   
1  Order Date     830 non-null    StrictDate
2  Required Date  830 non-null    StrictDate
3  Shipped Date   809 non-null    StrictDate
4  Ship Name      830 non-null    String    
dtypes: Integer(1), StrictDate(3), String(1)

join

PandasApiTdsFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False, validate=None)[source]

Join this TDS frame with another on shared column(s).

Convenience method that delegates to merge(). The lsuffix and rsuffix parameters are mapped to the suffixes parameter of merge, and on is passed directly.

Parameters:
  • other (PandasApiTdsFrame) – The right TDS frame to join with.

  • on (Union[str, Sequence[str], None]) – Column name(s) to join on. Must exist in both frames. Unlike pandas join, this parameter specifies column names, not index labels.

  • how (Optional[str]) – Type of join. See merge() for details.

  • lsuffix (str) – Suffix to apply to overlapping column names from the left frame.

  • rsuffix (str) – Suffix to apply to overlapping column names from the right frame.

  • sort (Optional[bool]) – If True, sort the result by the join keys.

  • validate (Optional[str]) – Not supported. Passing any value raises NotImplementedError.

Returns:

A new TDS frame containing the joined result.

Return type:

PandasApiTdsFrame

Raises:
  • ValueError – If overlapping column names exist and lsuffix / rsuffix do not resolve the conflict.

  • NotImplementedError – If validate is set.

See also

merge

The underlying merge method with full parameter control.

Notes

Differences from pandas:

  • In pandas, DataFrame.join joins on the index by default, optionally using on to specify a column in the left frame to match against the right frame’s index. Here, join is purely column-on-column and delegates directly to merge(on=on). There is no index-based joining.

  • The lsuffix and rsuffix parameters correspond to suffixes=(lsuffix, rsuffix) in merge. As in pandas, both default to empty strings, so overlapping column names raise an error unless a suffix is supplied.

  • Because this delegates to merge, all limitations of merge apply: no self-join, no left_index / right_index, no indicator, and no validate.
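How the suffixes shape the output columns can be sketched in plain Python. joined_column_names is a hypothetical helper for a single join key, written to reproduce the column layout of a join on one shared column; it is not pylegend code.

```python
from typing import List


def joined_column_names(
    left: List[str],
    right: List[str],
    on: str,
    lsuffix: str = "",
    rsuffix: str = "",
) -> List[str]:
    """Hypothetical model of the output column names of left.join(right, on=on)."""
    # Non-key columns present in both frames need a suffix to stay unique.
    overlap = (set(left) & set(right)) - {on}
    names = [name + lsuffix if name in overlap else name for name in left]
    # The join key appears once, so it is skipped on the right side.
    names += [name + rsuffix if name in overlap else name for name in right if name != on]
    if len(set(names)) != len(names):
        raise ValueError("overlapping column names remain; set lsuffix / rsuffix")
    return names


left = ["Order Id", "Order Date", "Required Date", "Shipped Date", "Ship Name"]
right = ["Right Order Id", "Order Date", "Required Date", "Shipped Date", "Ship Name"]
print(joined_column_names(left, right, on="Ship Name", lsuffix="_left", rsuffix="_right"))
```

With both suffixes left at their empty defaults, the same call raises ValueError because the date columns collide.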

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Create a second frame with renamed columns
frame2 = pylegend.samples.pandas_api.northwind_orders_frame()
frame2 = frame2.rename({"Order Id": "Right Order Id"})
# Left join on a common key
frame.head(5).join(
    frame2.head(5),
    on="Ship Name",
    how="left",
    lsuffix="_left",
    rsuffix="_right"
).to_pandas()
Order Id Order Date_left Required Date_left Shipped Date_left Ship Name Right Order Id Order Date_right Required Date_right Shipped Date_right
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier 10248 1996-07-04 1996-08-01 1996-07-16
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten 10249 1996-07-05 1996-08-16 1996-07-10
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes 10250 1996-07-08 1996-08-05 1996-07-12
3 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock 10251 1996-07-08 1996-08-05 1996-07-15
4 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices 10252 1996-07-09 1996-08-06 1996-07-11

loc

property PandasApiTdsFrame.loc: PandasApiLocIndexer

Access rows and columns by label-based indexing or boolean conditions.

Returns a PandasApiLocIndexer that supports [] notation for combined row filtering and column selection.

Row selection (first indexer):

  • Complete slice (:): Select all rows.

  • Boolean expression: A PyLegendBoolean expression built from column comparisons (e.g. frame['col'] > 5), used as a WHERE filter.

  • Callable: A function that receives the frame and returns a PyLegendBoolean expression (e.g. lambda x: x['col'] > 5).

Column selection (second indexer):

  • str: A single column name (e.g. 'col1').

  • list of str: Multiple column names.

  • list of bool: Boolean mask over columns (must match the number of columns exactly).

  • slice of str: Label-based column slice (e.g. 'col1':'col3'), inclusive on both ends.

  • Complete slice (:): Select all columns.

Returns:

An indexer object supporting [] notation that returns a new PandasApiTdsFrame.

Return type:

PandasApiLocIndexer

Raises:
  • IndexError – If more than two indexers are provided. If a boolean column mask has the wrong length.

  • TypeError – If a label-based slice is used for rows (only : is allowed). If a list of integers, a set, or another unsupported type is used for row or column selection.

  • KeyError – If a column name in a list does not exist in the frame.

See also

iloc

Integer-position based indexing.

filter

Select columns by name.

head

Return the first n rows.

Notes

Differences from pandas:

  • For row selection, only :, boolean expressions, and callables are supported. Integer label selection, integer slicing, and list-of-integer selection are not supported.

  • Label-based row slicing (e.g. frame.loc[2:5]) is not supported — only the complete slice : is allowed.

  • For column selection, string labels, lists of strings, boolean masks, and label-based slices are supported. Label slices use pandas.Index.slice_indexer internally, so slice semantics are inclusive on both ends (matching pandas loc behaviour).

  • If a label-based column slice resolves to an empty selection, an empty frame (zero rows) is returned via head(0).
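The column-selection forms, including the inclusive label slice, can be illustrated with a plain-Python resolver. This is a sketch only: resolve_loc_columns and _position are hypothetical names, and list positions are used here in place of pandas.Index.slice_indexer.

```python
from typing import List, Union

ColumnKey = Union[str, slice, List[str], List[bool]]


def _position(columns: List[str], label: str) -> int:
    """Return the position of a label, raising KeyError if it is unknown."""
    if label not in columns:
        raise KeyError(label)
    return columns.index(label)


def resolve_loc_columns(columns: List[str], key: ColumnKey) -> List[str]:
    """Hypothetical model of the second loc indexer (column selection)."""
    if isinstance(key, str):
        return [columns[_position(columns, key)]]
    if isinstance(key, slice):
        start = 0 if key.start is None else _position(columns, key.start)
        stop = len(columns) - 1 if key.stop is None else _position(columns, key.stop)
        return columns[start:stop + 1]  # the stop label is included
    if isinstance(key, list) and key and all(isinstance(k, bool) for k in key):
        if len(key) != len(columns):
            raise IndexError("boolean mask must have one entry per column")
        return [name for name, keep in zip(columns, key) if keep]
    if isinstance(key, list) and all(isinstance(k, str) for k in key):
        return [columns[_position(columns, k)] for k in key]
    raise TypeError(f"unsupported column selector: {key!r}")


columns = ["Order Id", "Order Date", "Required Date", "Shipped Date", "Ship Name"]
print(resolve_loc_columns(columns, slice("Order Date", "Shipped Date")))
print(resolve_loc_columns(columns, [True, False, False, False, True]))
```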

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Select specific columns
frame.loc[:, "Ship Name"].head(3).to_pandas()
Ship Name
0 Vins et alcools Chevalier
1 Toms Spezialitäten
2 Hanari Carnes
# Filter rows with a boolean condition and select columns
frame.loc[frame["Order Id"] > 10300, ["Order Id", "Ship Name"]].head(5).to_pandas()
Order Id Ship Name
0 10301 Die Wandernde Kuh
1 10302 Suprêmes délices
2 10303 Godos Cocina Típica
3 10304 Tortuga Restaurante
4 10305 Old World Delicatessen
# Filter rows with a callable
frame.loc[
    lambda x: x["Ship Name"].startswith("A"),
    ["Order Id", "Ship Name"]
].head(5).to_pandas()
Order Id Ship Name
0 10308 Ana Trujillo Emparedados y helados
1 10355 Around the Horn
2 10365 Antonio Moreno Taquería
3 10383 Around the Horn
4 10453 Around the Horn
# Boolean column mask (one entry per column; keeps "Order Id" and "Ship Name")
frame.loc[:, [True, False, False, False, True]].head(3).to_pandas()

max

PandasApiTdsFrame.max(axis=0, skipna=True, numeric_only=False, **kwargs)[source]

Compute the maximum value of each column.

Convenience method equivalent to aggregate('max'). Returns a single-row TDS frame with the maximum value of every column. For string columns, returns the lexicographically largest value.

Parameters:
  • axis (Union[int, str]) – Only 0 / 'index' is supported.

  • skipna (bool) – Must be True. False is not supported.

  • numeric_only (bool) – Must be False. True is not supported.

  • **kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported.

Returns:

A single-row TDS frame with column maximums.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If any parameter is set to an unsupported value.

See also

aggregate

General aggregation method.

min

Compute column minimums.

Notes

Internally delegates to aggregate('max'). The same pandas deviations as sum() apply (skipna=False, numeric_only=True, axis=1 are not supported).

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Maximum of each column
frame.filter(items=["Order Id"]).max().to_pandas()
Order Id
0 11077

mean

PandasApiTdsFrame.mean(axis=0, skipna=True, numeric_only=False, **kwargs)[source]

Compute the mean of each column.

Convenience method equivalent to aggregate('mean'). Returns a single-row TDS frame with the arithmetic mean of every column.

Parameters:
  • axis (Union[int, str]) – Only 0 / 'index' is supported.

  • skipna (bool) – Must be True. False is not supported.

  • numeric_only (bool) – Must be False. True is not supported.

  • **kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported.

Returns:

A single-row TDS frame with column means.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If any parameter is set to an unsupported value.

See also

aggregate

General aggregation method.

sum

Compute column sums.

std

Compute column standard deviations.

Notes

Internally delegates to aggregate('mean'). The same pandas deviations as sum() apply (skipna=False, numeric_only=True, axis=1 are not supported).

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Mean of numeric columns
frame.filter(items=["Order Id"]).mean().to_pandas()
Order Id
0 10662.5

merge

PandasApiTdsFrame.merge(other, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), indicator=False, validate=None)[source]

Merge this TDS frame with another using a database-style join.

Combine two frames column-wise based on common columns or explicit key specifications. Supports inner, left, right, outer (full), and cross joins.

Parameters:
  • other (PandasApiTdsFrame) – The right TDS frame to merge with. Must be a different frame instance; merging a frame with itself raises NotImplementedError.

  • how (Optional[str]) –

    Type of merge:

    • 'inner' : Only rows with matching keys in both frames.

    • 'left' : All rows from the left frame, NaN-filled for non-matching right rows.

    • 'right' : All rows from the right frame, NaN-filled for non-matching left rows.

    • 'outer' : All rows from both frames (FULL OUTER JOIN).

    • 'cross' : Cartesian product of both frames. No join keys may be specified.

  • on (Union[str, Sequence[str], None]) – Column name(s) to join on. Must exist in both frames. Mutually exclusive with left_on / right_on.

  • left_on (Union[str, Sequence[str], None]) – Column name(s) from the left frame to join on.

  • right_on (Union[str, Sequence[str], None]) – Column name(s) from the right frame to join on. Must have the same length as left_on.

  • left_index (Optional[bool]) – Not supported. Setting to True raises NotImplementedError.

  • right_index (Optional[bool]) – Not supported. Setting to True raises NotImplementedError.

  • sort (Optional[bool]) – If True, sort the result by the join keys in ascending order.

  • suffixes (Union[Tuple[Optional[str], Optional[str]], List[Optional[str]], None]) – Suffixes to apply to overlapping non-key column names from the left and right frames respectively. Use None to indicate that the column name from the respective frame should be left as-is (will raise if this causes duplicates).

  • indicator (Union[bool, str, None]) – Not supported. Setting to a truthy value raises NotImplementedError.

  • validate (Optional[str]) – Not supported. Passing any value raises NotImplementedError.

Returns:

A new TDS frame containing the merged result.

Return type:

PandasApiTdsFrame

Raises:
  • TypeError – If other is not a PandasApiTdsFrame. If how, on, left_on, right_on, suffixes, or sort have invalid types.

  • ValueError – If both on and left_on/right_on are specified. If left_on and right_on have different lengths. If no merge keys can be resolved and how is not 'cross'. If how='cross' is used with on/left_on/right_on. If how is not a recognised join method. If the resulting columns contain duplicates after suffix application.

  • KeyError – If a key specified in on, left_on, or right_on does not exist in the corresponding frame.

  • NotImplementedError – If left_index=True, right_index=True, indicator is truthy, validate is set, or the frame is merged with itself.

See also

join

Convenience wrapper around merge with simpler syntax.

Notes

Differences from pandas:

  • Self-merge is not supported. Merging a frame with itself raises NotImplementedError.

  • Index-based merging is not supported. left_index and right_index must be False.

  • The indicator and validate parameters are not supported.

  • When no join keys are provided (and how is not 'cross'), the merge infers keys from the intersection of column names between the two frames. If no common columns exist, a ValueError is raised (unlike pandas, which would raise a MergeError).

  • how='outer' maps to a FULL OUTER JOIN at the SQL level.

  • how='cross' is implemented as a CROSS JOIN in SQL, but mapped to JoinKind.INNER with a 1==1 condition in the PURE query representation.
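The key-resolution rules described above (explicit keys, inferred keys, and the cross-join restriction) can be sketched as follows. resolve_merge_keys is a hypothetical plain-Python helper over column-name lists; it models the documented checks and is not the library's code.

```python
from typing import List, Optional, Sequence, Tuple, Union

Keys = Optional[Union[str, Sequence[str]]]


def _as_list(value: Keys) -> List[str]:
    """Normalise None, a single name, or a sequence of names to a list."""
    if value is None:
        return []
    return [value] if isinstance(value, str) else list(value)


def resolve_merge_keys(
    left_cols: List[str],
    right_cols: List[str],
    how: str = "inner",
    on: Keys = None,
    left_on: Keys = None,
    right_on: Keys = None,
) -> Tuple[List[str], List[str]]:
    """Hypothetical model returning the (left keys, right keys) of a merge."""
    on_keys, left_keys, right_keys = _as_list(on), _as_list(left_on), _as_list(right_on)
    if how == "cross":
        if on_keys or left_keys or right_keys:
            raise ValueError("how='cross' does not accept join keys")
        return [], []
    if on_keys and (left_keys or right_keys):
        raise ValueError("pass either on, or left_on / right_on, not both")
    if on_keys:
        left_keys = right_keys = on_keys
    elif not left_keys and not right_keys:
        # No keys given: infer them from the columns common to both frames.
        left_keys = right_keys = [name for name in left_cols if name in right_cols]
        if not left_keys:
            raise ValueError("no common columns to merge on")
    if len(left_keys) != len(right_keys):
        raise ValueError("left_on and right_on must have the same length")
    for key in left_keys:
        if key not in left_cols:
            raise KeyError(key)
    for key in right_keys:
        if key not in right_cols:
            raise KeyError(key)
    return left_keys, right_keys


left = ["Order Id", "Order Date", "Ship Name"]
right = ["Right Order Id", "Order Date", "Ship Name"]
print(resolve_merge_keys(left, right))
print(resolve_merge_keys(left, right, left_on="Order Id", right_on="Right Order Id"))
```

Note that inferred keys include every shared column, so passing on, or left_on and right_on, explicitly is the safer choice when the frames share descriptive columns.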

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Create a second frame for joining
frame2 = pylegend.samples.pandas_api.northwind_orders_frame()
frame2 = frame2.rename({"Order Id": "Right Order Id"})
# Inner merge on key columns that have different names in each frame
frame.head(5).merge(
    frame2.head(5),
    how="inner",
    left_on="Order Id",
    right_on="Right Order Id"
).to_pandas()
Order Id Order Date_x Required Date_x Shipped Date_x Ship Name_x Right Order Id Order Date_y Required Date_y Shipped Date_y Ship Name_y
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes
3 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock
4 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices

min

PandasApiTdsFrame.min(axis=0, skipna=True, numeric_only=False, **kwargs)[source]

Compute the minimum value of each column.

Convenience method equivalent to aggregate('min'). Returns a single-row TDS frame with the minimum value of every column. For string columns, returns the lexicographically smallest value.

Parameters:
  • axis (Union[int, str]) – Only 0 / 'index' is supported.

  • skipna (bool) – Must be True. False is not supported.

  • numeric_only (bool) – Must be False. True is not supported.

  • **kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported.

Returns:

A single-row TDS frame with column minimums.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If any parameter is set to an unsupported value.

See also

aggregate

General aggregation method.

max

Compute column maximums.

Notes

Internally delegates to aggregate('min'). The same pandas deviations as sum() apply (skipna=False, numeric_only=True, axis=1 are not supported).

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Minimum of each column
frame.filter(items=["Order Id"]).min().to_pandas()
Order Id
0 10248

ntile_legend_ext

PandasApiTdsFrame.ntile_legend_ext(num_buckets, ascending=True)[source]

Assign rows to numbered buckets for each column.

PyLegend extension — not present in pandas.

Maps to SQL NTILE(n) OVER (ORDER BY col) and Pure ntile.

Parameters:
  • num_buckets (int) – Number of buckets to distribute rows into.

  • ascending (bool) – Whether to order in ascending direction.

Returns:

A new TDS frame with integer bucket numbers (1-based) replacing every column.

Return type:

PandasApiTdsFrame

See also

rank

Compute column ranks.

cume_dist_legend_ext

Cumulative distribution.

Notes

Differences from pandas:

  • This method has no pandas equivalent. NTILE is exposed as a pylegend extension.
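For readers unfamiliar with NTILE, the following plain-Python sketch follows the standard SQL definition: rows are ordered, then split into num_buckets groups whose sizes differ by at most one, with the larger groups first. ntile here is a hypothetical helper over a list of values, and null handling is not modelled.

```python
from typing import List, Sequence


def ntile(values: Sequence, num_buckets: int, ascending: bool = True) -> List[int]:
    """Hypothetical model of NTILE(num_buckets) OVER (ORDER BY value)."""
    n = len(values)
    # Row positions in the order the window function would visit them.
    order = sorted(range(n), key=lambda i: values[i], reverse=not ascending)
    base, extra = divmod(n, num_buckets)
    buckets = [0] * n
    position = 0
    for bucket in range(1, num_buckets + 1):
        # The first `extra` buckets receive one additional row.
        size = base + (1 if bucket <= extra else 0)
        for i in order[position:position + size]:
            buckets[i] = bucket
        position += size
    return buckets


print(ntile([10, 20, 30, 40, 50], 2))
print(ntile([10, 20, 30, 40, 50], 2, ascending=False))
```

With 830 rows and 4 buckets, this gives bucket sizes of 208, 208, 207, and 207, which is why the first rows of an ascending column all land in bucket 1.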

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame.filter(
    items=["Order Id"]
).ntile_legend_ext(4).head(5).to_pandas()
Order Id
0 1
1 1
2 1
3 1
4 1

range_between

PandasApiTdsFrame.range_between(start=None, end=None, *, duration_start=None, duration_start_unit=None, duration_end=None, duration_end_unit=None)[source]

Create a RANGE BETWEEN window-frame specification.

PyLegend extension — not present in pandas.

Supports two calling styles:

Simple numeric bounds (same sign convention as rows_between()):

range_between(start=-100, end=0)
# → RANGE BETWEEN 100 PRECEDING AND CURRENT ROW

Duration-based bounds (for date/time ORDER BY columns):

range_between(
    duration_start=-1, duration_start_unit="DAYS",
    duration_end=1, duration_end_unit="MONTHS",
)

Parameters:
  • start (Union[int, float, Decimal, None]) – Lower bound of the range. None means unbounded preceding.

  • end (Union[int, float, Decimal, None]) – Upper bound of the range. None means unbounded following.

  • duration_start (Union[int, float, Decimal, str, None]) – Duration-based lower bound. Pass "unbounded" for unbounded preceding.

  • duration_start_unit (Optional[str]) – Time unit for duration_start (e.g. "DAYS", "MONTHS").

  • duration_end (Union[int, float, Decimal, str, None]) – Duration-based upper bound.

  • duration_end_unit (Optional[str]) – Time unit for duration_end.

Returns:

A frame specification to pass to window_frame_legend_ext().

Return type:

RangeBetween

Raises:

ValueError – If positional bounds and duration bounds are mixed, or if start is greater than end.

See also

rows_between

Create a ROWS BETWEEN specification.

window_frame_legend_ext

Apply a custom window specification.

Notes

Differences from pandas:

  • This method has no pandas equivalent. It is a pylegend extension for constructing SQL RANGE BETWEEN clauses.
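The sign convention for numeric bounds can be sketched as a function that renders the clause text. This is an illustration only: range_between_sql and bound_sql are hypothetical names, and the rendering of positive bounds as FOLLOWING is an assumption extrapolated from the single example above (negative means PRECEDING, zero means CURRENT ROW).

```python
from typing import Optional, Union

Number = Union[int, float]


def bound_sql(value: Optional[Number], is_start: bool) -> str:
    """Render one bound; None means unbounded on that side."""
    if value is None:
        return "UNBOUNDED PRECEDING" if is_start else "UNBOUNDED FOLLOWING"
    if value == 0:
        return "CURRENT ROW"
    # Assumption: negative offsets look backwards, positive offsets look forwards.
    return f"{abs(value)} PRECEDING" if value < 0 else f"{value} FOLLOWING"


def range_between_sql(start: Optional[Number] = None, end: Optional[Number] = None) -> str:
    """Hypothetical rendering of range_between(start, end) as a SQL frame clause."""
    if start is not None and end is not None and start > end:
        raise ValueError("start must not be greater than end")
    return f"RANGE BETWEEN {bound_sql(start, True)} AND {bound_sql(end, False)}"


print(range_between_sql(-100, 0))
print(range_between_sql(None, 0))
```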

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Numeric range: 100 preceding to current row
spec = frame.range_between(-100, 0)

rank

PandasApiTdsFrame.rank(axis=0, method='min', numeric_only=False, na_option='bottom', ascending=True, pct=False)[source]

Compute the rank of values in each column.

Replace every column’s values with their rank within that column. Each column is ranked independently using an SQL window function (RANK, DENSE_RANK, ROW_NUMBER, or PERCENT_RANK).

The result is a new frame with the same column names but all values replaced by their integer (or float when pct=True) rank.

Parameters:
  • axis (Union[int, str]) – Only 0 / 'index' is supported. 1 raises NotImplementedError.

  • method (str) –

    How to rank equal values:

    • 'min' : Lowest rank in the group of ties (SQL RANK()).

    • 'first' : Ranks assigned in order of appearance (SQL ROW_NUMBER()).

    • 'dense' : Like 'min' but ranks always increase by 1, no gaps (SQL DENSE_RANK()).

  • numeric_only (bool) – If True, only rank columns of numeric type (Integer, Float, Number). Non-numeric columns are excluded from the result.

  • na_option (str) – How to rank null values. Only 'bottom' is supported. 'keep' and 'top' raise NotImplementedError.

  • ascending (bool) – Whether to rank in ascending order. False ranks in descending order.

  • pct (bool) – If True, compute percentage ranks (SQL PERCENT_RANK()). Result columns are of float type. Can only be used with method='min'.

Returns:

A new TDS frame where every column contains integer ranks (or float when pct=True).

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If axis is not 0 or 'index'. If method is not one of 'min', 'first', 'dense' (e.g. 'average' and 'max' are not supported). If na_option is not 'bottom'. If pct=True with a method other than 'min'.

See also

PandasApiGroupbyTdsFrame.rank

Rank within groups.

sort_values

Sort the frame by column values.

Notes

Differences from pandas:

  • The 'average' and 'max' ranking methods are not supported. Only 'min', 'first', and 'dense' are available.

  • na_option only supports 'bottom'. 'keep' and 'top' raise NotImplementedError.

  • pct=True is only supported with method='min' (maps to PERCENT_RANK()). Combining pct=True with other methods raises NotImplementedError.

  • When applied to the full frame (not via a Series), all columns are replaced by their ranks. To append a rank column instead, use bracket assignment on a single-column Series: frame["rank_col"] = frame["col"].rank().

  • Combining multiple rank calls in a single expression is not supported (e.g. frame["col1"].rank() + frame["col2"].rank()). Compute them in separate assignment steps instead.
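The difference between the three supported methods, and the meaning of pct=True, is easiest to see on a column with ties. The sketch below is plain Python following the standard SQL definitions of RANK, ROW_NUMBER, DENSE_RANK, and PERCENT_RANK; rank_values is a hypothetical helper, and null ordering (na_option) is not modelled.

```python
from typing import List, Sequence, Union


def rank_values(
    values: Sequence,
    method: str = "min",
    ascending: bool = True,
    pct: bool = False,
) -> List[Union[int, float]]:
    """Hypothetical model of ranking one column with the supported methods."""
    if method not in ("min", "first", "dense"):
        raise NotImplementedError(f"method {method!r} is not supported")
    if pct and method != "min":
        raise NotImplementedError("pct=True requires method='min'")
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=not ascending)
    ranks: List[Union[int, float]] = [0] * n
    rank_min = 0
    rank_dense = 0
    for position, i in enumerate(order, start=1):
        # A new value starts a new tie group.
        if position == 1 or values[i] != values[order[position - 2]]:
            rank_min = position   # RANK(): gaps after ties
            rank_dense += 1       # DENSE_RANK(): no gaps
        ranks[i] = {"min": rank_min, "first": position, "dense": rank_dense}[method]
    if pct:
        # PERCENT_RANK(): (rank - 1) / (rows - 1), defined as 0 for a single row.
        return [(r - 1) / (n - 1) if n > 1 else 0.0 for r in ranks]
    return ranks


values = [10, 20, 20, 30]
print(rank_values(values, method="min"))
print(rank_values(values, method="first"))
print(rank_values(values, method="dense"))
print(rank_values(values, pct=True))
```

Because PERCENT_RANK starts at 0 rather than at 1/n, pct=True here does not produce the same numbers as pandas rank(pct=True).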

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Rank all columns (replaces values with ranks)
frame.filter(items=["Order Id"]).rank().head(5).to_pandas()
Order Id
0 1
1 2
2 3
3 4
4 5
# Append a percentage rank column via Series assignment
frame["Order Rank"] = frame["Order Id"].rank(pct=True)
frame.head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name Order Rank
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier 0.0
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten 0.001206
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes 0.002413
3 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock 0.003619
4 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices 0.004825

rename

PandasApiTdsFrame.rename(mapper=None, index=None, columns=None, axis=1, inplace=False, copy=True, level=None, errors='ignore')[source]

Rename columns of the TDS frame.

Alter column labels using a mapping (dict) or a callable function applied to each column name.

Parameters:
  • mapper (Union[Dict[str, str], Callable[[str], str], None]) – Mapping of old column names to new column names, or a callable that transforms each column name (e.g. str.upper). Used when axis=1 (columns). Cannot be specified together with columns.

  • index (Union[Dict[str, str], Callable[[str], str], None]) – Not supported. Passing any value raises NotImplementedError.

  • columns (Union[Dict[str, str], Callable[[str], str], None]) – Alternative to mapper for renaming columns. Specifying columns together with mapper and axis raises ValueError.

  • axis (Union[str, int]) – Axis to target. Only 1 / 'columns' is supported. 0 / 'index' raises NotImplementedError.

  • inplace (bool) – Must be False. True raises NotImplementedError.

  • copy (bool) – Must be True. False raises NotImplementedError.

  • level (Union[str, int, None]) – Not supported. Passing any value raises NotImplementedError.

  • errors (str) – If 'raise', raise a KeyError when a key in the mapping does not exist as a column name. If 'ignore', silently skip non-existent keys.

Returns:

A new TDS frame with renamed columns.

Return type:

PandasApiTdsFrame

Raises:
  • TypeError – If mapper or columns is not a dict or callable. If copy or inplace is not a bool.

  • ValueError – If both mapper (with axis) and columns/index are specified simultaneously. If axis is not a supported value. If errors is not 'ignore' or 'raise'. If the rename produces duplicate column names.

  • KeyError – If errors='raise' and a key in the mapping does not exist in the frame’s columns.

  • NotImplementedError – If axis=0/'index', index is set, level is set, copy=False, or inplace=True.

See also

filter

Select columns by name.

drop

Remove columns.

assign

Add or overwrite columns.

Notes

Differences from pandas:

  • Only column renaming is supported (axis=1). Index renaming (axis=0) raises NotImplementedError.

  • inplace=True is not supported; a new frame is always returned.

  • copy=False is not supported.

  • level (multi-level index) is not supported.

  • The index parameter is not supported.

  • When using a callable, it is applied to every column name (e.g. str.upper will uppercase all column names).

  • If errors='ignore' (the default), keys in the mapping that do not match any column are silently ignored, matching pandas behaviour.
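The mapping, callable, and errors rules can be modelled on a plain list of column names. rename_columns is a hypothetical helper written for illustration; it mirrors the documented checks but is not the library's implementation.

```python
from typing import Callable, Dict, List, Union

Mapper = Union[Dict[str, str], Callable[[str], str]]


def rename_columns(columns: List[str], mapper: Mapper, errors: str = "ignore") -> List[str]:
    """Hypothetical model of the column names produced by rename."""
    if errors not in ("ignore", "raise"):
        raise ValueError("errors must be 'ignore' or 'raise'")
    if callable(mapper):
        # A callable is applied to every column name.
        renamed = [mapper(name) for name in columns]
    elif isinstance(mapper, dict):
        unknown = [key for key in mapper if key not in columns]
        if unknown and errors == "raise":
            raise KeyError(unknown)
        # With errors='ignore', unknown keys simply never match a column.
        renamed = [mapper.get(name, name) for name in columns]
    else:
        raise TypeError("mapper must be a dict or a callable")
    if len(set(renamed)) != len(renamed):
        raise ValueError("rename would produce duplicate column names")
    return renamed


columns = ["Order Id", "Order Date", "Ship Name"]
print(rename_columns(columns, {"Order Id": "OrderId", "Unknown": "x"}))
print(rename_columns(columns, str.upper))
```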

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Rename with a dict
frame.rename({"Order Id": "OrderId", "Ship Name": "ShipName"}).head(3).to_pandas()
OrderId Order Date Required Date Shipped Date ShipName
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes
# Rename with a callable
frame.rename(str.upper).head(3).to_pandas()
ORDER ID ORDER DATE REQUIRED DATE SHIPPED DATE SHIP NAME
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes
# Rename via the columns parameter
frame.rename(columns={"Order Id": "order_id"}).head(3).to_pandas()
order_id Order Date Required Date Shipped Date Ship Name
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes
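
The renaming rules documented above (dict or callable mapper, the errors switch, and duplicate detection) can be modelled in plain Python. This is only a minimal sketch that operates on a list of column names; rename_columns is a hypothetical helper written for illustration and is not part of pylegend.

```python
from typing import Callable, Dict, List, Union

Mapper = Union[Dict[str, str], Callable[[str], str]]


def rename_columns(columns: List[str], mapper: Mapper, errors: str = "ignore") -> List[str]:
    """Hypothetical helper modelling the documented column-rename rules."""
    if errors not in ("ignore", "raise"):
        raise ValueError("errors must be 'ignore' or 'raise'")
    if callable(mapper):
        # A callable is applied to every column name.
        renamed = [mapper(name) for name in columns]
    elif isinstance(mapper, dict):
        missing = [key for key in mapper if key not in columns]
        if missing and errors == "raise":
            raise KeyError(f"columns not found: {missing}")
        # Keys that match no column are skipped when errors='ignore'.
        renamed = [mapper.get(name, name) for name in columns]
    else:
        raise TypeError("mapper must be a dict or a callable")
    if len(set(renamed)) != len(renamed):
        raise ValueError("rename produces duplicate column names")
    return renamed


cols = ["Order Id", "Order Date", "Ship Name"]
print(rename_columns(cols, {"Order Id": "OrderId", "Missing": "x"}))
print(rename_columns(cols, str.upper))
```

With errors='ignore' the unmatched key "Missing" is dropped silently; the same call with errors='raise' raises KeyError in this sketch.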

rolling

PandasApiTdsFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None, step=None, method=None, order_by=None, ascending=True)[source]

Create a fixed-size sliding window frame for window-aggregate computations.

A rolling window includes a fixed number of preceding rows (and optionally the current row) for each row, enabling moving averages, moving sums, and similar calculations.

Parameters:
  • window (int) – Size of the moving window (number of rows).

  • min_periods (Optional[int]) – Minimum number of observations in the window required to have a value. Defaults to window.

  • center (bool) – Not supported. Must be False.

  • win_type (Optional[str]) – Not supported. Must be None.

  • on (Optional[str]) – Not supported. Must be None.

  • axis (Union[int, str]) – Only 0 / 'index' is supported.

  • closed (Optional[str]) – Not supported. Must be None.

  • step (Optional[int]) – Not supported. Must be None.

  • method (Optional[str]) – Must be None or 'python'.

  • order_by (Union[str, Sequence[str], None]) – Column(s) to order by within the window. Required for deterministic results.

  • ascending (Union[bool, Sequence[bool]]) – Sort order for the order_by columns.

Returns:

A window frame on which window aggregates (sum, mean, min, max, etc.) can be called.

Return type:

PandasApiWindowTdsFrame

Raises:

NotImplementedError – If center, win_type, on, closed, or step are set to non-default values. Also raised if axis is not 0 or method is not None / 'python'.

See also

expanding

Expanding (cumulative) window.

groupby

Group rows before applying window functions.

Notes

Differences from pandas:

  • order_by and ascending are pylegend extensions not present in pandas. They control the ORDER BY clause inside the SQL OVER(...) window specification.

  • center, win_type, on, closed, step are not supported.

  • axis=1 is not supported.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# 3-row moving average of Order Id ordered by Order Id
frame.filter(items=["Order Id"]).rolling(
    window=3, order_by="Order Id"
).aggregate("mean").head(5).to_pandas()
Order Id
0 10248.0
1 10248.5
2 10249.0
3 10250.0
4 10251.0
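
To make the window arithmetic concrete, the following plain-Python sketch computes a trailing-window mean over values that are already in order. It does not use pylegend; rolling_mean is a hypothetical helper, and min_periods=1 is an assumption chosen because it reproduces the first rows of the example output above.

```python
from typing import List, Optional


def rolling_mean(values: List[float], window: int, min_periods: int = 1) -> List[Optional[float]]:
    """Trailing-window mean over already-ordered values (illustrative only)."""
    result: List[Optional[float]] = []
    for i in range(len(values)):
        # Current row plus up to window - 1 preceding rows.
        chunk = values[max(0, i - window + 1): i + 1]
        if len(chunk) >= min_periods:
            result.append(sum(chunk) / len(chunk))
        else:
            # Too few observations in the window: no value.
            result.append(None)
    return result


order_ids = [10248, 10249, 10250, 10251, 10252]
print(rolling_mean(order_ids, window=3))
# [10248.0, 10248.5, 10249.0, 10250.0, 10251.0]
```

Passing min_periods=3 in this sketch leaves the first two positions empty (None), because their windows hold fewer than three rows.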

rows_between

PandasApiTdsFrame.rows_between(start=None, end=None)[source]

Create a ROWS BETWEEN window-frame specification.

PyLegend extension — not present in pandas.

Sign convention (same as legendQL):

  • None → UNBOUNDED (PRECEDING for start, FOLLOWING for end)

  • Negative → PRECEDING (e.g. -3 → 3 PRECEDING)

  • 0 → CURRENT ROW

  • Positive → FOLLOWING (e.g. 2 → 2 FOLLOWING)

Parameters:
  • start (Optional[int]) – Lower bound of the frame. None means unbounded preceding.

  • end (Optional[int]) – Upper bound of the frame. None means unbounded following.

Returns:

A frame specification to pass to window_frame_legend_ext().

Return type:

RowsBetween

Raises:

ValueError – If start is greater than end.

See also

range_between

Create a RANGE BETWEEN specification.

window_frame_legend_ext

Apply a custom window specification.

Notes

Differences from pandas:

  • This method has no pandas equivalent. It is a pylegend extension for constructing SQL ROWS BETWEEN clauses.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# 3-row trailing window (current row and 2 preceding)
spec = frame.rows_between(-2, 0)
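
The sign convention can be illustrated by rendering the bounds as SQL text. The sketch below is plain Python and independent of pylegend; bound_to_sql and rows_between_sql are hypothetical helpers, and the exact SQL that pylegend generates may differ.

```python
from typing import Optional


def bound_to_sql(value: Optional[int], is_start: bool) -> str:
    """Render one frame bound using the documented sign convention."""
    if value is None:
        return "UNBOUNDED PRECEDING" if is_start else "UNBOUNDED FOLLOWING"
    if value == 0:
        return "CURRENT ROW"
    return f"{abs(value)} PRECEDING" if value < 0 else f"{value} FOLLOWING"


def rows_between_sql(start: Optional[int] = None, end: Optional[int] = None) -> str:
    """Hypothetical helper: build the ROWS BETWEEN clause text."""
    if start is not None and end is not None and start > end:
        raise ValueError("start must not be greater than end")
    return f"ROWS BETWEEN {bound_to_sql(start, True)} AND {bound_to_sql(end, False)}"


print(rows_between_sql(-2, 0))    # ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
print(rows_between_sql(None, 0))  # ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
print(rows_between_sql(-1, 1))    # ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
```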

shape

property PandasApiTdsFrame.shape: Tuple[int, int]

Return the dimensionality of the TDS frame as (rows, columns).

Warning

Unlike pandas.DataFrame.shape, this property executes the frame against the server to determine the row count. It issues a COUNT aggregation query, so every access incurs a round-trip to the database.

Returns:

A tuple (number_of_rows, number_of_columns).

Return type:

tuple of (int, int)

See also

head

Return the first n rows (lazy, no execution).

count

Count non-null values per column (returns a frame).

Notes

Differences from pandas:

  • In pandas, DataFrame.shape is an O(1) metadata lookup that never triggers computation. Here, shape executes the current frame to obtain the row count via a COUNT aggregation query, so it requires a live database connection and fails on non-executable frames.

  • The result type is always (int, int); there is no lazy evaluation.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Get the shape (triggers server execution)
frame.head(5).shape
(np.int64(5), 5)
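
Because every access runs a query, it can be worth reading shape once and reusing the result. The sketch below uses CountingFrame, a hypothetical stand-in class (not a pylegend type) that merely counts how often its shape property is evaluated.

```python
from typing import Tuple


class CountingFrame:
    """Hypothetical stand-in for a frame whose shape runs a query on each access."""

    def __init__(self, n_rows: int, n_cols: int) -> None:
        self._n_rows = n_rows
        self._n_cols = n_cols
        self.queries_run = 0

    @property
    def shape(self) -> Tuple[int, int]:
        self.queries_run += 1  # models one COUNT round-trip
        return (self._n_rows, self._n_cols)


frame = CountingFrame(5, 5)
rows, cols = frame.shape  # one simulated query; reuse rows and cols afterwards
print(rows * cols, frame.queries_run)  # 25 1
```

Writing frame.shape[0] * frame.shape[1] instead would evaluate the property twice, which in the real API would presumably mean two round-trips.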

shift

PandasApiTdsFrame.shift(order_by, periods=1, freq=None, axis=0, fill_value=None, suffix=None)[source]

Shift values by desired number of periods.

Replace every column’s values with their shifted values. Because the underlying TDS is inherently unordered, this requires an explicit order_by parameter to define the ordering for the window function (LAG or LEAD).

Parameters:
  • order_by (Union[str, Sequence[str]]) – Column name(s) to order the frame by before applying the shift. Unlike pandas, this is required to ensure deterministic output. All specified columns must be present in the base frame.

  • periods (Union[int, Sequence[int]]) – Number of periods to shift. Currently, only 1 (shift down, equivalent to SQL LAG) and -1 (shift up, equivalent to SQL LEAD) are supported. If a sequence is provided, it cannot contain duplicate values.

  • freq (Union[str, int, None]) – Not supported. Must be None.

  • axis (Union[int, str]) – Axis to shift along. Only 0 / 'index' is supported.

  • fill_value (Optional[Hashable]) – Not supported. Must be None. Missing values introduced by the shift will always be null.

  • suffix (Optional[str]) – If provided, renames the resulting shifted columns by appending this string to the original column names. This argument can only be used if periods is a sequence (not a single integer).

Returns:

A new TDS frame with the shifted columns.

Return type:

PandasApiTdsFrame

Raises:
  • NotImplementedError – If periods contains any values other than 1 or -1. If freq is not None. If axis is not 0 or 'index'. If fill_value is not None.

  • ValueError – If any column specified in order_by is not present in the frame. If periods contains duplicate values. If suffix is specified but periods is a single integer.

See also

rank

Rank as ascending or descending.

PandasApiGroupbyTdsFrame.shift

Shift values within groups.

Notes

Differences from pandas:

  • The order_by parameter is mandatory. In pandas, shift relies on the implicit order of the dataframe’s index. Here, an explicit order must be provided.

  • periods is strictly limited to 1 or -1. Arbitrary integer shifts are not supported.

  • fill_value is not supported and must be None.

  • The freq parameter is not supported and must be None.

  • axis=1 (shifting horizontally across columns) is not supported.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Shift the entire frame down
frame.head(5).shift(
    order_by="Order Date",
    periods=1
).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name
0 <NA> NaT NaT NaT NaN
1 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier
2 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten
3 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes
4 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock
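
The LAG / LEAD semantics can be sketched in plain Python on a list of row dictionaries. This is an illustrative model only: shift_rows is a hypothetical helper, dates are kept as ISO strings so that they sort chronologically, and None stands in for a null value.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def shift_rows(rows: List[Row], order_by: str, periods: int = 1) -> List[Row]:
    """Illustrative LAG (periods=1) / LEAD (periods=-1) over explicitly ordered rows."""
    if periods not in (1, -1):
        raise NotImplementedError("only periods=1 and periods=-1 are modelled")
    # The explicit ordering replaces the implicit index order pandas relies on.
    ordered = sorted(rows, key=lambda r: r[order_by])
    shifted: List[Row] = []
    for i in range(len(ordered)):
        source = i - periods  # previous row for LAG, next row for LEAD
        if 0 <= source < len(ordered):
            shifted.append(dict(ordered[source]))
        else:
            # No neighbouring row: every column becomes null.
            shifted.append({name: None for name in ordered[i]})
    return shifted


rows = [
    {"Order Id": 10249, "Order Date": "1996-07-05"},
    {"Order Id": 10248, "Order Date": "1996-07-04"},
    {"Order Id": 10250, "Order Date": "1996-07-08"},
]
for row in shift_rows(rows, order_by="Order Date", periods=1):
    print(row)
```

The first printed row contains only None values, mirroring the null first row in the example output above.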

sort_values

PandasApiTdsFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind=None, na_position='last', ignore_index=True, key=None)[source]

Sort the TDS frame by one or more columns.

Return a new TDS frame sorted by the values in the specified column(s). Supports ascending and descending sort order per column.

Parameters:
  • by (Union[str, List[str]]) – Column name or list of column names to sort by. All names must exist in the current frame.

  • axis (Union[str, int]) – Axis along which to sort. Only 0 / 'index' (row-wise sorting) is supported.

  • ascending (Union[bool, List[bool]]) – Sort order. If a list, must have the same length as by.

  • inplace (bool) – Must be False. In-place mutation is not supported.

  • kind (Optional[str]) – Not supported. Must be None; passing any value raises NotImplementedError.

  • na_position (str) – Position of null values. Accepted but handled at the SQL engine level.

  • ignore_index (bool) – Must be True. Setting to False raises ValueError.

  • key (Optional[Callable[[AbstractTdsRow], AbstractTdsRow]]) – Not supported. Must be None; passing a callable raises NotImplementedError.

Returns:

A new TDS frame sorted by the specified columns.

Return type:

PandasApiTdsFrame

Raises:
  • ValueError – If a column in by does not exist in the frame.

  • ValueError – If the length of ascending does not match by.

  • ValueError – If axis is not 0 or 'index'.

  • ValueError – If inplace is True.

  • ValueError – If ignore_index is False.

  • NotImplementedError – If kind or key is provided.

See also

head

Return the first n rows.

truncate

Select a range of rows by position.

filter

Select columns by name, substring, or regex.

Notes

Differences from pandas:

  • The kind parameter (sort algorithm) is not supported. Sorting is delegated to the underlying Legend Engine.

  • The key parameter (per-element transform before sorting) is not supported.

  • inplace=True is not supported; always returns a new frame.

  • ignore_index must be True; False is not supported because TDS frames do not have an index.

  • axis=1 (sorting columns) is not supported; only row-wise sorting via axis=0 is available.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Sort by a single column (ascending by default)
frame.sort_values("Ship Name").head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name
0 11011 1998-04-09 1998-05-07 1998-04-13 Alfred's Futterkiste
1 10952 1998-03-16 1998-04-27 1998-03-24 Alfred's Futterkiste
2 10835 1998-01-15 1998-02-12 1998-01-21 Alfred's Futterkiste
3 10702 1997-10-13 1997-11-24 1997-10-21 Alfred's Futterkiste
4 10692 1997-10-03 1997-10-31 1997-10-13 Alfred's Futterkiste
# Sort descending
frame.sort_values("Order Id", ascending=False).head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name
0 11077 1998-05-06 1998-06-03 NaT Rattlesnake Canyon Grocery
1 11076 1998-05-06 1998-06-03 NaT Bon app'
2 11075 1998-05-06 1998-06-03 NaT Richter Supermarkt
3 11074 1998-05-06 1998-06-03 NaT Simons bistro
4 11073 1998-05-05 1998-06-02 NaT Pericles Comidas clásicas
# Sort by multiple columns with mixed directions
frame.sort_values(
    by=["Ship Name", "Order Id"],
    ascending=[True, False]
).head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name
0 11011 1998-04-09 1998-05-07 1998-04-13 Alfred's Futterkiste
1 10952 1998-03-16 1998-04-27 1998-03-24 Alfred's Futterkiste
2 10835 1998-01-15 1998-02-12 1998-01-21 Alfred's Futterkiste
3 10702 1997-10-13 1997-11-24 1997-10-21 Alfred's Futterkiste
4 10692 1997-10-03 1997-10-31 1997-10-13 Alfred's Futterkiste
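
Mixed-direction ordering over several columns can be reproduced in plain Python with repeated stable sorts. The sketch below is not how pylegend sorts (that is delegated to the engine); sort_rows is a hypothetical helper that mirrors the documented by / ascending rules on a list of row dictionaries.

```python
from typing import Any, Dict, List, Sequence, Union

Row = Dict[str, Any]


def sort_rows(rows: List[Row], by: Union[str, Sequence[str]],
              ascending: Union[bool, Sequence[bool]] = True) -> List[Row]:
    """Hypothetical helper: multi-column sort with a direction per column."""
    columns = [by] if isinstance(by, str) else list(by)
    flags = [ascending] * len(columns) if isinstance(ascending, bool) else list(ascending)
    if len(flags) != len(columns):
        raise ValueError("ascending must have the same length as by")
    for column in columns:
        if rows and column not in rows[0]:
            raise ValueError(f"column not found: {column}")
    ordered = list(rows)
    # Stable sorts applied from the last key to the first yield multi-key order.
    for column, asc in reversed(list(zip(columns, flags))):
        ordered.sort(key=lambda r: r[column], reverse=not asc)
    return ordered


rows = [
    {"Ship Name": "Hanari Carnes", "Order Id": 10250},
    {"Ship Name": "Alfred's Futterkiste", "Order Id": 10692},
    {"Ship Name": "Hanari Carnes", "Order Id": 10253},
    {"Ship Name": "Alfred's Futterkiste", "Order Id": 11011},
]
result = sort_rows(rows, by=["Ship Name", "Order Id"], ascending=[True, False])
print([r["Order Id"] for r in result])  # [11011, 10692, 10253, 10250]
```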

std

PandasApiTdsFrame.std(axis=0, skipna=True, ddof=1, numeric_only=False, **kwargs)[source]

Compute the standard deviation of each column.

Convenience method equivalent to aggregate('std') (ddof=1) or aggregate('std_dev_population') (ddof=0). Returns a single-row TDS frame with the standard deviation of every column.

Parameters:
  • axis (Union[int, str]) – Only 0 / 'index' is supported.

  • skipna (bool) – Must be True. False is not supported.

  • ddof (int) – Degrees of freedom. 1 for sample standard deviation (STDDEV_SAMP), 0 for population standard deviation (STDDEV_POP).

  • numeric_only (bool) – Must be False. True is not supported.

  • **kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported.

Returns:

A single-row TDS frame with column standard deviations.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If ddof is not 0 or 1, or if any other parameter is set to an unsupported value.

See also

aggregate

General aggregation method.

var

Compute column variances.

mean

Compute column means.

Notes

Differences from pandas:

  • Only ddof=0 and ddof=1 are supported.

  • Internally delegates to aggregate('std') (ddof=1, maps to STDDEV_SAMP) or aggregate('std_dev_population') (ddof=0, maps to STDDEV_POP).

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Standard deviation of numeric columns
frame.filter(items=["Order Id"]).std().to_pandas()
Order Id
0 239.744656
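
The difference between ddof=1 and ddof=0 can be checked with Python's standard statistics module. The sketch assumes that the Order Id values are the consecutive integers 10248 to 11077; that assumption is not stated in the documentation, but it is consistent with the example output above.

```python
import statistics

# Assumption: Order Id holds the consecutive integers 10248..11077.
order_ids = list(range(10248, 11078))

sample_std = statistics.stdev(order_ids)       # divides by n - 1, like ddof=1
population_std = statistics.pstdev(order_ids)  # divides by n, like ddof=0

print(round(sample_std, 6))
print(population_std < sample_std)  # True
```

The population figure is always slightly smaller than the sample figure for the same data, because it divides by n rather than n - 1.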

sum

PandasApiTdsFrame.sum(axis=0, skipna=True, numeric_only=False, min_count=0, **kwargs)[source]

Compute the sum of each column.

Convenience method equivalent to aggregate('sum'). Returns a single-row TDS frame with the sum of every column.

Parameters:
  • axis (Union[int, str]) – Only 0 / 'index' is supported.

  • skipna (bool) – Must be True. False is not supported.

  • numeric_only (bool) – Must be False. True is not supported.

  • min_count (int) – Must be 0. Non-zero values are not supported.

  • **kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported. Passing any keyword arguments raises NotImplementedError.

Returns:

A single-row TDS frame with column sums.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If axis, skipna, numeric_only, min_count, or **kwargs are set to unsupported values.

See also

aggregate

General aggregation method.

mean

Compute column means.

count

Count non-null values per column.

Notes

Differences from pandas:

  • skipna=False, numeric_only=True, and non-zero min_count are not supported.

  • axis=1 is not supported.

  • Internally delegates to aggregate('sum').

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Sum of the Order Id column
frame.filter(items=["Order Id"]).sum().to_pandas()
Order Id
0 8849875

truncate

PandasApiTdsFrame.truncate(before=0, after=None, axis=0, copy=True)[source]

Select rows by positional index range.

Return a new TDS frame containing rows from position before (inclusive) to after (inclusive).

Parameters:
  • before (Union[date, str, int, None]) – Only int and None are supported. First row index to include (0-based, inclusive). Negative values are silently clamped to 0. None is treated as 0.

  • after (Union[date, str, int, None]) – Only int and None are supported. Last row index to include (0-based, inclusive). None means no upper bound (all remaining rows are returned). Negative values result in an empty frame.

  • axis (Union[str, int]) – Axis to truncate along. Only 0 / 'index' is supported.

  • copy (bool) – Must be True. Setting to False raises NotImplementedError.

Returns:

A new TDS frame containing only the rows in the specified positional range.

Return type:

PandasApiTdsFrame

Raises:
  • NotImplementedError – If axis is not 0 or 'index'. If copy is False. If before or after is a non-integer type (e.g. a string or date).

  • ValueError – If before is greater than after (after clamping).

See also

head

Return the first n rows.

sort_values

Sort the frame before truncating.

filter

Select columns by name, substring, or regex.

Notes

Differences from pandas:

  • In pandas, truncate selects rows by label (index value). Here, it selects rows by positional (integer) index only (it is translated to LIMIT and OFFSET in the underlying SQL). Passing date, str, or other label-based values for before / after raises NotImplementedError.

  • copy=False is not supported; a new frame is always returned.

  • axis=1 (truncating columns) is not supported.

  • Negative before values are silently clamped to 0 rather than raising an error. Negative after values result in an empty frame (zero rows).

  • The after parameter is inclusive (row at position after is included), matching pandas behaviour.

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Get rows at positions 0 through 4 (inclusive)
frame.truncate(before=0, after=4).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name
0 10248 1996-07-04 1996-08-01 1996-07-16 Vins et alcools Chevalier
1 10249 1996-07-05 1996-08-16 1996-07-10 Toms Spezialitäten
2 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes
3 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock
4 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices
# Skip first 5 rows, keep the rest
frame.truncate(before=5).head(5).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name
0 10253 1996-07-10 1996-07-24 1996-07-16 Hanari Carnes
1 10254 1996-07-11 1996-08-08 1996-07-23 Chop-suey Chinese
2 10255 1996-07-12 1996-08-09 1996-07-15 Richter Supermarkt
3 10256 1996-07-15 1996-08-12 1996-07-17 Wellington Importadora
4 10257 1996-07-16 1996-08-13 1996-07-22 HILARION-Abastos
# Get rows at positions 2 through 6 (inclusive)
frame.truncate(before=2, after=6).to_pandas()
Order Id Order Date Required Date Shipped Date Ship Name
0 10250 1996-07-08 1996-08-05 1996-07-12 Hanari Carnes
1 10251 1996-07-08 1996-08-05 1996-07-15 Victuailles en stock
2 10252 1996-07-09 1996-08-06 1996-07-11 Suprêmes délices
3 10253 1996-07-10 1996-07-24 1996-07-16 Hanari Carnes
4 10254 1996-07-11 1996-08-08 1996-07-23 Chop-suey Chinese
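
The mapping from inclusive positions to OFFSET and LIMIT can be sketched as follows. truncate_to_offset_limit is a hypothetical helper, not pylegend code; in particular, the order in which it handles a negative after and the before > after check is an assumption of this sketch.

```python
from typing import Optional, Tuple


def truncate_to_offset_limit(before: Optional[int] = 0,
                             after: Optional[int] = None) -> Tuple[int, Optional[int]]:
    """Hypothetical helper: map inclusive positions to (OFFSET, LIMIT)."""
    offset = max(before or 0, 0)  # None and negative values become 0
    if after is None:
        return offset, None  # no upper bound, so no LIMIT
    if after < 0:
        return offset, 0  # assumed: a negative after yields an empty result
    if offset > after:
        raise ValueError("before is greater than after")
    return offset, after - offset + 1  # both ends are inclusive


print(truncate_to_offset_limit(0, 4))   # (0, 5)
print(truncate_to_offset_limit(5))      # (5, None)
print(truncate_to_offset_limit(2, 6))   # (2, 5)
print(truncate_to_offset_limit(-3, 4))  # (0, 5)
```

The + 1 in the LIMIT is what makes after inclusive: positions 2 through 6 cover five rows.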

var

PandasApiTdsFrame.var(axis=0, skipna=True, ddof=1, numeric_only=False, **kwargs)[source]

Compute the variance of each column.

Convenience method equivalent to aggregate('var') (ddof=1) or aggregate('variance_population') (ddof=0). Returns a single-row TDS frame with the variance of every column.

Parameters:
  • axis (Union[int, str]) – Only 0 / 'index' is supported.

  • skipna (bool) – Must be True. False is not supported.

  • ddof (int) – Degrees of freedom. 1 for sample variance (VAR_SAMP), 0 for population variance (VAR_POP).

  • numeric_only (bool) – Must be False. True is not supported.

  • **kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported.

Returns:

A single-row TDS frame with column variances.

Return type:

PandasApiTdsFrame

Raises:

NotImplementedError – If ddof is not 0 or 1, or if any other parameter is set to an unsupported value.

See also

aggregate

General aggregation method.

std

Compute column standard deviations.

mean

Compute column means.

Notes

Differences from pandas:

  • Only ddof=0 and ddof=1 are supported.

  • Internally delegates to aggregate('var') (ddof=1, maps to VAR_SAMP) or aggregate('variance_population') (ddof=0, maps to VAR_POP).

Examples

import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
# Variance of numeric columns
frame.filter(items=["Order Id"]).var().to_pandas()
Order Id
0 57477.5
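
The role of ddof is easiest to see in the formula itself: the sum of squared deviations is divided by n - ddof. The plain-Python sketch below uses a hypothetical variance helper and a small made-up data set; it is not the pylegend implementation, which delegates to the database.

```python
from typing import Sequence


def variance(values: Sequence[float], ddof: int = 1) -> float:
    """Illustrative variance: squared deviations divided by n - ddof."""
    if ddof not in (0, 1):
        raise NotImplementedError("only ddof=0 and ddof=1 are modelled")
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - ddof)


values = [2, 4, 4, 4, 5, 5, 7, 9]   # mean is 5, squared deviations sum to 32
print(variance(values, ddof=0))     # 4.0 (population: 32 / 8)
print(variance(values, ddof=1))     # sample: 32 / 7
```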

window_frame_legend_ext

PandasApiTdsFrame.window_frame_legend_ext(frame_spec, order_by=None, ascending=True)[source]

Create a custom window specification with explicit frame bounds.

PyLegend extension — not present in pandas.

Provides fine-grained control over the ROWS BETWEEN or RANGE BETWEEN clause used by window-aggregate computations.

Parameters:
  • frame_spec (FrameSpec) – A window-frame specification created via rows_between() or range_between().

  • order_by (Union[str, Sequence[str], None]) – Column(s) to order by within the window. None means no explicit ordering (a fallback will be chosen automatically).

  • ascending (Union[bool, Sequence[bool]]) – Sort direction(s) for the order_by columns.

Returns:

A window frame on which window aggregates (sum, mean, min, max, etc.) can be called.

Return type:

PandasApiWindowTdsFrame

Raises:

TypeError – If frame_spec is not a RowsBetween or RangeBetween.

See also

expanding

Expanding (cumulative) window.

rolling

Fixed-size sliding window.

rows_between

Create a ROWS BETWEEN specification.

range_between

Create a RANGE BETWEEN specification.

Notes

Differences from pandas:

  • This method has no pandas equivalent. It is a pylegend extension for explicit control over the SQL window frame.

Examples

import pylegend
from pylegend.core.language.pandas_api.pandas_api_frame_spec import (
    RowsBetween,
)
frame = pylegend.samples.pandas_api.northwind_orders_frame()
spec = RowsBetween(-2, 0)
frame.filter(items=["Order Id"]).window_frame_legend_ext(
    spec, order_by="Order Id"
).sum().head(5).to_pandas()
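
To show what a ROWS BETWEEN 2 PRECEDING AND CURRENT ROW sum computes, here is a plain-Python sketch over values that are already ordered. window_sum is a hypothetical helper, and the printed numbers come from this sketch rather than from running pylegend.

```python
from typing import List, Optional


def window_sum(values: List[int], start: Optional[int], end: Optional[int]) -> List[int]:
    """Sum over ROWS BETWEEN start AND end for already-ordered values."""
    n = len(values)
    sums = []
    for i in range(n):
        # None means unbounded; otherwise offsets are relative to the current row.
        lo = 0 if start is None else max(0, i + start)
        hi = n - 1 if end is None else min(n - 1, i + end)
        # An empty frame sums to 0 here; SQL would return NULL instead.
        sums.append(sum(values[lo:hi + 1]))
    return sums


order_ids = [10248, 10249, 10250, 10251, 10252]
print(window_sum(order_ids, -2, 0))    # [10248, 20497, 30747, 30750, 30753]
print(window_sum(order_ids, None, 0))  # [10248, 20497, 30747, 40998, 51250]
```

The second call uses an unbounded start, which turns the frame into a running (cumulative) sum.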