Pandas TDS Frame
The PandasApiTdsFrame class provides a Pandas-like interface for working with TDS (Tabular Data Store) frames.
It offers methods for data manipulation, filtering, aggregation, joins, and window functions.
agg
- PandasApiTdsFrame.agg(func, axis=0, *args, **kwargs)[source]
Alias for aggregate(). See aggregate for full documentation.
- Return type:
PandasApiTdsFrame
aggregate
- PandasApiTdsFrame.aggregate(func, axis=0, *args, **kwargs)[source]
Aggregate the TDS frame using one or more operations.
Apply one or more aggregation functions across all columns or specific columns, collapsing the frame into a single-row summary. Supported aggregation strings are 'sum', 'mean', 'min', 'max', 'count', 'std', 'var', the aliases 'len' and 'size' (both map to count), and 'average' / 'avg' (both map to mean). Callables and NumPy universal functions are also supported.
- Parameters:
func (Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc, List[Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc]], Mapping[Hashable, Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc, List[Union[Callable[..., Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str, ufunc]]]]]) – Aggregation specification. Accepted forms:
str: A named aggregation (e.g. 'sum') applied to every column.
callable: A function that receives a column's Series proxy and returns an aggregated value (e.g. lambda x: x.sum()), applied to every column.
np.ufunc: A NumPy universal function (e.g. np.sum), applied to every column.
list: A list containing one of the above, applied to every column. Output column names are prefixed with the function name (e.g. 'sum(col)').
dict: A mapping of column name → aggregation (str, callable, np.ufunc, or a list of these). Only the specified columns appear in the result.
axis (Union[int, str]) – Axis along which to aggregate. Only 0 / 'index' is supported.
*args (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported. Passing positional arguments raises NotImplementedError.
**kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported. Passing keyword arguments raises NotImplementedError.
- Returns:
A new single-row TDS frame with the aggregated values.
- Return type:
PandasApiTdsFrame
- Raises:
NotImplementedError – If axis is not 0 or 'index'. If extra *args or **kwargs are passed.
TypeError – If func is not a supported type (str, callable, np.ufunc, list, or dict). If dict keys are not strings, or dict/list values contain unsupported types.
ValueError – If a dict key refers to a column that does not exist in the frame.
Notes
Differences from pandas:
In pandas, aggregate can return a multi-row result when multiple functions are applied (one row per function). Here, multiple functions per column produce multiple columns in a single-row result (e.g. {'col': ['min', 'max']} yields columns 'min(col)' and 'max(col)').
Extra *args and **kwargs are not forwarded to the aggregation function; passing them raises NotImplementedError.
axis=1 (column-wise aggregation) is not supported.
When func is a list, it must contain exactly one element. Multi-element lists behave identically to a single-element list mapping applied to every column.
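The single-row naming rule described above can be illustrated without PyLegend. The following is a minimal plain-Python sketch, not the library's implementation: `aggregate_single_row`, `FUNCS`, and the dict-of-lists frame are hypothetical names introduced only for illustration, and nulls are modelled as `None`.

```python
from statistics import mean

# Hypothetical stand-ins for a few of the named aggregations (assumption:
# nulls are represented by None and are skipped before aggregating).
FUNCS = {"sum": sum, "mean": mean, "min": min, "max": max, "count": len}


def aggregate_single_row(columns, spec):
    """Model a dict-style spec {column: name or [names]} as one summary row."""
    row = {}
    for col, funcs in spec.items():
        if col not in columns:
            raise ValueError(f"unknown column: {col!r}")
        values = [v for v in columns[col] if v is not None]
        if isinstance(funcs, str):
            # A single named aggregation keeps the column name as-is.
            row[col] = FUNCS[funcs](values)
        else:
            # Several aggregations become several columns in the same row.
            for name in funcs:
                row[f"{name}({col})"] = FUNCS[name](values)
    return row


data = {
    "Order Id": [10248, 10249, 10250],
    "Shipped Date": ["1996-07-16", None, "1996-07-12"],
}
summary = aggregate_single_row(
    data, {"Order Id": ["min", "max"], "Shipped Date": "count"}
)
print(summary)
```

In pandas the two functions on "Order Id" would appear as two rows; in this model they appear as the columns 'min(Order Id)' and 'max(Order Id)' of one row.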
Examples
import pylegend
import numpy as np
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Aggregate a single column with a string function
frame.aggregate({"Order Id": "count"}).to_pandas()
   Order Id
0  830

# Aggregate multiple columns with different functions
frame.aggregate({"Order Id": "min", "Ship Name": "count"}).to_pandas()
   Order Id  Ship Name
0  10248     830

# Broadcast a single function to all columns
frame.aggregate("count").to_pandas()
   Order Id  Order Date  Required Date  Shipped Date  Ship Name
0  830       830         830            809           830

# Mix a lambda, a NumPy function, and a string in one dict
frame.aggregate({
    "Order Id": lambda x: x.max(),
    "Order Date": np.max,
    "Shipped Date": "min"
}).to_pandas()
   Order Id  Order Date  Shipped Date
0  11077     1998-05-06  1996-07-10
apply
- PandasApiTdsFrame.apply(func, axis=0, raw=False, result_type=None, args=(), by_row='compat', engine='python', engine_kwargs=None, **kwargs)[source]
Apply a function to each column of the TDS frame.
The callable receives a Series proxy for each column and must return a transformed value. The function is applied independently to every column, producing a new frame with the same column names but transformed values. Additional positional and keyword arguments can be forwarded to the callable via args and **kwargs.
- Parameters:
func (Union[Callable[[Concatenate[Series, ParamSpec(P, bound=None)]], Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]], str]) – A function that takes a Series (column proxy) as its first argument and returns a primitive value or expression. String-based function names (e.g. 'sum') are not supported; use aggregate() for named aggregations.
axis (Union[int, str]) – Only column-wise application is supported (axis=0 or 'index'). Row-wise application (axis=1) raises ValueError.
raw (bool) – Must be False. True is not supported.
result_type (Optional[str]) – Must be None. Any value raises NotImplementedError.
args (Tuple[Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive], ...]) – Positional arguments to pass to func after the Series argument.
by_row (Union[bool, str]) – Must be False or 'compat'. True raises NotImplementedError.
engine (str) – Must be 'python'. 'numba' is not supported.
engine_kwargs (Optional[Dict[str, Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]]]) – Must be None. Not supported.
**kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Additional keyword arguments forwarded to func.
- Returns:
A new TDS frame with the function applied to every column.
- Return type:
PandasApiTdsFrame
- Raises:
ValueError – If axis is not 0 or 'index'.
NotImplementedError – If raw=True, result_type is set, by_row=True, engine='numba', engine_kwargs is set, or func is a string.
TypeError – If func is not callable.
Notes
Differences from pandas:
In pandas, apply with axis=0 passes each column as a pandas.Series to the function, which can return a scalar (reducing the frame) or a Series (transforming it). Here, func receives a column Series proxy and must return a scalar expression that defines a row-level transformation. This means apply always produces a frame with the same number of rows; it cannot reduce the frame the way pandas apply can.
Row-wise application (axis=1) is not supported.
String function names (e.g. 'sum') are not supported. Use aggregate() instead.
raw=True, result_type, engine='numba', and engine_kwargs are not supported.
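To make the "row-level transformation" semantics and the forwarding of args and **kwargs concrete, here is a small plain-Python model. It is only a sketch under stated assumptions: `ColumnProxy` and `apply_columns` are hypothetical stand-ins, not PyLegend classes, and the proxy supports just the two operators the example needs.

```python
class ColumnProxy:
    """Hypothetical stand-in for a column Series proxy: arithmetic is element-wise."""

    def __init__(self, values):
        self.values = list(values)

    def __mul__(self, factor):
        return ColumnProxy(v * factor for v in self.values)

    def __add__(self, offset):
        return ColumnProxy(v + offset for v in self.values)


def apply_columns(columns, func, args=(), **kwargs):
    """Call func once per column, forwarding args and kwargs after the proxy."""
    result = {}
    for name, values in columns.items():
        result[name] = func(ColumnProxy(values), *args, **kwargs).values
    return result


def add_offset(series, offset, *, scale=1):
    return series * scale + offset


frame = {"Order Id": [10248, 10249, 10250]}
transformed = apply_columns(frame, add_offset, args=(100,), scale=2)
print(transformed)  # {'Order Id': [20596, 20598, 20600]}
```

The row count is unchanged: each input value maps to exactly one output value, which is why this style of apply cannot reduce a frame.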
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Apply a lambda to every column
frame.filter(items=["Order Id"]).apply(
    lambda x: x * 2
).head(5).to_pandas()
   Order Id
0  20496
1  20498
2  20500
3  20502
4  20504

# Apply a function with extra arguments
def add_offset(series, offset, *, scale=1):
    return series * scale + offset

frame.filter(items=["Order Id"]).apply(
    add_offset, args=(100,), scale=2
).head(5).to_pandas()
   Order Id
0  20596
1  20598
2  20600
3  20602
4  20604
assign
- PandasApiTdsFrame.assign(**kwargs)[source]
Add or overwrite columns using keyword arguments.
Return a new TDS frame with new columns added (or existing columns overwritten). Each keyword argument defines a column name and a callable that computes the column’s value from each row.
- Parameters:
**kwargs (Callable[[PandasApiTdsRow], Union[int, float, bool, str, date, datetime, PyLegendPrimitive]]) – Each keyword argument is a column name mapped to a function that takes a PandasApiTdsRow and returns a scalar value. Supported return types are int, float, bool, str, date, datetime, and PyLegendPrimitive.
- Returns:
A new TDS frame with the additional (or overwritten) columns.
- Return type:
PandasApiTdsFrame
- Raises:
RuntimeError – If the callable returns an unsupported type (e.g. a list).
Notes
Differences from pandas:
In pandas assign, each keyword argument can be a callable or a static value (e.g. frame.assign(col=5)). Here, every value must be a callable that takes a row, even for constants (e.g. frame.assign(col=lambda x: 5)).
Column values are accessed via typed accessor methods such as x.get_integer("col") and x.get_string("col"), or via bracket notation x["col"].
Returning a non-scalar type (e.g. a list) from the callable raises a RuntimeError, unlike pandas which would broadcast or create nested data.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Add a constant column
frame.assign(constant=lambda x: 100).head(3).to_pandas()
   Order Id  Order Date  Required Date  Shipped Date  Ship Name                  constant
0  10248     1996-07-04  1996-08-01     1996-07-16    Vins et alcools Chevalier  100
1  10249     1996-07-05  1996-08-16     1996-07-10    Toms Spezialitäten         100
2  10250     1996-07-08  1996-08-05     1996-07-12    Hanari Carnes              100

# Add a computed column derived from existing columns
frame.assign(
    ship_upper=lambda x: x.get_string("Ship Name").upper()
).head(3).to_pandas()
   Order Id  Order Date  Required Date  Shipped Date  Ship Name                  ship_upper
0  10248     1996-07-04  1996-08-01     1996-07-16    Vins et alcools Chevalier  VINS ET ALCOOLS CHEVALIER
1  10249     1996-07-05  1996-08-16     1996-07-10    Toms Spezialitäten         TOMS SPEZIALITÄTEN
2  10250     1996-07-08  1996-08-05     1996-07-12    Hanari Carnes              HANARI CARNES

# Overwrite an existing column
frame.assign(
    **{"Ship Name": lambda x: x.get_string("Ship Name").upper()}
).head(3).to_pandas()
   Order Id  Order Date  Required Date  Shipped Date  Ship Name
0  10248     1996-07-04  1996-08-01     1996-07-16    VINS ET ALCOOLS CHEVALIER
1  10249     1996-07-05  1996-08-16     1996-07-10    TOMS SPEZIALITÄTEN
2  10250     1996-07-08  1996-08-05     1996-07-12    HANARI CARNES
cast
- PandasApiTdsFrame.cast(column_type_map)[source]
Change the declared type of one or more columns.
Return a new TDS frame whose column metadata reflects the requested type changes. The underlying data is not transformed in SQL (no CAST expression is emitted); instead a Pure ->cast(...) clause is appended so that the Legend Engine re-interprets the column under the target type.
A cast is allowed only when the source and target types share a subclass relationship in the PyLegend type hierarchy. For example, Integer → BigInt is valid because BigInt is a sub-type of Integer, but String → Integer is not.
- Parameters:
column_type_map (Dict[str, Union[PrimitiveType, Tuple[PrimitiveType, ...]]]) – A mapping from column name to the desired target type. Values are produced by the helpers in pylegend.core.language.type_factory, for example tf.bigint(), tf.varchar(200), tf.numeric(10, 2). An empty dict is valid and returns a copy of the frame with unchanged columns.
- Returns:
A new TDS frame with the cast column metadata. The original frame is never mutated.
- Return type:
PandasApiTdsFrame
- Raises:
ValueError – If a column name in column_type_map does not exist in the frame, or if the source-to-target conversion is not allowed (the types do not share a subclass relationship).
TypeError – If the target column is a non-primitive column (e.g. an EnumTdsColumn). Only PrimitiveTdsColumn columns can be cast.
Notes
Differences from pandas:
Pandas DataFrame.astype() converts data values in memory. cast changes only the declared column type in the query metadata; no SQL CAST expression is generated.
The allowed conversions follow the Legend type hierarchy, not Python/NumPy dtype-coercion rules.
Parameterised types such as Varchar(200) and Numeric(10, 2) are supported through the type_factory helpers and are reflected in the generated Pure ->cast(...) clause.
Cross-branch casts (e.g. Integer → Float, String → Boolean) raise ValueError.
Examples
import pylegend
from pylegend.core.language import type_factory as tf

frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Widen an integer column to BigInt
casted = frame.cast({"Order Id": tf.bigint()})
casted.head(3).to_pandas()
   Order Id  Order Date  Required Date  Shipped Date  Ship Name
0  10248     1996-07-04  1996-08-01     1996-07-16    Vins et alcools Chevalier
1  10249     1996-07-05  1996-08-16     1996-07-10    Toms Spezialitäten
2  10250     1996-07-08  1996-08-05     1996-07-12    Hanari Carnes

# Cast multiple columns at once
casted = frame.cast({
    "Order Id": tf.bigint(),
    "Ship Name": tf.varchar(200),
})
casted.head(3).to_pandas()
   Order Id  Order Date  Required Date  Shipped Date  Ship Name
0  10248     1996-07-04  1996-08-01     1996-07-16    Vins et alcools Chevalier
1  10249     1996-07-05  1996-08-16     1996-07-10    Toms Spezialitäten
2  10250     1996-07-08  1996-08-05     1996-07-12    Hanari Carnes
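The rule that a cast is allowed only between types in a subclass relationship can be modelled with ordinary Python classes. This is a sketch of the rule as stated in this section, not PyLegend's actual type hierarchy: the classes below and `cast_allowed` are hypothetical, and placing Varchar under String is an assumption made for illustration.

```python
# Hypothetical miniature type hierarchy (assumption, for illustration only).
class Integer: ...
class BigInt(Integer): ...
class Float: ...
class String: ...
class Varchar(String): ...
class Boolean: ...


def cast_allowed(source, target):
    """A cast is allowed when one type is a subclass of the other."""
    return issubclass(source, target) or issubclass(target, source)


print(cast_allowed(Integer, BigInt))   # True: BigInt is a sub-type of Integer
print(cast_allowed(Integer, Float))    # False: cross-branch
print(cast_allowed(String, Boolean))   # False: cross-branch
```

Under this model the check is symmetric, so it only captures "same branch of the hierarchy"; it says nothing about whether the data actually fits the target type.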
concat_legend_ext
- PandasApiTdsFrame.concat_legend_ext(other)[source]
Concatenate this frame with another frame vertically.
PyLegend extension — not present in pandas.
Produces a SQL UNION ALL of the two frames. Both frames must have compatible schemas (same column names and types).
- Parameters:
other (PandasApiTdsFrame) – The frame to concatenate below this one.
- Returns:
A new TDS frame whose rows are the rows of self followed by the rows of other.
- Return type:
PandasApiTdsFrame
- Raises:
TypeError – If other is not a PandasApiBaseTdsFrame.
See also
merge: SQL join of two frames.
Notes
Differences from pandas:
In pandas, pd.concat is a top-level function that accepts a list of DataFrames. Here, concat_legend_ext is a method on a PandasApiTdsFrame and only supports vertical concatenation (UNION ALL) of two frames with the same schema.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

top = frame.head(3)
bottom = frame.head(3)
top.concat_legend_ext(bottom).to_pandas()
   Order Id  Order Date  Required Date  Shipped Date  Ship Name
0  10248     1996-07-04  1996-08-01     1996-07-16    Vins et alcools Chevalier
1  10249     1996-07-05  1996-08-16     1996-07-10    Toms Spezialitäten
2  10250     1996-07-08  1996-08-05     1996-07-12    Hanari Carnes
3  10248     1996-07-04  1996-08-01     1996-07-16    Vins et alcools Chevalier
4  10249     1996-07-05  1996-08-16     1996-07-10    Toms Spezialitäten
5  10250     1996-07-08  1996-08-05     1996-07-12    Hanari Carnes
count
- PandasApiTdsFrame.count(axis=0, numeric_only=False, **kwargs)[source]
Count non-null values in each column.
Convenience method equivalent to aggregate('count'). Returns a single-row TDS frame with the count of non-null values for every column.
- Parameters:
axis (Union[int, str]) – Only 0 / 'index' is supported.
numeric_only (bool) – Must be False. True is not supported.
**kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported.
- Returns:
A single-row TDS frame with non-null counts per column.
- Return type:
PandasApiTdsFrame
- Raises:
NotImplementedError – If any parameter is set to an unsupported value.
Notes
Internally delegates to aggregate('count'). The same pandas deviations as sum() apply (axis=1 and numeric_only=True are not supported). Unlike sum, count does not have a skipna parameter, since counting always covers non-null values only.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Count non-null values in each column
frame.count().to_pandas()
   Order Id  Order Date  Required Date  Shipped Date  Ship Name
0  830       830         830            809           830
cume_dist_legend_ext
- PandasApiTdsFrame.cume_dist_legend_ext(ascending=True)[source]
Compute the cumulative distribution of each column.
PyLegend extension — not present in pandas.
Maps to SQL CUME_DIST() OVER (ORDER BY col) and Pure cumulativeDistribution.
- Parameters:
ascending (bool) – Whether to order in ascending direction.
- Returns:
A new TDS frame with cumulative distribution values (floats between 0 and 1) replacing every column.
- Return type:
PandasApiTdsFrame
See also
rank: Compute column ranks.
ntile_legend_ext: Assign rows to numbered buckets.
Notes
Differences from pandas:
This method has no pandas equivalent. CUME_DIST is exposed as a pylegend extension.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.filter(
    items=["Order Id"]
).cume_dist_legend_ext().head(5).to_pandas()
   Order Id
0  0.001205
1  0.00241
2  0.003614
3  0.004819
4  0.006024
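The values in the output above follow from the standard SQL definition of CUME_DIST: the number of rows ordered at or before the current row, divided by the total row count. A minimal plain-Python sketch of that definition follows; `cume_dist` is a hypothetical helper, and the input of 830 consecutive ids starting at 10248 is an assumption consistent with the count, min, and max reported elsewhere in this section.

```python
def cume_dist(values, ascending=True):
    """Fraction of rows ordered at or before each value (SQL CUME_DIST semantics)."""
    n = len(values)
    if ascending:
        return [sum(1 for other in values if other <= v) / n for v in values]
    return [sum(1 for other in values if other >= v) / n for v in values]


# Assumed sample: 830 distinct, consecutive order ids.
order_ids = list(range(10248, 10248 + 830))
first_five = [round(x, 6) for x in cume_dist(order_ids)[:5]]
print(first_five)  # [0.001205, 0.00241, 0.003614, 0.004819, 0.006024]
```

With distinct values the result for the k-th smallest row is simply k / n, so the last row is always 1.0.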
drop
- PandasApiTdsFrame.drop(labels=None, axis=1, index=None, columns=None, level=None, inplace=False, errors='raise')[source]
Remove columns from the TDS frame by label.
Return a new TDS frame with the specified columns removed. Columns can be identified via labels (with axis=1) or via the columns parameter directly. Accepts a single column name, a list, tuple, or set of names.
- Parameters:
labels (Union[str, Sequence[str], Set[str], None]) – Column name(s) to drop. Mutually exclusive with columns.
axis (Union[str, int, PyLegendInteger]) – The axis to drop along. Only column-axis (1 / 'columns') is supported.
index (Union[str, Sequence[str], Set[str], None]) – Not supported. Passing any value raises NotImplementedError.
columns (Union[str, Sequence[str], Set[str], None]) – Column name(s) to drop. Mutually exclusive with labels.
level (Union[str, int, PyLegendInteger, None]) – Not supported. Passing any value raises NotImplementedError.
inplace (Union[bool, PyLegendBoolean]) – Must be False. In-place mutation is not supported.
errors (str) – If 'raise', a KeyError is raised when any label is not found. If 'ignore', missing labels are silently skipped.
- Returns:
A new TDS frame without the specified columns.
- Return type:
PandasApiTdsFrame
- Raises:
ValueError – If both labels and columns are provided, or if neither is provided. If axis is an invalid value (not 0, 1, 'index', or 'columns').
NotImplementedError – If axis is 0 / 'index' (row-level drop). If index or level is provided. If inplace is True.
KeyError – If any specified column does not exist in the frame and errors='raise'.
TypeError – If labels or columns is an unsupported type (e.g. a callable).
Notes
Differences from pandas:
In pandas, drop can remove rows (axis=0) or columns (axis=1). Here, only column-axis dropping is supported (axis=1). Passing axis=0 raises NotImplementedError.
The axis parameter defaults to 1 (columns), whereas in pandas it defaults to 0 (rows). This means bare frame.drop("col") drops a column here but would attempt to drop a row label in pandas.
The index and level parameters are not supported.
inplace=True is not supported; always returns a new frame.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Drop a single column
frame.drop(columns="Ship Name").head(3).to_pandas()
   Order Id  Order Date  Required Date  Shipped Date
0  10248     1996-07-04  1996-08-01     1996-07-16
1  10249     1996-07-05  1996-08-16     1996-07-10
2  10250     1996-07-08  1996-08-05     1996-07-12

# Drop multiple columns
frame.drop(columns=["Ship Name", "Order Date"]).head(3).to_pandas()
   Order Id  Required Date  Shipped Date
0  10248     1996-08-01     1996-07-16
1  10249     1996-08-16     1996-07-10
2  10250     1996-08-05     1996-07-12

# Using labels parameter
frame.drop(labels=["Ship Name"], axis=1).head(3).to_pandas()
   Order Id  Order Date  Required Date  Shipped Date
0  10248     1996-07-04  1996-08-01     1996-07-16
1  10249     1996-07-05  1996-08-16     1996-07-10
2  10250     1996-07-08  1996-08-05     1996-07-12

# Ignore missing columns instead of raising an error
frame.drop(columns=["Ship Name", "NonExistent"], errors="ignore").head(3).to_pandas()
   Order Id  Order Date  Required Date  Shipped Date
0  10248     1996-07-04  1996-08-01     1996-07-16
1  10249     1996-07-05  1996-08-16     1996-07-10
2  10250     1996-07-08  1996-08-05     1996-07-12
drop_duplicates
- PandasApiTdsFrame.drop_duplicates(subset=None, *, keep='first', inplace=False, ignore_index=False)[source]
Remove duplicate rows.
Returns a new TDS frame with duplicate rows removed, optionally considering only a subset of columns for identifying duplicates.
- Parameters:
subset (Union[str, List[str], None]) – Column label or list of labels to consider for identifying duplicates. If None, all columns are used.
keep (str) – Must be 'first'. Only keeping the first occurrence is supported.
inplace (bool) – Must be False. In-place modification is not supported.
ignore_index (bool) – Must be False. Not supported.
- Returns:
A new TDS frame with duplicates removed.
- Return type:
PandasApiTdsFrame
- Raises:
NotImplementedError – If keep is not 'first', or inplace / ignore_index are True.
Notes
Differences from pandas:
Only keep='first' is supported. 'last' and False are not supported.
inplace=True and ignore_index=True are not supported.
Generates SQL SELECT DISTINCT ON ... or equivalent.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Remove rows with duplicate Ship Name
frame.drop_duplicates(subset=["Ship Name"]).head(5).to_pandas()
   Order Id  Order Date  Required Date  Shipped Date  Ship Name
0  10248     1996-07-04  1996-08-01     1996-07-16    Vins et alcools Chevalier
1  10249     1996-07-05  1996-08-16     1996-07-10    Toms Spezialitäten
2  10250     1996-07-08  1996-08-05     1996-07-12    Hanari Carnes
3  10251     1996-07-08  1996-08-05     1996-07-15    Victuailles en stock
4  10252     1996-07-09  1996-08-06     1996-07-11    Suprêmes délices
dropna
- PandasApiTdsFrame.dropna(axis=0, how='any', thresh=None, subset=None, inplace=False, ignore_index=False)[source]
Remove rows with missing values.
Return a new TDS frame with rows containing NA / null values removed. The check can be scoped to specific columns via subset and controlled via how.
- Parameters:
axis (Union[int, str]) – Only 0 / 'index' (drop rows) is supported. 1 / 'columns' (drop columns) raises NotImplementedError.
how (str) –
'any': Drop the row if any of the considered columns contain a null value.
'all': Drop the row only if all of the considered columns are null.
thresh (Optional[int]) – Not supported. Passing any value raises NotImplementedError.
subset (Union[str, Sequence[str], None]) – Column names to consider when checking for nulls. If None (default), all columns are considered. An empty list with how='any' keeps all rows; an empty list with how='all' drops all rows.
inplace (bool) – Must be False. True raises NotImplementedError.
ignore_index (bool) – Must be False. True raises NotImplementedError.
- Returns:
A new TDS frame with rows containing nulls removed.
- Return type:
PandasApiTdsFrame
- Raises:
NotImplementedError – If axis=1, thresh is set, inplace=True, or ignore_index=True.
ValueError – If axis is not a recognised value or how is not 'any' or 'all'.
TypeError – If subset is not a list, tuple, or set.
KeyError – If any column in subset does not exist in the frame.
See also
fillna: Fill missing values instead of dropping rows.
Notes
Differences from pandas:
axis=1 (dropping columns with nulls) is not supported.
thresh (minimum number of non-null values to keep a row) is not supported.
inplace=True is not supported; a new frame is always returned.
ignore_index=True is not supported.
Passing an empty subset=[] with how='any' is a no-op (all rows are kept). With how='all', an empty subset=[] drops all rows (the filter becomes false).
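The empty-subset behaviour above falls out of how "any" and "all" behave over an empty collection. The sketch below models the row filter in plain Python; `keep_row` is a hypothetical helper and rows are plain dicts with `None` for null, so this shows the logic only, not the SQL that PyLegend generates.

```python
def keep_row(row, subset, how="any"):
    """Return True when the row survives dropna over the given columns."""
    nulls = [row[col] is None for col in subset]
    if how == "any":
        return not any(nulls)   # any([]) is False, so an empty subset keeps every row
    if how == "all":
        return not all(nulls)   # all([]) is True, so an empty subset drops every row
    raise ValueError("how must be 'any' or 'all'")


row = {"Order Id": 11059, "Shipped Date": None}
print(keep_row(row, ["Order Id", "Shipped Date"], how="any"))  # False
print(keep_row(row, ["Order Id", "Shipped Date"], how="all"))  # True
print(keep_row(row, [], how="any"))                            # True
print(keep_row(row, [], how="all"))                            # False
```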
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Drop rows where any column is null
frame.dropna().head(5).to_pandas()
   Order Id  Order Date  Required Date  Shipped Date  Ship Name
0  10248     1996-07-04  1996-08-01     1996-07-16    Vins et alcools Chevalier
1  10249     1996-07-05  1996-08-16     1996-07-10    Toms Spezialitäten
2  10250     1996-07-08  1996-08-05     1996-07-12    Hanari Carnes
3  10251     1996-07-08  1996-08-05     1996-07-15    Victuailles en stock
4  10252     1996-07-09  1996-08-06     1996-07-11    Suprêmes délices

# Drop rows where all columns are null
frame.dropna(how="all").head(5).to_pandas()
   Order Id  Order Date  Required Date  Shipped Date  Ship Name
0  10248     1996-07-04  1996-08-01     1996-07-16    Vins et alcools Chevalier
1  10249     1996-07-05  1996-08-16     1996-07-10    Toms Spezialitäten
2  10250     1996-07-08  1996-08-05     1996-07-12    Hanari Carnes
3  10251     1996-07-08  1996-08-05     1996-07-15    Victuailles en stock
4  10252     1996-07-09  1996-08-06     1996-07-11    Suprêmes délices

# Only consider specific columns
frame.dropna(subset=["Ship Name"]).head(5).to_pandas()
   Order Id  Order Date  Required Date  Shipped Date  Ship Name
0  10248     1996-07-04  1996-08-01     1996-07-16    Vins et alcools Chevalier
1  10249     1996-07-05  1996-08-16     1996-07-10    Toms Spezialitäten
2  10250     1996-07-08  1996-08-05     1996-07-12    Hanari Carnes
3  10251     1996-07-08  1996-08-05     1996-07-15    Victuailles en stock
4  10252     1996-07-09  1996-08-06     1996-07-11    Suprêmes délices
expanding
- PandasApiTdsFrame.expanding(min_periods=1, axis=0, method=None, order_by=None, ascending=True)[source]
Create an expanding window frame for window-aggregate computations.
An expanding window includes all rows from the start of the partition up to the current row. This is useful for running totals, running averages, and similar cumulative calculations.
- Parameters:
min_periods (int) – Minimum number of observations in the window required to have a value; otherwise, result is null.
axis (Union[int, str]) – Only 0 / 'index' is supported.
method (Optional[str]) – Must be None or 'python'.
order_by (Union[str, Sequence[str], None]) – Column(s) to order by within the window. Required for deterministic results.
ascending (Union[bool, Sequence[bool]]) – Sort order for the order_by columns.
- Returns:
A window frame on which window aggregates (sum, mean, min, max, etc.) can be called.
- Return type:
PandasApiWindowTdsFrame
- Raises:
NotImplementedError – If axis is not 0, method is not None or 'python', or min_periods is less than 1.
Notes
Differences from pandas:
order_by and ascending are pylegend extensions not present in pandas. They control the ORDER BY clause inside the SQL OVER(...) window specification.
axis=1 is not supported.
method='table' is not supported.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Running sum of Order Id ordered by Order Id
frame.filter(items=["Order Id"]).expanding(
    order_by="Order Id"
).aggregate("sum").head(5).to_pandas()
   Order Id
0  10248
1  20497
2  30747
3  40998
4  51250
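The running sum shown above is what an expanding window computes: each row aggregates everything from the start of the ordered data up to itself. A minimal plain-Python sketch using itertools.accumulate follows; `expanding_sum` is a hypothetical helper, and its handling of min_periods is an assumption based on the parameter description (a null until enough observations exist).

```python
from itertools import accumulate


def expanding_sum(values, min_periods=1):
    """Running total over values sorted ascending; None until min_periods rows are seen."""
    ordered = sorted(values)
    totals = accumulate(ordered)
    return [
        total if position >= min_periods else None
        for position, total in enumerate(totals, start=1)
    ]


order_ids = [10250, 10248, 10252, 10249, 10251]  # deliberately unsorted
print(expanding_sum(order_ids))
# [10248, 20497, 30747, 40998, 51250]
print(expanding_sum(order_ids, min_periods=3))
# [None, None, 30747, 40998, 51250]
```

Sorting first plays the role of order_by: without a defined order the running totals would depend on whatever order the rows happen to arrive in.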
fillna
- PandasApiTdsFrame.fillna(value=None, axis=0, inplace=False, limit=None)[source]
Fill missing values with a specified value.
Replace NA / null entries in the TDS frame. A scalar value is applied to every column; a dict maps specific columns to their fill values (columns not present in the dict are left unchanged). Implemented via COALESCE at the SQL level.
- Parameters:
value (Union[int, float, str, bool, date, datetime, Dict[str, Union[int, float, str, bool, date, datetime]]]) – Value(s) to replace nulls with. Accepted scalar types are int, float, str, bool, date, and datetime. When a dict is provided, keys must be column name strings and values must be scalars of the above types. Columns in the dict that do not exist in the frame are silently ignored. Omitting value entirely raises ValueError.
axis (Union[str, int, None]) – Only 0 / 'index' is supported. 1 / 'columns' raises NotImplementedError.
inplace (bool) – Must be False. True raises NotImplementedError.
limit (Optional[int]) – Not supported. Passing any value raises NotImplementedError.
- Returns:
A new TDS frame with null values replaced.
- Return type:
PandasApiTdsFrame
- Raises:
ValueError – If value is not provided. If axis is not a recognised value.
TypeError – If value is not a scalar or dict. If dict keys are not strings or dict values are not scalars.
NotImplementedError – If axis=1, inplace=True, or limit is set.
See also
dropna: Remove rows with missing values.
Notes
Differences from pandas:
The method parameter ('ffill', 'bfill') available in older pandas versions is not present.
inplace=True is not supported; a new frame is always returned.
limit (maximum number of consecutive nulls to fill) is not supported.
axis=1 (fill along columns) is not supported.
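The scalar-versus-dict behaviour of fillna can be sketched with a COALESCE-style helper in plain Python. This is an illustration of the documented semantics only: `coalesce` and `fillna` below are hypothetical functions over a dict-of-lists frame, with `None` standing in for null.

```python
def coalesce(value, default):
    """Return default when value is null (None), like SQL COALESCE(value, default)."""
    return default if value is None else value


def fillna(columns, value=None):
    """Fill nulls with a scalar (all columns) or a dict (named columns only)."""
    if value is None:
        raise ValueError("value must be provided")
    # Dict keys that name no existing column are never looked up, so they are ignored.
    fills = value if isinstance(value, dict) else {name: value for name in columns}
    return {
        name: [coalesce(v, fills[name]) for v in values] if name in fills else list(values)
        for name, values in columns.items()
    }


frame = {"a": [1, None], "b": [None, "x"]}
print(fillna(frame, {"a": 0, "missing": 9}))  # {'a': [1, 0], 'b': [None, 'x']}
print(fillna(frame, 0))                       # {'a': [1, 0], 'b': [0, 'x']}
```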
Examples
import datetime
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()
frame = frame.sort_values("Shipped Date")
frame = frame.head()

# Inspect the first rows, whose "Shipped Date" values are null
frame.to_pandas()
   Order Id  Order Date  Required Date  Shipped Date  Ship Name
0  11059     1998-04-29  1998-06-10     NaT           Ricardo Adocicados
1  11058     1998-04-29  1998-05-27     NaT           Blauer See Delikatessen
2  11054     1998-04-28  1998-05-26     NaT           Cactus Comidas para llevar
3  11051     1998-04-27  1998-05-25     NaT           La maison d'Asie
4  11045     1998-04-23  1998-05-21     NaT           Bottom-Dollar Markets

# Fill all null values of the "Shipped Date" column with a fixed date
frame = frame.fillna({
    "Shipped Date": datetime.date(1, 1, 1)
})
frame.to_pandas()
filter
- PandasApiTdsFrame.filter(items=None, like=None, regex=None, axis=None)[source]
Select columns by label, substring match, or regular expression.
This method selects columns from the TDS frame based on their names. Exactly one of items, like, or regex must be provided; they are mutually exclusive.
- Parameters:
items (Optional[List[str]]) – Exact column names to keep. All names must exist in the frame.
like (Optional[str]) – Keep columns whose names contain this substring.
regex (Optional[str]) – Keep columns whose names match this regular expression (uses re.search).
axis (Union[str, int, PyLegendInteger, None]) – The axis to filter on. Only column-axis filtering is supported. Defaults to 1 (columns) when omitted.
- Returns:
A new TDS frame containing only the selected columns.
- Return type:
PandasApiTdsFrame
- Raises:
TypeError – If more than one of items, like, or regex is provided, or if none of them is provided. If items is a string instead of a list, or if like / regex is not a string.
ValueError – If axis is not 1 or 'columns'. If any name in items does not exist in the frame. If no columns match the like substring or regex pattern. If the regex pattern is invalid.
Notes
Differences from pandas:
In pandas, filter supports both row-axis (axis=0) and column-axis (axis=1) filtering. Here, only column-axis filtering is supported (axis=1 or axis='columns'). Passing axis=0 or 'index' raises ValueError.
In pandas, items silently ignores names that do not exist in the frame. Here, all names must exist; unknown names raise a ValueError listing the missing and available columns.
In pandas, like and regex return an empty DataFrame when no columns match. Here, they raise ValueError when no columns match.
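The selection rules and the stricter error behaviour can be summarised in a short plain-Python model. `select_columns` is a hypothetical helper written for illustration; it follows the rules stated in this section (exactly one selector, re.search for regex, ValueError on unknown or unmatched names), while returning items in the order given is an assumption borrowed from pandas.

```python
import re


def select_columns(names, items=None, like=None, regex=None):
    """Pick column names by exact list, substring, or regular expression."""
    provided = [arg for arg in (items, like, regex) if arg is not None]
    if len(provided) != 1:
        raise TypeError("exactly one of items, like, or regex must be provided")
    if items is not None:
        missing = [name for name in items if name not in names]
        if missing:
            raise ValueError(f"columns not found: {missing}; available: {names}")
        return list(items)  # assumption: result follows the order of items
    if like is not None:
        selected = [name for name in names if like in name]
    else:
        selected = [name for name in names if re.search(regex, name)]
    if not selected:
        raise ValueError("no columns match")
    return selected


columns = ["Order Id", "Order Date", "Required Date", "Shipped Date", "Ship Name"]
print(select_columns(columns, like="Ship"))     # ['Shipped Date', 'Ship Name']
print(select_columns(columns, regex="Name$"))   # ['Ship Name']
```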
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Select specific columns by name
frame.filter(items=["Order Id", "Ship Name"]).head(3).to_pandas()
   Order Id  Ship Name
0  10248     Vins et alcools Chevalier
1  10249     Toms Spezialitäten
2  10250     Hanari Carnes

# Select columns whose names contain a substring
frame.filter(like="Ship").head(3).to_pandas()
   Shipped Date  Ship Name
0  1996-07-16    Vins et alcools Chevalier
1  1996-07-10    Toms Spezialitäten
2  1996-07-12    Hanari Carnes

# Select columns matching a regex pattern
frame.filter(regex="^Ship").head(3).to_pandas()
   Shipped Date  Ship Name
0  1996-07-16    Vins et alcools Chevalier
1  1996-07-10    Toms Spezialitäten
2  1996-07-12    Hanari Carnes

# Chain filters to progressively narrow columns
frame.filter(like="Ship").filter(regex="Name$").head(3).to_pandas()
   Ship Name
0  Vins et alcools Chevalier
1  Toms Spezialitäten
2  Hanari Carnes
groupby
- PandasApiTdsFrame.groupby(by, level=None, as_index=False, sort=True, group_keys=False, observed=False, dropna=False)[source]
Group the TDS frame by one or more columns.
Return a PandasApiGroupbyTdsFrame object that can be used to apply aggregation functions (sum, mean, min, max, std, var, count, or the general aggregate / agg) and OLAP window functions (rank) to each group. Column selection after grouping is supported via bracket notation (e.g. frame.groupby("A")["B"].sum()).
The groupby columns act as the PARTITION BY clause in the underlying SQL when window functions such as rank are used.
- Parameters:
by (Union[str, List[str]]) – Column name or list of column names to group by. All names must exist in the current frame.
level (Union[str, int, List[str], None]) – Not supported. Passing any value raises NotImplementedError. Use by instead.
as_index (bool) – Must be False. Setting to True raises NotImplementedError.
sort (bool) – Whether to sort the result by the grouping columns after aggregation.
group_keys (bool) – Must be False. Setting to True raises NotImplementedError.
observed (bool) – Must be False. Setting to True raises NotImplementedError.
dropna (bool) – Must be False. Setting to True raises NotImplementedError.
- Returns:
A groupby object on which aggregation and window methods can be called. See PandasApiGroupbyTdsFrame for the full list of available methods.
- Return type:
PandasApiGroupbyTdsFrame
- Raises:
NotImplementedError – If level, as_index=True, group_keys=True, observed=True, or dropna=True is provided.
TypeError – If by is not a string or list of strings.
ValueError – If by is an empty list.
KeyError – If any column in by does not exist in the frame.
Notes
Differences from pandas:
as_index defaults to False and must be False. In pandas it defaults to True. This means the grouping columns always appear as regular columns in the result, never as the index.
group_keys, observed, and dropna must all be False; their True variants are not supported.
level (grouping by index level) is not supported.
The groupby object supports column selection via [col] (returns a GroupbySeries) or [[col1, col2]] (returns a narrowed PandasApiGroupbyTdsFrame), matching the pandas pattern frame.groupby(...)["col"].sum().
When sort=True (default), the result is sorted by the grouping columns in ascending order after aggregation.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Group by a single column and count
frame.groupby("Ship Name")["Order Id"].count().head(5).to_pandas()

# Group by a column and sum a numeric column
frame.groupby("Ship Name")["Order Id"].sum().head(5).to_pandas()

# Group by a column with dict-based aggregation
frame.groupby("Ship Name").agg({"Order Id": "count"}).head(5).to_pandas()
Note
The returned PandasApiGroupbyTdsFrame
object has its own set of aggregation and window methods whose signatures
may differ from the frame-level equivalents. See Pandas Groupby TDS Frame
for the full API reference.
head
- PandasApiTdsFrame.head(n=5)[source]
Return the first n rows of the TDS frame.
This function returns the first n rows from the frame. It is useful for quickly inspecting the data without loading the entire dataset.
- Parameters:
n (int) – Number of rows to return. Must be a non-negative integer. Passing a negative value raises NotImplementedError. Passing a non-int type raises TypeError.
- Returns:
A new TDS frame containing only the first n rows.
- Return type:
PandasApiTdsFrame
- Raises:
TypeError – If n is not an int.
NotImplementedError – If n is negative.
Notes
Differences from pandas:
Negative values for n are not supported. In pandas, head(-n) returns all rows except the last n. Here, passing a negative value raises NotImplementedError.
The operation is lazy: it builds a query rather than materialising rows in memory. Call to_pandas() or execute_frame_to_string() to materialise the result.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Get first 5 rows (default)
frame.head().to_pandas()

   Order Id  Order Date  Required Date  Shipped Date                  Ship Name
0     10248  1996-07-04     1996-08-01    1996-07-16  Vins et alcools Chevalier
1     10249  1996-07-05     1996-08-16    1996-07-10         Toms Spezialitäten
2     10250  1996-07-08     1996-08-05    1996-07-12              Hanari Carnes
3     10251  1996-07-08     1996-08-05    1996-07-15       Victuailles en stock
4     10252  1996-07-09     1996-08-06    1996-07-11           Suprêmes délices

# Get first 3 rows
frame.head(3).to_pandas()

   Order Id  Order Date  Required Date  Shipped Date                  Ship Name
0     10248  1996-07-04     1996-08-01    1996-07-16  Vins et alcools Chevalier
1     10249  1996-07-05     1996-08-16    1996-07-10         Toms Spezialitäten
2     10250  1996-07-08     1996-08-05    1996-07-12              Hanari Carnes
iloc
- property PandasApiTdsFrame.iloc: PandasApiIlocIndexer
Purely integer-location based indexing for selection by position.
Access rows and columns by integer position (0-based). Returns a PandasApiIlocIndexer that supports [] notation.
Allowed inputs:
An integer: selects a single row (e.g. frame.iloc[5]).
A slice with ints: selects a range of rows (e.g. frame.iloc[1:7]). Only step=1 (or None) is supported.
A tuple of (rows, cols): selects rows and columns simultaneously (e.g. frame.iloc[1:5, 0:2]). Each element can be an int or a slice.
- Returns:
An indexer object supporting [] notation that returns a new PandasApiTdsFrame.
- Return type:
PandasApiIlocIndexer
- Raises:
IndexError – If more than two indexers are provided. If a column integer index is out of bounds.
NotImplementedError – If a slice step other than 1 is used for rows or columns. If a list, boolean array, or callable is used as an indexer.
Notes
Differences from pandas:
Only int and slice indexers are supported. Lists of integers, boolean arrays, and callable indexers raise NotImplementedError.
Slice steps other than 1 are not supported.
Negative integer indexing for rows is handled via truncate, so it follows truncate’s limitations.
When a single integer row index exceeds the number of rows, an empty frame is returned (no IndexError is raised, unlike pandas).
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Select a single row
frame.iloc[0].to_pandas()

   Order Id  Order Date  Required Date  Shipped Date                  Ship Name
0     10248  1996-07-04     1996-08-01    1996-07-16  Vins et alcools Chevalier

# Select a range of rows and columns
frame.iloc[1:4, 0:2].to_pandas()

   Order Id  Order Date
0     10249  1996-07-05
1     10250  1996-07-08
2     10251  1996-07-08

# Select a single row, all columns
frame.iloc[2, :].to_pandas()

   Order Id  Order Date  Required Date  Shipped Date      Ship Name
0     10250  1996-07-08     1996-08-05    1996-07-12  Hanari Carnes
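The row-indexer rules described for iloc can be sketched in plain Python. The helper below is hypothetical and is not part of pylegend; it only illustrates one way a non-negative integer or a step-1 slice could be translated into an offset/limit pair. Negative positions, which the library routes through truncate, are deliberately left out of the sketch.

```python
from typing import Optional, Tuple, Union


def rows_to_offset_limit(key: Union[int, slice]) -> Tuple[int, Optional[int]]:
    """Hypothetical helper: map an iloc-style row indexer to (offset, limit).

    A limit of None means the slice has no upper bound.
    """
    if isinstance(key, bool) or not isinstance(key, (int, slice)):
        # Lists, boolean arrays and callables are rejected, as in the docs.
        raise NotImplementedError(f"unsupported indexer: {type(key).__name__}")
    if isinstance(key, int):
        if key < 0:
            raise NotImplementedError("negative positions are outside this sketch")
        return key, 1
    if key.step not in (None, 1):
        raise NotImplementedError("only step=1 slices are supported")
    start = key.start or 0
    if start < 0 or (key.stop is not None and key.stop < 0):
        raise NotImplementedError("negative positions are outside this sketch")
    limit = None if key.stop is None else max(key.stop - start, 0)
    return start, limit


print(rows_to_offset_limit(0))            # (0, 1)
print(rows_to_offset_limit(slice(1, 4)))  # (1, 3)
```

Under this reading, an integer position past the last row simply yields an offset with nothing after it, which is consistent with the documented behaviour of returning an empty frame rather than raising IndexError.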
info
- PandasApiTdsFrame.info(verbose=None, buf=None, max_cols=None, memory_usage=None, show_counts=None)[source]
Print a concise summary of the TDS frame.
Displays the column names and their data types. This is a lightweight alternative to running a query — it uses only the metadata already available on the frame.
- Parameters:
verbose (Optional[bool]) – Not supported. Ignored.
buf (Union[IO[str],StringIO,None]) – Not supported. Output always goes to stdout.
max_cols (Optional[int]) – Not supported. Ignored.
memory_usage (Union[bool,str,None]) – Not supported. Ignored.
show_counts (Optional[bool]) – Not supported. Ignored.
- Returns:
Prints to stdout; returns nothing.
- Return type:
None
Notes
Differences from pandas:
Only column names and types are shown.
memory_usage, verbose, buf, max_cols, and show_counts are accepted for API compatibility but ignored.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.info()

<class 'pylegend.extensions.tds.pandas_api.frames.pandas_api_legend_service_input_frame.PandasApiLegendServiceInputFrame'>
RangeIndex: 830 entries
Data columns (total 5 columns):
#  Column         Non-Null Count  Dtype
-  -------------  --------------  ----------
0  Order Id       830 non-null    Integer
1  Order Date     830 non-null    StrictDate
2  Required Date  830 non-null    StrictDate
3  Shipped Date   809 non-null    StrictDate
4  Ship Name      830 non-null    String
dtypes: Integer(1), StrictDate(3), String(1)
join
- PandasApiTdsFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False, validate=None)[source]
Join this TDS frame with another on shared column(s).
Convenience method that delegates to merge(). The lsuffix and rsuffix parameters are mapped to the suffixes parameter of merge, and on is passed directly.
- Parameters:
other (PandasApiTdsFrame) – The right TDS frame to join with.
on (Union[str,Sequence[str],None]) – Column name(s) to join on. Must exist in both frames. Unlike pandas join, this parameter specifies column names, not index labels.
how (Optional[str]) – Type of join. See merge() for details.
lsuffix (str) – Suffix to apply to overlapping column names from the left frame.
rsuffix (str) – Suffix to apply to overlapping column names from the right frame.
sort (Optional[bool]) – If True, sort the result by the join keys.
validate (Optional[str]) – Not supported. Passing any value raises NotImplementedError.
- Returns:
A new TDS frame containing the joined result.
- Return type:
PandasApiTdsFrame
- Raises:
ValueError – If overlapping column names exist and lsuffix/rsuffix do not resolve the conflict.
NotImplementedError – If validate is set.
See also
merge: The underlying merge method with full parameter control.
Notes
Differences from pandas:
In pandas, DataFrame.join joins on the index by default, optionally using on to specify a column in the left frame to match against the right frame’s index. Here, join is purely column-on-column and delegates directly to merge(on=on). There is no index-based joining.
The lsuffix and rsuffix parameters correspond to suffixes=(lsuffix, rsuffix) in merge. In pandas, default suffixes are empty strings (raising on conflict); here they also default to empty strings.
Because this delegates to merge, all limitations of merge apply: no self-join, no left_index/right_index, no indicator, and no validate.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Create a second frame with renamed columns
frame2 = pylegend.samples.pandas_api.northwind_orders_frame()
frame2 = frame2.rename({"Order Id": "Right Order Id"})

# Left join on a common key
frame.head(5).join(
    frame2.head(5),
    on="Ship Name",
    how="left",
    lsuffix="_left",
    rsuffix="_right"
).to_pandas()

   Order Id  Order Date_left  Required Date_left  Shipped Date_left                  Ship Name  Right Order Id  Order Date_right  Required Date_right  Shipped Date_right
0     10248       1996-07-04          1996-08-01         1996-07-16  Vins et alcools Chevalier           10248        1996-07-04           1996-08-01          1996-07-16
1     10249       1996-07-05          1996-08-16         1996-07-10         Toms Spezialitäten           10249        1996-07-05           1996-08-16          1996-07-10
2     10250       1996-07-08          1996-08-05         1996-07-12              Hanari Carnes           10250        1996-07-08           1996-08-05          1996-07-12
3     10251       1996-07-08          1996-08-05         1996-07-15       Victuailles en stock           10251        1996-07-08           1996-08-05          1996-07-15
4     10252       1996-07-09          1996-08-06         1996-07-11           Suprêmes délices           10252        1996-07-09           1996-08-06          1996-07-11
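The delegation from join to merge described in the Notes amounts to a small argument translation. The function below is a hypothetical illustration, not pylegend's actual code; its name and return shape are assumptions chosen only to make the mapping explicit.

```python
from typing import Any, Dict, Sequence, Union


def join_to_merge_kwargs(
    on: Union[str, Sequence[str], None] = None,
    how: str = "left",
    lsuffix: str = "",
    rsuffix: str = "",
    sort: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: build merge() keyword arguments from join() arguments."""
    return {
        "on": on,                        # passed through unchanged
        "how": how,                      # join defaults to 'left', merge to 'inner'
        "suffixes": (lsuffix, rsuffix),  # the two suffixes become one tuple
        "sort": sort,
    }


print(join_to_merge_kwargs(on="Ship Name", lsuffix="_left", rsuffix="_right"))
```

With the arguments from the example above, the sketch produces suffixes=("_left", "_right"), which matches the Order Date_left / Order Date_right column names in the output.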
loc
- property PandasApiTdsFrame.loc: PandasApiLocIndexer
Access rows and columns by label-based indexing or boolean conditions.
Returns a PandasApiLocIndexer that supports [] notation for combined row filtering and column selection.
Row selection (first indexer):
Complete slice (:): Select all rows.
Boolean expression: A PyLegendBoolean expression built from column comparisons (e.g. frame['col'] > 5), used as a WHERE filter.
Callable: A function that receives the frame and returns a PyLegendBoolean expression (e.g. lambda x: x['col'] > 5).
Column selection (second indexer):
str: A single column name (e.g. 'col1').
list of str: Multiple column names.
list of bool: Boolean mask over columns (must match the number of columns exactly).
slice of str: Label-based column slice (e.g. 'col1':'col3'), inclusive on both ends.
Complete slice (:): Select all columns.
- Returns:
An indexer object supporting [] notation that returns a new PandasApiTdsFrame.
- Return type:
PandasApiLocIndexer
- Raises:
IndexError – If more than two indexers are provided. If a boolean column mask has the wrong length.
TypeError – If a label-based slice is used for rows (only : is allowed). If a list of integers, a set, or another unsupported type is used for row or column selection.
KeyError – If a column name in a list does not exist in the frame.
Notes
Differences from pandas:
For row selection, only :, boolean expressions, and callables are supported. Integer label selection, integer slicing, and list-of-integer selection are not supported.
Label-based row slicing (e.g. frame.loc[2:5]) is not supported; only the complete slice : is allowed.
For column selection, string labels, lists of strings, boolean masks, and label-based slices are supported. Label slices use pandas.Index.slice_indexer internally, so slice semantics are inclusive on both ends (matching pandas loc behaviour).
If a label-based column slice resolves to an empty selection, an empty frame (zero rows) is returned via head(0).
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Select specific columns
frame.loc[:, "Ship Name"].head(3).to_pandas()

                   Ship Name
0  Vins et alcools Chevalier
1         Toms Spezialitäten
2              Hanari Carnes

# Filter rows with a boolean condition and select columns
frame.loc[frame["Order Id"] > 10300, ["Order Id", "Ship Name"]].head(5).to_pandas()

   Order Id               Ship Name
0     10301       Die Wandernde Kuh
1     10302        Suprêmes délices
2     10303     Godos Cocina Típica
3     10304     Tortuga Restaurante
4     10305  Old World Delicatessen

# Filter rows with a callable
frame.loc[
    lambda x: x["Ship Name"].startswith("A"),
    ["Order Id", "Ship Name"]
].head(5).to_pandas()

   Order Id                           Ship Name
0     10308  Ana Trujillo Emparedados y helados
1     10355                     Around the Horn
2     10365             Antonio Moreno Taquería
3     10383                     Around the Horn
4     10453                     Around the Horn

# Boolean column mask (one entry per column; the sample frame has five columns)
frame.loc[:, [True, False, False, False, True]].head(3).to_pandas()
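The inclusive label-slice rule for columns is easy to get wrong when coming from positional slicing. The sketch below uses a plain Python list of column names rather than pandas.Index.slice_indexer, so it is an approximation of the documented behaviour, and slice_columns is a hypothetical name.

```python
from typing import List, Optional


def slice_columns(columns: List[str], start: Optional[str], stop: Optional[str]) -> List[str]:
    """Hypothetical helper: label-based column slice, inclusive on both ends."""
    # list.index raises ValueError for an unknown label in this sketch.
    lo = 0 if start is None else columns.index(start)
    hi = len(columns) - 1 if stop is None else columns.index(stop)
    # +1 makes the stop label part of the result, unlike a positional slice.
    return columns[lo:hi + 1]


cols = ["Order Id", "Order Date", "Required Date", "Shipped Date", "Ship Name"]
print(slice_columns(cols, "Order Date", "Shipped Date"))
# ['Order Date', 'Required Date', 'Shipped Date']
```

When the start label comes after the stop label, the sketch returns an empty list, which corresponds to the empty-selection case mentioned in the Notes.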
max
- PandasApiTdsFrame.max(axis=0, skipna=True, numeric_only=False, **kwargs)[source]
Compute the maximum value of each column.
Convenience method equivalent to aggregate('max'). Returns a single-row TDS frame with the maximum value of every column. For string columns, returns the lexicographically largest value.
- Parameters:
axis (Union[int,str]) – Only 0 / 'index' is supported.
skipna (bool) – Must be True. False is not supported.
numeric_only (bool) – Must be False. True is not supported.
**kwargs (Union[int,float,str,bool,date,datetime,Decimal,PyLegendPrimitive]) – Not supported.
- Returns:
A single-row TDS frame with column maximums.
- Return type:
PandasApiTdsFrame
- Raises:
NotImplementedError – If any parameter is set to an unsupported value.
Notes
Internally delegates to aggregate('max'). The same pandas deviations as sum() apply.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Maximum of each column
frame.filter(items=["Order Id"]).max().to_pandas()

   Order Id
0     11077
mean
- PandasApiTdsFrame.mean(axis=0, skipna=True, numeric_only=False, **kwargs)[source]
Compute the mean of each column.
Convenience method equivalent to aggregate('mean'). Returns a single-row TDS frame with the arithmetic mean of every column.
- Parameters:
axis (Union[int,str]) – Only 0 / 'index' is supported.
skipna (bool) – Must be True. False is not supported.
numeric_only (bool) – Must be False. True is not supported.
**kwargs (Union[int,float,str,bool,date,datetime,Decimal,PyLegendPrimitive]) – Not supported.
- Returns:
A single-row TDS frame with column means.
- Return type:
PandasApiTdsFrame
- Raises:
NotImplementedError – If any parameter is set to an unsupported value.
Notes
Internally delegates to aggregate('mean'). The same pandas deviations as sum() apply (skipna=False, numeric_only=True, axis=1 are not supported).
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Mean of numeric columns
frame.filter(items=["Order Id"]).mean().to_pandas()

   Order Id
0   10662.5
merge
- PandasApiTdsFrame.merge(other, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=False, suffixes=('_x', '_y'), indicator=False, validate=None)[source]
Merge this TDS frame with another using a database-style join.
Combine two frames column-wise based on common columns or explicit key specifications. Supports inner, left, right, outer (full), and cross joins.
- Parameters:
other (PandasApiTdsFrame) – The right TDS frame to merge with. Must be a different frame instance; merging a frame with itself raises NotImplementedError.
how (Optional[str]) – Type of merge:
'inner': Only rows with matching keys in both frames.
'left': All rows from the left frame, NaN-filled for non-matching right rows.
'right': All rows from the right frame, NaN-filled for non-matching left rows.
'outer': All rows from both frames (FULL OUTER JOIN).
'cross': Cartesian product of both frames. No join keys may be specified.
on (Union[str,Sequence[str],None]) – Column name(s) to join on. Must exist in both frames. Mutually exclusive with left_on/right_on.
left_on (Union[str,Sequence[str],None]) – Column name(s) from the left frame to join on.
right_on (Union[str,Sequence[str],None]) – Column name(s) from the right frame to join on. Must have the same length as left_on.
left_index (Optional[bool]) – Not supported. Setting to True raises NotImplementedError.
right_index (Optional[bool]) – Not supported. Setting to True raises NotImplementedError.
sort (Optional[bool]) – If True, sort the result by the join keys in ascending order.
suffixes (Union[Tuple[Optional[str],Optional[str]],List[Optional[str]],None]) – Suffixes to apply to overlapping non-key column names from the left and right frames respectively. Use None to indicate that the column name from the respective frame should be left as-is (will raise if this causes duplicates).
indicator (Union[bool,str,None]) – Not supported. Setting to a truthy value raises NotImplementedError.
validate (Optional[str]) – Not supported. Passing any value raises NotImplementedError.
- Returns:
A new TDS frame containing the merged result.
- Return type:
PandasApiTdsFrame
- Raises:
TypeError – If other is not a PandasApiTdsFrame. If how, on, left_on, right_on, suffixes, or sort have invalid types.
ValueError – If both on and left_on/right_on are specified. If left_on and right_on have different lengths. If no merge keys can be resolved and how is not 'cross'. If how='cross' is used with on/left_on/right_on. If how is not a recognised join method. If the resulting columns contain duplicates after suffix application.
KeyError – If a key specified in on, left_on, or right_on does not exist in the corresponding frame.
NotImplementedError – If left_index=True, right_index=True, indicator is truthy, validate is set, or the frame is merged with itself.
See also
join: Convenience wrapper around merge with simpler syntax.
Notes
Differences from pandas:
Self-merge is not supported. Merging a frame with itself raises NotImplementedError.
Index-based merging is not supported. left_index and right_index must be False.
indicator and validate parameters are not supported.
When no join keys are provided (and how is not 'cross'), the merge infers keys from the intersection of column names between the two frames. If no common columns exist, a ValueError is raised (unlike pandas, which would raise a MergeError).
how='outer' maps to a FULL OUTER JOIN at the SQL level.
how='cross' is implemented as a CROSS JOIN in SQL, but mapped to JoinKind.INNER with a 1==1 condition in the PURE query representation.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Create a second frame for joining
frame2 = pylegend.samples.pandas_api.northwind_orders_frame()
frame2 = frame2.rename({"Order Id": "Right Order Id"})

# Inner merge on a common column
frame.head(5).merge(
    frame2.head(5),
    how="inner",
    left_on="Order Id",
    right_on="Right Order Id"
).to_pandas()

   Order Id  Order Date_x  Required Date_x  Shipped Date_x                Ship Name_x  Right Order Id  Order Date_y  Required Date_y  Shipped Date_y                Ship Name_y
0     10248    1996-07-04       1996-08-01      1996-07-16  Vins et alcools Chevalier           10248    1996-07-04       1996-08-01      1996-07-16  Vins et alcools Chevalier
1     10249    1996-07-05       1996-08-16      1996-07-10         Toms Spezialitäten           10249    1996-07-05       1996-08-16      1996-07-10         Toms Spezialitäten
2     10250    1996-07-08       1996-08-05      1996-07-12              Hanari Carnes           10250    1996-07-08       1996-08-05      1996-07-12              Hanari Carnes
3     10251    1996-07-08       1996-08-05      1996-07-15       Victuailles en stock           10251    1996-07-08       1996-08-05      1996-07-15       Victuailles en stock
4     10252    1996-07-09       1996-08-06      1996-07-11           Suprêmes délices           10252    1996-07-09       1996-08-06      1996-07-11           Suprêmes délices
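The key-resolution rules listed under Parameters, Raises and Notes can be summarised in a short sketch. The function below is a hypothetical reconstruction from the documented behaviour and is not pylegend's implementation; the order of the checks and the error messages are assumptions.

```python
from typing import List, Sequence, Tuple, Union

Keys = Union[str, Sequence[str], None]


def _as_list(keys: Keys) -> List[str]:
    if keys is None:
        return []
    return [keys] if isinstance(keys, str) else list(keys)


def resolve_merge_keys(
    left_cols: Sequence[str],
    right_cols: Sequence[str],
    how: str = "inner",
    on: Keys = None,
    left_on: Keys = None,
    right_on: Keys = None,
) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: decide which columns each side is joined on."""
    if on is not None and (left_on is not None or right_on is not None):
        raise ValueError("on is mutually exclusive with left_on/right_on")
    left_keys = _as_list(on if on is not None else left_on)
    right_keys = _as_list(on if on is not None else right_on)
    if how == "cross":
        if left_keys or right_keys:
            raise ValueError("a cross join takes no join keys")
        return [], []
    if len(left_keys) != len(right_keys):
        raise ValueError("left_on and right_on must have the same length")
    if not left_keys:
        # No keys given: infer them from the shared column names.
        common = [c for c in left_cols if c in right_cols]
        if not common:
            raise ValueError("no common columns to merge on")
        left_keys, right_keys = common, list(common)
    for key, cols in [(k, left_cols) for k in left_keys] + [(k, right_cols) for k in right_keys]:
        if key not in cols:
            raise KeyError(key)
    return left_keys, right_keys


left = ["Order Id", "Order Date", "Ship Name"]
right = ["Right Order Id", "Order Date", "Ship Name"]
print(resolve_merge_keys(left, right))
# (['Order Date', 'Ship Name'], ['Order Date', 'Ship Name'])
```

Explicit keys skip the inference step: calling resolve_merge_keys(left, right, left_on="Order Id", right_on="Right Order Id") returns (['Order Id'], ['Right Order Id']), the situation shown in the merge example above.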
min
- PandasApiTdsFrame.min(axis=0, skipna=True, numeric_only=False, **kwargs)[source]
Compute the minimum value of each column.
Convenience method equivalent to aggregate('min'). Returns a single-row TDS frame with the minimum value of every column. For string columns, returns the lexicographically smallest value.
- Parameters:
axis (Union[int,str]) – Only 0 / 'index' is supported.
skipna (bool) – Must be True. False is not supported.
numeric_only (bool) – Must be False. True is not supported.
**kwargs (Union[int,float,str,bool,date,datetime,Decimal,PyLegendPrimitive]) – Not supported.
- Returns:
A single-row TDS frame with column minimums.
- Return type:
PandasApiTdsFrame
- Raises:
NotImplementedError – If any parameter is set to an unsupported value.
Notes
Internally delegates to aggregate('min'). The same pandas deviations as sum() apply.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Minimum of each column
frame.filter(items=["Order Id"]).min().to_pandas()

   Order Id
0     10248
ntile_legend_ext
- PandasApiTdsFrame.ntile_legend_ext(num_buckets, ascending=True)[source]
Assign rows to numbered buckets for each column.
PyLegend extension — not present in pandas.
Maps to SQL NTILE(n) OVER (ORDER BY col) and Pure ntile.
- Parameters:
num_buckets (int) – Number of buckets to distribute rows into.
ascending (bool) – Whether to order in ascending direction.
- Returns:
A new TDS frame with integer bucket numbers (1-based) replacing every column.
- Return type:
PandasApiTdsFrame
See also
rank: Compute column ranks.
cume_dist_legend_ext: Cumulative distribution.
Notes
Differences from pandas:
This method has no pandas equivalent. NTILE is exposed as a pylegend extension.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

frame.filter(
    items=["Order Id"]
).ntile_legend_ext(4).head(5).to_pandas()

   Order Id
0         1
1         1
2         1
3         1
4         1
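To see why the first five rows above all land in bucket 1, the bucket assignment can be reproduced in plain Python. This sketch follows standard SQL NTILE semantics, where the first (rows mod buckets) buckets receive one extra row; that the backing database behaves exactly this way is an assumption, and ntile_buckets is a hypothetical name.

```python
from typing import List


def ntile_buckets(n_rows: int, num_buckets: int) -> List[int]:
    """Bucket number (1-based) for each row position in sort order."""
    base, extra = divmod(n_rows, num_buckets)
    buckets: List[int] = []
    for bucket in range(1, num_buckets + 1):
        # The first `extra` buckets are one row larger than the rest.
        size = base + (1 if bucket <= extra else 0)
        buckets.extend([bucket] * size)
    return buckets


assignment = ntile_buckets(830, 4)   # the sample frame has 830 rows
print(assignment[:5])                                   # [1, 1, 1, 1, 1]
print([assignment.count(b) for b in range(1, 5)])       # [208, 208, 207, 207]
```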
range_between
- PandasApiTdsFrame.range_between(start=None, end=None, *, duration_start=None, duration_start_unit=None, duration_end=None, duration_end_unit=None)[source]
Create a RANGE BETWEEN window-frame specification.
PyLegend extension; not present in pandas.
Supports two calling styles:
Simple numeric bounds (same sign convention as rows_between()):
range_between(start=-100, end=0)  # → RANGE BETWEEN 100 PRECEDING AND CURRENT ROW
Duration-based bounds (for date/time ORDER BY columns):
range_between(
    duration_start=-1, duration_start_unit="DAYS",
    duration_end=1, duration_end_unit="MONTHS",
)
- Parameters:
start (Union[int,float,Decimal,None]) – Lower bound of the range. None means unbounded preceding.
end (Union[int,float,Decimal,None]) – Upper bound of the range. None means unbounded following.
duration_start (Union[int,float,Decimal,str,None]) – Duration-based lower bound. Pass "unbounded" for unbounded preceding.
duration_start_unit (Optional[str]) – Time unit for duration_start (e.g. "DAYS", "MONTHS").
duration_end (Union[int,float,Decimal,str,None]) – Duration-based upper bound.
duration_end_unit (Optional[str]) – Time unit for duration_end.
- Returns:
A frame specification to pass to window_frame_legend_ext().
- Return type:
RangeBetween
- Raises:
ValueError – If positional bounds and duration bounds are mixed, or if start is greater than end.
See also
rows_between: Create a ROWS BETWEEN specification.
window_frame_legend_ext: Apply a custom window specification.
Notes
Differences from pandas:
This method has no pandas equivalent. It is a pylegend extension for constructing SQL RANGE BETWEEN clauses.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Numeric range: 100 preceding to current row
spec = frame.range_between(-100, 0)
rank
- PandasApiTdsFrame.rank(axis=0, method='min', numeric_only=False, na_option='bottom', ascending=True, pct=False)[source]
Compute the rank of values in each column.
Replace every column’s values with their rank within that column. Each column is ranked independently using an SQL window function (RANK, DENSE_RANK, ROW_NUMBER, or PERCENT_RANK).
The result is a new frame with the same column names but all values replaced by their integer (or float when pct=True) rank.
- Parameters:
axis (Union[int,str]) – Only 0 / 'index' is supported. 1 raises NotImplementedError.
method (str) – How to rank equal values:
'min': Lowest rank in the group of ties (SQL RANK()).
'first': Ranks assigned in order of appearance (SQL ROW_NUMBER()).
'dense': Like 'min' but ranks always increase by 1, no gaps (SQL DENSE_RANK()).
numeric_only (bool) – If True, only rank columns of numeric type (Integer, Float, Number). Non-numeric columns are excluded from the result.
na_option (str) – How to rank null values. Only 'bottom' is supported. 'keep' and 'top' raise NotImplementedError.
ascending (bool) – Whether to rank in ascending order. False ranks in descending order.
pct (bool) – If True, compute percentage ranks (SQL PERCENT_RANK()). Result columns are of float type. Can only be used with method='min'.
- Returns:
A new TDS frame where every column contains integer ranks (or float when pct=True).
- Return type:
PandasApiTdsFrame
- Raises:
NotImplementedError – If axis is not 0 or 'index'. If method is not one of 'min', 'first', 'dense' (e.g. 'average' and 'max' are not supported). If na_option is not 'bottom'. If pct=True with a method other than 'min'.
See also
PandasApiGroupbyTdsFrame.rank: Rank within groups.
sort_values: Sort the frame by column values.
Notes
Differences from pandas:
The 'average' and 'max' ranking methods are not supported. Only 'min', 'first', and 'dense' are available.
na_option only supports 'bottom'. 'keep' and 'top' raise NotImplementedError.
pct=True is only supported with method='min' (maps to PERCENT_RANK()). Combining pct=True with other methods raises NotImplementedError.
When applied to the full frame (not via a Series), all columns are replaced by their ranks. To append a rank column instead, use bracket assignment on a single-column Series: frame["rank_col"] = frame["col"].rank().
Combining multiple rank calls in a single expression is not supported (e.g. frame["col1"].rank() + frame["col2"].rank()). Compute them in separate assignment steps instead.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Rank all columns (replaces values with ranks)
frame.filter(items=["Order Id"]).rank().head(5).to_pandas()

   Order Id
0         1
1         2
2         3
3         4
4         5

# Append a percentage rank column via Series assignment
frame["Order Rank"] = frame["Order Id"].rank(pct=True)
frame.head(5).to_pandas()

   Order Id  Order Date  Required Date  Shipped Date                  Ship Name  Order Rank
0     10248  1996-07-04     1996-08-01    1996-07-16  Vins et alcools Chevalier         0.0
1     10249  1996-07-05     1996-08-16    1996-07-10         Toms Spezialitäten    0.001206
2     10250  1996-07-08     1996-08-05    1996-07-12              Hanari Carnes    0.002413
3     10251  1996-07-08     1996-08-05    1996-07-15       Victuailles en stock    0.003619
4     10252  1996-07-09     1996-08-06    1996-07-11           Suprêmes délices    0.004825
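The difference between the three supported methods, and the meaning of pct=True, can be illustrated without a database. The function below is a plain-Python sketch of the standard SQL definitions of RANK, ROW_NUMBER, DENSE_RANK and PERCENT_RANK for ascending order; it ignores nulls and is not pylegend code, and rank_values is a hypothetical name.

```python
from typing import List, Sequence


def rank_values(values: Sequence[float], method: str = "min", pct: bool = False) -> List[float]:
    """Rank values in ascending order using SQL-style tie handling."""
    if method not in ("min", "first", "dense"):
        raise NotImplementedError(f"method {method!r} is not supported")
    if pct and method != "min":
        raise NotImplementedError("pct=True requires method='min'")
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])  # stable sort keeps input order for ties
    ranks: List[float] = [0] * n
    dense = 0
    tie_start = 0
    for pos, i in enumerate(order, start=1):
        if pos == 1 or values[i] != values[order[pos - 2]]:
            dense += 1       # DENSE_RANK: next distinct value
            tie_start = pos  # RANK: position of the first row in the tie group
        ranks[i] = {"min": tie_start, "first": pos, "dense": dense}[method]
    if pct:
        # PERCENT_RANK = (rank - 1) / (rows - 1)
        return [(r - 1) / (n - 1) if n > 1 else 0.0 for r in ranks]
    return ranks


print(rank_values([10, 20, 20, 30], "min"))    # [1, 2, 2, 4]
print(rank_values([10, 20, 20, 30], "first"))  # [1, 2, 3, 4]
print(rank_values([10, 20, 20, 30], "dense"))  # [1, 2, 2, 3]
```

For the 830 distinct Order Id values in the sample frame, the second row has rank 2, so its percentage rank is (2 - 1) / (830 - 1), roughly 0.001206, the value shown in the Order Rank column above.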
rename
- PandasApiTdsFrame.rename(mapper=None, index=None, columns=None, axis=1, inplace=False, copy=True, level=None, errors='ignore')[source]
Rename columns of the TDS frame.
Alter column labels using a mapping (dict) or a callable function applied to each column name.
- Parameters:
mapper (Union[Dict[str,str],Callable[[str],str],None]) – Mapping of old column names to new column names, or a callable that transforms each column name (e.g. str.upper). Used when axis=1 (columns). Cannot be specified together with columns.
index (Union[Dict[str,str],Callable[[str],str],None]) – Not supported. Passing any value raises NotImplementedError.
columns (Union[Dict[str,str],Callable[[str],str],None]) – Alternative to mapper for renaming columns. Mutually exclusive with mapper when both are provided alongside axis.
axis (Union[str,int]) – Axis to target. Only 1 / 'columns' is supported. 0 / 'index' raises NotImplementedError.
inplace (bool) – Must be False. True raises NotImplementedError.
copy (bool) – Must be True. False raises NotImplementedError.
level (Union[str,int,None]) – Not supported. Passing any value raises NotImplementedError.
errors (str) – If 'raise', raise a KeyError when a key in the mapping does not exist as a column name. If 'ignore', silently skip non-existent keys.
- Returns:
A new TDS frame with renamed columns.
- Return type:
PandasApiTdsFrame
- Raises:
TypeError – If mapper or columns is not a dict or callable. If copy or inplace is not a bool.
ValueError – If both mapper (with axis) and columns/index are specified simultaneously. If axis is not a supported value. If errors is not 'ignore' or 'raise'. If the rename produces duplicate column names.
KeyError – If errors='raise' and a key in the mapping does not exist in the frame’s columns.
NotImplementedError – If axis=0 / 'index', index is set, level is set, copy=False, or inplace=True.
Notes
Differences from pandas:
Only column renaming is supported (axis=1). Index renaming (axis=0) raises NotImplementedError.
inplace=True is not supported; a new frame is always returned.
copy=False is not supported.
level (multi-level index) is not supported.
The index parameter is not supported.
When using a callable, it is applied to every column name (e.g. str.upper will uppercase all column names).
If errors='ignore' (the default), keys in the mapping that do not match any column are silently ignored, matching pandas behaviour.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Rename with a dict
frame.rename({"Order Id": "OrderId", "Ship Name": "ShipName"}).head(3).to_pandas()

   OrderId  Order Date  Required Date  Shipped Date                   ShipName
0    10248  1996-07-04     1996-08-01    1996-07-16  Vins et alcools Chevalier
1    10249  1996-07-05     1996-08-16    1996-07-10         Toms Spezialitäten
2    10250  1996-07-08     1996-08-05    1996-07-12              Hanari Carnes

# Rename with a callable
frame.rename(str.upper).head(3).to_pandas()

   ORDER ID  ORDER DATE  REQUIRED DATE  SHIPPED DATE                  SHIP NAME
0     10248  1996-07-04     1996-08-01    1996-07-16  Vins et alcools Chevalier
1     10249  1996-07-05     1996-08-16    1996-07-10         Toms Spezialitäten
2     10250  1996-07-08     1996-08-05    1996-07-12              Hanari Carnes

# Rename via the columns parameter
frame.rename(columns={"Order Id": "order_id"}).head(3).to_pandas()

   order_id  Order Date  Required Date  Shipped Date                  Ship Name
0     10248  1996-07-04     1996-08-01    1996-07-16  Vins et alcools Chevalier
1     10249  1996-07-05     1996-08-16    1996-07-10         Toms Spezialitäten
2     10250  1996-07-08     1996-08-05    1996-07-12              Hanari Carnes
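The interaction between the mapper type, the errors flag and the duplicate-name check can be sketched on a plain list of column names. This is a hypothetical illustration assembled from the documented behaviour, not pylegend's implementation; rename_columns and its error messages are assumptions.

```python
from typing import Callable, Dict, List, Union


def rename_columns(
    columns: List[str],
    mapper: Union[Dict[str, str], Callable[[str], str]],
    errors: str = "ignore",
) -> List[str]:
    """Hypothetical helper: compute the new column names for a rename call."""
    if errors not in ("ignore", "raise"):
        raise ValueError("errors must be 'ignore' or 'raise'")
    if callable(mapper):
        renamed = [mapper(c) for c in columns]        # applied to every column name
    elif isinstance(mapper, dict):
        missing = [k for k in mapper if k not in columns]
        if missing and errors == "raise":
            raise KeyError(missing)
        renamed = [mapper.get(c, c) for c in columns]  # unknown keys are skipped
    else:
        raise TypeError("mapper must be a dict or a callable")
    if len(set(renamed)) != len(renamed):
        raise ValueError("rename would produce duplicate column names")
    return renamed


cols = ["Order Id", "Order Date", "Ship Name"]
print(rename_columns(cols, {"Order Id": "OrderId", "No Such Column": "x"}))
# ['OrderId', 'Order Date', 'Ship Name']
print(rename_columns(cols, str.upper))
# ['ORDER ID', 'ORDER DATE', 'SHIP NAME']
```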
rolling
- PandasApiTdsFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None, step=None, method=None, order_by=None, ascending=True)[source]
Create a fixed-size sliding window frame for window-aggregate computations.
A rolling window includes a fixed number of preceding rows (and optionally the current row) for each row, enabling moving averages, moving sums, and similar calculations.
- Parameters:
window (int) – Size of the moving window (number of rows).
min_periods (Optional[int]) – Minimum number of observations in the window required to have a value. Defaults to window.
center (bool) – Not supported. Must be False.
win_type (Optional[str]) – Not supported. Must be None.
on (Optional[str]) – Not supported. Must be None.
axis (Union[int,str]) – Only 0 / 'index' is supported.
closed (Optional[str]) – Not supported. Must be None.
step (Optional[int]) – Not supported. Must be None.
method (Optional[str]) – Must be None or 'python'.
order_by (Union[str,Sequence[str],None]) – Column(s) to order by within the window. Required for deterministic results.
ascending (Union[bool,Sequence[bool]]) – Sort order for the order_by columns.
- Returns:
A window frame on which window aggregates (sum, mean, min, max, etc.) can be called.
- Return type:
PandasApiWindowTdsFrame
- Raises:
NotImplementedError – If center, win_type, on, closed, or step are set to non-default values. Also raised if axis is not 0 or method is not None / 'python'.
Notes
Differences from pandas:
order_by and ascending are pylegend extensions not present in pandas. They control the ORDER BY clause inside the SQL OVER(...) window specification.
center, win_type, on, closed, step are not supported.
axis=1 is not supported.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# 3-row moving average of Order Id ordered by Order Id
frame.filter(items=["Order Id"]).rolling(
    window=3, order_by="Order Id"
).aggregate("mean").head(5).to_pandas()

   Order Id
0   10248.0
1   10248.5
2   10249.0
3   10250.0
4   10251.0
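The numbers in the output above can be reproduced by hand. The sketch below assumes that rolling(window=3) corresponds to a SQL frame of ROWS BETWEEN 2 PRECEDING AND CURRENT ROW and that partial windows at the start are averaged over the rows available; both assumptions are inferred from the sample output rather than from the library source, and rolling_mean is a hypothetical name.

```python
from typing import List, Sequence


def rolling_mean(values: Sequence[float], window: int) -> List[float]:
    """Trailing moving average over rows already sorted by the order_by column."""
    out: List[float] = []
    for i in range(len(values)):
        # Current row plus up to (window - 1) preceding rows.
        frame = values[max(0, i - window + 1): i + 1]
        out.append(sum(frame) / len(frame))
    return out


order_ids = [10248, 10249, 10250, 10251, 10252]
print(rolling_mean(order_ids, window=3))
# [10248.0, 10248.5, 10249.0, 10250.0, 10251.0]
```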
rows_between
- PandasApiTdsFrame.rows_between(start=None, end=None)[source]
Create a ROWS BETWEEN window-frame specification.
PyLegend extension; not present in pandas.
Sign convention (same as legendQL):
None → UNBOUNDED (PRECEDING for start, FOLLOWING for end)
Negative → PRECEDING (e.g. -3 → 3 PRECEDING)
0 → CURRENT ROW
Positive → FOLLOWING (e.g. 2 → 2 FOLLOWING)
- Parameters:
start (Optional[int]) – Lower bound of the frame. None means unbounded preceding.
end (Optional[int]) – Upper bound of the frame. None means unbounded following.
- Returns:
A frame specification to pass to window_frame_legend_ext().
- Return type:
RowsBetween
- Raises:
ValueError – If start is greater than end.
See also
range_between: Create a RANGE BETWEEN specification.
window_frame_legend_ext: Apply a custom window specification.
Notes
Differences from pandas:
This method has no pandas equivalent. It is a pylegend extension for constructing SQL ROWS BETWEEN clauses.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# 3-row trailing window (current row and 2 preceding)
spec = frame.rows_between(-2, 0)
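The sign convention documented for rows_between can be sketched as a small translation from integer bounds to SQL text. This is only an illustration under stated assumptions: the helpers bound_to_sql and rows_between_sql are hypothetical and are not part of pylegend, and the exact SQL that pylegend emits may differ.

```python
from typing import Optional


def bound_to_sql(value: Optional[int], is_start: bool) -> str:
    """Translate one frame bound to SQL text using the documented sign convention."""
    if value is None:
        return "UNBOUNDED PRECEDING" if is_start else "UNBOUNDED FOLLOWING"
    if value < 0:
        return f"{-value} PRECEDING"
    if value == 0:
        return "CURRENT ROW"
    return f"{value} FOLLOWING"


def rows_between_sql(start: Optional[int] = None, end: Optional[int] = None) -> str:
    """Hypothetical helper: render a ROWS BETWEEN clause (not pylegend's API)."""
    # Mirrors the documented rule that start must not be greater than end.
    if start is not None and end is not None and start > end:
        raise ValueError("start must not be greater than end")
    return f"ROWS BETWEEN {bound_to_sql(start, True)} AND {bound_to_sql(end, False)}"


print(rows_between_sql(-2, 0))    # ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
print(rows_between_sql(None, 2))  # ROWS BETWEEN UNBOUNDED PRECEDING AND 2 FOLLOWING
```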
shape
- property PandasApiTdsFrame.shape: Tuple[int, int]
Return the dimensionality of the TDS frame as (rows, columns).
Warning
Unlike pandas.DataFrame.shape, this property executes the frame against the server to determine the row count. It issues a COUNT aggregation query, so every access incurs a round-trip to the database.
- Returns:
A tuple (number_of_rows, number_of_columns).
- Return type:
tuple of (int, int)
See also
Notes
Differences from pandas:
In pandas, DataFrame.shape is an O(1) metadata lookup that never triggers computation. Here, shape executes the current frame to obtain the row count via a COUNT aggregation query, so it requires a live connection to the database and fails on non-executable frames.
The result type is always (int, int); there is no lazy evaluation.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Get the shape (triggers server execution)
frame.head(5).shape

(np.int64(5), 5)
shift
- PandasApiTdsFrame.shift(order_by, periods=1, freq=None, axis=0, fill_value=None, suffix=None)[source]
Shift values by desired number of periods.
Replace every column’s values with their shifted values. Because the underlying TDS is inherently unordered, this requires an explicit order_by parameter to define the ordering for the window function (LAG or LEAD).
- Parameters:
order_by (Union[str, Sequence[str]]) – Column name(s) to order the frame by before applying the shift. Unlike pandas, this is required to ensure deterministic output. All specified columns must be present in the base frame.
periods (Union[int, Sequence[int]]) – Number of periods to shift. Currently, only 1 (shift down, equivalent to SQL LAG) and -1 (shift up, equivalent to SQL LEAD) are supported. If a sequence is provided, it cannot contain duplicate values.
freq (Union[str, int, None]) – Not supported. Must be None.
axis (Union[int, str]) – Axis to shift along. Only 0 / 'index' is supported.
fill_value (Optional[Hashable]) – Not supported. Must be None. Missing values introduced by the shift will always be null.
suffix (Optional[str]) – If provided, renames the resulting shifted columns by appending this string to the original column names. This argument can only be used if periods is a sequence (not a single integer).
- Returns:
A new TDS frame with the shifted columns.
- Return type:
PandasApiTdsFrame
- Raises:
NotImplementedError – If periods contains any values other than 1 or -1, if freq is not None, if axis is not 0 or 'index', or if fill_value is not None.
ValueError – If any column specified in order_by is not present in the frame, if periods contains duplicate values, or if suffix is specified but periods is a single integer.
See also
rank: Rank as ascending or descending.
PandasApiGroupbyTdsFrame.shift: Shift values within groups.
Notes
Differences from pandas:
The order_by parameter is mandatory. In pandas, shift relies on the implicit order of the dataframe’s index. Here, an explicit order must be provided.
periods is strictly limited to 1 or -1. Arbitrary integer shifts are not supported.
fill_value is not supported and must be None.
The freq parameter is not supported and must be None.
axis=1 (shifting horizontally across columns) is not supported.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Shift the entire frame down
frame.head(5).shift(
    order_by="Order Date", periods=1
).to_pandas()

   Order Id  Order Date  Required Date  Shipped Date                  Ship Name
0      <NA>         NaT            NaT           NaT                        NaN
1     10248  1996-07-04     1996-08-01    1996-07-16  Vins et alcools Chevalier
2     10249  1996-07-05     1996-08-16    1996-07-10         Toms Spezialitäten
3     10250  1996-07-08     1996-08-05    1996-07-12              Hanari Carnes
4     10251  1996-07-08     1996-08-05    1996-07-15       Victuailles en stock
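As a rough mental model of the LAG / LEAD behaviour described for shift, the following pure-Python sketch orders rows explicitly and then takes each row's values from its neighbour. It is an emulation under assumptions, not pylegend code: shift_rows is a hypothetical helper, rows are plain dictionaries, and nulls are represented by None.

```python
from typing import Any, Dict, List


def shift_rows(rows: List[Dict[str, Any]], order_by: str, periods: int = 1) -> List[Dict[str, Any]]:
    """Emulate LAG (periods=1) or LEAD (periods=-1) applied to every column."""
    if periods not in (1, -1):
        raise NotImplementedError("only periods=1 and periods=-1 are sketched here")
    # An explicit ordering is required because the input has no inherent order.
    ordered = sorted(rows, key=lambda r: r[order_by])
    shifted = []
    for i in range(len(ordered)):
        j = i - periods  # source row: previous row for LAG, next row for LEAD
        if 0 <= j < len(ordered):
            shifted.append(dict(ordered[j]))
        else:
            # No source row exists, so every column becomes null.
            shifted.append({col: None for col in ordered[i]})
    return shifted


rows = [{"id": 3, "name": "c"}, {"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
print(shift_rows(rows, order_by="id", periods=1))
# [{'id': None, 'name': None}, {'id': 1, 'name': 'a'}, {'id': 2, 'name': 'b'}]
```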
sort_values
- PandasApiTdsFrame.sort_values(by, axis=0, ascending=True, inplace=False, kind=None, na_position='last', ignore_index=True, key=None)[source]
Sort the TDS frame by one or more columns.
Return a new TDS frame sorted by the values in the specified column(s). Supports ascending and descending sort order per column.
- Parameters:
by (Union[str, List[str]]) – Column name or list of column names to sort by. All names must exist in the current frame.
axis (Union[str, int]) – Axis along which to sort. Only 0 / 'index' (row-wise sorting) is supported.
ascending (Union[bool, List[bool]]) – Sort order. If a list, must have the same length as by.
inplace (bool) – Must be False. In-place mutation is not supported.
kind (Optional[str]) – Not supported. Must be None; passing any value raises NotImplementedError.
na_position (str) – Position of null values. Accepted but handled at the SQL engine level.
ignore_index (bool) – Must be True. Setting to False raises ValueError.
key (Optional[Callable[[AbstractTdsRow], AbstractTdsRow]]) – Not supported. Must be None; passing a callable raises NotImplementedError.
- Returns:
A new TDS frame sorted by the specified columns.
- Return type:
PandasApiTdsFrame
- Raises:
ValueError – If a column in by does not exist in the frame.
ValueError – If the length of ascending does not match by.
ValueError – If axis is not 0 or 'index'.
ValueError – If inplace is True.
ValueError – If ignore_index is False.
NotImplementedError – If kind or key is provided.
See also
Notes
Differences from pandas:
The kind parameter (sort algorithm) is not supported. Sorting is delegated to the underlying Legend Engine.
The key parameter (per-element transform before sorting) is not supported.
inplace=True is not supported; always returns a new frame.
ignore_index must be True; False is not supported because TDS frames do not have an index.
axis=1 (sorting columns) is not supported; only row-wise sorting via axis=0 is available.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Sort by a single column (ascending by default)
frame.sort_values("Ship Name").head(5).to_pandas()

   Order Id  Order Date  Required Date  Shipped Date             Ship Name
0     11011  1998-04-09     1998-05-07    1998-04-13  Alfred's Futterkiste
1     10952  1998-03-16     1998-04-27    1998-03-24  Alfred's Futterkiste
2     10835  1998-01-15     1998-02-12    1998-01-21  Alfred's Futterkiste
3     10702  1997-10-13     1997-11-24    1997-10-21  Alfred's Futterkiste
4     10692  1997-10-03     1997-10-31    1997-10-13  Alfred's Futterkiste

# Sort descending
frame.sort_values("Order Id", ascending=False).head(5).to_pandas()

   Order Id  Order Date  Required Date  Shipped Date                   Ship Name
0     11077  1998-05-06     1998-06-03           NaT  Rattlesnake Canyon Grocery
1     11076  1998-05-06     1998-06-03           NaT                    Bon app'
2     11075  1998-05-06     1998-06-03           NaT          Richter Supermarkt
3     11074  1998-05-06     1998-06-03           NaT               Simons bistro
4     11073  1998-05-05     1998-06-02           NaT   Pericles Comidas clásicas

# Sort by multiple columns with mixed directions
frame.sort_values(
    by=["Ship Name", "Order Id"], ascending=[True, False]
).head(5).to_pandas()

   Order Id  Order Date  Required Date  Shipped Date             Ship Name
0     11011  1998-04-09     1998-05-07    1998-04-13  Alfred's Futterkiste
1     10952  1998-03-16     1998-04-27    1998-03-24  Alfred's Futterkiste
2     10835  1998-01-15     1998-02-12    1998-01-21  Alfred's Futterkiste
3     10702  1997-10-13     1997-11-24    1997-10-21  Alfred's Futterkiste
4     10692  1997-10-03     1997-10-31    1997-10-13  Alfred's Futterkiste
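The mixed-direction, multi-column ordering in the last example can be reproduced on plain Python data to check expectations locally. The sketch below is an assumption-laden illustration, not how pylegend sorts (the docs state that sorting is delegated to the engine): sort_rows is a hypothetical helper that relies on Python's stable sort, applying the least significant key first.

```python
from typing import Any, Dict, List, Sequence


def sort_rows(
    rows: Sequence[Dict[str, Any]], by: Sequence[str], ascending: Sequence[bool]
) -> List[Dict[str, Any]]:
    """Sort dict rows by several columns, each with its own direction."""
    if len(by) != len(ascending):
        raise ValueError("ascending must have the same length as by")
    result = list(rows)
    # Stable sort: apply keys from least to most significant.
    for col, asc in reversed(list(zip(by, ascending))):
        result.sort(key=lambda r, c=col: r[c], reverse=not asc)
    return result


rows = [
    {"Order Id": 10248, "Ship Name": "Vins et alcools Chevalier"},
    {"Order Id": 10952, "Ship Name": "Alfred's Futterkiste"},
    {"Order Id": 10249, "Ship Name": "Toms Spezialitäten"},
    {"Order Id": 11011, "Ship Name": "Alfred's Futterkiste"},
]
ordered = sort_rows(rows, by=["Ship Name", "Order Id"], ascending=[True, False])
print([r["Order Id"] for r in ordered])
# [11011, 10952, 10249, 10248]
```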
std
- PandasApiTdsFrame.std(axis=0, skipna=True, ddof=1, numeric_only=False, **kwargs)[source]
Compute the standard deviation of each column.
Convenience method equivalent to aggregate('std') (ddof=1) or aggregate('std_dev_population') (ddof=0). Returns a single-row TDS frame with the standard deviation of every column.
- Parameters:
axis (Union[int, str]) – Only 0 / 'index' is supported.
skipna (bool) – Must be True. False is not supported.
ddof (int) – Degrees of freedom. 1 for sample standard deviation (STDDEV_SAMP), 0 for population standard deviation (STDDEV_POP).
numeric_only (bool) – Must be False. True is not supported.
**kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported.
- Returns:
A single-row TDS frame with column standard deviations.
- Return type:
PandasApiTdsFrame
- Raises:
NotImplementedError – If ddof is not 0 or 1, or if any other parameter is set to an unsupported value.
See also
Notes
Differences from pandas:
Only ddof=0 and ddof=1 are supported.
Internally delegates to aggregate('std') (ddof=1, maps to STDDEV_SAMP) or aggregate('std_dev_population') (ddof=0, maps to STDDEV_POP).
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Standard deviation of numeric columns
frame.filter(items=["Order Id"]).std().to_pandas()

     Order Id
0  239.744656
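The difference between ddof=1 and ddof=0 can be checked without a server using Python's statistics module. This sketch assumes that the sample frame's Order Id column holds the consecutive integers 10248 through 11077; that assumption is not stated in the docs, but it is consistent with the first and last ids and with the aggregate outputs shown on this page.

```python
import statistics

# Assumption: Order Id values are the consecutive integers 10248..11077.
order_ids = range(10248, 11078)

sample_std = statistics.stdev(order_ids)       # ddof=1, analogous to STDDEV_SAMP
population_std = statistics.pstdev(order_ids)  # ddof=0, analogous to STDDEV_POP

print(round(sample_std, 6))         # 239.744656
print(sample_std > population_std)  # True
```

The sample estimate divides by n - 1 rather than n, so it is always slightly larger than the population value for the same data.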
sum
- PandasApiTdsFrame.sum(axis=0, skipna=True, numeric_only=False, min_count=0, **kwargs)[source]
Compute the sum of each column.
Convenience method equivalent to aggregate('sum'). Returns a single-row TDS frame with the sum of every column.
- Parameters:
axis (Union[int, str]) – Only 0 / 'index' is supported.
skipna (bool) – Must be True. False is not supported.
numeric_only (bool) – Must be False. True is not supported.
min_count (int) – Must be 0. Non-zero values are not supported.
**kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported. Passing any keyword arguments raises NotImplementedError.
- Returns:
A single-row TDS frame with column sums.
- Return type:
PandasApiTdsFrame
- Raises:
NotImplementedError – If axis, skipna, numeric_only, min_count, or **kwargs are set to unsupported values.
See also
Notes
Differences from pandas:
skipna=False, numeric_only=True, and non-zero min_count are not supported.
axis=1 is not supported.
Internally delegates to aggregate('sum').
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Sum of all columns
frame.filter(items=["Order Id"]).sum().to_pandas()

   Order Id
0   8849875
truncate
- PandasApiTdsFrame.truncate(before=0, after=None, axis=0, copy=True)[source]
Select rows by positional index range.
Return a new TDS frame containing rows from position before (inclusive) to after (inclusive).
- Parameters:
before (Union[date, str, int, None]) – Only int and None are supported. First row index to include (0-based, inclusive). Negative values are silently clamped to 0. None is treated as 0.
after (Union[date, str, int, None]) – Only int and None are supported. Last row index to include (0-based, inclusive). None means no upper bound (all remaining rows are returned). Negative values result in an empty frame.
axis (Union[str, int]) – Axis to truncate along. Only 0 / 'index' is supported.
copy (bool) – Must be True. Setting to False raises NotImplementedError.
- Returns:
A new TDS frame containing only the rows in the specified positional range.
- Return type:
PandasApiTdsFrame
- Raises:
NotImplementedError – If axis is not 0 or 'index', if copy is False, or if before or after is a non-integer type (e.g. a string or date).
ValueError – If before is greater than after (after clamping).
See also
head: Return the first n rows.
sort_values: Sort the frame before truncating.
filter: Select columns by name, substring, or regex.
Notes
Differences from pandas:
In pandas, truncate selects rows by label (index value). Here, it selects rows by positional (integer) index only (it is translated to LIMIT and OFFSET in the underlying SQL engine). Passing date, str, or other label-based values for before / after raises NotImplementedError.
copy=False is not supported; a new frame is always returned.
axis=1 (truncating columns) is not supported.
Negative before values are silently clamped to 0 rather than raising an error. Negative after values result in an empty frame (zero rows).
The after parameter is inclusive (row at position after is included), matching pandas behaviour.
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Get rows at positions 0 through 4 (inclusive)
frame.truncate(before=0, after=4).to_pandas()

   Order Id  Order Date  Required Date  Shipped Date                  Ship Name
0     10248  1996-07-04     1996-08-01    1996-07-16  Vins et alcools Chevalier
1     10249  1996-07-05     1996-08-16    1996-07-10         Toms Spezialitäten
2     10250  1996-07-08     1996-08-05    1996-07-12              Hanari Carnes
3     10251  1996-07-08     1996-08-05    1996-07-15       Victuailles en stock
4     10252  1996-07-09     1996-08-06    1996-07-11           Suprêmes délices

# Skip first 5 rows, keep the rest
frame.truncate(before=5).head(5).to_pandas()

   Order Id  Order Date  Required Date  Shipped Date               Ship Name
0     10253  1996-07-10     1996-07-24    1996-07-16           Hanari Carnes
1     10254  1996-07-11     1996-08-08    1996-07-23       Chop-suey Chinese
2     10255  1996-07-12     1996-08-09    1996-07-15      Richter Supermarkt
3     10256  1996-07-15     1996-08-12    1996-07-17  Wellington Importadora
4     10257  1996-07-16     1996-08-13    1996-07-22        HILARION-Abastos

# Get rows at positions 2 through 6 (inclusive)
frame.truncate(before=2, after=6).to_pandas()

   Order Id  Order Date  Required Date  Shipped Date             Ship Name
0     10250  1996-07-08     1996-08-05    1996-07-12         Hanari Carnes
1     10251  1996-07-08     1996-08-05    1996-07-15  Victuailles en stock
2     10252  1996-07-09     1996-08-06    1996-07-11      Suprêmes délices
3     10253  1996-07-10     1996-07-24    1996-07-16         Hanari Carnes
4     10254  1996-07-11     1996-08-08    1996-07-23     Chop-suey Chinese
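Since truncate is described as translating to LIMIT and OFFSET, the arithmetic for inclusive bounds can be sketched as follows. This is a hedged illustration rather than pylegend's code: truncate_to_offset_limit is a hypothetical helper, and the order in which the negative-after case and the before > after check are evaluated is an assumption.

```python
from typing import Optional, Tuple


def truncate_to_offset_limit(
    before: Optional[int] = 0, after: Optional[int] = None
) -> Tuple[int, Optional[int]]:
    """Map inclusive positional bounds to (OFFSET, LIMIT); LIMIT None means unbounded."""
    offset = max(before or 0, 0)  # None and negative values are treated as 0
    if after is None:
        return offset, None
    if after < 0:
        # Assumption: a negative `after` selects no rows instead of raising.
        return offset, 0
    if offset > after:
        raise ValueError("before must not be greater than after")
    return offset, after - offset + 1  # both bounds are inclusive


print(truncate_to_offset_limit(0, 4))  # (0, 5)
print(truncate_to_offset_limit(2, 6))  # (2, 5)
print(truncate_to_offset_limit(5))     # (5, None)
```

The + 1 in the limit is what makes after inclusive: positions 2 through 6 cover five rows, matching the five rows returned in the last example above.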
var
- PandasApiTdsFrame.var(axis=0, skipna=True, ddof=1, numeric_only=False, **kwargs)[source]
Compute the variance of each column.
Convenience method equivalent to aggregate('var') (ddof=1) or aggregate('variance_population') (ddof=0). Returns a single-row TDS frame with the variance of every column.
- Parameters:
axis (Union[int, str]) – Only 0 / 'index' is supported.
skipna (bool) – Must be True. False is not supported.
ddof (int) – Degrees of freedom. 1 for sample variance (VAR_SAMP), 0 for population variance (VAR_POP).
numeric_only (bool) – Must be False. True is not supported.
**kwargs (Union[int, float, str, bool, date, datetime, Decimal, PyLegendPrimitive]) – Not supported.
- Returns:
A single-row TDS frame with column variances.
- Return type:
PandasApiTdsFrame
- Raises:
NotImplementedError – If ddof is not 0 or 1, or if any other parameter is set to an unsupported value.
See also
Notes
Differences from pandas:
Only ddof=0 and ddof=1 are supported.
Internally delegates to aggregate('var') (ddof=1, maps to VAR_SAMP) or aggregate('variance_population') (ddof=0, maps to VAR_POP).
Examples
import pylegend
frame = pylegend.samples.pandas_api.northwind_orders_frame()

# Variance of numeric columns
frame.filter(items=["Order Id"]).var().to_pandas()

   Order Id
0   57477.5
window_frame_legend_ext
- PandasApiTdsFrame.window_frame_legend_ext(frame_spec, order_by=None, ascending=True)[source]
Create a custom window specification with explicit frame bounds.
PyLegend extension — not present in pandas.
Provides fine-grained control over the ROWS BETWEEN or RANGE BETWEEN clause used by window-aggregate computations.
- Parameters:
frame_spec (FrameSpec) – A window-frame specification created via rows_between() or range_between().
order_by (Union[str, Sequence[str], None]) – Column(s) to order by within the window. None means no explicit ordering (a fallback will be chosen automatically).
ascending (Union[bool, Sequence[bool]]) – Sort direction(s) for the order_by columns.
- Returns:
A window frame on which window aggregates (sum, mean, min, max, etc.) can be called.
- Return type:
PandasApiWindowTdsFrame
- Raises:
TypeError – If frame_spec is not a RowsBetween or RangeBetween.
See also
expanding: Expanding (cumulative) window.
rolling: Fixed-size sliding window.
rows_between: Create a ROWS BETWEEN specification.
range_between: Create a RANGE BETWEEN specification.
Notes
Differences from pandas:
This method has no pandas equivalent. It is a pylegend extension for explicit control over the SQL window frame.
Examples
import pylegend
from pylegend.core.language.pandas_api.pandas_api_frame_spec import (
    RowsBetween,
)
frame = pylegend.samples.pandas_api.northwind_orders_frame()

spec = RowsBetween(-2, 0)
frame.filter(items=["Order Id"]).window_frame_legend_ext(
    spec, order_by="Order Id"
).sum().head(5).to_pandas()
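To reason about what a ROWS BETWEEN frame selects for each row, a pure-Python emulation of a windowed sum may help. It is a sketch under assumptions and not pylegend's implementation: window_sum is a hypothetical helper, the input is assumed to be already ordered, and an empty frame yields None to mimic SQL NULL. The printed values are the emulation's result for the first five Order Id values, not captured pylegend output.

```python
from typing import List, Optional, Sequence


def window_sum(
    values: Sequence[int], start: Optional[int], end: Optional[int]
) -> List[Optional[int]]:
    """Sum over a ROWS BETWEEN frame for already-ordered values (illustrative only)."""
    n = len(values)
    out: List[Optional[int]] = []
    for i in range(n):
        # None means unbounded; negative offsets look back, positive look ahead.
        lo = 0 if start is None else max(0, i + start)
        hi = n - 1 if end is None else min(n - 1, i + end)
        frame = values[lo:hi + 1] if lo <= hi else []
        out.append(sum(frame) if frame else None)
    return out


order_ids = [10248, 10249, 10250, 10251, 10252]
print(window_sum(order_ids, -2, 0))
# [10248, 20497, 30747, 30750, 30753]
```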