Dataframe grouping functions

Dataframe grouping methods

class pystarburst.relational_grouped_dataframe.GroupingSets(*sets: Column | List[Column])

Bases: object

Creates a GroupingSets object from one or more sets of columns or expressions. Pass the object to DataFrame.group_by_grouping_sets(); see that method for examples of how to use this class with a DataFrame. See GROUP BY GROUPING SETS for its counterpart in SQL (several equivalent forms are shown below).

Python interface                                            SQL interface
GroupingSets([col("a")], [col("b")])                        GROUPING SETS ((a), (b))
GroupingSets([col("a"), col("b")], [col("c"), col("d")])    GROUPING SETS ((a, b), (c, d))
GroupingSets([col("a"), col("b")])                          GROUPING SETS ((a, b))
GroupingSets(col("a"), col("b"))                            GROUPING SETS ((a, b))
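The Python/SQL pairs above appear to follow one rule: when every argument is a list, each list becomes its own grouping set, and when the arguments are bare columns, they are collected into a single set. The sketch below illustrates that rule with plain strings standing in for Column objects. `render_grouping_sets` is a hypothetical helper written for this illustration, not part of PyStarburst, and it deliberately rejects a mix of lists and bare columns because the pairs above do not document that case.

```python
from typing import List, Union


def render_grouping_sets(*sets: Union[str, List[str]]) -> str:
    """Render the GROUPING SETS clause implied by GroupingSets-style arguments.

    Hypothetical helper for illustration only; column names are plain strings.
    """
    if not sets:
        raise ValueError("at least one column or list of columns is required")
    if all(isinstance(s, list) for s in sets):
        # Every argument is a list: each list is its own grouping set.
        groups = [list(s) for s in sets]
    elif not any(isinstance(s, list) for s in sets):
        # Only bare columns: together they form a single grouping set.
        groups = [list(sets)]
    else:
        raise ValueError("mixing lists and bare columns is not covered by this sketch")
    rendered = ", ".join("(" + ", ".join(group) + ")" for group in groups)
    return f"GROUPING SETS ({rendered})"


print(render_grouping_sets(["a"], ["b"]))            # GROUPING SETS ((a), (b))
print(render_grouping_sets(["a", "b"], ["c", "d"]))  # GROUPING SETS ((a, b), (c, d))
print(render_grouping_sets(["a", "b"]))              # GROUPING SETS ((a, b))
print(render_grouping_sets("a", "b"))                # GROUPING SETS ((a, b))
```

Note that the last two calls produce the same clause: a single list and the same columns passed individually both describe one grouping set.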

class pystarburst.relational_grouped_dataframe.RelationalGroupedDataFrame(df: DataFrame, grouping_exprs: List[Expression], group_type: _GroupType)

Bases: object

Represents an underlying DataFrame with rows that are grouped by common values. Can be used to define aggregations on these grouped DataFrames.

agg(*exprs: Column | Tuple[ColumnOrName, str] | Dict[str, str]) -> DataFrame

Returns a DataFrame with computed aggregates. See examples in DataFrame.group_by().

Parameters:

exprs

A variable-length argument list in which every element is one of:

  • a Column object

  • a tuple in which the first element is a Column object or a column name and the second element is the name of the aggregate function

  • a list of the above

  • a dict that maps column names to aggregate function names

Note

The name of the aggregate function to compute must be a valid Trino aggregate function.

See also

  • DataFrame.agg()

  • DataFrame.group_by()
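As a minimal sketch of the string-based argument forms listed for agg(), the function below calls it once per form. It assumes `grouped` is a RelationalGroupedDataFrame, for example the result of `df.group_by("region")`; the function name `sales_summaries` and the column names `amount` and `qty` are hypothetical. No session is created here, so the function only runs against a real grouped DataFrame that you supply.

```python
def sales_summaries(grouped):
    """Call agg() with each string-based argument form.

    `grouped` is assumed to be a RelationalGroupedDataFrame; "amount" and
    "qty" are hypothetical column names.
    """
    # A (column name, aggregate function name) tuple.
    by_tuple = grouped.agg(("amount", "sum"))
    # Several tuples passed as separate positional arguments.
    by_tuples = grouped.agg(("amount", "sum"), ("qty", "max"))
    # A list of tuples.
    by_list = grouped.agg([("amount", "sum"), ("qty", "max")])
    # A dict that maps column names to aggregate function names.
    by_dict = grouped.agg({"amount": "sum", "qty": "max"})
    return by_tuple, by_tuples, by_list, by_dict
```

The function names `"sum"` and `"max"` are passed as strings, so they must name valid Trino aggregate functions. Column objects can be passed to agg() in the same positions as the tuples.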

avg(*cols: ColumnOrName) -> DataFrame

Return the average for the specified numeric columns.

builtin(agg_name: str) -> Callable

Returns a callable that computes the built-in aggregate agg_name over the columns passed to it. Use this function to invoke any aggregates not explicitly listed in this class. See examples in DataFrame.group_by().

count() -> DataFrame

Return the number of rows for each group.

function(agg_name: str) -> Callable

Returns a callable that computes the built-in aggregate agg_name over the columns passed to it. Use this function to invoke any aggregates not explicitly listed in this class. See examples in DataFrame.group_by().
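Because builtin() and function() take only the aggregate name and are documented to return a Callable, using them is a two-step call. The sketch below assumes that the returned callable accepts the columns to aggregate and returns a DataFrame, which is inferred from the signatures rather than stated outright. `max_per_group` is a hypothetical wrapper, and `grouped` is assumed to be a RelationalGroupedDataFrame.

```python
def max_per_group(grouped, *column_names):
    """Apply the Trino aggregate `max` through builtin().

    Assumption: the callable returned by builtin() takes the columns to
    aggregate and returns a DataFrame.
    """
    compute_max = grouped.builtin("max")  # step 1: select the aggregate by name
    return compute_max(*column_names)     # step 2: apply it to the columns
```

The same pattern should apply to function(), which has the same documented signature and description.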

max(*cols: ColumnOrName) -> DataFrame

Return the maximum for the specified numeric columns.

mean(*cols: ColumnOrName) -> DataFrame

Return the average for the specified numeric columns.

min(*cols: ColumnOrName) -> DataFrame

Return the minimum for the specified numeric columns.

sum(*cols: ColumnOrName) -> DataFrame

Return the sum for the specified numeric columns.
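The shortcut methods count(), sum(), avg(), mean(), min(), and max() can be used side by side on the same grouped object. The sketch below collects their results for one column. It assumes `grouped` is a RelationalGroupedDataFrame and that the column named by `column_name` is numeric; `describe_groups` is a hypothetical helper, not part of PyStarburst.

```python
def describe_groups(grouped, column_name):
    """Collect the shortcut aggregations for one numeric column.

    `grouped` is assumed to be a RelationalGroupedDataFrame; each call
    returns a new DataFrame.
    """
    return {
        "count": grouped.count(),         # rows per group, takes no columns
        "sum": grouped.sum(column_name),
        "avg": grouped.avg(column_name),  # mean() is documented identically
        "min": grouped.min(column_name),
        "max": grouped.max(column_name),
    }
```

Each entry is a separate DataFrame. To get several aggregates in a single result, agg() with multiple expressions is likely the better fit.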