Table#
Table methods#
- class pystarburst.table.DeleteResult(rows_deleted: int)#
Bases:
NamedTupleResult of deleting rows in a
Table.- rows_deleted: int#
The number of rows deleted.
- class pystarburst.table.MergeResult(rows_affected: int)#
Bases:
NamedTupleResult of merging a
DataFrameinto aTable.- rows_affected: int#
The number of rows inserted, updated or deleted.
- class pystarburst.table.Table(table_name: str, session: Session)#
Bases:
DataFrameRepresents a lazily-evaluated Table. It extends
DataFrameso allDataFrameoperations can be applied to it.You can create a
Tableobject by callingSession.table()with the name of the table in Trino. See examples inSession.table().- delete(condition: Column | None = None, *, statement_properties: Dict[str, str] | None = None) DeleteResult#
Deletes rows in a Table and returns a
DeleteResult, representing the number of rows deleted.- Parameters:
condition – An optional
Columnobject representing the specified condition. It must be provided ifsourceis provided.
Examples
>>> target_df = session.create_dataframe([(1, 1),(1, 2),(2, 1),(2, 2),(3, 1),(3, 2)], schema=["a", "b"]) >>> target_df.write.save_as_table("my_table", mode="overwrite") >>> t = session.table("my_table") >>> # delete all rows in a table >>> t.delete() DeleteResult(rows_deleted=6) >>> t.collect() [] >>> # delete all rows where column "a" has value 1 >>> target_df.write.save_as_table("my_table", mode="overwrite") >>> t.delete(t["a"] == 1) DeleteResult(rows_deleted=2) >>> t.collect() [Row(A=2, B=1), Row(A=2, B=2), Row(A=3, B=1), Row(A=3, B=2)]
- drop_table() None#
Drops the table from the Trino cluster, if exists.
Note that subsequent operations such as
DataFrame.select(),DataFrame.collect()on thisTableinstance and the derived DataFrame will raise errors because the underlying table in the Trino cluster no longer exists.
- is_cached: bool#
Whether the table is cached.
- merge(source: DataFrame, join_expr: Column, clauses: Iterable[WhenMatchedClause | WhenNotMatchedClause], *, statement_properties: Dict[str, str] | None = None) MergeResult#
Merges this
TablewithDataFramesource on the specified join expression and a list of matched or not-matched clauses, and returns aMergeResult, representing the number of rows inserted, updated and deleted by this merge action. See MERGE for details.- Parameters:
source – A
DataFrameto join with thisTable. It can also be anotherTable.join_expr – A
Columnobject representing the expression on which to join thisTableandsource.clauses – A list of matched or not-matched clauses specifying the actions to perform when the values from this
Tableandsourcematch or not match onjoin_expr. These actions can only be instances ofWhenMatchedClauseandWhenNotMatchedClause, and will be performed sequentially in this list.
Examples
>>> from pystarburst.functions import when_matched, when_not_matched >>> target_df = session.create_dataframe([(10, "old"), (10, "too_old"), (11, "old")], schema=["key", "value"]) >>> target_df.write.save_as_table("my_table", mode="overwrite") >>> target = session.table("my_table") >>> source = session.create_dataframe([(10, "new"), (12, "new"), (13, "old")], schema=["key", "value"]) >>> target.merge(source, target["key"] == source["key"], ... [when_matched().update({"value": source["value"]}), when_not_matched().insert({"key": source["key"]})]) MergeResult(rows_affected=3) >>> target.collect() [Row(KEY=13, VALUE=None), Row(KEY=12, VALUE=None), Row(KEY=10, VALUE='new'), Row(KEY=10, VALUE='new'), Row(KEY=11, VALUE='old')]
- sample(frac: float, *, sampling_method: str | None = None) DataFrame#
Samples rows based on either the number of rows to be returned or a percentage of rows to be returned.
Sampling with a seed is not supported on views or subqueries. This method works on tables so it supports
seed. This is the main difference betweenDataFrame.sample()and this method.- Parameters:
frac – The percentage of rows to be sampled.
sampling_method –
Specifies the sampling method to use:
BERNOULLI: Includes each row with a probability of p/100. Similar to flipping a weighted coin for each row.
SYSTEM: Includes each block of rows with a probability of p/100. Similar to flipping a weighted coin for each block of rows. This method does not support fixed-size sampling.
Default is
None. Then the Trino cluster will use “BERNOULLI” by default.
Note
SYSTEM sampling is often faster than BERNOULLI sampling.
- table_name: str#
The table name
- update(assignments: Dict[str, ColumnOrLiteral], condition: Column | None = None, *, statement_properties: Dict[str, str] | None = None) UpdateResult#
Updates rows in the Table with specified
assignmentsand returns aUpdateResult, representing the number of rows modified and the number of multi-joined rows modified.- Parameters:
assignments – A
dictthat associates the names of columns with the values that should be updated. The value ofassignmentscan either be a literal value or aColumnobject.condition – An optional
Columnobject representing the specified condition. It must be provided ifsourceis provided.
Examples
>>> target_df = session.create_dataframe([(1, 1),(1, 2),(2, 1),(2, 2),(3, 1),(3, 2)], schema=["a", "b"]) >>> target_df.write.save_as_table("my_table", mode="overwrite") >>> t = session.table("my_table") >>> # update all rows in column "b" to 0 and all rows in column "a" >>> # to the summation of column "a" and column "b" >>> t.update({"b": 0, "a": t.a + t.b}) UpdateResult(rows_updated=6, multi_joined_rows_updated=0) >>> t.collect() [Row(A=2, B=0), Row(A=3, B=0), Row(A=3, B=0), Row(A=4, B=0), Row(A=4, B=0), Row(A=5, B=0)] >>> # update all rows in column "b" to 0 where column "a" has value 1 >>> target_df.write.save_as_table("my_table", mode="overwrite") >>> t.update({"b": 0}, t["a"] == 1) UpdateResult(rows_updated=2, multi_joined_rows_updated=0) >>> t.collect() [Row(A=1, B=0), Row(A=1, B=0), Row(A=2, B=1), Row(A=2, B=2), Row(A=3, B=1), Row(A=3, B=2)]
- class pystarburst.table.UpdateResult(rows_updated: int)#
Bases:
NamedTupleResult of updating rows in a
Table.- rows_updated: int#
The number of rows modified.
- class pystarburst.table.WhenMatchedClause(condition: Column | None = None)#
Bases:
objectA matched clause for the
Table.merge()action. It matches all remaining rows in the targetTablethat satisfyjoin_exprwhile also satisfyingcondition, if it is provided. You can usefunctions.when_matched()to instantiate this class.- Parameters:
condition – An optional
Columnobject representing the specified condition.
- delete()#
Defines a delete action for the matched clause and returns an updated
WhenMatchedClausewith the new delete action added.Examples
>>> # Adds a matched clause where a row in source is matched >>> # if its key is equal to the key of any row in target. >>> # For all such rows, delete them. >>> from pystarburst.functions import when_matched >>> target_df = session.create_dataframe([(10, "old"), (10, "too_old"), (11, "old")], schema=["key", "value"]) >>> target_df.write.save_as_table("my_table", mode="overwrite") >>> target = session.table("my_table") >>> source = session.create_dataframe([(10, "new")], schema=["key", "value"]) >>> target.merge(source, target["key"] == source["key"], [when_matched().delete()]) MergeResult(rows_affected=2) >>> target.collect() # the rows are deleted [Row(KEY=11, VALUE='old')]
Note
An exception will be raised if this method or
WhenMatchedClause.update()is called more than once on the sameWhenMatchedClauseobject.
- update(assignments: Dict[str, ColumnOrLiteral]) WhenMatchedClause#
Defines an update action for the matched clause and returns an updated
WhenMatchedClausewith the new update action added.- Parameters:
assignments – A list of values or a
dictthat associates the names of columns with the values that should be updated. The value ofassignmentscan either be a literal value or aColumnobject.
Examples
>>> # Adds a matched clause where a row in source is matched >>> # if its key is equal to the key of any row in target. >>> # For all such rows, update its value to the value of the >>> # corresponding row in source. >>> from pystarburst.functions import when_matched >>> target_df = session.create_dataframe([(10, "old"), (10, "too_old"), (11, "old")], schema=["key", "value"]) >>> target_df.write.save_as_table("my_table", mode="overwrite") >>> target = session.table("my_table") >>> source = session.create_dataframe([(10, "new")], schema=["key", "value"]) >>> target.merge(source, target["key"] == source["key"], [when_matched().update({"value": source["value"]})]) MergeResult(rows_affected=2) >>> target.collect() # the value in the table is updated [Row(KEY=10, VALUE='new'), Row(KEY=10, VALUE='new'), Row(KEY=11, VALUE='old')]
Note
An exception will be raised if this method or
WhenMatchedClause.delete()is called more than once on the sameWhenMatchedClauseobject.
- class pystarburst.table.WhenNotMatchedClause(condition: Column | None = None)#
Bases:
objectA not-matched clause for the
Table.merge()action. It matches all remaining rows in the targetTablethat do not satisfyjoin_exprbut satisfycondition, if it is provided. You can usefunctions.when_not_matched()to instantiate this class.- Parameters:
condition – An optional
Columnobject representing the specified condition.
- insert(assignments: Iterable[ColumnOrLiteral] | Dict[str, ColumnOrLiteral]) WhenNotMatchedClause#
Defines an insert action for the not-matched clause and returns an updated
WhenNotMatchedClausewith the new insert action added.- Parameters:
assignments – A list of values or a
dictthat associates the names of columns with the values that should be inserted. The value ofassignmentscan either be a literal value or aColumnobject.
Examples
>>> # Adds a not-matched clause where a row in source is not matched >>> # if its key does not equal the key of any row in target. >>> # For all such rows, insert a row into target whose ley and value >>> # are assigned to the key and value of the not matched row. >>> from pystarburst.functions import when_not_matched >>> target_df = session.create_dataframe([(10, "old"), (10, "too_old"), (11, "old")], schema=["key", "value"]) >>> target_df.write.save_as_table("my_table", mode="overwrite") >>> target = session.table("my_table") >>> source = session.create_dataframe([(12, "new")], schema=["key", "value"]) >>> target.merge(source, target["key"] == source["key"], [when_not_matched().insert([source["key"], source["value"]])]) MergeResult(rows_affected=1) >>> target.collect() # the rows are inserted [Row(KEY=12, VALUE='new'), Row(KEY=10, VALUE='old'), Row(KEY=10, VALUE='too_old'), Row(KEY=11, VALUE='old')] >>> # For all such rows, insert a row into target whose key is >>> # assigned to the key of the not matched row. >>> target_df.write.save_as_table("my_table", mode="overwrite") >>> target.merge(source, target["key"] == source["key"], [when_not_matched().insert({"key": source["key"]})]) MergeResult(rows_affected=1) >>> target.collect() # the rows are inserted [Row(KEY=12, VALUE=None), Row(KEY=10, VALUE='old'), Row(KEY=10, VALUE='too_old'), Row(KEY=11, VALUE='old')]
Note
An exception will be raised if this method is called more than once on the same
WhenNotMatchedClauseobject.