Dataframe write functions#
Dataframe write methods#
- class pystarburst.dataframe_writer.DataFrameWriter(dataframe: DataFrame)#
Bases:
object
Provides methods for writing data from a
DataFrame
to supported output destinations.To use this object:
Create an instance of a
DataFrameWriter
by accessing theDataFrame.write
property.(Optional) Specify the save mode by calling
mode()
, which returns the sameDataFrameWriter
that is configured to save data using the specified mode. The default mode is “errorifexists”.Call
save_as_table()
to save the data to the specified destination.
- mode(save_mode: str) DataFrameWriter #
Set the save mode of this
DataFrameWriter
.- Parameters:
save_mode –
one of the following strings:
”append”: Append data of this DataFrame to existing data.
”overwrite”: Overwrite existing data.
”errorifexists”: Throw an exception if data already exists.
”ignore”: Ignore this operation if data already exists.
Default value is “errorifexists”.
- Returns:
The
DataFrameWriter
itself.
- saveAsTable(table_name: str | Iterable[str], *, mode: str | None = None, column_order: str = 'index', table_properties: Dict[str, str | bool | int | float] = None, statement_properties: Dict[str, str] | None = None) None #
Writes the data to the specified table in a Trino cluster.
saveAsTable()
is an alias ofsave_as_table()
.- Parameters:
table_name – A string or list of strings that specify the table name or fully-qualified object identifier (database name, schema name, and table name).
mode –
One of the following values. When it’s
None
or not provided, the save mode set bymode()
is used.”append”: Append data of this DataFrame to existing data.
”overwrite”: Overwrite existing data.
”errorifexists”: Throw an exception if data already exists.
”ignore”: Ignore this operation if data already exists.
column_order –
When
mode
is “append”, data will be inserted into the target table by matching column sequence or column name. Default is “index”. Whenmode
is not “append”, thecolumn_order
makes no difference.”index”: Data will be inserted into the target table by column sequence.
”name”: Data will be inserted into the target table by matching column names. If the target table has more columns than the source DataFrame, use this one.
table_properties – Any custom table properties used to create the table.
Examples
>>> df = session.create_dataframe([[1,2],[3,4]], schema=["a", "b"]) >>> df.write.mode("overwrite").save_as_table("my_table") >>> session.table("my_table").collect() [Row(A=1, B=2), Row(A=3, B=4)] >>> df.write.save_as_table("my_table", mode="append") >>> session.table("my_table").collect() [Row(A=1, B=2), Row(A=3, B=4), Row(A=1, B=2), Row(A=3, B=4)] >>> df.write.mode("overwrite").save_as_table("my_table", table_properties={"format": "parquet"}) >>> session.table("my_table").collect() [Row(A=1, B=2), Row(A=3, B=4)] >>> df.write.mode("overwrite").save_as_table("my_table", table_properties={"partitioning": ["a"]}) >>> session.table("my_table").collect() [Row(A=1, B=2), Row(A=3, B=4)]
- save_as_table(table_name: str | Iterable[str], *, mode: str | None = None, column_order: str = 'index', table_properties: Dict[str, str | bool | int | float] = None, statement_properties: Dict[str, str] | None = None) None #
Writes the data to the specified table in a Trino cluster.
saveAsTable()
is an alias ofsave_as_table()
.- Parameters:
table_name – A string or list of strings that specify the table name or fully-qualified object identifier (database name, schema name, and table name).
mode –
One of the following values. When it’s
None
or not provided, the save mode set bymode()
is used.”append”: Append data of this DataFrame to existing data.
”overwrite”: Overwrite existing data.
”errorifexists”: Throw an exception if data already exists.
”ignore”: Ignore this operation if data already exists.
column_order –
When
mode
is “append”, data will be inserted into the target table by matching column sequence or column name. Default is “index”. Whenmode
is not “append”, thecolumn_order
makes no difference.”index”: Data will be inserted into the target table by column sequence.
”name”: Data will be inserted into the target table by matching column names. If the target table has more columns than the source DataFrame, use this one.
table_properties – Any custom table properties used to create the table.
Examples
>>> df = session.create_dataframe([[1,2],[3,4]], schema=["a", "b"]) >>> df.write.mode("overwrite").save_as_table("my_table") >>> session.table("my_table").collect() [Row(A=1, B=2), Row(A=3, B=4)] >>> df.write.save_as_table("my_table", mode="append") >>> session.table("my_table").collect() [Row(A=1, B=2), Row(A=3, B=4), Row(A=1, B=2), Row(A=3, B=4)] >>> df.write.mode("overwrite").save_as_table("my_table", table_properties={"format": "parquet"}) >>> session.table("my_table").collect() [Row(A=1, B=2), Row(A=3, B=4)] >>> df.write.mode("overwrite").save_as_table("my_table", table_properties={"partitioning": ["a"]}) >>> session.table("my_table").collect() [Row(A=1, B=2), Row(A=3, B=4)]