Dataframe write functions

Dataframe write methods

class pystarburst.dataframe_writer.DataFrameWriter(dataframe: DataFrame)

Bases: object

Provides methods for writing data from a DataFrame to supported output destinations.

To use this object:

  1. Create an instance of a DataFrameWriter by accessing the DataFrame.write property.

  2. (Optional) Specify the save mode by calling mode(), which returns the same DataFrameWriter, now configured to save data using the specified mode. The default mode is “errorifexists”.

  3. Call save_as_table() to save the data to the specified destination.
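
The three steps above can be sketched as a small helper. This is only a minimal sketch: it assumes `df` is a pystarburst DataFrame obtained from an existing session, and the helper name `write_table` and its default argument are hypothetical, chosen for illustration.

```python
def write_table(df, table_name, save_mode="errorifexists"):
    """Write `df` to `table_name` following the three documented steps.

    `df` is assumed to be a pystarburst DataFrame. `write_table` is a
    hypothetical helper and not part of the library.
    """
    # Step 1: the DataFrame.write property yields a DataFrameWriter.
    writer = df.write
    # Step 2 (optional): set the save mode; mode() returns the writer itself.
    writer = writer.mode(save_mode)
    # Step 3: save the data to the target table.
    writer.save_as_table(table_name)
```

Because mode() returns the writer, the same steps can also be chained in a single expression, as the examples for save_as_table() do.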

mode(save_mode: str) -> DataFrameWriter

Set the save mode of this DataFrameWriter.

Parameters:

save_mode

One of the following strings:

  • “append”: Append data of this DataFrame to existing data.

  • “overwrite”: Overwrite existing data.

  • “errorifexists”: Throw an exception if data already exists.

  • “ignore”: Ignore this operation if data already exists.

Default value is “errorifexists”.

Returns:

The DataFrameWriter itself.
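
To make the four save modes concrete, here is a toy in-memory model of their effect on a table's rows. It is an illustration only, not how pystarburst or Trino implements writes. The function `apply_save_mode` is hypothetical, and the sketch assumes that "data already exists" means "the table exists", with `None` standing for a table that does not exist yet.

```python
VALID_MODES = ("append", "overwrite", "errorifexists", "ignore")


def apply_save_mode(existing_rows, new_rows, save_mode="errorifexists"):
    """Return the rows a table would hold after a write in `save_mode`.

    `existing_rows` is None when the table does not exist yet (an
    assumption of this toy model).
    """
    if save_mode not in VALID_MODES:
        raise ValueError(f"unsupported save mode: {save_mode!r}")
    if existing_rows is None:
        # Nothing exists yet, so every mode simply writes the new rows.
        return list(new_rows)
    if save_mode == "append":
        # Keep the existing rows and add the new ones after them.
        return list(existing_rows) + list(new_rows)
    if save_mode == "overwrite":
        # Discard the existing rows in favor of the new ones.
        return list(new_rows)
    if save_mode == "ignore":
        # Leave the existing rows untouched and skip the write.
        return list(existing_rows)
    # Remaining case is "errorifexists": refuse to touch existing data.
    raise RuntimeError("table already exists")


rows = apply_save_mode(None, [(1, 2), (3, 4)], "overwrite")
rows = apply_save_mode(rows, [(1, 2), (3, 4)], "append")
print(rows)  # [(1, 2), (3, 4), (1, 2), (3, 4)]
```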

saveAsTable(table_name: str | Iterable[str], *, mode: str | None = None, column_order: str = 'index', table_properties: Dict[str, str | bool | int | float] = None, statement_properties: Dict[str, str] | None = None) -> None

Writes the data to the specified table in a Trino cluster.

saveAsTable() is an alias of save_as_table(); the parameters and examples are documented under save_as_table().

save_as_table(table_name: str | Iterable[str], *, mode: str | None = None, column_order: str = 'index', table_properties: Dict[str, str | bool | int | float] = None, statement_properties: Dict[str, str] | None = None) -> None

Writes the data to the specified table in a Trino cluster.

saveAsTable() is an alias of save_as_table().

Parameters:
  • table_name – A string or list of strings that specifies the table name or the fully qualified object identifier (catalog name, schema name, and table name).

  • mode

    One of the following values. When it’s None or not provided, the save mode set by mode() is used.

    • “append”: Append data of this DataFrame to existing data.

    • “overwrite”: Overwrite existing data.

    • “errorifexists”: Throw an exception if data already exists.

    • “ignore”: Ignore this operation if data already exists.

  • column_order

    Controls how DataFrame columns are matched to columns of the target table when mode is “append”: by column position or by column name. Default is “index”. For any other mode, column_order has no effect.

    • “index”: Data is inserted into the target table by column position.

    • “name”: Data is inserted into the target table by matching column names. Use this option if the target table has more columns than the source DataFrame.

  • table_properties – Any custom table properties used to create the table.
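
The difference between the two column_order values can be illustrated with a toy model of how one source row lands in the target table. This is a sketch under stated assumptions and not the library's implementation. The function `align_row` is hypothetical, `None` stands in for a column left unset, and the sketch assumes that positional matching requires the same number of columns on both sides.

```python
def align_row(source_columns, row, target_columns, column_order="index"):
    """Arrange one source row into the target table's column layout.

    Returns a dict mapping each target column to the value it receives.
    """
    if column_order == "index":
        # Positional matching: the i-th value goes into the i-th column,
        # regardless of what the source column is called.
        if len(row) != len(target_columns):
            raise ValueError("positional insert needs matching column counts")
        return dict(zip(target_columns, row))
    if column_order == "name":
        # Name matching: look each target column up in the source row and
        # leave target-only columns unset (None in this sketch).
        by_name = dict(zip(source_columns, row))
        return {column: by_name.get(column) for column in target_columns}
    raise ValueError(f"unsupported column_order: {column_order!r}")


# The source DataFrame has columns (b, a) and the target table has (a, b).
print(align_row(["b", "a"], (2, 1), ["a", "b"], "index"))  # {'a': 2, 'b': 1}
print(align_row(["b", "a"], (2, 1), ["a", "b"], "name"))   # {'a': 1, 'b': 2}
```

With "index", the values follow their position, so a source whose columns are ordered differently from the target ends up with swapped values. With "name", each value follows its column name.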

Examples

>>> df = session.create_dataframe([[1,2],[3,4]], schema=["a", "b"])
>>> df.write.mode("overwrite").save_as_table("my_table")
>>> session.table("my_table").collect()
[Row(A=1, B=2), Row(A=3, B=4)]
>>> df.write.save_as_table("my_table", mode="append")
>>> session.table("my_table").collect()
[Row(A=1, B=2), Row(A=3, B=4), Row(A=1, B=2), Row(A=3, B=4)]
>>> df.write.mode("overwrite").save_as_table("my_table", table_properties={"format": "parquet"})
>>> session.table("my_table").collect()
[Row(A=1, B=2), Row(A=3, B=4)]
>>> df.write.mode("overwrite").save_as_table("my_table", table_properties={"partitioning": ["a"]})
>>> session.table("my_table").collect()
[Row(A=1, B=2), Row(A=3, B=4)]
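
The examples above pass a plain table name and rely on the default column_order. As a complementary sketch, the helper below passes the table name as a list of identifier parts and sets mode and column_order as keyword arguments. It assumes `df` is a pystarburst DataFrame; the helper `append_by_name` and its parameter names are hypothetical, and the identifier parts are assumed to be given outermost first.

```python
def append_by_name(df, catalog, schema, table):
    """Append `df` to a fully qualified table, matching columns by name.

    Hypothetical helper: it only combines parameters that save_as_table()
    documents (table_name as a list, mode, and column_order).
    """
    df.write.save_as_table(
        [catalog, schema, table],  # identifier parts instead of one string
        mode="append",             # takes the place of a mode() call
        column_order="name",       # match DataFrame columns by name
    )
```

Matching by name is the documented choice when the target table has more columns than the DataFrame being appended.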