..  Licensed to the Apache Software Foundation (ASF) under one
    or more contributor license agreements.  See the NOTICE file
    distributed with this work for additional information
    regarding copyright ownership.  The ASF licenses this file
    to you under the Apache License, Version 2.0 (the
    "License"); you may not use this file except in compliance
    with the License.  You may obtain a copy of the License at

..    http://www.apache.org/licenses/LICENSE-2.0

..  Unless required by applicable law or agreed to in writing,
    software distributed under the License is distributed on an
    "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
    KIND, either express or implied.  See the License for the
    specific language governing permissions and limitations
    under the License.


.. _api.dataframe:

=========
DataFrame
=========
.. currentmodule:: pyspark.pandas

Constructor
-----------
.. autosummary::
   :toctree: api/

   DataFrame

Attributes and underlying data
------------------------------

.. autosummary::
   :toctree: api/

   DataFrame.index
   DataFrame.info
   DataFrame.columns
   DataFrame.empty

.. autosummary::
   :toctree: api/

   DataFrame.dtypes
   DataFrame.shape
   DataFrame.axes
   DataFrame.ndim
   DataFrame.size
   DataFrame.select_dtypes
   DataFrame.values

Conversion
----------
.. autosummary::
   :toctree: api/

   DataFrame.copy
   DataFrame.isna
   DataFrame.astype
   DataFrame.isnull
   DataFrame.notna
   DataFrame.notnull
   DataFrame.bool

Indexing, iteration
-------------------
.. autosummary::
   :toctree: api/

   DataFrame.at
   DataFrame.iat
   DataFrame.head
   DataFrame.idxmax
   DataFrame.idxmin
   DataFrame.loc
   DataFrame.iloc
   DataFrame.insert
   DataFrame.items
   DataFrame.iterrows
   DataFrame.itertuples
   DataFrame.keys
   DataFrame.pop
   DataFrame.tail
   DataFrame.xs
   DataFrame.get
   DataFrame.where
   DataFrame.mask
   DataFrame.query

Binary operator functions
-------------------------
.. autosummary::
   :toctree: api/

   DataFrame.add
   DataFrame.radd
   DataFrame.div
   DataFrame.rdiv
   DataFrame.truediv
   DataFrame.rtruediv
   DataFrame.mul
   DataFrame.rmul
   DataFrame.sub
   DataFrame.rsub
   DataFrame.pow
   DataFrame.rpow
   DataFrame.mod
   DataFrame.rmod
   DataFrame.floordiv
   DataFrame.rfloordiv
   DataFrame.lt
   DataFrame.gt
   DataFrame.le
   DataFrame.ge
   DataFrame.ne
   DataFrame.eq
   DataFrame.dot
   DataFrame.combine_first

Function application, GroupBy & Window
--------------------------------------
.. autosummary::
   :toctree: api/

   DataFrame.apply
   DataFrame.applymap
   DataFrame.map
   DataFrame.pipe
   DataFrame.agg
   DataFrame.aggregate
   DataFrame.groupby
   DataFrame.rolling
   DataFrame.expanding
   DataFrame.transform

.. _api.dataframe.stats:

Computations / Descriptive Stats
--------------------------------
.. autosummary::
   :toctree: api/

   DataFrame.abs
   DataFrame.all
   DataFrame.any
   DataFrame.clip
   DataFrame.corr
   DataFrame.corrwith
   DataFrame.count
   DataFrame.cov
   DataFrame.describe
   DataFrame.ewm
   DataFrame.kurt
   DataFrame.kurtosis
   DataFrame.max
   DataFrame.mean
   DataFrame.min
   DataFrame.median
   DataFrame.mode
   DataFrame.pct_change
   DataFrame.prod
   DataFrame.product
   DataFrame.quantile
   DataFrame.rank
   DataFrame.nunique
   DataFrame.sem
   DataFrame.skew
   DataFrame.sum
   DataFrame.std
   DataFrame.var
   DataFrame.cummin
   DataFrame.cummax
   DataFrame.cumsum
   DataFrame.cumprod
   DataFrame.round
   DataFrame.diff
   DataFrame.eval

Reindexing / Selection / Label manipulation
-------------------------------------------
.. autosummary::
   :toctree: api/

   DataFrame.add_prefix
   DataFrame.add_suffix
   DataFrame.align
   DataFrame.at_time
   DataFrame.between_time
   DataFrame.drop
   DataFrame.droplevel
   DataFrame.drop_duplicates
   DataFrame.duplicated
   DataFrame.equals
   DataFrame.filter
   DataFrame.first
   DataFrame.head
   DataFrame.last
   DataFrame.reindex
   DataFrame.reindex_like
   DataFrame.rename
   DataFrame.rename_axis
   DataFrame.reset_index
   DataFrame.set_index
   DataFrame.swapaxes
   DataFrame.swaplevel
   DataFrame.take
   DataFrame.isin
   DataFrame.sample
   DataFrame.truncate

.. _api.dataframe.missing:

Missing data handling
---------------------
.. autosummary::
   :toctree: api/

   DataFrame.backfill
   DataFrame.dropna
   DataFrame.fillna
   DataFrame.replace
   DataFrame.bfill
   DataFrame.ffill
   DataFrame.interpolate
   DataFrame.pad

Reshaping, sorting, transposing
-------------------------------
.. autosummary::
   :toctree: api/

   DataFrame.pivot_table
   DataFrame.pivot
   DataFrame.sort_index
   DataFrame.sort_values
   DataFrame.nlargest
   DataFrame.nsmallest
   DataFrame.stack
   DataFrame.unstack
   DataFrame.melt
   DataFrame.explode
   DataFrame.squeeze
   DataFrame.T
   DataFrame.transpose

Combining / joining / merging
-----------------------------
.. autosummary::
   :toctree: api/

   DataFrame.assign
   DataFrame.merge
   DataFrame.join
   DataFrame.update

Time series-related
-------------------
.. autosummary::
   :toctree: api/

   DataFrame.resample
   DataFrame.shift
   DataFrame.first_valid_index
   DataFrame.last_valid_index

Serialization / IO / Conversion
-------------------------------
.. autosummary::
   :toctree: api/

   DataFrame.from_dict
   DataFrame.from_records
   DataFrame.to_table
   DataFrame.to_delta
   DataFrame.to_parquet
   DataFrame.to_csv
   DataFrame.to_orc
   DataFrame.to_pandas
   DataFrame.to_html
   DataFrame.to_numpy
   DataFrame.to_spark
   DataFrame.to_string
   DataFrame.to_feather
   DataFrame.to_stata
   DataFrame.to_json
   DataFrame.to_dict
   DataFrame.to_excel
   DataFrame.to_hdf
   DataFrame.to_clipboard
   DataFrame.to_markdown
   DataFrame.to_records
   DataFrame.to_latex
   DataFrame.style

Spark-related
-------------
``DataFrame.spark`` provides features that exist in Spark but not in pandas.
These can be accessed by ``DataFrame.spark.<function/property>``.

.. autosummary::
   :toctree: api/
   :template: autosummary/accessor_method.rst

   DataFrame.spark.frame
   DataFrame.spark.cache
   DataFrame.spark.persist
   DataFrame.spark.hint
   DataFrame.spark.to_table
   DataFrame.spark.to_spark_io
   DataFrame.spark.apply
   DataFrame.spark.repartition
   DataFrame.spark.coalesce

.. _api.dataframe.plot:

Plotting
--------
``DataFrame.plot`` is both a callable method and a namespace attribute for
specific plotting methods of the form ``DataFrame.plot.<kind>``.

.. autosummary::
   :toctree: api/
   :template: autosummary/accessor_method.rst

   DataFrame.plot.area
   DataFrame.plot.bar
   DataFrame.plot.barh
   DataFrame.plot.box
   DataFrame.plot.density
   DataFrame.plot.hist
   DataFrame.plot.kde
   DataFrame.plot.line
   DataFrame.plot.pie
   DataFrame.plot.scatter

.. autosummary::
   :toctree: api/

   DataFrame.hist
   DataFrame.boxplot
   DataFrame.kde

Pandas-on-Spark specific
------------------------
``DataFrame.pandas_on_spark`` provides pandas-on-Spark specific features that exist only in pandas API on Spark.
These can be accessed by ``DataFrame.pandas_on_spark.<function/property>``.

.. autosummary::
   :toctree: api/
   :template: autosummary/accessor_method.rst

   DataFrame.pandas_on_spark.apply_batch
   DataFrame.pandas_on_spark.transform_batch