Reads in an .ods/.fods file and returns a pandas DataFrame object (+ parse cell formatting)

ljnsn 28f61d86bd fix: support py312		2024-05-31 23:45:19 +02:00
.github/workflows	ci: drop tests on 3.8 and add 3.11 and 3.12	2024-05-31 17:55:59 +02:00
pandas_ods_reader	fix-lint: black format	2024-05-31 17:55:59 +02:00
tests	fix-lint: black format	2024-05-31 17:55:59 +02:00
.gitignore	ignore: pyenv python	2024-05-31 17:55:59 +02:00
.pre-commit-config.yaml	fix: remove push hook	2022-12-21 23:51:45 +01:00
LICENSE.txt	add license and manifest	2019-01-28 21:22:40 +01:00
README.md	update package name	2021-08-22 18:57:42 +02:00
poetry.lock	fix: support py312	2024-05-31 23:45:19 +02:00
pyproject.toml	fix: support py312	2024-05-31 23:45:19 +02:00

README.md

pandas-ods-reader

Provides a function that reads a .ods or .fods file and returns a pandas DataFrame.

It uses ezodf to read .ods files. Since .fods files are essentially XML, lxml is used to read them. The correct parser is chosen automatically based on the file's extension.
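The extension-based choice described above can be sketched as follows. This is a minimal illustration, not the package's actual code: the `PARSERS` mapping and the `choose_parser` helper are hypothetical names, and lowercasing the extension before the lookup is an assumption.

```python
from pathlib import Path

# Hypothetical mapping from file extension to the backend that reads it.
PARSERS = {".ods": "ezodf", ".fods": "lxml"}


def choose_parser(path):
    """Return the backend name for a path, based only on its extension."""
    # Assumption: the comparison is case-insensitive.
    suffix = Path(path).suffix.lower()
    try:
        return PARSERS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file extension: {suffix!r}") from None
```

Because only the extension is inspected in this sketch, a flat XML file saved with a `.ods` name would be handed to the wrong backend.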

If a range is specified in the sheet to be imported, ezodf seems to import the empty cells in that range as well. Therefore, trailing rows and columns that are completely empty are dropped from the DataFrame before it is returned. Empty rows and columns that are followed by data are kept.
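To make the trimming rule concrete, here is a rough sketch on plain nested lists, so it runs without any dependencies. It is not the package's implementation: `is_empty` and `trim_trailing_empty` are hypothetical helpers, the input is assumed to be rectangular, and treating `None` and `""` as empty is an assumption.

```python
def is_empty(cell):
    """Assumption: a cell counts as empty when it is None or an empty string."""
    return cell is None or cell == ""


def trim_trailing_empty(rows):
    """Drop trailing all-empty rows and columns from a rectangular grid."""
    rows = [list(row) for row in rows]  # copy so the input is not mutated
    # Remove completely empty rows from the bottom up.
    while rows and all(is_empty(cell) for cell in rows[-1]):
        rows.pop()
    # Remove completely empty columns from the right-hand side.
    while rows and rows[0] and all(is_empty(row[-1]) for row in rows):
        for row in rows:
            row.pop()
    return rows


grid = [
    ["a", None, None],
    [None, None, None],  # interior empty row: kept
    ["b", "c", None],
    [None, None, None],  # trailing empty row: dropped
]
trimmed = trim_trailing_empty(grid)
```

In this example the last row and the last column are removed, while the empty second row stays because data follows it.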

If the ODS file contains duplicate column names, they are numbered: a number is appended to each duplicated name in the resulting DataFrame.
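One possible way to number duplicated names is sketched below. The exact naming scheme is not stated here, so the `.` separator, the 1-based counter, and the choice to number the first occurrence as well are all assumptions; `number_duplicates` is a hypothetical helper, not part of the package.

```python
from collections import Counter


def number_duplicates(names, sep="."):
    """Append a running number to every name that occurs more than once."""
    totals = Counter(names)  # how often each name occurs overall
    seen = Counter()         # how often each name has occurred so far
    result = []
    for name in names:
        if totals[name] > 1:
            seen[name] += 1
            result.append(f"{name}{sep}{seen[name]}")
        else:
            # Unique names are left unchanged.
            result.append(name)
    return result


columns = number_duplicates(["a", "b", "a", "c", "a"])
```

Note that this sketch does not guard against a generated name colliding with a column that already exists under that name.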

Dependencies

ezodf
lxml
pandas

Installation

pip install pandas-ods-reader

Usage

from pandas_ods_reader import read_ods

path = "path/to/file.ods"

# by default the first sheet is imported
df = read_ods(path)

# load a sheet based on its index (1-based)
sheet_idx = 2
df = read_ods(path, sheet_idx)

# load a sheet based on its name
sheet_name = "sheet1"
df = read_ods(path, sheet_name)

# load a file that does not contain a header row
# if no columns are provided, they will be numbered
df = read_ods(path, 1, headers=False)

# load a file and provide custom column names
# if headers is True (the default), the header row will be overwritten
df = read_ods(path, 1, columns=["A", "B", "C"])