Brian Head
07/30/2024, 2:50 PM
I'm getting an error from the fill_gaps function: AttributeError: 'DataFrame' object has no attribute 'groupby'. I'm wondering if it's a naming convention issue or a version issue. I'm playing with converting a script from pandas to polars, and I had to change "groupby" to "group_by" in some polars code, which is why I'm wondering (e.g., maybe polars previously used groupby and that's causing a problem with the fill_gaps function).
Brian Head
07/30/2024, 2:51 PM
José Morales
07/30/2024, 3:53 PM
Brian Head
07/30/2024, 4:50 PM
José Morales
07/30/2024, 4:52 PM
Brian Head
07/30/2024, 4:56 PM
Brian Head
07/30/2024, 5:56 PM
import mlforecast
from statsforecast import StatsForecast
from utilsforecast.preprocessing import fill_gaps
import pandas as pd
import numpy as np
import polars as pl
# Create an empty DataFrame
df = pd.DataFrame(columns=['ds', 'y', 'id'])
# Generate dates (e.g., from January 2023 to December 2024)
start_date = '2023-01-01'
end_date = '2024-12-31'
date_range = pd.date_range(start=start_date, end=end_date, freq='MS') # MS: Month Start frequency
# Populate the DataFrame
df['ds'] = date_range
df['y'] = np.random.randint(1000, 1501, size=len(date_range)) # Random 'y' values
df['id'] = 42 # Constant 'id' value (you can choose any value)
# Introduce missing dates (e.g., remove every 5th row)
missing_indices = np.arange(0, len(df), 5)
df = df.drop(index=missing_indices)
df_polars = pl.DataFrame(df)
ts = fill_gaps(df_polars, 'MS')
Error I get is:
AttributeError: 'DataFrame' object has no attribute 'groupby'
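On the naming question raised earlier: pandas spells the method `groupby`, older polars releases did too, and newer polars releases spell it `group_by`. A minimal sketch of a helper that tolerates either spelling is below. `group_rows` and the two stand-in classes are hypothetical names used only for illustration; they are not part of pandas, polars, or utilsforecast, and this does not by itself explain the error above.

```python
def group_rows(df, by):
    """Call whichever grouping method the frame exposes."""
    # Prefer the newer spelling; fall back to the older one.
    # If neither exists, the second getattr raises AttributeError.
    method = getattr(df, "group_by", None) or getattr(df, "groupby")
    return method(by)


# Hypothetical stand-ins for the two spellings (not real library classes).
class NewStyleFrame:
    def group_by(self, by):
        return f"group_by({by!r})"


class OldStyleFrame:
    def groupby(self, by):
        return f"groupby({by!r})"


new_result = group_rows(NewStyleFrame(), "id")
old_result = group_rows(OldStyleFrame(), "id")
```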
Brian Head
07/30/2024, 6:02 PM
José Morales
07/30/2024, 6:10 PM
I get ColumnNotFoundError: unique_id. If I set id_col='id' I get PanicException: expected leading integer in the duration string, found m, which is polars' way of saying it doesn't recognize 'MS', a pandas-specific alias. In order to use polars you have to provide a polars offset, in this case '1mo', although I get an error about different units (I'll work on a fix). In summary, if you change these two lines:
date_range = pd.date_range(start=start_date, end=end_date, freq='MS', unit='us')
ts = fill_gaps(df_polars, freq='1mo', id_col='id')
it should work as expected.
Brian Head
07/30/2024, 6:15 PM
I get TypeError: DatetimeArray._generate_range() got an unexpected keyword argument 'unit'
So I removed that part. Not sure if that impacts things.
With:
ts = fill_gaps(df_polars, freq='1mo')
I get an error that says:
ValueError: Invalid frequency: 1mo
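The two frequency strings in this exchange come from different vocabularies: 'MS' is a pandas offset alias, while '1mo' is a polars duration string, so each library rejects the other's spelling. A minimal sketch of a translation helper follows. `pandas_freq_to_polars` and its lookup table are hypothetical and cover only a handful of aliases; they are not part of pandas, polars, or utilsforecast.

```python
import re

# Hypothetical lookup table: pandas offset alias -> polars duration unit.
# Deliberately small; extend it only for aliases whose meaning you have checked.
_PANDAS_TO_POLARS_UNIT = {
    "MS": "mo",   # month start -> calendar month
    "QS": "q",    # quarter start -> calendar quarter
    "YS": "y",    # year start -> calendar year
    "D": "d",
    "h": "h", "H": "h",
    "min": "m", "T": "m",
    "s": "s", "S": "s",
}


def pandas_freq_to_polars(freq: str) -> str:
    """Translate a pandas alias such as 'MS' or '2D' into a polars duration string."""
    match = re.fullmatch(r"(\d*)([A-Za-z]+)", freq)
    if match is None:
        raise ValueError(f"Unrecognized frequency: {freq!r}")
    count, alias = match.groups()
    if alias not in _PANDAS_TO_POLARS_UNIT:
        raise ValueError(f"No polars equivalent known for pandas alias {alias!r}")
    # pandas treats a missing multiplier as 1; polars requires it to be explicit.
    return f"{count or 1}{_PANDAS_TO_POLARS_UNIT[alias]}"


monthly = pandas_freq_to_polars("MS")
two_days = pandas_freq_to_polars("2D")
```

Note that passing '1mo' to this helper raises ValueError, since 'mo' is not a pandas alias. This mirrors the "Invalid frequency: 1mo" message above, which comes from the pandas side.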
Brian Head
07/30/2024, 6:18 PM
File /databricks/python/lib/python3.11/site-packages/pandas/_libs/tslibs/offsets.pyx:3878, in pandas._libs.tslibs.offsets._get_offset()
But it's confusing, since I'm not using a pandas dataframe. I've confirmed df_polars is a polars dataframe.
José Morales
07/30/2024, 6:20 PM
Brian Head
07/30/2024, 6:21 PM
José Morales
07/30/2024, 6:21 PM
Brian Head
07/30/2024, 6:21 PM
Brian Head
07/30/2024, 6:21 PM
Brian Head
07/30/2024, 6:22 PM
José Morales
07/30/2024, 6:22 PM
import utilsforecast.compat
utilsforecast.compat.pl_DataFrame = pl.DataFrame
Brian Head
07/30/2024, 6:23 PM
Brian Head
07/30/2024, 6:26 PM
arg_sort operation not supported for dtype null
Brian Head
07/30/2024, 6:26 PM
José Morales
07/30/2024, 6:27 PM
utilsforecast.compat.pl_Series = pl.Series
Brian Head
07/30/2024, 6:28 PM
José Morales
07/30/2024, 6:30 PM
Brian Head
07/30/2024, 6:31 PM
José Morales
07/30/2024, 6:31 PM
Brian Head
07/30/2024, 6:31 PM
José Morales
07/31/2024, 4:47 PM
utilsforecast.compat.pl = pl
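The three assignments in this thread (pl_DataFrame, pl_Series, pl) hint at a possible mechanism, though the thread does not spell it out. If a compat module records placeholder types when an optional import is unavailable at import time, a later isinstance check never matches a real polars frame, and the pandas code path runs instead. The sketch below simulates that idea with the standard library only. Every name in it (`compat`, `RealFrame`, `PlaceholderFrame`, `pick_backend`) is a hypothetical stand-in, and it is an assumption about the mechanism, not utilsforecast's actual code.

```python
from types import SimpleNamespace


class RealFrame:
    """Stand-in for the real dataframe class (hypothetical, illustration only)."""


class PlaceholderFrame:
    """Stand-in for a dummy class a compat module might define if an import fails."""


# Hypothetical compat namespace, as it might look after a failed optional import.
compat = SimpleNamespace(pl_DataFrame=PlaceholderFrame)


def pick_backend(df):
    """Dispatch on whatever type the compat namespace currently records."""
    return "polars" if isinstance(df, compat.pl_DataFrame) else "pandas"


df = RealFrame()

# The placeholder never matches a real frame, so the pandas path is chosen.
before = pick_backend(df)

# Same shape as the workaround in the thread: point the name at the real class.
compat.pl_DataFrame = RealFrame
after = pick_backend(df)
```

Under this assumption, patching the module attributes at runtime changes which branch the type check takes. That would be consistent with the pandas-side traceback seen earlier on a polars frame.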
José Morales
07/31/2024, 4:47 PM