Are the Nixtla functions names the same for polars...
# general
b
Are the Nixtla function names the same for polars and pandas? I'm getting an error when trying to use the fill_gaps function: AttributeError: 'DataFrame' object has no attribute 'groupby'. I'm converting a script from pandas to polars and had to change "groupby" to "group_by" in some polars code, so I'm wondering whether this is a naming-convention issue or a version issue (e.g., maybe polars previously used groupby and that's causing a problem with the fill_gaps function).
I do not get this error when using pandas, but do with polars.
j
Do you mean the fill_gaps function doesn't work for polars?
b
correct
j
Can you provide an example? It works fine on my end
b
Right now my work isn't something I can share. I'll see if I can replicate on a test/dummy set.
OK. I built out an example. Maybe I'm doing something wrong. Please let me know.
import mlforecast
from statsforecast import StatsForecast
from utilsforecast.preprocessing import fill_gaps

import pandas as pd
import numpy as np
import polars as pl
# Create an empty DataFrame
df = pd.DataFrame(columns=['ds', 'y', 'id'])

# Generate dates (e.g., from January 2023 to December 2024)
start_date = '2023-01-01'
end_date = '2024-12-31'
date_range = pd.date_range(start=start_date, end=end_date, freq='MS')  # MS: Month Start frequency

# Populate the DataFrame
df['ds'] = date_range
df['y'] = np.random.randint(1000, 1501, size=len(date_range))  # Random 'y' values
df['id'] = 42  # Constant 'id' value (you can choose any value)

# Introduce missing dates (e.g., remove every 5th row)
missing_indices = np.arange(0, len(df), 5)
df = df.drop(index=missing_indices)
df_polars = pl.DataFrame(df)
ts = fill_gaps(df_polars, 'MS')
Error I get is:
AttributeError: 'DataFrame' object has no attribute 'groupby'
Also, here are my versions: pandas==1.5.3 utilsforecast==0.1.10
j
If I run your example as-is I get ColumnNotFoundError: unique_id. If I set id_col='id' I get PanicException: expected leading integer in the duration string, found m, which is polars' way of saying it doesn't recognize 'MS', a pandas-specific alias. To use polars you have to provide a polars offset, in this case '1mo', although I then get an error about different time units (I'll work on a fix). In summary, if you change these two lines:
date_range = pd.date_range(start=start_date, end=end_date, freq='MS', unit='us')
ts = fill_gaps(df_polars, freq='1mo', id_col='id')
it should work as expected
b
I updated the pandas dataframe to use 'unique_id' to take that issue out of the equation. When I include unit='us' I get an error:
TypeError: DatetimeArray._generate_range() got an unexpected keyword argument 'unit'
So I removed that part. Not sure if that impacts things. With:
ts = fill_gaps(df_polars, freq='1mo')
I get an error that says:
ValueError: Invalid frequency: 1mo
It also provides more detail:
File /databricks/python/lib/python3.11/site-packages/pandas/_libs/tslibs/offsets.pyx:3878, in pandas._libs.tslibs.offsets._get_offset()
But, it's confusing since I'm not using a pandas dataframe. I've confirmed df_polars is a polars dataframe.
j
did you install polars on the same session?
b
Yes, given the current configuration I think I have to
j
Can you restart your kernel? Since polars is an optional dependency, we try to import it when you import utilsforecast. If it's not installed at that point we use a placeholder, and that placeholder stays in place (so the polars checks evaluate to false) even if you install polars afterwards.
b
Yep, just confirmed. It's a managed DB environment and that's how it's currently set up.
I just restarted the kernel and lost polars.
I have to reinstall it each time. Out of my control.
j
Hmm, then the following might work but it's really suboptimal:
import utilsforecast.compat
utilsforecast.compat.pl_DataFrame = pl.DataFrame
b
This environment is temporary while some other things are being worked on. I'll give this a try and hopefully this will resolve when the other things are worked out.
That returns a new error: PanicException: arg_sort operation not supported for dtype null
I might just have to wait until things are worked out for the environment.
j
I think you also have to do:
utilsforecast.compat.pl_Series = pl.Series
b
Thanks, I added that but still get the error above about a PanicException
j
yeah the library isn't really designed to handle the installation in the same session
b
Understood. Thank you for your help anyway! Still love the Nixtla-verse and all the help you've provided. One other question: is there documentation specific to polars? Most of the examples I see on the Nixtla page use pandas.
j
It should work the same way except for the frequency aliases
👍 1
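Since the frequency alias is the one thing that differs, a small lookup table can help when porting a script. The sketch below is a hypothetical helper (PANDAS_TO_POLARS_FREQ and to_polars_freq are not part of any Nixtla library), and it only covers a few unanchored or start-anchored pandas aliases; anything else should be checked against the pandas and polars docs.

```python
# Hypothetical helper: translate a few common pandas frequency aliases
# into polars duration strings. Not part of utilsforecast.
PANDAS_TO_POLARS_FREQ = {
    "MS": "1mo",   # month start
    "QS": "1q",    # quarter start
    "YS": "1y",    # year start
    "D": "1d",     # daily
    "h": "1h",     # hourly (newer pandas spelling)
    "H": "1h",     # hourly (older pandas spelling)
    "min": "1m",   # minutely; note that polars uses "m" for minutes
    "T": "1m",     # minutely (older pandas spelling)
    "s": "1s",     # secondly
    "S": "1s",     # secondly (older pandas spelling)
}


def to_polars_freq(pandas_freq: str) -> str:
    """Return the polars duration string for a known pandas alias."""
    try:
        return PANDAS_TO_POLARS_FREQ[pandas_freq]
    except KeyError:
        raise ValueError(
            f"No known polars equivalent for pandas alias {pandas_freq!r}"
        ) from None


print(to_polars_freq("MS"))  # 1mo
```

Raising on unknown aliases is deliberate: anchored aliases such as weekly or month-end frequencies do not map one-to-one, so failing early seems safer than guessing.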
b
Great. Thank you!
j
Hey I think it might work if you also override the polars alias:
utilsforecast.compat.pl = pl
Or if you install polars before importing anything else
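For the "install before importing" route, a quick standard-library check can show where a session stands. This is only a diagnostic sketch: optional_dep_status is a hypothetical helper, and it cannot tell whether the library was imported before or after the dependency was installed, only whether each is currently present.

```python
import importlib.util
import sys


def optional_dep_status(dep: str, lib: str) -> str:
    """Describe whether `dep` is installed and whether `lib` is already imported."""
    dep_installed = importlib.util.find_spec(dep) is not None
    lib_imported = lib in sys.modules
    if not dep_installed:
        return f"{dep} is not installed; install it before importing {lib}"
    if lib_imported:
        return (
            f"{lib} is already imported; if {dep} was installed afterwards, "
            "restart the session or patch the cached aliases"
        )
    return f"{dep} is installed and {lib} is not imported yet; safe to import"


# Run this before any Nixtla import in a fresh session.
print(optional_dep_status("polars", "utilsforecast"))
```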