Thanks for developing such a great project. I have a question about using LightGBM. I'm having issues with the distributed module (completely unrelated), so I'm thinking about just using a single large machine to do the training. My dataset is roughly 800M records, and reading it into a pandas dataframe may be an issue (or maybe not, I'm just speculating). Does MLforecast accept inputs other than pandas dataframes? Would I be able to use the native lgb.Dataset(...)?
08/29/2023, 7:08 PM
Hey. We currently support only pandas dataframes, so you'd need to load it into memory. You could build a LightGBM dataset after the preprocessing and train using that, but I'm not sure that'd help a lot.
08/29/2023, 7:09 PM
Makes sense. Thanks for the quick response!
08/29/2023, 7:18 PM
We try to preserve the input dtypes where possible, so if you define the id as categorical and the target as float32, you could reduce the memory usage.
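For reference, here's a rough sketch of the dtype suggestion using plain pandas. The column names (`unique_id`, `ds`, `y`) and the synthetic data are assumptions for illustration only; they are not taken from the thread, and the actual savings depend on how many distinct ids you have and how long they are.

```python
import numpy as np
import pandas as pd

# Synthetic long-format data (hypothetical sizes and column names).
n_series, n_steps = 100, 50
dates = pd.date_range("2023-01-01", periods=n_steps, freq="D").to_numpy()
df = pd.DataFrame(
    {
        "unique_id": np.repeat([f"series_{i}" for i in range(n_series)], n_steps),
        "ds": np.tile(dates, n_series),
        "y": np.random.default_rng(0).normal(size=n_series * n_steps),
    }
)

# Memory footprint with string ids and a float64 target.
before = df.memory_usage(deep=True).sum()

# Store the id as a categorical (small integer codes plus one copy of each
# distinct label) and the target as float32 (half the bytes of float64).
slim = df.astype({"unique_id": "category", "y": "float32"})
after = slim.memory_usage(deep=True).sum()

print(f"before: {before / 1e6:.2f} MB, after: {after / 1e6:.2f} MB")
```

Keep in mind that float32 carries roughly 7 significant decimal digits, so it's worth checking that this precision is enough for your target before downcasting.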