# neural-forecast
j
how can I make multiple predictions at once? It seems that calling predict with a dataframe with multiple rows only predicts on the last row:
o
Not sure what you mean? The predictions returned depend on the test df that you provide to the .predict function.
r
@Jonathan Mackenzie isn't the 'horizon' parameter you set in the training phase what you want?
j
@Olivier I have horizon=6 so I expect 6 records, but I pass in a test df with 2 rows, one at 11am and another at 2:40pm, I want predictions made at both of those times
m
Hello! Data must be contiguous, meaning that it must be ordered in time, ideally without missing values. Here, by passing a df with values at 11am and 2:40pm, the model uses those two values as input and makes a forecast for 2:50pm and so on. If you want to forecast from 11am, then you must first pass a df that ends at 11am, so that this input is used to forecast 11:10am, and so on. Otherwise, if you are forecasting two different series (two different unique_ids), then id1 can end at 11am and id2 can end at 2:40pm. I hope this helps 🙂
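A minimal pandas sketch of the two cases described above (the `unique_id`/`ds`/`y` column names follow neuralforecast's defaults; the ids, timestamps, and target values are made up, and `nf` in the final comment stands for an already-fitted `NeuralForecast` object):

```python
import pandas as pd

# Case 1: a single series. The input history must be a contiguous
# 10-minute sequence that *ends* at 11:00, so the forecast starts at 11:10.
hist = pd.DataFrame({
    "unique_id": "S034",
    "ds": pd.date_range("2025-01-01 08:00", "2025-01-01 11:00", freq="10min"),
})
hist["y"] = range(len(hist))  # dummy target values

# The last timestamp determines where the forecast begins:
assert hist["ds"].max() == pd.Timestamp("2025-01-01 11:00")

# Case 2: to forecast two series from different points in time, give each
# its own unique_id and let each series end where its forecast should start.
s1 = hist.assign(unique_id="id1")
s2 = hist.assign(
    unique_id="id2",
    ds=hist["ds"] + pd.Timedelta(hours=3, minutes=40),  # ends at 14:40
)
df = pd.concat([s1, s2], ignore_index=True)
# nf.predict(df=df) would then forecast id1 from 11:10 and id2 from 14:50.
```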
j
@Marco thanks, I have another question, I have a dataset that goes from 4am to 8pm every day, at 10 minute intervals and I want to predict data over the whole day (a horizon of 96). Should I make a different unique_id for each day?
m
No, just set your horizon to 96 and all timestamps will be predicted
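For reference, the arithmetic behind that horizon: a day running from 4am to 8pm at 10-minute intervals is 16 hours × 6 steps/hour = 96 steps, which is exactly one forecast horizon (the dates below are placeholders):

```python
import pandas as pd

# One day of observations from 04:00 up to (but not including) 20:00,
# at 10-minute steps: 16 hours * 6 steps/hour = 96 timestamps.
day = pd.date_range(
    "2025-01-01 04:00", "2025-01-01 20:00", freq="10min", inclusive="left"
)
horizon = len(day)
assert horizon == 96
```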
j
I got some pretty bad results doing that
o
We don't know enough about the forecasting problem to provide further guidance... e.g. you could be trying to predict a financial instrument, in which case the entire exercise is pointless, or it could be a misconfiguration, a data issue, an incorrect model setup, etc. Happy to provide further guidance if you give more context, or ideally a piece of code that we can run / look at.
j
@Olivier the data is solar power generation data (I have total kwh generated for the current day, and instantaneous power output), I have 2 years of data at 10 minute intervals, from 6am to 8pm each day. I've also got weather data from a nearby weather station (added it to see if it would help improve predictions). I can't share the whole solar data, but my test dataframe looks like this:
```
test_df.iloc[0]
Out[12]:
timestamp                  2025-01-01 18:10:00
ac_power_site                              0.0
irradiance                                 0.0
daily_energy_site                       375.47
daily_energy_inverter_1                 124.19
ac_power_inverter_1                        0.0
daily_energy_inverter_2                 117.41
ac_power_inverter_2                        0.0
daily_energy_inverter_3                 133.88
ac_power_inverter_3                        0.0
air_temperature                           17.8
air_pressure                            1013.2
humidity                                  84.0
sunshine_duration                          0.0
solar_radiation                            0.0
unique_id                                 S034
```
We also wanted to make a model for the individual inverters that make up the whole solar plant. My code looks like this:
```python
def train(site, inverter, target_prefix="daily_energy"):
    """
    Train a model using the neuralforecast NHITS model for a given site or inverter
    """
    if not inverter:
        target = f"{target_prefix}_site"
    else:
        target = f"{target_prefix}_{inverter}"

    data = load_data(site)
    df = data.reset_index(drop=False)
    print(f"Training site={site} inverter={inverter} target={target}")
    # drop null values
    df = df.dropna(subset=['timestamp', 'irradiance', target])
    split_idx = int(0.8 * len(df))
    train_df = df.iloc[:split_idx]
    test_df = df.iloc[split_idx:]
    horizon = 96  # 96 predictions, 10 minutes apart
    tb_logger = TensorBoardLogger(save_dir="tb_logs", name="solar_tb_logs")
    extra_fields = [
        'irradiance',
        'air_temperature',
        'air_pressure',
        'humidity',
        'sunshine_duration',
        'solar_radiation',
    ]
    # Use your own config or AutoNHITS.default_config
    nhits_config = {
        "learning_rate": tune.choice([1.9e-7]),
        "max_steps": tune.choice([1024]),  # Number of SGD steps
        "input_size": tune.choice([5 * horizon, 3 * horizon, 8 * horizon]),  # input_size = multiplier * horizon
        "batch_size": tune.choice([32, 16, 8]),  # Number of series in windows
        "windows_batch_size": tune.choice([128, 256, 512, 1024]),  # Number of windows in batch
        "n_pool_kernel_size": tune.choice(
            [[2, 2, 1], 3 * [1], 3 * [2], 3 * [4], [8, 4, 1], [16, 8, 1]]
        ),  # MaxPool's kernel size
        "n_freq_downsample": tune.choice(
            [[168, 24, 1], [24, 12, 1], [4, 2, 1], [1, 1, 1]]
        ),  # Interpolation expressivity ratios
        "hist_exog_list": tune.choice([extra_fields]),
        "activation": tune.choice(['ReLU']),  # Type of non-linear activation
        "n_blocks": tune.choice([[1, 1, 1]]),  # Blocks per each of the 3 stacks
        "mlp_units": tune.choice(
            [3 * [[512, 512]], 4 * [[512, 512]], 5 * [[256, 256]]]
        ),  # 2 layers per block for each stack
        "early_stop_patience_steps": tune.choice([4]),
        "interpolation_mode": tune.choice(['linear']),  # Type of multi-step interpolation
        "val_check_steps": tune.choice([20]),  # Compute validation every 20 steps
        "random_seed": tune.randint(1, 20),
        "logger": tune.choice([tb_logger]),
        "callbacks": tune.choice([[LearningRateFinder()]]),
    }

    nf = NeuralForecast(
        models=[
            AutoNHITS(
                h=horizon,
                config=nhits_config,
                num_samples=16,
            ),
        ],
        freq='10min'
    )
    # print("Best config", nf.models[0].results.get_best_result().config)

    nf.fit(
        df=df[['unique_id', 'timestamp', target] + extra_fields],
        time_col="timestamp",
        target_col=target,
        val_size=int(0.15 * len(train_df)),
    )

    if inverter:
        model_name = f"{site}_inverter_{inverter}_h_{horizon}"
    else:
        model_name = f"{site}_site_h_{horizon}"
    model_output = root_path / 'models' / model_name
    nf.save(str(model_output), overwrite=True, save_dataset=False)
    print("Writing to", model_output)
    return nf, train_df, test_df, target, f"{site} {inverter}"
```
o
Thanks - I'll have a look, but the first thing I noticed is that .fit should probably be on `train_df`, not the full `df`, no? Second, do you have more than one unique_id? Otherwise there's an error in how you create the train and test sets - do that with e.g. `df.groupby(["unique_id"], sort=False).tail(20)` if you want to keep the last 20 timesteps. Make sure the dataframe is sorted too before making the split: `df.sort_values(by=["unique_id", "ds"])`. I'll come back later, but these are a few easy things from looking at your code.
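A minimal sketch of that per-series split on toy data (two hypothetical series, deliberately unsorted; in the real code the tail length would be the 96-step horizon rather than 2):

```python
import pandas as pd

# Toy frame with two series, rows out of order on purpose.
df = pd.DataFrame({
    "unique_id": ["id2", "id1", "id1", "id2", "id1", "id2"],
    "ds": pd.to_datetime([
        "2025-01-01 10:20", "2025-01-01 10:00", "2025-01-01 10:10",
        "2025-01-01 10:00", "2025-01-01 10:20", "2025-01-01 10:10",
    ]),
    "y": [20.0, 1.0, 2.0, 10.0, 3.0, 30.0],
})

df = df.sort_values(by=["unique_id", "ds"])            # sort before splitting
test_df = df.groupby("unique_id", sort=False).tail(2)  # last 2 steps per series
train_df = df.drop(test_df.index)                      # everything else
```

This keeps the held-out window at the end of *each* series, instead of a single `iloc` cut that would mix series if there were more than one unique_id.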
j
thanks. I've fixed the train_df bit, and there is only 1 unique_id value in this dataset ("S034"), i.e. site 34
o
Further on, it seems you don't have a `scaler_type` set, which is usually the most important hyperparameter in any algorithm. The learning rate doesn't make sense either, it's way too low. Just start with the default config of AutoNHITS.
j
the plots of the data we might want to predict look like this: [plot not included in export]
I used the `LearningRateFinder` callback, is that not compatible with setting the learning rate?
o
You're overcomplicating things. All of these don't move the needle and should be something you look at at the very end, when you want to squeeze out the last 1% of performance.
E.g. start with something simple:
```python
nhits_config = {
    "scaler_type": tune.choice(["minmax1", "robust", "identity"]),
    "max_steps": tune.choice([500, 1000, 2000, 5000]),
}
```
j
what's wrong with doing early stopping?
o
Nothing, but as I said, you're overcomplicating things, which seems unnecessary at this point. Start simple; you can always add complexity later. Right now you have no clue why performance is bad, because you jumped straight to the most complex pipeline in the history of mankind 🙂