# timegpt
v
Hi Team, I'm getting this error in R almost every time I have a "large" dataset.
Error in `httr2::req_perform()`:                               
! Failed to perform HTTP request.
Caused by error in `curl::curl_fetch_memory()`:
! Timeout was reached: [dashboard.nixtla.io] SSL/TLS connection timeout
The dataset is 462,000 rows and around 11,000 time series. I've seen blog posts by people from the team talking about forecasting millions of time series with TimeGPT, but I always seem to have trouble with datasets of several thousand time series.
This is what I see before I get the error.
m
Hi @Vidar Ingason, the current CRAN/dev versions of nixtlar don't have parallel processing yet. That's probably why you're having issues with such a large dataset. We're currently working on implementing that feature. It was going to be released in the next CRAN version, but if you need it sooner, we'll add it to the dev version.
v
Hi @Mariana Menchero, do you have an estimated date for when you might add it? I'm setting this up for a client that has around 11,000 products, and I would like to keep TimeGPT as part of the solution. I know you're busy, but since you asked, I would love to have this as soon as possible 🙂
m
Parallel processing is already a feature of the next CRAN version, so let me check if we can add it to the dev version this week. I'll get back to you on Thursday. It shouldn't take long.
v
Thank you so much Mariana!
m
Hi @Vidar Ingason, thanks for your patience. I opened a PR for supporting large datasets in nixtlar. It should be ready early next week. As soon as it has been merged to main, I'll let you know here. Regards.
v
You’re a genius 😃
m
that's overly generous 😅 We're working hard to add the features our users need as quickly as possible. We’ll keep you updated and let you know as soon as it’s ready
v
Hi @Mariana Menchero, just curious. Do you think you'll update the package this week?
m
Hi @Vidar Ingason, yes, we just updated the package. Please try it out and let us know if you encounter any issues.
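With the update you can pass the number of partitions directly to the forecast call. Roughly something like this (assuming your API key is already set; the partition count here is just an illustrative value, not a recommendation):
library(nixtlar)

# Split the request into 10 partitions that are processed in parallel
fcst <- nixtla_client_forecast(
  df = df,
  h = 12,
  num_partitions = 10
)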
v
Hi, I keep getting "Please set NIXTLA_API_KEY. Use nixtla_set_api_key() or set it as an environment variable in .Renviron" when I use the num_partitions argument. If I do not use the argument, everything runs fine.
I might be mistaken, but since each worker runs in a separate R session, the workers don't automatically inherit the environment variables from the main R session, which might be the reason for the error I'm getting. Wouldn't you need to include nixtla_set_api_key somehow within future_lapply?
Did a small test:
library(dplyr)         # filter(), bind_rows(), %>%
library(rlang)         # sym() for tidy evaluation
library(future)        # plan(multisession)
library(future.apply)  # future_lapply()
library(nixtlar)       # nixtla_set_api_key(), nixtla_client_forecast()

run_parallel_forecasts <- function(data_tbl, M, h = 12, id_col = "unique_id", time_col = "ds", target_col = "y") {
  
  # Split the data into M partitions
  unique_ids <- unique(data_tbl[[id_col]])
  ids_per_partition <- ceiling(length(unique_ids) / M)
  split_ids <- split(unique_ids, rep(1:M, each = ids_per_partition, length.out = length(unique_ids)))
  
  # Define a helper function to filter the data based on unique IDs
  filter_data <- function(ids, data) {
    data %>% filter(!!sym(id_col) %in% ids)
  }
  
  # Plan for parallel processing
  plan(multisession)
  
  # Run the forecasts in parallel
  forecasts <- future_lapply(split_ids, function(ids) {
    # Ensure the API key is set inside each worker
    nixtla_set_api_key(Sys.getenv("TIMEGPT"))             # <- ONLY WORKS IF I ADD THIS TO MY CODE
    
    partition_data <- filter_data(ids, data_tbl)
    nixtla_client_forecast(
      df = partition_data,
      h = h,
      id_col = id_col,
      time_col = time_col,
      target_col = target_col
    )
  })
  
  # Combine the results into a single data frame
  result <- bind_rows(forecasts)
  
  return(result)
}
I get the same error with my code above unless I include nixtla_set_api_key within future_lapply.
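For reference, this is roughly how I'm calling it (series_tbl here is just a placeholder for my actual data):
# Illustrative call only: 12-step horizon, series split across 10 partitions/workers
fc <- run_parallel_forecasts(series_tbl, M = 10, h = 12)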
m
Have you tried saving your API key in the .Renviron file @Vidar Ingason?
as you pointed out, the separate R sessions are probably causing the issue with the API key
here's how to set yours in the .Renviron file
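Roughly, you add one line to your ~/.Renviron file (the value below is a placeholder) and restart R so it gets picked up:
NIXTLA_API_KEY=your_api_key_here
You can then confirm it's visible from any session, including parallel workers, with:
Sys.getenv("NIXTLA_API_KEY")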
v
I had it in my .Renviron but with a different name. Of course it works now after I changed the name of the environment variable in the .Renviron file.
m
That's great to hear. Yes, for parallel processing you'll probably need your credentials defined globally.
v
Hi @Mariana Menchero, what is your experience regarding this error:
Error in getGlobalsAndPackages(expr, envir = envir, globals = globals) : 
  The total size of the 4 globals exported for future expression ('FUN()') is 12.31 GiB.. This exceeds the maximum allowed size of 500.00 MiB (option 'future.globals.maxSize'). The three largest globals are 'FUN' (12.31 GiB of class 'function'), '.get_api_key' (1.96 KiB of class 'function') and '.transient_errors' (1.30 KiB of class 'function')
Should I set it to some high value like:
options(future.globals.maxSize = 20 * 1024^3)
Or is there a better way to avoid this error?
Ok, it's because I'm running this function as part of a larger wrapper function and the future package is absorbing the whole environment, I think.
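Instead of only raising future.globals.maxSize, one restructuring I might try (a sketch only; forecast_partition and run_parallel_forecasts_presplit are made-up names, and the column names are the same defaults as in my function above): define the worker function at the top level and pre-split the data, so future only exports each partition to the workers rather than the wrapper's whole environment.
library(dplyr)
library(future)
library(future.apply)
library(nixtlar)

# Top-level worker: its enclosing environment is the global environment, so
# future doesn't drag a large wrapper environment along when exporting it
forecast_partition <- function(chunk, h, id_col, time_col, target_col) {
  nixtla_set_api_key(Sys.getenv("NIXTLA_API_KEY"))
  nixtla_client_forecast(
    df = chunk,
    h = h,
    id_col = id_col,
    time_col = time_col,
    target_col = target_col
  )
}

run_parallel_forecasts_presplit <- function(data_tbl, M, h = 12, id_col = "unique_id",
                                            time_col = "ds", target_col = "y") {
  # Split the data frame into M partitions up front, so each worker only
  # receives its own chunk of the data
  unique_ids <- unique(data_tbl[[id_col]])
  ids_per_partition <- ceiling(length(unique_ids) / M)
  split_ids <- split(unique_ids, rep(1:M, each = ids_per_partition, length.out = length(unique_ids)))
  partitions <- lapply(split_ids, function(ids) data_tbl[data_tbl[[id_col]] %in% ids, , drop = FALSE])

  plan(multisession)
  forecasts <- future_lapply(
    partitions, forecast_partition,
    h = h, id_col = id_col, time_col = time_col, target_col = target_col
  )
  bind_rows(forecasts)
}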