# timegpt
v
Hi Team, I'm getting this error in R almost every time I have a "large" dataset.
Error in `httr2::req_perform()`:                               
! Failed to perform HTTP request.
Caused by error in `curl::curl_fetch_memory()`:
! Timeout was reached: [dashboard.nixtla.io] SSL/TLS connection timeout
The dataset is 462,000 rows and around 11,000 time series. I've seen blog posts by people from the team talking about forecasting millions of time series with TimeGPT, but I always seem to have trouble with datasets of several thousand time series.
This is what I see before I get the error.
m
Hi @Vidar Ingason, the current CRAN/dev versions of nixtlar don't have parallel processing yet. That's probably why you're having issues with such a large dataset. We're currently working on implementing that feature. It was going to be released in the next CRAN version, but if you need it sooner, we'll add it to the dev version.
v
Hi @Mariana Menchero, do you have an estimated date for when you might add it? I'm setting this up for a client that has around 11,000 products, and I would like to keep TimeGPT as part of the solution. I know you're busy, but since you asked, I would love to have this as soon as possible 🙂
m
Parallel processing is already a feature of the next CRAN version, so let me check if we can add it to the dev version this week. I'll get back to you on Thursday. It shouldn't take long.
v
Thank you so much Mariana!
m
Hi @Vidar Ingason, thanks for your patience. I opened a PR for supporting large datasets in nixtlar. It should be ready early next week. As soon as it has been merged to main, I'll let you know here. Regards.
v
You’re a genius 😃
m
that's overly generous 😅 We're working hard to add the features our users need as quickly as possible. We’ll keep you updated and let you know as soon as it’s ready
v
Hi @Mariana Menchero, just curious. Do you think you'll update the package this week?
m
Hi @Vidar Ingason, yes, we just updated the package. Please try it out and let us know if you encounter any issues.
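With the update you can pass the number of partitions directly to the forecast call. Roughly something like this (assuming your API key is already set; the partition count here is just an illustrative value, not a recommendation):
library(nixtlar)

# Split the request into 10 partitions that are processed in parallel
fcst <- nixtla_client_forecast(
  df = df,
  h = 12,
  num_partitions = 10
)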
v
Hi, I keep getting "Please set NIXTLA_API_KEY. Use nixtla_set_api_key() or set it as an environment variable in .Renviron" when I use the num_partitions argument. If I do not use the argument, everything runs fine.
I might be mistaken, but since each worker runs in a separate R session, the workers don't automatically inherit the environment variables from the main R session, which might be the reason for the error I'm getting. Wouldn't you need to include nixtla_set_api_key somehow within future_lapply?
Did a small test:
library(dplyr)         # filter(), bind_rows(), %>%
library(rlang)         # sym() for tidy evaluation
library(future)        # plan(multisession)
library(future.apply)  # future_lapply()
library(nixtlar)       # nixtla_set_api_key(), nixtla_client_forecast()

run_parallel_forecasts <- function(data_tbl, M, h = 12, id_col = "unique_id", time_col = "ds", target_col = "y") {
  
  # Split the data into M partitions
  unique_ids <- unique(data_tbl[[id_col]])
  ids_per_partition <- ceiling(length(unique_ids) / M)
  split_ids <- split(unique_ids, rep(1:M, each = ids_per_partition, length.out = length(unique_ids)))
  
  # Define a helper function to filter the data based on unique IDs
  filter_data <- function(ids, data) {
    data %>% filter(!!sym(id_col) %in% ids)
  }
  
  # Plan for parallel processing
  plan(multisession)
  
  # Run the forecasts in parallel
  forecasts <- future_lapply(split_ids, function(ids) {
    # Ensure the API key is set inside each worker
    nixtla_set_api_key(Sys.getenv("TIMEGPT"))             # <- ONLY WORKS IF I ADD THIS TO MY CODE
    
    partition_data <- filter_data(ids, data_tbl)
    nixtla_client_forecast(
      df = partition_data,
      h = h,
      id_col = id_col,
      time_col = time_col,
      target_col = target_col
    )
  })
  
  # Combine the results into a single data frame
  result <- bind_rows(forecasts)
  
  return(result)
}
I get the same error with my code above unless I include nixtla_set_api_key within future_lapply.
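For reference, this is roughly how I'm calling it (series_tbl here is just a placeholder for my actual data):
# Illustrative call only: 12-step horizon, series split across 10 partitions/workers
fc <- run_parallel_forecasts(series_tbl, M = 10, h = 12)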
m
Have you tried saving your API key in the .Renviron file @Vidar Ingason?
as you pointed out, the separate R sessions are probably causing the issue with the API key
here's how to set yours in the .Renviron file
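Roughly, you add one line to your ~/.Renviron file (the value below is a placeholder) and restart R so it gets picked up:
NIXTLA_API_KEY=your_api_key_here
You can then confirm it's visible from any session, including parallel workers, with:
Sys.getenv("NIXTLA_API_KEY")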
v
I had it in my .Renviron but with a different name. Of course it works now after I changed the name of the environment variable in the .Renviron file.
m
That's great to hear. Yes, for parallel processing you'll probably need your credentials defined globally.
v
Hi @Mariana Menchero, what is your experience regarding this error:
Error in getGlobalsAndPackages(expr, envir = envir, globals = globals) : 
  The total size of the 4 globals exported for future expression ('FUN()') is 12.31 GiB.. This exceeds the maximum allowed size of 500.00 MiB (option 'future.globals.maxSize'). The three largest globals are 'FUN' (12.31 GiB of class 'function'), '.get_api_key' (1.96 KiB of class 'function') and '.transient_errors' (1.30 KiB of class 'function')
Should I set it to some high value like:
options(future.globals.maxSize = 20 * 1024^3)
Or is there a better way to avoid this error?
Ok, it's because I'm running this function as part of a larger wrapper function and the future package is absorbing the whole environment, I think.
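Instead of only raising future.globals.maxSize, one restructuring I might try (a sketch only; forecast_partition and run_parallel_forecasts_presplit are made-up names, and the column names are the same defaults as in my function above): define the worker function at the top level and pre-split the data, so future only exports each partition to the workers rather than the wrapper's whole environment.
library(dplyr)
library(future)
library(future.apply)
library(nixtlar)

# Top-level worker: its enclosing environment is the global environment, so
# future doesn't drag a large wrapper environment along when exporting it
forecast_partition <- function(chunk, h, id_col, time_col, target_col) {
  nixtla_set_api_key(Sys.getenv("NIXTLA_API_KEY"))
  nixtla_client_forecast(
    df = chunk,
    h = h,
    id_col = id_col,
    time_col = time_col,
    target_col = target_col
  )
}

run_parallel_forecasts_presplit <- function(data_tbl, M, h = 12, id_col = "unique_id",
                                            time_col = "ds", target_col = "y") {
  # Split the data frame into M partitions up front, so each worker only
  # receives its own chunk of the data
  unique_ids <- unique(data_tbl[[id_col]])
  ids_per_partition <- ceiling(length(unique_ids) / M)
  split_ids <- split(unique_ids, rep(1:M, each = ids_per_partition, length.out = length(unique_ids)))
  partitions <- lapply(split_ids, function(ids) data_tbl[data_tbl[[id_col]] %in% ids, , drop = FALSE])

  plan(multisession)
  forecasts <- future_lapply(
    partitions, forecast_partition,
    h = h, id_col = id_col, time_col = time_col, target_col = target_col
  )
  bind_rows(forecasts)
}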