# general
k
👋 Is there any general guidance on optimizing the Spark config to run distributed forecasts faster? The Nixtla docs have a section called "Helpful Configuration", but they never mention how it was tuned.
j
Please ignore that configuration and set `spark.sql.shuffle.partitions` to a multiple of your executors
k
I see. Are we subject to the general tuning guideline: too large a multiple might go OOM, too small means too much overhead?
j
More or less. Memory isn't really a concern here, since it doesn't spike the way some common ETL tasks do; it's more about reducing overhead. So you can try 1 or 2 as the multiple and it should work
Mainly we want to avoid databricks' default (200)
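A minimal sketch of how that advice could look in practice. The executor count here is a made-up placeholder, not a value from this thread; replace it with your cluster's actual number of executors:

```python
# Hypothetical cluster values -- replace with your own.
num_executors = 8   # e.g. read from spark.conf.get("spark.executor.instances")
multiple = 2        # try 1 or 2, per the advice above

# A small multiple of the executor count, well under Databricks' default of 200.
shuffle_partitions = num_executors * multiple

# Applied to a live session (requires a running SparkSession named `spark`):
# spark.conf.set("spark.sql.shuffle.partitions", str(shuffle_partitions))
print(shuffle_partitions)
```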
k
thanks...
j
Here's an old thread with some more discussion if you're interested: https://linen.nixtla.io/t/16075533/hello-nixtla-community-i-have-a-few-questions-regarding-dist. It's on a different page because we're on the free tier and slack deletes old messages.