I am trying to implement the quickstart (distributed) on my synapse notebook and when I am trying to fit the lightGBM model I am getting the following error:
Py4JJavaError: An error occurred while calling o27957.fit.
: java.lang.Exception: Dataset create from samples call failed in LightGBM with error: Feature (Column_) appears more than one time.
at com.microsoft.azure.synapse.ml.lightgbm.LightGBMUtils$.validate(LightGBMUtils.scala:18)
at com.microsoft.azure.synapse.ml.lightgbm.dataset.ReferenceDatasetUtils$.createReferenceDatasetFromSample(ReferenceDatasetUtils.scala:47)
at com.microsoft.azure.synapse.ml.lightgbm.LightGBMBase.calculateRowStatistics(LightGBMBase.scala:545)
at com.microsoft.azure.synapse.ml.lightgbm.LightGBMBase.trainOneDataBatch(LightGBMBase.scala:425)
at com.microsoft.azure.synapse.ml.lightgbm.LightGBMBase.$anonfun$train$2(LightGBMBase.scala:62)
at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logVerb(SynapseMLLogging.scala:163)
at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logVerb$(SynapseMLLogging.scala:160)
at com.microsoft.azure.synapse.ml.lightgbm.LightGBMRegressor.logVerb(LightGBMRegressor.scala:39)
at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logFit(SynapseMLLogging.scala:153)
at com.microsoft.azure.synapse.ml.logging.SynapseMLLogging.logFit$(SynapseMLLogging.scala:152)
at com.microsoft.azure.synapse.ml.lightgbm.LightGBMRegressor.logFit(LightGBMRegressor.scala:39)
at com.microsoft.azure.synapse.ml.lightgbm.LightGBMBase.train(LightGBMBase.scala:64)
at com.microsoft.azure.synapse.ml.lightgbm.LightGBMBase.train$(LightGBMBase.scala:36)
at com.microsoft.azure.synapse.ml.lightgbm.LightGBMRegressor.train(LightGBMRegressor.scala:39)
at com.microsoft.azure.synapse.ml.lightgbm.LightGBMRegressor.train(LightGBMRegressor.scala:39)
at org.apache.spark.ml.Predictor.fit(Predictor.scala:114)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.base/java.lang.Thread.run(Thread.java:829)
I tried debugging it by using .preprocess to check if there are any dupilcate columns, but there aren't any. Is there a way to resolve this issue?