Dataproc serverless writing to Bigtable: org.apache.spark.SparkException: Task failed while writing rows

Question

How do I find the root cause? (I'm reading from Cassandra and writing to Bigtable.)

I've tried:

  • looking through Cassandra logs
  • eliminating columns in case it was a data issue
  • reducing spark.cassandra.input.fetch.size_in_rows from 100 to 10
  • trying spark.speculation both true and false
  • etc.

It does load hundreds of thousands of rows before throwing the error, and Bigtable has terabytes of free space.

23/03/30 18:13:42 WARN TaskSetManager: Lost task 5.0 in stage 1.0 (TID 6) (10.128.0.46 executor 1): org.apache.spark.SparkException: Task failed while writing rows
        at org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:163)
        at org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: IllegalArgumentException: 1 time, servers with issues: bigtable.googleapis.com
        at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.getExceptions(BigtableBufferedMutator.java:188)
        at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.handleExceptions(BigtableBufferedMutator.java:142)
        at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.mutate(BigtableBufferedMutator.java:133)
        at org.apache.hadoop.hbase.mapred.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:101)
        at org.apache.hadoop.hbase.mapred.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:52)
        at org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil.write(SparkHadoopWriter.scala:246)
        at org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$executeTask$1(SparkHadoopWriter.scala:138)
        at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1525)
        at org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:135)
        ... 9 more

Answer 1

Score: 1

The error message indicates that it's caused by an IllegalArgumentException.

Given that you were able to write thousands of rows to Bigtable before the error was thrown, it's likely that you hit the limit of 100,000 mutations per row (https://cloud.google.com/bigtable/quotas#limits-operations). Note that this limit is on the number of mutations, not the number of rows.

It's possible that some of the rows have too many columns, and each column is converted into a mutation (https://cloud.google.com/bigtable/docs/writes#write-types).

You can try the following things:

  1. Check how you're creating row mutations from your Cassandra data.
  2. Check whether some rows have more than 10,000 columns, assuming you're creating one mutation per column (see the sketch below).
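
As a rough way to do the second check, here is a minimal sketch. It assumes the Cassandra table has already been loaded into a Spark DataFrame via the Spark Cassandra connector, that each non-null column becomes one mutation, and that the names cassandraDf, id, my_ks, and my_table are placeholders:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.{col, when}

    // Hypothetical helper: count the non-null columns of every row and keep the
    // rows whose count exceeds the threshold (defaulting to Bigtable's documented
    // limit of 100,000 mutations per row), assuming one mutation per non-null column.
    def findOversizedRows(df: DataFrame, keyCol: String, limit: Int = 100000): DataFrame = {
      // Add 1 for every non-null column other than the key itself.
      val mutationCount = df.columns
        .filter(_ != keyCol)
        .map(c => when(col(c).isNotNull, 1).otherwise(0))
        .reduce(_ + _)

      df.select(col(keyCol), mutationCount.as("mutation_count"))
        .filter(col("mutation_count") > limit)
    }

    // Usage sketch with placeholder names:
    // val cassandraDf = spark.read.format("org.apache.spark.sql.cassandra")
    //   .options(Map("keyspace" -> "my_ks", "table" -> "my_table"))
    //   .load()
    // findOversizedRows(cassandraDf, "id").show(false)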

Answer 2

Score: 0

It turns out that a few rows from Cassandra were corrupt: they had nulls in their keys. I discovered this accidentally after dumping the table to CSV files and loading them into another database.

After removing those corrupt rows, everything loaded fine.
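
For reference, a minimal sketch of the kind of pre-write guard that would have surfaced those rows, assuming the Cassandra data sits in a Spark DataFrame and that keyColumns, cassandraDf, id, and bucket are placeholder names for the columns used to build the row key:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.col

    // Hypothetical guard: split the data into rows whose key columns contain nulls
    // (so they can be inspected or fixed) and clean rows that are safe to write.
    def splitCorruptKeyRows(df: DataFrame, keyColumns: Seq[String]): (DataFrame, DataFrame) = {
      // A row is treated as corrupt if any of its key columns is null.
      val hasNullKey = keyColumns.map(c => col(c).isNull).reduce(_ || _)
      (df.filter(!hasNullKey), df.filter(hasNullKey))
    }

    // Usage sketch with placeholder names:
    // val (clean, corrupt) = splitCorruptKeyRows(cassandraDf, Seq("id", "bucket"))
    // corrupt.show(false)   // inspect the offending rows instead of failing mid-write
    // ... then build the HBase mutations from clean only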
