Dataproc serverless writing to Bigtable: org.apache.spark.SparkException: Task failed while writing rows

Question

How do I find the root cause? (I'm reading from Cassandra and writing to Bigtable.)

I've tried:

  • looking through Cassandra logs
  • eliminating columns in case it was a data issue
  • reducing spark.cassandra.input.fetch.size_in_rows from 100 to 10
  • trying spark.speculation both true and false
  • etc.

It does load hundreds of thousands of rows before throwing the error, and Bigtable has terabytes of free space.

23/03/30 18:13:42 WARN TaskSetManager: Lost task 5.0 in stage 1.0 (TID 6) (10.128.0.46 executor 1): org.apache.spark.SparkException: Task failed while writing rows
        at org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:163)
        at org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$write$1(SparkHadoopWriter.scala:88)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
        at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
        at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
        at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 1 action: IllegalArgumentException: 1 time, servers with issues: bigtable.googleapis.com
        at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.getExceptions(BigtableBufferedMutator.java:188)
        at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.handleExceptions(BigtableBufferedMutator.java:142)
        at com.google.cloud.bigtable.hbase.BigtableBufferedMutator.mutate(BigtableBufferedMutator.java:133)
        at org.apache.hadoop.hbase.mapred.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:101)
        at org.apache.hadoop.hbase.mapred.TableOutputFormat$TableRecordWriter.write(TableOutputFormat.java:52)
        at org.apache.spark.internal.io.HadoopMapRedWriteConfigUtil.write(SparkHadoopWriter.scala:246)
        at org.apache.spark.internal.io.SparkHadoopWriter$.$anonfun$executeTask$1(SparkHadoopWriter.scala:138)
        at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1525)
        at org.apache.spark.internal.io.SparkHadoopWriter$.executeTask(SparkHadoopWriter.scala:135)
        ... 9 more

Answer 1

Score: 1

The error message indicates that it's caused by an IllegalArgumentException.

Given that you were able to write thousands of rows to Bigtable before the error was thrown, it's likely that you hit the limit of 100,000 mutations per row (https://cloud.google.com/bigtable/quotas#limits-operations). Note that this limit is on the number of mutations, not the number of rows.

It's possible that some of the rows have too many columns, and each column is converted into a mutation (https://cloud.google.com/bigtable/docs/writes#write-types).

You can try the following things:

  1. Check how you're creating row mutations from your Cassandra data.
  2. Check whether some rows have more than 10,000 columns, assuming you're creating one mutation per column (see the sketch below).
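
As a rough way to do the second check, here is a minimal sketch. It assumes the Cassandra table has already been loaded into a Spark DataFrame via the Spark Cassandra connector, that each non-null column becomes one mutation, and that the names cassandraDf, id, my_ks, and my_table are placeholders:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.{col, when}

    // Hypothetical helper: count the non-null columns of every row and keep the
    // rows whose count exceeds the threshold (defaulting to Bigtable's documented
    // limit of 100,000 mutations per row), assuming one mutation per non-null column.
    def findOversizedRows(df: DataFrame, keyCol: String, limit: Int = 100000): DataFrame = {
      // Add 1 for every non-null column other than the key itself.
      val mutationCount = df.columns
        .filter(_ != keyCol)
        .map(c => when(col(c).isNotNull, 1).otherwise(0))
        .reduce(_ + _)

      df.select(col(keyCol), mutationCount.as("mutation_count"))
        .filter(col("mutation_count") > limit)
    }

    // Usage sketch with placeholder names:
    // val cassandraDf = spark.read.format("org.apache.spark.sql.cassandra")
    //   .options(Map("keyspace" -> "my_ks", "table" -> "my_table"))
    //   .load()
    // findOversizedRows(cassandraDf, "id").show(false)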

Answer 2

Score: 0

It turns out that a few rows from Cassandra were corrupt: they had nulls in their keys. I discovered this accidentally after dumping the table to CSV files and loading them into another database.

After removing those corrupt rows, everything loaded fine.
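
For reference, a minimal sketch of the kind of pre-write guard that would have surfaced those rows, assuming the Cassandra data sits in a Spark DataFrame and that keyColumns, cassandraDf, id, and bucket are placeholder names for the columns used to build the row key:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.col

    // Hypothetical guard: split the data into rows whose key columns contain nulls
    // (so they can be inspected or fixed) and clean rows that are safe to write.
    def splitCorruptKeyRows(df: DataFrame, keyColumns: Seq[String]): (DataFrame, DataFrame) = {
      // A row is treated as corrupt if any of its key columns is null.
      val hasNullKey = keyColumns.map(c => col(c).isNull).reduce(_ || _)
      (df.filter(!hasNullKey), df.filter(hasNullKey))
    }

    // Usage sketch with placeholder names:
    // val (clean, corrupt) = splitCorruptKeyRows(cassandraDf, Seq("id", "bucket"))
    // corrupt.show(false)   // inspect the offending rows instead of failing mid-write
    // ... then build the HBase mutations from clean only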
