Question

I exported a BQ table to GCS in JSON format using Python. The export succeeded, but when I downloaded the JSON files from GCS, I noticed that special characters had changed. For example,

Shirt & Trouser Presses

in BQ has changed to

Shirt \u0026 Trouser Presses

in GCS.

Is there a way to ensure that the encoding does not change while exporting from BQ to GCS in JSON format?

Here is the code snippet I use:

from google.cloud import bigquery

# BQ_PROJECT, dataset_id and BUCKET are defined elsewhere.
dataset_ref = bigquery.DatasetReference(BQ_PROJECT, dataset_id)
client = bigquery.Client(project=BQ_PROJECT)
tables = client.list_tables(dataset_id)
job_config = bigquery.job.ExtractJobConfig()
job_config.destination_format = bigquery.DestinationFormat.NEWLINE_DELIMITED_JSON
for table in tables:
    if table.table_type == "TABLE":
        table_id = table.table_id
        destination_blob = table_id
        table_ref = dataset_ref.table(table_id)
        destination_uri = "gs://{}/{}".format(BUCKET, destination_blob)

        extract_job = client.extract_table(
            table_ref,
            destination_uri,
            job_config=job_config,
            # Location must match that of the source table.
            location="EU",
        )  # API request
        extract_job.result()  # Waits for job to complete.

Answer 1

Score: 2

With the help of @johnHanley, I figured out that when I read the data from GCS using pandas, I get the original characters back: "Shirt \u0026 Trouser Presses" is read as "Shirt & Trouser Presses". The \u0026 sequence is just the JSON escape for &, so the data itself is unchanged and the problem is solved.
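
The same behaviour can be checked without pandas. The following is a minimal sketch using only Python's standard json module; the field name "category" and the sample line are assumptions made for illustration, not taken from the actual export. It shows that \u0026 is decoded to & by any standard JSON parser, and that re-serializing the record in Python writes the ampersand literally, in case the raw file text matters downstream.

```python
import json

# One line of a newline-delimited JSON export, as it might appear in the file.
# The field name "category" is hypothetical.
raw_line = r'{"category": "Shirt \u0026 Trouser Presses"}'

# A standard JSON parser turns the \u0026 escape back into "&".
record = json.loads(raw_line)
print(record["category"])

# Python's json module does not escape "&" when writing, so the
# re-serialized line contains the literal character.
rewritten = json.dumps(record, ensure_ascii=False)
print(rewritten)
```

Both forms are valid JSON for the same string, so anything that parses the file properly sees identical values.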
