Prefect changes the value of a variable when writing to Google Cloud Storage

Question

Hi, I am beginning to think that Prefect is changing the value of a variable. Here is the situation:

When writing to Google Cloud Storage with upload_from_path, I pass the same variable, path, as both from_path and to_path, but for some reason Prefect changes the structure of the to_path value. Here is the code that builds the path:

from pathlib import Path

import pandas as pd
from prefect import task
from prefect_gcp.cloud_storage import GcsBucket


@task()
def write_local(df: pd.DataFrame, color: str, dataset_file: str) -> Path:
    """Write DataFrame out locally as parquet file"""
    Path(f"data/{color}").mkdir(parents=True, exist_ok=True)
    path = Path(f"data/{color}/{dataset_file}.parquet")
    df.to_parquet(path, compression="gzip")
    return path


@task
def write_gcs(path: Path) -> None:
    """Upload local parquet file to GCS"""
    gcs_block = GcsBucket.load("zoom-gcs")
    # The same Path object is passed as both the local source and the GCS destination
    gcs_block.upload_from_path(from_path=path, to_path=path)
    return

As you can see, in the second task, write_gcs, both arguments are the same variable, path, which is just a path whose original value is 'data/yellow/yellow_tripdata_2021-01.parquet'.
The Prefect flow runs, but afterwards the flow run details (first screenshot) show that the GCS path was changed to 'data\\yellow\\yellow_tripdata_2021-01.parquet'. As a result, the file is saved in GCS under that odd name instead of being placed inside the data/yellow folders. Any idea why this is happening?

Answer 1

Score: 1

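On Windows, you may need to call .as_posix() on the Path before passing it as to_path. A Windows path converts to a string with backslashes, and GCS treats those as part of the object name rather than as folder separators.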
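The difference can be reproduced on any operating system with a minimal sketch like the one below. It uses PureWindowsPath from the standard library to mimic how a Path behaves on Windows. The helper name to_gcs_object_name is hypothetical and only there for illustration; it is not part of Prefect or prefect-gcp.

```python
from pathlib import PureWindowsPath


def to_gcs_object_name(path) -> str:
    """Hypothetical helper: render a path with forward slashes for use as a GCS object name."""
    return path.as_posix()


# PureWindowsPath applies Windows path rules regardless of the host OS.
win_path = PureWindowsPath("data/yellow/yellow_tripdata_2021-01.parquet")

# str() uses the Windows separator, which is what ends up in the object name.
print(str(win_path))

# as_posix() always uses forward slashes, which GCS shows as folders.
print(to_gcs_object_name(win_path))
```

In the write_gcs task from the question, this would presumably mean passing to_path=path.as_posix() while leaving from_path=path unchanged, since the local side still needs the native path.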

Also, check that you are using prefect-gcp 0.2.6 or newer.
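One way to check the installed version from Python is sketched below, using importlib.metadata from the standard library. The names MINIMUM, parse_version, and is_new_enough are made up for this example, and the parser is deliberately simple: it only reads the leading digits of the first three version components.

```python
from importlib.metadata import PackageNotFoundError, version

MINIMUM = (0, 2, 6)  # minimum prefect-gcp version mentioned in the answer


def parse_version(text: str) -> tuple:
    """Turn 'X.Y.Z' into a tuple of three ints, ignoring suffixes such as 'rc1'."""
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_new_enough(installed: str, minimum: tuple = MINIMUM) -> bool:
    """Return True when the installed version is at least the minimum."""
    return parse_version(installed) >= minimum


try:
    installed = version("prefect-gcp")
    status = "OK" if is_new_enough(installed) else "needs upgrade"
    print(f"prefect-gcp {installed}: {status}")
except PackageNotFoundError:
    print("prefect-gcp is not installed in this environment")
```

Running pip show prefect-gcp in the same environment gives the same information without any code.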

huangapple
  • Published on May 11, 2023 at 08:41:40
  • Link: https://go.coder-hub.com/76223427.html