将CSV中的节点加载到AGE中并使用提供的ID返回“label_id必须为1 … 65535”。

huangapple go评论70阅读模式
英文:

Loading nodes CSV to AGE with provided IDs returns "label_id must be 1 ... 65535"

问题

我有一个CSV文件,它的格式不符合AGE加载的正确方式。我的任务是将其转换为一个新文件,以便AGE能够读取它并创建节点,就像文档中指定的那样。为此,我创建了一个Python脚本,它创建一个新文件,连接到PostgreSQL,并执行查询。我认为这可能很有用,因为如果有人有CSV文件,并希望创建节点和边并将其发送到AGE,但文件不符合指定的格式,这个脚本可以快速解决这个问题。

以下是旧的CSV文件(ProductsData.csv),其中包含其他用户购买的产品数据(通过其user_id标识),购买产品的商店(通过其store_id标识),以及product_id,它是节点的id

product_name,price,description,store_id,user_id,product_id
iPhone 12,999,"Apple iPhone 12 - 64GB, Space Gray",1234,1001,123
Samsung Galaxy S21,899,"Samsung Galaxy S21 - 128GB, Phantom Black",5678,1002,124
AirPods Pro,249,"Apple AirPods Pro with Active Noise Cancellation",1234,1003,125
Sony PlayStation 5,499,"Sony PlayStation 5 Gaming Console, 1TB",9012,1004,126

以下是Python文件:

import psycopg2
import age
import csv

def read_csv(csv_file):

    with open(csv_file, 'r') as file:
        reader = csv.reader(file)
        rows = list(reader)

    return rows


def create_csv(csv_file):

    new_header = ['id', 'product_name', 'description', 'price', 'store_id', 'user_id']
    property_order = [5, 0, 2, 1, 3, 4]  # 重新排序属性。
    
    rows = read_csv(csv_file)
    
    new_csv_file = 'products.csv'
    with open(new_csv_file, 'w', newline='') as file:
        writer = csv.writer(file)
        
        writer.writerow(new_header)
        
        # 用重新排序的属性写入每一行。
        for row in rows[1:]:
            new_row = [row[i] for i in property_order]
            writer.writerow(new_row)

    print(f"已创建具有所需格式的新CSV文件 '{new_csv_file}'。")


def load_csv_nodes(csv_file, graph_name, conn):
    
    with conn.cursor() as cursor:
        try :
            cursor.execute("LOAD 'age';")
            cursor.execute("SET search_path = ag_catalog, '$user', public;")
            cursor.execute("SELECT load_labels_from_file(%s, 'Node', %s)", (graph_name, csv_file,))
            conn.commit()
        
        except Exception as ex:
            print(type(ex), ex)
            conn.rollback()

def main():

    csv_file = 'ProductsData.csv'
    create_csv(csv_file)

    new_csv_file = 'products.csv'
    GRAPH_NAME = 'csv_test_graph'
    conn = psycopg2.connect(host="localhost", port="5432", dbname="database", user="user", password="password")
    age.setUpAge(conn, GRAPH_NAME)

    path_to_csv = '/path/to/folder/' + new_csv_file
    load_csv_nodes(path_to_csv, GRAPH_NAME, conn)

main()

生成的文件:

id,product_name,description,price,store_id,user_id
123,iPhone 12,"Apple iPhone 12 - 64GB, Space Gray",999,1234,1001
124,Samsung Galaxy S21,"Samsung Galaxy S21 - 128GB, Phantom Black",899,5678,1002
125,AirPods Pro,Apple AirPods Pro with Active Noise Cancellation,249,1234,1003
126,Sony PlayStation 5,"Sony PlayStation 5 Gaming Console, 1TB",499,9012,1004

但是,当运行脚本时,它显示以下消息:

<class 'psycopg2.errors.InvalidParameterValue'> label_id must be 1 .. 65535

这些ID被设置在1到65535之间,我不明白为什么会显示这个错误消息。

英文:

I have a csv file that is not formatted in the correct way for AGE to load. I was on the task to transform it into a new one so that AGE could read it and create nodes, like it is specified in the documentation. For that, I created a python script that creates a new file, connects to postgres, and performs the queries. I though this could be useful since if someone had csv files and wanted to create nodes and edges and send it to AGE, but it was not in the specified format, this could be used to quickly solve the problem.

Here is the old csv file (ProductsData.csv), it contains the data of products that have been purchased by other users (identified by their user_id), the store where the product was purchased from (identified by their store_id), and also the product_id, which is the id of the node:

product_name,price,description,store_id,user_id,product_id
iPhone 12,999,&quot;Apple iPhone 12 - 64GB, Space Gray&quot;,1234,1001,123
Samsung Galaxy S21,899,&quot;Samsung Galaxy S21 - 128GB, Phantom Black&quot;,5678,1002,124
AirPods Pro,249,&quot;Apple AirPods Pro with Active Noise Cancellation&quot;,1234,1003,125
Sony PlayStation 5,499,&quot;Sony PlayStation 5 Gaming Console, 1TB&quot;,9012,1004,126

Here is the Python file:

import psycopg2
import age
import csv

def read_csv(csv_file):

    with open(csv_file, &#39;r&#39;) as file:
        reader = csv.reader(file)
        rows = list(reader)

    return rows


def create_csv(csv_file):

    new_header = [&#39;id&#39;, &#39;product_name&#39;, &#39;description&#39;, &#39;price&#39;, &#39;store_id&#39;, &#39;user_id&#39;]
    property_order = [5, 0, 2, 1, 3, 4]  # Reorder the properties accordingly.
    
    rows = read_csv(csv_file)
    
    new_csv_file = &#39;products.csv&#39;
    with open(new_csv_file, &#39;w&#39;, newline=&#39;&#39;) as file:
        writer = csv.writer(file)
        
        writer.writerow(new_header)
        
        # Write each row with reordered properties.
        for row in rows[1:]:
            new_row = [row[i] for i in property_order]
            writer.writerow(new_row)

    print(f&quot;New CSV file &#39;{new_csv_file}&#39; has been created with the desired format.&quot;)


def load_csv_nodes(csv_file, graph_name, conn):
    
    with conn.cursor() as cursor:
        try :
            cursor.execute(&quot;&quot;&quot;LOAD &#39;age&#39;;&quot;&quot;&quot;)
            cursor.execute(&quot;&quot;&quot;SET search_path = ag_catalog, &quot;$user&quot;, public;&quot;&quot;&quot;)
            cursor.execute(&quot;&quot;&quot;SELECT load_labels_from_file(%s, &#39;Node&#39;, %s)&quot;&quot;&quot;, (graph_name, csv_file,) )
            conn.commit()
        
        except Exception as ex:
            print(type(ex), ex)
            conn.rollback()



def main():

    csv_file = &#39;ProductsData.csv&#39;
    create_csv(csv_file)

    new_csv_file = &#39;products.csv&#39;
    GRAPH_NAME = &#39;csv_test_graph&#39;
    conn = psycopg2.connect(host=&quot;localhost&quot;, port=&quot;5432&quot;, dbname=&quot;database&quot;, user=&quot;user&quot;, password=&quot;password&quot;)
    age.setUpAge(conn, GRAPH_NAME)

    path_to_csv = &#39;/path/to/folder/&#39; + new_csv_file
    load_csv_nodes(path_to_csv, GRAPH_NAME, conn)

main()

The generated file:

id,product_name,description,price,store_id,user_id
123,iPhone 12,&quot;Apple iPhone 12 - 64GB, Space Gray&quot;,999,1234,1001
124,Samsung Galaxy S21,&quot;Samsung Galaxy S21 - 128GB, Phantom Black&quot;,899,5678,1002
125,AirPods Pro,Apple AirPods Pro with Active Noise Cancellation,249,1234,1003
126,Sony PlayStation 5,&quot;Sony PlayStation 5 Gaming Console, 1TB&quot;,499,9012,1004

But then, when running the script, it shows the following message:

&lt;class &#39;psycopg2.errors.InvalidParameterValue&#39;&gt; label_id must be 1 .. 65535

The ids are set between 1 and 65535, and I don't understand why this error message is showing.

答案1

得分: 1

关于如何使用load_labels_from_file,请参考regress测试文件。它展示了如何使用所有的命令。

在调用load_labels_from_file之前,您首先需要创建Node vlabel,使用以下命令:

SELECT create_vlabel('csv_test_graph','Node');

然后按原样运行脚本。

英文:

For how to use load_labels_from_file please refer to the regress testing file. It shows how to use all the commands.

You first need to create Node vlabel before calling load_labels_from_file using the following command:

SELECT create_vlabel(&#39;csv_test_graph&#39;,&#39;Node&#39;);

Then run the script as it is.

答案2

得分: 0

这行代码没有正确编写,您需要使用正确的路径来修复它:

path_to_csv = '/正确的路径/' + new_csv_file
英文:

That's line is not properly written, you need to fix it with the correct path

    path_to_csv = &#39;/path/to/folder/&#39; + new_csv_file

huangapple
  • 本文由 发表于 2023年6月25日 23:45:18
  • 转载请务必保留本文链接:https://go.coder-hub.com/76551209.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定