Fast-CSV 在加载 CSV 到 MySQL 前进行修改

huangapple go评论72阅读模式
英文:

Fast-CSV modify before loading CSV to MySQL

问题

我正在尝试将CSV文件加载到我的MYSQL数据库中,但在这之前,我需要稍微修改它。CSV文件是用竖线(|)分隔的。

CSV文件中有一列名为Party:Identification。此列包含类似"a:hello, b:hi c:151 ......"的结果。这可以无限延续。

我只需要获取c的值。我已经想出了一种方法,可以解决这个问题,但我不知道如何在文件插入数据库之前修改该值。

我尝试将标题中的所有冒号替换为空字符串,然后使用.transform来修改值,但似乎这只会更改标题,而不是列中的值。附上下面的代码。

csv.parseFile(req.file.path, {
    headers: headers => headers.map(function (header) {
        const newHeaders = header.replaceAll(" ", "").replaceAll(":", "")
        console.log(newHeaders)
        return newHeaders
    }),
    delimiter: '|'
})
    .transform(function(data) {
        console.log(data)
        PartyIdentification: getPartyID(data.partyIdentification)
    })
    .on("error", (err) => console.error(err))
    .on("finish", function () {
        query("LOAD DATA LOCAL INFILE '" +
            file +
            "' INTO TABLE table " +
            "FIELDS TERMINATED BY '|'" +
            " LINES TERMINATED BY  '\n'" +
            " IGNORE 1 ROWS;").then(r =>
            console.log(file)
        )
    })

function getPartyID(str) {
    if (str === undefined) return ""
    const split = str.split(",")
    const value = split.find(val => {
        return val.includes("c")
    })
    if(value === undefined) return ""
    return (value.split(":")[1].trim())
}

希望这可以帮助你解决问题。如果需要进一步的帮助,请随时提问。

英文:

I am trying to load a CSV file to my MYSQL database, however before I do so I need to modify it slightly. The CSV file is Pipe delimitated (|)
I have a column in the CSV file called Party:Identification. This column has results such as "a:hello, b:hi c:151 ......" This can go on infinitely.
I only need to get the value for c. I have come up with a method that works for this, however I am stuck on how to modify the value before the file is inserted into the database.

I tried replacing all the ":" in the headers with "" and then using .transform to modify the values, however this doesn't appear to change the values in the column, only the header. Code is attached below.

 csv.parseFile(req.file.path, {
        headers: headers => headers.map(function (header) {
            const newHeaders = header.replaceAll(" ", "").replaceAll(":", "")
            console.log(newHeaders)
            return newHeaders
        }),
        delimiter: '|'
    })
        .transform(function(data) {
            console.log(data)
            PartyIdentification: getPartyID(data.partyIdentification)
        })
        .on("error", (err) => console.error(err))
        .on("finish", function () {
            query("LOAD DATA LOCAL INFILE '" +
                file +
                "' INTO TABLE table " +
                " FIELDS TERMINATED BY '|'" +
                " LINES TERMINATED BY  '\n'" +
                " IGNORE 1 ROWS;").then(r =>
                console.log(file)
            )
        })



function getPartyID(str) {
    if (str === undefined) return ""
    const split = str.split(",")
    const value = split.find(val => {
        return val.includes("c")
    })
    if(value === undefined) return ""
    return (value.split(":")[1].trim())
}

答案1

得分: 1

您可以使用正则表达式来解析字符串中的 c:123 值:

function getPartyID(str) {
  if (str === undefined) return "";
  const m = str.match(/\bc:([^ ]*)/); 
  return m ? m[1] : null;
}

[
  "a:hello, b:hi c:151 d:foo",
  "a:hello, b:no_c",
].forEach(str => {
  console.log(str, '==>', getPartyID(str));
});

输出:

a:hello, b:hi c:151 d:foo ==> 151
a:hello, b:no_c ==> null

正则表达式的解释:

  • \b -- 单词边界
  • c: -- 文本字面值
  • ([^ ]*) -- 捕获组 1,包括空格之前的值

**更新 1:**根据关于如何将修改后的数据插入到 MySQL 的附加问题,这是一个不使用 INFILE 的解决方案。相反,它将文件加载到内存中(在这里模拟为 const input),根据需要修改数据,并构造一个插入所有数据的 SQL 语句。重要提示:您可能需要针对 SQL 注入添加转义。

const input = `Name|Party:Identification|Date
Foo|a:hello, b:hi c:151 d:foo|2022-01-11
Bar|a:hola, b:hey c:99 d:bar|2022-01-12
Nix|a:nix, b:ni d:nix|2022-01-13`;
const partIdFieldName = 'Party:Identification';

function getPartyID(str) {
  if (str === undefined) return "";
  const m = str.match(/\bc:([^ ]*)/); 
  return m ? m[1] : 0;
}

let partIdIdx = 0;
let data = input.split(/[\r\n]+/).map((row, idx) => {
  let cells = row.split('|');
  if(idx === 0) {
    partIdIdx = cells.indexOf(partIdFieldName);
  } else {
    cells[partIdIdx] = getPartyID(cells[partIdIdx]);
  }
  return cells;
});

let sql = 'INSERT INTO tbl_name\n' +
 '  (' + data[0].map(val => '"' + val + '"').join(',') + ')\n' +
 'VALUES\n' +
 data.slice(1).map(row => {
   return '  (' + row.map(val => /^[\d+\.]+$/.test(val)
     ? val
     : '"' + val + '"'
   ).join(',') + ')';
 }).join('\n') + ';';
console.log(sql);

输出:

INSERT INTO tbl_name
  ("Name","Party:Identification","Date")
VALUES
  ("Foo",151,"2022-01-11")
  ("Bar",99,"2022-01-12")
  ("Nix",0,"2022-01-13");
英文:

You can use a regex to parse the value of c:123 in a string:

<!-- begin snippet: js hide: false console: true babel: false -->

<!-- language: lang-js -->

function getPartyID(str) {
if (str === undefined) return &quot;&quot;;
const m = str.match(/\bc:([^ ]*)/); 
return m ? m[1] : null;
}
[
&quot;a:hello, b:hi c:151 d:foo&quot;,
&quot;a:hello, b:no_c&quot;,
].forEach(str =&gt; {
console.log(str, &#39;==&gt;&#39;, getPartyID(str));
});

<!-- end snippet -->

Output:

a:hello, b:hi c:151 d:foo ==&gt; 151
a:hello, b:no_c ==&gt; null

Explanation of regex:

  • \b -- word boundary
  • c: -- literal text
  • ([^ ]*) -- capture group 1 with value, up to and excluding space

UPDATE 1: Based on additional question on how to insert modified data into MySQL, here is a solution that does not use INFILE, but instead loads the file into memory (here simulated with const input), modifies the data as needed, and constructs a SQL statement that inserts all the data. IMPORTANT: You likely want to add escapes against SQL injections.

<!-- begin snippet: js hide: false console: true babel: false -->

<!-- language: lang-js -->
const input = Name|Party:Identification|Date
Foo|a:hello, b:hi c:151 d:foo|2022-01-11
Bar|a:hola, b:hey c:99 d:bar|2022-01-12
Nix|a:nix, b:ni d:nix|2022-01-13
;
const partIdFieldName = 'Party:Identification';

function getPartyID(str) {
if (str === undefined) return &quot;&quot;;
const m = str.match(/\bc:([^ ]*)/); 
return m ? m[1] : 0;
}
let partIdIdx = 0;
let data = input.split(/[\r\n]+/).map((row, idx) =&gt; {
let cells = row.split(&#39;|&#39;);
if(idx === 0) {
partIdIdx = cells.indexOf(partIdFieldName);
} else {
cells[partIdIdx] = getPartyID(cells[partIdIdx]);
}
return cells;
});
//console.log(&#39;data&#39;, &#39;==&gt;&#39;, data);
let sql = &#39;INSERT INTO tbl_name\n&#39; +
&#39;  (&#39; + data[0].map(val =&gt; &#39;&quot;&#39; + val + &#39;&quot;&#39;).join(&#39;,&#39;) + &#39;)\n&#39; +
&#39;VALUES\n&#39; +
data.slice(1).map(row =&gt; {
return &#39;  (&#39; + row.map(val =&gt; /^[\d+\.]+$/.test(val)
? val
: &#39;&quot;&#39; + val + &#39;&quot;&#39;
).join(&#39;,&#39;) + &#39;)&#39;
}).join(&#39;\n&#39;) + &#39;;&#39;;
console.log(sql);

<!-- end snippet -->

Output:

INSERT INTO tbl_name
(&quot;Name&quot;,&quot;Party:Identification&quot;,&quot;Date&quot;)
VALUES
(&quot;Foo&quot;,151,&quot;2022-01-11&quot;)
(&quot;Bar&quot;,99,&quot;2022-01-12&quot;)
(&quot;Nix&quot;,0,&quot;2022-01-13&quot;);

答案2

得分: 0

不要在加载之前费心修复 CSV 文件,只需在加载时将不需要的列丢弃。

例如,这将仅加载第三列:

LOAD DATA ...
    (@a, @b, c_col, @d, @e, ...)

也就是说,将不需要的列捕获到 @ 变量中,然后忽略它们。

如果您需要在存储到表中之前移除 "c:",那么可以这样做:

LOAD DATA ...
    (@a, @b, @c, @d, @e, ...)
    SET c_c0l = MID(@c, 3)

(或者任何适用的表达式。还可以查看 SUBSTRING_INDEX,看看是否更适合您的情况。)

LOAD DATA 非常快,即使在这种浪费模式下也是如此。而且对您来说编码工作要少得多。

英文:

Don't bother fixing the csv file before loading, simply toss the unwanted columns as you LOAD it.

This, for example, will load only the 3rd column:

LOAD DATA ...
(@a, @b, c_col, @d, @e, ...)

That is, capture the unwanted columns into @variables that you will then ignore.

If you need to remove the c: before storing into the table, then

LOAD DATA ...
(@a, @b, @c, @d, @e, ...)
SET c_c0l = mid(@c, 3)

(or whatever expression will work. See also SUBSTRING_INDEX in case it would work better.)

LOAD DATA is plenty fast, even in this wasteful mode. And a lot less coding on your part.

huangapple
  • 本文由 发表于 2023年1月9日 06:36:36
  • 转载请务必保留本文链接:https://go.coder-hub.com/75051762.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定