英文:
Fast-CSV modify before loading CSV to MySQL
问题
我正在尝试将CSV文件加载到我的MYSQL数据库中,但在这之前,我需要稍微修改它。CSV文件是用竖线(|)分隔的。
CSV文件中有一列名为Party:Identification。此列包含类似"a:hello, b:hi c:151 ......"的结果。这可以无限延续。
我只需要获取c的值。我已经想出了一种方法,可以解决这个问题,但我不知道如何在文件插入数据库之前修改该值。
我尝试将标题中的所有冒号替换为空字符串,然后使用.transform来修改值,但似乎这只会更改标题,而不是列中的值。附上下面的代码。
csv.parseFile(req.file.path, {
headers: headers => headers.map(function (header) {
const newHeaders = header.replaceAll(" ", "").replaceAll(":", "")
console.log(newHeaders)
return newHeaders
}),
delimiter: '|'
})
.transform(function(data) {
console.log(data)
PartyIdentification: getPartyID(data.partyIdentification)
})
.on("error", (err) => console.error(err))
.on("finish", function () {
query("LOAD DATA LOCAL INFILE '" +
file +
"' INTO TABLE table " +
"FIELDS TERMINATED BY '|'" +
" LINES TERMINATED BY '\n'" +
" IGNORE 1 ROWS;").then(r =>
console.log(file)
)
})
function getPartyID(str) {
if (str === undefined) return ""
const split = str.split(",")
const value = split.find(val => {
return val.includes("c")
})
if(value === undefined) return ""
return (value.split(":")[1].trim())
}
希望这可以帮助你解决问题。如果需要进一步的帮助,请随时提问。
英文:
I am trying to load a CSV file to my MYSQL database, however before I do so I need to modify it slightly. The CSV file is Pipe delimitated (|)
I have a column in the CSV file called Party:Identification. This column has results such as "a:hello, b:hi c:151 ......" This can go on infinitely.
I only need to get the value for c. I have come up with a method that works for this, however I am stuck on how to modify the value before the file is inserted into the database.
I tried replacing all the ":" in the headers with "" and then using .transform to modify the values, however this doesn't appear to change the values in the column, only the header. Code is attached below.
csv.parseFile(req.file.path, {
headers: headers => headers.map(function (header) {
const newHeaders = header.replaceAll(" ", "").replaceAll(":", "")
console.log(newHeaders)
return newHeaders
}),
delimiter: '|'
})
.transform(function(data) {
console.log(data)
PartyIdentification: getPartyID(data.partyIdentification)
})
.on("error", (err) => console.error(err))
.on("finish", function () {
query("LOAD DATA LOCAL INFILE '" +
file +
"' INTO TABLE table " +
" FIELDS TERMINATED BY '|'" +
" LINES TERMINATED BY '\n'" +
" IGNORE 1 ROWS;").then(r =>
console.log(file)
)
})
function getPartyID(str) {
if (str === undefined) return ""
const split = str.split(",")
const value = split.find(val => {
return val.includes("c")
})
if(value === undefined) return ""
return (value.split(":")[1].trim())
}
答案1
得分: 1
您可以使用正则表达式来解析字符串中的 c:123
值:
function getPartyID(str) {
if (str === undefined) return "";
const m = str.match(/\bc:([^ ]*)/);
return m ? m[1] : null;
}
[
"a:hello, b:hi c:151 d:foo",
"a:hello, b:no_c",
].forEach(str => {
console.log(str, '==>', getPartyID(str));
});
输出:
a:hello, b:hi c:151 d:foo ==> 151
a:hello, b:no_c ==> null
正则表达式的解释:
\b
-- 单词边界c:
-- 文本字面值([^ ]*)
-- 捕获组 1,包括空格之前的值
**更新 1:**根据关于如何将修改后的数据插入到 MySQL 的附加问题,这是一个不使用 INFILE
的解决方案。相反,它将文件加载到内存中(在这里模拟为 const input
),根据需要修改数据,并构造一个插入所有数据的 SQL 语句。重要提示:您可能需要针对 SQL 注入添加转义。
const input = `Name|Party:Identification|Date
Foo|a:hello, b:hi c:151 d:foo|2022-01-11
Bar|a:hola, b:hey c:99 d:bar|2022-01-12
Nix|a:nix, b:ni d:nix|2022-01-13`;
const partIdFieldName = 'Party:Identification';
function getPartyID(str) {
if (str === undefined) return "";
const m = str.match(/\bc:([^ ]*)/);
return m ? m[1] : 0;
}
let partIdIdx = 0;
let data = input.split(/[\r\n]+/).map((row, idx) => {
let cells = row.split('|');
if(idx === 0) {
partIdIdx = cells.indexOf(partIdFieldName);
} else {
cells[partIdIdx] = getPartyID(cells[partIdIdx]);
}
return cells;
});
let sql = 'INSERT INTO tbl_name\n' +
' (' + data[0].map(val => '"' + val + '"').join(',') + ')\n' +
'VALUES\n' +
data.slice(1).map(row => {
return ' (' + row.map(val => /^[\d+\.]+$/.test(val)
? val
: '"' + val + '"'
).join(',') + ')';
}).join('\n') + ';';
console.log(sql);
输出:
INSERT INTO tbl_name
("Name","Party:Identification","Date")
VALUES
("Foo",151,"2022-01-11")
("Bar",99,"2022-01-12")
("Nix",0,"2022-01-13");
英文:
You can use a regex to parse the value of c:123
in a string:
<!-- begin snippet: js hide: false console: true babel: false -->
<!-- language: lang-js -->
function getPartyID(str) {
if (str === undefined) return "";
const m = str.match(/\bc:([^ ]*)/);
return m ? m[1] : null;
}
[
"a:hello, b:hi c:151 d:foo",
"a:hello, b:no_c",
].forEach(str => {
console.log(str, '==>', getPartyID(str));
});
<!-- end snippet -->
Output:
a:hello, b:hi c:151 d:foo ==> 151
a:hello, b:no_c ==> null
Explanation of regex:
\b
-- word boundaryc:
-- literal text([^ ]*)
-- capture group 1 with value, up to and excluding space
UPDATE 1: Based on additional question on how to insert modified data into MySQL, here is a solution that does not use INFILE
, but instead loads the file into memory (here simulated with const input
), modifies the data as needed, and constructs a SQL statement that inserts all the data. IMPORTANT: You likely want to add escapes against SQL injections.
<!-- begin snippet: js hide: false console: true babel: false -->
<!-- language: lang-js -->
const input = Name|Party:Identification|Date
;
Foo|a:hello, b:hi c:151 d:foo|2022-01-11
Bar|a:hola, b:hey c:99 d:bar|2022-01-12
Nix|a:nix, b:ni d:nix|2022-01-13
const partIdFieldName = 'Party:Identification';
function getPartyID(str) {
if (str === undefined) return "";
const m = str.match(/\bc:([^ ]*)/);
return m ? m[1] : 0;
}
let partIdIdx = 0;
let data = input.split(/[\r\n]+/).map((row, idx) => {
let cells = row.split('|');
if(idx === 0) {
partIdIdx = cells.indexOf(partIdFieldName);
} else {
cells[partIdIdx] = getPartyID(cells[partIdIdx]);
}
return cells;
});
//console.log('data', '==>', data);
let sql = 'INSERT INTO tbl_name\n' +
' (' + data[0].map(val => '"' + val + '"').join(',') + ')\n' +
'VALUES\n' +
data.slice(1).map(row => {
return ' (' + row.map(val => /^[\d+\.]+$/.test(val)
? val
: '"' + val + '"'
).join(',') + ')'
}).join('\n') + ';';
console.log(sql);
<!-- end snippet -->
Output:
INSERT INTO tbl_name
("Name","Party:Identification","Date")
VALUES
("Foo",151,"2022-01-11")
("Bar",99,"2022-01-12")
("Nix",0,"2022-01-13");
答案2
得分: 0
不要在加载之前费心修复 CSV 文件,只需在加载时将不需要的列丢弃。
例如,这将仅加载第三列:
LOAD DATA ...
(@a, @b, c_col, @d, @e, ...)
也就是说,将不需要的列捕获到 @ 变量中,然后忽略它们。
如果您需要在存储到表中之前移除 "c:",那么可以这样做:
LOAD DATA ...
(@a, @b, @c, @d, @e, ...)
SET c_c0l = MID(@c, 3)
(或者任何适用的表达式。还可以查看 SUBSTRING_INDEX
,看看是否更适合您的情况。)
LOAD DATA
非常快,即使在这种浪费模式下也是如此。而且对您来说编码工作要少得多。
英文:
Don't bother fixing the csv file before loading, simply toss the unwanted columns as you LOAD
it.
This, for example, will load only the 3rd column:
LOAD DATA ...
(@a, @b, c_col, @d, @e, ...)
That is, capture the unwanted columns into @variables that you will then ignore.
If you need to remove the c:
before storing into the table, then
LOAD DATA ...
(@a, @b, @c, @d, @e, ...)
SET c_c0l = mid(@c, 3)
(or whatever expression will work. See also SUBSTRING_INDEX
in case it would work better.)
LOAD DATA
is plenty fast, even in this wasteful mode. And a lot less coding on your part.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论