Remove a column from CSV file in Notepad++

Question

First of all, I'm new to Notepad++. I'm trying to edit a dataset that is stored as a CSV file. I can't open the file in Excel because some cells contain numbers longer than 15 digits, and Excel converts them to scientific notation.

The column headings are:

BNFUniqueID,username,CollectTeam,Hos_methodID,CollectData,hhstatus,hhupazila,hhunion,hhlocationsitetype,hhsitename,hhlocalblock,villagename,hhlandmark,hhmajiname,hhmajitel,HoHname,Hhsize,HoHcontact,cardtype,cardnum,olduniqueID,BNFname,BNFage,BNFsex,BNFagegroup

There is a column (hhlandmark) in the middle of this dataset, and I'm trying to delete the whole column. One problem is that not all cells contain data, so <kbd>ALT</kbd> + <kbd>SHIFT</kbd> + <kbd>↓</kbd> isn't suitable for this task: the vertical selection would also select cells from other columns.

I'm looking for a way to avoid this and delete only the column I want.

Answer 1

Score: 2

There are 12 columns before the hhlandmark column, so we can try the following find and replace in regex mode:

Find:    ^((?:[^,]*,){12})[^,]*,(.*)$
Replace: $1$2

This pattern says to match:

  • ^ from the start of the line
    • ((?:[^,]*,){12}) match and capture in $1 the first 12 columns
    • [^,]*, then match the 13th column (hhlandmark)
    • (.*) match and capture in $2 the rest of the line
  • $ end of the line

We then replace with $1$2 to effectively splice out the 13th hhlandmark column.
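
The same pattern can also be tried outside the editor. Below is a minimal sketch in Python, assuming (as the regex itself does) that no field contains a comma inside quotes. The function name `drop_13th_column` and the sample lines are made up for illustration. Note that Python's `re` module writes the replacement as `\g<1>\g<2>` instead of `$1$2`.

```python
import re

# Same pattern as the Find field above: {12} skips the 12 columns
# that come before hhlandmark, then one more field is matched and dropped.
PATTERN = re.compile(r"^((?:[^,]*,){12})[^,]*,(.*)$")


def drop_13th_column(line: str) -> str:
    """Return the line with its 13th comma-separated field removed.

    Assumption: fields never contain commas (no quoted fields).
    """
    return PATTERN.sub(r"\g<1>\g<2>", line)


# Hypothetical 25-column row: c1, c2, ..., c25
sample = ",".join(f"c{i}" for i in range(1, 26))
result = drop_13th_column(sample)

# A row with empty cells, which is the case that breaks vertical selection
sparse = ",,,,,,,,,,,,landmark,x,y"
sparse_result = drop_13th_column(sparse)

print(result)
print(sparse_result)
```

Because each field is matched with `[^,]*`, empty cells are handled the same way as filled ones, so rows of different widths are not a problem.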

Here is a running regex demo.

Answer 2

Score: 0

I don't know about Notepad++, but you could do this easily with Python's pandas library.
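
The answer gives no code, so here is a minimal sketch of what that might look like. The in-memory sample data and the three-column layout are assumptions for illustration; with a real file you would pass its path to `pd.read_csv` and `to_csv` instead. Reading every column as text with `dtype=str` is meant to keep the long digit strings exactly as written, and `keep_default_na=False` keeps empty cells empty.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the real CSV file
raw = io.StringIO(
    "BNFUniqueID,hhlandmark,cardnum\n"
    "1001,near school,12345678901234567890\n"
    "1002,,98765432109876543210\n"
)

# dtype=str: read all values as text so long numbers are not converted
# keep_default_na=False: leave empty cells as "" rather than NaN
df = pd.read_csv(raw, dtype=str, keep_default_na=False)

# Remove the unwanted column by name
df = df.drop(columns=["hhlandmark"])

# With no path given, to_csv returns the CSV text as a string
output = df.to_csv(index=False)
print(output)
```

Dropping by column name rather than by position means the code does not depend on hhlandmark being the 13th column.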

  • Posted on March 1, 2023, 13:32:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75599916.html