How to add a character to the beginning of a line and end of a line in python

Question

I have a data set I get from my IT group. They have an automated extract they are unwilling to change.

The files look like this (I can add more lines on request):

col1#|#col2#|#col3#|#col4

data1#|#data2#|#data3#|#data4

data1#|#data2#|#data3#|#data4

cdata1#|#da#ta2#|#data3#|#data4

(Line 4 is an example where a # inside my data breaks the use of a lone # as the quote character.)

The # characters look like they are meant to be quote characters. I'd like to use them that way, but for whatever reason the extract doesn't put one at the start or the end of each line. The files have varying column counts, so I'm trying to process them to add a # to the start and end of each line.

Also, since # often appears in my data, I'd like to convert each quoting # into ### to make the import into my tool cleaner.

So I'd like:

###col1###|###col2###|###col3###|###col4###

How could I accomplish this?

Current code being used to process the CSV:

import csv

csv_pointer = open(file, encoding=CSV_Encoding, errors=Error_Detection)
csv_reader = csv.reader(
    csv_pointer,
    delimiter=CSV_Seperator,
    quoting=csv.QUOTE_NONE
)
batch = list()
# for each row in the csv reader
for row in csv_reader:
    # append the processed row to the batch list:
    # strip each field to remove redundant whitespace,
    # and pad with None if the row is shorter than FIELDS_COUNT
    batch.append([k.strip() for k in row] + [None] * (FIELDS_COUNT - len(row)))
    # once the batch holds at least ROWS_AT_ONCE rows
    if len(batch) >= ROWS_AT_ONCE:
        # use executemany on the cursor to insert the batch into the database
        curr.executemany(insert_func(Table_Name), batch)
        # commit
        conn.commit()
        # reset the batch to an empty list
        batch = list()
# if the batch list is not empty, insert the remaining rows
if batch:
    curr.executemany(insert_func(Table_Name), batch)
    # commit
    conn.commit()
    # delete batch (just in case the program messes up and needs to delete the batch)
    del batch

I attempted to change my delimiter to #|#, which seems like it would fix my problem, but it's returning the error:
TypeError: "delimiter" must be a 1-character string

Answer 1

Score: 1

If I may, why not use the delimiting strategy your IT team's format already has?
You can split on "#|#" in your parsing tool (if it's in Python):

text = "col1#|#col2#|#col3#|#col4"
values = text.split("#|#")
# values is ['col1', 'col2', 'col3', 'col4']

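Using the csv module, you'll have to tell the reader about the quote character. Note that quoting only accepts the csv.QUOTE_* constants, so the '#' goes in the quotechar parameter instead:

csv_reader = csv.reader(
    csv_pointer,
    delimiter=CSV_Seperator,
    quotechar='#'
)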
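To see how the csv module treats '#' as a quote character on data shaped like the question's sample, a small check along these lines may help. It is only a sketch: the in-memory sample and the '|' delimiter are assumptions standing in for the real file and for CSV_Seperator.

```python
import csv
import io

# Two lines shaped like the question's sample, held in memory
# (an assumption standing in for the real extract file).
sample = io.StringIO(
    "col1#|#col2#|#col3#|#col4\n"
    "data1#|#data2#|#data3#|#data4\n",
    newline="",
)

# '|' is assumed as the single-character delimiter; '#' is the quote character.
rows = list(csv.reader(sample, delimiter="|", quotechar="#"))

for row in rows:
    print(row)
```

Because the lines have no '#' at their start or end, the first field keeps its trailing '#', and the quote opened before the last field is never closed on the same line, so the reader carries on into the next line. That is the situation the fallback below is meant to avoid.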

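If that interferes with the '#'s in your fields (or with the missing '#' at the start and end of each line), then you may want to take a literal approach to this problem (without the csv library):

batch = []
with open(file, 'r') as f:
    for line in f.readlines()[1:]:  # skips the header; remove the [1:] if there is none
        # strip the trailing newline so it does not end up in the last field
        batch.append(line.rstrip("\n").split("#|#"))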
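The literal approach can be combined with the stripping, padding and batching from the question's code. The following is a minimal sketch, not a drop-in replacement: iter_batches and its parameters are hypothetical names, the in-memory sample stands in for the real file, and the database calls are left out.

```python
import io


def iter_batches(lines, sep="#|#", fields_count=4, rows_at_once=2, has_header=True):
    """Yield lists of rows, splitting each line on a multi-character separator."""
    batch = []
    for index, line in enumerate(lines):
        if has_header and index == 0:
            continue  # skip the header line
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        fields = [field.strip() for field in line.split(sep)]
        # pad short rows with None, as the question's code does
        fields += [None] * (fields_count - len(fields))
        batch.append(fields)
        if len(batch) >= rows_at_once:
            yield batch
            batch = []
    if batch:
        yield batch  # leftover rows that did not fill a whole batch


# In-memory stand-in for the extract file (an assumption for illustration).
sample = io.StringIO(
    "col1#|#col2#|#col3#|#col4\n"
    "data1#|#data2#|#data3#|#data4\n"
    "cdata1#|#da#ta2#|#data3#|#data4\n"
    "x1#|#x2\n"
)
batches = list(iter_batches(sample))
```

Each yielded batch could then be passed to curr.executemany(...) followed by conn.commit(), as in the question. A lone '#' inside a field (da#ta2) survives untouched because only the full "#|#" sequence is treated as a separator.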

Answer 2

Score: 0

Would something like this work for you?

# initial text
text = "col1#|#col2#|#col3#|#col4"
# add ### to the start and end
text = '###{}###'.format(text)
# replace #|# with ###|###
text = text.replace("#|#", "###|###")

This returns:

###col1###|###col2###|###col3###|###col4###

Obviously, this would need to go in a loop of some kind to go through all the data you have, and could also be consolidated into one line but I split it up to try and make it clearer.
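The loop mentioned above might look something like this. It is a sketch under assumptions: wrap_line is a hypothetical helper name, and the list of lines stands in for reading the real file line by line.

```python
def wrap_line(line):
    """Turn '#|#' separators into '###|###' and wrap the line in '###'."""
    text = line.rstrip("\r\n")  # drop the line ending before wrapping
    text = text.replace("#|#", "###|###")
    return "###{}###".format(text)


# Stand-in for the lines of the extract file (an assumption for illustration).
lines = [
    "col1#|#col2#|#col3#|#col4\n",
    "cdata1#|#da#ta2#|#data3#|#data4\n",
]
converted = [wrap_line(line) for line in lines]

for line in converted:
    print(line)
```

A single '#' inside a field, as in da#ta2, is left alone, since only the full "#|#" sequence is replaced.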

huangapple
  • Published on January 3, 2020 at 23:25:23
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59581136.html