January 3, 2020, 23:25:23

How to add a character to the beginning of a line and end of a line in python

Question

I have a data set I get from my IT group. They have an automated extract they are unwilling to change.

The files look like this (more lines added as requested):

col1#|#col2#|#col3#|#col4

data1#|#data2#|#data3#|#data4

cdata1#|#da#ta2#|#data3#|#data4

(The last line is an example where a # inside my data breaks the use of a bare # as the quote character.)

The # characters look like they are meant to be quote characters. I'd like to use them that way, but for whatever reason the extract doesn't include one at the start or the end of each line. The files have varying column counts, so I'm trying to process them to add a # to the start and end of each line.

Also, since # often appears in my data, I'd like to convert the # into ### to make the import into my tool cleaner.

So I'd like

###col1###|###col2###|###col3###|###col4###

How could I accomplish this?

Current code used to process the CSV:

import csv

csv_pointer = open(file, encoding=CSV_Encoding, errors=Error_Detection)
csv_reader = csv.reader(
    csv_pointer,
    delimiter=CSV_Seperator,
    quoting=csv.QUOTE_NONE
)
batch = list()
# for each row in the csv reader
for row in csv_reader:
    # append the processed row to the batch list:
    # strip each field to remove redundant whitespace
    # and pad with None if the row is shorter than FIELDS_COUNT
    batch.append([k.strip() for k in row] + [None] * (FIELDS_COUNT - len(row)))
    # check whether the batch has reached ROWS_AT_ONCE rows
    if len(batch) >= ROWS_AT_ONCE:
        # if so, use executemany on the cursor to insert the batch into the database
        curr.executemany(insert_func(Table_Name), batch)
        # commit
        conn.commit()
        # reset the batch to an empty list
        batch = list()
# if the batch list is not empty after the loop
if batch:
    # insert the remaining rows into the database
    curr.executemany(insert_func(Table_Name), batch)
    # commit
    conn.commit()
    # delete the batch (just in case the program messes up and needs to drop it)
    del batch

I tried changing my delimiter to #|#, which seems like it would fix my problem, but it returns the error:

TypeError: "delimiter" must be a 1-character string

Answer 1

Score: 1

If I may, why not use the delimiting strategy your IT team's CSV format already has?
You can split on "#|#" in your parsing tool (if it's in Python):

text = "col1#|#col2#|#col3#|#col4"
values = text.split("#|#")
# values is ['col1', 'col2', 'col3', 'col4']

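Using the csv module, you'll have to specify the quote character. Note that quoting only accepts the csv.QUOTE_* constants; the character itself goes in the quotechar parameter of the csv.reader call, with the delimiter left as a single character such as |:

csv_reader = csv.reader(
    csv_pointer,
    delimiter=CSV_Seperator,
    quotechar='#'
)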

If that interferes with '#'s in your fields, then you may want to take a literal approach to this problem (without the csv library):

batch = []
with open(file, 'r') as f:
    for l in f.readlines()[1:]:  # skips the header row; remove the [1:] if there is none
        # drop the trailing newline before splitting on the multi-character delimiter
        batch.append(l.rstrip('\n').split("#|#"))
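As a rough sketch of how the literal split could be combined with the stripping and None-padding from the question's loop: the helper name parse_lines and its fields_count argument are made up for illustration, and the sample rows are the ones from the question.

```python
from typing import Iterable, List, Optional

DELIMITER = "#|#"  # multi-character separator used by the extract


def parse_lines(lines: Iterable[str], fields_count: int) -> List[List[Optional[str]]]:
    """Split each line on DELIMITER, strip the fields, pad short rows with None."""
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        fields = [field.strip() for field in line.split(DELIMITER)]
        # pad with None when the row has fewer than fields_count fields
        fields += [None] * (fields_count - len(fields))
        rows.append(fields)
    return rows


sample = [
    "col1#|#col2#|#col3#|#col4\n",
    "data1#|#data2#|#data3#|#data4\n",
    "cdata1#|#da#ta2#|#data3#|#data4\n",
]
rows = parse_lines(sample, fields_count=4)
print(rows[2])  # ['cdata1', 'da#ta2', 'data3', 'data4']
```

A lone # inside a field (as in da#ta2) survives because only the full three-character sequence is treated as a separator. The resulting rows could then be fed to the same executemany batching as in the question.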

Answer 2

Score: 0

Would something like this work for you?

# initial text
text = "col1#|#col2#|#col3#|#col4"
# add ### to the start and end
text = '###{}###'.format(text)
# replace #|# with ###|###
text = text.replace("#|#", "###|###")

This returns:

###col1###|###col2###|###col3###|###col4###

Obviously, this would need to go in a loop of some kind to process all the data you have. It could also be consolidated into one line, but I split it up to try to make it clearer.
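For what it's worth, a loop version might look something like the sketch below. The function names requote_line and requote_stream are invented for the example, and io.StringIO stands in for real input and output files so the snippet runs on its own.

```python
import io


def requote_line(line):
    """Wrap a line in ### and turn each #|# separator into ###|###."""
    text = line.rstrip("\r\n")
    return "###{}###".format(text).replace("#|#", "###|###")


def requote_stream(src, dst):
    """Rewrite every non-blank line of src into dst."""
    for line in src:
        if line.strip():
            dst.write(requote_line(line) + "\n")


# In-memory stand-ins for the real files (assumption for the example)
src = io.StringIO(
    "col1#|#col2#|#col3#|#col4\n"
    "cdata1#|#da#ta2#|#data3#|#data4\n"
)
dst = io.StringIO()
requote_stream(src, dst)
print(dst.getvalue())
# ###col1###|###col2###|###col3###|###col4###
# ###cdata1###|###da#ta2###|###data3###|###data4###
```

With real files, src and dst would be the objects returned by open(). A single # inside a field is left untouched, since only the #|# sequence is replaced.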
