June 29, 2023, 01:49:51

English:

Parsing data with unusual format in python

# Question

# Initialize variables to store data
headers = []
values = []  # one list of tokens per "vals" block

# Flag to identify whether we are reading headers or values
reading_headers = True

# Open file, then loop over lines and extract variables of interest
with open('data.txt', 'r') as f:
    for line in f:
        columns = line.split()
        if not columns:
            continue
        if columns[0] == 'f':
            # The "f     7523" line only introduces the header rows
            reading_headers = True
            continue
        if columns[0] == 'vals':
            # Each "vals" line starts a new block of (value, error) pairs
            reading_headers = False
            values.append([])
            continue
        if reading_headers:
            headers.extend(columns)
        else:
            values[-1].extend(columns)

# Now organize the data into five columns: the header, then one
# (value, error) pair from each "vals" block
for i, header in enumerate(headers):
    data_values = []
    for block in values:
        data_values.extend(block[2 * i:2 * i + 2])
    print("\t".join([header] + data_values))

This code reads the file, collects the header labels into one list and each "vals" block into its own list, switching between the two at the "f" and "vals" marker lines. It then prints one tab-separated row of five columns per header, as requested.

English:

I have an ASCII file that is formatted somewhat oddly: the headers are organized in rows, followed by the corresponding values and their associated errors, also organized in rows. A snippet of the file is shown below.

f     7523
  10001  10002  10003  10004  10005  10006  10007  10008  10009  10010  10011
  10012  10013  10014  10015  10016  10017  10018  10019  10020  10021  10022
vals
  0.00000E+00 0.0000  7.65079E-12 0.7071  0.00000E+00 0.0000  3.87977E-14 0.5513
  0.00000E+00 0.0000  0.00000E+00 0.0000  0.00000E+00 0.0000  0.00000E+00 0.0000
  0.00000E+00 0.0000  1.92698E-14 0.7071  4.47277E-14 0.7071  0.00000E+00 0.0000
  0.00000E+00 0.0000  0.00000E+00 0.0000  0.00000E+00 0.0000  1.10023E-11 0.7053
  1.74410E-11 0.5005  0.00000E+00 0.0000  1.69181E-13 0.7071  0.00000E+00 0.0000
  0.00000E+00 0.0000  0.00000E+00 0.0000
vals
  1.34313E-06 0.0104  1.32163E-06 0.0102  1.30039E-06 0.0105  1.37575E-06 0.0106
  1.30792E-06 0.0102  1.28351E-06 0.0104  1.32164E-06 0.0102  1.32969E-06 0.0104
  1.31707E-06 0.0104  1.27281E-06 0.0103  1.28429E-06 0.0106  1.27027E-06 0.0105
  1.29623E-06 0.0105  1.32037E-06 0.0101  1.28948E-06 0.0105  1.33163E-06 0.0106
  1.36073E-06 0.0102  1.35462E-06 0.0102  1.38641E-06 0.0102  1.33099E-06 0.0102
  1.35307E-06 0.0100  1.33882E-06 0.0105

I want to parse this file so that the data is organized into five separate columns, something like:

10001      0.00000E+00 0.0000   1.34313E-06 0.0104
10002      7.65079E-12 0.7071   1.32163E-06 0.0102
10003      0.00000E+00 0.0000   1.30039E-06 0.0105
....
10022      0.00000E+00 0.0000   1.33882E-06 0.0105

However, I am unsure how to cycle through the data to produce the above format, beyond this simple start:

# Open file
f = open('data.txt', 'r')

# Loop over lines and extract variables of interest
for line in f:
    line = line.strip()
    columns = line.split()

</details>


# Answer 1
**Score**: 1

<details>
<summary>English:</summary>

You can try to use `re` to parse the file into a dataframe:

```py
text = """\
f     7523
  10001  10002  10003  10004  10005  10006  10007  10008  10009  10010  10011
  10012  10013  10014  10015  10016  10017  10018  10019  10020  10021  10022
vals
  0.00000E+00 0.0000  7.65079E-12 0.7071  0.00000E+00 0.0000  3.87977E-14 0.5513
  0.00000E+00 0.0000  0.00000E+00 0.0000  0.00000E+00 0.0000  0.00000E+00 0.0000
  0.00000E+00 0.0000  1.92698E-14 0.7071  4.47277E-14 0.7071  0.00000E+00 0.0000
  0.00000E+00 0.0000  0.00000E+00 0.0000  0.00000E+00 0.0000  1.10023E-11 0.7053
  1.74410E-11 0.5005  0.00000E+00 0.0000  1.69181E-13 0.7071  0.00000E+00 0.0000
  0.00000E+00 0.0000  0.00000E+00 0.0000
vals
  1.34313E-06 0.0104  1.32163E-06 0.0102  1.30039E-06 0.0105  1.37575E-06 0.0106
  1.30792E-06 0.0102  1.28351E-06 0.0104  1.32164E-06 0.0102  1.32969E-06 0.0104
  1.31707E-06 0.0104  1.27281E-06 0.0103  1.28429E-06 0.0106  1.27027E-06 0.0105
  1.29623E-06 0.0105  1.32037E-06 0.0101  1.28948E-06 0.0105  1.33163E-06 0.0106
  1.36073E-06 0.0102  1.35462E-06 0.0102  1.38641E-06 0.0102  1.33099E-06 0.0102
  1.35307E-06 0.0100  1.33882E-06 0.0105
"""

import re
import pandas as pd

data = [
    [c.split() for c in group.split("  ")]
    for _, group in re.findall(r"(^\s+)(.*?)(?=^\S|\Z)", text, flags=re.S | re.M)
]

idx = [h for c in data[0] for h in c]

df = pd.concat(
    [pd.DataFrame(d, index=idx) for d in data[1:]], axis=1
)
df.columns = range(len(df.columns))

print(df)
```

Prints:

                 0       1            2       3
10001  0.00000E+00  0.0000  1.34313E-06  0.0104
10002  7.65079E-12  0.7071  1.32163E-06  0.0102
10003  0.00000E+00  0.0000  1.30039E-06  0.0105
10004  3.87977E-14  0.5513  1.37575E-06  0.0106
10005  0.00000E+00  0.0000  1.30792E-06  0.0102
10006  0.00000E+00  0.0000  1.28351E-06  0.0104
10007  0.00000E+00  0.0000  1.32164E-06  0.0102
10008  0.00000E+00  0.0000  1.32969E-06  0.0104
10009  0.00000E+00  0.0000  1.31707E-06  0.0104
10010  1.92698E-14  0.7071  1.27281E-06  0.0103
10011  4.47277E-14  0.7071  1.28429E-06  0.0106
10012  0.00000E+00  0.0000  1.27027E-06  0.0105
10013  0.00000E+00  0.0000  1.29623E-06  0.0105
10014  0.00000E+00  0.0000  1.32037E-06  0.0101
10015  0.00000E+00  0.0000  1.28948E-06  0.0105
10016  1.10023E-11  0.7053  1.33163E-06  0.0106
10017  1.74410E-11  0.5005  1.36073E-06  0.0102
10018  0.00000E+00  0.0000  1.35462E-06  0.0102
10019  1.69181E-13  0.7071  1.38641E-06  0.0102
10020  0.00000E+00  0.0000  1.33099E-06  0.0102
10021  0.00000E+00  0.0000  1.35307E-06  0.0100
10022  0.00000E+00  0.0000  1.33882E-06  0.0105
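
To make the regular expression less opaque, here is a small sketch of what it captures, using only `re` on a shortened copy of the question's snippet (the `sample` string is an assumption made for illustration). Each match is a run of indented lines that ends at the next line starting in the first column, or at the end of the text.

```python
import re

# Shortened copy of the question's snippet, for illustration only.
sample = (
    "f     7523\n"
    "  10001  10002\n"
    "  10003\n"
    "vals\n"
    "  0.00000E+00 0.0000  7.65079E-12 0.7071\n"
    "  0.00000E+00 0.0000\n"
)

# Group 1 is the leading indentation, group 2 the rest of the indented run.
matches = re.findall(r"(^\s+)(.*?)(?=^\S|\Z)", sample, flags=re.S | re.M)

# Same splitting as in the answer: items are separated by two spaces.
data = [[c.split() for c in group.split("  ")] for _, group in matches]

for block in data:
    print(block)
```

The first block holds the header labels and the second holds the (value, error) pairs. Note that `group.split("  ")` assumes items are separated by exactly two spaces, as in the sample; other spacing may produce empty or merged entries.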

# Answer 2

**Score**: 1

from pprint import pp

# read the whole file into one string
with open('data.txt') as f:
    data = f.read()

# join the lines, then split into three sections at each 'vals' marker
rows, val1, val2 = data.replace('\n', '').split('vals')

# drop 'f     7523' and convert rows to ints
rows = list(map(int, rows.split()[2:]))
val1 = val1.split()
val2 = val2.split()

# list comprehension to gather [value, error] pairs from vals
# use list(map(float, valX[i:i+2])) inside if you want actual floats
val1 = [val1[i:i+2] for i in range(0, len(val1), 2)]
val2 = [val2[i:i+2] for i in range(0, len(val2), 2)]

# zip and unpack the pairs
result = [[a, *b, *c] for a, b, c in zip(rows, val1, val2)]

pp(result)

[[10001, '0.00000E+00', '0.0000', '1.34313E-06', '0.0104'],
 [10002, '7.65079E-12', '0.7071', '1.32163E-06', '0.0102'],
 [10003, '0.00000E+00', '0.0000', '1.30039E-06', '0.0105'],
 [10004, '3.87977E-14', '0.5513', '1.37575E-06', '0.0106'],
 [10005, '0.00000E+00', '0.0000', '1.30792E-06', '0.0102'],
 [10006, '0.00000E+00', '0.0000', '1.28351E-06', '0.0104'],
 [10007, '0.00000E+00', '0.0000', '1.32164E-06', '0.0102'],
 [10008, '0.00000E+00', '0.0000', '1.32969E-06', '0.0104'],
 [10009, '0.00000E+00', '0.0000', '1.31707E-06', '0.0104'],
 [10010, '1.92698E-14', '0.7071', '1.27281E-06', '0.0103'],
 [10011, '4.47277E-14', '0.7071', '1.28429E-06', '0.0106'],
 [10012, '0.00000E+00', '0.0000', '1.27027E-06', '0.0105'],
 [10013, '0.00000E+00', '0.0000', '1.29623E-06', '0.0105'],
 [10014, '0.00000E+00', '0.0000', '1.32037E-06', '0.0101'],
 [10015, '0.00000E+00', '0.0000', '1.28948E-06', '0.0105'],
 [10016, '1.10023E-11', '0.7053', '1.33163E-06', '0.0106'],
 [10017, '1.74410E-11', '0.5005', '1.36073E-06', '0.0102'],
 [10018, '0.00000E+00', '0.0000', '1.35462E-06', '0.0102'],
 [10019, '1.69181E-13', '0.7071', '1.38641E-06', '0.0102'],
 [10020, '0.00000E+00', '0.0000', '1.33099E-06', '0.0102'],
 [10021, '0.00000E+00', '0.0000', '1.35307E-06', '0.0100'],
 [10022, '0.00000E+00', '0.0000', '1.33882E-06', '0.0105']]
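
The comment about `float` can be made concrete. The sketch below applies the same steps to a shortened sample held in a string, which is an assumption made so the example is self-contained; `pairs_as_floats` is a hypothetical helper name, not part of the answer. Like the answer, it assumes exactly two `vals` sections.

```python
# Shortened sample in the same layout as the question's file (illustration only).
data = """\
f     7523
  10001  10002  10003
vals
  0.00000E+00 0.0000  7.65079E-12 0.7071
  0.00000E+00 0.0000
vals
  1.34313E-06 0.0104  1.32163E-06 0.0102
  1.30039E-06 0.0105
"""

# Same splitting as in the answer: join the lines, cut at each 'vals' marker.
rows, val1, val2 = data.replace('\n', '').split('vals')
rows = list(map(int, rows.split()[2:]))  # drop 'f' and '7523'


def pairs_as_floats(section):
    """Group whitespace-separated tokens into [value, error] float pairs."""
    tokens = section.split()
    return [list(map(float, tokens[i:i + 2])) for i in range(0, len(tokens), 2)]


result = [
    [a, *b, *c]
    for a, b, c in zip(rows, pairs_as_floats(val1), pairs_as_floats(val2))
]

for row in result:
    print(row)
```

Each row is then an integer label followed by four floats, which is convenient if the values are going to be used in arithmetic rather than only printed.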

# Answer 3

**Score**: 0

Read the file line by line.

If the current line starts with f, you have started a new header section. (It's not clear what you want to do with the 7523 value that is on the same line as the f in your example)

Otherwise if the current line starts with vals, you have started a new values section.

Otherwise you're in the middle of a header or values section that has already started. Read the values from the line into a list and store them appropriately.

When the file is exhausted, print all the headers and values.
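
The steps above could be sketched roughly as follows. This is one possible reading of the description, not a definitive implementation: the function names `parse_sections` and `to_rows` are hypothetical, the number after `f` is simply ignored, and the input is a shortened sample string so that the example runs on its own.

```python
def parse_sections(lines):
    """Collect labels from the 'f' section and one token list per 'vals' section."""
    labels, blocks = [], []
    current = None  # the list currently being filled
    for line in lines:
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if tokens[0] == "f":
            # New header section; the number after "f" is ignored here (assumption).
            current = labels
        elif tokens[0] == "vals":
            # New values section.
            current = []
            blocks.append(current)
        elif current is not None:
            # Continuation of the section that has already started.
            current.extend(tokens)
    return labels, blocks


def to_rows(labels, blocks):
    """Pair each label with its (value, error) pair from every 'vals' block."""
    rows = []
    for i, label in enumerate(labels):
        row = [int(label)]
        for block in blocks:
            row.extend(float(tok) for tok in block[2 * i:2 * i + 2])
        rows.append(row)
    return rows


# Shortened sample in the question's layout (illustration only).
sample = """\
f     7523
  10001  10002
  10003
vals
  0.00000E+00 0.0000  7.65079E-12 0.7071
  0.00000E+00 0.0000
vals
  1.34313E-06 0.0104  1.32163E-06 0.0102
  1.30039E-06 0.0105
"""

labels, blocks = parse_sections(sample.splitlines())
rows = to_rows(labels, blocks)
for row in rows:
    print(*row, sep="\t")
```

Because `parse_sections` accepts any iterable of lines, an open file object can be passed in place of `sample.splitlines()`, and any number of `vals` sections is handled.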
