April 20, 2023, 01:20:58

Modify one element of a namedtuple in a list

Question

I have written a script to extract some information from a PDF file.
Each page is read as blocks.
If a block starting with [V2G is found, it is saved together with the title, the subtitle and the bulleted list.

My code:

from collections import namedtuple

data = []
req = namedtuple('Req', 'a b c d e f')

for page in doc:
    dic = page.get_text("dict")
    blocks = dic['blocks']  # text blocks
    ...

    for b in blocks:
        # title: number before the first space, title after it
        if font1 == 'Cambria-Bold':
            nr = text.partition(" ")[0]
            title = text.partition(" ")[2]
        # subtitle
        elif font1 == 'Cambria':
            Sub_nr = text.partition(" ")[0]
            sub_title = text.partition(" ")[2]
        # text
        elif text.startswith('[V2G'):
            id = text.replace('[', '')
            txt = text1.strip()
            data.append(req(nr, title, Sub_nr, sub_title, id, txt))

        # bulleted list after the text
        elif text.startswith("—"):
            # my attempt: this only changes the local variable, not data[-1]
            text += "\n" + text

The problem is the bulleted list, because it is located in the block after the [V2G text.
Also, not every text that begins with [V2G has a bulleted list.

So how can I combine the text of the bulleted list with the text from the txt variable and store it in the namedtuple field f?

I would then like to put it into the same (last) row of my list.

Is it possible to modify just one field of a namedtuple and write it to the last element of my list (data) without touching the other fields?

Result should be:

Answer 1

Score: 2

As Barmar has pointed out in a comment above, (named) tuples are immutable. Why not use a dictionary instead? You could assign new items to this dictionary throughout the loop and update them as well. Appending the dictionary to the data list should of course be done as the last step in each iteration.
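
To make the immutability point concrete, here is a minimal sketch. The field values are made up for illustration and are not taken from the original PDF. It also shows `_replace`, the standard namedtuple method that returns a new tuple with selected fields changed, which is one way to update only field f of the last list element if you prefer to keep namedtuples.

```python
from collections import namedtuple

Req = namedtuple("Req", "a b c d e f")

# Illustrative values only; they are not taken from the original PDF.
data = [Req("1", "Title", "1.1", "Subtitle", "V2G-001]", "Main text")]

# Direct assignment fails because (named) tuples are immutable.
try:
    data[-1].f = "changed"
except AttributeError:
    assignment_failed = True
else:
    assignment_failed = False

# _replace builds a new tuple in which only field f differs;
# store it back in the last slot of the list.
bullet = "— first bullet"
data[-1] = data[-1]._replace(f=data[-1].f + "\n" + bullet)

print(assignment_failed)
print(data[-1].f)
```

The list still has one element afterwards; only the object in the last slot has been swapped for an updated copy.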

I am not sure why the bullet point lists have to be treated separately from the other text. It is hard to answer this part of the question without knowing how text and text1 are assigned.

data = []

for page in doc:
    dic = page.get_text("dict")
    blocks = dic['blocks']  # text blocks
    ...

    for b in blocks:
        # a new dict is created for every block
        req = {'txt': ""}
        # title: number before the first space, title after it
        if font1 == 'Cambria-Bold':
            req['nr'] = text.partition(" ")[0]
            req['title'] = text.partition(" ")[2]

        # subtitle
        elif font1 == 'Cambria':
            req['sub_nr'] = text.partition(" ")[0]
            req['sub_title'] = text.partition(" ")[2]

        # text
        elif text.startswith('[V2G'):
            req['id'] = text.replace('[', '')
            req['txt'] = text1.strip()

        # bulleted list after the text
        elif text.startswith("—"):
            req['txt'] += ("\n" + text)
        data.append(req)

data is now a list of dictionaries, which pandas' DataFrame class will accept as input.
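
If the goal is one record per [V2G entry, with any following bullet lines added to that same record, the dictionary idea could be sketched as below. This is only a sketch under assumptions: the question does not show how font1, text and text1 are obtained, so the blocks are represented here by hypothetical, hand-written (font, text) pairs, the function name collect is invented, and the ID is assumed to be separated from the body by "] ".

```python
# Hypothetical pre-extracted blocks as (font, text) pairs. How these are
# obtained from the PDF is not shown in the question, so this is an assumption.
blocks = [
    ("Cambria-Bold", "8 Requirements"),
    ("Cambria", "8.1 General"),
    ("Other", "[V2G-001] The device shall respond."),
    ("Other", "— within two seconds"),
    ("Other", "— with a status code"),
    ("Other", "[V2G-002] The device shall log errors."),
]


def collect(blocks):
    """Build one dict per [V2G entry and append bullet lines to the last one."""
    data = []
    current = {}  # title/subtitle state carried across blocks
    for font, text in blocks:
        if font == "Cambria-Bold":
            nr, _, title = text.partition(" ")
            current = {"nr": nr, "title": title}  # new title drops the old subtitle
        elif font == "Cambria":
            sub_nr, _, sub_title = text.partition(" ")
            current["sub_nr"] = sub_nr
            current["sub_title"] = sub_title
        elif text.startswith("[V2G"):
            # Assumed layout: "[ID] body"
            req_id, _, body = text.partition("] ")
            data.append({**current, "id": req_id.lstrip("["), "txt": body.strip()})
        elif text.startswith("—") and data:
            # Dicts are mutable, so the last record can be updated in place.
            data[-1]["txt"] += "\n" + text
    return data


data = collect(blocks)
for row in data:
    print(row)
```

Entries without bullets are left unchanged. One limitation of this sketch is that a bullet line is always attached to the most recent [V2G entry, even if a heading appeared in between.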
