Question

The UpSet plot tutorial in the documentation has this example with movies: https://upsetplot.readthedocs.io/en/stable/formats.html#When-category-membership-is-indicated-in-DataFrame-columns
After creating the data from the "Genre" memberships and plotting it, how do I also list the names of the movies?
In the plot, I want to print the list of movies at each intersection. For example, at the intersection with 48 movies, I want to list those 48 movies.

Answer 1

Score: 2

In the example on the documentation page, this information is contained in the DataFrame movies_by_genre, which is defined as movies_by_genre = from_indicators(genre_indicators, data=movies). We can extract the required information from this DataFrame. We just need to make sure that the order of the length-20 boolean tuple (True, False, ..., True) taken from the index of u.intersections matches the order of the index levels of movies_by_genre.Genre. I used a dict to map the order of columns. For reproducibility, the end-to-end Python script is given below:

# ! pip install upsetplot
# ! pip install smartprint 
from upsetplot import from_indicators
import pandas as pd 
from upsetplot import UpSet
from smartprint import smartprint as sprint

def get_movie_list_at_intersection(u, movies_by_genre, col=0):
    """
    Args:
        u: result of the call UpSet(movies_by_genre, min_subset_size=15, show_counts=True)
        movies_by_genre: result of from_indicators(genre_indicators, data=movies)
        col: column number; 0 is the first intersection, with 48 elements
    Returns:
        list of movie names at column number col
    """

    keys = list(u.intersections.index.names)
    values = list(u.intersections.index[col]) 
    
    # Fix the order of columns between movies df and the movies_by_genre_df
    dict_ = dict(zip(keys, values)) 
    
    column_names_in_df_movies_by_genre = movies_by_genre.Genre.index.names
    mapped_boolean = [*map(dict_.get, column_names_in_df_movies_by_genre)]

    movie_list = movies_by_genre.loc[tuple(mapped_boolean)].Title.tolist() 
    return movie_list


movies = pd.read_csv("https://raw.githubusercontent.com/peetck/IMDB-Top1000-Movies/master/IMDB-Movie-Data.csv")
genre_indicators = pd.DataFrame([{cat: True
                                  for cat in cats}
                                 for cats in movies.Genre.str.split(',').values]).fillna(False)
movies_by_genre = from_indicators(genre_indicators, data=movies)
u = UpSet(movies_by_genre, min_subset_size=15, show_counts=True)


# For the 4th intersection set, i.e. column number 3, we have the following,
# which outputs the corresponding list of 15 movies

sprint(get_movie_list_at_intersection(u, movies_by_genre, 3))
sprint(len(get_movie_list_at_intersection(u, movies_by_genre, 3)))

Output:

get_movie_list_at_intersection(u, movies_by_genre, 3) : ['Nocturnal Animals', 'Miss Sloane', 'Forushande', 'Kynodontas', 'Norman: The Moderate Rise and Tragic Fall of a New York Fixer', 'Black Swan', 'The imposible', 'The Lives of Others', 'Zipper', 'Lavender', 'Man Down', 'A Bigger Splash', 'Flight', 'Contagion', 'The Skin I Live In']
len(get_movie_list_at_intersection(u, movies_by_genre, 3)) : 15
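
The name-to-value mapping inside get_movie_list_at_intersection can be seen in isolation on a toy index. The following is only a minimal sketch: the three genres, the titles, and the variable names are made up for illustration, and the row lookup uses MultiIndex.isin rather than the .loc call from the script above.

```python
import pandas as pd

# Toy stand-in for movies_by_genre: a boolean MultiIndex with named levels.
index = pd.MultiIndex.from_tuples(
    [(True, False, True), (True, False, True), (False, True, False)],
    names=["Action", "Comedy", "Drama"],
)
toy = pd.DataFrame({"Title": ["A", "B", "C"]}, index=index)

# Toy stand-in for one entry of u.intersections.index, whose levels
# are listed in a different order than in the DataFrame.
other_names = ["Drama", "Action", "Comedy"]
other_values = (True, True, False)

# Map level name -> boolean, then read the values back in the
# DataFrame's own level order.
by_name = dict(zip(other_names, other_values))
key = tuple(by_name[name] for name in toy.index.names)

# Select every row whose full index tuple equals the reordered key.
mask = toy.index.isin([key])
titles = toy.loc[mask, "Title"].tolist()
print(key, titles)
```

Without the reordering step, the tuple (True, True, False) would be read as Action and Comedy instead of Drama and Action, and the lookup would return the wrong rows or none at all.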

EDIT:

Upon clarification from OP, the list of names should be printed on the plot. So, we can follow the same method and put the text on the plots manually. I did the following:

Modified the _plot_bars() function inside upsetplot/plotting.py so that it adds text from an attribute called lol_of_intersection_names; lol stands for list of lists. Additionally, I passed an alpha parameter to ax.bar to make the bars semi-transparent (alpha=0.5 in the example below); otherwise the text would not be visible.


for (name, y), color in zip(data_df.items(), colors):
    rects = ax.bar(x, y, .5, cum_y,
                   color=color, zorder=10,
                   label=name if use_labels else None,
                   align='center', alpha=0.5)
    cum_y = y if cum_y is None else cum_y + y
    all_rects.extend(rects)

    ############# Start of Snippet
    # Iterate over each bar
    counts = y.tolist()
    for bar_num in range(len(counts)):
        bar = ax.patches[bar_num]  # extract the bar
        for counter in range(counts[bar_num]):
            # Write one title per step, spread evenly over the bar's height
            ax.text(bar.get_width() / 2 + bar.get_x(),
                    bar.get_y() + bar.get_height() * counter / counts[bar_num],
                    self.lol_of_intersection_names[bar_num][counter],
                    color='blue', ha='center', va='center', fontsize=0.5)
    ############# End of Snippet

    self._label_sizes(ax, rects, 'top' if self._horizontal else 'right')
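
The positioning arithmetic in the snippet above can be tried outside of upsetplot on a plain matplotlib bar chart. This is only an illustrative sketch with made-up names and a larger font size; it does not touch upsetplot internals, and names_per_bar is a hypothetical stand-in for lol_of_intersection_names.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical toy data: two bars and the names that belong to each one.
names_per_bar = [["A", "B", "C"], ["D", "E"]]
heights = [len(names) for names in names_per_bar]

fig, ax = plt.subplots()
# Semi-transparent bars so the text drawn over them stays readable.
bars = ax.bar(range(len(heights)), heights, 0.5, alpha=0.5)

for bar, names in zip(bars, names_per_bar):
    x_center = bar.get_x() + bar.get_width() / 2
    for i, name in enumerate(names):
        # Place the i-th name at the matching fraction of the bar's height.
        y_pos = bar.get_y() + bar.get_height() * i / len(names)
        ax.text(x_center, y_pos, name, ha="center", va="center", fontsize=6)

labels = [t.get_text() for t in ax.texts]
print(labels)
```

Iterating over the container returned by ax.bar keeps each bar paired with its own list of names, which avoids relying on the position of the bar inside ax.patches.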

Attached the list to the object u of class UpSet as an attribute so that it can be accessed inside the function _plot_bars(), as shown below:

    import matplotlib.pyplot as plt

    u = UpSet(movies_by_genre, min_subset_size=15, show_counts=True)

    # lol: list of lists, one list of movie titles per intersection
    lol_of_intersection_names = []
    for i in range(u.intersections.shape[0]):
        lol_of_intersection_names.append(get_movie_list_at_intersection(u, movies_by_genre, i))
    u.lol_of_intersection_names = lol_of_intersection_names

    u.plot()
    plt.savefig("Upset_plot.png", dpi=600)
    plt.show()

Finally, the output looks as shown below:

However, given the long lists of names, I am unsure how practical it is to plot them like this. Only when I save the image at 600 DPI can I zoom in and read the movie names.
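
If the names do not have to appear on the figure itself, one possible alternative is to collect the titles per intersection into a dict and print them next to the plot. The sketch below assumes a DataFrame indexed by boolean genre levels, like the one from_indicators returns; the toy data and the name titles_by_intersection are made up for illustration. Unlike the plot above, it lists every observed intersection and applies no min_subset_size filter.

```python
import pandas as pd

# Toy stand-in for movies_by_genre: a boolean MultiIndex with named levels.
index = pd.MultiIndex.from_tuples(
    [(True, False, True), (True, False, True), (False, True, False)],
    names=["Action", "Comedy", "Drama"],
)
toy = pd.DataFrame({"Title": ["A", "B", "C"]}, index=index)

# Group rows by their full membership tuple and collect the titles.
grouped = toy.groupby(level=list(toy.index.names))["Title"].apply(list)
titles_by_intersection = {
    tuple(bool(flag) for flag in key): titles for key, titles in grouped.items()
}

for key, titles in titles_by_intersection.items():
    # Turn the boolean tuple back into the genre names it stands for.
    members = [name for name, flag in zip(toy.index.names, key) if flag]
    print(" & ".join(members), len(titles), titles)
```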
