2023-01-09 01:47:22

Trying to create a streamlit app that uses user-provided URLs to scrape and return a downloadable df

Question

I'm trying to use this create_df() function in Streamlit to gather a list of user-provided URLs called "recipes" and loop through each URL to return a df I've labeled "res" towards the end of the function. I've tried several approaches with the Streamlit syntax but I just cannot get this to work as I'm getting this error message:

recipe_scrapers._exceptions.WebsiteNotImplementedError: recipe-scrapers exception: Website (h) not supported.

Have a look at my entire repo here. The main.py script works fine once you've installed all requirements locally, but when I run the same logic with Streamlit syntax in the streamlit.py script I get the error above. Once you run streamlit run streamlit.py in your terminal and look at the UI I've created, it should be clear what I'm aiming for: giving the user a CSV of all ingredients in the recipe URLs they provided, as a convenient grocery shopping list.

Any help would be greatly appreciated!

import numpy as np
import pandas as pd
from recipe_scrapers import scrape_me

# replace_measurement_symbols() and justify() are helper functions that are not shown in this post.

def create_df(recipes):
    """
    Description:
        Creates one df with all recipes and their ingredients
    Arguments:
        * recipes: list of recipe URLs provided by user
    Comments:
        Note that ingredients with qualitative amounts e.g., "scheutje melk", "snufje zout" have been omitted from the ingredient list
    """
    df_list = []
    for recipe in recipes:
        scraper = scrape_me(recipe)
        recipe_details = replace_measurement_symbols(scraper.ingredients())
        recipe_name = recipe.split("https://www.hellofresh.nl/recipes/", 1)[1]
        recipe_name = recipe_name.rsplit('-', 1)[0]
        print("Processing data for " + recipe_name + " recipe.")
        for ingredient in recipe_details:
            try:
                df_temp = pd.DataFrame(columns=['Ingredients', 'Measurement'])
                df_temp[str(recipe_name)] = recipe_name
                ing_1 = ingredient.split("2 * ", 1)[1]
                ing_1 = ing_1.split(" ", 2)
                item = ing_1[2]
                measurement = ing_1[1]
                quantity = float(ing_1[0]) * 2
                df_temp.loc[len(df_temp)] = [item, measurement, quantity]
                df_list.append(df_temp)
            except (ValueError, IndexError):
                # Skip ingredients that do not follow the "2 * <quantity> <unit> <item>" pattern
                pass
    df = pd.concat(df_list)
    print("Renaming duplicate ingredients e.g., Kruimige aardappelen, Voorgekookte halve kriel met schil -> Aardappelen")
    # Single-item values need a trailing comma; without it ('Rode ui') is a plain string
    # and the comprehension below would iterate over its characters.
    ingredient_dict = {
        'Aardappelen': ('Dunne frieten', 'Half kruimige aardappelen', 'Voorgekookte halve kriel met schil',
                        'Kruimige aardappelen', 'Roodschillige aardappelen', 'Opperdoezer Ronde aardappelen'),
        'Ui': ('Rode ui',),
        'Kipfilet': ('Kipfilet met tuinkruiden en knoflook',),
        'Kipworst': ('Gekruide kipworst',),
        'Kipgehakt': ('Gemengd gekruid gehakt', 'Kipgehakt met Mexicaanse kruiden', 'Half-om-halfgehakt met Italiaanse kruiden',
                      'Kipgehakt met tuinkruiden'),
        'Kipshoarma': ('Kalkoenshoarma',)
    }
    reverse_label_ing = {x: k for k, v in ingredient_dict.items() for x in v}
    df["Ingredients"] = df["Ingredients"].replace(reverse_label_ing)
    print("Assigning ingredient categories")
    category_dict = {
        'brood': ('Biologisch wit rozenbroodje', 'Bladerdeeg', 'Briochebroodje', 'Wit platbrood'),
        'granen': ('Basmatirijst', 'Bulgur', 'Casarecce', 'Cashewstukjes',
                   'Gesneden snijbonen', 'Jasmijnrijst', 'Linzen', 'Maïs in blik',
                   'Parelcouscous', 'Penne', 'Rigatoni', 'Rode kidneybonen',
                   'Spaghetti', 'Witte tortilla'),
        'groenten': ('Aardappelen', 'Aubergine', 'Bosui', 'Broccoli',
                     'Champignons', 'Citroen', 'Gele wortel', 'Gesneden rodekool',
                     'Groene paprika', 'Groentemix van paprika, prei, gele wortel en courgette',
                     'IJsbergsla', 'Kumato tomaat', 'Limoen', 'Little gem',
                     'Paprika', 'Portobello', 'Prei', 'Pruimtomaat',
                     'Radicchio en ijsbergsla', 'Rode cherrytomaten', 'Rode paprika', 'Rode peper',
                     'Rode puntpaprika', 'Rode ui', 'Rucola', 'Rucola en veldsla', 'Rucolamelange',
                     'Semi-gedroogde tomatenmix', 'Sjalot', 'Sperziebonen', 'Spinazie', 'Tomaat',
                     'Turkse groene peper', 'Veldsla', 'Vers basilicum', 'Verse bieslook',
                     'Verse bladpeterselie', 'Verse koriander', 'Verse krulpeterselie', 'Wortel', 'Zoete aardappel'),
        'kruiden': ('Aïoli', 'Bloem', 'Bruine suiker', 'Cranberrychutney', 'Extra vierge olijfolie',
                    'Extra vierge olijfolie met truffelaroma', 'Fles olijfolie', 'Gedroogde laos',
                    'Gedroogde oregano', 'Gemalen kaneel', 'Gemalen komijnzaad', 'Gemalen korianderzaad',
                    'Gemalen kurkuma', 'Gerookt paprikapoeder', 'Groene currykruiden', 'Groentebouillon',
                    'Groentebouillonblokje', 'Honing', 'Italiaanse kruiden', 'Kippenbouillonblokje', 'Knoflookteen',
                    'Kokosmelk', 'Koreaanse kruidenmix', 'Mayonaise', 'Mexicaanse kruiden', 'Midden-Oosterse kruidenmix',
                    'Mosterd', 'Nootmuskaat', 'Olijfolie', 'Panko paneermeel', 'Paprikapoeder', 'Passata',
                    'Pikante uienchutney', 'Runderbouillonblokje', 'Sambal', 'Sesamzaad', 'Siciliaanse kruidenmix',
                    'Sojasaus', 'Suiker', 'Sumak', 'Surinaamse kruiden', 'Tomatenblokjes', 'Tomatenblokjes met ui',
                    'Truffeltapenade', 'Ui', 'Verse gember', 'Visbouillon', 'Witte balsamicoazijn', 'Wittewijnazijn',
                    'Zonnebloemolie', 'Zwarte balsamicoazijn'),
        'vlees': ('Gekruide runderburger', 'Half-om-half gehaktballetjes met Spaanse kruiden', 'Kipfilethaasjes', 'Kipfiletstukjes',
                  'Kipgehaktballetjes met Italiaanse kruiden', 'Kippendijreepjes', 'Kipshoarma', 'Kipworst', 'Spekblokjes',
                  'Vegetarische döner kebab', 'Vegetarische kaasschnitzel', 'Vegetarische schnitzel'),
        'zuivel': ('Ei', 'Geraspte belegen kaas', 'Geraspte cheddar', 'Geraspte grana padano', 'Geraspte oude kaas',
                   'Geraspte pecorino', 'Karnemelk', 'Kruidenroomkaas', 'Labne', 'Melk', 'Mozzarella',
                   'Parmigiano reggiano', 'Roomboter', 'Slagroom', 'Volle yoghurt')
    }
    reverse_label_cat = {x: k for k, v in category_dict.items() for x in v}
    df["Category"] = df["Ingredients"].map(reverse_label_cat)
    # Move the Category column to the front
    col = "Category"
    first_col = df.pop(col)
    df.insert(0, col, first_col)
    df = df.sort_values(['Category', 'Ingredients'], ascending=[True, True])
    print("Merging ingredients by row across all recipe columns using justify()")
    gp_cols = ['Ingredients', 'Measurement']
    oth_cols = df.columns.difference(gp_cols)
    arr = np.vstack(df.groupby(gp_cols, sort=False, dropna=False).apply(lambda gp: justify(gp.to_numpy(), invalid_val=np.nan, axis=0, side='up')))
    # Reconstruct DataFrame
    # Remove entirely NaN rows based on the non-grouping columns
    res = (pd.DataFrame(arr, columns=df.columns)
           .dropna(how='all', subset=oth_cols, axis=0))
    res = res.fillna(0)
    res['Total'] = res.drop(['Ingredients', 'Measurement'], axis=1).sum(axis=1)
    res = res[res['Total'] != 0]  # To drop rows that are being duplicated with 0 for some reason; will check later
    print("Processing complete!")
    return res

Answer 1

Score: 1

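Your function create_df needs a list as an argument, but st.text_input always returns a string. Looping over a string yields its individual characters, so scrape_me receives "h" (the first character of "https://...") instead of a full URL, which is why the error reads Website (h) not supported.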
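To see the difference in isolation, here is a minimal sketch of what the for loop in create_df iterates over in each case. The helper first_items and the example URL are hypothetical and only serve the illustration; nothing here calls Streamlit or recipe_scrapers.

```python
def first_items(recipes, n=3):
    """Return the first n items that a `for recipe in recipes` loop would see."""
    return [recipe for recipe in recipes][:n]


# Hypothetical URL, used only for illustration
url = "https://www.hellofresh.nl/recipes/example-recipe-123"

# A bare string is iterated character by character
print(first_items(url))    # ['h', 't', 't']

# A list is iterated element by element, so the loop sees the whole URL
print(first_items([url]))
```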

In your streamlit.py, replace this df_download = create_df(recs) with this df_download = create_df([recs]). However, if you need to handle multiple URLs, you should use str.split like this:

def create_df(recipes):
    recipes = recipes.split(",") # <--- add this line to create a list from the user input
### rest of the code ###
if download:
    df_download = create_df(recs)
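If users may paste URLs with spaces after the commas or on separate lines, a plain split(",") leaves stray whitespace in the entries. A slightly more defensive variant could look like the sketch below. The function name parse_recipe_urls and the example URLs are assumptions for illustration; the prefix is the one create_df already splits on.

```python
import re

# Prefix that create_df() expects every recipe URL to start with
HELLOFRESH_PREFIX = "https://www.hellofresh.nl/recipes/"


def parse_recipe_urls(raw):
    """Split raw text-box input into a clean list of recipe URLs.

    Accepts commas and/or newlines as separators, trims whitespace,
    and drops empty entries and anything without the expected prefix.
    """
    parts = re.split(r"[,\n]+", raw)
    urls = [part.strip() for part in parts]
    return [url for url in urls if url.startswith(HELLOFRESH_PREFIX)]


# Hypothetical user input: two recipe URLs and one invalid entry
raw_input_text = (
    "https://www.hellofresh.nl/recipes/pasta-abc123, "
    "https://www.hellofresh.nl/recipes/curry-def456,\n"
    "not-a-url"
)
print(parse_recipe_urls(raw_input_text))
```

With a helper like this, the call in streamlit.py could become create_df(parse_recipe_urls(recs)), which leaves create_df itself unchanged and keeps it usable from main.py with a regular list.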
