Question

I'm using snscrape to scrape tweets about EURUSD and combining it with machine learning to predict whether the price of EURUSD will go up or down the following day, based on the sentiment of the scraped tweets. My problem is how to plan and structure the code. For example, should I use the individual tweets as features for the ML model, or should I average the sentiment of the tweets for each day and use that daily average as the model's feature? I would appreciate any advice from people who have worked on similar projects.
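The two options are not mutually exclusive. The number of tweets varies from day to day, so a model that expects one fixed-size row per day usually needs some per-day summary, but that summary can carry more than a single mean. Below is a minimal sketch. It assumes the per-tweet scores are already in a `sentiment` column and that a daily close-price Series is available for the label. The function and column names (`daily_sentiment_features`, `sent_mean`, `next_day_up`, and so on) are illustrative only and do not come from snscrape or TextBlob.

```python
import pandas as pd


def daily_sentiment_features(tweets: pd.DataFrame) -> pd.DataFrame:
    """Collapse per-tweet sentiment scores into one row of features per day.

    Assumes `tweets` has a datetime-like 'date' column and a numeric
    'sentiment' column (for example TextBlob polarity in [-1, 1]).
    """
    day = pd.to_datetime(tweets['date']).dt.normalize()  # midnight of each day
    grouped = tweets.groupby(day)['sentiment']
    features = pd.DataFrame({
        'sent_mean': grouped.mean(),             # overall mood of the day
        'sent_std': grouped.std().fillna(0.0),   # disagreement (0 if only one tweet)
        'n_tweets': grouped.size(),              # activity level
        'pos_share': grouped.apply(lambda s: (s > 0).mean()),  # fraction positive
    })
    features.index.name = 'day'
    return features


def add_next_day_label(features: pd.DataFrame, close: pd.Series) -> pd.DataFrame:
    """Join a 0/1 label: 1 if the next day's close is above this day's close.

    Assumes `close` is indexed by the same kind of daily timestamps as
    `features` (same timezone handling) and sorted oldest first.
    """
    up = (close.shift(-1) > close).astype(int)
    up = up.iloc[:-1]  # the last day has no next-day close to compare with
    return features.join(up.rename('next_day_up'), how='inner')
```

Keeping several summaries per day lets the model weigh, for example, a mildly positive day with many tweets differently from the same average built from two tweets. Note that tweet timestamps from snscrape are typically timezone-aware, so the price index may need to be localized to match before joining.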

Answer 1

Score: -1

Provided that you have the necessary tokens, you could try something like this:

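import pandas as pd
import snscrape.modules.twitter as sntwitter
from textblob import TextBlob
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Scrape up to 1000 tweets that mention EURUSD
tweets = []
for i, tweet in enumerate(sntwitter.TwitterSearchScraper('EURUSD').get_items()):
    if i >= 1000:
        break
    tweets.append([tweet.date, tweet.content])
df_tweets = pd.DataFrame(tweets, columns=['date', 'text'])
df_tweets.to_csv('tweets.csv', index=False)

# Map TextBlob polarity to -1 (negative), 0 (neutral) or 1 (positive)
def get_sentiment(text):
    sentiment = TextBlob(text).sentiment.polarity
    if sentiment > 0:
        return 1
    elif sentiment < 0:
        return -1
    else:
        return 0

df_tweets['sentiment'] = df_tweets['text'].apply(get_sentiment)

# Average the sentiment per calendar day (grouping on 'date' directly would
# group by exact timestamp, i.e. roughly one group per tweet)
df_tweets['day'] = pd.to_datetime(df_tweets['date']).dt.date
daily = df_tweets.groupby('day').agg(sentiment=('sentiment', 'mean')).reset_index()

# Build the up/down label from daily EURUSD closes. 'eurusd.csv' with 'day'
# and 'close' columns is an assumed input; snscrape does not provide prices.
prices = pd.read_csv('eurusd.csv', parse_dates=['day']).sort_values('day')
prices['day'] = prices['day'].dt.date
# 1 if the next day's close is higher than this day's close, else 0
prices['price_direction'] = (prices['close'].shift(-1) > prices['close']).astype(int)
prices = prices.iloc[:-1]  # the last day has no next-day close yet
df_features = daily.merge(prices[['day', 'price_direction']], on='day')

# Split chronologically: train on earlier days, test on later ones
X = df_features[['sentiment']]
y = df_features['price_direction']
split = int(len(df_features) * 0.8)
X_train, X_test = X.iloc[:split], X.iloc[split:]
y_train, y_test = y.iloc[:split], y.iloc[split:]

rfc = RandomForestClassifier(n_estimators=100, random_state=42)
rfc.fit(X_train, y_train)
y_pred = rfc.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy:', accuracy)

# Predict the direction for the day after the most recent day of tweets
last_day_sentiment = daily[['sentiment']].iloc[[-1]]
next_day_direction = rfc.predict(last_day_sentiment)[0]
print('Next day direction:', next_day_direction)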
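Because each row is a day in time order, a randomly shuffled split can let the model learn from days that come after the ones it is tested on, which tends to inflate accuracy. A walk-forward check with scikit-learn's `TimeSeriesSplit` avoids this. Comparing against always predicting the most common training label also gives a rough sense of whether sentiment adds anything. This is a minimal sketch that assumes a feature frame and label Series sorted oldest first. The helper names `walk_forward_accuracy` and `majority_baseline_accuracy` are hypothetical.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import TimeSeriesSplit


def walk_forward_accuracy(X: pd.DataFrame, y: pd.Series, n_splits: int = 5) -> list:
    """Accuracy on each successive test window, training only on earlier rows.

    Assumes X and y are aligned and sorted by date, oldest first.
    """
    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        model = RandomForestClassifier(n_estimators=100, random_state=42)
        model.fit(X.iloc[train_idx], y.iloc[train_idx])
        pred = model.predict(X.iloc[test_idx])
        scores.append(accuracy_score(y.iloc[test_idx], pred))
    return scores


def majority_baseline_accuracy(y: pd.Series, n_splits: int = 5) -> list:
    """Accuracy of always predicting the most common label seen in training."""
    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(y):
        majority = y.iloc[train_idx].mode().iloc[0]
        scores.append(float((y.iloc[test_idx] == majority).mean()))
    return scores
```

If the sentiment model's per-window accuracy is not consistently above the baseline's, the sentiment features are probably not carrying a usable signal yet.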
