You know the feeling. You're heads-down on a new feature, and you remember to check Twitter for mentions of your product. You spend twenty minutes scrolling through search results, find a few interesting conversations, and get back to work. Three days later, you discover a high-profile user in your niche was asking for a tool that does exactly what you do, got a dozen recommendations for your competitor, and you missed the entire thread.
Manual monitoring doesn't scale. It's inconsistent, time-consuming, and the moment you stop checking, something important slips through. For indie hackers, solo founders, and small teams, the answer isn't a $500/month enterprise social listening suite. The answer is a lightweight, automated bot that runs in the background, collects what you care about, and surfaces it in a format you can act on.
This guide is a complete, production-focused walkthrough for building a Twitter keyword monitoring bot in Python. We'll skip the beginner-level theory and build something you'd actually run in production. By the end, you'll have a single Python script that searches Twitter for a keyword on a schedule, stores new tweets in a local SQLite database with automatic deduplication, and sends a real-time alert to Slack for every new mention it finds.
The Four Components of a Monitoring Bot
Before writing a single line of code, it's worth being explicit about the architecture. A reliable monitoring bot isn't a single, monolithic script. It's a small pipeline with four distinct components, each with a single responsibility.
| Component | Responsibility |
|---|---|
| Data Collection | Fetches tweets from Twitter for a given keyword; handles pagination and authentication |
| Storage & Deduplication | Persists tweets in a structured format; prevents the same tweet from being processed twice |
| Scheduling | Triggers the bot automatically at a regular interval without manual intervention |
| Alerting | Notifies you in real time when a new, relevant tweet is found |
Thinking in terms of these four components makes the system easier to build, debug, and extend over time. When something breaks (and it will), you'll know exactly which layer to look at.
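Concretely, the pipeline reduces to four small functions wired together. The following is an illustrative skeleton only, not the real implementation (which the rest of this guide builds step by step); every function body here is a placeholder:

```python
# Illustrative skeleton of the four-component pipeline.
# Each function is a stand-in for the real implementation built below.

def fetch_tweets(keyword: str) -> list[dict]:
    """Data collection: fetch recent tweets for a keyword."""
    return [{"id": "1", "text": f"mention of {keyword}"}]

def save_new(tweets: list[dict]) -> list[dict]:
    """Storage & deduplication: persist tweets, return only unseen ones."""
    return tweets  # placeholder: pretend everything is new

def alert(tweet: dict) -> None:
    """Alerting: notify on each new tweet."""
    print(f"ALERT: {tweet['text']}")

def run(keyword: str) -> None:
    """One scheduled invocation; the scheduler calls this on an interval."""
    for tweet in save_new(fetch_tweets(keyword)):
        alert(tweet)

run("producthunt")
```

Each real component below slots into one of these four seams, which is what makes the system easy to debug in isolation.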
Step 1: Setting Up the Project
Let's start with a clean, isolated environment. This ensures the bot is repeatable and won't be affected by other Python projects on your machine.
mkdir twitter-monitoring-bot
cd twitter-monitoring-bot
python3 -m venv .venv
source .venv/bin/activate # macOS/Linux
# .venv\Scripts\activate  # Windows

Install the dependencies:
pip install scrapebadger requests
pip freeze > requirements.txt

We'll use scrapebadger for the data collection layer and requests for sending Slack notifications. Now configure your credentials as environment variables. Never hardcode API keys in your scripts.
export SCRAPEBADGER_API_KEY="your_scrapebadger_api_key"
export SLACK_WEBHOOK_URL="your_slack_incoming_webhook_url"

You can get a ScrapeBadger API key from scrapebadger.com; the 1,000 free credits (no credit card required) are enough to validate the whole pipeline before committing to anything. For the Slack webhook, follow Slack's incoming webhooks guide to create one in a few minutes.
Create the project structure:
mkdir -p output
touch bot.py

Step 2: The Data Collection Layer
This is where most monitoring projects fail. The official Twitter API is expensive and rate-limited. Building your own scraper is a maintenance nightmare — Twitter's anti-bot measures are aggressive, and a scraper that works today can silently break tomorrow when the page structure changes. A scraping API gives us the best of both worlds: reliable, structured data without the operational overhead.
Let's build the data collection function. We'll use the ScrapeBadger Python SDK, which handles authentication, pagination, and response parsing for us.
# bot.py
import asyncio
import os
import logging

from scrapebadger import ScrapeBadger

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s [%(levelname)s] %(message)s"
)

async def fetch_tweets(keyword: str, limit: int = 100) -> list[dict]:
    """
    Fetches the latest tweets for a given keyword using ScrapeBadger.
    The SDK handles pagination internally; we just set a max_items limit.
    """
    api_key = os.getenv("SCRAPEBADGER_API_KEY")
    if not api_key:
        raise RuntimeError("SCRAPEBADGER_API_KEY environment variable is not set.")

    logging.info(f"Fetching up to {limit} tweets for keyword: '{keyword}'")
    tweets = []
    try:
        async with ScrapeBadger(api_key=api_key) as client:
            stream = client.twitter.tweets.search_all(keyword, max_items=limit)
            async for tweet in stream:
                tweets.append(tweet)
    except Exception as e:
        logging.error(f"Error fetching tweets: {e}")
        return []

    logging.info(f"Fetched {len(tweets)} tweets.")
    return tweets

The search_all method returns an async generator that handles pagination cursors automatically. We just iterate over it and collect the results. The max_items parameter bounds the job: without it, the stream could run indefinitely. For a monitoring bot that runs every 15 minutes, 100 tweets is a reasonable limit; you're unlikely to miss anything meaningful between runs.
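The same bounded-consumption pattern works with any async generator, independent of the SDK. Here is a minimal, self-contained sketch in which fake_stream is a stand-in for a paginated API stream:

```python
import asyncio

async def fake_stream(n: int):
    # Stand-in for an SDK pagination stream that yields one item at a time.
    for i in range(n):
        yield {"id": str(i)}

async def collect(limit: int) -> list[dict]:
    items = []
    # Same shape as `async for tweet in stream` in fetch_tweets above.
    async for item in fake_stream(limit):
        items.append(item)
    return items

results = asyncio.run(collect(5))
print(len(results))  # 5
```

The caller never sees pagination cursors; the generator abstraction hides them entirely, which is why the data collection layer stays so short.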
Let's verify this works before going further:
if __name__ == "__main__":
    results = asyncio.run(fetch_tweets("producthunt", limit=10))
    for tweet in results:
        print(tweet.get("id"), tweet.get("text", "")[:80])

Run python3 bot.py. You should see tweet IDs and truncated text printed to your console. If you see an error, double-check that your SCRAPEBADGER_API_KEY environment variable is set correctly.
Step 3: Storage and Deduplication with SQLite
Now that we can fetch tweets, we need a place to store them. SQLite is perfect for this use case — it's a single file on disk, requires no separate server, and is fully supported by Python's standard library. For a monitoring bot processing hundreds of tweets per day, SQLite will handle the load without any issues.
The key to deduplication is treating tweet_id as a PRIMARY KEY in our database table. When we try to insert a tweet that already exists, SQLite raises an IntegrityError, which we catch and ignore. This means we can safely re-run the bot without worrying about duplicates, even if the same tweets appear in multiple fetches.
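As a design note, SQLite also offers INSERT OR IGNORE, which skips duplicate primary keys without raising an exception at all. A minimal, self-contained demonstration against an in-memory database:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tweets (tweet_id TEXT PRIMARY KEY, text TEXT)")

# The second insert has the same primary key and is silently skipped.
con.execute("INSERT OR IGNORE INTO tweets VALUES ('1', 'first copy')")
con.execute("INSERT OR IGNORE INTO tweets VALUES ('1', 'duplicate')")

count = con.execute("SELECT COUNT(*) FROM tweets").fetchone()[0]
print(count)  # 1
```

We stick with catching IntegrityError in the bot because the exception path tells us, row by row, which tweets were genuinely new, and that list is what drives the alerts.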
# bot.py (continued)
import sqlite3
from pathlib import Path

SCRIPT_DIR = Path(__file__).parent.resolve()
DB_FILE = SCRIPT_DIR / "tweets.db"

def setup_database():
    """Creates the database and the tweets table if they don't exist."""
    con = sqlite3.connect(DB_FILE)
    cur = con.cursor()
    cur.execute("""
        CREATE TABLE IF NOT EXISTS tweets (
            tweet_id TEXT PRIMARY KEY,
            keyword TEXT NOT NULL,
            username TEXT,
            text TEXT,
            created_at TEXT,
            like_count INTEGER DEFAULT 0,
            retweet_count INTEGER DEFAULT 0,
            reply_count INTEGER DEFAULT 0
        )
    """)
    con.commit()
    con.close()
    logging.info("Database ready.")
def normalize_tweet(tweet: dict, keyword: str) -> dict:
    """
    Flattens the raw tweet payload into a stable, structured dictionary.
    Uses safe defaults for all fields to prevent crashes on missing data.
    """
    metrics = tweet.get("public_metrics") or {}
    user = tweet.get("user") or {}
    return {
        "tweet_id": str(tweet.get("id") or ""),
        "keyword": keyword,
        "username": str(user.get("username") or ""),
        "text": str(tweet.get("text") or ""),
        "created_at": str(tweet.get("created_at") or ""),
        "like_count": int(metrics.get("like_count") or 0),
        "retweet_count": int(metrics.get("retweet_count") or 0),
        "reply_count": int(metrics.get("reply_count") or 0),
    }
def save_tweets(raw_tweets: list[dict], keyword: str) -> list[dict]:
    """
    Saves tweets to the database. Returns only the newly inserted tweets
    (i.e., those that weren't already in the database).
    """
    new_tweets = []
    con = sqlite3.connect(DB_FILE)
    cur = con.cursor()
    for raw_tweet in raw_tweets:
        tweet = normalize_tweet(raw_tweet, keyword)
        if not tweet["tweet_id"]:
            continue  # Skip tweets with no ID
        try:
            cur.execute("""
                INSERT INTO tweets
                (tweet_id, keyword, username, text, created_at, like_count, retweet_count, reply_count)
                VALUES
                (:tweet_id, :keyword, :username, :text, :created_at, :like_count, :retweet_count, :reply_count)
            """, tweet)
            new_tweets.append(tweet)
        except sqlite3.IntegrityError:
            pass  # Tweet already exists; this is expected behavior
    con.commit()
    con.close()
    logging.info(f"Saved {len(new_tweets)} new tweets (skipped {len(raw_tweets) - len(new_tweets)} duplicates).")
    return new_tweets

Notice that save_tweets returns only the newly inserted tweets. This is important: it's what we'll use to trigger alerts. If the bot runs and finds no new tweets, no alerts are sent. If it finds 10 new tweets, 10 alerts go out.
We also added a keyword column to the database. This is useful if you later decide to monitor multiple keywords with the same bot — you can filter the database by keyword for analysis.
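For example, a quick per-keyword summary against the same schema (shown here with an in-memory database and sample rows so the snippet runs on its own):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE tweets (tweet_id TEXT PRIMARY KEY, keyword TEXT, like_count INTEGER)")
con.executemany(
    "INSERT INTO tweets VALUES (?, ?, ?)",
    [("1", "producthunt", 3), ("2", "producthunt", 0), ("3", "competitor", 7)],
)

# Mention count and total likes per monitored keyword.
rows = con.execute(
    "SELECT keyword, COUNT(*), SUM(like_count) FROM tweets "
    "GROUP BY keyword ORDER BY keyword"
).fetchall()
print(rows)  # [('competitor', 1, 7), ('producthunt', 2, 3)]
```

Against the real tweets.db file, the same query gives you a running scoreboard of which keywords generate the most conversation.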
Step 4: Alerting on New Mentions with Slack
Storing tweets is useful for retrospective analysis, but for real-time monitoring, we need alerts. Let's add a function to send a formatted message to Slack for each new tweet.
We'll use Slack's Block Kit format, which produces a much more readable notification than a plain text message.
# bot.py (continued)
import requests

def send_slack_alert(tweet: dict):
    """
    Sends a formatted Slack notification for a new tweet.
    Uses Block Kit for a clean, readable layout.
    """
    webhook_url = os.getenv("SLACK_WEBHOOK_URL")
    if not webhook_url:
        logging.warning("SLACK_WEBHOOK_URL not set; skipping alert.")
        return

    tweet_url = (
        f"https://twitter.com/{tweet['username']}/status/{tweet['tweet_id']}"
    )
    payload = {
        "blocks": [
            {
                "type": "section",
                "text": {
                    "type": "mrkdwn",
                    "text": (
                        f":bird: *New mention of `{tweet['keyword']}`*\n"
                        f"*@{tweet['username']}* wrote:\n"
                        f">{tweet['text']}"
                    )
                }
            },
            {
                "type": "context",
                "elements": [
                    {
                        "type": "mrkdwn",
                        "text": (
                            f":heart: {tweet['like_count']} "
                            f":repeat: {tweet['retweet_count']} "
                            f":speech_balloon: {tweet['reply_count']} "
                            f"| <{tweet_url}|View on Twitter>"
                        )
                    }
                ]
            },
            {"type": "divider"}
        ]
    }

    try:
        response = requests.post(webhook_url, json=payload, timeout=10)
        response.raise_for_status()
    except requests.exceptions.RequestException as e:
        logging.error(f"Failed to send Slack alert: {e}")

The timeout=10 parameter is important. Without it, a slow Slack API response could cause your bot to hang indefinitely, blocking the entire run.
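A timeout alone doesn't retry transient failures. If you want automatic retries on 429 and 5xx responses, one option is urllib3's Retry class mounted on a requests Session. A hedged sketch (the specific retry counts and status codes here are reasonable defaults, not requirements):

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def build_session() -> requests.Session:
    """Session that retries transient HTTP failures with exponential backoff."""
    session = requests.Session()
    retry = Retry(
        total=3,                                # up to 3 retries per request
        backoff_factor=1,                       # exponential backoff between attempts
        status_forcelist=[429, 500, 502, 503],  # retry these status codes
        allowed_methods=["GET", "POST"],        # POST is not retried by default
    )
    session.mount("https://", HTTPAdapter(max_retries=retry))
    return session
```

You would then replace requests.post(...) in send_slack_alert with session.post(webhook_url, json=payload, timeout=10) using a session built once at startup.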
Step 5: Wiring It All Together
Now let's write the main() function that orchestrates the four components, and add command-line argument handling so the bot is flexible.
# bot.py (continued)
import sys

async def main(keyword: str):
    logging.info(f"=== Starting monitoring run for keyword: '{keyword}' ===")

    # 1. Set up the database (idempotent; safe to call on every run)
    setup_database()

    # 2. Fetch tweets from Twitter via ScrapeBadger
    raw_tweets = await fetch_tweets(keyword, limit=100)
    if not raw_tweets:
        logging.info("No tweets fetched. Exiting.")
        return

    # 3. Save to database, get back only the new ones
    new_tweets = save_tweets(raw_tweets, keyword)

    # 4. Alert on new tweets
    if new_tweets:
        logging.info(f"Sending {len(new_tweets)} Slack alerts...")
        for tweet in new_tweets:
            send_slack_alert(tweet)
    else:
        logging.info("No new tweets found since last run.")

    logging.info("=== Run complete ===")

# This entry point replaces the temporary test block from Step 2.
if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python3 bot.py <keyword>")
        print("Example: python3 bot.py 'your-product-name'")
        sys.exit(1)
    asyncio.run(main(sys.argv[1]))

Run the bot manually to test the full pipeline:

python3 bot.py "your-product-name"

On the first run, you should see new tweets being saved and Slack alerts firing. On the second run with the same keyword, you should see 0 new tweets: the deduplication is working.
Step 6: Scheduling with Cron
The bot works, but it only runs when we execute it manually. The final step is to automate it using cron, the standard job scheduler on Linux and macOS.
Open your crontab file:
crontab -e

Add a line to run the bot every 15 minutes. Use absolute paths for both the Python executable and the script; cron runs in a minimal environment and doesn't know about your current working directory or PATH.
# Run the Twitter monitoring bot every 15 minutes
*/15 * * * * /home/youruser/twitter-monitoring-bot/.venv/bin/python /home/youruser/twitter-monitoring-bot/bot.py "your-product-name" >> /home/youruser/twitter-monitoring-bot/bot.log 2>&1

The >> bot.log 2>&1 part redirects both standard output and error output to a log file, so you can inspect what happened on each run. This is essential for debugging issues with scheduled jobs.
To monitor multiple keywords, add a separate cron entry for each one:
*/15 * * * * /home/youruser/.../python /home/youruser/.../bot.py "your-product-name" >> bot.log 2>&1
*/15 * * * * /home/youruser/.../python /home/youruser/.../bot.py "competitor-name" >> bot.log 2>&1
*/30 * * * * /home/youruser/.../python /home/youruser/.../bot.py "your-market-keyword" >> bot.log 2>&1

Notice that the market research keyword runs every 30 minutes instead of every 15. Frequency should match urgency; brand mentions warrant more frequent checks than broad market keywords.
Real-World Use Cases
The same bot architecture serves several different goals depending on what keywords you're tracking.
Brand monitoring is the most obvious use case. Track your product name, your Twitter handle, and common misspellings. A single influential tweet can drive a meaningful spike in signups — or surface a bug you didn't know existed. Getting that alert within 15 minutes instead of three days is the difference between a good customer experience and a missed opportunity.
Competitor tracking means monitoring your competitors' product names and keywords associated with their positioning. When a competitor announces a new feature or receives negative press, you want to know before your customers bring it up in a support ticket. Over time, the database you're building becomes a valuable archive of competitive intelligence.
Lead discovery is an underused application. People frequently tweet about problems they're trying to solve — "anyone know a good tool for X?" or "frustrated with Y, looking for alternatives." Monitoring for these intent signals and engaging authentically is a legitimate acquisition channel for early-stage products. The bot ensures you never miss the signal; human judgment determines how to respond.
Product feedback and market research involves tracking broader topic keywords to understand how your target audience talks about a problem space. What language do they use? What frustrations come up repeatedly? This qualitative data is invaluable for positioning and copywriting decisions, and it's hard to get any other way.
Limitations and Operational Considerations
A few things worth being honest about before you ship this to production.
Noise filtering is a real problem. A keyword like "AI" or "automation" will return enormous volumes of irrelevant content. You'll want to add filters to the save_tweets function — for example, skipping retweets, ignoring tweets with zero engagement, or filtering by language. The goal is a signal-to-noise ratio you can actually act on.
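As a starting point, here is one possible filter. The field names match the normalized schema used earlier, except for lang, which is an assumed field your payload may or may not include; the thresholds are arbitrary and worth tuning against your own data:

```python
def is_relevant(tweet: dict) -> bool:
    """Cheap noise filter, applied before saving or alerting."""
    text = tweet.get("text", "")
    if text.startswith("RT @"):
        return False  # skip retweets
    if tweet.get("lang") and tweet["lang"] != "en":
        return False  # keep English only (assumes a `lang` field is present)
    engagement = tweet.get("like_count", 0) + tweet.get("retweet_count", 0)
    if len(text) < 10 and engagement == 0:
        return False  # very short, zero-engagement tweets are rarely actionable
    return True
```

You would call this inside save_tweets, skipping any tweet where is_relevant(...) is False before the INSERT.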
Scheduling frequency has trade-offs. Running every 15 minutes means you get fast alerts, but it also means more API calls and more database writes. For most monitoring use cases, every 15–30 minutes is a reasonable balance. Running more frequently than you need wastes credits and creates noise.
The bot can fail silently. If the ScrapeBadger API is temporarily unavailable, or if your cron job fails to start, you won't know unless you're monitoring the log file. For a critical monitoring system, consider adding a health check service like Healthchecks.io. You'd send a ping at the end of a successful run, and if the ping doesn't arrive on schedule, you get an alert that your bot is down.
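The end-of-run ping is only a few lines. In this sketch, HEALTHCHECK_URL is a hypothetical environment variable holding the ping URL your health check service gives you:

```python
import logging
import os

import requests

def ping_healthcheck() -> None:
    """Signal a successful run; a missing ping triggers an external alert."""
    url = os.getenv("HEALTHCHECK_URL")  # hypothetical env var for your ping URL
    if not url:
        return  # health checks not configured; nothing to do
    try:
        requests.get(url, timeout=5)
    except requests.RequestException as e:
        # A failed ping should never crash the bot itself.
        logging.warning(f"Health check ping failed: {e}")
```

Call it as the last line of main(): if any earlier step crashes, the ping never fires and the service alerts you that the bot is down.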
SQLite has limits. For a single-user monitoring bot, SQLite is more than sufficient. If you're processing tens of thousands of tweets per day or need concurrent access from multiple processes, consider migrating to PostgreSQL.
Why ScrapeBadger for the Data Layer
The hardest part of building a Twitter monitoring bot isn't the logic — it's getting reliable data into the pipeline in the first place. This is the problem that ScrapeBadger is specifically designed to solve.
The official Twitter API has become prohibitively expensive for most small teams, with the Basic tier starting at $100–$200 per month and strict rate limits that are easy to hit when running a multi-keyword monitoring pipeline. DIY scraping with headless browsers is cheaper but fragile — Twitter's anti-bot measures are aggressive, and maintaining a scraper is a part-time job in itself.
ScrapeBadger sits in between: a structured REST API that returns clean, predictable JSON at $0.10 per 1,000 items with no rate limits. The Python SDK handles all the complexity of pagination, so the data collection layer in our bot is just a few lines of code. When Twitter changes its internal page structure — which happens regularly — ScrapeBadger's team handles the update. Your bot keeps running.
For a monitoring bot fetching up to 100 tweets every 15 minutes across three keywords, you're looking at roughly 28,800 items per day (96 runs per keyword per day, times 100 tweets, times 3 keywords), which costs about $2.88 per day at $0.10 per 1,000 items. That's a reasonable price for a system that never misses a mention.
Conclusion
You now have a complete, production-style Twitter monitoring bot. It runs automatically, collects data reliably, stores it efficiently with built-in deduplication, and alerts you in real time when something new appears. The architecture is simple enough to understand in a single sitting but robust enough to run unattended for weeks.
More importantly, this bot represents a shift in how you interact with Twitter. Instead of reactively searching for mentions when you remember to, you have a system that proactively surfaces the conversations that matter to you. Brand mentions, competitor news, potential leads, product feedback — all of it lands in your Slack channel automatically.
The next natural extensions are adding sentiment analysis to surface the most urgent alerts first, building a simple dashboard on top of the SQLite database to visualize trends over time, or expanding the keyword list to cover your entire competitive landscape. The foundation is in place. Build from here.

Written by
Thomas Shultz
Thomas Shultz is the Head of Data at ScrapeBadger, working on public web data, scraping infrastructure, and data reliability. He writes about real-world scraping, data pipelines, and turning unstructured web data into usable signals.