-
Bot Attack Surface Area

Description
1) Test Approach:
- Choose tools to measure impact of Bots
- Choose tools to induce Bot like stress
- Form a baseline of resource usage under bot attack, before applying any bot guardrails
- After each layer is added, verify that it holds under the same load, watching the sztab-backend and sztabina pods specifically during the bot stress tests
2) Identify tools to measure impact of Bot (CPU usage or I/O usage)
- Grafana + Prometheus — we already have this or it's easy to add to the cluster via Helm.
- Gives us CPU, memory, and network I/O per pod.
3) Identify tools to induce Bot-like stress
A) k6 — open source load testing tool
We can write scripts in TypeScript and simulate concurrent anonymous/bot traffic against specific endpoints.
Example:
```typescript
import http, { RefinedResponse, ResponseType } from 'k6/http';
import { check } from 'k6';

export default function (): void {
  const res: RefinedResponse<ResponseType> = http.get(
    'https://staging.sztab.com/api/projects/1/pulls/5/diff',
    { headers: { 'User-Agent': 'GPTBot/1.0' } }
  );
  check(res, {
    'status is 200': (r) => r.status === 200,
  });
}
```

B) Java with Gatling
This is essentially the Java equivalent of k6. Since the broader Tigase team is Java-first, Gatling scripts would feel more natural to them and fit into Maven builds. Shall I use this option? That way the bot simulation scripts can be reused for other Tigase projects.
Kotlin developers can use Gatling in Kotlin; Java developers can use Gatling in Java
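For illustration, a Gatling simulation equivalent to the k6 script above might look like this (a sketch only; the class name and endpoint are assumptions, and it assumes the Gatling Java DSL on the classpath):

```java
import static io.gatling.javaapi.core.CoreDsl.*;
import static io.gatling.javaapi.http.HttpDsl.*;

import io.gatling.javaapi.core.ScenarioBuilder;
import io.gatling.javaapi.core.Simulation;
import io.gatling.javaapi.http.HttpProtocolBuilder;

// Hypothetical Gatling counterpart of the k6 script: 50 concurrent
// "bot" users hitting the diff endpoint with a GPTBot user agent.
public class BotStressSimulation extends Simulation {

    HttpProtocolBuilder httpProtocol = http
            .baseUrl("https://staging.sztab.com")
            .userAgentHeader("GPTBot/1.0");

    ScenarioBuilder bots = scenario("Anonymous bot traffic")
            .exec(http("PR diff")
                    .get("/api/projects/1/pulls/5/diff")
                    .check(status().is(200)));

    public BotStressSimulation() {
        setUp(bots.injectOpen(atOnceUsers(50))).protocols(httpProtocol);
    }
}
```

This would drop into a standard Maven module with the gatling-maven-plugin, which is the reuse path for other Tigase projects.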
C) JMeter
JMeter test plans (JMX scripts) can serve a dual purpose:
- Bot simulation
- Stress testing
However, k6 is frictionless and works out of the box.
4) Layered approach to Bot mitigation
a) Layer 1: Spring Security — anonymous request blocking (lowest effort, highest impact)
b) Layer 2: Caddy — rate limiting + bot filtering at the edge (before Spring even sees the request)
c) Layer 3: robots.txt (soft signal, respected by well-behaved bots)
d) Layer 4: Permission-based access (Artur's suggestion — most flexible)
4.1 Layer 1
The simplest measure is to identify the most expensive APIs and mandate authentication for the shortlisted ones. With Spring this is easy: in the Spring Security policy, add `.authenticated()` for such endpoints. APIs that trigger `git clone` and `git merge` are candidates.
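A minimal sketch of what that policy could look like (assuming the Spring Security 6 DSL; the endpoint patterns are illustrative, not the final shortlist):

```java
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.security.config.annotation.web.builders.HttpSecurity;
import org.springframework.security.web.SecurityFilterChain;

@Configuration
public class BotGuardrailConfig {

    @Bean
    SecurityFilterChain filterChain(HttpSecurity http) throws Exception {
        http.authorizeHttpRequests(auth -> auth
                // Expensive endpoints (git clone/merge, diffs) require login.
                // Paths here are placeholders; the real list comes from profiling.
                .requestMatchers("/api/projects/*/pulls/*/diff",
                                 "/api/projects/*/files/**")
                .authenticated()
                // Everything else keeps its existing policy.
                .anyRequest().permitAll());
        return http.build();
    }
}
```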
4.2 Layer 2
Since Caddy is already our reverse proxy with `forward_auth`, we can add:

```
# Rate limiting for anonymous traffic
@anonymous {
	not header Authorization *
	not header Cookie *
}
rate_limit @anonymous 10r/m

# Block known bot user agents
@bots header_regexp User-Agent `(?i)(GPTBot|ClaudeBot|CCBot|Bytespider|SemrushBot|AhrefsBot)`
respond @bots 403
```

This stops bots before they consume Spring Boot or Sztabina resources at all.
4.3 Layer 3 — robots.txt
Serve a `robots.txt` from Caddy directly, blocking AI crawlers:

```
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Disallow: /api/
Allow: /
```

This is a soft signal, respected only by well-behaved crawlers.
4.4 Layer 4 — Permission-based access
This is the existing ExternalUserPolicy / role system extended with a new dimension.
Instead of just authenticated vs anonymous, we gate by role.
Example:

```java
@PreAuthorize("hasPermission(#projectId, 'Project', 'READ_DIFFS')")
public DiffResponse getPullRequestDiff(...) { ... }
```

Roles like GUEST / COMMUNITY could be explicitly excluded from diff/search endpoints even if authenticated.
This is useful if we ever allow public read-only accounts but still want to protect expensive resources.
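The hasPermission check would be backed by a custom PermissionEvaluator. A rough sketch of the role-gating idea (class, role, and permission names are assumptions for illustration, not the actual ExternalUserPolicy code):

```java
import java.io.Serializable;
import java.util.Set;

import org.springframework.security.access.PermissionEvaluator;
import org.springframework.security.core.Authentication;

// Hypothetical evaluator: GUEST/COMMUNITY roles are denied expensive
// read permissions (e.g. READ_DIFFS) even when authenticated.
public class ProjectPermissionEvaluator implements PermissionEvaluator {

    private static final Set<String> EXCLUDED_ROLES =
            Set.of("ROLE_GUEST", "ROLE_COMMUNITY");
    private static final Set<String> EXPENSIVE_PERMISSIONS =
            Set.of("READ_DIFFS", "SEARCH");

    @Override
    public boolean hasPermission(Authentication auth, Serializable targetId,
                                 String targetType, Object permission) {
        boolean excluded = auth.getAuthorities().stream()
                .anyMatch(a -> EXCLUDED_ROLES.contains(a.getAuthority()));
        if (excluded && EXPENSIVE_PERMISSIONS.contains(String.valueOf(permission))) {
            return false;
        }
        // Fall through to the normal project-level permission check here.
        return true;
    }

    @Override
    public boolean hasPermission(Authentication auth, Object target, Object permission) {
        return hasPermission(auth, null,
                target == null ? null : target.getClass().getSimpleName(), permission);
    }
}
```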
4.5 Layer 5 (Host Layer) — Using Host IDS (such as OSSEC)
OSSEC / Wazuh (OSSEC's modern fork) can help — it does log analysis, anomaly detection, and can trigger active responses (e.g. auto-banning an IP via iptables). But I think for now this may be overkill in Sztab's context.
-
I have assumed that bots/crawlers cause performance issues only by exhausting resources.
But bots can also attempt privilege escalation, so this issue is in part about security posture as well.
Data harvesting is another risk: a crawler indexing all the issues, PRs, comments, and code is a confidentiality problem for private projects even if access is read-only, and can be used for competitor intelligence gathering.
Please let me know if we should treat this as a performance issue alone in this rev.
-
Monitoring tool
Phase 1 (immediate) — kubectl top for CPU/memory across the three pods during stress tests. Free, zero setup, good enough to establish baseline.
Phase 2 (proper) — add node_exporter to the EC2 node for disk I/O, feed into Grafana alongside Caddy metrics. Full picture.
-
SZ-73 Bot Protection — Baseline Measurements
Purpose
Establish pre-mitigation resource usage baseline on staging, before any bot protection layers are applied. These numbers will be used to validate the effectiveness of each mitigation layer as it is implemented.
Environment
- Cluster: k3s on AWS EC2 (us-west-2)
- Host: ec2-35-87-145-56.us-west-2.compute.amazonaws.com
- Namespace: sztab-staging
- Image tag: sz73-bot-protection (rebased on wolnosc, no SZ-73 changes applied yet)
- Date: 2026-03-12
Idle Baseline (no load)
Captured via `kubectl top pods -n sztab-staging` with no active traffic.

| Pod | CPU (cores) | Memory |
|---|---|---|
| sztab-backend | 5m | 369Mi |
| sztab-db | 4m | 46Mi |
| sztabina | 1m | 1Mi |
| caddy | 1m | 10Mi |
| sztab-ui | 1m | 2Mi |

Notes:
- sztab-backend memory at 369Mi reflects the normal Spring Boot JVM baseline (expected)
- sztabina and caddy are effectively idle
- sztab-db at 4m CPU reflects background PostgreSQL activity only
Bot Stress Baseline (under simulated load)
TODO: Run k6 stress test simulating anonymous bot traffic against expensive endpoints. Capture CPU and memory spike for sztab-backend, sztabina, and sztab-db.
Target Endpoints
| Endpoint | Why Expensive |
|---|---|
| GET /api/projects/{id}/pulls/{id}/diff | Triggers git diff via Sztabina |
| GET /api/projects/{id}/issues?q=... | DSL query, DB-heavy |
| GET /api/projects/{id}/files/{branch} | Git tree traversal via Sztabina |

k6 Test Parameters
- Virtual users: TBD
- Duration: TBD
- User-Agent: `GPTBot/1.0` (simulates AI crawler)
- Auth: none (anonymous)
Results
TODO: Fill in after k6 run.
| Pod | CPU (cores) | Memory | Delta vs Idle |
|---|---|---|---|
| sztab-backend | - | - | - |
| sztab-db | - | - | - |
| sztabina | - | - | - |

Post-Mitigation Measurements
TODO: Re-run same k6 test after each layer is applied and record results here.
| Layer | Description | Backend CPU | Sztabina CPU | Notes |
|---|---|---|---|---|
| Layer 1 | Spring Security `.authenticated()` | - | - | - |
| Layer 2 | Caddy rate limiting + bot UA blocking | - | - | - |
| Layer 3 | robots.txt | - | - | soft signal only |
| Layer 4 | Permission-based access (role gating) | - | - | - |

-
Next step: install k6 on my laptop:

```
rksuma@Ramakrishnans-MacBook-Pro sztab % brew install k6
//...
rksuma@Ramakrishnans-MacBook-Pro sztab % k6 version
k6 v1.6.1 (commit/devel, go1.26.0, darwin/arm64)
rksuma@Ramakrishnans-MacBook-Pro sztab %
```

Now I'll write a Python or TypeScript script targeting the three expensive endpoints with a GPTBot user agent, no auth, and enough virtual users to actually stress the backend.
-
-
Results of Layer 1 testing after locking down all expensive methods with `.authenticated()` (please disregard the spurious error at the end when deleting the test project).
Essentially, since the bot does not authenticate itself, every hit returns HTTP 403, and hence the bot traffic has no meaningful impact on Sztab's resource usage.
```
rksuma@Ramakrishnans-MacBook-Pro sztab % ADMIN_USER=admin ADMIN_PASSWORD=SztabStagingAdmin! ./scripts/stress-test/k6/run-stress-test.sh
[INFO] === SZ-73 Bot Stress Test ===
[INFO] Base URL: http://ec2-35-87-145-56.us-west-2.compute.amazonaws.com
[INFO] Namespace: sztab-staging
[INFO] VUs: 50
[INFO] Duration: 60s
[INFO] --- Step 1: Login ---
[INFO] Login successful.
[INFO] Logged in as user id=1
[INFO] --- Step 2: Create Sztab project ---
[INFO] Project 'SZ73 Stress Test' already exists — looking up existing project...
[INFO] Found existing project: id=16
[INFO] --- Step 3: Create issue ---
[INFO] Issue created: id=3
[INFO] --- Step 4: Create pull request ---
[INFO] Pull request created: id=3
[INFO] --- Step 5: Baseline pod metrics (idle) ---
NAME                            CPU(cores)   MEMORY(bytes)
caddy-847774bbf9-xzvnv          1m           12Mi
sztab-backend-644c77d58-r46xd   2m           432Mi
sztab-db-fb967c9d5-fs84w        2m           44Mi
sztab-ui-57764ffc4f-r9hlg       1m           3Mi
sztabina-65b5cff756-kzl4f       1m           3Mi
[INFO] --- Step 6: Run k6 stress test ---
[INFO] Watch pod metrics in another terminal: kubectl top pods -n sztab-staging --watch

         /\      Grafana   /‾‾/
    /\  /  \     |\  __   /  /
   /  \/    \    | |/ /  /   ‾‾\
  /          \   |   (  |  (‾)  |
 / __________ \  |_|\_\  \_____/

     execution: local
        script: /Users/rksuma/tigase/sztab/scripts/stress-test/k6/bot-stress-test.ts
        output: -

     scenarios: (100.00%) 1 scenario, 50 max VUs, 1m30s max duration (incl. graceful stop):
              * default: 50 looping VUs for 1m0s (gracefulStop: 30s)

  █ THRESHOLDS
    http_req_duration
    ✓ 'p(95)<5000' p(95)=134.53ms

  █ TOTAL RESULTS
    checks_total.......: 69856   1161.339279/s
    checks_succeeded...: 25.00%  17464 out of 69856
    checks_failed......: 75.00%  52392 out of 69856
    ✗ status is 200 (unprotected)
      ↳ 0% — ✓ 0 / ✗ 17464
    ✗ status is 401 (auth required)
      ↳ 0% — ✓ 0 / ✗ 17464
    ✓ status is 403 (bot blocked)
    ✗ status is 429 (rate limited)
      ↳ 0% — ✓ 0 / ✗ 17464

    HTTP
    http_req_duration....: avg=71.19ms min=29.65ms med=55.02ms max=422.05ms p(90)=124.52ms p(95)=134.53ms
    http_req_failed......: 100.00% 17464 out of 17464
    http_reqs............: 17464 290.33482/s

    EXECUTION
    iteration_duration...: avg=172.15ms min=130.26ms med=155.68ms max=522.51ms p(90)=225.17ms p(95)=235.3ms
    iterations...........: 17464 290.33482/s
    vus..................: 50 min=50 max=50
    vus_max..............: 50 min=50 max=50

    NETWORK
    data_received........: 7.8 MB 129 kB/s
    data_sent............: 2.3 MB 38 kB/s

running (1m00.2s), 00/50 VUs, 17464 complete and 0 interrupted iterations
default ✓ [======================================] 50 VUs  1m0s

[INFO] --- Step 7: Pod metrics (post-stress) ---
NAME                            CPU(cores)   MEMORY(bytes)
caddy-847774bbf9-xzvnv          99m          20Mi
sztab-backend-644c77d58-r46xd   252m         440Mi
sztab-db-fb967c9d5-fs84w        2m           45Mi
sztab-ui-57764ffc4f-r9hlg       1m           3Mi
sztabina-65b5cff756-kzl4f       1m           4Mi
[INFO] === Stress test complete. Teardown will run now. ===
[INFO] --- Teardown ---
[INFO] Deleting Sztab project 16...
[ERROR] Failed to delete project 16
[INFO] Teardown complete.
rksuma@Ramakrishnans-MacBook-Pro sztab %
```

-
Baseline stress test results (pre-protection, 2026-03-14)
Ran the k6 stress test against staging (`ec2-35-87-145-56.us-west-2.compute.amazonaws.com`) with 50 VUs for 60s: 30 unauthenticated (anonymous bot simulation) and 20 authenticated (bot with DEVELOPER role, hitting issues/PR/branch endpoints).

Throughput: 279 req/s
Pod metrics (idle → under load)
| Pod | CPU idle | CPU load | Memory idle | Memory load |
|---|---|---|---|---|
| sztab-backend | 2m | 370m | 443Mi | 544Mi |
| sztab-db | 4m | 137m | 46Mi | 77Mi |
| caddy | 1m | 117m | 23Mi | 23Mi |
| sztabina | 1m | 1m | 2Mi | 2Mi |

Observations
- Unauthenticated requests: 100% returning 403 -- Layer 1 (Spring Security) blocking all anonymous traffic correctly.
- Authenticated requests: 100% returning 200 -- DEVELOPER role has correct read access.
- Backend CPU peaks at 370m under load -- this is the baseline to beat after Caddy rate limiting is applied.
- DB CPU peaks at 137m -- issue/PR list queries are the likely driver.
- Sztabina unaffected -- git ops not triggered by read-only REST traffic.
Known limitations
- Authenticated scenario uses a single shared session cookie across all 20 VUs. Real bot farms distribute load across multiple accounts/sessions. A more realistic simulation would create 5-10 bot accounts and distribute cookies among VUs -- deferred to a later iteration.
Next steps
Implement Layer 2 (Caddy rate limiting) and re-run to measure impact.
-
Layer 2: Caddy-level rate limiting and bot blocking
Rejection is now pushed upstream to the reverse proxy, before requests ever reach the JVM. I added two defenses to the Caddyfile:
- UA blocklist -- known AI crawlers that identify themselves honestly (GPTBot, ClaudeBot, CCBot, Bytespider, SemrushBot, AhrefsBot) are rejected with 403 at the proxy edge. Note that this check is easily sidestepped: adversarial scrapers that spoof their User-Agent bypass it, which is why rate limiting is the primary defense.
- Anonymous rate limiting -- unauthenticated traffic is capped at 30 requests/min per IP. Authenticated users (identified by session cookie or API token) are exempt. At 30 r/m, a human browsing casually has ample headroom; a bot hammering endpoints hits the ceiling immediately.
To support this, I built a custom Caddy image with the rate-limiting plugin baked in, pinned to `v2.8.4` for reproducibility. The next stress test run will measure how much backend CPU drops as a result.
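For the record, such a custom image typically follows the standard xcaddy builder pattern; a sketch (the exact plugin module path is an assumption -- I used the mholt rate-limit plugin here, which may differ from what we actually built with):

```dockerfile
# Build Caddy v2.8.4 with the rate-limit plugin compiled in (hypothetical module path)
FROM caddy:2.8.4-builder AS builder
RUN xcaddy build --with github.com/mholt/caddy-ratelimit

# Ship only the resulting binary on the pinned base image
FROM caddy:2.8.4
COPY --from=builder /usr/bin/caddy /usr/bin/caddy
```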
-
Layer 2 stress test results (Caddy rate limiting, 2026-03-14)
Setup
Same test as baseline: 50 VUs for 60s, 30 unauthenticated and 20 authenticated (DEVELOPER role). Rate limiting applied to anonymous traffic only (30 r/min per IP).
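To make the 30 r/min-per-IP semantics concrete, here is a self-contained sliding-window limiter sketch in Java (illustrative only; Caddy's plugin is written in Go and its eviction details may differ):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Sliding-window rate limiter: at most `limit` requests per IP within
// any trailing window of `windowMillis` milliseconds.
class AnonRateLimiter {
    private final int limit;
    private final long windowMillis;
    private final Map<String, Deque<Long>> hits = new HashMap<>();

    AnonRateLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean allow(String ip, long nowMillis) {
        Deque<Long> timestamps = hits.computeIfAbsent(ip, k -> new ArrayDeque<>());
        // Evict timestamps that have fallen out of the trailing window.
        while (!timestamps.isEmpty() && nowMillis - timestamps.peekFirst() >= windowMillis) {
            timestamps.pollFirst();
        }
        if (timestamps.size() >= limit) {
            return false; // over the cap -- the proxy would answer 429 here
        }
        timestamps.addLast(nowMillis);
        return true;
    }
}
```

With limit=30 and a 60s window, request 31 from the same IP inside a minute is rejected, while other IPs and later windows are unaffected, matching the behaviour the test asserts via the 429 check.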
Pod metrics (idle => under load)
Pod CPU idle CPU load Memory idle Memory load sztab-backend 2m 174m 443Mi 542Mi sztab-db 4m 147m 46Mi 77Mi caddy 1m 102m 12Mi 17Mi sztabina 1m 1m 2Mi 2Mi Comparison vs baseline (Layer 1 only)
Pod Layer 1 Layer 2 Change sztab-backend 370m 174m -53% sztab-db 137m 147m ~flat (noise) caddy 117m 102m -13% Observations
- Backend CPU dropped by 53% -- anonymous bot traffic is now absorbed by Caddy before requests reach the JVM. The JVM no longer wakes up, allocates objects, or runs the filter chain for unauthenticated requests that exceed the rate limit.
- DB CPU is flat -- authenticated queries still run as expected. The reduction in backend CPU is entirely from eliminating the unauthenticated filter chain overhead.
- Caddy CPU is slightly lower too -- the rate limit decision short-circuits before the upstream proxy step, so Caddy does less work per rejected request than it did forwarding 403s from the backend.
- Memory is stable across both scenarios -- no sign of heap pressure or GC storms under load.
Next steps
Layer 3 (robots.txt) and Layer 4 (permission-based access gating) to follow.
| Type | New Feature |
| Priority | Normal |
| Assignee | |
| Version | none |
| Sprints | n/a |
| Customer | n/a |
The main problem we faced was our servers being overloaded by AI bots and crawlers. The most sensible solution seems to be hiding resource-heavy operations from anonymous or guest access. I suggested making these operations accessible based on user permissions. This would give us the most flexibility.