My Work at Share.xyz: Building the Prediction-Markets Discovery Layer

Table of contents

what share is
performance
reliability
architecture
product
stack

what share is

Share is Instagram for onchain assets. as Polymarket’s official partner, it’s the face of Polymarket onchain on mobile.

i own the discovery layer of that: the read path, rendering, search, explore, leaderboards, positions and notifications behind every sports-market screen in the app. it runs across two services i live in every day, a Go gateway and a Python service, talking to Postgres, Redis and external markets APIs.

owning a layer means i’m on the hook for what people feel, not just the code i merge. so i’ve organized this by what each piece actually moved. everything here is shipped and in production.

performance

this is my favorite part. when something is slow i don’t want a theory, i want to see it. so most of this is the same loop: measure, find where requests serialize or redo dumb work, remove it, prove it’s gone.

prediction tags: >10s to ~11ms on a warm cache (~900x), and the upstream 429s gone. the cache was plain TTL, so every expiry was a hard miss that re-ran the whole multi-call fetch on whichever request was unlucky. i moved it to stale-while-revalidate: a soft TTL for freshness, a hard TTL for validity. stale reads return instantly and kick off one background refresh, guarded across replicas by a Redis SET NX EX lock so nothing stampedes on expiry.
explore / series / odds-chart: ~14s / 7s / 9s to sub-second. the killer was N+1 database queries for team colors on every row, plus a per-row DB hit building chart titles. batched the color lookups, served titles from cached team data.
leaderboard: 10-16s to ~4.2s median (up to ~4x). two N+1s, one per service, both found in traces. the Go side did one Redis GET per wallet, 50 round-trips for a 50-row board, so i made it one MGET. the Python side resolved proxy→EOA wallets one Postgres query at a time, so i made it one batch query. processing dropped to a few hundred ms. what’s left is an external API, not us.
username search: timeout to sub-second. a B-tree index, after actually benchmarking B-tree vs GIN + pg_trgm for the query shape instead of guessing.
found why FastAPI throughput fell apart under load. concurrent requests were serializing on the single asyncio event loop, because blocking HTTP and CPU-bound transforms were running right on it. at 100 concurrent searches the endpoint ran 8.3x slower than its own single-request time. the rule i set: I/O waits go async on the loop (httpx, HTTP/2, a wide pool), CPU work gets pushed to a thread pool. then i moved 40+ client methods to async and got the transforms off the loop. scaling went back to roughly linear.
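
a minimal sketch of the stale-while-revalidate idea from the prediction-tags item above, not the production code. it assumes a single process: the `SWRCache` class, its `_try_lock` helper and the 30-second lock TTL are all hypothetical, and `_try_lock` is only an in-memory stand-in for the Redis `SET ... NX EX` lock that would coordinate replicas in a real deployment.

```python
import threading
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Entry:
    value: Any
    fresh_until: float  # soft TTL: past this, serve stale and refresh in the background
    valid_until: float  # hard TTL: past this, the value must not be served


class SWRCache:
    """Illustrative in-memory cache; names and defaults are assumptions."""

    def __init__(self, soft_ttl: float, hard_ttl: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.soft_ttl = soft_ttl
        self.hard_ttl = hard_ttl
        self.clock = clock
        self._entries: Dict[str, Entry] = {}
        self._locks: Dict[str, float] = {}  # key -> lock expiry time
        self._mu = threading.Lock()
        self.refresh_threads: List[threading.Thread] = []

    def _try_lock(self, key: str, ttl: float) -> bool:
        # stand-in for Redis `SET lock:<key> <token> NX EX <ttl>`:
        # succeeds only if no unexpired lock exists for the key.
        with self._mu:
            now = self.clock()
            if self._locks.get(key, float("-inf")) > now:
                return False
            self._locks[key] = now + ttl
            return True

    def _store(self, key: str, value: Any) -> None:
        now = self.clock()
        with self._mu:
            self._entries[key] = Entry(value, now + self.soft_ttl, now + self.hard_ttl)

    def _refresh(self, key: str, fetch: Callable[[], Any]) -> None:
        try:
            self._store(key, fetch())
        finally:
            with self._mu:
                self._locks.pop(key, None)  # release; the TTL covers a crashed refresher

    def get(self, key: str, fetch: Callable[[], Any]) -> Any:
        now = self.clock()
        with self._mu:
            entry = self._entries.get(key)
        if entry is None or now >= entry.valid_until:
            value = fetch()  # hard miss: this caller pays for the fetch
            self._store(key, value)
            return value
        if now >= entry.fresh_until and self._try_lock(key, ttl=30.0):
            # stale but still valid: exactly one caller starts a background refresh
            t = threading.Thread(target=self._refresh, args=(key, fetch), daemon=True)
            t.start()
            self.refresh_threads.append(t)
        return entry.value  # fresh or stale, returned without waiting
```

the point of the two TTLs is that only a read past the hard TTL ever waits on the upstream fetch. anything between soft and hard gets the old value right away, and the lock keeps concurrent stale readers from all refreshing at once.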

reliability

money-adjacent code has to be correct. these are the bugs that taught me to respect the details.

a 7ms idempotency race in prod. duplicate posts and comments showing up milliseconds apart, the classic Redis GET-then-SET. replaced it with an atomic SETNX and a rollback. window closed.
87% of closed positions were silently dropped (48 → 182, 3.8x). two bugs across 365 positions i went through by hand: pagination stopped early because it checked the requested limit instead of the API’s hard 50-per-page cap, and the markets fetch never passed closed=true (it defaults to false), so every settled market vanished. the same fix cut market_not_found skips by 85% (240 → 35).
proxy-wallet search was just broken. proxy and EOA addresses were stored swapped. fixed the resolution to go one direction only, gated on wallet type.
killed a crash loop from memory growth. the service sat near 92% RAM and kept restarting, and there was no memory observability to tell a real leak from a plateau. added per-worker metrics (RSS, threads, file descriptors, GC by generation) and continuous profiling, then traced the growth to multi-worker preload fragmentation, not a leak. that pointed the fix at worker and process tuning instead of a phantom hunt.
capped runaway concurrency. put a limit on errgroups that fan out over big arrays, so they stop spiking DB connections and CPU.
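
the pagination bug from the closed-positions item above is easy to reproduce in miniature. this is a sketch under assumptions: `fetch_page(offset, limit)` is a hypothetical client call, `make_fake_api` is a made-up stand-in for the real API, and the only fact taken from the text is the hard 50-per-page cap.

```python
from typing import Callable, Dict, List

API_PAGE_CAP = 50  # the API's hard per-page cap, per the text above

Row = Dict[str, int]
FetchPage = Callable[[int, int], List[Row]]  # (offset, limit) -> rows; hypothetical


def fetch_all_buggy(fetch_page: FetchPage, limit: int = 100) -> List[Row]:
    rows: List[Row] = []
    offset = 0
    while True:
        page = fetch_page(offset, limit)
        rows.extend(page)
        if len(page) < limit:  # bug: a full 50-row page looks "short" when limit > 50
            return rows
        offset += len(page)


def fetch_all(fetch_page: FetchPage, limit: int = 100) -> List[Row]:
    page_size = min(limit, API_PAGE_CAP)  # never expect more than the API will send
    rows: List[Row] = []
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        rows.extend(page)
        if len(page) < page_size:  # a genuinely short page means we reached the end
            return rows
        offset += len(page)


def make_fake_api(total: int) -> FetchPage:
    """Fake API holding `total` rows that silently caps every page at API_PAGE_CAP."""
    data = [{"id": i} for i in range(total)]

    def fetch_page(offset: int, limit: int) -> List[Row]:
        return data[offset:offset + min(limit, API_PAGE_CAP)]

    return fetch_page
```

with 120 fake rows and a requested limit of 100, `fetch_all_buggy` stops after the first page, while `fetch_all` keeps going until a page comes back shorter than the cap.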

architecture

rewrote the event and market core in Go. the prediction event and market reads used to proxy every request to the Python service. i moved that core into the Go gateway so it serves natively. the hard part is safety, so i built a three-mode rollout: proxy, shadow, native. shadow runs both at once on live traffic, diffs every field, emits a mismatch metric. you prove parity before you cut over. i did the core pieces and set the pattern, and the rest of the team is extending it.
a backend-driven display contract for sports markets. all the per-market-type display logic used to live in the client. i moved it into a backend displayTitle / displaySubtitle contract with the category derived at read time, so fixing or adding a market type is a backend change, not an app release.
re-architected error handling in the Python service. prod 500s were almost impossible to debug: across 200M log lines in a week, zero tracebacks. finding the root cause was a guessing game. i replaced HTTP-coupled exceptions with a typed error model: a small set of kinds that map to status and log severity, a free-form code for the exact cause, structured context logged untruncated, one translation boundary, one problem+json shape out. now a 500 arrives carrying its own cause and trace id. hours of local repro became reading one line.
wrote the team’s engineering standards for the service: typed domain ids, Decimal for money and odds, UTC everywhere, typed exceptions that are never swallowed, async that never blocks the loop.
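
the typed error model from the error-handling item above might look roughly like this. it is a sketch, not the service's code: `ErrorKind`, `AppError`, `to_problem`, the specific kinds and the `code` / `trace_id` extension fields are all assumed names. the `type`, `title`, `status` and `detail` members are the standard problem+json fields.

```python
import logging
from enum import Enum
from typing import Any, Dict

logger = logging.getLogger("errors_sketch")


class ErrorKind(Enum):
    # each kind carries (HTTP status, log severity); the set is illustrative
    INVALID = (400, logging.WARNING)
    NOT_FOUND = (404, logging.INFO)
    UPSTREAM = (502, logging.ERROR)
    INTERNAL = (500, logging.ERROR)

    def __init__(self, status: int, severity: int) -> None:
        self.status = status
        self.severity = severity


class AppError(Exception):
    """Domain error: a kind, a free-form code, and structured context."""

    def __init__(self, kind: ErrorKind, code: str, detail: str = "",
                 **context: Any) -> None:
        super().__init__(detail or code)
        self.kind = kind
        self.code = code
        self.detail = detail
        self.context = context


def to_problem(exc: BaseException, trace_id: str) -> Dict[str, Any]:
    """Single translation boundary: any exception in, one problem+json body out."""
    err = exc if isinstance(exc, AppError) else AppError(
        ErrorKind.INTERNAL, "unhandled_exception", repr(exc))
    # exc_info attaches the original traceback (if any) to the log record
    logger.log(err.kind.severity, "code=%s trace_id=%s context=%r",
               err.code, trace_id, err.context, exc_info=exc)
    return {
        "type": "about:blank",
        "title": err.kind.name.replace("_", " ").title(),
        "status": err.kind.status,
        "detail": err.detail,
        "code": err.code,          # extension member: exact cause
        "trace_id": trace_id,      # extension member: ties the response to the logs
    }
```

handlers raise `AppError(ErrorKind.NOT_FOUND, "market_not_found", slug=...)` and never touch HTTP. anything that is not an `AppError` falls through as an internal error, still logged with its traceback.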

product

i also ship a lot of the surface people actually touch.

led the Spreads / Totals / Props launch. ~40 urgent tickets in 3 weeks, 10+ sports (including tennis, cricket, soccer, UFC, esports, basketball, rugby), zero rollbacks. spread sorting, UFC rounds, odd/even grouping, player props with sport-specific verb and unit, tennis 1st-set split.
led the FIFA World Cup 2026 backend. a dedicated page, group sub-tags and team maps, per-group pages, search injection so games show up, live timestamp and status fixes, and compatibility for older iOS clients gated on App Store version.
built the finance / crypto explore. synthetic time-period tags, a dual-call event setup, 3rd-degree nested tag AND-filtering, a unified explore-tag config.
built @-mentions of any user or token, end to end: prefix autocomplete, structured-text storage, the data model, bulk hydration, push and pull notifications.
built the points / rewards (airdrop) system: on-chain and in-app activity aggregation, scoring, leaderboard, schema.
built the prediction notifications pipeline. every prediction notification routes through it.
wired up analytics and observability: Mixpanel events and Sentry, including in Lambda.

stack

Go and Python in prod. Postgres, Redis, DynamoDB for storage. AWS (ECS, Lambda). Polymarket, Gamma and Polynode on the outside. ~10 months, 280+ PRs across the two repos, but the lines above are the ones i’d actually point at.