How I integrated an LLM into SahamLens without leaking portfolio data.

A two-stage pipeline: local summarisation first, LLM analysis second. The portfolio data never leaves the machine. Here is how the architecture works and what the prompt engineering problem actually was.

May 25, 2026 · 3 min read · sahamlens, llm, anthropic, local-first, python

The pattern I rejected

The obvious pattern: send the full stock data context to the LLM API on every request. Simple, stateless, easy to implement.

The problem: my portfolio positions, watchlist, and trade journal would be in every API request. Anthropic's data retention policy is reasonable, but I do not want to depend on a policy. I want the data to stay local by design.

What I built instead

A two-stage pipeline:

1. Local summarisation. The Python core computes indicators, scores tickers, and generates a structured summary, all locally. The summary contains no portfolio positions, no trade history, and no personally identifying data. It carries only the ticker, indicator values, signal strength, and recent news headlines.

2. LLM analysis on the summary. The summary goes to the Anthropic API. The LLM sees market data, not my portfolio. It returns an analysis brief, which is stored locally in DuckDB.

The portfolio data never leaves the machine. The LLM sees only what I would be comfortable posting publicly.
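A minimal sketch of how stage one can enforce that boundary, assuming the summary is built by copying an explicit allow-list of fields rather than by deleting sensitive ones. The class name, field names, and sample values are hypothetical; the actual SahamLens schema is not shown in this post.

```python
import json
from dataclasses import asdict, dataclass


# Hypothetical outbound schema: only these four fields can ever be serialised.
@dataclass(frozen=True)
class TickerSummary:
    ticker: str
    indicators: dict[str, float]
    signal_strength: float
    headlines: list[str]


def summarise(record: dict) -> TickerSummary:
    """Copy allow-listed fields out of a local record; everything else is ignored."""
    return TickerSummary(
        ticker=record["ticker"],
        indicators=dict(record["indicators"]),
        signal_strength=float(record["signal_strength"]),
        headlines=list(record["headlines"]),
    )


def to_payload(summary: TickerSummary) -> str:
    """Serialise the summary to the JSON text that would be sent to the LLM."""
    return json.dumps(asdict(summary), sort_keys=True)


# Placeholder local record mixing market data with portfolio data.
local_record = {
    "ticker": "BBCA",
    "indicators": {"rsi_14": 61.2, "macd": 0.8},
    "signal_strength": 0.7,
    "headlines": ["Placeholder headline"],
    "position_size": 1200,      # portfolio data: stays local
    "avg_cost": 9100.0,         # portfolio data: stays local
    "journal_notes": "private",  # trade journal: stays local
}

payload = to_payload(summarise(local_record))
```

Because the summary type has no slot for positions or journal entries, adding a new sensitive field to the local record cannot leak by accident; it would have to be added to the outbound schema on purpose.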

The prompt engineering problem

The first version of the system prompt produced generic analysis. 'BBCA shows bullish momentum based on RSI.' That is not useful. I already know the RSI value.

What I wanted: analysis that accounts for Indonesian market context. IDX differs from US markets in liquidity profiles, sector dynamics, and retail investor behaviour. A prompt that carries no local context gets analysis framed around US markets.

I added three things to the system prompt:

1. A few-shot example with an IDX ticker (BBCA) showing the analysis style I wanted.

2. Explicit context about IDX market hours, settlement cycles, and common retail patterns.

3. A constraint: 'Do not reference US market analogies unless directly relevant.'

Quality improved noticeably. The analysis started surfacing IDX-specific observations (sector rotation patterns, foreign investor flow signals, rights issue dilution effects) that the generic prompt missed.
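The three additions to the system prompt could be assembled along these lines. This is a sketch, not the actual SahamLens prompt: the function name and section titles are hypothetical, and the angle-bracket placeholders stand in for text the post does not reproduce. Only the constraint sentence is quoted from above.

```python
# Placeholders: the real few-shot example and market context are not shown in the post.
FEW_SHOT_EXAMPLE = (
    "Input: ticker=BBCA, <indicator values>\n"
    "Analysis: <a brief written in the desired style>"
)
IDX_CONTEXT = (
    "<IDX market hours>\n"
    "<IDX settlement cycle>\n"
    "<common retail investor patterns>"
)
CONSTRAINT = "Do not reference US market analogies unless directly relevant."


def build_system_prompt(few_shot: str, context: str, constraint: str) -> str:
    """Join the three parts into one system prompt with labelled sections."""
    sections = [
        ("Market context", context),
        ("Constraint", constraint),
        ("Example", few_shot),
    ]
    return "\n\n".join(f"## {title}\n{body.strip()}" for title, body in sections)


system_prompt = build_system_prompt(FEW_SHOT_EXAMPLE, IDX_CONTEXT, CONSTRAINT)
```

With the Anthropic Python SDK, a string like this would typically be passed as the `system` parameter of a Messages API call, with the local summary as the user message. Keeping the parts as separate constants makes it easy to change one of them and compare the resulting briefs.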

What I would tell past me

Design the data boundary first. Decide what the LLM is allowed to see before you write a single line of integration code. If you design the integration first and the boundary second, you will find yourself refactoring the data pipeline to fit the LLM's input shape — which is backwards.

The local-first constraint was not a limitation. It was a forcing function that produced a better architecture. The two-stage pipeline is more testable, more auditable, and more composable than a single API call would have been.
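One way the "testable and auditable" claim can be made concrete is a guard that scans any outbound payload for portfolio field names before it is sent. The sketch below assumes payloads are plain nested dicts and lists; the key names in `FORBIDDEN_KEYS` and the function names are hypothetical.

```python
# Hypothetical names for fields that must never leave the machine.
FORBIDDEN_KEYS = frozenset(
    {"positions", "position_size", "avg_cost", "trades", "journal_notes", "watchlist"}
)


def find_forbidden(payload, path="$"):
    """Return the path of every forbidden key found anywhere in a nested payload."""
    hits = []
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}"
            if key in FORBIDDEN_KEYS:
                hits.append(child)
            hits.extend(find_forbidden(value, child))
    elif isinstance(payload, list):
        for index, item in enumerate(payload):
            hits.extend(find_forbidden(item, f"{path}[{index}]"))
    return hits


def assert_boundary(payload) -> None:
    """Raise before sending if any portfolio field is present in the payload."""
    hits = find_forbidden(payload)
    if hits:
        raise ValueError(f"portfolio data in outbound payload: {hits}")


safe = {"ticker": "BBCA", "indicators": {"rsi_14": 61.2}}
leaky = {"ticker": "BBCA", "context": [{"position_size": 1200}]}

assert_boundary(safe)  # passes silently
print(find_forbidden(leaky))
```

A check like this can run in unit tests and again immediately before the API call, so the boundary is enforced by code and not only by convention.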