Power BI <> Redshift Integration
10 min
power bi ā redshift integration architecture and limitations operational document overview end users connect to redshift through a single, non materialized raw view they can't create additional sql objects ā no joins, no aggregations, no materialized views, no new schema objects that constraint pushes essentially all data modeling work into power bi architecture diagram where data modeling happens in redshift ā nothing the raw view is the only structure available no joins, filters, business rules, or aggregations happen at this layer in power bi ā everything data cleaning and normalization derived columns and aggregations relationships and dax measures all business logic and semantic modeling done via power query (m) for transformations and the power bi data model for relationships and measures power bi is the semantic layer, full stop directquery vs import mode mode behavior under this access model verdict directquery tries to push transformations back to redshift, but joins/aggregations can't fold against a raw view most logic runs locally, performance degrades, visuals can time out not recommended import loads the raw view, then builds the full semantic model inside power bi recommended gateway requirements and ownership redshift is not publicly reachable ā access requires ip whitelisting because of this, the power bi service (cloud) can't connect directly; an on premises data gateway is required to bridge the connection item detail why it's needed power bi's cloud service has no route to redshift unless a gateway, installed on a machine whose ip is whitelisted, relays the connection where it runs on a server/vm with network access to redshift (i e , its outbound ip is on the whitelist) ā typically on prem or in the same vpc/network as redshift who sets it up it / data infrastructure team ā installing and configuring the gateway, and adding its ip to the redshift whitelist, isn't something report authors do who maintains it same team ā gateway software needs updates, the host machine needs uptime/monitoring, and credentials used by the gateway need to stay current cost ownership no separate microsoft licensing fee for the gateway itself the cost is the hosting/compute for the machine running it (vm, on prem server), which falls under whatever team owns infrastructure spend ā not the power bi/reporting budget refresh impact all scheduled refreshes and directquery calls route through this gateway, so its uptime and capacity directly affect report reliability ā another reason it sits with infra, not with individual report owners net this is an infrastructure dependency, not a power bi configuration step whoever owns the whitelist and the hosting environment owns the gateway limitations modeling no star or snowflake schema can be built from redshift directly ā no joins to dimension/lookup tables, no enrichment from other redshift data all of it has to be rebuilt manually in power bi transformation complex transforms increase refresh time, large datasets strain memory, and logic that could be optimized in sql has to be recreated in every report performance larger raw datasets mean bigger power bi datasets, longer refreshes, more gateway load, and degraded responsiveness ā especially under directquery consistency without a centralized semantic layer, business logic can diverge between reports ā same metric, different definition depending on who built it risks risk detail scalability as volume grows, import refreshes may exceed time limits or dataset size caps; memory pressure increases and reports get unstable maintenance any change to the raw view structure means updating every report, rebuilding transforms, revalidating dax, redoing relationships data quality no upstream normalization means duplicates, nulls, and inconsistent formats land in power bi and have to be fixed manually ā errors can propagate across reports governance no shared semantic layer or enforced logic means no shared kpi definitions, and analysts may implement the same metric differently \<font color="#b91c1c">bottom line redshift is acting purely as a data source here ā one read only view, no modeling power bi is absorbing every modeling, transformation, and governance responsibility that would normally sit upstream import mode is the only mode that holds up under this constraint \</font> summary redshift single raw, non materialized view ⢠no joins ⢠no aggregations ⢠no new sql objects ⢠read only accessraw rows power bi becomes the entire semantic layer ⢠power query cleaning, joins, modeling ⢠data model relationships ⢠dax business logic & measures ⢠import mode (recommended) loads raw view, models locally, builds full semantic model in power bi's own engine ⢠directquery not recommended query folding fails on joins/aggregations, transformations run locally ā slow, visuals may time out
