Using Power BI is easy. We download the Desktop software, install it in a snap, add a couple data sources, throw a table on a report, select a few columns. Done in 5 mins. No muss no fuss =) Seems pretty straightforward right? But if you ask any Power BI developer, he/she will tell you the devil is in the details. Having used it for a while, one of my biggest complaints remains an important decision you have to make early when you first start with a new report (and even more so if you are new to Power BI), i.e. DirectQuery vs Import.
It is presented as such an easy-to-answer question; almost everyone already has the SQL Server database ready sitting somewhere. However, when we look deeper, we discover there are worth-mentioning pros and cons in both modes.
“I have the tables. DirectQuery is the obvious choice, why bother?”
It is what everyone thinks BUT when choosing this, you have also declared that you are ready for these limitations:
- Your BI report performance depends greatly on the performance of the underlying data source. If you distribute your reports to a massive number of users, each request and each periodically refreshed dashboard tile will be sending at least one query per visual to the underlying source via the gateway. If your network/database/server becomes too bogged down, any query that takes longer than a few minutes will time out.
- When importing data, we can combine data from multiple sources into a single data model, mixing your own excel or csv files and SQL Server tables. But for DirectQuery, all tables must come from a single database.
- No built-in date hierarchy for DirectQuery. To counter that, you need to have a Date table.
- Very limited functions in calculated columns. Yes, the simple FirstName + LastName, in all examples given, can be done easily but anything a bit more elaborated will leave you scratching your head such as masking a partial Customer/Account ID or putting a period somewhere in the DeptID and append DeptName for display. Not impossible but you will end up finding yourself spending so much time looking up DAX or M language just to do simple things like that. My advice in this case: do it in the backend.
- The ability to define a calculated table using a DAX expression is not supported in DirectQuery mode.
- When using DirectQuery, it is not possible to use the Clustering capability, to automatically find groups.
- Quite a few more…
With these many disadvantages, perhaps we should always use Import? I wish the answer is that easy; that’s why it is a debate sometimes. These are some not-to-be-overlooked reasons to use DirectQuery.
- Data is changing frequently, and near ‘real-time’ reporting is needed.
- Data is very large. The 1 GB dataset limitation does not apply to DirectQuery.
- Security rules are defined in the underlying data source.
- Data cannot leave the organization.
These reasons leave us this question: what are the best practices to use DirectQuery? If you are using Power BI or have interest about this topic, drop me a line. If there is enough interest, I’ll make that my next topic. Happy coding.