Wednesday, April 28, 2010

When GEO meets SQL: Hotwiring Data to Locations

I gave this presenation last Saturday at SQLSaturday41, in Atlanta. This was conducted by the local SQL Server user group and PASS, the international association of SQL Server professionals.

The talk was ambitious: Show the geospatial characteristics (distance, drive time) for members of an ice cream gourmet society related to three ice cream facilities. Substitute patients for members and clinics for facilities as you please. Key methods, properties, and queries were shown, as well as several tricks and traps.  All of this was done in the context of enhancing an existing data mart/warehouse with geo-aware data. More details, including the slide deck and samples, can be found here .

Michael

Sunday, April 11, 2010

Session on SQL Server Geospatial Features to be presented

On April 24th, I'll be presenting the topic: When GEO meets SQL: Hotwiring Data to Locations at SQLSaturday in the Atlanta regional Microsoft office.


This is a beginner's level session on geospatial for the DBA and TSQL developer. We will focus on the adaptation of the spatial capabilities to a data warehouse and how to acquire resources with little or no money ( a key requirement in the health care field). The description is below:


Spatial fields, operators, and functions were introduced in SQL 2008. I will detail how these can be integrated with location-based information in existing data warehouses and marts - and on the cheap. We will briefly discuss the basics (geometry versus geography, geocoding), then move on to the common problems with this task and the remedies.

------------------
Stay tuned. I'll let you know how it went.

Saturday, March 6, 2010

When Queries Zone Out in RS - The Unknown Element

Frequently the report developers will approach you with a problem: A query that runs lightening fast in SQL Server Management Studio completely tanks when the query is hosted by a Reporting Services report.

RS 2008 makes calls to the Data Source in a slightly different fashion than a call by other clients. One major difference is that parameters are called using Unicode. This can result in table scans if the destination objects are not in Unicode.

Besides checking for the usual issues (table scans, indexes, etc.), I've had good results with the inclusion of the Optimize for Unknown optimizer hint. This has restored performance for many queries, from minutes to seconds. Give it a try. See details below:

http://blogs.msdn.com/mssqlisv/archive/2008/11/26/optimize-for-unknown-a-little-known-sql-server-2008-feature.aspx

Back in the Swim

This blog has been woefully neglected. Look for blogs on the following subjects in the new few weeks:

 
  • Dimensional modeling tips for a Mart based upon Epic Clarity data.
  • Use of SQL Server's Spatial fields - adding geospatial to a data mart.
  • Geocoding data - on the cheap.

Sunday, September 20, 2009

A New Job, (almost) the Same Environment, and a New Perspective

I'm now at a different employer. Yes, it's still the health field, but it's a well known health plan rather than a pediatric hospital. It's still an Epic shop and Clarity is used, but the source is Teradata. SQL Server is used as the mart data store and query engine.

Unlike the former shop, I now work in the business and analytics section rather than IT. Both the former and latter have difficulties with Epic's over-normalization of the 7000 Clarity tables and all that implies - although the size of the data in the new environment is far larger than the former. In response, the new employer started a couple of years ago to identify demensional models of the core business. Extracts were in turn developed to feed Clarity data into star structures in MSSQL. All of this was farmed out to a third party server hosting firm. The result: We focus on the business need (trends in membership, payments, etc.) rather than the SQL maintenance and other back room tasks.

Performance in the data warehouse is terrific. Stay tuned on other observations as I pursue this leg of my career.

Back from Microsoft's Health User Group

This two day exploration of the use of MS technology meets annually on the MS campus in Redmond, WA. It's attended by a cross section of health IT people. There's a similar meeting in the Spring for one day.

What’s new? I’ve been attending since 2006 with a particular interest in Business Intelligence, but can say that it had not been a significant HUG interest until now. The sessions on BI, analytics, or Clinical Informatics were packed. One session, mentioned below, was SRO until more chairs could be obtained.

One slide for a keynote (which isn’t available) cited a Gartner statistic that BI would be the top interest in Health IT in both 2009 and 2012.

A theme that repeatedly surfaced was the paramount role of business and clinical lines rather than IT. One session in particular was packed, Leveraging SharePoint’s Business Intelligence Capabilities to Drive Process Improvements in Healthcare.

Don’t dwell to much on the Sharepoint theme. The other theme was how to structure a successful BI operation in a health care organization. Basically, the hospital for which the presenter worked embarked for years on a BI effort led by IT, which failed. Success only occured when a BI Governance system was put in place to transform the process - one which placed control by the business stakeholders.

The other key presentation to note was Next Generation Decision Support for Hospitals: Using Microsoft Technologies You Already Own to Drive New Levels of Performance, delivered by Bob Lokken, CEO, WhiteCloud Analytics. Bob founded and developed Proclarity, a company which consulted this last decade on building solutions with MS Analysis services (“cubes” and data mining). Furthermore, his company developed software which became perhaps the pre-eminent software to “surf” on the cubes.

He sold the company to Microsoft in 2006 and worked there briefly as world-wide head of the BI consulting arm. Now he’s back with his own new company that specializes on providing analytics to the health care vertical. I had met him years ago and introduced him at his section.

Bob mentioned that 5 out of 6 (successful?) projects are now initiated not by the IT shop, but by the business and clinical lines. He focused on the difference between reporting and analytics and offered six examples.

I noticed that his samples documented outliers that exceeded a certain number of standard deviations. Also, he keenly documented how necessary it was for designers to be aware of navigation (drill down, drill to a separate “report”). Some of the data was run through data mining before the user “surfed” on the data.

The other big item was “self-serve” BI, whereas power users have the tools to discover patterns without the need to have the Report Team develop something per a work order request. Bob was particularly critical of the issue of limiting analysis to those trained in SPSS and other statistical power packages. He outlined size examples where analytics (as opposed to operational reporting) uncovered relationships which yielded significant savings and/or put a health care institution back on track with a sound footing.

Saturday, July 25, 2009

RS Parameter Problems - Part I

Microsoft issued a "Fast Publish" notice about a problem with parameters issued by Reporting Services
(http://support.microsoft.com/default.aspx?scid=kb;en-us;2000524&sd=rss&spid=13165).

A Microsoft consultant told me about this in mid-May. Parameters are coerced into Unicode and applied in the SQL query to the SQL Server.

The problem is query optimization. As suggested in the notice:

This may result in performance issues on the backend SQL server. For example SQL server may perform a table scan instead of a seek.

The interim workarounds are:

  • Use an expression based command text with Join().
  • Use stored procedures with parameters passed in e.g. as comma-separated string.
  • Implement custom data extensions with your own implementation of multi-value query parameter rewrite.
  • Change database schema column definition to NVarchar

None of them are particularly appealing if your shop has invested in designing hundreds of reports with multi-parameter drop down options against thousands of tables in the Clarity database, many of which are so large that access using a table scan is of concern.

The first two are the most feasible and are being explored. However, now the issue is public and some clever approaches will likely be forthcoming.

Stay tuned on this issue.

There is an even larger issue of parameters in RS and the optimization of queries. That will be covered in a subsequent posting.

Edit Update: One of the MS field engineers provides more detail on the remedies at http://blogs.msdn.com/sql_pfe_blog/archive/2009/06/17/possible-performance-implications-when-using-string-parameters-in-reporting-services.aspx

I'll be applying these next week to some of our problem queries issued by RS and see if index scans can be turned into index seeks.