Knowing Your Data

When creating a point-in-polygon solution, it is important to understand how your data affects performance, which Spectrum operation you should use, and what limitations your data imposes.

Where is your data? It is important to know where your data is located and what type of data your solution will use. For example, TAB files on the file system perform differently from data in a DBMS. Spectrum pushes the processing of certain operations (such as spatial joins) down to the database (for example, Oracle and SQL Server), which increases performance. This applies to operations similar to the following:

Select a.id, b.id from flood_plane a, customers b where MI_Contains(a.geom, b.geom)

What is the geometry format? Performance differs between a MapInfo native geometry format and a file with x/y format (for example, a CSV file with lat/long values). To improve performance, consider using an x/y format instead of the MapInfo native geometry format.
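As an illustration of working with x/y data directly, the following minimal Python sketch reads lon/lat columns from CSV text and tests each point against a polygon with a standard ray-casting check. It is not how Spectrum implements point-in-polygon. The column names (id, lon, lat), the sample polygon, and the function names are assumptions made for this example.

```python
import csv
import io

def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices. Points exactly on an edge
    are not handled specially in this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def points_inside(csv_text, polygon):
    """Return the ids of CSV rows whose lon/lat fall inside the polygon."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["id"] for row in reader
            if point_in_polygon(float(row["lon"]), float(row["lat"]), polygon)]

# Hypothetical sample data (not taken from any real dataset)
FLOOD_ZONE = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]
CUSTOMERS_CSV = "id,lon,lat\nc1,1.0,1.0\nc2,5.0,1.0\nc3,3.5,2.5\n"
```

Calling points_inside(CUSTOMERS_CSV, FLOOD_ZONE) returns the ids of the sample customers that fall inside the rectangle.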

Is your data static or changing? If you know that all or some of your data is not changing, and you are using the Query Spatial Data operation, there is a configuration option when creating your named resources that can greatly improve performance. In Spectrum this is known as volatility. By default, all resources have volatility set to true. This means Spectrum assumes that the data can change at any time, so each time the data is accessed it has to check whether the data has changed and decide whether it needs to load new data.

So how does volatility affect point-in-polygon operations? If volatility is set to true, the act of checking the resource decreases performance even if the data source has not changed. If you know your data is not going to change, and performance is a requirement, then turn off volatility. If performance is an issue, and you know some of your data is going to change (say, client lists or survey points) but some is not (parcels of land or sales areas), then make sure volatility is turned off on as much of your static data as possible. This will increase performance. For more information on volatility and how to change this setting for resources, see Data Source Volatility.
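The cost described above can be pictured with a small Python sketch. It is a conceptual illustration only, not Spectrum's implementation: the CachedResource class, its volatile flag, and the use of a file modification time as the change check are all assumptions made for this example.

```python
import os

class CachedResource:
    """Caches a file's contents; a hypothetical stand-in for a named resource."""

    def __init__(self, path, volatile=True):
        self.path = path
        self.volatile = volatile
        self.checks = 0   # how many times the source was checked for changes
        self.loads = 0    # how many times the data was loaded or reloaded
        self._data = None
        self._mtime = None

    def _load(self):
        self._mtime = os.path.getmtime(self.path)
        with open(self.path, "r", encoding="utf-8") as f:
            self._data = f.read()
        self.loads += 1

    def get(self):
        if self._data is None:
            # First access always loads the data.
            self._load()
        elif self.volatile:
            # Volatile: pay for a change check on every access,
            # even when nothing has changed.
            self.checks += 1
            if os.path.getmtime(self.path) != self._mtime:
                self._load()
        # Non-volatile: return the cached data without checking the source.
        return self._data
```

In this sketch, repeated calls to get() on a volatile resource perform one change check per call after the first, while a non-volatile resource performs none. That per-access check is the overhead that turning volatility off avoids.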

Are you using TAB files? When using TAB files, you have the ability to maintain a pool of open file handles to avoid the expense of opening and reopening the file every time it is read. Spectrum Spatial will use the file handle pool for native TAB files whose volatility setting is false. Native TAB files include native Extended (NativeX) and Seamless TAB files. All tables in the Spectrum Spatial repository are, by default, volatile (volatility set to true). Volatility for native TAB files means that the schema could change at any time. To take advantage of this performance boost, set the volatility setting to false in Spatial Manager. In general, setting volatility to false is recommended if the data will change only at known time periods or not at all. For more detailed information about the file handle pool for native and seamless TAB files, see the MapInfo Native TAB and Seamless TAB topics in the Resources and Data section of the Spectrum Spatial Guide.
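The idea of a file handle pool can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Spectrum Spatial implementation: the FileHandlePool class, its read method, and the volatile argument are hypothetical names chosen for this example.

```python
class FileHandlePool:
    """Keeps handles open for non-volatile files; opens volatile files per read."""

    def __init__(self):
        self._handles = {}   # path -> open file handle
        self.opens = 0       # how many times a file was opened

    def read(self, path, volatile):
        if volatile:
            # Volatile: open and close the file on every read.
            self.opens += 1
            with open(path, "rb") as f:
                return f.read()
        handle = self._handles.get(path)
        if handle is None:
            # Non-volatile: open once and keep the handle in the pool.
            self.opens += 1
            handle = open(path, "rb")
            self._handles[path] = handle
        handle.seek(0)
        return handle.read()

    def close(self):
        """Close every pooled handle and empty the pool."""
        for handle in self._handles.values():
            handle.close()
        self._handles.clear()
```

In this sketch, reading the same non-volatile file several times opens it once, whereas each read of a volatile file opens it again.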