Classes¶
This section describes the classes and APIs used in the Location Intelligence SDK for Big Data.
SpatialAPI¶
-
class
li.SpatialAPI.SpatialAPI¶ This class contains the methods for all supported spatial operations.
Supported spatial operations:
PointInPolygon
SearchNearest
JoinByDistance
GenerateHexagon
-
static
generateHexagon(sparkSession: pyspark.sql.SparkSession, minLongitude: float, minLatitude: float, maxLongitude: float, maxLatitude: float, hexLevel: int = 1, containerLevel: int = 1, numOfPartitions: int = 1, maximumNumOfRowsPerPartition: int = 1)¶ A HexagonGeneration Operation: This method generates hexagons within a bounding box defined by the minimum and maximum longitude and latitude values. The hexagon output can be used for map display. A usage sketch follows this entry.
- Parameters
sparkSession (pyspark.sql.SparkSession) – Spark session to be used
minLongitude (float) – Minimum longitude value of the bounding box for which hexagons need to be generated
minLatitude (float) – Minimum latitude value of the bounding box for which hexagons need to be generated
maxLongitude (float) – Maximum longitude value of the bounding box for which hexagons need to be generated
maxLatitude (float) – Maximum latitude value of the bounding box for which hexagons need to be generated
hexLevel (int) – The level to generate hexagons for. Must be between 1 and 11, defaults to 1.
containerLevel (int) – A hint for providing some parallel hexagon generation. Must be less than the hexLevel parameter, defaults to 1.
numOfPartitions (int) – Number of partitions, defaults to 1.
maximumNumOfRowsPerPartition (int) – Maximum number of rows per partition, defaults to 1.
- Returns
A dataframe representing the hexagons in WKT format
- Return type
pyspark.sql.DataFrame
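A minimal usage sketch, assuming the import path li.SpatialAPI (inferred from the documented class path) and an illustrative bounding box; everything other than the documented parameter names is an assumption:

    from pyspark.sql import SparkSession
    from li.SpatialAPI import SpatialAPI  # import path inferred from the class path above

    spark = SparkSession.builder.appName("hexagon-demo").getOrCreate()

    # Illustrative bounding box (WGS 84 longitude/latitude) roughly covering Manhattan.
    hexagons_df = SpatialAPI.generateHexagon(
        sparkSession=spark,
        minLongitude=-74.03,
        minLatitude=40.70,
        maxLongitude=-73.93,
        maxLatitude=40.80,
        hexLevel=3,                         # must be between 1 and 11
        containerLevel=2,                   # must be less than hexLevel
        numOfPartitions=4,
        maximumNumOfRowsPerPartition=10000,
    )
    hexagons_df.show(5, truncate=False)     # hexagons are returned as WKT strings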
-
static
joinByDistance(df1: pyspark.sql.DataFrame, df2: pyspark.sql.DataFrame, df1Longitude: str, df1Latitude: str, df2Longitude: str, df2Latitude: str, searchRadius: float, distanceUnit: str, geoHashPrecision: int = 7, options: Optional[dict] = None)¶ A JoinByDistance Operation: This method joins two dataframes on longitude and latitude values, one pair from each dataframe, that represent the location of the records to be joined. The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system. The method also takes a searchRadius, which is the buffer around the first point within which the second point must fall. The last parameter is a geohash precision that will be used within the calculation. A usage sketch follows this entry.
- Parameters
df1 (pyspark.sql.DataFrame) – The dataframe to join to
df2 (pyspark.sql.DataFrame) – The dataframe to be joined
df1Longitude (str) – The Longitude value from the first dataframe
df1Latitude (str) – The Latitude value from the first dataframe
df2Longitude (str) – The Longitude value from the second dataframe
df2Latitude (str) – The Latitude value from the second dataframe
searchRadius (float) – The buffer length around point 1 to search for point 2
distanceUnit (str) – Unit of measurement for searchRadius parameter.
geoHashPrecision (int) – The geohash precision value to be used for the search, defaults to 7.
options (dict) – A key/value map of DistanceJoinOption settings that apply to the join, defaults to None.
- Returns
A dataframe that is the result of the join
- Return type
pyspark.sql.DataFrame
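A hedged sketch of the join, assuming an active SparkSession named spark; the import paths, the sample data, and the "mi" unit string are assumptions, while the parameter names and the DistanceColumnName option follow the entries documented here:

    from li.SpatialAPI import SpatialAPI
    from li.DistanceJoinOption import DistanceJoinOption

    stores_df = spark.createDataFrame(
        [("store-1", -73.99, 40.75)], ["store_id", "store_lon", "store_lat"])
    customers_df = spark.createDataFrame(
        [("cust-1", -73.98, 40.76)], ["cust_id", "cust_lon", "cust_lat"])

    # Join customers to stores within the searchRadius; the "mi" unit string is
    # an assumption, check the SDK's distance-unit constants.
    joined_df = SpatialAPI.joinByDistance(
        df1=stores_df,
        df2=customers_df,
        df1Longitude="store_lon",
        df1Latitude="store_lat",
        df2Longitude="cust_lon",
        df2Latitude="cust_lat",
        searchRadius=5.0,
        distanceUnit="mi",
        geoHashPrecision=7,
        options={DistanceJoinOption.DistanceColumnName: "distance_mi"},
    )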
-
static
pointInPolygon(inputDF: pyspark.sql.DataFrame, tableFileType: str, tableFilePath: str, tableFileName: str, longitude: str, latitude: str, outputFields: list, downloadManager=None, libraries: Optional[str] = None, includeEmptySearchResults: bool = True)¶ A PointInPolygon Operation: This method filters the point coordinates in the input dataframe to those that fall within a specified polygon (for example, the polygon of the continental USA), and adds the requested output fields from the polygon table to the input dataset as columns. A usage sketch follows this entry.
- Parameters
inputDF (pyspark.sql.DataFrame) – dataframe of input dataset
tableFileType (str) – Type of target polygon data file (TAB, shape, or geodatabase)
tableFilePath (str) – Path to polygon data files
tableFileName (str) – Name of the TAB/shape/geodatabase file
longitude (str) – Name of column containing longitude values in input point data
latitude (str) – Name of column containing latitude values in input point data
outputFields (list) – The requested fields to be included in the output
downloadManager (DownloadManager) – DownloadManager instance to be used if data is present in S3 or HDFS, defaults to None.
libraries (str) – Libraries to load when tableFileType is geodatabase, defaults to None.
includeEmptySearchResults (bool) – If true, an empty search keeps the original input row and the new columns are null; if false, the row does not appear in the output DataFrame. Defaults to True.
- Returns
input DataFrame appended with output fields as columns if point coordinates lie within specified polygon
- Return type
pyspark.sql.DataFrame
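A minimal sketch, assuming an active SparkSession named spark, local TAB data at an illustrative path, and placeholder polygon-table field names:

    from li.SpatialAPI import SpatialAPI

    points_df = spark.createDataFrame(
        [("p1", -98.35, 39.50)], ["id", "lon", "lat"])

    result_df = SpatialAPI.pointInPolygon(
        inputDF=points_df,
        tableFileType="TAB",
        tableFilePath="/data/boundaries",      # illustrative path
        tableFileName="usa.TAB",               # illustrative file name
        longitude="lon",
        latitude="lat",
        outputFields=["STATE", "STATE_NAME"],  # placeholder field names
        includeEmptySearchResults=False,       # drop points outside every polygon
    )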
-
static
searchNearest(inputDF: pyspark.sql.DataFrame, tableFileType: str, tableFilePath: str, tableFileName: str, geometryStringType: str, geometryColumnName: str, outputFields: list, distanceValue: float, distanceUnit: str, distanceColumnName: str = 'distance', downloadManager=None, libraries: Optional[str] = None, maxCandidates: int = 1000, includeEmptySearchResults: bool = True)¶ A SearchNearest Operation: This method takes a geometry string (in GeoJSON, WKT, KML, or WKB format) from each input row and searches a table of geometries for those within a specified distance. The number of candidate geometries can be limited with the maxCandidates parameter. By default, geometries are listed from nearest to farthest. A usage sketch follows this entry.
- Parameters
inputDF (pyspark.sql.DataFrame) – dataframe of input dataset
tableFileType (str) – Type of target polygon data file (TAB, shape, or geodatabase)
tableFilePath (str) – Path to polygon data files
tableFileName (str) – Name of the TAB/shape/geodatabase file
geometryStringType (str) – Type of geometry string provided in input file. Supported values are WKT/GeoJSON/WKB/KML
geometryColumnName (str) – Name of column containing string representation of geometry
outputFields (list) – The requested fields to be included in the output
distanceValue (float) – The absolute value of distance from source geometry within which target geometries will be searched for.
distanceUnit (str) – Unit of measurement for distanceValue parameter. This same unit will also be used when appending distance column in output dataframe.
distanceColumnName (str) – Name of the distance column in output dataframe which indicates distance between source geometry and target geometry.
downloadManager (DownloadManager) – DownloadManager instance to be used if data is present in S3 or HDFS, defaults to None.
libraries (str) – Libraries to load when tableFileType is geodatabase, defaults to None.
maxCandidates (int) – Limits the count of target geometries to search for, defaults to 1000.
includeEmptySearchResults (bool) – If true, an empty search keeps the original input row and the new columns are null; if false, the row does not appear in the output DataFrame. Defaults to True.
- Returns
input DataFrame appended with output fields as columns when the distance between the source geometry and a target geometry is within distanceValue. An additional column named distanceColumnName indicates the distance between the source and target geometry, and records are ordered by ascending value of this column.
- Return type
pyspark.sql.DataFrame
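A minimal sketch, assuming an active SparkSession named spark and WKT geometries in the input dataframe; the table path, the "km" unit string, and the NAME output field are placeholders:

    from li.SpatialAPI import SpatialAPI

    geoms_df = spark.createDataFrame(
        [("g1", "POINT (-73.99 40.75)")], ["id", "wkt"])

    nearest_df = SpatialAPI.searchNearest(
        inputDF=geoms_df,
        tableFileType="TAB",
        tableFilePath="/data/poi",        # illustrative path
        tableFileName="restaurants.TAB",  # illustrative file name
        geometryStringType="WKT",
        geometryColumnName="wkt",
        outputFields=["NAME"],            # placeholder field name
        distanceValue=2.0,
        distanceUnit="km",                # assumed unit string
        distanceColumnName="distance_km",
        maxCandidates=5,
    )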
SQLRegistrator¶
DistanceJoinOption¶
-
class
li.DistanceJoinOption.DistanceJoinOption¶ Options for the distance join operations.
-
DistanceColumnName¶ Adds a column to the result dataframe that contains the distance calculated.
-
LimitMatches¶ Limits the number of joined results for each source dataframe record. The argument should be a number, and the match results will be limited to those that rank at the number or lower based on distance. Default is no limit.
-
LimitMethod¶ The method used for ranking matches. The argument should be a LimitMethods value.
-
LimitMethods¶
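A minimal sketch of assembling these options for joinByDistance; the attribute access follows the names documented above, while the value types are assumptions:

    from li.DistanceJoinOption import DistanceJoinOption

    options = {
        DistanceJoinOption.DistanceColumnName: "distance",  # name of the computed distance column
        DistanceJoinOption.LimitMatches: 3,                 # keep only the 3 nearest matches per record
        # DistanceJoinOption.LimitMethod takes a LimitMethods value; its members
        # are not listed here, so it is omitted from this sketch.
    }
    # Pass the map as the options argument of SpatialAPI.joinByDistance.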
DownloadManagerBuilder¶
-
class
li.DownloadManagerBuilder.DownloadManagerBuilder(downloadLocation=None, permissions=None)¶ This builder class configures downloading from remote paths. A usage sketch follows this entry.
-
addDownloader(downloader)¶ Adds a configured downloader to use when downloading from remote paths. If multiple downloaders claim to support a path, the downloader added first will be used.
- Parameters
downloader – The downloader configuration for remote path.
- Returns
An object of DownloadManagerBuilder class
- Return type
DownloadManagerBuilder
-
build()¶ Returns the configured DownloadManager used when downloading from remote paths.
- Returns
An object of configured DownloadManager class
- Return type
DownloadManager
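A hedged sketch of building a DownloadManager; the download location is illustrative, and passing the object returned by getDownloader() to addDownloader() is an assumption based on the entries documented here:

    from li.DownloadManagerBuilder import DownloadManagerBuilder
    from li.LocalFilePassthroughDownloader import LocalFilePassthroughDownloader

    # Downloaders are consulted in the order they are added: when several claim
    # a path, the one added first wins.
    download_manager = (
        DownloadManagerBuilder("/tmp/li-downloads")  # illustrative downloadLocation
        .addDownloader(LocalFilePassthroughDownloader().getDownloader())
        .build()
    )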
S3Downloader¶
GoogleDownloader¶
-
class
li.GoogleDownloader.GoogleDownloader(hadoopConfiguration)¶ This Downloader class enables the use of data located in Google Storage. A usage sketch follows this entry.
-
getDownloader()¶ Returns the object reference of the Google Downloader responsible for downloading the data from Google Storage.
- Returns
An object of GoogleDownloader
- Return type
GoogleDownloader
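A minimal sketch, assuming the Hadoop configuration is taken from the active Spark session (the _jsc accessor is an internal PySpark detail, not part of this SDK):

    from li.GoogleDownloader import GoogleDownloader

    hadoop_conf = spark.sparkContext._jsc.hadoopConfiguration()  # internal PySpark accessor
    gcs_downloader = GoogleDownloader(hadoop_conf).getDownloader()
    # gcs_downloader can then be registered via DownloadManagerBuilder.addDownloader().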
HDFSDownloader¶
LocalFilePassthroughDownloader¶
-
class
li.LocalFilePassthroughDownloader.LocalFilePassthroughDownloader¶ This Downloader class enables the use of data located on the local file system. A usage sketch follows this entry.
-
getDownloader()¶ Returns the object reference of the Local Downloader responsible for downloading the data from the local file system.
- Returns
An object of LocalFilePassthroughDownloader
- Return type
LocalFilePassthroughDownloader
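A minimal sketch of wiring the passthrough downloader into an operation; passing the getDownloader() result to addDownloader() is an assumption based on the entries documented here:

    from li.DownloadManagerBuilder import DownloadManagerBuilder
    from li.LocalFilePassthroughDownloader import LocalFilePassthroughDownloader

    # Local files need no copying; the passthrough downloader simply resolves paths.
    dm = (DownloadManagerBuilder()
          .addDownloader(LocalFilePassthroughDownloader().getDownloader())
          .build())
    # dm can be supplied as the downloadManager argument of pointInPolygon or
    # searchNearest when table files live on the local file system.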