Classes

This section describes the classes and APIs in the Location Intelligence SDK for Big Data.

SpatialAPI

class li.SpatialAPI.SpatialAPI

This class contains the methods for all supported spatial operations.

Supported spatial operations:

  • PointInPolygon

  • SearchNearest

  • JoinByDistance

  • GenerateHexagon

static generateHexagon(sparkSession: pyspark.sql.SparkSession, minLongitude: float, minLatitude: float, maxLongitude: float, maxLatitude: float, hexLevel: int = 1, containerLevel: int = 1, numOfPartitions: int = 1, maximumNumOfRowsPerPartition: int = 1)

A HexagonGeneration Operation: This method generates hexagons within a bounding box defined by the minimum and maximum longitude and latitude values. The hexagon output can be used for map display.

Parameters
  • sparkSession (pyspark.sql.SparkSession) – Spark session to be used

  • minLongitude (float) – Minimum longitude value of the bounding box for which hexagons need to be generated

  • minLatitude (float) – Minimum latitude value of the bounding box for which hexagons need to be generated

  • maxLongitude (float) – Maximum longitude value of the bounding box for which hexagons need to be generated

  • maxLatitude (float) – Maximum latitude value of the bounding box for which hexagons need to be generated

  • hexLevel (int) – The level to generate hexagons for. Must be between 1 and 11, defaults to 1

  • containerLevel (int) – A hint that enables parallel hexagon generation. Must be less than the hexLevel parameter, defaults to 1

  • numOfPartitions (int) – Number of partitions, defaults to 1

  • maximumNumOfRowsPerPartition (int) – Maximum number of rows per partition, defaults to 1

Returns

A dataframe representing the hexagons in WKT format

Return type

pyspark.sql.DataFrame
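
A minimal usage sketch based on the signature above (not taken from the SDK itself): it assumes the li package is installed and a Spark session is available; the bounding box and tuning values are illustrative only.

```python
from pyspark.sql import SparkSession
from li.SpatialAPI import SpatialAPI

spark = SparkSession.builder.appName("hexagon-demo").getOrCreate()

# Generate level-5 hexagons covering a hypothetical bounding box.
hex_df = SpatialAPI.generateHexagon(
    sparkSession=spark,
    minLongitude=-0.5, minLatitude=51.3,
    maxLongitude=0.3, maxLatitude=51.7,
    hexLevel=5,                      # must be between 1 and 11
    containerLevel=3,                # must be less than hexLevel
    numOfPartitions=4,
    maximumNumOfRowsPerPartition=10000,
)
hex_df.show(5, truncate=False)       # hexagons as WKT strings
```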

static joinByDistance(df1: pyspark.sql.DataFrame, df2: pyspark.sql.DataFrame, df1Longitude: str, df1Latitude: str, df2Longitude: str, df2Latitude: str, searchRadius: float, distanceUnit: str, geoHashPrecision: int = 7, options: Optional[dict] = None)

A JoinByDistance Operation: This method joins two dataframes using a longitude and latitude column from each dataframe that represent the location of each record. The coordinate values must be in the CoordSysConstants.longLatWGS84 coordinate system. The searchRadius parameter defines the buffer around a point from the first dataframe within which a point from the second dataframe must fall to be joined. The geoHashPrecision parameter controls the geohash precision used within the calculation.

Parameters
  • df1 (pyspark.sql.DataFrame) – The dataframe to join to

  • df2 (pyspark.sql.DataFrame) – The dataframe to be joined

  • df1Longitude (str) – The Longitude value from the first dataframe

  • df1Latitude (str) – The Latitude value from the first dataframe

  • df2Longitude (str) – The Longitude value from the second dataframe

  • df2Latitude (str) – The Latitude value from the second dataframe

  • searchRadius (float) – The buffer length around point 1 to search for point 2

  • distanceUnit (str) – Unit of measurement for searchRadius parameter.

  • geoHashPrecision (int) – The geohash precision value to be used for search, defaults to 7.

  • options (dict) – A key/value map of DistanceJoinOption entries that apply to the join, defaults to None.

Returns

A dataframe that is the result of the join

Return type

pyspark.sql.DataFrame
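
A hedged usage sketch, assuming two existing dataframes with the hypothetical column names shown and that "mi" is an accepted distanceUnit value:

```python
from li.SpatialAPI import SpatialAPI

# stores_df and customers_df are assumed to already exist with
# WGS84 longitude/latitude columns under the names below.
joined = SpatialAPI.joinByDistance(
    df1=stores_df, df2=customers_df,
    df1Longitude="store_lon", df1Latitude="store_lat",
    df2Longitude="cust_lon", df2Latitude="cust_lat",
    searchRadius=5.0,
    distanceUnit="mi",
    geoHashPrecision=7,
)
```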

static pointInPolygon(inputDF: pyspark.sql.DataFrame, tableFileType: str, tableFilePath: str, tableFileName: str, longitude: str, latitude: str, outputFields: list, downloadManager=None, libraries: Optional[str] = None, includeEmptySearchResults: bool = True)

A PointInPolygon Operation: This method filters the input dataframe to the point coordinates that lie within a specified polygon (for example, the polygon of the continental USA) and adds the requested output fields from the polygon table to the input dataset as columns.

Parameters
  • inputDF (pyspark.sql.DataFrame) – dataframe of input dataset

  • tableFileType (str) – Type of target polygon data file (either TAB/shape/geodatabase)

  • tableFilePath (str) – Path to polygon data files

  • tableFileName (str) – Name of the TAB/shape/geodatabase file

  • longitude (str) – Name of column containing longitude values in input point data

  • latitude (str) – Name of column containing latitude values in input point data

  • outputFields (list) – The requested fields to be included in the output

  • downloadManager (DownloadManager) – DownloadManager instance to be used if data is present in S3 or HDFS, defaults to None.

  • libraries (str) – Libraries to use when tableFileType is geodatabase, defaults to None.

  • includeEmptySearchResults (bool) – If true, an input row with an empty search result is kept and the new columns are null; if false, the row does not appear in the output DataFrame. Defaults to true.

Returns

The input DataFrame with the output fields appended as columns for the point coordinates that lie within the specified polygon

Return type

pyspark.sql.DataFrame
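
A hedged usage sketch: points_df is assumed to exist, and the TAB file name, path, and output field names are hypothetical.

```python
from li.SpatialAPI import SpatialAPI

result = SpatialAPI.pointInPolygon(
    inputDF=points_df,
    tableFileType="TAB",
    tableFilePath="/data/boundaries",
    tableFileName="usa.TAB",
    longitude="lon",
    latitude="lat",
    outputFields=["STATE_NAME", "STATE_FIPS"],
    includeEmptySearchResults=False,  # drop points outside the polygon
)
```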

static searchNearest(inputDF: pyspark.sql.DataFrame, tableFileType: str, tableFilePath: str, tableFileName: str, geometryStringType: str, geometryColumnName: str, outputFields: list, distanceValue: float, distanceUnit: str, distanceColumnName: str = 'distance', downloadManager=None, libraries: Optional[str] = None, maxCandidates: int = 1000, includeEmptySearchResults: bool = True)

A SearchNearest Operation: This method takes a geometry string (in GeoJSON, WKT, KML, or WKB format) and searches a table of geometries for those within a specified distance of it. The number of candidate geometries searched can be limited with the maxCandidates parameter. By default, geometries are listed from nearest to farthest.

Parameters
  • inputDF (pyspark.sql.DataFrame) – dataframe of input dataset

  • tableFileType (str) – Type of target polygon data file (either TAB/shape/geodatabase)

  • tableFilePath (str) – Path to polygon data files

  • tableFileName (str) – Name of the TAB/shape/geodatabase file

  • geometryStringType (str) – Type of geometry string provided in input file. Supported values are WKT/GeoJSON/WKB/KML

  • geometryColumnName (str) – Name of column containing string representation of geometry

  • outputFields (list) – The requested fields to be included in the output

  • distanceValue (float) – The absolute value of distance from source geometry within which target geometries will be searched for.

  • distanceUnit (str) – Unit of measurement for distanceValue parameter. This same unit will also be used when appending distance column in output dataframe.

  • distanceColumnName (str) – Name of the distance column in output dataframe which indicates distance between source geometry and target geometry.

  • downloadManager (DownloadManager) – DownloadManager instance to be used if data is present in S3 or HDFS, defaults to None.

  • libraries (str) – Libraries to use when tableFileType is geodatabase, defaults to None.

  • maxCandidates (int) – Limits the count of target geometries to search for, defaults to 1000.

  • includeEmptySearchResults (bool) – If true, an input row with an empty search result is kept and the new columns are null; if false, the row does not appear in the output DataFrame. Defaults to true.

Returns

The input DataFrame with the output fields appended as columns for each target geometry whose distance from the source geometry is within distanceValue. An additional column named distanceColumnName indicates the distance between the source and target geometry, and records are ordered by ascending value of this column.

Return type

pyspark.sql.DataFrame
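
A hedged usage sketch: sites_df is assumed to hold a WKT geometry column, and the file names, field names, and unit value are hypothetical.

```python
from li.SpatialAPI import SpatialAPI

nearest = SpatialAPI.searchNearest(
    inputDF=sites_df,
    tableFileType="TAB",
    tableFilePath="/data/poi",
    tableFileName="restaurants.TAB",
    geometryStringType="WKT",
    geometryColumnName="geometry",
    outputFields=["NAME", "CATEGORY"],
    distanceValue=2.0,
    distanceUnit="km",
    distanceColumnName="distance_km",  # distance reported in km
    maxCandidates=10,
)
```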

SQLRegistrator

class li.SQLRegistrator.SQLRegistrator
static registerAll()

Registers the pre-defined LI SQL UDF and UDT functions to execute the SQL operations.

Param

None

Returns

None

Return type

None
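
A minimal sketch of registration; the specific UDF/UDT names made available are defined by the SDK and are not listed here.

```python
from li.SQLRegistrator import SQLRegistrator

# Register the pre-defined LI SQL UDF and UDT functions once per session.
SQLRegistrator.registerAll()

# After registration, the LI functions can be used inside
# spark.sql(...) queries.
```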

DistanceJoinOption

class li.DistanceJoinOption.DistanceJoinOption

Options for the distance join operations.

DistanceColumnName

Adds a column to the result dataframe that contains the distance calculated.

LimitMatches

Limits the number of joined results for each source dataframe record. The argument should be a number, and the match results will be limited to those that rank at the number or lower based on distance. Default is no limit.

LimitMethod

The method used for ranking matches. The argument should be a LimitMethods value.

LimitMethods

class li.LimitMethods.LimitMethods

Options for providing the value of DistanceJoinOption.LimitMethod.

DenseRank

A DenseRank window function for limiting matches.

Rank

A Rank window function for limiting matches.

RowNumber

A RowNumber window function for limiting matches.
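
These options can be combined into the dict passed to joinByDistance. A hedged sketch, assuming the DistanceJoinOption and LimitMethods attributes are used directly as dict keys and values (dataframe and column names are hypothetical):

```python
from li.SpatialAPI import SpatialAPI
from li.DistanceJoinOption import DistanceJoinOption
from li.LimitMethods import LimitMethods

joined = SpatialAPI.joinByDistance(
    df1=stores_df, df2=customers_df,
    df1Longitude="store_lon", df1Latitude="store_lat",
    df2Longitude="cust_lon", df2Latitude="cust_lat",
    searchRadius=5.0, distanceUnit="mi",
    options={
        DistanceJoinOption.DistanceColumnName: "distance",   # add distance column
        DistanceJoinOption.LimitMatches: 3,                  # keep 3 nearest matches
        DistanceJoinOption.LimitMethod: LimitMethods.RowNumber,
    },
)
```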

DownloadManagerBuilder

class li.DownloadManagerBuilder.DownloadManagerBuilder(downloadLocation=None, permissions=None)

This builder class configures downloading from remote paths.

addDownloader(downloader)

Adds a configured download manager to use when downloading from remote paths. If multiple download managers claim to support a path, then the download manager added first will be used.

Parameters

downloader – The downloader configuration for remote path.

Returns

An object of DownloadManagerBuilder class

Return type

DownloadManagerBuilder

build()

Returns the configured DownloadManager used when downloading from remote paths.

Returns

An object of configured DownloadManager class

Return type

DownloadManager
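
A hedged sketch of building a DownloadManager. It assumes addDownloader accepts the object returned by a downloader's getDownloader() method; the download location is hypothetical.

```python
from li.DownloadManagerBuilder import DownloadManagerBuilder
from li.HadoopConfiguration import HadoopConfiguration
from li.S3Downloader import S3Downloader
from li.LocalFilePassthroughDownloader import LocalFilePassthroughDownloader

hadoop_conf = HadoopConfiguration()

# Downloaders are consulted in the order they were added: if multiple
# downloaders claim a path, the first one added wins.
download_manager = (
    DownloadManagerBuilder(downloadLocation="/tmp/li-data")
    .addDownloader(S3Downloader(hadoop_conf).getDownloader())
    .addDownloader(LocalFilePassthroughDownloader().getDownloader())
    .build()
)
```

The resulting download_manager can then be passed as the downloadManager argument of pointInPolygon or searchNearest.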

S3Downloader

class li.S3Downloader.S3Downloader(hadoopConfiguration)

This Downloader class enables the use of data located in S3.

getDownloader()

Returns the object reference of S3 Downloader responsible for downloading the data from S3.

Returns

An object of S3Downloader

Return type

S3Downloader

GoogleDownloader

class li.GoogleDownloader.GoogleDownloader(hadoopConfiguration)

This Downloader class enables the use of data located in Google Storage.

getDownloader()

Returns the object reference of Google Downloader responsible for downloading the data from Google Storage.

Returns

An object of GoogleDownloader

Return type

GoogleDownloader

HDFSDownloader

class li.HDFSDownloader.HDFSDownloader(hadoopConfiguration)

This Downloader class enables the use of data located in HDFS.

getDownloader()

Returns the object reference of HDFS Downloader responsible for downloading the data from HDFS.

Returns

An object of HDFSDownloader

Return type

HDFSDownloader

LocalFilePassthroughDownloader

class li.LocalFilePassthroughDownloader.LocalFilePassthroughDownloader

This Downloader class enables the use of data located on the Local File System.

getDownloader()

Returns the object reference of Local Downloader responsible for downloading the data from Local File System.

Returns

An object of LocalFilePassthroughDownloader

Return type

LocalFilePassthroughDownloader

HadoopConfiguration

class li.HadoopConfiguration.HadoopConfiguration

This configuration class allows you to create a HadoopConfiguration for use with the Downloader classes.

getHadoopConfiguration()

Returns the wrapped HadoopConfiguration class.

Returns

An object of HadoopConfiguration

Return type

HadoopConfiguration
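
A hedged sketch of tuning the wrapped configuration before handing it to a downloader. The fs.s3a.* property names come from hadoop-aws, not from this SDK, and the set(...) call assumes the wrapped object exposes Hadoop's Configuration.set method.

```python
from li.HadoopConfiguration import HadoopConfiguration

conf = HadoopConfiguration()

# Set S3 credentials on the underlying Hadoop Configuration, then
# pass conf to S3Downloader(conf).
hconf = conf.getHadoopConfiguration()
hconf.set("fs.s3a.access.key", "<access-key>")
hconf.set("fs.s3a.secret.key", "<secret-key>")
```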