R is amazing at geospatial analysis. Most geospatial objects import easily, but some objects from some formats do not. This post is about importing a GPS track from a Keyhole Markup Language (KML) file that would not import using sf::st_read().
Background
KML files are a special kind of Extensible Markup Language (XML) file used to store geospatial information (points, lines, polygons, etc.). The format was originally developed by Keyhole, Inc., which Google later acquired. KML files, like all XML files, are simply text files. They contain markup tags that identify and store data in a hierarchical, tree-like structure. While the virtues of KML and the general structure of XML are beyond the scope of this post, I can say that one key feature is that the files are readable and editable if you know what you are doing.
Recently, I received a KML file containing GPS tracks collected by and exported from the navigation system of an airplane. When sf::st_read() failed to read the track, I was at a loss, so I reluctantly opened the file in a text editor and had a look around. Without much trouble, I found tags for a waypoint inside the KML in addition to the track. sf::st_read() read this waypoint just fine, but it completely ignored the ~1500 other points that were part of the GPS track.
I am no expert on KML or XML. I do not fully understand namespaces defined inside the XML. Luckily, my training at Google University allowed me to learn that my track was stored as a gx:Track object (inside the KML). I learned this because I found the <gx:Track> tag embedded inside a <Placemark> tag. This is what the KML contained (note the <gx:Track> tag on line 16):
If the coordinates of the track had been embedded in a <LineString> tag and reformatted a bit, sf::st_read() would have read them. I learned this by inspecting a KML file that imported correctly into R as a line. But the coordinate format for LineString differs from that for gx:Track. LineString coordinates are comma separated (e.g., -139.668256,59.508059,23.2) and are embedded in a single <coordinates> tag, while gx:Track coordinates are separated by spaces (e.g., -139.668256 59.508059 23.2 as above), with each point in its own <gx:coord> tag.
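To make the difference between the two formats concrete, here is a minimal base R sketch that rewrites gx:Track-style coordinate strings into the comma-separated tuples a <coordinates> tag expects. The helper name gxCoordToLineString is hypothetical (it is not part of sf or xml2), and the second coordinate is made up for illustration.

```r
# Hypothetical helper (not part of any package): convert gx:coord strings
# ("lon lat alt") into the "lon,lat,alt" tuples used inside <coordinates>.
gxCoordToLineString <- function(gxCoords) {
  # Replace the whitespace inside each tuple with commas.
  tuples <- gsub("\\s+", ",", trimws(gxCoords))
  # Tuples inside <coordinates> are separated from each other by whitespace.
  paste(tuples, collapse = " ")
}

# First value is the example from the text; the second is invented.
gxCoords <- c("-139.668256 59.508059 23.2", "-139.668300 59.508100 23.5")
gxCoordToLineString(gxCoords)
```

This only illustrates the text-level difference. It is not what I ended up doing, because editing the KML by hand would also drop the time stamps.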
My Solution
I read the KML file as XML, extracted geographic coordinates, extracted time stamps, made a data frame, and converted the data frame to an sf LINESTRING object using sfheaders. The difficult part for me was finding the <gx:coord> and <when> tags in the XML tree. This was difficult only because I am a sophomoric XML user. The key was realizing I either had to strip the namespaces from the XML (using xml2::xml_ns_strip()) or include namespaces in references to nodes (using <ns>:<nodeName>).
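The two namespace options can be sketched on a tiny, made-up KML string. This assumes the xml2 package is installed. The KML content and object names below are illustrative, not taken from my file. As far as I can tell, xml2::xml_ns_strip() removes only the default namespace, so the gx: prefix is still needed afterward.

```r
library(xml2)

# Made-up, minimal KML with one gx:Track; values are illustrative only.
kmlText <- '<kml xmlns="http://www.opengis.net/kml/2.2"
     xmlns:gx="http://www.google.com/kml/ext/2.2">
  <Document>
    <Placemark>
      <gx:Track>
        <when>2023-06-01T17:41:42Z</when>
        <when>2023-06-01T17:41:52Z</when>
        <gx:coord>-145.4756 60.49422 13.7</gx:coord>
        <gx:coord>-145.4757 60.49423 13.9</gx:coord>
      </gx:Track>
    </Placemark>
  </Document>
</kml>'

doc <- read_xml(kmlText)

# Option 1: keep namespaces and prefix every node name in the XPath.
# xml2 labels the default (KML) namespace "d1"; "gx" is declared in the file.
xml_ns(doc)
whenNs <- xml_find_all(doc, "//d1:Placemark/gx:Track/d1:when")
coordNs <- xml_find_all(doc, "//d1:Placemark/gx:Track/gx:coord")

# Option 2: strip the default namespace so unprefixed names match.
# xml_ns_strip() modifies `doc` in place, so do this after Option 1.
xml_ns_strip(doc)
whenStripped <- xml_find_all(doc, "//Placemark/gx:Track/when")
coordStripped <- xml_find_all(doc, "//Placemark/gx:Track/gx:coord")

xml_text(whenStripped)
xml_text(coordStripped)
```

Both options should find the same two <when> nodes and two <gx:coord> nodes. I went with the first option because it leaves the document untouched.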
Here is the function I wrote to import gx:Track into R as either LINESTRING or POINT objects. The xml2, dplyr, tidyr, sf, and sfheaders packages are required, along with magrittr for the %>% pipe.
#' @title extractTrackFromKML - Extract GPS track from KML file.
#'
#' @description
#' Read a KML file, extract `gx:Track` objects, and convert them into
#' either `sf` LINESTRING or POINT objects.
#'
#' @param kmlFile Path to the KML file to parse.
#'
#' @param crs The EPSG code for the Coordinate Reference System to use.
#'
#' @param to Format of the returned 'sf' object. Either 'line' or 'points'.
#' 'line' returns a LINESTRING object without timestamps on individual points.
#' 'points' returns a POINT object with timestamps as attributes.
#'
#' @details
#' It is assumed that all coordinates in the KML are decimal latitude-longitude.
#'
#' @return Either an 'sf' LINESTRING or POINT object. LINESTRING objects only
#' have start and stop times. POINT objects have time stamps as attributes.
#'
#' @examples
#'
#' kmlFile <- "example.kml"
#' kmlLine <- extractTrackFromKML(kmlFile, to = "line")
#' kmlPoints <- extractTrackFromKML(kmlFile, to = "points")
#'
#' @export
#' @importFrom magrittr %>%
extractTrackFromKML <- function( kmlFile
                               , to = "line"
                               , crs = 4326){

  xmlDoc <- xml2::read_xml(kmlFile)

  # xml2 assigns the prefix "d1" to the default (KML) namespace.
  whenNodes <- xml2::xml_find_all(xmlDoc, "/d1:kml/d1:Document/d1:Placemark/gx:Track/d1:when")
  coordsNodes <- xml2::xml_find_all(xmlDoc, "/d1:kml/d1:Document/d1:Placemark/gx:Track/gx:coord")

  whenNodes <- xml2::xml_text(whenNodes)
  coordsNodes <- xml2::xml_text(coordsNodes)

  # Times are parsed in the session's time zone; text after the seconds
  # (e.g., a trailing "Z") is ignored by this format.
  whenNodes <- data.frame(time = whenNodes) %>%
    dplyr::mutate(time = as.POSIXct(time, format = "%Y-%m-%dT%H:%M:%S"))

  # Each gx:coord holds "lon lat altitude" separated by single spaces.
  coordsNodes <- data.frame(coords = coordsNodes) %>%
    tidyr::separate(coords, into = c("lon", "lat", "elev_ft"), sep = " ") %>%
    dplyr::mutate(dplyr::across(dplyr::everything(), as.numeric))

  trackDf <- dplyr::bind_cols(coordsNodes, whenNodes)

  if( to == "line" ){
    # One LINESTRING for the whole track; keep only start and end times.
    minTime <- min(trackDf$time)
    maxTime <- max(trackDf$time)
    trackDf <- trackDf %>%
      sfheaders::sf_linestring( x = "lon"
                              , y = "lat"
                              , z = "elev_ft" ) %>%
      dplyr::mutate( startTime = minTime
                   , endTime = maxTime )
  } else {
    # One POINT per fix; keep = TRUE retains the time column as an attribute.
    trackDf <- trackDf %>%
      sfheaders::sf_point( x = "lon"
                         , y = "lat"
                         , z = "elev_ft"
                         , keep = TRUE )
  }

  trackDf <- trackDf %>%
    sf::st_set_crs(crs)

  # Return visibly so the result prints when it is not assigned.
  trackDf
}
After sourcing that function, or incorporating it into a package, these are the results it gives on an example KML file.
Simple feature collection with 16858 features and 1 field
Geometry type: POINT
Dimension: XYZ
Bounding box: xmin: -145.9868 ymin: 59.99269 xmax: -143.6651 ymax: 60.51434
z_range: zmin: 13.7 zmax: 518.6
Geodetic CRS: WGS 84
First 10 features:
                  time                       geometry
1  2023-06-01 17:41:42 POINT Z (-145.4756 60.49422...
2  2023-06-01 17:41:52 POINT Z (-145.4756 60.49422...
3  2023-06-01 17:43:04 POINT Z (-145.4756 60.49421...
4  2023-06-01 17:43:05 POINT Z (-145.4756 60.49421...
5  2023-06-01 17:43:06 POINT Z (-145.4756 60.49421...
6  2023-06-01 17:43:08 POINT Z (-145.4756 60.49423...
7  2023-06-01 17:43:09 POINT Z (-145.4756 60.49423...
8  2023-06-01 17:43:10 POINT Z (-145.4756 60.49423...
9  2023-06-01 17:43:11 POINT Z (-145.4756 60.49423...
10 2023-06-01 17:43:12 POINT Z (-145.4756 60.49423...
plot(kmlPoints$geometry, col = rainbow(300), main = "Points from KML", cex.main = 3)
If there is a more direct solution, for example if sf::st_read will work somehow, please send me an email.