Map matching of taxi trajectory data under offline OSM data conditions

Map matching of taxi trajectory data under offline OSM data conditions

Preface: In the process of studying the road network matching of Python taxi GPS data (TransBigData + leuvenmapmatching) written by senior Xiaoxu of traffic data, I found that the osmnx library is difficult to obtain a complete road network of a city or a larger area due to network restrictions. network data. In order to solve this problem, I used the osm2gmns library to build a network and perform map matching, achieving the same effect as Senior Xiaoxu’s traffic data.

01 Obtaining OSM road network data

There are many ways to obtain OSM road network data, but it is difficult to obtain data for designated cities or provinces. Generally, it is either too large (country) or too small (urban area). Therefore, I first collect data in a larger range and then proceed. Processed by cropping method. The specific processing methods are as follows:
Log in to the OSM website, select [Geofabrik Download] (regularly updated excerpts for continents, countries and specific cities). Select the region you want to select and select .osm.pbf download.
OSM interface
Select download

02 Use osmconvert software to cut data

Get the shp file of the required boundaries before use, such as the regional scope of Shenzhen City. Make sure it is in WGS84 coordinate system

Generally this kind of data is easy to find, I recommend one,

Then write the following code: Build .poly file

Enter the .shp file and use the following code to generate the Shenzhen boundary file in .poly format, and then make sure it is in the same folder as osmconvert64-0.8.8p.exe.

import geopandas as gpd
import os

from shapely.geometry import Polygon, MultiPolygon

def write_poly(polygons, output_file):
    with open(output_file, "w") as f:
        c=1
        for polygon in polygons:
            if isinstance(polygon, Polygon):
                poly_ext = list(polygon.exterior.coords)
                f.write("{}\\
".format(c))
                f.write("1\\
")
                for coord in poly_ext:
                    f.write("\t{} {}\\
".format(coord[0], coord[1]))
                f.write("END\\
")
            elif isinstance(polygon, MultiPolygon):
                for poly in polygon.geoms:
                    poly_ext = list(poly.exterior.coords)
                    f.write("{}\\
".format(c))
                    f.write("1\\
")
                    for coord in poly_ext:
                        f.write("\t{} {}\\
".format(coord[0], coord[1]))
                    f.write("END\\
")
                    c + = 1
        f.write("END\\
")

shp_file = "Shenzhen City.shp"
output_file = "shenzhen_polyfile.poly"

gdf = gpd.read_file(shp_file)
crs_4326 = {<!-- -->"init": "epsg:4326"}
gdf = gdf.to_crs(crs_4326) # Convert coordinate system to WGS84 (EPSG:4326) if necessary
polygons = gdf.geometry
write_poly(polygons, output_file)

Use osmconvert to extract Shenzhen road network

First get the osmconvert software. There are many ways to get it. One is recommended. Then run osmconvert and enter the following interface osmconvert interface
Then enter a and enter the name of the previously downloaded .osm.pbf file (note: the file must also be in the same folder as the software) Insert picture description here
After pressing enter, select 4, and then enter the border area .poly file just prepared
After pressing Enter, you can get the processed .osm.pbf file that only contains the Shenzhen range.

03 Road network data processing

001 First convert the osm road network data in .pbf format into links and nodes

What is chosen here is the osm2gmns library, which can selectively clean the osm road network, select the type of road you want and build it into a road network. You can go to the official website of the library to learn about the specific parameters. It will output a link.csv and node.csv, which is the key to building a road network

import osm2gmns as og
net = og.getNetFromFile('shenzhen-latest.osm_01.pbf', network_types='auto',link_types={<!-- -->'motorway', 'trunk', ' primary', 'secondary', 'tertiary', 'residential', 'service', 'unclassified', 'connector'},strict_mode=False,combine=True,)
og.consolidateComplexIntersections(net, auto_identify=True)
og.outputNetToCSV(net)

002 Construction diagram

import pandas as pd
import networkx as nx

node_data = pd.read_csv('node.csv', encoding='gbk')
link_data = pd.read_csv('link.csv', encoding='gbk')

G = nx.Graph()
G.graph["crs"] = "epsg:4326"

for idx, row in node_data.iterrows():
    node_id = row['node_id']
    node_lon = row['x_coord']
    node_lat = row['y_coord']
    G.add_node(node_id, lon=node_lon, lat=node_lat)

for idx, row in link_data.iterrows():
    link_id = row['link_id']
    link_from = row['from_node_id']
    link_to = row['to_node_id']
    G.add_edge(link_from, link_to, id=link_id)

003 GPS data reading

The data here is obtained from the link shared in the road network matching of Python taxi GPS data (TransBigData + leuvenmapmatching) written by Xiaoxu, a traffic data senior. You can download it yourself.

import transbigdata as tbd
import pandas as pd
# Map matching package
from leuvenmapmatching.matcher.distance import DistanceMatcher
from leuvenmapmatching.map.inmem import InMemMap
from leuvenmapmatching import visualization as mmviz
 
#Read data
data = pd.read_csv('TaxiData-Sample.csv', header = None)
data.columns = ['VehicleNum','Time','Lng','Lat','OpenStatus','Speed']

#Extract OD and path GPS from GPS data
oddata = tbd.taxigps_to_od(data,col = ['VehicleNum','Time','Lng','Lat','OpenStatus'])
data_deliver,data_idle = tbd.taxigps_traj_point(data,oddata,col=['VehicleNum', 'Time', 'Lng', 'Lat', 'OpenStatus'])

004 Build the map network required for leuvenmapmatching (for map matching)

from leuvenmapmatching.map.inmem import InMemMap

#Create an InMemMap object
map_con = InMemMap(G, use_latlon=True, use_rtree=True, index_edges=True)

# Traverse link_data to add nodes and edges
for _, row in link_data.iterrows():
    link_id = row['link_id']
    link_from = row['from_node_id']
    link_to = row['to_node_id']

    #Extract the latitude and longitude of the node
    lat1, lon1 = G.nodes[link_from]['lat'], G.nodes[link_from]['lon']
    lat2, lon2 = G.nodes[link_to]['lat'], G.nodes[link_to]['lon']

    #Add nodes in map_con
    map_con.add_node(link_from, (lat1, lon1))
    map_con.add_node(link_to, (lat2, lon2))

    # Connect these two nodes
    map_con.add_edge(link_from, link_to)

# Clear unnecessary information to improve performance
map_con.purge()

005 Map Match

#Use transbigdata to extract travel trajectories
import geopandas as gpd
tmp_gdf = data_deliver[data_deliver['ID'] == 22].sort_values(by = 'Time')
#trajectorydensification
tmp_gdf = tbd.traj_densify(tmp_gdf,col = ['ID', 'Time', 'Lng', 'Lat'],timegap = 15)
#Convert the coordinate system of the trajectory to a geographical coordinate system
tmp_gdf['geometry'] = gpd.points_from_xy(tmp_gdf['Lng'],tmp_gdf['Lat'])
tmp_gdf = gpd.GeoDataFrame(tmp_gdf)
tmp_gdf.crs = {<!-- -->'init':'epsg:4326'}
#tmp_gdf = tmp_gdf.to_crs(2416)
#Get track points
path = list(zip(tmp_gdf.geometry.y, tmp_gdf.geometry.x))
#Build map matching tool
matcher = DistanceMatcher(map_con,
                          max_dist=500,
                          max_dist_init=170,
                          min_prob_norm=0.0001,
                        non_emitting_length_factor=0.95,
                        obs_noise=50,
                          obs_noise_ne=50,
                              dist_noise=50,
                              max_lattice_width=20,
                              non_emitting_states=True)
#Perform map matching
states, _ = matcher.match(path, unique=False)
#Draw basemap matching results
mmviz.plot_map(map_con, matcher=matcher,
               show_labels=False, show_matching=True,#show_graph=False,
               filename=None)

Obtain the following result chart

006 Get map matching path

Note: This method obtains the node number of each matched road segment through matcher.path_pred_onlynodes, named u. Since the road segments are connected before and after, the starting and ending node numbers of each road segment are obtained after misalignment and merging.

#This method obtains the node number of each matched road segment through matcher.path_pred_onlynodes, named u. Since the road segments are connected before and after, the starting and ending node numbers of each road segment are obtained after dislocation and merging.
pathdf = pd.DataFrame(matcher.path_pred_onlynodes,columns = ['u'])
pathdf['v'] = pathdf['u'].shift(-1)
pathdf = pathdf[-pathdf['v'].isnull()]
#Use pd.merge to match the start and end nodes of the link table to obtain the geographical information of the matching road segment
from shapely import wkt
# You need to ensure that the "geometry" column in the CSV file has stored geometry data in a suitable format (for example, in "Well-Known Text (WKT) format)
link_data['geometry'] = link_data['geometry'].apply(wkt.loads)
pathgdf = pd.merge(pathdf,link_data.reset_index(),left_on=['u','v'],right_on=['from_node_id','to_node_id'])
pathgdf = gpd.GeoDataFrame(pathgdf,geometry='geometry')#Generate Geopandas file
pathgdf.plot()

The running results are as follows: the matching road segments

007 Overlay road network visualization

First, we need to obtain the center points of all road network segments, so that we can later filter the segments to be displayed.

#link_data['geometry'] = link_data['geometry'].apply(wkt.loads)
#Get the center points of all line segments
from shapely.geometry import Point

# Create empty lists to store longitude and latitude values
lons = []
lats = []

# For each row in the table
for geometry in link_data['geometry']:
    # Calculate center point
    centroid = geometry.centroid

    # Add longitude and latitude values to the corresponding lists
    lons.append(centroid.x)
    lats.append(centroid.y)

# Add longitude and latitude values to the table
link_data['lon'] = lons
link_data['lat'] = lats

Then perform trajectory matching visualization

#Visualize with the road network
import matplotlib as mpl
import matplotlib.pyplot as plt
 
fig = plt.figure(1,(8,8),dpi = 100)
ax = plt.subplot(111)
plt.sca(ax)
fig.tight_layout(rect = (0.05,0.1,1,0.9))
#Set visual boundaries
bounds = pathgdf.unary_union.bounds
gap = 0.003
bounds = [bounds[0]-gap,bounds[1]-gap,bounds[2] + gap,bounds[3] + gap]
#Draw matching paths
pathgdf.plot(ax = ax,zorder = 1)
#Draw base map road network

link_data = gpd.GeoDataFrame(link_data,geometry='geometry')
tbd.clean_outofbounds(link_data,bounds,col = ['lon','lat']).plot(ax = ax,color = '#333',lw = 0.1)
#Draw GPS points
tmp_gdf.to_crs(4326).plot(ax = ax,color = 'r',markersize = 5,zorder = 2)
plt.axis('off')
plt.xlim(bounds[0],bounds[2])
plt.ylim(bounds[1],bounds[3])
plt.show()

Finally, I would like to express my gratitude to the predecessors for their selfless sharing!