Map matching of taxi trajectory data under offline OSM data conditions
Preface: In the process of studying the road network matching of Python taxi GPS data (TransBigData + leuvenmapmatching) written by senior Xiaoxu of traffic data, I found that the osmnx library is difficult to obtain a complete road network of a city or a larger area due to network restrictions. network data. In order to solve this problem, I used the osm2gmns library to build a network and perform map matching, achieving the same effect as Senior Xiaoxu’s traffic data.
01 Obtaining OSM road network data
There are many ways to obtain OSM road network data, but it is difficult to obtain data for designated cities or provinces. Generally, it is either too large (country) or too small (urban area). Therefore, I first collect data in a larger range and then proceed. Processed by cropping method. The specific processing methods are as follows:
Log in to the OSM website, select [Geofabrik Download] (regularly updated excerpts for continents, countries and specific cities). Select the region you want to select and select .osm.pbf download.
02 Use osmconvert software to cut data
Get the shp file of the required boundaries before use, such as the regional scope of Shenzhen City. Make sure it is in WGS84 coordinate system
Generally this kind of data is easy to find, I recommend one,
Then write the following code: Build .poly file
Enter the .shp file and use the following code to generate the Shenzhen boundary file in .poly format, and then make sure it is in the same folder as osmconvert64-0.8.8p.exe.
import geopandas as gpd import os from shapely.geometry import Polygon, MultiPolygon def write_poly(polygons, output_file): with open(output_file, "w") as f: c=1 for polygon in polygons: if isinstance(polygon, Polygon): poly_ext = list(polygon.exterior.coords) f.write("{}\\ ".format(c)) f.write("1\\ ") for coord in poly_ext: f.write("\t{} {}\\ ".format(coord[0], coord[1])) f.write("END\\ ") elif isinstance(polygon, MultiPolygon): for poly in polygon.geoms: poly_ext = list(poly.exterior.coords) f.write("{}\\ ".format(c)) f.write("1\\ ") for coord in poly_ext: f.write("\t{} {}\\ ".format(coord[0], coord[1])) f.write("END\\ ") c + = 1 f.write("END\\ ") shp_file = "Shenzhen City.shp" output_file = "shenzhen_polyfile.poly" gdf = gpd.read_file(shp_file) crs_4326 = {<!-- -->"init": "epsg:4326"} gdf = gdf.to_crs(crs_4326) # Convert coordinate system to WGS84 (EPSG:4326) if necessary polygons = gdf.geometry write_poly(polygons, output_file)
Use osmconvert to extract Shenzhen road network
First get the osmconvert software. There are many ways to get it. One is recommended. Then run osmconvert and enter the following interface
Then enter a and enter the name of the previously downloaded .osm.pbf file (note: the file must also be in the same folder as the software)
After pressing enter, select 4, and then enter the border area .poly file just prepared
After pressing Enter, you can get the processed .osm.pbf file that only contains the Shenzhen range.
03 Road network data processing
001 First convert the osm road network data in .pbf format into links and nodes
What is chosen here is the osm2gmns library, which can selectively clean the osm road network, select the type of road you want and build it into a road network. You can go to the official website of the library to learn about the specific parameters. It will output a link.csv and node.csv, which is the key to building a road network
import osm2gmns as og net = og.getNetFromFile('shenzhen-latest.osm_01.pbf', network_types='auto',link_types={<!-- -->'motorway', 'trunk', ' primary', 'secondary', 'tertiary', 'residential', 'service', 'unclassified', 'connector'},strict_mode=False,combine=True,) og.consolidateComplexIntersections(net, auto_identify=True) og.outputNetToCSV(net)
002 Construction diagram
import pandas as pd import networkx as nx node_data = pd.read_csv('node.csv', encoding='gbk') link_data = pd.read_csv('link.csv', encoding='gbk') G = nx.Graph() G.graph["crs"] = "epsg:4326" for idx, row in node_data.iterrows(): node_id = row['node_id'] node_lon = row['x_coord'] node_lat = row['y_coord'] G.add_node(node_id, lon=node_lon, lat=node_lat) for idx, row in link_data.iterrows(): link_id = row['link_id'] link_from = row['from_node_id'] link_to = row['to_node_id'] G.add_edge(link_from, link_to, id=link_id)
003 GPS data reading
The data here is obtained from the link shared in the road network matching of Python taxi GPS data (TransBigData + leuvenmapmatching) written by Xiaoxu, a traffic data senior. You can download it yourself.
import transbigdata as tbd import pandas as pd # Map matching package from leuvenmapmatching.matcher.distance import DistanceMatcher from leuvenmapmatching.map.inmem import InMemMap from leuvenmapmatching import visualization as mmviz #Read data data = pd.read_csv('TaxiData-Sample.csv', header = None) data.columns = ['VehicleNum','Time','Lng','Lat','OpenStatus','Speed'] #Extract OD and path GPS from GPS data oddata = tbd.taxigps_to_od(data,col = ['VehicleNum','Time','Lng','Lat','OpenStatus']) data_deliver,data_idle = tbd.taxigps_traj_point(data,oddata,col=['VehicleNum', 'Time', 'Lng', 'Lat', 'OpenStatus'])
004 Build the map network required for leuvenmapmatching (for map matching)
from leuvenmapmatching.map.inmem import InMemMap #Create an InMemMap object map_con = InMemMap(G, use_latlon=True, use_rtree=True, index_edges=True) # Traverse link_data to add nodes and edges for _, row in link_data.iterrows(): link_id = row['link_id'] link_from = row['from_node_id'] link_to = row['to_node_id'] #Extract the latitude and longitude of the node lat1, lon1 = G.nodes[link_from]['lat'], G.nodes[link_from]['lon'] lat2, lon2 = G.nodes[link_to]['lat'], G.nodes[link_to]['lon'] #Add nodes in map_con map_con.add_node(link_from, (lat1, lon1)) map_con.add_node(link_to, (lat2, lon2)) # Connect these two nodes map_con.add_edge(link_from, link_to) # Clear unnecessary information to improve performance map_con.purge()
005 Map Match
#Use transbigdata to extract travel trajectories import geopandas as gpd tmp_gdf = data_deliver[data_deliver['ID'] == 22].sort_values(by = 'Time') #trajectorydensification tmp_gdf = tbd.traj_densify(tmp_gdf,col = ['ID', 'Time', 'Lng', 'Lat'],timegap = 15) #Convert the coordinate system of the trajectory to a geographical coordinate system tmp_gdf['geometry'] = gpd.points_from_xy(tmp_gdf['Lng'],tmp_gdf['Lat']) tmp_gdf = gpd.GeoDataFrame(tmp_gdf) tmp_gdf.crs = {<!-- -->'init':'epsg:4326'} #tmp_gdf = tmp_gdf.to_crs(2416) #Get track points path = list(zip(tmp_gdf.geometry.y, tmp_gdf.geometry.x)) #Build map matching tool matcher = DistanceMatcher(map_con, max_dist=500, max_dist_init=170, min_prob_norm=0.0001, non_emitting_length_factor=0.95, obs_noise=50, obs_noise_ne=50, dist_noise=50, max_lattice_width=20, non_emitting_states=True) #Perform map matching states, _ = matcher.match(path, unique=False) #Draw basemap matching results mmviz.plot_map(map_con, matcher=matcher, show_labels=False, show_matching=True,#show_graph=False, filename=None)
Obtain the following result chart
006 Get map matching path
Note: This method obtains the node number of each matched road segment through matcher.path_pred_onlynodes, named u. Since the road segments are connected before and after, the starting and ending node numbers of each road segment are obtained after misalignment and merging.
#This method obtains the node number of each matched road segment through matcher.path_pred_onlynodes, named u. Since the road segments are connected before and after, the starting and ending node numbers of each road segment are obtained after dislocation and merging. pathdf = pd.DataFrame(matcher.path_pred_onlynodes,columns = ['u']) pathdf['v'] = pathdf['u'].shift(-1) pathdf = pathdf[-pathdf['v'].isnull()] #Use pd.merge to match the start and end nodes of the link table to obtain the geographical information of the matching road segment from shapely import wkt # You need to ensure that the "geometry" column in the CSV file has stored geometry data in a suitable format (for example, in "Well-Known Text (WKT) format) link_data['geometry'] = link_data['geometry'].apply(wkt.loads) pathgdf = pd.merge(pathdf,link_data.reset_index(),left_on=['u','v'],right_on=['from_node_id','to_node_id']) pathgdf = gpd.GeoDataFrame(pathgdf,geometry='geometry')#Generate Geopandas file pathgdf.plot()
The running results are as follows: the matching road segments
007 Overlay road network visualization
First, we need to obtain the center points of all road network segments, so that we can later filter the segments to be displayed.
#link_data['geometry'] = link_data['geometry'].apply(wkt.loads) #Get the center points of all line segments from shapely.geometry import Point # Create empty lists to store longitude and latitude values lons = [] lats = [] # For each row in the table for geometry in link_data['geometry']: # Calculate center point centroid = geometry.centroid # Add longitude and latitude values to the corresponding lists lons.append(centroid.x) lats.append(centroid.y) # Add longitude and latitude values to the table link_data['lon'] = lons link_data['lat'] = lats
Then perform trajectory matching visualization
#Visualize with the road network import matplotlib as mpl import matplotlib.pyplot as plt fig = plt.figure(1,(8,8),dpi = 100) ax = plt.subplot(111) plt.sca(ax) fig.tight_layout(rect = (0.05,0.1,1,0.9)) #Set visual boundaries bounds = pathgdf.unary_union.bounds gap = 0.003 bounds = [bounds[0]-gap,bounds[1]-gap,bounds[2] + gap,bounds[3] + gap] #Draw matching paths pathgdf.plot(ax = ax,zorder = 1) #Draw base map road network link_data = gpd.GeoDataFrame(link_data,geometry='geometry') tbd.clean_outofbounds(link_data,bounds,col = ['lon','lat']).plot(ax = ax,color = '#333',lw = 0.1) #Draw GPS points tmp_gdf.to_crs(4326).plot(ax = ax,color = 'r',markersize = 5,zorder = 2) plt.axis('off') plt.xlim(bounds[0],bounds[2]) plt.ylim(bounds[1],bounds[3]) plt.show()
Finally, I would like to express my gratitude to the predecessors for their selfless sharing!