This repository has been archived by the owner on Oct 14, 2022. It is now read-only.
planetdiff
==========

Generates a file containing the differences between two planet.osm
dumps.

The program supports .gz and .bz2 compressed files transparently. It
also runs an internal version of UTF8sanitizer on the input data, so it
can be used on a file downloaded from http://planet.openstreetmap.org
without any other manipulation.

Note that the algorithm relies on the strict ordering of data in the
planet.osm file to operate correctly. Data produced by other OSM tools
normally does not follow these rules and cannot be processed with this
program.

Build requirements
------------------

The code relies on the libraries below:

- libxml2
- bzip2
- zlib

To compile this code on Fedora you need at least the following packages
installed:

- libxml2-devel
- bzip2-devel
- zlib-devel

Compiling
---------

On a Linux or other Unix-like system:

    $ make

This will produce both planetdiff and planetpatch (described below).

Data ordering rules
-------------------

The input OSM file must obey the following rules to work with the
current algorithms. The planet.osm export script used to generate the
planet.osm dumps does conform to these rules (whether by accident or
design).

- The OSM file must be generated in node, segment, way order.
- The ID of each object of a given type (e.g. nodes) must be increasing.

Diff file format
----------------

The diff format is an XML file containing OSM objects to delete and add.
Objects which are modified have both a delete and an add section. The
format of each section is a copy of the OSM object from the planet.osm
file.
    <?xml version="1.0" encoding="UTF-8"?>
    <planetdiff version="0.1" generator="OpenStreetMap planetdiff" from="a.osm" to="b.osm">
      <add>
        <node id="10310557" timestamp="2006-07-10 23:17:35" lat="51.7670078090236" lon="-0.471281873153888">
          <tag k="created_by" v="JOSM"/>
        </node>
      </add>
      <add>
        <node id="13602100" timestamp="2006-08-16 00:02:13" lat="51.778541285096" lon="-0.448173637230418"/>
      </add>
      <delete>
        <node id="26983956" lat="51.77874880458334" lon="-0.450481106821043">
          <tag k="created_by" v="JOSM"/>
        </node>
      </delete>
      <add>
        <node id="26983956" lat="51.77874880458334" lon="-0.450481106821043">
          <tag k="created_by" v="JOSMXX"/>
        </node>
      </add>
      ...
    </planetdiff>

See example-diff.xml for an example file.

Example usage:
--------------

This example shows how the tool can be used to extract the differences
between two planet.osm dumps. The errors below are from the
UTF8sanitizer code and can be ignored.

    $ planetdiff planet-070307.osm.bz2 planet-070321.osm.bz2 > delta2.xml
    Processing: node(8420k)
    Processing: segment(0k)Error at line 29333138
    Error at line 29334932
    Error at line 29334990
    Error at line 29334990
    Error at line 29334994
    Error at line 29334994
    Error at line 29336882
    Error at line 29337279
    Error at line 29337338
    Error at line 29337351
    Processing: segment(8830k)
    Processing: way(370k)Error at line 72505269
    Error at line 73944573
    Processing: way(380k)Error at line 74022760
    Error at line 72583739
    Processing: way(430k)

    $ bzip2 -c delta2.xml > delta2.xml.bz2

    $ ls -l
    -rw-rw-r-- 1 jburgess jburgess  10732308 Apr  6 11:41 delta2.xml.bz2
    -rw-rw-r-- 1 jburgess jburgess 147704026 Apr  6 03:50 delta2.xml
    -rw-rw-r-- 1 jburgess jburgess 186168637 Mar  7 20:21 planet-070307.osm.bz2
    -rw-rw-r-- 1 jburgess jburgess 193761852 Mar 22 19:24 planet-070321.osm.bz2

The planet.osm file can be regenerated using planetpatch below. The
compressed diff file is only 10MB which is a much smaller download than
a whole new planet.osm dump.
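
The data ordering rules hint at why the inputs must be strictly ordered:
when both files list objects in node, segment, way order with increasing
IDs, the two can be compared in a single lockstep pass without holding
either in memory. The Python sketch below illustrates that idea only. It
is not taken from the planetdiff source, and the tuple representation
and the names `ordered_diff`, `sort_key` and `TYPE_RANK` are assumptions
made for the example.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical in-memory form of an OSM object: (type, id, body).
# This is an assumption for illustration, not planetdiff's data model.
OsmObject = Tuple[str, int, str]

# Rank of each object type, following the node, segment, way order.
TYPE_RANK = {"node": 0, "segment": 1, "way": 2}


def sort_key(obj: OsmObject) -> Tuple[int, int]:
    """Position of an object in the required file order."""
    kind, obj_id, _body = obj
    return (TYPE_RANK[kind], obj_id)


def ordered_diff(
    old: Iterable[OsmObject], new: Iterable[OsmObject]
) -> Iterator[Tuple[str, OsmObject]]:
    """Yield ("delete", obj) and ("add", obj) in one pass over both inputs.

    Both inputs must already follow the ordering rules; a modified
    object is reported as a delete followed by an add.
    """
    old_it, new_it = iter(old), iter(new)
    a = next(old_it, None)
    b = next(new_it, None)
    while a is not None or b is not None:
        if b is None or (a is not None and sort_key(a) < sort_key(b)):
            # Present only in the old file.
            yield ("delete", a)
            a = next(old_it, None)
        elif a is None or sort_key(b) < sort_key(a):
            # Present only in the new file.
            yield ("add", b)
            b = next(new_it, None)
        else:
            # Same type and ID in both files: emit only if it changed.
            if a != b:
                yield ("delete", a)
                yield ("add", b)
            a = next(old_it, None)
            b = next(new_it, None)
```

If either input breaks the ordering, a lockstep comparison like this
one silently reports wrong additions and deletions, which is consistent
with the warning that data from other OSM tools cannot be used.
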

planetpatch
===========

Generates a new planet.osm file by applying a differences file created
by planetdiff to an existing file.

Example usage:
--------------

The patch file generated by the planetdiff example above is used to
regenerate the planet-070321.osm file:

    $ time planetpatch planet-070307.osm.bz2 delta2.xml > regen.xml
    Processing: node(8420k)
    Processing: segment(8830k)
    Processing: way(370k)Error at line 72505269
    Processing: way(380k)Error at line 72583739
    Processing: way(430k)

    real    19m54.654s
    user    12m59.771s
    sys     3m35.929s

The output file, in this case 'regen.xml', should now be the same as an
uncompressed and UTF8sanitized version of planet-070321.osm.bz2.

Verification
------------

To verify that the output is equal to the new planet.osm file, we can
compare it to a previously generated UTF8sanitized version of the same
file. 'cmp -l' reports every differing byte: the byte offset, followed
by the two byte values in octal.

    $ cmp -l planet-070321a.osm regen.xml
    1403627544  11  40
    3457266276  11  40

It seems that the process has converted a tab character (octal 11,
ASCII 9) to a space (octal 40, ASCII 32) in two places. Other than
these two bytes, the generated output is identical to the original
version of the new planet.osm. This seems close enough to be usable
right now.
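
Where `cmp` is not available, the same check can be approximated in a
few lines of Python. This is a minimal sketch under stated assumptions:
the function names `byte_differences` and `format_like_cmp` are invented
for the example, and unlike `cmp` it simply stops at the end of the
shorter file instead of reporting the length mismatch.

```python
from typing import Iterator, Tuple


def byte_differences(
    path_a: str, path_b: str, chunk_size: int = 1 << 20
) -> Iterator[Tuple[int, int, int]]:
    """Yield (1-based offset, byte in A, byte in B) for each differing byte.

    Files are read in chunks so that multi-gigabyte dumps do not need
    to fit in memory. Comparison stops at the end of the shorter file.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            chunk_a = fa.read(chunk_size)
            chunk_b = fb.read(chunk_size)
            if not chunk_a or not chunk_b:
                break
            for i, (x, y) in enumerate(zip(chunk_a, chunk_b)):
                if x != y:
                    yield (offset + i + 1, x, y)
            offset += min(len(chunk_a), len(chunk_b))


def format_like_cmp(diff: Tuple[int, int, int]) -> str:
    """Render one difference as offset plus both byte values in octal."""
    position, x, y = diff
    return f"{position} {x:o} {y:o}"
```

A tab replaced by a space shows up as the octal pair `11 40`, matching
the two differences listed above.
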