This repository has been archived by the owner on Oct 14, 2022. It is now read-only.

planetdiff
==========
Generates a file containing the differences between two planet.osm dumps.

The program supports .gz and .bz2 compressed files transparently.
It also runs an internal version of UTF8sanitizer on the input data so
that it can be used on a file downloaded from 
http://planet.openstreetmap.org without needing any other manipulation.
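
As a rough illustration of what "transparently" means for the reader of
a dump, the Python sketch below picks a decompressor from the file
extension. The helper name open_planet and the extension-based detection
are assumptions made for this example; the actual program may identify
the compression format differently.

```python
import bz2
import gzip


def open_planet(path):
    """Open a planet dump for reading bytes, decompressing if needed.

    Hypothetical helper: the format is guessed from the extension only.
    """
    if path.endswith(".bz2"):
        return bz2.open(path, "rb")
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    # Anything else is treated as an uncompressed file.
    return open(path, "rb")
```

With a helper like this, the rest of the code can read from the returned
object without caring how the file was stored.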

Note that the algorithm relies on the strict ordering of data in the
planet.osm file to operate correctly. Data produced by other OSM tools
normally does not follow these rules and cannot be processed by this
program.
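
The README does not describe the algorithm itself, so the following
Python sketch is only a guess at why ordering matters: when both inputs
are sorted by ID, a single merge-style pass can find every difference
without holding either file in memory. The function name merge_diff and
the (id, payload) tuples are hypothetical.

```python
def merge_diff(old, new):
    """Compare two ID-sorted lists of (id, payload) tuples.

    Returns a list of ("delete", obj) and ("add", obj) operations.
    A changed object yields a delete followed by an add.
    """
    ops = []
    i = j = 0
    while i < len(old) and j < len(new):
        old_id, new_id = old[i][0], new[j][0]
        if old_id < new_id:
            # Present only in the old file.
            ops.append(("delete", old[i]))
            i += 1
        elif old_id > new_id:
            # Present only in the new file.
            ops.append(("add", new[j]))
            j += 1
        else:
            if old[i] != new[j]:
                ops.append(("delete", old[i]))
                ops.append(("add", new[j]))
            i += 1
            j += 1
    # Whatever remains on one side has no counterpart on the other.
    ops.extend(("delete", obj) for obj in old[i:])
    ops.extend(("add", obj) for obj in new[j:])
    return ops
```

If either input were unsorted, the comparison of old_id and new_id would
no longer say which side is "behind", and the pass would report spurious
deletes and adds.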

Build requirements
------------------
The code relies on the libraries below:

 libxml2
 bzip2
 zlib

To compile this code on Fedora you need at least the following packages 
installed:

 libxml2-devel
 bzip2-devel
 zlib-devel


Compiling
---------
On a Linux or other Unix-like system:

 $ make

This builds both planetdiff and planetpatch (planetpatch is described below).

Data ordering rules
-------------------
The input OSM file must obey the following rules to work with the current
algorithms. The planet.osm export script used to generate the planet.osm
dumps does conform to these rules (whether by accident or design).

- The OSM file must be generated in node, segment, way order.
- The ID of each object of a given type (e.g. nodes) must be increasing.
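
The two rules above can be checked with a short script before feeding a
file to the tool. This is a Python sketch, not part of the package; the
name check_ordering is hypothetical, and "increasing" is assumed to mean
strictly increasing.

```python
import xml.etree.ElementTree as ET

# Required order of top-level object types.
TYPE_ORDER = {"node": 0, "segment": 1, "way": 2}


def check_ordering(stream):
    """Return None if the stream obeys both rules, else a message
    describing the first violation found."""
    last_rank, last_id = -1, -1
    for _event, elem in ET.iterparse(stream, events=("end",)):
        rank = TYPE_ORDER.get(elem.tag)
        if rank is None:
            continue  # tags, way members, the root element, etc.
        obj_id = int(elem.get("id"))
        if rank < last_rank:
            return "%s %d appears after a later object type" % (elem.tag, obj_id)
        if rank == last_rank and obj_id <= last_id:
            return "%s %d does not follow id %d in increasing order" % (
                elem.tag, obj_id, last_id)
        last_rank, last_id = rank, obj_id
        elem.clear()  # drop children we no longer need
    return None
```

It expects a binary file object, for example
check_ordering(open("planet.osm", "rb")).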


Diff file format
----------------
The diff format is an XML file containing OSM objects to delete and add.
Objects which are modified have both a delete and add section. The format
of each section is a copy of the OSM object from the planet.osm file.

<?xml version="1.0" encoding="UTF-8"?>
<planetdiff version="0.1" generator="OpenStreetMap planetdiff" from="a.osm" to="b.osm">
  <add>
    <node id="10310557" timestamp="2006-07-10 23:17:35" lat="51.7670078090236" lon="-0.471281873153888">
      <tag k="created_by" v="JOSM"/>
    </node>
  </add>
  <add>
    <node id="13602100" timestamp="2006-08-16 00:02:13" lat="51.778541285096" lon="-0.448173637230418"/>
  </add>
  <delete>
    <node id="26983956" lat="51.77874880458334" lon="-0.450481106821043">
      <tag k="created_by" v="JOSM"/>
    </node>
  </delete>
  <add>
    <node id="26983956" lat="51.77874880458334" lon="-0.450481106821043">
      <tag k="created_by" v="JOSMXX"/>
    </node>
  </add>
...
</planetdiff>


See example-diff.xml for an example file.
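
Because a modified object appears as a delete followed by an add with
the same ID, a reader of the diff can tell the three cases apart by
pairing IDs. The Python sketch below does this for an abbreviated copy
of the example above; the function name summarise_diff is hypothetical
and the code is not part of the package.

```python
import xml.etree.ElementTree as ET

SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<planetdiff version="0.1" generator="OpenStreetMap planetdiff" from="a.osm" to="b.osm">
  <add><node id="10310557"><tag k="created_by" v="JOSM"/></node></add>
  <add><node id="13602100"/></add>
  <delete><node id="26983956"><tag k="created_by" v="JOSM"/></node></delete>
  <add><node id="26983956"><tag k="created_by" v="JOSMXX"/></node></add>
</planetdiff>
"""


def summarise_diff(xml_bytes):
    """Classify objects in a planetdiff document as added, deleted or
    modified, keyed by (object type, id)."""
    root = ET.fromstring(xml_bytes)
    deleted, added = set(), set()
    for section in root:
        if section.tag == "delete":
            target = deleted
        elif section.tag == "add":
            target = added
        else:
            continue
        for obj in section:
            target.add((obj.tag, obj.get("id")))
    # An object in both sets was changed rather than created or removed.
    modified = deleted & added
    return {
        "added": added - modified,
        "deleted": deleted - modified,
        "modified": modified,
    }


summary = summarise_diff(SAMPLE)
```

For the sample, node 26983956 ends up in the "modified" set and the
other two nodes in "added".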


Example usage:
--------------
This example shows how the tool can be used to extract the differences
between two planet.osm dumps. The errors below are from the UTF8sanitizer 
code and can be ignored.


$ planetdiff planet-070307.osm.bz2 planet-070321.osm.bz2 > delta2.xml

Processing: node(8420k)
Processing: segment(0k)Error at line 29333138
Error at line 29334932
Error at line 29334990
Error at line 29334990
Error at line 29334994
Error at line 29334994
Error at line 29336882
Error at line 29337279
Error at line 29337338
Error at line 29337351
Processing: segment(8830k)
Processing: way(370k)Error at line 72505269
Error at line 73944573
Processing: way(380k)Error at line 74022760
Error at line 72583739
Processing: way(430k) 

$ bzip2 -c delta2.xml > delta2.xml.bz2
$ ls -l 
-rw-rw-r-- 1 jburgess jburgess   10732308 Apr  6 11:41 delta2.xml.bz2
-rw-rw-r-- 1 jburgess jburgess  147704026 Apr  6 03:50 delta2.xml
-rw-rw-r-- 1 jburgess jburgess  186168637 Mar  7 20:21 planet-070307.osm.bz2
-rw-rw-r-- 1 jburgess jburgess  193761852 Mar 22 19:24 planet-070321.osm.bz2

The new planet.osm file can be regenerated using planetpatch (described
below). The compressed diff file is only about 10 MB, which is a much smaller
download than a whole new planet.osm dump.



planetpatch
===========
Generates a new planet.osm file by applying a differences file created by 
planetdiff to an existing file.
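
Conceptually, patching drops every object listed in a delete section and
inserts every object listed in an add section, keeping the result in ID
order. The Python sketch below shows that idea on small in-memory lists.
It is an illustration only: the name apply_patch and the (id, payload)
tuples are hypothetical, and the real tool presumably streams the files
rather than loading them.

```python
def apply_patch(old, ops):
    """Apply ("delete"|"add", (id, payload)) operations to an
    ID-sorted list of (id, payload) tuples and return the new list."""
    deleted_ids = {obj[0] for kind, obj in ops if kind == "delete"}
    added = {obj[0]: obj for kind, obj in ops if kind == "add"}
    # Keep everything that was not deleted, then merge in the additions.
    kept = [obj for obj in old if obj[0] not in deleted_ids]
    return sorted(kept + list(added.values()), key=lambda obj: obj[0])
```

A modified object is handled naturally: its old version is removed by
the delete and its new version is put back by the add.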


Example usage:
--------------
The patch file generated by the planetdiff example above is used to
regenerate the planet-070321.osm file:

$ time planetpatch planet-070307.osm.bz2 delta2.xml > regen.xml

Processing: node(8420k)
Processing: segment(8830k)
Processing: way(370k)Error at line 72505269
Processing: way(380k)Error at line 72583739
Processing: way(430k)

real    19m54.654s
user    12m59.771s
sys     3m35.929s

The output file, in this case 'regen.xml', should now be the same as an
uncompressed and UTF8sanitized version of planet-070321.osm.bz2.



Verification
------------
To verify that regen.xml is equal to the new planet.osm file, we can compare it
to a previously generated UTF8sanitized version of the same file.
'cmp -l' reports every differing byte: its offset, then the byte value in each
file (in octal).

$ cmp -l planet-070321a.osm regen.xml
1403627544  11  40
3457266276  11  40

The two differing bytes are octal 11 (decimal 9, tab) in the original and
octal 40 (decimal 32, space) in the regenerated file, so it seems that the
process has converted two tab characters to spaces. Other than these two bytes
the generated output is identical to the original version of the new
planet.osm. This seems close enough to be usable right now.
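
For readers unfamiliar with the 'cmp -l' output format, the Python
sketch below reproduces it for two byte strings: a 1-based offset
followed by the differing byte from each input in octal. The function
name cmp_l is hypothetical, and unlike cmp it works on in-memory data
and ignores any trailing bytes when the inputs differ in length.

```python
def cmp_l(a, b):
    """List (offset, octal_a, octal_b) for each differing byte,
    using 1-based offsets like 'cmp -l'."""
    return [
        (offset, format(x, "o"), format(y, "o"))
        for offset, (x, y) in enumerate(zip(a, b), start=1)
        if x != y
    ]


# A tab replaced by a space shows up as octal 11 versus octal 40.
differences = cmp_l(b"a\tb", b"a b")
```

Converting the octal values back with int("11", 8) and int("40", 8)
gives 9 and 32, the tab and space codes seen in the comparison above.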