A Format for Self-Published IP Geolocation Feeds
Loon LLC
1600 Amphitheatre Parkway
Mountain View
CA
94043
United States of America
ek@loon.com
Google
1600 Amphitheatre Parkway
Mountain View
CA
94043
United States of America
kduleba@google.com
Google Switzerland GmbH
Brandschenkestrasse 110
8002
Zürich
Switzerland
zszami@google.com
Google Switzerland GmbH
Brandschenkestrasse 110
8002
Zürich
Switzerland
smoser@google.com
Google
1600 Amphitheatre Parkway
Mountain View
CA
94043
United States of America
warren@kumari.net
This document records a format whereby a network operator can publish
a mapping of IP address prefixes to simplified geolocation information,
colloquially termed a "geolocation feed". Interested parties can poll
and parse these feeds to update or merge with other geolocation data
sources and procedures. This format intentionally only allows specifying
coarse-level location.
Some technical organizations operating networks that move from one
conference location to the next have already experimentally published
small geolocation feeds.
This document describes a currently deployed format. At
least one consumer (Google) has incorporated these feeds into a
geolocation data pipeline, and a significant number of ISPs are
using it to inform them where their prefixes should be geolocated.
Introduction
Motivation
Providers of services over the Internet have grown to depend on
best-effort geolocation information to improve the user experience.
Locality information can aid in directing traffic to the nearest
serving location, inferring likely native language, and providing
additional context for services involving search queries.
When an ISP, for example, changes the location where an IP prefix
is deployed, services that make use of geolocation information may
begin to suffer degraded performance. This can lead to customer
complaints, possibly to the ISP directly. Dissemination of correct
geolocation data is complicated by the lack of any centralized means
to coordinate and communicate geolocation information to all
interested consumers of the data.
This document records a format whereby a network operator (an ISP,
an enterprise, or any organization that deems the geolocation of its
IP prefixes to be of concern) can publish a mapping of IP address
prefixes to simplified geolocation information, colloquially termed a
"geolocation feed". Interested parties can poll and parse these feeds
to update or merge with other geolocation data sources and
procedures.
This document describes a currently deployed format. At
least one consumer (Google) has incorporated these feeds into a
geolocation data pipeline, and a significant number of ISPs are
using it to inform them where their prefixes should be geolocated.
Requirements Notation
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 when, and only when, they appear in all capitals, as shown here.
As this is an informational document about a data format and set of
operational practices presently in use, requirements notation captures
the design goals of the authors and implementors.
Assumptions about Publication
This document describes both a format and a mechanism for
publishing data, with the assumption that the network operator to whom
operational responsibility has been delegated for any published data
wishes it to be public. Any privacy risk is bounded by the format, and
feed publishers MAY omit prefixes or any location field associated with
a given prefix to further protect privacy (see the "Geolocation Feed
Individual Entry Fields" section
for details about which fields exactly may be omitted). Feed publishers
assume the responsibility of determining which data should be made
public.
This document does not incorporate a mechanism to communicate
acceptable use policies for self-published data. Publication itself is
inferred as a desire by the publisher for the data to be usefully
consumed, similar to the publication of information like host names,
cryptographic keys, and Sender Policy Framework (SPF) records in the DNS.
Self-Published IP Geolocation Feeds
The format described here was developed to address the need of
network operators to rapidly and usefully share geolocation information
changes. Originally, there arose a specific case where regional
operators found it desirable to publish location changes rather than
wait for geolocation algorithms to "learn" about them. Later, technical
conferences that frequently use the same network prefixes advertised
from different conference locations experimented by publishing
geolocation feeds updated in advance of network location changes in
order to better serve conference attendees.
At its simplest, the mechanism consists of a network operator
publishing a file (the "geolocation feed") that contains several text
entries, one per line. Each entry is keyed by a unique (within the feed)
IP prefix (or single IP address) followed by a sequence of network
locality attributes to be ascribed to the given prefix.
Specification
For operational simplicity, every feed should contain data about
all IP addresses the provider wants to publish. Alternatives, like
publishing only entries for IP addresses whose geolocation data has
changed or differs from current observed geolocation behavior "at
large", are likely to be too operationally complex.
Feeds MUST use UTF-8 character encoding.
Lines are delimited by a line break (CRLF), and blank lines are ignored.
Text from a '#' character to the end of the current line is treated
as a comment only and is similarly ignored (note that this does not
strictly follow the CSV format, which has no
support for comments).
Feed lines that are not comments MUST be formatted
as comma-separated values (CSV). Each feed entry is a text line of
the form:

   ip_prefix,alpha2code,region,city,postal_code
The IP prefix field is REQUIRED; all others are
OPTIONAL (can be empty), though the requisite minimum
number of commas SHOULD be present.
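The line-level rules above (comments, blank lines, CSV fields, optional trailing fields) can be sketched in Python as follows. This is a minimal illustration rather than a reference parser: the function name `parse_feed_line` and the dictionary keys are assumptions chosen here to mirror the field names of this document, and padding short lines is one lenient reading of the "SHOULD be present" rule for commas.

```python
import csv

# Key names are illustrative; they follow the entry fields of this document.
FIELD_NAMES = ("ip_prefix", "alpha2code", "region", "city", "postal_code")


def parse_feed_line(line):
    """Return a dict for one feed entry, or None for blank/comment lines."""
    # Text from '#' to the end of the line is a comment and is ignored.
    content = line.split("#", 1)[0].strip()
    if not content:
        return None
    fields = next(csv.reader([content]))
    # Pad entries that omit trailing commas and drop any unexpected
    # extra fields, keeping only the five known ones.
    fields = (fields + [""] * len(FIELD_NAMES))[:len(FIELD_NAMES)]
    return dict(zip(FIELD_NAMES, (field.strip() for field in fields)))
```

A caller would typically skip `None` results and then validate each field of the returned dictionary separately, discarding entries whose IP prefix fails to parse.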
Geolocation Feed Individual Entry Fields
IP Prefix
REQUIRED: Each IP prefix field
MUST be either a single IP address or an IP prefix
in Classless Inter-Domain Routing (CIDR) notation, for either
IPv4 or IPv6.
Examples include "192.0.2.1" and "192.0.2.0/24" for IPv4 and
"2001:db8::1" and "2001:db8::/32" for IPv6.
Alpha2code (Previously: 'country')
OPTIONAL: The alpha2code field, if non-empty,
MUST be a 2-letter
ISO country code conforming to ISO 3166-1 alpha 2. Parsers
SHOULD treat this field
case-insensitively.
Earlier versions of this document called this field "country",
and it may still be referred to as such in existing
tools/interfaces.
Parsers MAY additionally support other 2-letter
codes outside the ISO 3166-1 alpha 2 codes, such as the 2-letter
codes from the "Exceptionally reserved codes" set.
Examples include "US" for the United States, "JP" for Japan,
and "PL" for Poland.
Region
OPTIONAL: The region field, if non-empty,
MUST be an ISO region code conforming to ISO 3166-2.
Parsers
SHOULD treat this field case-insensitively.
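As a hedged illustration of case-insensitive handling, the sketch below normalizes the alpha2code and region fields to upper case after a purely syntactic check. The helper names are hypothetical, and the patterns only test the shape of a code (two letters, or two letters plus a hyphen and one to three alphanumeric characters); they do not confirm that the code is actually assigned in ISO 3166.

```python
import re

# Shape-only patterns; they do not check assignment in ISO 3166.
_ALPHA2 = re.compile(r"[A-Za-z]{2}")
_REGION = re.compile(r"[A-Za-z]{2}-[A-Za-z0-9]{1,3}")


def _normalize(value, pattern, label):
    """Upper-case a non-empty code, or raise ValueError if malformed."""
    if value == "":
        return ""  # Optional fields may be empty.
    if not pattern.fullmatch(value):
        raise ValueError("malformed %s: %r" % (label, value))
    return value.upper()


def normalize_alpha2code(value):
    return _normalize(value, _ALPHA2, "alpha2code")


def normalize_region(value):
    return _normalize(value, _REGION, "region")
```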
Examples include "ID-RI" for the Riau province of Indonesia and
"NG-RI" for the Rivers province in Nigeria.
City
OPTIONAL: The city field, if non-empty,
SHOULD be free UTF-8
text, excluding the comma (',') character.
Examples include "Dublin", "New York", and "São Paulo"
(specifically "S" followed by 0xc3, 0xa3, and "o Paulo").
Postal Code
OPTIONAL, DEPRECATED: The postal code field, if
non-empty, SHOULD be free UTF-8 text, excluding the
comma (',') character. The use of this field is deprecated;
consumers of feeds should be able to parse feeds containing these
fields, but new feeds SHOULD NOT include this
field due to the granularity of this information. See the "Privacy
Considerations" section for additional discussion.
An example is "106-6126" (in Minato ward, Tokyo, Japan).
Prefixes with No Geolocation Information
Feed publishers may indicate that some IP prefixes should not
have any associated geolocation information. It may be that some
prefixes under their administrative control are reserved, not yet
allocated or deployed, or in the process of being redeployed
elsewhere and existing geolocation information can, from the
perspective of the publisher, safely be discarded.
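This special case can be indicated by explicitly leaving blank
all fields that specify any degree of geolocation information. For
example, an entry for the documentation prefix 192.0.2.0/24 with no
geolocation information would read "192.0.2.0/24,,,,".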
Historically, the user-assigned alpha2code identifier of "ZZ" has
been used for this same purpose. This is not necessarily preferred,
and no specific interpretation of any of the other user-assigned
alpha2code codes is currently defined.
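A consumer might detect this special case as sketched below. The function name is hypothetical, and treating "ZZ" the same as an all-blank entry is a policy assumption based on the historical usage described above, not a requirement of the format.

```python
def has_no_geolocation(location_fields):
    """Return True if an entry carries no geolocation information.

    location_fields is the sequence of fields after the IP prefix:
    alpha2code, region, city, postal_code.
    """
    fields = [field.strip() for field in location_fields]
    # Historical convention: the user-assigned code "ZZ" means "no location".
    if fields and fields[0].upper() == "ZZ":
        return True
    # Preferred convention: every location field is left blank.
    return not any(fields)
```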
Additional Parsing Requirements
Feed entries that do not have an IP address or prefix field or have an
IP address or prefix field that fails to parse correctly
MUST be discarded.
While publishers SHOULD use a canonical text representation for IPv6
prefix fields,
consumers MUST nevertheless accept all valid string
representations.
Duplicate IP address or prefix entries MUST be
considered an error, and consumer implementations
SHOULD log the repeated entries for further
administrative review. Publishers SHOULD take
measures to ensure there is one and only one entry per IP address
and prefix.
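One way to detect duplicates is to compare parsed prefixes rather than raw strings, so that differently written but equivalent IPv6 prefixes are caught. The sketch below uses the Python standard library `ipaddress` module (not the ipaddr library used by the sample validator in this document); the function name is an assumption. Treating a bare address and its host-length prefix (for example, "192.0.2.1" and "192.0.2.1/32") as the same entry is a design choice of this sketch.

```python
import ipaddress
from collections import Counter


def find_duplicate_prefixes(prefixes):
    """Return the normalized prefixes that appear more than once.

    Raises ValueError if any prefix is malformed or has host bits set;
    such entries are expected to have been discarded beforehand.
    """
    counts = Counter(ipaddress.ip_network(prefix) for prefix in prefixes)
    return sorted(str(net) for net, count in counts.items() if count > 1)
```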
Multiple entries that constitute nested prefixes are
permitted. Consumers SHOULD consider the entry with
the longest matching prefix (i.e., the "most specific") to be the
best matching entry for a given IP address.
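Longest-prefix matching over nested entries can be illustrated with a simple linear scan, again using the standard library `ipaddress` module. The function name and the shape of the `entries` mapping are assumptions made for this sketch; a consumer handling large feeds would more likely use a radix tree or similar index.

```python
import ipaddress


def best_match(address, entries):
    """Return (network, location) for the most specific matching entry.

    entries maps prefix strings to arbitrary location data. Returns None
    when no entry covers the address.
    """
    ip = ipaddress.ip_address(address)
    best = None
    for prefix, location in entries.items():
        net = ipaddress.ip_network(prefix)
        # Only compare within the same address family.
        if net.version != ip.version or ip not in net:
            continue
        if best is None or net.prefixlen > best[0].prefixlen:
            best = (net, location)
    return best
```

With entries for both "192.0.2.0/25" and the single address "192.0.2.5", a lookup of 192.0.2.5 selects the single-address entry, while other addresses in the /25 fall back to the shorter prefix.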
Feed entries with non-empty optional fields that fail to parse,
either in part or in full, SHOULD be discarded. It is
RECOMMENDED that they also be logged for further
administrative review.
For compatibility with future additional fields, a parser
MUST ignore any fields beyond those it expects. The
data from fields that are expected and that parse successfully
MUST still be considered valid. Per the "Planned Future Work"
section, no extensions to this format
are in use nor are any anticipated.
Examples
Example entries using different IP address formats and describing
locations at alpha2code ("country code"), region, and city granularity
level, respectively:
The IETF network publishes geolocation information for the meeting
prefixes and generally just comments out the last meeting information
and appends the new meeting information. The IETF Meeting Network
Geolocation Data feed, at the time of this writing, contains:
Experimentally, RIPE has published geolocation information for
their conference network prefixes, which change location in accordance
with each new event. The RIPE NCC Meeting Geolocation Data feed, at the
time of writing, contains:
Similarly, ICANN has published geolocation information for their
portable conference network prefixes. The ICANN Meeting Geolocation
Data feed, at the time of writing, contains:
A longer example is the Google Corp
Geofeed, which lists the geolocation information for Google corporate
offices.
At the time of writing, Google processes approximately 400 feeds
comprising more than 750,000 IPv4 and IPv6 prefixes.
Consuming Self-Published IP Geolocation Feeds
Consumers MAY treat published feed data as a hint only
and MAY choose
to prefer other sources of geolocation information for any given IP
prefix. Regardless of a consumer's stance with respect to a given
published feed, there are some points of note for sensibly and
effectively consuming published feeds.
Feed Integrity
The integrity of published information SHOULD be
protected by securing the means of publication, for example, by using
HTTP over TLS. Whenever
possible, consumers SHOULD prefer retrieving
geolocation feeds in a manner that guarantees integrity of the
feed.
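A consumer that wants to enforce this preference could refuse to retrieve feeds over anything other than HTTPS, as in the sketch below. The function name and the 30-second timeout are assumptions; rejecting plain HTTP outright is a stricter policy than the SHOULD above requires. The standard library `urlopen` call verifies server certificates by default in current Python versions.

```python
from urllib.parse import urlsplit
from urllib.request import urlopen


def fetch_feed(url, timeout=30):
    """Fetch a feed over HTTPS and return its text decoded as UTF-8."""
    if urlsplit(url).scheme.lower() != "https":
        raise ValueError("refusing to fetch feed over a non-TLS URL: %r" % url)
    with urlopen(url, timeout=timeout) as response:
        # Feeds MUST use UTF-8, so a decoding error signals a bad feed.
        return response.read().decode("utf-8")
```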
Verification of Authority
Consumers of self-published IP geolocation feeds SHOULD perform
some form of verification that the publisher is in fact authoritative
for the addresses in the feed. The actual means of verification is
likely dependent upon the way in which the feed is discovered. Ad hoc
shared URIs, for example, will likely require an ad hoc verification
process. Future automated means of feed discovery SHOULD have an
accompanying automated means of verification.
A consumer should only trust geolocation information for IP addresses
or prefixes for which the publisher has been verified as
administratively authoritative. All other geolocation feed entries
should be ignored and logged for further administrative review.
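The filtering step described above can be sketched as follows, assuming the consumer has already obtained, by whatever verification process it uses, a list of prefixes for which the publisher is known to be authoritative. The function and parameter names are hypothetical, and `subnet_of` requires Python 3.7 or later.

```python
import ipaddress


def split_by_authority(feed_prefixes, verified_prefixes):
    """Split feed prefixes into (trusted, rejected) lists.

    A feed prefix is trusted only if it lies entirely within one of the
    prefixes for which the publisher has been verified as authoritative.
    """
    verified = [ipaddress.ip_network(p) for p in verified_prefixes]
    trusted, rejected = [], []
    for prefix in feed_prefixes:
        net = ipaddress.ip_network(prefix)
        covered = any(
            net.version == v.version and net.subnet_of(v) for v in verified)
        (trusted if covered else rejected).append(prefix)
    return trusted, rejected
```

Entries in the rejected list would be ignored and logged for administrative review rather than silently dropped.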
Verification of Accuracy
Errors and inaccuracies may occur at many levels, and publication
and consumption of geolocation data are no exceptions. To the extent
practical, consumers SHOULD take steps to verify the accuracy of
published locality. Verification methodology, resolution of
discrepancies, and preference for alternative sources of data are left
to the discretion of the feed consumer.
Consumers SHOULD decide on discrepancy thresholds
and SHOULD flag, for administrative review, feed entries
that exceed set thresholds.
Refreshing Feed Information
As a publisher can change geolocation data at any time and without
notification, consumers SHOULD implement mechanisms to periodically
refresh local copies of feed data. In the absence of any other refresh
timing information, it is recommended that consumers refresh
feeds no less often than weekly and no more often than is likely
to cause issues to the publisher.
For feeds available via HTTPS (or HTTP), the publisher MAY
communicate refresh timing information by means of the standard HTTP
expiration model. Specifically, publishers can
include either an Expires header or a Cache-Control header specifying the
max-age. Where practical, consumers SHOULD refresh feed
information before the expiry time is reached.
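As a rough sketch of how a consumer might turn these headers into a refresh interval, the function below prefers Cache-Control max-age, falls back to Expires, and otherwise uses the weekly default. The function name is an assumption, and `headers` is assumed to be a plain dictionary keyed by the canonical header names; real HTTP libraries usually offer case-insensitive header lookups instead.

```python
import re
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

WEEK_SECONDS = 7 * 24 * 3600


def refresh_delay_seconds(headers, now=None):
    """Return how many seconds to wait before refreshing a feed."""
    now = now or datetime.now(timezone.utc)
    match = re.search(r"max-age\s*=\s*(\d+)",
                      headers.get("Cache-Control", ""), re.IGNORECASE)
    if match:
        return int(match.group(1))
    expires = headers.get("Expires")
    if expires:
        try:
            expiry = parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            return WEEK_SECONDS  # Unparseable date: use the default.
        if expiry.tzinfo is None:
            expiry = expiry.replace(tzinfo=timezone.utc)
        return max(0, int((expiry - now).total_seconds()))
    # No refresh timing information: refresh at least weekly.
    return WEEK_SECONDS
```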
Privacy Considerations
Publishers of geolocation feeds are advised to have fully considered
any and all privacy implications of the disclosure of such information
for the users of the described networks prior to publication. A thorough
comprehension of the security considerations of a chosen geolocation policy is
highly recommended, including an understanding of some of the
limitations of information obscurity.
As noted in the "Geolocation Feed Individual Entry Fields" section,
each location field in an entry is
optional, in order to support expressing only the level of specificity
that the publisher has deemed acceptable. There is no requirement that
the level of specificity be consistent across all entries within a feed.
In particular, the Postal Code field can
provide very specific geolocation, sometimes within a building. Such
specific Postal Code values MUST NOT be published in geofeeds without
the express consent of the parties being located.
Operators who publish geolocation information are strongly encouraged
to inform affected users/customers of this fact and of the potential
privacy-related consequences and trade-offs.
Relation to Other Work
While not originally done in conjunction with the GEOPRIV
Working Group, Richard Barnes
observed that this work
is nevertheless consistent with that which the group has defined, both
for address format and for privacy. The data elements in geolocation
feeds are equivalent to an XML structure whose elements carry the
following values:
   country
   region
   city
   postal_code
Providing geolocation information to this granularity is equivalent
to a privacy policy that uses the definition of the 'building'
level of disclosure.
Security Considerations
As there is no true security in the obscurity of the location of any
given IP address, self-publication of this data fundamentally opens no
new attack vectors. For publishers, self-published data may increase
the ease with which such location data might be exploited (it can, for
example, make easy the discovery of prefixes populated with customers
as distinct from prefixes not generally in use).
For consumers, feed retrieval processes may receive input from
potentially hostile sources (e.g., in the event of hijacked traffic). As
such, proper input validation and defense measures MUST be taken (see
the discussion in the "Feed Integrity" section).
Similarly, consumers who do not perform sufficient verification of
published data bear the same risks as from other forms of geolocation
configuration errors (see the discussion in the "Verification of
Authority" and "Verification of Accuracy" sections).
Validation of a feed's contents includes verifying that the publisher
is authoritative for the IP prefixes included in the feed. Failure to
verify IP prefix authority would, for example, allow ISP Bob to make
geolocation statements about IP space held by ISP Alice. At this time,
only out-of-band verification methods are implemented (i.e., an ISP's
feed may be verified against publicly available IP allocation data).
Planned Future Work
In order to more flexibly support future extensions, use of a more
expressive feed format has been suggested. Use of JavaScript Object
Notation (JSON), specifically, has been
discussed. However, at the time of writing, no such specification nor
implementation exists. Nevertheless, work on extensions is deferred
until a more suitable format has been selected.
The authors are planning on writing a document describing such a
new format. This document describes a currently deployed and used
format. Given the extremely limited extensibility of the present format,
no extensions to it are anticipated. Extensibility requirements are
instead expected to be integral to the development of a new format.
Finding Self-Published IP Geolocation Feeds
The issue of finding, and later verifying, geolocation feeds is not
formally specified in this document. At this time, only ad hoc feed
discovery and verification have a modicum of established practice (see
below); discussion of other mechanisms has been removed for clarity.
Ad Hoc 'Well-Known' URIs
To date, geolocation feeds have been shared informally in the form
of HTTPS URIs exchanged in email threads. Three example URIs (those of
the IETF, RIPE NCC, and ICANN meeting geolocation data listed in the
references)
describe networks that change locations periodically, the operators
and operational practices of which are well known within their
respective technical communities.
The contents of the feeds are verified by a similarly ad hoc
process, including:
- personal knowledge of the parties involved in the exchange
and
- comparison of feed-advertised prefixes with the BGP-advertised
prefixes of Autonomous System Numbers known to be operated by the
publishers.
Ad hoc mechanisms, while useful for early experimentation by
producers and consumers, are unlikely to be adequate for long-term,
widespread use by multiple parties. Future versions of any such
self-published geolocation feed mechanism SHOULD address scalability
concerns by defining a means for automated discovery and verification
of operational authority of advertised prefixes.
Other Mechanisms
Previous versions of this document referenced use of the
WHOIS service operated by
Regional Internet Registries (RIRs), as well as
possible DNS-based schemes to discover and validate geofeeds.
To the authors' knowledge, support for such mechanisms has never been
implemented, and this speculative text has been removed to avoid
ambiguity.
IANA Considerations
This document has no IANA actions.
References
Normative References
ISO, "ISO 3166-1 decoding table".
ISO, "ISO 3166-2:2007".
"Extensible Markup Language (XML) 1.0 (Fifth Edition)".
Informative References
"IETF Meeting Network Geolocation Data".
Réseaux IP Européens Network Coordination Centre, "RIPE NCC Meeting Geolocation Data".
ICANN, "ICANN Meeting Geolocation Data".
Google, LLC, "Google Corp Geofeed".
IETF, "Geographic Location/Privacy (geopriv)".
Google Inc., "Google's Python IP address manipulation library".
ISO, "Glossary for ISO 3166".
Sample Python Validation Code
Included here is a simple format validator in Python for
self-published ipgeo feeds. This tool reads CSV data in the
self-published ipgeo feed format from the standard input and performs
basic validation. It is intended for use by feed publishers before
launching a feed. Note that this validator does not verify the
uniqueness of every IP prefix entry within the feed as a whole but only
verifies the syntax of each single line from within the feed. A complete
validator MUST also ensure IP prefix uniqueness.
The main source file "ipgeo_feed_validator.py" follows. It requires
use of the open source ipaddr Python library for IP address and CIDR
parsing and validation.
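#!/usr/bin/env python
"""Simple format validator for self-published ipgeo feeds."""

# NOTE: The opening of this listing (imports through _ValidateLine) did
# not survive in the source text. It is a minimal reconstruction, and
# therefore an assumption, inferred from the names the code below uses.

import csv
import sys

import ipaddr  # The open source ipaddr library named above.


class IPGeoFeedValidator(object):

    def __init__(self):
        self.line_number = 0
        self.line = ''
        self.is_correct_line = True
        self.output_log = {}  # Maps severity to a list of messages.
        self.output_stream = sys.stderr  # Use None to disable output.

    def Validate(self, feed):
        """Checks every line of an iterable of feed lines."""
        for line in feed:
            self._ValidateLine(line)

    def CountErrors(self, severity):
        """Returns how many messages of this severity were reported."""
        return len(self.output_log.get(severity, []))

    def _ValidateLine(self, line):
        self.line_number += 1
        # Text from '#' to the end of the line is a comment.
        self.line = line.split('#')[0].strip()
        self.is_correct_line = True
        if not self.line:
            return  # Blank and comment-only lines are ignored.
        fields = next(csv.reader([self.line]))
        self._ValidateFields(fields)
        self._FlushOutputStream()

    ############################################################
    def _ValidateFields(self, fields):
        assert len(fields) > 0
        is_correct = self._IsIPAddressOrPrefixCorrect(fields[0])
        if len(fields) > 1:
            if not self._IsAlpha2CodeCorrect(fields[1]):
                is_correct = False
        if len(fields) > 2 and not self._IsRegionCodeCorrect(fields[2]):
            is_correct = False
        if len(fields) != 5:
            self._ReportWarning('5 fields were expected (got %d).'
                                % len(fields))
        return is_correct

    ############################################################
    def _IsIPAddressOrPrefixCorrect(self, field):
        if '/' in field:
            return self._IsCIDRCorrect(field)
        return self._IsIPAddressCorrect(field)

    def _IsCIDRCorrect(self, cidr):
        try:
            ipprefix = ipaddr.IPNetwork(cidr)
            # Reject prefixes that have host bits set.
            if ipprefix.network._ip != ipprefix._ip:
                self._ReportError('Incorrect IP Network.')
                return False
            if ipprefix.is_private:
                self._ReportError('IP Address must not be private.')
                return False
        except ValueError:
            self._ReportError('Incorrect IP Network.')
            return False
        return True

    def _IsIPAddressCorrect(self, ipaddress):
        try:
            ip = ipaddr.IPAddress(ipaddress)
        except ValueError:
            self._ReportError('Incorrect IP Address.')
            return False
        if ip.is_private:
            self._ReportError('IP Address must not be private.')
            return False
        return True

    ############################################################
    def _IsAlpha2CodeCorrect(self, alpha2code):
        if len(alpha2code) == 0:
            return True  # The field is optional.
        if len(alpha2code) != 2 or not alpha2code.isalpha():
            self._ReportError(
                'Alpha 2 code must be in the ISO 3166-1 alpha 2 format.')
            return False
        return True

    def _IsRegionCodeCorrect(self, region_code):
        if len(region_code) == 0:
            return True  # The field is optional.
        if '-' not in region_code:
            self._ReportError('Region code must be in ISO 3166-2 format.')
            return False
        # Only the country part before the hyphen is checked.
        parts = region_code.split('-')
        if not self._IsAlpha2CodeCorrect(parts[0]):
            return False
        return True

    ############################################################
    def _ReportError(self, message):
        self._ReportWithSeverity('ERROR', message)

    def _ReportWarning(self, message):
        self._ReportWithSeverity('WARNING', message)

    def _ReportWithSeverity(self, severity, message):
        self.is_correct_line = False
        output_line = '%s: %s\n' % (severity, message)
        if severity not in self.output_log:
            self.output_log[severity] = []
        self.output_log[severity].append(output_line)
        if self.output_stream is not None:
            self.output_stream.write(output_line)

    def _FlushOutputStream(self):
        # Echo the offending line after its error or warning messages.
        if self.is_correct_line:
            return
        if self.output_stream is None:
            return
        self.output_stream.write('line %d: %s\n\n'
                                 % (self.line_number, self.line))


############################################################
def main():
    feed_validator = IPGeoFeedValidator()
    feed_validator.Validate(sys.stdin)
    if feed_validator.CountErrors('ERROR'):
        sys.exit(1)


if __name__ == '__main__':
    main()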
A unit test file, "ipgeo_feed_validator_test.py", is provided as well.
It provides basic test coverage of the code above, though it does not test
correct handling of non-ASCII UTF-8 strings.
Acknowledgements
The authors would like to express their gratitude to the reviewers and
early implementors, several of whom contributed substantial review,
text, and advice.