Data curation report
|Last edit||12 May 2019 18:10:42|
|Support contact||Kai Hüner|
A report is provided as an Excel file with the following pages:
- documentation: Link to this page, to provide up-to-date documentation to report users.
- charts: Visual summaries of certain aspects of the data curation results. Particular charts are documented below.
- curation results: Detailed results of the data curation per record with comparison of results with input data. Details are documented below.
- country quality (chart base): Request similarity statistics per country, numerical basis of the Address Curation Quality chart.
- changes (chart base): Statistics on data changes (e.g. added data, changed data, confirmed data) per address field, numerical basis of the Changed Address Fields chart.
The charts of the Data Curation Report summarize certain statistics of the data curation result, e.g. changes per address field or address curation quality per country.
Address Curation Quality
This chart groups request similarity into 3 categories (low, some, high).
The "confidence" classification indicates the probability that a given result represents the requested address. For results in the "high confidence" range (0.7,1.0], result data is most likely better than the requested data: the high request similarity indicates that the response fits the requested data (e.g. same post code, similar thoroughfare), and missing elements (e.g. administrative area, geographic coordinates) are added from consistent and complete reference data.
"Some confidence" results (i.e. request similarity in (0.5,0.7]) also mostly provide correct and "better" results. However, some results may not fit the requested address and may represent another address, e.g. with a similar but different thoroughfare, or even a different locality (but a thoroughfare with the same name). By filtering results and checking questionable columns, correct and wrong results can be distinguished quite well.
Results in the "low confidence" range [0.0,0.5] most likely represent a different address. There may be cases with correct results but low request similarity due to local language versions and translation issues. However, most of these results may be wrong and should not be copied automatically to operative data.
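The three ranges above can be expressed as a small lookup. The following is only a sketch: the function name `confidence_category` is hypothetical and not part of the report, and it assumes request similarity is available as a number between 0.0 and 1.0 (e.g. 0.8571 for 85.71%).

```python
def confidence_category(request_similarity: float) -> str:
    """Map a request similarity in [0.0, 1.0] to a confidence category.

    Hypothetical helper; the boundaries follow the documented ranges:
    low = [0.0, 0.5], some = (0.5, 0.7], high = (0.7, 1.0].
    """
    if not 0.0 <= request_similarity <= 1.0:
        raise ValueError("request similarity must be within [0.0, 1.0]")
    if request_similarity > 0.7:
        return "high"
    if request_similarity > 0.5:
        return "some"
    return "low"


if __name__ == "__main__":
    for value in (0.3, 0.5, 0.65, 0.7, 0.8571):
        print(value, confidence_category(value))
```

Note that the upper bound of each range is inclusive, so a similarity of exactly 0.7 still falls into "some confidence" and exactly 0.5 into "low confidence".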
Changed Address Fields
This chart shows how the address fields administrative area, locality, post code, thoroughfare, thoroughfare number, and geographic coordinates were changed by data curation. Changes are classified into the following categories:
- Missing: Data was missing in the request, and data curation could not add the missing information.
- Confirmed: Data curation found reference data for the given field that is exactly the same as the given data.
- Enriched: Data was missing in the request, and data curation added the missing information.
- Changed: Data curation found reference data for the given field that differs from the given data.
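The four categories can be read as a decision on two values per field: the given data and the reference data found by data curation. The sketch below illustrates that reading. The function `change_category` is hypothetical, it assumes that an empty string or `None` means "missing", and it returns "Unclassified" for the case the category list does not cover (data given, but no reference data found).

```python
from typing import Optional


def change_category(given: Optional[str], reference: Optional[str]) -> str:
    """Classify one address field into a chart category (illustrative only)."""
    has_given = bool(given)          # assumption: "" or None means missing
    has_reference = bool(reference)
    if not has_given:
        # Nothing was requested for this field: enriched if curation found data.
        return "Enriched" if has_reference else "Missing"
    if not has_reference:
        # Not covered by the documented categories; labelled here as an assumption.
        return "Unclassified"
    # Both values exist: confirmed only if they are exactly the same.
    return "Confirmed" if given == reference else "Changed"


if __name__ == "__main__":
    print(change_category("", "St. Gallen"))          # Enriched
    print(change_category("St Gulen", "St. Gallen"))  # Changed
```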
Statistics are calculated from the "change" columns in "curation results", e.g. "Locality value change" (see the example below). The "Changed" category combines the MINOR CHANGE and MAJOR CHANGE tags, which distinguish case-only changes from changes to the characters themselves.
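As a rough illustration of how such statistics could be derived from a "change" column, the sketch below counts tags per chart category. The helper names and the mapping table are assumptions: only the merging of MINOR CHANGE and MAJOR CHANGE into "Changed" is stated in this documentation, while mapping NO CHANGE to "Confirmed" and ADDED to "Enriched" is a plausible guess. How "Missing" fields are tagged is not documented, so unknown tags are passed through unchanged.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed mapping from "change" column tags to chart categories.
TAG_TO_CATEGORY = {
    "NO CHANGE": "Confirmed",    # assumption
    "ADDED": "Enriched",         # assumption
    "MINOR CHANGE": "Changed",   # documented: both tags count as "Changed"
    "MAJOR CHANGE": "Changed",
}


def changed_tag(given: str, curated: str) -> str:
    """Tag a pair of values that are already known to differ.

    A difference in character case only is minor; anything else is major.
    """
    return "MINOR CHANGE" if given.lower() == curated.lower() else "MAJOR CHANGE"


def category_counts(tags: Iterable[str]) -> Dict[str, int]:
    """Count chart categories for the tags of one "change" column."""
    return dict(Counter(TAG_TO_CATEGORY.get(tag, tag) for tag in tags))


if __name__ == "__main__":
    column = ["NO CHANGE", "ADDED", changed_tag("St Gulen", "St. Gallen")]
    print(category_counts(column))
```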
Curation result details
Locality:
- Locality value: Data from the input set, St Gulen in the right-hand example.
- Locality value curated: Result from data curation for the given value, St. Gallen in the example.
- Locality value change: Classification of what has changed. The value may be NO CHANGE (i.e. input and result are equal) or ADDED (i.e. input was empty and data curation enriched the missing data). If data was changed, the report distinguishes MINOR CHANGE (i.e. only character case changed) and MAJOR CHANGE (i.e. characters were changed, so the data is really different).
- Locality: Provides the request similarity for the particular field, 85.71% in the example for