|Last edit||11 October 2018 06:21:57|
|Support contact||Simon Schlosser|
In the Corporate Data League, business rules ensure data quality and specify how to process collaborative reviews. Currently, 1173 data quality rules are checked to ensure a high level of data quality, and 17 review rules ensure efficient processing of collaborative reviews. Each business rule is documented in a form that business users can understand, and its definition references the concepts of the data model.
Standardizing data quality across all CDL Members is a collaborative endeavor managed by the CDL Team. Rules need to be analyzed and adapted continuously. Existing rules may become inapplicable, or it may be necessary to introduce new rules. Some business rules specify which data values are valid for specific attributes. For this purpose, reference data is collected and documented on these pages (e.g. countries). Integrating new reference data usually goes hand in hand with adapting an existing business rule or creating new ones.
Business rule use cases
CDL data validation supports several use cases. The list is extended continuously:
- Qualified validation. Checking the consistency of names, legal address and identifiers within one record against trusted data sources (e.g. CDL repository, company registers).
- Fix proposals. Providing a correct record for given erroneous input data.
- Qualified VAT number check. Validating VAT numbers according to the legal requirements of specific EU member states.
- Address data validation
- Dummy check
- General check
Data quality rules
The list below gives an overview of all data quality rules that are currently documented and implemented.
|Business partner name missing||It is necessary that each business partner has at least one name. With respect to the CDL data model it is at least required that a name of type LOCAL or INTERNATIONAL is present.|
|Care of information misplaced||Care of information (typically indicated by "c/o") must not be maintained in the business partner's name, locality or thoroughfare but is to be managed in the care of attribute. If care of information is found in an attribute other than "care of", the rule is violated.|
|Contact information misplaced||Contact information is not allowed in the registered name, trade name or international name. This rule checks whether contact information is misplaced by identifying common keywords such as "attn:" or "z.Hd." and additionally parsing the company name for natural person names that are not meant to be part of the legal name (e.g. when natural person names are placed after the legal form).|
|EIN format invalid (United States)||This rule checks the format of Employer identification number (United States) as described in the additional information tab|
|Tax identifier missing (Italy)||The tax identification number in Italy is known as the Codice Fiscale and consists of 16 alphanumeric characters: C1 to C6 are alphabetic, C7 and C8 are numeric, C9 is alphabetic, C10 and C11 are numeric, C12 is alphabetic, C13 to C15 are numeric, and C16 is an alphabetic check character.
C1 to C3 are letters derived from the last name. C4 to C6 are letters derived from the first name. C7 and C8 are digits for the year of birth. C9 is a letter for the month of birth. C10 and C11 are digits for the day of birth and the sex. C12 to C15 are one letter and three digits for the Italian town or the foreign state of birth. C16 has a supervisory function: it is the check character.|
|Identifier format invalid (European value added tax identifier (Austria))||The European value added tax identifier in Austria consists of the prefix "AT" followed by the character "U" and 8 numerical digits. This rule checks for the presence of "U" followed by exactly 8 digits, ignoring whitespace, hyphens or dots that may be contained in the identifier value. If there are fewer or more digits, misplaced digits, or the "U" is missing, the rule is violated.|
|Identifier format invalid (European value added tax identifier (Belgium))||The European value added tax identifier in Belgium consists of exactly 10 numerical digits prefixed by "BE". The first digit following the prefix is always 0 or 1. This rule checks for the existence of exactly 10 digits, ignoring whitespace, hyphens or dots that may be contained in the identifier value. If there are fewer or more digits, non-numeric characters, or the first digit is neither 0 nor 1, the rule is violated.|
|Identifier format invalid (European value added tax identifier (Bulgaria))||The European value added tax identifier in Bulgaria consists of 9 or 10 numerical digits prefixed by "BG". This rule checks for the existence of 9 or 10 digits, ignoring whitespace, hyphens or dots that may be contained in the identifier value. If there are fewer or more digits, or non-numeric characters, the rule is violated.|
|Identifier format invalid (European value added tax identifier (Cyprus))||The European value added tax identifier in Cyprus consists of 9 characters (8 numerical digits + 1 letter) prefixed by "CY". This rule checks for the existence of 8 numerical digits followed by a letter, ignoring whitespace, hyphens or dots that may be contained in the identifier value. If there are fewer or more digits, misplaced digits, or the last character is a numerical digit instead of a letter, the rule is violated.|
|Identifier format invalid (European value added tax identifier (Czech Republic))||The European value added tax identifier in the Czech Republic consists of 8 to 10 numerical digits prefixed by "CZ". This rule checks for the existence of 8 to 10 numerical digits, ignoring whitespace, hyphens or dots that may be contained in the identifier value. If there are fewer or more digits, or misplaced digits, the rule is violated.|
|... further results|
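The format rules for European value added tax identifiers listed above all follow the same pattern: strip separators, then compare the remainder against a country-specific scheme. The following Python sketch illustrates this for the five countries shown. It is not the CDL implementation (which is based on SPIN); the names `VAT_PATTERNS`, `normalize` and `vat_format_is_valid` are hypothetical, and it assumes that the country prefix is part of the identifier value and is written in upper case.

```python
import re

# One pattern per country, taken from the rule descriptions above.
# Assumption: the value includes the upper-case country prefix.
VAT_PATTERNS = {
    "AT": re.compile(r"ATU[0-9]{8}"),      # "U" followed by exactly 8 digits
    "BE": re.compile(r"BE[01][0-9]{9}"),   # 10 digits, the first one 0 or 1
    "BG": re.compile(r"BG[0-9]{9,10}"),    # 9 or 10 digits
    "CY": re.compile(r"CY[0-9]{8}[A-Z]"),  # 8 digits followed by one letter
    "CZ": re.compile(r"CZ[0-9]{8,10}"),    # 8 to 10 digits
}

# Separators that the rules ignore: whitespace, hyphens and dots.
_SEPARATORS = re.compile(r"[\s.\-]")


def normalize(value: str) -> str:
    """Remove whitespace, hyphens and dots from an identifier value."""
    return _SEPARATORS.sub("", value)


def vat_format_is_valid(country: str, value: str) -> bool:
    """Return True if the value matches the format scheme of the given country."""
    pattern = VAT_PATTERNS[country]  # raises KeyError for countries not sketched here
    return pattern.fullmatch(normalize(value)) is not None
```

With these definitions, `vat_format_is_valid("AT", "AT U-1234.5678")` is accepted because the separators are ignored, whereas `vat_format_is_valid("AT", "AT12345678")` violates the rule because the "U" is missing.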
|Cadastro de Pessoa Fisica invalid (Brazil)||The CPF number is an identification number of Brazilian citizens issued by the Brazilian Ministry of Finance ("Ministério da Fazenda"). CPF stands for "Cadastro de Pessoa Física" (literally, physical person registration), as opposed to the CNPJ number for companies. A CPF consists of 11 digits, C1 to C11: C1 to C9 are the base digits, and C10 and C11 are the check digits. Each check digit is calculated as follows:
- From right to left, all digits are multiplied by a descending sequence of weights starting with 9.
- The sum of all products is computed.
- The sum of step 2 is taken modulo 11.
- The result of step 3 is taken modulo 10.
- The check digit found is appended to the number, and steps 1 to 4 are repeated to obtain the second check digit.|
|Company is greylisted||The rule checks whether a given business partner is known to be inactive by means of being "out of business", "in liquidation" or in a similar status. For this purpose the rule searches for information in the CDL business partner repository and in addition in several connected data sources. These are:
|Fundamental address parts missing||An address, whether a PO box address or a street address, must comprise at least a post code or a locality.|
|Identifier format invalid (Business number (Canada))||The Canadian business number consists of exactly 9 numerical digits. The rule checks whether there are exactly 9 numerical digits available not considering possible whitespaces or other delimiters between the digits.|
|Identifier format invalid (CNPJ number (Brazil))||The CNPJ consists of a 14-digit number formatted as 00.000.000/0001-00. The first eight digits identify the company, the four digits after the slash identify the branch or subsidiary ("0001" defaults to the headquarters), and the last two are check digits. This rule checks for the existence of exactly 14 digits, ignoring the concrete formatting with ".", "/" or "-".|
|Identifier format invalid (CUIT number (Argentina))||The CUIT number in Argentina consists of 11 numerical digits.|
|Identifier format invalid (Corporate number (Japan))||The Corporate number in Japan consists of exactly 13 numerical digits. The first digit is the check digit and cannot be 0.|
|Identifier format invalid (European value added tax identifier (Croatia))||The European value added tax identifier in Croatia consists of the prefix "HR" followed by 11 numerical digits. This rule checks for the presence of exactly 11 numerical digits, ignoring whitespace, hyphens or dots that may be contained in the identifier value. If there are fewer or more digits, or non-numeric characters such as letters, the rule is violated.|
|Code Structure of Legal Entity Identifier||The technical specification for LEI is ISO 17442. An LEI consists of a 20-character alphanumeric string, with the first 4 characters identifying the Local Operating Unit (LOU) that issued the LEI. Characters 5 and 6 are reserved as '00'. Characters 7-18 are the unique alphanumeric string assigned to the organisation by the LOU. The final 2 characters are checksum digits.|
|Identifier format invalid (PAN code (India))||The Permanent Account Number (PAN) in India is composed as follows: the first 5 characters are letters, the next 4 characters are numerical digits, and the last character is a letter. This rule checks conformance with this scheme, ignoring whitespace, hyphens or dots that may be contained in the identifier value.|
|... further results|
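Some of the rules above go beyond a pure format check and verify check digits. As an illustration, the check digit calculation described for the Brazilian CPF can be sketched in Python as follows. This is a minimal sketch, not the CDL implementation: the function names are hypothetical, the last step of the rule is read as repeating the calculation with the first check digit appended, and ignoring separators such as "." and "-" is an assumption borrowed from the other identifier rules.

```python
from typing import List


def cpf_check_digit(digits: List[int]) -> int:
    """Weight the digits from right to left with 9, 8, 7, ..., then take
    the sum of the products modulo 11 and the result modulo 10."""
    weights = range(9, -1, -1)  # 9, 8, ..., 0
    total = sum(d * w for d, w in zip(reversed(digits), weights))
    return total % 11 % 10


def cpf_is_valid(value: str) -> bool:
    """Check both check digits (C10 and C11) of a CPF number."""
    # Assumption: any non-digit characters are formatting and are ignored.
    digits = [int(ch) for ch in value if ch in "0123456789"]
    if len(digits) != 11:
        return False
    base = digits[:9]
    first = cpf_check_digit(base)             # C10, computed from C1 to C9
    second = cpf_check_digit(base + [first])  # C11, computed from C1 to C10
    return digits[9:] == [first, second]
```

For the sample value `111.444.777-35`, the first pass yields 234 modulo 11 = 3 and the second pass yields 225 modulo 11 = 5, so `cpf_is_valid("111.444.777-35")` accepts the number.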
Data quality rules, listed by country and attributes (Excel):
Review rules
The list below gives an overview of all review rules that are currently documented and implemented.
|Auto reject removal of given legal form||Auto reject a review if the only difference is a missing Business partner legal form in the update while current data comprises a Business partner legal form.||RELEASED|
|Auto reject update with error defect in address||Auto reject a review if the address violates at least one data quality rule.||RELEASED|
|Auto reject update with error defect in business partner||Auto reject a review if the business partner violates at least one data quality rule.||RELEASED|
|Auto reject additional legal address||Adding a legal address is rejected if a legal address already exists.||DRAFT|
|Auto reject address country update||Do not allow edits of Country.||DRAFT|
No inactive auto reject review rules
|Review address locality update||Trigger manual review for Locality value updates.||RELEASED|
|Review address post code update||Trigger manual review for Post code updates, unless the updated value is more precise than the current data.||RELEASED|
|Review address thoroughfare number update||Trigger manual review for Thoroughfare number updates, unless the updated value is more precise than the current data.||RELEASED|
|Review address thoroughfare update||Trigger manual review for Thoroughfare value updates.||RELEASED|
|Review business partner identifier update||Trigger manual review if an Identifier is changed.||RELEASED|
|Review business partner legal form update||Trigger manual review if Business partner legal form is changed.||RELEASED|
|Review major business partner name update||Trigger manual review if Name value is changed in a significant way. In this context, "significant" means that the new names differ in more than 2 categories, e.g. upper/lower case and additional/fewer punctuation (e.g. ||RELEASED|
No draft manual review rules
No inactive manual review rules
No released review processing rules
|Mandatory review of address removals||Removal of addresses is always reviewed.||DRAFT|
|Merge of business partner updates||If there is already a review pending for the provided business partner, the new update is merged into the pending one. Values from the newer update are preferred.||DRAFT|
|Reviews of non-legal addresses||A new address is only reviewed if it is a legal address. Adding any other address does not trigger a review and is accepted as is.||DRAFT|
No inactive review processing rules
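To illustrate the draft rule "Merge of business partner updates" from the list above, the following Python sketch merges a newer update into a pending one. It assumes, for simplicity, that an update is a flat mapping from attribute names to values; the function name `merge_updates` and the attribute names in the usage example are hypothetical and not part of the CDL data model.

```python
from typing import Any, Dict


def merge_updates(pending: Dict[str, Any], newer: Dict[str, Any]) -> Dict[str, Any]:
    """Merge a newer update into a pending one.

    Attributes that occur only in the pending update are kept; for
    attributes that occur in both, the value of the newer update wins.
    """
    merged = dict(pending)  # copy, so that the pending update is not modified
    merged.update(newer)    # values from the newer update are preferred
    return merged
```

For example, merging `{"locality": "Berne", "postCode": "3000"}` into a pending update `{"name": "ACME", "locality": "Bern"}` keeps the name, takes the locality from the newer update and adds the post code.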
|Uniform language and alphabet||Use ISO basic Latin alphabet and English translation (e.g. for Locality) for addresses.||DRAFT|
|Prefer European value added tax identifier in Germany||Remove Tax identification number (Germany) if European value added tax identifier (Germany) is available.||DRAFT|
|Skip legal name after legal form||Remove Tax identification number (Germany) if European value added tax identifier (Germany) is available.||DRAFT|
|Trim business partner name||Remove all non-characters at the beginning and at the end from Name value, e.g. whitespace or ||DRAFT|
We are continuously defining and implementing additional rules. Please get in touch with us if you notice that a business rule is missing! If you are interested in the business rules management architecture and its implementation, we would also be happy to provide you with additional information or a showcase.
The technical implementation of the business rules uses the semantics defined in this wiki. The knowledge documented in this wiki is stored as RDF triples in a triple store (Jena TBC). The RDF triples are made accessible by a SPARQL endpoint (Jena Fuseki). Business rules are translated into a semantic representation as RDF triples and added to the ontology provided by the endpoint. SPIN is used for representing and executing the rules. SPIN is a collection of RDF vocabularies that enable the use of SPARQL to define constraints and inference rules on Semantic Web models. To check a business partner or an address for business rule violations, the data record is translated into a semantic representation, which is an instantiation of the data model. The business rules then check whether the instance conforms to the world defined in this wiki.
From a theoretical point of view, the data model concepts and the relations between them define the Corporate Data League domain (in other words, the world as it is understood by the CDL). Within this world, everything would be possible if there were no rules. Business rules constrain this world by reducing the space of possible instantiations of the modeled domain. An example is a business rule that constrains the possible values of a country. It states that the only allowed values for a country are those defined in the ISO 3166-1 standard. These countries are documented as reference data in this portal. Without this rule, a country could have any other value, such as "Romulus". To take up the wording from above: the documented countries are knowledge about the CDL world (domain), and this knowledge is used to constrain the possible space of options for the name of a country.
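The idea of constraining the domain with reference data can be illustrated outside the RDF/SPIN stack with a few lines of Python. The sketch below is only an analogy to the country rule described above: `REFERENCE_COUNTRIES` is a small excerpt standing in for the ISO 3166-1 reference data documented in this portal, and the `Violation` type, the rule name and the `country` attribute key are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

# Excerpt of ISO 3166-1 reference data (alpha-2 code -> English short name).
# The full list would come from the reference data documented in the portal.
REFERENCE_COUNTRIES: Dict[str, str] = {
    "AT": "Austria",
    "BE": "Belgium",
    "CH": "Switzerland",
    "DE": "Germany",
    "IT": "Italy",
}


@dataclass(frozen=True)
class Violation:
    rule: str
    attribute: str
    value: str


def check_country(address: Dict[str, str]) -> List[Violation]:
    """Return a violation if the country is not part of the reference data."""
    value = address.get("country", "")
    if value in REFERENCE_COUNTRIES or value in REFERENCE_COUNTRIES.values():
        return []  # the instance conforms to the modeled world
    return [Violation("Country not in ISO 3166-1", "country", value)]
```

An address with the country "DE" or "Germany" passes the check, while an address with the country "Romulus" yields one violation, because that value lies outside the space of instantiations that the rule allows.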