// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

//go:generate go run gen.go

// Package identifier defines the contract between implementations of Encoding
// and Index by defining identifiers that uniquely identify standardized coded
// character sets (CCS) and character encoding schemes (CES), which we will
// together refer to as encodings. Encoding implementations provide converters
// between these encodings and UTF-8. This package is typically only of concern
// to implementers of Indexes and Encodings.
//
// One part of the identifier is the MIB code, which is defined by IANA and
// uniquely identifies a CCS or CES. Each code is associated with data that
// references authorities and official documentation, as well as aliases and
// MIME names.
//
// Not all CESs are covered by the IANA registry. The "other" string that is
// returned by ID can be used to identify other character sets or versions of
// existing ones.
//
// It is recommended that each package that provides a set of Encodings provide
// the All and Common variables to reference all supported encodings and a
// commonly used subset, respectively. This allows Index implementations to
// include all available encodings without explicitly referencing or knowing
// about them.
package identifier

// Note: this package is internal, but could be made public if there is a need
// for writing third-party Indexes and Encodings.

// References:
// - http://source.icu-project.org/repos/icu/icu/trunk/source/data/mappings/convrtrs.txt
// - http://www.iana.org/assignments/character-sets/character-sets.xhtml
// - http://www.iana.org/assignments/ianacharset-mib/ianacharset-mib
// - http://www.ietf.org/rfc/rfc2978.txt
// - http://www.unicode.org/reports/tr22/
// - http://www.w3.org/TR/encoding/
// - https://encoding.spec.whatwg.org/
// - https://encoding.spec.whatwg.org/encodings.json
// - https://tools.ietf.org/html/rfc6657#section-5

// Interface can be implemented by Encodings to define the CCS or CES for which
// it implements conversions.
type Interface interface {
	// ID returns an encoding identifier. Exactly one of the mib and other
	// values should be non-zero.
	//
	// In the usual case it is only necessary to indicate the MIB code. The
	// other string can be used to specify encodings for which there is no MIB,
	// such as "x-mac-dingbat".
	//
	// The other string may only contain the characters a-z, A-Z, 0-9, - and _.
	ID() (mib MIB, other string)

	// NOTE: the restrictions on the encoding are to allow extending the syntax
	// with additional information such as versions, vendors and other variants.
}
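
// validOther is a hypothetical helper, not part of the original package. It is
// a minimal sketch of how an Encoding implementation might check the
// restriction documented on Interface.ID: the other string may only contain
// the characters a-z, A-Z, 0-9, - and _. The empty string is accepted, because
// other is left empty whenever a MIB code identifies the encoding.
//
// For example, validOther("x-mac-dingbat") reports true, while
// validOther("x mac") reports false because of the space.
func validOther(s string) bool {
	for _, r := range s {
		switch {
		case 'a' <= r && r <= 'z',
			'A' <= r && r <= 'Z',
			'0' <= r && r <= '9',
			r == '-', r == '_':
			// Allowed character; keep scanning.
		default:
			// Any other rune is rejected. This includes non-ASCII letters
			// and the U+FFFD that range yields for invalid UTF-8.
			return false
		}
	}
	return true
}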

// A MIB identifies an encoding. It is derived from the IANA MIB codes and adds
// identifiers for some encodings that are not covered by the IANA standard.
//
// See http://www.iana.org/assignments/ianacharset-mib.
type MIB uint16

// These additional MIB types are not defined by IANA. They are added because
// they are common and defined within the text repo.
const (
	// Unofficial marks the start of encodings not registered by IANA.
	Unofficial MIB = 10000 + iota

	// Replacement is the WHATWG replacement encoding.
	Replacement

	// XUserDefined is the code for x-user-defined.
	XUserDefined

	// MacintoshCyrillic is the code for x-mac-cyrillic.
	MacintoshCyrillic
)