  1. Network Working Group N. Freed
  2. Request for Comments: 2231 Innosoft
  3. Updates: 2045, 2047, 2183 K. Moore
  4. Obsoletes: 2184 University of Tennessee
  5. Category: Standards Track November 1997
  6. MIME Parameter Value and Encoded Word Extensions:
  7. Character Sets, Languages, and Continuations
  8. Status of this Memo
  9. This document specifies an Internet standards track protocol for the
  10. Internet community, and requests discussion and suggestions for
  11. improvements. Please refer to the current edition of the "Internet
  12. Official Protocol Standards" (STD 1) for the standardization state
  13. and status of this protocol. Distribution of this memo is unlimited.
  14. Copyright Notice
  15. Copyright (C) The Internet Society (1997). All Rights Reserved.
  16. 1. Abstract
  17. This memo defines extensions to the RFC 2045 media type and RFC 2183
  18. disposition parameter value mechanisms to provide
  19. (1) a means to specify parameter values in character sets
  20. other than US-ASCII,
  21. (2) a means to specify the language to be used should the value be
  22. displayed, and
  23. (3) a continuation mechanism for long parameter values to
  24. avoid problems with header line wrapping.
  25. This memo also defines an extension to the encoded words defined in
  26. RFC 2047 to allow the specification of the language to be used for
  27. display as well as the character set.
  28. 2. Introduction
  29. The Multipurpose Internet Mail Extensions, or MIME [RFC-2045,
  30. RFC-2046, RFC-2047, RFC-2048, RFC-2049], define a message format that
  31. allows for:
  32. Freed & Moore Standards Track [Page 1]
  33. RFC 2231 MIME Value and Encoded Word Extensions November 1997
  34. (1) textual message bodies in character sets other than
  35. US-ASCII,
  36. (2) non-textual message bodies,
  37. (3) multi-part message bodies, and
  38. (4) textual header information in character sets other than
  39. US-ASCII.
  40. MIME is now widely deployed and is used by a variety of Internet
  41. protocols, including, of course, Internet email. However, MIME's
  42. success has resulted in the need for additional mechanisms that were
  43. not provided in the original protocol specification.
  44. In particular, existing MIME mechanisms provide for named media type
  45. (content-type field) parameters as well as named disposition
  46. (content-disposition field) parameters. A MIME media type may specify any
  47. number of parameters associated with all of its subtypes, and any
  48. specific subtype may specify additional parameters for its own use. A
  49. MIME disposition value may specify any number of associated
  50. parameters, the most important of which is probably the attachment
  51. disposition's filename parameter.
  52. These parameter names and values end up appearing in the content-type
  53. and content-disposition header fields in Internet email. This
  54. inherently imposes three crucial limitations:
  55. (1) Lines in Internet email header fields are folded
  56. according to RFC 822 folding rules. This makes long
  57. parameter values problematic.
  58. (2) MIME headers, like the RFC 822 headers they often
  59. appear in, are limited to 7bit US-ASCII, and the
  60. encoded-word mechanisms of RFC 2047 are not available
  61. to parameter values. This makes it impossible to have
  62. parameter values in character sets other than US-ASCII
  63. without specifying some sort of private per-parameter
  64. encoding.
  65. (3) It has recently become clear that character set
  66. information is not sufficient to properly display some
  67. sorts of information -- language information is also
  68. needed [RFC-2130]. For example, support for
  69. handicapped users may require reading a text string
  72. aloud. The language the text is written in is needed
  73. for this to be done correctly. Some parameter values
  74. may need to be displayed, hence there is a need to
  75. allow for the inclusion of language information.
  76. The last problem on this list is also an issue for the encoded words
  77. defined by RFC 2047, as encoded words are intended primarily for
  78. display purposes.
  79. This document defines extensions that address all of these
  80. limitations. All of these extensions are implemented in a fashion
  81. that is completely compatible at a syntactic level with existing MIME
  82. implementations. In addition, the extensions are designed to have as
  83. little impact as possible on existing uses of MIME.
  84. IMPORTANT NOTE: These mechanisms end up being somewhat cumbersome when
  85. they actually are used. As such, these mechanisms should not be used
  86. lightly; they should be reserved for situations where a real need for
  87. them exists.
  88. 2.1. Requirements notation
  89. This document occasionally uses terms that appear in capital letters.
  90. When the terms "MUST", "SHOULD", "MUST NOT", "SHOULD NOT", and "MAY"
  91. appear capitalized, they are being used to indicate particular
  92. requirements of this specification. A discussion of the meanings of
  93. these terms appears in [RFC-2119].
  94. 3. Parameter Value Continuations
  95. Long MIME media type or disposition parameter values do not interact
  96. well with header line wrapping conventions. In particular, proper
  97. header line wrapping depends on there being places where linear
  98. whitespace (LWSP) is allowed, which may or may not be present in a
  99. parameter value, and even if present may not be recognizable as such
  100. since specific knowledge of parameter value syntax may not be
  101. available to the agent doing the line wrapping. The result is that
  102. long parameter values may end up getting truncated or otherwise
  103. damaged by incorrect line wrapping implementations.
  104. A mechanism is therefore needed to break up parameter values into
  105. smaller units that are amenable to line wrapping. Any such mechanism
  106. MUST be compatible with existing MIME processors. This means that
  107. (1) the mechanism MUST NOT change the syntax of MIME media
  108. type and disposition lines, and
  111. (2) the mechanism MUST NOT depend on parameter ordering
  112. since MIME states that parameters are not order
  113. sensitive. Note that while MIME does prohibit
  114. modification of MIME headers during transport, it is
  115. still possible that parameters will be reordered when
  116. user agent level processing is done.
  117. The obvious solution, then, is to use multiple parameters to contain
  118. a single parameter value and to use some kind of distinguished name
  119. to indicate when this is being done. And this obvious solution is
  120. exactly what is specified here: The asterisk character ("*") followed
  121. by a decimal count is employed to indicate that multiple parameters
  122. are being used to encapsulate a single parameter value. The count
  123. starts at 0 and increments by 1 for each subsequent section of the
  124. parameter value. Decimal values are used and neither leading zeroes
  125. nor gaps in the sequence are allowed.
  126. The original parameter value is recovered by concatenating the
  127. various sections of the parameter, in order. For example, the
  128. content-type field
  129. Content-Type: message/external-body; access-type=URL;
  130. URL*0="ftp://";
  131. URL*1="cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar"
  132. is semantically identical to
  133. Content-Type: message/external-body; access-type=URL;
  134. URL="ftp://cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar"
  135. Note that quotes around parameter values are part of the value
  136. syntax; they are NOT part of the value itself. Furthermore, it is
  137. explicitly permitted to have a mixture of quoted and unquoted
  138. continuation fields.
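The reassembly rule above can be sketched in Python. This is a minimal illustration, not a reference implementation: the helper name `join_continuations` and the input shape (a mapping from raw parameter names to values whose surrounding quotes have already been removed by a MIME parser) are assumptions made for the example.

```python
import re

# Matches "name*N" where N is a decimal count without leading zeroes.
_SECTION = re.compile(r"^(?P<name>[^*]+)\*(?P<index>0|[1-9][0-9]*)$")


def join_continuations(params):
    """Merge name*0, name*1, ... sections into single parameter values.

    `params` is a hypothetical mapping of raw parameter names to unquoted
    values. Its ordering is irrelevant, as the text requires.
    """
    sections = {}  # lowercased attribute -> {index: value}
    result = {}
    for raw_name, value in params.items():
        match = _SECTION.match(raw_name)
        if match is None:
            # Ordinary parameter: attribute matching is case-insensitive.
            result[raw_name.lower()] = value
            continue
        name = match.group("name").lower()
        sections.setdefault(name, {})[int(match.group("index"))] = value
    for name, parts in sections.items():
        # The count starts at 0 and gaps in the sequence are not allowed.
        if sorted(parts) != list(range(len(parts))):
            raise ValueError(f"gap in continuation sequence for {name!r}")
        result[name] = "".join(parts[i] for i in range(len(parts)))
    return result


example = {
    "access-type": "URL",
    "URL*1": "cs.utk.edu/pub/moore/bulk-mailer/bulk-mailer.tar",
    "URL*0": "ftp://",
}
joined = join_continuations(example)
print(joined["url"])
```

The sections are deliberately supplied out of order here, since the mechanism must not depend on parameter ordering.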
  139. 4. Parameter Value Character Set and Language Information
  140. Some parameter values may need to be qualified with character set or
  141. language information. It is clear that a distinguished parameter
  142. name is needed to identify when this information is present along
  143. with a specific syntax for the information in the value itself. In
  144. addition, a lightweight encoding mechanism is needed to accommodate 8
  145. bit information in parameter values.
  148. Asterisks ("*") are reused to provide the indicator that language and
  149. character set information is present and encoding is being used. A
  150. single quote ("'") is used to delimit the character set and language
  151. information at the beginning of the parameter value. Percent signs
  152. ("%") are used as the encoding flag, which agrees with RFC 2047.
  153. Specifically, an asterisk at the end of a parameter name acts as an
  154. indicator that character set and language information may appear at
  155. the beginning of the parameter value. A single quote is used to
  156. separate the character set, language, and actual value information in
  157. the parameter value string, and a percent sign is used to flag
  158. octets encoded in hexadecimal. For example:
  159. Content-Type: application/x-stuff;
  160. title*=us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A
  161. Note that it is perfectly permissible to leave either the character
  162. set or language field blank. Note also that the single quote
  163. delimiters MUST be present even when one of the field values is
  164. omitted. This is done when either character set, language, or both
  165. are not relevant to the parameter value at hand. This MUST NOT be
  166. done in order to indicate a default character set or language --
  167. parameter field definitions MUST NOT assign a default character set
  168. or language.
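As a rough sketch, an extended value of the form charset'language'encoded-text could be decoded as follows. The function name `decode_extended_value` and its `fallback_charset` argument are assumptions of this example. The fallback exists only so the sketch can turn octets into text when the charset field is blank, and it does not imply that any default is defined.

```python
from urllib.parse import unquote_to_bytes


def decode_extended_value(raw, fallback_charset="us-ascii"):
    """Split charset'language'text and undo the %XX octet encoding.

    Returns (charset, language, text); blank fields come back as None.
    """
    # Both single-quote delimiters must be present, even for blank fields,
    # so a value with fewer than two quotes raises ValueError here.
    charset, language, encoded = raw.split("'", 2)
    octets = unquote_to_bytes(encoded)
    text = octets.decode(charset or fallback_charset)
    return (charset or None, language or None, text)


decoded = decode_extended_value(
    "us-ascii'en-us'This%20is%20%2A%2A%2Afun%2A%2A%2A"
)
print(decoded)
```

Splitting with a limit of two keeps any later single quotes in the value itself intact.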
  169. 4.1. Combining Character Set, Language, and Parameter Continuations
  170. Character set and language information may be combined with the
  171. parameter continuation mechanism. For example:
  172. Content-Type: application/x-stuff;
  173. title*0*=us-ascii'en'This%20is%20even%20more%20;
  174. title*1*=%2A%2A%2Afun%2A%2A%2A%20;
  175. title*2="isn't it!"
  176. Note that:
  177. (1) Language and character set information only appear at
  178. the beginning of a given parameter value.
  179. (2) Continuations do not provide a facility for using more
  180. than one character set or language in the same
  181. parameter value.
  182. (3) A value presented using multiple continuations may
  183. contain a mixture of encoded and unencoded segments.
  186. (4) The first segment of a continuation MUST be encoded if
  187. language and character set information are given.
  188. (5) If the first segment of a continued parameter value is
  189. encoded the language and character set field delimiters
  190. MUST be present even when the fields are left blank.
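The notes above can be combined into one decoding routine. The following is a sketch under stated assumptions: `decode_parameter` is a hypothetical helper, its input is a mapping of raw parameter names to values with quotes already stripped, and the character set is assumed to be ASCII-compatible so that unencoded segments can be appended as octets before decoding.

```python
import re
from urllib.parse import unquote_to_bytes

# attribute, optional "*N" section, optional trailing "*" (extended flag)
_NAME = re.compile(
    r"^(?P<attr>[^*]+)(?:\*(?P<index>0|[1-9][0-9]*))?(?P<ext>\*)?$"
)


def decode_parameter(params, attribute, fallback_charset="us-ascii"):
    """Return (charset, language, text) for one possibly continued value."""
    parts = {}  # section index -> (value, is_extended)
    for raw_name, value in params.items():
        match = _NAME.match(raw_name)
        if match is None or match.group("attr").lower() != attribute.lower():
            continue
        index = int(match.group("index") or 0)
        parts[index] = (value, match.group("ext") is not None)
    if not parts:
        raise KeyError(attribute)
    if sorted(parts) != list(range(len(parts))):
        raise ValueError("missing or misnumbered sections")

    charset = language = ""
    octets = b""
    for index in range(len(parts)):
        value, is_extended = parts[index]
        if is_extended:
            if index == 0:
                # Charset and language appear only in the first segment.
                charset, language, value = value.split("'", 2)
            octets += unquote_to_bytes(value)
        else:
            # Unencoded segment: taken literally, "%" is not special here.
            octets += value.encode("ascii")
    text = octets.decode(charset or fallback_charset)
    return (charset or None, language or None, text)


title_params = {
    "title*0*": "us-ascii'en'This%20is%20even%20more%20",
    "title*1*": "%2A%2A%2Afun%2A%2A%2A%20",
    "title*2": "isn't it!",
}
title = decode_parameter(title_params, "title")
print(title)
```

Collecting octets first and decoding once at the end avoids breaking a multi-octet character that happens to straddle two encoded segments.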
  191. 5. Language specification in Encoded Words
  192. RFC 2047 provides support for non-US-ASCII character sets in RFC 822
  193. message header comments, phrases, and any unstructured text field.
  194. This is done by defining an encoded word construct which can appear
  195. in any of these places. Given that these are fields intended for
  196. display, it is sometimes necessary to associate language information
  197. with encoded words as well as just the character set. This
  198. specification extends the definition of an encoded word to allow the
  199. inclusion of such information. This is simply done by suffixing the
  200. character set specification with an asterisk followed by the language
  201. tag. For example:
  202. From: =?US-ASCII*EN?Q?Keith_Moore?= <moore@cs.utk.edu>
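A parser for the extended encoded word might look like the sketch below. The names `decode_encoded_word` and `_q_decode` are illustrative only, and the "Q" decoder is a simplified reading of RFC 2047 that handles just "_" and "=XX".

```python
import base64
import re

# charset, optional "*language", encoding letter, encoded text
_ENCODED_WORD = re.compile(
    r"=\?(?P<charset>[^?*]+)(?:\*(?P<language>[^?]+))?"
    r"\?(?P<encoding>[QqBb])\?(?P<text>[^?]*)\?="
)


def _q_decode(text):
    """Simplified RFC 2047 "Q" decoding: "_" is a space, "=XX" a hex octet."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "_":
            out.append(0x20)
            i += 1
        elif ch == "=":
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            out.append(ord(ch))
            i += 1
    return bytes(out)


def decode_encoded_word(word):
    """Return (charset, language, text); language is None when absent."""
    match = _ENCODED_WORD.fullmatch(word)
    if match is None:
        raise ValueError(f"not an encoded word: {word!r}")
    if match.group("encoding").upper() == "B":
        octets = base64.b64decode(match.group("text"))
    else:
        octets = _q_decode(match.group("text"))
    charset = match.group("charset")
    return (charset, match.group("language"), octets.decode(charset))


with_language = decode_encoded_word("=?US-ASCII*EN?Q?Keith_Moore?=")
print(with_language)
```

Because the language suffix is optional in the pattern, encoded words written without a language still parse, with the language reported as `None`.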
  203. 6. IMAP4 Handling of Parameter Values
  204. IMAP4 [RFC-2060] servers SHOULD decode parameter value continuations
  205. when generating the BODY and BODYSTRUCTURE fetch attributes.
  206. 7. Modifications to MIME ABNF
  207. The ABNF for MIME parameter values given in RFC 2045 is:
  208. parameter := attribute "=" value
  209. attribute := token
  210. ; Matching of attributes
  211. ; is ALWAYS case-insensitive.
  212. This specification changes this ABNF to:
  213. parameter := regular-parameter / extended-parameter
  214. regular-parameter := regular-parameter-name "=" value
  215. regular-parameter-name := attribute [section]
  216. attribute := 1*attribute-char
  219. attribute-char := <any (US-ASCII) CHAR except SPACE, CTLs,
  220. "*", "'", "%", or tspecials>
  221. section := initial-section / other-sections
  222. initial-section := "*0"
  223. other-sections := "*" ("1" / "2" / "3" / "4" / "5" /
  224. "6" / "7" / "8" / "9") *DIGIT
  225. extended-parameter := (extended-initial-name "="
  226. extended-initial-value) /
  227. (extended-other-names "="
  228. extended-other-values)
  229. extended-initial-name := attribute [initial-section] "*"
  230. extended-other-names := attribute other-sections "*"
  231. extended-initial-value := [charset] "'" [language] "'"
  232. extended-other-values
  233. extended-other-values := *(ext-octet / attribute-char)
  234. ext-octet := "%" 2(DIGIT / "A" / "B" / "C" / "D" / "E" / "F")
  235. charset := <registered character set name>
  236. language := <registered language tag [RFC-1766]>
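Going the other way, a generator that emits parameters matching the grammar above could be sketched as follows. Everything named here (`encode_extended`, the `max_len` limit of 40 characters per value, the UTF-8 default) is an assumption of the example and not part of this specification.

```python
# attribute-char: US-ASCII except SPACE, CTLs, "*", "'", "%", and the
# tspecials of RFC 2045.
_TSPECIALS = '()<>@,;:\\"/[]?='
_ATTRIBUTE_CHARS = frozenset(
    chr(c) for c in range(33, 127) if chr(c) not in _TSPECIALS + "*'%"
)


def encode_extended(attribute, text, charset="utf-8", language="", max_len=40):
    """Return a list of "name=value" strings for one parameter value.

    `max_len` is a hypothetical per-value length limit; longer values are
    split into numbered sections.
    """
    # Encode octet by octet so that a "%XX" unit is never split in two.
    units = [
        chr(b) if chr(b) in _ATTRIBUTE_CHARS else f"%{b:02X}"
        for b in text.encode(charset)
    ]
    prefix = f"{charset}'{language}'"
    chunks = []
    current = prefix
    for unit in units:
        if len(current) + len(unit) > max_len and current != prefix:
            chunks.append(current)
            current = ""
        current += unit
    chunks.append(current)
    if len(chunks) == 1:
        return [f"{attribute}*={chunks[0]}"]
    # Every section carries the trailing "*" because all are generated in
    # encoded form; only the first holds the charset and language.
    return [f"{attribute}*{i}*={chunk}" for i, chunk in enumerate(chunks)]


single = encode_extended(
    "title", "This is ***fun***", charset="us-ascii", language="en-us", max_len=78
)
print(single)
```

Hex digits are written in upper case because the `ext-octet` rule lists only "A" through "F".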
  237. The ABNF given in RFC 2047 for encoded-words is:
  238. encoded-word := "=?" charset "?" encoding "?" encoded-text "?="
  239. This specification changes this ABNF to:
  240. encoded-word := "=?" charset ["*" language] "?" encoding "?" encoded-text "?="
  241. 8. Character sets which allow specification of language
  242. In the future it is likely that some character sets will provide
  243. facilities for inline language labeling. Such facilities are
  244. inherently more flexible than those defined here as they allow for
  245. language switching in the middle of a string.
  248. If and when such facilities are developed they SHOULD be used in
  249. preference to the language labeling facilities specified here. Note
  250. that all the mechanisms defined here allow for the omission of
  251. language labels so as to be able to accommodate this possible future
  252. usage.
  253. 9. Security Considerations
  254. This RFC does not discuss security issues and is not believed to
  255. raise any security issues not already endemic in electronic mail and
  256. present in fully conforming implementations of MIME.
  257. 10. References
  258. [RFC-822]
  259. Crocker, D., "Standard for the Format of ARPA Internet
  260. Text Messages", STD 11, RFC 822, August 1982.
  261. [RFC-1766]
  262. Alvestrand, H., "Tags for the Identification of
  263. Languages", RFC 1766, March 1995.
  264. [RFC-2045]
  265. Freed, N. and N. Borenstein, "Multipurpose Internet Mail
  266. Extensions (MIME) Part One: Format of Internet Message
  267. Bodies", RFC 2045, December 1996.
  268. [RFC-2046]
  269. Freed, N. and N. Borenstein, "Multipurpose Internet Mail
  270. Extensions (MIME) Part Two: Media Types", RFC 2046,
  271. December 1996.
  272. [RFC-2047]
  273. Moore, K., "Multipurpose Internet Mail Extensions (MIME)
  274. Part Three: Representation of Non-ASCII Text in Internet
  275. Message Headers", RFC 2047, December 1996.
  276. [RFC-2048]
  277. Freed, N., Klensin, J. and J. Postel, "Multipurpose
  278. Internet Mail Extensions (MIME) Part Four: MIME
  279. Registration Procedures", RFC 2048, December 1996.
  280. [RFC-2049]
  281. Freed, N. and N. Borenstein, "Multipurpose Internet Mail
  282. Extensions (MIME) Part Five: Conformance Criteria and
  283. Examples", RFC 2049, December 1996.
  286. [RFC-2060]
  287. Crispin, M., "Internet Message Access Protocol - Version
  288. 4rev1", RFC 2060, December 1996.
  289. [RFC-2119]
  290. Bradner, S., "Key words for use in RFCs to Indicate
  291. Requirement Levels", RFC 2119, March 1997.
  292. [RFC-2130]
  293. Weider, C., Preston, C., Simonsen, K., Alvestrand, H.,
  294. Atkinson, R., Crispin, M., and P. Svanberg, "Report from the
  295. IAB Character Set Workshop", RFC 2130, April 1997.
  296. [RFC-2183]
  297. Troost, R., Dorner, S. and K. Moore, "Communicating
  298. Presentation Information in Internet Messages: The
  299. Content-Disposition Header", RFC 2183, August 1997.
  300. 11. Authors' Addresses
  301. Ned Freed
  302. Innosoft International, Inc.
  303. 1050 Lakes Drive
  304. West Covina, CA 91790
  305. USA
  306. Phone: +1 626 919 3600
  307. Fax: +1 626 919 3614
  308. EMail: ned.freed@innosoft.com
  309. Keith Moore
  310. Computer Science Dept.
  311. University of Tennessee
  312. 107 Ayres Hall
  313. Knoxville, TN 37996-1301
  314. USA
  315. EMail: moore@cs.utk.edu
  318. 12. Full Copyright Statement
  319. Copyright (C) The Internet Society (1997). All Rights Reserved.
  320. This document and translations of it may be copied and furnished to
  321. others, and derivative works that comment on or otherwise explain it
  322. or assist in its implementation may be prepared, copied, published
  323. and distributed, in whole or in part, without restriction of any
  324. kind, provided that the above copyright notice and this paragraph are
  325. included on all such copies and derivative works. However, this
  326. document itself may not be modified in any way, such as by removing
  327. the copyright notice or references to the Internet Society or other
  328. Internet organizations, except as needed for the purpose of
  329. developing Internet standards in which case the procedures for
  330. copyrights defined in the Internet Standards process must be
  331. followed, or as required to translate it into languages other than
  332. English.
  333. The limited permissions granted above are perpetual and will not be
  334. revoked by the Internet Society or its successors or assigns.
  335. This document and the information contained herein is provided on an
  336. "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
  337. TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
  338. BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
  339. HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
  340. MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.