  1. Network Working Group                                  D. Crocker, Ed.
  2. Request for Comments: 2234                    Internet Mail Consortium
  3. Category: Standards Track                                   P. Overell
  4.                                                    Demon Internet Ltd.
  5.                                                          November 1997
  6. Augmented BNF for Syntax Specifications: ABNF
  7. Status of this Memo
  8. This document specifies an Internet standards track protocol for the
  9. Internet community, and requests discussion and suggestions for
  10. improvements. Please refer to the current edition of the "Internet
  11. Official Protocol Standards" (STD 1) for the standardization state
  12. and status of this protocol. Distribution of this memo is unlimited.
  13. Copyright Notice
  14. Copyright (C) The Internet Society (1997). All Rights Reserved.
  15. TABLE OF CONTENTS
  16. 1. INTRODUCTION
  17. 2. RULE DEFINITION
  18. 2.1 RULE NAMING
  19. 2.2 RULE FORM
  20. 2.3 TERMINAL VALUES
  21. 2.4 EXTERNAL ENCODINGS
  22. 3. OPERATORS
  23. 3.1 CONCATENATION RULE1 RULE2
  24. 3.2 ALTERNATIVES RULE1 / RULE2
  25. 3.3 INCREMENTAL ALTERNATIVES RULE1 =/ RULE2
  26. 3.4 VALUE RANGE ALTERNATIVES %C##-##
  27. 3.5 SEQUENCE GROUP (RULE1 RULE2)
  28. 3.6 VARIABLE REPETITION *RULE
  29. 3.7 SPECIFIC REPETITION NRULE
  30. 3.8 OPTIONAL SEQUENCE [RULE]
  31. 3.9 ; COMMENT
  32. 3.10 OPERATOR PRECEDENCE
  33. 4. ABNF DEFINITION OF ABNF
  34. 5. SECURITY CONSIDERATIONS
  35. Crocker & Overell Standards Track [Page 1]
  36. RFC 2234 ABNF for Syntax Specifications November 1997
  37. 6. APPENDIX A - CORE
  38. 6.1 CORE RULES
  39. 6.2 COMMON ENCODING
  40. 7. ACKNOWLEDGMENTS
  41. 8. REFERENCES
  42. 9. CONTACT
  43. 10. FULL COPYRIGHT STATEMENT
  44. 1. INTRODUCTION
  45. Internet technical specifications often need to define a format
  46. syntax and are free to employ whatever notation their authors deem
  47. useful. Over the years, a modified version of Backus-Naur Form
  48. (BNF), called Augmented BNF (ABNF), has been popular among many
  49. Internet specifications. It balances compactness and simplicity,
  50. with reasonable representational power. In the early days of the
  51. Arpanet, each specification contained its own definition of ABNF.
  52. This included the email specifications, RFC733 and then RFC822, which
  53. have come to be the common citations for defining ABNF. The current
  54. document separates out that definition, to permit selective
  55. reference. Predictably, it also provides some modifications and
  56. enhancements.
  57. The differences between standard BNF and ABNF involve naming rules,
  58. repetition, alternatives, order-independence, and value ranges.
  59. Appendix A (Core) supplies rule definitions and encoding for a core
  60. lexical analyzer of the type common to several Internet
  61. specifications. It is provided as a convenience and is otherwise
  62. separate from the meta language defined in the body of this document,
  63. and separate from its formal status.
  64. 2. RULE DEFINITION
  65. 2.1 Rule Naming
  66. The name of a rule is simply the name itself; that is, a sequence of
  67. characters, beginning with an alphabetic character, and followed by
  68. a combination of alphabetics, digits and hyphens (dashes).
  69. NOTE: Rule names are case-insensitive.
  70. The names <rulename>, <Rulename>, <RULENAME> and <rUlENamE> all refer
  71. to the same rule.
  74. Unlike original BNF, angle brackets ("<", ">") are not required.
  75. However, angle brackets may be used around a rule name whenever their
  76. presence will facilitate discerning the use of a rule name. This is
  77. typically restricted to rule name references in free-form prose, or
  78. to distinguish partial rules that combine into a string not separated
  79. by white space, such as shown in the discussion about repetition,
  80. below.
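As an illustration of the naming rule, the following Python sketch checks the shape of a rule name and compares two names the way Section 2.1 describes. It is a minimal sketch, not part of the specification: the helper names is_rulename and same_rule are hypothetical, and stripping angle brackets before comparison is an assumption drawn from the text above.

```python
import re

# Shape from Section 2.1: one letter, then letters, digits, or hyphens.
RULENAME = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def is_rulename(text: str) -> bool:
    """Return True if text has the shape of an ABNF rule name."""
    return RULENAME.fullmatch(text) is not None


def same_rule(a: str, b: str) -> bool:
    """Compare two rule names, ignoring case and optional angle brackets."""
    def norm(name: str) -> str:
        # Angle brackets are optional decoration around a rule name.
        if name.startswith("<") and name.endswith(">"):
            name = name[1:-1]
        return name.lower()
    return norm(a) == norm(b)
```

Under these assumptions, same_rule("<rulename>", "rUlENamE") holds, while is_rulename("2DIGIT") does not, because a name must begin with an alphabetic character.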
  81. 2.2 Rule Form
  82. A rule is defined by the following sequence:
  83. name = elements crlf
  84. where <name> is the name of the rule, <elements> is one or more rule
  85. names or terminal specifications and <crlf> is the end-of-line
  86. indicator, carriage return followed by line feed. The equal sign
  87. separates the name from the definition of the rule. The elements
  88. form a sequence of one or more rule names and/or value definitions,
  89. combined according to the various operators, defined in this
  90. document, such as alternative and repetition.
  91. For visual ease, rule definitions are left aligned. When a rule
  92. requires multiple lines, the continuation lines are indented. The
  93. left alignment and indentation are relative to the first lines of the
  94. ABNF rules and need not match the left margin of the document.
  95. 2.3 Terminal Values
  96. Rules resolve into a string of terminal values, sometimes called
  97. characters. In ABNF a character is merely a non-negative integer.
  98. In certain contexts a specific mapping (encoding) of values into a
  99. character set (such as ASCII) will be specified.
  100. Terminals are specified by one or more numeric characters with the
  101. base interpretation of those characters indicated explicitly. The
  102. following bases are currently defined:
  103. b = binary
  104. d = decimal
  105. x = hexadecimal
  108. Hence:
  109. CR = %d13
  110. CR = %x0D
  111. respectively specify the decimal and hexadecimal representation of
  112. [US-ASCII] for carriage return.
  113. A concatenated string of such values is specified compactly, using a
  114. period (".") to indicate separation of characters within that value.
  115. Hence:
  116. CRLF = %d13.10
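The numeric notation can be illustrated with a short Python sketch that turns a terminal such as %d13.10 into the integer values it denotes. The function name decode_terminal is hypothetical, value ranges (the dash form) are deliberately not handled, and Python's int() is somewhat more lenient about digit strings than the grammar, so this is a sketch rather than a validator.

```python
# Base indicators defined in Section 2.3.
BASES = {"b": 2, "d": 10, "x": 16}


def decode_terminal(spec: str) -> list[int]:
    """Decode a terminal such as '%d13.10' into a list of integer values."""
    if len(spec) < 3 or spec[0] != "%" or spec[1].lower() not in BASES:
        raise ValueError(f"not a numeric terminal: {spec!r}")
    base = BASES[spec[1].lower()]
    # A period separates the concatenated values.
    return [int(part, base) for part in spec[2:].split(".")]
```

With this sketch, decode_terminal("%d13") and decode_terminal("%x0D") both yield the single value 13, and decode_terminal("%d13.10") yields the two values 13 and 10.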
  117. ABNF permits specifying a literal text string directly, enclosed in
  118. quotation-marks. Hence:
  119. command = "command string"
  120. Literal text strings are interpreted as a concatenated set of
  121. printable characters.
  122. NOTE: ABNF strings are case-insensitive and
  123. the character set for these strings is us-ascii.
  124. Hence:
  125. rulename = "abc"
  126. and:
  127. rulename = "aBc"
  128. will match "abc", "Abc", "aBc", "abC", "ABc", "aBC", "AbC" and "ABC".
  129. To specify a rule which IS case SENSITIVE,
  130. specify the characters individually.
  131. For example:
  132. rulename = %d97 %d98 %d99
  133. or
  134. rulename = %d97.98.99
  137. will match only the string which comprises only lowercased
  138. characters, abc.
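The difference between the two forms can be sketched in Python as follows. The function names match_literal and match_values are hypothetical, and the sketch assumes the input is restricted to US-ASCII, as the NOTE above requires for quoted strings.

```python
def match_literal(literal: str, text: str) -> bool:
    """Quoted-string semantics: letters compare without regard to case."""
    return literal.lower() == text.lower()


def match_values(values: list[int], text: str) -> bool:
    """Numeric-terminal semantics: every character must equal the exact value."""
    return [ord(ch) for ch in text] == values


# rulename = "aBc" accepts every case variant of the three letters.
variants = ["abc", "Abc", "aBc", "abC", "ABc", "aBC", "AbC", "ABC"]
all_match = all(match_literal("aBc", v) for v in variants)

# rulename = %d97.98.99 accepts only the lowercase string.
only_lower = [v for v in variants if match_values([97, 98, 99], v)]
```

Here all_match is True, and only_lower contains just "abc".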
  139. 2.4 External Encodings
  140. External representations of terminal value characters will vary
  141. according to constraints in the storage or transmission environment.
  142. Hence, the same ABNF-based grammar may have multiple external
  143. encodings, such as one for a 7-bit US-ASCII environment, another for
  144. a binary octet environment and still a different one when 16-bit
  145. Unicode is used. Encoding details are beyond the scope of ABNF,
  146. although Appendix A (Core) provides definitions for a 7-bit US-ASCII
  147. environment as has been common to much of the Internet.
  148. By separating external encoding from the syntax, it is intended that
  149. alternate encoding environments can be used for the same syntax.
  150. 3. OPERATORS
  151. 3.1 Concatenation Rule1 Rule2
  152. A rule can define a simple, ordered string of values -- i.e., a
  153. concatenation of contiguous characters -- by listing a sequence of
  154. rule names. For example:
  155. foo = %x61 ; a
  156. bar = %x62 ; b
  157. mumble = foo bar foo
  158. So that the rule <mumble> matches the lowercase string "aba".
  159. LINEAR WHITE SPACE: Concatenation is at the core of the ABNF
  160. parsing model. A string of contiguous characters (values) is
  161. parsed according to the rules defined in ABNF. For Internet
  162. specifications, there is some history of permitting linear white
  163. space (space and horizontal tab) to be freely and
  164. implicitly interspersed around major constructs, such as
  165. delimiting special characters or atomic strings.
  166. NOTE: This specification for ABNF does not
  167. provide for implicit specification of linear white
  168. space.
  169. Any grammar which wishes to permit linear white space around
  170. delimiters or string segments must specify it explicitly. It is
  171. often useful to provide for such white space in "core" rules that are
  174. then used variously among higher-level rules. The "core" rules might
  175. be formed into a lexical analyzer or simply be part of the main
  176. ruleset.
  177. 3.2 Alternatives Rule1 / Rule2
  178. Elements separated by forward slash ("/") are alternatives.
  179. Therefore,
  180. foo / bar
  181. will accept <foo> or <bar>.
  182. NOTE: A quoted string containing alphabetic
  183. characters is a special form for specifying alternative
  184. characters and is interpreted as a non-terminal
  185. representing the set of combinatorial strings with the
  186. contained characters, in the specified order but with
  187. any mixture of upper and lower case.
  188. 3.3 Incremental Alternatives Rule1 =/ Rule2
  189. It is sometimes convenient to specify a list of alternatives in
  190. fragments. That is, an initial rule may match one or more
  191. alternatives, with later rule definitions adding to the set of
  192. alternatives. This is particularly useful for otherwise-independent
  193. specifications which derive from the same parent rule set, such as
  194. often occurs with parameter lists. ABNF permits this incremental
  195. definition through the construct:
  196. oldrule =/ additional-alternatives
  197. So that the rule set
  198. ruleset = alt1 / alt2
  199. ruleset =/ alt3
  200. ruleset =/ alt4 / alt5
  201. is the same as specifying
  202. ruleset = alt1 / alt2 / alt3 / alt4 / alt5
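One way a tool might fold incremental definitions together is sketched here in Python. The function merge_rules and the triple layout (name, operator, alternatives) are hypothetical. Rejecting "=/" for a rule that has not yet been defined, and letting a second "=" replace the earlier definition, are assumptions of this sketch, not requirements stated in this document.

```python
def merge_rules(definitions):
    """Fold (name, operator, alternatives) triples into one rule table.

    '=' starts a rule; '=/' appends alternatives to an existing rule.
    """
    rules = {}
    for name, op, alts in definitions:
        key = name.lower()  # rule names are case-insensitive
        if op == "=":
            rules[key] = list(alts)
        elif op == "=/":
            if key not in rules:
                # Assumption: an initial '=' definition must come first.
                raise KeyError(f"'=/' used before {name!r} was defined")
            rules[key].extend(alts)
        else:
            raise ValueError(f"unknown operator: {op!r}")
    return rules


merged = merge_rules([
    ("ruleset", "=", ["alt1", "alt2"]),
    ("ruleset", "=/", ["alt3"]),
    ("ruleset", "=/", ["alt4", "alt5"]),
])
```

The resulting table holds a single entry for ruleset with the five alternatives in order, matching the combined form shown above.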
  205. 3.4 Value Range Alternatives %c##-##
  206. A range of alternative numeric values can be specified compactly,
  207. using dash ("-") to indicate the range of alternative values. Hence:
  208. DIGIT = %x30-39
  209. is equivalent to:
  210. DIGIT = "0" / "1" / "2" / "3" / "4" / "5" / "6" /
  211.         "7" / "8" / "9"
  212. Concatenated numeric values and numeric value ranges can not be
  213. specified in the same string. A numeric value may use the dotted
  214. notation for concatenation or it may use the dash notation to specify
  215. one value range. Hence, to specify one printable character, between
  216. end of line sequences, the specification could be:
  217. char-line = %x0D.0A %x20-7E %x0D.0A
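For readers more familiar with regular expressions, the char-line rule can be approximated in Python as shown here. This is a sketch of an equivalent pattern, assuming the input is a string of US-ASCII characters; the names CHAR_LINE and is_char_line are hypothetical.

```python
import re

# %x0D.0A is the two-value sequence CR LF (dotted concatenation);
# %x20-7E is exactly one value in the range 0x20 through 0x7E.
CHAR_LINE = re.compile(r"\x0d\x0a[\x20-\x7e]\x0d\x0a")


def is_char_line(text: str) -> bool:
    """Return True if text is CRLF, one printable character, CRLF."""
    return CHAR_LINE.fullmatch(text) is not None
```

The string "\r\nA\r\n" satisfies the rule, whereas "\r\nAB\r\n" does not, since the range stands for a single value and not a run of values.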
  218. 3.5 Sequence Group (Rule1 Rule2)
  219. Elements enclosed in parentheses are treated as a single element,
  220. whose contents are STRICTLY ORDERED. Thus,
  221. elem (foo / bar) blat
  222. matches (elem foo blat) or (elem bar blat), whereas
  223. elem foo / bar blat
  224. matches (elem foo) or (bar blat).
  225. NOTE: It is strongly advised to use grouping
  226. notation, rather than to rely on proper reading of
  227. "bare" alternations, when alternatives consist of
  228. multiple rule names or literals.
  229. Hence it is recommended that instead of the above form, the form:
  230. (elem foo) / (bar blat)
  231. be used. It will avoid misinterpretation by casual readers.
  232. The sequence group notation is also used within free text to set off
  233. an element sequence from the prose.
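The effect of grouping can be demonstrated with a few small parser combinators in Python. This is a sketch under simplifying assumptions: each of elem, foo, bar and blat is given a hypothetical single-character definition, and literals are compared exactly, without the case-insensitivity of quoted strings.

```python
def lit(s):
    """Match the exact string s at position i; return the set of end positions."""
    return lambda text, i: {i + len(s)} if text.startswith(s, i) else set()


def seq(*parsers):
    """Concatenation: run each parser from every end position of the previous."""
    def run(text, i):
        ends = {i}
        for p in parsers:
            ends = {e for start in ends for e in p(text, start)}
        return ends
    return run


def alt(*parsers):
    """Alternation: the union of the end positions of every alternative."""
    return lambda text, i: set().union(*(p(text, i) for p in parsers))


def matches(parser, text):
    """True if the parser can consume the whole text."""
    return len(text) in parser(text, 0)


# Hypothetical one-character definitions for the four rules.
elem, foo, bar, blat = lit("e"), lit("f"), lit("b"), lit("t")

grouped = seq(elem, alt(foo, bar), blat)    # elem (foo / bar) blat
bare = alt(seq(elem, foo), seq(bar, blat))  # elem foo / bar blat
```

With these definitions, grouped accepts "eft" and "ebt", while bare accepts "ef" and "bt" but rejects "eft", which is the misreading the grouping advice is meant to prevent.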
  236. 3.6 Variable Repetition *Rule
  237. The operator "*" preceding an element indicates repetition. The full
  238. form is:
  239. <a>*<b>element
  240. where <a> and <b> are optional decimal values, indicating at least
  241. <a> and at most <b> occurrences of element.
  242. Default values are 0 and infinity so that *<element> allows any
  243. number, including zero; 1*<element> requires at least one;
  244. 3*3<element> allows exactly 3 and 1*2<element> allows one or two.
  245. 3.7 Specific Repetition nRule
  246. A rule of the form:
  247. <n>element
  248. is equivalent to
  249. <n>*<n>element
  250. That is, exactly <n> occurrences of <element>. Thus 2DIGIT is a
  251. 2-digit number, and 3ALPHA is a string of three alphabetic
  252. characters.
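The two repetition forms reduce to a pair of bounds, which the following Python sketch computes from the prefix that precedes an element. The name parse_repeat is hypothetical, and representing "infinity" as None is a choice made for this sketch.

```python
import re

# Either <a>*<b> with both numbers optional, or a bare <n>.
REPEAT = re.compile(r"([0-9]*)\*([0-9]*)|([0-9]+)")


def parse_repeat(prefix: str):
    """Return (minimum, maximum) for a repeat prefix; None means unbounded."""
    m = REPEAT.fullmatch(prefix)
    if m is None:
        raise ValueError(f"not a repeat prefix: {prefix!r}")
    if m.group(3) is not None:
        # <n>element is equivalent to <n>*<n>element.
        n = int(m.group(3))
        return n, n
    low = int(m.group(1)) if m.group(1) else 0      # default minimum is 0
    high = int(m.group(2)) if m.group(2) else None  # default maximum is infinity
    return low, high
```

Applied to the examples in the text, "*" gives (0, None), "1*" gives (1, None), "3*3" gives (3, 3), "1*2" gives (1, 2), and the specific repetition "2" gives (2, 2).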
  253. 3.8 Optional Sequence [RULE]
  254. Square brackets enclose an optional element sequence:
  255. [foo bar]
  256. is equivalent to
  257. *1(foo bar).
  258. 3.9 ; Comment
  259. A semi-colon starts a comment that continues to the end of line.
  260. This is a simple way of including useful notes in parallel with the
  261. specifications.
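A tool that reads ABNF needs to separate comments from rule text without being misled by a semi-colon inside a quoted string. The Python sketch below shows one way to do that; the name strip_comment is hypothetical, and the sketch does not account for semi-colons or quotation marks inside an angle-bracketed prose description.

```python
def strip_comment(line: str) -> str:
    """Drop an ABNF comment, leaving semi-colons inside quoted strings alone."""
    in_quotes = False
    for i, ch in enumerate(line):
        if ch == '"':
            # Quoted strings cannot contain a double quote, so toggling is safe.
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes:
            return line[:i].rstrip()
    return line.rstrip()
```

For instance, the line `foo = %x61 ; a` is reduced to `foo = %x61`, while the quoted semi-colon in `semi = ";" ; a semicolon` is preserved.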
  264. 3.10 Operator Precedence
  265. The various mechanisms described above have the following precedence,
  266. from highest (binding tightest) at the top, to lowest and loosest at
  267. the bottom:
  268. Strings, Names formation
  269. Comment
  270. Value range
  271. Repetition
  272. Grouping, Optional
  273. Concatenation
  274. Alternative
  275. Use of the alternative operator, freely mixed with concatenations, can
  276. be confusing.
  277. Again, it is recommended that the grouping operator be used to
  278. make explicit concatenation groups.
  279. 4. ABNF DEFINITION OF ABNF
  280. This syntax uses the rules provided in Appendix A (Core).
  281. rulelist       =  1*( rule / (*c-wsp c-nl) )
  282. rule           =  rulename defined-as elements c-nl
  283.                        ; continues if next line starts
  284.                        ;  with white space
  285. rulename       =  ALPHA *(ALPHA / DIGIT / "-")
  286. defined-as     =  *c-wsp ("=" / "=/") *c-wsp
  287.                        ; basic rules definition and
  288.                        ;  incremental alternatives
  289. elements       =  alternation *c-wsp
  290. c-wsp          =  WSP / (c-nl WSP)
  291. c-nl           =  comment / CRLF
  292.                        ; comment or newline
  293. comment        =  ";" *(WSP / VCHAR) CRLF
  294. alternation    =  concatenation
  295.                   *(*c-wsp "/" *c-wsp concatenation)
  298. concatenation  =  repetition *(1*c-wsp repetition)
  299. repetition     =  [repeat] element
  300. repeat         =  1*DIGIT / (*DIGIT "*" *DIGIT)
  301. element        =  rulename / group / option /
  302.                   char-val / num-val / prose-val
  303. group          =  "(" *c-wsp alternation *c-wsp ")"
  304. option         =  "[" *c-wsp alternation *c-wsp "]"
  305. char-val       =  DQUOTE *(%x20-21 / %x23-7E) DQUOTE
  306.                        ; quoted string of SP and VCHAR
  307.                        ;  without DQUOTE
  308. num-val        =  "%" (bin-val / dec-val / hex-val)
  309. bin-val        =  "b" 1*BIT
  310.                   [ 1*("." 1*BIT) / ("-" 1*BIT) ]
  311.                        ; series of concatenated bit values
  312.                        ; or single ONEOF range
  313. dec-val        =  "d" 1*DIGIT
  314.                   [ 1*("." 1*DIGIT) / ("-" 1*DIGIT) ]
  315. hex-val        =  "x" 1*HEXDIG
  316.                   [ 1*("." 1*HEXDIG) / ("-" 1*HEXDIG) ]
  317. prose-val      =  "<" *(%x20-3D / %x3F-7E) ">"
  318.                        ; bracketed string of SP and VCHAR
  319.                        ;  without angles
  320.                        ; prose description, to be used as
  321.                        ;  last resort
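As a small illustration of how the <rule> production divides a line, the Python sketch below separates a single-line rule into its name, its defined-as operator and its elements. It is deliberately partial: the names RULE_LINE and split_rule are hypothetical, only plain spaces and tabs are accepted around the operator, and continuation lines and trailing comments are not handled.

```python
import re

RULE_LINE = re.compile(
    r"(?P<name>[A-Za-z][A-Za-z0-9-]*)"  # rulename
    r"[ \t]*(?P<op>=/|=)[ \t]*"         # defined-as, simplified to WSP only
    r"(?P<elements>.*)"                 # everything up to the end of line
)


def split_rule(line: str):
    """Split one single-line rule into (name, operator, elements)."""
    m = RULE_LINE.fullmatch(line.rstrip("\r\n"))
    if m is None:
        raise ValueError(f"not a rule line: {line!r}")
    return m.group("name"), m.group("op"), m.group("elements").rstrip()
```

For example, split_rule("ruleset =/ alt4 / alt5\r\n") returns the name ruleset, the operator =/ and the elements text alt4 / alt5.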
  322. 5. SECURITY CONSIDERATIONS
  323. Security is truly believed to be irrelevant to this document.
  326. 6. APPENDIX A - CORE
  327. This Appendix is provided as a convenient core for specific grammars.
  328. The definitions may be used as a core set of rules.
  329. 6.1 Core Rules
  330. Certain basic rules are in uppercase, such as SP, HTAB, CRLF,
  331. DIGIT, ALPHA, etc.
  332. ALPHA          =  %x41-5A / %x61-7A   ; A-Z / a-z
  333. BIT            =  "0" / "1"
  334. CHAR           =  %x01-7F
  335.                        ; any 7-bit US-ASCII character,
  336.                        ;  excluding NUL
  337. CR             =  %x0D
  338.                        ; carriage return
  339. CRLF           =  CR LF
  340.                        ; Internet standard newline
  341. CTL            =  %x00-1F / %x7F
  342.                        ; controls
  343. DIGIT          =  %x30-39
  344.                        ; 0-9
  345. DQUOTE         =  %x22
  346.                        ; " (Double Quote)
  347. HEXDIG         =  DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
  348. HTAB           =  %x09
  349.                        ; horizontal tab
  350. LF             =  %x0A
  351.                        ; linefeed
  352. LWSP           =  *(WSP / CRLF WSP)
  353.                        ; linear white space (past newline)
  354. OCTET          =  %x00-FF
  355.                        ; 8 bits of data
  356. SP             =  %x20
  359.                        ; space
  360. VCHAR          =  %x21-7E
  361.                        ; visible (printing) characters
  362. WSP            =  SP / HTAB
  363.                        ; white space
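Several of the single-character core rules map directly onto regular-expression character classes, as the Python sketch below suggests. The table CORE and the helper is_core are hypothetical names. HEXDIG is written to accept lowercase letters as well, on the reasoning that "A" through "F" are quoted strings and quoted strings are case-insensitive under Section 2.3.

```python
import re

# Single-character core rules expressed as character classes.
CORE = {
    "ALPHA": r"[\x41-\x5A\x61-\x7A]",
    "BIT": r"[01]",
    "CHAR": r"[\x01-\x7F]",
    "CTL": r"[\x00-\x1F\x7F]",
    "DIGIT": r"[\x30-\x39]",
    "HEXDIG": r"[0-9A-Fa-f]",  # quoted "A".."F" also match "a".."f"
    "VCHAR": r"[\x21-\x7E]",
    "WSP": r"[\x20\x09]",
}


def is_core(rule: str, ch: str) -> bool:
    """Return True if the single character ch satisfies the named core rule."""
    return re.fullmatch(CORE[rule], ch) is not None
```

Under this reading, is_core("HEXDIG", "f") holds, is_core("VCHAR", " ") does not, and is_core("WSP", "\t") holds.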
  364. 6.2 Common Encoding
  365. Externally, data are represented as "network virtual ASCII", namely
  366. 7-bit US-ASCII in an 8-bit field, with the high (8th) bit set to
  367. zero. A string of values is in "network byte order" with the
  368. higher-valued bytes represented on the left-hand side and being sent
  369. over the network first.
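The 7-bit constraint can be checked mechanically. The Python sketch below encodes a string as US-ASCII octets and confirms that the high bit of every octet is zero; the function name to_network_ascii is hypothetical, and the sketch covers only the character encoding, not byte ordering.

```python
def to_network_ascii(text: str) -> bytes:
    """Encode text as 7-bit US-ASCII carried in 8-bit octets."""
    # Raises UnicodeEncodeError for any character outside the 7-bit range.
    data = text.encode("ascii")
    # Every octet has its high (8th) bit set to zero.
    if any((b & 0x80) != 0 for b in data):
        raise ValueError("octet with high bit set")
    return data
```

A string made only of US-ASCII characters, such as "CRLF\r\n", is returned as the corresponding octets, while a character outside the 7-bit range causes an error.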
  370. 7. ACKNOWLEDGMENTS
  371. The syntax for ABNF was originally specified in RFC 733. Ken L.
  372. Harrenstien, of SRI International, was responsible for re-coding the
  373. BNF into an augmented BNF that makes the representation smaller and
  374. easier to understand.
  375. This recent project began as a simple effort to cull out the portion
  376. of RFC 822 which has been repeatedly cited by non-email specification
  377. writers, namely the description of augmented BNF. Rather than simply
  378. and blindly converting the existing text into a separate document,
  379. the working group chose to give careful consideration to the
  380. deficiencies, as well as benefits, of the existing specification and
  381. related specifications available over the last 15 years and therefore
  382. to pursue enhancement. This turned the project into something rather
  383. more ambitious than first intended. Interestingly the result is not
  384. massively different from that original, although decisions such as
  385. removing the list notation came as a surprise.
  386. The current round of specification was part of the DRUMS working
  387. group, with significant contributions from Jerome Abela, Harald
  388. Alvestrand, Robert Elz, Roger Fajman, Aviva Garrett, Tom Harsch, Dan
  389. Kohn, Bill McQuillan, Keith Moore, Chris Newman, Pete Resnick and
  390. Henning Schulzrinne.
  393. 8. REFERENCES
  394. [US-ASCII] Coded Character Set--7-Bit American Standard Code for
  395. Information Interchange, ANSI X3.4-1986.
  396. [RFC733] Crocker, D., Vittal, J., Pogran, K., and D. Henderson,
  397. "Standard for the Format of ARPA Network Text Message", RFC 733,
  398. November 1977.
  399. [RFC822] Crocker, D., "Standard for the Format of ARPA Internet Text
  400. Messages", STD 11, RFC 822, August 1982.
  401. 9. CONTACT
  402. David H. Crocker                    Paul Overell
  403. Internet Mail Consortium            Demon Internet Ltd
  404. 675 Spruce Dr.                      Dorking Business Park
  405. Sunnyvale, CA 94086 USA             Dorking
  406.                                     Surrey, RH4 1HN
  407.                                     UK
  408. Phone: +1 408 246 8253
  409. Fax:   +1 408 249 6205
  410. EMail: dcrocker@imc.org             paulo@turnpike.com
  413. 10. Full Copyright Statement
  414. Copyright (C) The Internet Society (1997). All Rights Reserved.
  415. This document and translations of it may be copied and furnished to
  416. others, and derivative works that comment on or otherwise explain it
  417. or assist in its implementation may be prepared, copied, published
  418. and distributed, in whole or in part, without restriction of any
  419. kind, provided that the above copyright notice and this paragraph are
  420. included on all such copies and derivative works. However, this
  421. document itself may not be modified in any way, such as by removing
  422. the copyright notice or references to the Internet Society or other
  423. Internet organizations, except as needed for the purpose of
  424. developing Internet standards in which case the procedures for
  425. copyrights defined in the Internet Standards process must be
  426. followed, or as required to translate it into languages other than
  427. English.
  428. The limited permissions granted above are perpetual and will not be
  429. revoked by the Internet Society or its successors or assigns.
  430. This document and the information contained herein is provided on an
  431. "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
  432. TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
  433. BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
  434. HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
  435. MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.