You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.

242 lines
9.5 KiB

4 years ago
////
Included in:
- user-manual: tables
////
=== Escaping the Cell Separator
The parser scans for the cell separator to partition cells _before_ it processes the cell text.
So even if you try to hide the cell separator using an inline passthrough, the parser will see it.
If the cell contain contains the cell separator, you must escape that character.
There are three ways to escape it:
* Prefix the character with a leading backslash (i.e., `\|`), which will be removed from the output.
* Use the `\{vbar}` attribute reference as a substitute.
* Change the cell separator used by the table.
Unless you do one of these things, the cell separator will be interpreted as a cell boundary.
Consider the following example, which escapes the cell separator using a leading backslash:
[source]
----
[cols=2*]
|====
|The default separator in PSV tables is the \| character.
|The \| character is often referred to as a "`pipe`".
|====
----
This table will render as follows:
.Result: Converted PSV table that contains pipe characters
[cols=2*]
|====
|The default separator in PSV tables is the \| character.
|The \| character is often referred to as a "`pipe`".
|====
Notice that the pipe character appears without the leading backslash (i.e., unescaped) in the rendered result.
An alternative is to use the `\{vbar}` attribute reference as a substitute.
This approach produces the same result as the previous example.
[source]
----
[cols=2*]
|====
|The default separator in PSV tables is the {vbar} character.
|The {vbar} character is often referred to as a "`pipe`".
|====
----
Escaping each cell separator character that appears in the content of a cell can be tedious.
There are also times when you can't or don't want to modify the cell content (perhaps because it is being included from another file).
To address these cases, AsciiDoc allows you to override the cell separator.
The cell separator is controlled using the `separator` attribute on the table block.
You'll want to select a character that will never be used for content.
A good candidate is the broken bar, or `¦`.
Here's the previous example rewritten using a custom separator.
[source]
----
[cols=2*,separator=¦]
|====
¦The default separator in PSV tables is the | character.
¦The | character is often referred to as a "`pipe`".
|====
----
Notice that it's no longer necessary to escape the pipe character in the content of the table cells.
You can safely use the original cell separator in the cell content and not worry about it being interpreted as the boundary of a cell.
[#delimiter-separated-values]
=== Delimiter-Separated Values
Tables can also be populated from data formatted as delimiter-separated values (i.e., data tables).
In contrast with the PSV format, in which the delimiter is placed in front of each cell value, the delimiter in a delimiter-separated format (CSV, TSV, DSV) is placed between the cell values (called a _separator_) and does not accept a cell formatting spec.
Each line of data is assumed to represent a single row, though you'll learn that's not a strict rule.
How the table data gets interpreted is controlled by the `format` and `separator` attributes on the table.
.What the delimiter?
****
Aren't comma-separated values a subset of https://en.wikipedia.org/wiki/Delimiter-separated_values[delimiter-separated values]?
It really depends on who you consult.
The term "`delimiter-separated values`" in this text refers to the family of data formats that use a delimiter, including comma-separated values (CSV), tab-separated values (TSV) and delimited data (DSV), all of which are supported in AsciiDoc tables.
CSV is the data format used most often.
"`Comma-separated values`" is really a misleading term since CSV can use delimiters other than `,` as the field separator (which, in this context, separates cells).
What we're really talking about is how the data is interpreted.
CSV and TSV both use a delimiter and an optional enclosing character, loosely based on https://tools.ietf.org/html/rfc4180[RFC 4180].
DSV (i.e., delimited data) only uses a delimiter, which can be escaped using a backslash; an enclosing character is not recognized.
These parsing rules are described in detail in <<data-table-formats>>.
****
Let's consider an example of using comma-separated values (CSV) to populate an AsciiDoc table with data.
To instruct the processor to read the data as CSV, set the value of the `format` attribute on the table to `csv`.
When the `format` attribute is set to `csv`, the default data separator is a comma (`,`), as seen in the table below.
[source]
----
include::ex-table-data.adoc[tag=csv]
----
.Result: Rendered CSV table
[width=90%]
include::ex-table-data.adoc[tag=csv]
This feature is particularly useful when you want to populate a table in your manuscript from data stored in a separate file.
You can do so using the <<user-manual#include-directive,include directive>> between the table delimiters, as shown here:
[source]
----
[%header,format=csv]
|===
\include::tracks.csv[]
|===
----
If your data is separated by tabs instead of commas, set the `format` to `tsv` (tab-separated values) instead.
Now let's consider an example of using delimited data (DSV) to populate an AsciiDoc table with data.
To instruct the processor to read the data as DSV, set the value of the `format` attribute on the table to `dsv`.
When the `format` attribute is set to `dsv`, the default data separator is a colon (`:`), as seen in the table below.
[source]
----
include::ex-table-data.adoc[tag=dsv]
----
.Result: Rendered DSV table
[width=90%]
include::ex-table-data.adoc[tag=dsv]
==== Data Table Formats
The CSV and TSV data formats are parsed differently from the DSV data format.
The following two sections outline those differences.
===== CSV and TSV
Table data in either CSV or TSV format is parsed according to the following rules, loosely based on https://tools.ietf.org/html/rfc4180[RFC 4180]:
* The default delimiter for CSV is a comma (`,`) while the default delimiter for TSV is a tab character.
* Blank lines are skipped (unless enclosed in a quoted value).
* Whitespace surrounding each value is stripped.
* Values can be enclosed in double quotes (`"`).
** A quoted value may contain zero or more separator or newline characters.
** A newline begins a new row unless the newline is enclosed in double quotes.
** A quoted value may include the double quote character if escaped using another double quote (`""`).
** Newlines in quoted values are retained (as of 1.5.7).
* If rows do not have the same number of cells ("`ragged`" tables), cells are shuffled to fully fill the rows.
** This is different behavior than Excel, which pads short rows with empty cells.
** Extra cells at the end of the last row get dropped.
** As a rule of thumb, data for a single row should be on the same line.
===== DSV
Table data in DSV format is parsed according to the following rules:
* The default delimiter for DSV is a colon (`:`).
* Blank lines are skipped.
* Whitespace surrounding each value is stripped.
* The delimiter character can be included in the value if escaped using a single backslash (`\:`).
* If rows do not have the same number of cells ("`ragged`" tables), cells are shuffled to fully fill the rows.
==== Custom Delimiters
Each data format has a default separator associated with it (csv = comma, tsv = tab, dsv = colon), but the separator can be changed to any character (or even a string of characters) by setting the `separator` attribute on the table.
Here's an example of a DSV table that uses a custom separator character (i.e., delimiter):
.A DSV table with a custom separator
[source]
----
[format=dsv,separator=;]
|===
a;b;c
d;e;f
|===
----
TIP: To make a TSV table, you can set the `format` attribute to `csv` and the separator to `\t`.
Though the `tsv` format is preferred.
The separator is independent of the processing rules for the format.
If you set `format=dsv` and `separator=,`, the data will be processed using the DSV rules, even though the data looks like CSV.
==== Shorthand Notation for Data Tables
Asciidoctor provides shorthand notation for specifying the data format of a table.
The first position of the table block delimiter (i.e., `|===`) can be replaced by a built-in delimiter to set the table format (e.g., `,===` for CSV).
To make a CSV table, you can use `,===` as the table block delimiter:
[source]
----
include::ex-table-data.adoc[tag=s-csv]
----
.Result: Rendered CSV table using shorthand syntax
[width=90%]
include::ex-table-data.adoc[tag=s-csv]
To make a DSV table, you can use `:===` as the table block delimiter:
[source]
----
include::ex-table-data.adoc[tag=s-dsv]
----
.Result: Rendered DSV table using shorthand syntax
[width=90%]
include::ex-table-data.adoc[tag=s-dsv]
When using either the CSV or DSV shorthand, you do not need to set the `format` attribute as it's implied.
To make a TSV table, you can set the `format` attribute to `tsv` instead of having to set the `format` to `csv` and the separator to `\t`.
In this case, you can use either `|===` or `,===` as the table block delimiter.
There is no special delimited block notation for a TSV table.
==== Formatting Cells in a Data Table
The delimited formats do not provide a way to express formatting of individual table cells.
Instead, you can apply cell formatting to all cells in a given column using the `cols` spec on the table:
[source]
----
[format=csv,cols="1h,1a"]
|===
Sky,image::sky.jpg[]
Forest,image::forest.jpg[]
|===
----
Data tables do not support cells that span multiple rows or columns, since that information can only be expressed at the cell level.
You are advised to use the PSV format if you need that functionality.