HTML in text
Introduction
Some ONIX fields support complex formatted text such as (X)HTML, Word documents and PDF. Examples of such fields are <Description> and <Text>. All of these fields use the textformat attribute to declare their content type. Bokbasen stores only plain text for these fields, but accepts several text formats on import. This page explains which text formats are supported and how Bokbasen converts formatted content into plain text.
This text assumes technical knowledge of HTML, HTML entities and the ONIX standard. Contact Bokbasen if you have any questions.
textformat attribute
Bokbasen supports only the following values for the textformat attribute: empty (defaults to 06), 02, 03, 05 or 06. Fields with any other value are ignored. All of these values are defined in ONIX code list 34.
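The rule above can be sketched as a small check. This is an illustration only, not Bokbasen's implementation: the function name `is_supported` and the two constants are hypothetical, and the sketch assumes that a missing attribute and an empty attribute are both treated as 06.

```python
import xml.etree.ElementTree as ET

# Values accepted according to the rule above (ONIX code list 34).
SUPPORTED_TEXTFORMATS = {"02", "03", "05", "06"}
# Assumption: a missing or empty textformat is read as 06.
DEFAULT_TEXTFORMAT = "06"


def is_supported(element):
    """Return True if the element's textformat is one of the accepted values."""
    # element.get() returns None when the attribute is absent and "" when it
    # is empty; both fall back to the default.
    text_format = element.get("textformat") or DEFAULT_TEXTFORMAT
    return text_format in SUPPORTED_TEXTFORMATS


if __name__ == "__main__":
    for fragment in (
        '<Text textformat="02">Some text</Text>',
        "<Text>Some text</Text>",
        '<Text textformat="01">Some text</Text>',
    ):
        print(fragment, "->", is_supported(ET.fromstring(fragment)))
```

A field for which the check returns False would be skipped on import, as described above.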
Conversion
All fields with a supported textformat value are treated identically and converted according to the following rules:
- All HTML-tags are removed
- All HTML entities are converted to standard characters
Standard line breaks (\n) are preserved. Bokbasen makes no attempt to replicate formatting expressed in HTML; the tags are simply removed. To ensure that data sent to Bokbasen is exported in the same form, we recommend sending properly formatted plain text with textformat 06.
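The two conversion rules can be approximated with Python's standard library. This is a minimal sketch of the described behaviour, not Bokbasen's actual converter: the names `_TextOnly` and `to_plain_text` are hypothetical, and edge cases such as malformed markup may be handled differently in the real import.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data and discards every tag."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; or &#229;
        # before the text reaches handle_data().
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def to_plain_text(markup):
    """Remove HTML tags and decode HTML entities; line breaks are kept as-is."""
    parser = _TextOnly()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


if __name__ == "__main__":
    print(to_plain_text("<p>Testing</p><br/> Number"))  # Testing Number
    print(to_plain_text("Bill &amp; Ben"))              # Bill & Ben
```

Because tags are dropped rather than translated, a `<br/>` or `<p>` does not become a line break in the output; only literal \n characters in the input survive.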
Examples
Here are some examples of how various combinations of entities and CDATA tags are handled by Bokbasen.
ONIX in | ONIX export |
---|---|
<Text>Bill & Ben</Text> | <Text>Bill & Ben</Text> |
<Text>Bill <![CDATA[&]]> Ben</Text> | <Text>Bill & Ben</Text> |
<Text>Klågerup</Text> | <Text>Klågerup</Text> |
<Text>Kl<![CDATA[å]]>gerup</Text> | <Text>Klågerup</Text> |
<Text>Kl<![CDATA[å]]>gerup</Text> | <Text>Klågerup</Text> |
<Text>Klågerup</Text> | <Text>Klågerup</Text> |
<Text>Klågerup</Text> | <Text>Klågerup</Text> |
<Text>Kl<![CDATA[å]]>gerup</Text> | <Text>Klågerup</Text> |
<Text>Kl<![CDATA[å]]>gerup</Text> | <Text>Klågerup</Text> |
<Text><![CDATA[<p>Testing</p><br/> Number]]></Text> | <Text>Testing Number</Text> |
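One reason an escaped entity and a CDATA section lead to the same export is that a conforming XML parser hands both to the application as the same character data. The sketch below shows this with Python's standard library. It assumes a standard XML parser is used on import, which this page does not state, and the helper name `field_text` is hypothetical.

```python
import xml.etree.ElementTree as ET


def field_text(xml_fragment):
    """Return the character data of a single element as the parser sees it."""
    return "".join(ET.fromstring(xml_fragment).itertext())


if __name__ == "__main__":
    escaped = field_text("<Text>Bill &amp; Ben</Text>")
    cdata = field_text("<Text>Bill <![CDATA[&]]> Ben</Text>")
    # Both spellings yield the same string once parsed.
    print(escaped == cdata, repr(escaped))

    # Markup inside CDATA arrives as literal text; removing the tags
    # is a separate step, as described under Conversion.
    print(repr(field_text("<Text><![CDATA[<p>Testing</p><br/> Number]]></Text>")))
```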