Informix character fields can contain special characters and even newlines. This allows for formatted data, e.g. a list spanning multiple lines within a single field. A newline is also an end of record character when a table is unloaded to a file, e.g. with the UNLOAD command, a dbexport, or optionally in an external table. To distinguish a literal newline character in the unload file, Informix escapes the newline with a backslash (“\”) to indicate that the next character does not have any special meaning. Note that the escape character can be changed by setting DEFAULTESCCHAR in the Informix configuration file ($ONCONFIG), or as a session environment variable.
When a file is loaded into a table with Informix, the escape character itself is not inserted into the table but instead directs the load to insert the following newline as a literal newline and to not treat it as the end of the record.
When loading such a file into a non-Informix database, the special significance of an escape character is not supported, and instead the loading utility will attempt to load the escape character itself and will then treat the embedded newline as the end of the record. This will usually result in an error because the record will have too few fields. Loading into Oracle for example may show as:
ORA-01722: invalid number
The usual solution for this condition is to convert the unload file and strip out the 2-character combination of backslash + newline. Most Unix utilities will also treat newlines as an end of record characters so you cannot just use a simple “search and replace” regular expression to identify and remove these combinations. There are several ways to tackle the problem, but I use the “sed” command with the “N” operator to pull up the following line when a line ends in a backslash, then remove the backslash + newline and replace it with a “space”. I choose to substitute a space for the newline to preserve readability in cases where the field may have contained a list.
The following is an example of a script that will perform this conversion. It also:
- Strips return characters from the unload file
- Replaces literal “\” characters, shown as “\\” in the unload file with a single backslash
The script assumes that the default character of “\” is used to escape newlines in the file.
INFILE=$1
if [[ ! -s "${INFILE}" ]]
then
echo "File $INFILE does not exist or is empty"
exit 1
fi
cat ${INFILE} | tr -d "\015" | sed '
:loopy
/\\$/{
N
s/\\\n/ /g
/\\$/b loopy
};
s/\\\\/\\/g
'
Input file (tstfile):
AAA|BBB|CCC|
DDD\
D\
DDD|EE\
EE|F\
F|
GGG|HHH|I\
II\
I|
JJJ|KKK|LLL|
MMM|List #1 - 1\\2 a thing\
List #2\
List #3|NNN|
Results:
./remove_nl.sh tstfile
AAA|BBB|CCC|
DDD D DDD|EE EE|F F|
GGG|HHH|I II I|
JJJ|KKK|LLL|
MMM|List #1 - 1\2 a thing List #2 List #3|NNN|
Mike Walker (mike@xdbsystems.com)
IBM Data Champion