Data Interchange Format
Data Interchange Format (.dif) is a text file format used to import and export single spreadsheets between spreadsheet programs (OpenOffice.org Calc, Excel, Gnumeric, StarCalc, Lotus 1-2-3, FileMaker, dBase, Framework, Multiplan, etc.). It is also known as "Navy DIF". One limitation is that the DIF format cannot store multiple worksheets of a workbook in a single file.
History
DIF was developed by Software Arts, Inc. (the developers of the VisiCalc program) in the early 1980s. The specification was included in many copies of VisiCalc and published in Byte magazine. Bob Frankston developed the format with input from others, including Mitch Kapor, who helped ensure that it could work with his VisiPlot program. (Kapor later founded Lotus and brought Lotus 1-2-3 to market.) The specification was copyrighted in 1981.
DIF was a registered trademark of Software Arts Products Corp. (a legal name for Software Arts at the time).
Syntax
DIF stores everything in an ASCII text file, which avoided many of the cross-platform issues of the era in which it was created. However, modern spreadsheet software, e.g. OpenOffice.org Calc and Gnumeric, offers additional character encodings for export and import. The file is divided into two sections: header and data. Everything in DIF is represented by a 2- or 3-line chunk: header chunks have 3 lines, data chunks have 2. A header chunk starts with a text identifier that is all caps, consists only of alphabetic characters, and is fewer than 32 letters long. The following line must be a pair of numbers, and the third line must be a quoted string. Data chunks, on the other hand, start with a number pair, and the next line is a quoted string or a keyword.
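The chunk rules above can be sketched in Python as a reader that consumes 3-line header chunks until it reaches the DATA chunk. This is a minimal illustration under stated assumptions: the input is already split into lines, header numbers are integers, and the name `read_header` is hypothetical rather than taken from any spreadsheet program.

```python
import re

# Header identifiers: uppercase letters only, fewer than 32 of them.
_IDENTIFIER = re.compile(r"[A-Z]{1,31}")


def read_header(lines):
    """Read 3-line header chunks up to and including DATA.

    Returns (header, index): header maps each identifier to
    ((first_number, second_number), string), and index points at the
    first line of the data section.
    """
    header = {}
    i = 0
    while i + 2 < len(lines):
        identifier = lines[i].strip()
        if not _IDENTIFIER.fullmatch(identifier):
            raise ValueError(f"bad header identifier: {identifier!r}")
        # Second line of the chunk: a pair of numbers such as "0,2".
        first, second = (int(n) for n in lines[i + 1].split(","))
        # Third line of the chunk: a quoted string.
        text = lines[i + 2].strip()
        if len(text) < 2 or text[0] != '"' or text[-1] != '"':
            raise ValueError(f"expected a quoted string: {text!r}")
        # Assumption: a doubled quote inside the string is one literal quote.
        header[identifier] = ((first, second), text[1:-1].replace('""', '"'))
        i += 3
        if identifier == "DATA":
            return header, i
    raise ValueError("no DATA header chunk found")
```

Lines from the returned index onward can then be read two at a time as data values.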
Values
A value occupies two lines: the first is a pair of numbers and the second is either a string or a keyword. The first number of the pair indicates the type:
- −1 – directive type; the second number is ignored and the following line is one of these keywords:
  - BOT – beginning of tuple (start of row)
  - EOD – end of data
- 0 – numeric type; the value is the second number and the following line is one of these keywords:
  - V – valid
  - NA – not available
  - ERROR – error
  - TRUE – true boolean value
  - FALSE – false boolean value
- 1 – string type; the second number is ignored and the following line is the string in double quotes (a double quote inside the string is written twice, as in the example below)
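A sketch of decoding one 2-line value according to the type codes listed above. The mapping to Python objects is an assumption chosen for illustration: floats for V, `None` for NA, booleans for TRUE and FALSE, and a hypothetical `ERROR_VALUE` sentinel for ERROR. The undoubling of embedded quotes follows the example at the end of this article.

```python
# Hypothetical sentinel for numeric cells flagged with the ERROR keyword.
ERROR_VALUE = object()


def decode_value(pair_line, value_line):
    """Decode one 2-line DIF value into a Python object."""
    type_text, number_text = pair_line.split(",", 1)
    kind = int(type_text)
    keyword = value_line.strip()
    if kind == -1:
        # Directive (BOT or EOD); the second number is ignored.
        return ("directive", keyword)
    if kind == 0:
        # Numeric: the keyword says how to interpret the second number.
        if keyword == "V":
            return float(number_text)
        if keyword == "NA":
            return None
        if keyword == "ERROR":
            return ERROR_VALUE
        if keyword in ("TRUE", "FALSE"):
            return keyword == "TRUE"
        raise ValueError(f"unknown numeric keyword: {keyword!r}")
    if kind == 1:
        # String: drop the surrounding quotes and undouble embedded quotes.
        return keyword[1:-1].replace('""', '"')
    raise ValueError(f"unknown value type: {kind}")
```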
Header chunk
A header chunk is composed of an identifier line followed by the two lines of a value.
- TABLE – followed by a numeric value giving the format version; the otherwise unused string line of the value holds a comment identifying the generating program
- VECTORS – followed by the number of columns as a numeric value
- TUPLES – followed by the number of rows as a numeric value
- DATA – followed by a dummy numeric value of 0, after which come the table's data: each row is preceded by a BOT value, and the entire table is terminated by an EOD value
The numeric values in header chunks use an empty string in place of a validity keyword.
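The four header chunks described above can be produced by a small helper, sketched here in Python. The function name `header_lines`, the choice of version 1 for TABLE (taken from the example at the end of this article), and the doubling of quotes in the generator comment are assumptions for illustration.

```python
def header_lines(columns, rows, generator=""):
    """Build the TABLE, VECTORS, TUPLES and DATA header chunks as lines."""

    def chunk(identifier, number, text=""):
        # Identifier line, then a numeric value whose string line is a
        # quoted string (empty except for TABLE's generator comment).
        return [identifier, f"0,{number}", '"' + text.replace('"', '""') + '"']

    return (
        chunk("TABLE", 1, generator)   # assumed format version 1
        + chunk("VECTORS", columns)    # number of columns
        + chunk("TUPLES", rows)        # number of rows
        + chunk("DATA", 0)             # dummy 0 before the data section
    )
```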
Discrepancies in implementations
Some implementations (notably those of older Microsoft products) swapped the meanings of VECTORS and TUPLES. Some implementations also tolerate errors in the table dimensions declared in the header and simply use the layout of the DATA section.
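A reader that follows the tolerant behaviour described above might derive the dimensions from the DATA section and compare them with the header, sketched below. It assumes the data lines have already been separated from the header and that VECTORS means columns and TUPLES means rows, as in the header description; the function names and the "ok"/"swapped"/"mismatch" classification are hypothetical.

```python
def data_dimensions(data_lines):
    """Count (rows, columns) from the 2-line values of a DATA section."""
    rows = []  # number of cells seen in each row
    for i in range(0, len(data_lines) - 1, 2):
        kind = int(data_lines[i].split(",", 1)[0])
        keyword = data_lines[i + 1].strip()
        if kind == -1 and keyword == "EOD":
            break
        if kind == -1 and keyword == "BOT":
            rows.append(0)  # BOT opens a new row
        elif rows:
            rows[-1] += 1  # any other value is one cell of the current row
    return len(rows), max(rows, default=0)


def reconcile(declared_vectors, declared_tuples, data_lines):
    """Return (columns, rows, status), trusting the DATA layout."""
    rows, columns = data_dimensions(data_lines)
    if (declared_vectors, declared_tuples) == (columns, rows):
        status = "ok"
    elif (declared_vectors, declared_tuples) == (rows, columns):
        status = "swapped"  # header written with VECTORS/TUPLES exchanged
    else:
        status = "mismatch"
    return columns, rows, status
```

Returning a status instead of raising lets a caller decide whether a swapped or inconsistent header should be reported or silently accepted.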
Example
For example, assume we have two columns with one column header row and two data rows:
Text                           Number
hello                          1
has a double quote " in text   -3

In a .dif file, this would be:
TABLE
0,1
"EXCEL"
VECTORS
0,2
""
TUPLES
0,3
""
DATA
0,0
""
-1,0
BOT
1,0
"Text"
1,0
"Number"
-1,0
BOT
1,0
"hello"
0,1
V
-1,0
BOT
1,0
"has a double quote "" in text"
0,-3
V
-1,0
EOD
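The example file can be generated from a list of rows with a short writer, sketched here in Python. The function name `to_dif` is hypothetical; it assumes every cell is a string, a number, or a boolean, writes "EXCEL" as the default generator comment only to match the example, and joins lines with "\n" (a given program may expect CR LF line endings instead).

```python
def to_dif(rows, generator="EXCEL"):
    """Serialize a list of rows (str, bool or number cells) to DIF text."""

    def quote(text):
        # Strings are wrapped in double quotes; embedded quotes are doubled.
        return '"' + text.replace('"', '""') + '"'

    columns = max((len(row) for row in rows), default=0)
    lines = [
        "TABLE", "0,1", quote(generator),
        "VECTORS", f"0,{columns}", '""',
        "TUPLES", f"0,{len(rows)}", '""',
        "DATA", "0,0", '""',
    ]
    for row in rows:
        lines += ["-1,0", "BOT"]  # each row starts with a BOT directive
        for cell in row:
            if isinstance(cell, bool):  # check before numbers: bool is an int
                lines += [f"0,{int(cell)}", "TRUE" if cell else "FALSE"]
            elif isinstance(cell, str):
                lines += ["1,0", quote(cell)]
            else:
                lines += [f"0,{cell}", "V"]
    lines += ["-1,0", "EOD"]  # the table ends with an EOD directive
    return "\n".join(lines) + "\n"
```

Calling `to_dif([["Text", "Number"], ["hello", 1], ['has a double quote " in text', -3]])` yields the file shown above, one item per line.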
Wikimedia Foundation. 2010.