dcm2xml [options] dcmfile-in [xmlfile-out]
The dcm2xml utility converts the contents of a DICOM file (file format or raw data set) to XML (Extensible Markup Language). There are two output formats. The first one is specific to DCMTK with its DTD (Document Type Definition) described in the file dcm2xml.dtd. The second one refers to the 'Native DICOM Model' which is specified for the DICOM Application Hosting service found in DICOM part 19.
If dcm2xml reads a raw data set (DICOM data without a file format meta-header) it will attempt to guess the transfer syntax by examining the first few bytes of the file. It is not always possible to correctly guess the transfer syntax and it is better to convert a data set to a file format whenever possible (using the dcmconv utility). It is also possible to use the -f and -t[ieb] options to force dcm2xml to read a data set with a particular transfer syntax.
dcmfile-in DICOM input filename to be converted xmlfile-out XML output filename (default: stdout)
  -h    --help
          print this help text and exit
        --version
          print version information and exit
        --arguments
          print expanded command line arguments
  -q    --quiet
          quiet mode, print no warnings and errors
  -v    --verbose
          verbose mode, print processing details
  -d    --debug
          debug mode, print debug information
  -ll   --log-level  [l]evel: string constant
          (fatal, error, warn, info, debug, trace)
          use level l for the logger
  -lc   --log-config  [f]ilename: string
          use config file f for the logger
input file format:
  +f    --read-file
          read file format or data set (default)
  +fo   --read-file-only
          read file format only
  -f    --read-dataset
          read data set without file meta information
input transfer syntax:
  -t=   --read-xfer-auto
          use TS recognition (default)
  -td   --read-xfer-detect
          ignore TS specified in the file meta header
  -te   --read-xfer-little
          read with explicit VR little endian TS
  -tb   --read-xfer-big
          read with explicit VR big endian TS
  -ti   --read-xfer-implicit
          read with implicit VR little endian TS
long tag values:
  +M    --load-all
          load very long tag values (e.g. pixel data)
  -M    --load-short
          do not load very long values (default)
  +R    --max-read-length  [k]bytes: integer (4..4194302, default: 4)
          set threshold for long values to k kbytes
specific character set:
  +Cr   --charset-require
          require declaration of extended charset (default)
  +Ca   --charset-assume  [c]harset: string
          assume charset c if no extended charset declared
  +Cc   --charset-check-all
          check all data elements with string values
          (default: only PN, LO, LT, SH, ST, UC and UT)
          # this option is only used for the mapping to an appropriate
          # XML character encoding, but not for the conversion to UTF-8
  +U8   --convert-to-utf8
          convert all element values that are affected
          by Specific Character Set (0008,0005) to UTF-8
          # requires support from an underlying character encoding library
          # (see output of --version on which one is available)
general XML format:
  -dtk  --dcmtk-format
          output in DCMTK-specific format (default)
  -nat  --native-format
          output in Native DICOM Model format (part 19)
  +Xn   --use-xml-namespace
          add XML namespace declaration to root element
DCMTK-specific format (not with --native-format):
  +Xd   --add-dtd-reference
          add reference to document type definition (DTD)
  +Xe   --embed-dtd-content
          embed document type definition into XML document
  +Xf   --use-dtd-file  [f]ilename: string
          use specified DTD file (only with +Xe)
          (default: /usr/local/share/dcmtk/dcm2xml.dtd)
  +Wn   --write-element-name
          write name of the DICOM data elements (default)
  -Wn   --no-element-name
          do not write name of the DICOM data elements
  +Wb   --write-binary-data
          write binary data of OB and OW elements
          (default: off, be careful with --load-all)
encoding of binary data:
  +Eh   --encode-hex
          encode binary data as hex numbers
          (default for DCMTK-specific format)
  +Eu   --encode-uuid
          encode binary data as a UUID reference
          (default for Native DICOM Model)
  +Eb   --encode-base64
          encode binary data as Base64 (RFC 2045, MIME)
The basic structure of the DCMTK-specific XML output created from a DICOM file looks like the following:
<?xml version="1.0" encoding="ISO-8859-1"?> <!DOCTYPE file-format SYSTEM "dcm2xml.dtd"> <file-format xmlns="http://dicom.offis.de/dcmtk"> <meta-header xfer="1.2.840.10008.1.2.1" name="Little Endian Explicit"> <element tag="0002,0000" vr="UL" vm="1" len="4" name="MetaElementGroupLength"> 166 </element> ... <element tag="0002,0013" vr="SH" vm="1" len="16" name="ImplementationVersionName"> OFFIS_DCMTK_353 </element> </meta-header> <data-set xfer="1.2.840.10008.1.2" name="Little Endian Implicit"> <element tag="0008,0005" vr="CS" vm="1" len="10" name="SpecificCharacterSet"> ISO_IR 100 </element> ... <sequence tag="0028,3010" vr="SQ" card="2" name="VOILUTSequence"> <item card="3"> <element tag="0028,3002" vr="xs" vm="3" len="6" name="LUTDescriptor"> 256\0\8 </element> ... </item> ... </sequence> ... <element tag="7fe0,0010" vr="OW" vm="1" len="262144" name="PixelData" loaded="no" binary="hidden"> </element> </data-set> </file-format>
The 'file-format' and 'meta-header' tags are absent for DICOM data sets.
Furthermore, binary information of OB and OW attributes are not written to the XML output file by default. These elements can be identified by the additional attribute 'binary' with a value of 'hidden' (default is 'no'). The command line option --write-binary-data causes also binary value fields to be printed (attribute value is 'yes' or 'base64'). But, be careful when using this option together with --load-all because of the large amounts of pixel data that might be printed to the output. Please note that in this context element values with a VR of OD or OF are not regarded as 'binary information'.
Multiple values (i.e. where the DICOM value multiplicity is greater than 1) are separated by a backslash '\' (except for Base64 encoded data). The 'len' attribute indicates the number of bytes for the particular value field as stored in the DICOM data set, i.e. it might deviate from the XML encoded value length e.g. because of non-significant padding that has been removed. If this attribute is missing in 'sequence' or 'item' start tags, the corresponding DICOM element has been stored with undefined length.
The description of the Native DICOM Model format can be found in the DICOM standard, part 19 ('Application Hosting').
In addition, Supplement 163 (Store Over the Web by Representational State Transfer Services) introduces a new <InlineBinary> XML element that allows for encoding binary data as Base64. Currently, the command line option --encode-base64 enables this encoding for the following VRs: OB, OD, OF, OW, and UN.
ASCII (ISO_IR 6) => "UTF-8" UTF-8 "ISO_IR 192" => "UTF-8" ISO Latin 1 "ISO_IR 100" => "ISO-8859-1" ISO Latin 2 "ISO_IR 101" => "ISO-8859-2" ISO Latin 3 "ISO_IR 109" => "ISO-8859-3" ISO Latin 4 "ISO_IR 110" => "ISO-8859-4" ISO Latin 5 "ISO_IR 148" => "ISO-8859-9" Cyrillic "ISO_IR 144" => "ISO-8859-5" Arabic "ISO_IR 127" => "ISO-8859-6" Greek "ISO_IR 126" => "ISO-8859-7" Hebrew "ISO_IR 138" => "ISO-8859-8"
If this DICOM attribute is missing in the input file, although needed, option --charset-assume can be used to specify an appropriate character set manually (using one of the DICOM defined terms). For reasons of backward compatibility with previous versions of this tool, the following terms are also supported and mapped automatically to the associated DICOM defined terms: latin-1, latin-2, latin-3, latin-4, latin-5, cyrillic, arabic, greek, hebrew.
Multiple character sets using code extension techniques are not supported. If needed, option --convert-to-utf8 can be used to convert the DICOM file or data set to UTF-8 encoding prior to the conversion to XML format. This is also useful for DICOMDIR files where each directory record can have a different character set.
The level of logging output of the various command line tools and underlying libraries can be specified by the user. By default, only errors and warnings are written to the standard error stream. Using option --verbose also informational messages like processing details are reported. Option --debug can be used to get more details on the internal activity, e.g. for debugging purposes. Other logging levels can be selected using option --log-level. In --quiet mode only fatal errors are reported. In such very severe error events, the application will usually terminate. For more details on the different logging levels, see documentation of module 'oflog'.
In case the logging output should be written to file (optionally with logfile rotation), to syslog (Unix) or the event log (Windows) option --log-config can be used. This configuration file also allows for directing only certain messages to a particular output stream and for filtering certain messages based on the module or application where they are generated. An example configuration file is provided in <etcdir>/logger.cfg.
All command line tools use the following notation for parameters: square brackets enclose optional values (0-1), three trailing dots indicate that multiple values are allowed (1-n), a combination of both means 0 to n values.
Command line options are distinguished from parameters by a leading '+' or '-' sign, respectively. Usually, order and position of command line options are arbitrary (i.e. they can appear anywhere). However, if options are mutually exclusive the rightmost appearance is used. This behavior conforms to the standard evaluation rules of common Unix shells.
In addition, one or more command files can be specified using an '@' sign as a prefix to the filename (e.g. @command.txt). Such a command argument is replaced by the content of the corresponding text file (multiple whitespaces are treated as a single separator unless they appear between two quotation marks) prior to any further evaluation. Please note that a command file cannot contain another command file. This simple but effective approach allows one to summarize common combinations of options/parameters and avoids longish and confusing command lines (an example is provided in file <datadir>/dumppat.txt).
The dcm2xml utility will attempt to load DICOM data dictionaries specified in the DCMDICTPATH environment variable. By default, i.e. if the DCMDICTPATH environment variable is not set, the file <datadir>/dicom.dic will be loaded unless the dictionary is built into the application (default for Windows).
The default behavior should be preferred and the DCMDICTPATH environment variable only used when alternative data dictionaries are required. The DCMDICTPATH environment variable has the same format as the Unix shell PATH variable in that a colon (':') separates entries. On Windows systems, a semicolon (';') is used as a separator. The data dictionary code will attempt to load each file specified in the DCMDICTPATH environment variable. It is an error if no data dictionary can be loaded.
<datadir>/dcm2xml.dtd - Document Type Definition (DTD) file
Copyright (C) 2002-2016 by OFFIS e.V., Escherweg 2, 26121 Oldenburg, Germany.