CSVLoader
From BioWeka
Description
The CSVLoader converts any text file in a tabular format into the ARFF format. It was especially designed to load gene expression data. It was tested with the following formats:
- TAV: TIGR MIDAS
- MEV: TIGR MeV
- Stanford Microarray Database
- Spot
Application
Command line
Given the file test.dat shown below:
Name,Experiment 1,Experiment 2
A,1.0
B,NaN,.05
Note that the first line contains the column names. The first column is of type string, while the second and third columns are numeric. Columns are separated by commas. The first data item (the second line) is incomplete, i.e. its last column is missing. In the second data item (the third line) a value could not be measured, so its value is NaN (not a number).
The file can be converted into the ARFF format using the CSVLoader on a Windows command line like this:
>java -cp weka.jar;biojava-1.4.jar;bioweka-0.4.1.jar;jaligner.jar bioweka.core.converters.universal.CSVLoader -i test.dat -o test.arff -X "," -R 2,3
The command line parameters -i and -o define the input and output files. -X sets the column separator, and -R defines the range of numeric attributes (here columns 2 and 3, counting the first column as 1).
The result in the output file test.arff looks like this:
@relation 'test.dat | bioweka.core.converters.universal.CSVLoader -R 2,3 -X ,'
@attribute Name string
@attribute 'Experiment 1' numeric
@attribute 'Experiment 2' numeric
@data
A,1,?
B,?,0.05
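The mapping from test.dat to test.arff can be illustrated with a small standalone Python sketch. It is not BioWeka's implementation and does not use the BioWeka or Weka APIs. The function name to_arff and its parameters are hypothetical, the relation name is simplified to the file name, and it is assumed that numeric columns are given as 1-based column numbers, as in -R 2,3. Missing trailing columns and NaN values both become the ARFF missing-value marker "?".

```python
import csv
import io
import math


def to_arff(text, relation, numeric_cols, sep=","):
    """Convert delimited text to ARFF (hypothetical helper, not BioWeka code).

    numeric_cols holds 1-based column numbers, mirroring "-R 2,3".
    """
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=sep) if r]
    header, data = rows[0], rows[1:]

    def quote(name):
        # Names containing spaces are quoted (quote escaping is not handled here).
        return "'%s'" % name if " " in name else name

    def cell(value, col):
        if col in numeric_cols:
            try:
                number = float(value)
            except ValueError:
                return "?"  # empty or unparsable numeric cell
            if math.isnan(number):
                return "?"  # NaN is treated as a missing value
            return "%g" % number
        return value if value else "?"

    lines = ["@relation " + quote(relation)]
    for col, name in enumerate(header, start=1):
        kind = "numeric" if col in numeric_cols else "string"
        lines.append("@attribute %s %s" % (quote(name), kind))
    lines.append("@data")
    for row in data:
        # Pad incomplete rows so every row has one cell per column.
        row = row + [""] * (len(header) - len(row))
        lines.append(",".join(cell(v, c) for c, v in enumerate(row, start=1)))
    return "\n".join(lines)


sample = "Name,Experiment 1,Experiment 2\nA,1.0\nB,NaN,.05\n"
print(to_arff(sample, "test.dat", {2, 3}))
# The two data rows are printed as:
# A,1,?
# B,?,0.05
```

Padding short rows before conversion keeps the handling of incomplete lines and of NaN values in a single place, so both cases end up as "?" in the @data section.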

