SG++-Doxygen-Documentation
sgpp::datadriven::DataSourceConfig Struct Reference

Configuration structure used for all kinds of SampleProviders including default values.

#include <DataSourceConfig.hpp>

Public Attributes

size_t batchSize_ = 0
 
datadriven::DataTransformationConfig dataTransformationConfig_
 
size_t epochs_ = 1
 The number of epochs to train on.
 
std::string filePath_ = ""
 Valid path to a file on disk.
 
DataSourceFileType fileType_ = DataSourceFileType::NONE
 Which type of input file are we dealing with? NONE for auto detection or generated artificial datasets.
 
bool hasTargets_ = true
 Whether the file has targets (i.e. supervised learning).
 
bool isCompressed_ = false
 The dataset is gzip compressed.
 
size_t numBatches_ = 1
 Number of batches the dataset is split into for batch learning. If 1, the entire dataset is used.
 
int64_t randomSeed_ = -1
 Seed for the shuffling PRNG.
 
std::vector< double > readinClasses_ = std::vector<double>()
 Specifies the set of classes (targets) to be read in from the data file. Any line with a class not contained in this vector is skipped. If hasTargets=false, this is ignored. If empty, all classes/targets are considered (default).
 
std::vector< size_t > readinColumns_ = std::vector<size_t>()
 Specifies the set of columns (dimensions) to be read in from the data file. Indices start at 0 and order matters. Any column not contained in this vector is ignored as a dimension. If empty, all columns are read in (default).
 
size_t readinCutoff_ = -1
 Number of (valid) lines of the source file after which reading stops.
 
DataSourceShufflingType shuffling_ = DataSourceShufflingType::sequential
 The type of shuffling to be applied to the data.
 
size_t testBatchSize_ = 0
 
std::string testFilePath_ = ""
 Valid path to a file on disk.
 
DataSourceFileType testFileType_ = DataSourceFileType::NONE
 Which type of input file are we dealing with? NONE for auto detection or generated artificial datasets.
 
bool testHasTargets_ = true
 Whether the file has targets (i.e. supervised learning).
 
bool testIsCompressed_ = false
 The dataset is gzip compressed.
 
size_t testNumBatches_ = 1
 Number of batches the dataset is split into for batch learning. If 1, the entire dataset is used.
 
std::vector< double > testReadinClasses_ = std::vector<double>()
 Specifies the set of classes (targets) to be read in from the data file. Any line with a class not contained in this vector is skipped. If hasTargets=false, this is ignored. If empty, all classes/targets are considered (default).
 
std::vector< size_t > testReadinColumns_ = std::vector<size_t>()
 Specifies the set of columns (dimensions) to be read in from the data file. Indices start at 0 and order matters. Any column not contained in this vector is ignored as a dimension. If empty, all columns are read in (default).
 
size_t testReadinCutoff_ = -1
 Number of (valid) lines of the source file after which reading stops.
 
double validationPortion_ = 0.3
 

Detailed Description

Configuration structure used for all kinds of SampleProviders including default values.

Member Data Documentation

◆ batchSize_

◆ dataTransformationConfig_

◆ epochs_

size_t sgpp::datadriven::DataSourceConfig::epochs_ = 1

The number of epochs to train on.

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ filePath_

std::string sgpp::datadriven::DataSourceConfig::filePath_ = ""

◆ fileType_

◆ hasTargets_

bool sgpp::datadriven::DataSourceConfig::hasTargets_ = true

Whether the file has targets (i.e. supervised learning).

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ isCompressed_

◆ numBatches_

size_t sgpp::datadriven::DataSourceConfig::numBatches_ = 1

Number of batches the dataset is split into for batch learning. If 1, the entire dataset is used.

Referenced by sgpp::datadriven::DataSource::end(), sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig(), sgpp::datadriven::DataSource::getNextSamples(), and sgpp::datadriven::DataSourceBuilder::inBatches().
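
To illustrate what splitting a dataset into numBatches_ batches can look like, the following sketch divides a sample count into nearly equal contiguous index ranges. The helper splitIntoBatches is hypothetical and not part of SG++; this page does not describe how DataSource actually sizes its batches.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper, not part of SG++: split numSamples samples into
// numBatches contiguous [begin, end) index ranges of nearly equal size.
std::vector<std::pair<std::size_t, std::size_t>> splitIntoBatches(
    std::size_t numSamples, std::size_t numBatches) {
  assert(numBatches > 0);  // at least one batch is required
  std::vector<std::pair<std::size_t, std::size_t>> ranges;
  const std::size_t base = numSamples / numBatches;
  const std::size_t remainder = numSamples % numBatches;
  std::size_t begin = 0;
  for (std::size_t i = 0; i < numBatches; ++i) {
    // The first `remainder` batches receive one extra sample.
    const std::size_t size = base + (i < remainder ? 1 : 0);
    ranges.emplace_back(begin, begin + size);
    begin += size;
  }
  return ranges;
}
```

With numBatches set to 1, the single range covers the entire dataset, which matches the documented default.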

◆ randomSeed_

int64_t sgpp::datadriven::DataSourceConfig::randomSeed_ = -1
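
A fixed seed is typically used to make shuffling reproducible. The sketch below shows that idea with standard library facilities only. The helper shuffledIndices and the choice of std::mt19937_64 are assumptions for illustration; this page does not state which generator SG++ uses or how the default value of -1 is interpreted.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper, not part of SG++: return the indices 0..n-1 in an
// order determined by the given seed.
std::vector<std::size_t> shuffledIndices(std::size_t n, std::int64_t seed) {
  std::vector<std::size_t> indices(n);
  std::iota(indices.begin(), indices.end(), std::size_t{0});
  // Assumption: the seed is passed to the generator unchanged.
  std::mt19937_64 prng(static_cast<std::uint64_t>(seed));
  std::shuffle(indices.begin(), indices.end(), prng);
  return indices;
}
```

Calling the helper twice with the same seed yields the same permutation on a given standard library implementation.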

◆ readinClasses_

std::vector<double> sgpp::datadriven::DataSourceConfig::readinClasses_ = std::vector<double>()

Specifies the set of classes (targets) to be read in from the data file. Any line with a class not contained in this vector is skipped. If hasTargets=false, this is ignored. If empty, all classes/targets are considered (default).

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().
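
The filtering rules for readinClasses_ can be sketched as a single predicate. The helper keepLine is hypothetical and not part of SG++, and comparing targets with exact equality is an assumption; the library may compare differently.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper, not part of SG++: decide whether a line with the given
// target value is kept under the documented readinClasses_ rules.
bool keepLine(double target, bool hasTargets,
              const std::vector<double>& readinClasses) {
  // Without targets, or with an empty class list, nothing is filtered out.
  if (!hasTargets || readinClasses.empty()) {
    return true;
  }
  // Assumption: targets are matched by exact floating-point equality.
  return std::find(readinClasses.begin(), readinClasses.end(), target) !=
         readinClasses.end();
}
```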

◆ readinColumns_

std::vector<size_t> sgpp::datadriven::DataSourceConfig::readinColumns_ = std::vector<size_t>()

Specifies the set of columns (dimensions) to be read in from the data file. Indices start at 0 and order matters. Any column not contained in this vector is ignored as a dimension. If empty, all columns are read in (default).

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().
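
As a sketch of the column selection described for readinColumns_, the following function builds one sample from a parsed row. The helper selectColumns is hypothetical and not part of SG++; it only mirrors the documented rules (zero-based indices, order preserved, empty selection reads everything).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of SG++: build one sample from a parsed row
// following the documented readinColumns_ rules.
std::vector<double> selectColumns(
    const std::vector<double>& row,
    const std::vector<std::size_t>& readinColumns) {
  // An empty selection means every column is read in.
  if (readinColumns.empty()) {
    return row;
  }
  std::vector<double> sample;
  sample.reserve(readinColumns.size());
  for (std::size_t column : readinColumns) {
    // Indices are zero-based; the bound is checked in debug builds only.
    assert(column < row.size());
    sample.push_back(row[column]);
  }
  return sample;
}
```

For a row {10, 20, 30} and the selection {2, 0}, the resulting sample is {30, 10}, so the order of the selection determines the order of the dimensions.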

◆ readinCutoff_

size_t sgpp::datadriven::DataSourceConfig::readinCutoff_ = -1

Number of (valid) lines of the source file after which reading stops.

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().
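
Because readinCutoff_ is a size_t, its default of -1 is converted to the largest value a size_t can hold. The sketch below demonstrates that conversion. The helper reachedCutoff is hypothetical and assumes the cutoff is applied as a plain comparison against the number of lines read, which this page does not confirm.

```cpp
#include <cstddef>
#include <limits>

// Converting -1 to an unsigned type yields that type's largest value.
constexpr std::size_t kDefaultCutoff = static_cast<std::size_t>(-1);
static_assert(kDefaultCutoff == std::numeric_limits<std::size_t>::max(),
              "-1 converted to size_t equals the maximum size_t value");

// Hypothetical helper, not part of SG++: true once linesRead reaches cutoff.
bool reachedCutoff(std::size_t linesRead, std::size_t cutoff) {
  return linesRead >= cutoff;
}
```

Under that assumption, the default cutoff is never reached for any realistic file, so all valid lines are read.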

◆ shuffling_

◆ testBatchSize_

size_t sgpp::datadriven::DataSourceConfig::testBatchSize_ = 0

◆ testFilePath_

std::string sgpp::datadriven::DataSourceConfig::testFilePath_ = ""

Valid path to a file on disk.

Empty for generated artificial datasets

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ testFileType_

DataSourceFileType sgpp::datadriven::DataSourceConfig::testFileType_ = DataSourceFileType::NONE

Which type of input file are we dealing with? NONE for auto detection or generated artificial datasets.

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ testHasTargets_

bool sgpp::datadriven::DataSourceConfig::testHasTargets_ = true

Whether the file has targets (i.e. supervised learning).

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ testIsCompressed_

bool sgpp::datadriven::DataSourceConfig::testIsCompressed_ = false

The dataset is gzip compressed.

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ testNumBatches_

size_t sgpp::datadriven::DataSourceConfig::testNumBatches_ = 1

Number of batches the dataset is split into for batch learning. If 1, the entire dataset is used.

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ testReadinClasses_

std::vector<double> sgpp::datadriven::DataSourceConfig::testReadinClasses_ = std::vector<double>()

Specifies the set of classes (targets) to be read in from the data file. Any line with a class not contained in this vector is skipped. If hasTargets=false, this is ignored. If empty, all classes/targets are considered (default).

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ testReadinColumns_

std::vector<size_t> sgpp::datadriven::DataSourceConfig::testReadinColumns_ = std::vector<size_t>()

Specifies the set of columns (dimensions) to be read in from the data file. Indices start at 0 and order matters. Any column not contained in this vector is ignored as a dimension. If empty, all columns are read in (default).

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ testReadinCutoff_

size_t sgpp::datadriven::DataSourceConfig::testReadinCutoff_ = -1

Number of (valid) lines of the source file after which reading stops.

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ validationPortion_

double sgpp::datadriven::DataSourceConfig::validationPortion_ = 0.3

The documentation for this struct was generated from the following file: