Configuration structure used for all kinds of SampleProviders including default values.

#include <DataSourceConfig.hpp>

Public Attributes
size_t	batchSize_ = 0

datadriven::DataTransformationConfig	dataTransformationConfig_

size_t	epochs_ = 1
	The number of epochs to train on.

std::string	filePath_ = ""
	Valid path to a file on disk.

DataSourceFileType	fileType_ = DataSourceFileType::NONE
	Which type of input file are we dealing with? NONE for auto detection or generated artificial datasets.

bool	hasTargets_ = true
	Whether the file has targets (i.e. supervised learning).

bool	isCompressed_ = false
	The dataset is gzip compressed.

size_t	numBatches_ = 1
	How many batches should the dataset be split into for batch learning - if 1, take the entire dataset.

int64_t	randomSeed_ = -1
	Seed for the shuffling PRNG.

std::vector< double >	readinClasses_ = std::vector<double>()
	Specifies the set of classes (targets) to be read in from the data file. Any line with a class not contained in this vector is skipped. If hasTargets=false, this is ignored. If empty, all classes/targets are considered (default).

std::vector< size_t >	readinColumns_ = std::vector<size_t>()
	Specifies the set of columns (dimensions) to be read in from the data file. Indices start at 0 and order matters. Any column not contained in this vector is ignored as a dimension. If empty, all columns are read in (default).

size_t	readinCutoff_ = -1
	After how many (valid) lines of the source file to stop reading.

DataSourceShufflingType	shuffling_ = DataSourceShufflingType::sequential
	The type of shuffling to be applied to the data.

size_t	testBatchSize_ = 0

std::string	testFilePath_ = ""
	Valid path to a file on disk.

DataSourceFileType	testFileType_ = DataSourceFileType::NONE
	Which type of input file are we dealing with? NONE for auto detection or generated artificial datasets.

bool	testHasTargets_ = true
	Whether the file has targets (i.e. supervised learning).

bool	testIsCompressed_ = false
	The dataset is gzip compressed.

size_t	testNumBatches_ = 1
	How many batches should the dataset be split into for batch learning - if 1, take the entire dataset.

std::vector< double >	testReadinClasses_ = std::vector<double>()
	Specifies the set of classes (targets) to be read in from the data file. Any line with a class not contained in this vector is skipped. If hasTargets=false, this is ignored. If empty, all classes/targets are considered (default).

std::vector< size_t >	testReadinColumns_ = std::vector<size_t>()
	Specifies the set of columns (dimensions) to be read in from the data file. Indices start at 0 and order matters. Any column not contained in this vector is ignored as a dimension. If empty, all columns are read in (default).

size_t	testReadinCutoff_ = -1
	After how many (valid) lines of the source file to stop reading.

double	validationPortion_ = 0.3

Detailed Description

Configuration structure used for all kinds of SampleProviders including default values.
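Because every member carries a default, a caller only needs to override the fields that differ. The following is a minimal sketch of that pattern. It uses a local stand-in struct, `DataSourceConfigSketch`, which is hypothetical and mirrors only a subset of the members listed above; `makeGzipCsvConfig` is likewise an illustrative helper, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in mirroring a subset of the documented members and
// their defaults. It is NOT the real sgpp::datadriven::DataSourceConfig.
struct DataSourceConfigSketch {
  std::size_t batchSize_ = 0;
  std::size_t epochs_ = 1;
  std::string filePath_ = "";
  bool hasTargets_ = true;
  bool isCompressed_ = false;
  std::size_t numBatches_ = 1;
  std::int64_t randomSeed_ = -1;
  std::vector<std::size_t> readinColumns_ = {};
  double validationPortion_ = 0.3;
};

// Illustrative helper: describes a gzip-compressed file from which only
// columns 0 and 2 are read; all other members keep their defaults.
inline DataSourceConfigSketch makeGzipCsvConfig(const std::string& path) {
  // An empty path is reserved for generated artificial datasets.
  assert(!path.empty());
  DataSourceConfigSketch config;
  config.filePath_ = path;
  config.isCompressed_ = true;
  config.readinColumns_ = {0, 2};
  return config;
}
```

Calling `makeGzipCsvConfig("data.csv.gz")` yields a config whose untouched members (for example `numBatches_` and `epochs_`) still hold the default value 1.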

Member Data Documentation

◆ batchSize_

size_t sgpp::datadriven::DataSourceConfig::batchSize_ = 0

Referenced by sgpp::datadriven::DataSource::getAllSamples(), sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig(), sgpp::datadriven::DataSource::getNextSamples(), and sgpp::datadriven::DataSourceBuilder::withBatchSize().

◆ dataTransformationConfig_

datadriven::DataTransformationConfig sgpp::datadriven::DataSourceConfig::dataTransformationConfig_

Referenced by sgpp::datadriven::DataSource::DataSource(), sgpp::datadriven::DataSource::getAllSamples(), sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig(), and sgpp::datadriven::DataSource::getNextSamples().

◆ epochs_

size_t sgpp::datadriven::DataSourceConfig::epochs_ = 1

The number of epochs to train on.

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ filePath_

std::string sgpp::datadriven::DataSourceConfig::filePath_ = ""

Valid path to a file on disk.

Empty for generated artificial datasets.

Referenced by sgpp::datadriven::DataSource::DataSource(), sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig(), and sgpp::datadriven::DataSourceBuilder::withPath().

◆ fileType_

DataSourceFileType sgpp::datadriven::DataSourceConfig::fileType_ = DataSourceFileType::NONE

Which type of input file are we dealing with? NONE for auto detection or generated artificial datasets.

Referenced by sgpp::datadriven::DataSourceBuilder::crossValidationAssemble(), sgpp::datadriven::DataSourceBuilder::crossValidationFromConfig(), sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig(), sgpp::datadriven::DataSourceBuilder::splittingAssemble(), sgpp::datadriven::DataSourceBuilder::splittingFromConfig(), sgpp::datadriven::DataSourceBuilder::withFileType(), and sgpp::datadriven::DataSourceBuilder::withPath().

◆ hasTargets_

bool sgpp::datadriven::DataSourceConfig::hasTargets_ = true

Whether the file has targets (i.e. supervised learning).

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ isCompressed_

bool sgpp::datadriven::DataSourceConfig::isCompressed_ = false

The dataset is gzip compressed.

Referenced by sgpp::datadriven::DataSourceBuilder::crossValidationAssemble(), sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig(), sgpp::datadriven::DataSourceBuilder::splittingAssemble(), and sgpp::datadriven::DataSourceBuilder::withCompression().

◆ numBatches_

size_t sgpp::datadriven::DataSourceConfig::numBatches_ = 1

How many batches should the dataset be split into for batch learning - if 1, take the entire dataset.

Referenced by sgpp::datadriven::DataSource::end(), sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig(), sgpp::datadriven::DataSource::getNextSamples(), and sgpp::datadriven::DataSourceBuilder::inBatches().
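To illustrate what splitting a dataset into `numBatches_` parts can look like, here is a minimal sketch. The helper `batchSizes` is hypothetical, and the way it spreads a remainder over the first batches is an assumption of this example; this page does not state how the library handles datasets whose size is not divisible by the batch count, nor how `numBatches_` interacts with `batchSize_`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sizes of the batches obtained when numSamples samples
// are split into numBatches parts. With numBatches == 1 the single batch
// holds the entire dataset, matching the documented default.
inline std::vector<std::size_t> batchSizes(std::size_t numSamples,
                                           std::size_t numBatches) {
  assert(numBatches > 0);
  const std::size_t base = numSamples / numBatches;
  const std::size_t remainder = numSamples % numBatches;
  std::vector<std::size_t> sizes(numBatches, base);
  // Assumption of this sketch: leftover samples go to the first batches.
  for (std::size_t i = 0; i < remainder; ++i) {
    ++sizes[i];
  }
  return sizes;
}
```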

◆ randomSeed_

int64_t sgpp::datadriven::DataSourceConfig::randomSeed_ = -1

Seed for the shuffling PRNG.

Referenced by sgpp::datadriven::DataShufflingFunctorFactory::buildDataShufflingFunctor(), and sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().
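The point of an explicit seed is reproducibility: the same seed produces the same permutation of sample indices on a given standard library implementation. The sketch below shows this with the standard library only. `shuffledIndices` is a hypothetical helper, not the library's shuffling functor, and it deliberately handles only non-negative seeds because the meaning of the default value -1 is not documented on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: returns a permutation of 0..n-1 determined by `seed`.
inline std::vector<std::size_t> shuffledIndices(std::size_t n,
                                                std::int64_t seed) {
  // This sketch only covers explicitly chosen seeds.
  assert(seed >= 0);
  std::vector<std::size_t> indices(n);
  std::iota(indices.begin(), indices.end(), std::size_t{0});
  std::mt19937_64 prng(static_cast<std::uint64_t>(seed));
  std::shuffle(indices.begin(), indices.end(), prng);
  return indices;
}
```

Two calls such as `shuffledIndices(5, 42)` return identical vectors, which is what makes a shuffled training run repeatable.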

◆ readinClasses_

std::vector<double> sgpp::datadriven::DataSourceConfig::readinClasses_ = std::vector<double>()

Specifies the set of classes (targets) to be read in from the data file. Any line with a class not contained in this vector is skipped. If hasTargets=false, this is ignored. If empty, all classes/targets are considered (default).

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().
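The filtering rule described above can be sketched as a single predicate. `keepLine` is a hypothetical helper written for this illustration, and comparing targets with exact equality is an assumption of the sketch; the actual reader may compare differently.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: decides whether a line with the given target is kept.
// - Without targets the class filter is ignored.
// - An empty filter keeps every class.
// - Otherwise the target must appear in readinClasses (exact comparison).
inline bool keepLine(double target, bool hasTargets,
                     const std::vector<double>& readinClasses) {
  if (!hasTargets || readinClasses.empty()) {
    return true;
  }
  return std::find(readinClasses.begin(), readinClasses.end(), target) !=
         readinClasses.end();
}

// Usage: with classes {0.0, 1.0}, a line labelled 2.0 is skipped.
inline void keepLineExample() {
  const std::vector<double> classes = {0.0, 1.0};
  assert(keepLine(1.0, true, classes));
  assert(!keepLine(2.0, true, classes));
}
```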

◆ readinColumns_

std::vector<size_t> sgpp::datadriven::DataSourceConfig::readinColumns_ = std::vector<size_t>()

Specifies the set of columns (dimensions) to be read in from the data file. Indices start at 0 and order matters. Any column not contained in this vector is ignored as a dimension. If empty, all columns are read in (default).

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().
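A minimal sketch of the column selection described above follows. `selectColumns` is a hypothetical helper, and the sketch assumes that "order matters" means the output dimensions appear in the order given by the index vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: picks the requested columns from one parsed row.
// An empty index vector keeps all columns; otherwise the result follows the
// order of readinColumns (assumption of this sketch).
inline std::vector<double> selectColumns(
    const std::vector<double>& row,
    const std::vector<std::size_t>& readinColumns) {
  if (readinColumns.empty()) {
    return row;
  }
  std::vector<double> selected;
  selected.reserve(readinColumns.size());
  for (const std::size_t column : readinColumns) {
    assert(column < row.size());  // indices are 0-based
    selected.push_back(row[column]);
  }
  return selected;
}
```

With a row `{1.0, 2.0, 3.0}` and indices `{2, 0}`, the sketch returns `{3.0, 1.0}`; column 1 is dropped as a dimension.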

◆ readinCutoff_

size_t sgpp::datadriven::DataSourceConfig::readinCutoff_ = -1

After how many (valid) lines of the source file to stop reading.

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().
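Note that the default is written as -1 but the member is an unsigned `size_t`. In C++ that conversion wraps to the largest representable value, so the default cutoff is effectively "no limit". The sketch below demonstrates the conversion; `linesToRead` is a hypothetical helper used only to show the effect and is not part of the library.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>

// Same conversion as the documented default initializer: -1 as size_t.
inline std::size_t defaultCutoff() {
  return static_cast<std::size_t>(-1);
}

// Hypothetical helper: number of valid lines a reader would consume when it
// stops after `cutoff` valid lines.
inline std::size_t linesToRead(std::size_t validLinesInFile,
                               std::size_t cutoff) {
  return std::min(validLinesInFile, cutoff);
}

// Usage: the default cutoff never truncates, an explicit one does.
inline void cutoffExample() {
  assert(defaultCutoff() == std::numeric_limits<std::size_t>::max());
  assert(linesToRead(100, defaultCutoff()) == 100);
  assert(linesToRead(100, 10) == 10);
}
```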

◆ shuffling_

DataSourceShufflingType sgpp::datadriven::DataSourceConfig::shuffling_ = DataSourceShufflingType::sequential

The type of shuffling to be applied to the data.

Referenced by sgpp::datadriven::DataShufflingFunctorFactory::buildDataShufflingFunctor(), and sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ testBatchSize_

size_t sgpp::datadriven::DataSourceConfig::testBatchSize_ = 0

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ testFilePath_

std::string sgpp::datadriven::DataSourceConfig::testFilePath_ = ""

Valid path to a file on disk.

Empty for generated artificial datasets.

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ testFileType_

DataSourceFileType sgpp::datadriven::DataSourceConfig::testFileType_ = DataSourceFileType::NONE

Which type of input file are we dealing with? NONE for auto detection or generated artificial datasets.

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ testHasTargets_

bool sgpp::datadriven::DataSourceConfig::testHasTargets_ = true

Whether the file has targets (i.e. supervised learning).

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ testIsCompressed_

bool sgpp::datadriven::DataSourceConfig::testIsCompressed_ = false

The dataset is gzip compressed.

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ testNumBatches_

size_t sgpp::datadriven::DataSourceConfig::testNumBatches_ = 1

How many batches should the dataset be split into for batch learning - if 1, take the entire dataset.

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ testReadinClasses_

std::vector<double> sgpp::datadriven::DataSourceConfig::testReadinClasses_ = std::vector<double>()

Specifies the set of classes (targets) to be read in from the data file. Any line with a class not contained in this vector is skipped. If hasTargets=false, this is ignored. If empty, all classes/targets are considered (default).

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ testReadinColumns_

std::vector<size_t> sgpp::datadriven::DataSourceConfig::testReadinColumns_ = std::vector<size_t>()

Specifies the set of columns (dimensions) to be read in from the data file. Indices start at 0 and order matters. Any column not contained in this vector is ignored as a dimension. If empty, all columns are read in (default).

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ testReadinCutoff_

size_t sgpp::datadriven::DataSourceConfig::testReadinCutoff_ = -1

After how many (valid) lines of the source file to stop reading.

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig().

◆ validationPortion_

double sgpp::datadriven::DataSourceConfig::validationPortion_ = 0.3

Referenced by sgpp::datadriven::DataMiningConfigParser::getDataSourceConfig(), and sgpp::datadriven::DataSourceSplitting::reset().

The documentation for this struct was generated from the following file:

datadriven/src/sgpp/datadriven/datamining/modules/dataSource/DataSourceConfig.hpp