We collected data representing the high-level knowledge sectors from several statistical databases provided by the following:
The data comprises:
Economic welfare is represented by the attribute GNI per capita, calculated using the World Bank Atlas method.
GNI stands for Gross National Income and represents the total value of goods and services produced within a country, plus net income received from abroad.
GNI per capita was collected from The World Bank database in two forms:
Data was collected for the years 2001 and 2010. Since countries are not obliged to report all of the data every year, a significant proportion of values was missing. For the 2001 data, the problem was alleviated with an approximation technique: a missing value was substituted with the value for the closest available year in the interval between 1999 and 2006. For the 2010 data, the problem was alleviated by retaining only those examples (countries) with fewer missing values.
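The two missing-value treatments described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the tie-breaking rule (prefer the earlier year when two years are equally close) and the `max_missing` threshold are all assumptions, since the description does not specify them.

```python
from typing import Dict, List, Optional


def closest_year_value(
    series: Dict[int, Optional[float]],
    target: int = 2001,
    first: int = 1999,
    last: int = 2006,
) -> Optional[float]:
    """Return the value for `target`, or for the nearest year in
    [first, last] that has a value; None if no year in the window has one."""
    candidates = [y for y in range(first, last + 1) if series.get(y) is not None]
    if not candidates:
        return None
    # Assumption: on a tie in distance, the earlier year wins.
    best = min(candidates, key=lambda y: (abs(y - target), y))
    return series[best]


def keep_complete_rows(
    rows: List[Dict[str, Optional[float]]],
    max_missing: int,
) -> List[Dict[str, Optional[float]]]:
    """Keep only the examples (countries) with at most `max_missing`
    missing attribute values. The threshold is hypothetical."""
    return [r for r in rows if sum(v is None for v in r.values()) <= max_missing]
```

For example, `closest_year_value({2000: 5.0, 2003: 7.0})` falls back to the value reported for 2000, because that year is the nearest to 2001 with data.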
The 2010 data was intended for testing. It contains the same set of attributes and modifications as the 2001 data. The 2010 data sets are marked with that year. All other data sets contain data for 2001.
Besides the “standard” data sets, which contain attributes collected from the statistical databases, there are two types of data sets that contain constructed attributes. The data sets marked as “modified” contain attributes constructed by a human to test the hypotheses posed during preliminary analysis of the “standard” data (while executing the interactive Human-Machine Data Mining algorithm). In contrast, the data sets marked as “constructed” contain automatically constructed attributes, obtained by applying the sum, min and max functions to pairs of attributes.
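The automatic construction step can be sketched as below. This is an illustration under assumptions, not the authors' implementation: the function names and the naming scheme for new attributes are hypothetical, and keeping the original attributes alongside the constructed ones is inferred from the attribute counts rather than stated in the description.

```python
from itertools import combinations
from typing import Dict


def construct_pairwise(row: Dict[str, float]) -> Dict[str, float]:
    """Return the original attributes plus sum, min and max
    for every unordered pair of attributes."""
    out = dict(row)  # assumption: the original attributes are retained
    for a, b in combinations(sorted(row), 2):
        out[f"sum({a},{b})"] = row[a] + row[b]
        out[f"min({a},{b})"] = min(row[a], row[b])
        out[f"max({a},{b})"] = max(row[a], row[b])
    return out


def constructed_count(n: int) -> int:
    """Number of attributes after construction from n originals:
    n originals plus 3 functions for each of the n*(n-1)/2 pairs."""
    return n + 3 * (n * (n - 1) // 2)
```

Under these assumptions, `constructed_count(60)` gives 5370 and `constructed_count(48)` gives 3432, which agrees with the attribute counts listed for the Higher education-constructed and R&D-constructed data sets.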
name | num. | nom. | ex. | class | download |
---|---|---|---|---|---|
Higher education | 60 | 0 | 167 | discrete | csv, arff |
Higher education-modified | 43 | 6 | 167 | discrete | csv, arff, description of modifications |
Higher education-modified | 40 | 0 | 167 | numerical (1,2,3) | csv, arff |
Higher education-constructed | 5370 | 0 | 167 | discrete | csv, arff |
Higher education-2010 | 60 | 0 | 125 | discrete | csv, arff |
Higher education-modified-2010 | 43 | 6 | 125 | discrete | csv, arff |
Higher education-constructed-2010 | 5370 | 0 | 125 | discrete | csv, arff |
R&D | 48 | 0 | 167 | discrete | csv, arff |
R&D | 48 | 0 | 104 | discrete | csv, arff |
R&D | 48 | 0 | 104 | numerical (1,2,3) | csv, arff |
R&D-modified | 62 | 5 | 167 | discrete | csv, arff, description of modifications |
R&D-constructed | 3432 | 0 | 167 | discrete | csv, arff |
R&D-2010 | 48 | 0 | 78 | discrete | csv, arff |
R&D-modified-2010 | 62 | 5 | 78 | discrete | csv, arff |
R&D-constructed-2010 | 3432 | 0 | 78 | discrete | csv, arff |
High-level knowledge | 108 | 0 | 167 | discrete | csv, arff |
High-level knowledge | 108 | 0 | 167 | numerical (in US$) | csv, arff |
In the table:
Vidulin V., Bohanec M., Gams M. (2014) Combining human analysis and machine data mining to obtain credible data relations. Information Sciences, 288, 254-278.
Vidulin V. (2012) Searching for credible relations in machine learning, PhD Thesis, Jožef Stefan International Postgraduate School, Ljubljana, Slovenia.
Vidulin V., Gams M. (2011) Impact of high-level knowledge on economic welfare through interactive data mining. Applied Artificial Intelligence, 25(4), 267-291.