One-of-N Encoding for Nominal Variables


This post shows my MATLAB implementation of one-of-N encoding.
One-of-N encoding is a very simple way of encoding classes for a machine learning method: a variable with N possible classes is replaced by N binary variables, and for each case exactly one of them is set to 1.
A class (nominal) variable is a dataset variable that takes one of several non-numeric values.
The number of classes must be known ahead of time.
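
To make the idea concrete, here is a minimal sketch of the encoding applied to a single column. The variable names and the color data are made up for illustration, and the comparison relies on implicit expansion, so it assumes MATLAB R2016b or later.

```matlab
% Hypothetical nominal column with three classes (illustrative data only).
colors = {'red'; 'green'; 'blue'; 'green'};

% unique returns the sorted class list and, as its third output,
% the class index of every row.
[classes, ~, idx] = unique(colors);

% Compare each row's class index against 1..N to build the N indicator
% columns (implicit expansion, assumed R2016b or later).
encoded = double(idx == 1:numel(classes));

disp(classes');
disp(encoded);
```

Each row of encoded contains a single 1, in the column of the class that row belongs to.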

function Y = OneOfNEncodingNominal(X)
% X is a table; Y is a copy of X in which every variable with more than
% two states is replaced by one indicator variable per state.

nv = size(X, 2);  % number of variables (columns)
nc = size(X, 1);  % number of cases (rows)
Y = X;

% for each variable
for i = 1:nv
    atts = unique(table2array(X(:, i)));
    numVar = size(atts, 1);
    % We only encode variables that have more than 2 states
    % (a two-state variable, e.g., 0 or 1, is left as it is)
    if numVar > 2
        % create one new variable per possible state of the variable
        v = zeros(nc, numVar);
        % for each case
        for j = 1:nc
            % find the index of the state of the variable
            idx = find(atts == X{j, i});
            if size(idx, 1) == 1
                v(j, idx) = 1;
            else
                error('Error: Index error when encoding.');
            end
        end

        % remove the variable and replace it with the new variables
        removedVarName = X.Properties.VariableNames(i);
        Y(:, removedVarName) = [];

        newVars = array2table(v);
        % rename the new variables after the removed variable
        for k = 1:numVar
            name = strcat(removedVarName, '_v', int2str(k));
            newVars.Properties.VariableNames(k) = name;
        end

        Y = [Y newVars];
    end % end if
end % end for
end % end function OneOfNEncodingNominal
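
The loop over cases can also be written without a loop. The sketch below is an alternative for a single column, not the implementation above: the table T, its id and color variables, and the data are hypothetical, and it assumes MATLAB R2016b or later for implicit expansion. The new variables follow the same _v naming.

```matlab
% Hypothetical input table (illustrative data only).
T = table([1; 2; 3; 4], categorical({'red'; 'green'; 'blue'; 'green'}), ...
    'VariableNames', {'id', 'color'});

% Map each row of the nominal column to its class index.
[classes, ~, idx] = unique(T.color);
numVar = numel(classes);

% Build the indicator matrix in one step instead of looping over rows.
v = double(idx == 1:numVar);

% Name the new variables color_v1, color_v2, ...
suffixes = arrayfun(@int2str, 1:numVar, 'UniformOutput', false);
newVars = array2table(v, 'VariableNames', strcat('color_v', suffixes));

% Drop the original variable and append the indicator variables.
Y = T;
Y.color = [];
Y = [Y newVars];

disp(Y);
```

Here Y keeps id and gains three indicator variables, one per color.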
