arin_g
Newbie level 3
normalizing attribute values
Hi all,
I am a researcher in the field of human-computer interaction.
I am currently doing research on automatic detection of users' skill level
using user interface events.
We used machine learning algorithms (decision trees) to build statistical predictive
models of skill.
We asked some subjects to perform a specific task for 15 trials, so that they would acquire skill by repetition.
The task was a seven-step image manipulation job in a paint program.
All their interactions were logged automatically. We built our classifiers
on the attribute values extracted from these log files. Attributes such as task completion time, mouse distance traveled, mouse velocity and acceleration, and pause counts were extracted from the logged high-frequency user interface events (mouse and keyboard events).
Instances (attribute values) from the users' early trials were labeled as novice and those from the late trials were labeled as skilled.
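To make the extraction and labeling step concrete, here is a minimal Python sketch. Everything specific in it is an assumption made for illustration and not our actual setup: the event tuple format, the 0.5 s pause threshold, the `actionCount` attribute name, and the trial cutoffs in `label_for_trial` (the middle trials are simply left unlabeled here).

```python
import math

# Hypothetical log format: (timestamp_seconds, x, y, kind), kind is "move" or "click".
PAUSE_THRESHOLD_S = 0.5  # assumed definition of a pause; not taken from the study


def extract_attributes(events):
    """Compute a few per-trial attributes from a time-ordered list of events."""
    if not events:
        raise ValueError("need at least one event")
    distance = 0.0
    pauses = 0
    # Walk over consecutive event pairs: sum the path length, count long gaps.
    for (t0, x0, y0, _), (t1, x1, y1, _) in zip(events, events[1:]):
        distance += math.hypot(x1 - x0, y1 - y0)
        if t1 - t0 >= PAUSE_THRESHOLD_S:
            pauses += 1
    duration = events[-1][0] - events[0][0]
    actions = sum(1 for event in events if event[3] == "click")
    return {
        "completionTime": duration,
        "mouseDistanceTravelled": distance,
        "mouseVelocity": distance / duration if duration > 0 else 0.0,
        "pauseCount": pauses,
        "actionCount": actions,
    }


def label_for_trial(trial, novice_until=3, skilled_from=13):
    """Label a trial (1..15) by its position; both cutoffs are assumed values."""
    if trial <= novice_until:
        return "novice"
    if trial >= skilled_from:
        return "skilled"
    return None  # middle trials are left unlabeled in this sketch


events = [
    (0.0, 0, 0, "move"),
    (0.1, 3, 4, "move"),
    (1.0, 3, 4, "click"),
    (1.2, 6, 8, "move"),
]
attrs = extract_attributes(events)
print(attrs["mouseDistanceTravelled"], attrs["pauseCount"], label_for_trial(1))
```

Each labeled trial then becomes one training instance: the attribute dictionary plus its novice/skilled label.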
First we built task-dependent classifiers,
i.e. each classifier is trained on attribute values extracted from
the interactions needed to perform a specific step or task in a specific application.
Thus, this sort of classifier can only be used to classify new instances of attribute
values from the same task in the same application.
For example, consider the mouseDistanceTravelled attribute (the distance, in pixels, traveled by the mouse while performing a task in an application). The values a classifier learns for this attribute on a specific step or task depend on that task and cannot be used for classification in other steps.
In order to make the classifiers reusable in other applications and tasks, we normalized the attributes. By normalizing we mean making the range of attribute values identical, or making attribute values from different tasks comparable to each other. We considered some of the attributes, such as mouse velocity, to be intrinsically normalized across a diverse range of applications and tasks (perhaps with a little approximation), so no further normalization was necessary for them.
But some of the attributes, such as mouseDistanceTravelled, required normalization. For example, to normalize mouseDistanceTravelled, we divided it by the minimum distance the mouse has to travel to perform that task. Similarly, to normalize pauseCount (the number of pauses during performance of a task), we divided it by the number of actions (e.g. a mouse click is an action) performed in that task. This new quantity gives us the average number of
pauses per action. We then trained our classifiers on these normalized values and built task-independent classifiers that can be used to classify instances from any desired UI task.
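The two normalizations described above can be sketched in Python as follows. The dictionary keys, the output names `distanceRatio` and `pausesPerAction`, and the sample numbers are made up for illustration; the per-task minimum distance is assumed to be known in advance.

```python
def normalize_attributes(attrs, min_distance_px):
    """Turn raw per-trial attributes into task-independent ratios.

    attrs is a dict of raw values; min_distance_px is the minimum mouse
    distance needed to complete the task (assumed to be known per task).
    """
    if min_distance_px <= 0:
        raise ValueError("min_distance_px must be positive")
    actions = attrs["actionCount"]
    return {
        # Treated as already comparable across tasks, so passed through.
        "mouseVelocity": attrs["mouseVelocity"],
        # 1.0 means the mouse followed the shortest possible path.
        "distanceRatio": attrs["mouseDistanceTravelled"] / min_distance_px,
        # Average number of pauses per action.
        "pausesPerAction": attrs["pauseCount"] / actions if actions else 0.0,
    }


# Two made-up trials from tasks of very different size.
short_task = {"mouseVelocity": 310.0, "mouseDistanceTravelled": 600.0,
              "pauseCount": 6, "actionCount": 4}
long_task = {"mouseVelocity": 295.0, "mouseDistanceTravelled": 3000.0,
             "pauseCount": 30, "actionCount": 20}

norm_short = normalize_attributes(short_task, min_distance_px=400.0)
norm_long = normalize_attributes(long_task, min_distance_px=2000.0)
print(norm_short["distanceRatio"], norm_long["distanceRatio"])
```

In this example the raw distances differ by a factor of five, but both trials end up with the same distance ratio and the same pauses per action, which is the kind of comparability across tasks the normalization is meant to give.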
My questions are:
Is our approach to making our classifiers task-independent correct?
Are there theories in machine learning for this sort of problem?
Does my problem relate to multi-task learning?
Multi-task learning is an approach to machine learning that learns a problem together with other
related problems at the same time, using a shared representation. This often leads to a better
model for the main task, because it allows the learner to use the commonality among the tasks.
But as I interpret it, my problem does not relate to multi-task learning, because
my tasks are completely identical rather than merely similar (I want to classify the skill level in different UI tasks),
whereas multi-task learning is used in situations where multiple related (but not identical) tasks in a domain are
to be learned, such as
learning both the phonemes and the stresses that a speech synthesizer needs in order to pronounce the words given to it as input, or,
given a newswire story, predicting its subject categories as well
as the regional categories of reported events based on the same text.
I will be thankful for any guidance, comments or related ideas!
Arin