A Social Signal Processing Perspective on Computational Aesthetics: Theories and Applications - Work not commercially available
Segalin, Cristina
2016-01-01
Abstract
Every day, we are exposed to a variety of images and videos through social media such as Facebook, YouTube, Flickr, Instagram and others. In this scenario, expressing preferences for multimedia content (for example through "like" mechanisms) has become pervasive, turning into a mass social phenomenon. One of the main findings of the cognitive sciences is that automatic processes of which we are unaware shape, to a significant extent, our perception of the environment. The phenomenon applies not only to the real world, but also to the multimedia data we consume every day. Whenever we look at pictures, watch a video or listen to audio recordings, our conscious attention focuses on the observable content, but our cognition spontaneously perceives intentions, beliefs, values, attitudes and other constructs that, while outside our conscious awareness, still shape our reactions and behavior. So far, multimedia technologies have largely neglected this phenomenon. This thesis argues that taking cognitive effects into account is possible and can also improve multimedia approaches. To this end, we consider the principles of Computational Aesthetics (CA) and Social Signal Processing from a computational point of view. On the one hand, Computational Aesthetics aims to make aesthetic decisions in a way similar to humans, allowing multimedia technologies to learn, model and evaluate a common sense of beauty. On the other hand, Social Signal Processing aims to model algorithmically the cognitive processes that encode social signals and that lead us to interact with people in a particular way or to prefer a particular image or video. This represents an invaluable opportunity for CA, because the human aesthetic response is formed by a combination of genetic predisposition, cultural assimilation and unique individual experience, and it can indeed be learned from online pictures using the wisdom of crowds.
The thesis focuses on images as a first attempt in this direction. The reasons for focusing on pictures are several: on the one hand, taking pictures is the action most commonly performed with mobile phones; on the other hand, users either post original images or videos online or share and redistribute those posted by others. To this end, the thesis presents a study on personal aesthetics, where the goal is to recognize people and their characteristics from the images they like, by developing several hybrid approaches based on generative models and regressors. The general idea is that, given a set of preferred images, it is possible to extract a set of features capturing discriminative visual patterns that can be used to infer personal characteristics of the subject who preferred them. As a first contribution, we propose a soft biometric system that discriminates one individual from another using the images he or she likes. The study and development of biometric systems have become of paramount importance for the identification of individuals, for security applications and for recommendation systems. On a dataset of 200 users and 40K images, the developed framework gives a 97% probability of guessing the correct user using 5 preferred images as the biometric template; as for verification, the equal error rate is 0.11. Furthermore, we developed a system able to infer the personality of a subject from the images he or she prefers. The motivation is that whenever we meet a person for the first time, but also when we observe her in video recordings, or when we interact with an artifact displaying human-like behavior or with the multimedia material she shares online, we tend to attribute personality traits to her. The process is spontaneous and unconscious. While not necessarily accurate, it still significantly influences our behavior towards others, especially when it comes to social interactions.
As a supporting proof of concept, the thesis shows that there are visual patterns correlated with the personality traits of Flickr users to a statistically significant extent, and that the personality traits (both self-assessed and attributed by others) of those users can be inferred from the images they mark as "favorite". One of the most important parts of the thesis has been the collection of the PsychoFlickr corpus, composed of 60K images from 300 Flickr users, annotated in terms of personality traits both self-assessed and attributed by 22 assessors. The predictions are performed using multiple approaches (a multiple instance regression approach and a deep learning framework), reaching a correlation of up to 0.68 and an accuracy of up to 0.69 between actual and predicted traits. The prediction of traits attributed by others achieves higher results than that of the self-assessed ones: the reason is that pictures dominate the personality impressions that the judges develop, and the consensus across the judges is statistically significant. These two conditions help the regression approaches to achieve higher performance. When users self-assess their personality, they take into account information that is not available in the favorite pictures, such as personal history, inner state and education. This prevents the regression approaches from achieving high performance on self-assessed traits. This is an important finding, as it can help to better understand the social behavior of people, to design artificial agents capable of eliciting the perception of predefined desirable traits, and to provide suggestions on how to manage online impressions using favorite pictures.

File | Size | Format |
---|---|---|
tesiCS.pdf (open access; type: doctoral thesis; license: public domain) | 81.26 MB | Adobe PDF |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.