I want to help machines understand the world.

About

I received a PhD in Computer Science in December 2016. My research interests lie at the intersection of computer vision and natural language processing, and include deep learning, topic modeling and graphical models. Specifically, I am interested in developing end-to-end learning architectures that jointly detect fine-grained attributes in both images and text. In the summer of 2017, I worked for NASA as an Artificial Intelligence Researcher, automatically searching for long-period comets that might impact Earth. In the summer of 2015, I was an intern at Microsoft Research in Cambridge, where I focused on machine learning for optimizing environments for large-scale software development. Before my PhD, I obtained two master's degrees: one in Mechanical Engineering, where my research focused on human-robot interaction technologies, and one in Mathematical Physics, where I focused on gravitational fluctuations in domain wall spacetimes. In 2014, I was awarded a Google Anita Borg Scholarship.



Searching for Long-Period Comets with Deep Learning Tools.

Sponsored by: NASA, SETI, IBM, NVIDIA

In planetary defense, long-period comets are recognized as potentially the most devastating threat. However, a new comet on an impact trajectory would likely be discovered only about one year before impact. The goal of this project is to add years of warning time by telling comet searchers where to look for comets while they are still far out. Meteor showers may offer a clue to aid and guide a dedicated search for these dangerous objects. Comets leave debris trails as they travel along their orbits, and when our planet intersects such a trail, we see the debris as meteors. By detecting rare aperiodic meteor showers from these dust clouds, we can estimate the orbit of the parent body and narrow down where to search for long-period comets.

Illustration of the debris trail of comet Swift-Tuttle. When Earth passes through this trail, we see the famous Perseid meteor shower.

The Cameras for Allsky Meteor Surveillance (CAMS) network monitors the sky to detect meteors. Until now, processing the images has required time-consuming human input to rule out false positives. Automating this process frees the CAMS data analysts from that task and enables the camera network to expand globally and increase its temporal coverage, so that it can detect the dust trails of potentially hazardous long-period comets that came close to Earth's orbit in the past ten millennia.
We developed deep learning tools that enable this automation. Specifically, we developed a Convolutional Neural Network (CNN) that distinguishes images of meteors from other objects in the sky, achieving a precision of 88.3% and a recall of 90.3%. In addition, we developed a Long Short-Term Memory (LSTM) network that encodes light-curve tracklets into a latent space and learns to predict whether a tracklet corresponds to a meteor. The LSTM achieves a precision of 90.0% and a recall of 89.1%. Meteor astronomers can now use these methods to automatically analyze sky detections and help guide the search for long-period comets.
Susana Zoghbi, Marcelo De Cicco, Antonio Ordoñez, Andres Plata Stapper, Peter S. Gural, Siddha Ganju, and Peter Jenniskens
Workshop on Deep Learning for Physical Sciences (DLPS 2017), NIPS 2017, Long Beach, CA, USA.
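Precision is the fraction of detections labeled "meteor" that really are meteors; recall is the fraction of true meteors that the classifier recovers. The minimal Python sketch below spells out both definitions. The confusion counts in the first usage line are hypothetical, not CAMS data. The last two lines combine each reported precision/recall pair into an F1 score (their harmonic mean), a derived figure that is not reported in the abstract.

```python
def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Precision and recall from confusion-matrix counts.

    tp: meteors correctly flagged as meteors
    fp: non-meteors (planes, birds, noise) wrongly flagged as meteors
    fn: meteors the classifier missed
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
p, r = precision_recall(tp=90, fp=10, fn=30)
print(f"precision={p:.3f} recall={r:.3f}")

# F1 derived from the scores reported above.
print(f"CNN  F1 = {f1_score(0.883, 0.903):.3f}")
print(f"LSTM F1 = {f1_score(0.900, 0.891):.3f}")
```

Both models come out at an F1 of roughly 0.89, so neither trades much precision for recall.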
Cross-modal Search for Fashion Attributes
In this paper we develop a neural network that learns intermodal representations of fashion attributes for use in a cross-modal search tool. Our neural network learns from organic e-commerce data, which is characterized by clean image material but noisy and incomplete product descriptions. First, we experiment with techniques to segment e-commerce images and their product descriptions into image fragments and text fragments, respectively, that denote fashion attributes. Here, we propose a rule-based image segmentation approach that exploits the cleanness of e-commerce images. Next, we design an objective function that encourages our model to induce a common embedding space where a semantically related image fragment and text fragment have a high inner product. This objective function incorporates similarity information about image fragments to obtain better intermodal representations. A key insight is that similar-looking image fragments should be described with the same text fragments. We explicitly require this in our objective function, and thereby recover information that was lost due to noise and incompleteness in the product descriptions. We evaluate the inferred intermodal representations in cross-modal search. We demonstrate that the neural network model, trained with our objective function on image fragments acquired with our rule-based segmentation approach, improves the results of image search with textual queries by 198% for recall@1 and by 181% for recall@5 compared to a state-of-the-art image search system on the same benchmark dataset.

We learn to align image fragments and textual segments, which improves performance on the task of cross-modal search.

Katrien Laenen, Susana Zoghbi, Sien Moens
KDD Machine Learning Meets Fashion Workshop, 2017
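Once image and text fragments live in a shared embedding space, text-to-image search amounts to ranking candidate images by the inner product between the query's text embedding and each image embedding. Recall@K then asks whether the correct image appears among the top K results. The pure-Python sketch below assumes the embeddings were already produced by some trained model (the vectors are toy values, not learned ones) and that each query has exactly one relevant image. It does not reproduce the paper's objective function or segmentation.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_images(query_vec, image_vecs):
    """Return image indices sorted by descending inner-product score."""
    scores = [dot(query_vec, img) for img in image_vecs]
    return sorted(range(len(image_vecs)), key=lambda i: scores[i], reverse=True)


def recall_at_k(query_vecs, image_vecs, relevant, k):
    """Fraction of queries whose relevant image is ranked in the top k.

    relevant[q] is the index of the single correct image for query q
    (a simplifying assumption for this sketch).
    """
    hits = 0
    for q, query_vec in enumerate(query_vecs):
        if relevant[q] in rank_images(query_vec, image_vecs)[:k]:
            hits += 1
    return hits / len(query_vecs)


# Toy 3-dimensional embeddings (hypothetical).
images = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
queries = [[0.9, 0.1, 0.0],   # relevant image: 0
           [0.0, 0.2, 0.8],   # relevant image: 2
           [0.5, 0.4, 0.1]]   # relevant image: 1 (but image 0 scores higher)
truth = [0, 2, 1]

print(recall_at_k(queries, images, truth, k=1))
print(recall_at_k(queries, images, truth, k=2))
```

Here the third query ranks its correct image second, so it counts as a hit for recall@2 but not for recall@1. Because recall cannot exceed 100%, the reported gains of 198% and 181% are relative. A 198% gain in recall@1 means the new system's recall@1 is about 2.98 times that of the baseline.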
Latent Dirichlet Allocation for Linking User-Generated Content and e-Commerce Data
Automatic linking of online content improves navigation possibilities for end users. We focus on linking content generated by users to other relevant sites. In particular, we study the problem of linking information between different usages of the same language, e.g., colloquial and formal idioms, or the language of consumers versus the language of sellers. The challenge is that the same items are described using very distinct vocabularies. As a case study, we investigate a new task of linking textual Pinterest.com pins (colloquial) to online webshops (formal). We evaluate three different modeling paradigms based on probabilistic topic modeling: monolingual latent Dirichlet allocation (LDA), bilingual LDA (BiLDA) and a novel multi-idiomatic LDA model (MiLDA). We compare these to a unigram language model with a Dirichlet prior. Our results for all three topic models reveal the usefulness of modeling the hidden thematic structure of the data through topics. Our proposed MiLDA model handles inherently multi-idiomatic data by considering the vocabulary shared between the aligned document pairs.
Susana Zoghbi, Ivan Vulic, Sien Moens
Information Sciences, 2016

Examples that may be linked between user-generated content from Pinterest.com and e-commerce data from Amazon.com. Note the difference in language usage between social media and online product descriptions. In each row, the items are the same (or very similar), but the textual descriptions differ. This difference in language makes it difficult to link the items as referring to the same object.
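A unigram baseline with a Dirichlet prior is commonly implemented as query-likelihood retrieval. Each webshop document d scores a pin's words w by p(w|d) = (c(w,d) + μ·p(w|C)) / (|d| + μ), where c(w,d) is the word's count in d and p(w|C) is its probability in the whole collection. The minimal sketch below is an assumption about how such a baseline could look, not the paper's exact setup. The prior strength μ (2000 is a common default, 10 is used for the tiny toy documents) and the example texts are hypothetical.

```python
import math
from collections import Counter


def dirichlet_query_likelihood(query, doc, collection_counts, collection_len, mu=2000.0):
    """Log-likelihood of `query` under a Dirichlet-smoothed unigram model of `doc`.

    p(w|d) = (c(w, d) + mu * p(w|C)) / (|d| + mu)
    `collection_counts` must be a Counter, so unseen words get count 0.
    Words never seen in the collection are skipped, since p(w|C) = 0 for them.
    """
    doc_counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for w in query:
        p_coll = collection_counts[w] / collection_len
        if p_coll == 0.0:
            continue
        p_w_d = (doc_counts[w] + mu * p_coll) / (doc_len + mu)
        score += math.log(p_w_d)
    return score


# Hypothetical webshop descriptions (formal) and a pin (colloquial).
shops = {
    "shop_a": "floral print maxi dress with long sleeves".split(),
    "shop_b": "leather ankle boots with block heel".split(),
}
pin = "love this floral maxi dress so cute".split()

collection = Counter(w for doc in shops.values() for w in doc)
total = sum(collection.values())

ranking = sorted(
    shops,
    key=lambda s: dirichlet_query_likelihood(pin, shops[s], collection, total, mu=10.0),
    reverse=True,
)
print(ranking)
```

In this toy run, "love" and "cute" never occur in the webshop text and contribute nothing to either score. This is the vocabulary gap between colloquial pins and formal shop descriptions that the topic models are meant to bridge.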

Fashion Meets Computer Vision and NLP at E-Commerce Search
We focus on cross-modal (visual and textual) e-commerce search within the fashion domain. In particular, we investigate two tasks: 1) given a query image, we retrieve textual descriptions that correspond to the visual attributes in the query; and 2) given a textual query that may express an interest in specific visual product characteristics, we retrieve relevant images that exhibit the required visual attributes. Our dataset consists of 53,689 images coupled with textual descriptions. The images show fashion garments with a great variety of visual attributes, such as different shapes, colors and textures, which the accompanying text describes in natural language. Unlike previous datasets, the text provides only a rough and noisy description of the item in the image. We extensively analyze this dataset in the context of cross-modal e-commerce search. We investigate two latent variable models to bridge textual and visual data: bilingual latent Dirichlet allocation and canonical correlation analysis. We use state-of-the-art visual and textual features and report promising results.
Susana Zoghbi, Geert Heyman, Juan Carlos Gomez, Sien Moens
International Journal of Computer and Electrical Engineering (IJCEE), 2016

Image to text

Text to image

Cross-Modal Fashion Search
In this paper we present an online demo that allows bidirectional multimodal queries for garments.
Susana Zoghbi, Geert Heyman, Juan Carlos Gomez, Sien Moens
In Lecture Notes in Computer Science (LNCS) Vol. 9517, pp 367-373, 2016
Inferring User Interests on Social Media From Text and Images
We propose to infer user interests on social media, where multi-modal data (text, images, etc.) exist. We leverage user-generated data from Pinterest.com as a natural expression of users' interests. Our main contribution is exploiting a multi-modal space composed of images and text. This is a natural approach, since humans express their interests through a combination of modalities. We performed experiments using state-of-the-art image and textual representations, such as convolutional neural networks, word embeddings, and bags of visual and textual words. Our experimental results show that jointly processing images and text increases the overall interest classification accuracy compared to uni-modal representations (i.e., using only text or only images).
Yagmur Gizem Cinar, Susana Zoghbi, Marie-Francine Moens
International Workshop on Social Media Retrieval and Analysis in conjunction with the IEEE International Conference on Data Mining (ICDM 2015)

Examples of pins (image-text pairs) with the corresponding categories predicted by our system (top)

Learning to Bridge Colloquial and Formal Language Applied to Linking and Search of E-Commerce Data
We study the problem of linking information between different idiomatic usages of the same language, for example, colloquial and formal language. We propose a novel probabilistic topic model called multi-idiomatic LDA (MiLDA). Its modeling principles follow the intuition that certain words are shared between two idioms of the same language, while other words are not. We demonstrate the ability of our model to learn relations between cross-idiomatic topics in a dataset containing product descriptions and reviews. We show the utility of the new MiLDA topic model in a recently proposed information retrieval task of linking Pinterest pins to online webshops. We show that our multi-idiomatic model outperforms the standard monolingual LDA model and the pure bilingual LDA model in terms of both perplexity and MAP scores in the IR task.
Ivan Vulic, Susana Zoghbi and Sien Moens
ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR '14)

Graphical representation of the multi-idiomatic LDA (MiLDA) model
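MAP (mean average precision), the retrieval metric cited above, works in two steps. For each query, it averages the precision measured at every rank where a relevant item appears, and it then averages that value over all queries. The minimal sketch below uses hypothetical rankings and relevance judgments. It divides by the total number of relevant items, the usual convention, so a relevant item that is never retrieved contributes zero.

```python
def average_precision(ranked, relevant):
    """Average of precision@i over the ranks i at which a relevant item appears.

    Divides by the total number of relevant items, so relevant items that
    never appear in the ranking count as zero.
    """
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for i, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / i
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """MAP over a list of (ranked_list, relevant_set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical: two pins, each with webshop items ranked by some model.
runs = [
    (["s3", "s1", "s7", "s2"], {"s1", "s2"}),  # relevant hits at ranks 2 and 4
    (["s5", "s9", "s4"], {"s5"}),              # relevant hit at rank 1
]
print(mean_average_precision(runs))
```

The first pin gets an AP of (1/2 + 2/4) / 2 = 0.5 and the second an AP of 1.0, so the MAP is 0.75.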

Are words enough?: a study on text-based representations and retrieval models for linking pins to online shops.
Ivan Vulic, Susana Zoghbi and Sien Moens
Proceedings of the 2013 International Workshop on Mining Unstructured Big Data using Natural Language Processing, in conjunction with the 22nd ACM International Conference on Information and Knowledge Management (CIKM 2013)
I pinned it. Where can I buy one like it?: Automatically linking Pinterest pins to online webshops
Susana Zoghbi, Ivan Vulic and Sien Moens
DUBMOD '13: Proceedings of the 2013 Workshop on Data-driven User Behavioral Modelling and Mining from Social Media, pages 9-12. In conjunction with the 22nd ACM International Conference on Information and Knowledge Management (CIKM 2013)
How well do your Facebook status updates express your personality?
Golnoosh Farnadi, Susana Zoghbi, Marie-Francine Moens, Martine De Cock
Proceedings of the 22nd Edition of the Annual Belgian-Dutch Conference on Machine Learning, BENELEARN 2013
Recognising personality traits using Facebook status updates
Golnoosh Farnadi, Susana Zoghbi, Marie-Francine Moens, Martine De Cock
Workshop on Computational Personality Recognition (WCPR 2013) in conjunction with the Seventh International AAAI Conference on Weblogs and Social Media (ICWSM)
Enhancing collaborative human–robot interaction through physiological-signal based communication
Susana Zoghbi, Chris Parker, Elizabeth Croft and H.F. Machiel Van der Loos
Proceedings of Workshop on Multimodal Human–Robot Interfaces, 2010 IEEE International Conference on Robotics and Automation (ICRA 2010)
Measuring intent in human-robot cooperative manipulation.
Davide De Carli, Evan Hohert, Chris AC Parker, Susana Zoghbi, Simon Leonard, Elizabeth Croft, Antonio Bicchi
Proceedings of Workshop on Multimodal Human–Robot Interfaces, 2010 IEEE International Conference on Robotics and Automation (ICRA 2010)
Measurement instruments for the anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety of robots
Christoph Bartneck, Dana Kulić, Elizabeth Croft, Susana Zoghbi
International Journal of Social Robotics, pp. 71-81, 2009.
Evaluation of affective state estimations using an on-line reporting device during human-robot interactions
Susana Zoghbi, Elizabeth Croft, Dana Kulić, Mike Van der Loos
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2009)
On-line affective state reporting device: A tool for evaluating affective state inference systems
Susana Zoghbi, Dana Kulić, Elizabeth Croft, Machiel Van der Loos
Proceedings of the 4th ACM/IEEE International Conference on Human-Robot Interaction (HRI 2009)