That glorious name, McDonald! An exploration of cross-modal correspondences in brand names and logos using CLIP

Paul Schreiber

Tilburg University

Giovanni Cassani

Tilburg University

Research in marketing shows that companies benefit from a coherent brand identity that leverages congruency among name, company mission, and logo. We use CLIP (Contrastive Language-Image Pre-training) to study branding strategies and assess to what extent logos align with names and missions. By focusing on different types of company names (made-up words, e.g., Verizon; people's names, e.g., McDonald; or existing words, e.g., Apple), we aim to assess whether CLIP successfully encodes relations between company logo, mission, and name even when the name does not have an established meaning in the English language.

We used the Large Logo Dataset, from which we derived company names and logos, and the Company Classification Dataset to retrieve company descriptions. We removed all words identifying an industry, e.g., transportation, so that superficial correspondences between name and description would not trivialize the task. At the end of this process, we had a total of 1,813 <name, logo, description> triplets. Company names were then manually tagged as consisting of made-up words, people's names, existing words, or a combination of these.
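The industry-word filtering step could look roughly like the sketch below. The term list and the function name `strip_industry_terms` are hypothetical and chosen only for illustration; the actual vocabulary removed is not given here, and only "transportation" is mentioned as an example.

```python
import re

# Hypothetical industry vocabulary (assumption): only "transportation"
# is named in the text; the other entries are placeholders.
INDUSTRY_TERMS = {"transportation", "banking", "insurance"}


def strip_industry_terms(text, terms=INDUSTRY_TERMS):
    """Remove whole-word, case-insensitive matches of industry terms."""
    alternatives = "|".join(re.escape(t) for t in sorted(terms))
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    cleaned = pattern.sub("", text)
    # Collapse the whitespace left behind by the removed words.
    return re.sub(r"\s+", " ", cleaned).strip()
```

Matching on word boundaries keeps words that merely contain a term (e.g., a made-up name that embeds "bank") untouched, which matters when the names themselves are the object of study.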

In the first experiment, we embedded the company description and the name. We were interested in whether the two are embedded similarly and evaluated this by sampling 19 foil names and checking whether the correct name falls among the top 1 or top 5 closest names by cosine similarity. The foils were chosen in three different ways to assess to what extent the form of the name affects retrieval: i) randomly; ii) selecting the closest names in terms of edit distance; and iii) selecting the closest names in terms of cosine distance in the CLIP embedding space. We then embedded the logo using CLIP and averaged the embedding of the description with that of the logo, expecting that this would improve retrieval.
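The retrieval protocol described above can be sketched as follows. This is a minimal illustration that assumes the CLIP embeddings have already been computed and stored as NumPy arrays; the function names (`pick_foils`, `hit_at_k`, `combined_query`) are hypothetical, and details such as tie-breaking or whether embeddings are normalized before averaging are assumptions, not the authors' implementation.

```python
import numpy as np


def levenshtein(a, b):
    """Edit distance between two strings (two-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def cosine_sim(query, matrix):
    """Cosine similarity between one vector and each row of a matrix."""
    query = query / np.linalg.norm(query)
    matrix = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    return matrix @ query


def pick_foils(target, names, name_emb, strategy, rng, n_foils=19):
    """Return indices of n_foils distractor names for names[target]."""
    others = np.array([i for i in range(len(names)) if i != target])
    if strategy == "random":
        return rng.choice(others, size=n_foils, replace=False)
    if strategy == "edit":
        dist = np.array([levenshtein(names[target], names[i]) for i in others])
    elif strategy == "cosine":
        dist = 1.0 - cosine_sim(name_emb[target], name_emb[others])
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Keep the n_foils names closest to the target name.
    return others[np.argsort(dist, kind="stable")[:n_foils]]


def combined_query(desc_emb, logo_emb):
    """Average of description and logo embeddings (no re-normalization assumed)."""
    return (desc_emb + logo_emb) / 2.0


def hit_at_k(query, target, foils, name_emb, k):
    """True if the correct name is among the k candidates closest to query."""
    candidates = np.concatenate(([target], foils))
    sims = cosine_sim(query, name_emb[candidates])
    top_k = candidates[np.argsort(-sims, kind="stable")[:k]]
    return bool(target in top_k)
```

Averaging `hit_at_k` over all triplets gives the @1 and @5 scores. With 19 foils plus the correct name there are 20 candidates, so uniform random guessing would succeed 5% of the time at @1 and 25% at @5.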

Performance is remarkably good considering the limited amount of information: all models outperform a random-choice baseline as well as the permutation-based baselines. Performance is highest when foils are sampled randomly, with 26.4% (@1) and 53.4% (@5), but remains high in the Levenshtein condition (23.7% @1 and 53.1% @5) and only drops in the cosine condition (21.5% @1 and 47.4% @5). Adding logos generally improves retrieval of the right name, by 17.4% (@1) and 14% (@5) when foils are sampled randomly, and by 1.17% (@1) and 14.16% (@5) in the edit-distance sampling. In the cosine-based sampling, adding logos yields a decrease in performance of 39.23% (@1) and 34.1% (@5) compared to using descriptions only. Performance is, however, not stable across name types, with CLIP struggling with people's names but doing well with both made-up and real words. Adding logos considerably increases performance for made-up words, by 16.32% (@5).

These results confirm that CLIP, which relies on statistical co-occurrences between (sub-)lexical regularities and visual features, encodes form-meaning correspondences that extend to made-up words and capture the intuitions that marketers embed in the choice of a company name and logo.

CLIN33
The 33rd Meeting of Computational Linguistics in The Netherlands (CLIN 33)
UAntwerpen City Campus: Building R
Rodestraat 14, Antwerp, Belgium
22 September 2023