AMEC Innovation Hub Series: Using Generative AI in Measurement & Analytics: Pitfalls & Opportunities

7th September 2023

By Konstantina Slaveykova, DeLange Analytics Collaborator and Research Capability Team Leader at Victoria University of Wellington

The massive success of ChatGPT (built on GPT-3.5) has propelled "generative AI" from obscure technical jargon into a phrase we now encounter daily. Google searches for the term are through the proverbial roof, and so is content on generative AI's implications for business, innovation and the future of work. This creates an environment where it can be hard to separate hype from reliable information. Fawning over the things generative AI does well (editing, synthesising and generating content in human and programming languages) can leave us blind to its failures: from confident answers based on biased and unreliable training sets to completely fabricated information that looks legitimate to non-experts. There are also ethical, privacy and copyright implications of training the models on datasets containing content scraped from the internet.

Pitfalls

Good analytics rely on rigour and precision, and generative AI still cannot guarantee either. It is tempting to fall for the confidence of ChatGPT's answers. The effect resembles a rhetorical technique humans also use, the Gish gallop: presenting multiple arguments at a rapid pace (with no regard for their accuracy), designed to demonstrate superior knowledge of a topic. Presenting a lot of information projects an air of competence, especially for non-specialised audiences not equipped to identify errors in what is said. Subject matter experts are better at spotting inconsistencies and evaluating ChatGPT responses more critically. A common observation is that ChatGPT "hallucinates" plausible-looking references and literature reviews when asked to summarise knowledge in a specific domain. However, if you look up the papers and authors mentioned in these results, they often turn out to be non-existent. This issue can be even more problematic in consumer analytics: the sheer volume of online content makes it much harder to verify AI-generated summaries of consumer conversations and social media narratives.

Issues arise not only with textual but also with numeric information. Andy Cotgreave (host of If Data Could Talk) recently tested ChatGPT on a basic data processing task: summing up two sets of numbers. It did not succeed even at this straightforward piece of arithmetic, getting the overall calculations and percentages wrong. Of course, future iterations of ChatGPT and other generative AI tools will likely improve on such mistakes. Still, an objective assessment shows that blind excitement about AI should be replaced with a trust-but-verify attitude.
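What does trust-but-verify look like in practice? Below is a minimal sketch in Python with entirely invented figures (the numbers, variable names and "reported" values are illustrative placeholders, not Cotgreave's data): the point is simply to recompute any totals or percentages yourself before they reach a client report.

```python
# Minimal "trust but verify" check for AI-generated totals and percentages.
# All figures below are hypothetical placeholders, not data from Cotgreave's test.

mentions_q1 = [1204, 987, 1560, 733]   # e.g. monthly brand mentions, first set
mentions_q2 = [1310, 1045, 1422, 801]  # e.g. monthly brand mentions, second set

total_q1 = sum(mentions_q1)
total_q2 = sum(mentions_q2)
grand_total = total_q1 + total_q2

# Recompute the shares instead of trusting a generated figure.
share_q1 = round(100 * total_q1 / grand_total, 1)
share_q2 = round(100 * total_q2 / grand_total, 1)
print(f"Totals: {total_q1} + {total_q2} = {grand_total}")
print(f"Shares: {share_q1}% / {share_q2}%")

# Hypothetical values reported by an AI tool, compared before they reach a report.
reported = {"grand_total": 9062, "q1_share": 47.0}
if reported["grand_total"] != grand_total:
    print(f"Mismatch: reported total {reported['grand_total']} vs recomputed {grand_total}")
if abs(reported["q1_share"] - share_q1) > 0.1:
    print(f"Mismatch: reported Q1 share {reported['q1_share']}% vs recomputed {share_q1}%")
```

A check this simple takes seconds, and it is exactly the kind of grunt work that should never be delegated to the same tool that produced the numbers.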
Generative AI and Consumer Privacy

A common assumption in consumer intelligence is that it is fine to scrape and mine "publicly available data" and openly shared content on social media. However, there are caveats to what is genuinely open. The LAION dataset is a treasure trove of open-source data: it contains 400 million images with text captions and is used to train commercial machine-learning products. In 2022, a California-based artist discovered a photo from her confidential medical record in LAION using Have I Been Trained?, a newly developed tool that allows people to check (via a text prompt or a photo) whether their data is part of LAION.

Thad Pitney, legal expert and General Counsel at DataCamp, points out that the problem with private information in datasets is that "language models retain the information acquired during training: even if the training data is later removed". That means that even campaigns like #OptOut, which allow people to ask for their image to be removed from LAION, cannot guarantee that private data is not still in use. The open-source culture in generative AI research has many advantages for business and scientific innovation, but it makes privacy concerns even harder to contain. In media analytics and consumer intelligence, this issue is compounded by the "black box" nature of commercial AI-automated tools. Simply put: analysts may know which media sources are captured by a tool and be able to search images as well as text (thanks to metadata and tags), but there is no transparency about the datasets used to train these tools or the privacy limitations on using the harvested content in a for-profit context.

Illustration: AI image created with Bing Image Creator with the prompt "illustration of a robot doing media content analysis"

Are we analysing actual human conversations?

Unfortunately, troll farms, fake accounts and spam content are a standard fixture of the online landscape. An experienced human analyst should be able to spot them and weed them out (or at least flag them) before they start building a report. Otherwise, poor data will skew measurement and analytics results. However, weeding out non-human input will become increasingly challenging because of AI's ability to generate believable text. AI is better than many humans at spotting issues with spelling and grammar, and thanks to its exposure to millions of texts, ChatGPT can easily generate well-written, or at least believable, content. Many forms of online copywriting are quite generic and formulaic and can be easily automated. Automation is excellent at improving cost efficiency, but it can muddy the waters substantially if you want to capture genuine consumer attitudes and measure reputational impact. If it walks like a duck and quacks like a duck, can we tell whether it is an actual consumer or an AI-generated online persona? With automation (and increasingly realistic AI-generated images), the number of fake accounts created to boost or bust online reputations can increase exponentially and become much harder to spot. In such a climate, human analysts will need to become even more skilled at evaluating context, making informed judgements and investigating the possible reasons why specific patterns occur. It is also high time we reconsidered the mainly numeric criteria for deciding who is actually "influencing" others online.

Opportunities

Better contextual understanding (with some caveats)

No company in the measurement industry that works at scale uses only human-driven monitoring and analysis: that would be unsustainable given the amount of media content generated daily, especially when tracking multiple brands across markets. That is why the industry has been relying on a combination of human analysis and AI-automated tools for years. Natural language processing (NLP) and machine learning (ML) have become trendy terms recently, but they have actually been around for decades, and many existing media listening tools that provide automated sentiment and topic analysis have been using them (with varying success) for automated topic modelling or text classification. Topic modelling does not require training data: it uses unsupervised ML to group similar words and expressions that are often used together, identifying clusters or common topics by seeking patterns and recognising inherent structures. Text classification is a different approach: it uses supervised ML and requires a set of predefined topics, and someone then needs to tag examples to show the model which text belongs to which topic.
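As a concrete (and deliberately tiny) illustration of the difference, here is a sketch in Python using scikit-learn; the corpus, topic count and labels are invented for demonstration and bear no relation to any real media listening tool.

```python
# Toy illustration of the two approaches described above, using scikit-learn.
# The corpus and labels are invented placeholders, not real media data.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.linear_model import LogisticRegression

corpus = [
    "new phone launch battery life camera review",
    "camera quality on the new phone is impressive",
    "airline delayed my flight again terrible service",
    "flight cancelled and customer service never answered",
    "battery drains fast after the software update",
    "lost luggage and the airline offered no compensation",
]

# --- Topic modelling: unsupervised, no labels required ---
counts = CountVectorizer(stop_words="english")
X_counts = counts.fit_transform(corpus)
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(X_counts)
terms = counts.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top_terms = [terms[j] for j in topic.argsort()[-4:][::-1]]
    print(f"Topic {i}: {', '.join(top_terms)}")  # clusters of co-occurring words

# --- Text classification: supervised, needs predefined topics and tagged examples ---
labels = ["product", "product", "airline", "airline", "product", "airline"]
tfidf = TfidfVectorizer(stop_words="english")
X_tfidf = tfidf.fit_transform(corpus)
clf = LogisticRegression(max_iter=1000).fit(X_tfidf, labels)
print(clf.predict(tfidf.transform(["the phone camera stopped working after the update"])))
```

Real tools train on vastly larger corpora, but the division of labour is the same: the unsupervised model discovers whatever word clusters are in the data, while the supervised one can only assign the topics it was shown labelled examples of.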
The problem with both existing approaches has always been the poor ability of ML to capture context and intangible textual features like humour or sarcasm. Even models that perform well on test data can generalise poorly to real-world content, especially on social media. The analysts who have to clean up dashboards full of automated topic and sentiment tags know this all too well. Hanz Saenz Gomez's AMEC article on Topic analysis with classification models can give you more technical detail.

The good news is that new generative AI models have many advantages over previous approaches to text analysis. This is especially evident in newer paid products like GPT-4 and in the implementation of multimodal LLMs (large language models). Multimodal means they can respond to both text and images, which should allow them to achieve a more complex and richer contextual understanding. As Thomas Wolf, co-founder of Hugging Face, shared with MIT Technology Review, multimodality might allow models to "tackle traditional weak points of language models, like spatial reasoning". LLMs also feature a massive number of parameters, which is still quite expensive to implement and gives a substantial advantage to the few tech players able to operate in that space.

As with every nascent technology, now is the time to set standards and best practices. Google highlights the potential for using AI to streamline content creation, engagement and monetisation in media and communications. If you have a realistic understanding of AI's pitfalls, you can use that knowledge to your advantage: AI can do the grunt work of summarising big data, and human analysts can verify and refine the result, adding in-depth understanding and domain expertise to the process. It is a good moment for upskilling your team and adding prompt engineering to your analysts' skillset.

Transparency should be a key value in any good, reproducible research, and it is even more crucial with the advent of AI tools. What part of the analysis was done by humans? What was automated, and under what parameters? What prompt was used? What is the estimated proportion of the analysed content that may be AI-generated? If a client is making substantial financial decisions based on the reported data, the answers to such questions should not be kept in a black box; one way to make them explicit is sketched at the end of this section.

Ethical AI use can lead to lean work processes, improved efficiency and increased productivity. Why spend 30 minutes on a generic email that can be drafted in seconds with a good prompt? Why spend two hours on a presentation deck when you could use that time to understand your data better? AI can create many such efficiencies and free up time for deeper and more cognitively demanding analytical work. The experts who built the field are engineers and cognitive and computer scientists. Listen to what they say (including their ethical concerns) and reserve a healthy scepticism for those who capitalise on the current hype by advertising commercial tools that sound too good to be true.
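Returning to the transparency questions above, here is the kind of record they point towards: a hypothetical sketch in Python whose field names, model name and values are purely illustrative, not an industry standard.

```python
# Hypothetical provenance record for an AI-assisted analysis deliverable.
# Field names and example values are illustrative only, not a standard.
from dataclasses import dataclass, asdict
import json

@dataclass
class AnalysisProvenance:
    automated_steps: list          # which parts of the workflow were automated
    human_steps: list              # which parts were done or verified by analysts
    model_name: str                # the LLM or classifier used
    model_parameters: dict         # temperature, version, thresholds, etc.
    prompt: str                    # the exact prompt used for generative steps
    est_ai_generated_share: float  # estimated share of analysed content that may be AI-generated

record = AnalysisProvenance(
    automated_steps=["topic tagging", "first-pass summarisation"],
    human_steps=["sentiment verification", "final narrative analysis"],
    model_name="example-llm",  # placeholder, not a real product name
    model_parameters={"temperature": 0.2, "version": "2023-09"},
    prompt="Summarise the main consumer complaints about brand X in these mentions.",
    est_ai_generated_share=0.15,
)

print(json.dumps(asdict(record), indent=2))  # attach to the report so clients can audit it
```

Whatever form it takes, keeping a record like this alongside each deliverable costs little and answers the black-box questions before a client has to ask them.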
About Konstantina Vasileva Slaveykova – Research Capability Team Leader at Victoria University of Wellington

Konstantina has worked in research and media evaluation for more than a decade. Her experience spans commercial projects for global brands (in organisations like Publicis Groupe and AMEC members De Lange Analytics and A Data Pro) and research work in the higher education sector and STEM. She is in the final stage of her PhD in Cognitive and Behavioral Neuroscience at Victoria University of Wellington, New Zealand, and has completed additional training in data analysis with R & Python and data visualisation, as well as data science micro-credentials from IBM. Her Research Capability Team offers digital research tools training and high-performance computing support, as well as consultations and research capability services for academics and early career researchers (including collaboration with the global initiative The Carpentries and co-organising the digital literacy conference ResBaz).