Midjourney’s Textual Turn: Chasing Creativity in LLMs

Midjourney, the AI image generator juggernaut (boasting a Discord community that could populate a small nation), is apparently branching out. Seems visuals weren’t enough. Now, they’re tackling text. The audacity!

Following summer murmurings of bespoke AI hardware, Midjourney dropped a research paper in collaboration with NYU, venturing into the realm of Large Language Models (LLMs). The aim? To inject some much-needed pizzazz into the writing capabilities of models like Meta’s Llama and Mistral’s offerings.

The paper, now residing on Hugging Face, details two new techniques: Diversified Direct Preference Optimization (DDPO) and Diversified Odds Ratio Preference Optimization (DORPO). These aren’t just fancy acronyms; they’re designed to broaden the spectrum of LLM outputs without sacrificing coherence or readability. So, less robotic drone, more…well, let’s just say slightly less robotic drone.

For a company synonymous with dazzling AI imagery, this foray into text suggests a broader ambition. Could this signal a Midjourney-branded LLM, a fine-tuned Frankensteinian version of an existing model? Your guess is as good as mine. Founder David Holz is keeping mum for now.

Regardless of Midjourney’s own LLM ambitions, the research holds broader implications. Enterprise AI teams, product developers, and content creators could leverage these techniques to breathe life into AI-generated text.

It also underscores that the Transformer architecture, despite the hype surrounding multimodal and reasoning models, still has untapped potential for text-focused applications. The old dog still has some tricks, apparently.

The Problem: LLMs and the Blandness of Homogeneity

When it comes to factual Q&A or coding assistance, LLMs are expected to deliver the singular, definitive answer. Think laser-focused precision.

Creative writing, however, thrives on ambiguity and multiple valid interpretations. A prompt like “Write a story about a dog on the moon” could yield:

  • A forgotten astronaut’s canine companion.
  • A canine space colony refugee.
  • A stranded pooch befriending lunar aliens.

The problem? Instruction-tuned LLMs often converge on predictable narratives, like lemmings marching toward the same cliff. This is due to:

  1. Post-training prioritizing user preference over originality, leading to a feedback loop of blandness.
  2. Instruction tuning inadvertently smoothing out the interesting wrinkles.
  3. Existing diversity-boosting methods kicking in only at inference time (temperature, top-p sampling), long after training has already squeezed the model toward sameness – see the sketch below.

The result is AI-generated content that feels recycled and lacking in depth. Snoozefest, indeed.
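
For contrast, inference-time diversity tuning usually means turning up sampling knobs at generation time. The minimal sketch below uses Hugging Face’s transformers library; the model name, prompt, and sampling values are illustrative placeholders rather than settings from the paper.

```python
# Inference-time diversity tuning: widen the sampling distribution at
# generation time. This only reshuffles probability mass the trained model
# already assigns, which is the limitation the paper targets.
# Model name and sampling values are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B-Instruct"  # any causal LM works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Write a story about a dog on the moon."
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(
    **inputs,
    do_sample=True,        # sample instead of greedy decoding
    temperature=1.2,       # flatten the distribution a little
    top_p=0.95,            # nucleus sampling
    max_new_tokens=300,
    num_return_sequences=3,
)
for seq in outputs:
    print(tokenizer.decode(seq, skip_special_tokens=True))
```

Useful, but these knobs can only surface stories the model already considers plausible; they cannot recover narratives that post-training has squeezed out.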

The Solution: Injecting Diversity into the Learning Process

To combat this, the researchers developed DDPO and DORPO, extensions of Direct Preference Optimization (DPO) and Odds Ratio Preference Optimization (ORPO). The key innovation lies in leveraging “deviation” – a measure of how much a response differs from the other responses written for the same prompt – to guide training.

Here’s the gist:

  1. The model receives a writing prompt and generates multiple responses.
  2. Each response is embedded and compared with the others generated for the same prompt, yielding a “deviation” score for how far it strays from the pack.
  3. Rare and high-quality responses are weighted more heavily, rewarding originality.

By integrating deviation into DPO and ORPO, the model learns to produce varied, yet high-quality, responses. This is essentially AI’s version of learning to think outside the box, or perhaps, outside the doghouse.
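
To make the mechanism concrete, here is a minimal PyTorch sketch of a deviation-weighted DPO loss. It is a reconstruction from the description above, not Midjourney’s released code: the distance-from-centroid deviation measure, the normalization, and every function name are my own assumptions.

```python
# Sketch of the deviation-weighted DPO idea: the standard DPO objective,
# scaled per example by how unusual the preferred response is relative to
# its sibling responses for the same prompt. Illustrative only.
import torch
import torch.nn.functional as F

def deviation_scores(response_embeddings: torch.Tensor) -> torch.Tensor:
    """Distance of each response embedding from the centroid of all responses
    written for the same prompt; larger = more unusual."""
    centroid = response_embeddings.mean(dim=0, keepdim=True)
    return torch.norm(response_embeddings - centroid, dim=-1)

def ddpo_loss(policy_chosen_logps: torch.Tensor,
              policy_rejected_logps: torch.Tensor,
              ref_chosen_logps: torch.Tensor,
              ref_rejected_logps: torch.Tensor,
              chosen_deviation: torch.Tensor,
              beta: float = 0.1) -> torch.Tensor:
    """DPO loss with a per-example weight from the preferred response's
    deviation, so rare-but-good responses pull the gradient harder."""
    chosen_ratio = policy_chosen_logps - ref_chosen_logps
    rejected_ratio = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_ratio - rejected_ratio)
    # Normalise deviations so the overall loss scale stays comparable to DPO.
    weights = chosen_deviation / (chosen_deviation.mean() + 1e-8)
    return -(weights * F.logsigmoid(logits)).mean()
```

In plain DPO every preference pair carries the same weight; here a preferred response that sits far from its siblings contributes more to the update, which is the mechanism that rewards originality without abandoning the preference signal.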

How Midjourney’s Researchers Did It

The study trained LLMs on creative writing tasks using a dataset drawn from the r/WritingPrompts subreddit, a fertile ground of prompts and short stories.

The researchers employed two base models:

  • Meta’s Llama-3.1-8B: An 8-billion-parameter open-weights workhorse.
  • Mistral-7B-v0.3: A 7-billion-parameter contender from Mistral AI.

The process involved:

  1. Supervised Fine-Tuning (SFT): Initial fine-tuning using LoRA for efficient parameter adjustment.
  2. Preference Optimization:
    • DPO and ORPO as baselines: Focusing on improving response quality based on preference signals.
    • DDPO and DORPO: Introducing deviation-based weighting to encourage unique responses.
  3. Evaluation:
    • Automatic evaluation: Measuring output diversity with embedding-based distance metrics (a rough sketch follows this list).
    • Human evaluation: Assessing outputs for diversity and engagement compared to GPT-4o and Claude 3.5.
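
As an illustration of an embedding-based diversity check, the sketch below scores a batch of outputs for one prompt by their mean pairwise cosine distance. The encoder model and the exact metric are assumptions for illustration, not necessarily what the researchers used.

```python
# Embedding-based diversity: embed each output, then average the pairwise
# cosine distances. Higher scores mean the outputs are spread further apart
# semantically. Encoder choice and metric are illustrative assumptions.
import numpy as np
from sentence_transformers import SentenceTransformer

def mean_pairwise_distance(texts: list[str]) -> float:
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder encoder
    emb = encoder.encode(texts, normalize_embeddings=True)  # unit-norm vectors
    sims = emb @ emb.T                                       # cosine similarities
    n = len(texts)
    # Average cosine distance over the off-diagonal pairs.
    return float((1.0 - sims)[np.triu_indices(n, k=1)].mean())

stories = [
    "A forgotten astronaut's dog keeps vigil beside the silent lander.",
    "The colony ships left long ago; only the dogs of Tycho remain.",
    "A stranded pooch trades bones for oxygen with the lunar locals.",
]
print(f"diversity score: {mean_pairwise_distance(stories):.3f}")
```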

Key Training Findings:

  • DDPO outperformed standard DPO in terms of diversity while maintaining quality. Imagine that.
  • Llama-3.1-8B with DDPO struck the best balance, producing responses more varied than GPT-4o while remaining coherent. A win for team Llama!
  • Even with reduced dataset sizes, DDPO models retained diversity, but required a minimum number of diverse training samples. No free lunch, apparently.

Enterprise Implications: Creative AI for the Masses

For enterprises leveraging AI-generated content, enhancing diversity is paramount. This research has implications for:

  • Chatbots: Ensuring responses aren’t carbon copies.
  • Content Marketing: Preventing robotic copy.
  • Game Development: Crafting dynamic dialogue.

For professionals fine-tuning models, this research offers:

  • A way to enhance creativity without compromising quality.
  • An alternative to inference-time diversity tuning, integrating diversity into the core learning process.
  • The potential to create more engaging applications.

For those orchestrating AI models, this research highlights:

  • The importance of training-stage tuning.
  • A method for adaptive storytelling in AI applications.
  • A means of making LLM outputs more human-like.

The Future Looks…Slightly Less Predictable

The success of DDPO and DORPO demonstrates the potential of diversity-focused training. Future avenues include:

  1. Integrating deviation-based learning into enterprise models.
  2. Applying these methods to other generative tasks (poetry, screenwriting, etc.).
  3. Developing hybrid approaches that balance diversity and instruction-following.

The researchers plan to make the code available on GitHub.

Whether you’re fine-tuning LLMs or orchestrating large-scale AI deployments, this study offers insights into building AI systems that are not just smart, but also imaginative. Perhaps one day, they’ll even tell a joke that lands.
