1. Likability. Which model's final output is better?
2. Expression. Which model's final output better expresses the provided images?