Google Research has released a research paper on Lumiere, an AI video generation model that creates realistic, temporally coherent video from text prompts and images. The model uses a Space-Time U-Net architecture that generates the entire temporal duration of a video in a single pass, rather than synthesizing distant keyframes and then filling in the frames between them, which results in smoother, more realistic clips. Lumiere supports several modes of generation, including text-to-video, image-to-video, and stylized generation, and it can also edit existing videos through video stylization, cinemagraphs, and inpainting. Compared with other text-to-video diffusion models, Lumiere performed better on both visual quality and motion. The model has not been released to the public, but demos can be viewed on the Lumiere website.
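To make the single-pass idea concrete, here is a minimal illustrative sketch (not Lumiere's actual implementation; the function name and shapes are hypothetical): a Space-Time U-Net downsamples a video along the temporal axis as well as the spatial axes, so the network operates on a compact space-time representation of the whole clip at once instead of processing frames independently.

```python
import numpy as np

def spacetime_downsample(video: np.ndarray, factor: int = 2) -> np.ndarray:
    """Average-pool a video tensor of shape (T, H, W, C) over time AND space.

    Pooling the temporal axis alongside the spatial axes is the core idea
    sketched here: the model sees a compressed version of the entire clip,
    rather than handling individual frames or keyframes separately.
    (Hypothetical helper for illustration only.)
    """
    t, h, w, c = video.shape
    t2, h2, w2 = t // factor, h // factor, w // factor
    # Reshape so each pooling window becomes its own axis, then average it out.
    pooled = video[: t2 * factor, : h2 * factor, : w2 * factor].reshape(
        t2, factor, h2, factor, w2, factor, c
    )
    return pooled.mean(axis=(1, 3, 5))

# A dummy 16-frame, 64x64 RGB clip.
clip = np.random.rand(16, 64, 64, 3).astype(np.float32)
small = spacetime_downsample(clip)
print(small.shape)  # (8, 32, 32, 3)
```

A conventional video U-Net would typically leave the temporal axis untouched during downsampling; shrinking it as well is what lets a single forward pass cover the full duration of the clip.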