x0x7:
It seems pretty smooth as far as AI goes.
Now, you said you're taking the last frame and feeding it in as the first. How many frames is one image sequence? I'm asking because I'm curious how it managed to keep track of the glitter and move it in a reasonable way. My intuition for how these models are usually structured tells me they would typically suck at that. Was the glitter actually making it across your jumps from one sequence to the next?
On the second part: are you talking about averaging the current and prior latent spaces?
Another thing I thought of after re-reading your question: there is something made specifically for animations called AnimateDiff. It can take in a batch of latents (potential images) and imagine over them to create a contextually cohesive sequence of images, using one of several cohesion techniques called fuse methods. All in all, it's some extra nodes that you run your shit through before it hits the KSampler. This lets you use motion LoRAs to add camera motion, or feed in video sequences (where each frame is translated into latent space) to do vid2vid or gif2gif interpretations.
https://github.com/Kosinkadink/ComfyUI-AnimateDiff-Evolved/blob/main/README.md <- has an animation showing the differing fuse methods
https://animatediff.github.io/ <- their landing page and reference to their white paper
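If it helps, here's roughly what the context-window fusing amounts to, in my own toy Python. This is not AnimateDiff's actual code: the `denoise` stand-in and the crossfade weights are my assumptions, just to show how overlapping windows get blended into one cohesive sequence.

```python
import numpy as np

def fuse_windows(latents, window=16, overlap=4, denoise=lambda w: w):
    """Denoise overlapping windows of a latent batch and blend the
    overlaps with a triangular crossfade (one simple 'fuse method').
    `denoise` is a hypothetical stand-in for the UNet call."""
    n = len(latents)
    out = np.zeros_like(latents)
    weight = np.zeros(n)
    step = window - overlap
    for start in range(0, max(n - overlap, 1), step):
        end = min(start + window, n)
        chunk = denoise(latents[start:end])
        # triangular weights: frames in an overlap are a blend of windows
        w = np.minimum(np.arange(1, end - start + 1),
                       np.arange(end - start, 0, -1)).astype(float)
        out[start:end] += chunk * w[:, None]
        weight[start:end] += w
    return out / weight[:, None]

# 40 "frames" of 8-dim latents, fused across windows of 16 with 4 overlap
frames = np.random.randn(40, 8)
fused = fuse_windows(frames)
```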
-The sequence can be as long as my memory/GPU will allow, so probably around 750 frames with my rig. I dunno, I've never just hit go and let it run, because it takes time and makes the PC unusable for most things other than basic web surfing: it 100%'s my 3060 for as long as it is rendering, and if the render breaks or is interrupted, nothing gets kicked out. This workflow lets me break that down into smaller chunks, so I can render out 240-frame sequences or whatever and have that continue through the night.
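The chunking itself is dead simple. Something like this hypothetical planner is all the logic amounts to (not my actual ComfyUI graph): split the total frame count into chunks, and seed each chunk with the previous chunk's last frame so the motion carries over.

```python
def plan_chunks(total_frames, chunk_size=240):
    """Split a long render into chunks; each chunk after the first
    is seeded from the previous chunk's final frame."""
    plans, start = [], 0
    while start < total_frames:
        end = min(start + chunk_size, total_frames)
        plans.append({"start": start, "end": end,
                      "seed_frame": None if start == 0 else start - 1})
        start = end
    return plans

for p in plan_chunks(750, 240):
    print(p)
# {'start': 0, 'end': 240, 'seed_frame': None}
# {'start': 240, 'end': 480, 'seed_frame': 239}
# {'start': 480, 'end': 720, 'seed_frame': 479}
# {'start': 720, 'end': 750, 'seed_frame': 719}
```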
-Further, this also lets me edit prompts, add nodes, and change values WHILE it is generating. So if I'm doing still images, it can be rendering while I make adjustments to fine-tune what the subject matter looks like, adding and removing wording from the input or dialing in values. Like visual techno.
-I'm using LCM to generate video in this case, which is a really fast generation method: it lets you make an image in 4 to 8 steps of your KSampler (the thing that takes the noise and makes the image), each step bringing the noise from nonsense toward an image, where before it took 20 to 40 steps. So a 60-frame chunk renders in about 1:30 instead of 10 minutes.
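If you want to poke at LCM outside ComfyUI, the rough diffusers equivalent is below. My setup is ComfyUI nodes, not this code, and the model/LoRA IDs here are the stock examples, not my checkpoint:

```python
import torch
from diffusers import DiffusionPipeline, LCMScheduler

# Example model IDs; swap in whatever checkpoint you actually use.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)
pipe.load_lora_weights("latent-consistency/lcm-lora-sdv1-5")

# 4 steps instead of 20-40; LCM wants little/no CFG, hence guidance_scale=1.0
image = pipe("glittering abstract loop", num_inference_steps=4,
             guidance_scale=1.0).images[0]
image.save("lcm_test.png")
```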
-Currently I'm taking the last outputted image of that sequence and feeding it back into the system, so it has to take that image and translate it back to latent space (VAE encode), and I'm thinking that is causing a photocopy-of-a-photocopy-of-a-photocopy thing to happen. The tutorial I followed for the infinite zoom, which is what solved the issue of loops not being allowed in ComfyUI, had issues with the image gradually being lightened over time. My idea: what if I bypass that decode-then-encode round trip and pass along the last latent instead of the re-encoded image? So that is what my experiment will be today.
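To convince myself the round trip is the culprit, here's a toy numpy model of it. The "VAE" is obviously fake (8-bit quantization plus a small brightness bias, my assumptions), just lossy enough to show why handing off the raw latent should kill the drift:

```python
import numpy as np

rng = np.random.default_rng(0)

def roundtrip(lat):
    """Fake VAE decode -> 8-bit image -> VAE encode: quantization loss
    plus a tiny brightness bias, like the lightening drift in practice."""
    img = np.clip(lat, -1.0, 1.0)
    img = np.round(img * 127) / 127
    return img + 0.01

latent = rng.uniform(-0.9, 0.9, 16)
handoff_img, handoff_lat = latent.copy(), latent.copy()

for chunk in range(30):                   # 30 chained sequences
    handoff_img = roundtrip(handoff_img)  # pass the decoded image along
    # handoff_lat is passed straight through: no decode/encode round trip

print(abs(handoff_img - latent).mean())   # drifts further every chunk
print(abs(handoff_lat - latent).mean())   # 0.0, nothing accumulates
```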
-Another thing I am trying to figure out: how to take the last image of one sequence (solved) and the first image of the next (solved), and then animate from A to B (like an image morph or something). Because AnimateDiff will let you dictate that you want a looping vs a non-looping animation, it must be able to do something like that under the hood; I just have to figure out how to expose it.
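One way I might fake the A-to-B in the meantime, while I figure out what AnimateDiff actually exposes: spherical interpolation between the two latents, then denoise each in-between frame. This is my own sketch of just the interpolation step, not anything AnimateDiff does:

```python
import numpy as np

def slerp(a, b, t):
    """Spherical interpolation between two flattened latents; behaves
    better than a straight lerp because it keeps magnitudes sensible."""
    a_n, b_n = a / np.linalg.norm(a), b / np.linalg.norm(b)
    omega = np.arccos(np.clip(np.dot(a_n, b_n), -1.0, 1.0))
    if omega < 1e-6:              # nearly identical: fall back to lerp
        return (1 - t) * a + t * b
    return (np.sin((1 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)

# 16 transition latents between sequence A's last frame and B's first
last_of_a, first_of_b = np.random.randn(64), np.random.randn(64)
transition = [slerp(last_of_a, first_of_b, t)
              for t in np.linspace(0.0, 1.0, 16)]
```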
Sorry for the long-winded write-up; I just woke up, and it's fun talking about this stuff.
Also, I just remembered why it was important for me to find this solution: now I can hit go and sleep for a few hours, and when I wake up I ought to have a thing. Working while you sleep is VERY American!
Each chunk was 64 frames, or 2 seconds of time. I can probably do 240 or something and really smooth it out to make that transition less noticeable. Sorry for missing movie night; I had been busy making weird shit with these models.