How much does it differ from not adding the latent-rep of a white solid? What happens if you blend it with the latent-rep from an actual image?
That actually might be a good case for inline images here.
![](image url)
So the reason I'm bashing my head against latent space is that when you translate a latent to an image through the VAE, that decode hammers down the details; translate it back and it washes out those details plus more. It's not a perfect round trip, hence the photocopy-of-a-photocopy reference. That's why I'm trying to do my "image" modification in latent space rather than on the image it kicks out, to preserve that noise continuity. I'll run an experiment translating back and forth (latent to image to latent to image) and post the result, along with my workflow! But I haven't found a good node yet that lets me do a 0-to-1 blend with a second latent. The other issue is that over time the image washes out, because the noise is unstable and wants to come to rest, and that means a black image. The tutorial that guided me into feeding the image back into the system to make this loop had the author manually adjusting how much noise/color offset got injected each pass. I think that's dumb, because we have nodes that evaluate the luminosity and gamma of an image, so I ought to be able to use those to control or gate how much noise/color offset gets injected during renders to fight the system coming to rest.
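(To make the gating idea concrete, here's a rough sketch of the node logic I have in mind, in plain PyTorch rather than any particular ComfyUI node. The SD-style latent shape, the 0.5 target luminance, and the max gain are assumptions to illustrate the idea, not a tested workflow.)

```python
import torch

def blend_latents(latent_a: torch.Tensor, latent_b: torch.Tensor, t: float) -> torch.Tensor:
    """Linear 0-to-1 blend of two latents (t=0 -> all A, t=1 -> all B)."""
    return torch.lerp(latent_a, latent_b, t)

def luminance_gated_noise(latent: torch.Tensor, decoded_rgb: torch.Tensor,
                          target_luma: float = 0.5, max_gain: float = 0.15) -> torch.Tensor:
    """Inject noise into the latent in proportion to how far the decoded frame's
    mean luminance has drifted below a target, instead of a hand-tuned constant."""
    # Rec. 709 luma from an RGB image in [0, 1], shape [B, 3, H, W]
    r, g, b = decoded_rgb[:, 0], decoded_rgb[:, 1], decoded_rgb[:, 2]
    luma = (0.2126 * r + 0.7152 * g + 0.0722 * b).mean()
    drift = (target_luma - luma).clamp(min=0.0)   # only push back when the frame darkens
    gain = (drift * max_gain).item()
    return latent + gain * torch.randn_like(latent)

# Example on dummy tensors (SD-style latent: [B, 4, H/8, W/8])
lat_a, lat_b = torch.randn(1, 4, 64, 64), torch.randn(1, 4, 64, 64)
frame = torch.rand(1, 3, 512, 512)                # stand-in for the VAE-decoded image
lat = blend_latents(lat_a, lat_b, 0.3)
lat = luminance_gated_noise(lat, frame)
```

The point is just that the injection amount becomes a function of the measured luminance instead of a constant you tweak by hand every render.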
Is there any good A.I. that can take a moderate sketch, improve it, make it more photo-real or stylize it for a graphic novel, add colour, etc.?
Can you focus on a part that you don't like and ask it to redo that part differently/better?
I'm asking because I have a screenplay (to finish) that I want to illustrate (also takes time) into a graphic novel - but only in frames that are cinematic. Thus it would be the perfect pitch tool for actually making it into a feature.
FYI, I can draw very well. But it does take time.
Yes. The term you're looking for is inpainting, which both ComfyUI and Automatic1111 (A1111) can do out of the box. Inpainting lets you upload an image, modify or mask part of it (or upload a second image that acts as your mask), and it then reinterprets that part of the image. Here are 2 intros -
Past that, if you're just looking to enhance a doodle or generate something for inspiration, img2img is another technique - the same process as above but without a mask. The CFG value controls how strongly it sticks to your input prompt, forcing in whatever you asked for, while the steps value is how many passes it gets to modify the image. So if you have a doodle and you only want it slightly changed, a small CFG (2.5 or so) and a small number of steps (5 or 10) will do it. Of course you should play with those values to dial in what you want.
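(If you'd rather script it than click through a UI, here's roughly what that low-CFG, low-steps img2img pass looks like with the diffusers library. The model ID, prompt, and file names are just examples; the `strength` knob is diffusers' denoising strength, kept low so the doodle only changes slightly.)

```python
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

# Any SD 1.5-style checkpoint works here; this ID is just an example.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")  # or "cpu" if no GPU

doodle = Image.open("doodle.png").convert("RGB").resize((512, 512))

out = pipe(
    prompt="ink and watercolor illustration of a lighthouse at dusk",
    image=doodle,
    strength=0.35,            # how much of the doodle gets repainted (0 = none, 1 = everything)
    guidance_scale=2.5,       # the CFG value mentioned above
    num_inference_steps=10,   # the step count mentioned above
).images[0]
out.save("doodle_refined.png")
```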
Finally, I'm more about animation these days. Look into LivePortrait for facial reinterpretations - [https://x.com/Yokohara_h/status/1815440947976794574] is an example of the output. Full-body movement has several processes that go along with it, but I've got code to write so I don't have time to go more in depth on that. If you need a walkthrough or have any other questions, hit me up here or we can do a screen share sometime and I'll go through a workflow or two with you.
//------
As far as styles go, check out LoRAs, which are mini models trained to output a specific style or look.
Wow!
Thanks for all that!
Maybe you know of an A.I. that can find and write grants for me. (Kidding.) Then I could actually spend the time needed to focus and finish getting all these stories, some finished, some not, out of my head and onto media that I can share. The sooner they're out, the sooner they can be improved, built on, or even go into production.
I don't know why X hasn't been working for me for a week or two.
LOL: Style of Buckminster Fuller.
Wouldn't mind seeing the styles of some of my faves: Arthur Adams, Banksy, Bill Sienkiewicz, Boris Vallejo, Drew Struzan, Frank Frazetta, Glenn Barr, H.R. Giger, Mark Bryan, Mark Ryden, Moebius, Robert Crumb, Shepard Fairey.
It looks like he uses Mark Ryden.
I see he did Audrey Kawasaki, Alex Ross, and Frank Miller.
Oh shit, I read the list of artists. Yeah, I've got a Banksy LoRA, a Crumb LoRA, and a Giger LoRA. The others I'm sure you can find. I mostly just browse looking for styles and grab the ones that pop, because you can always blend these when generating images - so 20% Crumb, 40% 1930s animation, and 100% glam rock.
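(For what it's worth, here's roughly how that kind of weighted blend looks if you script it with diffusers and its PEFT/LoRA integration; in A1111 the equivalent is just stacking `<lora:name:weight>` tags in the prompt. The folder, file names, and adapter names below are made-up stand-ins for whatever LoRAs you actually download.)

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Hypothetical local LoRA files in a "loras" folder -- swap in your own downloads.
pipe.load_lora_weights("loras", weight_name="crumb_style.safetensors", adapter_name="crumb")
pipe.load_lora_weights("loras", weight_name="1930s_animation.safetensors", adapter_name="rubberhose")
pipe.load_lora_weights("loras", weight_name="glam_rock.safetensors", adapter_name="glam")

# Blend them: roughly the "20% Crumb, 40% 1930s animation, 100% glam rock" mix
pipe.set_adapters(["crumb", "rubberhose", "glam"], adapter_weights=[0.2, 0.4, 1.0])

image = pipe("portrait of a guitarist backstage", num_inference_steps=30).images[0]
image.save("blended_style.png")
```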
Well, ChatGPT helped me write my preliminary patent application for the USPTO and file as a micro entity (brings it down to 80 or 60 bucks, something like that). I'd check out Claude.ai, as it's not Google-based and is currently slightly better than GPT-4 (which I have a lot of complaints with). The issue with LLMs is that you have to at least kind of know what you're talking about; it's like dealing with a yes-man who will latch onto the first thing you say. It's just a rubber duck that can surf the web sometimes: https://en.wikipedia.org/wiki/Rubber_duck_debugging. For code, which is what I primarily use it for, it has opened my eyes to some patterns. It's definitely gone downhill though, as it used to tell you if your opinion or grasp of the subject was wrong. Now it's just yes this, yes that. But it is nice to have it kick out a few functions so I can concentrate on the creative part of coding.
TIL.
I suspect the problematic A.I. yes-man stuff is due to Google wanting to provide the one and only answer, when often it's not that simple.
I think my X wasn't working for lack of space. Maybe.
I both agree and disagree. If you don't train the model to answer questions, then the model may refuse to answer or give inaccurate data; that's the risk of tossing your questions into the box and trusting its response. So as they refined the model to answer questions, it did just that, but it did so by trying to shut the user up as fast as possible. Imagine if it hurt every time you answered a question. Your answers couldn't be wrong, because then the questioner would ask for clarification. Your answers wouldn't be long, because that hurts. Your answers would try to placate the questioner as fast as possible, and the most effective way to do that is to tell them they are indeed correct and give a short reason why. If you look at the difference between the early Bing AI and OpenAI's, you can see a drastic gap between what a poorly trained model outputs and what a decent model can do (not necessarily does). But knowing that lets you trick the box into providing quality information by A) not posing questions that presume a solution and B) asking for technical specifics and either being able to test those specifics or knowing enough to judge whether it's bullshit. This is why it rocks for coding.