I’m looking for something that I could run locally and turn loose on a collection of videos to get a quick list of tags for each piece of content.
For example, a video of a cat playing in a front yard on a sunny day would generate a collection of tags like ["Cat", "Grass", "Sidewalk", "Sunny", "Flower", "Dirt"],
and a video of children playing on a playground would generate ["Child", "Slide", "Swing", "Seesaw", "Kids"].
There seem to be a number of online products that will do this sort of thing for YouTube videos, or that let you upload content to their cloud for analysis (often for a decent price). I don’t want to run everything through the internet, though, as I’d probably spend more time uploading than the results would be worth.
It seems like OpenCV might be capable of doing something like this, but I haven’t found anyone describing how to use it without first training your own model. That would probably undercut the whole approach, since I’d have to go tag all my own content first just to teach the model how to do it?
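To make the goal concrete, here is a rough sketch of the per-video part, which doesn't depend on which model ends up doing the recognition. It samples a frame every couple of seconds with OpenCV and keeps only the labels that show up in a reasonable share of the sampled frames. Everything named here is an assumption for illustration: `label_frame` is a hypothetical callable standing in for whatever pretrained classifier is eventually used, `aggregate_tags` and `tag_video` are made-up names, and the sampling interval and threshold are arbitrary.

```python
from collections import Counter
from typing import Callable, Iterable, List


def aggregate_tags(per_frame_labels: Iterable[Iterable[str]],
                   min_fraction: float = 0.2) -> List[str]:
    """Keep labels seen in at least `min_fraction` of the sampled frames."""
    frames = [set(labels) for labels in per_frame_labels]
    if not frames:
        return []
    counts = Counter(label for labels in frames for label in labels)
    threshold = min_fraction * len(frames)
    kept = [label for label, n in counts.items() if n >= threshold]
    # Most frequent labels first; ties broken alphabetically.
    return sorted(kept, key=lambda label: (-counts[label], label))


def tag_video(path: str,
              label_frame: Callable[[object], Iterable[str]],
              every_n_seconds: float = 2.0,
              min_fraction: float = 0.2) -> List[str]:
    """Sample frames from a video file and aggregate their labels.

    `label_frame` is a hypothetical stand-in for a pretrained model:
    it takes one frame and returns the labels it recognises.
    """
    import cv2  # imported here so aggregate_tags works without OpenCV

    cap = cv2.VideoCapture(path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if FPS is unknown
    step = max(1, int(round(fps * every_n_seconds)))
    per_frame = []
    index = 0
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % step == 0:
                per_frame.append(label_frame(frame))
            index += 1
    finally:
        cap.release()
    return aggregate_tags(per_frame, min_fraction)
```

Requiring a label to appear in several frames is just one way to filter out one-off misdetections; a different threshold or a confidence-weighted count might suit real footage better.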
I’m also interested in this. I want to set up a camera and speaker to automatically yell at my dog when he’s digging up the garden beds.
OpenCV may already be fit for purpose here. If you could train the Cascade classifier (see: https://docs.opencv.org/4.x/dc/d88/tutorial_traincascade.html) with enough examples of your dog in the garden, you’d be able to run a video feed through a script and have it trigger audio playback whenever it identifies the naughty doggo. (See: https://docs.opencv.org/4.x/db/d28/tutorial_cascade_classifier.html)
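A minimal sketch of what that script might look like, assuming you already have a trained cascade XML file. The names `watch`, `should_trigger` and `on_detect` are made up for illustration, the cascade path is whatever your own training run produces, and the 10-second cooldown and the `detectMultiScale` settings are untuned guesses. `on_detect` is where the audio playback would go.

```python
import time
from typing import Callable, Optional


def should_trigger(last_trigger: Optional[float], now: float,
                   cooldown_seconds: float) -> bool:
    """True if we have never fired, or the cooldown has elapsed."""
    return last_trigger is None or (now - last_trigger) >= cooldown_seconds


def watch(cascade_path: str,
          on_detect: Callable[[], None],
          camera_index: int = 0,
          cooldown_seconds: float = 10.0) -> None:
    """Run a camera feed through a cascade and call on_detect on a hit.

    `cascade_path` is assumed to point at an XML file produced by
    your own cascade training; `on_detect` is a hypothetical callback
    (for example, something that plays a sound).
    """
    import cv2  # imported here so should_trigger works without OpenCV

    classifier = cv2.CascadeClassifier(cascade_path)
    if classifier.empty():
        raise ValueError(f"could not load cascade from {cascade_path!r}")

    cap = cv2.VideoCapture(camera_index)
    last_trigger = None
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break  # camera disconnected or stream ended
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            hits = classifier.detectMultiScale(
                gray, scaleFactor=1.1, minNeighbors=5)
            now = time.monotonic()
            if len(hits) > 0 and should_trigger(
                    last_trigger, now, cooldown_seconds):
                on_detect()
                last_trigger = now
    finally:
        cap.release()
```

The cooldown is only there so the speaker doesn't fire on every single frame while the dog is in view; whether a cascade trained this way is reliable enough for one particular dog is something you'd have to test.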
You could likely achieve that with https://frigate.video and https://www.home-assistant.io