The Everlasting Picture Categorizing Project

Started by zourtney, Sep 05, 2008, 11:05 AM


zourtney

Truth be told, I'm not actually interested in electrocution. The picture taking idea seems far superior in every aspect imaginable.

Why are the randomnets so slow?

Nick

Brent is streaming music on his computer. That's all I can think of. But that's download and shouldn't affect the upload speed. Beyond that I don't know.

zourtney

If I ever get rolling on this little project, would anyone be interested in testing for me?

Nick

Of course. I can help give you a little push if you like (get you rolling and all that....hmmmm rolling... go karts :) )

zourtney

The Wheels of Moderate Interest have started to turn again, and the hulking Chariot of Motivation has begun to crawl. The Axles of... never mind, my vehicular metaphor fell apart.

Anyway, I'm starting to design this in my head again. It comes in cycles, and the cycles have become more frequent lately. Some day I might even actually do something! Maybe I can hash out my requirements for the 23rd time during lunch today.

Nick

Let us see them! I want to help out, but it is your project. You lead us! And we, your humble code monkey conspirators, shall follow. 

zourtney

As I had somewhat expected, there is a decent way to tag photos in Windows. What I did not expect is that it is built into Windows itself (Vista and 7, anyway). You can edit tags from the Properties dialog on images, doc files, and a few other file types (but not all files, for unknown reasons). It's easy, and the search bar in the upper-right corner actually does a pretty good job and will recurse into subfolders. It'll even pick up the camera model; I did a search for "rebel xsi" and it tossed me back everything taken with my new camera.

By default, it'll search filenames as well, but you can prefix a query with tag: to search only the tags. For example, I tagged a few things, then searched for tag:christmas and got what I was looking for. The only obvious missing feature is that you can't right-click on a folder and add tags -- you have to select a group of images first. And, of course, there is no tag merging or splitting, but that is minor and would probably never be used anyway.

I think I will play with this for a while and see if it is a usable solution to my never-ending image tagging project. Even if it does work decently, there may still be a need for a program that offers a nicer-looking search and import/export.

Edit: It's kind of obvious, but I forgot to mention that there is no tag cloud or anything like it. So, while the tagging might work great for searching, you aren't going to be able to see at a glance what you take pictures of most, or do any kind of "data mining" on tag frequency or tag relevance. Anyway, I'm gonna play with the hard part -- tagging 100,000 pictures!
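
To make the missing "data mining" a bit more concrete: once each photo's tags can be read somehow (how Windows stores them is still an open question in this thread), tag frequency and a crude tag cloud take only a few lines. This is a minimal Python sketch; `photo_tags`, `tag_frequencies`, and `cloud_weights` are hypothetical names, and the 1 to 5 size buckets are an arbitrary choice.

```python
from collections import Counter

def tag_frequencies(photo_tags):
    """Count how many photos carry each tag (case-insensitive).

    photo_tags: hypothetical mapping of file path -> list of tag strings,
    however those tags end up being read from the files.
    """
    counts = Counter()
    for tags in photo_tags.values():
        # A set, so a tag repeated on one photo is only counted once.
        counts.update({t.strip().lower() for t in tags if t.strip()})
    return counts

def cloud_weights(counts, min_size=1, max_size=5):
    """Map each tag to a size bucket for a simple tag cloud (linear scale)."""
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all counts are equal
    return {tag: min_size + round((n - lo) * (max_size - min_size) / span)
            for tag, n in counts.items()}
```

For example, with "christmas" on two photos and "tree" on one, "christmas" lands in the largest bucket and "tree" in the smallest. A real tag cloud would probably want a logarithmic scale, since a handful of tags will dominate 100,000 pictures.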

Nick

Perhaps a companion program to the native tagging in Windows would work: something that makes tagging large numbers of pictures easier and keeps track of which tags have been applied, so you don't have to remember what tags you used when searching for something. For example, you named a beach trip "sandysomething" and then forgot you did it.
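
One way such a companion program could "keep track" is to remember every tag it has ever applied and offer matches as you type, which would catch the "sandysomething" case. A minimal sketch, assuming tags are compared case-insensitively; `TagRegistry` is a hypothetical helper, not anything Windows provides.

```python
import bisect

class TagRegistry:
    """Hypothetical helper: remembers applied tags and suggests them by prefix."""

    def __init__(self):
        self._tags = []  # kept sorted so prefix lookups can use binary search

    def add(self, tag):
        tag = tag.strip().lower()
        i = bisect.bisect_left(self._tags, tag)
        # Insert only non-empty tags that are not already present.
        if tag and (i == len(self._tags) or self._tags[i] != tag):
            self._tags.insert(i, tag)

    def suggest(self, prefix, limit=5):
        """Return up to `limit` known tags starting with `prefix`, alphabetically."""
        prefix = prefix.strip().lower()
        i = bisect.bisect_left(self._tags, prefix)
        out = []
        while (i < len(self._tags) and self._tags[i].startswith(prefix)
               and len(out) < limit):
            out.append(self._tags[i])
            i += 1
        return out
```

Typing "sand" would then bring back both "sandy beach trip" and "sandysomething", so the forgotten tag resurfaces before a near-duplicate gets created.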

zourtney

So, I have tagged several thousand pictures with simple tags like "taken by courtney" or "klamath falls". The integration into the Explorer windows is pretty nice -- there is even a little bar across the bottom that lets you view and edit tags. The only problem is that it's a little too compact for my liking. Tags get an area only about 100 px wide and 30 px tall, so you can only see 2 or 3 at a time unless you click on the tagging box.

So, here are my conclusions, as of today:

  • Unsurprisingly, Windows seems to have jumped on the file-tagging bandwagon a bit late. Word on the Internet is that Mac OS now has full file-tagging support built into the file system itself (I think). In Windows, you can tag stuff, but only select file types (those with dynamically sizable headers, I suppose? I found that PDFs can NOT be tagged... >:(). The Explorer window and Properties dialog editing is a fairly quick and easy solution and is similar to what I had in mind. However, applying tags in batches of more than 200 or so files is rather slow. Patience required.
  • You can't recursively apply tags to subfolders. This is annoying. You have to manually select all the JPG files and then edit the set. The tag property does not show up if you have non-JPG files selected, so if you have any movie files (which I almost always do), you'll have to deselect them first. Ordering by file type helps a lot, but it's still an annoyance. Enjoyment of tedious tasks required. By the way, searching for *.jpg and then trying to tag the results simply refuses to work. It times out, or something, but never actually tells you what [didn't] happen.
  • When tagging stuff, Windows will pop up suggested tags. This helps reduce "bard briles"-style typos. No complaints about that.
  • The search works quite well and will recursively search subdirectories. It is decently quick and will search both filenames and image tags.
  • There is no real way to know what tags the system is keeping. Surely they must be stored somewhere. But I do not know where. Consequently, I have no way of doing "count" type querying. Acceptance of tag statistic ignorance required.
  • I need an interface with a little more real estate. Viewing the tags in the little 1.5 lines shown at the bottom of the window isn't enough; you can't glance at it and get any useful information. I'm looking for a tag-cloud type thing when multiple items are selected.
  • As good as the search is, it lends itself to filename/folder searches. I'd prefer to search by tags and disregard filenames entirely. This could be fixed by giving my JPGs lame filenames, but... that's a lot of changing stuff that doesn't need to be changed. An example of when this is annoying is searching for "christmas tree." I have hundreds of files named "Christmas xxx.jpg" and they all show up, when what I want is stuff tagged with "christmas" and/or "tree". Minor, but annoying. Other improvements I can see would be EXIF data filters for camera type, date range, exposure bias, etc. An "advanced search" seems necessary in theory, but maybe not in practice... after all, search results do not need to be 100% slim and trim accurate; they just need to return what you're looking for.
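
Searching tags while ignoring filenames is simple once there is some index of each file's tags. A minimal sketch, assuming a hypothetical in-memory index of path -> set of lowercase tags (building that index is the unsolved part); `match_all` switches between "christmas and tree" and "christmas or tree".

```python
def search_by_tags(index, terms, match_all=False):
    """Return sorted paths whose tags match the query; filenames are never consulted.

    index: hypothetical mapping of file path -> set of lowercase tags.
    terms: words such as ["christmas", "tree"].
    match_all=False means "christmas OR tree"; True means "christmas AND tree".
    """
    wanted = {t.strip().lower() for t in terms}
    combine = all if match_all else any
    return sorted(path for path, tags in index.items()
                  if combine(t in tags for t in wanted))
```

With this, a file named "Christmas 001.jpg" that is only tagged "klamath falls" no longer shows up in a "christmas tree" search. EXIF filters such as date range or camera model could be added as extra predicates on the same loop.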

Nick

It seems there is definite room for improvement over the Windows getup. People are not patient, do not enjoy tedious and repetitive tasks, like more real estate, but are probably ignorant of tag statistics.

So you don't have to implement search, because it's already done for you.

Is there any way to make a "toolbar" that goes into Explorer, to give the tags more real estate and perhaps add some buttons for tagging and for some statistical culling of information? (Then you could make a tag cloud!) You would just have to scan all the files to collect their tags, then store it all in a tags.db next to the thumbs.db. It'd be nice, though, if you could get Windows to do it all for you.
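
The tags.db idea maps naturally onto SQLite, which ships with Python. A minimal sketch; the single-table layout and the names `file_tags`, `build_tag_db`, and `tag_counts` are assumptions, and the scanning step that produces the path -> tags mapping (actually reading tags out of each file) is left out.

```python
import sqlite3

def build_tag_db(conn, photo_tags):
    """Store (path, tag) pairs; re-running with the same data adds no duplicates."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_tags ("
        "path TEXT NOT NULL, tag TEXT NOT NULL, PRIMARY KEY (path, tag))"
    )
    rows = [(path, tag.strip().lower())
            for path, tags in photo_tags.items() for tag in tags]
    conn.executemany("INSERT OR IGNORE INTO file_tags (path, tag) VALUES (?, ?)", rows)
    conn.commit()

def tag_counts(conn):
    """Return (tag, number_of_files) pairs, most-used first, ties alphabetical."""
    return conn.execute(
        "SELECT tag, COUNT(*) FROM file_tags GROUP BY tag ORDER BY COUNT(*) DESC, tag"
    ).fetchall()
```

Opening the connection with `sqlite3.connect("tags.db")` would put the file on disk next to the pictures; `":memory:"` is handy for trying it out. The count query is exactly the kind of "count" type question the Windows search can't answer, and its output could feed a tag cloud directly.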

zourtney

I'm wondering if I can find information about how Windows stores these tags. It seems obvious that they are stored in some system-wide database: when I start typing, it suggests email contacts, Word document authors, and a whole slew of tags I never entered. If I can seamlessly tie into this, I think that's the way to go... even if it ties the tag system to the Windows platform for now.

And, while the tag names seem to be stored in some sort of database (or some other persistent, quick-access data structure), I am not convinced that the tags on each file are. They are quite likely just stored as text entries in the files themselves. The searches are quick, but there is a noticeable pause when searching through 80,000 pictures, the kind of pause you wouldn't expect from a relatively simple database query.
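
If the tags do live inside the files, one common place for keywords in JPEGs is an XMP metadata packet, where they sit in a `dc:subject` list. Whether Windows writes its tags there (rather than, or in addition to, EXIF fields) is an assumption to verify, not something established here. A minimal sketch that scans raw file bytes for the first XMP packet and pulls out the `dc:subject` entries using only the standard library; `xmp_keywords` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Namespaces used by XMP's RDF serialization and by Dublin Core.
RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
DC = "{http://purl.org/dc/elements/1.1/}"

def xmp_keywords(data: bytes):
    """Return dc:subject keywords from the first XMP packet in `data`.

    ASSUMPTION: tags were written as XMP dc:subject. Returns [] when no
    XMP packet is found.
    """
    start = data.find(b"<x:xmpmeta")
    end = data.find(b"</x:xmpmeta>", start)
    if start == -1 or end == -1:
        return []
    packet = data[start:end + len(b"</x:xmpmeta>")]
    root = ET.fromstring(packet)
    keywords = []
    for subject in root.iter(DC + "subject"):
        # dc:subject holds an rdf:Bag of rdf:li items, one per keyword.
        for li in subject.iter(RDF + "li"):
            if li.text and li.text.strip():
                keywords.append(li.text.strip())
    return keywords
```

Reading a photo with `open(path, "rb").read()` and passing the bytes in would show whether the tags applied from Explorer actually end up in the file. If they do, that would support the idea that the search has to touch the files themselves.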

But that is all just conjecture. I haven't started digging yet.

Nick

The database is probably built when the file system gets indexed. The pause might be from searching newer non-indexed files?

I don't know. I only speculate.