Velocity Reviews - Computer Hardware Reviews

Velocity Reviews > Newsgroups > Programming > Python > Understanding search queries, semantics, and "Meaning" ...aren't weall looking for meaning?

Reply
Thread Tools

Understanding search queries, semantics, and "Meaning" ...aren't weall looking for meaning?

 
 
5lvqbwl02@sneakemail.com
Guest
Posts: n/a
 
      12-30-2008
I have Section 4.4.1 of SICP rattling around in my head (database
queries), and I'm trying to come up with a simple dictionary-based
database in Python to represent circuit diagrams. My main confusion
isn't one of implementation, but a matter of "big thinking",
fundamentally, about the problem. Please don't suggest using a SQL
library, as I'm looking to learn to fish, so to speak, and to learn a
bit about the biology of fish.

I've subclassed dict to hdict ("hashable dict") and rewritten the
__hash__ function so I can include a dict into a set. Thanks to a
previous poster here for providing that suggestion.

A circuit component looks like this for example:

comp1 = hdict(value=1e-6, footprint='0402', vendor='digikey')
comp2 = hdict(value=10e3, footprint='0603', vendor='mouser')
etc, etc.

The database holds the values like this:
db = dict() # normal dict
db['value'] = ([1e-6, 10e3], [comp1, comp2]) #ordered set for fast
lookup/insertion
db['footprint'] = {'0402':set[comp1], '0603':comp2} # unordered uses
normal dict for fast lookup
db['vendor'] = {'digikey':[comp1], 'mouser':[comp2]}

So basically the keys are the component parameters, and the values is
the list of components with that value. Stuff that is comparable is
ordered; stuff that is discrete is not ordered, using either 2-tuples
or dicts, respectively.

This allows extremely fast lookup of components based on their
properties, with O(1) performance for non-ordered stuff, and O(log n)
performance for ordered stuff (using bisect methods). The set
operations are extremely fast, so I can do AND and OR operations on
compound queries of this nature without worry.

I need this speed not so much for selecting components when the user
types in a query, but for when the mouse is hovering over the
schematic and I need things to light up underneath, if the GUI is
generating hundreds of mouse messages per second, and for inspector
windows to appear quickly when you click, etc. If you have ever used
Altium, you can see how effective this is in terms of creating a good
interactive user experience.

My question is what happens when I choose to search for NOT
footprint='0402'.

Should this return a blank list? This happens by default., and is in
fact true: a blank list satisfies "not anything" actually.
Should this return everything that is NOT footprint 0402 (ie returns
0603 components)? This *strongly implies* a pre-selection of *all*
objects before performing the negating function, or of looking at the
ordering of other queries in the overall search term, and then
applying NOT to the results of another query.

But I'm suspicious of a brute force preselection of all objects
whenever I see a NOT, and anyway it messes up the clean query/combiner
method of search I'm doing, and it requires an implied sequence of
search, where I'm pretty sure it should not rely on sequencing. Even
though this is single threaded, etc., the semantics of the query
should not rely on ordering of the search term:

footprint='0402' and NOT vendor='mouser' should return the same as
NOT vendor='mouser' and footprint='0402'.

So this is my philosophical quandary. I'm not sure what the correct
thing is. In SICP they are using nondeterministic stuff which I don't
quite get, so it's hard to follow. Also they are not using
dictionaries and hashes, so I'm not sure if their generate-and-test
method would work here anyway. Generate-and-test seems extremely
inefficient.

Can a wise guru please enlighten me?

thanks
Michael
 
Reply With Quote
 
 
 
 
Jonathan Gardner
Guest
Posts: n/a
 
      12-30-2008
On Dec 30, 12:35*pm, (E-Mail Removed) wrote:
> I have Section 4.4.1 of SICP rattling around in my head (database
> queries), and I'm trying to come up with a simple dictionary-based
> database in Python to represent circuit diagrams. *My main confusion
> isn't one of implementation, but a matter of "big thinking",
> fundamentally, about the problem. Please don't suggest using a SQL
> library, as I'm looking to learn to fish, so to speak, and to learn a
> bit about the biology of fish.
>


I'm going to break rule #1 of your requirements but in an unexpected
way. Rather than studying PostgreSQL, MySQL, or Oracle, why don't you
crack open the topic of relational database theory and relational
algebra? That should help with the "big thinking" bit in the same way
understanding 1+1 helps you understand how to add any two numbers
together. No, SQL is not the be-all-end-all of relational theory. Yes,
it does a pretty dang good job at it, though, which is why it is still
around.


> I've subclassed dict to hdict ("hashable dict") and rewritten the
> __hash__ function so I can include a dict into a set. Thanks to a
> previous poster here for providing that suggestion.
>
> A circuit component looks like this for example:
>
> comp1 = hdict(value=1e-6, footprint='0402', vendor='digikey')
> comp2 = hdict(value=10e3, footprint='0603', vendor='mouser')
> etc, etc.
>
> The database holds the values like this:
> db = dict() # normal dict
> db['value'] = ([1e-6, 10e3], [comp1, comp2]) #ordered set for fast
> lookup/insertion
> db['footprint'] = {'0402':set[comp1], '0603':comp2} # unordered uses
> normal dict for fast lookup
> db['vendor'] = {'digikey':[comp1], 'mouser':[comp2]}
>
> So basically the keys are the component parameters, and the values is
> the list of components with that value. *Stuff that is comparable is
> ordered; stuff that is discrete is not ordered, using either 2-tuples
> or dicts, respectively.
>
> This allows extremely fast lookup of components based on their
> properties, with O(1) performance for non-ordered stuff, and O(log n)
> performance for ordered stuff (using bisect methods). *The set
> operations are extremely fast, so I can do AND and OR operations on
> compound queries of this nature without worry.
>
> I need this speed not so much for selecting components when the user
> types in a query, but for when the mouse is hovering over the
> schematic and I need things to light up underneath, if the GUI is
> generating hundreds of mouse messages per second, and for inspector
> windows to appear quickly when you click, etc. *If you have ever used
> Altium, you can see how effective this is in terms of creating a good
> interactive user experience.
>


OK, turn it around in your head. Consider the indexes you built above
as yet another table. Consider what a table or a database really is.
When you see how they are all really the same thing, either you've
mastered relational algebra of you've seen the big picture. Kind of
like the cons cell being enough to describe any data structure in the
universe, a table is enough to describe everything in the relational
world.

> My question is what happens when I choose to search for NOT
> footprint='0402'.
>
> Should this return a blank list? This happens by default., and is in
> fact true: a blank list satisfies "not anything" actually.
> Should this return everything that is NOT footprint 0402 (ie returns
> 0603 components)? *This *strongly implies* a pre-selection of *all*
> objects before performing the negating function, or of looking at the
> ordering of other queries in the overall search term, and then
> applying NOT to the results of another query.
>
> But I'm suspicious of a brute force preselection of all objects
> whenever I see a NOT, and anyway it messes up the clean query/combiner
> method of search I'm doing, and it requires an implied sequence of
> search, where I'm pretty sure it should not rely on sequencing. *Even
> though this is single threaded, etc., the semantics of the query
> should not rely on ordering of the search term:
>
> footprint='0402' and NOT vendor='mouser' should return the same as
> NOT vendor='mouser' and footprint='0402'.
>
> So this is my philosophical quandary. *I'm not sure what the correct
> thing is. *In SICP they are using nondeterministic stuff which I don't
> quite get, so it's hard to follow. *Also they are not using
> dictionaries and hashes, so I'm not sure if their generate-and-test
> method would work here anyway. *Generate-and-test seems extremely
> inefficient.
>
> Can a wise guru please enlighten me?
>


(I don't think I qualify as a guru, but I think I see how I can help.)

In a typical SQL database, when you type in "SELECT foo FROM bar WHERE
baz='bo'", you are not writing a program, at least not in the sense of
Python or C or Java or Perl where you give instructions on HOW to run
the program. You are writing a program in the sense of Lisp or Scheme
or Haskell in that you are giving instructions on WHAT the program is.

The database doesn't, at least shouldn't, read in all the rows in a
table and then start going through the process of elimination removing
rows it doesn't want to think about. That's highly inefficient, of
course. Instead, it transforms the query (the WHAT) into a set of
procedures that describe HOW to get the result.

That step---taking the definition of the program and turning it into
actual instructions on HOW to run the program---is the compilation and
optimization phase. It's a very hard step to do, but no harder than
the process Python goes through taking files of bytes and converting
them into Python bytecode.

Oh, by the way, this step is nondeterministic. Why? Well, no one can
really say what the BEST way to run any sufficiently complicated
program is. We can point out good ways and bad ways, but not the best
way. It's like the travelling salesman problem in a way.

Now, that "hard stuff" you are glossing over in SICP is the IMPORTANT
stuff. It's the WHOLE POINT of that section. Stop glossing over it.
Keep working at it bit by bit until you understand it. You, as a
programmer, will be infinitely better for it. When you understand it,
you will really unlock the full potential of Python and Scheme and
whatever other language is out there because you will see how to go
from HOW languages to WHAT languages.

Programmers who can write compilers are GOOD programmers. Programmers
who can understand someone else's compilers are even better.

On Python-specific stuff: For a moment, forget about Python and hashes
and dicts and arrays and lists and such. Focus only on the cons cell
and Scheme. When you have mastered those concepts, you will see how to
efficiently use the Python data structures and how to simplify the
Scheme code into Python. Right now you are doing a bit of premature
optimization.

That's my semi-guru advice.
 
Reply With Quote
 
 
 
 
5lvqbwl02@sneakemail.com
Guest
Posts: n/a
 
      12-30-2008
> > library, as I'm looking to learn to fish, so to speak, and to learn a
> > bit about the biology of fish.

>
> I'm going to break rule #1 of your requirements but in an unexpected
> way. Rather than studying PostgreSQL, MySQL, or Oracle, why don't you
> crack open the topic of relational database theory and relational
> algebra? That should help with the "big thinking" bit in the same way
> understanding 1+1 helps you understand how to add any two numbers


Ah, exactly, thanks for understanding my original post I guess
'relational database theory' is what I should start looking around
for... otherwise googling on sql, queries, etc., just returns how to
execute a query, but says nothing about how a fish works.


>
> In a typical SQL database, when you type in "SELECT foo FROM bar WHERE
> baz='bo'", you are not writing a program, at least not in the sense of
> Python or C or Java or Perl where you give instructions on HOW to run
> the program. You are writing a program in the sense of Lisp or Scheme
> or Haskell in that you are giving instructions on WHAT the program is.


I've gotten a strong inkling that parsing a query yields new code,
(lambdas) that are created on the fly to do the search.


> table and then start going through the process of elimination removing
> rows it doesn't want to think about. That's highly inefficient, of


Exactly what I'm trying to avoid.

> course. Instead, it transforms the query (the WHAT) into a set of
> procedures that describe HOW to get the result.


For now I'm not parsing actual text queries... my real "search query"
is coded directly in python like this:
p1 = lambda: db.search_leaf('x location', 'lte', 5)
p2 = lambda: db.search_leaf('footprint', 'eq', '0603')
p3 = lambda: db.search(db.AND, p1, p2)

p4 = lambda: db.search_leaf('x location', 'gte', 19)
p5 = lambda: db.search_leaf('footprint', 'eq', '0402')
p6 = lambda: db.search(db.AND, p1, p2)

fc = db.search(db.OR, p3, p4)

this particular example doesn't necessarily make any sense, but in
effect I'm trying to string together lambda functions which are
created explicitly for the individual query, then strung together with
combiner functions.



> Oh, by the way, this step is nondeterministic. Why? Well, no one can
> really say what the BEST way to run any sufficiently complicated
> program is. We can point out good ways and bad ways, but not the best
> way. It's like the travelling salesman problem in a way.


The nondeterministic stuff... wow, I've come across (call/cc...),
(require...), and different variants of this, both in sicp, teach
yourself scheme, the plt docs, other places, etc., but it still eludes
me. I'm afraid that unless I understand it, I'll wind up creating
some type of cargo-cult mimcry of a database without doing it right
(http://en.wikipedia.org/wiki/Cargo_cult)


> programmer, will be infinitely better for it. When you understand it,
> you will really unlock the full potential of Python and Scheme and
> whatever other language is out there because you will see how to go
> from HOW languages to WHAT languages.


I'm under the impression python doesn't have the internal guts for
nondeterministic programming, specifcially that its lambdas are
limited to single-line expressions, but I may be wrong here.

Still, at the begining of the nondeterministic section of SICP
(section 4.3), it says "nondeterministic computing... is useful for
'generate and test' applications", which almost categorically sounds
like an element-by-element search for what you're looking for. This
was my initial intention behind (prematurely?) optimizing the database
by using parameter keys instead of something like [x for x in stuff if
pred(x, val)] filtering.


> Programmers who can write compilers are GOOD programmers. Programmers
> who can understand someone else's compilers are even better.


Does incorporating a search capability in an application necessarily
mean I'm writing a search compiler? That seems overkill for the
specific case, but may be true generally. For instance, if a word
processing app allows you to search for characters with a certain
font, is that incorporating a search compiler and query language? Or
is it just brute-force filtering?


> That's my semi-guru advice.


Thanks
Michael
 
Reply With Quote
 
Jonathan Gardner
Guest
Posts: n/a
 
      01-02-2009
On Dec 30 2008, 3:25*pm, (E-Mail Removed) wrote:
> > In a typical SQL database, when you type in "SELECT foo FROM bar WHERE
> > baz='bo'", you are not writing a program, at least not in the sense of
> > Python or C or Java or Perl where you give instructions on HOW to run
> > the program. You are writing a program in the sense of Lisp or Scheme
> > or Haskell in that you are giving instructions on WHAT the program is.

>
> I've gotten a strong inkling that parsing a query yields new code,
> (lambdas) that are created on the fly to do the search.
>


More on this at the very end. Just smile to know that you're very
close.

> > course. Instead, it transforms the query (the WHAT) into a set of
> > procedures that describe HOW to get the result.

>
> For now I'm not parsing actual text queries... my real "search query"
> is coded directly in python like this:
> p1 = lambda: db.search_leaf('x location', 'lte', 5)
> p2 = lambda: db.search_leaf('footprint', 'eq', '0603')
> p3 = lambda: db.search(db.AND, p1, p2)
>
> p4 = lambda: db.search_leaf('x location', 'gte', 19)
> p5 = lambda: db.search_leaf('footprint', 'eq', '0402')
> p6 = lambda: db.search(db.AND, p1, p2)
>
> fc = db.search(db.OR, p3, p4)
>
> this particular example doesn't necessarily make any sense, but in
> effect I'm trying to string together lambda functions which are
> created explicitly for the individual query, then strung together with
> combiner functions.
>


If only compilers were so simple... Again, you're writing the "HOW"
when you're doing what you're doing above, when you really want to
write the "WHAT".

Oh, and BTW, lambdas are just an expression to generate a function
quickly. You're confusing the python lambda expression with lambdas
the theoretical concepts. Lambdas the theoretical concept are really
Python's callable objects. Python just expresses them in a very weird
way that is seemingly unrelated to lambda the theoretical concept (but
really is).

> > Oh, by the way, this step is nondeterministic. Why? Well, no one can
> > really say what the BEST way to run any sufficiently complicated
> > program is. We can point out good ways and bad ways, but not the best
> > way. It's like the travelling salesman problem in a way.

>
> The nondeterministic stuff... wow, I've come across (call/cc...),
> (require...), and different variants of this, both in sicp, teach
> yourself scheme, the plt docs, other places, etc., but it still eludes
> me. *I'm afraid that unless I understand it, I'll wind up creating
> some type of cargo-cult mimcry of a database without doing it right
> (http://en.wikipedia.org/wiki/Cargo_cult)
>


Sorry, I got confused on the meaning of non-deterministic programming.
I forgot that this was a specific thing in the Scheme world.

Yes, you've got to understand this. It makes writing a compiler much,
much easier, almost trivial.

> > programmer, will be infinitely better for it. When you understand it,
> > you will really unlock the full potential of Python and Scheme and
> > whatever other language is out there because you will see how to go
> > from HOW languages to WHAT languages.

>
> I'm under the impression python doesn't have the internal guts for
> nondeterministic programming, specifcially that its lambdas are
> limited to single-line expressions, but I may be wrong here.
>


(Recall what I mentioned about lambdas above. It applies here as
well.)

> Still, at the begining of the nondeterministic section of SICP
> (section 4.3), it says "nondeterministic computing... is useful for
> 'generate and test' applications", which almost categorically sounds
> like an element-by-element search for what you're looking for. *This
> was my initial intention behind (prematurely?) optimizing the database
> by using parameter keys instead of something like [x for x in stuff if
> pred(x, val)] filtering.
>


Yes, you certainly can't use list comprehensions to solve your
problems. They're too primitive and part of the interface is that they
actually iterate through the sequence.

> > Programmers who can write compilers are GOOD programmers. Programmers
> > who can understand someone else's compilers are even better.

>
> Does incorporating a search capability in an application necessarily
> mean I'm writing a search compiler? *That seems overkill for the
> specific case, but may be true generally. *For instance, if a word
> processing app allows you to search for characters with a certain
> font, is that incorporating a search compiler and query language? *Or
> is it just brute-force filtering?
>


Ok, here's some big picture hand-waving.

I'm going to make an assertion, and it turns out to be fundamental. I
won't explain it here but I hope you'll learn what it means as time
goes on. The assertion is this:

All programs simply transform one program's code into another
program's code.

By program, I mean, a set of instructions to a computer of some sort
that are meant to be actually evaluated somehow. By program's code, I
mean a description in any form of such a program.

For instance, all human language is really some kind of program code.
So is all Scheme, Lisp, Python, whatever code. So is HTML. So are
bytes and words in memory representing a compiled program. So are IP
packets on the internet.

When you type in search terms to the search feature of Word, you are
writing program code. Sure, the program code is going to be
interpreted in some way, but it represents a program that says, "Find,
by whatever method available, the next instance of text matching the
text I type in here". That's a fairly easy thing to do, and you don't
need to know about relational algebra and non-deterministic code to
write a correct and fast solution.

If you're going to implement a general function that takes arbitrary
search criteria against arbitrary structured data and uses several
methods together to find the answers, that's going to require a fairly
advanced compiler and optimizer. You can use Scheme's non-
deterministic methods to build that program, which is really neat and
a big time-saver in terms of development time. That level of program
is pretty advanced and only those who really understand how to write a
compiler can put it together.

For the mouse-hovering bit, since you're always running the same
query, you can "pre-compile" it and just write a specific index and
some specific code to look it up. This isn't too difficult.
 
Reply With Quote
 
Aahz
Guest
Posts: n/a
 
      01-14-2009
In article <(E-Mail Removed)>,
<(E-Mail Removed)> wrote:
>
>I have Section 4.4.1 of SICP rattling around in my head (database
>queries), and I'm trying to come up with a simple dictionary-based
>database in Python to represent circuit diagrams. My main confusion
>isn't one of implementation, but a matter of "big thinking",
>fundamentally, about the problem. Please don't suggest using a SQL
>library, as I'm looking to learn to fish, so to speak, and to learn a
>bit about the biology of fish.


Having read the rest of your conversation with Jonathan, I'm going to
suggest a related book that I think may help you approach this from a
different angle: _Mastering Regular Expressions_ by Jeffrey Friedl. (I
haven't gone through SICP myself, so I can't be sure that it's relevant,
but it covers the topic of transforming search criteria into executable
statements -- and it will almost certainly be useful to you even if it's
not relevant to your current problem.)
--
Aahz ((E-Mail Removed)) <*> http://www.pythoncraft.com/

"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are, by
definition, not smart enough to debug it." --Brian W. Kernighan
 
Reply With Quote
 
 
 
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are Off


Similar Threads
Thread Thread Starter Forum Replies Last Post
Help understand probems - Binary Search and Sequenital Search Timmy C++ 5 07-09-2007 02:41 PM
Looking for a good site for a basic understanding of FTP Lookout Computer Support 32 01-09-2007 02:45 AM
search within a search within a search - looking for better way...my script times out Abby Lee ASP General 5 08-02-2004 04:01 PM



Advertisments