Discussion:
[basex-talk] standalone vs GUI character parsing
Bridger Dyson-Smith
2016-09-28 16:24:00 UTC
Hi all -

I'm able to create a database with the GUI from an XML document that
contains an invalid character (U+0000) -- I guess BaseX does some character
scrubbing, which is awesome :). When I try creating a database with the
same input XML document from the commandline (using the basex executable),
I get an error message.

Here are my commands in standalone mode:
BaseX 8.5.3 [Standalone]
Try 'help' to get more information.
create db test-bad-char /usr/home/bridger/src/another-test.xml
"/usr/home/bridger/src/another-test.xml" (Line 4): An invalid XML character
(Unicode: 0xb) was found in the element content of the document.
xquery db:create("test-bad-char",
"/usr/home/bridger/src/another-test.xml", "/usr/home/bridger/src/", map {
"chop": false() })
Stopped at ., 1/10:
[FODC0002] "/usr/home/bridger/src/another-test.xml" (Line 4): An invalid
XML character (Unicode: 0xb) was found in the element content of the
document.
This isn't a surprise (I'm more surprised (pleasantly) by the GUI
behavior); is there a way to apply the same "scrubbing" from the GUI in
standalone mode? I'm sure there is, but I'm not able to figure out which
option to apply.

As always, thank you for your time and trouble, and thanks for BaseX.
Best,
Bridger
Christian Grün
2016-09-30 12:10:39 UTC
Hi Bridger,

Sorry for keeping you waiting.
Post by Bridger Dyson-Smith
I'm able to create a database with the GUI from an XML document that
contains an invalid character (U+0000) -- I guess BaseX does some character
scrubbing, which is awesome :).
By default, XML documents with invalid characters should be rejected;
but if you turn on the internal parser in the parsing tab of the
Database Creation dialog, all invalid characters will be replaced with
FFFD. Maybe that’s what you have done?
Post by Bridger Dyson-Smith
BaseX 8.5.3 [Standalone]
Try 'help' to get more information.
create db test-bad-char /usr/home/bridger/src/another-test.xml
SET INTPARSE on
CREATE DB ...
I have slightly extended our Wiki entry for the INTPARSE option [1];
hope this helps,
Christian

[1] http://docs.basex.org/wiki/Options#INTPARSE
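
To make the "replaced with FFFD" behaviour concrete, here is a minimal Python sketch. It is not BaseX's implementation. It only assumes the XML 1.0 definition of a legal character and substitutes U+FFFD for anything else; the function name scrub_xml_chars is made up for illustration.

```python
import re

# Everything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_INVALID_XML_CHARS = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def scrub_xml_chars(text: str) -> str:
    """Replace characters that XML 1.0 does not allow with U+FFFD."""
    return _INVALID_XML_CHARS.sub("\uFFFD", text)


# U+000B (vertical tab) is the character reported in the error message above.
print(repr(scrub_xml_chars("line\x0bbreak")))
```

Tab, line feed and carriage return pass through unchanged; other control characters such as U+0000 or U+000B become U+FFFD.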
Bridger Dyson-Smith
2016-09-30 13:53:23 UTC
Hi Christian,
Post by Christian Grün
Hi Bridger,
Sorry for keeping you waiting.
No trouble at all.
Post by Christian Grün
Post by Bridger Dyson-Smith
I'm able to create a database with the GUI from an XML document that
contains an invalid character (U+0000) -- I guess BaseX does some character
scrubbing, which is awesome :).
By default, XML documents with invalid characters should be rejected;
but if you turn on the internal parser in the parsing tab of the
Database Creation dialog, all invalid characters will be replaced with
FFFD. Maybe that’s what you have done?
That's exactly what I've done! :) I've habitually used the internal parser
and didn't realize that I needed to add it as an option.
Post by Christian Grün
Post by Bridger Dyson-Smith
BaseX 8.5.3 [Standalone]
Try 'help' to get more information.
create db test-bad-char /usr/home/bridger/src/another-test.xml
SET INTPARSE on
CREATE DB ...
I have slightly extended our Wiki entry for the INTPARSE option [1];
hope this helps,
Christian
Absolutely yes.
Thank you kindly.
Best,
Bridger
George Sofianos
2016-10-27 19:30:52 UTC
What about characters that are outside the UTF-8 scope? I think that still
makes the internal parser fail. I thought that was intended behaviour,
so I never mentioned it.
Post by Christian Grün
By default, XML documents with invalid characters should be rejected;
but if you turn on the internal parser in the parsing tab of the
Database Creation dialog, all invalid characters will be replaced with
FFFD. Maybe that’s what you have done?
I also noticed that the QUERYPATH option has been removed from the latest
builds. How can I set up the Docker image to find xq modules? I was using
QUERYPATH to map them.
Christian Grün
2016-10-27 20:14:22 UTC
Post by George Sofianos
What about characters that are outside the UTF-8 scope?
That’s a difficult one. You may end up parsing silly stuff once you
tolerate wrongly encoded characters. If there is no chance to get your
input cleaned before sending it to BaseX, TagSoup may be the last
resort.
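
As an illustration of what "cleaning the input" could look like for wrongly encoded bytes, here is a small Python sketch. It is only one possible pre-processing step and is not something BaseX does; the helper name decode_leniently is hypothetical. It also shows the risk mentioned above: the original character is lost and only a placeholder remains.

```python
def decode_leniently(raw: bytes) -> str:
    """Decode bytes as UTF-8, turning undecodable bytes into U+FFFD."""
    return raw.decode("utf-8", errors="replace")


# "café" encoded as Latin-1: the byte 0xE9 is not valid UTF-8 on its own.
cleaned = decode_leniently(b"<p>caf\xe9</p>")
print(repr(cleaned))
```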
Post by George Sofianos
I also noticed that the QUERYPATH has been removed from latest builds, how
can I set the Docker image to find xq modules? I was using the QUERYPATH to map them.
I'll pass this on to the Docker aficionados on the list…

Christian
George Sofianos
2016-10-27 22:28:46 UTC
Post by Christian Grün
I'll pass this on to the Docker aficionados on the list…
Christian
Thanks, and sorry for responding to a month-old post about the XML
parser; I just noticed my email filters were not working.

About the QUERYPATH, I think the issue isn't specifically about Docker.
Maybe I'm missing something, but how can a basexclient execute XQUERY
'import module namespace test = "test" at "test.xq"' if there isn't a
querypath to define the directory for the modules? I'm trying this on a
local server instance and it searches for test.xq in the BaseX bin
directory. I hope there is an alternative way to declare the path,
because otherwise I won't be able to use BaseX any more from my Java
application, using the BasexClient query method.

Specifically about Docker, the older images can't run because of the .m2
permissions, and the latest one is missing QUERYPATH.
Christian Grün
2016-10-28 05:11:43 UTC
Hi George,
Post by George Sofianos
how can a basexclient execute XQUERY
'import module namespace test = "test" at "test.xq"' if there isn't a
querypath to define the directory for the modules?
One way is to specify the base URI in your query [1]. If you
frequently import server-side modules, the approach we recommend is to
move the modules into the repository [2].

Hope this helps,
Christian

[1] https://www.w3.org/TR/xquery-31/#id-base-uri-decl
[2] http://docs.basex.org/wiki/Repository
George Sofianos
2016-10-28 11:03:25 UTC
Post by Christian Grün
One way is to specify the base URI in your query [1]. If you
frequently import server-side modules, the approach we recommend is to
move the modules into the repository.
Hope this helps,
Christian
[1] https://www.w3.org/TR/xquery-31/#id-base-uri-decl
[2] http://docs.basex.org/wiki/Repository
While the base uri works, it isn't very convenient, because it forces
you to know the modules directory path beforehand, which means I can't
deploy it in two different systems and expect it to work without changes.

I will also take a better look at the repository later, but from what I
understand I need to remove all relative location URIs from the module
imports of every XQuery script? I ask because I tried just copying the
files into the repo and it didn't work. I have a large number of scripts
(maybe over 500) that will need manual changes.

Finally, I think the querypath option was very useful, so please don't
remove it :)
Christian Grün
2016-10-28 11:06:21 UTC
Post by George Sofianos
Because I tried just copying the files in the repo
and it didn't work.
Could you give me some details on what went wrong?
Post by George Sofianos
Finally, I think the querypath option was very useful, so please don't
remove it :)
Well, this will be difficult… We had to do numerous rewritings, and
the QUERYPATH option was kind of hacky (seen from today’s
perspective). We may be able to add something similar for specific use
cases like yours, but I can’t promise anything yet.
George Sofianos
2016-10-28 11:43:37 UTC
Post by Christian Grün
Could you give me some details what went wrong?
Well, even if I copy the modules into the repo directory, the main module
still has a relative path in the import declaration, so BaseX is
searching for the modules in the "/srv/" directory in the Docker
container, or in the /bin directory if I run basexserver from the
terminal. If I remove the 'at "file.xq"', it tries to do something
different, but I still get this error: [XQST0059] Module not found:
"namespace", both in the Docker container and on a normal BaseX server.
Christian Grün
2016-10-28 11:46:53 UTC
Post by George Sofianos
Well even if I copy the modules in the repo directory, the main module still
has a relative path in the import declaration
You’ll indeed need to remove the location specifier in your main
module; after that, it should work (see our documentation for more
details).
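
For a large number of files, removing the location specifiers could be scripted rather than done by hand. The following Python sketch is only an illustration under stated assumptions: the helper name strip_location_hints is hypothetical, and the regular expression only covers the simple 'import module namespace prefix = "uri" at "file"' form. It does not parse XQuery, so it could also match text inside comments or string literals; try it on a copy of the files first.

```python
import re

# A quoted XQuery string literal without embedded quotes (simplifying assumption).
_STRING = r"""(?:"[^"]*"|'[^']*')"""

# Group 1 keeps everything up to the namespace URI; the 'at ...' part is dropped.
_IMPORT_WITH_LOCATION = re.compile(
    r"(import\s+module\s+namespace\s+[\w.-]+\s*=\s*" + _STRING + r")"
    r"\s+at\s+" + _STRING + r"(?:\s*,\s*" + _STRING + r")*"
)


def strip_location_hints(source: str) -> str:
    """Remove 'at "..."' location hints from module imports in XQuery source."""
    return _IMPORT_WITH_LOCATION.sub(r"\1", source)


print(strip_location_hints('import module namespace test = "test" at "test.xq";'))
```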
George Sofianos
2016-10-28 11:55:32 UTC
Post by Christian Grün
You’ll indeed need to remove the location specifier in your main
module; after that, it should work (see our documentation for more
details).
I understand. However, this won't do for my use case. Let's say
hypothetically I have about 500 modules, most of them library and some
of them main modules. All of them are in the same directory, and some of
them share the same namespace. Most of the library modules also import
other library modules, so I will have to manually remove all location
specifiers from every file. Also currently, most of the same scripts can
also work with Saxon without any changes, but if I remove the specifiers
this will probably make Saxon stop working. So I guess I will have to
use base-uri for now.
Christian Grün
2016-10-28 12:11:42 UTC
I see. If all modules are in the same directory, it could make sense
to start the BaseX server from that directory.

I assume we didn’t have use cases like yours in mind, because in
our setups, all XQuery code is either stored completely server-side or
organized in the repository. – Sorry for the surprise.
Christian Grün
2016-10-28 12:13:37 UTC
PS, out of interest: How did you solve the URI problem with Saxon?
George Sofianos
2016-10-28 12:26:36 UTC
Post by Christian Grün
PS, out of interest: How did you solve the URI problem with Saxon?
In Saxon, I've set the base uri to the directory that has the xquery
modules on the Saxon compiler, using the setBaseURI method of the S9.
This works because Saxon is running within our web application, so they
share the same filesystem. We are running BaseX from Docker, which
allows us to scale basex instances when we need more power, and expose
the modules directory to the docker container.
George Sofianos
2017-06-12 14:50:21 UTC
Hi,

I'm trying to use a more recent version of BaseX server (I'm stuck with
8.4.4 for now), and I'm trying to use base-uri to import my modules (with
the latest BaseX snapshot). However, I can't seem to make this work. While
declaring the base-uri works for XQuery scripts that run in the BaseX
GUI, basexserver and basexhttp seem to give errors. Unless I'm
doing something very wrong, I get this error from my library module:

lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.

Also, I'm declaring the base-uri in the main module only, like this:
declare base-uri "file:///home/user/directory/";
Without the final slash it gets resolved to the previous directory; I
guess that's intentional? I only found a reference in this closed issue:
https://github.com/BaseXdb/basex/issues/1454

The alternative would be to bring QUERYPATH back, somehow :) I will then
be able to upgrade without compatibility issues.

Thanks,

George
Christian Grün
2017-06-12 17:09:00 UTC
Post by George Sofianos
Unless I'm doing something very wrong, I get this error from my library
lineNumber: 1; columnNumber: 1; Content is not allowed in prolog.
Hi George, do you have a mini example that I can test out-of-the-box?
Thanks in advance.
Post by George Sofianos
declare base-uri "file:///home/user/directory/"; without the final slash
it gets resolved to the previous directory, I guess that's intentional?
Exactly.
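
The effect of the trailing slash follows from generic URI reference resolution (RFC 3986): a base URI without a final slash is treated as naming a file, so its last path segment is dropped before the relative location is appended. The Python sketch below shows that rule with the standard library's urljoin. It assumes that module locations are resolved against the base URI in the same way, which matches the behaviour described in this thread; the function name resolve_module is made up for illustration.

```python
from urllib.parse import urljoin


def resolve_module(base_uri: str, location: str) -> str:
    """Resolve a relative module location against a base URI (RFC 3986)."""
    return urljoin(base_uri, location)


with_slash = resolve_module("file:///home/user/directory/", "test.xq")
without_slash = resolve_module("file:///home/user/directory", "test.xq")

print(with_slash)     # file:///home/user/directory/test.xq
print(without_slash)  # file:///home/user/test.xq
```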
George Sofianos
2017-06-13 08:38:52 UTC
Sorry for the delay, I want to make sure first that there is nothing wrong
with my system. I noticed these scripts also fail on a local server
(8.4.3) with a file-not-found error (even though the file exists), but
they run fine in the BaseX GUI. I will reply again when I find out what's
wrong - or not.

George