-
support
- Site Admin
- Posts: 3065
- Joined: Fri Feb 07, 2003 12:48 pm
- Location: Melbourne, Australia
-
Contact:
Post
by support » Sun Jul 30, 2017 11:59 pm
awasu.user wrote:first channel finish - get data from here to Awasu
n channel finish - get data from here to Awasu
end of the script execution
Write your script to generate one RSS file for each channel.
Create a channel in Awasu for each file.
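A rough sketch of the "one RSS file per channel" idea, assuming Python 3. The `write_channel_files()` helper, the channel names and the temporary output directory are made up for this example; a real script would write to whatever folder the Awasu channels point at.

```python
import io
import os
import tempfile

def write_channel_files(feeds, out_dir):
    """Write each feed to <out_dir>/<name>.xml as UTF-8 and return the paths."""
    paths = []
    for name, rss_text in sorted(feeds.items()):
        path = os.path.join(out_dir, name + ".xml")
        # Writing to a file with an explicit encoding avoids the console entirely.
        with io.open(path, "w", encoding="utf-8") as fp:
            fp.write(rss_text)
        paths.append(path)
    return paths

# Hypothetical feeds, one per channel.
feeds = {
    "channel1": u'<?xml version="1.0" encoding="UTF-8" ?>\n<rss version="2.0"><channel><title>One</title></channel></rss>',
    "channel2": u'<?xml version="1.0" encoding="UTF-8" ?>\n<rss version="2.0"><channel><title>Two</title></channel></rss>',
}
out_dir = tempfile.mkdtemp()
paths = write_channel_files(feeds, out_dir)
print(len(paths))
```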
awasu.user wrote:I use in generation code between tags:
Creating a CDATA block is not 100% reliable, e.g. if the content talks about how to use CDATA blocks, you will end up with nested CDATA blocks, which doesn't work.

Microsoft smart quotes will also break things. Post the failing bit of content and I'll see if I can spot the error.
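The nested-CDATA problem comes from the terminator `]]>` appearing inside the content. A minimal sketch of two common workarounds, assuming Python 3 and only the standard library; the helper names `cdata()` and `escaped()` are made up for this example and are not part of Awasu.

```python
from xml.sax.saxutils import escape

def cdata(text):
    """Wrap text in a CDATA section, splitting any embedded "]]>" terminator."""
    # "]]>" would end the section early, so close the section after "]]"
    # and open a new one before ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

def escaped(text):
    """Alternative: skip CDATA and escape &, < and > as entities."""
    return escape(text)

title = "How to write <![CDATA[ ... ]]> blocks"
print("<title>" + cdata(title) + "</title>")
print("<title>" + escaped(title) + "</title>")
```

Either form should survive a round trip through an XML parser; escaping is usually the simpler of the two because there is no terminator to worry about.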
-
awasu.user
- Posts: 105
- Joined: Fri Jan 06, 2017 12:50 pm
Post
by awasu.user » Mon Jul 31, 2017 5:19 am
For testing I use this code:
rss_data = u'''<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
<title><![CDATA[W3Schools Home Page]</title>
<link><![CDATA[
https://www.w3schools.com]</link>
<description><![CDATA[PL characters: ąęłńóźć! ?!Free web building tutorials]</description>
<item>
<title><![CDATA[RSS Tutorial]</title>
<link><![CDATA[
https://www.w3schools.com/xml/xml_rss.asp]</link>
<description><![CDATA[New RSS tutorial on W3Schools]</description>
</item>
<item>
<title><![CDATA[XML Tutorial]</title>
<link><![CDATA[
https://www.w3schools.com/xml]</link>
<description><![CDATA[New XML tutorial on W3Schools]</description>
</item>
</channel>
</rss>
'''
print(rss_data)
In Awasu I get:
Code:
XML parse failed (4:L7:C39): not well-formed (invalid token)
data from Channel Feed in Awasu:
<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
<title><![CDATA[W3Schools Home Page]</title>
<link><![CDATA[
https://www.w3schools.com]</link>
<description><![CDATA[PL characters: �����! ?!Free web building tutorials]</description>
<item>
<title><![CDATA[RSS Tutorial]</title>
<link><![CDATA[
https://www.w3schools.com/xml/xml_rss.asp]</link>
<description><![CDATA[New RSS tutorial on W3Schools]</description>
</item>
<item>
<title><![CDATA[XML Tutorial]</title>
<link><![CDATA[
https://www.w3schools.com/xml]</link>
<description><![CDATA[New XML tutorial on W3Schools]</description>
</item>
</channel>
</rss>
I tried
, but then the output begins with Python's bytes prefix (b'). I also tried:
Code:
from __future__ import print_function
import sys

def safeprint(s):
    try:
        print(s)
    except UnicodeEncodeError:
        if sys.version_info >= (3,):
            print(s.encode('utf8').decode(sys.stdout.encoding))
        else:
            print(s.encode('utf8'))

safeprint(rss_data)
and the problem is the same.
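One way to see what is going on is to look at the bytes. A small sketch (Python 3), assuming the console encodes output with a legacy code page; `cp1250` is used purely as an illustration, since the actual console encoding is not stated in this thread.

```python
text = u"ąęłńóźć"

utf8_bytes = text.encode("utf-8")      # what encoding="UTF-8" promises
legacy_bytes = text.encode("cp1250")   # what a legacy code page would produce

print(utf8_bytes)
print(legacy_bytes)

# The legacy bytes are not valid UTF-8, so a parser that trusts the
# declaration rejects them as an invalid token.
try:
    legacy_bytes.decode("utf-8")
    legacy_is_valid_utf8 = True
except UnicodeDecodeError:
    legacy_is_valid_utf8 = False
print(legacy_is_valid_utf8)
```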
by support » Mon Jul 31, 2017 6:25 am
You haven't closed the CDATA sections properly.
When you're having problems like this, one trick you can use is to save the output to a file, then open it in a browser - it might give you more clues as to what the problem is. Also, if I open the XML up in Notepad++, the syntax highlighting tells me something is wrong.
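The same check can be scripted. A rough sketch, assuming Python 3: it saves the feed as UTF-8 and runs it through the standard library's XML parser, which reports the line and column of the first error. The `check_feed()` helper, the sample feeds and the file name are made up for this example.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

def check_feed(rss_text, path):
    """Save the feed as UTF-8; return None if it parses, else the error text."""
    data = rss_text.encode("utf-8")
    with open(path, "wb") as fp:
        fp.write(data)
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        return str(exc)
    return None

good = (u'<?xml version="1.0" encoding="UTF-8" ?>\n'
        u'<rss version="2.0"><channel><title><![CDATA[ąęł]]></title></channel></rss>')
bad = good.replace("]]>", "]")   # CDATA section left unclosed

path = os.path.join(tempfile.gettempdir(), "feed_check_example.xml")
print(check_feed(good, path))
print(check_feed(bad, path))
```

The saved file can still be opened in a browser or an editor afterwards if the parser message alone is not enough.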
by awasu.user » Mon Jul 31, 2017 7:37 am
Oops! You're right!

I closed the tags correctly but still can't get the right output. The problem is writing UTF-8 to the console. On the test characters, Awasu reports an error...

I'm looking for an alternative to print for writing the output.
by support » Mon Jul 31, 2017 7:50 am
awasu.user wrote: The problem is writing UTF-8 to the console. On the test characters, Awasu reports an error...

I'm looking for an alternative to print for writing the output.
I talked about this in the big Unicode tutorial. Your safeprint() function is not really the way to go, take a look at my print_utf8().
by awasu.user » Mon Jul 31, 2017 9:20 am
Your function in my code:
Code:
import sys

def print_utf8( val ) :
    sys.stdout.buffer.write( val.encode( "utf-8" ) )
    sys.stdout.buffer.write( b"\n" )
rss_data = u'''<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0">
<channel>
<title><![CDATA[W3Schools Home Page]</title>
<link><![CDATA[https://www.w3schools.com]</link>
<description><![CDATA[PL characters: ąęłńóźć! ?!Free web building tutorials]</description>
<item>
<title><![CDATA[RSS Tutorial]</title>
<link><![CDATA[https://www.w3schools.com/xml/xml_rss.asp]</link>
<description><![CDATA[New RSS tutorial on W3Schools]</description>
</item>
<item>
<title><![CDATA[XML Tutorial]</title>
<link><![CDATA[https://www.w3schools.com/xml]</link>
<description><![CDATA[New XML tutorial on W3Schools]</description>
</item>
</channel>
</rss>
'''
print_utf8(rss_data)
gives me this error:
sys.stdout.buffer.write( val.encode( "utf-8" ) )
AttributeError: 'PseudoOutputFile' object has no attribute 'buffer'
I'll start digging in the Python docs to find out more...
by support » Mon Jul 31, 2017 10:06 am
Works for me (Python 3.6.1).
by awasu.user » Mon Jul 31, 2017 10:38 am
That's strange. I use the same version on Win7 and this code gives me an error.

by support » Mon Jul 31, 2017 10:49 am
Do you have the "official" Python distribution (from python.org), or something else e.g. ActiveState's
Start Python from the command line, and tell me what this gives you:
Code:
import sys
sys.version
type(sys.stdout.buffer)
by awasu.user » Mon Jul 31, 2017 12:44 pm
The official one from python.org, plus IPython installed via pip (for Jupyter Notebook).
The result is:
Python 3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import sys
>>> sys.version
'3.6.1 (v3.6.1:69c0db5, Mar 21 2017, 18:41:36) [MSC v.1900 64 bit (AMD64)]'
>>> type(sys.stdout.buffer)
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
type(sys.stdout.buffer)
AttributeError: 'PseudoOutputFile' object has no attribute 'buffer'
>>>
by support » Mon Jul 31, 2017 1:00 pm
My misteak, I should've asked for
I get io.TextIOWrapper, you seem to have PseudoOutputFile, and given that the traceback points to a file called pyshell, I rather suspect it's ipython that's screwing things up (at a guess, I'd say it's capturing the output so it can do something with it).
There are other ways of outputting UTF8, just Google around a bit.
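One such alternative, as a hedged sketch: check whether the stream has a `buffer` attribute and fall back to a plain text write when it does not. The `<pyshell#2>` file name and the `PseudoOutputFile` class in the traceback are what IDLE's shell uses, which suggests the replaced stdout may come from IDLE rather than IPython. The `write_utf8()` helper below is made up for this example; it takes the stream as a parameter so it can be tried on any file-like object.

```python
import io
import sys

def write_utf8(val, stream=None):
    """Write val plus a newline as UTF-8 bytes when the stream allows it."""
    stream = sys.stdout if stream is None else stream
    raw = getattr(stream, "buffer", None)
    if raw is not None:
        # Normal case: bypass the text layer and write the bytes ourselves.
        raw.write(val.encode("utf-8") + b"\n")
        raw.flush()
    else:
        # Replaced stdout (e.g. an IDE shell): only text can be written,
        # and the encoding is whatever the replacement object chooses.
        stream.write(val + "\n")

# Demonstration with an in-memory stream standing in for a console
# that would otherwise encode text as cp1250.
fake = io.TextIOWrapper(io.BytesIO(), encoding="cp1250")
write_utf8(u"ąęłńóźć", fake)
print(fake.buffer.getvalue())
```

When the script is run from a normal command line, or launched by another program that reads its output, `sys.stdout.buffer` is normally present and the first branch is taken.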
by awasu.user » Mon Jul 31, 2017 1:10 pm
support wrote:My misteak, I should've asked for
I changed it and got:
Code:
type(sys.stdout.buffer)
<class '_io.BufferedWriter'>
Eh, googling again.
by awasu.user » Mon Jul 31, 2017 1:56 pm
The only working solution: change the charset to a Windows one. Awasu reports no errors and the characters are encoded correctly.
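A likely reason this works is that the bytes written now match the encoding the feed declares. A sketch of keeping the two in sync, under stated assumptions: the `build_feed()` helper is invented for this example, and `windows-1250` is only a guess at which Windows charset was meant (the post does not say).

```python
import xml.etree.ElementTree as ET

def build_feed(body, encoding):
    """Return the feed as bytes, with an XML declaration naming the same encoding."""
    declaration = u'<?xml version="1.0" encoding="%s" ?>\n' % encoding
    return (declaration + body).encode(encoding)

body = u'<rss version="2.0"><channel><title>ąęłńóźć</title></channel></rss>'

for enc in ("utf-8", "windows-1250"):
    data = build_feed(body, enc)
    # The declaration matches the bytes, so the parser should recover
    # the same title whichever encoding is used.
    title = ET.fromstring(data).find("channel/title").text
    print(enc, title == u"ąęłńóźć")
```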